Out on a data storage market limb: Six predictions for 2015

An IT industry analyst article published by SearchStorage.

Our crystal ball tells us this will be a year of change for the data storage market.

With another year just getting underway, we here at Taneja Group felt we needed a few analyst predictions to get things off on the right foot. The easiest predictions, and often the most likely ones, are that things will continue mostly as they are. But what fun is that? So, like any good fortune teller, we held hands around a crystal ball, gathered our prescient thoughts and with the help of the storage spirits came up with these six predictions for change in the data storage market for 2015.

  1. The overall traditional storage market will stay relatively flat despite huge growth in big data and the onrushing Internet of Things. Most new big data will be unstructured, and big data architectures like Hadoop will still tend to leverage DAS for storage. In addition, many big data players are pushing the data lake or hub concept to land even bigger chunks of other enterprise data on big data clusters. While we do see some salvation in this space from vendors […] that enable big data analysis to leverage traditional enterprise storage, it won’t be enough to make a big dent in 2015. We’ve also noticed that many storage shops have yet to take advantage of the capacity optimizations (e.g., thin provisioning, linked clones, global deduplication, inline compression and so on) now available in recent versions of competitive arrays, features that are quickly becoming table stakes in new acquisition decisions. Hybrid arrays, in particular, are bringing flash-enabling space efficiencies across their full complement of storage tiers, and most arrays these days are turning at least hybrid.
  2. Speaking of flash, there are too many all-flash array (AFA) vendors and not enough differentiation. During 2012/2013 the first AFA vendors had the market to themselves, but with all the big players rolling out full-fledged flash offerings, opportunities are declining. With [many vendors] all pushing their own solutions (both AFA and hybrid), the remaining independent vendors will have a harder time finding a niche where they can survive. We also expect to see a new round of very high-end performance storage architectures in 2015. […] As a related trend, we anticipate that hybrid-based Tier-1 arrays will lose ground to AFAs in general, as the cost of flash drops and flash performance proves valuable to most, if not all, Tier-1 I/O. In virtualization environments, this trend will be somewhat lessened by the rise in popularity of server-side flash and/or memory caching/tiering solutions.
  3. Data protection and other add-on storage capabilities will become more directly baked into storage solutions. We expect to see more traditional arrays follow the examples of

…(To read the complete six-item prediction list, see the as-published article there)

New choices bring enterprise big data home

An IT industry analyst article published by SearchDataCenter.

Enterprises recognize the tantalizing value of big data analytics, but traditional concerns about data management and security have held back deployments — until now.

When big data practices come to your organization, it’s all about location, location, location.

I’ve heard recently from a bunch of big-data-related vendors that are all vying to gain from your sure-to-grow big data footprint. After all, big data isn’t about minimizing your data set, but making the best use of as much data as you can possibly manage. That’s not a bad definition of big data if you are still looking for one. With all this growing data, you will need a growing data center infrastructure to match.

This big data craze really got started with Apache Hadoop’s Distributed File System (HDFS), which unlocked the vision of massive data analysis based on cost-effective scale-out clusters of commodity servers using relatively cheap, locally attached disks. Hadoop and its ecosystem of solutions let you keep and analyze all kinds of data in its natural, raw, low-level form (i.e., not fully database structured), no matter how much you pile up or how fast it grows.

The problem is that once you get beyond, err, data science projects, old familiar enterprise data management issues return to the forefront, including data security, protection, reliability, operational performance and creeping Opex costs.

While Hadoop and HDFS mature with each release, there are still a lot of gaps when it comes to meeting enterprise requirements. It turns out that those commodity scale-out clusters of direct-attached storage (DAS) might not actually offer the lowest total cost of ownership when big data lands in production operations…

…(read the complete as-published article there)

Figuring out the real price of flash technology

An IT industry analyst article published by SearchSolidStateStorage.

Sometimes comparing the costs of flash arrays is an apples-to-oranges affair — interesting, but not very helpful.

We’re often told by hybrid and all-flash array vendors that their particular total cost of ownership (TCO) is effectively lower than the other guy’s. We’ve even heard vendors claim that by taking certain particulars into account, the per-gigabyte price of their flash solution is lower than that of spinning disk. Individually, the arguments sound compelling; but stack them side by side and you quickly run into apples-and-oranges issues.
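
One recurring apples-to-oranges factor is whether a quoted price is per raw gigabyte or per “effective” gigabyte after assumed data reduction and formatting overhead. Here is a minimal sketch, using made-up placeholder figures rather than anyone’s actual pricing, of how sensitive the comparison is to those assumptions:

```python
# Minimal sketch: how "effective" $/GB claims depend on assumed data reduction.
# All figures below are illustrative placeholders, not vendor pricing.

def effective_cost_per_gb(raw_cost_per_gb, data_reduction_ratio, usable_fraction=1.0):
    """Effective $/GB after dedupe/compression and RAID/formatting overhead."""
    usable_gb_per_raw_gb = usable_fraction * data_reduction_ratio
    return raw_cost_per_gb / usable_gb_per_raw_gb

# A flash array quoted at $10/GB raw with an assumed 5:1 reduction...
flash = effective_cost_per_gb(raw_cost_per_gb=10.0, data_reduction_ratio=5.0, usable_fraction=0.8)
# ...versus spinning disk at $1/GB raw with little reduction applied.
disk = effective_cost_per_gb(raw_cost_per_gb=1.0, data_reduction_ratio=1.2, usable_fraction=0.7)

print(f"flash effective $/GB: {flash:.2f}")   # 2.50 under these assumptions
print(f"disk  effective $/GB: {disk:.2f}")    # 1.19 under these assumptions
```

Shift the assumed reduction ratio or usable fraction a little in either direction and the ranking can flip, which is exactly why those assumptions need to be spelled out alongside the headline number.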

Storage has a lot of factors that should be profiled and evaluated, such as IOPS, latency, bandwidth, protection, reliability, consistency and so on, and these must be matched up against client workloads that have their own unique read/write mixes, burstiness, data sizes, metadata overhead and quality of service/service-level agreement requirements. Standard benchmarks may be interesting, but the best way to evaluate storage is to test it under your particular production workloads; a sophisticated load generation and modeling tool like that from Load DynamiX can help with that process.
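
A toy sketch like the one below is no substitute for a purpose-built tool such as Load DynamiX, but it illustrates the basic idea of exercising storage with your own read/write mix rather than a canned benchmark. Everything here (block size, 70/30 mix, file path, operation count) is an assumed profile for illustration only:

```python
# Toy illustration only: measure per-op latency of a specific read/write mix
# against a test file. Real evaluations need a proper load-generation tool
# and your actual production traces.
import os, random, time

PATH, BLOCK, OPS, READ_PCT = "testfile.bin", 8192, 1000, 0.7  # hypothetical workload profile

with open(PATH, "wb") as f:                      # pre-create a 100 MB test file
    f.write(b"\0" * 100 * 1024 * 1024)

latencies = []
with open(PATH, "r+b") as f:
    size = os.path.getsize(PATH)
    for _ in range(OPS):
        f.seek(random.randrange(0, size - BLOCK))
        start = time.perf_counter()
        if random.random() < READ_PCT:           # ~70% random reads
            f.read(BLOCK)
        else:                                    # ~30% random writes
            f.write(os.urandom(BLOCK))
            f.flush()
            os.fsync(f.fileno())                 # force the write to media
        latencies.append(time.perf_counter() - start)

latencies.sort()
print("median latency (ms):", latencies[len(latencies) // 2] * 1000)
print("99th pct latency (ms):", latencies[int(len(latencies) * 0.99)] * 1000)
```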

But as analysts, when we try to make industry-level evaluations hoping to compare apples to apples, we run into a host of half-hidden factors we’d like to see made explicitly transparent if not standardized across the industry. Let’s take a closer look…

…(read the complete as-published article there)

A Shift To Enterprise Features For Big Data Solutions: Notes from Strata/Hadoop World NY 2014

(Excerpt from original post on the Taneja Group News Blog)

I had a blast last week at Strata/Hadoop World NY 2014. I got a real sense that the mass of big data sponsors/vendors is finally focusing on what it takes to get big data solutions into production operations. In fact, one of the early keynotes noted that the majority of attendees were implementation-focused software engineers rather than analytical data scientists. Certainly there was no shortage of high-profile use cases bandied about and impressive sessions on advanced data science, but on the show floor much of the talk was about making big data work in real-world data centers.

I’ll certainly be diving into many of these topics more deeply, but here is a not-so-brief roundup of major themes culled from the 20+ sponsors I met with at the show:

…(read the full post)

Flash runs past read cache

An IT industry analyst article published by SearchDataCenter.

Just because you can add a cache doesn’t mean you should. It is possible to have the wrong kind, so weigh your options before implementing memory-based cache for a storage boost.

Can you ever have too much cache?

[Cache is the new black…] As a performance optimizer, cache has never gone out of style, but today’s affordable flash and cheap memory are worn by every data center device.

Fundamentally, a classic read cache helps avoid long repetitive trips through a tough algorithm or down a relatively long input/output (I/O) channel. If a system does something tedious once, it temporarily stores the result in a read cache in case it is requested again.

Duplicate requests don’t need to come from the same client. For example, in a large virtual desktop infrastructure (VDI) scenario, hundreds of virtual desktops might want to boot from the same master image of an operating system. With a shared cache, every user gets a performance boost and the downstream system is spared a lot of duplicate I/O work.
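
Mechanically, that behavior can be sketched as a minimal read-through cache. This is an illustration of the concept only, not any vendor’s implementation, and `backend_read` is a stand-in for a slow disk or network fetch:

```python
# Minimal read-through cache sketch (illustrative, not a vendor implementation).
import collections

class ReadCache:
    def __init__(self, backend_read, capacity=1024):
        self._read = backend_read
        self._cache = collections.OrderedDict()   # LRU order: oldest entries first
        self._capacity = capacity

    def get(self, block_id):
        if block_id in self._cache:                # cache hit: no backend I/O,
            self._cache.move_to_end(block_id)      # regardless of which client asked
            return self._cache[block_id]
        data = self._read(block_id)                # cache miss: one tedious trip downstream
        self._cache[block_id] = data               # remember the result for next time
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)        # evict the least recently used entry
        return data
```

In the VDI example, hundreds of desktops booting the same master image resolve to the same block IDs, so only the first request pays the backend cost.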

The problem with using old-school, memory-based cache for writes is that if you lose power, you lose the cache. Thus, [unless protected by battery backup] it is used only as a read cache. Writes are set up to “write through” — new data must persist somewhere safe on the back end before the application continues.
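
Write-through can be sketched in a few lines; the point is simply that the acknowledgment waits on the durable back end, so losing the volatile cache costs only re-warmable read data. This is a hedged illustration with hypothetical function names, not any product’s actual write path:

```python
# Illustrative write-through sketch: the volatile cache is updated only after
# the write has landed safely on the persistent back end.
def write_through(cache, backend_write, block_id, data):
    backend_write(block_id, data)   # persist first; data survives a power loss
    cache[block_id] = data          # then populate the volatile cache for later reads
    return True                     # acknowledge only after the data is durable
```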

Flash is nonvolatile random access memory (NVRAM) and is used as cache or as a tier of storage directly…

…(read the complete as-published article there)

Commodity storage has its place, but an all-flash architecture thrills

An IT industry analyst article published by SearchSolidStateStorage.

Some IT folks are trying to leverage commodity servers and disks with software-implemented storage services. But others want an all-flash architecture.

Every day we hear of budget-savvy IT folks attempting to leverage commodity servers and disks by layering on software-implemented storage services. But at the same time, and in some of the same data centers, highly optimized flash-fueled acceleration technologies are racing in with competitive performance and compelling price comparisons. Architecting IT infrastructure to balance cost against capability has never been easy, but the differences and tradeoffs between these two storage approaches are reaching extremes. It’s easy to wonder: Is storage going commodity or custom?

One of the drivers for these trends has been with us since the beginning of computing: Moore’s famous law is still delivering ever-increasing CPU power. Today, we see the current glut of CPU muscle being recovered and applied to power up increasingly virtualized and software-implemented capabilities. Last year, for example, the venerable EMC VNX line touted a multi-year effort toward making its controllers operate “multi-core,” which is to say they’re now able to take advantage of plentiful CPU power with new software-based features. This trend also shows up in the current vendor race to roll out deduplication. Even if software-based dedupe requires significant processing, cheap extra compute is enabling wider adoption.
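
Why software dedupe leans so hard on CPU is easy to see in miniature: every incoming block must be fingerprinted and checked against an index before it is stored. The sketch below is a simplification (fixed block size, an in-memory index), not how any particular array implements it:

```python
# Minimal block-level dedupe sketch: fingerprint every block, store only unique ones.
# Illustrates the per-block hashing cost that cheap extra CPU now absorbs.
import hashlib

BLOCK_SIZE = 4096
block_store = {}      # fingerprint -> block data (unique blocks only)

def write_with_dedupe(data: bytes):
    """Return the list of fingerprints needed to reconstruct `data`."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()   # the CPU-heavy step, once per block
        block_store.setdefault(fp, block)        # store each unique block only once
        recipe.append(fp)
    return recipe

# Two copies of the same 1 MB buffer consume the space of one:
payload = b"x" * (1024 * 1024)
write_with_dedupe(payload)
write_with_dedupe(payload)
print(len(block_store), "unique blocks stored")   # 256 blocks, not 512
```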

In cloud and object storage, economics trump absolute performance with capacity-oriented and software-implemented architectures popping up everywhere. Still, competitive latency matters for many workloads. When performance is one of the top requirements, optimized solutions that leverage specialized firmware and hardware have an engineered advantage.

For maximum performance, storage architects are shifting rapidly toward enterprise-featured solid-state solutions. Among vendors, the race is on to build and offer the best all-flash solution…

…(read the complete as-published article there)

Flash Gets Faster: Avalanche Technology Intends To Dominate

(Excerpt from original post on the Taneja Group News Blog)

Flash Memory Summit 2014 runs this week, and we expect to hear a lot about how flash is poised to take over the IT storage world. One of the key new entrants to keep an eye out for is Avalanche Technology. While they are ultimately aiming to develop STT-MRAM solutions that could totally change the way IT infrastructure works, right now they are busy rolling out their new “solid-state first” array, built from the ground up to intimately leverage available NAND flash.

…(read the full post)

Five VM-Level Infrastructure Adaptations — Virtualization Review

An IT industry analyst article published by Virtualization Review.

Infrastructure is evolving for the better, making the job of the admin easier in the long run. Here are five ways it’s evolving to work at the VM level.

It used to be that IT struggled to intimately understand every app in order to provide the right supporting infrastructure. Today, server virtualization makes the job much easier, because IT can now just cater to VMs. By working and communicating at the VM level, both app owners and infrastructure admins stay focused, using a common API to help ensure apps are hosted effectively and IT runs efficiently.

But the virtual admin still has to translate what each VM requires, going beyond direct-server resources into the specialized domains of other IT infrastructure silos. While silos have traditionally pooled rare expertise to optimize expensive resources, in today’s virtualized world, silos seem to offer more friction than leverage. Here are five ways infrastructure is evolving to work at the VM level.

  1. Take 1: VM-Centric Storage.

…(read the complete as-published article there)

Internet of Things data will boost storage

An IT industry analyst article published by SearchStorage.

IT departments can benefit from their storage vendors eavesdropping on their arrays.

Tired of big data stories? Unfortunately, they’re not likely to stop anytime soon, especially from vendors hoping to re-ignite your Capex spending. But there is some great truth to the growing data explosion, and it pays to consider how that will likely cause incredible changes to our nice, safe, well-understood current storage offerings. For some it might feel like watching a train wreck unfolding in slow motion, but for others it just might be an exciting big wave to surf on. Either way, one of the biggest contributions to data growth will come from the so-called Internet of Things.

Basically, the Internet of Things just means that clever new data sensors (and, in many cases, remote controls) will be added to more and more devices that we interact with every day, turning almost everything we touch into data sources. The prime example is probably your smartphone, which is capable of reporting your location, orientation, usage, movement, and even social and behavioral patterns. If you’re engineering-minded, you can order a cheap Raspberry Pi computer and instrument any given object in your house today, from metering active energy devices to tracking passive items for security to monitoring environmental conditions. You can capture your goldfish’s swimming activity or count how many times someone opened the refrigerator door before dinner.
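
Here is a hedged sketch of what such a home-grown logger might look like; the sensor-reading function is a pure placeholder for whatever GPIO or USB device you actually wire up, and the file name and sampling rate are arbitrary:

```python
# Toy sensor logger for a Raspberry Pi-style project (illustrative only).
# read_door_sensor() is a placeholder for a real GPIO or USB sensor read.
import json, random, time

def read_door_sensor():
    return random.choice([True, False])   # stand-in for a real sensor reading

def log_events(path="fridge_door.jsonl", interval_s=1.0, samples=10):
    with open(path, "a") as log:
        for _ in range(samples):
            record = {"ts": time.time(), "door_open": read_door_sensor()}
            log.write(json.dumps(record) + "\n")   # one timestamped record per sample
            time.sleep(interval_s)

log_events()
```

Even at one small record per second, a single sensor produces roughly 86,400 records a day; multiply that by every instrumented device in a home or data center and the storage math adds up quickly.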

One challenge is that this highly measured world will create data at an astonishing rate, even if not all the data is or ever will be interesting or valuable. We might resist this in our own homes, but the Internet of Things trend breached the data center walls long ago. Most active IT components and devices have some built-in instrumentation and logging already, and we’ll see additional sensors added to the rest of our gear. Naturally, if there are instrumented IT components and devices, someone (probably us) will want to collect and keep the data around for eventual analysis, just in case.

Adding to the potential data overload, there’s an emerging big data science principle that says the more data history one has the better. Since we don’t necessarily know today all the questions we might want to ask of our data in the future, it’s best to retain all the detailed history perpetually. That way we always have the flexibility to answer any new questions we might ever think of, and as a bonus gain visibility over an ever-larger data set as time goes by…

…(read the complete as-published article there)

PernixData FVP Now Cache Mashing RAM and Flash

(Excerpt from original post on the Taneja Group News Blog)

Performance acceleration solutions tend to either replace key infrastructure or augment what you have. PernixData FVP for VMware clusters is firmly in the second camp, today with a new release making even better use of total cluster resources to provide IO performance acceleration to “any VM, on any host, with any shared storage”.

…(read the full post)