Get the most from cloud-based storage services

An IT industry analyst article published by SearchStorage.

We have been hearing about the inevitable transition to the cloud for IT infrastructure since before the turn of the century. But, year after year, storage shops quickly become focused on only that year’s prioritized initiatives, which tend to be mostly about keeping the lights on and costs low. A true vision-led shift to cloud-based storage services requires explicit executive sponsorship from the business side of an organization. But unless you cynically count the creeping use of shadow IT as an actual strategic directive to do better as an internal service provider, what gets asked of you is likely, and unfortunately, limited to low-risk tactical deployments or incremental upgrades.

Not exactly the stuff of business transformations.

Cloud adoption at a level for maximum business impact requires big executive commitment. That amount of commitment is, quite frankly, not easy to generate.

…(read the complete as-published article there)

What’s the future of data storage in 2016?

An IT industry analyst article published by SearchStorage.

It’s hard to make stunning predictions on the future of data storage that are certain to come true, but it’s that time of year and I’m going to step out on that limb again. I’ll review my predictions from last year as I go — after all, how much can you trust me if I’m not on target year after year? (Yikes!)

Last year, I said the total data storage market would stay flat despite big growth in unstructured data. I’d have to say that seems to be true, if not actually dropping. Despite lots of new entrants in the market, the average vendor margin in storage is narrowing with software-defined variants showing up everywhere, open-source alternatives nibbling at the edges, commodity-based appliances becoming the rule, and ever-cheaper “usable” flash products improving performance and density at the same time.

…(read the complete as-published article there)

Big data analytics applications impact storage systems

An IT industry analyst article published by SearchStorage.

Whether driven by direct competition or internal business pressure, CIOs, CDOs and even CEOs today are looking to squeeze more value, more insight and more intelligence out of their data. They no longer can afford to archive, ignore or throw away data if it can be turned into a valuable asset. At face value, it might seem like a no-brainer — “we just need to analyze all that data to mine its value.” But, as you know, keeping any data, much less big data, has a definite cost. Processing larger amounts of data at scale is challenging, and hosting all that data on primary storage hasn’t always been feasible.

Historically, unless data had some corporate value — possibly as a history trail for compliance, a source of strategic insight or intelligence that can optimize operational processes — it was tough to justify keeping it. Today, thanks in large part to big data analytics applications, that thinking is changing. All of that bulky, low-level data has little immediate value, but it might hold great future potential someday, so you want to keep it; once it’s gone, you lose any downstream opportunity.

To extract value from all that data, however, IT must not only store increasingly large volumes of data, but also architect systems that can process and analyze it in multiple ways.

…(read the complete as-published article there)

Navigate data lakes to manage big data

An IT industry analyst article published by SearchStorage.

Big data sure is exciting to business folks, with all sorts of killer applications just waiting to be discovered. And you no doubt have a growing pile of data bursting the seams of your current storage infrastructure, with lots of requests to mine even more voluminous data streams. Haven’t you been collecting microsecond end-user behavior across all your customers and prospects, not to mention collating the petabytes of data exhaust from instrumenting your systems to the nth degree? Imagine the insight management would have if they could look at all that data at once. Forget about data governance, data management, data protection and all those other IT worries — you just need to land all that data in a relatively scale-cheap Hadoop cluster!

Seriously, though, big data lakes can meet growing data challenges and provide valuable new services to your business. By collecting a wide variety of business-relevant data sets in one place and enabling scalable, multifaceted analytics based on big data approaches, a data lake creates many new data mining opportunities. The total potential value of a data lake grows with the amount of useful data it holds available for analysis. And, one of the key tenets of big data and the big data lake concept is that you don’t have to create a master schema ahead of time, so non-linear growth is possible.

The enterprise data lakes or hub concept was first proposed by big data vendors like Cloudera and Hortonworks, ostensibly using vanilla scale-out HDFS-based commodity storage. But it just so happens that the more data you keep on hand, the more storage of all kinds you will need. Eventually, all corporate data is likely to be considered big data. However, not all of that corporate data is best hosted on a commodity scale-out HDFS cluster.

So, today, traditional storage vendors are signing up to the big data lakes vision. From a storage marketing perspective, it seems like data lakes are the new cloud. “Everyone needs a data lake. How can you compete without one (or two or three)?” And there are a variety of enterprise storage options for big data, including arrays that can provide remote storage that acts like HDFS, Hadoop virtualization that can translate other storage protocols into HDFS, and scalable software-defined storage options.

…(read the complete as-published article there)

Moving to all-flash? Think about your data storage infrastructure

An IT industry analyst article published by SearchStorage.

Everyone is now onboard with flash. All the key storage vendors have at least announced entry into the all-flash storage array market, with most having offered hybrids — solid-state drive-pumped traditional arrays — for years. As silicon storage gets cheaper and denser, it seems inevitable that data centers will migrate from spinning disks to “faster, better and cheaper” options, with non-volatile memory poised to be the long-term winner.

But the storage skirmish today seems to be heading toward the total cost of ownership end of things, where two key questions must be answered:

  • How much performance is needed, and how many workloads in the data center have data with varying quality of service (QoS) requirements or data that ages out?
  • Are hybrid arrays a better choice to handle mixed workloads through advanced QoS and auto-tiering features?

All-flash proponents argue that cost and capacity will continue to drop for flash compared to hard disk drives (HDDs), and that no workload is left wanting with the ability of all-flash to service all I/Os at top performance. Yet we see a new category of hybrids on the market that are designed for flash-level performance and then fold in multiple tiers of colder storage. The argument there is that data isn’t all the same and its value changes over its lifetime. Why store older, un-accessed data on a top tier when there are cheaper, capacity-oriented tiers available?
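The auto-tiering argument above can be made concrete with a small sketch. This is purely illustrative: the tier names and age thresholds below are made-up assumptions, not any vendor's actual policy, but they show the basic idea of placing data on a tier according to how recently it was accessed.

```python
# Hypothetical auto-tiering policy sketch: tier names and thresholds
# are illustrative assumptions, not a real product's defaults.
def choose_tier(days_since_last_access: int) -> str:
    """Pick a storage tier based on data 'temperature' (recency of access)."""
    if days_since_last_access <= 7:
        return "flash"          # hot data: keep on the performance tier
    if days_since_last_access <= 90:
        return "capacity-hdd"   # warm data: cheaper capacity tier
    return "cold-cloud"         # cold data: archive tier (or tape)

print(choose_tier(1))     # flash
print(choose_tier(30))    # capacity-hdd
print(choose_tier(365))   # cold-cloud
```

A real hybrid array would drive this with per-block access statistics and QoS targets rather than a single age threshold, but the cost logic is the same: only recently touched data earns a spot on flash.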

It’s misleading to lump together hybrids that are traditional arrays with solid-state drives (SSDs) added and the new hybrids that might be one step evolved past all-flash arrays. And it can get even more confusing when the old arrays get stuffed with nothing but flash and are positioned as all-flash products. To differentiate, some industry wags like to use the term “flash-first” to describe newer-generation products purpose-built for flash speeds. That still could cause some confusion when considering both hybrids and all-flash designs. It may be more accurate to call the flash-first hybrids “flash-converged.” By being flash-converged, you can expect to buy one of these new hybrids with nothing but flash inside and get all-flash performance.

We aren’t totally convinced that the future data center will have just a two-tier system with flash on top backed by tape (or a remote cold cloud), but a “hot-cold storage” future is entirely possible as intermediate tiers of storage get, well, dis-intermediated. We’ve all predicted the demise of 15K HDDs for a while; can all the other HDDs be far behind as QoS controls get more sophisticated in handling the automatic mixing of hot and cold to create any temperature storage you might need? …

…(read the complete as-published article there)

Out on a data storage market limb: Six predictions for 2015

An IT industry analyst article published by SearchStorage.

Our crystal ball tells us this will be a year of change for the data storage market.

With another year just getting underway, we here at Taneja Group felt we needed a few analyst predictions to get things off on the right foot. The easiest predictions, and often the most likely ones, are that things will continue mostly as they are. But what fun is that? So, like any good fortune teller, we held hands around a crystal ball, gathered our prescient thoughts and with the help of the storage spirits came up with these six predictions for change in the data storage market for 2015.

  1. The overall traditional storage market will stay relatively flat despite huge growth in big data and the onrushing Internet of Things. Most new big data will be unstructured and big data architectures like Hadoop will still tend to leverage DAS for storage. In addition, many big data players are pushing the data lake or hub concept to land even bigger chunks of other enterprise data on big data clusters. While we do see some salvation in this space from vendors […] that enable big data analysis to leverage traditional enterprise storage, it won’t be enough to make a big dent in 2015. We’ve also noticed that many storage shops have yet to take advantage of the emerging capacity optimizations now available (e.g., thin provisioning, linked clones, global deduplication, inline compression and so on) in recent versions of competitive arrays that are becoming table stakes for new acquisition decisions. Hybrid arrays, in particular, are bringing flash-enabling space efficiencies across their full complement of storage tiers, and most arrays these days are turning at least hybrid.
  2. Speaking of flash, there are too many all-flash array (AFA) vendors and not enough differentiation. During 2012/2013 the first AFA vendors had the market to themselves, but with all the big players rolling out full-fledged flash offerings, opportunities are declining. With [many vendors] all pushing their own solutions (both AFA and hybrid), the remaining independent vendors will have a harder time finding a niche where they can survive. We also expect to see a new round of very high-end performance storage architectures in 2015[…] As a related trend, we anticipate that hybrid-based Tier-1 arrays will lose ground to AFAs in general, as the cost of flash drops and flash performance proves valuable to most if not all Tier-1 I/O. In virtualization environments, this trend will be somewhat lessened by the rise in popularity of server-side flash and/or memory caching/tiering solutions.
  3. Data protection and other add-on storage capabilities will become more directly baked into storage solutions. We expect to see more traditional arrays follow the examples of

…(To read the complete six item prediction see as-published article over there)

Internet of Things data will boost storage

An IT industry analyst article published by SearchStorage.

IT departments can benefit from their storage vendors eavesdropping on their arrays.

Tired of big data stories? Unfortunately, they’re not likely to stop anytime soon, especially from vendors hoping to re-ignite your Capex spending. But there is some great truth to the growing data explosion, and it pays to consider how that will likely cause incredible changes to our nice, safe, well-understood current storage offerings. For some it might feel like watching a train wreck unfolding in slow motion, but for others it just might be an exciting big wave to surf on. Either way, one of the biggest contributions to data growth will come from the so-called Internet of Things.

Basically, the Internet of Things just means that clever new data sensors (and, in many cases, remote controls) will be added to more and more devices that we interact with every day, turning almost everything we touch into data sources. The prime example is probably your smartphone, which is capable of reporting your location, orientation, usage, movement, and even social and behavioral patterns. If you’re engineering-minded, you can order a cheap Raspberry Pi computer and instrument any given object in your house today, from metering active energy devices to tracking passive items for security to monitoring environmental conditions. You can capture your goldfish’s swimming activity or count how many times someone opened the refrigerator door before dinner.
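The kind of home instrumentation described above can be sketched in a few lines. Everything here is a stand-in assumption: `read_temperature` is a placeholder where a real Raspberry Pi setup would read from GPIO pins or a sensor library, and the sensor name and log path are invented for illustration.

```python
# Hedged sketch of DIY instrumentation: sample a sensor and append
# timestamped JSON readings to a log for later analysis.
import json
import random
from datetime import datetime, timezone

def read_temperature() -> float:
    # Placeholder for an actual sensor read (e.g., via a GPIO library
    # on a Raspberry Pi); here we just simulate a plausible value.
    return 20.0 + random.random() * 2

def sample_once(log_path: str) -> dict:
    """Take one reading and append it as a JSON line to the log."""
    reading = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "sensor": "fridge-door-temp",   # hypothetical sensor name
        "celsius": round(read_temperature(), 2),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(reading) + "\n")
    return reading

sample_once("/tmp/sensor.log")
```

Multiply a loop like this by every appliance in a house, or every component in a data center, and the data rates the article describes become easy to believe.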

One challenge is that this highly measured world will create data at an astonishing rate, even if not all the data is or ever will be interesting or valuable. We might resist this in our own homes, but the Internet of Things trend breached the data center walls long ago. Most active IT components and devices have some built-in instrumentation and logging already, and we’ll see additional sensors added to the rest of our gear. Naturally, if there are instrumented IT components and devices, someone (probably us) will want to collect and keep the data around for eventual analysis, just in case.

Adding to the potential data overload, there’s an emerging big data science principle that says the more data history one has the better. Since we don’t necessarily know today all the questions we might want to ask of our data in the future, it’s best to retain all the detailed history perpetually. That way we always have the flexibility to answer any new questions we might ever think of, and as a bonus gain visibility over an ever-larger data set as time goes by…

…(read the complete as-published article there)

Stop counting megabytes; it’s all about application-aware storage now

An IT industry analyst article published by SearchStorage.

Raw capacity numbers are becoming less useful as deduplication, compression and application-aware storage provide more value than sheer capacity.

Whether clay pots, wooden barrels or storage arrays, vendors have always touted how much their wares can reliably store. And invariably, the bigger the vessel, the more impressive and costly it is, both to acquire and manage. The preoccupation with size as a measure of success implies that we should judge and compare offerings on sheer volume. But today, the relationship between physical storage media capacity and the effective value of the data “services” it delivers has become much more virtual and cloudy. No longer does a megabyte of effective storage mean a megabyte of real storage.

Most array vendors now incorporate capacity-optimizing features such as thin provisioning, compression and data deduplication. But now it looks like those vendors might just be selling you megabytes of data that aren’t really there. I agree that it’s the effective storage and resulting cost efficiency that counts, not what goes on under the hood or whether the actual on-media bits are virtual, compacted or shared. The type of engine and the gallons in the tank are interesting, but it’s the speed and distance you can go that matter.

Corporate data that includes such varied things as customer behavior logs, virtual machine images and corporate email that’s been globally deduped and compressed might deflate to a twentieth or less of its former glory. So when a newfangled flash array only has 10 TB of actual solid-state drives, but based on an expected minimum dedupe ratio is sold as a much larger effective 100+ TB, are we still impressed with the bigger number? We know our raw data is inherently “inflated” with too many copies and too little sharing. It should have always been stored “more” optimally.

But can we believe that bigger number? What’s hard to know, although perhaps it’s what we should be focusing on, is the reduction ratio we’ll get with our particular data set, as deflation depends highly on both the dedupe algorithm and the content…
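The arithmetic behind those effective-capacity claims is simple enough to sketch. The function names below are mine, not any vendor's terminology; the point is just that an advertised effective capacity silently bakes in an assumed reduction ratio that your own data may or may not achieve.

```python
# Illustrative effective-capacity math for a deduplicating/compressing array.
def effective_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Logical capacity presented, given raw media capacity and a data
    reduction ratio (e.g. 10.0 means 10:1 dedupe plus compression)."""
    return raw_tb * reduction_ratio

def implied_ratio(raw_tb: float, advertised_tb: float) -> float:
    """The reduction ratio a vendor's effective-capacity claim assumes."""
    return advertised_tb / raw_tb

# The article's example: 10 TB of actual SSDs sold as 100+ TB effective
# only works if your data reduces at 10:1 or better.
print(effective_capacity_tb(10, 10))   # 100.0
print(implied_ratio(10, 100))          # 10.0
```

If your data set only reduces at 3:1, that same 10 TB array delivers 30 TB effective, which is why the achievable ratio for your particular content is the number worth scrutinizing.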

…(read the complete as-published article there)

External storage might make sense for Hadoop

An IT industry analyst article published by SearchStorage.

Using Hadoop to drive big data analytics doesn’t necessarily mean building clusters of distributed storage; a good old array might be a better choice.

The original architectural design for Hadoop made use of relatively cheap commodity servers and their local storage in a scale-out fashion. Hadoop’s original goal was to enable cost-effective exploitation of data that was previously not viable. We’ve all heard about big data volume, variety, velocity and a dozen other “v” words used to describe these previously hard-to-handle data sets. Given such a broad target by definition, most businesses can point to some kind of big data they’d like to exploit.

Big data is growing bigger every day and storage vendors with their relatively expensive SAN and network-attached storage (NAS) systems are starting to work themselves into the big data party. They can’t simply leave all that data to server vendors filling boxes with commodity disk drives. Even if Hadoop adoption is just in its early stages, the competition and confusing marketing noise are ratcheting up.

In a Hadoop scale-out design, each physical node in the cluster hosts both local compute and a share of data; it’s intended to support applications, such as search, that often need to crawl through massively large data sets. Much of Hadoop’s value lies in how it effectively executes parallel algorithms over distributed data chunks across a scale-out cluster.

Hadoop is made up of a compute engine based on MapReduce and a data service called the Hadoop Distributed File System (HDFS). Hadoop takes advantage of high data “locality” by spreading big data sets over many nodes using HDFS, farming out parallelized compute tasks to each data node (the “map” part of MapReduce), followed by various shuffling and sorting consolidation steps to produce a result (the “reduce” part).
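The map/shuffle/reduce flow described above can be shown with the classic word-count example. This is a single-process Python sketch of the pattern, not Hadoop itself: real Hadoop would run each map task in parallel on the node holding that data chunk, with HDFS supplying the chunks.

```python
# Minimal single-process sketch of the MapReduce pattern (word count).
from collections import defaultdict

def map_phase(chunk: str):
    # "map": emit (key, value) pairs from one chunk of the data set
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # "shuffle/sort": group all emitted values by key across map outputs
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # "reduce": consolidate each key's values into a final result
    return {key: sum(values) for key, values in grouped.items()}

chunks = ["big data big", "data lake"]   # stand-ins for HDFS blocks
pairs = [p for chunk in chunks for p in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))
print(counts)   # {'big': 2, 'data': 2, 'lake': 1}
```

The value of data locality follows directly: because each `map_phase` call only needs its own chunk, Hadoop can ship the computation to the data instead of dragging big data sets across the network.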

Commonly, each HDFS data node will be assigned DAS disks to work with. HDFS will then replicate data across all the data nodes, usually making two or three copies on different data nodes. Replicas are placed on different server nodes, with the second replica placed on a different “rack” of nodes to help avoid rack-level loss. Obviously, replication takes up more raw capacity than RAID, but it also has some advantages like avoiding rebuild windows.
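The rack-aware placement just described can be sketched as a toy policy. This is a simplified assumption-laden illustration, not HDFS's actual placement code, which also weighs writer locality, node load, and storage types; the sketch only captures the core rule that replicas must span more than one rack.

```python
# Toy sketch of HDFS-style rack-aware replica placement (simplified).
import random

def place_replicas(nodes_by_rack: dict, replication: int = 3):
    """Pick `replication` distinct nodes such that at least two racks
    are used, guarding against the loss of an entire rack."""
    racks = list(nodes_by_rack)
    first_rack = random.choice(racks)
    placement = [(first_rack, random.choice(nodes_by_rack[first_rack]))]
    # remaining replicas go on a single different rack, on distinct nodes
    second_rack = random.choice([r for r in racks if r != first_rack])
    remote = random.sample(
        nodes_by_rack[second_rack],
        min(replication - 1, len(nodes_by_rack[second_rack])),
    )
    placement += [(second_rack, node) for node in remote]
    return placement[:replication]

cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
replicas = place_replicas(cluster)
assert len({rack for rack, _ in replicas}) >= 2   # spans at least two racks
```

Compared with RAID, this costs 3x raw capacity, but a lost node or rack needs no rebuild window: the surviving replicas are already live copies.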

So if HDFS readily handles the biggest of data sets in a way native to the MapReduce style of processing, uses relatively cheap local disks and provides built-in “architecture-aware” replication, why consider enterprise-class storage? …

…(read the complete as-published article there)

How the future of data storage will stack up

An IT industry analyst article published by SearchStorage.

In the near future, data storage won’t be a passive player as it integrates more closely with applications and workloads.

In a previous column, [we] wrote that the concept of LUNs is dead or at least dying as the primary way storage will be managed in the future. This has become evident with the emergence and increased adoption of products offering advanced virtual machine (VM)-centric storage. Shifting the focus from LUNs to VMs changes the storage game for VM administrators who can continue to work with constructs they understand directly, storage folks who have to elevate their service offerings, and even those pesky end users who might benefit from increased performance and availability (and hopefully lower costs).

You could view the end of the LUN as a consequence of industry commoditization of low-level array functionality as storage vendors compete to offer better, higher-level products; or you might chalk it up to a highly competitive marketplace where the most efficient and effective IT can help win the day. Either way, we think it’s inevitable that storage solutions will keep evolving up the stack. The big question is what comes next. What are the next valuable levels as one climbs the storage stack? Let’s start with familiar storage types and work up to some possible future storage solutions…

…(read the complete as-published article there)