Storage Tiers or Tears?
I always enjoy reading StorageNewsletter and in particular the insights of Philippe Nicolas. His recent article on storage tiering (“Tier or Not Tier”) is a fairly complete overview of the history of the companies involved and the broad technical evolution of tiering through the ages.¹
As an originally storage-focused analyst, I had a front-row seat to many of the vendor marketing gyrations around storage tiering that he mentions. As Philippe outlines, there have been quite a few variations on the theme, as each successive generation of solutions attempts to displace older approaches. The driver has always been to reduce the cost (much of which stems from complexity) of keeping and maintaining growing data sets.
A tiering approach (or the avoidance of tiering, in the case of some newer fancy all-flash solutions) is designed in some way to serve hot data from the fastest tier while leveraging cheaper tiers of media for increasingly colder data. In the early days, data was migrated (i.e. archived) by separate software packages from faster arrays to slower storage (perhaps even tape) as it cooled off. The fast array could keep “meta stubs” for colder files, with some provision to recall them if and when accessed, so that all files appeared to still be online in the fast array. This is still a valid approach for some data scenarios, as evidenced by the success of AWS Glacier storage services.
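To make the stub-and-recall idea concrete, here is a minimal sketch of how such transparent archiving might work. It is illustrative only, not any particular vendor's implementation; the tier paths, the stub format, and the helper functions are all assumptions for the example.

```python
import json
import shutil
from pathlib import Path

STUB_SUFFIX = ".stub"          # hypothetical marker left behind for an archived file
FAST_TIER = Path("/mnt/fast")  # hypothetical fast-array mount point
SLOW_TIER = Path("/mnt/slow")  # hypothetical archive tier (could be tape-backed)

def archive(relative_name: str) -> None:
    """Move a cooled-off file to the slow tier, leaving a small metadata stub behind."""
    src = FAST_TIER / relative_name
    dst = SLOW_TIER / relative_name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    stub = {"archived_to": str(dst), "size": dst.stat().st_size}
    (FAST_TIER / (relative_name + STUB_SUFFIX)).write_text(json.dumps(stub))

def open_file(relative_name: str):
    """Open a file, transparently recalling it from the slow tier if only a stub remains."""
    path = FAST_TIER / relative_name
    stub_path = FAST_TIER / (relative_name + STUB_SUFFIX)
    if not path.exists() and stub_path.exists():
        stub = json.loads(stub_path.read_text())
        shutil.move(stub["archived_to"], str(path))   # recall on access
        stub_path.unlink()
    return open(path, "rb")
```

A real hierarchical storage manager would of course also handle partial recalls, permissions, and concurrent access, but the pattern is the same: the stub keeps the namespace intact while the bytes live somewhere cheaper.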
When arrays got smarter and could hold multiple types of media internally, the array itself could have volumes configured on different tiers. Most arrays then evolved intelligence to internally move data from faster to slower (and, importantly, cheaper) media. With the advent of flash, hybrid arrays with both flash and HDD tiers became super-proficient at optimizing the use of their flash tier for performance and their slower but cheaper HDD tiers for capacity, using internal data migration, caching and even smart data reduction.
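As a rough illustration of the kind of internal policy a hybrid array might run, here is a simple heat-based rebalancer. The extent abstraction, tier names, and heat metric are invented for the example; real arrays weight recency, sequentiality, and workload profiles far more carefully.

```python
import time
from dataclasses import dataclass

@dataclass
class Extent:
    """A chunk of data the array tracks and places on a tier."""
    id: int
    tier: str = "hdd"               # "flash" or "hdd"
    accesses: int = 0
    last_access: float = 0.0

    def touch(self) -> None:
        self.accesses += 1
        self.last_access = time.time()

def rebalance(extents: list[Extent], flash_slots: int) -> None:
    """Promote the hottest extents to flash and demote the rest to HDD.

    "Heat" here is simply recent access count; a smarter array would also
    consider recency, user/application profiles, and predicted demand.
    """
    ranked = sorted(extents, key=lambda e: e.accesses, reverse=True)
    for i, extent in enumerate(ranked):
        extent.tier = "flash" if i < flash_slots else "hdd"
        extent.accesses = 0          # decay the heat each rebalance cycle

# usage: touch some extents, then rebalance periodically
extents = [Extent(id=i) for i in range(10)]
for _ in range(5):
    extents[3].touch()
extents[7].touch()
rebalance(extents, flash_slots=2)
print([e.tier for e in extents])    # extents 3 and 7 land on flash
```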
Around this time it became popular to track and analyze not just data/file access patterns but also to profile users and applications, in order to make smarter, more predictive optimizations in storage. Smart data tiering today can take advantage of fast object storage, LTFS (Linear Tape File System) and cloud distribution to expand those optimizations in both space (globally) and time (from really cold to edge hot).
Conceptually, storage tiering is graduating into the more fully automated realm of “just keeping data where it is best used at least cost,” transparently to end users and applications. The long-standing need for expert storage admins to make deliberate capacity planning, performance planning and hosting decisions is fading into history as global file systems embed the intelligence to maximize data service levels across all the resources available to them.
And while some storage companies are still focused on competing at the media-vs.-media level (who has cheaper, faster tiering, or better all-flash “not tiering,” or whatnot), we believe that more and more storage buyers care less and less about what’s inside the box. What they care about primarily is safely storing and serving data to their users and applications, minimizing cost and maximizing performance (for the need at hand).
In other words, storage is the service, while storage tiering is an internal means to that end. It’s technically interesting, but increasingly it is just an irrelevant detail to the buyer. What we want to understand are the economic curves of cost/capacity vs. performance capabilities for data consumers (and caretakers). Internally, the storage should of course make the best use of its resources.
¹ Although reading the English version of what is likely exquisite original French often requires a bit of imagination to recover the literal gist and tone of inflection.