An IT industry analyst article published by SearchStorage.
Not too long ago, storage arrays were holed up deep in the data center and manageable without requiring much knowledge about the data actually stored therein. A storage admin might have known it was database data for a key application requiring high performance and solid backups, for example, but the database administrator took care of all the data-specific details. Today, this artificial wall separating information about data and the storage that it holds is changing, and rapidly.
Convergence isn’t only closing the gaps between silos of infrastructure, it is collapsing the distance between the job of persistence on the back end in storage and what stored data actually means and is used for on the front end. No longer desirable or even sufficient to store and protect bit patterns deep in the innards of the data center, you must now manage storage in ways that directly advance business operations.
In fact, it’s becoming a competitive necessity to leverage data at every level, or tier, of persistence throughout the data’s lifecycle. This is good for IT folks, as new data-aware storage is helping IT come to the forefront of key business processes.
Smart storage systems are powered by a glut of CPU/cores, cheaper flash and memory, agile software-defined storage functions and lessons learned from the big data analytics world. Internally, smarter storage systems can do a better job of optimizing capacity and performance through smart deduplication and compression schemes, application-aligned caching and tiering, and policy-definable quality of service (QoS) and data protection schemes. Externally, smart storage systems can create and serve new kinds of metadata about the data inside, providing for better management and governance, application QoS reporting and alignment, and can even help to create direct business value.
The roots of data awareness
Data-aware storage has its roots in old archival “content-addressable storage” architectures, which were early object-based archives that kept additional metadata (i.e., data about data) in order to exactly manage retention requirements (and possibly help with legal discovery actions). Systems often indexed and made this metadata accessible outside of the content itself and, eventually, even content was indexed and made searchable for e-discovery processing. However, as appropriate for archival cold storage, this data intelligence was created offline in post-processing and only applied to static archived data sets, and therefore rarely used.
Ten years ago, the emergence of big data approaches demonstrated that masses of live, unstructured and highly varied data could have tremendous primary business value. Today, the massive web-scale object stores popular for cloud-building and used to power production web and mobile applications often store all kinds of metadata. In fact, these stores support user-defined metadata that developers can arbitrarily extend for advanced application-specific tagging or data labeling. Some advanced file systems directly incorporate content indexing on data ingest to enable end-users to query primary storage for content containing specific words or phrases.
As an example of this evolution, consider the difference between two popular online file-sharing services, Dropbox and Evernote. Both can be used to store and sync various files across devices and share them between groups of users. Dropbox was the baseline standard defining online file sharing and collaboration, but Evernote goes much farther — although for a narrower set of use cases — by becoming innately content-aware with full content search, inline viewers and editors for common file types, extra metadata (e.g., URL source or reference if available, user tagging) and “similar content” recommendations. Although I use both daily, I view Dropbox as just another file-sharing alternative, while Evernote is critical to my workflow.
IT data awareness
Company lawyers (for e-discovery) and detectives (in security) require online systems that proactively identify abnormal behavior to produce early warnings on possible breaches. Smart data-aware storage systems can fold in auditing-type information and help correlate files, data and metadata with patterns of “events” — such as application crashes, file systems filling up, new users granted root access and shared or hidden key directories.
I remember one particularly blatant storage misusage (on a DEC VAX!) when we caught someone hoarding huge amounts of NSFW material on a little-accessed file system. Today’s more content-aware smart storage systems could alert security about such transgressions and warn (or even prevent) creative boundary-pushing users from crossing into job-termination territory to begin with.
Benefits of data-aware storage
- Fine-grained data protection: Storage that knows, for example, what VM files or volumes belong to or — even better — a specific policy to enforce that VM’s data can directly ensure appropriate data protection (e.g., the right level of RAID or replication).
- Fine-grained QoS: Similarly, storage that knows what database files require which kinds of performance acceleration can directly prioritize I/O and cache resources for optimal application performance.
- Content indexing and search: Large stores used for text-based data can deliver extra value by indexing content upon ingestion and enabling built-in admin and (even) end-user search.
- Social storage analysis: Storage can track usage and access by users and groups as metadata. Then other users can easily find out who in an organization had recent interest in certain content, identify group collaboration patterns and receive recommendations of new things to research based on collaborative filtering (e.g., “people who like the things I like also liked X”).
- Active capacity and utilization management: Storage can also track metadata about “per-data” system resource performance, capacity and utilization metrics. This enables storage admins to directly see what is going on in IT infrastructure for any piece or group of data tracked directly back to end users, departments and applications. Smart storage can also help optimize its own configuration and behavioral alignment to workloads.
- Analytics and machine learning: As storage grows smarter, expect to see increasing amounts of both low-level compute processing and automated machine learning incorporated directly into the storage layer. Storage-side functions could then be used to automatically categorize, score, translate, transform, visualize and report on data even as it’s being created and stored.
…(read the complete as-published article there)