Learn storage techniques for managing unstructured data use

An IT industry analyst article published by SearchStorage.


article_Learn-storage-techniques-for-managing-unstructured-data-use
Rearchitect storage to maximize unstructured data use at the global scale for larger data sets coming from big data analytics and other applications.

Mike Matchett
Small World Big Data

Back in the good old days, we mostly dealt with two storage tiers. We had online, high-performance primary storage directly used by applications and colder secondary storage used to tier less-valuable data out of primary storage. It wasn’t that most data lost value on a hard expiration date, but primary storage was pricey enough to constrain capacity, and we needed to make room for newer, more immediately valuable data.

We spent a lot of time trying to intelligently summarize and aggregate aging data to keep some kind of historical information trail online. Still, masses of detailed data were sent off to bed, out of sight and relatively offline. That’s all changing as managing unstructured data becomes a bigger concern. New services provide storage for big data analysis of detailed unstructured and machine data, as well as to support web-speed DevOps agility, deliver storage self-service and control IT costs. Fundamentally, these services help storage pros provide and maintain more valuable online access to ever-larger data sets.

Products for managing unstructured data may include copy data management (CDM), global file systems, hybrid cloud architectures, global data protection and big data analytics. These features help keep much, if not all, data available and productive.

Handling the data explosion

The underlying theme of many new storage offerings is to extend enterprise-quality IT management and governance across multiple tiers of global storage.

We’re seeing a lot of high-variety, high-volume and unstructured data. That’s pretty much everything other than highly structured database records. The new data explosion includes growing files and file systems, machine-generated data streams, web-scale application exhaust, endless file versioning, finer-grained backups and rollback snapshots to meet lower tolerances for data integrity and business continuity, and vast image and media repositories.

The public cloud is one way to deal with this data explosion, but it’s not always the best answer by itself. Elastic cloud storage services are easy to use to deploy large amounts of storage capacity. However, unless you want to create a growing and increasingly expensive cloud data dump, advanced storage management is required for managing unstructured data as well. The underlying theme of many new storage offerings is to extend enterprise-quality IT management and governance across multiple tiers of global storage, including hybrid and public cloud configurations.

If you’re architecting a new approach to storage, especially unstructured data storage at a global enterprise scale, here are seven advanced storage capabilities to consider:

Automated storage tiering. Storage tiering isn’t a new concept, but today it works across disparate storage arrays and vendors, often virtualizing in-place storage first. Advanced storage tiering products subsume yesterday’s simpler cloud gateways. They learn workload-specific performance needs and implement key quality of service, security and business cost control policies.

Much of what used to make up individual products, such as storage virtualizers, global distributed file systems, bulk data replicators, and migrators and cloud gateways, are converging into single-console unifying storage services. Enmotus and Veritas offer these simple-to-use services. This type of storage tiering enables unified storage infrastructure and provides a core service for many different types of storage management products.

Metadata at scale. There’s a growing focus on collecting and using storage metadata — data about stored data — when managing unstructured data. By properly aggregating and exploiting metadata at scale, storage vendors can better virtualize storage, optimize services, enforce governance policies and augment end-user analytical efforts.

Metadata concepts are most familiar in an object or file storage context. However, advanced block and virtual machine-level storage services are increasingly using metadata detail to help with tiering for performance. We also see metadata in data protection features. Reduxio’s infinite snapshots and immediate recovery based on timestamping changed blocks take advantage of metadata, as do change data capture techniques and N-way replication. When looking at heavily metadata-driven storage, it’s important to examine metadata protection schemes and potential bottlenecks. Interestingly, metadata-heavy approaches can improve storage performance because they usually allow for high metadata performance and scalability out of band from data delivery.

Storage analytics. You can use metadata and other introspective analytics about storage use gathered across enterprise storage, both offline and increasingly in dynamic optimizations. Call-home management is one example of how these analytics are used to better manage storage…(read the complete as-published article there)

Data center storage architecture gets smarter with AI

An IT industry analyst article published by SearchStorage.


article_Data-center-storage-architecture-gets-smarter-with-AI
Trends, such as event-triggered computing, as exemplified by Lambda Architectures, converge on data center storage to hasten data center intelligence evolution.

Mike Matchett

Infrastructure is getting smarter by the day. It’s reached the point where I’m afraid artificially intelligent IT will soon turn the tables and start telling me how to manage my own personal “lifecycle.” Well, I would be afraid if I believed all those AI vendors suddenly claiming they offer AI-powered infrastructure.

Now, we all want smarter, more automated, self-optimizing infrastructure — especially with storage — but I don’t see storage infrastructure components engaging in a human conversation with people about anything anytime soon. Storage is definitely getting smarter in more practical ways, however, and these changes are being seen in places such as data center storage architecture.

I’m excited by the hot storage trend toward embedding machine learning algorithms aimed at key optimization, categorization, search and pattern detection tasks. Corporate data assets are growing, and so is the potential value that comes from gathering and analyzing big data. It’s difficult to manually find those nuggets of data gold, though. And with the coming onslaught of the internet of things (IoT), data prospecting challenges will add mining huge amounts of fast streaming, real-time machine-generated and operational transactional data to the mix.

To help us take advantage of these potential information riches, storage vendors have started inserting intelligent algorithms into the storage layer directly…(read the complete as-published article there)

HP Vertica Goes Hadoop on MapR’s Read Write Infrastructure

(Excerpt from original post on the Taneja Group News Blog)

Today MapR and HP Vertica are rolling out an exciting joint integration, nicely addressing full SQL-on-Hadoop use cases. Vertica is now runnable, actually “pluggable”, on and into MapR’s enterprise quality Hadoop distribution. This is an interesting feat that depends highly on MapR’s unique implementation of enterprise grade storage in place of the open source HDFS guts.

…(read the full post)