Learn storage techniques for managing unstructured data use

An IT industry analyst article published by SearchStorage.


article_Learn-storage-techniques-for-managing-unstructured-data-use
Rearchitect storage to maximize unstructured data use at global scale, with ever-larger data sets coming from big data analytics and other applications.

Mike Matchett
Small World Big Data

Back in the good old days, we mostly dealt with two storage tiers. We had online, high-performance primary storage directly used by applications and colder secondary storage used to tier less-valuable data out of primary storage. It wasn’t that most data lost value on a hard expiration date, but primary storage was pricey enough to constrain capacity, and we needed to make room for newer, more immediately valuable data.

We spent a lot of time trying to intelligently summarize and aggregate aging data to keep some kind of historical information trail online. Still, masses of detailed data were sent off to bed, out of sight and relatively offline. That’s all changing as managing unstructured data becomes a bigger concern. New services provide storage for big data analysis of detailed unstructured and machine data, while also supporting web-speed DevOps agility, delivering storage self-service and controlling IT costs. Fundamentally, these services help storage pros provide and maintain more valuable online access to ever-larger data sets.

Products for managing unstructured data may include copy data management (CDM), global file systems, hybrid cloud architectures, global data protection and big data analytics. These features help keep much, if not all, data available and productive.

Handling the data explosion

The underlying theme of many new storage offerings is to extend enterprise-quality IT management and governance across multiple tiers of global storage.

We’re seeing a lot of high-variety, high-volume and unstructured data. That’s pretty much everything other than highly structured database records. The new data explosion includes growing files and file systems, machine-generated data streams, web-scale application exhaust, endless file versioning, finer-grained backups and rollback snapshots to meet tighter data integrity and business continuity tolerances, and vast image and media repositories.

The public cloud is one way to deal with this data explosion, but it’s not always the best answer by itself. Elastic cloud storage services make it easy to deploy large amounts of storage capacity. However, unless you want to create a growing and increasingly expensive cloud data dump, you’ll need advanced storage management for that unstructured data as well. The underlying theme of many new storage offerings is to extend enterprise-quality IT management and governance across multiple tiers of global storage, including hybrid and public cloud configurations.

If you’re architecting a new approach to storage, especially unstructured data storage at a global enterprise scale, here are seven advanced storage capabilities to consider:

Automated storage tiering. Storage tiering isn’t a new concept, but today it works across disparate storage arrays and vendors, often virtualizing in-place storage first. Advanced storage tiering products subsume yesterday’s simpler cloud gateways. They learn workload-specific performance needs and implement key quality of service, security and business cost control policies.

Much of what used to make up individual products, such as storage virtualizers, global distributed file systems, bulk data replicators and migrators, and cloud gateways, is converging into single-console unifying storage services. Enmotus and Veritas offer these simple-to-use services. This type of storage tiering enables unified storage infrastructure and provides a core service for many different types of storage management products.
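
To make the idea concrete, here is a minimal sketch of a policy-driven placement decision; the tier names and age thresholds are hypothetical, and real tiering engines learn such policies per workload rather than hard-coding them:

```python
from datetime import datetime, timedelta

def place(last_access, now=None):
    """Pick a storage tier from simple access-recency rules.

    The tier names and thresholds here are hypothetical; commercial
    tiering engines learn workload-specific policies instead of
    hard-coding them.
    """
    now = now or datetime.utcnow()
    age = now - last_access
    if age < timedelta(days=7):
        return "flash"            # hot: keep on primary flash
    if age < timedelta(days=90):
        return "capacity_disk"    # warm: cheaper bulk tier
    return "cloud_archive"        # cold: tier out to elastic cloud capacity

print(place(datetime(2017, 1, 1)))  # a long-untouched file lands in the archive tier
```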

Metadata at scale. There’s a growing focus on collecting and using storage metadata — data about stored data — when managing unstructured data. By properly aggregating and exploiting metadata at scale, storage vendors can better virtualize storage, optimize services, enforce governance policies and augment end-user analytical efforts.

Metadata concepts are most familiar in an object or file storage context. However, advanced block and virtual machine-level storage services are increasingly using metadata detail to help with tiering for performance. We also see metadata in data protection features. Reduxio’s infinite snapshots and immediate recovery based on timestamping changed blocks take advantage of metadata, as do change data capture techniques and N-way replication. When looking at heavily metadata-driven storage, it’s important to examine metadata protection schemes and potential bottlenecks. Interestingly, metadata-heavy approaches can improve storage performance because they usually allow for high metadata performance and scalability out of band from data delivery.
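
As a thought experiment, here’s a toy sketch of the timestamped-changed-blocks idea; it illustrates the general technique, not any vendor’s actual design. Every write is indexed by time, so the metadata alone can answer “what did this block hold at time T?”:

```python
import bisect

class TimestampedVolume:
    """Toy model of timestamping changed blocks. Illustrative only."""

    def __init__(self):
        self.history = {}  # block_id -> ([timestamps], [block contents])

    def write(self, block_id, ts, data):
        stamps, blocks = self.history.setdefault(block_id, ([], []))
        i = bisect.bisect_right(stamps, ts)  # keep versions sorted by time
        stamps.insert(i, ts)
        blocks.insert(i, data)

    def read_at(self, block_id, ts):
        """Return block contents as of time ts (an 'infinite snapshot')."""
        stamps, blocks = self.history.get(block_id, ([], []))
        i = bisect.bisect_right(stamps, ts)
        return blocks[i - 1] if i else None

vol = TimestampedVolume()
vol.write(0, ts=100, data=b"v1")
vol.write(0, ts=200, data=b"v2")
print(vol.read_at(0, ts=150))  # b'v1': the block as it was at t=150
```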

Storage analytics. Metadata and other introspective analytics about storage use can be gathered across enterprise storage and applied both offline and, increasingly, in dynamic optimizations. Call-home management is one example of how these analytics are used to better manage storage…(read the complete as-published article there)

Is demand for data storage or supply driving increased storage?

An IT industry analyst article published by SearchStorage.


article_Is-demand-for-data-storage-or-supply-driving-increased-storage
Figuring out whether we’re storing more data than ever because we’re producing more data or because constantly evolving storage technology lets us store more of it isn’t easy.

Mike Matchett
Small World Big Data

Whether you’re growing on-premises storage or your cloud storage footprint this year, it’s likely you’re increasing total storage faster than ever. Where we used to see capacity upgrade requests for proposals (RFPs) in terms of tens of terabytes of growth, we now regularly see RFPs for half a petabyte or more. When it comes to storage size, huge is in.

Do we really need that much more data to stay competitive? Yes, probably. Can we afford extremely deep storage repositories? It seems that we can. However, these questions raise a more basic chicken-and-egg question: Are we storing more data because we’re making more data or because constantly evolving storage technology lets us?

Data storage economics
Looked at from a pricing perspective, the question becomes what’s driving price — more demand for data storage or more storage supply? I’ve heard economics professors say they can tell who really understands basic supply and demand price curves: it’s the students who, asked this kind of question, consider a supply-side answer first. People tend to focus on demand-side explanations as the most straightforward way of explaining why prices fluctuate. I guess it’s easier to assume supply is a remote constant while envisioning all the possible changes in demand for data storage.

As we learn to wring more value out of our data, we want to both make and store more data.

But if storage supply were constant, then given our massive data growth, storage would be really expensive. The massive squirreling away of data would instead be constrained by that high price, a sign of scarce supply. That’s how it was years ago. Remember when traditional IT application environments struggled to fit into limited storage infrastructure that was already stretched thin to meet ever-growing demand?

Today, data capacities are growing fast, and yet the price of storage (per unit of capacity) keeps dropping. There’s no doubt supply is rising faster than demand for data storage. Supply-side advances, such as the inherent efficiencies of shared cloud storage, Moore’s law and clustered open source file systems like the Hadoop Distributed File System, have made bulk capacity so affordable that the price of storage continues to fall despite massive growth in demand.
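
A toy supply-and-demand calculation, with invented numbers, makes the point: when both curves shift outward but supply shifts further, the market clears at a much higher quantity and a lower price.

```python
def equilibrium(a, b, c, d):
    """Clearing quantity and price for linear demand (P = a - b*Q)
    and supply (P = c + d*Q) curves."""
    q = (a - c) / (b + d)
    return q, a - b * q

# Year 0: baseline demand and supply (all numbers are invented).
print(equilibrium(a=100, b=2, c=10, d=1))   # (30.0, 40.0): Q = 30, P = 40
# Year 1: demand shifts out (a: 100 -> 130), but supply shifts out even
# further (c: 10 -> -20). We store far more, yet the price still falls.
print(equilibrium(a=130, b=2, c=-20, d=1))  # (50.0, 30.0): Q = 50, P = 30
```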

Endless data storage
When we think of hot new storage technologies, we tend to focus on primary storage advances such as flash and nonvolatile memory express. All so-called secondary storage comes, well, second. It’s true the relative value of a gigabyte of primary storage has greatly increased. Just compare the ROI of buying a whole bunch of dedicated, short-stroked HDDs as we did in the past to investing in a modicum of today’s fully deduped, automatically tiered and workload-shared flash.

It’s also worth thinking about flash storage in terms of impact on capacity, not just performance. If flash storage can serve a workload in one-tenth the time, it can also serve 10 similar workloads in the same time, providing an effective 10-times capacity boost.
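
As back-of-the-envelope arithmetic, assuming throughput-bound workloads that consolidate cleanly (a real simplification):

```python
def effective_capacity_tb(raw_tb, latency_speedup):
    """If each I/O completes in 1/speedup the time, roughly that many
    similar workloads fit in the same window, so the same terabytes
    effectively serve speedup times the work. Assumes throughput-bound
    workloads that consolidate cleanly, which is a simplification."""
    return raw_tb * latency_speedup

print(effective_capacity_tb(10, 10))  # 10 TB of flash serving like 100 TB of disk
```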

But don’t discount the major changes that have happened in secondary storage…(read the complete as-published article there)

Future of data storage technology: Transformational trends for 2018

An IT industry analyst article published by SearchStorage.


article_Future-of-data-storage-technology-Transformational-trends-for-2018
Risk-averse enterprises finally accepted the cloud in 2017, and we didn’t even notice. Expect the same for these data storage technology trends in the new year.

Mike Matchett
Small World Big Data

Sometimes big changes sneak up on you, especially when you’re talking about the future of data storage technology. For example, when exactly did full-on cloud adoption become fully accepted by all those risk-averse organizations, understaffed IT shops and disbelieving business executives? I’m not complaining, but the needle of cloud acceptance tilted over sometime in the recent past without much ado. It seems everyone has let go of their fear of cloud and hybrid operations as risky propositions. Instead, we’ve all come to accept the cloud as something that’s just done.

Sure, cloud was inevitable, but I’d still like to know why it finally happened now. Maybe it’s because IT consumers expect information technology will provide whatever they want on demand. Or maybe it’s because everything IT implements on premises now comes labeled as private cloud. Influential companies, such as IBM, Microsoft and Oracle, are happy to help ease folks formerly committed to private infrastructure toward hybrid architectures that happen to use their respective cloud services.

In any case, I’m disappointed I didn’t get my invitation to the “cloud finally happened” party. But having missed cloud’s big moment, I’m not going to let other obvious yet possibly transformative trends sneak past as they go mainstream with enterprises in 2018. So when it comes to the future of data storage technology, I’ll be watching the following:

Containers arose out of a long-standing desire to find a better way to package applications. This year, we should see enterprise-class container management reach maturity parity with virtual machine management without giving up any of the advantages containers hold over VMs. Expect modern software-defined resources, such as storage, to be delivered mostly in containerized form. Combined with dynamic operational APIs, these resources will deliver highly flexible, programmable infrastructures. Vendors should then be able to package applications and their required infrastructure as redeployable units — that is, blueprinted or specified in editable, versionable manifest files — allowing full-environment and even data center-level cloud provisioning. Being able to deploy a data center on demand could completely transform disaster recovery, to name just one use case.
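
Here’s a deliberately simplified sketch of that blueprint idea; the manifest schema and the provision() helper are hypothetical stand-ins for real orchestration and infrastructure-as-code tooling. The point is that the whole environment lives in one editable, versionable file:

```python
# Hypothetical "blueprint" manifest for an application and the
# infrastructure it needs; all names and fields are invented.
manifest = {
    "version": "2018.1",
    "app": {"image": "registry.example.com/orders:1.4.2", "replicas": 3},
    "storage": {
        "class": "containerized-sds",  # software-defined storage, itself shipped as a container
        "size_gb": 500,
        "snapshots": {"interval_min": 15, "retain": 96},
    },
    "recovery": {"site": "dr-east", "rpo_min": 15},
}

def provision(m):
    """Walk the manifest and issue (pretend) API calls for each resource."""
    print(f"deploying {m['app']['replicas']}x {m['app']['image']}")
    print(f"allocating {m['storage']['size_gb']} GB of {m['storage']['class']}")
    print(f"configuring DR to {m['recovery']['site']} (RPO {m['recovery']['rpo_min']} min)")

provision(manifest)  # rerunnable anywhere the manifest file is checked out
```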

Everyone is talking about AI, but it’s machine learning that’s slowly permeating just about every facet of IT management. Although there’s a lot of hype, it’s worth figuring out how and where carefully applied machine learning could add significant value. Most machine learning is conceptually made up of advanced forms of pattern recognition, so think about where automatically identifying complex patterns would reduce time and effort. Expect the increasing availability of machine learning algorithms to give rise to storage management processes that learn and adjust operations and settings to optimize workload services, quickly identify and fix the root causes of abnormalities, and broker storage infrastructure and manage large-scale data to minimize cost.
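
To see what the simplest such pattern recognition looks like, here’s a minimal sketch that flags latency outliers against a trailing baseline; production systems use far richer models, but the shape of the task is the same:

```python
from statistics import mean, stdev

def anomalies(latencies_ms, window=20, threshold=3.0):
    """Flag samples more than `threshold` standard deviations away
    from the trailing window's mean. A deliberately minimal sketch."""
    flagged = []
    for i in range(window, len(latencies_ms)):
        history = latencies_ms[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma and abs(latencies_ms[i] - mu) > threshold * sigma:
            flagged.append((i, latencies_ms[i]))
    return flagged

stream = [5.0, 5.2, 4.9, 5.1] * 10 + [48.0]  # steady baseline, one latency spike
print(anomalies(stream))                      # [(40, 48.0)]
```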

Management as a service (MaaS) is gaining traction when it comes to the future of data storage technology. First, every storage array seemingly comes with built-in call-home support replete with management analytics and performance optimization. I predict the reporting interval for most remote vendor management services will quickly drop from today’s daily batch uploads to five-minute streaming. I also expect cloud-hosted MaaS offerings to become the way most shops manage their increasingly hybrid architectures, with many starting to shift away from the burdens of on-premises management software…(read the complete as-published article there)

Secondary data storage: A massively scalable transformation

An IT industry analyst article published by SearchStorage.


article_Secondary-data-storage-A-massively-scalable-transformation
Capitalize on flash with interactive, online secondary data storage architectures that make a lot more data available for business while maximizing flash investment.

Mike Matchett
Small World Big Data

We all know flash storage is fast, increasingly affordable and quickly beating out traditional spinning disk for primary storage needs. It’s like all our key business applications have been magically upgraded to perform 10 times faster!

In the data center, modern primary storage arrays now come with massive flash caches or large flash tiers, or are all flash through and through. Old worries about flash wearing out have been largely forgotten. And there are some new takes on storage designs, such as Datrium’s, that make great use of less-expensive server-side flash. Clearly, spending money on some kind of flash, if not all flash, can be a great IT investment.

Yet, as everyone builds primary storage with flash, there is less differentiation among those flashy designs. At some point, “really fast” is fast enough for now, assuming you aren’t in financial trading.

Rather than argue whose flash is faster, more reliable, more scalable or even cheaper, the major enterprise IT storage concern is shifting toward getting the most out of whatever high-performance primary storage investment gets made. Chasing ever-greater performance can be competitively lucrative, but universally, we see business demand for larger operational data sets growing quickly. Flash or not, primary storage still presents an ever-present capacity-planning challenge.

A new ‘big data’ opportunity
The drive to optimize shiny new primary storage pushes IT folks to use it as much as possible with suitable supporting secondary data storage. As this is literally a new “big data” opportunity, there is a correspondingly big change happening in the secondary storage market. Old-school backup storage designed solely as an offline data protection target doesn’t provide the scale, speed and interactive storage services increasingly demanded by today’s self-service-oriented users.

We’re seeing a massive trend toward interactive, online, secondary storage architectures. Instead of dumping backups, snapshots and archives into slow, near-online or essentially offline deep storage tiers, organizations are finding it’s worthwhile to keep large volumes of second-tier data in active use. With this shift to online secondary data storage, end users can quickly find and recover their own data, much as they do with Apple’s Time Machine on their Macs. And organizations can profitably mine older, colder, larger data sets for valuable insights through big data analytics, machine learning and deep historical search.
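
Here’s a minimal sketch of the kind of self-service catalog such an online secondary store can expose; the schema and entries are invented, but the idea is that every protected file version is indexed for direct user search and restore:

```python
# Illustrative catalog: each protected file version is indexed by
# timestamp and the snapshot that holds it. All data is made up.
catalog = [
    {"path": "/home/kim/report.xlsx", "ts": "2017-11-02T09:14", "snapshot": "s1041"},
    {"path": "/home/kim/report.xlsx", "ts": "2017-12-01T17:40", "snapshot": "s1187"},
    {"path": "/home/lee/demo.mp4",    "ts": "2017-12-03T08:02", "snapshot": "s1191"},
]

def versions(path):
    """Time Machine-style view: every recoverable version of one file."""
    return sorted((e for e in catalog if e["path"] == path), key=lambda e: e["ts"])

for v in versions("/home/kim/report.xlsx"):
    print(v["ts"], "->", v["snapshot"])  # pick one and restore from that snapshot
```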

If that sounds like a handy convergence of backup and archive, you’re right. There’s increasingly less difference between data protection backup and recovery and retention archiving…(read the complete as-published article there)

The power and benefits of encouraging IT disruption

An IT industry analyst article published by SearchStorage.


article_The-power-and-benefits-of-encouraging-IT-disruption
IT can’t remain a reactive cost center and cheerful help desk, but must become a competitive, cutthroat service provider and powerful champion of emerging disruptive technology.

Mike Matchett
Small World Big Data

I have a theory that true IT disruption happens when something nonlinear occurs to change the traditional expectation or baseline for a key operating capability. In storage, this could be related to capacity, performance or value. We’ve seen great market disruption — not to mention data center evolution — with the rise of scale-out vs. scale-up storage architectures, flash vs. disk and big data analytics vs. data warehouse business intelligence, for example. These disruptions have all brought orders of magnitude improvement, enabling many new ways to distill more value out of data.

I’m not saying we leave disrupted technologies completely behind, but old top-tier technologies can quickly drop down our pyramids of perceived value. Some older techs do disappear — think floppy drives and CRT monitors. But usually, they get subsumed as a lower tier inside a larger, newer umbrella and relegated to narrower, less prestigious use cases.

Emerging disruptive storage technologies include nonvolatile memory express server-side flash and persistent memory; in-storage data processing, combining software-defined storage, containerization and in-stream processing; global file systems and databases with global consistency, security, protection and access features; pervasive machine learning; and truly distributed internet of things data processing.

If you know it’s coming, is it really disruptive?

You might think the monstrous volume of data, which is certainly growing nonlinearly, is what gave birth to these disruptive technologies. That’s the story told in all the vendor presentations I’ve seen in the last few years. I’ve even presented that line of thinking myself.

Successful disruption requires early adopters be willing to take some risks.

The problem with that interpretation is that it makes the storage industry look reactive instead of proactive. It also makes vendors look like heroes, or at least paragons of product management, figuring out exactly what people need and delivering it just in time to save IT from certain disaster.

If we’re honest, the truth is more likely that newer generations of storage technologies let us retain massively growing volumes of data, access that data faster and increase the ROI of keeping and analyzing ever-bigger data sets. In other words, new technology enables more data; more data doesn’t force new technology into existence. Creating disruptive technology may be as much a matter of luck as intentional design.

Disruptive technologies arise out of vendors serving our competitive natures…(read the complete as-published article there)