Persistent data storage in containerized environments

An IT industry analyst article published by SearchStorage.


The most significant challenge to the rise of containerized applications is quickly and easily providing enterprise-class persistent storage for containers.

Mike Matchett

The pace of change in IT is staggering. Fast growing data, cloud-scale processing and millions of new internet of things devices are driving us to find more efficient, reliable and scalable ways to keep up. Traditional application architectures are reaching their limits, and we’re scrambling to evaluate the best new approaches for development and deployment. Fortunately, the hottest prospect — containerization — promises to address many, if not all, of these otherwise overwhelming challenges.

In containerized application design, each individual container hosts an isolatable, and separately scalable, processing component of a larger application web of containers. Unlike monolithic application processes of the past, large, containerized applications can consist of hundreds, if not thousands, of related containers. The apps support Agile design, development and deployment methodologies. They can scale readily in production and are ideally suited for hosting in distributed, and even hybrid, cloud infrastructure.

Unfortunately, containers weren’t originally designed to implement full-stack applications or really any application that requires persistent data storage. The original idea for containers was to make it easy to create and deploy stateless microservice application layers on a large scale. Think of microservices as a form of highly agile middleware with conceptually no persistent data storage requirements to worry about.

Persistence in persisting

Because the container approach has delivered great agility, scalability, efficiency and cloud-readiness, and is lower-cost in many cases, people now want to use it for far more than microservices. Container architectures offer so much better a way to build modern applications that we see many commercial software and systems vendors transitioning internal development to container form and even deploying containers widely, often without explicit end-user or IT awareness. It’s a good bet that most Fortune 1000 companies already host third-party production IT applications in containers, especially inside appliances, converged systems and purpose-built infrastructure.

You might find large, containerized databases and even storage systems. Still, designing enterprise persistent storage for these applications is a challenge, as containers can come and go and migrate across distributed and hybrid infrastructure. Because data needs to be mastered, protected, regulated and governed, persistent data storage acts in many ways like an anchor, holding containers down and threatening to reduce many of their benefits.
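
To make the persistence challenge concrete, here is a minimal sketch of how an orchestrator such as Kubernetes decouples a container's data from the container itself through a persistent volume claim. It assumes the official kubernetes Python client, a reachable cluster with a default storage class and a hypothetical claim name; it illustrates the pattern rather than any particular vendor's product.

```python
from kubernetes import client, config  # assumes the official kubernetes Python client

# A minimal persistent volume claim, expressed as a plain manifest dict.
# The claim name and size here are hypothetical.
pvc_manifest = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "db-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

config.load_kube_config()   # use local kubeconfig credentials
core_v1 = client.CoreV1Api()
core_v1.create_namespaced_persistent_volume_claim(namespace="default", body=pvc_manifest)

# A pod that mounts this claim keeps its data even if the container is
# rescheduled onto another node; the storage backing the claim persists.
```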

Container architectures need three types of storage…(read the complete as-published article there)

Data center storage architecture gets smarter with AI

An IT industry analyst article published by SearchStorage.


Trends such as event-triggered computing, exemplified by the Lambda Architecture, are converging on data center storage and hastening the evolution of data center intelligence.

Mike Matchett

Infrastructure is getting smarter by the day. It’s reached the point where I’m afraid artificially intelligent IT will soon turn the tables and start telling me how to manage my own personal “lifecycle.” Well, I would be afraid if I believed all those AI vendors suddenly claiming they offer AI-powered infrastructure.

Now, we all want smarter, more automated, self-optimizing infrastructure — especially with storage — but I don’t see storage infrastructure components engaging in a human conversation with people about anything anytime soon. Storage is definitely getting smarter in more practical ways, however, and these changes are being seen in places such as data center storage architecture.

I’m excited by the hot storage trend toward embedding machine learning algorithms aimed at key optimization, categorization, search and pattern detection tasks. Corporate data assets are growing, and so is the potential value that comes from gathering and analyzing big data. It’s difficult to manually find those nuggets of data gold, though. And the coming onslaught of the internet of things (IoT) will add huge amounts of fast-streaming, real-time machine-generated and operational transactional data to the data prospecting mix.
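
As a toy illustration of the kind of pattern detection task that might be embedded at the storage layer (not any vendor's actual implementation), consider flagging I/O latency samples that deviate sharply from recent history:

```python
import statistics
from collections import deque

class LatencyAnomalyDetector:
    """Flag I/O latency samples that deviate sharply from recent history."""

    def __init__(self, window=256, threshold=3.0):
        self.history = deque(maxlen=window)  # rolling window of recent samples
        self.threshold = threshold           # z-score cutoff

    def observe(self, latency_ms):
        anomalous = False
        if len(self.history) >= 30:          # wait for a reasonable baseline
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9
            anomalous = abs(latency_ms - mean) / stdev > self.threshold
        self.history.append(latency_ms)
        return anomalous

# Toy stream: steady ~1 ms latencies, then a 45 ms outlier.
detector = LatencyAnomalyDetector()
for sample in [1.0, 1.2, 1.1] * 20 + [45.0]:
    if detector.observe(sample):
        print(f"possible hot spot or failing device: {sample} ms")
```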

To help us take advantage of these potential information riches, storage vendors have started inserting intelligent algorithms into the storage layer directly…(read the complete as-published article there)

Smarter storage starts with analytics

An IT industry analyst article published by SearchStorage.


Storage smartens up to keep pace with data-intensive business applications embedding operational analytics capabilities.

Mike Matchett

The amount of data available to today’s enterprise is staggering. Yet the race to collect and mine even more data to gain competitive insight, deeply optimize business processes and better inform strategic decision-making is accelerating. Fueled by these new data-intensive capabilities, traditional enterprise business applications primarily focused on operational transactions are now quickly converging with advanced big data analytics to help organizations grow increasingly (albeit artificially) intelligent.

To help IT keep pace with data-intensive business applications that are now embedding operational analytics, data center infrastructure is also evolving rapidly. In-memory computing, massive server-side flash, software-defined resources and scale-out platforms are a few of the recent growth areas reshaping today’s data centers. In particular, we are seeing storage infrastructure, long considered the slow-changing anchor of the data center, transforming faster than ever. You might say that we’re seeing smarter storage.

Modern storage products take full advantage of newer silicon technologies, growing smarter with new inherent analytics, embedding hybrid cloud tiering and (often) converging with or hosting core data processing directly. Perhaps the biggest recent change in storage isn’t with hardware or algorithms at all, but with how storage can now best be managed.

For a long time, IT shops had no option but to manage storage by deploying and learning a unique storage management tool for each type of vendor product in use. That approach wastes significant time implementing, integrating and supporting one-off instances of complex vendor-specific management tools. And as data about business data (usage, performance, security and so on; see “Benefits of analytical supercharging”) keeps growing, simply managing the resulting metrics database becomes a huge challenge as well. With trends like the internet of things baking streaming sensors into everything, key systems metadata is itself becoming far more voluminous and real-time.

Harvesting the desired value from all that management data can take a significant data science investment.

Storage analytics ‘call home’

So while I’m all for DIY when it comes to unique integration of analytics with business processes and leveraging APIs to create custom widgets or reports, I’ve seen too many enterprises develop their own custom in-house storage management tools, only to watch them eventually become as expensive and onerous to support and keep current as one of those old-school “Big 4” enterprise management platforms (i.e., BMC, CA, Hewlett Packard Enterprise [HPE] and IBM) they could have simply licensed. In these days of cloud-hosted software as a service (SaaS) business applications, it makes sense to subscribe such onerous IT management tasks out to a remote expert service provider.

Remote storage management on a big scale really started with the augmented vendor support “call home” capability pioneered by NetApp years ago. Log and event files from on-premises arrays are bundled up and sent daily back to the vendor’s big data database “in the cloud.” Experts then analyze incoming data from all participating customers with big data analysis tools (e.g., Cassandra, HBase and Spark) to learn from their whole pool of end-user deployments.
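
A rough sketch of what such a daily call-home job might look like from the array side; the log paths and upload endpoint are hypothetical, and real vendor implementations add authentication, scrubbing of sensitive data and retry logic:

```python
import datetime
import tarfile
import requests  # third-party HTTP client, assumed installed

LOG_FILES = ["/var/log/array/events.log", "/var/log/array/perf_stats.log"]
UPLOAD_URL = "https://callhome.example-vendor.com/upload"  # hypothetical endpoint

def bundle_and_send():
    # Bundle the day's event and performance logs into one compressed archive.
    stamp = datetime.datetime.utcnow().strftime("%Y%m%d")
    bundle_path = f"/tmp/callhome-{stamp}.tar.gz"
    with tarfile.open(bundle_path, "w:gz") as archive:
        for path in LOG_FILES:
            archive.add(path, arcname=path.lstrip("/"))

    # Ship the bundle to the vendor's analytics service over HTTPS.
    with open(bundle_path, "rb") as payload:
        response = requests.post(UPLOAD_URL, files={"bundle": payload}, timeout=60)
    response.raise_for_status()

if __name__ == "__main__":
    bundle_and_send()
```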
Benefits of analytical supercharging

Smarter infrastructure with embedded analytical intelligence can help IT do many things better, and in some cases even continue to improve with automated machine learning. Some IT processes already benefitting from analytical supercharging include the following:

  • Troubleshooting. Advanced analytics can provide predictive alerting to warn of potential danger in time to avoid it, conduct root cause analyses when something does go wrong to identify the real problem that needs to be addressed and make intelligent recommendations for remediation (see the sketch after this list).
  • Resource optimization. By learning what workloads require for good service and how resources are used over time, analytics can help tune and manage resource allocations to both ensure application performance and optimize infrastructure utilization.
  • Operations automation. Smarter storage systems can learn (in a number of ways) how to best automate key processes and workflows, and then optimally manage operational tasks at large scale — effectively taking over many of today’s manual DevOps functions.
  • Brokerage. Cost control and optimization will become increasingly important and complex as truly agile hybrid computing goes mainstream. Intelligent algorithms will be able to make the best cross-cloud brokering and dynamic deployment decisions.
  • Security. Analytical approaches to securing enterprise networks and data are key to processing the massive scale and nonstop stream of global event and log data required today to find and stop malicious intrusion, denial of service and theft of corporate assets.
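
As one small example of the predictive alerting mentioned above, here is a sketch of projecting when a volume will fill from a simple linear trend; the usage numbers are hypothetical.

```python
import numpy as np

def days_until_full(used_gib_history, capacity_gib):
    """Fit a linear trend to daily usage samples and project when the volume fills."""
    days = np.arange(len(used_gib_history))
    slope, _intercept = np.polyfit(days, used_gib_history, 1)
    if slope <= 0:
        return None  # flat or shrinking usage: no fill date to predict
    return (capacity_gib - used_gib_history[-1]) / slope

# Toy example: ~20 GiB/day growth on a 2 TiB volume already about 1.5 TiB used.
usage = [1480 + 20 * d for d in range(14)]
remaining = days_until_full(usage, capacity_gib=2048)
if remaining is not None and remaining < 30:
    print(f"warning: volume projected to fill in about {remaining:.0f} days")
```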

That way, the array vendor can deliver valuable proactive advice and recommendations based on data any one organization simply couldn’t generate on its own. With this SaaS model, IT doesn’t have to manage its own historical database, operate a big data analysis platform or find the data science resources to analyze it. And the provider can gain insight into general end-user behavior, study actual feature usage and identify sales and marketing opportunities.

Although it seems every storage vendor today offers call home support, you can differentiate between them. Some look at customer usage data at finer-grained intervals, even approaching real-time stream-based monitoring. Some work hard on improving visualization and reporting. And others intelligently mine collected data to train machine learning models and feed smarter operational advice back to users…(read the complete as-published article there)

Smart storage systems smart for business

An IT industry analyst article published by SearchStorage.


Mike Matchett explains how data-aware storage combined with application awareness is leading to a new wave of intelligent data storage.

Mike Matchett

Not too long ago, storage arrays were holed up deep in the data center and manageable without requiring much knowledge about the data actually stored therein. A storage admin might have known it was database data for a key application requiring high performance and solid backups, for example, but the database administrator took care of all the data-specific details. Today, this artificial wall separating information about data from the storage that holds it is coming down, and rapidly.

Convergence isn’t only closing the gaps between silos of infrastructure; it is collapsing the distance between the job of persistence on the back end in storage and what stored data actually means and is used for on the front end. It’s no longer desirable, or even sufficient, to simply store and protect bit patterns deep in the innards of the data center; you must now manage storage in ways that directly advance business operations.

In fact, it’s becoming a competitive necessity to leverage data at every level, or tier, of persistence throughout the data’s lifecycle. This is good for IT folks, as new data-aware storage is helping IT come to the forefront of key business processes.

Smart storage systems are powered by a glut of CPU/cores, cheaper flash and memory, agile software-defined storage functions and lessons learned from the big data analytics world. Internally, smarter storage systems can do a better job of optimizing capacity and performance through smart deduplication and compression schemes, application-aligned caching and tiering, and policy-definable quality of service (QoS) and data protection schemes. Externally, smart storage systems can create and serve new kinds of metadata about the data inside, providing for better management and governance, application QoS reporting and alignment, and can even help to create direct business value.

The roots of data awareness

Data-aware storage has its roots in old archival “content-addressable storage” architectures, which were early object-based archives that kept additional metadata (i.e., data about data) in order to exactly manage retention requirements (and possibly help with legal discovery actions). Systems often indexed and made this metadata accessible outside of the content itself and, eventually, even content was indexed and made searchable for e-discovery processing. However, as appropriate for archival cold storage, this data intelligence was created offline in post-processing and only applied to static archived data sets, and therefore rarely used.

Ten years ago, the emergence of big data approaches demonstrated that masses of live, unstructured and highly varied data could have tremendous primary business value. Today, the massive web-scale object stores popular for cloud-building and used to power production web and mobile applications often store all kinds of metadata. In fact, these stores support user-defined metadata that developers can arbitrarily extend for advanced application-specific tagging or data labeling. Some advanced file systems directly incorporate content indexing on data ingest to enable end-users to query primary storage for content containing specific words or phrases.
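
For instance, Amazon S3 (as one widely used object store) lets a developer attach arbitrary user-defined metadata at write time. Here is a brief sketch using the boto3 client; the bucket, key and tag names are hypothetical.

```python
import boto3  # AWS SDK for Python; assumes credentials are configured locally

s3 = boto3.client("s3")

# Attach application-specific, user-defined metadata when the object is stored.
with open("scan-20170301.dcm", "rb") as body:
    s3.put_object(
        Bucket="clinical-trial-raw",          # hypothetical bucket
        Key="site-042/scan-20170301.dcm",     # hypothetical key
        Body=body,
        Metadata={
            "project": "trial-xyz",
            "retention-class": "7-years",
            "contains-phi": "true",
        },
    )

# Governance or search tooling can later read the tags back without
# downloading the object body itself.
head = s3.head_object(Bucket="clinical-trial-raw", Key="site-042/scan-20170301.dcm")
print(head["Metadata"])
```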

As an example of this evolution, consider the difference between two popular online file-sharing services, Dropbox and Evernote. Both can be used to store and sync various files across devices and share them between groups of users. Dropbox was the baseline standard defining online file sharing and collaboration, but Evernote goes much farther — although for a narrower set of use cases — by becoming innately content-aware with full content search, inline viewers and editors for common file types, extra metadata (e.g., URL source or reference if available, user tagging) and “similar content” recommendations. Although I use both daily, I view Dropbox as just another file-sharing alternative, while Evernote is critical to my workflow.
IT data awareness

Company lawyers (for e-discovery) and security investigators require online systems that proactively identify abnormal behavior and produce early warnings of possible breaches. Smart data-aware storage systems can fold in auditing-type information and help correlate files, data and metadata with patterns of “events” — such as application crashes, file systems filling up, new users granted root access and shared or hidden key directories.

I remember one particularly blatant storage misusage (on a DEC VAX!) when we caught someone hoarding huge amounts of NSFW material on a little-accessed file system. Today’s more content-aware smart storage systems could alert security about such transgressions and warn (or even prevent) creative boundary-pushing users from crossing into job-termination territory to begin with.

Benefits of data-aware storage

  • Fine-grained data protection: Storage that knows, for example, which VM its files or volumes belong to — or, even better, which specific policy to enforce for that VM’s data — can directly ensure appropriate data protection (e.g., the right level of RAID or replication).
  • Fine-grained QoS: Similarly, storage that knows what database files require which kinds of performance acceleration can directly prioritize I/O and cache resources for optimal application performance.
  • Content indexing and search: Large stores used for text-based data can deliver extra value by indexing content upon ingestion and enabling built-in admin and (even) end-user search (see the sketch following this list).
  • Social storage analysis: Storage can track usage and access by users and groups as metadata. Then other users can easily find out who in an organization had recent interest in certain content, identify group collaboration patterns and receive recommendations of new things to research based on collaborative filtering (e.g., “people who like the things I like also liked X”).
  • Active capacity and utilization management: Storage can also track metadata about “per-data” system resource performance, capacity and utilization metrics. This enables storage admins to directly see what is going on in IT infrastructure for any piece or group of data tracked directly back to end users, departments and applications. Smart storage can also help optimize its own configuration and behavioral alignment to workloads.
  • Analytics and machine learning: As storage grows smarter, expect to see increasing amounts of both low-level compute processing and automated machine learning incorporated directly into the storage layer. Storage-side functions could then be used to automatically categorize, score, translate, transform, visualize and report on data even as it’s being created and stored.
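
To show what content indexing on ingest means in miniature, here is a toy inverted index built as objects are written; real data-aware systems do this at scale, with richer tokenization and persistent index storage.

```python
import re
from collections import defaultdict

class IngestIndexer:
    """Toy inverted index built as text objects are written to storage."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of object names

    def ingest(self, object_name, content):
        # Tokenize the content and record which object each term appears in.
        for term in set(re.findall(r"[a-z0-9]+", content.lower())):
            self.index[term].add(object_name)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

indexer = IngestIndexer()
indexer.ingest("contracts/acme-msa.txt", "Master services agreement with ACME Corp")
indexer.ingest("notes/q3-review.txt", "Q3 storage capacity review notes")
print(indexer.search("acme"))  # -> ['contracts/acme-msa.txt']
```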

…(read the complete as-published article there)

When data storage infrastructure really has a brain

An IT industry analyst article published by SearchStorage.


Big data analysis and the internet of things are helping produce more intelligent storage infrastructure.

Mike Matchett

Cheaper and denser CPUs are driving smarter built-in intelligence into each layer of the data storage infrastructure stack.

Take storage, for example. Excess compute power can be harnessed to deploy agile software-defined storage (e.g., Hewlett Packard Enterprise StoreVirtual), transition to hyper-converged architectures (e.g., HyperGrid, Nutanix, Pivot3, SimpliVity), or optimize I/O by smartly redistributing storage functionality between application servers and disk hosts (e.g., Datrium).

There is a downside to all this built-in intelligence, however. It can diminish the visibility we might otherwise have into how our data storage infrastructure responds to change of any kind, whether due to intentional patching and upgrades, expanding usage and users, or complex bugs and component failures. Or, to put it another way, native, dynamic optimization enabled by powerful and inexpensive processors is making it increasingly difficult for us humans to figure out what’s going on with our infrastructures.

So while it’s really great when we don’t need to know any details, and can simply rely on low-level components to always do the right thing, until there is an absolutely autonomous data center — and, no, today’s public cloud computing doesn’t do away with the need for internal experts — IT may find baked-in intelligence a double-edged sword. Furthermore, while smarter data storage infrastructure helps us with provisioning, optimization, growth plans and troubleshooting, it can blind or fool us and actively work against our best efforts to bend infrastructure to our “will.”

Still, in spite of all these potential negatives, given the choice, I’d rather live in a smarter and more autonomous IT world than not (even if there is some risk of runaway AI). I’ll explain.

It’s all about the data

Remember when analysis used to be an offline process? Capture some data in a file; open Excel, SAS or other desktop tool; and weeks later receive a recommendation. Today, that kind of analysis latency is entirely too long and naïve.

Given the speed and agility of our applications and users nowadays, not to mention bigger data streams and minute-by-minute elastic cloud brokering, we need insight and answers faster than ever. This kind of intelligence starts with plentiful, reliable data, which today’s infrastructures are producing more and more of every day (in fact, we’ll soon be drowning in new data thanks to the internet of things [IoT]), and a way to process and manage all that information.

Storage arrays, for example, have long produced insightful data, but historically required vendor-specific, complex and expensive storage resource management applications to make good use of it. Fortunately, today, there are a series of developments helping us become smarter about IT systems management and better (and faster) users of data generated by our infrastructures: …(read the complete as-published article there)

Get the most from cloud-based storage services

An IT industry analyst article published by SearchStorage.


We have been hearing about the inevitable transition to the cloud for IT infrastructure since before the turn of the century. But, year after year, storage shops quickly become focused on only that year’s prioritized initiatives, which tend to be mostly about keeping the lights on and costs low. A true vision-led shift to cloud-based storage services requires explicit executive sponsorship from the business side of an organization. But unless you cynically count the creeping use of shadow IT as an actual strategic directive to do better as an internal service provider, what gets asked of you is likely, and unfortunately, limited to low-risk tactical deployments and incremental upgrades.

Not exactly the stuff of business transformations.

Cloud adoption at a level for maximum business impact requires big executive commitment. That amount of commitment is, quite frankly, not easy to generate.

…(read the complete as-published article there)

What’s the future of data storage in 2016?

An IT industry analyst article published by SearchStorage.


It’s hard to make stunning predictions on the future of data storage that are certain to come true, but it’s that time of year and I’m going to step out on that limb again. I’ll review my predictions from last year as I go — after all, how much can you trust me if I’m not on target year after year? (Yikes!)

Last year, I said the total data storage market would stay flat despite big growth in unstructured data. I’d have to say that seems to be true, if not actually dropping. Despite lots of new entrants in the market, the average vendor margin in storage is narrowing with software-defined variants showing up everywhere, open-source alternatives nibbling at the edges, commodity-based appliances becoming the rule, and ever-cheaper “usable” flash products improving performance and density at the same time.

…(read the complete as-published article there)

Big data analytics applications impact storage systems

An IT industry analyst article published by SearchStorage.


Whether driven by direct competition or internal business pressure, CIOs, CDOs and even CEOs today are looking to squeeze more value, more insight and more intelligence out of their data. They no longer can afford to archive, ignore or throw away data if it can be turned into a valuable asset. At face value, it might seem like a no-brainer — “we just need to analyze all that data to mine its value.” But, as you know, keeping any data, much less big data, has a definite cost. Processing larger amounts of data at scale is challenging, and hosting all that data on primary storage hasn’t always been feasible.

Historically, unless data had some corporate value — possibly as a history trail for compliance, a source of strategic insight or intelligence that can optimize operational processes — it was tough to justify keeping it. Today, thanks in large part to big data analytics applications, that thinking is changing. All that bulky, low-level big data may have little immediate value today, but it could hold great future potential, so you want to keep it — once it’s gone, you lose any downstream opportunity.

To extract value from all that data, however, IT must not only store increasingly large volumes of data, but also architect systems that can process and analyze it in multiple ways.

…(read the complete as-published article there)

Navigate data lakes to manage big data

An IT industry analyst article published by SearchStorage.


Big data sure is exciting to business folks, with all sorts of killer applications just waiting to be discovered. And you no doubt have a growing pile of data bursting the seams of your current storage infrastructure, with lots of requests to mine even more voluminous data streams. Haven’t you been collecting microsecond end-user behavior across all your customers and prospects, not to mention collating the petabytes of data exhaust from instrumenting your systems to the nth degree? Imagine the insight management would have if they could look at all that data at once. Forget about data governance, data management, data protection and all those other IT worries — you just need to land all that data in a relatively scale-cheap Hadoop cluster!

Seriously, though, big data lakes can meet growing data challenges and provide valuable new services to your business. By collecting a wide variety of data sets relevant to the business in one place and enabling many kinds of analytics based on big data approaches that scale easily, you can create many new data mining opportunities. The total potential value of a data lake grows with the amount of useful data it holds available for analysis. And one of the key tenets of big data and the big data lake concept is that you don’t have to create a master schema ahead of time, so non-linear growth is possible.
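
A quick sketch of what “no master schema ahead of time” looks like in practice, using PySpark to infer structure at read time; the lake path and field names here are hypothetical.

```python
from pyspark.sql import SparkSession  # assumes a local or cluster Spark installation

spark = SparkSession.builder.appName("data-lake-explore").getOrCreate()

# Raw clickstream JSON was landed in the lake as-is; the schema is inferred
# when the data is read, not defined up front.
clicks = spark.read.json("hdfs:///lake/raw/clickstream/2016/*/*.json.gz")
clicks.printSchema()

# Ad hoc analysis directly against the raw files.
clicks.groupBy("page").count().orderBy("count", ascending=False).show(10)
```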

The enterprise data lake, or data hub, concept was first proposed by big data vendors like Cloudera and Hortonworks, ostensibly using vanilla scale-out HDFS-based commodity storage. But it just so happens that the more data you keep on hand, the more storage of all kinds you will need. Eventually, all corporate data is likely to be considered big data. However, not all of that corporate data is best hosted on a commodity scale-out HDFS cluster.

So, today, traditional storage vendors are signing up to the big data lake vision. From a storage marketing perspective, it seems like data lakes are the new cloud: “Everyone needs a data lake. How can you compete without one (or two or three)?” And there is a variety of enterprise storage options for big data, including enterprise arrays that can present remote storage as HDFS, Hadoop virtualization approaches that can translate other storage protocols into HDFS, and scalable software-defined storage options.

…(read the complete as-published article there)

Moving to all-flash? Think about your data storage infrastructure

An IT industry analyst article published by SearchStorage.


Everyone is now onboard with flash. All the key storage vendors have at least announced entry into the all-flash storage array market, with most having offered hybrids — solid-state drive-pumped traditional arrays — for years. As silicon storage gets cheaper and denser, it seems inevitable that data centers will migrate from spinning disks to “faster, better and cheaper” options, with non-volatile memory poised to be the long-term winner.

But the storage skirmish today seems to be heading toward the total cost of ownership end of things, where two key questions must be answered:

  • How much performance is needed, and how many workloads in the data center have data with varying quality of service (QoS) requirements or data that ages out?
  • Are hybrid arrays a better choice to handle mixed workloads through advanced QoS and auto-tiering features?

All-flash proponents argue that cost and capacity will continue to drop for flash compared to hard disk drives (HDDs), and that no workload is left wanting with the ability of all-flash to service all I/Os at top performance. Yet we see a new category of hybrids on the market that are designed for flash-level performance and then fold in multiple tiers of colder storage. The argument there is that data isn’t all the same and its value changes over its lifetime. Why store older, un-accessed data on a top tier when there are cheaper, capacity-oriented tiers available?
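
The core of such an age-based policy is simple. Here is a rough sketch of the kind of demotion rule an auto-tiering hybrid applies internally; the mount points and cutoff are hypothetical, and production systems work at the block or object level with far more safeguards.

```python
import os
import shutil
import time

HOT_TIER = "/mnt/flash/projects"       # hypothetical flash mount point
COLD_TIER = "/mnt/capacity/projects"   # hypothetical capacity-tier mount point
COLD_AFTER_DAYS = 90

def demote_cold_files():
    """Move files not accessed within the cutoff window to the capacity tier."""
    cutoff = time.time() - COLD_AFTER_DAYS * 86400
    for root, _dirs, files in os.walk(HOT_TIER):
        for name in files:
            src = os.path.join(root, name)
            if os.stat(src).st_atime < cutoff:
                dst = os.path.join(COLD_TIER, os.path.relpath(src, HOT_TIER))
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)

if __name__ == "__main__":
    demote_cold_files()
```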

It’s misleading to lump together hybrids that are traditional arrays with solid-state drives (SSDs) added and the new hybrids that might be one step evolved past all-flash arrays. And it can get even more confusing when the old arrays get stuffed with nothing but flash and are positioned as all-flash products. To differentiate, some industry wags like to use the term “flash-first” to describe newer-generation products purpose-built for flash speeds. That still could cause some confusion when considering both hybrids and all-flash designs, so it may be more accurate to call the flash-first hybrids “flash-converged.” With a flash-converged design, you can expect to buy one of these new hybrids with nothing but flash inside and get all-flash performance.

We aren’t totally convinced that the future data center will have just a two-tier system with flash on top backed by tape (or a remote cold cloud), but a “hot-cold storage” future is entirely possible as intermediate tiers of storage get, well, dis-intermediated. We’ve all predicted the demise of 15K HDDs for a while; can all the other HDDs be far behind as QoS controls get more sophisticated in handling the automatic mixing of hot and cold to create any temperature storage you might need? …

…(read the complete as-published article there)