Where serverless storage fits in the serverless computing world

An IT industry analyst article published by SearchStorage

Storage for serverless functions must be external to the compute environment. Learn about the types of storage that work best for serverless computing.

What is serverless computing and how does serverless storage work? Basically, serverless is a way to develop an application as a set of smaller functional components and then submit those functions to a host service for elastically scalable event-driven execution. The service provider takes care of running, scaling and operating the entire IT stack underneath, freeing the application owner to focus more on business value and less on IT operations.

Because the service provider charges per function execution and not for reserved infrastructure, serverless operating costs fully align with application usage. With serverless, cloud expenses can be traced directly to applications and business users.

Serverless isn’t truly without servers; there’s a server somewhere in a cloud. Behind the scenes, there are probably virtual machines, containers, servers and storage. There’s OS and application server code, plus layers of IT management that might include cloud orchestration, virtual hosting and container management, such as Kubernetes. The service provider handles all the networking, availability, performance, capacity and scalability issues an IT shop would normally have to be concerned with when hosting its own applications.

Think of it as functions as a service

A better name for all this might be functions as a service. At its simplest, this kind of function is a little piece of event-driven code, perhaps written in JavaScript, that’s invoked in response to some trigger, such as a web-form submission or an IoT device event. A simple function might insert a record into a database, make a log entry, trigger another event, send a notification message, transform a piece of data or return a calculation.
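For illustration only, here’s a minimal sketch of such a function, written in Python as an AWS Lambda-style handler. The event fields and table name are hypothetical.

    import json
    import boto3

    TABLE_NAME = "web_form_submissions"  # hypothetical table name

    def handler(event, context):
        """Triggered by a web-form submission event; inserts one record."""
        dynamodb = boto3.client("dynamodb")
        dynamodb.put_item(
            TableName=TABLE_NAME,
            Item={
                "id": {"S": event["submission_id"]},  # hypothetical event fields
                "email": {"S": event["email"]},
                "submitted_at": {"S": event["timestamp"]},
            },
        )
        return {"statusCode": 200, "body": json.dumps({"stored": event["submission_id"]})}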

A complete serverless application would consist of a well-orchestrated cascade of functions. Each function could be independently reused and triggered in massively parallel ways. A function’s execution could be scaled independently of other functions, providing tremendous flexibility without traditional bottlenecks.

When a well-designed master graph of functions that trigger each other in a sophisticated cascade of events is deployed at scale, it can not only replace but outperform monolithic applications. One of the premises of serverless is that your applications would scale smoothly and be fully elastic, well beyond the limits of any predefined or prepaid server farm or set of cloud machine instances.

This is still a bit futuristic because, of course, applications aren’t made up of just compute functions. There’s still data that needs to be managed, protected and persisted. Storage for serverless must be a consideration, and effective serverless storage developed.

Storage for serverless

Like microservices in containers, the idea is to construct serverless functions as ephemeral components, so they don’t contain any data and aren’t relied upon to persist data internally.

Unlike long-lived applications, functions are triggered by an event, do something specific and are retired. Serverless functions may also be launched in massively parallel ways to scale quickly and meet demand. So where does serverless data live?

Effectively, data storage for serverless functions must be external to the compute environment. Because of the elastic scaling and small event cascades, traditional storage volumes and file systems are going to struggle mightily and become obvious bottlenecks in large-scale serverless application deployments.

Today, serverless function development environments can make direct use of storage service APIs. As scalable storage suitable for web-scale apps and containerized applications, cloud data services are ideal data persistence partners. Here are the storage types that work well as storage for serverless:

  1. Cloud database services architected for multi-tenancy, elastic scalability and web-style access are a good choice for transactional persistence.
  2. Object storage services, such as AWS S3, with simple get/put protocols are ideal for many kinds of web-scale apps and functional designs (see the sketch after this list).
  3. Application memory cache, such as Redis, can work for high-performance data sharing needs.
  4. Journaling logs in which data is written serially to the end, while readable in aggregate, can help protect streaming data in functional designs.
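As a concrete illustration of the object storage pattern in item 2, here’s a minimal Python sketch using S3 get/put calls via boto3; the bucket and key names are hypothetical.

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-serverless-app-data"  # hypothetical bucket

    def put_result(key: str, payload: bytes) -> None:
        """Persist a function's output with a simple object put."""
        s3.put_object(Bucket=BUCKET, Key=key, Body=payload)

    def get_input(key: str) -> bytes:
        """Fetch an object for the next function in the cascade."""
        return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()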

When using any data store with serverless designs, great care must be taken with idempotency, meaning an operation has no additional effect if it’s applied more than once. Care must also be taken with asynchronous event assumptions and with timing or race conditions. In particular, functional designs have to pay attention to parallel data updates and write-locking conditions.
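One common way to achieve idempotency is a conditional write keyed on a unique event ID, so a retried or duplicate trigger has no additional effect. Here’s a minimal sketch assuming a DynamoDB table; the table and field names are hypothetical.

    import boto3

    dynamodb = boto3.client("dynamodb")
    TABLE_NAME = "processed_events"  # hypothetical table

    def record_once(event_id: str, payload: str) -> bool:
        """Write only if this event ID was never seen before; safe to retry."""
        try:
            dynamodb.put_item(
                TableName=TABLE_NAME,
                Item={"id": {"S": event_id}, "payload": {"S": payload}},
                ConditionExpression="attribute_not_exists(id)",
            )
            return True   # first and only effective write
        except dynamodb.exceptions.ConditionalCheckFailedException:
            return False  # duplicate trigger; no additional effect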

So what exactly is serverless storage?

The serverless ideal is to get IT — and DevOps — out of managing and operating servers of any kind, including physical servers, virtual servers, containers and cloud instances. The idea is to hand responsibility for being on call for any operational issue over to the serverless environment provider. If a company is adopting serverless, ultimately, the idea will be to get out of operating and managing storage as well.

However, common requirements for IT to ensure data governance, compliance and protection will mean a long period of hybrid protected storage architectures. And in the long term? Data could just be passed forward from function to function through the event queue and never actually persisted, relying on the event queuing service for data protection. But, of course, there’s still storage in there somewhere.
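To make that long-term picture concrete, here’s a hedged sketch of passing data forward from function to function through an event queue, using AWS SQS via boto3; the queue URL and payload shape are hypothetical.

    import json
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/fn-chain"  # hypothetical

    def emit(payload: dict) -> None:
        """Hand data to the next function in the cascade via the event queue."""
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(payload))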

…(read the complete as-published article there)

Learn storage techniques for managing unstructured data use

An IT industry analyst article published by SearchStorage.


Rearchitect storage to maximize unstructured data use at the global scale for larger data sets coming from big data analytics and other applications.

Mike Matchett
Small World Big Data

Back in the good old days, we mostly dealt with two storage tiers. We had online, high-performance primary storage directly used by applications and colder secondary storage used to tier less-valuable data out of primary storage. It wasn’t that most data lost value on a hard expiration date, but primary storage was pricey enough to constrain capacity, and we needed to make room for newer, more immediately valuable data.

We spent a lot of time trying to intelligently summarize and aggregate aging data to keep some kind of historical information trail online. Still, masses of detailed data were sent off to bed, out of sight and relatively offline. That’s all changing as managing unstructured data becomes a bigger concern. New services provide storage for big data analysis of detailed unstructured and machine data, as well as to support web-speed DevOps agility, deliver storage self-service and control IT costs. Fundamentally, these services help storage pros provide and maintain more valuable online access to ever-larger data sets.

Products for managing unstructured data may include copy data management (CDM), global file systems, hybrid cloud architectures, global data protection and big data analytics. These features help keep much, if not all, data available and productive.

Handling the data explosion


We’re seeing a lot of high-variety, high-volume and unstructured data. That’s pretty much everything other than highly structured database records. The new data explosion includes growing files and file systems, machine-generated data streams, web-scale application exhaust, endless file versioning, finer-grained backups and rollback snapshots to meet lower tolerances for data integrity and business continuity, and vast image and media repositories.

The public cloud is one way to deal with this data explosion, but it’s not always the best answer by itself. Elastic cloud storage services are easy to use to deploy large amounts of storage capacity. However, unless you want to create a growing and increasingly expensive cloud data dump, advanced storage management is required for managing unstructured data as well. The underlying theme of many new storage offerings is to extend enterprise-quality IT management and governance across multiple tiers of global storage, including hybrid and public cloud configurations.

If you’re architecting a new approach to storage, especially unstructured data storage at a global enterprise scale, here are seven advanced storage capabilities to consider:

Automated storage tiering. Storage tiering isn’t a new concept, but today it works across disparate storage arrays and vendors, often virtualizing in-place storage first. Advanced storage tiering products subsume yesterday’s simpler cloud gateways. They learn workload-specific performance needs and implement key quality of service, security and business cost control policies.

Much of what used to make up individual products, such as storage virtualizers, global distributed file systems, bulk data replicators, migrators and cloud gateways, is converging into single-console unifying storage services. Enmotus and Veritas offer these simple-to-use services. This type of storage tiering enables unified storage infrastructure and provides a core service for many different types of storage management products.
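Vendors don’t publish a common policy engine, but the core tiering decision can be sketched simply: classify data by recent access and map it to a cost/performance tier. A simplified Python illustration, with hypothetical thresholds and tier names:

    import os
    import time

    HOT_SECONDS = 7 * 24 * 3600    # hypothetical: accessed within a week
    WARM_SECONDS = 90 * 24 * 3600  # hypothetical: accessed within 90 days

    def choose_tier(path: str) -> str:
        """Pick a target tier from a file's last-access time."""
        age = time.time() - os.stat(path).st_atime
        if age < HOT_SECONDS:
            return "flash"
        if age < WARM_SECONDS:
            return "disk"
        return "cloud-archive"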

Metadata at scale. There’s a growing focus on collecting and using storage metadata — data about stored data — when managing unstructured data. By properly aggregating and exploiting metadata at scale, storage vendors can better virtualize storage, optimize services, enforce governance policies and augment end-user analytical efforts.

Metadata concepts are most familiar in an object or file storage context. However, advanced block and virtual machine-level storage services are increasingly using metadata detail to help with tiering for performance. We also see metadata in data protection features. Reduxio’s infinite snapshots and immediate recovery based on timestamping changed blocks take advantage of metadata, as do change data capture techniques and N-way replication. When looking at heavily metadata-driven storage, it’s important to examine metadata protection schemes and potential bottlenecks. Interestingly, metadata-heavy approaches can improve storage performance because they usually allow for high metadata performance and scalability out of band from data delivery.
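As a rough illustration of what metadata at scale means in practice, the sketch below walks a file tree and aggregates bytes stored by age bucket using metadata alone, never reading file contents; the bucket boundaries are arbitrary.

    import os
    import time
    from collections import Counter

    def scan_metadata(root: str) -> Counter:
        """Aggregate bytes stored by file age, using only metadata (no data reads)."""
        now = time.time()
        bytes_by_age = Counter()
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                st = os.stat(os.path.join(dirpath, name))
                age_days = (now - st.st_mtime) / 86400
                bucket = "0-30d" if age_days < 30 else "30-365d" if age_days < 365 else "1y+"
                bytes_by_age[bucket] += st.st_size
        return bytes_by_age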

Storage analytics. Metadata and other introspective analytics about storage use can be gathered across enterprise storage and applied both offline and, increasingly, in dynamic optimizations. Call-home management is one example of how these analytics are used to better manage storage…(read the complete as-published article there)

Is demand for data storage or supply driving increased storage?

An IT industry analyst article published by SearchStorage.


Figuring out whether we’re storing more data than ever because we’re producing more data or because constantly evolving storage technology lets us store more of it isn’t easy.

Mike Matchett
Small World Big Data

Whether you’re growing on-premises storage or your cloud storage footprint this year, it’s likely you’re increasing total storage faster than ever. Where we used to see capacity upgrade requests for proposals in terms of tens of terabytes growth, we now regularly see RFPs for half a petabyte or more. When it comes to storage size, huge is in.

Do we really need that much more data to stay competitive? Yes, probably. Can we afford extremely deep storage repositories? It seems that we can. However, these questions raise a more basic chicken-and-egg question: Are we storing more data because we’re making more data or because constantly evolving storage technology lets us?

Data storage economics
Looked at from a pricing perspective, the question becomes what’s driving price — more demand for data storage or more storage supply? I’ve heard economics professors say they can tell who really understands basic supply and demand price curve lessons when students ask this kind of question and consider a supply-side answer first. People tend to focus on demand-side explanations as the most straightforward way of explaining why prices fluctuate. I guess it’s easier to assume supply is a remote constant while envisioning all the possible changes in demand for data storage.

As we learn to wring more value out of our data, we want to both make and store more data.

But if storage supply were constant, then given our massive data growth, storage would be really expensive. The massive squirreling away of data would instead be constrained by that high storage price (low availability). This was how it was years ago. Remember when traditional IT application environments struggled to fit into limited storage infrastructure that was already stretched thin to meet ever-growing demand?

Today, data capacities are growing fast, and yet the price of storage keeps dropping per unit of capacity. There’s no doubt supply is rising faster than demand for data storage. Supply-side technologies, such as the inherent efficiencies of shared cloud storage, Moore’s law and clustered open source file systems like the Hadoop Distributed File System, have made bulk capacity so affordable that prices continue to fall despite massive growth in demand for data storage.

Endless data storage
When we think of hot new storage technologies, we tend to focus on primary storage advances such as flash and nonvolatile memory express. All so-called secondary storage comes, well, second. It’s true the relative value of a gigabyte of primary storage has greatly increased. Just compare the ROI of buying a whole bunch of dedicated, short-stroked HDDs as we did in the past to investing in a modicum of today’s fully deduped, automatically tiered and workload-shared flash.

It’s also worth thinking about flash storage in terms of impact on capacity, not just performance. If flash storage can serve a workload in one-tenth the time, it can also serve 10 similar workloads in the same time, providing an effective 10-times capacity boost.
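The arithmetic behind that claim is simple; a quick sketch with normalized numbers:

    # If flash serves a workload in one-tenth the time disk does, the same
    # device-hours can serve 10 similar workloads: an effective 10x boost.
    disk_service_time = 1.0                      # normalized time per workload on disk
    flash_service_time = disk_service_time / 10  # flash is 10x faster here
    effective_boost = disk_service_time / flash_service_time
    print(effective_boost)                       # -> 10.0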

But don’t discount the major changes that have happened in secondary storage…(read the complete as-published article there)

Future of data storage technology: Transformational trends for 2018

An IT industry analyst article published by SearchStorage.


Risk-averse enterprises finally accepted the cloud in 2017, and we didn’t even notice. Expect the same for these data storage technology trends in the new year.

Mike Matchett
Small World Big Data

Sometimes big changes sneak up on you, especially when you’re talking about the future of data storage technology. For example, when exactly did full-on cloud adoption become fully accepted by all those risk-averse organizations, understaffed IT shops and disbelieving business executives? I’m not complaining, but the needle of cloud acceptance tilted over sometime in the recent past without much ado. It seems everyone has let go of their fear of cloud and hybrid operations as risky propositions. Instead, we’ve all come to accept the cloud as something that’s just done.

Sure, cloud was inevitable, but I’d still like to know why it finally happened now. Maybe it’s because IT consumers expect information technology will provide whatever they want on demand. Or maybe it’s because everything IT implements on premises now comes labeled as private cloud. Influential companies, such as IBM, Microsoft and Oracle, are happy to help ease folks formerly committed to private infrastructure toward hybrid architectures that happen to use their respective cloud services.

In any case, I’m disappointed I didn’t get my invitation to the “cloud finally happened” party. But having missed cloud’s big moment, I’m not going to let other obvious yet possibly transformative trends sneak past as they go mainstream with enterprises in 2018. So when it comes to the future of data storage technology, I’ll be watching the following:

Containers arose out of a long-standing desire to find a better way to package applications. This year, we should see enterprise-class container management reach maturity parity with virtual machine management, without giving up the advantages containers have over VMs. Expect modern software-defined resources, such as storage, to be delivered mostly in containerized form. When combined with dynamic operational APIs, these resources will deliver highly flexible programmable infrastructures. This approach should enable vendors to package applications and their required infrastructure as redeployable units — that is, blueprinted or specified in editable and versionable manifest files — enabling full environment and even data center-level cloud provisioning. Being able to deploy a data center on demand could completely transform disaster recovery, to name one use case.
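There’s no single standard for such manifest files, but the idea can be sketched as a small, versionable specification that provisioning tooling could consume; every name below is hypothetical.

    # A hypothetical, versionable manifest: the application plus the
    # infrastructure it needs, declared as data that tooling can redeploy.
    manifest = {
        "version": "2018.1",
        "app": {"image": "registry.example.com/orders:1.4", "replicas": 12},
        "storage": [
            {"name": "orders-db", "class": "replicated-ssd", "size_gb": 500},
            {"name": "logs", "class": "object", "size_gb": 2000},
        ],
        "recovery": {"target_site": "dr-east", "rpo_minutes": 5},
    }

    def provision(spec: dict) -> None:
        """Stand-in for a provisioning engine that realizes the manifest."""
        for volume in spec["storage"]:
            print(f"provision {volume['size_gb']} GB of {volume['class']} as {volume['name']}")

    provision(manifest)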

Everyone is talking about AI, but it’s machine learning that’s slowly permeating just about every facet of IT management. Although there’s a lot of hype, it’s worth figuring out how and where carefully applied machine learning could add significant value. Most machine learning is conceptually made up of advanced forms of pattern recognition, so think about where automatically identifying complex patterns would reduce time and effort. We expect the increasing availability of machine learning algorithms to give rise to storage management processes that can learn and adjust operations and settings to optimize workload services, quickly identify and fix the root causes of abnormalities, and broker storage infrastructure and manage large-scale data to minimize cost.
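As a deliberately simplified instance of that kind of pattern recognition, the sketch below flags abnormal storage latency samples by z-score; the threshold and sample data are made up.

    from statistics import mean, stdev

    def find_anomalies(latencies_ms, threshold=2.5):
        """Flag samples more than `threshold` standard deviations from the mean."""
        mu, sigma = mean(latencies_ms), stdev(latencies_ms)
        return [x for x in latencies_ms if sigma and abs(x - mu) / sigma > threshold]

    # A latency spike stands out against steady baseline samples.
    samples = [1.1, 1.0, 1.2, 0.9, 1.1, 9.5, 1.0, 1.05, 1.1, 0.95]
    print(find_anomalies(samples))  # -> [9.5]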

Management as a service (MaaS) is gaining traction when looking at the future of data storage technology. First, every storage array seemingly comes with built-in call-home support replete with management analytics and performance optimization. I predict the reporting interval for most remote vendor management services will quickly drop from today’s daily batch uploads to five-minute streaming. I also expect cloud-hosted MaaS offerings to become the way most shops manage their increasingly hybrid architectures, with many starting to shift away from the burdens of on-premises management software…(read the complete as-published article there)

Secondary data storage: A massively scalable transformation

An IT industry analyst article published by SearchStorage.


Capitalize on flash with interactive, online secondary data storage architectures that make a lot more data available for business while maximizing flash investment.

Mike Matchett
Small World Big Data

We all know flash storage is fast, increasingly affordable and quickly beating out traditional spinning disk for primary storage needs. It’s like all our key business applications have been magically upgraded to perform 10 times faster!

In the data center, modern primary storage arrays now come with massive flash caching, large flash tiers or are all flash through and through. Old worries about flash wearing out have been largely forgotten. And there are some new takes on storage designs, such as Datrium’s, that make great use of less-expensive server-side flash. Clearly, spending money on some kind of flash, if not all flash, can be a great IT investment.

Yet, as everyone builds primary storage with flash, there is less differentiation among those flashy designs. At some point, “really fast” is fast enough for now, assuming you aren’t in financial trading.

Rather than argue whose flash is faster, more reliable, more scalable or even cheaper, the major enterprise IT storage concern is shifting toward getting the most out of whatever high-performance primary storage investment gets made. Chasing ever-greater performance can be competitively lucrative, but universally, we see business demand for larger operational data sets growing quickly. Flash or not, primary storage still presents an ever-present capacity-planning challenge.

A new ‘big data’ opportunity
The drive to optimize shiny new primary storage pushes IT folks to use it as much as possible with suitable supporting secondary data storage. As this is literally a new “big data” opportunity, there is a correspondingly big change happening in the secondary storage market. Old-school backup storage designed solely as an offline data protection target doesn’t provide the scale, speed and interactive storage services increasingly demanded by today’s self-service-oriented users.

We’re seeing a massive trend toward interactive, online, secondary storage architectures. Instead of dumping backups, snapshots and archives into slow, near-online or essentially offline deep storage tiers, organizations are finding it’s worthwhile to keep large volumes of second-tier data in active use. With this shift to online secondary data storage, end users can quickly find and recover their own data like they do with Apple’s Time Machine on their Macs. And organizations can profitably mine and derive valuable insights from older, colder, larger data sets, such as big data analytics, machine learning and deep historical search.

If that sounds like a handy convergence of backup and archive, you’re right. There’s increasingly less difference between data protection backup and recovery and retention archiving…(read the complete as-published article there)