Where serverless storage fits in the serverless computing world

An IT industry analyst article published by SearchStorage

Storage for serverless functions must be external to the compute environment. Learn about the types of storage that work best for serverless computing.

What is serverless computing and how does serverless storage work? Basically, serverless is a way to develop an application as a set of smaller functional components and then submit those functions to a host service for elastically scalable event-driven execution. The service provider takes care of running, scaling and operating the entire IT stack underneath, freeing the application owner to focus more on business value and less on IT operations.

Because the service provider charges per function execution and not for reserved infrastructure, serverless operating costs fully align with application usage. With serverless, cloud expenses can be traced directly to applications and business users.

Serverless isn’t truly without servers; there’s a server somewhere in a cloud. Behind the scenes, there are probably virtual machines, containers, servers and storage. There is OS and application server code, plus layers of IT management that might include cloud orchestration, virtual hosting and container management, such as Kubernetes. The service provider handles all the networking, availability, performance, capacity and scalability issues an IT shop might normally have to be concerned with when hosting its own applications.

Think of it as functions as a service

A better name for all this might be functions as a service. At its simplest, this kind of function is a little piece of event-driven code, perhaps written in JavaScript, that’s set up to be invoked in response to some trigger like a web-form submission click or IoT device event. A simple function might do nothing more than insert a record into a database, make a log entry, trigger another event, send a notification message, transform a piece of data or return a calculation.
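As a minimal sketch, here is what such a handler might look like in Python (another common choice alongside JavaScript), using an AWS Lambda-style handler signature; the event shape and field names are assumptions for illustration only.

    import json
    import logging

    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)

    def handler(event, context):
        """Invoked by the platform when a web-form submission event arrives."""
        form = json.loads(event.get("body", "{}"))  # hypothetical event shape
        logger.info("form submitted by %s", form.get("email", "unknown"))
        # Do one small thing: normalize the submitted data and hand back a result.
        record = {"name": form.get("name", "").strip().title(),
                  "email": form.get("email", "").lower()}
        return {"statusCode": 200, "body": json.dumps(record)}

Invoked locally with a fake event, the handler simply returns the cleaned-up record; in a hosted environment the platform supplies the event and scales out as many copies of the function as the trigger rate demands.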

A complete serverless application would consist of a well-orchestrated cascade of functions. Each function could be independently reused and triggered in massively parallel ways. A function’s execution could be scaled orthogonally from other functions, providing tremendous flexibility without traditional bottlenecks.

When a well-designed master graph of functions that trigger each other in a sophisticated cascade of events is deployed at scale, it can not only replace but outperform monolithic applications. One of the premises of serverless is that your applications would scale smoothly and be fully elastic, well beyond the limits of any predefined or prepaid server farm or set of cloud machine instances.
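To illustrate the idea only, the sketch below simulates such a cascade locally in Python: each handler does one small piece of work and emits follow-on events onto a queue, so the application is really just the graph of triggers. The event names and handlers are made up; a real deployment would wire the same handlers to a managed event service rather than an in-process queue.

    from queue import Queue

    events = Queue()

    def on_order_placed(payload):
        # One small function: accept the order, then fan out two follow-on events.
        events.put(("charge_card", payload))
        events.put(("send_confirmation", payload))

    def on_charge_card(payload):
        print(f"charging {payload['amount']} for order {payload['order_id']}")

    def on_send_confirmation(payload):
        print(f"emailing confirmation for order {payload['order_id']}")

    TRIGGERS = {
        "order_placed": on_order_placed,
        "charge_card": on_charge_card,
        "send_confirmation": on_send_confirmation,
    }

    # Seed one event and drain the cascade.
    events.put(("order_placed", {"order_id": 42, "amount": 19.99}))
    while not events.empty():
        name, payload = events.get()
        TRIGGERS[name](payload)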

This is still a bit futuristic because, of course, applications aren’t made up of just compute functions. There’s still data that needs to be managed, protected and persisted. Storage for serverless must be a consideration, and effective serverless storage must be developed.

Storage for serverless

Like microservices in containers, the initial idea for serverless functions is to construct them to be ephemeral, so they don’t contain any data and aren’t relied upon to internally persist data.

Unlike long-running applications, functions are triggered by an event, do something specific and are then retired rather than living on. Serverless functions may be launched in massively parallel ways to scale quickly to meet demand. So where does serverless data live?

Effectively, data storage for serverless functions must be external to the compute environment. Because of the elastic scaling and small event cascades, traditional storage volumes and file systems are going to struggle mightily and become obvious bottlenecks in large-scale serverless application deployments.

Today, serverless function development environments can make direct use of storage service APIs. As scalable storage suitable for web-scale apps and containerized applications, cloud-data services are ideal data persistence partners. Here are the storage types that work well as storage for serverless:

  1. Cloud database services architected for multi-tenancy, elastic scalability and web-style access are a good choice for transactional persistence.
  2. Object storage services, such as AWS S3, with simple get/put protocols are ideal for many kinds of web-scale apps and functional designs (see the sketch after this list).
  3. Application memory cache, such as Redis, can work for high-performance data sharing needs.
  4. Journaling logs in which data is written serially to the end, while readable in aggregate, can help protect streaming data in functional designs.
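For the object storage option in item 2, a function typically only needs a couple of simple API calls. The snippet below is a hedged sketch using the AWS boto3 SDK against S3; the bucket name, key layout and event fields are placeholder assumptions rather than anything prescribed by a particular platform.

    import json
    import boto3

    s3 = boto3.client("s3")  # credentials come from the function's execution role

    def save_result(event, context):
        # Put a small JSON result object; bucket and key are hypothetical.
        s3.put_object(Bucket="example-results-bucket",
                      Key=f"results/{event['request_id']}.json",
                      Body=json.dumps(event["payload"]).encode("utf-8"))

    def load_result(request_id):
        # Another function can fetch the same object later by key.
        obj = s3.get_object(Bucket="example-results-bucket",
                            Key=f"results/{request_id}.json")
        return json.loads(obj["Body"].read())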

When using any data store with serverless designs, great care must be taken with idempotency — an operation that has no additional effect if applied more than one time. Care must also be taken with asynchronous event assumptions and timing or race conditions. In particular, functional designs have to pay attention to parallel data updates and write-locking conditions.
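A common way to get idempotency is to key every write on a unique event or request ID and make the write conditional, so a replayed event becomes a harmless no-op. The sketch below illustrates the pattern with a DynamoDB conditional put via boto3; the table and attribute names are assumptions for illustration, and the same idea applies to any store that supports conditional writes.

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("example-orders")  # hypothetical table

    def record_order(event_id, payload):
        """Write once per event_id; a replayed event changes nothing."""
        try:
            table.put_item(
                Item={"event_id": event_id, **payload},
                ConditionExpression="attribute_not_exists(event_id)",
            )
        except ClientError as err:
            if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise  # real failure; duplicate deliveries are silently ignored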

So what exactly is serverless storage?

The serverless ideal is to get IT — and DevOps — out of managing and operating servers of any kind, including physical servers, virtual servers, containers and cloud instances. The idea is to force IT to hand over the responsibility to be on call for any operational issue to the serverless environment provider. If a company is adopting serverless, ultimately, the idea will be to get out of operating and managing storage as well.

However, common requirements for IT to ensure data governance, compliance and protection will mean a long period of hybrid protected storage architectures. And in the long term? Data could just be passed forward from function to function through the event queue and never actually persisted, relying on the event queuing service for data protection. But, of course, there’s still storage in there somewhere.

…(read the complete as-published article there)

Cloud and containers rewrite monitoring and management rulebooks

An IT industry analyst article published by SearchITOperations.

IT performance management is tricky when it stretches as big as the public cloud and scrunches as small as a microservice. IT ops techniques must change to keep up.

After getting just a little taste of data center transformation success, an overly enthusiastic enterprise can rush off on a headlong, downhill race to eliminate as much IT data center cost as possible. And that can lead to trying multiple cloud services, experimenting with converging infrastructure and software stacks, and adopting DevOps-friendly technologies, such as containerization. Each of these can help force out burdensome Capex and provide increased agility.

Still, I can’t help but think that in this mad rush to deflate the data center maybe we lose sight of something important. Cloud and containers are grand and hold lots of promise, but often they simply don’t arrive on the scene with full enterprise management, battle-tested security or — as we address here — guaranteed ways to assure service levels.

Keep an eye on the prize

Convergence, cloud and containers are all hot technologies to be sure. The value they provide is increased abstraction between workloads and infrastructure. More abstraction is great for our new distributed, DevOps-oriented world, but it also tends to obscure ultimate visibility into what makes for good IT performance.

There are many ways to look at performance, but let’s focus on workload response time as one of the important measures of how happy our end users will be with their IT experience. Imagine a chart that has CPU utilization increasing linearly on the X axis from 0% to 100%. If we plot the average interactive transaction performance for that CPU on the Y axis, we’ll end up with an exponential curve starting at a reasonable service time at 0%, but shooting up towards infinity at 100% utilization. (Note: For the mathematically minded, the response time curve can be modeled using queuing theory to calculate the probabilistic waiting time for the increasingly busy resource.)
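To put rough numbers on that curve, here is a minimal sketch of the simplest single-queue approximation, where average response time is the service time divided by (1 minus utilization); the 10 ms service time is just an assumed figure.

    # Simple M/M/1-style approximation: R = S / (1 - utilization)
    SERVICE_TIME_MS = 10.0  # assumed time to serve one request on an idle resource

    def avg_response_ms(utilization):
        if utilization >= 1.0:
            return float("inf")  # the queue grows without bound at saturation
        return SERVICE_TIME_MS / (1.0 - utilization)

    for u in (0.0, 0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"{u:>4.0%} busy -> {avg_response_ms(u):8.1f} ms average response")

Response time has already doubled at 50% utilization and is an order of magnitude worse past 90%, which is exactly the noisy-neighbor effect discussed below.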

Driving the utilization of an infrastructure resource as high as possible by adding load, especially to please accountants focused on utilization metrics, eventually becomes counter-productive in terms of IT performance management.

It is true that batch workloads are measured more by throughput, which does max out at maximum utilization. However, response time is critical for any interactive workload. And in today’s fast-data world, we process more sources and streams of data in near real time by and for interactive operations and applications. Big data today is about bringing as much intelligence as possible as far forward as possible to the operational edge.

Cloud and containers each change IT performance management in distinct ways, and while change can be scary, there are ways for IT admins to ensure performance stays within an acceptable range.

Prized pets vs. cattle herds

Cloud-app designers tend to treat portions of critical code more like fungible cattle — and less like the historical pets they used to be in the silos of traditional data center infrastructure. We used to carefully monitor and manage mission-critical applications and their resources; any deviation from normal would be triaged immediately. With many of today’s cloud apps, deviation in performance will result in the app being re-provisioned with a new cloud instance that should perform better.

But not necessarily. To understand this, let’s look at virtualization, which has brought many benefits, but not always better performance. In a virtualized host, guaranteeing actual response time performance has always been a problem.

While a virtual admin can assign a quota of host resources (in terms of utilization) to a given VM, each host resource by definition is shared by many VMs at the same time. Once we understand that response time performance is nonlinear with respect to total system utilization, we can immediately see how the noisy neighbor problem arises on heavily utilized virtual servers — even if our critical VM has a guaranteed slice of utilization.

As an example, consider how all of the VMs on a given host server have a guaranteed slice of capacity. If enough VMs use their capacity at the same time to drive total utilization of the server above 50%-60%, response time will degrade for all of them. Over a certain threshold of utilization far less than 100%, the underlying server resource still has remaining capacity, but experienced performance can degrade by half. As utilization approaches 100%, responsiveness can degrade to the point where little actual work is even getting through the system.

If we think of clouds, public or private, as large virtual server farms, we can see why a cloud machine instance may not always provide the performance we deserve. The cloud service provider promises a certain amount of resource utilization when we put in our credit card number and check out a cloud server. The cloud provider, however, does not generally certify that your particular machine instance will not be cohosted with many other competing instances. This means that during busy times, many hosted machine instances won’t provide the same level of performance as they do when their underlying cloud infrastructure is less than half utilized.

Fundamentally, clouds are cost-efficient because they pool and share infrastructure as widely as possible. A cloud service provider is economically incentivized to stuff as many virtual instances as possible into a given cloud infrastructure footprint. In fact, one of the key areas of profit margin for a cloud provider is in being able to oversubscribe real infrastructure as much as possible across multiple tenants, knowing statistically that many machine instances much of the time are highly underutilized, if utilized at all.

Thus, web app administrators and clever DevOps folks treat their cloud applications more like cattle. They architect their web applications in a distributed fashion across many machine instances such that if any one machine instance within that pool ever suffers slow performance, they simply kill it and restart it. When your service provider is large enough, the restart operation almost guarantees that the new instance will land in a different area of the cloud infrastructure, away from its previously noisy neighbors. It’s worth noting that this cattle approach might not work so well on less-expansive private clouds.
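In practice, that kill-and-restart loop is often just a latency probe wrapped around a couple of SDK calls. The following is a hedged sketch using boto3 against EC2 purely as an illustration; the AMI ID, instance type and latency threshold are all hypothetical values, and a real setup would more likely lean on an auto scaling group and its health checks.

    import boto3

    ec2 = boto3.client("ec2")
    LATENCY_SLO_MS = 250  # assumed acceptable p95 response time

    def replace_if_slow(instance_id, measured_p95_ms):
        """Cattle, not pets: don't troubleshoot a slow instance, swap it out."""
        if measured_p95_ms <= LATENCY_SLO_MS:
            return instance_id
        ec2.terminate_instances(InstanceIds=[instance_id])
        new = ec2.run_instances(ImageId="ami-0123456789abcdef0",  # hypothetical AMI
                                InstanceType="m5.large",
                                MinCount=1, MaxCount=1)
        return new["Instances"][0]["InstanceId"]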

Containerized visibility

With containerized, microservices-heavy applications, performance can be even more opaque. A single microservice by original definition simply doesn’t last long even if its performance is lousy. With a massively containerized application, we might only see poor performance in the aggregate end result. And because microservices can be ephemeral, we can’t really manage them as either pets or cattle.

When we had pets assigned to their own isolated infrastructure, end-to-end infrastructure performance management tools enabled IT admins to identify and correct obvious performance problems. While virtualization began to muddy IT performance management, there were still effective ways to correlate application performance with virtualized infrastructure. But once we move our applications to a public cloud, managing for top-notch performance becomes more of a statistical cat-and-mouse game. And now with the rise of containers, managing performance is an even greater challenge.

The good news is that, with container architectures, we can readily add performance instrumentation at a very fine-grained level within our application. Given a new crop of highly scalable and responsive management tools, it should be possible to shepherd flocks of containers to greener-performing pastures using clever IT operations automation (likely based on effective use of machine learning).
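As one concrete example of that fine-grained instrumentation, a containerized service can expose its own latency histogram for a scraper such as Prometheus to collect. This is a minimal sketch using the Python prometheus_client library; the port, metric name and simulated workload are arbitrary choices for illustration.

    import random
    import time

    from prometheus_client import Histogram, start_http_server

    REQUEST_LATENCY = Histogram("request_latency_seconds",
                                "Time spent handling one request")

    @REQUEST_LATENCY.time()          # records each call's duration
    def handle_request():
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work

    if __name__ == "__main__":
        start_http_server(8000)      # metrics exposed at :8000/metrics
        while True:
            handle_request()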

The real trick for a competitive technology organization will be to proactively, if not predictably and continuously, achieve high performance at the same time it implements a deliberately chosen cost or spend policy. This balancing act in some ways gets harder with cloud and containers — because of increased opaqueness and scale — but also easier — because of distributed data and processing technologies.

…(read the complete as-published article there)

5 trends driving the big data evolution

An IT industry analyst article published by SearchDataManagement.


The speedy evolution of big data technologies is connected to five trends, including practical applications of machine learning and cheap, abundantly available compute resources.

Mike Matchett
Small World Big Data

I’ve long said that all data will eventually become big data, and big data platforms will evolve into our next-generation data processing platform. We have reached a point in big data evolution where it is now mainstream, and if your organization is not neck-deep in figuring out how to implement big data technologies, you might be running out of time.

Indeed, the big data world continues to change rapidly, as I observed recently at the Strata Data Conference in New York. While there, I met with over a dozen key vendors in sessions and on the show floor.

Overall, the folks attending conferences like this one are less and less those slightly goofy and idealistic, open source research-focused geeks, and are more real-world big data and machine learning practitioners looking to solve real business problems in enterprise production environments. Given that basic vibe, here are my top five takeaways from Strata on the big data trends that are driving the big data evolution.

1. Structured data

Big data isn’t just about unstructured or semi-structured data anymore. Many of the prominent vendors, led by the key platform providers like Hortonworks, MapR and Cloudera, are now talking about big data implementations as full enterprise data warehouses (EDWs). The passive, often swampy data lake idea seems a bit passé, while there is a lot of energy aimed at providing practical, real-time business intelligence to a wider corporate swath of BI consumers.

I noted that a large number of the big data-based acceleration competitors are applying on-demand analytics against tremendous volumes of structured data, both historical and streaming IoT-style.

Clearly, there is a war going on for the corporate BI and EDW investment. Given what I’ve seen, my bet is on big data platforms to inevitably outpace and outperform monolithic and proprietary legacy EDW.

2. Converged system of action

This leads into the observation that big data evolution includes implementations that host more and more of a company’s entire data footprint — structured and unstructured data together.

We’ve previously noted that many advanced analytical approaches can add tremendous value when they combine many formerly disparate corporate data sets of all different types…(read the complete as-published article there)

Big data processing could be the way all data is processed

An IT industry analyst article published by SearchITOperations.


Some organizations take their time with new technologies to let first adopters suffer the growing pains. But there’s no treading water in the big data stream; the current won’t wait.

Mike Matchett
Small World Big Data

Have you noticed yet? Those geeky big data platforms based on clusters of commodity nodes running open source parallel processing algorithms are evolving into some seriously advanced IT functionality.

The popular branded distributions of the Apache projects, including Hortonworks, Cloudera and MapR, are no longer simply made up of relatively basic big data batch query tools, such as Hadoop MapReduce, the way they were 10 years ago. We’ve seen advances in machine learning, SQL-based transaction support, in-memory acceleration, interactive query performance, streaming data handling, enterprise IT data governance, protection and security. And even container services, scheduling and management are on a new level. Big data platforms now present a compelling vision for the future of perhaps all IT data processing.

Wait — do I really mean all IT data center processing will be big data processing? Most of us are just getting used to the idea of investing in and building out functional data lakes to capture and collect tons of unstructured data for business intelligence tasks, offline machine learning, active archive and other secondary data applications. And many are having a hard time making those data lake initiatives successful. It’s a challenge to develop staff expertise, assure data provenance, manage metadata and master implied schemas, i.e., creating a single version of truth.

…big data isn’t just for backroom data science geeks. The technologies involved are going to define the next-generation IT data center platform…

Many organizations may be waiting for things in the big data market to settle out. Unfortunately, especially for those more comfortable being late adopters, big data processing technology development is accelerating. We see use cases rapidly proliferate, and general IT manageability of big data streams (easing adoption and integration) greatly increase.

The universal big data onslaught is not going to slow down, nor will it wait for slackers to catch up. And those able to harness their big data streams today aren’t just using them to look up old baseball stats. They are able to use data to improve and accelerate operations, gain greater competitiveness and achieve actual ROI. I’m not even going to point out the possibility that savvy big data processing will uncover new revenue opportunities and business models. Oops, just did!

If you think you are falling behind today on big data initiatives, I’d recommend you consider doubling down now. This area is moving way too fast to jump on board later and still expect to catch competitors. Big data is proving to be a huge game changer. There simply won’t be a later with big data.

I’ve written before that all data is eventually going to be big data. I’ll now add that all processing is eventually going to be big data processing. In my view, the focus of big data technology has moved from building out systems of insight over trailing big data sets to now offering ways to build convergent systems of action over all data.

In other words, big data isn’t just for backroom data science geeks. The technologies involved are going to define the next-generation IT data center platform…(read the complete as-published article there)

Survey Spotlights Top 5 Data Storage Pain Points

An IT industry analyst article published by Enterprise Storage Forum.


by Mike Matchett,

The Enterprise Storage Forum survey uncovered the biggest challenges storage professionals have with their existing storage infrastructure: aging gear, lack of capacity, high operations cost, security, maintenance burden. We’ll discuss which storage technologies available or coming soon might serve to ease those pain points.

Data storage has been around as long as computing, but based on the Enterprise Storage Forum survey, we have yet to solve all the problems. Entitled Data Storage Trends 2018, the survey reveals that storage professionals face no lack of serious concerns.

One of the interesting charts that jumped out at me is about the biggest challenge in operating current storage infrastructure. In essence, this is the “select your biggest pain” question. Let’s dive in.

Top Five Data Storage Challenges
Why are these data storage challenges ever-present? Why haven’t storage vendors researched technologies and nailed down solutions to solve them? This chart illustrates the leading pain points; we’ll look at the top five:

http://www.enterprisestorageforum.com/imagesvr_ce/9011/biggest%20challenge%20chart.png

1. Aging gear: Of course, no matter when you invest in new equipment, it starts aging immediately. And once deployed, storage, and the data stored on it, tends to sit in the data center until it reaches some arbitrary vendor end-of-life (EOL) stage. With working storage, the motto tends to be: “If it’s not broke, don’t fix it!”

Still, once something like storage is deployed, the capex is a sunk cost. Aging storage should probably be replaced long before full obsolescence comes along; significant attribute improvements are likely available on the market at any large storage’s “half-life.” These include better performance and agility, cheaper operating costs and upgrades, increased capacity and new features.

Here, I can’t blame storage vendors for lack of improved storage offerings. From flash engineered designs to software-defined agility, the storage landscape is full of opportunistic (and large ROI) “refresh” solutions. Proactive storage managers might think to replace their storage “ahead of time” as the scales tip in favor of new solutions, rather than sit back and wait for the traditional “five year” accounting-based storage refresh cycle.

2. Lack of Storage Capacity: Yes, data is still growing. In fact, data growth can be non-linear, which makes it hard to plan ahead. Unable to keep up with capacity demand, many organizations now rely on that elastic storage provider, the cloud, turning to hybrid cloud or even multi-cloud storage services, which can get pricey!

We may be doomed to suffer this pain point forever, but some newer storage technologies are being designed to scale-out “for a long time” with linear performance…(read the complete as-published article there)