Cloud and containers rewrite monitoring and management rulebooks

An IT industry analyst article published by SearchITOperations.

IT performance management is tricky when it stretches as big as the public cloud and scrunches as small as a microservice. IT ops techniques must change to keep up.

After getting just a little taste of data center transformation success, an overly enthusiastic enterprise can rush off on a headlong, downhill race to eliminate as much IT data center cost as possible. And that can lead to trying multiple cloud services, experimenting with converging infrastructure and software stacks, and adopting DevOps-friendly technologies, such as containerization. Each of these can help force out burdensome Capex and provide increased agility.

Still, I can’t help but think that in this mad rush to deflate the data center maybe we lose sight of something important. Cloud and containers are grand and hold lots of promise, but often they simply don’t arrive on scene with full enterprise management, battle-tested security or — as we address here — guaranteed ways to assure service levels.

Keep an eye on the prize

Convergence, cloud and containers are all hot technologies to be sure. The value they provide is increased abstraction between workloads and infrastructure. More abstraction is great for our new distributed, DevOps-oriented world, but it also tends to obscure ultimate visibility into what makes for good IT performance.

There are many ways to look at performance, but let’s focus on workload response time as one of the important measures of how happy our end users will be with their IT experience. Imagine a chart that has CPU utilization increasing linearly on the X axis from 0% to 100%. If we plot the average interactive transaction response time for that CPU on the Y axis, we’ll end up with a sharply nonlinear curve that starts at a reasonable service time at 0% but shoots up toward infinity as utilization approaches 100%. (Note: For the mathematically minded, the response time curve can be modeled using queuing theory to calculate the probabilistic waiting time for the increasingly busy resource.)
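To make the shape of that curve concrete, here is a minimal sketch that assumes the simplest textbook queuing model, a single-server M/M/1 queue, in which average response time is R = S / (1 - U) for an unloaded service time S and utilization U. The article does not name a specific model, so the model choice, the function name `mm1_response_time` and the 10 ms service time are all illustrative assumptions.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Average response time under an assumed M/M/1 model: R = S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_time_ms = 10.0  # assumed service time on an otherwise idle CPU
    for pct in (0, 25, 50, 75, 90, 95, 99):
        response = mm1_response_time(service_time_ms, pct / 100)
        print(f"{pct:3d}% busy -> {response:7.1f} ms average response time")
```

Under this model, response time has already doubled at 50% utilization and is ten times the unloaded service time at 90%, which is why chasing high utilization numbers hurts interactive workloads long before the resource is "full."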

Driving the utilization of an infrastructure resource as high as possible by adding load, especially to please accountants focused on utilization metrics, eventually becomes counter-productive in terms of IT performance management.

It is true that batch workloads are measured more by throughput, which does max out at maximum utilization. However, response time is critical for any interactive workload. And in today’s fast-data world, we process more sources and streams of data in near real time by and for interactive operations and applications. Big data today is about bringing as much intelligence as possible as far forward as possible to the operational edge.

Cloud and containers each change IT performance management in distinct ways, and while change can be scary, there are ways for IT admins to ensure performance stays within an acceptable range.

Prized pets vs. cattle herds

Cloud-app designers tend to treat portions of critical code more like fungible cattle — and less like the historical pets they used to be in the silos of traditional data center infrastructure. We used to carefully monitor and manage mission-critical applications and their resources; any deviation from normal would be triaged immediately. With many of today’s cloud apps, deviation in performance will result in the app being re-provisioned with a new cloud instance that should perform better.

But not necessarily. To understand this, let’s look at virtualization, which has brought many benefits, but not always better performance. In a virtualized host, guaranteed actual response time performance has always been a problem.

While a virtual admin can assign a quota of host resources (in terms of utilization) to a given VM, each host resource by definition is shared by many VMs at the same time. Once we understand that response time performance is nonlinear with respect to total system utilization, we can immediately see how the noisy neighbor problem arises on heavily utilized virtual servers — even if our critical VM has a guaranteed slice of utilization.

As an example, consider how all of the VMs on a given host server have a guaranteed slice of capacity. If enough VMs use their capacity at the same time to drive total utilization of the server above 50%-60%, response time will degrade for all of them. Over a certain threshold of utilization far less than 100%, the underlying server resource still has remaining capacity, but experienced performance can degrade by half. As utilization approaches 100%, responsiveness can degrade to the point where little actual work is even getting through the system.
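The noisy neighbor effect can be sketched with a toy model. It assumes the shared host behaves like a single M/M/1 queue, so the slowdown relative to an idle host is 1 / (1 - U) at total utilization U. That is a deliberate simplification of a real hypervisor scheduler, and the VM names, the 25% slices and the demand figures are hypothetical.

```python
from typing import Mapping


def host_utilization(vm_demands: Mapping[str, float]) -> float:
    """Total host utilization is the sum of what each VM actually consumes."""
    return sum(vm_demands.values())


def slowdown_factor(utilization: float) -> float:
    """Response-time multiplier versus an idle host (assumed M/M/1 model)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


# Hypothetical host: four VMs, each guaranteed a 25% slice of the CPU.
# The critical VM uses the same 10% in both scenarios.
quiet = {"critical-vm": 0.10, "vm-b": 0.05, "vm-c": 0.05, "vm-d": 0.05}
busy = {"critical-vm": 0.10, "vm-b": 0.25, "vm-c": 0.25, "vm-d": 0.20}

for label, demands in (("quiet", quiet), ("busy", busy)):
    total = host_utilization(demands)
    print(f"{label}: host {total:.0%} busy, slowdown x{slowdown_factor(total):.1f}")
```

The critical VM never exceeds its guaranteed slice, yet in this model its slowdown factor rises from roughly 1.3 to roughly 5 purely because its neighbors got busy.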

If we think of clouds, public or private, as large virtual server farms, we can see why a cloud machine instance may not always provide the performance we deserve. The cloud service provider promises a certain amount of resource utilization when we put in our credit card number and check out a cloud server. The cloud provider, however, does not generally certify that your particular machine instance will not be cohosted with many other competing instances. This means that during busy times, many hosted machine instances won’t provide the same level of performance as when their underlying cloud infrastructure is less than half utilized.

Fundamentally, clouds are cost-efficient because they pool and share infrastructure as widely as possible. A cloud service provider is economically incentivized to stuff as many virtual instances as possible into a given cloud infrastructure footprint. In fact, one of the key areas of profit margin for a cloud provider is in being able to oversubscribe real infrastructure as much as possible across multiple tenants, knowing statistically that many machine instances much of the time are highly underutilized, if utilized at all.

Thus, web app administrators and clever DevOps folks treat their cloud applications more like cattle. They architect their web applications in a distributed fashion across many machine instances such that if any one machine instance within that pool ever suffers slow performance, they simply kill it and restart it. When your service provider is large enough, the restart operation almost guarantees that the new instance will be created in a different area of the cloud infrastructure, away from its previously noisy neighbors. It’s worth noting that this cattle approach might not work so well on less-expansive private clouds.
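The kill-and-restart policy can be sketched as a simple selection rule. This is only one plausible way to decide which instances count as "slow": it flags any instance whose latency exceeds a multiple of the pool median. The function name `instances_to_replace`, the 2x tolerance and the sample latencies are assumptions for illustration, and the provider-specific terminate and relaunch calls are deliberately left out.

```python
from statistics import median
from typing import Dict, List


def instances_to_replace(latency_ms: Dict[str, float], tolerance: float = 2.0) -> List[str]:
    """Return IDs of instances whose latency exceeds `tolerance` times the pool median."""
    if not latency_ms:
        return []
    pool_median = median(latency_ms.values())
    return sorted(
        instance_id
        for instance_id, latency in latency_ms.items()
        if latency > tolerance * pool_median
    )


# Hypothetical latency samples (ms) for a pool of interchangeable web instances.
pool = {"web-1": 42.0, "web-2": 39.0, "web-3": 140.0, "web-4": 45.0}
print(instances_to_replace(pool))
```

Comparing against the pool median rather than a fixed threshold keeps the rule meaningful when the whole pool slows down together, a case that restarting individual instances would not fix anyway.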

Containerized visibility

With containerized, microservices-heavy applications, performance can be even more opaque. A single microservice by original definition simply doesn’t last long even if its performance is lousy. With a massively containerized application, we might only see poor performance in the aggregate end result. And because microservices can be ephemeral, we can’t really manage them as either pets or cattle.

When we had pets assigned to their own isolated infrastructure, end-to-end infrastructure performance management tools enabled IT admins to identify and correct obvious performance problems. While virtualization began to muddy IT performance management, there were still effective ways to correlate application performance with virtualized infrastructure. But once we move our applications to a public cloud, managing for top-notch performance becomes more of a statistical cat-and-mouse game. And now with the rise of containers, managing performance is an even greater challenge.

The good news is that, with container architectures, we can readily add performance instrumentation at a very fine-grained level within our application. Given a new crop of highly scalable and responsive management tools, it should be possible to shepherd flocks of containers to greener-performing pastures using clever IT operations automation (likely based on effective use of machine learning).

The real trick for a competitive technology organization will be to proactively, if not predictably and continuously, achieve high performance at the same time it implements a deliberately chosen cost or spend policy. This balancing act in some ways gets harder with cloud and containers — because of increased opaqueness and scale — but also easier — because of distributed data and processing technologies.

…(read the complete as-published article there)

Big data processing could be the way all data is processed

An IT industry analyst article published by SearchITOperations.

Some organizations take their time with new technologies to let first adopters suffer the growing pains. But there’s no treading water in the big data stream; the current won’t wait.

Mike Matchett
Small World Big Data

Have you noticed yet? Those geeky big data platforms based on clusters of commodity nodes running open source parallel processing algorithms are evolving into some seriously advanced IT functionality.

The popular branded distributions of the Apache projects, including Hortonworks, Cloudera and MapR, are no longer simply made up of relatively basic big data batch query tools, such as Hadoop MapReduce, the way they were 10 years ago. We’ve seen advances in machine learning, SQL-based transaction support, in-memory acceleration, interactive query performance, streaming data handling, enterprise IT data governance, protection and security. And even container services, scheduling and management are on a new level. Big data platforms now present a compelling vision for the future of perhaps all IT data processing.

Wait — do I really mean all IT data center processing will be big data processing? Most of us are just getting used to the idea of investing in and building out functional data lakes to capture and collect tons of unstructured data for business intelligence tasks, offline machine learning, active archive and other secondary data applications. And many are having a hard time making those data lake initiatives successful. It’s a challenge to develop staff expertise, assure data provenance, manage metadata and master implied schemas, i.e., creating a single version of truth.

Many organizations may be waiting for things in the big data market to settle out. Unfortunately, especially for those more comfortable being late adopters, big data processing technology development is accelerating. We see use cases rapidly proliferate, and general IT manageability of big data streams (easing adoption and integration) greatly increase.

The universal big data onslaught is not going to slow down, nor will it wait for slackers to catch up. And those able to harness their big data streams today aren’t just using them to look up old baseball stats. They are able to use data to improve and accelerate operations, gain greater competitiveness and achieve actual ROI. I’m not even going to point out the possibility that savvy big data processing will uncover new revenue opportunities and business models. Oops, just did!

If you think you are falling behind today on big data initiatives, I’d recommend you consider doubling down now. This area is moving way too fast to jump on board later and still expect to catch competitors. Big data is proving to be a huge game changer. There simply won’t be a later with big data.

I’ve written before that all data is eventually going to be big data. I’ll now add that all processing is eventually going to be big data processing. In my view, the focus of big data technology has moved from building out systems of insight over trailing big data sets to now offering ways to build convergent systems of action over all data.

In other words, big data isn’t just for backroom data science geeks. The technologies involved are going to define the next-generation IT data center platform…(read the complete as-published article there)

Serverless technology obfuscates workflows, performance data

An IT industry analyst article published by SearchITOperations.

Serverless and microservices reshape the application stack into something that looks like a swath of stars in the sky. How do you find a slow, misconfigured component in this interconnected galaxy?

Mike Matchett
Small World Big Data

I’m hearing that IT infrastructure is dead. And who needs it anymore, really? The future is about moving up the stack to microservices and serverless technology, as we continue to abstract, embed and automate away all the complexities of explicit infrastructure layers, such as storage arrays and physical servers.

On-premises, Capex-style IT is shrinking, while rented and remotely managed hardware and cloud transformation set new standards for modern IT. All the cool kids use end-to-end orchestration, advanced machine learning, real-time management data streams, microservices architecture and insanely scalable container environments. And now we even have serverless computing, sometimes called function as a service (FaaS).

But can we have computing without the server? And where did the server go?

Serving more with serverless technology

There is a certain hazard in my life that comes from telling non-IT people that, as an IT industry analyst, I explore and explain technology. I’m asked all the time, even by my mom, questions like, “I suppose you can explain what the cloud is?”

I tend to bravely charge in, and, after a lot of at-bats with this question, I’ve got the first 25 seconds down: “It’s like running all your favorite applications and storing all your data on somebody else’s servers that run somewhere else — you just rent it while you use it.” Then I lose them with whatever I say next, usually something about the internet and virtualization.

The same is mostly true with serverless computing. We are just moving one more level up the IT stack. Of course, there is always a server down in the stack somewhere, but you don’t need to care about it anymore. With serverless technology in the stack, you pay for someone else to provide and operate the servers for you.

We submit our code (functions) to the service, which executes it for us according to whatever event triggers we set. As clients, we don’t have to deal with machine instances, storage, execution management, scalability or any other lower-level infrastructure concerns.

The event-driven part is a bit like how stored procedures acted in old databases, or the way modern webpages call in JavaScript functions, hooked to and fired off in response to various clicks and other web events. In fact, AWS Lambda, a popular serverless computing service, executes client JavaScript functions, likely running Node.js in the background in some vastly scalable way.
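The submit-a-function, bind-it-to-a-trigger model can be sketched with a toy in-process registry. This is not the AWS Lambda API or any real provider's interface. The names `on`, `fire` and `make_thumbnail`, and the `file.uploaded` event, are hypothetical, and `fire` merely stands in for the platform that would normally invoke the function on someone else's servers.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], Any]
_handlers: DefaultDict[str, List[Handler]] = defaultdict(list)


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register a function to run whenever `event_type` fires."""
    def register(func: Handler) -> Handler:
        _handlers[event_type].append(func)
        return func
    return register


def fire(event_type: str, payload: Dict[str, Any]) -> List[Any]:
    """Stand-in for the platform: run every function bound to the event."""
    return [handler(payload) for handler in _handlers[event_type]]


@on("file.uploaded")
def make_thumbnail(event: Dict[str, Any]) -> str:
    # The client writes only this function; no server details appear here.
    return f"thumbnail for {event['name']}"


print(fire("file.uploaded", {"name": "cat.png"}))
```

Everything the client owns is the decorated function and its trigger name. Scaling, placement and execution management all sit behind the platform's equivalent of `fire`.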

Look ma, no server!

We need to tackle several issues to ready serverless technology for primetime enterprise use. The first is controlling complexity…(read the complete as-published article there)

What’s our future if we don’t secure IoT devices?

An IT industry analyst article published by SearchITOperations.

When everything from the coffee maker to the manufacturing plant’s robots to the electric grid is connected, shouldn’t security be IT’s primary concern?

Mike Matchett
Small World Big Data

I was recently asked about the most pressing IT challenge in 2018. At first, I was going to throw out a pat answer, something like dealing with big data or finally deploying hybrid cloud architecture. But those aren’t actually all that difficult to pull off anymore.

Then I thought about how some people like to be irrationally scared about the future, and of bogeymen like artificial intelligence in particular. But AI really isn’t the scary part. It’s the blind trust we already tend to put into black-box algorithms and short-sighted local optimizations that inevitably bring about unintended consequences. We should be much more afraid of today’s human ignorance than tomorrow’s AI.

Instead, what I came up with as the hard, impending problem for IT is how to adequately secure the fast-expanding internet of things. To be clear, I interpret IoT rather broadly to include existing mobile devices — e.g., smartphones that can measure us constantly with multiple sensors and GPS — connected consumer gadgets and household items, and the burgeoning realm of industrial IoT.

The rush to secure IoT devices isn’t just about your personal things, as in the risk of someone hacking your future driverless car. The potential scope of an IoT security compromise is, by definition, huge. Imagine every car on the road hacked — at the same time.

IoT exploits could also go wide and deep. Sophisticated compromises could attack your car, your phone, your home security system, your pacemaker and your coffeepot simultaneously. Imagine every coffee machine out of service on the same morning. We haven’t even begun to outline the potential nightmare scenarios caused by insecure IoT devices. And I sure hope Starbucks is keeping some analog percolators on standby.

If personal physical danger isn’t scary enough, think about the ease with which a single penetration of a key connected system could cause a nationwide or even global disaster. For example, the 2003 cascading power outage that affected over 50 million people across the northeastern United States and Ontario was traced in part to a single alarm system software failure. An inability to recover or reset something easily at that scale could push one into imagining a truly dystopian future.

Vulnerable with a capital V

What worries me more than the possibility of a large, direct attack is the very real likelihood of slow, insidious, creeping subversion, achieved through IoT device security breaches. And not just by one party or a single bad actor, but by many competing interests and organizations over time, some with supposedly good intentions.

We will make mistakes, take shortcuts and ignore vulnerabilities until it’s too late.

The total IoT attack surface will be too large to keep everything fully secured…(read the complete as-published article there)

Reap IT automation benefits in every layer of the stack

An IT industry analyst article published by SearchITOperations.

Automation technologies create an artificial brain for IT operations, but that won’t turn skilled admins and engineers into zombies — far from it.

Mike Matchett
Small World Big Data

As a technology evangelist and professional IT systems optimizer, I see the benefits of IT automation and can only champion trends that increase it. When we automate onerous tasks and complex manual procedures, we naturally free up time to focus our energies higher in the stack. Better and more prevalent automation increases the relative return on our total effort so that we each become more productive and valuable. Simply put, IT automation provides leverage. So it’s all good, right?

Another IT automation benefit is that it captures, encapsulates and applies valuable knowledge to real-world problems. And actually, it’s increasingly hard to find IT automation platforms that don’t promote embedded machine learning and artificially intelligent algorithms. There is a fear that once our hard-earned knowledge is automated, we’ll no longer be necessary.

So, of course, I need to temper my automation enthusiasm. Automation can eliminate low-level jobs, and not everyone can instantly adjust or immediately convert to higher-value work. For example, industrial robots, self-driving cars or a plethora of internet of things (IoT)-enabled devices that cut out interactions with local retailers all tend to remove the bottom layer of the related pyramid of available jobs. In those situations, there will be fewer, more-utilized positions left as one climbs upward in skill sets.

Still, I believe automation, in the long run, can’t help but create even more pyramids to climb. We are a creative species after all. Today, we see niches emerging for skilled folks with a combination of internal IT and, for example, service provider, high-performance computing, data science, IoT and DevOps capabilities.

Automation initiatives aren’t automatic

A service provider has a profit motive, so the benefit of IT automation is creating economies of scale. Those, in turn, drive competitive margins. But even within enterprise IT, where IT is still booked as a cost center, the drive toward intelligent automation is inevitable. Today, enterprise IT shops, following in the footsteps of the big service providers, are edging toward hybrid cloud-scale operations internally and finding that serious automation isn’t a nice-to-have, but a must-have.

If one squints a bit, almost every IT initiative aims to increase automation. Most projects can be sorted roughly into these three areas with different IT automation benefits, from cost savings to higher uptime:

  • Assurance. Efforts to automate support and help desk tasks, shorten troubleshooting cycles, shore up security, protect data, reduce outages and recover operations quickly.
  • Operations. Necessary automation to stand up self-service catalogs, provision apps and infrastructure across hybrid and multi-cloud architectures to enable large-scale operations, and orchestrate complex system management tasks.
  • Optimization. Automation that improves or optimizes performance in complex, distributed environments, and minimizes costs through intelligent brokering, resource recovery and dynamic usage balancing.

Automation enablers at large

Successful automation initiatives don’t necessarily start by implementing new technologies like machine learning or big data. Organizational commitment to automation can drive a whole business toward a new, higher level of operational excellence…(read the complete as-published article there)