Protecting All Data in a Complex World

(Also posted on Cobalt Iron’s blog)

Are you actually protecting all your important data today? Not “almost,” “just about,” or “we have plans,” but all of it?

And I don’t mean just your most mission-critical data sets, but any and all of your data – critical/operational, analytical, and even archival data that could be important to anyone (or any process) in your organization. The sad truth is that very few enterprise IT shops can claim they provide adequate protection for all their important data, much less rock-solid, business-enhancing protection services even for their mission-critical data.

Why? First, IT seems to grow more complex every day. Our high-tech IT architecture keeps evolving – stack convergence, hybrid cloud operations, multi-cloud brokering, distributed and mobile users, edge computing, and more.

Second, data growth is not quite totally out of control, but only because we can keep only what we can actually store. With the Internet of Things streaming more data every day, machine learning algorithms feeding on ever-longer tails of detailed history, and demanding users expecting ever more active archives, both the available data and the demand for more of it are increasing non-linearly quarter by quarter.

And third, businesses change. Mergers and acquisitions add new layers of data center complexity. Corporations upsize, downsize, globalize, reorganize and even evolve new ways to conduct business.

It’s no wonder that we’ve outgrown the older generation of increasingly fragile data protection solutions. You might pretend that keeping up is just a matter of buying more backup licenses, hiring more IT people, and finally getting that extra budget increase this year (unlike every past year). But the truth is that what we are doing today isn’t working and isn’t going to work ever again.

It used to be simple: have an admin back up the main file system once a week to some deep storage repository and be done with it. Then we added virtualization, deduplicating target storage, incremental backups, and remote cloud repositories. Swizzle in a growing morass of compliance and regulatory requirements and yesterday’s solutions become overwhelmingly difficult to maintain, much less able to protect all those new applications that harness big data, leverage cloud processing, and deliver end-user experiences to mobile devices. (Note – change is hard! I can still call on old muscle memory to type “tar -cvf - backup | gzip >> /dev/tape” without thinking.)

In fact, we’ve outgrown many generations of data protection solutions, and yet we sometimes expect all of them to keep working reliably forever. How many different types of protection software do you have deployed? How many different skill sets does that require? Can you upgrade and patch all those older solutions in a timely manner? And I’d still bet you have major gaps in protection coverage, have blown-out backup windows, and are unsure whether a timely restore of some systems is actually possible.
Yet a New Hope

But there is always hope. While many of the changes listed above make assuring protection more complex, there are new approaches that can also help make things simple again. We know how we got here, but how do we get out of this morass?

Fundamentally we have to recognize and accept that all applications (and thus most if not all data) are becoming critical to the business. People depend on data and applications today for almost every aspect of their jobs. If there are still manual process fallbacks, no one quite remembers. Those kinds of business continuity plans are simply no longer realistic. We have built the basis of our brave new high-tech world on increasing task automation and the enhancement of human capabilities through intelligent applications. All of that data deserves protection.

The rise of pervasive, intelligent, and automated applications is not just a growing data protection problem for IT. Luckily, that trend also provides big clues for how to solve today’s data protection challenges.

There is great hope now available that combines automation, machine learning, and managed services. None of these things really offers a better mousetrap as it were, but when applied smartly together they can provide a very practical (and actually affordable) assurance that all those naughty mice actually get caught.

I can say this with some conviction because I was recently introduced to Cobalt Iron and their latest Adaptive Data Protection release (ADP 4.1 as of this writing). They have successfully encapsulated a massive amount of field experience into their data protection service, eliminating a tremendous amount of complexity and low-level skill requirement (what I might call technical trivia) on behalf of their clients. It’s all about advanced automation, applied intelligence, and leveraging computers to help people better manage all their data.

How do trends in primary storage affect secondary storage?

I caught up with Steve Pao at Igneous recently to chat again about the rise of secondary storage. Primary storage is great and all, but flash is everywhere – it’s old news. In secondary storage, we are seeing a lot happening with scale, functionality, hybridization, and built-in data protection services.

How do trends in primary storage affect secondary storage? (here with full transcript)

Why is the non-RDBMS popular?

OK, this is a softball question for anyone inside the big data space. However, it does come up as a real question when talking to business owners and application users. This is a quick, high-level description of NoSQL/NewSQL, why it’s different from a SQL RDBMS, and where I think both of those markets are going.

Why is the non-#RDBMS becoming so popular? #NoSQL #database (with transcript)

Scalable Persistent Storage for Kubernetes Using MapR

Lots of storage solutions can claim to provide adequate container storage when there are tens or hundreds of containers, but what are you going to do when you really need to push the “go” button on your next-gen apps and spin up hundreds of thousands of containers across a hybrid cloud architecture?

MapR just introduced a very compelling container solution, of course leveraging the highly scalable and production-proven MapR platform. The big data storage layer in MapR is already able to handle trillions of objects/files/tables/streams (hey, it’s big data AND POSIX compliant AND…) in a highly scalable (and enterprise-y) manner.

In this short video bit just released on Truth In IT (with transcript), I interview Jack Norris from MapR about the new MapR for Kubernetes solution, announced yesterday.

Hanging Out, Having Fun with Data Management

Occasionally James Maguire over at Datamation invites me onto one of his hot topic panel sessions. There are usually a couple of vendors who really know what they are talking about and, of course, James who always does a great job slinging questions around. I always think that I might be getting invited because they need some comic relief — but then I’m not *that* funny.

Anyway, data management can be a pretty dry topic, but this video provides an easily watchable (and relatively juicy) overview. You will hear how today’s modern data management solutions aim to both establish rock-solid governance over ever-larger amounts of on-line data AND ensure that organizations can squeeze maximum value out of that spreading data pile.

It’s worth watching just in case someone drops the phrase “macro versus micro-infonomics” at your next geeky get-together…

https://www.datamation.com/events/allevents/challenges-in-data-governance.html

James Maguire, Datamation
Stephanie McReynolds, Alation
Danny Sandwell, Erwin
and me, Mike Matchett, Small World Big Data


Secondary Storage in a Primary Role!?

Hey all! – This is the first of what I hope will be many little topical quick video segments, working with Dave Littman over at Truth In IT to get them recorded, produced and published.

In this one we discuss what’s going on with secondary storage these days, and how it’s perhaps more interesting than all that commodity-ish “all-flash” primary storage out there.

Can secondary storage play a primary role in the datacenter?

(gotta love the YouTube freeze frame that catches us both with eyes closed!)

Follow That Transaction! APM Across Clouds

It’s one thing to stand up a new cloud-native application; it’s quite another to manage its end-to-end performance, and it’s effectively impossible using the same approaches and tooling we’ve long used in the data center.

I’m hearing a lot about transitional “new” cloud-native applications that actually combine and span layers of existing persistent storage, traditional data stores, key legacy application functionality hosted in VMs, and containerized new development. Existing back-end “stuff with APIs” can now be readily layered over and extended by thousands (hundreds of thousands?) of microservices running web-like across hybrid and multi-cloud platform hosting. Even the idea of what makes up any particular application can get pretty fuzzy.

While there are certainly security, data protection, and availability/resilience concerns, the problem we are talking about today is that when you pile up that much complexity and scale, assuring production performance can become quite a challenge.

Transactional History in Performance Management

Performance management includes monitoring targeted service levels, but it should also provide ways to identify both sudden and creeping problems, troubleshoot down to root cause (and then help remediate in situ), optimize bottlenecks to provide better service (an endless task, because there is always a “next” longest pole in the tent), and plan and predict for possible changes in functionality, usage, and resources (capacity planning in the cloud era).

I spent many years working for one of the so-called “big 4” system management companies implementing infrastructure capacity planning and performance management (IPM) solutions in Fortune 500 company data centers. With a little operational queuing modeling work on rather monolithic workloads (mainframe, AS400, mid-range UNIX…), we could help steer multi-million dollar IT buys towards the right resources that would solve today’s problems and assure future performance.

A core concept is the idea of the mythical “workload transaction” as the unit of application work. In those days, at least for capacity planning, we could get away with a statistical transaction unit of work. For example, we’d observe a certain amount of active usage on a given system in terms of its CPU utilization, memory, IO, and so on, and then divide those metrics by an arbitrary usage metric (e.g. number of known users, number of IOs written, processes forked, forms processed, function points, or the default generic CPU-second itself). This statistical modeling approach worked remarkably well in helping right-size, right-time, and right-host infrastructure investments.
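
To make that concrete, here’s a minimal sketch of the statistical-transaction arithmetic – all the numbers and metric names below are invented for illustration, not taken from any real modeling tool:

```python
# Toy "statistical transaction" calculation: divide observed resource
# consumption by an arbitrary usage metric to get a per-unit cost, then
# project resource demand at a forecast usage level. All values invented.

observed = {"cpu_seconds": 5400.0, "io_writes": 2.1e6, "mem_gb_hours": 96.0}
usage = 45_000                 # e.g. forms processed in the observation window

per_transaction = {k: v / usage for k, v in observed.items()}

forecast_usage = 70_000        # expected forms processed next quarter
projected = {k: v * forecast_usage for k, v in per_transaction.items()}

print("per-'transaction' cost:", per_transaction)
print("projected demand at forecast usage:", projected)
```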

However, this approach only went so far when it came to troubleshooting or optimizing within an application. We could readily look at application behavior in some kind of aggregate way, maybe isolating behavior down to a specific observed process (or set of identified processes). You could even in some cases get developers to add some instrumentation (anybody remember ARM?) into key application code to count and report on arbitrary app-specific transaction counts. Of course, this was rarely achievable in practice (most business-critical code was third-party, and the painful performance problems that needed solving “fast” were already in production).

If you needed to go inside the app itself, or track individual transactions across a distributed system (classically a three-tier presentation/business logic/database architecture), you needed application insight from another set of tools that came to be called Application Performance Management (APM). APM solutions aimed to provide performance insight into application-specific transaction “definitions”. Instrumentation for transaction tracking was often “inserted” early in the app development process, which of course requires some up-front discipline. Alternatively, a non-intrusive (but in many ways halfway) approach might capture network traffic and parse it (with deep packet inspection, or DPI) to produce information on transactional workflow and sometimes drill down to identify individual transactions flowing between systems.

Hybrid Containerized PM

It’s practically impossible to follow a unique transaction across today’s potentially huge web of containerized microservices just by observing from the outside. I think of it visually as similar to how our neurons theoretically fire and cascade in the brain – an overlapping mesh of activity. We can see behavior in aggregate easily enough, but tracking what goes into each unique transaction?

First we need to realize that transaction workflow in this kind of environment is naturally complex. Application devs (and third-party services) can implement messaging buses and delivery queues, make synchronous calls while at the same time firing asynchronous events and triggers, span arbitrarily long pauses (to account for human interactions like web page interaction), cause large cascades, aggregate behavior (trigger some X every 10 Y’s), and so on.

The only real approach to tracking unique transactions is still through instrumentation. Luckily there is now a “tracing” standard (see the OpenTracing project). But tracing is even more challenging at large scale (and across dynamic and abstracted platform hosting). How much constant instrumentation data (and how fast) can something like Splunk take in from hundreds of thousands of microservices, and how much will that cost? This can easily be a case where performance measurement uses as much or more resource than the app itself.
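
To ground what “instrumentation” means here, below is a minimal sketch of OpenTracing-style tracing in Python. The service and operation names are made up, and with no tracer registered the library falls back to its no-op tracer, so it runs as-is; a real deployment would install a concrete tracer (Jaeger, a vendor agent, etc.) at startup:

```python
# Minimal sketch of OpenTracing-style instrumentation (not any vendor's API).
import opentracing

tracer = opentracing.global_tracer()

def handle_checkout(order_id):
    # One span per unit of work; child spans nest under the active span,
    # which is how a single transaction gets stitched together across calls.
    with tracer.start_active_span("checkout") as scope:
        scope.span.set_tag("order.id", order_id)
        reserve_inventory(order_id)
        charge_payment(order_id)

def reserve_inventory(order_id):
    with tracer.start_active_span("reserve_inventory"):
        pass  # call the inventory microservice here

def charge_payment(order_id):
    with tracer.start_active_span("charge_payment"):
        pass  # call the payment microservice here

handle_checkout("order-123")
```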

To tackle this, there are some folks rolling out practical tracing services designed to handle both the distributed complexity and the huge scales involved. This last week LightStep rolled out of stealth (founder Ben Sigelman was instrumental in OpenTracing 🙂 ). LightStep [x]PM, a managed service offering that incurs minimal performance analysis overhead on site, provides 100% transaction tracing at scale by doing some sophisticated sampling during aggregation/monitoring while preserving full tracing info for immediate audit/drill-down. LightStep already has some impressively large-scale use cases stacked up.
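
To give a feel for the general trick of mixing aggregation with selective full-fidelity retention – and to be clear, this is a toy illustration of the concept, not LightStep’s actual design – here’s a sketch where every trace updates aggregate stats, while full detail is kept only for a bounded sample plus anything slow or errored:

```python
# Toy trace collector: aggregate everything, keep full detail for a sample.
import random

class TraceStore:
    def __init__(self, sample_capacity=1000, slow_ms=500):
        self.sample_capacity = sample_capacity
        self.slow_ms = slow_ms
        self.reservoir = []      # uniformly sampled "normal" traces
        self.interesting = []    # slow or errored traces, always retained
        self.seen_normal = 0
        self.count = 0
        self.total_ms = 0.0

    def record(self, trace):
        # Aggregate stats use every trace, so dashboards stay accurate.
        self.count += 1
        self.total_ms += trace["duration_ms"]

        # Slow or failed traces are always kept in full for drill-down.
        if trace.get("error") or trace["duration_ms"] > self.slow_ms:
            self.interesting.append(trace)
            return

        # Everything else goes through classic reservoir sampling.
        self.seen_normal += 1
        if len(self.reservoir) < self.sample_capacity:
            self.reservoir.append(trace)
        else:
            j = random.randrange(self.seen_normal)
            if j < self.sample_capacity:
                self.reservoir[j] = trace

store = TraceStore()
store.record({"trace_id": "abc", "duration_ms": 42.0})
store.record({"trace_id": "def", "duration_ms": 900.0, "error": True})
print(store.count, store.total_ms / store.count, len(store.interesting))
```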

FaaS Performance Management

This of course is not the end of the transactional tracing saga. I’ve written before about Fission, a developing open source Function as a Service (FaaS) layer on top of Kubernetes. That project has now started on a next layer called Fission Workflow, which uses a YAML-like blueprint file to declare and stitch together functions into larger workflows (compare to AWS Step Functions). I think workflows of functions will naturally correspond to interesting “application” transactions.

And FaaS workflows could very well be the future of application development. Each function runs as a container, but by using something like Fission the developer doesn’t need to know about containers or container management. And when it comes to generating performance insight across webs of functions, the Fission Workflow engine itself can (or will) explicitly track transactions wherever they are defined to flow (tracing state/status, timing, etc.).

[check out this interesting Fission Workflow work in progress page for some categorization of the complexity for tracking async “waiting” workflows…]

This immediately makes me want to collect Fission Workflow data into something like Cassandra and play with subsets in Spark (especially graph-structured queries and visualization). There are a lot of new frontiers here to explore.
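
If I ever get to that experiment, it might look something like the following sketch. It assumes the DataStax spark-cassandra-connector and the GraphFrames package are on the Spark classpath, and the keyspace, table, and column names are invented purely for illustration:

```python
# Speculative sketch: pull workflow trace events out of Cassandra and
# explore them as a call graph in Spark. Schema names are hypothetical.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("fission-workflow-traces").getOrCreate()

# Each row: one observed hop of a workflow transaction between two functions.
events = (spark.read.format("org.apache.spark.sql.cassandra")
          .options(keyspace="tracing", table="workflow_events")
          .load())

# Vertices are functions, edges are calls between them.
vertices = (events.selectExpr("src_function as id")
            .union(events.selectExpr("dst_function as id"))
            .distinct())
edges = events.selectExpr("src_function as src",
                          "dst_function as dst",
                          "duration_ms")

g = GraphFrame(vertices, edges)

# Which function-to-function hops dominate latency?
(g.edges.groupBy("src", "dst").avg("duration_ms")
 .orderBy("avg(duration_ms)", ascending=False).show(10))

# Motif query: three-step call chains, for a quick look at fan-out patterns.
g.find("(a)-[e1]->(b); (b)-[e2]->(c)").show(10)
```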

I can’t wait to see what comes next!


What’s a Software Defined Data Center? – Pensa Aims Really High

This week Pensa came out of their stealthy development phase to announce the launch of their company and their Pensa Maestro cloud-based (SaaS) platform, accessible today through an initial service offering called Pensa Lab. The technology here presents a great opportunity, and importantly the team at Pensa is staffing up with the best folks (I used to work for Tom Joyce).

I’m not sure we analysts have firmed up all the words to easily describe what they do yet, but basically Pensa provides a way to define the whole data center in code, validate it as a model, and then pull a trigger and aim it at some infrastructure to automatically deploy it. Data centers on demand! Of course, doing all the background transfigurations to validate and actually deploy this über level of complexity and scale requires big smarts – a large part of the magic here is some cleverly applied ML algorithms to drive the required transformations, enforce policies, and set up SDN configurations.

What is Software Defined?

So let’s back up a bit and explore some of the technologies involved. One of the big benefits of software and software-defined resources is that they can be spun up dynamically (and readily converged within compute hosts alongside applications and other software-defined resources). These software-side “resources” are usually provisioned and configured through editable model/manifest files or templates – so-called “infrastructure as code”. Because they are implemented in software, they are often also dynamically reconfigurable and remotely programmable through APIs.
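
To make the “editable model file” idea concrete, here’s a deliberately generic toy sketch – not Pensa’s actual model format, and the resource kinds and fields are invented – of a declared infrastructure model and a tiny reconciler that plans the changes needed to converge actual infrastructure toward it:

```python
# Generic "infrastructure as code" toy: resources declared as data, plus a
# reconciler that diffs desired vs. observed state and plans actions.

desired = {
    "networks": {"app-net": {"cidr": "10.10.0.0/24"}},
    "volumes":  {"db-vol": {"size_gb": 500, "replicas": 2}},
    "vms":      {"web-01": {"cpus": 4, "ram_gb": 16, "network": "app-net"}},
}

actual = {
    "networks": {"app-net": {"cidr": "10.10.0.0/24"}},
    "volumes":  {"db-vol": {"size_gb": 250, "replicas": 2}},
    "vms":      {},
}

def plan(desired, actual):
    """Yield (action, kind, name, spec) tuples needed to reach the desired state."""
    for kind, items in desired.items():
        for name, spec in items.items():
            current = actual.get(kind, {}).get(name)
            if current is None:
                yield ("create", kind, name, spec)
            elif current != spec:
                yield ("update", kind, name, spec)
    for kind, items in actual.items():
        for name in items:
            if name not in desired.get(kind, {}):
                yield ("delete", kind, name, None)

for action in plan(desired, actual):
    print(action)   # a real engine would call provider APIs here
```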

Application Blueprinting for DevOps

On the other side of the IT fence, applications are increasingly provisioned and deployed dynamically via recipes or catalog-style automation, which in turn rely on internal application “blueprint” or container manifest files that can drive automated configuration and deployment of application code and needed resources, like private network connections, storage volumes and specific data sets. This idea is most visible in new containerized environments, but we also see application blueprinting coming on strong for legacy hypervisor environments and bare metal provisioning solutions too.

Truly Software Defined Data Centers

If you put these two ideas together (software-defined resources and application blueprinting), you might envision a truly software-defined data center, describable fully in code. With some clever discovery solutions, you can imagine that an existing data center could be explored and captured/documented into a model file describing a complete blueprint for both infrastructure and applications (and the enterprise services that wrap around them). Versions of that data center “file” could be edited as desired (e.g. to make a test or dev version), with the resulting data center models deployable at will on some other actual infrastructure – like “another” public cloud.

Automation of this scenario requires an intelligent translation of high-level blueprint service and resource requirements into practical provisioning and operational configurations on specific target infrastructure. But imagine being able to effectively snapshot your current data center top to bottom, and then be able to deploy a full, complete copy on demand for testing, replication, or even live DR (we might call this a “live re-inflation DR”, or LR-DR, scenario).

Of course, today’s data center is increasingly hybrid/multi-cloud, consisting of a mix of physical, virtual machine, and containerized apps and corporate data. But through emerging cutting-edge IT capabilities like hybrid-supporting software-defined networking and storage, composable bare-metal provisioning, virtualizing hypervisors and cloud-orchestration stacks, container systems, PaaS, and hybrid cloud storage services (e.g. HPE’s Cloud Volumes), it’s becoming possible to blueprint and dynamically deploy not just applications but, soon, the whole data center around them.

There is no way that VMware, whose tagline has been SDDC for some time, will roll over and cede the territory here completely to Pensa (or any other startup). But Pensa now has a live service out there today – and that could prove disruptive to the whole enterprise IT market.