Scalable Persistent Storage for Kubernetes Using MapR

Lots of storage solutions can claim to provide adequate container storage when there are tens or hundreds of containers, but what are you going to do when you really need to push the “go” button on your next-gen apps and spin up hundreds of thousands of containers across a hybrid cloud architecture?

MapR just introduced a very compelling container solution, of course leveraging the highly scalable and production-proven MapR platform. The big data storage layer in MapR is already able to handle trillions of objects/files/tables/streams (hey, it’s big data AND POSIX compliant AND…) in a highly scalable (and enterprise-y) manner.

In this short video bit just released on Truth In IT (with transcript), I interview Jack Norris from MapR about the new MapR for Kubernetes solution, announced yesterday.

Future of data storage technology: Transformational trends for 2018

An IT industry analyst article published by SearchStorage.


Risk-averse enterprises finally accepted the cloud in 2017, and we didn’t even notice. Expect the same for these data storage technology trends in the new year.

Mike Matchett
Small World Big Data

Sometimes big changes sneak up on you, especially when you’re talking about the future of data storage technology. For example, when exactly did full-on cloud adoption become fully accepted by all those risk-averse organizations, understaffed IT shops and disbelieving business executives? I’m not complaining, but the needle of cloud acceptance tilted over sometime in the recent past without much ado. It seems everyone has let go of their fear of cloud and hybrid operations as risky propositions. Instead, we’ve all come to accept the cloud as something that’s just done.

Sure, cloud was inevitable, but I’d still like to know why it finally happened now. Maybe it’s because IT consumers expect information technology will provide whatever they want on demand. Or maybe it’s because everything IT implements on premises now comes labeled as private cloud. Influential companies, such as IBM, Microsoft and Oracle, are happy to help ease folks formerly committed to private infrastructure toward hybrid architectures that happen to use their respective cloud services.

In any case, I’m disappointed I didn’t get my invitation to the “cloud finally happened” party. But having missed cloud’s big moment, I’m not going to let other obvious yet possibly transformative trends sneak past as they go mainstream with enterprises in 2018. So when it comes to the future of data storage technology, I’ll be watching the following:

Containers arose out of a long-standing desire to find a better way to package applications. This year we should see enterprise-class container management reach maturity parity with virtual machine management, without giving up any of the advantages containers have over VMs. Expect modern software-defined resources, such as storage, to be delivered mostly in containerized form. When combined with dynamic operational APIs, these resources will deliver highly flexible programmable infrastructures. This approach should enable vendors to package applications and their required infrastructure as units that can be redeployed — that is, blueprinted or specified in editable and versionable manifest files — enabling full environment and even data center-level cloud provisioning. Being able to deploy a data center on demand could completely transform disaster recovery, to name one use case.

Everyone is talking about AI, but it is machine learning that’s slowly permeating just about every facet of IT management. Although there’s a lot of hype, it’s worth figuring out how and where carefully applied machine learning could add significant value. Most machine learning is conceptually made up of advanced forms of pattern recognition. So think about where using the technology to automatically identify complex patterns would reduce time and effort. We expect the increasing availability of machine learning algorithms to give rise to new storage management processes: processes that can learn and adjust operations and settings to optimize workload services, quickly identify and fix the root causes of abnormalities, and broker storage infrastructure and manage large-scale data to minimize cost.

Management as a service (MaaS) is gaining traction when it comes to the future of data storage technology. First, every storage array seemingly comes with built-in call-home support, replete with management analytics and performance optimization. I predict the interval for most remote vendor management services will quickly drop from today’s daily batch to five-minute streaming. I also expect cloud-hosted MaaS offerings to be the way most shops manage their increasingly hybrid architectures, and many will start to shift away from the burdens of on-premises management software…(read the complete as-published article there)

Persistent data storage in containerized environments

An IT industry analyst article published by SearchStorage.


The most significant challenge to the rise of containerized applications is quickly and easily providing enterprise-class persistent storage for containers.

Mike Matchett

The pace of change in IT is staggering. Fast-growing data, cloud-scale processing and millions of new internet of things devices are driving us to find more efficient, reliable and scalable ways to keep up. Traditional application architectures are reaching their limits, and we’re scrambling to evaluate the best new approaches for development and deployment. Fortunately, the hottest prospect — containerization — promises to address many, if not all, of these otherwise overwhelming challenges.

In containerized application design, each individual container hosts an isolatable, and separately scalable, processing component of a larger application web of containers. Unlike monolithic application processes of the past, large, containerized applications can consist of hundreds, if not thousands, of related containers. The apps support Agile design, development and deployment methodologies. They can scale readily in production and are ideally suited for hosting in distributed, and even hybrid, cloud infrastructure.

Unfortunately, containers weren’t originally designed to implement full-stack applications or really any application that requires persistent data storage. The original idea for containers was to make it easy to create and deploy stateless microservice application layers on a large scale. Think of microservices as a form of highly agile middleware with conceptually no persistent data storage requirements to worry about.

Persistence in persisting

Because the container approach has delivered great agility, scalability, efficiency and cloud-readiness, and is lower-cost in many cases, people now want to use it for far more than microservices. Container architectures provide a far better way to build modern applications, so much so that we see many commercial software and systems vendors transitioning internal development to container form and even deploying containers widely, often without explicit end-user or IT awareness. It’s a good bet that most Fortune 1000 companies already host third-party production IT applications in containers, especially inside appliances, converged systems and purpose-built infrastructure.


You might find large, containerized databases and even storage systems. Still, designing enterprise persistent storage for these applications is a challenge, as containers can come and go and migrate across distributed and hybrid infrastructure. Because data needs to be mastered, protected, regulated and governed, persistent data storage acts in many ways like an anchor, holding containers down and threatening to reduce many of their benefits.

Container architectures need three types of storage…(read the complete as-published article there)

A serverless architecture could live in your data center

An IT industry analyst article published by SearchITOperations.


Just because you don’t see the server doesn’t mean it’s not there. Serverless frameworks are superseding containers, but is the extra abstraction worth it?

Mike Matchett

Have you figured out everything you need to know about managing and operating container environments already? How to host them in your production data centers at scale? Transform all your legacy apps into containerized versions? Train your developers to do agile DevOps, and turn your IT admins into cloud brokers? Not quite yet?

I hate to tell you, but the IT world is already moving past containers. Now you need to look at the next big thing: serverless computing.

I don’t know who thought it was a good idea to label this latest application architecture trend serverless computing. Code is useless, after all, unless it runs on a computer. There has to be a server in there somewhere. I guess the idea was to imply that when you submit application functionality for execution without caring about servers, it feels completely serverless.

In cloud infrastructure as a service, you don’t have to own or manage your own physical infrastructure. With cloud serverless architecture, you also don’t have to care about virtual machines, operating systems or even containers.

Serving more through serverless architecture?

So what is serverless computing? It’s a service in which a programmer can write relatively contained bits of code and then directly deploy them as standalone, function-sized microservices. You can easily set up these microservices to execute on a serverless computing framework, triggering or scheduling them by policy in response to supported events or API calls.

A serverless framework is designed to scale well with inherently stateless microservices — unlike today’s containers, which can host stateful computing as well as stateless code. You might use serverless functions to tackle applications that need highly elastic, event-driven execution or when you create a pipeline of arbitrary functionality to transform raw input into polished output. This event-pipeline concept meshes well with expected processing needs related to the internet of things. It could also prove useful with applications running in a real-time data stream.

A well-known public cloud example of serverless computing is Amazon Web Services’ Lambda. The Lambda name no doubt refers to the anonymous lambda functions used extensively in functional programming. In languages such as JavaScript or Ruby, a function can be a first-class object defined as a closure over some code within a prescribed variable scope. Some languages have actual lambda operators that a programmer can use to dynamically create new function objects at runtime (e.g., as other code executes).
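To make the closure idea concrete, here is a tiny JavaScript sketch: a new anonymous function object is created at runtime each time makeCounter runs, and it captures (closes over) its own count variable:

```javascript
// makeCounter creates and returns a new anonymous (lambda) function at
// runtime; each one closes over its own private `count` variable.
function makeCounter() {
    let count = 0;
    return function () {
        count += 1;
        return count;
    };
}

const counter = makeCounter();
console.log(counter()); // 1
console.log(counter()); // 2
```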

So with a serverless framework, where does the actual infrastructure come into the picture? It’s still there, just under multiple layers of abstraction. Talk about software-defined computing. With this latest evolution into serverless computing, we now have perhaps several million lines of system- and platform-defining code between application code and hardware. It’s a good thing Moore’s Law hasn’t totally quit on us…(read the complete as-published article there)

I’m Going Fission

I just spent a couple of weeks in Boston at Red Hat Summit and OpenStack Summit.  Containers are clearly the big thing this year – Kubernetes, OpenShift, etc. And increasingly, IT is learning how to take advantage of remote Management as a Service (MaaS) offerings that free up folks to focus more on business value and less on running complex stacks. On that front I talked with folks like Platform9, who happen to also sponsor a “server-less” computing solution called Fission (later in this blog post I’ll show how I got Fission deployed on my Mac).

Because I’m an industry analyst (in my day job), here is a big picture of the evolution happening in application infrastructure: Physically hosted apps (server and O/S) –> Virtual machines (in a hypervisor) –> Cloud platforms (e.g. OpenStack) –> Container “ships” (e.g. OpenShift, Docker, Kubernetes) –> Serverless Computing (e.g. AWS Lambda and Fission).

Applications have always been constructed out of multiple tiers and communicating parts, but generally we are moving towards a world in which functionality is both defined and deployed (distributable, scalable) in small, testable bits (i.e. “units” as in unit testing), while an application “blueprint” defines all the related bits and required service properties in operation.  Some folks are calling the blueprinting part “infrastructure as code”.

(BTW – the next evolutionary step is probably some kind of highly intelligent, dynamic IoT/Big Data/Distributed engine that inherently analyzes and distributes compute functionality out as far as it can go towards the IoT edge while centralizing data only as much as required. Kind of like a database query planner on IoT-size steroids).

So, onto my Mac deployment of Fission. I’ve already got VirtualBox installed for running Hadoop cluster sandboxes and other fun projects, but OpenStack is probably not something I really need or want to run on my own Mac (although apparently I could if I wanted more agility in spinning up and down big data clusters). But – Ah ha! – now a mental lightbulb goes on! (or rather, an LED went on – gotta save power these days).

This Fission project means I can run my own lambda services now too on my little desktop Mac, and then easily deploy really cool stuff to really big clouds when someday I create that killer app (with lambdas that happily interface with other coolness like Spark, Neo4j, Ruby on Rails…).  Ok, this is definitely something I want to play with.  And I’m thinking, wait for it –  Ruby lambdas!  (Ruby is not dead, you fools! You’ll all eventually see why Ruby is the one language that will be used in the darkness to bind them all!)

Well, we’ll come back to Ruby later.  First things first – we’ll start with the default node.js example. Let’s aim for a local nested stack that will run like this:

osx (-> virtualbox (-> minikube (-> fission (-> node.js))))

host server – hypervisor – container cluster – lambda services – execution environment

While the lambda execution will be nested, the CLI commands to interface with minikube/kubernetes (kubectl) and fission (fission) will be available locally at the osx command line (in a terminal window).

Ok, I’ve already got VirtualBox, but it’s out of date for minikube. So I directly download the latest off the web and install – oops, first issue! Mac OSX now has some fancy SIP (System Integrity Protection) security layer that prevents anyone from actually getting anything done as root (I swear if they keep making my unix-based Mac work like iOS I’m gonna convert to Ubuntu!). So after working around security to get that update in place (and thank you Oracle for VirtualBox) we are moving on!

$ virtualbox

Oh, and make sure to also have kubectl installed locally. The local kubectl will get dynamically linked into the minikube kubernetes environment that will be running inside virtualbox.

$ curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/v1.6.0/bin/darwin/amd64/kubectl && chmod +x kubectl && sudo mv kubectl /usr/local/bin/
$ kubectl version

For the minikube install I used brew, which of course I had to update first. And of course, I had to again work around the Mac OSX SIP challenge above (hopefully this is a one-time fix) by setting /usr/local directory ownership to myself (then back to root:wheel after the dust settled).

$ brew update
$ brew cask install minikube
$ minikube start 
# minikube stop
# minikube service [-n NAMESPACE] [--url] NAME
$ minikube ip
$ minikube dashboard

At this point you can deploy containerized apps with kubectl into the minikube “cluster”.  This next bit is an example of a simple “echo” server from the minikube GitHub page.

$ kubectl run hello-minikube --image=gcr.io/google_containers/echoserver:1.4 --port=8080
$ kubectl expose deployment hello-minikube --type=NodePort
$ kubectl get pod
$ curl $(minikube service hello-minikube --url)

(If you are following along, you might suggest that I should play here with minishift too, but now is not yet the time! Maybe I’ll climb into that PaaS arena in another post.)

Now it’s time for Fission. These next snippets are taken from the Fission GitHub README page. The first curl gets the fission client command-line tool installed locally. The kubectl lines start the fission services up. The two shell variables are just for convenience in the provided examples, and not part of the required install.

$ curl http://fission.io/mac/fission > fission && chmod +x fission && sudo mv fission /usr/local/bin/

$ kubectl create -f http://fission.io/fission.yaml
$ kubectl create -f http://fission.io/fission-nodeport.yaml

$ export FISSION_URL=http://$(minikube ip):31313
$ export FISSION_ROUTER=$(minikube ip):31314   # for these examples

Ok, so now we have our own lambda services host running. Next we can start deploying lambda functions. Fission does a number of things like scale out our services and keep a few containers ready for fast startup, and probably a bunch of stuff I won’t figure out until some O’Reilly book comes out (oh, I could just read the code…).

$ fission env create --name nodejs --image fission/node-env
$ curl https://raw.githubusercontent.com/fission/fission/master/examples/nodejs/hello.js > hello.js

$ fission function create --name hello --env nodejs --code hello.js
$ fission route create --method GET --url /hello --function hello

First, we create a fission environment, associating a fission environment container image with the name “nodejs”. Then we create a fission function that loads our hello.js lambda “code” into that fission environment. Here we are using JavaScript and node.js, but there are other execution environments available (and we can make our own!). We also then need to map a web services route to our fission function.


hello.js:

module.exports = async function(context) {
    return {
        status: 200,
        body: "Hello, World!\n"
    };
}

You can see that a Fission lambda function is just a JavaScript function. In this case all it does is return a standard HTTP response.
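Since it’s just JavaScript, you can smoke-test the function directly in Node before deploying it, no Fission required (a sketch with the hello.js body inlined so it stands alone; in real use Fission invokes the exported function through its router):

```javascript
// The hello.js handler, inlined for a standalone local check.
const hello = async function (context) {
    return {
        status: 200,
        body: "Hello, World!\n"
    };
};

// Invoke it the way Fission conceptually would and print the response.
hello({}).then(resp => {
    console.log(resp.status);        // 200
    process.stdout.write(resp.body); // Hello, World!
});
```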

$ curl http://$FISSION_ROUTER/hello
 ->  Hello, World!

Testing it out – we hit the URL with a GET request and tada!  Hello World!

This is quite an onion we’ve built, but you hopefully can appreciate that each layer is adding to the architecture that would enable easy deployment at large scale and wide distribution down the road. Next up though, I personally want Ruby lambdas!

I could build a full native ruby fission environment (it should be easy enough to start with an existing Red Hat or Docker ruby container). There is a python fission example that wouldn’t be hard to emulate. I’d have to decide on key gems to pre-load, which leads to a big question about what I’d actually like to do and how big and fat that environment might get (it could end up slow and bloated). Or we could try to stay very small – there have been small embeddable Rubys like mruby (although that one looks dead since 2015). There is also some interesting advice out there for building minimal ruby app containers.

While not actually ruby, CoffeeScript (which transpiles ruby-like coffeescript code to JavaScript) seems the easiest route at the moment, and it just uses the vanilla fission node.js environment we already have above. I could also see embedding “coffee” in a fission environment easily enough so that I could send coffeescript code directly to fission (although that would require transpiling on every lambda execution – it’s always a trade-off). To get started with coffee, add it to your local node.js environment (install Node first if you don’t already have it).

$ npm install -g coffee-script
$ coffee

Using coffee is easy enough. Learning it might take a bit of study, although if you like ruby and only suffer when forced to work with native JavaScript, it’s well worth it.
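For instance, the hello.js function from earlier could be written in CoffeeScript roughly like this (an untested sketch; `coffee -c hello.coffee` would transpile it into the plain JavaScript that the nodejs fission environment actually runs, though you might want to wrap the result to match the async example above):

```coffeescript
# hello.coffee - roughly equivalent to the hello.js example above;
# CoffeeScript's implicit return hands back the response object.
module.exports = (context) ->
  status: 200
  body: "Hello, World!\n"
```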

But CoffeeScript is not ruby.  Something like Opal (transpiling full ruby syntax to js) is an even more interesting project, and if it were ever solid it could be implemented here with fission in a number of ways – possibly embedding it in a unique Opal ruby fission environment, statically applying it upstream of a node.js fission environment as with CoffeeScript, or even using it dynamically as a wrapper with ruby code sent to the node.js environment.

Another idea is to build a small native ruby fission solution with something like a nested ruby Sinatra design: first create a local “super-fission-sinatra” DSL that would deploy sinatra-like web service definition code to an embedded ruby/sinatra fission environment. Kind of meta-meta, but maybe an interesting way to build scalable, instrumented APIs.

All right – that’s enough for now. Time to play! Let me know if you create any Ruby Fission examples!