Scalable Persistent Storage for Kubernetes Using MapR

Lots of storage solutions can claim to provide adequate container storage when there are tens or hundreds of containers, but what are you going to do when you really need to push the “go” button on your next-gen apps and spin up hundreds of thousands of containers across a hybrid cloud architecture?

MapR just introduced a very compelling container solution, of course leveraging the highly scalable and production-proven MapR platform. The big data storage layer in MapR is already able to handle trillions of objects/files/tables/streams (hey, it’s big data AND POSIX-compliant AND…) in a highly scalable (and enterprise-y) manner.
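
For context, here is a minimal sketch of what requesting persistent storage from Kubernetes looks like with the official Python client. It assumes dynamic provisioning is already wired up, and the "mapr-storage" storage class name is a hypothetical stand-in for whatever class the MapR volume plugin actually registers.

    # Minimal sketch: request persistent storage from Kubernetes with the
    # official Python client. "mapr-storage" is a hypothetical storage class
    # name standing in for whatever the MapR volume plugin actually registers.
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() inside a pod

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="app-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],      # shareable across many containers
            storage_class_name="mapr-storage",   # hypothetical MapR-backed class
            resources=client.V1ResourceRequirements(requests={"storage": "100Gi"}),
        ),
    )

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc
    )

Pods then mount the claim by name, and the storage layer underneath handles the scale and durability.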

In this short video bit just released on Truth In IT (with transcript), I interview Jack Norris from MapR about the new MapR for Kubernetes solution, announced yesterday.

Spark speeds up adoption of big data clusters and clouds

An IT industry analyst article published by SearchITOperations.


Infrastructure that supports big data comes from both the cloud and clusters. Enterprises can mix and match these seven infrastructure choices to meet their needs.

Mike Matchett

If enterprise IT has been slow to support big data analytics in production with decade-old Hadoop, there has been a much faster ramp-up now that Spark is part of the overall package. After all, applying the same old business intelligence approach to broader, bigger data (with MapReduce) isn’t exciting, but producing operational-time predictive intelligence that guides and optimizes the business with machine precision is a competitive must-have.

With traditional business intelligence (BI), an analyst studies a lot of data, forms some hypotheses and draws a conclusion to make a recommendation. Using the many big data machine learning techniques supported by Spark’s MLlib, a company’s big data can dynamically drive operational-speed optimizations. Massive in-memory machine learning algorithms enable businesses to immediately recognize and act on inherent patterns in even big streaming data.
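
As a rough illustration of the kind of pipeline MLlib makes routine, here is a minimal PySpark sketch that trains a simple churn classifier on historical data and then scores fresh records with it. The input paths and column names are hypothetical placeholders, not anything from a particular deployment.

    # Minimal MLlib sketch: train on historical data, then score new records.
    # Paths and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    history = spark.read.parquet("hdfs:///data/customer_history")

    # Pack raw numeric columns into the single feature vector MLlib expects.
    assembler = VectorAssembler(
        inputCols=["recency_days", "order_count", "avg_spend"],
        outputCol="features",
    )
    train = assembler.transform(history)

    # Fit a simple classifier; "churned" is a 0/1 label column.
    model = LogisticRegression(labelCol="churned", featuresCol="features").fit(train)

    # Score a fresh batch of records using the same feature packing.
    fresh = assembler.transform(spark.read.parquet("hdfs:///data/customer_latest"))
    model.transform(fresh).select("customer_id", "prediction", "probability").show()

The same code scales from a laptop to a large cluster just by pointing the SparkSession at a different master, which is a big part of why the ramp-up is so much faster than in the MapReduce days.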

But the commoditization of machine learning itself isn’t the only new driver here. A decade ago, IT needed to either stand up a “baby” high-performance computing cluster for serious machine learning or learn to write low-level distributed parallel algorithms to run on the commodity-based Hadoop MapReduce platform. Either option required both data science expertise and exceptionally talented IT admins who could stand up and support massive physical scale-out clusters in production. Today there are many infrastructure options for big data clusters that can help IT deploy and support big data-driven applications.

Here are seven types of big data infrastructures for IT to consider, each with core strengths and differences:…(read the complete as-published article there)

Machine learning algorithms make life easier — until they don’t

An IT industry analyst article published by SearchITOperations.


Algorithms govern many facets of our lives. But imperfect logic and data sets can make results worse instead of better, so it behooves all of us to think like data scientists.

Mike Matchett

Algorithms control our lives in many and increasingly mysterious ways. While machine learning algorithms change IT, you might be surprised at the algorithms at work in your nondigital life as well.

When I pull a little numbered ticket at the local deli counter, I know with some certainty that I’ll eventually get served. That’s a queuing algorithm in action — it preserves the expected first-in, first-out ordering of the line. Although wait times vary, it delivers a predictable average latency to all shoppers.

Now compare that to when I buy a ticket for the lottery. I’m taking a big chance on a random-draw algorithm, which is quite unlikely to ever go my way. Winning is not only uncertain, but improbable. Still, for many folks, the purchase of a lottery ticket delivers a temporary emotional salve, so there is some economic utility — as you might have heard in Economics 101.

People respond well both to algorithms with guaranteed certainty and to those with arbitrary randomness, each in the appropriate situation. But imagine flipping those scenarios. What if your deli only randomly selected people to serve? With enough competing shoppers, you might never get your sliced bologna. What if the lottery just ended up paying everyone back their ticket price minus some administrative tax? Even though this would improve almost everyone’s actual lottery return on investment, that kind of game would be no fun at all.
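
To put rough numbers behind that thought experiment, here is a small toy simulation (my own illustration, not from the published article) comparing first-in, first-out service against picking a random waiting customer each time. With the same arrival and service rates, the average wait comes out the same, but the worst-case wait under random selection gets dramatically worse, which is exactly why it feels so unfair.

    # Toy simulation: FIFO vs. random-pick service at a busy counter.
    # Deliberately simple model, purely illustrative.
    import random

    def simulate(policy, n_customers=50_000, arrival_prob=0.95, seed=7):
        """One customer served per tick; a new one arrives with arrival_prob."""
        arrivals = random.Random(seed)       # same arrival stream for both runs
        picker = random.Random(seed + 1)     # only used by the random policy
        waiting, waits, arrived, tick = [], [], 0, 0
        while len(waits) < n_customers:
            if arrived < n_customers and arrivals.random() < arrival_prob:
                waiting.append(tick)          # remember when this customer arrived
                arrived += 1
            if waiting:
                idx = 0 if policy == "fifo" else picker.randrange(len(waiting))
                waits.append(tick - waiting.pop(idx))
            tick += 1
        return sum(waits) / len(waits), max(waits)

    for policy in ("fifo", "random"):
        avg, worst = simulate(policy)
        print(f"{policy:>6}: average wait {avg:.1f} ticks, worst wait {worst} ticks")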

Without getting deep into psychology or behavioral economics, there are clearly appropriate and inappropriate uses of randomization. When we know we are taking a long-shot chance at a big upside, we might grumble if we lose. But our reactions are different when the department of motor vehicles closes after we’ve already spent four hours waiting.

Now imagine being subjected to opaque algorithms in various important facets of your life, as when applying for a mortgage, a car loan, a job or school admission. Many of the algorithms that govern your fate are seemingly arbitrary. Without transparency, it’s hard to know if any of them are actually fair, much less able to predict your individual prospects. (Consider the fairness concept the next time an airline randomly bumps you from a flight.)
Machine learning algorithms overview — machines learn what?

So let’s consider the supposedly smarter algorithms designed at some organizational level to be fair. Perhaps they’re based on some hard, rational logic leading to an unbiased and random draw, or more likely on some fancy but operationally opaque big data-based machine learning algorithm.

With machine learning, we hope things will be better, but they can also get much worse. In too many cases, poorly trained or designed machine learning algorithms end up making prejudicial decisions that can unfairly affect individuals.

I’m not exaggerating when I predict that machine learning will touch every facet of human existence.

This is a growing — and significant — problem for all of us. Machine learning is influencing a lot of the important decisions made about us and is steering more and more of our economy. It has crept in behind the scenes as so-called secret sauce or as proprietary algorithms applied to key operations.

But with easy-to-use big data machine learning tools like Apache Spark, and the increasing streams of data from the internet of things wrapping all around us, I expect that every data-driven task will be optimized with machine learning in some important way…(read the complete as-published article there)

Storage technologies evolve toward a data-processing platform

An IT industry analyst article published by SearchDataCenter.


Emerging technologies such as containers, HCI and big data have blurred the lines between compute and storage platforms, breaking down traditional IT silos.

Mike Matchett

With the rise of software-defined storage, in which storage services are implemented as a software layer, the whole idea of data storage is being re-imagined. And with the resulting increase in the convergence of compute with storage, the difference between a storage platform and a data-processing platform is further eroding.

Storage takes new forms

Let’s look at a few of the ways that storage is driving into new territory:

  • Now in containers! Almost all new storage operating systems, at least under the hood, are being written as containerized applications. In fact, we’ve heard rumors that some traditional storage systems are being converted to containerized form. This has several important implications, including the ability to better handle massive scale-out, increased availability, cloud-deployment friendliness and easier support for converging computation within the storage.
  • Merged and converged. Hyper-convergence bakes software-defined storage into convenient, modular appliance units of infrastructure. Hyper-converged infrastructure products, such as Hewlett Packard Enterprise’s SimpliVity and Nutanix offerings, can greatly reduce storage overhead and help build hybrid clouds. We also see innovative approaches merging storage and compute in new ways, using server-side flash (e.g., Datrium), rack-scale infrastructure pooling (e.g., DriveScale) or even integrating ARM processors on each disk drive (e.g., Igneous).
  • Bigger is better. If the rise of big data has taught us anything, it’s that keeping more data around is a prerequisite for having the opportunity to mine value from that data. Big data distributions today combine Hadoop and Spark ecosystems, various flavors of databases and scale-out system management into increasingly general-purpose data-processing platforms, all powered by underlying big data storage tools (e.g., Hadoop Distributed File System, Kudu, Alluxio).
  • Always faster. If big is good, big and fast are even better. We are seeing new kinds of automatically tiered and cached big data storage and data access layer products designed around creating integrated data pipelines. Many of these tools are really converged big data platforms built for analyzing big and streaming data at internet of things (IoT) scales.

The changing fundamentals

Powering many of these examples are interesting shifts in underlying technical capabilities. New data processing platforms are handling more metadata per unit of data than ever before. More metadata leads to new, highly efficient ways to innovate …(read the complete as-published article there)

Accelerate Apache Spark to boost big data platforms

An IT industry analyst article published by SearchITOperations.


Big data platforms like Apache Spark process massive volumes of data faster than other options. As data volumes grow, enterprises seek ways to speed up Spark.

Mike Matchett

So, we have data — lots and lots of data. We have blocks, files and objects in storage. We have tables, key values and graphs in databases. And increasingly, we have media, machine data and event streams flowing in.

It must be a fun time to be an enterprise data architect, figuring out how to best take advantage of all this potential intelligence — without missing or dropping a single byte.

Big data platforms such as Spark help process this data quickly and converge traditional transactional data center applications with advanced analytics. If you haven’t yet seen Spark show up in the production side of your data center, you will soon. Organizations that don’t, or can’t, adopt big data platforms to add intelligence to their daily business processes are soon going to find themselves way behind their competition.

Spark, with its distributed in-memory processing architecture — and native libraries providing both expert machine learning and SQL-like data structures — was expressly designed for performance with large data sets. Even with such a fast start, competition and larger data volumes have made Spark performance acceleration a sizzling hot topic. You can see this trend at big data shows, such as the recent, sold-out Spark Summit in Boston, where it seemed every vendor was touting some way to accelerate Spark.

If Spark already runs in memory and scales out to large clusters of nodes, how can you make it faster, processing more data than ever before? Here are five Spark acceleration angles we’ve noted:

  1. In-memory improvements. Spark can use a distributed pool of memory-heavy nodes. Still, there is always room to improve how memory management works — such as sharding and caching — how much memory can be stuffed into each node and how far clusters can effectively scale out. Recent versions of Spark use native Tungsten off-heap memory management — i.e., compact data encoding — and the optimizing Catalyst query planner to greatly reduce both execution time and memory demand (see the sketch following this excerpt). According to Databricks, the leading Spark sponsor, we’ll continue to see future releases aggressively pursue greater Spark acceleration.
  2. Native streaming data. The hottest topic in big data is how to deal with streaming data.

…(read the complete as-published article there)
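
As a quick illustration of the Tungsten and Catalyst point in item 1, here is a small PySpark sketch (my own, with hypothetical paths and column names) that builds a typical analytic query on DataFrames and asks Spark to print the physical plan Catalyst produces. Operators handled by Tungsten’s whole-stage code generation are typically prefixed with an asterisk in that output.

    # Sketch: let Catalyst optimize a DataFrame query, then inspect the plan.
    # Dataset paths and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("catalyst-sketch").getOrCreate()

    orders = spark.read.parquet("hdfs:///data/orders")
    customers = spark.read.parquet("hdfs:///data/customers")

    # A typical analytic query: join, filter, aggregate.
    report = (
        orders.join(customers, "customer_id")
              .where(F.col("order_date") >= "2017-01-01")
              .groupBy("region")
              .agg(F.sum("amount").alias("revenue"))
    )

    # Nothing has executed yet; explain() shows the optimized physical plan
    # that Catalyst generated, including Tungsten code-generated stages.
    report.explain()

Calling report.explain(True) also prints the parsed, analyzed and optimized logical plans, a handy way to see how much rewriting Catalyst does before any data actually moves.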

Internet of things data security proves vital in digitized world

An IT industry analyst article published by SearchITOperations.


Securing IoT data should become a priority as more companies manipulate the volumes produced by these devices. Seemingly innocuous information could allow privacy invasions.

Mike Matchett

The data privacy and access discussion gets all the more complicated in the age of IoT.

Some organizations might soon suffer from data paucity — getting locked out of, outbid for or otherwise shut out of critical new data sources that could help optimize future business. While I believe that every data-driven organization should start planning today to avoid ending up data poor, this concern is just one of many potential data-related problems arising in our new big data, streaming, internet of things (IoT) world. In fact, issues with getting the right data will become so critical that I predict a new strategic data enablement discipline will emerge to not just manage and protect valuable data, but to ensure access to all the necessary — and valid — data the corporation might need to remain competitive.

In addition to avoiding debilitating data paucity, data enablement will mean IT will also need to consider how to manage and address key issues in internet of things data security, privacy and veracity. Deep discussions about the proper use of data in this era of analytics are filling books, and much remains undetermined. But IT needs to prepare for whatever data policies emerge in the next few years.

Piracy or privacy?

Many folks explore data privacy in depth, and I certainly don’t have immediate advice on how to best balance the personal, organizational or social benefits of data sharing, or where to draw a hard line on public versus private data. But if we look at privacy from the perspective of most organizations, the first requirements are to meet data security demands, specifically the regulatory and compliance laws defining the control of personal data. These would include medical history, salary and other HR data. Many commercial organizations, however, reserve the right to access, manage, use and share anything that winds up in their systems unless specifically protected — including any data stored or created by or about their employees.

If you are in the shipping business, using GPS and other sensor data from packages and trucks seems like fair game. After all, truck drivers know their employers are monitoring their progress and driving habits. But what happens when organizations track our interactions with IoT devices? Privacy concerns arise, and the threat of an internet of things security breach looms.

Many people are working hard to make GPS-style positioning work within buildings, ostensibly as a public service, using Wi-Fi equipment and other devices to help triangulate the position of handheld devices and thus locate people in real time, all the time, on detailed blueprints.
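
To give a feel for how little it takes, here is a toy sketch (my own illustration) of the basic mechanics: convert received Wi-Fi signal strengths from a few access points into rough distances with a standard log-distance path-loss model, then search for the position that best fits them. Every access point location, signal reading and model constant here is made up, and real indoor positioning systems are considerably more sophisticated.

    # Toy indoor-positioning sketch: Wi-Fi RSSI -> distance -> best-fit position.
    # All positions, readings and constants are made up for illustration.
    import math

    # Known access point positions (meters) and measured signal strengths (dBm).
    readings = {(0.0, 0.0): -52.0, (20.0, 0.0): -63.0, (10.0, 15.0): -58.0}

    def rssi_to_distance(rssi, tx_power=-40.0, path_loss_exp=2.5):
        """Crude distance estimate from a log-distance path-loss model."""
        return 10 ** ((tx_power - rssi) / (10 * path_loss_exp))

    def locate(readings, step=0.5, size=25.0):
        """Grid-search the point whose distances best match the RSSI estimates."""
        estimates = {ap: rssi_to_distance(rssi) for ap, rssi in readings.items()}
        best, best_err = None, float("inf")
        y = 0.0
        while y <= size:
            x = 0.0
            while x <= size:
                err = sum((math.hypot(x - ax, y - ay) - d) ** 2
                          for (ax, ay), d in estimates.items())
                if err < best_err:
                    best, best_err = (x, y), err
                x += step
            y += step
        return best

    print("Estimated position (m):", locate(readings))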

In a shopping mall, this tracking detail would enable directed advertising and timely deals related to the store a shopper enters. Such data in a business setting could tell your employer who is next to whom and for how long, what you are looking at online, what calls you receive and so on. Should our casual friendships — not to mention casual flirting — bathroom breaks and vending machine selections be monitored this way? Yet the business can make the case that it should be able to analyze those associations in the event of a security breach — or adjust health plan rates if you have that candy bar. And once that data exists, it can be leaked or stolen…(read the complete as-published article there)

Big data concerns reach broad and deep in new era of applications

An IT industry analyst article published by SearchITOperations.


Big data is out there, waiting to make you rich — or help your organization succeed anyway. But there are still more unknowns than knowns about the future of big data.

Mike Matchett

Big data is being created everywhere we look, and we are all thinking about how to take advantage of it. I certainly want to come up with some novel new big data application and become fabulously wealthy just for the idea. The thing is, most companies — perhaps all — can profit from big data today just by accelerating or refining some piece of their current business, supposing they can identify and corral the right information at the right time and place.

There is no need to find a new earth-shattering application to get started. I believe a significant big data payback is right in front of any marketing, sales, production or customer-engagement team. One simply needs to find a way to unlock the buried big data treasure. And, of course, that’s where big data concerns from practical to theoretical bubble to the surface.

A big sticking point has been finding the data science expertise, especially experts who could build optimized machine learning models tailored for your exact business needs. But we are seeing some interesting efforts recently to automate and, in some ways, commoditize big data handling and complicated machine learning. These big data automation technologies enable the regular Java Joe or Josie programmer to effectively drop big data analytics into existing, day-to-day operational-focused business applications.

Not only does this have the democratizing effect of unlocking big data value for non-data scientists, but it also highlights the trend toward a new application style. In the next three to five years, we will see most business applications that we’ve long categorized as transactional converge with what we’ve implemented separately as analytical applications. Put simply, with big data power, “business intelligence” is becoming fast enough and automated enough to deliver inside the operational business process in active business timeframes.

As these data processing worlds collide, they will reveal big data concerns for IT staff, and for those making decisions on IT infrastructure and data centers. Storage, databases and even networks will all need to adapt. Along with the rise of the internet of things (IoT), hybrid cloud architectures, persistent memory and containers, 2017 is going to be a pivotal year for challenging long-held assumptions and changing IT directions.
Out-of-reach data

While I will undoubtedly focus a lot of time and energy as an industry analyst on these fast-evolving topics in the near term, there is a longer-term big data concern: Some companies might not be able to take advantage of this democratization of data simply because they can’t get access to the data they need.

We’ve heard warnings about how hard it is to manage big data as important data. We need to think about how we can ensure it’s reliable, how we can maintain and ensure privacy — and regulatory compliance — how we can ensure we only implement ethical and moral big data algorithms and so on. But before all that, you first need access to the data — assuming it exists or can be created — that is valuable to your company. I call this the data paucity problem — there’s too little big data in use.

As an example, I don’t believe every IoT device manufacturer will end up getting unfettered access to the data streams generated by their own things, much less to the ecosystem of data surrounding their things in the field. I think it is inevitable that some will be getting locked out of their own data flowback…(read the complete as-published article there)

Cloud, IoT to drive enterprise IT trends in 2017

An IT industry analyst article published by SearchCloudComputing.


Cloud computing has evolved quite a bit in the last few years, but it still has far to go. Technologies such as big data, containers and IoT will have a big part to play in the future.

Mike Matchett

Yes, it’s a brand new year and time to make some Next Big Thing predictions for the year to come. This year, our outline of what’s on the immediate horizon is already well known: hybrid cloud adoption, big data applications and containers. Looking a little further out at enterprise IT trends, we might see the first practical persistent storage-class memory begin to disrupt 30 years of traditionally structured data center infrastructure. And expect a hot smoking internet of things mess of requirements to land in the lap of IT folks everywhere.

All of these topics are, of course, highly interrelated. In fact, it wouldn’t surprise me to find that many organizations will have to bite the bullet on all five at the same time to handle a new internet of things (IoT) data processing application. But let’s take a quick look at each:

Cloud adoption. I am as guilty as the next pundit in predicting when cloud adoption will finally be considered a “traditional” deployment model. But this time I really mean it! VMware is demonstrating cross-cloud products. Microsoft is making real hay rolling traditional businesses, large and small, into software as a service, like Office 365, and infrastructure as a service, like Azure. And all our favorite storage vendors are realizing that building in a cloud tier won’t shrink on-premises storage needs, given the growth in data and in hybrid technologies that balance and marry the best of both cloud and on-premises processing.

Big data. Hadoop is a decade old now. With newer generation platforms like Apache Spark making it easier to deploy and consume big data interactively for SQL-friendly business analysis, real-time operations, machine learning and even graph-based applications, it’s time for us all to get on board this train. As I’ve said, all data can grow up into big data someday. One of the top enterprise IT trends we’ve noticed is less concern about what big data is and more focus on getting maximum value out of all that data. In fact, I predict that data access — or data paucity — will become a new corporate key performance indicator in the future.

Containers. Having predicted the fast rise of containers last year, I claim some victory here against naysayers. Containers have won even if they aren’t in production everywhere yet. Yes, there are some major issues yet to be resolved for the regular, not-quite-DevOps IT organization. Many apps will never transition to containers — just as mainframe applications and VM-based appliances will hang around for decades — but open the hood of every modern application, appliance, cloud or software-defined infrastructure, and you’ll likely find containers. In fact, most of the newest enterprise IT trends covered above — especially cloud and big data — are internally powered by container-based development and deployment.

…(read the complete as-published article there)