Visualizing (and Optimizing) Cluster Performance

(Excerpt from original post on the Taneja Group News Blog)

Clusters are the scale-out way to go in today’s data center. Why not try to architect an infrastructure that can grow linearly in capacity and/or performance? Well, one problem is that operations can get quite complex, especially when you start mixing workloads and tenants on the same cluster. In vanilla big data solutions, everyone can compete, and not always fairly, for the same resources. This is a growing problem in production environments where big data apps are starting to underpin key business-impacting processes.

…(read the full post)

Big data analytics applications impact storage systems

An IT industry analyst article published by SearchStorage.


Whether driven by direct competition or internal business pressure, CIOs, CDOs and even CEOs today are looking to squeeze more value, more insight and more intelligence out of their data. They can no longer afford to archive, ignore or throw away data if it can be turned into a valuable asset. At face value, it might seem like a no-brainer — “we just need to analyze all that data to mine its value.” But, as you know, keeping any data, much less big data, has a definite cost. Processing larger amounts of data at scale is challenging, and hosting all that data on primary storage hasn’t always been feasible.

Historically, unless data had some corporate value — possibly as a history trail for compliance, a source of strategic insight or intelligence that could optimize operational processes — it was tough to justify keeping it. Today, thanks in large part to big data analytics applications, that thinking is changing. All of that bulky, low-level data may have little immediate value, but it could hold great future potential, so you want to keep it; once it’s gone, you lose any downstream opportunity.

To extract value from all that data, however, IT must not only store increasingly large volumes of data, but also architect systems that can process and analyze it in multiple ways.

…(read the complete as-published article there)

Is It Still Artificial Intelligence? Knowm Rolls Out Adaptive Machine Learning Stack

(Excerpt from original post on the Taneja Group News Blog)

When we want to start computing at biological scales and speeds – evolving today’s practical machine learning toward the long-deferred vision of a true “artificial intelligence” – we’ll need to take advantage of new forms of hardware that transcend the strictly digital.

Digital computing infrastructure, based on switching digital bits and separating the functions of persisting data from processing, is now facing some big hurdles with Moore’s law. Even if there are a couple of orders of magnitude of improvement yet to be squeezed out of the traditional digital design paradigm, it has inherent limits in power consumption, scale and speed. For example, there simply isn’t enough power available to reach biological scale and density with traditional computing infrastructure, whether the goal is evolving artificial intelligence or, more practically, scaling machine learning to ever-larger big data sets.

Knowm Inc. is pioneering a brilliant new form of computing that leverages the adaptive “learning” properties of memristive technology to not only persist data in fast memory (as others in the industry, like HP, are researching), but also to inherently – and in one operation – calculate serious compute functions that would otherwise require the stored data to be offloaded into CPUs, processed, and written back (taking more time and power).

The Knowm synapse, their circuit-level integrated unit of calculation and data persistence, was inspired by biological and natural-world precedent. At a philosophical level it takes some deep thinking to fully appreciate the implications and opportunities, but this is no longer just theory. Today, Knowm is bringing its “neuromemristive” solution to market, supported by a full stack of technologies – discrete chips, scalable simulators, defined low-level APIs and higher-level machine learning libraries, and a service that can layer large quantities of Knowm synapses directly onto existing CMOS (Back End of Line, or BEOL) designs.

Knowm is aiming squarely at the machine learning market, but here at Taneja Group we think the opportunity is much larger. This approach, which intelligently harnesses analog hardware functions for extremely fast, cheap, dense and memory-inherent computing, could represent a truly significant turning point for the whole computing industry.

I look forward to finding out who will take advantage of this solution first, and potentially cause a massively disruptive shift in not just machine learning, but in how all computing is done.
 

…(read the full post)

What was BIG at Hadoop Summit 2015

(Excerpt from original post on the Taneja Group News Blog)

At this month’s Hadoop Summit 2015 I noted two big trends. One was the continuing focus on Spark as an expansion of the big data analytical ecosystem, with main sponsor Hortonworks (great show, by the way!) and most vendors talking about how they support, interact with, or deliver Spark in addition to Hadoop’s MapReduce. The other was a very noticeable shift in focus, away from trotting out ever more gee-whiz big data use cases and toward making it all work in enterprise production environments. If you ask me, this second trend is the bigger deal for IT folks to pay attention to.

…(read the full post)

Navigate data lakes to manage big data

An IT industry analyst article published by SearchStorage.


Big data sure is exciting to business folks, with all sorts of killer applications just waiting to be discovered. And you no doubt have a growing pile of data bursting at the seams of your current storage infrastructure, with lots of requests to mine even more voluminous data streams. Haven’t you been collecting microsecond-level end-user behavior across all your customers and prospects, not to mention collating the petabytes of data exhaust from instrumenting your systems to the nth degree? Imagine the insight management would have if they could look at all that data at once. Forget about data governance, data management, data protection and all those other IT worries — you just need to land all that data in a relatively scale-cheap Hadoop cluster!

Seriously, though, big data lakes can meet growing data challenges and provide valuable new services to your business. By collecting a wide variety of business-relevant data sets in one place and enabling scalable, multifaceted analytics based on big data approaches, a data lake creates many new data mining opportunities. The total potential value of a data lake grows with the amount of useful data it holds available for analysis. And one of the key tenets of big data and the data lake concept is that you don’t have to create a master schema ahead of time, so non-linear growth is possible.
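
To make that schema-on-read point concrete, here is a minimal sketch using Spark's DataFrame API; the HDFS path, field names and query below are illustrative assumptions, not details from the article. Raw JSON events are landed in the lake as-is, and a schema is inferred only at read time, when an analyst actually queries them.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LakeQuerySketch {
    public static void main(String[] args) {
        // Spark session against the cluster (or local[*] for a quick test).
        SparkSession spark = SparkSession.builder()
                .appName("data-lake-schema-on-read")
                .getOrCreate();

        // Raw JSON events were landed with no predefined master schema;
        // Spark infers one only now, at read time (schema-on-read).
        Dataset<Row> events = spark.read().json("hdfs:///lake/raw/clickstream/");
        events.printSchema();

        // Query the inferred structure with plain SQL.
        events.createOrReplaceTempView("clickstream");
        spark.sql("SELECT userId, COUNT(*) AS hits "
                + "FROM clickstream GROUP BY userId ORDER BY hits DESC")
             .show(10);

        spark.stop();
    }
}
```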

The enterprise data lake, or data hub, concept was first proposed by big data vendors like Cloudera and Hortonworks, ostensibly using vanilla scale-out HDFS-based commodity storage. But it just so happens that the more data you keep on hand, the more storage of all kinds you will need. Eventually, all corporate data is likely to be considered big data. However, not all of that corporate data is best hosted on a commodity scale-out HDFS cluster.

So, today, traditional storage vendors are signing up to the big data lake vision. From a storage marketing perspective, it seems like data lakes are the new cloud: “Everyone needs a data lake. How can you compete without one (or two or three)?” And there are a variety of enterprise storage options for big data, including arrays that can provide remote storage that acts like HDFS, Hadoop virtualization layers that can translate other storage protocols into HDFS, and scalable software-defined storage options.

…(read the complete as-published article there)

In Memory Big Data Heats Up With Apache Ignite

(Excerpt from original post on the Taneja Group News Blog)

Recently we posted about GridGain contributing their core in-memory solution to the Apache Ignite project. While the project is still incubating, it’s clear that this was a good move for GridGain, and a win for the big data/BI community in general. Today Apache Ignite drops its v1.0 release candidate with new features such as built-in support for JCache (JSR 107) and an autoloader to help migrate data and schema in from existing SQL databases (e.g., Oracle, MySQL, Postgres, DB2, Microsoft SQL Server, etc.).
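
For a rough sense of what that JCache support looks like from application code, here is a minimal sketch of starting an Ignite node and using a cache from Java; the cache name and values are assumptions for illustration, and IgniteCache itself implements the standard javax.cache.Cache (JCache) interface.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class IgniteCacheSketch {
    public static void main(String[] args) {
        // Start a single local Ignite node with default configuration
        // (assumption: one node is enough for this demo).
        try (Ignite ignite = Ignition.start()) {
            // IgniteCache extends the standard javax.cache.Cache (JCache) API.
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("demoCache");

            cache.put(1, "hello from the in-memory data grid");
            System.out.println(cache.get(1));
        }
    }
}
```

Because the interface is standard JCache, the same put/get calls could in principle target any JSR 107 provider; Ignite supplies the distributed, in-memory data grid underneath.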

…(read the full post)

Project Myriad Will Become Your Next Data Center Platform

(Excerpt from original post on the Taneja Group News Blog)

One of the big things bubbling around at Strata this week is talk about YARN, Mesos, and Project Myriad (initiated/sponsored by MapR). On one hand it seems that this is just about some evolution of the Hadoop scheduling layer, but looking at it with a critical eye, I see the impending culmination of what I predicted years ago – that the Hadoop ecosystem will quickly evolve to bring high-powered HPC technologies right into the heart of the next-gen enterprise data center.

…(read the full post)

What’s All In That Pile of Big Data – And Our NoSQL Database?

(Excerpt from original post on the Taneja Group News Blog)

It must be about time for Strata again, as all the major big data players are busy readying their “Spring Break” round of announcements. First up, and perhaps addressing the most important aspect of putting big data to work in real corporate IT data centers, is a new security solution from the folks at Dataguise (think “data in disguise”) aimed at NoSQL.

…(read the full post)

Out on a data storage market limb: Six predictions for 2015

An IT industry analyst article published by SearchStorage.

Our crystal ball tells us this will be a year of change for the data storage market.


With another year just getting underway, we here at Taneja Group felt we needed a few analyst predictions to get things off on the right foot. The easiest predictions, and often the most likely ones, are that things will continue mostly as they are. But what fun is that? So, like any good fortune teller, we held hands around a crystal ball, gathered our prescient thoughts and with the help of the storage spirits came up with these six predictions for change in the data storage market for 2015.

  1. The overall traditional storage market will stay relatively flat despite huge growth in big data and the onrushing Internet of Things. Most new big data will be unstructured, and big data architectures like Hadoop will still tend to leverage DAS for storage. In addition, many big data players are pushing the data lake or hub concept to land even bigger chunks of other enterprise data on big data clusters. While we do see some salvation in this space from vendors […] that enable big data analysis to leverage traditional enterprise storage, it won’t be enough to make a big dent in 2015. We’ve also noticed that many storage shops have yet to take advantage of the emerging capacity optimizations now available (e.g., thin provisioning, linked clones, global deduplication, inline compression and so on) in recent versions of competitive arrays, capabilities that are becoming table stakes for new acquisition decisions. Hybrid arrays, in particular, are bringing flash-enabling space efficiencies across their full complement of storage tiers, and most arrays these days are turning at least hybrid.
  2. Speaking of flash, there are too many all-flash array (AFA) vendors and not enough differentiation. During 2012/2013, the first AFA vendors had the market to themselves, but with all the big players rolling out full-fledged flash offerings, opportunities are declining. With [many vendors] all pushing their own solutions (both AFA and hybrid), the remaining independent vendors will have a harder time finding a niche where they can survive. We also expect to see a new round of very high-end performance storage architectures in 2015. […] As a related trend, we anticipate that hybrid-based Tier-1 arrays will lose ground to AFAs in general as the cost of flash drops and flash performance proves valuable to most, if not all, Tier-1 I/O. In virtualization environments, this trend will be somewhat lessened by the rise in popularity of server-side flash and/or memory caching/tiering solutions.
  3. Data protection and other add-on storage capabilities will become more directly baked into storage solutions. We expect to see more traditional arrays follow the examples of

…(To read the complete six-item list of predictions, see the as-published article there)

Make this your most modern IT year yet

An IT industry analyst article published by SearchDataCenter.

No one sticks with their New Year’s resolution — where is that gym card anyway? — but you can pick up seven modern habits and feel good about fit, healthy IT ops.


Industry analysts like to make outlandish predictions at the start of every new year. But since I already focus on predicting the next big thing, I’ve decided to adopt another New Year’s tradition: making resolutions. But I have a problem with that too: I only have a few bad habits, and I am intent on keeping them.

Instead, I’m going to suggest some resolutions to help organizations adopt modern IT:

  1. Do something predictive with big data. Don’t even worry about the big data part if “big” doesn’t come naturally to the top of your pile of opportunities. You can always grow into bigger data efforts. But do look for a starter project to leverage the power of prediction. Commit to a project that embeds predictive algorithms or machine learning to get accustomed to what it is, what it can and can’t do, and how to approach it profitably. Some areas to consider might be to explore inherent clusters in your customer or client base, to estimate which clients or transactions will succeed or fail, or to identify the most likely root causes of support issues (see the clustering sketch after this list).
  2. Reduce opex through hyperconvergence. Converged infrastructure is clearly a natural evolution under the ever-present pressure to reduce total cost of ownership. Hyperconvergence, which offers a single building block of data center infrastructure that bakes server, storage and networking resources all into one scale-out unit, takes this process to the extreme. While it might not solve every problem, there is no doubt that a large portion of do-it-yourself data center architecture could profitably migrate onto hyperconverged platforms. If you aren’t ready to completely convert, at least resolve to evaluate hyperconverged solutions for new projects. And if that is too big a leap, at least deploy some software-defined storage this year to get comfortable with this potential “new order” for modern IT.
  3. Accelerate your infrastructure. Several acceleration technologies, like caching and in-memory processing, can easily drop into an IT environment with little effort, cost or risk. Although downstream users may complain about poor performance, they rarely ask if acceptable service could be accelerated 10x or 100x or more. Resolve to improve on the “satisfactory,” not just oil the squeaky wheels, because acceleration technologies can spur noticeable improvements in quality of service. They can also lead to huge productivity gains for many, if not all, applications, to the point of creating competitive differentiation or even new sources of revenue.
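
As a concrete starting point for the first resolution above, here is a minimal sketch of a customer-segmentation exercise using Spark MLlib’s k-means clustering; the input path, column names and choice of k = 4 are illustrative assumptions, not a prescribed recipe.

```java
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomerSegmentsSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("customer-segments")
                .getOrCreate();

        // Hypothetical customer extract with a couple of numeric behavior columns.
        Dataset<Row> customers = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///analytics/customers.csv");

        // Combine the numeric columns into the single feature vector MLlib expects.
        VectorAssembler assembler = new VectorAssembler()
                .setInputCols(new String[]{"visitsPerMonth", "avgSpend"})
                .setOutputCol("features");
        Dataset<Row> features = assembler.transform(customers);

        // Fit k-means with an assumed k of 4 segments; tune k for real data.
        KMeans kmeans = new KMeans().setK(4).setSeed(42L);
        KMeansModel model = kmeans.fit(features);

        // Each customer row now gets a cluster id in the "prediction" column;
        // show how many customers fall into each segment.
        model.transform(features)
             .groupBy("prediction").count()
             .show();

        spark.stop();
    }
}
```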

…(for four more resolutions read the complete as-published article there)