Hyperconverged Supercomputers For the Enterprise Data Center

(Excerpt from original post on the Taneja Group News Blog)

Last month NVIDIA, our favorite GPU vendor, dived into the converged appliance space. In fact, we might call their new NVIDIA DGX-1 a hyperconverged supercomputer in a 3U box. Designed to support the application of GPUs to Deep Learning (i.e., compute-intensive, deeply layered neural networks that need to train and run in operational timeframes over big data), this beast has 8 new Tesla P100 GPUs inside on an embedded NVLink mesh, pre-integrated with flash SSDs, decent memory, and an optimized container-hosting deep learning software stack. The best part? The price is surprisingly affordable, and a single unit can replace the 250+ server cluster you might otherwise need for effective Deep Learning.
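As a rough, hypothetical illustration of what a multi-GPU box like this changes for practitioners (not code from the original post), here is a minimal PyTorch sketch that detects the available GPUs and spreads a forward pass across all of them; the model and data are stand-ins:

```python
# Illustrative sketch only: spreading a toy model across all available GPUs.
# Assumes PyTorch with CUDA; the model and data are placeholders, not a real workload.
import torch
import torch.nn as nn

device_count = torch.cuda.device_count()  # would report 8 on a DGX-1 class box
print(f"CUDA devices available: {device_count}")

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
if device_count > 1:
    # Replicates the model on each GPU and splits every batch across them.
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

inputs = torch.randn(256, 1024).to(next(model.parameters()).device)
outputs = model(inputs)  # the forward pass fans out across the GPUs
print(outputs.shape)     # torch.Size([256, 10])
```

The point of the appliance is that the NVLink mesh makes this kind of model replication and data exchange far faster than PCIe or Ethernet between separate servers.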

…(read the full post)

Big Data Enterprise Maturity

(Excerpt from original post on the Taneja Group News Blog)

It’s time to look at big data again. Last week I was at Cloudera’s growing and vibrant annual analyst event to hear the latest from the folks who know what’s what. Then this week Strata (the conference for data scientists) brings lots of public big data vendor announcements. A noticeable shift this year is less focus on how to apply big data and more on maturing enterprise features intended to ease wider data center-level adoption. A good example is the “mixed big data workload QoS” cluster-optimizing solution from Pepperdata.

…(read the full post)

Kudu Might Be Invasive: Cloudera Breaks Out Of HDFS

(Excerpt from original post on the Taneja Group News Blog)

For the IT crowd just now getting used to the idea of big data’s HDFS (the Hadoop Distributed File System) and its peculiarities, there is an alternative open source big data store coming from Cloudera called Kudu. Like HDFS, Kudu is designed to be hosted across a scale-out cluster of commodity systems, but it is specifically intended to support lower-latency analytics.
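For a feel of what lower-latency analytics means at the API level, here is a small sketch using the Apache Kudu Python client (kudu-python); the master hostname, table name, and columns are hypothetical:

```python
# Illustrative sketch with the kudu-python client; hostnames and names are made up.
import kudu
from kudu.client import Partitioning

client = kudu.connect(host='kudu-master.example.com', port=7051)

# Unlike raw HDFS files, Kudu tables have a real schema and primary key,
# which is what enables fast scans plus low-latency row lookups and updates.
builder = kudu.schema_builder()
builder.add_column('ts').type(kudu.int64).nullable(False).primary_key()
builder.add_column('metric').type(kudu.string)
schema = builder.build()

partitioning = Partitioning().add_hash_partitions(column_names=['ts'], num_buckets=3)
client.create_table('events', schema, partitioning)

# Single-row inserts land immediately and are readable right away,
# with no batch file-append-and-rewrite cycle as on HDFS.
table = client.table('events')
session = client.new_session()
session.apply(table.new_insert({'ts': 1, 'metric': 'cpu=42'}))
session.flush()
```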

…(read the full post)

Time To Use The Force, IT! – OpsDataStore Unifies Systems Management Data

(Excerpt from original post on the Taneja Group News Blog)

We are only a bit excited by the impending Star Wars release. How old were we when the first one came out? I’m not saying. We are all very excited here to see this new continuation of the story, the characters, and the universe of the Force, especially compared to our day-to-day IT management reality, which often seems stuck in the ’70s. Systems management has been around even longer than the Star Wars franchise, but it seems to have stagnated along the way. Where is the rebellion? Where are the good Jedi warriors to save us all from the dark side?

…(read the full post)

Visualizing (and Optimizing) Cluster Performance

(Excerpt from original post on the Taneja Group News Blog)

Clusters are the scale-out way to go in today’s data center. Why not architect an infrastructure that can grow linearly in capacity and/or performance? Well, one problem is that operations can get quite complex, especially when you start mixing workloads and tenants on the same cluster. In vanilla big data solutions, everyone competes, and not always fairly, for the same resources. This is a growing problem in production environments where big data apps are starting to underpin key business-impacting processes.

…(read the full post)

Big data analytics applications impact storage systems

An IT industry analyst article published by SearchStorage.


Whether driven by direct competition or internal business pressure, CIOs, CDOs and even CEOs today are looking to squeeze more value, more insight and more intelligence out of their data. They no longer can afford to archive, ignore or throw away data if it can be turned into a valuable asset. At face value, it might seem like a no-brainer — “we just need to analyze all that data to mine its value.” But, as you know, keeping any data, much less big data, has a definite cost. Processing larger amounts of data at scale is challenging, and hosting all that data on primary storage hasn’t always been feasible.

Historically, unless data had some corporate value — possibly as a history trail for compliance, a source of strategic insight or intelligence that can optimize operational processes — it was tough to justify keeping it. Today, thanks in large part to big data analytics applications, that thinking is changing. All of that bulky, low-level data may have little immediate value, but it could hold great potential someday, so you want to keep it; once it’s gone, you lose any downstream opportunity.

To extract value from all that data, however, IT must not only store increasingly large volumes of data, but also architect systems that can process and analyze it in multiple ways.

…(read the complete as-published article there)

Is It Still Artificial Intelligence? Knowm Rolls Out Adaptive Machine Learning Stack

(Excerpt from original post on the Taneja Group News Blog)

When we want to start computing at biological scales and speeds – evolving today’s practical machine learning forward towards long-deferred visions of a more “artificial intelligence” – we’ll need to take advantage of new forms of hardware that transcend the strictly digital.

Digital computing infrastructure, based on switching digital bits and separating the function of persisting data from processing it, is now facing some big hurdles with Moore’s law. Even if there are a couple of orders of magnitude of improvement yet to be squeezed out of the traditional digital design paradigm, it has inherent limits in power consumption, scale, and speed. For example, there simply isn’t enough power available to meet the desires of those wishing to reach biological scale and density with traditional computing infrastructure, whether evolving artificial intelligence or, more practically, scaling machine learning to ever-larger big data sets.

Knowm Inc. is pioneering a brilliant new form of computing that leverages the adaptive “learning” properties of memristive technology not only to persist data in fast memory (as others in the industry, like HP, are researching), but also to inherently, in a single operation, calculate serious compute functions that would otherwise require the stored data to be offloaded into CPUs, processed, and written back (taking more time and power).
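Knowm’s actual stack has its own APIs, but the underlying idea, a pair of memristors whose conductances both store a weight and perform the multiply in place, can be caricatured in a few lines of NumPy. This is a toy model of our own for intuition, not Knowm’s code or their chip design:

```python
# Toy NumPy caricature of a differential memristor-pair synapse.
# Our own illustrative model; NOT Knowm's API, chip behavior, or learning rule.
import numpy as np

class ToySynapse:
    """Effective weight = Ga - Gb, two normalized memristor conductances."""
    def __init__(self):
        self.ga, self.gb = 0.5, 0.5  # conductances clipped to [0, 1]

    def read(self, x):
        # The 'multiply' happens in the device itself (current = G * voltage),
        # so reading is also computing: no separate fetch/process/write-back.
        return (self.ga - self.gb) * x

    def adapt(self, x, error, rate=0.05):
        # Nudge conductances in opposite directions; learning happens in place.
        self.ga = np.clip(self.ga + rate * error * x, 0.0, 1.0)
        self.gb = np.clip(self.gb - rate * error * x, 0.0, 1.0)

# Teach one synapse to approximate y = 0.8 * x from random samples.
syn = ToySynapse()
rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.uniform(-1, 1)
    syn.adapt(x, 0.8 * x - syn.read(x))
print(round(syn.ga - syn.gb, 2))  # converges toward 0.8
```

The hardware version does this with physics rather than arithmetic, which is where the speed and power advantages come from.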

The Knowm synapse, their circuit-level integrated unit of calculation and data persistence, was inspired by biological and natural-world precedent. At a philosophical level it takes some deep thinking to fully appreciate the implications and opportunities, but this is no longer just theory. Today, Knowm is bringing its “neuromemristive” solution to market supported by a full stack of technologies: discrete chips, scalable simulators, defined low-level APIs and higher-level machine learning libraries, and a service that can help layer large quantities of Knowm synapses directly onto existing CMOS (Back End of Line, or BEOL) designs.

Knowm is aiming squarely at the machine learning market, but here at Taneja Group we think the opportunity is much larger. This approach, which intelligently harnesses analog hardware for extremely fast, cheap, dense, memory-inherent computing, could represent a true turning point for the whole computing industry.

I look forward to finding out who will take advantage of this solution first, and potentially cause a massively disruptive shift not just in machine learning, but in how all computing is done.

…(read the full post)

What was BIG at Hadoop Summit 2015

(Excerpt from original post on the Taneja Group News Blog)

At this month’s Hadoop Summit 2015 I noted two big trends. One was the continuing focus on Spark as an expansion of the big data analytical ecosystem, with main sponsor Hortonworks (great show, by the way!) and most vendors talking about how they support, interact with, or deliver Spark in addition to Hadoop’s MapReduce. The other was a very noticeable shift in focus, away from trotting out ever more gee-whiz big data use cases and toward making it all work in enterprise production environments. If you ask me, this second trend is the bigger deal for IT folks to pay attention to.
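To make the “Spark in addition to MapReduce” point concrete, here is the kind of minimal PySpark job being shown on the expo floor, a word count over files already sitting in HDFS; the application name and path are hypothetical:

```python
# Minimal PySpark sketch: the classic word count, reading straight from HDFS.
# The application name and HDFS path are placeholders, not real endpoints.
from pyspark import SparkContext

sc = SparkContext(appName="wordcount-sketch")
counts = (sc.textFile("hdfs:///data/events/2015/*.log")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
for word, n in counts.take(10):
    print(word, n)
sc.stop()
```

The same logic as a hand-written Java MapReduce job takes dozens of boilerplate lines, which is a big part of why vendors were racing to support Spark alongside Hadoop.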

…(read the full post)

Navigate data lakes to manage big data

An IT industry analyst article published by SearchStorage.


Big data sure is exciting to business folks, with all sorts of killer applications just waiting to be discovered. And you no doubt have a growing pile of data bursting the seams of your current storage infrastructure, with lots of requests to mine even more voluminous data streams. Haven’t you been collecting microsecond-resolution end-user behavior across all your customers and prospects, not to mention collating the petabytes of data exhaust from instrumenting your systems to the nth degree? Imagine the insight management would have if they could look at all that data at once. Forget about data governance, data management, data protection and all those other IT worries — you just need to land all that data in a relatively scale-cheap Hadoop cluster!

Seriously, though, big data lakes can meet growing data challenges and provide valuable new services to your business. Collecting a wide variety of business-relevant data sets in one place, and enabling multifaceted analytics based on big data approaches that scale easily, creates many new data mining opportunities. The total potential value of a data lake grows with the amount of useful data it holds available for analysis. And one of the key tenets of big data and the big data lake concept is that you don’t have to create a master schema ahead of time, so non-linear growth is possible.
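That no-master-schema tenet is what practitioners call schema-on-read. A small sketch of the idea using Spark, with a hypothetical lake path and column name, assuming raw JSON was simply landed in the lake as-is:

```python
# Schema-on-read sketch: raw JSON lands as-is; structure is applied at query time.
# The lake path and the 'user_id' column are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# No table or schema was declared before these files were written;
# Spark infers structure by sampling the files at read time.
events = spark.read.json("hdfs:///lake/raw/clickstream/")
events.printSchema()

# Different analyses can project different views over the same raw bytes.
events.groupBy("user_id").count().show(5)
spark.stop()
```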

The enterprise data lake (or data hub) concept was first proposed by big data vendors like Cloudera and Hortonworks, ostensibly using vanilla scale-out HDFS-based commodity storage. But it just so happens that the more data you keep on hand, the more storage of all kinds you will need. Eventually, all corporate data is likely to be considered big data. However, not all of that corporate data is best hosted on a commodity scale-out HDFS cluster.

So, today, traditional storage vendors are signing up to the big data lake vision. From a storage marketing perspective, it seems like data lakes are the new cloud: “Everyone needs a data lake. How can you compete without one (or two or three)?” And there are a variety of enterprise storage options for big data, including remote storage that acts like HDFS, Hadoop virtualization that can translate other storage protocols into HDFS, and scalable software-defined storage.

…(read the complete as-published article there)

In-Memory Big Data Heats Up With Apache Ignite

(Excerpt from original post on the Taneja Group News Blog)

Recently we posted about GridGain contributing their core in-memory solution to the Apache Ignite project. While it is still incubating, it’s clear this was a good move for GridGain, and a win for the big data/BI community in general. Today Apache Ignite drops its v1.0 release candidate with some new features added, like built-in support for JCache and an autoloader to help migrate data and schemas in from existing SQL databases (e.g., Oracle, MySQL, Postgres, DB2, Microsoft SQL Server).
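JCache (JSR 107) itself is a Java API, but the in-memory key-value model it standardizes is easy to sketch. Here is a minimal example using Ignite’s later Python thin client, pyignite, with a made-up cache name and an Ignite node assumed to be running locally:

```python
# Minimal sketch with the pyignite thin client (added to Ignite later on);
# assumes a local Ignite node listening on the default thin-client port.
from pyignite import Client

client = Client()
client.connect('127.0.0.1', 10800)

# An Ignite cache behaves like a distributed in-memory key-value map,
# the same model JCache standardizes for Java applications.
cache = client.get_or_create_cache('quotes')
cache.put('AAPL', 127.6)
print(cache.get('AAPL'))  # -> 127.6

client.close()
```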

…(read the full post)