5 trends driving the big data evolution

An IT industry analyst article published by SearchDataManagement.


The speedy evolution of big data technologies is connected to five trends, including practical applications of machine learning and cheap, abundantly available compute resources.

Mike Matchett
Small World Big Data

I’ve long said that all data will eventually become big data, and big data platforms will evolve into our next-generation data processing platform. We have reached a point in big data evolution where it is now mainstream, and if your organization is not neck-deep in figuring out how to implement big data technologies, you might be running out of time.

Indeed, the big data world continues to change rapidly, as I observed recently at the Strata Data Conference in New York. While there, I met with over a dozen key vendors in sessions and on the show floor.

Overall, the folks attending conferences like this one are less and less the slightly goofy, idealistic, open source research-focused geeks, and more and more real-world big data and machine learning practitioners looking to solve real business problems in enterprise production environments. Given that basic vibe, here are my top five takeaways from Strata on the trends that are driving the big data evolution.

1. Structured data

Big data isn’t just about unstructured or semi-structured data anymore. Many of the prominent vendors, led by the key platform providers like Hortonworks, MapR and Cloudera, are now talking about big data implementations as full enterprise data warehouses (EDWs). The passive, often swampy data lake idea seems a bit passé, while there is a lot of energy aimed at providing practical, real-time business intelligence to a wider corporate swath of BI consumers.

I noted that a large number of the big data-based acceleration competitors are applying on-demand analytics against tremendous volumes of structured data, both historical and streaming IoT-style.

Clearly, there is a war going on for the corporate BI and EDW investment. Given what I’ve seen, my bet is on big data platforms to inevitably outpace and outperform monolithic and proprietary legacy EDW.

2. Converged system of action

This leads into the observation that big data evolution includes implementations that host more and more of a company’s entire data footprint — structured and unstructured data together.

We’ve previously noted that many advanced analytical approaches can add tremendous value when they combine many formerly disparate corporate data sets of all different types…(read the complete as-published article there)

Scaling All Flash to New Heights – DDN Flashscale All Flash Array Brings HPC to the Data Center

(Excerpt from original post on the Taneja Group News Blog)

It’s time to start thinking about massive amounts of flash in the enterprise data center. I mean PBs of flash for the biggest, baddest, fastest data-driven applications out there. This amount of flash requires an HPC-capable storage solution brought down and packaged for enterprise IT management, which is where DataDirect Networks (aka DDN) is stepping up. Perhaps too quietly, they have been hard at work pivoting their high-end HPC portfolio into the enterprise space. Today they are rolling out a massively scalable new flash-centric Flashscale 14KXi storage array that will help them offer complete, comprehensive single-vendor big data workflow solutions – from the fastest scratch through the biggest throughput parallel file systems into the largest distributed object storage archives.

…(read the full post)

Hyperconverged Supercomputers For the Enterprise Data Center

(Excerpt from original post on the Taneja Group News Blog)

Last month NVIDIA, our favorite GPU vendor, dived into the converged appliance space. In fact we might call their new NVIDIA DGX-1 a hyperconverged supercomputer in a 4U box. Designed to support the application of GPUs to Deep Learning (i.e. compute-intensive, deeply layered neural networks that need to train and run in operational timeframes over big data), this beast has 8 new Tesla P100 GPUs inside on an embedded NVLink mesh, pre-integrated with flash SSDs, decent memory, and an optimized container-hosting deep learning software stack. The best part? The price is surprisingly affordable, and a single box can replace the 250+ server cluster you might otherwise need for effective Deep Learning.

…(read the full post)

Big data analytics applications impact storage systems

An IT industry analyst article published by SearchStorage.


Whether driven by direct competition or internal business pressure, CIOs, CDOs and even CEOs today are looking to squeeze more value, more insight and more intelligence out of their data. They no longer can afford to archive, ignore or throw away data if it can be turned into a valuable asset. At face value, it might seem like a no-brainer — “we just need to analyze all that data to mine its value.” But, as you know, keeping any data, much less big data, has a definite cost. Processing larger amounts of data at scale is challenging, and hosting all that data on primary storage hasn’t always been feasible.

Historically, unless data had some corporate value (possibly as a history trail for compliance, or as a source of strategic insight or intelligence that could optimize operational processes), it was tough to justify keeping it. Today, thanks in large part to big data analytics applications, that thinking is changing. All of that bulky, low-level bigger data has little immediate value, but it might hold great potential someday, so you want to keep it. Once it’s gone, you lose any downstream opportunity.

To extract value from all that data, however, IT must not only store increasingly large volumes of data, but also architect systems that can process and analyze it in multiple ways.

…(read the complete as-published article there)

InfiniBand Updates Specs Preparing for 10000 Node Exascale Clusters

(Excerpt from original post on the Taneja Group News Blog)

We’ve long been fans of InfiniBand, watching as new generations of enterprise-class scale-out clusters and storage solutions learn from the HPC world how to achieve really high-speed interconnection. InfiniBand itself may never win the popular market race against Ethernet, but newer generations of Ethernet are looking more and more like InfiniBand. And parts of the IB world, namely RDMA and RoCE, have swept into datacenters almost unnoticed (e.g. look under the hood of SMB 3.0).

…(read the full post)