What was BIG at Hadoop Summit 2015

(Excerpt from original post on the Taneja Group News Blog)

At this month’s Hadoop Summit 2015 I noted two big trends. One was the continuing focus on Spark as an expansion of the big data analytical ecosystem, with main sponsor Hortonworks (great show by the way!) and most vendors talking about how they support, interact with, or deliver Spark in addition to Hadoop’s MapReduce. The other was a very noticeable shift in focus away from trotting out ever more gee-whiz big data use cases and towards talking about how to make it all work in enterprise production environments. If you ask me, this second trend is the bigger deal for IT folks to pay attention to.

…(read the full post)

A Billion Here, A Billion There – Big Data Is Big Money

(Excerpt from original post on the Taneja Group News Blog)

When we talk about big data today we aren’t talking just about the data and its three V’s (or up to 15 depending on who you consult), but more and more about the promise of big transformation to the data center. In other words, it’s about big money.

First, consider recent news about some key Hadoop distro vendors. Many of them are now billion-dollar players, much of that on speculation and expectation of future data center occupation. When Pivotal spun off from EMC it got to start with a gaggle of successful, commercially deployed products, giving it a tremendous day-one revenue stream. GE’s 10% outside stake at $105M made it a billion-dollar startup. Coming back from the Cloudera Analyst Event last month, we had found that Cloudera was doing really well with $160M in new funding, but soon thereafter Intel weighed in to top them up to over a billion in funding (at a $4.1B valuation). Not to be left out in the cold, Hortonworks announced a $100M round that valued them at $1B (ostensibly they claim they could take in 20x more, but are raising funds as they need them).

Second, consider the infrastructure that not just billions but trillions and more (gadzillions?) of pieces of data still have to land on, even if it is made up of commodity disk/server clusters. Of course most companies are going to want to build out big data solutions, or they risk getting left behind competitively. But many of these are eventually going to turn into massive investments that only grow as the data grows (i.e. predicted to be exponential!) and occupy more and more of the data center, rather than staying constrained as little R&D projects or simple data warehouse offloading platforms.

Clearly big data is now a playing field for competition amongst billionaires. I’m sure the many startups in that space are only encouraged by the ecosystem wealth and opportunity for acquisition, but as the big money grows, keep an eye on how standards and open source foundations increasingly become political battlefields, with results not always in the best interest of the end user.

While there is an open source model underpinning the whole ecosystem, with this much money on the table it will be interesting to see how fair competition plays out. From my perspective it looks like big data isn’t going to be very free, or there wouldn’t be billions of dollars in bets being made. Up till now most of the ecosystem vendors have been making arguments about providing better support than the other guy.  In that academic view, there is not much call for outside analysis or third party validation.

But every big, big data vendor we now talk to has some proprietary angle on how they do things better than the next guy (with lurking implied vendor lock-in), based on how enterprises can effectively manage big data or extract maximal value from it. Which sounds like the current IT vendor ecosystem we know and love. And which requires some analysis and validation to separate the wheat from the chaff, the real from the hype.

As an IT organization faced with big data challenges, how do you feel about suddenly dealing with billion dollar behemoths in a space founded on open source principles? In the end, it doesn’t really impact our recommended approach – you need to have enterprise capabilities for big data, and you always were likely to get the best of those from vendors with highly competitive proprietary technology. We’ve started working with big data vendors now as real IT solution vendors. In our book, Pivotal, Cloudera, Hortonworks and the like have simply graduated into the full-fledged IT vendor category, which can only help the IT organization faced with enterprise-level big data challenges.

…(read the full post)

Enterprise IT Will Dive Into Big Data Solutions in 2013

(Excerpt from original post on the Taneja Group News Blog)

If you are in IT, 2013 is going to be the year that you will want to dive into the “big data” pool if you haven’t been pushed in already. But don’t worry – it’s no longer sink or swim. For one, we’ll be here to help coach you through it all. And while the concepts, terminology and hype have been all over the place, once you start floating around you’ll find that under the surface much of what fills the big data pool is familiar IT infrastructure, data management, and services re-cast around a few easy-to-grasp innovations.

For example, if you are in IT and asked to pick a Hadoop distro to stand up, you’d probably start with evaluating the three main vendor distributions of Hadoop (rather than getting it straight off Apache) followed by other downstream OEM’d and pre-integrated versions. The main supported distros are from Cloudera, Hortonworks, and MapR. I didn’t really appreciate the differences until talking with all three individually (at 2012 NY Strata, see below).

What really struck me was that Hadoop, from an IT perspective, is really a set of distributed data storage services at heart. Sure, you can talk about Hadoop as an architecture based on parallel functional programming that uses map/reduce tasks to move compute processes to distributed data sets, or as an analytics platform for huge volumes of unstructured data, or as a commoditized/democratized scale-out data mining solution, but from an IT viewpoint, it looks and smells a lot like a way to cost-effectively store and access large volumes of data. It’s an innovative twist on the age-old IT data storage problem when faced with ever-growing amounts of data: how to support your business in cost-effectively deriving value out of growing data stores.
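For readers who haven’t seen the map/reduce idea up close, here is a minimal single-process sketch in plain Python. It is not Hadoop code and uses no Hadoop APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the toy "partitions" are assumptions made up for illustration. The point is only the shape of the model: a map step runs against each chunk of data where it sits, the results are grouped by key, and a reduce step combines each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all the values collected for one key."""
    return key, sum(values)


def word_count(partitions):
    """Run map over every line of every partition, then shuffle and reduce."""
    mapped = chain.from_iterable(
        map_phase(line) for partition in partitions for line in partition
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical stand-in for data blocks stored on two different nodes.
partitions = [["big data is big"], ["data is money"]]
counts = word_count(partitions)
print(counts)
```

In a real cluster the map calls would run in parallel on the nodes holding each block, which is exactly why the storage layer matters so much to IT.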

In a nutshell, Hortonworks takes the high road regarding open source Hadoop and related Apache projects, and focuses on 100% open source support. There is a good argument that by sticking with open source, you are assured future open source benefits. If you start paying for bastardized distros and versions, you will, well, have to continue paying for them.

Cloudera is more commercially focused on helping businesses deploy productive big data solutions. While offering a free Cloudera packaged distro version (CDH), they have a premium enterprise management offering when you get over 50 nodes. CDH is the most widely used Hadoop distro today. For example, you can buy a Dell Cloudera pre-integrated cluster, ready-to-go.

MapR thought that as long as they were going to focus on helping businesses compete, they shouldn’t be constrained by open source and should instead aggressively optimize the whole thing, including the core parts, for high performance and scalability. MapR also has a free community edition (M3), but the real enterprise solutions are M5/M7, which add sophisticated enterprise storage features directly to Hadoop including mirroring, snapshots, HA, and data placement controls (MapR optimizes other parts too, including now a highly customized HBase). In other words, at the fundamental level MapR enterprise versions look like hard-core “big” data storage services (EMC chose MapR for its Greenplum distro).

We’ll dive into more big data topics soon, but I wanted to say I thoroughly enjoyed Strata (and the combined Hadoop World) in NY a few months ago. It was sold out to the “walls”. In fact, folks walking in just to see the vendor exhibition floor were turned away because the place was already packed to the maximum allowed by fire codes. The Strata/Hadoop co-sponsors O’Reilly and Cloudera should think about stepping up to a much larger venue like the Javits Center next time. Let vendors have bigger booths and welcome all who are interested to drop in. Maybe even make the vendor show floor “free” to IT folks? I’m sure the vendors would love the traffic and attention.

I am looking forward to future Strata events. One is coming up on the West Coast next month (Feb 26-28) and I’d encourage IT folks in the area to check it out. The program web page indicates they recognize the IT impact big data is having, with “Big Data for Enterprise IT” themed presentations. Hope to see you there!

…(read the full post)