(Excerpt from original post on the Taneja Group News Blog)
When we talk about big data today, we aren’t just talking about the data and its three V’s (or up to 15, depending on who you consult), but more and more about the promise of big transformation of the data center. In other words, it’s about big money.
First, consider recent news about some key Hadoop distro vendors. Many of them are now billion-dollar players, with much of that value resting on speculation and expectations of future data center occupation. When Pivotal spun off from EMC, it got to start with a gaggle of successful, commercially deployed products, giving it a tremendous day-one revenue stream. GE’s 10% outside stake at $105M made it a billion-dollar startup. Coming back from the Cloudera Analyst Event last month, we found that Cloudera was doing really well with $160M in new funding, but soon thereafter Intel weighed in to top it up to over a billion in funding (at a valuation of $4.1B). Not to be left out in the cold, Hortonworks announced a $100M round that valued it at $1B (it claims it could take in 20x more, but is raising funds as it needs them).
Second, consider the infrastructure that not just billions but trillions and more (gadzillions?) of pieces of data still have to land on, even if it is made up of commodity disk/server clusters. Of course, most companies are going to want to build out big data solutions, or they risk getting left behind competitively. But many of these are eventually going to turn into massive investments that only grow as the data grows (and that growth is predicted to be exponential!) and that occupy more and more of the data center, rather than staying constrained as little R&D projects or simple data warehouse offloading platforms.
Clearly, big data is now a playing field for competition amongst billionaires. I’m sure the many startups in that space are only encouraged by the ecosystem’s wealth and the opportunity for acquisition, but as the big money grows, keep an eye on how standards and open source foundations increasingly become political battlefields, with results not always in the best interest of the end user.
While there is an open source model underpinning the whole ecosystem, with this much money on the table it will be interesting to see how fair competition plays out. From my perspective it looks like big data isn’t going to be very free, or there wouldn’t be billions of dollars in bets being made. Up till now most of the ecosystem vendors have been making arguments about providing better support than the other guy. In that academic view, there is not much call for outside analysis or third party validation.
But every big, big data vendor we talk to now has some proprietary angle on how they do things better than the next guy – with lurking implied vendor lock-in – based on how enterprises can effectively manage big data or extract maximal value from it. Which sounds like the current IT vendor ecosystem we know and love. And which requires some analysis and validation to separate the wheat from the chaff, the real from the hype.
As an IT organization faced with big data challenges, how do you feel about suddenly dealing with billion-dollar behemoths in a space founded on open source principles? In the end, it doesn’t really impact our recommended approach – you need to have enterprise capabilities for big data, and you were always likely to get the best of those from vendors with highly competitive proprietary technology. We’ve started working with big data vendors now as real IT solution vendors. In our book, Pivotal, Cloudera, Hortonworks and the like have simply graduated into the full-fledged IT vendor category, which can only help the IT organization faced with enterprise-level big data challenges.