Navigate data lakes to manage big data

An IT industry analyst article published by SearchStorage.


Big data sure is exciting to business folks, with all sorts of killer applications just waiting to be discovered. And you no doubt have a growing pile of data bursting the seams of your current storage infrastructure, with lots of requests to mine even more voluminous data streams. Haven’t you been collecting microsecond-level end-user behavior across all your customers and prospects, not to mention collating the petabytes of data exhaust from instrumenting your systems to the nth degree? Imagine the insight management would have if they could look at all that data at once. Forget about data governance, data management, data protection and all those other IT worries. You just need to land all that data in a relatively cheap, scalable Hadoop cluster!

Seriously, though, big data lakes can meet growing data challenges and provide valuable new services to your business. Collecting a wide variety of business-relevant data sets in one place, and analyzing them with versatile big data approaches that scale easily, creates many new data mining opportunities. The total potential value of a data lake grows with the amount of useful data it holds available for analysis. And because one of the key tenets of big data and the big data lake concept is that you don’t have to create a master schema ahead of time, the lake can take in new data sets without a redesign, so non-linear growth is possible.
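The "no master schema ahead of time" idea is often called schema-on-read: raw records land as they are, and each analysis picks out the fields it needs when it reads them. Here is a minimal Python sketch of that idea, not how any particular Hadoop tool does it. The sample records, the field names and the `read_with_schema` helper are all hypothetical, made up for illustration.

```python
import json

# Hypothetical raw records, landed in the lake as-is with no shared schema.
RAW_RECORDS = [
    '{"user": "a17", "event": "click", "latency_us": 412}',
    '{"host": "web-03", "cpu_pct": 71.5}',
    '{"user": "b42", "event": "view"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time.

    Yields only the requested fields, and only from records
    that contain every one of them.
    """
    for line in raw_lines:
        record = json.loads(line)
        if all(name in record for name in fields):
            yield {name: record[name] for name in fields}


# Two different analyses read the same raw data with different schemas.
user_events = list(read_with_schema(RAW_RECORDS, ["user", "event"]))
host_metrics = list(read_with_schema(RAW_RECORDS, ["host", "cpu_pct"]))

print(user_events)
print(host_metrics)
```

Adding a new kind of record to `RAW_RECORDS` needs no change to the existing readers, which is the property that lets a lake keep absorbing new data sets.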

The enterprise data lake, or data hub, concept was first proposed by big data vendors like Cloudera and Hortonworks, ostensibly using vanilla scale-out HDFS-based commodity storage. But it just so happens that the more data you keep on hand, the more storage of all kinds you will need. Eventually, all corporate data is likely to be considered big data. However, not all of that corporate data is best hosted on a commodity scale-out HDFS cluster.

So, today, traditional storage vendors are signing up to the big data lake vision. From a storage marketing perspective, it seems like data lakes are the new cloud. “Everyone needs a data lake. How can you compete without one (or two or three)?” And there are a variety of enterprise storage options for big data, including enterprise storage systems that provide remote storage that acts like HDFS, Hadoop virtualization that can translate other storage protocols into HDFS, and scalable software-defined storage.

…(read the complete as-published article there)