The Advent of Big Data
(Excerpt from original post on the Taneja Group News Blog)
The advent of the era of “Big Data” presents a real opportunity to accelerate business, improve government, and even enrich our personal lives in ways we have only begun to imagine. With new methods and technology, we can now feasibly mine valuable insights and knowledge out of huge mountains of raw data. At the same time, implementing big data projects is a challenge not just for the data scientists boldly exploring a brave new world of algorithms and data sources, but perhaps even more so for the IT professionals tasked with implementing real-world, production-quality big data platforms and solutions.
Explosive data growth was a perennial problem for IT well before big data requirements came along. We’ve seen technologies like dedupe and compression become common features in enterprise storage, and yet the data deluge keeps coming. As the world moves into the 21st century, evolving and emerging technologies increasingly instrument, measure, and record finer-grained details, at higher volumes, about ever more subjects. In the business world this means vastly more raw data is being generated every day about customers (and their behavior both online and offline), vendors and supply chains, and internal operations and processes.
IT is also having to rethink data life cycle management, because one of the tenets of big data analysis is that you have to retain all that big data (in its original raw form, according to some best practices) in order to analyze it. Data deletion may become a thing of the past in favor of near-infinite archives, which certainly creates tension with risk mitigation policies that limit the lifetimes of certain kinds of data (e.g., email). Archiving itself takes on a whole new meaning when archives also need to be actively accessible for historical analysis.
Just capturing and maintaining the increasing stream of “3 Vs” big data (huge Volume, wide Variety, high Velocity) is challenge enough, but IT also needs to support timely analysis. This entails standing up (or subscribing to) clustered Hadoop and other advanced BI stacks that still at times seem more like university projects than commercial solutions. Hot new web companies with VC money to burn can write their own big data solutions, but even the most agile enterprise IT shop will face support challenges implementing HPC-like solutions for production big data applications.
And let’s not forget that big data analyses may also produce some fairly big results, which need to be searchable and shareable across often globally distributed organizations in order to actually realize value from the effort. Cloud storage is getting cheaper and more secure by the day, and WAN optimization technologies are getting more efficient, but data transfer constraints and costs will still bottleneck ever larger big data workflows.
Big data and its potential benefits are tremendously exciting, and the question isn’t really whether you should pursue big data projects, but whether you can. IT organizations will need new skills, cutting-edge resources, and most of all business commitment to look for new intelligence in their big data, both to competitively enhance existing processes and to derive new business opportunities. Big data is a chance to evolve through applied IT, but only if the business organization understands and aligns with the opportunity.