An IT industry analyst article published by SearchITOperations.
Enterprise IT was slow to support big data analytics in production on the decade-old Hadoop, but the ramp-up has been much faster now that Spark is part of the overall package. After all, applying the same old business intelligence approach to broader, bigger data (with MapReduce) isn’t exciting, but producing predictive intelligence at operational speed, which guides and optimizes the business with machine precision, is a competitive must-have.
With traditional business intelligence (BI), an analyst studies a lot of data, forms some hypotheses, and draws a conclusion to make a recommendation. With the many big data machine learning techniques supported by Spark’s MLlib, a company’s big data can instead dynamically drive operational-speed optimizations. Massive in-memory machine learning algorithms enable businesses to immediately recognize and act on inherent patterns, even in big streaming data.
But the commoditization of machine learning itself isn’t the only new driver here. A decade ago, IT needed either to stand up a “baby” high-performance computing cluster for serious machine learning or to learn to write low-level distributed parallel algorithms to run on the commodity-based Hadoop MapReduce platform. Either option required both data science expertise and exceptionally talented IT admins who could stand up and support massive physical scale-out clusters in production. Today there are many infrastructure options for big data clusters that can help IT deploy and support big data-driven applications.
Here are seven types of big data infrastructures for IT to consider, each with core strengths and differences: … (read the complete as-published article there)