Can your cluster management tools pass muster?

An IT industry analyst article published by SearchDataCenter.


A big challenge for IT is managing large clusters effectively, especially as data grows bigger, workloads get mashed together, and operations need to become more agile.

Cluster designs are everywhere these days. Popular examples include software-defined storage, virtual infrastructure, hyper-convergence, public and private clouds, and, of course, big data. Clustering is the scale-out way to architect infrastructure to use commodity resources like servers and JBODs. Scale-out designs can gain capacity and performance incrementally, reaching huge sizes cost-effectively compared to most scale-up infrastructure.

Big clusters are appealing because they support large-scale convergence and consolidation initiatives that help optimize overall CapEx. So why haven’t we always used cluster designs for everyday IT infrastructure? Large cluster management and operations are quite complex, especially when you start mixing workloads and tenants. If you build a big cluster, you’ll want to make sure it gets used effectively, and that usually means hosting multiple workloads. As soon as that happens, IT has trouble figuring out how to prioritize or share resources fairly. This has never been easy — the total OpEx in implementing, provisioning, and optimally managing shared clustered architectures is often higher than just deploying fully contained and individually assigned scale-up products.
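To see why fair sharing is hard even in principle, here is a minimal sketch (in Python) of one classic policy, max-min fair sharing: no tenant gets more than it asked for, and leftover capacity is repeatedly redistributed among the tenants still asking. The workload names and the capacity figure are made up, and real cluster schedulers layer priorities, preemption and quotas on top of this bare idea.

    # Minimal max-min fair-share sketch: split a fixed pool of capacity among
    # tenants, never granting more than a tenant asked for, and redistributing
    # any surplus to the tenants that are still unsatisfied.
    def max_min_fair_share(demands, capacity):
        allocation = {tenant: 0.0 for tenant in demands}
        unsatisfied = set(demands)
        remaining = float(capacity)
        while unsatisfied and remaining > 1e-9:
            share = remaining / len(unsatisfied)
            for tenant in list(unsatisfied):
                grant = min(share, demands[tenant] - allocation[tenant])
                allocation[tenant] += grant
                remaining -= grant
                if allocation[tenant] >= demands[tenant]:
                    unsatisfied.remove(tenant)
        return allocation

    # Hypothetical example: three workloads competing for 100 units of capacity.
    # Expected result: roughly analytics=50, web=30, backup=20.
    print(max_min_fair_share({"analytics": 80, "web": 30, "backup": 20}, 100))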

…(read the complete as-published article there)

Container technology’s role in storage

An IT industry analyst article published by SearchServerVirtualization.


Could containers dethrone virtual machines as the next-generation compute architecture? I’ve heard many industry folks say that containers are moving into real deployments faster than almost any previous technology, driven as much by application developers, DevOps and business-side folks looking for agility as by IT’s need for efficiency and scale.

Containers were one of the hottest topics at VMworld 2015. VMware clearly sees a near-term mash-up of virtual machines and containers coming quickly to corporate data centers. And IT organizations still need to uphold security and data management requirements — even with containerized applications. VMware has done a bang-up job of delivering that on the VM side, and now it’s weighed in with designs that extend its virtualization and cloud management solutions to support (and, we think, ultimately assimilate) enterprise containerization projects.

VMware’s new vSphere Integrated Containers (VICs) make managing and securing containers, which in this case run nested in virtual machines (called “virtual container hosts”), pretty much the same as managing and securing traditional VMs. The VICs show up in VMware management tools as first-class IT-managed objects equivalent to VMs, and inherit much of what vSphere offers for virtual machine management, including robust security. This makes container adoption something every VMware customer can simply slide into.

However, here at Taneja Group we think the real turning point for container adoption will be when containers move beyond being simply stateless compute engines and deal directly with persistent data.

…(read the complete as-published article there)

Flash storage market remains a tsunami

An IT industry analyst article published by SearchSolidStateStorage.


A few months ago, Taneja Group surveyed 694 enterprise IT folks (about half in management, half in architecture/operations) about their storage acceleration and performance needs, perceptions and plans. Of course, examining the role and future of flash storage was a big part of our analysis of the flash storage market.

One of the key questions we asked was whether they thought all-flash arrays would be used for all tier 1 workloads in the enterprise data center by the end of 2017, less than two years out. We found that 18% agreed without qualification, while another 35% agreed but thought they might need more time to accommodate natural storage refresh cycles. Together, that’s a majority of 53% firmly in the all-flash future camp, while only 10% outright disagreed that all-flash would become the dominant storage platform.

Of course “tier 1” can mean different things to different folks, but people generally agree that tier 1 is their primary application storage powering key business processes. We followed up with several vendors about their visions of an all-flash future footprint and, unsurprisingly, found broader, more inclusive descriptions. In general, all-flash array vendors think that all tier 1 and tier 2 data center storage could be on all-flash, while vendors with wider portfolios, including traditional arrays and hybrids, have naturally hedged their bets on the flash storage market to “let” clients obtain what they see as best fitting their needs.

…(read the complete as-published article there)

Big data analytics applications impact storage systems

An IT industry analyst article published by SearchStorage.


Whether driven by direct competition or internal business pressure, CIOs, CDOs and even CEOs today are looking to squeeze more value, more insight and more intelligence out of their data. They no longer can afford to archive, ignore or throw away data if it can be turned into a valuable asset. At face value, it might seem like a no-brainer — “we just need to analyze all that data to mine its value.” But, as you know, keeping any data, much less big data, has a definite cost. Processing larger amounts of data at scale is challenging, and hosting all that data on primary storage hasn’t always been feasible.

Historically, unless data had some corporate value, possibly as a history trail for compliance or as a source of strategic insight or intelligence that could optimize operational processes, it was tough to justify keeping it. Today, thanks in large part to big data analytics applications, that thinking is changing. All that bulky, low-level data has little immediate value, but it might hold great potential someday, so you want to keep it; once it's gone, you lose any downstream opportunity.

To extract value from all that data, however, IT must not only store increasingly large volumes of data, but also architect systems that can process and analyze it in multiple ways.

…(read the complete as-published article there)

Memristor technology brings about an analog revolution

An IT industry analyst article published by SearchDataCenter.


We are always driven to try to do smarter things faster. It’s human nature. In our data centers, we layer machine learning algorithms over big and fast data streams to create that special competitive business edge (or greater social benefit!).

Yet for all its processing power, performance and capacity, today’s digital-based computing and storage can’t compare to what goes on inside each of our very own, very analog brains, which vastly outstrip digital architectures by six, seven or even eight orders of magnitude. If we want to compute at biological scales and speeds, we must take advantage of new forms of hardware that transcend the strictly digital.

Many applications of machine learning are based on examining data’s inherent patterns and behavior, and then using that intelligence to classify what we know, predict what comes next, and identify abnormalities. This isn’t terribly different from our own neurons and synapses, which learn from incoming streams of signals, store that learning, and allow it to be used “forward” to make more intelligent decisions (or take actions). In the last 30 years, AI practitioners have built practical neural nets and other types of machine learning algorithms for various applications, but they are all bound today by the limitations of digital scale (an exponentially growing Web of interconnections is but one facet of scale) and speed.
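As a purely illustrative aside (nothing below reflects memristor hardware, just the learning loop described above), even the simplest artificial neuron shows the pattern of learning from incoming signals, storing that learning, and using it forward. This toy perceptron, sketched in Python with NumPy on synthetic data, nudges its weights, a crude stand-in for synaptic strengths, as labeled examples stream past and then reuses them to classify:

    import numpy as np

    # Toy single "neuron": the weights act as synaptic strengths that are
    # adjusted as labeled signals stream in, then reused "forward" to classify.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                  # 200 two-feature input signals
    y = (X[:, 0] + X[:, 1] > 0).astype(float)      # synthetic, linearly separable labels

    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(20):                            # a few passes over the stream
        for xi, yi in zip(X, y):
            pred = 1.0 if xi @ w + b > 0 else 0.0
            err = yi - pred                        # learning = nudging the weights
            w += lr * err * xi
            b += lr * err

    correct = sum((1.0 if xi @ w + b > 0 else 0.0) == yi for xi, yi in zip(X, y))
    print("learned weights:", w, "training accuracy:", correct / len(y))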

Today’s digital computing infrastructure, based on switching digital bits, faces some big hurdles to keep up with Moore’s Law. Even if there are a couple of magnitudes of improvement yet to be squeezed out of the traditional digital design paradigm, there are inherent limits in power consumption, scale and speed. Whether we’re evolving artificial intelligence into humanoid robots or more practically scaling machine learning to ever-larger big data sets to better target the advertising budget, there simply isn’t enough raw power available to reach biological scale and density with traditional computing infrastructure.

…(read the complete as-published article there)

IT pros get a handle on machine learning and big data

An IT industry analyst article published by SearchDataCenter.


Machine learning is the force behind many big data initiatives. But things can go wrong when implementing it, with significant effects on IT operations.

Unfortunately, predictive modeling can be fraught with peril if you don’t have a firm grasp of the quality and veracity of the input data, the actual business goal and the real world limits of prediction (e.g., you can’t avoid black swans).

It’s also easy for machine learning and big data beginners to either build needlessly complex models or “overtrain” on the given data (learning too many details of the specific training data that don’t apply generally). In fact, it’s quite hard to know when you have achieved the smartest yet still “generalized” model to take into production.
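One way to catch that overtraining, sketched below with scikit-learn on an entirely synthetic data set, is to hold some data back and compare accuracy on the training data with accuracy on the held-out data; a large gap between the two is the classic warning sign.

    # Hedged sketch: compare training vs. held-out accuracy to spot overtraining.
    # An unconstrained decision tree tends to memorize the training data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        random_state=0)

    deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("train:", deep_tree.score(X_train, y_train),   # typically ~1.0
          "test:", deep_tree.score(X_test, y_test))      # typically noticeably lower

    pruned_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
    print("train:", pruned_tree.score(X_train, y_train),
          "test:", pruned_tree.score(X_test, y_test))    # a smaller gap suggests better generalization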

Another challenge is that the metrics of success vary widely depending on the use case. There are dozens of metrics used to describe the quality and accuracy of the model output on test data. Even as an IT generalist, it pays to at least get comfortable with the matrix of machine learning outcomes, expressed with quadrants for the counts of true positives, true negatives, false positives (items falsely identified as positive) and false negatives (positives that were missed).
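For readers who want to see those quadrants concretely, here is a small illustration using scikit-learn and made-up labels; precision and recall, two of the most common metrics, fall straight out of the four counts.

    # Sketch of the quadrant counts described above, using made-up labels.
    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # actual outcomes
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # model predictions
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("true positives:", tp, "true negatives:", tn,
          "false positives:", fp, "false negatives:", fn)

    # The usual metrics come straight from these four counts, e.g.:
    precision = tp / (tp + fp)   # of items flagged positive, how many really were
    recall = tp / (tp + fn)      # of real positives, how many were caught
    print("precision:", precision, "recall:", recall)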

…(read the complete as-published article there)

Integrate cloud tiering with on-premises storage

An IT industry analyst article published by SearchCloudStorage.


Cloud and on-premises storage are increasingly becoming integrated. This means cloud tiering is just another option available to storage administrators. Organizations aren’t likely to move 100% of their data into cloud services, but most will want to take advantage of cloud storage benefits for at least some data. The best approaches to using cloud storage in a hybrid fashion create a seamless integration between on-premises storage resources and the cloud. The cloud tiering integration can be accomplished with purpose-built software, cloud-enabled applications or the capabilities built into storage systems or cloud gateway products.
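To make the tiering policy idea concrete, here is a hedged sketch in Python (assuming the boto3 library) of simple age-based tiering: anything untouched for 90 days gets copied to an object store. The share path and bucket name are hypothetical, and commercial gateways and cloud-integrated arrays do this transparently, handling stubbing, verification and recall for you.

    # Hedged sketch of age-based cloud tiering: files not modified for N days are
    # copied to an object store and could then be stubbed or reclaimed locally.
    import os
    import time
    import boto3

    COLD_AFTER_DAYS = 90
    LOCAL_ROOT = "/mnt/nas/projects"          # hypothetical on-premises share
    BUCKET = "example-cold-tier"              # hypothetical S3 bucket

    s3 = boto3.client("s3")
    cutoff = time.time() - COLD_AFTER_DAYS * 86400

    for dirpath, _dirnames, filenames in os.walk(LOCAL_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:          # not touched recently
                key = os.path.relpath(path, LOCAL_ROOT)  # keep the directory layout
                s3.upload_file(path, BUCKET, key)
                # A production tier would verify the upload, then replace the
                # local file with a stub or reclaim the space.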

This may be the year that public cloud adoption finally moves beyond development projects and Web 2.0 companies and enters squarely into the mainstream of IT. Cloud service providers can offer tremendous advantages in terms of elasticity, agility, scalable capacity and utility pricing. Of course, there remain some unavoidable concerns about security, competitiveness, long-term costs and performance. Also, not all applications or workloads are cloud-ready, and most organizations are not able to operate fully in a public cloud. These concerns are why what we see in practice is a hybrid cloud approach, one that attempts to combine the best of both worlds.

Taneja Group research supports that view, finding that only about 10% of enterprise IT organizations are even considering moving wholesale into public clouds. The vast majority of IT shops continue to envision future architectures, at least for the next 3-5 years, that combine cloud and on-premises infrastructure augmented by hyperconverged products. Yet, in those same shops, increasing storage consolidation, virtualization and building out cloud services are the top IT initiatives planned for the next 18 months. These initiatives lean toward using public cloud capabilities where they make sense: supporting Web apps and mobile users, collaboration and sharing, deep archives, off-site backups, DRaaS and even, in some cases, a primary storage tier.

The amount of data that many IT shops will have to store, manage, protect and help process is, by many estimates, predicted to double every year for the foreseeable future; doubling annually means 32 times as much data in five years. Given very real limits on data centers, staffing and budget, it will become increasingly hard to deal with that growth completely in-house.

…(read the complete as-published article there)

Intro to machine learning algorithms for IT professionals

An IT industry analyst article published by SearchDataCenter.


Our data center machines, due to all the information we feed them, are getting smarter. How can you use machine learning to your advantage?

Machine learning is a key part of how big data brings operational intelligence into our organizations. But while machine learning algorithms are fascinating, the science gets complex very quickly. We can’t all be data scientists, but IT professionals need to learn about how our machines are learning.

We are increasingly seeing practical and achievable goals for machine learning, such as finding usable patterns in our data and then making predictions. Often, these predictive models are used in operational processes to optimize an ongoing decision-making process, but they can also provide key insight and information to inform strategic decisions.

The basic premise of machine learning is to train an algorithm to predict an output value within some probabilistic bounds when it is given specific input data. Keep in mind that machine learning techniques today are inductive, not deductive; they produce probabilistic correlations, not definitive conclusions.
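A minimal sketch of that premise, using scikit-learn on synthetic data: fit a model that maps inputs to outputs, then report a prediction along with a rough uncertainty band derived from the training residuals (a crude stand-in for proper probabilistic bounds).

    # Minimal supervised-learning sketch: fit input -> output on synthetic data,
    # then predict with a rough uncertainty band (here, +/- twice the residual spread).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 10, size=(200, 1))                # input data
    y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.5, 200)    # output with noise

    model = LinearRegression().fit(X, y)
    residual_std = np.std(y - model.predict(X))

    x_new = np.array([[7.5]])
    estimate = model.predict(x_new)[0]
    print(f"predicted output: {estimate:.1f} +/- {2 * residual_std:.1f} (rough 95% band)")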

…(read the complete as-published article there)

Navigate data lakes to manage big data

An IT industry analyst article published by SearchStorage.


Big data sure is exciting to business folks, with all sorts of killer applications just waiting to be discovered. And you no doubt have a growing pile of data bursting the seams of your current storage infrastructure, with lots of requests to mine even more voluminous data streams. Haven’t you been collecting microsecond end-user behavior across all your customers and prospects, not to mention collating the petabytes of data exhaust from instrumenting your systems to the nth degree? Imagine the insight management would have if they could look at all that data at once. Forget about data governance, data management, data protection and all those other IT worries — you just need to land all that data in a relatively scale-cheap Hadoop cluster!

Seriously, though, big data lakes can meet growing data challenges and provide valuable new services to your business. By collecting a wide variety of data sets relevant to the business in one place and enabling many kinds of analytics built on big data approaches that scale easily, a data lake can create many new data mining opportunities. The total potential value of a data lake grows with the amount of useful data it holds available for analysis. And one of the key tenets of big data and the data lake concept is that you don’t have to create a master schema ahead of time, so non-linear growth is possible.
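That schema-on-read idea is easy to see in a short sketch. Assuming a Spark environment and a hypothetical clickstream landing area in HDFS (the path and the page field are made up), raw JSON can be landed as-is and its structure inferred only when someone gets around to analyzing it:

    # Hedged schema-on-read sketch: raw JSON events are landed in the lake as-is,
    # and the schema is inferred at analysis time rather than defined up front.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lake-explore").getOrCreate()

    events = spark.read.json("hdfs:///lake/raw/clickstream/2016/*/*.json")
    events.printSchema()                      # schema discovered from the data itself

    # Analyses stay ad hoc and cheap to add later (the "page" field is hypothetical):
    events.groupBy("page").count().orderBy("count", ascending=False).show(10)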

The enterprise data lake, or data hub, concept was first proposed by big data vendors like Cloudera and Hortonworks, ostensibly built on vanilla scale-out HDFS-based commodity storage. But it just so happens that the more data you keep on hand, the more storage of all kinds you will need. Eventually, all corporate data is likely to be considered big data. However, not all of that corporate data is best hosted on a commodity scale-out HDFS cluster.

So, today, traditional storage vendors are signing on to the big data lake vision. From a storage marketing perspective, it seems like data lakes are the new cloud. “Everyone needs a data lake. How can you compete without one (or two or three)?” And there are a variety of enterprise storage options for big data, including arrays that can present remote storage as HDFS, Hadoop virtualization layers that translate other storage protocols into HDFS, and scalable software-defined storage.

…(read the complete as-published article there)

Data aware storage yields insights into business info

An IT industry analyst article published by SearchDataCenter.


Many people think that IT infrastructure is critical, but not something that provides unique differentiation and competitive value. But that’s about to change, as IT starts implementing more “data-aware” storage in the data center.

When business staffers are asked what IT should and could do for them, they can list confused, contrary and naïve desires that have little to do with infrastructure (assuming minimum service levels are met). As IT shops grow to become service providers to their businesses, they pay more attention to what is actually valuable to the systems they serve. The best IT shops are finding that a closer look at what infrastructure can do “autonomically” yields opportunities to add great value.

Today, IT storage infrastructure is smarter about the data it holds. Big data processing capabilities provide the motivation to investigate formerly disregarded data sets. Technological resources are getting denser and more powerful — converged is the new buzzword across infrastructure layers — and core storage is not only getting much faster with flash and in-memory approaches, but can take advantage of a glut of CPU power to locally perform additional tasks.

Storage-side processing isn’t just for accelerating latency-sensitive financial applications anymore. Thanks to new kinds of metadata analysis, it can help IT create valuable new data services…
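As a toy illustration of that metadata angle, the sketch below crawls a hypothetical file share using nothing but the Python standard library and tallies two facts a data-aware system might surface: capacity consumed by file type, and capacity nobody has read in a year. Real data-aware storage gathers this kind of intelligence inline, at far greater scale, and often down to content-level metadata.

    # Toy metadata crawl: gather per-file facts a data-aware system might index
    # (type, size, last access) and summarize them.
    import os
    import time
    from collections import Counter

    ROOT = "/mnt/shared"                       # hypothetical file share
    by_extension = Counter()
    stale_bytes = 0
    cutoff = time.time() - 365 * 86400         # untouched for a year

    for dirpath, _dirs, files in os.walk(ROOT):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue                       # skip files that vanish or deny access
            by_extension[os.path.splitext(name)[1].lower()] += st.st_size
            if st.st_atime < cutoff:
                stale_bytes += st.st_size

    print("capacity by file type:", by_extension.most_common(5))
    print("bytes not read in a year:", stale_bytes)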

…(read the complete as-published article there)