Kudu Might Be Invasive: Cloudera Breaks Out Of HDFS

(Excerpt from original post on the Taneja Group News Blog)

For the IT crowd just now getting used to the idea of big data’s HDFS (Hadoop’s Distributed File System) and its peculiarities, there is an alternative open source big data storage engine coming from Cloudera called Kudu. Like HDFS, Kudu is designed to be hosted across a scale-out cluster of commodity systems, but it is specifically intended to support lower-latency analytics.

…(read the full post)

Integrate cloud tiering with on-premises storage

An IT industry analyst article published by SearchCloudStorage.


Cloud and on-premises storage are increasingly becoming integrated. This means cloud tiering is just another option available to storage administrators. Organizations aren’t likely to move 100% of their data into cloud services, but most will want to take advantage of cloud storage benefits for at least some data. The best approaches to using cloud storage in a hybrid fashion create a seamless integration between on-premises storage resources and the cloud. The cloud tiering integration can be accomplished with purpose-built software, cloud-enabled applications or the capabilities built into storage systems or cloud gateway products.

This may be the year that public cloud adoption finally moves beyond development projects and Web 2.0 companies and enters squarely into the mainstream of IT. Cloud service providers can offer tremendous advantages in terms of elasticity, agility, scalable capacity and utility pricing. Of course, there remain some unavoidable concerns about security, competitiveness, long-term costs and performance. Also, not all applications or workloads are cloud-ready, and most organizations are not able to operate fully in a public cloud. These concerns are why, in practice, we see a hybrid cloud approach that attempts to combine the best of both worlds.

Taneja Group research supports that view, finding that only about 10% of enterprise IT organizations are even considering moving wholesale into public clouds. The vast majority of IT shops continue to envision future architectures built on both cloud and on-premises infrastructure, augmented by hyperconverged products, at least for the next 3-5 years. Yet, in those same shops, increasing storage consolidation, virtualization and building out cloud services are the top IT initiatives planned for the next 18 months. These initiatives lean toward using available public cloud capabilities where it makes sense — supporting Web apps and mobile users, collaboration and sharing, deep archives, off-site backups, DRaaS and even, in some cases, as a primary storage tier.

The amount of data that many IT shops will have to store, manage, protect and help process is predicted, by many estimates, to double every year for the foreseeable future. Given very real limits on data centers, staffing and budget, it will become increasingly hard to deal with this data growth completely in-house.
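To make the idea of the cloud as just another tier concrete, here is a minimal sketch of one common flavor of tiering policy: an object lifecycle rule that migrates aging data to colder storage classes. The bucket name, prefix and day thresholds are illustrative assumptions only; gateway and array products implement their own policy engines, but the shape of the rules is similar.

```python
# Minimal sketch: express an age-based tiering policy as an S3 lifecycle rule.
# Bucket name, prefix and day thresholds are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",           # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-cold-data",
                "Filter": {"Prefix": "backups/"},  # only tier this prefix
                "Status": "Enabled",
                "Transitions": [
                    # After 30 days, move objects to infrequent-access storage
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # After 180 days, move them to archival storage
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```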

…(read the complete as-published article there)

Moving to all-flash? Think about your data storage infrastructure

An IT industry analyst article published by SearchStorage.


Everyone is now onboard with flash. All the key storage vendors have at least announced entry into the all-flash storage array market, with most having offered hybrids — solid-state drive-pumped traditional arrays — for years. As silicon storage gets cheaper and denser, it seems inevitable that data centers will migrate from spinning disks to “faster, better and cheaper” options, with non-volatile memory poised to be the long-term winner.

But the storage skirmish today seems to be heading toward the total cost of ownership end of things, where two key questions must be answered:

  • How much performance is needed, and how many workloads in the data center have data with varying quality of service (QoS) requirements or data that ages out?
  • Are hybrid arrays a better choice to handle mixed workloads through advanced QoS and auto-tiering features?

All-flash proponents argue that cost and capacity will continue to drop for flash compared to hard disk drives (HDDs), and that no workload is left wanting with the ability of all-flash to service all I/Os at top performance. Yet we see a new category of hybrids on the market that are designed for flash-level performance and then fold in multiple tiers of colder storage. The argument there is that data isn’t all the same and its value changes over its lifetime. Why store older, un-accessed data on a top tier when there are cheaper, capacity-oriented tiers available?
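One back-of-the-envelope way to weigh that argument is to compare the blended cost per usable gigabyte of an all-flash design against a hybrid that keeps only the active fraction of data on flash. Every price and percentage in this sketch is a made-up assumption for illustration, not a quoted market figure.

```python
# Back-of-the-envelope blended $/GB comparison (all inputs are assumptions).
flash_cost_per_gb = 1.50   # hypothetical usable $/GB for the flash tier
hdd_cost_per_gb = 0.10     # hypothetical usable $/GB for the capacity tier
hot_fraction = 0.20        # share of data that is actively accessed

all_flash_cost = flash_cost_per_gb
hybrid_cost = hot_fraction * flash_cost_per_gb + (1 - hot_fraction) * hdd_cost_per_gb

print(f"All-flash blended cost : ${all_flash_cost:.2f}/GB")
print(f"Hybrid blended cost    : ${hybrid_cost:.2f}/GB")
print(f"Hybrid saves {100 * (1 - hybrid_cost / all_flash_cost):.0f}% per GB "
      "if only hot data needs flash performance")
```

Change the hot-data fraction or the flash price and the answer flips, which is exactly why the TCO debate hinges on knowing your own data's temperature profile.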

It’s misleading to lump together hybrids that are traditional arrays with solid-state drives (SSDs) added and the new hybrids that might be one step evolved past all-flash arrays. And it can get even more confusing when the old arrays get stuffed with nothing but flash and are positioned as all-flash products. To differentiate, some industry wags like to use the term “flash-first” to describe newer-generation products purpose-built for flash speeds. That still could cause some confusion when considering both hybrids and all-flash designs. It may be more accurate to call the flash-first hybrids “flash-converged.” By being flash-converged, you can expect to buy one of these new hybrids with nothing but flash inside and get all-flash performance.

We aren’t totally convinced that the future data center will have just a two-tier system with flash on top backed by tape (or a remote cold cloud), but a “hot-cold storage” future is entirely possible as intermediate tiers of storage get, well, dis-intermediated. We’ve all predicted the demise of 15K HDDs for a while; can all the other HDDs be far behind as QoS controls get more sophisticated in handling the automatic mixing of hot and cold to create any temperature storage you might need? …

…(read the complete as-published article there)

New approaches to scalable storage

An IT industry analyst article published by SearchDataCenter.

With all these scalable storage approaches, IT organizations must evaluate the options against their data storage and analytics needs, as well as future architectures.


Unrelenting data growth has spawned new scalable storage designs.

We’ve all read the storage reports about overwhelming data growth. It’s certainly a big and growing challenge that deserves attention, but I’ll skip the part where I scare you into thinking we’re about to be buried under a deluge of data. We tend to store about as much data as we can, no matter how much data there might be. There has always been more data than we could keep. That’s why even the earliest data center storage systems implemented quotas, archives and data summarization.

The new challenge today is effectively mining business value out of the huge amount of newly useful data, with even more coming fast in all areas of IT storage: block, file, object, and big data. If you want to stay competitive, you’ll likely have to tackle some data storage scaling projects soon. Newer approaches to large-scale storage can help.

Scaling storage out into space

The first thing to consider is the difference between scale-up and scale-out approaches. Traditional storage systems are based on the scale-up principle, in which you incrementally grow storage capacity by adding more disks under a relatively fixed number of storage controllers (or a small cluster of controllers; one to four high-availability pairs is common). If you exceed the system’s capacity, or performance drops off, you add another system alongside it.

Scale-up storage approaches are still relevant, especially in flash-first and high-end hybrid platforms, where latency and IOPS performance are important. A large amount of dense flash can serve millions of IOPS from a small footprint. Still, larger-capacity scale-up deployments can create difficult challenges — rolling out multiple scale-up systems tends to fragment the storage space, create a management burden and require uneven CapEx investment.

In response, many scalable storage designs have taken a scale-out approach. In scale-out designs, capacity and performance throughput grow incrementally by adding more storage nodes to a networked system cluster. Scale-up designs are often interpreted as having limited vertical growth, whereas scale-out designs imply relatively unconstrained horizontal growth. Each node can usually service client I/O requests, and depending on how data is spread and replicated internally, each node may access any data in the cluster. As a single cluster can grow to very large scale, system management remains unified (as does the namespace in most cases). This gives scale-out designs a smoother CapEx growth path and a more linear overall performance curve.
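One way to picture how a scale-out cluster presents a single namespace while nodes are added is hash-based data placement. The sketch below is a toy consistent-hash ring (node names and the virtual-node count are assumptions); real scale-out systems layer far more sophisticated placement, replication and rebalancing logic on top of the same basic idea.

```python
# Minimal consistent-hash ring to illustrate scale-out data placement.
# Node names and virtual-node count are illustrative assumptions.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []                    # sorted list of (hash, node) points
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise around the ring to the first node at or after the key's hash
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("/vol1/db/records.dat"))   # any node can locate the data
# Adding a fourth node only remaps a fraction of keys, so capacity and
# throughput grow incrementally without a wholesale data reshuffle.
```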

Another trend that helps address storage scalability is a shift from hierarchical file systems towards object storage…

…(read the complete as-published article there)

Violin Bares New Gen of All-Flash Teeth

(Excerpt from original post on the Taneja Group News Blog)

Recently I hosted a second all-flash vendor panel on our Taneja Group BrightTalk channel. Violin Memory and IBM squared off (and agreed on many points) over the current best use cases, applications and futures of the high-performance end of the flash storage market, with NetApp providing some balanced commentary while preparing for its upcoming all-flash solution (FlashRay). Violin’s perspective was that efficiently designed all-flash systems are becoming so cost-competitive that they are taking over tier 1 AND tier 2 workloads in the data center, and, by the way, that a competitive data center has already transitioned completely to all-flash storage.

…(read the full post)

A new gen for NexGen?

(Excerpt from original post on the Taneja Group News Blog)

NexGen was one of the first real hybrid flash storage solutions with QoS, and it leveraged PCIe flash (i.e., Fusion-IO cards) to great effect, which we suppose had something to do with why Fusion-IO bought the company a couple of years ago. But whatever plans were in the works were likely disrupted when SanDisk in turn bought Fusion-IO, because we haven’t heard from the NexGen folks in a while – not a good sign for a storage solution. Well, SanDisk has now spun NexGen back out on its own. While it may be sink or swim time for the NexGen team, we think it’s a good opportunity for all involved.

…(read the full post)

Out on a data storage market limb: Six predictions for 2015

An IT industry analyst article published by SearchStorage.

Our crystal ball tells us this will be a year of change for the data storage market.


With another year just getting underway, we here at Taneja Group felt we needed a few analyst predictions to get things off on the right foot. The easiest predictions, and often the most likely ones, are that things will continue mostly as they are. But what fun is that? So, like any good fortune teller, we held hands around a crystal ball, gathered our prescient thoughts and with the help of the storage spirits came up with these six predictions for change in the data storage market for 2015.

  1. The overall traditional storage market will stay relatively flat despite huge growth in big data and the onrushing Internet of Things. Most new big data will be unstructured and big data architectures like Hadoop will still tend to leverage DAS for storage. In addition, many big data players are pushing the data lake or hub concept to land even bigger chunks of other enterprise data on big data clusters. While we do see some salvation in this space from vendors […] that enable big data analysis to leverage traditional enterprise storage, it won’t be enough to make a big dent in 2015. We’ve also noticed that many storage shops have yet to take advantage of the emerging capacity optimizations now available (e.g., thin provisioning, linked clones, global deduplication, inline compression and so on) in recent versions of competitive arrays that are becoming table stakes for new acquisition decisions. Hybrid arrays, in particular, are bringing flash-enabling space efficiencies across their full complement of storage tiers, and most arrays these days are turning at least hybrid.
  2. Speaking of flash, there are too many all-flash array (AFA) vendors and not enough differentiation. During 2012/2013 the first AFA vendors had the market to themselves, but with all the big players rolling out full-fledged flash offerings, opportunities are declining. With [many vendors] all pushing their own solutions (both AFA and hybrid), the remaining independent vendors will have a harder time finding a niche where they can survive. We also expect to see a new round of very high-end performance storage architectures in 2015 […] As a related trend, we anticipate that hybrid-based Tier-1 arrays will lose ground to AFAs in general, as the cost of flash drops and flash performance proves valuable to most if not all Tier-1 I/O. In virtualization environments, this trend will be somewhat lessened by the rise in popularity of server-side flash and/or memory caching/tiering solutions.
  3. Data protection and other add-on storage capabilities will become more directly baked into storage solutions. We expect to see more traditional arrays follow the examples of

…(read the complete six-item prediction list in the as-published article there)

New choices bring enterprise big data home

An IT industry analyst article published by SearchDataCenter.

Enterprises recognize the tantalizing value of big data analytics, but traditional concerns about data management and security have held back deployments — until now.


When big data practices come to your organization, it’s all about location, location, location.

I’ve heard recently from a bunch of big-data-related vendors that are all vying to gain from your sure-to-grow big data footprint. After all, big data isn’t about minimizing your data set, but making the best use of as much data as you can possibly manage. That’s not a bad definition of big data if you are still looking for one. With all this growing data, you will need a growing data center infrastructure to match.

This big data craze really got started with Apache Hadoop’s Distributed File System (HDFS), which unlocked the vision of massive data analysis based on cost-effective scale-out clusters of commodity servers using relatively cheap local attached disks. Hadoop and its ecosystem of solutions let you keep and analyze all kinds of data in its natural raw low-level form (i.e., not fully database structured), no matter how much you pile up or how fast it grows.

The problem is that once you get beyond, err, data science projects, old familiar enterprise data management issues return to the forefront, including data security, protection, reliability, operational performance and creeping Opex costs.

While Hadoop and HDFS mature with each release, there are still a lot of gaps when it comes to meeting enterprise requirements. It turns out that those commodity scale-out clusters of direct-attached storage (DAS) might not actually offer the lowest total cost of ownership when big data lands in production operations…
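One reason the DAS-based TCO math can surprise people is HDFS’s default triple replication plus the scratch space a busy cluster needs. A rough, purely illustrative calculation (all of the node counts, capacities and prices below are assumptions, not figures from the article) shows how quickly raw cluster capacity shrinks to usable capacity:

```python
# Rough usable-capacity and $/TB sketch for a replicated DAS cluster.
# All inputs are illustrative assumptions.
nodes = 20
raw_tb_per_node = 48          # e.g., 12 x 4 TB local drives
replication_factor = 3        # HDFS default block replication
temp_space_overhead = 0.25    # scratch/shuffle/headroom reserve
cost_per_node = 8000          # hypothetical server + disk cost ($)

raw_tb = nodes * raw_tb_per_node
usable_tb = raw_tb * (1 - temp_space_overhead) / replication_factor
cost_per_usable_tb = (nodes * cost_per_node) / usable_tb

print(f"Raw capacity      : {raw_tb:.0f} TB")
print(f"Usable capacity   : {usable_tb:.0f} TB")
print(f"Cost per usable TB: ${cost_per_usable_tb:,.0f}")
```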

…(read the complete as-published article there)

Figuring out the real price of flash technology

An IT industry analyst article published by SearchSolidStateStorage.

Sometimes comparing the costs of flash arrays is an apples-to-oranges affair — interesting, but not very helpful.


We’re often told by hybrid and all-flash array vendors that their particular total cost of ownership (TCO) is effectively lower than the other guy’s. We’ve even heard vendors claim that by taking certain particulars into account, the per-gigabyte price of their flash solution is lower than that of spinning disk. Individually, the arguments sound compelling; but stack them side by side and you quickly run into apples-and-oranges issues.
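For instance, one of those “certain particulars” is usually an assumed data reduction ratio: quote the raw flash price, then divide by expected deduplication and compression. The sketch below (every number is a made-up illustration) shows how sensitive an effective price-per-gigabyte comparison is to that single assumption:

```python
# Effective $/GB after assumed data reduction (all numbers are illustrative).
flash_raw_cost_per_gb = 2.00
hdd_raw_cost_per_gb = 0.08

for reduction_ratio in (2.0, 4.0, 6.0):   # assumed dedupe + compression ratio
    effective_flash = flash_raw_cost_per_gb / reduction_ratio
    print(f"{reduction_ratio:.0f}:1 reduction -> "
          f"flash ${effective_flash:.2f}/GB vs HDD ${hdd_raw_cost_per_gb:.2f}/GB (raw)")
```

Whether the same reduction ratio gets applied to the disk side of the comparison is exactly the kind of half-hidden factor that turns these claims into apples versus oranges.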

Storage has many factors that should be profiled and evaluated, such as IOPS, latency, bandwidth, protection, reliability and consistency, and these must be matched against client workloads with their unique read/write mixes, burstiness, data sizes, metadata overhead and quality of service/service-level agreement requirements. Standard benchmarks may be interesting, but the best way to evaluate storage is to test it under your particular production workloads; a sophisticated load-generation and modeling tool like that from Load DynamiX can help with that process.

But as analysts, when we try to make industry-level evaluations hoping to compare apples to apples, we run into a host of half-hidden factors we’d like to see made explicitly transparent if not standardized across the industry. Let’s take a closer look…

…(read the complete as-published article there)