What’s a Multi-cloud Really? Some Insider Notes from VMworld 2017

(Excerpt from original post on the Taneja Group News Blog)

With comfortable 65-70 degree weather blanketing New England as we near the end of summer, flying into Las Vegas for VMworld at 110 degrees seemed like dropping into hell. The last time I was in that kind of heat, I was stepping off a C-130 into the Desert Shield/Desert Storm theater of operations. Here, as everyone still able to breathe immediately says, “at least it’s a dry heat.”

…(read the full post)

Storage technologies evolve toward a data-processing platform

An IT industry analyst article published by SearchDataCenter.

Emerging technologies such as containers, HCI and big data have blurred the lines between compute and storage platforms, breaking down traditional IT silos.

Mike Matchett

With the rise of software-defined storage, in which storage services are implemented as a software layer, the whole idea of data storage is being re-imagined. And with the resulting increase in the convergence of compute with storage, the difference between a storage platform and a data-processing platform is further eroding.

Storage takes new forms

Let’s look at a few of the ways that storage is driving into new territory:

  • Now in containers! Almost all new storage operating systems, at least under the hood, are being written as containerized applications. In fact, we’ve heard rumors that some traditional storage systems are being converted to containerized form. This has a couple of important implications, including the ability to better handle massive scale-out, increased availability, cloud-deployment friendliness and easier support for converging computation within the storage.
  • Merged and converged. Hyper-convergence bakes software-defined storage into convenient, modular appliance units of infrastructure. Hyper-converged infrastructure products, such as those from Hewlett Packard Enterprise’s SimpliVity and Nutanix, can greatly reduce storage overhead and help build hybrid clouds. We also see innovative approaches merging storage and compute in new ways, using server-side flash (e.g., Datrium), rack-scale infrastructure pooling (e.g., Drivescale) or even integrating ARM processors on each disk drive (e.g., Igneous).
  • Bigger is better. If the rise of big data has taught us anything, it’s that keeping more data around is a prerequisite for having the opportunity to mine value from that data. Big data distributions today combine Hadoop and Spark ecosystems, various flavors of databases and scale-out system management into increasingly general-purpose data-processing platforms, all powered by underlying big data storage tools (e.g., Hadoop Distributed File System, Kudu, Alluxio).
  • Always faster. If big is good, big and fast are even better. We are seeing new kinds of automatically tiered and cached big data storage and data access layer products designed around creating integrated data pipelines. Many of these tools are really converged big data platforms built for analyzing big and streaming data at internet of things (IoT) scales.

The changing fundamentals

Powering many of these examples are interesting shifts in underlying technical capabilities. New data processing platforms are handling more metadata per unit of data than ever before. More metadata leads to new, highly efficient ways to innovate …(read the complete as-published article there)

Data center storage architecture gets smarter with AI

An IT industry analyst article published by SearchStorage.

Trends, such as event-triggered computing, as exemplified by Lambda Architectures, converge on data center storage to hasten data center intelligence evolution.

Mike Matchett

Infrastructure is getting smarter by the day. It’s reached the point where I’m afraid artificially intelligent IT will soon turn the tables and start telling me how to manage my own personal “lifecycle.” Well, I would be afraid if I believed all those AI vendors suddenly claiming they offer AI-powered infrastructure.

Now, we all want smarter, more automated, self-optimizing infrastructure — especially with storage — but I don’t see storage infrastructure components engaging in a human conversation with people about anything anytime soon. Storage is definitely getting smarter in more practical ways, however, and these changes are being seen in places such as data center storage architecture.

I’m excited by the hot storage trend toward embedding machine learning algorithms aimed at key optimization, categorization, search and pattern detection tasks. Corporate data assets are growing, and so is the potential value that comes from gathering and analyzing big data. It’s difficult to manually find those nuggets of data gold, though. And with the coming onslaught of the internet of things (IoT), data prospecting challenges will add mining huge amounts of fast streaming, real-time machine-generated and operational transactional data to the mix.

To help us take advantage of these potential information riches, storage vendors have started inserting intelligent algorithms into the storage layer directly…(read the complete as-published article there)

Actual Hybrid of Enterprise Storage and Public Cloud? Oracle creates a Cloud Converged System

(Excerpt from original post on the Taneja Group News Blog)

What’s a Cloud Converged system? It is really what we naive people thought hybrid storage was all about all along. Yet until now, no high-performance, enterprise-class storage ever actually delivered it. But now, Oracle’s latest ZFS Storage Appliance, the ZS5, comes natively integrated with Oracle Cloud storage. What does that mean? On-premises ZS5 Storage Object pools now extend organically into Oracle Cloud storage (which is also made up of ZS storage), with no gateway or third-party software required.
Oracle has essentially brought enterprise hybrid cloud storage to market, no integration required. I’m not really surprised that Oracle has been able to roll this out, but I am a little surprised that they are leading the market in this area.
Why hasn’t Dell EMC come up with a straightforward hybrid cloud leveraging their enterprise storage and cloud solutions? Despite having all the parts, they failed to actually produce the long desired converged solution – maybe due to internal competition between infrastructure and cloud divisions? Well, guess what. Customers want to buy hybrid storage, not bundles or bunches of parts and disparate services that could be integrated (not to mention wondering who supports the resulting stack of stuff).
Some companies, like NetApp for example, are so married to their legacy solutions that they don’t even offer their own cloud services. Maybe they were hoping this cloud thing would just blow over? Maybe all those public cloud providers would stick with web 2.0 apps and wouldn’t compete for enterprise GB dollars?
(Microsoft does have StorSimple, which may have pioneered on-prem storage integrated with cloud tiering (to Azure). However, StorSimple is not a high-performance, enterprise-class solution (capable of handling PBs+ with massive memory-accelerated performance). And it appears that Microsoft is no longer driving direct sales of StorSimple, apparently positioning it now only as one of many on-ramps to herd SMEs fully into Azure.)
We’ve reported on the Oracle ZFS Storage Appliance itself before. It has been highly augmented over the years. The Oracle ZFS Storage Appliance is a great filer on its own, competing favorably on price and performance with all the major NAS vendors. And it provides extra value with all the Oracle Database co-engineering poured into it. And now that it’s inherently cloud enabled, we think for some folks it’s likely the last NAS they will ever need to invest in (if you want more performance, you will likely move to in-memory solutions, and if you want more capacity, well, that’s what the cloud is for!).
Oracle’s Public Cloud is made up of, and actually built out of, Oracle ZFS Storage Appliances. That means the same storage is running on the customer’s premises as in the public cloud they are connected with. Not only does this eliminate a whole raft of potential issues, but solving any problems that might arise is going to be much simpler (and problems are less likely to happen in the first place, given the scale of Oracle’s own deployment of its own hardware first).
Compare this to NetApp’s offering to run a virtual image of NetApp storage in a public cloud, which only layers on complexity and potential failure points. We don’t see many taking the risk of running or migrating production data into that kind of storage. Their NPS co-located private cloud storage is perhaps a better offering, but the customer still owns and operates all the storage, so there is really no public cloud storage benefit like elasticity or utility pricing.
Other public clouds and on-prem storage can certainly be linked with products like Attunity CloudBeam, or additional cloud gateways or replication solutions.  But these complications are exactly what Oracle’s new offering does away with.
There is certainly a core vendor alignment of on-premises Oracle storage with an Oracle Cloud subscription, and no room for cross-cloud brokering at this point. But a ZFS Storage Appliance presents no more technical lock-in than any other NAS (other than the claim that it is more performant at less cost, especially for key workloads that run Oracle Database), nor does Oracle Cloud restrict the client to just Oracle on-premises storage.
And if you are buying into the Oracle ZFS family, you will probably find that the co-engineering benefits with Oracle Database (and Oracle Cloud) make the set of them all that much more attractive (technically and financially). I haven’t done recent pricing in this area, but I think we’d find that while there may be cheaper cloud storage prices per vanilla GB out there, looking at the full TCO for an enterprise GB, hybrid features and agility could bring Oracle Cloud Converged Storage to the top of the list.

…(read the full post)

Showback the value of your data storage expertise

An IT industry analyst article published by SearchCloudStorage.

To demonstrate value, IT must provide an easy-to-understand cost model to its business leaders. This has fostered IT showback projects. Yet showback isn’t easy to achieve.

Mike Matchett

It seems everyone is trying to eliminate the storage admin. If the general virtual admin can manage storage or the cloud takes care of it, why not? Well, for one, it would greatly increase all kinds of risk. IT operations could always become more automated and infrastructure silos will naturally merge, sure, but advanced storage responsibilities like data protection and governance are complex, tricky, and often require close supervision and expertise.

It’s not just storage that’s in trouble, either. Public clouds provide a simple price structure that businesses can not only directly map to utilization, but also often unfairly compare to their internal IT allocation, or tax, for similar services.

So how can IT make clear the relative value of storage knowhow in this new world of automation, virtualization, convergence and cloud computing…(read the complete as-published article there)

Smart storage systems smart for business

An IT industry analyst article published by SearchStorage.

Mike Matchett explains how data-aware storage combined with application awareness is leading to a new wave of intelligent data storage.

Mike Matchett

Not too long ago, storage arrays were holed up deep in the data center and manageable without requiring much knowledge about the data actually stored therein. A storage admin might have known it was database data for a key application requiring high performance and solid backups, for example, but the database administrator took care of all the data-specific details. Today, this artificial wall separating information about data from the storage that holds it is coming down, and rapidly.

Convergence isn’t only closing the gaps between silos of infrastructure, it is collapsing the distance between the job of persistence on the back end in storage and what stored data actually means and is used for on the front end. It is no longer desirable, or even sufficient, to simply store and protect bit patterns deep in the innards of the data center; you must now manage storage in ways that directly advance business operations.

In fact, it’s becoming a competitive necessity to leverage data at every level, or tier, of persistence throughout the data’s lifecycle. This is good for IT folks, as new data-aware storage is helping IT come to the forefront of key business processes.

Smart storage systems are powered by a glut of CPU/cores, cheaper flash and memory, agile software-defined storage functions and lessons learned from the big data analytics world. Internally, smarter storage systems can do a better job of optimizing capacity and performance through smart deduplication and compression schemes, application-aligned caching and tiering, and policy-definable quality of service (QoS) and data protection schemes. Externally, smart storage systems can create and serve new kinds of metadata about the data inside, providing for better management and governance, application QoS reporting and alignment, and can even help to create direct business value.

The roots of data awareness

Data-aware storage has its roots in old archival “content-addressable storage” architectures, which were early object-based archives that kept additional metadata (i.e., data about data) in order to exactly manage retention requirements (and possibly help with legal discovery actions). Systems often indexed and made this metadata accessible outside of the content itself and, eventually, even content was indexed and made searchable for e-discovery processing. However, as appropriate for archival cold storage, this data intelligence was created offline in post-processing and only applied to static archived data sets, and therefore rarely used.
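To make the content-addressable idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular archive product works: the class name, the choice of SHA-256 as the address and the retention fields are all assumptions made for the example. The point is simply that the object's address is derived from its content, and that retention metadata lives alongside the object rather than inside it.

```python
import hashlib
import time
from typing import Optional


class ContentAddressableStore:
    """Toy in-memory archive (hypothetical): objects are keyed by a hash of their bytes."""

    def __init__(self):
        self._objects = {}   # address -> content bytes
        self._metadata = {}  # address -> metadata kept outside the content

    def put(self, content: bytes, retain_days: int, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        # The address is the SHA-256 digest of the content itself.
        address = hashlib.sha256(content).hexdigest()
        retain_until = now + retain_days * 86400
        previous = self._metadata.get(address)
        if previous is not None:
            # Same content stored again: keep the longer retention period.
            retain_until = max(retain_until, previous["retain_until"])
        self._objects[address] = content
        self._metadata[address] = {"stored_at": now, "retain_until": retain_until}
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def delete(self, address: str, now: Optional[float] = None) -> bool:
        """Refuse deletion until the retention period has passed."""
        now = time.time() if now is None else now
        if now < self._metadata[address]["retain_until"]:
            return False
        del self._objects[address]
        del self._metadata[address]
        return True
```

Because identical content always hashes to the same address, storing the same document twice yields one object, and the retention check is what lets an archive of this kind enforce "keep for N days" rules.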

Ten years ago, the emergence of big data approaches demonstrated that masses of live, unstructured and highly varied data could have tremendous primary business value. Today, the massive web-scale object stores popular for cloud-building and used to power production web and mobile applications often store all kinds of metadata. In fact, these stores support user-defined metadata that developers can arbitrarily extend for advanced application-specific tagging or data labeling. Some advanced file systems directly incorporate content indexing on data ingest to enable end-users to query primary storage for content containing specific words or phrases.
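As a rough sketch of what user-defined metadata and index-on-ingest could look like, consider the toy Python store below. Everything in it is hypothetical (the `IndexedStore` class, its method names, the tag names), and real object stores and file systems implement this very differently and at far larger scale. It assumes each key is written once, and its search matches objects containing all of the given words rather than an exact phrase.

```python
import re
from collections import defaultdict


class IndexedStore:
    """Toy store (hypothetical) that indexes text content as it is ingested."""

    def __init__(self):
        self._objects = {}                    # key -> (text, user-defined metadata)
        self._word_index = defaultdict(set)   # word -> keys whose content contains it

    def ingest(self, key, text, **metadata):
        # Store the object with arbitrary developer-supplied tags.
        self._objects[key] = (text, dict(metadata))
        # Build the inverted index at write time, not in offline post-processing.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self._word_index[word].add(key)

    def search(self, query):
        """Return keys of objects containing every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        matches = set.intersection(*(self._word_index.get(w, set()) for w in words))
        return sorted(matches)

    def find_by_tag(self, name, value):
        """Return keys of objects whose user-defined metadata has name == value."""
        return sorted(k for k, (_, md) in self._objects.items() if md.get(name) == value)
```

For example, after `store.ingest("a.txt", "Quarterly revenue report", department="finance")`, both `store.search("revenue report")` and `store.find_by_tag("department", "finance")` would return `["a.txt"]`.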

As an example of this evolution, consider the difference between two popular online file-sharing services, Dropbox and Evernote. Both can be used to store and sync various files across devices and share them between groups of users. Dropbox was the baseline standard defining online file sharing and collaboration, but Evernote goes much further, although for a narrower set of use cases, by becoming innately content-aware with full content search, inline viewers and editors for common file types, extra metadata (e.g., URL source or reference if available, user tagging) and “similar content” recommendations. Although I use both daily, I view Dropbox as just another file-sharing alternative, while Evernote is critical to my workflow.

IT data awareness

Company lawyers (for e-discovery) and detectives (in security) require online systems that proactively identify abnormal behavior to produce early warnings on possible breaches. Smart data-aware storage systems can fold in auditing-type information and help correlate files, data and metadata with patterns of “events” — such as application crashes, file systems filling up, new users granted root access and shared or hidden key directories.

I remember one particularly blatant case of storage misuse (on a DEC VAX!) when we caught someone hoarding huge amounts of NSFW material on a little-accessed file system. Today’s more content-aware smart storage systems could alert security about such transgressions and warn creative, boundary-pushing users before they cross into job-termination territory, or even prevent them from doing so to begin with.

Benefits of data-aware storage

  • Fine-grained data protection: Storage that knows, for example, which VM its files or volumes belong to or, even better, the specific policy to enforce for that VM’s data can directly ensure appropriate data protection (e.g., the right level of RAID or replication).
  • Fine-grained QoS: Similarly, storage that knows what database files require which kinds of performance acceleration can directly prioritize I/O and cache resources for optimal application performance.
  • Content indexing and search: Large stores used for text-based data can deliver extra value by indexing content upon ingestion and enabling built-in admin and (even) end-user search.
  • Social storage analysis: Storage can track usage and access by users and groups as metadata. Then other users can easily find out who in an organization had recent interest in certain content, identify group collaboration patterns and receive recommendations of new things to research based on collaborative filtering (e.g., “people who like the things I like also liked X”).
  • Active capacity and utilization management: Storage can also track metadata about “per-data” system resource performance, capacity and utilization metrics. This enables storage admins to directly see what is going on in IT infrastructure for any piece or group of data tracked directly back to end users, departments and applications. Smart storage can also help optimize its own configuration and behavioral alignment to workloads.
  • Analytics and machine learning: As storage grows smarter, expect to see increasing amounts of both low-level compute processing and automated machine learning incorporated directly into the storage layer. Storage-side functions could then be used to automatically categorize, score, translate, transform, visualize and report on data even as it’s being created and stored.
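The “people who like the things I like also liked X” idea from the social storage analysis item can be sketched in a few lines of Python. This is a deliberately simple illustration under stated assumptions: the access log is a hypothetical list of (user, item) pairs that storage has tracked as metadata, and the scoring (count of shared items with each other user) is just one of many possible collaborative filtering schemes.

```python
from collections import Counter, defaultdict


def recommend(access_log, user, top_n=3):
    """Suggest items that users with overlapping access history have also touched.

    access_log is an iterable of (user, item) pairs (a hypothetical format).
    """
    items_by_user = defaultdict(set)
    for who, item in access_log:
        items_by_user[who].add(item)

    mine = items_by_user.get(user, set())
    scores = Counter()
    for other, theirs in items_by_user.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # how many items we have in common
        if overlap == 0:
            continue
        # Items the other user accessed that this user has not seen yet,
        # weighted by how similar our access histories are.
        for item in theirs - mine:
            scores[item] += overlap

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

A production system would likely weight by recency and respect access permissions before recommending anything, but the basic shape, usage metadata in and suggestions out, is the same.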

…(read the complete as-published article there)

When data storage infrastructure really has a brain

An IT industry analyst article published by SearchStorage.

Big data analysis and the internet of things are helping produce more intelligent storage infrastructure.

Mike Matchett

Cheaper and denser CPUs are driving smarter built-in intelligence into each layer of the data storage infrastructure stack.

Take storage, for example. Excess compute power can be harnessed to deploy agile software-defined storage (e.g., Hewlett Packard Enterprise StoreVirtual), transition to hyper-converged architectures (e.g., HyperGrid, Nutanix, Pivot3, SimpliVity), or optimize I/O by smartly redistributing storage functionality between application servers and disk hosts (e.g., Datrium).

There is a downside to all this built-in intelligence, however. It can diminish the visibility we might otherwise have between our data storage infrastructure and, well, changes — any IT change, really, whether due to intentional patching and upgrades, expanding usage and users, or complex bugs and component failures. Or, to put it another way, native, dynamic optimization enabled by powerful and inexpensive processors is making it increasingly difficult for us humans to figure out what’s going on with our infrastructures.

So while it’s really great when we don’t need to know any details, and can simply rely on low-level components to always do the right thing, until there is an absolutely autonomous data center — and, no, today’s public cloud computing doesn’t do away with the need for internal experts — IT may find baked-in intelligence a double-edged sword. Furthermore, while smarter data storage infrastructure helps us with provisioning, optimization, growth plans and troubleshooting, it can blind or fool us and actively work against our best efforts to bend infrastructure to our “will.”

Still, in spite of all these potential negatives, given the choice, I’d rather live in a smarter and more autonomous IT world than not (even if there is some risk of runaway AI). I’ll explain.

It’s all about the data

Remember when analysis used to be an offline process? Capture some data in a file; open Excel, SAS or other desktop tool; and weeks later receive a recommendation. Today, that kind of analysis latency is entirely too long and naïve.

Given the speed and agility of our applications and users nowadays, not to mention bigger data streams and minute-by-minute elastic cloud brokering, we need insight and answers faster than ever. This kind of intelligence starts with plentiful, reliable data, which today’s infrastructures are producing more and more of every day (in fact, we’ll soon be drowning in new data thanks to the internet of things [IoT]), and a way to process and manage all that information.

Storage arrays, for example, have long produced insightful data, but historically required vendor-specific, complex and expensive storage resource management applications to make good use of it. Fortunately, today, there is a series of developments helping us become smarter about IT systems management and better (and faster) users of data generated by our infrastructures: …(read the complete as-published article there)

Kudu Might Be Invasive: Cloudera Breaks Out Of HDFS

(Excerpt from original post on the Taneja Group News Blog)

For the IT crowd just now getting used to the idea of big data’s HDFS (Hadoop Distributed File System) and its peculiarities, there is another open source big data storage alternative coming from Cloudera called Kudu. Like HDFS, Kudu is designed to be hosted across a scale-out cluster of commodity systems, but it is specifically intended to support lower-latency analytics.

…(read the full post)