SQL Server machine learning goes full throttle on operational data

An IT industry analyst article published by SearchSQLServer.

Artificial intelligence is a hot topic in IT, and Microsoft has made strides to synchronize SQL Server with machine learning tools for use in analyzing operational data pipelines.

Mike Matchett

One of the hottest IT trends today is augmenting traditional business applications with artificial intelligence or machine learning capabilities. I predict the next generation of data center application platforms will natively support the real-time convergence of online transaction processing with analytics. Why not bring the sharp point of operational insight to the front line where business actually happens?

But modifying production application code that is optimized for handling transactions to embed machine learning algorithms is a tough slog. As most IT folks are reluctant — OK, absolutely refuse — to take apart successfully deployed operational applications to fundamentally rebuild them from the inside out, software vendors have rolled out some new ways to insert machine intelligence into business workflows. Microsoft is among them, pushing SQL Server machine learning tools tied to its database software.

Basically, adding intelligence to an application means folding in a machine learning model to recognize patterns in data, automatically label or categorize new information, recommend priorities for action, score business opportunities or make behavioral predictions about customers. Sometimes this intelligence is overtly presented to the end user, but it can also transparently supplement existing application functionality.
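As an illustration of that folding-in, here is a minimal, purely hypothetical Python sketch of inline scoring: a model trained offline in a separate analytics system is applied to each transaction record as it flows through the application's write path. The weights, field names and threshold are invented for illustration, not drawn from any particular product.

```python
import math

# Hypothetical pretrained model: weights learned offline, then deployed
# next to the transactional data path for inline scoring.
WEIGHTS = {"amount": 0.002, "num_prior_orders": -0.3}
BIAS = -1.0

def score_transaction(txn):
    """Return a fraud-risk probability for one transaction record."""
    z = BIAS + sum(WEIGHTS[k] * txn[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function

def process(txn):
    """Transactional write path, transparently augmented with a risk score."""
    txn = dict(txn, risk=score_transaction(txn))
    txn["flagged"] = txn["risk"] > 0.5
    return txn

result = process({"amount": 1200, "num_prior_orders": 1})
```

The point of the sketch is the shape of the pattern: the application code stays transaction-oriented, and the intelligence is a small, swappable scoring step added to the flow.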

In conventional data science and analytics activities, machine learning models typically are built, trained and run in separate analytics systems. But models applied to transactional workflows require a method that enables them to be used operationally at the right time and place, and may need another operational method to support ongoing training (e.g., to learn about new data).
Closeness counts in machine learning

In the broader IT world, many organizations are excited by serverless computing and lambda function cloud services in which small bits of code are executed in response to data flows and event triggers. But this isn’t really a new idea in the database world, where stored procedures have been around for decades. They effectively bring compute processes closer to data, the core idea behind much of today’s big data tools.

Database stored procedures can offload data-intensive modeling tasks such as training, and they can also integrate machine learning functionality directly into application data flows. With such injections, some transactional applications may be able to take advantage of embedded intelligence without any upstream application code needing to be modified. Additionally, applying machine learning models close to the data in a database allows the operational intelligence to be readily shared among different downstream users…(read the complete as-published article there)

Big data concerns reach broad and deep in new era of applications

An IT industry analyst article published by SearchITOperations.

Big data is out there, waiting to make you rich — or help your organization succeed anyway. But there are still more unknowns than knowns about the future of big data.

Mike Matchett

Big data is being created everywhere we look, and we are all thinking about how to take advantage of it. I certainly want to come up with some novel new big data application and become fabulously wealthy just for the idea. The thing is, most companies — perhaps all — can profit from big data today just by accelerating or refining some piece of their current business, supposing they can identify and corral the right information in the right time and place.

There is no need to find a new earth-shattering application to get started. I believe a significant big data payback is right in front of any marketing, sales, production or customer-engagement team. One simply needs to find a way to unlock the buried big data treasure. And, of course, that’s where big data concerns from practical to theoretical bubble to the surface.

A big sticking point has been finding the data science expertise, especially experts who can build optimized machine learning models tailored for your exact business needs. But we are seeing some interesting recent efforts to automate and, in some ways, commoditize big data handling and complicated machine learning. These big data automation technologies enable the regular Java Joe or Josie programmer to effectively drop big data analytics into existing, day-to-day, operationally focused business applications.

Not only does this have the democratizing effect of unlocking big data value for non-data scientists, but it also highlights the trend toward a new application style. In the next three to five years, we will see most business applications that we’ve long categorized as transactional converge with what we’ve implemented separately as analytical applications. Put simply, with big data power, “business intelligence” is becoming fast enough and automated enough to deliver inside the operational business process in active business timeframes.

As these data processing worlds collide, they will reveal big data concerns for IT staff, and for those making decisions on IT infrastructure and data centers. Storage, databases and even networks will all need to adapt. Along with the rise of the internet of things (IoT), hybrid cloud architectures, persistent memory and containers, 2017 is going to be a pivotal year for challenging long-held assumptions and changing IT directions.
Out-of-reach data

While I will undoubtedly focus a lot of time and energy as an industry analyst on these fast-evolving topics in the near term, there is a longer-term big data concern: Some companies might not be able to take advantage of this democratization of data simply because they can’t get access to the data they need.

We’ve heard warnings about how hard it is to manage big data as important data. We need to think about how we can ensure it’s reliable, how we can maintain and ensure privacy — and regulatory compliance — how we can ensure we only implement ethical and moral big data algorithms and so on. But before all that, you first need access to the data — assuming it exists or can be created — that is valuable to your company. I call this the data paucity problem — there’s too little big data in use.

As an example, I don’t believe every IoT device manufacturer will end up getting unfettered access to the data streams generated by their own things, much less to the ecosystem of data surrounding their things in the field. I think it is inevitable that some will be getting locked out of their own data flowback…(read the complete as-published article there)

Cloud, IoT to drive enterprise IT trends in 2017

An IT industry analyst article published by SearchCloudComputing.

Cloud computing has evolved quite a bit in the last few years, but it still has far to go. Technologies such as big data, containers and IoT will have a big part to play in the future.

Mike Matchett

Yes, it’s a brand new year and time to make some Next Big Thing predictions for the year to come. This year, our outline of what’s on the immediate horizon is already well known: hybrid cloud adoption, big data applications and containers. Looking a little further out at enterprise IT trends, we might see the first practical persistent storage-class memory begin to disrupt 30 years of traditionally structured data center infrastructure. And expect a hot smoking internet of things mess of requirements to land in the lap of IT folks everywhere.

All of these topics are, of course, highly interrelated. In fact, it wouldn’t surprise me to find that many organizations will have to bite the bullet on all five at the same time to handle a new internet of things (IoT) data processing application. But let’s take a quick look at each:

Cloud adoption. I am as guilty as the next pundit in predicting when cloud adoption will finally be considered a “traditional” deployment model. But this time I really mean it! VMware is demonstrating cross-cloud products. Microsoft is making real hay rolling traditional businesses, large and small, into software as a service, like Office 365, and infrastructure as a service, like Azure. And all our favorite storage vendors are realizing that building in a cloud tier won’t shrink on-premises storage needs, given the growth in data and hybrid technologies that balance and marry the best benefits of both cloud and on-premises processing.

Big data. Hadoop is a decade old now. With newer generation platforms like Apache Spark making it easier to deploy and consume big data interactively for SQL-friendly business analysis, real-time operations, machine learning and even graph-based applications, it’s time for us all to get on board this train. As I’ve said, all data can grow up into big data someday. One of the top enterprise IT trends we’ve noticed is less concern about what big data is and more focus on getting maximum value out of all that data. In fact, I predict that data access — or data paucity — will become a new corporate key performance indicator in the future.

Containers. Having predicted the fast rise of containers last year, I claim some victory here against naysayers. Containers have won even if they aren’t in production everywhere yet. Yes, there are some major issues yet to be resolved for the regular, not-quite-DevOps IT organization. Many apps will never transition to containers — just as mainframe applications and VM-based appliances will hang around for decades — but open the hood of every modern application, appliance, cloud or software-defined infrastructure, and you’ll likely find containers. In fact, most of the newest enterprise IT trends covered above — especially cloud and big data — are internally powered by container-based development and deployment.

…(read the complete as-published article there)

Showback the value of your data storage expertise

An IT industry analyst article published by SearchCloudStorage.

To demonstrate value, IT must provide an easy-to-understand cost model to its business leaders. This has fostered IT showback projects. Yet showback isn’t easy to achieve.

Mike Matchett

It seems everyone is trying to eliminate the storage admin. If the general virtual admin can manage storage or the cloud takes care of it, why not? Well, for one, it would greatly increase all kinds of risk. IT operations could always become more automated and infrastructure silos will naturally merge, sure, but advanced storage responsibilities like data protection and governance are complex, tricky, and often require close supervision and expertise.

It’s not just storage that’s in trouble, either. Public clouds provide a simple price structure that businesses can not only directly map to utilization, but also often unfairly compare to their internal IT allocation, or tax, for similar services.

So how can IT make clear the relative value of storage know-how in this new world of automation, virtualization, convergence and cloud computing? …(read the complete as-published article there)

Four big data and AI trends to keep an eye on

An IT industry analyst article published by SearchITOperations.

AI is making a comeback — and it’s going to affect your data center soon.

Mike Matchett

Big data and artificial intelligence will affect the world — and already are — in mind-boggling ways. That includes, of course, our data centers.

The term artificial intelligence (AI) is making a comeback. I interpret AI as a larger, encompassing umbrella that includes machine learning — which in turn includes deep learning methods — but also implies thought. Meanwhile, machine learning is somehow safe to talk about. It’s just some applied math — probabilities, linear algebra, differential equations — under the hood. But use the term AI and, suddenly, you get wildly different emotional reactions — for example, the Terminator is coming. However, today’s broader field of AI is working toward providing humanity with enhanced and automated vision, speech and reasoning.

If you’d like to stay on top of what’s happening practically in these areas, here are some emerging big data and AI trends to watch that might affect you and your data center sooner rather than later:

Where there is a Spark…
Apache Spark is replacing basic Hadoop MapReduce for latency-sensitive big data jobs with its in-memory, real-time queries and fast machine learning at scale. And with familiar, analyst-friendly data constructs and languages, Spark brings it all within reach of us middling hacker types.

As far as production bulletproofing goes, it’s not quite fully baked. But version two of Spark was just released in mid-2016, and it’s solidifying fast. Even so, this fast-moving ecosystem and potential “Next Big Things” such as Apache Flink are already turning heads.

Even I can do it
A few years ago, all this big data and AI stuff required doctorate-level data scientists. In response, a few creative startups attempted to short-circuit those rare and expensive math geeks out of the standard corporate analytics loop and provide the spreadsheet-oriented business intelligence analyst some direct big data access.

Today, as with Spark, I get a real sense that big data analytics is finally within reach of the average engineer or programming techie. The average IT geek may still need to put in some serious study but can achieve great success creating massive organizational value. In other words, there is now a large and growing middle ground where smart non-data scientists can be very productive with applied machine learning, even on big and real-time data streams…(read the complete as-published article there)

What options exist for IT infrastructure management services?

An IT industry analyst article published by SearchITOperations.

What kinds of as-a-service IT management options are available? Are IT management services only coming from startups, or do established management software vendors have options?

Mike Matchett

Various companies offer IT infrastructure management services hosted and operated in a remote, multi-tenant cloud. This as-a-service model provides core IT management services to private, on-premises data centers, remote offices, rented infrastructure in colocation or other infrastructure as a service hosting, or some hybrid combination of these deployments.

As an early example, when Exablox launched, it targeted IT shops generally seeking to squeeze the most out of constrained storage budgets — organizations that would gladly give up the pain and cost of installing and operating on-premises storage management in favor of just using a cloud-hosted storage management service. This approach radically evolved call-home support based on daily data dumps into online operational IT management as a service.

At that time, some businesses were dismissive of the idea that IT infrastructure management services would not require operational software they could directly host in their own data centers. Some forward-thinking startups, such as VM management provider CloudPhysics and the deeper infrastructure-focused Galileo Performance Explorer, noted that large companies would consider remote performance management tooling, as it’s based on machine data and log files with little risk of exposing corporate secrets. And performance management activities don’t sit in the direct operational workflow.…(read the complete as-published article there)

IT management as a service is coming to a data center near you

An IT industry analyst article published by SearchITOperations.

IT management as a service uses big data analytics and vendors’ expertise to ease the IT administration and optimization process. IT orgs must trust the flow of log and related data into an offsite, multi-tenant cloud.

Mike Matchett

IT management as a service finally breaks through. Where does it go from here?

Perhaps the über IT trend isn’t about hailing a ride from within the data center, but adopting and migrating to newer generations of tools that ease the cost and pain of managing infrastructure.

It’s not efficient for each IT shop to individually develop and maintain siloed expertise in managing every vendor-specific component. The physical — and financial — limits of IT shops are by and large why cloud service providers continue to gain ground.

Today, there is an inexorable transition toward commoditized physical equipment with differentiating software-defined capabilities floated in on top. Using commodity hardware offers direct CapEx benefits. However, by taking advantage of software resources — and virtualization — to pre-integrate multiple infrastructure layers, converged and hyper-converged platforms also eliminate significant IT time and labor required by traditional, siloed architectures. In freeing up IT, the converged and hyper-converged options also improve overall agility and help IT groups transition from equipment caretakers to business enhancers.

In a similar bid to lower management OpEx pain, IT operations and management solutions are steadily increasing their inherent automation. Policy-based approaches help an IT organization address scale and focus on building the right services for its users instead of remaining stuck in low-level, tedious and often reactive “per-thing” configuration and management. And much of the appeal of cloud computing is based on offloading IT by offering end-user self-service capabilities.
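The contrast between policy-based and "per-thing" management can be sketched in a few lines of Python. All policy names, tags and settings below are hypothetical, invented purely to show the shape of the approach: resources carry tags, and a small policy table decides the settings applied to anything that matches.

```python
# Hypothetical policy table: rather than configuring each resource by
# hand, tag-matching rules assign settings automatically at scale.
POLICIES = [
    {"match": {"tier": "gold"},   "settings": {"replicas": 3, "iops_limit": None}},
    {"match": {"tier": "bronze"}, "settings": {"replicas": 1, "iops_limit": 500}},
]

def apply_policies(resource):
    """Return the resource with settings from the first matching policy."""
    for policy in POLICIES:
        if all(resource.get("tags", {}).get(k) == v
               for k, v in policy["match"].items()):
            return dict(resource, **policy["settings"])
    return dict(resource)  # no policy matched; leave the resource as-is

vol = apply_policies({"name": "db-vol1", "tags": {"tier": "gold"}})
```

Adding the thousandth volume is then no more work than adding the first: it just needs the right tag.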

But even in running a hyper-converged or hybrid cloud data center, there are still plenty of IT hours spent thanklessly on internally facing operations and management tasks. Operating a cloud, a cluster, a hybrid operation — even just maintaining the actual management tools that run the operations and automation — can still be a huge chore. Similar to how many businesses now use the cloud as a source of easy, catalog-driven, self-service, elastic, utility-priced application computing, IT is starting to look to the cloud for IT management as a service.

The broadening acceptance of public cloud services is inverting the traditional IT management paradigm, moving management services into the cloud while preserving on-premises — or hybrid — computing and infrastructure. This has been a long, slow battle due to ingrained IT tradition, security fears and worries about losing control; there’s a reluctance to let go of the private management stack. But the drive to make IT more efficient and productive is now taking priority.

We are seeing the inevitable acceptance and widespread adoption of remote, cloud-hosted IT management services, from remote performance management to hybrid cloud provisioning and brokering and even on-premises “cluster” operations. These services can be referred to collectively as IT management as a service, or IT MaaS…(read the complete as-published article there)

Smart storage systems smart for business

An IT industry analyst article published by SearchStorage.

Mike Matchett explains how data-aware storage combined with application awareness is leading to a new wave of intelligent data storage.

Mike Matchett

Not too long ago, storage arrays were holed up deep in the data center and manageable without requiring much knowledge about the data actually stored therein. A storage admin might have known it was database data for a key application requiring high performance and solid backups, for example, but the database administrator took care of all the data-specific details. Today, this artificial wall separating information about data and the storage that it holds is changing, and rapidly.

Convergence isn’t only closing the gaps between silos of infrastructure, it is collapsing the distance between the job of persistence on the back end in storage and what stored data actually means and is used for on the front end. It is no longer desirable, or even sufficient, to store and protect bit patterns deep in the innards of the data center; you must now manage storage in ways that directly advance business operations.

In fact, it’s becoming a competitive necessity to leverage data at every level, or tier, of persistence throughout the data’s lifecycle. This is good for IT folks, as new data-aware storage is helping IT come to the forefront of key business processes.

Smart storage systems are powered by a glut of CPU/cores, cheaper flash and memory, agile software-defined storage functions and lessons learned from the big data analytics world. Internally, smarter storage systems can do a better job of optimizing capacity and performance through smart deduplication and compression schemes, application-aligned caching and tiering, and policy-definable quality of service (QoS) and data protection schemes. Externally, smart storage systems can create and serve new kinds of metadata about the data inside, providing for better management and governance, application QoS reporting and alignment, and can even help to create direct business value.
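As a rough illustration of one such internal capacity optimization, here is a minimal Python sketch of content deduplication: fixed-size chunks are stored once, keyed by their content hash, so duplicate chunks cost no new space. The tiny chunk size is for readability only; real systems chunk in kilobytes and handle far more edge cases.

```python
import hashlib

CHUNK = 4  # toy chunk size for illustration; real systems use KBs

def dedup_store(data, store):
    """Split data into chunks, store only unique ones, return the recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # stored only if not seen before
        recipe.append(digest)
    return recipe

def rebuild(recipe, store):
    """Reassemble the original data from its chunk recipe."""
    return b"".join(store[d] for d in recipe)

store = {}
recipe = dedup_store(b"ABCDABCDXYZW", store)  # 3 chunks, 2 unique
```

The same content-hashing trick underlies much of the metadata machinery discussed here: once storage identifies content, it can also reason about it.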

The roots of data awareness

Data-aware storage has its roots in old archival “content-addressable storage” architectures, which were early object-based archives that kept additional metadata (i.e., data about data) in order to exactly manage retention requirements (and possibly help with legal discovery actions). Systems often indexed and made this metadata accessible outside of the content itself and, eventually, even content was indexed and made searchable for e-discovery processing. However, as appropriate for archival cold storage, this data intelligence was created offline in post-processing and only applied to static archived data sets, and therefore rarely used.

Ten years ago, the emergence of big data approaches demonstrated that masses of live, unstructured and highly varied data could have tremendous primary business value. Today, the massive web-scale object stores popular for cloud-building and used to power production web and mobile applications often store all kinds of metadata. In fact, these stores support user-defined metadata that developers can arbitrarily extend for advanced application-specific tagging or data labeling. Some advanced file systems directly incorporate content indexing on data ingest to enable end-users to query primary storage for content containing specific words or phrases.
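A minimal Python sketch of the user-defined metadata idea follows; the tagging scheme is invented, and real object stores expose this capability through their own APIs. The key point is that objects can be found by querying their tags, without ever reading the object content itself.

```python
# Toy object store whose objects carry arbitrary user-defined metadata.
store = {}

def put(key, content, **metadata):
    """Store an object along with arbitrary key-value tags."""
    store[key] = {"content": content, "meta": metadata}

def find(**criteria):
    """Return keys of objects whose metadata matches all criteria."""
    return sorted(k for k, obj in store.items()
                  if all(obj["meta"].get(f) == v for f, v in criteria.items()))

put("scan-001.jpg", b"", department="radiology", patient="p42")
put("scan-002.jpg", b"", department="radiology", patient="p7")
put("memo.txt", b"", department="legal")

radiology_keys = find(department="radiology")
```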

As an example of this evolution, consider the difference between two popular online file-sharing services, Dropbox and Evernote. Both can be used to store and sync various files across devices and share them between groups of users. Dropbox was the baseline standard defining online file sharing and collaboration, but Evernote goes much farther — although for a narrower set of use cases — by becoming innately content-aware with full content search, inline viewers and editors for common file types, extra metadata (e.g., URL source or reference if available, user tagging) and “similar content” recommendations. Although I use both daily, I view Dropbox as just another file-sharing alternative, while Evernote is critical to my workflow.
IT data awareness

Company lawyers (for e-discovery) and detectives (in security) require online systems that proactively identify abnormal behavior to produce early warnings on possible breaches. Smart data-aware storage systems can fold in auditing-type information and help correlate files, data and metadata with patterns of “events” — such as application crashes, file systems filling up, new users granted root access and shared or hidden key directories.
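A toy Python sketch of that kind of event correlation follows; the event schema and the "root grant followed by sensitive read" rule are invented for illustration, standing in for the much richer patterns a real data-aware system would watch for.

```python
# Hypothetical audit-stream correlation: alert when a user who was just
# granted root access then reads data under a sensitive path.
SENSITIVE_PREFIX = "/secret/"

def correlate(events):
    """Return alerts for users who gained root and then read sensitive data."""
    new_roots, alerts = set(), []
    for ev in events:
        if ev["action"] == "grant_root":
            new_roots.add(ev["user"])
        elif (ev["action"] == "read"
              and ev["user"] in new_roots
              and ev["path"].startswith(SENSITIVE_PREFIX)):
            alerts.append((ev["user"], ev["path"]))
    return alerts

alerts = correlate([
    {"user": "eve", "action": "grant_root", "path": ""},
    {"user": "bob", "action": "read", "path": "/secret/payroll"},
    {"user": "eve", "action": "read", "path": "/secret/payroll"},
])
```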

I remember one particularly blatant storage misusage (on a DEC VAX!) when we caught someone hoarding huge amounts of NSFW material on a little-accessed file system. Today’s more content-aware smart storage systems could alert security about such transgressions and warn (or even prevent) creative boundary-pushing users from crossing into job-termination territory to begin with.

Benefits of data-aware storage

  • Fine-grained data protection: Storage that knows, for example, which VM its files or volumes belong to — or, even better, the specific policy to enforce for that VM’s data — can directly ensure appropriate data protection (e.g., the right level of RAID or replication).
  • Fine-grained QoS: Similarly, storage that knows what database files require which kinds of performance acceleration can directly prioritize I/O and cache resources for optimal application performance.
  • Content indexing and search: Large stores used for text-based data can deliver extra value by indexing content upon ingestion and enabling built-in admin and (even) end-user search.
  • Social storage analysis: Storage can track usage and access by users and groups as metadata. Then other users can easily find out who in an organization had recent interest in certain content, identify group collaboration patterns and receive recommendations of new things to research based on collaborative filtering (e.g., “people who like the things I like also liked X”).
  • Active capacity and utilization management: Storage can also track metadata about “per-data” system resource performance, capacity and utilization metrics. This enables storage admins to directly see what is going on in IT infrastructure for any piece or group of data tracked directly back to end users, departments and applications. Smart storage can also help optimize its own configuration and behavioral alignment to workloads.
  • Analytics and machine learning: As storage grows smarter, expect to see increasing amounts of both low-level compute processing and automated machine learning incorporated directly into the storage layer. Storage-side functions could then be used to automatically categorize, score, translate, transform, visualize and report on data even as it’s being created and stored.
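The collaborative-filtering idea from the social storage analysis item above can be sketched minimally in Python (the users and items are invented): recommend content that users with overlapping access patterns also touched.

```python
# Toy access-pattern metadata: which items each user has touched.
ACCESS = {
    "alice": {"reportA", "datasetX"},
    "bob":   {"reportA", "datasetX", "reportB"},
    "carol": {"reportC"},
}

def recommend(user):
    """Suggest items accessed by users who share items with `user`."""
    mine = ACCESS[user]
    suggestions = set()
    for other, items in ACCESS.items():
        if other != user and mine & items:  # overlapping interests
            suggestions |= items - mine     # things they saw that I haven't
    return sorted(suggestions)

recs = recommend("alice")
```

Real systems weight overlaps rather than treating any intersection as a match, but the mechanism is the same: access metadata, tracked by storage, becomes a discovery tool.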

…(read the complete as-published article there)

What do IT administrator skills mean now?

An IT industry analyst article published by SearchITOperations.

In a world full of data-aware this and internet-connected that, deep IT administrator skills should be more in-demand than ever.

Mike Matchett

It seems everything in the IT world is getting smarter and more connected. Storage is becoming data-aware, IT infrastructure components are becoming part of the internet of things and even our applications are going global, mobile and always on. And big data analytics and machine learning promise to find any information, buried anywhere, to optimize operations and business processes. So where does that leave long-time IT administrators?

The hot trend of DevOps was just an early warning sign that IT is no longer going to be made up of backroom, silo-focused, shell-scripting admin jobs. DevOps is great because having someone versed as deeply in the application as in the production infrastructure hosting it avoids many of the problems that occur when IT folks are thrown some black-box code over the wall and told to just make it run well at scale. But as we’ve seen, native DevOps folks who can dig into application code as easily as they troubleshoot, rebalance and even capacity-plan production systems are quite rare.

It’s common to see DevOps folks coming from the application side when infrastructure is easily and simply cloud provisioned — hence the ready interest in containerized applications. But when it isn’t, especially if hybrid architectures are involved, IT experts might become better DevOps masters in the long run.

I suspect many IT experts consider that kind of move to be somewhat of a downgrade. Perhaps it should instead be seen as moving closer to providing direct business value. Personally, I love hacking code, building accurate capacity planning models, tuning production performance and yes, even troubleshooting arcane and exotic problems. But as I’ve often told anyone who doesn’t know the true depth of IT administrator skills — usually at cocktail parties when it comes out that I do something in technology — “I AM NOT [JUST] A PROGRAMMER!” (This is usually followed by everyone within earshot beating a hasty retreat. I’m really a lot of fun at parties!)

It’s all virtualization’s fault

IT specialists also need to broaden into — or be replaced by — IT generalists. Here we can blame virtualization and, to some extent, infrastructure convergence. There are an awful lot more virtual admins out there than 10 years ago. Virtual environment administration isn’t actually easy, but a big value when virtualizing infrastructure is to lower operational expenditures by making it easier to administer: more automatic sharing, simpler point-and-click operations, scalable policy-based management and plug-and-play integration. I often hear from virtual admins that their IT administrator skills are still challenged daily, simply with keeping the lights on and ensuring things are running smoothly, but they are relying more and more on built-in lower-level intelligence and automation. This frees up some time to take a bigger-picture view and operate at a wider span of control. Still, the trend toward IT generalists often disenfranchises the IT silo expert whose cheese gets virtualized or converged.

The role of the IT administrator will definitely need to change as data centers hybridize across multiple types of private and public clouds, stacks of infrastructure converge and hyper-converge, and systems management develops sentience. Of course, change is inevitable. But how can old-school IT administrators stay current and continue providing mastery-level value to their organizations? …(read the complete as-published article there)

When data storage infrastructure really has a brain

An IT industry analyst article published by SearchStorage.

Big data analysis and the internet of things are helping produce more intelligent storage infrastructure.

Mike Matchett

Cheaper and denser CPUs are driving smarter built-in intelligence into each layer of the data storage infrastructure stack.

Take storage, for example. Excess compute power can be harnessed to deploy agile software-defined storage (e.g., Hewlett Packard Enterprise StoreVirtual), transition to hyper-converged architectures (e.g., HyperGrid, Nutanix, Pivot3, SimpliVity), or optimize I/O by smartly redistributing storage functionality between application servers and disk hosts (e.g., Datrium).

There is a downside to all this built-in intelligence, however. It can diminish the visibility we might otherwise have into how our data storage infrastructure responds to change — any IT change, really, whether due to intentional patching and upgrades, expanding usage and users, or complex bugs and component failures. Or, to put it another way, native, dynamic optimization enabled by powerful and inexpensive processors is making it increasingly difficult for us humans to figure out what’s going on with our infrastructures.

So while it’s really great when we don’t need to know any details, and can simply rely on low-level components to always do the right thing, until there is an absolutely autonomous data center — and, no, today’s public cloud computing doesn’t do away with the need for internal experts — IT may find baked-in intelligence a double-edged sword. Furthermore, while smarter data storage infrastructure helps us with provisioning, optimization, growth plans and troubleshooting, it can blind or fool us and actively work against our best efforts to bend infrastructure to our “will.”

Still, in spite of all these potential negatives, given the choice, I’d rather live in a smarter and more autonomous IT world than not (even if there is some risk of runaway AI). I’ll explain.

It’s all about the data

Remember when analysis used to be an offline process? Capture some data in a file; open Excel, SAS or other desktop tool; and weeks later receive a recommendation. Today, that kind of analysis latency is entirely too long and naïve.

Given the speed and agility of our applications and users nowadays, not to mention bigger data streams and minute-by-minute elastic cloud brokering, we need insight and answers faster than ever. This kind of intelligence starts with plentiful, reliable data, which today’s infrastructures are producing more and more of every day (in fact, we’ll soon be drowning in new data thanks to the internet of things [IoT]), and a way to process and manage all that information.

Storage arrays, for example, have long produced insightful data, but historically they required vendor-specific, complex and expensive storage resource management applications to make good use of it. Fortunately, today there is a series of developments helping us become smarter about IT systems management and better (and faster) users of the data generated by our infrastructures: …(read the complete as-published article there)