Why is the non-RDBMS popular?

OK, this is a softball question for anyone inside the big data space. However, it does come up as a real question when talking to business owners and application users. This is a quick, high-level description of NoSQL/NewSQL, why it’s different from a SQL RDBMS, and where I think both of those markets are going.

Why is the non-#RDBMS becoming so popular? #NoSQL #database (with transcript)

Why Facebook and the NSA love graph databases

An IT industry analyst article published by SearchDataCenter.

Graph databases play six degrees of separation to find real connections. See how IT teams can use the database approach for businesses.

Is there a benefit to understanding how your users, suppliers or employees relate to and influence one another? It’s hard to imagine that there is a business that couldn’t benefit from more detailed insight and analysis, let alone prediction, of its significant relationships.

If you have ever drawn dots on a whiteboard and then connected them, you can appreciate that thinking in terms of nodes and links naturally echoes many real-world scenarios. Many of today’s hottest data analysis opportunities for optimization or identifying fraud are represented as a linked web.

Analyzing sets of nodes and the relationships between them is known as graph theory.

Specialized graph databases are a small but fast-growing part of the so-called NoSQL (not only structured query language) database movement. Graph databases are designed to help model and explore a web or graph of relationships in a natural and more productive way than through the traditional relational database approach.

In a graph database, for example, a common query might be to find all the related objects, based on a certain pattern of relationship, between two and six links away. If the same problem were force-fit into normalized tables in a SQL relational model, the translated query would become quite complex and require tens or even hundreds of nested full table joins.

In a relational database query, every required join is going to cause a performance hit. For graph problems of any size, a SQL approach will be demonstrably slower, more complex, prone to error and definitely not as scalable.
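To make the "two to six links away" query concrete, here is a minimal sketch of it as a breadth-first traversal in plain Python. It is not how any particular graph database implements traversal. The `related_within` function and the `knows` data are hypothetical names made up for illustration, and the sketch ignores relationship types and simply follows every link.

```python
from collections import deque


def related_within(graph, start, min_hops=2, max_hops=6):
    """Return {node: hops} for nodes whose shortest path from start
    is between min_hops and max_hops links long."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return {n: d for n, d in hops.items() if min_hops <= d <= max_hops}


# Hypothetical "knows" relationships, stored as an adjacency list.
knows = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}

print(related_within(knows, "ann"))
```

Each extra hop here is just one more pass through the loop. In a normalized relational model, each extra hop would typically be another self-join on the relationship table.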

Graph databases don’t require a predefined schema; nodes and links can have attributes edited or assigned to them at any time. If a new relationship type is discovered, it can be added to the database dynamically, extending what’s modeled in the database.
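As a rough illustration of that flexibility, the sketch below models nodes and links as plain Python dictionaries. The names (`nodes`, `links`, `WORKS_AT`, `SUPPLIES`) are all hypothetical, and a real graph database would expose this through its own API or query language rather than through in-memory dicts.

```python
# Nodes and links are plain dicts, so no schema is declared up front.
nodes = {
    "ann": {"label": "Person"},
    "acme": {"label": "Company"},
}
links = [
    {"from": "ann", "to": "acme", "type": "WORKS_AT"},
]

# Assign a new attribute to an existing node at any time.
nodes["ann"]["city"] = "Boston"

# Add a relationship type that was not modeled before.
links.append({"from": "acme", "to": "ann", "type": "SUPPLIES"})

relationship_types = sorted({link["type"] for link in links})
print(relationship_types)
```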

In production, IT should be aware of differences in how graph databases scale, how they use memory and how they ingest (and index) data loads.

…(read the complete as-published article there)

Database performance tuning: Five ways for IT to save the day

An IT industry analyst article published by SearchDataCenter.

IT teams can play heroes when database performance issues disrupt applications. Try these five tips for performance tuning before there’s a problem.

When database performance takes a turn for the worse, IT can play the hero. Production database performance can slow dramatically as both data and the business grow. Whenever a key database slows down, widespread business damage can result.

Technically, performance can be tackled at many different levels — applications can be optimized, databases tuned or new architectures built. However, in production, the problem often falls on IT operations to implement something fast and in a minimally disruptive manner.

There are some new ways for IT pros to tackle slowdown problems. However, one question must be addressed first: Why is it up to IT?

Database administrators and developers have many ways to achieve database performance tuning. They can adjust configuration files to better align database requirements with underlying infrastructure, add indexing, implement stored procedures or even modify the schema to (gasp!) denormalize some tables.
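Of those options, adding an index is the easiest to demonstrate. The sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` statement to compare the plan for the same query before and after an index exists. The `orders` table, its columns, the index name and the `query_plan` helper are assumptions made up for this example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's plan for a query as one string (detail is the last column)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

Other database engines have their own equivalents of this plan output, so the habit of checking the plan before and after a change carries over even though the syntax does not.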

Developers have significant control over how a database is used; they determine what data is processed and how it is queried. Great developers wield fierce SQL ninja skills to tune client queries, implement caching and build application-side throttling. Or, they rewrite the app to use a promising new database platform, such as a NoSQL variant.
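As one small example of the application-side caching mentioned above, the sketch below wraps a lookup query in Python's `functools.lru_cache` so that repeated requests for the same key never reach the database. The `products` table, the `product_name` function and the `stats` counter are hypothetical and exist only for illustration.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products VALUES ('A1', 'widget')")

stats = {"db_calls": 0}  # counts how often the database is actually queried


@lru_cache(maxsize=1024)
def product_name(sku):
    """Look up a product name; repeat calls for the same sku come from the cache."""
    stats["db_calls"] += 1
    row = conn.execute(
        "SELECT name FROM products WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0] if row else None


first = product_name("A1")
second = product_name("A1")  # served from the cache, no query issued
print(first, second, stats["db_calls"])
```

The trade-off is staleness: a cache like this does not notice when the underlying row changes, so it suits data that changes rarely or needs an explicit invalidation step such as `product_name.cache_clear()`.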

All kinds of databases, however, can eventually suffer performance degradation at operational scales. Worse, many developers simply expect IT to add more infrastructure if things get slow in production, which clearly isn’t the best option.

Five ways [IT] can address database performance issues:…

…(read the complete as-published article there)

Free Professional Development in Hadoop and NoSQL

(Excerpt from original post on the Taneja Group News Blog)

A wonderful trend we have not only noticed but taken advantage of is free online education. In the academic world, MIT has been offering it for years with OpenCourseWare, and recently even Harvard announced it’s producing free classes at edX. But today it is not just heavy-duty term-length classes in stats, data mining, AI and multivariate calculus – although those are all valuable and interesting (well, I am a geek). What is really exciting is that professional development is following suit, especially in IT and Big Data.

Certainly there has long been IT vendor-sponsored education, and sometimes it’s even “free”, but when it is focused on the vendor’s own proprietary commercial offerings there is some question of the actual agenda. Will you spend most of your time learning how to just use that vendor’s solutions? This can be valuable if you own them, although you still might suspect that this kind of education is really filling in for a lack of great usability or built-in automation features. I’m reminded of some of the old Novell and Windows certifications that were more about memorizing the trivia of the product lines than learning how to capitalize on the capabilities they enabled. I personally considered that indoctrination more than professional development.

What you can look for today are courses in learning and applying new solutions and technologies, not just in vendor-specific products. A good example is the IBM-sponsored BigDataUniversity, which offers free classes on Hadoop. Yes, there are also free classes on IBM’s DB2 and you can graciously use IBM’s distro of Hadoop to play with, but we are pleased with the vendor-neutral presentation.

If you are interested in NoSQL and the open source MongoDB in particular, you can get onto 10gen’s free education classes on MongoDB that start next week. Although there is a class listed for developers, IT folks whipping out scripts here and there to automate and integrate systems might really find some utility in learning how to leverage MongoDB’s document-centric database. And another course listed for DBAs could be relevant to IT and DevOps folks – how do you actually shard and scale out these kinds of NoSQL solutions?

While it’s clear that offering education can be a vendor marketing strategy, I think it can be a good one for both the vendor and the user community. Most companies encourage professional development overall, but your local HR might not promote much in the way of the cutting-edge technical development that IT folks desire. I’d encourage some quick googling to see what’s out there – and maybe even request something specific from your favored vendors. Free online technical development courses look like a win-win all the way around.

…(read the full post)

MongoDB – Storing Big Data Documents

If a Big Data set (or smaller data) is in the form of documents, then it’s difficult to store them in a traditional schema-defined row and column database. Sure, you can create large blob fields to hold large arbitrary chunks of data, serialize and encode the document in some way, or just store them in a filesystem, but those options aren’t much good for querying or analysis when the data gets big.

MongoDB Document Database

MongoDB is a great example of a document database. There are no predefined schemas for tables (the schema is considered “dynamic”). Rather, you declare “collections” and insert or update documents directly into each collection.

A document in this case is basically JSON with some extensions (actually BSON – Binary encoded JSON). It supports nested arrays and other things that you wouldn’t find in a relational database. If you are object oriented, this is fundamentally an object store.

Documents added to a single collection can vary widely from each other in terms of content and composition/structure (although an application layer above could obviously enforce consistency as happens when MongoDB is used under Rails).
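To picture what "varying widely" looks like, here is a small sketch in plain Python. It deliberately does not use MongoDB or its drivers: a list of dicts stands in for a collection, and the `articles` data and `find_by_tag` helper are hypothetical. The point is only that documents with different fields and nesting can sit side by side and still be queried.

```python
import json

# A "collection" modeled as a plain list; each document is a dict.
articles = [
    {"title": "Graph databases", "tags": ["nosql", "graph"]},
    {"title": "Tuning tips", "author": {"name": "Pat", "team": "IT"}, "tags": ["sql"]},
    {"title": "Untagged note"},  # no tags field at all
]


def find_by_tag(collection, tag):
    """Return documents whose optional 'tags' array contains the given tag."""
    return [doc for doc in collection if tag in doc.get("tags", [])]


matches = find_by_tag(articles, "nosql")
print(json.dumps(matches))
```

The query has to tolerate missing fields (`doc.get("tags", [])`), which is the flip side of not having a fixed schema and part of why an application layer often enforces consistency.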

MongoDB’s list of key features is a fantastic mini-tutorial in itself.