(Excerpt from original post on the Taneja Group News Blog)
…(read the full post)
I’ve been playing with Neo4j again now that v3 is out, and hacking together some Ruby scripts to load interesting data I have lying around (e.g. the database for this website, which I’m mainly modeling as “(posts)<-(tags); (posts:articles)<-(publisher)”).
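To give a feel for what loading that model might look like, here is a minimal Ruby sketch that turns post rows into Cypher MERGE statements. It is an illustration, not the loader described above: the sample rows, the labels (Post, Article, Tag, Publisher) and the relationship types (TAGS, PUBLISHED) are hypothetical, and it only builds strings, so it runs without Neo4j or any gem.

```ruby
# Sketch only: labels (Post, Article, Tag, Publisher) and relationship types
# (TAGS, PUBLISHED) are hypothetical names chosen for illustration.

# Quote a Ruby string as a single-quoted Cypher string literal,
# escaping backslashes first and then single quotes.
def cypher_str(s)
  "'" + s.gsub("\\") { "\\\\" }.gsub("'") { "\\'" } + "'"
end

# Build the MERGE statements for one post row (a Hash).
# A post with a publisher is treated as an article and gets the extra Article label.
def statements_for(post)
  id = Integer(post[:id])
  node = "MERGE (p:Post {id: #{id}}) SET p.title = #{cypher_str(post[:title])}"
  node += " SET p:Article" if post[:publisher]
  stmts = [node]
  match = "MATCH (p:Post {id: #{id}})"
  post[:tags].each do |tag|
    # (tag)-[:TAGS]->(post), i.e. (posts)<-(tags)
    stmts << "#{match} MERGE (t:Tag {name: #{cypher_str(tag)}}) MERGE (t)-[:TAGS]->(p)"
  end
  if post[:publisher]
    # (publisher)-[:PUBLISHED]->(article), i.e. (posts:articles)<-(publisher)
    stmts << "#{match} MERGE (u:Publisher {name: #{cypher_str(post[:publisher])}}) " \
             "MERGE (u)-[:PUBLISHED]->(p)"
  end
  stmts
end

# Hypothetical rows standing in for the website's own database tables.
posts = [
  { id: 1, title: "Neo4j v3: what's new", tags: %w[neo4j nosql], publisher: "Example Weekly" },
  { id: 2, title: "Hacking Cypher in Ruby", tags: %w[neo4j ruby], publisher: nil }
]

posts.flat_map { |post| statements_for(post) }.each { |stmt| puts stmt }
```

Building literal strings keeps the sketch dependency-free; for real loads, Cypher query parameters are the safer choice, and the statements would be sent through whatever query call your neo4j-core session exposes (not shown here).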
For Ruby hacking in the past I’ve used the Neology gem, but now I’m trying out the Neo4jrb set of gems. Although I think an OGM (object-graph mapper) is where it’s at (the next Rails app I build will no doubt use some graph database), I’m starting with just neo4j-core to get a handle on graph concepts and Cypher.
One thing that stumped me for a while: with the latest versions of these gems (perhaps because they now support multiple Neo4j sessions), I found it helped to pass a “default: true” parameter when opening the session, which kept everything downstream working at the neo4j-core level. Otherwise Node and other neo4j-core classes seemed to lose the current session and throw a strange error (depending on scope?). Or maybe I just kept clobbering my session context somehow. Either way, it doesn’t seem to hurt.
The Neo4j v3 Mac OS X “desktop install” has removed terminal neo4j-shell access in favor of the updated, slick browser interface. The new browser interface is pretty good, but for some things I’d still really like to work in a terminal command shell. Maybe I’m just getting old :)… If you still want the neo4j shell, you can apparently install the Linux tarball version instead (but then you don’t get the browser client?). I’m not sure why product managers make either-or packaging decisions like this. It’s not as if the shell was deprecated (e.g. to save development time or testing effort).
Anyway, things look pretty cool in the browser interface, and playing with Cypher is straightforward since you can switch between table, text, and graph views of results with a single click.
I’ve also been wanting to play with Gephi more, so I’m exporting data from Neo4j using Cypher statements like these and the browser interface’s download button. (I’m using .csv files, though, since the Gephi community Neo4j importer plugin hasn’t yet been updated for Gephi v0.9.)
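The export statements themselves aren’t part of this excerpt, but the CSV reshaping step can be sketched. The Ruby snippet below (standard library only) assumes a two-column export with hypothetical headers `tag` and `post`, and rewrites it into node and edge tables using the column headers Gephi’s spreadsheet importer is generally expected to recognize (`Id`, `Label` for nodes; `Source`, `Target`, `Type`, `Weight` for edges). Those headers are an assumption worth checking against your Gephi version.

```ruby
require "csv"

# Convert a two-column CSV export (hypothetical headers "tag" and "post") into
# Gephi-style node and edge tables. Returns [nodes_csv, edges_csv] as strings.
def to_gephi(export_csv, src_col: "tag", dst_col: "post")
  rows = CSV.parse(export_csv, headers: true)
  ids = {}            # [column, value] => numeric Id, assigned in first-seen order
  edges = Hash.new(0) # [source_id, target_id] => number of times the pair appears
  rows.each do |row|
    s = (ids[[src_col, row[src_col]]] ||= ids.size)
    t = (ids[[dst_col, row[dst_col]]] ||= ids.size)
    edges[[s, t]] += 1
  end
  nodes = CSV.generate do |csv|
    csv << %w[Id Label]
    ids.each { |(_, name), id| csv << [id, name] }
  end
  edge_table = CSV.generate do |csv|
    csv << %w[Source Target Type Weight]
    edges.each { |(s, t), weight| csv << [s, t, "Directed", weight] }
  end
  [nodes, edge_table]
end

# Hypothetical export rows (tag -> post).
export = <<~EXPORT
  tag,post
  neo4j,Graph databases 101
  ruby,Hacking Cypher in Ruby
  neo4j,Hacking Cypher in Ruby
EXPORT

nodes, edges = to_gephi(export)
puts nodes, edges
```

Keying nodes by column as well as value keeps a tag and a post that happen to share a name from collapsing into a single node, and repeated pairs are folded into the edge weight.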
Is there a benefit to understanding how your users, suppliers or employees relate to and influence one another? It’s hard to imagine a business that couldn’t benefit from more detailed insight into, and analysis (let alone prediction) of, its significant relationships.
If you have ever drawn dots on a whiteboard and then connected them, you can appreciate that thinking in terms of nodes and links naturally mirrors many real-world scenarios. Many of today’s hottest data analysis opportunities, such as optimization or fraud detection, are naturally represented as a linked web.
The mathematical study of sets of nodes and the links between them is known as graph theory.
Specialized graph databases are a small but fast-growing part of the so-called NoSQL (not only SQL) database movement. They are designed to model and explore a web, or graph, of relationships more naturally and productively than the traditional relational database approach allows.
In a graph database, for example, a common query might be to find all the related objects, based on a certain pattern of relationship, between two and six links away. If the same problem were force-fit into normalized tables in an SQL relational model, the translated query would become quite complex and require tens or even hundreds of nested full table joins.
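In Cypher, “between two and six links away” is written as a variable-length pattern such as `MATCH (a)-[*2..6]-(b)`. The underlying idea can be sketched in plain Ruby as a breadth-first search that expands one hop at a time from a start node. One assumption to flag: this sketch reports nodes by their shortest hop distance, whereas Cypher’s pattern matches any path of length 2 to 6, so a direct neighbour that is also reachable by a longer path would be returned by Cypher but not here. The sample links are hypothetical.

```ruby
# Breadth-first search over an adjacency list (Hash: node => array of neighbours).
# Returns the nodes whose shortest hop distance from start lies in
# min_hops..max_hops, in the order they were discovered.
def within_hops(adj, start, min_hops: 2, max_hops: 6)
  dist = { start => 0 }
  frontier = [start]
  (1..max_hops).each do |hops|
    next_frontier = []
    frontier.each do |node|
      adj.fetch(node, []).each do |nbr|
        next if dist.key?(nbr) # already reached by a shorter (or equal) path
        dist[nbr] = hops
        next_frontier << nbr
      end
    end
    break if next_frontier.empty?
    frontier = next_frontier
  end
  dist.select { |_, d| d >= min_hops }.keys
end

# Hypothetical undirected links; each one is stored in both directions.
links = [%w[a b], %w[b c], %w[c d], %w[a e]]
adj = Hash.new { |h, k| h[k] = [] }
links.each do |x, y|
  adj[x] << y
  adj[y] << x
end

p within_hops(adj, "a") # => ["c", "d"]
```

Each hop only touches the neighbours of the current frontier, which is roughly the intuition behind a graph database’s traversal performance: the work tracks the part of the graph actually visited rather than the size of whole tables.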
In a relational database query, every required join adds a performance cost. For graph problems of any significant size, a SQL approach will be demonstrably slower, more complex, more prone to error and definitely less scalable.
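To make the join growth concrete, here is a Ruby sketch that generates SQL for the same “two to six links away” question against a hypothetical `links(src, dst)` table. Each fixed path length needs its own chain of self-joins, and the range becomes a UNION of those chains. The table, its columns and the `:start_id` placeholder are all illustrative assumptions; placeholder syntax varies by database driver.

```ruby
# Hypothetical schema: links(src, dst), one row per directed link.
# Each extra hop needs one more self-join of the links table.
def k_hop_sql(k)
  sql = "SELECT l#{k}.dst FROM links l1"
  (2..k).each { |i| sql += " JOIN links l#{i} ON l#{i}.src = l#{i - 1}.dst" }
  sql + " WHERE l1.src = :start_id"
end

# "Between min_hops and max_hops links away" becomes a UNION of fixed-length chains.
def range_sql(min_hops, max_hops)
  (min_hops..max_hops).map { |k| k_hop_sql(k) }.join("\nUNION\n")
end

sql = range_sql(2, 6)
puts sql
puts "JOIN clauses: #{sql.scan(/ JOIN /).size}" # 1+2+3+4+5 = 15
```

For two to six hops that is already 15 JOINs over 20 references to the same table, and the sketch follows links in one direction only and ignores cycles and duplicate paths, each of which would add further work.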
Graph databases don’t require a predefined schema; nodes and links can have attributes edited or assigned to them at any time. If a new relationship type is discovered, it can be added to the database dynamically, extending what’s modeled in the database.
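A toy Ruby sketch of what “no predefined schema” means in practice: nodes and relationships carry free-form property hashes, and a relationship type is just a symbol, so a new attribute or a new type can be added at any time without a migration step. This is an in-memory illustration with hypothetical names (`TinyGraph`, `:TAGS`, `:CITES`), not how Neo4j or any other graph database actually stores data.

```ruby
# Toy property graph: every node and relationship carries a free-form property Hash,
# and relationship types are plain symbols, so nothing is declared up front.
class TinyGraph
  Rel = Struct.new(:from, :type, :to, :props)

  def initialize
    @nodes = {}
    @rels = []
  end

  # Create the node if needed, then add or overwrite the given properties.
  def add_node(id, **props)
    (@nodes[id] ||= {}).merge!(props)
  end

  # Link two existing nodes with a relationship of any type.
  def relate(from, type, to, **props)
    [from, to].each { |id| raise KeyError, "unknown node #{id.inspect}" unless @nodes.key?(id) }
    @rels << Rel.new(from, type, to, props)
  end

  def node(id)
    @nodes.fetch(id)
  end

  # Targets of outgoing relationships of the given type.
  def neighbors(id, type)
    @rels.select { |r| r.from == id && r.type == type }.map(&:to)
  end

  # Relationship types currently in use, in first-seen order.
  def types
    @rels.map(&:type).uniq
  end
end

g = TinyGraph.new
g.add_node(:post1, title: "Graph databases 101")
g.add_node(:neo4j, kind: "tag")
g.relate(:neo4j, :TAGS, :post1)

# Later: a new attribute and a brand-new relationship type, with no migration step.
g.add_node(:post1, word_count: 850)
g.add_node(:post2, title: "A follow-up post")
g.relate(:post2, :CITES, :post1, year: 2016)

p g.node(:post1)
p g.types # => [:TAGS, :CITES]
```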
In production, IT should be aware of differences in how graph databases scale, how they use memory and how they ingest (and index) data loads.
…(read the complete as-published article there)