How is a graph database different from a graph represented in a relational database?

I can represent a graph trivially in a relational database with two tables: vertex and edge. Richer structures such as "properties" and "labels" (in Neo4j terminology) can be represented as more tables. Have I misunderstood, or can a graph database like Neo4j represent something that is not easy to represent relationally?
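A minimal sketch of the layout described above, assuming SQLite through Python's built-in sqlite3 module. The column names, the vertex_property table, and the sample rows are hypothetical and chosen only for illustration:

```python
import sqlite3

# In-memory database; all table and column names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL              -- stands in for a Neo4j-style label
    );
    CREATE TABLE edge (
        src      INTEGER NOT NULL REFERENCES vertex(id),
        dst      INTEGER NOT NULL REFERENCES vertex(id),
        rel_type TEXT NOT NULL,
        PRIMARY KEY (src, dst, rel_type)
    );
    -- "Properties" as one more table: one row per (vertex, property name).
    CREATE TABLE vertex_property (
        vertex_id  INTEGER NOT NULL REFERENCES vertex(id),
        prop_name  TEXT NOT NULL,
        prop_value TEXT,
        PRIMARY KEY (vertex_id, prop_name)
    );
""")

conn.executemany("INSERT INTO vertex VALUES (?, ?)",
                 [(1, "Person"), (2, "Person"), (3, "City")])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)",
                 [(1, 2, "KNOWS"), (1, 3, "LIVES_IN")])
conn.executemany("INSERT INTO vertex_property VALUES (?, ?, ?)",
                 [(1, "name", "Ada"), (2, "name", "Bob")])


def neighbours(conn, vertex_id):
    """Return (rel_type, id, label) for every vertex one hop out from vertex_id."""
    rows = conn.execute(
        "SELECT e.rel_type, v.id, v.label "
        "FROM edge AS e JOIN vertex AS v ON v.id = e.dst "
        "WHERE e.src = ? ORDER BY v.id",
        (vertex_id,),
    )
    return rows.fetchall()


print(neighbours(conn, 1))
```

A single hop is one join here; each additional hop adds another join against edge.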

I can query this graph using SQL, with recursive subqueries if needed, and multiple separate queries in a transaction if needed. Am I misunderstanding, or does a graph query language like Cypher provide more expressiveness than SQL?

The relational graph model is stored and queried efficiently, AFAIK. Does the graph database structure its store or optimize its queries in some way that provides performance characteristics that cannot be obtained from a relational database?

My relational database provides ACID guarantees and allows me to write fairly expressive constraints on my graph data (and even more constraints if I split the single vertex table into a properly normalized schema). Am I mistaken, or does the graph database provide some guarantees or check for some correctness properties that are not available in my relational database?

I am struggling to understand how a graph database like Neo4j is anything other than a subset of the relational model. (Apologies for using Neo4j as a stand-in for all graph databases here; it is the only one I've looked at.)

In short: is a graph database a relational database?



1 answer

Is one a subset of the other?

Definitely not; both are ultimately grounded in mathematical concepts: relations on one side, graphs on the other. Both models are very general; there is basically no information content that you cannot express with either of them. This means that while they differ in a lot of syntactic sugar and in how they encourage you to model and think about data (just like programming languages), they have the same "expressive power".

What you are describing in your question is one way to model a graph (tables vertex and edge). This graph implementation is a subset of what relational can express. Likewise, I could mock up tables and rows using a graph database, but that would be one specific implementation; it would not demonstrate that relational data is a subset of graph data.

So the first thing to understand is that they have roughly equal expressive power. You can simulate either one with the other. The real question you should be asking is why you would choose one over the other.

Why would you choose one over the other?

All databases exist to facilitate data access. Simply put, you store data so that you can retrieve it. But how exactly do you need to get at the data? There are many different access patterns, and the overall database design space is huge. Every time a database makes a particular design decision, it tends to make some things better and others worse. For example, when you create an index in a relational database, you've sped up reads, but you've degraded write performance because the index needs to be maintained.
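The read side of that index trade-off can be seen in a small sketch, assuming SQLite through Python's sqlite3 module. The edge table and the index name idx_edge_src are made up for illustration, and the write-side cost is only noted in a comment, not measured:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
conn.executemany("INSERT INTO edge VALUES (?, ?)",
                 [(i, i + 1) for i in range(1000)])


def query_plan(conn):
    """Return SQLite's plan description for a lookup of edges by source."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT dst FROM edge WHERE src = ?", (42,)
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)   # no index yet: the whole table is scanned

# After this statement, every INSERT/UPDATE/DELETE on edge must also
# update idx_edge_src; that extra work is the write-side cost.
conn.execute("CREATE INDEX idx_edge_src ON edge (src)")

plan_after = query_plan(conn)    # the lookup can now go through the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only relies on whether the index name appears in it.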

So, when approaching the question "graph or relational?", first figure out what your data looks like and what your data access patterns look like. Once you know that, you can evaluate a variety of databases, see what trade-offs they've made, and choose the one that works for what you need. And if a DBMS made a choice that makes certain access patterns difficult, error-prone, or slow, you can avoid that DBMS for that dataset.

It's (in part) about data access patterns

Graph databases are generally better than relational databases when the stored data is a graph, when the data access pattern involves a large number of graph traversals, or both. (See the other answer I wrote for a more detailed discussion of why this is the case.) That answer also addresses your specific question: "Does the graph database structure its store or optimize its queries in some way that provides performance characteristics that cannot be obtained from a relational database?"

You say, "I can query this graph using SQL, with recursive subqueries if needed, and multiple separate queries in a transaction if needed." That is technically true, but let's take an example to see why relational might not be good enough. Let's say I have a graph (in an RDBMS: a node table and an edge table, with a join key between them). Let's say I pick one node and want to find everything between 6 and 8 hops from that node. Here's the Cypher for that:

match (myChosenNode {id: 'foo'})-[r:relationshipType*6..8]->(y) return y;


I would really like to see you write this as SQL. It is possible, but difficult and awkward. And it will also run like a dog because of the large number of joins you will be doing on non-trivial amounts of data.
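For comparison, here is one way the SQL side might look: a rough sketch using a recursive common table expression, run against SQLite through Python's sqlite3 module. The node and edge tables, their columns, and the sample chain of nodes are assumptions made for this example. It is also not an exact equivalent of the Cypher: it returns each reachable node once, and it does not enforce Cypher's rule that a relationship may not be repeated within a path, so results can differ on cyclic graphs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id TEXT PRIMARY KEY);
    CREATE TABLE edge (
        src      TEXT NOT NULL REFERENCES node(id),
        dst      TEXT NOT NULL REFERENCES node(id),
        rel_type TEXT NOT NULL
    );
    CREATE INDEX idx_edge_src ON edge (src, rel_type);
""")

# Sample data: a simple chain foo -> n1 -> n2 -> ... -> n9.
chain = ["foo"] + [f"n{i}" for i in range(1, 10)]
conn.executemany("INSERT INTO node VALUES (?)", [(n,) for n in chain])
conn.executemany("INSERT INTO edge VALUES (?, ?, 'relationshipType')",
                 list(zip(chain, chain[1:])))

# Each recursive step is one more join against edge. UNION (not UNION ALL)
# drops duplicate (node_id, depth) rows, and the depth cap ends the recursion.
HOPS_SQL = """
WITH RECURSIVE walk(node_id, depth) AS (
    SELECT :start, 0
    UNION
    SELECT e.dst, w.depth + 1
    FROM walk AS w
    JOIN edge AS e ON e.src = w.node_id
    WHERE e.rel_type = :rel_type AND w.depth < :max_hops
)
SELECT DISTINCT node_id
FROM walk
WHERE depth BETWEEN :min_hops AND :max_hops
ORDER BY node_id
"""


def nodes_within_hops(conn, start, rel_type, min_hops, max_hops):
    """Return ids of nodes reachable from start in min_hops..max_hops steps."""
    params = {"start": start, "rel_type": rel_type,
              "min_hops": min_hops, "max_hops": max_hops}
    return [row[0] for row in conn.execute(HOPS_SQL, params)]


result = nodes_within_hops(conn, "foo", "relationshipType", 6, 8)
print(result)
```

Even in this small form, the traversal depth has to be tracked by hand, and every level of the walk is another pass through the edge table.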


Now, on ACID guarantees: Neo4j provides transactions with ACID guarantees. The answer will be different for other graph databases, especially those implemented on top of Hadoop / HBase. YMMV, so check the fine print for each database.

It is true that there are a number of RDBMS features that are not commonly found in graph databases; examples are triggers and some kinds of constraints. As a longtime RDBMS nerd, I am not very happy that they are missing; I think they are valuable.


What it basically comes down to, for me and many other engineers I work with, is:

  • What is your data?
  • What are your access patterns?

If your data is a graph, or your access patterns involve a lot of graph traversals, you should probably use a graph DB. If your data is more tabular, or your access patterns lean more toward bulk scans, you should use an RDBMS. In the end, they are two different tools with different niches. If you use them to their strengths, you will be happy. If you use an RDBMS to simulate a graph just "because you can", you will suffer. If you use a graph database to do a lot of bulk scans of every node in every graph, you will suffer. As with most technology, it's just a matter of using the right tool for the job.


