Time-based data modeling

I have a data modeling question. The data I have is basically nodes with relationships to other nodes. Nodes have properties. Edges are directional and have properties. I am researching whether a graph database like Neo4j is suitable or not.

The catch: the data I have is time-based. It changes over time and I need to keep track of historical data as well. For example, I should be able to ask:

  • What was the graph on a specific date?
  • Which nodes depended on a given node at a certain time?
  • What were the properties of the edge between two given nodes at a specific time?

I searched but could not find a satisfactory resource that explains how time can be accounted for in a graph database. Do you think my requirement can be met with a graph database? Is there an example / resource / article that describes this for Neo4j or any other graph database?

I want to make sure the database scales to 100K nodes and millions of edges. I am optimizing for time over space.

1 answer


Is there an example / resource / article that describes this for Neo4j or any other graph database?

Here's a great article from Ian Robinson's blog on time-based versioned graphs.

Basically, the article describes a way to represent time-based versioned graphs by adding some extra nodes and timestamped relationships that capture the state of the graph at a given timestamp.

The following image from that article shows:

  • The price of product_id: 1 changed from 1.00 to 2.00. This is a state change.
  • product_id: 1 is now sold by shop_id: 2 (not shop_id: 1). This is a structural change.

Temporal model
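To make the two kinds of change concrete, here is a minimal in-memory sketch in Python. It is not the article's code and not Neo4j's API: the `TimedEdge` class, the `SELLS` and `STATE` relationship names, and the integer timestamps are all assumptions made for illustration. The idea it shows is the one described above: every relationship carries a validity window, and a query at a timestamp only follows relationships whose window contains that timestamp.

```python
from dataclasses import dataclass

END_OF_TIME = float("inf")  # assumption: an open-ended window ends at +infinity


@dataclass(frozen=True)
class TimedEdge:
    """A directed relationship that is valid during [valid_from, valid_to)."""
    source: str
    target: str
    kind: str
    valid_from: float
    valid_to: float = END_OF_TIME

    def is_valid_at(self, ts: float) -> bool:
        return self.valid_from <= ts < self.valid_to


# Hypothetical timeline (the timestamps are made up):
#   t=1: shop 1 starts selling product 1 at price 1.00
#   t=3: the price changes to 2.00      (state change)
#   t=5: product 1 moves to shop 2      (structural change)
edges = [
    TimedEdge("shop:1", "product:1", "SELLS", valid_from=1, valid_to=5),
    TimedEdge("shop:2", "product:1", "SELLS", valid_from=5),
    TimedEdge("product:1", "price=1.00", "STATE", valid_from=1, valid_to=3),
    TimedEdge("product:1", "price=2.00", "STATE", valid_from=3),
]


def targets_at(source: str, kind: str, ts: float) -> list[str]:
    """Follow outgoing relationships of one kind that are valid at ts."""
    return [e.target for e in edges
            if e.source == source and e.kind == kind and e.is_valid_at(ts)]


def sources_at(target: str, kind: str, ts: float) -> list[str]:
    """Follow incoming relationships of one kind that are valid at ts."""
    return [e.source for e in edges
            if e.target == target and e.kind == kind and e.is_valid_at(ts)]


print(sources_at("product:1", "SELLS", 2))  # ['shop:1']
print(sources_at("product:1", "SELLS", 6))  # ['shop:2']
print(targets_at("product:1", "STATE", 2))  # ['price=1.00']
print(targets_at("product:1", "STATE", 4))  # ['price=2.00']
```

Nothing is ever deleted here: a change closes the old window and opens a new one, which is why the data volume grows and why every lookup has to carry the timestamp filter.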

Do you think my requirement can be met with a graph database?

Yes, but not in a simple or "natural" way. Versioning a time-based model in a database that does not offer this functionality natively can be complex and costly. From the article:



Neo4j does not provide intrinsic support for versioning, either at the level of its labeled property graph model or in the Cypher query language. Therefore, to version a graph, we need to make both the application's graph data model and its queries version-aware.

and

Versioning necessarily creates a lot more data - both more nodes and more relationships. In addition, queries will be more complex and slower, because each MATCH must account for one or more versions. Given this overhead, apply versioning with caution. Perhaps not all of your graph needs to be versioned. If that is the case, version only those parts of the graph that require it.

EDIT:

A few words from the book "Graph Databases" (Ian Robinson, Jim Webber, and Emil Eifrem) about versioning in graph databases. The book is available for download from the Neo4j site:

Versioning: A versioned graph enables us to recover the state of the graph at a particular point in time. Most graph databases do not support versioning as a first-class concept. It is possible, however, to create a versioning scheme inside the graph model. With this scheme, nodes and relationships are timestamped and archived whenever they are modified. The downside of such versioning schemes is that they leak into any queries written against the graph, adding a layer of complexity to even the simplest query.

That paragraph in the book points to the same article linked at the beginning of this answer.
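As a rough illustration of "timestamped and archived whenever they are modified", here is a small Python sketch for the properties of a single node or edge. It is an assumption-laden toy, not something taken from the book or the article: `VersionedProperties` and its methods are hypothetical names. It keeps a full snapshot per change, which costs space but makes "what were the properties at time T?" a binary search, in line with the question's preference for time over space.

```python
import bisect


class VersionedProperties:
    """Append-only history of property snapshots for one node or edge."""

    def __init__(self):
        self._timestamps = []  # sorted change times
        self._snapshots = []   # full property dict as of each change time

    def set(self, ts, **changes):
        """Record a change at time ts; older snapshots stay archived."""
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("changes must be recorded in increasing time order")
        current = dict(self._snapshots[-1]) if self._snapshots else {}
        current.update(changes)
        self._timestamps.append(ts)
        self._snapshots.append(current)

    def at(self, ts):
        """Return the properties as of time ts, or None if nothing existed yet."""
        i = bisect.bisect_right(self._timestamps, ts)
        return dict(self._snapshots[i - 1]) if i else None


# Hypothetical edge whose properties change twice.
edge_props = VersionedProperties()
edge_props.set(10, weight=1.0)
edge_props.set(20, weight=2.5, label="x")

print(edge_props.at(5))   # None
print(edge_props.at(15))  # {'weight': 1.0}
print(edge_props.at(25))  # {'weight': 2.5, 'label': 'x'}
```

Even in this toy, the caller can no longer just read a property; every read has to say "as of when", which is the query complexity the book warns about.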
