Poor spatial layer insert performance

I am loading postcode and address data into Neo4j. There are three labels: POSTCODE, ADDRESS and REGION. REGION and POSTCODE each have a unique constraint on their single property. The insert query is MERGE REGION, MERGE POSTCODE, CREATE ADDRESS and then CREATE the relationships. The idea is to be able to see which postcodes are in which region, and how many addresses each postcode has, so the MERGE behavior is important.

However, we found inserts become very slow once the database reaches even a fairly moderate size. We expected some slowdown, but we expected the constraint checks to scale as O(log n). Instead, insert time grows linearly with the size of the database, which is very unexpected.


What can I do to improve this without giving up the MERGE behavior? Is this a consequence of the UNIQUE constraint? In theory, there should be no difference between a unique constraint and a plain index when using MERGE, since there is only one property. Either way, MERGE has to look up whether the property value already exists in order to decide whether to create the node.

I know there are various things I can do to speed up inserts, such as using the CSV loader, but I am interested in improving the asymptotic performance here. I thought unique constraints should have a time cost of O(log n), not O(n), and that potentially makes a huge difference.

EDIT: Further investigation showed that the problem is not with the indexes, but with the R-tree in the spatial layer. The code doing the inserts uses the embedded Java API rather than Cypher, and this call:

graphDB.index().forNodes(s).add(node, "dummy", "variable");

      

slows down as O(n) as the tree grows. This appears to be the expected behavior for R-trees. Each insert takes about 0.0005 * (number of nodes in the layer). With the spatial insert removed, loading is an order of magnitude faster and shows no such scaling. My guess is that the decrease that does show up is just the cache warming up after startup.
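To put the scaling in perspective, here is a rough back-of-the-envelope estimate. It assumes the measured per-insert cost of c * k, with c ≈ 0.0005 in whatever time unit the figure above was measured in, holds for every layer size k; c' is a hypothetical constant for the logarithmic case. This is an extrapolation, not a measurement.

```latex
T_{\text{linear}}(N) = \sum_{k=0}^{N-1} c\,k = \frac{c\,N(N-1)}{2} = O(N^2),
\qquad
T_{\log}(N) = \sum_{k=1}^{N} c'\log k = c'\log N! = O(N \log N).
```

So loading N nodes incrementally costs on the order of N^2 in total with the observed behavior, against N log N if each insert were logarithmic.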


Specifically, I use the following code to set up the spatial index:

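// Point-layer configuration shipped with the spatial plugin
Map<String, String> config = SpatialIndexProvider.SIMPLE_POINT_CONFIG;
Transaction tx = graphDB.beginTx();
IndexManager indexMan = graphDB.index();
try {
    // Creates (or fetches) the spatial index named after the label
    indexMan.forNodes(lab.name(), config);
    tx.success();
} finally {
    tx.close();
}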

      

This is how you get a Cypher entry point, but are there any qualitative differences between the index and the layer? Will the layer perform better than the index, or are both backed by the same R-tree?

Going by this question: Unlimited performance degradation of Neo4J after records added to the spatial layer, it looks like I should put all the nodes in the database before I set up the spatial layer, as it will then index them much faster than with incremental insertion.

I'll try it tomorrow.



1 answer


Which version of Neo4j are you using?

And please share your queries.

If you use LOAD CSV, you will get better performance when creating the nodes separately, first with MERGE, and then in a second pass, create relationships with MATCH ... MATCH ... CREATE ...

see also: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
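As a rough sketch of that two-pass layout, the statements could be kept as strings and run one after another from the shell or the embedded API. The labels come from the question; the CSV location, the column names (region, postcode, address), the property names (name, code, line) and the relationship types are all hypothetical. The postcode-to-region link uses MERGE rather than CREATE here, which is my own choice because the same pair shows up on many rows.

```java
import java.util.Arrays;
import java.util.List;

public class TwoPassImport {
    // Hypothetical CSV location; adjust the URL form to your Neo4j version.
    public static final String LOAD =
        "USING PERIODIC COMMIT 1000\n" +
        "LOAD CSV WITH HEADERS FROM 'file:///addresses.csv' AS row\n";

    // Pass 1: nodes only. MERGE can use the unique constraints for the lookup.
    public static final String MERGE_REGIONS =
        LOAD + "MERGE (:REGION {name: row.region})";
    public static final String MERGE_POSTCODES =
        LOAD + "MERGE (:POSTCODE {code: row.postcode})";

    // Pass 2: look up the existing nodes, then attach relationships.
    public static final String LINK_POSTCODES = LOAD +
        "MATCH (r:REGION {name: row.region})\n" +
        "MATCH (p:POSTCODE {code: row.postcode})\n" +
        "MERGE (p)-[:IN_REGION]->(r)";
    public static final String CREATE_ADDRESSES = LOAD +
        "MATCH (p:POSTCODE {code: row.postcode})\n" +
        "CREATE (:ADDRESS {line: row.address})-[:HAS_POSTCODE]->(p)";

    /** The statements in the order they are meant to be run. */
    public static List<String> passes() {
        return Arrays.asList(MERGE_REGIONS, MERGE_POSTCODES,
                             LINK_POSTCODES, CREATE_ADDRESSES);
    }

    public static void main(String[] args) {
        // Print each statement so it can be pasted into the shell.
        for (String query : passes()) {
            System.out.println(query + ";\n");
        }
    }
}
```

Each statement reads the file once and touches only one kind of write, which is the point of splitting the import up.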



If you are not using LOAD CSV, are you running separate small transactions? If so, it makes sense to batch them to about 1000 operations per transaction.
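A minimal sketch of that batching, written against a stand-in interface so it runs without Neo4j. TxRunner and insertInBatches are hypothetical names; with the embedded API the runner would wrap the beginTx() / tx.success() / tx.close() pattern from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedInsert {
    /** Hypothetical stand-in for "open a transaction, run the work, commit". */
    public interface TxRunner {
        void inTransaction(Runnable work);
    }

    /**
     * Applies op to every item, using one transaction per batchSize items.
     * Returns the number of transactions that were used.
     */
    public static <T> int insertInBatches(List<T> items, int batchSize,
                                          TxRunner tx, Consumer<T> op) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int transactions = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            List<T> batch = items.subList(start, end);
            tx.inTransaction(() -> batch.forEach(op));
            transactions++;
        }
        return transactions;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        List<Integer> written = new ArrayList<>();
        // Runnable::run just executes the work; a real runner would commit here.
        int txCount = insertInBatches(rows, 1000, Runnable::run, written::add);
        System.out.println(txCount + " transactions, " + written.size() + " rows");
        // prints "3 transactions, 2500 rows"
    }
}
```

The batch size of 1000 is the figure suggested above; it is a parameter so it can be tuned against memory use.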

Can you also check that your constraints are in place, with ":schema" in the browser or "schema" in the shell?

And check that the index / constraint is actually being used by profiling your query in the shell? Just prefix it with PROFILE.
