How do Neo4j caches speed up queries?

I am currently working on a project that uses Neo4j as its database, with queries that traverse complex relationships. After some performance testing we ran into problems.

We found that the cache has a huge effect on query time (from about 3000ms down to about 100ms). When the same query is executed twice, the first run is very slow and the second is much faster. After some searching, we found a warm-up method that preloads all the nodes and relationships in the database by running a query like this:

match (n)-[r]->() return count(1);

      

After enabling the cache and running this warm-up, our query times dropped significantly, but they were still not as fast as when the same query is run two, three or four times.

So we kept testing and looking for information until we saw that Neo4j also caches queries in some way, so that it doesn't have to compile them every time (using the Scala compiler, if I'm right). I say "in some way" because, after extensive testing, I concluded that Neo4j compiles the query on the fly.

Let me show you a simplified example of what I mean:

structure_example

(numbers are id attributes)

If I run a query like this:

match (n:green {id: 1})-[r]->(:red)-[s]->(:green)<-[t]-(m:yellow {id: 7}) 
return count(m);

      

What I want to do is find the relationship between node 1 and node 7. As you can see, I need to match a bunch of nodes and, more importantly, relationships, and the compilation process seems to be fairly complicated, since the query took 1227ms to complete. If I run the exact same query again, I get a response time of around 5ms, which is good enough for the performance tests. So Neo4j, or the Scala compiler, definitely caches Cypher queries as well.

Having realized that there is a compilation step for Cypher queries, I dug deeper and started modifying only parts of an already cached query. Changing the label or id of the last node in the pattern also caused a delay, but only ~19ms, which is still acceptable:

match (n:green {id: 1})-[r]->(:red)-[s]->(:green)<-[t]-(m:purple {id: 7}) 
return count(m);

      

However, when I restart the server, run the warm-up, and set up the query so that the first node (marked as n) does not match, the query responds very quickly with 0 results. From this I deduce that the whole query has not been parsed: since the first node does not match, there is no need to go deeper into the tree.

I've also tried an optional match, since it returns null when no match is found, but that doesn't work either.

First of all, I wanted to ask whether everything I concluded from my tests is correct and, if it is not, how this actually works. Secondly, what should I do (if there is a way) to cache everything at the beginning, when the server starts? Unfortunately, the project requirements say that queries must perform well, even the first one (not to mention that the real scenario has more than a thousand relationships and nodes, which makes everything slower). Or is there no way to avoid this delay?



1 answer


First of all, you need to think about JVM warm-up: bear in mind that classes are loaded lazily when they are first needed (your first query), and the JIT compiler only kicks in after a few (thousand) calls.

This query

match (n)-[r]->() return count(1);

      

should warm up the node and relationship caches properly; however, I'm not sure whether it also loads all of their properties and indexes. Also make sure your dataset fits in memory.

Providing values directly in the Cypher query, for example {id: 1}, instead of using parameters, for example {id: {paramId}}, means that whenever the id value changes the query has to be compiled again.
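To make the point about literals versus parameters concrete, here is a minimal toy model in Python. It is only a sketch and not Neo4j's actual implementation: it assumes a plan cache keyed by the exact query text, and `PlanCache`, `plan_for` and `run` are hypothetical names invented for the illustration.

```python
# Toy model of a query plan cache keyed by the exact query text.
# NOT Neo4j's implementation; it only illustrates why inlined literals
# cause recompilation while parameters allow a cached plan to be reused.


class PlanCache:
    def __init__(self):
        self._plans = {}
        self.compilations = 0

    def plan_for(self, query_text):
        """Return the cached plan for this exact text, compiling on a miss."""
        if query_text not in self._plans:
            self.compilations += 1  # stands in for the expensive compile step
            self._plans[query_text] = ("plan", query_text)
        return self._plans[query_text]


def run(cache, query_text, params=None):
    # Parameters are supplied at execution time and are not part of the cache key.
    plan = cache.plan_for(query_text)
    return plan, dict(params or {})


# Literal values: every distinct id yields a distinct query text.
literal_cache = PlanCache()
for node_id in (1, 2, 3):
    run(literal_cache, "match (n:green {id: %d}) return count(n)" % node_id)

# Parameters: the query text stays identical, only the parameter map changes.
param_cache = PlanCache()
for node_id in (1, 2, 3):
    run(param_cache, "match (n:green {id: {paramId}}) return count(n)",
        {"paramId": node_id})

print(literal_cache.compilations)  # 3
print(param_cache.compilations)    # 1
```

In this model the three literal queries each miss the cache, while the parameterised query is compiled once and then reused.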



You can pass parameters this way in the shell:

neo4j-sh (?)$ export paramId=5
neo4j-sh (?)$ return {paramId};
==> +-----------+
==> | {paramId} |
==> +-----------+
==> | 5         |
==> +-----------+
==> 1 row
==> 4 ms

      

So if you need your queries to be fast from the start:

  • change the queries to use parameters
  • run your other queries at startup, together with the warm-up query
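The two steps above could be wired together roughly as follows. This is a sketch under assumptions, not a definitive implementation: `run_query` is a hypothetical callable standing in for whatever client you use to send Cypher to the server, and the parameter names `startId` and `endId` are made up for the example. The `{param}` placeholder syntax matches the one used in this answer; newer Neo4j versions use `$param` instead.

```python
import time

# Warm-up query from the answer: touches every node and relationship.
WARMUP_QUERY = "match (n)-[r]->() return count(1);"

# Parameterised form of the question's query (parameter names are hypothetical).
PATH_QUERY = (
    "match (n:green {id: {startId}})-[r]->(:red)-[s]->(:green)"
    "<-[t]-(m:yellow {id: {endId}}) return count(m);"
)


def warm_up(run_query, startup_queries):
    """Run the cache warm-up query, then each application query once.

    run_query(text, params) is a hypothetical callable wrapping your client.
    startup_queries is an iterable of (query_text, params) pairs.
    Returns a list of (query_text, elapsed_seconds) pairs.
    """
    timings = []
    for text, params in [(WARMUP_QUERY, {})] + list(startup_queries):
        started = time.perf_counter()
        run_query(text, params)
        timings.append((text, time.perf_counter() - started))
    return timings


if __name__ == "__main__":
    # Stand-in runner so the sketch executes without a database.
    def fake_run_query(text, params):
        return None

    startup = [(PATH_QUERY, {"startId": 1, "endId": 7})]
    for text, seconds in warm_up(fake_run_query, startup):
        print("%.6f s  %s" % (seconds, text))
```

Running each parameterised query once at startup means its compilation cost is paid before the first real request arrives; the parameter values used during warm-up do not need to match the ones used later.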

EDIT: Added info on how to pass parameters in the shell
