Best way to remove loops from a path in a Neo4j graph

I am using the Neo4j graph database, version 2.1.7. Data in brief: 2 million nodes of 6 different types and 5 million relationships of only 5 different types. The graph is mostly connected but contains several separate subgraphs.

When resolving paths, I get loops in the path. To prevent this, I used the solution shared in this question: Returning only simple paths in a Neo4j Cypher query

Here is the query I am using:

MATCH (n:nodeA{key:905728}) 
MATCH path = n-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA) 
WHERE ALL(a in nodes(path) where 1=length (filter (m in nodes(path) where m=a))) 
and (length(EXTRACT (p in NODES(path)| p.key)) > 1) 
and ((exists ((c)-[:rel5]->(b)) and (not exists((b)-[:rel1|rel2|rel3|rel4]->(:nodeA)) OR ANY (x in nodes(path) where (b)-[]->(x))))
    OR (not exists ((c)-[:rel5]->()) and (not exists ((c)-[:rel1|rel2|rel3|rel4]->(:nodeA)) OR ANY (x in nodes(path) where (c)-[]->(x))))) 
RETURN distinct EXTRACT (rp in Rels(path)| type(rp)), EXTRACT (p in NODES(path)| p.key);

      

The above query meets my requirements but is not cost-effective, and it keeps running when executed on a huge subgraph. I used the PROFILE command to improve the query's performance from where I started, but now I'm stuck. Performance has improved, but it is not what I expected from Neo4j :(

2 answers


I don't know that I have a solution, but I have a number of suggestions. Some may speed things up, and some may simply make the query easier to read.

First, instead of putting exists ((c)-[:rel5]->(b)) in your WHERE, I believe you can put it in your MATCH, like this:

MATCH path = n-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA), (c)-[:rel5]->(b)

      

I don't think you need the exists keyword. I think you can just say, for example, (NOT (b)-[:rel1|rel2|rel3|rel4]->(:nodeA))



I also suggest thinking about a WITH clause to improve performance.
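
As a rough illustration of the WITH idea, here is a minimal sketch, not a drop-in replacement. It assumes the labels, relationship types, and key from the question, an upper bound of 15 hops chosen only for illustration, and it leaves out the rel5 tail and the end-of-path checks. The alias names (ns, relTypes, keys) are made up for this example. Whether it actually helps should be checked with PROFILE.

```cypher
// Sketch only: the rel5 tail and the end-of-path checks from the question are omitted.
MATCH (n:nodeA {key: 905728})
MATCH path = (n)-[:rel1|rel2|rel3|rel4*1..15]->(c:nodeA)
// Compute the node list once and carry it forward instead of calling nodes(path) repeatedly.
WITH path, nodes(path) AS ns
// Keep only simple paths: every node appears exactly once in the path.
WHERE ALL(a IN ns WHERE 1 = length(filter(m IN ns WHERE m = a)))
RETURN DISTINCT extract(r IN rels(path) | type(r)) AS relTypes,
                extract(p IN ns | p.key) AS keys;
```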

A few notes on your variable-length paths: in *0.. the 0 means the path may contain zero relationships, so the start node can match itself. This may or may not be what you want. Also, leaving a variable-length path without an upper bound can often cause performance problems (as I think you can see). If you can put an upper bound on it, that might help.
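
To make the effect of the bounds concrete, here is a small sketch using the labels and key from the question. The upper bound of 3 is an arbitrary value picked to keep the example cheap, and the column aliases are made up. Since n itself carries the nodeA label, the first count should come out one higher than the second, because it also counts the zero-length path.

```cypher
// Lower bound 0: the zero-length path is included, so c can be n itself.
MATCH (n:nodeA {key: 905728})-[:rel1|rel2|rel3|rel4*0..3]->(c:nodeA)
RETURN count(*) AS pathsIncludingStart;

// Lower bound 1 with an explicit upper bound: at least one hop, at most three.
MATCH (n:nodeA {key: 905728})-[:rel1|rel2|rel3|rel4*1..3]->(c:nodeA)
RETURN count(*) AS pathsWithAtLeastOneHop;
```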

Also, if you upgrade to 2.2.1, there are a number of built-in performance improvements in the 2.2.x line, and you also get visual PROFILE output in the console and a new EXPLAIN command. EXPLAIN shows the query plan without running the query, while PROFILE runs it and tells you its real cost.
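
For reference, both commands are used as a prefix to the query. The sketch below uses a simplified version of the question's query with an assumed upper bound of 15. As far as I know, EXPLAIN (2.2 and later) only displays the plan and does not execute anything, whereas PROFILE executes the query and reports the rows and db hits per operator.

```cypher
// Neo4j 2.2+: show the planned execution without running the query.
EXPLAIN
MATCH (n:nodeA {key: 905728})
MATCH path = (n)-[:rel1|rel2|rel3|rel4*1..15]->(c:nodeA)
RETURN count(path);

// Run the query and report the actual rows and db hits for each step.
PROFILE
MATCH (n:nodeA {key: 905728})
MATCH path = (n)-[:rel1|rel2|rel3|rel4*1..15]->(c:nodeA)
RETURN count(path);
```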

One other thing to keep in mind: I don't think you are hitting Neo4j's performance limits, but you may well be running into some limits of Cypher. If so, I would suggest running your queries through the Java APIs that Neo4j provides, for better performance and more control. You could do this either by embedding the database, if you are using a JVM-compatible language, or by writing an unmanaged extension, which lets you write your own queries in Java and expose them through a custom REST API on the server.

I made some more tweaks to my query as suggested by Brian above, and the response time improved. It now takes almost 20% of the execution time of my original query, and the current query makes almost 60% fewer db hits during execution than the query I used earlier. Please find the updated query below:

MATCH (n:nodeA{key:905728}) 
MATCH path = n-[:rel1|rel2|rel3|rel4*1..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA) 
WHERE ALL(a in nodes(path) where 1=length (filter (m in nodes(path) where m=a))) 
and (length(path) > 0) 
and ((exists ((c)-[:rel5]->(b)) and (not ((c)-[:rel1|rel2|rel3|rel4]->()) OR ANY (x in nodes(path) where (c)-[]->(x))))
    OR (not exists ((c)-[:rel5]->()) and (not ((c)-[:rel1|rel2|rel3|rel4]->()) OR ANY (x in nodes(path) where (c)-[]->(x))))) 
RETURN distinct EXTRACT (rp in Rels(path)| type(rp)), EXTRACT (p in NODES(path)| p.key);

      



I also saw a sharp improvement when I limited the path from *1.. to *1..15. In addition, I removed one filter from the query that was taking extra time. However, the response time still increased when the query was run on nodes whose relationships go more than 18-20 levels deep.

I would suggest using the PROFILE command frequently to find the pain points in your query. This will help you solve problems faster. Thanks, Brian.
