Neo4j doesn't use indexes

I am currently trying this request on Neo4j 2.2.2

At the time of this post, we have not flagged any nodes yet as we recently updated it from Neo4j 1.x. therefore we cannot use the USING clause.

I am trying to use indexes but ending up with a full table scan.

START pfComp=node:Company('id:2403226') , ptComp=node:Company('id:1946633')
OPTIONAL MATCH
     (pfComp)<-[c:CHILD_OF*]-(cfComp)
WITH collect(id(cfComp)) as cfCompIds, ptComp, pfComp
OPTIONAL MATCH
                (ptComp)<-[c2:CHILD_OF*]-(ctComp)
WITH  cfCompIds, collect(id(ctComp)) AS ctCompIds
MATCH
                (fComp) -[fR:PARTICIPATES_IN]->  cdeals <-[tR:PARTICIPATES_IN]-(tComp)
WHERE
                (fComp.id = 2403226 or id(fComp) in  cfCompIds) and
                (tComp.id = 1946633 or id(tComp) in ctCompIds)
RETURN fComp, tComp, cdeals

      

Cypher version: CYPHER 2.2, scheduler: COST. 1305292 Total dB hits in 79128 ms.

Any help on this would be much appreciated.

Below is the complete output of the profile command.

This is the explain plan for the query

The initial part of the query is fast:

profile START pfComp=node:Company('id:2403226') , ptComp=node:Company('id:1946633')
OPTIONAL MATCH
     (pfComp)<-[c:CHILD_OF*]-(cfComp)
WITH collect(id(cfComp)) as cfCompIds, ptComp, pfComp
OPTIONAL MATCH
                (ptComp)<-[c2:CHILD_OF*]-(ctComp)
return   cfCompIds, collect(id(ctComp)) AS ctCompIds

      

Cypher version: CYPHER 2.2, scheduler: COST. 836 cumulative db deletions in 582 ms.

enter image description here

+3


source to share


3 answers


The second part looks like a relational join or additional lookup (like n + 1 selection). Perhaps use a graphical model instead? And the query is also simplified.

So, you have to calculate fComp

and tComp

with initial matches, because of *0..

it includes pfComp and ptComp each.

Then you have an intersection between fComp and tComp for the last match.



Please give it a try and see how it works:

MATCH (pfComp:lCompany)<-[c:CHILD_OF*0..]-(fComp:lCompany)
WHERE pfComp.id = 2403226
// reduce cardinality for following match
WITH collect(distinct fComp) as companies1
MATCH (ptComp:lCompany)<-[c2:CHILD_OF*]-(tComp:lCompany)
WHERE ptComp.id = 1946633
// create cross product between fComp and tComp
UNWIND companies1 as fComp
MATCH (fComp) -[fR:PARTICIPATES_IN]->(cdeals)<-[tR:PARTICIPATES_IN]-(tComp)
RETURN  fComp, tComp, cdeals;

      

+2


source


Your profile has index queries (NodeByQueryIndex).

You can determine which index you want to use in your query:



MATCH (n:Swedish)
USING INDEX n:Swedish(surname)
WHERE n.surname = 'Taylor'
RETURN n

      

See http://neo4j.com/docs/stable/query-using.html

+1


source


Our solution was to create labels (lCompany) and add a new index type to the Company.id column (CREATE INDEX ON: lCompany (id)).

Then the query was modified to use the new index:

OPTIONAL MATCH
     (pfComp:lCompany)<-[c:CHILD_OF*]-(cfComp:lCompany)
WHERE pfComp.id = 2403226
WITH 
    collect(cfComp.id) as cfCompIds
    , pfComp
OPTIONAL MATCH
        (ptComp:lCompany)<-[c2:CHILD_OF*]-(ctComp:lCompany)
WHERE ptComp.id = 1946633
WITH  cfCompIds, 
      collect(ctComp.id)  AS ctCompIds, 
      pfComp, ptComp 
MATCH
      (fComp:lCompany) -[fR:PARTICIPATES_IN]->  cdeals <-[tR:PARTICIPATES_IN]-(tComp:lCompany)
USING INDEX fComp:lCompany(id) //tComp:lCompany(id)
WHERE
                (   
                    fComp.id in (cfCompIds + [2403226])
                ) 
                and
                (
                    tComp.id in (ctCompIds + [1946633])
                )
RETURN  fComp, tComp, cdeals

      

There may be further optimizations that can be performed, but that is as far as we have come so far.

Now the profiling results are:

Cypher version: CYPHER 2.2, scheduler: COST. 134151 total db in 1498ms.

This is the new profile after setup:

enter image description here

+1


source







All Articles