Neo4j doesn't use indexes
I am currently trying this request on Neo4j 2.2.2
At the time of this post, we have not flagged any nodes yet as we recently updated it from Neo4j 1.x. therefore we cannot use the USING clause.
I am trying to use indexes but ending up with a full table scan.
START pfComp=node:Company('id:2403226') , ptComp=node:Company('id:1946633')
OPTIONAL MATCH
(pfComp)<-[c:CHILD_OF*]-(cfComp)
WITH collect(id(cfComp)) as cfCompIds, ptComp, pfComp
OPTIONAL MATCH
(ptComp)<-[c2:CHILD_OF*]-(ctComp)
WITH cfCompIds, collect(id(ctComp)) AS ctCompIds
MATCH
(fComp) -[fR:PARTICIPATES_IN]-> cdeals <-[tR:PARTICIPATES_IN]-(tComp)
WHERE
(fComp.id = 2403226 or id(fComp) in cfCompIds) and
(tComp.id = 1946633 or id(tComp) in ctCompIds)
RETURN fComp, tComp, cdeals
Cypher version: CYPHER 2.2, scheduler: COST. 1305292 Total dB hits in 79128 ms.
Any help on this would be much appreciated.
Below is the complete output of the profile command.
The initial part of the query is fast:
profile START pfComp=node:Company('id:2403226') , ptComp=node:Company('id:1946633')
OPTIONAL MATCH
(pfComp)<-[c:CHILD_OF*]-(cfComp)
WITH collect(id(cfComp)) as cfCompIds, ptComp, pfComp
OPTIONAL MATCH
(ptComp)<-[c2:CHILD_OF*]-(ctComp)
return cfCompIds, collect(id(ctComp)) AS ctCompIds
Cypher version: CYPHER 2.2, scheduler: COST. 836 cumulative db deletions in 582 ms.
source to share
The second part looks like a relational join or additional lookup (like n + 1 selection). Perhaps use a graphical model instead? And the query is also simplified.
So, you have to calculate fComp
and tComp
with initial matches, because of *0..
it includes pfComp and ptComp each.
Then you have an intersection between fComp and tComp for the last match.
Please give it a try and see how it works:
MATCH (pfComp:lCompany)<-[c:CHILD_OF*0..]-(fComp:lCompany)
WHERE pfComp.id = 2403226
// reduce cardinality for following match
WITH collect(distinct fComp) as companies1
MATCH (ptComp:lCompany)<-[c2:CHILD_OF*]-(tComp:lCompany)
WHERE ptComp.id = 1946633
// create cross product between fComp and tComp
UNWIND companies1 as fComp
MATCH (fComp) -[fR:PARTICIPATES_IN]->(cdeals)<-[tR:PARTICIPATES_IN]-(tComp)
RETURN fComp, tComp, cdeals;
source to share
Our solution was to create labels (lCompany) and add a new index type to the Company.id column (CREATE INDEX ON: lCompany (id)).
Then the query was modified to use the new index:
OPTIONAL MATCH
(pfComp:lCompany)<-[c:CHILD_OF*]-(cfComp:lCompany)
WHERE pfComp.id = 2403226
WITH
collect(cfComp.id) as cfCompIds
, pfComp
OPTIONAL MATCH
(ptComp:lCompany)<-[c2:CHILD_OF*]-(ctComp:lCompany)
WHERE ptComp.id = 1946633
WITH cfCompIds,
collect(ctComp.id) AS ctCompIds,
pfComp, ptComp
MATCH
(fComp:lCompany) -[fR:PARTICIPATES_IN]-> cdeals <-[tR:PARTICIPATES_IN]-(tComp:lCompany)
USING INDEX fComp:lCompany(id) //tComp:lCompany(id)
WHERE
(
fComp.id in (cfCompIds + [2403226])
)
and
(
tComp.id in (ctCompIds + [1946633])
)
RETURN fComp, tComp, cdeals
There may be further optimizations that can be performed, but that is as far as we have come so far.
Now the profiling results are:
Cypher version: CYPHER 2.2, scheduler: COST. 134151 total db in 1498ms.
This is the new profile after setup:
source to share