Cypher - Neo4j Query Profiling
I have some questions regarding Neo4j query profiling. Consider the simple Cypher query below:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and the output is:
So, according to the Neo4j documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN fof
So, in my query above, I expected it to find both the start node and the end node before looking for any relationship. Instead, it finds only the start node and then expands all of its :HAS_CONTACT relationships, so the "Expand Into" operator is not used. Why does it work like this? There is only one :HAS_CONTACT relationship between the two nodes, and there is no unique index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it need 12 db hits even though all the nodes/relationships have already been retrieved? Why does this operation need 12 db calls for only 6 rows?
Edited
This is the complete graph I'm querying:
I also tested different versions of the same query, but the same query profile is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query being executed and the example given in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and then check whether there is a relationship between them, you can use shortestPath with a length of 1 to minimize db hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
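Note that in a variable-length pattern such as [r:HAS_CONTACT*1], r is bound to a list of relationships rather than to a single relationship. A minimal sketch, assuming a single relationship is wanted in the result: it reads the relationship from the path with the relationships() function. The path variable name p is only illustrative.

```cypher
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
      (m:Consumer {mobileNumber: "xxxxxxxxxxx"})
// The path length is fixed at 1, so the path holds exactly one relationship
MATCH p = shortestPath((n)-[:HAS_CONTACT*1]->(m))
RETURN n, m, relationships(p)[0] AS r;
```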
Why does this happen?
It looks like this behavior comes from the way the query planner searches the database in response to your Cypher query. Cypher provides an interface for finding things and performing operations in the graph (alternatives include the Java API, etc.). Queries are handled by the query planner and then turned into graph operations by neo4j's internals. It makes sense that the query planner will look for what is likely to be the most efficient way to search the graph (which is why we love neo), so just because a Cypher query is written one way, it will not necessarily search the graph the way we imagine it in our head.
The documentation on this is a little sparse (or rather, I couldn't find it), so any links or further explanation would be much appreciated.
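To check which plan the planner picks without actually running the query, the query can be prefixed with EXPLAIN instead of PROFILE. A minimal sketch using the query from the question; the plan shows the operators and estimated rows, but no db hits, because nothing is executed.

```cypher
EXPLAIN
// Only the execution plan is returned; the query itself is not run
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
      (m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n, m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n, m, r;
```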
Looking at your query, I think you are trying to say this:
"Find two nodes, n and m, each labeled :Consumer, with mobile numbers x and y respectively, using the index on mobileNumber. If you find them, try to find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, otherwise return nothing."
Running the query this way requires creating a Cartesian product (i.e. a small table of all combinations of n and m, in this case only one row, but potentially many more for other queries) and then searching for relationships between the nodes in each of those rows.
Instead, since the MATCH clause must be satisfied for the query to continue, neo knows that the two nodes n and m must be connected by a -[:HAS_CONTACT]-> relationship if the query is to return anything. So the most efficient way to run the query (and avoid the Cartesian product) is the one below, which is what your query simplifies to.
"Find a node n with the label :Consumer and value x for the index mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationship to a node m with the label :Consumer and value y for its property mobileNumber. Return both nodes and the relationship, otherwise return nothing."
So instead of performing two index lookups, a Cartesian product and a set of expand operations, neo performs only one index lookup, an expand all, and a filter.
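The two readings can be compared directly by profiling them side by side. This is only a sketch, assuming the same data as in the question. The first query matches just the two disconnected nodes, which is where I would expect a CartesianProduct operator to show up in the plan. The second writes the pattern the way the planner effectively treats the original query, as a single connected pattern.

```cypher
// 1) Two disconnected patterns: each node is looked up on its own
//    and the results are combined (expect a CartesianProduct in the plan)
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
      (m:Consumer {mobileNumber: "xxxxxxxxxxx"})
RETURN n, m;

// 2) One connected pattern: find one node, expand its :HAS_CONTACT
//    relationships, then filter the other end on label and mobileNumber
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"})-[r:HAS_CONTACT]->(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
RETURN n, m, r;
```

Run the two statements one at a time. Which end the planner starts from in the second query depends on its statistics, so the exact plan may differ from the one in the question.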
You can see a hint of this simplification by the query planner in the AUTOSTRING parameters that appear in your query profile.
How to change the query to implement the search as desired
If you want to change the query so that it must use Expand Into, either make the relationship requirement optional or use explicit iteration. Both of the queries below produce the originally expected query profile.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;