Neo4j - find nodes where related nodes are a subset
I am very new to Neo4j and graphs.
Suppose I have a very simple graph where each node A requires one or more node Bs.
Is there an efficient way to find those node As where the Bs they are associated with are a subset of a given list of Bs?
For example, given the dataset:
typeA,rel,typeB
A1,REQUIRES,B1
A1,REQUIRES,B2
A1,REQUIRES,B3
A2,REQUIRES,B1
A2,REQUIRES,B4
A3,REQUIRES,B4
A4,REQUIRES,B5
I want to ask which of the As are fully covered by a given list of Bs.
Examples:
given B1,B2,B3 -> A1
given B1,B3,B4 -> A2, A3
given B1,B3,B4,B5 -> A2, A3, A4
If the specified list Bs does not contain all the Bs with which A is associated, then it should be excluded.
If there is an answer, will it scale to large numbers?
Thanks.
In this answer I am assuming that:
- Nodes are labeled :A and :B and have the property id. For example, "A1" will be (:A {id: 1}).
- You are passing the collection of :B IDs of interest in the parameter {ids}.
The following query should do what you want:
MATCH (a:A)-[:REQUIRES]->(b:B)
WHERE b.id IN {ids}
WITH DISTINCT a
MATCH (a)-[:REQUIRES]->(bb:B)
WITH a, COLLECT(bb) AS bbs
WHERE ALL(x IN bbs WHERE x.id IN {ids})
RETURN a.id
Here is a console that shows the results if the collection of :B IDs of interest is [1, 3, 4, 5], which matches your last example. (Since the console does not support parameter passing, I hardcoded the ID collection in the query.)
Explanation of the query, in order:
- (First 2 lines) Find all :A nodes that require a :B node whose id is in the {ids} collection.
- Remove duplicate :A nodes, so that we have the distinct :A nodes (each of which requires one or more :B nodes of interest).
- Find all the :B nodes required by each of those :A nodes. (Some of these :B nodes may not be of interest.)
- Associate with each of those :A nodes a collection of all its required :B nodes.
- Filter out all :A nodes that require :B nodes that are not of interest.
- Return the IDs of the :A nodes that only require :B nodes of interest.
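To make the logic of these steps easier to check by hand, here is a minimal Python sketch that mirrors them on the sample data from the question. It is only an in-memory model, not Neo4j code: the `requires` dictionary and the `fully_covered` function are hypothetical names introduced for illustration, and the numeric IDs follow the assumption above that "A1" is stored with id 1.

```python
# Toy model of the sample data: each :A id maps to the set of :B ids it REQUIRES.
# (Hypothetical structure for illustration; the real data lives in the graph.)
requires = {
    1: {1, 2, 3},
    2: {1, 4},
    3: {4},
    4: {5},
}


def fully_covered(requires, ids):
    """Return the sorted :A ids whose required :B ids all appear in ids."""
    wanted = set(ids)
    # First two steps: distinct :A nodes that require at least one :B of interest.
    candidates = {a for a, bs in requires.items() if bs & wanted}
    # Remaining steps: keep a candidate only if every :B it requires is of interest.
    return sorted(a for a in candidates if requires[a] <= wanted)


print(fully_covered(requires, [1, 2, 3]))     # [1]
print(fully_covered(requires, [1, 3, 4]))     # [2, 3]
print(fully_covered(requires, [1, 3, 4, 5]))  # [2, 3, 4]
```

The three calls correspond to the three examples in the question. The first filtering step is what keeps the work small: only :A nodes that touch at least one :B of interest are ever checked for full coverage.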
Assuming you create an index on :B(id), this query should scale well.