Select the longest path with the RECURSIVE request
I am new WITH RECURSIVE
to PostgreSQL. I have a fairly standard recursive query following an adjacency list. If I have, for example:
1 -> 2
2 -> 3
3 -> 4
3 -> 5
5 -> 6
it produces:
1 1,2 1,2,3 1,2,3,4 1,2,3,5 1,2,3,5,6
I would only like to have:
1,2,3,4 1,2,3,5,6
But I can't see how to do this in Postgres. It would seem to be "pick the longest paths" or "pick paths that are not contained in another path." I can probably see how to do this with a join, but that seems pretty inefficient.
Example request:
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1, ARRAY[g.id], false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
source to share
You already have a solution at hand cycle
, just add the predicate at the end.
But set up the interrupt condition one level, currently you are adding one node too many:
WITH RECURSIVE search AS (
SELECT id, link, data, ARRAY[g.id] AS path, (link = id) AS cycle
FROM graph g
WHERE NOT EXISTS (
SELECT 1
FROM graph
WHERE link = g.id
)
UNION ALL
SELECT g.id, g.link, g.data, s.path || g.id, g.link = ANY(s.path)
FROM search s
JOIN graph g ON g.id = s.link
WHERE NOT s.cycle
)
SELECT *
FROM search
WHERE cycle;
-- WHERE cycle IS NOT FALSE; -- alternative if link can be NULL
-
Also includes a trigger condition like @wildplasser mentioned .
-
The initialization condition is
cycle
-(link = id)
to catch shortcut loops. Not necessary if you have a constraintCHECK
to disallow this on your table. -
The exact implementation depends on the missing details.
-
It is assumed that all graphs end with a cycle or
link IS NULL
, and there is an FK limit fromlink
toid
in the same table. The exact implementation depends on the missing details. Iflink
not actually a link (no referential integrity) then you need to adapt ...
source to share
Just add an additional clause to the final request, for example:
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1, ARRAY[g.id], false
FROM graph g
-- BTW: you should add a START-CONDITION here, like:
-- WHERE g.id = 1
-- or even (to find ALL linked lists):
-- WHERE NOT EXISTS ( SELECT 13
-- FROM graph nx
-- WHERE nx.link = g.id
-- )
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph sg
WHERE NOT EXISTS ( -- <<-- extra condition
SELECT 42 FROM graph nx
WHERE nx.id = sg.link
);
Note that:
-
not exists(...)
-clause tries to join the same record as the second segment of the recursive join. - So: they are mutually exclusive.
- if it exists, it must be added to the "list" by a recursive query.
source to share
I'm not sure if this should be viewed as an ugly solution.
WITH recursive graph (child, parent) AS (
SELECT 2, 1
UNION
SELECT 3, 2
UNION
SELECT 4, 2
UNION
SELECT 6, 5
UNION
SELECT 7, 6
UNION
SELECT 6, 7
),
paths (start, node, depth, path, has_cycle, terminated) AS (
SELECT
ARRAY[g1.parent],
false,
false
FROM graph g1
WHERE true
AND NOT EXISTS (SELECT 1 FROM graph g2 WHERE g1.parent = g2.child)
UNION ALL
SELECT
p.path || g.child,
g.child = ANY(p.path),
g.parent is null AS terminated
FROM paths p
LEFT OUTER JOIN graph g ON g.parent = p.node
WHERE NOT has_cycle
)
SELECT * from path WHERE terminated
;
So the trick is to use the terminated
column with LEFT OUTER JOIN
and then only select the completed paths.
source to share