Finding the least common ancestor from the transitive closure table
I have a table representing the transitive closure of an organizational hierarchy (i.e. its tree with a single root):
create table ancestry (
ancestor integer,
descendant integer,
distance integer
);
I have another table that contains the organizations that each user has access to:
create table accessible (
user integer,
organization integer
);
The system shows the user a roll-up of the costs associated with each organization the user can access. I could always start by showing the user a view of the company (i.e. Root) showing the user a list of direct subsidiaries and how many of their organizations contribute to the total. In most cases, there will be one child and the user will need to expand several levels before seeing multiple children. I would prefer to start my presentation with the first organization that shows multiple children (i.e. LCA).
For a given user, I can easily find a set of root paths, but I am having a hard time finding the least common ancestor. I am using postgresql 9.1 but prefer a solution that is database agnostic. In the worst case, I can pull the paths back into the application code and calculate the LCA there.
source to share
I took another look at this and came up with the following solution. I used a common-table expression to make it easier to understand how it works, but it can be easily written using a subquery.
with
hit (id, count) as (
select
ancestry.ancestor
,count(ancestry.descendant)
from
accessible
inner join ancestry
on accessible.organization = ancestry.descendant
where
accessible.user = @user_id
group by
ancestry.ancestor
)
select
ancestry.descendant as lca
from
hit
inner join ancestry
on ancestry.descendant = hit.id
and ancestry.ancestor = @company_id
order by
hit.count desc
,ancestry.distance desc
limit 1
;
The CTE column lists, for each organization in the hierarchy, the number of paths from the child to the root that crosses the organization. LCA is the organization with the most traversal. In the case of a tie, the organization farthest from the root (i.e. Max (distance)) is the actual LCA. This is best illustrated by an example.
A
|
B
/ \
C D
Assuming we want to find the LCA of nodes C and D from the tree above. Hit CTE makes the following calculations:
Node Count
A 2
B 2
C 1
D 1
The main query adds distance:
Node Count Distance
A 2 0
B 2 1
C 1 2
D 1 2
The main query then orders the results in descending number and distance
Node Count Distance
B 2 1
A 2 0
C 1 2
D 1 2
LCA is the first item in the list.
source to share
Just a guess, not dn agnostic (SQL Server) but adaptable
SELECT TOP 1
a1.ancestor
FROM ancestor a1
INNER JOIN
ancestor a2 ON a1.ancestor=a2.ancestor
WHERE a1.descendent = @Dec1
AND
a2.descendent = @Dec2
ORDER BY a1.distance DESC
If you want to put some data in a SQLFiddle I can play with it.
source to share