Create an identifier based on column values
I have the following dataset that represents the mapping between project and activity:
ProjectID ActivityID
A A
B B
B C
D D
E D
M N
I would like to calculate the ID based on the following rules:
a project that maps 1-1 with an activity:
A - A
M - N
one project that maps to multiple activities:
B - B
B - C
one activity that maps to multiple projects:
D - D
E - D
This will create:
ProjectID Activity CalculatedID
A A 1
B B 2
B C 2
D D 3
E D 3
M N 4
I hope there is enough information there, any ideas are appreciated. I'm particularly interested in the set-based approach.
- UPDATE: Note on answers - I would describe the approach taken by @Erwin as classifying the views, as opposed to the solution provided by @mellamokb (which builds on @CodeByMoonlight's solution) which assigns the id to seqential. Both of your solutions helped me along the way, thanks guys!
source to share
It's a little confusing, but it works:
SELECT ProjectID, ActivityID,
DENSE_RANK() OVER(ORDER BY ProjectID) +
DENSE_RANK() OVER(ORDER BY ActivityID) -
ROW_NUMBER() OVER(ORDER BY ProjectID, ActivityID) AS CalculatedID
FROM MyTable
The two uses of DENSE_RANK affect the creation of an offset against ROW_NUMBER when repeating a ProjectID or ActivityID.
source to share
Here's a built solution with an answer @CodeByMoonlight
that handles the case where the activityID and the project ID can alternate, that is, the activity ID is higher ID than some other records, but the project ID is lower than some other records:
SELECT
D.ProjectID,
D.ActivityID,
-- generate id based on the three different scenarios
-- 1) projects with 1-many activities, use project id
-- 2) activities with 1-many projects, use activity id
-- 3) 1-1, use project id
DENSE_RANK() over (order by
case
when P.ProjectID is not null then P.ProjectID
when A.ActivityID is not null then A.ActivityID
else D.ProjectID
end
) as Identifier
from
MyTable D
left join
(
-- projects with 1-many activities
SELECT ProjectID
FROM MyTable
group by ProjectID
having Count(ActivityID) > 1
) P on P.ProjectID = D.ProjectID
left join
(
-- activities with 1-many projects
SELECT ActivityID
FROM MyTable
group by ActivityID
having Count(ProjectID) > 1
) A on A.ActivityID = D.ActivityID
Input example:
B C
A A
B B
B G
D D
B F
E D
M N
Output example:
A A 1
B B 2
B G 2
B F 2
B C 2
E D 3
D D 3
M N 4
source to share
;WITH p AS (
SELECT ProjectID FROM tbl GROUP BY ProjectID HAVING count(*) > 1
),a AS (
SELECT ActivityID FROM tbl GROUP BY ActivityID HAVING count(*) > 1
)
SELECT t.*
,CASE
WHEN p.ProjectID IS NOT NULL
AND a.ActivityID IS NOT NULL THEN 4 -- n:m (missing in question!)
WHEN p.ProjectID IS NOT NULL THEN 2 -- 1:n
WHEN a.ActivityID IS NOT NULL THEN 3 -- n:1
ELSE 1 -- 1:1
END AS CalculatedID
FROM tbl AS t
LEFT JOIN p ON p.ProjectID = t.ProjectID
LEFT JOIN a ON a.ActivityID = t.ActivityID
I explain:
- 1) In the CTE,
p
find all projects that have more than one activity. - 2) In the CTE,
a
find all activities that have more than one project. - 3) LEFT JOIN to pins in base table and distinguish 4 cases in CASE statement.
I added case 4 (n: m) missing in the question.
Cm. Working demonstration on data.SE .
source to share