Create an identifier based on column values

Question

Create an identifier based on column values

I have the following dataset that represents the mapping between project and activity:

ProjectID  ActivityID
A         A
B         B
B         C
D         D
E         D
M         N

I would like to calculate the ID based on the following rules:

a project that maps 1-1 with an activity:
A - A
M - N

one project that maps to multiple activities:
B - B
B - C

one activity that maps to multiple projects:
D - D
E - D

This will create:

ProjectID  Activity CalculatedID
A         A        1
B         B        2
B         C        2
D         D        3
E         D        3
M         N        4

I hope there is enough information there, any ideas are appreciated. I'm particularly interested in the set-based approach.

- UPDATE: Note on answers - I would describe the approach taken by @Erwin as classifying the views, as opposed to the solution provided by @mellamokb (which builds on @CodeByMoonlight's solution) which assigns the id to seqential. Both of your solutions helped me along the way, thanks guys!

+3

sql tsql

KierenH 27 Mar 12 at 13:37

source to share

3 answers

MartW · Answer 1 · 2012-03-27T14:03:11+0000

It's a little confusing, but it works:

SELECT ProjectID, ActivityID,
DENSE_RANK() OVER(ORDER BY ProjectID) +
DENSE_RANK() OVER(ORDER BY ActivityID) -
ROW_NUMBER() OVER(ORDER BY ProjectID, ActivityID) AS CalculatedID
FROM MyTable

The two uses of DENSE_RANK affect the creation of an offset against ROW_NUMBER when repeating a ProjectID or ActivityID.

mellamokb · Answer 2 · 2012-03-27T14:23:00+0000

Here's a built solution with an answer @CodeByMoonlight

that handles the case where the activityID and the project ID can alternate, that is, the activity ID is higher ID than some other records, but the project ID is lower than some other records:

SELECT
    D.ProjectID,
    D.ActivityID,
    -- generate id based on the three different scenarios
    -- 1) projects with 1-many activities, use project id
    -- 2) activities with 1-many projects, use activity id
    -- 3) 1-1, use project id
    DENSE_RANK() over (order by
        case
            when P.ProjectID is not null then P.ProjectID
            when A.ActivityID is not null then A.ActivityID
            else D.ProjectID
        end
    ) as Identifier
from
    MyTable D
left join
(
    -- projects with 1-many activities
    SELECT ProjectID
    FROM MyTable
    group by ProjectID
    having Count(ActivityID) > 1
) P on P.ProjectID = D.ProjectID
left join
(
    -- activities with 1-many projects
    SELECT ActivityID
    FROM MyTable
    group by ActivityID
    having Count(ProjectID) > 1
) A on A.ActivityID = D.ActivityID

Input example:

B   C
A   A
B   B
B   G
D   D
B   F
E   D
M   N

Output example:

A   A   1
B   B   2
B   G   2
B   F   2
B   C   2
E   D   3
D   D   3
M   N   4

Erwin Brandstetter · Answer 3 · 2012-03-27T14:49:19+0000

;WITH p AS (
    SELECT ProjectID FROM tbl GROUP BY ProjectID HAVING count(*) > 1
    ),a AS (
    SELECT ActivityID FROM tbl GROUP BY ActivityID HAVING count(*) > 1
    )
SELECT t.*
      ,CASE
         WHEN p.ProjectID  IS NOT NULL
          AND a.ActivityID IS NOT NULL THEN 4 -- n:m (missing in question!)
         WHEN p.ProjectID  IS NOT NULL THEN 2 -- 1:n
         WHEN a.ActivityID IS NOT NULL THEN 3 -- n:1
         ELSE                               1 -- 1:1
       END AS CalculatedID
FROM   tbl AS t
LEFT   JOIN p ON p.ProjectID = t.ProjectID
LEFT   JOIN a ON a.ActivityID = t.ActivityID

I explain:

1) In the CTE, p

find all projects that have more than one activity.
2) In the CTE, a

find all activities that have more than one project.
3) LEFT JOIN to pins in base table and distinguish 4 cases in CASE statement.

I added case 4 (n: m) missing in the question.
Cm. Working demonstration on data.SE .

Create an identifier based on column values

More articles: