SQL to normalize existing (many-to-many) data
Summary:
See below for details. I'm copying the [unanswered] many-to-many question to the top for readability:
Given the "Input" table, what is the SQL to generate the 3rd "Output"
table (Person_plays_Instrument)?
Current input (1 table):
OriginalTable:
PersonId PersonName Instrument_1 Instrument_2 Instrument_3 MailingAddress HomePhone
--------|----------|------------|------------|------------|--------------|------------
1        Bob        Violin       Viola        Trumpet      someplace      111-111-1111
2        Suzie      Cello        Flute        <null>       otherplace     222-222-2222
3        Jim        Violin       <null>       <null>       thirdplace     333-333-3333
Desired output (3 tables):
Person:
Id Name   MailingAddress HomePhone
--|------|--------------|------------
1  Bob    someplace      111-111-1111
2  Suzie  otherplace     222-222-2222
3  Jim    thirdplace     333-333-3333
Instrument:
Id Name
--|-------
1  Violin
2  Cello
3  Viola
4  Flute
5  Trumpet
Person_plays_Instrument:
PersonId InstrumentId
--------|------------
1        1
1        3
1        5
2        2
2        4
3        1
Details:
I have one flat SQL table that started out as a spreadsheet, and I would like to normalize it. I have split this into one question per table.
I have already answered questions 1 and 2 myself, but I'm leaving them in case others find them helpful.
Questions:
Question # 1: [answered]
How do I create the Person table?
Answer # 1:
This great post gets me two-thirds of the way there. For one-to-many tables, I'm all set. Here's the code:
[add autonumber field to OriginalTable, name it PersonId]
[create empty Person table with Id, Name, MailingAddress, HomePhone fields]
INSERT INTO Person (Id, Name, MailingAddress, HomePhone)
SELECT o.PersonID, o.PersonName, o.MailingAddress, o.HomePhone
FROM OriginalTable as o
WHERE o.PersonName Is Not Null;
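The bracketed step above leaves the definition of the empty Person table implicit. A minimal sketch of what it might look like follows; the column types and lengths are assumptions (the question does not state them), and the exact type names may need adjusting for your DBMS.

```sql
-- Sketch only: column types and sizes are assumed, not taken from the question.
-- Id is a plain integer (no autonumber) because its values are copied
-- from OriginalTable.PersonId by the INSERT ... SELECT above.
CREATE TABLE Person (
    Id             INT          NOT NULL PRIMARY KEY,
    Name           VARCHAR(100) NOT NULL,
    MailingAddress VARCHAR(255) NULL,
    HomePhone      VARCHAR(20)  NULL
);
```

Keeping Person.Id equal to OriginalTable.PersonId is what later lets the junction table be filled directly from OriginalTable without a lookup on the person's name.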
Question # 2: [attempted] (better version from @Branko in the accepted answer)
How do I create an Instrument table?
Answer # 2:
Again, one-to-many. At first, having the values spread over several columns baffled me. The solution has two parts:
- I just need to repeat the INSERT command, once for each column.
- Using this post and the IN operator, I can check each time to confirm that I haven't inserted that value yet.
Here's the code:
[create empty Instrument table with Id[autonumber], Name fields]
INSERT INTO Instrument (Name)
SELECT Distinct o.Instrument_1
FROM OriginalTable as o
WHERE o.Instrument_1 Is Not Null
AND o.Instrument_1 Not In (SELECT Name from Instrument);
INSERT INTO Instrument (Name)
SELECT Distinct o.Instrument_2
FROM OriginalTable as o
WHERE o.Instrument_2 Is Not Null
AND o.Instrument_2 Not In (SELECT Name from Instrument);
INSERT INTO Instrument (Name)
SELECT Distinct o.Instrument_3
FROM OriginalTable as o
WHERE o.Instrument_3 Is Not Null
AND o.Instrument_3 Not In (SELECT Name from Instrument);
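The three statements above assume an empty Instrument table with an auto-generated Id. One possible definition is sketched below in SQL Server syntax; the column size and the UNIQUE constraint are assumptions added for illustration, and other systems spell the autonumber differently (for example AUTOINCREMENT in Access or AUTO_INCREMENT in MySQL).

```sql
-- Sketch only, SQL Server syntax; the VARCHAR size is an assumption.
CREATE TABLE Instrument (
    Id   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- autonumber, starts at 1
    Name VARCHAR(100)      NOT NULL UNIQUE        -- rejects a duplicate name
);
```

With the UNIQUE constraint in place, an INSERT that forgets the Not In check fails with an error instead of silently adding a second row for the same instrument.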
Question # 3: [unanswered]
How do I create the Person_plays_Instrument table?
Assuming there is an OriginalTable.PersonID column (which you did not show us, but which is implied by your own answer # 1), answer # 3 can be expressed simply as:
INSERT INTO Person_plays_Instrument (PersonId, InstrumentId)
SELECT PersonID, Instrument.Id
FROM
OriginalTable
JOIN Instrument
ON OriginalTable.Instrument_1 = Instrument.Name
OR OriginalTable.Instrument_2 = Instrument.Name
OR OriginalTable.Instrument_3 = Instrument.Name;
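The INSERT above needs the Person_plays_Instrument table to exist first. The sketch below shows one way it could be declared, followed by a query for checking the result; the composite primary key and the foreign keys are assumptions about how you would want to constrain the table, not something stated in the question.

```sql
-- Sketch only: run the CREATE TABLE before the INSERT shown above.
-- The composite primary key prevents the same person/instrument pair
-- from being stored twice.
CREATE TABLE Person_plays_Instrument (
    PersonId     INT NOT NULL REFERENCES Person (Id),
    InstrumentId INT NOT NULL REFERENCES Instrument (Id),
    PRIMARY KEY (PersonId, InstrumentId)
);

-- After the INSERT: list who plays what, using names instead of ids.
SELECT p.Name AS PersonName, i.Name AS InstrumentName
FROM Person_plays_Instrument AS ppi
JOIN Person     AS p ON p.Id = ppi.PersonId
JOIN Instrument AS i ON i.Id = ppi.InstrumentId
ORDER BY p.Id, i.Id;
```

For the sample data in the question, the check query should return six rows: three for Bob, two for Suzie, and one for Jim.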
By the way, there is a more concise way to express answer # 2:
INSERT INTO Instrument (Name)
SELECT *
FROM (
SELECT o.Instrument_1 I
FROM OriginalTable as o
UNION
SELECT o.Instrument_2
FROM OriginalTable as o
UNION
SELECT o.Instrument_3
FROM OriginalTable as o
) Q
WHERE I IS NOT NULL;
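The query above relies on UNION discarding duplicate rows, which is why no Distinct or Not In check is needed. The difference from UNION ALL can be seen in a small standalone sketch; it uses SELECT without a FROM clause, which works in SQL Server but may need a dummy table in other systems.

```sql
-- UNION removes duplicates: this returns a single 'Violin' row.
SELECT 'Violin' AS I
UNION
SELECT 'Violin';

-- UNION ALL keeps duplicates: this returns two 'Violin' rows.
SELECT 'Violin' AS I
UNION ALL
SELECT 'Violin';
```

Note that the UNION version only removes duplicates within OriginalTable. Unlike the Not In checks in the question's own answer # 2, it does not look at rows already in Instrument, so it is meant to be run once against an empty table.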
And here is a fully working SQL Fiddle example for MS SQL Server. Other DBMSs should behave in a similar way. BTW, you should tag your question appropriately to indicate your DBMS.