SQL to normalize existing (many-to-many) data

Summary:
See below for details. I'm copying the [unanswered] many-to-many question to the top for readability:

    Given the "Input" table, what is the SQL to generate the 3rd "Output"
    table (Person_plays_Instrument)?

      



Current input (1 table):

OriginalTable:
PersonId PersonName Instrument_1 Instrument_2 Instrument_3 MailingAddress HomePhone
--------|----------|------------|------------|------------|--------------|------------
1        Bob        Violin       Viola        Trumpet      someplace      111-111-1111
2        Suzie      Cello        Flute        <null>       otherplace     222-222-2222
3        Jim        Violin       <null>       <null>       thirdplace     333-333-3333

      

Desired output (3 tables):

Person:
Id Name   MailingAddress HomePhone
--|------|--------------|------------
1  Bob    someplace      111-111-1111
2  Suzie  otherplace     222-222-2222
3  Jim    thirdplace     333-333-3333

Instrument:
Id Name
--|-------
1  Violin
2  Cello
3  Viola
4  Flute
5  Trumpet

Person_plays_Instrument:
PersonId InstrumentId
--------|------------
1        1
1        3
1        5
2        2
2        4
3        1

      


Details:

I have one flat SQL table that started out as a spreadsheet, and I would like to normalize it. I have split this into one question per table.

I already have answers to questions 1 and 2, but I'm leaving them here in case others find them helpful.

Questions:

Question #1: [answered]
How do I create the Person table?

Answer #1:
This great post gets me two-thirds of the way there. For the one-to-many tables, I'm all set. Here's the code:

[add autonumber field to OriginalTable, name it PersonId]
[create empty Person table with Id, Name, MailingAddress, HomePhone fields]

INSERT INTO Person (Id, Name, MailingAddress, HomePhone)
  SELECT o.PersonID, o.PersonName, o.MailingAddress, o.HomePhone
  FROM OriginalTable as o
  WHERE o.PersonName Is Not Null;

      


Question #2: [attempted] (better version from @Branko in the accepted answer)
How do I create an Instrument table?

Answer #2:
Again, one-to-many. At first, the multiple columns stumped me. The solution has two parts:

  • I just need to repeat the INSERT command, once for each column.
  • Using this post and the IN operator, I can check each time to confirm that I haven't inserted that value yet.

Here's the code:

[create empty Instrument table with Id[autonumber], Name fields]

INSERT INTO Instrument (Name)
  SELECT Distinct o.Instrument_1
  FROM OriginalTable as o
  WHERE o.Instrument_1 Is Not Null
  AND o.Instrument_1 Not In (SELECT Name from Instrument);

INSERT INTO Instrument (Name)
  SELECT Distinct o.Instrument_2
  FROM OriginalTable as o
  WHERE o.Instrument_2 Is Not Null
  AND o.Instrument_2 Not In (SELECT Name from Instrument);

INSERT INTO Instrument (Name)
  SELECT Distinct o.Instrument_3
  FROM OriginalTable as o
  WHERE o.Instrument_3 Is Not Null
  AND o.Instrument_3 Not In (SELECT Name from Instrument);
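
The three statements above differ only in the column name, so the same idea can be sketched as a loop. This is a minimal, hedged example that runs the SQL against an in-memory SQLite database through Python's sqlite3 module; the question does not name a DBMS, so the SQLite dialect, the table definitions, and the use of INTEGER PRIMARY KEY as a stand-in for an autonumber field are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema and the sample rows from the question.
conn.executescript("""
CREATE TABLE OriginalTable (
    PersonId INTEGER PRIMARY KEY,
    PersonName TEXT,
    Instrument_1 TEXT, Instrument_2 TEXT, Instrument_3 TEXT,
    MailingAddress TEXT, HomePhone TEXT
);
INSERT INTO OriginalTable VALUES
    (1, 'Bob',   'Violin', 'Viola', 'Trumpet', 'someplace',  '111-111-1111'),
    (2, 'Suzie', 'Cello',  'Flute', NULL,      'otherplace', '222-222-2222'),
    (3, 'Jim',   'Violin', NULL,    NULL,      'thirdplace', '333-333-3333');

-- INTEGER PRIMARY KEY plays the role of the autonumber Id here.
CREATE TABLE Instrument (Id INTEGER PRIMARY KEY, Name TEXT);
""")

# Column names come from this fixed tuple, never from user input,
# so formatting them into the statement is safe in this sketch.
for column in ("Instrument_1", "Instrument_2", "Instrument_3"):
    conn.execute(f"""
        INSERT INTO Instrument (Name)
        SELECT DISTINCT o.{column}
        FROM OriginalTable AS o
        WHERE o.{column} IS NOT NULL
          AND o.{column} NOT IN (SELECT Name FROM Instrument)
    """)

instrument_names = sorted(
    row[0] for row in conn.execute("SELECT Name FROM Instrument")
)
print(instrument_names)
```

DISTINCT removes repeats within one column (Violin appears twice in Instrument_1), and the NOT IN check covers values already inserted from an earlier column.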

      


Question #3: [unanswered]
How do I create the Person_plays_Instrument table?


1 answer


Assuming there is an OriginalTable.PersonID column, which you did not show us but which is implied by your own answer #1, answer #3 can be expressed simply as:

INSERT INTO Person_plays_Instrument (PersonId, InstrumentId)
SELECT PersonID, Instrument.Id
FROM
    OriginalTable
    JOIN Instrument
        ON OriginalTable.Instrument_1 = Instrument.Name
        OR OriginalTable.Instrument_2 = Instrument.Name
        OR OriginalTable.Instrument_3 = Instrument.Name;
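
To check the join against the sample data from the question, here is a small sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The SQLite dialect and the CREATE TABLE definitions are assumptions for illustration only; the Instrument rows are seeded with the Ids shown in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema; data copied from the question's tables.
conn.executescript("""
CREATE TABLE OriginalTable (
    PersonId INTEGER PRIMARY KEY,
    PersonName TEXT,
    Instrument_1 TEXT, Instrument_2 TEXT, Instrument_3 TEXT,
    MailingAddress TEXT, HomePhone TEXT
);
INSERT INTO OriginalTable VALUES
    (1, 'Bob',   'Violin', 'Viola', 'Trumpet', 'someplace',  '111-111-1111'),
    (2, 'Suzie', 'Cello',  'Flute', NULL,      'otherplace', '222-222-2222'),
    (3, 'Jim',   'Violin', NULL,    NULL,      'thirdplace', '333-333-3333');

CREATE TABLE Instrument (Id INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Instrument VALUES
    (1, 'Violin'), (2, 'Cello'), (3, 'Viola'), (4, 'Flute'), (5, 'Trumpet');

CREATE TABLE Person_plays_Instrument (PersonId INTEGER, InstrumentId INTEGER);

-- The join from the answer: a row is produced for every instrument
-- whose name matches any of the three columns. NULL columns never match.
INSERT INTO Person_plays_Instrument (PersonId, InstrumentId)
SELECT PersonId, Instrument.Id
FROM OriginalTable
JOIN Instrument
    ON OriginalTable.Instrument_1 = Instrument.Name
    OR OriginalTable.Instrument_2 = Instrument.Name
    OR OriginalTable.Instrument_3 = Instrument.Name;
""")

links = conn.execute(
    "SELECT PersonId, InstrumentId FROM Person_plays_Instrument "
    "ORDER BY PersonId, InstrumentId"
).fetchall()
print(links)
# [(1, 1), (1, 3), (1, 5), (2, 2), (2, 4), (3, 1)]
```

Because each Instrument row is compared once per person, a person who lists the same instrument in two columns still yields a single link row.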

      

By the way, there is a more concise way to express answer #2:



INSERT INTO Instrument (Name)
    SELECT *
    FROM (
        SELECT o.Instrument_1 I
        FROM OriginalTable as o
        UNION
        SELECT o.Instrument_2
        FROM OriginalTable as o
        UNION
        SELECT o.Instrument_3
        FROM OriginalTable as o
    ) Q
    WHERE I IS NOT NULL;
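
The same UNION approach can be tried outside SQL Server. The sketch below runs it in an in-memory SQLite database through Python's sqlite3 module; the dialect, the table definitions, and the explicit AS aliases are assumptions for illustration. UNION (as opposed to UNION ALL) removes duplicates across all three columns, so no NOT IN check is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema; data copied from the question's input table.
conn.executescript("""
CREATE TABLE OriginalTable (
    PersonId INTEGER PRIMARY KEY,
    PersonName TEXT,
    Instrument_1 TEXT, Instrument_2 TEXT, Instrument_3 TEXT,
    MailingAddress TEXT, HomePhone TEXT
);
INSERT INTO OriginalTable VALUES
    (1, 'Bob',   'Violin', 'Viola', 'Trumpet', 'someplace',  '111-111-1111'),
    (2, 'Suzie', 'Cello',  'Flute', NULL,      'otherplace', '222-222-2222'),
    (3, 'Jim',   'Violin', NULL,    NULL,      'thirdplace', '333-333-3333');

-- INTEGER PRIMARY KEY stands in for an autonumber/identity column.
CREATE TABLE Instrument (Id INTEGER PRIMARY KEY, Name TEXT);

INSERT INTO Instrument (Name)
    SELECT I
    FROM (
        SELECT o.Instrument_1 AS I FROM OriginalTable AS o
        UNION
        SELECT o.Instrument_2 FROM OriginalTable AS o
        UNION
        SELECT o.Instrument_3 FROM OriginalTable AS o
    ) AS Q
    WHERE I IS NOT NULL;
""")

union_names = sorted(
    row[0] for row in conn.execute("SELECT Name FROM Instrument")
)
print(union_names)
```

The Ids assigned this way depend on the order in which the UNION returns rows, which is not guaranteed, so they may not match the numbering in the desired output. That is harmless as long as Person_plays_Instrument is filled by looking instruments up by name.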

      

And here is a fully working SQL Fiddle example for MS SQL Server. Other DBMSs should behave similarly. By the way, you should tag your question to indicate which DBMS you are using.
