How to correctly query a normalized database

I am in the process of reworking a MySQL database which has, among other tables, one table of about 1500 columns. We want to normalize the data in this table by creating a second table that will have a record for each column/row value that existed in the original table. Let's call these tables Master and MasterData. Master will contain the basic information that every record in this table needs. MasterData will contain values for additional data related to records in the Master table. So let's say Master looks like this:

MasterID     Property1     Property2
1            Yes           No
2            No            Yes
3            Yes           Yes
4            No            No

      

Let's assume MasterData looks like this:

MasterID     Property     Value
1            Property3    Yes
1            Property4    No
3            Property3    No
4            Property7    Yes

      

Still with me? How do I query this data so that I get only one row back for each Master row, but containing all of the related MasterData information? I searched and found a couple of examples, but they take too long to execute on our data. I created a test MasterData table based on the existing data in the one huge table mentioned earlier. This results in MasterData having about 4.5 million records, and the following queries simply take too long and time out.

SELECT Property1, Property2, Master.MasterID,
    GROUP_CONCAT(case when Property = "Property3" then Value end) as Property3, 
    GROUP_CONCAT(case when Property = "Property7" then Value end) as Property7
FROM Master LEFT JOIN MasterData USING (MasterID) GROUP BY MasterID
HAVING Property3='Yes' OR Property7='Yes';

      

or

SELECT * FROM Master AS M, MasterData AS MD1, MasterData AS MD2 
WHERE M.MasterID=MD1.MasterID AND MD1.Property='Property3' AND MD1.Value='Yes' 
AND M.MasterID=MD2.MasterID AND MD2.Property='Property7' AND MD2.Value='Yes';

      

Again, our goal is to get all the data in MasterData in one row, as if it were a column in Master. Is it possible?

Any help is greatly appreciated!



1 answer


"Again, our goal is to get all the data in MasterData in one row, as if it were a column in Master. Is it possible?"

Without fully understanding your purpose, I'm going to go out on a limb and say that yes, it's possible, strictly speaking. But it's hardly feasible in any practical sense. Performance is likely to be dismal even in the best case (just one or two properties); in the likely case (somewhere between 30 and 500 properties), you could take down the server entirely.

Normalizing does not mean "creating a second table that will have an entry for every column / row that existed in the original table." It doesn't mean anything even remotely like that. But it is possible that real normalization will solve your problem. (In my experience, most database problems are structural.)

What you've described here is a solution that doesn't work well, for a problem you haven't stated. To get the most out of your StackOverflow experience, describe the problem you are trying to solve as well as the solutions you have tried.

Wikipedia article on database normalization


If you start with a table like this ...

create table master_data (
  master_id integer not null,
  property_name varchar(30) not null,
  property_value boolean not null default true,
  primary key (master_id, property_name)
);

insert into master_data values
(1, 'Property3', true),
(1, 'Property4', false),
(3, 'Property3', false),
(4, 'Property7', true);

      

... then you can get all the properties for all things with a simple query. (This assumes all of your properties are Boolean.)



select * 
from master_data
order by master_id, property_name
--
1   Property3   t
1   Property4   f
3   Property3   f
4   Property7   t

      

The application code can be quite simple. And you could delete all rows where property_value is false, treating a missing row as false.
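As a rough illustration of how simple that application code can be, here is a minimal sketch that folds the one-row-per-property result into one dictionary per master_id. It assumes Python with an in-memory SQLite database standing in for MySQL, and the helper name properties_by_master is hypothetical; neither comes from the original post.

```python
import sqlite3
from collections import defaultdict

# Assumption: in-memory SQLite stands in for the MySQL server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table master_data (
      master_id integer not null,
      property_name varchar(30) not null,
      property_value boolean not null default 1,
      primary key (master_id, property_name)
    )
""")
conn.executemany(
    "insert into master_data values (?, ?, ?)",
    [(1, "Property3", True), (1, "Property4", False),
     (3, "Property3", False), (4, "Property7", True)],
)


def properties_by_master(connection):
    """Hypothetical helper: group property rows into one dict per master_id."""
    grouped = defaultdict(dict)
    query = ("select master_id, property_name, property_value "
             "from master_data order by master_id, property_name")
    for master_id, name, value in connection.execute(query):
        # SQLite returns booleans as 0/1, so convert explicitly.
        grouped[master_id][name] = bool(value)
    return dict(grouped)


result = properties_by_master(conn)
print(result)
```

The query itself stays trivial; the reshaping into "one record per thing" happens in the application, where it costs a single pass over the rows.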

This structure allows an unlimited number of properties for each thing. But your requirements to a) return an arbitrary number of properties in a single row and b) make minimal changes to your application code will have to change. There is no way around that.


If your table contains these rows ...

insert into master_data values
(1, 'Property3', true),
(1, 'Property4', false),
(3, 'Property3', false),
(4, 'Property7', true),
(1, 'Property7', true);

      

... here is one way to get the set of "things" that qualify, and to join that set back to the master data table.

select md.* 
from master_data md
inner join (select master_id
            from master_data
            where (
              (property_name = 'Property3' and property_value = true) or
              (property_name = 'Property7' and property_value = true)
            )
            group by master_id 
            having count(*) = 2 ) cd
  on (md.master_id = cd.master_id)
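To check what that query returns for the five sample rows, here is a small sketch that runs an equivalent query from Python. The assumptions are mine rather than the original answer's: in-memory SQLite stands in for MySQL, the two OR branches are rewritten as an IN list (equivalent here because both require a true value), and the variable names required and rows are hypothetical. The count comparison only works because the primary key guarantees at most one row per (master_id, property_name).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table master_data (
      master_id integer not null,
      property_name varchar(30) not null,
      property_value boolean not null default 1,
      primary key (master_id, property_name)
    )
""")
conn.executemany(
    "insert into master_data values (?, ?, ?)",
    [(1, "Property3", True), (1, "Property4", False),
     (3, "Property3", False), (4, "Property7", True),
     (1, "Property7", True)],
)

# Properties that must all be true for a master_id to qualify.
required = ["Property3", "Property7"]
placeholders = ", ".join("?" for _ in required)

query = f"""
    select md.master_id, md.property_name, md.property_value
    from master_data md
    inner join (select master_id
                from master_data
                where property_name in ({placeholders})
                  and property_value = 1
                group by master_id
                having count(*) = ?) cd
      on md.master_id = cd.master_id
    order by md.master_id, md.property_name
"""
rows = conn.execute(query, (*required, len(required))).fetchall()
for row in rows:
    print(row)
```

Only master_id 1 has both properties set to true, so only its rows come back, including Property4, which was not part of the filter.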

      

For what it's worth, normalization is still probably the best choice for long-term maintainability and performance. The structure above is not normalized, and it generally performs poorly with large datasets. (PostgreSQL with the hstore extension might be better than MySQL at this.)
