Simple DB Design Question - Dimension Sets
I have several sets of measurements. Each set has several meanings in it. The data is currently in a spreadsheet and I would like to move it to a real database.
In a spreadsheet, each set of dimensions is in its own column, i.e.
1 | 2 | 3 | ...
0.5 | 0.7 | 0.2 | ...
0.3 | 0.6 | 0.4 | ...
etc. If I have, say, 10 datasets and each has 8 dimensions, I end up with an 8x10 table with 80 values.
To put the database in normal form, I know it must be designed to add rows, not columns, for each new set. I assume this means that tables should be ordered something like (MySQL syntax):
create table Measurement(
int id not null auto_increment,
primary key(id)
);
create table Measurement_Data(
int id not null auto_increment,
double value not null,
int measurement,
primary key(id),
constraint fk_data foreign key(measurement) references Measurement(id)
}
It is right? The result is a table with only one autogenerated column (although of course I could add more to it) and another table in which each dataset is stored "lengthwise" which means the table will contain 800 rows given the above values. Is this the correct way to deal with this problem? This also means I need to add data, I need to insert a row into the Measurement table, get its assigned pk, and then add all new rows to the Measurement_Data table, right?
I appreciate any advice
Ken
source to share
All in all, your plan sounds as good as it gets. I would change it a bit compared to other posts:
create table Measurement(
int MeasurementId not null auto_increment,
primary key(id)
);
create table MeasurementData(
int MeasurementId not null,
int MeasurementDataEntry not null,
double value not null,
primary key(MeasurementId, MeasurementDataEntry),
constraint fk_data foreign key(MeasurementId) references Measurement(MeasurementId)
)
As with LukLed, the MeasurementDataEnry will be an ordinal value, starting at 1 and increasing by one for each "data entry" for that MeasurementId. The component primary key should be sufficient, since with the stated objective, I see no real need for an additional surrogate key in MeasurementData.
Make sure your solution is sufficient for your problem. Are other columns required in any table (type of dimension when it happened who dunnit)? Is the amount of data always the same for each dimension, or can it change? If so, you need the number of dimensions of the data stored in the parent table (for faster retrieval, is the declaration a cost of redundancy?). Do you ever need to update your data and it will be a mess? These are just a bit rare and there may be other business rules that you need to consider in the future.
source to share
How about a simple one-table solution for this?
CREATE TABLE measurements
(
id INT AUTO_INCREMENT,
dataSetId INT,
measurementNumber INT,
measurementValue DOUBLE,
CONSTRAINT UNIQUE (dataSetId, measurementNumber)
);
If your problem is more complex than you describe, this should be adequate.
source to share
First, I would change the name of the "dimension" column to "measure-id", because that name is not intuitive. You can also add a sequential number to the Measurement-Data table to see the order in which measurements were taken and add a unique measurement ID and sequential number. By adding a sequential number, you can also limit the number of measurements in each test with a control limit.
source to share