A join between two many-to-many tables
I have some troubles in determining the appropriate database design for my Java based web application. I have 3 tables (tag, DT and property) and my scenario looks like this. Each tag can contain many Dts, and each DT can be assigned to different Tags, each DT has many properties, and each property can be used in many DTs. When a DT is assigned to a tag, the user can set values ββfor all of their properties.
My question is how can I define a relationship so that each tag can have its own values ββon the DT that is assigned to it. what are the missing tables and relationships in my design. Then I have to create relative entities based on this db.
Tag1==> DT1 ==> Initial values1, property values set 1 ;
Tag2==> DT1 ==> Initial values1, property values set 2 ;
Here is the db diagram. And thanks for your help in advance.
Here are some sample data
source to share
The central problem with your design, as I see it, is the use of surrogate keys. It is not necessary to always create numeric single column keys for tables. And even if you do this, it does not guarantee that you will not have duplicates. This essentially forces the system to keep more indexes on your tables, which is additional work to be done.
-
Some concepts:
- I use a single value for the table names because each tuple in the table represents one relation object;
- I use lowercase letters for table and column names (identifiers) and uppercase letters for keywords. I don't like CamelCase in databases;
- I am using
<table_name>_id
PK for columns. This is also possible because of the singularity of the table names; - I am using prefix + table name + details for all constraints and indexes I create.
You don't have to follow them, but it would be nice to stick to consistent naming patterns for your designs.
-
I would start with a dictionary
property_type
:CREATE TABLE property_type ( property_type varchar(20) NOT NULL, CONSTRAINT p_property_type PRIMARY KEY (property_type) );
It is a single column table that only exists to provide the domain of a possible value for property types. I used
varchar(20)
, text column is pretty good. The advantage of this is that you don't need to join this table with numeric keys to get the valueproperty_type_id=123
. -
Properties:
CREATE TABLE property ( property_id integer NOT NULL, property_name varchar(50) NOT NULL, property_type varchar(20) NOT NULL, CONSTRAINT p_property PRIMARY KEY (property_id), CONSTRAINT u_property_name UNIQUE (property_name), CONSTRAINT f_property_type FOREIGN KEY (property_type) REFERENCES property_type ON UPDATE CASCADE );
I decided to go with a numeric PK here, as I am assuming that you will want to rename properties at some point. And if you change
property_type
, the updates will be cascaded.Despite the fact that a PK already exists here, the restriction
UNIQUE
on names is mandatory here, otherwise you risk being in a situation where you have the same named properties with different identifiers. -
DT table:
CREATE TABLE dt ( dt_id integer NOT NULL, dt_name varchar(50) NOT NULL, CONSTRAINT p_dt PRIMARY KEY (dt_id), CONSTRAINT u_dt_name UNIQUE (dt_name) );
Again, a simple PK isn't enough by creating a constraint as well
UNIQUE
. Although I would very completely get rid ofdt_id
and leave onlydt_name
and do it PK. -
Properties for DT:
CREATE TABLE dt_property ( dt_id integer NOT NULL, property_id integer NOT NULL, initial_value varchar(50) NOT NULL, CONSTRAINT p_dt_property PRIMARY KEY (dt_id, property_id), CONSTRAINT f_dt_id FOREIGN KEY (dt_id) REFERENCES dt, CONSTRAINT f_property_id FOREIGN KEY (property_id) REFERENCES property );
This is the first big difference from your design - a composite key is used.
Yes, this means that you will need to execute 2 columns whenever you want to refer to the entry in this table. But it doesn't matter, in fact - you design table once, you also write your queries once, but your software can be used for quite a long time if executed correctly and easily maintained. Better to spend a little time entering queries and maintain systems easily in the long run.
-
Tags:
CREATE TABLE tag ( tag_id integer NOT NULL, tag_name varchar(50) NOT NULL, CONSTRAINT p_tag PRIMARY KEY (tag_id), CONSTRAINT u_tag_name UNIQUE (tag_name) );
It's just another dictionary. Again, just like a table
dt
, I would really like to avoid using a columntag_id
and justtag_name
make it PK as well. -
The new table
tag_dt
presents:CREATE TABLE tag_dt ( tag_id integer NOT NULL, dt_id integer NOT NULL, CONSTRAINT p_tag_dt PRIMARY KEY (tag_id, dt_id), CONSTRAINT f_tag_id FOREIGN KEY (tag_id) REFERENCES tag, CONSTRAINT f_dt_id FOREIGN KEY (dt_id) REFERENCES dt );
This table is required to create
dt
+ relationshipstag
. Without this, you have data duplication - you can see this in your diagram, you have 2 lines withTag_name='Tag1'
. -
Finally, the properties of the tag:
CREATE TABLE tag_property ( tag_id integer NOT NULL, dt_id integer NOT NULL, property_id integer NOT NULL, a_value varchar(50) NOT NULL, CONSTRAINT p_tag_property PRIMARY KEY (tag_id, dt_id), CONSTRAINT u_tag_property UNIQUE (tag_id, property_id), CONSTRAINT f_tag_property_tag FOREIGN KEY (tag_id, dt_id) REFERENCES tag_dt, CONSTRAINT f_tag_property_property FOREIGN KEY (dt_id, property_id) REFERENCES dt_property );
This table contains complete composite keys and meets all your requirements. Primary key
tag_id, dt_id
as well as foreign key to the tabletag_dt
so you want to represent something that hasn't been previously defined. Ittag_id, property_id
is then unique, so the properties of the tag cannot be duplicated. Finally, itdt_id, property_id
refers to a tabledt_property
, which means that only properties that are allowed to do so will be registereddt
.
Final Notes
All primary and unique keys are implemented through indexes in most DBMSs. In addition, some DBMSs may use composite (multi-column) indexes even if the first key column is omitted. At least PostgreSQL can do this, which is what I use the most.
Please take a look at the tables dt
and tag
I highly recommend getting rid of the key surrogate in them like property_type
.
I have not created any additional indexes, generally I do this exercise after the Data Model is implemented and some real queries are made against it.
Also, don't use value
or name
for column names. These are reserved words, and you may have unexpected consequences in future incarnations of your DBMS of choice.
source to share