Meta tables in MySQL

I am rewriting a system that is currently linked to a MySQL database of about 1GB. There are hundreds of thousands of articles, each with a list of contributors (think Wiki style). I have not yet been granted access to the existing database schema, but while I wait, I am brainstorming a bit.

Basically, I'm wondering if having an article_contributors table is an efficient way of handling this, or if there is a better way to approach this situation. Given that there are about 200,000 articles, if there are 5 contributors to each of them, that would be 1,000,000 rows in the meta table.



2 answers


I would call this a one-to-many table, not a meta table. Or even a multi-valued attribute.

Keeping contributors in a separate table, one per row, is the correct way to create a relational database. There may be other ways of storing data, but they are not relational.
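As a minimal sketch of the one-row-per-contributor layout, the following uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table and column names and the sample IDs are assumptions for illustration, not the real schema (which the asker has not seen yet).

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (
        article_id INTEGER PRIMARY KEY,
        title      TEXT NOT NULL
    );
    CREATE TABLE article_contributors (
        article_id     INTEGER NOT NULL REFERENCES articles(article_id),
        contributor_id INTEGER NOT NULL,
        -- one row per (article, contributor) pair
        PRIMARY KEY (article_id, contributor_id)
    );
""")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, "First article"), (2, "Second article")],
)
conn.executemany(
    "INSERT INTO article_contributors VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

def contributors_of(article_id):
    """Return the contributor IDs for one article."""
    rows = conn.execute(
        "SELECT contributor_id FROM article_contributors "
        "WHERE article_id = ? ORDER BY contributor_id",
        (article_id,),
    )
    return [r[0] for r in rows]

def articles_by(contributor_id):
    """Return the article IDs a contributor has worked on."""
    rows = conn.execute(
        "SELECT article_id FROM article_contributors "
        "WHERE contributor_id = ? ORDER BY article_id",
        (contributor_id,),
    )
    return [r[0] for r in rows]

# Row count from the question: 200,000 articles x 5 contributors each.
expected_rows = 200_000 * 5
```

Both lookups are plain equality filters on the join table, so each direction can be served by an index.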

Consider my answer to "Is storing a delimited list in a database column really that bad?" Keeping contributors as a list in the article table makes many common SQL queries break down or become terribly inefficient. If you need to run a lot of queries against this data, you will be grateful that you kept it normalized.
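To illustrate one way a delimited list can break a common query, here is a small sketch, again using Python's sqlite3 module in place of MySQL. The table names, the comma-separated format, and the sample IDs are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: contributor IDs packed into one comma-separated column.
    CREATE TABLE articles_denormalized (
        article_id      INTEGER PRIMARY KEY,
        contributor_ids TEXT NOT NULL
    );
    -- Normalized: one row per (article, contributor) pair.
    CREATE TABLE article_contributors (
        article_id     INTEGER NOT NULL,
        contributor_id INTEGER NOT NULL,
        PRIMARY KEY (article_id, contributor_id)
    );
""")
conn.executemany(
    "INSERT INTO articles_denormalized VALUES (?, ?)",
    [(1, "12,40"), (2, "112,7")],
)
conn.executemany(
    "INSERT INTO article_contributors VALUES (?, ?)",
    [(1, 12), (1, 40), (2, 112), (2, 7)],
)

# "Which articles did contributor 12 work on?"
# Substring matching on the list also hits contributor 112.
naive = [r[0] for r in conn.execute(
    "SELECT article_id FROM articles_denormalized "
    "WHERE contributor_ids LIKE '%12%' ORDER BY article_id"
)]

# The normalized table answers the same question with an exact comparison.
normalized = [r[0] for r in conn.execute(
    "SELECT article_id FROM article_contributors "
    "WHERE contributor_id = 12 ORDER BY article_id"
)]
```

The substring search wrongly returns article 2, and a pattern with a leading wildcard generally cannot use an ordinary index, whereas the equality filter on the normalized table can.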



On the other hand, if you never query for anything other than the contributor list as an indivisible item, then why not keep it denormalized (as a list)? That can be a legitimate choice, but it depends on how you are going to use the table.

By the way, 1 million rows is not a large MySQL table by some standards. This week I am consulting for a client who has a table with 900 million rows.



Interesting question!

You will need to see the schema to get a direct answer about this. That is because the schema likely embodies some major decisions made by experts in bibliography (reference librarians, etc.).

If you use a join table (articles_contributors) so that a given contributor can be listed repeatedly when they contribute to multiple articles, you are implicitly asserting that you can create a canonical list of contributors, with one contributor_id for each individual.

In the world of bibliography and library science, such a list is called a "controlled vocabulary". It is controlled by an "authority". (Read this: http://en.wikipedia.org/wiki/Authority_control ) So some organization is responsible for deciding whether this "Jane Smythe" is a different person from that "Jane Smith". It is surprisingly difficult to get this right with people.

For an example of a relatively simple controlled vocabulary, see the North American Industry Classification System (NAICS): http://www.census.gov/eos/www/naics/ It has a code for every industry and is overseen by national committees in three countries. Many bibliographic databases covering industry include these terms as one way of classifying their content.



The designers of the system you are about to tackle will have made decisions about these controlled vocabularies. Do they have one for contributors? You can wait and see, or you might ask. But one thing is for sure: bibliographic designers won't be too happy if you create such a controlled vocabulary of your own choosing.

The Library of Congress does not attempt to create a controlled list of authors.

Edit

Once you have a definitive list of contributors, I recommend creating a join table, articles_contributors, as you suggested. You should consider the following columns:

 article_id        part of the primary key
 contributor_id    part of the primary key
 role              part of the primary key; values like "author", "illustrator", "editor", etc.
 order             1, 2, 3, ... so contributors can be listed in the proper order
 contact           1 or 0, indicating whether readers should contact this contributor for more info
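The column list above could be sketched as follows, using Python's sqlite3 module as a stand-in for MySQL. Two things here are assumptions rather than part of the answer: the column types, and the name sort_order, which replaces "order" because ORDER is a reserved word in SQL and would otherwise need quoting. The sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; types and sample rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE articles_contributors (
        article_id     INTEGER NOT NULL,
        contributor_id INTEGER NOT NULL,
        role           TEXT    NOT NULL,  -- "author", "illustrator", "editor", ...
        sort_order     INTEGER NOT NULL,  -- display position: 1, 2, 3, ...
        contact        INTEGER NOT NULL DEFAULT 0 CHECK (contact IN (0, 1)),
        -- composite key: a contributor may hold several roles on one article
        PRIMARY KEY (article_id, contributor_id, role)
    )
""")
conn.executemany(
    "INSERT INTO articles_contributors VALUES (?, ?, ?, ?, ?)",
    [
        (1, 20, "illustrator", 2, 0),
        (1, 10, "author", 1, 1),
        (1, 10, "editor", 3, 0),  # same contributor, second role
    ],
)

# List the credits for article 1 in display order.
credits = conn.execute(
    "SELECT contributor_id, role FROM articles_contributors "
    "WHERE article_id = 1 ORDER BY sort_order"
).fetchall()

# The composite primary key rejects an exact duplicate credit.
try:
    conn.execute(
        "INSERT INTO articles_contributors VALUES (1, 10, 'author', 4, 0)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Including role in the key is what lets contributor 10 appear as both author and editor of the same article while still preventing the same credit from being entered twice.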

      
