MySQL table primary keys

Hello,

I have several MySQL tables that currently use an MD5 hash as the primary key. I usually generate the hash from a column value. For instance, let's say I have a table called "Artists" with the fields id, name, num_members, and year. I compute md5($name) and use it as the id.

I would like to know what the disadvantages of this are. Is it better to use integers with AUTO_INCREMENT? I tend to avoid that because it doesn't seem worth worrying about what the last inserted ID was, what comes next, etc.

Can you shed some light on this?

Thanks.

+2


5 answers


If you need a surrogate primary key, an AUTO_INCREMENT field is better than an MD5 hash because it takes fewer bytes and databases are optimized for integer primary keys.
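To make the size difference concrete, here is a small sketch in Python. The artist name is a made-up value, and the column types in the comments (CHAR(32), BINARY(16), a 4-byte INT) are assumptions about how the key might be stored, not something stated in the question.

```python
import hashlib
import struct

name = "Example Artist"  # hypothetical value, for illustration only

# MD5 as a hex string, e.g. what a CHAR(32) column would hold
md5_hex = hashlib.md5(name.encode("utf-8")).hexdigest()
# MD5 as raw bytes, e.g. what a BINARY(16) column would hold
md5_raw = hashlib.md5(name.encode("utf-8")).digest()
# An unsigned 4-byte integer, the size of a MySQL INT
int_key = struct.pack("<I", 12345)

print(len(md5_hex), len(md5_raw), len(int_key))  # 32 16 4
```

Even in its compact binary form the hash is four times the size of the integer key, and that cost is repeated in every index and foreign key that refers to it.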

mysql_insert_id can be used if you want the last inserted ID.
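As a rough illustration of how little bookkeeping this takes, the sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made so the example runs anywhere). The table and values are hypothetical. In MySQL the column would be declared with AUTO_INCREMENT, and the SQL function LAST_INSERT_ID() returns the same information.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artists ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL,"
    " num_members INTEGER,"
    " year INTEGER)"
)

insert_sql = "INSERT INTO artists (name, num_members, year) VALUES (?, ?, ?)"

# The database assigns the id; the cursor reports which one it chose
first_id = conn.execute(insert_sql, ("Example Artist", 4, 1990)).lastrowid
second_id = conn.execute(insert_sql, ("Another Artist", 2, 2005)).lastrowid

print(first_id, second_id)  # 1 2
```

The application never has to track what the next ID will be; it only reads back the ID of the row it just inserted, typically to use it as a foreign key.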



If you are generating the primary key as a hash of other columns, why not just use those other columns as a unique key and then join on them?

Another question: what are the benefits of using an MD5 hash? I can't think of any.
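One possible reading of this suggestion, sketched with Python's sqlite3 module as a stand-in for MySQL: declare the name column UNIQUE, keep an auto-incrementing id, and look rows up or join without any hash. The albums table and all values are hypothetical and exist only to give the join something to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE artists (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE          -- the natural key, enforced directly
);
CREATE TABLE albums (                  -- hypothetical second table
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    artist_id INTEGER NOT NULL REFERENCES artists(id),
    title     TEXT NOT NULL
);
""")

artist_id = conn.execute(
    "INSERT INTO artists (name) VALUES (?)", ("Example Artist",)
).lastrowid
conn.execute(
    "INSERT INTO albums (artist_id, title) VALUES (?, ?)",
    (artist_id, "Example Album"),
)

# Look up by the unique column and join; no hash is needed anywhere
rows = conn.execute(
    "SELECT artists.name, albums.title"
    " FROM albums JOIN artists ON artists.id = albums.artist_id"
    " WHERE artists.name = ?",
    ("Example Artist",),
).fetchall()
print(rows)  # [('Example Artist', 'Example Album')]

# The UNIQUE constraint rejects a second artist with the same name
try:
    conn.execute("INSERT INTO artists (name) VALUES (?)", ("Example Artist",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```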

+2


The first approach has one obvious drawback: if there are two artists with the same name, a primary key collision will occur. Using an auto-increment INT column guarantees uniqueness.
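The drawback is easy to reproduce. The sketch below is illustrative only: it uses Python's sqlite3 module in place of MySQL, and the table names, the md5_key helper, and the two artists are all made up. It inserts two different artists that share a name into an MD5-keyed table and into an auto-increment table.

```python
import hashlib
import sqlite3

def md5_key(name: str) -> str:
    """Hypothetical helper mirroring md5($name) from the question."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE artists_md5 (id TEXT PRIMARY KEY, name TEXT, year INTEGER)"
)
conn.execute(
    "CREATE TABLE artists_auto ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, year INTEGER)"
)

# Two distinct artists that happen to share a name
same_name = [("Example Artist", 1970), ("Example Artist", 2010)]

md5_failed = False
for name, year in same_name:
    try:
        conn.execute(
            "INSERT INTO artists_md5 (id, name, year) VALUES (?, ?, ?)",
            (md5_key(name), name, year),
        )
    except sqlite3.IntegrityError:
        md5_failed = True  # same name, same hash, duplicate primary key
    conn.execute(
        "INSERT INTO artists_auto (name, year) VALUES (?, ?)", (name, year)
    )

md5_rows = conn.execute("SELECT COUNT(*) FROM artists_md5").fetchone()[0]
auto_rows = conn.execute("SELECT COUNT(*) FROM artists_auto").fetchone()[0]
print(md5_failed, md5_rows, auto_rows)  # True 1 2
```

The second artist cannot be stored in the MD5-keyed table at all, while the auto-increment table accepts both rows.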



Also, although very unlikely, there is a chance that the MD5 hashes of two different strings could collide (if I recall correctly, the probability for two given strings is about 1 in 2^128, since MD5 is a 128-bit digest).
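For a sense of scale, the chance of any accidental collision in a whole table can be estimated with the standard birthday approximation. This is a sketch under the assumption that MD5 behaves like a uniformly random 128-bit function on ordinary inputs; it says nothing about deliberately crafted collisions, which are known to be practical for MD5. The function name and the row count are made up for illustration.

```python
import math

def md5_collision_probability(n_rows: int, bits: int = 128) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1)/2 / 2**bits)."""
    pairs = n_rows * (n_rows - 1) // 2      # number of distinct row pairs
    return -math.expm1(-pairs / 2**bits)    # expm1 keeps precision for tiny p

p = md5_collision_probability(10**9)  # a hypothetical table of a billion rows
print(f"{p:.2e}")
```

Even for a billion rows the estimate is on the order of 1e-21, so accidental collisions are a far smaller practical concern than two rows simply sharing the same name.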

+2


The MD5 hash is not a true key in this case because it is functionally dependent on the name. This means that if you have two artists with the same name, you get duplicate "keys" for different entries. You could make it a real key by hashing all the attributes concatenated together (and hoping the gods of probability don't send you a collision), or you could just save yourself the trouble and use an auto-incrementing identifier.

+2


It sounds like the way you are trying to use MD5 is not really buying you any benefit. If $name is unique, why not just use name as the primary key? Computing an MD5 hash and using it as a key for something that is already unique is overkill.

On the other hand, if the "name" is not unique, then the MD5 hash will not be unique either, and therefore it is also meaningless.

You usually use an MD5 hash when you don't want to store the actual column value. For example, if you store passwords, you usually only store the MD5 hash of the password, not the password itself, so you can't see people's passwords just by looking at the contents of the table.

If you don't have any unique fields, you're stuck with something like auto-increment, because it is at least guaranteed to be unique. If you use the built-in SQL auto-increment, you will just need to retrieve the last inserted ID one way or another. Alternatively, if you can get away with keeping a unique counter locally in your application, you can avoid auto-increment altogether, but that is not viable for most applications.

+2


One advantage appears if you expose IDs to clients (e.g. in a query string for a web form, although that is another no-no): it prevents users from guessing other IDs.

Personally, I use auto-increment without problems (I have moved the database to new servers and everything was fine).

0

