Seek Opinion: Accent / Diacritical Marks in Primary Key

I have this application that uses natural primary keys. The database uses the WE8ISO8859P15 character set. So my City whe table has primary keys like "MEDELLÍN" and "MÜNCHEN". I have a suspicion that we will have a lot of problems with this.

The problems that I see

  • The interaction of this data with databases with a different character set. I dont want character set conversion on my primary key
  • When dumping data into files and processing these files, we should always be aware of special characters and client settings

Should we allow diacritical marks in PC? Please do not hesitate to give your opinion.

+2


source to share


6 answers


Like you, I feel like he is really looking for problems to allow them.

In addition to the problems you mentioned, this could be:

  • Imagine switching to a different database vendor ...



I don't know if introducing a primary key is a surrogate primary key for you, but it might be the right time to do it ;-) ...

If not, you can duplicate the column :

  • the pk column will not be case sensitive, has no special characters, etc.
  • an extra column will save what was entered by the user to show it in some UIs ...
+3


source


Trying to ignore diacritics simply delays the inevitable. Yes, you could save some trouble in Eastern Europe. But you still cannot deal with the names of Greek cities. You need Unicode and then there is no point in skipping Manchen / Munich anymore; this is Munich.



However, the whole notion is that Brussels already has one name for a city, for example Brussels, and this Western Europe. Thus, they are fundamentally unusable for primary keys, no matter how you write them.

+5


source


Why not? You, the DB model, are no longer working, but why not introduce another source of problems?;)

More seriously, databases are improved with Unicode support, so there is no problem with keeping natural text (with all its oddities). Your problem is the "primary key". There are several ways to encode the same text (for example, you can have accented characters or diacritics with simple characters). This means that you can get two different keys for the same text.

There are many wrong reasons for using business keys like a PC, and no good ones. Do not do this. Bite the bullet and fix it. Fix it now. It will cost you less (even if it costs a lot) than not fixing it.

+4


source


Yes, you will have problems with these symbols. Exiting ASCII always causes problems. But when you do business outside of Britain and the US, you have no choice.

I don't see any particular problem with the Primary Key character set. When exporting, importing, interface or porting, you will have to take these symbols into account, whether they are part of your PC or not.

But they highlight the problem of natural key as primary key. It seems highly likely that someone will write, for example. Muenchen to later change it to München, which will of course cause the known update issue on PC.

+3


source


Whether your attribute (part) is a key or not has nothing to do with the problem.

You have problems converting a character set with ANY data traffic to / from this attribute anyway, whether it's a key or not.

Yes, to encode "correctly" and to have the best guarantee that your data will never get corrupted due to character set conversion problems, you will need the Unicode character set and one of its encodings.

I have some serious doubts about the table itself, by the way. What are you doing with Heidelberg, Germany and Heidelberg, South Africa? Oxford, Great Britain and Oxford, USA, where there is hardly even a state without one?

What information depends on this key? If not, then your table is more of a "variable type" than a "real table". In this case, you can just forget the table and make the cityname attributes just a plain string.

If you really need to create some "canonical spellings" for citynames when exporting data from a database, I would suggest trying to create a "phonetic lookup table" in which the "commonly used spellings" are linked to the "canonical spellings" you have to produce. However, expect serious effort to complete such tables.

In this case, in addition to the already mentioned München / Muenchen and Western / Greek alphabet problems, do not forget about the Liege / Luik / Luttich (Munich / Munich) problems.

+2


source


Things change their names or change names for them. Cities, Universities, Parks, People .. all are unusable as Primary Keys. A unique key perhaps? Or part of a unique key?

+1


source







All Articles