MySQL diacritic-insensitive search?
I am using utf8 (utf8_general_ci), and searching for Arabic words without diacritics does not work: the search is not diacritic-insensitive, or at least it does not behave correctly.
I compared a character with and without diacritics using HEX(), and it looks like MySQL treats them as two different characters.
I am thinking about using HEX() and REPLACE() (a lot of replacements) to search for words while filtering out the diacritics.
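The HEX() comparison described in the question can be reproduced with a query along these lines. This is only a sketch: it assumes the utf8 character set, and it builds both strings from their UTF-8 bytes with UNHEX() so that the result does not depend on the client encoding. The column aliases are made up for illustration.

```sql
-- D9 83 is ARABIC LETTER KAF (U+0643); D9 8E is ARABIC FATHA (U+064E).
SELECT
  HEX(CONVERT(UNHEX('D983') USING utf8))     AS without_diacritic,  -- D983
  HEX(CONVERT(UNHEX('D983D98E') USING utf8)) AS with_diacritic,     -- D983D98E
  -- 1 if the collation treats both strings as equal, 0 otherwise
  CONVERT(UNHEX('D983') USING utf8)
    = CONVERT(UNHEX('D983D98E') USING utf8) COLLATE utf8_general_ci
                                             AS equal_under_general_ci;
```

The diacritic is stored as a separate two-byte character after the base letter, which is why the two hex strings differ.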
My solution for a diacritic-insensitive search for Arabic words:
SELECT arabic_word FROM Word
WHERE
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(HEX(REPLACE(
arabic_word, "-", "")), "D98E", ""), "D98B", ""), "D98F", ""), "D98C", ""),
"D991", ""), "D992", ""), "D990", ""), "D98D", "") LIKE ?
-- bound parameter (PHP): '%'.$search.'%'
The hex values are the diacritics that we want to filter out. It's ugly, but I haven't found another answer.
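Because the left-hand side of the LIKE is a hex string, the search term has to be hex-encoded too before it is compared. One way to do that inside the query itself is sketched below. It assumes a utf8 connection and uses a hypothetical user variable @search in place of the bound PHP parameter. For reference, the eight hex values are the UTF-8 encodings of U+064B through U+0652 (fathatan through sukun).

```sql
-- @search stands in for the application-supplied term (an assumption);
-- here it is KAF TEH BEH, built from its UTF-8 bytes.
SET @search = CONVERT(UNHEX('D983D8AAD8A8') USING utf8);

SELECT arabic_word
FROM Word
WHERE
  REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    HEX(REPLACE(arabic_word, '-', '')),
    'D98E', ''), 'D98B', ''), 'D98F', ''), 'D98C', ''),
    'D991', ''), 'D992', ''), 'D990', ''), 'D98D', '')
  -- hex-encode the term so both sides of LIKE are hex strings
  LIKE CONCAT('%', HEX(@search), '%');
```

One caveat with this approach: REPLACE() and LIKE work on hex digits, not on whole bytes, so a pattern such as D98E could in principle match across the boundary between two adjacent characters. An expression like this also cannot use an index on arabic_word.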
Have you already read through the MySQL Character Set Support documentation to see whether it answers your question? In particular, you should understand how comparisons work.
My guess is that using utf8_general_ci might do the right thing for you.
The cleanest solution I have come up with is:
SELECT arabic_word
FROM Word
WHERE ( arabic_word REGEXP '{$search}' OR SOUNDEX( arabic_word ) = SOUNDEX( '{$search}' ) );
I have not tested the cost of the SOUNDEX function. I guess this is feasible for small tables, but not for large datasets.