MySQL diacritic-insensitive search?
How can I make a search diacritic-insensitive?
For example, this Persian string with diacritics
هواى بر آفتاب بارز
does not match the same string with the diacritics removed in MySQL:
هواى بر آفتاب بارز
Is there a way to tell MySQL to ignore the diacritics, or do I need to remove all the diacritics from my fields manually?
This is a bit like the case-insensitivity problem:
SELECT * FROM blah WHERE UPPER(foo) = "THOMAS"
Just strip the diacritics from both strings before comparing them.
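To illustrate the idea of normalizing both sides before comparing, here is a minimal sketch in Python rather than SQL. It assumes the normalization can happen in application code (for example, when filling a separate search column), and the function names strip_marks and same_ignoring_marks are made up for this example. It drops every character in Unicode category Mn (nonspacing marks) and leaves precomposed letters such as آ untouched.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Return text without nonspacing combining marks (category Mn)."""
    # No decomposition is applied, so precomposed letters stay as they are.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def same_ignoring_marks(a: str, b: str) -> bool:
    """Compare two strings after stripping marks from both sides."""
    return strip_marks(a) == strip_marks(b)


# Illustrative sample: three Arabic letters, each followed by a fatha (U+064E).
with_marks = "\u0643\u064E\u062A\u064E\u0628\u064E"
without_marks = "\u0643\u062A\u0628"

print(with_marks == without_marks)                     # False
print(same_ignoring_marks(with_marks, without_marks))  # True
```

The same function has to be applied to the stored value and to the search term, otherwise the two sides will not line up.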
I am using utf8 (utf8_general_ci), and searching Arabic text without diacritics does not work: either the comparison is not diacritic-insensitive or it is not working correctly.
I looked at a character with and without diacritics using HEX, and it looks like MySQL treats them as two different characters.
I am thinking about using HEX and REPLACE (a lot of REPLACE calls) to filter out the diacritics when searching for words.
My solution for a diacritic-insensitive search on Arabic words:
SELECT arabic_word FROM Word
WHERE
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(HEX(REPLACE(
arabic_word, "-", "")), "D98E", ""), "D98B", ""), "D98F", ""), "D98C", ""),
"D991", ""), "D992", ""), "D990", ""), "D98D", "") LIKE ?
The ? placeholder is bound to '%'.$search.'%' on the PHP side. The hex values are the UTF-8 encodings of the diacritics that we want to filter out. It's ugly, but I haven't found another way.
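For anyone wondering what those hex values stand for, here is a small Python sketch (not part of the original query) that decodes them as UTF-8 and prints the code point and Unicode name of each. The names HEX_MARKS, decode_mark and strip_listed_marks are made up for this example. It also shows the same filtering done on characters instead of hex text, assuming you can preprocess the strings outside MySQL.

```python
import unicodedata

# Hex strings copied from the query above, in the same order.
HEX_MARKS = ["D98E", "D98B", "D98F", "D98C", "D991", "D992", "D990", "D98D"]


def decode_mark(hex_bytes: str) -> str:
    """Decode a hex string such as 'D98E' as UTF-8 into one character."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


MARKS = [decode_mark(h) for h in HEX_MARKS]

# Show which code point each hex value stands for.
for hex_value, mark in zip(HEX_MARKS, MARKS):
    print(hex_value, f"U+{ord(mark):04X}", unicodedata.name(mark))


def strip_listed_marks(text: str) -> str:
    """Remove only the eight marks listed in HEX_MARKS."""
    return text.translate({ord(mark): None for mark in MARKS})
```

All eight decode to code points in the range U+064B to U+0652, which are the Arabic short-vowel and related marks.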
Have you already read through the MySQL Character Set Support documentation to see whether it answers your question? The collations in particular are worth understanding.
My guess is that using utf8_general_ci might do the right thing for you.
Running
set names 'utf8'
before executing the query usually does the trick for Latin searches. I'm not sure whether it works for Arabic as well.
The cleanest solution I have come to is:
SELECT arabic_word
FROM Word
WHERE ( arabic_word REGEXP '{$search}' OR SOUNDEX( arabic_word ) = SOUNDEX( '{$search}' ) );
I have not tested the cost of the SOUNDEX function. I guess this is feasible for small tables, but not for large datasets.