Remove Arabic diacritics
I want to use PHP to convert this:
Text : الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
converted to : الحمد لله رب العالمين
I'm not sure where to start or how to go about it. I did some research and found this link, http://www.suhailkaleem.com/2009/08/26/remove-diacritics-from-arabic-text-quran/, but it doesn't use PHP. I would like to use PHP to convert the text above into the converted text. I want to remove every diacritical mark from Arabic text entered by users.
The vowel diacritics in Arabic are combining characters, which means a simple search for the marks themselves is enough. There is no need for a replacement rule for every possible consonant combined with every possible vowel, which would be tedious.
Here's a working example that outputs what you need:
header('Content-Type: text/html; charset=utf-8', true);
$string = 'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ';
$remove = array('ِ', 'ُ', 'ٓ', 'ٰ', 'ْ', 'ٌ', 'ٍ', 'ً', 'ّ', 'َ');
$string = str_replace($remove, '', $string);
echo $string; // outputs الحمد لله رب العالمين
Note the array $remove. It looks odd because there is a combining character between each pair of ' quotes, so each mark attaches itself to one of the single quotes when displayed. The file may need to be saved in the same character encoding as your text.
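If the quoted combining characters are hard to read or edit, the same list can be written with code point escapes instead. This is only a sketch: it assumes PHP 7.0 or later for the "\u{...}" syntax, and stripDiacritics() is a hypothetical helper name, not something from the answer above.

```php
<?php
// The same ten marks as in $remove above, written as Unicode escapes.
// Assumption: PHP 7.0+ (the "\u{...}" escape needs double quotes).
$remove = [
    "\u{064B}", // fathatan
    "\u{064C}", // dammatan
    "\u{064D}", // kasratan
    "\u{064E}", // fatha
    "\u{064F}", // damma
    "\u{0650}", // kasra
    "\u{0651}", // shadda
    "\u{0652}", // sukun
    "\u{0653}", // maddah above
    "\u{0670}", // superscript alef
];

// Hypothetical helper: removes every listed mark from the text.
function stripDiacritics(string $text, array $marks): string
{
    return str_replace($marks, '', $text);
}

echo stripDiacritics('الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ', $remove), PHP_EOL;
```

Because the marks are spelled as escapes, the source file no longer shows stray marks attached to the quotes. The output is still UTF-8.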
Try this:
$string = 'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ';
$string = preg_replace("~[\x{064B}-\x{065B}]~u", "", $string);
echo $string; // outputs الحمد لله رب العالمين
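One thing to watch with the range above: it stops at U+065B, so it does not cover the superscript alef (U+0670) that the $remove array in the first answer strips. A possible variant, offered as a sketch, widens the character class. The function name removeHarakat() is hypothetical, and the choice of U+064B to U+065F plus U+0670 is an assumption about which marks you want gone.

```php
<?php
// Hypothetical helper: strips Arabic combining marks with one regex.
function removeHarakat(string $text): string
{
    // U+064B to U+065F: tanwin, short vowels, shadda, sukun and related marks.
    // U+0670: superscript alef.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/[\x{064B}-\x{065F}\x{0670}]/u', '', $text);

    // preg_replace() returns null on failure (for example invalid UTF-8),
    // in which case the input is handed back unchanged.
    return $result ?? $text;
}

echo removeHarakat('الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ'), PHP_EOL;
```

Single quotes around the pattern keep PHP from interpreting the backslashes, so \x{...} reaches the regex engine as written.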
I don't know Arabic, but I think you could build a replacement map for the characters in the alphabet:
function remap($string) {
    // Each key is a character to replace; each value is its replacement.
    $remap = [
        'ą' => 'a',
        'č' => 'c',
        /* ... Arabic alphabet remap */
    ];
    return str_replace(array_keys($remap), $remap, $string);
}
echo remap('ąčasdadfg'); // => acasdadfg
Try this code, it works fine:
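// $str holds the input text.
// Note: these ranges remove more than diacritics. U+0600 to U+061F includes
// punctuation such as the Arabic comma (U+060C), and U+066A to U+06FF includes
// letters used in Persian and Urdu (for example U+067E).
$unicode = [
    "~[\x{0600}-\x{061F}]~u",
    "~[\x{063B}-\x{063F}]~u",
    "~[\x{064B}-\x{065E}]~u",
    "~[\x{066A}-\x{06FF}]~u",
];
$str = preg_replace($unicode, "", $str);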
Reference: the Arabic Unicode block (U+0600 to U+06FF).