PHP filename encoding conversion error
I am trying to rename files in a folder from PHP. This mostly works, although I'm having issues with accented characters.
An example of a filename with accented characters is ÅRE_GRÖN.JPG.
I would like to rename this file to ARE_GRON.JPG.
If I read the files like this:
<?php
$path = __DIR__;
$dir_handle = opendir($path);
// Compare against false so that a file named "0" does not end the loop early.
while (false !== ($file = readdir($dir_handle))) {
    echo $file . "\n";
}
closedir($dir_handle);
... the page does not display ÅRE_GRÖN.JPG correctly.
If I add header('Content-Type: text/html; charset=UTF-8'); to the beginning of my script, it displays the correct filename, but the rename() function doesn't seem to have any effect.
Here's what I've tried:
while ($file = readdir($dir_handle)) {
rename($file, str_replace('Ö', 'O', $file)); # No effect
rename($file, str_replace('Ö', 'O', $file)); # No effect
}
Where am I going wrong?
Let me know if you think I am using the wrong tool for the job. If anyone knows how to achieve this with a Bash script, please show me; I have no Bash experience.
I figured out how to do it.
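I first ran the filename through urlencode(). This converts the string:
MÖRKGRÅ.JPG
to the URL-encoded form:
MO%CC%88RKGRA%CC%8A.JPG
The encoded form shows that each accented letter is stored as a plain letter followed by a combining mark (for example, O followed by %CC%88), which would explain why searching for Ö directly found nothing.
Then I ran str_replace() on the URL-encoded string, passing the search and replacement strings as arrays. I only needed this for a few Swedish characters, so my solution looked like this: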
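<?php
header('Content-Type: text/html; charset=UTF-8');
$path = __DIR__;
// URL-encoded decomposed sequences: base letter followed by a combining mark.
$search = array('A%CC%8A', 'A%CC%88', 'O%CC%88');
$replace = array('A', 'A', 'O');
$dir_handle = opendir($path);
// Compare against false so that a file named "0" does not end the loop early.
while (false !== ($file = readdir($dir_handle))) {
    if ($file === '.' || $file === '..') {
        continue; // skip the directory entries themselves
    }
    // Note: any other character that urlencode() escapes stays escaped in the new name.
    $new_name = str_replace($search, $replace, urlencode($file));
    rename($path . '/' . $file, $path . '/' . $new_name);
}
closedir($dir_handle);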
Job done :)
I realized that this is more versatile than I expected. Running urlencode() in another script gave me a slightly different result, but it's easy to change the arrays accordingly:
$search = array('%26Aring%3B', '%26Auml%3B', '%26Ouml%3B', '+');
$replace = array('A', 'A', 'O', '_');
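Since the encoded form differs from one script to the next, it can help to look at the raw bytes of a name before choosing the search strings. The following is only a minimal sketch: hex_bytes is a hypothetical helper name, and it assumes a POSIX od and tr are available.

```shell
# hex_bytes STRING: print the bytes of STRING as lowercase hex, with no spaces.
# (hex_bytes is a hypothetical helper name used for illustration.)
hex_bytes() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d '[:space:]'
    printf '\n'
}

# A decomposed O with diaeresis is the letter O (4f) followed by U+0308 (cc 88):
hex_bytes "$(printf 'O\314\210')"
# A precomposed O with diaeresis is the single character U+00D6 (c3 96):
hex_bytes "$(printf '\303\226')"
```

A name that contains cc 88 or cc 8a after a plain letter is in the decomposed form that the %CC%88 and %CC%8A search strings above target.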
If you only have a limited number of characters to replace, you can do it with:
for f in *; do mv "$f" "${f//Ö/O}" 2> /dev/null; done
With GNU sed, you can handle the general case with:
expr=""
for char in {A..Z}
do
expr+="s/[[=$char=]]/$char/g; ";
done;
for f in *; do
mv "$f" "$(sed -e "$expr" <<< "$f")" 2> /dev/null;
done
This replaces every A-like accented character with ASCII A, and does the same for each letter of the alphabet. I can't guarantee that it works with OS X sed. Beware that it has the side effect of capitalizing all filenames.
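The loops above look for accented letters as single characters. The urlencode() output shown earlier (O%CC%88, A%CC%8A) suggests the filenames in the question are stored decomposed, as a plain letter followed by a combining mark, in which case those patterns may leave the mark behind. A minimal sketch of an alternative is to delete the combining marks themselves. strip_marks is a hypothetical helper name, and only the two marks from the question are handled.

```shell
# Combining marks as raw UTF-8 bytes, written as octal escapes for printf.
ring=$(printf '\314\212')       # U+030A COMBINING RING ABOVE (%CC%8A)
diaeresis=$(printf '\314\210')  # U+0308 COMBINING DIAERESIS  (%CC%88)

# strip_marks NAME: print NAME with both combining marks removed.
# (strip_marks is a hypothetical helper name used for illustration.)
# LC_ALL=C makes sed treat the input as plain bytes.
strip_marks() {
    printf '%s\n' "$1" | LC_ALL=C sed -e "s/$ring//g" -e "s/$diaeresis//g"
}

# Possible use, renaming only when the name actually changes:
#   for f in *; do
#       new=$(strip_marks "$f")
#       [ "$f" = "$new" ] || mv -- "$f" "$new"
#   done
```

Unlike the sed loop above, this leaves the case of the base letters alone, but it does nothing for names that use precomposed characters.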