Convert Unicode URL to ASCII
I am writing a PHP application that takes a URL from the user and then processes it by making multiple calls to binaries using system()*. However, to avoid many of the complications that come with this, I am trying to convert a URL that can contain Unicode characters into one that contains only ASCII characters.
Let's say I have the following URL:
https://täst.de:8118/news/zh-cn/新闻动态/2015/
There are two parts to handle here: the hostname and the path.
- For the hostname, I can just call idn_to_ascii().
- However, I cannot just call urlencode() on the path, because the characters that should remain unmodified (the slashes) will also be converted: news/zh-cn/新闻动态/2015/ -> news%2Fzh-cn%2F%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81%2F2015%2F as opposed to news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/.
How should I approach this problem?
* I'd rather not deal with the system() calls and the resulting complexity, but since the functionality I need is only available through these binaries, I unfortunately have no choice.
3 answers
You can use the following for this transformation:
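// Percent-encode every byte of $path that is not in the allowed set.
// PHP strings are byte arrays, so a multi-byte UTF-8 character is
// encoded one byte at a time (e.g. "新" becomes %E6%96%B0).
function convertpath ($path) {
    $path1 = '';
    $len = strlen ($path);
    for ($i = 0; $i < $len; $i++) {
        // Unreserved characters, "/", and the few delimiters kept as-is.
        // "%" is allowed so that existing escapes are not encoded twice.
        if (preg_match ('/^[A-Za-z0-9\/?=+%_.~-]$/', $path[$i])) {
            $path1 .= $path[$i];
        }
        else {
            // rawurlencode() rather than urlencode(), so that a space
            // becomes %20 instead of "+", which is wrong inside a path.
            $path1 .= rawurlencode ($path[$i]);
        }
    }
    return $path1;
}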
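For the whole URL from the question, the two steps (hostname and path) could be combined roughly as follows. This is only a sketch under a few assumptions: the function names encode_path() and url_to_ascii() are made up for illustration, the input is raw UTF-8 that is not already percent-encoded (an existing "%" would be encoded again as %25), the intl extension is loaded so that idn_to_ascii() is available, and parse_url() splits the UTF-8 input correctly, which may depend on the locale. The query string and fragment are ignored.

```php
<?php
// Hypothetical helper: encode each path segment separately so that the
// "/" separators are kept. Assumes $path is not percent-encoded yet.
function encode_path(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// Hypothetical helper: rebuild scheme, host, port and path in ASCII.
// Assumes the intl extension is loaded (needed for idn_to_ascii()).
function url_to_ascii(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Cannot parse URL: $url");
    }

    // Convert the hostname to its Punycode (xn--...) form.
    $host = idn_to_ascii($parts['host'], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($host === false) {
        throw new InvalidArgumentException("Cannot convert hostname: {$parts['host']}");
    }

    $result = $parts['scheme'] . '://' . $host;
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    if (isset($parts['path'])) {
        $result .= encode_path($parts['path']);
    }
    return $result; // query string and fragment are dropped in this sketch
}
```

With this, encode_path('/news/zh-cn/新闻动态/2015/') gives /news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/, which is the form asked for in the question. Since the result is passed to system(), it is probably still worth wrapping it in escapeshellarg() before building the command line.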