Convert Unicode URL to ASCII

I am writing a PHP application that takes a URL from the user and then processes it by making multiple calls to binaries using system()*. However, to avoid many of the complications that come with this, I am trying to convert a URL that can contain Unicode characters to ASCII characters.

Let's say I have the following URL:

https://täst.de:8118/news/zh-cn/新闻动态/2015/

      

There are two parts to handle here: the hostname and the path.

  • For the hostname, I can just call idn_to_ascii().
  • However, I cannot just call urlencode() on the path, because the characters that should remain unmodified are converted as well (news/zh-cn/新闻动态/2015/ -> news%2Fzh-cn%2F%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81%2F2015%2F as opposed to news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/).

How should I approach this problem?


* I'd rather not handle the calls to system() and the resulting complexity, but given that the functionality is only available when calling binaries, I unfortunately have no choice.


3 answers


You can use the following for this transformation:



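function convertpath ($path) {
  $path1 = '';
  $len = strlen ($path);
  // Walk the string byte by byte; a multibyte UTF-8 character is encoded one byte at a time.
  for ($i = 0; $i < $len; $i++) {
     // Keep URL-safe ASCII characters and delimiters (including existing %XX escapes) as they are.
     if (preg_match ('/^[A-Za-z0-9\/?=+%_.~-]$/', $path[$i])) {
       $path1 .= $path[$i];
     }
     else {
       // rawurlencode() rather than urlencode(), so a space becomes %20 instead of +.
       $path1 .= rawurlencode ($path[$i]);
     }
  }
  return $path1;
}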
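
The loop above can also be written as a single regular-expression pass. This is a minimal sketch, assuming UTF-8 input; encode_unsafe_bytes() is a hypothetical name, and it uses rawurlencode() so that a space becomes %20 rather than +.

```php
<?php
// Hypothetical helper (not a PHP built-in): percent-encode every run of bytes
// that falls outside the same "safe" character set used in convertpath().
function encode_unsafe_bytes(string $path): string
{
    return preg_replace_callback(
        // No /u modifier: the pattern works on raw bytes, so each byte of a
        // multibyte UTF-8 character ends up as its own %XX escape.
        '/[^A-Za-z0-9\/?=+%_.~-]+/',
        function (array $match): string {
            return rawurlencode($match[0]);
        },
        $path
    );
}

echo encode_unsafe_bytes('/news/zh-cn/新闻动态/2015/'), "\n";
// prints /news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/
```

Because % is in the safe set, escapes that are already present are left alone, but a stray literal % in the input is not repaired either.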

      



Split the URL on /, then urlencode() the part that contains the non-ASCII characters and put it back together:



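// "https://täst.de:8118/news/zh-cn/新闻动态/2015/" splits into
// ["https:", "", "täst.de:8118", "news", "zh-cn", "新闻动态", "2015", ""]
$url = explode("/", $url);
$url[2] = idn_to_ascii($url[2]);   // host part (including the port)
$url[5] = urlencode($url[5]);      // the non-ASCII path segment in this example
$url = join("/", $url);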
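
The fixed indexes above only fit the example URL. A more general version might look like the sketch below. It assumes an absolute URL in UTF-8, the intl extension for idn_to_ascii(), and a path that is not already percent-encoded (an existing % would be encoded again as %25). url_to_ascii() is a hypothetical name, not a PHP built-in, and the sketch needs PHP 7.4 or later for the arrow function.

```php
<?php
// Sketch only: url_to_ascii() is a hypothetical helper.
function url_to_ascii(string $url): string
{
    // Split into scheme, host, optional ":port", path, and the rest (query/fragment).
    $pattern = '~^([a-z][a-z0-9+.-]*)://([^/:?#]+)(:\d+)?([^?#]*)(.*)$~i';
    if (!preg_match($pattern, $url, $m)) {
        throw new InvalidArgumentException("Unsupported URL: $url");
    }
    [, $scheme, $host, $port, $path, $rest] = $m;

    // Hostname: convert to Punycode with the intl extension.
    $asciiHost = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        throw new InvalidArgumentException("Cannot convert host: $host");
    }

    // Path: encode each segment on its own so the slashes survive.
    $asciiPath = implode('/', array_map('rawurlencode', explode('/', $path)));

    // Query/fragment: leave the delimiters alone and escape only bytes
    // outside printable ASCII (assumption: that is enough for this use case).
    $asciiRest = preg_replace_callback(
        '/[^\x21-\x7E]+/',
        fn(array $r): string => rawurlencode($r[0]),
        $rest
    );

    return $scheme . '://' . $asciiHost . $port . $asciiPath . $asciiRest;
}

echo url_to_ascii('https://täst.de:8118/news/zh-cn/新闻动态/2015/'), "\n";
// prints https://xn--tst-qla.de:8118/news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/
```

Separating the port before calling idn_to_ascii() keeps the conversion limited to the hostname itself.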

      



You can use PHP's iconv function:

iconv("UTF-8", "ASCII//TRANSLIT", $url);

      
