Sorting Chinese names in PHP

I have an array in which each element contains a first and last name:

$input = [
  [
    'firstName' => 'foo',
    'lastName' => 'bar',
  ]
];

      

Most of the names are written in the Latin alphabet, but some of them are written in Chinese.

How do I sort this list of names using PHP?

I am also curious about the convention. I know that in languages that use the Latin alphabet, sometimes the first name comes first and other times the last name does. I'm curious whether the situation is similar in Mandarin, or whether one order is usually preferred over the other.

And finally, I'm curious if there is a difference between sorting names and sorting words, for example in a dictionary.



1 answer


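A really interesting question! Each character has a Unicode code point, and the simplest sort orders strings by those values. Since Latin letters are in the ASCII range, which has the lowest code points, those names always appear first. PHP's asort function is not Unicode aware as such: it compares strings byte by byte. For UTF-8 encoded strings, however, byte order matches code point order, so the result is the same. Consider the following input: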

$input = [
    [
        "firstName" => "一",
        "lastName"  => "風"
    ],
    [
        "firstName" => "이",
        "lastName"  => "정윤"
    ],
    [
        "firstName" => "Mari",
        "lastName"  => "M"
    ],
    [
        "firstName" => "三",
        "lastName"  => "火"
    ],
];

      

To summarize what I expect to see, assuming we sort by first name:

  • Latin name first (Mari M)
  • Hanzi / kanji / hangul next. I don't know the code point values of these characters offhand, so we have to look them up.

Let's convert the first character of the first names to something numeric. Again, we're using Unicode for this conversion:

  • 一 - 0x4E00
  • 이 - 0xC774
  • M - 0x004D
  • 三 - 0x4E09
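The values above can be reproduced in PHP. The following is a minimal sketch, assuming the mbstring extension is enabled (mb_ord requires PHP 7.2 or later); the variable names $firstNames and $codePoints are only illustrative.

```php
<?php
// Sketch: look up the Unicode code point of the first character
// of each first name. Assumes mbstring is available.
$firstNames = ["一", "이", "Mari", "三"];

$codePoints = [];
foreach ($firstNames as $name) {
    // Take the first character (not the first byte) of the name.
    $first = mb_substr($name, 0, 1, "UTF-8");
    // Convert that character to its numeric code point.
    $codePoints[$name] = mb_ord($first, "UTF-8");
}

// Print each name with its code point in hexadecimal.
foreach ($codePoints as $name => $cp) {
    printf("%s - 0x%04X\n", $name, $cp);
}
```

Sorting $codePoints with asort would then give the order in which the first names are expected to appear.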

As such, I expect to see:

  • M
  • 一
  • 三
  • 이


Here is my code using asort:

$nameByFirst = [];
foreach( $input as $i )
{
    $nameByFirst[] = $i["firstName"]." ".$i["lastName"];
}
asort($nameByFirst);

      

And my print method:

$i = 1;
foreach( $nameByFirst as $name )
{
    echo $i.'.  '.$name."<br>";
    $i++;
}

      

And my output:

  • Mari M
  • 一 風
  • 三 火
  • 이 정윤

My results, as you can see above, match what I expected: first Latin, then hanzi / kanji, then hangul. Unicode order is the closest thing to a sensible ordering that I believe we can get easily, so I like this approach. I'm not 100% sure how Unicode assigns values to hanzi / kanji / hangul, but I'm willing to trust the order it provides, especially because of how simple it is.
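If the original records need to stay intact rather than being flattened into strings, the same code point ordering can be applied with usort. This is only a sketch under the assumption that all names are valid UTF-8; it uses strcmp, which compares bytes, and for UTF-8 that matches code point order. The $sorted variable is introduced here purely for display.

```php
<?php
$input = [
    ["firstName" => "一",   "lastName" => "風"],
    ["firstName" => "이",   "lastName" => "정윤"],
    ["firstName" => "Mari", "lastName" => "M"],
    ["firstName" => "三",   "lastName" => "火"],
];

// Compare by first name, then by last name as a tie-breaker.
// strcmp is a byte-wise comparison; no language-specific
// collation (such as pinyin order) is applied here.
usort($input, function (array $a, array $b): int {
    return strcmp($a["firstName"], $b["firstName"])
        ?: strcmp($a["lastName"], $b["lastName"]);
});

// Build display strings from the sorted records.
$sorted = array_map(function (array $p): string {
    return $p["firstName"] . " " . $p["lastName"];
}, $input);

print_r($sorted);
```

Note that this orders Chinese names by code point, not by pronunciation or stroke count, so it may differ from the order a Chinese dictionary would use.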
