Converting special characters to the alphabet

I have an excel file with a Name column that is in a different language. I need to convert these names to standard English.

Example:

HỒ ĐĂNG TẤN needs to be converted to HO DANG TAN.

NGUYỄN ĐỨC KIÊN - NGUYEN DUC KIEN

ĐOÀN THAN THHO-DOAN THI THANH

0


source to share


3 answers


Step 1: Open a Microsoft Visual Basic for Applications window.

Step 2: Click "Insert" → "Module" and enter the following macro in the module window.

Function StripAccent(thestring As String)
Dim A As String * 1
Dim B As String * 1
Dim i As Integer

Const AccChars= "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
Const RegChars= "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"

For i = 1 To Len(AccChars)
A = Mid(AccChars, i, 1)
B = Mid(RegChars, i, 1)
thestring = Replace(thestring, A, B)
Next
StripAccent = thestring
End Function

      



Step 3: Then go to a blank cell and paste the formula into the cell:
=CheckStringCHAR(InString)


for example =CheckStringCHAR("ù")

or =CheckStringCHAR(A2)

.

The list of symbols is not exhaustive. more at http://en.wikipedia.org/wiki/List_of_Latin-script_letters#Letters_with_diacritics

+6


source


Just for fun, below is a more efficient version of the internet rounds function inserted above as an answer.

Examples of test results (100 loops of 10000 char lines). Times are milliseconds per call, which are fetched from QueryPerformanceTimer.

Old: Min: 57.6 ms, Mean: 65.4 ms



New: Min: 22.1ms, Average: 24.4ms

The performance gains come from not creating new copies of the string with each replacement, instead of replacing characters in place with the Mid $ operator.

Public Function StripAccent(ByVal txt As String) As String
    Dim i As Long, j As Long, n As Long
    Const c1 = "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
    Const c2 = "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
    n = Len(c1)
    For i = 1 To n
        j = 0
        Do
            j = InStr(j + 1, txt, Mid$(c1, i, 1), vbBinaryCompare)
            If j > 0 Then Mid$(txt, j, 1) = Mid$(c2, i, 1) Else Exit Do
        Loop
    Next
    StripAccent = txt
End Function

      

0


source


Is there a way to update the script so that we can replace characters like ~ () = - with nothing? I know how to update it if there is a comparable character, but what if we just want to remove it? example: John-Paul becomes Jonpole

0


source







All Articles