Delphi - converting string from UTF-8

I'm having a problem converting a UTF-8 encoded string into something Delphi can use. The app is written in XE8 and deploys to Windows and OS X. It uses the LimeLM API, which ships as a DLL for Windows and a dylib for OS X. Everything works fine on Windows; the problem is converting strings returned from the dylib on OS X. I understand that all strings passed to and returned from the dylib must be UTF-8 encoded. The LimeLM function returns a PWideChar, which I believe is UTF-8 encoded. But no matter which function I use to convert the value into something usable in Delphi, all I get is garbage.

Here's the function:

class function TurboActivate.GetFeatureValue(featureName: String): String;
var
    value : PWideChar;
    FieldName : PWideChar;
    tmpStr : String;
begin

    {$IFDEF MSWINDOWS}
    FieldName := PwideChar(featureName);
    {$ENDIF}
    {$IFDEF MACOS}
    FieldName := PWideChar(UTF8Encode(featureName));
    {$ENDIF}


    value := GetFeatureValue(FieldName, nil);

    if (value = '') then
    begin
        raise ETurboActivateException.Create('Failed to get feature value.  the feature doesn''t exist.');
    end;
    {$IFDEF MSWINDOWS}
    Result := value;
    {$ENDIF}
    {$IFDEF MACOS}
    tmpStr :=  UTF8ToString(value);
    ShowMessage(tmpStr);
    tmpStr :=  UTF8ToWideString(value);
    ShowMessage(tmpStr);
    tmpStr :=  UTF8ToUnicodeString(value);
    ShowMessage(tmpStr);
    tmpStr :=  UTF8ToAnsi(value);
    ShowMessage(tmpStr);

    Result := TmpStr;
    {$ENDIF}

end; 

      

There is definitely a value to decode: value = '散汤湡獤杀潯汧浥楡⹬潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4

but tmpStr always contains '?????????? c ?????? /'.

Any help would be greatly appreciated.



Answer


value = '散汤湡獤杀潯汧浥楡⹬潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎́'#4

This indicates that you are interpreting 8-bit text, presumably UTF-8 encoded, as if it were UTF-16 encoded: each pair of bytes is read as a single 16-bit code unit, and pairs of ASCII letters happen to land in the CJK range. Typically, when you see a UTF-16 string full of Chinese characters, either it really is Chinese text, or it is misinterpreted 8-bit text.
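To see the effect in isolation, here is a small Delphi console sketch (the function name MisreadAsUtf16 is just for illustration) that encodes an ASCII string as UTF-8 and then decodes the very same bytes as UTF-16LE, which is roughly what reading a UTF-8 buffer through a PWideChar amounts to on a little-endian machine.

```delphi
program MisreadUtf8AsUtf16;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Encodes Text as UTF-8, then (wrongly) decodes those same bytes as UTF-16LE.
function MisreadAsUtf16(const Text: string): string;
var
  Bytes: TBytes;
begin
  Bytes := TEncoding.UTF8.GetBytes(Text);
  Result := TEncoding.Unicode.GetString(Bytes);
end;

var
  S: string;
begin
  S := MisreadAsUtf16('cedlands');
  // 8 bytes become 4 code units; 'c' ($63) is the low byte and
  // 'e' ($65) the high byte of the first one.
  Writeln(Length(S));               // 4
  Writeln(IntToHex(Ord(S[1]), 4));  // 6563
end.
```

$6563 is the first character of the value shown in the question, and the remaining three units match the next three characters in the same way.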

When you correctly interpret this text as UTF-8, it is:

cedlands@googlemail.com 4CSA-7GFJ-YMW4-2VTF-II5Q-BNTA♥♦

      

I got it with this code:

  Writeln(TEncoding.UTF8.GetString(
    TEncoding.Unicode.GetBytes('散汤湡獤杀潯汧浥楡⹬潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎'#$0341#4)));

      

Note that if you look at the byte array returned by TEncoding.Unicode.GetBytes('散汤湡獤杀潯汧浥楡⹬潣m䌴䅓㜭䙇ⵊ䵙㑗㈭呖ⵆ䥉儵䈭呎'#$0341#4), you will see that it contains a zero byte immediately after the email address. So the UTF-8 text is actually a null-terminated string that ends after the email address.
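A rough way to check this without the original buffer: the sketch below rebuilds the raw bytes from the decoded text shown earlier, assuming that the heart and diamond glyphs are how the console rendered the control characters #3 and #4 and that the buffer ends with a zero byte. It then repeats the Unicode round trip and looks for the first zero byte. FirstZeroByte is a hypothetical helper.

```delphi
program FindUtf8Terminator;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Returns the index of the first zero byte, or -1 if there is none.
function FirstZeroByte(const Bytes: TBytes): Integer;
var
  I: Integer;
begin
  for I := 0 to High(Bytes) do
    if Bytes[I] = 0 then
      Exit(I);
  Result := -1;
end;

var
  Payload, Bytes: TBytes;
  Misread: string;
begin
  // Assumed raw buffer, reconstructed from the decoded text.
  Payload := TEncoding.UTF8.GetBytes(
    'cedlands@googlemail.com'#0'4CSA-7GFJ-YMW4-2VTF-II5Q-BNTA'#3#4#0);
  // What the debugger showed: the same bytes viewed as UTF-16.
  Misread := TEncoding.Unicode.GetString(Payload);
  // Converting back to bytes recovers the original buffer.
  Bytes := TEncoding.Unicode.GetBytes(Misread);
  Writeln(FirstZeroByte(Bytes));  // 23
  Writeln(TEncoding.UTF8.GetString(Bytes, 0, FirstZeroByte(Bytes)));  // cedlands@googlemail.com
end.
```

Index 23 is exactly the length of "cedlands@googlemail.com", so the zero byte is the UTF-8 terminator.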

The problems start here:



value : PWideChar;
....
value := GetFeatureValue(FieldName, nil);

      

In fact, on OS X GetFeatureValue returns PAnsiChar, and the payload is UTF-8 encoded, if I'm interpreting you correctly.

So, you need to make the following changes:

  • Change the return type of GetFeatureValue to PAnsiChar.
  • Change the type of value to PAnsiChar.
  • Convert value to string with UnicodeFromLocaleChars or TEncoding.GetString.

It might look like this:

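var
  Bytes: TBytes;
....
// value is a null-terminated PAnsiChar holding UTF-8 bytes
SetLength(Bytes, StrLen(value));               // byte count up to the terminator
Move(value^, Pointer(Bytes)^, Length(Bytes));  // copy the raw bytes
str := TEncoding.UTF8.GetString(Bytes);        // decode UTF-8 into a Delphi string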
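Putting the pieces together, here is a minimal runnable sketch. Since the LimeLM dylib isn't available here, FakeGetFeatureValue is a hypothetical stand-in that returns a PAnsiChar to a null-terminated UTF-8 buffer, and the helper names are illustrative. It places the byte-copy approach from the snippet above next to the UnicodeFromLocaleChars alternative; the latter assumes that calling UnicodeFromLocaleChars with a nil destination and zero length returns the required number of UTF-16 code units, as MultiByteToWideChar does on Windows.

```delphi
program FeatureValueUtf8;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  Utf8CodePage = 65001;

var
  FakeStore: UTF8String;

// Hypothetical stand-in for the OS X GetFeatureValue import after the fix.
// The real code would call the LimeLM dylib instead.
function FakeGetFeatureValue(featureName: PAnsiChar; reserved: Pointer): PAnsiChar;
begin
  FakeStore := UTF8String('cedlands@googlemail.com');
  Result := PAnsiChar(FakeStore);
end;

// Number of bytes before the terminating zero byte (0 for nil).
function ByteLenZ(value: PAnsiChar): Integer;
begin
  Result := 0;
  if value <> nil then
    while value[Result] <> #0 do
      Inc(Result);
end;

// Byte-copy approach: copy into TBytes, then decode as UTF-8.
function Utf8ToStrViaBytes(value: PAnsiChar): string;
var
  Bytes: TBytes;
begin
  SetLength(Bytes, ByteLenZ(value));
  if Length(Bytes) = 0 then
    Exit('');
  Move(value^, Bytes[0], Length(Bytes));
  Result := TEncoding.UTF8.GetString(Bytes);
end;

// Alternative: ask UnicodeFromLocaleChars for the required length, then convert.
function Utf8ToStrViaLocaleChars(value: PAnsiChar): string;
var
  ByteLen, CharLen: Integer;
begin
  Result := '';
  ByteLen := ByteLenZ(value);
  if ByteLen = 0 then
    Exit;
  CharLen := UnicodeFromLocaleChars(Utf8CodePage, 0, value, ByteLen, nil, 0);
  SetLength(Result, CharLen);
  UnicodeFromLocaleChars(Utf8CodePage, 0, value, ByteLen, PWideChar(Result), CharLen);
end;

var
  Name: UTF8String;
  value: PAnsiChar;
begin
  // Assumption: the feature name is also passed to the dylib as UTF-8.
  Name := UTF8String('email');
  value := FakeGetFeatureValue(PAnsiChar(Name), nil);
  Writeln(Utf8ToStrViaBytes(value));
  Writeln(Utf8ToStrViaLocaleChars(value));
end.
```

Both lines should print cedlands@googlemail.com. The length is counted byte by byte because a UTF-8 string ends at a single zero byte, not at a zero 16-bit unit.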

      

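Now, for the data in question, this sets str to cedlands@googlemail.com. As mentioned above, the data contains a null terminator, but it is not recognized as one when the buffer is mistakenly interpreted as UTF-16: the zero byte pairs with the preceding 'm' to form the code unit $006D, so the scan for a terminator carries on. That is, the text 4CSA-7GFJ-YMW4-2VTF-II5Q-BNTA♥♦ comes from reading past the end of the buffer (a buffer overrun).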
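To make the overrun concrete, the sketch below measures the same reconstructed buffer (under the same assumption about the heart and diamond glyphs) in two ways: up to the first zero byte, as a PAnsiChar would be measured, and up to the first zero 16-bit unit, as a PWideChar would be. The helper names are illustrative. A result of -1 from Utf16ZLength means the UTF-16 scan finds no terminator inside the buffer, so a real PWideChar read would keep going into whatever memory follows.

```delphi
program Utf16TerminatorScan;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Bytes before the first zero byte; -1 if the buffer has none.
function Utf8ZLength(const Buf: TBytes): Integer;
begin
  Result := 0;
  while (Result < Length(Buf)) and (Buf[Result] <> 0) do
    Inc(Result);
  if Result = Length(Buf) then
    Result := -1;
end;

// 16-bit units before the first $0000 unit, reading byte pairs from
// even offsets; -1 if the scan would run off the end of Buf.
function Utf16ZLength(const Buf: TBytes): Integer;
var
  I: Integer;
begin
  I := 0;
  while I + 1 < Length(Buf) do
  begin
    if (Buf[I] = 0) and (Buf[I + 1] = 0) then
      Exit(I div 2);
    Inc(I, 2);
  end;
  Result := -1;
end;

var
  Buf: TBytes;
begin
  Buf := TEncoding.UTF8.GetBytes(
    'cedlands@googlemail.com'#0'4CSA-7GFJ-YMW4-2VTF-II5Q-BNTA'#3#4#0);
  Writeln(Utf8ZLength(Buf));   // 23
  Writeln(Utf16ZLength(Buf));  // -1
end.
```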
