Does CharSet.Unicode in DllImport imply a string allocation?

[DllImport("foo.dll", CharSet = CharSet.Unicode)]
static extern void Process_utf16(string text, int text_length);

      

The native function is written to receive UTF-16 data (which is also what .NET strings use). It does not use the string data after it returns. So I am trying to make sure the pointer to the string buffer is passed directly, without any unnecessary allocation or copying.

With this declaration, is the pointer to the string buffer passed without allocation? Or is a temporary buffer allocated and the string copied into it? If an allocation occurs, is it on the native or the managed heap? And who is responsible for freeing it?

Please note that the above code has been tested and works. I am just trying to figure out whether it requires allocation and copying, and if so, how to avoid it.


1 answer


The rules for marshaling strings are described in "Default marshaling for strings". For strings used in platform invoke, the documentation states:

Platform invoke copies string arguments, converting from the .NET Framework format (Unicode) to the platform unmanaged format. Strings are immutable and are not copied back from unmanaged memory to managed memory when the call returns.

However, as can be trivially established by experiment, this is not true if no conversion is required at all, i.e. the method is decorated with CharSet.Unicode or the string is explicitly marked as MarshalAs(UnmanagedType.LPWStr). In this case, a pointer to the contents of the string is passed directly. This sounds very efficient, and it is, but it is also dangerous because nothing stops the unmanaged function from changing the string passed to it. This is bad because .NET strings are supposed to be immutable and code can depend on that. It is especially bad if the function ends up overwriting a string in the intern pool.

trample.c:

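#include <string.h>

__declspec(dllexport) void __stdcall Trample(wchar_t* text) {
    /* Overwrite the first five characters in place; the terminator is not copied. */
    memcpy(text, L"Adios", sizeof L"Adios" - sizeof(wchar_t));
}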

      

Program.cs:

using System;
using System.Runtime.InteropServices;

static class NativeMethods {
    [DllImport("trample.dll", CharSet = CharSet.Unicode)]
    public static extern void Trample(string text);
}

class Program {
    static void Main(string[] args) {
        Console.WriteLine("Hello, world!");
        // The native call writes through the pointer to the literal's buffer.
        NativeMethods.Trample("Hello, world!");
        Console.WriteLine("Hello, world!");
    }
}

      



Output:

Hello, world!
Adios, world!

      

Since "Hello, world!" is a string literal, all instances of it end up in the intern pool, and every time it is used we are using the same string. Our unmanaged function overwrites it, so now when we think we are writing "Hello, world!" in our managed code, we end up with something else instead. Oops.
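By contrast, a native function that only reads the buffer, as the function in the question is described as doing, cannot corrupt the shared string. The following is a minimal sketch of such a function, not the asker's actual code: the name count_utf16_units and its needle parameter are hypothetical, and uint16_t is assumed to stand in for a 16-bit wchar_t so the example does not depend on the platform's wchar_t size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read-only native function: counts how many UTF-16 code
   units in text are equal to needle. The const qualifier documents that
   the buffer is never written, and the explicit length means the function
   does not rely on a null terminator. */
size_t count_utf16_units(const uint16_t *text, int text_length, uint16_t needle)
{
    size_t count = 0;
    for (int i = 0; i < text_length; i++) {
        if (text[i] == needle) {
            count++;
        }
    }
    return count;
}
```

Declaring the parameter const does not change how the string is marshaled, but it lets the C compiler reject accidental writes through the pointer inside the function.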

To avoid this when you know that the unmanaged function modifies the string, pass a StringBuilder instead. Here you can choose whether you want the data copied to the function, from it, or both (with InAttribute / OutAttribute). This does involve copying to and from buffers; in particular, CoTaskMemAlloc is used to allocate the memory for the unmanaged code (and CoTaskMemFree is called when the call returns). This code is invoked by the marshaller as part of the call; neither the managed caller nor the unmanaged callee needs to worry about it.
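For illustration, here is a sketch of what the native side of such a call might look like when the function is meant to write into a caller-supplied buffer. The name fill_greeting and the capacity parameter are hypothetical, and uint16_t is again assumed to stand in for a 16-bit wchar_t. On the managed side, the buffer would typically come from a StringBuilder, with its Capacity passed as the second argument.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native function that writes into a caller-supplied buffer.
   capacity is the buffer size in UTF-16 code units, including room for
   the terminator. Returns the number of code units written, excluding
   the terminator. */
int fill_greeting(uint16_t *buffer, int capacity)
{
    static const uint16_t greeting[] = { 'A', 'd', 'i', 'o', 's' };
    const int greeting_length = (int)(sizeof greeting / sizeof greeting[0]);

    if (buffer == NULL || capacity <= 0) {
        return 0;
    }

    /* Truncate if needed so that the terminator always fits. */
    int n = greeting_length < capacity - 1 ? greeting_length : capacity - 1;
    for (int i = 0; i < n; i++) {
        buffer[i] = greeting[i];
    }
    buffer[n] = 0;
    return n;
}
```

Taking an explicit capacity keeps the function from writing past the end of whatever buffer the caller hands it.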

Calling a function that expects an ANSI string also involves a buffer allocation, but in this case the buffer is allocated with the localloc instruction (that is, on the stack) rather than with CoTaskMemAlloc, which is far more efficient. However, if you are really after efficiency, what you want to do is eliminate the unmanaged code where possible, not just optimize the string passing. Even ignoring memory copies, there is quite a bit of overhead in managed/unmanaged transitions. If you find yourself calling unmanaged code in a loop, it pays to check whether that code can be ported to managed code.

Source: coreclr/src/vm/ilmarshalers.cpp, in particular ILWSTRMarshaler::EmitConvertSpaceAndContentsCLRToNativeTemp.
