CUDA global memory deallocation problems in .NET.

I have a class (see the example below) that acts as a .NET wrapper for a CUDA memory structure
allocated with cudaMalloc() and referenced through a member field of type IntPtr.
(The class uses DllImport on our own C DLL, which wraps various CUDA functions.)

The Dispose method checks whether the pointer is IntPtr.Zero and, if not, calls cudaFree(),
which successfully frees the memory (returns CUDA success),
and then sets the pointer to IntPtr.Zero.

The finalizer calls the Dispose method. The problem is that if the finalizer runs without Dispose() having been called earlier, the cudaFree() call fails with the error code "invalid device pointer".

I checked: the address that cudaFree() receives is the same address that was returned by cudaMalloc(), and Dispose() had not been called earlier.

When I add an explicit call to Dispose(), the same address is released successfully.

The only workaround I found was not to call the Dispose method from the finalizer; however, this can cause a memory leak if Dispose() is not always called.

Any ideas why this is happening? I ran into the same problem with CUDA 2.2 and 2.3 under .NET 3.5 SP1, on Windows Vista 64-bit with a GeForce 8800 and on Windows XP 32-bit with a Quadro FX (not sure which model).

class CudaEntity : IDisposable
{
    private IntPtr dataPointer;

    public CudaEntity()
    {
        // Calls cudaMalloc() via DllImport,
        // receives the error code and throws an exception if it is not 0,
        // assigns the returned address to this.dataPointer
    }

    public void Dispose()
    {
        if (this.dataPointer != IntPtr.Zero)
        {
            // Calls cudaFree() via DllImport,
            // receives the error code and throws an exception if it is not 0

            this.dataPointer = IntPtr.Zero;
        }
    }

    ~CudaEntity()
    {
        Dispose();
    }
}

{
    // This code works
    var myEntity = new CudaEntity();
    myEntity.Dispose();
}

{
    // This code causes an "invalid device pointer"
    // error on the finalizer's call to cudaFree()
    var myEntity = new CudaEntity();
}

1 answer


The problem is that finalizers are executed on the GC's finalizer thread, and a CUDA resource allocated in one thread cannot be used from another. See the CUDA Programming Guide:

Multiple host threads can execute device code on the same device, but by design, a host thread can only execute device code on one device. As a consequence, multiple host threads are required to execute device code across multiple devices. In addition, any CUDA resources created at runtime in one host thread cannot be used by runtime from another host thread.
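
To see the thread switch for yourself, here is a minimal sketch that uses no CUDA at all. `ThreadProbe` and `FinalizerThreadDemo` are names made up for this illustration; the finalizer only records the managed thread ID it runs on, so that it can be compared with the ID of the thread that allocated the object.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical probe class: its finalizer records which thread ran it.
class ThreadProbe
{
    public static int FinalizerThreadId; // stays 0 until the finalizer has run

    ~ThreadProbe()
    {
        FinalizerThreadId = Thread.CurrentThread.ManagedThreadId;
    }
}

static class FinalizerThreadDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in Main keeps the object reachable when the GC runs.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void AllocateAndDrop()
    {
        new ThreadProbe();
    }

    static void Main()
    {
        int allocatingThreadId = Thread.CurrentThread.ManagedThreadId;

        AllocateAndDrop();
        GC.Collect();                  // make the probe eligible for finalization
        GC.WaitForPendingFinalizers(); // block until queued finalizers have run

        Console.WriteLine("Allocating thread: " + allocatingThreadId);
        Console.WriteLine("Finalizer thread:  " + ThreadProbe.FinalizerThreadId);
    }
}
```

The two IDs should differ, because the runtime executes finalizers on its own dedicated thread. That is the thread your cudaFree() call ends up on when Dispose() is invoked from the finalizer.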



Your best bet is to use a using statement, which ensures that Dispose() is always called at the end of the protected code block:

using(CudaEntity ent = new CudaEntity())
{

}
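
If you still want a finalizer as a safety net, one option is the usual Dispose(bool) pattern, with the finalizer path skipping the native call. The following is only a sketch under stated assumptions: `FakeCuda` is a hypothetical stand-in for your DllImport'ed wrapper DLL (it is not a real CUDA API), and it imitates the thread restriction by rejecting a free from any thread other than the allocating one.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the native wrapper DLL. It remembers the
// allocating thread and refuses to free from any other thread.
static class FakeCuda
{
    static int ownerThreadId;
    public static int LiveAllocations;

    public static IntPtr Malloc()
    {
        ownerThreadId = Thread.CurrentThread.ManagedThreadId;
        LiveAllocations++;
        return new IntPtr(0x1000); // dummy non-zero address
    }

    public static int Free(IntPtr pointer)
    {
        if (Thread.CurrentThread.ManagedThreadId != ownerThreadId)
            return 1; // stand-in for the "invalid device pointer" error
        LiveAllocations--;
        return 0;
    }
}

sealed class CudaEntity : IDisposable
{
    private IntPtr dataPointer;

    public CudaEntity()
    {
        dataPointer = FakeCuda.Malloc();
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // nothing left for the finalizer to do
    }

    private void Dispose(bool disposing)
    {
        if (dataPointer == IntPtr.Zero) return;

        if (disposing)
        {
            // Explicit Dispose(): assumed to run on the allocating thread.
            int error = FakeCuda.Free(dataPointer);
            if (error != 0)
                throw new InvalidOperationException("free failed: " + error);
        }
        // Finalizer path: we are on the finalizer thread, where the free
        // would fail, so the native call is skipped (the memory leaks).
        dataPointer = IntPtr.Zero;
    }

    ~CudaEntity()
    {
        Dispose(false);
    }
}

static class Program
{
    static void Main()
    {
        using (CudaEntity ent = new CudaEntity())
        {
            // work with ent here
        }
        Console.WriteLine(FakeCuda.LiveAllocations); // prints 0
    }
}
```

Note that this does not free anything from the finalizer; it only keeps the finalizer from throwing. The device memory is still leaked if Dispose() is never called, and Dispose() must still be called on the thread that did the allocation.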

      
