CUDA function pointer consistency

I recently tried to use a function pointer to dynamically define multiple processing steps in my application running on sm_30.

It would be tricky to post the full code here because many files and functions are involved, but basically I started from a sample included in the CUDA Toolkit 5.0.

I allocate a device buffer of function pointers and copy into it the device function pointer, which is defined the same way as in the sample, using cudaMemcpyFromSymbolAsync with the cudaMemcpyDeviceToDevice copy kind.

My device pointer is defined this way in .cu.h:

//device function pointer model
typedef void (*func)(structGpuArgument*);

//Declaring a function
__device__ void gpuFunc1(structGpuArgument* arg1);

      

Elsewhere I have a .cu file that includes the previous declarations and contains the following code:

//get the actual function pointer
__device__ func gpuFuncPtr = gpuFunc1;

//Buffer to store a list of function pointers
func* pFuncDevBuffer;
cudaMalloc(&pFuncDevBuffer,NB_FUNC*sizeof(func));

//copy the actual function pointer (symbol) to the list buffer
cudaMemcpyFromSymbolAsync(pFuncDevBuffer+i, gpuFuncPtr, sizeof(func), 0, cudaMemcpyDeviceToDevice, stream);

//Launch the kernel that will use the functions
kernel_test<<<1,10,0,stream>>>(pFuncDevBuffer);
...

//defining the kernel that uses pointer buffer
__global__ void kernel_test(func* pFuncDevBuffer)
{
   printf("func address : %p\n", (void*)pFuncDevBuffer[0]);
   pFuncDevBuffer[0](NULL);
}

//defining the function pointed to by the function pointer
__device__ void gpuFunc1(structGpuArgument* arg1)
{
     do_something;
}
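
For reference, the fragments above can be condensed into a single-file sketch of the case that works for me. This is only a minimal illustration under assumptions: the `structGpuArgument` body, the `gCallCount` counter, the `runSingleFileDemo` helper, `NB_FUNC = 1`, and the use of the default stream are all hypothetical stand-ins for parts of my real code that are not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in: the real struct is not shown in this post.
struct structGpuArgument { int value; };

// Device function pointer model, as in the post.
typedef void (*func)(structGpuArgument*);

// Illustrative counter so the host can confirm the indirect call happened.
__device__ int gCallCount = 0;

// The function reached through the pointer (body is a placeholder).
__device__ void gpuFunc1(structGpuArgument* arg1)
{
    atomicAdd(&gCallCount, 1);
}

// Device symbol holding the device address of gpuFunc1.
__device__ func gpuFuncPtr = gpuFunc1;

// Kernel that calls through the buffer; defined in the same file as the function.
__global__ void kernel_test(func* pFuncDevBuffer)
{
    pFuncDevBuffer[0](NULL);
}

// Returns the number of device-side calls observed, or -1 on any CUDA error.
int runSingleFileDemo()
{
    const int NB_FUNC = 1;   // assumption: a single entry in the buffer
    const int zero = 0;
    int calls = -1;
    func* pFuncDevBuffer = NULL;

    if (cudaMalloc(&pFuncDevBuffer, NB_FUNC * sizeof(func)) != cudaSuccess)
        return -1;

    // Reset the counter, then copy the pointer value from the symbol into the buffer.
    cudaError_t err = cudaMemcpyToSymbol(gCallCount, &zero, sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpyFromSymbolAsync(pFuncDevBuffer, gpuFuncPtr, sizeof(func),
                                        0, cudaMemcpyDeviceToDevice, 0);
    if (err == cudaSuccess) {
        kernel_test<<<1, 10>>>(pFuncDevBuffer);
        err = cudaGetLastError();          // launch-time errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // errors raised while the kernel ran
    if (err == cudaSuccess)
        err = cudaMemcpyFromSymbol(&calls, gCallCount, sizeof(int));

    cudaFree(pFuncDevBuffer);
    return (err == cudaSuccess) ? calls : -1;
}
```

Calling `runSingleFileDemo()` from `main` should return 10 when each of the ten threads makes the indirect call, which matches the single-file behaviour described in this question. It does not reproduce the failing two-file case.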

      

Everything works fine as long as the __global__ kernel that takes the device function pointer buffer as an argument is defined in the same file as the function and its pointer. The kernel can then print the address of the function (0x4) and execute its code without problems. I am not using separate compilation.

If, in the same run of the program, a second kernel defined in another file takes the same function pointer buffer as an argument, it prints the same address for the function pointer (0x4), but when it tries to call through it the call fails, and cuda-memcheck reports an illegal instruction at 0x00000000. Every CUDA API call made after that hangs, and I have to restart my computer (resetting via nvidia-smi is not supported on my GPU).

I would like to know whether there is a known issue with using function pointers this way, that is, passing the function pointer buffer to a kernel defined in another file that shares the same function pointer definition.

Also, if there is a way to reset the device after such a crash without rebooting the entire system, it would save me time while debugging my application.

Thanks for the help.
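
To make the debugging side concrete, here is a rough sketch of the kind of check and in-process reset I have in mind. The helper names `syncAndReport` and `resetCurrentDevice` are hypothetical; the runtime calls themselves (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`, `cudaDeviceReset`) are standard CUDA runtime API. Note that `cudaDeviceReset` only tears down the state of the calling process on the current device, so I do not know whether it would be enough for the hang described above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: waits for outstanding work and reports any error.
// Returns true when no error was recorded.
bool syncAndReport(const char* label)
{
    cudaError_t err = cudaGetLastError();      // errors from the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // errors raised while the kernel ran
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: destroys this process's allocations and state
// on the current device. It is not a driver-level or system-level reset.
bool resetCurrentDevice()
{
    return cudaDeviceReset() == cudaSuccess;
}
```

The idea would be to call `syncAndReport("kernel_test")` right after each launch and, on failure, call `resetCurrentDevice()` before exiting, rather than issuing further API calls against a faulted context.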
