CUDA stack size when using the CUDA debugger
I am using Visual Studio 2012 and have several kernels that crash while running under CUDA debugging. Other kernels execute the same code without any problem (on different generated numbers / data). I don't know whether the kernels also crash when the program is started without CUDA debugging, because I don't get any errors in that case.
Error:
CUDA Debugger detected stack overflow on 120 threads. First thread: blockIdx = {2,0,0} threadIdx = {1,0,0} StackPointer = 0x00ffe9d0 StackLimit = 0x00ffea40
Looking in the documentation, I found how to increase the stack size (I also needed to increase the heap size):
//Increase memory limits
size_t size_heap, size_stack;
cudaDeviceSetLimit(cudaLimitMallocHeapSize,20000000*sizeof(double));
cudaDeviceSetLimit(cudaLimitStackSize,12928);
cudaDeviceGetLimit(&size_heap, cudaLimitMallocHeapSize);
cudaDeviceGetLimit(&size_stack, cudaLimitStackSize);
printf("Heap size found to be %d; Stack size found to be %d\n",(int)size_heap,(int)size_stack);
The default stack size was 6464, so I wanted to double it to see if there were any improvements. When I ran the program with the standard Windows debugger, the stack size returned by cudaDeviceGetLimit(&size_stack, cudaLimitStackSize) was 12928, as expected.
However, when I run the program with the CUDA debugger, it reports a stack size of 1024, not 12928. Why is that?
It seems this was just a bug: after I upgraded to the CUDA 7.0 Release Candidate, stack allocation works correctly.
If you have the same problem, update to the latest drivers and tools. The CUDA 7.0 RC is only available to registered CUDA developers, so you need to register on NVIDIA's website.
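To tell a real failure apart from a debugger or driver quirk like this one, it may help to check the return codes of the limit calls instead of only printing the values. The following is a minimal sketch, not the code from the question: setAndVerifyLimit is a hypothetical helper name, and it assumes the driver either honors the request or rounds it up. The runtime is allowed to adjust the requested value, so the check is "at least as large" rather than "exactly equal".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): requests a device limit,
// reads it back, and reports any CUDA runtime error along the way.
// Returns true only if both calls succeed and the limit in effect is at
// least the requested value.
bool setAndVerifyLimit(cudaLimit which, size_t requested, size_t *actual)
{
    cudaError_t err = cudaDeviceSetLimit(which, requested);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    err = cudaDeviceGetLimit(actual, which);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    return *actual >= requested;
}

int main()
{
    size_t stackSize = 0, heapSize = 0;

    // Same values as in the question; set them before launching any kernel.
    bool stackOk = setAndVerifyLimit(cudaLimitStackSize, 12928, &stackSize);
    bool heapOk  = setAndVerifyLimit(cudaLimitMallocHeapSize,
                                     20000000 * sizeof(double), &heapSize);

    // Cast to unsigned long long so the format also works on older MSVC runtimes.
    printf("Stack size: %llu (%s); heap size: %llu (%s)\n",
           (unsigned long long)stackSize, stackOk ? "ok" : "not applied",
           (unsigned long long)heapSize,  heapOk  ? "ok" : "not applied");

    return (stackOk && heapOk) ? 0 : 1;
}
```

If this reports "ok" under the regular debugger but not under the CUDA debugger, the limit is being changed by the tooling rather than by your code, which points to the kind of bug described above.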