Nvidia OpenCL hangs on blocking buffer access

I have an OpenCL program that copies a set of values into an input buffer, processes those values on the device, and copies the results back.

// map input data buffer, has CL_MEM_ALLOC_HOST_PTR
cl_float* data = (cl_float*) clEnqueueMapBuffer(queue, data_buffer, CL_TRUE, CL_MAP_WRITE, 0, data_size, 0, NULL, NULL, NULL);

// set input values
for(size_t i = 0; i < n; ++i)
    data[i] = values[i];

// unmap input buffer
clEnqueueUnmapMemObject(queue, data_buffer, data, 0, NULL, NULL);

// run kernels
...

// map results buffer, has CL_MEM_ALLOC_HOST_PTR
cl_float* results = (cl_float*) clEnqueueMapBuffer(queue, results_buffer, CL_TRUE, CL_MAP_READ, 0, results_size, 0, NULL, NULL, NULL);

// processing
...

// unmap results buffer
clEnqueueUnmapMemObject(queue, results_buffer, results, 0, NULL, NULL);


(In real code, I check for errors, etc.)
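
The error handling around the map calls looks roughly like this (simplified sketch using the errcode_ret parameter of clEnqueueMapBuffer; err is not shown in the code above):

cl_int err = CL_SUCCESS;
cl_float* data = (cl_float*) clEnqueueMapBuffer(queue, data_buffer, CL_TRUE, CL_MAP_WRITE,
                                                0, data_size, 0, NULL, NULL, &err);
if(err != CL_SUCCESS || data == NULL)
{
    // handle/report the error and abort the run
}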

This works great on AMD and Intel hardware (both CPU and GPU). On Nvidia GPUs, however, the same code is incredibly slow: a run that normally takes about 10 seconds (5 seconds host, 5 seconds device) takes over two and a half minutes on Nvidia cards.

However, I found that this is not a simple optimization problem or a zero-copy speed difference. Using a profiler, I can see that the host time is 5 seconds, as usual. And using OpenCL profiling events, I can see that the device time is also 5 seconds, as usual!
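
(For reference, the device times come from event profiling along these lines; a simplified sketch where kernel and global_size stand in for the real launch, and the queue is created with CL_QUEUE_PROFILING_ENABLE:)

// attach an event to the command and read its start/end timestamps
cl_event ev;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, &ev);
clWaitForEvents(1, &ev);

cl_ulong t_start = 0, t_end = 0;
clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, sizeof(t_start), &t_start, NULL);
clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END, sizeof(t_end), &t_end, NULL);
clReleaseEvent(ev);
// (t_end - t_start) is the device execution time in nanoseconds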

So I used the poor man's profiler to figure out where the program spends its time on Nvidia GPUs, and it shows that the program is simply waiting idly in both clEnqueueMapBuffer calls. I find this especially puzzling in the first case, since at that point the queue is empty.
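
As a cross-check, I can also split the map into a non-blocking enqueue plus an explicit wait, so that any stall shows up in the wait rather than in the enqueue call itself. Roughly:

// enqueue the map non-blockingly, then time the wait on its event separately
cl_event map_ev;
cl_int err = CL_SUCCESS;
cl_float* data = (cl_float*) clEnqueueMapBuffer(queue, data_buffer, CL_FALSE, CL_MAP_WRITE,
                                                0, data_size, 0, NULL, &map_ev, &err);
// ... any blocking should now happen here ...
clWaitForEvents(1, &map_ev);
clReleaseEvent(map_ev);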

Again, I've profiled every map/unmap and kernel call, and the extra time doesn't show up there, so it is not spent on the device, nor in host code. From the stack trace I can see that the program is instead blocked waiting on a semaphore. Does anyone know what is causing this?
