Can I call OpenCL kernels one by one, on the same device buffers?
Suppose I copy data to the device using clEnqueueWriteBuffer, and suppose the data is a buffer of RGB values (unsigned chars). I want to convert the image to grayscale first, working only on the input buffer (for example, overwriting the R component), and then resize the resulting image into the output buffer. I then use clEnqueueReadBuffer to copy the output back to host memory.
Since I can't write a single kernel with all the logic (because work-items in OpenCL are not processed in any guaranteed order), I thought about using this sequence: clEnqueueWriteBuffer, two clEnqueueNDRangeKernel calls, then clEnqueueReadBuffer.
Is this approach correct? Where in the spec can I find more details on this?
If all the commands are in the same command queue and that command queue is in-order, then it works.
In-order queues execute all commands in the order they were enqueued. Each command sees the results of the command before it.
Here: https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clCreateCommandQueue.html
it says
For example, if an application calls clEnqueueNDRangeKernel to execute kernel A followed by a clEnqueueNDRangeKernel to execute kernel B, the application can assume that kernel A finishes first and then kernel B is executed. If the memory objects output by kernel A are inputs to kernel B then kernel B will see the correct data in memory objects produced by execution of kernel A. If …
Note: applying the grayscale conversion after the resize can be more efficient if you don't need the full-size grayscale image and you are scaling down rather than up. You can also do both in a single kernel if you only need the resized image: when the resize step combines several source pixels into one result pixel, you can apply the grayscale conversion to that result pixel.
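To make the single-pass idea in the note above concrete, here is a minimal CPU reference in plain C of what each work-item of such a fused kernel might compute. The 2x downscale factor, the 2x2 box average, the integer luma weights (77, 150, 29) and the function names are all assumptions chosen for illustration, not something taken from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Integer approximation of luma; the weights sum to 256 (an assumed choice). */
static unsigned char gray_of(const unsigned char *rgb)
{
    return (unsigned char)((77u * rgb[0] + 150u * rgb[1] + 29u * rgb[2]) >> 8);
}

/* Halve an interleaved RGB image and convert it to grayscale in one pass.
 * Each output pixel averages a 2x2 block of source pixels and then applies
 * the grayscale conversion once, to the averaged pixel.
 * dst must hold (src_w / 2) * (src_h / 2) bytes. */
void downscale2x_gray(const unsigned char *src, size_t src_w, size_t src_h,
                      unsigned char *dst)
{
    assert(src_w % 2 == 0 && src_h % 2 == 0);
    size_t dst_w = src_w / 2, dst_h = src_h / 2;

    for (size_t y = 0; y < dst_h; ++y) {
        for (size_t x = 0; x < dst_w; ++x) {
            unsigned sum[3] = {0, 0, 0};
            for (size_t dy = 0; dy < 2; ++dy) {
                for (size_t dx = 0; dx < 2; ++dx) {
                    const unsigned char *p =
                        src + 3 * ((2 * y + dy) * src_w + (2 * x + dx));
                    sum[0] += p[0];
                    sum[1] += p[1];
                    sum[2] += p[2];
                }
            }
            unsigned char avg[3] = {
                (unsigned char)(sum[0] / 4),
                (unsigned char)(sum[1] / 4),
                (unsigned char)(sum[2] / 4)
            };
            dst[y * dst_w + x] = gray_of(avg);
        }
    }
}
```

In an OpenCL version the two outer loops would become the 2D global work size, so the grayscale conversion runs once per output pixel instead of once per input pixel.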
If you need both the full-size grayscale image and a reduced grayscale image, you can write to two outputs (without altering the original image) and use two command queues in parallel to finish the whole job faster (if the kernel launch overhead is comparable to the kernel execution time). However, this requires a synchronization point between the two queues and can be slower for very small images (one queue must see the buffer copy made by the other queue, and both must finish before you have the two results). Two kernels from two queues can read from the same buffer without any problem.
Just be careful to set the correct arguments on the kernels before enqueuing the commands (there is no guarantee that they will not start immediately).
You can have as many kernel executions as you need, but setting the arguments is not a queue operation, so you need to take care of it before each enqueue.
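The sequence from the question could look roughly like the sketch below. It assumes an OpenCL 1.1 or later device (for byte stores to global memory), a default in-order queue, and a tiny 4x4 test image; the kernel names to_gray and half_size, the CHECK macro and the image contents are made up for illustration. It needs an OpenCL runtime and has to be linked against the OpenCL library.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

#define CHECK(e) do { cl_int e_ = (e); if (e_ != CL_SUCCESS) { \
    fprintf(stderr, "OpenCL error %d at line %d\n", (int)e_, __LINE__); \
    exit(1); } } while (0)

/* Hypothetical kernels: grayscale in place (overwrites R), then 2x downscale. */
static const char *kernel_src =
"__kernel void to_gray(__global uchar *img) {\n"
"    size_t i = get_global_id(0);\n"
"    uint r = img[3*i], g = img[3*i+1], b = img[3*i+2];\n"
"    img[3*i] = (uchar)((77*r + 150*g + 29*b) >> 8);\n"
"}\n"
"__kernel void half_size(__global const uchar *img,\n"
"                        __global uchar *out, int src_w) {\n"
"    int x = get_global_id(0), y = get_global_id(1);\n"
"    int sum = 0;\n"
"    for (int dy = 0; dy < 2; ++dy)\n"
"        for (int dx = 0; dx < 2; ++dx)\n"
"            sum += img[3 * ((2*y + dy) * src_w + 2*x + dx)];\n"
"    out[y * (src_w / 2) + x] = (uchar)(sum / 4);\n"
"}\n";

int main(void)
{
    enum { W = 4, H = 4 };
    unsigned char rgb[W * H * 3], small[(W / 2) * (H / 2)];
    for (int i = 0; i < W * H * 3; ++i) rgb[i] = (unsigned char)(i * 5);

    cl_int err;
    cl_platform_id platform;
    cl_device_id device;
    CHECK(clGetPlatformIDs(1, &platform, NULL));
    CHECK(clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL));
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    CHECK(err);
    /* properties = 0 gives an in-order queue */
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);
    CHECK(err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
    CHECK(err);
    CHECK(clBuildProgram(prog, 1, &device, NULL, NULL, NULL));
    cl_kernel gray = clCreateKernel(prog, "to_gray", &err);   CHECK(err);
    cl_kernel half = clCreateKernel(prog, "half_size", &err); CHECK(err);

    cl_mem in = clCreateBuffer(ctx, CL_MEM_READ_WRITE, sizeof rgb, NULL, &err);
    CHECK(err);
    cl_mem out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof small, NULL, &err);
    CHECK(err);

    /* Arguments are set on the host, before the kernels are enqueued. */
    cl_int src_w = W;
    CHECK(clSetKernelArg(gray, 0, sizeof in, &in));
    CHECK(clSetKernelArg(half, 0, sizeof in, &in));
    CHECK(clSetKernelArg(half, 1, sizeof out, &out));
    CHECK(clSetKernelArg(half, 2, sizeof src_w, &src_w));

    /* Same in-order queue: write, kernel A, kernel B, read run in this order. */
    size_t n_pixels = W * H;
    size_t out_size[2] = { W / 2, H / 2 };
    CHECK(clEnqueueWriteBuffer(q, in, CL_FALSE, 0, sizeof rgb, rgb, 0, NULL, NULL));
    CHECK(clEnqueueNDRangeKernel(q, gray, 1, NULL, &n_pixels, NULL, 0, NULL, NULL));
    CHECK(clEnqueueNDRangeKernel(q, half, 2, NULL, out_size, NULL, 0, NULL, NULL));
    CHECK(clEnqueueReadBuffer(q, out, CL_TRUE, 0, sizeof small, small, 0, NULL, NULL));

    for (int i = 0; i < (W / 2) * (H / 2); ++i) printf("%d ", small[i]);
    printf("\n");

    clReleaseMemObject(in);  clReleaseMemObject(out);
    clReleaseKernel(gray);   clReleaseKernel(half);
    clReleaseProgram(prog);  clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    return 0;
}
```

Only the final read is blocking here. The in-order queue is what orders the write and the two kernels, so no explicit events are needed between them.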