Why are GPU threads in CUDA and OpenCL organized in a grid?

I am just learning OpenCL, and I am at the point where I am trying to launch a kernel. Why are GPU threads organized in a grid?

I will look into this in detail myself, but a simple explanation would be nice. Is this always the case when working with GPGPU?



4 answers


This is the general approach used in CUDA, OpenCL, and, I think, ATI Stream.

The idea behind the grid is to provide a simple yet flexible mapping between the data being processed and the threads doing the processing. In the simple version of the GPGPU execution model, one GPU thread is "allocated" for each output element in a 1D, 2D, or 3D data grid. To process this output element, the thread reads one (or more) elements from the corresponding location, or from adjacent locations, in the input data grid(s). Organizing threads in a grid makes it easy for each thread to work out which input elements to read and where to store its output.
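To make the mapping concrete, here is a minimal sketch in plain Python that emulates a 2D launch on the CPU, with one "thread" per output element. It is not GPU code: `launch_2d` and `blur_kernel` are hypothetical names, and the loops run sequentially. On a real device, the (x, y) a thread receives would come from `get_global_id(0)` / `get_global_id(1)` in OpenCL, or from `blockIdx.x * blockDim.x + threadIdx.x` (and the `.y` equivalent) in CUDA.

```python
# Plain-Python emulation of a 2D GPGPU launch: one "thread" per output element.
# This runs sequentially on the CPU; on a real GPU the kernel calls run in parallel.

def launch_2d(kernel, width, height, *args):
    """Hypothetical helper: call kernel once for every (x, y) in the grid."""
    for y in range(height):
        for x in range(width):
            kernel(x, y, *args)

def blur_kernel(x, y, src, dst, width):
    """Each thread writes exactly one output element, dst[y][x].

    It reads its own input element plus its left and right neighbours,
    clamping at the row edges.
    """
    left = src[y][max(x - 1, 0)]
    right = src[y][min(x + 1, width - 1)]
    dst[y][x] = (left + src[y][x] + right) / 3

src = [[0, 3, 6],
       [9, 9, 9]]
height, width = len(src), len(src[0])
dst = [[0.0] * width for _ in range(height)]
launch_2d(blur_kernel, width, height, src, dst, width)
print(dst)  # [[1.0, 3.0, 5.0], [9.0, 9.0, 9.0]]
```

Note that no thread needs to know about any other thread: its grid coordinates alone tell it where to read and where to write.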



This grid model contrasts with the typical multi-core CPU threading model, where one thread is mapped to each CPU core and each thread handles many input and output elements (e.g. 1/4 of the data on a quad-core system).
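A rough sketch of the difference, again in plain Python with hypothetical helper names: the CPU style gives each core one contiguous chunk of the data, while the GPU style gives each thread a single element. The chunking rule used here (earlier cores absorb the remainder) is just one common choice, assumed for illustration.

```python
def cpu_chunks(n_elements, n_cores):
    """CPU style: split [0, n_elements) into one contiguous range per core.

    Each core's thread loops over many elements; the first `extra` cores
    take one additional element when the split is uneven.
    """
    base, extra = divmod(n_elements, n_cores)
    ranges = []
    start = 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

def gpu_assignment(n_elements):
    """GPU style: thread i handles exactly element i."""
    return [range(i, i + 1) for i in range(n_elements)]

print([list(r) for r in cpu_chunks(10, 4)])  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(len(gpu_assignment(10)))  # 10 threads, one element each
```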



The simple answer is that GPUs are designed to handle images and textures that are 2D grids of pixels. When you render a triangle in DirectX or OpenGL, the hardware rasterizes it into a grid of pixels.
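As an illustration of that per-pixel view, here is a simplified sketch of triangle coverage using edge functions, sampling at pixel centres and assuming the vertices are given counter-clockwise (with y pointing up). Real rasterizers add fill rules, clipping and much more; the point is only that each pixel's result depends on nothing but its own (x, y), which is why a grid of threads fits so naturally.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: >= 0 when P is on the left of (or on) edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return a grid of 0/1 coverage values, one per pixel.

    Each (x, y) cell is computed independently, the way a per-pixel GPU
    thread would: it needs only its own coordinates and the triangle.
    Simplified: points exactly on an edge count as inside (no fill rule).
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    grid = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            inside = (edge(x0, y0, x1, y1, px, py) >= 0 and
                      edge(x1, y1, x2, y2, px, py) >= 0 and
                      edge(x2, y2, x0, y0, px, py) >= 0)
            grid[y][x] = 1 if inside else 0
    return grid

for row in rasterize([(0, 0), (4, 0), (0, 4)], 4, 4):
    print(row)
# [1, 1, 1, 1]
# [1, 1, 1, 0]
# [1, 1, 0, 0]
# [1, 0, 0, 0]
```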





I'll fall back on the classic analogy of fitting a square peg into a round hole. In this case, the GPU is a very square hole, not the round one that "GP" (general purpose) suggests.

The explanations above outline the ideas behind 2D textures and so on. GPU architecture is such that all processing is performed by threads that each run the same pipeline, so the data being processed has to be segmented this way.
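A minimal Python emulation of what "the same pipeline on segmented data" looks like: one kernel function runs for every index, and the global range is cut into work-groups (thread blocks in CUDA). `launch_1d` and `square_kernel` are hypothetical names; the index formula mirrors CUDA's `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def launch_1d(kernel, global_size, local_size, data):
    """Hypothetical helper: run the same kernel for every work-item.

    The global range is segmented into work-groups of local_size items,
    mirroring OpenCL work-groups / CUDA thread blocks. Assumes global_size
    is a multiple of local_size, as OpenCL 1.x requires.
    """
    assert global_size % local_size == 0
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            # Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA,
            # or get_global_id(0) in OpenCL (ignoring global offsets).
            global_id = group_id * local_size + local_id
            kernel(global_id, data)

def square_kernel(gid, data):
    # Identical instructions for every work-item; only gid differs.
    data[gid] = data[gid] * data[gid]

values = list(range(8))
launch_1d(square_kernel, global_size=8, local_size=4, data=values)
print(values)  # [0, 1, 4, 9, 16, 25, 36, 49]
```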



One reason this is a good API is that you are usually working with an algorithm that has several nested loops. If you have one, two, or three loops, a one-, two-, or three-dimensional grid maps the problem well, giving you a thread for each index value.

The values you need in your kernel (index values) are naturally expressed in the API.
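To illustrate the nested-loop point, here is a sketch in plain Python (hypothetical names, CPU only) using matrix multiplication: the two outer loops become a 2D grid, and each "thread" keeps only the innermost loop over `k`. Both versions should produce the same result.

```python
def matmul_loops(a, b):
    """Classic CPU version: three nested loops."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_kernel(i, j, a, b, c):
    """Body of the two outer loops: one thread per (i, j) output element."""
    c[i][j] = sum(a[i][k] * b[k][j] for k in range(len(b)))

def matmul_grid(a, b):
    n, p = len(a), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Emulated 2D launch; on a GPU these (i, j) pairs would run in parallel.
    for i in range(n):
        for j in range(p):
            matmul_kernel(i, j, a, b, c)
    return c

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_grid(a, b))  # [[19, 22], [43, 50]]
```

The grid dimensions are simply the ranges of the loops that were removed, so the (i, j) a thread receives is exactly the index pair the outer loops used to supply.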
