Why are GPU threads in CUDA and OpenCL organized in a grid?
This is the general approach used in CUDA, OpenCL, and, I believe, ATI Stream.
The idea behind the grid is to provide a simple yet flexible mapping between the data being processed and the threads doing the processing. In the simple version of the GPGPU execution model, one GPU thread is "allocated" for each output element in a 1D, 2D, or 3D data grid. To produce that output element, the thread reads one (or more) elements from the corresponding location, or adjacent locations, in the input data grid(s). By organizing threads in a grid, it is easy for each thread to determine which elements of the input to read and where to store its output.
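As a sketch of this model, a CUDA kernel that scales a 2D image might allocate one thread per output pixel. The kernel name, image layout, and launch parameters below are illustrative, not taken from the original answer:

```cuda
// Illustrative CUDA kernel: one thread per output element of a 2D grid.
// Each thread computes its own (x, y) coordinate from the block and
// thread indices and writes exactly one output pixel.
__global__ void brighten(const float *in, float *out,
                         int width, int height, float gain)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {          // guard against partial blocks
        int idx = y * width + x;            // row-major index into the grid
        out[idx] = in[idx] * gain;          // read one input, write one output
    }
}

// Host-side launch: round the grid up so every pixel gets a thread.
// dim3 block(16, 16);
// dim3 grid((width + 15) / 16, (height + 15) / 16);
// brighten<<<grid, block>>>(d_in, d_out, width, height, 1.5f);
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads along the edges may fall outside the data.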
This is in contrast to the general multi-core CPU threading model, where one thread is mapped to a CPU core and each thread handles many input and output elements (e.g. 1/4 of the data in a quad-core system).
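For contrast, the CPU-style division of the same work might look like this host-side sketch, in which each of a few worker threads loops over a contiguous chunk of the data (the function and parameter names are hypothetical):

```cuda
// Hypothetical CPU-style worker: one thread per core, each looping over
// a contiguous slice of the data (e.g. 1/4 of it with num_threads == 4),
// in contrast to the GPU model of one thread per output element.
void cpu_worker(const float *in, float *out, int n, float gain,
                int thread_id, int num_threads)
{
    int chunk = (n + num_threads - 1) / num_threads;  // ceiling division
    int begin = thread_id * chunk;
    int end   = (begin + chunk < n) ? begin + chunk : n;
    for (int i = begin; i < end; ++i)   // one thread handles many elements
        out[i] = in[i] * gain;
}
```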
I will refer to the classic analogy of fitting a square peg into a round hole. In this case, the GPU is a very square hole, not as round as the "GP" (general purpose) in GPGPU would suggest.
The explanation above outlines the idea in terms of 2D textures, etc. The GPU architecture is such that all processing is performed in threads, with every thread running the same pipeline, so the data to be processed must be segmented accordingly.
One of the reasons this is a good API is that you are usually working with an algorithm that has nested loops. If you have one, two, or three loops, a one-, two-, or three-dimensional grid maps the problem well, giving you a thread for each index value.
The index values you need in your kernel are naturally expressed by the API.
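To illustrate, a doubly nested CPU loop and its grid-based CUDA equivalent might look like the following sketch (kernel name and the stand-in computation are illustrative):

```cuda
// CPU version: the indices come from explicit nested loops.
// for (int y = 0; y < height; ++y)
//     for (int x = 0; x < width; ++x)
//         out[y * width + x] = in[y * width + x] + 1.0f;

// GPU version: the same two indices come straight from the 2D grid,
// so the loops disappear and each (x, y) pair becomes one thread.
__global__ void apply_f(const float *in, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // inner-loop index
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // outer-loop index
    if (x < width && y < height)
        out[y * width + x] = in[y * width + x] + 1.0f;  // stand-in for real work
}
```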