OpenCL: one program with multiple devices

Question

I have already found the question "OpenCL: Running multiple CPU/GPU devices".

But I still have three questions about how to run one program on multiple devices. Is the recipe as follows? (Q1)

1. Get the devices you want to use.
2. Create a context for each device.
3. For each context, call clBuildProgram to build a program.
4. For each program, call clCreateCommandQueue to create one command queue per context.
5. For each context, call clCreateBuffer once for each kernel parameter.

Or should I combine the command queues? (Q2)

Does anyone have sample code or a link to a tutorial? (Q3)

opencl

user1235183 May 13 '15 at 14:51

2 answers

If all the devices you are targeting come from the same platform (e.g. an AMD GPU + CPU, or an Intel GPU + CPU), then @Lee's answer is great. If you expect to target a mix of platforms (such as Nvidia GPUs combined with AMD GPUs and CPUs), then a single context cannot span platforms: at a minimum you will need one context per platform.

The options I see are as follows:

1. One device per context. Synchronizing data between devices requires copying through host memory.
2. Multiple devices in the same context, using only one platform. This can make communication between devices in the same context easier.
3. Multiple devices from one platform in each context, with one context per platform. This lets you use several platforms at the same time while keeping the benefits of having multiple devices in the same context.

Option 3 makes work allocation more complicated, because you have to divide the work at two levels: between contexts/platforms and between devices. Option 1 is, IMHO, the easiest way to use every OpenCL device in a machine, regardless of platform. Option 2 is only really worthwhile if you are guaranteed to always run on devices from the same vendor (i.e. all devices on the same platform). That assumption breaks down quickly if you use a GPU and a CPU at the same time.
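The two-level split can be illustrated with a small helper that does not depend on OpenCL. It rests on an assumed heuristic: each device (or context) receives a share proportional to a weight, for example its CL_DEVICE_MAX_COMPUTE_UNITS value. `split_work` is a hypothetical name. With option 3 you would call it once across contexts (using per-context weight sums) and then again inside each context on that context's share.

```c
#include <stddef.h>

/* Splits `total` work-items across `n` devices in proportion to `weights`.
   Writes a start offset and a count per device; counts always sum to `total`. */
void split_work(size_t total, const unsigned *weights, size_t n,
                size_t *offsets, size_t *counts)
{
    unsigned long long weight_sum = 0;
    for (size_t i = 0; i < n; ++i)
        weight_sum += weights[i];

    size_t start = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t c;
        if (i + 1 == n)
            c = total - start;              /* last device takes the remainder */
        else if (weight_sum == 0)
            c = total / n;                  /* no weights given: equal split */
        else
            c = (size_t)((unsigned long long)total * weights[i] / weight_sum);
        offsets[i] = start;
        counts[i] = c;
        start += c;
    }
}
```

For example, 100 work-items with weights {2, 1, 1} give offsets 0, 50, 75 and counts 50, 25, 25. Letting the last device absorb the rounding remainder guarantees that no work-item is dropped.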

Once you have chosen among the three options above, you will need at least one command queue per device. You will also need to compile your OpenCL kernels for each group of identical devices, since every GPU generation from every vendor is different. At the very least, you may need macros with different definitions for different devices. In the worst case, you may need different algorithms for different devices (this is easier to handle with option 1 above).
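Grouping identical devices so that each group is compiled only once could look like the sketch below. It makes a simplifying assumption: devices that report the same CL_DEVICE_NAME (queried with clGetDeviceInfo) are treated as identical. A stricter key might also include the driver version. `group_identical_devices` and the device names in the example are hypothetical.

```c
#include <string.h>

/* Assigns a group id to each device so that devices with the same name
   share one group (and therefore one build). Returns the number of groups. */
int group_identical_devices(const char *const *names, int n, int *group_of)
{
    int groups = 0;
    for (int i = 0; i < n; ++i) {
        group_of[i] = -1;
        for (int j = 0; j < i; ++j) {
            if (strcmp(names[i], names[j]) == 0) {
                group_of[i] = group_of[j];   /* reuse the earlier device's group */
                break;
            }
        }
        if (group_of[i] < 0)
            group_of[i] = groups++;          /* first device of its kind */
    }
    return groups;
}
```

With names {"GPU-A", "CPU-B", "GPU-A"}, the groups are 0, 1, 0 and two builds are needed.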

chippies May 17 '15 at 15:08

Lee · Accepted Answer · 2015-05-14T21:04:17+0000

Create a single context that contains all of the devices; clCreateContext takes a list of devices. You then compile the program once for the context: call clBuildProgram (or clCompileProgram followed by clLinkProgram) once per program, either listing all of the devices or passing no device list, in which case it builds for every device in the context. Create one command queue for each device in the context. Create a buffer for each array you want to access. If you want to process different parts of an array on different devices, you can either create separate buffers or use sub-buffers to split it into sections.

If you are not satisfied with using the same program for all devices and want to optimize further, you can create a separate program for each device, or create the program once and call clCompileProgram separately for each device, passing in different macros.
