Intel OpenCL compiler: optimizing struct usage

I have a question about using structs in OpenCL on an Intel processor. My current kernel accesses two buffers using a struct like this:

struct pair {
    float first;
    float second;
};

inline float f(const struct pair param) {
    return param.first * param.second;
}

inline struct pair access_func(__global float const * const a, __global float const * const b, const int i) {
    struct pair res = {
            a[i],
            b[i]
    };
    return res;
}

// slow
__kernel ...(__global float const * const a, __global float const * const b)
{
 // ...

 x = f( access_func( a, b, i ) );

 // ...
}


When I change the kernel like this, it is much faster:

// fast
__kernel ...(__global float const * const a, __global float const * const b)
{
 // ...

 x = a[i] * b[i];

 // ...
}


Is there a way to get the Intel compiler to do this optimization? Perhaps the NVIDIA compiler already does it, since I see no difference in runtime between the two versions on the GPU.

Thanks in advance!





1 answer


The compiler cannot optimize the memory layout of your data, because buffers are shared between the OpenCL device and the host, and/or between multiple kernels on the OpenCL device. The most efficient layout depends on a kernel's access patterns, and those can differ from one kernel to the next.



You need to choose the right memory layout for your data; this is one of the hardest parts of GPU programming. Check the OpenCL optimization guide for each target you care about to see which layouts it prefers. Sometimes inefficient access patterns can be masked by copying from __global memory to __local memory and then working from the local copy.
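As an illustration of that last technique, here is a minimal OpenCL C sketch (an assumed kernel, not from the original post) in which each work-group stages its tile of the __global buffers into __local memory before reading it; it assumes the buffer length is a multiple of the work-group size.

```c
// Hypothetical kernel: each work-group copies its tile of a and b
// into local memory, synchronizes, then reads the local copies.
__kernel void mul_local(__global const float *a,
                        __global const float *b,
                        __global float *out,
                        __local float *la,
                        __local float *lb)
{
    const int gid = get_global_id(0);
    const int lid = get_local_id(0);

    // One coalesced load per buffer into local memory.
    la[lid] = a[gid];
    lb[lid] = b[gid];
    barrier(CLK_LOCAL_MEM_FENCE);

    // Subsequent accesses hit the fast local copy.
    out[gid] = la[lid] * lb[lid];
}
```

Note that this trivial kernel reads each element only once, so the staging buys nothing here; the pattern pays off when several work-items in a group reuse the same elements.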









