Misalignment exception in nested OpenCL loops

I am trying to use a GPU for some image processing. In my kernel function I get a "misalignment" exception, which is described as:

The thread attempted to read or write data that is misaligned on hardware that does not provide alignment. For example, 16-bit values must be aligned on 2-byte boundaries, 32-bit values on 4-byte boundaries, and so on.

I have reduced the kernel code to nothing but for loops, but the problem persists. My stripped-down kernel function:

__kernel void TestKernel(
    global const uchar* iImage,
    global uchar* oImage,
    uint width,
    uint heigth,
    uchar dif,
    float power)
{
    uint y = get_global_id(0);

    if (y >= heigth)
        return;

    for (uint x = 0; x < width; ++x) {
        for (uint i = 0; i < 5; ++i) {
            uint sum = 0;
            for (uint j = 0; j < 5; ++j) {
                sum += 3;
            }
        }
    }
}

      

(The exception is thrown in the second loop.)

I am using the C++ wrapper to call my kernel:

kernel.setArg(iArg++, iImage);
kernel.setArg(iArg++, oImage);
kernel.setArg(iArg++, header.GetVal(header.Width));
kernel.setArg(iArg++, header.GetVal(header.Height));
kernel.setArg(iArg++, (unsigned char)10);
kernel.setArg(iArg++, saturation);

queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(header.GetVal(header.Height)), cl::NDRange(128));
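One thing that may be worth ruling out in the enqueue call: in OpenCL 1.x the global work size has to be evenly divisible by the local work size, and the image height is not guaranteed to be a multiple of 128. Whether this is related to the exception is not established here. A minimal sketch of padding the global size, where `roundUpToMultiple` is a hypothetical helper and not part of the OpenCL C++ wrapper:

```cpp
#include <cstddef>

// Hypothetical helper (not an OpenCL API): returns the smallest multiple of
// `multiple` that is greater than or equal to `value`. Assumes multiple > 0.
constexpr std::size_t roundUpToMultiple(std::size_t value, std::size_t multiple)
{
    return ((value + multiple - 1) / multiple) * multiple;
}

// Intended use with the call above (assumption, shown as a comment only):
//   cl::NDRange global(roundUpToMultiple(header.GetVal(header.Height), 128));
//   queue.enqueueNDRangeKernel(kernel, cl::NullRange, global, cl::NDRange(128));
```

With a local size of 128, a 1080-row image would give a global size of 1152. The `y >= heigth` check already present in the kernel makes the extra work-items return immediately.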

      

oImage and iImage are cl::Buffer objects, saturation is a float, and header.GetVal() returns an int.
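Since the C++ wrapper's `setArg` passes the size of the host value on to the runtime, the host types listed above only match the kernel signature if their sizes agree with the OpenCL C types (`uint` and `float` are 32 bits, `uchar` is 8 bits). A small sketch of that check follows. The constants and the `argSize` helper are hypothetical names introduced for illustration, and it assumes a platform where `int` is 4 bytes:

```cpp
#include <cstddef>

// Sizes in bytes that the kernel parameters expect, per the OpenCL C type definitions.
constexpr std::size_t kUintSize  = 4;  // uint width, uint heigth
constexpr std::size_t kUcharSize = 1;  // uchar dif
constexpr std::size_t kFloatSize = 4;  // float power

// Hypothetical helper: the number of bytes a by-value argument of type T occupies,
// which is what a templated setArg would hand to the runtime for that argument.
template <typename T>
constexpr std::size_t argSize(const T&)
{
    return sizeof(T);
}

// Compile-time checks for the host types used in the question.
static_assert(sizeof(int) == kUintSize, "int must be 4 bytes to be passed as uint");
static_assert(sizeof(unsigned char) == kUcharSize, "unsigned char must be 1 byte");
static_assert(sizeof(float) == kFloatSize, "float must be 4 bytes");
```

If these assertions hold, the scalar arguments in the question have the sizes the kernel expects, which would point away from the argument list as the source of the problem.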

I am using Visual Studio 2015 with the CodeXL plugin and running the program on an AMD Spectre (Radeon R7).

What could be causing this problem?
