Misalignment exception in nested OpenCL loops
I am trying to use a GPU for some image processing. In my kernel function I get a "misalignment" exception:

> The thread tried to read or write data that is misaligned on hardware that does not provide alignment. For example, 16-bit values must be aligned on 2-byte boundaries, 32-bit values on 4-byte boundaries, and so on.
I have stripped the kernel down to nothing but the for loops, and the problem persists. My reduced kernel:
```c
__kernel void TestKernel(
    global const uchar* iImage,
    global uchar* oImage,
    uint width,
    uint height,
    uchar dif,
    float power)
{
    // One work-item per image row.
    uint y = get_global_id(0);
    if (y >= height)
        return;

    // Reduced test case: the loops no longer read or write the images.
    for (uint x = 0; x < width; ++x) {
        for (uint i = 0; i < 5; ++i) {
            uint sum = 0;
            for (uint j = 0; j < 5; ++j) {
                sum += 3;
            }
        }
    }
}
```
(the program throws an exception in the second loop)
I call the kernel through the OpenCL C++ wrapper:
```cpp
kernel.setArg(iArg++, iImage);
kernel.setArg(iArg++, oImage);
kernel.setArg(iArg++, header.GetVal(header.Width));
kernel.setArg(iArg++, header.GetVal(header.Height));
kernel.setArg(iArg++, (unsigned char)10);
kernel.setArg(iArg++, saturation);
queue.enqueueNDRangeKernel(kernel, cl::NullRange,
                           cl::NDRange(header.GetVal(header.Height)),
                           cl::NDRange(128));
```
`oImage` and `iImage` are `cl::Buffer` objects, `saturation` is a `float`, and `header.GetVal()` returns an `int`.
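Two host-side details may be worth ruling out, offered as a sketch rather than a confirmed fix. First, the kernel's scalar parameters have fixed sizes (`uint` is 32 bits, `uchar` 8 bits, `float` 32 bits), and for plain scalar types the wrapper's `setArg` passes `sizeof` of whatever type it receives; an `int` is 4 bytes on typical desktop compilers, so an explicit cast mainly documents the intent. Second, in OpenCL 1.x the global work size passed to `enqueueNDRangeKernel` must be evenly divisible by the local size, otherwise the launch is rejected with `CL_INVALID_WORK_GROUP_SIZE`. Because the kernel already returns early for out-of-range `y`, the global size can be padded up to a multiple of 128. `ToKernelUint` and `RoundUpToMultiple` are hypothetical helpers, not part of the wrapper.

```cpp
#include <cstddef>
#include <cstdint>

// OpenCL C scalar sizes are fixed: uint is 32 bits, uchar is 8 bits, float is 32 bits.
// These checks document that the host types used below match them.
static_assert(sizeof(std::uint32_t) == 4, "OpenCL uint is 32 bits");
static_assert(sizeof(std::uint8_t) == 1, "OpenCL uchar is 8 bits");
static_assert(sizeof(float) == 4, "OpenCL float is 32 bits");

// Hypothetical helper: convert the int returned by header.GetVal() to the
// kernel's uint explicitly, so setArg receives a 4-byte unsigned value.
std::uint32_t ToKernelUint(int value) {
    return static_cast<std::uint32_t>(value);
}

// Hypothetical helper: round `value` up to the next multiple of `multiple`
// (multiple must be > 0), so the global size divides evenly by the local size.
std::size_t RoundUpToMultiple(std::size_t value, std::size_t multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}

// Intended use with the C++ wrapper (comments only; needs the OpenCL headers):
//   kernel.setArg(iArg++, ToKernelUint(header.GetVal(header.Width)));
//   kernel.setArg(iArg++, ToKernelUint(header.GetVal(header.Height)));
//   const std::size_t local = 128;
//   const std::size_t global =
//       RoundUpToMultiple(ToKernelUint(header.GetVal(header.Height)), local);
//   cl_int err = queue.enqueueNDRangeKernel(kernel, cl::NullRange,
//                                           cl::NDRange(global), cl::NDRange(local));
//   // With wrapper exceptions disabled, err != CL_SUCCESS means the launch was rejected.
```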
I am using Visual Studio 2015 with the CodeXL plugin and running the program on an AMD Spectre GPU (Radeon R7).
What could be causing this problem?