What is causing the integer division error in this CUDA kernel?
I recently noticed an error when running our program on different hardware. I tracked it down to the beginning of the kernel, where the block's position within the grid is computed with the modulo (%) and division (/) operators. This minimal example reproduces the error:
```cuda
#include <stdio.h>

__global__ void div_issue(int blocks_x, int* block_offset)
{
    int blks_x = blocks_x;
    // linear (2D) block number
    int block_id_2d = block_offset[0];
    // x-coordinate of the block in the absolute grid
    int block_idx = block_id_2d % blks_x;
    // y-coordinate of the block in the absolute grid
    int block_idy = (block_id_2d - block_idx) / blks_x;
    printf("%d mod %d = %d \n", block_id_2d, blks_x, block_idx);
    printf("%d / %d = %d \n", block_id_2d - block_idx, blks_x, block_idy);
}

int main(int argc, char *argv[])
{
    int dev_count;
    cudaGetDeviceCount(&dev_count);
    // int counter avoids a signed/unsigned comparison with dev_count
    for (int i = 0; i < dev_count; i++)
    {
        cudaSetDevice(i);
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("using device %s :\n\n", prop.name);

        int block_offset_host[1];
        block_offset_host[0] = 753;
        int* block_offset_dev;
        cudaMalloc(&block_offset_dev, sizeof(int));
        cudaMemcpy(block_offset_dev, block_offset_host, sizeof(int), cudaMemcpyHostToDevice);

        // one block, one thread: 753 should decompose into x = 0, y = 3
        div_issue<<<1, 1>>>(251, block_offset_dev);
        cudaDeviceSynchronize();
        printf("\n");

        cudaFree(block_offset_dev);  // release the per-device allocation
    }
    return 0;
}
```
Result (on my machine with 2 GPUs):
```
using device GeForce GTX 980 Ti :
753 mod 251 = 0
753 / 251 = 4

using device GeForce GTX TITAN Black :
753 mod 251 = 0
753 / 251 = 3
```
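As a sanity check on the expected values: 753 is an exact multiple of 251, so the remainder is 0 and the quotient is 3. The TITAN Black output is therefore correct, while the quotient of 4 reported on the GTX 980 Ti is wrong.

```latex
753 = 3 \cdot 251 + 0
\quad\Longrightarrow\quad
753 \bmod 251 = 0,
\qquad
\frac{753 - 0}{251} = 3 \neq 4
```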
I am using CUDA 7.0, Visual Studio 2012 and NVIDIA driver 9.18.13.5306 (WHQL) on Windows 8.
I have no other indication that the 980 Ti has a hardware defect. Can anyone reproduce the issue on their own hardware?
So far, the problem does not seem to occur in debug builds. No additional compiler flags were used, e.g. -use_fast_math, -prec-div=false or -prec-sqrt=false.