2D arrays in CUDA
I've read a lot about working with 2D arrays in CUDA, and I think I need to flatten the array before sending it to the GPU. How can I allocate a 1D array on the GPU and access it as a 2D array in the kernel? My code looks like this:
__global__ void kernel( int **d_a )
{
cuPrintf("%p", d_a[0][0]);
}
int main(){
int **A;
int i;
cudaPrintfInit();
cudaMalloc((void**)&A,16*sizeof(int));
kernel<<<1,1>>>(A);
cudaPrintfDisplay(stdout,true);
cudaPrintfEnd();
}
This is how I fixed the problem: I call cudaMalloc in the usual way, but when passing the pointer to the kernel I cast it to int (*)[col], and that works for me.
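A minimal sketch of that cast, for illustration only. It assumes the column count is a compile-time constant (here the hypothetical `COLS`), since the pointer-to-array type needs a fixed width; the names `fill`, `run_fill`, `ROWS` and `COLS` are made up for this example and are not from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical dimensions, chosen only for this sketch.
constexpr int ROWS = 4;
constexpr int COLS = 4;

// The kernel sees the buffer as a pointer to rows of COLS ints,
// so a[r][c] addresses the same element as flat[r * COLS + c].
__global__ void fill(int (*a)[COLS])
{
    int r = threadIdx.y;
    int c = threadIdx.x;
    a[r][c] = r * COLS + c;
}

// Allocates a flat buffer, launches the kernel through the cast pointer,
// and copies the result into `out`. Returns true if the allocation,
// the launch, and the copy all reported success.
bool run_fill(int out[ROWS][COLS])
{
    int *d_flat = nullptr;
    if (cudaMalloc((void **)&d_flat, ROWS * COLS * sizeof(int)) != cudaSuccess)
        return false;

    dim3 block(COLS, ROWS);  // one thread per element
    fill<<<1, block>>>(reinterpret_cast<int (*)[COLS]>(d_flat));
    bool ok = cudaGetLastError() == cudaSuccess;

    ok = ok && cudaMemcpy(out, d_flat, ROWS * COLS * sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_flat);
    return ok;
}

int main()
{
    int h[ROWS][COLS];
    if (!run_fill(h)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int r = 0; r < ROWS; ++r) {
        for (int c = 0; c < COLS; ++c)
            printf("%3d", h[r][c]);
        printf("\n");
    }
    return 0;
}
```

The allocation stays a single contiguous block; only the pointer type changes, so no extra table of row pointers is needed on the device.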
There is really no need to "flatten" your 2D array before using it on the GPU (although doing so can speed up memory access). If you need a 2D array, you can use something like cudaMallocPitch, described in the CUDA C Programming Guide. I believe your code is not working because you are only malloc'ing a 1D array, so A[0][0] does not exist. If you look at your code, you allocated a 1D array of ints, not of int*s. If you want to malloc a flattened 2D array, you can do something like:
int *A;
cudaMalloc((void**)&A, length*length*sizeof(int)); // where length is the number of rows/cols you want
And then in your kernel use (to print a pointer to any element):
__global__ void kernel( int *d_a, int row, int col, int stride )
{
    // d_a is a flat array of ints; stride is the number of elements per row
    printf("%p\n", (void*)&d_a[ col + row*stride ]);
}
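For the cudaMallocPitch route mentioned above, a rough sketch could look like the following. The kernel and helper names (`fill_pitched`, `run_pitched`) and the 16x16 block size are assumptions made for this example. The key point is that the pitch is measured in bytes, so each row is located by stepping through a `char*` before casting back to `int*`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each row starts `pitch` BYTES after the previous one, so offset the
// base pointer as char* and then treat the row as an array of ints.
__global__ void fill_pitched(int *base, size_t pitch, int rows, int cols)
{
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < rows && c < cols) {
        int *row = (int *)((char *)base + r * pitch);
        row[c] = r * cols + c;
    }
}

// Allocates a pitched rows x cols array, fills it on the device, and
// copies it into the tightly packed `host` buffer (rows * cols ints).
// Returns true if every CUDA call reported success.
bool run_pitched(int *host, int rows, int cols)
{
    int *d_a = nullptr;
    size_t pitch = 0;
    if (cudaMallocPitch((void **)&d_a, &pitch, cols * sizeof(int), rows) != cudaSuccess)
        return false;

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    fill_pitched<<<grid, block>>>(d_a, pitch, rows, cols);
    bool ok = cudaGetLastError() == cudaSuccess;

    // cudaMemcpy2D drops the row padding: source rows are `pitch` bytes
    // apart, destination rows are cols * sizeof(int) bytes apart.
    ok = ok && cudaMemcpy2D(host, cols * sizeof(int), d_a, pitch,
                            cols * sizeof(int), rows,
                            cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_a);
    return ok;
}

int main()
{
    const int rows = 3, cols = 5;
    int h[rows * cols];
    if (!run_pitched(h, rows, cols)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c)
            printf("%3d", h[r * cols + c]);
        printf("\n");
    }
    return 0;
}
```

Compared with the flattened version, the only change in indexing is that the row offset comes from the pitch returned by the runtime rather than from the logical row width.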