CUDA illegal memory access error when using array indices stored in another array

I am using CUDA Fortran and have been struggling with this problem in a simple kernel without finding a solution. Is it not possible to use integer values stored in one array as indices into another array?

Here's a simple example (structured the same way as my main program):

program test
  use cudafor

  integer:: ncell, i
  integer, allocatable:: values(:)
  integer, allocatable, device :: values_d(:)


  ncell = 10

  allocate(values(ncell), values_d(ncell))

  do i=1,ncell
        values(i) = i
  enddo


  values_d = values


  call multipleindices_kernel<<< ncell/1024+1,1024 >>> (values_d, ncell)

  values = values_d

  write (*,*) values


  end program test

!////////////////////////////////////////////////////

attributes(global) subroutine multipleindices_kernel(valu, ncell)
use cudafor
  implicit none
  integer, value:: ncell   ! ncell = 10
  integer :: valu(ncell)
  integer :: tempind(10)
  integer:: i

  tempind(1)=10
  tempind(2)=3
  tempind(3)=5
  tempind(4)=7
  tempind(5)=9
  tempind(6)=2
  tempind(7)=4
  tempind(8)=6
  tempind(9)=8
  tempind(10)=1

  i = (blockidx%x - 1 ) * blockdim%x + threadidx%x

  if (i .LE. ncell) then
        valu(tempind(i))= 1
  endif


  end subroutine

      

I understand that if the tempind array contained repeated values, different threads could access the same memory location for reading or writing, but that is not the case here. Nevertheless, I get the error "0: copying Memcpy (host = 0x303610, dev = 0x3e20000, size = 40) FAILED: 77 (illegal memory access was encountered)."

Does anyone know whether it is possible to use indices coming from another array in CUDA?

After some additional tests, I noticed that the error is not reported at the kernel launch itself but when the data is transferred back to the CPU (if I remove "values = values_d", no error is visible). Also, if I replace valu(tempind(i)) with valu(i) in the kernel, it works fine, but I want the indices to come from an array, because the purpose of this test is to parallelize a CFD code in which the indices are stored that way.

1 answer


The problem is that the generated executable is not passing the correct value of ncell to the kernel. Running the application through cuda-memcheck shows that threads outside 1-10 are going through the branch instruction, and adding a print statement to print ncell inside the kernel also gives weird answers.
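
Because kernel launches are asynchronous, an error raised inside a kernel is normally reported by the next synchronizing call, which would explain why the failure in the question shows up at the copy back to the host. Below is a minimal sketch of checking for it right after the launch, using the cudafor routines cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString. The module checkmod, the kernel fill_kernel and the helper check_kernel are hypothetical names used only for illustration.

```fortran
module checkmod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel for illustration: sets every element to 1.
  attributes(global) subroutine fill_kernel(a, n)
    integer, value :: n
    integer :: a(n)
    integer :: i
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) a(i) = 1
  end subroutine fill_kernel

  ! Hypothetical helper: reports launch and execution errors separately.
  subroutine check_kernel(label)
    character(len=*), intent(in) :: label
    integer :: istat
    istat = cudaGetLastError()        ! errors detected at launch time
    if (istat /= cudaSuccess) &
      write (*,*) label, ' launch: ', trim(cudaGetErrorString(istat))
    istat = cudaDeviceSynchronize()   ! waits for the kernel to finish
    if (istat /= cudaSuccess) &
      write (*,*) label, ' execution: ', trim(cudaGetErrorString(istat))
  end subroutine check_kernel
end module checkmod

program checktest
  use cudafor
  use checkmod
  implicit none
  integer, parameter :: n = 10
  integer :: a(n)
  integer, device :: a_d(n)

  a_d = 0
  call fill_kernel<<< n/1024+1, 1024 >>>(a_d, n)
  call check_kernel('fill_kernel')   ! check before the copy back to the host
  a = a_d
  write (*,*) a
end program checktest
```

Calling such a helper after each launch attributes the error to the kernel rather than to the later device-to-host copy.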

It used to be required that all attributes(global) subroutines be placed inside a module. This requirement seems to have been relaxed in later versions of CUDA Fortran (I cannot find references to it in the programming manual). I believe having the kernel outside a module is what causes the error here: without a module or an explicit interface, the caller presumably cannot see that ncell is declared with the value attribute, so it is not passed the way the kernel expects. By putting multipleindices_kernel in a module and using that module in test, I am consistently getting correct answers without errors. The code for this is below:



module testmod
contains
attributes(global) subroutine multipleindices_kernel(valu, ncell)
  use cudafor
  implicit none
  integer, value:: ncell   ! ncell = 10
  integer :: valu(ncell)
  integer :: tempind(10)
  integer:: i

  tempind(1)=10
  tempind(2)=3
  tempind(3)=5
  tempind(4)=7
  tempind(5)=9
  tempind(6)=2
  tempind(7)=4
  tempind(8)=6
  tempind(9)=8
  tempind(10)=1

  i = (blockidx%x - 1 ) * blockdim%x + threadidx%x

  if (i .LE. ncell) then
        valu(tempind(i))= 1
  endif


  end subroutine
end module testmod

  program test
  use cudafor
  use testmod

  integer:: ncell, i
  integer, allocatable:: values(:)
  integer, allocatable, device :: values_d(:)


  ncell = 10

  allocate(values(ncell), values_d(ncell))

  do i=1,ncell
        values(i) = i
  enddo


  values_d = values


  call multipleindices_kernel<<< ncell/1024+1,1024 >>> (values_d, ncell)

  values = values_d

  write (*,*) values


  end program test

!////////////////////////////////////////////////////
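
For the CFD use case mentioned in the question, the index list would presumably come from the host rather than being hard-coded inside the kernel. Below is a minimal sketch, assuming the indices are copied to the device and passed as a second array argument. The names indexmod, scatter_kernel, ind and ind_d are hypothetical, and the kernel again lives in a module.

```fortran
module indexmod
  use cudafor
  implicit none
contains
  ! Each thread writes 1 into the element selected by its entry in ind.
  attributes(global) subroutine scatter_kernel(valu, ind, n)
    integer, value :: n
    integer :: valu(n)
    integer :: ind(n)
    integer :: i
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) valu(ind(i)) = 1
  end subroutine scatter_kernel
end module indexmod

program indextest
  use cudafor
  use indexmod
  implicit none
  integer, parameter :: n = 10
  integer :: values(n), ind(n), i
  integer, device :: values_d(n), ind_d(n)

  ! Same permutation as tempind in the question.
  ind = (/ 10, 3, 5, 7, 9, 2, 4, 6, 8, 1 /)
  do i = 1, n
    values(i) = i
  enddo

  values_d = values   ! host-to-device copies
  ind_d = ind

  call scatter_kernel<<< n/1024+1, 1024 >>>(values_d, ind_d, n)

  values = values_d   ! device-to-host copy
  write (*,*) values
end program indextest
```

As in the question, this relies on ind holding no repeated values and on every entry lying within 1..n; otherwise threads would race on the same element or write out of bounds.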

      
