How is error handling done in JCuda?

In CUDA, we can detect errors by checking the return value of functions such as cudaMemcpy(), cudaMalloc(), etc., which is of type cudaError_t, against cudaSuccess. Is there a similar way in JCuda to check for errors from functions such as cuMemcpyHtoD(), cuMemAlloc(), cuLaunchKernel(), etc.?



1 answer


First of all, JCuda methods (should) behave exactly like the corresponding CUDA functions: they return an error code in the form of an int. These error codes are also defined in ... and are the same error codes as in the corresponding CUDA library.

All of these classes additionally have a static method called stringFor(int), for example cudaError#stringFor(int) and CUresult#stringFor(int). These methods return a human-readable String representation of the error code.

So you can do manual error checks like:

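// someCudaFunction() stands for any JCuda runtime call that returns a status code
int error = someCudaFunction();
if (error != 0) { // 0 corresponds to cudaSuccess
    System.out.println("Error code "+error+": "+cudaError.stringFor(error));
}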

      

which can print something like



Error code 10: cudaErrorInvalidDevice
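
If the manual check is needed in many places, it can be wrapped in a small helper. The following is only a sketch: the class CudaCheck and its method check are hypothetical names, not part of JCuda, and the name lookup is passed in as a function so that the snippet compiles without JCuda on the classpath. The main method uses a stand-in lookup purely for demonstration.

```java
import java.util.function.IntFunction;

// Hypothetical helper, not part of JCuda.
class CudaCheck {

    // Returns normally when result is 0 (success); otherwise throws an
    // exception carrying the same message the manual check prints.
    static void check(int result, IntFunction<String> nameOf) {
        if (result != 0) {
            throw new IllegalStateException(
                "Error code " + result + ": " + nameOf.apply(result));
        }
    }

    public static void main(String[] args) {
        // Stand-in for a lookup such as a stringFor(int) method (assumption for the demo).
        IntFunction<String> names =
            code -> code == 10 ? "cudaErrorInvalidDevice" : "unknown";

        check(0, names); // success: nothing happens
        try {
            check(10, names);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With JCuda available, a call would presumably look like CudaCheck.check(cuMemAlloc(pointer, size), CUresult::stringFor), where pointer and size are placeholders for your own variables.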

      

But...

... error checking can be a hassle. You may have noticed in the CUDA samples that NVIDIA introduced some macros that make error checking easier. In a similar way, I have added optional exception checks to JCuda: all libraries offer a static method called setExceptionsEnabled(boolean). After calling

JCudaDriver.setExceptionsEnabled(true);

all subsequent calls to driver API methods will automatically check the return values and throw a CudaException when any error occurs.

(Note that this method exists separately for each library. For example, when using JCublas, the call would be JCublas.setExceptionsEnabled(true).)

The samples usually enable exceptions at the beginning of the main method, and I would recommend doing the same, at least during the development phase. Once it is clear that the program does not contain any errors, the exceptions can be disabled, but there is hardly a reason to do so: they provide clear information about which error occurred, whereas without them calls may fail silently.
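
To illustrate the difference between the two modes, here is a simplified model of such an exceptions toggle. It is a sketch under assumptions and not JCuda's actual source: the class ExceptionToggleSketch, the method failingCall, and the status value 1 are all made up for the demonstration.

```java
// Simplified model of a library-wide exceptions flag; not JCuda code.
class ExceptionToggleSketch {

    // Hypothetical stand-in for the state that setExceptionsEnabled(boolean) controls.
    private static boolean exceptionsEnabled = false;

    static void setExceptionsEnabled(boolean enabled) {
        exceptionsEnabled = enabled;
    }

    // Every wrapped call passes its raw status code through this method.
    static int checkResult(int result) {
        if (exceptionsEnabled && result != 0) {
            throw new RuntimeException("Error code " + result);
        }
        return result;
    }

    // Stand-in for a native call that fails with status 1.
    static int failingCall() {
        return checkResult(1);
    }

    public static void main(String[] args) {
        // Exceptions off: the failure is visible only in the return value.
        int status = failingCall();
        System.out.println("returned " + status);

        // Exceptions on: the same failure now interrupts the program flow.
        setExceptionsEnabled(true);
        try {
            failingCall();
        } catch (RuntimeException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```

In the first case the caller has to remember to inspect the returned status; in the second, an unchecked failure cannot go unnoticed.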
