CUDA concurrent kernel priority

I have two kernels (A and B) that can run concurrently. I need kernel A to finish as soon as possible (so that I can do an MPI exchange of its result). So I could execute them in a single stream: A and then B.

However, kernel A has only a few thread blocks, so if I run A and B sequentially, the GPU is not fully utilized while A is running.

Is it possible to execute A and B concurrently, with A having higher priority?

I.e., I want thread blocks from kernel B to run only when kernel A has no thread blocks still waiting to start.

As I understand it, if I launch kernel A in one stream and, on the next line of the host code, launch kernel B in another stream, I have no guarantee that thread blocks from B will not actually be executed first?



1 answer


NVIDIA now provides a way to prioritize CUDA kernels. This is a fairly new feature, so you need to upgrade to CUDA 5.5 or later to use it.

In your case, you would launch kernel A in a high-priority CUDA stream and kernel B in a low-priority CUDA stream. The function you probably want is cudaStreamCreateWithPriority(..., priority).

  • To use this feature, you need a GPU with compute capability 3.5 or higher. To check whether priorities are supported on your GPU, take a look at cudaDeviceProp::streamPrioritiesSupported.
  • cudaDeviceGetStreamPriorityRange should tell you how many priority levels are available on your GPU. The syntax for cudaDeviceGetStreamPriorityRange is a bit odd; it's worth looking into the CUDA manual to see how it works.
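As a rough sketch of the two checks above (assuming device 0 is the GPU you care about, which is my assumption and not something from the question), you could query both like this. Note that cudaDeviceGetStreamPriorityRange returns its results through two output pointers, and because lower numbers mean higher priority, the "greatest" priority is the numerically smaller value:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: the GPU of interest is device 0

    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Check whether the device supports stream priorities at all.
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("streamPrioritiesSupported = %d\n", prop.streamPrioritiesSupported);

    // Query the valid priority range for the current device.
    // Lower numbers are higher priorities, so greatestPriority <= leastPriority.
    int leastPriority = 0, greatestPriority = 0;
    err = cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetStreamPriorityRange: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("least priority = %d, greatest priority = %d\n",
           leastPriority, greatestPriority);
    return 0;
}
```

If the device does not support priorities, both values should come back as 0.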



More detailed documentation on priority settings, from the CUDA Runtime API manual:

cudaError_t cudaStreamCreateWithPriority(cudaStream_t *pStream, 
                                         unsigned int flags, int priority)
Create an asynchronous stream with the specified priority.

Parameters
pStream  = Pointer to new stream identifier 
flags    = Flags for stream creation. See cudaStreamCreateWithFlags for a list of 
           valid flags that can be passed 
priority = Priority of the stream. Lower numbers represent higher priorities. See  
           cudaDeviceGetStreamPriorityRange for more information about the 
           meaningful stream priorities that can be passed.
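Putting it together, here is a minimal sketch of how the two kernels from the question might be launched. The kernel bodies, names (kernelA, kernelB), buffer sizes and block counts are all made up for illustration. Keep in mind that, as far as I know, priorities only influence which pending thread blocks get scheduled next; they do not preempt blocks that are already running.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical stand-ins for the real kernels A and B.
__global__ void kernelA(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f;
}

__global__ void kernelB(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] + 1.0f;
}

int main() {
    const int threads = 256;
    const int nA = 4 * threads;     // assumption: A is small (few blocks)
    const int nB = 4096 * threads;  // assumption: B is large (many blocks)

    int leastPriority = 0, greatestPriority = 0;
    CHECK(cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority));

    // Lower number = higher priority, so A gets greatestPriority.
    cudaStream_t streamA, streamB;
    CHECK(cudaStreamCreateWithPriority(&streamA, cudaStreamNonBlocking,
                                       greatestPriority));
    CHECK(cudaStreamCreateWithPriority(&streamB, cudaStreamNonBlocking,
                                       leastPriority));

    float *dA = nullptr, *dB = nullptr;
    CHECK(cudaMalloc(&dA, nA * sizeof(float)));
    CHECK(cudaMalloc(&dB, nB * sizeof(float)));
    CHECK(cudaMemset(dA, 0, nA * sizeof(float)));
    CHECK(cudaMemset(dB, 0, nB * sizeof(float)));

    // Launch both kernels; they may overlap, with A's blocks preferred.
    kernelA<<<nA / threads, threads, 0, streamA>>>(dA, nA);
    CHECK(cudaGetLastError());
    kernelB<<<nB / threads, threads, 0, streamB>>>(dB, nB);
    CHECK(cudaGetLastError());

    // Wait only for A; its result is now ready (the MPI exchange would go here).
    CHECK(cudaStreamSynchronize(streamA));
    printf("kernel A finished\n");

    // B can keep running during the exchange; wait for it afterwards.
    CHECK(cudaStreamSynchronize(streamB));
    printf("kernel B finished\n");

    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaStreamDestroy(streamA));
    CHECK(cudaStreamDestroy(streamB));
    return 0;
}
```

Synchronizing on A's stream alone means the host does not have to wait for B before starting the exchange.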

      
