CUDA: mixing C++ and CUDA code

My problem is this: I want to add CUDA code to an already existing C++ library and reuse my existing code as much as possible. To get polymorphism, I use template classes and template kernels. As things stand, everything is implemented in .cpp, .h and .cuh files. There is no .cu file, so nvcc is never invoked, and the C++ compiler chokes on the <<< >>> kernel invocation syntax.

I've already seen "How to separate CUDA kernel file from main .cpp file" and "How to call CUDA file from C++ header file?", but I cannot find a design that would solve my problem.

Files used:

main.cpp

Sets up a number of existing classes and passes them to the CudaPrepare class, which combines them and is responsible for preparing the data handed over to the CUDA code, using primitive types only.

#include "CudaPrepare.h"
#include "CudaSpecificType1.h"
#include "A.h" //already existing classes 
#include "B.h" //already existing classes

int main()
{
    A a(...);
    B b(...);
    CudaSpecificType1 cudaType(...);
    CudaPrepare<CudaSpecificType1> cudaPrepare(a, b, cudaType);
    cudaPrepare.run();
    return 0;
}


CudaSpecificType1.cuh

class CudaSpecificType1
{
protected:
    /*
    a few members
    */
public:
    CudaSpecificType1(...) : /*initializations*/ {}
    // called from the kernel, so it needs __device__
    // (or __host__ __device__ if it is also used on the host)
    __device__ float polymorphicFunction(/*args*/);
};


CudaPrepare.h

#include "A.h" //already existing classes 
#include "B.h" //already existing classes

template<typename T>
class CudaPrepare
{
protected:
const A& a;
const B& b;
const T& t;
public:
CudaPrepare(const A& a, const B& b, const T& t): A(a), B(b), T(t) {/*some initialization stuff*/}
void run() const
{
/*
data preparation : various discretizations,  sticking to primitive type only, casting to single precision etc...
*/

CudaClass<T> cudaClass(t, /*all the prepared data here*/);
cudaClass.run();

}
};


CudaClass.cuh

template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
    int threadId = ...;
    results[threadId] = t.polymorphicFunction(...);
}


template<typename T>
class CudaClass
{
protected:
    const T& t;
    /*
    all the prepared data with primitive types
    */
public:
    CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}
    void run() const
    {
        /*
        grid size calculation, cuda memory allocation, data transfer to device...
        */
        //kernel invocation
        kernel<T><<</*grid & block size*/>>>(/*args*/);
        /*
        clean up with cudaFree(...);
        */
    }
};


The C++ compiler reports an error at the kernel call, as expected. CudaClass::run() cannot simply be moved to a .cu file because the class is templated. The only thing I can think of is introducing a .cu file that either replaces main.cpp or contains a stub called from main.cpp, but then nvcc cannot handle some C++11 features. In particular, A.h and B.h contain many enum classes...
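For illustration, the stub variant I considered would look something like this (file and function names are hypothetical); it fails because the stub's .cu translation unit still has to include A.h and B.h:

// stub.h -- included from main.cpp
class A; class B; class CudaSpecificType1;
void runOnGpu(const A& a, const B& b, const CudaSpecificType1& t);

// stub.cu -- compiled by nvcc
#include "stub.h"
#include "CudaSpecificType1.cuh"
#include "CudaPrepare.h" // pulls in A.h / B.h, whose enum classes nvcc rejects

void runOnGpu(const A& a, const B& b, const CudaSpecificType1& t)
{
    CudaPrepare<CudaSpecificType1> cudaPrepare(a, b, t);
    cudaPrepare.run(); // eventually reaches the <<< >>> kernel launch
}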



1 answer


I experimented with CUDA 7.0 (6.5 earlier). Unfortunately, there is still no support for (at least) the following C++11 features (a few illustrative one-liners follow the list):

  • enum classes

  • the final keyword

  • range-based for loops
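Purely for illustration, this is the kind of code that tripped nvcc at the time (nothing here is from the project above):

#include <iostream>
#include <vector>

enum class Color { Red, Green, Blue };              // scoped enumeration

struct Base { virtual void f() {} };
struct Derived final : Base { void f() final {} };  // final class / final virtual

void printAll(const std::vector<int>& v)
{
    for (int x : v)                                 // range-based for loop
        std::cout << x << '\n';
}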

However, as Robert Crovella suggested, explicit template instantiation solves the problem.

CudaClass.cuh should be split into two parts:



CudaClass.cuh

template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
    int threadId = ...;
    results[threadId] = t.polymorphicFunction(...);
}


template<typename T>
class CudaClass
{
protected:
    const T& t;
    /*
    all the prepared data with primitive types
    */
public:
    CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}

    void run() const;
};


CudaClass.cu

#include "CudaClass.cuh"



//explicit instantiation, so that the kernel invocation can be in a .cu file
template class CudaClass<CudaSpecificType1>;
/*
other explicit instantiations for various types
*/



template<typename T>
void run() const
{
/*
grid size calculation, cuda memory allocation, data transfer to device...
*/
//kernel invocation
kernel<T><<</*grid & block size*/>>>(/*args*/);
/*
clean up with cudaFree(...);
*/
}
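With this split, CudaClass.cu is the only translation unit nvcc has to compile; main.cpp and the headers stay with the host compiler. A minimal sketch of the build, assuming g++ on Linux and a default CUDA install path (the output names are illustrative):

nvcc -c CudaClass.cu -o CudaClass.o               # the only TU containing <<< >>>
g++ -std=c++11 -c main.cpp -o main.o              # host compiler handles the C++11 code
g++ main.o CudaClass.o -o app -L/usr/local/cuda/lib64 -lcudart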

