CUDA: mix C++ and CUDA code
My problem is this: I want to add CUDA code to an already existing C++ library, reusing my existing code as much as possible. To get polymorphism I use template classes and template kernels, so everything is implemented in .cpp, .h and .cuh files. There is no .cu file, therefore nvcc is never invoked and the C++ compiler chokes on the `<<<...>>>` kernel invocation syntax.
I've already seen [How to separate the CUDA kernel file from the main .cpp file] and [How to call a CUDA file from a C++ header file?], but I cannot find any design that solves my problem.
Files used:
main.cpp
Sets up a bunch of existing classes and passes them to the CudaPrepare class, which combines them and is responsible for preparing the data passed to the CUDA code, using primitive types only.
#include "CudaPrepare.h"
#include "CudaSpecificType1.h"
#include "A.h" //already existing classes
#include "B.h" //already existing classes
int main()
{
A a(...);
B b(...);
CudaSpecificType1 cudaType(...);
CudaPrepare<CudaSpecificType1> cudaPrepare(a, b, cudaType);
cudaPrepare.run();
}
CudaSpecificType1.cuh
class CudaSpecificType1
{
protected:
/*
a few members
*/
public:
CudaSpecificType1(...) : /*initializations*/ {}
float polymorphicFunction(/*args*/);
};
CudaPrepare.h
#include "A.h" //already existing classes
#include "B.h" //already existing classes
template<typename T>
class CudaPrepare
{
protected:
const A& a;
const B& b;
const T& t;
public:
CudaPrepare(const A& a, const B& b, const T& t): a(a), b(b), t(t) {/*some initialization stuff*/}
void run() const
{
/*
data preparation : various discretizations, sticking to primitive type only, casting to single precision etc...
*/
CudaClass<T> cudaClass(t, /*all the prepared data here*/);
cudaClass.run();
}
};
CudaClass.cuh
template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
int threadId = ...;
results[threadId] = t.polymorphicFunction(...);
}
template<typename T>
class CudaClass
{
protected:
const T& t;
/*
all the prepared data with primitive types
*/
public:
CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}
void run() const
{
/*
grid size calculation, cuda memory allocation, data transfer to device...
*/
//kernel invocation
kernel<T><<</*grid & block size*/>>>(/*args*/);
/*
clean up with cudaFree(...);
*/
}
};
The C++ compiler gives an error at the kernel call, as expected. CudaClass::run() cannot simply be moved to a .cu file because the class is templated. The only thing I can think of is introducing a .cu file that replaces main.cpp, or that contains a stub called from main.cpp, but then nvcc cannot handle some C++11 features. In particular, A.h and B.h contain many enum classes...
I experimented with CUDA 7.0 (6.5 earlier). Unfortunately, there is still no support for (at least) the following C++11 features:
- enum classes
- the final keyword
- range-based for loops
However, as Robert Crovella suggested, explicit template instantiation solves the problem.
CudaClass.cuh should be split into two parts:
CudaClass.cuh
template <typename T>
__global__ void kernel(const T t, /*other args*/, float* results)
{
int threadId = ...;
results[threadId] = t.polymorphicFunction(...);
}
template<typename T>
class CudaClass
{
protected:
const T& t;
/*
all the prepared data with primitive types
*/
public:
CudaClass(const T& t, ...) : t(t) /*other initialization*/ {}
void run() const;
};
CudaClass.cu
#include "CudaClass.cuh"
#include "CudaSpecificType1.cuh"
template<typename T>
void CudaClass<T>::run() const
{
/*
grid size calculation, cuda memory allocation, data transfer to device...
*/
//kernel invocation
kernel<T><<</*grid & block size*/>>>(/*args*/);
/*
clean up with cudaFree(...);
*/
}
//explicit instantiation, so that the kernel invocation can live in a .cu file;
//it must come after the definition of run() so that member gets instantiated
template class CudaClass<CudaSpecificType1>;
/*
other explicit instantiations for various types
*/
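With this split, only the .cu file ever goes through nvcc, so the C++11-heavy headers (A.h, B.h) never reach it. A hypothetical build sketch (paths and file names assumed, not from the original post):

```shell
# nvcc compiles only the CUDA translation unit; no C++11 needed there
nvcc -c CudaClass.cu -o CudaClass.o
# the host compiler handles the C++11 code
g++ -std=c++11 -c main.cpp -o main.o
# link the two together against the CUDA runtime
g++ main.o CudaClass.o -L/usr/local/cuda/lib64 -lcudart -o app
```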