Why should we pass a pointer to a pointer to cudaMalloc?

The following code is widely used to allocate global GPU memory:

float *M;
cudaMalloc((void**)&M, size);

I wonder why we have to pass a pointer to a pointer to cudaMalloc and why it was not designed like this:

float *M;
cudaMalloc(M, size);

Thanks for any simple descriptions!



2 answers


cudaMalloc has to write the pointer value into M itself (not into *M), so M must be passed by reference.

The alternative would be to return the pointer, in the classic malloc style. Unlike malloc, however, cudaMalloc returns an error status, as CUDA runtime API functions generally do, so the return value is already taken.
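The trade-off between returning the pointer and returning a status can be sketched on the host. This is only an analogy under stated assumptions: `AllocStatus` and `statusAlloc` are hypothetical names, `std::malloc` stands in for the device allocator, and nothing here is the actual CUDA implementation.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an error-code type such as cudaError_t.
enum class AllocStatus { Success, InvalidValue, OutOfMemory };

// Hypothetical host-side analogue of cudaMalloc's shape.
// The return value carries the status, so the new address has to
// leave the function through the out-parameter `out`.
AllocStatus statusAlloc(void** out, std::size_t bytes) {
    if (out == nullptr || bytes == 0) {
        return AllocStatus::InvalidValue;   // nowhere to store the address, or nothing to allocate
    }
    *out = std::malloc(bytes);              // write the address into the caller's pointer
    return (*out != nullptr) ? AllocStatus::Success
                             : AllocStatus::OutOfMemory;
}
```

A caller would declare `void* raw = nullptr;`, call `statusAlloc(&raw, n * sizeof(float))`, check the status, and only then use `raw`. With a malloc-style signature, the only failure signal available would be a null return value.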



To explain the need in a little more detail:

Before the call to cudaMalloc, M points ... anywhere; its value is undefined. After the call, you want a valid array to exist at the memory location it points to. One could naively say "then just allocate the memory at that location", but that is, of course, generally impossible: an undefined address will usually not even lie inside valid memory. cudaMalloc has to be able to choose the location itself. But if the pointer is passed by value, the function has no way of telling the caller where that location is.
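The difference between the two ways of passing the pointer can be seen in a small host-only sketch. The function names `allocByValue` and `allocByAddress` are made up for illustration, and `std::malloc` again stands in for the device allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Receives a COPY of the caller's pointer. The assignment changes only
// the local copy, so the caller never learns the chosen address.
// (The block is freed right away so that this sketch does not leak.)
void allocByValue(float* p, std::size_t n) {
    p = static_cast<float*>(std::malloc(n * sizeof(float)));
    std::free(p);
}

// Receives the ADDRESS of the caller's pointer and writes through it,
// so the caller's pointer ends up holding the chosen address.
void allocByAddress(float** p, std::size_t n) {
    assert(p != nullptr);
    *p = static_cast<float*>(std::malloc(n * sizeof(float)));
}
```

After `float* M = nullptr; allocByValue(M, 8);` the caller's `M` is still null, whereas `allocByAddress(&M, 8);` leaves `M` pointing at the new block (assuming the allocation succeeds).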

In C++, one could give it a signature such as

template<typename PointerType>
cudaError_t cudaMalloc(PointerType& ptr, size_t);


where passing ptr by reference allows the function to change where it points. But since cudaMalloc is part of the CUDA C API, this is not an option. The only way to pass something modifiable in C is to pass a pointer to it. And since the object itself is a pointer, what you need to pass is a pointer to a pointer.
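To illustrate how a reference-taking C++ signature relates to the C-compatible one, here is a minimal host-only sketch. Both `cStyleAlloc` and `allocRef` are hypothetical names, the integer status codes are arbitrary, and `std::malloc` stands in for the real allocator; this is not how the CUDA headers are written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical C-compatible allocator: pointer-to-pointer out-parameter,
// integer status (0 = success, nonzero = failure).
int cStyleAlloc(void** out, std::size_t bytes) {
    if (out == nullptr) return 1;
    *out = std::malloc(bytes);
    return (*out != nullptr) ? 0 : 2;
}

// C++-only convenience wrapper: taking the pointer by reference hides the
// extra level of indirection from the caller. It cannot be declared in C,
// because C has neither references nor templates.
template <typename T>
int allocRef(T*& ptr, std::size_t count) {
    assert(count > 0);                       // keep the sketch away from malloc(0)
    void* raw = nullptr;
    int status = cStyleAlloc(&raw, count * sizeof(T));
    ptr = static_cast<T*>(raw);              // update the caller's pointer via the reference
    return status;
}
```

With the wrapper a caller writes `allocRef(M, 16)` instead of `cStyleAlloc((void**)&M, 16 * sizeof(float))`. Underneath, the address of a pointer is still what reaches the C-style function.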


