Why should we pass a pointer to a pointer to cudaMalloc?
The following code is widely used to allocate global GPU memory:
float *M;
cudaMalloc((void**)&M,size);
I wonder why we have to pass a pointer to a pointer to cudaMalloc and why it was not designed like this:
float *M;
cudaMalloc((void*)M,size);
Thanks for any simple explanation!
To explain the need in a little more detail:
Before the call to cudaMalloc, M points ... anywhere; its value is undefined. After the call, you want a valid array to be present at the memory location it points to. One could naively say "then just allocate memory at that location", but this is, of course, generally impossible: an undefined address will usually not even lie inside valid memory. cudaMalloc must be able to choose the location itself. But if the pointer is passed by value, the function has no way to tell the caller which location it chose.
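The by-value problem can be reproduced on the host without any CUDA at all. The following is only a minimal sketch in plain C: alloc_by_value and pointer_unchanged_after_by_value_call are hypothetical names invented for this illustration, and malloc stands in for the device allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical by-value "allocator": it receives only a COPY of the
   caller's pointer, so nothing it does to p is visible to the caller. */
static void alloc_by_value(void *p, size_t size)
{
    p = malloc(size);   /* overwrites the local copy only */
    free(p);            /* freed here, because the caller can never reach it */
}

/* Returns 1 if the caller's pointer is unchanged after the call. */
static int pointer_unchanged_after_by_value_call(void)
{
    float *M = NULL;    /* NULL instead of uninitialized, so it can be compared safely */
    alloc_by_value(M, 16 * sizeof(float));
    assert(M == NULL);  /* the allocation never reached the caller */
    return M == NULL;
}
```

Calling pointer_unchanged_after_by_value_call() returns 1: whatever address the function picked internally, M in the caller still holds its old value.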
In C++ one could write a signature such as
template<typename PointerType>
cudaError_t cudaMalloc(PointerType& ptr, size_t size);
where passing ptr by reference allows the function to change where it points. But since cudaMalloc is part of the CUDA C API, this is not an option. The only way to pass something modifiable in C is to pass a pointer to it. And since the object itself is a pointer, what you need to pass is a pointer to a pointer.
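The same shape can be sketched on the host in plain C. This is an illustration under stated assumptions, not how CUDA is implemented: my_malloc, alloc_status and demo_pointer_to_pointer are hypothetical names, malloc stands in for the device allocator, and the status enum stands in for the error code that cudaMalloc returns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes, standing in for CUDA's error type. */
typedef enum { ALLOC_OK = 0, ALLOC_FAILED = 1 } alloc_status;

/* Hypothetical host-side analogue of cudaMalloc's shape: the return
   value carries the status, and the address comes back through 'out'. */
static alloc_status my_malloc(void **out, size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        return ALLOC_FAILED;
    }
    *out = p;           /* writes into the CALLER's pointer variable */
    return ALLOC_OK;
}

/* Returns 1 if the caller ends up with usable memory. */
static int demo_pointer_to_pointer(void)
{
    void *raw = NULL;
    float *M = NULL;
    int ok;

    /* The question's CUDA code casts &M to void** instead; a void*
       temporary keeps this host-only sketch free of pointer-type punning. */
    if (my_malloc(&raw, 16 * sizeof(float)) != ALLOC_OK) {
        return 0;
    }
    M = raw;
    assert(M != NULL);  /* the callee's choice of address reached the caller */

    M[0] = 1.0f;
    ok = (M[0] == 1.0f);
    free(M);
    return ok;
}
```

Because the return value is already used for the status, the output parameter is the only remaining channel for the address, which is why it has to be a pointer to the caller's pointer.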