Parallel filtering of a CUDA array (compute capability < 3.0)
I'm trying to design a framework for standard functional sequence operations (map, filter, fold, etc.) in F#, where the computation is done on the GPU (using CUDA) instead of the CPU.
I have had some success implementing a CUDA map function, so that I can write:
let result =
    cudaArray
    |> CudaArray.map <@ fun x -> x ** 3.0 @>
    |> Array.ofCudaArray
That part was relatively simple, because each element is processed independently of all the others.
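For illustration, here is a minimal sketch of what such a map reduces to on the device, assuming the quotation `fun x -> x ** 3.0` is compiled to a kernel that computes the cube of each element. The names `cubeMap` and `cubeOnGpu` are hypothetical, not part of any real F# or CUDA library. The point is that each thread writes only its own output slot, so no coordination between threads is needed.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical map kernel: one thread per element. Each output depends only on
// its own input, so threads never need to communicate or synchronize.
__global__ void cubeMap(const double* in, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n) out[i] = in[i] * in[i] * in[i];      // cube of x (the body of the quotation)
}

// Hypothetical host wrapper: copy in, launch, copy out.
std::vector<double> cubeOnGpu(const std::vector<double>& h_in)
{
    int n = static_cast<int>(h_in.size());
    if (n == 0) return {};
    double *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_out, n * sizeof(double));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // enough blocks to cover all n elements
    cubeMap<<<blocks, threads>>>(d_in, d_out, n);

    std::vector<double> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(double), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);  // may also report an error from the kernel launch
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For example, `cubeOnGpu({1.0, 2.0, -3.0})` should return `{1.0, 8.0, -27.0}`.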
Now I'm interested in writing a similar system for filtering with a predicate. I'm thinking of implementing the predicate as another map that returns a boolean array, but then I need a way to compact the input array down to only those elements whose corresponding boolean is true.
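The plan above (a predicate map, then compaction) has a well-known portable form: an exclusive prefix sum (scan) over 0/1 flags gives every kept element its output index, and a scatter kernel then writes each kept element to that index. The sketch below shows that scan-based stream compaction. It is a different technique from the article's, and it assumes a placeholder predicate `x > 0`, borrows `thrust::exclusive_scan` for the scan step, and uses hypothetical names (`predicateMap`, `scatter`, `filterByScan`). It needs no warp intrinsics, and unlike atomic-based approaches it preserves input order.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/scan.h>
#include <cassert>
#include <vector>

// Step 1, the predicate as a map: flags[i] = 1 if in[i] passes, else 0.
// The predicate x > 0 is only a placeholder.
__global__ void predicateMap(const float* in, int* flags, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flags[i] = in[i] > 0.0f ? 1 : 0;
}

// Step 3, scatter: pos[i] (the exclusive scan of flags) is the output index of in[i].
__global__ void scatter(const float* in, const int* flags, const int* pos, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && flags[i]) out[pos[i]] = in[i];
}

std::vector<float> filterByScan(const std::vector<float>& h_in)
{
    int n = static_cast<int>(h_in.size());
    if (n == 0) return {};
    float *d_in = nullptr, *d_out = nullptr;
    int *d_flags = nullptr, *d_pos = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_flags, n * sizeof(int));
    cudaMalloc(&d_pos, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256, blocks = (n + threads - 1) / threads;
    predicateMap<<<blocks, threads>>>(d_in, d_flags, n);

    // Step 2, exclusive prefix sum: flags 1 0 1 1 -> positions 0 1 1 2.
    thrust::device_ptr<int> f(d_flags), p(d_pos);
    thrust::exclusive_scan(f, f + n, p);

    scatter<<<blocks, threads>>>(d_in, d_flags, d_pos, d_out, n);

    // Number of kept elements = last position + last flag.
    int lastPos = 0, lastFlag = 0;
    cudaMemcpy(&lastPos, d_pos + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&lastFlag, d_flags + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    int count = lastPos + lastFlag;
    assert(count >= 0 && count <= n);

    std::vector<float> h_out(count);
    cudaMemcpy(h_out.data(), d_out, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_flags); cudaFree(d_pos);
    return h_out;
}
```

The trade-off is an extra pass over the data for the scan, in exchange for deterministic, order-preserving output and no dependence on compute capability 3.0 features.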
I found this article ( http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/ ), which describes what looks like a very good method for solving this problem. In summary, it uses the following function to compute the output index of each kept element in the target array:
// warp-aggregated atomic increment
// (lane_id() and warp_bcast() are helper functions, not CUDA built-ins)
__device__
int atomicAggInc(int *ctr) {
    int mask = __ballot(1);
    // select the leader
    int leader = __ffs(mask) - 1;
    // leader does the update
    int res;
    if (lane_id() == leader)
        res = atomicAdd(ctr, __popc(mask));
    // broadcast result
    res = warp_bcast(res, leader);
    // each thread computes its own value
    return res + __popc(mask & ((1 << lane_id()) - 1));
} // atomicAggInc
The problem is that, as I understand from the article, warp_bcast relies on the warp shuffle intrinsic (__shfl), which is only supported on devices of compute capability 3.0 or higher.
I'd like to know whether there is an alternative to warp_bcast that works on compute capability < 3.0, or some other approach to this problem that doesn't sacrifice the large performance gains described in the article. (To be clear, I'm completely open to approaches entirely different from the one in the article if they solve the problem.)
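One way to avoid warp shuffle entirely is to aggregate per block instead of per warp: threads reserve slots in a shared-memory counter, then a single thread per block performs the global atomicAdd and broadcasts the block's base offset through shared memory after a `__syncthreads()`. The sketch below is a swapped-in technique (block-level, not the article's warp-level aggregation); it uses only `atomicAdd` and `__syncthreads`, neither of which requires compute capability 3.0. The placeholder predicate `x > 0` and the names `filterBlockAggKernel` and `filterBlockAgg` are hypothetical, and I have not measured how much of the article's speedup it retains on older hardware.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical filter kernel with block-aggregated atomics. Threads reserve
// slots with a shared-memory atomicAdd; then ONE thread per block reserves the
// block's whole range in the global output. No warp shuffle is used.
__global__ void filterBlockAggKernel(const float* in, float* out, int* count, int n)
{
    __shared__ int blockCount;  // number of kept elements in this block
    __shared__ int blockBase;   // this block's first slot in `out`
    if (threadIdx.x == 0) blockCount = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool keep = (i < n) && (in[i] > 0.0f);             // short-circuit: no read past n
    int local = keep ? atomicAdd(&blockCount, 1) : 0;  // slot within the block
    __syncthreads();                                   // blockCount is now final

    if (threadIdx.x == 0 && blockCount > 0)
        blockBase = atomicAdd(count, blockCount);      // one global atomic per block
    __syncthreads();                                   // broadcast blockBase via shared memory

    if (keep) out[blockBase + local] = in[i];
}

std::vector<float> filterBlockAgg(const std::vector<float>& h_in)
{
    int n = static_cast<int>(h_in.size());
    if (n == 0) return {};
    float *d_in = nullptr, *d_out = nullptr;
    int *d_count = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));

    const int threads = 256, blocks = (n + threads - 1) / threads;
    filterBlockAggKernel<<<blocks, threads>>>(d_in, d_out, d_count, n);

    int count = 0;
    cudaError_t err = cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess && count >= 0 && count <= n);
    std::vector<float> h_out(count);
    cudaMemcpy(h_out.data(), d_out, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_count);
    return h_out;
}
```

As with the article's warp-aggregated filter, the output order is not guaranteed, because slots are handed out in whatever order the atomics complete.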