Parallel filtering of a CUDA array (compute capability < 3.0)

I'm trying to design a framework for standard functional sequence operations (map, filter, fold, etc.) in F#, where the computation is done on the GPU (using CUDA) instead of the CPU.

I have had some success implementing a CUDA map function, so that I can write:

let result = 
    cudaArray 
    |> CudaArray.map <@ fun x -> x ** 3.0  @> 
    |> Array.ofCudaArray

      

This is relatively simple because each element is processed independently of the others.

Now I'm interested in writing a similar system for filtering with a predicate. I am thinking of implementing the predicate as another map function that returns a boolean array, but I then need a way to compact the array of interest down to the elements whose corresponding boolean is true.
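To make the idea concrete, here is a minimal sequential C++ sketch of one common way to do this kind of compaction: compute flags, take an exclusive prefix sum of the flags, then scatter. It is only a CPU reference under my own assumptions, not GPU code; the name `compact` is hypothetical, and on a GPU each of the three loops would be replaced by a data-parallel kernel (the prefix sum by a parallel scan).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of any library): keeps the elements of
// `input` for which `pred` returns true, preserving their original order.
template <typename T, typename Pred>
std::vector<T> compact(const std::vector<T>& input, Pred pred) {
    const std::size_t n = input.size();

    // Step 1 (a "map"): evaluate the predicate for every element.
    std::vector<int> flags(n);
    for (std::size_t i = 0; i < n; ++i) {
        flags[i] = pred(input[i]) ? 1 : 0;
    }

    // Step 2 (exclusive prefix sum): offsets[i] is the number of kept
    // elements before position i, i.e. the output slot of element i.
    std::vector<std::size_t> offsets(n);
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        offsets[i] = total;
        total += static_cast<std::size_t>(flags[i]);
    }

    // Step 3 (scatter): every kept element writes to its own slot, so the
    // writes never collide and could run in parallel.
    std::vector<T> output(total);
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i]) {
            output[offsets[i]] = input[i];
        }
    }
    return output;
}
```

For example, calling `compact` on {1.0, -2.0, 3.0, -4.0, 5.0} with the predicate `x > 0.0` yields {1.0, 3.0, 5.0}. Unlike an approach based on an atomic counter, this variant keeps the input order, at the cost of an extra pass for the scan.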

I found this article ( http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/ ), which describes what looks like a very good method for solving this problem. In summary, it uses the following function to compute each element's index in the target array:

// warp-aggregated atomic increment
__device__
int atomicAggInc(int *ctr) {
  int mask = __ballot(1);
  // select the leader
  int leader = __ffs(mask) - 1;
  // leader does the update
  int res;
  if(lane_id() == leader)
    res = atomicAdd(ctr, __popc(mask));
  // broadcast result
  res = warp_bcast(res, leader);
  // each thread computes its own value
  return res + __popc(mask & ((1 << lane_id()) - 1));
} // atomicAggInc
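As I understand it, the index arithmetic in the function above can be checked on the CPU. The following C++ sketch emulates a single 32-lane warp; `popcount32` and `emulateWarpAggInc` are hypothetical names of my own, the `mask` argument stands in for the result of `__ballot(1)`, and the plain `counter` update stands in for the leader's single `atomicAdd`.

```cpp
#include <cstdint>
#include <vector>

// Number of set bits in x (a stand-in for CUDA's __popc).
inline int popcount32(std::uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Hypothetical CPU emulation of one warp calling atomicAggInc.
// Bit i of `mask` is set if lane i is active; `counter` plays the role
// of *ctr. Returns the slot each lane would receive (-1 if inactive).
std::vector<int> emulateWarpAggInc(std::uint32_t mask, int& counter) {
    std::vector<int> slot(32, -1);
    if (mask == 0) {
        return slot;  // no active lanes, counter untouched
    }

    // The leader performs a single add for the whole warp; `base` is the
    // old counter value that the broadcast would hand to every lane.
    const int base = counter;
    counter += popcount32(mask);

    for (int lane = 0; lane < 32; ++lane) {
        if (mask & (1u << lane)) {
            // Rank of this lane among the active lanes below it.
            const std::uint32_t lower = mask & ((1u << lane) - 1u);
            slot[lane] = base + popcount32(lower);
        }
    }
    return slot;
}
```

With `counter` starting at 10 and lanes 0, 2, 3 and 5 active (`mask = 0x2D`), the lanes receive slots 10, 11, 12 and 13 and the counter ends at 14, so the warp costs one counter update instead of four.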

      

The problem is that, as I understand from the article, warp_bcast (which the article implements with __shfl) is only supported on devices of Compute Capability 3.0 or higher.

Is there an alternative to warp_bcast that works on devices of compute capability < 3.0, or some other approach I can use to solve this problem without sacrificing the large performance gains described in the article? (To be clear, I am entirely open to approaches completely different from the one in the article if they help me solve this.)
