OpenCL: select / remove points from a large array
I have an array of 2M+ points (planned to grow to 20M over time) on which I run calculations via OpenCL. I would like to remove any points that fall inside an arbitrary triangle.
How can I do this within the OpenCL kernel?
I can already:
- identify the points that fall outside the triangle (a simple point-in-polygon algorithm in the kernel)
- pass their coordinates to the global output array.
But:
- OpenCL's global output array cannot be variable-sized, so I initialize it to the same size as the input points array
- as a result, (0,0) points appear in the final output wherever a point falls inside the triangle
- the output array therefore is not actually reduced in size.
Is it possible to remove the (0,0) points within OpenCL?
N.B. I am coding in openFrameworks, so the C++ implementation links to .cl files.
Just an alternative for the case where most of the points satisfy the condition that triggers the atomic operation:
It is possible to keep a local counter and increment it with local atomics. You can then merge that local count into the global value with atomic_add(), which returns the "previous" global value. So you just copy the selected items to that address and upward.
This should give a noticeable speedup, since threads sync locally and only once per work-group globally. The global copy can run in parallel, as the addresses never overlap.
For example:
__kernel void mykernel(__global MyType * global_out, __global int * global_count, __global MyType * global_in){
    int lid = get_local_id(0);
    int lws = get_local_size(0);
    int idx = get_global_id(0);
    __local int local_count;
    __local int global_val;
    //I am using a local container, but a local array of pointers to global is possible as well
    __local MyType local_out[WG_SIZE]; //Ensure WG_SIZE is at least your work-group size
    if(lid==0){
        local_count = 0; global_val = -1;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    //Classify them: each selected item takes the next free local slot
    if(global_in[idx] == ....)
        local_out[atomic_inc(&local_count)] = global_in[idx];
    barrier(CLK_LOCAL_MEM_FENCE);
    //If nothing was selected in this work-group, we are done
    if(local_count > 0){
        //Only the first local ID does the atomic to global
        if(lid == 0)
            global_val = atomic_add(global_count,local_count);
        //Resync all the local workers here
        barrier(CLK_LOCAL_MEM_FENCE);
        //Copy all the data, each worker starting at its own ID and striding by the work-group size
        for(int i=lid; i<local_count; i+=lws)
            global_out[global_val+i] = local_out[i];
    }
}
NOTE. I haven't compiled it, but it should work more or less.
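To check the indexing logic without a GPU, the two-level reservation can be emulated on the CPU. This is only a sequential sketch under my own assumptions: compact_by_groups and its keep predicate are hypothetical names, each loop iteration stands in for one work-group, and std::atomic stands in for the global counter. On a real device the work-groups finish in no fixed order, so the output order is not guaranteed there.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of the two-level scheme: each simulated work-group gathers
// its selected items locally, then reserves a contiguous output range with
// a single fetch_add on the global counter.
// `keep` is a hypothetical predicate standing in for the point test.
template <typename T, typename Pred>
std::size_t compact_by_groups(const std::vector<T>& in, std::vector<T>& out,
                              std::size_t group_size, Pred keep) {
    assert(group_size > 0);
    out.resize(in.size());                 // worst case: everything is kept
    std::atomic<std::size_t> global_count{0};

    for (std::size_t base = 0; base < in.size(); base += group_size) {
        // "Local memory" of one simulated work-group.
        std::vector<T> local_out;
        const std::size_t end = std::min(base + group_size, in.size());
        for (std::size_t i = base; i < end; ++i)
            if (keep(in[i])) local_out.push_back(in[i]);

        // One global atomic per group; the returned old value is the first
        // slot of this group's private output range, so ranges never overlap.
        const std::size_t offset = global_count.fetch_add(local_out.size());
        for (std::size_t i = 0; i < local_out.size(); ++i)
            out[offset + i] = local_out[i];
    }
    out.resize(global_count.load());       // trim to the number kept
    return out.size();
}
```

Calling it with a group size of 4 on the values 1..10 and an "is even" predicate leaves the five even values in out and returns 5, which mirrors reading back only the first global_count entries on the host.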
If I understood your problem, you can do this:
In your kernel, determine whether each point falls in the triangle, and then (in pseudocode, with number_of_elems being a pointer to a global counter initialized to 0):
if(element[idx]!=(0,0))
    output_array[atomic_inc(number_of_elems)] = element[idx];
Finally, on the host, the first number_of_elems entries of output_array hold the points that passed the test.
I hope this helps you, Best
There are alternatives, each working better or worse depending on what the data looks like. I describe one below.
Removing the identified points can also be done by registering them in a separate array per work-group. You need to use the same atomic increment as in Moise's answer (see my note on how this is done at the work-group level!). The end result is a list of the start and end positions of the parts that don't need to be removed. Then different threads can copy those parts of the array. This is less effective if you have clusters of points that need to be removed.
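As a rough illustration of the run-list idea, here is a sequential CPU sketch. The names Run, find_kept_runs and copy_runs are my own, and it assumes a per-point flag array has already been produced by the triangle test. A kernel version would register the run boundaries per work-group with atomics as described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open [begin, end) index range of consecutive elements to keep.
using Run = std::pair<std::size_t, std::size_t>;

// Build the run list from a per-element flag array (non-zero = remove).
std::vector<Run> find_kept_runs(const std::vector<char>& remove) {
    std::vector<Run> runs;
    std::size_t i = 0;
    while (i < remove.size()) {
        while (i < remove.size() && remove[i]) ++i;   // skip removed points
        const std::size_t begin = i;
        while (i < remove.size() && !remove[i]) ++i;  // extend the kept run
        if (i > begin) runs.emplace_back(begin, i);
    }
    return runs;
}

// Copy every run to its destination. A run's destination offset is the
// total length of the runs before it, so the copies are independent and
// could be handed to different threads.
template <typename T>
std::vector<T> copy_runs(const std::vector<T>& in, const std::vector<Run>& runs) {
    std::vector<std::size_t> offsets(runs.size());
    std::size_t total = 0;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        assert(runs[r].first <= runs[r].second && runs[r].second <= in.size());
        offsets[r] = total;
        total += runs[r].second - runs[r].first;
    }
    std::vector<T> out(total);
    for (std::size_t r = 0; r < runs.size(); ++r)
        std::copy(in.begin() + runs[r].first, in.begin() + runs[r].second,
                  out.begin() + offsets[r]);
    return out;
}
```

The offsets come from a running sum over the run lengths, so every copy knows its destination up front and none of them needs an atomic.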