What is the performance impact of atomic operations in a compute shader?

I have a compute shader that changes texels in a 256x256 texture.

The dispatch has 256x256x256 invocations, where the x and y components of each invocation map directly to the u and v texel coordinates. Thus, each texel can be written up to 256 times.

I want each invocation to check what is currently in its texel and run some tests to decide whether it should be overwritten. However, to avoid the race condition where several invocations read the texel value before any of the others has written to it, I'm looking to use an atomic operation to write the texture values.
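To make the read-test-write described above concrete, here is a minimal CPU-side sketch in C++ using a compare-and-swap loop. It is not shader code, and it rests on assumptions that are not in the question: each texel is a single 32-bit unsigned value, and the undisclosed "tests" are replaced by a stand-in rule that keeps the larger value. The names `should_overwrite`, `candidate_value`, `try_write` and `run_dispatch` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real tests: overwrite only with a larger value.
inline bool should_overwrite(std::uint32_t current, std::uint32_t candidate) {
    return candidate > current;
}

// Hypothetical value that invocation (texel i, slice z) wants to write.
inline std::uint32_t candidate_value(int i, int z) {
    return (static_cast<std::uint32_t>(i) * 31u +
            static_cast<std::uint32_t>(z) * 17u) % 1009u;
}

// Atomically read, test, and conditionally write one texel.
inline void try_write(std::atomic<std::uint32_t>& texel, std::uint32_t candidate) {
    std::uint32_t current = texel.load(std::memory_order_relaxed);
    // If another thread changed the texel between our read and our write,
    // compare_exchange_weak fails, refreshes `current`, and the test is re-run.
    while (should_overwrite(current, candidate) &&
           !texel.compare_exchange_weak(current, candidate,
                                        std::memory_order_relaxed)) {
    }
}

// Emulates a width x height x depth dispatch; z slices are split across threads,
// so different threads can target the same texel at the same time.
inline std::vector<std::uint32_t> run_dispatch(int width, int height, int depth,
                                               int num_threads) {
    assert(width > 0 && height > 0 && depth > 0 && num_threads > 0);
    const std::size_t count =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    std::vector<std::atomic<std::uint32_t>> texels(count);
    for (auto& t : texels) t.store(0u);

    std::vector<std::thread> workers;
    for (int w = 0; w < num_threads; ++w) {
        workers.emplace_back([&texels, count, depth, num_threads, w] {
            for (int z = w; z < depth; z += num_threads)
                for (std::size_t i = 0; i < count; ++i)
                    try_write(texels[i], candidate_value(static_cast<int>(i), z));
        });
    }
    for (auto& t : workers) t.join();

    std::vector<std::uint32_t> result(count);
    for (std::size_t i = 0; i < count; ++i) result[i] = texels[i].load();
    return result;
}
```

Calling `run_dispatch(256, 256, 256, 8)` would mimic the dispatch from the question on 8 CPU threads. In GLSL, for example, the corresponding primitive would be `imageAtomicCompSwap` on an `r32ui` image, which only applies if the texel data fits in one 32-bit integer.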

However, I was told that this defeats the purpose of parallelizing the operation: an atomic operation forces everything else to wait until it has finished, which means that each invocation along z must run sequentially, waiting for the previous one to finish its atomic write to the texture.

Is this true, and if so, how much will it affect performance? It's worth noting that the z dimension of the dispatch can vary and can be much larger than 256.
