Performance - drawing many 2D circles in OpenGL

I am trying to draw a large number of 2D circles for my 2D game in OpenGL. They are all the same size and have the same texture. Many of the sprites overlap. What would be the fastest way to do this?

An example of the kind of effect I am creating: http://img805.imageshack.us/img805/6379/circles.png

(Note that the black edges are only due to the expanding explosion of circles. They were filled in a moment after this screenshot was taken.)

Currently I am using a pair of textured triangles to create each circle. The texture is transparent around the edges so that the quad looks like a circle. Using blending for this turned out to be very slow (and z culling was not possible, as the sprites were written to the depth buffer as squares). Instead of blending, my fragment shader now discards any fragment with an alpha of 0. This works, but it means early z is not possible (since fragments are discarded).

The speed is limited by the large amount of overdraw and by the GPU's fill rate. The order in which the circles are drawn doesn't really matter (as long as it doesn't change between frames, which would cause flicker), so I have been trying to ensure that each pixel on the screen is written only once.

I tried to do this using the depth buffer. At the start of each frame, it is cleared to 1.0f. Then, when a circle is drawn, it sets that part of the depth buffer to 0.0f. When another circle would normally be drawn there, it is not, because the new circle also has a z of 0.0f. This is not less than the 0.0f currently in the depth buffer, so the fragment is rejected. This works and should reduce the number of pixels that have to be drawn. However, strangely, it isn't any faster. I have already asked a question about this behavior ( opengl depth buffer is slower when the points have the same depth ), and it was suggested that z culling gives no speedup when the z values are equal.

Instead, I have to give all my circles separate fake z values from 0 upwards. Then, when rendering with glDrawArrays and the default GL_LESS, I do get the expected speedup from z culling (although early z is not possible, since fragments are discarded to make the circles). However, this is not ideal: I had to add a lot of z-related code to a 2D game that simply shouldn't need it (and not having to pass z values at all, if that were possible, would be faster). This is, however, the fastest way I have found so far.
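To make the "separate fake z values" idea concrete, here is a minimal sketch of one way such depths could be assigned on the CPU before the vertex data is uploaded. The `Circle` struct and `assign_depths` function are hypothetical names for illustration, not taken from the original code. The sketch assumes an orthographic projection that maps these values linearly into the depth range, and that the number of circles is small enough for the depth buffer's precision to keep the values distinct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-circle data; the field names are illustrative only. */
typedef struct {
    float x, y;   /* circle centre in screen space */
    float z;      /* artificial depth, used only for z culling */
} Circle;

/* Give circle i the depth i / count, so the depths lie in [0, 1) and
 * increase with the index. With GL_LESS, a circle with a smaller depth
 * wins wherever two circles overlap. */
void assign_depths(Circle *circles, size_t count)
{
    assert(circles != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        circles[i].z = (float)i / (float)count;
    }
}
```

Because the depth depends only on the index, the same circle keeps the same depth from frame to frame as long as the array order is stable, which avoids the flicker mentioned earlier.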

Finally, I tried to use the stencil buffer. Here I used:

    glStencilFunc(GL_EQUAL, 0, 1);
    glStencilOp(GL_KEEP, GL_INCR, GL_INCR);

The stencil buffer is reset to 0 every frame. The idea is that after a pixel is drawn for the first time, its stencil value becomes non-zero. That pixel should then not be drawn again, which reduces the amount of overdraw. However, this turned out to be no faster than just drawing everything without a stencil buffer or depth buffer.
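The write-once logic of that stencil configuration can be sketched as a small CPU model. This is only an illustration of what the test decides per pixel, assuming an 8-bit stencil buffer; `stencil_write_once` is a hypothetical helper, and the model says nothing about where in the pipeline the GPU actually performs the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model one fragment arriving at a pixel whose stencil value is *s,
 * using glStencilFunc(GL_EQUAL, 0, 1) and
 * glStencilOp(GL_KEEP, GL_INCR, GL_INCR).
 * Returns 1 if the fragment passes (the pixel would be written),
 * or 0 if the stencil test rejects it. */
int stencil_write_once(uint8_t *s)
{
    assert(s != NULL);
    const uint8_t ref = 0, mask = 1;

    if ((*s & mask) != (ref & mask)) {
        return 0;              /* test failed: GL_KEEP leaves *s unchanged */
    }
    if (*s < UINT8_MAX) {
        ++*s;                  /* test passed: GL_INCR, clamped at the maximum */
    }
    return 1;
}
```

Starting from a cleared value of 0, the first fragment passes and raises the value to 1; every later fragment at that pixel fails the test, so the value stays at 1 for the rest of the frame.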

What is the fastest way people have found to do what I am trying to do?

1 answer


The main problem is that you are fill-limited: the GPU cannot shade all of the fragments you ask it to draw in the time you expect. The reason your depth-buffer trick is ineffective is that the most expensive part of the processing is fragment shading (either through your own fragment shader or through the fixed-function pipeline), which occurs before the depth test. The same problem occurs with the stencil buffer: fragment shading occurs before the stencil test.

There are several things that can help, but they depend on your hardware:



  • Render your sprites from front to back with depth buffering. Modern GPUs often try to determine whether a group of fragments will be visible before submitting them for shading. Roughly speaking, the depth buffer (or a representation of it) is checked to see whether the fragment about to be shaded is visible, and if it is not, processing ends at that point. This should help reduce the number of pixels that need to be written to the framebuffer.
  • Use a fragment shader that immediately checks the texel alpha value and discards the fragment before any additional processing, as in:

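    varying vec2 texCoord;   // interpolated texture coordinate from the vertex shader
    uniform sampler2D tex;   // the circle texture
    
    void main()
    {
        // texture2D() matches the 'varying' (pre-1.30) GLSL syntax used here
        vec4 texel = texture2D( tex, texCoord );
    
        // reject (nearly) transparent texels before doing any further work
        if ( texel.a < 0.01 ) discard;
    
        // rest of your color computations
    }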

(You can also use the alpha test in fixed-function fragment processing, but there is no guarantee that the test will be applied before fragment shading is complete.)
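As a rough illustration of the front-to-back suggestion, the sprites could be sorted by depth on the CPU before the draw call is issued. This is a sketch under assumptions: the `Sprite` layout and the function names are hypothetical, and "front" is taken to mean the smallest z, which matches the default GL_LESS depth function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sprite record; only z matters for the ordering. */
typedef struct {
    float x, y, z;
} Sprite;

/* qsort comparator: ascending z, so the nearest sprite comes first. */
static int compare_depth(const void *a, const void *b)
{
    float za = ((const Sprite *)a)->z;
    float zb = ((const Sprite *)b)->z;
    return (za > zb) - (za < zb);
}

/* Sort the sprites so they are submitted nearest-first. */
void sort_front_to_back(Sprite *sprites, size_t count)
{
    assert(sprites != NULL || count == 0);
    if (count > 1) {
        qsort(sprites, count, sizeof *sprites, compare_depth);
    }
}
```

If the depths are assigned from the array index in the first place, the array is already in front-to-back order and no sort is needed; the sort only matters when depths come from somewhere else.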
