Debugging / performance / WebGL shader cost

I've been using three.js to experiment with and learn GLSL and WebGL. I come from the 3D art world, so I understand the concepts of meshes, 3D math, lighting, etc. While I consult both OpenGL and WebGL literature (along with GPU Gems, Eric Lengyel's math book, etc.), I seem to be missing the CS fundamentals that apply to graphics.

I am currently using colors for debugging along with the canvas inspector to see how long a draw call takes.

I am interested in such questions as:

  • How expensive GLSL functions are. For example, how does division compare to multiplication or sin() in terms of performance, MAD instructions, etc.?
  • Say you have something like this

    vec2 normalizedCoord = gl_FragCoord.xy / uniform_resolution.xy;
    vs
    vec2 normalizedCoord = gl_FragCoord.xy * uniform_resolution_inverse.xy;
    vs
    ... the same with lowp/mediump/highp

    what happens to the accuracy / performance?

  • or something like

    vec4 someVec4 = ...;
    
    float sum = dot(someVec4,vec4(1.));
    vs 
    float sum = someVec4.x + someVec4.y + someVec4.z + someVec4.w;

  • How expensive texture lookups are, for example when doing lots of sampling for something like SSAO?

Is this the type of information found in something like Michael Abrash's Black Book?

If someone can help me frame this question better, it would be helpful :)



2 answers


I'm sure someone more experienced than me can give you a better answer, but the truth is: it depends.

GPUs are parallelized, and they are all different, so the same operation can take a different amount of time on each GPU.

Also, I don't know what you mean by "canvas inspector", but it probably can't show you how long a call takes, because graphics pipelines are parallelized, multi-threaded, and multi-process; at least from the point of view of JavaScript, all you can know is how long it took to issue a command, not how long it took to execute it. For example, in Chrome the command is passed to the GPU process and JavaScript continues. The GPU process then passes it to GL / DirectX, which in turn passes the command to yet another process, at least on most desktop OSes.

People talk about using gl.finish to find out how long something takes, but even that doesn't work, because it doesn't tell you how long the GPU itself took. It tells you how long the GPU took plus how long it took for all those processes to sync up. It's like asking "how fast was the car going" when the only thing you can measure is the trip from one stopped state to another. You can say that one car made it from point A to point B in a given time, but you can't measure which car hit the highest speed. One car might go from 0 to 60 in 1 second and then take 3 seconds to decelerate. Another might jump to 20 instantly, take 4 seconds to reach the destination, and stop instantly. Both cars took 4 seconds. If all you can measure is that each took 4 seconds, you can't tell which one went faster.
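
For example, here is what that kind of measurement actually gives you, as a sketch in JavaScript (gl is assumed to be your WebGL context, and drawScene is a stand-in for whatever draw calls you want to time):

    // Naive timing with gl.finish - a sketch, not a real profiler.
    var start = performance.now();
    drawScene();   // returns as soon as the commands are *issued*
    gl.finish();   // blocks until every process in the pipeline syncs up
    var elapsed = performance.now() - start;
    // elapsed = GPU execution + command transport + cross-process sync,
    // i.e. the whole stop-to-stop trip, not the GPU's top speed.
    console.log('round trip: ' + elapsed.toFixed(2) + ' ms');

If you really need GPU-side numbers, the EXT_disjoint_timer_query extension is about the closest WebGL gets, but it is not available everywhere.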



Worse, you have tiled architectures, like all iOS devices and many Android devices, that don't actually draw until they have all the commands. They then generate "tiles" of commands for rendering different parts of the screen.

Okay, that wasn't the point.

In general, less code is faster and texture lookups are slower. GPUs have texture caches, so in "typical" use, a texture stretched over a polygon gets a ton of texture-cache hits. You can kill the texture cache by doing random texture lookups. For example, create a random texture and use it to compute the texture coordinates for looking up another texture. That will thrash the texture cache completely and the GPU will run very slowly.
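
Here is that worst case as a GLSL sketch (the texture and varying names are made up for illustration):

    precision mediump float;
    uniform sampler2D u_noise;  // texture full of random values
    uniform sampler2D u_image;  // the texture we actually want to sample
    varying vec2 v_uv;

    void main() {
        // First lookup: fetch random values...
        vec2 randomUV = texture2D(u_noise, v_uv).xy;
        // Dependent lookup: ...and use them as coordinates for the next fetch.
        // Neighboring pixels now read unrelated texels, so the cache misses.
        gl_FragColor = texture2D(u_image, randomUV);
    }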

On the other hand, swizzling is fast. Multiply-and-add is fast. Linear interpolation is fast.
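
For illustration, these are the kinds of patterns that map to cheap, often single-instruction GPU operations (all names invented):

    vec4 cheapOps(vec4 someVec4, float a, float b, float c,
                  vec3 colorA, vec3 colorB, float t) {
        vec4 swizzled = someVec4.bgra;          // swizzle: free or near-free
        float madVal  = a * b + c;              // usually one MAD instruction
        vec3 lerped   = mix(colorA, colorB, t); // hardware linear interpolation
        return vec4(lerped, madVal) + swizzled;
    }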



If you are familiar with SIMD registers and how to use them (for example, via SSE intrinsics or assembly), you will quickly notice that GPU registers are very similar to CPU SIMD registers. Performance is mostly tied to the bandwidth between the processor and its memory (main RAM for the CPU, video RAM for the GPU). In some setups you can even split the work and load-balance processing (mainly the linear algebra done while shading) across processors, using whatever free registers are available.
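
To make the analogy concrete, a single vec4 operation in GLSL works on four lanes at once, much like one SSE instruction does on the CPU (a sketch):

    // One vec4 add processes four floats in a single operation,
    // roughly analogous to an SSE addps on four packed floats.
    vec4 simdStyleAdd(vec4 a, vec4 b) {
        return a + b; // four lanes at once, no per-component loop
    }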


