Is there any way (is it even possible) to get the total GPU usage over a period of time?

I am trying to get information about the total usage of my GPU (an NVIDIA Tesla K20, running on Linux) over a period of time. By "total" I mean something like: how many streaming multiprocessors are scheduled to run, and how many GPU cores are scheduled to run (my guess is that if a core is running, it runs at full speed/frequency?). It would also be nice if I could get the total usage measured in FLOPS.

Of course, before asking here I searched for and looked into several existing tools/libraries, including NVML (and nvidia-smi, which is built on top of it), CUPTI (and nvprof), PAPI, TAU, and Vampir. However, it seems (though I'm not sure yet) that none of them can give me the information I need. For example, NVML can report "GPU Utilization" as a percentage, but according to its documentation this is the "percentage of time over the last second during which one or more kernels was executing on the GPU", which is not precise enough for my purpose. nvprof can report FLOPS for a single kernel (with very high overhead), but I still don't know how heavily the GPU is being used overall.
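
For illustration, here is a minimal sketch of how that NVML counter can be polled from code (the device index 0, the 1-second interval, and the 10-second window are just assumptions for the example; build against nvml.h and link with -lnvidia-ml):

    #include <stdio.h>
    #include <unistd.h>
    #include <nvml.h>

    /* Sketch: poll the coarse NVML utilization counter once per second. */
    int main(void)
    {
        if (nvmlInit() != NVML_SUCCESS)
            return 1;

        nvmlDevice_t dev;                          /* device index 0 is an assumption */
        if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) {
            nvmlShutdown();
            return 1;
        }

        for (int i = 0; i < 10; ++i) {             /* sample for ~10 seconds */
            nvmlUtilization_t u;
            if (nvmlDeviceGetUtilizationRates(dev, &u) == NVML_SUCCESS)
                printf("gpu %u%%  mem %u%%\n", u.gpu, u.memory);
            sleep(1);
        }

        nvmlShutdown();
        return 0;
    }

As noted above, this only tells me the fraction of time during which some kernel was running, not how many SMs or cores were actually busy.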

It seems that PAPI can get the instruction count, but it cannot provide anything else such as floating-point operations. I have not tried the other two tools (TAU and Vampir) yet, but I doubt they can satisfy my need either.

So I'm wondering whether it is possible to get this kind of overall GPU usage information at all. If not, what is the best alternative for estimating it? My ultimate goal is to find the best schedule for multiple jobs running on the GPU.

I'm not sure whether I have described my question in sufficient detail, so please let me know if there is anything I can add for a better description.

Many thanks!



1 answer



nVidia Nsight for Visual Studio has some very nice graphical features that give you the stats you want. But my understanding is that you are on a Linux machine, so Nsight won't work for you.

I suggest using the nVidia Visual Profiler.

The metrics reference is pretty complete and can be found here. This is how I would collect the data you are interested in:

  • Active SMXs - look at sm_efficiency. It should be close to 100%. If it is lower, some of the SMXs are idle.

  • Active cores / SMX - This depends. The K20 has a quad-warp scheduler with dual instruction issue, and a warp executes on 32 cores. Each K20 SMX has 192 SP cores and 64 DP cores. You need to look at the ipc metric (instructions per cycle). If your program is DP and its IPC is 2, then you have 100% utilization (over the entire workload execution): the schedulers issued 2 warp instructions per cycle, so all of your 64 DP cores were busy during all cycles. If your program is SP, its IPC should theoretically be 6. However, in practice this is very hard to achieve. An IPC of 6 means that 3 of the schedulers issued 2 warps each, giving 3 x 2 x 32 = 192 SP cores work to do.

  • FLOPS - Well, if your program uses floating-point operations, I would look at flop_count_sp and divide it by the elapsed seconds (a sample nvprof invocation for collecting these metrics is sketched right after this list).
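
If you prefer the command line to the Visual Profiler GUI, the same counters can be collected with nvprof, roughly like this (./myapp is just a placeholder for your application; flop_count_dp is the double-precision counterpart of flop_count_sp):

    nvprof --metrics sm_efficiency,ipc,flop_count_sp,flop_count_dp ./myapp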



As for the frequency, I would not worry about it, but it doesn't hurt to check with nvidia-smi. If your card is cooled well enough, it will stay at its peak frequency while running.
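
For example, something like the following shows the current, application and maximum clocks (the exact set of fields depends on your driver version):

    nvidia-smi -q -d CLOCK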

Check out the metrics reference, as it will provide you with much more useful information.

I think nvprof supports multiple processes as well. Check here. You can also filter by process ID, so you can collect these metrics "multi-context" or "single-context". In the metrics reference table there is a column that states whether each metric can be collected in both cases.
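
A rough sketch of that mode, assuming you start the profiler in one shell and then launch your jobs from another (the output file name is just an example; %p expands to the process ID):

    nvprof --profile-all-processes -o usage.%p.nvprof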

Note: the metrics are computed using HW performance counters and driver-level analysis. If the nvidia tools cannot provide more than this, it is unlikely that other tools will be able to offer more. But I think the right mix of metrics can tell you everything you want to know about your application's run.
