What are the access times for different GPU memory spaces?

This is a question about discrete GPUs, mostly recent ones (NVIDIA Kepler, Maxwell, etc.; on the AMD side, Kaveri and R290).

How long does it take to load an item into a register from ...

  • Global device memory?
  • Global L2 cache?
  • Texture cache(s)?
  • Constant cache(s)?
  • Per-core L1 cache?
  • Per-core shared memory? (This should be the same as the L1 cache.)

A reference to a table somewhere would be great; an explanation would also be fine.



1 answer


It depends on the GPU, its generation, how it is integrated (e.g. over PCIe) and other things. I work with ASM a lot, and these are the numbers I work with:

-Global device memory? About 300-800 clock cycles. (Integrated GPUs that share main memory, as in laptops, have slower memory.)

-Global L2 cache? About 100 clock cycles.

-Texture cache(s)? My guess is 50-100 clock cycles.

-Constant cache(s)? About 1-3 clock cycles on a hit in the constant cache, about 50-100 clock cycles if it has to come from the L2 cache, or even the global memory time of 300-500 clock cycles (depending on whether it is a cache hit or a miss).

-Per-core (i.e. per-SMX / SMM in Kepler / Maxwell) L1 cache? About 1-3 clock cycles.

-Per-core (i.e. per-SMX / SMM in Kepler / Maxwell) shared memory? About 1-3 clock cycles.
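The figures above are in clock cycles, so the wall-clock time depends on the clock rate. As a rough sketch only, here is how the ranges convert to nanoseconds. The 1000 MHz clock is an assumption chosen for illustration, not a figure for any particular GPU, and the names `LATENCY_CYCLES` and `cycles_to_ns` are made up for this example.

```python
# Approximate latencies in clock cycles (low, high), copied from the list above.
LATENCY_CYCLES = {
    "global memory": (300, 800),
    "L2 cache": (100, 100),
    "texture cache": (50, 100),
    "constant cache (hit)": (1, 3),
    "L1 cache": (1, 3),
    "shared memory": (1, 3),
}

# Assumption: a 1000 MHz clock, for illustration only.
ASSUMED_CLOCK_MHZ = 1000.0


def cycles_to_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds at the given clock rate in MHz."""
    return cycles * 1000.0 / clock_mhz


for name, (low, high) in LATENCY_CYCLES.items():
    low_ns = cycles_to_ns(low, ASSUMED_CLOCK_MHZ)
    high_ns = cycles_to_ns(high, ASSUMED_CLOCK_MHZ)
    print(f"{name:22s} {low_ns:7.1f} - {high_ns:7.1f} ns")
```

Substituting the actual clock of your device for `ASSUMED_CLOCK_MHZ` scales all rows proportionally.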

I also did some searching online to see how close I was and found this poster; its numbers are different from mine: http://lpgpu.org/wp/wp-content/uploads/2013/05/poster_andresch_acaces2014.pdf I think the latency a programmer actually sees in practice differs from both sets of numbers because of multithreading. Hope this helps.
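One way to see why multithreading matters: while one warp waits on a load, the scheduler can issue instructions from other warps, so the raw latency is partly hidden. The sketch below uses a deliberately simplistic model that is an assumption of this example, not a description of any real scheduler: each warp issues a fixed number of independent instructions and then stalls for the full latency. The function name and parameters are hypothetical.

```python
import math


def warps_to_hide(latency_cycles, independent_instrs_per_warp=1, issue_cycles=1):
    """Estimate how many warps keep the scheduler busy during one load.

    Simplified model (an assumption): every warp issues
    `independent_instrs_per_warp` instructions, each occupying the scheduler
    for `issue_cycles`, and then waits `latency_cycles` for its load.
    """
    busy_cycles_per_warp = independent_instrs_per_warp * issue_cycles
    return math.ceil(latency_cycles / busy_cycles_per_warp)


# A 400-cycle global load with 4 independent instructions per warp.
print(warps_to_hide(400, independent_instrs_per_warp=4))
# A 3-cycle shared-memory access with 1 instruction per warp.
print(warps_to_hide(3))
```

Under this model, long-latency global loads need many warps in flight before their cost disappears, while shared-memory or L1 hits are hidden with only a few, which is one reason measured per-access times and the time a kernel actually spends can differ.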
