What are the access times for different GPU memory spaces?
This is a question about discrete GPUs, mostly recent ones (NVIDIA Kepler, Maxwell, etc., and AMD Kaveri and R290).
How long does it take to load a single (uncontended) item into a register from ...
- Global device memory?
- Global L2 cache?
- Texture cache(s)?
- Constant cache(s)?
- Per-core (i.e. per-SMX/SMM on Kepler/Maxwell) L1 cache?
- Per-core shared memory? (Presumably the same as the L1 cache.)
A reference to a table somewhere would be great; an explanation would also be fine.
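Numbers like these are usually measured with a dependent-load ("pointer-chasing") microbenchmark: a single thread repeatedly does `j = chain[j]`, so each load must complete before the next address is known, and the total time divided by the number of steps approximates one load's latency. Growing the footprint (number of elements times stride) pushes the working set out of L1, then out of L2, and into device memory. The sketch below is a minimal illustration under those assumptions and covers only the host-side construction of the chain; `build_pointer_chase` and `walk` are hypothetical names, and the GPU kernel that would time the walk (for example with CUDA's device-side `clock()`) is not shown.

```python
import random


def build_pointer_chase(num_elements, stride, seed=0):
    """Build an index array where chain[i] holds the next index to visit.

    Only every `stride`-th element is part of the chain; the visiting order
    among those slots is shuffled so the walk is a single cycle that does
    not follow a regular pattern.
    """
    slots = list(range(0, num_elements, stride))
    order = slots[:]
    random.Random(seed).shuffle(order)  # deterministic for a given seed
    chain = [0] * num_elements
    for i, slot in enumerate(order):
        # Each slot points at the next one in shuffled order; the last wraps around.
        chain[slot] = order[(i + 1) % len(order)]
    return chain


def walk(chain, start, steps):
    """Follow the chain; every step depends on the result of the previous load."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


if __name__ == "__main__":
    chain = build_pointer_chase(num_elements=1024, stride=16)
    # 1024 / 16 = 64 slots in one cycle, so 64 steps return to the start.
    print(walk(chain, start=0, steps=64))  # prints 0
```

Shuffling the order is one common choice for keeping the hardware from exploiting a regular access pattern; a fixed stride is the simpler alternative.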
It depends on the GPU, its generation, how it is integrated (e.g. over PCIe), and other factors. I work in assembly a lot, and these are the numbers I use:
- Global device memory? About 300-800 clock cycles. (Integrated GPUs that share main memory, as in many laptops, have slower memory.)
- Global L2 cache? About 100 clock cycles.
- Texture cache(s)? My guess is 50-100 clock cycles.
- Constant cache(s)? About 1-3 clock cycles if the data is in the constant cache, ~50-100 cycles if it has to come from L2, or even the global-memory latency of 300-500 cycles, depending on whether it is a cache hit or miss.
- Per-core (i.e. per-SMX/SMM on Kepler/Maxwell) L1 cache? About 1-3 clock cycles.
- Per-core (i.e. per-SMX/SMM on Kepler/Maxwell) shared memory? About 1-3 clock cycles.
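Because the constant-cache latency depends on where the data is actually found, one simple way to turn per-level latencies into an expected cost is a weighted average over hit rates. This is a minimal sketch that assumes each quoted number is the full load latency when the data is served from that level; the latencies are mid-range values from the list above (2, 75 and 400 cycles), while the hit rates and the function name `expected_latency` are hypothetical.

```python
def expected_latency(levels):
    """Expected cycles per access for a cache hierarchy.

    levels: list of (hit_rate, latency_cycles), ordered from the fastest level
    outward. hit_rate is the fraction of accesses reaching that level that are
    served there; the last level must serve everything (hit_rate == 1.0).
    latency_cycles is assumed to be the full load latency when served there.
    """
    if not levels or levels[-1][0] != 1.0:
        raise ValueError("last level must have hit_rate 1.0")
    total = 0.0
    reach = 1.0  # fraction of accesses that get this far down the hierarchy
    for hit_rate, latency in levels:
        total += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return total


if __name__ == "__main__":
    # Hypothetical hit rates; latencies are mid-range values from the answer.
    constant_path = [
        (0.90, 2),    # constant cache hit (~1-3 cycles)
        (0.80, 75),   # served by L2 (~50-100 cycles)
        (1.00, 400),  # global memory (~300-500 cycles)
    ]
    print(round(expected_latency(constant_path), 1))  # prints 15.8
```

Even with a 90% constant-cache hit rate, the rare trips to L2 and global memory contribute most of the expected cost in this example (about 14 of the 15.8 cycles).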
I also did some searching online to see how close I was and found this poster; its numbers differ from mine: http://lpgpu.org/wp/wp-content/uploads/2013/05/poster_andresch_acaces2014.pdf I suspect the latency a programmer actually experiences differs from both sets of numbers because of multithreading (other warps can run while one is waiting on memory). Hope this helps.
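The point about multithreading can be made concrete with Little's law: to sustain a given load throughput, the number of loads in flight must equal latency times throughput. While one warp waits on a load, the scheduler can issue instructions from other warps, so how much of the raw latency a program observes depends on how much independent work is available. A minimal sketch follows; the function names, the target throughput of one load every 4 cycles, and two independent loads per warp are hypothetical, and only the 400-cycle latency comes from the numbers above.

```python
import math


def loads_in_flight(latency_cycles, loads_per_cycle):
    """Little's law: required concurrency = latency x throughput."""
    return latency_cycles * loads_per_cycle


def warps_needed(latency_cycles, loads_per_cycle, loads_per_warp):
    """Warps required if each warp keeps `loads_per_warp` independent loads outstanding."""
    return math.ceil(loads_in_flight(latency_cycles, loads_per_cycle) / loads_per_warp)


if __name__ == "__main__":
    # Hypothetical target: one warp-wide load issued every 4 cycles,
    # 400-cycle global-memory latency, two independent loads per warp.
    print(warps_needed(400, 0.25, 2))  # prints 50
```

Doubling the independent loads per warp halves the number of warps needed, which is why instruction-level parallelism and occupancy are interchangeable to a degree when hiding memory latency.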