Registers and private buffer
Physically, the registers are built into the cores. Private memory is allocated from DRAM, which is much farther from the core. By "near" and "far" I mean latency: reading from a register takes roughly 1-10 clock cycles, while reading from DRAM can take 200-400 clock cycles.
Also, as a programmer, you cannot really access a specific register (unless you are programming in assembly). Which registers are used to execute your kernel is decided by the compiler, or by the processor at runtime. You can, however, address a specific private memory location if needed.
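To make the difference concrete, here is a minimal sketch in plain C rather than real kernel code. The function `work_item_body` and its parameters are hypothetical, and where a value actually ends up (register or private memory) is always up to the compiler. The point is only that a scalar local gives you no handle on a register, while an array element has an address you can pick.

```c
#include <stddef.h>

/* Hypothetical stand-in for the body of one work-item, written as plain C
   so it can be compiled on the host. Assumes n > 0. */
float work_item_body(const float *in, size_t n, unsigned pick)
{
    /* Scalar local: the compiler will most likely keep this in a register,
       but which register (if any) is its decision, not the programmer's. */
    float acc = 0.0f;

    /* Local array: every element has an address. When it is indexed with a
       value only known at runtime, a GPU compiler may place it in private
       memory instead of registers. */
    float scratch[8];

    for (size_t i = 0; i < 8; ++i)
        scratch[i] = in[i % n] * 2.0f;

    /* Here a specific private location is selected explicitly. */
    float *slot = &scratch[pick % 8];
    acc += *slot;

    return acc;
}
```

Calling `work_item_body(in, 3, 4)` with `in = {1, 2, 3}` reads `scratch[4]`, which holds `in[1] * 2`. In an actual kernel the same pattern applies, but whether `scratch` stays in registers or spills to DRAM-backed private memory depends on the compiler and the device.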