Why are there alignment boundaries greater than 4 bytes?

What I don't understand is why we have to align data in memory at boundaries greater than 4 bytes, since all larger boundaries are multiples of 4 anyway. Assuming the CPU can read 4 bytes per cycle, there should be basically no performance difference whether an 8-byte value is aligned to 4 bytes, 8 bytes, 16 bytes, etc.
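For example, here is a small C sketch (just an illustration, not my real code; the values are what a typical x86-64 compiler reports) of the boundaries I mean:

#include <stdio.h>
#include <stdalign.h>

int main(void) {
    /* Typical x86-64 values: 4, 8 and 16 -- the ABI already uses
       boundaries larger than 4 for wider types. */
    printf("alignof(int)         = %zu\n", alignof(int));
    printf("alignof(double)      = %zu\n", alignof(double));
    printf("alignof(long double) = %zu\n", alignof(long double));
    return 0;
}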



2 answers


First: x86 processors don't only read things 4 bytes at a time; they can read 8 bytes per cycle, or even more with the SIMD extensions.
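As a rough illustration of that point on x86 (a C sketch with SSE intrinsics, my own example rather than anything from the question): the 16-byte load comes in an aligned form that requires a 16-byte boundary and an unaligned form that accepts any address:

#include <emmintrin.h>   /* SSE intrinsics */
#include <stdio.h>

int main(void) {
    /* A 16-byte aligned buffer of 4 floats (16 bytes total). */
    _Alignas(16) float data[4] = {1.0f, 2.0f, 3.0f, 4.0f};

    __m128 a = _mm_load_ps(data);   /* aligned 16-byte load: address must be 16-byte aligned */
    __m128 u = _mm_loadu_ps(data);  /* unaligned 16-byte load: any address is accepted */

    float out[4];
    _mm_storeu_ps(out, _mm_add_ps(a, u));
    printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
    return 0;
}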

But to answer your question "why are the alignment boundaries greater than 4?", assuming a general architecture (you did not specify one, and you wrote that x86 was just an example), I will give you a specific case: GPUs.

NVIDIA GPU memory can only be accessed if the address is aligned to a multiple of the access size (PTX ISA ld/st). There are different kinds of loads, and the most efficient ones require the address to be a multiple of the access size, so if you try to load a double (8 bytes) from memory you have (pseudocode):



ld.double [48]   // works: 48 is a multiple of 8 (8-byte aligned)
ld.double [17]   // fails: 17 is not a multiple of 8

In the above case, trying to access (read/write) memory that is not correctly aligned actually raises an error. If you want the speed, you have to provide some alignment guarantees in return.

This might answer your question as to why alignment boundaries greater than 4 exist in the first place. With this architecture, an access size of 1 is always safe (every address is aligned to 1). This is not true for access sizes n > 1.
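A minimal sketch of that rule in C (the helper name is mine, not part of any API): an address is aligned to n when it is a multiple of n, so the check trivially passes for n = 1:

#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of n; n must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t n) {
    return (addr & (n - 1)) == 0;
}

/* is_aligned(x, 1) holds for every x; is_aligned(17, 8) does not. */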



When an x86 processor reads a double, it reads 8 bytes in a single access. When it reads an SSE vector, it reads 16 bytes. When it reads an AVX vector, it reads 32 bytes.

When the CPU fetches a cache line from memory, it also reads at least 32 bytes.
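To make this concrete, here is a small C sketch (C11 with AVX intrinsics, compiled with -mavx; the 32- and 64-byte figures are the usual AVX and cache-line values, assumed rather than taken from this answer) of allocating memory aligned for these wider accesses:

#include <immintrin.h>   /* AVX intrinsics */
#include <stdio.h>
#include <stdlib.h>      /* aligned_alloc, free */

int main(void) {
    /* 64 bytes is a typical cache-line size and also satisfies the
       32-byte alignment of 256-bit AVX loads; the size passed to
       aligned_alloc must be a multiple of the alignment. */
    double *buf = aligned_alloc(64, 8 * sizeof(double));
    if (buf == NULL) return 1;
    for (int i = 0; i < 8; i++) buf[i] = (double)i;

    __m256d v = _mm256_load_pd(buf);   /* 32-byte access from a 64-byte aligned address */
    double out[4];
    _mm256_storeu_pd(out, v);
    printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);

    free(buf);
    return 0;
}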



Your assumption that the processor reads only 4 bytes per cycle is false.







