Why does the compiler align N-byte data types on N-byte boundaries?

I don't understand why the compiler aligns int on 4-byte boundaries, short on 2-byte boundaries, and char on 1-byte boundaries. I do understand that if the processor's data bus is 4 bytes wide, it takes two memory read cycles to read an int from an address that is not a multiple of 4.
So why doesn't the compiler align all data on 4-byte boundaries? For example:

struct s {
  char c;
  short s;
};

Here, 1) why does the compiler align the short on a 2-byte boundary? Assuming the processor can retrieve 4 bytes in one memory read cycle, wouldn't it take only one memory read cycle to read the short in the above case even if it sat directly after the char, with no padding between them?

2) Why doesn't the compiler align everything on 4-byte boundaries instead?
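
For reference, here is a small sketch (assuming a typical ABI where short is 2 bytes with 2-byte alignment) that prints the offsets and size the compiler actually chooses for this struct:

#include <stdio.h>
#include <stddef.h>

struct s {
  char c;
  short s;
};

int main(void) {
  /* On a typical ABI this prints: c at 0, s at 2, sizeof 4 */
  printf("offsetof(c) = %zu\n", offsetof(struct s, c));
  printf("offsetof(s) = %zu\n", offsetof(struct s, s));
  printf("sizeof(struct s) = %zu\n", sizeof(struct s));
  return 0;
}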

+3




4 answers


These objects must be able to be placed in arrays, and arrays are contiguous, with no gaps between elements. Thus, if the first element is N-byte aligned and every element is N bytes in size, then every element of the array is automatically N-byte aligned.

So if a short were 2 bytes in size but aligned to 4 bytes, there would have to be a two-byte hole between consecutive shorts in an array, and arrays are not allowed to contain such holes.



You can also see that your assumption is slightly flawed: I could make a struct of 26 chars and it would not be 26-byte aligned; it can start anywhere. A type of N bytes is aligned to N or to some divisor of N.
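
A minimal sketch illustrating both points (alignof is C11, from <stdalign.h>; the exact numbers depend on the ABI, and the struct name chars26 is made up for the example):

#include <stdio.h>
#include <stdalign.h>

struct chars26 { char data[26]; };  /* hypothetical 26-byte struct */

int main(void) {
  short arr[4];
  /* Typically sizeof = 26 but alignof = 1: the struct is not 26-byte aligned */
  printf("sizeof = %zu, alignof = %zu\n",
         sizeof(struct chars26), alignof(struct chars26));
  /* Consecutive shorts sit 2 bytes apart; 4-byte alignment would force holes */
  printf("&arr[1] - &arr[0] = %td bytes\n", (char *)&arr[1] - (char *)&arr[0]);
  return 0;
}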

+3




First, your premise is wrong. Each object is aligned to some fundamental alignment. For some scalar objects the alignment is the same as the size of the object's data, but it can also be smaller or larger. For example, a classic 32-bit architecture (I mean i386 here) can have both 8-byte doubles and 10-byte long doubles, both with 4-byte alignment. Note that I said data size there; don't confuse this with sizeof.
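
As a sketch, this is what you would typically see on a 32-bit x86 target (for example gcc -m32; the numbers are implementation-specific):

#include <stdio.h>
#include <stdalign.h>

int main(void) {
  /* On i386 this typically prints:
     sizeof(double) = 8, alignof(double) = 4
     sizeof(long double) = 12, alignof(long double) = 4
     The long double carries only 10 bytes of data, yet its sizeof is 12. */
  printf("sizeof(double) = %zu, alignof(double) = %zu\n",
         sizeof(double), alignof(double));
  printf("sizeof(long double) = %zu, alignof(long double) = %zu\n",
         sizeof(long double), alignof(long double));
  return 0;
}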

The actual size of an object can be larger than its data size because the size of an object must be a multiple of the object's alignment. The reason is that the alignment of an object is always the same regardless of context; in other words, an object's alignment depends only on its type.

Hence, in structures:

struct example1 {
  type1 a;
  type2 b;
};

struct example2 {
  type2 b;
  type1 a;
};




the alignment of member b is the same in both. To guarantee this, the alignment of the composite type has to be at least as strict as the alignment of its member types. This means that struct example1 and struct example2 above have the same alignment.
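
A quick check, using char and double as stand-ins for type1 and type2 (the type names in the example above are only placeholders):

#include <stdio.h>
#include <stdalign.h>

struct example1 { char a; double b; };
struct example2 { double b; char a; };

int main(void) {
  /* Both structs share the alignment of their most strictly aligned member */
  printf("alignof(example1) = %zu, alignof(example2) = %zu\n",
         alignof(struct example1), alignof(struct example2));
  return 0;
}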

The requirement that the alignment of an object depends only on its type implies that the size of the type must be a multiple of its alignment. (Any type can be an array element type, including an array of only one element. The size of an array is the product of the element size and the number of elements. Therefore, any padding required must be part of the element size.)

In general, reordering elements in a composite type can change the size of the composite type, but it cannot change its alignment. For example, both of the following structures have the same alignment (the alignment of a double), but the former is almost certainly smaller:

struct compact {
  double d;   // Must be at offset 0
  char   c1;  // Will be at offset sizeof(double)
  char   c2;  // Will be at offset sizeof(double)+sizeof(char).
};

struct bloated {
  char   c1;  // Must be at offset 0
  double d;   // Will be at offset alignof(double)
  char   c2;  // Will be at offset (alignof(double) + sizeof(double))
};
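
A sketch that prints both sizes; the exact numbers depend on the ABI, but on a typical 64-bit target compact is 16 bytes while bloated is 24 bytes, and both have the alignment of double:

#include <stdio.h>

struct compact { double d; char c1; char c2; };
struct bloated { char c1; double d; char c2; };

int main(void) {
  /* Typically 16 vs 24 bytes; only the padding differs, not the alignment */
  printf("sizeof(compact) = %zu\n", sizeof(struct compact));
  printf("sizeof(bloated) = %zu\n", sizeof(struct bloated));
  return 0;
}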


+2




I think I found the answer to my question. There can be two reasons why a padding byte is inserted between the char and the short rather than after the short.

1) Some architectures have 2-byte load instructions that fetch exactly 2 bytes from memory. If the short were not 2-byte aligned, such an instruction could not fetch it in one go, and obtaining the short would take two memory read cycles.

2) Other architectures may not have 2-byte load instructions at all. There, the processor fetches 4 bytes from memory into a register and masks off the unwanted bytes to obtain the short. If no padding byte were inserted between the char and the short, the processor would additionally have to shift the bytes to extract the short value.

Both cases cost performance. That is why the short is 2-byte aligned.
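
One way to observe this is to compare the default layout against a packed struct (packing is a common compiler extension, e.g. GCC/Clang __attribute__((packed)), not standard C); for the packed version the compiler must assume the short may be unaligned and typically emits slower access code:

#include <stdio.h>
#include <stddef.h>

struct padded {                        /* default layout: padding byte after c */
  char  c;
  short s;
};

struct __attribute__((packed)) unpadded {  /* GCC/Clang extension: no padding */
  char  c;
  short s;
};

int main(void) {
  /* Typically: padded s at offset 2, sizeof 4; unpadded s at offset 1, sizeof 3 */
  printf("padded:   offsetof(s) = %zu, sizeof = %zu\n",
         offsetof(struct padded, s), sizeof(struct padded));
  printf("unpadded: offsetof(s) = %zu, sizeof = %zu\n",
         offsetof(struct unpadded, s), sizeof(struct unpadded));
  return 0;
}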

0




The compiler aligns data according to the target processor's (micro-)architecture and ABI. See the x86-64 ABI specification for an example.

If your compiler aligned data differently than some ABI specifies, you would not be able to call functions from libraries that conform to that ABI!

In your example, if (on x86-64) the short field s were not 2-byte aligned, the processor would have to work harder (possibly issue two memory accesses) to fetch that field.

Also, on many x86-64 chips the cache line is a multiple of 16 bytes (commonly 64 bytes). So it makes sense to align the call-stack frame to 16 bytes, and this is also needed for vector local variables (AVX, SSE3, etc.).

On some processors, accessing poorly aligned data either raises an error (such as a trap or machine exception) or slows processing down significantly. It can also make some accesses non-atomic (which matters for multi-core processing). Consequently, some ABIs prescribe more alignment than is strictly necessary. Also, some recent processor features (SIMD vectorization such as AVX or SSE3) benefit from strongly aligned data (e.g. alignment to 16 bytes). Your compiler can optimize better and use such instructions if it knows about that stronger alignment.
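
As a small illustration, C11 _Alignas can request such alignment explicitly (whether the compiler then emits aligned SIMD loads depends on the target and optimization flags):

#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>

int main(void) {
  /* Request 16-byte alignment, as SSE-style aligned loads expect */
  _Alignas(16) float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};

  printf("address of v = %p, 16-byte aligned: %s\n",
         (void *)v, ((uintptr_t)v % 16 == 0) ? "yes" : "no");
  return 0;
}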

0

