Aligning memory addresses

I am a bit confused about the concept of memory alignment, so here are my doubts. The text says that if you want to read 4 bytes of data starting from an address that is not divisible by 4, you have a case of unaligned memory access. For example, if I want to read 10 bytes starting at address 05, this is considered an unaligned access (http://www.mjmwired.net/kernel/Documentation/unaligned-memory-access.txt).

Is this specific to an architecture that addresses memory in 4-byte units, or is it also true for a byte-addressable architecture? If the above access is unaligned even on a byte-addressable architecture, why is that?

Thanks!





2 answers


Typically, bit 0 in memory is gated onto bit 0 of the bus, and bit 0 of the bus is connected to bit 0 of each register. This continues up to bit 31. There may be special hardware that routes each of the other bytes (bits 15:8, 23:16, and 31:24) down to the least significant byte, bits 7:0. (When you get to bit "32", it is actually bit 0 of the 4-byte word at address 4.)

However, in the nominal case, there is no special hardware that moves bytes to any position other than the one they are naturally wired to, plus possibly byte lane 0.

Imagine a simple memory chip with 32 data pins and a simple processor with 32 data pins. Each data pin on one chip is wired to the corresponding pin on the other, one to one. Such a simple processor simply cannot perform an unaligned read.

So, consider a read from address 0. The 4 bytes land in the register exactly as wired, and the same happens for a read from address 4. But what if you read 32 bits from address 1? Or 2? Or 3? Such a read cannot be done directly by the hardware, so a fancier controller has to do several things:



  • The CPU has to do TWO reads just to get all the bits. It cannot do them at the same time, because it only has 32 data pins. One read is from address 0 and one from address 4.
  • The CPU then has to perform various shift, mask, and OR operations to build one word from the two parts.

All this takes extra time.


Note: in practice, the data bus and memory are often a multiple of 32 bits wide. Special hardware may be available to rearrange the bytes. But even then, because this is an unusual case, it may not get the pipeline optimizations that aligned reads get, and even with special hardware there is probably an execution-time penalty for operands routed through it.





Alignment is related to data size and addressing. In most instruction sets, addressing is in byte units: 0, 1, 2, 3 are all valid byte addresses. Assuming the memory system or peripheral you are accessing is "byte addressable", meaning you can write individual bytes to it, you usually have instructions that accept any address value. Alignment comes into play once you access more than one byte. For two bytes, aligned means the least significant bit of the address is zero, and unaligned means it is one. For four bytes (32-bit values), aligned means the two least significant bits are zero; if either one is non-zero, the access is unaligned, and so on. You can think of it in terms of modulo: an address where address modulo 4 = 0 is aligned on a 4-byte boundary.

Normally, as a software engineer, you would not intentionally put yourself in a situation where you need to get 10 bytes at address 5. You would probably read 12 bytes at 0x4 or 16 at 0x0, or something along those lines; even if you only use 10 of them, you would align things more sensibly. External influences such as network packets, file systems, shared memory, and hardware, i.e. any time you cross the compilation domain, may force you to deal with it, and you have to act accordingly. 10 bytes is a semi-interesting case; it depends on whether you are trying to copy those bytes to another equally bad address, or just read them or write them. If you are reading, you probably just want to read 12 bytes at 0x4 and be done with it. If you are writing, you can just do all 10 a byte at a time, in a nice loop or unrolled. Or you can write one byte at 0x5, two at 0x6, four at 0x8, two at 0xC and one at 0xE; or one at 0x5, then four 16-bit values (looped or unrolled) starting at 0x6, then one byte at 0xE. Etc.

As I said, you can read three 32-bit quantities starting at 0x4, or two 64-bit quantities starting at 0x0. It depends largely on what you plan to do with the data, what instruction set you are using, etc. A loop of 10 byte reads may be the cleanest and easiest to read, maintain, etc.



If you are concerned about aligned vs. unaligned accesses then, as I mentioned above, for a write you can do:

8 bit access at 0x5
16 bit access at 0x6
32 bit access at 0x8
16 bit access at 0xC
8 bit access at 0xE


As I said, though, this may not be the most efficient approach for reading. For writing, you can also do a read-modify-write in 32- or 64-bit quantities, or combine that with the accesses mentioned above.









