x86-64 canonical address?

While reading Intel's manual, I found the following:

On processors that support Intel 64 architecture, the IA32_SYSENTER_ESP field and the IA32_SYSENTER_EIP field must each contain a canonical address.

What is a "canonical address"?



3 answers


I suggest downloading the complete Intel Software Developer's Manual. The documentation is available in separate volumes, but the combined download gives you all seven volumes in one massive PDF, which makes it easier to find things.

The answer is in section 3.3.7.1. The first line of this section reads:

In 64-bit mode, an address is considered to be in canonical form if address bits 63 through to the most-significant implemented bit by the microarchitecture are set to either all ones or all zeros.

It continues from there ...



You can use CPUID to query the virtual-address width supported by a given processor (i.e. the width "implemented by the microarchitecture"), or you can simply assume 48 bits.


In other words, a canonical virtual address is a 48-bit value correctly sign-extended to 64 bits. If the upper bits do not all match bit 47, the address is non-canonical, and dereferencing it will fault.

(Or, with Intel's upcoming 5-level paging extension, 57 bits sign-extended to 64.)



This answer is less detailed than the previous ones, but IMHO it is easier to understand:

While 64-bit processors have 64-bit-wide registers, systems generally do not implement all 64 bits for addressing (that would be 16 exabytes of theoretical memory).

Thus, most architectures define an unimplemented region of the address space that the processor will consider invalid for use. x86-64 (...) define the most significant valid bit of an address, which must then be sign-extended (...) to create a valid address. The result of this is that the total address space is effectively divided into two parts, an upper and a lower portion, with the addresses in between considered invalid. (...) Valid addresses are termed canonical (invalid addresses being non-canonical).



From https://www.bottomupcs.com/virtual_memory_is.xhtml

Sign-extended: the most significant implemented bit is copied into every higher bit of the address.

Upper half: 11111...

Lower half: 00000...



Section 3.3.7.1 of the Intel manual covers this in five (difficult to digest) paragraphs; for me it is on page 74 of the four-volume set, which can be downloaded from Intel's website or directly from this address: https://software.intel.com/sites/default/files/managed/39/c5/325462-SDM-Vol-1-2abcd-3abcd.pdf

These paragraphs say that processors implement fewer than the full 64 address bits, and canonical form is defined relative to however many bits are implemented. There are different implementations, such as 48-bit or 57-bit. (57-bit requires an extra level of page tables, which increases the cost of page walks. See https://en.wikipedia.org/wiki/Intel_5-level_paging for more information on this newer CPU feature, which can be disabled.)


In a 48-bit implementation, the upper canonical half starts at 0xFFFF800000000000, while the lower canonical half ends at 0x00007FFFFFFFFFFF.

If bits 63 down to the most significant implemented bit are all ones or all zeros, the address is canonical. In a 57-bit implementation, I can tell at a glance that I am looking at a canonical address when I see 0xFF____ or 0x00____. (The least significant bit of the most significant byte, bit 56, is the top implemented address bit, and the remaining 7 bits of that byte are copies of it, i.e. the address is properly sign-extended.)

A useful way to remember the term may be that the word "canonical" refers to a general rule or accepted way of doing something. In general, no one needs as many addresses as 64 bits can provide, so the upper bits usually go unused. Also, if something is "canon", as in Star Trek or comics, it matches how it was originally established.

Now, to answer WHY we have canonical addresses: nobody needs to access 16 exabytes (the theoretical limit for a 64-bit machine), so the second paragraph of the manual simply says that the Intel architecture "defines" a 64-bit linear address, but no implementation actually uses all 64 bits. Just in case, the third paragraph says that the processor still checks those upper bits and, if the address is NOT in canonical form, raises a general-protection exception (#GP).

The main reason for checking canonical addresses, rather than silently ignoring the most significant bits, is to ensure that software stays forward-compatible with future hardware that supports more virtual address bits.
