How can I combine short conditional jumps with branch target alignment via `.align` in the Delphi assembler?

I'm using Delphi 10.2 Tokyo, for both 32-bit and 64-bit targets, to write some functions entirely in assembler.

If I don't use `.align`, the compiler encodes short conditional branch instructions correctly (a 2-byte instruction consisting of a 1-byte opcode, 074h for `je`, and a 1-byte signed relative offset, up to +07Fh forward). But as soon as I add even one `.align`, even one as small as `.align 4`, every conditional jump located before the `.align` whose destination lies after it becomes a 6-byte instruction instead of the 2 bytes it should be. Only the instructions that follow the `.align` remain correctly encoded as 2-byte short jumps.

The Delphi assembler does not accept a `short` prefix.

How can I combine short conditional jumps with branch target alignment using `.align` in the Delphi assembler?

Here's an example procedure; notice the `.align` in the middle.

    procedure Test; assembler;
    label
      label1, label2, label3;
    asm
      mov     al, 1
      cmp     al, 2
      je      label1
      je      label2
      je      label3
    label1:
      mov     al, 3
      cmp     al, 4
      je      label1
      je      label2
      je      label3
      mov     al, 5
      .align 4
    label2:
      cmp     al, 6
      je      label1
      je      label2
      je      label3
      mov     al, 7
      cmp     al, 8
      je      label1
      je      label2
      je      label3
    label3:
    end;

This is how it is encoded. The conditional jumps placed before the `.align` that point to label2 and label3 (after the `.align`) are encoded as 6-byte instructions (that's the 64-bit target CPU):

    0041C354 B001          mov al,$01      //   mov     al, 1
    0041C356 3C02          cmp al,$02      //   cmp     al, 2
    0041C358 740C          jz $0041c366    //   je      label1
    0041C35A 0F841C000000  jz $0041c37c    //   je      label2
    0041C360 0F8426000000  jz $0041c38c    //   je      label3
    0041C366 B003          mov al,$03 //label1: mov al, 3
    0041C368 3C04          cmp al,$04      //   cmp     al, 4
    0041C36A 74FA          jz $0041c366    //   je      label1
    0041C36C 0F840A000000  jz $0041c37c    //   je      label2
    0041C372 0F8414000000  jz $0041c38c    //   je      label3
    0041C378 B005          mov al,$05      //   mov     al, 5
    0041C37A 8BC0          mov eax,eax     //  <-- a 2-byte dummy instruction, inserted by ".align 4" (almost a 2-byte NOP)
    0041C37C 3C06          cmp al,$06 //label2: cmp al, 6
    0041C37E 74E6          jz $0041c366    //   je      label1
    0041C380 74FA          jz $0041c37c    //   je      label2
    0041C382 7408          jz $0041c38c    //   je      label3
    0041C384 B007          mov al,$07      //   mov     al, 7
    0041C386 3C08          cmp al,$08      //   cmp     al, 8
    0041C388 74DC          jz $0041c366    //   je      label1
    0041C38A 74F0          jz $0041c37c    //   je      label2
    0041C38C C3            ret        // label3:
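
The encodings in the listing above can be checked by hand: the destination of a relative jump is the address of the next instruction plus the signed displacement. The following is only a minimal Python sketch of that arithmetic; `jcc_target` is a hypothetical helper name, not part of Delphi or any other tool, and it handles only the two `je`/`jz` forms that appear in the listing.

```python
def jcc_target(address: int, encoding: bytes) -> int:
    """Return the destination of a je/jz from its address and raw bytes."""
    if encoding[:1] == b"\x74":            # short form: 74 rel8 (2 bytes)
        length, raw = 2, encoding[1:2]
    elif encoding[:2] == b"\x0F\x84":      # near form: 0F 84 rel32 (6 bytes)
        length, raw = 6, encoding[2:6]
    else:
        raise ValueError("not a je/jz encoding")
    # The displacement is a little-endian signed integer, counted from
    # the address of the instruction that follows the jump.
    disp = int.from_bytes(raw, "little", signed=True)
    return address + length + disp


# Three rows taken from the listing with .align 4:
print(hex(jcc_target(0x0041C358, bytes.fromhex("740C"))))          # 0x41c366
print(hex(jcc_target(0x0041C35A, bytes.fromhex("0F841C000000"))))  # 0x41c37c
print(hex(jcc_target(0x0041C36A, bytes.fromhex("74FA"))))          # 0x41c366
```

The second row shows the problem: a displacement of only 01Ch would fit in a single signed byte, yet it is stored in the 4-byte `rel32` form.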

But if I remove the `.align`, all the jump instructions have the correct size again, only 2 bytes as before:

    0041C354 B001          mov al,$01      //   mov     al, 1
    0041C356 3C02          cmp al,$02      //   cmp     al, 2
    0041C358 7404          jz $0041c35e    //   je      label1
    0041C35A 740E          jz $0041c36a    //   je      label2
    0041C35C 741C          jz $0041c37a    //   je      label3
    0041C35E B003          mov al,$03 //label1: mov     al, 3
    0041C360 3C04          cmp al,$04      //   cmp     al, 4
    0041C362 74FA          jz $0041c35e    //   je      label1
    0041C364 7404          jz $0041c36a    //   je      label2
    0041C366 7412          jz $0041c37a    //   je      label3
    0041C368 B005          mov al,$05      //   mov     al, 5
    0041C36A 3C06          cmp al,$06 //.align 4 label2:cmp al, 6
    0041C36C 74F0          jz $0041c35e    //   je      label1
    0041C36E 74FA          jz $0041c36a    //   je      label2
    0041C370 7408          jz $0041c37a    //   je      label3
    0041C372 B007          mov al,$07      //   mov     al, 7
    0041C374 3C08          cmp al,$08      //   cmp     al, 8
    0041C376 74E6          jz $0041c35e    //   je      label1
    0041C378 74F0          jz $0041c36a    //   je      label2
    0041C37A C3            ret             //   je      label3
                                    //  label3:

Back to the conditional jump instructions: how can I reconcile short conditional jumps with branch target alignment using `.align` in the Delphi assembler?

I admit that the benefit of branch target alignment on CPUs like Skylake and later is subtle, and I understand that I can simply refrain from using `.align`, which would save code size too. But I want to know how to make the Delphi assembler generate short jumps together with `.align`. This issue exists on the 32-bit target as well, not just the 64-bit one.

1 answer


Unless your assembler has an option for better branch-displacement optimization (which can require repeated passes), you're probably out of luck. (Of course, you could do all of this by hand, but it would have to be redone every time you change something.)

Or you could use a different assembler. But as I suspected, that is highly undesirable because you lose access to Delphi-specific things, such as the layout of objects declared outside the `asm` block. (Thanks @Rudy for the comment.)

Perhaps you could write part of your function in Delphi assembler and do as much of the Delphi-specific work there as possible. Write the critical loop in another assembler, then hexdump its machine-code output into a `db` pseudo-instruction that you put in the middle of your Delphi asm.

This might work fine if the start of each function is at least as aligned as anything inside the function, but you would probably end up wasting instructions putting constants into registers for use by the NASM part, which is probably worse than just having some longer branches.
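
The hexdump step could be scripted. Below is a minimal Python sketch, assuming the other assembler has already produced a flat binary; `to_delphi_db` is a hypothetical helper name, and the `$XX` hex notation is the usual Delphi form for byte constants in a `db` list. The sample bytes are taken from the question's listing (`mov al,3` / `cmp al,4` / `je` back to the `mov`).

```python
def to_delphi_db(code: bytes, per_line: int = 8) -> str:
    """Format raw machine code as Delphi inline-asm `db` lines."""
    lines = []
    for i in range(0, len(code), per_line):
        chunk = code[i:i + per_line]
        # Delphi writes hexadecimal constants with a leading '$'.
        lines.append("  db " + ", ".join(f"${b:02X}" for b in chunk))
    return "\n".join(lines)


# In practice the bytes would come from the other assembler's output
# file, e.g. open(path, "rb").read(); here they are written inline.
blob = bytes.fromhex("B003 3C04 74FA")
print(to_delphi_db(blob))
# prints:   db $B0, $03, $3C, $04, $74, $FA
```

Relative jumps that stay inside the blob keep working wherever it is placed, because `rel8` displacements are position-independent. Whether a loop top inside the blob actually ends up aligned still depends on where the blob starts within the function.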


> Only the instructions that follow `.align` remain correctly encoded as 2-byte short jumps



That's not entirely accurate. The first `je label1` looks fine, and it comes before the `.align`.

It looks like any branch that goes forward across a not-yet-evaluated `.align` directive leaves room for a `rel32`, and the assembler never comes back to fix it. Every other case seems fine: backward branches across a `.align`, and forward branches that don't cross a `.align`.


Branch-displacement optimization is not an easy problem, especially when `.align` directives are present. Still, this looks like a really suboptimal implementation.

Related: "Why is the "start small" algorithm for branch displacement not optimal?" for more about the algorithms assemblers use for branch-displacement optimization. Even good assemblers probably don't make optimal choices, especially when `.align` directives are present.
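
To illustrate what a multi-pass "start small" approach could look like, here is a minimal Python sketch. It is not how Delphi or any particular assembler is implemented. The item format, `layout`, `relax` and `TEST_PROC` are all assumptions made up for this example. Every conditional jump starts as a 2-byte `rel8`, the layout is recomputed with alignment padding, and a jump is grown to the 6-byte `rel32` form only when its displacement does not fit in a signed byte.

```python
SHORT, NEAR = 2, 6   # sizes of jcc rel8 and jcc rel32


def layout(items, jcc_size, base=0):
    """Return (item addresses, label addresses) for the given jump sizes."""
    addr, addrs, labels = base, [], {}
    for i, (kind, arg) in enumerate(items):
        if kind == "align":
            addr += -addr % arg        # pad up to the next multiple of arg
        addrs.append(addr)
        if kind == "label":
            labels[arg] = addr
        elif kind == "ins":
            addr += arg                # fixed-size instruction
        elif kind == "jcc":
            addr += jcc_size[i]
    return addrs, labels


def relax(items, base=0):
    """Start every jump short; grow only those that cannot reach."""
    jcc_size = {i: SHORT for i, (kind, _) in enumerate(items) if kind == "jcc"}
    while True:
        addrs, labels = layout(items, jcc_size, base)
        grown = False
        for i, size in list(jcc_size.items()):
            if size == SHORT:
                disp = labels[items[i][1]] - (addrs[i] + SHORT)
                if not -128 <= disp <= 127:
                    jcc_size[i] = NEAR   # jumps only grow, so this terminates
                    grown = True
        if not grown:
            return jcc_size, labels


def three_jumps():
    return [("jcc", "label1"), ("jcc", "label2"), ("jcc", "label3")]


# The Test procedure from the question: mov/cmp are 2 bytes, ret is 1.
TEST_PROC = (
    [("ins", 2), ("ins", 2)] + three_jumps()
    + [("label", "label1"), ("ins", 2), ("ins", 2)] + three_jumps()
    + [("ins", 2), ("align", 4), ("label", "label2"), ("ins", 2)]
    + three_jumps() + [("ins", 2), ("ins", 2)] + three_jumps()
    + [("label", "label3"), ("ins", 1)]
)

sizes, labels = relax(TEST_PROC, base=0x0041C354)
print(sorted(set(sizes.values())))   # [2]
print(hex(labels["label2"]))         # 0x41c36c
```

Under these assumptions every jump in the question's procedure stays 2 bytes and `label2` still lands on a 4-byte boundary. Because jumps never shrink again in this sketch, the result is not guaranteed to be the smallest possible encoding when alignment padding is involved.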
