Why won't this LEA statement compile?

I'm porting 32-bit BASM Delphi code to 64-bit FPC (Win64 target OS) and wondering why the following instruction won't compile on 64-bit FPC:

{$IFDEF FPC}
  {$ASMMODE INTEL}
{$ENDIF}

procedure DoesNotCompile;
asm
      LEA   ECX,[ECX + ESI + $265E5A51]
end;

// Error: Asm: 16 or 32 Bit references not supported

      

Possible workarounds:

procedure Compiles1;
asm
      ADD   ECX,ESI
      ADD   ECX,$265E5A51
end;

procedure Compiles2;
asm
      LEA   ECX,[RCX + RSI + $265E5A51]
end;

      

I just don't understand what is wrong with the 32-bit LEA instruction on the Win64 target. It compiles fine in 32-bit Delphi, so it is a valid processor instruction.


Optimization notes:

The following code, compiled with 64-bit FPC 2.6.2,

  {$MODE DELPHI}
  {$ASMMODE INTEL}

procedure Test;
asm
        LEA     ECX,[RCX + RSI + $265E5A51]
        NOP
        LEA     RCX,[RCX + RSI + $265E5A51]
        NOP
        ADD     ECX,$265E5A51
        ADD     ECX,ESI
        NOP
end;

      

generates the following assembler output:

00000000004013F0 4883ec08                 sub    $0x8,%rsp
                         project1.lpr:10  LEA     ECX,[RCX + RSI + $265E5A51]
00000000004013F4 8d8c31515a5e26           lea    0x265e5a51(%rcx,%rsi,1),%ecx
                         project1.lpr:11  NOP
00000000004013FB 90                       nop
                         project1.lpr:12  LEA     RCX,[RCX + RSI + $265E5A51]
00000000004013FC 488d8c31515a5e26         lea    0x265e5a51(%rcx,%rsi,1),%rcx
                         project1.lpr:13  NOP
0000000000401404 90                       nop
                         project1.lpr:14  ADD     ECX,$265E5A51
0000000000401405 81c1515a5e26             add    $0x265e5a51,%ecx
                         project1.lpr:15  ADD     ECX,ESI
000000000040140B 01f1                     add    %esi,%ecx
                         project1.lpr:16  NOP
000000000040140D 90                       nop
                         project1.lpr:17  end;
000000000040140E 4883c408                 add    $0x8,%rsp

      

and the winner is (7 bytes long):

LEA     ECX,[RCX + RSI + $265E5A51]

The other three alternatives (including LEA ECX,[ECX + ESI + $265E5A51], which does not compile with 64-bit FPC) are all 8 bytes long.

I am not sure whether the winner is also the fastest.
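The byte counts can be double-checked from the machine code shown in this thread. The sketch below is only an illustration in Python: the hex strings are copied from the FPC listing above and from the Delphi CPU view quoted in the first answer, and the dictionary keys are just labels. The extra byte in the 8-byte LEA forms is a prefix: 0x48 (REX.W) for the 64-bit destination, 0x67 (address-size override) for the 32-bit addressing form.

```python
# Machine-code bytes copied from the listings in this thread.
encodings = {
    "LEA ECX,[RCX+RSI+disp32]": "8d8c31515a5e26",
    "LEA RCX,[RCX+RSI+disp32]": "488d8c31515a5e26",
    "ADD ECX,imm32 / ADD ECX,ESI": "81c1515a5e26" + "01f1",
    "LEA ECX,[ECX+ESI+disp32]": "678d8c0e515a5e26",
}


def encoded_length(hex_bytes):
    """Return the number of bytes in a hex-encoded instruction sequence."""
    return len(bytes.fromhex(hex_bytes))


lengths = {name: encoded_length(code) for name, code in encodings.items()}

for name, size in lengths.items():
    print(f"{size} bytes  {name}")
```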


2 answers


I would regard this as a bug in the FPC assembler. The asm code you present is valid, and in 64-bit mode it is perfectly legitimate to use LEA with 32-bit registers as you did. The Intel processor documentation is clear on this point. Delphi's 64-bit inline assembler accepts this code.

To work around the problem, you would need to hand-assemble the instruction:

DQ    $265e5a510e8c8d67

      

In Delphi's CPU view, it looks like:

Project1.dpr.12: DQ $265e5a510e8c8d67
0000000000424160 678D8C0E515A5E26 lea ecx, [esi + ecx + $265e5a51]
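The DQ constant is simply the eight instruction bytes packed in little-endian order. As a rough check (written in Python, not part of the original answer), the sketch below unpacks the constant and splits it into the standard x86 fields: address-size prefix, opcode, ModRM, SIB and a 32-bit displacement. The register table and variable names are only for illustration.

```python
import struct

# The 64-bit constant used with the DQ directive.
DQ_VALUE = 0x265E5A510E8C8D67

# DQ emits the value little-endian, so the lowest byte comes first in memory.
raw = struct.pack("<Q", DQ_VALUE)

prefix, opcode, modrm, sib = raw[0], raw[1], raw[2], raw[3]
disp32 = struct.unpack("<i", raw[4:8])[0]

# ModRM byte: mod (2 bits), reg (3 bits), r/m (3 bits).
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
# SIB byte: scale (2 bits), index (3 bits), base (3 bits).
scale, index, base = sib >> 6, (sib >> 3) & 7, sib & 7

# 32-bit register numbering; with the 0x67 prefix and no REX prefix,
# base and index are 32-bit registers.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

print(raw.hex())  # 678d8c0e515a5e26
print(f"lea {REG32[reg]}, [{REG32[base]} + {REG32[index]} + ${disp32:x}]")
# lea ecx, [esi + ecx + $265e5a51]
```

Here mod = 2 selects a 32-bit displacement, r/m = 4 says a SIB byte follows, and scale = 0 means the index is multiplied by 1.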

I ran a very simple benchmark to compare the versions using 32- and 64-bit operands with the version using two ADDs. The code looks like this:

{$APPTYPE CONSOLE}

uses
  System.Diagnostics;

function BenchWithTwoAdds: Integer;
asm
    MOV   EDX,ESI
    XOR   EAX,EAX
    MOV   ESI,$98C34
    MOV   ECX,$ffffffff
@loop:
    ADD   EAX,ESI
    ADD   EAX,$265E5A51
    DEC   ECX
    CMP   ECX,0
    JNZ   @loop
    MOV   ESI,EDX
end;

function BenchWith32bitOperands: Integer;
asm
    MOV   EDX,ESI
    XOR   EAX,EAX
    MOV   ESI,$98C34
    MOV   ECX,$ffffffff
@loop:
    LEA   EAX,[EAX + ESI + $265E5A51]
    DEC   ECX
    CMP   ECX,0
    JNZ   @loop
    MOV   ESI,EDX
end;

{$IFDEF CPUX64}
function BenchWith64bitOperands: Integer;
asm
    MOV   EDX,ESI
    XOR   EAX,EAX
    MOV   ESI,$98C34
    MOV   ECX,$ffffffff
@loop:
    LEA   EAX,[RAX + RSI + $265E5A51]
    DEC   ECX
    CMP   ECX,0
    JNZ   @loop
    MOV   ESI,EDX
end;
{$ENDIF}

var
  Stopwatch: TStopwatch;

begin
{$IFDEF CPUX64}
  Writeln('64 bit');
{$ELSE}
  Writeln('32 bit');
{$ENDIF}
  Writeln;

  Writeln('BenchWithTwoAdds');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWithTwoAdds);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
  Writeln;

  Writeln('BenchWith32bitOperands');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWith32bitOperands);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
  Writeln;

{$IFDEF CPUX64}
  Writeln('BenchWith64bitOperands');
  Stopwatch := TStopwatch.StartNew;
  Writeln('Value = ', BenchWith64bitOperands);
  Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
{$ENDIF}

  Readln;
end.

      



Output on my Intel i5-2300:

32 bit

BenchWithTwoAdds
Value = -644343429
Elapsed time = 2615

BenchWith32bitOperands
Value = -644343429
Elapsed time = 3915

----------------------

64 bit

BenchWithTwoAdds
Value = -644343429
Elapsed time = 2612

BenchWith32bitOperands
Value = -644343429
Elapsed time = 3917

BenchWith64bitOperands
Value = -644343429
Elapsed time = 3918

As you can see, there is nothing to choose between the LEA variants on this basis: the differences between their times are well within measurement variability. However, the variant using ADD twice wins hands down.
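The identical Value printed by every routine can be verified independently of the timings. Each loop iteration adds $98C34 + $265E5A51 to a 32-bit accumulator, and the loop runs $FFFFFFFF times, so the result is that product reduced modulo 2^32 and read as a signed integer. A small Python check (an illustration only; the function names are made up here):

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def expected_value(step_a=0x98C34, step_b=0x265E5A51, iterations=0xFFFFFFFF):
    """Result of adding (step_a + step_b) to a 32-bit register `iterations` times."""
    return to_signed32((step_a + step_b) * iterations)


print(expected_value())  # -644343429
```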

Different machines give somewhat different results. Here is the output on a Xeon E5530, where the LEA variants come out slightly ahead:

64 bit

BenchWithTwoAdds
Value = -644343429
Elapsed time = 3434

BenchWith32bitOperands
Value = -644343429
Elapsed time = 3295

BenchWith64bitOperands
Value = -644343429
Elapsed time = 3279

And on Xeon E5-4640 v2:

64 bit

BenchWithTwoAdds
Value = -644343429
Elapsed time = 4102

BenchWith32bitOperands
Value = -644343429
Elapsed time = 5868

BenchWith64bitOperands
Value = -644343429
Elapsed time = 5868

Separately from the size of the operands themselves, the components of a memory operand have a default size. In 64-bit mode this is 64 bits, meaning you should use 64-bit registers for memory operand components unless you have a specific reason not to.

The x86 ISA allows the address size to be overridden for a given instruction with the 0x67 prefix byte, but you probably don't want to do that (and apparently your assembler doesn't even support it).



To make the distinction between operand and operand component a little clearer:

lea eax, dword ptr [rax + rdx * 4]
    ^^^  ^^^^^ ^^^                   operands: can be any size you like
                    ^^^   ^^^        operand components: usually 64-bit
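For LEA in particular, a 32-bit destination keeps only the low 32 bits of the computed address. Addition modulo 2^32 does not depend on the upper halves of the inputs, so the 64-bit-component form gives the same result as the 0x67-prefixed 32-bit form even when the upper 32 bits of the registers hold unrelated data. The Python model below is a sketch of that arithmetic only, not of the processor; the function names and sample register values are invented for the illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
DISP = 0x265E5A51  # displacement used in the question


def lea32_addr32(rcx, rsi):
    """Model LEA ECX,[ECX + ESI + disp]: 32-bit address arithmetic."""
    return ((rcx & MASK32) + (rsi & MASK32) + DISP) & MASK32


def lea32_addr64(rcx, rsi):
    """Model LEA ECX,[RCX + RSI + disp]: 64-bit arithmetic, truncated to 32 bits."""
    return ((rcx + rsi + DISP) & MASK64) & MASK32


# Arbitrary sample values with non-zero upper halves.
rcx = 0xDEADBEEF12345678
rsi = 0xCAFEBABE9ABCDEF0

print(lea32_addr32(rcx, rsi) == lea32_addr64(rcx, rsi))  # True
```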

      
