Why won't this LEA statement compile?
I'm porting 32-bit BASM Delphi code to 64-bit FPC (Win64 target OS) and wondering why the following instruction won't compile on 64-bit FPC:
{$IFDEF FPC}
{$ASMMODE INTEL}
{$ENDIF}
procedure DoesNotCompile;
asm
LEA ECX,[ECX + ESI + $265E5A51]
end;
// Error: Asm: 16 or 32 Bit references not supported
Possible workarounds:
procedure Compiles1;
asm
ADD ECX,ESI
ADD ECX,$265E5A51
end;
procedure Compiles2;
asm
LEA ECX,[RCX + RSI + $265E5A51]
end;
I just don't understand what is wrong with the 32-bit form of LEA on the Win64 target (it compiles fine in 32-bit Delphi, so it is a valid processor instruction).
Optimization notes:
The following code, compiled with 64-bit FPC 2.6.2,
{$MODE DELPHI}
{$ASMMODE INTEL}
procedure Test;
asm
LEA ECX,[RCX + RSI + $265E5A51]
NOP
LEA RCX,[RCX + RSI + $265E5A51]
NOP
ADD ECX,$265E5A51
ADD ECX,ESI
NOP
end;
generates the following assembler output:
00000000004013F0 4883ec08 sub $0x8,%rsp
project1.lpr:10 LEA ECX,[RCX + RSI + $265E5A51]
00000000004013F4 8d8c31515a5e26 lea 0x265e5a51(%rcx,%rsi,1),%ecx
project1.lpr:11 NOP
00000000004013FB 90 nop
project1.lpr:12 LEA RCX,[RCX + RSI + $265E5A51]
00000000004013FC 488d8c31515a5e26 lea 0x265e5a51(%rcx,%rsi,1),%rcx
project1.lpr:13 NOP
0000000000401404 90 nop
project1.lpr:14 ADD ECX,$265E5A51
0000000000401405 81c1515a5e26 add $0x265e5a51,%ecx
project1.lpr:15 ADD ECX,ESI
000000000040140B 01f1 add %esi,%ecx
project1.lpr:16 NOP
000000000040140D 90 nop
project1.lpr:17 end;
000000000040140E 4883c408 add $0x8,%rsp
and the winner (7 bytes long) is:
LEA ECX,[RCX + RSI + $265E5A51]
The other three alternatives (including LEA ECX,[ECX + ESI + $265E5A51], which won't compile with 64-bit FPC) are 8 bytes long.
I'm not sure whether the shortest encoding is also the fastest.
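The byte counts can be double-checked from the opcode column of the listing. The following is only a quick Python sketch, not part of the original build: the dictionary labels and the `byte_length` helper are made up for illustration, and the last entry is not from the FPC listing but uses the bytes that Delphi's CPU view reports elsewhere in this thread.

```python
# Opcode bytes copied from the disassembly (hex, two digits per byte).
encodings = {
    "LEA ECX,[RCX + RSI + $265E5A51]": ["8d8c31515a5e26"],
    "LEA RCX,[RCX + RSI + $265E5A51]": ["488d8c31515a5e26"],
    "ADD ECX,$265E5A51 / ADD ECX,ESI": ["81c1515a5e26", "01f1"],
    # Assumption: bytes taken from Delphi's CPU view, since FPC rejects this form.
    "LEA ECX,[ECX + ESI + $265E5A51]": ["678d8c0e515a5e26"],
}

def byte_length(hex_chunks):
    """Total number of bytes across one or more hex-encoded instructions."""
    return sum(len(bytes.fromhex(chunk)) for chunk in hex_chunks)

sizes = {name: byte_length(chunks) for name, chunks in encodings.items()}
for name, size in sizes.items():
    print(f"{size} bytes  {name}")
```

Only the first form comes out at 7 bytes; the REX.W prefix (48), the address-size prefix (67) and the second ADD each cost the extra byte.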
I would treat this as a bug in the FPC assembler. The asm code you present is valid, and in 64-bit mode it is perfectly legitimate to use LEA with 32-bit registers as you did. The Intel processor documentation is clear on this point, and Delphi's 64-bit inline assembler accepts this code.
To work around this, you will need to hand-assemble the instruction:
DQ $265e5a510e8c8d67
In Delphi's CPU view, it looks like this:
Project1.dpr.12: DQ $265e5a510e8c8d67
0000000000424160 678D8C0E515A5E26 lea ecx,[esi+ecx+$265e5a51]
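To see how the DQ literal maps onto the instruction bytes, here is a small Python sketch (just an illustration, not something you need in the project). x86 stores the quadword little-endian, so the lowest byte of the literal is emitted first; the variable names are my own.

```python
# The quadword literal from the DQ directive.
literal = 0x265E5A510E8C8D67

# x86 is little-endian: the least significant byte is emitted first.
code = literal.to_bytes(8, "little")
print(code.hex().upper())  # 678D8C0E515A5E26

prefix, opcode, modrm, sib = code[0], code[1], code[2], code[3]
disp32 = int.from_bytes(code[4:8], "little")

print(f"prefix = {prefix:02X}")  # 67: address-size override
print(f"opcode = {opcode:02X}")  # 8D: LEA
print(f"modrm  = {modrm:02X}, sib = {sib:02X}")
print(f"disp32 = ${disp32:08X}")
```

The first byte is the 0x67 address-size prefix, followed by the LEA opcode 0x8D, the ModRM and SIB bytes, and finally the 32-bit displacement $265E5A51.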
I did some very simple benchmarking to compare the versions using 32- and 64-bit operands with the version using two ADDs. The code looks like this:
{$APPTYPE CONSOLE}
uses
System.Diagnostics;
function BenchWithTwoAdds: Integer;
asm
MOV EDX,ESI
XOR EAX,EAX
MOV ESI,$98C34
MOV ECX,$ffffffff
@loop:
ADD EAX,ESI
ADD EAX,$265E5A51
DEC ECX
CMP ECX,0
JNZ @loop
MOV ESI,EDX
end;
function BenchWith32bitOperands: Integer;
asm
MOV EDX,ESI
XOR EAX,EAX
MOV ESI,$98C34
MOV ECX,$ffffffff
@loop:
LEA EAX,[EAX + ESI + $265E5A51]
DEC ECX
CMP ECX,0
JNZ @loop
MOV ESI,EDX
end;
{$IFDEF CPUX64}
function BenchWith64bitOperands: Integer;
asm
MOV EDX,ESI
XOR EAX,EAX
MOV ESI,$98C34
MOV ECX,$ffffffff
@loop:
LEA EAX,[RAX + RSI + $265E5A51]
DEC ECX
CMP ECX,0
JNZ @loop
MOV ESI,EDX
end;
{$ENDIF}
var
Stopwatch: TStopwatch;
begin
{$IFDEF CPUX64}
Writeln('64 bit');
{$ELSE}
Writeln('32 bit');
{$ENDIF}
Writeln;
Writeln('BenchWithTwoAdds');
Stopwatch := TStopwatch.StartNew;
Writeln('Value = ', BenchWithTwoAdds);
Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
Writeln;
Writeln('BenchWith32bitOperands');
Stopwatch := TStopwatch.StartNew;
Writeln('Value = ', BenchWith32bitOperands);
Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
Writeln;
{$IFDEF CPUX64}
Writeln('BenchWith64bitOperands');
Stopwatch := TStopwatch.StartNew;
Writeln('Value = ', BenchWith64bitOperands);
Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
{$ENDIF}
Readln;
end.
Output on my Intel i5-2300:
32 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 2615
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3915
----------------------
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 2612
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3917
BenchWith64bitOperands
Value = -644343429
Elapsed time = 3918
As you can see, there is nothing to choose between the two LEA options on this basis. The differences between their times are within the range of measurement variability. However, on this machine the option using ADD twice wins hands down.
Other machines give somewhat different results. Here's the output on a Xeon E5530:
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 3434
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3295
BenchWith64bitOperands
Value = -644343429
Elapsed time = 3279
And on a Xeon E5-4640 v2:
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 4102
BenchWith32bitOperands
Value = -644343429
Elapsed time = 5868
BenchWith64bitOperands
Value = -644343429
Elapsed time = 5868
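As a sanity check that all three benchmark functions compute the same thing, the printed Value can be reproduced without running the loop. This is a rough Python sketch with names of my own choosing: each iteration adds ESI plus the constant to EAX modulo 2^32, and the loop body executes $ffffffff times (ECX counts down from $ffffffff to 0).

```python
def to_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit Integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

STEP = 0x98C34 + 0x265E5A51   # amount added to EAX per iteration
ITERATIONS = 0xFFFFFFFF       # ECX starts at $ffffffff and stops at 0

expected = to_int32(STEP * ITERATIONS)
print(expected)  # -644343429
```

This matches the Value line in every run above, so the timing differences are not caused by the variants doing different work.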
Separately from the size of the operands themselves, the components of a memory operand have a default size. In 64-bit mode that default is 64 bits, meaning you should use 64-bit registers for the memory operand components unless you have a specific reason not to.
The x86 ISA allows this size to be overridden for a given instruction with the prefix byte 0x67, but you probably don't want to do that (and apparently your assembler doesn't even support it).
To make the distinction between operand and operand component a little clearer:
lea eax, dword ptr [rax + rdx * 4]
    ^^^  ^^^^^ ^^^                  operands: can be any size you like
                    ^^^   ^^^       operand components: usually 64-bit
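For the LEA in the question this means the 64-bit-address form with a 32-bit destination is a drop-in replacement: the low 32 bits of a sum depend only on the low 32 bits of its inputs. The following Python model is only a sketch to illustrate that; the function names are hypothetical and it models just the register result, not flags or timing.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
DISP = 0x265E5A51

def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed displacement."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def lea_ecx_addr32(rcx, rsi, disp):
    """Model of LEA ECX,[ECX + ESI + disp]: 32-bit address arithmetic."""
    return ((rcx & MASK32) + (rsi & MASK32) + sign_extend32(disp)) & MASK32

def lea_ecx_addr64(rcx, rsi, disp):
    """Model of LEA ECX,[RCX + RSI + disp]: 64-bit address, 32-bit result."""
    address = (rcx + rsi + sign_extend32(disp)) & MASK64
    return address & MASK32

def two_adds(rcx, rsi, disp):
    """Model of ADD ECX,ESI followed by ADD ECX,disp."""
    ecx = ((rcx & MASK32) + (rsi & MASK32)) & MASK32
    return (ecx + disp) & MASK32

# Sample register contents, including garbage in the upper 32 bits.
samples = [
    (1, 2),
    (0xFFFFFFFF, 0xFFFFFFFF),
    (0xDEADBEEF00000001, 0x12345678FFFFFFFF),
]
variants = (lea_ecx_addr32, lea_ecx_addr64, two_adds)
for rcx, rsi in samples:
    results = {variant(rcx, rsi, DISP) for variant in variants}
    print(hex(rcx), hex(rsi), sorted(hex(r) for r in results))
```

Each sample yields a single distinct result, whatever is in the upper halves of RCX and RSI. The one behavioural difference to keep in mind is that ADD modifies the flags while LEA does not.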