How do I make LLVM prefer one machine instruction over another?

Suppose the target machine has two register-based computation units, I and X. Only integer operations can be applied to I registers, while both integer and floating-point operations can be applied to X registers. Correspondingly, there are two kinds of instructions:

def ADDIi32 : MyInstruction< ..., (outs I:$Rs), (ins I:$Rm, I:$Rn), [(set i32:$Rs, (add i32:$Rm, i32:$Rn))]>;
...

def ADDXi32 : MyInstruction< ..., (outs X:$Rs), (ins X:$Rm, X:$Rn), [(set i32:$Rs, (add i32:$Rm, i32:$Rn))]>;
def ADDXf32 : MyInstruction< ..., (outs X:$Rs), (ins X:$Rm, X:$Rn), [(set f32:$Rs, (fadd f32:$Rm, f32:$Rn))]>;
...

      

They are encoded differently and have different asm strings. So LLVM can map

int a, b;
a = a + b;

to either ADDIi32 or ADDXi32, but

float a, b;
a = a + b;

can only be mapped to ADDXf32.

I would like LLVM to use ADDIi32 whenever possible, but unfortunately I haven't found a way to say that one instruction (or register class) "costs" more than another. The CostPerUse field of the Register class looks like a candidate, but it defines the cost relative to other registers in the same group, not across all registers. I've seen some threads claiming that AddedComplexity (a field of the Instruction class with a vague description) controls pattern selection, but it also seems to do nothing.

LLVM seems to pick the first matching instruction. When I define ADDXf32 and ADDXi32 before ADDIi32, it uses only X instructions for every data type. When I define ADDIi32 before ADDXf32 and ADDXi32, it uses the I instruction for integer data, but nothing matches for floating-point data (weird). I suppose I could put ADDIi32 between ADDXf32 and ADDXi32, but they currently live in different .td files and that doesn't look like a long-term solution.

Any ideas?

+3




2 answers


Note that declaration order does not influence instruction selection. LLVM only sorts the patterns: more complex patterns are matched first (this is what AddedComplexity affects; note that increasing AddedComplexity increases the chance that the pattern is matched), ties are broken in favor of instructions with smaller CodeSize, and any remaining ties are resolved non-deterministically. Hence, you cannot rely on declaration order.
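As a rough illustration of the AddedComplexity knob mentioned above, here is a minimal sketch that reuses the MyInstruction class and the elided "..." operands from the question. The value 10 is an arbitrary assumption; only the relative order between competing patterns matters. Whether this has the desired effect on a particular backend is not something the sketch verifies.

```tablegen
// Sketch only: MyInstruction and the elided "..." come from the question.
// AddedComplexity is a field of TableGen's Instruction class; a larger value
// moves this pattern ahead of otherwise equally complex patterns.
let AddedComplexity = 10 in  // 10 is an arbitrary, hypothetical value
def ADDIi32 : MyInstruction< ..., (outs I:$Rs), (ins I:$Rm, I:$Rn),
                             [(set i32:$Rs, (add i32:$Rm, i32:$Rn))]>;

// Left at the default AddedComplexity of 0, so it sorts after ADDIi32.
def ADDXi32 : MyInstruction< ..., (outs X:$Rs), (ins X:$Rm, X:$Rn),
                             [(set i32:$Rs, (add i32:$Rm, i32:$Rn))]>;
```

Because the let applies to the def itself, this keeps working when the two definitions live in different .td files.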


EDIT: My original answer below doesn't really apply to your question; see http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-September/010828.html. In particular:

ISel patterns are matched against DAGs before register allocation. So ISel patterns can't take register classes into account. The register classes in the patterns are a selection constraint for the register allocator if that instruction is chosen, not a constraint on choosing that pattern.




In addition to the I and X register classes, you should also have an "IntRegsRegClass" or similar, to which you add both the I and the X registers. (This is the register class that you pass to addRegisterClass.)

The order in which registers are added to this register class determines the preferred allocation order.
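A combined class along those lines might be declared roughly as follows. This is a sketch under assumptions: the namespace "MyTarget", the register names I0..I7 and X0..X7, and the 32-bit alignment are all hypothetical, and the registers themselves are assumed to be defined elsewhere.

```tablegen
// Hypothetical names: "MyTarget", I0..I7, X0..X7.
// The I registers are listed first, so by default the allocator prefers
// them and reaches for X registers only when the I registers run out.
def IntRegs : RegisterClass<"MyTarget", [i32], 32,
                            (add (sequence "I%u", 0, 7),
                                 (sequence "X%u", 0, 7))>;

// X-only class, which can also hold f32 values.
def XRegs : RegisterClass<"MyTarget", [i32, f32], 32,
                          (sequence "X%u", 0, 7)>;
```

TableGen generates a C++ object named IntRegsRegClass from the def IntRegs record, which is the name the answer refers to.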

+6




This looks suspiciously similar to the m68k with its brain-dead split between D(ata) and A(ddress) registers. Roughly speaking, the D registers sit close to the ALU and are used for integer operations, while the A registers sit close to an address computation unit. This means that if a pointer is in a D register, it has to be copied to an A register before it can be dereferenced.

Addition is an operation that makes sense for both kinds of registers, since one increments both counters and pointers, and so there are separate ADD and ADDA instructions. They are probably implemented quite differently in silicon, but the only difference visible to the programmer is that ADD updates the condition codes based on the result, whereas ADDA leaves them untouched.

Several other instructions also come in a separate address form, including MOVE, and that is how I ran into the same problem you have: I implemented separate MOVE and MOVEA instructions for an m68k backend and was annoyed that the wrong one kept being chosen. That is how I ended up at this SO post, and Cheng San's answer told me what I really didn't want to hear but needed to know.

My new approach is to define three register classes, Dn, An and Xn (Xn is an existing m68k convention), consisting of the D registers, the A registers, and both D and A registers, respectively. I have merged ADD / ADDA into a single new ADDZ pseudo-instruction (ADDX already exists) that operates on the Xn class and leaves the condition codes undefined. I plan to add a fixup pass later that converts such pseudo-instructions into the appropriate real instruction and removes any redundant TST instructions in cases where the condition codes are already set.



This stopped the pattern matcher from, for example, emitting an ADDA followed by a MOVE when an ADD would have made more sense. Since the register allocator avoids redundant moves, pointer arithmetic is still performed in A registers, even though they are listed last in the Xn class. If I also supported floating point, the FADD instruction would be defined on a separate FPx register class and therefore would not conflict with the existing ADD and ADDZ patterns.
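The three m68k classes described above could be sketched like this. This is not the actual backend's code: the namespace "M68k", the i32 value type and the alignment are assumptions, and D0..D7 / A0..A7 are assumed to be defined as Register records elsewhere.

```tablegen
// Sketch under assumptions; only the ordering idea is taken from the answer.
def Dn : RegisterClass<"M68k", [i32], 32, (sequence "D%u", 0, 7)>;
def An : RegisterClass<"M68k", [i32], 32, (sequence "A%u", 0, 7)>;

// Xn lists the D registers first and the A registers last, so A registers
// are chosen by default only when avoiding a copy makes them worthwhile.
def Xn : RegisterClass<"M68k", [i32], 32, (add Dn, An)>;
```

A pseudo-instruction such as ADDZ would then take Xn operands, leaving the D-versus-A decision to the register allocator rather than to pattern matching.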

For your example with I and X registers, I would create the classes GPR and XR, containing both I and X registers and only X registers, respectively. The GPR class would list the I registers first, so that the X registers are only used as a last resort. Then I would write your example patterns like this:

def ADDi32 : MyInstruction< ..., (outs GPR:$Rs), (ins GPR:$Rm, GPR:$Rn), [(set i32:$Rs, (add i32:$Rm, i32:$Rn))]>;
def ADDf32 : MyInstruction< ..., (outs XR:$Rs), (ins XR:$Rm, XR:$Rn), [(set f32:$Rs, (fadd f32:$Rm, f32:$Rn))]>;

      

0

