LLVM Backend: replacing indirect jmps for x86 backend

Question

I want to replace indirect jmp *(eax) instructions in the code with mov *(eax),ebx; jmp *ebx for x86 executables.

Before implementing this, I would like to make the LLVM compiler log a message every time it encounters a jmp *(eax) instruction, by adding some print statements.

Then I want to move on to replacing the indirect jmp sequence.

From what I've seen in Google searches and articles, I can probably achieve this by modifying the X86AsmPrinter in the LLVM backend, but I'm not sure how to go about it. Any help or reading material would be appreciated.

Note: my actual requirement deals with indirect jmps and pop, but I want to start with this to understand the backend a little better before diving into anything else.

x86 code-generation llvm codegen

woodstok Apr 19 '15 at 20:49

1 answer

woodstok · Accepted Answer · 2015-05-04T18:41:07+0000

I have finished my project. I am posting my approach for the benefit of others.

LLVM's main job is to convert the intermediate representation (IR) into a final executable for the target architecture and other specifications. The LLVM backend itself consists of several phases that perform target-specific optimization, instruction selection, scheduling, and instruction emission. These phases are necessary because the IR is a very generic representation and needs many transformations before it becomes target-specific machine code.

1) Logging every time the compiler generates jmp *(eax)

We can achieve this by adding print statements in the instruction emission / printing phase. After most of the conversion from IR is done, an AsmPrinter pass goes through each machine instruction in every basic block of each function. The main loop is in lib/CodeGen/AsmPrinter/AsmPrinter.cpp:AsmPrinter::EmitFunctionBody(). There are other related functions such as EmitFunctionEpilogue and EmitFunctionPrologue. These functions finally call EmitInstruction for the specific architecture, for example in lib/Target/X86/X86AsmPrinter.cpp. With a little tinkering, you can call MI->getOpcode() there and compare it against the opcode enums defined for the architecture to print a log.

For example, for a jump through a register on x86-64, the opcode is X86::JMP64r. You can get the associated register with MI->getOperand(0), and so on.

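// Inside X86AsmPrinter::EmitInstruction(const MachineInstr *MI); dbgs() comes from llvm/Support/Debug.h.
if (MI->getOpcode() == X86::JMP64r)
  dbgs() << "Found jmp *x instruction\n";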
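The shape of that check can be tried outside an LLVM build. The following is only a minimal sketch: ToyOpcode, ToyInstr and logIndirectJumps are hypothetical stand-ins invented for illustration, not LLVM types or functions. It walks a list of instructions the way an emit loop would and writes one log line for each register-indirect jump. In the real backend, the memory-indirect forms of jmp have their own opcodes, so the comparison would need to list each opcode of interest.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the target opcode enum and MachineInstr.
// These are NOT the real LLVM types; they only mirror the shape of the check.
enum class ToyOpcode { MovLoad, JmpReg, JmpMem, Ret };

struct ToyInstr {
    ToyOpcode opcode;
    std::string operand0; // register name carried as operand 0
};

// Walks one "basic block" and logs every register-indirect jump.
// Returns the number of jumps that were logged.
int logIndirectJumps(const std::vector<ToyInstr> &block, std::ostream &log) {
    int hits = 0;
    for (const ToyInstr &mi : block) {
        if (mi.opcode == ToyOpcode::JmpReg) {
            // A register jump without a register operand would be malformed.
            assert(!mi.operand0.empty() && "register jump needs an operand");
            log << "Found jmp *" << mi.operand0 << " instruction\n";
            ++hits;
        }
    }
    return hits;
}
```

Passing a std::ostringstream as the log makes the output easy to inspect; in an actual AsmPrinter change the message would go to dbgs() or errs() instead.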

2) Replacing instructions

The changes needed depend on the kind of replacement you want. If you need more context about registers or previous instructions, the changes have to be implemented higher up in the pass chain. There is a representation of instructions called the Selection DAG (directed acyclic graph), which stores the dependencies of each instruction on previous instructions. For example, in the sequence

mov myvalue,%rax
jmp *%rax

The DAG will have the jmp node pointing to the mov node (and possibly other nodes before it), since the value in rax depends on the mov instruction. Here you can replace the node with the nodes you want. If done correctly, this changes the instructions that are finally emitted. The SelectionDAG code is in lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp. It is always best to poke around first to find the ideal place to make the change. Each IR statement goes through several transformations before the DAG is topologically sorted so that the instructions form a linear sequence. The graphs can be viewed with the -view-dag* options listed by llc --help-hidden. In my case, I just added a specific check to EmitInstruction and added code to emit the two instructions I wanted.
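The emit-time replacement can be modelled the same way. This is a rough sketch under stated assumptions, not the code from my patch: ExpOp, ExpInstr and emitExpanded are hypothetical names, and the output is plain AT&T-style text rather than the MCInst objects the real AsmPrinter produces. A memory-indirect jump is expanded into a load into a scratch register followed by a jump through that register; everything else is passed through unchanged.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy opcodes; not the real X86:: enum.
enum class ExpOp { JmpMem, Other };

struct ExpInstr {
    ExpOp op;
    std::string addrReg; // register holding the address, used by JmpMem
    std::string text;    // preformatted text, used by Other
};

// Emits text for one instruction into `out`.
// JmpMem (jmp *(reg)) becomes: load the target into `scratch`, then jump
// through `scratch`. Assumption: `scratch` is dead at this point. Emission
// runs after register allocation, so nothing here checks that for us.
void emitExpanded(const ExpInstr &mi, const std::string &scratch,
                  std::vector<std::string> &out) {
    switch (mi.op) {
    case ExpOp::JmpMem:
        assert(!mi.addrReg.empty() && "indirect jump needs an address register");
        out.push_back("mov (%" + mi.addrReg + "), %" + scratch);
        out.push_back("jmp *%" + scratch);
        break;
    case ExpOp::Other:
        out.push_back(mi.text);
        break;
    }
}
```

Calling emitExpanded with a JmpMem on eax and ebx as the scratch register appends the mov and jmp lines in that order. The liveness caveat in the comment is the main reason a change that needs more context belongs earlier in the pass chain.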

LLVM documentation is always there, but I found two articles by Eli Bendersky more useful than any other resource: Life of an instruction in LLVM and A deeper look into the LLVM code generator. The articles also discuss the very complex TableGen descriptions and the instruction matching process, which is pretty cool if you're interested.
