Translating ARMv7-M instructions into LLVM IR

I am developing a lifter in C++ that lifts ARMv7-M instructions into LLVM IR.

I am now in the translation phase, where I take an ARM instruction as input and translate it into the equivalent SSA LLVM IR instructions.

My architecture is simple: I create an LLVM function for each function in the assembly code and an LLVM basic block for each basic block inside that function. For each assembly instruction in a basic block, I use an IRBuilder to build the equivalent LLVM IR instructions, passing the corresponding LLVM basic block to the IRBuilder. Here is a sample of my code for an ADD assembly instruction:

// Translates an ADD assembly instruction object into the equivalent SSA LLVM IR instructions.
llvm::Instruction* ADD(Instr* instruction, bool update_condition_flags = false) {
    IRBuilder<> builder(instruction->basic_block->get_llvm_basic_block());

    std::vector<Reg*> registers = instruction->get_registers();
    std::vector<Immediate*> immediates = instruction->get_immediates();

    // Immediate form, e.g. add r1, r3, #2: immediates[0] is only valid
    // when the list is non-empty.
    if (!immediates.empty()) {
        std::string register1 = registers[1]->get_register_name();
        Value* register1_val = registers[1]->get_register_value();
        llvm::Value* immediate = immediates[0]->get_immediate_value_long();
        std::string output_register = registers[0]->get_register_name();

        // Stack slot that holds the current value of the source register.
        llvm::Instruction* x = builder.CreateAlloca(Type::getInt32Ty(TheContext), nullptr);
        instruction->insert_llvm_instructions(x);

        llvm::Instruction* s = builder.CreateStore(register1_val, x, /*isVolatile=*/false);
        instruction->insert_llvm_instructions(s);

        // The load is named after the source register (e.g. "r3").
        llvm::Instruction* LHS = builder.CreateLoad(x, register1);
        instruction->insert_llvm_instructions(LHS);

        // The result is named after the destination register (e.g. "r1").
        llvm::Instruction* add_ll = BinaryOperator::Create(Instruction::Add,
            LHS, immediate, output_register);
        instruction->insert_llvm_instructions(add_ll);

        return add_ll;
    }

    // Other operand forms are not shown in this sample.
    return nullptr;
}

Basically, I'm trying to generate the LLVM IR instructions for an example assembly instruction, add r1, r3, #2.

I want to use the names of the registers from the assembly instruction (r1, r3) in the generated LLVM IR (i.e. %r1, %r3). However, in many places in the generated IR, extra digits are appended to the register name, as in %r346 in this example (the "46" added after "%r3"). Why are these indices added, and is there a way to remove them?

For the example instruction add r1, r3, #2, the output I want is:

%12 = alloca i32, align 4

store i32 0, i32* %12, align 4

%r3 = load i32, i32* %12, align 4

%r1 = add i32 %r3, 2

While the output I get is:

%12 = alloca i32, align 4

store i32 0, i32* %12, align 4

%r346 = load i32, i32* %12, align 4

%r1 = add i32 %r346, 2

Can I update variables inside an LLVM function, like %1 in this example?

%1 = mul i32 %x, %y

%1 = add i32 %1, %z

When testing my lifter, I'd like to follow a specific register (r3) as it progresses through the assembly instructions inside a function, and these additional indices make that confusing. For the following assembly code, I'd like the LLVM IR generated for each instruction to use %r3 without any additional indices.

 ldr r3, [fp, #-8] 

 add r3, r3, #1 

 str r3, [fp, #-8] 

There is 1 best solution below

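LLVM IR uses SSA, i.e. each symbol refers to one value only. add r3, r3, #1 can't be expressed with a single %r3, because r3 would refer to two values (the input and the result). For the same reason, %1 can't be assigned twice as in your mul/add example.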
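
The digits in %r346 fit this picture. As far as I know, LLVM keeps value names unique within a function, and when a requested name is already taken it appends a running number to it. The sketch below is only a toy model of that behaviour in plain C++, not LLVM's code: ToySymbolTable and claim are hypothetical names, and the exact counter LLVM uses is an assumption on my part.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Toy model of a per-function table of value names (hypothetical, not LLVM's code).
class ToySymbolTable {
public:
    // Returns `wanted` if it is still free; otherwise returns `wanted`
    // followed by a running counter, so that every name is handed out once.
    std::string claim(const std::string& wanted) {
        assert(!wanted.empty() && "a value name must not be empty");
        if (used_.insert(wanted).second) {
            return wanted;  // first definition keeps the plain name
        }
        for (;;) {
            std::string candidate = wanted + std::to_string(++last_unique_);
            if (used_.insert(candidate).second) {
                return candidate;  // later definitions get digits appended
            }
        }
    }

private:
    std::unordered_set<std::string> used_;  // names already taken
    unsigned last_unique_ = 0;              // counter shared by all names
};
```

In this model, the first value that asks for "r3" gets it and every later one gets "r3" plus whatever the shared counter has reached, which is how a name like r346 can appear once many values have been created in the same function.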

I think the most common solution is to add a suffix derived from the instruction's offset within the function. If that add is the 16th instruction and the r3 it uses was defined by the 12th, you might emit %r3.16 = add i32 %r3.12, 1.
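
A minimal sketch of that naming scheme, in plain C++ with no LLVM dependency, might look like the following. RegisterVersions, define and use are hypothetical names, and the ".in" suffix for a register that is read before it is written is my own assumption. It only covers straight-line code; where control flow merges you would still need phi nodes, or registers kept in allocas.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: remembers, per register, the index of the
// instruction that last defined it, and builds SSA-style names from that.
class RegisterVersions {
public:
    // Name for reading `reg`: refers to its most recent definition.
    std::string use(const std::string& reg) const {
        auto it = last_def_.find(reg);
        if (it == last_def_.end()) {
            return reg + ".in";  // assumption: never written yet, treat as live-in
        }
        return reg + "." + std::to_string(it->second);
    }

    // Name for the value that instruction number `index` writes to `reg`.
    std::string define(const std::string& reg, std::size_t index) {
        last_def_[reg] = index;
        return reg + "." + std::to_string(index);
    }

private:
    std::map<std::string, std::size_t> last_def_;  // register -> defining instruction index
};
```

For add r3, r3, #1 at index 16, you would call use("r3") first to name the operand and then define("r3", 16) to name the result. Passing those strings as the value names gives output where r3 is still easy to follow by eye, and each name is requested only once, so nothing needs to be renamed.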