Adding a basic LLVM pass

The SimGrid model checker uses memory introspection (of the heap, stack and global variables) to detect when the state of a distributed application is the same at different nodes of its execution graph. One difficulty is dealing with uninitialised variables. Uninitialised global variables are usually not a big problem, as their initial value is 0. Heap variables are handled by zeroing (with memset) the buffers returned by malloc and friends. Uninitialised stack variables are more problematic, as their value is whatever was at that place on the stack before. In order to evaluate the impact of those uninitialised variables, we would like to clean each stack frame before it is used. This could be done with an LLVM plugin. Here's my first attempt at writing an LLVM pass that modifies the code of a function.

One solution would be to insert, at compilation time, instructions that clean the stack frame at the beginning of each function. This could be implemented as an LLVM pass.

This is mostly relevant when the generated code is not optimised. In optimised code, local variables do not need to live on the stack.
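
To make the problem concrete, here is a small sketch of why leftover stack bytes confuse a byte-wise state comparison. It does not use SimGrid or LLVM: the Frame type and both functions are hypothetical, and a stack frame is modelled as a plain byte buffer so that the example stays well defined.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical model of a stack frame: 16 raw bytes, of which the
// function only ever writes the first sizeof(int) bytes.
using Frame = std::array<unsigned char, 16>;
static_assert(sizeof(int) <= sizeof(Frame), "the local must fit in the frame");

// Simulate entering a function on a stack that still holds leftovers
// from earlier calls (the `garbage` byte), optionally cleaning the frame
// first, then storing one initialised local variable.
Frame enterFunction(unsigned char garbage, bool cleanFirst, int local) {
  Frame frame;
  frame.fill(garbage);                // whatever was on the stack before
  if (cleanFirst)
    frame.fill(0);                    // what a frame-cleaning pass would do
  std::memcpy(frame.data(), &local, sizeof local);  // the initialised local
  return frame;
}

// Byte-wise comparison, as a memory-introspecting checker might do.
bool sameState(const Frame &a, const Frame &b) {
  return std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```

With cleanFirst set to false, two frames holding the same local value but different leftovers compare as different states. With cleanFirst set to true they compare as equal, which is the effect we are after.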

LLVM overview

A good high-level introduction to the LLVM architecture (LLVM IR and passes) can be found in The Architecture of Open Source Applications.

IR generation

LLVM uses an intermediate language, LLVM IR, to optimise and generate native code.

For example, a simple hello world like this,

#include <stdio.h>

int main(int argc, char** argv) {
  puts("Hello world!");
  return 0;
}

is turned into this LLVM IR:

; ModuleID = 'helloworld.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [13 x i8] c"Hello world!\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i8**, align 8
  store i32 0, i32* %1
  store i32 %argc, i32* %2, align 4
  store i8** %argv, i8*** %3, align 8
  %4 = call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @puts(i8*) #1

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"Debian clang version 3.6.0-svn215195-1 (trunk) (based on LLVM 3.6.0)"}

by

clang -S -emit-llvm helloworld.c -o helloworld.ll

The generated LLVM IR can be target-dependent, as the type of the variables may depend on the architecture/OS:

  • a C int is mapped to an LLVM i32 on 32-bit, LLP64 and LP64 systems but to an i64 on ILP64;

  • a C long is mapped to an i32 on 32-bit and LLP64 systems but to an i64 on LP64 and ILP64.
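
A quick way to check which case applies on a given machine is to look at the widths the C++ compiler itself uses. The sketch below is only an illustration: bitWidth, dataModel and llvmIntType are hypothetical helper names, and llvmIntType assumes the frontend picks the LLVM integer type with the same width as the C type.

```cpp
#include <climits>
#include <string>

// Width in bits of a C/C++ type on the platform this is compiled for.
template <typename T>
constexpr int bitWidth() { return static_cast<int>(sizeof(T) * CHAR_BIT); }

// Name the data model from the widths of int, long and pointers.
std::string dataModel() {
  const int i = bitWidth<int>();
  const int l = bitWidth<long>();
  const int p = bitWidth<void *>();
  if (i == 32 && l == 32 && p == 32) return "ILP32";
  if (i == 32 && l == 32 && p == 64) return "LLP64";
  if (i == 32 && l == 64 && p == 64) return "LP64";
  if (i == 64 && l == 64 && p == 64) return "ILP64";
  return "other";
}

// Spelling of the LLVM integer type with the given width, e.g. "i32".
// Assumption: the frontend maps a C integer type to the iN of equal width.
std::string llvmIntType(int bits) { return "i" + std::to_string(bits); }
```

For instance, llvmIntType(bitWidth<long>()) gives the IR type to expect for a C long when compiling for the host platform.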

The initial generation of LLVM IR is not done in LLVM but by the frontend (clang, dragonegg…).

LLVM IR passes

Many LLVM optimisations are implemented in an architecture-independent way by IR passes, which transform/optimise the IR:

opt -std-compile-opts -S helloworld.ll -o helloworld.opt.ll --time-passes 2> opt.log

Generated IR:

; ModuleID = 'helloworld.ll'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [13 x i8] c"Hello world!\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {
  %1 = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @.str, i64 0, i64 0)) #2
  ret i32 0
}

; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #1

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"Debian clang version 3.6.0-svn215195-1 (trunk) (based on LLVM 3.6.0)"}

CodeGen passes

This optimized LLVM IR is then used to generate assembly/binary code for the target architecture:

llc  helloworld.opt.ll -o helloworld.s --time-passes 2> llc.log

Generated assembly:

        .text
        .file   "/home/foo/temp/helloworld.opt.ll"
        .globl  main
        .align  16, 0x90
        .type   main,@function
main:                                   # @main
        .cfi_startproc
# BB#0:
        pushq   %rbp
.Ltmp0:
        .cfi_def_cfa_offset 16
.Ltmp1:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
.Ltmp2:
        .cfi_def_cfa_register %rbp
        movl    $.L.str, %edi
        callq   puts
        xorl    %eax, %eax
        popq    %rbp
        retq
.Ltmp3:
        .size   main, .Ltmp3-main
        .cfi_endproc

        .type   .L.str,@object          # @.str
        .section        .rodata.str1.1,"aMS",@progbits,1
.L.str:
        .asciz  "Hello world!"
        .size   .L.str, 13


        .ident  "Debian clang version 3.6.0-svn215195-1 (trunk) (based on LLVM 3.6.0)"
        .section        ".note.GNU-stack","",@progbits

Summary

An LLVM-based compiler goes through the following phases:

  1. code analysis (preprocessing, lexing, parsing, semantic analysis…);

  2. LLVM IR generation (by the compiler);

  3. LLVM IR transformation/optimisation (by applying IR passes);

  4. native code generation from IR (by applying CodeGen passes).

Steps 1 and 2 are part of the compiler's own code (the frontend). Steps 3 and 4 are handled by the LLVM framework (configurable/pluggable by the compiler).

As we want to touch the content of the stack, we want to add a CodeGen pass.

Adding a CodeGen pass

Let's first try to add a pass to insert a NOP into every function.

Let's create a new NoopInserter pass (NoopInserter.h). There are many kinds of passes. This pass is a MachineFunction pass: it is called (runOnMachineFunction) on each generated native function and can modify it before it is passed to the next pass.

#include <llvm/PassRegistry.h>
#include <llvm/CodeGen/MachineFunctionPass.h>

namespace llvm {

  class NoopInserter : public llvm::MachineFunctionPass {
  public:
    static char ID;
    NoopInserter();
    virtual bool runOnMachineFunction(llvm::MachineFunction &Fn);
  };

}

The ID is used as a reference to the pass in LLVM: the value of this variable is not important, only its address is used.
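
The idea of identifying something by the address of a variable rather than by its value can be shown without LLVM. In the sketch below, FirstPass, SecondPass and ToyRegistry are hypothetical names and the registry is a toy, not LLVM's PassRegistry.

```cpp
#include <map>
#include <string>

// Two hypothetical pass types. Both IDs hold the same value (0), yet each
// static member lives at its own address, which is what identifies the pass.
struct FirstPass  { static char ID; };
struct SecondPass { static char ID; };
char FirstPass::ID = 0;
char SecondPass::ID = 0;

// Toy registry keyed by the address of the ID, not by its value.
class ToyRegistry {
  std::map<const void *, std::string> names_;
public:
  void add(const char &id, const std::string &name) { names_[&id] = name; }
  std::string lookup(const char &id) const {
    auto it = names_.find(&id);
    return it == names_.end() ? "<unknown>" : it->second;
  }
};

// Build a registry holding both toy passes.
ToyRegistry makeRegistry() {
  ToyRegistry r;
  r.add(FirstPass::ID, "first");
  r.add(SecondPass::ID, "second");
  return r;
}
```

Looking up FirstPass::ID and SecondPass::ID gives two different names even though both variables contain 0, because two distinct objects never share an address.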

Implementation

#include "NoopInserter.h"

#include <llvm/CodeGen/MachineInstrBuilder.h>
#include <llvm/Target/TargetMachine.h>
#include <llvm/Target/TargetInstrInfo.h>
#include <llvm/PassManager.h>
#include <llvm/Transforms/IPO/PassManagerBuilder.h>
#include <llvm/CodeGen/Passes.h>
#include <llvm/Target/TargetSubtargetInfo.h>
#include "llvm/Pass.h"

#define GET_INSTRINFO_ENUM
#include "../Target/X86/X86GenInstrInfo.inc"

#define GET_REGINFO_ENUM
#include "../Target/X86/X86GenRegisterInfo.inc.tmp"

namespace llvm {
  char NoopInserter::ID = 0;

  NoopInserter::NoopInserter() : llvm::MachineFunctionPass(ID) {
  }

  bool NoopInserter::runOnMachineFunction(llvm::MachineFunction &fn) {
    const llvm::TargetInstrInfo &TII = *fn.getSubtarget().getInstrInfo();
    MachineBasicBlock& bb = *fn.begin();
    llvm::BuildMI(bb, bb.begin(), llvm::DebugLoc(), TII.get(llvm::X86::NOOP));
    return true;
  }

  char& NoopInserterID = NoopInserter::ID;
}

using namespace llvm;

INITIALIZE_PASS_BEGIN(NoopInserter, "noop-inserter",
  "Insert a NOOP", false, false)
INITIALIZE_PASS_DEPENDENCY(PEI)
INITIALIZE_PASS_END(NoopInserter, "noop-inserter",
  "Insert a NOOP", false, false)

The runOnMachineFunction method finds the beginning of the function and inserts an X86 NOOP instruction. The method returns true in order to tell the LLVM framework that this function has been modified by this pass. This implementation will only work on X86/AMD64 targets. A real pass should be target-independent or at least check the target.
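
The contract (modify the function, then report whether anything changed) and the target check can be sketched with a toy model. Nothing below is LLVM API: ToyBlock, ToyFunction, insertNoop and makeMain are hypothetical stand-ins, and the target is reduced to a plain architecture string.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for LLVM's MachineBasicBlock and MachineFunction:
// a block is a list of mnemonics, a function is a list of blocks.
struct ToyBlock { std::vector<std::string> instrs; };
struct ToyFunction {
  std::string targetArch;          // e.g. "x86_64"; an assumption of this toy
  std::vector<ToyBlock> blocks;
};

// Mirrors the shape of runOnMachineFunction: insert a nop at the start of
// the first block and report whether the function was modified.
bool insertNoop(ToyFunction &fn) {
  const bool isX86 = fn.targetArch == "x86" || fn.targetArch == "x86_64";
  if (!isX86 || fn.blocks.empty())
    return false;                  // nothing touched, so report "unchanged"
  ToyBlock &entry = fn.blocks.front();
  entry.instrs.insert(entry.instrs.begin(), "nop");
  return true;                     // the function now differs from its input
}

// Helper for experiments: a one-block function for the given architecture.
ToyFunction makeMain(const std::string &arch) {
  ToyBlock entry;
  entry.instrs = {"call puts", "ret"};
  ToyFunction fn;
  fn.targetArch = arch;
  fn.blocks.push_back(entry);
  return fn;
}
```

Calling insertNoop on makeMain("x86_64") returns true and leaves "nop" as the first instruction, while on any other architecture string it returns false and leaves the function untouched.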

The INITIALIZE_PASS macros declare the pass and its dependencies. Here, we are declaring a dependency on PEI, a.k.a. PrologEpilogInserter, which adds the prolog and epilog to the code of native functions. Those macros define a function:

void initializeNoopInserterPass(PassRegistry &Registry);
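
How a macro can end up defining a function with that name is easier to see on a much simplified stand-in. The macro TOY_INITIALIZE_PASS and the ToyPassRegistry type below are hypothetical and far simpler than the real INITIALIZE_PASS macros; the sketch only shows the token-pasting trick that builds the function name from the class name.

```cpp
#include <string>
#include <vector>

// Hypothetical, much simplified registry: just the names of known passes.
struct ToyPassRegistry { std::vector<std::string> names; };

// Simplified stand-in for the INITIALIZE_PASS family: token pasting (##)
// builds the function name initialize<PassName>Pass from the class name.
#define TOY_INITIALIZE_PASS(PassName, Arg)                  \
  void initialize##PassName##Pass(ToyPassRegistry &R) {     \
    R.names.push_back(Arg);                                 \
  }

// Expands to: void initializeNoopInserterPass(ToyPassRegistry &R) { ... }
TOY_INITIALIZE_PASS(NoopInserter, "noop-inserter")

// What a central initialisation function would then call.
ToyPassRegistry initializeToyCodeGen() {
  ToyPassRegistry registry;
  initializeNoopInserterPass(registry);
  return registry;
}
```

After initializeToyCodeGen() runs, the toy registry contains the single entry "noop-inserter".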

The NoopInserterID may be used by other passes to refer to this pass.

Declarations

We have to add a few declarations of this pass.

In include/llvm/CodeGen/Passes.h:

// NoopInserter - This pass inserts a NOOP instruction
extern char &NoopInserterID;

In include/llvm/InitializePasses.h:

void initializeNoopInserterPass(PassRegistry &Registry);

Registration

The pass must be added in llvm::initializeCodeGen() (lib/CodeGen/CodeGen.cpp):

initializeNoopInserterPass(Registry);

Result

clang -O3 helloworld.c -S -o-

We have a nice NOOP:

    .text
    .file   "/home/foo/temp/helloworld.c"
    .globl  main
    .align  16, 0x90
    .type   main,@function
main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    nop
    pushq   %rax
.Ltmp0:
    .cfi_def_cfa_offset 16
    movl    $.L.str, %edi
    callq   puts
    xorl    %eax, %eax
    popq    %rdx
    retq
.Ltmp1:
    .size   main, .Ltmp1-main
    .cfi_endproc

    .type   .L.str,@object          # @.str
    .section    .rodata.str1.1,"aMS",@progbits,1
.L.str:
    .asciz  "Hello world!"
    .size   .L.str, 13


    .ident  "clang version 3.6.0 "
    .section    ".note.GNU-stack","",@progbits

The program still works:

$ clang -O3 helloworld.c
$ ./a.out
Hello world!

Conclusion

I successfully managed to add a pass in order to (actively) do nothing in each generated native function. In the next episode, I'll try to do something useful instead.