{"version": "https://jsonfeed.org/version/1", "title": "/dev/posts/ - Tag index - llvm", "home_page_url": "https://www.gabriel.urdhr.fr", "feed_url": "/tags/llvm/feed.json", "items": [{"id": "http://www.gabriel.urdhr.fr/2014/10/06/cleaning-the-stack-in-a-llvm-pass/", "title": "Cleaning the stack in a LLVM pass", "url": "https://www.gabriel.urdhr.fr/2014/10/06/cleaning-the-stack-in-a-llvm-pass/", "date_published": "2014-10-06T10:00:02+02:00", "date_modified": "2014-10-06T10:00:02+02:00", "tags": ["computer", "simgrid", "llvm", "compilation", "assembly", "x86_64"], "content_html": "
In the previous episode, we implemented an LLVM pass which does\nnothing. Now we are trying to modify\nthis to create a (proof-of-concept) LLVM pass which fills the current\nstack frame with zeros before using it.
\n\nThe top (in fact the bottom) of the stack is stored in the %rsp
\nregister: a push
operation decrements the value of %rsp
and stores\nthe value in the resulting address; conversely a pop
operation\nincrements the value of %rsp
. Stack variables are allocated by\ndecrementing %rsp
.
A function call (call
) pushes the current value of the instruction\n(%rip
) pointer on the stack. A return instruction (ret
) pops a\nvalue from the stack into %rip
.
A typical call frame contains in order:
\nFor example this C code,
\nint f();\n\nint main(int argc, char** argv) {\n int i = 42;\n f();\n return 0;\n}\n
\nis compiled (with clang -S -fomit-frame-pointer example.c
) into this\n(using AT&T\nsyntax):
main:\n\tsubq\t$24, %rsp\n\tmovl\t$0, 20(%rsp)\n\tmovl\t%edi, 16(%rsp)\n\tmovq\t%rsi, 8(%rsp)\n\tmovl\t$42, 4(%rsp)\n\tmovb\t$0, %al\n\tcallq\tf\n\tmovl\t$0, %edi\n\tmovl\t%eax, (%rsp)\n\tmovl\t%edi, %eax\n\taddq\t$24, %rsp\n\tret\n
\nMemory is allocated on the stack using subq
. Local variables are\nusually referenced by offsets from the stack pointer, OFFSET(%rsp)
.
The x86 (32 bit) ABI uses the %ebp
as the base of the stack. This is\nnot mandatory in the x86-64\nABI but the\ncompiler might still use a frame pointer. The base of the stack frame\nis stored in %rbp
.
Here is the same program compiled with -fno-omit-frame-pointer
:
main:\n\tpushq\t%rbp\n\tmovq\t%rsp, %rbp\n\tsubq\t$32, %rsp\n\tmovl\t$0, -4(%rbp)\n\tmovl\t%edi, -8(%rbp)\n\tmovq\t%rsi, -16(%rbp)\n\tmovl\t$42, -20(%rbp)\n\tmovb\t$0, %al\n\tcallq\tf\n\tmovl\t$0, %edi\n\tmovl\t%eax, -24(%rbp)\n\tmovl\t%edi, %eax\n\taddq\t$32, %rsp\n\tpopq\t%rbp\n\tret\n
\nWhen a frame pointer is used, stack memory is usually referenced as a\nfixed offset from %rbp
: OFFSET(%rbp)
.
The x86 32-bit ABI did not allow the code of the function to use\nvariables after the top of the stack: a signal handler could at any\nmoment use any memory after the top of the stack.
\nThe standard x86-64\nABI allows the\ncode of the current function to use the 128 bytes (the red zone) after\nthe top of the stack. A signal handler must be instantiated by the OS\nafter the red zone. The red zone can be used for temporary variables\nor for local variables for leaf functions (functions which do not call\nother functions).
\n\nNote: Windows systems do not use the standard x86-64 ABI: the\nusage of the registers is different and there is no red zone.
\nLet's make main()
a leaf function:
int main(int argc, char** argv) {\n int i = 42;\n return 0;\n}\n
\nThe variables are allocated in the red zone (negative offsets from the\nstack pointer):
\nmain:\n movl $0, %eax\n movl $0, -4(%rsp)\n movl %edi, -8(%rsp)\n movq %rsi, -16(%rsp)\n movl $42, -20(%rsp)\n ret\n
\nHere is the code we are going to add at the beginning of each\nfunction:
\n\tmovq $QSIZE, %r11\n.Lloop:\n movq $0, OFFSET(%rsp,%r11,8)\n subq $1, %r11\n jne .Lloop\n
\nfor some suitable values of QSIZE and OFFSET.
\nThe %r11
is defined by the System V x86-64 ABI (as well as the\nWindows ABI) as a scratchpad register: at the beginning of the\nfunction we are free to use it without saving it first.
This is implemented by a StackCleaner
machine pass whose\nrunOnMachineFunction()
works similarly to the NoopInserter
pass.
We compute the parameters of the generated native code from the size of\nthe stack frame:
\nfn.getFrameInfo()->getStackSize()
is the size of the stack used\nby this function (excluding the red zone);X86FrameLowering.cpp
) and SimGridMC does not analyse the stack of\nleaf functions (we would just have to add 128 to size
in order to\nclean up the red zone as well);alloca()
) are not counted here.int size = fn.getFrameInfo()->getStackSize();\nint qsize = size / sizeof(uint64_t);\nif (size==0) {\n // No stack to clean, we do not modify the function:\n return false;\n}\nint offset = - size - sizeof(uint64_t);\n
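\n\nTo check that these parameters make the loop cover exactly the frame, here is a minimal sketch in plain C++ (no LLVM involved). The names CleanerParams, computeParams and clearedOffsets are hypothetical helpers introduced for illustration only. The sketch assumes the stack size is a non-zero multiple of 8 and replays the loop in software, recording the byte offset (relative to %rsp at function entry) of every 64-bit store.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Parameters of the cleaning loop (QSIZE and OFFSET in the assembly snippet).
struct CleanerParams {
  std::int64_t qsize;   // number of 64-bit words to clear
  std::int64_t offset;  // displacement used in OFFSET(%rsp,%r11,8)
};

// Hypothetical helper mirroring the computation done in the pass.
// Assumption: size is a non-zero multiple of 8.
CleanerParams computeParams(std::int64_t size) {
  const std::int64_t word = sizeof(std::uint64_t);
  return CleanerParams{size / word, -size - word};
}

// Replay the loop and return, in order, the offset from %rsp of each store.
std::vector<std::int64_t> clearedOffsets(const CleanerParams& p) {
  assert(p.qsize > 0);  // the pass skips functions without a stack frame
  std::vector<std::int64_t> stores;
  std::int64_t r11 = p.qsize;            // movq $QSIZE, %r11
  do {
    stores.push_back(p.offset + 8 * r11);  // movq $0, OFFSET(%rsp,%r11,8)
    r11 -= 1;                              // subq $1, %r11
  } while (r11 != 0);                      // jne .Lloop
  return stores;
}
```

For a 24-byte frame this gives QSIZE = 3 and OFFSET = -32, and the stores land at -8, -16 and -24 from %rsp: the 24 bytes just below the stack pointer at function entry, which is the area the function then reserves for itself.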
\nFor LLVM, a function is represented as a collection\nof basic\nblocks. A basic block is a sequence of instructions where:
\nOur assembly snippet is made of two basic blocks:
\nMachineBasicBlock* bb0 = fn.begin();\nMachineBasicBlock* bb1 = fn.CreateMachineBasicBlock();\nMachineBasicBlock* bb2 = fn.CreateMachineBasicBlock();\n\nfn.push_front(bb2);\nfn.push_front(bb1);\n
\nA function is a Control Flow Graph of basic blocks. We need to\ncomplete the arcs in this graph:
\nbb1->addSuccessor(bb2);\nbb2->addSuccessor(bb2);\nbb2->addSuccessor(bb0);\n
\nWe generate the machine instructions:
\n// First basic block (initialisation):\n\n// movq $QSIZE, %r11\nllvm::BuildMI(*bb1, bb1->end(), llvm::DebugLoc(), TII.get(llvm::X86::MOV64ri),\n X86::R11).addImm(qsize);\n\n// Second basic block (.Lloop):\n\n// movq $0, OFFSET(%rsp,%r11,8)\nllvm::BuildMI(*bb2, bb2->end(), llvm::DebugLoc(), TII.get(llvm::X86::MOV64mi32))\n .addReg(X86::RSP).addImm(8).addReg(X86::R11).addImm(offset).addReg(0)\n .addImm(0);\n\n// subq $1, %r11\nllvm::BuildMI(*bb2, bb2->end(), llvm::DebugLoc(), TII.get(llvm::X86::SUB64ri8),\n X86::R11)\n .addReg(X86::R11)\n .addImm(1);\n\n// jne .Lloop\nllvm::BuildMI(*bb2, bb2->end(), llvm::DebugLoc(), TII.get(llvm::X86::JNE_4))\n .addMBB(bb2);\n
\nThe instruction names carry suffixes describing the size and types of their operands:
\n64 for instructions working on 64-bit values;\nr for register;\ni for immediate;\nm for memory.\n\nThe function has been modified:
\nreturn true;\n
\nHere is the generated assembly for our test code:
\nmain:\n\tmovabsq\t$3, %r11\n.LBB0_1:\n\tmovq\t$0, -32(%rsp,%r11,8)\n\tsubq\t$1, %r11\n\tjne\t.LBB0_1\n\tsubq\t$24, %rsp\n\tmovl\t$0, 20(%rsp)\n\tmovl\t%edi, 16(%rsp)\n\tmovq\t%rsi, 8(%rsp)\n\tmovl\t$42, 4(%rsp)\n\tmovb\t$0, %al\n\tcallq\tf\n\tmovl\t$0, %edi\n\tmovl\t%eax, (%rsp)\n\tmovl\t%edi, %eax\n\taddq\t$24, %rsp\n\tretq\n
\nHere is a simple test program using uninitialized stack variables:
\n#include <stdio.h>\n\nvoid f() {\n int i;\n int data[16];\n\n for(i=0; i!=16; ++i)\n printf(\"%i \", data[i]);\n printf(\"\\n\");\n\n for(i=0; i!=16; ++i)\n data[i] = i;\n}\n\nvoid g() {\n int i, j, k, l, m, n, o, p;\n printf(\"%i %i %i %i %i %i %i %i\\n\", i, j, k, l, m, n, o, p);\n}\n\nint main(int argc, char** argv) {\n f();\n f();\n g();\n return 0;\n}\n
\nThis is the output of a normal compilation:
\n-1 0 -812203224 32767 -406470232 32655 -400476992 32655 -400465496 32655 0 0 1 0 4195997 0\n0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15\n16 0 0 15774463 15 14 13 12\n\n
And with our stack-cleaning clang:
\n0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n0 0 0 0 0 0 0 0\n\n
The whole SimGrid test suite works when compiled without SimGridMC\nsupport.
\nAt this point, I discovered that SimGrid fails to run when compiled\nwith clang (or DragonEgg) with support for SimGridMC. I need to fix\nthis first before testing the impact of cleaning the stack on\nSimGridMC state comparison.
\nIn the next episode, I'll try another implementation of the same\nconcept using a few scripts in order to process the generated\nassembly between the compiler and the\nassembler\nwhich should work with a standard GCC and with SimGridMC.
\nThe SimGrid model checker uses memory introspection (of the heap,\nstack and global variables) in order to detect the equality of the\nstate of a distributed application at the different nodes of its\nexecution graph. One difficulty is to deal with uninitialised\nvariables. The uninitialised global variables are usually not a big\nproblem as their initial value is 0. The heap variables are dealt with\nby memset
ing to 0 the content of the buffers returned by malloc
\nand friends. The case of uninitialised stack variables is more\nproblematic as their value is whatever was at this place on the stack\nbefore. In order to evaluate the impact of those uninitialised\nvariables, we would like to clean each stack frame before using\nit. This could be done with an LLVM plugin. Here is my first attempt\nto write an LLVM pass to modify the code of a function.
A solution for this would be to include, at compilation time,\ninstructions to clean the stack frame at the beginning of each\nfunction. This could be implemented as an LLVM\npass:
\nThis is mostly relevant when the generated code is not optimised. In\noptimised code, local variables do not need to live on the stack.
\nA good high level introduction to the LLVM architecture (LLVM IR and\npasses) can be found in The Architecture of Open Source\nApplications.
\nLLVM uses an intermediate language, LLVM\nIR, to optimise and generate native\ncode.
\nFor example, a simple hello world like this,
\n#include <stdio.h>\n\nint main(int argc, char** argv) {\n puts(\"Hello world!\");\n return 0;\n}\n
\nis turned into this LLVM IR:
\n; ModuleID = 'helloworld.c'\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n@.str = private unnamed_addr constant [13 x i8] c\"Hello world!\\00\", align 1\n\n; Function Attrs: nounwind uwtable\ndefine i32 @main(i32 %argc, i8** %argv) #0 {\n %1 = alloca i32, align 4\n %2 = alloca i32, align 4\n %3 = alloca i8**, align 8\n store i32 0, i32* %1\n store i32 %argc, i32* %2, align 4\n store i8** %argv, i8*** %3, align 8\n %4 = call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0))\n ret i32 0\n}\n\ndeclare i32 @puts(i8*) #1\n\nattributes #0 = { nounwind uwtable \"less-precise-fpmad\"=\"false\" \"no-frame-pointer-elim\"=\"true\" \"no-frame-pointer-elim-non-leaf\" \"no-infs-fp-math\"=\"false\" \"no-nans-fp-math\"=\"false\" \"stack-protector-buffer-size\"=\"8\" \"unsafe-fp-math\"=\"false\" \"use-soft-float\"=\"false\" }\nattributes #1 = { \"less-precise-fpmad\"=\"false\" \"no-frame-pointer-elim\"=\"true\" \"no-frame-pointer-elim-non-leaf\" \"no-infs-fp-math\"=\"false\" \"no-nans-fp-math\"=\"false\" \"stack-protector-buffer-size\"=\"8\" \"unsafe-fp-math\"=\"false\" \"use-soft-float\"=\"false\" }\n\n!llvm.ident = !{!0}\n\n!0 = metadata !{metadata !\"Debian clang version 3.6.0-svn215195-1 (trunk) (based on LLVM 3.6.0)\"}\n
\nby
\nclang -S -emit-llvm helloworld.c -o helloworld.ll\n
\nThe generated LLVM IR can be target-dependent as the type of the\nvariables may depend on the architecture/OS:
\nint is mapped to an LLVM i32 on 32-bit, LLP64 and LP64\nsystems but to an i64 on ILP64;\nlong is mapped to an i32 on 32-bit and LLP64 systems but\nto an i64 on LP64 and ILP64.\n\nThe initial generation of LLVM IR is not done in LLVM but by the\nfrontend (clang, dragonegg, etc.).
\nMany LLVM optimisations are implemented in an architecture-independent\nway by IR passes which transform/optimise IR:
\nopt -std-compile-opts -S helloworld.ll -o helloworld.opt.ll --time-passes 2> opt.log\n
\nGenerated IR:
\n; ModuleID = 'helloworld.ll'\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n@.str = private unnamed_addr constant [13 x i8] c\"Hello world!\\00\", align 1\n\n; Function Attrs: nounwind uwtable\ndefine i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {\n %1 = tail call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @.str, i64 0, i64 0)) #2\n ret i32 0\n}\n\n; Function Attrs: nounwind\ndeclare i32 @puts(i8* nocapture readonly) #1\n\nattributes #0 = { nounwind uwtable \"less-precise-fpmad\"=\"false\" \"no-frame-pointer-elim\"=\"true\" \"no-frame-pointer-elim-non-leaf\" \"no-infs-fp-math\"=\"false\" \"no-nans-fp-math\"=\"false\" \"stack-protector-buffer-size\"=\"8\" \"unsafe-fp-math\"=\"false\" \"use-soft-float\"=\"false\" }\nattributes #1 = { nounwind \"less-precise-fpmad\"=\"false\" \"no-frame-pointer-elim\"=\"true\" \"no-frame-pointer-elim-non-leaf\" \"no-infs-fp-math\"=\"false\" \"no-nans-fp-math\"=\"false\" \"stack-protector-buffer-size\"=\"8\" \"unsafe-fp-math\"=\"false\" \"use-soft-float\"=\"false\" }\nattributes #2 = { nounwind }\n\n!llvm.ident = !{!0}\n\n!0 = metadata !{metadata !\"Debian clang version 3.6.0-svn215195-1 (trunk) (based on LLVM 3.6.0)\"}\n
\nThis optimized LLVM IR is then used to generate assembly/binary code\nfor the target architecture:
\nllc helloworld.opt.ll -o helloworld.s --time-passes 2> llc.log\n
\nGenerated assembly:
\n .text\n .file \"/home/foo/temp/helloworld.opt.ll\"\n .globl main\n .align 16, 0x90\n .type main,@function\nmain: # @main\n .cfi_startproc\n# BB#0:\n pushq %rbp\n.Ltmp0:\n .cfi_def_cfa_offset 16\n.Ltmp1:\n .cfi_offset %rbp, -16\n movq %rsp, %rbp\n.Ltmp2:\n .cfi_def_cfa_register %rbp\n movl $.L.str, %edi\n callq puts\n xorl %eax, %eax\n popq %rbp\n retq\n.Ltmp3:\n .size main, .Ltmp3-main\n .cfi_endproc\n\n .type .L.str,@object # @.str\n .section .rodata.str1.1,\"aMS\",@progbits,1\n.L.str:\n .asciz \"Hello world!\"\n .size .L.str, 13\n\n\n .ident \"Debian clang version 3.6.0-svn215195-1 (trunk) (based on LLVM 3.6.0)\"\n .section \".note.GNU-stack\",\"\",@progbits\n
\nAn LLVM-based compiler uses the following\nphases:
\nSteps 1 and 2 are parts of the code of the compiler. Steps 3 and 4 are\nhandled by the LLVM framework (configurable/pluggable by the\ncompiler).
\nAs we want to touch the content of the stack, we want to add a CodeGen\npass.
\nLet's first try to add a pass to insert a NOP into every function.
\nLet's create a new NoopInserter
pass (NoopInserter.h
). There are\nmany kinds of passes. This pass is a MachineFunction
pass: it is\ncalled (runOnMachineFunction
) on each generated native function\nand can modify it before it is passed to the next pass.
#include <llvm/PassRegistry.h>\n#include <llvm/CodeGen/MachineFunctionPass.h>\n\nnamespace llvm {\n\n class NoopInserter : public llvm::MachineFunctionPass {\n public:\n static char ID;\n NoopInserter();\n virtual bool runOnMachineFunction(llvm::MachineFunction &Fn);\n };\n\n}\n
\nThe ID
is used as a reference to the pass in LLVM: the value of this\nvariable is not important, only its address is used.
#include \"NoopInserter.h\"\n\n#include <llvm/CodeGen/MachineInstrBuilder.h>\n#include <llvm/Target/TargetMachine.h>\n#include <llvm/Target/TargetInstrInfo.h>\n#include <llvm/PassManager.h>\n#include <llvm/Transforms/IPO/PassManagerBuilder.h>\n#include <llvm/CodeGen/Passes.h>\n#include <llvm/Target/TargetSubtargetInfo.h>\n#include \"llvm/Pass.h\"\n\n#define GET_INSTRINFO_ENUM\n#include \"../Target/X86/X86GenInstrInfo.inc\"\n\n#define GET_REGINFO_ENUM\n#include \"../Target/X86/X86GenRegisterInfo.inc.tmp\"\n\nnamespace llvm {\n char NoopInserter::ID = 0;\n\n NoopInserter::NoopInserter() : llvm::MachineFunctionPass(ID) {\n }\n\n bool NoopInserter::runOnMachineFunction(llvm::MachineFunction &fn) {\n const llvm::TargetInstrInfo &TII = *fn.getSubtarget().getInstrInfo();\n MachineBasicBlock& bb = *fn.begin();\n llvm::BuildMI(bb, bb.begin(), llvm::DebugLoc(), TII.get(llvm::X86::NOOP));\n return true;\n }\n\n char& NoopInserterID = NoopInserter::ID;\n}\n\nusing namespace llvm;\n\nINITIALIZE_PASS_BEGIN(NoopInserter, \"noop-inserter\",\n \"Insert a NOOP\", false, false)\nINITIALIZE_PASS_DEPENDENCY(PEI)\nINITIALIZE_PASS_END(NoopInserter, \"noop-inserter\",\n \"Insert a NOOP\", false, false)\n
\nThe runOnMachineFunction
method finds the beginning of the function\nand inserts an X86 NOOP instruction. The method returns true
in order\nto tell the LLVM framework that this function has been modified by\nthis pass. This implementation will only work on X86/AMD64 targets.\nA real pass should be target independent or at least check the target.
The INITIALIZE_PASS
macros declare the pass and declare its\ndependencies. Here, we are declaring a dependency on PEI
a.k.a\nPrologEpilogInserter
which adds the prolog and epilog to the code of\nnative functions. Those macros define a function:
void initializeNoopInserterPass(PassRegistry &Registry);\n
\nThe NoopInserterID
may be used by other passes to refer to this\npass.
We have to add a few declarations of this pass.
\nIn include/llvm/CodeGen/Passes.h
:
// NoopInserter - This pass inserts a NOOP instruction\nextern char &NoopInserterID;\n
\nIn include/llvm/InitializePasses.h
:
void initializeNoopInserterPass(PassRegistry &Registry);\n
\nThe pass must be added in llvm::initializeCodeGen()
\nin lib/CodeGen/CodeGen.cpp
:
initializeNoopInserterPass(Registry);\n
\nclang -O3 helloworld.c -S -o-\n
\nWe have a nice NOOP:
\n\t.text\n\t.file\t\"/home/foo/temp/helloworld.c\"\n\t.globl\tmain\n\t.align\t16, 0x90\n\t.type\tmain,@function\nmain: # @main\n\t.cfi_startproc\n# BB#0: # %entry\n\tnop\n\tpushq\t%rax\n.Ltmp0:\n\t.cfi_def_cfa_offset 16\n\tmovl\t$.L.str, %edi\n\tcallq\tputs\n\txorl\t%eax, %eax\n\tpopq\t%rdx\n\tretq\n.Ltmp1:\n\t.size\tmain, .Ltmp1-main\n\t.cfi_endproc\n\n\t.type\t.L.str,@object # @.str\n\t.section\t.rodata.str1.1,\"aMS\",@progbits,1\n.L.str:\n\t.asciz\t\"Hello world!\"\n\t.size\t.L.str, 13\n\n\n\t.ident\t\"clang version 3.6.0 \"\n\t.section\t\".note.GNU-stack\",\"\",@progbits\n
\nThe program still works:
\n$ clang -O3 helloworld.c\n$ ./a.out\nHello world!\n
\nI successfully managed to add a pass in order to (actively) do nothing\nin each generated native function. In the next episode, I will try to do\nsomething useful\ninstead.
\n"}]}