The ELF file format

Some notes on the ELF file format with references, explanations and some examples.

The ELF file format is a standard file format for executable files, dynamic libraries27 (DSOs, .so files), compiled compilation units (.o files) and core dumps. It is used on many platforms6 including many recent Unix-ish systems (System V, GNU, BSD) and embedded software5.

You might want to read this document alongside the output of readelf, objdump -D2, objcopy --dump-section, elfcat7 and/or a hexadecimal editor. You might also want to cross-reference it with elf.h, the manpage (man 5 elf) or the ELF specs.

Basic structure

The ELF header is located at the beginning of the ELF file and contains information about the target OS, architecture, the type of ELF file (executable, dynamic library, etc.) and the location of two important structures within the ELF file defining two views of the ELF file:

  • the program header table defining the execution view;

  • the section header table defining the linking view.

Execution view

The execution view is given by the program header table. This table is used (by the kernel, by the dynamic linker, etc.) to create a runtime image of the program in memory:

  • which ranges of the file should be loaded in memory (the segments);

  • which dynamic linker should be used (if any);

  • which other shared-objects are needed;

  • how to resolve the references to other shared-objects;

  • etc.

Linking view

The linking view is given by the section header table which describes the location of the different sections (within the file and within the runtime image of the program).

The .o files generated by the compiler are made of different sections (.text for executable code, .data for initialised global variables, .bss for uninitialised global variables, .rodata for read-only global variables, etc.): the link editor combines different .o files into a single executable or DSO (by merging the sections of the different .o files with the same name) and generates some others (.got, .dynamic, .plt, .got.plt, etc.)12.

The linking view is not used at runtime: all the information needed at runtime is in the program header table. Some sections are not used at runtime (debugging information, full symbol table) and are not present in the execution view. Those sections and the section header table can be omitted (or stripped) from the ELF file.

If it is present, this extra information can be used by debugging tools (such as GDB), profiling tools, etc. Many tools for inspection and manipulation of ELF files (readelf, objdump) rely on the section header table to work correctly.

Other important structures

The dynamic section contains important information used for dynamic linking.

Symbol tables list the symbols defined and used by the file.

Hash tables are used for efficient lookup of symbols by their name (symbol table entries by symbol name).

Relocation tables list the relocations needed to relocate the ELF file at a different memory address or to link it to other ELF objects.

String tables are lists of strings which are referenced at other places in the ELF file (for section names in the section header table, for symbol names in the symbol tables, etc.).

The GOT is a table filled by the dynamic linker with addresses of functions and variables. The program uses those entries to get the address of variables or functions which could be located in another ELF module.

The PLT contains trampolines: they are stubs for functions which might be located in another ELF module. The program calls those stubs which call the real function (by dereferencing a corresponding GOT entry). This is used for lazy relocation.

Notes are used to add miscellaneous information (such as GNU ABI information, GNU build IDs).

ELF header

The ELF header is at the beginning of the ELF file and contains:

  • a 4-byte magic number used to identify ELF files (0x7f followed by the "ELF" string);

  • information about the ELF file;

  • information about the target machine and OS/ABI;

  • the location of the main structures of the ELF file: the section header table and the program header table.

The ELF header uses the following structure4:

typedef struct {
  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
  Elf64_Half    e_type;             /* Object file type */
  Elf64_Half    e_machine;          /* Architecture */
  Elf64_Word    e_version;          /* Object file version */
  Elf64_Addr    e_entry;            /* Entry point virtual address */
  Elf64_Off     e_phoff;            /* Program header table file offset */
  Elf64_Off     e_shoff;            /* Section header table file offset */
  Elf64_Word    e_flags;            /* Processor-specific flags */
  Elf64_Half    e_ehsize;           /* ELF header size in bytes */
  Elf64_Half    e_phentsize;        /* Program header table entry size */
  Elf64_Half    e_phnum;            /* Program header table entry count */
  Elf64_Half    e_shentsize;        /* Section header table entry size */
  Elf64_Half    e_shnum;            /* Section header table entry count */
  Elf64_Half    e_shstrndx;         /* Section header string table index */
} Elf64_Ehdr;

readelf -h can display the content of the ELF header.
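
As a minimal sketch, checking the magic number of a buffer might look like this. The helper name looks_like_elf is hypothetical; ELFMAG and SELFMAG are the constants provided by elf.h.

```c
#include <elf.h>     /* ELFMAG ("\177ELF"), SELFMAG (4) */
#include <stddef.h>
#include <string.h>

/* Return 1 if the buffer starts with the ELF magic number
 * (0x7f 'E' 'L' 'F'), 0 otherwise. Hypothetical helper. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
  return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}
```

A real parser would go on to validate the rest of e_ident (class, data encoding, version) before trusting any other field.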

ELF class

The e_ident[EI_CLASS] field describes the ELF class: 32-bit (ELFCLASS32) or 64-bit (ELFCLASS64) for 32-bit and 64-bit programs respectively.

The ELF structures are different for the two ELF classes: the fields are the same but their types and sometimes their order are different (in order to have packed structures). For example, the ELF header uses the Elf32_Ehdr and Elf64_Ehdr structures for ELFCLASS32 and ELFCLASS64 respectively.
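
Since e_ident has the same layout in both classes, a reader can inspect it before deciding which structure to use. A small sketch (ehdr_size_for_class is a hypothetical name, not something provided by elf.h):

```c
#include <elf.h>
#include <stddef.h>

/* Size of the ELF header for the class found in e_ident,
 * or 0 if the class is unknown. Hypothetical helper. */
size_t ehdr_size_for_class(const unsigned char *e_ident)
{
  switch (e_ident[EI_CLASS]) {
  case ELFCLASS32: return sizeof(Elf32_Ehdr);
  case ELFCLASS64: return sizeof(Elf64_Ehdr);
  default:         return 0;
  }
}
```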

ELF endianness

The e_ident[EI_DATA] field describes the encoding (endianness) of the architecture (either ELFDATA2LSB or ELFDATA2MSB). The fields of the ELF file are encoded in the endianness of the architecture: you might have to swap the byte order (see endian.h) if you process ELF files from a foreign architecture.
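
One way to sidestep the host endianness entirely is to decode the fields byte by byte. The following is only a sketch for 16-bit fields (elf_read_u16 is a hypothetical helper); it treats anything other than ELFDATA2MSB as little-endian, which is an assumption made for brevity.

```c
#include <elf.h>
#include <stdint.h>

/* Decode a 16-bit field stored at p according to e_ident[EI_DATA].
 * Reading byte by byte avoids depending on the host endianness. */
uint16_t elf_read_u16(const unsigned char *p, unsigned char ei_data)
{
  if (ei_data == ELFDATA2MSB)
    return (uint16_t)((p[0] << 8) | p[1]);
  return (uint16_t)(p[0] | (p[1] << 8)); /* assumed ELFDATA2LSB */
}
```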

ELF type

The ELF type is in the e_type field:

  • ET_REL is used for relocatable objects (.o files);

  • ET_EXEC is used for executable files (with the exception of PIEs which are ET_DYN);

  • ET_DYN is used for dynamic libraries also known as shared-objects (.so files);

  • ET_CORE is used for core files8.

A major difference between ET_EXEC and ET_DYN files is that ET_EXEC files are always fixed at a given position in the virtual address space. In contrast, ET_DYN files can be relocated anywhere in the virtual address space by applying a constant offset to their virtual addresses10: the same .so file can be mapped at different locations in different processes9. Usually, the shared-object is mapped at address 0 in the ELF file29.

Normal (ET_EXEC) executables are always mapped at a given location so the location of their subprograms and global variables is the same in every process. This knowledge can be exploited by an attacker to get control of the process. In order to avoid this, the program can be compiled as a PIE11 which can be mapped (relocated) at any address in the process virtual address space. PIEs, being relocatable, are ET_DYN files instead of ET_EXEC files.

The Linux kernel (vmlinux) uses the ET_EXEC type and its loadable modules (.ko files) use the ET_REL type.
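
The mapping between e_type values and the kinds of files listed above can be sketched as follows (elf_type_name and its wording are hypothetical; the ET_* constants come from elf.h):

```c
#include <elf.h>

/* Human-readable description of an e_type value.
 * ET_DYN covers both shared-objects and PIEs. */
const char *elf_type_name(unsigned e_type)
{
  switch (e_type) {
  case ET_REL:  return "relocatable object";
  case ET_EXEC: return "executable";
  case ET_DYN:  return "shared object or PIE";
  case ET_CORE: return "core file";
  default:      return "other";
  }
}
```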

Location of the header tables

The locations of the section header table and program header table are described in the ELF header:

  • in e_phoff, e_phentsize, e_phnum for the program header table (execution view);

  • in e_shoff, e_shentsize, e_shnum for the section header table (linking view).
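
Both tables are arrays of fixed-size entries, so the file offset of entry i is the table offset plus i times the entry size. A sketch for the program header table (phdr_file_offset is a hypothetical helper; it ignores the extended numbering used when there are very many entries):

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the file offset of the i-th program header.
 * Returns 0 on success, -1 if i is out of range. */
int phdr_file_offset(const Elf64_Ehdr *eh, size_t i, uint64_t *offset)
{
  if (i >= eh->e_phnum)
    return -1;
  *offset = eh->e_phoff + (uint64_t)i * eh->e_phentsize;
  return 0;
}
```

The same arithmetic applies to the section header table with e_shoff, e_shentsize and e_shnum.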

Section header table

The section header table defines the linking view of the ELF file: each entry defines a section within the file. The compiler generates relocatable objects (.o files) made of different sections (.text, .data, .rodata, .bss, etc.). When the link editor ld combines different relocatable objects into an executable or shared-object, it merges the sections with the same name into a single section in the final output. For example, it combines the .text sections (containing the compiled code) of the different .o files into a single .text section.

The section table is an array of section descriptions with the structure:

typedef struct {
  Elf64_Word    sh_name;      /* Section name (string tbl index) */
  Elf64_Word    sh_type;      /* Section type */
  Elf64_Xword   sh_flags;     /* Section flags */
  Elf64_Addr    sh_addr;      /* Section virtual addr at execution */
  Elf64_Off     sh_offset;    /* Section file offset */
  Elf64_Xword   sh_size;      /* Section size in bytes */
  Elf64_Word    sh_link;      /* Link to another section */
  Elf64_Word    sh_info;      /* Additional section information */
  Elf64_Xword   sh_addralign; /* Section alignment */
  Elf64_Xword   sh_entsize;   /* Entry size if section holds table */
} Elf64_Shdr;

The first entry of a section header table is always an empty null section (type SHT_NULL).

readelf -S can display the section header table. readelf -x can be used to get a hexdump of a given ELF section. A raw dump of a section can be produced with objcopy a.out --dump-section .dynstr=/dev/stdout /dev/null | cat. Note that some sections are not visible to objcopy and objdump: you might want to use elfcat7 instead.

Section names

Each section has a name (.text, .data, .rodata, .bss, .got, .plt, etc.): all section names are stored in a string table (.shstrtab). The e_shstrndx field of the ELF header is the index (in the section header table) of the section containing the section names:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
[...]
  Section header string table index: 26

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [26] .shstrtab         STRTAB           0000000000000000  0001e220
       00000000000000f3  0000000000000000           0     0     1

The sh_name field of the section header is the byte offset of the section name within this string table.

Existing sections

Section name        Type                Usage (and equivalent runtime description)
.text               SHT_PROGBITS        Main executable code
.data               SHT_PROGBITS        Initialised read and write data
.rodata             SHT_PROGBITS        Read only data
.bss                SHT_NOBITS          Uninitialised read and write data
.data.rel.ro        SHT_PROGBITS
.tdata              SHT_PROGBITS        Initialised thread-local data (part of PT_TLS)
.tbss               SHT_NOBITS          Uninitialised thread-local data (part of PT_TLS)
.init               SHT_PROGBITS        Initialisation code (usually .init, DT_INIT)
.fini               SHT_PROGBITS        Termination code (usually .fini, DT_FINI)
.init_array         SHT_INIT_ARRAY      Addresses of initialisation functions (DT_INIT_ARRAY and DT_INIT_ARRAYSZ)
.fini_array         SHT_FINI_ARRAY      Addresses of termination functions (DT_FINI_ARRAY and DT_FINI_ARRAYSZ)
.ctors              SHT_PROGBITS        Similar to .init_array but old-school
.dtors              SHT_PROGBITS        Similar to .fini_array but old-school
.dynsym             SHT_DYNSYM          Dynamic symbol table (DT_SYMTAB)
.dynstr             SHT_STRTAB          String table for dynamic linking (DT_STRTAB)
.symtab             SHT_SYMTAB          Full symbol table
.symtab_shndx       SHT_SYMTAB_SHNDX
.strtab             SHT_STRTAB          String table used for the symbol table
.relaXXX            SHT_RELA            Relocations for section XXX, with addend
.relXXX             SHT_REL             Relocations for section XXX, without addend
.rela.dyn           SHT_RELA            Other runtime relocations, with addend
.rel.dyn            SHT_REL             Other runtime relocations, without addend
.rela.plt           SHT_RELA            PLT relocations, with addend
.rel.plt            SHT_REL             PLT relocations, without addend
.got                SHT_PROGBITS        Main GOT
.got.plt            SHT_PROGBITS        PLT GOT, GOT used by the PLT (lazy relocations)
.hash               SHT_HASH            Standard symbol hash table (DT_HASH)
.gnu.hash           SHT_GNU_HASH        GNU symbol hash table (DT_GNU_HASH)
.gnu.version        SHT_VERSYM          GNU symbol versions (DT_VERSYM)
.gnu.version_r      SHT_VERNEED         GNU version requirements (DT_VERNEED and DT_VERNEED_NUM)
.gnu.version_d      SHT_VERDEF          GNU version definitions (DT_VERDEF and DT_VERDEF_NUM)
.debug_info         SHT_PROGBITS        DWARF, Main DWARF section (variables, subprograms, types, etc.)
.debug_abbrev       SHT_PROGBITS        DWARF, Type of the nodes in .debug_info
.debug_aranges      SHT_PROGBITS        DWARF
.debug_line         SHT_PROGBITS        DWARF, Mapping between instruction and source code lines
.debug_str          SHT_PROGBITS        DWARF, Strings for DWARF sections
.debug_frame        SHT_PROGBITS        DWARF, Stack unwinding information31
.debug_macro                            Debug macros (GNU extension)
.debug_link                             32
.stab               SHT_PROGBITS        Debugging information in the (old) stab format
.stabstr            SHT_PROGBITS        Strings associated with the .stab section
.eh_frame           SHT_PROGBITS        Runtime stack unwinding information31
.eh_frame_hdr       SHT_PROGBITS        Header (location and index) of the EH frame table (PT_GNU_EH_FRAME)
.shstrtab           SHT_STRTAB          String table for section names
.note.XXXX          SHT_NOTE            Note
.note.ABI-tag       SHT_NOTE            ABI used in this file (NT_GNU_ABI_TAG)
.note.gnu.build-id  SHT_NOTE            Build ID for this build33 (NT_GNU_BUILD_ID note)
.dynamic            SHT_DYNAMIC         Dynamic table, dynamic linking information (PT_DYNAMIC)
.interp             SHT_PROGBITS        Interpreter (PT_INTERP)
.group              SHT_GROUP           Group of related sections (used for COMDAT)
.comment
.jcr                SHT_PROGBITS        Used for Java (?)
.stapsdt.base                           Used for SystemTap SDT
.note.stapsdt                           Used for SystemTap SDT
.gcc_except_table   SHT_PROGBITS        LSDA (Language Specific Data) for exception handling
.gnu.warning                            Warning message when linking against this file34
.gnu.warning.XXX    SHT_PROGBITS        Warning message when linking against symbol XXX34
.ARM.extab          SHT_PROGBITS
.ARM.exidx          SHT_ARM_EXIDX
.ARM.attributes     SHT_ARM_ATTRIBUTES

Section types

  • SHT_PROGBITS, section containing data which do not have any special meaning for the link editor;

  • SHT_NOBITS, section full of zeros (.bss);

  • SHT_NOTE, notes

  • SHT_HASH, standard symbol hash table;

  • SHT_GNU_HASH, GNU symbol hash table;

  • SHT_DYNSYM, minimum runtime dynamic symbol table (.dynsym)

  • SHT_SYMTAB, full symbol table (.symtab)

  • SHT_STRTAB, string tables

  • SHT_RELA and SHT_REL, relocation tables (with addend and without addend respectively);

  • SHT_INIT_ARRAY and SHT_FINI_ARRAY, addresses of initialisation/termination functions;

  • SHT_DYNAMIC, dynamic table (.dynamic section, PT_DYNAMIC segment type)

  • SHT_VERDEF

  • SHT_VERSYM

  • SHT_VERNEED

  • SHT_GROUP

Section link

For symbol tables (SHT_SYMTAB and SHT_DYNSYM) and the dynamic section (SHT_DYNAMIC), the sh_link field gives the index of the string table used to find the strings referenced in the section.

For symbol hash tables (SHT_HASH and SHT_GNU_HASH) and relocation tables (SHT_RELA and SHT_REL), it gives the index of the associated symbol table.

Section info

For relocation tables, the sh_info field gives the index of the section the relocations apply to. This is mostly relevant for .o files. For executables and DSOs on GNU systems, .rela.dyn uses 0 because it applies to many different sections and .rela.plt uses the index of the .plt even if it applies to the .got.plt.

For symbol tables, it gives the index of the first non-local symbol in the symbol table: it can be used to skip the STB_LOCAL symbols.

Section flags

The sh_flags field is a set of flags:

  • SHF_WRITE and SHF_EXECINSTR define the expected runtime accesses to this section. When the link editor builds the segments, it sets the PF_W and PF_X flags accordingly.

  • SHF_ALLOC is used for sections which are present in memory at runtime. Those sections are expected to be covered by a PT_LOAD entry.

  • SHF_MERGE is used for sections which can be merged to eliminate duplication.

  • SHF_STRINGS is used for string table sections.

  • SHF_INFO_LINK is used for sections which reference another section in the sh_info field25.

  • SHF_LINK_ORDER

  • SHF_OS_NONCONFORMING is used for sections which need OS-specific processing.

  • SHF_GROUP

  • SHF_TLS is used for sections holding TLS.
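
As an illustration, the flags above can be turned into a compact string similar to the "Flags" column printed by readelf -S. This is only a sketch: shflags_to_string is a hypothetical helper and the letters are chosen to resemble the readelf key, not taken from its source.

```c
#include <elf.h>
#include <stddef.h>

/* Write a readelf-like flag string ("WA", "AX", ...) into out.
 * out must have room for at least 11 bytes. Hypothetical helper. */
void shflags_to_string(Elf64_Xword flags, char *out)
{
  static const struct { Elf64_Xword bit; char letter; } table[] = {
    { SHF_WRITE, 'W' }, { SHF_ALLOC, 'A' }, { SHF_EXECINSTR, 'X' },
    { SHF_MERGE, 'M' }, { SHF_STRINGS, 'S' }, { SHF_INFO_LINK, 'I' },
    { SHF_LINK_ORDER, 'L' }, { SHF_OS_NONCONFORMING, 'O' },
    { SHF_GROUP, 'G' }, { SHF_TLS, 'T' },
  };
  size_t n = 0;
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (flags & table[i].bit)
      out[n++] = table[i].letter;
  out[n] = '\0';
}
```

With this sketch, a typical .text section (SHF_ALLOC and SHF_EXECINSTR) gives "AX" and a typical .data section (SHF_WRITE and SHF_ALLOC) gives "WA".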

Program header table

The program header table defines the execution view of the ELF file:

  • location (in memory and on disk) of the segments (parts of the file which must be loaded/mapped in program memory);

  • location (in memory and on disk) of some important runtime parts of those segments.

The program table is an array of program headers:

typedef struct {
   uint32_t   p_type;   /* Segment type */
   uint32_t   p_flags;  /* Segment flags */
   Elf64_Off  p_offset; /* Segment file offset */
   Elf64_Addr p_vaddr;  /* Segment virtual address */
   Elf64_Addr p_paddr;  /* Segment physical address */
   uint64_t   p_filesz; /* Segment size in file */
   uint64_t   p_memsz;  /* Segment size in memory */
   uint64_t   p_align;  /* Segment alignment */
} Elf64_Phdr;

The program header table can be seen with readelf -l. readelf also shows which sections are located in each region described by a program header entry.

Segments

A PT_LOAD entry describes a segment to load (typically with mmap()) into the program memory. A typical ELF executable or DSO has two such entries describing two segments28:

  1. The first one is the text segment. It is executable, readable but not writable and contains code and read-only data (.text, .rodata, .plt, .eh_frame, etc.);

  2. The second one is the data segment. It is readable, writable but not executable and contains the modifiable data (.data, .got, .got.plt, .bss, etc.).

The idea in this separation is that everything which does not need to be written (read-only data, code) should be read-only:

  • the text segment is not modified23 and thus its memory pages can be shared for all processes using this ELF file;

  • the pages of the writable segment are automatically unshared by the OS as soon as they are modified (using copy-on-write).
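
A loader has to translate the segment permissions (p_flags) into the protection bits expected by mmap() or mprotect(). A minimal sketch of this translation (pflags_to_prot is a hypothetical name):

```c
#include <elf.h>
#include <sys/mman.h>

/* Translate the PF_R/PF_W/PF_X bits of p_flags into the
 * PROT_* bits used by mmap() and mprotect(). */
int pflags_to_prot(Elf64_Word p_flags)
{
  int prot = PROT_NONE;
  if (p_flags & PF_R) prot |= PROT_READ;
  if (p_flags & PF_W) prot |= PROT_WRITE;
  if (p_flags & PF_X) prot |= PROT_EXEC;
  return prot;
}
```

With this mapping the text segment (R E) becomes PROT_READ | PROT_EXEC and the data segment (RW) becomes PROT_READ | PROT_WRITE.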

Security considerations

Another important property in the design is that executable segments are not writable14. If a process has VMAs36 which are both executable and writable, an attacker might exploit bugs such as buffer overflows in order to write arbitrary code in the program's memory and possibly execute it. If the executable pages are read-only, an attacker can still try to write arbitrary code but it will not be executable13.

Example

A simple hello world program:

  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000006dc 0x00000000000006dc  R E    200000
  LOAD           0x00000000000006e0 0x00000000006006e0 0x00000000006006e0
                 0x0000000000000230 0x0000000000002288  RW     200000
  DYNAMIC        0x00000000000006f8 0x00000000006006f8 0x00000000006006f8
                 0x00000000000001d0 0x00000000000001d0  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000005b4 0x00000000004005b4 0x00000000004005b4
                 0x0000000000000034 0x0000000000000034  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10

We can see the resulting VMAs36 in /proc/$pid/maps of a corresponding process:

  • The first VMA is the text segment.

  • The second VMA is the part of the data segment which is initialised.

  • The third VMA is the part of the data segment which has not been initialised. This is the end of the .bss section. This part is not stored in the ELF file and is thus a separate MAP_ANONYMOUS VMA.

00400000-00401000 r-xp 00000000 08:13 27418661   /home/foo/temp/wait
00600000-00601000 rw-p 00000000 08:13 27418661   /home/foo/temp/wait
00601000-00603000 rw-p 00000000 00:00 0
[...]
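
The size of the part which is not stored in the file is the difference between MemSiz and FileSiz of the PT_LOAD entry. For the data segment above (FileSiz 0x230, MemSiz 0x2288), this gives 0x2058 bytes. As a sketch (zero_fill_size is a hypothetical helper):

```c
#include <elf.h>
#include <stdint.h>

/* Number of bytes of a PT_LOAD segment which are not backed by the
 * file and must be zero-filled by the loader (typically .bss). */
uint64_t zero_fill_size(const Elf64_Phdr *ph)
{
  return ph->p_memsz > ph->p_filesz ? ph->p_memsz - ph->p_filesz : 0;
}
```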

Read only relocations

On GNU systems, the dynamic linker may be instructed to mprotect() the .got section against write access after the relocation is finished. This improves the security by preventing the poisoning of the (non-PLT) GOT26 after the relocation is done.

This is enabled with ld -z relro (which generates a PT_GNU_RELRO entry) and disabled explicitly with ld -z norelro. When enabled, PT_GNU_RELRO is present in the program header table and describes a range of memory which the dynamic linker can mprotect() after the (non-lazy) relocation is done (the .got section).

The same example program linked with ld -z relro features the additional PT_GNU_RELRO entry:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000070c 0x000000000000070c  R E    200000
  LOAD           0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x0000000000000230 0x0000000000002258  RW     200000
  DYNAMIC        0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                 0x00000000000001d0 0x00000000000001d0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000005e4 0x00000000004005e4 0x00000000004005e4
                 0x0000000000000034 0x0000000000000034  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x00000000000001f0 0x00000000000001f0  R      1

This can be seen in /proc/$pid/maps:

  • the first VMA is the text segment;

  • the second VMA is the part of the data segment (described by PT_GNU_RELRO) which has been protected;

  • the third VMA is the part of the data segment which has not been protected;

  • the fourth VMA is the part of the data segment which has not been initialised.

00400000-00401000 r-xp 00000000 08:13 27418663   /home/foo/temp/wait2
00600000-00601000 r--p 00000000 08:13 27418663   /home/foo/temp/wait2
00601000-00602000 rw-p 00001000 08:13 27418663   /home/foo/temp/wait2
00602000-00604000 rw-p 00000000 00:00 0
[...]
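
The protected VMA can be related to the PT_GNU_RELRO entry with a bit of arithmetic: the entry covers 0x600e10 to 0x601000 (0x600e10 + 0x1f0) and mprotect() works on whole pages, which gives the 00600000-00601000 range. The sketch below rounds both bounds down to a page boundary; this rounding policy and the helper name relro_page_range are assumptions made for illustration, a given dynamic linker may proceed differently.

```c
#include <stdint.h>

/* Page-aligned range [*start, *end) protected for a PT_GNU_RELRO
 * entry of an object loaded with the given load bias.
 * page_size must be a power of two. Rounding both bounds down is an
 * assumption of this sketch. */
void relro_page_range(uint64_t load_bias, uint64_t p_vaddr,
                      uint64_t p_memsz, uint64_t page_size,
                      uint64_t *start, uint64_t *end)
{
  uint64_t mask = ~(page_size - 1);
  *start = (load_bias + p_vaddr) & mask;
  *end = (load_bias + p_vaddr + p_memsz) & mask;
}
```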

In addition, ld -z now (DF_BIND_NOW) can be used to disable lazy relocation. By combining the two options, you can get an executable or DSO without .got.plt where the whole GOT is read-only after relocation.

Other program header entries

  • PT_PHDR describes the program header table itself if it is available in the program memory.

  • PT_INTERP gives the absolute path of the interpreter/dynamic linker (.interp section) if there is one. This entry is present in dynamically linked executables.

  • PT_DYNAMIC describes the location of the dynamic linking information (.dynamic section).

  • PT_GNU_EH_FRAME (aka PT_SUNW_EH_FRAME) describes the location of stack unwinding information used at runtime (for exception handling). This is the .eh_frame_hdr section which can be used to locate the .eh_frame section.

  • PT_GNU_STACK is an empty segment whose flags set the permissions of the default stack. This can be used to make the stack executable, which is probably not such a good idea.

  • PT_NOTE describes the location of notes. All the .note.XXX sections (of type SHT_NOTE) are usually combined into a single PT_NOTE segment.

  • PT_TLS is used for initialising TLS.
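
Tools which need one of these entries usually scan the program header table for the first entry of a given type. A minimal sketch (find_phdr is a hypothetical helper working on a table already loaded in memory):

```c
#include <elf.h>
#include <stddef.h>

/* Return the first program header of the given type (PT_INTERP,
 * PT_DYNAMIC, ...) or NULL if there is none. */
const Elf64_Phdr *find_phdr(const Elf64_Phdr *phdrs, size_t count,
                            Elf64_Word type)
{
  for (size_t i = 0; i < count; i++)
    if (phdrs[i].p_type == type)
      return &phdrs[i];
  return NULL;
}
```

For example, a NULL result for PT_INTERP suggests that the file does not request a dynamic linker.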

String tables

String tables are lists of strings. They use the SHT_STRTAB section type. Each string in the string table is terminated by a NUL byte and is referenced by its byte offset from the beginning of the table.

The first entry of a string table is always the empty string (the first byte of a string table is always NUL): the empty string can always be designated with the zero offset.
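
Looking up a string is therefore a matter of adding the offset to the beginning of the table. When the file is not trusted, it is worth checking that the offset is in range and that the string is terminated inside the table. A sketch (strtab_get is a hypothetical helper):

```c
#include <stddef.h>
#include <string.h>

/* Return the string at byte offset off in a string table of the given
 * size, or NULL if the offset is out of range or the string is not
 * NUL-terminated inside the table. */
const char *strtab_get(const char *table, size_t size, size_t off)
{
  if (off >= size || memchr(table + off, '\0', size - off) == NULL)
    return NULL;
  return table + off;
}
```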

The content of a string section can be displayed with readelf -p .dynstr or with objcopy a.out --dump-section .dynstr=/dev/stdout /dev/null | tr '\000' '\n'.

Usages:

  • .shstrtab holds the section names;

  • .dynstr holds the names of the symbols in the dynamic symbol table .dynsym;

  • .strtab holds the names of the symbols in the full symbol table .symtab.

References to string tables:

  • the (section index of the) string table used for section names is indicated in the e_shstrndx field of the ELF header;

  • many sections which reference strings use the section header field sh_link to give the (section index of the) string table they use;

  • in the dynamic section, the string table used is located with the DT_STRTAB entry.

Example of .shstrtab (x86_64 GNU/Linux)

Section Headers:

[Nr] Name              Type             Address           Offset
     Size              EntSize          Flags  Link  Info  Align
[27] .shstrtab         STRTAB           0000000000000000  000008f1
     0000000000000108  0000000000000000           0     0     1

File hexdump:

[...]
000008b0: 0000 0000 0000 0000 4743 433a 2028 4465  ........GCC: (De
000008c0: 6269 616e 2034 2e39 2e32 2d31 3029 2034  bian 4.9.2-10) 4
000008d0: 2e39 2e32 0047 4343 3a20 2844 6562 6961  .9.2.GCC: (Debia
000008e0: 6e20 342e 382e 342d 3129 2034 2e38 2e34  n 4.8.4-1) 4.8.4
000008f0: 0000 2e73 796d 7461 6200 2e73 7472 7461  ...symtab..strta
00000900: 6200 2e73 6873 7472 7461 6200 2e69 6e74  b..shstrtab..int
00000910: 6572 7000 2e6e 6f74 652e 4142 492d 7461  erp..note.ABI-ta
00000920: 6700 2e6e 6f74 652e 676e 752e 6275 696c  g..note.gnu.buil
00000930: 642d 6964 002e 676e 752e 6861 7368 002e  d-id..gnu.hash..
00000940: 6479 6e73 796d 002e 6479 6e73 7472 002e  dynsym..dynstr..
00000950: 676e 752e 7665 7273 696f 6e00 2e67 6e75  gnu.version..gnu
00000960: 2e76 6572 7369 6f6e 5f72 002e 7265 6c61  .version_r..rela
00000970: 2e64 796e 002e 7265 6c61 2e70 6c74 002e  .dyn..rela.plt..
00000980: 696e 6974 002e 7465 7874 002e 6669 6e69  init..text..fini
00000990: 002e 726f 6461 7461 002e 6568 5f66 7261  ..rodata..eh_fra
000009a0: 6d65 5f68 6472 002e 6568 5f66 7261 6d65  me_hdr..eh_frame
000009b0: 002e 696e 6974 5f61 7272 6179 002e 6669  ..init_array..fi
000009c0: 6e69 5f61 7272 6179 002e 6a63 7200 2e64  ni_array..jcr..d
000009d0: 796e 616d 6963 002e 676f 7400 2e67 6f74  ynamic..got..got
000009e0: 2e70 6c74 002e 6461 7461 002e 6273 7300  .plt..data..bss.
000009f0: 2e63 6f6d 6d65 6e74 0000 0000 0000 0000  .comment........

This string table of section names starts at 0x8f1:

  • the first entry is the empty string for section header number 0 (offset 0);

  • the second entry is the .symtab string (offset 1);

  • the third entry is the .strtab string (offset 9).

Symbols and the symbol table

What's a symbol?

Symbols are used for linking (by the link editor and the dynamic linker).

The C statement:

extern int foo;

int foo = 3;

defines a global variable associated with the foo symbol15.

A user of this global variable:

extern int foo;

int foo_updater()
{
  return foo++;
}

will link to the foo symbol.

The linker will bind the user of the global variable with the global variable because they are using the same symbol name.

Symbol tables

The section header table often includes two different symbol tables:

  • the .symtab section (SHT_SYMTAB) lists all the symbols including the local symbols which are not used outside of the ELF file;

  • the .dynsym section (SHT_DYNSYM) is a (usually) much smaller symbol table which only contains the imported and exported symbols.

The former can be used by debugging tools and the latter contains the minimum amount of entries for the dynamic linker. For this reason, only the latter is mapped in the process virtual address space and is present in the dynamic table.

The symbol tables are arrays of symbol entries:

typedef struct {
  Elf64_Word    st_name;  /* Symbol name (string tbl index) */
  unsigned char st_info;  /* Symbol type and binding */
  unsigned char st_other; /* Symbol visibility (and 0) */
  Elf64_Section st_shndx; /* Section index */
  Elf64_Addr    st_value; /* Symbol value */
  Elf64_Xword   st_size;  /* Symbol size */
} Elf64_Sym;

At runtime, the dynamic symbol table is given by the dynamic table entry DT_SYMTAB. Its size is not given explicitly and can be inferred from the hash table (DT_HASH or DT_GNU_HASH).
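
For DT_HASH this is straightforward: the table starts with two words, nbucket and nchain, and nchain is the number of entries of the symbol table. The sketch below assumes 32-bit hash words, which is the common case (a few 64-bit platforms use 64-bit words instead); dynsym_count_from_hash is a hypothetical name. DT_GNU_HASH needs more work and is not covered here.

```c
#include <stdint.h>

/* Number of entries of the dynamic symbol table, read from a DT_HASH
 * table. Assumes 32-bit hash words (an assumption of this sketch). */
uint32_t dynsym_count_from_hash(const uint32_t *hash)
{
  return hash[1]; /* hash[0] is nbucket, hash[1] is nchain */
}
```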

readelf -s can display the symbol tables.

Symbol type

  • STT_OBJECT, global variables

  • STT_FUNC, executable code (function, subprogram, method);

  • STT_SECTION, section

  • STT_FILE, gives a file name and precedes STB_LOCAL symbols of the file

  • STT_TLS is used for TLS variables such as errno, h_errno.
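
The symbol type shares the st_info byte with the symbol binding, and the visibility is stored in the low bits of st_other. elf.h provides macros to pack and unpack those fields; the wrapper functions below are hypothetical and only there to show how the macros are used.

```c
#include <elf.h>

/* Binding (STB_*) stored in the high nibble of st_info. */
unsigned sym_binding(const Elf64_Sym *sym)
{
  return ELF64_ST_BIND(sym->st_info);
}

/* Type (STT_*) stored in the low nibble of st_info. */
unsigned sym_type(const Elf64_Sym *sym)
{
  return ELF64_ST_TYPE(sym->st_info);
}

/* Visibility (STV_*) stored in the low bits of st_other. */
unsigned sym_visibility(const Elf64_Sym *sym)
{
  return ELF64_ST_VISIBILITY(sym->st_other);
}
```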

Section index

Each symbol can be associated with a section (by its index).

Some special values are used:

  • SHN_UNDEF means that the symbol is undefined. It is not defined in this ELF file but is only referenced by it.

  • SHN_COMMON is used for a symbol which has not been allocated yet. The value is an alignment constraint. This is used in .o files for uninitialised global variables (in C). It can be defined in multiple C files and will be instantiated only once.

  • SHN_ABS is used for absolute values which are not relocated. It is used for STT_FILE entries and for GNU versioning.

Visibility and binding

Common visibility and binding combinations:

Binding     Visibility   Meaning
STB_LOCAL   STV_DEFAULT  Local to relocatable object
STB_GLOBAL  STV_HIDDEN   Local to the executable or DSO37
STB_GLOBAL  STV_DEFAULT  Global (visible in other runtime ELF modules)

Symbol binding

The symbol binding controls the link-time visibility of the symbol (i.e. outside translation units and within a given ELF runtime object but not across runtime ELF objects). It is a part of the st_info field.

  • STB_LOCAL symbols are local to a .o file (they are used for static variables and functions in C and for things in anonymous namespaces in C++).

Multiple local symbols with the same name can be in the same ET_EXEC or ET_DYN (originating from different ET_REL): in the .symtab, they are usually located after a STT_FILE entry with the source file name of the corresponding compilation unit.

  • STB_GLOBAL symbols are visible outside of the .o file.

  • STB_WEAK is similar to STB_GLOBAL.

When combining multiple .o files into one executable or DSO, the link editor will raise an error if multiple STB_GLOBAL definitions of the same symbol exist, but a STB_WEAK definition may coexist with a STB_GLOBAL definition or with another STB_WEAK definition of the same name.

A weak symbol does not need to be resolved: an unresolved weak symbol has a value of 0. The link editor will not pull .o relocatable objects from .a archives in order to resolve undefined weak symbols.
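
In C with GCC, an undefined weak symbol can be used for an optional hook: the program tests the address of the function before calling it. This is a sketch which assumes a GNU toolchain and that no other object defines the hypothetical optional_hook function.

```c
/* Declared weak and never defined in this sketch: the link still
 * succeeds and the address of the function is then NULL. */
__attribute__((weak)) int optional_hook(void);

int call_hook_or_default(void)
{
  if (optional_hook)       /* has the symbol been resolved? */
    return optional_hook();
  return -1;               /* fallback when the symbol is missing */
}
```

If another object or DSO in the process defines optional_hook, the same code calls it instead of taking the fallback.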

Symbol visibility

The symbol visibility controls the visibility across executables and DSOs. It is stored in the st_other field. This field is not relevant for STB_LOCAL symbols.

The different values are:

  • STV_DEFAULT, global visibility;

  • STV_PROTECTED, global visibility but local references do not use the PLT16;

  • STV_HIDDEN, not visible outside of the executable or shared-object.

  • STV_INTERNAL similar to STV_HIDDEN but may have some additional processor-specific semantic. Apparently the intent is that the symbol cannot be accessed outside the module (STV_HIDDEN might be accessed indirectly).

The STT_HIDDEN can be used in order to mark symbols which needs not be used outside of the DSO:

  • by reducing the number of exported symbols, this can speed up the symbol lookups by the dynamic linker;

  • the PLT can be avoided and the function is called directly;

  • this can avoid unexpected usage by other DSOs of symbols which are not part of the ABI of this DSO.

The visibility of a symbol can be defined in GCC with the visibility attribute:

__attribute__((visibility("hidden"))) int get_answer(void)
{
  return 42;
}

The default visibility can be changed with command-line arguments with recent versions of GCC (gcc -fvisibility=hidden) or with pragmas:

#pragma GCC visibility push(hidden)
int get_answer(void)
{
  return 42;
}
#pragma GCC visibility pop

Relocation tables

The relocation tables are arrays of relocation entries using one of those forms:

typedef struct {
  Elf64_Addr    r_offset;  /* Address */
  Elf64_Xword   r_info;    /* Relocation type and symbol index */
} Elf64_Rel;

typedef struct {
  Elf64_Addr    r_offset;  /* Address */
  Elf64_Xword   r_info;    /* Relocation type and symbol index */
  Elf64_Sxword  r_addend;  /* Addend */
} Elf64_Rela;

The relocations exist in two forms. In both cases an addend is added to the value of the symbol:

  • with explicit addend (Elf64_Rela), the addend is stored in the r_addend field of the relocation entry;

  • without explicit addend (Elf64_Rel), the addend is stored at the location being relocated.

readelf -r can display the relocation tables.
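The r_info field packs two values. As a small sketch using the ELF64_R_SYM and ELF64_R_TYPE macros of elf.h (the two wrapper function names are made up for this example), it can be split like this:

```c
#include <elf.h>
#include <stdint.h>

/* Index (in the associated symbol table) of the symbol used by the
 * relocation: this is the upper 32 bits of r_info. */
uint32_t reloc_symbol_index(Elf64_Xword r_info)
{
  return (uint32_t) ELF64_R_SYM(r_info);
}

/* Processor-specific relocation type (R_X86_64_* on x86_64):
 * this is the lower 32 bits of r_info. */
uint32_t reloc_type(Elf64_Xword r_info)
{
  return (uint32_t) ELF64_R_TYPE(r_info);
}
```

For example, the Info value 000b00000009 displayed by readelf -r in the GOT example of this document decodes to symbol index 11 and relocation type 9 (R_X86_64_GOTPCREL).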

Relocation address

ET_REL files have one relocation section .rela.foo (or .rel.foo) per relocated section .foo. The r_offset address of the relocation is the offset within the relocated .foo section.

For ET_EXEC and ET_DYN files, there are usually two relocation tables: the normal relocation table .rela.dyn (or .rel.dyn) and the lazy/PLT relocation table .rela.plt (or .rel.plt). The r_offset address of the relocation has a different meaning: it is the (runtime) virtual address of the relocation. The location of the relocation tables is described at runtime in the dynamic section (DT_RELA, DT_REL, DT_RELASZ, DT_RELSZ, DT_RELAENT, DT_RELENT, DT_PLTREL, DT_PLTRELSZ, DT_JMPREL).

GOT

The executable code is (usually) in the read-only segment:

  • we want to prevent the code from being modified, for security reasons;

  • by not modifying the code, we can share the same physical pages of code between all processes using this ELF object.

As we do not want to modify the code (in the read-only text segment) in order to share it, the dynamic linker cannot relocate the DSO by patching the addresses of the referenced objects in the executable code. Instead, the dynamic linker stores the address of the object in the writable segment and the code fetches this address from there.

The link editor creates a section in the writable segment, the GOT (.got), containing all the slots for those addresses35. It creates relocation entries in order to make the dynamic linker store the suitable values in the GOT.

GOT example x86_64

Compilation

For example, this C code:

extern int foo;

int get_foo()
{
  return foo;
}

compiles into this (gcc -S deref.c -o- -fPIC):

get_foo:
        movq    foo@GOTPCREL(%rip), %rax
        movl    (%rax), %eax
        ret

foo@GOTPCREL(%rip) resolves to a memory address (an entry in the GOT) where the address of foo is written: the first instruction loads this address into the %rax register. In the next instruction, the processor fetches the foo variable by dereferencing this address.

Relocatable object

When compiled into a relocatable object, we get this relocation:

Relocation section '.rela.text' at offset 0x250 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  000b00000009 R_X86_64_GOTPCREL 0000000000000000 foo - 4

It asks the link editor to generate a GOT entry for the address of foo and fill the relative address of this GOT entry in the instruction (movq foo@GOTPCREL(%rip), %rax). The link editor creates the GOT entry.

An addend of -4 is used because RIP-relative instructions in x86_64 use the address of the next instruction as the base address, and the 32-bit field being relocated ends right there, 4 bytes after its own address.
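This arithmetic can be checked on the disassembly of the abort example of the PLT section of this document. The following is only a sketch: pc32_field is a made-up helper name, and it assumes that the value stored for a 32-bit PC-relative relocation is S + A - P (address of the target, addend, address of the patched field).

```c
#include <stdint.h>

/* Value stored in a 32-bit PC-relative field: S + A - P, where S is the
 * address of the target, A the addend and P the address of the field itself.
 * (pc32_field is a made-up helper name.) */
int32_t pc32_field(uint64_t target, int64_t addend, uint64_t place)
{
  return (int32_t) (target + (uint64_t) addend - place);
}
```

The jmpq at 0x4003e0 has a 2-byte opcode, so its field is at 0x4003e2: with the GOT slot at 0x6008d0 and an addend of -4, this gives 0x2004ea, which is the displacement shown by objdump. In the same way, the field of the callq at 0x400414 is at 0x400415: with the trampoline at 0x4003e0, this gives -0x39, which is encoded as c7 ff ff ff.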

Shared-object

At runtime, the GOT entry needs to be filled by the dynamic linker. In order to do this, the link editor creates a relocation for the GOT entry in the shared-object:

Relocation section '.rela.dyn' at offset 0x458 contains 9 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200990  000800000006 R_X86_64_GLOB_DAT 00000000002009ec foo + 0

This relocation sets the third entry of the GOT (.got)1:

[19] .got              PROGBITS         0000000000200980  00000980
     0000000000000030  0000000000000008  WA       0     0     8

PLT

The Procedure Linkage table is used for calling functions whose address is not known at link time (because they might be in another shared-object or the executable). The PLT can be disassembled with objdump -D -j .plt.

For example this code:

#include <stdlib.h>

int main(int argc, char** argv)
{
  abort();
  return 0;
}

is compiled into (gcc test.c -S -o- -fPIC -O3):

main:
        subq    $8, %rsp
        call    abort

When disassembling the resulting executable, we find that the call to abort has been replaced by a call to a stub, abort@plt (called a trampoline):

0000000000400410 <main>:
  400410:       48 83 ec 08             sub    $0x8,%rsp
  400414:       e8 c7 ff ff ff          callq  4003e0 <abort@plt>

This trampoline fetches the address of abort from the GOT and jumps to this address:

00000000004003e0 <abort@plt>:
  4003e0:       ff 25 ea 04 20 00       jmpq   *0x2004ea(%rip)  # 6008d0 <_GLOBAL_OFFSET_TABLE_+0x18>
  4003e6:       68 00 00 00 00          pushq  $0x0
  4003eb:       e9 e0 ff ff ff          jmpq   4003d0 <_init+0x28>

All of this is done by the first instruction of this PLT trampoline: the two remaining instructions are used for lazy relocation which is explained afterwards.

A relocation exists in order to store the address of abort in this PLT GOT entry:

Relocation section '.rela.plt' at offset 0x360 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006008d0  000100000007 R_X86_64_JUMP_SLO 0000000000000000 abort + 0

Lazy relocations

Relocation in dynamic linking can slow down the initialisation of the application: each symbol must be looked up in all loaded DSOs and the executable. In order to speed up the relocation of programs, lazy relocation is used for function calls17: the corresponding PLT GOT entry is not filled with the address of the function in the process initialisation but only when the function is actually called.

# Special .PLT0 entry:
00000000004003d0 :
  4003d0:  ff 35 ea 04 20 00   pushq  0x2004ea(%rip)  # 6008c0 <_GLOBAL_OFFSET_TABLE_+0x8>
  4003d6:  ff 25 ec 04 20 00   jmpq   *0x2004ec(%rip) # 6008c8 <_GLOBAL_OFFSET_TABLE_+0x10>
  4003dc:  0f 1f 40 00

# .PLT1 for abort:
00000000004003e0 <abort@plt>:
  4003e0:  ff 25 ea 04 20 00   jmpq   *0x2004ea(%rip) # 6008d0 <_GLOBAL_OFFSET_TABLE_+0x18>
  4003e6:  68 00 00 00 00      pushq  $0x0
  4003eb:  e9 e0 ff ff ff      jmpq   4003d0 <_init+0x28>

  1. The dynamic linker preinitialises the PLT GOT,

    • the first entry of the PLT GOT is filled by the dynamic linker with the address of _DYNAMIC;

    • the second entry of the PLT GOT is filled by the dynamic linker with a value used by the dynamic linker to recognise this ELF executable or DSO;

    • the third entry of the PLT GOT is filled with the address of a callback of the dynamic linker;

    • the PLT GOT entry for abort@plt is initially filled with the address of its second instruction (0x4003e6);

  2. on the first call of the PLT trampoline abort@plt,

    a. the first instruction of the trampoline jumps to the second instruction of the trampoline;

    b. the second instruction of the PLT pushes on the stack the index of this relocation in the relocation table (from DT_JMPREL);

    c. the third instruction jumps to the first entry of the PLT (.PLT0);

    d. this entry pushes the second entry of the PLT GOT on the stack (this is used by the dynamic linker to identify this shared-object);

    e. this entry jumps to the callback of the dynamic linker;

    f. the dynamic linker does the real relocations,

    • it uses the arguments passed on the stack (identifier of this shared-object or executable and index in the relocation table),

    • it resolves the symbol;

    • it updates the PLT GOT entry with the address of the symbol;

    • it jumps to the address of the symbol in order to execute the function;

    g. the function is executed;

  3. on other calls, the PLT GOT entry now contains the address of the function and the PLT entry jumps to it directly (instead of jumping to .PLT0 and to the dynamic linker).
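The steps above can be mimicked in plain C with a function pointer playing the role of the PLT GOT entry. This is only an analogy under simplifying assumptions (no stack arguments, no symbol lookup); all the names below are made up for the illustration.

```c
typedef int (*func_t)(int);

/* Counts how many times the "dynamic linker" has been invoked. */
int resolver_calls = 0;

/* The "real" function, standing in for a function of another DSO. */
static int real_twice(int x) { return 2 * x; }

static int lazy_stub(int x);

/* Plays the role of the PLT GOT entry: initially points back to the stub. */
static func_t got_slot = lazy_stub;

/* Plays the role of .PLT0 and of the callback of the dynamic linker. */
static int lazy_stub(int x)
{
  resolver_calls++;
  got_slot = real_twice;  /* "relocation": update the PLT GOT entry */
  return got_slot(x);     /* then execute the function */
}

/* Plays the role of the PLT trampoline: indirect jump through the slot. */
int twice(int x) { return got_slot(x); }
```

The first call to twice() goes through lazy_stub(), which patches the slot; the next calls reach real_twice() directly.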

In the section header table:

  • the part of the GOT used by the PLT is in a separate section .got.plt;

  • the relocations of this PLT GOT are in a separate section .rela.plt.

In the dynamic section:

  • DT_JMPREL and DT_PLTRELSZ give the location and the size of this relocation table (and DT_PLTREL whether it uses addends or not);

  • the DT_PLTGOT entry is used to tell the dynamic linker where it should find the (three) special PLT GOT entries.

PLT example for x86_64

Compilation

This time let's compile a function call:

extern int foo(void);

int get_foo()
{
  return foo() + 42;
}

We get this assembly (cc -O3 -S -fpic):

get_foo:
.LFB0:
        subq    $8, %rsp
        call    foo@PLT
        addq    $8, %rsp
        addl    $42, %eax
        ret

foo@PLT asks the assembler to use the address of a PLT entry for the foo function.

Relocatable object

We get this relocation in the relocatable object:

Relocation section '.rela.text' at offset 0x260 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  000b00000004 R_X86_64_PLT32    0000000000000000 foo - 4

It asks the link editor to patch the instruction with the 32-bit relative address of the PLT entry for symbol foo. The link editor creates a PLT entry, the corresponding PLT GOT entry (in the .got.plt section) and a relocation entry for this PLT GOT entry (in .rela.plt).

Shared-object

We get this relocation in the shared-object:

Relocation section '.rela.plt' at offset 0x4f0 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200960  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0

This relocation entry asks the dynamic linker to lazily initialise the PLT GOT entry:

  1. it will first fill the PLT GOT entry with the address of the second instruction of the associated PLT entry;

  2. when the PLT is called, it will call the dynamic linker which will initialise the PLT GOT entry with the address of the foo symbol.

Some x86_64 relocations

Link time relocation:

  • R_X86_64_PC32, 32-bit relative address of the symbol;

  • R_X86_64_PLT32, 32-bit relative address of the PLT entry of the symbol;

  • R_X86_64_GOTPCREL, relative address of the GOT entry of the symbol.

Runtime relocations:

  • R_X86_64_JUMP_SLOT, sets a lazy PLT GOT entry

  • R_X86_64_GLOB_DAT, sets a GOT entry

  • R_X86_64_COPY, copy-relocation

  • R_X86_64_RELATIVE

  • R_X86_64_64, stores the 64-bit value of the symbol

Some x86 relocations

  • R_386_32

  • R_386_PC32

  • R_386_GOT32

  • R_386_PLT32

  • R_386_COPY, copy relocation

  • R_386_GLOB_DAT, set a GOT entry

  • R_386_JMP_SLOT, set a PLT GOT entry

  • R_386_RELATIVE

  • R_386_GOTOFF

  • R_386_GOTPC

Hash tables

Standard hash table

The standard hash table is built by the link editor. It is described by the .hash SHT_HASH section and by the DT_HASH entry in the dynamic section. Its structure is quite simple:

// Pseudo-C:
struct {
  Elf32_Word nbucket;          /* Number of buckets */
  Elf32_Word nchain;           /* Number of entries in .dynsym */
  Elf32_Word buckets[nbucket]; /* First entry in the chain */
  Elf32_Word chains[nchain];   /* Next entry in the chain */
};

  • buckets[hash % nbucket] gives the initial index in both the symbol table and the chain array;

  • chains[index] gives the next index in the chain;

  • index == STN_UNDEF marks the end of the chain.

The lookup looks like this:

#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Standard ELF hash function of the symbol name (defined elsewhere). */
uint32_t elf_hash(const char* name);

Elf64_Sym const* lookup_symbol(
  const char* symbol,
  Elf64_Sym const* symbol_table,
  const char* string_table,
  Elf32_Word const* hash_table)
{
  Elf32_Word nbucket           = hash_table[0];
  Elf32_Word const* buckets    = hash_table + 2;
  Elf32_Word const* chains     = hash_table + 2 + nbucket;

  uint32_t hash = elf_hash(symbol);

  // Iterate on the chain until STN_UNDEF marks its end:
  for (Elf32_Word index = buckets[hash % nbucket];
       index != STN_UNDEF;
       index = chains[index])
    if (strcmp(symbol, string_table + symbol_table[index].st_name) == 0)
      return symbol_table + index;

  return NULL;
}
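The elf_hash function used by this lookup is not shown. As a sketch, the hash function given in the System V gABI can be written as follows; using uint32_t instead of unsigned long is a choice of this sketch, made to keep the result within 32 bits on 64-bit platforms.

```c
#include <stdint.h>

/* Hash function of the standard (System V) ELF hash table. */
uint32_t elf_hash(const char *name)
{
  const unsigned char *p = (const unsigned char *) name;
  uint32_t h = 0;
  while (*p != '\0') {
    h = (h << 4) + *p++;
    uint32_t g = h & 0xf0000000u;  /* top 4 bits */
    if (g != 0)
      h ^= g >> 24;                /* fold them back into the low bits */
    h &= ~g;                       /* and clear them */
  }
  return h;
}
```

For example elf_hash("printf") is 0x077905a6.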

GNU hash table

The GNU hash table is a more efficient alternative to the standard hash table18. Both can be present in the same ELF file but modern GNU ELF files usually only contain the GNU hash table. It is described by the .gnu.hash SHT_GNU_HASH section and by the DT_GNU_HASH entry in the dynamic section.

Main differences:

  • It adds a Bloom filter in order to speed up negative lookups. Negative lookups are the common case since the symbol is searched in different ELF files in sequence.

  • It adds the value of the hash next to each entry in order to avoid useless string comparison.

  • It is more cache-friendly because a lookup does not jump around in the hash table memory.

  • It uses the DJB hash function.

// Pseudo-C:
struct Gnu_Hash_Header {
  uint32_t nbuckets;
  uint32_t symndx;    /* Index of the first accessible symbol in .dynsym */
  uint32_t maskwords; /* Number of elements in the Bloom Filter */
  uint32_t shift2;    /* Shift count for the Bloom Filter */
  uintXX_t bloom_filter[maskwords];
  uint32_t buckets[nbuckets];
  uint32_t values[dynsymcount - symndx];
};
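The DJB hash function mentioned above is short enough to be written down. This is a sketch of the usual formulation (h = h * 33 + c, starting from 5381, truncated to 32 bits); the function name gnu_hash is an arbitrary choice.

```c
#include <stdint.h>

/* DJB hash function, as used by the GNU hash table. */
uint32_t gnu_hash(const char *name)
{
  const unsigned char *p = (const unsigned char *) name;
  uint32_t h = 5381;
  while (*p != '\0')
    h = h * 33 + *p++;  /* wraps around modulo 2^32 */
  return h;
}
```

For example gnu_hash("printf") is 0x156b2bb8.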

Notes

Each entry of a note section begins with:

typedef struct {
  Elf64_Word n_namesz;  /* Length of the note's name.  */
  Elf64_Word n_descsz;  /* Length of the note's descriptor.  */
  Elf64_Word n_type;    /* Type of the note.  */
} Elf64_Nhdr;

After this comes the note name and the note content:

  • The note name is the name of the owner of the note. Each owner can define its own values for the note type. The n_namesz field includes the terminating 0 byte at the end of the name.

Padding is used after the name and after the content of the note to ensure 4-byte alignment.

Each note is usually in its own section (.note.XXX) but they are all grouped in the same program header entry (PT_NOTE). readelf -n can display the notes.

GNU notes

GNU notes use the owner name "GNU" (with a terminating 0 byte) and define the following note types:

  • NT_GNU_ABI_TAG is used to describe the ABI used by the file.

The first 32-bit word is the target system (ELF_NOTE_OS_LINUX for Linux) and the following three 32-bit words are a minimum version number.

Example (GNU/Linux 2.6.32):

Hex dump of section '.note.ABI-tag':
   0x0040021c 04000000 10000000 01000000 474e5500 ............GNU.
   0x0040022c 00000000 02000000 06000000 20000000 ............ ...

  • NT_GNU_BUILD_ID holds the build ID of the file.

Example (d53a4435d14a5ac3009bad8c6f840175b37aa86a):

Hex dump of section '.note.gnu.build-id':
  0x00400274 04000000 14000000 03000000 474e5500 ............GNU.
  0x00400284 d53a4435 d14a5ac3 009bad8c 6f840175 .:D5.JZ.....o..u
  0x00400294 b37aa86a                            .z.j
  
  • NT_GNU_HWCAP
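The layout of a note (header, padded name, padded content) can be checked against the .note.ABI-tag dump above. This is a minimal sketch: the helper names are made up, the bytes are copied from the dump, and the words are decoded as little-endian as in this x86_64 example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word. */
uint32_t read_le32(const unsigned char *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
       | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Round up to the next multiple of 4 (padding of the name and content). */
size_t align4(size_t n) { return (n + 3) & ~(size_t) 3; }

/* Offset of the note content, relative to the start of the note. */
size_t note_desc_offset(const unsigned char *note)
{
  uint32_t namesz = read_le32(note);      /* n_namesz */
  return 12 + align4(namesz);             /* 12 is sizeof(Elf64_Nhdr) */
}

/* Total size of the note, including its header and the padding. */
size_t note_size(const unsigned char *note)
{
  uint32_t descsz = read_le32(note + 4);  /* n_descsz */
  return note_desc_offset(note) + align4(descsz);
}

/* Content of the .note.ABI-tag section dumped above. */
const unsigned char abi_tag_note[] = {
  0x04, 0, 0, 0,  0x10, 0, 0, 0,  0x01, 0, 0, 0,  'G', 'N', 'U', 0,
  0x00, 0, 0, 0,  0x02, 0, 0, 0,  0x06, 0, 0, 0,  0x20, 0, 0, 0,
};
```

The content starts at offset 16 and its four words are 0, 2, 6 and 32: the first one is the target system and the three others are the version 2.6.32.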

CORE notes

See Anatomy of an ELF core file.

LINUX notes

See Anatomy of an ELF core file.

Dynamic section

The dynamic section provides important information for the dynamic linker. A statically linked executable does not have a PT_DYNAMIC entry.

It is an array of entries with the structure:

typedef struct {
  Elf64_Sxword  d_tag;   /* Dynamic entry type */
  union {
    Elf64_Xword d_val; /* Integer value */
    Elf64_Addr d_ptr;  /* Address value */
  } d_un;
} Elf64_Dyn;

readelf -d can display the content of the dynamic section.

The dynamic table is available at runtime through the _DYNAMIC local symbol. A DT_NULL entry marks the end of the dynamic section.
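Walking this array is straightforward. The following sketch looks for the first entry with a given tag and stops at DT_NULL; find_dynamic_entry and example_dynamic are made-up names, and the table is a hand-written stand-in for a real dynamic section.

```c
#include <elf.h>
#include <stddef.h>

/* Find the first entry with the given tag, or NULL if there is none. */
const Elf64_Dyn *find_dynamic_entry(const Elf64_Dyn *dynamic, Elf64_Sxword tag)
{
  for (const Elf64_Dyn *entry = dynamic; entry->d_tag != DT_NULL; ++entry)
    if (entry->d_tag == tag)
      return entry;
  return NULL;
}

/* Made-up table standing in for a real dynamic section. */
const Elf64_Dyn example_dynamic[] = {
  { .d_tag = DT_STRSZ,  .d_un = { .d_val = 0x120 } },
  { .d_tag = DT_SYMENT, .d_un = { .d_val = sizeof(Elf64_Sym) } },
  { .d_tag = DT_NULL,   .d_un = { .d_val = 0 } },
};
```

In a dynamically linked 64-bit program, passing _DYNAMIC instead of example_dynamic should walk the real table of the executable.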

Shared objects

  • DT_NEEDED defines a shared-object dependency;

  • DT_SONAME is the name of the current shared-object. This value is copied by the link editor ld as a DT_NEEDED entry of dependent ELF objects19.

RPATH

DT_RUNPATH (and DT_RPATH 20) defines additional paths where the shared-objects should be searched.

The dynamic linker (ld.so) recognises several special values in DT_RUNPATH (and DT_RPATH):

  • $ORIGIN expands to the directory of the ELF file;

  • $LIB expands to lib or lib64 depending on the architecture;

  • $PLATFORM expands to x86_64 for x86_64.

The DT_RPATH can be set with ld -rpath='$ORIGIN' (or gcc -Wl,-rpath='$ORIGIN'). ld --enable-new-dtags might be needed to add the DT_RUNPATH entries as well.

Symbols

  • DT_SYMTAB gives the runtime location of the symbol table (section .dynsym of type SHT_DYNSYM) and DT_SYMENT gives the byte size of a single entry21.

  • DT_HASH gives the runtime location of the standard symbol hash table (.hash of type SHT_HASH).

  • DT_GNU_HASH gives the runtime location of the GNU symbol hash table (section .gnu.hash of type SHT_GNU_HASH).

The type of hash table generated by the link editor can be chosen with ld --hash-style=sysv|gnu|both. By default, the GNU hash table is used on (not-too-old) GNU systems.

Symbol versions

DT_VERSYM, DT_VERNEED and DT_VERNEEDNUM are used for GNU symbol versioning.

Relocations

At runtime there are usually two different relocation tables: the main relocation table and the PLT relocation table.

The main relocation table (.rela.dyn section) is located with DT_RELA (address), DT_RELASZ (byte size of the relocation table), DT_RELAENT (byte size of a relocation entry) for relocation tables with addend. The main relocation table without addend uses DT_REL, DT_RELSZ and DT_RELENT.

Another relocation table (.rela.plt section) is used for the PLT. It is located with: DT_JMPREL (address) and DT_PLTRELSZ (byte size of the relocation table). The DT_PLTREL gives the type of relocation table (either DT_RELA or DT_REL) used for the PLT.

The DT_PLTGOT is the address of the PLT GOT (.got.plt). The dynamic linker needs to know it because the first entries of the PLT GOT are used by the dynamic linker.

Symbol lookup

Each relocation implies a symbol lookup.

In ELF, symbol resolution uses a mostly3 flat-namespace22: a used symbol is not bound to a specific DSO and it is searched in the executable and all DSOs with a breadth-first search30 (using the order of the DT_NEEDED entries).

This search is O(#modules). For each executable or shared-object, a hash table (DT_HASH, DT_GNU_HASH or both) is included in the file (and available at runtime) in order to speed up the symbol lookup.

Flags

DT_FLAGS is a field of flags:

  • DF_ORIGIN is used when the current shared-object uses the $ORIGIN variable.

  • If the DF_SYMBOLIC flag is present, a given shared-object will always use its local definitions before definitions from another shared-object or the executable.

  • If the DF_TEXTREL24 flag is present, text relocations are used: relocations are done in a non-writable segment (usually the text segment) and the dynamic linker might need to make this segment temporarily writable. It is usually not present because it prevents sharing of the text segment between different processes.

  • DF_BIND_NOW (ld -z now) disables lazy relocations for the generated executable or shared-object.

Initialisation and termination functions

Initialisation functions are called in this order:

  1. DT_PREINIT_ARRAY array (of byte size DT_PREINIT_ARRAYSZ) of preinitialisation function addresses;

  2. DT_INIT, address of an initialisation function (the .init section);

  3. DT_INIT_ARRAY array (of byte size DT_INIT_ARRAYSZ) of initialisation function addresses.

Termination functions are called in this order:

  1. DT_FINI_ARRAY array (of byte size DT_FINI_ARRAYSZ) of termination function addresses;

  2. DT_FINI, address of a termination function (the .fini section).
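As an illustration, assuming GCC or Clang on a typical GNU/Linux toolchain (where functions marked with the constructor and destructor attributes usually end up in the DT_INIT_ARRAY and DT_FINI_ARRAY arrays), an initialisation function runs before main() and a termination function after it. The function names are made up.

```c
#include <stdio.h>

static int setup_done = 0;

/* Registered as an initialisation function: called before main(). */
__attribute__((constructor)) static void setup(void)
{
  setup_done = 1;
}

/* Registered as a termination function: called after main() returns
 * (or when exit() is called). */
__attribute__((destructor)) static void teardown(void)
{
  puts("teardown called");
}

/* Returns 1 when setup() has already run, which is expected to be the
 * case for any call made from main(). */
int setup_already_ran(void)
{
  return setup_done;
}
```

readelf -d on the resulting executable should show the DT_INIT_ARRAY and DT_FINI_ARRAY entries.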

Debug interface

If a DT_DEBUG entry is present, its value is set by the dynamic linker to the address of a struct r_debug (see link.h):

struct r_debug
{
  int r_version;          /* Version number for this protocol. */
  struct link_map *r_map; /* Head of the chain of loaded objects.  */
  ElfW(Addr) r_brk;
  enum {
    RT_CONSISTENT,        /* Mapping change is complete.  */
    RT_ADD,               /* Beginning to add a new object.  */
    RT_DELETE             /* Beginning to remove an object mapping.  */
  } r_state;
  ElfW(Addr) r_ldbase;    /* Base address the linker is loaded at.  */
};

This can be used to traverse the list of executables and shared-objects (of a given namespace):

struct link_map {
  /* These first few members are part of the protocol with the debugger.
     This is the same format used in SVR4.  */
  ElfW(Addr) l_addr;          /* Difference between the address in the ELF
                                 file and the addresses in memory.  */
  char *l_name;               /* Absolute file name object was found in.  */
  ElfW(Dyn) *l_ld;            /* Dynamic section of the shared object.  */
  struct link_map *l_next, *l_prev; /* Chain of loaded objects.  */
};

The struct link_map can be obtained at runtime with dlinfo(handle, RTLD_DI_LINKMAP, &res).

String table

DT_STRTAB and DT_STRSZ give the location and byte size of the string table used by the dynamic section (.dynstr).

Symbol versions

Those entries are GNU extensions for the versioning of symbols:

  • DT_VERSYM is the runtime location of the symbol version table (.gnu.version section of type SHT_GNU_versym). It contains the same number of entries as the dynamic symbol table and each entry references an entry in the version definition table.

  • DT_VERDEF is the runtime location of the version definitions (.gnu.version_d section of type SHT_GNU_verdef) and DT_VERDEFNUM is the number of entries.

  • DT_VERNEED is the runtime location of the version requirements (.gnu.version_r section of type SHT_GNU_verneed) and DT_VERNEEDNUM is the number of entries.

Not covered (much) here

GNU symbol versioning

Main structures:

  • The symbol version table (DT_VERSYM, .gnu.version, SHT_GNU_versym) defines the version associated with each dynamic symbol.

  • The version definition section (DT_VERDEF, DT_VERDEFNUM, .gnu.version_d, SHT_GNU_verdef) defines the versions implemented in this ELF file. It uses the Elf64_Verdef and Elf64_Verdaux structures.

  • The version requirements section (DT_VERNEED, DT_VERNEEDNUM, .gnu.version_r, SHT_GNU_verneed) defines for each imported (DT_NEEDED) entry the required versions. It uses the Elf64_Verneed and Elf64_Vernaux structures.

See the LSB.

TLS

The ELF file contains an initialisation image for the TLS data:

  • the SHF_TLS section flag is used for TLS initialisation sections (.tdata and .tbss);

  • The PT_TLS program header type is used to describe the location of the initialisation image of the TLS data and is contained in a PT_LOAD segment. It contains all the SHF_TLS sections.

  • The STT_TLS symbol type is used for TLS data symbols. They are expected to be located in the PT_TLS range.

See ELF Handling For Thread Local Storage.

COMDAT

COMDAT refers to the ability of the static linker to remove redundant code and data when combining different .o files. This is used in C++ when instantiating templates. In order to do this, the compiler creates dedicated sections for each template instantiation.

For example, this C++ code:

#include <string>

std::string foo(std::string& x)
{
  return x + x;
}

Generates the following sections in the relocatable object:

$ readelf -WS test.o
There are 26 section headers, starting at offset 0xc058:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .group            GROUP           0000000000000000 000040 00000c 04     24  18  4
  [ 2] .text             PROGBITS        0000000000000000 00004c 00002d 00  AX  0   0  1
  [ 3] .rela.text        RELA            0000000000000000 008278 000018 18   I 24   2  8
  [ 4] .data             PROGBITS        0000000000000000 000079 000000 00  WA  0   0  1
  [ 5] .bss              NOBITS          0000000000000000 000079 000000 00  WA  0   0  1
  [ 6] .text._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ PROGBITS        0000000000000000 000079 000062 00 AXG  0   0  1
  [ 7] .rela.text._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ RELA            0000000000000000 008290 000060 18   I 24   6  8
  [ 8] .gcc_except_table._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ PROGBITS        0000000000000000 0000db 000010 00  AG  0   0  1
[...]

Section groups (sections with .group name and SHT_GROUP type) are used to group related sections: the first Elf32_Word of the group section is a set of flags (GRP_COMDAT is used for COMDAT section groups) and the remaining Elf32_Word values of the section are the indices of the sections belonging to this group.
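The content of a group section can be decoded with a few lines of C. In this sketch the function names and the example_group array are made up; only GRP_COMDAT and Elf32_Word come from elf.h.

```c
#include <elf.h>
#include <stddef.h>

/* Returns 1 if the content of a SHT_GROUP section describes a COMDAT group. */
int group_is_comdat(const Elf32_Word *group)
{
  return (group[0] & GRP_COMDAT) != 0;
}

/* Number of member sections, given the byte size of the group section. */
size_t group_member_count(size_t section_size)
{
  return section_size / sizeof(Elf32_Word) - 1;  /* minus the flag word */
}

/* Made-up group content: the flags followed by two section indices. */
const Elf32_Word example_group[] = { GRP_COMDAT, 6, 7 };
```

The .group section listed above has a size of 0x0c bytes, so it holds the flag word and two section indices.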

ARM

  • .ARM.* section names and SHT_ARM_* section types (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044e/IHI0044E_aaelf.pdf#page=18)

  • PT_ARM_* program header types

References

Authoritative references

Blog posts, articles, books and such


  1. The address of the GOT entry is 0x200990 and the address of .got is 0x200980: the offset of the GOT entry within .got is 0x200990 - 0x200980 = 0x10 = 16. Each GOT entry is 8 bytes on x86_64 so this is the third entry. 

  2. GNU objdump and objcopy both rely on BFD and are unable to see some sections (and can synthesise some others) because of the file-format abstraction of the BFD library. objdump from elfutils (called eu-objdump on some distributions) does not have this limitation (but only has a limited subset of the features of GNU objdump). 

  3. Solaris and GNU systems have the ability to handle different namespaces (see dlmopen()): different shared-objects can be placed in different namespaces. Usually only two namespaces are used: one for the dynamic linker and a second one for the application and the shared-object libraries. 

  4. The C structures (and the associated comments) are taken from the GNU elf.h file. Only the 64 bit variant is displayed here. 

  5. For example, it is used for ARM-based embedded software. 

  6. Notable exceptions are the Apple systems (MacOS X, iOS, Darwin) which use their own Mach-O format (coming from their NeXTSTEP lineage) and Microsoft systems (Windows) which use the PE file format (which is based on the old Unix System V COFF format). 

  7. I wrote this tool because objcopy --dump-section was not completely satisfying. 

  8. This is an extension to the ELF standard not documented in the specification. 

  9. In contrast to PE files, the (readonly) text segment (such as the code) is shared for all processes (and with the filesystem cache) even if the shared-object is loaded at different addresses. In order to achieve this, the code for shared-objects should be compiled as PIC.

    PE files are built with a preferred address and if they must be relocated, the code becomes private to the process. In other words, Windows DLLs do not use PIC.

  10. They are using PIC code. They must be compiled with cc -fpic (or -fPIC). 

  11. They are compiled with cc -fpie (or -fPIE). 

  12. With the GNU BFD linker, the layout of sections after linking is given by a linker script. The default linker script can be seen with ld -verbose. Another linker script can be used with ld -T some_linker_script.

  13. However they can use other techniques such as GOT infection and ROP.

  14. This property is so important that the MPROTECT feature of PaX (a Linux patch) prevents the existence of VMAs which are both executable and writable in most cases in order to enhance security. 

  15. In C, symbols have the name of the corresponding C function or variable on ELF systems.

    In C++, function overloading, templates, namespaces and so on make it more difficult. The name of the object (including the types of its arguments for functions) is mangled to form the symbol. Different name mangling schemes exist, but modern versions of GCC and clang use the name mangling of the C++ Itanium ABI: For example with this ABI, the foo::Something::bar(int) method is mangled into _ZN3foo9Something3barEi. The c++filt program can be used to demangle C++ symbol names (or the __cxa_demangle function). 

  16. The usage of STV_PROTECTED symbols is not recommended because it slows down the dynamic linkage.

  17. The usage of the PLT can be disabled at compile-time (for a given compilation unit) with cc -fno-plt or for a given function with __attribute__((noplt)). This disables lazy binding. 

  18. See GNU Hash ELF Section by Ali Bahrami and How to write Shared Libraries by Ulrich Drepper. 

  19. Each shared-object dependency is described with a DT_NEEDED entry. A typical value is libfoo.so.6 (where 6 is a version number). This file is searched in different directories by the dynamic linker. The same shared object can be present in different incompatible versions.

    The link editor ld links against libfoo.so (using the -lfoo flag) which is a symbolic link to the current version of the shared object. Shared objects usually contain a DT_SONAME entry defining the full (shared-object) name (libfoo.so.6) of this shared-object. This value is copied as a DT_NEEDED entry in the dependent ELF objects.

    If no DT_SONAME is present, the link editor creates a DT_NEEDED entry with libfoo.so instead when given the -lfoo flag.

    If a full path to the shared object is given to ld and this shared object does not have DT_SONAME entry, the full path to the shared object will be used in the DT_NEEDED entry. 

  20. DT_RPATH serves the same purpose but is searched before the LD_LIBRARY_PATH environment variable, which is not considered a good solution. For this reason, DT_RUNPATH was created as a replacement: the values of DT_RUNPATH are searched after the LD_LIBRARY_PATH environment variable. DT_RPATH is deprecated and ignored when DT_RUNPATH is present (and recognised by the dynamic linker). 

  21. There is no size/number of entries for the symbol table at the program header table level. This is not needed at runtime as the symbol lookup always goes through the hash table. 

  22. This is in contrast with Windows PE files and MacOS X which both use a two-level namespace lookup: they import a given symbol from a given DLL or .dylib.

  23. This means that there is usually no runtime relocation in the text segment: all the runtime relocations are done in the data segment.

    If the DF_TEXTREL flag (or a DT_TEXTREL dynamic table entry) is present, text relocations are present in this file. 

  24. The DT_TEXTREL dynamic table entry can be used as well but its usage is deprecated/optional. 

  25. The PLT GOT is still vulnerable to GOT poisoning.

  26. Static libraries (.a files) are archives of .o files. Different formats exist for them. 

  27. As a result, the sections in the ELF files are grouped in three parts:

    1. the sections which belong to the text segment;

    2. the sections which belong to the data segment;

    3. the sections which do not belong to any segment (and are not available/used at runtime).

  28. This is a simplification. Other things influence the order and the set of ELF modules used for a given lookup: DT_SYMBOLIC, dlopen(), dlmopen() etc.

    dlopen-ed shared-objects and their dependencies are not added to the global scope but only to a local scope (unless RTLD_GLOBAL is used).

    dlmopen() can be used to create separate symbol namespaces with their own sets of ELF shared-objects. 

  29. The .debug_frame DWARF section is used to tell the debugger how to unwind each stack frame.

    The .eh_frame has been created in order to unwind the stack at runtime. This is used for exception handling.

    The .eh_frame section contains information for unwinding the frame for each instruction address. This is used by the Itanium C++ exception ABI to unwind the stack on exceptions. Its format is based on the .debug_frame DWARF section.

    If it is present the .debug_frame can be omitted. 

  30. .note.gnu.build-id describes the build-id used to locate a separate ELF file containing the debug information. This is the NT_GNU_BUILD_ID note. 

  31. .gnu.warning and .gnu.warning.XXX contain warning messages displayed by the linker when linking against this ELF file or this symbol respectively.

    Example:

    Hex dump of section '.gnu.warning.gets':
    0x00000000 74686520 60676574 73272066 756e6374 the `gets' funct
    0x00000010 696f6e20 69732064 616e6765 726f7573 ion is dangerous
    0x00000020 20616e64 2073686f 756c6420 6e6f7420  and should not
    0x00000030 62652075 7365642e 00                be used..
    
     
  32. In fact, it creates two GOT sections: .got and .got.plt.

  33. The VMA are the different available/mapped regions in the virtual address space. Each VMA has some properties such as:

    • permissions (rwx);

    • whether it is shared with other processes (MAP_SHARED) or private to this process (MAP_PRIVATE);

    • whether it has an associated file (and the offset of the VMA within the file);

    • etc.

    They are created with mmap() (or similar) or directly by the kernel. On Linux, they can be seen in /proc/$pid/maps or with the pmap tool. 

  34. This is what appears in the .o file. In the shared-object or executable, it is converted to STB_LOCAL and STV_DEFAULT.