The ELF file format
Some notes on the ELF 🧝 file format with references, explanations and some examples.
The ELF (Executable and Linkable Format) file format is a standard file format for executable files, dynamic libraries[1] (DSOs, .so files), compiled compilation units (.o files) and core dumps. It is used on many platforms[2] including many recent Unix-ish systems (System V, GNU, BSD) and embedded software[3].
You might want to read this document alongside the output of readelf, objdump -D[4], objcopy --dump-section, elfcat[5] and/or a hexadecimal editor. You might want to cross-reference with elf.h, the manpage (man 5 elf) or the ELF specs.
Basic structure
The ELF header is located at the beginning of the ELF file and contains information about the target operating system (OS), architecture, the type of ELF file (executable, dynamic library, etc.) and the location of two important structures within the ELF file defining two views of the ELF file:
- the program header table defining the execution view;
- the section header table defining the linking view.
Execution view
The execution view is given by the program header table. This table is used (by the kernel, by the dynamic linker, etc.) to create a runtime image of the program in memory:
- which ranges of the file should be loaded in memory (the segments);
- which dynamic linker should be used (if any);
- which other shared-objects are needed;
- how to resolve the references to other shared-objects;
- etc.
Linking view
The linking view is given by the section header table which describes the location of the different sections (within the file and within the runtime image of the program).
The .o files generated by the compiler are made of different sections (.text for executable code, .data for initialised global variables, .bss for uninitialised global variables, .rodata for read-only global variables, etc.): the link editor combines different .o files in a single executable or DSO (Dynamic Shared Object), by merging the sections of the different .o files with the same name, and generates some others (.got, .dynamic, .plt, .got.plt, etc.)[6].
The linking view is not used at runtime: all the information needed at runtime is in the program header table. Some sections are not used at runtime (debugging information, full symbol table) and are not present in the execution view. Those sections and the section header table can be omitted (or stripped) from the ELF file.
If it is present, this extra information can be used by debugging tools (such as GDB), profiling tools, etc. Many tools for inspection and manipulation of ELF files (readelf, objdump) rely on the section header table to work correctly.
Other important structures
The dynamic section contains important information used for dynamic linking.
Symbol tables list the symbols defined and used by the file.
Hash tables are used for efficient lookup of symbols by their name (symbol table entries by symbol name).
Relocation tables list the relocations needed to relocate the ELF file at a different memory address or to link it to other ELF objects.
String tables are lists of strings which are referenced at other places in the ELF file (for section names in the section header table, for symbol names in the symbol tables, etc.).
The GOT (Global Offset Table) is a table filled by the dynamic linker with addresses of functions and variables. The program uses those entries to get the address of variables or functions which could be located in another ELF module.
The PLT (Procedure Linkage Table) contains trampolines: they are stubs for functions which might be located in another ELF module. The program calls those stubs which call the real function (by dereferencing a corresponding GOT entry). This is used for lazy relocation.
Notes are used to add miscellaneous information (such as GNU ABI (Application Binary Interface) information, GNU build IDs).
ELF header
The ELF header is at the beginning of the ELF file and contains:
- a 4-byte magic number used to identify ELF files (0x7f followed by the "ELF" string);
- information about the ELF file;
- information about the target machine and OS/ABI;
- the location of the main structures of the ELF file, the section header table and the program header table.
The ELF header uses the following structure[7]:
typedef struct {
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;
readelf -h can display the content of the ELF header.
ELF class
The e_ident[EI_CLASS] field describes the ELF class: 32-bit (ELFCLASS32) or 64-bit (ELFCLASS64) for 32-bit and 64-bit programs respectively.
The ELF structures are different for the two ELF classes: the fields are the same but their type and sometimes their order are different (in order to have packed structures). For example, the ELF header uses the Elf32_Ehdr and Elf64_Ehdr structures for ELFCLASS32 and ELFCLASS64 respectively.
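The identification bytes can be checked before anything else in the header is interpreted. The following is a minimal sketch, assuming the glibc elf.h header is available; the function names is_elf and elf_class_bits are made up for this example.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the ELF magic number
 * (0x7f followed by "ELF"), 0 otherwise. */
int is_elf(const unsigned char *buf, size_t len)
{
    return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}

/* Returns 32 or 64 depending on e_ident[EI_CLASS], or 0 if the class
 * is not recognised. The caller must pass at least EI_NIDENT bytes. */
int elf_class_bits(const unsigned char *e_ident)
{
    switch (e_ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return 0;
    }
}
```

The class decides which family of structures (Elf32_* or Elf64_*) must be used to read the rest of the file, so a reader would typically call both functions on the first EI_NIDENT bytes of the file.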
ELF endianness
The e_ident[EI_DATA] field describes the encoding (endianness) of the architecture (either ELFDATA2LSB or ELFDATA2MSB). The fields of the ELF file are encoded in the endianness of the architecture: you might have to swap the byte order (see endian.h) if you process ELF files from a foreign architecture.
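One way to avoid depending on the endianness of the host is to decode the fields byte by byte. This is a sketch under that assumption; elf_read_u16 and elf_read_u32 are hypothetical helper names, and any value other than ELFDATA2MSB is treated as little-endian for simplicity.

```c
#include <elf.h>
#include <stdint.h>

/* Decode a 16-bit field stored at p according to e_ident[EI_DATA]. */
uint16_t elf_read_u16(const unsigned char *p, unsigned char ei_data)
{
    if (ei_data == ELFDATA2MSB)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)(p[0] | (p[1] << 8));      /* ELFDATA2LSB */
}

/* Decode a 32-bit field stored at p according to e_ident[EI_DATA]. */
uint32_t elf_read_u32(const unsigned char *p, unsigned char ei_data)
{
    if (ei_data == ELFDATA2MSB)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);  /* ELFDATA2LSB */
}
```

For example, the two bytes 3e 00 found in the e_machine field of an x86_64 file decode to 0x3e (EM_X86_64) with ELFDATA2LSB.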
ELF type
The ELF type is in the e_type field:
- ET_REL is used for relocatable objects (.o files);
- ET_EXEC is used for executable files (with the exception of PIEs which are ET_DYN);
- ET_DYN is used for dynamic libraries also known as shared-objects (.so files);
- ET_CORE is used for core files[8].
A major difference between ET_EXEC and ET_DYN files is that ET_EXEC files are always fixed at a given position in the virtual address space. In contrast, ET_DYN files can be relocated anywhere in the virtual address space by applying a constant offset to their virtual addresses[9]: the same .so file can be mapped at different locations in different processes[10]. Usually, the shared-object is mapped at address 0 in the ELF file[11].
Normal (ET_EXEC) executables are always mapped at a given location so the location of their subprograms and global variables is always the same for each process. This knowledge can be exploited by an attacker to get control of the process. In order to avoid this, the program can be compiled as a PIE[12] (Position Independent Executable) which can be mapped (relocated) at any address in the process virtual address space. PIEs, being relocatable, are ET_DYN instead of ET_EXEC files.
The Linux kernel (vmlinux) uses the ET_EXEC type and its loadable modules (.ko files) use the ET_REL type.
Location of the header tables
The locations of the section header table and program header table are described in the ELF header:
- in e_phoff, e_phentsize, e_phnum for the program header table (execution view);
- in e_shoff, e_shentsize, e_shnum for the section header table (linking view).
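The three fields of each table are enough to compute the file offset of any entry. The sketch below assumes a 64-bit ELF header already decoded into an Elf64_Ehdr; the function names are made up for this example, and the extended numbering used when there are very many sections is not handled.

```c
#include <elf.h>
#include <stdint.h>

/* File offset of entry `index` of the program header table,
 * or 0 if there is no table or the index is out of range. */
uint64_t program_header_offset(const Elf64_Ehdr *ehdr, uint16_t index)
{
    if (ehdr->e_phoff == 0 || index >= ehdr->e_phnum)
        return 0;
    return ehdr->e_phoff + (uint64_t)index * ehdr->e_phentsize;
}

/* File offset of entry `index` of the section header table,
 * or 0 if there is no table or the index is out of range. */
uint64_t section_header_offset(const Elf64_Ehdr *ehdr, uint16_t index)
{
    if (ehdr->e_shoff == 0 || index >= ehdr->e_shnum)
        return 0;
    return ehdr->e_shoff + (uint64_t)index * ehdr->e_shentsize;
}
```

Using e_phentsize and e_shentsize instead of sizeof(Elf64_Phdr) and sizeof(Elf64_Shdr) follows what the header actually declares for the file being read.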
Section header table
The section header table defines the linking view of the ELF file: each entry defines a section within the file. The compiler generates relocatable objects (.o files) made of different sections (.text, .data, .rodata, .bss, etc.). When the link editor ld combines different relocatable objects into an executable or shared-object, it merges the sections with the same name in a single section in the final output. For example, it combines the .text sections (containing the compiled code) of the different .o files in a single .text section.
The section table is an array of section descriptions with the structure:
typedef struct {
Elf64_Word sh_name; /* Section name (string tbl index) */
Elf64_Word sh_type; /* Section type */
Elf64_Xword sh_flags; /* Section flags */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */
Elf64_Xword sh_size; /* Section size in bytes */
Elf64_Word sh_link; /* Link to another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
The first entry of a section header table is always an empty null section (type SHT_NULL).
readelf -S can display the section header table. readelf -x can be used to get a hexdump of a given ELF section. A raw dump of a section can be produced with objcopy a.out --dump-section .dynstr=/dev/stdout /dev/null | cat. Note that some sections are not visible to objcopy and objdump: you might want to use elfcat[5:1] instead.
Section names
Each section has a name (.text, .data, .rodata, .bss, .got, .plt, etc.): all section names are stored in a string table (.shstrtab). The e_shstrndx field of the ELF header is the index (in the section header table) of the section containing the section names:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
[...]
Section header string table index: 26
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[26] .shstrtab STRTAB 0000000000000000 0001e220
00000000000000f3 0000000000000000 0 0 1
The sh_name field of the section header is the byte offset of the section name within this string table.
Existing sections
| Section name | Type | Usage (and equivalent runtime description) |
|---|---|---|
| .text | SHT_PROGBITS | Main executable code |
| .data | SHT_PROGBITS | Initialised read and write data |
| .rodata | SHT_PROGBITS | Read only data |
| .bss | SHT_NOBITS | Uninitialised read and write data |
| .data.rel.ro | SHT_PROGBITS | |
| .tdata | SHT_PROGBITS | Initialised thread-local data (part of PT_TLS) |
| .tbss | SHT_NOBITS | Uninitialised thread-local data (part of PT_TLS) |
| .init | SHT_PROGBITS | Initialisation code (usually .init, DT_INIT) |
| .fini | SHT_PROGBITS | Termination code (usually .fini, DT_FINI) |
| .init_array | SHT_INIT_ARRAY | Addresses of initialisation functions (DT_INIT_ARRAY and DT_INIT_ARRAYSZ) |
| .fini_array | SHT_FINI_ARRAY | Addresses of termination functions (DT_FINI_ARRAY and DT_FINI_ARRAYSZ) |
| .ctors | SHT_PROGBITS | Similar to .init_array but old-school |
| .dtors | SHT_PROGBITS | Similar to .fini_array but old-school |
| .dynsym | SHT_DYNSYM | Dynamic symbol table (DT_SYMTAB) |
| .dynstr | SHT_STRTAB | String table for dynamic linking (DT_STRTAB) |
| .symtab | SHT_SYMTAB | Full symbol table |
| .symtab_shndx | SHT_SYMTAB_SHNDX | |
| .strtab | SHT_STRTAB | String table used for the symbol table |
| .relaXXX | SHT_RELA | Relocations for section XXX, with addend |
| .relXXX | SHT_REL | Relocations for section XXX, without addend |
| .rela.dyn | SHT_RELA | Other runtime relocations, with addend |
| .rel.dyn | SHT_REL | Other runtime relocations, without addend |
| .rela.plt | SHT_RELA | PLT relocations, with addend |
| .rel.plt | SHT_REL | PLT relocations, without addend |
| .got | SHT_PROGBITS | Main GOT |
| .got.plt | SHT_PROGBITS | PLT GOT, GOT used by the PLT (lazy relocations) |
| .hash | SHT_HASH | Standard symbol hash table (DT_HASH) |
| .gnu.hash | SHT_GNU_HASH | GNU symbol hash table (DT_GNU_HASH) |
| .gnu.version | SHT_VERSYM | GNU symbol versions (DT_VERSYM) |
| .gnu.version_r | SHT_VERNEED | GNU versions requirements (DT_VERNEED and DT_VERNEEDNUM) |
| .gnu.version_d | SHT_VERDEF | GNU versions definitions (DT_VERDEF and DT_VERDEFNUM) |
| .debug_info | SHT_PROGBITS | DWARF, Main DWARF section (variables, subprograms, types, etc.) |
| .debug_abbrev | SHT_PROGBITS | DWARF, Type of the nodes in .debug_info |
| .debug_aranges | SHT_PROGBITS | DWARF |
| .debug_line | SHT_PROGBITS | DWARF, Mapping between instruction and source code lines |
| .debug_str | SHT_PROGBITS | DWARF, Strings for DWARF sections |
| .debug_frame | SHT_PROGBITS | DWARF, Stack unwinding information[13] |
| .debug_macro | | Debug macros (GNU extension) |
| .debug_link | | [14] |
| .stab | SHT_PROGBITS | Debugging information in the (old) stab format |
| .stabstr | SHT_PROGBITS | Strings associated with the .stab section |
| .eh_frame | SHT_PROGBITS | Runtime stack unwinding information[13:1] |
| .eh_frame_hdr | SHT_PROGBITS | Header (location and index) of the EH frame table (PT_GNU_EH_FRAME) |
| .shstrtab | SHT_STRTAB | String table for section names |
| .note.XXXX | SHT_NOTE | Note |
| .note.ABI-tag | SHT_NOTE | ABI used in this file (NT_GNU_ABI_TAG) |
| .note.gnu.build-id | SHT_NOTE | Build-id for this build[15] (NT_GNU_BUILD_ID note) |
| .dynamic | SHT_DYNAMIC | Dynamic table, dynamic linking information (PT_DYNAMIC) |
| .interp | SHT_PROGBITS | Interpreter (PT_INTERP) |
| .group | SHT_GROUP | Group of related sections (used for COMDAT) |
| .comment | | |
| .jcr | SHT_PROGBITS | Used for Java (?) |
| .stapsdt.base | | Used for SystemTap SDT |
| .note.stapsdt | | Used for SystemTap SDT |
| .gcc_except_table | SHT_PROGBITS | LSDA (Language Specific Data) for exception handling |
| .gnu.warning | | Warning message when linking against this file[16] |
| .gnu.warning.XXX | SHT_PROGBITS | Warning message when linking against symbol XXX[16:1] |
| .ARM.extab | SHT_PROGBITS | |
| .ARM.exidx | SHT_ARM_EXIDX | |
| .ARM.attributes | SHT_ARM_ATTRIBUTES | |
Section types
- SHT_PROGBITS, section containing data which does not have any special meaning for the link editor;
- SHT_NOBITS, section full of zeros (.bss);
- SHT_NOTE, notes;
- SHT_HASH, standard symbol hash table;
- SHT_GNU_HASH, GNU symbol hash table;
- SHT_DYNSYM, minimum runtime dynamic symbol table (.dynsym);
- SHT_SYMTAB, full symbol table (.symtab);
- SHT_STRTAB, string tables;
- SHT_RELA and SHT_REL, relocation tables (with addend and without addend respectively);
- SHT_INIT_ARRAY and SHT_FINI_ARRAY, addresses of initialisation/termination functions;
- SHT_DYNAMIC, dynamic table (.dynamic section, PT_DYNAMIC segment type);
- SHT_VERDEF;
- SHT_VERSYM;
- SHT_VERNEED;
- SHT_GROUP.
Section link
For symbol tables (SHT_SYMTAB and SHT_DYNSYM) and the dynamic section (SHT_DYNAMIC), the sh_link field gives the index of the string table used to find the strings referenced in the section.
For symbol hash tables (SHT_HASH and SHT_GNU_HASH) and relocation tables (SHT_RELA and SHT_REL), it gives the index of the associated symbol table.
Section info
For relocation tables, the sh_info field gives the index of the section it applies to. This is mostly relevant for .o files. For executables and DSOs on GNU systems, the .rela.dyn section uses 0 because it applies to many different sections and .rela.plt uses the index of the .plt section even if it applies to .got.plt.
For symbol tables, it gives the index of the first non-local symbol in the symbol table, which can be used to skip the STB_LOCAL symbols.
Section flags
The sh_flags field is a set of flags:
- SHF_WRITE and SHF_EXECINSTR define the expected runtime accesses to this section. When the link editor places the section in a segment, it sets the PF_W and PF_X flags accordingly.
- SHF_ALLOC is used for sections which are present in memory at runtime. Those sections are expected to be covered by a PT_LOAD entry.
- SHF_MERGE is used for sections which can be merged to eliminate duplication.
- SHF_STRINGS is used for string table sections.
- SHF_INFO_LINK is used for sections which reference another section in the sh_info field[17].
- SHF_LINK_ORDER.
- SHF_OS_NONCONFORMING is used for sections which need OS-specific processing.
- SHF_GROUP.
- SHF_TLS is used for sections holding TLS (Thread Local Storage).
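The mapping from section flags to segment flags can be sketched as follows. This is a simplified model, not what any particular link editor does: it assumes that every SHF_ALLOC section is readable and ignores everything else the link editor takes into account. The function name is made up for this example.

```c
#include <elf.h>
#include <stdint.h>

/* Segment flags (PF_*) suggested by the flags (SHF_*) of a section.
 * Returns 0 for sections which are not loaded at runtime. */
uint32_t segment_flags_for_section(uint64_t sh_flags)
{
    uint32_t pf;

    if (!(sh_flags & SHF_ALLOC))
        return 0;               /* e.g. .symtab, .debug_info */
    pf = PF_R;                  /* assumption: loaded sections are readable */
    if (sh_flags & SHF_WRITE)
        pf |= PF_W;             /* e.g. .data, .bss, .got */
    if (sh_flags & SHF_EXECINSTR)
        pf |= PF_X;             /* e.g. .text, .plt */
    return pf;
}
```

Sections which end up with the same segment flags are natural candidates for sharing the same PT_LOAD entry.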
Program header table
The program header table defines the execution view of the ELF file:
- location (in memory and on disk) of the segments (parts of the file which must be loaded/mapped in program memory);
- location (in memory and on disk) of some important runtime parts of those segments.
The program table is an array of program headers:
typedef struct {
uint32_t p_type; /* Segment type */
uint32_t p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
uint64_t p_filesz; /* Segment size in file */
uint64_t p_memsz; /* Segment size in memory */
uint64_t p_align; /* Segment alignment */
} Elf64_Phdr;
The program header table can be seen with readelf -l. readelf also shows which sections are located in each region described by a program header entry.
Segments
A PT_LOAD entry describes a segment which must be loaded (typically with mmap()) in the program memory. A typical ELF executable or DSO has two such entries describing two segments[18]:
- The first one is the text segment. It is executable, readable but not writable and contains code and read-only data (.text, .rodata, .plt, .eh_frame, etc.);
- The second one is the data segment. It is readable, writable but not executable and contains the modifiable data (.data, .got, .got.plt, .bss, etc.).
The idea in this separation is that everything which does not need to be written (read-only data, code) should be read-only:
- the text segment is not modified[19] and thus its memory pages can be shared by all processes using this ELF file;
- the pages of the writable segment are automatically unshared by the OS as soon as they are modified (using copy-on-write).
Note: security considerations
Another important property of the design is that executable segments are not writable[20]. If a process has VMAs[21] which are both executable and writable, an attacker might exploit bugs such as buffer overflows in order to write arbitrary code in the program's memory and possibly execute it. If the executable pages are read-only, an attacker can try to write arbitrary code but it will not be executable[22].
Example
A simple hello world program:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001c0 0x00000000000001c0 R E 8
INTERP 0x0000000000000200 0x0000000000400200 0x0000000000400200
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000006dc 0x00000000000006dc R E 200000
LOAD 0x00000000000006e0 0x00000000006006e0 0x00000000006006e0
0x0000000000000230 0x0000000000002288 RW 200000
DYNAMIC 0x00000000000006f8 0x00000000006006f8 0x00000000006006f8
0x00000000000001d0 0x00000000000001d0 RW 8
NOTE 0x000000000000021c 0x000000000040021c 0x000000000040021c
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x00000000000005b4 0x00000000004005b4 0x00000000004005b4
0x0000000000000034 0x0000000000000034 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
We can see the resulting VMAs[21:1] in /proc/$pid/maps of a corresponding process:
- The first VMA (Virtual Memory Area) is the text segment.
- The second VMA is the part of the data segment which is initialised.
- The third VMA is the part of the data segment which has not been initialised. This is the end of the .bss section. This part is not stored in the ELF file and is thus a separate MAP_ANONYMOUS VMA.
00400000-00401000 r-xp 00000000 08:13 27418661 /home/foo/temp/wait
00600000-00601000 rw-p 00000000 08:13 27418661 /home/foo/temp/wait
00601000-00603000 rw-p 00000000 00:00 0
[...]
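The boundaries of these VMAs can be recomputed from the PT_LOAD entry. The sketch below assumes 4 KiB pages (which matches this x86_64 example but is not true everywhere) and is only a model of what the loader does; the structure and function names are made up for this example.

```c
#include <elf.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 0x1000ULL   /* assumption: 4 KiB pages */

struct load_layout {
    uint64_t file_start;  /* start of the file-backed mapping */
    uint64_t file_end;    /* end of the file-backed mapping */
    uint64_t anon_end;    /* end of the anonymous (zero-filled) mapping */
};

static uint64_t page_down(uint64_t addr)
{
    return addr & ~(ASSUMED_PAGE_SIZE - 1);
}

static uint64_t page_up(uint64_t addr)
{
    return page_down(addr + ASSUMED_PAGE_SIZE - 1);
}

/* Page-aligned memory ranges needed for a PT_LOAD entry: the first
 * p_filesz bytes come from the file, the remaining bytes up to p_memsz
 * are zero-filled. */
struct load_layout layout_load_segment(const Elf64_Phdr *ph)
{
    struct load_layout layout;

    layout.file_start = page_down(ph->p_vaddr);
    layout.file_end = page_up(ph->p_vaddr + ph->p_filesz);
    layout.anon_end = page_up(ph->p_vaddr + ph->p_memsz);
    if (layout.anon_end < layout.file_end)
        layout.anon_end = layout.file_end;
    return layout;
}
```

For the second LOAD entry above (virtual address 0x6006e0, file size 0x230, memory size 0x2288), this gives a file-backed range 0x600000-0x601000 followed by an anonymous range ending at 0x603000, which matches the last two lines of /proc/$pid/maps.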
Read only relocations
On GNU systems, the dynamic linker may be instructed to mprotect() the .got section against write access after the relocation is finished. This improves the security by preventing the poisoning of the (non-PLT) GOT[23] after the relocation is done.
This is enabled with ld -z relro (which generates a PT_GNU_RELRO entry) and disabled explicitly with ld -z norelro. When enabled, PT_GNU_RELRO is present in the program header table and describes a range of memory which the dynamic linker can mprotect() after the (non-lazy) relocation is done (the .got section).
The same example program linked with ld -z relro features the additional PT_GNU_RELRO entry:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000070c 0x000000000000070c R E 200000
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000230 0x0000000000002258 RW 200000
DYNAMIC 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000001d0 0x00000000000001d0 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x00000000000005e4 0x00000000004005e4 0x00000000004005e4
0x0000000000000034 0x0000000000000034 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x00000000000001f0 0x00000000000001f0 R 1
This can be seen in /proc/$pid/maps:
- the first VMA is the text segment;
- the second VMA is the part of the data segment (described by PT_GNU_RELRO) which has been protected;
- the third VMA is the part of the data segment which has not been protected;
- the fourth VMA is the part of the data segment which has not been initialised.
00400000-00401000 r-xp 00000000 08:13 27418663 /home/foo/temp/wait2
00600000-00601000 r--p 00000000 08:13 27418663 /home/foo/temp/wait2
00601000-00602000 rw-p 00001000 08:13 27418663 /home/foo/temp/wait2
00602000-00604000 rw-p 00000000 00:00 0
[...]
In addition, ld -z now (DF_BIND_NOW) can be used to disable lazy relocation. By combining the two options, you can get an executable or DSO without .got.plt in which all the GOT is read-only after relocation.
Other program header entries
- PT_PHDR describes the program header table itself if it is available in the program memory.
- PT_INTERP gives the absolute path of the interpreter/dynamic-linker (.interp section) if there is one. This entry is present on dynamically linked executables.
- PT_DYNAMIC describes the location of the dynamic linking information (.dynamic section).
- PT_GNU_EH_FRAME (aka PT_SUNW_EH_FRAME) describes the location of stack unwinding information used at runtime (for exception handling). This is the .eh_frame_hdr section which can be used to locate the .eh_frame section.
- PT_GNU_STACK is an empty segment which can be used to set the permissions of the default stack. This can be used to make the stack executable which is probably not such a good idea.
- PT_NOTE describes the location of notes. All the .note.XXX sections (of type SHT_NOTE) are usually combined into a single PT_NOTE segment.
- PT_TLS is used for initialising TLS.
String tables
String tables are lists of strings. They use the SHT_STRTAB section type. Each string in the string table is terminated by a NUL byte and is referenced by its byte offset from the beginning of the table.
The first entry of a string table is always the empty string (the first byte of a string table is always NUL): the empty string can always be designated with the zero offset.
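Looking up a string is thus a matter of adding an offset to the start of the table. The following sketch adds the bounds checks which are needed when the file is not trusted; strtab_get is a name made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Returns the string found at byte offset `off` in a string table of
 * `size` bytes, or NULL if the offset is out of range or if the string
 * is not NUL-terminated within the table. */
const char *strtab_get(const char *table, size_t size, size_t off)
{
    if (off >= size)
        return NULL;
    if (memchr(table + off, '\0', size - off) == NULL)
        return NULL;
    return table + off;
}
```

Nothing forces an offset to point at the beginning of an entry: an offset in the middle of a string designates its suffix, which allows a table to share the storage of a string and of its suffixes.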
The content of a string section can be displayed with either of:
readelf -p .dynstr a.out
objcopy a.out --dump-section .dynstr=/dev/stdout /dev/null | tr '\000' '\n'
Usages:
- .shstrtab holds the section names;
- .dynstr holds the names of the symbols in the dynamic symbol table .dynsym;
- .strtab holds the names of the symbols in the full symbol table .symtab.
References to string tables:
- the (section index of the) string table used for section names is indicated in the e_shstrndx field of the ELF header;
- many sections which reference strings use the section header field sh_link to give the (section index of the) string table they use;
- in the dynamic section, the string table used is located with the DT_STRTAB entry.
Example of .shstrtab (x86_64 GNU/Linux)
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[27] .shstrtab STRTAB 0000000000000000 000008f1
0000000000000108 0000000000000000 0 0 1
File hexdump:
000008b0: 0000 0000 0000 0000 4743 433a 2028 4465 ........GCC: (De
000008c0: 6269 616e 2034 2e39 2e32 2d31 3029 2034 bian 4.9.2-10) 4
000008d0: 2e39 2e32 0047 4343 3a20 2844 6562 6961 .9.2.GCC: (Debia
000008e0: 6e20 342e 382e 342d 3129 2034 2e38 2e34 n 4.8.4-1) 4.8.4
000008f0: 0000 2e73 796d 7461 6200 2e73 7472 7461 ...symtab..strta
00000900: 6200 2e73 6873 7472 7461 6200 2e69 6e74 b..shstrtab..int
00000910: 6572 7000 2e6e 6f74 652e 4142 492d 7461 erp..note.ABI-ta
00000920: 6700 2e6e 6f74 652e 676e 752e 6275 696c g..note.gnu.buil
00000930: 642d 6964 002e 676e 752e 6861 7368 002e d-id..gnu.hash..
00000940: 6479 6e73 796d 002e 6479 6e73 7472 002e dynsym..dynstr..
00000950: 676e 752e 7665 7273 696f 6e00 2e67 6e75 gnu.version..gnu
00000960: 2e76 6572 7369 6f6e 5f72 002e 7265 6c61 .version_r..rela
00000970: 2e64 796e 002e 7265 6c61 2e70 6c74 002e .dyn..rela.plt..
00000980: 696e 6974 002e 7465 7874 002e 6669 6e69 init..text..fini
00000990: 002e 726f 6461 7461 002e 6568 5f66 7261 ..rodata..eh_fra
000009a0: 6d65 5f68 6472 002e 6568 5f66 7261 6d65 me_hdr..eh_frame
000009b0: 002e 696e 6974 5f61 7272 6179 002e 6669 ..init_array..fi
000009c0: 6e69 5f61 7272 6179 002e 6a63 7200 2e64 ni_array..jcr..d
000009d0: 796e 616d 6963 002e 676f 7400 2e67 6f74 ynamic..got..got
000009e0: 2e70 6c74 002e 6461 7461 002e 6273 7300 .plt..data..bss.
000009f0: 2e63 6f6d 6d65 6e74 0000 0000 0000 0000 .comment........
This string table of section names starts at 0x8f1:
- the first entry is the empty string for section header number 0 (offset 0);
- the second entry is the .symtab string (offset 1);
- the third entry is the .strtab string (offset 9).
Symbols and the symbol table
What is a symbol?
Symbols are used for linking (by the link editor and the dynamic linker).
The C statement:
extern int foo;
int foo = 3;
defines a global variable associated with the foo symbol[24].
A user of this global variable:
extern int foo;
int foo_updater()
{
return foo++;
}
will link to the foo symbol.
The linker will bind the user of the global variable with the global variable because they are using the same symbol name.
Symbol tables
The section header table often includes two different symbol tables:
- the .symtab section (SHT_SYMTAB) lists all the symbols including the local symbols which are not used outside of the ELF file;
- the .dynsym section (SHT_DYNSYM) is a (usually) much smaller symbol table which only contains the imported and exported symbols.
The former can be used by debugging tools and the latter contains the minimum amount of entries for the dynamic linker. For this reason, only the latter is mapped in the process virtual address space and is present in the dynamic table.
The symbol tables are arrays of symbol entries:
typedef struct {
Elf64_Word st_name; /* Symbol name (string tbl index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility (and 0) */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;
At runtime, the dynamic symbol table is given by the dynamic table entry DT_SYMTAB. Its size is not given directly but can be inferred from the hash table (DT_HASH or DT_GNU_HASH).
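For the standard hash table (DT_HASH), the inference is direct: the table begins with two words, nbucket and nchain, and nchain is the number of entries of the symbol table. The sketch below assumes 32-bit hash words, which is the common case but not universal, and does not cover DT_GNU_HASH which needs a walk of the buckets and chains. The function name is made up for this example.

```c
#include <stdint.h>

/* Number of entries of the dynamic symbol table, inferred from a
 * standard (DT_HASH) hash table made of 32-bit words:
 *   hash[0] is nbucket,
 *   hash[1] is nchain, which is the number of symbol table entries. */
uint32_t dynsym_count_from_sysv_hash(const uint32_t *hash)
{
    return hash[1];
}
```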
readelf -s can display the symbol tables.
Symbol type
- STT_OBJECT, global variables;
- STT_FUNC, executable code (function, subprogram, method);
- STT_SECTION, section;
- STT_FILE, gives a file name and precedes the STB_LOCAL symbols of the file;
- STT_TLS, TLS variables such as errno, h_errno.
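The symbol type shares the st_info byte with the symbol binding: elf.h provides the ELF64_ST_TYPE and ELF64_ST_BIND macros to separate them. The helper functions below are a sketch with names made up for this example.

```c
#include <elf.h>

/* Symbol type (STT_*), stored in the low 4 bits of st_info. */
int symbol_type(const Elf64_Sym *sym)
{
    return ELF64_ST_TYPE(sym->st_info);
}

/* Symbol binding (STB_*), stored in the high 4 bits of st_info. */
int symbol_binding(const Elf64_Sym *sym)
{
    return ELF64_ST_BIND(sym->st_info);
}

/* Returns 1 if the symbol describes executable code, 0 otherwise. */
int symbol_is_function(const Elf64_Sym *sym)
{
    return symbol_type(sym) == STT_FUNC;
}
```

The reverse operation is done by ELF64_ST_INFO(binding, type).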
Section index
Each symbol can be associated with a section (by its index).
Some special values are used:
- SHN_UNDEF means that the symbol is undefined. It is not defined in this ELF file but is only referenced by it.
- SHN_COMMON is used for a symbol which has not been allocated yet. The value is an alignment constraint. This is used in .o files for uninitialised global variables (in C). It can be defined in multiple C files and will be instantiated only once.
- SHN_ABS is used for absolute values which are not relocated. It is used for STT_FILE entries and for GNU versioning.
Visibility and binding
| Binding | Visibility | Meaning |
|---|---|---|
| STB_LOCAL | STV_DEFAULT | Local to relocatable object |
| STB_GLOBAL | STV_HIDDEN | Local to the executable or DSO[25] |
| STB_GLOBAL | STV_DEFAULT | Global (visible in other runtime ELF modules) |
Symbol binding
The symbol binding controls the link-time visibility of the symbol (i.e., outside translation units and within a given ELF runtime object but not across runtime ELF objects). It is a part of the st_info field.
- STB_LOCAL symbols are local to a .o file (they are used for static variables and functions in C and for things in anonymous namespaces in C++).
  Multiple symbols with the same name can be in the same ET_EXEC or ET_DYN (originating from different ET_REL): they are usually located after a STT_FILE entry with the source file name of the corresponding compilation unit in the .symtab.
- STB_GLOBAL symbols are visible outside of the .o file.
- STB_WEAK is similar to STB_GLOBAL.
  When combining multiple .o files into one executable or DSO, the link editor will raise an error if multiple STB_GLOBAL versions of the same symbol are defined but a STB_WEAK symbol with the same name as a STB_GLOBAL or another STB_WEAK symbol can appear.
  A weak symbol does not need to be resolved: an unresolved weak symbol has a value of 0. The link editor will not pull .o relocatable objects from .a archives in order to resolve undefined weak symbols.
Symbol visibility
The symbol visibility controls the visibility across executables and DSOs. It is stored in the st_other field. This field is not relevant for STB_LOCAL symbols.
The different values are:
- STV_DEFAULT, global visibility;
- STV_PROTECTED, global visibility but local references do not use the PLT[26];
- STV_HIDDEN, not visible outside of the executable or shared-object;
- STV_INTERNAL, similar to STV_HIDDEN but may have some additional processor-specific semantic. Apparently the intent is that the symbol cannot be accessed outside the module (STV_HIDDEN might be accessed indirectly).
STV_HIDDEN can be used in order to mark symbols which need not be used outside of the DSO:
- by reducing the number of exported symbols, this can speed up the symbol lookups by the dynamic linker;
- the PLT can be avoided and the function is called directly;
- this can avoid unexpected usage by other DSOs of symbols which are not part of the ABI of this DSO.
The visibility of a symbol can be defined in GCC with the visibility attribute:
__attribute__((visibility("hidden")))
int get_answer(void)
{
    return 42;
}
The default visibility can be changed with command-line arguments with recent versions of GCC (gcc -fvisibility=hidden) or with pragmas:
#pragma GCC visibility push(hidden)
int get_answer(void)
{
    return 42;
}
#pragma GCC visibility pop
Relocation tables
The relocation tables are arrays of relocation entries using one of these forms:
typedef struct {
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
} Elf64_Rel;
typedef struct {
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;
The relocations exist in two forms. In both cases an addend (offset) is added to the symbol:
- with explicit addend (Elf64_Rela), the addend is stored in the r_addend field of the relocation entry;
- without explicit addend (Elf64_Rel), the addend is stored at the location being relocated.
readelf -r can display the relocation tables.
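The r_info field packs the symbol index and the relocation type, which can be separated with the ELF64_R_SYM and ELF64_R_TYPE macros of elf.h. The sketch below also shows where the addend comes from in each form; the function names are made up for this example, and addend_from_rel assumes that the relocated field is 64 bits wide, which depends on the relocation type.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Index (in the associated symbol table) of the symbol of a relocation. */
uint32_t reloc_symbol_index(uint64_t r_info)
{
    return (uint32_t)ELF64_R_SYM(r_info);
}

/* Processor-specific relocation type (R_X86_64_* on x86_64). */
uint32_t reloc_type(uint64_t r_info)
{
    return (uint32_t)ELF64_R_TYPE(r_info);
}

/* Explicit addend: stored in the relocation entry itself. */
int64_t addend_from_rela(const Elf64_Rela *rela)
{
    return rela->r_addend;
}

/* Implicit addend: read from the location which will be relocated.
 * Assumption: the relocated field is a 64-bit value. */
int64_t addend_from_rel(const void *relocated_location)
{
    int64_t addend;

    memcpy(&addend, relocated_location, sizeof addend);
    return addend;
}
```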
Relocation address
ET_REL files have one relocation section .rela.foo (or .rel.foo) per relocated section .foo. The r_offset address of the relocation is the offset within the relocated .foo section.
For ET_EXEC and ET_DYN files, there are usually two relocation tables: the normal relocation table .rela.dyn (or .rel.dyn) and the lazy/PLT relocation table .rela.plt (or .rel.plt). The r_offset address of the relocation has a different meaning: it is the (runtime) virtual address of the relocation. The location of the relocation tables is described at runtime in the dynamic section (DT_RELA, DT_REL, DT_RELASZ, DT_RELSZ, DT_RELAENT, DT_RELENT, DT_PLTREL, DT_PLTRELSZ, DT_JMPREL).
GOT
The executable code is (usually) in the read-only segment:
- we do not want the code to be modifiable, for security reasons;
- by not modifying the code, the same physical pages can be shared by all the processes using this ELF object.
As we do not want to modify the code (in the read-only text segment) in order to share it, the dynamic linker cannot relocate the DSO by patching the addresses of the referenced objects in the executable code. Instead, the address of the object is stored by the dynamic linker in the writable segment and the code fetches this address.
The link editor creates a section in the writable segment, the GOT (.got), containing all the slots for those addresses[27]. It creates relocation entries in order to make the dynamic linker store the suitable values in the GOT.
GOT examples for x86_64
Compilation
For example, this C code:
extern int foo;
int get_foo()
{
return foo;
}
compiles into this (gcc -S deref.c -o- -fPIC):
get_foo:
movq foo@GOTPCREL(%rip), %rax
movl (%rax), %eax
ret
foo@GOTPCREL(%rip) resolves to a memory address (an entry in the GOT) where the address of foo is written: the first instruction loads this address into the %rax register. In the next instruction, the processor fetches the foo variable by dereferencing this address.
Relocatable object
When compiled into a relocatable object, we get this relocation:
Relocation section '.rela.text' at offset 0x250 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  000b00000009 R_X86_64_GOTPCREL 0000000000000000 foo - 4
It asks the link editor to generate a GOT entry for the address of foo and fill the relative address of this GOT entry in the instruction (movq foo@GOTPCREL(%rip), %rax). The link editor creates the GOT entry.
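An addend (offset) of -4 is used because RIP-relative addressing on x86_64 uses the address of the next instruction as its base address, whereas the relocation is computed relative to the address of the 4-byte field being patched, which ends exactly where the next instruction begins.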
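The arithmetic behind this addend can be sketched as follows. This is only an illustration: the function names are hypothetical, and the addresses used in the usage note (a movq at 0x1000, a GOT entry at 0x200990) are made up for the example.

```c
#include <stdint.h>

/* Value stored by a 32-bit PC-relative relocation: S + A - P, where S is the
 * target address, A the addend and P the address of the patched field. */
static int32_t pcrel32(uint64_t target, int64_t addend, uint64_t place)
{
    return (int32_t)(target + (uint64_t)addend - place);
}

/* Address computed by the processor for a RIP-relative operand:
 * address of the next instruction + signed 32-bit displacement. */
static uint64_t rip_relative_target(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}
```

With a 7-byte movq located at 0x1000, the displacement field is at 0x1003 (the r_offset of 3 seen above) and the next instruction at 0x1007. pcrel32(0x200990, -4, 0x1003) gives 0x1ff989 and rip_relative_target(0x1007, 0x1ff989) gives back 0x200990: the -4 compensates for the 4 bytes between the patched field and the next instruction.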
Shared object
At runtime, the GOT entry needs to be filled by the dynamic linker. In order to do this, the link editor creates a relocation for the GOT entry in the shared-object:
Relocation section '.rela.dyn' at offset 0x458 contains 9 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200990  000800000006 R_X86_64_GLOB_DAT 00000000002009ec foo + 0
This relocation fills the third entry of the GOT (.got)[28]:
[19] .got PROGBITS 0000000000200980 00000980
0000000000000030 0000000000000008 WA 0 0 8
PLT
The Procedure Linkage Table (PLT) is used for calling functions whose address is not known at link time (because they might be in another shared-object or the executable). The PLT can be disassembled with objdump -D -j .plt.
For example this code:
#include <stdlib.h>
int main(int argc, char** argv)
{
abort();
return 0;
}
is compiled into (gcc test.c -S -o- -fPIC -O3):
main:
subq $8, %rsp
call abort
When disassembling the resulting executable we find that the call to abort has been replaced by a call to a stub, abort@plt (called a trampoline):
0000000000400410 <main>:
400410: 48 83 ec 08 sub $0x8,%rsp
400414: e8 c7 ff ff ff callq 4003e0 <abort@plt>
This trampoline fetches the address of abort from the GOT and jumps to this address:
00000000004003e0 <abort@plt>:
4003e0: ff 25 ea 04 20 00 jmpq *0x2004ea(%rip) # 6008d0 <_GLOBAL_OFFSET_TABLE_+0x18>
4003e6: 68 00 00 00 00 pushq $0x0
4003eb: e9 e0 ff ff ff jmpq 4003d0 <_init+0x28>
All of this is done by the first instruction of this PLT trampoline: the two remaining instructions are used for lazy relocation which is explained afterwards.
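The addresses shown by the disassembler can be recomputed from the instruction bytes. The sketch below (hypothetical helper names) assumes instructions whose last four bytes are a little-endian PC-relative displacement, which is the case for the call and jmpq instructions above:

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian signed 32-bit displacement. */
static int32_t read_disp32(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (int32_t)v;
}

/* Address referenced by an instruction of `length` bytes located at `address`
 * and ending with a PC-relative displacement: next instruction + disp. */
static uint64_t relative_target(uint64_t address, const uint8_t *insn, size_t length)
{
    int32_t disp = read_disp32(insn + length - 4);
    return address + length + (uint64_t)(int64_t)disp;
}
```

For the call at 0x400414 (e8 c7 ff ff ff) this gives 0x4003e0, the address of abort@plt. For the jmpq at 0x4003e0 (ff 25 ea 04 20 00) this gives 0x6008d0, the address of the PLT GOT entry from which the jump target is loaded.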
A relocation exists in order to store the address of abort in this PLT GOT entry:
Relocation section '.rela.plt' at offset 0x360 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006008d0  000100000007 R_X86_64_JUMP_SLO 0000000000000000 abort +
Lazy relocations
Relocation in dynamic linking can slow down the initialisation of the application: each symbol must be looked up in all loaded DSOs and the executable. In order to speed up the relocation of programs, lazy relocation is used for function calls[29]: the corresponding PLT GOT entry is not filled with the address of the function during process initialisation but only when the function is actually called for the first time.
# Special .PLT0 entry:
00000000004003d0 <abort@plt-0x10>:
4003d0: ff 35 ea 04 20 00 pushq 0x2004ea(%rip) # 6008c0 <_GLOBAL_OFFSET_TABLE_+0x8>
4003d6: ff 25 ec 04 20 00 jmpq *0x2004ec(%rip) # 6008c8 <_GLOBAL_OFFSET_TABLE_+0x10>
4003dc: 0f 1f 40 00
# .PLT1 for abort:
00000000004003e0 <abort@plt>:
4003e0: ff 25 ea 04 20 00 jmpq *0x2004ea(%rip) # 6008d0 <_GLOBAL_OFFSET_TABLE_+0x18>
4003e6: 68 00 00 00 00 pushq $0x0
4003eb: e9 e0 ff ff ff jmpq 4003d0 <_init+0x28>
- The dynamic linker preinitialises the PLT GOT,
  - the first entry of the PLT GOT is filled by the dynamic linker with the address of _DYNAMIC;
  - the second entry of the PLT GOT is filled by the dynamic linker with a value used by the dynamic linker to recognise this ELF executable or DSO;
  - the third entry of the PLT GOT is filled with the address of a callback of the dynamic linker;
  - the PLT GOT entry for abort@plt is initially filled with the address of its second instruction (0x4003e6);
- on the first call of the PLT trampoline abort@plt,
  a. the first instruction of the trampoline jumps to the second instruction of the trampoline;
  b. the second instruction of the PLT pushes on the stack the index of this relocation in the relocation table (from DT_JMPREL);
  c. the third instruction jumps to the first entry of the PLT (.PLT0);
  d. this entry pushes the second entry of the PLT GOT on the stack (this is used by the dynamic linker to identify this shared-object);
  e. this entry jumps to the callback of the dynamic linker;
  f. the dynamic linker does the real relocation,
     - it uses the arguments passed on the stack (identifier of this shared-object or executable and index in the relocation table),
     - it resolves the symbol;
     - it updates the PLT GOT entry with the address of the symbol;
     - it jumps to the address of the symbol in order to execute the function;
  g. the function is executed;
- on other calls, the PLT GOT entry now contains the address of the function and the PLT entry jumps to it directly (instead of jumping to .PLT0 and to the dynamic linker).
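The mechanism can be mimicked in plain C with a function pointer playing the role of the PLT GOT entry. This is only an analogy (all the names are hypothetical and no real dynamic linker is involved): the slot initially points to a stub which resolves the function, patches the slot and then calls the function.

```c
typedef int (*func_t)(void);

static int resolver_calls;  /* how many times the "dynamic linker" was entered */

/* The function we ultimately want to call. */
static int real_foo(void) { return 42; }

static int lazy_stub(void);

/* Plays the role of the PLT GOT entry: initially points back to the stub. */
static func_t got_slot = lazy_stub;

/* Plays the role of .PLT0 and of the callback of the dynamic linker. */
static int lazy_stub(void)
{
    resolver_calls++;
    got_slot = real_foo;  /* patch the PLT GOT entry */
    return got_slot();    /* then execute the resolved function */
}

/* Plays the role of foo@plt: indirect jump through the PLT GOT entry. */
static int call_foo(void) { return got_slot(); }
```

The first call_foo() goes through lazy_stub(); the following calls reach real_foo() directly, so the resolver runs only once.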
In the section header table:
- the part of the GOT used by the PLT is in a separate section .got.plt;
- the relocations of this PLT GOT are in a separate section .rela.plt.
In the dynamic section:
- DT_JMPREL and DT_PLTRELSZ give the location and the size of this relocation table (and DT_PLTREL whether it uses addends or not);
- the DT_PLTGOT entry is used to tell the dynamic linker where it should find the (three) special PLT GOT entries.
PLT example for x86_64
Compilation
This time let's compile a function call:
extern int foo(void);
int get_foo()
{
return foo() + 42;
}
We get this assembly (cc -O3 -S -fpic):
get_foo:
.LFB0:
subq $8, %rsp
call foo@PLT
addq $8, %rsp
addl $42, %eax
ret
The foo@PLT operand asks the assembler to use the address of a PLT entry for the foo function.
Relocatable object
We get this relocation in the relocatable object:
Relocation section '.rela.text' at offset 0x260 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  000b00000004 R_X86_64_PLT32    0000000000000000 foo - 4
It asks the link editor to patch the instruction with the 32-bit relative address of the PLT entry for symbol foo. The link editor creates a PLT entry, the corresponding PLT GOT entry (in the .got.plt section) and a relocation entry for this PLT GOT entry (in .rela.plt).
Shared object
We get this relocation in the shared-object:
Relocation section '.rela.plt' at offset 0x4f0 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200960  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
This relocation entry asks the dynamic linker to lazily initialise the PLT GOT entry:
- it will first fill the PLT GOT entry with the address of the second instruction of the associated PLT entry;
- when the PLT entry is called, it will call the dynamic linker which will initialise the PLT GOT entry with the address of the foo symbol.
Some x86_64 relocations
Link time relocations:
- R_X86_64_PC32, 32-bit relative address of the symbol;
- R_X86_64_PLT32, 32-bit relative address of the PLT entry of the symbol;
- R_X86_64_GOTPCREL, relative address of the GOT entry of the symbol.
Runtime relocations:
- R_X86_64_JUMP_SLOT, sets a lazy PLT GOT entry;
- R_X86_64_GLOB_DAT, sets a GOT entry;
- R_X86_64_COPY, copy relocation;
- R_X86_64_RELATIVE;
- R_X86_64_64, stores the 64-bit value of the symbol.
Some x86 relocations
- R_386_32
- R_386_PC32
- R_386_GOT32
- R_386_PLT32
- R_386_COPY, copy relocation
- R_386_GLOB_DAT, sets a GOT entry
- R_386_JMP_SLOT, sets a PLT GOT entry
- R_386_RELATIVE
- R_386_GOTOFF
- R_386_GOTPC
Hash tables
Standard hash table
The standard hash table is built by the link editor. It is described by the .hash SHT_HASH section and by the DT_HASH entry in the dynamic section. Its structure is quite simple:
// Pseudo-C:
struct {
Elf32_Word nbucket; /* Number of buckets */
Elf32_Word nchain; /* Number of entries in .dynsym */
Elf32_Word buckets[nbucket]; /* First entry in the chain */
Elf32_Word chains[nchain]; /* Next entry in the chain */
};
- buckets[hash % nbucket] gives the initial index in both the symbol table and the chain array;
- chains[index] gives the next index in the chain;
- index == STN_UNDEF marks the end of the chain.
The lookup looks like this:
#include <elf.h>
#include <string.h>

/* The standard ELF hash function (not shown here). */
Elf32_Word elf_hash(const char* name);

Elf64_Sym const* lookup_symbol(
  const char* symbol,
  Elf64_Sym const* symbol_table,
  const char* string_table,
  Elf32_Word const* hash_table)
{
  /* hash_table[1] is nchain; it is not needed for the lookup itself. */
  Elf32_Word nbucket = hash_table[0];
  Elf32_Word const* buckets = hash_table + 2;
  Elf32_Word const* chains = hash_table + 2 + nbucket;
  unsigned long hash = elf_hash(symbol);
  /* Iterate on the chain until the STN_UNDEF index is reached: */
  for (Elf32_Word index = buckets[hash % nbucket];
       index != STN_UNDEF;
       index = chains[index])
    if (strcmp(symbol, string_table + symbol_table[index].st_name) == 0)
      return symbol_table + index;
  return NULL;
}
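The hash function used by this table is the one given in the System V gABI. A sketch of it, written here with a 32-bit accumulator so that the result does not depend on the width of unsigned long:

```c
#include <stdint.h>

/* Standard (System V) ELF hash of a NUL-terminated symbol name. */
uint32_t elf_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0;
    while (*p) {
        h = (h << 4) + *p++;
        uint32_t g = h & 0xf0000000u;  /* top 4 bits */
        if (g)
            h ^= g >> 24;              /* fold them back into the low bits */
        h &= ~g;                       /* and clear them */
    }
    return h;
}
```

For example elf_hash("exit") is 0x6cf04 and elf_hash("printf") is 0x77905a6.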
GNU hash table
The GNU hash table is a more efficient alternative to the standard hash table[30]. Both can be present in the same ELF file but modern GNU ELF files usually only contain the GNU hash table. It is described by the .gnu.hash SHT_GNU_HASH section and by the DT_GNU_HASH entry in the dynamic section.
Main differences:
- It adds a Bloom filter in order to speed up negative lookups. Negative lookups are the common case since the symbol is searched in different ELF files in sequence.
- It adds the value of the hash next to each entry in order to avoid useless string comparison.
- It is more cache-friendly because a lookup does not jump around in the hash table memory.
- It uses the DJB hash function.
// Pseudo-C:
struct Gnu_Hash_Header {
uint32_t nbuckets;
uint32_t symndx; /* Index of the first accessible symbol in .dynsym */
uint32_t maskwords; /* Number of words in the Bloom filter */
uint32_t shift2; /* Shift count for the Bloom Filter */
uintXX_t bloom_filter[maskwords];
uint32_t buckets[nbuckets];
uint32_t values[dynsymcount - symndx];
};
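A sketch of the hash function and of the Bloom filter test, assuming an ELF64 file (64-bit Bloom filter words). The function names are hypothetical; the field names follow the pseudo-structure above:

```c
#include <stdint.h>

/* DJB hash as used by the GNU hash table: h = h * 33 + c, starting at 5381. */
static uint32_t gnu_hash(const char *name)
{
    uint32_t h = 5381;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h = h * 33 + *p;
    return h;
}

/* Bloom filter test: returns 0 if the symbol is definitely not in this
 * ELF file, non-zero if it might be (the buckets must then be searched). */
static int bloom_may_contain(const uint64_t *bloom_filter, uint32_t maskwords,
                             uint32_t shift2, uint32_t hash)
{
    uint64_t word = bloom_filter[(hash / 64) % maskwords];
    uint64_t mask = (UINT64_C(1) << (hash % 64))
                  | (UINT64_C(1) << ((hash >> shift2) % 64));
    return (word & mask) == mask;
}
```

Two bits are tested per lookup: when one of them is not set, the dynamic linker can skip this ELF file without touching the buckets, the symbol table or the string table.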
Notes
Each entry of a note section begins with:
typedef struct {
Elf64_Word n_namesz; /* Length of the note's name. */
Elf64_Word n_descsz; /* Length of the note's descriptor. */
Elf64_Word n_type; /* Type of the note. */
} Elf64_Nhdr;
After this comes the note name and the note content:
- The note name is the name of the owner of the note. Each owner can define its own values for the note type. The n_namesz field includes the terminating 0 byte at the end of the name.
Padding is used after the name and the content of the note to ensure 4 byte alignment.
Each note is usually in its own section (.note.XXX) but they are usually grouped in the same PT_NOTE program header entry. readelf -n can display the notes.
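The size computation implied by this padding can be sketched as follows (hypothetical helper names, assuming the 4-byte alignment described above):

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of 4. */
static uint32_t align4(uint32_t n)
{
    return (n + 3u) & ~3u;
}

/* Total size of a note: 12-byte header, padded name, padded descriptor.
 * Adding this to the address of a note gives the address of the next one. */
static size_t note_size(uint32_t namesz, uint32_t descsz)
{
    return 3 * sizeof(uint32_t) + align4(namesz) + align4(descsz);
}
```

For a "GNU" note (n_namesz of 4) with a 20-byte descriptor, note_size(4, 20) is 36 bytes.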
GNU notes
GNU notes use the owner name "GNU" (with a terminating 0 byte) and define these notes:
- NT_GNU_ABI_TAG is used to describe the ABI used by the file. The first 32-bit word is the target system (ELF_NOTE_OS_LINUX for Linux) and the following three words are a minimum version number.
  Example (GNU/Linux 2.6.32):
  Hex dump of section '.note.ABI-tag':
    0x0040021c 04000000 10000000 01000000 474e5500 ............GNU.
    0x0040022c 00000000 02000000 06000000 20000000 ............ ...
- NT_GNU_BUILD_ID is used to associate a build-id to a given build of an ELF executable or shared-object. This is used to locate a separate file containing its debug information.
  Example (d53a4435d14a5ac3009bad8c6f840175b37aa86a):
  Hex dump of section '.note.gnu.build-id':
    0x00400274 04000000 14000000 03000000 474e5500 ............GNU.
    0x00400284 d53a4435 d14a5ac3 009bad8c 6f840175 .:D5.JZ.....o..u
    0x00400294 b37aa86a                            .z.j
- NT_GNU_HWCAP
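The .note.ABI-tag hex dump above can be decoded by hand or with a few lines of C. This sketch assumes a little-endian file (as on x86_64); the structure and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit word. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

struct abi_tag { uint32_t os, major, minor, patch; };

/* Returns 1 and fills *out if `note` is a "GNU" note of type 1
 * (NT_GNU_ABI_TAG) with a 16-byte descriptor, 0 otherwise. */
static int parse_abi_tag(const uint8_t *note, size_t size, struct abi_tag *out)
{
    if (size < 32)
        return 0;
    if (le32(note) != 4 || le32(note + 4) != 16 || le32(note + 8) != 1)
        return 0;
    if (memcmp(note + 12, "GNU", 4) != 0)  /* owner name, with its 0 byte */
        return 0;
    const uint8_t *desc = note + 16;       /* the 4-byte name needs no padding */
    out->os    = le32(desc);               /* 0 is ELF_NOTE_OS_LINUX */
    out->major = le32(desc + 4);
    out->minor = le32(desc + 8);
    out->patch = le32(desc + 12);
    return 1;
}
```

Applied to the 32 bytes of the dump, this gives os = 0 (Linux) and the version 2.6.32.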
CORE notes
See Anatomy of an ELF core file.
LINUX notes
See Anatomy of an ELF core file.
Dynamic section
The dynamic section provides important information for the dynamic linker. A statically linked executable does not have a PT_DYNAMIC entry.
It is an array of entries with the structure:
typedef struct {
Elf64_Sxword d_tag; /* Dynamic entry type */
union {
Elf64_Xword d_val; /* Integer value */
Elf64_Addr d_ptr; /* Address value */
} d_un;
} Elf64_Dyn;
readelf -d can display the content of the dynamic section.
The dynamic table is available at runtime through the _DYNAMIC local symbol. A DT_NULL entry marks the end of the dynamic section.
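Walking the dynamic section is a simple linear scan up to the DT_NULL entry. A minimal sketch (find_dynamic is a hypothetical helper name; a GNU/Linux elf.h is assumed):

```c
#include <elf.h>
#include <stddef.h>

/* Return the first entry with the given tag, or NULL if there is none.
 * The array must be terminated by a DT_NULL entry. */
static const Elf64_Dyn *find_dynamic(const Elf64_Dyn *dyn, Elf64_Sxword tag)
{
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            return dyn;
    return NULL;
}
```

In a dynamically linked x86_64 program, something like find_dynamic(_DYNAMIC, DT_STRTAB) could presumably be used, with _DYNAMIC declared as in link.h (extern ElfW(Dyn) _DYNAMIC[];).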
Shared objects
- DT_NEEDED defines a shared-object dependency;
- DT_SONAME is the name of the current shared-object. This value is copied by the link editor ld as a DT_NEEDED entry of dependent ELF objects[31].
RPATH
The DT_RUNPATH (and DT_RPATH[32]) entry defines additional paths where the shared-objects should be searched.
The dynamic linker (ld.so) recognises several special values in DT_RUNPATH (and DT_RPATH):
- $ORIGIN expands to the directory of the ELF file;
- $LIB expands to lib or lib64 depending on the architecture;
- $PLATFORM expands to x86_64 for x86_64.
The DT_RPATH can be set with ld -rpath='$ORIGIN' (or gcc -Wl,-rpath='$ORIGIN'). ld --enable-new-dtags might be needed to add the DT_RUNPATH entries as well.
Symbols
- DT_SYMTAB gives the runtime location of the symbol table (section .dynsym of type SHT_DYNSYM) and DT_SYMENT gives the byte size of a single entry[33].
- DT_HASH gives the runtime location of the standard symbol hash table (.hash of type SHT_HASH).
- DT_GNU_HASH gives the runtime location of the GNU symbol hash table (section .gnu.hash of type SHT_GNU_HASH).
The type of hash table generated by the link editor can be chosen with ld --hash-style=sysv|gnu|both. By default, the GNU hash table is used on (not-too old) GNU systems.
Relocations
At runtime there are usually two different relocation tables: the main relocation table and the PLT relocation table.
The main relocation table (.rela.dyn section) is located with DT_RELA (address), DT_RELASZ (byte size of the relocation table), DT_RELAENT (byte size of a relocation entry) for relocation tables with addend. The main relocation table without addend uses DT_REL, DT_RELSZ and DT_RELENT.
Another relocation table (.rela.plt section) is used for the PLT. It is located with: DT_JMPREL (address) and DT_PLTRELSZ (byte size of the relocation table). The DT_PLTREL gives the type of relocation table (either DT_RELA or DT_REL) used for the PLT.
The DT_PLTGOT is the address of the PLT GOT (.got.plt). The dynamic linker needs to know it because the first entries of the PLT GOT are used by the dynamic linker.
Symbol lookup
Each relocation implies a symbol lookup.
In ELF, symbol resolution uses a mostly[34] flat namespace[35]: a used symbol is not bound to a specific DSO and it is searched for in the executable and in all the DSOs with a breadth-first search[36] (using the order of the DT_NEEDED entries).
This search is in O(#modules). For each executable or shared-object, a hash table (DT_HASH, DT_GNU_HASH or both) is included in the file (and available at runtime) in order to speed up the symbol lookup.
Flags
DT_FLAGS is a field of flags:
- DF_ORIGIN is used when the current shared-object uses the $ORIGIN variable.
- If the DF_SYMBOLIC flag is present, a given shared-object will always use its local definitions before definitions from another shared-object or the executable.
- If the DF_TEXTREL[37] flag is present, text relocations are used: relocations are done in a non-writable segment (usually the text segment) and the dynamic linker might need to make the text segment temporarily writable. It is usually not present because it prevents sharing of the text segment between different processes.
- DF_BIND_NOW (ld -z now) disables lazy relocations for the generated executable or shared-object.
Initialisation and termination functions
Initialisation functions are called in this order:
- DT_PREINIT_ARRAY, array (of byte size DT_PREINIT_ARRAYSZ) of preinitialisation function addresses;
- DT_INIT, address of an initialisation function (the .init section);
- DT_INIT_ARRAY, array (of byte size DT_INIT_ARRAYSZ) of initialisation function addresses.
Termination functions are called in this order:
- DT_FINI_ARRAY, array (of byte size DT_FINI_ARRAYSZ) of termination function addresses;
- DT_FINI, address of a termination function (the .fini section).
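From C, these arrays are usually populated with the constructor and destructor function attributes of GCC and clang. The sketch below assumes such a compiler on an ELF target; whether the functions end up in .init_array/.fini_array (or in the older .ctors/.dtors sections) depends on the toolchain, and the function names are hypothetical:

```c
#include <stdio.h>

static int init_count;

/* Called by the startup code before main(), typically via DT_INIT_ARRAY. */
__attribute__((constructor))
static void on_load(void)
{
    init_count++;
}

/* Called at program exit (or on dlclose()), typically via DT_FINI_ARRAY. */
__attribute__((destructor))
static void on_unload(void)
{
    fputs("termination function called\n", stderr);
}
```

The resulting entries can presumably be checked with readelf -d (DT_INIT_ARRAY, DT_INIT_ARRAYSZ) and readelf -S (.init_array).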
Debug interface
If a DT_DEBUG entry is present, its value is set by the dynamic linker to the address of a struct r_debug (see link.h):
struct r_debug
{
int r_version; /* Version number for this protocol. */
struct link_map *r_map; /* Head of the chain of loaded objects. */
ElfW(Addr) r_brk;
enum {
RT_CONSISTENT, /* Mapping change is complete. */
RT_ADD, /* Beginning to add a new object. */
RT_DELETE /* Beginning to remove an object mapping. */
} r_state;
ElfW(Addr) r_ldbase; /* Base address the linker is loaded at. */
};
This can be used to traverse the list of executables and shared-objects (of a given namespace):
struct link_map {
/* These first few members are part of the protocol with the debugger.
This is the same format used in SVR4. */
ElfW(Addr) l_addr; /* Difference between the address in the ELF
file and the addresses in memory. */
char *l_name; /* Absolute file name object was found in. */
ElfW(Dyn) *l_ld; /* Dynamic section of the shared object. */
struct link_map *l_next, *l_prev; /* Chain of loaded objects. */
};
The struct link_map can be obtained at runtime with dlinfo(handle, RTLD_DI_LINKMAP, &res).
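Once a struct link_map has been obtained, the whole chain can be traversed using l_prev and l_next. A minimal sketch, assuming the glibc link.h (count_loaded_objects is a hypothetical helper name):

```c
#include <link.h>
#include <stddef.h>

/* Count the objects of a link_map chain, starting from any of its elements. */
static size_t count_loaded_objects(const struct link_map *map)
{
    size_t count = 0;
    /* Rewind to the head of the chain. */
    while (map != NULL && map->l_prev != NULL)
        map = map->l_prev;
    /* Then walk it forward. */
    for (; map != NULL; map = map->l_next)
        count++;
    return count;
}
```

With glibc, the head of the chain is also reachable through the r_map field of the _r_debug variable declared in link.h.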
String table
DT_STRTAB and DT_STRSZ give the location and byte size of the string table used by the dynamic section (.dynstr).
Symbol versions
Those entries are GNU extensions for symbol versioning:
- DT_VERSYM is the runtime location of the symbol version table (.gnu.version section of type SHT_GNU_versym). It contains the same number of entries as the dynamic symbol table and each of its entries references an entry in the version definition table.
- DT_VERDEF is the runtime location of the symbol version definitions (.gnu.version_d section of type SHT_GNU_verdef) and DT_VERDEFNUM is the number of entries.
- DT_VERNEED is the runtime location of the version requirements (.gnu.version_r section of type SHT_GNU_verneed) and DT_VERNEEDNUM is the number of entries.
Not covered (much) here
GNU symbol versioning
Main structures:
- The symbol version table (DT_VERSYM, .gnu.version, SHT_GNU_versym) defines the version associated with each dynamic symbol.
- The version definition section (DT_VERDEF, DT_VERDEFNUM, .gnu.version_d, SHT_GNU_verdef) defines the versions implemented in this ELF file. It uses the Elf64_Verdef and Elf64_Verdaux structures.
- The version requirements section (DT_VERNEED, DT_VERNEEDNUM, .gnu.version_r, SHT_GNU_verneed) defines for each imported (DT_NEEDED) entry the required versions. It uses the Elf64_Verneed and Elf64_Vernaux structures.
See the LSB.
TLS
The ELF file contains an initialisation image for the TLS data:
- The SHF_TLS section flag is used for TLS initialisation sections (.tdata and .tbss).
- The PT_TLS program header type is used to describe the location of the initialisation image of the TLS data and is contained in a PT_LOAD segment. It contains all the SHF_TLS sections.
- The STT_TLS symbol type is used for TLS data symbols. They are expected to be located in the PT_TLS range.
See ELF Handling For Thread Local Storage.
COMDAT
COMDAT refers to the ability of the static linker to remove redundant code and data when combining different .o files. This is used in C++ when instantiating templates. In order to do this, the compiler creates dedicated sections for each template instantiation.
For example, this C++ code:
#include <string>
std::string foo(std::string& x)
{
return x + x;
}
Generates the following sections in the relocatable object:
$ readelf -WS test.o
There are 26 section headers, starting at offset 0xc058:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .group GROUP 0000000000000000 000040 00000c 04 24 18 4
[ 2] .text PROGBITS 0000000000000000 00004c 00002d 00 AX 0 0 1
[ 3] .rela.text RELA 0000000000000000 008278 000018 18 I 24 2 8
[ 4] .data PROGBITS 0000000000000000 000079 000000 00 WA 0 0 1
[ 5] .bss NOBITS 0000000000000000 000079 000000 00 WA 0 0 1
[ 6] .text._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ PROGBITS 0000000000000000 000079 000062 00 AXG 0 0 1
[ 7] .rela.text._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ RELA 0000000000000000 008290 000060 18 I 24 6 8
[ 8] .gcc_except_table._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ PROGBITS 0000000000000000 0000db 000010 00 AG 0 0 1
[...]
Section groups (sections with .group name and SHT_GROUP type) are used to group related sections: the first Elf32_Word of the group section is a set of flags (GRP_COMDAT is used for COMDAT section groups) and the remaining Elf32_Word of the section are the indices of the sections belonging to this group.
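Interpreting the content of a SHT_GROUP section is straightforward. A small sketch with hypothetical helper names, assuming a GNU/Linux elf.h (which defines GRP_COMDAT):

```c
#include <elf.h>
#include <stddef.h>

/* Is this group section (an array of Elf32_Word) a COMDAT group? */
static int is_comdat_group(const Elf32_Word *group)
{
    return (group[0] & GRP_COMDAT) != 0;
}

/* Number of member sections: every word but the first (flags) word. */
static size_t group_member_count(size_t section_size)
{
    return section_size / sizeof(Elf32_Word) - 1;
}
```

The .group section of the listing above has a size of 0x0c bytes: one flags word followed by two section indices.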
ARM
- .ARM.* section names and SHT_ARM_* section types
- PT_ARM_* program header types
References
Authoritative references
- ELF Format
- SystemV gABI (Generic Application Binary Interface):
- System V ABI, the main spec for ELF
- System V ABI - Draft 17, contains update with the System V ABI 4.1
- System V psABI (Processor-specific Application Binary Interface):
- System V ABI ~ Intel386 Supplement 4th edition, the spec for ELF on x86
- System V ABI ~ AMD64 supplement Draft 0.99.6, the spec for ELF on x86_64
- ELF for the ARM Architecture
- ARM ELF File Format
- 64-bit PowerPC ELF Application Binary Interface Supplement
- Linux Programmer's Manual ~ ELF(5)
- GNU Symbol Versioning
- The Linux Standard Base for GNU/Linux specific stuff including:
- The DWARF specification, for debugging information
Blogs posts, articles, books and such
- Relocations, relocations
- Interpreting readelf -r, in this case R_X86_64_PC32
- Inside ELF symbol tables
- How to write Shared Libraries by Ulrich Drepper has a lot of information about the ELF format especially from the point of view of dynamic linking performance (for example the different hash tables)
- DT_GNU_HASH by deroko of ARTeam
- Linkers and names
- Smallest x86 ELF Hello World
- The Itanium C++ ABI
- Itanium C++ ABI, defined a name mangling scheme which is used by G++ on all architectures.
- Itanium C++ ABI: Exception Handling
- The content of DWARF sections
- GNU Hash ELF Sections by Ali Bahrami
- PIC in shared libraries
- PIC in shared libraries on x64
- Linker relro
- Piece of PIE
- ELF Handling For Thread Local Storage
- Linkers and Loaders
- Dynamic Linking: ELF vs. Mach-O
- ELF - No Section Header? No Problem
- The PIE is not exactly a lie
- Load-time relocation of shared libraries
- System V ABI page on OS Dev wiki
- Introduction to ELF slides
Backlinks
- Dissecting mobile native code packers. A case study.
- sysfilter: Automated System Call Filtering for Commodity Software
- Google Summer of Code 2021 Summary ~ 08A: Refactoring ELF binaries loading
- [http://cs.brown.edu/courses/cs1650/lectures.html]
- CSCI 1650 – Software Security and Exploitation
- CSCI 2951U – Topics in Software Security
- CSCI 1951H – Software Security and Exploitation
- How2rev, Rise, O Tarnished. Embark upon the sacred Path of reverse engineering through the CTF trials.
[1] Static libraries (.a files) are archives of .o files. Different formats exist for them.
[2] Notable exceptions are the Apple systems (MacOS X, iOS, Darwin) which use their own Mach-O format (coming from their NeXTSTEP lineage) and Microsoft systems (Windows) which use the PE (Portable Executable) file format (which is based on the old Unix System V COFF format).
[3] For example, it is used for ARM-based embedded software.
[4] GNU objdump and objcopy both rely on BFD and are unable to see some sections (and can synthesise some others) because of the file-format abstraction of the BFD library. objdump from elfutils (called eu-objdump on some distributions) does not have this limitation (but only has a limited subset of the features of GNU objdump).
[5] I wrote this tool because objcopy --dump-section was not completely satisfying.
[6] With the GNU BFD linker, the layout of sections after linking is given by a linker script. The default linker script can be seen with ld -verbose. Another linker script can be used with ld -T some_linker_script.
[7] The C structures (and the associated comments) are taken from the GNU elf.h file. Only the 64 bit variant is displayed here.
[8] This is an extension to the ELF standard not documented in the specification.
[9] They use PIC (Position Independent Code). They must be compiled with cc -fpic (or -fPIC).
[10] In contrast to PE (Portable Executable) files, the (readonly) text segment (such as the code) is shared for all processes (and with the filesystem cache) even if the shared-object is loaded at different addresses. In order to achieve this, the code for shared-objects should be compiled as PIC (Position Independent Code).
PE files are built with a preferred address and if they must be relocated, the code becomes private to the process. In other words, Windows DLLs (Dynamic-Link Libraries) do not use PIC.
[11] Prelinked DSOs are located at a given (non-null) address in the ELF file.
[12] They are compiled with cc -fpie (or -fPIE).
[13] The .debug_frame DWARF section is used to tell the debugger how to unwind each stack frame.
The .eh_frame section has been created in order to unwind the stack at runtime. This is used for exception handling.
The .eh_frame section contains information for unwinding the frame for each instruction address. This is used by the Itanium C++ exception ABI to unwind the stack on exceptions. Its format is based on the .debug_frame DWARF section.
[14] .gnu_debuglink is used to locate a separate file containing debug information. Another solution is to use a NT_GNU_BUILD_ID note.
[15] .note.gnu.build-id describes the build-id used to locate a separate ELF file containing the debug information. This is the NT_GNU_BUILD_ID note.
[16] .gnu.warning and .gnu.warning.XXX contain a warning message displayed by the linker when linking against this ELF file or this symbol respectively.
Example:
Hex dump of section '.gnu.warning.gets':
  0x00000000 74686520 60676574 73272066 756e6374 the `gets' funct
  0x00000010 696f6e20 69732064 616e6765 726f7573 ion is dangerous
  0x00000020 20616e64 2073686f 756c6420 6e6f7420  and should not
  0x00000030 62652075 7365642e 00                be used..
[17] For relocation sections which apply to a single section, the sh_info field is the index of the target section.
[18] As a result, the sections in the ELF files are grouped in three parts:
- the sections which belong to the text segment;
- the sections which belong to the data segment;
- the sections which do not belong to any segment (and are not available/used at runtime).
[19] This means that there is usually no runtime relocation in the text segment: all the runtime relocations are done in the data segment.
If the DF_TEXTREL flag (or a DT_TEXTREL dynamic table entry) is present, text relocations are present in this file.
[20] This property is so important that the MPROTECT feature of PaX (a Linux patch) prevents the existence of VMAs which are both executable and writable in most cases in order to enhance security.
[21] The VMAs are the different available/mapped regions in the virtual address space. Each VMA has some properties such as:
- permissions (rwx);
- whether it is shared with other processes (MAP_SHARED) or private to this process (MAP_PRIVATE);
- whether it has an associated file (and the offset of the VMA within the file);
- etc.
They are created with mmap() (or similar) or directly by the kernel. On Linux, they can be seen in /proc/$pid/maps or with the pmap tool.
[22] However they can use other techniques such as GOT infection and ROP (Return Oriented Programming).
[23] The PLT GOT is still vulnerable to GOT poisoning.
[24] In C, symbols have the name of the corresponding C function or variable on ELF systems.
In C++, function overloading, templates, namespaces and so on make it more difficult. The name of the object (including the types of its arguments for functions) is mangled to form the symbol. Different name mangling schemes exist, but modern versions of GCC and clang use the name mangling of the Itanium C++ ABI. For example with this ABI, the foo::Something::bar(int) method is mangled into _ZN3foo9Something3barEi. The c++filt program can be used to demangle C++ symbol names (or the __cxa_demangle function).
[25] This is what appears in the .o file. In the shared-object or executable, it is converted to STB_LOCAL and STV_DEFAULT.
[26] The usage of STV_PROTECTED symbols is not recommended because it slows down the dynamic linkage.
[27] In fact, it creates two GOT sections: .got and .got.plt.
[28] The address of the GOT entry is 0x200990 and the address of .got is 0x200980: the offset of the GOT entry within .got is 0x200990 - 0x200980 = 0x10 = 16. Each GOT entry is 8 bytes on x86_64 so this is the third entry.
[29] The usage of the PLT can be disabled at compile-time (for a given compilation unit) with cc -fno-plt or for a given function with __attribute__((noplt)). This disables lazy binding.
[30] See GNU Hash ELF Section by Ali Bahrami and How to write Shared Libraries by Ulrich Drepper.
[31] Each shared-object dependency is described with a DT_NEEDED entry. A typical value is libfoo.so.6 (where 6 is a version number). This file is searched in different directories by the dynamic linker. The same shared object can be present in different incompatible versions.
The link editor ld links against libfoo.so (using the -lfoo flag) which is a symbolic link to the current version of the shared object. Shared objects usually contain a DT_SONAME entry defining the full (shared-object) name (libfoo.so.6) of this shared-object. This value is copied as a DT_NEEDED entry in the dependent ELF objects.
If no DT_SONAME is present, the link editor creates a DT_NEEDED entry with libfoo.so instead when given the -lfoo flag.
If a full path to the shared object is given to ld and this shared object does not have a DT_SONAME entry, the full path to the shared object will be used in the DT_NEEDED entry.
[32] DT_RPATH serves the same purpose but is searched before the LD_LIBRARY_PATH environment variable, which is not considered a good solution. For this reason, DT_RUNPATH was created as a replacement: the values of DT_RUNPATH are searched after the LD_LIBRARY_PATH environment variable. DT_RPATH is deprecated and ignored when DT_RUNPATH is present (and recognised by the dynamic linker).
[33] There is no size/number of entries for the symbol table at the program header table level. This is not needed at runtime as the symbol lookup always goes through the hash table.
[34] Solaris and GNU systems have the ability to handle different namespaces (see dlmopen()): different shared-objects can be placed in different namespaces. Usually only two namespaces are used: one for the dynamic linker and a second one for the application and the shared-object libraries.
[35] This is in contrast with Windows PE (Portable Executable) files and MacOS X which both use a two-level namespace lookup: they import a given symbol from a given DLL (Dynamic-Link Library) or .dylib.
[36] This is a simplification. Other things influence the order and the set of ELF modules used for a given lookup: DT_SYMBOLIC, dlopen(), dlmopen(), etc. dlopen-ed shared-objects and their dependencies are not added to the global scope but only to a local scope (unless RTLD_GLOBAL is used). dlmopen() can be used to create separate symbol namespaces with their own sets of ELF shared-objects.
[37] The DT_TEXTREL dynamic table entry can be used as well but its usage is deprecated/optional.