The ELF file format
Some notes on the ELF 🧝 file format with references, explanations and some examples.
The ELF (Executable and Linkable Format) file format is a standard file format for executable files, dynamic libraries[1] (DSOs, .so files), compiled compilation units (.o files) and core dumps. It is used on many platforms[2] including many recent Unix-ish systems (System V, GNU, BSD) and embedded software[3].
You might want to read this document alongside the output of readelf, objdump -D[4], objcopy --dump-section, elfcat[5] and/or a hexadecimal editor. You might want to cross-reference with elf.h, the manpage (man 5 elf) or the ELF specs.
Basic structure
The ELF header is located at the beginning of the ELF file and contains information about the target operating system (OS), architecture, the type of ELF file (executable, dynamic library, etc.) and the location of two important structures within the ELF file defining two views of the ELF file:
- the program header table defining the execution view;
- the section header table defining the linking view.
Execution view
The execution view is given by the program header table. This table is used (by the kernel, by the dynamic linker, etc.) to create a runtime image of the program in memory:
- which ranges of the file should be loaded in memory (the segments);
- which dynamic linker should be used (if any);
- which other shared-objects are needed;
- how to resolve the references to other shared-objects;
- etc.
Linking view
The linking view is given by the section header table which describes the location of the different sections (within the file and within the runtime image of the program).
The .o files generated by the compiler are made of different sections (.text for executable code, .data for initialised global variables, .bss for uninitialised global variables, .rodata for read-only global variables, etc.): the link editor combines different .o files into a single executable or DSO (Dynamic Shared Object) by merging the sections of the different .o files which have the same name, and generates some others (.got, .dynamic, .plt, .got.plt, etc.)[6].
The linking view is not used at runtime: all the information needed at runtime is in the program header table. Some sections are not used at runtime (debugging information, full symbol table) and are not present in the execution view. Those sections and the section header table can be omitted (or stripped) from the ELF file.
If it is present, this extra information can be used by debugging tools (such as GDB), profiling tools, etc. Many tools for inspection and manipulation of ELF files (readelf, objdump) rely on the section header table to work correctly.
Other important structures
The dynamic section contains important information used for dynamic linking.
Symbol tables list the symbols defined and used by the file.
Hash tables are used for efficient lookup of symbols by their name (symbol table entries by symbol name).
Relocation tables list the relocations needed to relocate the ELF file at a different memory address or to link it to other ELF objects.
String tables are lists of strings which are referenced at other places in the ELF file (for section names in the section header table, for symbol names in the symbol tables, etc.).
The GOT (Global Offset Table) is a table filled by the dynamic linker with addresses of functions and variables. The program uses those entries to get the address of variables or functions which could be located in another ELF module.
The PLT (Procedure Linkage Table) contains trampolines: they are stubs for functions which might be located in another ELF module. The program calls those stubs, which call the real function (by dereferencing a corresponding GOT entry). This is used for lazy relocation.
Notes are used to add miscellaneous information (such as GNU ABI (Application Binary Interface) information, GNU build IDs).
ELF header
The ELF header is at the beginning of the ELF file and contains:
- a 4-byte magic number used to identify ELF files (0x7f followed by the "ELF" string);
- information about the ELF file;
- information about the target machine and OS/ABI;
- the location of the main structures of the ELF file, the section header table and the program header table.
The ELF header uses the following structure[7]:
typedef struct {
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;
readelf -h
can display the content of the ELF header.
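As a rough sketch of the first checks a tool such as readelf has to make, the following C function validates the magic number and the class of a buffer assumed to hold the beginning of an ELF file. The function name elf64_header is hypothetical; the types and constants come from elf.h. It assumes that the buffer is suitably aligned and that the file uses the byte order of the host.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the ELF header if `buf` starts
 * with a plausible 64-bit ELF header, NULL otherwise.
 * Assumes `buf` is suitably aligned and in host byte order. */
const Elf64_Ehdr *elf64_header(const void *buf, size_t size)
{
    const Elf64_Ehdr *ehdr = buf;

    if (size < sizeof(Elf64_Ehdr))
        return NULL;                    /* too short to hold the header */
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;                    /* magic is not 0x7f 'E' 'L' 'F' */
    if (ehdr->e_ident[EI_CLASS] != ELFCLASS64)
        return NULL;                    /* 32-bit or invalid class */
    return ehdr;
}
```

Once the header is accepted, fields such as e_type, e_machine or e_phoff can be read directly from the returned structure.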
ELF class
The e_ident[EI_CLASS] field describes the ELF class: 32-bit (ELFCLASS32) or 64-bit (ELFCLASS64) for 32-bit and 64-bit programs respectively.
The ELF structures are different for the two ELF classes: the fields are the same but their type and sometimes their order are different (in order to have packed structures). For example, the ELF header uses the Elf32_Ehdr and Elf64_Ehdr structures for ELFCLASS32 and ELFCLASS64 respectively.
ELF endianness
The e_ident[EI_DATA] field describes the encoding (endianness) of the architecture (either ELFDATA2LSB or ELFDATA2MSB). The fields of the ELF file are encoded in the encoding/endianness of the architecture: you might have to swap the endianness (see endian.h) if you process ELF files from a foreign architecture.
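One way to stay independent of the host byte order is to decode each field byte by byte according to e_ident[EI_DATA]. The sketch below does this for a 16-bit field such as e_type or e_machine; the helper name elf_read_u16 is an assumption made for this illustration.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: decode a 16-bit ELF field stored at `p`, using the
 * byte order announced in e_ident[EI_DATA]. */
uint16_t elf_read_u16(const unsigned char *p, unsigned char ei_data)
{
    if (ei_data == ELFDATA2MSB)
        return (uint16_t)((p[0] << 8) | p[1]);  /* big-endian */
    return (uint16_t)(p[0] | (p[1] << 8));      /* little-endian (ELFDATA2LSB) */
}
```

The same approach extends to 32-bit and 64-bit fields by combining more bytes.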
ELF type
The ELF type is in the e_type field:
- ET_REL is used for relocatable objects (.o files);
- ET_EXEC is used for executable files (with the exception of PIEs which are ET_DYN);
- ET_DYN is used for dynamic libraries also known as shared-objects (.so files);
- ET_CORE is used for core files[8].
A major difference between ET_EXEC and ET_DYN files is that ET_EXEC files are always fixed at a given position in the virtual address space. In contrast, ET_DYN files can be relocated anywhere in the virtual address space by applying a constant offset to their virtual addresses[9]: the same .so file can be mapped at different locations in different processes[10]. Usually, the shared-object is mapped at address 0 in the ELF file[11].
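The constant offset mentioned above can be illustrated with a small sketch. The function name runtime_address and the load_base parameter (the address chosen at load time for an ET_DYN file) are hypothetical names used for this example only.

```c
#include <elf.h>

/* Hypothetical helper: compute the runtime address of a virtual address
 * found in an ELF file. `load_base` is the offset chosen when the file
 * was mapped; it only applies to ET_DYN files. */
Elf64_Addr runtime_address(Elf64_Half e_type, Elf64_Addr load_base,
                           Elf64_Addr vaddr)
{
    if (e_type == ET_DYN)
        return load_base + vaddr;   /* relocated by a constant offset */
    return vaddr;                   /* ET_EXEC: addresses are final */
}
```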
Normal (ET_EXEC) executables are always mapped at a given location so the location of their subprograms and global variables is always the same for each process. This knowledge can be exploited by an attacker to get control of the process. In order to avoid this, the program can be compiled as a PIE[12] (Position Independent Executable) which can be mapped (relocated) at any address in the process virtual address space. PIEs, being relocatable, are ET_DYN instead of ET_EXEC files.
The Linux kernel (vmlinux
) uses the ET_EXEC
type and its loadable modules (.ko
files) use the ET_REL
type.
Location of the header tables
The locations of the section header table and program header table are described in the ELF header:
- in e_phoff, e_phentsize, e_phnum for the program header table (execution view);
- in e_shoff, e_shentsize, e_shnum for the section header table (linking view).
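The three fields work together as an offset, a stride and a count. As a minimal sketch, assuming the whole file has been read into memory in host byte order and is trusted (no bounds checking against the file size), a program header can be fetched like this. The function name read_program_header is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the i-th program header of the ELF file held
 * in `image` into `out`. Returns 0 on success, -1 if `i` is out of range.
 * memcpy() is used so that `image` does not need any particular alignment. */
int read_program_header(const unsigned char *image, size_t i, Elf64_Phdr *out)
{
    Elf64_Ehdr ehdr;

    memcpy(&ehdr, image, sizeof ehdr);
    if (i >= ehdr.e_phnum)
        return -1;
    /* entry i starts at e_phoff + i * e_phentsize */
    memcpy(out, image + ehdr.e_phoff + i * ehdr.e_phentsize, sizeof *out);
    return 0;
}
```

The section header table can be walked the same way with e_shoff, e_shentsize and e_shnum.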
Section header table
The section header table defines the linking view of the ELF file: each entry defines a section within the file. The compiler generates relocatable objects (.o files) made of different sections (.text, .data, .rodata, .bss, etc.). When the link editor ld combines different relocatable objects into an executable or shared-object, it merges the sections with the same name into a single section in the final output. For example, it combines the .text sections (containing the compiled code) of the different .o files into a single .text section.
The section header table is an array of section descriptions with the structure:
typedef struct {
Elf64_Word sh_name; /* Section name (string tbl index) */
Elf64_Word sh_type; /* Section type */
Elf64_Xword sh_flags; /* Section flags */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */
Elf64_Xword sh_size; /* Section size in bytes */
Elf64_Word sh_link; /* Link to another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
The first entry of a section header table is always an empty null section (type SHT_NULL).
readelf -S can display the section header table. readelf -x can be used to get a hexdump of a given ELF section. A raw dump of a section can be produced with objcopy a.out --dump-section .dynstr=/dev/stdout /dev/null | cat. Note that some sections are not visible to objcopy and objdump: you might want to use elfcat[5:1] instead.
Section names
Each section has a name (.text, .data, .rodata, .bss, .got, .plt, etc.): all section names are stored in a string table (.shstrtab). The e_shstrndx field of the ELF header is the index (in the section header table) of the section containing the section names:
ELF Header:
  Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class: ELF64
  [...]
  Section header string table index: 26
Section Headers:
  [Nr] Name Type Address Offset
       Size EntSize Flags Link Info Align
  [26] .shstrtab STRTAB 0000000000000000 0001e220
       00000000000000f3 0000000000000000 0 0 1
The sh_name
field of the section header is the byte offset of the section name within this string table.
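The lookup chain (e_shstrndx, then the sh_offset of that section, then sh_name) can be sketched as follows. This is only an illustration under strong assumptions: the whole file is in memory, in host byte order, and is trusted (no bounds checking). The function name section_name is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the name of section number `index` of the
 * 64-bit ELF file held in `image` (trusted input, host byte order). */
const char *section_name(const unsigned char *image, size_t index)
{
    Elf64_Ehdr ehdr;
    Elf64_Shdr shstrtab, section;

    memcpy(&ehdr, image, sizeof ehdr);
    /* section header of the string table holding the section names */
    memcpy(&shstrtab,
           image + ehdr.e_shoff + (size_t)ehdr.e_shstrndx * ehdr.e_shentsize,
           sizeof shstrtab);
    /* section header of the section we are interested in */
    memcpy(&section, image + ehdr.e_shoff + index * ehdr.e_shentsize,
           sizeof section);
    /* sh_name is a byte offset inside the string table */
    return (const char *)image + shstrtab.sh_offset + section.sh_name;
}
```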
Existing sections
Section name | Type | Usage (and equivalent runtime description) |
---|---|---|
.text | SHT_PROGBITS | Main executable code |
.data | SHT_PROGBITS | Initialised read and write data |
.rodata | SHT_PROGBITS | Read only data |
.bss | SHT_NOBITS | Uninitialised read and write data |
.data.rel.ro | SHT_PROGBITS | |
.tdata | SHT_PROGBITS | Initialised thread-local data (part of PT_TLS ) |
.tbss | SHT_NOBITS | Uninitialised thread-local data (part of PT_TLS ) |
.init | SHT_PROGBITS | Initialisation code (usually .init , DT_INIT ) |
.fini | SHT_PROGBITS | Termination code (usually .fini , DT_FINI ) |
.init_array | SHT_INIT_ARRAY | Addresses of initialisation functions (DT_INIT_ARRAY and DT_INIT_ARRAYSZ) |
.fini_array | SHT_FINI_ARRAY | Addresses of termination functions (DT_FINI_ARRAY and DT_FINI_ARRAYSZ) |
.ctors | SHT_PROGBITS | Similar to .init_array but old-school |
.dtors | SHT_PROGBITS | Similar to .fini_array but old-school |
.dynsym | SHT_DYNSYM | Dynamic symbol table (DT_SYMTAB ) |
.dynstr | SHT_STRTAB | String table for dynamic linking (DT_STRTAB ) |
.symtab | SHT_SYMTAB | Full symbol table |
.symtab_shndx | SHT_SYMTAB_SHNDX | |
.strtab | SHT_STRTAB | String table used for the symbol table |
.relaXXX | SHT_RELA | Relocations for section XXX , with addend |
.relXXX | SHT_REL | Relocations for section XXX , without addend |
.rela.dyn | SHT_RELA | Other runtime relocations, with addend |
.rel.dyn | SHT_REL | Other runtime relocations, without addend |
.rela.plt | SHT_RELA | PLT relocations, with addend |
.rel.plt | SHT_REL | PLT relocations, without addend |
.got | SHT_PROGBITS | Main GOT |
.got.plt | SHT_PROGBITS | PLT GOT, GOT used by the PLT (lazy relocations) |
.hash | SHT_HASH | Standard symbol hash table (DT_HASH ) |
.gnu.hash | SHT_GNU_HASH | GNU symbol hash table (DT_GNU_HASH ) |
.gnu.version | SHT_VERSYM | GNU symbol versions (DT_VERSYM ) |
.gnu.version_r | SHT_VERNEED | GNU versions requirements (DT_VERNEED and DT_VERNEED_NUM ) |
.gnu.version_d | SHT_VERDEF | GNU versions definitions (DT_VERDEF and DT_VERDEF_NUM ) |
.debug_info | SHT_PROGBITS | DWARF, Main DWARF section (variables, subprograms, types, etc.) |
.debug_abbrev | SHT_PROGBITS | DWARF, Type of the nodes in .debug_info |
.debug_aranges | SHT_PROGBITS | DWARF |
.debug_line | SHT_PROGBITS | DWARF, Mapping between instruction and source code lines |
.debug_str | SHT_PROGBITS | DWARF, Strings for DWARF sections |
.debug_frame | SHT_PROGBITS | DWARF, Stack unwinding information[13] |
.debug_macro | | Debug macros (GNU extension) |
.debug_link | | [14] |
.stab | SHT_PROGBITS | Debugging information in the (old) stab format |
.stabstr | SHT_PROGBITS | Strings associated with the .stab section |
.eh_frame | SHT_PROGBITS | Runtime stack unwinding information[13:1] |
.eh_frame_hdr | SHT_PROGBITS | Header (location and index) of the EH frame table (PT_GNU_EH_FRAME ) |
.shstrtab | SHT_STRTAB | String table for section names |
.note.XXXX | SHT_NOTE | Note |
.note.ABI-tag | SHT_NOTE | ABI used in this file (NT_GNU_ABI_TAG ) |
.note.gnu.build-id | SHT_NOTE | Build ID for this build[15] (NT_GNU_BUILD_ID note) |
.dynamic | SHT_DYNAMIC | Dynamic table, dynamic linking information (PT_DYNAMIC ) |
.interp | SHT_PROGBITS | Interpreter (PT_INTERP ) |
.group | SHT_GROUP | Group of related sections (used for COMDAT) |
.comment | | |
.jcr | SHT_PROGBITS | Used for Java (?) |
.stapsdt.base | | Used for SystemTap SDT |
.note.stapsdt | | Used for SystemTap SDT |
.gcc_except_table | SHT_PROGBITS | LSDA (Language Specific Data) for exception handling |
.gnu.warning | | Warning message when linking against this file[16] |
.gnu.warning.XXX | SHT_PROGBITS | Warning message when linking against symbol XXX [16:1] |
.ARM.extab | SHT_PROGBITS | |
.ARM.exidx | SHT_ARM_EXIDX | |
.ARM.attributes | SHT_ARM_ATTRIBUTES | |
Section types
- SHT_PROGBITS, section containing data which does not have any special meaning for the link editor;
- SHT_NOBITS, section full of zeros (.bss);
- SHT_NOTE, notes;
- SHT_HASH, standard symbol hash table;
- SHT_GNU_HASH, GNU symbol hash table;
- SHT_DYNSYM, minimum runtime dynamic symbol table (.dynsym);
- SHT_SYMTAB, full symbol table (.symtab);
- SHT_STRTAB, string tables;
- SHT_RELA and SHT_REL, relocation tables (with addend and without addend respectively);
- SHT_INIT_ARRAY and SHT_FINI_ARRAY, addresses of initialisation/termination functions;
- SHT_DYNAMIC, dynamic table (.dynamic section, PT_DYNAMIC segment type);
- SHT_VERDEF;
- SHT_VERSYM;
- SHT_VERNEED;
- SHT_GROUP.
Section link
For symbol tables (SHT_SYMTAB
and SHT_DYNSYM
) and the dynamic section (SHT_DYNAMIC
), the sh_link
gives the index of the string table used to find the strings referenced in the section.
For symbol hash tables (SHT_HASH
and SHT_GNU_HASH
) and relocation tables (SHT_RELA
and SHT_REL
), it gives the index of the associated symbol table.
Section info
For relocation tables, the sh_info field gives the index of the section it applies to. This is mostly relevant for .o files. For executables and DSOs on GNU systems, .rela.dyn uses 0 because it applies to many different sections and .rela.plt uses the index of the .plt section even if it applies to .got.plt.
For symbol tables, it gives the index of the first non-local symbol in the symbol table, which can be used to skip the STB_LOCAL symbols.
Section flags
The sh_flags field is a set of flags:
- SHF_WRITE and SHF_EXECINSTR define the expected runtime accesses to this section. When the link editor places the section in a segment, it sets the PF_W and PF_X flags of the segment accordingly.
- SHF_ALLOC is used for sections which are present in the program memory at runtime. Those sections are expected to be covered by a PT_LOAD entry.
- SHF_MERGE is used for sections which can be merged to eliminate duplication.
- SHF_STRINGS is used for string table sections.
- SHF_INFO_LINK is used for sections which reference another section in the sh_info field[17].
- SHF_LINK_ORDER
- SHF_OS_NONCONFORMING is used for sections which need OS-specific processing.
- SHF_GROUP
- SHF_TLS is used for sections holding TLS (Thread Local Storage).
Program header table
The program header table defines the execution view of the ELF file:
- location (in memory and on disk) of the segments (parts of the file which must be loaded/mapped in program memory);
- location (in memory and on disk) of some important runtime parts of those segments.
The program table is an array of program headers:
typedef struct {
uint32_t p_type; /* Segment type */
uint32_t p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
uint64_t p_filesz; /* Segment size in file */
uint64_t p_memsz; /* Segment size in memory */
uint64_t p_align; /* Segment alignment */
} Elf64_Phdr;
The program header table can be seen with readelf -l. readelf also shows which sections are located in each region described by a program header entry.
Segments
A PT_LOAD entry represents a segment to load (typically with mmap()) in the program memory. A typical ELF executable or DSO has two such entries describing two segments[18]:
- The first one is the text segment. It is executable, readable but not writable and contains code and read-only data (.text, .rodata, .plt, .eh_frame, etc.);
- The second one is the data segment. It is readable, writable but not executable and contains the modifiable data (.data, .got, .got.plt, .bss, etc.).
The idea in this separation is that everything which does not need to be written (read-only data, code) should be read-only:
- the text segment is not modified[19] and thus its memory pages can be shared for all processes using this ELF file;
- the pages of the writable segment are automatically unshared by the OS as soon as they are modified (using copy-on-write).
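The permissions of a segment are given by the PF_R, PF_W and PF_X bits of p_flags. A loader has to translate them into the protection bits understood by mmap() or mprotect(). A minimal sketch, where the function name segment_prot is an assumption made for this example:

```c
#include <elf.h>
#include <sys/mman.h>

/* Hypothetical helper: translate the p_flags of a PT_LOAD entry into the
 * protection bits expected by mmap() or mprotect(). */
int segment_prot(Elf64_Word p_flags)
{
    int prot = PROT_NONE;

    if (p_flags & PF_R)
        prot |= PROT_READ;
    if (p_flags & PF_W)
        prot |= PROT_WRITE;
    if (p_flags & PF_X)
        prot |= PROT_EXEC;
    return prot;
}
```

With the typical layout described above, the text segment yields PROT_READ | PROT_EXEC and the data segment PROT_READ | PROT_WRITE.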
Note: security considerations
Another important property in the design is that executable segments are not writable[20]. If a process has VMAs[21] which are both executable and writable, an attacker might exploit bugs such as buffer overflows in order to write arbitrary code in the program's memory and possibly execute it. If the executable pages are read-only, the attacker can try to write arbitrary code but it will not be executable[22].
Example
A simple hello world program:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000006dc 0x00000000000006dc  R E    200000
  LOAD           0x00000000000006e0 0x00000000006006e0 0x00000000006006e0
                 0x0000000000000230 0x0000000000002288  RW     200000
  DYNAMIC        0x00000000000006f8 0x00000000006006f8 0x00000000006006f8
                 0x00000000000001d0 0x00000000000001d0  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000005b4 0x00000000004005b4 0x00000000004005b4
                 0x0000000000000034 0x0000000000000034  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
We can see the resulting VMAs[21:1] in /proc/$pid/maps of a corresponding process:
- The first VMA (Virtual Memory Area) is the text segment.
- The second VMA is the part of the data segment which is initialised.
- The third VMA is the part of the data segment which has not been initialised. This is the end of the .bss section. This part is not stored in the ELF file and is thus a separate MAP_ANONYMOUS VMA.
00400000-00401000 r-xp 00000000 08:13 27418661 /home/foo/temp/wait
00600000-00601000 rw-p 00000000 08:13 27418661 /home/foo/temp/wait
00601000-00603000 rw-p 00000000 00:00 0
[...]
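The boundaries of the anonymous VMA can be derived from the second PT_LOAD entry: the file-backed part ends at p_vaddr + p_filesz and the segment ends at p_vaddr + p_memsz, both rounded up to a page boundary. The sketch below assumes 4 KiB pages; the names PAGE_SIZE_ASSUMED, page_up and anonymous_range are hypothetical.

```c
#include <elf.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 0x1000u   /* assumption: 4 KiB pages */

/* Round an address up to the next page boundary. */
uint64_t page_up(uint64_t addr)
{
    return (addr + PAGE_SIZE_ASSUMED - 1) & ~(uint64_t)(PAGE_SIZE_ASSUMED - 1);
}

/* Hypothetical helper: compute the [start, end) range of the anonymous,
 * zero-filled mapping which follows the file-backed part of a PT_LOAD
 * segment (the part where p_memsz exceeds p_filesz). */
void anonymous_range(const Elf64_Phdr *ph, uint64_t *start, uint64_t *end)
{
    *start = page_up(ph->p_vaddr + ph->p_filesz);
    *end = page_up(ph->p_vaddr + ph->p_memsz);
}
```

With the values of the example (p_vaddr 0x6006e0, p_filesz 0x230, p_memsz 0x2288), this gives the range 0x601000-0x603000, which matches the anonymous VMA in the listing.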
Read only relocations
On GNU systems, the dynamic linker may be instructed to mprotect() the .got section against write access after the relocation is finished. This improves security by preventing the poisoning of the (non-PLT) GOT[23] after the relocation is done.
This is enabled with ld -z relro (which generates a PT_GNU_RELRO entry) and disabled explicitly with ld -z norelro. When enabled, PT_GNU_RELRO is present in the program header table and describes a range of memory which the dynamic linker can mprotect() after the (non-lazy) relocation is done (the .got section).
The same example program linked with ld -z relro features the additional PT_GNU_RELRO entry:
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000070c 0x000000000000070c  R E    200000
  LOAD           0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x0000000000000230 0x0000000000002258  RW     200000
  DYNAMIC        0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                 0x00000000000001d0 0x00000000000001d0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000005e4 0x00000000004005e4 0x00000000004005e4
                 0x0000000000000034 0x0000000000000034  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x00000000000001f0 0x00000000000001f0  R      1
This can be seen in /proc/$pid/maps:
- the first VMA is the text segment;
- the second VMA is the part of the data segment (described by PT_GNU_RELRO) which has been protected;
- the third VMA is the part of the data segment which has not been protected;
- the fourth VMA is the part of the data segment which has not been initialised.
00400000-00401000 r-xp 00000000 08:13 27418663 /home/foo/temp/wait2
00600000-00601000 r--p 00000000 08:13 27418663 /home/foo/temp/wait2
00601000-00602000 rw-p 00001000 08:13 27418663 /home/foo/temp/wait2
00602000-00604000 rw-p 00000000 00:00 0
[...]
In addition, ld -z now (DF_BIND_NOW) might be used, which disables lazy relocation. By combining the two options, you can get an executable or DSO without .got.plt, and the whole GOT will be read-only after relocation.
Other program header entries
- PT_PHDR describes the program header table itself if it is available in the program memory.
- PT_INTERP gives the absolute path of the interpreter/dynamic-linker (.interp section) if there is one. This entry is present in dynamically linked executables.
- PT_DYNAMIC describes the location of the dynamic linking information (.dynamic section).
- PT_GNU_EH_FRAME (aka PT_SUNW_EH_FRAME) describes the location of stack unwinding information used at runtime (for exception handling). This is the .eh_frame_hdr section which can be used to locate the .eh_frame section.
- PT_GNU_STACK is an empty segment which can be used to set the permissions of the default stack. This can be used to make the stack executable, which is probably not such a good idea.
- PT_NOTE describes the location of notes. All the .note.XXX sections (of type SHT_NOTE) are usually combined into a single PT_NOTE segment.
- PT_TLS is used for initialising TLS.
String tables
String tables are lists of strings. They use the SHT_STRTAB
section type. Each string in the string table is terminated by a NUL byte and is referenced by its byte offset from the beginning of the table.
The first entry of a string table is always the empty string (the first byte of a string table is always NUL): the empty string can always be designated with the zero offset.
The content of a string section can be displayed with readelf -p .dynstr
or with objcopy a.out --dump-section .dynstr=/dev/stdout /dev/null | tr '\000' '\n'
.
Usages:
- .shstrtab holds the section names;
- .dynstr holds the names of the symbols in the dynamic symbol table .dynsym;
- .strtab holds the names of the symbols in the full symbol table .symtab.
References to string tables:
- the (section index of the) string table used for section names is indicated in the e_shstrndx field of the ELF header;
- many sections which reference strings use the section header field sh_link to give the (section index of the) string table they use;
- in the dynamic section, the string table used is located with the DT_STRTAB entry.
Example of .shstrtab (x86_64 GNU/Linux)
Section Headers:
  [Nr] Name Type Address Offset
       Size EntSize Flags Link Info Align
  [27] .shstrtab STRTAB 0000000000000000 000008f1
       0000000000000108 0000000000000000 0 0 1
File hexdump:
000008b0: 0000 0000 0000 0000 4743 433a 2028 4465 ........GCC: (De
000008c0: 6269 616e 2034 2e39 2e32 2d31 3029 2034 bian 4.9.2-10) 4
000008d0: 2e39 2e32 0047 4343 3a20 2844 6562 6961 .9.2.GCC: (Debia
000008e0: 6e20 342e 382e 342d 3129 2034 2e38 2e34 n 4.8.4-1) 4.8.4
000008f0: 0000 2e73 796d 7461 6200 2e73 7472 7461 ...symtab..strta
00000900: 6200 2e73 6873 7472 7461 6200 2e69 6e74 b..shstrtab..int
00000910: 6572 7000 2e6e 6f74 652e 4142 492d 7461 erp..note.ABI-ta
00000920: 6700 2e6e 6f74 652e 676e 752e 6275 696c g..note.gnu.buil
00000930: 642d 6964 002e 676e 752e 6861 7368 002e d-id..gnu.hash..
00000940: 6479 6e73 796d 002e 6479 6e73 7472 002e dynsym..dynstr..
00000950: 676e 752e 7665 7273 696f 6e00 2e67 6e75 gnu.version..gnu
00000960: 2e76 6572 7369 6f6e 5f72 002e 7265 6c61 .version_r..rela
00000970: 2e64 796e 002e 7265 6c61 2e70 6c74 002e .dyn..rela.plt..
00000980: 696e 6974 002e 7465 7874 002e 6669 6e69 init..text..fini
00000990: 002e 726f 6461 7461 002e 6568 5f66 7261 ..rodata..eh_fra
000009a0: 6d65 5f68 6472 002e 6568 5f66 7261 6d65 me_hdr..eh_frame
000009b0: 002e 696e 6974 5f61 7272 6179 002e 6669 ..init_array..fi
000009c0: 6e69 5f61 7272 6179 002e 6a63 7200 2e64 ni_array..jcr..d
000009d0: 796e 616d 6963 002e 676f 7400 2e67 6f74 ynamic..got..got
000009e0: 2e70 6c74 002e 6461 7461 002e 6273 7300 .plt..data..bss.
000009f0: 2e63 6f6d 6d65 6e74 0000 0000 0000 0000 .comment........
This string table of section names starts at 0x8f1:
- the first entry is the empty string for section header number 0 (offset 0);
- the second entry is the .symtab string (offset 1);
- the third entry is the .strtab string (offset 9).
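Looking up a string by its byte offset is simple enough to sketch in a few lines of C. The helper below adds the bounds checks needed for untrusted files; its name strtab_get is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the string located at byte offset `offset`
 * of a string table of `size` bytes, or NULL if the offset is out of range
 * or if the string is not NUL-terminated within the table. */
const char *strtab_get(const char *table, size_t size, size_t offset)
{
    if (offset >= size)
        return NULL;
    if (memchr(table + offset, '\0', size - offset) == NULL)
        return NULL;
    return table + offset;
}
```

Applied to the beginning of the table dumped above, offset 0 gives the empty string, offset 1 gives .symtab and offset 9 gives .strtab.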
Symbols and the symbol table
What is a symbol?
Symbols are used for linking (by the link editor and the dynamic linker).
The C statement:
extern int foo;
int foo = 3;
defines a global variable associated with the foo
symbol[24].
A user of this global variable:
extern int foo;
int foo_updater()
{
return foo++;
}
will link to the foo
symbol.
The linker will bind the user of the global variable with the global variable because they are using the same symbol name.
Symbol tables
The section header table often includes two different symbol tables:
- the .symtab section (SHT_SYMTAB) lists all the symbols including the local symbols which are not used outside of the ELF file;
- the .dynsym section (SHT_DYNSYM) is a (usually) much smaller symbol table which only contains the imported and exported symbols.
The former can be used by debugging tools and the latter contains the minimum number of entries needed by the dynamic linker. For this reason, only the latter is mapped in the process virtual address space and is present in the dynamic table.
The symbol tables are arrays of symbol entries:
typedef struct {
Elf64_Word st_name; /* Symbol name (string tbl index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility (and 0) */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;
At runtime, the dynamic symbol table is given by the dynamic table entry DT_SYMTAB. Its size is not given and can be inferred from the hash table (DT_HASH or DT_GNU_HASH).
readelf -s can display the symbol tables.
Symbol type
- STT_OBJECT, global variables;
- STT_FUNC, executable code (function, subprogram, method);
- STT_SECTION, section;
- STT_FILE, gives a file name and precedes the STB_LOCAL symbols of the file;
- STT_TLS is used for TLS variables such as errno, h_errno.
Section index
Each symbol can be associated with a section (by its index).
Some special values are used:
- SHN_UNDEF means that the symbol is undefined. It is not defined in this ELF file but is only referenced by it.
- SHN_COMMON is used for a symbol which has not been allocated yet. The value is an alignment constraint. This is used in .o files for uninitialised global variables (in C). It can be defined in multiple C files and will be instantiated only once.
- SHN_ABS is used for absolute values which are not relocated. It is used for STT_FILE entries and for GNU versioning.
Visibility and binding
Binding | Visibility | Meaning |
---|---|---|
STB_LOCAL | STV_DEFAULT | Local to relocatable object |
STB_GLOBAL | STV_HIDDEN | Local to the executable or DSO[25] |
STB_GLOBAL | STV_DEFAULT | Global (visible in other runtime ELF modules) |
Symbol binding
The symbol binding controls the link-time visibility of the symbol (i.e. outside translation units and within a given ELF runtime object but not across runtime ELF objects). It is a part of the st_info field.
- STB_LOCAL symbols are local to a .o file (they are used for static variables and functions in C and for things in anonymous namespaces in C++).
  Multiple symbols with the same name can be in the same ET_EXEC or ET_DYN (originating from different ET_REL): they are usually located after an STT_FILE entry with the source file name of the corresponding compilation unit in the .symtab.
- STB_GLOBAL symbols are visible outside of the .o file.
- STB_WEAK is similar to STB_GLOBAL.
  When combining multiple .o files into one executable or DSO, the link editor will raise an error if multiple STB_GLOBAL versions of the same symbol are defined but a STB_WEAK symbol with the same name as a STB_GLOBAL or another STB_WEAK symbol can appear.
  A weak symbol does not need to be resolved: an unresolved weak symbol has a value of 0. The link editor will not pull .o relocatable objects from .a archives in order to resolve undefined weak symbols.
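The fact that an unresolved weak symbol has the value 0 is commonly used to call a function only if it is available. The sketch below uses the GCC weak attribute; the names optional_hook and call_hook_or_default are hypothetical and optional_hook is deliberately not defined anywhere.

```c
#include <stddef.h>

/* Hypothetical optional function: declared weak, so the link succeeds
 * even if no ELF module defines it. */
extern int optional_hook(void) __attribute__((weak));

int call_hook_or_default(void)
{
    /* An unresolved weak symbol has the value 0: its address is NULL. */
    if (optional_hook != NULL)
        return optional_hook();
    return -1;
}
```

If another object file or DSO defines optional_hook, the same code calls it without being recompiled.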
Symbol visibility
The symbol visibility controls the visibility across executables and DSOs. It is stored in the st_other field. This field is not relevant for STB_LOCAL symbols.
The different values are:
- STV_DEFAULT, global visibility;
- STV_PROTECTED, global visibility but local references do not use the PLT[26];
- STV_HIDDEN, not visible outside of the executable or shared-object;
- STV_INTERNAL, similar to STV_HIDDEN but may have some additional processor-specific semantics. Apparently the intent is that the symbol cannot be accessed outside the module (STV_HIDDEN symbols might be accessed indirectly).
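Binding and visibility are packed in the st_info and st_other fields and can be extracted with the ELF64_ST_BIND and ELF64_ST_VISIBILITY macros of elf.h. As a sketch combining the rules above, the following function decides whether a symbol can be referenced from other ELF modules. The function name symbol_is_exported is hypothetical.

```c
#include <elf.h>
#include <stdbool.h>

/* Hypothetical helper: tell whether a symbol of an executable or DSO can
 * be referenced from other ELF modules at runtime. */
bool symbol_is_exported(const Elf64_Sym *sym)
{
    unsigned bind = ELF64_ST_BIND(sym->st_info);
    unsigned visibility = ELF64_ST_VISIBILITY(sym->st_other);

    if (sym->st_shndx == SHN_UNDEF)
        return false;   /* not defined here: imported, not exported */
    if (bind != STB_GLOBAL && bind != STB_WEAK)
        return false;   /* STB_LOCAL: local to its relocatable object */
    return visibility == STV_DEFAULT || visibility == STV_PROTECTED;
}
```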
STV_HIDDEN can be used in order to mark symbols which need not be used outside of the DSO:
- by reducing the number of exported symbols, this can speed up the symbol lookups by the dynamic linker;
- the PLT can be avoided and the function is called directly;
- this can avoid unexpected usage by other DSOs of symbols which are not part of the ABI of this DSO.
The visibility of a symbol can be defined in GCC with the visibility attribute:
__attribute__((visibility("hidden"))) int get_answer(void)
{
return 42;
}
The default visibility can be changed with command-line arguments with recent versions of GCC (gcc -fvisibility=hidden) or with pragmas:
#pragma GCC visibility push(hidden)
int get_answer(void)
{
    return 42;
}
#pragma GCC visibility pop
Relocation tables
The relocation tables are arrays of relocation entries using one of those forms:
typedef struct {
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
} Elf64_Rel;
typedef struct {
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;
The relocations exist in two forms. In both cases an addend is added to the symbol value:
- with explicit addend (Elf64_Rela), the addend is stored in the r_addend field of the relocation entry;
- without explicit addend (Elf64_Rel), the addend is stored at the relocation address.
readelf -r
can display the relocation tables.
Relocation address
ET_REL
files have one relocation section .rela.foo
(or .rel.foo
) per relocated section .foo
. The r_offset
address of the relocation is the offset within the relocated .foo
section.
For ET_EXEC
and ET_DYN
files, there are usually two relocation tables: the normal relocation table .rela.dyn
(or .rel.dyn
) and the lazy/PLT relocation table .rela.plt
(or .rel.plt
). The r_offset
address of the relocation has a different meaning: it is the (runtime) virtual address of the relocation. The location of the relocation tables is described at runtime in the dynamic section (DT_RELA
, DT_REL
, DT_RELASZ
, DT_RELSZ
, DT_RELAENT
, DT_RELENT
, DT_PLTREL
, DT_PLTRELSZ
, DT_JMPREL
).
GOT
The executable code is (usually) in the read-only segment:
- we want to prevent the code from being modified for security reasons;
- by not modifying the code we can share the same physical pages for the code between all processes using this ELF object.
As we do not want to modify the code (in the read-only text segment) in order to share it, the dynamic linker cannot relocate the DSO by patching the addresses of the referenced objects in the executable code. Instead, the address of the object is stored by the dynamic linker in the writable segment and the code fetches this address.
The link editor creates a section in the writable segment, the GOT (.got), containing all the slots for those addresses[27]. It creates relocation entries in order to make the dynamic linker store the suitable values in the GOT.
GOT examples for x86_64
Compilation
For example, this C code:
extern int foo;
int get_foo()
{
return foo;
}
compiles into this (gcc -S deref.c -o- -fPIC
):
get_foo:
movq foo@GOTPCREL(%rip), %rax
movl (%rax), %eax
ret
foo@GOTPCREL(%rip)
resolves to a memory address (an entry in the GOT) where the address of foo
is written: the first instruction stores this address in the %rax
register. In the next instruction, the processor fetches the foo
variable by dereferencing this address.
Relocatable object
When compiled into a relocatable object, we get this relocation:
Relocation section '.rela.text' at offset 0x250 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  000b00000009 R_X86_64_GOTPCREL 0000000000000000 foo - 4
It asks the link editor to generate a GOT entry for the address of foo
and fill the relative address of this GOT entry in the instruction (movq foo@GOTPCREL(%rip), %rax
). The link editor creates the GOT entry.
An addend of -4 is used because RIP-relative operands on x86_64 use the address of the next instruction as a base address, and here the next instruction starts 4 bytes after the beginning of the 32-bit field being relocated.
Shared object
At runtime, the GOT entry needs to be filled by the dynamic linker. In order to do this, the link editor creates a relocation for the GOT entry in the shared-object:
Relocation section '.rela.dyn' at offset 0x458 contains 9 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200990  000800000006 R_X86_64_GLOB_DAT 00000000002009ec foo + 0
This entry sets the third entry in the .got
GOT[28]:
[19] .got PROGBITS 0000000000200980 00000980 0000000000000030 0000000000000008 WA 0 0 8
PLT
The Procedure Linkage Table (PLT) is used for calling functions whose address is not known at link time (because they might be in another shared-object or the executable). The PLT can be disassembled with objdump -D -j .plt
.
For example this code:
#include <stdlib.h>
int main(int argc, char** argv)
{
abort();
return 0;
}
is compiled into (gcc test.c -S -o- -fPIC -O3
):
main:
subq $8, %rsp
call abort
When disassembling the resulting executable we find that the call to abort
has been replaced by a call to a stub, abort@plt
(called a trampoline):
0000000000400410 <main>:
400410: 48 83 ec 08 sub $0x8,%rsp
400414: e8 c7 ff ff ff callq 4003e0 <abort@plt>
This trampoline fetches the address of abort
from the GOT and jumps to this address:
00000000004003e0 <abort@plt>:
4003e0: ff 25 ea 04 20 00 jmpq *0x2004ea(%rip) # 6008d0 <_GLOBAL_OFFSET_TABLE_+0x18>
4003e6: 68 00 00 00 00 pushq $0x0
4003eb: e9 e0 ff ff ff jmpq 4003d0 <_init+0x28>
All of this is done by the first instruction of this PLT trampoline: the two remaining instructions are used for lazy relocation which is explained afterwards.
A relocation exists in order to store the address of abort
in this PLT GOT entry:
Relocation section '.rela.plt' at offset 0x360 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006008d0  000100000007 R_X86_64_JUMP_SLO 0000000000000000 abort +
Lazy relocations
Relocation in dynamic linking can slow down the initialisation of the application: each symbol must be looked up in all loaded DSOs and the executable. In order to speed up the startup of programs, lazy relocation is used for function calls[29]: the corresponding PLT GOT entry is not filled with the address of the function during process initialisation but only when the function is first called.
# Special .PLT0 entry:
00000000004003d0 <abort@plt-0x10>:
4003d0: ff 35 ea 04 20 00 pushq 0x2004ea(%rip) # 6008c0 <_GLOBAL_OFFSET_TABLE_+0x8>
4003d6: ff 25 ec 04 20 00 jmpq *0x2004ec(%rip) # 6008c8 <_GLOBAL_OFFSET_TABLE_+0x10>
4003dc: 0f 1f 40 00
# .PLT1 for abort:
00000000004003e0 <abort@plt>:
4003e0: ff 25 ea 04 20 00 jmpq *0x2004ea(%rip) # 6008d0 <_GLOBAL_OFFSET_TABLE_+0x18>
4003e6: 68 00 00 00 00 pushq $0x0
4003eb: e9 e0 ff ff ff jmpq 4003d0 <_init+0x28>
- The dynamic linker preinitialises the PLT GOT:
  - the first entry of the PLT GOT is filled by the dynamic linker with the address of _DYNAMIC;
  - the second entry of the PLT GOT is filled by the dynamic linker with a value used by the dynamic linker to recognise this ELF executable or DSO;
  - the third entry of the PLT GOT is filled with the address of a callback of the dynamic linker;
  - the PLT GOT entry for abort@plt is initially filled with the address of its second instruction (0x4003e6).
- On the first call of the PLT trampoline abort@plt:
  a. the first instruction of the trampoline jumps to the second instruction of the trampoline;
  b. the second instruction of the PLT entry pushes on the stack the index of this relocation in the relocation table (from DT_JMPREL);
  c. the third instruction jumps to the first entry of the PLT (.PLT0);
  d. this entry pushes the second entry of the PLT GOT on the stack (this is used by the dynamic linker to identify this shared-object);
  e. this entry jumps to the callback of the dynamic linker;
  f. the dynamic linker does the real relocation:
     - it uses the arguments passed on the stack (identifier of this shared-object or executable and index in the relocation table);
     - it resolves the symbol;
     - it updates the PLT GOT entry with the address of the symbol;
     - it jumps to the address of the symbol in order to execute the function;
  g. the function is executed.
- On other calls, the PLT GOT entry now contains the address of the function and the PLT entry jumps to it directly (instead of jumping to .PLT0 and to the dynamic linker).
In the section header table:
- the part of the GOT used by the PLT is in a separate section
.got.plt
; - the relocations of this PLT GOT are in a separate section
.rela.plt
.
In the dynamic section:
- DT_JMPREL and DT_PLTRELSZ give the location and the size of this relocation table (and DT_PLTREL whether it uses addends or not);
- the DT_PLTGOT entry is used to tell the dynamic linker where it should find the (three) special PLT GOT entries.
PLT example for x86_64
Compilation
This time let's compile a function call:
extern int foo(void);
int get_foo()
{
return foo() + 42;
}
We get this assembly (cc -O3 -S -fpic
):
get_foo:
.LFB0:
subq $8, %rsp
call foo@PLT
addq $8, %rsp
addl $42, %eax
ret
The foo@PLT
asks the assembler to use the address of a PLT entry for the foo
function.
Relocatable object
We get this relocation in the relocatable object:
Relocation section '.rela.text' at offset 0x260 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  000b00000004 R_X86_64_PLT32    0000000000000000 foo - 4
It asks the link editor to patch the instruction with the 32-bit relative address of the PLT entry for symbol foo. The link editor creates a PLT entry, the corresponding PLT GOT entry (in the .got.plt section) and a relocation entry for this PLT GOT entry (in .rela.plt).
Shared object
We get this relocation in the shared-object:
Relocation section '.rela.plt' at offset 0x4f0 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200960  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
This relocation entry asks the dynamic linker to lazily initialise the PLT GOT entry:
- it will first fill the PLT GOT entry with the address of the second instruction of the associated PLT entry;
- when the PLT is called, it will call the dynamic linker which will initialise the PLT GOT entry with the address of the
foo
symbol.
Some x86_64 relocations
Link time relocations:
- R_X86_64_PC32, 32-bit relative address of the symbol;
- R_X86_64_PLT32, 32-bit relative address of the PLT entry of the symbol;
- R_X86_64_GOTPCREL, relative address of the GOT entry of the symbol.
Runtime relocations:
- R_X86_64_JUMP_SLOT, sets a lazy PLT GOT entry;
- R_X86_64_GLOB_DAT, sets a GOT entry;
- R_X86_64_COPY, copy relocation;
- R_X86_64_RELATIVE;
- R_X86_64_64, stores the 64-bit value of the symbol.
Some x86 relocations
- R_386_32
- R_386_PC32
- R_386_GOT32
- R_386_PLT32
- R_386_COPY, copy relocation
- R_386_GLOB_DAT, sets a GOT entry
- R_386_JMP_SLOT, sets a PLT GOT entry
- R_386_RELATIVE
- R_386_GOTOFF
- R_386_GOTPC
Hash tables
Standard hash table
The standard hash table is built by the link editor. It is described by the .hash
SHT_HASH
section and by the DT_HASH
entry in the dynamic section. Its structure is quite simple:
// Pseudo-C:
struct {
Elf32_Word nbucket; /* Number of buckets */
Elf32_Word nchain; /* Number of entries in .dynsym */
Elf32_Word buckets[nbucket]; /* First entry in the chain */
Elf32_Word chains[nchain]; /* Next entry in the chain */
};
- buckets[hash % nbucket] gives the initial index in both the symbol table and the chain array;
- chains[index] gives the next index in the chain;
- index == STN_UNDEF marks the end of the chain.
The lookup looks like this:
#include <elf.h>
#include <string.h>

/* ELF hash function defined by the gABI. */
unsigned long elf_hash(const unsigned char* name);

Elf64_Sym const* lookup_symbol(
    const char* symbol,
    Elf64_Sym const* symbol_table,
    const char* string_table,
    Elf32_Word const* hash_table)
{
    Elf32_Word nbucket = hash_table[0];
    /* hash_table[1] is nchain, which is not needed for the lookup. */
    Elf32_Word const* buckets = hash_table + 2;
    Elf32_Word const* chains = hash_table + 2 + nbucket;
    unsigned long hash = elf_hash((const unsigned char*) symbol);
    // Iterate on the chain until the STN_UNDEF terminator:
    for (Elf32_Word index = buckets[hash % nbucket];
         index != STN_UNDEF;
         index = chains[index])
        if (strcmp(symbol, string_table + symbol_table[index].st_name) == 0)
            return symbol_table + index;
    return NULL;
}
GNU hash table
The GNU hash table is a more efficient alternative to the standard hash table[30]. Both can be present in the same ELF file but modern GNU ELF files usually only contain the GNU hash table. It is described by the .gnu.hash
SHT_GNU_HASH
section and by the DT_GNU_HASH
entry in the dynamic section.
Main differences:
- It adds a Bloom filter in order to speed up negative lookups. Negative lookups are the common case since the symbol is searched in different ELF files in sequence.
- It adds the value of the hash next to each entry in order to avoid useless string comparison.
- It is more cache-friendly because it avoids jumping around in the hash table memory.
- It uses the DJB hash function.
// Pseudo-C:
struct Gnu_Hash_Header {
uint32_t nbuckets;
uint32_t symndx; /* Index of the first accessible symbol in .dynsym */
uint32_t maskwords; /* Number of elements (words) in the Bloom filter */
uint32_t shift2; /* Shift count for the Bloom Filter */
uintXX_t bloom_filter[maskwords];
uint32_t buckets[nbuckets];
uint32_t values[dynsymcount - symndx];
};
Notes
Each entry of a note section begins with:
typedef struct {
Elf64_Word n_namesz; /* Length of the note's name. */
Elf64_Word n_descsz; /* Length of the note's descriptor. */
Elf64_Word n_type; /* Type of the note. */
} Elf64_Nhdr;
After this comes the note name and the note content:
- The note name is the name of the owner of the note. Each owner can define its own values for the note type. The
n_namesz
field includes the terminating 0 byte at the end of the name.
Padding is used after the name and the content of the note to ensure 4 byte alignment.
Each note is usually in its own section (.note.XXX
) but they are all grouped in the same program entry. readelf -n
can display the notes.
GNU notes
GNU notes use the owner name "GNU" (with a terminating 0 byte) and define the following note types:
- NT_GNU_ABI_TAG is used to describe the ABI used by the file. The first 32-bit word is the target system (ELF_NOTE_OS_LINUX for Linux) and the following three 32-bit words are a minimum version number.
Example (GNU/Linux 2.6.32):
Hex dump of section '.note.ABI-tag':
  0x0040021c 04000000 10000000 01000000 474e5500 ............GNU.
  0x0040022c 00000000 02000000 06000000 20000000 ............ ...
- NT_GNU_BUILD_ID is used to associate a build-id to a given build of an ELF executable or shared-object. This is used to locate a separate file containing its debug information.
Example (d53a4435d14a5ac3009bad8c6f840175b37aa86a):
Hex dump of section '.note.gnu.build-id':
  0x00400274 04000000 14000000 03000000 474e5500 ............GNU.
  0x00400284 d53a4435 d14a5ac3 009bad8c 6f840175 .:D5.JZ.....o..u
  0x00400294 b37aa86a                            .z.j
- NT_GNU_HWCAP
CORE notes
See Anatomy of an ELF core file.
LINUX notes
See Anatomy of an ELF core file.
Dynamic section
The dynamic section provides important information for the dynamic linker. A statically linked executable does not have a PT_DYNAMIC
entry.
It is an array of entries with the structure:
typedef struct {
Elf64_Sxword d_tag; /* Dynamic entry type */
union {
Elf64_Xword d_val; /* Integer value */
Elf64_Addr d_ptr; /* Address value */
} d_un;
} Elf64_Dyn;
readelf -d
can display the content of the dynamic section.
The dynamic table is available at runtime through the _DYNAMIC
local symbol. A DT_NULL
entry marks the end of the dynamic section.
Shared objects
- DT_NEEDED defines a shared-object dependency;
- DT_SONAME is the name of the current shared-object. This value is copied by the link editor ld as a DT_NEEDED entry of dependent ELF objects[31].
RPATH
The DT_RUNPATH
(and DT_RPATH
[32]) entries define additional paths where the shared-objects should be searched.
The dynamic linker (ld.so
) recognises several special values in DT_RUNPATH
(and DT_RPATH
):
- $ORIGIN expands to the directory of the ELF file;
- $LIB expands to lib or lib64 depending on the architecture;
- $PLATFORM expands to x86_64 for x86_64.
The DT_RPATH
can be set with ld -rpath='$ORIGIN'
(or gcc -Wl,-rpath='$ORIGIN'
). ld --enable-new-dtags
might be needed to add the DT_RUNPATH
entries as well.
Symbols
- DT_SYMTAB gives the runtime location of the symbol table (section .dynsym of type SHT_DYNSYM) and DT_SYMENT gives the byte size of a single entry[33].
- DT_HASH gives the runtime location of the standard symbol hash table (.hash of type SHT_HASH).
- DT_GNU_HASH gives the runtime location of the GNU symbol hash table (section .gnu.hash of type SHT_GNU_HASH).
The type of hash table generated by the link editor can be chosen with ld --hash-style=sysv|gnu|both
. By default, the GNU hash table is used on (not-too-old) GNU systems.
Relocations
At runtime there are usually two different relocation tables: the main relocation table and the PLT relocation table.
The main relocation table (.rela.dyn
section) is located with DT_RELA
(address), DT_RELASZ
(byte size of the relocation table), DT_RELAENT
(byte size of a relocation entry) for relocation tables with addend. The main relocation table without addend uses DT_REL
, DT_RELSZ
and DT_RELENT
.
Another relocation table (.rela.plt
section) is used for the PLT. It is located with: DT_JMPREL
(address) and DT_PLTRELSZ
(byte size of the relocation table). The DT_PLTREL
gives the type of relocation table (either DT_RELA
or DT_REL
) used for the PLT.
The DT_PLTGOT
is the address of the PLT GOT (.got.plt
). The dynamic linker needs to know it because the first entries of the PLT GOT are used by the dynamic linker.
Symbol lookup
Each relocation implies a symbol lookup.
In ELF, symbol resolution uses a mostly[34] flat namespace[35]: a used symbol is not bound to a specific DSO and it is searched in the executable and in all DSOs with a breadth-first search[36] (using the order of the DT_NEEDED
entries).
This search is in O(#modules). For each executable or shared-object, a hash table (DT_HASH
, DT_GNU_HASH
or both) is included in the file (and available at runtime) in order to speed up the symbol lookup.
Flags
DT_FLAGS
is a field of flags:
- DF_ORIGIN is used when the current shared-object uses the $ORIGIN variable.
- If the DF_SYMBOLIC flag is present, a given shared-object will always use its local definitions before definitions from another shared-object or the executable.
- If the DF_TEXTREL[37] flag is present, text relocations are used: relocations are done in a non-writable segment (usually the text segment) and the dynamic linker might need to make the text segment temporarily writable. It is usually not present because it prevents sharing of the text segment between different processes.
- DF_BIND_NOW (ld -z now) disables lazy relocations for the generated executable or shared-object.
Initialisation and termination functions
Initialisation functions are called in this order:
- DT_PREINIT_ARRAY, array (of byte size DT_PREINIT_ARRAYSZ) of preinitialisation function addresses;
- DT_INIT, address of an initialisation function (the .init section);
- DT_INIT_ARRAY, array (of byte size DT_INIT_ARRAYSZ) of initialisation function addresses.
Termination functions are called in this order:
- DT_FINI_ARRAY, array (of byte size DT_FINI_ARRAYSZ) of termination function addresses;
- DT_FINI, address of a termination function (the .fini section).
Debug interface
If a DT_DEBUG
entry is present, its value will be set by the dynamic linker to the address of a struct r_debug
(see link.h
):
struct r_debug
{
int r_version; /* Version number for this protocol. */
struct link_map *r_map; /* Head of the chain of loaded objects. */
ElfW(Addr) r_brk;
enum {
RT_CONSISTENT, /* Mapping change is complete. */
RT_ADD, /* Beginning to add a new object. */
RT_DELETE /* Beginning to remove an object mapping. */
} r_state;
ElfW(Addr) r_ldbase; /* Base address the linker is loaded at. */
};
This can be used to traverse the list of executables and shared-objects (of a given namespace):
struct link_map {
/* These first few members are part of the protocol with the debugger.
This is the same format used in SVR4. */
ElfW(Addr) l_addr; /* Difference between the address in the ELF
file and the addresses in memory. */
char *l_name; /* Absolute file name object was found in. */
ElfW(Dyn) *l_ld; /* Dynamic section of the shared object. */
struct link_map *l_next, *l_prev; /* Chain of loaded objects. */
};
The struct link_map
can be obtained at runtime with dlinfo(handle, RTLD_DI_LINKMAP, &res)
.
String table
DT_STRTAB
and DT_STRSZ
give the location and byte size of the string table used by the dynamic section (.dynstr
).
Symbol versions
Those entries are GNU extensions for the versioning of symbols:
- DT_VERSYM is the runtime location of the symbol version table (.gnu.version section of type SHT_GNU_versym). It contains the same number of entries as the dynamic symbol table and each entry references an entry in the version definition table.
- DT_VERDEF is the runtime location of the symbol definitions (.gnu.version_d section of type SHT_GNU_verdef) and DT_VERDEFNUM is the number of entries.
- DT_VERNEED is the runtime location of the version requirements (.gnu.version_r section of type SHT_GNU_verneed) and DT_VERNEEDNUM is the number of entries.
Not covered (much) here
GNU symbol versioning
Main structures:
- The symbol version table (DT_VERSYM, .gnu.version, SHT_GNU_versym) defines the version associated with each dynamic symbol.
- The version definition section (DT_VERDEF, DT_VERDEFNUM, .gnu.version_d, SHT_GNU_verdef) defines the versions implemented in this ELF file. It uses the Elf64_Verdef and Elf64_Verdaux structures.
- The version requirements section (DT_VERNEED, DT_VERNEEDNUM, .gnu.version_r, SHT_GNU_verneed) defines for each imported (DT_NEEDED) entry the required versions. It uses the Elf64_Verneed and Elf64_Vernaux structures.
See the LSB.
TLS
The ELF file contains an initialisation image for the TLS data:
- The SHF_TLS section flag is used for TLS initialisation sections (.tdata and .tbss).
- The PT_TLS program header type is used to describe the location of the initialisation image of the TLS data and is contained in a PT_LOAD segment. It contains all the SHF_TLS sections.
- The STT_TLS symbol type is used for TLS data symbols. They are expected to be located in the PT_TLS range.
See ELF Handling For Thread Local Storage.
COMDAT
COMDAT refers to the ability of the static linker to remove redundant code and data when combining different .o
files. This is used in C++ when instantiating templates. In order to do this, the compiler creates dedicated sections for each template instantiation.
For example, this C++ code:
#include <string>
std::string foo(std::string& x)
{
return x + x;
}
Generates the following sections in the relocatable object:
$ readelf -WS test.o
There are 26 section headers, starting at offset 0xc058:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .group GROUP 0000000000000000 000040 00000c 04 24 18 4
[ 2] .text PROGBITS 0000000000000000 00004c 00002d 00 AX 0 0 1
[ 3] .rela.text RELA 0000000000000000 008278 000018 18 I 24 2 8
[ 4] .data PROGBITS 0000000000000000 000079 000000 00 WA 0 0 1
[ 5] .bss NOBITS 0000000000000000 000079 000000 00 WA 0 0 1
[ 6] .text._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ PROGBITS 0000000000000000 000079 000062 00 AXG 0 0 1
[ 7] .rela.text._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ RELA 0000000000000000 008290 000060 18 I 24 6 8
[ 8] .gcc_except_table._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ PROGBITS 0000000000000000 0000db 000010 00 AG 0 0 1
[...]
Section groups (sections with .group
name and SHT_GROUP
type) are used to group related sections: the first Elf32_Word
of the group section is a set of flags (GRP_COMDAT
is used for COMDAT section groups) and the remaining Elf32_Word
of the section are the indices of the sections belonging to this group.
ARM
- .ARM.* section names and SHT_ARM_* section types;
- PT_ARM_* program header types.
References
Authoritative references
- ELF Format
- SystemV gABI (Generic Application Binary Interface):
- System V ABI, the main spec for ELF
- System V ABI - Draft 17, contains update with the System V ABI 4.1
- System V psABI (Processor-specific Application Binary Interface):
- System V ABI ~ Intel386 Supplement 4th edition, the spec for ELF on x86
- System V ABI ~ AMD64 supplement Draft 0.99.6, the spec for ELF on x86_64
- ELF for the ARM Architecture
- ARM ELF File Format
- 64-bit PowerPC ELF Application Binary Interface Supplement
- Linux Programmer's Manual ~ ELF(5)
- GNU Symbol Versioning
- The Linux Standard Base for GNU/Linux specific stuff including:
- The DWARF specification, for debugging information
Blogs posts, articles, books and such
- Relocations, relocations
- Interpreting readelf -r, in this case R_X86_64_PC32
- Inside ELF symbol tables
- How to write Shared Libraries by Ulrich Drepper has a lot of information about the ELF format, especially from the point of view of dynamic linking performance (for example the different hash tables)
- DT_GNU_HASH by deroko of ARTeam
- Linkers and names
- Smallest x86 ELF Hello World
- The Itanium C++ ABI
- Itanium C++ ABI, defines a name mangling scheme which is used by G++ on all architectures.
- Itanium C++ ABI: Exception Handling
- The content of DWARF sections
- GNU Hash ELF Sections by Ali Bahrami
- PIC in shared libraries
- PIC in shared libraries on x64
- Linker relro
- Piece of PIE
- ELF Handling For Thread Local Storage
- Linkers and Loaders
- Dynamic Linking: ELF vs. Mach-O
- ELF - No Section Header? No Problem
- The PIE is not exactly a lie
- Load-time relocation of shared libraries
- System V ABI page on OS Dev wiki
- Introduction to ELF slides
Static libraries (
.a
files) are archives to.o
files. Different formats exist for them. ↩︎Notable exception are the Apple systems (MacOS X, iOS, Darwin) which use their own Mach-O format (coming from their NeXTSTEP lineage) and Microsoft systems (Windows) which use the PE file (Portable Executable) format (which is based on the old Unix System V COFF format). ↩︎
For example, it is used for ARM-based embedded software. ↩︎
GNU
objdump
andobjcopy
both rely on BFD and are unable to see some sections (and can synthesise some others) because of the file-format abstraction of the BFD library.objdump
from elfutils (calledeu-objdump
on some distributions) does not have this limitation (but only has a limited subset of the feature of GNUobjdump
). ↩︎I wrote this tool because
objcopy --dump-section
was not completely satisfying. ↩︎ ↩︎With the GNU BFD linker, the layout of sections after linking is given by a linker script. The default linker script can be seen with
ld -verbose
. Another linker script can be used withld -T some_linker_script
. ↩︎The C structures (and the associated comments) are taken from the GNU
elf.h
file. Only the 64 bit variant is displayed here. ↩︎This is an extension to the ELF standard not documented in the specification. ↩︎
They are using PIC code (Position Independent Code). They must be compiled with
cc -fpic
(or-fPIC
). ↩︎In contrast to PE (Portable Executable) files, the (readonly) text segment (such as the code) is shared for all processes (and with the filesystem cache) even if the shared-object is loaded at different addresses. In order to achieve this, the code for shared-objects should be compiled as PIC (Position Independent Code).
PE files are built with a preferred address and if they must be relocated, the code becomes private to the process. In other words, Windows DLL (Dynamic-Link Library) do not use PIC. ↩︎
Prelinked DSOs are located at a given (non-null) address in the ELF file. ↩︎
They are compiled with
cc -fpie
(or-fPIE
). ↩︎The
.debug_frame
DWARF section is used to tell the debugger how to unwind each stack frameThe
.eh_frame
has been created in order to unwind the stack at runtime. This is used for exception handling.The
.eh_frame
section contains information for unwinding the frame for each instruction address. This is used by the Itanium C++ exception ABI to unwind the stack on exceptions. Its format is based on the .debug_frame
DWARF section..gnu_debuglink
is used to locate a separate file containing debug informations. Another solution is to use aNT_GNU_BUILD_ID
note. ↩︎.note.gnu.build-id
describes the build-id used to locate a separate ELF file containing the debug informations. This is theNT_GNU_BUILD_ID
note. ↩︎.gnu.warning
and.gnu_warning.XXX
contains warning message displayed by the linker to issue warnings when linking against this ELF file or this symbol respectively.Example:
Hex dump of section '.gnu.warning.gets': 0x00000000 74686520 60676574 73272066 756e6374 the `gets' funct 0x00000010 696f6e20 69732064 616e6765 726f7573 ion is dangerous 0x00000020 20616e64 2073686f 756c6420 6e6f7420 and should not 0x00000030 62652075 7365642e 00 be used..
↩︎ ↩︎For relocation sections which apply to a single section, the
sh_info
field is the index of the target section. ↩︎As a result, the sections in the ELF files are grouped in three parts:
-
the sections which belong to the text segment;
-
the sections which belong to the data segment;
-
the sections which do not belong to any segment (and are not available/used at runtime).
-
This means that there is usually no runtime relocation in the text segment: all the runtime relocations are done in the data segment.
If the
DF_TEXTREL
flag (or a DT_TEXTREL
dynamic table entry) is present, text relocations are present in this file. ↩︎This property is so important that the MPROTECT feature of PaX (a Linux patch) prevents the existence of VMAs which are both executable and writable in most cases in order to enhance security. ↩︎
The VMA are the different available/mapped regions in the virtual address space. Each VMA has some properties such as:
-
permissions (rwx);
-
whether it is shared with other processes (
MAP_SHARED
) or private to this process (MAP_PRIVATE
); -
whether it has an associated file (and the offset of the VMA within the file);
-
etc.
They are created with
mmap()
(or similar) or directly by the kernel. On Linux, they can be seen in/proc/$pid/maps
or with thepmap
tool. ↩︎ ↩︎-
However they can use other techniques such as GOT infection and ROP (Return Oriented Programming). ↩︎
The PLT GOT is still vulnerable to GOT poisoning. ↩︎
In C, symbols have the name of the corresponding C function or variable on ELF systems.
In C++, function overloading, templates, namespaces and so on make it more difficult. The name of the object (including the types of its arguments for functions) is mangled to form the symbol. Different name mangling schemes exist, but modern versions of GCC and clang use the name mangling of the C++ Itanium ABI: For example with this ABI, the
foo::Something::bar(int)
method is mangled into_ZN3foo9Something3barEi
. Thec++filt
program can be used to demangle C++ symbol names (or the__cxa_demangle
function). ↩︎This is what appears in the
.o
file. In the shared-object or executable, it is converted to STB_LOCAL
and STV_DEFAULT
. ↩︎The usage of
STV_PROTECTED
symbols is not recommended because it slows down the dynamic linkage. ↩︎In fact, it creates two GOT sections:
.got
and.got.plt
. ↩︎The address of the GOT entry is
0x200990
and the address of.got
is0x200980
: the offset of the GOT entry within.got
is0x200990 - 0x200980 = 0x10 = 16
. Each GOT entry is 8 bytes on x86_64 so this is the third entry. ↩︎The usage of the PLT can be disabled at compile-time (for a given compilation unit) with
cc -fno-plt
or for a given function with__attribute__((noplt))
. This disables lazy binding. ↩︎See GNU Hash ELF Section by Ali Bahrami and How to write Shared Libraries by Ulrich Drepper. ↩︎
Each shared-object dependency is described with a
DT_NEEDED
entry. A typical value islibfoo.so.6
(where6
is a version number). This file is searched in different directories by the dynamic linker. A same shared object can be present in different incompatible versions.The link editor
ld
links againstlibfoo.so
(using the-lfoo
flag) which is a symbolic link to the current version of the shared object. Shared objects usually contain aDT_SONAME
entry defining the full (shared-object) name (libfoo.so.6
) of this shared-object. This value is copied as a DT_NEEDED
entry in the dependent ELF objects.If no
DT_SONAME
is present, the link editor creates aDT_NEEDED
entry withlibfoo.so
instead when given the-lfoo
flags.If a full path to the shared object is given to
ld
and this shared object does not haveDT_SONAME
entry, the full path to the shared object will be used in theDT_NEEDED
entry. ↩︎DT_RPATH
serves the same purpose but is searched before theLD_LIBRARY_PATH
environment variable which is not considered a good solution. For this reason, the DT_RUNPATH
was created as a replacement: the values ofDT_RUNPATH
are searched after theLD_LIBRARY_PATH
environment.DT_RPATH
is deprecated and ignored whenDT_RUNPATH
is present (and recognised by the dynamic linker). ↩︎There is no size/number of entries for the symbol table at the program header table level. This is not needed at runtime as the symbol lookup always goes through the hash table. ↩︎
Solaris and GNU systems have the ability to handle different namespaces (see
dlmopen()
): different shared-objects can be placed in different namespaces. Usually only two namespaces are used: one for the dynamic linker and a second one for the application and the shared-object libraries. ↩︎This is in contrast with Windows PE (Portable Executable) files and MacOS X which both use a two-level namespace lookup: they import a given symbol from a given DLL (Dynamic-Link Library) or
.dylib
. ↩︎This is a simplification. Other things influence the order and the set of ELF modules used for a given lookup:
DT_SYMBOLIC
,dlopen()
,dlmopen()
etc.dlopen
-ed shared-object and their dependencies are not added to the global scope but only in a local scope (unlessRTLD_GLOBAL
is used).dlmopen()
can be used to create separate symbol namespaces with their own sets of ELF shared-objects. ↩︎The
DT_TEXTREL
dynamic table entry can be used as well but its usage is deprecated/optional. ↩︎