Program structure
- Programs begin as high-level source code, which the compiler translates into low-level assembly, then an assembler produces object files, and a linker combines them into executables ready for execution.
High-level vs low-level languages
- High-level languages (e.g., C/C++) abstract away machine details, offering constructs like loops and local variables to improve productivity and portability.
- Low-level assembly provides direct control over instructions, essential for firmware, OS handlers, bootloaders, performance-critical routines, and exploit development.
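Compilation process
- The final binary contains more than the compiled source: the toolchain adds stubs and runtime code such as PLT stubs for lazy binding (generated by the linker), C-runtime initialization sequences (startup objects such as crt1.o, linked in by the compiler driver), and syscall wrapper stubs (from the C library).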
Cross-compiling for other architectures
- Cross-compilers and cross-assemblers build ARM binaries on non-ARM hosts by generating object files for the target architecture on the host machine itself.
Assembling and linking
- Assembler: Translates assembly source to object (.o) files.
- Linker: Resolves symbols across object files and libraries, producing executables or shared libraries.
ELF file overview
- ELF binaries describe their own layout: section headers describe the sections (code, data, etc.) used by linkers and analysis tools, while program headers describe the segments and memory protections that the loader maps into memory.
ELF file header
- Magic & class: Identifies ELF, 32- or 64-bit, endianness, version, ABI.
- Target fields: Architecture (e.g., AArch64), ABI version.
- Entry point: Virtual address where execution begins.
- Table locations: Offsets/counts for program and section header tables.
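The header fields above sit at fixed byte offsets, so they can be decoded without any library. The sketch below is a minimal, hand-rolled decoder that assumes a little-endian ELF64 file (the usual case for AArch64 Linux); `struct elf_summary`, `rd_le`, and `parse_elf64_header` are hypothetical names, and the offsets follow the ELF64 header layout (on Linux, `<elf.h>` provides the equivalent `Elf64_Ehdr` struct).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary holding only the header fields discussed above. */
struct elf_summary {
    int is_64bit;       /* EI_CLASS == 2 (ELFCLASS64) */
    int little_endian;  /* EI_DATA == 1 (ELFDATA2LSB) */
    uint16_t type;      /* e_type, e.g. 2 = executable, 3 = shared object */
    uint16_t machine;   /* e_machine, e.g. 183 = AArch64 */
    uint64_t entry;     /* e_entry: virtual address where execution begins */
    uint64_t phoff;     /* e_phoff: file offset of the program header table */
    uint64_t shoff;     /* e_shoff: file offset of the section header table */
    uint16_t phnum;     /* e_phnum: number of program headers */
    uint16_t shnum;     /* e_shnum: number of section headers */
};

/* Read an n-byte little-endian integer byte by byte (host-endian independent). */
static uint64_t rd_le(const uint8_t *p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if buf is not a little-endian ELF64 header. */
static int parse_elf64_header(const uint8_t *buf, size_t len, struct elf_summary *out) {
    assert(buf != NULL && out != NULL);
    /* "\x7f" "ELF" is split so the hex escape does not swallow the 'E'. */
    if (len < 64 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;
    out->is_64bit = buf[4] == 2;
    out->little_endian = buf[5] == 1;
    if (!out->is_64bit || !out->little_endian)
        return -1;
    out->type    = (uint16_t)rd_le(buf + 16, 2);
    out->machine = (uint16_t)rd_le(buf + 18, 2);
    out->entry   = rd_le(buf + 24, 8);
    out->phoff   = rd_le(buf + 32, 8);
    out->shoff   = rd_le(buf + 40, 8);
    out->phnum   = (uint16_t)rd_le(buf + 56, 2);
    out->shnum   = (uint16_t)rd_le(buf + 60, 2);
    return 0;
}
```

In practice `buf` would hold the first 64 bytes read from a file; 32-bit and big-endian files would need their own decoding paths.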
ELF program headers
Each entry specifies a segment type, file offset, virtual memory address, size, and flags.
- PHDR: Describes where the program header table itself resides, enabling dynamic linkers to find it.
- INTERP: Points to the dynamic linker path for executable files.
- LOAD: Maps file regions (e.g., .text, .data, .bss) into memory with read/execute or read/write permissions; the part of the segment beyond its file size (the .bss area) is zero-filled.
- DYNAMIC: References the .dynamic section for shared-library dependencies and relocation info.
- NOTE: Contains vendor-specific metadata (ABI versions, build IDs) as name/type/descriptor records.
- TLS: Defines thread-local storage templates; layout detailed later.
- GNU_EH_FRAME: Points to unwind tables for debuggers and C++ exception handling.
- GNU_STACK: Marks stack executable/non-executable for security enforcement.
- GNU_RELRO: Marks critical sections read-only after relocations (protects GOT, dynamic section).
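To make the segment table concrete, here is a small sketch that names the segment types listed above, renders `p_flags` as a compact RWX string, and computes how much of a LOAD segment is zero-filled. The numeric constants are the standard `PT_*` and `PF_*` values; the helper names are hypothetical and only the types discussed here are mapped.

```c
#include <assert.h>
#include <stdint.h>

/* Segment type (p_type) values from the ELF spec plus the GNU extensions. */
static const char *ptype_name(uint32_t t) {
    switch (t) {
    case 1: return "LOAD";
    case 2: return "DYNAMIC";
    case 3: return "INTERP";
    case 4: return "NOTE";
    case 6: return "PHDR";
    case 7: return "TLS";
    case 0x6474e550: return "GNU_EH_FRAME";
    case 0x6474e551: return "GNU_STACK";
    case 0x6474e552: return "GNU_RELRO";
    default: return "OTHER";
    }
}

/* Render p_flags (PF_X = 1, PF_W = 2, PF_R = 4) as e.g. "R-X" or "RW-". */
static void pflags_str(uint32_t flags, char out[4]) {
    out[0] = (flags & 4) ? 'R' : '-';
    out[1] = (flags & 2) ? 'W' : '-';
    out[2] = (flags & 1) ? 'X' : '-';
    out[3] = '\0';
}

/* In a LOAD segment, bytes from p_filesz up to p_memsz are zero-filled (.bss). */
static uint64_t zero_fill_bytes(uint64_t p_filesz, uint64_t p_memsz) {
    assert(p_memsz >= p_filesz);
    return p_memsz - p_filesz;
}
```

A typical code segment decodes as `LOAD` with `R-X`, and a data segment as `LOAD` with `RW-` whose memory size exceeds its file size by the size of .bss.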
ELF section headers
Detail individual sections by name, type, flags, address, offset, and size.
- Meta-sections:
  - .strtab / .dynstr: String tables for names.
  - .symtab / .dynsym: Symbol tables listing functions/variables.
- Main sections:
  - .text: Executable code.
  - .data: Initialized globals, read/write.
  - .bss: Zero-initialized globals, not stored on disk.
  - .rodata: Read-only data (string literals, const data).
  - .tdata / .tbss: Initialized and uninitialized thread-local data.
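Section and symbol names are not stored inline: `sh_name` and `st_name` are byte offsets into a string table (`.shstrtab` for section names, `.strtab` or `.dynstr` for symbol names), and offset 0 is the empty string. A minimal lookup sketch, with a hypothetical helper name, that bounds-checks the offset and the terminating NUL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Resolve a name offset against a string table of tab_size bytes.
 * Returns NULL if the offset is out of range or the string is not
 * NUL-terminated inside the table. */
static const char *strtab_lookup(const char *tab, size_t tab_size, uint32_t off) {
    assert(tab != NULL);
    if (off >= tab_size)
        return NULL;
    if (memchr(tab + off, '\0', tab_size - off) == NULL)
        return NULL;
    return tab + off;
}
```

For a table laid out as `"\0.text\0.data\0.bss"`, offset 1 yields `.text`, offset 7 yields `.data`, and offset 13 yields `.bss`.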
Symbols
Symbols describe functions/variables with binding, type, section index, value, and size.
- Global vs local: Global visible across objects; local confined to one.
- Weak symbols: Can be overridden by strong definitions at link time.
- Symbol versions: Maintain ABI compatibility by tagging versions (e.g., GLIBC_2.17).
- Mapping symbols: ARM-specific ($a, $t, $x, $d) guide disassembly through mixed-code sections.
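The binding and type of a symbol share one byte, `st_info`. The sketch below decodes it and recognizes Arm mapping symbols by name, assuming the naming rule from Arm's ELF ABI (the base name, optionally followed by `.` and extra text). All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* st_info packs the binding in the high 4 bits and the type in the low 4 bits
 * (the same split the ELF64_ST_BIND / ELF64_ST_TYPE macros perform). */
static unsigned sym_bind(uint8_t st_info) { return st_info >> 4; }
static unsigned sym_type(uint8_t st_info) { return st_info & 0xfu; }

/* Binding values: 0 = LOCAL, 1 = GLOBAL, 2 = WEAK. */
static const char *bind_name(unsigned bind) {
    switch (bind) {
    case 0: return "LOCAL";
    case 1: return "GLOBAL";
    case 2: return "WEAK";
    default: return "OTHER";
    }
}

/* Mapping symbols: "$a" (A32), "$t" (T32), "$x" (A64), "$d" (data),
 * optionally followed by "." and extra text. Returns the kind letter or 0. */
static char mapping_kind(const char *name) {
    assert(name != NULL);
    /* Check for '\0' first: strchr would match the terminator. */
    if (name[0] != '$' || name[1] == '\0' || strchr("atxd", name[1]) == NULL)
        return 0;
    if (name[2] != '\0' && name[2] != '.')
        return 0;
    return name[1];
}
```

For example, `st_info = 0x12` decodes as a GLOBAL symbol of type 2 (a function), and a disassembler seeing `$d` switches to dumping data until the next `$x`.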
The dynamic section and dynamic loading
- .dynamic entries instruct the loader on dependencies, relocations, and symbol tables.
- Dependency loading (NEEDED): Lists shared libraries required; loader pulls them (and their dependencies) in turn.
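A minimal sketch of how a loader can find the NEEDED entries: walk the .dynamic array until the `DT_NULL` terminator (tag 0) and treat each `DT_NEEDED` (tag 1) value as an offset into the dynamic string table that `DT_STRTAB` points at. `struct dyn_entry` is a simplified, hypothetical stand-in for `Elf64_Dyn`, and a real loader would then repeat this for each library it loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of one .dynamic entry: a tag and a value/pointer. */
struct dyn_entry {
    int64_t  tag;  /* d_tag */
    uint64_t val;  /* d_un.d_val / d_un.d_ptr */
};

#define DYN_TAG_NULL   0  /* terminates the .dynamic array */
#define DYN_TAG_NEEDED 1  /* value is a .dynstr offset of a library name */

/* Store up to max library names in names[]; returns how many were stored. */
static size_t collect_needed(const struct dyn_entry *dyn, const char *dynstr,
                             const char **names, size_t max) {
    assert(dyn != NULL && dynstr != NULL && names != NULL);
    size_t n = 0;
    for (; dyn->tag != DYN_TAG_NULL; dyn++)
        if (dyn->tag == DYN_TAG_NEEDED && n < max)
            names[n++] = dynstr + dyn->val;
    return n;
}
```

Entries with other tags (such as `DT_STRTAB`, tag 5) are skipped; the order of the NEEDED entries is the order in which the dependencies are listed.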
Program relocations
- Static: REL/RELA relocations applied by linker before execution.
- Dynamic: The loader resolves external symbols at load time, writing their addresses into GOT slots.
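- GOT: Consolidates external addresses in one writable table, so each symbol is resolved once and the code itself never needs patching; code reaches an external symbol by loading its address from the corresponding .got entry.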
- PLT: Small stubs that on first call resolve symbols lazily and then jump directly on subsequent calls.
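Lazy binding can be modeled in plain C with a function pointer standing in for a GOT slot. This is a toy illustration under that assumption, not how any real dynamic loader's resolver is written: every name is hypothetical, and a real PLT stub is a few machine instructions that jump through the GOT entry.

```c
#include <assert.h>

typedef int (*fn_t)(int);

static int resolver_calls = 0;                   /* counts slow-path resolutions */

static int real_square(int x) { return x * x; }  /* the "library" function */

static int lazy_resolver(int x);
static fn_t got_slot = lazy_resolver;            /* "GOT entry", initially unresolved */

/* First call only: "resolve" the symbol, patch the slot, finish the call. */
static int lazy_resolver(int x) {
    resolver_calls++;
    got_slot = real_square;
    return got_slot(x);
}

/* The "PLT stub": always an indirect call through the GOT slot. */
static int square_plt(int x) { return got_slot(x); }
```

After the first `square_plt` call the slot points directly at `real_square`, so later calls skip the resolver. Eager binding (e.g. linking with `-z now`) corresponds to filling the slot before the first call.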
ELF program initialization and termination sections
- INIT / INIT_ARRAY: Functions or lists run before main.
- FINI_ARRAY / FINI: Functions or lists run after main returns or on library unload.
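In C, GCC and Clang expose these hooks through the `constructor` and `destructor` function attributes; on current Linux toolchains such functions are typically emitted as .init_array and .fini_array entries. A minimal sketch (function and variable names are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

static int init_ran = 0;

/* Typically placed in .init_array: runs before main. */
__attribute__((constructor))
static void on_load(void) {
    init_ran = 1;
}

/* Typically placed in .fini_array: runs after main returns or exit() is called. */
__attribute__((destructor))
static void on_unload(void) {
    fputs("fini_array entry ran\n", stderr);
}
```

By the time `main` starts, `init_ran` is already 1; `readelf -S` on the resulting binary lists the .init_array and .fini_array sections.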
Initialization and termination order
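- Load segments and zero-init globals (.bss).
- Load dependencies (NEEDED) and apply relocations.
- Register FINI_ARRAY (run in reverse order) and FINI entries to run at exit (atexit) or on unload (dlclose).
- Call the executable's PREINIT_ARRAY, then each object's INIT stub followed by its INIT_ARRAY, initializing dependencies before the objects that need them.
- Transfer control to program entry point (C runtime → main).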
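The ordering rules can be checked with a toy model that calls arrays of function pointers the way the runtime walks INIT_ARRAY and FINI_ARRAY. This simulates a single executable only, assumes the System V gABI ordering (INIT before INIT_ARRAY, FINI_ARRAY in reverse before FINI), and uses hypothetical hook names; it is not loader code.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*hook_t)(void);
#define COUNT(a) (sizeof (a) / sizeof (a)[0])

static char trace[16];         /* one letter per hook, in call order */
static size_t trace_len = 0;

static void mark(char c) {
    assert(trace_len < sizeof trace - 1);
    trace[trace_len++] = c;
}

static void preinit0(void)  { mark('p'); }
static void dt_init(void)   { mark('I'); }
static void init0(void)     { mark('a'); }
static void init1(void)     { mark('b'); }
static void fake_main(void) { mark('M'); }
static void fini0(void)     { mark('y'); }
static void fini1(void)     { mark('z'); }
static void dt_fini(void)   { mark('F'); }

static void run_forward(const hook_t *arr, size_t n) {
    for (size_t i = 0; i < n; i++) arr[i]();
}

static void run_reverse(const hook_t *arr, size_t n) {
    while (n-- > 0) arr[n]();
}

static void run_lifecycle(void) {
    static const hook_t preinit_array[] = { preinit0 };
    static const hook_t init_array[]    = { init0, init1 };
    static const hook_t fini_array[]    = { fini0, fini1 };

    run_forward(preinit_array, COUNT(preinit_array));
    dt_init();
    run_forward(init_array, COUNT(init_array));
    fake_main();
    run_reverse(fini_array, COUNT(fini_array)); /* last entry first */
    dt_fini();
}
```

The recorded trace is `pIabMzyF`: termination hooks unwind in the opposite order of initialization, so later initializers are torn down first.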
Thread-local storage
- Definition: Each thread has its own copy of TLS variables; accessed via a thread-pointer register (TPIDR_EL0 on AArch64).
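- .tdata/.tbss: The TLS initialization image (.tdata) and its zero-filled remainder (.tbss), from which each thread's copy is created; TLS symbols (type STT_TLS) hold offsets into this template rather than absolute addresses.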
- Runtime: Thread Pointer → Thread Control Block → DTV array → per-module TLS area.
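A sketch of per-thread copies using C11 `_Thread_local` with POSIX threads: the initialized variable lands in .tdata, the zero-initialized one in .tbss, and each new thread starts from that template. Variable and function names are hypothetical; on older glibc versions, link with `-pthread`. On AArch64 the compiler reaches these variables through TPIDR_EL0, but the C code itself is architecture-neutral.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static _Thread_local int counter = 100;  /* initialized TLS: .tdata */
static _Thread_local int scratch;        /* zero-initialized TLS: .tbss */

static void *worker(void *arg) {
    assert(arg != NULL);
    counter += 1;           /* modifies only this thread's copy */
    scratch = 7;            /* likewise private to this thread */
    *(int *)arg = counter;  /* report the value this thread saw */
    return NULL;
}

/* Runs two threads; returns the calling thread's counter, or -1 on error. */
static int tls_demo(int seen[2]) {
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        if (pthread_create(&t[i], NULL, worker, &seen[i]) != 0)
            return -1;
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counter;         /* unchanged: each worker had its own copy */
}
```

Each worker starts from the template value 100 and reports 101, while the calling thread still sees 100 for `counter` and 0 for `scratch`.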