Program structure

  • Programs begin as high-level source code. The compiler translates it into assembly, the assembler turns the assembly into object files, and the linker combines the object files into an executable.

High-level vs low-level languages

  • High-level languages (e.g., C/C++) abstract away machine details, offering constructs like loops and local variables to improve productivity and portability.
  • Low-level assembly provides direct control over instructions, essential for firmware, OS handlers, bootloaders, performance-critical routines, and exploit development.

Compilation process

  • The compiler emits assembly plus additional stubs and runtime code (e.g., PLT stubs for lazy binding, C-runtime init sequences, syscall stubs).

Cross-compiling for other architectures

  • Cross-compilers and cross-assemblers run on a non-ARM host (e.g., x86-64) and emit object files and executables for the ARM target architecture.

Assembling and linking

  • Assembler: Translates assembly source to object (.o) files.
  • Linker: Resolves symbols across object files and libraries, producing executables or shared libraries.

ELF file overview

  • ELF binaries are self-describing: section headers describe the file's contents (code, data, etc.), and program headers describe the segments and memory protections that the loader uses to map the file into memory.

ELF file header

  • Magic & class: Identifies ELF, 32- or 64-bit, endianness, version, ABI.
  • Target fields: Architecture (e.g., AArch64), ABI version.
  • Entry point: Virtual address where execution begins.
  • Table locations: Offsets/counts for program and section header tables.
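The header fields listed above can be read with a few lines of Python. This is a minimal sketch that assumes a 64-bit little-endian file; the helper names (`parse_elf64_header`, `make_sample_header`) and the sample values are made up for illustration, and the header is built in memory so the example runs without a real binary.

```python
import struct

EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian, 64 bytes
EM_AARCH64 = 183                              # e_machine value for AArch64
ET_NAMES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def parse_elf64_header(data: bytes) -> dict:
    """Decode the fixed-size header at the start of a 64-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    ei_class, ei_data = data[4], data[5]
    if ei_class != 2 or ei_data != 1:
        raise ValueError("this sketch only handles ELFCLASS64, little-endian")
    (_ident, e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = EHDR64.unpack_from(data)
    return {
        "type": ET_NAMES.get(e_type, hex(e_type)),
        "machine": e_machine,
        "entry": e_entry,                            # virtual address of the entry point
        "phdr_table": (e_phoff, e_phentsize, e_phnum),  # offset, entry size, count
        "shdr_table": (e_shoff, e_shentsize, e_shnum),
        "shstrndx": e_shstrndx,                      # section holding section names
    }


def make_sample_header() -> bytes:
    """Build a made-up header (illustrative values only)."""
    # magic, then class=64-bit, data=little-endian, version, OS ABI, ABI version, padding
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0, 0]) + bytes(7)
    return EHDR64.pack(e_ident, 3, EM_AARCH64, 1, 0x640, 64, 0x1A00, 0,
                       64, 56, 9, 64, 28, 27)


header = parse_elf64_header(make_sample_header())
print(header)
```

To inspect a real file, pass its first 64 bytes to `parse_elf64_header` instead of the synthetic header.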

ELF program headers

Each entry specifies a segment type, file offset, virtual memory address, size, and flags.

  • PHDR: Describes where the program header table itself resides, enabling dynamic linkers to find it.
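  • INTERP: Holds the path of the dynamic linker (the program interpreter) for dynamically linked executables.
  • LOAD: Maps file regions (e.g., .text, .data) into memory with read/execute or read/write permissions; where a segment's in-memory size exceeds its size in the file (e.g., for .bss), the remainder is zero-filled.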
  • DYNAMIC: References the .dynamic section for shared-library dependencies and relocation info.
  • NOTE: Contains vendor-specific metadata (ABI versions, build IDs) stored as a sequence of name/type/descriptor records.
  • TLS: Defines thread-local storage templates; layout detailed later.
  • GNU_EH_FRAME: Points to unwind tables for debuggers and C++ exception handling.
  • GNU_STACK: Marks stack executable/non-executable for security enforcement.
  • GNU_RELRO: Marks critical sections read-only after relocations (protects GOT, dynamic section).
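As an illustration of the fields each entry carries, the following sketch decodes one 56-byte ELF64 program header. It assumes little-endian layout; `parse_phdr` and the sample segment values are hypothetical. The `zero_fill` field shows how a loader can tell how many bytes of a LOAD segment have no backing in the file.

```python
import struct

PHDR64 = struct.Struct("<IIQQQQQQ")  # Elf64_Phdr, little-endian, 56 bytes
PT_NAMES = {
    1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR", 7: "TLS",
    0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK", 0x6474E552: "GNU_RELRO",
}
PF_X, PF_W, PF_R = 1, 2, 4  # segment permission bits


def parse_phdr(entry: bytes) -> dict:
    """Decode one program header table entry."""
    (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
     p_filesz, p_memsz, p_align) = PHDR64.unpack_from(entry)
    perms = (("r" if p_flags & PF_R else "-")
             + ("w" if p_flags & PF_W else "-")
             + ("x" if p_flags & PF_X else "-"))
    return {
        "type": PT_NAMES.get(p_type, hex(p_type)),
        "perms": perms,
        "offset": p_offset,
        "vaddr": p_vaddr,
        "filesz": p_filesz,
        "memsz": p_memsz,
        # Bytes present in memory but not in the file; the loader zero-fills them.
        "zero_fill": max(p_memsz - p_filesz, 0),
        "align": p_align,
    }


# Made-up read/write LOAD segment: 0x300 bytes in the file, 0x340 bytes in memory.
sample = PHDR64.pack(1, PF_R | PF_W, 0xD00, 0x10D00, 0x10D00, 0x300, 0x340, 0x10000)
segment = parse_phdr(sample)
print(segment)
```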

ELF section headers

Detail individual sections by name, type, flags, address, offset, and size.

  • Meta-sections:

    • .strtab/.dynstr: String tables for names.
    • .symtab/.dynsym: Symbol tables listing functions/variables.
  • Main sections:

    • .text: Executable code.
    • .data: Initialized globals, read/write.
    • .bss: Zero-initialized globals, not stored on disk.
    • .rodata: Read-only data (string literals, const).
    • .tdata / .tbss: Initialized and uninitialized thread-local data.
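Section names are not stored in the section header itself; the header holds an offset into a string table. The sketch below, assuming the little-endian ELF64 layout, decodes one section header and resolves its name. The function names and the sample .bss entry are invented for the example.

```python
import struct

SHDR64 = struct.Struct("<IIQQQQIIQQ")  # Elf64_Shdr, little-endian, 64 bytes
SHT_NOBITS = 8                          # occupies memory but no file bytes (.bss, .tbss)
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 1, 2, 4


def strtab_lookup(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


def parse_shdr(entry: bytes, shstrtab: bytes) -> dict:
    """Decode one section header and look up its name in the string table."""
    (sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
     _link, _info, _addralign, _entsize) = SHDR64.unpack_from(entry)
    return {
        "name": strtab_lookup(shstrtab, sh_name),
        "type": sh_type,
        "writable": bool(sh_flags & SHF_WRITE),
        "executable": bool(sh_flags & SHF_EXECINSTR),
        "addr": sh_addr,
        "offset": sh_offset,
        "size": sh_size,
        # NOBITS sections take no space on disk even though they have a size.
        "file_bytes": 0 if sh_type == SHT_NOBITS else sh_size,
    }


# Made-up section-name string table and a .bss header that refers to it.
shstrtab = b"\x00.text\x00.data\x00.bss\x00"
bss_entry = SHDR64.pack(shstrtab.index(b".bss"), SHT_NOBITS, SHF_WRITE | SHF_ALLOC,
                        0x11000, 0x1000, 0x40, 0, 0, 8, 0)
section = parse_shdr(bss_entry, shstrtab)
print(section)
```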

Symbols

Symbols describe functions/variables with binding, type, section index, value, and size.

  • Global vs local: Global symbols are visible across object files; local symbols are confined to the object file that defines them.
  • Weak symbols: Can be overridden by strong definitions at link time.
  • Symbol versions: Maintain ABI compatibility by tagging versions (e.g., GLIBC_2.17).
  • Mapping symbols: ARM-specific ($a, $t, $x, $d) guide disassembly through mixed-code sections.
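Binding and type share a single byte (st_info) in each symbol table entry: the high four bits hold the binding and the low four bits the type. A minimal sketch of decoding an ELF64 symbol, assuming little-endian layout and using a made-up `main` entry:

```python
import struct

SYM64 = struct.Struct("<IBBHQQ")  # Elf64_Sym, little-endian, 24 bytes
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE", 6: "TLS"}


def parse_symbol(entry: bytes, strtab: bytes) -> dict:
    """Decode one symbol table entry and resolve its name in the string table."""
    st_name, st_info, _st_other, st_shndx, st_value, st_size = SYM64.unpack_from(entry)
    end = strtab.index(b"\x00", st_name)
    return {
        "name": strtab[st_name:end].decode("ascii"),
        "binding": BINDINGS.get(st_info >> 4, str(st_info >> 4)),
        "type": TYPES.get(st_info & 0xF, str(st_info & 0xF)),
        "section_index": st_shndx,
        "value": st_value,
        "size": st_size,
    }


# Made-up entry: a global function named "main" in section 13.
strtab = b"\x00main\x00"
entry = SYM64.pack(1, (1 << 4) | 2, 0, 13, 0x714, 0x20)
symbol = parse_symbol(entry, strtab)
print(symbol)
```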

The dynamic section and dynamic loading

  • .dynamic entries instruct the loader on dependencies, relocations, and symbol tables.
  • Dependency loading (NEEDED): Lists shared libraries required; loader pulls them (and their dependencies) in turn.
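The .dynamic section is an array of tag/value pairs terminated by a DT_NULL entry. For a NEEDED entry, the value is an offset into .dynstr where the library name is stored. The following sketch assumes the little-endian ELF64 layout; `needed_libraries` and the sample tables are illustrative, not taken from a real binary.

```python
import struct

DYN64 = struct.Struct("<qQ")  # Elf64_Dyn: signed tag, unsigned value
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5


def needed_libraries(dynamic: bytes, dynstr: bytes) -> list:
    """Collect the library names referenced by DT_NEEDED entries, in table order."""
    libs = []
    for d_tag, d_val in DYN64.iter_unpack(dynamic):
        if d_tag == DT_NULL:      # end-of-table marker
            break
        if d_tag == DT_NEEDED:    # d_val is an offset into .dynstr
            end = dynstr.index(b"\x00", d_val)
            libs.append(dynstr[d_val:end].decode("ascii"))
    return libs


# Made-up tables: two dependencies plus an unrelated DT_STRTAB entry.
dynstr = b"\x00libc.so.6\x00libm.so.6\x00"
dynamic = (DYN64.pack(DT_NEEDED, dynstr.index(b"libm.so.6"))
           + DYN64.pack(DT_NEEDED, dynstr.index(b"libc.so.6"))
           + DYN64.pack(DT_STRTAB, 0x400)
           + DYN64.pack(DT_NULL, 0))
libs = needed_libraries(dynamic, dynstr)
print(libs)
```

A loader repeats this step for each library it pulls in, which is how transitive dependencies are discovered.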

Program relocations

  • Static: REL/RELA relocations applied by linker before execution.
  • Dynamic: Loader resolves external symbols at load time, writing addresses into GOT slots.
  • GOT: Collects the addresses of external symbols in one table so that each is resolved only once; code reaches an external function or variable by loading its address from the corresponding .got entry.
  • PLT: Small stubs that resolve a symbol lazily on its first call and jump straight to the resolved address on subsequent calls.
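The interplay between PLT stubs and GOT slots can be modeled in a few lines. This is a toy model, not how a real dynamic linker is implemented: Python callables stand in for code addresses, a dictionary stands in for the GOT, and the class and method names are invented for the illustration.

```python
class LazyBinder:
    """Toy model of lazy binding through PLT stubs and GOT slots."""

    def __init__(self, exported: dict):
        self.exported = exported  # symbols provided by the "loaded libraries"
        self.got = {}             # one slot per imported symbol
        self.lookups = 0          # how many times the resolver actually ran

    def import_symbol(self, name: str) -> None:
        """Point the GOT slot at a resolver stub, as it would be before the first call."""
        def resolver_stub(*args):
            self.lookups += 1
            target = self.exported[name]  # symbol lookup (the expensive part)
            self.got[name] = target       # patch the slot with the real address
            return target(*args)
        self.got[name] = resolver_stub

    def plt_call(self, name: str, *args):
        """A PLT stub: an indirect jump through whatever the GOT slot holds."""
        return self.got[name](*args)


binder = LazyBinder({"strlen": len})
binder.import_symbol("strlen")
first = binder.plt_call("strlen", "hello")   # goes through the resolver
second = binder.plt_call("strlen", "world!")  # jumps straight to the target
print(first, second, binder.lookups)
```

After the first call the slot holds the resolved target, so the lookup counter stays at one no matter how often the function is called again.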

ELF program initialization and termination sections

  • INIT / INIT_ARRAY: Functions or lists run before main.
  • FINI_ARRAY / FINI: Functions or lists run after main returns or on library unload.

Initialization and termination order

  1. Load segments and zero‐init globals (.bss).
  2. Load and initialize dependencies.
  3. Register FINI/FINI_ARRAY entries (atexit or dlclose).
  4. Call PREINIT_ARRAY, then the INIT stub, then INIT_ARRAY.
  5. Transfer control to program entry point (C runtime → main).
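Steps 2 and 3 imply an ordering between modules: a library's dependencies are initialized before the library itself, and termination typically runs in the opposite order. The sketch below models this with a depth-first walk over a hypothetical dependency graph. Real loaders use more involved sorting (for example, to cope with dependency cycles), so treat this as a simplification.

```python
def init_order(deps: dict, root: str) -> list:
    """Post-order walk: every module appears after the modules it depends on."""
    order, seen = [], set()

    def visit(module: str) -> None:
        if module in seen:
            return
        seen.add(module)
        for dep in deps.get(module, []):
            visit(dep)
        order.append(module)  # all dependencies are already in the list

    visit(root)
    return order


# Hypothetical program "app" with two shared-library dependencies.
deps = {
    "app": ["libfoo.so", "libc.so.6"],
    "libfoo.so": ["libc.so.6"],
}
inits = init_order(deps, "app")
finis = list(reversed(inits))  # termination runs in the reverse order
print(inits)
print(finis)
```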

Thread-local storage

  • Definition: Each thread has its own copy of TLS variables; accessed via a thread-pointer register (TPIDR_EL0 on AArch64).
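  • .tdata/.tbss: Hold the initialized TLS template and the zero-filled TLS data; the value of a TLS symbol in the symbol table is an offset into its module's TLS template rather than a memory address.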
  • Runtime: Thread Pointer → Thread Control Block → DTV array → per-module TLS area.
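The runtime chain above can be pictured with a toy model in which each thread owns a DTV that maps a module ID to a private copy of that module's TLS template. All names here are invented for the illustration; on real systems the lookup is performed by compiler-generated code and the C library, and the layout is defined by the platform ABI.

```python
def make_template(tdata: bytes, tbss_size: int) -> bytes:
    """A module's TLS image: initialized .tdata followed by zeroed .tbss."""
    return tdata + bytes(tbss_size)


class ThreadControlBlock:
    """Stands in for what the thread pointer refers to; holds this thread's DTV."""

    def __init__(self, templates: dict):
        # dtv[module_id] is this thread's own copy of the module's TLS block.
        self.dtv = {mod: bytearray(tmpl) for mod, tmpl in templates.items()}


def tls_read(tcb: ThreadControlBlock, module_id: int, offset: int, size: int) -> bytes:
    """Follow TCB -> DTV -> per-module block, then apply the symbol's offset."""
    return bytes(tcb.dtv[module_id][offset:offset + size])


def tls_write(tcb: ThreadControlBlock, module_id: int, offset: int, data: bytes) -> None:
    tcb.dtv[module_id][offset:offset + len(data)] = data


# Module 1 has one initialized 4-byte variable (42) and 4 bytes of .tbss.
templates = {1: make_template((42).to_bytes(4, "little"), 4)}
thread_a = ThreadControlBlock(templates)
thread_b = ThreadControlBlock(templates)

tls_write(thread_a, 1, 0, (7).to_bytes(4, "little"))  # only thread A's copy changes
value_a = int.from_bytes(tls_read(thread_a, 1, 0, 4), "little")
value_b = int.from_bytes(tls_read(thread_b, 1, 0, 4), "little")
print(value_a, value_b)
```

The two threads start from the same template but see different values after the write, which is the property TLS exists to provide.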