
fix links in README

main
sys64738 3 months ago
parent
commit
458bce29af
3 changed files with 2619 additions and 1 deletions
  1. +1
    -1
      README.md
  2. +2503
    -0
      elf.html
  3. +115
    -0
      masto-thread.md

+ 1
- 1
README.md

@@ -62,7 +62,7 @@ Here's a collection of links about the subject, I'm putting these here because
people seem to find these useful.

* [`elf(5)` manpage](https://linux.die.net/man/5/elf)
* [unofficial ELF docs](https://cs.stevens.edu/%7Ejschauma/631A/elf.html) (has
* [unofficial ELF docs](elf.html) (has
more than the manpage, also has extra links)
* [glibc internals](http://s.eresi-project.org/inc/articles/elf-rtld.txt)
* [stuff about `.gnu.hash`](https://web.archive.org/web/20111022202443/http://blogs.oracle.com/ali/entry/gnu_hash_elf_sections)


+ 2503
- 0
elf.html
File diff suppressed because it is too large


+ 115
- 0
masto-thread.md

@@ -0,0 +1,115 @@
# Rough transcriptions of a thread on Mastodon

Here are some useful parts of posts I made on Mastodon, I haven't cleaned
them up too much.

## General structure of an ELF file

An ELF file starts with an ELF header (Ehdr), which contains offsets to the
program headers aka segments (Phdr), and the section headers (Shdr). It also
tells you the entry point, architecture+bitsize, and which shdr is `.shstrtab`.
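
To make the Ehdr fields a bit more concrete, here's a minimal sketch that decodes one by hand. It assumes a 64-bit little-endian file and the `Elf64_Ehdr` layout from `elf(5)`; the names `parse_ehdr`, `Ehdr` and `EHDR_FMT` are made up for this example.

```python
import struct
from collections import namedtuple

# Elf64_Ehdr layout as described in elf(5).
# Assumption: 64-bit, little-endian file (hence the "<" and the Q fields).
EHDR_FMT = "<16sHHIQQQIHHHHHH"
Ehdr = namedtuple(
    "Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

def parse_ehdr(data):
    """Decode the first 64 bytes of `data` as an Elf64_Ehdr."""
    ehdr = Ehdr._make(struct.unpack_from(EHDR_FMT, data, 0))
    if ehdr.e_ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    # e_ident[4] is EI_CLASS (2 = 64-bit), e_ident[5] is EI_DATA (1 = little-endian).
    if ehdr.e_ident[4] != 2 or ehdr.e_ident[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    return ehdr
```

With something like `parse_ehdr(open(path, "rb").read(64))`, `e_phoff` and `e_shoff` are the offsets to the phdr and shdr tables, `e_entry` is the entry point, `e_machine` the architecture, and `e_shstrndx` the index of the shdr describing `.shstrtab`.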

Shdrs and phdrs are explained [here](linkers-8.md). Both provide views on the
ELF file, but for different purposes. Though it's kinda not a good idea, I'll
give you that.

The stuff pointed to by the shdrs and phdrs is:

* `text`, `data`, `rodata`, `bss`, ... blobs
* string table blobs (`strtab`, `shstrtab`, `dynstr`)
* interpreter, comment, etc etc
* relocation tables (`Rel`, `Rela`) (phdrs don't know about this one)
* symbol tables (`Sym`) (phdrs don't know about this one)
* versioning info (`Versym`, `Verdef`, `Verneed`) (phdrs don't know about this one)
* dynamic table (Dyn), which also has entries for relocations, symbols, versioning

Yes, it's true that phdrs, shdrs, dynamic, symtab, dynsym, ... could've just been
tables right after the Ehdr, but that was apparently not complicated enough for
Sun.

## What are all the different sections for?

`.hash`, `.gnu.hash`: hash tables for looking up symbols. Both do the same job but
are slightly different in implementation. `.hash` comes from SysV R4 and has
been deprecated for ages, no clue why it's still there. `.gnu.hash` was made by
the GNU people because they thought the SysV one wasn't good enough.

`.comment` is just a string the toolchain inserts to tell people the binary was
built with that toolchain, for some reason.

`.shstrtab` is the blob that contains the section names (so the actual ".text",
".data", ... strings). For some reason (elaborated on later) this is stored
separately from the other string tables (the `sh_name` field of an
`ElfXX_Shdr` is an offset into this table).
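
The "offset into a string table" idea is the same for every ELF string table, so here's a tiny sketch of it. The blob below is made up for illustration and `strtab_lookup` is a hypothetical helper name; real tables are just longer.

```python
def strtab_lookup(strtab, offset):
    """Return the NUL-terminated string that starts at `offset` in a string table blob."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode("ascii", errors="replace")

# A made-up .shstrtab: a leading NUL byte, then NUL-terminated names back to back.
shstrtab = b"\0.text\0.data\0.shstrtab\0"

text_name = strtab_lookup(shstrtab, 1)    # what an sh_name of 1 would refer to
empty_name = strtab_lookup(shstrtab, 0)   # offset 0 is the empty string
```

Since a name is only an offset plus "read until NUL", nothing stops an offset from pointing into the middle of another string, so one name can double as the tail of a longer one.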

`.rel*` and `.rela*` contain relocation info, used both during
static/"compile-time" linking and runtime/dynamic linking. Binaries contain
only runtime relocation info, linkable objects contain only static linking info
(the linker has to figure out which symbols and relocations need to get turned
into dynamic ones).

`.gnu.version` and `.gnu.version_r` contain versioning information of symbols.
glibc uses this a lot, and practically nothing else does.

There's also `.debug*` and `.dwarf*` stuff for debug info, that's yet another
rabbithole I'm *not* going into this time.

Usually, an ELF binary (not a non-linked object) has two symbol tables,
`.symtab` and `.dynsym`. The former contains all the 'internal' symbols (the
part you can strip away), the latter contains the imported and exported ones.

`.strtab` contains the symbol name strings, the `st_name` of the `ElfXX_Sym`
entries in `.symtab` is again an offset. `.dynstr` contains the names of the
`.dynsym` entries.
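
As a rough sketch of how a symbol table and its string table fit together, the snippet below walks a blob of `Elf64_Sym` entries and resolves each `st_name`. It assumes the 64-bit little-endian layout from `elf(5)`; `parse_symtab` and `Sym` are names invented for this example.

```python
import struct
from collections import namedtuple

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size (24 bytes).
# Assumption: 64-bit little-endian; the 32-bit struct orders its fields differently.
SYM_FMT = "<IBBHQQ"
Sym = namedtuple("Sym", "name bind type shndx value size")

def parse_symtab(symtab, strtab):
    """Decode every Elf64_Sym in `symtab`, resolving names through `strtab`."""
    syms = []
    for st_name, st_info, _other, shndx, value, size in struct.iter_unpack(SYM_FMT, symtab):
        end = strtab.index(b"\0", st_name)
        name = strtab[st_name:end].decode("ascii", errors="replace")
        # st_info packs the binding in the high nibble and the type in the low one.
        syms.append(Sym(name, st_info >> 4, st_info & 0xF, shndx, value, size))
    return syms
```

The same function works for `.symtab` + `.strtab` and for `.dynsym` + `.dynstr`; only the pair of blobs you feed it changes. An imported symbol shows up with `shndx == 0` (`SHN_UNDEF`).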

However, section headers don't actually have to be present at all in binaries
(executables *and* libraries), only in linkable object files. You can just get
rid of them completely (patch out the shdr-related fields in the ELF header),
and things still work, which is why and how you can get rid of `.symtab`,
`.strtab`, `.shstrtab`, etc. (and the shdr table itself), and that's also why all
the string tables are separate.
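
"Patching out the shdr-related fields" could look roughly like the sketch below. It assumes an ELF64 little-endian image and only zeroes `e_shoff`, `e_shnum` and `e_shstrndx` in the header; it does not remove any bytes from the file, so the section data and the shdr table are still sitting there afterwards. `drop_shdr_fields` is a hypothetical name.

```python
import struct

def drop_shdr_fields(elf):
    """Return a copy of an ELF64 LE image whose Ehdr no longer points at any shdrs."""
    out = bytearray(elf)
    if out[:4] != b"\x7fELF" or out[4] != 2 or out[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    struct.pack_into("<Q", out, 0x28, 0)       # e_shoff: no section header table
    struct.pack_into("<HH", out, 0x3C, 0, 0)   # e_shnum, e_shstrndx
    return bytes(out)
```

Tools that read only the phdrs should not care about the change, while anything that walks sections will now see none. Actually shrinking the file means cutting off the bytes that no segment covers, which this sketch leaves alone.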

But how would ld.so find `.dynsym` etc. if the shdrs that point to them are
gone?

That's what `.dynamic` is for: it gathers a bunch of only half-related things
into one table: a list of library dependencies, the addresses of
the `.dynsym`, `.dynstr`, `.gnu.version`, `.rel(a)`, ... tables, misc flags and
settings, and so on (the entries are key/value pairs, see `ElfXX_Dyn`).
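
Here's a small sketch of reading those key/value pairs, assuming 64-bit little-endian `Elf64_Dyn` entries (a signed 64-bit tag followed by a 64-bit value) and the `DT_*` constants from `elf.h`. The function names are made up. One thing to keep in mind: pointer-like values such as `DT_STRTAB` are virtual addresses rather than file offsets, so turning them into file offsets needs the phdrs; this sketch sidesteps that by taking the `.dynstr` blob as an argument.

```python
import struct

# A few DT_* tags from elf.h.
DT_NULL, DT_NEEDED, DT_HASH, DT_STRTAB, DT_SYMTAB = 0, 1, 4, 5, 6
DT_GNU_HASH = 0x6FFFFEF5

def parse_dynamic(blob):
    """Return (d_tag, d_val) pairs up to, but not including, the DT_NULL terminator."""
    entries = []
    usable = len(blob) - len(blob) % 16   # ignore a trailing partial entry
    for d_tag, d_val in struct.iter_unpack("<qQ", blob[:usable]):
        if d_tag == DT_NULL:
            break
        entries.append((d_tag, d_val))
    return entries

def needed_libs(entries, dynstr):
    """Resolve every DT_NEEDED value, which is an offset into .dynstr."""
    libs = []
    for d_tag, d_val in entries:
        if d_tag == DT_NEEDED:
            end = dynstr.index(b"\0", d_val)
            libs.append(dynstr[d_val:end].decode("ascii", errors="replace"))
    return libs
```

Looking for `DT_HASH` or `DT_GNU_HASH` in the result is also a quick way to see which hash tables a binary still advertises.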

But then how does ld.so find `.dynamic`?

That's what the phdrs are for (not). Originally, those are meant for the kernel
to see where in memory an executable needs to be mapped, with offset+address,
alignment, permission, ... info. But as that's the table the kernel looks at,
that's also where they added the info about which interpreter should be used for
the binary, whether the stack should be mapped NX, and so on. There's also one
containing the offset of the `.dynamic` table. The kernel doesn't touch it, but
that's how ld.so can reliably find it.
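
To show what "finding things through the phdrs" means in practice, here's a sketch that walks `Elf64_Phdr` entries and picks out the interpreter, the `.dynamic` location and the stack flags. It assumes ELF64 little-endian and the `PT_*`/`PF_*` constants from `elf.h`; `parse_phdrs` and `summarize` are invented names, and the three header arguments are the `e_phoff`, `e_phentsize` and `e_phnum` fields of the Ehdr.

```python
import struct
from collections import namedtuple

# Elf64_Phdr (56 bytes); note that p_flags comes second in the 64-bit layout.
PHDR_FMT = "<IIQQQQQQ"
Phdr = namedtuple("Phdr", "p_type p_flags p_offset p_vaddr p_paddr p_filesz p_memsz p_align")

PT_LOAD, PT_DYNAMIC, PT_INTERP = 1, 2, 3
PT_GNU_STACK = 0x6474E551
PF_X = 1

def parse_phdrs(data, e_phoff, e_phentsize, e_phnum):
    """Decode the program header table described by the three Ehdr fields."""
    return [
        Phdr._make(struct.unpack_from(PHDR_FMT, data, e_phoff + i * e_phentsize))
        for i in range(e_phnum)
    ]

def summarize(data, phdrs):
    """Collect the non-PT_LOAD bits mentioned above."""
    info = {"interp": None, "dynamic": None, "exec_stack": None}
    for ph in phdrs:
        if ph.p_type == PT_INTERP:
            raw = data[ph.p_offset:ph.p_offset + ph.p_filesz]
            info["interp"] = raw.rstrip(b"\0").decode()
        elif ph.p_type == PT_DYNAMIC:
            info["dynamic"] = (ph.p_offset, ph.p_filesz)   # file offset and size
        elif ph.p_type == PT_GNU_STACK:
            info["exec_stack"] = bool(ph.p_flags & PF_X)
    return info
```

The `(offset, size)` pair under `"dynamic"` is exactly the blob you'd hand to a dynamic-table parser, no shdrs involved.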

The thing is, you can have most things "gone" by removing all the sections, but
many of these will still actually be present because they have an entry in the
`.dynamic` table, which is not very useful. So if you want to get rid of some
stuff (hash tables, versioning info, ...), you'll first have to remove the
entries from the dyn table, and only *then* remove the relevant shdrs, as that
will properly remove it from the binary.

Then you can nuke the shdr table itself using a tool like `sstrip` (usually
packaged in `elf-kickers` or `elfkickers` or ...), binutils/objcopy won't let
you do this.

And that's why, if you want a *small* output file, you want to either write the
ELF headers manually, or use/write a custom linker that doesn't emit all this
stuff.

## Random notes

* `ld.so` has to be linked with `-static-pie`
* All symbol tables must start with a zeroed-out entry, because the standard
says that symbol index 0 (when referencing a symbol elsewhere) means no
symbol, instead of index -1 or so. It's not a sentinel value.
* ld.so will use the hash tables first to look up symbols that are defined in
the binary before resorting to walking the symbol table manually. It probably
actually needs at least one of these two to be present in a binary nowadays.
`.hash` is provided as a fallback for when ld.so wouldn't know about
`.gnu.hash`, but that practically never happens.
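
For reference, the two hash functions behind `.hash` and `.gnu.hash` are small enough to sketch here. This is my reading of the SysV ABI's hash and of the GNU one as described in the `.gnu.hash` post linked from the README, ported to Python with explicit 32-bit masking; it covers only the hashing step, not the bucket/chain or bloom filter lookup that follows it.

```python
def sysv_hash(name):
    """SysV-style ELF hash of a symbol name given as bytes (used by .hash)."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h

def gnu_hash(name):
    """GNU-style hash of a symbol name given as bytes (used by .gnu.hash)."""
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h
```

For example, `gnu_hash(b"printf")` gives `0x156b2bb8` and `sysv_hash(b"printf")` gives `0x077905a6`. The resulting hash then selects a bucket (hash modulo the bucket count) in the respective table.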
