airs-notes/linkers-3.md

# Linkers part 3

Continuing notes on linkers.

## Address Spaces

An address space is simply a view of memory, in which each byte has an address.
The linker deals with three distinct types of address space.

Every input object file is a small address space: the contents have addresses,
and the symbols and relocations refer to the contents by addresses.

The output program will be placed at some location in memory when it runs.
This is the output address space, which I generally refer to as using virtual
memory addresses.

The output program will be loaded at some location in memory. This is the load
memory address. On typical Unix systems virtual memory addresses and load
memory addresses are the same. On embedded systems they are often different;
for example, the initialized data (the initial contents of global or static
variables) may be loaded into ROM at the load memory address, and then copied
into RAM at the virtual memory address.

Shared libraries can normally be run at different virtual memory address in
different processes. A shared library has a base address when it is created;
this is often simply zero. When the dynamic linker copies the shared library
into the virtual memory space of a process, it must apply relocations to
adjust the shared library to run at its virtual memory address. Shared library
systems minimize the number of relocations which must be applied, since they
take time when starting the program.

## Object File Formats

As I said above, an assembler turns human readable assembly language into an
object file. An object file is a binary data file written in a format designed
as input to the linker. The linker generates an executable file. This
executable file is a binary data file written in a format designed as input for
the operating system or the loader (this is true even when linking dynamically,
as normally the operating system loads the executable before invoking the
dynamic linker to begin running the program). There is no logical requirement
that the object file format resemble the executable file format. However,
in practice they are normally very similar.

Most object file formats define sections. A section typically holds memory
contents, or it may be used to hold other types of data. Sections generally
have a name, a type, a size, an address, and an associated array of data.

Object file formats may be classed in two general types: record oriented and
section oriented.

A record oriented object file format defines a series of records of varying
size. Each record starts with some special code, and may be followed by data.
Reading the object file requires reading it from the begininng and processing
each record. Records are used to describe symbols and sections. Relocations may
be associated with sections or may be specified by other records. IEEE-695
and Mach-O are record oriented object file formats used today.

In a section oriented object file format the file header describes a section
table with a specified number of sections. Symbols may appear in a separate
part of the object file described by the file header, or they may appear in a
special section. Relocations may be attached to sections, or they may appear in
separate sections. The object file may be read by reading the section table,
and then reading specific sections directly. ELF, COFF, PE, and a.out are
section oriented object file formats.

Every object file format needs to be able to represent debugging information.
Debugging informations is generated by the compiler and read by the debugger.
In general the linker can just treat it like any other type of data. However,
in practice the debugging information for a program can be larger than the
actual program itself. The linker can use various techniques to reduce the
amount of debugging information, thus reducing the size of the executable.
This can speed up the link, but requires the linker to understand the
debugging information.

The a.out object file format stores debugging information using special strings
in the symbol table, known as stabs. These special strings are simply the names
of symbols with a special type. This technique is also used by some variants of
ECOFF, and by older versions of Mach-O.

The COFF object file format stores debugging information using special fields
in the symbol table. This type information is limited, and is completely
inadequate for C++. A common technique to work around these limitations is to
embed stabs strings in a COFF section.

The ELF object file format stores debugging information in sections with
special names. The debugging information can be stabs strings or the DWARF
debugging format.

More next week.