# Linkers part 3

Continuing notes on linkers.

## Address Spaces

An address space is simply a view of memory, in which each byte has an address.
The linker deals with three distinct types of address space.

Every input object file is a small address space: the contents have addresses,
and the symbols and relocations refer to the contents by addresses.

The output program will be placed at some location in memory when it runs.
This is the output address space, which I generally refer to as using virtual
memory addresses.

The output program will be loaded at some location in memory. This is the load
memory address. On typical Unix systems virtual memory addresses and load
memory addresses are the same. On embedded systems they are often different;
for example, the initialized data (the initial contents of global or static
variables) may be loaded into ROM at the load memory address, and then copied
into RAM at the virtual memory address.
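
As a rough illustration of the embedded case, the sketch below models memory as one flat byte array and copies the initialized data from its load memory address to its virtual memory address, which is roughly what startup code does before the program proper begins. The addresses, the sizes, and the name `copy_initialized_data` are all made up for the example.

```python
# Toy model of an embedded startup routine: memory is one flat bytearray.
# Every address and size here is an assumption chosen for illustration.
ROM_BASE = 0x0000   # hypothetical start of ROM
RAM_BASE = 0x8000   # hypothetical start of RAM


def copy_initialized_data(memory, load_addr, virt_addr, size):
    """Copy initial variable values from the load address to the run address."""
    memory[virt_addr:virt_addr + size] = memory[load_addr:load_addr + size]


memory = bytearray(0x10000)
data_lma = ROM_BASE + 0x100   # where the image stores the initial values
data_vma = RAM_BASE + 0x20    # where the program expects the variables to live
initial_values = bytes([1, 2, 3, 4])
memory[data_lma:data_lma + len(initial_values)] = initial_values

copy_initialized_data(memory, data_lma, data_vma, len(initial_values))
```

After the copy the same bytes exist at both addresses: the ROM copy is the load image, and the RAM copy is what the running program reads and writes.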

Shared libraries can normally be run at different virtual memory addresses in
different processes. A shared library has a base address when it is created;
this is often simply zero. When the dynamic linker copies the shared library
into the virtual memory space of a process, it must apply relocations to
adjust the shared library to run at its virtual memory address. Shared library
systems minimize the number of relocations which must be applied, since they
take time when starting the program.
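
The simplest kind of adjustment can be sketched as follows, assuming a library created with a base address of zero and a list of offsets at which the image holds an address that must be moved along with the library. The function name, the 32-bit little-endian word size, and the load address are assumptions for the example; real dynamic linkers handle several relocation types.

```python
import struct


def apply_relative_relocations(image, reloc_offsets, load_base):
    """Add load_base to each 32-bit little-endian word listed in reloc_offsets."""
    for offset in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (value + load_base) & 0xFFFFFFFF)


# A library image created at base address zero: the word at offset 4 holds
# the address 0x10 of something else inside the library.
image = bytearray(32)
struct.pack_into("<I", image, 4, 0x10)

# Hypothetical address at which this process happens to place the library.
apply_relative_relocations(image, [4], 0x40000000)
relocated = struct.unpack_from("<I", image, 4)[0]
```

The loop runs once per relocation, which is why keeping the relocation list short matters for startup time.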

## Object File Formats

As I said above, an assembler turns human readable assembly language into an
object file. An object file is a binary data file written in a format designed
as input to the linker. The linker generates an executable file. This
executable file is a binary data file written in a format designed as input for
the operating system or the loader (this is true even when linking dynamically,
as normally the operating system loads the executable before invoking the
dynamic linker to begin running the program). There is no logical requirement
that the object file format resemble the executable file format. However,
in practice they are normally very similar.

Most object file formats define sections. A section typically holds memory
contents, or it may be used to hold other types of data. Sections generally
have a name, a type, a size, an address, and an associated array of data.

Object file formats may be classed in two general types: record oriented and
section oriented.

A record oriented object file format defines a series of records of varying
size. Each record starts with some special code, and may be followed by data.
Reading the object file requires reading it from the beginning and processing
each record. Records are used to describe symbols and sections. Relocations may
be associated with sections or may be specified by other records. IEEE-695
and Mach-O are record oriented object file formats used today.
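
To make the sequential nature concrete, here is a minimal reader for a toy record format. The layout (a one-byte code, a two-byte little-endian length, then the payload) and the record codes are invented for this sketch and do not correspond to IEEE-695 or any other real format.

```python
import struct

# Hypothetical record codes for a toy format.
REC_SECTION = 0x01
REC_SYMBOL = 0x02
REC_END = 0xFF


def make_record(code, payload=b""):
    """Encode one record: 1-byte code, 2-byte length, then the payload."""
    return struct.pack("<BH", code, len(payload)) + payload


def read_records(blob):
    """Yield (code, payload) pairs by walking the file from the first byte."""
    pos = 0
    while pos < len(blob):
        code, length = struct.unpack_from("<BH", blob, pos)
        pos += 3
        payload = blob[pos:pos + length]
        pos += length
        yield code, payload
        if code == REC_END:
            break


blob = (make_record(REC_SECTION, b".text")
        + make_record(REC_SYMBOL, b"main")
        + make_record(REC_END))
records = list(read_records(blob))
```

There is no way to find the symbol record without first reading the length of the section record before it, which is the defining property of this style of format.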

In a section oriented object file format the file header describes a section
table with a specified number of sections. Symbols may appear in a separate
part of the object file described by the file header, or they may appear in a
special section. Relocations may be attached to sections, or they may appear in
separate sections. The object file may be read by reading the section table,
and then reading specific sections directly. ELF, COFF, PE, and a.out are
section oriented object file formats.
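
By contrast, a section oriented reader can jump straight to the data it wants. The sketch below uses an invented toy layout: a header giving the section count and the section table offset, followed by fixed-size table entries holding a name, a file offset, and a size. It is not the ELF, COFF, PE, or a.out layout; the magic value and field widths are assumptions.

```python
import struct

HEADER = struct.Struct("<4sHI")   # magic, section count, section table offset
ENTRY = struct.Struct("<8sII")    # name (NUL padded), data offset, data size


def build_toy_file(sections):
    """Lay out header, section table, then section contents."""
    table_off = HEADER.size
    data_off = table_off + ENTRY.size * len(sections)
    entries, body = b"", b""
    for name, data in sections.items():
        entries += ENTRY.pack(name.encode(), data_off + len(body), len(data))
        body += data
    return HEADER.pack(b"TOY\0", len(sections), table_off) + entries + body


def read_section_table(blob):
    """Return {name: (offset, size)} using only the header and the table."""
    magic, count, table_off = HEADER.unpack_from(blob, 0)
    if magic != b"TOY\0":
        raise ValueError("not a toy object file")
    table = {}
    for i in range(count):
        name, off, size = ENTRY.unpack_from(blob, table_off + i * ENTRY.size)
        table[name.rstrip(b"\0").decode()] = (off, size)
    return table


def read_section(blob, table, name):
    """Fetch one section's contents directly, without touching the others."""
    off, size = table[name]
    return blob[off:off + size]


blob = build_toy_file({".text": b"\x90\xc3", ".data": b"\x01\x02\x03\x04"})
table = read_section_table(blob)
data = read_section(blob, table, ".data")
```

Reading `.data` here never looks at the bytes of `.text`; the table entry alone says where to seek.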

Every object file format needs to be able to represent debugging information.
Debugging information is generated by the compiler and read by the debugger.
In general the linker can just treat it like any other type of data. However,
in practice the debugging information for a program can be larger than the
actual program itself. The linker can use various techniques to reduce the
amount of debugging information, thus reducing the size of the executable.
This can speed up the link, but requires the linker to understand the
debugging information.

The a.out object file format stores debugging information using special strings
in the symbol table, known as stabs. These special strings are simply the names
of symbols with a special type. This technique is also used by some variants of
ECOFF, and by older versions of Mach-O.

The COFF object file format stores debugging information using special fields
in the symbol table. This type information is limited, and is completely
inadequate for C++. A common technique to work around these limitations is to
embed stabs strings in a COFF section.

The ELF object file format stores debugging information in sections with
special names. The debugging information can be stabs strings or the DWARF
debugging format.
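
Since ELF identifies debugging information by section name, a tool can classify sections with nothing more than a string comparison. The sketch below relies on the common conventions that DWARF sections start with `.debug_` and that stabs data lives in `.stab` and `.stabstr`; the function name is made up, and the list of names is not exhaustive.

```python
def debug_format(section_name):
    """Return 'DWARF', 'stabs', or None based on the section name alone."""
    if section_name.startswith(".debug_"):
        return "DWARF"
    if section_name in (".stab", ".stabstr"):
        return "stabs"
    return None


names = [".text", ".data", ".debug_info", ".debug_line", ".stab", ".stabstr"]
debug_sections = {n: debug_format(n) for n in names if debug_format(n)}
```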

More next week.