# Linkers part 3

Continuing notes on linkers.

## Address Spaces

An address space is simply a view of memory, in which each byte has an address. The linker deals with three distinct types of address space.

Every input object file is a small address space: the contents have addresses, and the symbols and relocations refer to the contents by addresses.

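To make this concrete, here is a minimal Python sketch built on made-up assumptions: a symbol's value is an offset within its input section, and there is a single toy relocation type that stores an absolute 32-bit little-endian address. The classes `InputSection`, `Symbol`, and `Reloc` and the functions `output_address` and `apply` are hypothetical names for illustration, not any real linker's API. The sketch shows a location being translated from an input object's small address space into the output address space, and the contents being patched to match.

```python
import struct
from dataclasses import dataclass


@dataclass
class InputSection:
    name: str
    data: bytearray      # contents, addressed by offset within this section
    output_vma: int = 0  # set once the linker decides where the section goes


@dataclass
class Symbol:
    name: str
    section: InputSection
    offset: int          # the symbol's address in the input address space


@dataclass
class Reloc:
    # Made-up relocation type: store the absolute 32-bit little-endian
    # address of `target` at `offset` within `section`.
    section: InputSection
    offset: int
    target: Symbol


def output_address(sym: Symbol) -> int:
    """Translate an input-address-space location to an output address."""
    return sym.section.output_vma + sym.offset


def apply(reloc: Reloc) -> None:
    struct.pack_into("<I", reloc.section.data, reloc.offset,
                     output_address(reloc.target))


text = InputSection(".text", bytearray(16))
data = InputSection(".data", bytearray(8))
counter = Symbol("counter", data, offset=4)
fixup = Reloc(text, offset=8, target=counter)

# The linker places the two input sections in the output address space.
text.output_vma, data.output_vma = 0x401000, 0x404000
apply(fixup)
print(hex(output_address(counter)))  # 0x404004
```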
The output program will be placed at some location in memory when it runs. This is the output address space; I generally refer to addresses in it as virtual memory addresses.

The output program will be loaded at some location in memory. This is the load memory address. On typical Unix systems virtual memory addresses and load memory addresses are the same. On embedded systems they are often different; for example, the initialized data (the initial contents of global or static variables) may be loaded into ROM at the load memory address, and then copied into RAM at the virtual memory address.

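A minimal Python sketch of the embedded case, assuming a made-up memory map (ROM at `0x0800_0000`, RAM at `0x2000_0000`) and a hypothetical `Segment` record that carries both addresses. On a real board this copy is done by startup code using addresses the linker provides; here the hypothetical `copy_to_vma` simply simulates it on two byte arrays.

```python
from dataclasses import dataclass

# Assumed memory map for a small embedded board (illustrative only).
ROM_BASE, RAM_BASE = 0x0800_0000, 0x2000_0000
rom = bytearray(64)
ram = bytearray(64)


def locate(addr: int):
    """Map an address in the assumed memory map to (buffer, index)."""
    if ROM_BASE <= addr < ROM_BASE + len(rom):
        return rom, addr - ROM_BASE
    if RAM_BASE <= addr < RAM_BASE + len(ram):
        return ram, addr - RAM_BASE
    raise ValueError(f"unmapped address {addr:#x}")


@dataclass
class Segment:
    name: str
    vma: int   # virtual memory address: where the running code expects it
    lma: int   # load memory address: where the loader placed it
    size: int


def copy_to_vma(seg: Segment) -> None:
    """Startup step for initialized data: copy it from its LMA to its VMA."""
    if seg.vma == seg.lma:
        return  # the usual Unix case: nothing to copy
    src, s = locate(seg.lma)
    dst, d = locate(seg.vma)
    dst[d:d + seg.size] = src[s:s + seg.size]


# Initial value of a global such as `int counter = 42;`, placed in ROM.
data = Segment(".data", vma=RAM_BASE, lma=ROM_BASE + 0x20, size=4)
rom[0x20:0x24] = (42).to_bytes(4, "little")
copy_to_vma(data)
print(int.from_bytes(ram[0:4], "little"))  # 42
```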
Shared libraries can normally be run at different virtual memory addresses in different processes. A shared library has a base address when it is created; this is often simply zero. When the dynamic linker copies the shared library into the virtual memory space of a process, it must apply relocations to adjust the shared library to run at its virtual memory address. Shared library systems try to minimize the number of relocations that must be applied, since applying them takes time when the program starts.

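A minimal Python sketch of that adjustment, loosely modeled on the "relative" relocations that ELF dynamic linkers process but simplified: the value to adjust is stored in the word itself, and there are no symbol lookups. The function names and the byte layout are assumptions for illustration. Each entry in `relative_offsets` is one fix-up performed at startup in every process, which is the cost that shared library systems try to keep small.

```python
import struct


def relocate(image: bytearray, relative_offsets, load_base: int) -> None:
    """Adjust a library image built at base address 0 to run at load_base.

    Each offset names a 64-bit little-endian word holding an address that
    was computed for base 0; fixing it up means adding the real base.
    """
    for off in relative_offsets:
        (value,) = struct.unpack_from("<Q", image, off)
        struct.pack_into("<Q", image, off, value + load_base)


def build_library() -> bytearray:
    # A tiny "library" built at base 0: a two-entry pointer table at
    # offset 0x10 pointing at offsets 0x100 and 0x180 in the library.
    lib = bytearray(0x20)
    struct.pack_into("<QQ", lib, 0x10, 0x100, 0x180)
    return lib


# The same library can be relocated differently in each process.
for base in (0x7F00_0000_0000, 0x5555_0000_0000):
    lib = build_library()
    relocate(lib, [0x10, 0x18], base)
    print([hex(v) for v in struct.unpack_from("<QQ", lib, 0x10)])
```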
## Object File Formats

As I said earlier, an assembler turns human-readable assembly language into an object file. An object file is a binary data file written in a format designed as input to the linker. The linker generates an executable file. This executable file is a binary data file written in a format designed as input for the operating system or the loader (this is true even when linking dynamically, as normally the operating system loads the executable before invoking the dynamic linker to begin running the program). There is no logical requirement that the object file format resemble the executable file format. However, in practice they are normally very similar.

Most object file formats define sections. A section typically holds memory contents, or it may be used to hold other types of data. Sections generally have a name, a type, a size, an address, and an associated array of data.

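As a minimal sketch, the section fields listed above might be modeled like this in Python. The `type` values and the `alloc` flag are simplified, ELF-flavoured assumptions added to distinguish the two kinds of section the text mentions. Note that `size` and `len(data)` need not agree: a zero-filled section such as `.bss` takes memory but no file space.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    type: str      # simplified, ELF-flavoured: "progbits", "nobits", "symtab"
    size: int      # size of the section's contents
    address: int   # memory address of the contents (typically 0 before linking)
    data: bytes    # bytes actually stored in the file
    alloc: bool    # True if the section occupies memory when the program runs


sections = [
    Section(".text", "progbits", 2, 0, b"\x90\xc3", alloc=True),
    Section(".bss", "nobits", 64, 0, b"", alloc=True),             # zero-filled, no file data
    Section(".symtab", "symtab", 24, 0, bytes(24), alloc=False),   # only for the linker
]

for s in sections:
    kind = "memory contents" if s.alloc else "other data"
    print(f"{s.name:8} {s.type:9} size={s.size:<3} {kind}")
```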
Object file formats may be classed in two general types: record oriented and section oriented.

A record oriented object file format defines a series of records of varying size. Each record starts with some special code, and may be followed by data. Reading the object file requires reading it from the beginning and processing each record. Records are used to describe symbols and sections. Relocations may be associated with sections or may be specified by other records. IEEE-695 and Mach-O are record oriented object file formats used today.

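To show why such a file must be read from the start, here is a minimal Python sketch of a made-up record format; it is not IEEE-695 or Mach-O. The record codes, the layout, and the names `read_records`, `load`, and `rec` are all assumptions. Because a record's meaning can depend on the records before it (here, data and symbol records apply to the most recent section record), the reader has no choice but to walk the records in order.

```python
import struct

# A made-up record format (NOT IEEE-695 or Mach-O): each record is a
# 1-byte code, a 2-byte little-endian payload length, then the payload.
REC_END, REC_SECTION, REC_DATA, REC_SYMBOL = 0, 1, 2, 3
REC_HEADER = struct.Struct("<BH")


def read_records(buf: bytes):
    """Yield (code, payload) pairs, reading from the beginning."""
    pos = 0
    while pos < len(buf):
        code, length = REC_HEADER.unpack_from(buf, pos)
        pos += REC_HEADER.size
        payload = buf[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated record")
        pos += length
        yield code, payload
        if code == REC_END:
            return


def load(buf: bytes):
    sections, symbols, current = {}, {}, None
    for code, payload in read_records(buf):
        if code == REC_SECTION:
            current = payload.decode()
            sections[current] = bytearray()
        elif code == REC_DATA:
            if current is None:
                raise ValueError("data record before any section record")
            sections[current] += payload  # applies to the latest section
        elif code == REC_SYMBOL:
            (offset,) = struct.unpack_from("<I", payload)
            symbols[payload[4:].decode()] = (current, offset)
    return sections, symbols


def rec(code: int, payload: bytes = b"") -> bytes:
    return REC_HEADER.pack(code, len(payload)) + payload


obj = (rec(REC_SECTION, b".text") + rec(REC_DATA, b"\x90\xc3")
       + rec(REC_SYMBOL, struct.pack("<I", 1) + b"ret") + rec(REC_END))
sections, symbols = load(obj)
print(sections, symbols)
```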
In a section oriented object file format the file header describes a section table with a specified number of sections. Symbols may appear in a separate part of the object file described by the file header, or they may appear in a special section. Relocations may be attached to sections, or they may appear in separate sections. The object file may be read by reading the section table, and then reading specific sections directly. ELF, COFF, PE, and a.out are section oriented object file formats.

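For contrast, a minimal Python sketch of a made-up section-oriented format; it is not ELF or COFF, and the `TOBJ` magic and field layout are assumptions. The header says where the section table is and how many entries it has, so the hypothetical `read_section` can go directly to one section without parsing any of the others.

```python
import struct

# A made-up section-oriented format (NOT ELF or COFF):
#   header: 4-byte magic b"TOBJ", u16 section count, u32 table offset
#   table:  per section, 8-byte NUL-padded name, u32 file offset, u32 size
HEADER = struct.Struct("<4sHI")
ENTRY = struct.Struct("<8sII")


def section_table(buf: bytes) -> dict:
    magic, count, table_off = HEADER.unpack_from(buf, 0)
    if magic != b"TOBJ":
        raise ValueError("not a TOBJ file")
    table = {}
    for i in range(count):
        raw, off, size = ENTRY.unpack_from(buf, table_off + i * ENTRY.size)
        table[raw.rstrip(b"\0").decode()] = (off, size)
    return table


def read_section(buf: bytes, name: str) -> bytes:
    # Consult the table, then go straight to the one section wanted.
    off, size = section_table(buf)[name]
    return buf[off:off + size]


def build(sections: dict) -> bytes:
    """Write the header, then the section contents, then the section table."""
    body, entries = b"", []
    for name, data in sections.items():
        entries.append(ENTRY.pack(name.encode(), HEADER.size + len(body), len(data)))
        body += data
    table_off = HEADER.size + len(body)
    return HEADER.pack(b"TOBJ", len(sections), table_off) + body + b"".join(entries)


obj = build({".text": b"\x90\xc3", ".data": (42).to_bytes(4, "little")})
print(section_table(obj))                # {'.text': (10, 2), '.data': (12, 4)}
print(read_section(obj, ".data").hex())  # 2a000000
```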
Every object file format needs to be able to represent debugging information. Debugging information is generated by the compiler and read by the debugger. In general the linker can just treat it like any other type of data. However, in practice the debugging information for a program can be larger than the actual program itself. The linker can use various techniques to reduce the amount of debugging information, thus reducing the size of the executable. This can speed up the link, but requires the linker to understand the debugging information.

The a.out object file format stores debugging information using special strings in the symbol table, known as stabs. These special strings are simply the names of symbols with a special type. This technique is also used by some variants of ECOFF, and by older versions of Mach-O.

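A minimal Python sketch of the idea, using a simplified a.out symbol entry. The constants `N_STAB` (0xe0), `N_GSYM` (0x20), and `N_FUN` (0x24) follow the traditional a.out and stabs headers. `Nlist`, `is_stab`, and `parse_stab` are hypothetical names, and the parsing is deliberately naive: it splits only at the first colon and ignores the rest of the stabs type-description syntax.

```python
from dataclasses import dataclass

N_STAB = 0xE0  # a.out: any of these bits set in n_type marks a stab
N_GSYM = 0x20  # stab type: global variable
N_FUN = 0x24   # stab type: function


@dataclass
class Nlist:
    # Simplified a.out symbol entry; real nlist also has n_other and n_desc,
    # and stores the name as an offset into a string table.
    name: str
    n_type: int
    n_value: int


def is_stab(sym: Nlist) -> bool:
    return (sym.n_type & N_STAB) != 0


def parse_stab(sym: Nlist):
    """Split a stabs string such as "main:F1" into (name, descriptor, type)."""
    name, _, rest = sym.name.partition(":")
    return name, rest[:1], rest[1:]


symtab = [
    Nlist("_main", 0x05, 0x1020),     # ordinary symbol: N_TEXT | N_EXT
    Nlist("main:F1", N_FUN, 0x1020),  # function main, returning type 1
    Nlist("counter:G1", N_GSYM, 0),   # global variable counter of type 1
]
print([parse_stab(s) for s in symtab if is_stab(s)])
# [('main', 'F', '1'), ('counter', 'G', '1')]
```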
The COFF object file format stores debugging information using special fields in the symbol table. The type information these fields can express is limited, and is completely inadequate for C++. A common technique to work around these limitations is to embed stabs strings in a COFF section.

The ELF object file format stores debugging information in sections with special names. The debugging information can be stabs strings or the DWARF debugging format.

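Because the names are what mark these sections, a tool that wants to strip or shrink debugging information can find them by name. A minimal Python sketch, assuming the conventional names: DWARF sections start with `.debug_` (for example `.debug_info` and `.debug_line`), and stabs in ELF use `.stab` and `.stabstr`. The function `debug_format` is hypothetical, and real tools also handle other variants, such as compressed debug sections, which this ignores.

```python
def debug_format(section_name: str):
    """Guess which debugging format an ELF section holds, from its name alone."""
    if section_name.startswith(".debug_"):
        return "DWARF"
    if section_name in (".stab", ".stabstr"):
        return "stabs"
    return None  # not debugging information


names = [".text", ".data", ".debug_info", ".debug_abbrev", ".debug_line",
         ".symtab", ".strtab", ".stab", ".stabstr"]

# A strip-like pass: keep everything that is not debugging information.
kept = [n for n in names if debug_format(n) is None]
print(kept)  # ['.text', '.data', '.symtab', '.strtab']
```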
More next week.