# Linkers part 2

I’m back, and I’m still doing the linker technical introduction.

Shared libraries were invented as an optimization for virtual memory systems
running many processes simultaneously. People noticed that there is a set of
basic functions which appear in almost every program. Before shared libraries,
in a system which runs multiple processes simultaneously, that meant that
almost every process had a copy of exactly the same code. This suggested that
on a virtual memory system it would be possible to arrange that code so that a
single copy could be shared by every process using it. The virtual memory
system would be used to map the single copy into the address space of each
process which needed it. This would require less physical memory to run
multiple programs, and thus yield better performance.

I believe the first implementation of shared libraries was on SVR3, based on
COFF. This implementation was simple, and basically assigned each shared
library a fixed portion of the virtual address space. This did not require any
significant changes to the linker. However, requiring each shared library to
reserve an appropriate portion of the virtual address space was inconvenient.

SunOS4 introduced a more flexible version of shared libraries, which was later
picked up by SVR4. This implementation postponed some of the operation of the
linker to runtime. When the program started, it would automatically run a
limited version of the linker which would link the program proper with the
shared libraries. The version of the linker which runs when the program starts
is known as the dynamic linker. When it is necessary to distinguish them, I
will refer to the version of the linker which creates the program as the
program linker. This type of shared library required a significant change to
the traditional program linker: it now had to build linking information which
could be used efficiently at runtime by the dynamic linker.

That is the end of the introduction. You should now understand the basics of
what a linker does. I will now turn to how it does it.

## Basic Linker Data Types

The linker operates on a small number of basic data types: symbols,
relocations, and contents. These are defined in the input object files. Here is
an overview of each of these.

A symbol is basically a name and a value. Many symbols represent static objects
in the original source code, that is, objects which exist in a single place for
the duration of the program. For example, in an object file generated from C
code, there will be a symbol for each function and for each global and static
variable. The value of such a symbol is simply an offset into the contents.
This type of symbol is known as a defined symbol. It’s important not to confuse
the value of the symbol representing the variable `my_global_var` with the
value of `my_global_var` itself. The value of the symbol is roughly the address
of the variable: the value you would get from the expression
`&my_global_var` in C.

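To make that distinction concrete, here is a small C sketch. The helper function names and the initial value 42 are made up for illustration; only `my_global_var` comes from the text above. One helper returns what the linker would consider the value of the symbol (an address), the other returns the value of the variable.

```c
#include <assert.h>
#include <stdint.h>

/* A global variable: the object file gets a defined symbol named
   my_global_var whose value is an offset into the contents. */
int my_global_var = 42; /* 42 is an arbitrary illustrative value */

/* Hypothetical helper: the final value of the symbol, i.e. the address
   that the variable ended up at once the program was linked and loaded. */
uintptr_t symbol_value_of_my_global_var(void) {
    return (uintptr_t)&my_global_var;
}

/* Hypothetical helper: the value of the variable itself. */
int variable_value_of_my_global_var(void) {
    return my_global_var;
}

/* Following the symbol's value as a pointer leads back to the variable. */
void check_symbol_refers_to_variable(void) {
    int *p = (int *)symbol_value_of_my_global_var();
    assert(*p == my_global_var);
}
```

Assigning a new value to `my_global_var` changes what the second helper returns, but not what the first one returns: the symbol's value stays fixed for the life of the program.
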
Symbols are also used to indicate a reference to a name defined in a different
object file. Such a reference is known as an undefined symbol. There are other
less commonly used types of symbols which I will describe later.

During the linking process, the linker will assign an address to each defined
symbol, and will resolve each undefined symbol by finding a defined symbol with
the same name.

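Resolution by name can be sketched in a few lines of C. This is a minimal illustration, not how any real linker stores symbols: the `Symbol` struct, its fields, and both function names are assumptions made for this example, and it presumes the defined symbols have already been assigned their addresses. A real linker would use a hash table rather than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol record; real object formats carry more. */
typedef struct {
    const char *name;
    int defined;         /* 1 = defined symbol, 0 = undefined reference */
    int resolved;        /* set to 1 once an undefined symbol is matched */
    unsigned long value; /* address if defined or resolved, else unused */
} Symbol;

/* Find a defined symbol with the given name, or NULL if there is none. */
const Symbol *find_definition(const Symbol *syms, size_t n, const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (syms[i].defined && strcmp(syms[i].name, name) == 0)
            return &syms[i];
    }
    return NULL;
}

/* Resolve each undefined symbol by copying the value of the defined
   symbol with the same name. Returns how many remain unresolved. */
size_t resolve_symbols(Symbol *syms, size_t n) {
    size_t unresolved = 0;
    for (size_t i = 0; i < n; i++) {
        if (syms[i].defined)
            continue;
        const Symbol *def = find_definition(syms, n, syms[i].name);
        if (def != NULL) {
            syms[i].value = def->value;
            syms[i].resolved = 1;
        } else {
            unresolved++;
        }
    }
    return unresolved;
}
```

A nonzero return value corresponds to the situation where a linker would normally report undefined symbols.
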
A relocation is a computation to perform on the contents. Most relocations
refer to a symbol and to an offset within the contents. Many relocations will
also provide an additional operand, known as the addend. A simple, and commonly
used, relocation is “set this location in the contents to the value of this
symbol plus this addend.” The types of computations that relocations do are
inherently dependent on the architecture of the processor for which the linker
is generating code. For example, RISC processors which require two or more
instructions to form a memory address will have separate relocations to be
used with each of those instructions, such as “set this location in the
contents to the lower 16 bits of the value of this symbol.”

During the linking process, the linker will perform all of the relocation
computations as directed. A relocation in an object file may refer to an
undefined symbol. If the linker is unable to resolve that symbol, it will
normally issue an error (but not always: for some symbol types or some
relocation types an error may not be appropriate).

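The two relocations quoted above can be sketched in C as follows. Everything here is an assumption made for illustration: the names `RELOC_ABS32` and `RELOC_LO16`, the `Reloc` struct, the 32-bit symbol values, and the little-endian byte order. The symbol is taken to be already resolved. On a real RISC target, a lower-16-bits relocation typically patches a field inside an instruction rather than overwriting two whole bytes, so treat this as a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation kinds, named for this sketch only. */
typedef enum {
    RELOC_ABS32, /* store (symbol value + addend) as a 32-bit word */
    RELOC_LO16   /* store the lower 16 bits of (symbol value + addend) */
} RelocKind;

typedef struct {
    RelocKind kind;
    size_t offset;      /* where in the contents to write */
    uint32_t sym_value; /* final value of the symbol referred to */
    int32_t addend;     /* the additional operand */
} Reloc;

/* Apply one relocation to the contents, writing little-endian bytes.
   Returns 0 on success, -1 if it would write past the end. */
int apply_reloc(uint8_t *contents, size_t size, const Reloc *r) {
    assert(contents != NULL && r != NULL);
    uint32_t v = r->sym_value + (uint32_t)r->addend;
    size_t width = (r->kind == RELOC_ABS32) ? 4 : 2;
    if (r->offset > size || size - r->offset < width)
        return -1;
    for (size_t i = 0; i < width; i++)
        contents[r->offset + i] = (uint8_t)(v >> (8 * i));
    return 0;
}
```

Writing only `width` bytes is what makes `RELOC_LO16` keep just the lower 16 bits: the upper half of `v` is never stored.
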
The contents are what memory should look like during the execution of the
program. Contents have a size, an array of bytes, and a type. They contain the
machine code generated by the compiler and assembler (known as text). They
contain the values of initialized variables (data). They contain static
unnamed data like string constants and switch tables (read-only data or rdata).
They contain uninitialized variables, in which case the array of bytes is
generally omitted and assumed to contain only zeroes (bss). The compiler and
the assembler work hard to generate exactly the right contents, but the linker
really doesn’t care about them except as raw data. The linker reads the
contents from each file, concatenates them all together sorted by type,
applies the relocations, and writes the result into the executable file.

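As a rough sketch of “concatenates them all together sorted by type,” the C code below assigns an output offset to each piece of contents so that pieces of the same type end up adjacent. The `Contents` struct, the enum names, and the chosen type order (text, read-only data, data, bss) are assumptions for this example, and alignment is ignored entirely, which a real linker could not do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical content types, listed in the order they are laid out. */
typedef enum { CT_TEXT, CT_RODATA, CT_DATA, CT_BSS, CT_COUNT } ContentsType;

/* Contents have a size, an array of bytes, and a type. */
typedef struct {
    ContentsType type;
    size_t size;
    const unsigned char *bytes; /* NULL for bss: assumed to be all zeroes */
    size_t out_offset;          /* filled in by layout_contents */
} Contents;

/* Give every piece an offset in the output, grouping pieces by type and
   keeping input order within a type. Returns the total output size. */
size_t layout_contents(Contents *pieces, size_t n) {
    size_t offset = 0;
    for (int t = 0; t < CT_COUNT; t++) {
        for (size_t i = 0; i < n; i++) {
            if (pieces[i].type != (ContentsType)t)
                continue;
            /* Only bss (or an empty piece) may omit its byte array. */
            assert(pieces[i].type == CT_BSS || pieces[i].bytes != NULL ||
                   pieces[i].size == 0);
            pieces[i].out_offset = offset;
            offset += pieces[i].size;
        }
    }
    return offset;
}
```

Note that the function never looks inside `bytes`: as far as layout is concerned, the contents really are just raw data with a size and a type.
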
## Basic Linker Operation

At this point we already know enough to understand the basic steps used by
every linker.

* Read the input object files. Determine the length and type of the contents.
  Read the symbols.
* Build a symbol table containing all the symbols, linking undefined symbols to
  their definitions.
* Decide where all the contents should go in the output executable file, which
  means deciding where they should go in memory when the program runs.
* Read the contents data and the relocations. Apply the relocations to the
  contents. Write the result to the output file.
* Optionally write out the complete symbol table with the final values of the
  symbols.

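The steps above can be strung together in a toy, in-memory linker. This is a sketch under heavy assumptions, not a description of any real linker: the `ToyObject`, `ToySym`, `ToyReloc`, and `GlobalSym` types and the `toy_link` function are invented for this example, each object has a single untyped block of contents, the only relocation is a 32-bit little-endian “symbol plus addend,” and the optional last step (writing out the symbol table) is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMS 16 /* arbitrary limit for this sketch */

typedef struct { const char *name; int defined; uint32_t value; } ToySym;
typedef struct { size_t offset; const char *sym; int32_t addend; } ToyReloc;
typedef struct {
    const uint8_t *bytes; size_t size;      /* the contents */
    const ToySym *syms; size_t nsyms;       /* value = offset into contents */
    const ToyReloc *relocs; size_t nrelocs; /* refer to symbols by name */
} ToyObject;
typedef struct { const char *name; uint32_t addr; } GlobalSym;

/* Look up a name in the global symbol table; returns 0 if found. */
static int toy_lookup(const GlobalSym *tab, size_t n, const char *name,
                      uint32_t *addr) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            *addr = tab[i].addr;
            return 0;
        }
    }
    return -1;
}

/* Link objs into out (capacity cap), assuming the result is loaded at
   address base. Returns the output size, or -1 on any error. */
long toy_link(const ToyObject *objs, size_t nobjs, uint32_t base,
              uint8_t *out, size_t cap) {
    assert(objs != NULL && out != NULL);
    GlobalSym tab[MAX_SYMS];
    size_t ntab = 0, total = 0;

    /* Read sizes and symbols, place the contents one after another, and
       record the final address of every defined symbol. */
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t s = 0; s < objs[i].nsyms; s++) {
            const ToySym *sym = &objs[i].syms[s];
            if (!sym->defined)
                continue; /* undefined: matched by name further down */
            if (ntab == MAX_SYMS)
                return -1;
            tab[ntab].name = sym->name;
            tab[ntab].addr = base + (uint32_t)total + sym->value;
            ntab++;
        }
        total += objs[i].size;
    }
    if (total > cap)
        return -1;

    /* Copy the contents and apply the relocations to the copy. */
    size_t pos = 0;
    for (size_t i = 0; i < nobjs; i++) {
        if (objs[i].size > 0)
            memcpy(out + pos, objs[i].bytes, objs[i].size);
        for (size_t r = 0; r < objs[i].nrelocs; r++) {
            const ToyReloc *rel = &objs[i].relocs[r];
            uint32_t addr;
            if (toy_lookup(tab, ntab, rel->sym, &addr) != 0)
                return -1; /* unresolved symbol */
            if (rel->offset > objs[i].size || objs[i].size - rel->offset < 4)
                return -1; /* relocation outside the contents */
            uint32_t v = addr + (uint32_t)rel->addend;
            for (int b = 0; b < 4; b++)
                out[pos + rel->offset + b] = (uint8_t)(v >> (8 * b));
        }
        pos += objs[i].size;
    }
    return (long)total;
}
```

The two loops mirror the split in the list: symbol addresses cannot be final until the layout is decided, so relocations are only applied in the second pass.
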
More tomorrow.