# Linkers part 14

## Link Time Optimization

I’ve already mentioned some optimizations which are peculiar to the linker:
relaxation and garbage collection of unwanted sections. There is another class
of optimizations which occur at link time, but are really related to the
compiler. The general name for these optimizations is link time optimization or
whole program optimization.

The general idea is that the compiler optimization passes are run at link time.
The advantage of running them at link time is that the compiler can then see
the entire program. This permits the compiler to perform optimizations which
cannot be done when source files are compiled separately. The most obvious
such optimization is inlining functions across source files. Another is
optimizing the calling sequence for simple functions, e.g., passing more
parameters in registers, or knowing that the function will not clobber all
registers; this can only be done when the compiler can see all callers of the
function. Experience shows that these and other optimizations can bring
significant performance benefits.
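
To make the cross-file inlining case concrete, here is a small sketch. Both parts are shown in one block so that it compiles as is, but the idea is that each part lives in its own source file; the file names and function names are made up for illustration.

```cpp
#include <cassert>

// --- square.cpp (hypothetical file name) ---
// Compiled on its own, this is the only place the body of square() is known.
int square(int x) { return x * x; }

// --- sum.cpp (hypothetical file name) ---
// With separate compilation only this declaration is visible here, so the
// call in the loop has to remain a real call. With link time optimization
// the optimizer also sees the body above and is free to inline it.
int square(int x);

int sum_of_squares(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += square(i);
    }
    return total;
}
```

With GCC or Clang, passing `-flto` both when compiling and when linking is the usual way to request this. Whether the call is actually inlined is still up to the optimizer.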

Generally these optimizations are implemented by having the compiler write a
version of its intermediate representation into the object file, or into some
parallel file. The intermediate representation will be the parsed version of
the source file, and may already have had some local optimizations applied.
Sometimes the object file contains only the compiler intermediate
representation; sometimes it also contains the usual object code. In the former
case link time optimization is required; in the latter case it is optional.

I know of two typical ways to implement link time optimization. The first
approach is for the compiler to provide a pre-linker. The pre-linker examines
the object files looking for stored intermediate representation. When it finds
some, it runs the link time optimization passes. The second approach is for the
linker proper to call back into the compiler when it finds intermediate
representation. This is generally done via some sort of plugin API.
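
The second approach can be pictured with a toy model. This is only a sketch of the control flow, not any real linker's plugin API: `InputFile`, `CompilerPlugin`, and `resolve_inputs` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy model of an input file: either ordinary object code or stored IR.
struct InputFile {
    std::string name;
    bool has_ir;
};

// Hypothetical plugin interface: given the files holding IR, the compiler
// returns the names of the ordinary object files it generated from them.
using CompilerPlugin =
    std::function<std::vector<std::string>(const std::vector<InputFile>&)>;

// Returns the list of object files the linker proper will go on to process.
std::vector<std::string> resolve_inputs(const std::vector<InputFile>& inputs,
                                        const CompilerPlugin& plugin) {
    std::vector<std::string> to_link;
    std::vector<InputFile> ir_files;
    for (const InputFile& f : inputs) {
        if (f.has_ir) {
            ir_files.push_back(f);       // set aside for the compiler
        } else {
            to_link.push_back(f.name);   // ordinary object file, link as usual
        }
    }
    if (!ir_files.empty()) {
        assert(plugin);                  // IR was found, so a plugin is required
        for (const std::string& obj : plugin(ir_files)) {
            to_link.push_back(obj);      // compiler output joins the link
        }
    }
    return to_link;
}
```

The point of the sketch is that the linker never looks inside the intermediate representation. It hands those files to the compiler and only ever links ordinary object files.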

Although these optimizations happen at link time, they are not part of the
linker proper, at least not as I defined it. When the compiler reads the stored
intermediate representation, it will eventually generate an object file, one
way or another. The linker proper will then process that object file as usual.
These optimizations should be thought of as part of the compiler.

## Initialization Code

C++ permits global variables to have constructors and destructors. The global
constructors must be run before main starts, and the global destructors must be
run after exit is called. Making this work requires the compiler and the linker
to cooperate.
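
A short illustration of what the language requires: the two globals below have constructors, and both have already run by the time `main` is entered. The names `Tracer`, `event_log`, and `check_constructors_ran` are made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Event log kept in a function-local static, so it is guaranteed to exist
// before any global constructor touches it.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

struct Tracer {
    explicit Tracer(const char* name) : name_(name) {
        event_log().push_back("construct " + name_);
    }
    ~Tracer() { event_log().push_back("destroy " + name_); }
    std::string name_;
};

// Two globals with constructors. Within one source file they are
// constructed in the order in which they are defined.
Tracer first("first");
Tracer second("second");

// Meant to be called as the first statement of main().
void check_constructors_ran() {
    assert(event_log().size() == 2);
}
```

Nothing in `main` calls these constructors. The startup code does, using tables that the compiler and linker build together as described in the rest of this section.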

The a.out object file format is rarely used these days, but the GNU a.out
linker has an interesting extension. In a.out, symbols have a one byte type
field. This encodes a bunch of debugging information, and also the section in
which the symbol is defined. The a.out object file format only supports three
sections: text, data, and bss. Four symbol types are defined as sets: text set,
data set, bss set, and absolute set. A symbol with a set type is permitted to
be defined multiple times. The GNU linker will not give a multiple definition
error, but will instead build a table with all the values of the symbol. The
table will start with one word holding the number of entries, and will end with
a zero word. In the output file the set symbol will be defined as the address
of the start of the table.
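
The table layout can be sketched as follows. This models only the layout described above (count word, values, zero word), not the actual GNU linker code; `build_set_table` and `Word` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Word = std::uintptr_t;

// Given every value a set symbol was defined with across the input files,
// builds the table as [count, value..., 0].
std::vector<Word> build_set_table(const std::vector<Word>& definitions) {
    std::vector<Word> table;
    table.reserve(definitions.size() + 2);
    table.push_back(static_cast<Word>(definitions.size()));  // leading count word
    table.insert(table.end(), definitions.begin(), definitions.end());
    table.push_back(0);                                       // terminating zero word
    assert(table.size() == definitions.size() + 2);
    return table;
}
```

For three definitions with the made-up values `0x1000`, `0x2040`, and `0x30f0`, the table holds five words: `3`, the three values, and `0`.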

For each C++ global constructor, the compiler would generate a symbol named
`__CTOR_LIST__` with the text set type. The value of the symbol in the object
file would be the global constructor function. The linker would gather together
all the `__CTOR_LIST__` functions into a table. The startup code supplied by
the compiler would walk down the `__CTOR_LIST__` table and call each function.
Global destructors were handled similarly, with the name `__DTOR_LIST__`.
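
The startup side of this can be sketched as a loop over such a table. This is an illustration under the layout described above, not the actual startup code; `run_ctor_table`, `ctor_a`, and `ctor_b` are hypothetical names, and the table is passed in as an ordinary array rather than found through `__CTOR_LIST__`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = std::uintptr_t;
using InitFn = void (*)();

// Walks a table laid out as [count, fn..., 0] and calls each entry in turn.
// Returns how many functions were called.
std::size_t run_ctor_table(const Word* table) {
    const Word count = table[0];
    std::size_t called = 0;
    for (const Word* p = table + 1; *p != 0; ++p) {
        reinterpret_cast<InitFn>(*p)();  // each word is a function address
        ++called;
    }
    assert(called == count);             // count word and terminator agree
    return called;
}

// Stand-ins for compiler-generated global constructor functions.
int g_order = 0;
int g_a_at = 0;
int g_b_at = 0;
void ctor_a() { g_a_at = ++g_order; }
void ctor_b() { g_b_at = ++g_order; }
```

A caller would build the array `{2, address of ctor_a, address of ctor_b, 0}` and pass it to `run_ctor_table`, which is roughly the job the linker did when it gathered the set symbol's values.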

Anyhow, so much for a.out. In ELF, global constructors are handled in a fairly
similar way, but without using magic symbol types. I’ll describe what gcc does.
An object file which defines a global constructor will include a `.ctors`
section. The compiler will arrange to link special object files at the very
start and very end of the link. The one at the start of the link will define a
symbol for the `.ctors` section; that symbol will wind up at the start of the
section. The one at the end of the link will define a symbol for the end of the
`.ctors` section. The compiler startup code will walk between the two symbols,
calling the constructors. Global destructors work similarly, in a `.dtors`
section.
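
The idea of walking between two symbols looks roughly like this. It is a simplified sketch: the array `kCtors` stands in for the combined `.ctors` section, the two pointers stand in for the start and end symbols, and the direction of the walk is an assumption made for the example. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

using InitFn = void (*)();

// Calls every function pointer in [begin, end). In a real link the two
// bounds would be symbols defined by the special first and last object files.
std::size_t run_ctors_between(const InitFn* begin, const InitFn* end) {
    assert(begin <= end);
    std::size_t called = 0;
    for (const InitFn* p = begin; p != end; ++p) {
        (*p)();
        ++called;
    }
    return called;
}

// Stand-ins for constructor functions contributed by different object files.
int g_calls = 0;
void ctor_one() { ++g_calls; }
void ctor_two() { ++g_calls; }

// Stand-in for the concatenated .ctors contributions of all input files.
const InitFn kCtors[] = {ctor_one, ctor_two};
```

Unlike the a.out table, nothing here needs a count word or a special symbol type. The linker's ordinary behavior of concatenating same-named sections in link order is what puts the start and end symbols in the right places.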

ELF shared libraries work similarly. When the dynamic linker loads a shared
library, it will call the function at the `DT_INIT` tag if there is one. By
convention the ELF program linker will set this to the function named `_init`,
if there is one. Similarly the function at the `DT_FINI` tag is called when a
shared library is unloaded, and the program linker will set this to the
function named `_fini`.

As I mentioned earlier, there are also `DT_INIT_ARRAY`, `DT_PREINIT_ARRAY`, and
`DT_FINI_ARRAY` tags, which are set based on the `SHT_INIT_ARRAY`,
`SHT_PREINIT_ARRAY`, and `SHT_FINI_ARRAY` section types. This is a newer
approach in ELF, and does not require relying on special symbol names.
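
From the programmer's side, one way to get a function run at startup is the `constructor` function attribute, which is a GCC and Clang extension rather than standard C++. On many current ELF toolchains such a function is, as far as I know, recorded through an `SHT_INIT_ARRAY` section, though older configurations use `.ctors` instead, so treat that mapping as an assumption. The function names below are made up.

```cpp
#include <cassert>

// Constant-initialized, so it already holds false before any startup
// function runs.
static bool g_initialized = false;

// GCC/Clang extension: ask for this function to be run before main().
__attribute__((constructor)) static void init_before_main() {
    g_initialized = true;
}

// Accessor with external linkage, for code that wants to inspect the flag.
bool initialized_before_main() { return g_initialized; }

// Meant to be called from main(): by then the startup code has already
// run the function above.
void check_ran_before_main() {
    assert(g_initialized);
}
```

No special symbol name is involved here. The function is found through the section it is recorded in, not by being called `_init`.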

More tomorrow.