
add stuff

Commit bd3524e516 on main by sys64738, 8 months ago

README.md

# airs-notes
Collection of ELF and gold linker notes from AIRS' blog, for easier searching.
## Source
https://www.airs.com/blog/index.php?s=linkers+part
Written by and copyright Ian Lance Taylor; collected here for easy lookup.
## Index
* [Linkers part 1: introduction](/linkers-1.md)
* [Linkers part 2: technical introduction](/linkers-2.md)
* [Linkers part 3: address spaces, object file formats](/linkers-3.md)
* [Linkers part 4: shared libraries](/linkers-4.md)
* [Linkers part 5: shared libraries redux, ELF symbols](/linkers-5.md)
* [Linkers part 6: relocations, position-dependent libraries](/linkers-6.md)
* [Linkers part 7: thread-local storage](/linkers-7.md)
* [Linkers part 8: ELF segments and sections](/linkers-8.md)
* [Linkers part 9: symbol versions, relaxation](/linkers-9.md)
* [Linkers part 10: parallel linking](/linkers-10.md)
* [Linkers part 11: archives](/linkers-11.md)
* [Linkers part 12: symbol resolution](/linkers-12.md)
* [Linkers part 13: symbol versions redux](/linkers-13.md)
* [Linkers part 14: link-time optimization, initialization code](/linkers-14.md)
* [Linkers part 15: COMDAT sections](/linkers-15.md)
* [Linkers part 16: C++ template instantiation, exception frames](/linkers-16.md)
* [Linkers part 17: warning symbols](/linkers-17.md)
* [Linkers part 18: incremental linking](/linkers-18.md)
* [Linkers part 19: `__start` and `__stop` symbols, byte swapping](/linkers-19.md)
* [Linkers part 20: ending note](/linkers-20.md)

Other articles included as well:

* [GCC exception frames](/gcc-exception-frames.md)
* [Linker combreloc](/linker-combreloc.md)
* [Linker relro](/linker-relro.md)
* [Combining versions](/combining-versions.md)
* [Version scripts](/version-scripts.md)
* [Protected symbols](/protected-symbols.md)
* [`.eh_frame`](/eh_frame.md)
* [`.eh_frame_hdr`](/eh_frame_hdr.md)
* [`.gcc_except_table`](/gcc_except_table.md)
* [Executable stack](/executable-stack.md)
* [Piece of PIE](/piece-of-pie.md)

combining-versions.md

# Combining versions
Sun introduced a symbol versioning scheme to use for the linker. Their
implementation is relatively simple: symbol versions are defined in a version
script provided when a shared library is created. The dynamic linker can
verify that all required versions are present. This is useful for ensuring that
an application can run with a specific version of the library.
In the Sun versioning scheme, when a symbol is changed to have an incompatible
interface, the library file name must change. This then produces a new
`DT_SONAME` entry, which leads to new `DT_NEEDED` entries, and thus manages
incompatibility at that level.
Ulrich Drepper and Eric Youngdale introduced a much more sophisticated symbol
versioning scheme, which is used by glibc, the GNU linker, and gold. The
key differences are that versions may be specified in object files and that
shared libraries may contain multiple independent versions of the same symbol.
Versions are specified in object files by naming the symbol `NAME@VERSION` or
`NAME@@VERSION`. In the former case the symbol is a hidden version, available
only by specific request. In the latter case the symbol is a default version,
and references to `NAME` will be linked to `NAME@@VERSION`. Versions may also
be specified in version scripts.
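
To make the two spellings concrete, here is a small C helper that splits a versioned name into its parts. The function name `split_versioned_name` is hypothetical; this is a sketch of the naming convention only, not code taken from gold or the GNU linker.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split "NAME", "NAME@VERSION" or "NAME@@VERSION" (hypothetical helper).
   *name_len receives the length of the bare NAME, *version points at the
   version string (NULL if none), and *is_default is true only for "@@". */
static void split_versioned_name(const char *sym, size_t *name_len,
                                 const char **version, bool *is_default)
{
    const char *at = strchr(sym, '@');

    if (at == NULL) {
        /* Plain NAME: the object file attaches no version. */
        *name_len = strlen(sym);
        *version = NULL;
        *is_default = false;
        return;
    }
    *name_len = (size_t)(at - sym);
    *is_default = (at[1] == '@');           /* "@@" marks the default version */
    *version = *is_default ? at + 2 : at + 1;
}
```
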
This facility means that in principle it is never necessary to change the
library file name. The versioning scheme lets the dynamic linker direct each
symbol reference to the appropriate version. This in turn means that in a
complicated program with many shared libraries compiled against different
versions of the base library, only one instance of the base library needs to be
loaded.
However, this additional complexity leads to additional ambiguity. There are
now two possible sources of a symbol version: the name in the object file and
an entry in the version script. There is the possibility that two instances of
the same name will disagree on whether the name should be globally visible or
not. In fact, this is normal, as undefined references will always use
`NAME@VERSION`, not `NAME@@VERSION`. Symbol overriding can be confusing: if the
main executable defines `NAME` without a version, which versions should it
override in the shared library? Which version should be used in the program?
Symbol visibility adds an additional wrinkle to this.
The most important issue for the linker arises when it sees both `NAME` and
`NAME@VERSION`, and then sees `NAME@@VERSION`. At that time the linker has seen
two separate symbols and has to decide whether to merge them. The rules that
gold currently follows are these:
* If `NAME` is hidden, and `NAME@@VERSION` is in a shared object, they are two
independent symbols, and we do not change `NAME` or its version.
* If `NAME` already has a version, because we earlier saw `NAME@@VERSION2`,
then we produce two separate symbols, and leave `NAME@@VERSION2` as the
default symbol.
* Otherwise, we change the version of `NAME` to `VERSION`, and do normal symbol
resolution.
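
The three rules above can be restated as a decision function. This is a simplified model for illustration only: the struct fields and enum names are hypothetical, and gold's actual implementation (which is C++ and handles more cases) is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical outcomes when a previously seen NAME meets NAME@@VERSION. */
enum merge_outcome {
    KEEP_SEPARATE_HIDDEN,     /* rule 1: hidden NAME vs. shared-object definition */
    KEEP_SEPARATE_VERSIONED,  /* rule 2: NAME is already NAME@@VERSION2 */
    MERGE_AS_VERSION          /* rule 3: NAME takes VERSION, then normal resolution */
};

/* Facts the model needs about the two symbols. */
struct merge_input {
    bool name_is_hidden;            /* NAME has hidden visibility */
    bool new_def_in_shared;         /* NAME@@VERSION comes from a shared object */
    bool name_has_default_version;  /* we earlier saw NAME@@VERSION2 */
};

static enum merge_outcome decide_merge(const struct merge_input *in)
{
    if (in->name_is_hidden && in->new_def_in_shared)
        return KEEP_SEPARATE_HIDDEN;
    if (in->name_has_default_version)
        return KEEP_SEPARATE_VERSIONED;  /* NAME@@VERSION2 stays the default */
    return MERGE_AS_VERSION;
}
```
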
I recently fixed a bug in this code in gold, which was breaking symbol
overriding in a specific case. I wouldn’t be surprised if there are more bugs.
As far as I know nobody has worked through all the symbol combining issues and
defined what should happen.

eh_frame.md

# .eh_frame
When gcc generates code that handles exceptions, it produces tables that
describe how to unwind the stack. These tables are found in the `.eh_frame`
section. The format of the `.eh_frame` section is very similar to the format of
a DWARF `.debug_frame` section. Unfortunately, it is not precisely identical. I
don’t know of any documentation which describes this format. The following
should be read in conjunction with the relevant section of the DWARF standard,
available from http://dwarfstd.org.
The `.eh_frame` section is a sequence of records. Each record is either a CIE
(Common Information Entry) or an FDE (Frame Description Entry). In general
there is one CIE per object file, and each CIE is associated with a list of
FDEs. Each FDE is typically associated with a single function. The CIE and the
FDE together describe how to unwind to the caller if the current instruction
pointer is in the range covered by the FDE.
There should be exactly one FDE covering each instruction which may be being
executed when an exception occurs. By default an exception can only occur
during a function call or a throw. When using the `-fnon-call-exceptions` gcc
option, an exception can also occur on most memory references and floating
point operations. When using `-fasynchronous-unwind-tables`, the FDE will cover
every instruction, to permit unwinding from a signal handler.
The general format of a CIE or FDE starts as follows:
* Length of record. Read 4 bytes. If they are not `0xffffffff`, they are the
length of the CIE or FDE record, not counting the length field itself.
Otherwise the next 64 bits hold the length, and the record uses the 64-bit
DWARF format. This is like `.debug_frame`.
* A 4 byte ID. For a CIE this is 0. For an FDE it is the byte offset from this
field to the start of the CIE with which this FDE is associated. The byte
offset goes to the length record of the CIE. A positive value goes backward;
that is, you have to subtract the value of the ID field from the current byte
position to get the CIE position. This differs from `.debug_frame` in that
the offset is relative rather than being an offset into the `.debug_frame`
section.
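
A minimal sketch of reading this common header, assuming the section contents are already in memory in host byte order (true when a process inspects its own `.eh_frame`). It also treats a zero length as an end-of-data marker, which is how libgcc's unwinder detects the end of the frame data; that detail goes beyond the text above. `read_eh_record` and `fde_cie_offset` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common header of a CIE or FDE record. */
struct eh_record {
    uint64_t length;     /* record length, not counting the length field(s) */
    size_t   id_offset;  /* section offset of the 4-byte ID field */
    uint32_t id;         /* 0 for a CIE; back-offset to the CIE for an FDE */
    size_t   next;       /* section offset of the following record */
};

/* Parse the record at section offset 'off'. Returns 0 on success,
   1 for a zero-length terminator, -1 if the record does not fit. */
static int read_eh_record(const unsigned char *sec, size_t size, size_t off,
                          struct eh_record *rec)
{
    uint32_t len32;

    if (off + 4 > size)
        return -1;
    memcpy(&len32, sec + off, 4);
    off += 4;
    if (len32 == 0xffffffffu) {            /* 64-bit DWARF format */
        if (off + 8 > size)
            return -1;
        memcpy(&rec->length, sec + off, 8);
        off += 8;
    } else {
        rec->length = len32;
    }
    if (rec->length == 0)
        return 1;                          /* end-of-data marker */
    if (rec->length < 4 || rec->length > size - off)
        return -1;
    rec->id_offset = off;
    memcpy(&rec->id, sec + off, 4);
    rec->next = off + (size_t)rec->length;
    return 0;
}

/* Subtract an FDE's ID from the ID field's own position to reach the
   length field of its CIE. */
static size_t fde_cie_offset(const struct eh_record *fde)
{
    return fde->id_offset - fde->id;
}
```
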
A CIE record continues as follows:
* 1 byte CIE version. As of this writing this should be 1 or 3.
* NUL-terminated augmentation string, a sequence of characters described
further below. Very old versions of gcc used the string “eh” here, but I
won’t document that.
* Code alignment factor, an unsigned LEB128 (LEB128 is a DWARF encoding for
numbers which I won’t describe here). This should always be 1 for `.eh_frame`.
* Data alignment factor, a signed LEB128. This is a constant factored out of
offset instructions, as in `.debug_frame`.
* The return address register. In CIE version 1 this is a single byte; in CIE
version 3 this is an unsigned LEB128. This indicates which column in the
frame table represents the return address.
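
Since LEB128 is not described here: each byte carries 7 bits of the value, least significant group first, and the high bit is set on every byte except the last. In the signed form, bit 6 of the last byte is the sign bit. A minimal decoder, assuming well-formed input of at most 10 bytes (there is no bounds or overflow checking):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value; returns the number of bytes consumed. */
static size_t read_uleb128(const unsigned char *p, uint64_t *val)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    unsigned char byte;

    do {
        byte = p[n++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);          /* high bit set: more bytes follow */
    *val = result;
    return n;
}

/* Decode a signed LEB128 value; sign-extend from bit 6 of the last byte. */
static size_t read_sleb128(const unsigned char *p, int64_t *val)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    unsigned char byte;

    do {
        byte = p[n++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;
    *val = (int64_t)result;
    return n;
}
```

For example, the bytes `e5 8e 26` decode to 624485 as an unsigned LEB128, and `c0 bb 78` decode to -123456 as a signed LEB128.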
The next fields of the CIE depend on the augmentation string.
* If the augmentation string starts with ‘z’, we now find an unsigned LEB128
which is the length of the augmentation data, rounded up so that the CIE ends
on an address boundary. This is used to skip to the end of the augmentation
data if an unrecognized augmentation character is seen.
* If the next character in the augmentation string is ‘L’, the next byte in the
CIE is the LSDA (Language Specific Data Area) encoding. This is a
`DW_EH_PE_xxx` value (described later). The default is `DW_EH_PE_absptr`.
* If the next character in the augmentation string is ‘R’, the next byte in the
CIE is the FDE encoding. This is a `DW_EH_PE_xxx` value. The default is
`DW_EH_PE_absptr`.
* The character ‘S’ in the augmentation string means that this CIE represents a
stack frame for the invocation of a signal handler. When unwinding the stack,
signal stack frames are handled slightly differently: the instruction pointer
is assumed to be before the next instruction to execute rather than after it.
* If the next character in the augmentation string is ‘P’, the next byte in the
CIE is the personality encoding, a `DW_EH_PE_xxx` value. This is followed by
a pointer to the personality function, encoded using the personality
encoding. I’ll describe the personality function some other day.
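
A sketch of walking the augmentation string, processing characters in order and consuming the matching augmentation data bytes. Assumptions: the string starts with 'z' (so an unknown character could in principle be skipped using the length); LEB128-encoded and `DW_EH_PE_aligned` personality pointers are not handled; and when there is no 'L' the sketch records `DW_EH_PE_omit` (0xff) so that no LSDA pointer is expected in the FDEs. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Size of a fixed-size DW_EH_PE value, or -1 if this sketch can't skip it. */
static int encoded_size(unsigned char enc)
{
    switch (enc & 0x0f) {
    case 0x00:            return (int)sizeof(void *);  /* absptr */
    case 0x02: case 0x0a: return 2;                    /* udata2, sdata2 */
    case 0x03: case 0x0b: return 4;                    /* udata4, sdata4 */
    case 0x04: case 0x0c: return 8;                    /* udata8, sdata8 */
    default:              return -1;                   /* LEB128 forms, omit */
    }
}

struct cie_aug {
    unsigned char lsda_enc;          /* 0xff (omit) unless 'L' is present */
    unsigned char fde_enc;           /* 0x00 (absptr) unless 'R' is present */
    unsigned char pers_enc;          /* 0xff (omit) unless 'P' is present */
    const unsigned char *pers_ptr;   /* still-encoded personality pointer */
    bool signal_frame;               /* 'S' seen */
};

/* 'data' points just after the 'z' length field. Returns 0 on success. */
static int parse_cie_augmentation(const char *aug, const unsigned char *data,
                                  struct cie_aug *out)
{
    out->lsda_enc = 0xff;
    out->fde_enc = 0x00;
    out->pers_enc = 0xff;
    out->pers_ptr = NULL;
    out->signal_frame = false;
    if (aug[0] != 'z')
        return -1;
    for (const char *c = aug + 1; *c != '\0'; c++) {
        switch (*c) {
        case 'L': out->lsda_enc = *data++; break;
        case 'R': out->fde_enc = *data++; break;
        case 'S': out->signal_frame = true; break;   /* no data bytes */
        case 'P': {
            int size;
            out->pers_enc = *data++;
            size = encoded_size(out->pers_enc);
            if (size < 0)
                return -1;
            out->pers_ptr = data;                    /* decode later */
            data += size;
            break;
        }
        default:
            return -1;   /* a full parser would skip the rest via the 'z' length */
        }
    }
    return 0;
}
```
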
The remaining bytes are an array of `DW_CFA_xxx` opcodes which define the
initial values for the frame table. This is then followed by `DW_CFA_nop`
padding bytes as required to match the total length of the CIE.
An FDE starts with the length and ID described above, and then continues as
follows.
* The starting address to which this FDE applies. This is encoded using the FDE
encoding specified by the associated CIE.
* The number of bytes after the start address to which this FDE applies. This
is encoded using the FDE encoding.
* If the CIE augmentation string starts with ‘z’, the FDE next has an unsigned
LEB128 which is the total size of the FDE augmentation data. This may be used
to skip data associated with unrecognized augmentation characters.
* If the CIE does not specify `DW_EH_PE_omit` as the LSDA encoding, the FDE
next has a pointer to the LSDA, encoded as specified by the CIE.
The remaining bytes in the FDE are an array of `DW_CFA_xxx` opcodes which set
values in the frame table for unwinding to the caller.
The `DW_EH_PE_xxx` encodings describe how to encode values in a CIE or FDE. The
basic encoding is as follows:
* `DW_EH_PE_absptr = 0x00`: An absolute pointer. The size is determined by
whether this is a 32-bit or 64-bit address space, and will be 32 or 64 bits.
* `DW_EH_PE_omit = 0xff`: The value is omitted.
* `DW_EH_PE_uleb128 = 0x01`: The value is an unsigned LEB128.
* `DW_EH_PE_udata2 = 0x02`, `DW_EH_PE_udata4 = 0x03`, `DW_EH_PE_udata8 = 0x04`:
The value is stored as unsigned data with the specified number of bytes.
* `DW_EH_PE_signed = 0x08`: A signed number. The size is determined by whether
this is a 32-bit or 64-bit address space. I don’t think this ever appears in
a CIE or FDE in practice.
* `DW_EH_PE_sleb128 = 0x09`: A signed LEB128. Not used in practice.
* `DW_EH_PE_sdata2 = 0x0a`, `DW_EH_PE_sdata4 = 0x0b`, `DW_EH_PE_sdata8 = 0x0c`:
The value is stored as signed data with the specified number of bytes. Not
used in practice.
In addition to the above basic encodings, there are modifiers, which are ORed
with a basic encoding.
* `DW_EH_PE_pcrel = 0x10`: Value is PC relative.
* `DW_EH_PE_textrel = 0x20`: Value is text relative.
* `DW_EH_PE_datarel = 0x30`: Value is data relative.
* `DW_EH_PE_funcrel = 0x40`: Value is relative to start of function.
* `DW_EH_PE_aligned = 0x50`: Value is aligned: padding bytes are inserted as
required to make value be naturally aligned.
* `DW_EH_PE_indirect = 0x80`: This is actually the address of the real value.
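
A sketch of decoding a single encoded value, covering the fixed-size formats plus the `pcrel`, `datarel` and `indirect` modifiers. It assumes the bytes are in host order (a process reading its own unwind tables) and that the caller has already checked for `DW_EH_PE_omit`. Following libgcc's behaviour, a stored value of 0 is kept as a null pointer rather than having the base added. `read_encoded` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DW_EH_PE_absptr   0x00
#define DW_EH_PE_udata2   0x02
#define DW_EH_PE_udata4   0x03
#define DW_EH_PE_udata8   0x04
#define DW_EH_PE_sdata2   0x0a
#define DW_EH_PE_sdata4   0x0b
#define DW_EH_PE_sdata8   0x0c
#define DW_EH_PE_pcrel    0x10
#define DW_EH_PE_datarel  0x30
#define DW_EH_PE_indirect 0x80

/* Decode the value at 'p'; 'data_base' is used for DW_EH_PE_datarel.
   Returns the number of bytes consumed, or 0 if unsupported. */
static size_t read_encoded(unsigned char enc, const unsigned char *p,
                           uintptr_t data_base, uintptr_t *out)
{
    uintptr_t val;
    size_t n;

    switch (enc & 0x0f) {                 /* low nibble: value format */
    case DW_EH_PE_absptr: memcpy(&val, p, sizeof val); n = sizeof val; break;
    case DW_EH_PE_udata2: { uint16_t v; memcpy(&v, p, 2); val = v; n = 2; break; }
    case DW_EH_PE_udata4: { uint32_t v; memcpy(&v, p, 4); val = v; n = 4; break; }
    case DW_EH_PE_udata8: { uint64_t v; memcpy(&v, p, 8); val = (uintptr_t)v; n = 8; break; }
    case DW_EH_PE_sdata2: { int16_t v; memcpy(&v, p, 2); val = (uintptr_t)(intptr_t)v; n = 2; break; }
    case DW_EH_PE_sdata4: { int32_t v; memcpy(&v, p, 4); val = (uintptr_t)(intptr_t)v; n = 4; break; }
    case DW_EH_PE_sdata8: { int64_t v; memcpy(&v, p, 8); val = (uintptr_t)(intptr_t)v; n = 8; break; }
    default: return 0;                    /* LEB128 forms are left out here */
    }
    if (val != 0) {
        switch (enc & 0x70) {             /* bits 4-6: how to apply the value */
        case 0x00: break;                 /* absolute */
        case DW_EH_PE_pcrel:   val += (uintptr_t)p; break;  /* relative to the field */
        case DW_EH_PE_datarel: val += data_base;    break;
        default: return 0;                /* textrel, funcrel, aligned: not handled */
        }
        if (enc & DW_EH_PE_indirect)      /* val is the address of the real value */
            memcpy(&val, (const void *)val, sizeof val);
    }
    *out = val;
    return n;
}
```
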
If you follow all that, and also read up on `.debug_frame`, then you have
enough information to unwind the stack at runtime, e.g. to implement glibc’s
backtrace function. Later I’ll describe the LSDA and the personality function,
which work together to implement exception catching on top of stack unwinding.

eh_frame_hdr.md

# .eh_frame_hdr
If you followed my last post, you will see that in order to unwind the stack
you have to find the FDE associated with a given program counter value. There
are two steps to this problem. The first one is finding the CIEs and FDEs at
all. The second one is, given the set of FDEs, finding the one you need.
The old way this worked was that gcc would create a global constructor which
called the function `__register_frame_info`, passing a pointer to the
`.eh_frame` data and a pointer to the object. The latter pointer would indicate
the shared library, and was used to deregister the information after a dlclose.
When looking for an FDE, the unwinder would walk through the registered frames,
and sort them. Then it would use the sorted list to find the desired FDE.
The old way still works, but these days, at least on GNU/Linux, the sorting is
done at link time, which is better than doing it at runtime. Both gold and the
GNU linker support an option `--eh-frame-hdr` which tells them to construct a
header for all the `.eh_frame` sections. This header is placed in a section
named `.eh_frame_hdr` and also in a `PT_GNU_EH_FRAME` segment. At runtime the
unwinder can find all the `PT_GNU_EH_FRAME` segments by calling `dl_iterate_phdr`.
The format of the `.eh_frame_hdr` section is as follows:
* A 1 byte version number, currently 1.
* A 1 byte encoding of the pointer to the exception frames. This is a
`DW_EH_PE_xxx` value. It is normally `DW_EH_PE_pcrel | DW_EH_PE_sdata4`,
meaning a 4 byte relative offset.
* A 1 byte encoding of the count of the number of FDEs in the lookup table.
This is a `DW_EH_PE_xxx` value. It is normally `DW_EH_PE_udata4`, meaning a 4
byte unsigned count.
* A 1 byte encoding of the entries in the lookup table. This is a
`DW_EH_PE_xxx` value. It is normally `DW_EH_PE_datarel | DW_EH_PE_sdata4`,
meaning a 4 byte offset from the start of the `.eh_frame_hdr` section. That
is the only encoding that gcc’s current unwind library supports.
* A pointer to the contents of the `.eh_frame` section, encoded as indicated by
the second byte in the header. This pointer is only used if the format of the
lookup table is not supported or is for some reason omitted.
* The number of FDE pointers in the table, encoded as indicated by the third
byte in the header. If there are no FDEs, the encoding can be `DW_EH_PE_omit`
and this number will not be present.
* The lookup table itself, starting at a 4-byte aligned address in memory.
Assuming the fourth byte in the header is `DW_EH_PE_datarel | DW_EH_PE_sdata4`,
each entry in the table is 8 bytes long. The first four bytes are an offset
to the initial PC value for the FDE. The last four bytes are an offset to the
FDE data itself. The table is sorted by starting PC.
Since FDEs do not overlap, this table is sufficient for the stack unwinder to
quickly find the relevant FDE if there is one.
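
A sketch of the lookup, assuming the usual `DW_EH_PE_datarel | DW_EH_PE_sdata4` table encoding, so each entry can be read as two signed 32-bit offsets from the start of `.eh_frame_hdr`. It finds the last entry whose initial PC is not above the target. The names are hypothetical, and a real unwinder must still check the FDE's own address range, since the table records only where each FDE starts.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry: both fields are offsets from the start of .eh_frame_hdr. */
struct eh_hdr_entry {
    int32_t initial_pc;
    int32_t fde;
};

/* Return the address of the candidate FDE for 'pc', or 0 if 'pc' is
   below every entry. 'hdr' is the run-time address of .eh_frame_hdr. */
static uintptr_t find_fde(uintptr_t hdr, const struct eh_hdr_entry *table,
                          size_t count, uintptr_t pc)
{
    size_t lo = 0, hi = count;

    /* Binary search for the first entry whose start is above pc. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (hdr + (intptr_t)table[mid].initial_pc <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0;
    return hdr + (intptr_t)table[lo - 1].fde;   /* the entry just before it */
}
```
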

executable-stack.md

# Executable stack
The gcc compiler implements an extension to C: nested functions. A trivial example:
```c
int f(void) {
    int i = 2;
    int g(int j) { return i + j; }  /* nested function: a GCC extension */
    return g(3);
}
```
The function `f` will return 5. Note in particular that the nested function `g`
refers to the variable `i` defined in the enclosing function.
You can mostly treat nested functions as ordinary functions. In particular, you
can take the address of a nested function, and you can pass the resulting
function pointer to another function, that function can make a call through the
function pointer to the nested function, and the nested function will correctly
refer to variables in its caller’s stack frame. I’m not here going to go into
the details of how this is implemented. What I will say is that gcc currently
implements this by writing instructions to the stack and using a pointer to
those instructions. This requires that the stack be executable.
This approach was implemented many years ago, before computers were routinely
attacked. In the hostile Internet environment of today, an area of memory that
is both writable and executable is dangerous, because it gives an attacker
space to create brand new instructions to execute. Since the stack must be
writable, this means that we want to make the stack non-executable if possible.
Since very few programs use nested functions, this is normally possible. But we
don’t want to break those few programs either.
This is how the GNU tools do it on ELF systems such as GNU/Linux. The compiler
adds a new section to all code that it compiles. The section is named
`.note.GNU-stack`. It is empty and not allocated, which means that it takes up
no space at runtime. If the code being compiled does not require an executable
stack—the normal case—the compiler doesn’t set any flags for the section. If
the code does require an executable stack, the compiler sets the
`SHF_EXECINSTR` flag.
When the linker links a program, it checks each input object for a
`.note.GNU-stack` section. If there is no such section, the linker assumes that
the object must be old, and therefore may require an executable stack. If there
is such a section, the linker checks the section flags to see whether the code
requires an executable stack. The linker discards the `.note.GNU-stack`
sections, and creates a `PT_GNU_STACK` segment in the output executable. The
`PT_GNU_STACK` segment is empty and is not part of any `PT_LOAD` segment. The
segment flags `PF_R` and `PF_W` are always set. If the linker has determined
that the program requires an executable stack, it also sets the `PF_X` flag.
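
The linker's decision can be modelled as a small C function over per-object facts. This is an illustrative sketch, not the GNU linker's or gold's code: `struct input_obj` and `gnu_stack_flags` are hypothetical, and the `force` parameter stands in for `-z execstack` / `-z noexecstack`. The `SHF_EXECINSTR` and `PF_*` constants come from `<elf.h>`.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* What the linker learns about one input object (hypothetical model). */
struct input_obj {
    bool has_gnu_stack_note;  /* object contains a .note.GNU-stack section */
    Elf64_Xword note_flags;   /* sh_flags of that section, if present */
};

/* p_flags for the output PT_GNU_STACK segment: PF_R|PF_W always, plus PF_X
   if any input lacks the note or marks it SHF_EXECINSTR.
   force: 1 = -z execstack, -1 = -z noexecstack, 0 = neither. */
static Elf64_Word gnu_stack_flags(const struct input_obj *objs, size_t n, int force)
{
    bool exec = false;

    for (size_t i = 0; i < n; i++) {
        if (!objs[i].has_gnu_stack_note                /* old object: assume exec */
            || (objs[i].note_flags & SHF_EXECINSTR))   /* e.g. nested functions */
            exec = true;
    }
    if (force > 0)
        exec = true;
    if (force < 0)
        exec = false;
    return PF_R | PF_W | (exec ? PF_X : 0);
}
```
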
When the Linux kernel starts a program, it looks for a `PT_GNU_STACK` segment.
If it does not find one, it sets the stack to be executable (if appropriate for
the architecture). If it does find a `PT_GNU_STACK` segment, it marks the stack
as executable if the segment flags call for it. (It’s possible to override this
and force the kernel to never use an executable stack.) Similarly, the dynamic
linker looks for a `PT_GNU_STACK` in any executable or shared library that it
loads, and changes the stack to be executable if any of them require it.
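
Likewise, the kernel's check can be sketched over an array of program headers. `stack_should_be_executable` is a hypothetical name, and `default_exec` stands in for the architecture-dependent default that applies when there is no `PT_GNU_STACK` segment.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Should the initial stack be executable for a program with these
   program headers? (Hypothetical model of the rule described above.) */
static bool stack_should_be_executable(const Elf64_Phdr *phdr, size_t phnum,
                                       bool default_exec)
{
    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type == PT_GNU_STACK)
            return (phdr[i].p_flags & PF_X) != 0;   /* honour the segment flags */
    }
    return default_exec;   /* no PT_GNU_STACK: fall back to the old behaviour */
}
```
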
When this all works smoothly, most programs wind up with a non-executable
stack, which is what we want. The most common reason that this fails these days
is that part of the program is written in assembler, and the assembler code
does not create a `.note.GNU-stack` section. If you write assembler code for
GNU/Linux, you must always be careful to add the appropriate line to your file.
For most targets, the line you want is:
```asm
.section .note.GNU-stack,"",@progbits
```
There are some linker options to control this. The `-z execstack` option tells
the linker to mark the program as requiring an executable stack, regardless of
the input files. The `-z noexecstack` option marks it as not requiring an
executable stack. The gold linker has a `--warn-execstack` option which will
cause the linker to warn about any object which is missing a `.note.GNU-stack`
section or which has an executable `.note.GNU-stack` section.
The `execstack` program may also be used to query whether a program requires an
executable stack, and to change its setting.
These days we could probably change the default: we could probably say that if
an object file does not have a `.note.GNU-stack` section, then it does not
require an executable stack. That would avoid the problem of files written in
assembler which do not create the section. It’s possible that this would cause
some programs to incorrectly get a non-executable stack, but I think that would
be quite unlikely in practice. An advantage of changing the default would be
that the compiler would not have to create an empty `.note.GNU-stack` section
in all object files.
By the way, there is one thing you can do with a normal function that you cannot
do with a nested function: if the nested function refers to any variables
in the enclosing function, you cannot return a pointer to the nested function
to the caller. If you do, the variable will disappear when the enclosing
function returns, so the variable reference in the nested function will be a
dangling reference. It’s worth noting
here that the Go language supports nested function literals which may refer to
variables in the enclosing function, and when using Go this works correctly.
The compiler creates variables on the heap if necessary, so they do not
disappear until the garbage collector determines that nothing refers to them
any more.
Finally, I’ll mention that there are some plans to implement a different scheme
for nested functions in C, one which does not require any memory to be both
writable and executable, but these plans have not yet been implemented. I’ll
leave the implementation as an exercise for the reader.

gcc-exception-frames.md

# GCC Exception Frames
When an exception is thrown in C++ and caught by one of the calling functions,
the supporting libraries need to unwind the stack. With gcc this is done using
a variant of DWARF debugging information. The unwind information is loaded at
runtime, but is not read unless an exception is thrown. That means that the
unwind library needs to have some way of finding the appropriate unwind
information at runtime.
On some systems, this is done by registering the exception frame information
when the program starts. The registration is done with a variant of the
handling of C++ constructors. This becomes interesting when one shared library
can throw an exception which is caught by another shared library. It is
possible for such a case to arise when the executable itself never throws
exceptions and therefore has no frames to register. Obviously the unwinder
needs to be able to find the unwind information for both shared libraries,
which means that both shared libraries need to use the same registration
functions. With gcc this is normally ensured by putting the unwind code in a
shared library, `libgcc_s.so`. Each shared library, and sometimes the
executable, will use `libgcc_s.so`. That ensures a single copy of the
registration and unwind functions, so the library will be able to reliably
unwind across shared libraries. With gcc the use of `libgcc_s.so` can be
controlled with the `-shared-libgcc` and `-static-libgcc` options. Normally the
right thing will happen by default.
That approach has a cost: there is an extra shared library, and there is a
small cost of registering the unwind information at program startup or library
load time (and unregistering it if a shared library is unloaded via dlclose).
There is now a better way, which requires linker support.
Both gold and the GNU linker support the command line option `--eh-frame-hdr`.
With this option, when the linker sees the `.eh_frame` sections used to hold
the unwind information, it automatically builds a header. This header is a
sorted array mapping program counter addresses to unwind information. The
header is recorded as a program segment of type `PT_GNU_EH_FRAME`. (This is a
little bit ugly since the `.eh_frame` sections are recognized only by name;
ideally they should have a special section type.)
At runtime, the unwind library can use the `dl_iterate_phdr` function to find
the program segments of the executable and all currently loaded shared
libraries. It can use that to find the `PT_GNU_EH_FRAME` segments, and use the
sorted array in those segments to quickly find the unwind information.
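
A minimal sketch of that search on a glibc-based system, using the real `dl_iterate_phdr` interface from `<link.h>`. It only counts the loaded objects that have a `PT_GNU_EH_FRAME` segment; the function names are hypothetical, and an actual unwinder would go on to search the sorted table in each segment.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* Called once per loaded object (the executable and each shared library). */
static int count_eh_frame_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    size_t *count = data;

    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME) {
            /* The header lives at info->dlpi_addr + info->dlpi_phdr[i].p_vaddr. */
            ++*count;
            break;
        }
    }
    return 0;   /* zero means keep iterating */
}

static size_t count_objects_with_eh_frame_hdr(void)
{
    size_t count = 0;
    dl_iterate_phdr(count_eh_frame_cb, &count);
    return count;
}
```
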
This approach means that no registration functions are required. It also means
that it is not necessary to have a single shared library, since
`dl_iterate_phdr` is available no matter which shared library throws the
exception.
This all only works if you have a linker which supports generating
`PT_GNU_EH_FRAME` segments, if all the shared libraries and the executable are
linked by such a linker, and if you have a working `dl_iterate_phdr` function
in your C library or dynamic linker. I think that pretty much restricts this
approach to GNU/Linux and possibly other free operating systems. For those
scenarios, I hope that gcc will soon be able to stop using `libgcc_s.so` by
default.

gcc_except_table.md

# .gcc_except_table
Throwing an exception in C++ requires more than unwinding the stack. As the
program unwinds, local variable destructors must be executed. Catch clauses
must be examined to see if they should catch the exception. Exception
specifications must be checked to see if the exception should be redirected to
the unexpected handler. Similar issues arise in Go, Java, and even C when using
gcc’s cleanup function attribute.
As I described earlier, each CIE in the unwind data may contain a pointer to a
personality function, and each FDE may contain a pointer to the LSDA, the
Language Specific Data Area. Each language has its own personality function.
The LSDA is only used by the personality function, so it could in principle
differ for each language. However, at least for gcc, every language uses the
same format, since the LSDA is generated by the language-independent
middle-end.
The personality function takes five arguments:
1. An int version number, currently 1.
2. A bitmask of actions.
3. An exception class, a 64-bit unsigned integer which is specific to a language.
4. A pointer to information about the specific exception being thrown.
5. Unwinder state information.
The exception class permits code written in one language to work correctly when
an exception is thrown by code written in a different language. The value for
g++ is “GNUCC++\0” (or “GNUCC++\1” for a dependent exception, which is used
when rethrowing an exception). The value for Go is “GNUCGO\0\0”. The exception
specific information can only be examined if the exception class is recognized.
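
On targets where the exception class is a 64-bit integer, libstdc++ builds its value by shifting in the eight characters, with the first one ending up in the most significant byte. A sketch of that packing (the helper name is hypothetical):

```c
#include <stdint.h>

/* Pack eight characters into an exception class, first character in
   the most significant byte (hypothetical helper). */
static uint64_t make_exception_class(const char chars[8])
{
    uint64_t c = 0;

    for (int i = 0; i < 8; i++)
        c = (c << 8) | (unsigned char)chars[i];
    return c;
}
```

So “GNUCC++\0” is `0x474E5543432B2B00`, and the dependent-exception class differs from it only in the low byte.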
Unwinding the stack for an exception is done in two phases. In the first phase,
the unwinder walks up the stack passing the action `_UA_SEARCH_PHASE` (which
has the value 1) to each personality function that it finds. The personality
function should examine the LSDA to see if there is a handler for the exception
being thrown. It should return `_URC_HANDLER_FOUND` (`6`) if there is or
`_URC_CONTINUE_UNWIND` (`8`) if there isn’t. The search phase will continue
until a handler is found or until the top of the stack is reached. The unwinder
will not actually change anything while walking. If the top of the stack is
reached the unwinder will simply return, and the calling code will take the
appropriate action, which for C++ is to call `std::terminate`. Because of the
two phase unwinding approach, if `std::terminate` dumps core, a backtrace will
show the code which threw the exception.
If a handler is found, the second phase begins. The unwinder walks up the stack
passing the action `_UA_CLEANUP_PHASE` (`2`) to each personality function. The
unwinder will also set `_UA_FORCE_UNWIND` (`8`) in the actions bitmask if the
personality function may not catch the exception, because the unwinding is
happening due to some event like thread cancellation. The unwinder will walk up
the stack until it finds the handler—the stack frame for which the personality
function returned `_URC_HANDLER_FOUND`. When it calls that function, the
unwinder will pass `_UA_HANDLER_FRAME` (`4`) in the actions bitmask. This time,
the unwinder will change things as it goes, removing stack frames.
In order to run destructors, the personality function will call `_Unwind_SetIP`
on the context parameter to set the program counter to point to the cleanup
routine, and then return `_URC_INSTALL_CONTEXT` (`7`) to tell the unwinder to
branch to the current context. The address which starts the cleanup is known as
a landing pad. The cleanup should do whatever it needs to do, and then call
`_Unwind_Resume`. The exception information needs to be passed to
`_Unwind_Resume`. The personality routine arranges to pass the exception
information to the cleanup by calling `_Unwind_SetGR` passing
`__builtin_eh_return_data_regno(0)` and the exception information passed to the
personality routine. Each target which supports this approach has to dedicate
two registers to holding exception information. This is the first one.
The personality function which finds the handler works pretty much the same
way. It may also use `_Unwind_SetGR` to set a value in
`__builtin_eh_return_data_regno(1)` to indicate which exception was found. The
exception handler may rethrow the exception via `_Unwind_RaiseException` or it
may simply continue a normal execution path.
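
The two-phase protocol can be summarised as a decision table. This is a toy model, not a working personality routine: it uses prefixed copies of the constant values quoted above (so it does not depend on `<unwind.h>`), and the hypothetical `struct frame_info` stands in for what the personality function would learn from the LSDA.

```c
/* Copies of the action bits and reason codes quoted above. */
enum { UA_SEARCH_PHASE = 1, UA_CLEANUP_PHASE = 2,
       UA_HANDLER_FRAME = 4, UA_FORCE_UNWIND = 8 };
enum { URC_HANDLER_FOUND = 6, URC_INSTALL_CONTEXT = 7,
       URC_CONTINUE_UNWIND = 8 };

/* What the LSDA says about this frame at the current PC (hypothetical). */
struct frame_info {
    int has_cleanup;   /* destructors to run */
    int has_handler;   /* a catch clause matches this exception */
};

static int personality_decision(int actions, const struct frame_info *f)
{
    if (actions & UA_SEARCH_PHASE)         /* phase 1: look, change nothing */
        return f->has_handler ? URC_HANDLER_FOUND : URC_CONTINUE_UNWIND;

    /* Phase 2: UA_CLEANUP_PHASE. */
    if ((actions & UA_HANDLER_FRAME) && !(actions & UA_FORCE_UNWIND))
        return URC_INSTALL_CONTEXT;        /* branch to the catch landing pad */
    if (f->has_cleanup)
        return URC_INSTALL_CONTEXT;        /* run cleanups, then _Unwind_Resume */
    return URC_CONTINUE_UNWIND;            /* nothing to do in this frame */
}
```
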
At this point we’ve seen everything except how the personality function decides
whether it needs to run a cleanup or catch an exception. The personality
function makes this decision based on the LSDA. As mentioned above, while the
LSDA could be language dependent, in practice it is not. There is a different
personality function for each language, but they all do more or less the same
thing, omitting aspects which are not relevant for the language (e.g., there is
a personality function for C, but it only runs cleanups and does not bother to
look for exception handlers).
The LSDA is found in the section `.gcc_except_table` (the personality function
is just a function and lives in the `.text` section as usual). The personality
function gets a pointer to it by calling `_Unwind_GetLanguageSpecificData`. The
LSDA starts with the following fields:
1. A 1 byte encoding of the following field (a `DW_EH_PE_xxx` value).
2. If the encoding is not `DW_EH_PE_omit`, the landing pad base. This is the
base from which landing pad offsets are computed. If this is omitted, the
base comes from calling `_Unwind_GetRegionStart`, which returns the beginning
of the code described by the current FDE. In practice this field is normally
omitted.
3. A 1 byte encoding of the entries in the types table (a `DW_EH_PE_xxx` value).
4. If the encoding is not `DW_EH_PE_omit`, the types table pointer. This is an
unsigned LEB128 value, and is the byte offset from this field to the start
of the types table used for exception matching.
5. A 1 byte encoding of the fields in the call-site table (a `DW_EH_PE_xxx`
value).
6. An unsigned LEB128 value holding the length in bytes of the call-site table.
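
A sketch of parsing this header, assuming the landing pad base is omitted (the normal case, as noted above). Two details come from libgcc's implementation rather than from the text: the types table offset is counted from the end of its ULEB128 field, and the action table starts immediately after the call-site table. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_EH_PE_omit 0xff

static const unsigned char *read_uleb(const unsigned char *p, uint64_t *v)
{
    uint64_t r = 0;
    unsigned shift = 0;
    unsigned char b;

    do {
        b = *p++;
        r |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    *v = r;
    return p;
}

struct lsda_header {
    unsigned char ttype_enc;            /* DW_EH_PE_omit if no types table */
    const unsigned char *ttype_base;    /* types table base, or NULL */
    unsigned char cs_enc;               /* encoding of the call-site fields */
    const unsigned char *cs_table;      /* first call-site entry */
    const unsigned char *action_table;  /* first byte after the call-site table */
};

/* Returns 0 on success, -1 if an explicit landing pad base is present. */
static int parse_lsda_header(const unsigned char *p, struct lsda_header *h)
{
    uint64_t len;

    if (*p++ != DW_EH_PE_omit)
        return -1;                      /* not handled in this sketch */
    h->ttype_enc = *p++;
    h->ttype_base = NULL;
    if (h->ttype_enc != DW_EH_PE_omit) {
        p = read_uleb(p, &len);
        h->ttype_base = p + len;        /* counted from the end of the ULEB128 */
    }
    h->cs_enc = *p++;
    p = read_uleb(p, &len);             /* call-site table length in bytes */
    h->cs_table = p;
    h->action_table = p + len;
    return 0;
}
```
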
This header is immediately followed by the call-site table, whose total length
in bytes is given by the last field of the header. Each entry in the call-site
table describes a particular sequence of instructions within the function that
the FDE describes, and has four fields:
1. The start of the instructions for the current call site, a byte offset from
the landing pad base. This is encoded using the encoding from the header.
2. The length of the instructions for the current call site, in bytes. This is
encoded using the encoding from the header.
3. A pointer to the landing pad for this sequence of instructions, or 0 if
there isn’t one. This is a byte offset from the landing pad base. This is
encoded using the encoding from the header.
4. The action to take, an unsigned LEB128. This is 1 plus a byte offset into
the action table. The value zero means that there is no action.
The call-site table is sorted by the start address field. If the personality
function finds that there is no entry for the current PC in the call-site
table, then there is no exception information. This should not happen in normal
operation, and in C++ will lead to a call to `std::terminate`. If there is an
entry in the call-site table, but the landing pad is zero, then there is
nothing to do: there are no destructors to run or exceptions to catch. This is
a normal case, and the unwinder will simply continue. If the action record is
zero, then there are destructors to run but no exceptions to catch. The
personality function will arrange to run the destructors as described above,
and unwinding will continue.
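The lookup just described can be sketched in Python as follows. The sketch assumes every call-site field is ULEB128-encoded and that the PC has already been converted to an offset from the landing pad base. `lookup_call_site` and its result labels are hypothetical names.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def lookup_call_site(table, pc_offset):
    """Find the call-site entry covering pc_offset.

    table: the raw call-site table, every field ULEB128-encoded (assumption).
    pc_offset: the PC as an offset from the landing pad base.
    Returns (what_to_do, landing_pad, action_table_offset)."""
    pos = 0
    while pos < len(table):
        start, pos = read_uleb128(table, pos)
        length, pos = read_uleb128(table, pos)
        landing_pad, pos = read_uleb128(table, pos)
        action, pos = read_uleb128(table, pos)
        if pc_offset < start:
            break  # the table is sorted, so no later entry can match
        if pc_offset < start + length:
            if landing_pad == 0:
                return ("continue-unwinding", None, None)
            if action == 0:
                return ("run-cleanups", landing_pad, None)
            return ("check-actions", landing_pad, action - 1)  # stored as 1 + offset
    return ("terminate", None, None)  # no entry: std::terminate in C++
```

A PC that falls in a gap between entries, or past the last entry, ends up in the `terminate` case. This matches the "no exception information" situation described above.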
Otherwise, we have an offset into the action table. Each entry in the action
table is a pair of signed LEB128 values. The first number is a type filter. The
second number is a byte offset to the next entry in the action table. A byte
offset of 0 ends the current set of actions.
A type filter of zero indicates a cleanup, which is the same as an action
record of zero in the call-site table. This means that there is a cleanup to be
called even if none of the types match.
A positive type filter is an index into the types table. This is a negative
index: the value 1 means the entry preceding the types table base, 2 means the
entry before that, etc. The size of entries in the types table comes from the
encoding in the header, as does the base of the types table. Each entry in the
types table is a pointer to a type information structure. If this type
information structure matches the type of the exception, then we have found a
handler for this exception. The type filter value is a switch value which will be
passed to the handler in exception register 1. The actual comparison of the
type information, and determining the type information from the exception
pointer, really is language dependent. In C++ this is a pointer to a
`std::type_info` structure. A `NULL` pointer in the types table is a catch-all
handler.
A negative type filter is a byte offset into the types table of a `NULL`
terminated list of pointers to type information structures. If the type of the
current exception does not match any of the entries in the list, then there is
an exception specification error. This is treated as an exception handler with
a negative switch value.
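Here is a Python sketch of walking one chain of action records. Type information is modelled as plain strings compared by equality, which stands in for the language-dependent comparison. The "offset to the next entry" is assumed to be measured from the start of the offset field itself. Negative filters (exception specifications) are left out. All names are hypothetical.

```python
def read_sleb128(data, pos):
    """Decode a signed LEB128 value at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the final byte: extend to negative
                result -= 1 << shift
            return result, pos


def select_handler(action_table, offset, types_before_base, exc_type):
    """Walk one chain of action records starting at action_table[offset].

    types_before_base[i - 1] models the types table entry selected by a type
    filter of i (i entries before the base); None stands for a catch-all.
    Returns ("catch", switch_value), ("cleanup", 0) or ("no-match", None)."""
    pos = offset
    saw_cleanup = False
    while True:
        type_filter, disp_pos = read_sleb128(action_table, pos)
        disp, _ = read_sleb128(action_table, disp_pos)
        if type_filter == 0:
            saw_cleanup = True
        elif type_filter > 0:
            handler_type = types_before_base[type_filter - 1]
            if handler_type is None or handler_type == exc_type:
                return ("catch", type_filter)
        # Negative filters (exception specifications) are not modelled here.
        if disp == 0:
            return ("cleanup", 0) if saw_cleanup else ("no-match", None)
        pos = disp_pos + disp  # assumption: offset counts from the offset field
```

With the action bytes `02 01 01 00` and types `["int", "MyError"]`, an `int` exception yields switch value 1 and a `MyError` exception yields 2.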
I think that covers everything about how gcc unwinds the stack and throws
exceptions.

23
linker-combreloc.md

@ -0,0 +1,23 @@
# Linker combreloc
The GNU linker has a `-z combreloc` option, which is enabled by default (it can
be turned off via `-z nocombreloc`). I just implemented this in gold as well.
This option directs the linker to sort the dynamic relocations. The sorting is
done in order to optimize the dynamic linker.
The dynamic linker in glibc uses a one element cache when processing relocs: if
a relocation refers to the same symbol as the previous relocation, then the
dynamic linker reuses the value rather than looking up the symbol again. Thus
the dynamic linker gets the best results if the dynamic relocations are sorted
so that all dynamic relocations for a given dynamic symbol are adjacent.
In addition, the linker groups together all relative relocations, which
don’t have symbols. Two relative relocations, or two relocations against the
same symbol, are sorted by the address in the output file. This tends to
optimize paging and caching when there are two references from the same page.
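The effect of the sort on the one element cache can be sketched in Python. This is a toy model, not gold's or GNU ld's code. `DynReloc` is hypothetical. Symbol names stand in for symbol indices. Placing relative relocations first is this sketch's choice.

```python
from typing import NamedTuple, Optional


class DynReloc(NamedTuple):
    address: int
    symbol: Optional[str]  # None for a RELATIVE relocation


def combreloc_sort(relocs):
    """Relative relocations first, then grouped by symbol; ties broken by address."""
    return sorted(relocs, key=lambda r: (r.symbol is not None, r.symbol or "", r.address))


def count_symbol_lookups(relocs):
    """Simulate a one element cache: look a symbol up only when it differs
    from the symbol of the previous symbol-based relocation."""
    lookups = 0
    last = None
    for r in relocs:
        if r.symbol is None:
            continue  # RELATIVE relocations need no symbol lookup
        if r.symbol != last:
            lookups += 1
            last = r.symbol
    return lookups


relocs = [DynReloc(0x30, "foo"), DynReloc(0x10, None), DynReloc(0x20, "bar"),
          DynReloc(0x08, "foo"), DynReloc(0x18, None)]
```

In this example the unsorted list needs three symbol lookups (`foo`, `bar`, `foo` again). After `combreloc_sort` it needs only two, because both `foo` relocations are adjacent.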
This may seem like a micro-optimization, but it can have a real effect on
program startup time, especially if the program has lots of shared libraries.
I’ve seen a case where a program starts up 16% faster because the relocations
were sorted.

56
linker-relro.md

@ -0,0 +1,56 @@
# Linker relro
gcc, the GNU linker, and the glibc dynamic linker cooperate to implement an
idea called read-only relocations, or relro. This permits the linker to
designate a part of an executable or (more commonly) a shared library as being
read-only after dynamic relocations have been applied.
This may be used for read-only global variables which are initialized to
something which requires a relocation, such as the address of a function or a
different global variable. Because the global variable requires a runtime
initialization in the form of a dynamic relocation, it can not be placed in a
read-only segment. However, because it is declared to be constant, and
therefore may not be changed by the program, the dynamic linker can mark it as
read-only after the dynamic relocation has been applied.
For some targets this technique may also be used for the PLT or parts of the
GOT.
Making these pages read-only helps catch some cases of memory corruption, and
making the PLT in particular read-only helps prevent some types of buffer
overflow exploits.
The first step is in gcc. When gcc sees a variable which is constant but
requires a dynamic relocation, it puts it into a section named `.data.rel.ro`
(this functionality unfortunately relies on magic section names). A variable
which requires a dynamic relocation against a local symbol is put into a
`.data.rel.ro.local` section; this helps group such variables together, so that
the dynamic linker may apply the relocations, which will always be `RELATIVE`
relocations, more efficiently, especially when using `combreloc`.
The linker groups `.data.rel.ro` and `.data.rel.ro.local` sections as usual.
The new step is that the linker then emits a `PT_GNU_RELRO` program segment
which covers these sections. If the PLT and/or GOT can be read-only after
dynamic relocations, they are put next to the `.data.rel.ro` sections and also
become part of the new segment. This segment will be enclosed within a `PT_LOAD`
segment. The `p_vaddr` field of the `PT_GNU_RELRO` segment gives the virtual
address of the start of the region which is read-only after dynamic relocation,
and the `p_memsz` field gives its length.
When the dynamic linker sees a `PT_GNU_RELRO` segment, it uses `mprotect` to mark
the pages as read-only after the dynamic relocations have been applied. Of
course this only works if the segment does in fact cover an entire page. The
linker will try to force this to happen.
Note that the current dynamic linker code will only work correctly if the
`PT_GNU_RELRO` segment starts on a page boundary. This is because the dynamic
linker rounds the `p_vaddr` field down to the previous page boundary. If there is
anything on the page which should not be read-only, the program is likely to
fail at runtime. So in effect the linker must only emit a `PT_GNU_RELRO`
segment if it ensures that it starts on a page boundary.
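The page arithmetic behind this can be sketched in Python. The sketch assumes 4 KiB pages. It also assumes the end of the range is rounded down, so that a partial final page stays writable. The text above only says what happens to `p_vaddr`, so treat the end rounding as an assumption. `relro_protect_range` is a hypothetical name.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def relro_protect_range(load_bias, p_vaddr, p_memsz, page_size=PAGE_SIZE):
    """Return (start, length) of the region that would be passed to mprotect,
    or None if the segment does not cover a whole page."""
    mask = ~(page_size - 1)
    start = (load_bias + p_vaddr) & mask           # rounded down to a page boundary
    end = (load_bias + p_vaddr + p_memsz) & mask   # assumption: also rounded down
    if start == end:
        return None
    return start, end - start
```

For example, `relro_protect_range(0, 0x2100, 0x1000)` returns `(0x2000, 0x1000)`. The bytes from `0x2000` to `0x20ff` become read-only even though they lie outside the segment. That is why the linker must make the segment start on a page boundary.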
I see this as a relatively minor security benefit. It is not an optimization as
far as I can see. I am documenting it here as part of my general documentation
of obscure linker features. The current description of this feature in the GNU
linker manual is rather obscure.

83
linkers-1.md

@ -0,0 +1,83 @@
# Linkers part 1
I’ve been working on and off on a new linker. To my surprise, I’ve discovered
in talking about this that some people, even some computer programmers, are
unfamiliar with the details of the linking process. I’ve decided to write some
notes about linkers, with the goal of producing an essay similar to my existing
one about the GNU configure and build system.
As I only have the time to write one thing a day, I’m going to do this on my
blog over time, and gather the final essay together later. I believe that I may
be up to five readers, and I hope y’all will accept this digression into stuff
that matters. I will return to random philosophizing and minding other people’s
business soon enough.
## A Personal Introduction
Who am I to write about linkers?
I wrote my first linker back in 1988, for the AMOS operating system which ran
on Alpha Micro systems. (If you don’t understand the following description,
don’t worry; all will be explained below). I used a single global database to
register all symbols. Object files were checked into the database after they
had been compiled. The link process mainly required identifying the object file
holding the main function. Other object files were pulled in by reference. I
reverse engineered the object file format, which was undocumented but quite
simple. The goal of all this was speed, and indeed this linker was much faster
than the system one, mainly because of the speed of the database.
I wrote my second linker in 1993 and 1994. This linker was designed and
prototyped by Steve Chamberlain while we both worked at Cygnus Support (later
Cygnus Solutions, later part of Red Hat). This was a complete reimplementation
of the BFD based linker which Steve had written a couple of years before.
The primary target was a.out and COFF. Again the goal was speed, especially
compared to the original BFD based linker. On SunOS 4 this linker was almost as
fast as running the cat program on the input .o files.
The linker I am now working on, called gold, will be my third. It is
exclusively an ELF linker. Once again, the goal is speed, in this case being
faster than my second linker. That linker has been significantly slowed down
over the years by adding support for ELF and for shared libraries. This support
was patched in rather than being designed in. Future plans for the new linker
include support for incremental linking–which is another way of increasing
speed.
There is an obvious pattern here: everybody wants linkers to be faster. This is
because the job which a linker does is uninteresting. The linker is a speed
bump for a developer, a process which takes a relatively long time but adds no
real value. So why do we have linkers at all? That brings us to our next topic.
## A Technical Introduction
What does a linker do?
It’s simple: a linker converts object files into executables and shared
libraries. Let’s look at what that means. For cases where a linker is used,
the software development process consists of writing program code in some
language: e.g., C or C++ or Fortran (but typically not Java, as Java normally
works differently, using a loader rather than a linker). A compiler translates
this program code, which is human readable text, into another form of
human readable text known as assembly code. Assembly code is a readable form of
the machine language which the computer can execute directly. An assembler is
used to turn this assembly code into an object file. For completeness, I’ll
note that some compilers include an assembler internally, and produce an object
file directly. Either way, this is where things get interesting.
In the old days, when dinosaurs roamed the data centers, many programs were
complete in themselves. In those days there was generally no compiler–people
wrote directly in assembly code–and the assembler actually generated an
executable file which the machine could execute directly. As languages like
Fortran and Cobol started to appear, people began to think in terms of
libraries of subroutines, which meant that there had to be some way to run the
assembler at two different times, and combine the output into a single
executable file. This required the assembler to generate a different type of
output, which became known as an object file (I have no idea where this name
came from). And a new program was required to combine different object files
together into a single executable. This new program became known as the linker
(the source of this name should be obvious).
Linkers still do the same job today. In the decades that followed, one new
feature has been added: shared libraries.
More tomorrow.

37
linkers-10.md

@ -0,0 +1,37 @@
# Linkers part 10
## Parallel Linking
It is possible to parallelize the linking process somewhat. This can help hide
I/O latency and can take better advantage of modern multi-core systems. My
intention with gold is to use these ideas to speed up the linking process.
The first area which can be parallelized is reading the symbols and relocation
entries of all the input files. The symbols must be processed in order;
otherwise, it will be difficult for the linker to resolve multiple definitions
correctly. In particular all the symbols which are used before an archive must
be fully processed before the archive is processed, or the linker won’t know
which members of the archive to include in the link (I guess I haven’t talked
about archives yet). However, despite these ordering requirements, it can be
beneficial to do the actual I/O in parallel.
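One way to get both properties, parallel I/O and in-order processing, is sketched below in Python. `read_symbols` is a hypothetical caller-supplied function that performs the I/O for one input file. The "first definition wins" rule is a placeholder for real symbol resolution.

```python
from concurrent.futures import ThreadPoolExecutor


def load_symbols(inputs, read_symbols, workers=4):
    """Read every input's symbols in parallel, then process them in command-line order.

    read_symbols(name) is a hypothetical function that does the I/O for one input
    and returns the list of symbol names it defines."""
    symtab = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() issues the reads concurrently but yields the results in input
        # order, so the loop below always sees the files in command-line order.
        for name, symbols in zip(inputs, pool.map(read_symbols, inputs)):
            for sym in symbols:
                symtab.setdefault(sym, name)  # placeholder: first definition wins
    return symtab
```

Even if a later file finishes reading first, its symbols are not processed until all earlier files have been handled. That is the ordering requirement described above.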
After all the symbols and relocations have been read, the linker must complete
the layout of all the input contents. Most of this can not be done in parallel,
as setting the location of one type of contents requires knowing the size of
all the preceding types of contents. While doing the layout, the linker can
determine the final location in the output file of all the data which needs to
be written out.
After layout is complete, the process of reading the contents, applying
relocations, and writing the contents to the output file can be fully
parallelized. Each input file can be processed separately.
Since the final size of the output file is known after the layout phase, it is
possible to use `mmap` for the output file. When not doing relaxation, it is
then possible to read the input contents directly into place in the output
file, and to relocate them in place. This reduces the number of system calls
required, and ideally will permit the operating system to do optimal disk I/O
for the output file.
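The layout-then-write structure can be sketched in Python. A `bytearray` stands in for the `mmap`ed output file. Layout here is a plain running sum that ignores alignment, and relocation processing is omitted. Because of Python's global interpreter lock, this shows the shape of the algorithm rather than any real speedup. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def layout(sizes):
    """Assign each input its output offset: the sum of all sizes before it."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets, pos


def link_contents(contents, workers=4):
    offsets, total = layout([len(c) for c in contents])
    out = bytearray(total)   # stands in for the mmap'd output file
    view = memoryview(out)

    def copy_one(i):
        # Every task writes a disjoint slice, so no locking is needed.
        # A real linker would also apply relocations to this slice.
        start = offsets[i]
        view[start:start + len(contents[i])] = contents[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_one, range(len(contents))))
    return bytes(out)
```

Layout is sequential because each offset depends on every size before it. After that, the copies are independent.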
Just a short entry tonight. More tomorrow.

49
linkers-11.md

@ -0,0 +1,49 @@
# Linkers part 11
## Archives
Archives are a traditional Unix package format. They are created by the `ar`
program, and they are normally named with a `.a` extension. Archives are passed
to a Unix linker with the `-l` option.
Although the `ar` program is capable of creating an archive from any type of
file, it is normally used to put object files into an archive. When it is used
in this way, it creates a symbol table for the archive. The symbol table lists
all the symbols defined by any object file in the archive, and for each symbol
indicates which object file defines it. Originally the symbol table was created
by the `ranlib` program, but these days it is always created by `ar` by default
(despite this, many Makefiles continue to run `ranlib` unnecessarily).
When the linker sees an archive, it looks at the archive’s symbol table. For
each symbol the linker checks whether it has seen an undefined reference to
that symbol without seeing a definition. If that is the case, it pulls the
object file out of the archive and includes it in the link. In other words, the
linker pulls in all the object files which define symbols which are referenced
but not yet defined.
This operation repeats until no more symbols can be defined by the archive.
This permits object files in an archive to refer to symbols defined by other
object files in the same archive, without worrying about the order in which
they appear.
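The repeat-until-nothing-changes loop can be sketched in Python. The `members` dictionary is a hypothetical stand-in for an archive and its members. Each member is reduced to the symbols it defines and the symbols it references.

```python
def pull_archive_members(defined, undefined, members):
    """Return the archive members a linker would include, in the order pulled.

    defined/undefined: symbol names seen so far in earlier inputs.
    members: {member_name: (symbols_it_defines, symbols_it_references)}."""
    # The archive symbol table: each symbol maps to the member defining it.
    symtab = {}
    for member, (defs, _refs) in members.items():
        for sym in defs:
            symtab.setdefault(sym, member)

    defined, undefined = set(defined), set(undefined) - set(defined)
    pulled = []
    changed = True
    while changed:                       # repeat until nothing new is pulled in
        changed = False
        for sym in sorted(undefined):
            member = symtab.get(sym)
            if sym in defined or member is None or member in pulled:
                continue
            pulled.append(member)
            defs, refs = members[member]
            defined |= set(defs)
            undefined = (undefined | set(refs)) - defined
            changed = True
    return pulled
```

Here a member that refers to a symbol defined by a later-listed member still gets that member pulled in, because the loop keeps going until no new undefined symbol can be satisfied.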
Note that the linker considers an archive in its position on the command line
relative to other object files and archives. If an object file appears after an
archive on the command line, that archive will not be used to define symbols
referenced by the object file.
In general the linker will not pull an object file out of an archive merely
because it provides a definition for a symbol which is currently common.
followed by a defined symbol with the same name, it will treat the common
symbol as an undefined reference. That will only happen if there is some other
reason to include the defined symbol in the link; the defined symbol will not
be pulled in from the archive.
There was an interesting twist for common symbols in archives on old
`a.out`-based SunOS systems. If the linker saw a common symbol, and then saw a
common symbol in an archive, it would not include the object file from the
archive, but it would change the size of the common symbol to the size in the
archive if that were larger than the current size. The C library relied on this
behaviour when implementing the `stdin` variable.
My next posting should be on Monday.

110
linkers-12.md

@ -0,0 +1,110 @@
# Linkers part 12
I apologize for the pause in posts. We moved over the weekend. Last Friday AT&T
told me that the new DSL was working at our new house. However, it did not
actually start working outside the house until Wednesday. Then a problem with
the internal wiring meant that it was not working inside the house until today.
I am now finally back online at home.
## Symbol Resolution
I find that symbol resolution is one of the trickier aspects of a linker.
Symbol resolution is what the linker does the second and subsequent times that
it sees a particular symbol. I’ve already touched on the topic in a few
previous entries, but let’s look at it in a bit more depth.
Some symbols are local to a specific object file. We can ignore these for the
purposes of symbol resolution, as by definition the linker will never see them
more than once. In ELF these are the symbols with a binding of `STB_LOCAL`.
In general, symbols are resolved by name: every symbol with the same name is
the same entity. We’ve already seen a few exceptions to that general rule. A
symbol can have a version: two symbols with the same name but different
versions are different symbols. A symbol can have non-default visibility: a
symbol with hidden visibility in one shared library is not the same as a symbol
with the same name in a different shared library.
The characteristics of a symbol which matter for resolution are:
* The symbol name.
* The symbol version.
* Whether the symbol is the default version or not.
* Whether the symbol is a definition or a reference or a common symbol.
* The symbol visibility.
* Whether the symbol is weak or strong (i.e., non-weak).
* Whether the symbol is defined in a regular object file being included in the
output, or in a shared library.
* Whether the symbol is thread local.
* Whether the symbol refers to a function or a variable.
The goal of symbol resolution is to determine the final value of the symbol.
After all symbols are resolved, we should know the specific object file or
shared library which defines the symbol, and we should know the symbol’s type,
size, etc. It is possible that some symbols will remain undefined after all the
symbol tables have been read; in general this is only an error if some
relocation refers to that symbol.
At this point I’d like to present a simple algorithm for symbol resolution, but
I don’t think I can. I’ll try to hit all the high points, though. Let’s assume
that we have two symbols with the same name. Let’s call the symbol we saw first
A and the new symbol B. (I’m going to ignore symbol visibility in the algorithm
below; the effects of visibility should be obvious, I hope.)
1. If A has a version:
   * If B has a version different from A, they are actually different symbols.
   * If B has the same version as A, they are the same symbol; carry on.
   * If B does not have a version, and A is the default version of the symbol,
     they are the same symbol; carry on.
   * Otherwise B is probably a different symbol. But note that if A and B are
     both undefined references, then it is possible that A refers to the default
     version of the symbol but we don’t yet know that. In that case, if B does
     not have a version, A and B really are the same symbol. We can’t tell until
     we see the actual definition.
2. If A does not have a version:
   * If B does not have a version, they are the same symbol; carry on.
   * If B has a version, and it is the default version, they are the same
     symbol; carry on.
   * Otherwise, B is probably a different symbol, as above.
3. If A is thread local and B is not, or vice-versa, then we have an error.
4. If A is an undefined reference:
   * If B is an undefined reference, then we can complete the resolution, and
     more or less ignore B.
   * If B is a definition or a common symbol, then we can resolve A to B.
5. If A is a strong definition in an object file:
   * If B is an undefined reference, then we resolve B to A.
   * If B is a strong definition in an object file, then we have a multiple
     definition error.
   * If B is a weak definition in an object file, then A overrides B. In effect,
     B is ignored.
   * If B is a common symbol, then we treat B as an undefined reference.
   * If B is a definition in a shared library, then A overrides B. The dynamic
     linker will change all references to B in the shared library to refer to A
     instead.
6. If A is a weak definition in an object file, we act just like the strong
   definition case, with one exception: the case where B is a strong definition
   in an object file. In the original SVR4 linker, this case was treated as a
   multiple definition error. In the Solaris and GNU linkers, this case is
   handled by letting B override A.
7. If A is a common symbol in an object file:
   * If B is a common symbol, we set the size of A to be the maximum of the size
     of A and the size of B, and then treat B as an undefined reference.
   * If B is a definition in a shared library with function type, then A
     overrides B (this oddball case is required to correctly handle some Unix
     system libraries).
   * Otherwise, we treat A as an undefined reference.
8. If A is a definition in a shared library, then if B is a definition in a
   regular object (strong or weak), it overrides A. Otherwise we act as though
   A were defined in an object file.
9. If A is a common symbol in a shared library, we have a funny case. Symbols
   in shared libraries must have addresses, so they can’t be common in the same
   sense as symbols in an object file. But ELF does permit symbols in a shared
   library to have the type `STT_COMMON` (this is a relatively recent
   addition). For purposes of symbol resolution, if A is a common symbol in a
   shared library, we still treat it as a definition, unless B is also a common
   symbol. In the latter case, B overrides A, and the size of B is set to the
   maximum of the size of A and the size of B.
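Rules 4 through 8 can be sketched as a single Python function. This is a toy model under stated assumptions: it ignores versions, visibility, the thread local check, and rule 9. The `Sym` class, its fields, and `MultipleDefinitionError` are all hypothetical. The function also updates A's size in place when two commons meet.

```python
from dataclasses import dataclass


class MultipleDefinitionError(Exception):
    pass


@dataclass
class Sym:
    kind: str              # "undef", "strong", "weak" or "common"
    shared: bool = False   # True if it comes from a shared library
    size: int = 0
    is_func: bool = False
    origin: str = ""       # hypothetical label, e.g. the input file name


def resolve(a, b):
    """Return whichever symbol survives when B is seen after A (rules 4 to 8)."""
    if a.kind == "undef":                       # rule 4
        return a if b.kind == "undef" else b
    if b.kind == "undef":                       # any definition satisfies a reference
        return a
    if a.kind == "common" and not a.shared:     # rule 7
        if b.kind == "common" and not b.shared:
            a.size = max(a.size, b.size)        # B becomes a mere reference
            return a
        if b.shared and b.is_func:
            return a                            # the oddball case
        return b                                # A is treated as a reference
    if a.shared:                                # rule 8
        if not b.shared and b.kind in ("strong", "weak"):
            return b
        return a                                # act as if A were in an object file
    # Rules 5 and 6: A is a strong or weak definition in an object file.
    if b.kind == "common" or b.shared:
        return a
    if a.kind == "strong":
        if b.kind == "strong":
            raise MultipleDefinitionError(a.origin + " and " + b.origin)
        return a                                # a weak B is ignored
    return b if b.kind == "strong" else a       # Solaris/GNU: strong B overrides weak A
```

The order of the checks matters: handling undefined references first means the later branches only ever compare two definitions or commons.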
I hope I got all that right.
More tomorrow, assuming the Internet connection holds up.

91
linkers-13.md

@ -0,0 +1,91 @@
# Linkers part 13
## Symbol Versions Redux
I’ve talked about symbol versions from the linker’s point of view. I think it’s
worth discussing them a bit from the user’s point of view.
As I’ve discussed before, symbol versions are an ELF extension designed to
solve a specific problem: making it possible to upgrade a shared library
without changing existing executables. That is, they provide backward
compatibility for shared libraries. There are a number of related problems
which symbol versions do not solve. They do not provide forward compatibility
for shared libraries: if you upgrade your executable, you may need to upgrade
your shared library also (it would be nice to have a feature to build your
executable against an older version of the shared library, but that is
difficult to implement in practice). They only work at the shared library
interface: they do not help with a change to the ABI of a system call, which is
at the kernel interface. They do not help with the problem of sharing
incompatible versions of a shared library, as may happen when a complex
application is built out of several different existing shared libraries which
have incompatible dependencies.
Despite these limitations, shared library backward compatibility is an
important issue. Using symbol versions to ensure backward compatibility
requires a careful and rigorous approach. You must start by applying a version
to every symbol. If a symbol in the shared library does not have a version,
then it is impossible to change it in a backward compatible fashion. Then you
must pay close attention to the ABI of every symbol. If the ABI of a symbol
changes for any reason, you must provide a copy which implements the old ABI.
That copy should be marked with the original version. The new symbol must be
given a new version.
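Here is a toy Python model of why this works. A reference bound at link time records a specific version, and the dynamic linker later looks the symbol up by name and version. The libraries, the symbol `parse`, and the version names `LIB_1` and `LIB_2` are all hypothetical. Real symbol versioning involves version definition and requirement sections that are not modelled here.

```python
def static_link(name, library):
    """Link time: an unversioned reference binds to the library's default
    version of the symbol, and that version is recorded in the executable."""
    return (name, library["default"][name])


def dynamic_bind(needed, library):
    """Run time: the symbol is looked up by name *and* recorded version."""
    return library["symbols"][needed]


# Version 1 of a hypothetical library exports only parse@LIB_1.
lib_v1 = {
    "symbols": {("parse", "LIB_1"): "parse, old ABI"},
    "default": {"parse": "LIB_1"},
}
# Version 2 changes the ABI of parse: it keeps a copy implementing the old ABI
# under the original version, and gives the new symbol a new default version.
lib_v2 = {
    "symbols": {
        ("parse", "LIB_1"): "parse, old ABI (compatibility copy)",
        ("parse", "LIB_2"): "parse, new ABI",
    },
    "default": {"parse": "LIB_2"},
}

old_exe = static_link("parse", lib_v1)   # built against version 1
new_exe = static_link("parse", lib_v2)   # built against version 2
```

The old executable keeps getting the old ABI after the upgrade. An executable built against version 2 cannot bind to `lib_v1` at all (the lookup fails), which is the missing forward compatibility mentioned above.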
The ABI of a symbol can change in a number of ways. Any change to the parameter
types or the return type of a function is an ABI change. Any change in the type
of a variable is an ABI change. If a parameter or a return type is a struct or
class, then any change in the type of any field is an ABI change–i.e., if a
field in a struct points to another struct, and that struct changes, the ABI
has changed. If a function is defined to return an instance of an enum, and a
new value is added to the enum, that is an ABI change. In other words, even
minor changes can be ABI changes. The question you need to ask is: can existing
code which has already been compiled continue to use the new symbol with no
change? If the answer is no, you have an ABI change, and you must define a new
symbol version.
You must be very careful when writing the symbol implementing the old ABI, if
you don’t just copy the existing code. You must be certain that it really does
implement the old ABI.
There are some special challenges when using C++. Adding a new virtual method
to a class can be an ABI change for any function which uses that class.
Providing the backward compatible version of the class in such a situation is
very awkward–there is no natural way to specify the name and version to use for
the virtual table or the RTTI information for the old version.
Naturally, you must never delete any symbols.
Getting all the details correct, and verifying that you got them correct,
requires great attention to detail. Unfortunately, I don’t know of any tools to
help people write correct version scripts, or to verify them. Still, if
implemented correctly, the results are good: existing executables will continue
to run.
## Static Linking vs. Dynamic Linking
There is, of course, another way to ensure that existing executables will
continue to run: link them statically, without using any shared libraries. That
will limit their ABI issues to the kernel interface, which is normally
significantly smaller than the library interface.
There is a performance tradeoff with static linking. A statically linked
program does not get the benefit of sharing libraries with other programs
executing at the same time. On the other hand, a statically linked program does
not have to pay the performance penalty of position independent code when
executing within the library.
Upgrading the shared library is only possible with dynamic linking. Such an
upgrade can provide bug fixes and better performance. Also, the dynamic linker
can select a version of the shared library appropriate for the specific
platform, which can also help performance.
Static linking permits more reliable testing of the program. You only need to
worry about kernel changes, not about shared library changes.
Some people argue that dynamic linking is always superior. I think there are
benefits on both sides, and which choice is best depends on the specific
circumstances.
More on Monday. If you think I should write about any specific linker related
topics which have not already been mentioned in the comments, please let me
know.

92
linkers-14.md

@ -0,0 +1,92 @@
# Linkers part 14
## Link Time Optimization
I’ve already mentioned some optimizations which are peculiar to the linker:
relaxation and garbage collection of unwanted sections. There is another class
of optimizations which occur at link time, but are really related to the
compiler. The general name for these optimizations is link time optimization or
whole program optimization.
The general idea is that the compiler optimization passes are run at link time.
The advantage of running them at link time is that the compiler can then see
the entire program. This permits the compiler to perform optimizations which
can not be done when source files are compiled separately. The most obvious
such optimization is inlining functions across source files. Another is
optimizing the calling sequence for simple functions–e.g., passing more
parameters in registers, or knowing that the function will not clobber all
registers; this can only be done when the compiler can see all callers of the
function. Experience shows that these and other optimizations can bring
significant performance benefits.
Generally these optimizations are implemented by having the compiler write a
version of its intermediate representation into the object file, or into some
parallel file. The intermediate representation will be the parsed version of
the source file, and may already have had some local optimizations applied.
Sometimes the object file contains only the compiler intermediate
representation, sometimes it also contains the usual object code. In the former
case link time optimization is required, in the latter case it is optional.
I know of two typical ways to implement link time optimization. The first
approach is for the compiler to provide a pre-linker. The pre-linker examines
the object files looking for stored intermediate representation. When it finds
some, it runs the link time optimization passes. The second approach is for the
linker proper to call back into the compiler when it finds intermediate
representation. This is generally done via some sort of plugin API.
Although these optimizations happen at link time, they are not part of the
linker proper, at least not as I defined it. When the compiler reads the stored
intermediate representation, it will eventually generate an object file, one
way or another. The linker proper will then process that object file as usual.
These optimizations should be thought of as part of the compiler.
## Initialization Code
C++ permits global variables to have constructors and destructors. The global
constructors must be run before main starts, and the global destructors must be
run after exit is called. Making this work requires the compiler and the linker
to cooperate.
The a.out object file format is rarely used these days, but the GNU a.out
linker has an interesting extension. In a.out symbols have a one byte type
field. This encodes a bunch of debugging information, and also the section in
which the symbol is defined. The a.out object file format only supports three
sections–text, data, and bss. Four symbol types are defined as sets: text set,
data set, bss set, and absolute set. A symbol with a set type is permitted to
be defined multiple times. The GNU linker will not give a multiple definition
error, but will instead build a table with all the values of the symbol. The
table will start with one word holding the number of entries, and will end with
a zero word. In the output file the set symbol will be defined as the address
of the start of the table.
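The table the GNU a.out linker builds for set symbols can be modelled in Python as a list of words. Values here are plain integers standing in for addresses, and the function names are hypothetical.

```python
def build_set_tables(set_symbols):
    """set_symbols: (name, value) pairs for set-type symbols, in link order.
    Returns {name: table}, where each table is a list of words: the entry
    count, the collected values, then a terminating zero word."""
    values = {}
    for name, value in set_symbols:
        values.setdefault(name, []).append(value)  # no multiple definition error
    return {name: [len(v)] + v + [0] for name, v in values.items()}


def walk_set_table(table):
    """What startup code might do: read the count, then visit each entry."""
    count = table[0]
    return table[1:1 + count]
```

For example, two `__CTOR_LIST__` definitions with values `0x100` and `0x200` produce the table `[2, 0x100, 0x200, 0]`.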
For each C++ global constructor, the compiler would generate a symbol named
`__CTOR_LIST__` with the text set type. The value of the symbol in the object
file would be the global constructor function. The linker would gather together
all the `__CTOR_LIST__` functions into a table. The startup code supplied by
the compiler would walk down the `__CTOR_LIST__` table and call each function.
Global destructors were handled similarly, with the name `__DTOR_LIST__`.
Anyhow, so much for a.out. In ELF, global constructors are handled in a fairly
similar way, but without using magic symbol types. I’ll describe what gcc does.
An object file which defines a global constructor will include a `.ctors`
section. The compiler will arrange to link special object files at the very
start and very end of the link. The one at the start of the link will define a
symbol for the `.ctors` section; that symbol will wind up at the start of the
section. The one at the end of the link will define a symbol for the end of the
`.ctors` section. The compiler startup code will walk between the two symbols,
calling the constructors. Global destructors work similarly, in a `.dtors`
section.
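The ELF scheme can be modelled in Python in the same spirit. The linker concatenates the input `.ctors` sections, and two marker positions bracket the result. For simplicity this sketch calls the entries front to back. The order that real gcc startup code uses is not specified in the text above and may differ. The function names are hypothetical.

```python
def link_ctors(objects):
    """objects: per-object .ctors contents (lists of constructor functions) in
    link order. Returns the output .ctors section and the positions that the
    start and end marker symbols would have."""
    section = []
    ctors_start = len(section)   # defined by the special object linked first
    for ctors in objects:
        section.extend(ctors)    # the linker concatenates the input sections
    ctors_end = len(section)     # defined by the special object linked last
    return section, ctors_start, ctors_end


def run_global_constructors(section, start, end):
    """Startup code: call every entry between the two marker symbols."""
    for i in range(start, end):
        section[i]()
```

Unlike the a.out set table, no count word is needed here: the two symbols alone delimit the table.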
ELF shared libraries work similarly. When the dynamic linker loads a shared
library, it will call the function at the `DT_INIT` tag if there is one. By
convention the ELF program linker will set this to the function named `_init`,
if there is one. Similarly the `DT_FINI` tag is called when a shared library is
unloaded, and the program linker will set this to the function named `_fini`.
As I mentioned earlier, there are also `DT_INIT_ARRAY`, `DT_PREINIT_ARRAY`, and
`DT_FINI_ARRAY` tags, which are set based on the `SHT_INIT_ARRAY`,
`SHT_PREINIT_ARRAY`, and `SHT_FINI_ARRAY` section types. This is a newer
approach in ELF, and does not require relying on special symbol names.
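On a current GNU/Linux system the `SHT_INIT_ARRAY` mechanism can be exercised directly. This sketch assumes GCC or Clang targeting ELF with the usual system startup files. It places a function pointer in a section named `.init_array`, and the function then runs before `main` without relying on any special symbol name. The names `on_load`, `init_array_entry` and `init_array_calls` are hypothetical. In everyday code `__attribute__((constructor))` is the more common way to get a function run at startup.

```cpp
#include <cassert>

// Zero-initialized, so it needs no constructor of its own and is
// already 0 when the init array runs.
static int init_array_calls = 0;

static void on_load()
{
  ++init_array_calls;
}

// A pointer placed in a section named .init_array. On GNU toolchains
// such a section gets type SHT_INIT_ARRAY; the dynamic linker (via
// DT_INIT_ARRAY) or the static startup code calls every pointer in it
// before main. `used` keeps the otherwise unreferenced variable.
__attribute__((section(".init_array"), used))
static void (*init_array_entry)() = on_load;
```

By the time `main` starts, `init_array_calls` is 1.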
More tomorrow.

# Linkers part 15
## COMDAT sections
In C++ there are several constructs which do not clearly live in a single
place. Examples are inline functions defined in a header file, virtual tables,
and typeinfo objects. There must be only a single instance of each of these
constructs in the final linked program (actually we could probably get away
with multiple copies of a virtual table, but the others must be unique since it
is possible to take their address). Unfortunately, there is not necessarily a
single object file in which they should be generated. These types of constructs
are sometimes described as having vague linkage.
Linkers implement these features by using *COMDAT* sections (there may be other
approaches, but this is the only one I know of). COMDAT sections are a special type
of section. Each COMDAT section has a special string. When the linker sees
multiple COMDAT sections with the same special string, it will only keep one of
them.
For example, when the C++ compiler sees an inline function `f1` defined in a
header file, but the compiler is unable to inline the function in all uses
(perhaps because something takes the address of the function), the compiler
will emit `f1` in a COMDAT section associated with the string `f1`. After the
linker sees a COMDAT section `f1`, it will discard all subsequent `f1` COMDAT
sections.
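The rule the linker applies can be sketched in a few lines of C++. This is a model, not any particular linker's code: `Input_section` and `select_sections` are hypothetical, and a real linker works with section headers rather than strings. Note that the second `f1` is dropped without its contents being examined at all.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// One input section as the linker sees it (hypothetical model).
struct Input_section
{
  std::string object;      // file the section came from
  std::string comdat_key;  // empty if this is not a COMDAT section
};

// Keep every non-COMDAT section, and only the first COMDAT section
// seen for each key. Later duplicates are discarded without any
// check that their contents match.
std::vector<Input_section>
select_sections(const std::vector<Input_section>& inputs)
{
  std::unordered_set<std::string> seen_keys;
  std::vector<Input_section> kept;
  for (const Input_section& s : inputs)
    {
      if (!s.comdat_key.empty()
          && !seen_keys.insert(s.comdat_key).second)
        continue;  // duplicate COMDAT section: drop it
      kept.push_back(s);
    }
  return kept;
}
```

Given `f1` from `a.o`, a plain section from `a.o`, `f1` from `b.o` and `f2` from `b.o`, the sketch keeps three sections and silently drops `b.o`'s `f1`.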
This obviously raises the possibility that there will be two entirely different
inline functions named `f1`, defined in different header files. This would be
an invalid C++ program, violating the One Definition Rule (often abbreviated
ODR). Unfortunately, if no source file included both header files, the
compiler would be unable to diagnose the error. And, unfortunately, the linker
would simply discard the duplicate COMDAT sections, and would not notice the
error either. This is an area where some improvements are needed (at least in
the GNU tools; I don’t know whether any other tools diagnose this error
correctly).
The Microsoft PE object file format provides COMDAT sections. These sections
can be marked so that duplicate COMDAT sections which do not have identical
contents cause an error. That is not as helpful as it seems, as different
compiler options may cause valid duplicates to have different contents. The
string associated with a COMDAT section is stored in the symbol table.
Before I learned about the Microsoft PE format, I introduced a different type
of COMDAT sections into the GNU ELF linker, following a suggestion from Jason
Merrill. Any section whose name starts with `.gnu.linkonce.` is a COMDAT
section. The associated string is simply the section name itself. Thus the
inline function `f1` would be put into the section `.gnu.linkonce.f1`. This
simple implementation works well enough, but it has a flaw in that some
functions require data in multiple sections; e.g., the instructions may be in
one section and associated static data may be in another section. Since
different instances of the inline function may be compiled differently, the
linker cannot reliably and consistently discard duplicate data (I don’t know
how the Microsoft linker handles this problem).
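For `.gnu.linkonce` sections the key computation is trivial, as this hypothetical helper shows. It also makes the flaw visible: the section holding `f1`'s code and a separate section holding its static data get unrelated keys, so each is deduplicated on its own and nothing ties the data to the code it belongs with.

```cpp
#include <cassert>
#include <string>

// With .gnu.linkonce sections the COMDAT key is the section name
// itself. Any other section is not COMDAT (empty key).
std::string linkonce_key(const std::string& section_name)
{
  static const std::string prefix = ".gnu.linkonce.";
  if (section_name.compare(0, prefix.size(), prefix) == 0)
    return section_name;
  return std::string();
}
```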
Recent versions of ELF introduce section groups. These implement an officially
sanctioned version of COMDAT in ELF, and avoid the problem of `.gnu.linkonce`
sections. I described these briefly in an earlier blog entry. A special section
of type `SHT_GROUP` contains a list of section indices in the group. The group
is retained or discarded as a whole. The string associated with the group is
found in the symbol table. Putting the string in the symbol table makes it
awkward to retrieve, but since the string is generally the name of a symbol it
means that the string only needs to be stored once in the object file; this is
a minor optimization for C++ in which symbol names may be very long.
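Section groups make the whole group the unit of deduplication. A sketch of the decision, with hypothetical names: `Section_group` stands for the parsed contents of one `SHT_GROUP` section, and the group's flag word (`GRP_COMDAT` in real ELF files) is ignored here.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of an SHT_GROUP section: the signature string
// (found via the symbol table in a real ELF file) and the indices of
// the member sections in that object file.
struct Section_group
{
  std::string signature;
  std::vector<unsigned> members;
};

// The first group with a given signature is kept; every later group
// with the same signature is discarded as a whole, so code and its
// associated data are kept or dropped together.
// Returns one keep/discard flag per input group.
std::vector<bool>
keep_groups(const std::vector<Section_group>& groups)
{
  std::unordered_set<std::string> seen;
  std::vector<bool> keep;
  for (const Section_group& g : groups)
    keep.push_back(seen.insert(g.signature).second);
  return keep;
}
```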
More tomorrow.

# Linkers part 16
## C++ Template Instantiation
There is still more C++ fun at link time, though somewhat less related to the
linker proper. A C++ program can declare templates, and instantiate them with
specific types. Ideally those specific instantiations will only appear once in
a program, not once per source file which instantiates the templates. There are
a few ways to make this work.
For object file formats which support COMDAT and vague linkage, which I
described yesterday, the simplest and most reliable mechanism is for the
compiler to generate all the template instantiations required for a source file
and put them into the object file. They should be marked as COMDAT, so that the
linker discards all but one copy. This ensures that all template instantiations
will be available at link time, and that the executable will have only one
copy. This is what gcc does by default for systems which support it. The
obvious disadvantages are the time required to compile all the duplicate
template instantiations and the space they take up in the object files. This is
sometimes called the Borland model, as this is what Borland’s C++ compiler did.
Another approach is to not generate any of the template instantiations at
compile time. Instead, when linking, if we need a template instantiation which
is not found, invoke the compiler to build it. This can be done either by
running the linker and looking for error messages or by using a linker plugin
to handle an undefined symbol error. The difficulties with this approach are to
find the source code to compile and to find the right options to pass to the
compiler. Typically the source code is placed into a repository file of some
sort at compile time, so that it is available at link time. The complexities of
getting the compilation steps right are why this approach is not the default.
When it works, though, it can be faster than the duplicate instantiation
approach. This is sometimes called the Cfront model.
gcc also supports explicit template instantiation, which can be used to control
exactly where templates are instantiated. This approach can work if you have
complete control over your source code base, and can instantiate all required
templates in some central place. This approach is used for gcc’s C++ library,
libstdc++.
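Explicit instantiation looks like this in C++. A minimal sketch with a hypothetical template `twice`: the `extern template` declaration (standardized in C++11, so newer than this post) tells other translation units not to instantiate `twice<int>` themselves, and the explicit instantiation definition emits it in exactly one place. Both appear in one file so the example is self-contained; in a real code base the declaration would go in the header and the definition in one central source file.

```cpp
#include <cassert>

// A function template whose definition would normally live in a header.
template<typename T>
T twice(T v)
{
  return v + v;
}

// Explicit instantiation declaration: do not instantiate twice<int>
// here; some other translation unit provides it.
extern template int twice<int>(int);

// Explicit instantiation definition: the one place that emits
// twice<int>. It must follow the declaration above.
template int twice<int>(int);
```

Uses of other types, such as `twice(1.5)`, are still instantiated implicitly wherever they appear.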
C++ defines a keyword `export` which is supposed to permit exporting template
definitions in such a way that they can be read back in by the compiler. gcc
does not support this keyword. If it worked, it could be a slightly more
reliable way of using a repository when using the Cfront model.
## Exception Frames
C++ and other languages support exceptions. When an exception is thrown in one
function and caught in another, the program needs to reset the stack pointer
and registers to the point where the exception is caught. While resetting the
stack pointer, the program needs to identify all local variables in the part of
the stack being discarded, and run their destructors if any. This process is
known as unwinding the stack.
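Here is what unwinding has to accomplish, seen from the source level. In this hypothetical example an exception is thrown two frames below its handler. By the time the handler runs, the runtime has destroyed the local objects in both discarded frames, innermost first.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

static std::vector<std::string> events;

// A local object with a destructor; unwinding must run it.
struct Guard
{
  explicit Guard(const char* name) : name_(name) {}
  ~Guard() { events.push_back(std::string("destroyed ") + name_); }
  const char* name_;
};

static void thrower()
{
  Guard inner("inner");
  throw std::runtime_error("boom");
}

static void middle()
{
  Guard mid("middle");
  thrower();
  events.push_back("not reached");
}

// Catches the exception two frames up.
static void catcher()
{
  try
    {
      middle();
    }
  catch (const std::runtime_error& e)
    {
      events.push_back(std::string("caught ") + e.what());
    }
}
```

After `catcher()` returns, `events` holds "destroyed inner", "destroyed middle" and "caught boom", in that order.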
The information needed to unwind the stack is normally stored in tables in the
program. Supporting library code is used to read the tables and perform the
necessary operations. I’m not going to describe the details of those tables
here. However, there is a linker optimization which applies to them.
The support libraries need to be able to find the exception tables at runtime
when an exception occurs. An exception can be thrown in one shared library and
caught in a different shared library, so finding all the required exception
tables can be a nontrivial operation. One approach that can be used is to
register the exception tables at program startup time or shared library load
time. The registration can be done at the right time using the global
constructor mechanism.
However, this approach imposes a runtime cost for exceptions, in that it takes
longer for the program to start. Therefore, this is not ideal. The linker can
optimize this by building tables which can be used to find the exception
tables. The tables built by the GNU linker are sorted for fast lookup by the
runtime library. The tables are put into a `PT_GNU_EH_FRAME` segment. The
supporting libraries then need a way to look up a segment of this type. This is
done via the `dl_iterate_phdr` API provided by the GNU dynamic linker.
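A sketch of the lookup side, assuming glibc's `<link.h>`: `dl_iterate_phdr` calls back once per loaded object (the executable and each shared library) with its program headers, and the callback looks for `PT_GNU_EH_FRAME`. A real unwinder would go on to compute the segment's runtime address (`dlpi_addr + p_vaddr`) and search the sorted table there; this hypothetical helper only counts the objects that have one.

```cpp
#include <cassert>
#include <cstddef>
#include <elf.h>    // PT_GNU_EH_FRAME
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc)

// Callback: count loaded objects that have a PT_GNU_EH_FRAME segment.
static int count_eh_frame_hdr(struct dl_phdr_info* info, std::size_t,
                              void* data)
{
  int* count = static_cast<int*>(data);
  for (int i = 0; i < info->dlpi_phnum; ++i)
    if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME)
      {
        ++*count;
        break;
      }
  return 0;  // zero means keep iterating
}

static int objects_with_eh_frame_hdr()
{
  int count = 0;
  dl_iterate_phdr(count_eh_frame_hdr, &count);
  return count;
}
```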
Note that if the compiler believes that the linker will generate a
`PT_GNU_EH_FRAME` segment, it won’t generate the startup code to register the
exception tables. Thus the linker must not fail to create this segment.
Since the GNU linker needs to look at the exception tables in order to generate
the `PT_GNU_EH_FRAME` segment, it will also optimize by discarding duplicate
exception table information.
I know this section is rather short on details. I hope the general idea is
clear.
More tomorrow.

# Linkers part 17
## Warning Symbols
The GNU linker supports a weird extension to ELF used to issue warnings when
symbols are referenced at link time. This was originally implemented for a.out
using a special symbol type. For ELF, I implemented it using a special section
name.
If you create a section named `.gnu.warning.SYMBOL`, then if and when the
linker sees an undefined reference to `SYMBOL`, it will issue a warning. The
warning is triggered by seeing an undefined symbol with the right name in an
object file. Unlike the warning about an undefined symbol, it is not triggered
by seeing a relocation entry. The text of the warning is simply the contents of
the `.gnu.warning.SYMBOL` section.
The GNU C library uses this feature to warn about references to symbols like
`gets` which are required by standards but are generally considered to be
unsafe. This is done by creating a section named `.gnu.warning.gets` in the
same object file which defines `gets`.
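A sketch of the same trick applied to your own code, assuming GCC or Clang and the GNU linker; `legacy_api`, `new_api` and the message text are hypothetical. The warning is printed when some other object file with an undefined reference to `legacy_api` is linked against the object built from this file.

```cpp
#include <cassert>

// The function to discourage. extern "C" keeps the symbol name
// unmangled, so it matches the section name below.
extern "C" int legacy_api(int x)
{
  return x + 1;
}

// The contents of .gnu.warning.legacy_api become the warning text.
// `used` stops the compiler from discarding the otherwise
// unreferenced array.
__attribute__((section(".gnu.warning.legacy_api"), used))
static const char legacy_api_warning[] =
  "legacy_api is deprecated; use new_api instead";
```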
The GNU linker also supports another type of warning, triggered by sections
named `.gnu.warning` (without the symbol name). If an object file with a
section of that name is included in the link, the linker will issue a warning.
Again, the text of the warning is simply the contents of the `.gnu.warning`
section. I don’t know if anybody actually uses this feature.
Short entry today, more tomorrow.

# Linkers part 18
## Incremental Linking
Often a programmer will change a single source file and recompile and
relink the application. A standard linker will need to read all the input
objects and libraries in order to regenerate the executable with the change.
For a large application, this is a lot of work. If only one input object file
changed, it is a lot more work than really needs to be done. One solution is to
use an incremental linker. An incremental linker makes incremental changes to
an existing executable or shared library, rather than rebuilding them from
scratch.
I’ve never actually written or worked on an incremental linker, but the general
idea is straightforward enough. When the linker writes the output file, it must
attach additional information.
* The linker must create a mapping of object files to areas in the output file,
so that an incremental link will know what to remove when replacing an object
file.
* The linker must retain all the relocations for each input object which refer
to symbols defined in other objects, so that it can reprocess them when
symbols change. The linker should store the relocations mapped by symbol, so
that it can quickly find the relevant relocations.
* The linker should leave extra space in the text and data segments, to allow
for object files to grow to a limited extent without requiring rewriting the
whole executable. It must keep a map of where this extra space is, as it will
tend to move over the course of incremental links.
* The linker should keep a list of object file timestamps in the output file,
so that it can quickly determine which objects have changed.
With this information, the linker can identify which object files have changed
since the last time the output file was linked, and replace them in the
existing output file. When an object file changes, the linker can identify all
the relocations which refer to symbols defined in the object file, and
reprocess them.
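The bookkeeping and the change detection it enables can be sketched as follows. This is a hypothetical model, not the design of any real incremental linker: it keeps only the timestamps, the symbols each object defines, and the relocations indexed by symbol, and leaves out the map of output areas and the free-space list.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A relocation that refers to a symbol defined in another object.
struct Relocation
{
  std::string object;    // object file containing the relocation
  std::uint64_t offset;  // where it applies in the output file
};

// Hypothetical bookkeeping stored alongside the output file.
struct Incremental_info
{
  std::map<std::string, long> timestamps;  // object -> last-link time
  std::map<std::string, std::vector<std::string>> defined_symbols;
  std::map<std::string, std::vector<Relocation>> relocs_by_symbol;
};

// Objects whose current timestamp differs from the recorded one.
std::vector<std::string>
changed_objects(const Incremental_info& info,
                const std::map<std::string, long>& current)
{
  std::vector<std::string> changed;
  for (const auto& entry : current)
    {
      auto it = info.timestamps.find(entry.first);
      if (it == info.timestamps.end() || it->second != entry.second)
        changed.push_back(entry.first);
    }
  return changed;
}

// Relocations to reprocess because they refer to symbols defined in
// one of the changed objects.
std::vector<Relocation>
relocs_to_redo(const Incremental_info& info,
               const std::vector<std::string>& changed)
{
  std::vector<Relocation> redo;
  for (const std::string& obj : changed)
    {
      auto defs = info.defined_symbols.find(obj);
      if (defs == info.defined_symbols.end())
        continue;
      for (const std::string& sym : defs->second)
        {
          auto r = info.relocs_by_symbol.find(sym);
          if (r != info.relocs_by_symbol.end())
            redo.insert(redo.end(), r->second.begin(), r->second.end());
        }
    }
  return redo;
}
```

Indexing relocations by symbol is what makes the second step a pair of lookups rather than a scan over every relocation in the program.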
When an object file gets too large to fit in the available space in a text or
data segment, then the linker has the option of creating additional text or
data segments at different addresses. This requires some care to ensure that
the new code does not collide with the heap, depending upon how the local
malloc implementation works. Alternatively, the incremental linker could fall
back on doing a full link, and allocating more space again.
Incremental linking can greatly speed up the edit/compile/debug cycle.
Unfortunately it is not implemented in most common linkers. Of course an
incremental link is not equivalent to a final link, and in particular some
linker optimizations are difficult to implement while acting incrementally. An
incremental link is really only suitable for use during the development cycle,
which is of course the time when the speed of the linker is most important.
More on Monday.

# Linkers part 19
I’ve pretty much run out of linker topics. Unless I think of something new, I’ll make tomorrow’s post be the last one, for a total of 20.
## __start and __stop Symbols
A quick note about another GNU linker extension. If the linker sees a section
in the output file whose name could be a C variable name (the name contains
only alphanumeric characters or underscores), the linker will automatically
define symbols marking the start and stop of the section. Note that this is not
true of most section names, as by convention most section names start with a
period. But the name of a section can be any string; it doesn’t have to start
with a period. And when that happens for section `NAME`, the GNU linker will
define the symbols `__start_NAME` and `__stop_NAME` to the address of the
beginning and the end of the section, respectively.
This is convenient for collecting some information in several different object
files, and then referring to it in the code. For example, the GNU C library
uses this to keep a list of functions which may be called to free memory. The
`__start` and `__stop` symbols are used to walk through the list.
In C code, these symbols should be declared as something like
`extern char __start_NAME[]`. For an extern array the value of the symbol and
the value of the variable are the same.
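A complete sketch of the pattern, assuming GCC or Clang on an ELF system with the GNU linker (gold and lld also define these symbols). The section name `cleanup_fns` and the functions are hypothetical. In C++ the declarations need `extern "C"` so that the names are not mangled.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*cleanup_fn)();

static int cleanups_run = 0;
static void free_cache() { ++cleanups_run; }
static void free_buffers() { ++cleanups_run; }

// Entries contributed to the section "cleanup_fns". The name is a
// valid C identifier, so the linker defines __start_cleanup_fns and
// __stop_cleanup_fns. In a real program each entry could come from
// a different object file.
__attribute__((section("cleanup_fns"), used))
static const cleanup_fn entry1 = free_cache;
__attribute__((section("cleanup_fns"), used))
static const cleanup_fn entry2 = free_buffers;

// Declared as arrays; extern "C" keeps the names unmangled so they
// match the linker-defined symbols.
extern "C" const cleanup_fn __start_cleanup_fns[];
extern "C" const cleanup_fn __stop_cleanup_fns[];

// Walk the section from start to stop and call every entry.
static std::size_t run_cleanups()
{
  std::size_t n = 0;
  for (const cleanup_fn* p = __start_cleanup_fns;
       p != __stop_cleanup_fns; ++p, ++n)
    (*p)();
  return n;
}
```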
## Byte Swapping
The new linker I am working on, gold, is written in C++. One of the attractions
was to use template specialization to do efficient byte swapping. Any linker
which can be used in a cross-compiler needs to be able to swap bytes when
writing them out, in order to generate code for a big-endian system while
running on a little-endian system, or vice-versa. The GNU linker always stores
data into memory a byte at a time, which is unnecessary for a native linker.
Measurements from a few years ago showed that this took about 5% of the
linker’s CPU time. Since the native linker is by far the most common case, it
is worth avoiding this penalty.
In C++, this can be done using templates and template specialization. The idea
is to write a template for writing out the data. Then provide two
specializations of the template, one for when the host and target have the
same endianness and one for when they differ. Then pick the one to use at
compile time. The code looks like this; I’m only showing the 16-bit case for
simplicity.
```cpp
#include <byteswap.h>  // bswap_16 (glibc)
#include <endian.h>    // __BYTE_ORDER, __BIG_ENDIAN (glibc)
#include <stdint.h>    // uint16_t

// Endian simply indicates whether the host is big endian or not.
struct Endian
{
 public:
  // Used for template specializations.
  static const bool host_big_endian = __BYTE_ORDER == __BIG_ENDIAN;
};

// Valtype_base is a template based on size (8, 16, 32, 64) which
// defines the type Valtype as the unsigned integer of the specified
// size.
template<int size>
struct Valtype_base;

template<>
struct Valtype_base<16>
{
  typedef uint16_t Valtype;
};

// Convert_endian is a template based on size and on whether the host
// and target have the same endianness. It defines the type Valtype
// as Valtype_base does, and also defines a function convert_host
// which takes an argument of type Valtype and returns the same value,
// but swapped if the host and target have different endianness.
template<int size, bool same_endian>
struct Convert_endian;

template<int size>
struct Convert_endian<size, true>
{
  typedef typename Valtype_base<size>::Valtype Valtype;

  static inline Valtype
  convert_host(Valtype v)
  { return v; }
};

template<>
struct Convert_endian<16, false>
{
  typedef Valtype_base<16>::Valtype Valtype;

  static inline Valtype
  convert_host(Valtype v)
  { return bswap_16(v); }
};

// Convert is a template based on size and on whether the target is
// big endian. It defines Valtype and convert_host like
// Convert_endian. That is, it is just like Convert_endian except in
// the meaning of the second template parameter.
template<int size, bool big_endian>
struct Convert
{
  typedef typename Valtype_base<size>::Valtype Valtype;

  static inline Valtype
  convert_host(Valtype v)
  {
    return Convert_endian<size, big_endian == Endian::host_big_endian>
      ::convert_host(v);
  }
};

// Swap is a template based on size and on whether the target is big
// endian. It defines the type Valtype and the functions readval and
// writeval. The functions read and write values of the appropriate
// size out of buffers, swapping them if necessary.
template<int size, bool big_endian>
struct Swap
{
  typedef typename Valtype_base<size>::Valtype Valtype;

  static inline Valtype
  readval(const Valtype* wv)
  { return Convert<size, big_endian>::convert_host(*wv); }

  static inline void
  writeval(Valtype* wv, Valtype v)
  { *wv = Convert<size, big_endian>::convert_host(v); }
};
```
Now, for example, the linker reads a 16-bit big-endian value using
`Swap<16,true>::readval`. This works because the linker always knows how much
data to swap in, and it always knows whether it is reading big- or
little-endian data.
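One detail worth noting: `readval` dereferences a `Valtype*` directly, which assumes the buffer is suitably aligned. A condensed variant of the same compile-time idea reads through `memcpy` instead. It assumes GCC or Clang for the `__BYTE_ORDER__` macros and `__builtin_bswap16`; `read16` and `write16` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the host stores the most significant byte first.
constexpr bool host_big_endian =
  __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;

// Read a 16-bit value stored in the given target byte order. The test
// is a compile-time constant, so a native read compiles to a plain
// load and a cross-endian read adds a single byte swap.
template<bool big_endian>
std::uint16_t read16(const unsigned char* p)
{
  std::uint16_t v;
  std::memcpy(&v, p, sizeof v);  // safe for unaligned buffers
  if (big_endian != host_big_endian)
    v = __builtin_bswap16(v);
  return v;
}

// Write a 16-bit value in the given target byte order.
template<bool big_endian>
void write16(unsigned char* p, std::uint16_t v)
{
  if (big_endian != host_big_endian)
    v = __builtin_bswap16(v);
  std::memcpy(p, &v, sizeof v);
}
```

For the bytes `0x12 0x34`, `read16<true>` yields `0x1234` and `read16<false>` yields `0x3412` on any host.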

# Linkers part 2
I’m back, and I’m still doing the linker technical introduction.
Shared libraries were invented as an optimization for virtual memory systems
running many processes simultaneously. People noticed that there is a set of
basic functions which appear in almost every program. Before shared libraries,
in a system which runs multiple processes simultaneously, that meant that
almost every process had a copy of exactly the same code. This suggested that
on a virtual memory system it would be possible to arrange that code so that a
single copy could be shared by every process using it. The virtual memory
system would be used to map the single copy into the address space of each
process which needed it. This would require less physical memory to run
multiple programs, and thus yield better performance.
I believe the first implementation of shared libraries was on SVR3, based on
COFF. This implementation was simple, and basically assigned each shared
library a fixed portion of the virtual address space. This did not require any
significant changes to the linker. However, requiring each shared library to
reserve an appropriate portion of the virtual address space was inconvenient.
SunOS4 introduced a more flexible version of shared libraries, which was later
picked up by SVR4. This implementation postponed some of the operation of the
linker to runtime. When the program started, it would automatically run a
limited version of the linker which would link the program proper with the
shared libraries. The version of the linker which runs when the program starts
is known as the dynamic linker. When it is necessary to distinguish them, I
will refer to the version of the linker which creates the program as the
program linker. This type of shared libraries was a significant change to the
traditional program linker: it now had to build linking information which could
be used efficiently at runtime by the dynamic linker.
That is the end of the introduction. You should now understand the basics of
what a linker does. I will now turn to how it does it.
## Basic Linker Data Types
The linker operates on a small number of basic data types: symbols,
relocations, and contents. These are defined in the input object files. Here is
an overview of each of these.
A symbol is basically a name and a value. Many symbols represent static objects
in the original source code, that is, objects which exist in a single place for
the duration of the program. For example, in an object file generated from C
code, there will be a symbol for each function and for each global and static
variable. The value of such a symbol is simply an offset into the contents.
This type of symbol is known as a defined symbol. It’s important not to confuse
the value of the symbol representing the variable `my_global_var` with the
value of `my_global_var` itself. The value of the symbol is roughly the address
of the variable: the value you would get from the expression
`&my_global_var` in C.
Symbols are also used to indicate a reference to a name defined in a different
object file. Such a reference is known as an undefined symbol. There are other
less commonly used types of symbols which I will describe later.
During the linking process, the linker will assign an address to each defined
symbol, and will resolve each undefined symbol by finding a defined symbol with
the same name.
A relocation is a computation to perform on the contents. Most relocations
refer to a symbol and to an offset within the contents. Many relocations will
also provide an additional operand, known as the addend. A simple, and commonly
used, relocation is “set this location in the contents to the value of this
symbol plus this addend.” The types of computations that relocations do are
inherently dependent on the architecture of the processor for which the linker
is generating code. For example, RISC processors which require two or more
instructions to form a memory address will have separate relocations to be
used with each of those instructions; for example, “set this location in the
contents to the lower 16 bits of the value of this symbol.”
During the linking process, the linker will perform all of the relocation
computations as directed. A relocation in an object file may refer to an
undefined symbol. If the linker is unable to resolve that symbol, it will
normally issue an error (but not always: for some symbol types or some
relocation types an error may not be appropriate).
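The two example computations can be sketched in C++. This is an illustration only: the relocation kinds are hypothetical, values are stored little-endian, and a real `lo16`-style relocation would patch the immediate field inside an instruction rather than overwrite two whole bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Two hypothetical relocation kinds modelled on the examples in the
// text. Real relocation types are defined per processor by each ABI.
enum class Reloc_kind { abs32, lo16 };

struct Reloc
{
  Reloc_kind kind;
  std::uint32_t offset;     // where in the contents to apply it
  std::uint32_t sym_value;  // resolved value of the referenced symbol
  std::int32_t addend;
};

// Apply one relocation to the contents.
void apply_reloc(std::vector<std::uint8_t>& contents, const Reloc& r)
{
  std::uint32_t value = r.sym_value + static_cast<std::uint32_t>(r.addend);
  switch (r.kind)
    {
    case Reloc_kind::abs32:  // location = symbol + addend
      assert(r.offset + 4 <= contents.size());
      for (int i = 0; i < 4; ++i)
        contents[r.offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
      break;
    case Reloc_kind::lo16:   // location = low 16 bits of symbol + addend
      assert(r.offset + 2 <= contents.size());
      contents[r.offset] = static_cast<std::uint8_t>(value);
      contents[r.offset + 1] = static_cast<std::uint8_t>(value >> 8);
      break;
    }
}
```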
The contents are what memory should look like during the execution of the
program. Contents have a size, an array of bytes, and a type. They contain the
machine code generated by the compiler and assembler (known as text). They
contain the values of initialized variables (data). They contain static
unnamed data like string constants and switch tables (read-only data or rdata).
They contain uninitialized variables, in which case the array of bytes is
generally omitted and assumed to contain only zeroes (bss). The compiler and
the assembler work hard to generate exactly the right contents, but the linker
really doesn’t care about them except as raw data. The linker reads the
contents from each file, concatenates them all together sorted by type,
applies the relocations, and writes the result into the executable file.
## Basic Linker Operation
At this point we already know enough to understand the basic steps used by
every linker.
* Read the input object files. Determine the length and type of the contents.
Read the symbols.
* Build a symbol table containing all the symbols, linking undefined symbols to
their definitions.
* Decide where all the contents should go in the output executable file, which
means deciding where they should go in memory when the program runs.
* Read the contents data and the relocations. Apply the relocations to the
contents. Write the result to the output file.
* Optionally write out the complete symbol table with the final values of the
symbols.
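The whole sequence fits in a toy linker. This sketch assumes a made-up object format with a single kind of contents (so there is no sorting by type) and one relocation type, a 32-bit absolute address stored little-endian. `Object` and `toy_link` are hypothetical names, and the final symbol table is returned through `symtab` to cover the optional last step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// A toy object file: one blob of contents, the symbols it defines
// (name -> offset in its contents), and 32-bit absolute relocations
// (offset in its contents -> name of the referenced symbol).
struct Object
{
  std::vector<std::uint8_t> contents;
  std::map<std::string, std::uint32_t> defined;
  std::map<std::uint32_t, std::string> relocs;
};

// Links the objects to run at address `base`. Returns the output
// contents and fills in `symtab` with each symbol's final address.
std::vector<std::uint8_t>
toy_link(const std::vector<Object>& objects, std::uint32_t base,
         std::map<std::string, std::uint32_t>& symtab)
{
  // Decide where each object's contents go (one after another) and
  // build the symbol table with final addresses.
  std::vector<std::uint32_t> start;
  std::uint32_t size = 0;
  for (const Object& obj : objects)
    {
      start.push_back(size);
      for (const auto& def : obj.defined)
        if (!symtab.emplace(def.first, base + size + def.second).second)
          throw std::runtime_error("multiple definition of " + def.first);
      size += static_cast<std::uint32_t>(obj.contents.size());
    }

  // Copy the contents and apply each relocation, resolving the
  // referenced name through the symbol table.
  std::vector<std::uint8_t> output;
  for (std::size_t i = 0; i < objects.size(); ++i)
    {
      const Object& obj = objects[i];
      output.insert(output.end(), obj.contents.begin(), obj.contents.end());
      for (const auto& rel : obj.relocs)
        {
          auto sym = symtab.find(rel.second);
          if (sym == symtab.end())
            throw std::runtime_error("undefined symbol " + rel.second);
          assert(rel.first + 4 <= obj.contents.size());
          for (int b = 0; b < 4; ++b)  // store little-endian
            output[start[i] + rel.first + b] =
              static_cast<std::uint8_t>(sym->second >> (8 * b));
        }
    }
  return output;
}
```

Linking an 8-byte object that refers to `counter` at offset 4 with a 4-byte object defining `counter`, at base `0x1000`, places `counter` at `0x1008` and patches that address into bytes 4 to 7 of the output.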
More tomorrow.

# Linkers part 20
This will be my last blog posting on linkers for the time being. Tomorrow my
blog will return to its usual trivialities. People who are specifically
interested in linker information are warned to stop reading with this post.
I’ll close the series with a short update on gold, the new linker I’ve been
working on. It currently (September 25, 2007) can create executables. It
cannot create shared libraries or relocatable objects. It has very limited
support for linker scripts, enough to read `/usr/lib/libc.so` on a GNU/Linux
system. It doesn’t have any interesting new features at this point. It only
supports x86. The focus to date has been entirely on speed. It is written to be
multi-threaded, but the threading support has not been hooked in yet.
By way of example, when linking a 900M C++ executable, the GNU linker (version
2.16.91 20060118 on an Ubuntu based system) took 700 seconds of user time, 24
seconds of system time, and 16 minutes of wall time. gold took 7 seconds of
user time, 3 seconds of system time, and 30 seconds of wall time. So while I
can’t promise that it will stay as fast as all features are added, it’s in a
pretty good position at the moment.
I’m the main developer on gold, but I’m not the only person working on it. A
few other people are also making improvements.
The goal is to release gold as a free program, ideally as part of the GNU
binutils. I want it to be more nearly feature complete before doing this,
though. It needs to at least support `-shared` and `-r`. I doubt gold will ever
support all of the features of the GNU linker. I doubt it will ever support the
full GNU linker script language, although I do plan to support enough to link
the Linux kernel.
Future plans for gold, once it actually works, include incremental linking and
more far-reaching speed improvements.

# Linkers part 3
Continuing notes on linkers.
## Address Spaces
An address space is simply a view of memory, in which each byte has an address.
The linker deals with three distinct types of address space.
Every input object file is a small address space: the contents have addresses,
and the symbols and relocations refer to the contents by addresses.
The output program will be placed at some location in memory when it runs.
This is the output address space, which I generally refer to as using virtual
memory addresses.
The output program will be loaded at some location in memory. This is the load
memory address. On typical Unix systems virtual memory addresses and load
memory addresses are the same. On embedded systems they are often different;
for example, the initialized data (the initial contents of global or static
variables) may be loaded into ROM at the load memory address, and then copied
into RAM at the virtual memory address.
Shared libraries can normally be run at different virtual memory addresses in
different processes. A shared library has a base address when it is created;
this is often simply zero. When the dynamic linker copies the shared library
into the virtual memory space of a process, it must apply relocations to
adjust the shared library to run at its virtual memory address. Shared library
systems minimize the number of relocations which must be applied, since they
take time when starting the program.
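The load-time adjustment can be sketched as follows, loosely modelled on the relative relocations a dynamic linker processes, in the style where the addend is stored in the word itself. The image was linked at base address zero, each listed word holds a link-time address, and the loader adds the actual load address to it. The names are hypothetical, and real relocation entries carry more information. Every entry in the list is work done at startup, which is why the number of such relocations matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adjust a shared library image that was linked at base address 0.
// relative_offsets lists the indices of words that hold link-time
// addresses; each one gets the load address added to it.
void relocate_image(std::vector<std::uint64_t>& image,
                    const std::vector<std::size_t>& relative_offsets,
                    std::uint64_t load_base)
{
  for (std::size_t idx : relative_offsets)
    {
      assert(idx < image.size());
      image[idx] += load_base;  // link-time address + load address
    }
}
```

Words not in the list, such as plain integer data, are left untouched.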
## Object File Formats
As I said above, an assembler turns human readable assembly language into an
object file. An object file is a binary data file written in a format designed
as input to the linker. The linker generates an executable file. This
executable file is a binary data file written in a format designed as input for
the operating system or the loader (this is true even when linking dynamically,
as normally the operating system loads the executable before invoking the
dynamic linker to begin running the program). There is no logical requirement
that the object file format resemble the executable file format. However,
in practice they are normally very similar.
Most object file formats define sections. A section typically holds memory
contents, or it may be used to hold other types of data. Sections generally
have a name, a type, a size, an address, and an associated array of data.
Object file formats may be classed in two general types: record oriented and
section oriented.
A record oriented object file format defines a series of records of varying
size. Each record starts with some special code, and may be followed by data.
Reading the object file requires reading it from the beginning and processing
each record. Records are used to describe symbols and sections. Relocations may
be associated with sections or may be specified by other records. IEEE-695
and Mach-O are record oriented object file formats used today.
In a section oriented object file format the file header describes a section
table with a specified number of sections. Symbols may appear in a separate
part of the object file described by the file header, or they may appear in a
special section. Relocations may be attached to sections, or they may appear in
separate sections. The object file may be read by reading the section table,
and then reading specific sections directly. ELF, COFF, PE, and a.out are
section oriented object file formats.
Every object file format needs to be able to represent debugging information.
Debugging information is generated by the compiler and read by the debugger.
In general the linker can just treat it like any other type of data. However,
in practice the debugging information for a program can be larger than the
actual program itself. The linker can use various techniques to reduce the
amount of debugging information, thus reducing the size of the executable.
This can speed up the link, but requires the linker to understand the
debugging information.
The a.out object file format stores debugging information using special strings
in the symbol table, known as stabs. These special strings are simply the names
of symbols with a special type. This technique is also used by some variants of
ECOFF, and by older versions of Mach-O.
The COFF object file format stores debugging information using special fields
in the symbol table. This type information is limited, and is completely
inadequate for C++. A common technique to work around these limitations is to
embed stabs strings in a COFF section.
The ELF object file format stores debugging information in sections with
special names. The debugging information can be stabs strings or the DWARF
debugging format.
More next week.

# Linkers part 4
## Shared Libraries
We’ve talked a bit about what object files and executables look like, so what
do shared libraries look like? I’m going to focus on ELF shared libraries as
used in SVR4 (and GNU/Linux, etc.), as they are the most flexible shared
library implementation and the one I know best.
Windows shared libraries, known as DLLs, are less flexible in that you have to
compile code differently depending on whether it will go into a shared library
or not. You also have to express symbol visibility in the source code. This is
not inherently bad, and indeed ELF has picked up some of these ideas over time,
but the ELF format makes more decisions at link time and is thus more powerful.
When the program linker creates a shared library, it does not yet know which
virtual address that shared library will run at. In fact, in different
processes, the same shared library will run at different addresses, depending on
the decisions made by the dynamic linker. This means that shared library code
must be position independent. More precisely, it must be position independent
after the dynamic linker has finished loading it. It is always possible for the
dynamic linker to convert any piece of code to run at any virtual address,
given sufficient relocation information. However, the relocation computations
must be performed every time the program starts, implying that it will
start more slowly. Therefore, any shared library system seeks to generate
position independent code which requires a minimal number of relocations to be
applied at runtime, while still running at close to the runtime efficiency of
position dependent code.
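
The simplified C model below shows why startup cost grows with the number of relocations. It is loosely patterned on the i386 `R_386_RELATIVE` relocation, the most basic kind of runtime fixup. The program linker records the offset of each word that holds an address computed as though the library were loaded at address zero. The dynamic linker then adds the actual load address to each of those words. The function name and the flat offset array are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add the load address to each 32-bit word named by `offsets`.
   Each entry costs one read-modify-write, paid at every program start. */
static void apply_relative_relocs(uint8_t *image, uint32_t load_base,
                                  const uint32_t *offsets, size_t count) {
    for (size_t i = 0; i < count; i++) {
        uint32_t word;
        /* memcpy avoids unaligned access to the image */
        memcpy(&word, image + offsets[i], sizeof word);
        word += load_base;
        memcpy(image + offsets[i], &word, sizeof word);
    }
}
```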
An additional complexity is that ELF shared libraries were designed to be
roughly equivalent to ordinary archives. This means that by default the main
executable may override symbols in the shared library, such that references in
the shared library will call the definition in the executable, even if the
shared library also defines that same symbol. For example, an executable may
define its own version of `malloc`. The C library also defines `malloc`, and
the C library contains code which calls `malloc`. If the executable defines
`malloc` itself, it will override the function in the C library. When some
other function in the C library calls `malloc`, it will call the definition in
the executable, not the definition in the C library.
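
This override behavior can be modeled as a search through modules in order: first the executable, then the shared libraries in load order. The first definition found wins. The `struct module` type and `resolve` function below are hypothetical. The real dynamic linker must also account for symbol visibility, symbol versions and other details that this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical loaded module: a name plus a NULL-terminated list of
   the symbols it defines. */
struct module {
    const char *name;
    const char *const *defs;
};

/* Search modules in order (executable first) and return the name of
   the first module that defines `sym`, or NULL if none does. */
static const char *resolve(const char *sym, const struct module *mods, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (const char *const *d = mods[i].defs; *d != NULL; d++)
            if (strcmp(*d, sym) == 0)
                return mods[i].name;
    return NULL;
}
```

Suppose an executable defines `malloc` and a C library defines both `malloc` and `free`. Then `resolve("malloc", ...)` finds the executable, and a call to `malloc` from inside the C library therefore reaches the executable's definition.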
There are thus different requirements pulling in different directions for any
specific ELF implementation. The right implementation choices will depend on
the characteristics of the processor. That said, most, but not all, processors
make fairly similar decisions. I will describe the common case here. An example
of a processor which uses the common case is the i386; an example of a
processor which makes some different decisions is the PowerPC.
In the common case, code may be compiled in two different modes. By default,
code is position dependent. Putting position dependent code into a shared
library will cause the program linker to generate a lot of relocation
information, and cause the dynamic linker to do a lot of processing at
runtime. Code may also be compiled in position independent mode, typically
with the `-fpic` option. Position independent code is slightly slower when it
calls a non-static function or refers to a global or static variable. However,
it requires much less relocation information, and thus the dynamic linker will
start the program faster.
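
A deliberately crude model shows why position independent code needs less relocation information. Position dependent code embeds the absolute address of a global at every instruction that uses it, so each use needs its own dynamic relocation. Position independent code instead loads the address from a per-symbol table slot (on the i386, the Global Offset Table). As a result, only one relocation per distinct symbol is needed, at the cost of an extra load on each access. The counting functions and the symbol names in the usage below are hypothetical, and the model ignores every other kind of relocation.

```c
#include <stddef.h>
#include <string.h>

/* `refs` lists the global symbol used by each referencing instruction. */

/* Position dependent: one dynamic relocation per use site. */
static size_t relocs_position_dependent(const char *const *refs, size_t n) {
    (void)refs;
    return n;
}

/* Position independent: one relocation per distinct symbol's table slot. */
static size_t relocs_pic(const char *const *refs, size_t n) {
    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < i && strcmp(refs[j], refs[i]) != 0)
            j++;
        if (j == i)  /* first occurrence of this symbol */
            distinct++;
    }
    return distinct;
}
```

With five references to two globals, say `counter` three times and `table` twice, the position dependent model needs five relocations while the position independent model needs two.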
Position independent code will call non-static functions via the *Procedure
Linkage Table* or *PLT*. This PLT does not exist in .o files. In a .o file, use
of the PLT is indicated by a special relocation. When the program linker
processes such a relocation, it will create an entry in the PLT. It will
adjust the instruction such that it becomes a PC-relative call to the PLT