add stuff

This commit is contained in:
Triss 2021-01-12 21:17:52 +01:00
parent f54c03cf01
commit bd3524e516
32 changed files with 2958 additions and 1 deletions

# airs-notes
Collection of ELF and GOLD linker notes from AIRS' blog, for easier searching.

## Source

https://www.airs.com/blog/index.php?s=linkers+part

Authored and copyrighted by Ian Lance Taylor, collected here for easy lookup.
## Index

- [Linkers part 1: introduction](/linkers-1.md)
- [Linkers part 2: technical introduction](/linkers-2.md)
- [Linkers part 3: address spaces, object file formats](/linkers-3.md)
- [Linkers part 4: shared libraries](/linkers-4.md)
- [Linkers part 5: shared libraries redux, ELF symbols](/linkers-5.md)
- [Linkers part 6: relocations, position-dependent libraries](/linkers-6.md)
- [Linkers part 7: thread-local storage](/linkers-7.md)
- [Linkers part 8: ELF segments and sections](/linkers-8.md)
- [Linkers part 9: symbol versions, relaxation](/linkers-9.md)
- [Linkers part 10: parallel linking](/linkers-10.md)
- [Linkers part 11: archives](/linkers-11.md)
- [Linkers part 12: symbol resolution](/linkers-12.md)
- [Linkers part 13: symbol versions redux](/linkers-13.md)
- [Linkers part 14: link-time optimization, initialization code](/linkers-14.md)
- [Linkers part 15: COMDAT sections](/linkers-15.md)
- [Linkers part 16: C++ template instantiation, exception frames](/linkers-16.md)
- [Linkers part 17: warning symbols](/linkers-17.md)
- [Linkers part 18: incremental linking](/linkers-18.md)
- [Linkers part 19: `__start` and `__stop` symbols, byte swapping](/linkers-19.md)
- [Linkers part 20: ending note](/linkers-20.md)
Other articles are included as well:

- [GCC exception frames](/gcc-exception-frames.md)
- [Linker combreloc](/linker-combreloc.md)
- [Linker relro](/linker-relro.md)
- [Combining versions](/combining-versions.md)
- [Version scripts](/version-scripts.md)
- [Protected symbols](/protected-symbols.md)
- [`.eh_frame`](/eh_frame.md)
- [`.eh_frame_hdr`](/eh_frame_hdr.md)
- [`.gcc_except_table`](/gcc_except_table.md)
- [Executable stack](/executable-stack.md)
- [Piece of PIE](/piece-of-pie.md)

# Combining versions
Sun introduced a symbol versioning scheme for the linker. Their
implementation is relatively simple: symbol versions are defined in a version
script provided when a shared library is created. The dynamic linker can
verify that all required versions are present. This is useful for ensuring that
an application can run with a specific version of the library.
In the Sun versioning scheme, when a symbol is changed to have an incompatible
interface, the library file name must change. This then produces a new
`DT_SONAME` entry, which leads to new `DT_NEEDED` entries, and thus manages
incompatibility at that level.
Ulrich Drepper and Eric Youngdale introduced a much more sophisticated symbol
versioning scheme, which is used by glibc, the GNU linker, and gold. The
key differences are that versions may be specified in object files and that
shared libraries may contain multiple independent versions of the same symbol.
Versions are specified in object files by naming the symbol `NAME@VERSION` or
`NAME@@VERSION`. In the former case the symbol is a hidden version, available
only by specific request. In the latter case the symbol is a default version,
and references to `NAME` will be linked to `NAME@@VERSION`. Versions may also
be specified in version scripts.
This facility means that in principle it is never necessary to change the
library file name. The versioning scheme lets the dynamic linker direct each
symbol reference to the appropriate version. This in turn means that in a
complicated program with many shared libraries compiled against different
versions of the base library, only one instance of the base library needs to be
loaded.
However, this additional complexity leads to additional ambiguity. There are
now two possible sources of a symbol version: the name in the object file and
an entry in the version script. There is the possibility that two instances of
the same name will disagree on whether the name should be globally visible or
not; in fact, this is normal, as undefined references will always use
`NAME@VERSION`, not `NAME@@VERSION`. Symbol overriding can be confusing: if the
main executable defines `NAME` without a version, which versions should it
override in the shared library? Which version should be used in the program?
Symbol visibility adds an additional wrinkle to this.
The most important issue for the linker arises when it sees both `NAME` and
`NAME@VERSION`, and then sees `NAME@@VERSION`. At that time the linker has seen
two separate symbols and has to decide whether to merge them. The rules that
gold currently follows are these:
* If `NAME` is hidden, and `NAME@@VERSION` is in a shared object, they are two
independent symbols, and we do not change `NAME` or its version.
* If `NAME` already has a version, because we earlier saw `NAME@@VERSION2`,
then we produce two separate symbols, and leave `NAME@@VERSION2` as the
default symbol.
* Otherwise, we change the version of `NAME` to `VERSION`, and do normal symbol
resolution.
I recently fixed a bug in this code in gold, which was breaking symbol
overriding in a specific case. I wouldn't be surprised if there are more bugs.
As far as I know nobody has worked through all the symbol combining issues and
defined what should happen.
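To make this concrete, here is a minimal sketch of a GNU ld version script (all names here are hypothetical) of the kind passed with `--version-script` when linking the shared library. Combined with `.symver foo_v1, foo@VERS_1.0` and `.symver foo_v2, foo@@VERS_2.0` directives in the sources, the resulting library carries both a hidden old version and a default new version of `foo`:

```
VERS_1.0 {
  global:
    foo;
  local:
    *;
};

VERS_2.0 {
  global:
    foo;
} VERS_1.0;
```

The trailing `VERS_1.0` on the second node records that `VERS_2.0` depends on the earlier version.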

# .eh_frame
When gcc generates code that handles exceptions, it produces tables that
describe how to unwind the stack. These tables are found in the `.eh_frame`
section. The format of the `.eh_frame` section is very similar to the format of
a DWARF `.debug_frame` section. Unfortunately, it is not precisely identical. I
don't know of any documentation which describes this format. The following
should be read in conjunction with the relevant section of the DWARF standard,
available from http://dwarfstd.org.
The `.eh_frame` section is a sequence of records. Each record is either a CIE
(Common Information Entry) or an FDE (Frame Description Entry). In general
there is one CIE per object file, and each CIE is associated with a list of
FDEs. Each FDE is typically associated with a single function. The CIE and the
FDE together describe how to unwind to the caller if the current instruction
pointer is in the range covered by the FDE.
There should be exactly one FDE covering each instruction which may be being
executed when an exception occurs. By default an exception can only occur
during a function call or a throw. When using the `-fnon-call-exceptions` gcc
option, an exception can also occur on most memory references and floating
point operations. When using `-fasynchronous-unwind-tables`, the FDE will cover
every instruction, to permit unwinding from a signal handler.
The general format of a CIE or FDE starts as follows:
* Length of record. Read 4 bytes. If they are not `0xffffffff`, they are the
length of the CIE or FDE record. Otherwise the next 8 bytes hold the length,
and this is the 64-bit DWARF format. This is like `.debug_frame`.
* A 4 byte ID. For a CIE this is 0. For an FDE it is the byte offset from this
field to the start of the CIE with which this FDE is associated. The byte
offset goes to the length record of the CIE. A positive value goes backward;
that is, you have to subtract the value of the ID field from the current byte
position to get the CIE position. This differs from `.debug_frame` in that
the offset is relative rather than being an offset into the `.debug_frame`
section.
A CIE record continues as follows:
* 1 byte CIE version. As of this writing this should be 1 or 3.
* NUL terminated augmentation string. This is a sequence of characters. Very
old versions of gcc used the string “eh” here, but I won't document that.
This is described further below.
* Code alignment factor, an unsigned LEB128 (LEB128 is a DWARF encoding for
numbers which I won't describe here). This should always be 1 for `.eh_frame`.
* Data alignment factor, a signed LEB128. This is a constant factored out of
offset instructions, as in `.debug_frame`.
* The return address register. In CIE version 1 this is a single byte; in CIE
version 3 this is an unsigned LEB128. This indicates which column in the
frame table represents the return address.
The next fields of the CIE depend on the augmentation string.
* If the augmentation string starts with z, we now find an unsigned LEB128
which is the length of the augmentation data, rounded up so that the CIE ends
on an address boundary. This is used to skip to the end of the augmentation
data if an unrecognized augmentation character is seen.
* If the next character in the augmentation string is L, the next byte in the
CIE is the LSDA (Language Specific Data Area) encoding. This is a
`DW_EH_PE_xxx` value (described later). The default is `DW_EH_PE_absptr`.
* If the next character in the augmentation string is R, the next byte in the
CIE is the FDE encoding. This is a `DW_EH_PE_xxx` value. The default is
`DW_EH_PE_absptr`.
* The character S in the augmentation string means that this CIE represents a
stack frame for the invocation of a signal handler. When unwinding the stack,
signal stack frames are handled slightly differently: the instruction pointer
is assumed to be before the next instruction to execute rather than after it.
* If the next character in the augmentation string is P, the next byte in the
CIE is the personality encoding, a `DW_EH_PE_xxx` value. This is followed by
a pointer to the personality function, encoded using the personality
encoding. Ill describe the personality function some other day.
The remaining bytes are an array of `DW_CFA_xxx` opcodes which define the
initial values for the frame table. This is then followed by `DW_CFA_nop`
padding bytes as required to match the total length of the CIE.
An FDE starts with the length and ID described above, and then continues as
follows.
* The starting address to which this FDE applies. This is encoded using the FDE
encoding specified by the associated CIE.
* The number of bytes after the start address to which this FDE applies. This
is encoded using the FDE encoding.
* If the CIE augmentation string starts with z, the FDE next has an unsigned
LEB128 which is the total size of the FDE augmentation data. This may be used
to skip data associated with unrecognized augmentation characters.
* If the CIE does not specify `DW_EH_PE_omit` as the LSDA encoding, the FDE
next has a pointer to the LSDA, encoded as specified by the CIE.
The remaining bytes in the FDE are an array of `DW_CFA_xxx` opcodes which set
values in the frame table for unwinding to the caller.
The `DW_EH_PE_xxx` encodings describe how to encode values in a CIE or FDE. The
basic encoding is as follows:
* `DW_EH_PE_absptr = 0x00`: An absolute pointer. The size is determined by
whether this is a 32-bit or 64-bit address space, and will be 32 or 64 bits.
* `DW_EH_PE_omit = 0xff`: The value is omitted.
* `DW_EH_PE_uleb128 = 0x01`: The value is an unsigned LEB128.
* `DW_EH_PE_udata2 = 0x02`, `DW_EH_PE_udata4 = 0x03`, `DW_EH_PE_udata8 = 0x04`:
The value is stored as unsigned data with the specified number of bytes.
* `DW_EH_PE_signed = 0x08`: A signed number. The size is determined by whether
this is a 32-bit or 64-bit address space. I don't think this ever appears in
a CIE or FDE in practice.
* `DW_EH_PE_sleb128 = 0x09`: A signed LEB128. Not used in practice.
* `DW_EH_PE_sdata2 = 0x0a`, `DW_EH_PE_sdata4 = 0x0b`, `DW_EH_PE_sdata8 = 0x0c`:
The value is stored as signed data with the specified number of bytes. Not
used in practice.
In addition to the above basic encodings, there are modifiers.
* `DW_EH_PE_pcrel = 0x10`: Value is PC relative.
* `DW_EH_PE_textrel = 0x20`: Value is text relative.
* `DW_EH_PE_datarel = 0x30`: Value is data relative.
* `DW_EH_PE_funcrel = 0x40`: Value is relative to start of function.
* `DW_EH_PE_aligned = 0x50`: Value is aligned: padding bytes are inserted as
required to make value be naturally aligned.
* `DW_EH_PE_indirect = 0x80`: This is actually the address of the real value.
If you follow all that, and also read up on `.debug_frame`, then you have
enough information to unwind the stack at runtime, e.g. to implement glibc's
`backtrace` function. Later I'll describe the LSDA and the personality
function, which work together to implement exception catching on top of stack
unwinding.

# .eh_frame_hdr
If you followed my last post, you will see that in order to unwind the stack
you have to find the FDE associated with a given program counter value. There
are two steps to this problem. The first one is finding the CIEs and FDEs at
all. The second one is, given the set of FDEs, finding the one you need.
The old way this worked was that gcc would create a global constructor which
called the function `__register_frame_info`, passing a pointer to the
`.eh_frame` data and a pointer to the object. The latter pointer would indicate
the shared library, and was used to deregister the information after a dlclose.
When looking for an FDE, the unwinder would walk through the registered frames,
and sort them. Then it would use the sorted list to find the desired FDE.
The old way still works, but these days, at least on GNU/Linux, the sorting is
done at link time, which is better than doing it at runtime. Both gold and the
GNU linker support an option `--eh-frame-hdr` which tells them to construct a
header for all the `.eh_frame` sections. This header is placed in a section
named `.eh_frame_hdr` and also in a `PT_GNU_EH_FRAME` segment. At runtime the
unwinder
can find all the `PT_GNU_EH_FRAME` segments by calling `dl_iterate_phdr`.
The format of the `.eh_frame_hdr` section is as follows:
* A 1 byte version number, currently 1.
* A 1 byte encoding of the pointer to the exception frames. This is a
`DW_EH_PE_xxx` value. It is normally `DW_EH_PE_pcrel | DW_EH_PE_sdata4`,
meaning a 4 byte relative offset.
* A 1 byte encoding of the count of the number of FDEs in the lookup table.
This is a `DW_EH_PE_xxx` value. It is normally `DW_EH_PE_udata4`, meaning a 4
byte unsigned count.
* A 1 byte encoding of the entries in the lookup table. This is a
`DW_EH_PE_xxx` value. It is normally `DW_EH_PE_datarel | DW_EH_PE_sdata4`,
meaning a 4 byte offset from the start of the `.eh_frame_hdr` section. That
is the only encoding that gcc's current unwind library supports.
* A pointer to the contents of the `.eh_frame` section, encoded as indicated by
the second byte in the header. This pointer is only used if the format of the
lookup table is not supported or is for some reason omitted.
* The number of FDE pointers in the table, encoded as indicated by the third
byte in the header. If there are no FDEs, the encoding can be `DW_EH_PE_omit`
and this number will not be present.
* The lookup table itself, starting at a 4-byte aligned address in memory.
Assuming the fourth byte in the header is `DW_EH_PE_datarel | DW_EH_PE_sdata4`,
each entry in the table is 8 bytes long. The first four bytes are an offset
to the initial PC value for the FDE. The last four bytes are an offset to the
FDE data itself. The table is sorted by starting PC.
Since FDEs do not overlap, this table is sufficient for the stack unwinder to
quickly find the relevant FDE if there is one.

# Executable stack
The gcc compiler implements an extension to C: nested functions. A trivial example:
```c
int f() {
  int i = 2;
  int g(int j) { return i + j; }
  return g(3);
}
```
The function `f` will return 5. Note in particular that the nested function `g`
refers to the variable `i` defined in the enclosing function.
You can mostly treat nested functions as ordinary functions. In particular, you
can take the address of a nested function, and you can pass the resulting
function pointer to another function, that function can make a call through the
function pointer to the nested function, and the nested function will correctly
refer to variables in its caller's stack frame. I'm not going to go into
the details of how this is implemented here. What I will say is that gcc currently
implements this by writing instructions to the stack and using a pointer to
those instructions. This requires that the stack be executable.
This approach was implemented many years ago, before computers were routinely
attacked. In the hostile Internet environment of today, an area of memory that
is both writable and executable is dangerous, because it gives an attacker
space to create brand new instructions to execute. Since the stack must be
writable, this means that we want to make the stack non-executable if possible.
Since very few programs use nested functions, this is normally possible. But we
don't want to break those few programs either.
This is how the GNU tools do it on ELF systems such as GNU/Linux. The compiler
adds a new section to all code that it compiles. The section is named
`.note.GNU-stack`. It is empty and not allocated, which means that it takes up
no space at runtime. If the code being compiled does not require an executable
stack—the normal case—the compiler doesn't set any flags for the section. If
the code does require an executable stack, the compiler sets the
`SHF_EXECINSTR` flag.
When the linker links a program, it checks each input object for a
`.note.GNU-stack` section. If there is no such section, the linker assumes that
the object must be old, and therefore may require an executable stack. If there
is such a section, the linker checks the section flags to see whether the code
requires an executable stack. The linker discards the `.note.GNU-stack`
sections, and creates a `PT_GNU_STACK` segment in the output executable. The
`PT_GNU_STACK` segment is empty and is not part of any `PT_LOAD` segment. The
segment flags `PF_R` and `PF_W` are always set. If the linker has determined
that the program requires an executable stack, it also sets the `PF_X` flag.
When the Linux kernel starts a program, it looks for a `PT_GNU_STACK` segment.
If it does not find one, it sets the stack to be executable (if appropriate for
the architecture). If it does find a `PT_GNU_STACK` segment, it marks the stack
as executable if the segment flags call for it. (It's possible to override this
and force the kernel to never use an executable stack.) Similarly, the dynamic
linker looks for a `PT_GNU_STACK` in any executable or shared library that it
loads, and changes the stack to be executable if any of them require it.
When this all works smoothly, most programs wind up with a non-executable
stack, which is what we want. The most common reason that this fails these days
is that part of the program is written in assembler, and the assembler code
does not create a `.note.GNU-stack` section. If you write assembler code for
GNU/Linux, you must always be careful to add the appropriate line to your file.
For most targets, the line you want is:
```asm
.section .note.GNU-stack,"",@progbits
```
There are some linker options to control this. The `-z execstack` option tells
the linker to mark the program as requiring an executable stack, regardless of
the input files. The `-z noexecstack` option marks it as not requiring an
executable stack. The gold linker has a `--warn-execstack` option which will
cause the linker to warn about any object which is missing a `.note.GNU-stack`
section or which has an executable `.note.GNU-stack` section.
The `execstack` program may also be used to query whether a program requires an
executable stack, and to change its setting.
These days we could probably change the default: we could probably say that if
an object file does not have a `.note.GNU-stack` section, then it does not
require an executable stack. That would avoid the problem of files written in
assembler which do not create the section. It's possible that this would cause
some programs to incorrectly get a non-executable stack, but I think that would
be quite unlikely in practice. An advantage of changing the default would be
that the compiler would not have to create an empty `.note.GNU-stack` section
in all object files.
By the way, there is one thing you can do with a normal function that you can
not do with a nested function: if the nested function refers to any variables
in the enclosing function, you can not return a pointer to the nested function
to the caller. If you do, the variable will disappear, so the variable
reference in the nested function will be a dangling reference. It's worth noting
here that the Go language supports nested function literals which may refer to
variables in the enclosing function, and when using Go this works correctly.
The compiler creates variables on the heap if necessary, so they do not
disappear until the garbage collector determines that nothing refers to them
any more.
Finally, I'll mention that there are some plans to implement a different scheme
for nested functions in C, one which does not require any memory to be both
writable and executable, but these plans have not yet been implemented. I'll
leave the implementation as an exercise for the reader.

# GCC Exception Frames
When an exception is thrown in C++ and caught by one of the calling functions,
the supporting libraries need to unwind the stack. With gcc this is done using
a variant of DWARF debugging information. The unwind information is loaded at
runtime, but is not read unless an exception is thrown. That means that the
unwind library needs to have some way of finding the appropriate unwind
information at runtime.
On some systems, this is done by registering the exception frame information
when the program starts. The registration is done with a variant of the
handling of C++ constructors. This becomes interesting when one shared library
can throw an exception which is caught by another shared library. It is
possible for such a case to arise when the executable itself never throws
exceptions and therefore has no frames to register. Obviously the unwinder
needs to be able to find the unwind information for both shared libraries,
which means that both shared libraries need to use the same registration
functions. With gcc this is normally ensured by putting the unwind code in a
shared library, `libgcc_s.so`. Each shared library, and sometimes the
executable, will use `libgcc_s.so`. That ensures a single copy of the
registration and unwind functions, so the library will be able to reliably
unwind across shared libraries. With gcc the use of `libgcc_s.so` can be
controlled with the `-shared-libgcc` and `-static-libgcc` options. Normally the
right thing will happen by default.
That approach has a cost: there is an extra shared library, and there is a
small cost of registering the unwind information at program startup or library
load time (and unregistering it if a shared library is unloaded via dlclose).
There is now a better way, which requires linker support.
Both gold and the GNU linker support the command line option `--eh-frame-hdr`.
With this option, when the linker sees the `.eh_frame` sections used to hold
the unwind information, it automatically builds a header. This header is a
sorted array mapping program counter addresses to unwind information. The
header is recorded as a program segment of type `PT_GNU_EH_FRAME`. (This is a
little bit ugly since the `.eh_frame` sections are recognized only by name;
ideally they should have a special section type.)
At runtime, the unwind library can use the `dl_iterate_phdr` function to find
the program segments of the executable and all currently loaded shared
libraries. It can use that to find the `PT_GNU_EH_FRAME` segments, and use the
sorted array in those segments to quickly find the unwind information.
This approach means that no registration functions are required. It also means
that it is not necessary to have a single shared library, since
`dl_iterate_phdr` is available no matter which shared library throws the
exception.
This all only works if you have a linker which supports generating
`PT_GNU_EH_FRAME` sections, if all the shared libraries and the executable are
linked by such a linker, and if you have a working `dl_iterate_phdr` function
in your C library or dynamic linker. I think that pretty much restricts this
approach to GNU/Linux and possibly other free operating systems. For those
scenarios, I hope that gcc will soon be able to stop using `libgcc_s.so` by
default.

# .gcc_except_table
Throwing an exception in C++ requires more than unwinding the stack. As the
program unwinds, local variable destructors must be executed. Catch clauses
must be examined to see if they should catch the exception. Exception
specifications must be checked to see if the exception should be redirected to
the unexpected handler. Similar issues arise in Go, Java, and even C when using
gcc's `cleanup` function attribute.
As I described earlier, each CIE in the unwind data may contain a pointer to a
personality function, and each FDE may contain a pointer to the LSDA, the
Language Specific Data Area. Each language has its own personality function.
The LSDA is only used by the personality function, so it could in principle
differ for each language. However, at least for gcc, every language uses the
same format, since the LSDA is generated by the language-independent
middle-end.
The personality function takes five arguments:
1. An int version number, currently 1.
2. A bitmask of actions.
3. An exception class, a 64-bit unsigned integer which is specific to a language.
4. A pointer to information about the specific exception being thrown.
5. Unwinder state information.
The exception class permits code written in one language to work correctly when
an exception is thrown by code written in a different language. The value for
g++ is “GNUCC++\0” (or “GNUCC++\1” for a dependent exception, which is used
when rethrowing an exception). The value for Go is “GNUCGO\0\0”. The exception
specific information can only be examined if the exception class is recognized.
Unwinding the stack for an exception is done in two phases. In the first phase,
the unwinder walks up the stack passing the action `_UA_SEARCH_PHASE` (which
has the value 1) to each personality function that it finds. The personality
function should examine the LSDA to see if there is a handler for the exception
being thrown. It should return `_URC_HANDLER_FOUND` (`6`) if there is or
`_URC_CONTINUE_UNWIND` (`8`) if there isn't. The search phase will continue
until a handler is found or until the top of the stack is reached. The unwinder
will not actually change anything while walking. If the top of the stack is
reached the unwinder will simply return, and the calling code will take the
appropriate action, which for C++ is to call `std::terminate`. Because of the
two phase unwinding approach, if `std::terminate` dumps core, a backtrace will
show the code which threw the exception.
If a handler is found, the second phase begins. The unwinder walks up the stack
passing the action `_UA_CLEANUP_PHASE` (`2`) to each personality function. The
unwinder will also set `_UA_FORCE_UNWIND` (`8`) in the actions bitmask if the
personality function may not catch the exception, because the unwinding is
happening due to some event like thread cancellation. The unwinder will walk up
the stack until it finds the handler—the stack frame for which the personality
function returned `_URC_HANDLER_FOUND`. When it calls that function, the
the unwinder will pass `_UA_HANDLER_FRAME` (`4`) in the actions bitmask. This
time, the unwinder will change things as it goes, removing stack frames.
In order to run destructors, the personality function will call `_Unwind_SetIP`
on the context parameter to set the program counter to point to the cleanup
routine, and then return `_URC_INSTALL_CONTEXT` (`7`) to tell the unwinder to
branch to the current context. The address which starts the cleanup is known as
a landing pad. The cleanup should do whatever it needs to do, and then call
`_Unwind_Resume`. The exception information needs to be passed to
`_Unwind_Resume`. The personality routine arranges to pass the exception
information to the cleanup by calling `_Unwind_SetGR` passing
`__builtin_eh_return_data_regno(0)` and the exception information passed to the
personality routine. Each target which supports this approach has to dedicate
two registers to holding exception information. This is the first one.
The personality function which finds the handler works pretty much the same
way. It may also use `_Unwind_SetGR` to set a value in
`__builtin_eh_return_data_regno(1)` to indicate which exception was found. The
exception handler may rethrow the exception via `_Unwind_RaiseException` or it
may simply continue a normal execution path.
At this point we've seen everything except how the personality function decides
whether it needs to run a cleanup or catch an exception. The personality
function makes this decision based on the LSDA. As mentioned above, while the
LSDA could be language dependent, in practice it is not. There is a different
personality function for each language, but they all do more or less the same
thing, omitting aspects which are not relevant for the language (e.g., there is
a personality function for C, but it only runs cleanups and does not bother to
look for exception handlers).
The LSDA is found in the section `.gcc_except_table` (the personality function
is just a function and lives in the `.text` section as usual). The personality
function gets a pointer to it by calling `_Unwind_GetLanguageSpecificData`. The
LSDA starts with the following fields:
1. A 1 byte encoding of the following field (a `DW_EH_PE_xxx` value).
2. If the encoding is not `DW_EH_PE_omit`, the landing pad base. This is the
base from which landing pad offsets are computed. If this is omitted, the
base comes from calling `_Unwind_GetRegionStart`, which returns the beginning
of the code described by the current FDE. In practice this field is normally
omitted.
3. A 1 byte encoding of the entries in the type table (a `DW_EH_PE_xxx` value).
4. If the encoding is not `DW_EH_PE_omit`, the types table pointer. This is an
unsigned LEB128 value, and is the byte offset from this field to the start
of the types table used for exception matching.
5. A 1 byte encoding of the fields in the call-site table (a `DW_EH_PE_xxx`
value).
6. An unsigned LEB128 value holding the length in bytes of the call-site table.
This header is immediately followed by the call-site table. Each entry in the
call-site table has four fields. The number of bytes in the header gives the
total length. Each entry in the call-site table describes a particular sequence
of instructions within the function that the FDE describes.
1. The start of the instructions for the current call site, a byte offset from
the landing pad base. This is encoded using the encoding from the header.
2. The length of the instructions for the current call site, in bytes. This is
encoded using the encoding from the header.
3. A pointer to the landing pad for this sequence of instructions, or 0 if
there isn't one. This is a byte offset from the landing pad base. This is
encoded using the encoding from the header.
4. The action to take, an unsigned LEB128. This is 1 plus a byte offset into
the action table. The value zero means that there is no action.
The call-site table is sorted by the start address field. If the personality
function finds that there is no entry for the current PC in the call-site
table, then there is no exception information. This should not happen in normal
operation, and in C++ will lead to a call to `std::terminate`. If there is an
entry in the call-site table, but the landing pad is zero, then there is
nothing to do: there are no destructors to run or exceptions to catch. This is
a normal case, and the unwinder will simply continue. If the action record is
zero, then there are destructors to run but no exceptions to catch. The
personality function will arrange to run the destructors as described above,
and unwinding will continue.
Otherwise, we have an offset into the action table. Each entry in the action
table is a pair of signed LEB128 values. The first number is a type filter. The
second number is a byte offset to the next entry in the action table. A byte
offset of 0 ends the current set of actions.
A type filter of zero indicates a cleanup, which is the same as an action
record of zero in the call-site table. This means that there is a cleanup to be
called even if none of the types match.
A positive type filter is an index into the types table, counting backward
from its base: the value 1 means the entry immediately preceding the types
table base, 2 means the entry before that, and so on. The size of entries in
the types table comes from the
encoding in the header, as does the base of the types table. Each entry in the
types table is a pointer to a type information structure. If this type
information structure matches the type of the exception, then we have found a
handler for this exception. The type filter value is a switch value that will
be passed to the handler in exception register 1. The actual comparison of the
type information, and determining the type information from the exception
pointer, really is language dependent. In C++ this is a pointer to a
`std::type_info` structure. A `NULL` pointer in the types table is a catch-all
handler.
A negative type filter is a byte offset into the types table of a `NULL`
terminated list of pointers to type information structures. If the type of the
current exception does not match any of the entries in the list, then there is
an exception specification error. This is treated as an exception handler with
a negative switch value.
I think that covers everything about how gcc unwinds the stack and throws
exceptions.

---
**`linker-combreloc.md`**
# Linker combreloc
The GNU linker has a `-z combreloc` option, which is enabled by default (it can
be turned off via `-z nocombreloc`). I just implemented this in gold as well.
This option directs the linker to sort the dynamic relocations. The sorting is
done in order to optimize the dynamic linker.
The dynamic linker in glibc uses a one element cache when processing relocs: if
a relocation refers to the same symbol as the previous relocation, then the
dynamic linker reuses the value rather than looking up the symbol again. Thus
the dynamic linker gets the best results if the dynamic relocations are sorted
so that all dynamic relocations for a given dynamic symbol are adjacent.
Other than that, the linker sorts together all relative relocations, which
don't have symbols. Two relative relocations, or two relocations against the
same symbol, are sorted by the address in the output file. This tends to
optimize paging and caching when there are two references from the same page.
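A toy model of the sort order, and of why it helps the dynamic linker's one-element cache, might look like this; the tuple format and function names are invented for illustration:

```python
# Each dynamic relocation is modeled as (is_relative, symbol, output_address).
def combreloc_sort(relocs):
    """Sort as -z combreloc does: relative relocations first, then
    grouped by symbol name, with ties broken by output address."""
    return sorted(relocs, key=lambda r: (not r[0], r[1] or "", r[2]))

def cache_misses(relocs):
    """Count symbol lookups with a one-element cache, as in the glibc
    dynamic linker: a lookup is saved whenever consecutive relocations
    refer to the same symbol."""
    misses, last = 0, None
    for is_relative, sym, _ in relocs:
        if is_relative:
            continue  # relative relocations need no symbol lookup at all
        if sym != last:
            misses += 1
            last = sym
    return misses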
This may seem like a micro-optimization, but it can have a real effect on
program startup time, especially if the program has lots of shared libraries.
I've seen a case where a program starts up 16% faster because the relocations
were sorted.

---
**`linker-relro.md`**
# Linker relro
gcc, the GNU linker, and the glibc dynamic linker cooperate to implement an
idea called read-only relocations, or relro. This permits the linker to
designate a part of an executable or (more commonly) a shared library as being
read-only after dynamic relocations have been applied.
This may be used for read-only global variables which are initialized to
something which requires a relocation, such as the address of a function or a
different global variable. Because the global variable requires a runtime
initialization in the form of a dynamic relocation, it can not be placed in a
read-only segment. However, because it is declared to be constant, and
therefore may not be changed by the program, the dynamic linker can mark it as
read-only after the dynamic relocation has been applied.
For some targets this technique may also be used for the PLT or parts of the
GOT.
Making these pages read-only helps catch some cases of memory corruption, and
making the PLT in particular read-only helps prevent some types of buffer
overflow exploits.
The first step is in gcc. When gcc sees a variable which is constant but
requires a dynamic relocation, it puts it into a section named `.data.rel.ro`
(this functionality unfortunately relies on magic section names). A variable
which requires a dynamic relocation against a local symbol is put into a
`.data.rel.ro.local` section; this helps group such variables together, so that
the dynamic linker may apply the relocations, which will always be `RELATIVE`
relocations, more efficiently, especially when using `combreloc`.
The linker groups `.data.rel.ro` and `.data.rel.ro.local` sections as usual.
The new step is that the linker then emits a `PT_GNU_RELRO` program segment
which covers these sections. If the PLT and/or GOT can be read-only after
dynamic relocations, they are put next to the `.data.rel.ro` sections and also
become part of the new segment. This segment will be enclosed within a
`PT_LOAD` segment. The `p_vaddr` field of the `PT_GNU_RELRO` segment gives the
virtual address of the start of the region which is read-only after dynamic
relocations, and the `p_memsz` field gives its length.
When the dynamic linker sees a `PT_GNU_RELRO` segment, it uses mprotect to mark
the pages as read-only after the dynamic relocations have been applied. Of
course this only works if the segment does in fact cover an entire page. The
linker will try to force this to happen.
Note that the current dynamic linker code will only work correctly if the
`PT_GNU_RELRO` segment starts on a page boundary. This is because the dynamic
linker rounds the `p_vaddr` field down to the previous page boundary. If there is
anything on the page which should not be read-only, the program is likely to
fail at runtime. So in effect the linker must only emit a `PT_GNU_RELRO`
segment if it ensures that it starts on a page boundary.
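The rounding behaviour can be sketched as follows; the function name is mine, and it models only the start rounding described above:

```python
def relro_region(p_vaddr, p_memsz, page_size=4096):
    """Range the dynamic linker marks read-only for a PT_GNU_RELRO
    segment: p_vaddr is rounded down to the previous page boundary,
    so anything below p_vaddr on that page becomes read-only too."""
    start = p_vaddr & ~(page_size - 1)
    return start, p_vaddr + p_memsz
```

For example, `relro_region(0x1100, 0x2000)` gives `(0x1000, 0x3100)`: the 0x100 bytes preceding `p_vaddr` on the first page are made read-only as well, which is exactly why the linker must ensure the segment starts on a page boundary.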
I see this as a relatively minor security benefit. It is not an optimization as
far as I can see. I am documenting it here as part of my general documentation
of obscure linker features. The current description of this feature in the GNU
linker manual is rather obscure.

---
**`linkers-1.md`**
# Linkers part 1
I've been working on and off on a new linker. To my surprise, I've discovered
in talking about this that some people, even some computer programmers, are
unfamiliar with the details of the linking process. I've decided to write some
notes about linkers, with the goal of producing an essay similar to my existing
one about the GNU configure and build system.
As I only have the time to write one thing a day, I'm going to do this on my
blog over time, and gather the final essay together later. I believe that I may
be up to five readers, and I hope y'all will accept this digression into stuff
that matters. I will return to random philosophizing and minding other people's
business soon enough.
## A Personal Introduction
Who am I to write about linkers?
I wrote my first linker back in 1988, for the AMOS operating system which ran
on Alpha Micro systems. (If you don't understand the following description,
don't worry; all will be explained below). I used a single global database to
register all symbols. Object files were checked into the database after they
had been compiled. The link process mainly required identifying the object file
holding the main function. Other object files were pulled in by reference. I
reverse engineered the object file format, which was undocumented but quite
simple. The goal of all this was speed, and indeed this linker was much faster
than the system one, mainly because of the speed of the database.
I wrote my second linker in 1993 and 1994. This linker was designed and
prototyped by Steve Chamberlain while we both worked at Cygnus Support (later
Cygnus Solutions, later part of Red Hat). This was a complete reimplementation
of the BFD based linker which Steve had written a couple of years before.
The primary target was a.out and COFF. Again the goal was speed, especially
compared to the original BFD based linker. On SunOS 4 this linker was almost as
fast as running the cat program on the input .o files.
The linker I am now working on, called gold, will be my third. It is
exclusively an ELF linker. Once again, the goal is speed, in this case being
faster than my second linker. That linker has been significantly slowed down
over the years by adding support for ELF and for shared libraries. This support
was patched in rather than being designed in. Future plans for the new linker
include support for incremental linking, which is another way of increasing
speed.
There is an obvious pattern here: everybody wants linkers to be faster. This is
because the job which a linker does is uninteresting. The linker is a speed
bump for a developer, a process which takes a relatively long time but adds no
real value. So why do we have linkers at all? That brings us to our next topic.
## A Technical Introduction
What does a linker do?
It's simple: a linker converts object files into executables and shared
libraries. Let's look at what that means. For cases where a linker is used,
the software development process consists of writing program code in some
language: e.g., C or C++ or Fortran (but typically not Java, as Java normally
works differently, using a loader rather than a linker). A compiler translates
this program code, which is human readable text, into another form of
human readable text known as assembly code. Assembly code is a readable form of
the machine language which the computer can execute directly. An assembler is
used to turn this assembly code into an object file. For completeness, I'll
note that some compilers include an assembler internally, and produce an object
file directly. Either way, this is where things get interesting.
In the old days, when dinosaurs roamed the data centers, many programs were
complete in themselves. In those days there was generally no compiler: people
wrote directly in assembly code, and the assembler actually generated an
executable file which the machine could execute directly. As languages like
Fortran and Cobol started to appear, people began to think in terms of
libraries of subroutines, which meant that there had to be some way to run the
assembler at two different times, and combine the output into a single
executable file. This required the assembler to generate a different type of
output, which became known as an object file (I have no idea where this name
came from). And a new program was required to combine different object files
together into a single executable. This new program became known as the linker
(the source of this name should be obvious).
Linkers still do the same job today. In the decades that followed, one new
feature has been added: shared libraries.
More tomorrow.

---
**`linkers-10.md`**
# Linkers part 10
## Parallel Linking
It is possible to parallelize the linking process somewhat. This can help hide
I/O latency and can take better advantage of modern multi-core systems. My
intention with gold is to use these ideas to speed up the linking process.
The first area which can be parallelized is reading the symbols and relocation
entries of all the input files. The symbols must be processed in order;
otherwise, it will be difficult for the linker to resolve multiple definitions
correctly. In particular all the symbols which are used before an archive must
be fully processed before the archive is processed, or the linker won't know
which members of the archive to include in the link (I guess I haven't talked
about archives yet). However, despite these ordering requirements, it can be
beneficial to do the actual I/O in parallel.
After all the symbols and relocations have been read, the linker must complete
the layout of all the input contents. Most of this can not be done in parallel,
as setting the location of one type of contents requires knowing the size of
all the preceding types of contents. While doing the layout, the linker can
determine the final location in the output file of all the data which needs to
be written out.
After layout is complete, the process of reading the contents, applying
relocations, and writing the contents to the output file can be fully
parallelized. Each input file can be processed separately.
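This last phase can be sketched with a thread pool; the relocation step below is a trivial stand-in for the real work, and all names are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def relocate(contents, delta):
    """Stand-in for applying relocations: just offset each value."""
    return [v + delta for v in contents]

def link_contents(input_files, delta):
    """Process every input file independently. This is safe because
    layout has already fixed each file's region of the output, so the
    workers never write to overlapping locations and completion order
    does not matter."""
    output = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(relocate, data, delta)
                   for name, data in input_files.items()}
        for name, fut in futures.items():
            output[name] = fut.result()
    return output
```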
Since the final size of the output file is known after the layout phase, it is
possible to use `mmap` for the output file. When not doing relaxation, it is
then possible to read the input contents directly into place in the output
file, and to relocate them in place. This reduces the number of system calls
required, and ideally will permit the operating system to do optimal disk I/O
for the output file.
Just a short entry tonight. More tomorrow.

---
**`linkers-11.md`**
# Linkers part 11
## Archives
Archives are a traditional Unix package format. They are created by the `ar`
program, and they are normally named with a `.a` extension. Archives are passed
to a Unix linker with the `-l` option.
Although the `ar` program is capable of creating an archive from any type of
file, it is normally used to put object files into an archive. When it is used
in this way, it creates a symbol table for the archive. The symbol table lists
all the symbols defined by any object file in the archive, and for each symbol
indicates which object file defines it. Originally the symbol table was created
by the `ranlib` program, but these days it is always created by `ar` by default
(despite this, many Makefiles continue to run `ranlib` unnecessarily).
When the linker sees an archive, it looks at the archive's symbol table. For
each symbol the linker checks whether it has seen an undefined reference to
that symbol without seeing a definition. If that is the case, it pulls the
object file out of the archive and includes it in the link. In other words, the
linker pulls in all the object files which define symbols which are referenced
but not yet defined.
This operation repeats until no more symbols can be defined by the archive.
This permits object files in an archive to refer to symbols defined by other
object files in the same archive, without worrying about the order in which
they appear.
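The fixed-point loop can be modeled like this. It is a toy: real archives store a single symbol table rather than per-member symbol sets, and the names are mine:

```python
def pull_from_archive(undefined, defined, archive):
    """archive maps member name -> (symbols defined, symbols referenced).
    Repeatedly pull in members that define a currently-undefined symbol,
    until no more members can be added (a fixed point)."""
    included = []
    changed = True
    while changed:
        changed = False
        for member, (defs, refs) in archive.items():
            if member in included:
                continue
            if any(s in undefined for s in defs):
                included.append(member)
                defined |= set(defs)
                undefined -= set(defs)
                # a pulled-in member may itself reference new symbols
                undefined |= {s for s in refs if s not in defined}
                changed = True
    return included
```

Note how `y.o` below is included only because `x.o`, pulled in first, references `bar`; an unreferenced member like `z.o` stays out of the link.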
Note that the linker considers an archive in its position on the command line
relative to other object files and archives. If an object file appears after an
archive on the command line, that archive will not be used to define symbols
referenced by the object file.
In general the linker will not include archives if they provide a definition
for a common symbol. You will recall that if the linker sees a common symbol
followed by a defined symbol with the same name, it will treat the common
symbol as an undefined reference. That will only happen if there is some other
reason to include the defined symbol in the link; the defined symbol will not
be pulled in from the archive.
There was an interesting twist for common symbols in archives on old
`a.out`-based SunOS systems. If the linker saw a common symbol, and then saw a
common symbol in an archive, it would not include the object file from the
archive, but it would change the size of the common symbol to the size in the
archive if that were larger than the current size. The C library relied on this
behaviour when implementing the `stdin` variable.
My next posting should be on Monday.

---
**`linkers-12.md`**
# Linkers part 12
I apologize for the pause in posts. We moved over the weekend. Last Friday AT&T
told me that the new DSL was working at our new house. However, it did not
actually start working outside the house until Wednesday. Then a problem with
the internal wiring meant that it was not working inside the house until today.
I am now finally back online at home.
## Symbol Resolution
I find that symbol resolution is one of the trickier aspects of a linker.
Symbol resolution is what the linker does the second and subsequent times that
it sees a particular symbol. I've already touched on the topic in a few
previous entries, but let's look at it in a bit more depth.
Some symbols are local to a specific object file. We can ignore these for the
purposes of symbol resolution, as by definition the linker will never see them
more than once. In ELF these are the symbols with a binding of `STB_LOCAL`.
In general, symbols are resolved by name: every symbol with the same name is
the same entity. Weve already seen a few exceptions to that general rule. A
symbol can have a version: two symbols with the same name but different
versions are different symbols. A symbol can have non-default visibility: a
symbol with hidden visibility in one shared library is not the same as a symbol
with the same name in a different shared library.
The characteristics of a symbol which matter for resolution are:
* The symbol name
* The symbol version.
* Whether the symbol is the default version or not.
* Whether the symbol is a definition or a reference or a common symbol.
* The symbol visibility.
* Whether the symbol is weak or strong (i.e., non-weak).
* Whether the symbol is defined in a regular object file being included in the
output, or in a shared library.
* Whether the symbol is thread local.
* Whether the symbol refers to a function or a variable.
The goal of symbol resolution is to determine the final value of the symbol.
After all symbols are resolved, we should know the specific object file or
shared library which defines the symbol, and we should know the symbol's type,
size, etc. It is possible that some symbols will remain undefined after all the
symbol tables have been read; in general this is only an error if some
relocation refers to that symbol.
At this point I'd like to present a simple algorithm for symbol resolution, but
I don't think I can. I'll try to hit all the high points, though. Let's assume
that we have two symbols with the same name. Let's call the symbol we saw first
A and the new symbol B. (I'm going to ignore symbol visibility in the algorithm
below; the effects of visibility should be obvious, I hope.)
1. If A has a version:
* If B has a version different from A, they are actually different symbols.
* If B has the same version as A, they are the same symbol; carry on.
* If B does not have a version, and A is the default version of the symbol,
they are the same symbol; carry on.
* Otherwise B is probably a different symbol. But note that if A and B are
both undefined references, then it is possible that A refers to the default
version of the symbol but we don't yet know that. In that case, if B does
not have a version, A and B really are the same symbol. We can't tell until
we see the actual definition.
2. If A does not have a version:
* If B does not have a version, they are the same symbol; carry on.
* If B has a version, and it is the default version, they are the same
symbol; carry on.
* Otherwise, B is probably a different symbol, as above.
3. If A is thread local and B is not, or vice-versa, then we have an error.
4. If A is an undefined reference:
* If B is an undefined reference, then we can complete the resolution, and
more or less ignore B.
* If B is a definition or a common symbol, then we can resolve A to B.
5. If A is a strong definition in an object file:
* If B is an undefined reference, then we resolve B to A.
* If B is a strong definition in an object file, then we have a multiple
definition error.
* If B is a weak definition in an object file, then A overrides B. In effect,
B is ignored.
* If B is a common symbol, then we treat B as an undefined reference.
* If B is a definition in a shared library, then A overrides B. The dynamic
linker will change all references to B in the shared library to refer to A
instead.
6. If A is a weak definition in an object file, we act just like the strong
definition case, with one exception: if B is a strong definition in an
object file. In the original SVR4 linker, this case was treated as a
multiple definition error. In the Solaris and GNU linkers, this case is
handled by letting B override A.
7. If A is a common symbol in an object file:
* If B is a common symbol, we set the size of A to be the maximum of the size
of A and the size of B, and then treat B as an undefined reference.
* If B is a definition in a shared library with function type, then A
overrides B (this oddball case is required to correctly handle some Unix
system libraries).
* Otherwise, we treat A as an undefined reference.
8. If A is a definition in a shared library, then if B is a definition in a
regular object (strong or weak), it overrides A. Otherwise we act as though
A were defined in an object file.
9. If A is a common symbol in a shared library, we have a funny case. Symbols
in shared libraries must have addresses, so they can't be common in the same
sense as symbols in an object file. But ELF does permit symbols in a shared
library to have the type `STT_COMMON` (this is a relatively recent
addition). For purposes of symbol resolution, if A is a common symbol in a
shared library, we still treat it as a definition, unless B is also a common
symbol. In the latter case, B overrides A, and the size of B is set to the
maximum of the size of A and the size of B.
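Much of the algorithm above (roughly cases 4 through 9) can be condensed into a small function. This is a deliberately simplified sketch: versions, visibility, thread-locality, common-symbol size merging, and the shared-library-function oddball are all omitted, and the kind names are invented:

```python
def resolve(a, b):
    """Resolve two same-named, unversioned symbols seen in this order.
    Kinds: 'undef' (reference), 'strong'/'weak' (definition in a regular
    object), 'common', 'shared' (definition in a shared library).
    Returns the surviving kind, or raises on a multiple definition."""
    if a == "undef":
        return b if b != "undef" else "undef"
    if a == "strong":
        if b == "strong":
            raise ValueError("multiple definition")
        return "strong"      # beats weak, common, and shared definitions
    if a == "weak":
        # GNU/Solaris behaviour: a later strong definition overrides weak
        return "strong" if b == "strong" else "weak"
    if a == "common":
        if b in ("undef", "common"):
            return "common"  # two commons merge (real linkers keep max size)
        return b             # a real definition supersedes the common symbol
    if a == "shared":
        # a definition in a regular object overrides the shared library's
        return b if b in ("strong", "weak") else "shared"
```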
I hope I got all that right.
More tomorrow, assuming the Internet connection holds up.

---
**`linkers-13.md`**
# Linkers part 13
## Symbol Versions Redux
I've talked about symbol versions from the linker's point of view. I think it's
worth discussing them a bit from the user's point of view.
As I've discussed before, symbol versions are an ELF extension designed to
solve a specific problem: making it possible to upgrade a shared library
without changing existing executables. That is, they provide backward
compatibility for shared libraries. There are a number of related problems
which symbol versions do not solve. They do not provide forward compatibility
for shared libraries: if you upgrade your executable, you may need to upgrade
your shared library also (it would be nice to have a feature to build your
executable against an older version of the shared library, but that is
difficult to implement in practice). They only work at the shared library
interface: they do not help with a change to the ABI of a system call, which is
at the kernel interface. They do not help with the problem of sharing
incompatible versions of a shared library, as may happen when a complex
application is built out of several different existing shared libraries which
have incompatible dependencies.
Despite these limitations, shared library backward compatibility is an
important issue. Using symbol versions to ensure backward compatibility
requires a careful and rigorous approach. You must start by applying a version
to every symbol. If a symbol in the shared library does not have a version,
then it is impossible to change it in a backward compatible fashion. Then you
must pay close attention to the ABI of every symbol. If the ABI of a symbol
changes for any reason, you must provide a copy which implements the old ABI.
That copy should be marked with the original version. The new symbol must be
given a new version.
The ABI of a symbol can change in a number of ways. Any change to the parameter
types or the return type of a function is an ABI change. Any change in the type
of a variable is an ABI change. If a parameter or a return type is a struct or
class, then any change in the type of any field is an ABI change; i.e., if a
field in a struct points to another struct, and that struct changes, the ABI
has changed. If a function is defined to return an instance of an enum, and a
new value is added to the enum, that is an ABI change. In other words, even
minor changes can be ABI changes. The question you need to ask is: can existing
code which has already been compiled continue to use the new symbol with no
change? If the answer is no, you have an ABI change, and you must define a new
symbol version.
You must be very careful when writing the symbol implementing the old ABI, if
you don't just copy the existing code. You must be certain that it really does
implement the old ABI.
There are some special challenges when using C++. Adding a new virtual method
to a class can be an ABI change for any function which uses that class.
Providing the backward compatible version of the class in such a situation is
very awkward: there is no natural way to specify the name and version to use for
the virtual table or the RTTI information for the old version.
Naturally, you must never delete any symbols.
Getting all the details correct, and verifying that you got them correct,
requires great attention to detail. Unfortunately, I don't know of any tools to
help people write correct version scripts, or to verify them. Still, if
implemented correctly, the results are good: existing executables will continue
to run.
## Static Linking vs. Dynamic Linking
There is, of course, another way to ensure that existing executables will
continue to run: link them statically, without using any shared libraries. That
will limit their ABI issues to the kernel interface, which is normally
significantly smaller than the library interface.
There is a performance tradeoff with static linking. A statically linked
program does not get the benefit of sharing libraries with other programs
executing at the same time. On the other hand, a statically linked program does
not have to pay the performance penalty of position independent code when
executing within the library.
Upgrading the shared library is only possible with dynamic linking. Such an
upgrade can provide bug fixes and better performance. Also, the dynamic linker
can select a version of the shared library appropriate for the specific
platform, which can also help performance.
Static linking permits more reliable testing of the program. You only need to
worry about kernel changes, not about shared library changes.
Some people argue that dynamic linking is always superior. I think there are
benefits on both sides, and which choice is best depends on the specific
circumstances.
More on Monday. If you think I should write about any specific linker related
topics which have not already been mentioned in the comments, please let me
know.

---
**`linkers-14.md`**
# Linkers part 14
## Link Time Optimization
I've already mentioned some optimizations which are peculiar to the linker:
relaxation and garbage collection of unwanted sections. There is another class
of optimizations which occur at link time, but are really related to the
compiler. The general name for these optimizations is link time optimization or
whole program optimization.
The general idea is that the compiler optimization passes are run at link time.
The advantage of running them at link time is that the compiler can then see
the entire program. This permits the compiler to perform optimizations which
can not be done when source files are compiled separately. The most obvious
such optimization is inlining functions across source files. Another is
optimizing the calling sequence for simple functions, e.g., passing more
parameters in registers, or knowing that the function will not clobber all
registers; this can only be done when the compiler can see all callers of the
function. Experience shows that these and other optimizations can bring
significant performance benefits.
Generally these optimizations are implemented by having the compiler write a
version of its intermediate representation into the object file, or into some
parallel file. The intermediate representation will be the parsed version of
the source file, and may already have had some local optimizations applied.
Sometimes the object file contains only the compiler intermediate
representation, sometimes it also contains the usual object code. In the former
case link time optimization is required, in the latter case it is optional.
I know of two typical ways to implement link time optimization. The first
approach is for the compiler to provide a pre-linker. The pre-linker examines
the object files looking for stored intermediate representation. When it finds
some, it runs the link time optimization passes. The second approach is for the
linker proper to call back into the compiler when it finds intermediate
representation. This is generally done via some sort of plugin API.
Although these optimizations happen at link time, they are not part of the
linker proper, at least not as I defined it. When the compiler reads the stored
intermediate representation, it will eventually generate an object file, one
way or another. The linker proper will then process that object file as usual.
These optimizations should be thought of as part of the compiler.
## Initialization Code
C++ permits global variables to have constructors and destructors. The global
constructors must be run before main starts, and the global destructors must be
run after exit is called. Making this work requires the compiler and the linker
to cooperate.
The a.out object file format is rarely used these days, but the GNU a.out
linker has an interesting extension. In a.out symbols have a one byte type
field. This encodes a bunch of debugging information, and also the section in
which the symbol is defined. The a.out object file format only supports three
sections: text, data, and bss. Four symbol types are defined as sets: text set,
data set, bss set, and absolute set. A symbol with a set type is permitted to
be defined multiple times. The GNU linker will not give a multiple definition
error, but will instead build a table with all the values of the symbol. The
table will start with one word holding the number of entries, and will end with
a zero word. In the output file the set symbol will be defined as the address
of the start of the table.
For each C++ global constructor, the compiler would generate a symbol named
`__CTOR_LIST__` with the text set type. The value of the symbol in the object
file would be the global constructor function. The linker would gather together
all the `__CTOR_LIST__` functions into a table. The startup code supplied by
the compiler would walk down the `__CTOR_LIST__` table and call each function.
Global destructors were handled similarly, with the name `__DTOR_LIST__`.
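The set-symbol table and the startup-code walk can be modeled like this; constructor "addresses" are just strings here, and the function names are mine:

```python
def build_set_table(values):
    """Build an a.out set-symbol table as the GNU linker did: a count
    word, the collected values, and a terminating zero word."""
    return [len(values)] + list(values) + [0]

def run_ctors(table, call):
    """Walk a __CTOR_LIST__-style table the way the compiler's startup
    code would, calling each collected constructor in turn."""
    count = table[0]
    for fn in table[1:1 + count]:
        call(fn)
```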
Anyhow, so much for a.out. In ELF, global constructors are handled in a fairly
similar way, but without using magic symbol types. I'll describe what gcc does.
An object file which defines a global constructor will include a `.ctors`
section. The compiler will arrange to link special object files at the very
start and very end of the link. The one at the start of the link will define a
symbol for the `.ctors` section; that symbol will wind up at the start of the
section. The one at the end of the link will define a symbol for the end of the
`.ctors` section. The compiler startup code will walk between the two symbols,
calling the constructors. Global destructors work similarly, in a `.dtors`
section.
ELF shared libraries work similarly. When the dynamic linker loads a shared
library, it will call the function at the `DT_INIT` tag if there is one. By
convention the ELF program linker will set this to the function named `_init`,
if there is one. Similarly the `DT_FINI` tag is called when a shared library is
unloaded, and the program linker will set this to the function named `_fini`.
As I mentioned earlier, there are also `DT_INIT_ARRAY`, `DT_PREINIT_ARRAY`, and
`DT_FINI_ARRAY` tags, which are set based on the `SHT_INIT_ARRAY`,
`SHT_PREINIT_ARRAY`, and `SHT_FINI_ARRAY` section types. This is a newer
approach in ELF, and does not require relying on special symbol names.
More tomorrow.
# Linkers part 15
## COMDAT sections
In C++ there are several constructs which do not clearly live in a single
place. Examples are inline functions defined in a header file, virtual tables,
and typeinfo objects. There must be only a single instance of each of these
constructs in the final linked program (actually we could probably get away
with multiple copies of a virtual table, but the others must be unique since it
is possible to take their address). Unfortunately, there is not necessarily a
single object file in which they should be generated. These types of constructs
are sometimes described as having vague linkage.
Linkers implement these features by using *COMDAT* sections (there may be other
approaches, but this is the only one I know of). COMDAT sections are a special type
of section. Each COMDAT section has a special string. When the linker sees
multiple COMDAT sections with the same special string, it will only keep one of
them.
For example, when the C++ compiler sees an inline function `f1` defined in a
header file, but the compiler is unable to inline the function in all uses
(perhaps because something takes the address of the function), the compiler
will emit `f1` in a COMDAT section associated with the string `f1`. After the
linker sees a COMDAT section `f1`, it will discard all subsequent `f1` COMDAT
sections.
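In pseudocode terms, the linker's side of this is keep-first-by-key. The sketch below assumes nothing about any real linker's internals; the types and names are invented for illustration:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Minimal model of COMDAT handling: each input section may carry a
// COMDAT key; the linker keeps the first section seen for each key
// and discards later ones. Types and names are invented.
struct InputSection {
  std::string name;
  std::string comdat_key;  // empty if not a COMDAT section
};

std::vector<InputSection> resolve_comdats(const std::vector<InputSection>& in) {
  std::map<std::string, bool> seen;
  std::vector<InputSection> kept;
  for (const InputSection& s : in) {
    if (!s.comdat_key.empty()) {
      if (seen[s.comdat_key]) continue;  // duplicate: discard silently
      seen[s.comdat_key] = true;
    }
    kept.push_back(s);
  }
  return kept;
}
```

The silent `continue` is exactly why the ODR violation described below goes undiagnosed.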
This obviously raises the possibility that there will be two entirely different
inline functions named `f1`, defined in different header files. This would be
an invalid C++ program, violating the One Definition Rule (often abbreviated
ODR). Unfortunately, if no source file included both header files, the
compiler would be unable to diagnose the error. And, unfortunately, the linker
would simply discard the duplicate COMDAT sections, and would not notice the
error either. This is an area where some improvements are needed (at least in
the GNU tools; I don't know whether any other tools diagnose this error
correctly).
The Microsoft PE object file format provides COMDAT sections. These sections
can be marked so that duplicate COMDAT sections which do not have identical
contents cause an error. That is not as helpful as it seems, as different
compiler options may cause valid duplicates to have different contents. The
string associated with a COMDAT section is stored in the symbol table.
Before I learned about the Microsoft PE format, I introduced a different type
of COMDAT sections into the GNU ELF linker, following a suggestion from Jason
Merrill. Any section whose name starts with “.gnu.linkonce.” is a COMDAT
section. The associated string is simply the section name itself. Thus the
inline function `f1` would be put into the section “.gnu.linkonce.f1”. This
simple implementation works well enough, but it has a flaw in that some
functions require data in multiple sections; e.g., the instructions may be in
one section and associated static data may be in another section. Since
different instances of the inline function may be compiled differently, the
linker cannot reliably and consistently discard duplicate data (I don't know
how the Microsoft linker handles this problem).
Recent versions of ELF introduce section groups. These implement an officially
sanctioned version of COMDAT in ELF, and avoid the problem of “.gnu.linkonce”
sections. I described these briefly in an earlier blog entry. A special section
of type `SHT_GROUP` contains a list of section indices in the group. The group
is retained or discarded as a whole. The string associated with the group is
found in the symbol table. Putting the string in the symbol table makes it
awkward to retrieve, but since the string is generally the name of a symbol it
means that the string only needs to be stored once in the object file; this is
a minor optimization for C++ in which symbol names may be very long.
More tomorrow.
# Linkers part 16
## C++ Template Instantiation
There is still more C++ fun at link time, though somewhat less related to the
linker proper. A C++ program can declare templates, and instantiate them with
specific types. Ideally those specific instantiations will only appear once in
a program, not once per source file which instantiates the templates. There are
a few ways to make this work.
For object file formats which support COMDAT and vague linkage, which I
described yesterday, the simplest and most reliable mechanism is for the
compiler to generate all the template instantiations required for a source file
and put them into the object file. They should be marked as COMDAT, so that the
linker discards all but one copy. This ensures that all template instantiations
will be available at link time, and that the executable will have only one
copy. This is what gcc does by default for systems which support it. The
obvious disadvantages are the time required to compile all the duplicate
template instantiations and the space they take up in the object files. This is
sometimes called the Borland model, as this is what Borland's C++ compiler did.
Another approach is to not generate any of the template instantiations at
compile time. Instead, when linking, if we need a template instantiation which
is not found, invoke the compiler to build it. This can be done either by
running the linker and looking for error messages or by using a linker plugin
to handle an undefined symbol error. The difficulties with this approach are to
find the source code to compile and to find the right options to pass to the
compiler. Typically the source code is placed into a repository file of some
sort at compile time, so that it is available at link time. The complexities of
getting the compilation steps right are why this approach is not the default.
When it works, though, it can be faster than the duplicate instantiation
approach. This is sometimes called the Cfront model.
gcc also supports explicit template instantiation, which can be used to control
exactly where templates are instantiated. This approach can work if you have
complete control over your source code base, and can instantiate all required
templates in some central place. This approach is used for gcc's C++ library,
libstdc++.
C++ defines a keyword `export` which is supposed to permit exporting template
definitions in such a way that they can be read back in by the compiler. gcc
does not support this keyword. If it worked, it could be a slightly more
reliable way of using a repository when using the Cfront model.
## Exception Frames
C++ and other languages support exceptions. When an exception is thrown in one
function and caught in another, the program needs to reset the stack pointer
and registers to the point where the exception is caught. While resetting the
stack pointer, the program needs to identify all local variables in the part of
the stack being discarded, and run their destructors if any. This process is
known as unwinding the stack.
The information needed to unwind the stack is normally stored in tables in the
program. Supporting library code is used to read the tables and perform the
necessary operations. I'm not going to describe the details of those tables
here. However, there is a linker optimization which applies to them.
The support libraries need to be able to find the exception tables at runtime
when an exception occurs. An exception can be thrown in one shared library and
caught in a different shared library, so finding all the required exception
tables can be a nontrivial operation. One approach that can be used is to
register the exception tables at program startup time or shared library load
time. The registration can be done at the right time using the global
constructor mechanism.
However, this approach imposes a runtime cost for exceptions, in that it takes
longer for the program to start. Therefore, this is not ideal. The linker can
optimize this by building tables which can be used to find the exception
tables. The tables built by the GNU linker are sorted for fast lookup by the
runtime library. The tables are put into a `PT_GNU_EH_FRAME` segment. The
supporting libraries then need a way to look up a segment of this type. This is
done via the `dl_iterate_phdr` API provided by the GNU dynamic linker.
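The fast lookup amounts to a binary search over a table sorted by function address. The sketch below shows only the search idea; the real `.eh_frame_hdr` encoding and the `dl_iterate_phdr` plumbing are omitted, and the names are invented:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the idea behind the linker-built unwind lookup table:
// (function start address, FDE address) pairs sorted by start
// address, so the unwinder can binary-search by PC instead of
// scanning every exception table.
struct FdeEntry {
  uint64_t func_start;  // first PC covered by this unwind entry
  uint64_t fde_addr;    // where the unwind information lives
};

// Returns the FDE address covering pc, or 0 if none is found.
// The table must be sorted by func_start.
uint64_t lookup_fde(const std::vector<FdeEntry>& table, uint64_t pc) {
  // Find the last entry whose start address is <= pc.
  auto it = std::upper_bound(
      table.begin(), table.end(), pc,
      [](uint64_t v, const FdeEntry& e) { return v < e.func_start; });
  if (it == table.begin())
    return 0;
  return (it - 1)->fde_addr;
}
```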
Note that if the compiler believes that the linker will generate a
`PT_GNU_EH_FRAME` segment, it won't generate the startup code to register the
exception tables. Thus the linker must not fail to create this segment.
Since the GNU linker needs to look at the exception tables in order to generate
the `PT_GNU_EH_FRAME` segment, it will also optimize by discarding duplicate
exception table information.
I know this section is rather short on details. I hope the general idea is
clear.
More tomorrow.
# Linkers part 17
## Warning Symbols
The GNU linker supports a weird extension to ELF used to issue warnings when
symbols are referenced at link time. This was originally implemented for a.out
using a special symbol type. For ELF, I implemented it using a special section
name.
If you create a section named `.gnu.warning.SYMBOL`, then if and when the
linker sees an undefined reference to `SYMBOL`, it will issue a warning. The
warning is triggered by seeing an undefined symbol with the right name in an
object file. Unlike the warning about an undefined symbol, it is not triggered
by seeing a relocation entry. The text of the warning is simply the contents of
the `.gnu.warning.SYMBOL` section.
The GNU C library uses this feature to warn about references to symbols like
`gets` which are required by standards but are generally considered to be
unsafe. This is done by creating a section named `.gnu.warning.gets` in the
same object file which defines `gets`.
The GNU linker also supports another type of warning, triggered by sections
named `.gnu.warning` (without the symbol name). If an object file with a
section of that name is included in the link, the linker will issue a warning.
Again, the text of the warning is simply the contents of the `.gnu.warning`
section. I don't know if anybody actually uses this feature.
Short entry today, more tomorrow.
# Linkers part 18
## Incremental Linking
Often a programmer will change a single source file and recompile and
relink the application. A standard linker will need to read all the input
objects and libraries in order to regenerate the executable with the change.
For a large application, this is a lot of work. If only one input object file
changed, it is a lot more work than really needs to be done. One solution is to
use an incremental linker. An incremental linker makes incremental changes to
an existing executable or shared library, rather than rebuilding them from
scratch.
I've never actually written or worked on an incremental linker, but the general
idea is straightforward enough. When the linker writes the output file, it must
attach additional information.
* The linker must create a mapping of object files to areas in the output file,
so that an incremental link will know what to remove when replacing an object
file.
* The linker must retain all the relocations for each input object which refer
to symbols defined in other objects, so that it can reprocess them when
symbols change. The linker should store the relocations mapped by symbol, so
that it can quickly find the relevant relocations.
* The linker should leave extra space in the text and data segments, to allow
for object files to grow to a limited extent without requiring rewriting the
whole executable. It must keep a map of where this extra space is, as it will
tend to move over the course of incremental links.
* The linker should keep a list of object file timestamps in the output file,
so that it can quickly determine which objects have changed.
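The bookkeeping in the list above could be sketched as a handful of maps. All names here are invented; this describes no real incremental linker:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Sketch of the extra information an incremental linker might store
// in the output file, following the list above. Invented names only.
struct OutputRange { uint64_t offset, size; };

struct IncrementalInfo {
  // object file -> where its contents landed in the output
  std::map<std::string, std::vector<OutputRange>> object_ranges;
  // symbol name -> offsets of relocations in other objects that
  // reference it, so they can be reprocessed when the symbol moves
  std::map<std::string, std::vector<uint64_t>> relocs_by_symbol;
  // unused gaps left in the text/data segments for growth
  std::vector<OutputRange> free_space;
  // object file -> timestamp recorded at the last link
  std::map<std::string, int64_t> timestamps;
};

// Objects whose on-disk timestamp is newer than the recorded one
// need to be replaced in the output file.
std::vector<std::string> changed_objects(
    const IncrementalInfo& info,
    const std::map<std::string, int64_t>& current) {
  std::vector<std::string> out;
  for (const auto& kv : info.timestamps) {
    auto it = current.find(kv.first);
    if (it != current.end() && it->second > kv.second)
      out.push_back(kv.first);
  }
  return out;
}
```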
With this information, the linker can identify which object files have changed
since the last time the output file was linked, and replace them in the
existing output file. When an object file changes, the linker can identify all
the relocations which refer to symbols defined in the object file, and
reprocess them.
When an object file gets too large to fit in the available space in a text or
data segment, then the linker has the option of creating additional text or
data segments at different addresses. This requires some care to ensure that
the new code does not collide with the heap, depending upon how the local
malloc implementation works. Alternatively, the incremental linker could fall
back on doing a full link, and allocating more space again.
Incremental linking can greatly speed up the edit/compile/debug cycle.
Unfortunately it is not implemented in most common linkers. Of course an
incremental link is not equivalent to a final link, and in particular some
linker optimizations are difficult to implement while acting incrementally. An
incremental link is really only suitable for use during the development cycle,
which is of course the time when the speed of the linker is most important.
More on Monday.
# Linkers part 19
I've pretty much run out of linker topics. Unless I think of something new,
I'll make tomorrow's post be the last one, for a total of 20.
## __start and __stop Symbols
A quick note about another GNU linker extension. If the linker sees a section
in the output file whose name can be part of a C variable name (the name
contains only alphanumeric characters or underscores), the linker will
automatically define symbols marking the start and stop of the section. Note
that this is not true of most section names, as by convention most section
names start with a period. But the name of a section can be any string; it
doesn't have to start with a period. When that happens for a section `NAME`,
the GNU linker will define the symbols `__start_NAME` and `__stop_NAME` as the
addresses of the beginning and the end of the section, respectively.
This is convenient for collecting some information in several different object
files, and then referring to it in the code. For example, the GNU C library
uses this to keep a list of functions which may be called to free memory. The
`__start` and `__stop` symbols are used to walk through the list.
In C code, these symbols should be declared as something like
`extern char __start_NAME[]`. For an extern array the value of the symbol and
the value of the variable are the same.
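Here is a small self-contained demonstration, assuming a GNU toolchain on a system where the GNU linker defines these symbols; the section name `my_fns` is made up, and in a real program each translation unit could contribute entries:

```cpp
#include <cassert>

// Demonstration of __start/__stop symbols. Each entry is placed in a
// custom section named "my_fns" (an invented name that is a valid C
// identifier); the GNU linker then defines __start_my_fns and
// __stop_my_fns automatically.
typedef int (*entry_fn)(void);

static int one(void) { return 1; }
static int two(void) { return 2; }

__attribute__((section("my_fns"), used)) static entry_fn e1 = one;
__attribute__((section("my_fns"), used)) static entry_fn e2 = two;

// Defined automatically by the GNU linker; extern "C" keeps the
// declarations unmangled so they match the linker-created symbols.
extern "C" entry_fn __start_my_fns[];
extern "C" entry_fn __stop_my_fns[];

// Walk the section from start to stop, calling each entry.
int sum_entries(void) {
  int sum = 0;
  for (entry_fn* p = __start_my_fns; p != __stop_my_fns; ++p)
    sum += (*p)();
  return sum;
}
```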
## Byte Swapping
The new linker I am working on, gold, is written in C++. One of the attractions
was to use template specialization to do efficient byte swapping. Any linker
which can be used in a cross-compiler needs to be able to swap bytes when
writing them out, in order to generate code for a big-endian system while
running on a little-endian system, or vice-versa. The GNU linker always stores
data into memory a byte at a time, which is unnecessary for a native linker.
Measurements from a few years ago showed that this took about 5% of the
linker's CPU time. Since the native linker is by far the most common case, it
is worth avoiding this penalty.
In C++, this can be done using templates and template specialization. The idea
is to write a template for writing out the data. Then provide two
specializations of the template, one for a linker of the same endianness and
one for a linker of the opposite endianness. Then pick the one to use at
compile time. The code looks like this; I'm only showing the 16-bit case for
simplicity.
```cpp
// Headers needed for uint16_t, bswap_16, and __BYTE_ORDER.
#include <stdint.h>
#include <byteswap.h>
#include <endian.h>

// Endian simply indicates whether the host is big endian or not.
struct Endian
{
 public:
  // Used for template specializations.
  static const bool host_big_endian = __BYTE_ORDER == __BIG_ENDIAN;
};

// Valtype_base is a template based on size (8, 16, 32, 64) which
// defines the type Valtype as the unsigned integer of the specified
// size.
template<int size>
struct Valtype_base;

template<>
struct Valtype_base<16>
{
  typedef uint16_t Valtype;
};

// Convert_endian is a template based on size and on whether the host
// and target have the same endianness. It defines the type Valtype
// as Valtype_base does, and also defines a function convert_host
// which takes an argument of type Valtype and returns the same value,
// but swapped if the host and target have different endianness.
template<int size, bool same_endian>
struct Convert_endian;

template<int size>
struct Convert_endian<size, true>
{
  typedef typename Valtype_base<size>::Valtype Valtype;

  static inline Valtype
  convert_host(Valtype v)
  { return v; }
};

template<>
struct Convert_endian<16, false>
{
  typedef Valtype_base<16>::Valtype Valtype;

  static inline Valtype
  convert_host(Valtype v)
  { return bswap_16(v); }
};

// Convert is a template based on size and on whether the target is
// big endian. It defines Valtype and convert_host like
// Convert_endian. That is, it is just like Convert_endian except in
// the meaning of the second template parameter.
template<int size, bool big_endian>
struct Convert
{
  typedef typename Valtype_base<size>::Valtype Valtype;

  static inline Valtype
  convert_host(Valtype v)
  {
    return Convert_endian<size, big_endian == Endian::host_big_endian>
      ::convert_host(v);
  }
};

// Swap is a template based on size and on whether the target is big
// endian. It defines the type Valtype and the functions readval and
// writeval. The functions read and write values of the appropriate
// size out of buffers, swapping them if necessary.
template<int size, bool big_endian>
struct Swap
{
  typedef typename Valtype_base<size>::Valtype Valtype;

  static inline Valtype
  readval(const Valtype* wv)
  { return Convert<size, big_endian>::convert_host(*wv); }

  static inline void
  writeval(Valtype* wv, Valtype v)
  { *wv = Convert<size, big_endian>::convert_host(v); }
};
```
Now, for example, the linker reads a 16-bit big-endian value using
`Swap<16,true>::readval`. This works because the linker always knows how much
data to swap in, and it always knows whether it is reading big- or
little-endian data.
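For a standalone check of the underlying idea, separate from the templates above, here is a minimal snippet; `read_be16` and the hand-written `bswap16` are illustrations only, not gold's API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Standalone check of the swapping idea: a 16-bit value stored
// big-endian in a buffer must be swapped once on a little-endian
// host and loaded directly on a big-endian one. bswap16 is written
// by hand here instead of using the glibc bswap_16 macro.
static uint16_t bswap16(uint16_t v) {
  return static_cast<uint16_t>((v >> 8) | (v << 8));
}

uint16_t read_be16(const unsigned char* buf) {
  uint16_t raw;
  std::memcpy(&raw, buf, sizeof raw);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  return raw;           // same endianness: plain load
#else
  return bswap16(raw);  // opposite endianness: one swap
#endif
}
```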
# Linkers part 2
I'm back, and I'm still doing the linker technical introduction.
Shared libraries were invented as an optimization for virtual memory systems
running many processes simultaneously. People noticed that there is a set of
basic functions which appear in almost every program. Before shared libraries,
in a system which runs multiple processes simultaneously, that meant that
almost every process had a copy of exactly the same code. This suggested that
on a virtual memory system it would be possible to arrange that code so that a
single copy could be shared by every process using it. The virtual memory
system would be used to map the single copy into the address space of each
process which needed it. This would require less physical memory to run
multiple programs, and thus yield better performance.
I believe the first implementation of shared libraries was on SVR3, based on
COFF. This implementation was simple, and basically assigned each shared
library a fixed portion of the virtual address space. This did not require any
significant changes to the linker. However, requiring each shared library to
reserve an appropriate portion of the virtual address space was inconvenient.
SunOS4 introduced a more flexible version of shared libraries, which was later
picked up by SVR4. This implementation postponed some of the operation of the
linker to runtime. When the program started, it would automatically run a
limited version of the linker which would link the program proper with the
shared libraries. The version of the linker which runs when the program starts
is known as the dynamic linker. When it is necessary to distinguish them, I
will refer to the version of the linker which creates the program as the
program linker. This type of shared libraries was a significant change to the
traditional program linker: it now had to build linking information which could
be used efficiently at runtime by the dynamic linker.
That is the end of the introduction. You should now understand the basics of
what a linker does. I will now turn to how it does it.
## Basic Linker Data Types
The linker operates on a small number of basic data types: symbols,
relocations, and contents. These are defined in the input object files. Here is
an overview of each of these.
A symbol is basically a name and a value. Many symbols represent static objects
in the original source code; that is, objects which exist in a single place for
the duration of the program. For example, in an object file generated from C
code, there will be a symbol for each function and for each global and static
variable. The value of such a symbol is simply an offset into the contents.
This type of symbol is known as a defined symbol. It's important not to confuse
the value of the symbol representing the variable `my_global_var` with the
value of `my_global_var` itself. The value of the symbol is roughly the address
of the variable: the value you would get from the expression
`&my_global_var` in C.
Symbols are also used to indicate a reference to a name defined in a different
object file. Such a reference is known as an undefined symbol. There are other
less commonly used types of symbols which I will describe later.
During the linking process, the linker will assign an address to each defined
symbol, and will resolve each undefined symbol by finding a defined symbol with
the same name.
A relocation is a computation to perform on the contents. Most relocations
refer to a symbol and to an offset within the contents. Many relocations will
also provide an additional operand, known as the addend. A simple, and commonly
used, relocation is “set this location in the contents to the value of this
symbol plus this addend.” The types of computations that relocations do are
inherently dependent on the architecture of the processor for which the linker
is generating code. For example, RISC processors which require two or more
instructions to form a memory address will have separate relocations to be
used with each of those instructions; for example, “set this location in the
contents to the lower 16 bits of the value of this symbol.”
During the linking process, the linker will perform all of the relocation
computations as directed. A relocation in an object file may refer to an
undefined symbol. If the linker is unable to resolve that symbol, it will
normally issue an error (but not always: for some symbol types or some
relocation types an error may not be appropriate).
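The simple relocation quoted above can be written out directly. This is a sketch of the generic computation only (a 32-bit absolute "symbol plus addend" patch, stored in host byte order for simplicity); the struct and names are invented, not any real object file format:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch of applying the relocation "set this 32-bit location in the
// contents to the value of this symbol plus this addend". Real
// relocation types are architecture-specific; this models only the
// generic computation, and stores in host byte order.
struct Reloc {
  uint64_t offset;        // where in the contents to patch
  uint64_t symbol_value;  // S: address the linker assigned the symbol
  int64_t addend;         // A
};

void apply_abs32(std::vector<unsigned char>& contents, const Reloc& r) {
  uint32_t value = static_cast<uint32_t>(r.symbol_value + r.addend);
  std::memcpy(&contents[r.offset], &value, sizeof value);
}
```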
The contents are what memory should look like during the execution of the
program. Contents have a size, an array of bytes, and a type. They contain the
machine code generated by the compiler and assembler (known as text). They
contain the values of initialized variables (data). They contain static
unnamed data like string constants and switch tables (read-only data or rdata).
They contain uninitialized variables, in which case the array of bytes is
generally omitted and assumed to contain only zeroes (bss). The compiler and
the assembler work hard to generate exactly the right contents, but the linker
really doesn't care about them except as raw data. The linker reads the
contents from each file, concatenates them all together sorted by type,
applies the relocations, and writes the result into the executable file.
## Basic Linker Operation
At this point we already know enough to understand the basic steps used by
every linker.
* Read the input object files. Determine the length and type of the contents.
Read the symbols.
* Build a symbol table containing all the symbols, linking undefined symbols to
their definitions.
* Decide where all the contents should go in the output executable file, which
means deciding where they should go in memory when the program runs.
* Read the contents data and the relocations. Apply the relocations to the
contents. Write the result to the output file.
* Optionally write out the complete symbol table with the final values of the
symbols.
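The steps above can be sketched end to end as a toy pass structure. This model is drastically simplified (one blob of contents per object, one relocation kind, no types) and every name is invented; it is meant only to mirror the shape of the passes, not any real linker:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy object file model: contents, defined symbols, and relocations
// that patch 4 bytes with a symbol's final address (little-endian).
struct ToySymbol { std::string name; uint64_t offset; };
struct ToyReloc  { uint64_t offset; std::string symbol; };
struct ToyObject {
  std::vector<unsigned char> contents;
  std::vector<ToySymbol> defs;
  std::vector<ToyReloc> relocs;
};

std::vector<unsigned char> toy_link(std::vector<ToyObject>& objs,
                                    uint64_t base) {
  // Pass 1: lay out contents end to end and build the symbol table.
  std::map<std::string, uint64_t> symtab;
  uint64_t addr = base;
  for (const ToyObject& o : objs) {
    for (const ToySymbol& s : o.defs)
      symtab[s.name] = addr + s.offset;
    addr += o.contents.size();
  }
  // Pass 2: apply relocations, then concatenate into the output.
  std::vector<unsigned char> out;
  for (ToyObject& o : objs) {
    for (const ToyReloc& r : o.relocs) {
      uint32_t v = static_cast<uint32_t>(symtab.at(r.symbol));
      for (int b = 0; b < 4; ++b)
        o.contents[r.offset + b] = static_cast<unsigned char>(v >> (8 * b));
    }
    out.insert(out.end(), o.contents.begin(), o.contents.end());
  }
  return out;
}
```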
More tomorrow.
# Linkers part 20
This will be my last blog posting on linkers for the time being. Tomorrow my
blog will return to its usual trivialities. People who are specifically
interested in linker information are warned to stop reading with this post.
I'll close the series with a short update on gold, the new linker I've been
working on. It currently (September 25, 2007) can create executables. It can
not create shared libraries or relocatable objects. It has very limited
support for linker scripts: enough to read `/usr/lib/libc.so` on a GNU/Linux
system. It doesn't have any interesting new features at this point. It only
supports x86. The focus to date has been entirely on speed. It is written to be
multi-threaded, but the threading support has not been hooked in yet.
By way of example, when linking a 900M C++ executable, the GNU linker (version
2.16.91 20060118 on an Ubuntu based system) took 700 seconds of user time, 24
seconds of system time, and 16 minutes of wall time. gold took 7 seconds of
user time, 3 seconds of system time, and 30 seconds of wall time. So while I
can't promise that it will stay as fast as all features are added, it's in a
pretty good position at the moment.
I'm the main developer on gold, but I'm not the only person working on it. A
few other people are also making improvements.
The goal is to release gold as a free program, ideally as part of the GNU
binutils. I want it to be more nearly feature complete before doing this,
though. It needs to at least support `-shared` and `-r`. I doubt gold will ever
support all of the features of the GNU linker. I doubt it will ever support the
full GNU linker script language, although I do plan to support enough to link
the Linux kernel.
Future plans for gold, once it actually works, include incremental linking and
more far-reaching speed improvements.
# Linkers part 3
Continuing notes on linkers.
## Address Spaces
An address space is simply a view of memory, in which each byte has an address.
The linker deals with three distinct types of address space.
Every input object file is a small address space: the contents have addresses,
and the symbols and relocations refer to the contents by addresses.
The output program will be placed at some location in memory when it runs.
This is the output address space, which I generally refer to as using virtual
memory addresses.
The output program will be loaded at some location in memory. This is the load
memory address. On typical Unix systems virtual memory addresses and load
memory addresses are the same. On embedded systems they are often different;
for example, the initialized data (the initial contents of global or static
variables) may be loaded into ROM at the load memory address, and then copied
into RAM at the virtual memory address.
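The ROM-to-RAM copy is typically a few lines in the embedded startup code. In this sketch the boundaries that real code gets from linker-script symbols (often named something like `__data_start`, though the names are not universal) are simulated with ordinary arrays so the snippet is self-contained:

```cpp
#include <cassert>
#include <cstring>

// Sketch of embedded startup behavior: initialized data is stored in
// ROM at its load memory address and copied to RAM at its virtual
// memory address before main runs. Ordinary arrays stand in for the
// linker-script symbols that mark the real region boundaries.
static unsigned char fake_rom[4] = { 1, 2, 3, 4 };  // load memory address (LMA)
static unsigned char fake_ram[4];                   // virtual memory address (VMA)

// What the startup code does: copy the initialized-data image from
// its load address to its runtime address.
void copy_data_section(void) {
  std::memcpy(fake_ram, fake_rom, sizeof fake_ram);
}
```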
Shared libraries can normally be run at different virtual memory addresses in
different processes. A shared library has a base address when it is created;
this is often simply zero. When the dynamic linker copies the shared library
into the virtual memory space of a process, it must apply relocations to
adjust the shared library to run at its virtual memory address. Shared library
systems minimize the number of relocations which must be applied, since they
take time when starting the program.
## Object File Formats
As I said above, an assembler turns human readable assembly language into an
object file. An object file is a binary data file written in a format designed
as input to the linker. The linker generates an executable file. This
executable file is a binary data file written in a format designed as input for
the operating system or the loader (this is true even when linking dynamically,
as normally the operating system loads the executable before invoking the
dynamic linker to begin running the program). There is no logical requirement
that the object file format resemble the executable file format. However,
in practice they are normally very similar.
Most object file formats define sections. A section typically holds memory
contents, or it may be used to hold other types of data. Sections generally
have a name, a type, a size, an address, and an associated array of data.
Object file formats may be classed in two general types: record oriented and
section oriented.
A record oriented object file format defines a series of records of varying
size. Each record starts with some special code, and may be followed by data.
Reading the object file requires reading it from the beginning and processing
each record. Records are used to describe symbols and sections. Relocations may
be associated with sections or may be specified by other records. IEEE-695
and Mach-O are record oriented object file formats used today.
In a section oriented object file format the file header describes a section
table with a specified number of sections. Symbols may appear in a separate
part of the object file described by the file header, or they may appear in a
special section. Relocations may be attached to sections, or they may appear in
separate sections. The object file may be read by reading the section table,
and then reading specific sections directly. ELF, COFF, PE, and a.out are
section oriented object file formats.
Every object file format needs to be able to represent debugging information.
Debugging information is generated by the compiler and read by the debugger.
In general the linker can just treat it like any other type of data. However,
in practice the debugging information for a program can be larger than the
actual program itself. The linker can use various techniques to reduce the
amount of debugging information, thus reducing the size of the executable.
This can speed up the link, but requires the linker to understand the
debugging information.
The a.out object file format stores debugging information using special strings
in the symbol table, known as stabs. These special strings are simply the names
of symbols with a special type. This technique is also used by some variants of
ECOFF, and by older versions of Mach-O.
The COFF object file format stores debugging information using special fields
in the symbol table. This type information is limited, and is completely
inadequate for C++. A common technique to work around these limitations is to
embed stabs strings in a COFF section.
The ELF object file format stores debugging information in sections with
special names. The debugging information can be stabs strings or the DWARF
debugging format.
More next week.
# Linkers part 4
## Shared Libraries
We've talked a bit about what object files and executables look like, so what
do shared libraries look like? I'm going to focus on ELF shared libraries as
used in SVR4 (and GNU/Linux, etc.), as they are the most flexible shared
library implementation and the one I know best.
Windows shared libraries, known as DLLs, are less flexible in that you have to
compile code differently depending on whether it will go into a shared library
or not. You also have to express symbol visibility in the source code. This is
not inherently bad, and indeed ELF has picked up some of these ideas over time,
but the ELF format makes more decisions at link time and is thus more powerful.
When the program linker creates a shared library, it does not yet know which
virtual address that shared library will run at. In fact, in different
processes, the same shared library will run at different addresses, depending on
the decisions made by the dynamic linker. This means that shared library code
must be position independent. More precisely, it must be position independent
after the dynamic linker has finished loading it. It is always possible for the
dynamic linker to convert any piece of code to run at any virtual address,
given sufficient relocation information. However, performing the reloc
computations must be done every time the program starts, implying that it will
start more slowly. Therefore, any shared library system seeks to generate
position independent code which requires a minimal number of relocations to be
applied at runtime, while still running at close to the runtime efficiency of
position dependent code.
An additional complexity is that ELF shared libraries were designed to be
roughly equivalent to ordinary archives. This means that by default the main
executable may override symbols in the shared library, such that references in
the shared library will call the definition in the executable, even if the
shared library also defines that same symbol. For example, an executable may
define its own version of `malloc`. The C library also defines `malloc`, and
the C library contains code which calls `malloc`. If the executable defines
`malloc` itself, it will override the function in the C library. When some
other function in the C library calls `malloc`, it will call the definition in
the executable, not the definition in the C library.
There are thus different requirements pulling in different directions for any
specific ELF implementation. The right implementation choices will depend on
the characteristics of the processor. That said, most, but not all, processors
make fairly similar decisions. I will describe the common case here. An example
of a processor which uses the common case is the i386; an example of a
processor which makes some different decisions is the PowerPC.
In the common case, code may be compiled in two different modes. By default,
code is position dependent. Putting position dependent code into a shared
library will cause the program linker to generate a lot of relocation
information, and cause the dynamic linker to do a lot of processing at
runtime. Code may also be compiled in position independent mode, typically
with the `-fpic` option. Position independent code is slightly slower when it
calls a non-static function or refers to a global or static variable. However,
it requires much less relocation information, and thus the dynamic linker will
start the program faster.
Position independent code will call non-static functions via the *Procedure
Linkage Table* or *PLT*. This PLT does not exist in .o files. In a .o file, use
of the PLT is indicated by a special relocation. When the program linker
processes such a relocation, it will create an entry in the PLT. It will
adjust the instruction such that it becomes a PC-relative call to the PLT
entry. PC-relative calls are inherently position independent and thus do not
require a relocation entry themselves. The program linker will create a
relocation for the PLT entry which tells the dynamic linker which symbol is
associated with that entry. This process reduces the number of dynamic
relocations in the shared library from one per function call to one per
function called.
Further, PLT entries are normally relocated lazily by the dynamic linker. On
most ELF systems this laziness may be overridden by setting the `LD_BIND_NOW`
environment variable when running the program. However, by default, the dynamic
linker will not actually apply a relocation to the PLT until some code actually
calls the function in question. This also speeds up startup time, in that many
invocations of a program will not call every possible function. This is
particularly true when considering the shared C library, which has many more
function calls than any typical program will execute.
In order to make this work, the program linker initializes the PLT entries to
load an index into some register or push it on the stack, and then to branch to
common code. The common code calls back into the dynamic linker, which uses the
index to find the appropriate PLT relocation, and uses that to find the
function being called. The dynamic linker then initializes the PLT entry with
the address of the function, and then jumps to the code of the function. The
next time the function is called, the PLT entry will branch directly to the
function.
Before giving an example, I will talk about the other major data structure in
position independent code, the *Global Offset Table* or *GOT*. This is used for
global and static variables. For every reference to a global variable from
position independent code, the compiler will generate a load from the GOT to
get the address of the variable, followed by a second load to get the actual
value of the variable. The address of the GOT will normally be held in a
register, permitting efficient access. Like the PLT, the GOT does not exist in
a .o file, but is created by the program linker. The program linker will create
the dynamic relocations which the dynamic linker will use to initialize the GOT
at runtime. Unlike the PLT, the dynamic linker always fully initializes the GOT
when the program starts.
For example, on the i386, the address of the GOT is held in the register
`%ebx`. This register is initialized at the entry to each function in position
independent code. The initialization sequence varies from one compiler to
another, but typically looks something like this:
```asm
call __i686.get_pc_thunk.bx
add $offset,%ebx
```
The function `__i686.get_pc_thunk.bx` simply looks like this:
```asm
mov (%esp),%ebx
ret
```
This sequence of instructions uses a position independent sequence to get the
address at which it is running. Then it uses an offset to get the address of
the GOT. Note that this requires that the GOT always be a fixed offset from the
code, regardless of where the shared library is loaded. That is, the dynamic
linker must load the shared library as a fixed unit; it may not load different
parts at varying addresses.
Global and static variables are now read or written by first loading the
address via a fixed offset from `%ebx`. The program linker will create dynamic
relocations for each entry in the GOT, telling the dynamic linker how to
initialize the entry. These relocations are of type `GLOB_DAT`.
For function calls, the program linker will set up a PLT entry to look like
this:
```asm
jmp *offset(%ebx)
pushl #index
jmp first_plt_entry
```
The program linker will allocate an entry in the GOT for each entry in the
PLT. It will create a dynamic relocation for the GOT entry of type `JMP_SLOT`.
It will initialize the GOT entry to the base address of the shared library plus
the address of the second instruction in the code sequence above. When the
dynamic linker does the initial lazy binding on a `JMP_SLOT` reloc, it will
simply add the difference between the shared library load address and the
shared library base address to the GOT entry. The effect is that the first jmp
instruction will jump to the second instruction, which will push the index
entry and branch to the first PLT entry. The first PLT entry is special, and
looks like this:
```asm
pushl 4(%ebx)
jmp *8(%ebx)
```
This references the second and third entries in the GOT. The dynamic linker
will initialize them to have appropriate values for a callback into the dynamic
linker itself. The dynamic linker will use the index pushed by the first code
sequence to find the `JMP_SLOT` relocation. When the dynamic linker determines
the function to be called, it will store the address of the function into the
GOT entry referenced by the first code sequence. Thus, the next time the
function is called, the jmp instruction will branch directly to the right code.
That was a fast pass over a lot of details, but I hope that it conveys the
main idea. It means that for position independent code on the i386, every call
to a global function requires one extra instruction after the first time it is
called. Every reference to a global or static variable requires one extra
instruction. Almost every function uses four extra instructions when it starts
to initialize `%ebx` (leaf functions which do not refer to any global variables
do not need to initialize `%ebx`). This all has some negative impact on the
program cache. This is the runtime performance penalty paid to let the dynamic
linker start the program quickly.
On other processors, the details are naturally different. However, the general
flavour is similar: position independent code in a shared library starts faster
and runs slightly slower.
More tomorrow.
# Linkers part 5
## Shared Libraries Redux
Yesterday I talked about how shared libraries work. I realized that I should
say something about how linkers implement shared libraries. This discussion
will again be ELF specific.
When the program linker puts position dependent code into a shared library, it
has to copy more of the relocations from the object file into the shared
library. They will become dynamic relocations computed by the dynamic linker at
runtime. Some relocations do not have to be copied; for example, a PC relative
relocation to a symbol which is local to the shared library can be fully resolved
by the program linker, and does not require a dynamic reloc. However, note that
a PC relative relocation to a global symbol does require a dynamic relocation;
otherwise, the main executable would not be able to override the symbol. Some
relocations have to exist in the shared library, but do not need to be actual
copies of the relocations in the object file; for example, a relocation which
computes the absolute address of a symbol which is local to the shared library
can often be replaced with a `RELATIVE` reloc, which simply directs the dynamic
linker to add the difference between the shared library's load address and its
base address. The advantage of using a `RELATIVE` reloc is that the dynamic
linker can compute it quickly at runtime, because it does not require
determining the value of a symbol.
For position independent code, the program linker has a harder job. The
compiler and assembler will cooperate to generate special relocs for position
independent code. Although details differ among processors, there will
typically be a `PLT` reloc and a `GOT` reloc. These relocs will direct the program
linker to add an entry to the PLT or the GOT, as well as performing some
computation. For example, on the i386 a function call in position independent
code will generate a `R_386_PLT32` reloc. This reloc will refer to a symbol as
usual. It will direct the program linker to add a PLT entry for that symbol,
if one does not already exist. The computation of the reloc is then a
PC-relative reference to the PLT entry. (The `32` in the name of the reloc
refers to the size of the reference, which is 32 bits). Yesterday I described
how on the i386 every PLT entry also has a corresponding GOT entry, so the
`R_386_PLT32` reloc actually directs the program linker to create both a PLT
entry and a GOT entry.
When the program linker creates an entry in the PLT or the GOT, it must also
generate a dynamic reloc to tell the dynamic linker about the entry. This will
typically be a `JMP_SLOT` or `GLOB_DAT` relocation.
This all means that the program linker must keep track of the PLT entry and the
GOT entry for each symbol. Initially, of course, there will be no such entries.
When the linker sees a PLT or GOT reloc, it must check whether the symbol
referenced by the reloc already has a PLT or GOT entry, and create one if it
does not. Note that it is possible for a single symbol to have both a PLT entry
and a GOT entry; this will happen for position independent code which both
calls a function and also takes its address.
The dynamic linker's job for the PLT and GOT tables is simply to compute the
`JMP_SLOT` and `GLOB_DAT` relocs at runtime. The main complexity here is the
lazy evaluation of PLT entries which I described yesterday.
The fact that C permits taking the address of a function introduces an
interesting wrinkle. In C you are permitted to take the address of a function,
and you are permitted to compare that address to another function address. The
problem is that if you take the address of a function in a shared library, the
natural result would be to get the address of the PLT entry. After all, that is
address to which a call to the function will jump. However, each shared library
has its own PLT, and thus the address of a particular function would differ in
each shared library. That means that comparisons of function pointers generated
in different shared libraries may be different when they should be the same.
This is not a purely hypothetical problem; when I did a port which got it
wrong, before I fixed the bug I saw failures in the Tcl shared library when it
compared function pointers.
The fix for this bug on most processors is a special marking for a symbol which
has a PLT entry but is not defined. Typically the symbol will be marked as
undefined, but with a non-zero value; the value will be set to the address of
the PLT entry. When the dynamic linker is searching for the value of a symbol
to use for a reloc other than a `JMP_SLOT` reloc, if it finds such a specially
marked symbol, it will use the non-zero value. This will ensure that all
references to the symbol which are not function calls will use the same value.
To make this work, the compiler and assembler must make sure that any reference
to a function which does not involve calling it will not carry a standard PLT
reloc. This special handling of function addresses needs to be implemented in
both the program linker and the dynamic linker.
## ELF Symbols
OK, enough about shared libraries. Let's go over ELF symbols in more detail.
I'm not going to lay out the exact data structures; go to the ELF ABI for that.
I'm going to talk about the different fields and what they mean. Many of the
different types of ELF symbols are also used by other object file formats, but
I won't cover that.
An entry in an ELF symbol table has eight pieces of information: a name, a
value, a size, a section, a binding, a type, a visibility, and undefined
additional information (currently there are six undefined bits, though more may
be added). An ELF symbol defined in a shared object may also have an associated
version name.
The name is obvious.
For an ordinary defined symbol, the section is some section in the file
(specifically, the symbol table entry holds an index into the section table).
For an object file the value is relative to the start of the section. For an
executable the value is an absolute address. For a shared library the value is
relative to the base address.
For an undefined reference symbol, the section index is the special value
`SHN_UNDEF` which has the value `0`. A section index of `SHN_ABS` (`0xfff1`)
indicates that the value of the symbol is an absolute value, not relative to
any section.
A section index of `SHN_COMMON` (`0xfff2`) indicates a common symbol. Common
symbols were invented to handle Fortran common blocks, and they are also often
used for uninitialized global variables in C. A common symbol has unusual
semantics. Common symbols have a value of zero, but set the size field to the
desired size. If one object file has a common symbol and another has a
definition, the common symbol is treated as an undefined reference. If there is
no definition for a common symbol, the program linker acts as though it saw a
definition initialized to zero of the appropriate size. Two object files may
have common symbols of different sizes, in which case the program linker will
use the largest size. Implementing common symbol semantics across shared
libraries is a touchy subject, somewhat helped by the recent introduction of a
type for common symbols as well as a special section index (see the discussion
of symbol types below).
The size of an ELF symbol, other than a common symbol, is the size of the
variable or function. This is mainly used for debugging purposes.
The binding of an ELF symbol is global, local, or weak. A global symbol is
globally visible. A local symbol is only locally visible (e.g., a static
function). Weak symbols come in two flavors. A weak undefined reference is like
an ordinary undefined reference, except that it is not an error if a relocation
refers to a weak undefined reference symbol which has no defining symbol.
Instead, the relocation is computed as though the symbol had the value zero.
A weak defined symbol is permitted to be linked with a non-weak defined symbol
of the same name without causing a multiple definition error. Historically
there are two ways for the program linker to handle a weak defined symbol. On
SVR4 if the program linker sees a weak defined symbol followed by a non-weak
defined symbol with the same name, it will issue a multiple definition error.
However, a non-weak defined symbol followed by a weak defined symbol will not
cause an error. On Solaris, a weak defined symbol followed by a non-weak
defined symbol is handled by causing all references to attach to the non-weak
defined symbol, with no error. This difference in behaviour is due to an
ambiguity in the ELF ABI which was read differently by different people. The
GNU linker follows the Solaris behaviour.
The type of an ELF symbol is one of the following:
* `STT_NOTYPE`: no particular type.
* `STT_OBJECT`: a data object, such as a variable.
* `STT_FUNC`: a function.
* `STT_SECTION`: a local symbol associated with a section. This type of symbol
  is used to reduce the number of local symbols required, by changing all
  relocations against local symbols in a specific section to use the
  `STT_SECTION` symbol instead.
* `STT_FILE`: a special symbol whose name is the name of the source file which
produced the object file.
* `STT_COMMON`: a common symbol. This is the same as setting the section index
to `SHN_COMMON`, except in a shared object. The program linker will normally
have allocated space for the common symbol in the shared object, so it will
have a real section index. The `STT_COMMON` type tells the dynamic linker
that although the symbol has a regular definition, it is a common symbol.
* `STT_TLS`: a symbol in the Thread Local Storage area. I will describe this in
more detail some other day.
ELF symbol visibility was invented to provide more control over which symbols
were accessible outside a shared library. The basic idea is that a symbol may
be global within a shared library, but local outside the shared library.
* `STV_DEFAULT`: the usual visibility rules apply: global symbols are visible
everywhere.
* `STV_INTERNAL`: the symbol is not accessible outside the current executable
or shared library.
* `STV_HIDDEN`: the symbol is not visible outside the current executable or
shared library, but it may be accessed indirectly, probably because some code
took its address.
* `STV_PROTECTED`: the symbol is visible outside the current executable or
shared object, but it may not be overridden. That is, if a protected symbol
in a shared library is referenced by other code in the shared library, that
other code will always reference the symbol in the shared library, even if
the executable defines a symbol with the same name.
I'll describe symbol versions later.
More tomorrow.
# Linkers part 6
So many things to talk about. Let's go back and cover relocations in some more
detail, with some examples.
## Relocations
As I said back in part 2, a relocation is a computation to perform on the
contents. And as I said yesterday, a relocation can also direct the linker to
take other actions, like creating a PLT or GOT entry. Lets take a closer look
at the computation.
In general a relocation has a type, a symbol, an offset into the contents, and
an addend. From the linker's point of view, the contents are simply an
uninterpreted series of bytes. A relocation changes those bytes as necessary to
produce the correct final executable. For example, consider the C code
`g = 0;` where `g` is a global variable. On the i386, the compiler will turn
this into an assembly language instruction, which will most likely be
`movl $0, g` (for position dependent code; position independent code would
load the address of `g` from the GOT). Now, the `g` in the C code is a
global variable, and we all more or less know what that means. The `g` in the
assembly code is not that variable. It is a symbol which holds the address of
that variable.
The assembler does not know the address of the global variable `g`, which is
another way of saying that the assembler does not know the value of the symbol
`g`. It is the linker that is going to pick that address. So the assembler has
to tell the linker that it needs to use the address of `g` in this instruction.
The way the assembler does this is to create a relocation. We don't use a
separate relocation type for each instruction; instead, each processor will
have a natural set of relocation types which are appropriate for the machine
architecture. Each type of relocation expresses a specific computation.
In the i386 case, the assembler will generate these bytes:
```
c7 05 00 00 00 00 00 00 00 00
```
The `c7 05` are the instruction (movl constant to address). The first four `00`
bytes are the 32-bit constant 0. The second four `00` bytes are the address.
The assembler tells the linker to put the value of the symbol `g` into those
four bytes by generating (in this case) a `R_386_32` relocation. For this
relocation the symbol will be `g`, the offset will be to the last four bytes of
the instruction, the type will be `R_386_32`, and the addend will be 0 (in the
case of the i386 the addend is stored in the contents rather than in the
relocation itself, but this is a detail). The type `R_386_32` expresses a
specific computation, which is: put the 32-bit sum of the value of the symbol
and the addend into the offset. Since for the i386 the addend is stored in the
contents, this can also be expressed as: add the value of the symbol to the
32-bit field at the offset. When the linker performs this computation, the
address in the instruction will be the address of the global variable `g`.
Regardless of the details, the important point to note is that the relocation
adjusts the contents by applying a specific computation selected by the type.
An example of a simple case which does use an addend would be
```c
char a[10]; // A global array.
char* p = &a[1]; // In a function.
```
The assignment to `p` will wind up requiring a relocation for the symbol `a`.
Here the addend will be 1, so that the resulting instruction references `a + 1`
rather than `a + 0`.
To point out how relocations are processor dependent, let's consider `g = 0;`
on a RISC processor: the PowerPC (in 32-bit mode). In this case, multiple
assembly language instructions are required:
```asm
li 1,0 // Set register 1 to 0
lis 9,g@ha // Load high-adjusted part of g into register 9
stw 1,g@l(9) // Store register 1 to address in register 9 plus low adjusted part g
```
The `lis` instruction loads a value into the upper 16 bits of register 9,
setting the lower 16 bits to zero. The `stw` instruction adds a signed 16 bit
value to register 9 to form an address, and then stores the value of register 1
at that address. The `@ha` part of the operand directs the assembler to
generate a `R_PPC_ADDR16_HA` reloc. The `@l` produces a `R_PPC_ADDR16_LO`
reloc. The goal of these relocs is to compute the value of the symbol `g` and
use it as the store address.
That is enough information to determine the computations performed by these
relocs. The `R_PPC_ADDR16_HA` reloc computes
`(SYMBOL >> 16) + ((SYMBOL & 0x8000) ? 1 : 0)`. The `R_PPC_ADDR16_LO` computes
`SYMBOL & 0xffff`. The extra computation for `R_PPC_ADDR16_HA` is because the
`stw` instruction adds the signed 16-bit value, which means that if the low 16
bits appear negative we have to adjust the high 16 bits accordingly. The
offsets of the relocations are such that the 16-bit resulting values are stored
into the appropriate parts of the machine instructions.
The specific examples of relocations I've discussed here are ELF specific, but
the same sorts of relocations occur for any object file format.
The examples I've shown are for relocations which appear in an object file. As
discussed in part 4, these types of relocations may also appear in a shared
library, if they are copied there by the program linker. In ELF, there are also
specific relocation types which never appear in object files but only appear in
shared libraries or executables. These are the `JMP_SLOT`, `GLOB_DAT`, and
`RELATIVE` relocations discussed earlier. Another type of relocation which only
appears in an executable is a `COPY` relocation, which I will discuss later.
## Position Dependent Shared Libraries
I realized that in part 4 I forgot to say one of the important reasons that ELF
shared libraries use PLT and GOT tables. The idea of a shared library is to
permit mapping the same shared library into different processes. This only
works at maximum efficiency if the shared library code looks the same in each
process. If it does not look the same, then each process will need its own
private copy, and the savings in physical memory and sharing will be lost.
As discussed in part 4, when the dynamic linker loads a shared library which
contains position dependent code, it must apply a set of dynamic relocations.
Those relocations will change the code in the shared library, and it will no
longer be sharable.
The advantage of the PLT and GOT is that they move the relocations elsewhere,
to the PLT and GOT tables themselves. Those tables can then be put into a
read-write part of the shared library. This part of the shared library will be
much smaller than the code. The PLT and GOT tables will be different in each
process using the shared library, but the code will be the same.
I'll be taking a vacation for the long weekend. My next post will most likely
be on Tuesday.
# Linkers part 7
As we've seen, what linkers do is basically quite simple, but the details can
get complicated. The complexity is because smart programmers can see small
optimizations to speed up their programs a little bit, and sometimes the only
place those optimizations can be implemented is the linker. Each such
optimization makes the linker a little more complicated. At the same time, of
course, the linker has to run as fast as possible, since nobody wants to sit
around waiting for it to finish. Today I'll talk about a classic small
optimization implemented by the linker.
## Thread Local Storage
I'll assume you know what a thread is. It is often useful to have a global
variable which can take on a different value in each thread (if you don't see
why this is useful, just trust me on this). That is, the variable is global to
the program, but the specific value is local to the thread. If thread A sets
the thread local variable to 1, and thread B then sets it to 2, then code
running in thread A will continue to see the value 1 for the variable while
code running in thread B sees the value 2. In Posix threads this type of
variable can be created via `pthread_key_create` and accessed via
`pthread_getspecific` and `pthread_setspecific`.
Those functions work well enough, but making a function call for each access is
awkward and inconvenient. It would be more useful if you could just declare a
regular global variable and mark it as thread local. That is the idea of Thread
Local Storage (TLS), which I believe was invented at Sun. On a system which
supports TLS, any global (or static) variable may be annotated with `__thread`.
The variable is then thread local.
Clearly this requires support from the compiler. It also requires support from
the program linker and the dynamic linker. For maximum efficiency (and why do
this if you aren't going to get maximum efficiency?) some kernel support is also
needed. The design of TLS on ELF systems fully supports shared libraries,
including having multiple shared libraries, and the executable itself, use the
same name to refer to a single TLS variable. TLS variables can be initialized.
Programs can take the address of a TLS variable, and pass the pointers between
threads, so the address of a TLS variable is a dynamic value and must be
globally unique.
How is this all implemented? First step: define different storage models for
TLS variables.
* Global Dynamic: Fully general access to TLS variables from an executable or a
shared object.
* Local Dynamic: Permits access to a variable which is bound locally within the
executable or shared object from which it is referenced. This is true for all
  static TLS variables, for example. It is also true for protected symbols; I
  described those back in part 5.
* Initial Executable: Permits access to a variable which is known to be part of
the TLS image of the executable. This is true for all TLS variables defined
in the executable itself, and for all TLS variables in shared libraries
explicitly linked with the executable. This is not true for accesses from a
shared library, nor for accesses to TLS variables defined in shared libraries
opened by `dlopen`.
* Local Executable: Permits access to TLS variables defined in the executable
itself.
These storage models are defined in decreasing order of flexibility. Now, for
efficiency and simplicity, a compiler which supports TLS will permit the
developer to specify the appropriate TLS model to use (with gcc, this is done
with the `-ftls-model` option, although the Global Dynamic and Local Dynamic
models also require using `-fpic`). So, when compiling code which will be in an
executable and never be in a shared library, the developer may choose to set
the TLS storage model to Initial Executable.
Of course, in practice, developers often do not know where code will be used.
And developers may not be aware of the intricacies of TLS models. The program
linker, on the other hand, knows whether it is creating an executable or a
shared library, and it knows whether the TLS variable is defined locally. So
the program linker gets the job of automatically optimizing references to TLS
variables when possible. These references take the form of relocations, and the
linker optimizes the references by changing the code in various ways.
The program linker is also responsible for gathering all TLS variables together
into a single TLS segment (I'll talk more about segments later; for now think
of them as a section). The dynamic linker has to group together the TLS
segments of the executable and all included shared libraries, resolve the
dynamic TLS relocations, and build TLS segments dynamically when `dlopen`
is used. The kernel has to make it possible for access to the TLS segments to
be efficient.
That was all pretty general. Let's do an example, again for i386 ELF. There are
three different implementations of i386 ELF TLS; I'm going to look at the GNU
implementation. Consider this trivial code:
```c
__thread int i;
int foo() { return i; }
```
In global dynamic mode, this generates i386 assembler code like this:
```asm
leal i@TLSGD(,%ebx,1), %eax
call ___tls_get_addr@PLT
movl (%eax), %eax
```
Recall from part 4 that `%ebx` holds the address of the GOT table. The first
instruction will have a `R_386_TLS_GD` relocation for the variable `i`; the
relocation will apply to the offset of the leal instruction. When the program
linker sees this relocation, it will create two consecutive entries in the GOT
table for the TLS variable `i`. The first one will get a `R_386_TLS_DTPMOD32`
dynamic relocation, and the second will get a `R_386_TLS_DTPOFF32` dynamic
relocation. The dynamic linker will set the `DTPMOD32` GOT entry to hold the
module ID of the object which defines the variable. The module ID is an index
within the dynamic linker's tables which identifies the executable or a
specific shared library. The dynamic linker will set the `DTPOFF32` GOT entry
to the offset within the TLS segment for that module. The `__tls_get_addr`
function will use those values to compute the address (this function also takes
care of lazy allocation of TLS variables, which is a further optimization
specific to the dynamic linker). Note that `__tls_get_addr` is actually
implemented by the dynamic linker itself; it follows that global dynamic TLS
variables are not supported (and not necessary) in statically linked
executables.
At this point you are probably wondering what is so inefficient
about `pthread_getspecific`. The real advantage of TLS shows when you see what
the program linker can do. The `leal; call` sequence shown above is canonical:
the compiler will always generate the same sequence to access a TLS variable in
global dynamic mode. The program linker takes advantage of that fact. If the
program linker sees that the code shown above is going into an executable, it
knows that the access does not have to be treated as global dynamic; it can be
treated as initial executable. The program linker will actually rewrite the
code to look like this:
```asm
movl %gs:0, %eax
subl $i@GOTTPOFF(%ebx), %eax
```
Here we see that the TLS system has co-opted the `%gs` segment register, with
cooperation from the operating system, to point to the TLS segment of the
executable. For each processor which supports TLS, some such efficiency hack is
made. Since the program linker is building the executable, it builds the TLS
segment, and knows the offset of `i` in the segment. The `GOTTPOFF` is not a
real relocation; it is created and then resolved within the program linker. It
is, of course, the offset from the GOT table to the address of `i` in the TLS
segment. The `movl (%eax), %eax` from the original sequence remains to actually
load the value of the variable.
Actually, that is what would happen if `i` were not defined in the executable
itself. In the example I showed, `i` is defined in the executable, so the
program linker can actually go from a global dynamic access all the way to a
local executable access. That looks like this:
```asm
movl %gs:0,%eax
subl $i@TPOFF,%eax
```
Here `i@TPOFF` is simply the known offset of `i` within the TLS segment. I'm
not going to go into why this uses `subl` rather than `addl`; suffice it to say
that this is another efficiency hack in the dynamic linker.
If you followed all that, you'll see that when an executable accesses a TLS
variable which is defined in that executable, it requires two instructions to
compute the address, typically followed by another one to actually load or
store the value. That is significantly more efficient than calling
`pthread_getspecific`. Admittedly, when a shared library accesses a TLS
variable, the result is not much better than `pthread_getspecific`, but it
shouldn't be any worse, either. And the code using `__thread` is much easier to
write and to read.
That was a real whirlwind tour. There are three separate but related TLS
implementations on i386 (known as sun, gnu, and gnu2), and 23 different
relocation types are defined. I'm certainly not going to try to describe all
the details; I don't know them all in any case. They all exist in the name of
efficient access to the TLS variables for a given storage model.
Is TLS worth the additional complexity in the program linker and the dynamic
linker? Since those tools are used for every program, and since the C standard
global variable `errno` in particular can be implemented using TLS, the answer
is most likely yes.
# Linkers part 8
## ELF Segments
Earlier I said that executable file formats were normally the same as object
file formats. That is true for ELF, but with a twist. In ELF, object files are
composed of sections: all the data in the file is accessed via the section
table. Executables and shared libraries normally contain a section table, which
is used by programs like `nm`. But the operating system and the dynamic linker
do not use the section table. Instead, they use the segment table, which
provides an alternative view of the file.
All the contents of an ELF executable or shared library which are to be loaded
into memory are contained within a segment (an object file does not have
segments). A segment has a type, some flags, a file offset, a virtual address,
a physical address, a file size, a memory size, and an alignment. The file
offset points to a contiguous set of bytes which are the contents of the
segment, the bytes to load into memory. When the operating system or the
dynamic linker loads a file, it will do so by walking through the segments and
loading them into memory (typically by using the `mmap` system call). All the
information needed by the dynamic linker (the dynamic relocations, the dynamic
symbol table, etc.) is accessed via information stored in special segments.
Although an ELF executable or shared library does not, strictly speaking,
require any sections, they normally do have them. The contents of a loadable
section will fall entirely within a single segment.
The program linker reads sections from the input object files. It sorts and
concatenates them into sections in the output file. It maps all the loadable
sections into segments in the output file. It lays out the section contents in
the output file segments respecting alignment and access requirements, so that
the segments may be mapped directly into memory. The sections are mapped to
segments based on the access requirements: normally all the read-only sections
are mapped to one segment and all the writable sections are mapped to another
segment. The address of the latter segment will be set so that it starts on a
separate page in memory, permitting `mmap` to set different permissions on the
mapped pages.
The segment flags are a bitmask which define access requirements. The defined
flags are `PF_R`, `PF_W`, and `PF_X`, which mean, respectively, that the
contents must be made readable, writable, or executable.
The segment virtual address is the memory address at which the segment contents
are loaded at runtime. The physical address is officially undefined, but is
often used as the load address when using a system which does not use virtual
memory. The file size is the size of the contents in the file. The memory size
may be larger than the file size when the segment contains uninitialized data;
the extra bytes will be filled with zeroes. The alignment of the segment is
mainly informative, as the address is already specified.
The ELF segment types are as follows:
* `PT_NULL`: A null entry in the segment table, which is ignored.
* `PT_LOAD`: A loadable entry in the segment table. The operating system or
dynamic linker loads all segments of this type. All other segments with
contents will have their contents contained completely within a `PT_LOAD`
segment.
* `PT_DYNAMIC`: The dynamic segment. This points to a series of dynamic tags
which the dynamic linker uses to find the dynamic symbol table, dynamic
relocations, and other information that it needs.
* `PT_INTERP`: The interpreter segment. This appears in an executable. The
operating system uses it to find the name of the dynamic linker to run for
the executable. Normally all executables will have the same interpreter name,
but on some operating systems different interpreters are used in different
emulation modes.
* `PT_NOTE`: A note segment. This contains system dependent note information
which may be used by the operating system or the dynamic linker. On
GNU/Linux systems shared libraries often have an ABI tag note which may be
used to specify the minimum version of the kernel which is required for the
shared library. The dynamic linker uses this when selecting among different
shared libraries.
* `PT_SHLIB`: This is not used as far as I know.
* `PT_PHDR`: This indicates the address and size of the segment table. This is
not too useful in practice as you have to have already found the segment
table before you can find this segment.
* `PT_TLS`: The TLS segment. This holds the initial values for TLS variables.
* `PT_GNU_EH_FRAME` (`0x6474e550`): A GNU extension used to hold a sorted table
of unwind information. This table is built by the GNU program linker. It is
used by gcc's support library to quickly find the appropriate handler for an
exception, without requiring exception frames to be registered when the
program starts.
* `PT_GNU_STACK` (`0x6474e551`): A GNU extension used to indicate whether the
stack should be executable. This segment has no contents. The dynamic linker
sets the permission of the stack in memory to the permissions of this segment.
* `PT_GNU_RELRO` (`0x6474e552`): A GNU extension which tells the dynamic linker
to set the given address and size to be read-only after applying dynamic
relocations. This is used for const variables which require dynamic
relocations.
## ELF Sections
Now that we've done segments, let's take a quick look at the details of ELF
sections. ELF sections are more complicated than segments, in that there are
more types of sections. Every ELF object file, and most ELF executables and
shared libraries, have a table of sections. The first entry in the table,
section 0, is always a null section.
ELF sections have several fields.
* Name.
* Type. I discuss section types below.
* Flags. I discuss section flags below.
* Address. This is the address of the section. In an object file this is
normally zero. In an executable or shared library it is the virtual address.
Since executables are normally accessed via segments, this is essentially
documentation.
* File offset. This is the offset of the contents within the file.
* Size. The size of the section.
* Link. Depending on the section type, this may hold the index of another
section in the section table.
* Info. The meaning of this field depends on the section type.
* Address alignment. This is the required alignment of the section. The program
linker uses this when laying out the section in memory.
* Entry size. For sections which hold an array of data, this is the size of one
data element.
These are the types of ELF sections which the program linker may see.
* `SHT_NULL`: A null section. Sections with this type may be ignored.
* `SHT_PROGBITS`: A section holding bits of the program. This is an ordinary
section with contents.
* `SHT_SYMTAB`: The symbol table. This section actually holds the symbol table
itself. The section contents are an array of ELF symbol structures.
* `SHT_STRTAB`: A string table. This type of section holds null-terminated
strings. Sections of this type are used for the names of the symbols and the
names of the sections themselves.
* `SHT_RELA`: A relocation table. The link field holds the index of the section
to which these relocations apply. These relocations include addends.
* `SHT_HASH`: A hash table used by the dynamic linker to speed symbol lookup.
* `SHT_DYNAMIC`: The dynamic tags used by the dynamic linker. Normally the
`PT_DYNAMIC` segment and the `SHT_DYNAMIC` section will point to the same
contents.
* `SHT_NOTE`: A note section. This is used in system dependent ways. A loadable
`SHT_NOTE` section will become a `PT_NOTE` segment.
* `SHT_NOBITS`: A section which takes up memory space but has no associated
contents. This is used for zero-initialized data.
* `SHT_REL`: A relocation table, like `SHT_RELA` but the relocations have no
addends.
* `SHT_SHLIB`: This is not used as far as I know.
* `SHT_DYNSYM`: The dynamic symbol table. Normally the `DT_SYMTAB` dynamic tag
will point to the same contents as this section (I haven't discussed dynamic
tags yet, though).
* `SHT_INIT_ARRAY`: This section holds a table of function addresses which
should each be called at program startup time, or, for a shared library, when
the library is opened by `dlopen`.
* `SHT_FINI_ARRAY`: Like `SHT_INIT_ARRAY`, but called at program exit time or
`dlclose` time.
* `SHT_PREINIT_ARRAY`: Like `SHT_INIT_ARRAY`, but called before any shared
libraries are initialized. Normally shared library initializers are run
before the executable initializers. This section type may only be linked into
an executable, not into a shared library.
* `SHT_GROUP`: This is used to group related sections together, so that the
program linker may discard them as a unit when appropriate. Sections of this
type may only appear in object files. The contents of this type of section
are a flag word followed by a series of section indices.
* `SHT_SYMTAB_SHNDX`: ELF symbol table entries only provide a 16-bit field for
the section index. For a file with more than 65536 sections, a section of
this type is created. It holds one 32-bit word for each symbol. If a symbol's
section index is `SHN_XINDEX`, the real section index may be found by looking
in the `SHT_SYMTAB_SHNDX` section.
* `SHT_GNU_LIBLIST` (`0x6ffffff7`): A GNU extension used by the prelinker to
hold a list of libraries found by the prelinker.
* `SHT_GNU_verdef` (`0x6ffffffd`): A Sun and GNU extension used to hold version
definitions (I'll talk about symbol versions at some point).
* `SHT_GNU_verneed` (`0x6ffffffe`): A Sun and GNU extension used to hold
versions required from other shared libraries.
* `SHT_GNU_versym` (`0x6fffffff`): A Sun and GNU extension used to hold the
versions for each symbol.
These are the types of section flags.
* `SHF_WRITE`: Section contains writable data.
* `SHF_ALLOC`: Section contains data which should be part of the loaded program
image. For example, this would normally be set for a `SHT_PROGBITS` section
and not set for a `SHT_SYMTAB` section.
* `SHF_EXECINSTR`: Section contains executable instructions.
* `SHF_MERGE`: Section contains constants which the program linker may merge
together to save space. The compiler can use this type of section for
read-only data whose address is unimportant.
* `SHF_STRINGS`: In conjunction with `SHF_MERGE`, this means that the section
holds null terminated string constants which may be merged.
* `SHF_INFO_LINK`: This flag indicates that the info field in the section holds
a section index.
* `SHF_LINK_ORDER`: This flag tells the program linker that when it combines
sections, this section must appear in the same relative order as the section
in the link field. This can be used to ensure that address tables are built
in the expected order.
* `SHF_OS_NONCONFORMING`: If the program linker sees a section with this flag,
and does not understand the type or all other flags, then it must issue an
error.
* `SHF_GROUP`: This section appears in a group (see `SHT_GROUP`, above).
* `SHF_TLS`: This section holds TLS data.
# Linkers part 9
## Symbol Versions
A shared library provides an API. Since executables are built with a specific
set of header files and linked against a specific instance of the shared
library, it also provides an ABI. It is desirable to be able to update the
shared library independently of the executable. This permits fixing bugs in the
shared library, and it also permits the shared library and the executable to be
distributed separately. Sometimes an update to the shared library requires
changing the API, and sometimes changing the API requires changing the ABI.
When the ABI of a shared library changes, it is no longer possible to update
the shared library without updating the executable. This is unfortunate.
For example, consider the system C library and the `stat` function. When file
systems were upgraded to support 64-bit file offsets, it became necessary to
change the type of some of the fields in the stat struct. This is a change in
the ABI of `stat`. New versions of the system library should provide a `stat`
which returns 64-bit values. But old existing executables call `stat` expecting
32-bit values. This could be addressed by using complicated macros in the
system header files. But there is a better way.
The better way is symbol versions, which were introduced at Sun and extended by
the GNU tools. Every shared library may define a set of symbol versions, and
assign specific versions to each defined symbol. The versions and symbol
assignments are done by a script passed to the program linker when creating the
shared library.
When an executable or shared library A is linked against another shared library
B, and A refers to a symbol S defined in B with a specific version, the
undefined dynamic symbol reference S in A is given the version of the symbol S
in B. When the dynamic linker sees that A refers to a specific version of S, it
will link it to that specific version in B. If B later introduces a new version
of S, this will not affect A, as long as B continues to provide the old version
of S.
For example, when `stat` changes, the C library would provide two versions of
`stat`, one with the old version (e.g., `LIBC_1.0`), and one with the new
version (`LIBC_2.0`). The new version of `stat` would be marked as the default;
the program linker would use it to satisfy references to `stat` in object files.
Executables linked against the old version would require the `LIBC_1.0` version
of `stat`, and would therefore continue to work. Note that it is even possible
for both versions of `stat` to be used in a single program, accessed from
different shared libraries.
As you can see, the version effectively is part of the name of the symbol. The
biggest difference is that a shared library can define a specific version which
is used to satisfy an unversioned reference.
Versions can also be used in an object file (this is a GNU extension to the
original Sun implementation). This is useful for specifying versions without
requiring a version script. When a symbol name contains the `@` character, the
string before the `@` is the name of the symbol, and the string after the `@`
is the version. If there are two consecutive `@` characters, then this is the
default version.
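Using the `stat` example, a version script defining the version hierarchy might look roughly like this (a hypothetical sketch; real libc scripts are far more elaborate, and actually providing both `stat` implementations in one library also requires the `.symver` mechanism described above):

```
LIBC_1.0 {
  global:
    stat;
  local:
    *;        /* everything not listed is hidden */
};

LIBC_2.0 {
  global:
    stat;
} LIBC_1.0;   /* LIBC_2.0 is a descendant of LIBC_1.0 */
```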
## Relaxation
Generally the program linker does not change the contents other than applying
relocations. However, there are some optimizations which the program linker can
perform at link time. One of them is relaxation.
Relaxation is inherently processor specific. It consists of optimizing code
sequences which can become smaller or more efficient when final addresses are
known. The most common type of relaxation is for `call` instructions. A
processor like the m68k supports different PC relative `call` instructions: one
with a 16-bit offset, and one with a 32-bit offset. When calling a function
which is within range of the 16-bit offset, it is more efficient to use the
shorter instruction. The optimization of shrinking these instructions at link
time is known as relaxation.
Relaxation is applied based on relocation entries. The linker looks for
relocations which may be relaxed, and checks whether they are in range. If they
are, the linker applies the relaxation, probably shrinking the size of the
contents. The relaxation can normally only be done when the linker recognizes
the instruction being relocated. Applying a relaxation may in turn bring other
relocations within range, so relaxation is typically done in a loop until there
are no more opportunities.
When the linker relaxes a relocation in the middle of a contents, it may need
to adjust any PC relative references which cross the point of the relaxation.
Therefore, the assembler needs to generate relocation entries for all PC
relative references. When not relaxing, these relocations may not be required,
as a PC relative reference within a single contents will be valid wherever the
contents winds up. When relaxing, though, the linker needs to look through all
the other relocations that apply to the contents, and adjust the PC relative
ones where appropriate. This adjustment will simply consist of recomputing the
PC relative offset.
Of course it is also possible to apply relaxations which do not change the size
of the contents. For example, on the MIPS the position independent calling
sequence is normally to load the address of the function into the `$25`
register and then to do an indirect call through the register. When the target
of the call is within the 18-bit range of the branch-and-call instruction, it
is normally more efficient to use branch-and-call, since then the processor
does not have to wait for the load of `$25` to complete before starting the
call. This relaxation changes the instruction sequence without changing the
size.
More tomorrow. I apologize for the haphazard arrangement of these linker notes.
I'm just writing about ideas as I think of them, rather than being organized
about it. If I do collect these notes into an essay, I'll try to make them
more structured.
# Piece of PIE
Modern ELF systems can randomize the address at which shared libraries are
loaded. This is generally referred to as Address Space Layout Randomization, or
ASLR. Shared libraries are always position independent, which means that they
can be loaded at any address. Randomizing the load address makes it slightly
harder for attackers of a running program to exploit buffer overflows or
similar problems, because they have no fixed addresses that they can rely on.
ASLR is part of defense in depth: it does not by itself prevent any attacks,
but it makes it slightly more difficult for attackers to exploit certain kinds
of programming errors in a useful way beyond simply crashing the program.
Although it is straightforward to randomize the load address of a shared
library, an ELF executable is normally linked to run at a fixed address that
can not be changed. This means that attackers have a set of fixed addresses
they can rely on. Permitting the kernel to randomize the address of the
executable itself is done by generating a Position Independent Executable, or
PIE.
It turns out to be quite simple to create a PIE: a PIE is simply an executable
shared library. To make a shared library executable you just need to give it a
`PT_INTERP` segment and appropriate startup code. The startup code can be the
same as the usual executable startup code, though of course it must be compiled
to be position independent.
When compiling code to go into a shared library, you use the `-fpic` option.
When compiling code to go into a PIE, you use the `-fpie` option. Since a PIE
is just a shared library, these options are almost exactly the same. The only
difference is that since `-fpie` implies that you are building the main
executable, there is no need to support symbol interposition for defined
symbols. In a shared library, if function `f1` calls `f2`, and `f2` is globally
visible, the code has to consider the possibility that `f2` will be interposed.
Thus, the call must go through the PLT. In a PIE, `f2` can not be interposed,
so the call may be made directly, though of course still in a position
independent manner. Similarly, if the processor can do PC-relative loads and
stores, all global variables can be accessed directly rather than going through
the GOT.
Other than that ability to avoid the PLT and GOT in some cases, a PIE is really
just a shared library. The dynamic linker will ask the kernel to map it at a
random address and will then relocate it as usual.
This does imply that a PIE must be dynamically linked, in the sense of using
the dynamic linker. Since the dynamic linker and the C library are closely
intertwined, linking the PIE statically with the C library is unlikely to work
in general. It is possible to design a statically linked PIE, in which the
program relocates itself at startup time. The dynamic linker itself does this.
However, there is no general mechanism for this at present.
# Protected symbols
Now for something really controversial: what's wrong with protected symbols?
In an ELF shared library, an ordinary global symbol may be overridden if a
symbol of the same name is defined in the executable or in a shared library
which appears earlier in the runtime search path. This is called symbol
interposition. It is often used with functions such as `malloc`. A shared
library can define `malloc` and it can have code which calls `malloc`. If the
executable linked with the shared library defines `malloc` itself, then the
version in the executable will be used rather than the version in the shared
library. This permits the executable to control the memory allocation done by
the shared library, perhaps for debugging or logging purposes. In this regard,
shared libraries act much as static archives do.
This has a few consequences. One of them is that within a shared library, all
references to a global symbol must use the GOT and PLT, to make the overriding
possible. That means that all function calls and variable accesses are slightly
slower. Also, some compiler optimizations are forbidden: the compiler can not
inline a call to a global symbol, since that symbol might be overridden at run
time.
When building a shared library, you can provide a version script which
indicates that some symbols are actually not global. That can eliminate the GOT
and PLT accesses, but it does not permit the compiler optimizations, and you do
have to write that version script and keep it up to date.
When compiling code that goes into a shared library, you can set the visibility
of symbols. You can use hidden visibility, which means that the symbol is not
visible outside the shared library. You can use internal visibility, which is a
lot like hidden (I'll skip the difference here). Or you can use protected
visibility. Protected visibility means that the symbol is visible outside of
the shared library, and can be accessed as usual. However, all references from
within the shared library will use the definition in the shared library. In
other words, the symbol acts more or less as usual, but it can not be
overridden. This means that accesses to the symbol avoid the GOT and PLT, and
it permits compiler optimizations.
So, what's wrong with them? It turns out that protected symbols are slower at
dynamic link time, which means that programs which use the shared library start
up slower. This happens because of the C rule that two pointers to the same
function must compare as equal. Since protected symbols are globally visible,
you can get a pointer to a protected function in the main executable. You can
also get a pointer to that same function in the shared library, of course.
Those pointers have to be equal, or the C rule will break.
As noted, the access to the function in the shared library will not use the GOT
or PLT. The access in the main executable obviously will use the PLT. How can
we make those function pointers equal? We can't. The executable will have a
direct reference to the PLT. The shared library will have a direct reference to
the function itself. In neither case will there be a relocation for the
reference. So there is no way to make the results equal. (This can work for
some targets, but not for ones with simple function references like the x86
targets.)
So, I must have lied. The lie was that there is a case where you need to use
the GOT for a protected symbol: when compiling position independent code for a
shared library, and taking the address of a protected function, you need to use
the GOT. Unfortunately, gcc for the x86_64 target, surely the most widely used
gcc target today, gets this wrong: http://gcc.gnu.org/PR19520. This generally
reveals itself as an error report when you go to create a shared library:
"relocation R_X86_64_PC32 against protected symbol `NAME` can not be used when
making a shared object".
In any case, when the compiler gets it right, the dynamic linker has to fill in
that GOT entry. In order to make the function pointers compare as equal, it has
to fill in the entry with the address of the PLT in the executable (or the
earlier shared library). But remember, this is a protected symbol, and
protected symbols don't support symbol interposition. So the dynamic linker
must only use the PLT of the executable if the reference in the executable
refers to the definition in the shared library. That means that when the
dynamic linker sees a reloc against a protected symbol in a shared library, it
has to do another walk through the executable and earlier shared libraries to
see if any of them have a definition for the symbol, in which case the GOT
entry must not be set to that earlier PLT entry but must instead be set to the
address of the symbol in the shared library itself. This check has to be done
for every symbol in the shared library.
Those extra symbol resolution passes mean a slowdown for every program which
uses the shared library, and that is what is wrong with protected symbols.
So how do you get the compiler and linker speedups available by avoiding symbol
interpositioning? Unfortunately, you have to give your symbols hidden
visibility, which means that they can not be accessed from other modules.
Assuming you do want them to be accessed, you need to define symbol aliases for
the ones which should be publicly visible. That means that you need to use
different names for the hidden symbols. This is awkward at best. Unfortunately
I have nothing better to offer. ELF is designed to support symbol
interpositioning, and there is no very good way to avoid that without causing
other consequences.
# Version Scripts
I recently spent some time sorting through linker version script issues, so I'm
going to document what I discovered.
Linker symbol versioning was invented at Sun. The Solaris linker lets you use a
version script when you create a shared library. This script assigns versions
to specific named symbols, and defines a version hierarchy. When an executable
is linked against the shared library, the versions that it uses are recorded in
the executable. If you later try to dynamically link the executable with a
shared library which does not provide the required versions, you get a sensible
error message.
Sun's scheme (as I understand it) only permits you to add new versions and new
symbols. Once a symbol has been defined at a specific version, you can not
change that in later releases. If you change the behaviour of a symbol, you
don't change the version of the symbol itself; instead, you add a new version
to the library, even if it does not define any symbols. That is sufficient to
ensure that an executable will not be dynamically linked against a version of
the shared library which is too old.
Eric Youngdale and Ulrich Drepper introduced a more sophisticated symbol
versioning scheme in the GNU linker and the GNU/Linux dynamic linker. The GNU
linker permits symbols to have multiple versions, of which only one is the
default. These versions are specified in the object files linked together to
form the shared library. The assembler `.symver` directive is used to assign a
version to a symbol (the version is simply encoded in the name of the symbol).
This scheme permits using symbol versioning to actually change the behaviour of
a symbol; older executables will continue to use the old version. This also
permits deleting symbols, by removing the default version. The older versions
of the symbol remain but are inaccessible.
That is all fine. The problems come in with the extensions to the version
script language. First, the GNU linker permits wildcards in version scripts.
Second, the GNU linker permits symbols to match against demangled names, again
typically using wildcards. Third, the GNU linker permits the version script to
hide symbols which have explicit versions in input object files.
Every symbol can only have one version. When the linker asks for the version of
a symbol, there can only be one answer. The support for wildcards and matching
of demangled names in the GNU linker script means that there may not be a
unique answer for the version to use for a given name. The fact that the GNU
linker permits version scripts to hide symbols with explicit versions means
that in some cases you absolutely must list a symbol two times in a version
script (because you might have a `local: *;` entry which must not match your
symbol with an old version). This potential confusion means that using linker
scripts correctly with wildcards requires a clear understanding of exactly how
the linker parses a version script.
Unfortunately, this was never documented. Until now. Here are the rules which
the GNU linker uses to parse version scripts, as of 2010-01-11.
The GNU linker walks through the version tags in the order in which they appear
in the version script. For each tag, it first walks through the global patterns
for that tag, then the local patterns. When looking at a single pattern, it
first applies any language specific demangling as specified for the pattern,
and then matches the resulting symbol name against the pattern. If it finds an
exact match for a literal pattern (a pattern enclosed in quotes or containing
no wildcard characters), then that is the match it uses. If it finds a match
with a wildcard pattern, it saves the match and continues searching. Wildcard
patterns that are exactly `*` are saved separately.
If no exact match with a literal pattern is ever found, then, if a wildcard
match with a global pattern was found, it is used; otherwise, if a wildcard
match with a local pattern was found, it is used.
This is the result:
* If there is an exact match, then we use the first tag in the version script
  where it matches.
  * If the exact match in that tag is global, it is used.
  * Otherwise, the exact match in that tag is local, and it is used.
* Otherwise, if there is any match with a global wildcard pattern:
  * If there is any match with a wildcard pattern which is not `*`, then we
    use the tag in which the last such pattern appears.
  * Otherwise, we matched `*`. If there is no match with a local wildcard
    pattern which is not `*`, then we use the last match with a global `*`.
    Otherwise, continue.
* Otherwise, if there is any match with a local wildcard pattern:
  * If there is any match with a wildcard pattern which is not `*`, then we
    use the tag in which the last such pattern appears.
  * Otherwise, we matched `*`, and we use the tag in which the last such match
    occurred.
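The decision procedure just described can be modeled in Python. This is an illustrative sketch, not the linker's actual code: demangling, quoted patterns, and the `.symver` wrinkle are omitted, and the script and pattern lists are invented.

```python
import fnmatch

def is_literal(pattern):
    # In a real version script a quoted pattern is also literal; here we
    # just check for glob metacharacters.
    return not any(ch in pattern for ch in "*?[")

def match_version(name, script):
    """Return (version_tag, is_global) for name, or None.

    script is a list of (tag, patterns) pairs in file order, where
    patterns is a list of (pattern, is_global) pairs, global entries
    before local ones, as in a version script.
    """
    last_global_wild = last_local_wild = None   # non-'*' wildcard matches
    last_global_star = last_local_star = None   # bare '*' matches
    for tag, patterns in script:
        for pattern, is_global in patterns:
            if is_literal(pattern):
                if name == pattern:
                    return tag, is_global       # first exact match wins
            elif pattern == "*":                # '*' is saved separately
                if is_global:
                    last_global_star = tag
                else:
                    last_local_star = tag
            elif fnmatch.fnmatchcase(name, pattern):
                if is_global:
                    last_global_wild = tag
                else:
                    last_local_wild = tag
    # No exact match: prefer global wildcards over local ones, and
    # non-'*' wildcards over bare '*'.
    if last_global_wild is not None:
        return last_global_wild, True
    if last_global_star is not None and last_local_wild is None:
        return last_global_star, True
    if last_local_wild is not None:
        return last_local_wild, False
    if last_local_star is not None:
        return last_local_star, False
    return None

script = [
    ("VERS_1", [("foo", True), ("bar*", True), ("*", False)]),
    ("VERS_2", [("foo", True), ("b*", True)]),
]
print(match_version("foo", script))     # ('VERS_1', True): first exact match
print(match_version("barbaz", script))  # ('VERS_2', True): last non-'*' wildcard
print(match_version("quux", script))    # ('VERS_1', False): only the local '*'
```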
As mentioned above, there is an additional wrinkle. When the GNU linker finds a
symbol with a version defined in an object file due to a `.symver` directive, it
looks up that symbol name in that version tag. If it finds it, it matches the
symbol name against the patterns for that version. If there is no match with a
global pattern, but there is a match with a local pattern, then the GNU linker
marks the symbol as local.
I want gold to be compatible, but I also want gold to be efficient. Ive
introduced a hash table in gold to do fast lookups for exact matches. That
makes it impossible for gold to follow the exact rules when matching demangled
names. Currently gold does not do the final lookup to see if a symbol with an
explicit version should be forced local; I dont understand why that is useful.
It is possible that I will be forced to add that to gold at some later date.
Here are the current rules for gold:
* If there is an exact match for the mangled name, we use it.
  * If there is more than one exact match, we give a warning, and we use the
    first tag in the script which matches.
  * If a symbol has an exact match as both global and local for the same
    version tag, we give an error.
* Otherwise, we look for an `extern "C++"` or an `extern "Java"` exact match.
  If we find an exact match, we use it.
  * If there is more than one exact match, we give a warning, and we use the
    first tag in the script which matches.
  * If a symbol has an exact match as both global and local for the same
    version tag, we give an error.
* Otherwise, we look through the wildcard patterns, ignoring `*` patterns. We
  look through the version tags in reverse order. For each version tag, we
  look through the global patterns and then the local patterns. We use the
  first match we find (i.e., the last matching version tag in the file).
* Otherwise, we use the `*` pattern if there is one. We give a warning if
  there are multiple `*` patterns.
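Gold's wildcard stage can be sketched the same way (again an illustrative model with invented pattern lists, not gold's code; the exact-match hash lookups and the warnings are assumed to have already happened):

```python
import fnmatch

def is_literal(pattern):
    # No glob metacharacters means a literal (exact-match) pattern.
    return not any(ch in pattern for ch in "*?[")

def gold_wildcard_match(name, script):
    """Gold's fallback after the exact-match lookups fail (sketch).

    Tags are scanned in reverse file order; within a tag, global patterns
    come before local ones; the first non-'*' wildcard match wins, i.e.
    the last matching version tag in the file.  A bare '*' is used only
    if nothing else matches (which '*' gold picks when there are several
    is not specified here; this sketch keeps the last one in the file).
    """
    star = None
    for tag, patterns in reversed(script):
        for pattern, is_global in patterns:
            if pattern == "*":
                star = star or (tag, is_global)
            elif not is_literal(pattern) and fnmatch.fnmatchcase(name, pattern):
                return tag, is_global
    return star

script = [
    ("VERS_1", [("bar*", True), ("*", False)]),
    ("VERS_2", [("b*", True)]),
]
print(gold_wildcard_match("barbaz", script))  # ('VERS_2', True)
print(gold_wildcard_match("quux", script))    # ('VERS_1', False)
```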
I hope for your sake that this information never actually matters to you.