# Version Scripts
I recently spent some time sorting through linker version script issues, so I'm
going to document what I discovered.

Linker symbol versioning was invented at Sun. The Solaris linker lets you use a
version script when you create a shared library. This script assigns versions
to specific named symbols, and defines a version hierarchy. When an executable
is linked against the shared library, the versions that it uses are recorded in
the executable. If you later try to dynamically link the executable with a
shared library which does not provide the required versions, you get a sensible
error message.

Sun's scheme (as I understand it) only permits you to add new versions and new
symbols. Once a symbol has been defined at a specific version, you cannot
change that in later releases. If you change the behaviour of a symbol, you
don't change the version of the symbol itself; instead you add a new version to
the library even if it does not define any symbols. That is sufficient to
ensure that an executable will not be dynamically linked against a version of
the shared library which is too old.

Eric Youngdale and Ulrich Drepper introduced a more sophisticated symbol
versioning scheme in the GNU linker and the GNU/Linux dynamic linker. The GNU
linker permits symbols to have multiple versions, of which only one is the
default. These versions are specified in the object files linked together to
form the shared library. The assembler `.symver` directive is used to assign a
version to a symbol (the version is simply encoded in the name of the symbol).
This scheme permits using symbol versioning to actually change the behaviour of
a symbol; older executables will continue to use the old version. This also
permits deleting symbols, by removing the default version. The older versions
of the symbol remain but are inaccessible.
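
As a rough illustration of how the version is encoded in the symbol name, the sketch below splits a versioned name into its parts. It assumes the GNU convention in which `name@VERSION` denotes a non-default version and `name@@VERSION` the default one, and it handles only those two forms. The helper name `split_versioned_name` and the sample symbols are hypothetical.

```python
from typing import NamedTuple, Optional


class VersionedName(NamedTuple):
    name: str
    version: Optional[str]  # None when the symbol carries no version
    is_default: bool        # True only for the name@@VERSION form


def split_versioned_name(symbol: str) -> VersionedName:
    """Split 'foo@VER' or 'foo@@VER' into (name, version, is_default)."""
    name, sep, rest = symbol.partition("@")
    if not sep:
        return VersionedName(symbol, None, False)
    if rest.startswith("@"):
        return VersionedName(name, rest[1:], True)
    return VersionedName(name, rest, False)


# Hypothetical library with two versions of foo; only VERS_2 is the default.
for sym in ["foo@VERS_1", "foo@@VERS_2", "bar"]:
    print(split_versioned_name(sym))
```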

That is all fine. The problems come in with the extensions to the version
script language. First, the GNU linker permits wildcards in version scripts.
Second, the GNU linker permits symbols to match against demangled names, again
typically using wildcards. Third, the GNU linker permits the version script to
hide symbols which have explicit versions in input object files.

Every symbol can only have one version. When the linker asks for the version of
a symbol, there can only be one answer. The support for wildcards and matching
of demangled names in GNU linker version scripts means that there may not be a
unique answer for the version to use for a given name. The fact that the GNU
linker permits version scripts to hide symbols with explicit versions means
that in some cases you absolutely must list a symbol two times in a version
script (because you might have a `local: *;` entry which must not match your
symbol with an old version). This potential confusion means that using version
scripts correctly with wildcards requires a clear understanding of exactly how
the linker parses a version script.
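
To make the ambiguity concrete, here is a small sketch that lists every pattern in a version script that a given symbol name matches. It uses Python's `fnmatch` module as a stand-in for the linker's glob matching, and ignores demangling. The script contents, tag names and the helper `candidates` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical version script, modelled as (tag, scope, pattern) triples in
# the order they would appear in the file:
#   VERS_1 { global: foo*; local: *; };
#   VERS_2 { global: *bar; } VERS_1;
SCRIPT = [
    ("VERS_1", "global", "foo*"),
    ("VERS_1", "local", "*"),
    ("VERS_2", "global", "*bar"),
]


def candidates(symbol):
    """Return every (tag, scope, pattern) entry whose pattern matches."""
    return [entry for entry in SCRIPT if fnmatchcase(symbol, entry[2])]


# foobar matches all three patterns, so the patterns alone do not give a
# unique answer; the linker needs precedence rules to pick one.
print(candidates("foobar"))
```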

Unfortunately, this was never documented. Until now. Here are the rules which
the GNU linker uses to parse version scripts, as of 2010-01-11.

The GNU linker walks through the version tags in the order in which they appear
in the version script. For each tag, it first walks through the global patterns
for that tag, then the local patterns. When looking at a single pattern, it
first applies any language specific demangling as specified for the pattern,
and then matches the resulting symbol name to the pattern. If it finds an exact
match for a literal pattern (a pattern enclosed in quotes or with no wildcard
characters), then that is the match that it uses. If it finds a match with a
wildcard pattern, then it saves it and continues searching. Wildcard patterns
that are exactly `*` are saved separately.

If no exact match with a literal pattern is ever found, then if a wildcard
match with a global pattern was found it is used, otherwise if a wildcard match
with a local pattern was found it is used.

This is the result:

* If there is an exact match, then we use the first tag in the version script
  where it matches.
  * If the exact match in that tag is global, it is used.
  * Otherwise the exact match in that tag is local, and is used.
* Otherwise, if there is any match with a global wildcard pattern:
  * If there is any match with a wildcard pattern which is not `*`, then we use
    the tag in which the last such pattern appears.
  * Otherwise, we matched `*`. If there is no match with a local wildcard
    pattern which is not `*`, then we use the last match with a global `*`.
    Otherwise, continue.
* Otherwise, if there is any match with a local wildcard pattern:
  * If there is any match with a wildcard pattern which is not `*`, then we use
    the tag in which the last such pattern appears.
  * Otherwise, we matched `*`, and we use the tag in which the last such match
    occurred.
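
The rules above can be modelled in a few lines of Python. This is only a sketch of the described behaviour, not the GNU linker's code: it skips demangling and quoted patterns, assumes the wildcard characters are `*`, `?` and `[`, and uses `fnmatch` in place of the linker's matcher. The function name `find_version` and the sample script are hypothetical.

```python
from fnmatch import fnmatchcase

WILDCARD_CHARS = set("*?[")  # assumption: these mark a pattern as a wildcard


def is_literal(pattern):
    return not (WILDCARD_CHARS & set(pattern))


def find_version(script, symbol):
    """script: list of (tag, global_patterns, local_patterns) in file order.

    Returns (tag, scope) for the chosen match, or None if nothing matches.
    """
    # For each scope, remember the last tag with a non-"*" wildcard match
    # ("glob") and, separately, the last tag with a bare "*" pattern ("star").
    last = {"global": {"glob": None, "star": None},
            "local": {"glob": None, "star": None}}
    for tag, global_patterns, local_patterns in script:
        for scope, patterns in (("global", global_patterns),
                                ("local", local_patterns)):
            for pattern in patterns:
                if is_literal(pattern):
                    if pattern == symbol:
                        return tag, scope  # first exact match wins outright
                elif pattern == "*":
                    last[scope]["star"] = tag
                elif fnmatchcase(symbol, pattern):
                    last[scope]["glob"] = tag
    if last["global"]["glob"]:
        return last["global"]["glob"], "global"
    # A global "*" only wins if no more specific local wildcard matched.
    if last["global"]["star"] and not last["local"]["glob"]:
        return last["global"]["star"], "global"
    if last["local"]["glob"]:
        return last["local"]["glob"], "local"
    if last["local"]["star"]:
        return last["local"]["star"], "local"
    return None


# Hypothetical script: VERS_1 { global: foo; f*; local: *; };
#                      VERS_2 { global: fo*; bar; local: foo; };
SCRIPT = [
    ("VERS_1", ["foo", "f*"], ["*"]),
    ("VERS_2", ["fo*", "bar"], ["foo"]),
]
print(find_version(SCRIPT, "foo"))  # exact match in the first tag
print(find_version(SCRIPT, "fob"))  # last matching global wildcard is in VERS_2
print(find_version(SCRIPT, "qux"))  # only the local "*" matches
```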

As mentioned above, there is an additional wrinkle. When the GNU linker finds a
symbol with a version defined in an object file due to a `.symver` directive, it
looks up that symbol name in that version tag. If it finds it, it matches the
symbol name against the patterns for that version. If there is no match with a
global pattern, but there is a match with a local pattern, then the GNU linker
marks the symbol as local.
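
A minimal sketch of that check follows. It assumes "finds it" means the version tag is present in the script, skips demangling, and again uses `fnmatch` as a stand-in matcher. The function name `forced_local` and the sample script are hypothetical. The example also shows why a symbol with an old version may need to be listed twice: unless `foo` is also listed as global under `VERS_1`, the `local: *;` entry there hides `foo@VERS_1`.

```python
from fnmatch import fnmatchcase


def forced_local(script, name, version):
    """script: dict mapping tag -> (global_patterns, local_patterns).

    Returns True if a symbol with an explicit version would be marked local.
    """
    if version not in script:
        return False  # tag not in the script: nothing to check
    global_patterns, local_patterns = script[version]
    if any(fnmatchcase(name, p) for p in global_patterns):
        return False  # a global match keeps the symbol visible
    return any(fnmatchcase(name, p) for p in local_patterns)


# Hypothetical script: VERS_1 { local: *; };
#                      VERS_2 { global: foo; local: *; };
SCRIPT = {
    "VERS_1": ([], ["*"]),
    "VERS_2": (["foo"], ["*"]),
}
print(forced_local(SCRIPT, "foo", "VERS_1"))  # True: only local "*" matches
print(forced_local(SCRIPT, "foo", "VERS_2"))  # False: listed as global
```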

I want gold to be compatible, but I also want gold to be efficient. I've
introduced a hash table in gold to do fast lookups for exact matches. That
makes it impossible for gold to follow the exact rules when matching demangled
names. Currently gold does not do the final lookup to see if a symbol with an
explicit version should be forced local; I don't understand why that is useful.
It is possible that I will be forced to add that to gold at some later date.

Here are the current rules for gold:

* If there is an exact match for the mangled name, we use it.
  * If there is more than one exact match, we give a warning, and we use the
    first tag in the script which matches.
  * If a symbol has an exact match as both global and local for the same
    version tag, we give an error.
* Otherwise, we look for an extern C++ or an extern Java exact match. If we
  find an exact match, we use it.
  * If there is more than one exact match, we give a warning, and we use the
    first tag in the script which matches.
  * If a symbol has an exact match as both global and local for the same
    version tag, we give an error.
* Otherwise, we look through the wildcard patterns, ignoring `*` patterns. We
  look through the version tags in reverse order. For each version tag, we look
  through the global patterns and then the local patterns. We use the first
  match we find (i.e., the last matching version tag in the file).
* Otherwise, we use the `*` pattern if there is one. We give a warning if there
  are multiple `*` patterns.
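
The gold rules can be sketched in the same style. This is a model of the rules as listed, not gold's implementation: a linear scan stands in for the hash table, the caller supplies the demangled name (one name for both C++ and Java), and `fnmatch` stands in for the real matcher. Which `*` is chosen when there are several is not stated above, so picking the first is an assumption. All names here (`gold_find_version`, `exact_lookup`, the sample entries) are hypothetical.

```python
import warnings
from fnmatch import fnmatchcase


def is_literal(pattern):
    return not any(c in pattern for c in "*?[")


def exact_lookup(entries, name, languages):
    """Find exact matches for name among patterns of the given languages."""
    hits = [(tag, scope) for tag, scope, lang, pat in entries
            if lang in languages and is_literal(pat) and pat == name]
    if not hits:
        return None
    for tag in {t for t, _ in hits}:
        if {s for t, s in hits if t == tag} == {"global", "local"}:
            raise ValueError(f"{name}: both global and local in {tag}")
    if len(hits) > 1:
        warnings.warn(f"{name}: multiple exact matches, using the first")
    return hits[0]


def gold_find_version(entries, mangled, demangled=None):
    """entries: (tag, scope, language, pattern) tuples in script order.

    language is "C" for plain patterns, "C++" or "Java" for extern blocks.
    """
    hit = exact_lookup(entries, mangled, {"C"})
    if hit is None and demangled is not None:
        hit = exact_lookup(entries, demangled, {"C++", "Java"})
    if hit is not None:
        return hit
    # Non-"*" wildcards: tags in reverse order, global before local.
    tags = list(dict.fromkeys(tag for tag, _, _, _ in entries))
    for tag in reversed(tags):
        for scope in ("global", "local"):
            for t, s, lang, pat in entries:
                if (t, s) != (tag, scope) or is_literal(pat) or pat == "*":
                    continue
                subject = mangled if lang == "C" else demangled
                if subject is not None and fnmatchcase(subject, pat):
                    return tag, scope
    stars = [(t, s) for t, s, _, pat in entries if pat == "*"]
    if len(stars) > 1:
        warnings.warn("multiple '*' patterns")
    return stars[0] if stars else None  # assumption: first "*" is used


ENTRIES = [
    ("VERS_1", "global", "C", "foo"),
    ("VERS_1", "global", "C", "f*"),
    ("VERS_1", "local", "C", "*"),
    ("VERS_2", "global", "C++", "ns::run()"),
    ("VERS_2", "global", "C", "fo*"),
]
print(gold_find_version(ENTRIES, "foo"))                        # exact, VERS_1
print(gold_find_version(ENTRIES, "_ZN2ns3runEv", "ns::run()"))  # extern C++ exact
print(gold_find_version(ENTRIES, "fob"))                        # wildcard, VERS_2
print(gold_find_version(ENTRIES, "qux"))                        # falls back to "*"
```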

I hope for your sake that this information never actually matters to you.