121 lines
6.8 KiB
Markdown
121 lines
6.8 KiB
Markdown
|
# Version Scripts
|
|||
|
|
|||
|
I recently spent some time sorting through linker version script issues, so I’m
|
|||
|
going to document what I discovered.
|
|||
|
|
|||
|
Linker symbol versioning was invented at Sun. The Solaris linker lets you use a
|
|||
|
version script when you create a shared library. This script assigns versions
|
|||
|
to specific named symbols, and defines a version hierarchy. When an executable
|
|||
|
is linked against the shared library, the versions that it uses are recorded in
|
|||
|
the executable. If you later try to dynamically link the executable with a
|
|||
|
shared library which does not provide the required versions, you get a sensible
|
|||
|
error message.
|
|||
|
|
|||
|
Sun’s scheme (as I understand it) only permits you to add new versions and new
|
|||
|
symbols. Once a symbol has been defined at a specific version, you can not
|
|||
|
change that in later releases. if you change the behaviour of a symbol, you
|
|||
|
don’t change the version of the symbol itself, instead you add a new version to
|
|||
|
the library even if it does not define any symbols. That is sufficient to
|
|||
|
ensure that an executable will not be dynamically linked against a version of
|
|||
|
the shared library which is too old.
|
|||
|
|
|||
|
Eric Youngdale and Ulrich Drepper introduced a more sophisticated symbol
|
|||
|
versioning scheme in the GNU linker and the GNU/Linux dynamic linker. The GNU
|
|||
|
linker permits symbols to have multiple versions, of which only one is the
|
|||
|
default. These versions are specified in the object files linked together to
|
|||
|
form the shared library. The assembler `.symver` directive is used to assign a
|
|||
|
version to a symbol (the version is simply encoded in the name of the symbol).
|
|||
|
This scheme permits using symbol versioning to actually change the behaviour of
|
|||
|
a symbol; older executables will continue to use the old version. This also
|
|||
|
permits deleting symbols, by removing the default version. The older versions
|
|||
|
of the symbol remain but are inaccessible.
|
|||
|
|
|||
|
That is all fine. The problems come in with the extensions to the version
|
|||
|
script language. First, the GNU linker permits wildcards in version scripts.
|
|||
|
Second, the GNU linker permits symbols to match against demangled names, again
|
|||
|
typically using wildcards. Third, the GNU linker permits the version script to
|
|||
|
hide symbols which have explicit versions in input object files.
|
|||
|
|
|||
|
Every symbol can only have one version. When the linker asks for the version of
|
|||
|
a symbol, there can only be one answer. The support for wildcards and matching
|
|||
|
of demangled names in the GNU linker script means that there may not be a
|
|||
|
unique answer for the version to use for a given name. The fact that the GNU
|
|||
|
linker permits version scripts to hide symbols with explicit versions means
|
|||
|
that in some cases you absolutely must list a symbol two times in a version
|
|||
|
script (because you might have a `local: *;` entry which must not match your
|
|||
|
symbol with an old version). This potential confusion means that using linker
|
|||
|
scripts correctly with wildcards requires a clear understanding of exactly how
|
|||
|
the linker parses a version script.
|
|||
|
|
|||
|
Unfortunately, this was never documented. Until now. Here are the rules which
|
|||
|
the GNU linker uses to parse version scripts, as of 2010-01-11.
|
|||
|
|
|||
|
The GNU linker walks through the version tags in the order in which they appear
|
|||
|
in the version script. For each tag, it first walks through the global patterns
|
|||
|
for that tag, then the local patterns. When looking at a single pattern, it
|
|||
|
first applies any language specific demangling as specified for the pattern,
|
|||
|
and then matches the resulting symbol name to the pattern. If it finds an exact
|
|||
|
match for a literal pattern (a pattern enclosed in quotes or with no wildcard
|
|||
|
characters), then that is the match that it uses. If finds a match with a
|
|||
|
wildcard pattern, then it saves it and continues searching. Wildcard patterns
|
|||
|
that are exactly “*” are saved separately.
|
|||
|
|
|||
|
If no exact match with a literal pattern is ever found, then if a wildcard
|
|||
|
match with a global pattern was found it is used, otherwise if a wildcard match
|
|||
|
with a local pattern was found it is used.
|
|||
|
|
|||
|
This is the result:
|
|||
|
|
|||
|
* If there is an exact match, then we use the first tag in the version script
|
|||
|
where it matches.
|
|||
|
* If the exact match in that tag is global, it is used.
|
|||
|
* Otherwise the exact match in that tag is local, and is used.
|
|||
|
* Otherwise, if there is any match with a global wildcard pattern:
|
|||
|
* If there is any match with a wildcard pattern which is not `*`, then we use
|
|||
|
the tag in which the last such pattern appears.
|
|||
|
* Otherwise, we matched `*`. If there is no match with a local wildcard
|
|||
|
pattern which is not `*`, then we use the last match with a global `*`.
|
|||
|
Otherwise, continue.
|
|||
|
* Otherwise, if there is any match with a local wildcard pattern:
|
|||
|
* If there is any match with a wildcard pattern which is not `*`, then we use
|
|||
|
the tag in which the last such pattern appears.
|
|||
|
* Otherwise, we matched `*`, and we use the tag in which the last such match
|
|||
|
occurred.
|
|||
|
|
|||
|
As mentioned above, there is an additional wrinkle. When the GNU linker finds a
|
|||
|
symbol with a version defined in an object file due to a `.symver` directive, it
|
|||
|
looks up that symbol name in that version tag. If it finds it, it matches the
|
|||
|
symbol name against the patterns for that version. If there is no match with a
|
|||
|
global pattern, but there is a match with a local pattern, then the GNU linker
|
|||
|
marks the symbol as local.
|
|||
|
|
|||
|
I want gold to be compatible, but I also want gold to be efficient. I’ve
|
|||
|
introduced a hash table in gold to do fast lookups for exact matches. That
|
|||
|
makes it impossible for gold to follow the exact rules when matching demangled
|
|||
|
names. Currently gold does not do the final lookup to see if a symbol with an
|
|||
|
explicit version should be forced local; I don’t understand why that is useful.
|
|||
|
It is possible that I will be forced to add that to gold at some later date.
|
|||
|
|
|||
|
Here are the current rules for gold:
|
|||
|
|
|||
|
* If there is an exact match for the mangled name, we use it.
|
|||
|
* If there is more than one exact match, we give a warning, and we use the
|
|||
|
first tag in the script which matches.
|
|||
|
* If a symbol has an exact match as both global and local for the same
|
|||
|
version tag, we give an error.
|
|||
|
* Otherwise, we look for an extern C++ or an extern Java exact match. If we
|
|||
|
find an exact match, we use it.
|
|||
|
* If there is more than one exact match, we give a warning, and we use the
|
|||
|
first tag in the script which matches.
|
|||
|
* If a symbol has an exact match as both global and local for the same
|
|||
|
version tag, we give an error.
|
|||
|
* Otherwise, we look through the wildcard patterns, ignoring `*` patterns. We
|
|||
|
look through the version tags in reverse order. For each version tag, we look
|
|||
|
through the global patterns and then the local patterns. We use the first
|
|||
|
match we find (i.e., the last matching version tag in the file).
|
|||
|
* Otherwise, we use the `*` pattern if there is one. We give a warning if there
|
|||
|
are multiple `*` patterns.
|
|||
|
|
|||
|
I hope for your sake that this information never actually matters to you.
|
|||
|
|