2021-01-12 20:17:52 +00:00
# Linkers part 15
## COMDAT sections
In C++ there are several constructs which do not clearly live in a single
place. Examples are inline functions defined in a header file, virtual tables,
and typeinfo objects. There must be only a single instance of each of these
constructs in the final linked program (actually we could probably get away
with multiple copies of a virtual table, but the others must be unique since it
is possible to take their address). Unfortunately, there is not necessarily a
single object file in which they should be generated. These types of constructs
are sometimes described as having vague linkage.
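
As a concrete illustration of the constructs listed above, here is a minimal sketch of what a shared header might contain. All names (`square`, `Shape`, `Triangle`, and the helper functions) are hypothetical and chosen only for this example; which object file ends up holding the out-of-line copy of each entity is left to the compiler and linker.

```cpp
// Hypothetical header contents; every name here is made up for illustration.
// Imagine these definitions living in a header included by many source files.
#include <cassert>
#include <typeinfo>

// An inline function defined in a header. Any translation unit that needs an
// out-of-line copy (for example because it takes the address) may emit one.
inline int square(int x) { return x * x; }

// A polymorphic class whose virtual functions are all defined inline, so no
// single object file is the obvious home for its virtual table or typeinfo.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// Taking the address of square means an out-of-line copy must exist, and the
// address has to be the same no matter which source file asks for it.
using IntFn = int (*)(int);
inline IntFn square_address() { return &square; }

// The typeinfo object for Triangle has the same uniqueness requirement.
inline const std::type_info& triangle_type() { return typeid(Triangle); }

// Small self-check: the program must observe exactly one square and one
// typeinfo object for Triangle.
inline void check_vague_linkage() {
    assert(square_address() == &square);
    Triangle t;
    const Shape& s = t;  // typeid on a polymorphic reference uses the dynamic type
    assert(typeid(s) == triangle_type());
}
```

If two source files both include this header and both call `square_address()`, each object file may carry its own copy of `square`, yet the two calls must return the same pointer in the linked program.
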

Linkers implement these features by using *COMDAT* sections (there may be other
approaches, but this is the only one I know of). COMDAT sections are a special type
of section. Each COMDAT section has a special string. When the linker sees
multiple COMDAT sections with the same special string, it will only keep one of
them.

For example, when the C++ compiler sees an inline function `f1` defined in a
header file, but the compiler is unable to inline the function in all uses
(perhaps because something takes the address of the function), the compiler
will emit `f1` in a COMDAT section associated with the string `f1`. After the
linker sees a COMDAT section `f1`, it will discard all subsequent `f1` COMDAT
sections.
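
The keep-the-first rule described above can be sketched as a toy model. This is not how any real linker is structured; the `Section` type, its fields, and `link_keep_first` are assumptions made up for this example.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model of an input section; not any real object file format.
struct Section {
    std::string object_file;  // which input file the section came from
    std::string comdat_key;   // the "special string"; empty for ordinary sections
    std::string contents;     // stand-in for the section's bytes
};

// Keep the first COMDAT section seen for each key and drop later ones.
// Ordinary sections (empty key) are always kept.
std::vector<Section> link_keep_first(const std::vector<Section>& inputs) {
    std::unordered_set<std::string> seen;
    std::vector<Section> kept;
    for (const Section& s : inputs) {
        // insert().second is true only the first time a key is seen.
        if (s.comdat_key.empty() || seen.insert(s.comdat_key).second) {
            kept.push_back(s);
        }
    }
    assert(kept.size() <= inputs.size());  // linking never adds sections
    return kept;
}
```

Given sections keyed `f1` from two hypothetical inputs `a.o` and `b.o`, in that order, only the copy from `a.o` survives; the copy from `b.o` is dropped without its contents ever being examined.
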

This obviously raises the possibility that there will be two entirely different
inline functions named `f1`, defined in different header files. This would be
an invalid C++ program, violating the One Definition Rule (often abbreviated
ODR). Unfortunately, if no source file included both header files, the
compiler would be unable to diagnose the error. And, unfortunately, the linker
would simply discard the duplicate COMDAT sections, and would not notice the
error either. This is an area where some improvements are needed (at least in
the GNU tools; I don't know whether any other tools diagnose this error
correctly).

The Microsoft PE object file format provides COMDAT sections. These sections
can be marked so that duplicate COMDAT sections which do not have identical
contents cause an error. That is not as helpful as it seems, as different
compiler options may cause valid duplicates to have different contents. The
string associated with a COMDAT section is stored in the symbol table.
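
To make the trade-off concrete, here is a toy model of two duplicate-handling policies. The names `Selection::Any` and `Selection::ExactMatch` only loosely mirror the selection types I understand PE/COFF to offer; the `Comdat` type and `link_comdats` are hypothetical, and the contents are plain strings rather than real section bytes.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// How a duplicate COMDAT section should be treated (toy model).
enum class Selection {
    Any,         // keep one copy, silently discard the rest
    ExactMatch   // keep one copy, but complain if a duplicate differs
};

struct Comdat {
    std::string key;       // the string associated with the COMDAT section
    std::string contents;  // stand-in for the section's bytes
    Selection selection;
};

// Returns the contents kept for each key. Throws std::runtime_error when a
// duplicate marked ExactMatch does not match the copy already kept.
std::map<std::string, std::string> link_comdats(const std::vector<Comdat>& inputs) {
    std::map<std::string, std::string> kept;
    for (const Comdat& c : inputs) {
        assert(!c.key.empty());  // every COMDAT section needs a key
        auto result = kept.emplace(c.key, c.contents);
        if (result.second) continue;  // first definition seen: keep it
        if (c.selection == Selection::ExactMatch &&
            result.first->second != c.contents) {
            throw std::runtime_error("COMDAT '" + c.key +
                                     "': duplicate contents differ");
        }
        // Otherwise the duplicate is discarded, even if its contents differ.
    }
    return kept;
}
```

Under `Any`, two unrelated functions that both happen to be named `f1` link without complaint and the second one vanishes, which is the undiagnosed ODR violation. Under `ExactMatch`, the same valid function compiled once with and once without optimization would be rejected, which is why the stricter check is less helpful than it first appears.
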

Before I learned about the Microsoft PE format, I introduced a different type
of COMDAT section into the GNU ELF linker, following a suggestion from Jason
Merrill. Any section whose name starts with `.gnu.linkonce.` is a COMDAT
section. The associated string is simply the section name itself. Thus the
inline function `f1` would be put into the section `.gnu.linkonce.f1`. This
simple implementation works well enough, but it has a flaw in that some
functions require data in multiple sections; e.g., the instructions may be in
one section and associated static data may be in another section. Since
different instances of the inline function may be compiled differently, the
linker cannot reliably and consistently discard duplicate data (I don't know
how the Microsoft linker handles this problem).
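
The flaw can be sketched with a toy model in which each section is deduplicated purely by its own name. The object file names `a.o` and `b.o`, the data section name `.gnu.linkonce.f1.data`, and the functions below are hypothetical and exist only for this example; the real linker is considerably more involved.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model of an input section, identified only by file and name.
struct Section {
    std::string object_file;
    std::string name;
};

// True if the section name marks the section as link-once.
bool is_linkonce(const std::string& name) {
    const std::string prefix = ".gnu.linkonce.";
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Each link-once section is kept or dropped on its own: the section name is
// the key, and the first section seen with a given name wins.
std::vector<Section> link_linkonce(const std::vector<Section>& inputs) {
    std::unordered_set<std::string> seen;
    std::vector<Section> kept;
    for (const Section& s : inputs) {
        if (!is_linkonce(s.name) || seen.insert(s.name).second) {
            kept.push_back(s);
        }
    }
    assert(kept.size() <= inputs.size());  // linking never adds sections
    return kept;
}
```

Suppose `a.o` compiled `f1` into the single section `.gnu.linkonce.f1`, while `b.o`, built with different options, emitted both `.gnu.linkonce.f1` and a data section `.gnu.linkonce.f1.data`. Linking `a.o` before `b.o` keeps the code from `a.o` and the data from `b.o`: the code from `b.o` that used that data is gone, but its data stays behind because nothing ties the two sections together.
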

Recent versions of ELF introduce section groups. These implement an officially
sanctioned version of COMDAT in ELF, and avoid the problem of `.gnu.linkonce`
sections. I described these briefly in an earlier blog entry. A special section
of type `SHT_GROUP` contains a list of section indices in the group. The group
is retained or discarded as a whole. The string associated with the group is
found in the symbol table. Putting the string in the symbol table makes it
awkward to retrieve, but since the string is generally the name of a symbol it
means that the string only needs to be stored once in the object file; this is
a minor optimization for C++ in which symbol names may be very long.
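
A minimal sketch of the all-or-nothing behavior, under stated assumptions: the contents of a group section are modeled as an array of 32-bit words, a flag word followed by the member section indices, with `GRP_COMDAT` equal to `0x1`, which is my reading of the ELF gABI. The `Group` and `KeptSection` types and `link_groups` are hypothetical, and the signature is assumed to have already been looked up in the symbol table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Flag value assumed from the ELF gABI: the group is a COMDAT group.
constexpr std::uint32_t GRP_COMDAT = 0x1;

// Toy model of one section group, with its signature already resolved.
struct Group {
    std::string object_file;
    std::string signature;             // name of the group's signature symbol
    std::vector<std::uint32_t> words;  // flag word, then member section indices
};

struct KeptSection {
    std::string object_file;
    std::uint32_t index;  // section index within that object file
};

// Keep or discard each COMDAT group as a whole, keyed by its signature.
std::vector<KeptSection> link_groups(const std::vector<Group>& groups) {
    std::unordered_set<std::string> seen;
    std::vector<KeptSection> kept;
    for (const Group& g : groups) {
        assert(!g.words.empty());  // the flag word must be present
        const bool comdat = (g.words[0] & GRP_COMDAT) != 0;
        if (comdat && !seen.insert(g.signature).second) {
            continue;  // duplicate signature: drop every member section
        }
        for (std::size_t i = 1; i < g.words.size(); ++i) {
            kept.push_back({g.object_file, g.words[i]});
        }
    }
    return kept;
}
```

If a hypothetical `a.o` has a group `f1` with one member section and `b.o` has a group `f1` with two, linking `a.o` first keeps only the single section from `a.o`. The extra section in `b.o` is discarded along with the rest of its group, so no orphaned data is left behind.
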

More tomorrow.