# Linkers part 15
## COMDAT sections
In C++ there are several constructs which do not clearly live in a single
place. Examples are inline functions defined in a header file, virtual tables,
and typeinfo objects. There must be only a single instance of each of these
constructs in the final linked program (actually we could probably get away
with multiple copies of a virtual table, but the others must be unique since it
is possible to take their address). Unfortunately, there is not necessarily a
single object file in which they should be generated. These types of constructs
are sometimes described as having vague linkage.
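
To make the list above concrete, here is a small sketch of the kind of header that produces such constructs. The header name `shape.h`, the class `Shape`, and the helper `address_of_f1` are made up for illustration; only `f1` echoes the name used in this post.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical contents of a header, shape.h, included by many source files.

// An inline function defined in a header: any source file that includes the
// header and cannot inline every use may need its own out-of-line copy.
inline int f1(int x) { return x + 1; }

// A class whose virtual functions are all defined inline: no single source
// file obviously owns its virtual table or its typeinfo object.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

// Taking the address of f1 means an out-of-line copy must exist somewhere,
// and every source file must agree on what that address is.
using IntFn = int (*)(int);
inline IntFn address_of_f1() { return &f1; }
```

If two source files both include this header, each object file may carry its own copy of `f1`, of the virtual table for `Shape`, and of its typeinfo object, yet `address_of_f1()` has to return the same pointer everywhere.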
Linkers implement these features by using *COMDAT* sections (there may be other
approaches, but this is the only one I know of). COMDAT sections are a special
type of section. Each COMDAT section has a special string. When the linker sees
multiple COMDAT sections with the same special string, it will only keep one of
them.
For example, when the C++ compiler sees an inline function `f1` defined in a
header file, but the compiler is unable to inline the function in all uses
(perhaps because something takes the address of the function), the compiler
will emit `f1` in a COMDAT section associated with the string `f1`. After the
linker sees a COMDAT section `f1`, it will discard all subsequent `f1` COMDAT
sections.
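
The keep-the-first, discard-the-rest behaviour can be sketched as a toy model. This is not how any real linker is written; the `Section` struct and the function `discard_duplicate_comdats` are invented for this illustration, and an empty key stands for an ordinary, non-COMDAT section.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified model of an input section.
struct Section {
    std::string object_file;  // input file the section came from
    std::string comdat_key;   // the special string; empty if not COMDAT
    std::string contents;     // stand-in for the section's bytes
};

// Keep the first COMDAT section seen for each key and discard later ones.
// Ordinary sections (empty key) are always kept.
std::vector<Section> discard_duplicate_comdats(
    const std::vector<Section>& inputs) {
    std::unordered_set<std::string> seen;
    std::vector<Section> kept;
    for (const Section& s : inputs) {
        // insert() reports whether the key was new to the set.
        if (s.comdat_key.empty() || seen.insert(s.comdat_key).second) {
            kept.push_back(s);
        }
    }
    return kept;
}
```

Given `f1` sections from `a.o` and `b.o`, in that order, the model keeps only the one from `a.o`. Note that it never looks at `contents`: a later section with the same key is dropped whether or not it matches the first.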
This obviously raises the possibility that there will be two entirely different
inline functions named `f1`, defined in different header files. This would be
an invalid C++ program, violating the One Definition Rule (often abbreviated
ODR). Unfortunately, if no source file included both header files, the
compiler would be unable to diagnose the error. And, unfortunately, the linker
would simply discard the duplicate COMDAT sections, and would not notice the
error either. This is an area where some improvements are needed (at least in
the GNU tools; I don’t know whether any other tools diagnose this error
correctly).
The Microsoft PE object file format provides COMDAT sections. These sections
can be marked so that duplicate COMDAT sections which do not have identical
contents cause an error. That is not as helpful as it seems, as different
compiler options may cause valid duplicates to have different contents. The
string associated with a COMDAT section is stored in the symbol table.
Before I learned about the Microsoft PE format, I introduced a different type
of COMDAT section into the GNU ELF linker, following a suggestion from Jason
Merrill. Any section whose name starts with `.gnu.linkonce.` is a COMDAT
section. The associated string is simply the section name itself. Thus the
inline function `f1` would be put into the section `.gnu.linkonce.f1`. This
simple implementation works well enough, but it has a flaw in that some
functions require data in multiple sections; e.g., the instructions may be in
one section and associated static data may be in another section. Since
different instances of the inline function may be compiled differently, the
linker cannot reliably and consistently discard duplicate data (I don’t know
how the Microsoft linker handles this problem).
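
The naming rule described above amounts to a prefix test on the section name. A minimal sketch, with a made-up function name and no claim to match the actual GNU linker source:

```cpp
#include <cassert>
#include <string>

// Returns true if the section name marks a COMDAT section under the
// ".gnu.linkonce." scheme described above. When it does, the associated
// string is the section name itself, so no separate lookup is needed.
bool is_linkonce_section(const std::string& section_name) {
    static const std::string prefix = ".gnu.linkonce.";
    // compare() on the first prefix.size() characters is a starts-with test
    // and is safe even when section_name is shorter than the prefix.
    return section_name.compare(0, prefix.size(), prefix) == 0;
}
```

Under this rule `.gnu.linkonce.f1` is a COMDAT section while `.text` is not. Since each section is matched purely by its own name, nothing ties the code section for a function to its associated data section, which is the flaw mentioned above.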
Recent versions of ELF introduce section groups. These implement an officially
sanctioned version of COMDAT in ELF, and avoid the problem of `.gnu.linkonce`
sections. I described these briefly in an earlier blog entry. A special section
of type `SHT_GROUP` contains a list of section indices in the group. The group
is retained or discarded as a whole. The string associated with the group is
found in the symbol table. Putting the string in the symbol table makes it
awkward to retrieve, but since the string is generally the name of a symbol it
means that the string only needs to be stored once in the object file; this is
a minor optimization for C++ in which symbol names may be very long.
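
As a rough illustration of the layout, the contents of an `SHT_GROUP` section are an array of 32-bit words: a flag word, then the section header indices of the members. The sketch below decodes such an array. The constants carry the values given in the ELF specification (`SHT_GROUP` is 17, `GRP_COMDAT` is 1), but the struct and function names are invented here, and the words are assumed to have already been read from the file in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values from the ELF specification, defined locally so that the sketch
// does not depend on <elf.h>.
constexpr std::uint32_t kShtGroup = 17;     // SHT_GROUP section type
constexpr std::uint32_t kGrpComdat = 0x1;   // GRP_COMDAT flag bit

// Hypothetical decoded view of one group section.
struct SectionGroup {
    bool is_comdat;
    std::vector<std::uint32_t> member_sections;  // section header indices
};

// True if a section header's type field marks it as a group section.
bool is_group_section(std::uint32_t sh_type) { return sh_type == kShtGroup; }

// Decode the words of a group section: words[0] is the flag word and the
// remaining words are the member section indices.
SectionGroup decode_group(const std::vector<std::uint32_t>& words) {
    SectionGroup group{false, {}};
    if (words.empty()) {
        return group;  // malformed or empty input: report an empty group
    }
    group.is_comdat = (words[0] & kGrpComdat) != 0;
    group.member_sections.assign(words.begin() + 1, words.end());
    return group;
}
```

The group's string is not part of this array. In the ELF specification the group section's header points at a symbol table entry (through its `sh_link` and `sh_info` fields), and that symbol's name serves as the string, so a real linker needs one more lookup before it can decide whether to keep or discard the members together.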
More tomorrow.