# Linkers part 15

## COMDAT sections

In C++ there are several constructs which do not clearly live in a single
place. Examples are inline functions defined in a header file, virtual tables,
and typeinfo objects. There must be only a single instance of each of these
constructs in the final linked program (actually we could probably get away
with multiple copies of a virtual table, but the others must be unique since it
is possible to take their address). Unfortunately, there is not necessarily a
single object file in which they should be generated. These types of constructs
are sometimes described as having vague linkage.

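As an illustration of the constructs listed above, here is a minimal sketch of what such a header might contain. The names `twice`, `Shape`, and `demo_vague_linkage` are made up for this example, and the comments describe how compilers typically behave, which is an assumption about common practice rather than something the language standard spells out.

```cpp
#include <cassert>
#include <typeinfo>

// Pretend everything down to the end of Shape lives in a header that many
// source files include.

// An inline function. A translation unit that cannot inline every use
// (for example because it takes the address) emits an out-of-line copy.
inline int twice(int x) { return 2 * x; }

// A polymorphic class whose virtual functions are all defined in the class
// body. No single source file obviously owns its virtual table or its
// typeinfo object, so each translation unit that needs them may emit them.
struct Shape {
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
};

// Taking addresses is what makes duplicates observable: a program may
// compare these pointers across translation units and expects equality.
inline void demo_vague_linkage() {
    int (*fn)(int) = &twice;                    // address of the inline function
    const std::type_info* ti = &typeid(Shape);  // address of the typeinfo object
    assert(fn(21) == 42);
    assert(*ti == typeid(Shape));
}
```

If two object files each carried their own surviving copy of `twice`, the pointer taken in one could differ from the pointer taken in the other, which is why the linker has to reduce the copies to one.
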
Linkers implement these features by using *COMDAT* sections (there may be other
approaches, but this is the only one I know of). COMDAT sections are a special
type of section. Each COMDAT section has a special string associated with it.
When the linker sees multiple COMDAT sections with the same special string, it
will only keep one of them.

For example, when the C++ compiler sees an inline function `f1` defined in a
header file, but the compiler is unable to inline the function in all uses
(perhaps because something takes the address of the function), the compiler
will emit `f1` in a COMDAT section associated with the string `f1`. After the
linker sees a COMDAT section `f1`, it will discard all subsequent `f1` COMDAT
sections.

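The keep-one rule can be modelled in a few lines. This is a toy model, not how any real linker is written: `ComdatSection`, `keep_first`, and `demo_keep_first` are hypothetical names, and a real linker works on section headers rather than pairs of strings.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an input COMDAT section.
struct ComdatSection {
    std::string key;     // the associated string, e.g. "f1"
    std::string object;  // object file that supplied it, e.g. "a.o"
};

// Keep the first section seen for each key and discard all later ones,
// preserving the order of the inputs.
std::vector<ComdatSection> keep_first(const std::vector<ComdatSection>& inputs) {
    std::unordered_set<std::string> seen;
    std::vector<ComdatSection> kept;
    for (const ComdatSection& s : inputs) {
        // insert() reports whether the key was new.
        if (seen.insert(s.key).second) {
            kept.push_back(s);
        }
    }
    return kept;
}

void demo_keep_first() {
    std::vector<ComdatSection> inputs;
    inputs.push_back(ComdatSection{"f1", "a.o"});
    inputs.push_back(ComdatSection{"f2", "a.o"});
    inputs.push_back(ComdatSection{"f1", "b.o"});  // duplicate key

    std::vector<ComdatSection> kept = keep_first(inputs);
    assert(kept.size() == 2);
    assert(kept[0].key == "f1" && kept[0].object == "a.o");
    assert(kept[1].key == "f2");
}
```

Note that the model never looks at what the sections contain. Only the string decides which copy survives.
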
This obviously raises the possibility that there will be two entirely different
inline functions named `f1`, defined in different header files. This would be
an invalid C++ program, violating the One Definition Rule (often abbreviated
ODR). Unfortunately, if no source file included both header files, the
compiler would be unable to diagnose the error. And, unfortunately, the linker
would simply discard the duplicate COMDAT sections, and would not notice the
error either. This is an area where some improvements are needed (at least in
the GNU tools; I don’t know whether any other tools diagnose this error
correctly).

The Microsoft PE object file format provides COMDAT sections. These sections
can be marked so that duplicate COMDAT sections which do not have identical
contents cause an error. That is not as helpful as it seems, as different
compiler options may cause valid duplicates to have different contents. The
string associated with a COMDAT section is stored in the symbol table.

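To see why an identical-contents check is less helpful than it sounds, here is a toy version of that policy. `Section`, `has_mismatch`, and `demo_exact_match` are hypothetical names, and the strings merely stand in for machine code. Nothing here reflects the actual PE data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a COMDAT section: its key and its raw bytes.
struct Section {
    std::string key;
    std::string contents;
};

// Returns true if any key appears twice with different contents, which is
// the condition an identical-contents policy would report as an error.
bool has_mismatch(const std::vector<Section>& inputs) {
    std::map<std::string, std::string> first_seen;
    for (const Section& s : inputs) {
        std::map<std::string, std::string>::iterator it = first_seen.find(s.key);
        if (it == first_seen.end()) {
            first_seen[s.key] = s.contents;  // first copy: remember it
        } else if (it->second != s.contents) {
            return true;                     // duplicate with other bytes
        }
    }
    return false;
}

void demo_exact_match() {
    // The same inline function built with different options: a valid
    // program, yet the bytes differ, so the check fires anyway.
    std::vector<Section> valid;
    valid.push_back(Section{"f1", "code built without optimization"});
    valid.push_back(Section{"f1", "code built with optimization"});
    assert(has_mismatch(valid));

    // Byte-identical duplicates pass the check.
    std::vector<Section> same;
    same.push_back(Section{"f1", "code"});
    same.push_back(Section{"f1", "code"});
    assert(!has_mismatch(same));
}
```

The check cannot tell a genuine ODR violation from the same function compiled two ways, since both show up as differing bytes under one key.
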
Before I learned about the Microsoft PE format, I introduced a different type
of COMDAT section into the GNU ELF linker, following a suggestion from Jason
Merrill. Any section whose name starts with `.gnu.linkonce.` is a COMDAT
section. The associated string is simply the section name itself. Thus the
inline function `f1` would be put into the section `.gnu.linkonce.f1`. This
simple implementation works well enough, but it has a flaw in that some
functions require data in multiple sections; e.g., the instructions may be in
one section and associated static data may be in another section. Since
different instances of the inline function may be compiled differently, the
linker cannot reliably and consistently discard duplicate data (I don’t know
how the Microsoft linker handles this problem).

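One way the multi-section flaw can show up is sketched below with a toy model. Everything here is an assumption made for illustration: the `InputSection` type, the `link_sections` function, and in particular the section name `.gnu.linkonce.f1.data`, which is invented to stand for the static data belonging to `f1` and is not meant to match the names a real compiler would choose.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an input section.
struct InputSection {
    std::string name;    // section name, which doubles as the COMDAT string
    std::string object;  // object file that supplied it
};

// True if the name starts with the ".gnu.linkonce." prefix.
bool is_linkonce(const std::string& name) {
    const std::string prefix = ".gnu.linkonce.";
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Keep every ordinary section, and the first linkonce section of each name.
std::vector<InputSection> link_sections(const std::vector<InputSection>& inputs) {
    std::set<std::string> seen;
    std::vector<InputSection> kept;
    for (const InputSection& s : inputs) {
        if (!is_linkonce(s.name) || seen.insert(s.name).second) {
            kept.push_back(s);
        }
    }
    return kept;
}

void demo_linkonce_flaw() {
    // b.o compiled f1 without needing static data; a.o compiled it with.
    std::vector<InputSection> inputs;
    inputs.push_back(InputSection{".gnu.linkonce.f1", "b.o"});
    inputs.push_back(InputSection{".gnu.linkonce.f1", "a.o"});
    inputs.push_back(InputSection{".gnu.linkonce.f1.data", "a.o"});

    std::vector<InputSection> out = link_sections(inputs);
    // The code comes from b.o, but a.o's data survives because no earlier
    // section had that name: the output mixes two different compilations.
    assert(out.size() == 2);
    assert(out[0].object == "b.o");
    assert(out[1].name == ".gnu.linkonce.f1.data" && out[1].object == "a.o");
}
```

Because each section is judged on its own name, the linker has no way to know that the two sections from `a.o` belong together and should be kept or dropped as a unit.
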
Recent versions of ELF introduce section groups. These implement an officially
sanctioned version of COMDAT in ELF, and avoid the problem of `.gnu.linkonce`
sections. I described these briefly in an earlier blog entry. A special section
of type `SHT_GROUP` contains a list of section indices in the group. The group
is retained or discarded as a whole. The string associated with the group is
found in the symbol table. Putting the string in the symbol table makes it
awkward to retrieve, but since the string is generally the name of a symbol it
means that the string only needs to be stored once in the object file; this is
a minor optimization for C++ in which symbol names may be very long.

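Here is a sketch of how the contents of an `SHT_GROUP` section might be decoded. In addition to the list of section indices, the ELF specification puts a flag word in front of the list, and `GRP_COMDAT` is the flag that requests COMDAT behaviour. The `Group` type and the `decode_group` function are hypothetical, the constant is defined locally rather than taken from `<elf.h>`, and the input is assumed to have been converted to host byte order already.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Value from the ELF specification, defined locally so the sketch does not
// depend on a system <elf.h>.
const std::uint32_t kGrpComdat = 0x1;  // GRP_COMDAT

// Hypothetical decoded form of a group section.
struct Group {
    bool comdat;                         // was GRP_COMDAT set?
    std::vector<std::uint32_t> members;  // section header indices
};

// Decode the data of an SHT_GROUP section: one flag word followed by the
// section header indices of the members. Assumes host byte order.
Group decode_group(const std::vector<std::uint32_t>& words) {
    assert(!words.empty());  // the flag word is mandatory
    Group g;
    g.comdat = (words[0] & kGrpComdat) != 0;
    g.members.assign(words.begin() + 1, words.end());
    return g;
}
```

The group's string is not part of this data. The group section's header points at it instead: `sh_link` names the symbol table and `sh_info` the symbol whose name serves as the string. A linker that decides to drop a group then discards every section listed in `members` together.
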
More tomorrow.