
# Linkers part 1

I’ve been working on and off on a new linker. To my surprise, I’ve discovered
in talking about this that some people, even some computer programmers, are
unfamiliar with the details of the linking process. I’ve decided to write some
notes about linkers, with the goal of producing an essay similar to my existing
one about the GNU configure and build system.

As I only have the time to write one thing a day, I’m going to do this on my
blog over time, and gather the final essay together later. I believe that I may
be up to five readers, and I hope y’all will accept this digression into stuff
that matters. I will return to random philosophizing and minding other people’s
business soon enough.

## A Personal Introduction

Who am I to write about linkers?

I wrote my first linker back in 1988, for the AMOS operating system which ran
on Alpha Micro systems. (If you don’t understand the following description,
don’t worry; all will be explained below). I used a single global database to
register all symbols. Object files were checked into the database after they
had been compiled. The link process mainly required identifying the object file
holding the main function. Other object files were pulled in by reference. I
reverse engineered the object file format, which was undocumented but quite
simple. The goal of all this was speed, and indeed this linker was much faster
than the system one, mainly because of the speed of the database.

I wrote my second linker in 1993 and 1994. This linker was designed and
prototyped by Steve Chamberlain while we both worked at Cygnus Support (later
Cygnus Solutions, later part of Red Hat). This was a complete reimplementation
of the BFD-based linker which Steve had written a couple of years before.
The primary targets were a.out and COFF. Again the goal was speed, especially
compared to the original BFD-based linker. On SunOS 4 this linker was almost as
fast as running the cat program on the input .o files.

The linker I am now working on, called gold, will be my third. It is
exclusively an ELF linker. Once again, the goal is speed, in this case being
faster than my second linker. That linker has been significantly slowed down
over the years by adding support for ELF and for shared libraries. This support
was patched in rather than being designed in. Future plans for the new linker
include support for incremental linking, which is another way of increasing
speed.

There is an obvious pattern here: everybody wants linkers to be faster. This is
because the job which a linker does is uninteresting. The linker is a speed
bump for a developer, a process which takes a relatively long time but adds no
real value. So why do we have linkers at all? That brings us to our next topic.

## A Technical Introduction

What does a linker do?

It’s simple: a linker converts object files into executables and shared
libraries. Let’s look at what that means. For cases where a linker is used,
the software development process consists of writing program code in some
language: e.g., C or C++ or Fortran (but typically not Java, as Java normally
works differently, using a loader rather than a linker). A compiler translates
this program code, which is human readable text, into another form of
human readable text known as assembly code. Assembly code is a readable form of
the machine language which the computer can execute directly. An assembler is
used to turn this assembly code into an object file. For completeness, I’ll
note that some compilers include an assembler internally, and produce an object
file directly. Either way, this is where things get interesting.

In the old days, when dinosaurs roamed the data centers, many programs were
complete in themselves. In those days there was generally no compiler (people
wrote directly in assembly code), and the assembler actually generated an
executable file which the machine could execute directly. As languages like
Fortran and Cobol started to appear, people began to think in terms of
libraries of subroutines, which meant that there had to be some way to run the
assembler at two different times, and combine the output into a single
executable file. This required the assembler to generate a different type of
output, which became known as an object file (I have no idea where this name
came from). And a new program was required to combine different object files
together into a single executable. This new program became known as the linker
(the source of this name should be obvious).
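
To make "combining object files" a little more concrete, here is a minimal sketch in Python. It is a toy model under loud assumptions, not a description of any real linker or object file format: the `ObjectFile` class, its `defined` and `relocs` fields, and the `link` function are all names invented for this illustration. Each toy object file carries some code, the names it defines, and a list of holes that must be filled with the address of a name defined elsewhere. The sketch lays the objects out one after another and then fills in the holes.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Toy stand-in for an object file (hypothetical, not a real format)."""
    name: str
    code: list                                   # one "word" per slot
    defined: dict = field(default_factory=dict)  # symbol -> offset within code
    relocs: list = field(default_factory=list)   # (offset in code, symbol needed)


def link(objects):
    """Combine toy object files into one flat image plus a symbol table."""
    # Pass 1: lay the objects out one after another and record the final
    # address of every symbol they define.
    symbols = {}
    address = 0
    for obj in objects:
        for sym, offset in obj.defined.items():
            if sym in symbols:
                raise ValueError(f"duplicate symbol: {sym}")
            symbols[sym] = address + offset
        address += len(obj.code)

    # Pass 2: copy each object's code and patch every recorded hole with
    # the final address of the symbol it names.
    image = []
    for obj in objects:
        chunk = list(obj.code)
        for offset, sym in obj.relocs:
            if sym not in symbols:
                raise ValueError(f"undefined symbol: {sym}")
            chunk[offset] = symbols[sym]
        image.extend(chunk)
    return image, symbols


# Two separately "assembled" pieces: a main program and a library routine.
main_o = ObjectFile(
    name="main.o",
    code=["call", None, "halt"],   # None is a hole for the address of 'square'
    defined={"main": 0},
    relocs=[(1, "square")],
)
lib_o = ObjectFile(
    name="lib.o",
    code=["mul", "ret"],
    defined={"square": 0},
)

image, symbols = link([main_o, lib_o])
print(image)    # ['call', 3, 'halt', 'mul', 'ret']
print(symbols)  # {'main': 0, 'square': 3}
```

The point of the sketch is only that the main program can be assembled without knowing where `square` will end up; that decision is deferred until the pieces are combined. Leaving `lib_o` out of the list makes `link` fail with an undefined symbol error, which is roughly the complaint a real linker makes in the same situation.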

Linkers still do the same job today. In the decades since, one new
feature has been added: shared libraries.

More tomorrow.