airs-notes/linkers-6.md

# Linkers part 6

So many things to talk about. Let’s go back and cover relocations in some more
detail, with some examples.

## Relocations

As I said back in part 2, a relocation is a computation to perform on the
contents. And as I said yesterday, a relocation can also direct the linker to
take other actions, like creating a PLT or GOT entry. Let’s take a closer look
at the computation.

In general a relocation has a type, a symbol, an offset into the contents, and
an addend.  From the linker’s point of view, the contents are simply an
uninterpreted series of bytes. A relocation changes those bytes as necessary to
produce the correct final executable. For example, consider the C code
`g = 0;` where `g` is a global variable. On the i386, the compiler will turn
this into an assembly language instruction, which will most likely be
`movl $0, g` (for position dependent code–position independent code would
loading the address of `g` from the GOT). Now, the `g` in the C code is a
global variable, and we all more or less know what that means. The `g` in the
assembly code is not that variable. It is a symbol which holds the address of
that variable.

The assembler does not know the address of the global variable `g`, which is
another way of saying that the assembler does not know the value of the symbol
`g`. It is the linker that is going to pick that address. So the assembler has
to tell the linker that it needs to use the address of `g` in this instruction.
The way the assembler does this is to create a relocation. We don’t use a
separate relocation type for each instruction; instead, each processor will
have a natural set of relocation types which are appropriate for the machine
architecture. Each type of relocation expresses a specific computation.

In the i386 case, the assembler will generate these bytes:

```
    c7 05 00 00 00 00 00 00 00 00
```

The `c7 05` are the instruction (movl constant to address). The first four `00`
bytes are the 32-bit constant 0. The second four `00` bytes are the address.
The assembler tells the linker to put the value of the symbol `g` into those
four bytes by generating (in this case) a `R_386_32` relocation. For this
relocation the symbol will be `g`, the offset will be to the last four bytes of
the instruction, the type will be `R_386_32`, and the addend will be 0 (in the
case of the i386 the addend is stored in the contents rather than in the
relocation itself, but this is a detail). The type `R_386_32` expresses a
specific computation, which is: put the 32-bit sum of the value of the symbol
and the addend into the offset. Since for the i386 the addend is stored in the
contents, this can also be expressed as: add the value of the symbol to the
32-bit field at the offset. When the linker performs this computation, the
address in the instruction will be the address of the global variable g.
Regardless of the details, the important point to note is that the relocation
adjusts the contents by applying a specific computation selected by the type.

An example of a simple case which does use an addend would be

```c
    char a[10]; // A global array.
    char* p = &a[1]; // In a function.
```

The assignment to p will wind up requiring a relocation for the symbol `a`.
Here the addend will be 1, so that the resulting instruction references `a + 1`
rather than `a + 0`.

To point out how relocations are processor dependent, let’s consider `g = 0;`
on a RISC processor: the PowerPC (in 32-bit mode). In this case, multiple
assembly language instructions are required:

```asm
    li 1,0 // Set register 1 to 0
    lis 9,g@ha // Load high-adjusted part of g into register 9
    stw 1,g@l(9) // Store register 1 to address in register 9 plus low adjusted part g
```

The `lis` instruction loads a value into the upper 16 bits of register 9,
setting the lower 16 bits to zero. The `stw` instruction adds a signed 16 bit
value to register 9 to form an address, and then stores the value of register 1
at that address. The `@ha` part of the operand directs the assembler to
generate a `R_PPC_ADDR16_HA` reloc. The `@l` produces a `R_PPC_ADDR16_LO`
reloc. The goal of these relocs is to compute the value of the symbol `g` and
use it as the store address.

That is enough information to determine the computations performed by these
relocs. The `R_PPC_ADDR16_HA` reloc computes
`(SYMBOL >> 16) + ((SYMBOL & 0x8000) ? 1 : 0)`. `The R_PPC_ADDR16_LO` computes
`SYMBOL & 0xffff`. The extra computation for `R_PPC_ADDR16_HA` is because the
`stw` instruction adds the signed 16-bit value, which means that if the low 16
bits appears negative we have to adjust the high 16 bits accordingly. The
offsets of the relocations are such that the 16-bit resulting values are stored
into the appropriate parts of the machine instructions.

The specific examples of relocations I’ve discussed here are ELF specific, but
the same sorts of relocations occur for any object file format.

The examples I’ve shown are for relocations which appear in an object file. As
discussed in part 4, these types of relocations may also appear in a shared
library, if they are copied there by the program linker. In ELF, there are also
specific relocation types which never appear in object files but only appear in
shared libraries or executables. These are the `JMP_SLOT`, `GLOB_DAT`, and
`RELATIVE` relocations discussed earlier. Another type of relocation which only
appears in an executable is a `COPY` relocation, which I will discuss later.

## Position Dependent Shared Libraries

I realized that in part 4 I forgot to say one of the important reasons that ELF
shared libraries use PLT and GOT tables. The idea of a shared library is to
permit mapping the same shared library into different processes. This only
works at maximum efficiency if the shared library code looks the same in each
process. If it does not look the same, then each process will need its own
private copy, and the savings in physical memory and sharing will be lost.

As discussed in part 4, when the dynamic linker loads a shared library which
contains position dependent code, it must apply a set of dynamic relocations.
Those relocations will change the code in the shared library, and it will no
longer be sharable.

The advantage of the PLT and GOT is that they move the relocations elsewhere,
to the PLT and GOT tables themselves. Those tables can then be put into a
read-write part of the shared library. This part of the shared library will be
much smaller than the code. The PLT and GOT tables will be different in each
process using the shared library, but the code will be the same.

I’ll be taking a vacation for the long weekend. My next post will most likely
be on Tuesday.