airs-notes/maskray-6.md

329 lines
12 KiB
Markdown

# GNU indirect function
UNDER CONSTRUCTION.
GNU indirect function (ifunc) is a mechanism making a direct function call
resolve to an implementation picked by a resolver. It is mainly used in glibc
but has adoption in FreeBSD.
For some performance critical functions, e.g. memcpy/memset/strcpy, glibc
provides multiple implementations optimized for different architecture levels.
The application just uses `memcpy(...)` which compiles to call memcpy. The
linker will create a PLT for `memcpy` and produce an associated special dynamic
relocation referencing the resolver symbol/address. During relocation resolving
at runtime, the return value of the resolver will be placed in the GOT entry
and the PLT entry will load the address.
## Representation
ifunc has a dedicated symbol type `STT_GNU_IFUNC` to mark it different from a
regular function (`STT_FUNC`). The value 10 is in the OS-specific range (10~12).
`readelf -s` tell you that the symbol is ifunc if OSABI is `ELFOSABI_GNU` or
`ELFOSABI_FREEBSD`.
On Linux, by default GNU as uses `ELFOSABI_NONE` (0). If ifunc is used, the OSABI
will be changed to `ELFOSABI_GNU`. Similarly, GNU ld sets the OSABI to
`ELFOSABI_GNU` if ifunc is used. gold does not do this [PR17735](https://sourceware.org/bugzilla/show_bug.cgi?id=17735).
Things are loose in LLVM. The integrated assembler and LLD do not set
`ELFOSABI_GNU`. Currently the only problem I know is the `readelf -s` display.
Everything else works fine.
### Assembler behavior
In assembly, you can assign the type `STT_GNU_IFUNC` to a symbol via
`.type foo, @gnu_indirect_function`. An ifunc symbol is typically `STB_GLOBAL`.
In the object file, `st_shndx` and `st_value` of an `STT_GNU_IFUNC` symbol
indicate the resolver. After linking, if the symbol is still `STT_GNU_IFUNC`,
its `st_value` field indicates the resolver address in the linked image.
Assemblers usually convert relocations referencing a local symbol to reference
the section symbol, but this behavior needs to be inhibited for `STT_GNU_IFUNC`.
### Example
```
cat > b.s <<e
.global ifunc
.type ifunc, @gnu_indirect_function
.set ifunc, resolver
resolver:
leaq impl(%rip), %rax
ret
impl:
movq $42, %rax
ret
e
cat > a.c <<e
int ifunc(void);
int main() { return ifunc(); }
e
cc a.c b.s
./a.out # exit code 42
```
GNU as makes transitive aliases to an `STT_GNU_IFUNC` ifunc as well.
```
.type foo,@gnu_indirect_function
.set foo, foo_resolver
.set foo2, foo
.set foo3, foo2
```
GCC and Clang support a function attribute which emits
`.type ifunc, @gnu_indirect_function; .set ifunc, resolver`:
```c
static int impl(void) { return 42; }
static void *resolver(void) { return impl; }
void *ifunc(void) __attribute__((ifunc("resolver")));
```
## Preemptible ifunc
A preemptible ifunc call is no different from a regular function call from the
linker perspective.
The linker creates a PLT entry, reserves an associated GOT entry, and emits an
`R_*_JUMP_SLOT` relocation resolving the address into the GOT entry. The PLT
code sequence is the same as a regular PLT for `STT_FUNC`.
If the ifunc is defined within the module, the symbol type in the linked image
is `STT_GNU_IFUNC`, otherwise (defined in a DSO), the symbol type is `STT_FUNC`.
The difference resides in the loader.
At runtime, the relocation resolver checks whether the `R_*_JUMP_SLOT`
relocation refers to an ifunc. If it does, instead of filling the GOT entry
with the target address, the resolver calls the target address as an indirect
function, with ABI specified additional parameters (hwcap related), and places
the return value into the GOT entry.
## Non-preemptible ifunc
The non-preemptible ifunc case is where all sorts of complexity come from.
First, the `R_*_JUMP_SLOT` relocation type cannot be used in some cases:
* A non-preemptible ifunc may not have a dynamic symbol table entry. It can be
local. It can be defined in the executable without the need to export.
* A non-local `STV_DEFAULT` symbol defined in a shared object is by default
preemptible. Using `R_*_JUMP_SLOT` for such a case will make the ifunc look
like preemptible.
Therefore a new relocation type `R_*_IRELATIVE` was introduced. There is no
associated symbol and the address indicates the resolver.
```
R_*_RELATIVE: B + A
R_*_IRELATIVE: call (B + A) as a function
R_*_JUMP_SLOT: S
```
When an `R_*_JUMP_SLOT` can be used, there is a trade-off between
`R_*_JUMP_SLOT` and `R_*_IRELATIVE`: an `R_*_JUMP_SLOT` can be lazily resolved
but needs a symbol lookup. Currently powerpc can use `R_PPC64_JMP_SLOT` in some
cases [PR27203](https://sourceware.org/bugzilla/show_bug.cgi?id=27203).
A PLT entry is needed for two reasons:
* The call sites emit instructions like call foo. We need to forward them to a
place to perform the indirection. Text relocations are usually not an option
(exception: [{ifunc-noplt}]()).
* If the ifunc is exported, we need a place to mark its canonical address.
Such PLT entries are sometimes referred to as IPLT. They are placed in the
synthetic section .iplt. In GNU ld, `.iplt` will be placed in the output
section `.plt`. In LLD, I decided that `.iplt` is better
https://reviews.llvm.org/D71520.
On many architectures (e.g. AArch64/PowerPC/x86), the PLT code sequence is the
same as a regular PLT, but it could be different.
On x86-64, the code sequence is:
```
jmp *got(%rip)
pushq $0
jmp .plt
```
Since there is no lazy binding, `pushq $0; jmp .plt` are not needed. However,
to make all PLT entries of the same shape to simplify linker implementations
and facilitate analyzers, it is find to keep it this way.
## PowerPC32 `-msecure-plt` IPLT
As a design to work around the lack of PC-relative instructions, PowerPC32 uses
multiple GOT sections, one per file in `.got2`. To support multiple GOT
pointers, the addend on each `R_PPC_PLTREL24` reloc will have the offset within
`.got2`.
`-msecure-plt` has small/large PIC differences.
* `-fpic`/`-fpie`: `R_PPC_PLTREL24 r_addend=0`. The call stub loads an address
relative to `_GLOBAL_OFFSET_TABLE_`.
* `-fPIC`/`-fPIE`: `R_PPC_PLTREL24 r_addend=0x8000`. (A partial linked object
file may have an addend larger than 0x8000.) The call stub loads an address
relative to `.got2+0x8000`.
If a non-preemptible ifunc is referenced in two object files, in
`-pie`/`-shared` mode, the two object files cannot share the same IPLT entry.
When I added non-preemptible ifunc support for PowerPC32 to LLD
https://reviews.llvm.org/D71621, I did not handle this case.
### `.rela.dyn` vs `.rela.plt`
LLD placed `R_*_IRELATIVE` in the `.rela.plt` section because many ports of GNU
ld behaved this way. While implementing ifunc for PowerPC, I noticed that GNU
ld powerpc actually places `R_*_IRELATIVE` in `.rela.dyn` and glibc powerpc
does not actually support `R_*_IRELATIVE` in `.rela.plt`. This makes a lot of
sense to me because `.rela.plt` normally just contains `R_*_JUMP_SLOT` which
can be lazily resolved. ifunc relocations need to be eagerly resolved so
`.rela.plt` was a misplace. Therefore I changed LLD to use `.rela.dyn` in
https://reviews.llvm.org/D65651.
## `__rela_iplt_start` and `__rela_iplt_end`
A statically linked position dependent executable traditionally had no dynamic
relocations.
With ifunc, these `R_*_IRELATIVE` relocations must be resolved at runtime. Such
relocations are in a magic array delimitered by `__rela_iplt_start` and
`__rela_iplt_end`. In glibc, `csu/libc-start.c` has special code processing the
relocation range.
GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in `-pie`
mode. LLD defines `__rela_iplt_start` regardless of `-no-pie`, `-pie` or
`-shared`.
In glibc, static pie uses self-relocation (`_dl_relocate_static_pie`) to take
care of `R_*_IRELATIVE`. The above magic array code is executed by static pie
as well. If `__rela_iplt_start`/`__rela_iplt_end` are defined, we will get
`0 < __rela_iplt_start < __rela_iplt_end` in `csu/libc-start.c`.
`ARCH_SETUP_IREL` will crash when resolving the first relocation which has been
processed.
I think the difference in the
`diff -u =(ld.bfd --verbose) =(ld.bfd -pie --verbose)` output is unneeded.
https://sourceware.org/pipermail/libc-alpha/2021-January/121755.html
## Address significance
A non-GOT-generating non-PLT-generating relocation referencing a
`STT_GNU_IFUNC` indicates a potential address-taken operation.
With a function attribute, the compilers knows that a symbol indicates an ifunc
and will avoid generating such relocations. With assembly such relocations may
be unavoidable.
In most cases the linker needs to convert the symbol type to `STT_FUNC` and
create a special PLT entry, which is called a "canonical PLT entry" in LLD.
References from other modules will resolve to the PLT entry to keep pointer
equality: the address taken from the defining module should match the address
taken from another module.
This approach has pros and cons:
* With a canonical PLT entry, the resolver of a symbol is called only once.
There is exactly one `R_*_IRELATIVE` relocation.
* If the relocation appears in a non-`SHF_WRITE` section, a text relocation can
be avoided.
* Relocation types which are not valid dynamic relocation types are supported.
GNU ld may error relocation `R_X86_64_PC32` against `STT_GNU_IFUNC` symbol
`ifunc` isn't supported
* References will bind to the canonical PLT entry. A function call needs to
jump to the PLT, loads the value from the GOT, then does an indirect call.
For a symbolic relocation type (a special case of absolute relocation types
where the width matches the word size) like `R_X86_64_64`, when the addend is 0
and the section has the `SHF_WRITE` flag, the linker can emit an
`R_X86_64_IRELATIVE`. https://reviews.llvm.org/D65995 dropped the case.
For the following example, GNU ld linked `a.out` calls `fff_resolver` three
times while LLD calls it once.
```c
// RUN: split-file %s %t
// RUN: clang -fuse-ld=bfd -fpic %t/dso.c -o %t/dso.so --shared
// RUN: clang -fuse-ld=bfd %t/main.c %t/dso.so -o %t/a.out
// RUN: %t/a.out
//--- dso.c
typedef void fptr(void);
extern void fff(void);
fptr *global_fptr0 = &fff;
fptr *global_fptr1 = &fff;
//--- main.c
#include <stdio.h>
static void fff_impl() { printf("fff_impl()\n"); }
static int z;
void *fff_resolver() { return (char *)&fff_impl + z++; }
__attribute__((ifunc("fff_resolver"))) void fff();
typedef void fptr(void);
fptr *local_fptr = fff;
extern fptr *global_fptr0, *global_fptr1;
int main() {
printf("local %p global0 %p global1 %p\n", local_fptr, global_fptr0, global_fptr1);
return 0;
}
```
### Relocation resolving order
`R_*_IRELATIVE` relocations are resolved eagerly. In glibc, there used to be a
problem where ifunc resolvers ran before `GL(dl_hwcap)` and `GL(dl_hwcap2)`
were set up https://sourceware.org/bugzilla/show_bug.cgi?id=27072.
For the relocation resolver, the main executable needs to be processed the last
to process `R_*_COPY`. Without ifunc, the resolving order of shared objects can
be arbitrary.
For ifunc, if the ifunc is defined in a processed module, it is fine. If the
ifunc is defined in an unprocessed module, it may crash.
For an ifunc defined in an executable, calling it from a shared object can be
problematic because the executable's relocations haven't been resolved. The
issue can be circumvented by converting the non-preemptible ifunc defined in
the executable to `STT_FUNC`. GNU ld's x86 port made the change
[PR23169](https://sourceware.org/bugzilla/show_bug.cgi?id=23169).
## `-z ifunc-noplt`
Mark Johnston introduced `-z ifunc-noplt` for FreeBSD
https://reviews.llvm.org/D61613. With this option, all relocations referencing
`STT_GNU_IFUNC` will be emitted as dynamic relocations (if `.dynsym` is
created). The canonical PLT entry will not be used.
## Miscellaneous
GNU ld has implemented a diagnostic (["i686 ifunc and non-default symbol
visibility"](https://sourceware.org/bugzilla/show_bug.cgi?id=20515)) to flag
`R_386_PC32` referencing non-default visibility ifunc in `-pie` and `-shared`
links. This diagnostic looks like the most prominent reason blocking my
proposal to use `R_386_PLT32` for `call/jump foo`. See [Copy relocations,
canonical PLT entries and protected visibility](maskray-5.md) for details.
https://sourceware.org/glibc/wiki/GNU_IFUNC misses a lot of information. There
are quite a few arch differences. I asked for clarification
https://sourceware.org/pipermail/libc-alpha/2021-January/121752.html
### Dynamic loader
In glibc, `_dl_runtime_resolver` needs to save and restore vector and floating
point registers. ifunc resolvers add another reason that `_dl_runtime_resolver`
cannot only use integer registers. (The other reasons are that ld.so has string
function calls which may use vectors and external calls to libc.so.)