more articles

2021-02-03 01:08:25 +01:00 · 2021-02-03 01:08:25 +01:00 · 589177c6c2
parent 58a5061601
commit 589177c6c2
14 changed files with 4397 additions and 0 deletions
--- a/README.md
+++ b/README.md
@ -43,3 +43,14 @@ Other articles included as well:
 * [Executable stack](executable-stack.md)
 * [Piece of PIE](piece-of-pie.md)

+Even more articles, from [MaskRay's blog](https://maskray.me/blog/):
+
+* [Stack unwinding](maskray-1.md)
+* [All about symbol versioning](maskray-2.md)
+* [C++ exception handling ABI](maskray-3.md)
+* [LLD and GNU linker incompatibilities](maskray-4.md)
+* [Copy relocations, canonical PLT entries and protected visibility](maskray-5.md)
+* [GNU indirect function](maskray-6.md)
+* [Everything I know about GNU toolchain](maskray-7.md)
+* [Metadata sections, COMDAT and `SHF_LINK_ORDER`](maskray-8.md)
+
--- a/img/eh_frame_and_monolithic_gcc_except_table.svg
+++ b/img/eh_frame_and_monolithic_gcc_except_table.svg
@ -0,0 +1,123 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.43.0 (0)
+ -->
+<!-- Title: %3 Pages: 1 -->
+<svg width="630pt" height="224pt"
+ viewBox="0.00 0.00 630.00 224.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 220)">
+<title>%3</title>
+<polygon fill="white" stroke="transparent" points="-4,4 -4,-220 626,-220 626,4 -4,4"/>
+<!-- eh_frame -->
+<g id="node1" class="node">
+<title>eh_frame</title>
+<polygon fill="none" stroke="black" points="0,-146.5 0,-215.5 622,-215.5 622,-146.5 0,-146.5"/>
+<text text-anchor="middle" x="311" y="-200.3" font-family="Times,serif" font-size="14.00">.eh_frame</text>
+<polyline fill="none" stroke="black" points="0,-192.5 622,-192.5 "/>
+<text text-anchor="middle" x="131" y="-177.3" font-family="Times,serif" font-size="14.00">FDE0</text>
+<polyline fill="none" stroke="black" points="0,-169.5 262,-169.5 "/>
+<text text-anchor="middle" x="49" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
+<polyline fill="none" stroke="black" points="98,-146.5 98,-169.5 "/>
+<text text-anchor="middle" x="148.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_personality</text>
+<polyline fill="none" stroke="black" points="199,-146.5 199,-169.5 "/>
+<text text-anchor="middle" x="230.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_lsda</text>
+<polyline fill="none" stroke="black" points="262,-146.5 262,-192.5 "/>
+<text text-anchor="middle" x="393" y="-177.3" font-family="Times,serif" font-size="14.00">FDE1</text>
+<polyline fill="none" stroke="black" points="262,-169.5 524,-169.5 "/>
+<text text-anchor="middle" x="311" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
+<polyline fill="none" stroke="black" points="360,-146.5 360,-169.5 "/>
+<text text-anchor="middle" x="410.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_personality</text>
+<polyline fill="none" stroke="black" points="461,-146.5 461,-169.5 "/>
+<text text-anchor="middle" x="492.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_lsda</text>
+<polyline fill="none" stroke="black" points="524,-146.5 524,-192.5 "/>
+<text text-anchor="middle" x="573" y="-177.3" font-family="Times,serif" font-size="14.00">FDE2</text>
+<polyline fill="none" stroke="black" points="524,-169.5 622,-169.5 "/>
+<text text-anchor="middle" x="573" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
+</g>
+<!-- text_a -->
+<g id="node2" class="node">
+<title>text_a</title>
+<polygon fill="none" stroke="black" points="131.5,-0.5 131.5,-36.5 210.5,-36.5 210.5,-0.5 131.5,-0.5"/>
+<text text-anchor="middle" x="171" y="-14.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
+</g>
+<!-- eh_frame&#45;&gt;text_a -->
+<g id="edge1" class="edge">
+<title>eh_frame:loc0&#45;&gt;text_a</title>
+<path fill="none" stroke="black" d="M49,-146C49,-113.05 42.18,-99.33 62,-73 76.62,-53.58 100.2,-40.8 121.68,-32.61"/>
+<polygon fill="black" stroke="black" points="123.12,-35.82 131.37,-29.17 120.78,-29.22 123.12,-35.82"/>
+</g>
+<!-- text_b -->
+<g id="node3" class="node">
+<title>text_b</title>
+<polygon fill="none" stroke="black" points="314,-0.5 314,-36.5 394,-36.5 394,-0.5 314,-0.5"/>
+<text text-anchor="middle" x="354" y="-14.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
+</g>
+<!-- eh_frame&#45;&gt;text_b -->
+<g id="edge4" class="edge">
+<title>eh_frame:loc1&#45;&gt;text_b</title>
+<path fill="none" stroke="black" d="M311,-146C311,-112.2 360.65,-139.01 378,-110 389.9,-90.1 381.08,-64.27 370.95,-45.31"/>
+<polygon fill="black" stroke="black" points="373.96,-43.53 365.95,-36.6 367.89,-47.02 373.96,-43.53"/>
+</g>
+<!-- text_c -->
+<g id="node4" class="node">
+<title>text_c</title>
+<polygon fill="none" stroke="black" points="533.5,-73.5 533.5,-109.5 612.5,-109.5 612.5,-73.5 533.5,-73.5"/>
+<text text-anchor="middle" x="573" y="-87.8" font-family="Times,serif" font-size="14.00">.text._Z1cv</text>
+</g>
+<!-- eh_frame&#45;&gt;text_c -->
+<g id="edge7" class="edge">
+<title>eh_frame:loc2&#45;&gt;text_c</title>
+<path fill="none" stroke="black" d="M573,-146C573,-137.51 573,-128.26 573,-119.88"/>
+<polygon fill="black" stroke="black" points="576.5,-119.85 573,-109.85 569.5,-119.85 576.5,-119.85"/>
+</g>
+<!-- text_personality -->
+<g id="node5" class="node">
+<title>text_personality</title>
+<polygon fill="none" stroke="black" points="71.5,-73.5 71.5,-109.5 236.5,-109.5 236.5,-73.5 71.5,-73.5"/>
+<text text-anchor="middle" x="154" y="-87.8" font-family="Times,serif" font-size="14.00">.text.__gxx_personality_v0</text>
+</g>
+<!-- eh_frame&#45;&gt;text_personality -->
+<g id="edge2" class="edge">
+<title>eh_frame:personality0&#45;&gt;text_personality</title>
+<path fill="none" stroke="black" d="M148,-146C148,-137.46 148.76,-128.19 149.75,-119.81"/>
+<polygon fill="black" stroke="black" points="153.23,-120.16 151.07,-109.79 146.29,-119.25 153.23,-120.16"/>
+</g>
+<!-- eh_frame&#45;&gt;text_personality -->
+<g id="edge5" class="edge">
+<title>eh_frame:personality1&#45;&gt;text_personality</title>
+<path fill="none" stroke="black" d="M411,-146C411,-143.84 320.57,-125.41 246.99,-110.78"/>
+<polygon fill="black" stroke="black" points="247.22,-107.26 236.73,-108.75 245.86,-114.13 247.22,-107.26"/>
+</g>
+<!-- lsda -->
+<g id="node6" class="node">
+<title>lsda</title>
+<polygon fill="none" stroke="black" points="255,-73.5 255,-109.5 369,-109.5 369,-73.5 255,-73.5"/>
+<text text-anchor="middle" x="312" y="-87.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
+</g>
+<!-- eh_frame&#45;&gt;lsda -->
+<g id="edge3" class="edge">
+<title>eh_frame:lsda0&#45;&gt;lsda</title>
+<path fill="none" stroke="black" d="M230,-146C230,-132.86 237.48,-122.78 247.91,-115.11"/>
+<polygon fill="black" stroke="black" points="249.85,-118.02 256.4,-109.7 246.08,-112.12 249.85,-118.02"/>
+</g>
+<!-- eh_frame&#45;&gt;lsda -->
+<g id="edge6" class="edge">
+<title>eh_frame:lsda1&#45;&gt;lsda</title>
+<path fill="none" stroke="black" d="M493,-146C493,-139.84 430.6,-122.47 379.08,-109.18"/>
+<polygon fill="black" stroke="black" points="379.83,-105.76 369.27,-106.66 378.09,-112.54 379.83,-105.76"/>
+</g>
+<!-- lsda&#45;&gt;text_a -->
+<g id="edge8" class="edge">
+<title>lsda&#45;&gt;text_a</title>
+<path fill="none" stroke="black" stroke-dasharray="1,5" d="M278.23,-73.49C259.01,-63.82 234.74,-51.6 214.15,-41.23"/>
+<polygon fill="black" stroke="black" points="215.49,-37.99 204.99,-36.61 212.34,-44.24 215.49,-37.99"/>
+</g>
+<!-- lsda&#45;&gt;text_b -->
+<g id="edge9" class="edge">
+<title>lsda&#45;&gt;text_b</title>
+<path fill="none" stroke="black" stroke-dasharray="1,5" d="M322.17,-73.31C327.12,-64.94 333.18,-54.7 338.68,-45.4"/>
+<polygon fill="black" stroke="black" points="341.85,-46.92 343.93,-36.53 335.82,-43.35 341.85,-46.92"/>
+</g>
+</g>
+</svg>
--- a/img/fragmented_gcc_except_table.svg
+++ b/img/fragmented_gcc_except_table.svg
@ -0,0 +1,66 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.43.0 (0)
+ -->
+<!-- Title: %3 Pages: 1 -->
+<svg width="364pt" height="209pt"
+ viewBox="0.00 0.00 364.00 209.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 205)">
+<title>%3</title>
+<polygon fill="white" stroke="transparent" points="-4,4 -4,-205 360,-205 360,4 -4,4"/>
+<g id="clust1" class="cluster">
+<title>cluster</title>
+<polygon fill="none" stroke="black" points="8,-8 8,-157 348,-157 348,-8 8,-8"/>
+<text text-anchor="middle" x="178" y="-141.8" font-family="Times,serif" font-size="14.00">Edges represent relocations</text>
+</g>
+<!-- unused -->
+<g id="node1" class="node">
+<title>unused</title>
+<ellipse fill="none" stroke="black" cx="264" cy="-183" rx="36" ry="18"/>
+<text text-anchor="middle" x="264" y="-179.3" font-family="Times,serif" font-size="14.00">unused</text>
+</g>
+<!-- fde_a -->
+<g id="node2" class="node">
+<title>fde_a</title>
+<polygon fill="none" stroke="black" points="210,-89.5 210,-125.5 318,-125.5 318,-89.5 210,-89.5"/>
+<text text-anchor="middle" x="264" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE0</text>
+</g>
+<!-- unused&#45;&gt;fde_a -->
+<g id="edge3" class="edge">
+<title>unused&#45;&gt;fde_a</title>
+<path fill="none" stroke="black" d="M264,-164.95C264,-156.3 264,-145.57 264,-135.79"/>
+<polygon fill="black" stroke="black" points="267.5,-135.71 264,-125.71 260.5,-135.71 267.5,-135.71"/>
+</g>
+<!-- lsda_a -->
+<g id="node4" class="node">
+<title>lsda_a</title>
+<polygon fill="none" stroke="black" points="188,-16.5 188,-52.5 340,-52.5 340,-16.5 188,-16.5"/>
+<text text-anchor="middle" x="264" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
+</g>
+<!-- fde_a&#45;&gt;lsda_a -->
+<g id="edge1" class="edge">
+<title>fde_a&#45;&gt;lsda_a</title>
+<path fill="none" stroke="black" d="M264,-89.31C264,-81.29 264,-71.55 264,-62.57"/>
+<polygon fill="black" stroke="black" points="267.5,-62.53 264,-52.53 260.5,-62.53 267.5,-62.53"/>
+</g>
+<!-- fde_b -->
+<g id="node3" class="node">
+<title>fde_b</title>
+<polygon fill="none" stroke="black" points="39,-89.5 39,-125.5 147,-125.5 147,-89.5 39,-89.5"/>
+<text text-anchor="middle" x="93" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE1</text>
+</g>
+<!-- lsda_b -->
+<g id="node5" class="node">
+<title>lsda_b</title>
+<polygon fill="none" stroke="black" points="16.5,-16.5 16.5,-52.5 169.5,-52.5 169.5,-16.5 16.5,-16.5"/>
+<text text-anchor="middle" x="93" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
+</g>
+<!-- fde_b&#45;&gt;lsda_b -->
+<g id="edge2" class="edge">
+<title>fde_b&#45;&gt;lsda_b</title>
+<path fill="none" stroke="black" d="M93,-89.31C93,-81.29 93,-71.55 93,-62.57"/>
+<polygon fill="black" stroke="black" points="96.5,-62.53 93,-52.53 89.5,-62.53 96.5,-62.53"/>
+</g>
+</g>
+</svg>
--- a/img/lsda_gc.svg
+++ b/img/lsda_gc.svg
@ -0,0 +1,96 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.43.0 (0)
+ -->
+<!-- Title: %3 Pages: 1 -->
+<svg width="496pt" height="246pt"
+ viewBox="0.00 0.00 496.00 246.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 242)">
+<title>%3</title>
+<polygon fill="white" stroke="transparent" points="-4,4 -4,-242 492,-242 492,4 -4,4"/>
+<g id="clust1" class="cluster">
+<title>cluster</title>
+<polygon fill="none" stroke="black" points="8,-8 8,-230 480,-230 480,-8 8,-8"/>
+<text text-anchor="middle" x="244" y="-214.8" font-family="Times,serif" font-size="14.00">Edges represent GC references</text>
+</g>
+<!-- eh_frame -->
+<g id="node1" class="node">
+<title>eh_frame</title>
+<polygon fill="none" stroke="black" points="159.5,-162.5 159.5,-198.5 288.5,-198.5 288.5,-162.5 159.5,-162.5"/>
+<text text-anchor="middle" x="224" y="-176.8" font-family="Times,serif" font-size="14.00">.eh_frame (GC root)</text>
+</g>
+<!-- lsda -->
+<g id="node4" class="node">
+<title>lsda</title>
+<polygon fill="none" stroke="black" points="16,-89.5 16,-125.5 130,-125.5 130,-89.5 16,-89.5"/>
+<text text-anchor="middle" x="73" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
+</g>
+<!-- eh_frame&#45;&gt;lsda -->
+<g id="edge1" class="edge">
+<title>eh_frame&#45;&gt;lsda</title>
+<path fill="none" stroke="black" d="M187.83,-162.49C167.07,-152.73 140.79,-140.38 118.62,-129.95"/>
+<polygon fill="black" stroke="black" points="119.94,-126.7 109.4,-125.61 116.96,-133.04 119.94,-126.7"/>
+</g>
+<!-- lsda_a -->
+<g id="node5" class="node">
+<title>lsda_a</title>
+<polygon fill="none" stroke="black" points="148,-89.5 148,-125.5 300,-125.5 300,-89.5 148,-89.5"/>
+<text text-anchor="middle" x="224" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
+</g>
+<!-- eh_frame&#45;&gt;lsda_a -->
+<g id="edge2" class="edge">
+<title>eh_frame&#45;&gt;lsda_a</title>
+<path fill="none" stroke="black" d="M224,-162.31C224,-154.29 224,-144.55 224,-135.57"/>
+<polygon fill="black" stroke="black" points="227.5,-135.53 224,-125.53 220.5,-135.53 227.5,-135.53"/>
+</g>
+<!-- lsda_b -->
+<g id="node6" class="node">
+<title>lsda_b</title>
+<polygon fill="none" stroke="black" points="318.5,-89.5 318.5,-125.5 471.5,-125.5 471.5,-89.5 318.5,-89.5"/>
+<text text-anchor="middle" x="395" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
+</g>
+<!-- eh_frame&#45;&gt;lsda_b -->
+<g id="edge3" class="edge">
+<title>eh_frame&#45;&gt;lsda_b</title>
+<path fill="none" stroke="black" d="M264.96,-162.49C288.79,-152.6 319.03,-140.04 344.34,-129.53"/>
+<polygon fill="black" stroke="black" points="345.89,-132.68 353.78,-125.61 343.21,-126.22 345.89,-132.68"/>
+</g>
+<!-- text_a -->
+<g id="node2" class="node">
+<title>text_a</title>
+<polygon fill="none" stroke="black" points="184.5,-16.5 184.5,-52.5 263.5,-52.5 263.5,-16.5 184.5,-16.5"/>
+<text text-anchor="middle" x="224" y="-30.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
+</g>
+<!-- text_a&#45;&gt;lsda_a -->
+<g id="edge4" class="edge">
+<title>text_a&#45;&gt;lsda_a</title>
+<path fill="none" stroke="black" d="M229.86,-52.53C230.71,-60.53 230.95,-70.27 230.59,-79.25"/>
+<polygon fill="black" stroke="black" points="227.09,-79.09 229.88,-89.31 234.07,-79.58 227.09,-79.09"/>
+</g>
+<!-- text_b -->
+<g id="node3" class="node">
+<title>text_b</title>
+<polygon fill="none" stroke="black" points="355,-16.5 355,-52.5 435,-52.5 435,-16.5 355,-16.5"/>
+<text text-anchor="middle" x="395" y="-30.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
+</g>
+<!-- text_b&#45;&gt;lsda_b -->
+<g id="edge6" class="edge">
+<title>text_b&#45;&gt;lsda_b</title>
+<path fill="none" stroke="black" d="M400.86,-52.53C401.71,-60.53 401.95,-70.27 401.59,-79.25"/>
+<polygon fill="black" stroke="black" points="398.09,-79.09 400.88,-89.31 405.07,-79.58 398.09,-79.09"/>
+</g>
+<!-- lsda_a&#45;&gt;text_a -->
+<g id="edge5" class="edge">
+<title>lsda_a&#45;&gt;text_a</title>
+<path fill="none" stroke="black" d="M218.12,-89.31C217.28,-81.29 217.05,-71.55 217.42,-62.57"/>
+<polygon fill="black" stroke="black" points="220.92,-62.75 218.14,-52.53 213.94,-62.25 220.92,-62.75"/>
+</g>
+<!-- lsda_b&#45;&gt;text_b -->
+<g id="edge7" class="edge">
+<title>lsda_b&#45;&gt;text_b</title>
+<path fill="none" stroke="black" d="M389.12,-89.31C388.28,-81.29 388.05,-71.55 388.42,-62.57"/>
+<polygon fill="black" stroke="black" points="391.92,-62.75 389.14,-52.53 384.94,-62.25 391.92,-62.75"/>
+</g>
+</g>
+</svg>
--- a/img/lsda_gc_new.svg
+++ b/img/lsda_gc_new.svg
@ -0,0 +1,84 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.43.0 (0)
+ -->
+<!-- Title: %3 Pages: 1 -->
+<svg width="496pt" height="173pt"
+ viewBox="0.00 0.00 496.00 173.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 169)">
+<title>%3</title>
+<polygon fill="white" stroke="transparent" points="-4,4 -4,-169 492,-169 492,4 -4,4"/>
+<g id="clust1" class="cluster">
+<title>cluster</title>
+<polygon fill="none" stroke="black" points="8,-8 8,-157 480,-157 480,-8 8,-8"/>
+<text text-anchor="middle" x="244" y="-141.8" font-family="Times,serif" font-size="14.00">Edges represent GC references</text>
+</g>
+<!-- eh_frame -->
+<g id="node1" class="node">
+<title>eh_frame</title>
+<polygon fill="none" stroke="black" points="342.5,-89.5 342.5,-125.5 471.5,-125.5 471.5,-89.5 342.5,-89.5"/>
+<text text-anchor="middle" x="407" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame (GC root)</text>
+</g>
+<!-- lsda -->
+<g id="node4" class="node">
+<title>lsda</title>
+<polygon fill="none" stroke="black" points="358,-16.5 358,-52.5 472,-52.5 472,-16.5 358,-16.5"/>
+<text text-anchor="middle" x="415" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
+</g>
+<!-- eh_frame&#45;&gt;lsda -->
+<g id="edge1" class="edge">
+<title>eh_frame&#45;&gt;lsda</title>
+<path fill="none" stroke="black" d="M408.94,-89.31C409.84,-81.29 410.94,-71.55 411.95,-62.57"/>
+<polygon fill="black" stroke="black" points="415.44,-62.86 413.08,-52.53 408.48,-62.07 415.44,-62.86"/>
+</g>
+<!-- text_a -->
+<g id="node2" class="node">
+<title>text_a</title>
+<polygon fill="none" stroke="black" points="224.5,-89.5 224.5,-125.5 303.5,-125.5 303.5,-89.5 224.5,-89.5"/>
+<text text-anchor="middle" x="264" y="-103.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
+</g>
+<!-- lsda_a -->
+<g id="node5" class="node">
+<title>lsda_a</title>
+<polygon fill="none" stroke="black" points="188,-16.5 188,-52.5 340,-52.5 340,-16.5 188,-16.5"/>
+<text text-anchor="middle" x="264" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
+</g>
+<!-- text_a&#45;&gt;lsda_a -->
+<g id="edge2" class="edge">
+<title>text_a&#45;&gt;lsda_a</title>
+<path fill="none" stroke="black" d="M258.12,-89.31C257.28,-81.29 257.05,-71.55 257.42,-62.57"/>
+<polygon fill="black" stroke="black" points="260.92,-62.75 258.14,-52.53 253.94,-62.25 260.92,-62.75"/>
+</g>
+<!-- text_b -->
+<g id="node3" class="node">
+<title>text_b</title>
+<polygon fill="none" stroke="black" points="53,-89.5 53,-125.5 133,-125.5 133,-89.5 53,-89.5"/>
+<text text-anchor="middle" x="93" y="-103.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
+</g>
+<!-- lsda_b -->
+<g id="node6" class="node">
+<title>lsda_b</title>
+<polygon fill="none" stroke="black" points="16.5,-16.5 16.5,-52.5 169.5,-52.5 169.5,-16.5 16.5,-16.5"/>
+<text text-anchor="middle" x="93" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
+</g>
+<!-- text_b&#45;&gt;lsda_b -->
+<g id="edge4" class="edge">
+<title>text_b&#45;&gt;lsda_b</title>
+<path fill="none" stroke="black" d="M87.12,-89.31C86.28,-81.29 86.05,-71.55 86.42,-62.57"/>
+<polygon fill="black" stroke="black" points="89.92,-62.75 87.14,-52.53 82.94,-62.25 89.92,-62.75"/>
+</g>
+<!-- lsda_a&#45;&gt;text_a -->
+<g id="edge3" class="edge">
+<title>lsda_a&#45;&gt;text_a</title>
+<path fill="none" stroke="black" d="M269.86,-52.53C270.71,-60.53 270.95,-70.27 270.59,-79.25"/>
+<polygon fill="black" stroke="black" points="267.09,-79.09 269.88,-89.31 274.07,-79.58 267.09,-79.09"/>
+</g>
+<!-- lsda_b&#45;&gt;text_b -->
+<g id="edge5" class="edge">
+<title>lsda_b&#45;&gt;text_b</title>
+<path fill="none" stroke="black" d="M98.86,-52.53C99.71,-60.53 99.95,-70.27 99.59,-79.25"/>
+<polygon fill="black" stroke="black" points="96.09,-79.09 98.88,-89.31 103.07,-79.58 96.09,-79.09"/>
+</g>
+</g>
+</svg>
--- a/img/monolithic_gcc_except_table.svg
+++ b/img/monolithic_gcc_except_table.svg
@ -0,0 +1,64 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.43.0 (0)
+ -->
+<!-- Title: %3 Pages: 1 -->
+<svg width="274pt" height="219pt"
+ viewBox="0.00 0.00 274.00 219.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 215)">
+<title>%3</title>
+<polygon fill="white" stroke="transparent" points="-4,4 -4,-215 270,-215 270,4 -4,4"/>
+<g id="clust1" class="cluster">
+<title>cluster</title>
+<polygon fill="none" stroke="black" points="8,-8 8,-167 258,-167 258,-8 8,-8"/>
+<text text-anchor="middle" x="133" y="-151.8" font-family="Times,serif" font-size="14.00">Edges represent relocations</text>
+</g>
+<!-- unused -->
+<g id="node1" class="node">
+<title>unused</title>
+<ellipse fill="none" stroke="black" cx="70" cy="-193" rx="36" ry="18"/>
+<text text-anchor="middle" x="70" y="-189.3" font-family="Times,serif" font-size="14.00">unused</text>
+</g>
+<!-- fde_a -->
+<g id="node2" class="node">
+<title>fde_a</title>
+<polygon fill="none" stroke="black" points="16,-99.5 16,-135.5 124,-135.5 124,-99.5 16,-99.5"/>
+<text text-anchor="middle" x="70" y="-113.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE0</text>
+</g>
+<!-- unused&#45;&gt;fde_a -->
+<g id="edge3" class="edge">
+<title>unused&#45;&gt;fde_a</title>
+<path fill="none" stroke="black" d="M70,-174.95C70,-166.3 70,-155.57 70,-145.79"/>
+<polygon fill="black" stroke="black" points="73.5,-145.71 70,-135.71 66.5,-145.71 73.5,-145.71"/>
+</g>
+<!-- lsda -->
+<g id="node4" class="node">
+<title>lsda</title>
+<polygon fill="none" stroke="black" points="76,-16.5 76,-62.5 190,-62.5 190,-16.5 76,-16.5"/>
+<text text-anchor="middle" x="133" y="-47.3" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
+<polyline fill="none" stroke="black" points="76,-39.5 190,-39.5 "/>
+<text text-anchor="middle" x="104" y="-24.3" font-family="Times,serif" font-size="14.00">lsda_a</text>
+<polyline fill="none" stroke="black" points="132,-16.5 132,-39.5 "/>
+<text text-anchor="middle" x="161" y="-24.3" font-family="Times,serif" font-size="14.00">lsda_b</text>
+</g>
+<!-- fde_a&#45;&gt;lsda -->
+<g id="edge1" class="edge">
+<title>fde_a&#45;&gt;lsda:a</title>
+<path fill="none" stroke="black" d="M64.21,-99.34C57.5,-77.11 49.26,-40.03 65.15,-30.04"/>
+<polygon fill="black" stroke="black" points="66.19,-33.39 75,-27.5 64.44,-26.61 66.19,-33.39"/>
+</g>
+<!-- fde_b -->
+<g id="node3" class="node">
+<title>fde_b</title>
+<polygon fill="none" stroke="black" points="142,-99.5 142,-135.5 250,-135.5 250,-99.5 142,-99.5"/>
+<text text-anchor="middle" x="196" y="-113.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE1</text>
+</g>
+<!-- fde_b&#45;&gt;lsda -->
+<g id="edge2" class="edge">
+<title>fde_b&#45;&gt;lsda:b</title>
+<path fill="none" stroke="black" d="M201.79,-99.34C208.5,-77.11 216.74,-40.03 200.85,-30.04"/>
+<polygon fill="black" stroke="black" points="201.56,-26.61 191,-27.5 199.81,-33.39 201.56,-26.61"/>
+</g>
+</g>
+</svg>
--- a/maskray-1.md
+++ b/maskray-1.md
@ -0,0 +1,708 @@
+# Stack unwinding
+
+The main uses of stack unwinding are:
+
+* To obtain a stack trace for a debugger, crash reporter, profiler, garbage
+  collector, etc.
+* Together with personality routines and language-specific data areas, to
+  implement C++ exceptions (Itanium C++ ABI). See
+  [C++ exception handling ABI](maskray-3.md)
+
+Stack unwinding tasks can be divided into two categories:
+
+* synchronous: triggered by the program itself, e.g. a C++ `throw` or a request
+  for its own stack trace. This kind of unwinding only happens at function
+  call sites (inside the function body, never in the prologue/epilogue)
+* asynchronous: triggered by a garbage collector, a signal or an external
+  program. This kind of unwinding can also happen in a function
+  prologue/epilogue
+
+## Frame pointer
+
+The classic and simplest way to unwind the stack is based on the frame pointer:
+reserve a register as the frame pointer (RBP on x86-64), save the caller's
+frame pointer in the stack frame in the function prologue, and then set the
+frame pointer to the address of that saved value. The frame pointer and its
+saved values on the stack form a singly linked list. After obtaining the
+initial frame pointer value (`__builtin_frame_address`), dereference the frame
+pointer repeatedly to get the frame pointer values of all stack frames. This
+method does not work at some instructions in the prologue/epilogue, where the
+frame pointer still (or again) references the previous frame.
+
+```
+pushq %rbp
+movq %rsp, %rbp # after this, RBP references the current frame
+...
+popq %rbp
+retq  # RBP references the previous frame
+```
+
+```c
+#include <stdio.h>
+[[gnu::noinline]] void qux() {
+  void **fp = __builtin_frame_address(0);
+  for (;;) {
+    printf("%p\n", fp);
+    void **next_fp = *fp;
+    if (next_fp <= fp) break;
+    fp = next_fp;
+  }
+}
+[[gnu::noinline]] void bar() { qux(); }
+[[gnu::noinline]] void foo() { bar(); }
+int main() { foo(); }
+```
+
+The frame pointer-based method is simple, but has several drawbacks.
+
+When the above code is compiled with `-O1` or above, the calls in foo and bar
+become tail calls, so the program output does not include the stack frames of
+foo and bar (`-fomit-leaf-frame-pointer` does not hinder the tail call).
+
+In practice, it is not guaranteed that all libraries maintain frame pointers.
+When unwinding a thread, it is necessary to check whether `next_fp` looks like
+a stack address before dereferencing it, to prevent segfaults. One way to check
+page accessibility is to parse `/proc/*/maps` to determine whether the address
+is readable (slow). There is a smart trick:
+
+```c
+// Or use the write end of a pipe.
+int fd = open("/dev/random", O_WRONLY);
+if (write(fd, address, 1) < 0) {
+  // not readable: write() fails (EFAULT) when the kernel cannot read `address`
+}
+```
+
+In addition, reserving a register for the frame pointer increases text size
+and hurts performance (extra prologue/epilogue instructions, plus register
+pressure from having one fewer register). The cost may be quite significant
+on x86-32, which lacks registers. Even on an architecture with relatively
+sufficient registers, e.g. x86-64, the performance loss can be more than 1%.
+
+### Compiler behavior
+
+* `-O0`: defaults to `-fno-omit-frame-pointer`; all functions have a frame
+  pointer
+* `-O1` or above: defaults to `-fomit-frame-pointer`; a frame pointer is set up
+  only if necessary. Specify `-fno-omit-frame-pointer` to get a similar effect
+  to `-O0`. You can additionally specify `-momit-leaf-frame-pointer` to remove
+  the frame pointer of leaf functions
+
+## libunwind
+
+C++ exceptions and the stack unwinding done by profilers/crash reporters
+usually use the libunwind API and DWARF Call Frame Information. In the 1990s,
+Hewlett-Packard defined the libunwind API, which is divided into two
+categories:
+
+* `unw_*`: The entry points are `unw_init_local` (local unwinding, current
+  process) and `unw_init_remote` (remote unwinding, other processes).
+  Applications that use libunwind directly usually use this API. For example,
+  Linux perf calls `unw_init_remote`
+* `_Unwind_*`: This part is standardized as Level 1: Base ABI of [Itanium C++
+  ABI: Exception Handling](https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html).
+  The Level 2 C++ ABI calls these `_Unwind_*` APIs. Among them, `_Unwind_Resume`
+  is the only API that is directly called by C++ compiled code.
+  `_Unwind_Backtrace` is used by a few applications to obtain stack traces. Other
+  functions are called by libsupc++/libc++abi `__cxa_*` functions and
+  `__gxx_personality_v0`.
+
+Hewlett-Packard has open sourced its implementation at
+https://www.nongnu.org/libunwind/ (there are also many other projects called
+"libunwind"). The common implementations of this API on Linux are:
+
+* libgcc/unwind-\* (`libgcc_s.so.1` or `libgcc_eh.a`): implements `_Unwind_*`
+  and adds some extensions: `_Unwind_Resume_or_Rethrow`,
+  `_Unwind_FindEnclosingFunction`, `__register_frame` etc.
+* llvm-project/libunwind (`libunwind.so` or `libunwind.a`) is a simplified
+  implementation of the HP API, which provides part of `unw_*`, but does not
+  implement `unw_init_remote`. Part of the code is taken from ld64. If you use
+  Clang, you can select it with `--rtlib=compiler-rt --unwindlib=libunwind`
+* glibc's internal implementation of `_Unwind_Find_FDE`, usually not exported,
+  and related to `__register_frame_info`
+
+## DWARF Call Frame Information
+
+The unwind instructions required by different areas of the program are
+described by DWARF Call Frame Information (CFI) and stored in `.eh_frame` on
+ELF platforms. The compiler, assembler, linker and libunwind all provide
+corresponding support.
+
+`.eh_frame` consists of Common Information Entries (CIEs) and Frame Description
+Entries (FDEs). A CIE has these fields:
+
+* `length`
+* `CIE_id`: Constant 0. This field is used to distinguish CIE and FDE. In FDE,
+  this field is non-zero, representing `CIE_pointer`
+* `version`: Constant 1
+* `augmentation_string`: A string describing the CIE/FDE parameter list. The `P`
+  character indicates the personality routine pointer; the `L` character
+  indicates that the augmentation data of the FDE stores the language-specific
+  data area (LSDA)
+* `address_size`: Generally 4 or 8
+* `segment_selector_size`: For x86
+* `code_alignment_factor`: On architectures where every instruction length is
+  a multiple of 2 or 4 (e.g. RISC), this factor is the multiplier applied to
+  operands such as that of `DW_CFA_advance_loc`
+* `data_alignment_factor`: The multiplier applied to operands such as those of
+  `DW_CFA_offset` and `DW_CFA_val_offset`
+* `return_address_register`
+* `augmentation_data_length`
+* `augmentation_data`: personality
+* `initial_instructions`: bytecode for unwinding, a common prefix used by all
+  FDEs using this CIE
+* padding
+
+Each FDE has an associated CIE. An FDE has these fields:
+
+* `length`: The length of the FDE itself. If it is `0xffffffff`, the next 8
+  bytes (`extended_length`) record the actual length. Unless specially
+  constructed, `extended_length` is not used
+* `CIE_pointer`: Subtract `CIE_pointer` from the current position to get the
+  associated CIE
+* `initial_location`: The address of the first location described by the FDE.
+  There is a relocation referring to the section symbol in .o
+* `address_range`: `initial_location` and `address_range` describe an address
+  range
+* `instructions`: bytecode for unwinding, essentially (address,opcode) pairs
+* `augmentation_data_length`
+* `augmentation_data`: If the augmentation string of the associated CIE
+  contains the `L` character, a pointer to the language-specific data area is
+  recorded here
+* padding
+
+A CIE may optionally refer to a personality routine in the text section. An FDE
+may optionally refer to its associated LSDA in `.gcc_except_table`. The
+personality routine and LSDA are used in Level 2: C++ ABI of Itanium C++ ABI.
+
+`.eh_frame` is based on `.debug_frame` introduced in DWARF v2. They have some
+differences, though:
+
+* `.eh_frame` has the flag of `SHF_ALLOC` (indicating that a section should be
+  part of the mirror image in memory) but `.debug_frame` does not, so the latter
+  has very few usage scenarios.
+* `debug_frame` supports DWARF64 format (supports 64-bit offsets but the volume
+  will be slightly larger) but `.eh_frame` does not support (in fact, it can be
+  expanded, but lacks demand)
+* There is no augmentation_data_length and augmentation_data in the CIE of
+  `.debug_frame`
+* The version field in CIE is different
+* The meaning of CIE_pointer in FDE is different. In `.debug_frame` it is a
+  section offset (absolute); in `.eh_frame` it is a relative offset. This
+  change made by `.eh_frame` is great. If the section size exceeds 32 bits,
+  `.debug_frame` has to be converted to DWARF64 to represent `CIE_pointer`,
+  while a relative offset does not need to worry about this issue (if the
+  distance between an FDE and its CIE exceeds 32 bits, adding another CIE
+  suffices)
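+
+The relative `CIE_pointer` can be illustrated with a minimal C sketch. It
+assumes `.eh_frame` has already been loaded into a byte buffer in host byte
+order and that `extended_length` is not used; `read_u32` and
+`find_cie_offset` are hypothetical helper names, not part of any unwinder.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+// Read a 32-bit value in host byte order (assumed to match the target).
+static uint32_t read_u32(const uint8_t *p) {
+  uint32_t v;
+  memcpy(&v, p, sizeof v);
+  return v;
+}
+
+// Given the offset of a record in .eh_frame, return the offset of its CIE.
+// Returns rec_off itself if the record is a CIE (the ID field is 0).
+static size_t find_cie_offset(const uint8_t *eh_frame, size_t rec_off) {
+  size_t id_off = rec_off + 4; // CIE_pointer follows the 4-byte length
+  uint32_t cie_pointer = read_u32(eh_frame + id_off);
+  if (cie_pointer == 0)
+    return rec_off;            // in .eh_frame, ID 0 marks a CIE
+  return id_off - cie_pointer; // relative to the position of the field itself
+}
+```
+
+Because the value is subtracted from the position of the field, the result
+does not depend on where the record sits inside the section.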
+
+For the following function:
+
+```c
+void f() {
+  __builtin_unwind_init();
+}
+```
+
+The compiler produces `.cfi_*` directives (CFI directives) to annotate the
+assembly: `.cfi_startproc` and `.cfi_endproc` delimit the FDE region, and the
+other CFI directives describe CFI instructions. A call frame is identified by
+an address on the stack. This address is called the Canonical Frame Address
+(CFA), and is usually the stack pointer value at the call site. The following
+example demonstrates the usage of CFI instructions:
+
+```
+f:
+# At the function entry, CFA = rsp+8
+	.cfi_startproc
+# %bb.0:
+	pushq	%rbp
+# Redefine CFA = rsp+16
+	.cfi_def_cfa_offset 16
+# rbp is saved at the address CFA-16
+	.cfi_offset %rbp, -16
+	movq	%rsp, %rbp
+# CFA = rbp+16. CFA does not need to be redefined when rsp changes
+	.cfi_def_cfa_register %rbp
+	pushq	%r15
+	pushq	%r14
+	pushq	%r13
+	pushq	%r12
+	pushq	%rbx
+# rbx is saved at the address CFA-56
+	.cfi_offset %rbx, -56
+	.cfi_offset %r12, -48
+	.cfi_offset %r13, -40
+	.cfi_offset %r14, -32
+	.cfi_offset %r15, -24
+	popq	%rbx
+	popq	%r12
+	popq	%r13
+	popq	%r14
+	popq	%r15
+	popq	%rbp
+# CFA = rsp+8
+	.cfi_def_cfa %rsp, 8
+	retq
+.Lfunc_end0:
+	.size	f, .Lfunc_end0-f
+	.cfi_endproc
+```
+
+The assembler parses CFI directives and generates `.eh_frame` (this mechanism was
+introduced by Alan Modra in 2003). The linker collects `.eh_frame` input sections
+from .o/.a files to generate the output `.eh_frame`. In 2006, GNU as introduced
+`.cfi_personality` and `.cfi_lsda`.
+
+### `.eh_frame_hdr` and `PT_GNU_EH_FRAME`
+
+To locate the FDE that covers a pc, you need to scan `.eh_frame` from the
+beginning until you find the FDE whose interval (indicated by initial_location
+and address_range) contains the pc. The time spent is proportional to the
+number of scanned CIE and FDE records.
+https://sourceware.org/pipermail/binutils/2001-December/015674.html introduced
+`.eh_frame_hdr`, a binary search index table of (`initial_location`, FDE
+address) pairs.
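+
+The lookup in such a table can be sketched as follows. This is a simplified
+sketch, assuming the common table encoding where both values are signed 32-bit
+offsets relative to the start of `.eh_frame_hdr`; `hdr_entry` and `lookup_fde`
+are hypothetical names.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+// One entry of the search table; both values are offsets relative to the
+// start of .eh_frame_hdr (the DW_EH_PE_datarel|DW_EH_PE_sdata4 form).
+struct hdr_entry {
+  int32_t initial_location;
+  int32_t fde;
+};
+
+// Return the last entry whose initial_location <= pc_rel, or NULL if pc_rel
+// precedes every entry. The table must be sorted by initial_location.
+static const struct hdr_entry *
+lookup_fde(const struct hdr_entry *table, size_t n, int32_t pc_rel) {
+  size_t lo = 0, hi = n; // invariant: all entries before lo are <= pc_rel
+  while (lo < hi) {
+    size_t mid = lo + (hi - lo) / 2;
+    if (table[mid].initial_location <= pc_rel)
+      lo = mid + 1;
+    else
+      hi = mid;
+  }
+  return lo == 0 ? NULL : &table[lo - 1];
+}
+```
+
+The table only stores start addresses, so a caller still has to check
+`address_range` in the FDE it finds to confirm that the pc is covered.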
+
+The linker collects all `.eh_frame` input sections. With `--eh-frame-hdr`, `ld`
+generates `.eh_frame_hdr` and creates a program header `PT_GNU_EH_FRAME` to
+describe `.eh_frame_hdr`. An unwinder can parse the program headers and look for
+`PT_GNU_EH_FRAME` to locate `.eh_frame_hdr`. Please check out the example below.
+
+### `__register_frame_info`
+
+Before `.eh_frame_hdr` and `PT_GNU_EH_FRAME` were invented, crtbegin
+(`crtstuff.c`) had a static constructor `frame_dummy` that called
+`__register_frame_info` to register the `.eh_frame` of the executable.
+
+Now `__register_frame_info` is only used by programs linked with `-static`.
+Correspondingly, if you specify `-Wl,--no-eh-frame-hdr` when linking, you cannot
+unwind (if you use a C++ exception, the program will call `std::terminate`).
+
+### libunwind example
+
+```c
+#include <libunwind.h>
+#include <stdio.h>
+
+void backtrace() {
+  unw_context_t context;
+  unw_cursor_t cursor;
+  // Store register values into context.
+  unw_getcontext(&context);
+  // Locate the PT_GNU_EH_FRAME which contains PC.
+  unw_init_local(&cursor, &context);
+  size_t rip, rsp;
+  do {
+    unw_get_reg(&cursor, UNW_X86_64_RIP, &rip);
+    unw_get_reg(&cursor, UNW_X86_64_RSP, &rsp);
+    printf("rip: %zx rsp: %zx\n", rip, rsp);
+  } while (unw_step(&cursor) > 0);
+}
+
+void bar() {backtrace();}
+void foo() {bar();}
+int main() {foo();}
+```
+
+If you use llvm-project/libunwind:
+
+```sh
+$CC a.c -Ipath/to/include -Lpath/to/lib -lunwind
+```
+
+If you use nongnu.org/libunwind, there are two options: (a) add `#define
+UNW_LOCAL_ONLY` before `#include <libunwind.h>`, or (b) link one more library;
+on x86-64 it is `-l:libunwind-x86_64.so`. If you use Clang, you can also use
+`clang --rtlib=compiler-rt --unwindlib=libunwind -I path/to/include a.c`. In
+addition to providing `unw_*`, this ensures that `libgcc_s.so` is not linked.
+
+* `unw_getcontext`: Get register value (including PC)
+* `unw_init_local`
+  * Use `dl_iterate_phdr` to traverse executable files and shared objects, and
+    find the `PT_LOAD` program header that contains the PC
+  * Find the `PT_GNU_EH_FRAME` (`.eh_frame_hdr`) of the module containing the
+    PC, and save it in cursor
+* `unw_step`
+  * Binary search for the `.eh_frame_hdr` item corresponding to the PC, record
+    the FDE found and the CIE it points to
+  * Execute `initial_instructions` in CIE
+  * Execute the instructions (bytecode) in FDE. An automaton maintains the
+    current location and CFA. Among the instructions, `DW_CFA_advance_loc`
+    advances the location; `DW_CFA_def_cfa_*` updates CFA; `DW_CFA_offset`
+    indicates that the value of a register is stored at CFA+offset
+  * The automaton stops when the current location is greater than or equal to
+    PC. In other words, the executed instructions are a prefix of the FDE
+    instructions
+
+An unwinder locates the applicable FDE according to the program counter, and
+executes all the CFI instructions before the program counter.
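+
+The automaton can be sketched in C. This is a simplified sketch rather than a
+real parser: the instructions are given in a hypothetical pre-decoded form
+(`cfi_insn`) instead of the DWARF byte encoding, only four kinds of
+instructions are modeled, and the register numbers are the DWARF numbers for
+x86-64.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+// Hypothetical decoded forms of a few CFI instructions.
+enum cfi_op { ADVANCE_LOC, DEF_CFA_OFFSET, DEF_CFA_REGISTER, OFFSET };
+
+struct cfi_insn {
+  enum cfi_op op;
+  int reg;     // register number for DEF_CFA_REGISTER and OFFSET
+  int64_t arg; // delta for ADVANCE_LOC, offset otherwise
+};
+
+enum { REG_RBP = 6, REG_RSP = 7, NUM_REGS = 17 };
+
+struct unwind_row {
+  int cfa_reg; // CFA = value of cfa_reg + cfa_offset
+  int64_t cfa_offset;
+  int saved[NUM_REGS];            // 1 if the register is saved on the stack
+  int64_t saved_offset[NUM_REGS]; // saved at CFA + saved_offset
+};
+
+// Execute instructions until the location reaches pc_offset (the PC relative
+// to initial_location), i.e. run a prefix of the FDE instructions.
+// row must hold the state left by the CIE initial_instructions on entry.
+static void run_cfi(const struct cfi_insn *insns, size_t n, uint64_t pc_offset,
+                    struct unwind_row *row) {
+  uint64_t loc = 0;
+  for (size_t i = 0; i < n && loc < pc_offset; i++) {
+    switch (insns[i].op) {
+    case ADVANCE_LOC:
+      loc += (uint64_t)insns[i].arg;
+      break;
+    case DEF_CFA_OFFSET:
+      row->cfa_offset = insns[i].arg;
+      break;
+    case DEF_CFA_REGISTER:
+      row->cfa_reg = insns[i].reg;
+      break;
+    case OFFSET:
+      row->saved[insns[i].reg] = 1;
+      row->saved_offset[insns[i].reg] = insns[i].arg;
+      break;
+    }
+  }
+}
+```
+
+For the function `f` above, starting from CFA = rsp+8, the first instructions
+would be: advance by 1 (`pushq %rbp`), define the CFA offset as 16, record rbp
+at CFA-16, advance by 3 (`movq %rsp, %rbp`), and switch the CFA register to
+rbp.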
+
+There are several important instructions:
+
+* `DW_CFA_def_cfa_*`
+* `DW_CFA_offset`
+* `DW_CFA_advance_loc`
+
+For a clang built with `-DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86`,
+`.text` is 51.7MiB, `.eh_frame` is 4.2MiB, and `.eh_frame_hdr` is 646KiB (2 CIEs
+and 82745 FDEs).
+
+### Remarks
+
+CFI instructions are suitable for the compiler to generate code, but cumbersome
+to write in hand-written assembly. In 2015, Alex Dowad contributed an awk
+script to musl libc to parse the assembly and automatically generate CFI
+directives. In fact, generating precise CFI instructions is challenging for
+compilers as well. For a function that does not use a frame pointer, adjusting
+SP requires outputting a CFI directive to redefine CFA. GCC does not parse
+inline assembly, so adjusting SP in inline assembly often results in imprecise
+CFI.
+
+```c
+void foo() {
+  asm("subq $128, %rsp\n"
+  // Cannot unwind if -fomit-leaf-frame-pointer
+      "nop\n"
+      "addq $128, %rsp\n");
+}
+
+int main() {
+  foo();
+}
+```
+
+The CFIInstrInserter pass in LLVM can insert `.cfi_def_cfa_*`, `.cfi_offset`,
+and `.cfi_restore` to adjust the CFA and callee-saved registers.
+
+The DWARF scheme also has very low information density. The various compact
+unwind schemes improve on this aspect. To list a few issues:
+
+* CIE `address_size`: nobody uses different values for an architecture. Even if
+  they do (ILP32 ABIs in AArch64 and x86-64), the information is already
+  available elsewhere.
+* CIE `segment_selector_size`: It is nice that they cared about x86, but x86 itself
+  does not need it anymore :/
+* CIE `code_alignment_factor` and `data_alignment_factor`: A RISC architecture
+  with such preference can hard code the values.
+* CIE `return_address_register`: I do not know when an architecture wants to
+  use a different register for the return address.
+* `length`: The DWARF's 8-byte form is definitely overengineered... For standard
+  form prologue/epilogue, the field should not be needed.
+* `initial_location` and `address_range`: if a binary search index table is
+  always needed, why do we need the length field?
+* `instructions`: bytecode is flexible but commonly a function
+  prologue/epilogue is of a standard form and the few callee-saved registers
+  can be encoded in a more compact way.
+* `augmentation_data`: While this provides flexibility, in practice very rarely
+  a function needs anything more than a personality and a LSDA pointer.
+
+Callee-saved registers other than FP are oftentimes unneeded but there is no
+compiler option to drop them.
+
+## `SHT_X86_64_UNWIND`
+
+`.eh_frame` has special processing in linker/dynamic loader, so conventionally
+it should use a separate section type, but `SHT_PROGBITS` was used in the
+design. In the x86-64 psABI, the type of `.eh_frame` is `SHT_X86_64_UNWIND`
+(influenced by Solaris).
+
+* In GNU as, `.section .eh_frame,"a",@unwind` will generate `SHT_X86_64_UNWIND`,
+  and `.cfi_*` will generate `SHT_PROGBITS`.
+* Since Clang 3.8, `.cfi_*` generates `SHT_X86_64_UNWIND`
+
+`.section .eh_frame,"a",@unwind` is rare (glibc's x86 port, libffi, LuaJIT and
+other packages), so checking the type of `.eh_frame` is a good way to
+distinguish Clang/GCC object file :) For LLD 11.0.0, I contributed
+https://reviews.llvm.org/D85785 to allow mixed types for `.eh_frame` in a
+relocatable link ;-)
+
+Suggestion to future architectures: When defining processor-specific section
+types, please do not use 0x70000001
+(`SHT_ARM_EXIDX=SHT_IA_64_UNWIND=SHT_PARISC_UNWIND=SHT_X86_64_UNWIND=SHT_LOPROC+1`)
+for purposes other than unwinding :) `SHT_CSKY_ATTRIBUTES=0x70000001` :)
+
+### Linker perspective
+
+Usually in the case of COMDAT group and `-ffunction-sections`,
+`.data`/`.rodata` needs to be split like `.text`, but `.eh_frame` is
+monolithic. Like many other metadata sections, the main problem with the
+monolithic section is that garbage collection is challenging in the linker.
+Unlike some other metadata sections, simply abandoning garbage collection is
+not an option: `.eh_frame_hdr` is a binary search index table and
+duplicate/unused entries can confuse consumers.
+
+When a linker processes `.eh_frame`, it needs to conceptually split `.eh_frame`
+into CIEs and FDEs. During `--gc-sections`, the conceptual reference
+relationship is the reverse of the actual relocation: an FDE has a relocation
+referencing the text section, but during GC, if the referenced text section is
+discarded, the FDE that references it should also be discarded.
+
+LLD has some special handling for `.eh_frame`:
+
+* `-M` requires special code
+* `--gc-sections` occurs before `.eh_frame` deduplication/GC. The personality
+  in a CIE is a valid reference. However, `initial_location` in FDE should be
+  ignored. Moreover, a LSDA reference in a FDE in a section group should be
+  ignored.
+* In a relocatable link, a relocation from `.eh_frame` to a `STT_SECTION`
+  symbol in a discarded section (due to COMDAT group rule) should be allowed
+  (normally such a `STB_LOCAL` relocation from outside of the group is
+  disallowed).
+
+## Compact unwind descriptors
+
+On macOS, Apple designed the compact unwind descriptors mechanism to accelerate
+unwinding. In theory, this technique can be used to save some space in
+`__eh_frame`, but it has not been implemented. The main idea is:
+
+* The FDEs of most functions follow a fixed pattern (specify CFA at the
+  prologue, store callee-saved registers), and the FDE instructions can be
+  compressed to 32 bits.
+* The personality/LSDA described by CIE/FDE augmentation data is very common
+  and can be extracted as a fixed field.
+
+Only 64-bit will be discussed below. A descriptor occupies 32 bytes:
+
+```
+.quad _foo
+.set L1, Lfoo_end-_foo
+.long L1
+.long compact_unwind_description
+.quad personality
+.quad lsda_address
+```
+
+If you study `.eh_frame_hdr` (binary search index table) and `.ARM.exidx`, you
+will see that the length field is redundant.
+
+The compact unwind descriptor is encoded as:
+
+```c
+uint32_t : 24; // vary with different modes
+uint32_t mode : 4;
+uint32_t flags : 4;
+```
+
+Five modes are defined:
+
+* 0: reserved
+* 1: FP-based frame: RBP is frame pointer, frame size is variable
+* 2: SP-based frame: frame pointer is not used, frame size is fixed during
+  compilation
+* 3: large SP-based frame: frame pointer is not used, the frame size is fixed
+  at compile time but the value is large and cannot be represented by mode 2
+* 4: DWARF CFI escape
+
+### FP-based frame (`UNWIND_MODE_BP_FRAME`)
+
+The compact unwind encoding is:
+
+```c
+uint32_t regs : 15;
+uint32_t : 1; // 0
+uint32_t stack_adjust : 8;
+uint32_t mode : 4;
+uint32_t flags : 4;
+```
+
+The callee-saved registers on x86-64 are: RBX, R12, R13, R14, R15, RBP. 3 bits
+can encode a register, so 15 bits are enough to represent the 5 registers other
+than RBP (whether each is saved and where). `stack_adjust` records the extra
+stack space beyond the saved registers.
+
+### SP-based frame (`UNWIND_MODE_STACK_IMMD`)
+
+The compact unwind encoding is:
+
+```c
+uint32_t reg_permutation : 10;
+uint32_t cnt : 3;
+uint32_t : 3;
+uint32_t size : 8;
+uint32_t mode : 4;
+uint32_t flags : 4;
+```
+
+`cnt` represents the number of saved registers (maximum 6). `reg_permutation`
+encodes which registers are saved and in which order. `size*8` represents the
+stack frame size.
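+
+Decoding these fields can be sketched with shifts and masks. The sketch below
+assumes the bit-field layouts listed above describe the 32-bit value from the
+lowest bits upward; the `cu_*` helpers and `MODE_*` names are hypothetical.
+
+```c
+#include <stdint.h>
+
+// Mode numbers as listed in "Five modes are defined" above.
+enum { MODE_BP_FRAME = 1, MODE_STACK_IMMD = 2, MODE_STACK_IND = 3, MODE_DWARF = 4 };
+
+// Bits 24..27 hold the mode for every encoding.
+static unsigned cu_mode(uint32_t enc) { return (enc >> 24) & 0xf; }
+
+// SP-based frame: bits 16..23 hold size; the frame size in bytes is size*8.
+static unsigned cu_stack_size(uint32_t enc) { return ((enc >> 16) & 0xff) * 8; }
+
+// SP-based frame: bits 10..12 hold the number of saved registers.
+static unsigned cu_reg_count(uint32_t enc) { return (enc >> 10) & 0x7; }
+```
+
+Explicit shifts are used instead of C bit-fields because the order in which a
+compiler allocates bit-fields within a word is implementation-defined.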
+
+### Large SP-based frame (`UNWIND_MODE_STACK_IND`)
+
+The compact unwind encoding is:
+
+```c
+uint32_t reg_permutation : 10;
+uint32_t cnt : 3;
+uint32_t adj : 3;
+uint32_t size_offset : 8;
+uint32_t mode : 4;
+uint32_t flags : 4;
+```
+
+This is similar to the SP-based frame, except that the stack frame size is read
+from the text section. The RSP adjustment is usually represented by
+`subq imm, %rsp`, and `size_offset` is used to represent the distance from the
+instruction to the beginning of the function. The actual stack size also
+includes `adj*8`.
+
+### DWARF CFI escape
+
+If, for various reasons, the unwind information cannot be expressed as a
+compact unwind descriptor, it must fall back to DWARF CFI.
+
+In the LLVM implementation, each function is represented by only one compact
+unwind descriptor. If asynchronous stack unwinding occurs in the epilogue,
+existing implementations cannot distinguish it from stack unwinding in the
+function body. The Canonical Frame Address will be calculated incorrectly, and
+the callee-saved registers will be read incorrectly. If it happens in the
+prologue, and the prologue has instructions other than register pushes and
+`subq imm, %rsp`, an error will occur. In addition, if shrink wrapping is
+enabled for a function, the prologue may not be at the beginning of the
+function, and asynchronous stack unwinding between the beginning and the
+prologue also fails. It seems that most people don't care about this issue,
+perhaps because the profiler only loses a few percentage points of the profile.
+
+In fact, if you use multiple descriptors to describe each area of a function,
+you can still unwind accurately. OpenVMS proposed [\[RFC\] Improving compact
+x86-64 compact unwind descriptors](http://lists.llvm.org/pipermail/llvm-dev/2018-January/120741.html)
+in 2018, but unfortunately there is no relevant implementation.
+
+## ARM exception handling
+
+The unwind information is divided into `.ARM.exidx` and `.ARM.extab`.
+
+`.ARM.exidx` is a binary search index table, composed of 2-word pairs. The
+first word is a 31-bit PC-relative offset to the start of the region. The
+second word is described more clearly by code:
+
+```c
+if (indexData == EXIDX_CANTUNWIND)
+  return false;  // like an absent .eh_frame entry. In the case of C++ exceptions, std::terminate
+if (indexData & 0x80000000) {
+  extabAddr = &indexData;
+  extabData = indexData; // inline
+} else {
+  extabAddr = &indexData + signExtendPrel31(indexData);
+  extabData = read32(&indexData + signExtendPrel31(indexData)); // stored in .ARM.extab
+}
+```
+
+`extabData & 0x80000000` means a compact model entry, otherwise it is a generic
+model entry.
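+
+The helper `signExtendPrel31` used in the pseudocode, and the first-level
+classification of the second word, can be sketched in self-contained C. The
+function and enum names below are hypothetical; `EXIDX_CANTUNWIND` is assumed
+to have the value 1.
+
+```c
+#include <stdint.h>
+
+#define EXIDX_CANTUNWIND 1u
+
+enum exidx_kind { KIND_CANTUNWIND, KIND_INLINE, KIND_EXTAB };
+
+// Sign-extend a 31-bit PC-relative offset: bit 30 is the sign bit and
+// bit 31 is ignored.
+static int32_t sign_extend_prel31(uint32_t word) {
+  return (int32_t)(word & 0x3fffffffu) - (int32_t)(word & 0x40000000u);
+}
+
+// Classify the second word of a .ARM.exidx entry.
+static enum exidx_kind classify_exidx(uint32_t index_data) {
+  if (index_data == EXIDX_CANTUNWIND)
+    return KIND_CANTUNWIND; // the function cannot be unwound
+  if (index_data & 0x80000000u)
+    return KIND_INLINE;     // the unwind data is inlined in the index entry
+  return KIND_EXTAB;        // prel31 offset to an entry in .ARM.extab
+}
+```
+
+The subtraction form avoids right-shifting a negative value, whose result is
+implementation-defined in C.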
+
+`.ARM.exidx` is equivalent to enhanced `.eh_frame_hdr`, compact model is
+equivalent to inlining the personality and lsda in `.eh_frame`. Consider the
+following three situations:
+
+* If the C++ exception will not be triggered and the function that may trigger
+  the exception will not be called: no personality is needed, only one
+  `EXIDX_CANTUNWIND` entry is needed, no `.ARM.extab`
+* If a C++ exception is triggered but no landing pad is required: personality
+  is `__aeabi_unwind_cpp_pr0`, only a compact model entry is needed, no
+  `.ARM.extab`
+* If there is a catch: `__gxx_personality_v0` is required, `.ARM.extab` is
+  required
+
+`.ARM.extab` is equivalent to the combined `.eh_frame` and `.gcc_except_table`.
+
+### Generic model
+
+```c
+uint32_t personality; // bit 31 is 0
+uint32_t : 24;
+uint32_t num : 8;
+uint32_t opcodes[];   // opcodes, variable length
+uint8_t lsda[];       // variable length
+```
+
+Under construction.
+
+## Windows ARM64 exception handling
+
+See https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling. This
+is my favorite encoding scheme. It supports unwinding in the middle of a prolog
+or epilog, and function fragments (used to represent unconventional stack
+frames such as shrink wrapping).
+
+The information is saved in two sections: `.pdata` and `.xdata`.
+
+```c
+uint32_t function_start_rva;
+uint32_t Flag : 2;
+uint32_t Data : 30;
+```
+
+For canonical form functions, Packed Unwind Data is used, and no `.xdata` record
+is required; unwind information that cannot be represented by Packed Unwind
+Data is stored in `.xdata`.
+
+### Packed Unwind Data
+
+```c
+uint32_t FunctionStartRVA;
+uint32_t Flag : 2;
+uint32_t FunctionLength : 11;
+uint32_t RegF : 3;
+uint32_t RegI : 4;
+uint32_t H : 1;
+uint32_t CR : 2;
+uint32_t FrameSize : 9;
+```
+
+## MIPS compact exception tables
+
+Under construction.
+
+## Linux kernel ORC unwind tables
+
+For x86-64, the Linux kernel uses its own unwind tables: ORC. You can find its
+documentation on https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html
+and there is an lwn.net introduction [The ORCs are coming](https://lwn.net/Articles/728339/).
+
+`objtool orc generate a.o` parses `.eh_frame` and generates `.orc_unwind` and
+`.orc_unwind_ip`. For an object file assembled from:
+
+```
+.globl foo
+.type foo, @function
+foo:
+  ret
+```
+
+At two addresses the unwind information changes: the start of foo and the end
+of foo, so 2 ORC entries will be produced. If the DWARF CFA changes (e.g. due
+to push/pop) in the middle of the function, there may be more entries.
+
+`.orc_unwind_ip` contains two entries, representing the PC-relative addresses.
+
+```
+Relocation section '.rela.orc_unwind_ip' at offset 0x2028 contains 2 entries:
+    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
+0000000000000000  0000000500000002 R_X86_64_PC32          0000000000000000 .text + 0
+0000000000000004  0000000500000002 R_X86_64_PC32          0000000000000000 .text + 1
+```
+
+`.orc_unwind` contains two entries of type `orc_entry`. The entries encode how
+IP/SP/BP of the previous frame are stored.
+
+```c
+struct orc_entry {
+  s16 sp_offset; // sp_offset and sp_reg encode where SP of the previous frame is stored
+  s16 bp_offset; // bp_offset and bp_reg encode where BP of the previous frame is stored
+  unsigned sp_reg:4;
+  unsigned bp_reg:4;
+  unsigned type:2; // how IP of the previous frame is stored
+  unsigned end:1;
+} __attribute__((__packed__));
+```
+
+You may find similarities between this scheme and `UNWIND_MODE_BP_FRAME` and
+`UNWIND_MODE_STACK_IMMD` in Apple's compact unwind descriptors. The ORC scheme
+uses 16-bit integers so presumably `UNWIND_MODE_STACK_IND` will not be needed.
+During unwinding, most callee-saved registers other than BP are unneeded, so
+ORC does not bother recording them.
+
+The linker will resolve relocations in `.orc_unwind_ip` and create
+`__start_orc_unwind_ip`/`__stop_orc_unwind_ip`/`__start_orc_unwind`/
+`__stop_orc_unwind` to delimit the section contents. Then, a host utility
+`scripts/sorttable` sorts the contents of `.orc_unwind_ip` and `.orc_unwind`. To
+unwind a stack frame, `unwind_next_frame`:
+
+* performs a binary search into the `.orc_unwind_ip` table to figure out the
+  relevant ORC entry
+* retrieves the previous SP with the current SP, `orc->sp_reg` and
+  `orc->sp_offset`.
+* retrieves the previous IP with `orc->type` and other values.
+* retrieves the previous BP with the current BP, the previous SP, `orc->bp_reg`
+  and `orc->bp_offset`. `orc->bp_reg` can be
+  `ORC_REG_UNDEFINED`/`ORC_REG_PREV_SP`/`ORC_REG_BP`.
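+
+The SP and IP steps can be sketched in C for the simplest case, an ordinary
+call frame. This is not the kernel's code: `simple_orc`, `frame`, `orc_step`
+and the `BASE_*` selectors are hypothetical simplifications, and the sketch
+assumes the return address is stored just below the previous SP.
+
+```c
+#include <stdint.h>
+
+// Illustrative base register selectors; the kernel's ORC_REG_* values differ.
+enum { BASE_SP, BASE_BP };
+
+struct simple_orc {
+  int16_t sp_offset;
+  unsigned sp_reg;
+};
+
+struct frame {
+  uintptr_t ip, sp, bp;
+};
+
+// Compute the caller's SP and IP from the current frame and its ORC entry.
+static struct frame orc_step(const struct simple_orc *orc, struct frame cur) {
+  uintptr_t base = orc->sp_reg == BASE_BP ? cur.bp : cur.sp;
+  struct frame prev = cur;
+  // Previous SP = base register + sp_offset.
+  prev.sp = base + (uintptr_t)orc->sp_offset;
+  // The return address sits in the slot just below the previous SP.
+  prev.ip = *(const uintptr_t *)(prev.sp - sizeof(uintptr_t));
+  return prev;
+}
+```
+
+For the `foo` example above, the entry at the start of the function would use
+the current SP as the base with an offset of one word, since only the return
+address has been pushed.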
+
--- a/maskray-2.md
+++ b/maskray-2.md
@ -0,0 +1,558 @@
+# All about symbol versioning
+
+In 1995, Solaris' link editor and ld.so introduced the symbol versioning
+mechanism. Ulrich Drepper and Eric Youngdale borrowed Solaris symbol versioning
+in 1997 and designed the GNU style symbol versioning for glibc.
+
+When a shared object is updated and the behavior of a symbol changes (ABI
+changes, such as changing the types of parameters or return values, or
+behavior changes), traditionally a `DT_SONAME` bump is required. Otherwise a
+dependent application/shared object built with the old version may run
+abnormally. This can be inconvenient if the number of dependent applications is
+large.
+
+Symbol versioning provides backward compatibility without changing `DT_SONAME`.
+
+The following part describes the representation, and then describes the
+behaviors from the perspectives of assembler, linker, and ld.so. One may wish
+to skip the representation part when reading for the first time.
+
+## Representation
+
+In a shared object or executable file that uses symbol versioning, there are up
+to three sections related to symbol versioning. `.gnu.version_r` and
+`.gnu.version_d` among them are optional:
+
+* `.gnu.version` (version symbol section). The `DT_VERSYM` tag in the dynamic
+  table points to the section. Assuming there are N entries in `.dynsym`,
+  `.gnu.version` contains N `uint16_t` values, with the i-th entry indicating
+  the version ID of the i-th symbol. Put it another way, `.gnu.version` is a
+  parallel table to `.dynsym`.
+* `.gnu.version_r` (version requirement section). The `DT_VERNEED`/
+  `DT_VERNEEDNUM` tags in the dynamic table delimit this section. This
+  section describes the version information used by the undefined versioned
+  symbols in the module.
+* `.gnu.version_d` (version definition section). The `DT_VERDEF`/`DT_VERDEFNUM`
+  tags in the dynamic table delimit this section. This section describes the
+  version information used by the defined versioned symbols in the module.
+
+```c
+// Version definitions
+typedef struct {
+  Elf64_Half    vd_version;  // version: 1
+  Elf64_Half    vd_flags;    // VER_FLG_BASE (index 1) or 0 (index != 1)
+  Elf64_Half    vd_ndx;      // version index
+  Elf64_Half    vd_cnt;      // number of associated aux entries, always 1 in practice
+  Elf64_Word    vd_hash;     // SysV hash of the version name
+  Elf64_Word    vd_aux;      // offset in bytes to the verdaux array
+  Elf64_Word    vd_next;     // offset in bytes to the next verdef entry
+} Elf64_Verdef;
+
+typedef struct {
+  Elf64_Word    vda_name;    // version name
+  Elf64_Word    vda_next;    // offset in bytes to the next verdaux entry
+} Elf64_Verdaux;
+
+// Version needs
+typedef struct {
+  Elf64_Half    vn_version;  // version: 1
+  Elf64_Half    vn_cnt;      // number of associated aux entries
+  Elf64_Word    vn_file;     // .dynstr offset of the depended filename
+  Elf64_Word    vn_aux;      // offset in bytes to vernaux array
+  Elf64_Word    vn_next;     // offset in bytes to next verneed entry
+} Elf64_Verneed;
+
+typedef struct {
+  Elf64_Word    vna_hash;    // SysV hash of vna_name
+  Elf64_Half    vna_flags;   // usually 0; copied from vd_flags of the depended so
+  Elf64_Half    vna_other;   // unused
+  Elf64_Word    vna_name;    // .dynstr offset of the version name
+  Elf64_Word    vna_next;    // offset in bytes to next vernaux entry
+} Elf64_Vernaux;
+```
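+
+The `vd_hash` and `vna_hash` fields hold the SysV hash of the version name. A
+minimal C sketch of that hash function follows; the name `sysv_hash` is
+hypothetical.
+
+```c
+#include <stdint.h>
+
+// The classic SysV ELF hash of a NUL-terminated name.
+static uint32_t sysv_hash(const char *name) {
+  uint32_t h = 0;
+  for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
+    h = (h << 4) + *p;
+    uint32_t g = h & 0xf0000000u; // top nibble
+    h ^= g >> 24;                 // fold it back into the low bits
+    h &= ~g;                      // and clear it
+  }
+  return h;
+}
+```
+
+Storing the hash lets ld.so reject most non-matching version names with an
+integer comparison before falling back to a string comparison.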
+
+Currently GNU ld does not set the `VER_FLG_WEAK` flag. [BZ24718#c15](https://sourceware.org/bugzilla/show_bug.cgi?id=24718#c15) proposed "set
+`VER_FLG_WEAK` on version reference if all symbols are weak".
+
+The advantage of using a parallel table for `.gnu.version` is that symbol
+versioning is optional. ld.so implementations which do not support symbol
+versioning can freely assume no symbol has a version. The effect is that all
+references behave as if they were bound to the default version definitions.
+musl ld.so falls into this category.
+
+### Version index values
+
+Index 0 is called `VER_NDX_LOCAL`. The binding of the symbol will be changed to
+`STB_LOCAL`. Index 1 is called `VER_NDX_GLOBAL`. It has no special effect and
+is used for unversioned symbols. Index 2 to 0xffef are used for user defined
+versions.
+
+Defined versioned symbols have two forms:
+
+* foo@@v2, the default version.
+* foo@v2, a non-default version (hidden version). The `VERSYM_HIDDEN` bit of the
+  version ID is set.
+
+Undefined versioned symbols have only the `foo@v2` form.
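+
+Interpreting a `.gnu.version` entry can be sketched as follows. The constants
+are defined locally for the sketch, assuming the hidden bit is the top bit of
+the 16-bit value (0x8000); the helper names are hypothetical.
+
+```c
+#include <stdint.h>
+
+#define VERSYM_HIDDEN 0x8000u  // set for a non-default version (foo@v2)
+#define VERSYM_VERSION 0x7fffu // mask for the version index
+enum { VER_NDX_LOCAL = 0, VER_NDX_GLOBAL = 1 };
+
+// Version index of a .gnu.version entry, without the hidden bit.
+static unsigned versym_index(uint16_t v) { return v & VERSYM_VERSION; }
+
+// Nonzero if the entry describes a non-default (hidden) version.
+static int versym_is_hidden(uint16_t v) { return (v & VERSYM_HIDDEN) != 0; }
+```
+
+With this split, `foo@@v2` and `foo@v2` share the same version index and
+differ only in the hidden bit.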
+
+Usually versioned symbols are only defined in shared objects, but executables
+can have defined versioned symbols as well. (When a shared object is updated,
+the old symbols are retained so that other shared objects do not need to be
+relinked, and executable files usually do not provide versioned symbols for
+other shared objects to reference.)
+
+### Example
+
+`readelf -V` can dump the symbol versioning tables.
+
+In the `.gnu.version_d` output below:
+
+* Version index 1 (`VER_NDX_GLOBAL`) is the filename (soname if shared object).
+  The `VER_FLG_BASE` flag is set.
+* Version index 2 is a user defined version. Its name is `LUA_5.3`.
+
+In the `.gnu.version_r` output below, each of version indexes 3~10 represents a
+version in a depended shared object. The name `GLIBC_2.2.5` appears thrice,
+each for a different shared object.
+
+The `.gnu.version` table assigns a version index to each `.dynsym` entry.
+
+```
+% readelf -V /usr/bin/lua5.3
+
+Version symbols section '.gnu.version' contains 248 entries:
+ Addr: 0x0000000000002af4  Offset: 0x002af4  Link: 5 (.dynsym)
+  000:   0 (*local*)       3 (GLIBC_2.3)     4 (GLIBC_2.2.5)   4 (GLIBC_2.2.5)
+  004:   5 (GLIBC_2.3.4)   4 (GLIBC_2.2.5)   4 (GLIBC_2.2.5)   4 (GLIBC_2.2.5)
+  ...
+
+Version definition section '.gnu.version_d' contains 2 entries:
+ Addr: 0x0000000000002ce8  Offset: 0x002ce8  Link: 6 (.dynstr)
+  000000: Rev: 1  Flags: BASE  Index: 1  Cnt: 1  Name: lua5.3
+  0x001c: Rev: 1  Flags: none  Index: 2  Cnt: 1  Name: LUA_5.3
+
+Version needs section '.gnu.version_r' contains 3 entries:
+ Addr: 0x0000000000002d20  Offset: 0x002d20  Link: 6 (.dynstr)
+  000000: Version: 1  File: libdl.so.2  Cnt: 1
+  0x0010:   Name: GLIBC_2.2.5  Flags: none  Version: 9
+  0x0020: Version: 1  File: libm.so.6  Cnt: 1
+  0x0030:   Name: GLIBC_2.2.5  Flags: none  Version: 6
+  0x0040: Version: 1  File: libc.so.6  Cnt: 6
+  0x0050:   Name: GLIBC_2.11  Flags: none  Version: 10
+  0x0060:   Name: GLIBC_2.14  Flags: none  Version: 8
+  0x0070:   Name: GLIBC_2.4  Flags: none  Version: 7
+  0x0080:   Name: GLIBC_2.3.4  Flags: none  Version: 5
+  0x0090:   Name: GLIBC_2.2.5  Flags: none  Version: 4
+  0x00a0:   Name: GLIBC_2.3  Flags: none  Version: 3
+```
+
+### Symbol versioning in object files
+
+The GNU scheme allows `.symver` directives to label the versions of the symbols
+in object files. The symbol names residing in .o contain `@` or `@@`.
+
+## Assembler behavior
+
+GNU as and the LLVM integrated assembler both implement `.symver`.
+
+* `.symver foo, foo@v1`
+  * If foo is undefined, produce `foo@v1`
+  * If foo is defined, produce `foo` and `foo@v1` with the same binding
+    (`STB_LOCAL`, `STB_WEAK`, or `STB_GLOBAL`) and `st_other` value (i.e. the
+    same visibility). Personally I think this behavior is a design flaw.
+    The proposed [V4 PATCH gas: Extend .symver directive](https://sourceware.org/pipermail/binutils/2020-April/110622.html)
+    can address this problem.
+* `.symver foo, foo@@v1`
+  * If foo is undefined, error
+  * If foo is defined, produce `foo` and `foo@@v1` with the same binding and `st_other` value.
+* `.symver foo, foo@@@v1`
+  * If foo is undefined, produce `foo@v1`
+  * If foo is defined, produce `foo@@v1`
+
+Personal recommendation:
+
+* To define a default version symbol: use `.symver foo, foo@@@v2` so that foo
+  is not present.
+* To define a non-default version symbol, add a suffix to the original symbol
+  name (`.symver foo_v1, foo@v1`) to prevent conflicts with `foo`. This will
+  however leave (usually undesirable) `foo_v1`. If you don't strip `foo_v1` from
+  the object file, you may localize it with a `local:` pattern in the version
+  script. With GNU as 2.35 ([PR25295](https://sourceware.org/bugzilla/show_bug.cgi?id=25295)),
+  you can use `.symver foo_v1, foo@v1, remove`.
+* The version of an undefined symbol is usually bound at link time. It is
+  usually unnecessary to set the version with `.symver`. If required, prefer
+  `.symver foo, foo@@@v1` to `.symver foo, foo@v1`.
+
+## Linker behavior
+
+The linker enters the symbol resolution stage after reading in object files,
+archive files, shared objects, LTO files, linker scripts, etc.
+
+GNU ld uses indirect symbols to represent versioned symbols. There are
+complicated rules, and these rules are not documented. The symbol resolution
+rules that I personally derived:
+
+* Defined `foo` resolves undefined `foo` (traditional unversioned rule)
+* Defined `foo@v1` resolves undefined `foo@v1` (a non-default version symbol is
+  like a separate symbol)
+* Defined `foo@@v1` (default version) resolves both undefined `foo` and `foo@v1`
+
+If there are multiple default version definitions (such as `foo@@v1 foo@@v2`),
+a duplicate definition error should be issued even if one is weak. Usually a
+symbol has zero or one default version (`@@`) definition, and an arbitrary
+number of non-default version (`@`) definitions.
+
+If the linker sees undefined `foo` and `foo@v1` first, it will treat them as
+two symbols. When the linker sees the definition `foo@@v1`, conceptually `foo`
+and `foo@v1` should be combined. If the linker sees `foo@@v2` instead,
+`foo@@v2` should resolve `foo` and `foo@v1` should be a separate symbol.
+
+* [Combining Versions](combining-versions.md) describes the problem.
+* `gold/symtab.cc Symbol_table::define_default_version` uses a heuristic rule
+  to solve this problem. It special cases on visibility, but I feel that this
+  rule is unneeded.
+* Before 2.36, GNU ld reported a bogus multiple definition error for defined
+  weak `foo@@v1` and defined global `foo@v1` [PR ld/26978](https://sourceware.org/bugzilla/show_bug.cgi?id=26978)
+* Before 2.36, GNU ld had a bug that the visibility of undefined `foo@v1` does
+  not affect the output visibility of `foo@@v1`: [PR ld/26979](https://sourceware.org/bugzilla/show_bug.cgi?id=26979)
+* I fixed the object file side problem of LLD 12.0 in https://reviews.llvm.org/D92259.
+  Archive files and lazy object files may still have incompatibility issues.
+
+When LLD sees a defined `foo@@v1`, it adds both `foo` and `foo@v1` into the
+symbol table, thus `foo@@v1` can resolve both undefined `foo` and `foo@v1`.
+After processing all input files, a pass iterates symbols and redirects
+`foo@v1` to `foo@@v1`. Because LLD treats them as separate symbols during input
+processing, a defined `foo@v1` cannot suppress the extraction of an archive
+member defining `foo@@v1`, leading to a behavior incompatible with GNU ld. This
+probably does not matter, though.
+
+GNU ld has another strange behavior: if both `foo` and `foo@v1` are defined, `foo`
+will be removed. I strongly believe it is an issue in GNU ld but the maintainer
+rejected [PR ld/27210](https://sourceware.org/bugzilla/show_bug.cgi?id=27210).
+
+## Version script
+
+To define a versioned symbol in a shared object or an executable, a version
+script must be specified. If all versioned symbols are undefined, then the
+version script can be omitted.
+
+```
+# Make all symbols other than foo and bar local.
+{ global: foo; bar; local: *; };
+
+# Assign version FBSD_1.0 to malloc and version FBSD_1.3 to mallocx,
+# and make internal local.
+FBSD_1.0 { malloc; local: internal; };
+FBSD_1.3 { mallocx; };
+```
+
+A version script has three purposes:
+
+* Define versions.
+* Specify some patterns so that matched defined symbols (which do not have `@`
+  in the name) are tied to the specified version.
+* Scope reduction: for a defined unversioned symbol matched by a `local:`
+  pattern, its binding will be changed to `STB_LOCAL` and it will not be
+  exported to the dynamic symbol table.
+
+A version script can consist of one anonymous version tag (`{...};`) or a list of
+named version tags (`v1 {...};`). If you use an anonymous version tag with other
+version tags, GNU ld will error: `anonymous version tag cannot be combined with
+other version tags`. A `local:` part can be placed in any version tag. Which
+version tag is used does not matter.
+
+If a defined symbol is matched by multiple version tags, the following
+precedence rules apply (`binutils-gdb/bfd/linker.c:find_version_for_sym`):
+
+* The first version tag with an exact pattern (i.e. there is no wildcard) wins.
+* Otherwise, the last version tag with a non-`*` wildcard pattern wins.
+* Otherwise, the first version tag with a `*` pattern wins.
+
+The gotcha is that `**` is a wildcard pattern which matches any symbol but its
+precedence is higher than `*`.
+
+Most patterns are exact so gold and LLD iterate patterns instead of symbols to
+improve performance.
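+
+The three precedence rules can be modeled with a short C sketch. It is a
+simplification written for this article, not the code of
+`find_version_for_sym`: the `struct pattern` layout and the name `pick_version`
+are assumptions, and POSIX `fnmatch` stands in for the linker's own matcher.
+
+```c
+#include <assert.h>
+#include <fnmatch.h>
+#include <stddef.h>
+#include <string.h>
+
+/* Hypothetical flattened view of a version script: one entry per pattern,
+   in the order the patterns appear in the script. */
+struct pattern {
+  const char *tag;  /* version tag that contains the pattern */
+  const char *glob; /* pattern text */
+};
+
+static int has_wildcard(const char *s) { return strpbrk(s, "*?[") != NULL; }
+
+/* Returns the winning version tag for sym, or NULL if nothing matches. */
+const char *pick_version(const struct pattern *p, size_t n, const char *sym) {
+  const char *wild = NULL, *star = NULL;
+  assert(p != NULL && sym != NULL);
+  for (size_t i = 0; i < n; i++) {
+    if (!has_wildcard(p[i].glob)) {
+      if (strcmp(p[i].glob, sym) == 0)
+        return p[i].tag; /* rule 1: the first exact pattern wins */
+    } else if (strcmp(p[i].glob, "*") == 0) {
+      if (star == NULL)
+        star = p[i].tag; /* rule 3: the first `*` wins */
+    } else if (fnmatch(p[i].glob, sym, 0) == 0) {
+      wild = p[i].tag; /* rule 2: the last non-`*` wildcard wins */
+    }
+  }
+  return wild != NULL ? wild : star;
+}
+```
+
+With the patterns `*` (in `v1`), `f*` (in `v2`) and `**` (in `v3`), this sketch
+assigns `bar` to `v3` rather than `v1`, which is the `**` gotcha in action.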
+
+## How a versioned symbol is produced
+
+An undefined symbol can be assigned a version if:
+
+* its name does not contain `@` (`.symver` is unused) and a shared object
+  provides a default version definition.
+* its name contains `@` and a shared object defines the symbol. GNU ld errors
+  if there is no such shared object. After https://reviews.llvm.org/D92260,
+  LLD will report an error as well.
+
+A defined symbol can be assigned a version if:
+
+* its name does not contain `@` and it is matched by a pattern in a named version tag in a version script.
+* its name contains `@`
+  * If `-shared`, the version should be defined by a version script, otherwise
+    GNU ld errors `version node not found for symbol`. This exception looks
+    strange to me so I have filed [PR ld/26980](https://sourceware.org/bugzilla/show_bug.cgi?id=26980).
+  * If `-no-pie` or `-pie`, a version definition is unneeded in GNU ld. This
+    behavior is strange.
+
+## ld.so behavior
+
+*Linux Standard Base Core Specification, Generic Part* describes the behavior
+of ld.so. Kan added symbol versioning support to FreeBSD rtld in 2005.
+
+The `DT_VERNEED` and `DT_VERNEEDNUM` tags in the dynamic table delimit the
+version requirements of a shared object/executable file: the required versions
+and required shared object names (`Vernaux::vna_name`).
+
+For each Vernaux entry (a Verneed's auxiliary entry) without the
+`VER_FLG_WEAK` bit, ld.so checks whether the referenced shared object has the
+`DT_VERDEF` table. If no, ld.so handles the case as a graceful degradation; if
+yes and the table does not define the version, ld.so reports an error.
+
+Usually a minor release does not bump soname. Suppose that libB.so depends on
+libA 1.3 (soname is libA.so.1) and calls a function which does not exist
+in libA 1.2. If PLT lazy binding is used, libB.so may seem to work on a system
+with libA 1.2, until the PLT of the 1.3 symbol is called. If symbol versioning
+is not used and you want to solve this problem, you have to record the minor
+version number (`libA.so.1.3`) in the soname. However, bumping soname is
+all-or-nothing: all the dependent shared objects need to be relinked. If symbol
+versioning is used, you can continue to use the soname `libA.so.1`. ld.so will
+report an error if libA 1.2 is used, because the 1.3 version required by
+libB.so does not exist.
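+
+The load-time check can be sketched as a small C model. It is not glibc's
+implementation: `struct object`, `struct requirement` and `check_requirement`
+are names invented for the example, and weak requirements are simply skipped
+because the text above only describes the check for non-weak entries.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+
+#define VER_FLG_WEAK 0x2 /* same value as in <elf.h> */
+
+/* Hypothetical, simplified view of a loaded shared object. */
+struct object {
+  const char *soname;
+  const char **verdefs; /* NULL if the object has no DT_VERDEF table */
+  size_t nverdefs;
+};
+
+/* Hypothetical, simplified view of one Vernaux requirement. */
+struct requirement {
+  const char *version;
+  unsigned flags;
+};
+
+/* Returns true if loading may proceed, false if an error would be reported. */
+bool check_requirement(const struct object *dep, const struct requirement *req) {
+  assert(dep != NULL && req != NULL);
+  if (req->flags & VER_FLG_WEAK)
+    return true; /* not covered by the check modeled here */
+  if (dep->verdefs == NULL)
+    return true; /* unversioned library: graceful degradation */
+  for (size_t i = 0; i < dep->nverdefs; i++)
+    if (strcmp(dep->verdefs[i], req->version) == 0)
+      return true;
+  return false; /* the table exists but does not define the version */
+}
+```
+
+In the libA example, a requirement on a 1.3 version node checked against a
+libA 1.2 that only defines older version nodes makes the function return
+`false`, matching the error that ld.so reports.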
+
+In the symbol resolution stage:
+
+* An undefined `foo` can be resolved to a definition of `foo` or `foo@@v2` (only
+  the definitions with index number 1 (`VER_NDX_GLOBAL`) and 2 are used in the
+  reference match).
+* An undefined `foo@v1` can be resolved to a definition of `foo`, `foo@v1`, or
+  `foo@@v1`.
+
+Note that an undefined `foo` resolving to `foo@v1` is allowed by ld.so but not
+by the linker. This difference provides a
+mechanism to refuse linking against old symbols while keeping compatibility
+with unversioned old libraries. If a new version of a shared object needs to
+deprecate an unversioned `bar`, you can remove `bar` and define `bar@compat`
+instead. Libraries using `bar` are unaffected but new links against `bar` are
+disallowed.
+
+## Upgraded symbols in glibc
+
+Note that GNU nm before binutils 2.35 does not display `@` or `@@`.
+
+```
+nm -D /lib/x86_64-linux-gnu/libc.so.6 | \
+  awk '$2!="U" {i=index($3,"@"); if(i){v=substr($3,i); $3=substr($3,1,i-1); m[$3]=m[$3]" "v}} \
+  END {for(f in m)if(m[f]~/@.+@/)print f, m[f]}'
+```
+
+The output on my x86-64 system:
+
+```
+pthread_cond_broadcast  @GLIBC_2.2.5 @@GLIBC_2.3.2
+clock_nanosleep  @@GLIBC_2.17 @GLIBC_2.2.5
+_sys_siglist  @@GLIBC_2.3.3 @GLIBC_2.2.5
+sys_errlist  @@GLIBC_2.12 @GLIBC_2.2.5 @GLIBC_2.3 @GLIBC_2.4
+quick_exit  @GLIBC_2.10 @@GLIBC_2.24
+memcpy  @@GLIBC_2.14 @GLIBC_2.2.5
+regexec  @GLIBC_2.2.5 @@GLIBC_2.3.4
+pthread_cond_destroy  @GLIBC_2.2.5 @@GLIBC_2.3.2
+nftw  @GLIBC_2.2.5 @@GLIBC_2.3.3
+pthread_cond_timedwait  @@GLIBC_2.3.2 @GLIBC_2.2.5
+clock_getres  @GLIBC_2.2.5 @@GLIBC_2.17
+pthread_cond_signal  @@GLIBC_2.3.2 @GLIBC_2.2.5
+fmemopen  @GLIBC_2.2.5 @@GLIBC_2.22
+pthread_cond_init  @GLIBC_2.2.5 @@GLIBC_2.3.2
+clock_gettime  @GLIBC_2.2.5 @@GLIBC_2.17
+sched_setaffinity  @GLIBC_2.3.3 @@GLIBC_2.3.4
+glob  @@GLIBC_2.27 @GLIBC_2.2.5
+sys_nerr  @GLIBC_2.2.5 @GLIBC_2.4 @@GLIBC_2.12 @GLIBC_2.3
+_sys_errlist  @GLIBC_2.3 @GLIBC_2.4 @@GLIBC_2.12 @GLIBC_2.2.5
+sys_siglist  @GLIBC_2.2.5 @@GLIBC_2.3.3
+clock_getcpuclockid  @GLIBC_2.2.5 @@GLIBC_2.17
+realpath  @GLIBC_2.2.5 @@GLIBC_2.3
+sys_sigabbrev  @GLIBC_2.2.5 @@GLIBC_2.3.3
+posix_spawnp  @@GLIBC_2.15 @GLIBC_2.2.5
+posix_spawn  @@GLIBC_2.15 @GLIBC_2.2.5
+_sys_nerr  @@GLIBC_2.12 @GLIBC_2.4 @GLIBC_2.3 @GLIBC_2.2.5
+nftw64  @GLIBC_2.2.5 @@GLIBC_2.3.3
+pthread_cond_wait  @GLIBC_2.2.5 @@GLIBC_2.3.2
+sched_getaffinity  @GLIBC_2.3.3 @@GLIBC_2.3.4
+clock_settime  @GLIBC_2.2.5 @@GLIBC_2.17
+glob64  @@GLIBC_2.27 @GLIBC_2.2.5
+```
+
+* `realpath@@GLIBC_2.3`: the previous version returns `EINVAL` when the second
+  parameter is NULL
+* `memcpy@@GLIBC_2.14` [BZ12518](https://sourceware.org/bugzilla/show_bug.cgi?id=12518):
+  the previous version guarantees a forward copying behavior. Shockwave Flash
+  at that time had a "memcpy downward" bug which required the workaround.
+* `quick_exit@@GLIBC_2.24` [BZ20198](https://sourceware.org/bugzilla/show_bug.cgi?id=20198):
+  the previous version calls the destructors of `thread_local` objects.
+* `glob64@@GLIBC_2.27`: the previous version does not follow dangling symlinks.
+
+## How to remove symbol versioning
+
+Imagine that you want to build an application with a prebuilt shared object
+which has versioned references, but you can only find shared objects providing
+the unversioned definitions. The linker will helpfully error:
+
+```
+ld.lld: error: undefined reference to foo@v1 [--no-allow-shlib-undefined]
+```
+
+As the diagnostic suggests, you can add `--allow-shlib-undefined` to get rid of
+the error. It is not recommended but the built application may happen to work.
+
+For this case, an alternative hacky solution is:
+
+```
+# 64-bit (an Elf64_Dyn entry is 16 bytes, with an 8-byte d_tag)
+cp in.so out.so
+r2 -wqc '/x feffff6f00000000 @ section..dynamic; w0 16 @ hit0_0' out.so
+llvm-objcopy -R .gnu.version out.so
+
+# 32-bit (an Elf32_Dyn entry is 8 bytes, with a 4-byte d_tag)
+cp in.so out.so
+r2 -wqc '/x feffff6f @ section..dynamic; w0 8 @ hit0_0' out.so
+llvm-objcopy -R .gnu.version out.so
+```
+
+With the removal of `.gnu.version`, the linker will think that `out.so`
+references `foo` instead of `foo@v1`. However, llvm-objcopy will zero out the
+section contents. At runtime, glibc ld.so will complain `unsupported version 0
+of Verneed record`. To make glibc happy, you can delete `DT_VER*` tags from the
+dynamic table. The above code snippet uses an r2 command to locate
+`DT_VERNEED` (`0x6ffffffe`) and rewrite it to `DT_NULL` (a `DT_NULL` entry stops
+the parsing of the dynamic table). The difference in the `readelf -d` output is
+roughly:
+
+```
+  0x000000006ffffffb (FLAGS_1)            Flags: NOW
+- 0x000000006ffffffe (VERNEED)            0x8ef0
+- 0x000000006fffffff (VERNEEDNUM)         5
+- 0x000000006ffffff0 (VERSYM)             0x89c0
+- 0x000000006ffffff9 (RELACOUNT)          1536
+  0x0000000000000000 (NULL)               0x0
+```
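+
+For readers without r2, the same rewrite can be expressed over an in-memory
+copy of a 64-bit dynamic table. This is a sketch under assumptions: it relies
+on the `<elf.h>` header shipped by glibc and musl, the function name
+`truncate_at_verneed` is made up, and locating `.dynamic` in the file and
+writing the bytes back are left out.
+
+```c
+#include <assert.h>
+#include <elf.h>
+#include <stddef.h>
+
+/* Overwrite the DT_VERNEED entry with DT_NULL so that parsing of the dynamic
+   table stops there. Returns the index of the rewritten entry, or -1 if the
+   table has no DT_VERNEED before its terminating DT_NULL. */
+ptrdiff_t truncate_at_verneed(Elf64_Dyn *dyn, size_t n) {
+  assert(dyn != NULL);
+  for (size_t i = 0; i < n && dyn[i].d_tag != DT_NULL; i++) {
+    if (dyn[i].d_tag == DT_VERNEED) {
+      dyn[i].d_tag = DT_NULL; /* zero the whole 16-byte entry */
+      dyn[i].d_un.d_val = 0;
+      return (ptrdiff_t)i;
+    }
+  }
+  return -1;
+}
+```
+
+As in the `readelf -d` difference above, every entry after the rewritten one
+becomes invisible to consumers of the table, so this only makes sense when the
+entries that follow `DT_VERNEED` are expendable.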
+
+## LLD
+
+* If an undefined symbol is not defined by a shared object, GNU ld will report
+  an error. LLD before 12.0 did not error (I fixed it in
+  https://reviews.llvm.org/D92260).
+
+## Remarks
+
+GCC and Clang support asm specifiers and `#pragma redefine_extname` for
+renaming a symbol. For example, if you declare `int foo() asm("foo_v1");` and
+then reference `foo`, the symbol in the .o file will be `foo_v1`.
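+
+A minimal sketch of the renaming, assuming GCC or Clang on an ELF target
+(other object formats may add a prefix to C symbol names); the helper
+`call_by_symbol_name` is only there for illustration:
+
+```c
+/* The C identifier is `foo`, but the object file symbol is `foo_v1`. */
+int foo(void) __asm__("foo_v1");
+int foo(void) { return 42; }
+
+/* An ordinary declaration named after the symbol binds to that definition. */
+int foo_v1(void);
+
+/* Hypothetical helper: reaches the definition through the renamed symbol. */
+int call_by_symbol_name(void) { return foo_v1(); }
+```
+
+Running `nm` on the resulting object file should list `foo_v1` and no `foo`.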
+
+For example, the biggest change in musl v1.2.0 is the time64 support for its
+supported 32-bit architectures. musl adopted a scheme based on asm specifiers:
+
+```c
+// include/features.h
+#define __REDIR(x,y) __typeof__(x) x __asm__(#y)
+
+// API header include/sys/time.h
+int utimes(const char *, const struct timeval [2]);
+__REDIR(utimes, __utimes_time64);
+
+// Implementation src/linux/utimes.c
+int utimes(const char *path, const struct timeval times[2]) { ... }
+
+// Internal header compat/time32/time32.h
+int __utimes_time32() __asm__("utimes");
+
+// Compat implementation compat/time32/utimes_time32.c
+int __utimes_time32(const char *path, const struct timeval32 times32[2]) { ... }
+```
+
+* In .o, the time32 symbol remains `utimes` and is compatible with the ABI
+  required by programs linked against old musl versions; the time64 symbol is
+  `__utimes_time64`.
+* The public header redirects `utimes` to `__utimes_time64`.
+  * cons: if users declare `utimes` by themselves, they will not link against
+    the correct `__utimes_time64`.
+* The "good-looking" name `utimes` is used for the preferred time64
+  implementation internally and the "ugly" name `__utimes_time32` is used for
+  the legacy time32 implementation.
+  * If the time32 implementation is called elsewhere, the "ugly" name can make
+    it stand out.
+
+For the above example, here is an implementation with symbol versioning:
+
+```c
+// API header include/sys/time.h
+int utimes(const char *, const struct timeval [2]);
+
+// Implementation src/linux/utimes.c
+int utimes(const char *path, const struct timeval times[2]) { ... }
+
+// Internal header compat/time32/time32.h
+// Probably __asm__(".symver __utimes_time32, utimes@time32, rename"); if supported
+__asm__(".symver __utimes_time32, utimes@time32");
+
+// Implementation compat/time32/utimes_time32.c
+int __utimes_time32(const char *path, const struct timeval32 times32[2])
+{
+  ...
+}
+```
+
+Note that `@@@` cannot be used. The header is included in a defining
+translation unit and `@@@` would lead to a default version definition while we
+want a non-default version definition.
+
+As described under assembler behavior, the undesirable `__utimes_time32` is
+still present. Be careful to use a version script to localize it.
+
+So what is the significance of symbol versioning? After careful thought, I see
+the following:
+
+* Refuse linking against old symbols while keeping compatibility with
+  unversioned old libraries.
+* No need to label declarations.
+* The version definition can be delayed until link time. The version script
+  provides a flexible pattern matching mechanism to assign versions.
+* Scope reduction. Arguably another mechanism like `--dynamic-list` might have
+  been developed if version scripts did not provide `local:`.
+* There are some semantic issues in renaming builtin functions with asm
+  specifiers in GCC and Clang (they do not know that the renamed symbol has
+  built-in semantics). See [2020-10-15-intra-call-and-libc-symbol-renaming](https://maskray.me/blog/2020-10-15-intra-call-and-libc-symbol-renaming)
+* ld.so checks version requirements at load time (see "ld.so behavior").
+
+For the first item, the asm specifier scheme relies on conventions to prevent
+problems (users should include the header), whereas symbol versioning is
+enforced by ld.
+
+Design flaws:
+
+* The behavior of `.symver foo, foo@v1` when `foo` is defined: the symbol `foo`
+  is retained (a redundant symbol is present at link time), and the binding and
+  `st_other` are kept in sync (it is not convenient to set a different binding
+  or visibility).
+* Verdaux is a bit redundant. In practice, one Verdef has only one auxiliary
+  Verdaux entry.
+* This is arguably a minor problem but annoying for a framework providing
+  multiple shared objects. ld.so requires "a versioned symbol is implemented in
+  the same shared object in which it was found at link time", which disallows
+  moving definitions between shared objects. Fortunately, glibc 2.30 [BZ24741](http://sourceware.org/PR24741)
+  relaxes this requirement, essentially ignoring `Vernaux::vna_name`.
+
+Before that, glibc used a forwarder to move `clock_*` functions from librt.so
+to libc.so:
+
+```c
+// rt/clock-compat.c
+__typeof(clock_getres) *clock_getres_ifunc(void) asm("clock_getres");
+__typeof(clock_getres) *clock_getres_ifunc(void) { return &__clock_getres; }
+```
+
+libc.so defines `__clock_getres` and `clock_getres`. librt.so defines an ifunc
+called `clock_getres` which forwards to libc.so `__clock_getres`.
+
+## Related links
+
+* [Combining Versions](combining-versions.md)
+* [Version Scripts](version-scripts.md)
+* https://invisible-island.net/ncurses/ncurses-mapsyms.html
+
--- a/maskray-3.md
+++ b/maskray-3.md
--- a/maskray-4.md
+++ b/maskray-4.md
@ -0,0 +1,371 @@
+# LLD and GNU linker incompatibilities
+
+Subtitle: Is LLD a drop-in replacement for GNU ld?
+
+The motivation for this article was someone challenging the "drop-in
+replacement" claim on LLD's website (the discussion was about Linux-like ELF
+toolchain):
+
+> LLD is a linker from the LLVM project that is a drop-in replacement for
+> system linkers and runs much faster than them. It also provides features that
+> are useful for toolchain developers.
+
+99.9% of software works with LLD without a change. Some linker script
+applications may need an adaptation (such adaptation is oftentimes due to
+brittle assumptions: asking too much from GNU ld's behavior, which should be
+fixed anyway). So I defended this claim.
+
+Piotr Kubaj said that this is probably more of a marketing term than a
+technical term: the term tries to lure existing users into thinking "it's the
+same you know, but better!". I think that this is fair in some senses: for many
+applications LLD has achieved much faster speed and much lower memory usage
+than GNU ld. A more important thing is that LLD adds a third choice to the
+spectrum. It brings competitive pressure to both sides, gives incentive for
+improvement, and makes for more standardized future features/extensions. One
+reason that I am subscribed to the binutils mailing list is I want to
+participate in its design processes (I am proud to say that I have managed to
+find some early issues of various new things).
+
+Anyway, I thought documenting the compatibility problems between the ELF ports
+of LLD and GNU ld is useful, not only to others but also to my future self,
+hence this article. I will try to describe GNU gold behaviors as well.
+
+So here is the long list. Please keep in mind that many compatibility issues do
+not really matter and a user may never run into such an issue. Many of them
+just serve educational purposes and as my personal reference. There are some
+user-perceivable differences but quite a lot are WONTFIX on both GNU ld and
+LLD. LLD, as a newer linker, has less legacy compatibility burden and can make
+good default choices in some cases and say no to some unneeded
+features/behaviors. A large number of features are duplicated in GNU ld's
+various ports. It is also common that one thing behaves this way in port A and
+another way in port B.
+
+* GNU ld reports `gc-sections requires either an entry or an undefined symbol`
+  in a `-r --gc-sections` link. LLD doesn't error
+  (https://reviews.llvm.org/D84131#2162411). I am unsure whether such a
+  diagnostic will be useful (an uncommon use case where the GC roots are more
+  than the explicit linker options).
+* The default image base for `-no-pie` links is different. For example, on
+  x86-64, GNU ld defaults to 0x400000 while LLD defaults to 0x200000.
+* GNU ld synthesizes a `STT_FILE` symbol when copying non-`STT_SECTION`
+  `STB_LOCAL` symbols. LLD doesn't.
+  * The `STT_FILE` symbol name is the input filename. For compiler driver
+    specified startup files like `crti.o` and `crtn.o`, their absolute paths
+    will end up in the linked image. This breaks local determinism (toolchain
+    paths are leaked) for some users.
+  * I filed https://bugs.llvm.org/show_bug.cgi?id=48023 and
+    https://sourceware.org/bugzilla/show_bug.cgi?id=26822. From binutils 2.36
+    onwards, the base name will be used.
+* Text relocations.
+  * In GNU ld, `-z notext`/`-z text`/unspecified are a tri-state. For
+    `-z notext`/unspecified, the dynamic tags `DT_TEXTREL` and `DF_TEXTREL` are
+    added on demand. If unspecified and GNU ld is configured with
+    `--enable-textrel-check=warning`, a warning will be issued.
+  * LLD has two states and adds `DT_TEXTREL` and `DF_TEXTREL` if `-z notext` is specified.
+  * GNU ld supports more relocation types as text relocations.
+* Default library paths.
+  * GNU ld has default library paths.
+  * LLD doesn't. This is intentional so https://reviews.llvm.org/D70048
+    (NetBSD) cannot be accepted.
+* GNU ld supports grouped short options. This can sometimes cause surprising
+  behaviors with misspelled or unimplemented options, e.g. `-no-pie` means
+  `-n -o -pie` because GNU ld as of 2.35 has not implemented `-no-pie`. Nick
+  Clifton committed `Update the BFD linker so that it deprecates grouped short
+  options.` to deprecate the GNU ld feature. LLD has never supported grouped
+  short options.
+* Mixed `SHF_LINK_ORDER` and non-`SHF_LINK_ORDER` input sections in an output
+  section.
+  * LLD performs sorting within an input section description and allows
+    arbitrary mixes.
+  * GNU ld does not allow mixed sections
+    https://sourceware.org/bugzilla/show_bug.cgi?id=26256 (H.J. Lu has a patch)
+* LLD uses `-z relro` by default. This is probably not a good default
+  but it is difficult to change now. I have a comment in
+  https://bugs.llvm.org/show_bug.cgi?id=48549. GNU ld warns for `-z relro` and
+  `-z norelro` for non-Linux/FreeBSD BFD emulations (e.g. `-m aarch64elf`).
+* Different archive member extraction semantics. See
+  http://lld.llvm.org/ELF/warn_backrefs.html for details.
+* LLD `--warn-backrefs` warns for `def.a ref.o def.so` if `def.a` cannot
+  satisfy previous unresolved symbols. LLD resolves the definition to `def.a`
+  while GNU linkers resolve the definition to `def.so`.
+* GNU ld `-static` has traditionally been a synonym for `-Bstatic`. Recently on
+  x86 it has been changed to behave somewhat like gold `-static`, which
+  disallows linking against shared objects. LLD `-static` is still a synonym for
+  `-Bstatic`.
+* GNU linkers have a default `--dynamic-linker`. LLD doesn't.
+* GNU linkers warn for `.gnu.warning.*` sections. LLD doesn't. It is unclear
+  whether the feature is useful. https://bugs.llvm.org/show_bug.cgi?id=42008
+* GNU ld has architecture-specific rules for relocations referencing undefined
+  weak symbols. I don't think the GNU ld behaviors can be summarized (even by
+  maintainers!). LLD's are consistent.
+* The conditions to create `.interp` are different. I believe GNU ld's is quite
+  difficult to describe.
+* `--no-allow-shlib-undefined` and `--rpath-link`
+  * GNU ld traces all shared objects (transitive `DT_NEEDED` dependencies) and
+    emulates the behavior of a dynamic loader to warn in more cases.
+  * gold and LLD implement a simplified version. They warn for shared objects
+    whose `DT_NEEDED` dependencies are all seen as input files.
+* `--fatal-warnings`
+  * GNU ld still reports `warning: ...`.
+  * LLD switches to `error: ...`.
+* `--no-relax`
+  * GNU ld: disable `R_X86_64_[REX_]GOTPCRELX`
+  * LLD: no-op (https://reviews.llvm.org/D81359)
+* LLD places `.rodata` (among other `SHF_ALLOC` and
+  non-`SHF_WRITE`-non-`SHF_EXECINSTR` sections) before `.text` (among other
+  `SHF_ALLOC` and `SHF_EXECINSTR` sections).
+* `.symtab`/`.shstrtab`/`.strtab` in a linker script.
+  * Ignored by GNU ld, therefore `--orphan-handling=` does not warn/error.
+  * Respected by LLD
+* Whether `ADDR(.foo)` in a linker script can retain an empty output section.
+  * GNU ld: no. Symbol assignments relative to such empty sections may have
+    strange `st_shndx`.
+  * LLD: yes.
+* If an undefined symbol is referenced by both `R_X86_64_JUMP_SLOT` (lazy) and
+  `R_X86_64_GLOB_DAT` (non-lazy)
+  * GNU ld generates `.plt.got` with `R_X86_64_GLOB_DAT` relocations.
+    `R_X86_64_JUMP_SLOT` can thus be omitted to decrease the number of dynamic
+    relocations.
+  * LLD does not implement this saving. This naturally requires more than one
+    pass scanning relocations which LLD doesn't do at present. https://bugs.llvm.org/show_bug.cgi?id=32938
+* GNU ld relaxes `R_X86_64_GOTPCREL` relocations with some forms (e.g.
+  `movq foo@GOTPCREL(%rip), %reg` -> `leaq foo(%rip), %reg`). LLD never
+  relaxes `R_X86_64_GOTPCREL` relocations.
+* GNU linkers give `.gnu.linkonce*` sections COMDAT section semantics. LLD
+  simply ignores such sections. https://bugs.llvm.org/show_bug.cgi?id=31586
+  tracks when the hack can be removed.
+* GNU ld adds `PT_PHDR` and `PT_INTERP` together. A shared object usually does
+  not have the two program headers. In LLD, `PT_PHDR` is always added unless the
+  address assignment makes it unsuitable to place program headers at all.
+* The conditions to create the dynamic symbol table `.dynsym`.
+  * LLD: there is an input shared object, `-pie`/`-shared`, or `--export-dynamic`.
+  * GNU ld's is quite complex. `--export-dynamic` is not special, though.
+* `--export-dynamic-symbol`
+  * gold's implies `-u`.
+  * GNU ld (from 2.35 onwards) and LLD's do not imply `-u`.
+* In GNU ld, a defined `foo@v1` can suppress the extraction of an archive member
+  defining `foo@@v1`. LLD treats them as two separate symbols and thus the
+  archive member extraction still happens. This can hardly matter. See [All
+  about symbol versioning](maskray-2.md) for details.
+* Default program headers.
+  * With traditional `-z noseparate-code`, GNU ld defaults to a `RX/R/RW`
+    program header layout. With `-z separate-code` (default on Linux/x86 from
+    binutils 2.31 onwards), GNU ld defaults to a `R/RX/R/RW` program header
+    layout.
+  * LLD defaults to `R/RX/RW(RELRO)/RW(non-RELRO)`. With `--no-rosegment`, LLD
+    uses `RX/RW(RELRO)/RW(non-RELRO)`.
+  * Placing all R before RX is preferable because it can save one program
+    header and reduce alignment costs.
+  * LLD's split of RW saves one maxpagesize alignment and can make the linked
+    image smaller.
+  * This breaks some assumptions that the (so-called) "text segment" precedes
+    the (so-called) "data segment".
+  * For example, certain programs expect `.text` is the first section of the
+    text segment and specify `-Ttext=0` to place the `PF_R|PF_X` program header
+    at `p_vaddr=0`. This is a brittle assumption and should be avoided. If
+    `PT_PHDR` is needed, `--image-base=0` is a replacement. If `PT_PHDR` is not
+    needed, `.text 0 : { *(.text .text.*) }` is a replacement.
+* GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in
+  `-pie` mode. glibc `csu/libc-start.c` needs it when statically linked, but
+  not in the static pie mode. LLD does not distinguish `-no-pie`, `-pie` and
+  `-shared`. https://bugs.llvm.org/show_bug.cgi?id=48674
+* LLD uses `--no-apply-dynamic-relocs` by default. GNU ld and gold fill in the
+  GOT entries with link-time values. GNU ld only supports
+  `--no-apply-dynamic-relocs` for aarch64
+  https://sourceware.org/bugzilla/show_bug.cgi?id=25891.
+* When relaxing `R_X86_64_REX_GOTPCRELX`, GNU ld suppresses the relaxation if
+  it would cause relocation overflow. LLD does not perform the check.
+* GNU ld and gold allow `--exclude-libs=b` to hide `b.a`. LLD requires
+  `--exclude-libs=b.a`.
+* Whether to use executable stack if neither `-z execstack` nor `-z noexecstack`
+  is specified. GNU ld and gold check whether an object file does not have
+  `.note.GNU-stack`. LLD ignores `.note.GNU-stack` and defaults to `-z
+  noexecstack`.
+
+## Semantics of `--wrap`
+
+GNU ld and LLD have slightly different `--wrap` semantics. I use "slightly"
+because in most use cases users will not observe a difference.
+
+In GNU ld, `--wrap` only applies to undefined symbols. In LLD, `--wrap` happens
+after all other symbol resolution steps. The implementation is to mangle the
+symbol table of each object file (`foo` -> `__wrap_foo`; `__real_foo` ->
+`foo`) so that all relocations to `foo` or `__real_foo` will be redirected.
+
+The LLD semantics have the advantage that non-LTO, LTO and relocatable link
+behaviors are consistent. I filed
+https://sourceware.org/bugzilla/show_bug.cgi?id=26358 for GNU ld.
+
+```
+# GNU ld: call bar
+# LLD: call __wrap_bar
+  call bar
+.globl bar
+bar:
+```
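+
+The renaming that LLD applies to each object file's symbol table can be
+modeled as follows. This is a sketch, not LLD's implementation: the function
+`wrap_rename` and its buffer-based interface are assumptions for illustration.
+
+```c
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Hypothetical model of `--wrap=wrapped`: write the name that `name` is
+   redirected to into out (capacity n). Unrelated names are copied as-is. */
+void wrap_rename(const char *name, const char *wrapped, char *out, size_t n) {
+  static const char real[] = "__real_";
+  const size_t real_len = sizeof real - 1;
+  assert(name != NULL && wrapped != NULL && out != NULL && n > 0);
+  if (strcmp(name, wrapped) == 0)
+    snprintf(out, n, "__wrap_%s", name); /* foo -> __wrap_foo */
+  else if (strncmp(name, real, real_len) == 0 &&
+           strcmp(name + real_len, wrapped) == 0)
+    snprintf(out, n, "%s", wrapped); /* __real_foo -> foo */
+  else
+    snprintf(out, n, "%s", name);
+}
+```
+
+Because the mapping is applied to every reference, a call to `bar` from the
+object that defines `bar` is redirected as well, which is the difference from
+GNU ld shown in the assembly snippet above.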
+
+## Relocation referencing a local relative to a discarded input section
+
+* How to resolve a relocation referencing a `STT_SECTION` symbol associated with
+  a discarded `.debug_*` input section.
+  * GNU ld and gold have logic resolving the relocation to the prevailing
+    section symbol.
+  * LLD does not have the logic. LLD 11 defines some tombstone values.
+
+> A symbol table entry with `STB_LOCAL` binding that is defined relative to one
+> of a group's sections, and that is contained in a symbol table section that
+> is not part of the group, must be discarded if the group members are
+> discarded. References to this symbol table entry from outside the group are
+> not allowed.
+
+ld.bfd/gold/lld error if the section containing the relocation is `SHF_ALLOC`.
+`.debug*` do not have the `SHF_ALLOC` flag and those relocations are allowed.
+
+lld resolves such relocations to 0. ld.bfd and gold, however, have some
+`CB_PRETEND`/`PRETEND` logic to resolve relocations to the definitions in the
+prevailing comdat groups. The code is hacky and may not suit lld.
+
+https://bugs.llvm.org/show_bug.cgi?id=42030
+
+## Canonical PLT entry for ifunc
+
+How to handle a direct access relocation referencing a `STT_GNU_IFUNC`?
+
+c.f. [GNU indirect function](maskray-6.md).
+
+## `__rela_iplt_start`
+
+GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in `-pie`
+mode.  LLD defines `__rela_iplt_start` regardless of `-no-pie`, `-pie` or
+`-shared`.
+
+Static pie and static no-pie relocation processing is very different in glibc.
+
+* Static no-pie uses special code to process a magic array delimited by
+  `__rela_iplt_start`/`__rela_iplt_end`.
+* Static pie uses self-relocation to take care of `R_*_IRELATIVE`. The above
+  magic array code is executed as well. If `__rela_iplt_start`/`__rela_iplt_end`
+  are defined (like what LLD does), we will get
+  `0 < __rela_iplt_start < __rela_iplt_end` in `csu/libc-start.c`.
+  `ARCH_SETUP_IREL` will crash when resolving the first relocation which has
+  been processed.
+
+nsz has a glibc patch that moves the self-relocation later so everything is set up for ifunc resolvers.
+
+## Linker scripts
+
+* Some linker script commands are unimplemented in LLD, e.g. `BLOCK()` as a
+  compatibility alias for `ALIGN()`. `BLOCK` is documented in GNU ld as a
+  compatibility alias and it is not widely used, so there is no reason to keep
+  the kludge in LLD.
+* Some syntax is not recognized by LLD, e.g. LLD recognizes
+  `*(EXCLUDE_FILE(a.o) .text)` but not `EXCLUDE_FILE(a.o) *(.text)`
+  (https://bugs.llvm.org/show_bug.cgi?id=45764)
+  * To me the unrecognized syntax is misleading.
+  * If we support one way doing something, and the thing has several
+    alternative syntax, we may not consider the alternative syntax just for the
+    sake of completeness.
+* Different orphan section placement. GNU ld has very complex rules and certain
+  section names have special semantics. LLD adopted some of its core ideas but
+  made a lot of simplifications:
+  * output sections are given ranks
+  * output sections are placed after symbol assignments
+
+  At some point we should document it. https://bugs.llvm.org/show_bug.cgi?id=42327
+* For an error detected when processing a linker script, LLD may report it
+  multiple times (e.g. `ASSERT` failure). GNU ld has such issues, too, but
+  probably much rarer.
+* `SORT` commands
+  * GNU ld: https://sourceware.org/binutils/docs/ld/Input-Section-Basics.html#Input-Section-Basics
+    mentions the feature but its behavior is strange/unintuitive. I wrote this
+    up as "`SORT` and multiple patterns in an input section description".
+  * LLD performs sorting within an input section description.
+    https://reviews.llvm.org/D91127
+* In LLD, `AT(lma)` forces creation of a new `PT_LOAD` program header. GNU ld
+  can reuse the previous `PT_LOAD` program header if LMA addresses are
+  contiguous. `lma-offset.s`
+* In LLD, non-`SHF_ALLOC` sections always get 0 `sh_addr`. In GNU ld you can
+  have non-zero `sh_addr` but `STT_SECTION` relocations referencing such
+  sections are not really meaningful.
+* Dot assignment (e.g. `. = 4;`) in an output section description.
+  * GNU ld: dot advances to 4 relative to the start. If you consider `.` on the
+    right hand side and `ABSOLUTE(.)`, I don't think the behaviors are
+    consistent.
+  * LLD: move dot to address 0x4, which will usually trigger an `unable to move
+    location counter backward` error. https://bugs.llvm.org/show_bug.cgi?id=41169
+
+I'll also mention some LLD release notes which demonstrate some GNU
+incompatibility in previous versions. (For example, if one thing is supported
+in version N, then the implication is that it is unsupported in previous
+versions. Well, it could be that it worked in older versions but regressed at
+some version. However, I am not aware of any such cases.)
+
+LLD 12.0.0
+
+* `-r --gc-sections` is supported.
+* The archive member extraction semantics of COMMON symbols is by default
+  (`--fortran-common`) compatible with GNU ld. You may want to read Semantics
+  of a common definition in an archive for details. This is unfortunate.
+* `.rel[a].plt` and `.rel[a].dyn` get the `SHF_INFO_LINK` flag. https://reviews.llvm.org/D89828
+
+LLD 11.0.0
+
+* LLD can discard unused symbols with `--discard-all`/`--discard-locals` when
+  `-r` or `--emit-relocs` is specified. https://reviews.llvm.org/D77807
+* `--emit-relocs --strip-debug` can be used. https://reviews.llvm.org/D74375
+* `SHT_GNU_verneed` in shared objects are parsed, and versioned undefined
+  symbols in shared objects are respected. Previously non-default version
+  symbols could cause spurious `--no-allow-shlib-undefined` errors.
+  https://reviews.llvm.org/D80059
+* `DF_1_PIE` is set for position-independent executables. https://reviews.llvm.org/D80872
+* Better compatibility related to output section alignments and LMA regions.
+  [D75286](https://reviews.llvm.org/D75286) [D74297](https://reviews.llvm.org/D74297)
+  [D75724](https://reviews.llvm.org/D75725) [D81986](https://reviews.llvm.org/D81986)
+* `-r` allows `SHT_X86_64_UNWIND` to be merged into `SHT_PROGBITS`. This allows
+  clang/GCC produced object files to be mixed together. https://reviews.llvm.org/D85785
+* In an input section description, the filename can be specified in double
+  quotes. `archive:file` syntax is added. https://reviews.llvm.org/D72517 https://reviews.llvm.org/D75100
+* Linker script specified empty `(.init|.preinit|.fini)_array` are allowed with
+  `RELRO`. https://reviews.llvm.org/D76915
+
+LLD 10.0.0
+
+* LLD supports `\` (treating the next character like a non-meta character) and
+  `[!...]` (negation) in glob patterns. https://reviews.llvm.org/D66613
+
+LLD 9.0.0
+
+* The `DF_STATIC_TLS` flag is set for i386 and x86-64 when initial-exec TLS
+  models are used.
+* Many configurations of the Linux kernel's `arm32_7`, `arm64`, `powerpc64le`
+  and `x86_64` ports can be linked by LLD.
+
+LLD 8.0.0
+
+* `SHT_NOTE` sections get very high ranks (they usually precede other
+  sections). https://reviews.llvm.org/D55800
+
+In the LLD 7.0.0 era, https://reviews.llvm.org/D44264 was my first meaningful
+(albeit trivial) patch to LLD. Next I contributed to `--warn-backrefs`.
+Then I started to fix tricky issues like copy relocations of a versioned
+symbol, duplicate `--wrap`, and section ranks. I have learned a lot from these
+code reviews. In the 8.0.0, 9.0.0 and 10.0.0 era, I fixed a number of tricky
+issues and improved a dozen other things, and I am confident to say that,
+other than MIPS ;-) and certain other ISA-specific things, I am familiar with
+every corner of the code base. There are still challenges such as integration
+of RISC-V style linker relaxation and post-link optimization, and improvement
+to some aspects of the linker script, but otherwise LLD is a stable and
+finished part of the toolchain.
+
+A few random notes:
+
+* Symbol resolution can take 10%~20% of the link time. Parallelization can
+  theoretically improve the process but it is hard to overstate the challenge
+  (if you additionally take into account determinism).
+* Be wary of feature creep. I have learned a lot from ELF design discussions
+  on generic-abi and from Solaris "linker aliens" in particular. I am sorry to
+  say so but some development on LLD indeed belongs to such categories.
+  Sometimes it is difficult to draw a line between unsupported legacy and
+  legacy we have to support.
+* LLD's adoption is now so large that sometimes a decision (like a default
+  value for an option) cannot make everyone happy.
+
--- a/maskray-5.md
+++ b/maskray-5.md
@ -0,0 +1,462 @@
+# Copy relocations, canonical PLT entries and protected visibility
+
+Background:
+
+* `-fno-pic` can only be used by executables. On most platforms and
+  architectures, direct access relocations are used to reference external data
+  symbols.
+* `-fpic` can be used by both executables and shared objects. Windows has
+  `__declspec(dllimport)` but most other binary formats allow a default
+  visibility external data to be resolved to a shared object, so generally
+  direct access relocations are disallowed.
+* `-fpie` was introduced as a mode similar to `-fpic` for ELF: the compiler can
+  assume that the produced object file can only be used by executables, thus
+  all definitions are non-preemptible and interprocedural optimizations can
+  apply to them.
+
+For
+
+```c
+extern int a;
+int *foo() { return &a; }
+```
+
+`-fno-pic` typically produces an absolute relocation (a PC-relative relocation
+can be used as well). On ELF x86-64 it is usually `R_X86_64_32` in the position
+dependent small code model. If `a` is defined in the executable (by another
+translation unit), everything works fine. If `a` turns out to be defined in a
+shared object, its real address will be non-constant at link time. One of the
+following actions needs to be taken:
+
+* Emit a dynamic relocation in every use site. Text sections are usually
+  non-writable. A dynamic relocation applied on a non-writable section is
+  called a text relocation.
+* Emit a single copy relocation. Copy relocations only work for executables.
+  The linker obtains the size of the symbol, allocates the bytes in `.bss`
+  (this may make the object writable; LLD may pick a read-only area instead),
+  and emits an `R_*_COPY` relocation. All references resolve to the new
+  location.
+
+Multiple text relocations are even less acceptable, so on ELF a copy relocation
+is generally used. Here is a nice description from [Rich
+Felker](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55012): "Copy relocations
+are not a case of overriding the definition in the abstract machine, but an
+implementation detail used to support data objects in shared libraries when the
+main program is non-PIC."
+
+Copy relocations have drawbacks:
+
+* Break page sharing.
+* Make the symbol properties (e.g. size) part of ABI.
+* If the shared object is linked with `-Bsymbolic` or `--dynamic-list` and
+  defines a data symbol copy relocated by the executable, the address of the
+  symbol may be different in the shared object and in the executable.
+
+What went poorly was that `-fno-pic` code had no way to avoid copy relocations
+on ELF. Traditionally copy relocations could only occur in `-fno-pic` code. A
+GCC 5 change made them possible in `-fpie` code on x86-64. Please read on.
+
+## x86-64: copy relocations and `-fpie`
+
+Using GOT indirection for external data symbols in `-fpic` code has a cost.
+Making `-fpie` similar to `-fpic` in this regard incurs that cost even if the
+data symbol turns out to be defined in the executable. Having the data symbol
+defined in another translation unit linked into the executable is very common,
+especially if the vendor links fully or mostly statically.
+
+In GCC 5, ["x86-64: Optimize access to globals in PIE with copy
+reloc"](https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=77ad54d911dd7cb88caf697ac213929f6132fdcf)
+started to use direct access relocations for external data symbols on x86-64 in
+`-fpie` mode.
+
+```c
+extern int a;
+int foo() { return a; }
+```
+
+* GCC<5: `movq a@GOTPCREL(%rip), %rax; movl (%rax), %eax` (9 bytes)
+* GCC>=5: `movl a(%rip), %eax` (6 bytes)
+
+This change is actually useful for architectures other than x86-64 but has
+never been implemented for them. What went wrong: the change was
+implemented as an inflexible configure-time choice (`HAVE_LD_PIE_COPYRELOC`),
+defaulting to such a behavior if ld supports PIE copy relocations (most
+binutils installations). Keep in mind that such a `-fpie` default [breaks
+`-Bsymbolic` and `--dynamic-list` in shared objects](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65888).
+
+Clang addressed the inflexible configure-time choice via an opt-in option
+`-mpie-copy-relocations` (D19996).
+
+I noticed that:
+
+* The option can be used for `-fno-pic` code as well to prevent copy
+  relocations on ELF. This is occasionally what users want (if their shared
+  objects use `-Bsymbolic` and export data symbols (usually undesired from an
+  API perspective but can avoid costs at times)), and they switch from
+  `-fno-pic` to `-fpic` just for this purpose.
+* The option name should describe the code generation behavior, instead of the
+  inferred behavior at the linking stage on a particular binary format.
+* The option does not need to be tied to ELF.
+  * On COFF, the behavior is like always `-fdirect-access-external-data`.
+    `__declspec(dllimport)` is needed to enable indirect access.
+  * On Mach-O, the behavior is like `-fdirect-access-external-data` for
+    `-fno-pic` (only available on arm) and the opposite for `-fpic`.
+* H.J. Lu introduced `R_X86_64_GOTPCRELX` and `R_X86_64_REX_GOTPCRELX` as a GOT
+  optimization to the x86-64 psABI. This is great! With the optimization, GOT
+  indirection can be optimized, so the incurred cost is very low now.
+
+So I proposed an alternative option `-f[no-]direct-access-external-data`:
+https://reviews.llvm.org/D92633 and
+https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112. My wish on the GCC side is
+to drop `HAVE_LD_PIE_COPYRELOC` and (x86-64) default to GOT indirection for
+external data symbols in `-fpie` mode.
+
+Please keep in mind that `-f[no-]semantic-interposition` is for definitions
+while `-f[no-]direct-access-external-data` is for undefined data symbols. GCC 5
+introduced `-fno-semantic-interposition` to use local aliases for references to
+definitions in the same translation unit.
+
+## `STV_PROTECTED`
+
+Now let's consider how `STV_PROTECTED` comes into play. Here is the generic ABI
+definition:
+
+> A symbol defined in the current component is protected if it is visible in
+> other components but not preemptable, meaning that any reference to such a
+> symbol from within the defining component must be resolved to the definition
+> in that component, even if there is a definition in another component that
+> would preempt by the default rules. A symbol with `STB_LOCAL` binding may not
+> have `STV_PROTECTED` visibility. If a symbol definition with `STV_PROTECTED`
+> visibility from a shared object is taken as resolving a reference from an
+> executable or another shared object, the `SHN_UNDEF` symbol table entry
+> created has `STV_DEFAULT` visibility.
+
+A non-local `STV_DEFAULT` defined symbol is by default preemptible in a shared
+object on ELF. `STV_PROTECTED` can make the symbol non-preemptible. You may
+have noticed that I use "preemptible" while the generic ABI uses "preemptable"
+and LLVM IR uses "`dso_preemptable`". Both forms work. "preemptible" is my
+choice because it is more common.
+
+### Protected data symbols and copy relocations
+
+Many folks consider that copy relocations are best-effort support provided by
+the toolchain. `STV_PROTECTED` is intended as an optimization and the
+optimization can error out if it can't be done for whatever reason. Since copy
+relocations are already oftentimes unacceptable, it is natural to think that we
+should just disallow copy relocations on protected data symbols.
+
+However, GNU ld 2.26 made a change which enabled copy relocations on protected
+data symbols for i386 and x86-64.
+
+A glibc change ["Add `ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA` to
+x86"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=62da1e3b00b51383ffa7efc89d8addda0502e107)
+is needed to make copy relocations on protected data symbols work.
+["[AArch64][BZ #17711] Fix extern protected data handling"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0910702c4d2cf9e8302b35c9519548726e1ac489)
+and ["[ARM][BZ #17711] Fix extern protected data handling"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3bcea719ddd6ce399d7bccb492c40af77d216e42)
+ported the change to aarch64 and arm.
+
+Despite the glibc support, GNU ld's aarch64 port reports the error "relocation
+`R_AARCH64_ADR_PREL_PG_HI21` against symbol `foo` which may bind externally can
+not be used when making a shared object; recompile with `-fPIC`".
+
+powerpc64 ELFv2 is interesting: TOC indirection (TOC is a variant of GOT) is
+used everywhere, data symbols normally have no direct access relocations, so
+this is not a problem.
+
+```c
+// b.c
+__attribute__((visibility("protected"))) int foo;
+// a.c
+extern int foo;
+int main() { return foo; }
+```
+
+```
+gcc -fuse-ld=bfd -fpic -shared b.c -o b.so
+gcc -fuse-ld=bfd -pie -fno-pic a.c ./b.so
+```
+
+gold does not allow copy relocations on protected data symbols, but it misses
+some cases: https://sourceware.org/bugzilla/show_bug.cgi?id=19823.
+
+### Protected data symbols and direct accesses
+
+If a protected data symbol in a shared object is copy relocated, allowing
+direct accesses will cause the shared object to operate on a different copy
+from the executable. Therefore, direct accesses to protected data symbols have
+to be disallowed in `-fpic` code, just in case the symbols may be copy
+relocated.  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 changed GCC 5 to
+use GOT indirection for protected external data.
+
+```c
+__attribute__((visibility("protected"))) int foo;
+int val() { return foo; }
+// -fPIC: GOT on at least aarch64, arm, i386, x86-64
+```
+
+This caused unneeded pessimization for protected external data. Clang always
+treats protected similarly to hidden/internal.
+
+For older GCC (and all versions of Clang), direct accesses are produced in
+`-fpic` code. Mixing such object files can silently break copy relocations on
+protected data symbols. Therefore, GNU ld made the change
+https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5
+to error in `-shared` mode.
+
+```
+% cat a.s
+leaq foo(%rip), %rax
+
+.data
+.global foo
+.protected foo
+foo:
+```
+```
+% gcc -fuse-ld=bfd -shared a.s
+/usr/bin/ld.bfd: /tmp/ccchu3Xo.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
+/usr/bin/ld.bfd: final link failed: bad value
+collect2: error: ld returned 1 exit status
+```
+
+This led to a heated discussion
+https://sourceware.org/legacy-ml/binutils/2016-03/msg00312.html. Swift folks
+noticed this https://bugs.swift.org/browse/SR-1023 and their reaction was to
+switch from GNU ld to gold.
+
+GNU ld's aarch64 port does not have the diagnostic.
+
+binutils commit ["x86: Clear `extern_protected_data` for
+`GNU_PROPERTY_NO_COPY_ON_PROTECTED`"](https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=73784fa565bd66f1ac165816c03e5217b7d67bbc)
+introduced
+`GNU_PROPERTY_NO_COPY_ON_PROTECTED`. With this property, `ld -shared` will not
+report the error "relocation `R_X86_64_PC32` against protected symbol `foo` can
+not be used when making a shared object".
+
+The two issues above are the costs of enabling copy relocations on protected
+data symbols. Personally I don't think copy relocations on protected data
+symbols are actually leveraged. GNU ld's x86 port can just (1) reject such copy
+relocations and (2) allow direct accesses referencing protected data symbols in
+`-shared` mode. But I am not really clear about the glibc case. I hope
+`GNU_PROPERTY_NO_COPY_ON_PROTECTED` can become the default or be phased out in
+the future.
+
+### Protected function symbols and canonical PLT entries
+
+```c
+// b.c
+__attribute__((visibility("protected"))) void *foo () {
+  return (void *)foo;
+}
+```
+
+GNU ld's aarch64 and x86 ports reject the above code. On many other
+architectures including powerpc the code is supported.
+
+```
+% gcc -fpic -shared -fuse-ld=bfd b.c -o b.so
+/usr/bin/ld.bfd: /tmp/cc3Ay0Gh.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
+/usr/bin/ld.bfd: final link failed: bad value
+collect2: error: ld returned 1 exit status
+% gcc -shared -fuse-ld=bfd -fpic b.c -o b.so
+/usr/bin/ld.bfd: /tmp/ccXdBqMf.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `foo' which may bind externally can not be used when making a shared object; recompile with -fPIC
+/tmp/ccXdBqMf.o: in function `foo':
+a.c:(.text+0x0): dangerous relocation: unsupported relocation
+collect2: error: ld returned 1 exit status
+```
+
+The rejection is mainly a historical issue to make pointer equality work with
+`-fno-pic` code. The GNU ld idea is that:
+
+* The compiler emits GOT-generating relocations for `-fpic` code (in reality it
+  does it for declarations but not for definitions).
+* `-fno-pic` main executable uses direct access relocation types and gets a
+  canonical PLT entry.
+* glibc ld.so resolves the GOT in the shared object to the canonical PLT entry.
+
+Actually we can take the interpretation that a canonical PLT entry is
+incompatible with a shared `STV_PROTECTED` definition, and reject the attempt
+to create a canonical PLT entry (gold/LLD). And we can keep producing direct
+access relocations referencing protected symbols for `-fpic` code.
+`STV_PROTECTED` is no different from `STV_HIDDEN`.
+
+On many architectures, a branch instruction uses a branch specific relocation
+type (e.g. `R_AARCH64_CALL26`, `R_PPC64_REL24`, `R_RISCV_CALL_PLT`). This is
+great because the address is insignificant and the linker can arrange for a
+regular PLT if the symbol turns out to be external.
+
+On i386, a branch in `-fno-pic` code emits an `R_386_PC32` relocation, which is
+indistinguishable from an address-taken operation. If the symbol turns out to
+be external, the linker has to employ a trick called "canonical PLT entry"
+(`st_shndx=0, st_value!=0`). The term is parlance among a few LLD developers,
+but is not broadly adopted.
+
+```c
+// a.c
+extern void foo(void);
+int main() { foo(); }
+```
+```
+% gcc -m32 -shared -fuse-ld=bfd -fpic b.c -o b.so
+% gcc -m32 -fno-pic -no-pie -fuse-ld=lld a.c ./b.so
+
+% gcc -m32 -fno-pic a.c ./b.so -fuse-ld=lld
+ld.lld: error: cannot preempt symbol: foo
+>>> defined in ./b.so
+>>> referenced by a.c
+>>>               /tmp/ccDGhzEy.o:(main)
+collect2: error: ld returned 1 exit status
+
+% gcc -m32 -fno-pic -no-pie a.c ./b.so -fuse-ld=bfd
+# canonical PLT entry; foo has different addresses in a.out and b.so.
+% gcc -m32 -fno-pic -pie a.c ./b.so -fuse-ld=bfd
+/usr/bin/ld.bfd: /tmp/ccZ3Rl8Y.o: warning: relocation against `foo' in read-only section `.text'
+/usr/bin/ld.bfd: warning: creating DT_TEXTREL in a PIE
+% gcc -m32 -fno-pic -pie a.c ./b.so -fuse-ld=bfd -z text
+/usr/bin/ld.bfd: /tmp/ccUv8wXc.o: warning: relocation against `foo' in read-only section `.text'
+/usr/bin/ld.bfd: read-only segment has dynamic relocations
+collect2: error: ld returned 1 exit status
+```
+
+This used to be a problem for x86-64 as well, until ["x86-64: Generate branch
+with PLT32 relocation"](https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bd7ab16b4537788ad53521c45469a1bdae84ad4a)
+changed `call/jmp foo` to emit `R_X86_64_PLT32` instead of `R_X86_64_PC32`. Note:
+(`-fpie`/`-fpic`) `call/jmp foo@PLT` always emits `R_X86_64_PLT32`.
+
+The relocation type name is a bit misleading: `_PLT32` does not mean that a PLT
+will always be created. Rather, it is optional: the linker can resolve `_PLT32`
+to any place where the function will be called. If the symbol is preemptible,
+the place is usually the PLT entry. If the symbol is non-preemptible, the
+linker can convert `_PLT32` into `_PC32`. A function symbol can either be
+branched to or have its address taken. For an address-taken operation, the
+function symbol is used in a manner similar to a data symbol and `R_386_PLT32`
+cannot be used. LLD and gold will just reject the link if text relocations are
+disabled.
+
+On i386, my proposal is that branches to a default visibility function
+declaration should use `R_386_PLT32` instead of `R_386_PC32`, in a manner
+similar to x86-64. Originally I thought an assembler change sufficed:
+https://sourceware.org/bugzilla/show_bug.cgi?id=27169. Please read the next
+section for why this should be changed on the compiler side.
+
+### Non-default visibility ifunc and `R_386_PC32`
+
+For a call to a hidden function declaration, the compiler produces an
+`R_386_PC32` relocation. The relocation is an indicator that EBX may not be set
+up.
+
+If the declaration refers to an ifunc definition, the linker will resolve the
+`R_386_PC32` to an IPLT entry. For `-pie` and `-shared` links, the IPLT entry
+references EBX. If the call site does not set up EBX to be
+`_GLOBAL_OFFSET_TABLE_`, the IPLT call will be incorrect.
+
+GNU ld has implemented a diagnostic (["i686 ifunc and non-default symbol
+visibility"](https://sourceware.org/bugzilla/show_bug.cgi?id=20515)) to catch
+the problem. If we change `call/jmp foo` to always use `R_386_PLT32`, such a
+diagnostic will be lost.
+
+Can we change the compiler to emit `call/jmp foo@PLT` for default visibility
+function declarations? If the compiler emits such a modifier but does not set
+up EBX, the ifunc can still be non-preemptible (e.g. hidden in another
+translation unit or `-Bsymbolic`) and we will still have a dilemma.
+
+Personally, I think avoiding a canonical PLT entry is more useful than a ld
+ifunc diagnostic. i386 ABI is legacy and the x86 maintainer will not make the
+change, though.
+
+## Summary
+
+I hope the above gives an overview to interested readers. Symbol interposition
+is subtle. One has to think about all the factors related to symbol
+interposition and the relevant toolchain fixes are like a whack-a-mole game. I
+appreciate all the prior discussions and I believe many unsatisfactory things
+can be fixed in a quite backward-compatible way.
+
+Some features are inherently incompatible. We make the trade-off in favor of
+more important features. Here are two things that should not work. However, if
+`-fpie` or `-fno-direct-access-external-data` is specified, both limitations
+will be circumvented.
+
+* Copy relocations on protected data symbols.
+* Canonical PLT entries on protected function symbols. With the `R_386_PLT32`
+  change, this issue will only affect function pointers.
+
+People sometimes simply just say: "protected visibility does not work." I'd
+argue that Clang+gold/LLD works quite well.
+
+The things on the GCC+GNU ld side are inconsistent, though. Here is a list of
+changes I wish would happen:
+
+* GCC: add `-f[no-]direct-access-external-data`.
+* GCC: drop `HAVE_LD_PIE_COPYRELOC` in favor of `-f[no-]direct-access-external-data`.
+* GCC x86-64: default to GOT indirection for external data symbols in `-fpie`
+  mode.
+* GCC or GNU as i386: emit `R_386_PLT32` for branches to undefined function
+  symbols.
+* GNU ld x86: disallow copy relocations on protected data symbols. (I think
+  canonical PLT entries on protected symbols have been disallowed.)
+* GCC aarch64/arm/x86/...: allow direct access relocations on protected symbols
+  in `-fpic` mode.
+* GNU ld aarch64/x86: allow direct access relocations on protected data symbols
+  in `-shared` mode.
+
+The breaking changes for GCC+GNU ld:
+
+* The "copy relocations on protected data symbols" scheme has been supported in
+  the past few years with GNU ld on x86, but it did not work before circa 2015,
+  and should not work in the future. Fortunately the breaking surface may be
+  narrow: this scheme does not work with gold or LLD. Many architectures don't
+  work.
+* ld is not the only consumer of `R_386_PLT32`. The Linux kernel has code
+  resolving relocations and it needs to be fixed (patch uploaded: https://github.com/ClangBuiltLinux/linux/issues/1210).
+
+I'll conclude this article with random notes on other binary formats:
+
+Windows/COFF `__declspec(dllimport)` gives us a different perspective on how
+external references can be designed. The annotation is verbose but
+differentiates the two cases (1) the symbol has to be defined in the same
+linkage unit (2) the symbol can be defined in another linkage unit. If we lift
+the "the symbol visibility is decided by the most constrained visibility"
+requirement for protected->default, a COFF undefined/defined symbol is quite
+like a protected undefined/defined symbol in ELF. `__declspec(dllimport)` gives
+the undefined symbol default visibility (i.e. the LLVM IR `dllimport` is
+redundant). `__declspec(dllexport)` is something which cannot be modeled with
+the existing ELF visibilities.
+
+For an undefined variable, Mach-O uses `__attribute__((visibility("hidden")))`
+to say "a definition must be available in another translation unit in the same
+linkage unit" but does not actually mark the undefined symbol anyway. COFF uses
+`__declspec(dllimport)` to convey this. In ELF,
+`__attribute__((visibility("hidden")))` additionally makes the undefined symbol
+unexportable. The Mach-O notation actually resembles COFF: it can be exported
+by the definition in another translation unit. From its behavior, I think it
+would be more appropriately mapped to LLVM IR protected instead of hidden.
+
+## Appendix
+
+For a `STB_GLOBAL`/`STB_WEAK` symbol,
+
+`STV_DEFAULT`: both compiler & linker need to assume such symbols can be
+preempted in `-fpic` mode. The compiler emits GOT indirection by default. GCC
+`-fno-semantic-interposition` uses local aliases on defined non-weak function
+symbols for x86 (unimplemented in other architectures). Clang
+`-fno-semantic-interposition` uses local aliases on defined non-weak symbols
+(both function and data) for x86.
+
+`STV_PROTECTED`: GCC `-fpic` uses GOT indirection for data symbols, regardless
+of defined or undefined. This pessimization is to make a misfeature "copy
+relocation on protected data symbol" work
+(https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#protected-data-symbols-and-direct-accesses).
+Clang code generation treats `STV_PROTECTED` the same way as `STV_HIDDEN`.
+
+`STV_HIDDEN`: non-preemptible, regardless of defined or undefined. The compiler
+suppresses GOT indirection, unless undefined `STB_WEAK`.
+
+For defined symbols, `-fno-pic`/`-fpie` can avoid GOT indirection for
+`STV_DEFAULT` (and GCC `STV_PROTECTED`). `-fvisibility=hidden` can change
+visibility.
+
+For undefined symbols, `-fpie`/`-fpic` use GOT indirection by default. Clang
+`-fdirect-access-external-data` (discussed in my article) can avoid GOT
+indirection. If you use `-fpic -fdirect-access-external-data` & `ld -shared`,
+you'll need additional linker options to make the linker know defined
+non-`STB_LOCAL` `STV_DEFAULT` symbols are non-preemptible.
+
--- a/maskray-6.md
+++ b/maskray-6.md
@ -0,0 +1,328 @@
+# GNU indirect function
+
+UNDER CONSTRUCTION.
+
+GNU indirect function (ifunc) is a mechanism making a direct function call
+resolve to an implementation picked by a resolver. It is mainly used in glibc
+but has adoption in FreeBSD.
+
+For some performance critical functions, e.g. memcpy/memset/strcpy, glibc
+provides multiple implementations optimized for different architecture levels.
+The application just uses `memcpy(...)` which compiles to `call memcpy`. The
+linker will create a PLT for `memcpy` and produce an associated special dynamic
+relocation referencing the resolver symbol/address. During relocation resolving
+at runtime, the return value of the resolver will be placed in the GOT entry
+and the PLT entry will load the address.
+
+## Representation
+
+ifunc has a dedicated symbol type `STT_GNU_IFUNC` to mark it different from a
+regular function (`STT_FUNC`). The value 10 is in the OS-specific range (10~12).
+`readelf -s` tells you that the symbol is ifunc if OSABI is `ELFOSABI_GNU` or
+`ELFOSABI_FREEBSD`.
+
+On Linux, by default GNU as uses `ELFOSABI_NONE` (0). If ifunc is used, the OSABI
+will be changed to `ELFOSABI_GNU`. Similarly, GNU ld sets the OSABI to
+`ELFOSABI_GNU` if ifunc is used. gold does not do this [PR17735](https://sourceware.org/bugzilla/show_bug.cgi?id=17735).
+
+Things are loose in LLVM. The integrated assembler and LLD do not set
+`ELFOSABI_GNU`. Currently the only problem I know is the `readelf -s` display.
+Everything else works fine.
+
+### Assembler behavior
+
+In assembly, you can assign the type `STT_GNU_IFUNC` to a symbol via
+`.type foo, @gnu_indirect_function`. An ifunc symbol is typically `STB_GLOBAL`.
+
+In the object file, `st_shndx` and `st_value` of an `STT_GNU_IFUNC` symbol
+indicate the resolver. After linking, if the symbol is still `STT_GNU_IFUNC`,
+its `st_value` field indicates the resolver address in the linked image.
+
+Assemblers usually convert relocations referencing a local symbol to reference
+the section symbol, but this behavior needs to be inhibited for `STT_GNU_IFUNC`.
+
+### Example
+
+```
+cat > b.s <<'e'
+.global ifunc
+.type ifunc, @gnu_indirect_function
+.set ifunc, resolver
+
+resolver:
+  leaq impl(%rip), %rax
+  ret
+
+impl:
+  movq $42, %rax
+  ret
+e
+
+cat > a.c <<e
+int ifunc(void);
+int main() { return ifunc(); }
+e
+
+cc a.c b.s
+./a.out  # exit code 42
+```
+
+GNU as also treats transitive aliases of an `STT_GNU_IFUNC` symbol as ifunc.
+
+```
+.type foo,@gnu_indirect_function
+.set foo, foo_resolver
+
+.set foo2, foo
+.set foo3, foo2
+```
+
+GCC and Clang support a function attribute which emits
+`.type ifunc, @gnu_indirect_function; .set ifunc, resolver`:
+
+```c
+static int impl(void) { return 42; }
+static void *resolver(void) { return impl; }
+int ifunc(void) __attribute__((ifunc("resolver")));
+```
+
+## Preemptible ifunc
+
+A preemptible ifunc call is no different from a regular function call from the
+linker perspective.
+
+The linker creates a PLT entry, reserves an associated GOT entry, and emits an
+`R_*_JUMP_SLOT` relocation resolving the address into the GOT entry. The PLT
+code sequence is the same as a regular PLT for `STT_FUNC`.
+
+If the ifunc is defined within the module, the symbol type in the linked image
+is `STT_GNU_IFUNC`, otherwise (defined in a DSO), the symbol type is `STT_FUNC`.
+
+The difference resides in the loader.
+
+At runtime, the relocation resolver checks whether the `R_*_JUMP_SLOT`
+relocation refers to an ifunc. If it does, instead of filling the GOT entry
+with the target address, the resolver calls the target address as an indirect
+function, with ABI specified additional parameters (hwcap related), and places
+the return value into the GOT entry.
+
+## Non-preemptible ifunc
+
+The non-preemptible ifunc case is where all sorts of complexity come from.
+
+First, the `R_*_JUMP_SLOT` relocation type cannot be used in some cases:
+
+* A non-preemptible ifunc may not have a dynamic symbol table entry. It can be
+  local. It can be defined in the executable without the need to export.
+* A non-local `STV_DEFAULT` symbol defined in a shared object is by default
+  preemptible. Using `R_*_JUMP_SLOT` for such a case will make the ifunc look
+  like preemptible.
+
+Therefore a new relocation type `R_*_IRELATIVE` was introduced. There is no
+associated symbol and the address indicates the resolver.
+
+```
+R_*_RELATIVE: B + A
+R_*_IRELATIVE: call (B + A) as a function
+R_*_JUMP_SLOT: S
+```
+
+When an `R_*_JUMP_SLOT` can be used, there is a trade-off between
+`R_*_JUMP_SLOT` and `R_*_IRELATIVE`: an `R_*_JUMP_SLOT` can be lazily resolved
+but needs a symbol lookup. Currently powerpc can use `R_PPC64_JMP_SLOT` in some
+cases [PR27203](https://sourceware.org/bugzilla/show_bug.cgi?id=27203).
+
+A PLT entry is needed for two reasons:
+
+* The call sites emit instructions like `call foo`. We need to forward them to
+  a place to perform the indirection. Text relocations are usually not an
+  option (exception: `ifunc-noplt`).
+* If the ifunc is exported, we need a place to mark its canonical address.
+
+Such PLT entries are sometimes referred to as IPLT. They are placed in the
+synthetic section `.iplt`. In GNU ld, `.iplt` will be placed in the output
+section `.plt`. In LLD, I decided that a separate `.iplt` output section is
+better: https://reviews.llvm.org/D71520.
+
+On many architectures (e.g. AArch64/PowerPC/x86), the PLT code sequence is the
+same as a regular PLT, but it could be different.
+
+On x86-64, the code sequence is:
+
+```
+jmp *got(%rip)
+pushq $0
+jmp .plt
+```
+
+Since there is no lazy binding, `pushq $0; jmp .plt` are not needed. However,
+to make all PLT entries of the same shape to simplify linker implementations
+and facilitate analyzers, it is fine to keep it this way.
+
+## PowerPC32 `-msecure-plt` IPLT
+
+As a design to work around the lack of PC-relative instructions, PowerPC32 uses
+multiple GOT sections, one per file in `.got2`. To support multiple GOT
+pointers, the addend on each `R_PPC_PLTREL24` reloc will have the offset within
+`.got2`.
+
+`-msecure-plt` has small/large PIC differences.
+* `-fpic`/`-fpie`: `R_PPC_PLTREL24 r_addend=0`. The call stub loads an address
+  relative to `_GLOBAL_OFFSET_TABLE_`.
+* `-fPIC`/`-fPIE`: `R_PPC_PLTREL24 r_addend=0x8000`. (A partial linked object
+  file may have an addend larger than 0x8000.) The call stub loads an address
+  relative to `.got2+0x8000`.
+
+If a non-preemptible ifunc is referenced in two object files, in
+`-pie`/`-shared` mode, the two object files cannot share the same IPLT entry.
+When I added non-preemptible ifunc support for PowerPC32 to LLD
+https://reviews.llvm.org/D71621, I did not handle this case.
+
+### `.rela.dyn` vs `.rela.plt`
+
+LLD placed `R_*_IRELATIVE` in the `.rela.plt` section because many ports of GNU
+ld behaved this way. While implementing ifunc for PowerPC, I noticed that GNU
+ld powerpc actually places `R_*_IRELATIVE` in `.rela.dyn` and glibc powerpc
+does not actually support `R_*_IRELATIVE` in `.rela.plt`. This makes a lot of
+sense to me because `.rela.plt` normally just contains `R_*_JUMP_SLOT` which
+can be lazily resolved. ifunc relocations need to be eagerly resolved so
+`.rela.plt` was the wrong place. Therefore I changed LLD to use `.rela.dyn` in
+https://reviews.llvm.org/D65651.
+
+## `__rela_iplt_start` and `__rela_iplt_end`
+
+A statically linked position dependent executable traditionally had no dynamic
+relocations.
+
+With ifunc, `R_*_IRELATIVE` relocations must be resolved at runtime. Such
+relocations are in a magic array delimited by `__rela_iplt_start` and
+`__rela_iplt_end`. In glibc, `csu/libc-start.c` has special code processing the
+relocation range.
+
+GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in `-pie`
+mode. LLD defines `__rela_iplt_start` regardless of `-no-pie`, `-pie` or
+`-shared`.
+
+In glibc, static pie uses self-relocation (`_dl_relocate_static_pie`) to take
+care of `R_*_IRELATIVE`. The above magic array code is executed by static pie
+as well. If `__rela_iplt_start`/`__rela_iplt_end` are defined, we will get
+`0 < __rela_iplt_start < __rela_iplt_end` in `csu/libc-start.c`.
+`ARCH_SETUP_IREL` will crash when resolving the first relocation, which has
+already been processed.
+
+I think the difference in the
+`diff -u =(ld.bfd --verbose) =(ld.bfd -pie --verbose)` output is unneeded.
+https://sourceware.org/pipermail/libc-alpha/2021-January/121755.html
+
+## Address significance
+
+A non-GOT-generating non-PLT-generating relocation referencing a
+`STT_GNU_IFUNC` indicates a potential address-taken operation.
+
+With a function attribute, the compiler knows that a symbol is an ifunc and
+will avoid generating such relocations. With assembly such relocations may be
+unavoidable.
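+
+As a minimal sketch of such a relocation in hand-written x86-64 assembly (the
+names `ifunc`, `impl` and `_start` are hypothetical, and the exact relocation
+emitted may depend on the assembler), the following takes the address of an
+ifunc with a PC-relative reference that is neither GOT-generating nor
+PLT-generating:
+
+```
+.text
+impl:
+  ret
+
+# The value of an STT_GNU_IFUNC symbol is the address of its resolver.
+.globl ifunc
+.type ifunc,@gnu_indirect_function
+ifunc:
+  leaq impl(%rip), %rax   # return the selected implementation
+  ret
+
+.globl _start
+_start:
+  leaq ifunc(%rip), %rax  # R_X86_64_PC32: a potential address-taken operation
+  ret
+```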
+
+In most cases the linker needs to convert the symbol type to `STT_FUNC` and
+create a special PLT entry, which is called a "canonical PLT entry" in LLD.
+References from other modules will resolve to the PLT entry to keep pointer
+equality: the address taken from the defining module should match the address
+taken from another module.
+
+This approach has pros and cons:
+
+* With a canonical PLT entry, the resolver of a symbol is called only once.
+  There is exactly one `R_*_IRELATIVE` relocation.
+* If the relocation appears in a non-`SHF_WRITE` section, a text relocation can
+  be avoided.
+* Relocation types which are not valid dynamic relocation types are supported.
+  GNU ld may report the error ``relocation R_X86_64_PC32 against
+  STT_GNU_IFUNC symbol `ifunc' isn't supported``.
+* References will bind to the canonical PLT entry. A function call needs to
+  jump to the PLT, load the value from the GOT, then do an indirect call.
+
+For a symbolic relocation type (a special case of absolute relocation types
+where the width matches the word size) like `R_X86_64_64`, when the addend is 0
+and the section has the `SHF_WRITE` flag, the linker can emit an
+`R_X86_64_IRELATIVE`. LLD dropped this special case in
+https://reviews.llvm.org/D65995.
+
+For the following example, the `a.out` linked by GNU ld calls `fff_resolver`
+three times, while the one linked by LLD calls it once.
+
+```c
+// RUN: split-file %s %t
+// RUN: clang -fuse-ld=bfd -fpic %t/dso.c -o %t/dso.so --shared
+// RUN: clang -fuse-ld=bfd %t/main.c %t/dso.so -o %t/a.out
+// RUN: %t/a.out
+
+//--- dso.c
+typedef void fptr(void);
+extern void fff(void);
+
+fptr *global_fptr0 = &fff;
+fptr *global_fptr1 = &fff;
+
+//--- main.c
+#include <stdio.h>
+
+static void fff_impl() { printf("fff_impl()\n"); }
+static int z;
+void *fff_resolver() { return (char *)&fff_impl + z++; }
+
+__attribute__((ifunc("fff_resolver"))) void fff();
+typedef void fptr(void);
+fptr *local_fptr = fff;
+extern fptr *global_fptr0, *global_fptr1;
+
+int main() {
+  printf("local %p global0 %p global1 %p\n", local_fptr, global_fptr0, global_fptr1);
+  return 0;
+}
+```
+
+### Relocation resolving order
+
+`R_*_IRELATIVE` relocations are resolved eagerly. In glibc, there used to be a
+problem where ifunc resolvers ran before `GL(dl_hwcap)` and `GL(dl_hwcap2)`
+were set up https://sourceware.org/bugzilla/show_bug.cgi?id=27072.
+
+For the relocation resolver, the main executable needs to be processed last so
+that `R_*_COPY` relocations can be resolved. Without ifunc, the resolving order
+of shared objects can be arbitrary.
+
+With ifunc, if the ifunc is defined in an already processed module, it is fine.
+If the ifunc is defined in an unprocessed module, calling its resolver may
+crash.
+
+For an ifunc defined in an executable, calling it from a shared object can be
+problematic because the executable's relocations haven't been resolved. The
+issue can be circumvented by converting the non-preemptible ifunc defined in
+the executable to `STT_FUNC`. GNU ld's x86 port made the change
+[PR23169](https://sourceware.org/bugzilla/show_bug.cgi?id=23169).
+
+## `-z ifunc-noplt`
+
+Mark Johnston introduced `-z ifunc-noplt` for FreeBSD
+https://reviews.llvm.org/D61613. With this option, all relocations referencing
+`STT_GNU_IFUNC` will be emitted as dynamic relocations (if `.dynsym` is
+created). The canonical PLT entry will not be used.
+
+## Miscellaneous
+
+GNU ld has implemented a diagnostic (["i686 ifunc and non-default symbol
+visibility"](https://sourceware.org/bugzilla/show_bug.cgi?id=20515)) to flag
+`R_386_PC32` referencing non-default visibility ifunc in `-pie` and `-shared`
+links. This diagnostic looks like the most prominent reason blocking my
+proposal to use `R_386_PLT32` for `call/jump foo`. See [Copy relocations,
+canonical PLT entries and protected visibility](maskray-5.md) for details.
+
+https://sourceware.org/glibc/wiki/GNU_IFUNC misses a lot of information. There
+are quite a few arch differences. I asked for clarification
+https://sourceware.org/pipermail/libc-alpha/2021-January/121752.html
+
+### Dynamic loader
+
+In glibc, `_dl_runtime_resolve` needs to save and restore vector and floating
+point registers. ifunc resolvers add another reason that `_dl_runtime_resolve`
+cannot use only integer registers. (The other reasons are that ld.so has string
+function calls which may use vectors and external calls to libc.so.)
+
--- a/maskray-7.md
+++ b/maskray-7.md
@ -0,0 +1,223 @@
+# Everything I know about GNU toolchain
+
+As mainly an LLVM person, I occasionally contribute to GNU toolchain projects.
+This is sometimes for fun, sometimes for investigating why a (usually ancient)
+feature works in a particular way, sometimes for pushing forward a toolchain
+feature with both communities in mind, or sometimes just for getting a sense
+of how things work with mailing list+GNU make.
+
+For a debug build, I normally place my build directory `Debug` directly under
+the project root.
+
+## binutils
+
+* Repository: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git
+* Mailing list: https://sourceware.org/pipermail/binutils
+* Bugzilla: https://sourceware.org/bugzilla/
+* Main tools: as (`gas/`, GNU assembler), ld (`ld/`, GNU ld), gold (`gold/`,
+  GNU gold)
+
+As of 2021-01, it has no wiki.
+
+Target `all` builds targets `all-host` and `all-target`. When running
+configure, most top-level directories (`binutils gas gdb gdbserver ld libctf`)
+are enabled by default. You can disable some components via `--disable-*`.
+`--enable-gold` is needed to enable gold.
+
+```sh
+mkdir Debug; cd Debug
+../configure --target=x86_64-linux-gnu --prefix=/tmp/opt --disable-gdb --disable-gdbserver
+```
+
+For cross compiling, make sure you have `$target-{gcc,as,ld}`.
+
+For many tools (binutils, gdb, ld), `--enable-targets=all` will build every
+supported architecture and binary format. However, one gas build can only
+support one architecture. ld has a default emulation and needs `-m` to support
+other architectures (otherwise it reports an error like `aarch64 architecture
+of input file 'a.o' is incompatible with i386:x86-64 output`). Many tests are
+generic and can be run on many targets, but a `--enable-targets=all` build only
+tests its default target.
+
+```sh
+# binutils (binutils/*)
+make -C Debug all-binutils
+# gas (gas/as-new)
+make -C Debug all-gas
+# ld (ld/ld-new)
+make -C Debug all-ld
+
+# Build all enabled tools.
+make -C Debug all
+```
+
+Build with Clang:
+
+```sh
+mkdir -p out/clang-debug; cd out/clang-debug
+../../configure CC=~/Stable/bin/clang CXX=~/Stable/bin/clang++ CFLAGS='-O0 -g' CXXFLAGS='-O0 -g'
+```
+
+On the security aspect, "don't run any of binutils as root" is sufficient
+advice (Alan Modra).
+
+## Test
+
+The GNU test framework DejaGnu is based on Expect, which is in turn based on
+Tcl.
+
+To run tests:
+
+```sh
+make -C Debug check-binutils
+# Find the result in (summary) Debug/binutils/binutils.sum and (details) Debug/binutils/binutils.log
+
+make -C Debug check-gas
+# Find the result in (summary) Debug/gas/testsuite/gas.sum and (details) Debug/gas/testsuite/gas.log
+
+make -C Debug check-ld
+
+# Test all enabled tools.
+make -C Debug check-all
+```
+
+For ld, tests are listed in `.exp` files under `ld/testsuite`. A single test
+normally consists of a `.d` file and several associated `.s` files.
+
+To run the tests in `ld/testsuite/ld-shared/shared.exp`:
+
+```sh
+make -C Debug check-ld RUNTESTFLAGS=ld-shared/shared.exp
+```
+
+### Misc
+
+* A bot updates bfd/version.h (`BFD_VERSION_DATE`) daily.
+* Test coverage is low.
+
+## gdb
+
+gdb resides in the binutils-gdb repository. `configure` enables gdb and
+gdbserver by default. You just need to make sure `--disable-gdb
+--disable-gdbserver` is not on the configure line.
+
+Run gdb under the build directory:
+
+```sh
+gdb/gdb -data-directory gdb/data-directory
+```
+
+To run the tests in `gdb/testsuite/gdb.dwarf2/dw2-abs-hi-pc.exp`:
+
+```sh
+make check-gdb RUNTESTFLAGS=gdb.dwarf2/dw2-abs-hi-pc.exp
+
+# cd $build/gdb/testsuite/outputs/gdb.dwarf2/dw2-abs-hi-pc
+```
+
+## glibc
+
+* Repository: https://sourceware.org/git/gitweb.cgi?p=glibc.git
+* Wiki: https://sourceware.org/glibc/wiki/
+* Bugzilla: https://sourceware.org/bugzilla/
+* Mailing lists: `{libc-announce,libc-alpha,libc-locale,libc-stable,libc-help}@sourceware.org`
+
+(Mostly) an implementation of the user-space side of standard C/POSIX functions
+with Linux extensions.
+
+A very unfortunate fact: glibc can only be built with `-O2`, not `-O0` or
+`-O1`. If you want to have an un-optimized debug build, deleting an object file
+and recompiling it with `-g` usually works. Another workaround is `#pragma GCC
+optimize ("O0")`.
+
+The `-O2` issue is probably related to (1) expected inlining and (2) avoiding
+dynamic relocations.
+
+Run the following commands to populate `/tmp/glibc-many` with toolchains.
+Caution: please make sure the target file system has tens of gigabytes of free
+space.
+
+Preparation:
+
+```sh
+scripts/build-many-glibcs.py /tmp/glibc-many checkout --shallow
+scripts/build-many-glibcs.py /tmp/glibc-many host-libraries
+
+scripts/build-many-glibcs.py /tmp/glibc-many compilers aarch64-linux-gnu
+scripts/build-many-glibcs.py /tmp/glibc-many compilers powerpc64le-linux-gnu
+scripts/build-many-glibcs.py /tmp/glibc-many compilers sparc64-linux-gnu
+```
+
+* `--shallow` passes `--depth 1` to the git clone command.
+* `--keep all` keeps intermediate build directories intact. You may want this
+  option to investigate build issues.
+
+The `glibcs` command will delete the glibc build directory, build glibc, and
+run `make check`.
+
+```sh
+scripts/build-many-glibcs.py /tmp/glibc-many glibcs aarch64-linux-gnu
+# Find the logs and test results under /tmp/glibc-many/logs/glibcs/aarch64-linux-gnu/
+
+scripts/build-many-glibcs.py /tmp/glibc-many glibcs powerpc64le-linux-gnu
+
+scripts/build-many-glibcs.py /tmp/glibc-many glibcs sparc64-linux-gnu
+```
+
+"On build-many-glibcs.py and most stage1 compiler bootstrap, gcc is built
+statically against newlib. The statically linked gcc (with a lot of disabled
+features) is then used to build glibc and then the stage2 gcc (which will then
+have all the features that rely on libc enabled), so the stage1 gcc *might* not
+have the required start files"
+
+During development, some interesting targets:
+
+```sh
+make -C Debug check-abi
+```
+
+Building with Clang is not an option.
+
+* Clang does not support GCC nested functions [BZ #27220](https://sourceware.org/bugzilla/show_bug.cgi?id=27220)
+* x86 `PRESERVE_BND_REGS_PREFIX`: integrated assembler does not support the
+  `bnd` prefix.
+* `sysdeps/powerpc/powerpc64/Makefile`: Clang does not support
+  `-ffixed-vrsave -ffixed-vscr`
+
+## GCC
+
+* Mailing lists: `gcc-{patches,regression}@sourceware.org`
+
+`--disable-bootstrap` is the most important option; otherwise you will get a
+stage 2 build. It is not clear what make does when you touch a source file. It
+definitely rebuilds stage1, but it is not clear to me how well the stage2
+dependency is handled. Anyway, having a touched source file trigger a full
+rebuild is not what you want.
+
+```sh
+../configure --disable-bootstrap --enable-languages=c,c++ --disable-multilib
+make -j 30
+
+# Incremental build
+make -C gcc cc1 cc1plus xgcc
+make -C x86_64-pc-linux-gnu/libstdc++-v3
+```
+
+Use built libstdc++ and libgcc.
+
+```sh
+$build/gcc/xg++ -B $build/release/gcc forced1.C -Wl,-rpath,$build/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs,-rpath,$build/x86_64-pc-linux-gnu/libgcc
+```
+
+### Misc
+
+* A bot updates `ChangeLog` files daily. `Daily bump.`
+
+## Unlisted
+
+autotools, bison, m4, make, ...
+
+### Contributing
+
+[GNU Coding Standards](https://www.gnu.org/prep/standards/). Emacs has good
+built-in support. clang-format's support is not as good.
+
+Legally significant changes need [Copyright Papers](https://www.gnu.org/prep/maintain/html_node/Copyright-Papers.html).
+
--- a/maskray-8.md
+++ b/maskray-8.md
@ -0,0 +1,253 @@
+# Metadata sections, COMDAT and `SHF_LINK_ORDER`
+
+## COMDAT
+
+In C++, inline functions, template instantiations and a few other things can be
+defined in multiple object files but need deduplication at link time. In the
+dark ages the functionality was implemented by weak definitions: the linker
+does not report duplicate definition errors and resolves the references to the
+first definition. The downside is that unneeded copies remained in the linked
+image.
+
+In Microsoft PE file format, the section flag (`IMAGE_SCN_LNK_COMDAT`) marks a
+section COMDAT and enables deduplication on a per-section basis. If a text
+section needs a data section and deduplication is needed for both sections, two
+COMDAT symbols are needed.
+
+In the GNU world, `.gnu.linkonce.` was invented to deduplicate groups with just
+one member. `.gnu.linkonce.` has long been obsoleted in favor of section
+groups, but its usage lingered until 2020. Adhemerval Zanella removed the last
+live glibc use case for `.gnu.linkonce.`
+[BZ #20543](http://sourceware.org/PR20543).
+
+## ELF section groups
+
+The ELF specification generalized this use case to allow an arbitrary number of
+sections to be interrelated.
+
+> Some sections occur in interrelated groups. For example, an out-of-line
+> definition of an inline function might require, in addition to the section
+> containing its executable instructions, a read-only data section containing
+> literals referenced, one or more debugging information sections and other
+> informational sections. Furthermore, there may be internal references among
+> these sections that would not make sense if one of the sections were removed
+> or replaced by a duplicate from another object. Therefore, such groups must
+> be included or omitted from the linked object as a unit. A section cannot be
+> a member of more than one group.
+
+According to "such groups must be included or omitted from the linked object as
+a unit", a linker's garbage collection feature must retain or discard the
+sections as a unit.
+
+The most common section group flag is `GRP_COMDAT`, which makes the member
+sections similar to COMDAT in Microsoft PE file format, but can apply to
+multiple sections. (The committee borrowed the name "COMDAT" from PE.)
+
+> This is a COMDAT group. It may duplicate another COMDAT group in another
+> object file, where duplication is defined as having the same group signature.
+> In such cases, only one of the duplicate groups may be retained by the
+> linker, and the members of the remaining groups must be discarded.
+
+I want to highlight one thing GCC does (and Clang inherits) for backward
+compatibility: the definitions relative to a COMDAT group member are kept
+`STB_WEAK` instead of `STB_GLOBAL`. The idea is that an old toolchain which does
+not recognize COMDAT groups can still operate correctly, just in a degraded
+manner.
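+
+As an illustration (a sketch of what GCC and Clang typically emit for an inline
+function `foo()` on x86-64; the exact output varies with compiler version and
+options), the function is placed in a COMDAT group whose signature is the
+mangled name, and the definition is declared weak:
+
+```
+# "G" marks a section group member; `_Z3foov` is the group signature and
+# `comdat` selects GRP_COMDAT.
+.section .text._Z3foov,"axG",@progbits,_Z3foov,comdat
+.weak _Z3foov             # STB_WEAK rather than STB_GLOBAL
+.type _Z3foov,@function
+_Z3foov:
+  ret
+```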
+
+## Metadata sections
+
+Many compiler options instrument text sections or annotate text sections, and
+need to create a metadata section for (almost) every text section. Such
+metadata sections have some characteristics:
+
+* All relocations from the metadata section reference the associated text
+  section.
+* The metadata section is only referenced by the associated text section or not
+  referenced at all.
+
+Below is an example:
+
+```
+.section .text.foo,"ax",@progbits
+
+.section .meta.foo,"a",@progbits
+.quad .text.foo-.
+```
+
+Users want GC semantics for such metadata sections: if `.text.foo` is retained,
+`.meta.foo` is retained. Note: the regular GC semantics are converse: if
+`.meta.foo` is retained, `.text.foo` is retained.
+
+To achieve the desired GC semantics on ELF platforms, we could use a non-COMDAT
+section group. However, using a section group requires one extra section
+(usually named `.group`), which requires 40 bytes on ELFCLASS32 platforms and
+64 bytes on ELFCLASS64 platforms. Put another way, to represent the
+metadata of a text section, we need two sections (the metadata section and the
+section group), 128 bytes on ELFCLASS64 platforms. The size overhead is
+concerning in many applications. (AArch64 and x86-64 define ILP32 ABIs and use
+ELFCLASS32, but technically they can use ELFCLASS32 for small code model with
+regular ABIs, if the kernel allows.)
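+
+A sketch of the non-COMDAT section group approach, reusing the section names
+from the example above (the group signature `foo` is an arbitrary choice for
+illustration): omitting the `comdat` linkage field yields a group without
+`GRP_COMDAT`, so the two sections are retained or discarded as a unit without
+any deduplication.
+
+```
+# Both sections are members of the group with signature `foo`.
+.section .text.foo,"axG",@progbits,foo
+
+.section .meta.foo,"aG",@progbits,foo
+.quad .text.foo-.
+```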
+
+In a generic-abi thread, Cary Coutant initially suggested using a new section
+flag `SHF_ASSOCIATED`. HP-UX and Solaris folks objected to a new generic flag.
+Cary Coutant then discussed with Jim Dehnert and noticed that the existing
+(rare) flag `SHF_LINK_ORDER` has semantics closer to the metadata GC semantics,
+so he intended to replace the existing flag `SHF_LINK_ORDER`. Solaris had used
+its own `SHF_ORDERED` extension before it migrated to the ELF simplification
+`SHF_LINK_ORDER`. Solaris is still using `SHF_LINK_ORDER` so the flag cannot be
+repurposed. People discussed whether `SHF_OS_NONCONFORMING` could be repurposed
+but did not take that route: the platform already knows whether a flag is
+unknown and knowing a flag is non-conforming does not help produce better
+output. In the end the agreement was that `SHF_LINK_ORDER` gained additional
+metadata GC semantics.
+
+The new semantics:
+
+> This flag adds special ordering requirements for link editors. The
+> requirements apply to the referenced section identified by the sh_link field
+> of this section's header. If this section is combined with other sections in
+> the output file, the section must appear in the same relative order with
+> respect to those sections, as the referenced section appears with respect to
+> sections the referenced section is combined with.
+>
+> A typical use of this flag is to build a table that references text or data
+> sections in address order.
+>
+> In addition to adding ordering requirements, `SHF_LINK_ORDER` indicates that
+> the section contains metadata describing the referenced section. When
+> performing unused section elimination, the link editor should ensure that
+> both the section and the referenced section are retained or discarded
+> together. Furthermore, relocations from this section into the referenced
+> section should not be taken as evidence that the referenced section should be
+> retained.
+
+Actually, ARM EHABI has been using `SHF_LINK_ORDER` for index table sections
+`.ARM.exidx*`. A `.ARM.exidx` section contains a sequence of 2-word pairs. The
+first word is a 31-bit PC-relative offset to the start of the region. The idea
+is that if the entries are ordered by the start address, the end address of an
+entry is implicitly the start address of the next entry and does not need to be
+explicitly encoded. For this reason the section uses `SHF_LINK_ORDER` for the
+ordering requirement. The GC semantics are very similar to the metadata
+sections'.
+
+So the updated `SHF_LINK_ORDER` wording can be seen as recognition for the
+current practice (even though the original discussion did not actually notice
+ARM EHABI).
+
+However, in binutils before 2.35, `SHF_LINK_ORDER` could be produced by ARM
+assembly directives, but could not be specified on user-defined sections.
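+
+With binutils 2.35 or later (the LLVM integrated assembler accepts the same
+syntax), a sketch of a user-defined metadata section looks like this. The `o`
+flag sets `SHF_LINK_ORDER` and the extra operand names a symbol defined in the
+linked-to section; the label `foo` is hypothetical:
+
+```
+.section .text.foo,"ax",@progbits
+foo:
+  ret
+
+# sh_link of .meta.foo will point to the section that defines `foo`.
+.section .meta.foo,"ao",@progbits,foo
+.quad foo-.
+```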
+
+## C identifier name sections
+
+A section whose name consists of pure C-like identifier characters (isalnum
+characters in the C locale plus `_`) is considered a GC root by ld
+`--gc-sections`. The idea is that the linker-defined `__start_foo` and
+`__stop_foo` are used to delimit the output section foo. Even if input sections
+foo are not referenced by other sections, `__start_foo`/`__stop_foo` is a
+signal that foo should be retained.
+
+The metadata use case requires an amendment of the rule: if `SHF_LINK_ORDER` is
+set on foo, foo can be GCed (LLD r294592).
+
+GNU ld does not implement this rule yet. https://sourceware.org/bugzilla/show_bug.cgi?id=27259
+
+## Pitfalls
+
+### Mixed unordered and ordered sections
+
+If an output section consists of only non-`SHF_LINK_ORDER` sections, the rule is
+clear: input sections are ordered in their input order. If an output section
+consists of only `SHF_LINK_ORDER` sections, the rule is also clear: input
+sections are ordered with respect to their linked-to sections.
+
+What is unclear is how to handle an output section with mixed unordered and
+ordered sections.
+
+GNU ld had a diagnostic for this case. LLD rejected the case as well, with the
+error `incompatible section flags for .rodata`.
+
+When I implemented `-fpatchable-function-entry=` for Clang, I observed some GC
+related issues with the GCC implementation. I reported them and carefully chose
+`SHF_LINK_ORDER` in the Clang implementation if the integrated assembler is
+used.
+
+This was a problem if the user wanted to place such input sections along with
+unordered sections, e.g.
+`.init.data : { ... KEEP(*(__patchable_function_entries)) ... }`
+(https://github.com/ClangBuiltLinux/linux/issues/953).
+
+As a response, I submitted https://reviews.llvm.org/D77007 to allow ordered
+input section descriptions within an output section.
+
+This worked well for the Linux kernel. Mixed unordered and ordered sections
+within an input section description was still a problem. This made it
+infeasible to add `SHF_LINK_ORDER` to an existing metadata section and expect
+new object files to be linkable with old object files which do not have the
+flag. I asked how to resolve this upgrade issue and Ali Bahrami responded:
+
+> The Solaris linker puts sections without `SHF_LINK_ORDER` at the end of the
+> output section, in first-in-first-out order, and I don't believe that's
+> considered to be an error.
+
+So I went ahead and implemented a similar rule for LLD:
+https://reviews.llvm.org/D84001 allows an arbitrary mix and places
+`SHF_LINK_ORDER` sections before non-`SHF_LINK_ORDER` sections.
+
+### If the associated section is discarded
+
+We decided that the integrated assembler allows `SHF_LINK_ORDER` with
+`sh_link=0` and LLD can handle such sections as regular unordered sections
+(https://reviews.llvm.org/D72904).
+
+### Other pitfalls
+
+* During `--icf={safe,all}`, `SHF_LINK_ORDER` sections should not be separately
+  considered.
+* In relocatable output, `SHF_LINK_ORDER` sections cannot be combined by name.
+* When comparing two input sections with different linked-to output sections,
+  use vaddr of output sections instead of section indexes. Peter Smith fixed
+  this in https://reviews.llvm.org/D79286.
+
+## Miscellaneous
+
+Arm Compiler 5 splits up DWARF Version 3 debug information and puts these
+sections into comdat groups. On "monolithic input section handling", Peter
+Smith commented that:
+
+> We found that splitting up the debug into fragments works well as it permits
+> the linker to ensure that all the references to local symbols are to sections
+> within the same group, this makes it easy for the linker to remove all the
+> debug when the group isn't selected.
+>
+> This approach did produce significantly more debug information than gcc did.
+> For small microcontroller projects this wasn't a problem. For larger feature
+> phone problems we had to put a lot of work into keeping the linker's memory
+> usage down as many of our customers at the time were using 32-bit Windows
+> machines with a default maximum virtual memory of 2Gb.
+
+COMDAT sections have size overhead on extra section headers. Developers may be
+tempted to decrease the overhead with `SHF_LINK_ORDER`. However, the approach
+does not work due to the ordering requirement. Consider the following
+fragments:
+
+```
+header [a.o common]
+- DW_TAG_compile_unit [a.o common]
+-- DW_TAG_variable [a.o .data.foo]
+-- DW_TAG_namespace [common]
+--- DW_TAG_subprogram [a.o .text.bar]
+--- DW_TAG_variable [a.o .data.baz]
+footer [a.o common]
+header [b.o common]
+- DW_TAG_compile_unit [b.o common]
+-- DW_TAG_variable [b.o .data.foo]
+-- DW_TAG_namespace [common]
+--- DW_TAG_subprogram [b.o .text.bar]
+--- DW_TAG_variable [b.o .data.baz]
+footer [b.o common]
+```
+
+`DW_TAG_*` tags associated with concrete sections can be represented with
+`SHF_LINK_ORDER` sections. After linking the sections will be ordered before the
+common parts.
+