more articles

This commit is contained in:
Triss 2021-02-03 01:08:25 +01:00
parent 58a5061601
commit 589177c6c2
14 changed files with 4397 additions and 0 deletions


@ -43,3 +43,14 @@ Other articles included as well:
* [Executable stack](executable-stack.md)
* [Piece of PIE](piece-of-pie.md)
Even more articles, from [MaskRay's blog](https://maskray.me/blog/):
* [Stack unwinding](maskray-1.md)
* [All about symbol versioning](maskray-2.md)
* [C++ exception handling ABI](maskray-3.md)
* [LLD and GNU linker incompatibilities](maskray-4.md)
* [Copy relocations, canonical PLT entries and protected visibility](maskray-5.md)
* [GNU indirect function](maskray-6.md)
* [Everything I know about GNU toolchain](maskray-7.md)
* [Metadata sections, COMDAT and `SHF_LINK_ORDER`](maskray-8.md)


@ -0,0 +1,123 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.43.0 (0)
-->
<!-- Title: %3 Pages: 1 -->
<svg width="630pt" height="224pt"
viewBox="0.00 0.00 630.00 224.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 220)">
<title>%3</title>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-220 626,-220 626,4 -4,4"/>
<!-- eh_frame -->
<g id="node1" class="node">
<title>eh_frame</title>
<polygon fill="none" stroke="black" points="0,-146.5 0,-215.5 622,-215.5 622,-146.5 0,-146.5"/>
<text text-anchor="middle" x="311" y="-200.3" font-family="Times,serif" font-size="14.00">.eh_frame</text>
<polyline fill="none" stroke="black" points="0,-192.5 622,-192.5 "/>
<text text-anchor="middle" x="131" y="-177.3" font-family="Times,serif" font-size="14.00">FDE0</text>
<polyline fill="none" stroke="black" points="0,-169.5 262,-169.5 "/>
<text text-anchor="middle" x="49" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
<polyline fill="none" stroke="black" points="98,-146.5 98,-169.5 "/>
<text text-anchor="middle" x="148.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_personality</text>
<polyline fill="none" stroke="black" points="199,-146.5 199,-169.5 "/>
<text text-anchor="middle" x="230.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_lsda</text>
<polyline fill="none" stroke="black" points="262,-146.5 262,-192.5 "/>
<text text-anchor="middle" x="393" y="-177.3" font-family="Times,serif" font-size="14.00">FDE1</text>
<polyline fill="none" stroke="black" points="262,-169.5 524,-169.5 "/>
<text text-anchor="middle" x="311" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
<polyline fill="none" stroke="black" points="360,-146.5 360,-169.5 "/>
<text text-anchor="middle" x="410.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_personality</text>
<polyline fill="none" stroke="black" points="461,-146.5 461,-169.5 "/>
<text text-anchor="middle" x="492.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_lsda</text>
<polyline fill="none" stroke="black" points="524,-146.5 524,-192.5 "/>
<text text-anchor="middle" x="573" y="-177.3" font-family="Times,serif" font-size="14.00">FDE2</text>
<polyline fill="none" stroke="black" points="524,-169.5 622,-169.5 "/>
<text text-anchor="middle" x="573" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
</g>
<!-- text_a -->
<g id="node2" class="node">
<title>text_a</title>
<polygon fill="none" stroke="black" points="131.5,-0.5 131.5,-36.5 210.5,-36.5 210.5,-0.5 131.5,-0.5"/>
<text text-anchor="middle" x="171" y="-14.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
</g>
<!-- eh_frame&#45;&gt;text_a -->
<g id="edge1" class="edge">
<title>eh_frame:loc0&#45;&gt;text_a</title>
<path fill="none" stroke="black" d="M49,-146C49,-113.05 42.18,-99.33 62,-73 76.62,-53.58 100.2,-40.8 121.68,-32.61"/>
<polygon fill="black" stroke="black" points="123.12,-35.82 131.37,-29.17 120.78,-29.22 123.12,-35.82"/>
</g>
<!-- text_b -->
<g id="node3" class="node">
<title>text_b</title>
<polygon fill="none" stroke="black" points="314,-0.5 314,-36.5 394,-36.5 394,-0.5 314,-0.5"/>
<text text-anchor="middle" x="354" y="-14.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
</g>
<!-- eh_frame&#45;&gt;text_b -->
<g id="edge4" class="edge">
<title>eh_frame:loc1&#45;&gt;text_b</title>
<path fill="none" stroke="black" d="M311,-146C311,-112.2 360.65,-139.01 378,-110 389.9,-90.1 381.08,-64.27 370.95,-45.31"/>
<polygon fill="black" stroke="black" points="373.96,-43.53 365.95,-36.6 367.89,-47.02 373.96,-43.53"/>
</g>
<!-- text_c -->
<g id="node4" class="node">
<title>text_c</title>
<polygon fill="none" stroke="black" points="533.5,-73.5 533.5,-109.5 612.5,-109.5 612.5,-73.5 533.5,-73.5"/>
<text text-anchor="middle" x="573" y="-87.8" font-family="Times,serif" font-size="14.00">.text._Z1cv</text>
</g>
<!-- eh_frame&#45;&gt;text_c -->
<g id="edge7" class="edge">
<title>eh_frame:loc2&#45;&gt;text_c</title>
<path fill="none" stroke="black" d="M573,-146C573,-137.51 573,-128.26 573,-119.88"/>
<polygon fill="black" stroke="black" points="576.5,-119.85 573,-109.85 569.5,-119.85 576.5,-119.85"/>
</g>
<!-- text_personality -->
<g id="node5" class="node">
<title>text_personality</title>
<polygon fill="none" stroke="black" points="71.5,-73.5 71.5,-109.5 236.5,-109.5 236.5,-73.5 71.5,-73.5"/>
<text text-anchor="middle" x="154" y="-87.8" font-family="Times,serif" font-size="14.00">.text.__gxx_personality_v0</text>
</g>
<!-- eh_frame&#45;&gt;text_personality -->
<g id="edge2" class="edge">
<title>eh_frame:personality0&#45;&gt;text_personality</title>
<path fill="none" stroke="black" d="M148,-146C148,-137.46 148.76,-128.19 149.75,-119.81"/>
<polygon fill="black" stroke="black" points="153.23,-120.16 151.07,-109.79 146.29,-119.25 153.23,-120.16"/>
</g>
<!-- eh_frame&#45;&gt;text_personality -->
<g id="edge5" class="edge">
<title>eh_frame:personality1&#45;&gt;text_personality</title>
<path fill="none" stroke="black" d="M411,-146C411,-143.84 320.57,-125.41 246.99,-110.78"/>
<polygon fill="black" stroke="black" points="247.22,-107.26 236.73,-108.75 245.86,-114.13 247.22,-107.26"/>
</g>
<!-- lsda -->
<g id="node6" class="node">
<title>lsda</title>
<polygon fill="none" stroke="black" points="255,-73.5 255,-109.5 369,-109.5 369,-73.5 255,-73.5"/>
<text text-anchor="middle" x="312" y="-87.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
</g>
<!-- eh_frame&#45;&gt;lsda -->
<g id="edge3" class="edge">
<title>eh_frame:lsda0&#45;&gt;lsda</title>
<path fill="none" stroke="black" d="M230,-146C230,-132.86 237.48,-122.78 247.91,-115.11"/>
<polygon fill="black" stroke="black" points="249.85,-118.02 256.4,-109.7 246.08,-112.12 249.85,-118.02"/>
</g>
<!-- eh_frame&#45;&gt;lsda -->
<g id="edge6" class="edge">
<title>eh_frame:lsda1&#45;&gt;lsda</title>
<path fill="none" stroke="black" d="M493,-146C493,-139.84 430.6,-122.47 379.08,-109.18"/>
<polygon fill="black" stroke="black" points="379.83,-105.76 369.27,-106.66 378.09,-112.54 379.83,-105.76"/>
</g>
<!-- lsda&#45;&gt;text_a -->
<g id="edge8" class="edge">
<title>lsda&#45;&gt;text_a</title>
<path fill="none" stroke="black" stroke-dasharray="1,5" d="M278.23,-73.49C259.01,-63.82 234.74,-51.6 214.15,-41.23"/>
<polygon fill="black" stroke="black" points="215.49,-37.99 204.99,-36.61 212.34,-44.24 215.49,-37.99"/>
</g>
<!-- lsda&#45;&gt;text_b -->
<g id="edge9" class="edge">
<title>lsda&#45;&gt;text_b</title>
<path fill="none" stroke="black" stroke-dasharray="1,5" d="M322.17,-73.31C327.12,-64.94 333.18,-54.7 338.68,-45.4"/>
<polygon fill="black" stroke="black" points="341.85,-46.92 343.93,-36.53 335.82,-43.35 341.85,-46.92"/>
</g>
</g>
</svg>


@ -0,0 +1,66 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.43.0 (0)
-->
<!-- Title: %3 Pages: 1 -->
<svg width="364pt" height="209pt"
viewBox="0.00 0.00 364.00 209.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 205)">
<title>%3</title>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-205 360,-205 360,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster</title>
<polygon fill="none" stroke="black" points="8,-8 8,-157 348,-157 348,-8 8,-8"/>
<text text-anchor="middle" x="178" y="-141.8" font-family="Times,serif" font-size="14.00">Edges represent relocations</text>
</g>
<!-- unused -->
<g id="node1" class="node">
<title>unused</title>
<ellipse fill="none" stroke="black" cx="264" cy="-183" rx="36" ry="18"/>
<text text-anchor="middle" x="264" y="-179.3" font-family="Times,serif" font-size="14.00">unused</text>
</g>
<!-- fde_a -->
<g id="node2" class="node">
<title>fde_a</title>
<polygon fill="none" stroke="black" points="210,-89.5 210,-125.5 318,-125.5 318,-89.5 210,-89.5"/>
<text text-anchor="middle" x="264" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE0</text>
</g>
<!-- unused&#45;&gt;fde_a -->
<g id="edge3" class="edge">
<title>unused&#45;&gt;fde_a</title>
<path fill="none" stroke="black" d="M264,-164.95C264,-156.3 264,-145.57 264,-135.79"/>
<polygon fill="black" stroke="black" points="267.5,-135.71 264,-125.71 260.5,-135.71 267.5,-135.71"/>
</g>
<!-- lsda_a -->
<g id="node4" class="node">
<title>lsda_a</title>
<polygon fill="none" stroke="black" points="188,-16.5 188,-52.5 340,-52.5 340,-16.5 188,-16.5"/>
<text text-anchor="middle" x="264" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
</g>
<!-- fde_a&#45;&gt;lsda_a -->
<g id="edge1" class="edge">
<title>fde_a&#45;&gt;lsda_a</title>
<path fill="none" stroke="black" d="M264,-89.31C264,-81.29 264,-71.55 264,-62.57"/>
<polygon fill="black" stroke="black" points="267.5,-62.53 264,-52.53 260.5,-62.53 267.5,-62.53"/>
</g>
<!-- fde_b -->
<g id="node3" class="node">
<title>fde_b</title>
<polygon fill="none" stroke="black" points="39,-89.5 39,-125.5 147,-125.5 147,-89.5 39,-89.5"/>
<text text-anchor="middle" x="93" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE1</text>
</g>
<!-- lsda_b -->
<g id="node5" class="node">
<title>lsda_b</title>
<polygon fill="none" stroke="black" points="16.5,-16.5 16.5,-52.5 169.5,-52.5 169.5,-16.5 16.5,-16.5"/>
<text text-anchor="middle" x="93" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
</g>
<!-- fde_b&#45;&gt;lsda_b -->
<g id="edge2" class="edge">
<title>fde_b&#45;&gt;lsda_b</title>
<path fill="none" stroke="black" d="M93,-89.31C93,-81.29 93,-71.55 93,-62.57"/>
<polygon fill="black" stroke="black" points="96.5,-62.53 93,-52.53 89.5,-62.53 96.5,-62.53"/>
</g>
</g>
</svg>


96
img/lsda_gc.svg Normal file

@ -0,0 +1,96 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.43.0 (0)
-->
<!-- Title: %3 Pages: 1 -->
<svg width="496pt" height="246pt"
viewBox="0.00 0.00 496.00 246.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 242)">
<title>%3</title>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-242 492,-242 492,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster</title>
<polygon fill="none" stroke="black" points="8,-8 8,-230 480,-230 480,-8 8,-8"/>
<text text-anchor="middle" x="244" y="-214.8" font-family="Times,serif" font-size="14.00">Edges represent GC references</text>
</g>
<!-- eh_frame -->
<g id="node1" class="node">
<title>eh_frame</title>
<polygon fill="none" stroke="black" points="159.5,-162.5 159.5,-198.5 288.5,-198.5 288.5,-162.5 159.5,-162.5"/>
<text text-anchor="middle" x="224" y="-176.8" font-family="Times,serif" font-size="14.00">.eh_frame (GC root)</text>
</g>
<!-- lsda -->
<g id="node4" class="node">
<title>lsda</title>
<polygon fill="none" stroke="black" points="16,-89.5 16,-125.5 130,-125.5 130,-89.5 16,-89.5"/>
<text text-anchor="middle" x="73" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
</g>
<!-- eh_frame&#45;&gt;lsda -->
<g id="edge1" class="edge">
<title>eh_frame&#45;&gt;lsda</title>
<path fill="none" stroke="black" d="M187.83,-162.49C167.07,-152.73 140.79,-140.38 118.62,-129.95"/>
<polygon fill="black" stroke="black" points="119.94,-126.7 109.4,-125.61 116.96,-133.04 119.94,-126.7"/>
</g>
<!-- lsda_a -->
<g id="node5" class="node">
<title>lsda_a</title>
<polygon fill="none" stroke="black" points="148,-89.5 148,-125.5 300,-125.5 300,-89.5 148,-89.5"/>
<text text-anchor="middle" x="224" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
</g>
<!-- eh_frame&#45;&gt;lsda_a -->
<g id="edge2" class="edge">
<title>eh_frame&#45;&gt;lsda_a</title>
<path fill="none" stroke="black" d="M224,-162.31C224,-154.29 224,-144.55 224,-135.57"/>
<polygon fill="black" stroke="black" points="227.5,-135.53 224,-125.53 220.5,-135.53 227.5,-135.53"/>
</g>
<!-- lsda_b -->
<g id="node6" class="node">
<title>lsda_b</title>
<polygon fill="none" stroke="black" points="318.5,-89.5 318.5,-125.5 471.5,-125.5 471.5,-89.5 318.5,-89.5"/>
<text text-anchor="middle" x="395" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
</g>
<!-- eh_frame&#45;&gt;lsda_b -->
<g id="edge3" class="edge">
<title>eh_frame&#45;&gt;lsda_b</title>
<path fill="none" stroke="black" d="M264.96,-162.49C288.79,-152.6 319.03,-140.04 344.34,-129.53"/>
<polygon fill="black" stroke="black" points="345.89,-132.68 353.78,-125.61 343.21,-126.22 345.89,-132.68"/>
</g>
<!-- text_a -->
<g id="node2" class="node">
<title>text_a</title>
<polygon fill="none" stroke="black" points="184.5,-16.5 184.5,-52.5 263.5,-52.5 263.5,-16.5 184.5,-16.5"/>
<text text-anchor="middle" x="224" y="-30.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
</g>
<!-- text_a&#45;&gt;lsda_a -->
<g id="edge4" class="edge">
<title>text_a&#45;&gt;lsda_a</title>
<path fill="none" stroke="black" d="M229.86,-52.53C230.71,-60.53 230.95,-70.27 230.59,-79.25"/>
<polygon fill="black" stroke="black" points="227.09,-79.09 229.88,-89.31 234.07,-79.58 227.09,-79.09"/>
</g>
<!-- text_b -->
<g id="node3" class="node">
<title>text_b</title>
<polygon fill="none" stroke="black" points="355,-16.5 355,-52.5 435,-52.5 435,-16.5 355,-16.5"/>
<text text-anchor="middle" x="395" y="-30.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
</g>
<!-- text_b&#45;&gt;lsda_b -->
<g id="edge6" class="edge">
<title>text_b&#45;&gt;lsda_b</title>
<path fill="none" stroke="black" d="M400.86,-52.53C401.71,-60.53 401.95,-70.27 401.59,-79.25"/>
<polygon fill="black" stroke="black" points="398.09,-79.09 400.88,-89.31 405.07,-79.58 398.09,-79.09"/>
</g>
<!-- lsda_a&#45;&gt;text_a -->
<g id="edge5" class="edge">
<title>lsda_a&#45;&gt;text_a</title>
<path fill="none" stroke="black" d="M218.12,-89.31C217.28,-81.29 217.05,-71.55 217.42,-62.57"/>
<polygon fill="black" stroke="black" points="220.92,-62.75 218.14,-52.53 213.94,-62.25 220.92,-62.75"/>
</g>
<!-- lsda_b&#45;&gt;text_b -->
<g id="edge7" class="edge">
<title>lsda_b&#45;&gt;text_b</title>
<path fill="none" stroke="black" d="M389.12,-89.31C388.28,-81.29 388.05,-71.55 388.42,-62.57"/>
<polygon fill="black" stroke="black" points="391.92,-62.75 389.14,-52.53 384.94,-62.25 391.92,-62.75"/>
</g>
</g>
</svg>


84
img/lsda_gc_new.svg Normal file

@ -0,0 +1,84 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.43.0 (0)
-->
<!-- Title: %3 Pages: 1 -->
<svg width="496pt" height="173pt"
viewBox="0.00 0.00 496.00 173.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 169)">
<title>%3</title>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-169 492,-169 492,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster</title>
<polygon fill="none" stroke="black" points="8,-8 8,-157 480,-157 480,-8 8,-8"/>
<text text-anchor="middle" x="244" y="-141.8" font-family="Times,serif" font-size="14.00">Edges represent GC references</text>
</g>
<!-- eh_frame -->
<g id="node1" class="node">
<title>eh_frame</title>
<polygon fill="none" stroke="black" points="342.5,-89.5 342.5,-125.5 471.5,-125.5 471.5,-89.5 342.5,-89.5"/>
<text text-anchor="middle" x="407" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame (GC root)</text>
</g>
<!-- lsda -->
<g id="node4" class="node">
<title>lsda</title>
<polygon fill="none" stroke="black" points="358,-16.5 358,-52.5 472,-52.5 472,-16.5 358,-16.5"/>
<text text-anchor="middle" x="415" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
</g>
<!-- eh_frame&#45;&gt;lsda -->
<g id="edge1" class="edge">
<title>eh_frame&#45;&gt;lsda</title>
<path fill="none" stroke="black" d="M408.94,-89.31C409.84,-81.29 410.94,-71.55 411.95,-62.57"/>
<polygon fill="black" stroke="black" points="415.44,-62.86 413.08,-52.53 408.48,-62.07 415.44,-62.86"/>
</g>
<!-- text_a -->
<g id="node2" class="node">
<title>text_a</title>
<polygon fill="none" stroke="black" points="224.5,-89.5 224.5,-125.5 303.5,-125.5 303.5,-89.5 224.5,-89.5"/>
<text text-anchor="middle" x="264" y="-103.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
</g>
<!-- lsda_a -->
<g id="node5" class="node">
<title>lsda_a</title>
<polygon fill="none" stroke="black" points="188,-16.5 188,-52.5 340,-52.5 340,-16.5 188,-16.5"/>
<text text-anchor="middle" x="264" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
</g>
<!-- text_a&#45;&gt;lsda_a -->
<g id="edge2" class="edge">
<title>text_a&#45;&gt;lsda_a</title>
<path fill="none" stroke="black" d="M258.12,-89.31C257.28,-81.29 257.05,-71.55 257.42,-62.57"/>
<polygon fill="black" stroke="black" points="260.92,-62.75 258.14,-52.53 253.94,-62.25 260.92,-62.75"/>
</g>
<!-- text_b -->
<g id="node3" class="node">
<title>text_b</title>
<polygon fill="none" stroke="black" points="53,-89.5 53,-125.5 133,-125.5 133,-89.5 53,-89.5"/>
<text text-anchor="middle" x="93" y="-103.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
</g>
<!-- lsda_b -->
<g id="node6" class="node">
<title>lsda_b</title>
<polygon fill="none" stroke="black" points="16.5,-16.5 16.5,-52.5 169.5,-52.5 169.5,-16.5 16.5,-16.5"/>
<text text-anchor="middle" x="93" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
</g>
<!-- text_b&#45;&gt;lsda_b -->
<g id="edge4" class="edge">
<title>text_b&#45;&gt;lsda_b</title>
<path fill="none" stroke="black" d="M87.12,-89.31C86.28,-81.29 86.05,-71.55 86.42,-62.57"/>
<polygon fill="black" stroke="black" points="89.92,-62.75 87.14,-52.53 82.94,-62.25 89.92,-62.75"/>
</g>
<!-- lsda_a&#45;&gt;text_a -->
<g id="edge3" class="edge">
<title>lsda_a&#45;&gt;text_a</title>
<path fill="none" stroke="black" d="M269.86,-52.53C270.71,-60.53 270.95,-70.27 270.59,-79.25"/>
<polygon fill="black" stroke="black" points="267.09,-79.09 269.88,-89.31 274.07,-79.58 267.09,-79.09"/>
</g>
<!-- lsda_b&#45;&gt;text_b -->
<g id="edge5" class="edge">
<title>lsda_b&#45;&gt;text_b</title>
<path fill="none" stroke="black" d="M98.86,-52.53C99.71,-60.53 99.95,-70.27 99.59,-79.25"/>
<polygon fill="black" stroke="black" points="96.09,-79.09 98.88,-89.31 103.07,-79.58 96.09,-79.09"/>
</g>
</g>
</svg>


@ -0,0 +1,64 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.43.0 (0)
-->
<!-- Title: %3 Pages: 1 -->
<svg width="274pt" height="219pt"
viewBox="0.00 0.00 274.00 219.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 215)">
<title>%3</title>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-215 270,-215 270,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster</title>
<polygon fill="none" stroke="black" points="8,-8 8,-167 258,-167 258,-8 8,-8"/>
<text text-anchor="middle" x="133" y="-151.8" font-family="Times,serif" font-size="14.00">Edges represent relocations</text>
</g>
<!-- unused -->
<g id="node1" class="node">
<title>unused</title>
<ellipse fill="none" stroke="black" cx="70" cy="-193" rx="36" ry="18"/>
<text text-anchor="middle" x="70" y="-189.3" font-family="Times,serif" font-size="14.00">unused</text>
</g>
<!-- fde_a -->
<g id="node2" class="node">
<title>fde_a</title>
<polygon fill="none" stroke="black" points="16,-99.5 16,-135.5 124,-135.5 124,-99.5 16,-99.5"/>
<text text-anchor="middle" x="70" y="-113.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE0</text>
</g>
<!-- unused&#45;&gt;fde_a -->
<g id="edge3" class="edge">
<title>unused&#45;&gt;fde_a</title>
<path fill="none" stroke="black" d="M70,-174.95C70,-166.3 70,-155.57 70,-145.79"/>
<polygon fill="black" stroke="black" points="73.5,-145.71 70,-135.71 66.5,-145.71 73.5,-145.71"/>
</g>
<!-- lsda -->
<g id="node4" class="node">
<title>lsda</title>
<polygon fill="none" stroke="black" points="76,-16.5 76,-62.5 190,-62.5 190,-16.5 76,-16.5"/>
<text text-anchor="middle" x="133" y="-47.3" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
<polyline fill="none" stroke="black" points="76,-39.5 190,-39.5 "/>
<text text-anchor="middle" x="104" y="-24.3" font-family="Times,serif" font-size="14.00">lsda_a</text>
<polyline fill="none" stroke="black" points="132,-16.5 132,-39.5 "/>
<text text-anchor="middle" x="161" y="-24.3" font-family="Times,serif" font-size="14.00">lsda_b</text>
</g>
<!-- fde_a&#45;&gt;lsda -->
<g id="edge1" class="edge">
<title>fde_a&#45;&gt;lsda:a</title>
<path fill="none" stroke="black" d="M64.21,-99.34C57.5,-77.11 49.26,-40.03 65.15,-30.04"/>
<polygon fill="black" stroke="black" points="66.19,-33.39 75,-27.5 64.44,-26.61 66.19,-33.39"/>
</g>
<!-- fde_b -->
<g id="node3" class="node">
<title>fde_b</title>
<polygon fill="none" stroke="black" points="142,-99.5 142,-135.5 250,-135.5 250,-99.5 142,-99.5"/>
<text text-anchor="middle" x="196" y="-113.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE1</text>
</g>
<!-- fde_b&#45;&gt;lsda -->
<g id="edge2" class="edge">
<title>fde_b&#45;&gt;lsda:b</title>
<path fill="none" stroke="black" d="M201.79,-99.34C208.5,-77.11 216.74,-40.03 200.85,-30.04"/>
<polygon fill="black" stroke="black" points="201.56,-26.61 191,-27.5 199.81,-33.39 201.56,-26.61"/>
</g>
</g>
</svg>


708
maskray-1.md Normal file

@ -0,0 +1,708 @@
# Stack unwinding
The main uses of stack unwinding are:

* To obtain a stack trace for a debugger, crash reporter, profiler, garbage
collector, etc.
* With personality routines and language-specific data areas, to implement C++
exceptions (Itanium C++ ABI). See [C++ exception handling ABI](maskray-3.md)

Stack unwinding tasks can be divided into two categories:

* synchronous: triggered by the program itself, e.g. by a C++ `throw` or by a
request for its own stack trace. This type of stack unwinding only starts at a
function call (in the function body, never in the prologue/epilogue)
* asynchronous: triggered by a garbage collector, a signal or an external
program. This kind of stack unwinding can start in the function
prologue/epilogue
## Frame pointer
The classic and simplest way to unwind the stack is based on the frame pointer:
reserve a register as the frame pointer (RBP on x86-64), save it in the stack
frame in the function prologue, and then update it to the address of the saved
value. The frame pointer and its saved values on the stack form a singly linked
list. After obtaining the initial frame pointer value
(`__builtin_frame_address`), dereference the frame pointer repeatedly to get
the frame pointer values of all stack frames. This method does not work at some
instructions in the prologue/epilogue, where the frame pointer does not yet (or
no longer) reference the current frame.
```
pushq %rbp
movq %rsp, %rbp # after this, RBP references the current frame
...
popq %rbp
retq # RBP references the previous frame
```
```c
#include <stdio.h>
[[gnu::noinline]] void qux() {
  // fp points to the slot that holds the caller's saved frame pointer.
  void **fp = __builtin_frame_address(0);
  for (;;) {
    printf("%p\n", fp);
    void **next_fp = *fp;
    // The stack grows downwards, so a caller's frame has a higher address.
    if (next_fp <= fp) break;
    fp = next_fp;
  }
}
[[gnu::noinline]] void bar() { qux(); }
[[gnu::noinline]] void foo() { bar(); }
int main() { foo(); }
```
The frame pointer-based method is simple, but has several drawbacks.

When the above code is compiled with `-O1` or above, the calls in `foo` and
`bar` become tail calls, and the program output will not include the stack
frames of `foo` and `bar` (`-fomit-leaf-frame-pointer` does not hinder the tail
call).

In practice, there is no guarantee that all libraries are built with frame
pointers. When unwinding a thread, check whether `next_fp` looks like a stack
address before dereferencing it, to prevent segfaults. One way to check page
accessibility is to parse `/proc/*/maps` to determine whether the address is
readable (slow). A smarter trick is to let the kernel probe the address in a
system call:
```c
// Or use the write end of a pipe.
int fd = open("/dev/random", O_WRONLY);
// write() fails with EFAULT if the kernel cannot read the byte at address.
if (write(fd, address, 1) < 0) {
  // not readable
}
```
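A self-contained variant of this trick could look like the sketch below. It uses the write end of a pipe, as the comment above suggests. The helper name `is_readable` is hypothetical, and creating a pipe for every probe is an assumption made for brevity; a real unwinder would likely keep the descriptors open and drain the pipe between probes.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: returns true if one byte at `address` can be read.
   The kernel copies the byte into the pipe on our behalf; if the page is
   not readable, write() fails with EFAULT instead of raising SIGSEGV. */
bool is_readable(const void *address) {
  int fds[2];
  if (pipe(fds) != 0)
    return false; /* cannot probe; conservatively report "not readable" */
  bool ok = write(fds[1], address, 1) == 1;
  close(fds[0]);
  close(fds[1]);
  return ok;
}
```

In the frame pointer walk, such a check would be applied to `next_fp` before the next `*fp` dereference.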
In addition, reserving a register for the frame pointer increases text size
and hurts performance (extra prologue/epilogue instructions, plus the register
pressure caused by having one fewer register). The cost may be quite
significant on x86-32, which lacks registers. Even on an architecture with
relatively plentiful registers, e.g. x86-64, the performance loss can be more
than 1%.
### Compiler behavior
* `-O0`: defaults to `-fno-omit-frame-pointer`; all functions have a frame
pointer
* `-O1` or above: defaults to `-fomit-frame-pointer`; a frame pointer is set up
only if necessary. Specify `-fno-omit-frame-pointer` to get an effect similar to
`-O0`. You can additionally specify `-momit-leaf-frame-pointer` to remove the
frame pointer from leaf functions
## libunwind
C++ exceptions and the stack unwinding done by profilers and crash reporters
usually use the libunwind API and DWARF Call Frame Information. In the 1990s,
Hewlett-Packard defined a set of libunwind APIs, which fall into two categories:
* `unw_*`: The entry points are `unw_init_local` (local unwinding, current
process) and `unw_init_remote` (remote unwinding, other processes).
Applications that use libunwind usually use this API. For example, Linux perf
calls `unw_init_remote`
* `_Unwind_*`: This part is standardized as Level 1: Base ABI of [Itanium C++
ABI: Exception Handling](https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html).
The Level 2 C++ ABI calls these `_Unwind_*` APIs. Among them, `_Unwind_Resume`
is the only API that is directly called by C++ compiled code.
`_Unwind_Backtrace` is used by a few applications to obtain stack traces. Other
functions are called by libsupc++/libc++abi `__cxa_*` functions and
`__gxx_personality_v0`.
Hewlett-Packard has open sourced its implementation at
https://www.nongnu.org/libunwind/ (there are several other projects that are
also called "libunwind"). The common implementations of this API on Linux are:
* libgcc/unwind-\* (`libgcc_s.so.1` or `libgcc_eh.a`): implements `_Unwind_*`
and introduces some extensions: `_Unwind_Resume_or_Rethrow`,
`_Unwind_FindEnclosingFunction`, `__register_frame` etc.
* llvm-project/libunwind (`libunwind.so` or `libunwind.a`) is a simplified
implementation of the HP API, which provides part of `unw_*`, but does not
implement `unw_init_remote`. Part of the code is taken from ld64. If you use
Clang, you can select it with `--rtlib=compiler-rt --unwindlib=libunwind`
* glibc's internal implementation of `_Unwind_Find_FDE`, usually not exported,
and related to `__register_frame_info`
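
As a small illustration of the `_Unwind_*` side of the API, the sketch below obtains a stack trace with `_Unwind_Backtrace` and `_Unwind_GetIP` from `<unwind.h>`. The helper names `print_frame` and `print_backtrace` are made up for this example, and the number of frames reported depends on whether unwind tables are available for every caller.

```c
#include <stdint.h>
#include <stdio.h>
#include <unwind.h>

/* Callback invoked once per frame; `arg` points to the frame counter. */
static _Unwind_Reason_Code print_frame(struct _Unwind_Context *ctx, void *arg) {
  int *depth = arg;
  printf("#%d pc=%p\n", *depth, (void *)(uintptr_t)_Unwind_GetIP(ctx));
  ++*depth;
  return _URC_NO_REASON; /* continue with the next frame */
}

/* Hypothetical helper: prints the current call stack and returns the
   number of frames visited. */
int print_backtrace(void) {
  int depth = 0;
  _Unwind_Backtrace(print_frame, &depth);
  return depth;
}
```

With GCC or Clang on Linux this normally links without extra flags, because the compiler driver already pulls in its unwinder library.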
## DWARF Call Frame Information
The unwind instructions required by different areas of the program are
described by DWARF Call Frame Information (CFI) and stored in `.eh_frame` on
ELF platforms. The compiler, assembler, linker and libunwind each provide the
corresponding support.

`.eh_frame` consists of Common Information Entries (CIE) and Frame Description
Entries (FDE). A CIE has these fields:
* `length`
* `CIE_id`: Constant 0. This field is used to distinguish CIE and FDE. In FDE,
this field is non-zero, representing `CIE_pointer`
* `version`: Constant 1
* `augmentation_string`: A string describing the CIE/FDE parameter list. The `P`
character indicates the personality routine pointer; the `L` character
indicates that the augmentation data of the FDE stores the language-specific
data area (LSDA)
* `address_size`: Generally 4 or 8
* `segment_selector_size`: For x86
* `code_alignment_factor`: When instruction lengths are a multiple of 2 or 4
(as on RISC architectures), this is the factor by which the arguments of
instructions such as `DW_CFA_advance_loc` are multiplied
* `data_alignment_factor`: The factor by which the arguments of instructions
such as `DW_CFA_offset` and `DW_CFA_val_offset` are multiplied
* `return_address_register`
* `augmentation_data_length`
* `augmentation_data`: personality
* `initial_instructions`: bytecode for unwinding, a common prefix used by all
FDEs using this CIE
* padding
Each FDE has an associated CIE. FDE has these fields:
* `length`: The length of FDE itself. If it is `0xffffffff`, the next 8 bytes
(`extended_length`) record the actual length. Unless specially constructed,
`extended_length` is not used
* `CIE_pointer`: Subtract CIE_pointer from the current position to get the
associated CIE
* `initial_location`: The address of the first location described by the FDE.
There is a relocation referring to the section symbol in .o
* `address_range`: initial_location and address_range describe an address range
* `instructions`: bytecode for unwinding, essentially (address,opcode) pairs
* `augmentation_data_length`
* `augmentation_data`: If the augmentation string of the associated CIE contains
the `L` character, a reference to the language-specific data area is recorded
here
* padding
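
The `length` and `CIE_id`/`CIE_pointer` fields are enough to walk the records of `.eh_frame` and tell CIEs from FDEs. The following is a minimal sketch under a few assumptions: the names `eh_record_kind` and `eh_classify` are hypothetical, the section uses the host byte order, the caller guarantees the buffer is large enough, and a zero `length` is treated as the terminator record.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum eh_record_kind { EH_TERMINATOR, EH_CIE, EH_FDE };

/* Hypothetical helper: classify the record that starts at `p` and store
   its total size (including the length field) in `*total_size`. */
enum eh_record_kind eh_classify(const uint8_t *p, size_t *total_size) {
  uint32_t length32, id;
  uint64_t length;
  size_t header = 4; /* size of the length field(s) */
  memcpy(&length32, p, 4);
  if (length32 == 0) { /* zero length: terminator */
    *total_size = 4;
    return EH_TERMINATOR;
  }
  length = length32;
  if (length32 == 0xffffffffu) { /* extended_length follows */
    memcpy(&length, p + 4, 8);
    header = 12;
  }
  memcpy(&id, p + header, 4); /* CIE_id (0) or CIE_pointer (non-zero) */
  *total_size = header + (size_t)length;
  return id == 0 ? EH_CIE : EH_FDE;
}
```

Adding `*total_size` to `p` yields the start of the next record.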
A CIE may optionally refer to a personality routine in the text section. An FDE
may optionally refer to its associated LSDA in `.gcc_except_table`. The
personality routine and LSDA are used in Level 2: C++ ABI of Itanium C++ ABI.

`.eh_frame` is based on `.debug_frame` introduced in DWARF v2. They have some
differences, though:
* `.eh_frame` has the `SHF_ALLOC` flag (indicating that a section should be
part of the process image in memory) but `.debug_frame` does not, so the latter
has very few use cases.
* `.debug_frame` supports the DWARF64 format (which allows 64-bit offsets at the
cost of a slightly larger size) but `.eh_frame` does not (it could be extended,
but there is little demand)
* There is no `augmentation_data_length` or `augmentation_data` in the CIE of
`.debug_frame`
* The `version` field in the CIE is different
* The meaning of `CIE_pointer` in the FDE is different. In `.debug_frame` it is
a section offset (absolute), while in `.eh_frame` it is a relative offset. This
change made by `.eh_frame` is a good one: once the section size no longer fits
in 32 bits, `.debug_frame` has to switch to DWARF64 to represent `CIE_pointer`,
whereas a relative offset does not have this problem (if the distance between
an FDE and its CIE would exceed 32 bits, simply add another CIE)
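
The two meanings of `CIE_pointer` can be written down as a pair of tiny functions. This is only a sketch: both function names are hypothetical, all values are section offsets, and `field_offset` is assumed to be the offset of the `CIE_pointer` field itself.

```c
#include <stdint.h>

/* .eh_frame: CIE_pointer is the distance from the field back to the CIE. */
uint64_t eh_frame_cie_offset(uint64_t field_offset, uint32_t cie_pointer) {
  return field_offset - cie_pointer;
}

/* .debug_frame (32-bit DWARF): CIE_pointer already is a section offset. */
uint64_t debug_frame_cie_offset(uint32_t cie_pointer) {
  return cie_pointer;
}
```

For example, an FDE that starts at offset 0x18 has its `CIE_pointer` field at offset 0x1c; if the field holds 0x1c, the associated CIE is at offset 0 of `.eh_frame`.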
For the following function:
```c
void f() {
  __builtin_unwind_init();
}
```
The compiler emits `.cfi_*` directives (CFI directives) to annotate the
assembly: `.cfi_startproc` and `.cfi_endproc` delimit the region covered by an
FDE, and the other CFI directives describe CFI instructions. A call frame is
identified by an address on the stack. This address is called the Canonical
Frame Address (CFA), and is usually the stack pointer value at the call site.
The following example demonstrates the usage of CFI instructions:
```
f:
# At the function entry, CFA = rsp+8
.cfi_startproc
# %bb.0:
pushq %rbp
# Redefine CFA = rsp+16
.cfi_def_cfa_offset 16
# rbp is saved at the address CFA-16
.cfi_offset %rbp, -16
movq %rsp, %rbp
# CFA = rbp+16. CFA does not need to be redefined when rsp changes
.cfi_def_cfa_register %rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
# rbx is saved at the address CFA-56
.cfi_offset %rbx, -56
.cfi_offset %r12, -48
.cfi_offset %r13, -40
.cfi_offset %r14, -32
.cfi_offset %r15, -24
popq %rbx
popq %r12
popq %r13
popq %r14
popq %r15
popq %rbp
# CFA = rsp+8
.cfi_def_cfa %rsp, 8
retq
.Lfunc_end0:
.size f, .Lfunc_end0-f
.cfi_endproc
```
The assembler parses CFI directives and generates `.eh_frame` (this mechanism was
introduced by Alan Modra in 2003). The linker collects the `.eh_frame` input
sections in .o/.a files to generate the output `.eh_frame`. In 2006, GNU as
introduced `.cfi_personality` and `.cfi_lsda`.
### `.eh_frame_hdr` and `PT_GNU_EH_FRAME`
To find the FDE that covers a given pc, you need to scan `.eh_frame` from the
beginning until an FDE matches (that is, the pc falls in the interval
described by `initial_location` and `address_range`). The time spent is
proportional to the number of scanned CIE and FDE records.
https://sourceware.org/pipermail/binutils/2001-December/015674.html introduced
`.eh_frame_hdr`, a binary search index table describing (`initial_location`, FDE
address) pairs.
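The lookup over such a table can be sketched as below. This is a simplified illustration: `struct table_entry` and `find_fde` are hypothetical names, and the entries are assumed to be already decoded into plain addresses, whereas a real `.eh_frame_hdr` stores encoded (typically section-relative) values.

```c
#include <stddef.h>
#include <stdint.h>

// One row of the search table, already decoded into plain addresses.
struct table_entry {
  uint64_t initial_location;
  uint64_t fde_address;
};

// Return the FDE address of the last entry whose initial_location <= pc,
// or 0 if pc precedes every entry. The table must be sorted by
// initial_location. The caller still has to check pc against the FDE's
// address_range, because the table does not record lengths.
uint64_t find_fde(const struct table_entry *table, size_t n, uint64_t pc) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (table[mid].initial_location <= pc)
      lo = mid + 1;
    else
      hi = mid;
  }
  return lo == 0 ? 0 : table[lo - 1].fde_address;
}
```

The search takes O(log n) steps instead of walking every CIE and FDE record.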
The linker collects all `.eh_frame` input sections. With `--eh-frame-hdr`, `ld`
generates `.eh_frame_hdr` and creates a program header `PT_GNU_EH_FRAME` to describe
`.eh_frame_hdr`. An unwinder can parse the program headers and look for
`PT_GNU_EH_FRAME` to locate `.eh_frame_hdr`. See the libunwind example below.
### `__register_frame_info`
Before `.eh_frame_hdr` and `PT_GNU_EH_FRAME` were invented, a static
constructor `frame_dummy` in crtbegin (`crtstuff.c`) called
`__register_frame_info` to register the `.eh_frame` of the executable.
Now `__register_frame_info` is only used by programs linked with `-static`.
Correspondingly, if you specify `-Wl,--no-eh-frame-hdr` when linking, you cannot
unwind (if you use a C++ exception, the program will call `std::terminate`).
### libunwind example
```c
#include <libunwind.h>
#include <stdio.h>
void backtrace() {
unw_context_t context;
unw_cursor_t cursor;
// Store register values into context.
unw_getcontext(&context);
// Locate the PT_GNU_EH_FRAME which contains PC.
unw_init_local(&cursor, &context);
size_t rip, rsp;
do {
unw_get_reg(&cursor, UNW_X86_64_RIP, &rip);
unw_get_reg(&cursor, UNW_X86_64_RSP, &rsp);
printf("rip: %zx rsp: %zx\n", rip, rsp);
} while (unw_step(&cursor) > 0);
}
void bar() {backtrace();}
void foo() {bar();}
int main() {foo();}
```
If you use llvm-project/libunwind:
```sh
$CC a.c -Ipath/to/include -Lpath/to/lib -lunwind
```
If you use nongnu.org/libunwind, there are two options: (a) add
`#define UNW_LOCAL_ONLY` before `#include <libunwind.h>`, or (b) link one more
library; on x86-64 it is `-l:libunwind-x86_64.so`. If you use Clang, you can also use
`clang --rtlib=compiler-rt --unwindlib=libunwind -I path/to/include a.c`. In
addition to providing `unw_*`, this ensures that `libgcc_s.so` is not linked.
* `unw_getcontext`: get register values (including PC)
* `unw_init_local`
  * Use `dl_iterate_phdr` to traverse the executable and shared objects, and
    find the `PT_LOAD` program header that contains the PC
  * Find the `PT_GNU_EH_FRAME` (`.eh_frame_hdr`) of the containing module, and
    save it in the cursor
* `unw_step`
  * Binary search for the `.eh_frame_hdr` entry corresponding to the PC, and
    record the FDE found and the CIE it points to
  * Execute `initial_instructions` in the CIE
  * Execute the instructions (bytecode) in the FDE. An automaton maintains the
    current location and CFA. Among the instructions, `DW_CFA_advance_loc`
    advances the location; `DW_CFA_def_cfa_*` updates CFA; `DW_CFA_offset`
    indicates that the value of a register is stored at CFA+offset
  * The automaton stops when the current location is greater than or equal to
    PC. In other words, the executed instructions are a prefix of the FDE's
    instructions
An unwinder locates the applicable FDE according to the program counter, and
executes all the CFI instructions before the program counter.
There are several important instructions:
* `DW_CFA_def_cfa_*`
* `DW_CFA_offset`
* `DW_CFA_advance_loc`
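The automaton described above can be sketched as a small interpreter. This is a toy model under stated assumptions: the `cfi_op` values stand in for the real `DW_CFA_*` opcodes (which use a compact byte encoding), all type and function names are invented for the example, and register numbers are assumed to follow the x86-64 DWARF numbering (6 = rbp, 7 = rsp).

```c
#include <stddef.h>
#include <stdint.h>

// Simplified stand-ins for DW_CFA_* opcodes.
enum cfi_op { ADVANCE_LOC, DEF_CFA_OFFSET, DEF_CFA_REGISTER, OFFSET };

struct cfi_insn {
  enum cfi_op op;
  int reg;        // register operand (DEF_CFA_REGISTER, OFFSET)
  int64_t value;  // location delta or offset
};

#define NUM_REGS 17          // x86-64 DWARF registers 0..16
#define NOT_SAVED INT64_MIN  // marker: register has no saved copy

struct unwind_row {
  int cfa_reg;                 // CFA = value of cfa_reg + cfa_offset
  int64_t cfa_offset;
  int64_t saved_at[NUM_REGS];  // register i is saved at CFA + saved_at[i]
};

// Run FDE instructions starting at `start` (initial_location). `row` must
// already hold the state produced by the CIE's initial instructions.
// Execution stops once the location reaches pc, the rule stated above.
void run_cfi(const struct cfi_insn *insns, size_t n, uint64_t start,
             uint64_t pc, struct unwind_row *row) {
  uint64_t loc = start;
  for (size_t i = 0; i < n; i++) {
    switch (insns[i].op) {
    case ADVANCE_LOC:
      loc += (uint64_t)insns[i].value;
      if (loc >= pc)
        return;
      break;
    case DEF_CFA_OFFSET:
      row->cfa_offset = insns[i].value;
      break;
    case DEF_CFA_REGISTER:
      row->cfa_reg = insns[i].reg;
      break;
    case OFFSET:
      row->saved_at[insns[i].reg] = insns[i].value;
      break;
    }
  }
}
```

Once `run_cfi` returns, the unwinder can compute the CFA from `cfa_reg` and `cfa_offset` and then load each saved register from `CFA + saved_at[i]`.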
For a `-DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86` build of clang:
`.text` is 51.7MiB, `.eh_frame` is 4.2MiB and `.eh_frame_hdr` is 646KiB, with 2
CIEs and 82745 FDEs.
### Remarks
CFI instructions are suitable for the compiler to generate, but cumbersome
to write in hand-written assembly. In 2015, Alex Dowad contributed an awk
script to musl libc to parse the assembly and automatically generate CFI
directives. In fact, generating precise CFI instructions is challenging for
compilers as well. For a function that does not use a frame pointer, adjusting
SP requires outputting a CFI directive to redefine CFA. GCC does not parse
inline assembly, so adjusting SP in inline assembly often results in imprecise
CFI.
```c
void foo() {
asm("subq $128, %rsp\n"
// Cannot unwind if -fomit-leaf-frame-pointer
"nop\n"
"addq $128, %rsp\n");
}
int main() {
foo();
}
```
The CFIInstrInserter pass in LLVM can insert `.cfi_def_cfa_*`, `.cfi_offset` and
`.cfi_restore` directives to adjust the CFA and callee-saved registers.
The DWARF scheme also has very low information density. The various compact
unwind schemes have made improvement on this aspect. To list a few issues:
* CIE `address_size`: nobody uses different values for an architecture. Even if
they do (ILP32 ABIs in AArch64 and x86-64), the information is already
available elsewhere.
* CIE `segment_selector_size`: It is nice that they cared about x86, but x86
itself does not need it anymore :/
* CIE `code_alignment_factor` and `data_alignment_factor`: A RISC architecture
with such preference can hard code the values.
* CIE `return_address_register`: I do not know when an architecture wants to
use a different register for the return address.
* `length`: The DWARF's 8-byte form is definitely overengineered... For standard
form prologue/epilogue, the field should not be needed.
* `initial_location` and `address_range`: if a binary search index table is
always needed, why do we need the length field?
* `instructions`: bytecode is flexible but commonly a function
prologue/epilogue is of a standard form and the few callee-saved registers
can be encoded in a more compact way.
* `augmentation_data`: While this provides flexibility, in practice a function
very rarely needs anything more than a personality and an LSDA pointer.
Callee-saved registers other than FP are oftentimes unneeded but there is no
compiler option to drop them.
## `SHT_X86_64_UNWIND`
`.eh_frame` has special processing in linker/dynamic loader, so conventionally
it should use a separate section type, but `SHT_PROGBITS` was used in the
design. In the x86-64 psABI, the type of `.eh_frame` is `SHT_X86_64_UNWIND`
(influenced by Solaris).
* In GNU as, `.section .eh_frame,"a",@unwind` will generate `SHT_X86_64_UNWIND`,
and `.cfi_*` will generate `SHT_PROGBITS`.
* Since Clang 3.8, `.cfi_*` generates `SHT_X86_64_UNWIND`.
`.section .eh_frame,"a",@unwind` is rare (glibc's x86 port, libffi, LuaJIT and
other packages), so checking the type of `.eh_frame` is a good way to
distinguish Clang and GCC object files :) For LLD 11.0.0, I contributed
https://reviews.llvm.org/D85785 to allow mixed types for `.eh_frame` in a
relocatable link ;-)
Suggestion to future architectures: When defining processor-specific section
types, please do not use 0x70000001
(`SHT_ARM_EXIDX=SHT_IA_64_UNWIND=SHT_PARISC_UNWIND=SHT_X86_64_UNWIND=SHT_LOPROC+1`)
for purposes other than unwinding :) `SHT_CSKY_ATTRIBUTES=0x70000001` :)
### Linker perspective
Usually in the case of COMDAT groups and `-ffunction-sections`,
`.data`/`.rodata` needs to be split like `.text`, but `.eh_frame` is
monolithic. Like many other metadata sections, the main problem with the
monolithic section is that garbage collection is challenging in the linker.
Unlike some other metadata sections, simply abandoning garbage collection is
not an option: `.eh_frame_hdr` is a binary search index table and
duplicate/unused entries can confuse consumers.
When a linker processes `.eh_frame`, it needs to conceptually split `.eh_frame`
into CIEs and FDEs. During `--gc-sections`, the conceptual reference relationship
is the reverse of the actual relocation: an FDE has a relocation referencing
the text section; during GC, if the referenced text section is discarded, the FDE
that references it should also be discarded.
LLD has some special handling for `.eh_frame`:
* `-M` requires special code
* `--gc-sections` occurs before `.eh_frame` deduplication/GC. The personality
in a CIE is a valid reference. However, `initial_location` in FDE should be
ignored. Moreover, a LSDA reference in a FDE in a section group should be
ignored.
* In a relocatable link, a relocation from `.eh_frame` to a `STT_SECTION`
symbol in a discarded section (due to COMDAT group rule) should be allowed
(normally such a `STB_LOCAL` relocation from outside of the group is
disallowed).
## Compact unwind descriptors
On macOS, Apple designed the compact unwind descriptors mechanism to accelerate
unwinding. In theory, this technique can be used to save some space in
`__eh_frame`, but it has not been implemented. The main idea is:
* The FDEs of most functions follow a fixed pattern (define the CFA in the
prologue, store callee-saved registers), so the FDE instructions can be
compressed to 32 bits.
* The personality/LSDA described by CIE/FDE augmentation data is very common and
can be extracted as a fixed field.
Only 64-bit will be discussed below. A descriptor occupies 32 bytes:
```
.quad _foo
.set L1, Lfoo_end-_foo
.long L1
.long compact_unwind_description
.quad personality
.quad lsda_address
```
If you study `.eh_frame_hdr` (a binary search index table) and `.ARM.exidx`, you
will see that the length field is redundant.
The compact unwind descriptor is encoded as:
```c
uint32_t : 24; // vary with different modes
uint32_t mode : 4;
uint32_t flags : 4;
```
Five modes are defined:
* 0: reserved
* 1: FP-based frame: RBP is frame pointer, frame size is variable
* 2: SP-based frame: frame pointer is not used, frame size is fixed during
compilation
* 3: large SP-based frame: frame pointer is not used, the frame size is fixed
at compile time but the value is large and cannot be represented by mode 2
* 4: DWARF CFI escape
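Splitting an encoding into the three fields can be sketched like this. The helper names are hypothetical; the bit positions follow the bit-field layout shown above, read from the least significant bit upward.

```c
#include <stdint.h>

// Bits 0-23: mode-specific data. Bits 24-27: mode. Bits 28-31: flags.
uint32_t compact_unwind_mode_data(uint32_t encoding) {
  return encoding & 0x00ffffffu;
}

unsigned compact_unwind_mode(uint32_t encoding) {
  return (encoding >> 24) & 0xf;
}

unsigned compact_unwind_flags(uint32_t encoding) {
  return encoding >> 28;
}
```

An unwinder would typically switch on `compact_unwind_mode(encoding)` and hand the low 24 bits to the decoder for that mode.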
### FP-based frame (`UNWIND_MODE_BP_FRAME`)
The compact unwind encoding is:
```c
uint32_t regs : 15;
uint32_t : 1; // 0
uint32_t stack_adjust : 8;
uint32_t mode : 4;
uint32_t flags : 4;
```
The callee-saved registers on x86-64 are RBX, R12, R13, R14, R15 and RBP. 3 bits
can encode a register, so 15 bits are enough to describe the 5 registers other
than RBP (whether each is saved and where). `stack_adjust` records the extra
stack space beyond the saved registers.
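A decoder for this layout might look like the sketch below. The function name is invented, and the register numbering (0 for "no register", then RBX, R12, R13, R14, R15) is my recollection of LLVM's `compact_unwind_encoding.h`, so treat it as an assumption rather than a reference.

```c
#include <stdint.h>

// Assumed register numbers for the 3-bit slots.
enum { CU_REG_NONE = 0, CU_REG_RBX, CU_REG_R12, CU_REG_R13, CU_REG_R14, CU_REG_R15 };

// Fill regs[0..4] with the five 3-bit register slots (lowest bits first)
// and return the raw 8-bit stack_adjust field.
unsigned decode_bp_frame(uint32_t encoding, unsigned regs[5]) {
  for (int i = 0; i < 5; i++)
    regs[i] = (encoding >> (3 * i)) & 0x7;
  return (encoding >> 16) & 0xff;
}
```

A slot holding `CU_REG_NONE` means that no register was saved in that position.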
### SP-based frame (`UNWIND_MODE_STACK_IMMD`)
The compact unwind encoding is:
```c
uint32_t reg_permutation : 10;
uint32_t cnt : 3;
uint32_t : 3;
uint32_t size : 8;
uint32_t mode : 4;
uint32_t flags : 4;
```
`cnt` is the number of saved registers (maximum 6). `reg_permutation`
encodes which registers are saved and in what order. `size*8` is the
stack frame size.
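Extracting these fields is plain shifting and masking. The sketch below uses hypothetical names and leaves `reg_permutation` undecoded, since turning it back into a register list needs a separate permutation-decoding step.

```c
#include <stdint.h>

struct sp_frame {
  unsigned reg_permutation;  // 10-bit encoded ordering of the saved registers
  unsigned cnt;              // number of saved registers
  unsigned frame_size;       // stack frame size in bytes (size * 8)
};

struct sp_frame decode_sp_frame(uint32_t encoding) {
  struct sp_frame f;
  f.reg_permutation = encoding & 0x3ff;
  f.cnt = (encoding >> 10) & 0x7;
  f.frame_size = ((encoding >> 16) & 0xff) * 8;
  return f;
}
```

Because `size` is 8 bits wide, the largest frame this mode can describe is 255*8 = 2040 bytes, which is why a separate mode for large frames exists.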
### Large SP-based frame (`UNWIND_MODE_STACK_IND`)
The compact unwind encoding is:
```c
uint32_t reg_permutation : 10;
uint32_t cnt : 3;
uint32_t adj : 3;
uint32_t size_offset : 8;
uint32_t mode : 4;
uint32_t flags : 4;
```
Similar to SP-based frame. In particular: the stack frame size is read from the
text section. The RSP adjustment is usually represented by `subq imm, %rsp`, and
`size_offset` is used to represent the distance from the instruction to the
beginning of the function. The actual stack size also includes `adj*8`.
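Reading the frame size back out of the text section can be sketched as follows. The function name is made up, and the code assumes the immediate is a little-endian 32-bit value located `size_offset` bytes into the function, as described above.

```c
#include <stdint.h>

// `function_start` points at the first byte of the function's machine code.
uint32_t large_frame_size(const uint8_t *function_start, uint32_t encoding) {
  unsigned adj = (encoding >> 13) & 0x7;
  unsigned size_offset = (encoding >> 16) & 0xff;
  const uint8_t *p = function_start + size_offset;
  // Little-endian 32-bit immediate embedded in the instruction stream.
  uint32_t imm = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
  return imm + adj * 8;
}
```

For a function that begins with `subq $0x1000, %rsp` (bytes `48 81 ec 00 10 00 00`), the immediate starts at offset 3, so `size_offset` would be 3.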
### DWARF CFI escape
If a frame cannot be expressed by a compact unwind descriptor for whatever
reason, it must fall back to DWARF CFI.
In the LLVM implementation, each function is represented by only one compact
unwind descriptor. If asynchronous stack unwinding occurs in an epilogue, existing
implementations cannot distinguish it from stack unwinding in the function body.
The Canonical Frame Address will be calculated incorrectly, and the caller's
saved registers will be read incorrectly. If it happens in a prologue, and the
prologue has instructions other than register pushes and `subq imm, %rsp`, an
error will occur. In addition, if shrink wrapping is enabled for a function, the
prologue may not be at the beginning of the function, and asynchronous stack
unwinding between the function entry and the prologue also fails. It seems that
most people don't care about this issue, perhaps because it only costs a
profiler a few percentage points of the profile.
In fact, if you use multiple descriptors to describe each area of a function,
you can still unwind accurately. OpenVMS proposed [\[RFC\] Improving compact
x86-64 compact unwind descriptors](http://lists.llvm.org/pipermail/llvm-dev/2018-January/120741.html)
in 2018, but unfortunately there is no relevant implementation.
## ARM exception handling
The data is divided into `.ARM.exidx` and `.ARM.extab`.
`.ARM.exidx` is a binary search index table composed of 2-word pairs. The
first word is a 31-bit PC-relative offset to the start of the region. The second
word is easier to describe with code:
```c
if (indexData == EXIDX_CANTUNWIND)
return false; // like an absent .eh_frame entry. In the case of C++ exceptions, std::terminate
if (indexData & 0x80000000) {
extabAddr = &indexData;
extabData = indexData; // inline
} else {
extabAddr = &indexData + signExtendPrel31(indexData);
extabData = read32(&indexData + signExtendPrel31(indexData)); // stored in .ARM.extab
}
```
`extabData & 0x80000000` means a compact model entry; otherwise it is a generic
model entry.
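The helpers used in the pseudocode above can be sketched as below. `sign_extend_prel31` and `classify_exidx` are illustrative names, not the identifiers of a particular unwinder; `EXIDX_CANTUNWIND` is assumed to be the value 1, as in the ARM EHABI.

```c
#include <stdint.h>

#define EXIDX_CANTUNWIND 1u

// Sign-extend a 31-bit PC-relative offset; bit 30 is the sign bit and
// bit 31 is ignored.
int32_t sign_extend_prel31(uint32_t word) {
  uint32_t v = word & 0x7fffffffu;
  return (int32_t)(v ^ 0x40000000u) - 0x40000000;
}

enum exidx_kind { KIND_CANTUNWIND, KIND_INLINE, KIND_EXTAB };

// Classify the second word of an .ARM.exidx pair.
enum exidx_kind classify_exidx(uint32_t second_word) {
  if (second_word == EXIDX_CANTUNWIND)
    return KIND_CANTUNWIND;
  if (second_word & 0x80000000u)
    return KIND_INLINE;  // unwind data stored inline in the index entry
  return KIND_EXTAB;     // prel31 offset to an entry in .ARM.extab
}
```

The XOR-and-subtract form avoids relying on implementation-defined conversion of large unsigned values to `int32_t`.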
`.ARM.exidx` is equivalent to an enhanced `.eh_frame_hdr`; the compact model is
equivalent to inlining the personality and LSDA in `.eh_frame`. Consider the
following three situations:
* If no C++ exception can be thrown and no function that may throw is called:
no personality is needed, only one `EXIDX_CANTUNWIND` entry is needed, and
there is no `.ARM.extab`
* If a C++ exception may be thrown but no landing pad is required: the
personality is `__aeabi_unwind_cpp_pr0`, only a compact model entry is needed,
and there is no `.ARM.extab`
* If there is a catch: `__gxx_personality_v0` is required, and `.ARM.extab` is
required
`.ARM.extab` is equivalent to the combined `.eh_frame` and `.gcc_except_table`.
### Generic model
```c
uint32_t personality; // bit 31 is 0
uint32_t : 24;
uint32_t num : 8;
uint32_t opcodes[]; // opcodes, variable length
uint8_t lsda[]; // variable length
```
Under construction.
## Windows ARM64 exception handling
See https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling for
details. This is my favorite encoding scheme. It supports unwinding in the
middle of a prolog or epilog, and it supports function fragments (used to
represent unconventional stack frames such as shrink wrapping).
The unwind information is saved in two sections, `.pdata` and `.xdata`.
```c
uint32_t function_start_rva;
uint32_t Flag : 2;
uint32_t Data : 30;
```
For canonical form functions, Packed Unwind Data is used and no `.xdata` record
is required; descriptors that cannot be represented by Packed Unwind Data
are stored in `.xdata`.
### Packed Unwind Data
```c
uint32_t FunctionStartRVA;
uint32_t Flag : 2;
uint32_t FunctionLength : 11;
uint32_t RegF : 3;
uint32_t RegI : 4;
uint32_t H : 1;
uint32_t CR : 2;
uint32_t FrameSize : 9;
```
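The second word of such a `.pdata` record can be unpacked as in the sketch below. The struct and function names are made up. The scaling of `FunctionLength` (units of 4 bytes) and `FrameSize` (units of 16 bytes) is how I read the Microsoft documentation linked above, so verify it there before relying on it.

```c
#include <stdint.h>

struct packed_unwind {
  unsigned flag;
  unsigned function_length;  // in bytes (field value * 4, assumed)
  unsigned reg_f;
  unsigned reg_i;
  unsigned h;
  unsigned cr;
  unsigned frame_size;       // in bytes (field value * 16, assumed)
};

// Decode the 32-bit word that follows FunctionStartRVA.
struct packed_unwind decode_packed_unwind(uint32_t word) {
  struct packed_unwind p;
  p.flag = word & 0x3;
  p.function_length = ((word >> 2) & 0x7ff) * 4;
  p.reg_f = (word >> 13) & 0x7;
  p.reg_i = (word >> 16) & 0xf;
  p.h = (word >> 20) & 0x1;
  p.cr = (word >> 21) & 0x3;
  p.frame_size = ((word >> 23) & 0x1ff) * 16;
  return p;
}
```

The field widths add up to exactly 32 bits (2 + 11 + 3 + 4 + 1 + 2 + 9), so a canonical function needs only 8 bytes of unwind data in total.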
## MIPS compact exception tables
Under construction.
## Linux kernel ORC unwind tables
For x86-64, the Linux kernel uses its own unwind tables: ORC. You can find its
documentation on https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html
and there is an lwn.net introduction [The ORCs are coming](https://lwn.net/Articles/728339/).
`objtool orc generate a.o` parses `.eh_frame` and generates `.orc_unwind` and
`.orc_unwind_ip`. For an object file assembled from:
```
.globl foo
.type foo, @function
foo:
ret
```
At two addresses the unwind information changes: the start of foo and the end
of foo, so 2 ORC entries will be produced. If the DWARF CFA changes (e.g. due
to push/pop) in the middle of the function, there may be more entries.
`.orc_unwind_ip` contains two entries, representing the PC-relative addresses.
```
Relocation section '.rela.orc_unwind_ip' at offset 0x2028 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000500000002 R_X86_64_PC32 0000000000000000 .text + 0
0000000000000004 0000000500000002 R_X86_64_PC32 0000000000000000 .text + 1
```
`.orc_unwind` contains two entries of type `orc_entry`. The entries encode how
IP/SP/BP of the previous frame are stored.
```c
struct orc_entry {
s16 sp_offset; // sp_offset and sp_reg encode where SP of the previous frame is stored
s16 bp_offset; // bp_offset and bp_reg encode where BP of the previous frame is stored
unsigned sp_reg:4;
unsigned bp_reg:4;
unsigned type:2; // how IP of the previous frame is stored
unsigned end:1;
} __attribute__((__packed__));
```
You may find similarities between this scheme and `UNWIND_MODE_BP_FRAME` and
`UNWIND_MODE_STACK_IMMD` in Apple's compact unwind descriptors. The ORC scheme
uses 16-bit integers so presumably `UNWIND_MODE_STACK_IND` will not be needed.
During unwinding, most callee-saved registers other than BP are unneeded, so
ORC does not bother recording them.
The linker will resolve relocations in `.orc_unwind_ip` and create
`__start_orc_unwind_ip`/`__stop_orc_unwind_ip`/`__start_orc_unwind`/
`__stop_orc_unwind` to delimit the section contents. Then, a host utility
scripts/sorttable sorts the contents of `.orc_unwind_ip` and `.orc_unwind`. To
unwind a stack frame, `unwind_next_frame`:
* performs a binary search into the `.orc_unwind_ip` table to figure out the
relevant ORC entry
* retrieves the previous SP with the current SP, `orc->sp_reg` and
`orc->sp_offset`.
* retrieves the previous IP with `orc->type` and other values.
* retrieves the previous BP with the current BP, the previous SP, `orc->bp_reg`
and `orc->bp_offset`. `orc->bp_reg` can be
`ORC_REG_UNDEFINED`/`ORC_REG_PREV_SP`/`ORC_REG_BP`.
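The steps above can be modelled with a small sketch for an ordinary call frame. This is not the kernel's code: the enum values, struct names and the simulated stack are all invented for the example, and only the case where the return address sits just below the previous SP is handled.

```c
#include <stdint.h>

// Illustrative register selectors; the kernel's ORC_REG_* values differ.
enum { SK_REG_UNDEFINED, SK_REG_PREV_SP, SK_REG_BP, SK_REG_SP };

struct orc_sketch {
  int16_t sp_offset;
  int16_t bp_offset;
  unsigned sp_reg;
  unsigned bp_reg;
};

struct frame {
  uint64_t ip, sp, bp;
};

// Read a 64-bit word from simulated stack memory; `base` is the address
// that corresponds to stack[0].
static uint64_t load(const uint64_t *stack, uint64_t base, uint64_t addr) {
  return stack[(addr - base) / 8];
}

struct frame orc_step(struct frame cur, struct orc_sketch orc,
                      const uint64_t *stack, uint64_t base) {
  struct frame prev;
  // Previous SP = chosen base register + sp_offset.
  uint64_t sp_base = orc.sp_reg == SK_REG_BP ? cur.bp : cur.sp;
  prev.sp = sp_base + orc.sp_offset;
  // For a call frame, the return address is stored just below the previous SP.
  prev.ip = load(stack, base, prev.sp - 8);
  // Previous BP: loaded relative to the previous SP or the current BP,
  // or left unchanged when undefined.
  if (orc.bp_reg == SK_REG_PREV_SP)
    prev.bp = load(stack, base, prev.sp + orc.bp_offset);
  else if (orc.bp_reg == SK_REG_BP)
    prev.bp = load(stack, base, cur.bp + orc.bp_offset);
  else
    prev.bp = cur.bp;
  return prev;
}
```

For a function interrupted right after `push %rbp`, an entry with `sp_offset = 16`, `bp_reg = SK_REG_PREV_SP` and `bp_offset = -16` recovers the caller's SP, return address and BP from two stack slots.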

558
maskray-2.md Normal file
View File

@ -0,0 +1,558 @@
# All about symbol versioning
In 1995, Solaris' link editor and ld.so introduced the symbol versioning
mechanism. Ulrich Drepper and Eric Youngdale borrowed Solaris symbol versioning
in 1997 and designed the GNU style symbol versioning for glibc.
When a shared object is updated and the behavior of a symbol changes (an ABI
change such as changing the type of parameters or return values, or a behavior
change), traditionally a `DT_SONAME` bump is required. Otherwise a dependent
application/shared object built with the old version may run abnormally. This
can be inconvenient if the number of dependent applications is large.
Symbol versioning provides backward compatibility without changing `DT_SONAME`.
The following part describes the representation, and then describes the
behaviors from the perspectives of assembler, linker, and ld.so. One may wish
to skip the representation part when reading for the first time.
## Representation
In a shared object or executable file that uses symbol versioning, there are up
to three sections related to symbol versioning. `.gnu.version_r` and
`.gnu.version_d` among them are optional:
* `.gnu.version` (version symbol section). The `DT_VERSYM` tag in the dynamic
table points to the section. Assuming there are N entries in `.dynsym`,
`.gnu.version` contains N `uint16_t` values, with the i-th entry indicating
the version ID of the i-th symbol. Put it another way, `.gnu.version` is a
parallel table to `.dynsym`.
* `.gnu.version_r` (version requirement section). The `DT_VERNEED`/
`DT_VERNEEDNUM` tags in the dynamic table delimit this section. This
section describes the version information used by the undefined versioned
symbols in the module.
* `.gnu.version_d` (version definition section). The `DT_VERDEF`/`DT_VERDEFNUM`
tags in the dynamic table delimit this section. This section describes the
version information used by the defined versioned symbols in the module.
```c
// Version definitions
typedef struct {
Elf64_Half vd_version; // version: 1
Elf64_Half vd_flags; // VER_FLG_BASE (index 1) or 0 (index != 1)
Elf64_Half vd_ndx; // version index
Elf64_Half vd_cnt; // number of associated aux entries, always 1 in practice
Elf64_Word vd_hash; // SysV hash of the version name
Elf64_Word vd_aux; // offset in bytes to the verdaux array
Elf64_Word vd_next; // offset in bytes to the next verdef entry
} Elf64_Verdef;
typedef struct {
Elf64_Word vda_name; // version name
Elf64_Word vda_next; // offset in bytes to the next verdaux entry
} Elf64_Verdaux;
// Version needs
typedef struct {
Elf64_Half vn_version; // version: 1
Elf64_Half vn_cnt; // number of associated aux entries
Elf64_Word vn_file; // .dynstr offset of the depended filename
Elf64_Word vn_aux; // offset in bytes to vernaux array
Elf64_Word vn_next; // offset in bytes to next verneed entry
} Elf64_Verneed;
typedef struct {
Elf64_Word vna_hash; // SysV hash of vna_name
Elf64_Half vna_flags; // usually 0; copied from vd_flags of the depended so
Elf64_Half vna_other; // unused
Elf64_Word vna_name; // .dynstr offset of the version name
Elf64_Word vna_next; // offset in bytes to next vernaux entry
} Elf64_Vernaux;
```
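The `vd_hash` and `vna_hash` fields hold the SysV ELF hash of the version name. A straightforward implementation of that hash is sketched below; the function name `sysv_hash` is my own.

```c
#include <stdint.h>

// Classic SysV ELF hash, as used for vd_hash and vna_hash.
uint32_t sysv_hash(const char *name) {
  uint32_t h = 0;
  for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
    h = (h << 4) + *p;
    uint32_t g = h & 0xf0000000u;
    if (g)
      h ^= g >> 24;
    h &= ~g;  // the top four bits are always cleared
  }
  return h;
}
```

For example, `sysv_hash("GLIBC_2.2.5")` evaluates to `0x09691a75`.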
Currently GNU ld does not set the `VER_FLG_WEAK` flag. [BZ24718#c15](https://sourceware.org/bugzilla/show_bug.cgi?id=24718#c15) proposed "set
`VER_FLG_WEAK` on version reference if all symbols are weak".
The advantage of using a parallel table for `.gnu.version` is that symbol
versioning is optional. ld.so implementations which do not support symbol
versioning can freely assume that no symbol has a version. The effect is that
all references bind as if to the default version definitions. musl ld.so falls
into this category.
### Version index values
Index 0 is called `VER_NDX_LOCAL`. The binding of the symbol will be changed to
`STB_LOCAL`. Index 1 is called `VER_NDX_GLOBAL`. It has no special effect and
is used for unversioned symbols. Index 2 to 0xffef are used for user defined
versions.
Defined versioned symbols have two forms:
* foo@@v2, the default version.
* foo@v2, a non-default version (hidden version). The `VERSYM_HIDDEN` bit of the
version ID is set.
Undefined versioned symbols have only the `foo@v2` form.
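Interpreting one `.gnu.version` entry can be sketched as follows. The constants are defined locally for the example (`VERSYM_VERSION` is the name binutils uses for the index mask), and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VERSYM_HIDDEN 0x8000u   // set for a non-default (foo@v) definition
#define VERSYM_VERSION 0x7fffu  // mask for the version index
#define VER_NDX_LOCAL 0
#define VER_NDX_GLOBAL 1

// Version index of a .gnu.version entry, without the hidden bit.
unsigned version_index(uint16_t versym) {
  return versym & VERSYM_VERSION;
}

// True for a non-default version such as foo@v2.
bool is_non_default(uint16_t versym) {
  return (versym & VERSYM_HIDDEN) != 0;
}

// True if the entry refers to a user defined version (index 2 and above).
bool is_user_version(uint16_t versym) {
  return version_index(versym) >= 2;
}
```

With these helpers, an entry of `0x8002` describes a non-default definition with version index 2, while `0x0002` describes the default one.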
Usually versioned symbols are only defined in shared objects, but executables
can have defined versioned symbols as well. (When a shared object is updated,
the old symbols are retained so that other shared objects do not need to be
relinked, and executable files usually do not provide versioned symbols for
other shared objects to reference.)
### Example
`readelf -V` can dump the symbol versioning tables.
In the `.gnu.version_d` output below:
* Version index 1 (`VER_NDX_GLOBAL`) is the filename (soname if shared object).
The `VER_FLG_BASE` flag is set.
* Version index 2 is a user defined version. Its name is `LUA_5.3`.
In the `.gnu.version_r` output below, each of version indexes 3~10 represents a
version in a depended shared object. The name `GLIBC_2.2.5` appears thrice,
each for a different shared object.
The `.gnu.version` table assigns a version index to each `.dynsym` entry.
```
% readelf -V /usr/bin/lua5.3
Version symbols section '.gnu.version' contains 248 entries:
Addr: 0x0000000000002af4 Offset: 0x002af4 Link: 5 (.dynsym)
000: 0 (*local*) 3 (GLIBC_2.3) 4 (GLIBC_2.2.5) 4 (GLIBC_2.2.5)
004: 5 (GLIBC_2.3.4) 4 (GLIBC_2.2.5) 4 (GLIBC_2.2.5) 4 (GLIBC_2.2.5)
...
Version definition section '.gnu.version_d' contains 2 entries:
Addr: 0x0000000000002ce8 Offset: 0x002ce8 Link: 6 (.dynstr)
000000: Rev: 1 Flags: BASE Index: 1 Cnt: 1 Name: lua5.3
0x001c: Rev: 1 Flags: none Index: 2 Cnt: 1 Name: LUA_5.3
Version needs section '.gnu.version_r' contains 3 entries:
Addr: 0x0000000000002d20 Offset: 0x002d20 Link: 6 (.dynstr)
000000: Version: 1 File: libdl.so.2 Cnt: 1
0x0010: Name: GLIBC_2.2.5 Flags: none Version: 9
0x0020: Version: 1 File: libm.so.6 Cnt: 1
0x0030: Name: GLIBC_2.2.5 Flags: none Version: 6
0x0040: Version: 1 File: libc.so.6 Cnt: 6
0x0050: Name: GLIBC_2.11 Flags: none Version: 10
0x0060: Name: GLIBC_2.14 Flags: none Version: 8
0x0070: Name: GLIBC_2.4 Flags: none Version: 7
0x0080: Name: GLIBC_2.3.4 Flags: none Version: 5
0x0090: Name: GLIBC_2.2.5 Flags: none Version: 4
0x00a0: Name: GLIBC_2.3 Flags: none Version: 3
```
### Symbol versioning in object files
The GNU scheme allows `.symver` directives to label the versions of the symbols
in object files. The symbol names residing in .o contain `@` or `@@`.
## Assembler behavior
GNU as and the LLVM integrated assembler both implement `.symver`.
* `.symver foo, foo@v1`
  * If foo is undefined, produce `foo@v1`
  * If foo is defined, produce `foo` and `foo@v1` with the same binding
    (`STB_LOCAL`, `STB_WEAK`, or `STB_GLOBAL`) and `st_other` value (i.e. the
    same visibility). Personally I think this behavior is a design flaw.
    The proposed [V4 PATCH gas: Extend .symver directive](https://sourceware.org/pipermail/binutils/2020-April/110622.html)
    can address this problem.
* `.symver foo, foo@@v1`
  * If foo is undefined, error
  * If foo is defined, produce `foo` and `foo@@v1` with the same binding and `st_other` value.
* `.symver foo, foo@@@v1`
  * If foo is undefined, produce `foo@v1`
  * If foo is defined, produce `foo@@v1`
Personal recommendation:
* To define a default version symbol: use `.symver foo, foo@@@v2` so that `foo`
is not present.
* To define a non-default version symbol, add a suffix to the original symbol
name (`.symver foo_v1, foo@v1`) to prevent conflicts with `foo`. This will
however leave a (usually undesirable) `foo_v1`. If you don't strip `foo_v1` from
the object file, you may localize it with a `local:` pattern in the version
script. With GNU as 2.35 ([PR25295](https://sourceware.org/bugzilla/show_bug.cgi?id=25295)),
you can use `.symver foo_v1, foo@v1, remove`.
* The version of an undefined symbol is usually bound at link time. It is
usually unnecessary to set the version with `.symver`. If required, prefer
`.symver foo, foo@@@v1` to `.symver foo, foo@v1`.
## Linker behavior
The linker enters the symbol resolution stage after reading in object files,
archive files, shared objects, LTO files, linker scripts, etc.
GNU ld uses indirect symbols to represent versioned symbols. There are
complicated rules, and these rules are not documented. The symbol resolution
rules that I personally derived are:
* Defined `foo` resolves undefined `foo` (traditional unversioned rule)
* Defined `foo@v1` resolves undefined `foo@v1` (a non-default version symbol is
like a separate symbol)
* Defined `foo@@v1` (default version) resolves both undefined `foo` and `foo@v1`
If there are multiple default version definitions (such as `foo@@v1 foo@@v2`),
a duplicate definition error should be issued even if one is weak. Usually a
symbol has zero or one default version (`@@`) definition, and an arbitrary
number of non-default version (`@`) definitions.
If the linker sees undefined `foo` and `foo@v1` first, it will treat them as
two symbols. When the linker sees the definition `foo@@v1`, conceptually `foo`
and `foo@@v1` should be combined. If the linker sees `foo@@v2` instead,
`foo@@v2` should resolve `foo` and `foo@v1` should be a separate symbol.
* [Combining Versions](combining-versions.md) describes the problem.
* `gold/symtab.cc Symbol_table::define_default_version` uses a heuristic rule
to solve this problem. It special cases on visibility, but I feel that this
rule is unneeded.
* Before 2.36, GNU ld reported a bogus multiple definition error for defined
weak `foo@@v1` and defined global `foo@v1`: [PR ld/26978](https://sourceware.org/bugzilla/show_bug.cgi?id=26978)
* Before 2.36, GNU ld had a bug where the visibility of undefined `foo@v1` did
not affect the output visibility of `foo@@v1`: [PR ld/26979](https://sourceware.org/bugzilla/show_bug.cgi?id=26979)
* I fixed the object file side of the problem for LLD 12.0 in https://reviews.llvm.org/D92259 .
Archive files and lazy object files may still have incompatibility issues.
When LLD sees a defined `foo@@v1`, it adds both `foo` and `foo@v1` into the
symbol table, thus `foo@@v1` can resolve both undefined `foo` and `foo@v1`.
After processing all input files, a pass iterates over symbols and redirects
`foo@v1` to `foo@@v1`. Because LLD treats them as separate symbols during input
processing, a defined `foo@v1` cannot suppress the extraction of an archive
member defining `foo@@v1`, leading to a behavior incompatible with GNU ld. This
probably does not matter, though.
GNU ld has another strange behavior: if both `foo` and `foo@v1` are defined, `foo`
will be removed. I strongly believe it is an issue in GNU ld but the maintainer
rejected [PR ld/27210](https://sourceware.org/bugzilla/show_bug.cgi?id=27210).
## Version script
To define a versioned symbol in a shared object or an executable, a version
script must be specified. If all versioned symbols are undefined, then the
version script can be omitted.
```
# Make all symbols other than foo and bar local.
{ global: foo; bar; local: *; };
# Assign version FBSD_1.0 to malloc and version FBSD_1.3 to mallocx,
# and make internal local.
FBSD_1.0 { malloc; local: internal; };
FBSD_1.3 { mallocx; };
```
A version script has three purposes:
* Define versions.
* Specify some patterns so that matched defined symbols (which do not have `@`
in the name) are tied to the specified version.
* Scope reduction: for a defined unversioned symbol matched by a `local:`
pattern, its binding will be changed to `STB_LOCAL` and will not be exported
to the dynamic symbol table.
A version script can consist of one anonymous version tag (`{...};`) or a list of
named version tags (`v1 {...};`). If you use an anonymous version tag with other
version tags, GNU ld will error: `anonymous version tag cannot be combined with
other version tags`. A `local:` part can be placed in any version tag. Which
version tag is used does not matter.
If a defined symbol is matched by multiple version tags, the following
precedence rules apply (`binutils-gdb/bfd/linker.c:find_version_for_sym`):
* The first version tag with an exact pattern (i.e. there is no wildcard) wins.
* Otherwise, the last version tag with a non-`*` wildcard pattern wins.
* Otherwise, the first version tag with a `*` pattern wins.
The gotcha is that `**` is a wildcard pattern which matches any symbol but its
precedence is higher than `*`.
Most patterns are exact so gold and LLD iterate patterns instead of symbols to
improve performance.
## How a versioned symbol is produced
An undefined symbol can be assigned a version if:
* its name does not contain `@` (`.symver` is unused) and a shared object
provides a default version definition.
* its name contains `@` and a shared object defines the symbol. GNU ld errors
if there is no such a shared object. After https://reviews.llvm.org/D92260,
LLD will report an error as well.
A defined symbol can be assigned a version if:
* its name does not contain `@` and it is matched by a pattern in a named version tag in a version script.
* its name contains `@`
  * If `-shared`, the version should be defined by a version script, otherwise
    GNU ld errors with `version node not found for symbol`. This exception looks
    strange to me so I have filed [PR ld/26980](https://sourceware.org/bugzilla/show_bug.cgi?id=26980).
  * If `-no-pie` or `-pie`, a version definition is unneeded in GNU ld. This
    behavior is strange.
## ld.so behavior
/Linux Standard Base Core Specification, Generic Part/ describes the behavior
of ld.so. Kan added symbol versioning support to FreeBSD rtld in 2005.
The `DT_VERNEED` and `DT_VERNEEDNUM` tags in the dynamic table delimit the
version requirements of a shared object/executable file: the required versions
and the required shared object names (`Verneed::vn_file`).
For each Vernaux entry (a Verneed's auxiliary entry) without the
`VER_FLG_WEAK` bit, ld.so checks whether the referenced shared object has the
`DT_VERDEF` table. If no, ld.so handles the case as a graceful degradation; if
yes and the table does not define the version, ld.so reports an error.
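The check for one Vernaux entry can be summarized as a small decision function. This is a sketch of the rule as stated above, not glibc's code: `vernaux_check` and `check_vernaux` are hypothetical names, and the two booleans stand in for the real lookup in the dependency's `DT_VERDEF` table.

```c
#include <stdbool.h>

#ifndef VER_FLG_WEAK
#define VER_FLG_WEAK 0x2 /* same value as in <elf.h> */
#endif

enum verneed_result { VERNEED_OK, VERNEED_DEGRADED, VERNEED_ERROR };

/* Hypothetical summary of what ld.so knows about one Vernaux entry
   and the shared object it refers to. */
struct vernaux_check {
  unsigned flags;           /* Vernaux::vna_flags */
  bool dep_has_verdef;      /* does the dependency have DT_VERDEF? */
  bool dep_defines_version; /* does that table define the version? */
};

enum verneed_result check_vernaux(const struct vernaux_check *c) {
  if (c->flags & VER_FLG_WEAK)
    return VERNEED_OK; /* weak requirements are not checked */
  if (!c->dep_has_verdef)
    return VERNEED_DEGRADED; /* unversioned library: graceful degradation */
  return c->dep_defines_version ? VERNEED_OK : VERNEED_ERROR;
}
```

A library that has a `DT_VERDEF` table but lacks the required version is the only case that yields `VERNEED_ERROR`.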
[verneed-check]
Usually a minor release does not bump soname. Suppose that libB.so depends on
libA 1.3 (soname is libA.so.1) and calls a function which does not exist
in libA 1.2. If PLT lazy binding is used, libB.so may seem to work on a system
with libA 1.2, until the PLT of the 1.3 symbol is called. If symbol versioning
is not used and you want to solve this problem, you have to record the minor
version number (`libA.so.1.3`) in the soname. However, bumping soname is
all-or-nothing: all the dependent shared objects need to be relinked. If symbol
versioning is used, you can continue to use the soname `libA.so.1`. ld.so will
report an error if libA 1.2 is used, because the 1.3 version required by
libB.so does not exist.
In the symbol resolution stage:
* An undefined `foo` can be resolved to a definition of `foo` or `foo@@v2` (only
the definitions with index number 1 (`VER_NDX_GLOBAL`) and 2 are used in the
reference match).
* An undefined `foo@v1` can be resolved to a definition of `foo`, `foo@v1`, or
`foo@@v1`.
Note that (undefined `foo` resolving to `foo@v1`) is allowed by ld.so but not
allowed by the linker [{reject-non-default}](). This difference provides a
mechanism to refuse linking against old symbols while keeping compatibility
with unversioned old libraries. If a new version of a shared object needs to
deprecate an unversioned `bar`, you can remove `bar` and define `bar@compat`
instead. Libraries using `bar` are unaffected but new links against `bar` are
disallowed.
## Upgraded symbols in glibc
Note that GNU nm before binutils 2.35 does not display `@` or `@@`.
```
nm -D /lib/x86_64-linux-gnu/libc.so.6 | \
awk '$2!="U" {i=index($3,"@"); if(i){v=substr($3,i); $3=substr($3,1,i-1); m[$3]=m[$3]" "v}} \
END {for(f in m)if(m[f]~/@.+@/)print f, m[f]}'
```
The output on my x86-64 system:
```
pthread_cond_broadcast @GLIBC_2.2.5 @@GLIBC_2.3.2
clock_nanosleep @@GLIBC_2.17 @GLIBC_2.2.5
_sys_siglist @@GLIBC_2.3.3 @GLIBC_2.2.5
sys_errlist @@GLIBC_2.12 @GLIBC_2.2.5 @GLIBC_2.3 @GLIBC_2.4
quick_exit @GLIBC_2.10 @@GLIBC_2.24
memcpy @@GLIBC_2.14 @GLIBC_2.2.5
regexec @GLIBC_2.2.5 @@GLIBC_2.3.4
pthread_cond_destroy @GLIBC_2.2.5 @@GLIBC_2.3.2
nftw @GLIBC_2.2.5 @@GLIBC_2.3.3
pthread_cond_timedwait @@GLIBC_2.3.2 @GLIBC_2.2.5
clock_getres @GLIBC_2.2.5 @@GLIBC_2.17
pthread_cond_signal @@GLIBC_2.3.2 @GLIBC_2.2.5
fmemopen @GLIBC_2.2.5 @@GLIBC_2.22
pthread_cond_init @GLIBC_2.2.5 @@GLIBC_2.3.2
clock_gettime @GLIBC_2.2.5 @@GLIBC_2.17
sched_setaffinity @GLIBC_2.3.3 @@GLIBC_2.3.4
glob @@GLIBC_2.27 @GLIBC_2.2.5
sys_nerr @GLIBC_2.2.5 @GLIBC_2.4 @@GLIBC_2.12 @GLIBC_2.3
_sys_errlist @GLIBC_2.3 @GLIBC_2.4 @@GLIBC_2.12 @GLIBC_2.2.5
sys_siglist @GLIBC_2.2.5 @@GLIBC_2.3.3
clock_getcpuclockid @GLIBC_2.2.5 @@GLIBC_2.17
realpath @GLIBC_2.2.5 @@GLIBC_2.3
sys_sigabbrev @GLIBC_2.2.5 @@GLIBC_2.3.3
posix_spawnp @@GLIBC_2.15 @GLIBC_2.2.5
posix_spawn @@GLIBC_2.15 @GLIBC_2.2.5
_sys_nerr @@GLIBC_2.12 @GLIBC_2.4 @GLIBC_2.3 @GLIBC_2.2.5
nftw64 @GLIBC_2.2.5 @@GLIBC_2.3.3
pthread_cond_wait @GLIBC_2.2.5 @@GLIBC_2.3.2
sched_getaffinity @GLIBC_2.3.3 @@GLIBC_2.3.4
clock_settime @GLIBC_2.2.5 @@GLIBC_2.17
glob64 @@GLIBC_2.27 @GLIBC_2.2.5
```
* `realpath@@GLIBC_2.3`: the previous version returns `EINVAL` when the second
parameter is NULL
* `memcpy@@GLIBC_2.14` [BZ12518](https://sourceware.org/bugzilla/show_bug.cgi?id=12518):
the previous version guarantees a forward copying behavior. Shockwave Flash
at that time had a "memcpy downward" bug which required the workaround.
* `quick_exit@@GLIBC_2.24` [BZ20198](https://sourceware.org/bugzilla/show_bug.cgi?id=20198):
the previous version calls the destructors of `thread_local` objects.
* `glob64@@GLIBC_2.27`: the previous version does not follow dangling symlinks.
## How to remove symbol versioning
Imagine that you want to build an application with a prebuilt shared object
which has versioned references, but you can only find shared objects providing
the unversioned definitions. The linker will helpfully error:
```
ld.lld: error: undefined reference to foo@v1 [--no-allow-shlib-undefined]
```
As the diagnostic suggests, you can add `--allow-shlib-undefined` to get rid of
the error. It is not recommended but the built application may happen to work.
For this case, an alternative hacky solution is:
```
# 64-bit: an Elf64_Dyn entry is 16 bytes and d_tag is 8 bytes
cp in.so out.so
r2 -wqc '/x feffff6f00000000 @ section..dynamic; w0 16 @ hit0_0' out.so
llvm-objcopy -R .gnu.version out.so
# 32-bit: an Elf32_Dyn entry is 8 bytes and d_tag is 4 bytes
cp in.so out.so
r2 -wqc '/x feffff6f @ section..dynamic; w0 8 @ hit0_0' out.so
llvm-objcopy -R .gnu.version out.so
```
With the removal of `.gnu.version`, the linker will think that `out.so`
references `foo` instead of `foo@v1`. However, llvm-objcopy will zero out the
section contents. At runtime, glibc ld.so will complain `unsupported version 0
of Verneed record`. To make glibc happy, you can delete `DT_VER*` tags from the
dynamic table. The above code snippet uses an r2 command to locate
`DT_VERNEED` (`0x6ffffffe`) and rewrite it to `DT_NULL` (a `DT_NULL` entry stops
the parsing of the dynamic table). The difference of the `readelf -d` output is
roughly:
```
0x000000006ffffffb (FLAGS_1) Flags: NOW
- 0x000000006ffffffe (VERNEED) 0x8ef0
- 0x000000006fffffff (VERNEEDNUM) 5
- 0x000000006ffffff0 (VERSYM) 0x89c0
- 0x000000006ffffff9 (RELACOUNT) 1536
0x0000000000000000 (NULL) 0x0
```
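What the r2 command does to the dynamic table can be mimicked on an in-memory array. The following is a sketch only: `dyn64`, `cut_at_verneed` and `visible_entries` are hypothetical names, and the struct is a local stand-in for `Elf64_Dyn` so that the example does not depend on `<elf.h>`.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for Elf64_Dyn and the tag values from <elf.h>. */
struct dyn64 {
  int64_t d_tag;
  uint64_t d_val;
};
enum { TAG_NULL = 0, TAG_VERNEED = 0x6ffffffe };

/* Zero the first DT_VERNEED entry (like `w0 16 @ hit0_0`), turning it
   into DT_NULL. Returns its index, or -1 if there is no such entry. */
ptrdiff_t cut_at_verneed(struct dyn64 *dyn, size_t n) {
  for (size_t i = 0; i < n && dyn[i].d_tag != TAG_NULL; i++) {
    if (dyn[i].d_tag == TAG_VERNEED) {
      dyn[i].d_tag = TAG_NULL;
      dyn[i].d_val = 0;
      return (ptrdiff_t)i;
    }
  }
  return -1;
}

/* Number of entries a parser sees before it stops at DT_NULL. */
size_t visible_entries(const struct dyn64 *dyn, size_t n) {
  size_t i = 0;
  while (i < n && dyn[i].d_tag != TAG_NULL)
    i++;
  return i;
}
```

Every entry after the rewritten one becomes invisible, which is why `RELACOUNT` also disappears in the `readelf -d` diff above. The hack is only safe when nothing important follows `DT_VERNEED`.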
## LLD
* If a versioned undefined symbol (its name contains `@`) is not defined by any
shared object, GNU ld will report an error. LLD before 12.0 did not error
(I fixed it in https://reviews.llvm.org/D92260).
## Remarks
GCC and Clang support asm specifiers and `#pragma redefine_extname` for
renaming a symbol. For example, if you declare `int foo() asm("foo_v1");` and
then reference `foo`, the symbol in .o will be `foo_v1`.
For example, the biggest change in musl v1.2.0 is the time64 support for its
supported 32-bit architectures. musl adopted a scheme based on asm specifiers:
```c
// include/features.h
#define __REDIR(x,y) __typeof__(x) x __asm__(#y)
// API header include/sys/time.h
int utimes(const char *, const struct timeval [2]);
__REDIR(utimes, __utimes_time64);
// Implementation src/linux/utimes.c
int utimes(const char *path, const struct timeval times[2]) { ... }
// Internal header compat/time32/time32.h
int __utimes_time32() __asm__("utimes");
// Compat implementation compat/time32/utimes_time32.c
int __utimes_time32(const char *path, const struct timeval32 times32[2]) { ... }
```
* In .o, the time32 symbol remains `utimes` and is compatible with the ABI
required by programs linked against old musl versions; the time64 symbol is
`__utimes_time64`.
* The public header redirects `utimes` to `__utimes_time64`.
* cons: if users declare `utimes` themselves, they will not link against
the correct `__utimes_time64`.
* The "good-looking" name `utimes` is used for the preferred time64
implementation internally and the "ugly" name `__utimes_time32` is used for
the legacy time32 implementation.
* If the time32 implementation is called elsewhere, the "ugly" name can make
it stand out.
For the above example, here is an implementation with symbol versioning:
```c
// API header include/sys/time.h
int utimes(const char *, const struct timeval [2]);
// Implementation src/linux/utimes.c
int utimes(const char *path, const struct timeval times[2]) { ... }
// Internal header compat/time32/time32.h
// Probably __asm__(".symver __utimes_time32, utimes@time32, rename"); if supported
__asm__(".symver __utimes_time32, utimes@time32");
// Implementation compat/time32/utimes_time32.c
int __utimes_time32(const char *path, const struct timeval32 times32[2])
{
...
}
```
Note that `@@@` cannot be used. The header is included in a defining
translation unit and `@@@` would lead to a default version definition, while we
want a non-default version definition.
Because of the assembler behavior described earlier, the undesirable
`__utimes_time32` symbol is present as well. Be careful to use a version script
to localize it.
So what is the significance of symbol versioning? After careful thought, I think:
* Refuse linking against old symbols while keeping compatibility with
unversioned old libraries. [{reject-non-default}]()
* No need to label declarations.
* The version definition can be delayed until link time. The version script
provides a flexible pattern matching mechanism to assign versions.
* Scope reduction. Arguably another mechanism like `--dynamic-list` might have
been developed if version scripts did not provide `local:`.
* There are some semantic issues in renaming builtin functions with asm
specifiers in GCC and Clang (they do not know that the renamed symbol has
built-in semantics). See [2020-10-15-intra-call-and-libc-symbol-renaming](https://maskray.me/blog/2020-10-15-intra-call-and-libc-symbol-renaming).
* [verneed-check]
For the first item, the asm specifier scheme relies on conventions to prevent
problems (users should include the header), whereas symbol versioning can be
enforced by ld.
Design flaws:
* The behavior of `.symver foo, foo@v1` when `foo` is defined [{gas-copy}]():
the original symbol `foo` is retained (a redundant symbol at link time) and
its binding/`st_other` are kept in sync with `foo@v1` (it is not convenient
to set a different binding/visibility).
* Verdaux is a bit redundant. In practice, one Verdef has only one auxiliary
Verdaux entry.
* This is arguably a minor problem but annoying for a framework providing
multiple shared objects. ld.so requires "a versioned symbol is implemented in
the same shared object in which it was found at link time", which disallows
moving definitions between shared objects. Fortunately, glibc 2.30 [BZ24741](http://sourceware.org/PR24741)
relaxes this requirement, essentially ignoring `Verneed::vn_file`.
Before that, glibc used a forwarder to move `clock_*` functions from librt.so
to libc.so:
```c
// rt/clock-compat.c
__typeof(clock_getres) *clock_getres_ifunc(void) asm("clock_getres");
__typeof(clock_getres) *clock_getres_ifunc(void) { return &__clock_getres; }
```
libc.so defines `__clock_getres` and `clock_getres`. librt.so defines an ifunc
called `clock_getres` which forwards to libc.so `__clock_getres`.
## Related links
* [Combining Versions](combining-versions.md)
* [Version Scripts](version-scripts.md)
* https://invisible-island.net/ncurses/ncurses-mapsyms.html

1050
maskray-3.md Normal file

File diff suppressed because it is too large Load Diff

371
maskray-4.md Normal file
View File

@ -0,0 +1,371 @@
# LLD and GNU linker incompatibilities
Subtitle: Is LLD a drop-in replacement for GNU ld?
The motivation for this article was someone challenging the "drop-in
replacement" claim on LLD's website (the discussion was about Linux-like ELF
toolchain):
> LLD is a linker from the LLVM project that is a drop-in replacement for
> system linkers and runs much faster than them. It also provides features that
> are useful for toolchain developers.
99.9% of software works with LLD without a change. Some linker script
applications may need adaptation (such adaptation is oftentimes due to brittle
assumptions: asking too much from GNU ld's behavior which should be fixed
anyway). So I defended this claim.
Piotr Kubaj said that this is probably more of a marketing term than a
technical term; the term tries to lure existing users into thinking "it's the
same you know, but better!". I think that this is fair in some senses: for many
applications LLD has achieved much faster speed and much lower memory usage
than GNU ld. A more important thing is that LLD adds a third choice to the
spectrum. It brings competitive pressure to both sides, gives incentive for
improvement, and makes for more standardized future features/extensions. One
reason that I am subscribed to the binutils mailing list is I want to
participate in its design processes (I am proud to say that I have managed to
find some early issues of various new things).
Anyway, I thought documenting the compatibility problems between the ELF ports
of LLD and GNU ld is useful, not only to others but also to my future self,
hence this article. I will try to describe GNU gold behaviors as well.
So here is the long list. Please keep in mind that many compatibility issues do
not really matter and a user may never run into such an issue. Many of them
just serve educational purposes and as my personal reference. There are some
user-perceivable differences but quite a lot are WONTFIX on both GNU ld and
LLD. LLD, as a newer linker, has less legacy compatibility burden and can make
good default choices in some cases and say no to some unneeded
features/behaviors. A large number of features are duplicated in GNU ld's
various ports. It is also common that one thing behaves this way in port A and
another way in port B.
* GNU ld reports `gc-sections requires either an entry or an undefined symbol`
in a `-r --gc-sections` link. LLD doesn't error
(https://reviews.llvm.org/D84131#2162411). I am unsure whether such a
diagnostic will be useful (an uncommon use case where the GC roots are more
than the explicit linker options).
* The default image base for `-no-pie` links is different. For example, on
x86-64, GNU ld defaults to 0x400000 while LLD defaults to 0x200000.
* GNU ld synthesizes a `STT_FILE` symbol when copying non-`STT_SECTION`
`STB_LOCAL` symbols. LLD doesn't.
* The `STT_FILE` symbol name is the input filename. For compiler driver
specified startup files like `crti.o` and `crtn.o`, their absolute paths
will end up in the linked image. This breaks local determinism (toolchain
paths are leaked) for some users.
* I filed https://bugs.llvm.org/show_bug.cgi?id=48023 and
https://sourceware.org/bugzilla/show_bug.cgi?id=26822. From binutils 2.36
onwards, the base name will be used.
* Text relocations.
* In GNU ld, `-z notext`/`-z text`/unspecified are a tri-state. For
`-z notext`/unspecified, the dynamic tags `DT_TEXTREL` and `DF_TEXTREL` are
added on demand. If unspecified and GNU ld is configured with
`--enable-textrel-check=warning`, a warning will be issued.
* LLD has two states and adds `DT_TEXTREL` and `DF_TEXTREL` if `-z notext` is specified.
* GNU ld supports more relocation types as text relocations.
* Default library paths.
* GNU ld has default library paths.
* LLD doesn't. This is intentional so https://reviews.llvm.org/D70048
(NetBSD) cannot be accepted.
* GNU ld supports grouped short options. This can sometimes cause surprising
behaviors with misspelled or unimplemented options, e.g. `-no-pie` means
`-n -o -pie` because GNU ld as of 2.35 has not implemented `-no-pie`. Nick
Clifton committed `Update the BFD linker so that it deprecates grouped short
options.` to deprecate the GNU ld feature. LLD has never supported grouped
short options.
* Mixed `SHF_LINK_ORDER` and non-`SHF_LINK_ORDER` input sections in an output
section.
* LLD performs sorting within an input section description and allows
arbitrary mixes.
* GNU ld does not allow mixed sections
https://sourceware.org/bugzilla/show_bug.cgi?id=26256 (H.J. Lu has a patch)
* LLD uses `-z relro` by default. This is probably not a good default
but it is difficult to change now. I have a comment
https://bugs.llvm.org/show_bug.cgi?id=48549. GNU ld warns for `-z relro` and
`-z norelro` for non Linux/FreeBSD BFD emulations (e.g. `-m aarch64elf`).
* Different archive member extraction semantics. See
http://lld.llvm.org/ELF/warn_backrefs.html for details.
* LLD `--warn-backrefs` warns for `def.a ref.o def.so` if `def.a` cannot
satisfy previous unresolved symbols. LLD resolves the definition to `def.a`
while GNU linkers resolve the definition to `def.so`.
* GNU ld `-static` has traditionally been a synonym to `-Bstatic`. Recently on
x86 it has been changed to behave a bit similar to gold `-static`, which
disallows linking against shared objects. LLD `-static` is still a synonym to
`-Bstatic`.
* GNU linkers have a default `--dynamic-linker`. LLD doesn't.
* GNU linkers warn for `.gnu.warning.*` sections. LLD doesn't. It is unclear
whether the feature is useful. https://bugs.llvm.org/show_bug.cgi?id=42008
* GNU ld has architecture-specific rules for relocations referencing undefined
weak symbols. I don't think the GNU ld behaviors can be summarized (even by
maintainers!). LLD's are consistent.
* The conditions to create `.interp` are different. I believe GNU ld's is quite
difficult to describe.
* `--no-allow-shlib-undefined` and `--rpath-link`
* GNU ld traces all shared objects (transitive `DT_NEEDED` dependencies) and
emulates the behavior of a dynamic loader to warn in more cases.
* gold and LLD implement a simplified version. They warn for shared objects
whose `DT_NEEDED` dependencies are all seen as input files.
* `--fatal-warnings`
* GNU ld still reports `warning: ...`.
* LLD switches to `error: ...`.
* `--no-relax`
* GNU ld: disable `R_X86_64_[REX_]GOTPCRELX`
* LLD: no-op (https://reviews.llvm.org/D81359)
* LLD places `.rodata` (among other `SHF_ALLOC` and
non-`SHF_WRITE`-non-`SHF_EXECINSTR` sections) before `.text` (among other
`SHF_ALLOC` and `SHF_EXECINSTR` sections).
* `.symtab`/`.shstrtab`/`.strtab` in a linker script.
* Ignored by GNU ld, therefore `--orphan-handling=` does not warn/error.
* Respected by LLD
* Whether `ADDR(.foo)` in a linker script can retain an empty output section.
* GNU ld: no. Symbol assignments relative to such empty sections may have
strange `st_shndx`.
* LLD: yes.
* If an undefined symbol is referenced by both `R_X86_64_JUMP_SLOT` (lazy) and
`R_X86_64_GLOB_DAT` (non-lazy)
* GNU ld generates `.plt.got` with `R_X86_64_GLOB_DAT` relocations.
`R_X86_64_JUMP_SLOT` can thus be omitted to decrease the number of dynamic
relocations.
* LLD does not implement this saving. This naturally requires more than one
pass scanning relocations which LLD doesn't do at present. https://bugs.llvm.org/show_bug.cgi?id=32938
* GNU ld relaxes `R_X86_64_GOTPCREL` relocations with some forms (e.g.
`movq foo@GOTPCREL(%rip), %reg` -> `leaq foo(%rip), %reg`). LLD never
relaxes `R_X86_64_GOTPCREL` relocations.
* GNU linkers give `.gnu.linkonce*` sections COMDAT section semantics. LLD
simply ignores such sections. https://bugs.llvm.org/show_bug.cgi?id=31586
tracks when the hack can be removed.
* GNU ld adds `PT_PHDR` and `PT_INTERP` together. A shared object usually does
not have these two program headers. In LLD, `PT_PHDR` is always added unless
the address assignment makes it unsuitable to place program headers at all.
* The conditions to create the dynamic symbol table `.dynsym`.
* LLD: there is an input shared object, `-pie`/`-shared`, or `--export-dynamic`.
* GNU ld's is quite complex. `--export-dynamic` is not special, though.
* `--export-dynamic-symbol`
* gold's implies `-u`.
* GNU ld (from 2.35 onwards) and LLD's do not imply `-u`.
* In GNU ld, a defined `foo@v1` can suppress the extraction of an archive member
defining `foo@@v1`. LLD treats them as two separate symbols and thus the archive
member extraction still happens. This can hardly matter. See [All about symbol
versioning](maskray-2.md) for details.
* Default program headers.
* With traditional `-z noseparate-code`, GNU ld defaults to a `RX/R/RW`
program header layout. With `-z separate-code` (default on Linux/x86 from
binutils 2.31 onwards), GNU ld defaults to a `R/RX/R/RW` program header
layout.
* LLD defaults to `R/RX/RW(RELRO)/RW(non-RELRO)`. With `--rosegment`, LLD
uses `RX/RW(RELRO)/RW(non-RELRO)`.
* Placing all R before RX is preferable because it can save one program
header and reduce alignment costs.
* LLD's split of RW saves one maxpagesize alignment and can make the linked
image smaller.
* This breaks some assumptions that the (so-called) "text segment" precedes
the (so-called) "data segment".
* For example, certain programs expect `.text` is the first section of the
text segment and specify `-Ttext=0` to place the `PF_R|PF_X` program header
at `p_vaddr=0`. This is a brittle assumption and should be avoided. If
`PT_PHDR` is needed, `--image-base=0` is a replacement. If `PT_PHDR` is not
needed, `.text 0 : { *(.text .text.*) }` is a replacement.
* GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in
`-pie` mode. glibc `csu/libc-start.c` needs it when statically linked, but
not in the static pie mode. LLD does not distinguish `-no-pie`, `-pie` and
`-shared`. https://bugs.llvm.org/show_bug.cgi?id=48674
* LLD uses `--no-apply-dynamic-relocs` by default. GNU ld and gold fill in the
GOT entries with link-time values. GNU ld only supports
`--no-apply-dynamic-relocs` for aarch64
https://sourceware.org/bugzilla/show_bug.cgi?id=25891.
* When relaxing `R_X86_64_REX_GOTPCRELX`, GNU ld suppresses the relaxation if
it would cause relocation overflow. LLD does not perform the check.
* GNU ld and gold allow `--exclude-libs=b` to hide `b.a`. LLD requires
`--exclude-libs=b.a`.
* Whether to use executable stack if neither `-z execstack` nor `-z noexecstack`
is specified. GNU ld and gold check whether an object file does not have
`.note.GNU-stack`. LLD ignores `.note.GNU-stack` and defaults to `-z
noexecstack`.
## Semantics of `--wrap`
GNU ld and LLD have slightly different `--wrap` semantics. I use "slightly"
because in most use cases users will not observe a difference.
In GNU ld, `--wrap` only applies to undefined symbols. In LLD, `--wrap` happens
after all other symbol resolution steps. The implementation is to mangle the
symbol table of each object file (`foo` -> `__wrap_foo`; `__real_foo` ->
`foo`) so that all relocations to `foo` or `__real_foo` will be redirected.
The LLD semantics have the advantage that non-LTO, LTO and relocatable link
behaviors are consistent. I filed
https://sourceware.org/bugzilla/show_bug.cgi?id=26358 for GNU ld.
```
# GNU ld: call bar
# LLD: call __wrap_bar
call bar
.globl bar
bar:
```
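The renaming that LLD applies to each symbol table entry can be sketched as a pure string transformation. This is a toy model rather than LLD's implementation; `wrap_rename` is a hypothetical helper that takes the wrapped symbol name from `--wrap=<symbol>`.

```c
#include <stdio.h>
#include <string.h>

/* Rewrite the name of one symbol table entry for --wrap=<wrapped>:
     <wrapped>        -> __wrap_<wrapped>
     __real_<wrapped> -> <wrapped>
   Any other name is copied unchanged. The result is written to out. */
void wrap_rename(const char *name, const char *wrapped, char *out,
                 size_t outsz) {
  static const char real_prefix[] = "__real_";
  const size_t plen = sizeof real_prefix - 1;
  if (strcmp(name, wrapped) == 0)
    snprintf(out, outsz, "__wrap_%s", wrapped);
  else if (strncmp(name, real_prefix, plen) == 0 &&
           strcmp(name + plen, wrapped) == 0)
    snprintf(out, outsz, "%s", wrapped);
  else
    snprintf(out, outsz, "%s", name);
}
```

Because the rewrite is applied to every entry regardless of whether the symbol is defined in the same file, the `call bar` in the assembly above ends up as `call __wrap_bar` under LLD, while GNU ld leaves the reference to the locally defined `bar` alone.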
## Relocation referencing a local relative to a discarded input section
* How to resolve a relocation referencing a `STT_SECTION` symbol associated with a
discarded `.debug_*` input section.
* GNU ld and gold have logic resolving the relocation to the prevailing
section symbol.
* LLD does not have the logic. LLD 11 defines some tombstone values.
> A symbol table entry with `STB_LOCAL` binding that is defined relative to one
> of a group's sections, and that is contained in a symbol table section that
> is not part of the group, must be discarded if the group members are
> discarded. References to this symbol table entry from outside the group are
> not allowed.
ld.bfd/gold/lld error if the section containing the relocation is `SHF_ALLOC`.
`.debug*` do not have the `SHF_ALLOC` flag and those relocations are allowed.
lld resolves such relocations to 0. ld.bfd and gold, however, have some
`CB_PRETEND`/`PRETEND` logic to resolve relocations to the definitions in the
prevailing comdat groups. The code is hacky and may not suit lld.
https://bugs.llvm.org/show_bug.cgi?id=42030
## Canonical PLT entry for ifunc
How to handle a direct access relocation referencing a `STT_GNU_IFUNC`?
c.f. [GNU indirect function](maskray-6.md).
## `__rela_iplt_start`
GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in `-pie`
mode. LLD defines `__rela_iplt_start` regardless of `-no-pie`, `-pie` or
`-shared`.
Static pie and static no-pie relocation processing is very different in glibc.
* Static no-pie uses special code to process a magic array delimited by
`__rela_iplt_start`/`__rela_iplt_end`.
* Static pie uses self-relocation to take care of `R_*_IRELATIVE`. The above
magic array code is executed as well. If `__rela_iplt_start`/`__rela_iplt_end`
are defined (like what LLD does), we will get
`0 < __rela_iplt_start < __rela_iplt_end` in `csu/libc-start.c`.
`ARCH_SETUP_IREL` will crash when resolving the first relocation which has
been processed.
nsz has a glibc patch that moves the self-relocation later so everything is set up for ifunc resolvers.
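The "magic array" loop is short. Below is a simplified model of it, not glibc's code: `rela64` is a local stand-in for `Elf64_Rela`, `apply_irel` is modeled after the idea in `csu/libc-start.c`, and `demo`, `impl_fast` and `resolve_impl` are hypothetical names used to exercise it inside a normal process.

```c
#include <stdint.h>

enum { RELOC_IRELATIVE = 37 }; /* R_X86_64_IRELATIVE in the psABI */

/* Local stand-in for Elf64_Rela. */
struct rela64 {
  uint64_t r_offset;
  uint64_t r_info;
  int64_t r_addend;
};

typedef uintptr_t (*resolver_fn)(void);

/* Walk [start, end): call the resolver stored in r_addend and write
   its result to the slot at r_offset. Returns the number of
   relocations applied, or -1 on an unexpected relocation type. */
int apply_irel(const struct rela64 *start, const struct rela64 *end) {
  int n = 0;
  for (const struct rela64 *r = start; r < end; r++) {
    if ((uint32_t)r->r_info != RELOC_IRELATIVE)
      return -1;
    resolver_fn resolve = (resolver_fn)(uintptr_t)r->r_addend;
    *(uintptr_t *)(uintptr_t)r->r_offset = resolve();
    n++;
  }
  return n;
}

/* Hypothetical ifunc: the resolver picks impl_fast. */
static int impl_fast(void) { return 42; }
static uintptr_t resolve_impl(void) { return (uintptr_t)impl_fast; }

int demo(void) {
  uintptr_t slot = 0; /* stands in for the GOT slot of the ifunc */
  struct rela64 relocs[] = {
      {(uint64_t)(uintptr_t)&slot, RELOC_IRELATIVE,
       (int64_t)(uintptr_t)resolve_impl},
  };
  if (apply_irel(relocs, relocs + 1) != 1)
    return -1;
  return ((int (*)(void))slot)(); /* call through the resolved slot */
}
```

In a real static executable the bounds are `__rela_iplt_start` and `__rela_iplt_end`. If the two symbols are equal (or undefined and resolved to 0), the loop body never runs, which is what the static pie code path relies on.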
## Linker scripts
* Some linker script commands are unimplemented in LLD, e.g. `BLOCK()` as a
compatibility alias for `ALIGN()`. `BLOCK` is documented in GNU ld as a
compatibility alias and it is not widely used, so there is no reason to keep
the kludge in LLD.
* Some syntax is not recognized by LLD, e.g. LLD recognizes
`*(EXCLUDE_FILE(a.o) .text)` but not `EXCLUDE_FILE(a.o) *(.text)`
(https://bugs.llvm.org/show_bug.cgi?id=45764)
* To me the unrecognized syntax is misleading.
* If we support one way doing something, and the thing has several
alternative syntax, we may not consider the alternative syntax just for the
sake of completeness.
* Different orphan section placement. GNU ld has very complex rules and certain
section names have special semantics. LLD adopted some of its core ideas but
made a lot of simplifications (at some point we should document them,
https://bugs.llvm.org/show_bug.cgi?id=42327):
* output sections are given ranks
* output sections are placed after symbol assignments
* For an error detected when processing a linker script, LLD may report it
multiple times (e.g. `ASSERT` failure). GNU ld has such issues, too, but
probably much rarer.
* `SORT` commands
* GNU ld: https://sourceware.org/binutils/docs/ld/Input-Section-Basics.html#Input-Section-Basics
mentions the feature but its behavior is strange/unintuitive. I created a
discussion titled "`SORT` and multiple patterns in an input section description".
* LLD performs sorting within an input section description.
https://reviews.llvm.org/D91127
* In LLD, `AT(lma)` forces creation of a new `PT_LOAD` program header. GNU ld
can reuse the previous `PT_LOAD` program header if LMA addresses are
contiguous. `lma-offset.s`
* In LLD, non-`SHF_ALLOC` sections always get 0 `sh_addr`. In GNU ld you can
have non-zero `sh_addr` but `STT_SECTION` relocations referencing such
sections are not really meaningful.
* Dot assignment (e.g. `. = 4;`) in an output section description.
* GNU ld: dot advances to 4 relative to the start. If you consider `.` on the
right hand side and `ABSOLUTE(.)`, I don't think the behaviors are
consistent.
* LLD: move dot to address 0x4, which will usually trigger an `unable to move
location counter backward` error. https://bugs.llvm.org/show_bug.cgi?id=41169
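The difference for a dot assignment inside an output section description comes down to which base the value is added to. Here is a toy model of the two interpretations; `assign_dot` and the `linker` enum are hypothetical names, and real linkers evaluate full expressions rather than a single constant.

```c
#include <stdint.h>

enum linker { GNU_LD, LLD };

/* Model of `. = value;` inside an output section description that
   starts at sec_start. Updates *dot and returns 0 on success, or
   returns -1 if the location counter would move backward. */
int assign_dot(enum linker l, uint64_t sec_start, uint64_t value,
               uint64_t *dot) {
  /* GNU ld: value is relative to the section start; LLD: absolute. */
  uint64_t target = (l == GNU_LD) ? sec_start + value : value;
  if (target < *dot)
    return -1;
  *dot = target;
  return 0;
}
```

For a section starting at 0x1000, `. = 4;` moves dot to 0x1004 in the GNU ld model, while the LLD model tries to move dot to 0x4 and fails.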
I'll also mention some LLD release notes which can demonstrate some GNU
incompatibility in previous versions. (For example, if one thing is supported
in version N, then the implication is that it is unsupported in previous
versions. Well, it could be that it worked in older versions but regressed at
some version. However, I am not aware of such cases.)
LLD 12.0.0
* `-r --gc-sections` is supported.
* The archive member extraction semantics of COMMON symbols is by default
(`--fortran-common`) compatible with GNU ld. You may want to read "Semantics
of a common definition in an archive" for details. This is unfortunate.
* `.rel[a].plt` and `.rel[a].dyn` get the `SHF_INFO_LINK` flag. https://reviews.llvm.org/D89828
LLD 11.0.0
* LLD can discard unused symbols with `--discard-all`/`--discard-locals` when
`-r` or `--emit-relocs` is specified. https://reviews.llvm.org/D77807
* `--emit-relocs --strip-debug` can be used. https://reviews.llvm.org/D74375
* `SHT_GNU_verneed` in shared objects are parsed, and versioned undefined
symbols in shared objects are respected. Previously non-default version
symbols could cause spurious `--no-allow-shlib-undefined` errors.
https://reviews.llvm.org/D80059
* `DF_1_PIE` is set for position-independent executables. https://reviews.llvm.org/D80872
* Better compatibility related to output section alignments and LMA regions.
[D75286](https://reviews.llvm.org/D75286) [D74297](https://reviews.llvm.org/D74297)
[D75724](https://reviews.llvm.org/D75725) [D81986](https://reviews.llvm.org/D81986)
* `-r` allows `SHT_X86_64_UNWIND` to be merged into `SHT_PROGBITS`. This allows
clang/GCC produced object files to be mixed together. https://reviews.llvm.org/D85785
* In an input section description, the filename can be specified in double
quotes. `archive:file` syntax is added. https://reviews.llvm.org/D72517 https://reviews.llvm.org/D75100
* Linker script specified empty `(.init|.preinit|.fini)_array` are allowed with
`RELRO`. https://reviews.llvm.org/D76915
LLD 10.0.0
* LLD supports `\` (treating the next character like a non-meta character) and
`[!...]` (negation) in glob patterns. https://reviews.llvm.org/D66613
LLD 9.0.0
* The `DF_STATIC_TLS` flag is set for i386 and x86-64 when initial-exec TLS
models are used.
* Many configurations of the Linux kernel's `arm32_7`, `arm64`, `powerpc64le`
and `x86_64` ports can be linked by LLD.
LLD 8.0.0
* `SHT_NOTE` sections get very high ranks (they usually precede other
sections). https://reviews.llvm.org/D55800
In the LLD 7.0.0 era, https://reviews.llvm.org/D44264 was my first meaningful
(albeit trivial) patch to LLD. Next I made contributions to `--warn-backrefs`.
Then I started to fix tricky issues like copy relocations of a versioned
symbol, duplicate `--wrap`, and section ranks. I have learned a lot from these
code reviews. In the 8.0.0, 9.0.0 and 10.0.0 eras, I fixed a number of
tricky issues and improved a dozen other things, and I am confident to say
that, other than MIPS ;-) and certain other ISA specific things, I am familiar
with every corner of the code base. There are still challenges such as
integration of RISC-V style linker relaxation and post-link optimization, and
improvement to some aspects of the linker script, but otherwise LLD is a stable
and finished part of the toolchain.
A few random notes:
* Symbol resolution can take 10%~20% time. Parallelization can theoretically
improve the process but it is hard to overstate the challenge (if you
additionally take into account determinism).
* Be wary of feature creep. I have learned a lot from ELF design discussions
on generic-abi and from Solaris "linker aliens" in particular. I am sorry to
say so but some development on LLD indeed belongs to such categories.
Sometimes it is difficult to draw a line between unsupported legacy and
legacy we have to support.
* LLD's adoption is now so large that sometimes a decision (like a default
value for an option) cannot make everyone happy.

462
maskray-5.md Normal file
View File

@ -0,0 +1,462 @@
# Copy relocations, canonical PLT entries and protected visibility
Background:
* `-fno-pic` can only be used by executables. On most platforms and
architectures, direct access relocations are used to reference external data
symbols.
* `-fpic` can be used by both executables and shared objects. Windows has
`__declspec(dllimport)` but most other binary formats allow a default
visibility external data to be resolved to a shared object, so generally
direct access relocations are disallowed.
* `-fpie` was introduced as a mode similar to `-fpic` for ELF: the compiler can
make the assumption that the produced object file can only be used by
executables, thus all definitions are non-preemptible and thus
interprocedural optimizations can apply on them.
For
```c
extern int a;
int *foo() { return &a; }
```
`-fno-pic` typically produces an absolute relocation (a PC-relative relocation
can be used as well). On ELF x86-64 it is usually `R_X86_64_32` in the position
dependent small code model. If `a` is defined in the executable (by another
translation unit), everything works fine. If `a` turns out to be defined in a
shared object, its real address will be non-constant at link time. One of the
following actions needs to be taken:
* Emit a dynamic relocation in every use site. Text sections are usually
non-writable. A dynamic relocation applied on a non-writable section is
called a text relocation.
* Emit a single copy relocation. Copy relocations only work for executables.
The linker obtains the size of the symbol, allocates the bytes in `.bss`
(this may make the object writable; in LLD a read-only area may be picked),
and emits an `R_*_COPY` relocation. All references resolve to the new location.
Multiple text relocations are even less acceptable, so on ELF a copy relocation
is generally used. Here is a nice description from [Rich
Felker](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55012): "Copy relocations
are not a case of overriding the definition in the abstract machine, but an
implementation detail used to support data objects in shared libraries when the
main program is non-PIC."
Copy relocations have drawbacks:
* Break page sharing.
* Make the symbol properties (e.g. size) part of ABI.
* If the shared object is linked with `-Bsymbolic` or `--dynamic-list` and
defines a data symbol copy relocated by the executable, the address of the
symbol may be different in the shared object and in the executable.
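To make the last drawback concrete, here is a toy C model of what an `R_*_COPY` relocation does. It is a sketch, not real loader code: `toy_data_symbol`, `toy_apply_copy_reloc` and `toy_copies_diverge` are hypothetical names, and plain variables stand in for the shared object's data and the executable's `.bss` slot.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: a data symbol as a hypothetical loader might see it. */
struct toy_data_symbol {
    void  *addr;  /* where the defining shared object stores the object */
    size_t size;  /* st_size recorded when the executable was linked */
};

/* Model of R_*_COPY: copy the initial contents from the shared object into
 * space reserved by the executable, and return the address that all
 * preemptible references now resolve to. */
void *toy_apply_copy_reloc(void *exe_slot, const struct toy_data_symbol *def) {
    memcpy(exe_slot, def->addr, def->size);
    return exe_slot;
}

/* Returns 1 if the executable and a -Bsymbolic shared object end up
 * observing different values of the "same" variable. */
int toy_copies_diverge(void) {
    int dso_a = 42;  /* definition inside the shared object */
    int exe_a = 0;   /* slot reserved by the executable */
    struct toy_data_symbol def = { &dso_a, sizeof dso_a };
    int *canonical = toy_apply_copy_reloc(&exe_a, &def);

    /* With -Bsymbolic the shared object binds to its own definition
     * instead of the copy in the executable. */
    int *dso_view = &dso_a;
    *dso_view = 7;

    return *canonical != *dso_view;
}
```

The model also shows why the symbol size becomes part of the ABI: the number of bytes copied is fixed when the executable is linked.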
What went poorly was that `-fno-pic` code had no way to avoid copy relocations
on ELF. Traditionally copy relocations could only occur in `-fno-pic` code. A
GCC 5 change made them possible in `-fpie` code on x86-64. Please read on.
## x86-64: copy relocations and `-fpie`
`-fpic` using GOT indirection for external data symbols has a cost. Making
`-fpie` similar to `-fpic` in this regard incurs costs if the data symbol turns
out to be defined in the executable. Having the data symbol defined in another
translation unit linked into the executable is very common, especially if the
vendor uses fully/mostly static linking.
In GCC 5, ["x86-64: Optimize access to globals in PIE with copy
reloc"](https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=77ad54d911dd7cb88caf697ac213929f6132fdcf)
started to use direct access relocations for external data symbols on x86-64 in
`-fpie` mode.
```c
extern int a;
int foo() { return a; }
```
* GCC<5: `movq a@GOTPCREL(%rip), %rax; movl (%rax), %eax` (9 bytes)
* GCC>=5: `movl a(%rip), %eax` (6 bytes)
This change is actually useful for architectures other than x86-64 but was never
implemented for other architectures. What went wrong: the change was
implemented as an inflexible configure-time choice (`HAVE_LD_PIE_COPYRELOC`),
defaulting to such a behavior if ld supports PIE copy relocations (most
binutils installations). Keep in mind that such a `-fpie` default [breaks
`-Bsymbolic` and `--dynamic-list` in shared objects](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65888).
Clang addressed the inflexible configure-time choice via an opt-in option
`-mpie-copy-relocations` (D19996).
I noticed that:
* The option can be used for `-fno-pic` code as well to prevent copy
relocations on ELF. This is occasionally what users want (if their shared
objects use `-Bsymbolic` and export data symbols (usually undesired from an API
perspective but can avoid costs at times)), and they switch from `-fno-pic`
to `-fpic` just for this purpose.
* The option name should describe the code generation behavior, instead of the
inferred behavior at the linking stage on a particular binary format.
* The option does not need to tie to ELF.
  * On COFF, the behavior is like always `-fdirect-access-external-data`.
  `__declspec(dllimport)` is needed to enable indirect access.
  * On Mach-O, the behavior is like `-fdirect-access-external-data` for
  `-fno-pic` (only available on arm) and the opposite for `-fpic`.
* H.J. Lu introduced `R_X86_64_GOTPCRELX` and `R_X86_64_REX_GOTPCRELX` as GOT
optimization to x86-64 psABI. This is great! With the optimization, GOT
indirection can be optimized, so the incurred cost is very low now.
So I proposed an alternative option `-f[no-]direct-access-external-data`:
https://reviews.llvm.org/D92633 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112. My wish on the GCC side is
to drop `HAVE_LD_PIE_COPYRELOC` and (x86-64) default to GOT indirection for
external data symbols in `-fpie` mode.
Please keep in mind that `-f[no-]semantic-interposition` is for definitions
while `-f[no-]direct-access-external-data` is for undefined data symbols. GCC 5
introduced `-fno-semantic-interposition` to use local aliases for references to
definitions in the same translation unit.
## `STV_PROTECTED`
Now let's consider how `STV_PROTECTED` comes into play. Here is the generic ABI
definition:
> A symbol defined in the current component is protected if it is visible in
> other components but not preemptable, meaning that any reference to such a
> symbol from within the defining component must be resolved to the definition
> in that component, even if there is a definition in another component that
> would preempt by the default rules. A symbol with `STB_LOCAL` binding may not
> have `STV_PROTECTED` visibility. If a symbol definition with `STV_PROTECTED`
> visibility from a shared object is taken as resolving a reference from an
> executable or another shared object, the `SHN_UNDEF` symbol table entry
> created has `STV_DEFAULT` visibility.
A non-local `STV_DEFAULT` defined symbol is by default preemptible in a shared
object on ELF. `STV_PROTECTED` can make the symbol non-preemptible. You may
have noticed that I use "preemptible" while the generic ABI uses "preemptable"
and LLVM IR uses "`dso_preemptable`". Both forms work. "preemptible" is my
choice because it is more common.
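The preemptibility rule can be written down as a small predicate over a symbol table entry. The following is a simplified sketch using the macros from `<elf.h>`; `struct link_mode` and `sketch_is_preemptible` are hypothetical names, and real linkers also consider `--dynamic-list`, version scripts and other refinements that are ignored here.

```c
#include <elf.h>
#include <stdbool.h>

/* Hypothetical summary of the link mode; field names are illustrative. */
struct link_mode {
    bool shared;     /* linking with -shared */
    bool bsymbolic;  /* linking with -Bsymbolic */
};

/* Simplified rule: may a reference to `sym` bind to a definition in
 * another component at run time? */
bool sketch_is_preemptible(const Elf64_Sym *sym, struct link_mode mode) {
    if (ELF64_ST_BIND(sym->st_info) == STB_LOCAL)
        return false;
    if (ELF64_ST_VISIBILITY(sym->st_other) != STV_DEFAULT)
        return false;  /* protected, hidden and internal never preempt */
    if (sym->st_shndx == SHN_UNDEF)
        return true;   /* the definition lives in another component */
    if (!mode.shared)
        return false;  /* a definition in the executable wins */
    return !mode.bsymbolic;
}
```

Under this rule, switching a defined symbol in a shared object from `STV_DEFAULT` to `STV_PROTECTED` flips the result from preemptible to non-preemptible while keeping the symbol exported.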
### Protected data symbols and copy relocations
Many folks consider that copy relocations are best-effort support provided by
the toolchain. `STV_PROTECTED` is intended as an optimization and the
optimization can error out if it can't be done for whatever reason. Since copy
relocations are already oftentimes unacceptable, it is natural to think that we
should just disallow copy relocations on protected data symbols.
However, GNU ld 2.26 made a change which enabled copy relocations on protected
data symbols for i386 and x86-64.
A glibc change ["Add `ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA` to
x86"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=62da1e3b00b51383ffa7efc89d8addda0502e107)
is needed to make copy relocations on protected data symbols work.
["[AArch64][BZ #17711] Fix extern protected data handling"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0910702c4d2cf9e8302b35c9519548726e1ac489)
and ["[ARM][BZ #17711] Fix extern protected data handling"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3bcea719ddd6ce399d7bccb492c40af77d216e42)
ported the thing to arm and aarch64.
Despite the glibc support, GNU ld's aarch64 port reports an error: relocation
`R_AARCH64_ADR_PREL_PG_HI21` against symbol `foo` which may bind externally can
not be used when making a shared object; recompile with `-fPIC`.
powerpc64 ELFv2 is interesting: TOC indirection (TOC is a variant of GOT) is
used everywhere, data symbols normally have no direct access relocations, so
this is not a problem.
```c
// b.c
__attribute__((visibility("protected"))) int foo;
// a.c
extern int foo;
int main() { return foo; }
```
```
gcc -fuse-ld=bfd -fpic -shared b.c -o b.so
gcc -fuse-ld=bfd -pie -fno-pic a.c ./b.so
```
gold does not allow copy relocations on protected data symbols, but it misses
some cases: https://sourceware.org/bugzilla/show_bug.cgi?id=19823.
### Protected data symbols and direct accesses
If a protected data symbol in a shared object is copy relocated, allowing
direct accesses will cause the shared object to operate on a different copy
from the executable. Therefore, direct accesses to protected data symbols have
to be disallowed in `-fpic` code, just in case the symbols may be copy
relocated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 changed GCC 5 to
use GOT indirection for protected external data.
```c
__attribute__((visibility("protected"))) int foo;
int val() { return foo; }
// -fPIC: GOT on at least aarch64, arm, i386, x86-64
```
This caused unneeded pessimization for protected external data. Clang always
treats protected similar to hidden/internal.
For older GCC (and all versions of Clang), direct accesses are produced in
`-fpic` code. Mixing such object files can silently break copy relocations on
protected data symbols. Therefore, GNU ld made the change
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5
to error in `-shared` mode.
```
% cat a.s
leaq foo(%rip), %rax
.data
.global foo
.protected foo
foo:
```
```
% gcc -fuse-ld=bfd -shared a.s
/usr/bin/ld.bfd: /tmp/ccchu3Xo.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
/usr/bin/ld.bfd: final link failed: bad value
collect2: error: ld returned 1 exit status
```
This led to a heated discussion
https://sourceware.org/legacy-ml/binutils/2016-03/msg00312.html. Swift folks
noticed this https://bugs.swift.org/browse/SR-1023 and their reaction was to
switch from GNU ld to gold.
GNU ld's aarch64 port does not have the diagnostic.
binutils commit ["x86: Clear `extern_protected_data` for
`GNU_PROPERTY_NO_COPY_ON_PROTECTED`"](https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=73784fa565bd66f1ac165816c03e5217b7d67bbc)
introduced
`GNU_PROPERTY_NO_COPY_ON_PROTECTED`. With this property, `ld -shared` will not
report the error that relocation `R_X86_64_PC32` against protected symbol `foo`
can not be used when making a shared object.
The two issues above are the costs of enabling copy relocations on protected data
symbols. Personally I don't think copy relocations on protected data symbols
are actually leveraged. GNU ld's x86 port can just (1) reject such copy
relocations and (2) allow direct accesses referencing protected data symbols in
`-shared` mode. But I am not really clear about the glibc case. I wish
`GNU_PROPERTY_NO_COPY_ON_PROTECTED` can become the default or be phased out in
the future.
### Protected function symbols and canonical PLT entries
```c
// b.c
__attribute__((visibility("protected"))) void *foo () {
return (void *)foo;
}
```
GNU ld's aarch64 and x86 ports reject the above code. On many other
architectures including powerpc the code is supported.
```
% gcc -fpic -shared -fuse-ld=bfd b.c -o b.so
/usr/bin/ld.bfd: /tmp/cc3Ay0Gh.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
/usr/bin/ld.bfd: final link failed: bad value
collect2: error: ld returned 1 exit status
% gcc -shared -fuse-ld=bfd -fpic b.c -o b.so
/usr/bin/ld.bfd: /tmp/ccXdBqMf.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `foo' which may bind externally can not be used when making a shared object; recompile with -fPIC
/tmp/ccXdBqMf.o: in function `foo':
a.c:(.text+0x0): dangerous relocation: unsupported relocation
collect2: error: ld returned 1 exit status
```
The rejection is mainly a historical issue to make pointer equality work with
`-fno-pic` code. The GNU ld idea is that:
* The compiler emits GOT-generating relocations for `-fpic` code (in reality it
does it for declarations but not for definitions).
* `-fno-pic` main executable uses direct access relocation types and gets a
canonical PLT entry.
* glibc ld.so resolves the GOT in the shared object to the canonical PLT entry.
Actually we can take the interpretation that a canonical PLT entry is
incompatible with a shared `STV_PROTECTED` definition, and reject the attempt
to create a canonical PLT entry (gold/LLD). And we can keep producing direct
access relocations referencing protected symbols for `-fpic` code.
`STV_PROTECTED` is no different from `STV_HIDDEN`.
On many architectures, a branch instruction uses a branch specific relocation
type (e.g. `R_AARCH64_CALL26`, `R_PPC64_REL24`, `R_RISCV_CALL_PLT`). This is
great because the address is insignificant and the linker can arrange for a
regular PLT if the symbol turns out to be external.
On i386, a branch in `-fno-pic` code emits an `R_386_PC32` relocation, which is
indistinguishable from an address taken operation. If the symbol turns out to
be external, the linker has to employ a trick called "canonical PLT entry"
(`st_shndx=0, st_value!=0`). The term is parlance among a few LLD
developers, but is not broadly adopted.
```c
// a.c
extern void foo(void);
int main() { foo(); }
```
```
% gcc -m32 -shared -fuse-ld=bfd -fpic b.c -o b.so
% gcc -m32 -fno-pic -no-pie -fuse-ld=lld a.c ./b.so
% gcc -m32 -fno-pic a.c ./b.so -fuse-ld=lld
ld.lld: error: cannot preempt symbol: foo
>>> defined in ./b.so
>>> referenced by a.c
>>> /tmp/ccDGhzEy.o:(main)
collect2: error: ld returned 1 exit status
% gcc -m32 -fno-pic -no-pie a.c ./b.so -fuse-ld=bfd
# canonical PLT entry; foo has different addresses in a.out and b.so.
% gcc -m32 -fno-pic -pie a.c ./b.so -fuse-ld=bfd
/usr/bin/ld.bfd: /tmp/ccZ3Rl8Y.o: warning: relocation against `foo' in read-only section `.text'
/usr/bin/ld.bfd: warning: creating DT_TEXTREL in a PIE
% gcc -m32 -fno-pic -pie a.c ./b.so -fuse-ld=bfd -z text
/usr/bin/ld.bfd: /tmp/ccUv8wXc.o: warning: relocation against `foo' in read-only section `.text'
/usr/bin/ld.bfd: read-only segment has dynamic relocations
collect2: error: ld returned 1 exit status
```
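The `st_shndx=0, st_value!=0` shape of a canonical PLT entry can be checked mechanically on a dynamic symbol table entry. Below is a minimal sketch for ELF32 (as on i386) using `<elf.h>`; the function name `sketch_is_canonical_plt` is hypothetical.

```c
#include <elf.h>
#include <stdbool.h>

/* An undefined function symbol (st_shndx == SHN_UNDEF) that nevertheless
 * carries a non-zero st_value: the value is the address of the PLT entry in
 * the executable, which then serves as the function's address. */
bool sketch_is_canonical_plt(const Elf32_Sym *dynsym_entry) {
    return ELF32_ST_TYPE(dynsym_entry->st_info) == STT_FUNC &&
           dynsym_entry->st_shndx == SHN_UNDEF &&
           dynsym_entry->st_value != 0;
}
```

In `readelf --dyn-syms` output such an entry shows `UND` in the `Ndx` column together with a non-zero `Value`.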
This used to be a problem for x86-64 as well, until ["x86-64: Generate branch
with PLT32 relocation"](https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bd7ab16b4537788ad53521c45469a1bdae84ad4a)
changed `call/jmp foo` to emit `R_X86_64_PLT32` instead of `R_X86_64_PC32`. Note:
(`-fpie`/`-fpic`) `call/jmp foo@PLT` always emits `R_X86_64_PLT32`.
The relocation type name is a bit misleading: `_PLT32` does not mean that a PLT
will always be created. Rather, it is optional: the linker can resolve `_PLT32`
to any place where the function will be called. If the symbol is preemptible,
the place is usually the PLT entry. If the symbol is non-preemptible, the
linker can convert `_PLT32` into `_PC32`. A function symbol can be either
branched to or have its address taken. For an address taken operation, the function symbol
is used in a manner similar to a data symbol. `R_386_PLT32` cannot be used. LLD
and gold will just reject the link if text relocations are disabled.
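The "optional PLT" reading of `R_X86_64_PLT32` can be sketched as a single computation. The psABI defines `R_X86_64_PC32` as `S + A - P` and `R_X86_64_PLT32` as `L + A - P`; the struct and function below are hypothetical names for illustration, not any linker's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; the letters follow the psABI notation. */
struct plt32_inputs {
    uint64_t S;           /* symbol address */
    uint64_t L;           /* address of the symbol's PLT entry, if created */
    int64_t  A;           /* addend, typically -4 for call/jmp */
    uint64_t P;           /* address of the field being relocated */
    bool     preemptible;
};

/* Value a linker could store for R_X86_64_PLT32: go through the PLT only
 * when the symbol is preemptible, otherwise behave like R_X86_64_PC32. */
int32_t sketch_resolve_plt32(struct plt32_inputs in) {
    uint64_t target = in.preemptible ? in.L : in.S;
    return (int32_t)(target + (uint64_t)in.A - in.P);
}
```

For a non-preemptible symbol no PLT entry needs to exist at all, so `L` is simply unused.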
On i386, my proposal is that branches to a default visibility function
declaration should use `R_386_PLT32` instead of `R_386_PC32`, in a manner
similar to x86-64. Originally I thought an assembler change sufficed:
https://sourceware.org/bugzilla/show_bug.cgi?id=27169. Please read the next
section for why this should be changed on the compiler side.
### Non-default visibility ifunc and `R_386_PC32`
For a call to a hidden function declaration, the compiler produces an
`R_386_PC32` relocation. The relocation is an indicator that EBX may not be set
up.
If the declaration refers to an ifunc definition, the linker will resolve the
`R_386_PC32` to an IPLT entry. For `-pie` and `-shared` links, the IPLT entry
references EBX. If the call site does not set up EBX to be
`_GLOBAL_OFFSET_TABLE_`, the IPLT call will be incorrect.
GNU ld has implemented a diagnostic (["i686 ifunc and non-default symbol
visibility"](https://sourceware.org/bugzilla/show_bug.cgi?id=20515)) to catch
the problem. If we change `call/jmp foo` to always use `R_386_PLT32`, such a
diagnostic will be lost.
Can we change the compiler to emit `call/jmp foo@PLT` for default visibility
function declarations? If the compiler emits such a modifier but does not set
up EBX, the ifunc can still be non-preemptible (e.g. hidden in another
translation unit or `-Bsymbolic`) and we will still have a dilemma.
Personally, I think avoiding a canonical PLT entry is more useful than a ld
ifunc diagnostic. i386 ABI is legacy and the x86 maintainer will not make the
change, though.
## Summary
I hope the above gives an overview to interested readers. Symbol interposition
is subtle. One has to think about all the factors related to symbol
interposition and the relevant toolchain fixes are like a whack-a-mole game. I
appreciate all the prior discussions and I believe many unsatisfactory things
can be fixed in a quite backward-compatible way.
Some features are inherently incompatible. We make the trade-off in favor of
more important features. Here are two things that should not work. However, if
`-fpie` or `-fno-direct-access-external-data` is specified, both limitations
will be circumvented.
* Copy relocations on protected data symbols.
* Canonical PLT entries on protected function symbols. With the `R_386_PLT32`
change, this issue will only affect function pointers.
People sometimes simply say: "protected visibility does not work." I'd
argue that Clang+gold/LLD works quite well.
Things on the GCC+GNU ld side are inconsistent, though. Here is a list of
changes I wish would happen:
* GCC: add `-f[no-]direct-access-external-data`.
* GCC: drop `HAVE_LD_PIE_COPYRELOC` in favor of `-f[no-]direct-access-external-data`.
* GCC x86-64: default to GOT indirection for external data symbols in `-fpie`
mode.
* GCC or GNU as i386: emit `R_386_PLT32` for branches to undefined function
symbols.
* GNU ld x86: disallow copy relocations on protected data symbols. (I think
canonical PLT entries on protected symbols have been disallowed.)
* GCC aarch64/arm/x86/...: allow direct access relocations on protected symbols
in `-fpic` mode.
* GNU ld aarch64/x86: allow direct access relocations on protected data symbols
in `-shared` mode.
The breaking changes for GCC+GNU ld:
* The "copy relocations on protected data symbols" scheme has been supported in
the past few years with GNU ld on x86, but it did not work before circa 2015,
and should not work in the future. Fortunately the breaking surface may be
narrow: this scheme does not work with gold or LLD. Many architectures don't
work.
* ld is not the only consumer of `R_386_PLT32`. The Linux kernel has code
resolving relocations and it needs to be fixed (patch uploaded: https://github.com/ClangBuiltLinux/linux/issues/1210).
I'll conclude this article with random notes on other binary formats:
Windows/COFF `__declspec(dllimport)` gives us a different perspective on how
external references can be designed. The annotation is verbose but
differentiates the two cases (1) the symbol has to be defined in the same
linkage unit (2) the symbol can be defined in another linkage unit. If we lift
the "the symbol visibility is decided by the most constrained visibility"
requirement for protected->default, a COFF undefined/defined symbol is quite
like a protected undefined/defined symbol in ELF. `__declspec(dllimport)` gives
the undefined symbol default visibility (i.e. the LLVM IR `dllimport` is
redundant). `__declspec(dllexport)` is something which cannot be modeled with
the existing ELF visibilities.
For an undefined variable, Mach-O uses `__attribute__((visibility("hidden")))`
to say "a definition must be available in another translation unit in the same
linkage unit" but does not actually mark the undefined symbol anyway. COFF uses
`__declspec(dllimport)` to convey this. In ELF,
`__attribute__((visibility("hidden")))` additionally makes the undefined symbol
unexportable. The Mach-O notation actually resembles COFF: it can be exported
by the definition in another translation unit. From its behavior, I think it
would be more appropriately mapped to LLVM IR protected instead of hidden.
## Appendix
For a `STB_GLOBAL`/`STB_WEAK` symbol,
`STV_DEFAULT`: both compiler & linker need to assume such symbols can be
preempted in `-fpic` mode. The compiler emits GOT indirection by default. GCC
`-fno-semantic-interposition` uses local aliases on defined non-weak function
symbols for x86 (unimplemented in other architectures). Clang
`-fno-semantic-interposition` uses local aliases on defined non-weak symbols
(both function and data) for x86.
`STV_PROTECTED`: GCC `-fpic` uses GOT indirection for data symbols, regardless
of defined or undefined. This pessimization is to make a misfeature "copy
relocation on protected data symbol" work
(https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#protected-data-symbols-and-direct-accesses).
Clang code generation treats `STV_PROTECTED` the same way as `STV_HIDDEN`.
`STV_HIDDEN`: non-preemptible, regardless of defined or undefined. The compiler
suppresses GOT indirection, unless undefined `STB_WEAK`.
For defined symbols, `-fno-pic`/`-fpie` can avoid GOT indirection for
`STV_DEFAULT` (and GCC `STV_PROTECTED`). `-fvisibility=hidden` can change
visibility.
For undefined symbols, `-fpie`/`-fpic` use GOT indirection by default. Clang
`-fdirect-access-external-data` (discussed in my article) can avoid GOT
indirection. If you use `-fpic -fdirect-access-external-data` & `ld
-shared`, you'll need additional linker options to make the linker know defined
non-`STB_LOCAL` `STV_DEFAULT` symbols are non-preemptible.

328
maskray-6.md Normal file
View File

@ -0,0 +1,328 @@
# GNU indirect function
UNDER CONSTRUCTION.
GNU indirect function (ifunc) is a mechanism making a direct function call
resolve to an implementation picked by a resolver. It is mainly used in glibc
but has adoption in FreeBSD.
For some performance critical functions, e.g. memcpy/memset/strcpy, glibc
provides multiple implementations optimized for different architecture levels.
The application just uses `memcpy(...)` which compiles to `call memcpy`. The
linker will create a PLT for `memcpy` and produce an associated special dynamic
relocation referencing the resolver symbol/address. During relocation resolving
at runtime, the return value of the resolver will be placed in the GOT entry
and the PLT entry will load the address.
## Representation
ifunc has a dedicated symbol type `STT_GNU_IFUNC` to mark it different from a
regular function (`STT_FUNC`). The value 10 is in the OS-specific range (10~12).
`readelf -s` tells you that the symbol is ifunc if OSABI is `ELFOSABI_GNU` or
`ELFOSABI_FREEBSD`.
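The dependency on the OSABI can be expressed as a small check. This is a sketch of the rule described above using `<elf.h>` constants; `sketch_displays_as_ifunc` is a hypothetical name and does not claim to match readelf's implementation.

```c
#include <elf.h>
#include <stdbool.h>

/* Symbol type 10 lies in the OS-specific range, so it is only interpreted
 * as STT_GNU_IFUNC when the OSABI assigns that meaning to it. */
bool sketch_displays_as_ifunc(unsigned char osabi, unsigned char st_info) {
    bool os_defines_ifunc =
        osabi == ELFOSABI_GNU || osabi == ELFOSABI_FREEBSD;
    return os_defines_ifunc && ELF64_ST_TYPE(st_info) == STT_GNU_IFUNC;
}
```

An object with `ELFOSABI_NONE` and a type-10 symbol fails the check, which corresponds to the display problem with LLVM-produced files mentioned below.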
On Linux, by default GNU as uses `ELFOSABI_NONE` (0). If ifunc is used, the OSABI
will be changed to `ELFOSABI_GNU`. Similarly, GNU ld sets the OSABI to
`ELFOSABI_GNU` if ifunc is used. gold does not do this [PR17735](https://sourceware.org/bugzilla/show_bug.cgi?id=17735).
Things are loose in LLVM. The integrated assembler and LLD do not set
`ELFOSABI_GNU`. Currently the only problem I know is the `readelf -s` display.
Everything else works fine.
### Assembler behavior
In assembly, you can assign the type `STT_GNU_IFUNC` to a symbol via
`.type foo, @gnu_indirect_function`. An ifunc symbol is typically `STB_GLOBAL`.
In the object file, `st_shndx` and `st_value` of an `STT_GNU_IFUNC` symbol
indicate the resolver. After linking, if the symbol is still `STT_GNU_IFUNC`,
its `st_value` field indicates the resolver address in the linked image.
Assemblers usually convert relocations referencing a local symbol to reference
the section symbol, but this behavior needs to be inhibited for `STT_GNU_IFUNC`.
### Example
```
cat > b.s <<e
.global ifunc
.type ifunc, @gnu_indirect_function
.set ifunc, resolver
resolver:
leaq impl(%rip), %rax
ret
impl:
movq $42, %rax
ret
e
cat > a.c <<e
int ifunc(void);
int main() { return ifunc(); }
e
cc a.c b.s
./a.out # exit code 42
```
GNU as also treats transitive aliases of an `STT_GNU_IFUNC` symbol as ifuncs.
```
.type foo,@gnu_indirect_function
.set foo, foo_resolver
.set foo2, foo
.set foo3, foo2
```
GCC and Clang support a function attribute which emits
`.type ifunc, @gnu_indirect_function; .set ifunc, resolver`:
```c
static int impl(void) { return 42; }
static void *resolver(void) { return impl; }
int ifunc(void) __attribute__((ifunc("resolver"))); // same type as impl
```
## Preemptible ifunc
A preemptible ifunc call is no different from a regular function call from the
linker perspective.
The linker creates a PLT entry, reserves an associated GOT entry, and emits an
`R_*_JUMP_SLOT` relocation resolving the address into the GOT entry. The PLT
code sequence is the same as a regular PLT for `STT_FUNC`.
If the ifunc is defined within the module, the symbol type in the linked image
is `STT_GNU_IFUNC`, otherwise (defined in a DSO), the symbol type is `STT_FUNC`.
The difference resides in the loader.
At runtime, the relocation resolver checks whether the `R_*_JUMP_SLOT`
relocation refers to an ifunc. If it does, instead of filling the GOT entry
with the target address, the resolver calls the target address as an indirect
function, with ABI specified additional parameters (hwcap related), and places
the return value into the GOT entry.
## Non-preemptible ifunc
The non-preemptible ifunc case is where all sorts of complexity come from.
First, the `R_*_JUMP_SLOT` relocation type cannot be used in some cases:
* A non-preemptible ifunc may not have a dynamic symbol table entry. It can be
local. It can be defined in the executable without the need to export.
* A non-local `STV_DEFAULT` symbol defined in a shared object is by default
preemptible. Using `R_*_JUMP_SLOT` for such a case will make the ifunc look
like preemptible.
Therefore a new relocation type `R_*_IRELATIVE` was introduced. There is no
associated symbol and the address indicates the resolver.
```
R_*_RELATIVE: B + A
R_*_IRELATIVE: call (B + A) as a function
R_*_JUMP_SLOT: S
```
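The three formulas can be compared side by side in C. This is a sketch with hypothetical names (`sketch_reloc_kind`, `sketch_reloc_value`, `sketch_example_resolver`); real relocation type numbers are per-architecture, and the extra hwcap-related resolver arguments that some ABIs pass are omitted.

```c
#include <stdint.h>

/* Illustrative relocation kinds, not real ELF type numbers. */
enum sketch_reloc_kind { SKETCH_RELATIVE, SKETCH_IRELATIVE, SKETCH_JUMP_SLOT };

typedef uintptr_t (*sketch_resolver_fn)(void);

/* B: load base, A: addend, S: address found by symbol lookup. */
uintptr_t sketch_reloc_value(enum sketch_reloc_kind kind,
                             uintptr_t B, uintptr_t A, uintptr_t S) {
    switch (kind) {
    case SKETCH_RELATIVE:
        return B + A;
    case SKETCH_IRELATIVE:
        /* B + A is the resolver; its return value is what gets stored. */
        return ((sketch_resolver_fn)(B + A))();
    case SKETCH_JUMP_SLOT:
        return S;
    }
    return 0;
}

/* A stand-in implementation and resolver for trying the function out. */
int sketch_impl(void) { return 42; }
uintptr_t sketch_example_resolver(void) { return (uintptr_t)&sketch_impl; }
```

Note that `SKETCH_IRELATIVE` needs no symbol at all, which is why it also works for local or unexported ifuncs.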
When an `R_*_JUMP_SLOT` can be used, there is a trade-off between
`R_*_JUMP_SLOT` and `R_*_IRELATIVE`: an `R_*_JUMP_SLOT` can be lazily resolved
but needs a symbol lookup. Currently powerpc can use `R_PPC64_JMP_SLOT` in some
cases [PR27203](https://sourceware.org/bugzilla/show_bug.cgi?id=27203).
A PLT entry is needed for two reasons:
* The call sites emit instructions like `call foo`. We need to forward them to a
place to perform the indirection. Text relocations are usually not an option
(exception: `-z ifunc-noplt`, described in its own section).
* If the ifunc is exported, we need a place to mark its canonical address.
Such PLT entries are sometimes referred to as IPLT. They are placed in the
synthetic section `.iplt`. In GNU ld, `.iplt` will be placed in the output
section `.plt`. In LLD, I decided that `.iplt` is better
https://reviews.llvm.org/D71520.
On many architectures (e.g. AArch64/PowerPC/x86), the PLT code sequence is the
same as a regular PLT, but it could be different.
On x86-64, the code sequence is:
```
jmp *got(%rip)
pushq $0
jmp .plt
```
Since there is no lazy binding, `pushq $0; jmp .plt` are not needed. However,
to make all PLT entries of the same shape to simplify linker implementations
and facilitate analyzers, it is fine to keep it this way.
## PowerPC32 `-msecure-plt` IPLT
As a design to work around the lack of PC-relative instructions, PowerPC32 uses
multiple GOT sections, one per file in `.got2`. To support multiple GOT
pointers, the addend on each `R_PPC_PLTREL24` reloc will have the offset within
`.got2`.
`-msecure-plt` has small/large PIC differences.
* `-fpic`/`-fpie`: `R_PPC_PLTREL24 r_addend=0`. The call stub loads an address
relative to `_GLOBAL_OFFSET_TABLE_`.
* `-fPIC`/`-fPIE`: `R_PPC_PLTREL24 r_addend=0x8000`. (A partial linked object
file may have an addend larger than 0x8000.) The call stub loads an address
relative to `.got2+0x8000`.
If a non-preemptible ifunc is referenced in two object files, in
`-pie`/`-shared` mode, the two object files cannot share the same IPLT entry.
When I added non-preemptible ifunc support for PowerPC32 to LLD
https://reviews.llvm.org/D71621, I did not handle this case.
### `.rela.dyn` vs `.rela.plt`
LLD placed `R_*_IRELATIVE` in the `.rela.plt` section because many ports of GNU
ld behaved this way. While implementing ifunc for PowerPC, I noticed that GNU
ld powerpc actually places `R_*_IRELATIVE` in `.rela.dyn` and glibc powerpc
does not actually support `R_*_IRELATIVE` in `.rela.plt`. This makes a lot of
sense to me because `.rela.plt` normally just contains `R_*_JUMP_SLOT` which
can be lazily resolved. ifunc relocations need to be eagerly resolved so
`.rela.plt` was the wrong place. Therefore I changed LLD to use `.rela.dyn` in
https://reviews.llvm.org/D65651.
## `__rela_iplt_start` and `__rela_iplt_end`
A statically linked position dependent executable traditionally had no dynamic
relocations.
With ifunc, these `R_*_IRELATIVE` relocations must be resolved at runtime. Such
relocations are in a magic array delimited by `__rela_iplt_start` and
`__rela_iplt_end`. In glibc, `csu/libc-start.c` has special code processing the
relocation range.
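The kind of loop that startup code runs over this range can be sketched as follows. It is not glibc's actual code: `sketch_apply_irel_range` and `sketch_demo_resolver` are hypothetical names, the sketch assumes a 64-bit x86-64 host, and it skips relocation types other than `R_X86_64_IRELATIVE`.

```c
#include <elf.h>
#include <stdint.h>

typedef Elf64_Addr (*sketch_irel_resolver)(void);

/* Walk [start, end) and apply each R_X86_64_IRELATIVE entry: call the
 * resolver at base + r_addend and store its result at base + r_offset.
 * `base` is 0 for a position dependent executable. */
unsigned sketch_apply_irel_range(const Elf64_Rela *start,
                                 const Elf64_Rela *end, uintptr_t base) {
    unsigned applied = 0;
    for (const Elf64_Rela *r = start; r < end; ++r) {
        if (ELF64_R_TYPE(r->r_info) != R_X86_64_IRELATIVE)
            continue;  /* ignored in this sketch */
        Elf64_Addr *slot = (Elf64_Addr *)(base + r->r_offset);
        sketch_irel_resolver resolver =
            (sketch_irel_resolver)(base + (uintptr_t)r->r_addend);
        *slot = resolver();
        ++applied;
    }
    return applied;
}

/* Stand-in resolver used only to exercise the loop. */
Elf64_Addr sketch_demo_resolver(void) { return 0x2a; }
```

In a real static executable the two pointers would be the linker-defined `__rela_iplt_start` and `__rela_iplt_end`; an empty range simply means there are no ifuncs to resolve.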
GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in `-pie`
mode. LLD defines `__rela_iplt_start` regardless of `-no-pie`, `-pie` or
`-shared`.
In glibc, static pie uses self-relocation (`_dl_relocate_static_pie`) to take
care of `R_*_IRELATIVE`. The above magic array code is executed by static pie
as well. If `__rela_iplt_start`/`__rela_iplt_end` are defined, we will get
`0 < __rela_iplt_start < __rela_iplt_end` in `csu/libc-start.c`.
`ARCH_SETUP_IREL` will crash when resolving the first relocation, which has
already been processed.
I think the difference in the
`diff -u =(ld.bfd --verbose) =(ld.bfd -pie --verbose)` output is unneeded.
https://sourceware.org/pipermail/libc-alpha/2021-January/121755.html
## Address significance
A non-GOT-generating non-PLT-generating relocation referencing a
`STT_GNU_IFUNC` indicates a potential address-taken operation.
With a function attribute, the compiler knows that a symbol indicates an ifunc
and will avoid generating such relocations. With assembly such relocations may
be unavoidable.
In most cases the linker needs to convert the symbol type to `STT_FUNC` and
create a special PLT entry, which is called a "canonical PLT entry" in LLD.
References from other modules will resolve to the PLT entry to keep pointer
equality: the address taken from the defining module should match the address
taken from another module.
This approach has pros and cons:
* With a canonical PLT entry, the resolver of a symbol is called only once.
There is exactly one `R_*_IRELATIVE` relocation.
* If the relocation appears in a non-`SHF_WRITE` section, a text relocation can
be avoided.
* Relocation types which are not valid dynamic relocation types are supported.
GNU ld may report the error: relocation `R_X86_64_PC32` against `STT_GNU_IFUNC`
symbol `ifunc` isn't supported.
* References will bind to the canonical PLT entry. A function call needs to
jump to the PLT, load the value from the GOT, then do an indirect call.
For a symbolic relocation type (a special case of absolute relocation types
where the width matches the word size) like `R_X86_64_64`, when the addend is 0
and the section has the `SHF_WRITE` flag, the linker can emit an
`R_X86_64_IRELATIVE`. https://reviews.llvm.org/D65995 dropped the case.
For the following example, GNU ld linked `a.out` calls `fff_resolver` three
times while LLD calls it once.
```c
// RUN: split-file %s %t
// RUN: clang -fuse-ld=bfd -fpic %t/dso.c -o %t/dso.so --shared
// RUN: clang -fuse-ld=bfd %t/main.c %t/dso.so -o %t/a.out
// RUN: %t/a.out
//--- dso.c
typedef void fptr(void);
extern void fff(void);
fptr *global_fptr0 = &fff;
fptr *global_fptr1 = &fff;
//--- main.c
#include <stdio.h>
static void fff_impl() { printf("fff_impl()\n"); }
static int z;
void *fff_resolver() { return (char *)&fff_impl + z++; }
__attribute__((ifunc("fff_resolver"))) void fff();
typedef void fptr(void);
fptr *local_fptr = fff;
extern fptr *global_fptr0, *global_fptr1;
int main() {
printf("local %p global0 %p global1 %p\n", local_fptr, global_fptr0, global_fptr1);
return 0;
}
```
### Relocation resolving order
`R_*_IRELATIVE` relocations are resolved eagerly. In glibc, there used to be a
problem where ifunc resolvers ran before `GL(dl_hwcap)` and `GL(dl_hwcap2)`
were set up https://sourceware.org/bugzilla/show_bug.cgi?id=27072.
For the relocation resolver, the main executable needs to be processed last
so that its `R_*_COPY` relocations work. Without ifunc, the resolving order of
shared objects can
be arbitrary.
For ifunc, if the ifunc is defined in a processed module, it is fine. If the
ifunc is defined in an unprocessed module, it may crash.
For an ifunc defined in an executable, calling it from a shared object can be
problematic because the executable's relocations haven't been resolved. The
issue can be circumvented by converting the non-preemptible ifunc defined in
the executable to `STT_FUNC`. GNU ld's x86 port made the change
[PR23169](https://sourceware.org/bugzilla/show_bug.cgi?id=23169).
## `-z ifunc-noplt`
Mark Johnston introduced `-z ifunc-noplt` for FreeBSD
https://reviews.llvm.org/D61613. With this option, all relocations referencing
`STT_GNU_IFUNC` will be emitted as dynamic relocations (if `.dynsym` is
created). The canonical PLT entry will not be used.
## Miscellaneous
GNU ld has implemented a diagnostic (["i686 ifunc and non-default symbol
visibility"](https://sourceware.org/bugzilla/show_bug.cgi?id=20515)) to flag
`R_386_PC32` referencing non-default visibility ifunc in `-pie` and `-shared`
links. This diagnostic looks like the most prominent reason blocking my
proposal to use `R_386_PLT32` for `call/jump foo`. See [Copy relocations,
canonical PLT entries and protected visibility](maskray-5.md) for details.
https://sourceware.org/glibc/wiki/GNU_IFUNC misses a lot of information. There
are quite a few arch differences. I asked for clarification
https://sourceware.org/pipermail/libc-alpha/2021-January/121752.html
### Dynamic loader
In glibc, `_dl_runtime_resolve` needs to save and restore vector and floating
point registers. ifunc resolvers add another reason that `_dl_runtime_resolve`
cannot use only integer registers. (The other reasons are that ld.so has string
function calls which may use vectors and external calls to libc.so.)

223
maskray-7.md Normal file

@ -0,0 +1,223 @@
# Everything I know about GNU toolchain
As mainly an LLVM person, I occasionally contribute to GNU toolchain projects.
This is sometimes for fun, sometimes for investigating why a (usually ancient)
feature works in a particular way, sometimes for pushing forward a toolchain
feature with both communities in mind, or sometimes just for getting a sense
of how things work with mailing list+GNU make.
For a debug build, I normally place my build directory `Debug` directly under
the project root.
## binutils
* Repository: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git
* Mailing list: https://sourceware.org/pipermail/binutils
* Bugzilla: https://sourceware.org/bugzilla/
* Main tools: as (`gas/`, GNU assembler), ld (`ld/`, GNU ld), gold (`gold/`,
GNU gold)
As of 2021-01, it has no wiki.
Target `all` builds targets `all-host` and `all-target`. When running
configure, most top-level directories (`binutils gas gdb gdbserver ld libctf`)
are enabled by default. You can disable some components via `--disable-*`.
`--enable-gold` is needed to enable gold.
```sh
mkdir Debug; cd Debug
../configure --target=x86_64-linux-gnu --prefix=/tmp/opt --disable-gdb --disable-gdbserver
```
For cross compiling, make sure you have `$target-{gcc,as,ld}`.
For many tools (binutils, gdb, ld), `--enable-targets=all` will build every
supported architecture and binary format. However, one gas build can only
support one architecture. ld has a default emulation and needs `-m` to support
other architectures (otherwise: `aarch64 architecture of input file 'a.o' is
incompatible with i386:x86-64 output`). Many tests are generic and can be run on
many targets, but a `--enable-targets=all` build only tests its default target.
```sh
# binutils (binutils/*)
make -C Debug all-binutils
# gas (gas/as-new)
make -C Debug all-gas
# ld (ld/ld-new)
make -C Debug all-ld
# Build all enabled tools.
make -C Debug all
```
Build with Clang:
```sh
mkdir -p out/clang-debug; cd out/clang-debug
../../configure CC=~/Stable/bin/clang CXX=~/Stable/bin/clang++ CFLAGS='-O0 -g' CXXFLAGS='-O0 -g'
```
On the security aspect, "don't run any of binutils as root" is sufficient
advice (Alan Modra).
## Test
GNU Test Framework DejaGnu is based on Expect, which is in turn based on Tcl.
To run tests:
```sh
make -C Debug check-binutils
# Find the result in (summary) Debug/binutils/binutils.sum and (details) Debug/binutils/binutils.log
make -C Debug check-gas
# Find the result in (summary) Debug/gas/testsuite/gas.sum and (details) Debug/gas/testsuite/gas.log
make -C Debug check-ld
# Test all enabled tools.
make -C Debug check-all
```
For ld, tests are listed in `.exp` files under `ld/testsuite`. A single test
normally consists of a `.d` file and several associated `.s` files.
To run the tests in `ld/testsuite/ld-shared/shared.exp`:
```sh
make -C Debug check-ld RUNTESTFLAGS=ld-shared/shared.exp
```
### Misc
* A bot updates bfd/version.h (`BFD_VERSION_DATE`) daily.
* Test coverage is low.
## gdb
gdb resides in the binutils-gdb repository. `configure` enables gdb and
gdbserver by default. You just need to make sure `--disable-gdb
--disable-gdbserver` is not on the configure line.
Run gdb under the build directory:
```sh
gdb/gdb -data-directory gdb/data-directory
```
To run the tests in `gdb/testsuite/gdb.dwarf2/dw2-abs-hi-pc.exp`:
```sh
make check-gdb RUNTESTFLAGS=gdb.dwarf2/dw2-abs-hi-pc.exp
# cd $build/gdb/testsuite/outputs/gdb.dwarf2/dw2-abs-hi-pc
```
## glibc
* Repository: https://sourceware.org/git/gitweb.cgi?p=glibc.git
* Wiki: https://sourceware.org/glibc/wiki/
* Bugzilla: https://sourceware.org/bugzilla/
* Mailing lists: `{libc-announce,libc-alpha,libc-locale,libc-stable,libc-help}@sourceware.org`
(Mostly) an implementation of the user-space side of standard C/POSIX functions
with Linux extensions.
A very unfortunate fact: glibc can only be built with `-O2`, not `-O0` or
`-O1`. If you want to have an un-optimized debug build, deleting an object file
and recompiling it with `-g` usually works. Another workaround is `#pragma GCC
optimize ("O0")`.
The `-O2` issue is probably related to (1) expected inlining and (2) avoiding
dynamic relocations.
Run the following commands to populate `/tmp/glibc-many` with toolchains.
Caution: please make sure the target file system has tens of gigabytes.
Preparation:
```sh
scripts/build-many-glibcs.py /tmp/glibc-many checkout --shallow
scripts/build-many-glibcs.py /tmp/glibc-many host-libraries
scripts/build-many-glibcs.py /tmp/glibc-many compilers aarch64-linux-gnu
scripts/build-many-glibcs.py /tmp/glibc-many compilers powerpc64le-linux-gnu
scripts/build-many-glibcs.py /tmp/glibc-many compilers sparc64-linux-gnu
```
* `--shallow` passes `--depth 1` to the git clone command.
* `--keep all` keeps intermediate build directories intact. You may want this
option to investigate build issues.
The `glibcs` command will delete the glibc build directory, build glibc, and
run `make check`.
```sh
scripts/build-many-glibcs.py /tmp/glibc-many glibcs aarch64-linux-gnu
# Find the logs and test results under /tmp/glibc-many/logs/glibcs/aarch64-linux-gnu/
scripts/build-many-glibcs.py /tmp/glibc-many glibcs powerpc64le-linux-gnu
scripts/build-many-glibcs.py /tmp/glibc-many glibcs sparc64-linux-gnu
```
"On build-many-glibcs.py and most stage1 compiler bootstrap, gcc is build
statically against newlib. the static linked gcc (with a lot of disabled
features) is then used to build glibc and then the stage2 gcc (which will then
have all the features that rely on libc enabled) so the stage1 gcc *might* not
have the require started files"
During development, some interesting targets:
```sh
make -C Debug check-abi
```
Building with Clang is not an option.
* Clang does not support GCC nested functions [BZ #27220](https://sourceware.org/bugzilla/show_bug.cgi?id=27220)
* x86 `PRESERVE_BND_REGS_PREFIX`: integrated assembler does not support the
`bnd` prefix.
* `sysdeps/powerpc/powerpc64/Makefile`: Clang does not support
`-ffixed-vrsave -ffixed-vscr`
## GCC
* Mailing lists: `gcc-{patches,regression}@sourceware.org`
`--disable-bootstrap` is the most important option, otherwise you will get a
stage 2 build. It is not clear what make does when you touch a source file. It
definitely rebuilds stage1, but it is not clear to me how well the stage2
dependency is handled. Anyway, touching a source file and triggering a full
rebuild is not what you want.
```sh
../configure --disable-bootstrap --enable-languages=c,c++ --disable-multilib
make -j 30
# Incremental build
make -C gcc cc1 cc1plus xgcc
make -C x86_64-pc-linux-gnu/libstdc++-v3
```
Use built libstdc++ and libgcc.
```sh
$build/gcc/xg++ -B $build/release/gcc forced1.C -Wl,-rpath,$build/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs,-rpath,$build/x86_64-pc-linux-gnu/libgcc
```
### Misc
* A bot updates `ChangeLog` files daily (commit message: `Daily bump.`).
## Unlisted
autotools, bison, m4, make, ...
### Contributing
[GNU Coding Standards](https://www.gnu.org/prep/standards/). Emacs has good
built-in support. clang-format's support is not as good.
Legally significant changes need [Copyright Papers](https://www.gnu.org/prep/maintain/html_node/Copyright-Papers.html).

253
maskray-8.md Normal file

@ -0,0 +1,253 @@
# Metadata sections, COMDAT and `SHF_LINK_ORDER`
## COMDAT
In C++, inline functions, template instantiations and a few other things can be
defined in multiple object files but need deduplication at link time. In the
dark ages the functionality was implemented by weak definitions: the linker
does not report duplicate definition errors and resolves the references to the
first definition. The downside is that unneeded copies remained in the linked
image.
In Microsoft PE file format, the section flag (`IMAGE_SCN_LNK_COMDAT`) marks a
section COMDAT and enables deduplication on a per-section basis. If a text
section needs a data section and deduplication is needed for both sections, two
COMDAT symbols are needed.
In the GNU world, `.gnu.linkonce.` was invented to deduplicate groups with just
one member. `.gnu.linkonce.` has long been obsoleted in favor of section groups
but its usage lingered until 2020. Adhemerval Zanella removed the
last live glibc use case for `.gnu.linkonce.`
[BZ #20543](http://sourceware.org/PR20543).
## ELF section groups
The ELF specification generalized this use case to allow an arbitrary number of
sections to be interrelated.
> Some sections occur in interrelated groups. For example, an out-of-line
> definition of an inline function might require, in addition to the section
> containing its executable instructions, a read-only data section containing
> literals referenced, one or more debugging information sections and other
> informational sections. Furthermore, there may be internal references among
> these sections that would not make sense if one of the sections were removed
> or replaced by a duplicate from another object. Therefore, such groups must
> be included or omitted from the linked object as a unit. A section cannot be
> a member of more than one group.
According to "such groups must be included or omitted from the linked object as
a unit", a linker's garbage collection feature must retain or discard the
sections as a unit.
The most common section group flag is `GRP_COMDAT`, which makes the member
sections similar to COMDAT in Microsoft PE file format, but can apply to
multiple sections. (The committee borrowed the name "COMDAT" from PE.)
> This is a COMDAT group. It may duplicate another COMDAT group in another
> object file, where duplication is defined as having the same group signature.
> In such cases, only one of the duplicate groups may be retained by the
> linker, and the members of the remaining groups must be discarded.
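The retention rule quoted above can be modelled in a few lines of C. This is only a sketch: `struct group`, `dedup_comdat` and the sample signatures are hypothetical names, and keeping the first group seen is an assumption about typical linker behavior (the specification only says that one of the duplicates may be retained).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a COMDAT group: the object file it came from and its
   group signature (the name of the signature symbol). */
struct group {
  const char *file;
  const char *signature;
  bool retained;
};

/* Retain the first group seen for each signature and discard later groups
   with the same signature. Returns the number of retained groups. */
static size_t dedup_comdat(struct group *groups, size_t n) {
  size_t kept = 0;
  for (size_t i = 0; i < n; i++) {
    assert(groups[i].signature != NULL);
    groups[i].retained = true;
    for (size_t j = 0; j < i; j++) {
      if (groups[j].retained &&
          strcmp(groups[j].signature, groups[i].signature) == 0) {
        groups[i].retained = false; /* duplicate signature: discard */
        break;
      }
    }
    if (groups[i].retained)
      kept++;
  }
  return kept;
}
```

With groups `{a.o, _Z3foov}`, `{b.o, _Z3foov}` and `{b.o, _Z3barv}`, the model retains the first and the third and discards the second along with all its member sections.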
I want to highlight one thing GCC does (and Clang inherits) for backward
compatibility: the definitions relative to a COMDAT group member are kept
`STB_WEAK` instead of `STB_GLOBAL`. The idea is that an old toolchain which does
not recognize COMDAT groups can still operate correctly, just in a degraded
manner.
## Metadata sections
Many compiler options instrument text sections or annotate text sections, and
need to create a metadata section for (almost) every text section. Such
metadata sections have some characteristics:
* All relocations from the metadata section reference the associated text
section.
* The metadata section is only referenced by the associated text section or not
referenced at all.
Below is an example:
```
.section .text.foo,"ax",@progbits
.section .meta.foo,"a",@progbits
.quad .text.foo-.
```
Users want GC semantics for such metadata sections: if `.text.foo` is retained,
`.meta.foo` is retained. Note: the regular GC semantics are converse: if
`.meta.foo` is retained, `.text.foo` is retained.
To achieve the desired GC semantics on ELF platforms, we could use a non-COMDAT
section group. However, using a section group requires one extra section
(usually named `.group`), which requires 40 bytes on ELFCLASS32 platforms and
64 bytes on ELFCLASS64 platforms. To put it another way, to represent the
metadata of a text section, we need two sections (the metadata section and the
section group), 128 bytes on ELFCLASS64 platforms. The size overhead is
concerning in many applications. (AArch64 and x86-64 define ILP32 ABIs and use
ELFCLASS32, but technically they can use ELFCLASS32 for small code model with
regular ABIs, if the kernel allows.)
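The 40/64/128 byte figures follow directly from the section header layout. Below is a minimal sketch that mirrors the generic ABI's `Elf32_Shdr` and `Elf64_Shdr` field layout with fixed-width types; the struct and function names (`shdr32`, `shdr64`, `shdr_overhead64`) are made up for illustration, and the sketch counts section header table entries only, not the contents of the `.group` section itself.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Mirrors Elf32_Shdr: ten 32-bit fields. */
struct shdr32 {
  uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset;
  uint32_t sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
};

/* Mirrors Elf64_Shdr: flags, address, offset, sizes and alignment widen to
   64 bits; sh_name, sh_type, sh_link and sh_info stay 32-bit. */
struct shdr64 {
  uint32_t sh_name, sh_type;
  uint64_t sh_flags, sh_addr, sh_offset, sh_size;
  uint32_t sh_link, sh_info;
  uint64_t sh_addralign, sh_entsize;
};

static_assert(sizeof(struct shdr32) == 40, "ELFCLASS32 section header");
static_assert(sizeof(struct shdr64) == 64, "ELFCLASS64 section header");

/* Section header bytes spent on `nsections` extra sections (ELFCLASS64). */
static size_t shdr_overhead64(size_t nsections) {
  return nsections * sizeof(struct shdr64);
}
```

`shdr_overhead64(2)` covers the metadata section plus its `.group` section, which is where the 128 bytes per text section come from.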
In a generic-abi thread, Cary Coutant initially suggested using a new section
flag `SHF_ASSOCIATED`. HP-UX and Solaris folks objected to a new generic flag.
Cary Coutant then discussed with Jim Dehnert and noticed that the existing
(rare) flag `SHF_LINK_ORDER` has semantics closer to the metadata GC semantics,
so he intended to replace the existing flag `SHF_LINK_ORDER`. Solaris had used
its own `SHF_ORDERED` extension before it migrated to the ELF simplification
`SHF_LINK_ORDER`. Solaris is still using `SHF_LINK_ORDER` so the flag cannot be
repurposed. People discussed whether `SHF_OS_NONCONFORMING` could be repurposed
but did not take that route: the platform already knows whether a flag is
unknown and knowing a flag is non-conforming does not help produce better
output. In the end the agreement was that `SHF_LINK_ORDER` gained additional
metadata GC semantics.
The new semantics:
> This flag adds special ordering requirements for link editors. The
> requirements apply to the referenced section identified by the sh_link field
> of this section's header. If this section is combined with other sections in
> the output file, the section must appear in the same relative order with
> respect to those sections, as the referenced section appears with respect to
> sections the referenced section is combined with.
>
> A typical use of this flag is to build a table that references text or data
> sections in address order.
>
> In addition to adding ordering requirements, `SHF_LINK_ORDER` indicates that
> the section contains metadata describing the referenced section. When
> performing unused section elimination, the link editor should ensure that
> both the section and the referenced section are retained or discarded
> together. Furthermore, relocations from this section into the referenced
> section should not be taken as evidence that the referenced section should be
> retained.
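To make the GC part of the wording concrete, here is a toy mark phase in C. It is a sketch under simplifying assumptions, not how any real linker is structured: `struct section`, `struct reloc` and `mark_live` are hypothetical, and only the direction "linked-to section retained implies metadata retained" is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct section {
  const char *name;
  bool link_order; /* SHF_LINK_ORDER is set */
  int linked_to;   /* index of the sh_link section, or -1 */
  bool live;
};

struct reloc {
  int from, to; /* section indexes */
};

/* Iterate to a fixed point starting from the GC roots. */
static void mark_live(struct section *secs, size_t nsecs,
                      const struct reloc *relocs, size_t nrelocs,
                      const int *roots, size_t nroots) {
  for (size_t i = 0; i < nsecs; i++)
    secs[i].live = false;
  for (size_t i = 0; i < nroots; i++) {
    assert(roots[i] >= 0 && (size_t)roots[i] < nsecs);
    secs[roots[i]].live = true;
  }

  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t i = 0; i < nrelocs; i++) {
      const struct section *from = &secs[relocs[i].from];
      struct section *to = &secs[relocs[i].to];
      if (!from->live || to->live)
        continue;
      /* A relocation from a SHF_LINK_ORDER section into its linked-to
         section is not evidence that the linked-to section is needed. */
      if (from->link_order && from->linked_to == relocs[i].to)
        continue;
      to->live = true;
      changed = true;
    }
    /* A SHF_LINK_ORDER section is retained once its linked-to section is. */
    for (size_t i = 0; i < nsecs; i++) {
      if (secs[i].link_order && !secs[i].live && secs[i].linked_to >= 0 &&
          secs[secs[i].linked_to].live) {
        secs[i].live = true;
        changed = true;
      }
    }
  }
}
```

With `.text.main` as the only root and a single relocation from `.text.main` to `.text.foo`, the model keeps `.text.foo` and `.meta.foo` and drops an unreferenced `.text.bar` together with `.meta.bar`, even though `.meta.bar` has a relocation into `.text.bar`.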
Actually, ARM EHABI has been using `SHF_LINK_ORDER` for index table sections
`.ARM.exidx*`. A `.ARM.exidx` section contains a sequence of 2-word pairs. The
first word is a 31-bit PC-relative offset to the start of the region. The idea is
that if the entries are ordered by the start address, the end address of an
entry is implicitly the start address of the next entry and does not need to be
explicitly encoded. For this reason the section uses `SHF_LINK_ORDER` for the
ordering requirement. The GC semantics are very similar to the metadata
sections'.
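The implicit end address is easy to see in code. The sketch below assumes that "PC-relative" means relative to the address of the word itself (as with `R_ARM_PREL31`) and that bit 31 of the first word is not part of the offset; `prel31_target`, `struct exidx_entry` and `region_end` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 31-bit place-relative offset. `place` is the address of the word.
   Unsigned wraparound implements the signed addition. */
static uint32_t prel31_target(uint32_t place, uint32_t word) {
  uint32_t offset = word & 0x7fffffffu; /* drop bit 31 */
  if (offset & 0x40000000u)
    offset |= 0x80000000u; /* sign-extend from bit 30 */
  return place + offset;
}

/* One index table entry: two 32-bit words. */
struct exidx_entry {
  uint32_t start_word; /* 31-bit offset to the start of the region */
  uint32_t second_word;
};

/* Start address of region i; the table itself is loaded at table_addr. */
static uint32_t region_start(const struct exidx_entry *table,
                             uint32_t table_addr, size_t i) {
  return prel31_target(table_addr + (uint32_t)i * 8u, table[i].start_word);
}

/* End address of region i: the start address of region i+1. This only works
   if the entries are sorted by start address, hence SHF_LINK_ORDER. */
static uint32_t region_end(const struct exidx_entry *table,
                           uint32_t table_addr, size_t i, size_t n) {
  assert(i + 1 < n);
  return region_start(table, table_addr, i + 1);
}
```

The last entry has no successor, so a real table needs some way to terminate the final region; the sketch simply refuses to answer for it.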
So the updated `SHF_LINK_ORDER` wording can be seen as recognition for the
current practice (even though the original discussion did not actually notice
ARM EHABI).
However, in binutils, before 2.35, `SHF_LINK_ORDER` could be produced by ARM
assembly directives, but not specified by user-customized sections.
## C identifier name sections
A section whose name consists of pure C-like identifier characters (isalnum
characters in the C locale plus `_`) is considered a GC root by ld
`--gc-sections`. The idea is that the linker-defined `__start_foo` and
`__stop_foo` are used to delimit the output section foo. Even if input sections
foo are not referenced by other sections, `__start_foo`/`__stop_foo` is a signal
that foo should be retained.
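For readers who have not used the `__start_`/`__stop_` convention, here is a minimal sketch. It assumes GCC or Clang on an ELF target linked with GNU ld or LLD; `my_meta`, `entry_a`, `entry_b`, `meta_count` and `meta_sum` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Place two objects in a section whose name is a valid C identifier. `used`
   stops the compiler from dropping the otherwise unreferenced objects. */
__attribute__((section("my_meta"), used)) static const int entry_a = 1;
__attribute__((section("my_meta"), used)) static const int entry_b = 2;

/* Defined by the linker because `my_meta` is a C identifier name. */
extern const int __start_my_meta[];
extern const int __stop_my_meta[];

/* Number of int-sized entries in the output section. */
static size_t meta_count(void) {
  assert(__stop_my_meta >= __start_my_meta);
  return (size_t)(__stop_my_meta - __start_my_meta);
}

/* Walk the output section from its start symbol to its stop symbol. */
static int meta_sum(void) {
  int sum = 0;
  for (const int *p = __start_my_meta; p != __stop_my_meta; ++p)
    sum += *p;
  return sum;
}
```

Nothing references `entry_a` or `entry_b` by name; the code only reaches them through the two delimiter symbols, which is why the linker treats such sections as GC roots.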
The metadata use case requires an amendment of the rule: if `SHF_LINK_ORDER` is
set on foo, foo can be GCed (LLD r294592).
GNU ld does not implement this rule yet. https://sourceware.org/bugzilla/show_bug.cgi?id=27259
## Pitfalls
### Mixed unordered and ordered sections
If an output section consists of only non-`SHF_LINK_ORDER` sections, the rule is
clear: input sections are ordered in their input order. If an output section
consists of only `SHF_LINK_ORDER` sections, the rule is also clear: input
sections are ordered with respect to their linked-to sections.
What is unclear is how to handle an output section with mixed unordered and
ordered sections.
GNU ld had a diagnostic for this case. LLD rejected the case as well:
`error: incompatible section flags for .rodata`.
When I implemented `-fpatchable-function-entry=` for Clang, I observed some GC
related issues with the GCC implementation. I reported them and carefully chose
`SHF_LINK_ORDER` in the Clang implementation if the integrated assembler is
used.
This was a problem if the user wanted to place such input sections along with
unordered sections, e.g.
`.init.data : { ... KEEP(*(__patchable_function_entries)) ... }`
(https://github.com/ClangBuiltLinux/linux/issues/953).
As a response, I submitted https://reviews.llvm.org/D77007 to allow ordered
input section descriptions within an output section.
This worked well for the Linux kernel. Mixed unordered and ordered sections
within an input section description was still a problem. This made it
infeasible to add `SHF_LINK_ORDER` to an existing metadata section and expect
new object files linkable with old object files which do not have the flag. I
asked how to resolve this upgrade issue and Ali Bahrami responded:
> The Solaris linker puts sections without `SHF_LINK_ORDER` at the end of the
> output section, in first-in-first-out order, and I don't believe that's
> considered to be an error.
So I went ahead and implemented a similar rule for LLD:
https://reviews.llvm.org/D84001 allows arbitrary mix and places
`SHF_LINK_ORDER` sections before non-`SHF_LINK_ORDER` sections.
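The resulting layout rule can be sketched as a single sort. This is an illustration of the rule as described here, not LLD's code: `struct input_section`, `compare_sections` and `order_sections` are hypothetical, and the linked-to section is reduced to one address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct input_section {
  const char *name;
  bool link_order;              /* SHF_LINK_ORDER is set */
  unsigned long linked_to_addr; /* address of the linked-to section */
  size_t input_index;           /* filled in by order_sections */
};

/* SHF_LINK_ORDER sections come first, sorted by the address of their
   linked-to section; the remaining sections keep their input order. */
static int compare_sections(const void *a, const void *b) {
  const struct input_section *x = a, *y = b;
  if (x->link_order != y->link_order)
    return x->link_order ? -1 : 1;
  if (x->link_order && x->linked_to_addr != y->linked_to_addr)
    return x->linked_to_addr < y->linked_to_addr ? -1 : 1;
  /* Tie-break on input order so the result is deterministic. */
  return (x->input_index > y->input_index) - (x->input_index < y->input_index);
}

static void order_sections(struct input_section *secs, size_t n) {
  assert(secs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    secs[i].input_index = i; /* remember the original position */
  qsort(secs, n, sizeof *secs, compare_sections);
}
```

For the input order `u0, o_b (0x2000), u1, o_a (0x1000)`, the model produces `o_a, o_b, u0, u1`.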
### If the associated section is discarded
We decided that the integrated assembler allows `SHF_LINK_ORDER` with
`sh_link=0` and LLD can handle such sections as regular unordered sections
(https://reviews.llvm.org/D72904).
### Other pitfalls
* During `--icf={safe,all}`, `SHF_LINK_ORDER` sections should not be separately
considered.
* In relocatable output, `SHF_LINK_ORDER` sections cannot be combined by name.
* When comparing two input sections with different linked-to output sections,
use vaddr of output sections instead of section indexes. Peter Smith fixed
this in https://reviews.llvm.org/D79286.
## Miscellaneous
Arm Compiler 5 splits up DWARF Version 3 debug information and puts these
sections into comdat groups. On "monolithic input section handling", Peter
Smith commented that:
> We found that splitting up the debug into fragments works well as it permits
> the linker to ensure that all the references to local symbols are to sections
> within the same group, this makes it easy for the linker to remove all the
> debug when the group isn't selected.
>
> This approach did produce significantly more debug information than gcc did.
> For small microcontroller projects this wasn't a problem. For larger feature
> phone problems we had to put a lot of work into keeping the linker's memory
> usage down as many of our customers at the time were using 32-bit Windows
> machines with a default maximum virtual memory of 2Gb.
COMDAT sections have size overhead on extra section headers. Developers may be
tempted to decrease the overhead with `SHF_LINK_ORDER`. However, the approach
does not work due to the ordering requirement. Consider the following
fragments:
```
header [a.o common]
- DW_TAG_compile_unit [a.o common]
-- DW_TAG_variable [a.o .data.foo]
-- DW_TAG_namespace [common]
--- DW_TAG_subprogram [a.o .text.bar]
--- DW_TAG_variable [a.o .data.baz]
footer [a.o common]
header [b.o common]
- DW_TAG_compile_unit [b.o common]
-- DW_TAG_variable [b.o .data.foo]
-- DW_TAG_namespace [common]
--- DW_TAG_subprogram [b.o .text.bar]
--- DW_TAG_variable [b.o .data.baz]
footer [b.o common]
```
`DW_TAG_*` tags associated with concrete sections can be represented with
`SHF_LINK_ORDER` sections. After linking the sections will be ordered before the
common parts.