more articles
parent
58a5061601
commit
589177c6c2
11
README.md
|
@ -43,3 +43,14 @@ Other articles included as well:
|
||||||
* [Executable stack](executable-stack.md)
|
* [Executable stack](executable-stack.md)
|
||||||
* [Piece of PIE](piece-of-pie.md)
|
* [Piece of PIE](piece-of-pie.md)
|
||||||
|
|
||||||
|
Even more articles, from [MaskRay's blog](https://maskray.me/blog/):
|
||||||
|
|
||||||
|
* [Stack unwinding](maskray-1.md)
|
||||||
|
* [All about symbol versioning](maskray-2.md)
|
||||||
|
* [C++ exception handling ABI](maskray-3.md)
|
||||||
|
* [LLD and GNU linker incompatibilities](maskray-4.md)
|
||||||
|
* [Copy relocations, canonical PLT entries and protected visibility](maskray-5.md)
|
||||||
|
* [GNU indirect function](maskray-6.md)
|
||||||
|
* [Everything I know about GNU toolchain](maskray-7.md)
|
||||||
|
* [Metadata sections, COMDAT and `SHF_LINK_ORDER`](maskray-8.md)
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,123 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||||
|
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
|
||||||
|
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
|
||||||
|
<!-- Generated by graphviz version 2.43.0 (0)
|
||||||
|
-->
|
||||||
|
<!-- Title: %3 Pages: 1 -->
|
||||||
|
<svg width="630pt" height="224pt"
|
||||||
|
viewBox="0.00 0.00 630.00 224.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||||
|
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 220)">
|
||||||
|
<title>%3</title>
|
||||||
|
<polygon fill="white" stroke="transparent" points="-4,4 -4,-220 626,-220 626,4 -4,4"/>
|
||||||
|
<!-- eh_frame -->
|
||||||
|
<g id="node1" class="node">
|
||||||
|
<title>eh_frame</title>
|
||||||
|
<polygon fill="none" stroke="black" points="0,-146.5 0,-215.5 622,-215.5 622,-146.5 0,-146.5"/>
|
||||||
|
<text text-anchor="middle" x="311" y="-200.3" font-family="Times,serif" font-size="14.00">.eh_frame</text>
|
||||||
|
<polyline fill="none" stroke="black" points="0,-192.5 622,-192.5 "/>
|
||||||
|
<text text-anchor="middle" x="131" y="-177.3" font-family="Times,serif" font-size="14.00">FDE0</text>
|
||||||
|
<polyline fill="none" stroke="black" points="0,-169.5 262,-169.5 "/>
|
||||||
|
<text text-anchor="middle" x="49" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
|
||||||
|
<polyline fill="none" stroke="black" points="98,-146.5 98,-169.5 "/>
|
||||||
|
<text text-anchor="middle" x="148.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_personality</text>
|
||||||
|
<polyline fill="none" stroke="black" points="199,-146.5 199,-169.5 "/>
|
||||||
|
<text text-anchor="middle" x="230.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_lsda</text>
|
||||||
|
<polyline fill="none" stroke="black" points="262,-146.5 262,-192.5 "/>
|
||||||
|
<text text-anchor="middle" x="393" y="-177.3" font-family="Times,serif" font-size="14.00">FDE1</text>
|
||||||
|
<polyline fill="none" stroke="black" points="262,-169.5 524,-169.5 "/>
|
||||||
|
<text text-anchor="middle" x="311" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
|
||||||
|
<polyline fill="none" stroke="black" points="360,-146.5 360,-169.5 "/>
|
||||||
|
<text text-anchor="middle" x="410.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_personality</text>
|
||||||
|
<polyline fill="none" stroke="black" points="461,-146.5 461,-169.5 "/>
|
||||||
|
<text text-anchor="middle" x="492.5" y="-154.3" font-family="Times,serif" font-size="14.00">.cfi_lsda</text>
|
||||||
|
<polyline fill="none" stroke="black" points="524,-146.5 524,-192.5 "/>
|
||||||
|
<text text-anchor="middle" x="573" y="-177.3" font-family="Times,serif" font-size="14.00">FDE2</text>
|
||||||
|
<polyline fill="none" stroke="black" points="524,-169.5 622,-169.5 "/>
|
||||||
|
<text text-anchor="middle" x="573" y="-154.3" font-family="Times,serif" font-size="14.00">initial_location</text>
|
||||||
|
</g>
|
||||||
|
<!-- text_a -->
|
||||||
|
<g id="node2" class="node">
|
||||||
|
<title>text_a</title>
|
||||||
|
<polygon fill="none" stroke="black" points="131.5,-0.5 131.5,-36.5 210.5,-36.5 210.5,-0.5 131.5,-0.5"/>
|
||||||
|
<text text-anchor="middle" x="171" y="-14.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->text_a -->
|
||||||
|
<g id="edge1" class="edge">
|
||||||
|
<title>eh_frame:loc0->text_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M49,-146C49,-113.05 42.18,-99.33 62,-73 76.62,-53.58 100.2,-40.8 121.68,-32.61"/>
|
||||||
|
<polygon fill="black" stroke="black" points="123.12,-35.82 131.37,-29.17 120.78,-29.22 123.12,-35.82"/>
|
||||||
|
</g>
|
||||||
|
<!-- text_b -->
|
||||||
|
<g id="node3" class="node">
|
||||||
|
<title>text_b</title>
|
||||||
|
<polygon fill="none" stroke="black" points="314,-0.5 314,-36.5 394,-36.5 394,-0.5 314,-0.5"/>
|
||||||
|
<text text-anchor="middle" x="354" y="-14.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->text_b -->
|
||||||
|
<g id="edge4" class="edge">
|
||||||
|
<title>eh_frame:loc1->text_b</title>
|
||||||
|
<path fill="none" stroke="black" d="M311,-146C311,-112.2 360.65,-139.01 378,-110 389.9,-90.1 381.08,-64.27 370.95,-45.31"/>
|
||||||
|
<polygon fill="black" stroke="black" points="373.96,-43.53 365.95,-36.6 367.89,-47.02 373.96,-43.53"/>
|
||||||
|
</g>
|
||||||
|
<!-- text_c -->
|
||||||
|
<g id="node4" class="node">
|
||||||
|
<title>text_c</title>
|
||||||
|
<polygon fill="none" stroke="black" points="533.5,-73.5 533.5,-109.5 612.5,-109.5 612.5,-73.5 533.5,-73.5"/>
|
||||||
|
<text text-anchor="middle" x="573" y="-87.8" font-family="Times,serif" font-size="14.00">.text._Z1cv</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->text_c -->
|
||||||
|
<g id="edge7" class="edge">
|
||||||
|
<title>eh_frame:loc2->text_c</title>
|
||||||
|
<path fill="none" stroke="black" d="M573,-146C573,-137.51 573,-128.26 573,-119.88"/>
|
||||||
|
<polygon fill="black" stroke="black" points="576.5,-119.85 573,-109.85 569.5,-119.85 576.5,-119.85"/>
|
||||||
|
</g>
|
||||||
|
<!-- text_personality -->
|
||||||
|
<g id="node5" class="node">
|
||||||
|
<title>text_personality</title>
|
||||||
|
<polygon fill="none" stroke="black" points="71.5,-73.5 71.5,-109.5 236.5,-109.5 236.5,-73.5 71.5,-73.5"/>
|
||||||
|
<text text-anchor="middle" x="154" y="-87.8" font-family="Times,serif" font-size="14.00">.text.__gxx_personality_v0</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->text_personality -->
|
||||||
|
<g id="edge2" class="edge">
|
||||||
|
<title>eh_frame:personality0->text_personality</title>
|
||||||
|
<path fill="none" stroke="black" d="M148,-146C148,-137.46 148.76,-128.19 149.75,-119.81"/>
|
||||||
|
<polygon fill="black" stroke="black" points="153.23,-120.16 151.07,-109.79 146.29,-119.25 153.23,-120.16"/>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->text_personality -->
|
||||||
|
<g id="edge5" class="edge">
|
||||||
|
<title>eh_frame:personality1->text_personality</title>
|
||||||
|
<path fill="none" stroke="black" d="M411,-146C411,-143.84 320.57,-125.41 246.99,-110.78"/>
|
||||||
|
<polygon fill="black" stroke="black" points="247.22,-107.26 236.73,-108.75 245.86,-114.13 247.22,-107.26"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda -->
|
||||||
|
<g id="node6" class="node">
|
||||||
|
<title>lsda</title>
|
||||||
|
<polygon fill="none" stroke="black" points="255,-73.5 255,-109.5 369,-109.5 369,-73.5 255,-73.5"/>
|
||||||
|
<text text-anchor="middle" x="312" y="-87.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->lsda -->
|
||||||
|
<g id="edge3" class="edge">
|
||||||
|
<title>eh_frame:lsda0->lsda</title>
|
||||||
|
<path fill="none" stroke="black" d="M230,-146C230,-132.86 237.48,-122.78 247.91,-115.11"/>
|
||||||
|
<polygon fill="black" stroke="black" points="249.85,-118.02 256.4,-109.7 246.08,-112.12 249.85,-118.02"/>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->lsda -->
|
||||||
|
<g id="edge6" class="edge">
|
||||||
|
<title>eh_frame:lsda1->lsda</title>
|
||||||
|
<path fill="none" stroke="black" d="M493,-146C493,-139.84 430.6,-122.47 379.08,-109.18"/>
|
||||||
|
<polygon fill="black" stroke="black" points="379.83,-105.76 369.27,-106.66 378.09,-112.54 379.83,-105.76"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda->text_a -->
|
||||||
|
<g id="edge8" class="edge">
|
||||||
|
<title>lsda->text_a</title>
|
||||||
|
<path fill="none" stroke="black" stroke-dasharray="1,5" d="M278.23,-73.49C259.01,-63.82 234.74,-51.6 214.15,-41.23"/>
|
||||||
|
<polygon fill="black" stroke="black" points="215.49,-37.99 204.99,-36.61 212.34,-44.24 215.49,-37.99"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda->text_b -->
|
||||||
|
<g id="edge9" class="edge">
|
||||||
|
<title>lsda->text_b</title>
|
||||||
|
<path fill="none" stroke="black" stroke-dasharray="1,5" d="M322.17,-73.31C327.12,-64.94 333.18,-54.7 338.68,-45.4"/>
|
||||||
|
<polygon fill="black" stroke="black" points="341.85,-46.92 343.93,-36.53 335.82,-43.35 341.85,-46.92"/>
|
||||||
|
</g>
|
||||||
|
</g>
|
||||||
|
</svg>
|
|
@ -0,0 +1,66 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||||
|
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
|
||||||
|
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
|
||||||
|
<!-- Generated by graphviz version 2.43.0 (0)
|
||||||
|
-->
|
||||||
|
<!-- Title: %3 Pages: 1 -->
|
||||||
|
<svg width="364pt" height="209pt"
|
||||||
|
viewBox="0.00 0.00 364.00 209.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||||
|
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 205)">
|
||||||
|
<title>%3</title>
|
||||||
|
<polygon fill="white" stroke="transparent" points="-4,4 -4,-205 360,-205 360,4 -4,4"/>
|
||||||
|
<g id="clust1" class="cluster">
|
||||||
|
<title>cluster</title>
|
||||||
|
<polygon fill="none" stroke="black" points="8,-8 8,-157 348,-157 348,-8 8,-8"/>
|
||||||
|
<text text-anchor="middle" x="178" y="-141.8" font-family="Times,serif" font-size="14.00">Edges represent relocations</text>
|
||||||
|
</g>
|
||||||
|
<!-- unused -->
|
||||||
|
<g id="node1" class="node">
|
||||||
|
<title>unused</title>
|
||||||
|
<ellipse fill="none" stroke="black" cx="264" cy="-183" rx="36" ry="18"/>
|
||||||
|
<text text-anchor="middle" x="264" y="-179.3" font-family="Times,serif" font-size="14.00">unused</text>
|
||||||
|
</g>
|
||||||
|
<!-- fde_a -->
|
||||||
|
<g id="node2" class="node">
|
||||||
|
<title>fde_a</title>
|
||||||
|
<polygon fill="none" stroke="black" points="210,-89.5 210,-125.5 318,-125.5 318,-89.5 210,-89.5"/>
|
||||||
|
<text text-anchor="middle" x="264" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE0</text>
|
||||||
|
</g>
|
||||||
|
<!-- unused->fde_a -->
|
||||||
|
<g id="edge3" class="edge">
|
||||||
|
<title>unused->fde_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M264,-164.95C264,-156.3 264,-145.57 264,-135.79"/>
|
||||||
|
<polygon fill="black" stroke="black" points="267.5,-135.71 264,-125.71 260.5,-135.71 267.5,-135.71"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_a -->
|
||||||
|
<g id="node4" class="node">
|
||||||
|
<title>lsda_a</title>
|
||||||
|
<polygon fill="none" stroke="black" points="188,-16.5 188,-52.5 340,-52.5 340,-16.5 188,-16.5"/>
|
||||||
|
<text text-anchor="middle" x="264" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
|
||||||
|
</g>
|
||||||
|
<!-- fde_a->lsda_a -->
|
||||||
|
<g id="edge1" class="edge">
|
||||||
|
<title>fde_a->lsda_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M264,-89.31C264,-81.29 264,-71.55 264,-62.57"/>
|
||||||
|
<polygon fill="black" stroke="black" points="267.5,-62.53 264,-52.53 260.5,-62.53 267.5,-62.53"/>
|
||||||
|
</g>
|
||||||
|
<!-- fde_b -->
|
||||||
|
<g id="node3" class="node">
|
||||||
|
<title>fde_b</title>
|
||||||
|
<polygon fill="none" stroke="black" points="39,-89.5 39,-125.5 147,-125.5 147,-89.5 39,-89.5"/>
|
||||||
|
<text text-anchor="middle" x="93" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE1</text>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_b -->
|
||||||
|
<g id="node5" class="node">
|
||||||
|
<title>lsda_b</title>
|
||||||
|
<polygon fill="none" stroke="black" points="16.5,-16.5 16.5,-52.5 169.5,-52.5 169.5,-16.5 16.5,-16.5"/>
|
||||||
|
<text text-anchor="middle" x="93" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
|
||||||
|
</g>
|
||||||
|
<!-- fde_b->lsda_b -->
|
||||||
|
<g id="edge2" class="edge">
|
||||||
|
<title>fde_b->lsda_b</title>
|
||||||
|
<path fill="none" stroke="black" d="M93,-89.31C93,-81.29 93,-71.55 93,-62.57"/>
|
||||||
|
<polygon fill="black" stroke="black" points="96.5,-62.53 93,-52.53 89.5,-62.53 96.5,-62.53"/>
|
||||||
|
</g>
|
||||||
|
</g>
|
||||||
|
</svg>
|
|
@ -0,0 +1,96 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||||
|
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
|
||||||
|
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
|
||||||
|
<!-- Generated by graphviz version 2.43.0 (0)
|
||||||
|
-->
|
||||||
|
<!-- Title: %3 Pages: 1 -->
|
||||||
|
<svg width="496pt" height="246pt"
|
||||||
|
viewBox="0.00 0.00 496.00 246.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||||
|
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 242)">
|
||||||
|
<title>%3</title>
|
||||||
|
<polygon fill="white" stroke="transparent" points="-4,4 -4,-242 492,-242 492,4 -4,4"/>
|
||||||
|
<g id="clust1" class="cluster">
|
||||||
|
<title>cluster</title>
|
||||||
|
<polygon fill="none" stroke="black" points="8,-8 8,-230 480,-230 480,-8 8,-8"/>
|
||||||
|
<text text-anchor="middle" x="244" y="-214.8" font-family="Times,serif" font-size="14.00">Edges represent GC references</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame -->
|
||||||
|
<g id="node1" class="node">
|
||||||
|
<title>eh_frame</title>
|
||||||
|
<polygon fill="none" stroke="black" points="159.5,-162.5 159.5,-198.5 288.5,-198.5 288.5,-162.5 159.5,-162.5"/>
|
||||||
|
<text text-anchor="middle" x="224" y="-176.8" font-family="Times,serif" font-size="14.00">.eh_frame (GC root)</text>
|
||||||
|
</g>
|
||||||
|
<!-- lsda -->
|
||||||
|
<g id="node4" class="node">
|
||||||
|
<title>lsda</title>
|
||||||
|
<polygon fill="none" stroke="black" points="16,-89.5 16,-125.5 130,-125.5 130,-89.5 16,-89.5"/>
|
||||||
|
<text text-anchor="middle" x="73" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->lsda -->
|
||||||
|
<g id="edge1" class="edge">
|
||||||
|
<title>eh_frame->lsda</title>
|
||||||
|
<path fill="none" stroke="black" d="M187.83,-162.49C167.07,-152.73 140.79,-140.38 118.62,-129.95"/>
|
||||||
|
<polygon fill="black" stroke="black" points="119.94,-126.7 109.4,-125.61 116.96,-133.04 119.94,-126.7"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_a -->
|
||||||
|
<g id="node5" class="node">
|
||||||
|
<title>lsda_a</title>
|
||||||
|
<polygon fill="none" stroke="black" points="148,-89.5 148,-125.5 300,-125.5 300,-89.5 148,-89.5"/>
|
||||||
|
<text text-anchor="middle" x="224" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->lsda_a -->
|
||||||
|
<g id="edge2" class="edge">
|
||||||
|
<title>eh_frame->lsda_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M224,-162.31C224,-154.29 224,-144.55 224,-135.57"/>
|
||||||
|
<polygon fill="black" stroke="black" points="227.5,-135.53 224,-125.53 220.5,-135.53 227.5,-135.53"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_b -->
|
||||||
|
<g id="node6" class="node">
|
||||||
|
<title>lsda_b</title>
|
||||||
|
<polygon fill="none" stroke="black" points="318.5,-89.5 318.5,-125.5 471.5,-125.5 471.5,-89.5 318.5,-89.5"/>
|
||||||
|
<text text-anchor="middle" x="395" y="-103.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->lsda_b -->
|
||||||
|
<g id="edge3" class="edge">
|
||||||
|
<title>eh_frame->lsda_b</title>
|
||||||
|
<path fill="none" stroke="black" d="M264.96,-162.49C288.79,-152.6 319.03,-140.04 344.34,-129.53"/>
|
||||||
|
<polygon fill="black" stroke="black" points="345.89,-132.68 353.78,-125.61 343.21,-126.22 345.89,-132.68"/>
|
||||||
|
</g>
|
||||||
|
<!-- text_a -->
|
||||||
|
<g id="node2" class="node">
|
||||||
|
<title>text_a</title>
|
||||||
|
<polygon fill="none" stroke="black" points="184.5,-16.5 184.5,-52.5 263.5,-52.5 263.5,-16.5 184.5,-16.5"/>
|
||||||
|
<text text-anchor="middle" x="224" y="-30.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
|
||||||
|
</g>
|
||||||
|
<!-- text_a->lsda_a -->
|
||||||
|
<g id="edge4" class="edge">
|
||||||
|
<title>text_a->lsda_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M229.86,-52.53C230.71,-60.53 230.95,-70.27 230.59,-79.25"/>
|
||||||
|
<polygon fill="black" stroke="black" points="227.09,-79.09 229.88,-89.31 234.07,-79.58 227.09,-79.09"/>
|
||||||
|
</g>
|
||||||
|
<!-- text_b -->
|
||||||
|
<g id="node3" class="node">
|
||||||
|
<title>text_b</title>
|
||||||
|
<polygon fill="none" stroke="black" points="355,-16.5 355,-52.5 435,-52.5 435,-16.5 355,-16.5"/>
|
||||||
|
<text text-anchor="middle" x="395" y="-30.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
|
||||||
|
</g>
|
||||||
|
<!-- text_b->lsda_b -->
|
||||||
|
<g id="edge6" class="edge">
|
||||||
|
<title>text_b->lsda_b</title>
|
||||||
|
<path fill="none" stroke="black" d="M400.86,-52.53C401.71,-60.53 401.95,-70.27 401.59,-79.25"/>
|
||||||
|
<polygon fill="black" stroke="black" points="398.09,-79.09 400.88,-89.31 405.07,-79.58 398.09,-79.09"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_a->text_a -->
|
||||||
|
<g id="edge5" class="edge">
|
||||||
|
<title>lsda_a->text_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M218.12,-89.31C217.28,-81.29 217.05,-71.55 217.42,-62.57"/>
|
||||||
|
<polygon fill="black" stroke="black" points="220.92,-62.75 218.14,-52.53 213.94,-62.25 220.92,-62.75"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_b->text_b -->
|
||||||
|
<g id="edge7" class="edge">
|
||||||
|
<title>lsda_b->text_b</title>
|
||||||
|
<path fill="none" stroke="black" d="M389.12,-89.31C388.28,-81.29 388.05,-71.55 388.42,-62.57"/>
|
||||||
|
<polygon fill="black" stroke="black" points="391.92,-62.75 389.14,-52.53 384.94,-62.25 391.92,-62.75"/>
|
||||||
|
</g>
|
||||||
|
</g>
|
||||||
|
</svg>
|
|
@ -0,0 +1,84 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||||
|
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
|
||||||
|
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
|
||||||
|
<!-- Generated by graphviz version 2.43.0 (0)
|
||||||
|
-->
|
||||||
|
<!-- Title: %3 Pages: 1 -->
|
||||||
|
<svg width="496pt" height="173pt"
|
||||||
|
viewBox="0.00 0.00 496.00 173.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||||
|
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 169)">
|
||||||
|
<title>%3</title>
|
||||||
|
<polygon fill="white" stroke="transparent" points="-4,4 -4,-169 492,-169 492,4 -4,4"/>
|
||||||
|
<g id="clust1" class="cluster">
|
||||||
|
<title>cluster</title>
|
||||||
|
<polygon fill="none" stroke="black" points="8,-8 8,-157 480,-157 480,-8 8,-8"/>
|
||||||
|
<text text-anchor="middle" x="244" y="-141.8" font-family="Times,serif" font-size="14.00">Edges represent GC references</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame -->
|
||||||
|
<g id="node1" class="node">
|
||||||
|
<title>eh_frame</title>
|
||||||
|
<polygon fill="none" stroke="black" points="342.5,-89.5 342.5,-125.5 471.5,-125.5 471.5,-89.5 342.5,-89.5"/>
|
||||||
|
<text text-anchor="middle" x="407" y="-103.8" font-family="Times,serif" font-size="14.00">.eh_frame (GC root)</text>
|
||||||
|
</g>
|
||||||
|
<!-- lsda -->
|
||||||
|
<g id="node4" class="node">
|
||||||
|
<title>lsda</title>
|
||||||
|
<polygon fill="none" stroke="black" points="358,-16.5 358,-52.5 472,-52.5 472,-16.5 358,-16.5"/>
|
||||||
|
<text text-anchor="middle" x="415" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
|
||||||
|
</g>
|
||||||
|
<!-- eh_frame->lsda -->
|
||||||
|
<g id="edge1" class="edge">
|
||||||
|
<title>eh_frame->lsda</title>
|
||||||
|
<path fill="none" stroke="black" d="M408.94,-89.31C409.84,-81.29 410.94,-71.55 411.95,-62.57"/>
|
||||||
|
<polygon fill="black" stroke="black" points="415.44,-62.86 413.08,-52.53 408.48,-62.07 415.44,-62.86"/>
|
||||||
|
</g>
|
||||||
|
<!-- text_a -->
|
||||||
|
<g id="node2" class="node">
|
||||||
|
<title>text_a</title>
|
||||||
|
<polygon fill="none" stroke="black" points="224.5,-89.5 224.5,-125.5 303.5,-125.5 303.5,-89.5 224.5,-89.5"/>
|
||||||
|
<text text-anchor="middle" x="264" y="-103.8" font-family="Times,serif" font-size="14.00">.text._Z1av</text>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_a -->
|
||||||
|
<g id="node5" class="node">
|
||||||
|
<title>lsda_a</title>
|
||||||
|
<polygon fill="none" stroke="black" points="188,-16.5 188,-52.5 340,-52.5 340,-16.5 188,-16.5"/>
|
||||||
|
<text text-anchor="middle" x="264" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1av</text>
|
||||||
|
</g>
|
||||||
|
<!-- text_a->lsda_a -->
|
||||||
|
<g id="edge2" class="edge">
|
||||||
|
<title>text_a->lsda_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M258.12,-89.31C257.28,-81.29 257.05,-71.55 257.42,-62.57"/>
|
||||||
|
<polygon fill="black" stroke="black" points="260.92,-62.75 258.14,-52.53 253.94,-62.25 260.92,-62.75"/>
|
||||||
|
</g>
|
||||||
|
<!-- text_b -->
|
||||||
|
<g id="node3" class="node">
|
||||||
|
<title>text_b</title>
|
||||||
|
<polygon fill="none" stroke="black" points="53,-89.5 53,-125.5 133,-125.5 133,-89.5 53,-89.5"/>
|
||||||
|
<text text-anchor="middle" x="93" y="-103.8" font-family="Times,serif" font-size="14.00">.text._Z1bv</text>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_b -->
|
||||||
|
<g id="node6" class="node">
|
||||||
|
<title>lsda_b</title>
|
||||||
|
<polygon fill="none" stroke="black" points="16.5,-16.5 16.5,-52.5 169.5,-52.5 169.5,-16.5 16.5,-16.5"/>
|
||||||
|
<text text-anchor="middle" x="93" y="-30.8" font-family="Times,serif" font-size="14.00">.gcc_except_table._Z1bv</text>
|
||||||
|
</g>
|
||||||
|
<!-- text_b->lsda_b -->
|
||||||
|
<g id="edge4" class="edge">
|
||||||
|
<title>text_b->lsda_b</title>
|
||||||
|
<path fill="none" stroke="black" d="M87.12,-89.31C86.28,-81.29 86.05,-71.55 86.42,-62.57"/>
|
||||||
|
<polygon fill="black" stroke="black" points="89.92,-62.75 87.14,-52.53 82.94,-62.25 89.92,-62.75"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_a->text_a -->
|
||||||
|
<g id="edge3" class="edge">
|
||||||
|
<title>lsda_a->text_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M269.86,-52.53C270.71,-60.53 270.95,-70.27 270.59,-79.25"/>
|
||||||
|
<polygon fill="black" stroke="black" points="267.09,-79.09 269.88,-89.31 274.07,-79.58 267.09,-79.09"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda_b->text_b -->
|
||||||
|
<g id="edge5" class="edge">
|
||||||
|
<title>lsda_b->text_b</title>
|
||||||
|
<path fill="none" stroke="black" d="M98.86,-52.53C99.71,-60.53 99.95,-70.27 99.59,-79.25"/>
|
||||||
|
<polygon fill="black" stroke="black" points="96.09,-79.09 98.88,-89.31 103.07,-79.58 96.09,-79.09"/>
|
||||||
|
</g>
|
||||||
|
</g>
|
||||||
|
</svg>
|
|
@ -0,0 +1,64 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||||
|
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
|
||||||
|
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
|
||||||
|
<!-- Generated by graphviz version 2.43.0 (0)
|
||||||
|
-->
|
||||||
|
<!-- Title: %3 Pages: 1 -->
|
||||||
|
<svg width="274pt" height="219pt"
|
||||||
|
viewBox="0.00 0.00 274.00 219.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||||
|
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 215)">
|
||||||
|
<title>%3</title>
|
||||||
|
<polygon fill="white" stroke="transparent" points="-4,4 -4,-215 270,-215 270,4 -4,4"/>
|
||||||
|
<g id="clust1" class="cluster">
|
||||||
|
<title>cluster</title>
|
||||||
|
<polygon fill="none" stroke="black" points="8,-8 8,-167 258,-167 258,-8 8,-8"/>
|
||||||
|
<text text-anchor="middle" x="133" y="-151.8" font-family="Times,serif" font-size="14.00">Edges represent relocations</text>
|
||||||
|
</g>
|
||||||
|
<!-- unused -->
|
||||||
|
<g id="node1" class="node">
|
||||||
|
<title>unused</title>
|
||||||
|
<ellipse fill="none" stroke="black" cx="70" cy="-193" rx="36" ry="18"/>
|
||||||
|
<text text-anchor="middle" x="70" y="-189.3" font-family="Times,serif" font-size="14.00">unused</text>
|
||||||
|
</g>
|
||||||
|
<!-- fde_a -->
|
||||||
|
<g id="node2" class="node">
|
||||||
|
<title>fde_a</title>
|
||||||
|
<polygon fill="none" stroke="black" points="16,-99.5 16,-135.5 124,-135.5 124,-99.5 16,-99.5"/>
|
||||||
|
<text text-anchor="middle" x="70" y="-113.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE0</text>
|
||||||
|
</g>
|
||||||
|
<!-- unused->fde_a -->
|
||||||
|
<g id="edge3" class="edge">
|
||||||
|
<title>unused->fde_a</title>
|
||||||
|
<path fill="none" stroke="black" d="M70,-174.95C70,-166.3 70,-155.57 70,-145.79"/>
|
||||||
|
<polygon fill="black" stroke="black" points="73.5,-145.71 70,-135.71 66.5,-145.71 73.5,-145.71"/>
|
||||||
|
</g>
|
||||||
|
<!-- lsda -->
|
||||||
|
<g id="node4" class="node">
|
||||||
|
<title>lsda</title>
|
||||||
|
<polygon fill="none" stroke="black" points="76,-16.5 76,-62.5 190,-62.5 190,-16.5 76,-16.5"/>
|
||||||
|
<text text-anchor="middle" x="133" y="-47.3" font-family="Times,serif" font-size="14.00">.gcc_except_table</text>
|
||||||
|
<polyline fill="none" stroke="black" points="76,-39.5 190,-39.5 "/>
|
||||||
|
<text text-anchor="middle" x="104" y="-24.3" font-family="Times,serif" font-size="14.00">lsda_a</text>
|
||||||
|
<polyline fill="none" stroke="black" points="132,-16.5 132,-39.5 "/>
|
||||||
|
<text text-anchor="middle" x="161" y="-24.3" font-family="Times,serif" font-size="14.00">lsda_b</text>
|
||||||
|
</g>
|
||||||
|
<!-- fde_a->lsda -->
|
||||||
|
<g id="edge1" class="edge">
|
||||||
|
<title>fde_a->lsda:a</title>
|
||||||
|
<path fill="none" stroke="black" d="M64.21,-99.34C57.5,-77.11 49.26,-40.03 65.15,-30.04"/>
|
||||||
|
<polygon fill="black" stroke="black" points="66.19,-33.39 75,-27.5 64.44,-26.61 66.19,-33.39"/>
|
||||||
|
</g>
|
||||||
|
<!-- fde_b -->
|
||||||
|
<g id="node3" class="node">
|
||||||
|
<title>fde_b</title>
|
||||||
|
<polygon fill="none" stroke="black" points="142,-99.5 142,-135.5 250,-135.5 250,-99.5 142,-99.5"/>
|
||||||
|
<text text-anchor="middle" x="196" y="-113.8" font-family="Times,serif" font-size="14.00">.eh_frame FDE1</text>
|
||||||
|
</g>
|
||||||
|
<!-- fde_b->lsda -->
|
||||||
|
<g id="edge2" class="edge">
|
||||||
|
<title>fde_b->lsda:b</title>
|
||||||
|
<path fill="none" stroke="black" d="M201.79,-99.34C208.5,-77.11 216.74,-40.03 200.85,-30.04"/>
|
||||||
|
<polygon fill="black" stroke="black" points="201.56,-26.61 191,-27.5 199.81,-33.39 201.56,-26.61"/>
|
||||||
|
</g>
|
||||||
|
</g>
|
||||||
|
</svg>
|
|
@ -0,0 +1,708 @@
|
||||||
|
# Stack unwinding
|
||||||
|
|
||||||
|
The main uses of stack unwinding are:
|
||||||
|
|
||||||
|
* To obtain a stack trace for a debugger, crash reporter, profiler, garbage
|
||||||
|
collector, etc.
|
||||||
|
* With personality routines and language-specific data areas, to implement C++
|
||||||
|
exceptions (Itanium C++ ABI). See [C++ exception handling ABI](maskray-3.md)
|
||||||
|
|
||||||
|
Stack unwinding tasks can be divided into two categories:
|
||||||
|
|
||||||
|
* synchronous: triggered by the program itself, e.g. a C++ `throw` or the
|
||||||
|
program obtaining its own stack trace. This type of stack unwinding only occurs
|
||||||
|
at function calls (in the function body, never in the prologue/epilogue)
|
||||||
|
* asynchronous: triggered by a garbage collector, signals or an external
|
||||||
|
program. This kind of stack unwinding can happen in a function prologue/epilogue
|
||||||
|
|
||||||
|
## Frame pointer
|
||||||
|
|
||||||
|
The most classic and simplest stack unwinding is based on the frame pointer:
|
||||||
|
fix a register as the frame pointer (RBP on x86-64), put the frame pointer in
|
||||||
|
the stack frame at the function prologue, and update the frame pointer to the
|
||||||
|
address of the saved frame pointer. The frame pointer and its saved values in
|
||||||
|
the stack form a singly linked list. After obtaining the initial frame pointer
|
||||||
|
value (`__builtin_frame_address`), dereference the frame pointer continuously
|
||||||
|
to get the frame pointer values of all stack frames. This method is not
|
||||||
|
applicable to some instructions in the prologue/epilogue, where RBP does not yet (or no longer) reference the current frame.
|
||||||
|
|
||||||
|
```
|
||||||
|
pushq %rbp
|
||||||
|
movq %rsp, %rbp # after this, RBP references the current frame
|
||||||
|
...
|
||||||
|
popq %rbp
|
||||||
|
retq # RBP references the previous frame
|
||||||
|
```
|
||||||
|
|
||||||
|
```c
|
||||||
|
#include <stdio.h>
|
||||||
|
[[gnu::noinline]] void qux() {
|
||||||
|
void **fp = __builtin_frame_address(0);
|
||||||
|
for (;;) {
|
||||||
|
printf("%p\n", fp);
|
||||||
|
void **next_fp = *fp;
|
||||||
|
if (next_fp <= fp) break;
|
||||||
|
fp = next_fp;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
[[gnu::noinline]] void bar() { qux(); }
|
||||||
|
[[gnu::noinline]] void foo() { bar(); }
|
||||||
|
int main() { foo(); }
|
||||||
|
```
|
||||||
|
|
||||||
|
The frame pointer-based method is simple, but has several drawbacks.
|
||||||
|
|
||||||
|
When the above code is compiled with `-O1` or above, foo and bar will have tail
|
||||||
|
calls, and the program output will not include the stack frames of foo and bar
|
||||||
|
(`-fno-omit-frame-pointer` does not prevent the tail calls).
|
||||||
|
|
||||||
|
In practice, it is not guaranteed that all libraries contain frame pointers.
|
||||||
|
When unwinding a thread, it is necessary to check whether `next_fp` looks like a
|
||||||
|
stack address before dereferencing it to prevent segfaults. One way to check
|
||||||
|
page accessibility is to parse `/proc/*/maps` to determine whether the address is
|
||||||
|
readable (slow). There is a smart trick:
|
||||||
|
|
||||||
|
```c
|
||||||
|
// Or use the write end of a pipe.
|
||||||
|
int fd = open("/dev/random", O_WRONLY);
|
||||||
|
if (write(fd, address, 1) < 0)
|
||||||
|
  ; // not readable: write() fails (EFAULT on Linux)
|
||||||
|
```
|
||||||
|
|
||||||
|
In addition, reserving a register for the frame pointer will increase text size
|
||||||
|
and have a negative performance impact (additional prologue/epilogue instruction
|
||||||
|
overhead and register pressure caused by one fewer register), which may be
|
||||||
|
quite significant on x86-32, which lacks registers. On an architecture with
|
||||||
|
relatively sufficient registers, e.g. x86-64, the performance loss can be more
|
||||||
|
than 1%.
|
||||||
|
|
||||||
|
### Compiler behavior
|
||||||
|
|
||||||
|
* -O0: `-fno-omit-frame-pointer` is the default; all functions have a frame pointer
|
||||||
|
* -O1 or above: `-fomit-frame-pointer` is the default; a frame pointer is set up only if
|
||||||
|
necessary. Specify `-fno-omit-frame-pointer` to get a similar effect to
|
||||||
|
-O0. You can additionally specify `-momit-leaf-frame-pointer` to remove the
|
||||||
|
frame pointer of leaf functions
|
||||||
|
|
||||||
|
## libunwind
|
||||||
|
|
||||||
|
C++ exceptions and the stack unwinding done by profilers/crash reporters usually use the
|
||||||
|
libunwind API and DWARF Call Frame Information. In the 1990s, Hewlett-Packard
|
||||||
|
defined a set of libunwind APIs, which are divided into two categories:
|
||||||
|
|
||||||
|
* `unw_*`: The entry points are `unw_init_local` (local unwinding, current
|
||||||
|
process) and `unw_init_remote` (remote unwinding, other processes).
|
||||||
|
Applications that usually use libunwind use this API. For example, Linux perf
|
||||||
|
will call `unw_init_remote`
|
||||||
|
* `_Unwind_*`: This part is standardized as Level 1: Base ABI of [Itanium C++
|
||||||
|
ABI: Exception Handling](https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html).
|
||||||
|
The Level 2 C++ ABI calls these `_Unwind_*` APIs. Among them, `_Unwind_Resume`
|
||||||
|
is the only API that is directly called by C++ compiled code.
|
||||||
|
`_Unwind_Backtrace` is used by a few applications to obtain stack traces. Other
|
||||||
|
functions are called by libsupc++/libc++abi `__cxa_*` functions and
|
||||||
|
`__gxx_personality_v0`.
|
||||||
|
|
||||||
|
Hewlett-Packard has open sourced https://www.nongnu.org/libunwind/ (in addition
|
||||||
|
to many projects called "libunwind"). The common implementations of this API on
|
||||||
|
Linux are:
|
||||||
|
|
||||||
|
* libgcc/unwind-\* (`libgcc_s.so.1` or `libgcc_eh.a`): Implemented `_Unwind_*`
|
||||||
|
and introduced some extensions: `_Unwind_Resume_or_Rethrow`,
|
||||||
|
`_Unwind_FindEnclosingFunction`, `__register_frame` etc.
|
||||||
|
* llvm-project/libunwind (`libunwind.so` or `libunwind.a`) is a simplified
|
||||||
|
implementation of HP API, which provides part of `unw_*`, but does not
|
||||||
|
implement `unw_init_remote`. Part of the code is taken from ld64. If you use
|
||||||
|
Clang, you can use `--rtlib=compiler-rt --unwindlib=libunwind` to choose it
|
||||||
|
* glibc's internal implementation of `_Unwind_Find_FDE`, usually not exported,
|
||||||
|
and related to `__register_frame_info`
|
||||||
|
|
||||||
|
## DWARF Call Frame Information
|
||||||
|
|
||||||
|
The unwind instructions required by different areas of the program are
|
||||||
|
described by DWARF Call Frame Information (CFI) and stored in `.eh_frame` on
|
||||||
|
the ELF platform. The compiler, assembler, linker and libunwind provide corresponding
|
||||||
|
support.
|
||||||
|
|
||||||
|
`.eh_frame` is composed of Common Information Entry (CIE) and Frame Description
|
||||||
|
Entry (FDE). CIE has these fields:
|
||||||
|
|
||||||
|
* `length`
|
||||||
|
* `CIE_id`: Constant 0. This field is used to distinguish CIE and FDE. In FDE,
|
||||||
|
this field is non-zero, representing `CIE_pointer`
|
||||||
|
* `version`: Constant 1
|
||||||
|
* `augmentation_string`: A string describing the CIE/FDE parameter list. The `P`
|
||||||
|
character indicates the personality routine pointer; the `L` character
|
||||||
|
indicates that the augmentation data of the FDE stores the language-specific
|
||||||
|
data area (LSDA)
|
||||||
|
* `address_size`: Generally 4 or 8
|
||||||
|
* `segment_selector_size`: For x86
|
||||||
|
* `code_alignment_factor`: Assuming that the instruction length is a multiple of
|
||||||
|
2 or 4 (for RISC), it affects the multiplier of parameters such as
|
||||||
|
`DW_CFA_advance_loc`
|
||||||
|
* `data_alignment_factor`: The multiplier that affects parameters such as
|
||||||
|
`DW_CFA_offset` and `DW_CFA_val_offset`
|
||||||
|
* `return_address_register`
|
||||||
|
* `augmentation_data_length`
|
||||||
|
* `augmentation_data`: personality
|
||||||
|
* `initial_instructions`: bytecode for unwinding, a common prefix used by all
|
||||||
|
FDEs using this CIE
|
||||||
|
* padding
|
||||||
|
|
||||||
|
Each FDE has an associated CIE. FDE has these fields:
|
||||||
|
|
||||||
|
* `length`: The length of FDE itself. If it is `0xffffffff`, the next 8 bytes
|
||||||
|
(`extended_length`) record the actual length. Unless specially constructed,
|
||||||
|
`extended_length` is not used
|
||||||
|
* `CIE_pointer`: Subtract CIE_pointer from the current position to get the
|
||||||
|
associated CIE
|
||||||
|
* `initial_location`: The address of the first location described by the FDE.
|
||||||
|
There is a relocation referring to the section symbol in .o
|
||||||
|
* `address_range`: initial_location and address_range describe an address range
|
||||||
|
* `instructions`: bytecode for unwinding, essentially (address,opcode) pairs
|
||||||
|
* `augmentation_data_length`
|
||||||
|
* `augmentation_data`: If the associated CIE augmentation contains `L`
|
||||||
|
characters, language-specific data area will be recorded here
|
||||||
|
* padding
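The way `length` and `CIE_id`/`CIE_pointer` distinguish the two record types can be sketched in a few lines of C. This is only an illustration under assumptions: the names `eh_classify`, `eh_record` and `eh_kind` are hypothetical, the input is in host byte order, and a zero `length` is treated as the terminator that conventionally ends `.eh_frame`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum eh_kind { EH_TERMINATOR, EH_CIE, EH_FDE };

struct eh_record {
  enum eh_kind kind;
  uint64_t length;     // size of the record after the length field(s)
  const uint8_t *cie;  // for an FDE, the start of its CIE; otherwise NULL
};

// Classify the .eh_frame record that starts at p.
static struct eh_record eh_classify(const uint8_t *p) {
  struct eh_record r = {EH_TERMINATOR, 0, NULL};
  uint32_t len32;
  memcpy(&len32, p, 4);
  if (len32 == 0)
    return r;  // zero length: terminator
  const uint8_t *id_field = p + 4;
  r.length = len32;
  if (len32 == 0xffffffff) {  // extended_length follows
    memcpy(&r.length, p + 4, 8);
    id_field = p + 12;
  }
  uint32_t id;
  memcpy(&id, id_field, 4);
  if (id == 0) {
    r.kind = EH_CIE;  // CIE_id is the constant 0
  } else {
    r.kind = EH_FDE;
    r.cie = id_field - id;  // CIE_pointer is relative to its own position
  }
  return r;
}
```

Walking a whole section would repeat this at `p + 4 + length` (or `p + 12 + length` in the extended case) until a terminator or the end of the section is reached.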
|
||||||
|
|
||||||
|
A CIE may optionally refer to a personality routine in the text section. A FDE
|
||||||
|
may optionally refer to its associated LSDA in `.gcc_except_table`. The
|
||||||
|
personality routine and LSDA are used in Level 2: C++ ABI of Itanium C++ ABI.
|
||||||
|
|
||||||
|
`.eh_frame` is based on `.debug_frame` introduced in DWARF v2. They have some
|
||||||
|
differences, though:
|
||||||
|
|
||||||
|
* `.eh_frame` has the flag of `SHF_ALLOC` (indicating that a section should be
|
||||||
|
part of the process image in memory) but `.debug_frame` does not, so the latter
|
||||||
|
has very few usage scenarios.
|
||||||
|
* `.debug_frame` supports the DWARF64 format (64-bit offsets, at the cost of a
slightly larger size) but `.eh_frame` does not (in fact, it could be extended,
but there is no demand)
|
||||||
|
* There is no augmentation_data_length and augmentation_data in the CIE of
|
||||||
|
`.debug_frame`
|
||||||
|
* The version field in CIE is different
|
||||||
|
* The meaning of CIE_pointer in FDE is different. In `.debug_frame` it is a
section offset (absolute); in `.eh_frame` it is a relative offset. This
change made by `.eh_frame` is great: if the section size exceeds 32 bits,
`.debug_frame` has to be converted to DWARF64 to represent `CIE_pointer`,
while a relative offset does not have this issue (if the distance between an
FDE and its CIE exceeds 32 bits, simply emit another CIE)
|
||||||
|
|
||||||
|
For the following function:
|
||||||
|
|
||||||
|
```c
|
||||||
|
void f() {
|
||||||
|
__builtin_unwind_init();
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The compiler produces `.cfi_*` (CFI directives) to annotate the assembly.
|
||||||
|
`.cfi_startproc` and `.cfi_endproc` annotate the FDE area, and other CFI directives
|
||||||
|
describe CFI instructions. A call frame is indicated by an address on the
|
||||||
|
stack. This address is called Canonical Frame Address (CFA), and is usually the
|
||||||
|
stack pointer value of the call site. The following example demonstrates the
|
||||||
|
usage of CFI instructions:
|
||||||
|
|
||||||
|
```
|
||||||
|
f:
|
||||||
|
# At the function entry, CFA = rsp+8
|
||||||
|
.cfi_startproc
|
||||||
|
# %bb.0:
|
||||||
|
pushq %rbp
|
||||||
|
# Redefine CFA = rsp+16
|
||||||
|
.cfi_def_cfa_offset 16
|
||||||
|
# rbp is saved at the address CFA-16
|
||||||
|
.cfi_offset %rbp, -16
|
||||||
|
movq %rsp, %rbp
|
||||||
|
# CFA = rbp+16. CFA does not need to be redefined when rsp changes
|
||||||
|
.cfi_def_cfa_register %rbp
|
||||||
|
pushq %r15
|
||||||
|
pushq %r14
|
||||||
|
pushq %r13
|
||||||
|
pushq %r12
|
||||||
|
pushq %rbx
|
||||||
|
# rbx is saved at the address CFA-56
|
||||||
|
.cfi_offset %rbx, -56
|
||||||
|
.cfi_offset %r12, -48
|
||||||
|
.cfi_offset %r13, -40
|
||||||
|
.cfi_offset %r14, -32
|
||||||
|
.cfi_offset %r15, -24
|
||||||
|
popq %rbx
|
||||||
|
popq %r12
|
||||||
|
popq %r13
|
||||||
|
popq %r14
|
||||||
|
popq %r15
|
||||||
|
popq %rbp
|
||||||
|
# CFA = rsp+8
|
||||||
|
.cfi_def_cfa %rsp, 8
|
||||||
|
retq
|
||||||
|
.Lfunc_end0:
|
||||||
|
.size f, .Lfunc_end0-f
|
||||||
|
.cfi_endproc
|
||||||
|
```
|
||||||
|
|
||||||
|
The assembler parses CFI directives and generates `.eh_frame` (this mechanism was
|
||||||
|
introduced by Alan Modra in 2003). The linker collects `.eh_frame` input sections in
|
||||||
|
.o/.a files to generate output `.eh_frame`. In 2006, GNU as introduced
|
||||||
|
`.cfi_personality` and `.cfi_lsda`.
|
||||||
|
|
||||||
|
### `.eh_frame_hdr` and `PT_EH_FRAME`
|
||||||
|
|
||||||
|
To locate the FDE that covers a pc, you need to scan `.eh_frame` from the
|
||||||
|
beginning to find the appropriate FDE (whether the pc falls in the interval
|
||||||
|
indicated by initial_location and address_range). The time spent is
|
||||||
|
proportional to the number of scanned CIE and FDE records.
|
||||||
|
https://sourceware.org/pipermail/binutils/2001-December/015674.html introduced
|
||||||
|
`.eh_frame_hdr`, a binary search index table describing (`initial_location`, FDE
|
||||||
|
address) pairs.
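The lookup that this index table enables can be sketched as follows. This is a minimal sketch, assuming the common table encoding in which both values are signed 32-bit offsets relative to the start of `.eh_frame_hdr`; the names `hdr_entry` and `hdr_lookup` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct hdr_entry {
  int32_t initial_location;  // relative to the start of .eh_frame_hdr
  int32_t fde;               // relative to the start of .eh_frame_hdr
};

// Returns the last entry whose initial_location <= pc_rel, or NULL if pc_rel
// precedes every entry. The table must be sorted by initial_location.
static const struct hdr_entry *hdr_lookup(const struct hdr_entry *table,
                                          size_t n, int32_t pc_rel) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (table[mid].initial_location <= pc_rel)
      lo = mid + 1;
    else
      hi = mid;
  }
  return lo ? &table[lo - 1] : NULL;
}
```

The search only yields a candidate: the table has no length column, so an unwinder still has to read `address_range` from the FDE it found and check that the pc really falls inside.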
|
||||||
|
|
||||||
|
The linker collects all `.eh_frame` input sections. With `--eh-frame-hdr`, `ld`
|
||||||
|
generates `.eh_frame_hdr` and creates a program header `PT_EH_FRAME` to describe
|
||||||
|
`.eh_frame_hdr`. An unwinder can parse the program headers and look for
|
||||||
|
`PT_EH_FRAME` to locate `.eh_frame_hdr`. Please check out the example below.
|
||||||
|
|
||||||
|
### `__register_frame_info`
|
||||||
|
|
||||||
|
Before `.eh_frame_hdr` and `PT_EH_FRAME` were invented, there was a static
|
||||||
|
constructor `frame_dummy` in crtbegin (`crtstuff.c`) that called
|
||||||
|
`__register_frame_info` to register the `.eh_frame` of the executable.
|
||||||
|
|
||||||
|
Now `__register_frame_info` is only used by programs linked with `-static`.
|
||||||
|
Correspondingly, if you specify `-Wl,--no-eh-frame-hdr` when linking, you cannot
|
||||||
|
unwind (if you use a C++ exception, the program will call `std::terminate`).
|
||||||
|
|
||||||
|
### libunwind example
|
||||||
|
|
||||||
|
```c
|
||||||
|
#include <libunwind.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
|
||||||
|
void backtrace() {
|
||||||
|
unw_context_t context;
|
||||||
|
unw_cursor_t cursor;
|
||||||
|
// Store register values into context.
|
||||||
|
unw_getcontext(&context);
|
||||||
|
// Locate the PT_GNU_EH_FRAME which contains PC.
|
||||||
|
unw_init_local(&cursor, &context);
|
||||||
|
size_t rip, rsp;
|
||||||
|
do {
|
||||||
|
unw_get_reg(&cursor, UNW_X86_64_RIP, &rip);
|
||||||
|
unw_get_reg(&cursor, UNW_X86_64_RSP, &rsp);
|
||||||
|
printf("rip: %zx rsp: %zx\n", rip, rsp);
|
||||||
|
} while (unw_step(&cursor) > 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
void bar() {backtrace();}
|
||||||
|
void foo() {bar();}
|
||||||
|
int main() {foo();}
|
||||||
|
```
|
||||||
|
|
||||||
|
If you use llvm-project/libunwind:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
$CC a.c -Ipath/to/include -Lpath/to/lib -lunwind
|
||||||
|
```
|
||||||
|
|
||||||
|
If you use nongnu.org/libunwind, there are two options: (a) add
`#define UNW_LOCAL_ONLY` before `#include <libunwind.h>`; (b) link one more
library, which on x86-64 is `-l:libunwind-x86_64.so`. If you use Clang, you can
also use `clang --rtlib=compiler-rt --unwindlib=libunwind -I path/to/include a.c`:
in addition to providing `unw_*`, it ensures that `libgcc_s.so` is not linked.
|
||||||
|
|
||||||
|
* `unw_getcontext`: Get register value (including PC)
|
||||||
|
* `unw_init_local`
|
||||||
|
* Use `dl_iterate_phdr` to traverse executable files and shared objects, and
|
||||||
|
find the `PT_LOAD` program header that contains the PC
|
||||||
|
* Find the `PT_EH_FRAME`(`.eh_frame_hdr`) of the module where you are, and
|
||||||
|
save it in cursor
|
||||||
|
* `unw_step`
|
||||||
|
* Binary search for the `.eh_frame_hdr` item corresponding to the PC, record
|
||||||
|
the FDE found and the CIE it points to
|
||||||
|
* Execute `initial_instructions` in CIE
|
||||||
|
* Execute the instructions (bytecode) in FDE. An automaton maintains the
|
||||||
|
current location and CFA. Among the instructions, `DW_CFA_advance_loc`
|
||||||
|
advances the location; `DW_CFA_def_cfa_*` updates CFA; `DW_CFA_offset`
|
||||||
|
indicates that the value of a register is stored at CFA+offset
|
||||||
|
* The automaton stops when the current location is greater than or equal to
|
||||||
|
PC. In other words, the executed instruction is a prefix of FDE instructions
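The automaton described in the `unw_step` bullets can be sketched as a tiny interpreter. This is a sketch under several assumptions, not how any particular libunwind implements it: it handles only `DW_CFA_nop`, `DW_CFA_advance_loc`, `DW_CFA_offset`, `DW_CFA_def_cfa`, `DW_CFA_def_cfa_register` and `DW_CFA_def_cfa_offset`, it assumes `code_alignment_factor` is 1 and 17 DWARF registers (as on x86-64), and the names `cfi_state` and `cfi_run` are hypothetical. `target` is the code offset whose row is wanted; the interpreter stops at the first advance that would move the location past it.

```c
#include <stdint.h>

enum { NREGS = 17 };  // DWARF registers 0..16 on x86-64 (16 = return address)

struct cfi_state {
  uint64_t loc;             // current location, offset from initial_location
  unsigned cfa_reg;         // CFA = value of cfa_reg + cfa_offset
  int64_t cfa_offset;
  int64_t saved_at[NREGS];  // register i is saved at CFA + saved_at[i]
  uint8_t is_saved[NREGS];
};

static uint64_t read_uleb128(const uint8_t **p) {
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *(*p)++;
    value |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return value;
}

// Runs CFI bytecode in [p, end). Returns 0 on success, or -1 on an opcode
// that this sketch does not handle.
static int cfi_run(const uint8_t *p, const uint8_t *end, uint64_t target,
                   int64_t data_align, struct cfi_state *st) {
  while (p < end) {
    uint8_t op = *p++;
    uint8_t low = op & 0x3f;
    switch (op >> 6) {
    case 1:  // DW_CFA_advance_loc
      if (st->loc + low > target)
        return 0;  // the remaining instructions describe later addresses
      st->loc += low;
      continue;
    case 2:  // DW_CFA_offset: register `low` is saved at CFA + N * data_align
      if (low >= NREGS)
        return -1;
      st->saved_at[low] = (int64_t)read_uleb128(&p) * data_align;
      st->is_saved[low] = 1;
      continue;
    case 3:  // DW_CFA_restore is not handled here
      return -1;
    }
    switch (op) {
    case 0x00:  // DW_CFA_nop
      break;
    case 0x0c:  // DW_CFA_def_cfa
      st->cfa_reg = (unsigned)read_uleb128(&p);
      st->cfa_offset = (int64_t)read_uleb128(&p);
      break;
    case 0x0d:  // DW_CFA_def_cfa_register
      st->cfa_reg = (unsigned)read_uleb128(&p);
      break;
    case 0x0e:  // DW_CFA_def_cfa_offset
      st->cfa_offset = (int64_t)read_uleb128(&p);
      break;
    default:
      return -1;
    }
  }
  return 0;
}
```

A caller would first run the CIE `initial_instructions` with `target` set to `UINT64_MAX`, then run the FDE instructions on the same state with the real offset.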
|
||||||
|
|
||||||
|
An unwinder locates the applicable FDE according to the program counter, and
|
||||||
|
executes all the CFI instructions before the program counter.
|
||||||
|
|
||||||
|
There are several important instructions:
|
||||||
|
|
||||||
|
* `DW_CFA_def_cfa_*`
|
||||||
|
* `DW_CFA_offset`
|
||||||
|
* `DW_CFA_advance_loc`
|
||||||
|
|
||||||
|
For a `-DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86` build of clang:
`.text` is 51.7MiB, `.eh_frame` is 4.2MiB, `.eh_frame_hdr` is 646KiB, and there
are 2 CIEs and 82745 FDEs.
|
||||||
|
|
||||||
|
### Remarks
|
||||||
|
|
||||||
|
CFI instructions are suitable for the compiler to generate code, but cumbersome
|
||||||
|
to write in hand-written assembly. In 2015, Alex Dowad contributed an awk
|
||||||
|
script to musl libc to parse the assembly and automatically generate CFI
|
||||||
|
directives. In fact, generating precise CFI instructions is challenging for
|
||||||
|
compilers as well. For a function that does not use a frame pointer, adjusting
|
||||||
|
SP requires outputting a CFI directive to redefine CFA. GCC does not parse
|
||||||
|
inline assembly, so adjusting SP in inline assembly often results in imprecise
|
||||||
|
CFI.
|
||||||
|
|
||||||
|
```c
|
||||||
|
void foo() {
|
||||||
|
asm("subq $128, %rsp\n"
|
||||||
|
// Cannot unwind if -fomit-leaf-frame-pointer
|
||||||
|
"nop\n"
|
||||||
|
"addq $128, %rsp\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
int main() {
|
||||||
|
foo();
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The CFIInstrInserter pass in LLVM can insert `.cfi_def_cfa_*`, `.cfi_offset` and
|
||||||
|
`.cfi_restore` to adjust the CFA and callee-saved registers.
|
||||||
|
|
||||||
|
The DWARF scheme also has very low information density. The various compact
|
||||||
|
unwind schemes have improved on this aspect. To list a few issues:
|
||||||
|
|
||||||
|
* CIE `address_size`: nobody uses different values for an architecture. Even if
|
||||||
|
they do (ILP32 ABIs in AArch64 and x86-64), the information is already
|
||||||
|
available elsewhere.
|
||||||
|
* CIE `segment_selector_size`: It is nice that they cared about x86, but x86 itself
|
||||||
|
does not need it anymore :/
|
||||||
|
* CIE `code_alignment_factor` and `data_alignment_factor`: A RISC architecture
|
||||||
|
with such preference can hard code the values.
|
||||||
|
* CIE `return_address_register`: I do not know when an architecture wants to
|
||||||
|
use a different register for the return address.
|
||||||
|
* `length`: The DWARF's 8-byte form is definitely overengineered... For standard
|
||||||
|
form prologue/epilogue, the field should not be needed.
|
||||||
|
* `initial_location` and `address_range`: if a binary search index table is
|
||||||
|
always needed, why do we need the length field?
|
||||||
|
* `instructions`: bytecode is flexible but commonly a function
|
||||||
|
prologue/epilogue is of a standard form and the few callee-saved registers
|
||||||
|
can be encoded in a more compact way.
|
||||||
|
* `augmentation_data`: While this provides flexibility, in practice very rarely
|
||||||
|
a function needs anything more than a personality and a LSDA pointer.
|
||||||
|
|
||||||
|
Callee-saved registers other than FP are oftentimes unneeded but there is no
|
||||||
|
compiler option to drop them.
|
||||||
|
|
||||||
|
## `SHT_X86_64_UNWIND`
|
||||||
|
|
||||||
|
`.eh_frame` has special processing in linker/dynamic loader, so conventionally
|
||||||
|
it should use a separate section type, but `SHT_PROGBITS` was used in the
|
||||||
|
design. In the x86-64 psABI, the type of `.eh_frame` is `SHT_X86_64_UNWIND`
|
||||||
|
(influenced by Solaris).
|
||||||
|
|
||||||
|
* In GNU as, `.section .eh_frame,"a",@unwind` will generate `SHT_X86_64_UNWIND`,
|
||||||
|
and `.cfi_*` will generate `SHT_PROGBITS`.
|
||||||
|
* Since Clang 3.8, `.cfi_*` generates `SHT_X86_64_UNWIND`
|
||||||
|
|
||||||
|
`.section .eh_frame,"a",@unwind` is rare (glibc's x86 port, libffi, LuaJIT and
|
||||||
|
other packages), so checking the type of `.eh_frame` is a good way to
|
||||||
|
distinguish Clang/GCC object files :) For LLD 11.0.0, I contributed
|
||||||
|
https://reviews.llvm.org/D85785 to allow mixed types for `.eh_frame` in a
|
||||||
|
relocatable link ;-)
|
||||||
|
|
||||||
|
Suggestion to future architectures: When defining processor-specific section
|
||||||
|
types, please do not use 0x70000001
|
||||||
|
(`SHT_ARM_EXIDX=SHT_IA_64_UNWIND=SHT_PARISC_UNWIND=SHT_X86_64_UNWIND=SHT_LOPROC+1`)
|
||||||
|
for purposes other than unwinding :) `SHT_CSKY_ATTRIBUTES=0x70000001` :)
|
||||||
|
|
||||||
|
### Linker perspective
|
||||||
|
|
||||||
|
Usually in the case of COMDAT group and `-ffunction-sections`,
|
||||||
|
`.data`/`.rodata` needs to be split like `.text`, but `.eh_frame` is
|
||||||
|
monolithic. Like many other metadata sections, the main problem with the
|
||||||
|
monolithic section is that garbage collection is challenging in the linker.
|
||||||
|
Unlike some other metadata sections, simply abandoning garbage collecting is
|
||||||
|
not a choice: `.eh_frame_hdr` is a binary search index table and
|
||||||
|
duplicate/unused entries can confuse the consumers.
|
||||||
|
|
||||||
|
When a linker processes `.eh_frame`, it needs to conceptually split `.eh_frame`
|
||||||
|
into CIE/FDE. During `--gc-sections`, the conceptual reference relationship is
|
||||||
|
reversed considering the actual relocation: a FDE has a relocation referencing
|
||||||
|
the text section; during GC, if the pointed text section is discarded, the FDE
|
||||||
|
that references it should also be discarded.
|
||||||
|
|
||||||
|
LLD has some special handling for `.eh_frame`:
|
||||||
|
|
||||||
|
* `-M` requires special code
|
||||||
|
* `--gc-sections` occurs before `.eh_frame` deduplication/GC. The personality
|
||||||
|
in a CIE is a valid reference. However, `initial_location` in FDE should be
|
||||||
|
ignored. Moreover, a LSDA reference in a FDE in a section group should be
|
||||||
|
ignored.
|
||||||
|
* In a relocatable link, a relocation from `.eh_frame` to a `STT_SECTION`
|
||||||
|
symbol in a discarded section (due to COMDAT group rule) should be allowed
|
||||||
|
(normally such a `STB_LOCAL` relocation from outside of the group is
|
||||||
|
disallowed).
|
||||||
|
|
||||||
|
## Compact unwind descriptors
|
||||||
|
|
||||||
|
On macOS, Apple designed the compact unwind descriptors mechanism to accelerate
|
||||||
|
unwinding. In theory, this technique can be used to save some space in
|
||||||
|
`__eh_frame`, but it has not been implemented. The main idea is:
|
||||||
|
|
||||||
|
* The FDE of most functions has a fixed mode (specify CFA at the prologue,
|
||||||
|
store callee-saved registers), and the FDE instructions can be compressed to
|
||||||
|
32-bit.
|
||||||
|
* Personality/lsda described by CIE/FDE augmentation data is very common and
|
||||||
|
can be extracted as a fixed field.
|
||||||
|
|
||||||
|
Only 64-bit will be discussed below. A descriptor occupies 32 bytes:
|
||||||
|
|
||||||
|
```
|
||||||
|
.quad _foo
|
||||||
|
.set L1, Lfoo_end-_foo
|
||||||
|
.long L1
|
||||||
|
.long compact_unwind_description
|
||||||
|
.quad personality
|
||||||
|
.quad lsda_address
|
||||||
|
```
|
||||||
|
|
||||||
|
If you study `.eh_frame_hdr` (binary search index table) and `.ARM.exidx`, you
|
||||||
|
can know that the length field is redundant.
|
||||||
|
|
||||||
|
The Compact unwind descriptor is encoded as:
|
||||||
|
|
||||||
|
```c
|
||||||
|
uint32_t : 24; // vary with different modes
|
||||||
|
uint32_t mode : 4;
|
||||||
|
uint32_t flags : 4;
|
||||||
|
```
|
||||||
|
|
||||||
|
Five modes are defined:
|
||||||
|
|
||||||
|
* 0: reserved
|
||||||
|
* 1: FP-based frame: RBP is frame pointer, frame size is variable
|
||||||
|
* 2: SP-based frame: frame pointer is not used, frame size is fixed during
|
||||||
|
compilation
|
||||||
|
* 3: large SP-based frame: frame pointer is not used, the frame size is fixed
|
||||||
|
at compile time but the value is large and cannot be represented by mode 2
|
||||||
|
* 4: DWARF CFI escape
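Given the bit layout above (24 mode-specific bits, then 4 bits of mode, then 4 bits of flags), pulling the fields out of a 32-bit encoding is a matter of shifts and masks. The following is a minimal sketch; the enumerator and function names are hypothetical and are not Apple's identifiers.

```c
#include <stdint.h>

// The five modes listed above (hypothetical names).
enum compact_mode {
  MODE_RESERVED = 0,
  MODE_BP_FRAME = 1,
  MODE_STACK_IMMD = 2,
  MODE_STACK_IND = 3,
  MODE_DWARF = 4
};

// Bits 24..27 hold the mode.
static unsigned get_mode(uint32_t encoding) { return (encoding >> 24) & 0xf; }

// Bits 28..31 hold the flags.
static unsigned get_flags(uint32_t encoding) { return encoding >> 28; }

// Bits 0..23 are interpreted differently by each mode.
static uint32_t get_payload(uint32_t encoding) { return encoding & 0x00ffffff; }
```

An unwinder would switch on `get_mode(encoding)` and hand `get_payload(encoding)` to the decoder for that mode.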
|
||||||
|
|
||||||
|
### FP-based frame (`UNWIND_MODE_BP_FRAME`)
|
||||||
|
|
||||||
|
The compact unwind encoding is:
|
||||||
|
|
||||||
|
```c
|
||||||
|
uint32_t regs : 15;
|
||||||
|
uint32_t : 1; // 0
|
||||||
|
uint32_t stack_adjust : 8;
|
||||||
|
uint32_t mode : 4;
|
||||||
|
uint32_t flags : 4;
|
||||||
|
```
|
||||||
|
|
||||||
|
The callee-saved registers on x86-64 are: RBX, R12, R13, R14, R15, RBP. 3 bits
|
||||||
|
can encode a register, 15 bits are enough to represent 5 registers except RBP
|
||||||
|
(whether to save and where). `stack_adjust` records the extra stack space beyond
the saved registers.
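As an illustration of the FP-based layout, the sketch below splits the 15-bit `regs` field into five 3-bit slots and extracts `stack_adjust`. The register numbering (0 for none, then RBX, R12, R13, R14, R15, RBP) is my recollection of Apple's `compact_unwind_encoding.h` and should be treated as an assumption; the struct and function names are hypothetical.

```c
#include <stdint.h>

// Assumed register numbering for each 3-bit slot.
enum { REG_NONE, REG_RBX, REG_R12, REG_R13, REG_R14, REG_R15, REG_RBP };

struct bp_frame {
  uint8_t regs[5];       // one register number per save slot, REG_NONE if unused
  uint8_t stack_adjust;  // raw 8-bit field, bits 16..23
};

static struct bp_frame decode_bp_frame(uint32_t encoding) {
  struct bp_frame f;
  uint32_t regs = encoding & 0x7fff;  // low 15 bits
  for (int i = 0; i < 5; i++)
    f.regs[i] = (regs >> (3 * i)) & 7;
  f.stack_adjust = (encoding >> 16) & 0xff;
  return f;
}
```

Each slot says which register, if any, occupies the corresponding 8-byte position in the save area, which is how 15 bits manage to encode both whether a register is saved and where.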
|
||||||
|
|
||||||
|
### SP-based frame (`UNWIND_MODE_STACK_IMMD`)
|
||||||
|
|
||||||
|
The compact unwind encoding is:
|
||||||
|
|
||||||
|
```c
|
||||||
|
uint32_t reg_permutation : 10;
|
||||||
|
uint32_t cnt : 3;
|
||||||
|
uint32_t : 3;
|
||||||
|
uint32_t size : 8;
|
||||||
|
uint32_t mode : 4;
|
||||||
|
uint32_t flags : 4;
|
||||||
|
```
|
||||||
|
|
||||||
|
`cnt` represents the number of saved registers (maximum 6). `reg_permutation`
|
||||||
|
indicates the sequence number of the saved register. `size*8` represents the
|
||||||
|
stack frame size.
|
||||||
|
|
||||||
|
### Large SP-based frame (`UNWIND_MODE_STACK_IND`)
|
||||||
|
|
||||||
|
The compact unwind encoding is:
|
||||||
|
|
||||||
|
```c
|
||||||
|
uint32_t reg_permutation : 10;
|
||||||
|
uint32_t cnt : 3;
|
||||||
|
uint32_t adj : 3;
|
||||||
|
uint32_t size_offset : 8;
|
||||||
|
uint32_t mode : 4;
|
||||||
|
uint32_t flags : 4;
|
||||||
|
```
|
||||||
|
|
||||||
|
Similar to the SP-based frame, except that the stack frame size is read from the
|
||||||
|
text section. The RSP adjustment is usually represented by `subq imm, %rsp`, and
|
||||||
|
`size_offset` is used to represent the distance from the instruction to the
|
||||||
|
beginning of the function. The actual stack size also includes `adj*8`.
|
||||||
|
|
||||||
|
### DWARF CFI escape
|
||||||
|
|
||||||
|
If, for various reasons, the unwind information cannot be expressed as a
compact unwind descriptor, it must fall back to DWARF CFI.
|
||||||
|
|
||||||
|
In the LLVM implementation, each function is represented by only a compact
|
||||||
|
unwind descriptor. If asynchronous stack unwinding occurs in epilogue, existing
|
||||||
|
implementations cannot distinguish it from stack unwinding in function body.
|
||||||
|
Canonical Frame Address will be calculated incorrectly, and the callee-saved
|
||||||
|
register will be read incorrectly. If it happens in prologue, and the prologue
|
||||||
|
has instructions other than register pushes and `subq imm, %rsp`, an error
|
||||||
|
will occur. In addition, if shrink wrapping is enabled for a function, prologue
|
||||||
|
may not be at the beginning of the function. The asynchronous stack unwinding
|
||||||
|
from the beginning to the prologue also fails. It seems that most people don't
|
||||||
|
care about this issue. It may be because the profiler loses a few percentage
|
||||||
|
points of the profile.
|
||||||
|
|
||||||
|
In fact, if you use multiple descriptors to describe each area of a function,
|
||||||
|
you can still unwind accurately. OpenVMS proposed [\[RFC\] Improving compact
|
||||||
|
x86-64 compact unwind descriptors](http://lists.llvm.org/pipermail/llvm-dev/2018-January/120741.html)
|
||||||
|
in 2018, but unfortunately there is no relevant implementation.
|
||||||
|
|
||||||
|
## ARM exception handling
|
||||||
|
|
||||||
|
The unwind information is divided into `.ARM.exidx` and `.ARM.extab`.
|
||||||
|
|
||||||
|
`.ARM.exidx` is a binary search index table, composed of 2-word pairs. The
|
||||||
|
first word is a 31-bit PC-relative offset to the start of the region. The second
|
||||||
|
word is more clearly described with code:
|
||||||
|
|
||||||
|
```c
|
||||||
|
if (indexData == EXIDX_CANTUNWIND)
|
||||||
|
return false; // like an absent .eh_frame entry. In the case of C++ exceptions, std::terminate
|
||||||
|
if (indexData & 0x80000000) {
|
||||||
|
extabAddr = &indexData;
|
||||||
|
extabData = indexData; // inline
|
||||||
|
} else {
|
||||||
|
extabAddr = &indexData + signExtendPrel31(indexData);
|
||||||
|
extabData = read32(&indexData + signExtendPrel31(indexData)); // stored in .ARM.extab
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
`extabData & 0x80000000` means a compact model entry, otherwise means a generic
|
||||||
|
model entry.
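The helper `signExtendPrel31` used in the snippet above is not shown there. A possible definition, together with a classifier for the second `.ARM.exidx` word, is sketched below. `EXIDX_CANTUNWIND` is 1 in the ARM EHABI; the names `exidx_kind` and `classify_exidx` are hypothetical.

```c
#include <stdint.h>

#define EXIDX_CANTUNWIND 1u

enum exidx_kind { KIND_CANTUNWIND, KIND_INLINE, KIND_EXTAB };

// Sign-extend the low 31 bits (a prel31 value) to a 32-bit signed offset.
// Bit 31 is ignored; bit 30 acts as the sign bit.
static int32_t signExtendPrel31(uint32_t value) {
  return (int32_t)((value & 0x7fffffffu) ^ 0x40000000u) - 0x40000000;
}

// Classify the second word of an .ARM.exidx entry.
static enum exidx_kind classify_exidx(uint32_t indexData) {
  if (indexData == EXIDX_CANTUNWIND)
    return KIND_CANTUNWIND;
  if (indexData & 0x80000000u)
    return KIND_INLINE;  // unwind data is stored inline in the index entry
  return KIND_EXTAB;     // prel31 offset to an entry in .ARM.extab
}
```

For a `KIND_EXTAB` word, the `.ARM.extab` entry lives at the address of the word plus `signExtendPrel31(indexData)`.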
|
||||||
|
|
||||||
|
`.ARM.exidx` is equivalent to an enhanced `.eh_frame_hdr`; the compact model is
|
||||||
|
equivalent to inlining the personality and lsda in `.eh_frame`. Consider the
|
||||||
|
following three situations:
|
||||||
|
|
||||||
|
* If the C++ exception will not be triggered and the function that may trigger
|
||||||
|
the exception will not be called: no personality is needed, only one
|
||||||
|
`EXIDX_CANTUNWIND` entry is needed, no `.ARM.extab`
|
||||||
|
* If a C++ exception is triggered but no landing pad is required: personality
|
||||||
|
is `__aeabi_unwind_cpp_pr0`, only a compact model entry is needed, no
|
||||||
|
`.ARM.extab`
|
||||||
|
* If there is a catch: `__gxx_personality_v0` is required, `.ARM.extab` is
|
||||||
|
required
|
||||||
|
|
||||||
|
`.ARM.extab` is equivalent to the combined `.eh_frame` and `.gcc_except_table`.
|
||||||
|
|
||||||
|
### Generic model
|
||||||
|
|
||||||
|
```c
|
||||||
|
uint32_t personality; // bit 31 is 0
|
||||||
|
uint32_t : 24;
|
||||||
|
uint32_t num : 8;
|
||||||
|
uint32_t opcodes[]; // opcodes, variable length
|
||||||
|
uint8_t lsda[]; // variable length
|
||||||
|
```
|
||||||
|
|
||||||
|
Under construction.
|
||||||
|
|
||||||
|
## Windows ARM64 exception handling
|
||||||
|
|
||||||
|
See https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling, this
|
||||||
|
is my favorite encoding scheme. It supports unwinding mid-prolog and
|
||||||
|
mid-epilog. It supports function fragments (used to represent unconventional stack
|
||||||
|
frames such as shrink wrapping).
|
||||||
|
|
||||||
|
The unwind information is saved in two sections, `.pdata` and `.xdata`.
|
||||||
|
|
||||||
|
```c
|
||||||
|
uint32_t function_start_rva;
|
||||||
|
uint32_t Flag : 2;
|
||||||
|
uint32_t Data : 30;
|
||||||
|
```
|
||||||
|
|
||||||
|
For canonical form functions, Packed Unwind Data is used, and no `.xdata` record
|
||||||
|
is required; unwind information that cannot be represented by Packed Unwind
Data is stored in `.xdata`.
|
||||||
|
|
||||||
|
### Packed Unwind Data
|
||||||
|
|
||||||
|
```c
|
||||||
|
uint32_t FunctionStartRVA;
|
||||||
|
uint32_t Flag : 2;
|
||||||
|
uint32_t FunctionLength : 11;
|
||||||
|
uint32_t RegF : 3;
|
||||||
|
uint32_t RegI : 4;
|
||||||
|
uint32_t H : 1;
|
||||||
|
uint32_t CR : 2;
|
||||||
|
uint32_t FrameSize : 9;
|
||||||
|
```
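Decoding the second word of a packed `.pdata` record follows directly from the field widths above. The sketch below assumes the fields are packed starting from the least significant bit in the order listed, and that, as I read the Microsoft documentation, `FunctionLength` counts 4-byte units and `FrameSize` counts 16-byte units. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct packed_unwind {
  unsigned flag;            // 2 bits
  unsigned function_bytes;  // FunctionLength * 4
  unsigned reg_f;           // 3 bits
  unsigned reg_i;           // 4 bits
  unsigned h;               // 1 bit
  unsigned cr;              // 2 bits
  unsigned frame_bytes;     // FrameSize * 16
};

// Decode the second 32-bit word of a packed unwind data record.
static struct packed_unwind decode_packed(uint32_t word) {
  struct packed_unwind p;
  p.flag = word & 0x3;
  p.function_bytes = ((word >> 2) & 0x7ff) * 4;
  p.reg_f = (word >> 13) & 0x7;
  p.reg_i = (word >> 16) & 0xf;
  p.h = (word >> 20) & 0x1;
  p.cr = (word >> 21) & 0x3;
  p.frame_bytes = ((word >> 23) & 0x1ff) * 16;
  return p;
}
```

With 11 bits of length and 9 bits of frame size, a packed record can describe functions shorter than 8KiB with frames smaller than 8KiB; anything larger needs an `.xdata` record.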
|
||||||
|
|
||||||
|
## MIPS compact exception tables
|
||||||
|
|
||||||
|
Under construction.
|
||||||
|
|
||||||
|
## Linux kernel ORC unwind tables
|
||||||
|
|
||||||
|
For x86-64, the Linux kernel uses its own unwind tables: ORC. You can find its
|
||||||
|
documentation on https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html
|
||||||
|
and there is an lwn.net introduction [The ORCs are coming](https://lwn.net/Articles/728339/).
|
||||||
|
|
||||||
|
`objtool orc generate a.o` parses `.eh_frame` and generates `.orc_unwind` and
|
||||||
|
`.orc_unwind_ip`. For an object file assembled from:
|
||||||
|
|
||||||
|
```
|
||||||
|
.globl foo
|
||||||
|
.type foo, @function
|
||||||
|
foo:
|
||||||
|
ret
|
||||||
|
```
|
||||||
|
|
||||||
|
At two addresses the unwind information changes: the start of foo and the end
|
||||||
|
of foo, so 2 ORC entries will be produced. If the DWARF CFA changes (e.g. due
|
||||||
|
to push/pop) in the middle of the function, there may be more entries.
|
||||||
|
|
||||||
|
`.orc_unwind_ip` contains two entries, representing the PC-relative addresses.
|
||||||
|
|
||||||
|
```
|
||||||
|
Relocation section '.rela.orc_unwind_ip' at offset 0x2028 contains 2 entries:
|
||||||
|
Offset Info Type Symbol's Value Symbol's Name + Addend
|
||||||
|
0000000000000000 0000000500000002 R_X86_64_PC32 0000000000000000 .text + 0
|
||||||
|
0000000000000004 0000000500000002 R_X86_64_PC32 0000000000000000 .text + 1
|
||||||
|
```
|
||||||
|
|
||||||
|
`.orc_unwind` contains two entries of type `orc_entry`. The entries encode how
|
||||||
|
IP/SP/BP of the previous frame are stored.
|
||||||
|
|
||||||
|
```c
|
||||||
|
struct orc_entry {
|
||||||
|
s16 sp_offset; // sp_offset and sp_reg encode where SP of the previous frame is stored
|
||||||
|
s16 bp_offset; // bp_offset and bp_reg encode where BP of the previous frame is stored
|
||||||
|
unsigned sp_reg:4;
|
||||||
|
unsigned bp_reg:4;
|
||||||
|
unsigned type:2; // how IP of the previous frame is stored
|
||||||
|
unsigned end:1;
|
||||||
|
} __attribute__((__packed__));
|
||||||
|
```
|
||||||
|
|
||||||
|
You may find similarities in this scheme and `UNWIND_MODE_BP_FRAME` and
|
||||||
|
`UNWIND_MODE_STACK_IMMD` in Apple's compact unwind descriptors. The ORC scheme
|
||||||
|
uses 16-bit integers so presumably `UNWIND_MODE_STACK_IND` will not be needed.
|
||||||
|
During unwinding, most callee-saved registers other than BP are unneeded, so
|
||||||
|
ORC does not bother recording them.
|
||||||
|
|
||||||
|
The linker will resolve relocations in `.orc_unwind_ip` and create
|
||||||
|
`__start_orc_unwind_ip`/`__stop_orc_unwind_ip`/`__start_orc_unwind`/
|
||||||
|
`__stop_orc_unwind` to delimit the section contents. Then, a host utility
|
||||||
|
scripts/sorttable sorts the contents of `.orc_unwind_ip` and `.orc_unwind`. To
|
||||||
|
unwind a stack frame, `unwind_next_frame`:
|
||||||
|
* performs a binary search into the `.orc_unwind_ip` table to figure out the
|
||||||
|
relevant ORC entry
|
||||||
|
* retrieves the previous SP with the current SP, `orc->sp_reg` and
|
||||||
|
`orc->sp_offset`.
|
||||||
|
* retrieves the previous IP with `orc->type` and other values.
|
||||||
|
* retrieves the previous BP with the current BP, the previous SP, `orc->bp_reg`
|
||||||
|
and `orc->bp_offset`. `orc->bp_reg` can be
|
||||||
|
`ORC_REG_UNDEFINED`/`ORC_REG_PREV_SP`/`ORC_REG_BP`.
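The steps above can be sketched as a single function operating on simulated stack memory. This is a simplified model rather than kernel code: the enum values are illustrative and are not the kernel's `ORC_REG_*` constants, only a call-type frame is modelled (the return address is assumed to sit just below the previous SP), and `orc_like`, `memory` and `orc_step` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum base_reg { BASE_UNDEFINED, BASE_SP, BASE_BP, BASE_PREV_SP };

struct orc_like {
  int16_t sp_offset, bp_offset;
  enum base_reg sp_reg, bp_reg;
};

struct frame { uint64_t ip, sp, bp; };

// Simulated stack: `count` 8-byte words starting at address `base`.
struct memory { uint64_t base; const uint64_t *words; size_t count; };

static int load(const struct memory *m, uint64_t addr, uint64_t *out) {
  uint64_t off = addr - m->base;
  if (addr < m->base || off % 8 != 0 || off / 8 >= m->count)
    return -1;
  *out = m->words[off / 8];
  return 0;
}

// Replace *f with the caller's frame. Returns 0 on success, -1 on failure.
static int orc_step(const struct orc_like *orc, const struct memory *m,
                    struct frame *f) {
  uint64_t prev_sp, prev_ip, prev_bp = f->bp;
  if (orc->sp_reg == BASE_SP)
    prev_sp = f->sp + (int64_t)orc->sp_offset;
  else if (orc->sp_reg == BASE_BP)
    prev_sp = f->bp + (int64_t)orc->sp_offset;
  else
    return -1;
  // Call-type frame: the return address is the word just below prev_sp.
  if (load(m, prev_sp - 8, &prev_ip))
    return -1;
  if (orc->bp_reg == BASE_PREV_SP) {
    if (load(m, prev_sp + (int64_t)orc->bp_offset, &prev_bp))
      return -1;
  } else if (orc->bp_reg == BASE_BP) {
    if (load(m, f->bp + (int64_t)orc->bp_offset, &prev_bp))
      return -1;
  } else if (orc->bp_reg != BASE_UNDEFINED) {
    return -1;
  }
  f->ip = prev_ip;
  f->sp = prev_sp;
  f->bp = prev_bp;
  return 0;
}
```

For a function that has just executed `push %rbp`, the matching entry in this model would be `sp_reg = BASE_SP`, `sp_offset = 16`, `bp_reg = BASE_PREV_SP`, `bp_offset = -16`.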
|
||||||
|
|
|
@ -0,0 +1,558 @@
|
||||||
|
# All about symbol versioning
|
||||||
|
|
||||||
|
In 1995, Solaris' link editor and ld.so introduced the symbol versioning
|
||||||
|
mechanism. Ulrich Drepper and Eric Youngdale borrowed Solaris symbol versioning
|
||||||
|
in 1997 and designed the GNU style symbol versioning for glibc.
|
||||||
|
|
||||||
|
When a shared object is updated and the behavior of a symbol changes (an ABI
change, such as changing the type of parameters or return values, or a behavior
change), traditionally a `DT_SONAME` bump is required. Otherwise a dependent
|
||||||
|
application/shared object built with the old version may run abnormally. This
|
||||||
|
can be inconvenient if the number of dependent applications is large.
|
||||||
|
|
||||||
|
Symbol versioning provides backward compatibility without changing `DT_SONAME`.
|
||||||
|
|
||||||
|
The following part describes the representation, and then describes the
|
||||||
|
behaviors from the perspectives of assembler, linker, and ld.so. One may wish
|
||||||
|
to skip the representation part when reading for the first time.
|
||||||
|
|
||||||
|
## Representation
|
||||||
|
|
||||||
|
In a shared object or executable file that uses symbol versioning, there are up
|
||||||
|
to three sections related to symbol versioning. `.gnu.version_r` and
|
||||||
|
`.gnu.version_d` among them are optional:
|
||||||
|
|
||||||
|
* `.gnu.version` (version symbol section). The `DT_VERSYM` tag in the dynamic
|
||||||
|
table points to the section. Assuming there are N entries in `.dynsym`,
|
||||||
|
`.gnu.version` contains N `uint16_t` values, with the i-th entry indicating
|
||||||
|
the version ID of the i-th symbol. Put another way, `.gnu.version` is a
|
||||||
|
parallel table to `.dynsym`.
|
||||||
|
* `.gnu.version_r` (version requirement section). The `DT_VERNEED`/
|
||||||
|
`DT_VERNEEDNUM` tags in the dynamic table delimit this section. This
|
||||||
|
section describes the version information used by the undefined versioned
|
||||||
|
symbols in the module.
|
||||||
|
* `.gnu.version_d` (version definition section). The `DT_VERDEF`/`DT_VERDEFNUM`
|
||||||
|
tags in the dynamic table delimit this section. This section describes the
|
||||||
|
version information used by the defined versioned symbols in the module.
|
||||||
|
|
||||||
|
```c
|
||||||
|
// Version definitions
|
||||||
|
typedef struct {
|
||||||
|
Elf64_Half vd_version; // version: 1
|
||||||
|
Elf64_Half vd_flags; // VER_FLG_BASE (index 1) or 0 (index != 1)
|
||||||
|
Elf64_Half vd_ndx; // version index
|
||||||
|
Elf64_Half vd_cnt; // number of associated aux entries, always 1 in practice
|
||||||
|
Elf64_Word vd_hash; // SysV hash of the version name
|
||||||
|
Elf64_Word vd_aux; // offset in bytes to the verdaux array
|
||||||
|
Elf64_Word vd_next; // offset in bytes to the next verdef entry
|
||||||
|
} Elf64_Verdef;
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
Elf64_Word vda_name; // version name
|
||||||
|
Elf64_Word vda_next; // offset in bytes to the next verdaux entry
|
||||||
|
} Elf64_Verdaux;
|
||||||
|
|
||||||
|
// Version needs
|
||||||
|
typedef struct {
|
||||||
|
Elf64_Half vn_version; // version: 1
|
||||||
|
Elf64_Half vn_cnt; // number of associated aux entries
|
||||||
|
Elf64_Word vn_file; // .dynstr offset of the depended filename
|
||||||
|
Elf64_Word vn_aux; // offset in bytes to vernaux array
|
||||||
|
Elf64_Word vn_next; // offset in bytes to next verneed entry
|
||||||
|
} Elf64_Verneed;
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
Elf64_Word vna_hash; // SysV hash of vna_name
|
||||||
|
Elf64_Half vna_flags; // usually 0; copied from vd_flags of the depended so
|
||||||
|
Elf64_Half vna_other; // unused
|
||||||
|
Elf64_Word vna_name; // .dynstr offset of the version name
|
||||||
|
Elf64_Word vna_next; // offset in bytes to next vernaux entry
|
||||||
|
} Elf64_Vernaux;
|
||||||
|
```
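The `vd_hash` and `vna_hash` fields hold the SysV ELF hash of the version name. The following is a minimal sketch of that hash; the function name `sysv_hash` is a hypothetical helper chosen for illustration, not an API from any library.

```c
#include <stdint.h>

/* SysV ELF hash of a NUL-terminated name, as stored in vd_hash/vna_hash.
   `sysv_hash` is a hypothetical helper name used only in this sketch. */
static uint32_t sysv_hash(const char *name) {
  uint32_t h = 0;
  for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
    h = (h << 4) + *p;            /* shift in the next byte */
    uint32_t g = h & 0xf0000000u; /* top nibble, if any */
    if (g)
      h ^= g >> 24;               /* fold the top nibble back in */
    h &= ~g;                      /* and clear it */
  }
  return h;
}
```

For instance, `sysv_hash("GLIBC_2.3")` evaluates to `0x0d696913`, which should match the hash value `readelf -V` prints for that version name.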
|
||||||
|
|
||||||
|
Currently GNU ld does not set the `VER_FLG_WEAK` flag. [BZ24718#c15](https://sourceware.org/bugzilla/show_bug.cgi?id=24718#c15) proposed "set
|
||||||
|
`VER_FLG_WEAK` on version reference if all symbols are weak".
|
||||||
|
|
||||||
|
The advantage of using a parallel table for `.gnu.version` is that symbol
|
||||||
|
versioning is optional. ld.so implementations which do not support symbol
|
||||||
|
versioning can freely assume no symbol has a version. The behavior is that all
|
||||||
|
references are treated as if bound to the default version definitions. musl ld.so falls into
|
||||||
|
this category.
|
||||||
|
|
||||||
|
### Version index values
|
||||||
|
|
||||||
|
Index 0 is called `VER_NDX_LOCAL`. The binding of the symbol will be changed to
|
||||||
|
`STB_LOCAL`. Index 1 is called `VER_NDX_GLOBAL`. It has no special effect and
|
||||||
|
is used for unversioned symbols. Indexes 2 to 0xffef are used for user-defined
|
||||||
|
versions.
|
||||||
|
|
||||||
|
Defined versioned symbols have two forms:
|
||||||
|
|
||||||
|
* foo@@v2, the default version.
|
||||||
|
* foo@v2, a non-default version (hidden version). The `VERSYM_HIDDEN` bit of the
|
||||||
|
version ID is set.
|
||||||
|
|
||||||
|
Undefined versioned symbols have only the `foo@v2` form.
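As a rough illustration of how one `.gnu.version` entry can be decoded, here is a small sketch. The `SK_`-prefixed constants and the helper names are assumptions made for this example (they mirror `VER_NDX_LOCAL`, `VER_NDX_GLOBAL` and the `VERSYM_HIDDEN` bit described above) and are defined locally so the snippet does not depend on any particular header.

```c
#include <stdint.h>

/* Local stand-ins for the values described in the text. */
enum { SK_VER_NDX_LOCAL = 0, SK_VER_NDX_GLOBAL = 1 };
#define SK_VERSYM_HIDDEN  0x8000u /* non-default (hidden) version bit */
#define SK_VERSYM_VERSION 0x7fffu /* mask for the version index */

/* Version index of a .gnu.version entry, with the hidden bit removed. */
static unsigned versym_index(uint16_t v) { return v & SK_VERSYM_VERSION; }

/* Non-zero if the entry denotes a non-default version (foo@v2). */
static int versym_is_hidden(uint16_t v) { return (v & SK_VERSYM_HIDDEN) != 0; }

/* Classify a defined symbol's entry:
   'L' local, 'G' unversioned global, 'D' default (foo@@v), 'H' non-default (foo@v). */
static char versym_kind(uint16_t v) {
  unsigned idx = versym_index(v);
  if (idx == SK_VER_NDX_LOCAL)
    return 'L';
  if (idx == SK_VER_NDX_GLOBAL)
    return 'G';
  return versym_is_hidden(v) ? 'H' : 'D';
}
```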
|
||||||
|
|
||||||
|
Usually versioned symbols are only defined in shared objects, but executables
|
||||||
|
can have defined versioned symbols as well. (When a shared object is updated,
|
||||||
|
the old symbols are retained so that other shared objects do not need to be
|
||||||
|
relinked, and executable files usually do not provide versioned symbols for
|
||||||
|
other shared objects to reference.)
|
||||||
|
|
||||||
|
### Example
|
||||||
|
|
||||||
|
`readelf -V` can dump the symbol versioning tables.
|
||||||
|
|
||||||
|
In the `.gnu.version_d` output below:
|
||||||
|
|
||||||
|
* Version index 1 (`VER_NDX_GLOBAL`) is the filename (soname if shared object).
|
||||||
|
The `VER_FLG_BASE` flag is set.
|
||||||
|
* Version index 2 is a user defined version. Its name is `LUA_5.3`.
|
||||||
|
|
||||||
|
In the `.gnu.version_r` output below, each of version indexes 3~10 represents a
|
||||||
|
version in a depended shared object. The name `GLIBC_2.2.5` appears thrice,
|
||||||
|
each for a different shared object.
|
||||||
|
|
||||||
|
The `.gnu.version` table assigns a version index to each `.dynsym` entry.
|
||||||
|
|
||||||
|
```
|
||||||
|
% readelf -V /usr/bin/lua5.3
|
||||||
|
|
||||||
|
Version symbols section '.gnu.version' contains 248 entries:
|
||||||
|
Addr: 0x0000000000002af4 Offset: 0x002af4 Link: 5 (.dynsym)
|
||||||
|
000: 0 (*local*) 3 (GLIBC_2.3) 4 (GLIBC_2.2.5) 4 (GLIBC_2.2.5)
|
||||||
|
004: 5 (GLIBC_2.3.4) 4 (GLIBC_2.2.5) 4 (GLIBC_2.2.5) 4 (GLIBC_2.2.5)
|
||||||
|
...
|
||||||
|
|
||||||
|
Version definition section '.gnu.version_d' contains 2 entries:
|
||||||
|
Addr: 0x0000000000002ce8 Offset: 0x002ce8 Link: 6 (.dynstr)
|
||||||
|
000000: Rev: 1 Flags: BASE Index: 1 Cnt: 1 Name: lua5.3
|
||||||
|
0x001c: Rev: 1 Flags: none Index: 2 Cnt: 1 Name: LUA_5.3
|
||||||
|
|
||||||
|
Version needs section '.gnu.version_r' contains 3 entries:
|
||||||
|
Addr: 0x0000000000002d20 Offset: 0x002d20 Link: 6 (.dynstr)
|
||||||
|
000000: Version: 1 File: libdl.so.2 Cnt: 1
|
||||||
|
0x0010: Name: GLIBC_2.2.5 Flags: none Version: 9
|
||||||
|
0x0020: Version: 1 File: libm.so.6 Cnt: 1
|
||||||
|
0x0030: Name: GLIBC_2.2.5 Flags: none Version: 6
|
||||||
|
0x0040: Version: 1 File: libc.so.6 Cnt: 6
|
||||||
|
0x0050: Name: GLIBC_2.11 Flags: none Version: 10
|
||||||
|
0x0060: Name: GLIBC_2.14 Flags: none Version: 8
|
||||||
|
0x0070: Name: GLIBC_2.4 Flags: none Version: 7
|
||||||
|
0x0080: Name: GLIBC_2.3.4 Flags: none Version: 5
|
||||||
|
0x0090: Name: GLIBC_2.2.5 Flags: none Version: 4
|
||||||
|
0x00a0: Name: GLIBC_2.3 Flags: none Version: 3
|
||||||
|
```
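The offsets in the `.gnu.version_r` dump above (`0x0010`, `0x0020`, ...) follow from the `vn_aux`, `vn_next` and `vna_next` fields. The sketch below walks such a section held in memory and counts the Vernaux entries. The `sk_` types mirror the `Elf64_Verneed`/`Elf64_Vernaux` layouts shown earlier, and `count_vernaux` is a hypothetical helper, not taken from any linker or loader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of Elf64_Verneed / Elf64_Vernaux (16 bytes each). */
typedef struct {
  uint16_t vn_version, vn_cnt;
  uint32_t vn_file, vn_aux, vn_next;
} sk_verneed;

typedef struct {
  uint32_t vna_hash;
  uint16_t vna_flags, vna_other;
  uint32_t vna_name, vna_next;
} sk_vernaux;

/* Walk `verneednum` Verneed records (the DT_VERNEEDNUM value) in `sec`.
   Returns the number of Vernaux entries visited, or -1 if a record would
   extend past `size` bytes. */
static int count_vernaux(const unsigned char *sec, size_t size,
                         unsigned verneednum) {
  size_t off = 0;
  int total = 0;
  for (unsigned i = 0; i < verneednum; i++) {
    sk_verneed vn;
    if (off > size || size - off < sizeof vn)
      return -1;
    memcpy(&vn, sec + off, sizeof vn);
    size_t aux = off + vn.vn_aux; /* vn_aux is relative to this Verneed */
    for (unsigned j = 0; j < vn.vn_cnt; j++) {
      sk_vernaux vna;
      if (aux > size || size - aux < sizeof vna)
        return -1;
      memcpy(&vna, sec + aux, sizeof vna);
      total++;
      aux += vna.vna_next; /* relative to the current Vernaux */
    }
    off += vn.vn_next; /* relative to the current Verneed */
  }
  return total;
}
```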
|
||||||
|
|
||||||
|
### Symbol versioning in object files
|
||||||
|
|
||||||
|
The GNU scheme allows `.symver` directives to label the versions of the symbols
|
||||||
|
in object files. The symbol names residing in .o files contain `@` or `@@`.
|
||||||
|
|
||||||
|
## Assembler behavior
|
||||||
|
|
||||||
|
GNU as and the LLVM integrated assembler both implement the `.symver` directive.
|
||||||
|
|
||||||
|
* `.symver foo, foo@v1`
|
||||||
|
* If foo is undefined, produce `foo@v1`
|
||||||
|
* If foo is defined, produce `foo` and `foo@v1` with the same binding
|
||||||
|
(`STB_LOCAL`, `STB_WEAK`, or `STB_GLOBAL`) and `st_other` value (i.e. the
|
||||||
|
same visibility). Personally I think this behavior is a design flaw
|
||||||
|
[{gas-copy}](). The proposed [V4 PATCH gas: Extend .symver directive](https://sourceware.org/pipermail/binutils/2020-April/110622.html)
|
||||||
|
can address this problem.
|
||||||
|
* `.symver foo, foo@@v1`
|
||||||
|
* If foo is undefined, error
|
||||||
|
* If foo is defined, produce `foo` and `foo@@v1` with the same binding and `st_other` value.
|
||||||
|
* `.symver foo, foo@@@v1`
|
||||||
|
* If foo is undefined, produce `foo@v1`
|
||||||
|
* If foo is defined, produce `foo@@v1`
|
||||||
|
|
||||||
|
Personal recommendation:
|
||||||
|
|
||||||
|
* To define a default version symbol: use `.symver foo, foo@@@v2` so that foo
|
||||||
|
is not present.
|
||||||
|
* To define a non-default version symbol, add a suffix to the original symbol
|
||||||
|
name (`.symver foo_v1, foo@v1`) to prevent conflicts with `foo`. This will
|
||||||
|
however leave (usually undesirable) `foo_v1`. If you don't strip `foo_v1` from
|
||||||
|
the object file, you may localize it with a `local:` pattern in the version
|
||||||
|
script. With GNU as 2.35 ([PR25295](https://sourceware.org/bugzilla/show_bug.cgi?id=25295)),
|
||||||
|
you can use `.symver foo_v1, foo@v1, remove`.
|
||||||
|
* The version of an undefined symbol is usually bound at link time. It is
|
||||||
|
usually unnecessary to set the version with `.symver`. If required, prefer
|
||||||
|
`.symver foo, foo@@@v1` to `.symver foo, foo@v1`.
|
||||||
|
|
||||||
|
## Linker behavior
|
||||||
|
|
||||||
|
The linker enters the symbol resolution stage after reading in object files,
|
||||||
|
archive files, shared objects, LTO files, linker scripts, etc.
|
||||||
|
|
||||||
|
GNU ld uses indirect symbols to represent versioned symbols. There are
|
||||||
|
complicated rules, and these rules are not documented. The symbol resolution
|
||||||
|
rules that I personally derived:
|
||||||
|
|
||||||
|
* Defined `foo` resolves undefined `foo` (traditional unversioned rule)
|
||||||
|
* Defined `foo@v1` resolves undefined `foo@v1` (a non-default version symbol is
|
||||||
|
like a separate symbol)
|
||||||
|
* Defined `foo@@v1` (default version) resolves both undefined `foo` and `foo@v1`
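The three rules above can be written down as a small predicate. This is only a sketch of the rules as stated here, not of how GNU ld, gold or LLD implement resolution; `split_name` and `linker_resolves` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct versioned_name {
  size_t name_len;     /* length of the part before the first '@' */
  const char *version; /* NULL if the name has no version */
  bool is_default;     /* true for foo@@v */
};

/* Split "foo", "foo@v1" or "foo@@v1" into its components. */
static struct versioned_name split_name(const char *s) {
  struct versioned_name r = {strlen(s), NULL, false};
  const char *at = strchr(s, '@');
  if (at) {
    r.name_len = (size_t)(at - s);
    r.is_default = at[1] == '@';
    r.version = at + (r.is_default ? 2 : 1);
  }
  return r;
}

/* Does the definition `def` resolve the undefined reference `undef`
   at link time, according to the rules listed above? */
static bool linker_resolves(const char *def, const char *undef) {
  struct versioned_name d = split_name(def), u = split_name(undef);
  if (d.name_len != u.name_len || strncmp(def, undef, d.name_len) != 0)
    return false;
  if (!u.version) /* undefined foo: needs foo or a default version foo@@v */
    return !d.version || d.is_default;
  /* undefined foo@v: needs foo@v or foo@@v with the same version */
  return d.version && strcmp(d.version, u.version) == 0;
}
```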
|
||||||
|
|
||||||
|
If there are multiple default version definitions (such as `foo@@v1 foo@@v2`),
|
||||||
|
a duplicate definition error should be issued even if one is weak. Usually a
|
||||||
|
symbol has zero or one default version (`@@`) definition, and an arbitrary
|
||||||
|
number of non-default version (`@`) definitions.
|
||||||
|
|
||||||
|
If the linker sees undefined `foo` and `foo@v1` first, it will treat them as
|
||||||
|
two symbols. When the linker sees the definition `foo@@v1`, conceptually `foo`
|
||||||
|
and `foo@@v1` should be combined. If the linker sees `foo@@v2` instead,
|
||||||
|
`foo@@v2` should resolve `foo` and `foo@v1` should be a separate symbol.
|
||||||
|
|
||||||
|
* [Combining Versions](combining-versions.md) describes the problem.
|
||||||
|
* `gold/symtab.cc Symbol_table::define_default_version` uses a heuristic rule
|
||||||
|
to solve this problem. It special cases on visibility, but I feel that this
|
||||||
|
rule is unneeded.
|
||||||
|
* Before 2.36, GNU ld reported a bogus multiple definition error for defined
|
||||||
|
weak `foo@@v1` and defined global `foo@v1` [PR ld/26978](https://sourceware.org/bugzilla/show_bug.cgi?id=26978)
|
||||||
|
* Before 2.36, GNU ld had a bug that the visibility of undefined `foo@v1` does
|
||||||
|
not affect the output visibility of `foo@@v1`: [PR ld/26979](https://sourceware.org/bugzilla/show_bug.cgi?id=26979)
|
||||||
|
* I fixed the object file side of this problem for LLD 12.0 in https://reviews.llvm.org/D92259.
Archive files and lazy object files may still have incompatibility issues.
|
||||||
|
|
||||||
|
When LLD sees a defined `foo@@v1`, it adds both `foo` and `foo@v1` into the
|
||||||
|
symbol table, thus `foo@@v1` can resolve both undefined `foo` and `foo@v1`.
|
||||||
|
After processing all input files, a pass iterates symbols and redirects
|
||||||
|
`foo@v1` to `foo@@v1`. Because LLD treats them as separate symbols during input
|
||||||
|
processing, a defined `foo@v1` cannot suppress the extraction of an archive
|
||||||
|
member defining `foo@@v1`, leading to a behavior incompatible with GNU ld. This
|
||||||
|
probably does not matter, though.
|
||||||
|
|
||||||
|
GNU ld has another strange behavior: if both `foo` and `foo@v1` are defined, `foo`
|
||||||
|
will be removed. I strongly believe it is an issue in GNU ld but the maintainer
|
||||||
|
rejected [PR ld/27210](https://sourceware.org/bugzilla/show_bug.cgi?id=27210).
|
||||||
|
|
||||||
|
## Version script
|
||||||
|
|
||||||
|
To define a versioned symbol in a shared object or an executable, a version
|
||||||
|
script must be specified. If all versioned symbols are undefined, then the
|
||||||
|
version script can be omitted.
|
||||||
|
|
||||||
|
```
|
||||||
|
# Make all symbols other than foo and bar local.
|
||||||
|
{ global: foo; bar; local: *; };
|
||||||
|
|
||||||
|
# Assign version FBSD_1.0 to malloc and version FBSD_1.3 to mallocx,
|
||||||
|
# and make internal local.
|
||||||
|
FBSD_1.0 { malloc; local: internal; };
|
||||||
|
FBSD_1.3 { mallocx; };
|
||||||
|
```
|
||||||
|
|
||||||
|
A version script has three purposes:
|
||||||
|
|
||||||
|
* Define versions.
|
||||||
|
* Specify some patterns so that matched defined symbols (which do not have `@`
|
||||||
|
in the name) are tied to the specified version.
|
||||||
|
* Scope reduction: for a defined unversioned symbol matched by a `local:`
|
||||||
|
pattern, its binding will be changed to `STB_LOCAL` and will not be exported
|
||||||
|
to the dynamic symbol table.
|
||||||
|
|
||||||
|
A version script can consist of one anonymous version tag (`{...};`) or a list of
|
||||||
|
named version tags (`v1 {...};`). If you use an anonymous version tag with other
|
||||||
|
version tags, GNU ld will error: `anonymous version tag cannot be combined with
|
||||||
|
other version tags`. A `local:` part can be placed in any version tag. Which
|
||||||
|
version tag is used does not matter.
|
||||||
|
|
||||||
|
If a defined symbol is matched by multiple version tags, the following
|
||||||
|
precedence rules apply (`binutils-gdb/bfd/linker.c:find_version_for_sym`):
|
||||||
|
|
||||||
|
* The first version tag with an exact pattern (i.e. there is no wildcard) wins.
|
||||||
|
* Otherwise, the last version tag with a non-`*` wildcard pattern wins.
|
||||||
|
* Otherwise, the first version tag with a `*` pattern wins.
|
||||||
|
|
||||||
|
The gotcha is that `**` is a wildcard pattern which matches any symbol but its
|
||||||
|
precedence is higher than `*`.
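The precedence rules, including the `**` gotcha, can be sketched as follows. This models only the rules as stated above and is not the BFD implementation; `struct vs_pattern` and `pick_version_tag` are hypothetical names, and POSIX `fnmatch` stands in for the linker's glob matching.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* One pattern of a version script together with the tag it belongs to. */
struct vs_pattern {
  const char *tag;  /* version tag name, e.g. "v1" */
  const char *glob; /* pattern inside that tag, e.g. "foo*" */
};

static int is_wildcard(const char *g) { return strpbrk(g, "*?[") != NULL; }

/* Pick the version tag for `sym`:
   1. the first exact pattern wins;
   2. otherwise the last matching wildcard pattern other than "*" wins;
   3. otherwise the first "*" pattern wins.
   Returns NULL if nothing matches. */
static const char *pick_version_tag(const struct vs_pattern *p, size_t n,
                                    const char *sym) {
  const char *last_glob = NULL, *first_star = NULL;
  for (size_t i = 0; i < n; i++) {
    if (!is_wildcard(p[i].glob)) {
      if (strcmp(p[i].glob, sym) == 0)
        return p[i].tag;
    } else if (strcmp(p[i].glob, "*") == 0) {
      if (!first_star)
        first_star = p[i].tag;
    } else if (fnmatch(p[i].glob, sym, 0) == 0) {
      last_glob = p[i].tag; /* "**" lands here, so it outranks "*" */
    }
  }
  return last_glob ? last_glob : first_star;
}
```

With the patterns `v1 { *; }` and `v2 { **; }`, this sketch assigns every symbol to `v2`, which is the gotcha described above.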
|
||||||
|
|
||||||
|
Most patterns are exact so gold and LLD iterate patterns instead of symbols to
|
||||||
|
improve performance.
|
||||||
|
|
||||||
|
## How a versioned symbol is produced
|
||||||
|
|
||||||
|
An undefined symbol can be assigned a version if:
|
||||||
|
|
||||||
|
* its name does not contain `@` (`.symver` is unused) and a shared object
|
||||||
|
provides a default version definition.
|
||||||
|
* its name contains `@` and a shared object defines the symbol. GNU ld errors
|
||||||
|
if there is no such shared object. After https://reviews.llvm.org/D92260,
|
||||||
|
LLD will report an error as well.
|
||||||
|
|
||||||
|
A defined symbol can be assigned a version if:
|
||||||
|
|
||||||
|
* its name does not contain `@` and it is matched by a pattern in a named version tag in a version script.
|
||||||
|
* its name contains `@`
|
||||||
|
* If `-shared`, the version should be defined by a version script, otherwise
|
||||||
|
GNU ld errors with `version node not found for symbol`. This exception looks
|
||||||
|
strange to me so I have filed [PR ld/26980](https://sourceware.org/bugzilla/show_bug.cgi?id=26980).
|
||||||
|
* If `-no-pie` or `-pie`, a version definition is unneeded in GNU ld. This
|
||||||
|
behavior is strange.
|
||||||
|
|
||||||
|
## ld.so behavior
|
||||||
|
|
||||||
|
*Linux Standard Base Core Specification, Generic Part* describes the behavior
|
||||||
|
of ld.so. Kan added symbol versioning support to FreeBSD rtld in 2005.
|
||||||
|
|
||||||
|
The `DT_VERNEED` and `DT_VERNEEDNUM` tags in the dynamic table delimit the
|
||||||
|
version requirement by a shared object/executable file: the required versions
|
||||||
|
and required shared object names (`Vernaux::vna_name`).
|
||||||
|
|
||||||
|
For each Vernaux entry (a Verneed's auxiliary entry) without the
|
||||||
|
`VER_FLG_WEAK` bit, ld.so checks whether the referenced shared object has the
|
||||||
|
`DT_VERDEF` table. If no, ld.so handles the case as a graceful degradation; if
|
||||||
|
yes and the table does not define the version, ld.so reports an error.
|
||||||
|
[verneed-check]
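The check described above boils down to a small decision table. The sketch below encodes it. The enum, the `SK_VER_FLG_WEAK` constant (mirroring `VER_FLG_WEAK`, value 0x2) and `check_vernaux` are names invented for this example, and the "skipped" outcome for weak entries is an assumption drawn from the wording "without the `VER_FLG_WEAK` bit".

```c
#include <stdbool.h>

enum sk_verneed_result {
  SK_VERNEED_OK,       /* the referenced object defines the version */
  SK_VERNEED_SKIPPED,  /* VER_FLG_WEAK is set, so no check is made (assumed) */
  SK_VERNEED_DEGRADED, /* the object has no DT_VERDEF: graceful degradation */
  SK_VERNEED_ERROR     /* the object has DT_VERDEF but lacks the version */
};

#define SK_VER_FLG_WEAK 0x2u /* local mirror of VER_FLG_WEAK */

/* Outcome of checking one Vernaux entry against the referenced object. */
static enum sk_verneed_result check_vernaux(unsigned vna_flags,
                                            bool object_has_verdef,
                                            bool object_defines_version) {
  if (vna_flags & SK_VER_FLG_WEAK)
    return SK_VERNEED_SKIPPED;
  if (!object_has_verdef)
    return SK_VERNEED_DEGRADED;
  return object_defines_version ? SK_VERNEED_OK : SK_VERNEED_ERROR;
}
```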
|
||||||
|
|
||||||
|
Usually a minor release does not bump soname. Suppose that libB.so depends on
|
||||||
|
libA 1.3 (soname is libA.so.1) and calls a function which does not exist
|
||||||
|
in libA 1.2. If PLT lazy binding is used, libB.so may seem to work on a system
|
||||||
|
with libA 1.2, until the PLT of the 1.3 symbol is called. If symbol versioning
|
||||||
|
is not used and you want to solve this problem, you have to record the minor
|
||||||
|
version number (`libA.so.1.3`) in the soname. However, bumping soname is
|
||||||
|
all-or-nothing: all the dependent shared objects need to be relinked. If symbol
|
||||||
|
versioning is used, you can continue to use the soname `libA.so.1`. ld.so will
|
||||||
|
report an error if libA 1.2 is used, because the 1.3 version required by
|
||||||
|
libB.so does not exist.
|
||||||
|
|
||||||
|
In the symbol resolution stage:
|
||||||
|
|
||||||
|
* An undefined foo can be resolved to a definition of `foo` or `foo@@v2` (only
|
||||||
|
the definitions with index number 1 (`VER_NDX_GLOBAL`) and 2 are used in the
|
||||||
|
reference match).
|
||||||
|
* An undefined `foo@v1` can be resolved to a definition of `foo`, `foo@v1`, or
|
||||||
|
`foo@@v1`.
|
||||||
|
|
||||||
|
Note that undefined `foo` resolving to `foo@v1` is allowed by ld.so but not
|
||||||
|
allowed by the linker [{reject-non-default}](). This difference provides a
|
||||||
|
mechanism to refuse linking against old symbols while keeping compatibility
|
||||||
|
with unversioned old libraries. If a new version of a shared object needs to
|
||||||
|
deprecate an unversioned `bar`, you can remove `bar` and define `bar@compat`
|
||||||
|
instead. Libraries using `bar` are unaffected but new links against `bar` are
|
||||||
|
disallowed.
|
||||||
|
|
||||||
|
## Upgraded symbols in glibc
|
||||||
|
|
||||||
|
Note that GNU nm before binutils 2.35 does not display `@` or `@@`.
|
||||||
|
|
||||||
|
```
|
||||||
|
nm -D /lib/x86_64-linux-gnu/libc.so.6 | \
|
||||||
|
awk '$2!="U" {i=index($3,"@"); if(i){v=substr($3,i); $3=substr($3,1,i-1); m[$3]=m[$3]" "v}} \
|
||||||
|
END {for(f in m)if(m[f]~/@.+@/)print f, m[f]}'
|
||||||
|
```
|
||||||
|
|
||||||
|
The output on my x86-64 system:
|
||||||
|
|
||||||
|
```
|
||||||
|
pthread_cond_broadcast @GLIBC_2.2.5 @@GLIBC_2.3.2
|
||||||
|
clock_nanosleep @@GLIBC_2.17 @GLIBC_2.2.5
|
||||||
|
_sys_siglist @@GLIBC_2.3.3 @GLIBC_2.2.5
|
||||||
|
sys_errlist @@GLIBC_2.12 @GLIBC_2.2.5 @GLIBC_2.3 @GLIBC_2.4
|
||||||
|
quick_exit @GLIBC_2.10 @@GLIBC_2.24
|
||||||
|
memcpy @@GLIBC_2.14 @GLIBC_2.2.5
|
||||||
|
regexec @GLIBC_2.2.5 @@GLIBC_2.3.4
|
||||||
|
pthread_cond_destroy @GLIBC_2.2.5 @@GLIBC_2.3.2
|
||||||
|
nftw @GLIBC_2.2.5 @@GLIBC_2.3.3
|
||||||
|
pthread_cond_timedwait @@GLIBC_2.3.2 @GLIBC_2.2.5
|
||||||
|
clock_getres @GLIBC_2.2.5 @@GLIBC_2.17
|
||||||
|
pthread_cond_signal @@GLIBC_2.3.2 @GLIBC_2.2.5
|
||||||
|
fmemopen @GLIBC_2.2.5 @@GLIBC_2.22
|
||||||
|
pthread_cond_init @GLIBC_2.2.5 @@GLIBC_2.3.2
|
||||||
|
clock_gettime @GLIBC_2.2.5 @@GLIBC_2.17
|
||||||
|
sched_setaffinity @GLIBC_2.3.3 @@GLIBC_2.3.4
|
||||||
|
glob @@GLIBC_2.27 @GLIBC_2.2.5
|
||||||
|
sys_nerr @GLIBC_2.2.5 @GLIBC_2.4 @@GLIBC_2.12 @GLIBC_2.3
|
||||||
|
_sys_errlist @GLIBC_2.3 @GLIBC_2.4 @@GLIBC_2.12 @GLIBC_2.2.5
|
||||||
|
sys_siglist @GLIBC_2.2.5 @@GLIBC_2.3.3
|
||||||
|
clock_getcpuclockid @GLIBC_2.2.5 @@GLIBC_2.17
|
||||||
|
realpath @GLIBC_2.2.5 @@GLIBC_2.3
|
||||||
|
sys_sigabbrev @GLIBC_2.2.5 @@GLIBC_2.3.3
|
||||||
|
posix_spawnp @@GLIBC_2.15 @GLIBC_2.2.5
|
||||||
|
posix_spawn @@GLIBC_2.15 @GLIBC_2.2.5
|
||||||
|
_sys_nerr @@GLIBC_2.12 @GLIBC_2.4 @GLIBC_2.3 @GLIBC_2.2.5
|
||||||
|
nftw64 @GLIBC_2.2.5 @@GLIBC_2.3.3
|
||||||
|
pthread_cond_wait @GLIBC_2.2.5 @@GLIBC_2.3.2
|
||||||
|
sched_getaffinity @GLIBC_2.3.3 @@GLIBC_2.3.4
|
||||||
|
clock_settime @GLIBC_2.2.5 @@GLIBC_2.17
|
||||||
|
glob64 @@GLIBC_2.27 @GLIBC_2.2.5
|
||||||
|
```
|
||||||
|
|
||||||
|
* `realpath@@GLIBC_2.3`: the previous version returns `EINVAL` when the second
|
||||||
|
parameter is NULL
|
||||||
|
* `memcpy@@GLIBC_2.14` [BZ12518](https://sourceware.org/bugzilla/show_bug.cgi?id=12518):
|
||||||
|
the previous version guarantees a forward copying behavior. Shockwave Flash
|
||||||
|
at that time had a "memcpy downward" bug which required the workaround.
|
||||||
|
* `quick_exit@@GLIBC_2.24` [BZ20198](https://sourceware.org/bugzilla/show_bug.cgi?id=20198):
|
||||||
|
the previous version calls the destructors of `thread_local` objects.
|
||||||
|
* `glob64@@GLIBC_2.27`: the previous version does not follow dangling symlinks.
|
||||||
|
|
||||||
|
## How to remove symbol versioning
|
||||||
|
|
||||||
|
Imagine that you want to build an application with a prebuilt shared object
|
||||||
|
which has versioned references, but you can only find shared objects providing
|
||||||
|
the unversioned definitions. The linker will helpfully error:
|
||||||
|
|
||||||
|
```
|
||||||
|
ld.lld: error: undefined reference to foo@v1 [--no-allow-shlib-undefined]
|
||||||
|
```
|
||||||
|
|
||||||
|
As the diagnostic suggests, you can add `--allow-shlib-undefined` to get rid of
|
||||||
|
the error. It is not recommended but the built application may happen to work.
|
||||||
|
|
||||||
|
For this case, an alternative hacky solution is:
|
||||||
|
|
||||||
|
```
|
||||||
|
# 64-bit (Elf64_Dyn: 8-byte d_tag, 16-byte entry)
cp in.so out.so
r2 -wqc '/x feffff6f00000000 @ section..dynamic; w0 16 @ hit0_0' out.so
llvm-objcopy -R .gnu.version out.so

# 32-bit (Elf32_Dyn: 4-byte d_tag, 8-byte entry)
cp in.so out.so
r2 -wqc '/x feffff6f @ section..dynamic; w0 8 @ hit0_0' out.so
llvm-objcopy -R .gnu.version out.so
|
||||||
|
```
|
||||||
|
|
||||||
|
With the removal of `.gnu.version`, the linker will think that `out.so`
|
||||||
|
references foo instead of `foo@v1`. However, llvm-objcopy will zero out the
|
||||||
|
section contents. At runtime, glibc ld.so will complain `unsupported version 0
|
||||||
|
of Verneed record`. To make glibc happy, you can delete `DT_VER*` tags from the
|
||||||
|
dynamic table. The above code snippet uses an r2 command to locate
|
||||||
|
`DT_VERNEED` (0x6ffffffe) and rewrite it to `DT_NULL` (a `DT_NULL` entry stops
|
||||||
|
the parsing of the dynamic table). The difference of the `readelf -d` output is
|
||||||
|
roughly:
|
||||||
|
|
||||||
|
```
|
||||||
|
0x000000006ffffffb (FLAGS_1) Flags: NOW
|
||||||
|
- 0x000000006ffffffe (VERNEED) 0x8ef0
|
||||||
|
- 0x000000006fffffff (VERNEEDNUM) 5
|
||||||
|
- 0x000000006ffffff0 (VERSYM) 0x89c0
|
||||||
|
- 0x000000006ffffff9 (RELACOUNT) 1536
|
||||||
|
0x0000000000000000 (NULL) 0x0
|
||||||
|
```
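The effect of the r2 command can be modelled on an in-memory dynamic table. The sketch below assumes the 64-bit layout (8-byte tag, 8-byte value) and uses locally defined `sk_` names rather than `<elf.h>`. `truncate_at_verneed` is a hypothetical helper showing why everything from `DT_VERNEED` onwards disappears from the `readelf -d` output.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of a 64-bit dynamic table entry (16 bytes). */
typedef struct {
  int64_t d_tag;
  uint64_t d_val;
} sk_dyn;

#define SK_DT_NULL 0
#define SK_DT_VERNEED 0x6ffffffe

/* Overwrite the first DT_VERNEED entry with DT_NULL. Since DT_NULL ends the
   table, every entry after it is ignored by consumers. Returns the index of
   the rewritten entry, or -1 if no DT_VERNEED precedes the terminator. */
static int truncate_at_verneed(sk_dyn *dyn, size_t n) {
  for (size_t i = 0; i < n && dyn[i].d_tag != SK_DT_NULL; i++) {
    if (dyn[i].d_tag == SK_DT_VERNEED) {
      dyn[i].d_tag = SK_DT_NULL;
      dyn[i].d_val = 0;
      return (int)i;
    }
  }
  return -1;
}
```

Note that this only works as intended when the `DT_VER*` tags come last in the table, as in the `readelf -d` output above; any unrelated tag placed after `DT_VERNEED` would be dropped as well (here `RELACOUNT`).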
|
||||||
|
|
||||||
|
## LLD
|
||||||
|
|
||||||
|
* If an undefined symbol is not defined by a shared object, GNU ld will report
|
||||||
|
an error. LLD before 12.0 did not error (I fixed it in
|
||||||
|
https://reviews.llvm.org/D92260).
|
||||||
|
|
||||||
|
## Remarks
|
||||||
|
|
||||||
|
GCC and Clang support asm specifiers and `#pragma redefine_extname` for renaming a
|
||||||
|
symbol. For example, if you declare `int foo() asm("foo_v1");` and then
|
||||||
|
reference `foo`, the symbol in .o will be `foo_v1`.
|
||||||
|
|
||||||
|
For example, the biggest change in musl v1.2.0 is the time64 support for its
|
||||||
|
supported 32-bit architectures. musl adopted a scheme based on asm specifiers:
|
||||||
|
|
||||||
|
```c
|
||||||
|
// include/features.h
|
||||||
|
#define __REDIR(x,y) __typeof__(x) x __asm__(#y)
|
||||||
|
|
||||||
|
// API header include/sys/time.h
|
||||||
|
int utimes(const char *, const struct timeval [2]);
|
||||||
|
__REDIR(utimes, __utimes_time64);
|
||||||
|
|
||||||
|
// Implementation src/linux/utimes.c
|
||||||
|
int utimes(const char *path, const struct timeval times[2]) { ... }
|
||||||
|
|
||||||
|
// Internal header compat/time32/time32.h
|
||||||
|
int __utimes_time32() __asm__("utimes");
|
||||||
|
|
||||||
|
// Compat implementation compat/time32/utimes_time32.c
|
||||||
|
int __utimes_time32(const char *path, const struct timeval32 times32[2]) { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
* In .o, the time32 symbol remains `utimes` and is compatible with the ABI
|
||||||
|
required by programs linked against old musl versions; the time64 symbol is
|
||||||
|
`__utimes_time64`.
|
||||||
|
* The public header redirects utimes to `__utimes_time64`.
|
||||||
|
* Cons: if users declare `utimes` by themselves, they will not link against
|
||||||
|
the correct `__utimes_time64`.
|
||||||
|
* The "good-looking" name `utimes` is used for the preferred time64
|
||||||
|
implementation internally and the "ugly" name `__utimes_time32` is used for
|
||||||
|
the legacy time32 implementation.
|
||||||
|
* If the time32 implementation is called elsewhere, the "ugly" name can make
|
||||||
|
it stand out.
|
||||||
|
|
||||||
|
For the above example, here is an implementation with symbol versioning:
|
||||||
|
|
||||||
|
```c
|
||||||
|
// API header include/sys/time.h
|
||||||
|
int utimes(const char *, const struct timeval [2]);
|
||||||
|
|
||||||
|
// Implementation src/linux/utimes.c
|
||||||
|
int utimes(const char *path, const struct timeval times[2]) { ... }
|
||||||
|
|
||||||
|
// Internal header compat/time32/time32.h
|
||||||
|
// Probably __asm__(".symver __utimes_time32, utimes@time32, remove"); if supported
|
||||||
|
__asm__(".symver __utimes_time32, utimes@time32");
|
||||||
|
|
||||||
|
// Implementation compat/time32/utimes_time32.c
|
||||||
|
int __utimes_time32(const char *path, const struct timeval32 times32[2])
|
||||||
|
{
|
||||||
|
...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Note that `@@@` cannot be used. The header is included in a defining
|
||||||
|
translation unit and `@@@` will lead to a default version definition while we
|
||||||
|
want a non-default version definition.
|
||||||
|
|
||||||
|
As described in the Assembler behavior section, the undesirable `__utimes_time32` is present.
|
||||||
|
Be careful to use a version script to localize it.
|
||||||
|
|
||||||
|
So what is the significance of symbol versioning? After some careful thought, I think:
|
||||||
|
|
||||||
|
* Refuse linking against old symbols while keeping compatibility with
|
||||||
|
unversioned old libraries. [{reject-non-default}]()
|
||||||
|
* No need to label declarations.
|
||||||
|
* The version definition can be delayed until link time. The version script
|
||||||
|
provides a flexible pattern matching mechanism to assign versions.
|
||||||
|
* Scope reduction. Arguably another mechanism like `--dynamic-list` might have
|
||||||
|
been developed if version scripts did not provide `local:`.
|
||||||
|
* There are some semantic issues in renaming builtin functions with asm
|
||||||
|
specifiers in GCC and Clang (they do not know that the renamed symbol has
|
||||||
|
built-in semantics). See [2020-10-15-intra-call-and-libc-symbol-renaming](https://maskray.me/blog/2020-10-15-intra-call-and-libc-symbol-renaming)
|
||||||
|
* [verneed-check]
|
||||||
|
|
||||||
|
For the first item, the asm specifier scheme uses conventions to prevent
|
||||||
|
problems (users should include the header); and symbol versioning can be enforced
|
||||||
|
by ld.
|
||||||
|
|
||||||
|
Design flaws:
|
||||||
|
|
||||||
|
* The behavior of `.symver foo, foo@v1` when `foo` is defined [{gas-copy}](): the
original symbol `foo` is retained (a redundant symbol at link time), and the
binding and `st_other` of the two symbols are kept in sync (it is inconvenient
to set a different binding or visibility).
|
||||||
|
* Verdaux is a bit redundant. In practice, one Verdef has only one auxiliary
|
||||||
|
Verdaux entry.
|
||||||
|
* This is arguably a minor problem but annoying for a framework providing
|
||||||
|
multiple shared objects. ld.so requires "a versioned symbol is implemented in
|
||||||
|
the same shared object in which it was found at link time", which disallows
|
||||||
|
moving definitions between shared objects. Fortunately, glibc 2.30 [BZ24741](http://sourceware.org/PR24741)
|
||||||
|
relaxes this requirement, essentially ignoring `Vernaux::vna_name`.
|
||||||
|
|
||||||
|
Before that, glibc used a forwarder to move `clock_*` functions from librt.so
|
||||||
|
to libc.so:
|
||||||
|
|
||||||
|
```c
|
||||||
|
// rt/clock-compat.c
|
||||||
|
__typeof(clock_getres) *clock_getres_ifunc(void) asm("clock_getres");
|
||||||
|
__typeof(clock_getres) *clock_getres_ifunc(void) { return &__clock_getres; }
|
||||||
|
```
|
||||||
|
|
||||||
|
libc.so defines `__clock_getres` and `clock_getres`. librt.so defines an ifunc
|
||||||
|
called `clock_getres` which forwards to libc.so `__clock_getres`.
|
||||||
|
|
||||||
|
## Related links
|
||||||
|
|
||||||
|
* [Combining Versions](combining-versions.md)
|
||||||
|
* [Version Scripts](version-scripts.md)
|
||||||
|
* https://invisible-island.net/ncurses/ncurses-mapsyms.html
|
||||||
|
|
File diff suppressed because it is too large
|
@ -0,0 +1,371 @@
|
||||||
|
# LLD and GNU linker incompatibilities
|
||||||
|
|
||||||
|
Subtitle: Is LLD a drop-in replacement for GNU ld?
|
||||||
|
|
||||||
|
The motivation for this article was someone challenging the "drop-in
|
||||||
|
replacement" claim on LLD's website (the discussion was about Linux-like ELF
|
||||||
|
toolchain):
|
||||||
|
|
||||||
|
> LLD is a linker from the LLVM project that is a drop-in replacement for
|
||||||
|
> system linkers and runs much faster than them. It also provides features that
|
||||||
|
> are useful for toolchain developers.
|
||||||
|
|
||||||
|
99.9% of software works with LLD without a change. Some linker script
|
||||||
|
applications may need an adaptation (such adaptation is oftentimes due to brittle
|
||||||
|
assumptions: asking too much from GNU ld's behavior which should be fixed
|
||||||
|
anyway). So I defended this claim.
|
||||||
|
|
||||||
|
Piotr Kubaj said that this is probably more of a marketing term than a
|
||||||
|
technical term, the term tries to lure existing users into thinking "it's the
|
||||||
|
same you know, but better!". I think that this is fair in some senses: for many
|
||||||
|
applications LLD has achieved much faster speed and much lower memory usage
|
||||||
|
than GNU ld. A more important thing is that LLD adds a third choice to the
|
||||||
|
spectrum. It brings competitive pressure to both sides, gives incentive for
|
||||||
|
improvement, and makes for more standardized future features/extensions. One
|
||||||
|
reason that I am subscribed to the binutils mailing list is I want to
|
||||||
|
participate in its design processes (I am proud to say that I have managed to
|
||||||
|
find some early issues of various new things).
|
||||||
|
|
||||||
|
Anyway, I thought documenting the compatibility problems between the ELF ports
|
||||||
|
of LLD and GNU ld is useful, not only to others but also to my future self,
|
||||||
|
hence this article. I will try to describe GNU gold behaviors as well.
|
||||||
|
|
||||||
|
So here is the long list. Please keep in mind that many compatibility issues do
|
||||||
|
not really matter and a user may never run into such an issue. Many of them
|
||||||
|
just serve educational purposes and as my personal reference. There are some
|
||||||
|
user-perceivable differences but quite a lot are WONTFIX on both GNU ld and
|
||||||
|
LLD. LLD, as a newer linker, has less legacy compatibility burden and can make
|
||||||
|
good default choices in some cases and say no to some unneeded
|
||||||
|
features/behaviors. A large number of features are duplicated in GNU ld's
|
||||||
|
various ports. It is also common that one thing behaves this way in port A and
|
||||||
|
another way in port B.
|
||||||
|
|
||||||
|
* GNU ld reports `gc-sections requires either an entry or an undefined symbol`
|
||||||
|
in a `-r --gc-sections` link. LLD doesn't error
|
||||||
|
(https://reviews.llvm.org/D84131#2162411). I am unsure whether such a
|
||||||
|
diagnostic will be useful (an uncommon use case where the GC roots are more
|
||||||
|
than the explicit linker options).
|
||||||
|
* The default image base for `-no-pie` links is different. For example, on
|
||||||
|
x86-64, GNU ld defaults to 0x400000 while LLD defaults to 0x200000.
|
||||||
|
* GNU ld synthesizes a `STT_FILE` symbol when copying non-`STT_SECTION`
|
||||||
|
`STB_LOCAL` symbols. LLD doesn't.
|
||||||
|
* The `STT_FILE` symbol name is the input filename. For compiler driver
|
||||||
|
specified startup files like `crti.o` and `crtn.o`, their absolute paths
|
||||||
|
will end up in the linked image. This breaks local determinism (toolchain
|
||||||
|
paths are leaked) for some users.
|
||||||
|
* I filed https://bugs.llvm.org/show_bug.cgi?id=48023 and
|
||||||
|
https://sourceware.org/bugzilla/show_bug.cgi?id=26822. From binutils 2.36
|
||||||
|
onwards, the base name will be used.
|
||||||
|
* Text relocations.
|
||||||
|
* In GNU ld, `-z notext`/`-z text`/unspecified are a tri-state. For
|
||||||
|
`-z notext`/unspecified, the dynamic tags `DT_TEXTREL` and `DF_TEXTREL` are
|
||||||
|
added on demand. If unspecified and GNU ld is configured with
|
||||||
|
`--enable-textrel-check=warning`, a warning will be issued.
|
||||||
|
* LLD has two states and adds `DT_TEXTREL` and `DF_TEXTREL` if `-z notext` is specified.
|
||||||
|
* GNU ld supports more relocation types as text relocations.
|
||||||
|
* Default library paths.
|
||||||
|
* GNU ld has default library paths.
|
||||||
|
* LLD doesn't. This is intentional so https://reviews.llvm.org/D70048
|
||||||
|
(NetBSD) cannot be accepted.
|
||||||
|
* GNU ld supports grouped short options. This can sometimes cause surprising
|
||||||
|
behaviors with misspelled or unimplemented options, e.g. `-no-pie` means
|
||||||
|
`-n -o -pie` because GNU ld as of 2.35 has not implemented `-no-pie`. Nick
|
||||||
|
Clifton committed `Update the BFD linker so that it deprecates grouped short
|
||||||
|
options.` to deprecate the GNU ld feature. LLD never supports grouped short
|
||||||
|
options.
|
||||||
|
* Mixed `SHF_LINK_ORDER` and non-`SHF_LINK_ORDER` input sections in an output
|
||||||
|
section.
|
||||||
|
* LLD performs sorting within an input section description and allows
|
||||||
|
arbitrary mixes.
|
||||||
|
* GNU ld does not allow mixed sections
|
||||||
|
https://sourceware.org/bugzilla/show_bug.cgi?id=26256 (H.J. Lu has a patch)
|
||||||
|
* LLD uses `-z relro` by default. This is probably not a good default
|
||||||
|
but it is difficult to change now. I have a comment
|
||||||
|
https://bugs.llvm.org/show_bug.cgi?id=48549. GNU ld warns for `-z relro` and
|
||||||
|
`-z norelro` for non Linux/FreeBSD BFD emulations (e.g. `-m aarch64elf`).
|
||||||
|
* Different archive member extraction semantics. See
|
||||||
|
http://lld.llvm.org/ELF/warn_backrefs.html for details.
|
||||||
|
* LLD `--warn-backrefs` warns for `def.a ref.o def.so` if `def.a` cannot
|
||||||
|
satisfy previous unresolved symbols. LLD resolves the definition to `def.a`
|
||||||
|
while GNU linkers resolve the definition to `def.so`.
|
||||||
|
* GNU ld `-static` has traditionally been a synonym to `-Bstatic`. Recently on
|
||||||
|
x86 it has been changed to behave a bit similar to gold `-static`, which
|
||||||
|
disallows linking against shared objects. LLD `-static` is still a synonym to
|
||||||
|
`-Bstatic`.
|
||||||
|
* GNU linkers have a default `--dynamic-linker`. LLD doesn't.
|
||||||
|
* GNU linkers warn for `.gnu.warning.*` sections. LLD doesn't. It is unclear
|
||||||
|
whether the feature is useful. https://bugs.llvm.org/show_bug.cgi?id=42008
|
||||||
|
* GNU ld has architecture-specific rules for relocations referencing undefined
|
||||||
|
weak symbols. I don't think the GNU ld behaviors can be summarized (even by
|
||||||
|
maintainers!). LLD's are consistent.
|
||||||
|
* The conditions to create `.interp` are different. I believe GNU ld's is quite
|
||||||
|
difficult to describe.
|
||||||
|
* `--no-allow-shlib-undefined` and `--rpath-link`
|
||||||
|
* GNU ld traces all shared objects (transitive `DT_NEEDED` dependencies) and
|
||||||
|
emulates the behavior of a dynamic loader to warn in more cases.
|
||||||
|
* gold and LLD implement a simplified version. They warn for shared objects
|
||||||
|
whose `DT_NEEDED` dependencies are all seen as input files.
|
||||||
|
* `--fatal-warnings`
|
||||||
|
* GNU ld still reports `warning: ...`.
|
||||||
|
* LLD switches to `error: ...`.
|
||||||
|
* `--no-relax`
|
||||||
|
* GNU ld: disable `R_X86_64_[REX_]GOTPCRELX`
|
||||||
|
* LLD: no-op (https://reviews.llvm.org/D81359)
|
||||||
|
* LLD places `.rodata` (among other `SHF_ALLOC` and
|
||||||
|
non-`SHF_WRITE`-non-`SHF_EXECINSTR` sections) before `.text` (among other
|
||||||
|
`SHF_ALLOC` and `SHF_EXECINSTR` sections).
|
||||||
|
* `.symtab`/`.shstrtab`/`.strtab` in a linker script.
|
||||||
|
* Ignored by GNU ld, therefore `--orphan-handling=` does not warn/error.
|
||||||
|
* Respected by LLD
|
||||||
|
* Whether `ADDR(.foo)` in a linker script can retain an empty output section.
|
||||||
|
* GNU ld: no. Symbol assignments relative to such empty sections may have
|
||||||
|
strange `st_shndx`.
|
||||||
|
* LLD: yes.
|
||||||
|
* If an undefined symbol is referenced by both `R_X86_64_JUMP_SLOT` (lazy) and
|
||||||
|
`R_X86_64_GLOB_DAT` (non-lazy)
|
||||||
|
* GNU ld generates `.plt.got` with `R_X86_64_GLOB_DAT` relocations.
|
||||||
|
`R_X86_64_JUMP_SLOT` can thus be omitted to decrease the number of dynamic
|
||||||
|
relocations.
|
||||||
|
* LLD does not implement this saving. This naturally requires more than one
|
||||||
|
pass scanning relocations which LLD doesn't do at present. https://bugs.llvm.org/show_bug.cgi?id=32938
|
||||||
|
* GNU ld relaxes `R_X86_64_GOTPCREL` relocations with some forms (e.g.
|
||||||
|
`movq foo@GOTPCREL(%rip), %reg` -> `leaq foo(%rip), %reg`). LLD never
|
||||||
|
relaxes `R_X86_64_GOTPCREL` relocations.
|
||||||
|
* GNU linkers give `.gnu.linkonce*` sections COMDAT section semantics. LLD
|
||||||
|
simply ignores such sections. https://bugs.llvm.org/show_bug.cgi?id=31586
|
||||||
|
tracks when the hack can be removed.
|
||||||
|
* GNU ld adds `PT_PHDR` and `PT_INTERP` together. A shared object usually does
|
||||||
|
not have two program headers. In LLD, `PT_PHDR` is always added unless the
|
||||||
|
address assignment makes it unsuitable to place program headers at all.
|
||||||
|
* The conditions to create the dynamic symbol table `.dynsym`.
|
||||||
|
* LLD: there is an input shared object, `-pie`/`-shared`, or `--export-dynamic`.
|
||||||
|
* GNU ld's is quite complex. `--export-dynamic` is not special, though.
|
||||||
|
* `--export-dynamic-symbol`
|
||||||
|
* gold's implies `-u`.
|
||||||
|
* GNU ld (from 2.35 onwards) and LLD's do not imply `-u`.
|
||||||
|
* In GNU ld, a defined `foo@v` can suppress the extraction of an archive member
|
||||||
|
defining `foo@@v1`. LLD treats them as two separate symbols and thus the archive
|
||||||
|
member extraction still happens. This can hardly matter. See [All about symbol
|
||||||
|
versioning](maskray-2.md) for details.
|
||||||
|
* Default program headers.
|
||||||
|
* With traditional `-z noseparate-code`, GNU ld defaults to a `RX/R/RW`
|
||||||
|
program header layout. With `-z separate-code` (default on Linux/x86 from
|
||||||
|
binutils 2.31 onwards), GNU ld defaults to a `R/RX/R/RW` program header
|
||||||
|
layout.
|
||||||
|
* LLD defaults to `R/RX/RW(RELRO)/RW(non-RELRO)`. With `--rosegment`, LLD
|
||||||
|
uses `RX/RW(RELRO)/RW(non-RELRO)`.
|
||||||
|
* Placing all R before RX is preferable because it can save one program
|
||||||
|
header and reduce alignment costs.
|
||||||
|
* LLD's split of RW saves one maxpagesize alignment and can make the linked
|
||||||
|
image smaller.
|
||||||
|
* This breaks some assumptions that the (so-called) "text segment" precedes
|
||||||
|
the (so-called) "data segment".
|
||||||
|
* For example, certain programs expect `.text` to be the first section of the
|
||||||
|
text segment and specify `-Ttext=0` to place the `PF_R|PF_X` program header
|
||||||
|
at `p_vaddr=0`. This is a brittle assumption and should be avoided. If
|
||||||
|
`PT_PHDR` is needed, `--image-base=0` is a replacement. If `PT_PHDR` is not
|
||||||
|
needed, `.text 0 : { *(.text .text.*) }` is a replacement.
|
||||||
|
* GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in
|
||||||
|
`-pie` mode. glibc `csu/libc-start.c` needs it when statically linked, but
|
||||||
|
not in the static pie mode. LLD does not distinguish `-no-pie`, `-pie` and
|
||||||
|
`-shared`. https://bugs.llvm.org/show_bug.cgi?id=48674
|
||||||
|
* LLD uses `--no-apply-dynamic-relocs` by default. GNU ld and gold fill in the
|
||||||
|
GOT entries with link-time values. GNU ld only supports
|
||||||
|
`--no-apply-dynamic-relocs` for aarch64
|
||||||
|
https://sourceware.org/bugzilla/show_bug.cgi?id=25891.
|
||||||
|
* When relaxing `R_X86_64_REX_GOTPCRELX`, GNU ld suppresses the relaxation if
|
||||||
|
it would cause relocation overflow. LLD does not perform the check.
|
||||||
|
* GNU ld and gold allow `--exclude-libs=b` to hide `b.a`. LLD requires
|
||||||
|
`--exclude-libs=b.a`.
|
||||||
|
* Whether to use executable stack if neither `-z execstack` nor `-z noexecstack`
|
||||||
|
is specified. GNU ld and gold check whether an object file does not have
|
||||||
|
`.note.GNU-stack`. LLD ignores `.note.GNU-stack` and defaults to `-z
|
||||||
|
noexecstack`.
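
The grouped short option surprise from the list above can be reproduced with a toy parser. This is only a sketch, not GNU ld's option parser: the function name `split_grouped` and the two-entry option table (`n` takes no argument, `o` takes the rest of the word as its argument) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of grouped short option parsing (hypothetical option table):
   'n' takes no argument, 'o' consumes the rest of the word as its argument.
   Writes the expanded form into out. Returns 0 on success, -1 on an unknown
   letter, a word not starting with '-', or a too-small buffer. */
int split_grouped(const char *arg, char *out, size_t outsz) {
  assert(arg != NULL && out != NULL && outsz > 0);
  size_t used = 0;
  out[0] = '\0';
  if (arg[0] != '-') return -1;
  for (const char *p = arg + 1; *p != '\0'; ++p) {
    const char *sep = used ? " " : "";
    int n;
    if (*p == 'n') {
      n = snprintf(out + used, outsz - used, "%s-n", sep);
    } else if (*p == 'o') {
      /* Everything after 'o' becomes the output file name. */
      n = snprintf(out + used, outsz - used, "%s-o %s", sep, p + 1);
      return (n < 0 || (size_t)n >= outsz - used) ? -1 : 0;
    } else {
      return -1; /* unknown letter */
    }
    if (n < 0 || (size_t)n >= outsz - used) return -1;
    used += (size_t)n;
  }
  return 0;
}
```

Feeding `-no-pie` to this model yields `-n -o -pie`, i.e. an output file literally named `-pie`, which is the behavior described for GNU ld 2.35.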
|
||||||
|
|
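
The `movq foo@GOTPCREL(%rip), %reg` to `leaq foo(%rip), %reg` rewrite mentioned in the list above comes down to a one-byte opcode change (`0x8b` to `0x8d`). The sketch below shows only that byte rewrite. The function name `relax_mov_to_lea` is hypothetical, the function takes a pointer to the start of the instruction (a real linker works backwards from the relocation offset), and the displacement fix-up that makes the relocation resolve to the symbol instead of its GOT entry is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrite `movq foo@GOTPCREL(%rip), %reg` into `leaq foo(%rip), %reg` in place.
   insn points at the REX prefix; the layout is REX.W, opcode, ModRM, disp32.
   Returns 1 if the opcode was rewritten, 0 if the bytes do not match. */
int relax_mov_to_lea(uint8_t *insn, size_t len) {
  assert(insn != NULL);
  if (len < 7) return 0;
  if ((insn[0] & 0xfb) != 0x48) return 0; /* REX.W, with or without REX.R */
  if (insn[1] != 0x8b) return 0;          /* mov r64, r/m64 */
  if ((insn[2] & 0xc7) != 0x05) return 0; /* ModRM: mod=00, rm=101 (RIP-relative) */
  insn[1] = 0x8d;                         /* lea r64, m */
  return 1;
}
```

Instructions that do not match this exact form are left untouched, which corresponds to the "some forms" restriction in the text.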
||||||
|
## Semantics of `--wrap`
|
||||||
|
|
||||||
|
GNU ld and LLD have slightly different `--wrap` semantics. I use "slightly"
|
||||||
|
because in most use cases users will not observe a difference.
|
||||||
|
|
||||||
|
In GNU ld, `--wrap` only applies to undefined symbols. In LLD, `--wrap` happens
|
||||||
|
after all other symbol resolution steps. The implementation is to mangle the
|
||||||
|
symbol table of each object file (`foo` -> `__wrap_foo`; `__real_foo` ->
|
||||||
|
`foo`) so that all relocations to `foo` or `__real_foo` will be redirected.
|
||||||
|
|
||||||
|
The LLD semantics have the advantage that non-LTO, LTO and relocatable link
|
||||||
|
behaviors are consistent. I filed
|
||||||
|
https://sourceware.org/bugzilla/show_bug.cgi?id=26358 for GNU ld.
|
||||||
|
|
||||||
|
```
|
||||||
|
# GNU ld: call bar
|
||||||
|
# LLD: call __wrap_bar
|
||||||
|
call bar
|
||||||
|
.globl bar
|
||||||
|
bar:
|
||||||
|
```
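
The renaming rule described above (`foo` -> `__wrap_foo`; `__real_foo` -> `foo`) can be modeled as a small string function. This is a sketch of the rule as stated in this article, not LLD's source code; `wrap_rename` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Model of the renaming for `--wrap=wrapped`: a reference to `wrapped`
   becomes `__wrap_<wrapped>`, a reference to `__real_<wrapped>` becomes
   `wrapped`, and any other name is left alone.
   Returns 0 on success, -1 if out is too small. */
int wrap_rename(const char *name, const char *wrapped, char *out, size_t outsz) {
  assert(name != NULL && wrapped != NULL && out != NULL);
  static const char real_prefix[] = "__real_";
  const size_t prefix_len = sizeof real_prefix - 1;
  int n;
  if (strcmp(name, wrapped) == 0)
    n = snprintf(out, outsz, "__wrap_%s", wrapped);
  else if (strncmp(name, real_prefix, prefix_len) == 0 &&
           strcmp(name + prefix_len, wrapped) == 0)
    n = snprintf(out, outsz, "%s", wrapped);
  else
    n = snprintf(out, outsz, "%s", name);
  return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Because the rule is applied to every reference, a `call bar` in the file that defines `bar` is redirected as well, which matches the LLD result in the assembly example.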
|
||||||
|
|
||||||
|
## Relocation referencing a local relative to a discarded input section
|
||||||
|
|
||||||
|
* How to resolve a relocation referencing a `STT_SECTION` symbol associated with a
|
||||||
|
discarded `.debug_*` input section.
|
||||||
|
* GNU ld and gold have logic resolving the relocation to the prevailing
|
||||||
|
section symbol.
|
||||||
|
* LLD does not have the logic. LLD 11 defines some tombstone values.
|
||||||
|
|
||||||
|
> A symbol table entry with `STB_LOCAL` binding that is defined relative to one
|
||||||
|
> of a group's sections, and that is contained in a symbol table section that
|
||||||
|
> is not part of the group, must be discarded if the group members are
|
||||||
|
> discarded. References to this symbol table entry from outside the group are
|
||||||
|
> not allowed.
|
||||||
|
|
||||||
|
ld.bfd/gold/lld error if the section containing the relocation is `SHF_ALLOC`.
|
||||||
|
`.debug*` do not have the `SHF_ALLOC` flag and those relocations are allowed.
|
||||||
|
|
||||||
|
lld resolves such relocations to 0. ld.bfd and gold, however, have some
|
||||||
|
`CB_PRETEND`/`PRETEND` logic to resolve relocations to the definitions in the
|
||||||
|
prevailing comdat groups. The code is hacky and may not suit lld.
|
||||||
|
|
||||||
|
https://bugs.llvm.org/show_bug.cgi?id=42030
|
||||||
|
|
||||||
|
## Canonical PLT entry for ifunc
|
||||||
|
|
||||||
|
How to handle a direct access relocation referencing a `STT_GNU_IFUNC`?
|
||||||
|
|
||||||
|
cf. [GNU indirect function](maskray-6.md).
|
||||||
|
|
||||||
|
## `__rela_iplt_start`
|
||||||
|
|
||||||
|
GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in `-pie`
|
||||||
|
mode. LLD defines `__rela_iplt_start` regardless of `-no-pie`, `-pie` or
|
||||||
|
`-shared`.
|
||||||
|
|
||||||
|
Static pie and static no-pie relocation processing is very different in glibc.
|
||||||
|
|
||||||
|
* Static no-pie uses special code to process a magic array delimited by
|
||||||
|
`__rela_iplt_start`/`__rela_iplt_end`.
|
||||||
|
* Static pie uses self-relocation to take care of `R_*_IRELATIVE`. The above
|
||||||
|
magic array code is executed as well. If `__rela_iplt_start`/`__rela_iplt_end`
|
||||||
|
are defined (like what LLD does), we will get
|
||||||
|
`0 < __rela_iplt_start < __rela_iplt_end` in `csu/libc-start.c`.
|
||||||
|
`ARCH_SETUP_IREL` will crash when resolving the first relocation which has
|
||||||
|
been processed.
|
||||||
|
|
||||||
|
nsz has a glibc patch that moves the self-relocation later so everything is set up for ifunc resolvers.
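
The "magic array" processing can be pictured as a loop over relocation records that calls each resolver and stores the result. The following is a simplified sketch and not glibc's code: `rela_t` mirrors the layout of `Elf64_Rela` but is declared locally, and `apply_irel`, `demo_impl` and `demo_resolver` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as Elf64_Rela, declared locally to keep the sketch portable. */
typedef struct {
  uint64_t r_offset; /* address of the slot to fill */
  uint64_t r_info;   /* relocation type; ignored in this sketch */
  int64_t r_addend;  /* address of the ifunc resolver */
} rela_t;

typedef uintptr_t (*resolver_fn)(void);

/* Call each resolver in [start, end) and store its result in the slot. */
void apply_irel(const rela_t *start, const rela_t *end) {
  assert(start <= end);
  for (const rela_t *r = start; r < end; ++r) {
    uintptr_t *slot = (uintptr_t *)(uintptr_t)r->r_offset;
    resolver_fn resolve = (resolver_fn)(uintptr_t)r->r_addend;
    *slot = resolve();
  }
}

/* Hypothetical implementation and resolver, used only to exercise the loop. */
static int demo_impl(void) { return 7; }
static uintptr_t demo_resolver(void) { return (uintptr_t)&demo_impl; }
```

In the static no-pie case the bounds would be `__rela_iplt_start` and `__rela_iplt_end`. The loop assumes every record is still unprocessed, which is why running it after self-relocation has already handled the same records is a problem.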
|
||||||
|
|
||||||
|
## Linker scripts
|
||||||
|
|
||||||
|
* Some linker script commands are unimplemented in LLD, e.g. `BLOCK()` as a
|
||||||
|
compatibility alias for `ALIGN()`. `BLOCK` is documented in GNU ld as a
|
||||||
|
compatibility alias and it is not widely used, so there is no reason to keep
|
||||||
|
the kludge in LLD.
|
||||||
|
* Some syntax is not recognized by LLD, e.g. LLD recognizes
|
||||||
|
`*(EXCLUDE_FILE(a.o) .text)` but not `EXCLUDE_FILE(a.o) *(.text)`
|
||||||
|
(https://bugs.llvm.org/show_bug.cgi?id=45764)
|
||||||
|
* To me the unrecognized syntax is misleading.
|
||||||
|
* If we support one way of doing something, and the thing has several
|
||||||
|
alternative syntaxes, we may not consider the alternatives just for the
|
||||||
|
sake of completeness.
|
||||||
|
* Different orphan section placement. GNU ld has very complex rules and certain
|
||||||
|
section names have special semantics. LLD adopted some of its core ideas but
|
||||||
|
made a lot of simplifications:
|
||||||
|
* output sections are given ranks
|
||||||
|
* output sections are placed after symbol assignments. At some point we should
|
||||||
|
document it. https://bugs.llvm.org/show_bug.cgi?id=42327
|
||||||
|
* For an error detected when processing a linker script, LLD may report it
|
||||||
|
multiple times (e.g. `ASSERT` failure). GNU ld has such issues, too, but
|
||||||
|
probably much more rarely.
|
||||||
|
* `SORT` commands
|
||||||
|
* GNU ld: https://sourceware.org/binutils/docs/ld/Input-Section-Basics.html#Input-Section-Basics
|
||||||
|
mentions the feature but its behavior is strange/unintuitive. I created
|
||||||
|
`SORT` and multiple patterns in an input section description.
|
||||||
|
* LLD performs sorting within an input section description.
|
||||||
|
https://reviews.llvm.org/D91127
|
||||||
|
* In LLD, `AT(lma)` forces creation of a new `PT_LOAD` program header. GNU ld
|
||||||
|
can reuse the previous `PT_LOAD` program header if LMA addresses are
|
||||||
|
contiguous. `lma-offset.s`
|
||||||
|
* In LLD, non-`SHF_ALLOC` sections always get 0 `sh_addr`. In GNU ld you can
|
||||||
|
have non-zero `sh_addr` but `STT_SECTION` relocations referencing such
|
||||||
|
sections are not really meaningful.
|
||||||
|
* Dot assignment (e.g. `. = 4;`) in an output section description.
|
||||||
|
* GNU ld: dot advances to 4 relative to the start. If you consider `.` on the
|
||||||
|
right hand side and `ABSOLUTE(.)`, I don't think the behaviors are
|
||||||
|
consistent.
|
||||||
|
* LLD: move dot to address 0x4, which will usually trigger an unable to move
|
||||||
|
location counter backward error. https://bugs.llvm.org/show_bug.cgi?id=41169
|
||||||
|
|
||||||
|
I'll also mention some LLD release notes which can demonstrate some GNU
|
||||||
|
incompatibility in previous versions. (For example, if one thing is supported
|
||||||
|
in version N, then the implication is that it is unsupported in previous
|
||||||
|
versions. Well, it could be that it worked in older versions but regressed at
|
||||||
|
some version. However, I am not aware of any such things.)
|
||||||
|
|
||||||
|
LLD 12.0.0
|
||||||
|
|
||||||
|
* `-r --gc-sections` is supported.
|
||||||
|
* The archive member extraction semantics of COMMON symbols is by default
|
||||||
|
(`--fortran-common`) compatible with GNU ld. You may want to read Semantics
|
||||||
|
of a common definition in an archive for details. This is unfortunate.
|
||||||
|
* `.rel[a].plt` and `.rel[a].dyn` get the `SHF_INFO_LINK` flag. https://reviews.llvm.org/D89828
|
||||||
|
|
||||||
|
LLD 11.0.0
|
||||||
|
|
||||||
|
* LLD can discard unused symbols with `--discard-all`/`--discard-locals` when
|
||||||
|
`-r` or `--emit-relocs` is specified. https://reviews.llvm.org/D77807
|
||||||
|
* `--emit-relocs --strip-debug` can be used. https://reviews.llvm.org/D74375
|
||||||
|
* `SHT_GNU_verneed` in shared objects are parsed, and versioned undefined
|
||||||
|
symbols in shared objects are respected. Previously non-default version
|
||||||
|
symbols could cause spurious `--no-allow-shlib-undefined` errors.
|
||||||
|
https://reviews.llvm.org/D80059
|
||||||
|
* `DF_1_PIE` is set for position-independent executables. https://reviews.llvm.org/D80872
|
||||||
|
* Better compatibility related to output section alignments and LMA regions.
|
||||||
|
[D75286](https://reviews.llvm.org/D75286) [D74297](https://reviews.llvm.org/D74297)
|
||||||
|
[D75724](https://reviews.llvm.org/D75725) [D81986](https://reviews.llvm.org/D81986)
|
||||||
|
* `-r` allows `SHT_X86_64_UNWIND` to be merged into `SHT_PROGBITS`. This allows
|
||||||
|
clang/GCC produced object files to be mixed together. https://reviews.llvm.org/D85785
|
||||||
|
* In an input section description, the filename can be specified in double
|
||||||
|
quotes. `archive:file` syntax is added. https://reviews.llvm.org/D72517 https://reviews.llvm.org/D75100
|
||||||
|
* Linker script specified empty `(.init|.preinit|.fini)_array` are allowed with
|
||||||
|
`RELRO`. https://reviews.llvm.org/D76915
|
||||||
|
|
||||||
|
LLD 10.0.0
|
||||||
|
|
||||||
|
* LLD supports `\` (treating the next character like a non-meta character) and
|
||||||
|
`[!...]` (negation) in glob patterns. https://reviews.llvm.org/D66613
|
||||||
|
|
||||||
|
LLD 9.0.0
|
||||||
|
|
||||||
|
* The `DF_STATIC_TLS` flag is set for i386 and x86-64 when initial-exec TLS
|
||||||
|
models are used.
|
||||||
|
* Many configurations of the Linux kernel's `arm32_7`, `arm64`, `powerpc64le`
|
||||||
|
and `x86_64` ports can be linked by LLD.
|
||||||
|
|
||||||
|
LLD 8.0.0
|
||||||
|
|
||||||
|
* `SHT_NOTE` sections get very high ranks (they usually precede other
|
||||||
|
sections). https://reviews.llvm.org/D55800
|
||||||
|
|
||||||
|
In the LLD 7.0.0 era, https://reviews.llvm.org/D44264 was my first meaningful
|
||||||
|
(albeit trivial) patch to LLD. Next I contributed to `--warn-backrefs`.
|
||||||
|
Then I started to fix tricky issues like copy relocations of a versioned
|
||||||
|
symbol, duplicate `--wrap`, and section ranks. I have learned a lot from these
|
||||||
|
code reviews. In the 8.0.0, 9.0.0 and 10.0.0 era, I have fixed a number of
|
||||||
|
tricky issues and improved a dozen other things and am confident to say that
|
||||||
|
other than MIPS ;-) and certain other ISA specific things I am familiar with
|
||||||
|
every corner of the code base. There are still challenges such as integration
|
||||||
|
of RISC-V style linker relaxation and post-link optimization, improvement to
|
||||||
|
some aspects of the linker script, but otherwise LLD is a stable and finished
|
||||||
|
part of the toolchain.
|
||||||
|
|
||||||
|
A few random notes:
|
||||||
|
|
||||||
|
* Symbol resolution can take 10%~20% time. Parallelization can theoretically
|
||||||
|
improve the process but it is hard to overstate the challenge (if you
|
||||||
|
additionally take into account determinism).
|
||||||
|
* Be wary of feature creep. I have learned a lot from ELF design discussions
|
||||||
|
on generic-abi and from Solaris "linker aliens" in particular. I am sorry to
|
||||||
|
say so but some development on LLD indeed belongs to such categories.
|
||||||
|
Sometimes it is difficult to draw a line between unsupported legacy and
|
||||||
|
legacy we have to support.
|
||||||
|
* LLD's adoption is now so large that sometimes a decision (like a default
|
||||||
|
value for an option) cannot make everyone happy.
|
||||||
|
|
|
@ -0,0 +1,462 @@
|
||||||
|
# Copy relocations, canonical PLT entries and protected visibility
|
||||||
|
|
||||||
|
Background:
|
||||||
|
|
||||||
|
* `-fno-pic` can only be used by executables. On most platforms and
|
||||||
|
architectures, direct access relocations are used to reference external data
|
||||||
|
symbols.
|
||||||
|
* `-fpic` can be used by both executables and shared objects. Windows has
|
||||||
|
`__declspec(dllimport)` but most other binary formats allow a default
|
||||||
|
visibility external data to be resolved to a shared object, so generally
|
||||||
|
direct access relocations are disallowed.
|
||||||
|
* `-fpie` was introduced as a mode similar to `-fpic` for ELF: the compiler can
|
||||||
|
make the assumption that the produced object file can only be used by
|
||||||
|
executables, thus all definitions are non-preemptible and thus
|
||||||
|
interprocedural optimizations can apply on them.
|
||||||
|
|
||||||
|
For
|
||||||
|
|
||||||
|
```c
|
||||||
|
extern int a;
|
||||||
|
int *foo() { return &a; }
|
||||||
|
```
|
||||||
|
|
||||||
|
`-fno-pic` typically produces an absolute relocation (a PC-relative relocation
|
||||||
|
can be used as well). On ELF x86-64 it is usually `R_X86_64_32` in the position
|
||||||
|
dependent small code model. If `a` is defined in the executable (by another
|
||||||
|
translation unit), everything works fine. If `a` turns out to be defined in a
|
||||||
|
shared object, its real address will be non-constant at link time. Either
|
||||||
|
action needs to be taken:
|
||||||
|
|
||||||
|
* Emit a dynamic relocation in every use site. Text sections are usually
|
||||||
|
non-writable. A dynamic relocation applied on a non-writable section is
|
||||||
|
called a text relocation.
|
||||||
|
* Emit a single copy relocation. Copy relocations only work for executables.
|
||||||
|
The linker obtains the size of the symbol, allocates the bytes in `.bss`
|
||||||
|
(this may make the object writable. On LLD a readonly area may be picked.),
|
||||||
|
and emits an `R_*_COPY` relocation. All references resolve to the new location.
|
||||||
|
|
||||||
|
Multiple text relocations are even less acceptable, so on ELF a copy relocation
|
||||||
|
is generally used. Here is a nice description from [Rich
|
||||||
|
Felker](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55012): "Copy relocations
|
||||||
|
are not a case of overriding the definition in the abstract machine, but an
|
||||||
|
implementation detail used to support data objects in shared libraries when the
|
||||||
|
main program is non-PIC."
|
||||||
|
|
||||||
|
Copy relocations have drawbacks:
|
||||||
|
|
||||||
|
* Break page sharing.
|
||||||
|
* Make the symbol properties (e.g. size) part of ABI.
|
||||||
|
* If the shared object is linked with `-Bsymbolic` or `--dynamic-list` and
|
||||||
|
defines a data symbol copy relocated by the executable, the address of the
|
||||||
|
symbol may be different in the shared object and in the executable.
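
The mechanics and the last drawback can be pictured with a tiny model of what happens at load time. This is a sketch under stated assumptions, not loader code: `apply_copy_reloc` is a hypothetical name, and two ordinary variables stand in for the shared object's definition and the space reserved in the executable.

```c
#include <assert.h>
#include <string.h>

/* Model of processing an R_*_COPY relocation: copy `size` bytes from the
   shared object's definition into the space the linker reserved in the
   executable. Afterwards the executable reads and writes its own copy. */
void apply_copy_reloc(void *exe_copy, const void *dso_def, size_t size) {
  assert(exe_copy != NULL && dso_def != NULL);
  memcpy(exe_copy, dso_def, size);
}
```

The `size` argument is fixed when the executable is linked, which is why the symbol size becomes part of the ABI. If the shared object binds its own references locally (for example with `-Bsymbolic`), it keeps using `dso_def` while the executable uses `exe_copy`, so later writes on one side are not seen by the other.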
|
||||||
|
|
||||||
|
What went poorly was that `-fno-pic` code had no way to avoid copy relocations
|
||||||
|
on ELF. Traditionally copy relocations could only occur in `-fno-pic` code. A
|
||||||
|
GCC 5 change made this possible for x86-64. Please read on.
|
||||||
|
|
||||||
|
## x86-64: copy relocations and `-fpie`
|
||||||
|
|
||||||
|
`-fpic` using GOT indirection for external data symbols has cost. Making
|
||||||
|
`-fpie` similar to `-fpic` in this regard incurs costs if the data symbol turns
|
||||||
|
out to be defined in the executable. Having the data symbol defined in another
|
||||||
|
translation unit linked into the executable is very common, especially if the
|
||||||
|
vendor uses a fully/mostly static linking mode.
|
||||||
|
|
||||||
|
In GCC 5, ["x86-64: Optimize access to globals in PIE with copy
|
||||||
|
reloc"](https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=77ad54d911dd7cb88caf697ac213929f6132fdcf)
|
||||||
|
started to use direct access relocations for external data symbols on x86-64 in
|
||||||
|
`-fpie` mode.
|
||||||
|
|
||||||
|
```c
|
||||||
|
extern int a;
|
||||||
|
int foo() { return a; }
|
||||||
|
```
|
||||||
|
|
||||||
|
* GCC<5: `movq a@GOTPCREL(%rip), %rax; movl (%rax), %eax` (9 bytes)
|
||||||
|
* GCC>=5: `movl a(%rip), %eax` (6 bytes)
|
||||||
|
|
||||||
|
This change is actually useful for architectures other than x86-64 but was never
|
||||||
|
implemented for other architectures. What went wrong: the change was
|
||||||
|
implemented as an inflexible configure-time choice (`HAVE_LD_PIE_COPYRELOC`),
|
||||||
|
defaulting to such a behavior if ld supports PIE copy relocations (most
|
||||||
|
binutils installations). Keep in mind that such a `-fpie` default [breaks
|
||||||
|
`-Bsymbolic` and `--dynamic-list` in shared objects](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65888).
|
||||||
|
|
||||||
|
Clang addressed the inflexible configure-time choice via an opt-in option
|
||||||
|
`-mpie-copy-relocations` (D19996).
|
||||||
|
|
||||||
|
I noticed that:
|
||||||
|
|
||||||
|
* The option can be used for `-fno-pic` code as well to prevent copy
|
||||||
|
relocations on ELF. This is occasionally what users want (if their shared objects
|
||||||
|
use `-Bsymbolic` and export data symbols (usually undesired from API
|
||||||
|
perspectives but can avoid costs at times)), and they switch from `-fno-pic`
|
||||||
|
to `-fpic` just for this purpose.
|
||||||
|
* The option name should describe the code generation behavior, instead of the
|
||||||
|
inferred behavior at the linking stage on a particular binary format.
|
||||||
|
* The option does not need to tie to ELF.
|
||||||
|
* On COFF, the behavior is always like `-fdirect-access-external-data`.
|
||||||
|
`__declspec(dllimport)` is needed to enable indirect access.
|
||||||
|
* On Mach-O, the behavior is like `-fdirect-access-external-data` for
|
||||||
|
`-fno-pic` (only available on arm) and the opposite for `-fpic`.
|
||||||
|
* H.J. Lu introduced `R_X86_64_GOTPCRELX` and `R_X86_64_REX_GOTPCRELX` as GOT
|
||||||
|
optimization to x86-64 psABI. This is great! With the optimization, GOT
|
||||||
|
indirection can be optimized, so the incurred cost is very low now.
|
||||||
|
|
||||||
|
So I proposed an alternative option `-f[no-]direct-access-external-data`:
|
||||||
|
https://reviews.llvm.org/D92633
|
||||||
|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112. My wish on the GCC side is
|
||||||
|
to drop `HAVE_LD_PIE_COPYRELOC` and (x86-64) default to GOT indirection for
|
||||||
|
external data symbols in `-fpie` mode.
|
||||||
|
|
||||||
|
Please keep in mind that `-f[no-]semantic-interposition` is for definitions
|
||||||
|
while `-f[no-]direct-access-external-data` is for undefined data symbols. GCC 5
|
||||||
|
introduced `-fno-semantic-interposition` to use local aliases for references to
|
||||||
|
definitions in the same translation unit.
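
The "local alias" idea can be written out by hand to see what it means. The sketch below assumes GCC or Clang targeting ELF (it relies on the `alias` attribute); the names `answer`, `answer_local` and `twice` are hypothetical, and the compiler's actual output under `-fno-semantic-interposition` may differ in detail.

```c
#include <assert.h>

/* A default-visibility definition that could be interposed in a shared object. */
int answer(void) { return 42; }

/* A local alias for the same function. References to it bind within the
   translation unit, so they cannot be redirected to another definition. */
static __typeof__(answer) answer_local __attribute__((alias("answer")));

int twice(void) {
  /* The alias and the original name refer to the same function. */
  assert(answer_local == answer);
  return 2 * answer_local();
}
```

Calling `answer_local` instead of `answer` inside `twice` is the hand-written counterpart of letting the compiler assume that the definition it sees is the one used at run time.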
|
||||||
|
|
||||||
|
## `STV_PROTECTED`
|
||||||
|
|
||||||
|
Now let's consider how `STV_PROTECTED` comes into play. Here is the generic ABI
|
||||||
|
definition:
|
||||||
|
|
||||||
|
> A symbol defined in the current component is protected if it is visible in
|
||||||
|
> other components but not preemptable, meaning that any reference to such a
|
||||||
|
> symbol from within the defining component must be resolved to the definition
|
||||||
|
> in that component, even if there is a definition in another component that
|
||||||
|
> would preempt by the default rules. A symbol with `STB_LOCAL` binding may not
|
||||||
|
> have `STV_PROTECTED` visibility. If a symbol definition with `STV_PROTECTED`
|
||||||
|
> visibility from a shared object is taken as resolving a reference from an
|
||||||
|
> executable or another shared object, the `SHN_UNDEF` symbol table entry
|
||||||
|
> created has `STV_DEFAULT` visibility.
|
||||||
|
|
||||||
|
A non-local `STV_DEFAULT` defined symbol is by default preemptible in a shared
|
||||||
|
object on ELF. `STV_PROTECTED` can make the symbol non-preemptible. You may
|
||||||
|
have noticed that I use "preemptible" while the generic ABI uses "preemptable"
|
||||||
|
and LLVM IR uses "`dso_preemptable`". Both forms work. "preemptible" is my
|
||||||
|
preference because it is more common.
|
||||||
|
|
||||||
|
### Protected data symbols and copy relocations
|
||||||
|
|
||||||
|
Many folks consider that copy relocations are best-effort support provided by
|
||||||
|
the toolchain. `STV_PROTECTED` is intended as an optimization and the
|
||||||
|
optimization can error out if it can't be done for whatever reason. Since copy
|
||||||
|
relocations are already oftentimes unacceptable, it is natural to think that we
|
||||||
|
should just disallow copy relocations on protected data symbols.
|
||||||
|
|
||||||
|
However, GNU ld 2.26 made a change which enabled copy relocations on protected
|
||||||
|
data symbols for i386 and x86-64.
|
||||||
|
|
||||||
|
A glibc change ["Add `ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA` to
|
||||||
|
x86"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=62da1e3b00b51383ffa7efc89d8addda0502e107)
|
||||||
|
is needed to make copy relocations on protected data symbols work.
|
||||||
|
["[AArch64][BZ #17711] Fix extern protected data handling"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0910702c4d2cf9e8302b35c9519548726e1ac489)
|
||||||
|
and ["[ARM][BZ #17711] Fix extern protected data handling"](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3bcea719ddd6ce399d7bccb492c40af77d216e42)
|
||||||
|
ported the thing to arm and aarch64.
|
||||||
|
|
||||||
|
Despite the glibc support, GNU ld's aarch64 port reports the error
|
||||||
|
``relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `foo' which may bind
|
||||||
|
externally can not be used when making a shared object; recompile with -fPIC``.
|
||||||
|
|
||||||
|
powerpc64 ELFv2 is interesting: TOC indirection (TOC is a variant of GOT) is
|
||||||
|
used everywhere, data symbols normally have no direct access relocations, so
|
||||||
|
this is not a problem.
|
||||||
|
|
||||||
|
```c
|
||||||
|
// b.c
|
||||||
|
__attribute__((visibility("protected"))) int foo;
|
||||||
|
// a.c
|
||||||
|
extern int foo;
|
||||||
|
int main() { return foo; }
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
gcc -fuse-ld=bfd -fpic -shared b.c -o b.so
|
||||||
|
gcc -fuse-ld=bfd -pie -fno-pic a.c ./b.so
|
||||||
|
```
|
||||||
|
|
||||||
|
gold does not allow copy relocations on protected data symbols, but it misses
|
||||||
|
some cases: https://sourceware.org/bugzilla/show_bug.cgi?id=19823.
|
||||||
|
|
||||||
|
### Protected data symbols and direct accesses
|
||||||
|
|
||||||
|
If a protected data symbol in a shared object is copy relocated, allowing
|
||||||
|
direct accesses will cause the shared object to operate on a different copy
|
||||||
|
from the executable. Therefore, direct accesses to protected data symbols have
|
||||||
|
to be disallowed in `-fpic` code, just in case the symbols may be copy
|
||||||
|
relocated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 changed GCC 5 to
|
||||||
|
use GOT indirection for protected external data.
|
||||||
|
|
||||||
|
```c
|
||||||
|
__attribute__((visibility("protected"))) int foo;
|
||||||
|
int val() { return foo; }
|
||||||
|
// -fPIC: GOT on at least aarch64, arm, i386, x86-64
|
||||||
|
```
|
||||||
|
|
||||||
|
This caused unneeded pessimization for protected external data. Clang always
|
||||||
|
treats protected similarly to hidden/internal.
|
||||||
|
|
||||||
|
For older GCC (and all versions of Clang), direct accesses are produced in
|
||||||
|
`-fpic` code. Mixing such object files can silently break copy relocations on
|
||||||
|
protected data symbols. Therefore, GNU ld made the change
|
||||||
|
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5
|
||||||
|
to error in `-shared` mode.
|
||||||
|
|
||||||
|
```
|
||||||
|
% cat a.s
|
||||||
|
leaq foo(%rip), %rax
|
||||||
|
|
||||||
|
.data
|
||||||
|
.global foo
|
||||||
|
.protected foo
|
||||||
|
foo:
|
||||||
|
```
|
||||||
|
```
|
||||||
|
% gcc -fuse-ld=bfd -shared a.s
|
||||||
|
/usr/bin/ld.bfd: /tmp/ccchu3Xo.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
|
||||||
|
/usr/bin/ld.bfd: final link failed: bad value
|
||||||
|
collect2: error: ld returned 1 exit status
|
||||||
|
```
|
||||||
|
|
||||||
|
This led to a heated discussion
|
||||||
|
https://sourceware.org/legacy-ml/binutils/2016-03/msg00312.html. Swift folks
|
||||||
|
noticed this https://bugs.swift.org/browse/SR-1023 and their reaction was to
|
||||||
|
switch from GNU ld to gold.
|
||||||
|
|
||||||
|
GNU ld's aarch64 port does not have the diagnostic.
|
||||||
|
|
||||||
|
binutils commit ["x86: Clear `extern_protected_data` for
|
||||||
|
`GNU_PROPERTY_NO_COPY_ON_PROTECTED`"](https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=73784fa565bd66f1ac165816c03e5217b7d67bbc)
|
||||||
|
introduced
|
||||||
|
`GNU_PROPERTY_NO_COPY_ON_PROTECTED`. With this property, `ld -shared` will not
|
||||||
|
report the error "relocation `R_X86_64_PC32` against protected symbol `foo` can
|
||||||
|
not be used when making a shared object".
|
||||||
|
|
||||||
|
The two issues above are the costs of enabling copy relocations on protected data
|
||||||
|
symbols. Personally I don't think copy relocations on protected data symbols
|
||||||
|
are actually leveraged. GNU ld's x86 port can just (1) reject such copy
|
||||||
|
relocations and (2) allow direct accesses referencing protected data symbols in
|
||||||
|
`-shared` mode. But I am not really clear about the glibc case. I wish
|
||||||
|
`GNU_PROPERTY_NO_COPY_ON_PROTECTED` could become the default or be phased out in
|
||||||
|
the future.
|
||||||
|
|
||||||
|
### Protected function symbols and canonical PLT entries
|
||||||
|
|
||||||
|
```c
|
||||||
|
// b.c
|
||||||
|
__attribute__((visibility("protected"))) void *foo () {
|
||||||
|
return (void *)foo;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
GNU ld's aarch64 and x86 ports reject the above code. On many other
|
||||||
|
architectures including powerpc the code is supported.
|
||||||
|
|
||||||
|
```
|
||||||
|
% gcc -fpic -shared -fuse-ld=bfd b.c -o b.so
|
||||||
|
/usr/bin/ld.bfd: /tmp/cc3Ay0Gh.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
|
||||||
|
/usr/bin/ld.bfd: final link failed: bad value
|
||||||
|
collect2: error: ld returned 1 exit status
|
||||||
|
% gcc -shared -fuse-ld=bfd -fpic b.c -o b.so
|
||||||
|
/usr/bin/ld.bfd: /tmp/ccXdBqMf.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `foo' which may bind externally can not be used when making a shared object; recompile with -fPIC
|
||||||
|
/tmp/ccXdBqMf.o: in function `foo':
|
||||||
|
a.c:(.text+0x0): dangerous relocation: unsupported relocation
|
||||||
|
collect2: error: ld returned 1 exit status
|
||||||
|
```
|
||||||
|
|
||||||
|
The rejection is mainly a historical issue to make pointer equality work with
|
||||||
|
`-fno-pic` code. The GNU ld idea is that:
|
||||||
|
|
||||||
|
* The compiler emits GOT-generating relocations for `-fpic` code (in reality it
|
||||||
|
does it for declarations but not for definitions).
|
||||||
|
* `-fno-pic` main executable uses direct access relocation types and gets a
|
||||||
|
canonical PLT entry.
|
||||||
|
* glibc ld.so resolves the GOT in the shared object to the canonical PLT entry.
|
||||||
|
|
||||||
|
Actually we can take the interpretation that a canonical PLT entry is
|
||||||
|
incompatible with a shared `STV_PROTECTED` definition, and reject the attempt
|
||||||
|
to create a canonical PLT entry (gold/LLD). And we can keep producing direct
|
||||||
|
access relocations referencing protected symbols for `-fpic` code.
|
||||||
|
`STV_PROTECTED` is no different from `STV_HIDDEN`.
|
||||||
|
|
||||||
|
On many architectures, a branch instruction uses a branch specific relocation
|
||||||
|
type (e.g. `R_AARCH64_CALL26`, `R_PPC64_REL24`, `R_RISCV_CALL_PLT`). This is
|
||||||
|
great because the address is insignificant and the linker can arrange for a
|
||||||
|
regular PLT if the symbol turns out to be external.
|
||||||
|
|
||||||
|
On i386, a branch in `-fno-pic` code emits an `R_386_PC32` relocation, which is
|
||||||
|
indistinguishable from an address taken operation. If the symbol turns out to
|
||||||
|
be external, the linker has to employ a trick called "canonical PLT entry"
|
||||||
|
(`st_shndx=0, st_value!=0`). The term is parlance among a few LLD
|
||||||
|
developers, but not broadly adopted.
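
The `st_shndx=0, st_value!=0` shape described above can be checked mechanically. The following is a minimal sketch, assuming a Linux `<elf.h>` and an i386 (`Elf32_Sym`) dynamic symbol table entry; the function name `looks_like_canonical_plt` is hypothetical and not part of any linker or loader.

```c
#include <elf.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if a dynamic symbol table entry has the
 * shape of a canonical PLT entry, i.e. a function symbol that is undefined
 * (st_shndx == SHN_UNDEF) yet carries a non-zero address in st_value. */
bool looks_like_canonical_plt(const Elf32_Sym *sym) {
  return ELF32_ST_TYPE(sym->st_info) == STT_FUNC &&
         sym->st_shndx == SHN_UNDEF &&
         sym->st_value != 0;
}
```

A regular undefined function symbol has `st_value == 0`, so the predicate separates the two cases without consulting the relocations.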
|
||||||
|
|
||||||
|
```c
|
||||||
|
// a.c
|
||||||
|
extern void foo(void);
|
||||||
|
int main() { foo(); }
|
||||||
|
```
|
||||||
|
```
|
||||||
|
% gcc -m32 -shared -fuse-ld=bfd -fpic b.c -o b.so
|
||||||
|
% gcc -m32 -fno-pic -no-pie -fuse-ld=lld a.c ./b.so
|
||||||
|
|
||||||
|
% gcc -m32 -fno-pic a.c ./b.so -fuse-ld=lld
|
||||||
|
ld.lld: error: cannot preempt symbol: foo
|
||||||
|
>>> defined in ./b.so
|
||||||
|
>>> referenced by a.c
|
||||||
|
>>> /tmp/ccDGhzEy.o:(main)
|
||||||
|
collect2: error: ld returned 1 exit status
|
||||||
|
|
||||||
|
% gcc -m32 -fno-pic -no-pie a.c ./b.so -fuse-ld=bfd
|
||||||
|
# canonical PLT entry; foo has different addresses in a.out and b.so.
|
||||||
|
% gcc -m32 -fno-pic -pie a.c ./b.so -fuse-ld=bfd
|
||||||
|
/usr/bin/ld.bfd: /tmp/ccZ3Rl8Y.o: warning: relocation against `foo' in read-only section `.text'
|
||||||
|
/usr/bin/ld.bfd: warning: creating DT_TEXTREL in a PIE
|
||||||
|
% gcc -m32 -fno-pic -pie a.c ./b.so -fuse-ld=bfd -z text
|
||||||
|
/usr/bin/ld.bfd: /tmp/ccUv8wXc.o: warning: relocation against `foo' in read-only section `.text'
|
||||||
|
/usr/bin/ld.bfd: read-only segment has dynamic relocations
|
||||||
|
collect2: error: ld returned 1 exit status
|
||||||
|
```
|
||||||
|
|
||||||
|
This used to be a problem for x86-64 as well, until ["x86-64: Generate branch
|
||||||
|
with PLT32 relocation"](https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bd7ab16b4537788ad53521c45469a1bdae84ad4a)
|
||||||
|
changed `call/jmp foo` to emit `R_X86_64_PLT32` instead of `R_X86_64_PC32`. Note:
|
||||||
|
(`-fpie`/`-fpic`) `call/jmp foo@PLT` always emits `R_X86_64_PLT32`.
|
||||||
|
|
||||||
|
The relocation type name is a bit misleading: `_PLT32` does not mean that a PLT
|
||||||
|
will always be created. Rather, it is optional: the linker can resolve `_PLT32`
|
||||||
|
to any place where the function will be called. If the symbol is preemptible,
|
||||||
|
the place is usually the PLT entry. If the symbol is non-preemptible, the
|
||||||
|
linker can convert `_PLT32` into `_PC32`. A function symbol can be either
|
||||||
|
branched to or address-taken. For an address taken operation, the function symbol
|
||||||
|
is used in a manner similar to a data symbol. `R_386_PLT32` cannot be used. LLD
|
||||||
|
and gold will just reject the link if text relocations are disabled.
|
||||||
|
|
||||||
|
On i386, my proposal is that branches to a default visibility function
|
||||||
|
declaration should use `R_386_PLT32` instead of `R_386_PC32`, in a manner
|
||||||
|
similar to x86-64. Originally I thought an assembler change sufficed:
|
||||||
|
https://sourceware.org/bugzilla/show_bug.cgi?id=27169. Please read the next
|
||||||
|
section for why this should be changed on the compiler side.
|
||||||
|
|
||||||
|
### Non-default visibility ifunc and `R_386_PC32`
|
||||||
|
|
||||||
|
For a call to a hidden function declaration, the compiler produces an
|
||||||
|
`R_386_PC32` relocation. The relocation is an indicator that EBX may not be set
|
||||||
|
up.
|
||||||
|
|
||||||
|
If the declaration refers to an ifunc definition, the linker will resolve the
|
||||||
|
`R_386_PC32` to an IPLT entry. For `-pie` and `-shared` links, the IPLT entry
|
||||||
|
references EBX. If the call site does not set up EBX to be
|
||||||
|
`_GLOBAL_OFFSET_TABLE_`, the IPLT call will be incorrect.
|
||||||
|
|
||||||
|
GNU ld has implemented a diagnostic (["i686 ifunc and non-default symbol
|
||||||
|
visibility"](https://sourceware.org/bugzilla/show_bug.cgi?id=20515)) to catch
|
||||||
|
the problem. If we change `call/jmp foo` to always use `R_386_PLT32`, such a
|
||||||
|
diagnostic will be lost.
|
||||||
|
|
||||||
|
Can we change the compiler to emit `call/jmp foo@PLT` for default visibility
|
||||||
|
function declarations? If the compiler emits such a modifier but does not set
|
||||||
|
up EBX, the ifunc can still be non-preemptible (e.g. hidden in another
|
||||||
|
translation unit or `-Bsymbolic`) and we will still have a dilemma.
|
||||||
|
|
||||||
|
Personally, I think avoiding a canonical PLT entry is more useful than a ld
|
||||||
|
ifunc diagnostic. i386 ABI is legacy and the x86 maintainer will not make the
|
||||||
|
change, though.
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
I hope the above gives an overview to interested readers. Symbol interposition
|
||||||
|
is subtle. One has to think about all the factors related to symbol
|
||||||
|
interposition and the relevant toolchain fixes are like a whack-a-mole game. I
|
||||||
|
appreciate all the prior discussions and I believe many unsatisfactory things
|
||||||
|
can be fixed in a quite backward-compatible way.
|
||||||
|
|
||||||
|
Some features are inherently incompatible. We make the trade-off in favor of
|
||||||
|
more important features. Here are two things that should not work. However, if
|
||||||
|
`-fpie` or `-fno-direct-access-external-data` is specified, both limitations
|
||||||
|
will be circumvented.
|
||||||
|
|
||||||
|
* Copy relocations on protected data symbols.
|
||||||
|
* Canonical PLT entries on protected function symbols. With the `R_386_PLT32`
|
||||||
|
change, this issue will only affect function pointers.
|
||||||
|
|
||||||
|
People sometimes simply say: "protected visibility does not work." I'd
|
||||||
|
argue that Clang+gold/LLD works quite well.
|
||||||
|
|
||||||
|
Things on the GCC+GNU ld side are inconsistent, though. Here is a list of
|
||||||
|
changes I wish would happen:
|
||||||
|
|
||||||
|
* GCC: add `-f[no-]direct-access-external-data`.
|
||||||
|
* GCC: drop `HAVE_LD_PIE_COPYRELOC` in favor of `-f[no-]direct-access-external-data`.
|
||||||
|
* GCC x86-64: default to GOT indirection for external data symbols in `-fpie`
|
||||||
|
mode.
|
||||||
|
* GCC or GNU as i386: emit `R_386_PLT32` for branches to undefined function
|
||||||
|
symbols.
|
||||||
|
* GNU ld x86: disallow copy relocations on protected data symbols. (I think
|
||||||
|
canonical PLT entries on protected symbols have been disallowed.)
|
||||||
|
* GCC aarch64/arm/x86/...: allow direct access relocations on protected symbols
|
||||||
|
in `-fpic` mode.
|
||||||
|
* GNU ld aarch64/x86: allow direct access relocations on protected data symbols
|
||||||
|
in `-shared` mode.
|
||||||
|
|
||||||
|
The breaking changes for GCC+GNU ld:
|
||||||
|
|
||||||
|
* The "copy relocations on protected data symbols" scheme has been supported in
|
||||||
|
the past few years with GNU ld on x86, but it did not work before circa 2015,
|
||||||
|
and should not work in the future. Fortunately the breaking surface may be
|
||||||
|
narrow: this scheme does not work with gold or LLD. Many architectures don't
|
||||||
|
work.
|
||||||
|
* ld is not the only consumer of `R_386_PLT32`. The Linux kernel has code
|
||||||
|
resolving relocations and it needs to be fixed (patch uploaded: https://github.com/ClangBuiltLinux/linux/issues/1210).
|
||||||
|
|
||||||
|
I'll conclude this article with random notes on other binary formats:
|
||||||
|
|
||||||
|
Windows/COFF `__declspec(dllimport)` gives us a different perspective on how
|
||||||
|
external references can be designed. The annotation is verbose but
|
||||||
|
differentiates the two cases (1) the symbol has to be defined in the same
|
||||||
|
linkage unit (2) the symbol can be defined in another linkage unit. If we lift
|
||||||
|
the "the symbol visibility is decided by the most constrained visibility"
|
||||||
|
requirement for protected->default, a COFF undefined/defined symbol is quite
|
||||||
|
like a protected undefined/defined symbol in ELF. `__declspec(dllimport)` gives
|
||||||
|
the undefined symbol default visibility (i.e. the LLVM IR `dllimport` is
|
||||||
|
redundant). `__declspec(dllexport)` is something which cannot be modeled with
|
||||||
|
the existing ELF visibilities.
|
||||||
|
|
||||||
|
For an undefined variable, Mach-O uses `__attribute__((visibility("hidden")))`
|
||||||
|
to say "a definition must be available in another translation unit in the same
|
||||||
|
linkage unit" but does not actually mark the undefined symbol anyway. COFF uses
|
||||||
|
`__declspec(dllimport)` to convey this. In ELF,
|
||||||
|
`__attribute__((visibility("hidden")))` additionally makes the undefined symbol
|
||||||
|
unexportable. The Mach-O notation actually resembles COFF: it can be exported
|
||||||
|
by the definition in another translation unit. From its behavior, I think it
|
||||||
|
would be more appropriately mapped to LLVM IR protected instead of hidden.
|
||||||
|
|
||||||
|
## Appendix
|
||||||
|
|
||||||
|
For a `STB_GLOBAL`/`STB_WEAK` symbol,
|
||||||
|
|
||||||
|
`STV_DEFAULT`: both compiler & linker need to assume such symbols can be
|
||||||
|
preempted in `-fpic` mode. The compiler emits GOT indirection by default. GCC
|
||||||
|
`-fno-semantic-interposition` uses local aliases on defined non-weak function
|
||||||
|
symbols for x86 (unimplemented in other architectures). Clang
|
||||||
|
`-fno-semantic-interposition` uses local aliases on defined non-weak symbols
|
||||||
|
(both function and data) for x86.
|
||||||
|
|
||||||
|
`STV_PROTECTED`: GCC `-fpic` uses GOT indirection for data symbols, regardless
|
||||||
|
of defined or undefined. This pessimization is to make a misfeature "copy
|
||||||
|
relocation on protected data symbol" work
|
||||||
|
(https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#protected-data-symbols-and-direct-accesses).
|
||||||
|
Clang code generation treats `STV_PROTECTED` the same way as `STV_HIDDEN`.
|
||||||
|
|
||||||
|
`STV_HIDDEN`: non-preemptible, regardless of defined or undefined. The compiler
|
||||||
|
suppresses GOT indirection, unless undefined `STB_WEAK`.
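
The binding and visibility rules above can be condensed into a small predicate. This is only a sketch under stated assumptions: it uses the Linux `<elf.h>` macros, the name `assumed_preemptible_in_fpic` is hypothetical, and it models just the default `-fpic` assumption, ignoring options such as `-fno-semantic-interposition` or linker options like `-Bsymbolic` that change the picture.

```c
#include <elf.h>
#include <stdbool.h>

/* Hypothetical helper: under the default -fpic assumption, only
 * STB_GLOBAL/STB_WEAK symbols with STV_DEFAULT visibility may be preempted.
 * STV_PROTECTED, STV_HIDDEN and STV_INTERNAL symbols are non-preemptible. */
bool assumed_preemptible_in_fpic(unsigned char st_info, unsigned char st_other) {
  unsigned bind = ELF64_ST_BIND(st_info);
  if (bind != STB_GLOBAL && bind != STB_WEAK)
    return false; /* e.g. STB_LOCAL: never preemptible */
  return ELF64_ST_VISIBILITY(st_other) == STV_DEFAULT;
}
```

Note that this predicate describes preemptibility only. Whether the compiler still emits GOT indirection (as GCC does for protected data) is a separate code generation decision.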
|
||||||
|
|
||||||
|
For defined symbols, `-fno-pic`/`-fpie` can avoid GOT indirection for
|
||||||
|
`STV_DEFAULT` (and GCC `STV_PROTECTED`). `-fvisibility=hidden` can change
|
||||||
|
visibility.
|
||||||
|
|
||||||
|
For undefined symbols, `-fpie`/`-fpic` use GOT indirection by default. Clang
|
||||||
|
`-fno-direct-access-external-data` (discussed in my article) can avoid GOT
|
||||||
|
indirection. If you use `-fpic -fno-direct-access-external-data` & `ld
|
||||||
|
-shared`, you'll need additional linker options to make the linker know defined
|
||||||
|
non-`STB_LOCAL` `STV_DEFAULT` symbols are non-preemptible.
|
||||||
|
|
|
@ -0,0 +1,328 @@
|
||||||
|
# GNU indirect function
|
||||||
|
|
||||||
|
UNDER CONSTRUCTION.
|
||||||
|
|
||||||
|
GNU indirect function (ifunc) is a mechanism making a direct function call
|
||||||
|
resolve to an implementation picked by a resolver. It is mainly used in glibc
|
||||||
|
but has also been adopted in FreeBSD.
|
||||||
|
|
||||||
|
For some performance critical functions, e.g. memcpy/memset/strcpy, glibc
|
||||||
|
provides multiple implementations optimized for different architecture levels.
|
||||||
|
The application just uses `memcpy(...)` which compiles to `call memcpy`. The
|
||||||
|
linker will create a PLT for `memcpy` and produce an associated special dynamic
|
||||||
|
relocation referencing the resolver symbol/address. During relocation resolving
|
||||||
|
at runtime, the return value of the resolver will be placed in the GOT entry
|
||||||
|
and the PLT entry will load the address.
|
||||||
|
|
||||||
|
## Representation
|
||||||
|
|
||||||
|
ifunc has a dedicated symbol type `STT_GNU_IFUNC` to mark it different from a
|
||||||
|
regular function (`STT_FUNC`). The value 10 is in the OS-specific range (10~12).
|
||||||
|
`readelf -s` tells you that the symbol is ifunc if OSABI is `ELFOSABI_GNU` or
|
||||||
|
`ELFOSABI_FREEBSD`.
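
As an illustration of the symbol type encoding, here is a minimal sketch that tests the type nibble of `st_info`. It assumes the Linux `<elf.h>` header, which defines `STT_GNU_IFUNC` as 10; the function name `is_gnu_ifunc` is hypothetical.

```c
#include <elf.h>
#include <stdbool.h>

/* Hypothetical helper: the symbol type lives in the low 4 bits of st_info.
 * STT_GNU_IFUNC (10) sits in the OS-specific range, so a generic tool should
 * also consult the OSABI before interpreting the value this way. */
bool is_gnu_ifunc(unsigned char st_info) {
  return ELF64_ST_TYPE(st_info) == STT_GNU_IFUNC;
}
```

The check does not depend on the binding, so it gives the same answer for `STB_GLOBAL` and `STB_LOCAL` ifunc symbols.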
|
||||||
|
|
||||||
|
On Linux, by default GNU as uses `ELFOSABI_NONE` (0). If ifunc is used, the OSABI
|
||||||
|
will be changed to `ELFOSABI_GNU`. Similarly, GNU ld sets the OSABI to
|
||||||
|
`ELFOSABI_GNU` if ifunc is used. gold does not do this [PR17735](https://sourceware.org/bugzilla/show_bug.cgi?id=17735).
|
||||||
|
|
||||||
|
Things are loose in LLVM. The integrated assembler and LLD do not set
|
||||||
|
`ELFOSABI_GNU`. Currently the only problem I know is the `readelf -s` display.
|
||||||
|
Everything else works fine.
|
||||||
|
|
||||||
|
### Assembler behavior
|
||||||
|
|
||||||
|
In assembly, you can assign the type `STT_GNU_IFUNC` to a symbol via
|
||||||
|
`.type foo, @gnu_indirect_function`. An ifunc symbol is typically `STB_GLOBAL`.
|
||||||
|
|
||||||
|
In the object file, `st_shndx` and `st_value` of an `STT_GNU_IFUNC` symbol
|
||||||
|
indicate the resolver. After linking, if the symbol is still `STT_GNU_IFUNC`,
|
||||||
|
its `st_value` field indicates the resolver address in the linked image.
|
||||||
|
|
||||||
|
Assemblers usually convert relocations referencing a local symbol to reference
|
||||||
|
the section symbol, but this behavior needs to be inhibited for `STT_GNU_IFUNC`.
|
||||||
|
|
||||||
|
### Example
|
||||||
|
|
||||||
|
```
|
||||||
|
cat > b.s <<e
|
||||||
|
.global ifunc
|
||||||
|
.type ifunc, @gnu_indirect_function
|
||||||
|
.set ifunc, resolver
|
||||||
|
|
||||||
|
resolver:
|
||||||
|
leaq impl(%rip), %rax
|
||||||
|
ret
|
||||||
|
|
||||||
|
impl:
|
||||||
|
movq $42, %rax
|
||||||
|
ret
|
||||||
|
e
|
||||||
|
|
||||||
|
cat > a.c <<e
|
||||||
|
int ifunc(void);
|
||||||
|
int main() { return ifunc(); }
|
||||||
|
e
|
||||||
|
|
||||||
|
cc a.c b.s
|
||||||
|
./a.out # exit code 42
|
||||||
|
```
|
||||||
|
|
||||||
|
GNU as also marks transitive aliases of an `STT_GNU_IFUNC` symbol as ifuncs.
|
||||||
|
|
||||||
|
```
|
||||||
|
.type foo,@gnu_indirect_function
|
||||||
|
.set foo, foo_resolver
|
||||||
|
|
||||||
|
.set foo2, foo
|
||||||
|
.set foo3, foo2
|
||||||
|
```
|
||||||
|
|
||||||
|
GCC and Clang support a function attribute which emits
|
||||||
|
`.type ifunc, @gnu_indirect_function; .set ifunc, resolver`:
|
||||||
|
|
||||||
|
```c
|
||||||
|
static int impl(void) { return 42; }
|
||||||
|
static void *resolver(void) { return impl; }
|
||||||
|
int ifunc(void) __attribute__((ifunc("resolver")));
|
||||||
|
```
|
||||||
|
|
||||||
|
## Preemptible ifunc
|
||||||
|
|
||||||
|
A preemptible ifunc call is no different from a regular function call from the
|
||||||
|
linker perspective.
|
||||||
|
|
||||||
|
The linker creates a PLT entry, reserves an associated GOT entry, and emits an
|
||||||
|
`R_*_JUMP_SLOT` relocation resolving the address into the GOT entry. The PLT
|
||||||
|
code sequence is the same as a regular PLT for `STT_FUNC`.
|
||||||
|
|
||||||
|
If the ifunc is defined within the module, the symbol type in the linked image
|
||||||
|
is `STT_GNU_IFUNC`, otherwise (defined in a DSO), the symbol type is `STT_FUNC`.
|
||||||
|
|
||||||
|
The difference resides in the loader.
|
||||||
|
|
||||||
|
At runtime, the relocation resolver checks whether the `R_*_JUMP_SLOT`
|
||||||
|
relocation refers to an ifunc. If it does, instead of filling the GOT entry
|
||||||
|
with the target address, the resolver calls the target address as an indirect
|
||||||
|
function, with ABI specified additional parameters (hwcap related), and places
|
||||||
|
the return value into the GOT entry.
|
||||||
|
|
||||||
|
## Non-preemptible ifunc
|
||||||
|
|
||||||
|
The non-preemptible ifunc case is where all sorts of complexity come from.
|
||||||
|
|
||||||
|
First, the `R_*_JUMP_SLOT` relocation type cannot be used in some cases:
|
||||||
|
|
||||||
|
* A non-preemptible ifunc may not have a dynamic symbol table entry. It can be
|
||||||
|
local. It can be defined in the executable without the need to export.
|
||||||
|
* A non-local `STV_DEFAULT` symbol defined in a shared object is by default
|
||||||
|
preemptible. Using `R_*_JUMP_SLOT` for such a case will make the ifunc look
|
||||||
|
preemptible.
|
||||||
|
|
||||||
|
Therefore a new relocation type `R_*_IRELATIVE` was introduced. There is no
|
||||||
|
associated symbol and the address indicates the resolver.
|
||||||
|
|
||||||
|
```
|
||||||
|
R_*_RELATIVE: B + A
|
||||||
|
R_*_IRELATIVE: call (B + A) as a function
|
||||||
|
R_*_JUMP_SLOT: S
|
||||||
|
```
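
The three formulas above can be written out as a small C sketch. Everything here is illustrative: `reloc_kind`, `reloc_value`, `demo_impl` and `demo_resolver` are hypothetical names, the resolver takes no arguments (some ABIs pass hwcap-related parameters), and the integer-to-function-pointer cast relies on the usual GCC/Clang behavior on ELF targets.

```c
#include <stdint.h>

enum reloc_kind { RELOC_RELATIVE, RELOC_IRELATIVE, RELOC_JUMP_SLOT };

/* In this sketch a resolver returns the address of the chosen implementation. */
typedef uintptr_t (*ifunc_resolver)(void);

/* Hypothetical implementation and resolver, used only for demonstration. */
int demo_impl(void) { return 42; }
uintptr_t demo_resolver(void) { return (uintptr_t)&demo_impl; }

/* Computes the value a loader would store at the relocated location.
 * base is the load base (B), addend is A, sym is the symbol value (S). */
uintptr_t reloc_value(enum reloc_kind kind, uintptr_t base, uintptr_t addend,
                      uintptr_t sym) {
  switch (kind) {
  case RELOC_RELATIVE:
    return base + addend;                      /* B + A */
  case RELOC_IRELATIVE:
    return ((ifunc_resolver)(base + addend))(); /* call (B + A) */
  case RELOC_JUMP_SLOT:
    return sym;                                /* S */
  }
  return 0;
}
```

For example, `reloc_value(RELOC_IRELATIVE, 0, (uintptr_t)&demo_resolver, 0)` runs the resolver and yields the address of `demo_impl`, which is the value that would end up in the GOT entry.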
|
||||||
|
|
||||||
|
When an `R_*_JUMP_SLOT` can be used, there is a trade-off between
|
||||||
|
`R_*_JUMP_SLOT` and `R_*_IRELATIVE`: an `R_*_JUMP_SLOT` can be lazily resolved
|
||||||
|
but needs a symbol lookup. Currently powerpc can use `R_PPC64_JMP_SLOT` in some
|
||||||
|
cases [PR27203](https://sourceware.org/bugzilla/show_bug.cgi?id=27203).
|
||||||
|
|
||||||
|
A PLT entry is needed for two reasons:
|
||||||
|
|
||||||
|
* The call sites emit instructions like `call foo`. We need to forward them to a
|
||||||
|
place to perform the indirection. Text relocations are usually not an option
|
||||||
|
(exception: `-z ifunc-noplt`).
|
||||||
|
* If the ifunc is exported, we need a place to mark its canonical address.
|
||||||
|
|
||||||
|
Such PLT entries are sometimes referred to as IPLT. They are placed in the
|
||||||
|
synthetic section `.iplt`. In GNU ld, `.iplt` will be placed in the output
|
||||||
|
section `.plt`. In LLD, I decided that `.iplt` is better
|
||||||
|
https://reviews.llvm.org/D71520.
|
||||||
|
|
||||||
|
On many architectures (e.g. AArch64/PowerPC/x86), the PLT code sequence is the
|
||||||
|
same as a regular PLT, but it could be different.
|
||||||
|
|
||||||
|
On x86-64, the code sequence is:
|
||||||
|
|
||||||
|
```
|
||||||
|
jmp *got(%rip)
|
||||||
|
pushq $0
|
||||||
|
jmp .plt
|
||||||
|
```
|
||||||
|
|
||||||
|
Since there is no lazy binding, `pushq $0; jmp .plt` are not needed. However,
|
||||||
|
to make all PLT entries of the same shape to simplify linker implementations
|
||||||
|
and facilitate analyzers, it is fine to keep it this way.
|
||||||
|
|
||||||
|
## PowerPC32 `-msecure-plt` IPLT
|
||||||
|
|
||||||
|
As a design to work around the lack of PC-relative instructions, PowerPC32 uses
|
||||||
|
multiple GOT sections, one per file in `.got2`. To support multiple GOT
|
||||||
|
pointers, the addend on each `R_PPC_PLTREL24` reloc will have the offset within
|
||||||
|
`.got2`.
|
||||||
|
|
||||||
|
`-msecure-plt` has small/large PIC differences.
|
||||||
|
* `-fpic`/`-fpie`: `R_PPC_PLTREL24 r_addend=0`. The call stub loads an address
|
||||||
|
relative to `_GLOBAL_OFFSET_TABLE_`.
|
||||||
|
* `-fPIC`/`-fPIE`: `R_PPC_PLTREL24 r_addend=0x8000`. (A partial linked object
|
||||||
|
file may have an addend larger than 0x8000.) The call stub loads an address
|
||||||
|
relative to `.got2+0x8000`.
|
||||||
|
|
||||||
|
If a non-preemptible ifunc is referenced in two object files, in
|
||||||
|
`-pie`/`-shared` mode, the two object files cannot share the same IPLT entry.
|
||||||
|
When I added non-preemptible ifunc support for PowerPC32 to LLD
|
||||||
|
https://reviews.llvm.org/D71621, I did not handle this case.
|
||||||
|
|
||||||
|
### `.rela.dyn` vs `.rela.plt`
|
||||||
|
|
||||||
|
LLD placed `R_*_IRELATIVE` in the `.rela.plt` section because many ports of GNU
|
||||||
|
ld behaved this way. While implementing ifunc for PowerPC, I noticed that GNU
|
||||||
|
ld powerpc actually places `R_*_IRELATIVE` in `.rela.dyn` and glibc powerpc
|
||||||
|
does not actually support `R_*_IRELATIVE` in `.rela.plt`. This makes a lot of
|
||||||
|
sense to me because `.rela.plt` normally just contains `R_*_JUMP_SLOT` which
|
||||||
|
can be lazily resolved. ifunc relocations need to be eagerly resolved so
|
||||||
|
`.rela.plt` was the wrong place. Therefore I changed LLD to use `.rela.dyn` in
|
||||||
|
https://reviews.llvm.org/D65651.
|
||||||
|
|
||||||
|
## `__rela_iplt_start` and `__rela_iplt_end`
|
||||||
|
|
||||||
|
A statically linked position dependent executable traditionally had no dynamic
|
||||||
|
relocations.
|
||||||
|
|
||||||
|
With ifunc, these `R_*_IRELATIVE` relocations must be resolved at runtime. Such
|
||||||
|
relocations are in a magic array delimited by `__rela_iplt_start` and
|
||||||
|
`__rela_iplt_end`. In glibc, `csu/libc-start.c` has special code processing the
|
||||||
|
relocation range.
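
The processing of that range can be sketched as follows. This is not glibc's code: it is a minimal illustration assuming x86-64, a Linux `<elf.h>`, and a position-dependent executable (load base 0, so `r_addend` is the resolver address). The names `apply_irelative_range`, `sample_impl` and `sample_resolver` are hypothetical; in a real static executable the bounds would be the linker-defined `__rela_iplt_start` and `__rela_iplt_end`.

```c
#include <elf.h>
#include <stdint.h>

typedef Elf64_Addr (*irel_resolver)(void);

/* Hypothetical implementation and resolver, used only for demonstration. */
int sample_impl(void) { return 42; }
Elf64_Addr sample_resolver(void) { return (Elf64_Addr)(uintptr_t)&sample_impl; }

/* Walks [start, end) and resolves each R_X86_64_IRELATIVE entry by calling
 * the resolver at r_addend and storing its return value at r_offset. */
void apply_irelative_range(const Elf64_Rela *start, const Elf64_Rela *end) {
  for (const Elf64_Rela *r = start; r < end; ++r) {
    if (ELF64_R_TYPE(r->r_info) != R_X86_64_IRELATIVE)
      continue; /* this sketch simply skips any other relocation type */
    Elf64_Addr *slot = (Elf64_Addr *)(uintptr_t)r->r_offset;
    irel_resolver resolver = (irel_resolver)(uintptr_t)r->r_addend;
    *slot = resolver();
  }
}
```

Because each entry is resolved by calling code in the executable itself, the loop has to run early in startup, before any call through an IPLT entry.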
|
||||||
|
|
||||||
|
GNU ld and gold define `__rela_iplt_start` in `-no-pie` mode, but not in `-pie`
|
||||||
|
mode. LLD defines `__rela_iplt_start` regardless of `-no-pie`, `-pie` or
|
||||||
|
`-shared`.
|
||||||
|
|
||||||
|
In glibc, static pie uses self-relocation (`_dl_relocate_static_pie`) to take
|
||||||
|
care of `R_*_IRELATIVE`. The above magic array code is executed by static pie
|
||||||
|
as well. If `__rela_iplt_start`/`__rela_iplt_end` are defined, we will get
|
||||||
|
`0 < __rela_iplt_start < __rela_iplt_end` in `csu/libc-start.c`.
|
||||||
|
`ARCH_SETUP_IREL` will crash when resolving the first relocation which has already been
|
||||||
|
processed.
|
||||||
|
|
||||||
|
I think the difference in the
|
||||||
|
`diff -u =(ld.bfd --verbose) =(ld.bfd -pie --verbose)` output is unneeded.
|
||||||
|
https://sourceware.org/pipermail/libc-alpha/2021-January/121755.html
|
||||||
|
|
||||||
|
## Address significance
|
||||||
|
|
||||||
|
A non-GOT-generating non-PLT-generating relocation referencing a
|
||||||
|
`STT_GNU_IFUNC` indicates a potential address-taken operation.
|
||||||
|
|
||||||
|
With a function attribute, the compiler knows that a symbol indicates an ifunc
|
||||||
|
and will avoid generating such relocations. With assembly such relocations may
|
||||||
|
be unavoidable.
|
||||||
|
|
||||||
|
In most cases the linker needs to convert the symbol type to `STT_FUNC` and
|
||||||
|
create a special PLT entry, which is called a "canonical PLT entry" in LLD.
|
||||||
|
References from other modules will resolve to the PLT entry to keep pointer
|
||||||
|
equality: the address taken from the defining module should match the address
|
||||||
|
taken from another module.
|
||||||
|
|
||||||
|
This approach has pros and cons:
|
||||||
|
|
||||||
|
* With a canonical PLT entry, the resolver of a symbol is called only once.
|
||||||
|
There is exactly one `R_*_IRELATIVE` relocation.
|
||||||
|
* If the relocation appears in a non-`SHF_WRITE` section, a text relocation can
|
||||||
|
be avoided.
|
||||||
|
* Relocation types which are not valid dynamic relocation types are supported.
|
||||||
|
GNU ld may report: relocation `R_X86_64_PC32` against `STT_GNU_IFUNC` symbol
|
||||||
|
`ifunc` isn't supported
|
||||||
|
* References will bind to the canonical PLT entry. A function call needs to
|
||||||
|
jump to the PLT, load the value from the GOT, then do an indirect call.
|
||||||
|
|
||||||
|
For a symbolic relocation type (a special case of absolute relocation types
|
||||||
|
where the width matches the word size) like `R_X86_64_64`, when the addend is 0
|
||||||
|
and the section has the `SHF_WRITE` flag, the linker can emit an
|
||||||
|
`R_X86_64_IRELATIVE`. https://reviews.llvm.org/D65995 dropped the case.
|
||||||
|
|
||||||
|
For the following example, GNU ld linked `a.out` calls `fff_resolver` three
|
||||||
|
times while LLD calls it once.
|
||||||
|
|
||||||
|
```c
|
||||||
|
// RUN: split-file %s %t
|
||||||
|
// RUN: clang -fuse-ld=bfd -fpic %t/dso.c -o %t/dso.so --shared
|
||||||
|
// RUN: clang -fuse-ld=bfd %t/main.c %t/dso.so -o %t/a.out
|
||||||
|
// RUN: %t/a.out
|
||||||
|
|
||||||
|
//--- dso.c
|
||||||
|
typedef void fptr(void);
|
||||||
|
extern void fff(void);
|
||||||
|
|
||||||
|
fptr *global_fptr0 = &fff;
|
||||||
|
fptr *global_fptr1 = &fff;
|
||||||
|
|
||||||
|
//--- main.c
|
||||||
|
#include <stdio.h>
|
||||||
|
|
||||||
|
static void fff_impl() { printf("fff_impl()\n"); }
|
||||||
|
static int z;
|
||||||
|
void *fff_resolver() { return (char *)&fff_impl + z++; }
|
||||||
|
|
||||||
|
__attribute__((ifunc("fff_resolver"))) void fff();
|
||||||
|
typedef void fptr(void);
|
||||||
|
fptr *local_fptr = fff;
|
||||||
|
extern fptr *global_fptr0, *global_fptr1;
|
||||||
|
|
||||||
|
int main() {
|
||||||
|
printf("local %p global0 %p global1 %p\n", local_fptr, global_fptr0, global_fptr1);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Relocation resolving order
|
||||||
|
|
||||||
|
`R_*_IRELATIVE` relocations are resolved eagerly. In glibc, there used to be a
|
||||||
|
problem where ifunc resolvers ran before `GL(dl_hwcap)` and `GL(dl_hwcap2)`
|
||||||
|
were set up https://sourceware.org/bugzilla/show_bug.cgi?id=27072.
|
||||||
|
|
||||||
|
For the relocation resolver, the main executable needs to be processed last
|
||||||
|
to process `R_*_COPY`. Without ifunc, the resolving order of shared objects can
|
||||||
|
be arbitrary.
|
||||||
|
|
||||||
|
For ifunc, if the ifunc is defined in a processed module, it is fine. If the
|
||||||
|
ifunc is defined in an unprocessed module, it may crash.
|
||||||
|
|
||||||
|
For an ifunc defined in an executable, calling it from a shared object can be
|
||||||
|
problematic because the executable's relocations haven't been resolved. The
|
||||||
|
issue can be circumvented by converting the non-preemptible ifunc defined in
|
||||||
|
the executable to `STT_FUNC`. GNU ld's x86 port made the change
|
||||||
|
[PR23169](https://sourceware.org/bugzilla/show_bug.cgi?id=23169).
|
||||||
|
|
||||||
|
## `-z ifunc-noplt`
|
||||||
|
|
||||||
|
Mark Johnston introduced `-z ifunc-noplt` for FreeBSD
|
||||||
|
https://reviews.llvm.org/D61613. With this option, all relocations referencing
|
||||||
|
`STT_GNU_IFUNC` will be emitted as dynamic relocations (if `.dynsym` is
|
||||||
|
created). The canonical PLT entry will not be used.
|
||||||
|
|
||||||
|
## Miscellaneous
|
||||||
|
|
||||||
|
GNU ld has implemented a diagnostic (["i686 ifunc and non-default symbol
|
||||||
|
visibility"](https://sourceware.org/bugzilla/show_bug.cgi?id=20515)) to flag
|
||||||
|
`R_386_PC32` referencing non-default visibility ifunc in `-pie` and `-shared`
|
||||||
|
links. This diagnostic looks like the most prominent reason blocking my
|
||||||
|
proposal to use `R_386_PLT32` for `call/jmp foo`. See [Copy relocations,
|
||||||
|
canonical PLT entries and protected visibility](maskray-5.md) for details.
|
||||||
|
|
||||||
|
https://sourceware.org/glibc/wiki/GNU_IFUNC misses a lot of information. There
|
||||||
|
are quite a few arch differences. I asked for clarification
|
||||||
|
https://sourceware.org/pipermail/libc-alpha/2021-January/121752.html
|
||||||
|
|
||||||
|
### Dynamic loader
|
||||||
|
|
||||||
|
In glibc, `_dl_runtime_resolve` needs to save and restore vector and floating
|
||||||
|
point registers. ifunc resolvers add another reason that `_dl_runtime_resolve`
|
||||||
|
cannot only use integer registers. (The other reasons are that ld.so has string
|
||||||
|
function calls which may use vectors and external calls to libc.so.)
|
||||||
|
|
|
@ -0,0 +1,223 @@
|
||||||
|
# Everything I know about GNU toolchain
|
||||||
|
|
||||||
|
As mainly an LLVM person, I occasionally contribute to GNU toolchain projects.
|
||||||
|
This is sometimes for fun, sometimes for investigating why a (usually ancient)
|
||||||
|
feature works in a particular way, sometimes for pushing forward a toolchain
|
||||||
|
feature with the mind of both communities, or sometimes just for getting a sense
|
||||||
|
of how things work with mailing list+GNU make.
|
||||||
|
|
||||||
|
For a debug build, I normally place my build directory `Debug` directly under
|
||||||
|
the project root.
|
||||||
|
|
||||||
|
## binutils
|
||||||
|
|
||||||
|
* Repository: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git
|
||||||
|
* Mailing list: https://sourceware.org/pipermail/binutils
|
||||||
|
* Bugzilla: https://sourceware.org/bugzilla/
|
||||||
|
* Main tools: as (`gas/`, GNU assembler), ld (`ld/`, GNU ld), gold (`gold/`,
|
||||||
|
GNU gold)
|
||||||
|
|
||||||
|
As of 2021-01, it has no wiki.
|
||||||
|
|
||||||
|
Target `all` builds targets `all-host` and `all-target`. When running
|
||||||
|
configure, by default most top-level directories `binutils gas gdb gdbserver ld
|
||||||
|
libctf` are all enabled. You can disable some components via `--disable-*`.
|
||||||
|
`--enable-gold` is needed to enable gold.
|
||||||
|
|
||||||
|
```sh
|
||||||
|
mkdir Debug; cd Debug
|
||||||
|
../configure --target=x86_64-linux-gnu --prefix=/tmp/opt --disable-gdb --disable-gdbserver
|
||||||
|
```
|
||||||
|
|
||||||
|
For cross compiling, make sure you have `$target-{gcc,as,ld}`.
|
||||||
|
|
||||||
|
For many tools (binutils, gdb, ld), `--enable-targets=all` will build all
|
||||||
|
supported architectures and binary formats. However, one gas build can only
|
||||||
|
support one architecture. ld has a default emulation and needs `-m` to support
|
||||||
|
other architectures (`aarch64 architecture of input file 'a.o' is incompatible
|
||||||
|
with i386:x86-64 output`). Many tests are generic and can be run on many
|
||||||
|
targets, but a `--enable-targets=all` build only tests its default target.
|
||||||
|
|
||||||
|
```sh
|
||||||
|
# binutils (binutils/*)
|
||||||
|
make -C Debug all-binutils
|
||||||
|
# gas (gas/as-new)
|
||||||
|
make -C Debug all-gas
|
||||||
|
# ld (ld/ld-new)
|
||||||
|
make -C Debug all-ld
|
||||||
|
|
||||||
|
# Build all enabled tools.
|
||||||
|
make -C Debug all
|
||||||
|
```
|
||||||
|
|
||||||
|
Build with Clang:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
mkdir -p out/clang-debug; cd out/clang-debug
|
||||||
|
../../configure CC=~/Stable/bin/clang CXX=~/Stable/bin/clang++ CFLAGS='-O0 -g' CXXFLAGS='-O0 -g'
|
||||||
|
```
|
||||||
|
|
||||||
|
On the security aspect, "don't run any of binutils as root" is sufficient advice
|
||||||
|
(Alan Modra).
|
||||||
|
|
||||||
|
## Test
|
||||||
|
|
||||||
|
GNU Test Framework DejaGnu is based on Expect, which is in turn based on Tcl.
|
||||||
|
|
||||||
|
To run tests:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
make -C Debug check-binutils
|
||||||
|
# Find the result in (summary) Debug/binutils/binutils.sum and (details) Debug/binutils/binutils.log
|
||||||
|
|
||||||
|
make -C Debug check-gas
|
||||||
|
# Find the result in (summary) Debug/gas/testsuite/gas.sum and (details) Debug/gas/testsuite/gas.log
|
||||||
|
|
||||||
|
make -C Debug check-ld
|
||||||
|
|
||||||
|
# Test all enabled tools.
|
||||||
|
make -C Debug check-all
|
||||||
|
```
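The `.sum` files mentioned above are plain text in which each result line starts with a DejaGnu status keyword such as `PASS:` or `FAIL:`. A minimal Python sketch for tallying them is shown below. The helper name `summarize_sum` and the script name in the usage comment are hypothetical and not part of the binutils tree.

```python
from collections import Counter

# Result keywords DejaGnu writes at the start of a line in a .sum file.
STATUSES = ("PASS", "FAIL", "XPASS", "XFAIL", "KFAIL", "KPASS",
            "UNRESOLVED", "UNTESTED", "UNSUPPORTED")

def summarize_sum(text):
    """Count result lines such as 'PASS: test name' in .sum file contents."""
    counts = Counter()
    for line in text.splitlines():
        status, sep, _ = line.partition(": ")
        if sep and status in STATUSES:
            counts[status] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Hypothetical usage: python3 sum_counts.py Debug/gas/testsuite/gas.sum
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            print(path, dict(summarize_sum(f.read())))
```

Comparing the counts before and after a patch is a quick way to spot regressions when the detailed `.log` is too long to read.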
|
||||||
|
|
||||||
|
For ld, tests are listed in `.exp` files under `ld/testsuite`. A single test
|
||||||
|
normally consists of a `.d` file and several associated `.s` files.
|
||||||
|
|
||||||
|
To run the tests in `ld/testsuite/ld-shared/shared.exp`:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
make -C Debug check-ld RUNTESTFLAGS=ld-shared/shared.exp
|
||||||
|
```
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
|
||||||
|
* A bot updates bfd/version.h (`BFD_VERSION_DATE`) daily.
|
||||||
|
* Test coverage is low.
|
||||||
|
|
||||||
|
## gdb
|
||||||
|
|
||||||
|
gdb resides in the binutils-gdb repository. `configure` enables gdb and
|
||||||
|
gdbserver by default. You just need to make sure `--disable-gdb
|
||||||
|
--disable-gdbserver` is not on the configure line.
|
||||||
|
|
||||||
|
Run gdb under the build directory:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
gdb/gdb -data-directory gdb/data-directory
|
||||||
|
```
|
||||||
|
|
||||||
|
To run the tests in `gdb/testsuite/gdb.dwarf2/dw2-abs-hi-pc.exp`:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
make check-gdb RUNTESTFLAGS=gdb.dwarf2/dw2-abs-hi-pc.exp
|
||||||
|
|
||||||
|
# cd $build/gdb/testsuite/outputs/gdb.dwarf2/dw2-abs-hi-pc
|
||||||
|
```
|
||||||
|
|
||||||
|
## glibc
|
||||||
|
|
||||||
|
* Repository: https://sourceware.org/git/gitweb.cgi?p=glibc.git
|
||||||
|
* Wiki: https://sourceware.org/glibc/wiki/
|
||||||
|
* Bugzilla: https://sourceware.org/bugzilla/
|
||||||
|
* Mailing lists: `{libc-announce,libc-alpha,libc-locale,libc-stable,libc-help}@sourceware.org`
|
||||||
|
|
||||||
|
(Mostly) an implementation of the user-space side of standard C/POSIX functions
|
||||||
|
with Linux extensions.
|
||||||
|
|
||||||
|
A very unfortunate fact: glibc can only be built with `-O2`, not `-O0` or
|
||||||
|
`-O1`. If you want to have an un-optimized debug build, deleting an object file
|
||||||
|
and recompiling it with `-g` usually works. Another workaround is `#pragma GCC
|
||||||
|
optimize ("O0")`.
|
||||||
|
|
||||||
|
The `-O2` issue is probably related to (1) expected inlining and (2) avoiding
|
||||||
|
dynamic relocations.
|
||||||
|
|
||||||
|
Run the following commands to populate `/tmp/glibc-many` with toolchains.
|
||||||
|
Caution: please make sure the target file system has tens of gigabytes of free space.
|
||||||
|
|
||||||
|
Preparation:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
scripts/build-many-glibcs.py /tmp/glibc-many checkout --shallow
|
||||||
|
scripts/build-many-glibcs.py /tmp/glibc-many host-libraries
|
||||||
|
|
||||||
|
scripts/build-many-glibcs.py /tmp/glibc-many compilers aarch64-linux-gnu
|
||||||
|
scripts/build-many-glibcs.py /tmp/glibc-many compilers powerpc64le-linux-gnu
|
||||||
|
scripts/build-many-glibcs.py /tmp/glibc-many compilers sparc64-linux-gnu
|
||||||
|
```
|
||||||
|
|
||||||
|
* `--shallow` passes `--depth 1` to the git clone command.
|
||||||
|
* `--keep all` keeps intermediary build directories intact. You may want this
|
||||||
|
option to investigate build issues.
|
||||||
|
|
||||||
|
The `glibcs` command will delete the glibc build directory, build glibc, and
|
||||||
|
run `make check`.
|
||||||
|
|
||||||
|
```sh
|
||||||
|
scripts/build-many-glibcs.py /tmp/glibc-many glibcs aarch64-linux-gnu
|
||||||
|
# Find the logs and test results under /tmp/glibc-many/logs/glibcs/aarch64-linux-gnu/
|
||||||
|
|
||||||
|
scripts/build-many-glibcs.py /tmp/glibc-many glibcs powerpc64le-linux-gnu
|
||||||
|
|
||||||
|
scripts/build-many-glibcs.py /tmp/glibc-many glibcs sparc64-linux-gnu
|
||||||
|
```
|
||||||
|
|
||||||
|
"On build-many-glibcs.py and most stage1 compiler bootstrap, gcc is build
|
||||||
|
statically against newlib. the static linked gcc (with a lot of disabled
|
||||||
|
features) is then used to build glibc and then the stage2 gcc (which will then
|
||||||
|
have all the features that rely on libc enabled) so the stage1 gcc *might* not
|
||||||
|
have the require started files"
|
||||||
|
|
||||||
|
During development, some interesting targets:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
make -C Debug check-abi
|
||||||
|
```
|
||||||
|
|
||||||
|
Building with Clang is not an option.
|
||||||
|
|
||||||
|
* Clang does not support GCC nested functions [BZ #27220](https://sourceware.org/bugzilla/show_bug.cgi?id=27220)
|
||||||
|
* x86 `PRESERVE_BND_REGS_PREFIX`: integrated assembler does not support the
|
||||||
|
`bnd` prefix.
|
||||||
|
* `sysdeps/powerpc/powerpc64/Makefile`: Clang does not support
|
||||||
|
`-ffixed-vrsave -ffixed-vscr`
|
||||||
|
|
||||||
|
## GCC
|
||||||
|
|
||||||
|
* Mailing lists: `gcc-{patches,regression}@sourceware.org`
|
||||||
|
|
||||||
|
`--disable-bootstrap` is the most important option; otherwise you will get a stage 2
|
||||||
|
build. It is not clear what make does when you touch a source file. It
|
||||||
|
definitely rebuilds stage1, but it is not clear to me how well stage2
|
||||||
|
dependency is handled. Anyway, having a touched source file trigger a total rebuild is
|
||||||
|
not what you desire.
|
||||||
|
|
||||||
|
```sh
|
||||||
|
../configure --disable-bootstrap --enable-languages=c,c++ --disable-multilib
|
||||||
|
make -j 30
|
||||||
|
|
||||||
|
# Incremental build
|
||||||
|
make -C gcc cc1 cc1plus xgcc
|
||||||
|
make -C x86_64-pc-linux-gnu/libstdc++-v3
|
||||||
|
```
|
||||||
|
|
||||||
|
Use built libstdc++ and libgcc.
|
||||||
|
|
||||||
|
```sh
|
||||||
|
$build/gcc/xg++ -B $build/release/gcc forced1.C -Wl,-rpath,$build/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs,-rpath,$build/x86_64-pc-linux-gnu/libgcc
|
||||||
|
```
|
||||||
|
|
||||||
|
### Misc
|
||||||
|
|
||||||
|
* A bot updates `ChangeLog` files daily (commit message: `Daily bump.`).
|
||||||
|
|
||||||
|
## Unlisted
|
||||||
|
|
||||||
|
autotools, bison, m4, make, ...
|
||||||
|
|
||||||
|
### Contributing
|
||||||
|
|
||||||
|
[GNU Coding Standards](https://www.gnu.org/prep/standards/). Emacs has good
|
||||||
|
built-in support. clang-format's support is not as good.
|
||||||
|
|
||||||
|
Legally significant changes need [Copyright Papers](https://www.gnu.org/prep/maintain/html_node/Copyright-Papers.html).
|
||||||
|
|
|
@ -0,0 +1,253 @@
|
||||||
|
# Metadata sections, COMDAT and `SHF_LINK_ORDER`
|
||||||
|
|
||||||
|
## COMDAT
|
||||||
|
|
||||||
|
In C++, inline functions, template instantiations and a few other things can be
|
||||||
|
defined in multiple object files but need deduplication at link time. In the
|
||||||
|
dark ages the functionality was implemented by weak definitions: the linker
|
||||||
|
does not report duplicate definition errors and resolves the references to the
|
||||||
|
first definition. The downside is that unneeded copies remained in the linked
|
||||||
|
image.
|
||||||
|
|
||||||
|
In Microsoft PE file format, the section flag (`IMAGE_SCN_LNK_COMDAT`) marks a
|
||||||
|
section COMDAT and enables deduplication on a per-section basis. If a text
|
||||||
|
section needs a data section and deduplication is needed for both sections, two
|
||||||
|
COMDAT symbols are needed.
|
||||||
|
|
||||||
|
In the GNU world, `.gnu.linkonce.` was invented to deduplicate groups with just
|
||||||
|
one member. `.gnu.linkonce.` has been long obsoleted in favor of section groups
|
||||||
|
but its use persisted until 2020. Adhemerval Zanella removed the
|
||||||
|
last live glibc use case for `.gnu.linkonce.`
|
||||||
|
[BZ #20543](http://sourceware.org/PR20543).
|
||||||
|
|
||||||
|
## ELF section groups
|
||||||
|
|
||||||
|
The ELF specification generalized this use case to allow an arbitrary number of
|
||||||
|
sections to be interrelated in a group.
|
||||||
|
|
||||||
|
> Some sections occur in interrelated groups. For example, an out-of-line
|
||||||
|
> definition of an inline function might require, in addition to the section
|
||||||
|
> containing its executable instructions, a read-only data section containing
|
||||||
|
> literals referenced, one or more debugging information sections and other
|
||||||
|
> informational sections. Furthermore, there may be internal references among
|
||||||
|
> these sections that would not make sense if one of the sections were removed
|
||||||
|
> or replaced by a duplicate from another object. Therefore, such groups must
|
||||||
|
> be included or omitted from the linked object as a unit. A section cannot be
|
||||||
|
> a member of more than one group.
|
||||||
|
|
||||||
|
According to "such groups must be included or omitted from the linked object as
|
||||||
|
a unit", a linker's garbage collection feature must retain or discard the
|
||||||
|
sections as a unit.
|
||||||
|
|
||||||
|
The most common section group flag is `GRP_COMDAT`, which makes the member
|
||||||
|
sections similar to COMDAT in Microsoft PE file format, but can apply to
|
||||||
|
multiple sections. (The committee borrowed the name "COMDAT" from PE.)
|
||||||
|
|
||||||
|
> This is a COMDAT group. It may duplicate another COMDAT group in another
|
||||||
|
> object file, where duplication is defined as having the same group signature.
|
||||||
|
> In such cases, only one of the duplicate groups may be retained by the
|
||||||
|
> linker, and the members of the remaining groups must be discarded.
|
||||||
|
|
||||||
|
I want to highlight one thing GCC does (and Clang inherits) for backward
|
||||||
|
compatibility: the definitions relative to a COMDAT group member are kept
|
||||||
|
`STB_WEAK` instead of `STB_GLOBAL`. The idea is that an old toolchain which does
|
||||||
|
not recognize COMDAT groups can still operate correctly, just in a degraded
|
||||||
|
manner.
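The `GRP_COMDAT` rule quoted above can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function name `select_comdat_groups` and the input shape are hypothetical, and keeping the first group seen is one common choice (the specification only says that one of the duplicates may be retained).

```python
def select_comdat_groups(objects):
    """objects: list of (object_name, [(signature, [member sections])]).

    Keeps the first group seen for each signature and discards the members
    of every later duplicate, following the GRP_COMDAT wording.
    """
    kept = {}        # signature -> (object_name, members)
    discarded = []   # (object_name, section) pairs dropped from the link
    for obj, groups in objects:
        for signature, members in groups:
            if signature not in kept:
                kept[signature] = (obj, list(members))
            else:
                discarded.extend((obj, sec) for sec in members)
    return kept, discarded

objects = [
    ("a.o", [("_Z3foov", [".text._Z3foov", ".rodata._Z3foov"])]),
    ("b.o", [("_Z3foov", [".text._Z3foov", ".rodata._Z3foov"]),
             ("_Z3barv", [".text._Z3barv"])]),
]
kept, discarded = select_comdat_groups(objects)
print(kept["_Z3foov"][0])   # a.o
print(len(discarded))       # 2
```

Note that the whole group from `b.o` is dropped as a unit, which is what distinguishes section groups from per-section deduplication.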
|
||||||
|
|
||||||
|
## Metadata sections
|
||||||
|
|
||||||
|
Many compiler options instrument text sections or annotate text sections, and
|
||||||
|
need to create a metadata section for (almost) every text section. Such
|
||||||
|
metadata sections have some characteristics:
|
||||||
|
|
||||||
|
* All relocations from the metadata section reference the associated text
|
||||||
|
section.
|
||||||
|
* The metadata section is only referenced by the associated text section or not
|
||||||
|
referenced at all.
|
||||||
|
|
||||||
|
Below is an example:
|
||||||
|
|
||||||
|
```
|
||||||
|
.section .text.foo,"ax",@progbits
|
||||||
|
|
||||||
|
.section .meta.foo,"a",@progbits
|
||||||
|
.quad .text.foo-.
|
||||||
|
```
|
||||||
|
|
||||||
|
Users want GC semantics for such metadata sections: if `.text.foo` is retained,
|
||||||
|
`.meta.foo` is retained. Note: the regular GC semantics are the converse: if
|
||||||
|
`.meta.foo` is retained, `.text.foo` is retained.
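To make the difference concrete, here is a toy Python model of regular `--gc-sections` marking applied to the example above. It is only a sketch: `mark_live` and the dictionary representation of relocations are hypothetical, not how any real linker stores its data.

```python
def mark_live(roots, refs):
    """Regular GC marking: a section is live if a root reaches it.

    refs maps a section to the sections its relocations reference.
    """
    live, worklist = set(), list(roots)
    while worklist:
        sec = worklist.pop()
        if sec in live:
            continue
        live.add(sec)
        worklist.extend(refs.get(sec, ()))
    return live

# .meta.foo has a relocation to .text.foo; nothing references .meta.foo.
refs = {".meta.foo": [".text.foo"]}

# Retaining .text.foo does not retain .meta.foo ...
print(sorted(mark_live([".text.foo"], refs)))   # ['.text.foo']
# ... while retaining .meta.foo drags in .text.foo (the converse).
print(sorted(mark_live([".meta.foo"], refs)))   # ['.meta.foo', '.text.foo']
```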
|
||||||
|
|
||||||
|
To achieve the desired GC semantics on ELF platforms, we could use a non-COMDAT
|
||||||
|
section group. However, using a section group requires one extra section
|
||||||
|
(usually named `.group`), which requires 40 bytes on ELFCLASS32 platforms and
|
||||||
|
64 bytes on ELFCLASS64 platforms. Put another way, to represent the
|
||||||
|
metadata of a text section, we need two sections (the metadata section and the
|
||||||
|
section group), 128 bytes on ELFCLASS64 platforms. The size overhead is
|
||||||
|
concerning in many applications. (AArch64 and x86-64 define ILP32 ABIs and use
|
||||||
|
ELFCLASS32, but technically they can use ELFCLASS32 for small code model with
|
||||||
|
regular ABIs, if the kernel allows.)
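The 40 and 64 byte figures are the sizes of `Elf32_Shdr` and `Elf64_Shdr`. The Python sketch below re-derives them with `struct.calcsize` and repeats the per-function arithmetic. The helper `header_overhead` is hypothetical, and only section headers are counted (section contents and string table entries are ignored).

```python
import struct

# Field layouts of Elf32_Shdr and Elf64_Shdr ("<" = little-endian, no padding).
ELF32_SHDR = "<10I"          # ten 32-bit words
ELF64_SHDR = "<IIQQQQIIQQ"   # sh_name, sh_type, sh_flags, sh_addr, sh_offset,
                             # sh_size, sh_link, sh_info, sh_addralign, sh_entsize

def header_overhead(fmt, sections_per_function, functions):
    """Bytes of section headers needed for the metadata of `functions` text sections."""
    return struct.calcsize(fmt) * sections_per_function * functions

print(struct.calcsize(ELF32_SHDR))          # 40
print(struct.calcsize(ELF64_SHDR))          # 64
# Metadata section + .group section per text section on ELFCLASS64:
print(header_overhead(ELF64_SHDR, 2, 1))    # 128
# Without the section group, only the metadata section header remains:
print(header_overhead(ELF64_SHDR, 1, 1))    # 64
```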
|
||||||
|
|
||||||
|
In a generic-abi thread, Cary Coutant initially suggested using a new section
|
||||||
|
flag `SHF_ASSOCIATED`. HP-UX and Solaris folks objected to a new generic flag.
|
||||||
|
Cary Coutant then discussed with Jim Dehnert and noticed that the existing
|
||||||
|
(rare) flag `SHF_LINK_ORDER` has semantics closer to the metadata GC semantics,
|
||||||
|
so he intended to replace the existing flag `SHF_LINK_ORDER`. Solaris had used
|
||||||
|
its own `SHF_ORDERED` extension before it migrated to the ELF simplification
|
||||||
|
`SHF_LINK_ORDER`. Solaris is still using `SHF_LINK_ORDER` so the flag cannot be
|
||||||
|
repurposed. People discussed whether `SHF_OS_NONCONFORMING` could be repurposed
|
||||||
|
but did not take that route: the platform already knows whether a flag is
|
||||||
|
unknown and knowing a flag is non-conforming does not help produce better
|
||||||
|
output. In the end the agreement was that `SHF_LINK_ORDER` gained additional
|
||||||
|
metadata GC semantics.
|
||||||
|
|
||||||
|
The new semantics:
|
||||||
|
|
||||||
|
> This flag adds special ordering requirements for link editors. The
|
||||||
|
> requirements apply to the referenced section identified by the sh_link field
|
||||||
|
> of this section's header. If this section is combined with other sections in
|
||||||
|
> the output file, the section must appear in the same relative order with
|
||||||
|
> respect to those sections, as the referenced section appears with respect to
|
||||||
|
> sections the referenced section is combined with.
|
||||||
|
>
|
||||||
|
> A typical use of this flag is to build a table that references text or data
|
||||||
|
> sections in address order.
|
||||||
|
>
|
||||||
|
> In addition to adding ordering requirements, `SHF_LINK_ORDER` indicates that
|
||||||
|
> the section contains metadata describing the referenced section. When
|
||||||
|
> performing unused section elimination, the link editor should ensure that
|
||||||
|
> both the section and the referenced section are retained or discarded
|
||||||
|
> together. Furthermore, relocations from this section into the referenced
|
||||||
|
> section should not be taken as evidence that the referenced section should be
|
||||||
|
> retained.
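The GC part of this wording can be modeled by extending a plain mark phase. The Python sketch below is an illustration only: `mark_live_link_order` and its data layout are hypothetical, and it assumes `SHF_LINK_ORDER` sections are never GC roots themselves.

```python
def mark_live_link_order(roots, refs, link_order):
    """GC marking with SHF_LINK_ORDER metadata semantics.

    refs:       section -> sections referenced by its relocations
    link_order: SHF_LINK_ORDER section -> its linked-to (sh_link) section
    """
    # Reverse map: linked-to section -> metadata sections describing it.
    dependents = {}
    for meta, target in link_order.items():
        dependents.setdefault(target, []).append(meta)

    live, worklist = set(), list(roots)
    while worklist:
        sec = worklist.pop()
        if sec in live:
            continue
        live.add(sec)
        # Metadata is retained together with the section it describes.
        worklist.extend(dependents.get(sec, ()))
        for ref in refs.get(sec, ()):
            # A relocation from metadata into its linked-to section is not
            # evidence that the linked-to section should be retained.
            if link_order.get(sec) != ref:
                worklist.append(ref)
    return live

refs = {".meta.foo": [".text.foo"], ".meta.bar": [".text.bar"],
        ".text.main": [".text.foo"]}
link_order = {".meta.foo": ".text.foo", ".meta.bar": ".text.bar"}
live = mark_live_link_order([".text.main"], refs, link_order)
print(sorted(live))  # ['.meta.foo', '.text.foo', '.text.main']
```

`.text.bar` and `.meta.bar` are both discarded: nothing reaches `.text.bar`, and the relocation from `.meta.bar` does not count.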
|
||||||
|
|
||||||
|
Actually, ARM EHABI has been using `SHF_LINK_ORDER` for index table sections
|
||||||
|
`.ARM.exidx*`. A `.ARM.exidx` section contains a sequence of 2-word pairs. The
|
||||||
|
first word is a 31-bit PC-relative offset to the start of the region. The idea is
|
||||||
|
that if the entries are ordered by the start address, the end address of an
|
||||||
|
entry is implicitly the start address of the next entry and does not need to be
|
||||||
|
explicitly encoded. For this reason the section uses `SHF_LINK_ORDER` for the
|
||||||
|
ordering requirement. The GC semantics are very similar to the metadata
|
||||||
|
sections'.
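The implicit end address is what makes the ordering requirement essential. The Python sketch below shows a lookup over a table that stores only start addresses. It is a simplified model: `find_region` is a hypothetical helper, and the addresses are assumed to be already decoded from the PC-relative offsets.

```python
import bisect

def find_region(starts, pc):
    """Return (start, end) of the region containing pc, or None.

    starts: region start addresses sorted ascending, as in an index table
    ordered by address. The end of each region is implied by the next start;
    the last region is treated as open-ended (end is None) in this sketch.
    """
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0:
        return None  # pc precedes the first region
    end = starts[i + 1] if i + 1 < len(starts) else None
    return starts[i], end

starts = [0x1000, 0x1040, 0x10c0]
print(find_region(starts, 0x1044))  # (4160, 4288)
```

If the entries were not sorted by start address, both the binary search and the implied end addresses would be wrong.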
|
||||||
|
|
||||||
|
So the updated `SHF_LINK_ORDER` wording can be seen as recognition for the
|
||||||
|
current practice (even though the original discussion did not actually notice
|
||||||
|
ARM EHABI).
|
||||||
|
|
||||||
|
However, in binutils, before 2.35, `SHF_LINK_ORDER` could be produced by ARM
|
||||||
|
assembly directives, but not specified by user-customized sections.
|
||||||
|
|
||||||
|
## C identifier name sections
|
||||||
|
|
||||||
|
A section whose name consists of pure C-like identifier characters (isalnum
|
||||||
|
characters in the C locale plus `_`) is considered as a GC root by ld
|
||||||
|
`--gc-sections`. The idea is that linker defined `__start_foo` and `__stop_foo`
|
||||||
|
are used to delimit the output section foo. Even if input sections foo are
|
||||||
|
not referenced by other sections, `__start_foo`/`__stop_foo` is a signal that
|
||||||
|
foo should be retained.
|
||||||
|
|
||||||
|
The metadata use case requires an amendment of the rule: if `SHF_LINK_ORDER` is
|
||||||
|
set on foo, foo can be GCed (LLD r294592).
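As a sketch of the rule and its amendment, the Python below follows the wording above literally. The function names are hypothetical, and whether a linker additionally rejects names with a leading digit is not modeled here.

```python
import string

SHF_LINK_ORDER = 0x80  # section flag value from the ELF specification

_IDENT_CHARS = set(string.ascii_letters + string.digits + "_")

def is_c_identifier_name(name):
    """True if name uses only C-locale isalnum characters plus '_'."""
    return bool(name) and all(c in _IDENT_CHARS for c in name)

def is_start_stop_gc_root(name, sh_flags):
    """C identifier name sections are GC roots unless SHF_LINK_ORDER is set."""
    return is_c_identifier_name(name) and not (sh_flags & SHF_LINK_ORDER)

print(is_start_stop_gc_root("foo", 0))                # True
print(is_start_stop_gc_root(".meta.foo", 0))          # False
print(is_start_stop_gc_root("foo", SHF_LINK_ORDER))   # False
```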
|
||||||
|
|
||||||
|
GNU ld does not implement this rule yet. https://sourceware.org/bugzilla/show_bug.cgi?id=27259
|
||||||
|
|
||||||
|
## Pitfalls
|
||||||
|
|
||||||
|
### Mixed unordered and ordered sections
|
||||||
|
|
||||||
|
If an output section consists of only non-`SHF_LINK_ORDER` sections, the rule is
|
||||||
|
clear: input sections are ordered in their input order. If an output section
|
||||||
|
consists of only `SHF_LINK_ORDER` sections, the rule is also clear: input
|
||||||
|
sections are ordered with respect to their linked-to sections.
|
||||||
|
|
||||||
|
What is unclear is how to handle an output section with mixed unordered and
|
||||||
|
ordered sections.
|
||||||
|
|
||||||
|
GNU ld had a diagnostic for this case. LLD rejected the case as well, with the error
|
||||||
|
`incompatible section flags for .rodata`.
|
||||||
|
|
||||||
|
When I implemented `-fpatchable-function-entry=` for Clang, I observed some GC
|
||||||
|
related issues with the GCC implementation. I reported them and carefully chose
|
||||||
|
`SHF_LINK_ORDER` in the Clang implementation if the integrated assembler is
|
||||||
|
used.
|
||||||
|
|
||||||
|
This was a problem if the user wanted to place such input sections along with
|
||||||
|
unordered sections, e.g.
|
||||||
|
`.init.data : { ... KEEP(*(__patchable_function_entries)) ... }`
|
||||||
|
(https://github.com/ClangBuiltLinux/linux/issues/953).
|
||||||
|
|
||||||
|
As a response, I submitted https://reviews.llvm.org/D77007 to allow ordered
|
||||||
|
input section descriptions within an output section.
|
||||||
|
|
||||||
|
This worked well for the Linux kernel. Mixed unordered and ordered sections
|
||||||
|
within an input section description was still a problem. This made it
|
||||||
|
infeasible to add `SHF_LINK_ORDER` to an existing metadata section and expect
|
||||||
|
new object files linkable with old object files which do not have the flag. I
|
||||||
|
asked how to resolve this upgrade issue and Ali Bahrami responded:
|
||||||
|
|
||||||
|
> The Solaris linker puts sections without `SHF_LINK_ORDER` at the end of the
|
||||||
|
> output section, in first-in-first-out order, and I don't believe that's
|
||||||
|
> considered to be an error.
|
||||||
|
|
||||||
|
So I went ahead and implemented a similar rule for LLD:
|
||||||
|
https://reviews.llvm.org/D84001 allows an arbitrary mix and places
|
||||||
|
`SHF_LINK_ORDER` sections before non-`SHF_LINK_ORDER` sections.
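The resulting order can be approximated as follows. This Python sketch is an assumption-laden model rather than LLD's actual code: `order_input_sections` is hypothetical, and the linked-to addresses are taken as already known.

```python
def order_input_sections(sections, address_of):
    """sections: list of (name, linked_to) in input order; linked_to is None
    for sections without SHF_LINK_ORDER.
    address_of: linked-to section name -> its address in the output.
    """
    ordered = [s for s in sections if s[1] is not None]
    unordered = [s for s in sections if s[1] is None]
    # sorted() is stable, so sections with equal keys keep input order.
    ordered = sorted(ordered, key=lambda s: address_of[s[1]])
    # SHF_LINK_ORDER sections first, then the rest in input order.
    return [name for name, _ in ordered + unordered]

sections = [
    ("old.o:meta", None),
    ("new.o:meta.bar", ".text.bar"),
    ("new.o:meta.foo", ".text.foo"),
    ("old2.o:meta", None),
]
address_of = {".text.foo": 0x1000, ".text.bar": 0x2000}
print(order_input_sections(sections, address_of))
# ['new.o:meta.foo', 'new.o:meta.bar', 'old.o:meta', 'old2.o:meta']
```

Old object files without the flag keep their relative input order, so they remain linkable with new object files that set `SHF_LINK_ORDER`.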
|
||||||
|
|
||||||
|
### If the associated section is discarded
|
||||||
|
|
||||||
|
We decided that the integrated assembler allows `SHF_LINK_ORDER` with
|
||||||
|
`sh_link=0` and LLD can handle such sections as regular unordered sections
|
||||||
|
(https://reviews.llvm.org/D72904).
|
||||||
|
|
||||||
|
### Other pitfalls
|
||||||
|
|
||||||
|
* During `--icf={safe,all}`, `SHF_LINK_ORDER` sections should not be separately
|
||||||
|
considered.
|
||||||
|
* In relocatable output, `SHF_LINK_ORDER` sections cannot be combined by name.
|
||||||
|
* When comparing two input sections with different linked-to output sections,
|
||||||
|
use vaddr of output sections instead of section indexes. Peter Smith fixed
|
||||||
|
this in https://reviews.llvm.org/D79286.
|
||||||
|
|
||||||
|
## Miscellaneous
|
||||||
|
|
||||||
|
Arm Compiler 5 splits up DWARF Version 3 debug information and puts these
|
||||||
|
sections into comdat groups. On "monolithic input section handling", Peter
|
||||||
|
Smith commented that:
|
||||||
|
|
||||||
|
> We found that splitting up the debug into fragments works well as it permits
|
||||||
|
> the linker to ensure that all the references to local symbols are to sections
|
||||||
|
> within the same group, this makes it easy for the linker to remove all the
|
||||||
|
> debug when the group isn't selected.
|
||||||
|
>
|
||||||
|
> This approach did produce significantly more debug information than gcc did.
|
||||||
|
> For small microcontroller projects this wasn't a problem. For larger feature
|
||||||
|
> phone problems we had to put a lot of work into keeping the linker's memory
|
||||||
|
> usage down as many of our customers at the time were using 32-bit Windows
|
||||||
|
> machines with a default maximum virtual memory of 2Gb.
|
||||||
|
|
||||||
|
COMDAT sections have size overhead on extra section headers. Developers may be
|
||||||
|
tempted to decrease the overhead with `SHF_LINK_ORDER`. However, the approach
|
||||||
|
does not work due to the ordering requirement. Consider the following
|
||||||
|
fragments:
|
||||||
|
|
||||||
|
```
|
||||||
|
header [a.o common]
|
||||||
|
- DW_TAG_compile_unit [a.o common]
|
||||||
|
-- DW_TAG_variable [a.o .data.foo]
|
||||||
|
-- DW_TAG_namespace [common]
|
||||||
|
--- DW_TAG_subprogram [a.o .text.bar]
|
||||||
|
--- DW_TAG_variable [a.o .data.baz]
|
||||||
|
footer [a.o common]
|
||||||
|
header [b.o common]
|
||||||
|
- DW_TAG_compile_unit [b.o common]
|
||||||
|
-- DW_TAG_variable [b.o .data.foo]
|
||||||
|
-- DW_TAG_namespace [common]
|
||||||
|
--- DW_TAG_subprogram [b.o .text.bar]
|
||||||
|
--- DW_TAG_variable [b.o .data.baz]
|
||||||
|
footer [b.o common]
|
||||||
|
```
|
||||||
|
|
||||||
|
`DW_TAG_*` tags associated with concrete sections can be represented with
|
||||||
|
`SHF_LINK_ORDER` sections. After linking, the sections will be ordered before the
|
||||||
|
common parts.
|
||||||
|
|