| 1 | +===================================== |
| 2 | +AArch64 Optimization and Flags Status |
| 3 | +===================================== |
| 4 | + |
| 5 | +Overview |
| 6 | +-------- |
| 7 | + |
| 8 | +This page summarizes default-off BOLT optimization flags that users may |
| 9 | +explicitly enable when optimizing AArch64 binaries. |
| 10 | + |
| 11 | +Use BOLT with binaries linked with |
| 12 | +relocations (``--emit-relocs`` or ``-Wl,-q``) and with representative profile data. |
| 13 | + |
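As a rough illustration of the prerequisites above, the following dry-run sketch prints the commands for linking with relocations and converting a profile. The file names (``main.c``, ``app``, ``perf.data``, ``perf.fdata``) are hypothetical, and the ``perf`` event and the ``-nl`` option to ``perf2bolt`` are assumptions for plain sampling without branch records; adjust them to the profiling setup actually in use.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# SRC, APP and the profile file names are hypothetical placeholders.
SRC=main.c
APP=app

# 1. Link with relocations preserved (-Wl,-q is the short spelling).
LINK_CMD="clang -O2 $SRC -o $APP -Wl,--emit-relocs"

# 2. Record a representative run (assumed: plain cycle sampling in user space).
RECORD_CMD="perf record -e cycles:u -o perf.data -- ./$APP"

# 3. Convert the perf profile to BOLT's format (-nl: no branch records assumed).
CONVERT_CMD="perf2bolt -p perf.data -o perf.fdata -nl ./$APP"

printf '%s\n' "$LINK_CMD" "$RECORD_CMD" "$CONVERT_CMD"
```

Printing the commands first makes it easy to review them before running each step by hand.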
| 14 | +Main Code-Layout Optimizations |
| 15 | +------------------------------ |
| 16 | +The following code-layout optimizations are the first options to |
| 17 | +consider when optimizing AArch64 binaries with representative profile data. |
| 18 | +They typically provide the largest performance gains among BOLT optimizations. |
| 19 | + |
| 20 | +.. list-table:: |
| 21 | + :header-rows: 1 |
| 22 | + :widths: 34 42 |
| 23 | + :align: left |
| 24 | + |
| 25 | + * - Flag |
| 26 | + - Optimization |
| 27 | + * - | ``--reorder-functions=exec-count|hfsort|cdsort|pettis-hansen|random|user`` |
| 28 | + | ``--function-order=<file>`` |
| 29 | + - Reorder functions |
| 30 | + * - ``--reorder-blocks=normal|ext-tsp|cache|branch-predictor|reverse|cluster-shuffle`` |
| 31 | + - Reorder basic blocks |
| 32 | + * - | ``--split-functions`` |
| 33 | + | ``--split-strategy=profile2|random2|randomN|all`` |
| 34 | + | ``--split-all-cold`` |
| 35 | + | ``--split-eh`` |
| 36 | + - Split hot and cold code |
| 37 | + |
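The flags in the table above are usually combined in a single ``llvm-bolt`` invocation. The sketch below assembles one such command line and prints it. The binary and profile names are hypothetical, and the choice of ``cdsort`` and ``ext-tsp`` is only one possible selection from the listed values, not a recommendation taken from this page.

```shell
#!/bin/sh
# Dry-run sketch: assembles and prints an llvm-bolt command line.
# BINARY and PROFILE are hypothetical placeholders.
BINARY=app
PROFILE=perf.fdata

# One value picked from each layout option in the table (an assumption).
LAYOUT_FLAGS="--reorder-functions=cdsort --reorder-blocks=ext-tsp"

# Function splitting and its related flags from the table.
SPLIT_FLAGS="--split-functions --split-all-cold --split-eh"

BOLT_CMD="llvm-bolt $BINARY -o ${BINARY}.bolt --data=$PROFILE $LAYOUT_FLAGS $SPLIT_FLAGS"
echo "$BOLT_CMD"
```

Keeping the layout and splitting flags in separate variables makes it simpler to measure the effect of each group on its own.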
| 38 | + |
| 39 | +Other Supported Optimizations |
| 40 | +----------------------------- |
| 41 | +The following optimizations are also supported for AArch64. |
| 42 | + |
| 43 | +.. list-table:: |
| 44 | + :header-rows: 1 |
| 45 | + :widths: 34 42 |
| 46 | + :align: left |
| 47 | + |
| 48 | + * - Flag |
| 49 | + - Optimization |
| 50 | + * - | ``--align-blocks`` |
| 51 | + | ``--block-alignment=<uint>`` |
| 52 | + - Align basic blocks |
| 53 | + * - ``--tail-duplication=aggressive|moderate|cache`` |
| 54 | + - Duplicate branch tails |
| 55 | + * - ``--peepholes=double-jumps|tailcall-traps|useless-branches|all`` |
| 56 | + - Run peephole optimizations |
| 57 | + * - | ``--inline-all`` |
| 58 | + | ``--inline-small-functions`` |
| 59 | + | Related options: |
| 60 | + | ``--inline-ap`` |
| 61 | + | ``--inline-limit=<uint>`` |
| 62 | + | ``--inline-small-functions-bytes=<uint>`` |
| 63 | + - Inline functions |
| 64 | + * - ``--icf=safe|all`` |
| 65 | + - Fold identical functions |
| 66 | + |
| 67 | +Supported Flags With Limitations |
| 68 | +-------------------------------- |
| 69 | +The following flags are implemented for AArch64 but require specific runtime |
| 70 | +conditions or companion options. When those requirements are not met, BOLT may |
| 71 | +report an error or perform no transformation. |
| 72 | + |
| 73 | +.. list-table:: |
| 74 | + :header-rows: 1 |
| 75 | + :widths: 30 28 44 |
| 76 | + :align: left |
| 77 | + |
| 78 | + * - Flag |
| 79 | + - Optimization |
| 80 | + - Notes |
| 81 | + * - ``--inline-memcpy`` |
| 82 | + - Inline fixed-size ``memcpy`` calls |
| 83 | + - Only applies when the copy size is a known constant; AArch64 skips sizes over 64 bytes. |
| 84 | + * - ``--plt=hot|all`` |
| 85 | + - Optimize PLT calls |
| 86 | + - Requires immediate binding. If BOLT cannot update the binary, relink with ``-znow``. |
| 87 | + * - ``--hugify`` |
| 88 | + - Place hot code on huge pages |
| 89 | + - Applies to binaries with a recognized entry point; skipped when ``--instrument`` is used. |
| 90 | + * - | ``--reorder-data=<section1,section2,...>`` |
| 91 | + | ``--reorder-data-algo=count|funcs`` |
| 92 | + - Reorder data sections |
| 93 | + - Data reordering is disabled when ``--jump-tables`` is set to ``move``, ``split`` or ``aggressive``. |
| 94 | + * - ``--split-strategy=cdsplit`` |
| 95 | + - Split functions using cache-directed splitting |
| 96 | + - Requires ``--compact-code-model`` on AArch64. |
| 97 | + |
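Two of the conditions in the table above can be sketched as command lines: immediate binding for ``--plt`` and ``--compact-code-model`` for ``cdsplit``. This is a dry-run illustration with hypothetical file names. Passing ``-z now`` through the compiler driver as ``-Wl,-z,now`` and enabling ``--split-functions`` alongside the split strategy are assumptions about a typical setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# SRC, APP and perf.fdata are hypothetical placeholders.
SRC=main.c
APP=app

# --plt needs immediate binding, so link with -z now in addition to relocations.
LINK_CMD="clang -O2 $SRC -o $APP -Wl,-q -Wl,-z,now"

# On AArch64, cdsplit is paired with --compact-code-model.
BOLT_CMD="llvm-bolt $APP -o ${APP}.bolt --data=perf.fdata --plt=hot --split-functions --split-strategy=cdsplit --compact-code-model"

printf '%s\n' "$LINK_CMD" "$BOLT_CMD"
```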
| 98 | +Unsupported Flags |
| 99 | +----------------- |
| 100 | + |
| 101 | +The following flags are not available for AArch64. ``Not applicable to |
| 102 | +AArch64`` means the optimization targets architectural features or mechanisms |
| 103 | +that do not apply to AArch64. ``Not implemented for AArch64`` means the |
| 104 | +optimization could be relevant, but is not currently implemented for this |
| 105 | +target. |
| 106 | + |
| 107 | +.. list-table:: |
| 108 | + :header-rows: 1 |
| 109 | + :widths: 30 28 42 |
| 110 | + :align: left |
| 111 | + |
| 112 | + * - Flag |
| 113 | + - Optimization |
| 114 | + - Notes |
| 115 | + * - ``--jt-footprint-reduction`` |
| 116 | + - Reduce jump-table footprint |
| 117 | + - Not implemented for AArch64. |
| 118 | + * - ``--three-way-branch`` |
| 119 | + - Reorder three-way branches |
| 120 | + - Not implemented for AArch64. |
| 121 | + * - ``--simplify-rodata-loads`` |
| 122 | + - Replace read-only data loads with constants |
| 123 | + - Not implemented for AArch64. |
| 124 | + * - ``--frame-opt=hot|all`` |
| 125 | + - Optimize stack-frame accesses |
| 126 | + - Not implemented for AArch64. |
| 127 | + * - ``--indirect-call-promotion=calls|jump-tables|all`` |
| 128 | + - Promote indirect calls |
| 129 | + - Not implemented for AArch64. |
| 130 | + * - ``--memcpy1-spec=<func1,func2:cs1:cs2,...>`` |
| 131 | + - Specialize one-byte ``memcpy`` calls |
| 132 | + - Not implemented for AArch64. |
| 133 | + * - ``--reg-reassign`` |
| 134 | + - Reassign registers to reduce encoding size |
| 135 | + - Not applicable to AArch64. |
| 136 | + * - ``--cmov-conversion`` |
| 137 | + - Convert branches to conditional moves |
| 138 | + - Not applicable to AArch64. |
| 139 | + * - | ``--stoke`` |
| 140 | + | ``--stoke-out`` |
| 141 | + - Emit STOKE optimization data |
| 142 | + - Not applicable to AArch64. |
| 143 | + * - ``--insert-retpolines`` |
| 144 | + - Insert retpolines |
| 145 | + - Not applicable to AArch64. |