Commit ba2413e

Merge branch 'main' into cfi2
2 parents 6f7db23 + 4720932 commit ba2413e

715 files changed

Lines changed: 36452 additions & 18845 deletions

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+=====================================
+AArch64 Optimization and Flags Status
+=====================================
+
+Overview
+--------
+
+This page summarizes default-off BOLT optimization flags that users may
+explicitly enable when optimizing AArch64 binaries.
+
+BOLT is to be used with binaries linked with
+relocations (``--emit-relocs`` or ``-Wl,-q``) and representative profile data.
+
+Main Code-Layout Optimizations
+------------------------------
+The following code-layout optimizations are typically the first options to
+consider when optimizing AArch64 binaries with representative profile data.
+They typically provide the largest performance gains among BOLT optimizations.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 34 42
+   :align: left
+
+   * - Flag
+     - Optimization
+   * - | ``--reorder-functions=exec-count|hfsort|cdsort|pettis-hansen|random|user``
+       | ``--function-order=<file>``
+     - Reorder functions
+   * - ``--reorder-blocks=normal|ext-tsp|cache|branch-predictor|reverse|cluster-shuffle``
+     - Reorder basic blocks
+   * - | ``--split-functions``
+       | ``--split-strategy=profile2|random2|randomN|all``
+       | ``--split-all-cold``
+       | ``--split-eh``
+     - Split hot and cold code
+
+
+Other Supported Optimizations
+-----------------------------
+The following optimizations are also supported for AArch64.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 34 42
+   :align: left
+
+   * - Flag
+     - Optimization
+   * - | ``--align-blocks``
+       | ``--block-alignment=<uint>``
+     - Align basic blocks
+   * - ``--tail-duplication=aggressive|moderate|cache``
+     - Duplicate branch tails
+   * - ``--peepholes=double-jumps|tailcall-traps|useless-branches|all``
+     - Run peephole optimizations
+   * - | ``--inline-all``
+       | ``--inline-small-functions``
+       | Related options:
+       | ``--inline-ap``
+       | ``--inline-limit=<uint>``
+       | ``--inline-small-functions-bytes=<uint>``
+     - Inline functions
+   * - ``--icf=safe|all``
+     - Fold identical functions
+
+Supported Flags With Limitations
+--------------------------------
+The following flags are implemented for AArch64, but require specific runtime
+or option conditions. Enabling them without the required conditions may report
+an error or perform no transformation.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 30 28 44
+   :align: left
+
+   * - Flag
+     - Optimization
+     - Notes
+   * - ``--inline-memcpy``
+     - Inline fixed-size ``memcpy`` calls
+     - Only applies when the copy size is a known constant; AArch64 skips sizes over 64 bytes.
+   * - ``--plt=hot|all``
+     - Optimize PLT calls
+     - Requires immediate binding. If BOLT cannot update the binary, relink with ``-znow``.
+   * - ``--hugify``
+     - Place hot code on huge pages
+     - Applies to binaries with a recognized entry point; skipped when ``--instrument`` is used.
+   * - | ``--reorder-data=<section1,section2,...>``
+       | ``--reorder-data-algo=count|funcs``
+     - Reorder data sections
+     - ``move``, ``split`` and ``aggressive`` disable data reordering.
+   * - ``--split-strategy=cdsplit``
+     - Split functions using cache-directed splitting
+     - Requires ``--compact-code-model`` on AArch64.
+
+Unsupported Flags
+-----------------
+
+The following flags are not available for AArch64. ``Not applicable to
+AArch64`` means the optimization targets architectural features or mechanisms
+that do not apply to AArch64. ``Not implemented for AArch64`` means the
+optimization could be relevant, but is not currently implemented for this
+target.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 30 28 42
+   :align: left
+
+   * - Flag
+     - Optimization
+     - Notes
+   * - ``--jt-footprint-reduction``
+     - Reduce jump-table footprint
+     - Not implemented for AArch64.
+   * - ``--three-way-branch``
+     - Reorder three-way branches
+     - Not implemented for AArch64.
+   * - ``--simplify-rodata-loads``
+     - Replace read-only data loads with constants
+     - Not implemented for AArch64.
+   * - ``--frame-opt=hot|all``
+     - Optimize stack-frame accesses
+     - Not implemented for AArch64.
+   * - ``--indirect-call-promotion=calls|jump-tables|all``
+     - Promote indirect calls
+     - Not implemented for AArch64.
+   * - ``--memcpy1-spec=<func1,func2:cs1:cs2,...>``
+     - Specialize one-byte ``memcpy`` calls
+     - Not implemented for AArch64.
+   * - ``--reg-reassign``
+     - Reassign registers to reduce encoding size
+     - Not applicable to AArch64.
+   * - ``--cmov-conversion``
+     - Convert branches to conditional moves
+     - Not applicable to AArch64.
+   * - | ``--stoke``
+       | ``--stoke-out``
+     - Emit STOKE optimization data
+     - Not applicable to AArch64.
+   * - ``--insert-retpolines``
+     - Insert retpolines
+     - Not applicable to AArch64.

bolt/include/bolt/Core/BinaryContext.h

Lines changed: 7 additions & 0 deletions
@@ -48,6 +48,7 @@
 #include <functional>
 #include <list>
 #include <map>
+#include <mutex>
 #include <optional>
 #include <set>
 #include <string>
@@ -284,12 +285,18 @@ class BinaryContext {
   /// Internal helper for removing section name from a lookup table.
   void deregisterSectionName(const BinarySection &Section);

+  /// Mutex used for parallel processing of DWP type units.
+  std::mutex DWPUnitsMutex;
+
 public:
   static Expected<std::unique_ptr<BinaryContext>> createBinaryContext(
       Triple TheTriple, std::shared_ptr<orc::SymbolStringPool> SSP,
       StringRef InputFileName, SubtargetFeatures *Features, bool IsPIC,
       std::unique_ptr<DWARFContext> DwCtx, JournalingStreams Logger);

+  /// Returns the mutex guarding concurrent access to DWP units.
+  std::mutex &getUnitsMutex() { return DWPUnitsMutex; }
+
   /// Superset of compiler units that will contain overwritten code that needs
   /// new debug info. In a few cases, functions may end up not being
   /// overwritten, but it is okay to re-generate debug info for them.

bolt/include/bolt/Core/DIEBuilder.h

Lines changed: 3 additions & 0 deletions
@@ -217,6 +217,9 @@ class DIEBuilder {
   /// Returns true if DWARFUnit is registered successfully.
   bool registerUnit(DWARFUnit &DU, bool NeedSort);

+  /// Builds type units needed in the DWO.
+  void buildDWPTypeUnitsForUnit(DWARFUnit &U);
+
   /// \return the unique ID of \p U if it exists.
   std::optional<uint32_t> getUnitId(const DWARFUnit &DU);

bolt/lib/Core/DIEBuilder.cpp

Lines changed: 76 additions & 1 deletion
@@ -283,6 +283,77 @@ void DIEBuilder::buildTypeUnits(DebugStrOffsetsWriter *StrOffsetWriter,
   }
 }

+/// Recursively collects type unit signatures from the given DIE and all of its
+/// children.
+///
+/// Note: De-duplication of the collected signatures is handled at the outer
+/// level by registerUnit.
+static void collectReferencedTypeSignatures(DWARFDie Die,
+                                            DenseSet<uint64_t> &ProcessedTU,
+                                            SmallVectorImpl<uint64_t> &TUlist) {
+  SmallVector<DWARFDie, 8> DIElist;
+  DIElist.push_back(Die);
+
+  while (!DIElist.empty()) {
+    DWARFDie Current = DIElist.pop_back_val();
+    if (!Current)
+      continue;
+
+    for (const DWARFAttribute &Attr : Current.attributes()) {
+      if (Attr.Value.getForm() != dwarf::DW_FORM_ref_sig8)
+        continue;
+      if (const std::optional<uint64_t> Signature =
+              Attr.Value.getAsSignatureReference())
+        if (ProcessedTU.insert(*Signature).second)
+          TUlist.push_back(*Signature);
+    }
+
+    for (DWARFDie Child : Current.children())
+      DIElist.push_back(Child);
+  }
+}
+
+void DIEBuilder::buildDWPTypeUnitsForUnit(DWARFUnit &U) {
+  std::unique_lock<std::mutex> LockGuard(BC.getUnitsMutex());
+  // Avoid processing the same type unit multiple times.
+  DenseSet<uint64_t> ProcessedTU;
+  SmallVector<uint64_t, 8> TUlist;
+  // Collecting signatures of type units referenced by this unit.
+  collectReferencedTypeSignatures(U.getUnitDIE(), ProcessedTU, TUlist);
+
+  getState().Type = U.getVersion() < 5 ? ProcessingType::DWARF4TUs
+                                       : ProcessingType::DWARF5TUs;
+  // addressing type units referenced.
+  for (unsigned I = 0; I != TUlist.size(); ++I) {
+    const uint64_t Signature = TUlist[I];
+    DWARFTypeUnit *TU = DwarfContext->getTypeUnitForHash(Signature, true);
+    if (!TU)
+      continue;
+    if (!registerUnit(*TU, false))
+      continue;
+
+    const std::optional<uint32_t> UnitId = getUnitId(*TU);
+    if (!UnitId || getState().CloneUnitCtxMap[*UnitId].IsConstructed)
+      continue;
+
+    collectReferencedTypeSignatures(TU->getUnitDIE(), ProcessedTU, TUlist);
+  }
+
+  // Ensure original order of processing type units
+  auto SortByOffset = [](const DWARFUnit *A, const DWARFUnit *B) {
+    return A->getOffset() < B->getOffset();
+  };
+
+  // For Split DWARF, we have either DWARF4 or DWARF5, they cannot be mixed.
+  std::vector<DWARFUnit *> &TUVec = getState().Type == ProcessingType::DWARF4TUs
+                                        ? getState().DWARF4TUVector
+                                        : getState().DWARF5TUVector;
+  llvm::sort(TUVec, SortByOffset);
+
+  for (DWARFUnit *TU : TUVec)
+    constructFromUnit(*TU);
+}
+
 void DIEBuilder::buildCompileUnits(const bool Init) {
   if (Init)
     BuilderState.reset(new State());
@@ -328,7 +399,11 @@ void DIEBuilder::buildCompileUnits(const std::vector<DWARFUnit *> &CUs) {
 void DIEBuilder::buildDWOUnit(DWARFUnit &U) {
   BuilderState.release();
   BuilderState = std::make_unique<State>();
-  buildTypeUnits(nullptr, false);
+  if (DwarfContext->isDWP()) {
+    buildDWPTypeUnitsForUnit(U);
+  } else {
+    buildTypeUnits(nullptr, false);
+  }
   getState().Type = ProcessingType::CUs;
   registerUnit(U, false);
   constructFromUnit(U);
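
The loop in `buildDWPTypeUnitsForUnit` above follows a worklist pattern: signatures found in the unit seed a list, and each type unit that resolves may append further signatures while the list is still being walked. The sketch below is a minimal standalone illustration of that pattern using only the standard library. `SignatureGraph` and `collectReachable` are hypothetical names chosen for this example, and the map stands in for the DWARF lookups; none of it is BOLT or LLVM API.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the DWARF lookups: each type-unit signature maps
// to the signatures that unit references.
using SignatureGraph = std::unordered_map<uint64_t, std::vector<uint64_t>>;

// Returns every signature discovered starting from Roots, in discovery order.
// Signatures that are missing from the graph are kept in the result but have
// no references to follow.
std::vector<uint64_t> collectReachable(const SignatureGraph &Graph,
                                       const std::vector<uint64_t> &Roots) {
  std::unordered_set<uint64_t> Seen;
  std::vector<uint64_t> Worklist;

  // Appends a signature only the first time it is seen.
  auto Enqueue = [&](uint64_t Sig) {
    if (Seen.insert(Sig).second)
      Worklist.push_back(Sig);
  };

  for (uint64_t Sig : Roots)
    Enqueue(Sig);

  // Index-based loop: the worklist grows while it is walked, and push_back
  // could invalidate iterators into it.
  for (std::size_t I = 0; I != Worklist.size(); ++I) {
    auto It = Graph.find(Worklist[I]);
    if (It == Graph.end())
      continue; // Unknown signature: nothing to follow.
    for (uint64_t Ref : It->second)
      Enqueue(Ref);
  }
  return Worklist;
}
```

With a graph where unit 1 references 2 and 3, unit 2 references 3 and 4, and unit 4 references 1, starting from unit 1 visits each signature once even though the references form a cycle.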

bolt/lib/Rewrite/DWARFRewriter.cpp

Lines changed: 1 addition & 5 deletions
@@ -496,12 +496,8 @@ static void emitDWOBuilder(const std::string &DWOName,
       createDIEStreamer(*TheTriple, *ObjOS, "DwoStreamerInitAug2",
                         DWODIEBuilder, GDBIndexSection);
   if (SplitCU.getContext().getMaxDWOVersion() >= 5) {
-    for (std::unique_ptr<llvm::DWARFUnit> &CU :
-         SplitCU.getContext().dwo_info_section_units()) {
-      if (!CU->isTypeUnit())
-        continue;
+    for (DWARFUnit *CU : DWODIEBuilder.getDWARF5TUVector())
       emitUnit(DWODIEBuilder, *Streamer, *CU);
-    }
     emitUnit(DWODIEBuilder, *Streamer, SplitCU);
   } else {
     emitUnit(DWODIEBuilder, *Streamer, SplitCU);

bolt/lib/Rewrite/RewriteInstance.cpp

Lines changed: 1 addition & 1 deletion
@@ -444,7 +444,7 @@ RewriteInstance::RewriteInstance(ELFObjectFileBase *File, const int Argc,
       DWARFContext::create(*File, DWARFContext::ProcessDebugRelocations::Ignore,
                            nullptr, opts::DWPPathName,
                            WithColor::defaultErrorHandler,
-                           WithColor::defaultWarningHandler),
+                           WithColor::defaultWarningHandler, true),
       JournalingStreams{Stdout, Stderr});
   if (Error E = BCOrErr.takeError()) {
     Err = std::move(E);

bolt/test/X86/dwarf5-df-types-dup-dwp-input.test

Lines changed: 4 additions & 3 deletions
@@ -11,17 +11,18 @@
 ; RUN: llvm-dwarfdump --debug-info -r 0 main.dwo.dwo | FileCheck -check-prefix=BOLT-DWO-DWO-MAIN %s
 ; RUN: llvm-dwarfdump --debug-info -r 0 helper.dwo.dwo | FileCheck -check-prefix=BOLT-DWO-DWO-HELPER %s

-;; Tests that BOLT correctly handles DWARF5 DWP file as input. Output has correct CU, and all the type units are written out.
+;; Tests that BOLT correctly handles DWARF5 DWP file as input. Output has the correct CU,
+;; and only type units referenced by that CU are written out.

 ; BOLT-DWO-DWO-MAIN: debug_info.dwo
 ; BOLT-DWO-DWO-MAIN-NEXT: type_signature = 0x49dc260088be7e56
 ; BOLT-DWO-DWO-MAIN: type_signature = 0x104ec427d2ebea6f
-; BOLT-DWO-DWO-MAIN: type_signature = 0xca1e65a66d92b970
+; BOLT-DWO-DWO-MAIN-NOT: type_signature = 0xca1e65a66d92b970
 ; BOLT-DWO-DWO-MAIN: Compile Unit
 ; BOLT-DWO-DWO-MAIN-SAME: DWO_id = 0x52bda211bf6d26b7
 ; BOLT-DWO-DWO-MAIN-NOT: Compile Unit
 ; BOLT-DWO-DWO-HELPER: debug_info.dwo
-; BOLT-DWO-DWO-HELPER-NEXT: type_signature = 0x49dc260088be7e56
+; BOLT-DWO-DWO-HELPER-NOT: type_signature = 0x49dc260088be7e56
 ; BOLT-DWO-DWO-HELPER: type_signature = 0x104ec427d2ebea6f
 ; BOLT-DWO-DWO-HELPER: type_signature = 0xca1e65a66d92b970
 ; BOLT-DWO-DWO-HELPER: Compile Unit
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+; RUN: rm -rf %t
+; RUN: mkdir %t
+; RUN: cd %t
+; RUN: llvm-mc -dwarf-version=5 -filetype=obj -triple x86_64-unknown-linux %p/Inputs/dwarf5-df-types-dup-main.s \
+; RUN: -split-dwarf-file=main.dwo -o main.o
+; RUN: llvm-mc -dwarf-version=5 -filetype=obj -triple x86_64-unknown-linux %p/Inputs/dwarf5-df-types-dup-helper.s \
+; RUN: -split-dwarf-file=helper.dwo -o helper.o
+; RUN: %clang %cflags -gdwarf-5 -gsplit-dwarf=split main.o helper.o -o main.exe
+; RUN: llvm-dwp -e main.exe -o main.exe.dwp
+; RUN: llvm-bolt main.exe -o main.exe.bolt --update-debug-sections \
+; RUN: --thread-count=4 --cu-processing-batch-size=4 --dwp=main.exe.dwp 2>&1 | FileCheck %s
+; RUN: llvm-dwarfdump --debug-info -r 0 main.dwo.dwo | FileCheck -check-prefix=BOLT-DWO-DWO-MAIN %s
+; RUN: llvm-dwarfdump --debug-info -r 0 helper.dwo.dwo | FileCheck -check-prefix=BOLT-DWO-DWO-HELPER %s
+
+;; Tests that BOLT correctly handles DWARF5 DWP file as input in a
+;; multi-threaded configuration without triggering ThreadSanitizer (TSan)
+;; data race reports. The binary must be built with TSan instrumentation
+;; (i.e. LLVM_USE_SANITIZER=Thread) for this test to be effective.
+;; REQUIRES: tsan
+
+; CHECK-NOT: ThreadSanitizer: data race
+; CHECK-NOT: ThreadSanitizer: reported
+; CHECK-NOT: WARNING: ThreadSanitizer
+
+; BOLT-DWO-DWO-MAIN: DW_TAG_compile_unit
+; BOLT-DWO-DWO-HELPER: DW_TAG_compile_unit

clang-tools-extra/clang-tidy/fuchsia/StaticallyConstructedObjectsCheck.cpp

Lines changed: 2 additions & 2 deletions
@@ -35,8 +35,8 @@ void StaticallyConstructedObjectsCheck::registerMatchers(MatchFinder *Finder) {
                   hasDescendant(cxxConstructExpr(unless(allOf(
                       // ... unless it is constexpr ...
                       hasDeclaration(cxxConstructorDecl(isConstexpr())),
-                      // ... and is statically initialized.
-                      isConstantInitializer())))))
+                      // ... and is statically initialized or value-dependent.
+                      anyOf(isValueDependent(), isConstantInitializer()))))))
                   .bind("decl")),
       this);
 }

clang-tools-extra/clangd/InlayHints.cpp

Lines changed: 2 additions & 1 deletion
@@ -492,7 +492,8 @@ class InlayHintVisitor : public RecursiveASTVisitor<InlayHintVisitor> {
     // either.
     if (const CXXMethodDecl *Method =
             dyn_cast_or_null<CXXMethodDecl>(Callee.Decl))
-      if (IsFunctor || Method->hasCXXExplicitFunctionObjectParameter())
+      if (IsFunctor || (!E->isTypeDependent() &&
+                        Method->hasCXXExplicitFunctionObjectParameter()))
         Args = Args.drop_front(1);
     processCall(Callee, E->getRParenLoc(), Args);
     return true;
