
Reland "Compute GUIDs once and store in metadata" (#184065) #201849

Open
mtrofin wants to merge 1 commit into main from users/mtrofin/06-05-reland_184065

@mtrofin mtrofin commented Jun 5, 2026

Member

This reverts #201194, thus relanding @orodley's PR #184065 (and #200323):

This allows us to keep GUIDs consistent across compilation phases which may change the name or linkage type.
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
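
To make the idea concrete, here is a minimal C++ sketch of why storing a GUID once lets it survive later renames and linkage changes. It is not LLVM code: `ToyGlobal`, `computeFromName`, `assignGUID`, `getGUID` and `promote` are hypothetical names, `std::hash` stands in for the real hash, and the way an internal symbol's identifier mixes in the source file name is a simplifying assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for a global value; this is not LLVM's GlobalValue.
struct ToyGlobal {
  std::string Name;
  bool IsInternal = false;
  std::string SourceFile;
  // Plays the role of the GUID stored as metadata (assumption for this sketch).
  std::optional<uint64_t> StoredGUID;
};

// Name-derived identifier: depends on the name and, for internal symbols, on
// the source file. std::hash is only a placeholder for the real hash.
uint64_t computeFromName(const ToyGlobal &G) {
  std::string Key = G.IsInternal ? G.SourceFile + ";" + G.Name : G.Name;
  return std::hash<std::string>{}(Key);
}

// Compute the GUID once and remember it; later calls leave it untouched.
void assignGUID(ToyGlobal &G) {
  if (!G.StoredGUID)
    G.StoredGUID = computeFromName(G);
}

// Prefer the stored value; fall back to recomputing from the current name.
uint64_t getGUID(const ToyGlobal &G) {
  return G.StoredGUID ? *G.StoredGUID : computeFromName(G);
}

// Models a promotion: the symbol is renamed and stops being internal.
void promote(ToyGlobal &G, const std::string &Suffix) {
  assert(G.IsInternal && "only internal symbols are promoted in this sketch");
  G.Name += Suffix;
  G.IsInternal = false;
}
```

If `assignGUID` runs before `promote`, `getGUID` returns the same value afterwards. Without the stored value, `getGUID` would be recomputed from the new name and linkage.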

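The CFI issues that triggered the original revert are fixed by #201370, together with resolving the TODOs that PR left in LowerTypeTests.cpp. The Graphite diff between this change's V1 and V2 shows what has been added:

  • the TODOs from #201370 are done
  • in LowerTypeTests.cpp, !guid is passed when creating a new declaration and when converting a definition to a declaration
  • llvm/test/Transforms/LowerTypeTests/export-icall.ll also tests the above def->decl conversion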

Currently, we reassign GUIDs when CFI promotes internal linkage symbols, which runs counter to the goal of the RFC. This is addressed in PR #203171. The reason for splitting the fix can be explained using compiler-rt/test/cfi/icall/wrong-signature-mixed-lto.c. Here, a module with the exact same source path is compiled twice, under different conditional compilation, to produce 2 objects. Each object defines an internal linkage symbol with the same name (this is install_trap_loop_detection from compiler-rt/test/cfi/trap_loop_signal_handler.inc, which is -include-d by both; see how %clang_cfi is defined). The ThinLTO GUID of this symbol will be the same in both objects. Its name won't be, because CFI promotes it and renames it using a hash that is based on the IR Module content (rather than the source path). During thinlink, LTO::addThinLTO will mark each of the 2 exported symbols as prevailing in its corresponding module. But that is done by associating the GUID with the module, so whichever module comes last wins. The other symbol will be marked available externally and its body DCE'd later in the backend. But each module will still refer to its own copy of install_trap_loop_detection, so we end up with a linker error.
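
The collision described above can be sketched as follows. This is a toy model under stated assumptions, not the code in LTO::addThinLTO: `recordPrevailing` and `isPrevailingIn` are hypothetical helpers, and the prevailing-symbol table is modeled as a plain `std::map` keyed by GUID.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using GUID = uint64_t;

// Hypothetical model: for each GUID, remember which module holds the
// prevailing copy. Modules are visited in link order.
std::map<GUID, std::string> recordPrevailing(
    const std::vector<std::pair<std::string, GUID>> &ModulesAndGUIDs) {
  std::map<GUID, std::string> Prevailing;
  for (const auto &[Module, G] : ModulesAndGUIDs) {
    assert(!Module.empty() && "every entry needs a module name");
    // Two modules with the same GUID share one slot: the later one overwrites
    // the earlier one, so only the last module is recorded as prevailing.
    Prevailing[G] = Module;
  }
  return Prevailing;
}

// True if the table says the copy of G in Module is the prevailing one.
bool isPrevailingIn(const std::map<GUID, std::string> &Prevailing, GUID G,
                    const std::string &Module) {
  auto It = Prevailing.find(G);
  return It != Prevailing.end() && It->second == Module;
}
```

With two objects that carry the same GUID, only the second one is recorded as prevailing. In this model the first object's copy would be the one that gets dropped, even though that object still refers to its own, differently named symbol.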

As mentioned, the fix is in PR #203171, and this relanding PR just maintains the existing ThinLTO behavior by rewriting the GUIDs. Since we haven't yet leveraged the GUID mechanics for e.g. simplifying PGO, this aspect of this change is essentially NFC.

Co-authored-by: @orodley

mtrofin commented Jun 5, 2026

Member Author

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch 2 times, most recently from ad94f68 to a1a789e Compare June 5, 2026 15:21

github-actions Bot commented Jun 5, 2026


🐧 Linux x64 Test Results

  • 206509 tests passed
  • 6777 tests skipped

✅ The build succeeded and all tests passed.

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch 4 times, most recently from 8cd8eb5 to f053482 Compare June 9, 2026 02:08

github-actions Bot commented Jun 9, 2026


✅ With the latest revision this PR passed the C/C++ code formatter.

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch from f053482 to 9926393 Compare June 9, 2026 02:10

github-actions Bot commented Jun 9, 2026


🪟 Windows x64 Test Results

  • 138000 tests passed
  • 4866 tests skipped

✅ The build succeeded and all tests passed.

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch 2 times, most recently from 13d3ad0 to f8a1ec9 Compare June 11, 2026 04:58
@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch 2 times, most recently from 854a6f9 to 6dc97cf Compare June 16, 2026 03:17

github-actions Bot commented Jun 16, 2026


✅ With the latest revision this PR passed the LLVM ABI annotation checker.

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch 2 times, most recently from 6f02138 to c7ff207 Compare June 17, 2026 14:54
@mtrofin mtrofin changed the base branch from main to users/mtrofin/06-17-_coro_handle_aliases June 17, 2026 18:14
@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch from c7ff207 to 525046e Compare June 17, 2026 18:15
@mtrofin mtrofin marked this pull request as ready for review June 17, 2026 18:18
@llvmorg-github-actions llvmorg-github-actions Bot added the following labels Jun 17, 2026: lld, backend:X86, clang:codegen (IR generation bugs: mangling, exceptions, etc.), llvm:codegen, lld:ELF, PGO (Profile Guided Optimizations), LTO (Link time optimization, regular/full LTO or ThinLTO)
@llvmorg-github-actions


@llvm/pr-subscribers-clang-codegen

Author: Mircea Trofin (mtrofin)

Changes

This reverts #201194, thus relanding PR #184065 (and #200323).

The CFI issues are fixed by #201370, together with resolving the TODOs that PR left in LowerTypeTests.cpp. The Graphite diff between this change's V1 and V2 shows what has been added:

  • the TODOs from #201370 are done
  • in LowerTypeTests.cpp, passing !guid when creating a new declaration and when converting a definition to a declaration.
  • llvm/test/Transforms/LowerTypeTests/export-icall.ll also tests the above def->decl conversion

Currently, we reassign GUIDs when CFI promotes internal linkage symbols, which runs counter to the goal of the RFC. This is addressed in PR #203171. The reason for splitting the fix can be explained using compiler-rt/test/cfi/icall/wrong-signature-mixed-lto.c. Here, a module with the exact same source path is compiled twice, under different conditional compilation, to produce 2 objects. Each object defines an internal linkage symbol with the same name (this is install_trap_loop_detection from compiler-rt/test/cfi/trap_loop_signal_handler.inc, which is -include-d by both; see how %clang_cfi is defined). The ThinLTO GUID of this symbol will be the same in both objects. Its name won't be, because CFI promotes it and renames it using a hash that is based on the IR Module content (rather than the source path). During thinlink, LTO::addThinLTO will mark each of the 2 exported symbols as prevailing in its corresponding module. But that is done by associating the GUID with the module, so whichever module comes last wins. The other symbol will be marked available externally and its body DCE'd later in the backend. But each module will still refer to its own copy of install_trap_loop_detection, so we end up with a linker error.

As mentioned, the fix is in PR #203171, and this relanding PR just maintains the existing ThinLTO behavior by rewriting the GUIDs. Since we haven't yet leveraged the GUID mechanics for e.g. simplifying PGO, this aspect of this change is essentially NFC.


Patch is 166.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/201849.diff

122 Files Affected:

  • (modified) clang/lib/CodeGen/CGCUDANV.cpp (+5-1)
  • (modified) clang/test/CodeGen/cfi-icall-trap-recover-runtime.c (+12-12)
  • (modified) clang/test/CodeGen/lto-newpm-pipeline.c (+3)
  • (modified) clang/test/CodeGenCXX/cfi-vcall-trap-recover-runtime.cpp (+6-6)
  • (modified) lld/test/ELF/lto/devirt_vcall_vis_export_dynamic.ll (+6-6)
  • (modified) lld/test/ELF/lto/devirt_vcall_vis_public.ll (+3-3)
  • (modified) lld/test/ELF/lto/devirt_vcall_vis_shared_def.ll (+6-6)
  • (modified) llvm/include/llvm/Analysis/CtxProfAnalysis.h (+5-31)
  • (modified) llvm/include/llvm/Bitcode/BitcodeReader.h (+3-1)
  • (modified) llvm/include/llvm/Bitcode/LLVMBitCodes.h (+3)
  • (modified) llvm/include/llvm/IR/GlobalObject.h (+1-1)
  • (modified) llvm/include/llvm/IR/GlobalValue.h (+45-5)
  • (modified) llvm/include/llvm/IR/Module.h (+28-3)
  • (modified) llvm/include/llvm/IR/ModuleSummaryIndex.h (+11-3)
  • (modified) llvm/include/llvm/LTO/LTO.h (+20)
  • (added) llvm/include/llvm/Transforms/Utils/AssignGUID.h (+49)
  • (modified) llvm/lib/Analysis/CtxProfAnalysis.cpp (+3-39)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+5-1)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp (+1)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+61-17)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+49-3)
  • (modified) llvm/lib/CodeGen/GlobalMerge.cpp (+12-8)
  • (modified) llvm/lib/IR/Globals.cpp (+64-2)
  • (modified) llvm/lib/LTO/LTO.cpp (+45-30)
  • (modified) llvm/lib/LTO/LTOBackend.cpp (+8-2)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+7-3)
  • (modified) llvm/lib/Transforms/IPO/ConstantMerge.cpp (+4-4)
  • (modified) llvm/lib/Transforms/IPO/FunctionImport.cpp (+21-9)
  • (modified) llvm/lib/Transforms/IPO/LowerTypeTests.cpp (+9-5)
  • (modified) llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (+6-4)
  • (modified) llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp (+1)
  • (modified) llvm/lib/Transforms/Instrumentation/PGOCtxProfFlattening.cpp (+2-2)
  • (modified) llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp (+1-2)
  • (modified) llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp (+3-4)
  • (added) llvm/lib/Transforms/Utils/AssignGUID.cpp (+41)
  • (modified) llvm/lib/Transforms/Utils/CMakeLists.txt (+1)
  • (modified) llvm/lib/Transforms/Utils/CallPromotionUtils.cpp (+2-3)
  • (modified) llvm/lib/Transforms/Utils/CloneModule.cpp (+5-1)
  • (modified) llvm/lib/Transforms/Utils/FunctionImportUtils.cpp (+2-2)
  • (modified) llvm/lib/Transforms/Utils/InlineFunction.cpp (+2-2)
  • (modified) llvm/test/Assembler/index-value-order.ll (+6-3)
  • (modified) llvm/test/Bitcode/thinlto-alias.ll (+1)
  • (modified) llvm/test/Bitcode/thinlto-function-summary-callgraph-partial-sample-profile-summary.ll (+3-2)
  • (modified) llvm/test/Bitcode/thinlto-function-summary-callgraph-pgo.ll (+1)
  • (modified) llvm/test/Bitcode/thinlto-function-summary-callgraph-profile-summary.ll (+3-2)
  • (modified) llvm/test/Bitcode/thinlto-function-summary-callgraph-sample-profile-summary.ll (+3-2)
  • (modified) llvm/test/Bitcode/thinlto-function-summary-callgraph.ll (+1)
  • (modified) llvm/test/Bitcode/thinlto-function-summary-refgraph.ll (+1)
  • (modified) llvm/test/Bitcode/thinlto-function-summary.ll (+1)
  • (modified) llvm/test/CodeGen/X86/fat-lto-section.ll (+1-1)
  • (modified) llvm/test/LTO/Resolution/X86/not-prevailing-alias.ll (+1-1)
  • (modified) llvm/test/LTO/Resolution/X86/not-prevailing-weak-aliasee.ll (+1-1)
  • (modified) llvm/test/Linker/funcimport2.ll (+1-1)
  • (modified) llvm/test/Other/new-pm-O0-defaults.ll (+2-1)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+2-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+2-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+2-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+1)
  • (modified) llvm/test/ThinLTO/AArch64/aarch64_inline.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/Inputs/cache-typeid-resolutions1.ll (+2-1)
  • (modified) llvm/test/ThinLTO/X86/Inputs/cache-typeid-resolutions2.ll (+4-2)
  • (modified) llvm/test/ThinLTO/X86/Inputs/cache-typeid-resolutions3.ll (+8-4)
  • (modified) llvm/test/ThinLTO/X86/ctor-dtor-alias.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/ctor-dtor-alias2.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/deadstrip.ll (+4-4)
  • (modified) llvm/test/ThinLTO/X86/devirt_function_alias.ll (+3-3)
  • (modified) llvm/test/ThinLTO/X86/devirt_function_alias2.ll (+3-3)
  • (modified) llvm/test/ThinLTO/X86/devirt_pure_virtual_base.ll (+3-3)
  • (modified) llvm/test/ThinLTO/X86/devirt_vcall_vis_public.ll (+3-3)
  • (modified) llvm/test/ThinLTO/X86/distributed_import.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/funcattrs-prop-exported-internal.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/funcattrs-prop-unknown.ll (+3-3)
  • (modified) llvm/test/ThinLTO/X86/funcattrs-prop-weak.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/globals-import.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/hidden-escaped-symbols-alt.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/hidden-escaped-symbols.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/import-ro-constant.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/index-const-prop-alias.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/index-const-prop.ll (+3-3)
  • (modified) llvm/test/ThinLTO/X86/linkonce_resolution_comdat.ll (+8-8)
  • (modified) llvm/test/ThinLTO/X86/memprof-dups.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/memprof_callee_type_mismatch.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/memprof_imported_internal.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/memprof_imported_internal2.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/prevailing_weak_globals_import.ll (+1-1)
  • (modified) llvm/test/ThinLTO/X86/visibility-elf.ll (+6-6)
  • (modified) llvm/test/ThinLTO/X86/visibility-macho.ll (+3-3)
  • (modified) llvm/test/ThinLTO/X86/weak_resolution.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/windows-vftable.ll (+2-2)
  • (modified) llvm/test/ThinLTO/X86/writeonly.ll (+2-2)
  • (added) llvm/test/Transforms/AssignGUID/assign_guid.ll (+18)
  • (modified) llvm/test/Transforms/ConstantMerge/merge-dbg.ll (+1)
  • (modified) llvm/test/Transforms/EmbedBitcode/embed-wpd.ll (+1-1)
  • (modified) llvm/test/Transforms/EmbedBitcode/embed.ll (+7-6)
  • (modified) llvm/test/Transforms/FunctionImport/funcimport-debug-retained-nodes.ll (+2-1)
  • (modified) llvm/test/Transforms/FunctionImport/funcimport.ll (+12-12)
  • (added) llvm/test/Transforms/GlobalMerge/guid.ll (+38)
  • (modified) llvm/test/Transforms/LowerTypeTests/cfi-icall-alias.ll (+1-1)
  • (modified) llvm/test/Transforms/LowerTypeTests/export-icall.ll (+17-10)
  • (modified) llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll (+2-2)
  • (modified) llvm/test/Transforms/PhaseOrdering/speculative-devirt-then-inliner.ll (+1-1)
  • (modified) llvm/test/Transforms/SampleProfile/ctxsplit.ll (+4-4)
  • (modified) llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal-typeid.ll (+1-1)
  • (modified) llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal1.ll (+1-1)
  • (modified) llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll (+3-2)
  • (modified) llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll (+1-1)
  • (modified) llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc.ll (+8-8)
  • (modified) llvm/test/Transforms/ThinLTOBitcodeWriter/split.ll (+2-2)
  • (modified) llvm/test/Transforms/ThinLTOBitcodeWriter/unsplittable.ll (+1-1)
  • (modified) llvm/test/Transforms/WholeProgramDevirt/branch-funnel-profile.ll (+22-22)
  • (modified) llvm/test/Transforms/WholeProgramDevirt/export-single-impl.ll (+2-2)
  • (modified) llvm/test/Transforms/WholeProgramDevirt/export-vcp.ll (+3-3)
  • (modified) llvm/test/tools/gold/X86/devirt_vcall_vis_export_dynamic.ll (+2-2)
  • (modified) llvm/test/tools/gold/X86/devirt_vcall_vis_public.ll (+1-1)
  • (modified) llvm/test/tools/gold/X86/devirt_vcall_vis_shared_def.ll (+2-2)
  • (modified) llvm/test/tools/gold/X86/thinlto_weak_library.ll (+1-1)
  • (modified) llvm/test/tools/gold/X86/thinlto_weak_resolution.ll (+2-2)
  • (modified) llvm/test/tools/gold/X86/v1.16/devirt_vcall_vis_export_dynamic.ll (+1-1)
  • (modified) llvm/tools/llvm-link/llvm-link.cpp (+2-1)
  • (modified) llvm/tools/opt/NewPMDriver.cpp (+8)
  • (modified) llvm/tools/opt/optdriver.cpp (+7)
diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 17b1963684428..23856c98d36ab 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -25,6 +25,7 @@
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/ReplaceConstant.h"
 #include "llvm/ProfileData/InstrProf.h"
 #include "llvm/Support/Format.h"
@@ -1044,7 +1045,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
     // Generate a unique module ID.
     SmallString<64> ModuleID;
     llvm::raw_svector_ostream OS(ModuleID);
-    OS << ModuleIDPrefix << llvm::format("%" PRIx64, FatbinWrapper->getGUID());
+    OS << ModuleIDPrefix
+       << llvm::format("%" PRIx64,
+                       llvm::GlobalValue::getGUIDAssumingExternalLinkage(
+                           FatbinWrapper->getName()));
     llvm::Constant *ModuleIDConstant = makeConstantArray(
         std::string(ModuleID), "", ModuleIDSectionName, 32, /*AddNull=*/true);
 
diff --git a/clang/test/CodeGen/cfi-icall-trap-recover-runtime.c b/clang/test/CodeGen/cfi-icall-trap-recover-runtime.c
index 2c44842f9d28e..5717fc66488af 100644
--- a/clang/test/CodeGen/cfi-icall-trap-recover-runtime.c
+++ b/clang/test/CodeGen/cfi-icall-trap-recover-runtime.c
@@ -15,32 +15,32 @@
 
 
 // TRAP-LABEL: define hidden void @f(
-// TRAP-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]] {
+// TRAP-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]]
 // TRAP-NEXT:  [[ENTRY:.*:]]
 // TRAP-NEXT:    ret void
 //
 // ABORT-LABEL: define hidden void @f(
-// ABORT-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]] {
+// ABORT-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]]
 // ABORT-NEXT:  [[ENTRY:.*:]]
 // ABORT-NEXT:    ret void
 //
 // RECOVER-LABEL: define hidden void @f(
-// RECOVER-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]] {
+// RECOVER-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]]
 // RECOVER-NEXT:  [[ENTRY:.*:]]
 // RECOVER-NEXT:    ret void
 //
 // ABORT_MIN-LABEL: define hidden void @f(
-// ABORT_MIN-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]] {
+// ABORT_MIN-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]]
 // ABORT_MIN-NEXT:  [[ENTRY:.*:]]
 // ABORT_MIN-NEXT:    ret void
 //
 // RECOVER_MIN-LABEL: define hidden void @f(
-// RECOVER_MIN-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]] {
+// RECOVER_MIN-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]]
 // RECOVER_MIN-NEXT:  [[ENTRY:.*:]]
 // RECOVER_MIN-NEXT:    ret void
 //
 // PRESERVE_MIN-LABEL: define hidden void @f(
-// PRESERVE_MIN-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]] {
+// PRESERVE_MIN-SAME: ) #[[ATTR0:[0-9]+]] !type [[META6:![0-9]+]] !type [[META7:![0-9]+]]
 // PRESERVE_MIN-NEXT:  [[ENTRY:.*:]]
 // PRESERVE_MIN-NEXT:    ret void
 //
@@ -50,7 +50,7 @@ void f() {
 void xf();
 
 // TRAP-LABEL: define hidden void @g(
-// TRAP-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]] {
+// TRAP-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]]
 // TRAP-NEXT:  [[ENTRY:.*:]]
 // TRAP-NEXT:    [[B_ADDR:%.*]] = alloca i32, align 4
 // TRAP-NEXT:    [[FP:%.*]] = alloca ptr, align 8
@@ -71,7 +71,7 @@ void xf();
 // TRAP-NEXT:    ret void
 //
 // ABORT-LABEL: define hidden void @g(
-// ABORT-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]] {
+// ABORT-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]]
 // ABORT-NEXT:  [[ENTRY:.*:]]
 // ABORT-NEXT:    [[B_ADDR:%.*]] = alloca i32, align 4
 // ABORT-NEXT:    [[FP:%.*]] = alloca ptr, align 8
@@ -93,7 +93,7 @@ void xf();
 // ABORT-NEXT:    ret void
 //
 // RECOVER-LABEL: define hidden void @g(
-// RECOVER-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]] {
+// RECOVER-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]]
 // RECOVER-NEXT:  [[ENTRY:.*:]]
 // RECOVER-NEXT:    [[B_ADDR:%.*]] = alloca i32, align 4
 // RECOVER-NEXT:    [[FP:%.*]] = alloca ptr, align 8
@@ -115,7 +115,7 @@ void xf();
 // RECOVER-NEXT:    ret void
 //
 // ABORT_MIN-LABEL: define hidden void @g(
-// ABORT_MIN-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]] {
+// ABORT_MIN-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]]
 // ABORT_MIN-NEXT:  [[ENTRY:.*:]]
 // ABORT_MIN-NEXT:    [[B_ADDR:%.*]] = alloca i32, align 4
 // ABORT_MIN-NEXT:    [[FP:%.*]] = alloca ptr, align 8
@@ -136,7 +136,7 @@ void xf();
 // ABORT_MIN-NEXT:    ret void
 //
 // RECOVER_MIN-LABEL: define hidden void @g(
-// RECOVER_MIN-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]] {
+// RECOVER_MIN-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]]
 // RECOVER_MIN-NEXT:  [[ENTRY:.*:]]
 // RECOVER_MIN-NEXT:    [[B_ADDR:%.*]] = alloca i32, align 4
 // RECOVER_MIN-NEXT:    [[FP:%.*]] = alloca ptr, align 8
@@ -157,7 +157,7 @@ void xf();
 // RECOVER_MIN-NEXT:    ret void
 //
 // PRESERVE_MIN-LABEL: define hidden void @g(
-// PRESERVE_MIN-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]] {
+// PRESERVE_MIN-SAME: i32 noundef [[B:%.*]]) #[[ATTR0]] !type [[META8:![0-9]+]] !type [[META9:![0-9]+]]
 // PRESERVE_MIN-NEXT:  [[ENTRY:.*:]]
 // PRESERVE_MIN-NEXT:    [[B_ADDR:%.*]] = alloca i32, align 4
 // PRESERVE_MIN-NEXT:    [[FP:%.*]] = alloca ptr, align 8
diff --git a/clang/test/CodeGen/lto-newpm-pipeline.c b/clang/test/CodeGen/lto-newpm-pipeline.c
index ea9784a76f923..c8ee5e949ae87 100644
--- a/clang/test/CodeGen/lto-newpm-pipeline.c
+++ b/clang/test/CodeGen/lto-newpm-pipeline.c
@@ -34,6 +34,7 @@
 // CHECK-FULL-O0-NEXT: Running pass: CoroConditionalWrapper
 // CHECK-FULL-O0-NEXT: Running pass: CanonicalizeAliasesPass
 // CHECK-FULL-O0-NEXT: Running pass: NameAnonGlobalPass
+// CHECK-FULL-O0-NEXT: Running pass: AssignGUIDPass
 // CHECK-FULL-O0-NEXT: Running pass: AnnotationRemarksPass
 // CHECK-FULL-O0-NEXT: Running analysis: TargetLibraryAnalysis
 // CHECK-FULL-O0-NEXT: Running pass: VerifierPass
@@ -48,6 +49,7 @@
 // CHECK-THIN-O0-NEXT: Running pass: CoroConditionalWrapper
 // CHECK-THIN-O0-NEXT: Running pass: CanonicalizeAliasesPass
 // CHECK-THIN-O0-NEXT: Running pass: NameAnonGlobalPass
+// CHECK-THIN-O0-NEXT: Running pass: AssignGUIDPass
 // CHECK-THIN-O0-NEXT: Running pass: AnnotationRemarksPass
 // CHECK-THIN-O0-NEXT: Running analysis: TargetLibraryAnalysis
 // CHECK-THIN-O0-NEXT: Running pass: VerifierPass
@@ -64,6 +66,7 @@
 // CHECK-THIN-OPTIMIZED-NOT: Running pass: LoopVectorizePass
 // CHECK-THIN-OPTIMIZED: Running pass: CanonicalizeAliasesPass
 // CHECK-THIN-OPTIMIZED: Running pass: NameAnonGlobalPass
+// CHECK-THIN-OPTIMIZED: Running pass: AssignGUIDPass
 // CHECK-THIN-OPTIMIZED: Running pass: ThinLTOBitcodeWriterPass
 
 void Foo(void) {}
diff --git a/clang/test/CodeGenCXX/cfi-vcall-trap-recover-runtime.cpp b/clang/test/CodeGenCXX/cfi-vcall-trap-recover-runtime.cpp
index 2451d31e9a489..a1e1563a4f38d 100644
--- a/clang/test/CodeGenCXX/cfi-vcall-trap-recover-runtime.cpp
+++ b/clang/test/CodeGenCXX/cfi-vcall-trap-recover-runtime.cpp
@@ -19,7 +19,7 @@ struct S1 {
 };
 
 // TRAP-LABEL: define hidden void @_Z3s1fP2S1(
-// TRAP-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]] {
+// TRAP-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]]
 // TRAP-NEXT:  [[ENTRY:.*:]]
 // TRAP-NEXT:    [[S1_ADDR:%.*]] = alloca ptr, align 8
 // TRAP-NEXT:    store ptr [[S1]], ptr [[S1_ADDR]], align 8
@@ -37,7 +37,7 @@ struct S1 {
 // TRAP-NEXT:    ret void
 //
 // ABORT-LABEL: define hidden void @_Z3s1fP2S1(
-// ABORT-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]] {
+// ABORT-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]]
 // ABORT-NEXT:  [[ENTRY:.*:]]
 // ABORT-NEXT:    [[S1_ADDR:%.*]] = alloca ptr, align 8
 // ABORT-NEXT:    store ptr [[S1]], ptr [[S1_ADDR]], align 8
@@ -58,7 +58,7 @@ struct S1 {
 // ABORT-NEXT:    ret void
 //
 // RECOVER-LABEL: define hidden void @_Z3s1fP2S1(
-// RECOVER-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]] {
+// RECOVER-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]]
 // RECOVER-NEXT:  [[ENTRY:.*:]]
 // RECOVER-NEXT:    [[S1_ADDR:%.*]] = alloca ptr, align 8
 // RECOVER-NEXT:    store ptr [[S1]], ptr [[S1_ADDR]], align 8
@@ -79,7 +79,7 @@ struct S1 {
 // RECOVER-NEXT:    ret void
 //
 // ABORT_MIN-LABEL: define hidden void @_Z3s1fP2S1(
-// ABORT_MIN-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]] {
+// ABORT_MIN-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]]
 // ABORT_MIN-NEXT:  [[ENTRY:.*:]]
 // ABORT_MIN-NEXT:    [[S1_ADDR:%.*]] = alloca ptr, align 8
 // ABORT_MIN-NEXT:    store ptr [[S1]], ptr [[S1_ADDR]], align 8
@@ -98,7 +98,7 @@ struct S1 {
 // ABORT_MIN-NEXT:    ret void
 //
 // RECOVER_MIN-LABEL: define hidden void @_Z3s1fP2S1(
-// RECOVER_MIN-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]] {
+// RECOVER_MIN-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]]
 // RECOVER_MIN-NEXT:  [[ENTRY:.*:]]
 // RECOVER_MIN-NEXT:    [[S1_ADDR:%.*]] = alloca ptr, align 8
 // RECOVER_MIN-NEXT:    store ptr [[S1]], ptr [[S1_ADDR]], align 8
@@ -117,7 +117,7 @@ struct S1 {
 // RECOVER_MIN-NEXT:    ret void
 //
 // PRESERVE_MIN-LABEL: define hidden void @_Z3s1fP2S1(
-// PRESERVE_MIN-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]] {
+// PRESERVE_MIN-SAME: ptr noundef [[S1:%.*]]) #[[ATTR0:[0-9]+]]
 // PRESERVE_MIN-NEXT:  [[ENTRY:.*:]]
 // PRESERVE_MIN-NEXT:    [[S1_ADDR:%.*]] = alloca ptr, align 8
 // PRESERVE_MIN-NEXT:    store ptr [[S1]], ptr [[S1_ADDR]], align 8
diff --git a/lld/test/ELF/lto/devirt_vcall_vis_export_dynamic.ll b/lld/test/ELF/lto/devirt_vcall_vis_export_dynamic.ll
index bcb92a1beb17b..9b9c7891a6da6 100644
--- a/lld/test/ELF/lto/devirt_vcall_vis_export_dynamic.ll
+++ b/lld/test/ELF/lto/devirt_vcall_vis_export_dynamic.ll
@@ -5,7 +5,7 @@
 
 ;; Index based WPD
 ;; Generate unsplit module with summary for ThinLTO index-based WPD.
-; RUN: opt --thinlto-bc -o %t2.o %s
+; RUN: opt --passes=assign-guid --thinlto-bc -o %t2.o %s
 ; RUN: ld.lld %t2.o -o %t3 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t2.o.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
@@ -16,13 +16,13 @@
 
 ;; Hybrid WPD
 ;; Generate split module with summary for hybrid Thin/Regular LTO WPD.
-; RUN: opt --thinlto-bc --thinlto-split-lto-unit -o %t.o %s
+; RUN: opt --passes=assign-guid --thinlto-bc --thinlto-split-lto-unit -o %t.o %s
 ; RUN: ld.lld %t.o -o %t3 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t.o.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
 
 ;; Regular LTO WPD
-; RUN: opt -o %t4.o %s
+; RUN: opt --passes=assign-guid -o %t4.o %s
 ; RUN: ld.lld %t4.o -o %t3 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t3.0.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
@@ -107,19 +107,19 @@
 ;; preemption, even without any options.
 
 ;; Index based WPD
-; RUN: opt -relocation-model=pic --thinlto-bc -o %t5.o %s
+; RUN: opt --passes=assign-guid -relocation-model=pic --thinlto-bc -o %t5.o %s
 ; RUN: ld.lld %t5.o -o %t5.so -shared
 ; RUN: ld.lld %t5.o %t5.so -o %t5 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck /dev/null --implicit-check-not single-impl --allow-empty
 
 ;; Hybrid WPD
-; RUN: opt -relocation-model=pic --thinlto-bc --thinlto-split-lto-unit -o %t5.o %s
+; RUN: opt --passes=assign-guid -relocation-model=pic --thinlto-bc --thinlto-split-lto-unit -o %t5.o %s
 ; RUN: ld.lld %t5.o -o %t5.so -shared
 ; RUN: ld.lld %t5.o %t5.so -o %t5 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck /dev/null --implicit-check-not single-impl --allow-empty
 
 ;; Regular LTO WPD
-; RUN: opt -relocation-model=pic -o %t5.o %s
+; RUN: opt --passes=assign-guid -relocation-model=pic -o %t5.o %s
 ; RUN: ld.lld %t5.o -o %t5.so -shared
 ; RUN: ld.lld %t5.o %t5.so -o %t5 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck /dev/null --implicit-check-not single-impl --allow-empty
diff --git a/lld/test/ELF/lto/devirt_vcall_vis_public.ll b/lld/test/ELF/lto/devirt_vcall_vis_public.ll
index a827fea465fd7..0030e5804af81 100644
--- a/lld/test/ELF/lto/devirt_vcall_vis_public.ll
+++ b/lld/test/ELF/lto/devirt_vcall_vis_public.ll
@@ -3,20 +3,20 @@
 
 ;; Index based WPD
 ;; Generate unsplit module with summary for ThinLTO index-based WPD.
-; RUN: opt --thinlto-bc -o %t2.o %s
+; RUN: opt --passes=assign-guid --thinlto-bc -o %t2.o %s
 ; RUN: ld.lld %t2.o -o %t3 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t2.o.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
 
 ;; Hybrid WPD
 ;; Generate split module with summary for hybrid Thin/Regular LTO WPD.
-; RUN: opt --thinlto-bc --thinlto-split-lto-unit -o %t.o %s
+; RUN: opt --passes=assign-guid --thinlto-bc --thinlto-split-lto-unit -o %t.o %s
 ; RUN: ld.lld %t.o -o %t3 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t.o.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
 
 ;; Regular LTO WPD
-; RUN: opt -o %t4.o %s
+; RUN: opt --passes=assign-guid -o %t4.o %s
 ; RUN: ld.lld %t4.o -o %t3 -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t3.0.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
diff --git a/lld/test/ELF/lto/devirt_vcall_vis_shared_def.ll b/lld/test/ELF/lto/devirt_vcall_vis_shared_def.ll
index a61e290bb0eb1..b77dde97a2c05 100644
--- a/lld/test/ELF/lto/devirt_vcall_vis_shared_def.ll
+++ b/lld/test/ELF/lto/devirt_vcall_vis_shared_def.ll
@@ -6,23 +6,23 @@
 
 ;; Index based WPD
 ;; Generate unsplit module with summary for ThinLTO index-based WPD.
-; RUN: opt --thinlto-bc -o %t1a.o %s
-; RUN: opt --thinlto-bc -o %t2a.o %S/Inputs/devirt_vcall_vis_shared_def.ll
+; RUN: opt --passes=assign-guid --thinlto-bc -o %t1a.o %s
+; RUN: opt --passes=assign-guid --thinlto-bc -o %t2a.o %S/Inputs/devirt_vcall_vis_shared_def.ll
 ; RUN: ld.lld %t1a.o %t2a.o -o %t3a -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t1a.o.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
 
 ;; Hybrid WPD
 ;; Generate split module with summary for hybrid Thin/Regular LTO WPD.
-; RUN: opt --thinlto-bc --thinlto-split-lto-unit -o %t1b.o %s
-; RUN: opt --thinlto-bc --thinlto-split-lto-unit -o %t2b.o %S/Inputs/devirt_vcall_vis_shared_def.ll
+; RUN: opt --passes=assign-guid --thinlto-bc --thinlto-split-lto-unit -o %t1b.o %s
+; RUN: opt --passes=assign-guid --thinlto-bc --thinlto-split-lto-unit -o %t2b.o %S/Inputs/devirt_vcall_vis_shared_def.ll
 ; RUN: ld.lld %t1b.o %t2b.o -o %t3b -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t1b.o.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
 
 ;; Regular LTO WPD
-; RUN: opt -o %t1c.o %s
-; RUN: opt -o %t2c.o %S/Inputs/devirt_vcall_vis_shared_def.ll
+; RUN: opt --passes=assign-guid -o %t1c.o %s
+; RUN: opt --passes=assign-guid -o %t2c.o %S/Inputs/devirt_vcall_vis_shared_def.ll
 ; RUN: ld.lld %t1c.o %t2c.o -o %t3c -save-temps --lto-whole-program-visibility \
 ; RUN:   -mllvm -pass-remarks=. 2>&1 | FileCheck %s --check-prefix=REMARK
 ; RUN: llvm-dis %t3c.0.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR
diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 8260b95026ad2..d2a1c07bd58e4 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -47,9 +47,6 @@ class PGOContextualProfile {
   // we'll need when we maintain the profiles during IPO transformations.
   std::map<GlobalValue::GUID, FunctionInfo> FuncInfo;
 
-  /// Get the GUID of this Function if it's defined in this module.
-  LLVM_ABI GlobalValue::GUID getDefinedFunctionGUID(const Function &F) const;
-
   // This is meant to be constructed from CtxProfAnalysis, which will also set
   // its state piecemeal.
   PGOContextualProfile() = default;
@@ -68,9 +65,7 @@ class PGOContextualProfile {
 
   LLVM_ABI bool isInSpecializedModule() const;
 
-  bool isFunctionKnown(const Function &F) const {
-    return getDefinedFunctionGUID(F) != 0;
-  }
+  bool isFunctionKnown(const Function &F) const { return !F.isDeclaration(); }
 
   StringRef getFunctionName(GlobalValue::GUID GUID) const {
     auto It = FuncInfo.find(GUID);
@@ -81,22 +76,22 @@ class PGOContextualProfile {
 
   uint32_t getNumCounters(const Function &F) const {
     assert(isFunctionKnown(F));
-    return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCounterIndex;
+    return FuncInfo.find(F.getGUID())->second.NextCounterIndex;
   }
 
   uint32_t getNumCallsites(const Function &F) const {
     assert(isFunctionKnown(F));
-    return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex;
+    return FuncInfo.find(F.getGUID())->second.NextCallsiteIndex;
   }
 
   uint32_t allocateNextCounterIndex(const Function &F) {
     assert(isFunctionKnown(F));
-    return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCounterIndex++;
+    return FuncInfo.find(F.getGUID())->second.NextCounterIndex++;
   }
 
   uint32_t allocateNextCallsiteIndex(const Function &F) {
     assert(isFunctionKnown(F));
-    return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++;
+    return FuncInfo.find(F.getGUID())->second.NextCallsiteIndex++;
   }
 
   using ConstVisitor = function_ref<void(const PGOCtxProfContext &)>;
@@ -187,26 +182,5 @@ class ProfileAnnotator {
   LLVM_ABI ~ProfileAnnotator();
 };
 
-/// Assign a GUID to functions as metadata. GUID calculation takes linkage into
-/// account, which may change especially through and after thinlto. By
-/// pre-computing and assigning as metadata, this mechanism is resilient to such
-/// changes (as well as name changes e.g. suffix ".llvm." additions).
-
-// FIXME(mtrofin): we can generalize this mechanism to calculate a GUID early in
-// the pass pipeline, associate it with any Global Value, and then use it for
-// PGO and ThinLTO.
-// At that point, this should be moved elsewhere.
-class AssignGUIDPass : public OptionalPassInfoMixin<AssignGUIDPass> {
-public:
-  explicit AssignGUIDPass() = default;
-
-  /// Assign a GUID *if* one is not already assign, as a function metadata named
-  /// `GUIDMetadataName`.
-  LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
-  LLVM_ABI static const char *GUIDMetadataName;
-  // This should become GlobalValue::getGUID
-  LLVM_ABI static uint64_t getGUID(const Function &F);
-};
-
 } // namespace llvm
 #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H
diff --git a/llvm/include/llvm/Bitcode/BitcodeReader.h b/llvm/include/llvm/Bitcode/BitcodeReader.h
index 772ca82019278..b7cab51857f0b 100644
--- a/llvm/include/llvm/Bitcode/BitcodeReader.h
+++ b/llvm/include/llvm/Bitcode/BitcodeReader.h
@@ -17,6 +17,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Bitstream/BitCodeEnums.h"
 #include "llvm/IR/GlobalValue.h"
+#include "llvm/IR/ModuleSummaryIndex.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/Endian.h"
 #include "llvm/Support/Error.h"
@@ -165,7 +166,8 @@ struct ParserCallbacks {
     /// into CombinedIndex.
     LLVM_ABI Error
     readSummary(ModuleSummaryIndex &CombinedIndex, StringRef ModulePath,
-                std::function<bool(GlobalValue::GUID)> IsPrevailing = nullptr);
+                std::function<bool(StringRef)> IsPrevailing = nullptr,
+                std::function<void(ValueInfo)> OnValueInfo = nullptr);
   };
 
   struct BitcodeFileContents {
diff --git a/llvm/include/llvm/Bitcode/LLVMBi...
[truncated]

@efriedma-quic

Contributor

Please fix the title of the PR: it should explain what the change is doing for anyone who doesn't know what "#184065" is off the top of their head.

@mtrofin mtrofin requested a review from orodley June 17, 2026 19:33
@mtrofin mtrofin changed the title from "Reland #184065" to "Reland #184065 : Compute GUIDs once and store in metadata" Jun 17, 2026
@mtrofin mtrofin changed the title from "Reland #184065 : Compute GUIDs once and store in metadata" to "Reland #184065: Compute GUIDs once and store in metadata" Jun 17, 2026
@mtrofin mtrofin changed the title from "Reland #184065: Compute GUIDs once and store in metadata" to "Reland "Compute GUIDs once and store in metadata" (#184065)" Jun 17, 2026
@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch from 525046e to fa7961a Compare June 17, 2026 21:36
@mtrofin mtrofin force-pushed the users/mtrofin/06-17-_coro_handle_aliases branch from 7834ddf to 8cce117 Compare June 17, 2026 21:36
@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch from fa7961a to 1817d11 Compare June 18, 2026 04:14

mtrofin commented Jun 18, 2026

Member Author

@pektezol - please see my notes about removing the test introduced in #194383. Does it make sense?

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch from 1817d11 to 2829ab3 Compare June 18, 2026 05:19
@pektezol

Contributor

> @pektezol - please see my notes about removing the test introduced in #194383. Does it make sense?

@mtrofin LGTM

@nikic

nikic commented Jun 18, 2026

Contributor

Does this also address the compile-time regressions from the previous implementation?

}

if (!ShouldCloneDefinition(&I)) {
CopyMD();

Contributor

Doesn't this line create a blowup of unnecessary copies?

Member Author

good catch - fixed. Thanks!

Base automatically changed from users/mtrofin/06-17-_coro_handle_aliases to main June 18, 2026 14:34

mtrofin commented Jun 18, 2026

Member Author

I looked more into it but didn't update the other thread.

IIRC the remaining surprise was the clang build regressing by 0.15%, because for the single-TU compilations, the regression was attributable to modules with lots of global values but not so much IR, where the bitcode serialization of the extra GUID table would start showing up.

I'm actually not able to repro the clang build regression. I figured out in the meantime that the "instructions" link in the clang-build part of the report gives me a per-module compilation change, and I found a module (tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ModuleLinker.cpp.o) for which we regress by about 30% (!) - but when I repro locally and run perf stat -e instructions -r 10 <clang>, I see maybe a 0.1% change.

What's the cmake you use for building the host clang, and what's the cmake for the build target?

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch 2 times, most recently from a6fdde1 to 4da2c32 Compare June 23, 2026 01:23

mtrofin commented Jun 23, 2026

Member Author

Turns out the clang build regression was in large part due to some #includes in the change that weren't necessary. The 30% regression when compiling ModuleLinker follows the source rather than the compiler: it goes away or reappears depending on whether the old or the new source is being compiled, whichever compiler (old or new, i.e. pre/post this change) does the compiling.

Fixed and re-ran the compile time tracker, the largest clang-build regression is now 1%.

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch from 4da2c32 to 4bf5379 Compare June 23, 2026 01:28
@boomanaiden154

Contributor

The stage1-ReleaseThinLTO results still don't look great and it seems like the geomean for the clang build is about the same?

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch from 4bf5379 to 1cbfe8b Compare June 23, 2026 02:06

mtrofin commented Jun 23, 2026

Member Author

> The stage1-ReleaseThinLTO results still don't look great and it seems like the geomean for the clang build is about the same?

Note that some compile time performance effect in thinlto builds should be expected: we serialize/deserialize more. I think the main concern was "is there something else".

re. stage1 - yes, see my earlier response ("for the single-TU compilations, the regression was attributable to modules with lots of global values but not so much IR, where the bitcode serialization of the extra GUID table would start showing up.")

Re. clang-build, there's little change in the overall build timing (I don't think it's a geomean) because the individual compilation outliers like ModuleLinker were few, and their negative contribution was likely lost, before, in the overall parallel compilation part of the build. The rest is very likely due to the same cause (more serialization) seen in the benchmark builds, as at most we see compilation regressions within the same relative ranges for individual modules.

@nikic

nikic commented Jun 23, 2026

Contributor

To clarify, the regression I was concerned about is the one on bin/clang-23, i.e. the one during thin linking, not the one on individual files (pre-link).

mtrofin commented Jun 23, 2026

Member Author

Ah, ok. About that one: perf stat -e instructions -r 10 actually shows a ~6% improvement with the new change; when I look at cycles, I see a 1% improvement. Wallclock average showed the same. But the wallclock measurements are actually pretty noisy - as much as 1.31% over the 10 reps.

Looking at the profile itself, (perf record -e instructions):

Before:

+   13.23%    13.23%  ld.lld          lld                   [.] llvm::MD5::body(llvm::ArrayRef<unsigne
+   12.83%    12.82%  ld.lld          lld                   [.] llvm::SimpleBitstreamCursor::Read(unsi
+   10.19%    10.19%  ld.lld          lld                   [.] llvm::SimpleBitstreamCursor::ReadVBR64
+    6.54%     0.00%  ld.lld          [unknown]             [.] 0000000000000000
+    4.64%     4.64%  ld.lld          lld                   [.] llvm::BitstreamCursor::readRecord(unsi
+    2.11%     2.11%  ld.lld          lld                   [.] llvm::xxh3_64bits(unsigned char const*

After:

+   15.02%    15.02%  ld.lld          lld                   [.] llvm::SimpleBitstreamCursor::Read(unsi
+   10.64%    10.64%  ld.lld          lld                   [.] llvm::SimpleBitstreamCursor::ReadVBR64
+    8.56%     0.00%  ld.lld          [unknown]             [.] 0000000000000000
+    6.48%     6.48%  ld.lld          lld                   [.] llvm::BitstreamCursor::readRecord(unsi
+    3.19%     3.19%  ld.lld          lld                   [.] llvm::MD5::body(llvm::ArrayRef<unsigne
+    2.96%     2.96%  ld.lld          lld                   [.] llvm::xxh3_64bits(unsigned char const*

This lines up with the "we serialize more" explanation (and so we deserialize more during linking).

Not sure what to make of the perf stat results difference. Perhaps cold runs -> cold caches -> patch is deserialization heavy -> deserialization (of the GUIDs) costs more than MD5 hashing, but then (hot caches) "the tables flip".
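
To make the shift between the two profiles concrete (less time in llvm::MD5::body, more in bitstream reading), here is a minimal, dependency-free sketch of the two lookup strategies. Everything in it is hypothetical: ToyGlobal, assignGUID, recomputedGUID, storedGUID and promote are illustration-only names, the ".llvm.123" suffix is made up, and FNV-1a stands in for the MD5-based name hash so that no LLVM headers are needed. It is not how the patch is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Stand-in for the MD5-based name hash. FNV-1a is used only to keep the
// sketch dependency-free; it is NOT the hash LLVM uses.
inline std::uint64_t hashName(std::string_view Name) {
  std::uint64_t H = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
  for (unsigned char C : Name) {
    H ^= C;
    H *= 1099511628211ULL; // FNV-1a 64-bit prime
  }
  return H;
}

// Hypothetical model of a global value: a name plus an optional GUID that is
// assigned once and then travels with the value (as metadata would).
struct ToyGlobal {
  std::string Name;
  std::optional<std::uint64_t> StoredGUID;
};

// Assign a GUID only if one is not already present.
inline void assignGUID(ToyGlobal &G) {
  if (!G.StoredGUID)
    G.StoredGUID = hashName(G.Name);
}

// Recompute-on-demand lookup: rehash the current name on every query.
inline std::uint64_t recomputedGUID(const ToyGlobal &G) {
  return hashName(G.Name);
}

// Stored lookup: no hashing, just read back the value assigned earlier.
inline std::uint64_t storedGUID(const ToyGlobal &G) {
  assert(G.StoredGUID && "GUID must be assigned before it is queried");
  return *G.StoredGUID;
}

// Simulates a promotion-style rename (the suffix is invented for this sketch).
inline void promote(ToyGlobal &G) { G.Name += ".llvm.123"; }
```

In this model, calling assignGUID before promote and again afterwards leaves storedGUID unchanged, while recomputedGUID follows the new name. The stored path also does no hashing at lookup time, which is the kind of trade the profiles suggest: hashing work replaced by reading stored values.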

@teresajohnson teresajohnson left a comment

Contributor

lgtm

@mtrofin mtrofin force-pushed the users/mtrofin/06-05-reland_184065 branch from 1cbfe8b to 55df97c Compare June 30, 2026 01:29

Labels

backend:X86, clang:codegen (IR generation bugs: mangling, exceptions, etc.), lld:ELF, lld, llvm:analysis (Includes value tracking, cost tables and constant folding), llvm:codegen, llvm:ir, llvm:transforms, LTO (Link time optimization: regular/full LTO or ThinLTO), PGO (Profile Guided Optimizations)

7 participants