Commit 9591558

brandonros and codex committed
chore(llvm19): close out Layer 3 pre-smoke work
Finalize the Layer 3 plan, add env-driven final-module and LLVM IR capture hooks to vecadd, and validate the harness locally so the next phase can move straight to CUDA 12.9+ smoke testing. Co-authored-by: OpenAI Codex <codex@openai.com>
1 parent 3bd0c43 commit 9591558

2 files changed

Lines changed: 243 additions & 63 deletions

LLVM19_PLAN.md

Lines changed: 226 additions & 59 deletions
@@ -557,7 +557,7 @@ Each "Layer" ends in a known-good state with a single git commit. Within a layer
557557
- `src/llvm.rs` is either unchanged except for a few high-value wrappers, or changed in a way that clearly reduces Rust-side LLVM-version leakage.
558558
- No speculative wrappers remain in the diff "just in case".
559559
- Both feature settings still pass `cargo check`.
560-
- The next work item after Layer 2 is still Layer 3 file-by-file cfg surgery or, if no Rust-side wrapper actually proved necessary, a direct move into the smallest Layer 3 callsite work.
560+
- The next work item after Layer 2 is Layer 3 pre-smoke reconciliation and failure-driven triage, not a speculative file-by-file cfg sweep.
561561

562562
**Commit strategy for Layer 2:**
563563
- Prefer one small commit per sublayer (`2b`, `2c`, `2d`) only when there is real code to land.
@@ -567,85 +567,252 @@ Each "Layer" ends in a known-good state with a single git commit. Within a layer
567567

568568
---
569569

570-
### Layer 3 — Rust-side cfg surgery, file by file
570+
### Layer 3 — Pre-Smoke Reconciliation And Edit Map
571571

572-
**Goal:** All Rust source files compile cleanly under both feature settings. Behavioral correctness on the LLVM 19 path is NOT guaranteed yet — that's Layers 5-6.
572+
**Status (2026-04-14):** Complete.
573+
- `CARGO_TARGET_DIR=/tmp/rust-cuda-layer3-baseline-v7 cargo build -p rustc_codegen_nvvm --no-default-features` passes.
574+
- `CARGO_TARGET_DIR=/tmp/rust-cuda-layer3-baseline-v19 LLVM_CONFIG_19=/usr/bin/llvm-config-19 cargo build -p rustc_codegen_nvvm` passes.
575+
- `CARGO_TARGET_DIR=/tmp/rust-cuda-layer3-vecadd-v19 LLVM_CONFIG_19=/usr/bin/llvm-config-19 cargo build -p vecadd` passes.
576+
- `cuda_builder` definitely invokes the current backend via `-Zcodegen-backend=<path to rustc_codegen_nvvm>`.
577+
- The generated PTX on this host still reports `Based on NVVM 7.0.1`, which means this CUDA 12.8 machine can prove compile success but cannot prove the intended CUDA 12.9+ LLVM-19-bitcode acceptance story.
578+
- `examples/vecadd/build.rs` now has env-driven capture hooks for the first smoke:
579+
- `RUST_CUDA_DUMP_FINAL_MODULE=1`
580+
- `RUST_CUDA_EMIT_LLVM_IR=1`
581+
- Those hooks were locally validated with:
582+
- `CARGO_TARGET_DIR=/tmp/rust-cuda-layer3-harness-v19 LLVM_CONFIG_19=/usr/bin/llvm-config-19 RUST_CUDA_DUMP_FINAL_MODULE=1 cargo build -p vecadd`
583+
- `CARGO_TARGET_DIR=/tmp/rust-cuda-layer3-harness-v19-ir LLVM_CONFIG_19=/usr/bin/llvm-config-19 RUST_CUDA_DUMP_FINAL_MODULE=1 RUST_CUDA_EMIT_LLVM_IR=1 cargo build -p vecadd`
584+
- Therefore the old "edit files until `cargo check` turns green" version of Layer 3 is obsolete, and the repo-side pre-smoke work is now complete.
585+
586+
**Goal:** Enter the first CUDA 12.9+ smoke run with:
587+
- a locked compile/build baseline,
588+
- an explicit capture path for the final LLVM module and PTX,
589+
- a ranked list of Rust files to patch if smoke fails,
590+
- and an explicit list of files that are *not* on the first-smoke hot path.
591+
592+
**Working principle:** do not proactively sweep the Rust tree adding `#[cfg(feature = "llvm19")]` branches. Only edit a file if one of the following is true:
593+
- the file still contains an explicit LLVM-7-era workaround that Layer 3 has now identified as a likely LLVM 19 fast path,
594+
- the first CUDA 12.9+ smoke failure points directly at it,
595+
- or a validation artifact (final module, PTX, backtrace, verifier error) makes the mismatch concrete.
596+
597+
#### Layer 3a — Baseline Lock And Consumer Proof
598+
599+
**Status (2026-04-14):** Complete.
573600

574-
**Working principle:** for each file in the list below, read upstream's current version, ask "does any code here need to behave differently on LLVM 19?", and add localized cfg branches. **Do not** copy June's version of the file. **Do not** restructure upstream's code.
601+
**Goal:** Prove the backend builds, and prove that a real GPU consumer (`vecadd`) can compile through the current LLVM 19 path before making any more Rust changes.
575602

576-
**Files in execution order:**
603+
**Checklist:**
604+
- [x] `cargo build -p rustc_codegen_nvvm --no-default-features`
605+
- [x] `LLVM_CONFIG_19=/usr/bin/llvm-config-19 cargo build -p rustc_codegen_nvvm`
606+
- [x] `LLVM_CONFIG_19=/usr/bin/llvm-config-19 cargo build -p vecadd`
607+
- [x] Confirm `cuda_builder` drives `rustc_codegen_nvvm` through `-Zcodegen-backend=...`
608+
- [x] Inspect generated PTX and record what this host can and cannot prove
577609

578-
#### 3a. `src/back.rs` — TargetMachine factory call site
579-
- The single callsite already goes through `create_target_machine(TargetMachineConfig)`. Only extend the wrapper if runtime validation proves we need more explicit LLVM 19 knobs than the current shim-preserved surface exposes.
580-
- Build the `TargetMachineConfig` struct with sensible defaults for the new v19 fields (`FloatABI::Default`, no split debug file, no debug compression, no emulated TLS).
581-
- Verify upstream's `temp_path_for_cgu` / `temp_path` rename mentioned in the explorer report is not actually a v19 thing (it's a nightly-2026-04-02 thing). Skip.
582-
- ~20 lines changed. Smallest hot file. Good warm-up.
610+
**Findings to carry forward:**
611+
- `vecadd` compiles successfully through the current backend path.
612+
- The copied PTX lives at:
613+
- `/tmp/rust-cuda-layer3-vecadd-v19/debug/build/vecadd-0550a17a21893201/out/kernels.ptx`
614+
- The underlying `cuda-builder` target PTX lives at:
615+
- `/tmp/rust-cuda-layer3-vecadd-v19/cuda-builder/nvptx64-nvidia-cuda/release/vecadd_kernels.ptx`
616+
- On this CUDA 12.8 host, the PTX header still says `Based on NVVM 7.0.1`.
617+
- Practical consequence: this machine is good enough to prove the current LLVM 19 backend can compile a real kernel crate, but it is still the wrong host for answering the CUDA 12.9+ / Blackwell acceptance question.
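The PTX header check above can be automated when comparing hosts. The following is a minimal sketch, assuming the header carries a line containing the text `Based on NVVM <version>` as quoted in the findings; the function name `nvvm_version` is hypothetical and is not part of the repository.

```rust
/// Extracts the version that follows "Based on NVVM " in a PTX header.
/// Returns `None` when no line contains that marker.
/// Assumption: the marker text matches the string quoted in the plan.
pub fn nvvm_version(ptx: &str) -> Option<&str> {
    ptx.lines().find_map(|line| {
        line.split_once("Based on NVVM ")
            .map(|(_, rest)| rest.trim())
    })
}
```

Feeding it the contents of `kernels.ptx` from each host gives a quick way to record which NVVM generation produced the file, without asserting anything about what a CUDA 12.9+ host will report.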
583618

584-
#### 3b. `src/lto.rs` — ThinLTO buffer creation
585-
- The current Rust-side ThinLTO path does **not** call `LLVMRustThinLTOBufferCreate`. `ModuleBuffer::new(..., is_thin)` now explicitly documents that it still serializes full-module bitcode and that the ThinLTO-specific shim APIs remain unwired on the Rust side.
586-
- The old plan's `is_thin=true, emit_summary=true` note was based on a stale candidate API shape and should not drive new wrapper work by itself.
587-
- ~5 lines changed.
619+
**Validation checkpoint for 3a:**
620+
- Backend builds pass under both feature settings.
621+
- A real GPU crate builds through the LLVM 19 path.
622+
- No additional Rust source edits are required just to keep the build green.
588623

589-
#### 3c. `src/debug_info/mod.rs` — debug location setter
590-
- The debug-location callsites already go through `set_current_debug_location(builder, context, Option<&DILocation>)`, and the DIBuilder function/variable creation callsites already go through named Rust helpers.
591-
- Verify the `DIBuilderCreateFunction` callsites still have a clean containment point for any future bool → `DISPFlags` conversion, whether that lives in Layer `2c` or stays in the C++ shim.
592-
- Verify dwarf op widths (i64 vs u64 from explorer report) are absorbed by the wrapper or are local enough to cfg-branch in place.
593-
- ~30 lines changed across this file.
624+
**Suggested commit message:** `chore(llvm19): lock post-Layer-2 baseline and vecadd compile proof`
594625

595-
#### 3d. `src/debug_info/metadata.rs` and `src/debug_info/metadata/enums.rs`
596-
- Similar treatment. If Layer `2c` does not grow Rust-side DI wrappers, keep this work local and minimal. Any leftovers (DIBuilder schema differences in struct/enum metadata creation) get local `#[cfg]` branches.
597-
- ~20 lines combined.
626+
#### Layer 3b — Close Stale Compile-Only Work Items
598627

599-
#### 3e. `src/intrinsic.rs` — the high-stakes file
600-
- **DO NOT** rewrite intrinsic dispatch. Upstream's dispatch is correct.
601-
- Identify the *specific* places where LLVM 19 enables a code path that LLVM 7 cannot:
602-
- `volatile_load` opaque-pointer rewrite (small, local)
603-
- i128 intrinsics that LLVM 19 has natively (`llvm.ctlz.i128`, `llvm.cttz.i128`, `llvm.bswap.i128`, `llvm.bitreverse.i128`, `llvm.fshl.i128`, `llvm.fshr.i128`, `llvm.ctpop.i128`) — on LLVM 19, call them directly instead of falling through to the emulation in `builder/emulate_i128.rs`. On LLVM 7, do nothing different.
604-
- Saturating ops: same — LLVM 19 has them, LLVM 7 doesn't. Skip the reimplementation when feature is on.
605-
- `abort()` callsite already goes through the existing call helper surface; only change it if later wrapper cleanup in Layer `2b` actually alters that surface.
606-
- `type_test` stub for CFI — confirm whether upstream already handles this or if we need a stub.
607-
- Each of these is a `#[cfg(feature = "llvm19")]` branch *inside* upstream's existing function bodies, not a rewrite.
608-
- ~50-100 lines changed across this file. The largest single Rust file in this layer.
628+
**Status (2026-04-14):** Complete.
609629

610-
#### 3f. `src/builder.rs` — opaque-pointer adjustments and call sites
611-
- Most call-site changes can be absorbed by Layer 2 helper cleanup if we choose to add it; otherwise keep the current `LLVMRustBuildCall` / `LLVMRustBuildCall2` split and only touch the specific call sites that need adjustment.
612-
- Any opaque-pointer-related GEP or load/store callsites that LLVM 19 requires `Type*` for need cfg branches.
613-
- ~30 lines changed.
630+
**Goal:** Remove outdated Layer 3 candidates that no longer need proactive Rust work.
614631

615-
#### 3g. `src/builder/emulate_i128.rs` — guard against double emulation
616-
- Upstream already has the full emulation. On LLVM 19, several of its routines should short-circuit to the native LLVM 19 intrinsic.
617-
- Add a top-of-function `#[cfg(feature = "llvm19")]` early-return that calls `LLVMBuildCall` with the native intrinsic name, falling through to the emulation only on LLVM 7.
618-
- ~20-50 lines changed.
632+
**Checklist:**
633+
- [x] Mark `src/back.rs` as no longer a hot Layer 3 file for compile-path reasons:
634+
- it already goes through `create_target_machine(TargetMachineConfig)`
635+
- [x] Mark `src/lto.rs` as no longer a hot Layer 3 file for compile-path reasons:
636+
- Rust does not currently call `LLVMRustThinLTOBufferCreate`
637+
- `ModuleBuffer::new(..., is_thin)` already documents the current behavior
638+
- [x] Mark `src/debug_info/mod.rs` and `src/debug_info/metadata.rs` as wrapped but low priority for the first smoke:
639+
- the relevant positional DIBuilder callsites are already contained
640+
- `vecadd`'s default `CudaBuilder` path is not a debug-info-heavy workload
641+
- [x] Mark `src/init.rs` as *not* requiring an immediate Rust-side cfg branch:
642+
- `LLVMInitializePasses()` is already version-gated in the C++ shim
643+
- [x] Mark `src/abi.rs` and `src/ctx_intrinsics.rs` as low priority for the first smoke:
644+
- upstream already carries the relevant f16/f128 work
645+
- `cuda_builder` currently forces `CARGO_FEATURE_NO_F16_F128=1` for the GPU build path
646+
- [x] Lock the real first-smoke Rust hot files:
647+
- `src/intrinsic.rs`
648+
- `src/builder.rs`
649+
- `src/builder/emulate_i128.rs`
650+
651+
**Validation checkpoint for 3b:**
652+
- The pending edit list is smaller and evidence-driven.
653+
- No file remains on the hot path purely because the older plan predicted it might matter.
654+
655+
**Suggested commit message:** `docs(llvm19): trim stale Layer 3 compile-only work`
656+
657+
#### Layer 3c — First CUDA 12.9+ Smoke Harness
658+
659+
**Status (2026-04-14):** Complete on the repo side; ready to run on the correct host.
660+
661+
**Goal:** Make the first CUDA 12.9+ `vecadd` smoke run maximally informative so that any later Rust edits are driven by concrete evidence.
662+
663+
**Repo work completed in this layer:**
664+
- [x] Added env-driven capture hooks to `examples/vecadd/build.rs`:
665+
- `RUST_CUDA_DUMP_FINAL_MODULE=1` enables `.final_module_path(out_path.join("final-module.ll"))`
666+
- `RUST_CUDA_EMIT_LLVM_IR=1` enables `.emit_llvm_ir(true)`
667+
- [x] Added `cargo::rerun-if-env-changed` hooks for both capture env vars.
668+
- [x] Validated final-module capture on this host:
669+
- `/tmp/rust-cuda-layer3-harness-v19/debug/build/vecadd-0550a17a21893201/out/final-module.ll`
670+
- [x] Validated LLVM-IR emission on this host:
671+
- `/tmp/rust-cuda-layer3-harness-v19-ir/cuda-builder/nvptx64-nvidia-cuda/release/deps/vecadd_kernels.ll`
672+
- `/tmp/rust-cuda-layer3-harness-v19-ir/debug/build/vecadd-0550a17a21893201/out/final-module.ll`
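The env-driven hooks described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `examples/vecadd/build.rs`: `CapturePlan`, `flag_enabled`, `capture_plan`, and `rerun_directives` are hypothetical names, and treating only the exact value `1` as "enabled" is an assumption about how the real hooks parse the variables.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical summary of what the two capture env vars request.
#[derive(Debug, PartialEq)]
pub struct CapturePlan {
    /// Where the final module would be dumped, if requested.
    pub final_module_path: Option<PathBuf>,
    /// Whether LLVM IR emission was requested.
    pub emit_llvm_ir: bool,
}

/// Assumption: only the exact value "1" turns a hook on.
pub fn flag_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Maps the raw env values onto a capture plan rooted at `out_dir`.
pub fn capture_plan(
    out_dir: &Path,
    dump_final_module: Option<&str>,
    emit_llvm_ir: Option<&str>,
) -> CapturePlan {
    CapturePlan {
        final_module_path: flag_enabled(dump_final_module)
            .then(|| out_dir.join("final-module.ll")),
        emit_llvm_ir: flag_enabled(emit_llvm_ir),
    }
}

/// Lines a build script would print so Cargo reruns it when either var changes.
pub fn rerun_directives() -> Vec<String> {
    ["RUST_CUDA_DUMP_FINAL_MODULE", "RUST_CUDA_EMIT_LLVM_IR"]
        .iter()
        .map(|var| format!("cargo::rerun-if-env-changed={var}"))
        .collect()
}
```

In a real build script the raw values would come from `std::env::var(...).ok()`, and the resulting plan would be applied to the builder through the `.final_module_path(...)` and `.emit_llvm_ir(true)` calls named in the checklist.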
673+
674+
**Exact checklist for the next host:**
675+
- [ ] Run on a host with:
676+
- CUDA Toolkit `12.9+`
677+
- visible NVIDIA driver runtime (`libcuda.so.1`)
678+
- LLVM 19 toolchain reachable via `LLVM_CONFIG_19`
679+
- [ ] Export:
680+
```bash
export LLVM_CONFIG_19=/path/to/llvm19/bin/llvm-config
export CUDA_HOME=/usr/local/cuda-12.9
export LD_LIBRARY_PATH=$CUDA_HOME/nvvm/lib64:$LD_LIBRARY_PATH
```
685+
- [ ] Use a fresh target dir:
686+
```bash
export CARGO_TARGET_DIR=/tmp/rust-cuda-layer3-smoke-v19
```
689+
- [ ] Capture knobs are already implemented in `examples/vecadd/build.rs`; enable them with:
690+
```bash
export RUST_CUDA_DUMP_FINAL_MODULE=1
export RUST_CUDA_EMIT_LLVM_IR=1 # optional
```
694+
- [ ] Run:
695+
```bash
cargo build -p vecadd
```
698+
- [ ] Capture:
699+
- full stderr/stdout,
700+
- copied PTX,
701+
- the final module dumped by `CudaBuilder`,
702+
- and any `cargo` JSON diagnostics if the build fails
703+
- [ ] If runtime is available, also run:
704+
```bash
cargo run -p vecadd
```
707+
and capture runtime stderr / backtrace / driver errors
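If the smoke run is driven from a small Rust helper rather than a shell session, the checklist above could be assembled as follows. This is a sketch under assumptions: `smoke_build_command` is a hypothetical helper, the paths are passed in by the caller, and `--message-format=json` is used as one way to obtain Cargo's JSON diagnostics. The function only builds the command; it does not spawn it.

```rust
use std::process::Command;

/// Builds (but does not run) the first-smoke `cargo build` invocation.
/// Assumption: both capture hooks are enabled for the run.
pub fn smoke_build_command(target_dir: &str, llvm_config_19: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "-p", "vecadd", "--message-format=json"])
        // Fresh target dir so stale artifacts cannot mask a failure.
        .env("CARGO_TARGET_DIR", target_dir)
        .env("LLVM_CONFIG_19", llvm_config_19)
        // Capture hooks implemented in examples/vecadd/build.rs.
        .env("RUST_CUDA_DUMP_FINAL_MODULE", "1")
        .env("RUST_CUDA_EMIT_LLVM_IR", "1");
    cmd
}
```

Calling `.output()` on the returned command collects stdout and stderr together with the exit status, which covers the "full stderr/stdout" item in the capture list.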
619708

620-
#### 3h. `src/abi.rs` — verify, do not edit
621-
- The explorer flagged this as HOT, but on closer reading: upstream already added f16/f128 support and PassMode::Cast for align(16) ADTs. June's version was *trying to do* what upstream now does. **No changes expected here.** If `cargo check --features llvm19` flags errors in this file, first ask whether they are really an FFI containment problem before adding Layer 3 cfg branches.
709+
**Expected output classification:**
710+
- [ ] Success
711+
- [ ] Backend crash inside `librustc_codegen_nvvm.so`
712+
- [ ] `libnvvm` rejects the module
713+
- [ ] PTX is emitted but invalid / suspicious
714+
- [ ] PTX load fails at runtime
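The five buckets above can be recorded as a small type so that every smoke report names exactly one outcome. A minimal sketch follows, assuming the buckets are checked in pipeline order (backend, then `libnvvm`, then PTX inspection, then runtime load); `SmokeOutcome`, `SmokeObservations`, and `classify` are hypothetical names, and the boolean observations are filled in by whoever reads the logs.

```rust
/// The five first-smoke outcome buckets listed in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SmokeOutcome {
    Success,
    BackendCrash,
    NvvmRejectedModule,
    SuspiciousPtx,
    RuntimeLoadFailure,
}

/// Hypothetical record of what was observed during the run.
#[derive(Debug, Clone, Copy, Default)]
pub struct SmokeObservations {
    pub backend_crashed: bool,
    pub nvvm_rejected_module: bool,
    pub ptx_looks_invalid: bool,
    pub runtime_load_failed: bool,
}

/// Picks the earliest failing stage; later stages are ignored once one fails.
pub fn classify(obs: SmokeObservations) -> SmokeOutcome {
    if obs.backend_crashed {
        SmokeOutcome::BackendCrash
    } else if obs.nvvm_rejected_module {
        SmokeOutcome::NvvmRejectedModule
    } else if obs.ptx_looks_invalid {
        SmokeOutcome::SuspiciousPtx
    } else if obs.runtime_load_failed {
        SmokeOutcome::RuntimeLoadFailure
    } else {
        SmokeOutcome::Success
    }
}
```

Reporting the earliest failing stage keeps the triage focused on the first point where the pipeline diverged, which is the evidence the ranked intervention map relies on.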
622715

623-
#### 3i. `src/ctx_intrinsics.rs` — verify, do not edit
624-
- Same situation. Upstream removed i128 remapping and added f16/f128 entries. Should compile under both feature settings without changes.
716+
**Validation checkpoint for 3c:**
717+
- The repository now has a reproducible capture harness for the first CUDA 12.9+ smoke.
718+
- The capture hooks were validated locally on this CUDA 12.8 host.
719+
- Actual CUDA 12.9+ execution is deferred to Layer 5, which is where the smoke outcome itself gets classified.
625720

626-
#### 3j. `src/init.rs` — pass registration
627-
- The legacy pass registration calls (`LLVMInitializeCore`, `LLVMInitializeCodeGen`, etc.) are LLVM-7-specific. On LLVM 19 with the new pass manager, this becomes a no-op or a `PassBuilder::registerXxx()` call.
628-
- `#[cfg]`-branch the body of the init function. ~20 lines.
721+
**Suggested commit message:** `chore(llvm19): capture first CUDA 12.9 smoke artifacts`
629722

630-
#### 3k. Everything else listed in the explorer's TRIVIAL bucket
631-
- `attributes.rs`, `consts.rs`, `const_ty.rs`, `ty.rs`, `target.rs`, `link.rs`, `mono_item.rs`, `override_fns.rs`, `asm.rs`, `allocator.rs`, `int_replace.rs`
632-
- These need either zero changes or single-line cfg branches. Defer until `cargo check` complains; don't proactively touch.
723+
#### Layer 3d — Ranked Rust Intervention Map
633724

634-
**Validation between sub-steps:**
635-
- After each sub-step, `cargo check --features llvm19` should make incremental progress (fewer errors than before).
636-
- After each sub-step, `cargo check --no-default-features` must remain clean — never break the LLVM 7 path.
725+
**Status (2026-04-14):** Complete.
637726

638-
**Commit message strategy:**
639-
- One commit per sub-step (3a, 3b, ...) for clean bisectability. Easy to revert one without losing the rest.
640-
- Or one big commit if the sub-steps all land in one sitting.
727+
**Goal:** If the first smoke fails, edit the smallest, highest-signal Rust file first.
728+
729+
**Priority order and entry criteria:**
730+
731+
1. `src/intrinsic.rs`
732+
Trigger:
733+
- smoke or final-module inspection implicates explicit LLVM-7-era intrinsic workarounds
734+
- especially the existing `TODO(@LegNeato): LLVM 7.1 doesn't have ...` branches
735+
Planned changes:
736+
- LLVM 19-native fast paths for:
737+
- `llvm.ctlz.i128`
738+
- `llvm.cttz.i128`
739+
- `llvm.ctpop.i128`
740+
- `llvm.bswap.i128`
741+
- `llvm.bitreverse.i128`
742+
- `llvm.fshl.i128`
743+
- `llvm.fshr.i128`
744+
- LLVM 19-native saturating integer intrinsics where available
745+
Guardrail:
746+
- do this as local `#[cfg(feature = "llvm19")]` branches inside the existing dispatch, not as a rewrite
747+
748+
2. `src/builder/emulate_i128.rs`
749+
Trigger:
750+
- LLVM 19 is still falling into emulation for paths that should be native
751+
- or `intrinsic.rs` grows LLVM 19 fast paths and now needs shared helpers or early-outs
752+
Planned changes:
753+
- short-circuit the specific emulation routines that should no longer run on LLVM 19
754+
Guardrail:
755+
- do not refactor the whole emulation module; only fence off the now-obsolete LLVM 7 path where necessary
756+
757+
3. `src/builder.rs`
758+
Trigger:
759+
- verifier output, final-module inspection, or backend crashes implicate typed-pointer / opaque-pointer mismatches
760+
- especially around:
761+
- `load` / `volatile_load`
762+
- local atomic fallback loads
763+
- `LLVMBuildGEP2` / `LLVMBuildInBoundsGEP2`
764+
- call construction sites that still depend on typed-pointer assumptions
765+
Planned changes:
766+
- local cfg branches or helper use to satisfy LLVM 19's stricter pointer/type expectations
767+
Guardrail:
768+
- touch only the exact sites implicated by the smoke artifacts
769+
770+
4. `src/debug_info/mod.rs`, `src/debug_info/metadata.rs`, `src/debug_info/metadata/enums.rs`
771+
Trigger:
772+
- only if the smoke run is done with debug info enabled or `libnvvm` / verifier diagnostics point at metadata schema drift
773+
Planned changes:
774+
- local metadata schema fixes only
775+
Guardrail:
776+
- not on the first default `vecadd` hot path
777+
778+
5. `src/init.rs`, `src/lto.rs`, `src/back.rs`
779+
Trigger:
780+
- only if the failure directly implicates pass initialization, ThinLTO setup, or target-machine option plumbing
781+
Current assessment:
782+
- low probability for the first smoke because these areas are already shimmed or explicitly documented
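The guardrail for buckets 1 and 2, local `#[cfg(feature = "llvm19")]` branches inside an existing dispatch rather than a rewrite, can be illustrated in isolation. The sketch below is not the backend's code: `Lowering` and `lower_i128_bitop` are hypothetical stand-ins, the short operation names are invented for the example, and only the `llvm.*.i128` intrinsic names come from the plan.

```rust
/// Hypothetical description of how an i128 bit operation gets lowered.
#[derive(Debug, PartialEq, Eq)]
pub enum Lowering {
    /// Call the named LLVM intrinsic directly (LLVM 19 path).
    Native(&'static str),
    /// Fall back to the existing emulation (LLVM 7 path).
    Emulated { would_be_native: &'static str },
}

/// Stand-in for an existing dispatch function. The match is shared by both
/// feature settings; only the final decision is cfg-branched.
pub fn lower_i128_bitop(op: &str) -> Option<Lowering> {
    let native_name = match op {
        "ctlz" => "llvm.ctlz.i128",
        "cttz" => "llvm.cttz.i128",
        "ctpop" => "llvm.ctpop.i128",
        "bswap" => "llvm.bswap.i128",
        "bitreverse" => "llvm.bitreverse.i128",
        "fshl" => "llvm.fshl.i128",
        "fshr" => "llvm.fshr.i128",
        _ => return None, // not one of the planned fast paths
    };

    // LLVM 19 build: take the native fast path.
    #[cfg(feature = "llvm19")]
    return Some(Lowering::Native(native_name));

    // LLVM 7 build: behavior is unchanged, emulation still runs.
    #[cfg(not(feature = "llvm19"))]
    return Some(Lowering::Emulated { would_be_native: native_name });
}
```

Because the shared `match` stays untouched, the LLVM 7 build keeps its current behavior and the diff for each fast path is limited to the cfg-gated lines.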
783+
784+
**Validation checkpoint for 3d:**
785+
- Any Rust edit after Layer 3 starts maps to one of the ranked buckets above.
786+
- We can explain *why* that file moved from dormant to hot.
787+
788+
**Suggested commit message:** `docs(llvm19): rank first-smoke Rust fix buckets`
789+
790+
**Layer 3 done means:**
791+
- The post-Layer-2 compile/build baseline is locked and recorded.
792+
- A real consumer build (`vecadd`) has been proven on the current host.
793+
- The first CUDA 12.9+ smoke command set is written down with capture instructions.
794+
- The repo-side smoke harness is implemented and locally validated.
795+
- The first-smoke Rust edit order is explicit and ranked.
796+
- No speculative file-sweep changes remain justified.
797+
798+
**Commit strategy for Layer 3:**
799+
- Prefer docs-only or harness-only checkpoints until the first CUDA 12.9+ smoke failure exists.
800+
- Once smoke starts producing concrete failures, use one commit per affected hot file or failure bucket.
801+
- Do not land opportunistic edits to dormant files just to "get ahead" of possible runtime issues.
641802

642803
---
643804

644-
### Layer 4 — Get `cargo check --features llvm19` fully green
805+
### Layer 4 — Keep The Build Baseline Green
806+
807+
**Status (2026-04-14):** Baseline already green for backend builds.
808+
- `cargo check -p rustc_codegen_nvvm --no-default-features` passes
809+
- `LLVM_CONFIG_19=/usr/bin/llvm-config-19 cargo check -p rustc_codegen_nvvm` passes
810+
- `cargo build -p rustc_codegen_nvvm --no-default-features` passes
811+
- `LLVM_CONFIG_19=/usr/bin/llvm-config-19 cargo build -p rustc_codegen_nvvm` passes
645812

646-
**Goal:** Both `cargo check --features llvm19` and `cargo check --no-default-features` pass with zero errors and minimal warnings.
813+
**Goal:** Preserve those compile/build baselines while Layer 3/5/6 land smoke-driven fixes.
647814

648-
This isn't a "layer" so much as an iteration loop. Errors will surface in unexpected files. For each:
815+
This is now a maintenance loop. If a later edit breaks the baseline:
649816

650817
1. **Read the error in context.** Don't guess.
651818
2. **Decide: is this an FFI shim gap, a missed cfg branch, or a genuine LLVM 19 API difference we missed?**
