Finalize the Layer 3 plan, add env-driven final-module and LLVM IR capture hooks to vecadd, and validate the harness locally so the next phase can move straight to CUDA 12.9+ smoke testing.
Co-authored-by: OpenAI Codex <codex@openai.com>
@@ -557,7 +557,7 @@ Each "Layer" ends in a known-good state with a single git commit. Within a layer
557
557
- `src/llvm.rs` is either unchanged except for a few high-value wrappers, or changed in a way that clearly reduces Rust-side LLVM-version leakage.
558
558
- No speculative wrappers remain in the diff "just in case".
559
559
- Both feature settings still pass `cargo check`.
560
-
- The next work item after Layer 2 is still Layer 3 file-by-file cfg surgery or, if no Rust-side wrapper actually proved necessary, a direct move into the smallest Layer 3 callsite work.
560
+
- The next work item after Layer 2 is Layer 3 pre-smoke reconciliation and failure-driven triage, not a speculative file-by-file cfg sweep.
561
561
562
562
**Commit strategy for Layer 2:**
563
563
- Prefer one small commit per sublayer (`2b`, `2c`, `2d`) only when there is real code to land.
@@ -567,85 +567,252 @@ Each "Layer" ends in a known-good state with a single git commit. Within a layer
567
567
568
568
---
569
569
570
-
### Layer 3 — Rust-side cfg surgery, file by file
570
+
### Layer 3 — Pre-Smoke Reconciliation And Edit Map
571
571
572
-
**Goal:** All Rust source files compile cleanly under both feature settings. Behavioral correctness on the LLVM 19 path is NOT guaranteed yet — that's Layers 5-6.
- `cuda_builder` definitely invokes the current backend via `-Zcodegen-backend=<path to rustc_codegen_nvvm>`.
577
+
- The generated PTX on this host still reports `Based on NVVM 7.0.1`, which means this CUDA 12.8 machine can prove compile success but cannot prove the intended CUDA 12.9+ LLVM-19-bitcode acceptance story.
578
+
- `examples/vecadd/build.rs` now has env-driven capture hooks for the first smoke:
- Therefore the old "edit files until `cargo check` turns green" version of Layer 3 is obsolete, and the repo-side pre-smoke work is now complete.
585
+
586
+
**Goal:** Enter the first CUDA 12.9+ smoke run with:
587
+
- a locked compile/build baseline,
588
+
- an explicit capture path for the final LLVM module and PTX,
589
+
- a ranked list of Rust files to patch if smoke fails,
590
+
- and an explicit list of files that are *not* on the first-smoke hot path.
591
+
592
+
**Working principle:** do not proactively sweep the Rust tree adding `#[cfg(feature = "llvm19")]` branches. Only edit a file if one of the following is true:
593
+
- the file still contains an explicit LLVM-7-era workaround that Layer 3 has now identified as a likely LLVM 19 fast path,
594
+
- the first CUDA 12.9+ smoke failure points directly at it,
595
+
- or a validation artifact (final module, PTX, backtrace, verifier error) makes the mismatch concrete.
596
+
597
+
#### Layer 3a — Baseline Lock And Consumer Proof
598
+
599
+
**Status (2026-04-14):** Complete.
573
600
574
-
**Working principle:** for each file in the list below, read upstream's current version, ask "does any code here need to behave differently on LLVM 19?", and add localized cfg branches. **Do not** copy June's version of the file. **Do not** restructure upstream's code.
601
+
**Goal:** Prove the backend builds, and prove that a real GPU consumer (`vecadd`) can compile through the current LLVM 19 path before making any more Rust changes.
- [x] Confirm `cuda_builder` drives `rustc_codegen_nvvm` through `-Zcodegen-backend=...`
608
+
- [x] Inspect generated PTX and record what this host can and cannot prove
577
609
578
-
#### 3a. `src/back.rs` — TargetMachine factory call site
579
-
- The single callsite already goes through `create_target_machine(TargetMachineConfig)`. Only extend the wrapper if runtime validation proves we need more explicit LLVM 19 knobs than the current shim-preserved surface exposes.
580
-
- Build the `TargetMachineConfig` struct with sensible defaults for the new v19 fields (`FloatABI::Default`, no split debug file, no debug compression, no emulated TLS).
581
-
- Verify upstream's `temp_path_for_cgu` → `temp_path` rename mentioned in explorer is not actually a v19 thing (it's a nightly-2026-04-02 thing). Skip.
582
-
- ~20 lines changed. Smallest hot file. Good warm-up.
610
+
**Findings to carry forward:**
611
+
- `vecadd` compiles successfully through the current backend path.
- On this CUDA 12.8 host, the PTX header still says `Based on NVVM 7.0.1`.
617
+
- Practical consequence: this machine is good enough to prove the current LLVM 19 backend can compile a real kernel crate, but it is still the wrong host for answering the CUDA 12.9+ / Blackwell acceptance question.
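The PTX header check behind these findings can be scripted so that the same test runs unchanged on the CUDA 12.9+ host. A minimal sketch, assuming the generated PTX carries a `// Based on NVVM <version>` comment line as observed on this host; the function name `nvvm_version_from_ptx` is hypothetical and not part of the repository:

```rust
/// Returns the version string from a `// Based on NVVM <version>` header
/// comment, or `None` if the PTX text contains no such comment.
/// Assumption: the marker appears inside a `//` comment line.
fn nvvm_version_from_ptx(ptx: &str) -> Option<&str> {
    const MARKER: &str = "Based on NVVM ";
    ptx.lines()
        // Keep only `//` comment lines, with the comment prefix removed.
        .filter_map(|line| line.trim().strip_prefix("//"))
        // Keep only the comment that starts with the marker text.
        .filter_map(|comment| comment.trim().strip_prefix(MARKER))
        .map(str::trim)
        .next()
}
```

Feeding the copied PTX file to this function and recording the result next to the CUDA toolkit version would make the "what this host can prove" note reproducible.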
583
618
584
-
#### 3b. `src/lto.rs` — ThinLTO buffer creation
585
-
- The current Rust-side ThinLTO path does **not** call `LLVMRustThinLTOBufferCreate`. `ModuleBuffer::new(..., is_thin)` now explicitly documents that it still serializes full-module bitcode and that the ThinLTO-specific shim APIs remain unwired on the Rust side.
586
-
- The old plan's `is_thin=true, emit_summary=true` note was based on a stale candidate API shape and should not drive new wrapper work by itself.
587
-
- ~5 lines changed.
619
+
**Validation checkpoint for 3a:**
620
+
- Backend builds pass under both feature settings.
621
+
- A real GPU crate builds through the LLVM 19 path.
622
+
- No additional Rust source edits are required just to keep the build green.
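The "both feature settings" check can be expressed as a small driver. This is a sketch only: it assumes the two settings are the default feature set and `--features llvm19`, and that the backend's package name is `rustc_codegen_nvvm`. The helper names `check_arg_sets` and `run_checks` are hypothetical.

```rust
use std::process::Command;

/// Argument lists for the two assumed configurations: default features,
/// then the same command with the `llvm19` feature enabled.
fn check_arg_sets(package: &str) -> Vec<Vec<String>> {
    let base = vec!["check".to_string(), "-p".to_string(), package.to_string()];
    let mut llvm19 = base.clone();
    llvm19.extend(["--features".to_string(), "llvm19".to_string()]);
    vec![base, llvm19]
}

/// Runs each `cargo check` in turn; returns `Ok(true)` only if all succeed.
fn run_checks(package: &str) -> std::io::Result<bool> {
    for args in check_arg_sets(package) {
        let status = Command::new("cargo").args(&args).status()?;
        if !status.success() {
            return Ok(false);
        }
    }
    Ok(true)
}
```

Calling `run_checks("rustc_codegen_nvvm")` from the workspace root would then give a single pass/fail answer for this checkpoint.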
- The debug-location callsites already go through `set_current_debug_location(builder, context, Option<&DILocation>)`, and the DIBuilder function/variable creation callsites already go through named Rust helpers.
591
-
- Verify the `DIBuilderCreateFunction` callsites still have a clean containment point for any future bool → `DISPFlags` conversion, whether that lives in Layer `2c` or stays in the C++ shim.
592
-
- Verify dwarf op widths (i64 vs u64 from explorer report) are absorbed by the wrapper or are local enough to cfg-branch in place.
593
-
- ~30 lines changed across this file.
624
+
**Suggested commit message:** `chore(llvm19): lock post-Layer-2 baseline and vecadd compile proof`
594
625
595
-
#### 3d. `src/debug_info/metadata.rs` and `src/debug_info/metadata/enums.rs`
596
-
- Similar treatment. If Layer `2c` does not grow Rust-side DI wrappers, keep this work local and minimal. Any leftovers (DIBuilder schema differences in struct/enum metadata creation) get local `#[cfg]` branches.
597
-
- ~20 lines combined.
626
+
#### Layer 3b — Close Stale Compile-Only Work Items
598
627
599
-
#### 3e. `src/intrinsic.rs` — the high-stakes file
600
-
- **DO NOT** rewrite intrinsic dispatch. Upstream's dispatch is correct.
601
-
- Identify the *specific* places where LLVM 19 enables a code path that LLVM 7 cannot:
- i128 intrinsics that LLVM 19 has natively (`llvm.ctlz.i128`, `llvm.cttz.i128`, `llvm.bswap.i128`, `llvm.bitreverse.i128`, `llvm.fshl.i128`, `llvm.fshr.i128`, `llvm.ctpop.i128`) — on LLVM 19, call them directly instead of falling through to the emulation in `builder/emulate_i128.rs`. On LLVM 7, do nothing different.
604
-
- Saturating ops: same — LLVM 19 has them, LLVM 7 doesn't. Skip the reimplementation when feature is on.
605
-
- `abort()` callsite already goes through the existing call helper surface; only change it if later wrapper cleanup in Layer `2b` actually alters that surface.
606
-
- `type_test` stub for CFI: confirm whether upstream already handles this or whether we need a stub.
607
-
- Each of these is a `#[cfg(feature = "llvm19")]` branch *inside* upstream's existing function bodies, not a rewrite.
608
-
- ~50-100 lines changed across this file. The largest single Rust file in this layer.
628
+
**Status (2026-04-14):** Complete.
609
629
610
-
#### 3f. `src/builder.rs` — opaque-pointer adjustments and call sites
611
-
- Most call-site changes can be absorbed by Layer 2 helper cleanup if we choose to add it; otherwise keep the current `LLVMRustBuildCall` / `LLVMRustBuildCall2` split and only touch the specific call sites that need adjustment.
612
-
- Any opaque-pointer-related GEP or load/store callsites that LLVM 19 requires `Type*` for need cfg branches.
613
-
- ~30 lines changed.
630
+
**Goal:** Remove outdated Layer 3 candidates that no longer need proactive Rust work.
614
631
615
-
#### 3g. `src/builder/emulate_i128.rs` — guard against double emulation
616
-
- Upstream already has the full emulation. On LLVM 19, several of its routines should short-circuit to the native LLVM 19 intrinsic.
617
-
- Add a top-of-function `#[cfg(feature = "llvm19")]` early-return that calls `LLVMBuildCall` with the native intrinsic name, falling through to the emulation only on LLVM 7.
618
-
- ~20-50 lines changed.
632
+
**Checklist:**
633
+
- [x] Mark `src/back.rs` as no longer a hot Layer 3 file for compile-path reasons:
634
+
- it already goes through `create_target_machine(TargetMachineConfig)`
635
+
- [x] Mark `src/lto.rs` as no longer a hot Layer 3 file for compile-path reasons:
636
+
- Rust does not currently call `LLVMRustThinLTOBufferCreate`
637
+
- `ModuleBuffer::new(..., is_thin)` already documents the current behavior
638
+
- [x] Mark `src/debug_info/mod.rs` and `src/debug_info/metadata.rs` as wrapped but low priority for the first smoke:
639
+
- the relevant positional DIBuilder callsites are already contained
640
+
- `vecadd`'s default `CudaBuilder` path is not a debug-info-heavy workload
641
+
- [x] Mark `src/init.rs` as *not* requiring an immediate Rust-side cfg branch:
642
+
- `LLVMInitializePasses()` is already version-gated in the C++ shim
643
+
- [x] Mark `src/abi.rs` and `src/ctx_intrinsics.rs` as low priority for the first smoke:
644
+
- upstream already carries the relevant f16/f128 work
645
+
- `cuda_builder` currently forces `CARGO_FEATURE_NO_F16_F128=1` for the GPU build path
646
+
- [x] Lock the real first-smoke Rust hot files:
647
+
- `src/intrinsic.rs`
648
+
- `src/builder.rs`
649
+
- `src/builder/emulate_i128.rs`
650
+
651
+
**Validation checkpoint for 3b:**
652
+
- The pending edit list is smaller and evidence-driven.
653
+
- No file remains on the hot path purely because the older plan predicted it might matter.
654
+
655
+
**Suggested commit message:** `docs(llvm19): trim stale Layer 3 compile-only work`
656
+
657
+
#### Layer 3c — First CUDA 12.9+ Smoke Harness
658
+
659
+
**Status (2026-04-14):** Complete on the repo side; ready to run on the correct host.
660
+
661
+
**Goal:** Make the first CUDA 12.9+ `vecadd` smoke run maximally informative so that any later Rust edits are driven by concrete evidence.
662
+
663
+
**Repo work completed in this layer:**
664
+
- [x] Added env-driven capture hooks to `examples/vecadd/build.rs`:
- [ ] Capture knobs are already implemented in `examples/vecadd/build.rs`; enable them with:
690
+
```bash
691
+
export RUST_CUDA_DUMP_FINAL_MODULE=1
692
+
export RUST_CUDA_EMIT_LLVM_IR=1 # optional
693
+
```
694
+
- [ ] Run:
695
+
```bash
696
+
cargo build -p vecadd
697
+
```
698
+
- [ ] Capture:
699
+
- full stderr/stdout,
700
+
- copied PTX,
701
+
- the final module dumped by `CudaBuilder`,
702
+
- and any `cargo` JSON diagnostics if the build fails
703
+
- [ ] If runtime is available, also run:
704
+
```bash
705
+
cargo run -p vecadd
706
+
```
707
+
and capture runtime stderr / backtrace / driver errors
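For readers without the build script at hand, the two capture knobs used in the checklist above can be modeled roughly as follows. This is an illustrative sketch, not the code in `examples/vecadd/build.rs`: the `CaptureConfig` type, the helper names, and the rule that only the value `1` enables a knob are all assumptions.

```rust
/// Hypothetical capture settings derived from the two environment knobs.
#[derive(Debug, PartialEq, Eq)]
struct CaptureConfig {
    dump_final_module: bool,
    emit_llvm_ir: bool,
}

/// Treats a variable as enabled only when it is set to exactly "1".
/// (Assumption: the real hooks may accept other values.)
fn flag_enabled<F: Fn(&str) -> Option<String>>(lookup: &F, name: &str) -> bool {
    lookup(name).as_deref() == Some("1")
}

/// Builds the capture settings from an injected lookup function so the
/// logic can be exercised without touching the process environment.
fn capture_config<F: Fn(&str) -> Option<String>>(lookup: F) -> CaptureConfig {
    CaptureConfig {
        dump_final_module: flag_enabled(&lookup, "RUST_CUDA_DUMP_FINAL_MODULE"),
        emit_llvm_ir: flag_enabled(&lookup, "RUST_CUDA_EMIT_LLVM_IR"),
    }
}
```

In a build script the lookup would typically be `|name| std::env::var(name).ok()`, and printing `cargo:rerun-if-env-changed=<NAME>` for each variable lets Cargo rerun the script when a knob is toggled.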
619
708
620
-
#### 3h. `src/abi.rs` — verify, do not edit
621
-
- The explorer flagged this as HOT, but on closer reading: upstream already added f16/f128 support and PassMode::Cast for align(16) ADTs. June's version was *trying to do* what upstream now does. **No changes expected here.** If `cargo check --features llvm19` flags errors in this file, first ask whether they are really an FFI containment problem before adding Layer 3 cfg branches.
#### 3i. `src/ctx_intrinsics.rs` — verify, do not edit
624
-
- Same situation. Upstream removed i128 remapping and added f16/f128 entries. Should compile under both feature settings without changes.
716
+
**Validation checkpoint for 3c:**
717
+
- The repository now has a reproducible capture harness for the first CUDA 12.9+ smoke.
718
+
- The capture hooks were validated locally on this CUDA 12.8 host.
719
+
- Actual CUDA 12.9+ execution is deferred to Layer 5, which is where the smoke outcome itself gets classified.
625
720
626
-
#### 3j. `src/init.rs` — pass registration
627
-
- The legacy pass registration calls (`LLVMInitializeCore`, `LLVMInitializeCodeGen`, etc.) are LLVM-7-specific. On LLVM 19 with the new pass manager, this becomes a no-op or a `PassBuilder::registerXxx()` call.
628
-
- `#[cfg]`-branch the body of the init function. ~20 lines.
721
+
**Suggested commit message:** `chore(llvm19): capture first CUDA 12.9 smoke artifacts`
629
722
630
-
#### 3k. Everything else listed in the explorer's TRIVIAL bucket