Skip to content

Support typed NVVM IR for pre-Blackwell libNVVM targets#101

Open
mohamedsamirx wants to merge 1 commit into
NVlabs:mainfrom
mohamedsamirx:typed-nvvm-ir-pre-blackwell
Open

Support typed NVVM IR for pre-Blackwell libNVVM targets#101
mohamedsamirx wants to merge 1 commit into
NVlabs:mainfrom
mohamedsamirx:typed-nvvm-ir-pre-blackwell

Conversation

@mohamedsamirx
Copy link
Copy Markdown
Contributor

Summary

Fixes #98.

This PR makes NVVM IR export target-aware so pre-Blackwell targets use the typed-pointer NVVM IR dialect expected by older libNVVM paths, while Blackwell and newer targets keep the existing opaque-pointer dialect.

Changes include:

  • Select NVVM IR dialect from CUDA_OXIDE_TARGET / --arch.
  • Emit typed-pointer NVVM IR for pre-Blackwell targets such as sm_75.
  • Preserve opaque-pointer NVVM IR for sm_100 / compute_100 and newer.
  • Use the legacy NVVM datalayout and !nvvmir.version = {2, 0, 3, 1} for typed-pointer mode.
  • Emit typed function references for !nvvm.annotations and @llvm.used.
  • Repair typed-pointer load/store/GEP/atomic use sites when source pointer values are represented as erased i8*.
  • Lower typed-NVVM-incompatible fneg and saturating integer intrinsics into supported LLVM IR operations.
  • Add an optional libnvvm-sys binding for CUDA 13+ nvvmLLVMVersion.
  • Update cargo-oxide, cuda-host, and libnvvm-sys docs.

Motivation

On CUDA 12.4 with an sm_75 NVIDIA T1200 GPU, examples that enter NVVM IR mode through libdevice calls fail in libNVVM:

nvvmCompileProgram ... parse expected type

Clean origin/main at 1f38440 reproduces this for:

  • cargo oxide run addressof_sharedarray_repro --emit-nvvm-ir --arch sm_75
  • cargo oxide run manual_launch_libdevice --emit-nvvm-ir --arch sm_75
  • cargo oxide run primitive_stress --emit-nvvm-ir --arch sm_75

The root problem is that pre-Blackwell libNVVM expects typed-pointer NVVM IR, while the current exporter emits modern opaque-pointer IR.


Testing

cargo fmt --check
git diff --check
cargo test -p dialect-llvm
cargo test -p libnvvm-sys -p cuda-host -- --nocapture
cargo check
cargo clippy --workspace -- -D warnings
cargo clippy -p libnvvm-sys -p cuda-host -p dialect-llvm -p mir-importer -- -D warnings
cargo oxide doctor
cargo oxide run addressof_sharedarray_repro --emit-nvvm-ir --arch sm_75
cargo oxide run manual_launch_libdevice --emit-nvvm-ir --arch sm_75
cargo oxide run primitive_stress --emit-nvvm-ir --arch sm_75
SMOKETEST_LOG_DIR=.smoketest-logs/typed-nvvm-rebased scripts/smoketest.sh

Results on this machine:

  • cargo oxide doctor: Pass
  • Full smoketest on this branch: 46 / 58 pass

The three issue-relevant examples pass on this branch and fail on clean origin/main with nvvmCompileProgram ... parse expected type.

The remaining 12 smoketest failures also reproduce on clean main and are unrelated baseline failures on this sm_75 laptop, mostly advanced barrier/TMA/tcgen05 paths, one local nvJitLink header/constants issue, and two runtime illegal-memory-access examples.

Signed-off-by: Mohamed Samir <mohamedsamirx@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nvvmCompileProgram fails with "parse expected type" when libdevice calls (__nv_expf) trigger NVVM IR mode

1 participant