[ROCm] Streamline bazel targets for rocm libraries #801

Closed
draganmladjenovic wants to merge 4 commits into rocm-jaxlib-v0.8.0 from draganm/rocm-jaxlib-v0.8.0-rocm_libs

@draganmladjenovic

Remove the DsoLoader indirection and link directly against the ROCm libraries.

draganmladjenovic force-pushed the draganm/rocm-jaxlib-v0.8.0-rocm_libs branch from b5ac783 to 22e1278 on April 21, 2026 15:03
nurmukhametov and others added 4 commits April 21, 2026 14:00
Imported from GitHub PR openxla#35211

Replace amd_comgr library with LLVM's native API to find NT_AMDGPU_METADATA note sections and extract the stack usage and register spill counts from there.

Add detection for dynamic stack usage.

Add VLOG(2) dumps for per-kernel stats as well as register counts.

Change the logic of discarding the module. The module is discarded only if the stack is used, i.e., either .private_segment_fixed_size is not zero or .uses_dynamic_stack is true. There are examples where there are SGPR spills, but they are saved to VGPRs and not to the stack.
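
The discard rule described above can be sketched roughly as follows. This is an illustrative Python model, not the code in this PR: the function names are hypothetical, the per-kernel metadata is assumed to already be decoded into dictionaries keyed by the AMDGPU metadata field names, and treating a module as discarded when any single kernel uses the stack is an assumption.

```python
from typing import Any, Iterable, Mapping


def kernel_uses_stack(kernel_md: Mapping[str, Any]) -> bool:
    """Return True if the kernel metadata reports fixed or dynamic stack usage.

    Hypothetical helper: kernel_md is assumed to be one decoded kernel entry
    from the NT_AMDGPU_METADATA note.
    """
    fixed_size = kernel_md.get(".private_segment_fixed_size", 0)
    dynamic = kernel_md.get(".uses_dynamic_stack", False)
    return fixed_size != 0 or bool(dynamic)


def should_discard_module(kernels: Iterable[Mapping[str, Any]]) -> bool:
    """Assumption: one stack-using kernel is enough to discard the module."""
    return any(kernel_uses_stack(k) for k in kernels)


# Made-up metadata values, for illustration only.
sgpr_only = {
    ".name": "k_sgpr",
    ".sgpr_spill_count": 12,
    ".private_segment_fixed_size": 0,
}
fixed_stack = {
    ".name": "k_stack",
    ".vgpr_spill_count": 8,
    ".private_segment_fixed_size": 64,
}
dynamic_stack = {
    ".name": "k_dyn",
    ".private_segment_fixed_size": 0,
    ".uses_dynamic_stack": True,
}

keep = should_discard_module([sgpr_only])                 # SGPR spills held in VGPRs
drop_fixed = should_discard_module([sgpr_only, fixed_stack])
drop_dynamic = should_discard_module([dynamic_stack])
```

In this model the spill counts do not affect the decision at all. A kernel with a nonzero `.sgpr_spill_count` is kept as long as neither stack field is set, which matches the SGPR-to-VGPR case mentioned above.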

Add tests in amdgpu_register_spilling_test.cc which cover cases where no spills, VGPR-only spills, SGPR-only spills, or dynamic stack usage occur. For that, the following LLVM IR inputs are added:
- amdgpu_no_spills.ll: Simple kernel with minimal register usage
- amdgpu_vgpr_spills.ll: High VGPR pressure with limited VGPRs (64)
- amdgpu_sgpr_spills.ll: High SGPR pressure with limited SGPRs (32)
- amdgpu_dynamic_stack.ll: Indirect function call requiring dynamic stack

Copybara import of the project:

--
b83efc6 by Aleksei Nurmukhametov <anurmukh@amd.com>:

[ROCm] Reimplement register spilling detection

Replace amd_comgr library with LLVM's native API to find
NT_AMDGPU_METADATA note sections and extract the stack usage and
register spill counts from there.

Add detection for dynamic stack usage.

Add VLOG(2) dumps for per-kernel stats as well as register counts.

Change the logic of discarding the module. The module is discarded only
if the stack is used, i.e., either .private_segment_fixed_size is not
zero or .uses_dynamic_stack is true. There are examples where there are
SGPR spills, but they are saved to VGPRs and not to the stack.

Add tests in amdgpu_register_spilling_test.cc which cover cases where no
spills, VGPR-only spills, SGPR-only spills, or dynamic stack usage
occur. For that, the following LLVM IR inputs are added:
- amdgpu_no_spills.ll: Simple kernel with minimal register usage
- amdgpu_vgpr_spills.ll: High VGPR pressure with limited VGPRs (64)
- amdgpu_sgpr_spills.ll: High SGPR pressure with limited SGPRs (32)
- amdgpu_dynamic_stack.ll: Indirect function call requiring dynamic
  stack

Merging this change closes openxla#35211

COPYBARA_INTEGRATE_REVIEW=openxla#35211 from ROCm:anurmukh/redo-regspill-check-no-comgr b83efc6
PiperOrigin-RevId: 845742402
draganmladjenovic force-pushed the draganm/rocm-jaxlib-v0.8.0-rocm_libs branch from 22e1278 to 21d0991 on April 21, 2026 19:01