Skip to content

[ROCm] Expose HIP kernel n_regs / n_spills / n_max_threads on JITKernel#2211

Open
benenzhu wants to merge 6 commits into
tile-ai:mainfrom
benenzhu:feat/hip-resource-attrs
Open

[ROCm] Expose HIP kernel n_regs / n_spills / n_max_threads on JITKernel#2211
benenzhu wants to merge 6 commits into
tile-ai:mainfrom
benenzhu:feat/hip-resource-attrs

Conversation

@benenzhu
Copy link
Copy Markdown
Contributor

@benenzhu benenzhu commented May 16, 2026

Problem

Summary

  • New JITKernel.n_regs / n_spills / n_max_threads (plus a richer resource_usage: dict[name, KernelResourceUsage]) that report per-kernel AMD GPU resource counts. Mirrors triton's kernel.n_regs / n_spills.
  • Implementation parses clang's -Rpass-analysis=kernel-resource-usage remarks rather than going through the HIP runtime, which keeps the diff small (no ctypes / libamdhip64 wiring, no HSACO bytes shuffled through lower()) like what triton doing.
  • Remarks are filtered out of hipcc's stdio before anything reaches the terminal, so autotune logs don't drown in resource blocks. Real warnings + errors flow through unchanged.
  • Cache-aware: parsed values are written as resource_usage.json next to kernel_lib.so and reloaded on cache hits, so subsequent runs don't lose the resource view to the cache. Older cache entries (no JSON file) silently degrade to None.

n_spills accounting: `VGPRs Spill + ScratchSize[bytes/lane] / 4` — explicit spill count plus scratch dwords, treating one scratch dword the same as one spilled VGPR for accounting (both end up in main memory, similar access cost). Triton uses scratch/4 only; this also folds in clang's explicit spill count when present. Raw `scratch_bytes` is exposed as its own field on `KernelResourceUsage`.

Verification on MI355X (gfx950)

Small elementwise add. First run (cache miss → compile) and second run (cache hit → no compile) both report `n_regs=5, n_spills=0, usage_keys=['main_kernel']`. Zero remark lines leak to stdout/stderr — verified by grepping the captured output for `Rpass-analysis|VGPRs|TotalSGPRs|Occupancy|remark:`. Both `tvm_ffi` and `cython` execution backends populate `resource_usage` correctly.

Test plan

  • Build + run a tilelang HIP kernel; read `kernel.n_regs / n_spills / n_max_threads`
  • Confirm clang remarks do not appear on terminal during compile
  • Cache miss → `resource_usage.json` written
  • Cache hit → values restored from JSON
  • Both `tvm_ffi` and `cython` backends populate `resource_usage`
  • CI on a CUDA-only target (smoke check that HIP-only paths stay dormant)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • HIP kernels now capture and persist resource usage (registers, spills, scratch, thread limits) to disk as part of the cache.
    • Kernel objects expose resource-usage properties (for inspection and analysis).
    • Compiler remarks for HIP are filtered and recorded to present cleaner, annotated output.
    • New environment flag to optionally preserve temporary HIP compilation files for debugging.

Review Change Stack

benenzhu and others added 4 commits May 16, 2026 05:42
…arks

Adds a triton-style view of per-kernel AMD GPU resource usage on the
tilelang JITKernel, queryable as kernel.n_regs / n_spills /
n_max_threads (with a richer resource_usage dict mapping kernel name
to a KernelResourceUsage dataclass).

Implementation:

* tilelang/jit/adapter/hip_resource_info.py — passes
  -Rpass-analysis=kernel-resource-usage to hipcc, parses the per-kernel
  remarks (Function Name / VGPRs / VGPRs Spill / TotalSGPRs / etc.)
  out of the captured stdio, and *strips* those lines before the
  output is printed or included in error messages, so autotune logs
  don't drown in remark blocks while real warnings/errors still
  surface. Includes JSON (de)serialization helpers.

* tilelang/contrib/hipcc.py — adds the remark flag, parses + filters
  the output. Same on the LibraryGenerator HIP path
  (tilelang/jit/adapter/libgen.py); HIP compiles always pipe stdio
  there so the filter has something to act on (verbose=True still
  prints the filtered output).

* tilelang/jit/kernel.py — opens a thread-local recorder window
  around lower() on HIP and exposes the parsed dict as lazy
  resource_usage / n_regs / n_spills / n_max_threads properties.

* tilelang/cache/kernel_cache.py — persists the parsed dict as
  resource_usage.json next to kernel_lib.so on cache miss; reloads
  it on cache hit. This way subsequent runs don't lose the resource
  view to the cache, without paying the runtime API / ctypes cost.
  Older cache entries (no JSON file) silently degrade to None.

Verified on MI355X (gfx950) with a small elementwise add: cache miss
and cache hit both report n_regs=5, n_spills=0; zero remark lines
leak to stdout/stderr.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 16087530-ef12-4edd-a494-45df89dbea42

📥 Commits

Reviewing files that changed from the base of the PR and between f276441 and af4bbc2.

📒 Files selected for processing (1)
  • tilelang/contrib/hip_resource_info.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tilelang/contrib/hip_resource_info.py

📝 Walkthrough

Walkthrough

Parses AMD HIP compiler remarks for per-kernel resource usage, integrates remark filtering into HIP compile paths, records usage on JITKernel instances with queryable properties, and persists/restores the usage via the kernel on-disk cache.

Changes

HIP Resource Usage Tracking and Caching

Layer / File(s) Summary
HIP resource usage parsing foundation
tilelang/contrib/hip_resource_info.py
New module defines KernelResourceUsage dataclass, thread-local recorder (reset_recorder, pop_recorded), filter_and_record to parse and strip -Rpass-analysis=kernel-resource-usage remarks, and dump_to_file/load_from_file for JSON persistence with schema-evolution tolerance.
Compiler output filtering and resource extraction
tilelang/contrib/hipcc.py, tilelang/jit/adapter/libgen.py, tilelang/env.py
hipcc and libgen now append the analysis flag to HIP compile commands, capture combined stdout/stderr, pass output through filter_and_record, and print filtered output when verbose; libgen conditionally enables --save-temps -g via TILELANG_HIP_SAVE_TEMP_FILES.
JITKernel resource usage capture and properties
tilelang/jit/kernel.py
JITKernel detects HIP targets, resets the recorder before compilation, captures recorded results into _resource_usage after compile, and exposes resource_usage, n_regs, n_spills, and n_max_threads derived from the selected primary recorded entry.
Cache save/load for resource usage metadata
tilelang/cache/kernel_cache.py
KernelCache adds resource_usage.json handling: conditionally writes kernel._resource_usage on save and restores it into the loaded JITKernel on cache load (with exception logging on failures).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • LeiWang1999

Poem

🐰 I hopped through compiler notes and logs,
Counting regs and spill-time clogs,
I stash them safe in JSON light,
So kernels wake with metrics bright,
Hooray — the GPU numbers sing tonight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 65.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: exposing HIP kernel resource metrics (n_regs, n_spills, n_max_threads) on the JITKernel class, which is the primary feature delivered by this changeset across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/jit/adapter/libgen.py (1)

188-195: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Filter HIP remarks before raising compile errors.

Line 188 raises with raw ret.stdout before filter_and_record runs, so remark lines still leak on HIP failures. Filter first, then use filtered text for verbose/error paths.

Proposed fix
-        if ret.returncode != 0:
-            captured = ret.stdout.decode("utf-8", errors="replace") if ret.stdout else ""
-            raise RuntimeError(f"Compilation Failed! {command}\n{captured}\n{self.lib_code}")
-
-        if is_hip_target(target) and ret.stdout is not None:
-            captured = filter_and_record(ret.stdout.decode("utf-8", errors="replace"))
-            if verbose and captured.strip():
-                print(captured)
+        captured = ret.stdout.decode("utf-8", errors="replace") if ret.stdout else ""
+        if is_hip_target(target) and captured:
+            captured = filter_and_record(captured)
+            if verbose and captured.strip():
+                print(captured)
+
+        if ret.returncode != 0:
+            raise RuntimeError(f"Compilation Failed! {command}\n{captured}\n{self.lib_code}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tilelang/jit/adapter/libgen.py` around lines 188 - 195, The RuntimeError
raised when ret.returncode != 0 currently uses raw ret.stdout and leaks HIP
remark lines; change the logic to first decode ret.stdout (if any), then, if
is_hip_target(target), pass that decoded text through filter_and_record to
produce a filtered captured string, and use that filtered captured both for
verbose printing (when verbose and captured.strip()) and in the RuntimeError
message (include command and self.lib_code as before); ensure you still handle
the non-HIP case by using the decoded stdout (or empty string) if
filter_and_record should not run.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tilelang/jit/adapter/libgen.py`:
- Around line 188-195: The RuntimeError raised when ret.returncode != 0
currently uses raw ret.stdout and leaks HIP remark lines; change the logic to
first decode ret.stdout (if any), then, if is_hip_target(target), pass that
decoded text through filter_and_record to produce a filtered captured string,
and use that filtered captured both for verbose printing (when verbose and
captured.strip()) and in the RuntimeError message (include command and
self.lib_code as before); ensure you still handle the non-HIP case by using the
decoded stdout (or empty string) if filter_and_record should not run.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8b3c3491-72b7-4f1c-9e49-3fae76feda95

📥 Commits

Reviewing files that changed from the base of the PR and between bcb2da3 and 23be555.

📒 Files selected for processing (5)
  • tilelang/cache/kernel_cache.py
  • tilelang/contrib/hip_resource_info.py
  • tilelang/contrib/hipcc.py
  • tilelang/jit/adapter/libgen.py
  • tilelang/jit/kernel.py

@LeiWang1999 LeiWang1999 requested a review from zhangnju May 18, 2026 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant