[ROCm] Expose HIP kernel n_regs / n_spills / n_max_threads on JITKernel by benenzhu · Pull Request #2211 · tile-ai/tilelang

benenzhu · 2026-05-16T06:10:55Z

Problem

In project https://github.com/sgl-project/sglang/blob/amd/deepseek_v4/python/sglang/srt/layers/attention/nsa/tilelang_kernel.py#L1574-L1582, uses a tilelang kernel for attention. And it will spills in some BLOCK configs with no warning.
So add a spills information to tilelang kernels like what triton do will make it more convenient to debug these problems.

Summary

New JITKernel.n_regs / n_spills / n_max_threads (plus a richer resource_usage: dict[name, KernelResourceUsage]) that report per-kernel AMD GPU resource counts. Mirrors triton's kernel.n_regs / n_spills.
Implementation parses clang's -Rpass-analysis=kernel-resource-usage remarks rather than going through the HIP runtime, which keeps the diff small (no ctypes / libamdhip64 wiring, no HSACO bytes shuffled through lower()) like what triton doing.
Remarks are filtered out of hipcc's stdio before anything reaches the terminal, so autotune logs don't drown in resource blocks. Real warnings + errors flow through unchanged.
Cache-aware: parsed values are written as resource_usage.json next to kernel_lib.so and reloaded on cache hits, so subsequent runs don't lose the resource view to the cache. Older cache entries (no JSON file) silently degrade to None.

n_spills accounting: `VGPRs Spill + ScratchSize[bytes/lane] / 4` — explicit spill count plus scratch dwords, treating one scratch dword the same as one spilled VGPR for accounting (both end up in main memory, similar access cost). Triton uses scratch/4 only; this also folds in clang's explicit spill count when present. Raw `scratch_bytes` is exposed as its own field on `KernelResourceUsage`.

Verification on MI355X (gfx950)

Small elementwise add. First run (cache miss → compile) and second run (cache hit → no compile) both report `n_regs=5, n_spills=0, usage_keys=['main_kernel']`. Zero remark lines leak to stdout/stderr — verified by grepping the captured output for `Rpass-analysis|VGPRs|TotalSGPRs|Occupancy|remark:`. Both `tvm_ffi` and `cython` execution backends populate `resource_usage` correctly.

Test plan

Build + run a tilelang HIP kernel; read `kernel.n_regs / n_spills / n_max_threads`
Confirm clang remarks do not appear on terminal during compile
Cache miss → `resource_usage.json` written
Cache hit → values restored from JSON
Both `tvm_ffi` and `cython` backends populate `resource_usage`
CI on a CUDA-only target (smoke check that HIP-only paths stay dormant)

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- HIP kernels now capture and persist resource usage (registers, spills, scratch, thread limits) to disk as part of the cache.
- Kernel objects expose resource-usage properties (for inspection and analysis).
- Compiler remarks for HIP are filtered and recorded to present cleaner, annotated output.
- New environment flag to optionally preserve temporary HIP compilation files for debugging.

…arks Adds a triton-style view of per-kernel AMD GPU resource usage on the tilelang JITKernel, queryable as kernel.n_regs / n_spills / n_max_threads (with a richer resource_usage dict mapping kernel name to a KernelResourceUsage dataclass). Implementation: * tilelang/jit/adapter/hip_resource_info.py — passes -Rpass-analysis=kernel-resource-usage to hipcc, parses the per-kernel remarks (Function Name / VGPRs / VGPRs Spill / TotalSGPRs / etc.) out of the captured stdio, and *strips* those lines before the output is printed or included in error messages, so autotune logs don't drown in remark blocks while real warnings/errors still surface. Includes JSON (de)serialization helpers. * tilelang/contrib/hipcc.py — adds the remark flag, parses + filters the output. Same on the LibraryGenerator HIP path (tilelang/jit/adapter/libgen.py); HIP compiles always pipe stdio there so the filter has something to act on (verbose=True still prints the filtered output). * tilelang/jit/kernel.py — opens a thread-local recorder window around lower() on HIP and exposes the parsed dict as lazy resource_usage / n_regs / n_spills / n_max_threads properties. * tilelang/cache/kernel_cache.py — persists the parsed dict as resource_usage.json next to kernel_lib.so on cache miss; reloads it on cache hit. This way subsequent runs don't lose the resource view to the cache, without paying the runtime API / ctypes cost. Older cache entries (no JSON file) silently degrade to None. Verified on MI355X (gfx950) with a small elementwise add: cache miss and cache hit both report n_regs=5, n_spills=0; zero remark lines leak to stdout/stderr. Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-16T06:11:04Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-05-16T06:11:09Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 16087530-ef12-4edd-a494-45df89dbea42

📥 Commits

Reviewing files that changed from the base of the PR and between f276441 and af4bbc2.

📒 Files selected for processing (1)

tilelang/contrib/hip_resource_info.py

🚧 Files skipped from review as they are similar to previous changes (1)

tilelang/contrib/hip_resource_info.py

📝 Walkthrough

Walkthrough

Parses AMD HIP compiler remarks for per-kernel resource usage, integrates remark filtering into HIP compile paths, records usage on JITKernel instances with queryable properties, and persists/restores the usage via the kernel on-disk cache.

Changes

HIP Resource Usage Tracking and Caching

Layer / File(s)	Summary
HIP resource usage parsing foundation `tilelang/contrib/hip_resource_info.py`	New module defines `KernelResourceUsage` dataclass, thread-local recorder (`reset_recorder`, `pop_recorded`), `filter_and_record` to parse and strip `-Rpass-analysis=kernel-resource-usage` remarks, and `dump_to_file`/`load_from_file` for JSON persistence with schema-evolution tolerance.
Compiler output filtering and resource extraction `tilelang/contrib/hipcc.py`, `tilelang/jit/adapter/libgen.py`, `tilelang/env.py`	hipcc and libgen now append the analysis flag to HIP compile commands, capture combined stdout/stderr, pass output through `filter_and_record`, and print filtered output when verbose; libgen conditionally enables `--save-temps -g` via `TILELANG_HIP_SAVE_TEMP_FILES`.
JITKernel resource usage capture and properties `tilelang/jit/kernel.py`	JITKernel detects HIP targets, resets the recorder before compilation, captures recorded results into `_resource_usage` after compile, and exposes `resource_usage`, `n_regs`, `n_spills`, and `n_max_threads` derived from the selected primary recorded entry.
Cache save/load for resource usage metadata `tilelang/cache/kernel_cache.py`	KernelCache adds `resource_usage.json` handling: conditionally writes `kernel._resource_usage` on save and restores it into the loaded `JITKernel` on cache load (with exception logging on failures).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

LeiWang1999

Poem

🐰 I hopped through compiler notes and logs,
Counting regs and spill-time clogs,
I stash them safe in JSON light,
So kernels wake with metrics bright,
Hooray — the GPU numbers sing tonight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 65.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically identifies the main change: exposing HIP kernel resource metrics (n_regs, n_spills, n_max_threads) on the JITKernel class, which is the primary feature delivered by this changeset across all modified files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tilelang/jit/adapter/libgen.py (1)

188-195: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Filter HIP remarks before raising compile errors.

Line 188 raises with raw ret.stdout before filter_and_record runs, so remark lines still leak on HIP failures. Filter first, then use filtered text for verbose/error paths.

Proposed fix

-        if ret.returncode != 0:
-            captured = ret.stdout.decode("utf-8", errors="replace") if ret.stdout else ""
-            raise RuntimeError(f"Compilation Failed! {command}\n{captured}\n{self.lib_code}")
-
-        if is_hip_target(target) and ret.stdout is not None:
-            captured = filter_and_record(ret.stdout.decode("utf-8", errors="replace"))
-            if verbose and captured.strip():
-                print(captured)
+        captured = ret.stdout.decode("utf-8", errors="replace") if ret.stdout else ""
+        if is_hip_target(target) and captured:
+            captured = filter_and_record(captured)
+            if verbose and captured.strip():
+                print(captured)
+
+        if ret.returncode != 0:
+            raise RuntimeError(f"Compilation Failed! {command}\n{captured}\n{self.lib_code}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tilelang/jit/adapter/libgen.py` around lines 188 - 195, The RuntimeError
raised when ret.returncode != 0 currently uses raw ret.stdout and leaks HIP
remark lines; change the logic to first decode ret.stdout (if any), then, if
is_hip_target(target), pass that decoded text through filter_and_record to
produce a filtered captured string, and use that filtered captured both for
verbose printing (when verbose and captured.strip()) and in the RuntimeError
message (include command and self.lib_code as before); ensure you still handle
the non-HIP case by using the decoded stdout (or empty string) if
filter_and_record should not run.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tilelang/jit/adapter/libgen.py`:
- Around line 188-195: The RuntimeError raised when ret.returncode != 0
currently uses raw ret.stdout and leaks HIP remark lines; change the logic to
first decode ret.stdout (if any), then, if is_hip_target(target), pass that
decoded text through filter_and_record to produce a filtered captured string,
and use that filtered captured both for verbose printing (when verbose and
captured.strip()) and in the RuntimeError message (include command and
self.lib_code as before); ensure you still handle the non-HIP case by using the
decoded stdout (or empty string) if filter_and_record should not run.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8b3c3491-72b7-4f1c-9e49-3fae76feda95

📥 Commits

Reviewing files that changed from the base of the PR and between bcb2da3 and 23be555.

📒 Files selected for processing (5)

tilelang/cache/kernel_cache.py
tilelang/contrib/hip_resource_info.py
tilelang/contrib/hipcc.py
tilelang/jit/adapter/libgen.py
tilelang/jit/kernel.py

benenzhu and others added 4 commits May 16, 2026 05:42

fix for final merge

7f594d1

fix for final merge

ad72ada

fix for final merge

23be555

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

benenzhu and others added 2 commits May 16, 2026 17:32

add a flag to save temp files

f276441

rerun ci

af4bbc2

LeiWang1999 requested a review from zhangnju May 18, 2026 06:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ROCm] Expose HIP kernel n_regs / n_spills / n_max_threads on JITKernel#2211

[ROCm] Expose HIP kernel n_regs / n_spills / n_max_threads on JITKernel#2211
benenzhu wants to merge 6 commits into
tile-ai:mainfrom
benenzhu:feat/hip-resource-attrs

benenzhu commented May 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

github-actions Bot commented May 16, 2026

Uh oh!

coderabbitai Bot commented May 16, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

benenzhu commented May 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Summary

Verification on MI355X (gfx950)

Test plan

Summary by CodeRabbit

Uh oh!

github-actions Bot commented May 16, 2026

Uh oh!

coderabbitai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

benenzhu commented May 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 16, 2026 •

edited

Loading