[ROCm] Expose HIP kernel n_regs / n_spills / n_max_threads on JITKernel#2211
[ROCm] Expose HIP kernel n_regs / n_spills / n_max_threads on JITKernel#2211benenzhu wants to merge 6 commits into
Conversation
…arks Adds a triton-style view of per-kernel AMD GPU resource usage on the tilelang JITKernel, queryable as kernel.n_regs / n_spills / n_max_threads (with a richer resource_usage dict mapping kernel name to a KernelResourceUsage dataclass). Implementation: * tilelang/jit/adapter/hip_resource_info.py — passes -Rpass-analysis=kernel-resource-usage to hipcc, parses the per-kernel remarks (Function Name / VGPRs / VGPRs Spill / TotalSGPRs / etc.) out of the captured stdio, and *strips* those lines before the output is printed or included in error messages, so autotune logs don't drown in remark blocks while real warnings/errors still surface. Includes JSON (de)serialization helpers. * tilelang/contrib/hipcc.py — adds the remark flag, parses + filters the output. Same on the LibraryGenerator HIP path (tilelang/jit/adapter/libgen.py); HIP compiles always pipe stdio there so the filter has something to act on (verbose=True still prints the filtered output). * tilelang/jit/kernel.py — opens a thread-local recorder window around lower() on HIP and exposes the parsed dict as lazy resource_usage / n_regs / n_spills / n_max_threads properties. * tilelang/cache/kernel_cache.py — persists the parsed dict as resource_usage.json next to kernel_lib.so on cache miss; reloads it on cache hit. This way subsequent runs don't lose the resource view to the cache, without paying the runtime API / ctypes cost. Older cache entries (no JSON file) silently degrade to None. Verified on MI355X (gfx950) with a small elementwise add: cache miss and cache hit both report n_regs=5, n_spills=0; zero remark lines leak to stdout/stderr. Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughParses AMD HIP compiler remarks for per-kernel resource usage, integrates remark filtering into HIP compile paths, records usage on JITKernel instances with queryable properties, and persists/restores the usage via the kernel on-disk cache. ChangesHIP Resource Usage Tracking and Caching
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/jit/adapter/libgen.py (1)
188-195:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFilter HIP remarks before raising compile errors.
Line 188 raises with raw
ret.stdoutbeforefilter_and_recordruns, so remark lines still leak on HIP failures. Filter first, then use filtered text for verbose/error paths.Proposed fix
- if ret.returncode != 0: - captured = ret.stdout.decode("utf-8", errors="replace") if ret.stdout else "" - raise RuntimeError(f"Compilation Failed! {command}\n{captured}\n{self.lib_code}") - - if is_hip_target(target) and ret.stdout is not None: - captured = filter_and_record(ret.stdout.decode("utf-8", errors="replace")) - if verbose and captured.strip(): - print(captured) + captured = ret.stdout.decode("utf-8", errors="replace") if ret.stdout else "" + if is_hip_target(target) and captured: + captured = filter_and_record(captured) + if verbose and captured.strip(): + print(captured) + + if ret.returncode != 0: + raise RuntimeError(f"Compilation Failed! {command}\n{captured}\n{self.lib_code}")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/adapter/libgen.py` around lines 188 - 195, The RuntimeError raised when ret.returncode != 0 currently uses raw ret.stdout and leaks HIP remark lines; change the logic to first decode ret.stdout (if any), then, if is_hip_target(target), pass that decoded text through filter_and_record to produce a filtered captured string, and use that filtered captured both for verbose printing (when verbose and captured.strip()) and in the RuntimeError message (include command and self.lib_code as before); ensure you still handle the non-HIP case by using the decoded stdout (or empty string) if filter_and_record should not run.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tilelang/jit/adapter/libgen.py`:
- Around line 188-195: The RuntimeError raised when ret.returncode != 0
currently uses raw ret.stdout and leaks HIP remark lines; change the logic to
first decode ret.stdout (if any), then, if is_hip_target(target), pass that
decoded text through filter_and_record to produce a filtered captured string,
and use that filtered captured both for verbose printing (when verbose and
captured.strip()) and in the RuntimeError message (include command and
self.lib_code as before); ensure you still handle the non-HIP case by using the
decoded stdout (or empty string) if filter_and_record should not run.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8b3c3491-72b7-4f1c-9e49-3fae76feda95
📒 Files selected for processing (5)
tilelang/cache/kernel_cache.pytilelang/contrib/hip_resource_info.pytilelang/contrib/hipcc.pytilelang/jit/adapter/libgen.pytilelang/jit/kernel.py
Problem
Summary
JITKernel.n_regs / n_spills / n_max_threads(plus a richerresource_usage: dict[name, KernelResourceUsage]) that report per-kernel AMD GPU resource counts. Mirrors triton'skernel.n_regs / n_spills.-Rpass-analysis=kernel-resource-usageremarks rather than going through the HIP runtime, which keeps the diff small (no ctypes /libamdhip64wiring, no HSACO bytes shuffled throughlower()) like what triton doing.resource_usage.jsonnext tokernel_lib.soand reloaded on cache hits, so subsequent runs don't lose the resource view to the cache. Older cache entries (no JSON file) silently degrade toNone.n_spillsaccounting: `VGPRs Spill + ScratchSize[bytes/lane] / 4` — explicit spill count plus scratch dwords, treating one scratch dword the same as one spilled VGPR for accounting (both end up in main memory, similar access cost). Triton uses scratch/4 only; this also folds in clang's explicit spill count when present. Raw `scratch_bytes` is exposed as its own field on `KernelResourceUsage`.Verification on MI355X (gfx950)
Small elementwise add. First run (cache miss → compile) and second run (cache hit → no compile) both report `n_regs=5, n_spills=0, usage_keys=['main_kernel']`. Zero remark lines leak to stdout/stderr — verified by grepping the captured output for `Rpass-analysis|VGPRs|TotalSGPRs|Occupancy|remark:`. Both `tvm_ffi` and `cython` execution backends populate `resource_usage` correctly.
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit