AiterAsmKernel: add init-time sanity checks for .co registration#3127

Open
alexioslyrakis-amd wants to merge 1 commit into main from alyr/sanity-asm-kernel-init

@alexioslyrakis-amd commented May 11, 2026

Summary

  • Adds validate_hsaco_lds() to AiterAsmKernelFast::init(): scans the raw .co ELF blob for group_segment_fixed_size via msgpack decode and checks it against the device LDS limit (hipDeviceGetAttribute). Fires before registration, giving an actionable error instead of a silent null from hipGetFuncBySymbol.
  • Adds a registration probe in init(): since __hipRegisterFunction returns void, a hipGetFuncBySymbol probe detects silent rejection by the runtime (LDS limit exceeded, arch mismatch, corrupted binary, etc.).
  • Both checks run once per kernel variant at construction time — zero impact on the launch_kernel() hot path.

Motivation

On gfx942 (MI300X) the LDS limit is 64 KB. A .co built targeting gfx950 (MI355X, ~160 KB LDS) would be silently rejected at registration, hipGetFuncBySymbol would return null, and hipModuleLaunchKernel would emit only a generic hipErrorIllegalState with no indication of the root cause.

Test plan

  • Run with a .co declaring group_segment_fixed_size > 65536 on gfx942 — expect clear error from validate_hsaco_lds() at init
  • Run with a correctly built .co on gfx942 — expect no error, kernel executes normally

Two checks added to AiterAsmKernelFast::init(), both running once per
kernel variant at construction time (not on the launch hot path):

1. validate_hsaco_lds(): scans the raw .co ELF blob for
   group_segment_fixed_size via msgpack decode and compares against the
   device LDS limit (hipDeviceGetAttribute). Gives an actionable error
   before __hipRegisterFatBinary is called, e.g. on gfx942 (MI300X) the
   64 KB limit would reject a .co built for gfx950 (MI355X, ~160 KB).

2. Registration probe: __hipRegisterFunction returns void, so a
   hipGetFuncBySymbol probe is used to detect silent rejection by the
   runtime (LDS limit exceeded, arch mismatch, corrupted binary, etc.).
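The probe pattern in item 2 can be illustrated with a small sketch. It is not the code from this commit: `SymbolLookup` and `probe_registration` are hypothetical names, and the runtime lookup is injected as a callable so the example compiles without HIP. In the real init path the callable would presumably wrap hipGetFuncBySymbol and return null when that call fails.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the runtime lookup. It returns nullptr when the
// runtime does not know the symbol (assumption: the real version wraps
// hipGetFuncBySymbol and maps any failure to nullptr).
using SymbolLookup = std::function<const void*(const void* host_symbol)>;

// Because registration itself returns void, the only way to notice a silent
// rejection is to look the symbol up right afterwards and fail loudly.
void probe_registration(const SymbolLookup& lookup,
                        const void* host_symbol,
                        const std::string& kernel_name)
{
    if (lookup(host_symbol) == nullptr) {
        throw std::runtime_error(
            "kernel '" + kernel_name +
            "' was not registered by the runtime "
            "(possible causes: LDS limit exceeded, arch mismatch, corrupted binary)");
    }
}
```

Running the probe once in init() keeps the cost off the launch path and turns a later generic launch failure into an error that names the kernel.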

@alexioslyrakis-amd requested a review from a team May 11, 2026 10:09

@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| ci:atom | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| ci:atom_full | ATOM accuracy suite for PR and main models from ATOM models_accuracy.json |
| ci:vllm | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| ci:all | All standard extended tests (excludes ci:atom_full) |

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3127 --add-label <label>
