[FlyDSL AOT] Skip kernels for unrequested arches when GPU_ARCHS is set by eppaneamd · Pull Request #3321 · ROCm/aiter

eppaneamd · 2026-05-24T11:59:27Z

Summary

When GPU_ARCHS is set to a specific arch at build time (e.g. gfx942), FlyDSL AOT previously compiled every kernel unconditionally, including hundreds of kernels tuned for other arches that would never be used.

Add a _job_arch(job) helper that returns the target arch from any job dict (cu_num → cu_num_to_arch for GEMM/MoE; explicit "arch" field for CHUNK_GDN_H; None for untuned/arch-agnostic jobs that must always compile).
In start_aot(), filter all_jobs against GPU_ARCHS after collection. The filter uses _parse_gpu_archs_env from build_targets.py (;-separated, consistent with the rest of the codebase). The import is deferred inside the branch to avoid triggering aiter/__init__ during setup.py's early import of common.py.
GPU_ARCHS unset or "native" preserves existing behaviour.
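
For readers without the diff at hand, the arch lookup described above can be sketched roughly as follows. This is only an illustration, not the PR's code: the name job_arch, the CU_NUM_TO_ARCH table and its values, the precedence of the "arch" field over cu_num, and the choice to return None for an unknown cu_num are all assumptions standing in for the real _job_arch and cu_num_to_arch.

```python
from typing import Optional

# Hypothetical lookup table for illustration; the real cu_num_to_arch
# lives in the aiter codebase and may have a different shape and values.
CU_NUM_TO_ARCH = {304: "gfx942", 256: "gfx950"}


def job_arch(job: dict) -> Optional[str]:
    """Return the target arch for a job dict, or None if arch-agnostic."""
    # Jobs such as CHUNK_GDN_H carry an explicit "arch" field.
    arch = job.get("arch")
    if arch:
        return arch
    # Tuned GEMM/MoE jobs are identified by their compute-unit count.
    cu_num = job.get("cu_num")
    if cu_num is not None:
        # Assumption: an unknown cu_num is treated as arch-agnostic.
        return CU_NUM_TO_ARCH.get(int(cu_num))
    # Untuned jobs have neither field and must always compile.
    return None
```

Returning None for jobs with no arch information keeps the filter conservative: anything that cannot be attributed to a specific arch is still compiled.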

Test plan

Unit-tested _job_arch and filter logic: tuned GEMM/MoE, CHUNK_GDN_H explicit arch, untuned jobs, single arch, multi-arch (gfx942;gfx950), unset, and native.
Verified end-to-end on AITER main with GPU_ARCHS=gfx942: filter fires correctly, 2001 gfx950-only kernels skipped, build completes without errors.

github-actions · 2026-05-24T12:00:00Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-300x`	Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
`ci:sglang`	SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy
`ci:atom`	ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B
`ci:atom_full`	ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json`
`ci:vllm`	vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5
`ci:all`	All standard extended tests (excludes `ci:atom_full`)

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3321 --add-label <label>

Copilot

Pull request overview

This PR optimizes FlyDSL AOT compilation by honoring GPU_ARCHS (when explicitly set) to avoid compiling tuned kernels for architectures that won’t be used, reducing build time and work.

Changes:

Added a _job_arch(job) helper to derive a target arch from different FlyDSL AOT job shapes (cu_num-based vs explicit "arch").
Updated start_aot() to filter the collected AOT jobs against GPU_ARCHS (excluding "native"), and emit a summary of skipped kernels.


+    gpu_archs_env = os.environ.get("GPU_ARCHS", "").strip()
+    if gpu_archs_env and gpu_archs_env.lower() != "native":
+        from aiter.jit.utils.build_targets import _parse_gpu_archs_env
+
+        requested = set(_parse_gpu_archs_env(gpu_archs_env))
+        before = len(all_jobs)
+        all_jobs = [
+            (kind, job)
+            for kind, job in all_jobs
+            if (arch := _job_arch(job)) is None or arch in requested
+        ]
+        filtered = before - len(all_jobs)
+        if filtered:
+            print(
+                f"[aiter] FlyDSL AOT: GPU_ARCHS={gpu_archs_env!r} skipped"
+                f" {filtered} kernels for unrequested arches"
+                f" ({len(all_jobs)} remaining)"
+            )
+
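
The filtering in the hunk above can be exercised in isolation with a small standalone sketch. It assumes a simplified parser (parse_gpu_archs, a hypothetical stand-in for _parse_gpu_archs_env that only splits on ";") and takes the arch lookup as a parameter; the job kinds and dict contents are made up for the example.

```python
from typing import Callable, List, Optional, Tuple

Job = Tuple[str, dict]


def parse_gpu_archs(value: str) -> List[str]:
    """Hypothetical stand-in for _parse_gpu_archs_env: split on ';', drop blanks."""
    return [part.strip() for part in value.split(";") if part.strip()]


def filter_jobs(
    all_jobs: List[Job],
    gpu_archs_env: str,
    job_arch: Callable[[dict], Optional[str]],
) -> List[Job]:
    """Keep jobs whose arch is requested, plus arch-agnostic jobs (arch None)."""
    env = gpu_archs_env.strip()
    # Unset or "native" preserves the existing behaviour: compile everything.
    if not env or env.lower() == "native":
        return list(all_jobs)
    requested = set(parse_gpu_archs(env))
    return [
        (kind, job)
        for kind, job in all_jobs
        if (arch := job_arch(job)) is None or arch in requested
    ]


# Made-up jobs: two tuned for a specific arch, one untuned.
jobs: List[Job] = [
    ("gemm", {"arch": "gfx942"}),
    ("moe", {"arch": "gfx950"}),
    ("untuned", {}),
]
kept = filter_jobs(jobs, "gfx942", lambda job: job.get("arch"))
skipped = len(jobs) - len(kept)
```

With GPU_ARCHS=gfx942 the gfx950 job is dropped while the untuned job survives, which mirrors the "skipped N kernels ... (M remaining)" summary printed by the real code.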


coderfeli · 2026-05-25T01:29:17Z

@zhiding512 take a look?

aot/flydsl: skip kernels for unrequested arches when GPU_ARCHS is set

fbac184

eppaneamd requested review from a team and Copilot May 24, 2026 11:59

Copilot started reviewing on behalf of eppaneamd May 24, 2026 11:59

Copilot AI reviewed May 24, 2026


Merge branch 'main' into feat/flydsl-aot-gpu-archs-filter

f3a874c

coderfeli requested a review from zhiding512 May 25, 2026 01:29

eppaneamd added 6 commits May 25, 2026 12:58

Merge branch 'main' into feat/flydsl-aot-gpu-archs-filter

26210f3

Merge branch 'main' into feat/flydsl-aot-gpu-archs-filter

cdfce7d

Merge branch 'main' into feat/flydsl-aot-gpu-archs-filter

b26f2a1

Merge branch 'main' into feat/flydsl-aot-gpu-archs-filter

b68c6b2

Merge branch 'main' into feat/flydsl-aot-gpu-archs-filter

f5eddce

Merge branch 'main' into feat/flydsl-aot-gpu-archs-filter

d89c05a

eppaneamd wants to merge 8 commits into main from feat/flydsl-aot-gpu-archs-filter
