Commit 1870daf

[IntelNav] ci: actually broaden AMDGPU_TARGETS
This file was edited but never committed before the v0.1.2 tag fired, so v0.1.2's libllama-rocm tarball shipped the same narrow target list as v0.1.1. The runtime gpu_compat shim covers it via HSA_OVERRIDE_GFX_VERSION, but the "native bytecode for every modern Radeon arch" win didn't actually land.

Add gfx942 (CDNA3), gfx1031/gfx1032 (RX 6700/6600), gfx1101/gfx1102 (RX 7800/7700/7600), gfx1150/gfx1151 (Strix APUs), and gfx1200/gfx1201 (RDNA4) to the target list so users on those cards get native kernels.
1 parent 4e2409a commit 1870daf

1 file changed

Lines changed: 16 additions & 7 deletions

.github/workflows/intelnav-release.yml

@@ -160,8 +160,9 @@ jobs:

   # ---------------------------------------------------------------------
   # linux-x64 ROCm — AMD Radeon (gfx9+) acceleration, the headline
-  # IntelNav win. The user's RX 6600 (gfx1032) is covered by setting
-  # AMDGPU_TARGETS to include gfx1030 (handles gfx1031/1032 via override).
+  # IntelNav win. AMDGPU_TARGETS covers every shipping Radeon arch
+  # IntelNav users are realistically running, with native bytecode for
+  # each so no HSA_OVERRIDE_GFX_VERSION dance is required at runtime.
   # ---------------------------------------------------------------------
   linux-x64-rocm:
     runs-on: ubuntu-22.04
@@ -212,11 +213,19 @@ jobs:

       - name: Configure
         env:
-          # gfx1030 bytecode is binary-compatible with gfx1031/1032 via
-          # HSA_OVERRIDE_GFX_VERSION=10.3.0 at run time — one artifact
-          # covers the RX 6600 / 6700 / 6800 family. Add more archs
-          # here as we get hardware to test on.
-          AMDGPU_TARGETS: "gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100"
+          # Native bytecode for every shipping Radeon arch we'd plausibly
+          # see in IntelNav users' machines. Build cost is ~30 % more
+          # than a narrow target list, but the resulting tarball runs
+          # without HSA_OVERRIDE_GFX_VERSION on any of them. The runtime
+          # `gpu_compat` shim still applies as belt-and-suspenders for
+          # arches that get added to consumer-grade hardware after a
+          # given tarball was cut.
+          # gfx9xx — Vega + CDNA1/2/3 (radeon vii, MI50/100/200/300)
+          # gfx103x — RDNA2 (RX 6x00, e.g. 6600 = gfx1032)
+          # gfx110x — RDNA3 (RX 7x00)
+          # gfx115x — RDNA3.5 (Strix Point APU iGPUs)
+          # gfx120x — RDNA4 (RX 9x00)
+          AMDGPU_TARGETS: "gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1031;gfx1032;gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201"
         run: |
           cmake -B build \
             ${{ env.CMAKE_COMMON }} \
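
The difference between "native kernels" and "covered by the override" comes down to whether an arch appears as a whole entry in AMDGPU_TARGETS. A minimal sketch of that check follows. The list is copied from the Configure step above, but `has_native_target` and `describe_arch` are hypothetical helpers written for illustration. They are not part of IntelNav and do not reflect how the `gpu_compat` shim is implemented.

```shell
#!/usr/bin/env bash
# Illustrative only: has_native_target and describe_arch are hypothetical
# helpers, not part of IntelNav or its gpu_compat shim.

# Target list copied from the Configure step in this commit.
AMDGPU_TARGETS="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1031;gfx1032;gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201"

# Return 0 when $1 is a whole entry in the semicolon-separated list.
# Wrapping both sides in ';' prevents partial matches such as gfx103.
has_native_target() {
  case ";${AMDGPU_TARGETS};" in
    *";$1;"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Print a one-line summary for the arch given as $1.
describe_arch() {
  if has_native_target "$1"; then
    echo "$1: native kernels in this build"
  else
    echo "$1: not in target list, runtime override needed"
  fi
}

describe_arch gfx1032
describe_arch gfx1103
```

Run against the previous six-entry list, the same check would put gfx1032 in the second branch, which is the case the removed comment handled with HSA_OVERRIDE_GFX_VERSION=10.3.0.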
