Date: 2026-04-13 (updated 2026-04-16) Target: GPD Pocket 4, AMD Ryzen AI 9 HX 370, XDNA 2 NPU Purpose: Determine the fastest viable path to NPU inference for AgentSpyBoo Phase 2.
Driver unblocked, inference runtime still blocked. A patched amdxdna.ko built at ~/builds/amdxdna-patched/ (root-caused by a background agent, POWER_OFF precheck fix for Strix Point) now loads clean on cold boot — xrt-smi examine reports the NPU device and flm validate passes green. However, FastFlowLM still fails on actual inference — it cannot handle protocol-7 opcodes used by the Qwen3/GGUF models we need. The NPU is unblocked at the driver level, blocked at the inference runtime level.
Recommended path remains FastFlowLM once it supports the required opcodes. Integration with AgentSpyBoo is a one-line base-URL change. The driver blocker documented below is now resolved; see the addendum at the end for current state.
The Arch/CachyOS NPU stack is unexpectedly mature: flm (FastFlowLM v0.9.38), xrt-smi (XRT 2.21.75), and the xrt-plugin-amdxdna runtime are all packaged in the official extra repo and already installed on the GPD, with llama3.2:1b pre-pulled. However the amdxdna kernel driver currently fails to bind to the NPU PCI device (c6:00.1) on kernel 6.19.12-1-cachyos with aie2_smu_init: Access power failed, ret -22 — the SMU power handshake fails, hardware probe aborts, and xrt-smi examine reports 0 devices found. Until that probe error is resolved, no NPU runtime can run, regardless of which user-space stack we pick. Recommended path is FastFlowLM once the driver binds (one base-URL swap in src/main.rs and AgentSpyBoo runs on NPU); the immediate Phase 2.0 blocker is firmware/kernel, not user-space.
FastFlowLM, contingent on unblocking the kernel driver probe.
Rationale: flm is already installed via pacman (extra/fastflowlm 0.9.38-1), exposes an OpenAI-compatible REST server (flm serve), and llama3.2:1b is already pulled. If/when amdxdna binds to 0000:c6:00.1, integration with AgentSpyBoo is a one-line base-URL change in src/main.rs (from http://127.0.0.1:13305 to FastFlowLM's port). No Rust ecosystem work, no ONNX export, no Quark quantization needed.
The other two paths are strictly worse on this machine right now: ONNX Runtime isn't even installed in the gaia venv (only onnx 1.18.0 is), so the ort + Vitis AI EP path requires building/sourcing onnxruntime-vitisai from scratch on Arch (untested territory, GCC 14 build failures noted in xdna-driver#1017). IREE + amd-aie has no Arch packages and no install footprint on the GPD at all.
$ uname -r
6.19.12-1-cachyos
$ lsmod | grep -i xdna # before modprobe
(empty)
$ ls /dev/accel/
ls: cannot access '/dev/accel/': No such file or directory
$ ls /sys/class/accel/
(empty directory exists)
$ modinfo amdxdna | head
filename: /lib/modules/6.19.12-1-cachyos/kernel/drivers/accel/amdxdna/amdxdna.ko.zst
firmware: amdnpu/1502_00/npu.sbin
firmware: amdnpu/17f0_10/npu.sbin
firmware: amdnpu/17f0_11/npu.sbin
firmware: amdnpu/17f0_10/npu.sbin
intree: Y
depends: gpu-sched
alias: pci:v00001022d000017F0sv*sd*bc*sc*i*
$ ls /lib/firmware/amdnpu
1502_00 17f0_10 17f0_11 # note: 17f0_20 firmware is referenced by modinfo but NOT present on disk
$ sudo modprobe amdxdna # passwordless sudo works on GPD
$ lsmod | grep xdna
amdxdna 200704 0
gpu_sched 73728 2 amdxdna,amdgpu
$ sudo dmesg | grep -iE "xdna|npu" | tail
amdxdna 0000:c6:00.1: [drm] *ERROR* aie2_smu_exec: smu cmd 4 failed, 0xff
amdxdna 0000:c6:00.1: [drm] *ERROR* aie2_smu_init: Access power failed, ret -22
amdxdna 0000:c6:00.1: [drm] *ERROR* aie2_hw_start: failed to init smu, ret -22
amdxdna 0000:c6:00.1: [drm] *ERROR* aie2_init: start npu failed, ret -22
amdxdna 0000:c6:00.1: [drm] *ERROR* amdxdna_probe: Hardware init failed, ret -22
amdxdna 0000:c6:00.1: probe with driver amdxdna failed with error -22
The module loads, claims its PCI alias, attempts to probe 0000:c6:00.1, and fails at the SMU power init step. After the failed probe /dev/accel/ is still absent, /sys/class/accel/ is empty, and the device falls out of the driver. This is the central blocker.
Two suspicious correlations:
- The
17f0_20firmware (which would presumably target the Strix Halo / HX 370 stepping) is referenced bymodinfobut is not present in/lib/firmware/amdnpu/— only1502_00,17f0_10,17f0_11. Possible firmware mismatch. - BIOS version is 2.10 on the GPD G1628-04. NPU enablement on Strix laptops is BIOS-gated (PSP/SMU fuses); a BIOS update from GPD that flips NPU enable, or a UEFI setting, may be required.
$ which xrt-smi xbutil
/usr/bin/xrt-smi
which: no xbutil in (...)
$ pacman -Qs xdna
local/xrt-plugin-amdxdna 1:2.21.75-2.1
Runtime for AIE and FPGA based platforms
$ ls /opt/amd /opt/xilinx /opt/ryzen_ai ~/ryzen_ai
(none exist)
No Ryzen AI SDK install (no /opt/amd, no ~/ryzen_ai). Instead, the community Arch packages ship the equivalents: xrt-plugin-amdxdna provides the userspace runtime, xrt-smi is the diagnostic tool, and fastflowlm ships the inference runtime. This is much cleaner than the Ubuntu Ryzen AI Software install — no 10 GB /opt blob, no GCC version pinning.
xrt-smi examine confirms the runtime sees the driver but no devices:
XRT
Version : 2.21.75
amdxdna Version : 6.19.12-1-cachyos
Device(s) Present
0 devices found
$ ~/gaia-env-312/bin/python3 -c "import onnxruntime as ort; print(ort.__version__); print(ort.get_available_providers())"
ModuleNotFoundError: No module named 'onnxruntime'
$ ~/gaia-env-312/bin/pip list | grep -iE "onnx|vitis|quark|iree"
onnx 1.18.0
onnxruntime is not installed at all in the gaia venv. Only the schema/IR package onnx 1.18.0. No VitisAIExecutionProvider, no quark, no iree. This means the "ort crate + Vitis AI EP" Rust path requires sourcing or building onnxruntime-vitisai from scratch, which is the path the research RESEARCH.md flagged as broken on Arch.
$ pacman -Ss fastflowlm
cachyos-extra-znver4/fastflowlm 0.9.38-1.1 [installed]
Run LLMs on AMD Ryzen AI NPUs
extra/fastflowlm 0.9.38-1 [installed: 0.9.38-1.1]
Run LLMs on AMD Ryzen AI NPUs
$ which flm
/usr/bin/flm
$ flm version
FLM v0.9.38
$ flm validate
[Linux] Kernel: 6.19.12-1-cachyos
[ERROR] No NPU device found.
[Linux] Memlock Limit: infinity
$ flm list
Models:
- deepseek-r1:8b ⏬
- gemma3:1b ⏬
- gemma3:4b ⏬
- gpt-oss:20b ⏬
- llama3.1:8b ⏬
- llama3.2:1b ✅ # <-- already pulled
- llama3.2:3b ⏬
- phi4-mini-it:4b ⏬
- lfm2:1.2b ⏬
- lfm2:2.6b ⏬
- medgemma:4b ⏬
- nanbeige4.1:3b ⏬
... (many more)
FastFlowLM is not just installable, it's already installed and configured with one model pre-pulled. The CachyOS znver4-optimized variant is selected. flm serve will run an OpenAI-compatible REST server. The HX 370 is implicitly supported since both Arch and CachyOS-extra ship it; FastFlowLM's tested model set spans Llama 3.1/3.2, Gemma 3, Phi-4-mini, gpt-oss, DeepSeek-R1, MedGemma, LFM2, and embedding models — much broader than the research notes implied.
flm validate fails purely because of the kernel driver probe failure in section A. The user-space FastFlowLM stack is shovel-ready; only the kernel binding is broken.
$ which iree-compile iree-run-module
(neither found)
$ ~/gaia-env-312/bin/pip list | grep iree
(empty)
Not installed. No Arch package observed for iree-amd-aie. This path would require building from source — not viable as Phase 2.1 starting point given that FastFlowLM is already on disk.
$ ~/gaia-env-312/bin/pip list | grep -iE "quark|amd_quark"
(empty)
Not installed. Not needed for FastFlowLM (it ships pre-quantized GGUF-style model files from its own catalog). Quark would only be needed for the ort+Vitis AI EP path, which is parked.
$ ~/gaia-env-312/bin/python3 -c "import torch; print(torch.__version__, torch.version.hip)"
2.11.0+cu130 None
Surprise finding: the torch installed in gaia-env-312 is 2.11.0+cu130 (CUDA build), not the custom gfx1150 ROCm build documented in MEMORY. The gfx1150 wheel lives under ~/builds/pytorch-gfx1150 and is not active in this venv. Either way: the existing AI stack is GPU+CPU oriented (ROCm gfx1150 wheel, llama.cpp via Lemonade on CPU). No part of the existing stack touches the NPU. Phase 2 is genuinely new territory.
$ lscpu | head
Architecture: x86_64
CPU(s): 24
Model name: AMD Ryzen AI 9 HX 370 w/ Radeon 890M
CPU family: 26
Model: 36
Thread(s)/core: 2
Cores/socket: 12
$ lspci | grep -iE "neural|signal processing"
c6:00.1 Signal processing controller: AMD Strix/Krackan/Strix Halo Neural Processing Unit (rev 10)
$ lspci -k -s c6:00.1
c6:00.1 Signal processing controller: ... Neural Processing Unit (rev 10)
Subsystem: AMD Strix/Krackan/Strix Halo Neural Processing Unit
Kernel modules: amdxdna # <-- module CLAIMS device but probe fails
NPU is physically present at 0000:c6:00.1, advertised as the Strix Halo NPU, rev 10. The driver associates with it via PCI alias but probe fails at SMU init.
amdxdnadriver probe failure on kernel 6.19.12 —aie2_smu_initreturns -22 (EINVAL/access denied from PSP). This is the only thing standing between AgentSpyBoo and an NPU backend. Effort: unknown, 1 hour to 1 week. Resolution candidates, in order of likelihood:- BIOS update from GPD (current 2.10) — Strix NPU is PSP-fused at BIOS level, and a BIOS that doesn't enable NPU SMU access cannot be worked around in software. Highest probability fix.
- Missing
17f0_20firmware referenced by modinfo but absent from/lib/firmware/amdnpu/. Pull fromlinux-firmwaregit or theamdgpu-firmwareAUR package. Low-effort to test, may not be the right firmware ID for this stepping. - Kernel regression. 6.19 mainlined more amdxdna changes; older 6.14-6.17 kernels are reported working on Strix Point. Try the
linux-ltspackage as an A/B test. - xdna-driver issue tracker (github.com/amd/xdna-driver) — search for
aie2_smu_init -22and HX 370 / Strix Point reports. Likely already filed.
- No onnxruntime in gaia venv — only relevant if the FastFlowLM path is later abandoned in favor of ort+Vitis AI EP. Effort to install: small, but the Vitis AI EP build is the hard part (research already flagged Ubuntu/GCC issues). Parked.
- Existing torch 2.11.0+cu130 in gaia-env-312 — not a Phase 2 blocker, but a documentation issue: the venv we thought was the gfx1150 ROCm one is actually a CUDA wheel. Out of scope for this report.
- Are you willing to update the GPD BIOS if GPD has released a newer firmware than 2.10? This is the single highest-leverage fix and the only one that's strictly outside Linux's control.
- Are you willing to boot a different kernel (e.g.
linux-ltsor a 6.14-6.17 series) to A/B test whether this is a kernel-side regression vs. a hardware/firmware-side block? - Should I file an upstream bug at
github.com/amd/xdna-driverwith the dmesg trace as part of Phase 2.1 if no existing report matches? - Is there a known-good NPU workload on a sibling distro (Ubuntu 24.04 + Ryzen AI Software) that we can use as a baseline if we end up needing to dual-boot for confirmation, or do we keep it Arch-pure?
Given the evidence, Phase 2.1 is a kernel/firmware unblock, not a software integration sprint. Concrete steps, in order:
- (15 min) Search xdna-driver and amdxdna issue trackers for
aie2_smu_init -22on Strix Point / HX 370 / 6.19.x. Note any matching reports and proposed fixes. - (15 min) Check linux-firmware git for any
amdnpu/17f0_2*blobs newer than what's in/lib/firmware/amdnpu/. Install if present, retry probe. - (30 min) Check GPD support for BIOS updates beyond 2.10. If a newer release exists with NPU/AIE enablement notes, flag it for the user — do not flash unattended.
- (30 min) Boot
linux-lts(probably 6.12.x), retrysudo modprobe amdxdna && xrt-smi examine. This is the cleanest A/B test for kernel-side regression. - (once /dev/accel exists) Run
flm validate, thenflm serve --port 13306 llama3.2:1b. Hit it withcurl http://127.0.0.1:13306/v1/chat/completions. Verify NPU utilization inxrt-smi. - (15 min) In
src/main.rs, swap the Lemonade base URL for the FastFlowLM URL. Runcargo runagainst the same agent prompt used in Phase 1.5. Diff the latency and tokens/sec against the CPU baseline. - (15 min) Document the win in
PHASE-2-NOTES.md, commit on a newphase-2.0branch, and tag.
Wall clock estimate, assuming the kernel block is unblockable in Linux user-space (i.e. firmware/kernel only): 2-3 hours of focused work. If the only fix is a BIOS update from GPD that doesn't yet exist, Phase 2.1 is PARKED until GPD ships a new BIOS, in which case AgentSpyBoo should be advanced on a different axis (Phase 1.5 polish, mission catalog expansion, output format work) and Phase 2 deferred.
Status: NPU unblocked at driver level, blocked at inference runtime level.
- A background agent root-caused the
aie2_smu_init: Access power failed, ret -22error: the out-of-treeamdxdnadriver's POWER_OFF precheck was incorrect for Strix Point (PCI device17f0). A patched.kowas built at~/builds/amdxdna-patched/on the GPD. - With the patched driver,
modprobe amdxdnanow binds successfully on cold boot.xrt-smi examinereports the NPU device.flm validatepasses green. - Bug filed as
amd/xdna-driver#1257on GitHub. Max Zhen (AMD) responded with follow-up questions; thread closed from our end after correction (see below).
- FastFlowLM cannot handle protocol-7 opcodes used by Qwen3/GGUF models. When
flm serveattempts inference with these models, it errors on unsupported opcodes. This is a FastFlowLM limitation, not a driver issue. - Llama 3.2 1B (protocol-compatible) may work on NPU via
flm serve, but AgentSpyBoo's orchestration LLM is Qwen3-1.7B-GGUF which requires the unsupported opcodes. - Until FastFlowLM adds protocol-7 support (or we switch to a compatible model), NPU inference for AgentSpyBoo remains blocked.
- Filed
amd/xdna-driver#1257— POWER_OFF precheck on Strix Point, cold-boot A/B isolating firmware 1.1.2.64 vs 1.0.0.63. - Also filed
CachyOS/CachyOS-PKGBUILDS#1311— linux-firmware-other symlink inversion affectingamdnpu/17f0_10/firmware path. - Critical correction: initial draft proposed backporting
min_fw_versionfrom the out-of-tree driver, which was wrong — Lizhi Hou's mainline commit75c151ceaacf(merged 2026-02-25) uses a different mechanism (firmware-name fallback tonpu_7.sbin). Draft rewritten owning the mistake.
- Phase 2 CPU track continues as the active path (Lemonade Server, Qwen3-1.7B-GGUF on CPU).
- NPU path is no longer driver-blocked — it's FastFlowLM-blocked. When/if
flmadds protocol-7 support, integration is still a one-line base-URL swap. - The patched
.kois not in the mainline kernel. Cold-booting requiresmodprobewith the patched module each boot until the fix is upstream.