feat: npu workflow #30

Draft
maciejmajek wants to merge 1 commit into development from mm/feat/npu-workflow

@maciejmajek

Member

feat: NPU inference setup (FastFlowLM + AMD Ryzen AI)

Adds an optional, one-command setup for running LLM inference on the AMD Ryzen AI NPU via FastFlowLM, targeting Ryzen AI MAX+ 395 (Strix Halo) machines.

What's in this PR

  • pixi run -e local install-npu-inference: new optional task under the local-inference feature (pixi.toml).
  • scripts/install_npu_inference.sh: idempotent installer that runs four steps:
    1. NPU drivers: adds the lemonade-team/stable PPA and installs libxrt-npu2 + amdxdna-dkms. Skips the step if both are already present, and flags that a reboot is needed when they were freshly installed.
    2. FastFlowLM (from source): installs build deps (ninja, ffmpeg libs, libxrt-dev, etc.), clones RobotecAI/FastFlowLM into inference/FastFlowLM/, builds with the linux-default CMake preset, and installs flm with sudo cmake --install. Skips the step if flm is already on PATH.
    3. memlock limit: checks ulimit -l. If the limit is not already unlimited, appends "* soft memlock unlimited" and "* hard memlock unlimited" to /etc/security/limits.conf and flags that a reboot is needed.
    4. Validate: runs xrt-smi examine and flm validate, capturing their output. On success it prints a clean "✓ NPU visible & ready" / "✓ FLM ready". On failure it prints the captured diagnostics plus a troubleshooting block (is the kernel module loaded? does xrt-smi see the device?) with links to the FastFlowLM and Lemonade Linux guides.
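
For reviewers who want a feel for the idempotent pattern described in the steps above, here is a minimal sketch. It is not the contents of scripts/install_npu_inference.sh: the helper names (have_cmd, ensure_memlock_unlimited, run_check) are hypothetical, and the limits file is passed as an argument so the logic can be tried on a scratch file instead of /etc/security/limits.conf.

```shell
# Sketch only: hypothetical helpers illustrating the skip-if-present,
# append-if-missing, and capture-then-report patterns from the steps above.

# Succeeds if the named command is on PATH (e.g. have_cmd flm).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Append the soft/hard memlock entries to a limits file unless already there.
# Args: $1 = path to the limits file.
# Prints "changed" when at least one line was added, "unchanged" otherwise.
ensure_memlock_unlimited() {
    local limits_file="$1"
    local changed="unchanged"
    local kind
    for kind in soft hard; do
        # Look for an existing "* <kind> memlock unlimited" line, ignoring spacing.
        if ! grep -Eq "^[[:space:]]*\*[[:space:]]+${kind}[[:space:]]+memlock[[:space:]]+unlimited" "$limits_file" 2>/dev/null; then
            printf '* %s memlock unlimited\n' "$kind" >> "$limits_file"
            changed="changed"
        fi
    done
    echo "$changed"
}

# Run a command, capture stdout and stderr, and show them only on failure.
# Args: $1 = label for the status line, remaining args = command to run.
run_check() {
    local label="$1"
    shift
    local output
    if output="$("$@" 2>&1)"; then
        echo "✓ ${label}"
    else
        echo "✗ ${label} failed:"
        echo "$output"
        return 1
    fi
}
```

In a real installer these would presumably be wired up along the lines of: skip the FastFlowLM build when have_cmd flm succeeds, call ensure_memlock_unlimited only when ulimit -l does not report unlimited, and wrap xrt-smi examine and flm validate in run_check. Writing to /etc/security/limits.conf would additionally need root.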

Tested on

AMD Ryzen AI MAX+ 395, Ubuntu 24.04, kernel 6.17.0-1025-oem.

TODO

  • Wire FastFlowLM into the actual AI stack (RAI): add an flm serve task analogous to the serve-llm / serve-vlm llama.cpp tasks, exposing an OpenAI-compatible endpoint.
  • Point the agents/orchestrator config at the NPU endpoint and run an end-to-end demo on NPU inference.
  • Pick and document the model(s) served on the NPU, including the download path.
  • Decide whether to fold modprobe persistence and the IOMMU check into the script or keep them as manual, documented steps.
  • Document install-npu-inference in the setup docs (docs/setup_*.md).

@maciejmajek maciejmajek marked this pull request as draft June 26, 2026 14:14