Commit 5cda4f1

localai-bot and mudler authored
fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950)
* fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels

The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm from
pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources] with no
version pins. That mirror started shipping torch 2.11.0 next to a
vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's c10
ABI, so uv landed on the mismatched pair and vllm crashed at import:

    ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

(c10::MessageLogger's constructor signature changed between torch 2.10 and
2.11; the vllm wheel referenced the 2.10 form, the installed libc10.so
exported only the 2.11 form.)

Since torch 2.11 (April 2026) PyPI publishes its own aarch64 + cu130
manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose
Requires-Dist locks torch==2.11.0 / torchvision==0.26.0 /
torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent set
automatically, so the mirror and the [tool.uv.sources] pinning are no
longer needed.

flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but
vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside the
main wheel, so the Dao-AILab package isn't required at runtime.

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-*.txt

pyproject.toml only existed because uv pip install -r requirements.txt
doesn't honor [tool.uv.sources]. The previous commit dropped
[tool.uv.sources] (PyPI now serves the aarch64 + cu130 wheels directly),
so the file no longer carries any logic the requirements-*.txt path can't.

Replace with the same two-file pattern every other build profile uses:

- requirements-l4t13.txt (accelerate / torch / transformers /
  bitsandbytes - matches cublas13's split)
- requirements-l4t13-after.txt (vllm; runs after the base resolve so the
  cu130 torch wheel lands first)

install.sh's whole l4t13 elif branch goes away; libbackend.sh's
installRequirements already handles the requirements-install.txt
build-deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the
runProtogen call, so falling through to the standard else: branch produces
identical install behavior with less surface area.

No functional change at install time - same wheels, same order.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels

Same root cause and same fix as the vllm backend in the previous commits:
the L4T13 sglang and vllm-omni backends both pulled their accelerator
stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no version pins,
so they would silently land on the same torch 2.11 vs cu130-built wheel
ABI mismatch the moment the mirror published an out-of-sync pair.

sglang
------
- Drop pyproject.toml + [tool.uv.sources]. The historical comment said
  the [all] extra was unsafe on aarch64 because of decord, but sglang
  0.5.x now uses `decord2` on aarch64/arm/armv7l (which ships cp312
  aarch64 wheels), so we can match cublas13's sglang[all]>=0.5.11 pin and
  stop being capped at the 0.5.1.post2 the L4T mirror shipped. That
  unblocks Gemma 4 / MTP recipes on Jetson Thor.
- New requirements-l4t13.txt mirrors the cublas13 split (accelerate /
  torch / torchvision / torchaudio / transformers),
  requirements-l4t13-after.txt carries sglang[all]>=0.5.11.
- install.sh's l4t13 elif branch goes away; falls through to the standard
  installRequirements path.

vllm-omni
---------
- requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and
  drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its own
  vllm_flash_attn fa2 + fa3 internally).
- install.sh's l4t13 vllm-install branch collapses into the cublas13
  branch since both now just run `pip install vllm --torch-backend=auto`
  against PyPI.
- --index-strategy=unsafe-best-match is dropped from the top-level l4t13
  guard; without the L4T mirror in the picture it had no purpose.

The from-source vllm-omni install on top still keeps its existing
`sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt` workaround -
fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn.

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): drop [all] extra on l4t13 - xatlas has no aarch64 wheel

CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the
[diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist
depends on scikit_build_core without declaring it in
build-system.requires, so under --no-build-isolation uv can't build it
from source:

    × Failed to build `xatlas==0.0.11`
    ├─▶ The build backend returned an error
    ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1)
        ModuleNotFoundError: No module named 'scikit_build_core'
    help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12)
          depends on `xatlas`

Upstream sglang explicitly gates st_attn and vsa on
`platform_machine != aarch64` inside the same [diffusion] extra but forgot
xatlas - same class of bug that bit the old decord pin.

Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base
sglang.srt symbols (Engine, ServerArgs, FunctionCallParser,
ReasoningParser); the [all] extras are optional accelerators not required
at import time.

cublas13 (x86_64) keeps [all] because xatlas has x86_64 wheels there.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
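
The ABI mismatch described in the first commit comes down to the installed torch version differing from the one the vllm wheel was built against. The following is a minimal sketch of a post-install check, assuming the wheel's METADATA carries plain `Requires-Dist: torch==X.Y.Z` lines. `pinned_version` and `abi_pin_matches` are hypothetical helpers written for illustration; they are not part of LocalAI, uv, or pip.

```shell
# Hypothetical helper: print the version that a wheel's METADATA pins for
# package $1, reading "Requires-Dist:" lines from stdin. Only the simple
# "name==version" form is handled; $1 is used as a literal regex fragment.
pinned_version() {
    local pkg="$1"
    sed -n -E "s/^Requires-Dist: ${pkg}[[:space:]]*==[[:space:]]*([0-9][^;[:space:]]*).*/\1/p" \
        | head -n 1
}

# Hypothetical helper: succeed only when the installed version ($2) equals
# the version pinned for package $1 in the METADATA text on stdin.
abi_pin_matches() {
    local pkg="$1" installed="$2" pinned
    pinned="$(pinned_version "${pkg}")"
    [ -n "${pinned}" ] && [ "${pinned}" = "${installed}" ]
}
```

Feeding the vllm wheel's METADATA file on stdin, `abi_pin_matches torch 2.11.0` would succeed only when the wheel pins `torch==2.11.0`. Where that METADATA file lives depends on the venv layout, so the path is left to the caller.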
1 parent c500461 commit 5cda4f1

10 files changed

Lines changed: 61 additions & 204 deletions

backend/python/sglang/install.sh

Lines changed: 5 additions & 30 deletions
@@ -36,15 +36,11 @@ fi
 # flash-attn-4 4.0 stable lands.
 EXTRA_PIP_INSTALL_FLAGS+=" --prerelease=allow"
 
-# JetPack 7 / L4T arm64 wheels are built for cp312 and shipped via
-# pypi.jetson-ai-lab.io. Bump the venv Python so the prebuilt sglang
-# wheel resolves cleanly. The actual install on l4t13 goes through
-# pyproject.toml (see the elif branch below) so [tool.uv.sources] can
-# pin only torch/torchvision/torchaudio/sglang to the jetson-ai-lab
-# index — leaving PyPI as the path for transitive deps like
-# markdown-it-py / anthropic / propcache that the L4T mirror's proxy
-# 503s on. No --index-strategy flag here: the explicit index keeps the
-# scoping clean.
+# JetPack 7 / L4T arm64 sglang + torch wheels come straight from PyPI now
+# (torch 2.11+ ships aarch64 + cu130 manylinux wheels and sglang 0.5.11+
+# ships a cp312 aarch64 wheel pinned to that torch). They're cp312-only,
+# so bump the venv Python accordingly.
+# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
 if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
     PYTHON_VERSION="3.12"
     PYTHON_PATCH="12"
@@ -110,27 +106,6 @@ if [ "x${BUILD_TYPE}" == "x" ] || [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
     fi
     uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} .
     popd
-# L4T arm64 (JetPack 7): drive the install through pyproject.toml so that
-# [tool.uv.sources] can pin torch/torchvision/torchaudio/sglang to the
-# jetson-ai-lab index, while everything else (transitive deps and
-# PyPI-resolvable packages like transformers / accelerate) comes from
-# PyPI. Bypasses installRequirements because uv pip install -r
-# requirements.txt does not honor sources — see
-# backend/python/sglang/pyproject.toml for the rationale. Mirrors the
-# equivalent path in backend/python/vllm/install.sh.
-elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
-    ensureVenv
-    if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then
-        export C_INCLUDE_PATH="${C_INCLUDE_PATH:-}:$(_portable_dir)/include/python${PYTHON_VERSION}"
-    fi
-    pushd "${backend_dir}"
-    # Build deps first (matches installRequirements' requirements-install.txt
-    # pass — sglang/sgl-kernel sdists need packaging/setuptools-scm in the
-    # venv before they can build under --no-build-isolation).
-    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} -r requirements-install.txt
-    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --requirement pyproject.toml
-    popd
-    runProtogen
 else
     installRequirements
 fi
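
With the elif branch gone, l4t13 relies on the per-profile requirements files picked up by installRequirements. The commit message names three passes: the requirements-install.txt build-deps pass, the base requirements-l4t13.txt resolve, and requirements-l4t13-after.txt. The sketch below only illustrates that ordering under those stated assumptions. `requirement_files_for_profile` is a hypothetical function and is not the actual installRequirements from libbackend.sh, which may consult additional files.

```shell
# Hypothetical sketch, NOT the real installRequirements from libbackend.sh.
# Prints, in the install order described by the commit message, the
# requirements files that exist in directory $1 for build profile $2.
requirement_files_for_profile() {
    local dir="$1" profile="$2" f
    for f in \
        "requirements-install.txt" \
        "requirements-${profile}.txt" \
        "requirements-${profile}-after.txt"; do
        # Missing files are skipped silently rather than treated as errors.
        if [ -f "${dir}/${f}" ]; then
            printf '%s\n' "${f}"
        fi
    done
}
```

Listing the base file before the `-after` file mirrors the stated intent that the cu130 torch wheel lands first and the backend package resolves against it afterwards.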

backend/python/sglang/pyproject.toml

Lines changed: 0 additions & 68 deletions
This file was deleted.
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# sglang 0.5.11+ ships an aarch64 manylinux wheel on PyPI whose Requires-Dist
+# pins torch==2.11.0 / torchaudio==2.11.0, locking an ABI-consistent set with
+# the cu130 torch wheel installed above. 0.5.11 is the floor for Gemma 4
+# support (sgl-project/sglang#21952).
+#
+# The [all] extra is deliberately NOT used on aarch64: it pulls the
+# [diffusion] sub-extra which requires `xatlas`, and xatlas ships no
+# aarch64 wheel and its sdist depends on scikit_build_core without
+# declaring it in build-system.requires — so under --no-build-isolation
+# uv can't build it. Upstream sglang gates st_attn and vsa on
+# platform_machine != aarch64 in the diffusion extra but forgot xatlas.
+# Plain `sglang` carries everything backend.py uses (Engine, ServerArgs,
+# FunctionCallParser, ReasoningParser); the [all] extras are optional
+# accelerators not required at import time.
+sglang>=0.5.11
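
The split between plain `sglang` on aarch64 and `sglang[all]` on x86_64 can be summarised as a small selection rule. This is a sketch only: `sglang_requirement_for_arch` is a hypothetical helper, and the repository expresses the same choice through separate per-profile requirements files rather than a function. Falling back to the plain package on architectures the commit does not mention is an assumption made here for safety.

```shell
# Hypothetical helper: print the sglang requirement spec for machine
# architecture $1 (as reported by `uname -m`).
sglang_requirement_for_arch() {
    case "$1" in
        # x86_64 keeps [all]: xatlas has wheels there per the commit message.
        x86_64) printf '%s\n' 'sglang[all]>=0.5.11' ;;
        # aarch64 drops [all] because xatlas cannot be installed; any other
        # architecture also gets the plain package (assumption).
        *) printf '%s\n' 'sglang>=0.5.11' ;;
    esac
}
```

A caller could use it as `sglang_requirement_for_arch "$(uname -m)"`.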
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# JetPack 7 / L4T arm64 + CUDA 13. Since PyTorch 2.11 (April 2026), PyPI ships
+# aarch64 + cu130 manylinux wheels for torch/torchvision/torchaudio directly,
+# so we no longer need a custom --extra-index-url for the L4T mirror.
+# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
+accelerate
+torch
+torchvision
+torchaudio
+transformers

backend/python/vllm-omni/install.sh

Lines changed: 9 additions & 16 deletions
@@ -13,14 +13,14 @@ else
 fi
 
 # Handle l4t build profiles (Python 3.12, pip fallback) if needed.
-# unsafe-best-match is required on l4t13 because the jetson-ai-lab index
-# lists transitive deps at limited versions — without it uv pins to the
-# first matching index and fails to resolve a compatible wheel from PyPI.
+# Since PyTorch 2.11 (April 2026) PyPI ships aarch64 + cu130 manylinux wheels
+# directly for torch/torchvision/torchaudio and an aarch64 vllm wheel pinned
+# to that torch, so the jetson-ai-lab mirror is no longer needed.
+# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
 if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
     PYTHON_VERSION="3.12"
     PYTHON_PATCH="12"
     PY_STANDALONE_TAG="20251120"
-    EXTRA_PIP_INSTALL_FLAGS="${EXTRA_PIP_INSTALL_FLAGS:-} --index-strategy=unsafe-best-match"
 fi
 
 if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
@@ -42,18 +42,11 @@ if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
     else
         uv pip install vllm==0.14.0 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
     fi
-elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
-    # JetPack 7 / L4T arm64 cu130 — vllm comes from the prebuilt SBSA wheel
-    # at jetson-ai-lab. Version is unpinned: the index ships whatever build
-    # matches the cu130/cp312 ABI. unsafe-best-match lets uv fall through
-    # to PyPI for transitive deps not present on the jetson-ai-lab index.
-    if [ "x${USE_PIP}" == "xtrue" ]; then
-        pip install vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
-    else
-        uv pip install --index-strategy=unsafe-best-match vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
-    fi
-elif [ "x${BUILD_PROFILE}" == "xcublas13" ]; then
-    # vllm 0.19+ defaults to cu130 wheels on PyPI, no extra index needed.
+elif [ "x${BUILD_PROFILE}" == "xcublas13" ] || [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
+    # cublas13 (x86_64) and l4t13 (aarch64) both pull vllm from PyPI now:
+    # vllm 0.19+ defaults to cu130 wheels on x86_64 and vllm 0.20+ ships an
+    # aarch64 manylinux wheel pinned to torch==2.11.0. No extra index needed
+    # in either case.
     if [ "x${USE_PIP}" == "xtrue" ]; then
         pip install vllm --torch-backend=auto
     else
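
The merged condition above treats cublas13 and l4t13 as one group of profiles that install vllm straight from PyPI. A `case` statement is one way to express that grouping if more profiles join it later. This is an illustrative sketch only: `uses_pypi_vllm` is a hypothetical name and does not appear in install.sh.

```shell
# Hypothetical helper: succeed when build profile $1 installs vllm from
# PyPI without an extra index, per the merged branch in the hunk above.
uses_pypi_vllm() {
    case "$1" in
        cublas13|l4t13) return 0 ;;
        *) return 1 ;;
    esac
}
```

The branch condition could then read `elif uses_pypi_vllm "${BUILD_PROFILE}"; then`, which behaves the same as the two-way `||` test for these two profiles.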

backend/python/vllm-omni/requirements-l4t13.txt

Lines changed: 6 additions & 2 deletions
@@ -1,11 +1,15 @@
---extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
+# JetPack 7 / L4T arm64 + CUDA 13. PyPI ships aarch64 + cu130 manylinux wheels
+# for torch/torchvision/torchaudio directly since PyTorch 2.11 (April 2026),
+# so no custom index is needed. flash-attn is dropped here: PyPI has no
+# aarch64 wheel for it, but vLLM 0.20+ bundles its own vllm_flash_attn
+# (fa2 + fa3) inside the main wheel, so it is not required at runtime.
+# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
 accelerate
 torch
 torchvision
 torchaudio
 transformers
 bitsandbytes
-flash-attn
 diffusers
 librosa
 soundfile
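
Since the point of this change is that no custom index remains, a guard against an index option creeping back into a requirements file may be useful. The sketch below assumes the pip requirements-file option spellings `--extra-index-url`, `--index-url` and `-i`. `uses_only_default_index` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical check: succeed (0) when requirements file $1 declares no
# custom package index, fail (1) when it does, and return 2 if the file
# does not exist. Commented-out options are ignored.
uses_only_default_index() {
    local file="$1"
    [ -f "${file}" ] || return 2
    if grep -Eq -e '^[[:space:]]*(--extra-index-url|--index-url|-i)([[:space:]]|=)' "${file}"; then
        return 1
    fi
    return 0
}
```

Run against the old version of this file it would fail on the first line; against the new version it would pass.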

backend/python/vllm/install.sh

Lines changed: 5 additions & 27 deletions
@@ -43,14 +43,11 @@ if [ "x${BUILD_PROFILE}" == "xcublas13" ]; then
     EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
 fi
 
-# JetPack 7 / L4T arm64 wheels (torch, vllm, flash-attn) live on
-# pypi.jetson-ai-lab.io and are built for cp312, so bump the venv Python
-# accordingly. JetPack 6 keeps cp310 + USE_PIP=true.
-#
-# l4t13 uses pyproject.toml (see the elif branch below) to pin only the
-# L4T-specific wheels to the jetson-ai-lab index via [tool.uv.sources].
-# That keeps PyPI as the resolution path for transitive deps like
-# anthropic/openai/propcache, which the L4T mirror's proxy 503s on.
+# JetPack 7 / L4T arm64 vllm + torch wheels come straight from PyPI now
+# (torch 2.11+ ships aarch64 + cu130 manylinux wheels and vllm 0.20+ ships
+# an aarch64 wheel pinned to that torch). They're cp312-only, so bump the
+# venv Python accordingly. JetPack 6 keeps cp310 + USE_PIP=true.
+# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
 if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
     USE_PIP=true
 fi
@@ -103,25 +100,6 @@ if [ "x${BUILD_TYPE}" == "xintel" ]; then
     export CMAKE_PREFIX_PATH="$(python -c 'import site; print(site.getsitepackages()[0])'):${CMAKE_PREFIX_PATH:-}"
     VLLM_TARGET_DEVICE=xpu uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --no-deps .
     popd
-# L4T arm64 (JetPack 7): drive the install through pyproject.toml so that
-# [tool.uv.sources] can pin torch/vllm/flash-attn/torchvision/torchaudio
-# to the jetson-ai-lab index, while everything else (transitive deps and
-# PyPI-resolvable packages like transformers) comes from PyPI. Bypasses
-# installRequirements because uv pip install -r requirements.txt does not
-# honor sources — see backend/python/vllm/pyproject.toml for the rationale.
-elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
-    ensureVenv
-    if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then
-        export C_INCLUDE_PATH="${C_INCLUDE_PATH:-}:$(_portable_dir)/include/python${PYTHON_VERSION}"
-    fi
-    pushd "${backend_dir}"
-    # Build deps first (matches installRequirements' requirements-install.txt
-    # pass — fastsafetensors and friends need pybind11 in the venv before
-    # their sdists can build under --no-build-isolation).
-    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} -r requirements-install.txt
-    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --requirement pyproject.toml
-    popd
-    runProtogen
 # FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in
 # requirements-cpu-after.txt and compiles vllm locally against the host's
 # actual CPU. Not used by default because it takes ~30-40 minutes, but

backend/python/vllm/pyproject.toml

Lines changed: 0 additions & 61 deletions
This file was deleted.
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# vLLM 0.20+ ships an aarch64 manylinux wheel on PyPI whose Requires-Dist pins
+# torch==2.11.0 / torchvision==0.26.0 / torchaudio==2.11.0, locking an ABI-
+# consistent set with the cu130 torch wheel installed above.
+vllm
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# JetPack 7 / L4T arm64 + CUDA 13. Since PyTorch 2.11 (April 2026), PyPI ships
+# aarch64 + cu130 manylinux wheels for torch/torchvision/torchaudio directly,
+# so we no longer need a custom --extra-index-url for the L4T mirror.
+# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
+accelerate
+torch
+transformers
+bitsandbytes
