docs(install): note aarch64 wheels are aarch64-sbsa, not L4T (Jetson) (#1941)

neil-the-nowledgeable · claude · web-flow · commit 300a70fd47d2 · 2026-05-07T16:09:59.000-04:00
Adds a WARNING callout after the Linux aarch64 row of the PyPI build-targets table, explaining that:

1. Wheels are built on aarch64-sbsa runners (standard CUDA Toolkit), not the L4T / JetPack runtime that Jetson Orin / Xavier / Thor (on CUDA 12) use.
2. The mismatch surfaces as 'Error named symbol not found in /src/csrc/ops.cu' on the first CUDA op: a symbol-resolution error, NOT a kernel-image-for-device error. The cubins ARE binary-compatible with the device per Ampere-family binary compat (sm_80 SASS runs on sm_87 hardware natively).
3. Working options on Jetson: on-device source build, or third-party prebuilt from Jetson AI Lab.

References #1218 and #1930 for the original error reports, and #1939 for the empirical confirmation that the fault is the toolchain delta, not the arch list (sm_80-only cubin built on-device runs cleanly on sm_87 hardware).

Co-authored-by: neil-the-nowledgable <254185769+neil-the-nowledgable@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
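
The distinction this commit documents (aarch64-sbsa versus L4T on the same CPU architecture) can be checked before installing. The snippet below is a minimal sketch and not part of this commit or of bitsandbytes. It assumes that L4T / JetPack images ship the release file `/etc/nv_tegra_release`, and the helper names `is_probably_jetson_l4t` and `install_hint` are hypothetical.

```python
import platform
from pathlib import Path

# Assumption: L4T / JetPack images provide this release file.
# It is not mentioned in the commit; treat it as a heuristic only.
L4T_RELEASE_FILE = "/etc/nv_tegra_release"


def is_probably_jetson_l4t(machine=None, release_file=L4T_RELEASE_FILE):
    """Heuristically detect an L4T (Jetson) system.

    Both sbsa servers and Jetson boards report 'aarch64', so the CPU
    architecture alone cannot tell them apart; the release file is the
    extra (assumed) signal.
    """
    machine = platform.machine() if machine is None else machine
    return machine == "aarch64" and Path(release_file).exists()


def install_hint(machine=None, release_file=L4T_RELEASE_FILE):
    """Return a short, human-readable installation suggestion."""
    if is_probably_jetson_l4t(machine, release_file):
        return (
            "L4T detected: build bitsandbytes from source on-device "
            "or use a third-party prebuilt wheel"
        )
    return "pip install bitsandbytes"


if __name__ == "__main__":
    print(install_hint())
```

The `machine` and `release_file` parameters exist only so the check can be exercised on any host; on a real system both can be left at their defaults.
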
diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
@@ -66,6 +66,14 @@ Use `pip` or `uv` to install the latest release:
 pip install bitsandbytes
 ```
 
+> [!WARNING]
+> **NVIDIA Jetson (L4T / JetPack) — source build required.** The `Linux aarch64` wheels above are built on aarch64-sbsa runners (server-class ARM with the standard CUDA Toolkit). They are **not compatible** with the L4T runtime on Jetson devices (Orin Nano / NX / AGX, Xavier, Thor on CUDA 12), even though both are aarch64 and even though the cubins are binary-compatible with the device's compute capability (e.g., `sm_80` cubin runs on `sm_87` hardware via Ampere-family binary compat — see [NVIDIA's docs on binary compatibility](https://developer.nvidia.com/blog/understanding-ptx-the-assembly-language-of-cuda-gpu-computing/#binary_compatibility)). The mismatch is at the CUDA library / ABI layer (JetPack ships its own CUDA Toolkit and system libraries), and surfaces as a runtime symbol-resolution error like `Error named symbol not found in /src/csrc/ops.cu` on the first CUDA op.
+>
+> **Two working options on Jetson:**
+>
+> 1. **Source build on-device.** Use the [Compile from Source](#cuda-compile) instructions below, passing your device's compute capability explicitly (sm_87 for Orin family, sm_72 for Xavier). On an Orin Nano Super: `cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=87 . && make -j4 && pip install .`
+> 2. **Third-party prebuilt** from [Jetson AI Lab's package index](https://pypi.jetson-ai-lab.io/) (e.g., `pypi.jetson-ai-lab.io/jp6/cu126/bitsandbytes/`).
+
 ### Compile from Source[[cuda-compile]]
 
 > [!TIP]