feat: add SM87 and SM110 compute capability support for Jetson devices by thomas-hiddenpeak · Pull Request #844 · huggingface/text-embeddings-inference

thomas-hiddenpeak · 2026-03-12T16:54:38Z

Summary

Add CUDA compute capability support for Jetson devices:

- SM87 (8.7): Jetson Orin series (Ampere architecture)
- SM110 (11.0): Jetson Thor (Blackwell architecture)

Changes

`backends/candle/src/compute_cap.rs`

- Add `(87, 87) => true` and `(110, 110) => true` match arms for dedicated Jetson binary support
- Add comprehensive test coverage for SM87 and SM110 cross-compatibility

`Dockerfile-jetson` (new)

- Dedicated Dockerfile for Jetson devices using official NVIDIA L4T images
- Build stage: nvcr.io/nvidia/l4t-jetpack:r36.4.0 (JetPack 6.1, CUDA 12.6)
- Runtime stage: nvcr.io/nvidia/l4t-cuda:12.6.11-runtime (minimal aarch64 runtime)
- Currently builds the SM87 (Jetson Orin) target only
- SM110 (Jetson Thor) will be added when JetPack 7 L4T images become available on NGC

`jetson-entrypoint.sh` (new)

- Runtime GPU detection and routing to the SM87 binary
- Simplified CUDA compat handling for the L4T environment
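
As a rough illustration of the detection step: assuming `nvidia-smi --query-gpu=compute_cap --format=csv` prints a header line followed by one value per GPU (as the entrypoint snippet quoted later in this review expects), the parsing could be factored into a small function that reads the CSV on stdin. `parse_compute_cap` is a hypothetical name used only for this sketch; it is not part of the PR.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): reads the CSV output of
# `nvidia-smi --query-gpu=compute_cap --format=csv` on stdin and prints
# the first GPU's compute capability with the dot removed ("8.7" -> "87").
parse_compute_cap() {
    # Line 1 is the CSV header; line 2 is the first GPU.
    # tr also strips stray spaces and carriage returns.
    sed -n '2p' | tr -d '. \r'
}

# Canned input instead of a real nvidia-smi call:
printf 'compute_cap\n8.7\n' | parse_compute_cap   # prints 87
```

Reading from stdin keeps the parsing testable on machines without a GPU; the real script would pipe `nvidia-smi` into it.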

Testing

- SM87: Verified on Jetson AGX Orin 64GB with JetPack 6.1 / CUDA 12.6
- Both embedding and reranking models work correctly

Notes

- No changes to existing Dockerfile-cuda-all or cuda-all-entrypoint.sh
- Jetson support is fully isolated in its own Dockerfile as suggested by @alvarobartt
- SM110 runtime matching is included in compute_cap.rs to future-proof; the Dockerfile will be extended when official L4T images for JetPack 7 (Jetson Thor) are available

alvarobartt

Thanks @thomas-hiddenpeak!

Given that both sm_87 and sm_110 (i.e., compute capabilities 8.7 and 11.0) are only for Jetson devices, don't you think it'd be better to create a custom Dockerfile for those, Dockerfile-jetson, that builds both targets and comes with an entrypoint that forwards to one or the other?

You're already familiar with it but see https://developer.nvidia.com/cuda/gpus

alvarobartt · 2026-03-12T18:40:55Z

+elif [ ${compute_cap} -ge 80 -a ${compute_cap} -lt 87 ]; then
    exec text-embeddings-router-80 "$@"
+elif [ ${compute_cap} -eq 87 ]; then
+    exec text-embeddings-router-87 "$@"


Note that this will cause 8.9 to be treated as unsupported, whereas 8.9 should fall back to the 8.0 target when a dedicated target is not built, as there's no performance loss between those compute capabilities (Ampere and Ada Lovelace).

thomas-hiddenpeak · 2026-03-15

Good catch! This has been resolved: we've reverted all changes to cuda-all-entrypoint.sh, so the SM89 → SM80 fallback behavior is untouched and works as before.
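
For readers following the thread, the concern can be sketched as a branch-ordering problem. The snippet below is a hypothetical illustration, not the contents of cuda-all-entrypoint.sh (which this PR ultimately leaves unchanged): it checks the dedicated 8.7 target first and keeps the whole 8.x range as a fallback to the 8.0 build, so 8.9 is still routed. `select_router` is an assumed name, and the function prints the binary name instead of calling `exec` so it can be run anywhere.

```shell
#!/bin/sh
# Hypothetical sketch: prints the router binary that would be exec'ed.
# Only the branches discussed in this thread are modeled.
select_router() {
    cap="$1"
    if [ "$cap" -eq 87 ]; then
        # Dedicated target is matched before the range below
        echo "text-embeddings-router-87"
    elif [ "$cap" -ge 80 ] && [ "$cap" -lt 90 ]; then
        # Any other 8.x capability, including 8.9, falls back to the 8.0 build
        echo "text-embeddings-router-80"
    else
        echo "unsupported compute capability: $cap" >&2
        return 1
    fi
}

select_router 89   # prints text-embeddings-router-80
```

Narrowing the range to `-lt 87` instead, as in the diff above, would leave 8.9 without a matching branch unless a later branch handled it.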

thomas-hiddenpeak · 2026-03-12T19:31:50Z

Hi @alvarobartt, great suggestion! You're right — both SM87 (Jetson Orin) and SM110 (Jetson Thor) are Jetson-specific compute capabilities, so a dedicated Dockerfile-jetson makes much more sense than modifying the existing Dockerfile-cuda-all.

I've updated the PR accordingly:

- Reverted all changes to Dockerfile-cuda-all, cuda-all-entrypoint.sh, and matrix.json
- Added Dockerfile-jetson: builds both SM87 and SM110 targets in a single image, with architecture-aware sccache (supports aarch64 natively)
- Added jetson-entrypoint.sh: detects GPU compute capability at runtime and routes to the correct binary
- Kept the compute_cap.rs changes for SM87/SM110 matching logic and tests

The Jetson support is now fully isolated — no changes to existing Docker infrastructure. Please take another look when you get a chance. Thanks! 🤗

- Add (87, 87) and (110, 110) match arms in compute_cap.rs for dedicated Jetson Orin (SM87) and Jetson Thor (SM110) binary support
- Add Dockerfile-jetson: builds SM87 binary using L4T JetPack r36.4.0 (CUDA 12.6) as build base and l4t-cuda:12.6.11-runtime for deployment
- Add jetson-entrypoint.sh: runtime GPU detection for Jetson Orin (SM87)
- Add comprehensive test coverage for SM87 and SM110 cross-compatibility

thomas-hiddenpeak · 2026-03-12T19:50:28Z

Hi @alvarobartt, a quick follow-up on the latest changes — I realized my previous comment wasn't fully accurate, so I want to clarify what's been updated:

What changed since my last comment:

- Base images: Switched from nvidia/cuda to official NVIDIA L4T images, which are purpose-built for Jetson:
  - Build stage: nvcr.io/nvidia/l4t-jetpack:r36.4.0 (JetPack 6.1, CUDA 12.6, aarch64)
  - Runtime stage: nvcr.io/nvidia/l4t-cuda:12.6.11-runtime (lightweight aarch64 runtime)
- SM110 removed from Dockerfile: My previous comment mentioned building both SM87 and SM110 targets, but after further research, JetPack 7 (which supports Jetson Thor / SM110) doesn't have official L4T container images on NGC yet. So the Dockerfile now only builds SM87 (Jetson Orin) to keep things working and verifiable today.
- SM110 kept in compute_cap.rs: The runtime matching logic for SM110 is still included as a forward-looking addition; it will be ready to use once JetPack 7 L4T images become available and we extend the Dockerfile.

What stays the same:

- Fully isolated Jetson support via Dockerfile-jetson + jetson-entrypoint.sh (no changes to existing Docker infrastructure)
- compute_cap.rs SM87/SM110 matching logic and comprehensive tests

Sorry for any confusion from the previous message. Let me know if you have any questions or further suggestions!

alvarobartt

Thanks @thomas-hiddenpeak, but I'd also add the 11.0 compute capability to Dockerfile-jetson so the entrypoint routes to either 8.7 or 11.0 depending on the host compute capability, right? I.e., this image should work for both rather than being dedicated to only one target.

alvarobartt · 2026-03-13T18:25:43Z

+
+# On Jetson L4T, CUDA libraries are provided by the host via nvidia-container-runtime.
+# Add compat path if it exists.
+if [ -d /usr/local/cuda/compat ]; then
+    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
+fi


AFAIK this might not be required, right?

Suggested change

-# On Jetson L4T, CUDA libraries are provided by the host via nvidia-container-runtime.
-# Add compat path if it exists.
-if [ -d /usr/local/cuda/compat ]; then
-    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
-fi

thomas-hiddenpeak · 2026-03-15

You're right: on Jetson L4T, CUDA libraries are mounted from the host by nvidia-container-runtime, so the compat path is typically not needed. I've simplified this to just a conditional guard in case the path exists, but I'm happy to remove it entirely if you prefer. The original cuda-all-entrypoint.sh has more elaborate version-checking logic for the standard CUDA images, but that doesn't apply here.
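
If the guard is kept, one detail worth noting is that `"/usr/local/cuda/compat:${LD_LIBRARY_PATH}"` leaves a trailing colon when the variable is empty or unset, and the glibc dynamic loader treats an empty entry as the current working directory. A minimal sketch of a guard that avoids this is shown below; `prepend_lib_path` is a hypothetical helper name, not something from the PR.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH only if the
# directory exists, without leaving an empty (trailing-colon) entry behind.
prepend_lib_path() {
    dir="$1"
    if [ -d "$dir" ]; then
        # ${VAR:+:${VAR}} expands to ":<old value>" only when VAR is non-empty
        LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
        export LD_LIBRARY_PATH
    fi
}

# Usage as it might appear in the entrypoint (a no-op if the path is absent):
prepend_lib_path /usr/local/cuda/compat
```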

alvarobartt · 2026-03-13T18:26:27Z

+if [ ${compute_cap} -eq 87 ]; then
+    exec text-embeddings-router-87 "$@"


Shouldn't we also build the target for text-embeddings-router-110 and add it here based on the host compute capability?

thomas-hiddenpeak · 2026-03-15

Yes, that would be the ideal setup! Unfortunately, SM110 (Jetson Thor) can't be reliably built with the currently available toolchain: CUDA versions before 13.0 misidentify SM110 as SM101 at compile time, and there are other incompatibilities. JetPack 7 (with CUDA 13.x) will fix this, but its official L4T container images aren't on NGC yet.

The compute_cap.rs changes already include the (110, 110) match arm, so once JetPack 7 images are available, adding SM110 here will be a straightforward update: just add the second build target and this entrypoint route.
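
To make the follow-up concrete, the two-target route could look roughly like the sketch below. This is an assumption about a future state: `text-embeddings-router-110` is the name proposed in the review comment above and is not built by this PR's Dockerfile, and `select_jetson_router` is a hypothetical function that prints the binary name rather than calling `exec`.

```shell
#!/bin/sh
# Hypothetical sketch of a two-target Jetson route. The SM110 binary does
# not exist in the image built by this PR; it is shown for illustration only.
select_jetson_router() {
    case "$1" in
        87)  echo "text-embeddings-router-87" ;;    # Jetson Orin
        110) echo "text-embeddings-router-110" ;;   # Jetson Thor (future)
        *)
            echo "unsupported Jetson compute capability: $1" >&2
            return 1
            ;;
    esac
}

select_jetson_router 87   # prints text-embeddings-router-87
```

In the real entrypoint the matched branch would `exec` the binary with `"$@"`, as the existing SM87 branch does.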

thomas-hiddenpeak · 2026-03-15T14:18:53Z

Hi @alvarobartt, thanks for the feedback! I totally agree that ideally the Jetson image should support both SM87 and SM110 in one build.

However, there's a practical blocker for SM110 (Jetson Thor) right now:

- CUDA version limitation: The current L4T images ship with CUDA 12.6 (JetPack 6.x). CUDA versions before 13.0 misidentify SM110 as SM101 at compile time, and there are other incompatibilities as well, so we can't reliably build a working SM110 binary with the available toolchain today.
- No JetPack 7 L4T images on NGC yet: JetPack 7, which will ship with CUDA 13.x and proper SM110 support, doesn't have official L4T container images available on NGC at this time.

What I'd suggest:

- Merge the current PR with SM87 Dockerfile support (Jetson Orin, verified and working)
- The compute_cap.rs changes already include SM110 matching logic, so the runtime side is ready
- Once JetPack 7 L4T images land on NGC, we can add SM110 to Dockerfile-jetson as a straightforward follow-up: just adding the second build target and entrypoint route

This way we get Jetson Orin users unblocked now without shipping something untested for Jetson Thor. Does that sound reasonable to you?

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds Jetson (L4T/JetPack) container support by building a compute-capability-specific router binary and selecting it at runtime, plus extending compute capability matching in the Candle backend.

Changes:

- Introduces a Jetson-specific multi-stage Dockerfile that builds an SM87 CUDA-enabled text-embeddings-router binary.
- Adds a Jetson entrypoint script that detects GPU compute capability and dispatches to the correct binary.
- Extends compute_cap_matching rules and tests for additional compute capability values.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `jetson-entrypoint.sh` | New runtime detection/dispatch script for Jetson images. |
| `backends/candle/src/compute_cap.rs` | Updates compute capability matching logic and expands unit tests. |
| `Dockerfile-jetson` | New Jetson build/runtime image that produces and ships the SM87 router binary. |


+FROM nvcr.io/nvidia/l4t-cuda:12.6.11-runtime AS base
+
+ARG DEFAULT_USE_FLASH_ATTENTION=True
+
+ENV HUGGINGFACE_HUB_CACHE=/data \
+    PORT=80 \
+    USE_FLASH_ATTENTION=$DEFAULT_USE_FLASH_ATTENTION \
+    LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"


+if [ -d /usr/local/cuda/compat ]; then
+    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
+fi


+RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+    ca-certificates \
+    libssl-dev \
+    curl \
+    && rm -rf /var/lib/apt/lists/*


+COPY --chmod=775 jetson-entrypoint.sh entrypoint.sh
+
+ENTRYPOINT ["./entrypoint.sh"]


+if ! command -v nvidia-smi &>/dev/null; then
+    echo "Error: 'nvidia-smi' command not found."
+    exit 1
+fi


+    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
+fi
+
+compute_cap=$(nvidia-smi --query-gpu=compute_cap --format=csv | sed -n '2p' | sed 's/\.//g')


+        (87, 87) => true,
        (89, 89) => true,
        (90, 90) => true,
        (100, 100) => true,
+        (110, 110) => true,
+        (120, 120) => true,
        (120..=121, 120) => true,


thomas-hiddenpeak force-pushed the split-sm87-support branch 2 times, most recently from 09ea208 to ad8f9ab (March 12, 2026 17:23)

thomas-hiddenpeak changed the title ~~Add SM87 Docker image support~~ Add SM87 and SM110 compute capability support Mar 12, 2026

This was referenced Mar 12, 2026

[Feature] Add native Qwen3-Reranker support and SM87 compute capability #795

Closed

add Jetson Orin support #467

Closed

alvarobartt requested changes Mar 12, 2026

alvarobartt added this to the v1.10.0 milestone Mar 12, 2026

thomas-hiddenpeak force-pushed the split-sm87-support branch from ad8f9ab to dcbfbae (March 12, 2026 19:31)

thomas-hiddenpeak changed the title ~~Add SM87 and SM110 compute capability support~~ feat: add SM87 and SM110 compute capability support for Jetson devices Mar 12, 2026

thomas-hiddenpeak force-pushed the split-sm87-support branch from dcbfbae to bf9ecfc (March 12, 2026 19:45)

alvarobartt reviewed Mar 13, 2026

Merge branch 'main' into split-sm87-support

fcb4fe9

Copilot AI review requested due to automatic review settings June 4, 2026 10:55

Copilot AI reviewed Jun 4, 2026

thomas-hiddenpeak wants to merge 2 commits into huggingface:main from thomas-hiddenpeak:split-sm87-support
