feat: add SM87 and SM110 compute capability support for Jetson devices #844
thomas-hiddenpeak wants to merge 2 commits into
Force-pushed from 09ea208 to ad8f9ab.
alvarobartt left a comment
Thanks @thomas-hiddenpeak!
Given that both sm_87 and sm_110 (i.e., compute capabilities 8.7 and 11.0) are only used by Jetson devices, don't you think it'd be better to create a custom Dockerfile for them, e.g. Dockerfile-jetson, that builds both targets and comes with an entrypoint that forwards to one or the other?
You're already familiar with it, but see https://developer.nvidia.com/cuda/gpus
```bash
elif [ ${compute_cap} -ge 80 -a ${compute_cap} -lt 87 ]; then
    exec text-embeddings-router-80 "$@"
elif [ ${compute_cap} -eq 87 ]; then
    exec text-embeddings-router-87 "$@"
```
Note that this will cause 8.9 to be treated as unsupported, whereas 8.9 should fall back to the 8.0 compute capability when no dedicated target is built, as there's no performance loss between those compute capabilities (Ampere and Ada Lovelace).
Good catch! This has been resolved — we've reverted all changes to cuda-all-entrypoint.sh, so the SM89 → SM80 fallback behavior is untouched and works as before.
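To illustrate the fallback point, here is a hypothetical Rust sketch of the entrypoint's routing decision as a pure function. The real entrypoint is a shell script, `select_router` is an illustrative name, and only the Ampere/Ada Lovelace branch from the excerpt above is modelled. Because `match` arms are tried in order, a dedicated `87` arm placed before the `80..=89` range adds the Orin target while still sending 8.9 to the 8.0 build.

```rust
/// Hypothetical mirror of an entrypoint `if`/`elif` chain: maps a host
/// compute capability (8.9 is passed as 89) to the router binary to exec.
/// Only the Ampere/Ada Lovelace branch is modelled here.
fn select_router(compute_cap: u32) -> Option<&'static str> {
    match compute_cap {
        // Dedicated target, matched first so it wins over the range below.
        87 => Some("text-embeddings-router-87"),
        // Every other 8.x capability without a dedicated target (8.0, 8.6,
        // 8.9, ...) falls back to the 8.0 build instead of being rejected.
        80..=89 => Some("text-embeddings-router-80"),
        // Anything else is out of scope for this sketch.
        _ => None,
    }
}

fn main() {
    for cap in [80, 86, 87, 89, 90] {
        match select_router(cap) {
            Some(bin) => println!("{cap} -> {bin}"),
            None => println!("{cap} -> not handled in this sketch"),
        }
    }
}
```

In shell terms, this corresponds to checking `-eq 87` before the general 8.x range instead of shrinking that range to `-lt 87`.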
Force-pushed from ad8f9ab to dcbfbae.
Hi @alvarobartt, great suggestion! You're right: both SM87 (Jetson Orin) and SM110 (Jetson Thor) are Jetson-specific compute capabilities, so a dedicated Dockerfile-jetson makes sense. I've updated the PR accordingly.
The Jetson support is now fully isolated, with no changes to the existing Docker infrastructure. Please take another look when you get a chance. Thanks! 🤗
- Add `(87, 87)` and `(110, 110)` match arms in `compute_cap.rs` for dedicated Jetson Orin (SM87) and Jetson Thor (SM110) binary support
- Add `Dockerfile-jetson`: builds the SM87 binary using L4T JetPack r36.4.0 (CUDA 12.6) as the build base and `l4t-cuda:12.6.11-runtime` for deployment
- Add `jetson-entrypoint.sh`: runtime GPU detection for Jetson Orin (SM87)
- Add comprehensive test coverage for SM87 and SM110 cross-compatibility
Force-pushed from dcbfbae to bf9ecfc.
Hi @alvarobartt, a quick follow-up on the latest changes. I realized my previous comment wasn't fully accurate, so I want to clarify what's been updated.
What changed since my last comment:
What stays the same:
Sorry for any confusion from the previous message. Let me know if you have any questions or further suggestions!
alvarobartt left a comment
Thanks @thomas-hiddenpeak! I'd also add the 11.0 compute capability to Dockerfile-jetson so that the entrypoint routes to either 8.7 or 11.0 depending on the host compute capability, right? That is, this image should work for both targets rather than being dedicated to only one.
```bash
# On Jetson L4T, CUDA libraries are provided by the host via nvidia-container-runtime.
# Add compat path if it exists.
if [ -d /usr/local/cuda/compat ]; then
    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
fi
```
AFAIK this might not be required, right?
You're right — on Jetson L4T, CUDA libraries are mounted from the host by nvidia-container-runtime, so the compat path is typically not needed. I've simplified this to just a conditional guard in case the path exists, but happy to remove it entirely if you prefer. The original cuda-all-entrypoint.sh has more elaborate version checking logic for the standard CUDA images, but that doesn't apply here.
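For reference, the guarded prepend can be sketched as a pure function. This is a hypothetical Rust rendering of the shell snippet, not code from the PR, and `prepend_if_dir` is an illustrative name. One detail it handles: with `LD_LIBRARY_PATH` unset, `"/usr/local/cuda/compat:${LD_LIBRARY_PATH}"` ends in a trailing `:`, and an empty entry in that list is generally treated by the glibc dynamic loader as the current directory, so the sketch omits the separator when there is nothing to append.

```rust
use std::path::Path;

/// Illustrative equivalent of:
///   if [ -d "$dir" ]; then export LD_LIBRARY_PATH="$dir:${LD_LIBRARY_PATH}"; fi
/// Returns the value LD_LIBRARY_PATH should have afterwards (`None` = stays unset).
fn prepend_if_dir(dir: &str, current: Option<&str>) -> Option<String> {
    if !Path::new(dir).is_dir() {
        // Guard failed: leave the variable exactly as it was.
        return current.map(str::to_owned);
    }
    match current {
        // Prepend so the directory is searched before the existing entries.
        Some(rest) if !rest.is_empty() => Some(format!("{dir}:{rest}")),
        // Unset or empty: avoid leaving a dangling ':' separator.
        _ => Some(dir.to_owned()),
    }
}

fn main() {
    let current = std::env::var("LD_LIBRARY_PATH").ok();
    let updated = prepend_if_dir("/usr/local/cuda/compat", current.as_deref());
    println!("{:?}", updated);
}
```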
```bash
if [ ${compute_cap} -eq 87 ]; then
    exec text-embeddings-router-87 "$@"
```
Shouldn't we also build the target for text-embeddings-router-110 and add it here based on the host compute capability?
Yes, that would be the ideal setup! Unfortunately SM110 (Jetson Thor) can't be reliably built with the currently available toolchain — CUDA versions before 13.0 misidentify SM110 as SM101 at compile time, and there are other incompatibilities. JetPack 7 (with CUDA 13.x) will fix this, but its official L4T container images aren't on NGC yet.
The compute_cap.rs changes already include the (110, 110) match arm, so once JetPack 7 images are available, adding SM110 here will be a straightforward update — just add the second build target and this entrypoint route.
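One way to reconcile the single-image request with the current SM110 toolchain limitation is a dispatch table that already lists both targets but only execs a binary that is actually shipped in the image. The Rust sketch below is hypothetical and only illustrates the decision logic (the PR's entrypoint is a shell script); `jetson_router` and `select_jetson_router` are illustrative names, and `text-embeddings-router-110` is the binary name proposed in the review, which the current `Dockerfile-jetson` does not build.

```rust
/// Hypothetical Jetson dispatch table: compute capability -> router binary.
fn jetson_router(compute_cap: u32) -> Option<&'static str> {
    match compute_cap {
        87 => Some("text-embeddings-router-87"),   // Jetson Orin
        110 => Some("text-embeddings-router-110"), // Jetson Thor (not built yet)
        _ => None,
    }
}

/// Picks the binary to exec, given the router binaries present in the image.
/// Fails with a clear message for unknown capabilities or missing binaries.
fn select_jetson_router(compute_cap: u32, shipped: &[&str]) -> Result<&'static str, String> {
    let bin = jetson_router(compute_cap).ok_or_else(|| {
        format!("compute capability {compute_cap} is not supported by this image")
    })?;
    if shipped.contains(&bin) {
        Ok(bin)
    } else {
        Err(format!("{bin} is not included in this image"))
    }
}

fn main() {
    // Assumption: only the SM87 binary is shipped today.
    let shipped = ["text-embeddings-router-87"];
    for cap in [87, 110, 72] {
        match select_jetson_router(cap, &shipped) {
            Ok(bin) => println!("{cap}: exec {bin}"),
            Err(e) => println!("{cap}: error: {e}"),
        }
    }
}
```

Adding SM110 later would then only require shipping the extra binary; the routing already exists.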
Hi @alvarobartt, thanks for the feedback! I totally agree that ideally the Jetson image should support both SM87 and SM110 in one build. However, there's a practical blocker for SM110 (Jetson Thor) right now:
What I'd suggest:
This way we get Jetson Orin users unblocked now without shipping something untested for Jetson Thor. Does that sound reasonable to you?
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds Jetson (L4T/JetPack) container support by building a compute-capability-specific router binary and selecting it at runtime, plus extending compute capability matching in the Candle backend.
Changes:
- Introduces a Jetson-specific multi-stage Dockerfile that builds an SM87 CUDA-enabled `text-embeddings-router` binary.
- Adds a Jetson entrypoint script that detects GPU compute capability and dispatches to the correct binary.
- Extends `compute_cap_matching` rules and tests for additional compute capability values.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `jetson-entrypoint.sh` | New runtime detection/dispatch script for Jetson images. |
| `backends/candle/src/compute_cap.rs` | Updates compute capability matching logic and expands unit tests. |
| `Dockerfile-jetson` | New Jetson build/runtime image that produces and ships the SM87 router binary. |
```dockerfile
FROM nvcr.io/nvidia/l4t-cuda:12.6.11-runtime AS base

ARG DEFAULT_USE_FLASH_ATTENTION=True

ENV HUGGINGFACE_HUB_CACHE=/data \
    PORT=80 \
    USE_FLASH_ATTENTION=$DEFAULT_USE_FLASH_ATTENTION \
    LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
```
```bash
if [ -d /usr/local/cuda/compat ]; then
    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
fi
```
```dockerfile
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    libssl-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*

COPY --chmod=775 jetson-entrypoint.sh entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]
```
```bash
if ! command -v nvidia-smi &>/dev/null; then
    echo "Error: 'nvidia-smi' command not found."
    exit 1
fi
```

```bash
    export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"
fi

compute_cap=$(nvidia-smi --query-gpu=compute_cap --format=csv | sed -n '2p' | sed 's/\.//g')
```
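The pipeline above takes the second line of the CSV output (the first GPU row after the `compute_cap` header) and deletes the dot, so `8.7` becomes `87` and `11.0` becomes `110`. A hypothetical Rust equivalent (`parse_compute_cap` is an illustrative name) makes that transformation explicit and returns `None` on unexpected output, where the shell version would instead pass an empty or non-numeric value to the later `[ ... -eq ... ]` tests.

```rust
/// Parses `nvidia-smi --query-gpu=compute_cap --format=csv` output, e.g.
/// "compute_cap\n8.7\n", into the dot-less form used by the entrypoint (87).
/// Like `sed -n '2p'`, only the first GPU row is considered.
fn parse_compute_cap(csv: &str) -> Option<u32> {
    // Line 2 of the output is the first data row; `lines()` also drops "\r".
    let row = csv.lines().nth(1)?.trim();
    // Equivalent of `sed 's/\.//g'`: remove every dot.
    let digits: String = row.chars().filter(|c| *c != '.').collect();
    // Reject anything that is not purely numeric (e.g. "[N/A]" or empty).
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

fn main() {
    let sample = "compute_cap\n8.7\n";
    println!("{:?}", parse_compute_cap(sample));
}
```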
```rust
(87, 87) => true,
(89, 89) => true,
(90, 90) => true,
(100, 100) => true,
(110, 110) => true,
(120, 120) => true,
(120..=121, 120) => true,
```
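The arms above pair a runtime compute capability with the capability a binary was compiled for; reading `(120..=121, 120)` suggests the tuple order is (runtime, compile), which the sketch below assumes. It is a simplified, hypothetical version that models only the arms in this excerpt, so it is not the full `compute_cap_matching` in `backends/candle/src/compute_cap.rs`. Since `(120..=121, 120)` already covers `(120, 120)`, the sketch folds them into one arm.

```rust
/// Simplified, hypothetical compatibility check: can a binary compiled for
/// `compile_cap` run on a GPU reporting `runtime_cap`? Only the arms from
/// the excerpt are modelled; everything else is treated as incompatible.
fn compute_cap_matching(runtime_cap: usize, compile_cap: usize) -> bool {
    match (runtime_cap, compile_cap) {
        (87, 87) => true, // Jetson Orin, dedicated binary
        (89, 89) => true,
        (90, 90) => true,
        (100, 100) => true,
        (110, 110) => true,         // Jetson Thor, dedicated binary
        (120..=121, 120) => true,   // also covers (120, 120)
        _ => false,
    }
}

fn main() {
    for (rt, cc) in [(87, 87), (110, 110), (121, 120), (110, 87), (87, 110)] {
        println!("runtime {rt}, compiled {cc}: {}", compute_cap_matching(rt, cc));
    }
}
```

A `false` here only means the excerpted arms do not cover that pair; the real function may contain further arms not shown in this review excerpt.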
Summary
Add CUDA compute capability support for Jetson devices:
Changes
- `backends/candle/src/compute_cap.rs`
  - `(87, 87) => true` and `(110, 110) => true` match arms for dedicated Jetson binary support
- `Dockerfile-jetson` (new)
  - Build base: `nvcr.io/nvidia/l4t-jetpack:r36.4.0` (JetPack 6.1, CUDA 12.6)
  - Runtime base: `nvcr.io/nvidia/l4t-cuda:12.6.11-runtime` (minimal aarch64 runtime)
- `jetson-entrypoint.sh` (new)

Testing
Notes
- No changes to `Dockerfile-cuda-all` or `cuda-all-entrypoint.sh`
- The `(110, 110)` match arm is added to `compute_cap.rs` to future-proof; the Dockerfile will be extended when official L4T images for JetPack 7 (Jetson Thor) are available