This project provides one-click deployment for GPT-OSS 20B on NVIDIA Jetson devices. It uses the prebuilt Docker image:

```
chenduola6/got-oss-20b:jp6
```

Docker image size: 31.28 GB
Requirements:

- NVIDIA Jetson device with at least 16GB of memory (Jetson RAM is shared between CPU and GPU)
- At least 50GB of available disk space
Supported JetPack/L4T versions:
- JetPack 6.1 -> L4T 36.4.0
- JetPack 6.2 -> L4T 36.4.3
- JetPack 6.2.1 -> L4T 36.4.4
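To confirm which L4T release a device is running before pulling the image, Jetson systems expose it in `/etc/nv_tegra_release` (the fallback message below is only for non-Jetson machines):

```shell
# Print the L4T release string; /etc/nv_tegra_release ships on Jetson
# images. On a non-Jetson machine the file is absent, so fall back to a note.
L4T_RELEASE=$(cat /etc/nv_tegra_release 2>/dev/null || echo "not a Jetson device")
echo "$L4T_RELEASE"
```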
PyPI (recommended):

```
pip install jetson-examples
```

GitHub (developer):

```
git clone https://github.com/Seeed-Projects/jetson-examples
cd jetson-examples
pip install .
```

Run:

```
reComputer run gpt-oss
```

This command pulls the image and starts llama-server in a detached container.
The script waits for /v1/models to become ready before exiting.
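A minimal sketch of that readiness poll, assuming the default port 8080; the script's own implementation may differ:

```shell
# Poll /v1/models until the server answers HTTP 200 or the timeout expires.
wait_for_model() {
  url=$1
  timeout=${2:-300}   # seconds to wait overall
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" 2>/dev/null)
    [ "$code" = "200" ] && return 0
    sleep 2
    waited=$((waited + 2))
  done
  return 1
}

# usage: wait_for_model http://127.0.0.1:8080/v1/models 300
```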
Note: The script auto-detects the available GPU run mode on your Jetson (`--runtime nvidia` or `--gpus all`).

Note: If prompted by the script, allow adding your user to the `docker` group so future runs do not require `sudo docker`. After adding the group, log out and log back in once.

Note: If `curl /v1/models` returns `503 {"message":"Loading model"}`, the model is still loading. First startup can take several minutes.

Note: If startup fails because of memory pressure, add swap space and try again:
```
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```
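After enabling swap, you can confirm it is active (and, optionally, make it persistent):

```shell
# /proc/swaps lists every enabled swap device below its header line.
cat /proc/swaps

# Optional: keep the swap file across reboots with a standard fstab entry.
# echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```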
You can lower memory usage when launching (smaller context window, fewer GPU layers):

```
LLAMA_CTX=512 LLAMA_NGL=16 reComputer run gpt-oss
```

Check that the API is up:

```
curl http://127.0.0.1:8080/v1/models
```

Follow the server logs:

```
docker logs -f gpt-oss
```

To run the image manually instead of via the reComputer CLI, pull it first:

```
docker pull chenduola6/got-oss-20b:jp6
```
```
docker run -it --rm \
  --runtime nvidia \
  --network host \
  --ipc=host \
  chenduola6/got-oss-20b:jp6

# inside the container
cd /root/gpt-oss/llama.cpp
./build/bin/llama-server \
  -m /root/gpt-oss/gguf/gpt-oss-20b-Q4_K.gguf \
  -ngl 20 -c 1024 \
  --host 0.0.0.0 --port 8080
```

Only remove the container (keep the image cache):

```
reComputer clean gpt-oss
```
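Once the server reports ready, the OpenAI-compatible endpoints can be exercised directly. A sketch of a chat request; the model name `gpt-oss-20b` is an assumption — use whatever `/v1/models` returns:

```shell
# Build a chat-completions payload and send it to llama-server.
PAYLOAD='{
  "model": "gpt-oss-20b",
  "messages": [{"role": "user", "content": "Say hello in one sentence."}],
  "max_tokens": 64
}'
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD" || echo "server not reachable"
```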