Commit edd4446

Add docker docs
1 parent 0151c29 commit edd4446

2 files changed

Lines changed: 225 additions & 1 deletion

docs/docs.json

Lines changed: 2 additions & 1 deletion
```diff
@@ -125,7 +125,8 @@
 "/model-api/docs/task/document-question-answering",
 "/model-api/docs/task/visual-question-answering"
 ]
-}
+},
+"model-api/docs/containers"
 ]
 }
 ]
```

docs/model-api/docs/containers.mdx

Lines changed: 223 additions & 0 deletions
---
title: "Containers"
description: "Run open models locally, offline, and on edge devices"
icon: "docker"
---

Every open-source model on Bytez is available as a Docker image. Pull it, run it, and send requests to `localhost`.

<AccordionGroup>
<Accordion title="Pull an Image" icon="download">
Images are hosted on Docker Hub under the `bytez` namespace. The image name is the model ID in lowercase, with `/` replaced by `_`.

```bash
# Pattern: bytez/{org}_{model-name}
docker pull bytez/qwen_qwen3-4b
```

<Tip>
Find model IDs at [bytez.com/models](https://bytez.com/models) or via the [List Models API](/http-reference/list/models).
</Tip>
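
To script pulls, a small shell function can derive the image name from a model ID. This is a minimal sketch: `model_to_image` is a hypothetical helper, and it assumes the name is also lowercased (Docker requires lowercase repository names) and that the example model's ID is `Qwen/Qwen3-4B`.

```bash
# Hypothetical helper: map a Bytez model ID to its Docker image name.
# Assumption: besides "/" -> "_", the name is lowercased, as Docker requires.
model_to_image() {
  local id
  id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr '/' '_')
  printf 'bytez/%s\n' "$id"
}

# Usage: pull the image for a model ID
# docker pull "$(model_to_image Qwen/Qwen3-4B)"
```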
</Accordion>

<Accordion title="Start a Container" icon="play">
```bash
docker run -it \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/qwen_qwen3-4b
```

Get your API key at [bytez.com/settings](https://bytez.com/settings).

**Environment Variables**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `KEY` | Yes | - | Your Bytez API key (used for analytics and update notifications) |
| `PORT` | No | `80` | Port the server listens on inside the container |
| `DEVICE` | No | `auto` | Where to load weights: `auto`, `cuda`, or `cpu` |

**Docker Options**

| Option | Description |
|--------|-------------|
| `--gpus all` | Enable GPU acceleration (requires NVIDIA drivers and the NVIDIA Container Toolkit on the host) |
| `-v /local/path:/server/model` | Mount a local directory for weight caching |
| `-p HOST:CONTAINER` | Map a host port to the container port; `CONTAINER` should match `PORT` |
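
The variables and options above can be combined in a small wrapper function. This is a sketch, not part of the Bytez tooling: `run_model` and the `BYTEZ_KEY` shell variable are hypothetical names, and setting `DOCKER=echo` prints the command instead of running it, which is handy for checking the flags.

```bash
# Hypothetical wrapper: start a Bytez container with the documented settings.
# Usage: BYTEZ_KEY=... run_model IMAGE [PORT] [DEVICE]
# Set DOCKER=echo to print the command instead of running it.
run_model() {
  local image="$1" port="${2:-8000}" device="${3:-auto}"
  if [ -z "${BYTEZ_KEY:-}" ]; then
    echo "run_model: BYTEZ_KEY is not set" >&2
    return 1
  fi
  # The server listens on PORT inside the container, so publish that same port.
  "${DOCKER:-docker}" run -it \
    -e KEY="$BYTEZ_KEY" \
    -e PORT="$port" \
    -e DEVICE="$device" \
    -p "$port:$port" \
    "$image"
}

# Usage:
# BYTEZ_KEY=YOUR_BYTEZ_KEY run_model bytez/qwen_qwen3-4b 8000 cpu
```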
</Accordion>

<Accordion title="Common Configurations" icon="gear">
**Run on GPU**

```bash
docker run -it \
  --gpus all \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/qwen_qwen3-4b
```

**Cache Weights Locally**

Avoid re-downloading weights on every run by mounting a local directory at `/server/model` and pointing `HF_HOME` (the Hugging Face cache location) at it:

```bash
docker run -it \
  --gpus all \
  -v /path/to/cache:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/qwen_qwen3-4b
```

<Warning>
Caching is highly recommended for large models (70B+ parameters) that you plan to run more than once. Without it, each new container downloads the weights again, which can take hours.
</Warning>

**Force CPU-Only**

```bash
docker run -it \
  -e DEVICE=cpu \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/qwen_qwen3-4b
```
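
If the same script has to run on machines with and without a GPU, it can choose between the GPU and CPU-only configurations above. A minimal sketch, assuming a working `nvidia-smi` is a good enough signal; it does not check for the NVIDIA Container Toolkit, which `--gpus all` also needs. `gpu_flags` is a hypothetical helper.

```bash
# Hypothetical helper: print GPU flags when an NVIDIA GPU looks usable,
# otherwise force CPU mode. Does not verify the NVIDIA Container Toolkit.
gpu_flags() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    printf '%s\n' "--gpus all"
  else
    printf '%s\n' "-e DEVICE=cpu"
  fi
}

# The substitution is deliberately unquoted so it splits into separate flags:
# docker run -it $(gpu_flags) -e KEY=YOUR_BYTEZ_KEY -e PORT=8000 -p 8000:8000 bytez/qwen_qwen3-4b
```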
</Accordion>

<Accordion title="Run Inference" icon="bolt">
Once the container is running, send POST requests to `/run`.

**Chat Models**

```bash
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant" },
      { "role": "user", "content": "What is the capital of France?" }
    ],
    "stream": false,
    "params": {
      "max_new_tokens": 100,
      "temperature": 0.7
    }
  }'
```

**Streaming**

Set `"stream": true` to receive tokens as they're generated. The `-N` flag turns off curl's output buffering so tokens are printed as soon as they arrive:

```bash
curl -N -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Write a haiku about coding" }
    ],
    "stream": true
  }'
```
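
Because a container can take a while to start, scripts often wait for the server before sending requests. A minimal sketch: `wait_for_server` is a hypothetical helper that only checks that something answers HTTP on the port. That may happen before the weights are loaded, so the first `/run` request can still be slow.

```bash
# Hypothetical helper: wait up to LIMIT seconds for an HTTP server on localhost:PORT.
# Any HTTP response counts (curl without --fail exits 0 even on 404/405);
# connection errors make curl exit non-zero, so we retry once per second.
# Usage: wait_for_server PORT LIMIT
wait_for_server() {
  local port="$1" limit="$2" elapsed=0
  while [ "$elapsed" -lt "$limit" ]; do
    if curl -s -o /dev/null --max-time 2 "http://localhost:${port}/"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Usage:
# wait_for_server 8000 300 && echo "server is up"
```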

</Accordion>

<Accordion title="Request Body by Task" icon="list">
Different model tasks require different inputs. Here's a quick reference:

| Task | Required Fields | Example |
|------|-----------------|---------|
| `chat` | `messages` | `{"messages": [{"role": "user", "content": "Hi"}]}` |
| `text-generation` | `text` | `{"text": "Once upon a time"}` |
| `image-text-to-text` | `messages` with image | `{"messages": [{"role": "user", "content": [{"type": "text", "text": "Describe this"}, {"type": "image", "url": "..."}]}]}` |
| `text-to-image` | `text` | `{"text": "A cat in space"}` |
| `automatic-speech-recognition` | `url` or `base64` | `{"url": "https://example.com/audio.wav"}` |
| `feature-extraction` | `text` | `{"text": "Embed this sentence"}` |
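
For `automatic-speech-recognition`, a local file can be sent through the `base64` field instead of a URL. A sketch under assumptions: `audio_body` is a hypothetical helper, and it assumes the field takes the raw base64 string with no `data:` URI prefix; the HTTP reference has the exact format.

```bash
# Hypothetical helper: build an ASR request body from a local audio file.
# Assumption: "base64" expects the raw base64 string (no data: URI prefix).
audio_body() {
  # base64 wraps long output across lines; strip the newlines to keep valid JSON.
  printf '{"base64": "%s"}\n' "$(base64 < "$1" | tr -d '\n')"
}

# Usage: pipe the body straight into the request
# audio_body speech.wav | curl -X POST http://localhost:8000/run \
#   -H "Content-Type: application/json" --data-binary @-
```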

<Card title="Full HTTP Reference" icon="book" href="/http-reference/overview">
See the complete request and response parameters, with examples, for all 30+ task types.
</Card>
</Accordion>

<Accordion title="Run Offline (Air-Gapped)" icon="wifi-slash">
Create a self-contained image with the weights baked in, so no internet access is required at runtime.

**Step 1: Run the container once to download weights**

```bash
docker run \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  --name my-model \
  bytez/qwen_qwen3-4b
```

Wait for the model to finish loading (the logs show when it's ready), then stop the container with `Ctrl+C`, or run `docker stop my-model` from another terminal.

**Step 2: Save as a new image**

```bash
docker commit my-model my-model-offline
```

**Step 3: Run offline (no internet needed)**

```bash
docker run \
  -e KEY=YOUR_BYTEZ_KEY \
  -e PORT=8000 \
  -p 8000:8000 \
  my-model-offline
```

<Tip>
To confirm the image runs without internet access, add `--network none` to the run command and check that the model still loads.
</Tip>

**Optional: Export for another machine**

```bash
# Save to a file
docker save my-model-offline -o my-model-offline.tar

# Load on another machine
docker load -i my-model-offline.tar
```
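
Once the model from step 1 has finished loading, the remaining steps can be scripted. A sketch: `bake_offline_image` is a hypothetical helper that stops the container, commits it as `NAME-offline`, and saves the result to `NAME-offline.tar`; setting `DOCKER=echo` prints the commands instead of running them.

```bash
# Hypothetical helper: stop a loaded container, commit it, and export the image.
# Usage: bake_offline_image CONTAINER_NAME
# Set DOCKER=echo to print the docker commands instead of running them.
bake_offline_image() {
  local name="$1" docker="${DOCKER:-docker}"
  # Each step runs only if the previous one succeeded.
  "$docker" stop "$name" &&
    "$docker" commit "$name" "${name}-offline" &&
    "$docker" save "${name}-offline" -o "${name}-offline.tar"
}

# Usage: bake_offline_image my-model
```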
</Accordion>

<Accordion title="Troubleshooting" icon="wrench">
**Container won't start**

Check that Docker is installed and running. For GPU support, make sure the NVIDIA drivers and the NVIDIA Container Toolkit are installed.

**Out of memory**

Use `DEVICE=auto` (the default) so the model can be split across GPU and CPU memory; if you set `DEVICE=cuda`, switch back to `auto`. Large models may still need more VRAM or system RAM.

**Slow first request**

The first request loads the model weights into memory, so later requests are faster. Cache weights with a `-v` mount to speed up container restarts.

**Model only works with a specific DEVICE setting**

Some models support only one `DEVICE` value. If the model fails with one of `auto`, `cuda`, or `cpu`, try another.
</Accordion>
</AccordionGroup>

---

## Need Help?

<CardGroup cols={2}>
<Card title="Discord" icon="discord" href="https://discord.com/invite/Z723PfCFWf">
Get live support from the community
</Card>
</CardGroup>
