| 1 | +--- |
| 2 | +title: "Containers" |
| 3 | +description: "Run open models locally, offline, and on edge devices" |
| 4 | +icon: "docker" |
| 5 | +--- |
| 6 | + |
| 7 | +Every open-source model on Bytez is available as a Docker image. Pull it, run it, and make requests to `localhost`. |
| 8 | + |
| 9 | +<AccordionGroup> |
| 10 | + <Accordion title="Pull an Image" icon="download"> |
| 11 | + Images are hosted on Docker Hub under the `bytez` namespace. The image name matches the model ID with `/` replaced by `_`. |
| 12 | + |
| 13 | + ```bash |
| 14 | + # Pattern: bytez/{org}_{model-name} |
| 15 | + docker pull bytez/qwen_qwen3-4b |
| 16 | + ``` |
| 17 | + |
| 18 | + <Tip> |
| 19 | + Find model IDs at [bytez.com/models](https://bytez.com/models) or via the [List Models API](/http-reference/list/models). |
| 20 | + </Tip> |
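
The naming rule above can be sketched as a small shell helper. `model_id_to_image` is a hypothetical name, not part of any Bytez tooling, and the mixed-case ID `Qwen/Qwen3-4B` is used only for illustration. The lowercasing step is an assumption inferred from the example image name and from Docker's requirement that repository names be lowercase.

```bash
# Hypothetical helper: derive the Docker image name from a model ID.
# Replaces "/" with "_" and lowercases the result (lowercasing is an
# assumption; Docker repository names must be lowercase).
model_id_to_image() {
  printf 'bytez/%s\n' "$(printf '%s' "$1" | tr '/' '_' | tr '[:upper:]' '[:lower:]')"
}

model_id_to_image "Qwen/Qwen3-4B"   # prints: bytez/qwen_qwen3-4b
```

The result can be passed straight to `docker pull`, for example `docker pull "$(model_id_to_image "Qwen/Qwen3-4B")"`.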
| 21 | + </Accordion> |
| 22 | + |
| 23 | + <Accordion title="Start a Container" icon="play"> |
| 24 | + ```bash |
| 25 | + docker run -it \ |
| 26 | + -e KEY=YOUR_BYTEZ_KEY \ |
| 27 | + -e PORT=8000 \ |
| 28 | + -p 8000:8000 \ |
| 29 | + bytez/qwen_qwen3-4b |
| 30 | + ``` |
| 31 | + |
| 32 | + Get your API key at [bytez.com/settings](https://bytez.com/settings). |
| 33 | + |
| 34 | + **Environment Variables** |
| 35 | + |
| 36 | + | Variable | Required | Default | Description | |
| 37 | + |----------|----------|---------|-------------| |
| 38 | + | `KEY` | Yes | - | Your Bytez API key (for analytics and update notifications) | |
| 39 | + | `PORT` | No | `80` | Port the server listens on inside the container | |
| 40 | + | `DEVICE` | No | `auto` | Where to load weights: `auto`, `cuda`, or `cpu` | |
| 41 | + |
| 42 | + **Docker Options** |
| 43 | + |
| 44 | + | Option | Description | |
| 45 | + |--------|-------------| |
| 46 | + | `--gpus all` | Enable GPU acceleration (requires NVIDIA drivers and the NVIDIA Container Toolkit) |
| 47 | + | `-v /local/path:/server/model` | Mount a local directory for weight caching | |
| 48 | + | `-p HOST:CONTAINER` | Map container port to host port | |
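
One detail worth making explicit: the value of `PORT` must equal the container side of `-p HOST:CONTAINER`, otherwise the published port points at nothing. The sketch below only prints a `docker run` command and never executes it. `bytez_run_cmd` and the `BYTEZ_KEY` shell variable are hypothetical names chosen for this example.

```bash
# Hypothetical helper: print (not execute) a docker run command in which
# PORT and the container side of -p always agree.
# Usage: bytez_run_cmd IMAGE [HOST_PORT] [CONTAINER_PORT]
# The body runs in a subshell, so its variables do not leak into the caller.
bytez_run_cmd() (
  image="$1"
  host_port="${2:-8000}"
  container_port="${3:-80}"   # 80 is the documented default for PORT
  printf 'docker run -it -e KEY=%s -e PORT=%s -p %s:%s %s\n' \
    "${BYTEZ_KEY:-YOUR_BYTEZ_KEY}" "$container_port" \
    "$host_port" "$container_port" "$image"
)

bytez_run_cmd bytez/qwen_qwen3-4b 8000 8000
# With BYTEZ_KEY unset, prints:
# docker run -it -e KEY=YOUR_BYTEZ_KEY -e PORT=8000 -p 8000:8000 bytez/qwen_qwen3-4b
```

Omitting the third argument keeps the server on its default port 80 inside the container while still exposing it on the chosen host port.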
| 49 | + </Accordion> |
| 50 | + |
| 51 | + <Accordion title="Common Configurations" icon="gear"> |
| 52 | + **Run on GPU** |
| 53 | + |
| 54 | + ```bash |
| 55 | + docker run -it \ |
| 56 | + --gpus all \ |
| 57 | + -e KEY=YOUR_BYTEZ_KEY \ |
| 58 | + -e PORT=8000 \ |
| 59 | + -p 8000:8000 \ |
| 60 | + bytez/qwen_qwen3-4b |
| 61 | + ``` |
| 62 | + |
| 63 | + **Cache Weights Locally** |
| 64 | + |
| 65 | + Avoid re-downloading weights on every run by mounting a local directory: |
| 66 | + |
| 67 | + ```bash |
| 68 | + docker run -it \ |
| 69 | + --gpus all \ |
| 70 | + -v /path/to/cache:/server/model \ |
| 71 | + -e HF_HOME=/server/model \ |
| 72 | + -e KEY=YOUR_BYTEZ_KEY \ |
| 73 | + -e PORT=8000 \ |
| 74 | + -p 8000:8000 \ |
| 75 | + bytez/qwen_qwen3-4b |
| 76 | + ``` |
| 77 | + |
| 78 | + <Warning> |
| 79 | + If you plan to start the same model container more than once, cache the weights, especially for large models (70B+). Without a cache, every run downloads the weights again, which can take hours.
| 80 | + </Warning> |
| 81 | + |
| 82 | + **Force CPU-Only** |
| 83 | + |
| 84 | + ```bash |
| 85 | + docker run -it \ |
| 86 | + -e DEVICE=cpu \ |
| 87 | + -e KEY=YOUR_BYTEZ_KEY \ |
| 88 | + -e PORT=8000 \ |
| 89 | + -p 8000:8000 \ |
| 90 | + bytez/qwen_qwen3-4b |
| 91 | + ``` |
| 92 | + </Accordion> |
| 93 | + |
| 94 | + <Accordion title="Run Inference" icon="bolt"> |
| 95 | + Once the container is running, send POST requests to `/run`. |
| 96 | + |
| 97 | + **Chat Models** |
| 98 | + |
| 99 | + ```bash |
| 100 | + curl -X POST http://localhost:8000/run \ |
| 101 | + -H "Content-Type: application/json" \ |
| 102 | + -d '{ |
| 103 | + "messages": [ |
| 104 | + { "role": "system", "content": "You are a helpful assistant" }, |
| 105 | + { "role": "user", "content": "What is the capital of France?" } |
| 106 | + ], |
| 107 | + "stream": false, |
| 108 | + "params": { |
| 109 | + "max_new_tokens": 100, |
| 110 | + "temperature": 0.7 |
| 111 | + } |
| 112 | + }' |
| 113 | + ``` |
| 114 | + |
| 115 | + **Streaming** |
| 116 | + |
| 117 | + Set `"stream": true` to receive tokens as they're generated: |
| 118 | + |
| 119 | + ```bash |
| 120 | + curl -X POST http://localhost:8000/run \ |
| 121 | + -H "Content-Type: application/json" \ |
| 122 | + -d '{ |
| 123 | + "messages": [ |
| 124 | + { "role": "user", "content": "Write a haiku about coding" } |
| 125 | + ], |
| 126 | + "stream": true |
| 127 | + }' |
| 128 | + ``` |
| 129 | + |
| 130 | + </Accordion> |
| 131 | + |
| 132 | + <Accordion title="Request Body by Task" icon="list"> |
| 133 | + Different model tasks require different inputs. Here's a quick reference: |
| 134 | + |
| 135 | + | Task | Required Fields | Example | |
| 136 | + |------|-----------------|---------| |
| 137 | + | `chat` | `messages` | `{"messages": [{"role": "user", "content": "Hi"}]}` | |
| 138 | + | `text-generation` | `text` | `{"text": "Once upon a time"}` | |
| 139 | + | `image-text-to-text` | `messages` with image | `{"messages": [{"role": "user", "content": [{"type": "text", "text": "Describe this"}, {"type": "image", "url": "..."}]}]}` | |
| 140 | + | `text-to-image` | `text` | `{"text": "A cat in space"}` | |
| 141 | + | `automatic-speech-recognition` | `url` or `base64` | `{"url": "https://example.com/audio.wav"}` | |
| 142 | + | `feature-extraction` | `text` | `{"text": "Embed this sentence"}` | |
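
When the prompt comes from a shell variable, the request body has to be assembled with care so that quotes in the prompt do not break the JSON. Below is a minimal sketch for the `chat` row of the table. `chat_payload` is a hypothetical helper, and it escapes only backslashes and double quotes, so prompts containing newlines or control characters need a real JSON encoder.

```bash
# Hypothetical helper: build a minimal chat request body for /run.
# Escapes backslashes and double quotes only.
chat_payload() (
  escaped="$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
  printf '{"messages": [{"role": "user", "content": "%s"}], "stream": false}\n' "$escaped"
)

chat_payload 'Say "hi"'
# prints: {"messages": [{"role": "user", "content": "Say \"hi\""}], "stream": false}

# Example use against a running container (not executed here):
#   curl -X POST http://localhost:8000/run \
#     -H "Content-Type: application/json" \
#     -d "$(chat_payload 'What is the capital of France?')"
```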
| 143 | + |
| 144 | + <Card title="Full HTTP Reference" icon="book" href="/http-reference/overview"> |
| 145 | + See complete request/response params and examples for all 30+ task types. |
| 146 | + </Card> |
| 147 | + </Accordion> |
| 148 | + |
| 149 | + <Accordion title="Run Offline (Air-Gapped)" icon="wifi-slash"> |
| 150 | + Create a self-contained image with the weights baked in, so no internet connection is required at runtime.
| 151 | + |
| 152 | + **Step 1: Run the container once to download weights** |
| 153 | + |
| 154 | + ```bash |
| 155 | + docker run \ |
| 156 | + -e KEY=YOUR_BYTEZ_KEY \ |
| 157 | + -e PORT=8000 \ |
| 158 | + -p 8000:8000 \ |
| 159 | + --name my-model \ |
| 160 | + bytez/qwen_qwen3-4b |
| 161 | + ``` |
| 162 | + |
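| 163 | + Wait for the model to fully load (you'll see logs indicating it's ready), then press `Ctrl+C` to stop the container. Do not mount a cache directory with `-v` in this step: `docker commit` does not capture data stored in mounted volumes, so the weights would be left out of the new image.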
| 164 | + |
| 165 | + **Step 2: Save as a new image** |
| 166 | + |
| 167 | + ```bash |
| 168 | + docker commit my-model my-model-offline |
| 169 | + ``` |
| 170 | + |
| 171 | + **Step 3: Run offline (no internet needed)** |
| 172 | + |
| 173 | + ```bash |
| 174 | + docker run \ |
| 175 | + -e KEY=YOUR_BYTEZ_KEY \ |
| 176 | + -e PORT=8000 \ |
| 177 | + -p 8000:8000 \ |
| 178 | + my-model-offline |
| 179 | + ``` |
| 180 | + |
| 181 | + <Tip> |
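| 182 | + To verify that the image needs no internet access, add `--network none` to the run command and confirm in the logs that the model still loads. With `--network none` the published port is not reachable from the host, so remove the flag again for normal use.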
| 183 | + </Tip> |
| 184 | + |
| 185 | + **Optional: Export for another machine** |
| 186 | + |
| 187 | + ```bash |
| 188 | + # Save to a file |
| 189 | + docker save my-model-offline -o my-model-offline.tar |
| 190 | + |
| 191 | + # Load on another machine |
| 192 | + docker load -i my-model-offline.tar |
| 193 | + ``` |
| 194 | + </Accordion> |
| 195 | + |
| 196 | + <Accordion title="Troubleshooting" icon="wrench"> |
| 197 | + **Container won't start** |
| 198 | + |
| 199 | + Check that Docker is installed and running. For GPU support, ensure you have NVIDIA drivers and the NVIDIA Container Toolkit installed. |
| 200 | + |
| 201 | + **Out of memory** |
| 202 | + |
| 203 | + Try `DEVICE=auto` to split the model across GPU and CPU memory. For large models, you may need more VRAM or system RAM. |
| 204 | + |
| 205 | + **Slow first request** |
| 206 | + |
| 207 | + The first request loads model weights into memory, so it takes longer than later requests. Use weight caching (`-v` mount) to speed up container restarts.
| 208 | + |
| 209 | + **Model only works with specific DEVICE setting** |
| 210 | + |
| 211 | + Some models support only one of the `DEVICE` values (`auto`, `cuda`, or `cpu`). If one setting fails, try another.
| 212 | + </Accordion> |
| 213 | +</AccordionGroup> |
| 214 | + |
| 215 | +--- |
| 216 | + |
| 217 | +## Need Help? |
| 218 | + |
| 219 | +<CardGroup cols={2}> |
| 220 | + <Card title="Discord" icon="discord" href="https://discord.com/invite/Z723PfCFWf"> |
| 221 | + Get live support from the community |
| 222 | + </Card> |
| 223 | +</CardGroup> |