Add docker docs

NawarA · NawarA · commit edd44464b1ce · 2026-01-06T11:24:51.000-08:00
diff --git a/docs/docs.json b/docs/docs.json
@@ -125,7 +125,8 @@
                   "/model-api/docs/task/document-question-answering",
                   "/model-api/docs/task/visual-question-answering"
                 ]
-              }
+              },
+              "model-api/docs/containers"
             ]
           }
         ]
diff --git a/docs/model-api/docs/containers.mdx b/docs/model-api/docs/containers.mdx
@@ -0,0 +1,223 @@
+---
+title: "Containers"
+description: "Run open models locally, offline, and on edge devices"
+icon: "docker"
+---
+
+Every open-source model on Bytez is available as a Docker image. Pull it, run it, and make requests to `localhost`.
+
+<AccordionGroup>
+  <Accordion title="Pull an Image" icon="download">
+    Images are hosted on Docker Hub under the `bytez` namespace. The image name is the model ID, lowercased, with `/` replaced by `_`.
+
+    ```bash
+    # Pattern: bytez/{org}_{model-name}
+    docker pull bytez/qwen_qwen3-4b
+    ```
+
+    <Tip>
+    Find model IDs at [bytez.com/models](https://bytez.com/models) or via the [List Models API](/http-reference/list/models).
+    </Tip>
+  </Accordion>
+
+  <Accordion title="Start a Container" icon="play">
+    ```bash
+    docker run -it \
+      -e KEY=YOUR_BYTEZ_KEY \
+      -e PORT=8000 \
+      -p 8000:8000 \
+      bytez/qwen_qwen3-4b
+    ```
+
+    Get your API key at [bytez.com/settings](https://bytez.com/settings).
+
+    **Environment Variables**
+
+    | Variable | Required | Default | Description |
+    |----------|----------|---------|-------------|
+    | `KEY` | Yes | - | Your Bytez API key (for analytics and update notifications) |
+    | `PORT` | No | `80` | Port the server listens on inside the container |
+    | `DEVICE` | No | `auto` | Where to load weights: `auto`, `cuda`, or `cpu` |
+
+    **Docker Options**
+
+    | Option | Description |
+    |--------|-------------|
+    | `--gpus all` | Enable GPU acceleration (requires NVIDIA drivers and the NVIDIA Container Toolkit) |
+    | `-v /local/path:/server/model` | Mount a local directory for weight caching |
+    | `-p HOST:CONTAINER` | Publish the container port on a host port (the container port must match `PORT`) |
+  </Accordion>
+
+  <Accordion title="Common Configurations" icon="gear">
+    **Run on GPU**
+
+    ```bash
+    docker run -it \
+      --gpus all \
+      -e KEY=YOUR_BYTEZ_KEY \
+      -e PORT=8000 \
+      -p 8000:8000 \
+      bytez/qwen_qwen3-4b
+    ```
+
+    **Cache Weights Locally**
+
+    Avoid re-downloading weights on every run by mounting a local directory:
+
+    ```bash
+    docker run -it \
+      --gpus all \
+      -v /path/to/cache:/server/model \
+      -e HF_HOME=/server/model \
+      -e KEY=YOUR_BYTEZ_KEY \
+      -e PORT=8000 \
+      -p 8000:8000 \
+      bytez/qwen_qwen3-4b
+    ```
+
+    <Warning>
+    Caching is highly recommended if you will start containers for the same model more than once, especially for large models (70B+), whose downloads can otherwise take hours.
+    </Warning>
+
+    **Force CPU-Only**
+
+    ```bash
+    docker run -it \
+      -e DEVICE=cpu \
+      -e KEY=YOUR_BYTEZ_KEY \
+      -e PORT=8000 \
+      -p 8000:8000 \
+      bytez/qwen_qwen3-4b
+    ```
+  </Accordion>
+
+  <Accordion title="Run Inference" icon="bolt">
+    Once the container is running, send POST requests to `/run`.
+
+    **Chat Models**
+
+    ```bash
+    curl -X POST http://localhost:8000/run \
+      -H "Content-Type: application/json" \
+      -d '{
+        "messages": [
+          { "role": "system", "content": "You are a helpful assistant" },
+          { "role": "user", "content": "What is the capital of France?" }
+        ],
+        "stream": false,
+        "params": {
+          "max_new_tokens": 100,
+          "temperature": 0.7
+        }
+      }'
+    ```
+
+    **Streaming**
+
+    Set `"stream": true` to receive tokens as they're generated:
+
+    ```bash
+    curl -X POST http://localhost:8000/run \
+      -H "Content-Type: application/json" \
+      -d '{
+        "messages": [
+          { "role": "user", "content": "Write a haiku about coding" }
+        ],
+        "stream": true
+      }'
+    ```
+
+  </Accordion>
+
+  <Accordion title="Request Body by Task" icon="list">
+    Different model tasks require different inputs. Here's a quick reference:
+
+    | Task | Required Fields | Example |
+    |------|-----------------|---------|
+    | `chat` | `messages` | `{"messages": [{"role": "user", "content": "Hi"}]}` |
+    | `text-generation` | `text` | `{"text": "Once upon a time"}` |
+    | `image-text-to-text` | `messages` with image | `{"messages": [{"role": "user", "content": [{"type": "text", "text": "Describe this"}, {"type": "image", "url": "..."}]}]}` |
+    | `text-to-image` | `text` | `{"text": "A cat in space"}` |
+    | `automatic-speech-recognition` | `url` or `base64` | `{"url": "https://example.com/audio.wav"}` |
+    | `feature-extraction` | `text` | `{"text": "Embed this sentence"}` |
+
+    <Card title="Full HTTP Reference" icon="book" href="/http-reference/overview">
+      See complete request/response params and examples for all 30+ task types.
+    </Card>
+  </Accordion>
+
+  <Accordion title="Run Offline (Air-Gapped)" icon="wifi-slash">
+    Create a self-contained image with the weights baked in, so no internet access is required at runtime.
+
+    **Step 1: Run the container once to download weights**
+
+    ```bash
+    docker run \
+      -e KEY=YOUR_BYTEZ_KEY \
+      -e PORT=8000 \
+      -p 8000:8000 \
+      --name my-model \
+      bytez/qwen_qwen3-4b
+    ```
+
+    Wait for the model to fully load (you'll see logs indicating it's ready). Then press `Ctrl+C` to stop the container.
+
+    **Step 2: Save as a new image**
+
+    ```bash
+    docker commit my-model my-model-offline
+    ```
+
+    **Step 3: Run offline (no internet needed)**
+
+    ```bash
+    docker run \
+      -e KEY=YOUR_BYTEZ_KEY \
+      -e PORT=8000 \
+      -p 8000:8000 \
+      my-model-offline
+    ```
+
+    <Tip>
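+    To verify that the image is truly offline, add `--network none` to the run command. Docker does not publish ports for a container with no network, so use this only to confirm that the model loads without internet access.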
+    </Tip>
+
+    **Optional: Export for another machine**
+
+    ```bash
+    # Save to a file
+    docker save my-model-offline -o my-model-offline.tar
+
+    # Load on another machine
+    docker load -i my-model-offline.tar
+    ```
+  </Accordion>
+
+  <Accordion title="Troubleshooting" icon="wrench">
+    **Container won't start**
+
+    Check that Docker is installed and running. For GPU support, ensure you have NVIDIA drivers and the NVIDIA Container Toolkit installed.
+
+    **Out of memory**
+
+    `DEVICE=auto` (the default) can split the model across GPU and CPU memory, so switch back to it if you set `DEVICE=cuda`. For large models, you may need more VRAM or system RAM.
+
+    **Slow first request**
+
+    The first request loads model weights into memory, so it takes longer than later requests. Use weight caching (`-v` mount) to speed up container restarts.
+
+    **Model only works with a specific `DEVICE` setting**
+
+    Some models support only one of `auto`, `cuda`, or `cpu`. If one setting fails, try another.
+  </Accordion>
+</AccordionGroup>
+
+---
+
+## Need Help?
+
+<CardGroup cols={2}>
+  <Card title="Discord" icon="discord" href="https://discord.com/invite/Z723PfCFWf">
+    Get live support from the community
+  </Card>
+</CardGroup>
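
Note on the image naming rule in the new page ("the image name matches the model ID with `/` replaced by `_`"): the rule could be sketched as a small shell helper, shown below. The function name `bytez_image_name` and the model ID `Qwen/Qwen3-4B` are hypothetical and used only for illustration. Lowercasing is an assumption inferred from the example image `bytez/qwen_qwen3-4b` and from Docker's requirement that repository names be lowercase; it is not stated in the page itself.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Bytez tooling.
# Derives a Docker image name from a model ID by replacing "/" with "_"
# and lowercasing the result (lowercasing is an assumption, see note above).
bytez_image_name() {
  printf 'bytez/%s\n' "$(printf '%s' "$1" | tr '/' '_' | tr '[:upper:]' '[:lower:]')"
}

# Assumed example model ID; prints the image name used throughout the page.
bytez_image_name "Qwen/Qwen3-4B"
```

The printed name could then be passed to `docker pull`, for example `docker pull "$(bytez_image_name "Qwen/Qwen3-4B")"`.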
