make run LLAMA_ARGS="--verbose --jinja -ngl 999 --ctx-size 2048"
make
```
`dmr` starts the server on a free port, waits for it to be ready, runs your CLI command, then shuts the server down:

```bash
./dmr run ai/smollm2 "Hello, how are you?"
./dmr ls
./dmr run qwen3:0.6B-Q4_0 "tell me today's news"
```
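The steps `dmr` automates can be sketched in plain shell. This is a hypothetical illustration of the pattern, not the wrapper's actual implementation; it assumes the `model-runner` and `cmd/cli/model-cli` binaries from the build steps, and a fixed port stands in for the free port `dmr` would pick:

```shell
# Hypothetical sketch of what dmr automates (binary paths assumed from
# the build steps; a fixed port stands in for a dynamically chosen one).
PORT=13434

# 1. Start the server in the background.
MODEL_RUNNER_PORT=$PORT ./model-runner &
SERVER_PID=$!

# 2. Wait until the server answers HTTP requests.
until curl -s "http://localhost:$PORT/" >/dev/null; do
  sleep 0.2
done

# 3. Run the CLI command against it.
MODEL_RUNNER_HOST="http://localhost:$PORT" ./cmd/cli/model-cli run ai/smollm2 "Hello, how are you?"

# 4. Shut the server down.
kill $SERVER_PID
```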

These components can also be built, run, and tested separately using the Makefile.

### Testing the Complete Stack End-to-End

> **Note:** We use port 13434 in these examples to avoid conflicts with Docker Desktop's built-in Model Runner, which typically runs on port 12434.

#### Option 1: Manual two-terminal setup

1. **Start model-runner in one terminal:**
```bash
MODEL_RUNNER_PORT=13434 ./model-runner
```

2. **Use model-cli in another terminal:**
```bash
# List available models
MODEL_RUNNER_HOST=http://localhost:13434 ./cmd/cli/model-cli list

# Pull and run a model
MODEL_RUNNER_HOST=http://localhost:13434 ./cmd/cli/model-cli run ai/smollm2 "Hello, how are you?"
```
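Because model-runner speaks an OpenAI-compatible API, you can also exercise the server directly with curl while it is running. The `/engines/v1/chat/completions` path below follows Docker Model Runner's OpenAI-compatible route, but treat the exact path as an assumption and adjust it to your build:

```shell
# Assumed OpenAI-compatible endpoint; requires the server from step 1 to be
# running on port 13434 with ai/smollm2 already pulled.
curl -s http://localhost:13434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/smollm2",
        "messages": [
          {"role": "user", "content": "Hello, how are you?"}
        ]
      }'
```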

#### Option 2: Using Docker
If you are interested in a specific Kubernetes use-case, please start a
discussion on the issue tracker.

## dmrlet: Container Orchestrator for AI Inference

dmrlet is a purpose-built container orchestrator for AI inference workloads. Unlike Kubernetes, it focuses exclusively on running stateless inference containers with zero configuration overhead. Multi-GPU mapping "just works" without YAML, device plugins, or node selectors.