Commit bb02788

[Docs] Replica groups (#3511)
* Add Replica Groups Docs
* Minor edits

---------

Co-authored-by: Bihan Rana
Co-authored-by: peterschmidt85 <andrey.cheptsov@gmail.com>
1 parent 763092d commit bb02788

1 file changed: +51 -0 lines changed

docs/docs/concepts/services.md

Lines changed: 51 additions & 0 deletions
@@ -164,6 +164,57 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
 
 > The `scaling` property requires creating a [gateway](gateways.md).
 
+??? info "Replica groups"
+    A service can include multiple replica groups. Each group can define its own `commands`, `resources` requirements, and `scaling` rules.
+
+    <div editor-title="service.dstack.yml">
+
+    ```yaml
+    type: service
+    name: llama-8b-service
+
+    image: lmsysorg/sglang:latest
+    env:
+      - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+
+    replicas:
+      - count: 1..2
+        scaling:
+          metric: rps
+          target: 10
+        commands:
+          - |
+            python -m sglang.launch_server \
+              --model-path $MODEL_ID \
+              --port 8000 \
+              --trust-remote-code
+        resources:
+          gpu: 48GB
+
+      - count: 1..4
+        scaling:
+          metric: rps
+          target: 5
+        commands:
+          - |
+            python -m sglang.launch_server \
+              --model-path $MODEL_ID \
+              --port 8000 \
+              --trust-remote-code
+        resources:
+          gpu: 24GB
+
+    port: 8000
+    model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+    ```
+
+    </div>
+
+    > Properties such as `regions`, `port`, `image`, `env` and some other cannot be configured per replica group. This support is coming soon.
+
+??? info "Disaggregated serving"
+    Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
+
 ### Model
 
 If the service is running a chat model with an OpenAI-compatible interface,
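
As a rough illustration of how the per-group `count` range and `scaling` target in the diff above might interact, here is a minimal Python sketch. It assumes each group is scaled independently to the smallest replica count that keeps per-replica RPS at or below `target`, clamped to the `count` range. That rule, along with the `ReplicaGroup`, `parse_count`, and `desired_replicas` names, is hypothetical and is not taken from the dstack implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class ReplicaGroup:
    """Hypothetical model of one entry under `replicas` in the example config."""
    min_count: int
    max_count: int
    target_rps: float  # corresponds to `scaling.target` with `metric: rps`


def parse_count(spec: str) -> tuple[int, int]:
    """Parse a range such as '1..2'; a bare number such as '3' is treated as a fixed count."""
    low, _, high = spec.partition("..")
    return int(low), int(high or low)


def desired_replicas(group: ReplicaGroup, observed_rps: float) -> int:
    """Assumed rule: ceil(rps / target), clamped to the group's count range."""
    needed = math.ceil(observed_rps / group.target_rps)
    return max(group.min_count, min(group.max_count, needed))


# The two groups from the example: 48GB GPUs (1..2, target 10) and 24GB GPUs (1..4, target 5).
large = ReplicaGroup(*parse_count("1..2"), target_rps=10)
small = ReplicaGroup(*parse_count("1..4"), target_rps=5)

print(desired_replicas(large, 25))  # capped by the upper bound of 1..2
print(desired_replicas(small, 12))
```

Under these assumptions, the 48GB group never grows past two replicas regardless of load, while the 24GB group can absorb additional traffic up to four replicas. How traffic is actually split between groups is not specified in the diff.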
