Commit 40cb4a0

Bihan Rana and peterschmidt85 authored
Add sglang_router details in examples, gateway and refs (#3313)
* Add sglang_router details in examples, gateway and refs
* [Docs] Improve the `sglang` router configuration with gateways

---------

Co-authored-by: Bihan Rana <bihan@Bihans-MacBook-Pro.local>
Co-authored-by: peterschmidt85 <andrey.cheptsov@gmail.com>
1 parent 7bae836 commit 40cb4a0

File tree

5 files changed

+89
-41
lines changed

docs/docs/concepts/gateways.md

Lines changed: 45 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,9 @@
11
# Gateways
22

3-
Gateways manage the ingress traffic of running [services](services.md),
4-
provide an HTTPS endpoint mapped to your domain, handle auto-scaling and rate limits.
3+
Gateways manage ingress traffic for running [services](services.md), handle auto-scaling and rate limits, enable HTTPS, and allow you to configure a custom domain. They also support custom routers, such as the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}.
54

6-
> If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
7-
> the gateway is already set up for you.
5+
<!-- > If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
6+
> the gateway is already set up for you. -->
87

98
## Apply a configuration
109

@@ -57,6 +56,48 @@ You can create gateways with the `aws`, `azure`, `gcp`, or `kubernetes` backends
5756
Gateways in `kubernetes` backend require an external load balancer. Managed Kubernetes solutions usually include a load balancer.
5857
For self-hosted Kubernetes, you must provide a load balancer by yourself.
5958

59+
### Router
60+
61+
By default, the gateway uses its own load balancer to route traffic between replicas. However, you can delegate this responsibility to a specific router by setting the `router` property. Currently, the only supported external router is `sglang`.
62+
63+
#### SGLang
64+
65+
The `sglang` router delegates routing logic to the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}.
66+
67+
To enable it, set the `type` field under `router` to `sglang`:
68+
69+
<div editor-title="gateway.dstack.yml">
70+
71+
```yaml
72+
type: gateway
73+
name: sglang-gateway
74+
75+
backend: aws
76+
region: eu-west-1
77+
78+
domain: example.com
79+
80+
router:
81+
type: sglang
82+
policy: cache_aware
83+
```
84+
85+
</div>
86+
87+
!!! info "Policy"
88+
89+
The `router` property allows you to configure the routing `policy`:
90+
91+
* `cache_aware` &mdash; Default policy; combines cache locality with load balancing, falling back to shortest queue.
92+
* `power_of_two` &mdash; Samples two workers and picks the lighter one.
93+
* `random` &mdash; Uniform random selection.
94+
* `round_robin` &mdash; Cycles through workers in order.
95+
96+
97+
> Currently, services using this type of gateway must run standard SGLang workers. See the [example](../../examples/inference/sglang/index.md).
98+
>
99+
> Support for prefill/decode disaggregation and auto-scaling based on inter-token latency is coming soon.
100+
60101
### Public IP
61102

62103
If you don't need/want a public IP for the gateway, you can set the `public_ip` to `false` (the default value is `true`), making the gateway private.
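
The routing policies listed in the new `Router` section of `gateways.md` can be illustrated with a small sketch. This is not the SGLang Model Gateway implementation: `Worker`, `pick_random`, `make_round_robin`, and `pick_power_of_two` are hypothetical names, and "load" is assumed to mean the number of in-flight requests. The `cache_aware` policy is left out because it depends on cache-locality state that a short example cannot model faithfully.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical stand-in for a service replica."""
    name: str
    load: int = 0  # assumed meaning: number of in-flight requests


def pick_random(workers, rng=random):
    # `random`: uniform random selection.
    return rng.choice(workers)


def make_round_robin(workers):
    # `round_robin`: cycle through workers in order.
    cycle = itertools.cycle(workers)
    return lambda: next(cycle)


def pick_power_of_two(workers, rng=random):
    # `power_of_two`: sample two distinct workers and keep the lighter one.
    # Requires at least two workers.
    a, b = rng.sample(workers, 2)
    return a if a.load <= b.load else b


workers = [
    Worker("replica-0", load=5),
    Worker("replica-1", load=1),
    Worker("replica-2", load=3),
]
next_worker = make_round_robin(workers)
order = [next_worker().name for _ in range(4)]
print(order)
print(pick_power_of_two(workers).name)
```

Under these assumptions, `power_of_two` never needs a global view of worker load, which is why it is often used as a cheap approximation of least-loaded routing.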

docs/docs/concepts/services.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -100,12 +100,13 @@ If [authorization](#authorization) is not disabled, the service endpoint require
100100
However, you'll need a gateway in the following cases:
101101

102102
* To use auto-scaling or rate limits
103+
* To use a custom router, such as the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}
103104
* To enable HTTPS for the endpoint and map it to your domain
104105
* If your service requires WebSockets
105106
* If your service cannot work with a [path prefix](#path-prefix)
106107

107-
Note, if you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
108-
a gateway is already pre-configured for you.
108+
<!-- Note, if you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
109+
a gateway is already pre-configured for you. -->
109110

110111
If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
111112
`https://<run name>.<gateway domain>/`.

docs/docs/reference/dstack.yml/gateway.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,16 @@ The `gateway` configuration type allows creating and updating [gateways](../../c
1010
type:
1111
required: true
1212

13+
### `router`
14+
15+
=== "SGLang Model Gateway"
16+
17+
#SCHEMA# dstack._internal.core.models.routers.SGLangRouterConfig
18+
overrides:
19+
show_root_heading: false
20+
type:
21+
required: true
22+
1323
### `certificate`
1424

1525
=== "Let's encrypt"

examples/inference/sglang/README.md

Lines changed: 21 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -2,32 +2,21 @@
22

33
This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using [SGLang :material-arrow-top-right-thin:{ .external }](https://github.com/sgl-project/sglang){:target="_blank"} and `dstack`.
44

5-
??? info "Prerequisites"
6-
Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
7-
8-
<div class="termy">
9-
10-
```shell
11-
$ git clone https://github.com/dstackai/dstack
12-
$ cd dstack
13-
```
14-
15-
</div>
5+
## Apply a configuration
166

17-
## Deployment
187
Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B using SGLang.
198

20-
=== "AMD"
9+
=== "NVIDIA"
2110

22-
<div editor-title="examples/inference/sglang/amd/.dstack.yml">
11+
<div editor-title="examples/inference/sglang/nvidia/.dstack.yml">
2312

2413
```yaml
2514
type: service
26-
name: deepseek-r1-amd
15+
name: deepseek-r1-nvidia
2716

28-
image: lmsysorg/sglang:v0.4.1.post4-rocm620
17+
image: lmsysorg/sglang:latest
2918
env:
30-
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
19+
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
3120

3221
commands:
3322
- python3 -m sglang.launch_server
@@ -36,25 +25,24 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
3625
--trust-remote-code
3726

3827
port: 8000
39-
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
28+
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
4029

4130
resources:
42-
gpu: MI300x
43-
disk: 300GB
31+
gpu: 24GB
4432
```
4533
</div>
4634

47-
=== "NVIDIA"
35+
=== "AMD"
4836

49-
<div editor-title="examples/inference/sglang/nvidia/.dstack.yml">
37+
<div editor-title="examples/inference/sglang/amd/.dstack.yml">
5038

5139
```yaml
5240
type: service
53-
name: deepseek-r1-nvidia
41+
name: deepseek-r1-amd
5442

55-
image: lmsysorg/sglang:latest
43+
image: lmsysorg/sglang:v0.4.1.post4-rocm620
5644
env:
57-
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
45+
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
5846

5947
commands:
6048
- python3 -m sglang.launch_server
@@ -63,16 +51,14 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
6351
--trust-remote-code
6452

6553
port: 8000
66-
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
54+
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
6755

6856
resources:
69-
gpu: 24GB
57+
gpu: MI300x
58+
disk: 300GB
7059
```
7160
</div>
7261

73-
74-
### Applying the configuration
75-
7662
To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
7763

7864
<div class="termy">
@@ -118,8 +104,10 @@ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
118104
```
119105
</div>
120106

121-
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
122-
is available at `https://gateway.<gateway domain>/`.
107+
!!! info "SGLang Model Gateway"
108+
If you'd like to use a custom routing policy, e.g. by leveraging the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}, create a gateway with `router` set to `sglang`. Check out [gateways](https://dstack.ai/docs/concepts/gateways#router) for more details.
109+
110+
> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling, HTTPS, or rate limits), the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
123111
124112
## Source code
125113

@@ -128,5 +116,5 @@ The source-code of this example can be found in
128116

129117
## What's next?
130118

131-
1. Check [services](https://dstack.ai/docs/services)
119+
1. Read about [services](https://dstack.ai/docs/concepts/services) and [gateways](https://dstack.ai/docs/concepts/gateways)
132120
2. Browse the [SgLang DeepSeek Usage](https://docs.sglang.ai/references/deepseek.html), [Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html)
Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,9 @@
11
from enum import Enum
22
from typing import Literal
33

4+
from pydantic import Field
5+
from typing_extensions import Annotated
6+
47
from dstack._internal.core.models.common import CoreModel
58

69

@@ -9,8 +12,13 @@ class RouterType(str, Enum):
912

1013

1114
class SGLangRouterConfig(CoreModel):
12-
type: Literal["sglang"] = "sglang"
13-
policy: Literal["random", "round_robin", "cache_aware", "power_of_two"] = "cache_aware"
15+
type: Annotated[Literal["sglang"], Field(description="The router type")] = "sglang"
16+
policy: Annotated[
17+
Literal["random", "round_robin", "cache_aware", "power_of_two"],
18+
Field(
19+
description="The routing policy. Options: `random`, `round_robin`, `cache_aware`, `power_of_two`"
20+
),
21+
] = "cache_aware"
1422

1523

1624
AnyRouterConfig = SGLangRouterConfig
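
To show what the `Literal` annotations in `SGLangRouterConfig` buy, here is a minimal stdlib-only sketch. It assumes nothing about `CoreModel`, which presumably performs this validation through pydantic; `RouterConfigSketch` and its `__post_init__` check are hypothetical stand-ins, not part of `dstack`.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Same allowed values as the `policy` field in the diff above.
Policy = Literal["random", "round_robin", "cache_aware", "power_of_two"]


@dataclass
class RouterConfigSketch:
    """Hypothetical stand-in for SGLangRouterConfig, without pydantic."""
    type: Literal["sglang"] = "sglang"
    policy: Policy = "cache_aware"

    def __post_init__(self):
        # Dataclasses do not enforce Literal at runtime, so check by hand.
        if self.type != "sglang":
            raise ValueError(f"unsupported router type: {self.type!r}")
        if self.policy not in get_args(Policy):
            raise ValueError(f"unsupported policy: {self.policy!r}")


default_config = RouterConfigSketch()
print(default_config.policy)

try:
    RouterConfigSketch(policy="least_loaded")  # not one of the allowed values
except ValueError as exc:
    print(exc)
```

The defaults mirror the diff: omitting `policy` yields `cache_aware`, and any value outside the four listed options is rejected.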
