
Commit 14ef341

[Docs] Remove the mention of the gateway endpoint #3514 (#3518)

* [Docs] Remove the mention of the gateway endpoint #3514
* Update docs/docs/concepts/services.md
* Update docs/docs/concepts/services.md
* [Docs] Updated examples (migrated from the gateway endpoint to the service endpoint)

Co-authored-by: jvstme <36324149+jvstme@users.noreply.github.com>

1 parent 1dc6121 · commit 14ef341

File tree: 11 files changed, +73 −75 lines

docs/blog/posts/dstack-sky.md

Lines changed: 2 additions & 3 deletions

````diff
@@ -121,15 +121,14 @@ model: mixtral
 ```
 </div>
 
-If it has a `model` mapping, the model will be accessible
-at `https://gateway.<project name>.sky.dstack.ai` via the OpenAI compatible interface.
+The service endpoint will be accessible at `https://<run name>.<project name>.sky.dstack.ai` via the OpenAI-compatible interface.
 
 ```python
 from openai import OpenAI
 
 client = OpenAI(
-    base_url="https://gateway.<project name>.sky.dstack.ai",
+    base_url="https://<run name>.<project name>.sky.dstack.ai/v1",
     api_key="<dstack token>"
 )
````
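The URL change above follows a simple pattern. A minimal sketch of it (the `sky_base_url` helper and the `llama31`/`main` names are ours for illustration, not part of dstack):

```python
# Illustrative helper, not part of dstack: build the per-run
# OpenAI-compatible base URL under the new dstack Sky scheme,
# https://<run name>.<project name>.sky.dstack.ai/v1
def sky_base_url(run_name: str, project_name: str) -> str:
    return f"https://{run_name}.{project_name}.sky.dstack.ai/v1"

# Matches the updated client configuration in the diff:
# OpenAI(base_url=sky_base_url(...), api_key="<dstack token>")
print(sky_base_url("llama31", "main"))
# → https://llama31.main.sky.dstack.ai/v1
```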

docs/docs/concepts/services.md

Lines changed: 42 additions & 30 deletions

````diff
@@ -73,7 +73,7 @@ Model meta-llama/Meta-Llama-3.1-8B-Instruct is published at:
 
 `dstack apply` automatically provisions instances and runs the service.
 
-If a [gateway](gateways.md) is not configured, the service’s endpoint will be accessible at
+If you do not have a [gateway](gateways.md) created, the service endpoint will be accessible at
 `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 
 <div class="termy">
@@ -95,37 +95,50 @@ $ curl http://localhost:3000/proxy/services/main/llama31/v1/chat/completions \
 
 </div>
 
-If the service defines the [`model`](#model) property, the model can be accessed with
-the global OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`,
-or via `dstack` UI.
+If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`.
 
-If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with
-`Bearer <dstack token>`.
+## Configuration options
 
-??? info "Gateway"
-    Running services for development purposes doesn’t require setting up a [gateway](gateways.md).
+<!-- !!! info "No commands"
+    If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set). -->
 
-    However, you'll need a gateway in the following cases:
+### Gateway
 
-    * To use auto-scaling or rate limits
-    * To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
-    * To enable HTTPS for the endpoint and map it to your domain
-    * If your service requires WebSockets
-    * If your service cannot work with a [path prefix](#path-prefix)
+Here are cases where a service may need a [gateway](gateways.md):
 
-    <!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
-    a gateway is already pre-configured for you. -->
+* To use [auto-scaling](#replicas-and-scaling) or [rate limits](#rate-limits)
+* To use a custom router, such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
+* To enable HTTPS for the endpoint and map it to your domain
+* If your service requires WebSockets
+* If your service cannot work with a [path prefix](#path-prefix)
 
-    If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
-    `https://<run name>.<gateway domain>/`.
+<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
+a gateway is already pre-configured for you. -->
 
-    If the service defines the `model` property, the model will be available via the global OpenAI-compatible endpoint
-    at `https://gateway.<gateway domain>/`.
+If you want `dstack` to explicitly validate that a gateway is used, you can set the [`gateway`](../reference/dstack.yml/service.md#gateway) property in the service configuration to `true`. In this case, `dstack` will raise an error during `dstack apply` if a default gateway is not created.
 
-## Configuration options
+You can also set the `gateway` property to the name of a specific gateway, if required.
+
+If you have a [gateway](gateways.md) created, the service endpoint will be accessible at `https://<run name>.<gateway domain>/`:
+
+<div class="termy">
+
+```shell
+$ curl https://llama31.example.com/v1/chat/completions \
+    -H 'Content-Type: application/json' \
+    -H 'Authorization: Bearer <dstack token>' \
+    -d '{
+        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
+        "messages": [
+          {
+            "role": "user",
+            "content": "Compose a poem that explains the concept of recursion in programming."
+          }
+        ]
+      }'
+```
 
-!!! info "No commands"
-    If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set).
+</div>
 
 ### Replicas and scaling
 
@@ -220,12 +233,6 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
 ??? info "Disaggregated serving"
     Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
 
-### Model
-
-If the service is running a chat model with an OpenAI-compatible interface,
-set the [`model`](#model) property to make the model accessible via `dstack`'s
-global OpenAI-compatible endpoint, and also accessible via `dstack`'s UI.
-
 ### Authorization
 
 By default, the service enables authorization, meaning the service endpoint requires a `dstack` user token.
@@ -364,7 +371,7 @@ set [`strip_prefix`](../reference/dstack.yml/service.md#strip_prefix) to `false`
 If your app cannot be configured to work with a path prefix, you can host it
 on a dedicated domain name by setting up a [gateway](gateways.md).
 
-### Rate limits { #rate-limits }
+### Rate limits
 
 If you have a [gateway](gateways.md), you can configure rate limits for your service
 using the [`rate_limits`](../reference/dstack.yml/service.md#rate_limits) property.
@@ -413,6 +420,11 @@ Limits apply to the whole service (all replicas) and per client (by IP). Clients
 
 </div>
 
+### Model
+
+If the service runs a model with an OpenAI-compatible interface, you can set the [`model`](#model) property to make the model accessible through `dstack`'s chat UI on the `Models` page.
+In this case, `dstack` will use the service's `/v1/chat/completions` endpoint.
+
 ### Resources
 
 If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
````
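Taken together, the services.md changes describe two endpoint shapes: a server-proxy URL when no gateway exists, and a per-run gateway URL otherwise. The rule can be sketched as follows (the `service_endpoint` helper, its defaults, and the example names are ours for illustration, not a dstack API):

```python
from typing import Optional

# Illustrative only, not a dstack API: the two endpoint shapes
# the updated docs describe.
def service_endpoint(run_name: str, project_name: str,
                     server_url: str = "http://localhost:3000",
                     gateway_domain: Optional[str] = None) -> str:
    if gateway_domain:
        # With a gateway: https://<run name>.<gateway domain>/
        return f"https://{run_name}.{gateway_domain}/"
    # Without a gateway: <server URL>/proxy/services/<project name>/<run name>/
    return f"{server_url.rstrip('/')}/proxy/services/{project_name}/{run_name}/"

print(service_endpoint("llama31", "main"))
# → http://localhost:3000/proxy/services/main/llama31/
print(service_endpoint("llama31", "main", gateway_domain="example.com"))
# → https://llama31.example.com/
```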

examples/accelerators/tenstorrent/README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -96,7 +96,7 @@ at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 <div class="termy">
 
 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/tt-inference-server/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer <dstack token>' \
     -H 'Content-Type: application/json' \
````

examples/inference/nim/README.md

Lines changed: 3 additions & 5 deletions

````diff
@@ -78,13 +78,12 @@ Provisioning...
 ```
 </div>
 
-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 
 <div class="termy">
 
 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/serve-distill-deepseek/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer <dstack token>' \
     -H 'Content-Type: application/json' \
@@ -106,8 +105,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
 
 </div>
 
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill-deepseek.<gateway domain>/`.
 
 ## Source code
````

examples/inference/sglang/README.md

Lines changed: 6 additions & 7 deletions

````diff
@@ -17,7 +17,7 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
 
 ```yaml
 type: service
-name: deepseek-r1-nvidia
+name: deepseek-r1
 
 image: lmsysorg/sglang:latest
 env:
@@ -43,7 +43,7 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
 
 ```yaml
 type: service
-name: deepseek-r1-amd
+name: deepseek-r1
 
 image: lmsysorg/sglang:v0.4.1.post4-rocm620
 env:
@@ -74,20 +74,19 @@ $ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml
 #  BACKEND  REGION   RESOURCES                        SPOT  PRICE
 1  runpod   EU-RO-1  24xCPU, 283GB, 1xMI300X (192GB)  no    $2.49
 
-Submit the run deepseek-r1-amd? [y/n]: y
+Submit the run deepseek-r1? [y/n]: y
 
 Provisioning...
 ---> 100%
 ```
 </div>
 
-Once the service is up, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 
 <div class="termy">
 
 ```shell
-curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+curl http://127.0.0.1:3000/proxy/services/main/deepseek-r1/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer <dstack token>' \
     -H 'Content-Type: application/json' \
@@ -112,7 +111,7 @@ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
 !!! info "SGLang Model Gateway"
     If you'd like to use a custom routing policy, e.g. by leveraging the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#), create a gateway with `router` set to `sglang`. Check out [gateways](https://dstack.ai/docs/concepts/gateways#router) for more details.
 
-> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling or HTTPs, rate-limits, etc), the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
+> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling, HTTPS, rate limits, etc.), the service endpoint will be available at `https://deepseek-r1.<gateway domain>/`.
 
 ## Source code
````

examples/inference/tgi/README.md

Lines changed: 3 additions & 5 deletions

````diff
@@ -82,13 +82,12 @@ Provisioning...
 ```
 </div>
 
-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 
 <div class="termy">
 
 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/llama4-scout/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer <dstack token>' \
     -H 'Content-Type: application/json' \
@@ -110,8 +109,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
 
 </div>
 
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama4-scout.<gateway domain>/`.
 
 ## Source code
````

examples/inference/trtllm/README.md

Lines changed: 3 additions & 5 deletions

````diff
@@ -330,13 +330,12 @@ Provisioning...
 
 ## Access the endpoint
 
-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 
 <div class="termy">
 
 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/serve-distill/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer <dstack token>' \
     -H 'Content-Type: application/json' \
@@ -359,8 +358,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
 
 </div>
 
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill.<gateway domain>/`.
 
 ## Source code
````

examples/inference/vllm/README.md

Lines changed: 3 additions & 5 deletions

````diff
@@ -79,13 +79,12 @@ Provisioning...
 ```
 </div>
 
-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 
 <div class="termy">
 
 ```shell
-$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+$ curl http://127.0.0.1:3000/proxy/services/main/llama31/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer <dstack token>' \
     -H 'Content-Type: application/json' \
@@ -107,8 +106,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
 
 </div>
 
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama31.<gateway domain>/`.
 
 ## Source code
````

examples/llms/deepseek/README.md

Lines changed: 6 additions & 8 deletions

````diff
@@ -179,7 +179,7 @@ Both SGLang and vLLM also support `Deepseek-V2-Lite`.
 
 ```yaml
 type: service
-name: deepseek-r1-nvidia
+name: deepseek-r1
 
 image: lmsysorg/sglang:latest
 env:
@@ -203,7 +203,7 @@ Both SGLang and vLLM also support `Deepseek-V2-Lite`.
 
 ```yaml
 type: service
-name: deepseek-r1-nvidia
+name: deepseek-r1
 
 image: vllm/vllm-openai:latest
 env:
@@ -255,20 +255,19 @@ $ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml
 #  BACKEND  REGION   RESOURCES                        SPOT  PRICE
 1  runpod   EU-RO-1  24xCPU, 283GB, 1xMI300X (192GB)  no    $2.49
 
-Submit the run deepseek-r1-amd? [y/n]: y
+Submit the run deepseek-r1? [y/n]: y
 
 Provisioning...
 ---> 100%
 ```
 </div>
 
-Once the service is up, the model will be available via the OpenAI-compatible endpoint
-at `<dstack server URL>/proxy/models/<project name>/`.
+If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 
 <div class="termy">
 
 ```shell
-curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+curl http://127.0.0.1:3000/proxy/services/main/deepseek-r1/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer <dstack token>' \
     -H 'Content-Type: application/json' \
@@ -290,8 +289,7 @@ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
 ```
 </div>
 
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
-is available at `https://gateway.<gateway domain>/`.
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://deepseek-r1.<gateway domain>/`.
 
 ## Fine-tuning
````

examples/llms/llama/README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -171,7 +171,7 @@ at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
 <div class="termy">
 
 ```shell
-curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
+curl http://127.0.0.1:3000/proxy/services/main/llama4-scout/v1/chat/completions \
     -X POST \
     -H 'Authorization: Bearer <dstack token>' \
     -H 'Content-Type: application/json' \
````
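The curl commands migrated across these example READMEs all share one request shape. For reference, the same request can be sketched with Python's standard library (the model name and message below are placeholders, and the request is only built, not sent):

```python
import json
import urllib.request

# Build (but do not send) the request mirroring the updated curl examples:
# POST <server URL>/proxy/services/<project name>/<run name>/v1/chat/completions
payload = {
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",  # placeholder
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    "http://127.0.0.1:3000/proxy/services/main/llama4-scout/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer <dstack token>",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.get_method(), req.full_url)
```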
