You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If the service defines the [`model`](#model) property, the model can be accessed with
99
-
the global OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`,
100
-
or via `dstack` UI.
98
+
If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`.
101
99
102
-
If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with
103
-
`Bearer <dstack token>`.
100
+
## Configuration options
104
101
105
-
??? info "Gateway"
106
-
Running services for development purposes doesn’t require setting up a [gateway](gateways.md).
102
+
<!-- !!! info "No commands"
103
+
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set). -->
107
104
108
-
However, you'll need a gateway in the following cases:
105
+
### Gateway
109
106
110
-
* To use auto-scaling or rate limits
111
-
* To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
112
-
* To enable HTTPS for the endpoint and map it to your domain
113
-
* If your service requires WebSockets
114
-
* If your service cannot work with a [path prefix](#path-prefix)
107
+
Here are cases where a service may need a [gateway](gateways.md):
115
108
116
-
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
117
-
a gateway is already pre-configured for you. -->
109
+
* To use [auto-scaling](#replicas-and-scaling) or [rate limits](#rate-limits)
110
+
* To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
111
+
* To enable HTTPS for the endpoint and map it to your domain
112
+
* If your service requires WebSockets
113
+
* If your service cannot work with a [path prefix](#path-prefix)
118
114
119
-
If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
120
-
`https://<run name>.<gateway domain>/`.
115
+
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
116
+
a gateway is already pre-configured for you. -->
121
117
122
-
If the service defines the `model` property, the model will be available via the global OpenAI-compatible endpoint
123
-
at `https://gateway.<gateway domain>/`.
118
+
If you want `dstack` to explicitly validate that a gateway is used, you can set the [`gateway`](../reference/dstack.yml/service.md#gateway) property in the service configuration to `true`. In this case, `dstack` will raise an error during `dstack apply` if a default gateway is not created.
124
119
125
-
## Configuration options
120
+
You can also set the `gateway` property to the name of a specific gateway, if required.
121
+
122
+
If you have a [gateway](gateways.md) created, the service endpoint will be accessible at `https://<run name>.<gateway domain>/`:
"content": "Compose a poem that explains the concept of recursion in programming."
136
+
}
137
+
]
138
+
}'
139
+
```
126
140
127
-
!!! info "No commands"
128
-
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set).
141
+
</div>
129
142
130
143
### Replicas and scaling
131
144
@@ -220,12 +233,6 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
220
233
??? info "Disaggregated serving"
221
234
Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
222
235
223
-
### Model
224
-
225
-
If the service is running a chat model with an OpenAI-compatible interface,
226
-
set the [`model`](#model) property to make the model accessible via `dstack`'s
227
-
global OpenAI-compatible endpoint, and also accessible via `dstack`'s UI.
228
-
229
236
### Authorization
230
237
231
238
By default, the service enables authorization, meaning the service endpoint requires a `dstack` user token.
@@ -364,7 +371,7 @@ set [`strip_prefix`](../reference/dstack.yml/service.md#strip_prefix) to `false`
364
371
If your app cannot be configured to work with a path prefix, you can host it
365
372
on a dedicated domain name by setting up a [gateway](gateways.md).
366
373
367
-
### Rate limits { #rate-limits }
374
+
### Rate limits
368
375
369
376
If you have a [gateway](gateways.md), you can configure rate limits for your service
370
377
using the [`rate_limits`](../reference/dstack.yml/service.md#rate_limits) property.
@@ -413,6 +420,11 @@ Limits apply to the whole service (all replicas) and per client (by IP). Clients
413
420
414
421
</div>
415
422
423
+
### Model
424
+
425
+
If the service runs a model with an OpenAI-compatible interface, you can set the [`model`](#model) property to make the model accessible through `dstack`'s chat UI on the `Models` page.
426
+
In this case, `dstack` will use the service's `/v1/chat/completions` service.
427
+
416
428
### Resources
417
429
418
430
If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
110
-
is available at `https://gateway.<gateway domain>/`.
108
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill-deepseek.<gateway domain>/`.
If you'd like to use a custom routing policy, e.g. by leveraging the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#), create a gateway with `router` set to `sglang`. Check out [gateways](https://dstack.ai/docs/concepts/gateways#router) for more details.
114
113
115
-
> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling or HTTPs, rate-limits, etc), the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
114
+
> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling or HTTPs, rate-limits, etc), the service endpoint will be available at `https://deepseek-r1.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
114
-
is available at `https://gateway.<gateway domain>/`.
112
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama4-scout.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
363
-
is available at `https://gateway.<gateway domain>/`.
361
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
111
-
is available at `https://gateway.<gateway domain>/`.
109
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://llama31.<gateway domain>/`.
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
294
-
is available at `https://gateway.<gateway domain>/`.
292
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://deepseek-r1.<gateway domain>/`.
0 commit comments