Commit 651c8d0

Bihan Rana authored and committed
Update gateway and services docs
1 parent 28e41a9 commit 651c8d0

2 files changed

Lines changed: 3 additions & 3 deletions

docs/docs/concepts/gateways.md

Lines changed: 2 additions & 2 deletions
@@ -119,9 +119,9 @@ router:
 * `round_robin` — Cycles through workers in order.


-> Currently, services using this type of gateway must run standard SGLang workers. See the [example](../../examples/inference/sglang/index.md).
+> Services using this type of gateway can run PD-disaggregated inference. To run PD disaggregation inference, refer to the [SGLang PD-Disaggregation](../../examples/inference/sglang/index.md#pd-disaggregation) example.
 >
-> Support for prefill/decode disaggregation and auto-scaling based on inter-token latency is coming soon.
+> Support for auto-scaling based on TTFT and ITL is coming soon.

 ### Public IP

docs/docs/concepts/services.md

Lines changed: 1 addition & 1 deletion
@@ -231,7 +231,7 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
 > Properties such as `regions`, `port`, `image`, `env` and some other cannot be configured per replica group. This support is coming soon.

 ??? info "Disaggregated serving"
-Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
+Replica groups support disaggregated prefill and decode, allowing both worker types to run within a single service. To run PD disaggregated inference, refer to the [SGLang PD-Disaggregation](../../examples/inference/sglang/index.md#pd-disaggregation) example.

 ### Authorization
