
Commit 1e82eda

docs(openapi): document timeout and stall config
1 parent d9baac6 commit 1e82eda

8 files changed

Lines changed: 563 additions & 1 deletion


README.md

Lines changed: 10 additions & 1 deletion
@@ -24,6 +24,7 @@ Plexus sits in front of your LLM providers and handles protocol translation, loa
 - **Model aliasing & load balancing** — Define virtual model names backed by multiple real providers with `random`, `cost`, `performance`, `latency`, or `in_order` selectors
 - **Vision fallthrough** — Automatically convert images to text descriptions for models that don't natively support vision, ensuring compatibility across all providers
 - **Intelligent failover** — Exponential backoff cooldowns automatically remove unhealthy providers from rotation
+- **Stream safety controls** — Detect client disconnects, enforce upstream timeouts, and catch provider stalls so stuck streams don't burn quota forever
 - **Usage tracking** — Per-request cost, token counts, latency, and TPS metrics with a built-in dashboard
 - **MCP proxy** — Proxy Model Context Protocol servers through Plexus with per-request session isolation
 - **User quotas** — Per-API-key rate limiting by requests or tokens with rolling, daily, or weekly windows, along with cost restriction.
@@ -196,7 +197,15 @@ Limit how much each API key can consume using rolling, daily, or weekly windows:

 When a provider fails, Plexus removes it from rotation using exponential backoff: 2 min → 4 min → 8 min → ... → 5 hr cap. Successful requests reset the counter. Set disable cooldown: true on a provider to opt it out entirely.

-→ See [Configuration: cooldown](docs/CONFIGURATION.md#cooldown-optional)
+→ See [Configuration: cooldown](docs/CONFIGURATION.md#cooldowns)
+
+### Stream Cancellation & Provider Stall Protection
+
+- Client disconnects now cancel the upstream provider request, reducing wasted tokens/quota on abandoned streams.
+- Global and per-provider upstream timeouts cut off requests that run too long.
+- Optional stall detection can fail over slow-to-start providers before bytes reach the client, and can abort streams that become too slow mid-flight.
+
+→ See [Configuration: Request Timeouts](docs/CONFIGURATION.md#request-timeouts) and [Configuration: Stall Detection](docs/CONFIGURATION.md#stall-detection)

 ### MCP Proxy

docs/CONFIGURATION.md

Lines changed: 200 additions & 0 deletions
@@ -90,7 +90,9 @@ A **provider** represents an upstream AI service that Plexus routes requests to.
 | **Enabled** | Whether this provider is active for routing | No (default: true) |
 | **Headers** | Custom HTTP headers sent with every request | No |
 | **Extra Body** | Additional fields merged into every request | No |
+| **Upstream Timeout** | Per-provider request timeout override in milliseconds. If unset, the global timeout is used. | No |
 | **Disable Cooldown** | Exclude from automatic cooldown on errors | No |
+| **Stall Detection Overrides** | Optional per-provider overrides for TTFB and throughput stall detection. Leave a field empty to inherit the global setting for that field. | No |
 | **Adapters** | Request/response rewrite hooks applied to every model under this provider (see [Provider Adapters](#provider-adapters)) | No |

 ### Multi-Protocol Providers
@@ -185,6 +187,34 @@ Quota checkers monitor upstream provider rate limits and prevent routing to exha

 Quota data is available via the Management API — see [API Reference: Quota Management](/docs/openapi/openapi.yaml#/paths/~1v0~1management~1quotas).

+### Provider Timeout Overrides
+
+Each provider can optionally set `timeoutMs` to override the global upstream timeout for requests routed to that provider.
+
+- **Unset / omitted**: inherit the global timeout
+- **Set to a number**: use that provider-specific timeout instead
+- **Valid range**: any positive integer number of milliseconds in provider config; the Admin UI limits the value to **1–3600 seconds**
+
+Use provider overrides when one backend is predictably slower or faster than the rest. For example, you might keep a **300s** global timeout but set a **30s** timeout on a fast inference endpoint so stuck requests fail over much sooner.
+
+### Provider Stall Detection Overrides
+
+Each provider can optionally override any of the global stall detection settings with these fields:
+
+- `stallTtfbMs` — per-provider TTFB timeout override
+- `stallTtfbBytes` — per-provider TTFB byte threshold override
+- `stallMinBps` — per-provider minimum throughput override
+- `stallWindowMs` — per-provider sliding-window width override
+- `stallGracePeriodMs` — per-provider grace period override
+
+**Important inheritance rules:**
+
+- If a field is **omitted**, the provider inherits the global setting for that field.
+- For the nullable threshold fields (`stallTtfbMs`, `stallMinBps`), setting the value to **`null`** disables that stall dimension for the provider.
+- The tuning fields (`stallTtfbBytes`, `stallWindowMs`, `stallGracePeriodMs`) cannot disable anything: a `null` or empty value behaves like omission and inherits the global value.
+
+This lets you keep one global policy while tightening or relaxing stall protection for known outliers.
+
 ---

 ## Model Aliases
@@ -435,6 +465,176 @@ See [API Reference: Cooldown Management](/docs/openapi/openapi.yaml#/paths/~1v0~

 ---

+## Request Timeouts
+
+Plexus can abort upstream requests that run too long instead of waiting forever.
+
+Timeouts matter for both reliability and cost:
+
+- they stop abandoned or hung upstream requests from continuing to burn quota,
+- they allow the dispatcher to fail over to another provider when appropriate,
+- and they put a hard cap on runaway or extremely slow streams.
+
+### Default Behavior
+
+- **Global default timeout**: `300` seconds
+- **Per-provider override**: optional via `timeoutMs`
+- **Effective timeout**: `provider.timeoutMs ?? timeout.defaultSeconds * 1000` (the provider override if set, otherwise the global default converted to milliseconds)
+
+If the timeout fires before the request completes:
+
+- Plexus aborts the upstream fetch,
+- the request is recorded with `responseStatus = "timeout"`,
+- and the dispatcher may fail over to the next provider if the request is still at a stage where retrying is safe.
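The effective-timeout rule can be sketched in TypeScript as follows. This is a minimal illustration, not Plexus's dispatcher code: the `TimeoutSettings` and `ProviderTimeout` shapes and the `resolveTimeoutMs` and `armUpstreamTimeout` helpers are hypothetical, and the sketch assumes a runtime with a global `AbortController` and timers (Node 18+ or a browser).

```typescript
// Hypothetical shapes mirroring the documented fields; not Plexus's own types.
interface TimeoutSettings {
  defaultSeconds: number; // global timeout in seconds (1–3600)
}

interface ProviderTimeout {
  timeoutMs?: number; // optional per-provider override in milliseconds
}

// Effective timeout: the provider override if set, else the global default in ms.
function resolveTimeoutMs(provider: ProviderTimeout, global: TimeoutSettings): number {
  return provider.timeoutMs ?? global.defaultSeconds * 1000;
}

// Arms an abort timer for one upstream attempt. The caller would pass
// `controller.signal` to fetch() and call `clear()` once the response completes.
function armUpstreamTimeout(
  provider: ProviderTimeout,
  global: TimeoutSettings,
): { controller: AbortController; clear: () => void } {
  const controller = new AbortController();
  const ms = resolveTimeoutMs(provider, global);
  const timer = setTimeout(() => controller.abort(new Error(`upstream timeout after ${ms} ms`)), ms);
  return { controller, clear: () => clearTimeout(timer) };
}
```

Tying the timer to an `AbortSignal` means the in-flight fetch is actually cancelled when it fires, rather than merely being ignored by the caller.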
+
+### Global Timeout Configuration
+
+Global timeout settings live in system settings and are exposed in the Admin UI under **Config → Timeout Settings** and through the management API.
+
+| Field | Type | Default | Range | Meaning |
+|------|------|---------|-------|---------|
+| `defaultSeconds` | integer | `300` | `1–3600` | Default maximum duration for any upstream request unless the selected provider overrides it |
+
+### Per-Provider Timeout Configuration
+
+| Field | Type | Default | Meaning |
+|------|------|---------|---------|
+| `timeoutMs` | integer (milliseconds) | inherit global | Override the global timeout for requests routed to this provider |
+
+### Management API
+
+- `GET /v0/management/config/timeout` — returns the effective global timeout config
+- `PATCH /v0/management/config/timeout` — partial update of timeout settings
+
+Example:
+
+```json
+PATCH /v0/management/config/timeout
+{
+  "defaultSeconds": 120
+}
+```
+
+### Tuning Guidance
+
+- Start with the default `300s` if you are unsure.
+- Lower it for providers that should either respond quickly or fail fast.
+- Raise it only for providers that legitimately need long-running responses.
+- Prefer a provider-specific override over increasing the global timeout for everyone.
+
+---
+
+## Stall Detection
+
+Stall detection protects against providers that are technically connected but behaving too slowly to be useful.
+
+Unlike a plain timeout, stall detection can distinguish between:
+
+- a provider that **never starts producing meaningful bytes** (TTFB stall), and
+- a provider that **starts streaming but then slows below an acceptable throughput floor** (throughput stall).
+
+By default, stall detection is effectively **off** because both threshold dimensions are disabled:
+
+- `ttfbSeconds = null`
+- `minBytesPerSecond = null`
+
+The supporting tuning values still have defaults even when detection is disabled:
+
+- `ttfbBytes = 100`
+- `windowSeconds = 10`
+- `gracePeriodSeconds = 30`
+
+### Two Stall Dimensions
+
+#### 1. TTFB Stall Detection
+
+TTFB (time to first bytes) detection checks whether the provider produces **enough bytes** before a deadline.
+
+It uses two values together:
+
+- `ttfbSeconds` — how long Plexus waits for the first `ttfbBytes` bytes
+- `ttfbBytes` — how many bytes must arrive within that time to count as meaningful output
+
+If the threshold is not met in time, Plexus treats the provider as stalled and aborts that attempt.
+
+**Important:** when this happens before any response bytes reach the client, the dispatcher can transparently fail over to another provider.
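A minimal sketch of the TTFB check, assuming the dispatcher reports each upstream chunk with a millisecond timestamp and also polls `check()` from a timer, so a provider that sends nothing at all is still caught. `TtfbTracker` and its methods are hypothetical names, not Plexus's implementation; `ttfbMs` stands for `ttfbSeconds × 1000`, with `null` meaning the dimension is disabled.

```typescript
// Illustrative TTFB stall tracker; class and method names are hypothetical.
type TtfbState = "waiting" | "started" | "stalled";

class TtfbTracker {
  private received = 0;
  private state: TtfbState = "waiting";
  private readonly startedAtMs: number;
  private readonly ttfbMs: number | null; // null = TTFB detection disabled
  private readonly ttfbBytes: number;

  constructor(startedAtMs: number, ttfbMs: number | null, ttfbBytes: number) {
    this.startedAtMs = startedAtMs;
    this.ttfbMs = ttfbMs;
    this.ttfbBytes = ttfbBytes;
  }

  // Deadline check (e.g. driven by a timer): marks the attempt stalled if
  // ttfbBytes have not arrived within ttfbMs of the attempt starting.
  check(nowMs: number): TtfbState {
    if (this.state === "waiting" && this.ttfbMs !== null && nowMs - this.startedAtMs > this.ttfbMs) {
      this.state = "stalled";
    }
    return this.state;
  }

  // Records an upstream chunk; the attempt counts as started once the
  // cumulative byte count reaches ttfbBytes before the deadline.
  onChunk(nowMs: number, byteLength: number): TtfbState {
    if (this.check(nowMs) !== "waiting") return this.state;
    this.received += byteLength;
    if (this.received >= this.ttfbBytes) this.state = "started";
    return this.state;
  }
}
```

Counting bytes rather than waiting for the first chunk means a provider that emits only a tiny keep-alive fragment does not count as having started.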
+
+#### 2. Throughput Stall Detection
+
+Throughput detection applies **after** the provider has started responding.
+
+It uses:
+
+- `minBytesPerSecond` — minimum acceptable streaming rate
+- `windowSeconds` — sliding window width for measuring throughput
+- `gracePeriodSeconds` — delay after TTFB success before throughput enforcement starts
+
+The grace period is especially important for reasoning-heavy models that may pause naturally after their first chunk.
+
+If throughput drops below the configured floor, Plexus aborts the stream and records the request as `stall`.
+
+**Important:** once bytes have already been sent to the client, Plexus cannot transparently fail over the same response stream. The client must retry.
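The throughput rule can be sketched the same way. This hypothetical `ThroughputMonitor` records chunk sizes, ignores everything until `gracePeriodMs` after the TTFB threshold was met, and then compares the bytes seen in the last `windowMs` against `minBps`. How Plexus actually samples the stream and how often it evaluates the window are assumptions here; the sketch expects `isStalled()` to be polled periodically.

```typescript
// Illustrative sliding-window throughput check; names are hypothetical.
class ThroughputMonitor {
  private readonly samples: Array<{ atMs: number; bytes: number }> = [];
  private readonly enforceFromMs: number;
  private readonly minBps: number | null; // null = throughput detection disabled
  private readonly windowMs: number;

  constructor(ttfbMetAtMs: number, minBps: number | null, windowMs: number, gracePeriodMs: number) {
    this.enforceFromMs = ttfbMetAtMs + gracePeriodMs;
    this.minBps = minBps;
    this.windowMs = windowMs;
  }

  onChunk(atMs: number, bytes: number): void {
    this.samples.push({ atMs, bytes });
  }

  // Returns true when the stream should be aborted as a throughput stall.
  isStalled(nowMs: number): boolean {
    if (this.minBps === null) return false;
    if (nowMs < this.enforceFromMs) return false; // still inside the grace period
    const windowStart = nowMs - this.windowMs;
    // Drop samples that have slid out of the window (oldest first).
    while (this.samples.length > 0 && this.samples[0].atMs <= windowStart) this.samples.shift();
    const bytesInWindow = this.samples.reduce((sum, s) => sum + s.bytes, 0);
    return bytesInWindow / (this.windowMs / 1000) < this.minBps;
  }
}
```

Averaging over a window rather than reacting to a single slow gap is what lets bursty but healthy streams survive short pauses; widening `windowMs` smooths the measurement further.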
+
+### Global Stall Settings
+
+Global stall settings live in the Admin UI under **Config → Stall Detection** and in the management API.
+
+| Field | Type | Default | Range | Meaning |
+|------|------|---------|-------|---------|
+| `ttfbSeconds` | integer or `null` | `null` | `5–120` or `null` | Max time to wait for the first meaningful bytes. `null` disables TTFB stall detection. |
+| `ttfbBytes` | integer | `100` | `50–10000` | Byte threshold that must arrive within `ttfbSeconds` to count as “started”. |
+| `minBytesPerSecond` | integer or `null` | `null` | `50–5000` or `null` | Minimum acceptable streaming throughput. `null` disables throughput stall detection. |
+| `windowSeconds` | integer | `10` | `3–30` | Sliding window width used to calculate throughput. |
+| `gracePeriodSeconds` | integer | `30` | `0–120` | Delay after TTFB success before throughput enforcement starts. |
+
+### Effective Behavior and Inheritance
+
+- Stall detection is **enabled** for a request if either threshold (`ttfbSeconds` / `stallTtfbMs`, or `minBytesPerSecond` / `stallMinBps`) is non-null after provider overrides are applied on top of the global settings.
+- Provider overrides take precedence over global settings.
+- Per-provider overrides can enable stall detection even if the global stall config is disabled.
+- Leaving provider fields empty keeps the global value for that field.
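The inheritance rules can be expressed as a small resolution function. The shapes below are hypothetical: global settings use the seconds-based fields from the Global Stall Settings table, provider overrides use the millisecond `stall*` fields, and `undefined` stands for an omitted field. It is a sketch of the documented rules, not Plexus's resolver.

```typescript
// Hypothetical shapes; global values in seconds, provider overrides in milliseconds.
interface GlobalStall {
  ttfbSeconds: number | null;
  ttfbBytes: number;
  minBytesPerSecond: number | null;
  windowSeconds: number;
  gracePeriodSeconds: number;
}

interface ProviderStallOverrides {
  stallTtfbMs?: number | null; // null disables TTFB detection for this provider
  stallTtfbBytes?: number | null; // null behaves like omission
  stallMinBps?: number | null; // null disables throughput detection
  stallWindowMs?: number | null; // null behaves like omission
  stallGracePeriodMs?: number | null; // null behaves like omission
}

interface EffectiveStall {
  enabled: boolean;
  ttfbMs: number | null;
  ttfbBytes: number;
  minBps: number | null;
  windowMs: number;
  gracePeriodMs: number;
}

function resolveStallConfig(g: GlobalStall, p: ProviderStallOverrides): EffectiveStall {
  // Threshold fields: undefined inherits, null disables, a number overrides.
  const ttfbMs =
    p.stallTtfbMs !== undefined ? p.stallTtfbMs : g.ttfbSeconds === null ? null : g.ttfbSeconds * 1000;
  const minBps = p.stallMinBps !== undefined ? p.stallMinBps : g.minBytesPerSecond;
  return {
    enabled: ttfbMs !== null || minBps !== null,
    ttfbMs,
    // Tuning fields: both null and undefined fall back to the global value.
    ttfbBytes: p.stallTtfbBytes ?? g.ttfbBytes,
    minBps,
    windowMs: p.stallWindowMs ?? g.windowSeconds * 1000,
    gracePeriodMs: p.stallGracePeriodMs ?? g.gracePeriodSeconds * 1000,
  };
}
```

The two fallbacks differ on purpose: `!== undefined` keeps an explicit `null` for the threshold fields, while `??` treats `null` as "inherit" for the tuning fields.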
+
+### Management API
+
+- `GET /v0/management/config/stall` — returns the current global stall detection config
+- `PATCH /v0/management/config/stall` — partial update of stall settings
+
+Example:
+
+```json
+PATCH /v0/management/config/stall
+{
+  "ttfbSeconds": 15,
+  "ttfbBytes": 100,
+  "minBytesPerSecond": 500,
+  "windowSeconds": 10,
+  "gracePeriodSeconds": 30
+}
+```
+
+Disable one dimension while keeping the other:
+
+```json
+PATCH /v0/management/config/stall
+{
+  "ttfbSeconds": null,
+  "minBytesPerSecond": 400
+}
+```
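For the partial-update semantics, a server-side handler might validate each field present in the PATCH body against the ranges in the Global Stall Settings table before merging it into the current settings. The names `StallSettings`, `StallPatch`, and `applyStallPatch` are hypothetical, and the real endpoint's validation order and error format may differ.

```typescript
// Hypothetical sketch; ranges come from the Global Stall Settings table.
type StallSettings = {
  ttfbSeconds: number | null;
  ttfbBytes: number;
  minBytesPerSecond: number | null;
  windowSeconds: number;
  gracePeriodSeconds: number;
};

// A PATCH body is untrusted JSON: any field may be absent, null, or out of range.
type StallPatch = { [K in keyof StallSettings]?: number | null };

const RANGES: Record<keyof StallSettings, { min: number; max: number; nullable: boolean }> = {
  ttfbSeconds: { min: 5, max: 120, nullable: true },
  ttfbBytes: { min: 50, max: 10_000, nullable: false },
  minBytesPerSecond: { min: 50, max: 5_000, nullable: true },
  windowSeconds: { min: 3, max: 30, nullable: false },
  gracePeriodSeconds: { min: 0, max: 120, nullable: false },
};

// Validates every field present in the patch and returns a merged copy;
// fields absent from the patch keep their current values.
function applyStallPatch(current: StallSettings, patch: StallPatch): StallSettings {
  const next: StallSettings = { ...current };
  for (const key of Object.keys(patch) as Array<keyof StallSettings>) {
    const range = RANGES[key];
    if (range === undefined) throw new Error(`unknown field: ${String(key)}`);
    const value = patch[key];
    if (value === undefined) continue;
    if (value === null) {
      if (!range.nullable) throw new Error(`${key} cannot be null`);
    } else if (!Number.isInteger(value) || value < range.min || value > range.max) {
      throw new Error(`${key} must be an integer between ${range.min} and ${range.max}`);
    }
    (next as Record<string, number | null>)[key] = value;
  }
  return next;
}
```

Applied to the second example above, this keeps `ttfbBytes`, `windowSeconds`, and `gracePeriodSeconds` unchanged while disabling TTFB detection and setting the throughput floor to 400 B/s.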
+
+### Recommended Starting Points
+
+- **Fail over slow starters only**: set `ttfbSeconds`, leave `minBytesPerSecond` as `null`
+- **Protect long streams too**: set both `ttfbSeconds` and `minBytesPerSecond`
+- **Reasoning-heavy models**: keep a longer `gracePeriodSeconds`
+- **Bursty but healthy streams**: increase `windowSeconds` before raising `minBytesPerSecond`
+
+### Relationship to Client Disconnects
+
+Separate from timeout/stall settings, Plexus now also cancels upstream provider requests when the downstream client disconnects during streaming. This reduces wasted quota for abandoned requests even when neither timeouts nor stall detection are enabled.
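Client-disconnect cancellation, timeouts, and stall aborts can all share one `AbortController` per upstream attempt, with the abort reason recording which of the `cancelled`, `timeout`, or `stall` statuses applies. This is a hypothetical sketch: the function names and the tagged `{ kind }` reason are assumptions, and it requires a runtime where `AbortSignal.reason` is available, such as Node 18+.

```typescript
// Hypothetical sketch: one controller per upstream attempt, aborted with a
// tagged reason so the outcome can be recorded as a responseStatus.
type ResponseStatus = "success" | "error" | "pending" | "cancelled" | "timeout" | "stall";
type AbortKind = "cancelled" | "timeout" | "stall";

// Creates the upstream controller and cancels it when the downstream client goes away.
// Its `signal` would be passed to the upstream fetch().
function linkToClient(clientSignal: AbortSignal): AbortController {
  const upstream = new AbortController();
  const onClientGone = () => abortUpstream(upstream, "cancelled");
  if (clientSignal.aborted) onClientGone();
  else clientSignal.addEventListener("abort", onClientGone, { once: true });
  return upstream;
}

// Timeout timers and stall detectors would call this with their own kind;
// the first abort wins, so a later cause cannot overwrite the recorded reason.
function abortUpstream(upstream: AbortController, kind: AbortKind): void {
  if (!upstream.signal.aborted) upstream.abort({ kind });
}

// Maps the attempt's outcome to a status value.
function statusFor(upstream: AbortController, ok: boolean): ResponseStatus {
  if (upstream.signal.aborted) {
    const reason = upstream.signal.reason as { kind?: AbortKind } | undefined;
    return reason?.kind ?? "error";
  }
  return ok ? "success" : "error";
}
```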
+
+---
+
 ## MCP Servers

 Plexus proxies [Model Context Protocol](https://modelcontextprotocol.io) servers. Only HTTP streaming transport is supported.

docs/openapi/components/schemas/ProviderConfig.yaml

Lines changed: 54 additions & 0 deletions
@@ -194,3 +194,57 @@ properties:

       Pass-through optimisation is automatically disabled when any adapter
       is active.
+  timeoutMs:
+    type: integer
+    minimum: 1
+    description: >
+      Optional per-provider upstream request timeout in milliseconds.
+      When omitted, Plexus uses the global timeout from
+      `/v0/management/config/timeout`.
+  stallTtfbMs:
+    type:
+      - integer
+      - 'null'
+    minimum: 5000
+    maximum: 120000
+    description: >
+      Optional per-provider TTFB stall timeout override in milliseconds.
+      Omit to inherit the global value. `null` disables TTFB stall detection
+      for this provider.
+  stallTtfbBytes:
+    type:
+      - integer
+      - 'null'
+    minimum: 50
+    maximum: 10000
+    description: >
+      Optional per-provider byte threshold used to confirm meaningful first
+      output for stall detection. Omit to inherit the global value.
+  stallMinBps:
+    type:
+      - integer
+      - 'null'
+    minimum: 50
+    maximum: 5000
+    description: >
+      Optional per-provider minimum throughput override in bytes per second.
+      Omit to inherit the global value. `null` disables throughput stall
+      detection for this provider.
+  stallWindowMs:
+    type:
+      - integer
+      - 'null'
+    minimum: 3000
+    maximum: 30000
+    description: >
+      Optional per-provider sliding-window width for throughput stall
+      detection, in milliseconds. Omit to inherit the global value.
+  stallGracePeriodMs:
+    type:
+      - integer
+      - 'null'
+    minimum: 0
+    maximum: 120000
+    description: >
+      Optional per-provider grace period before throughput stall detection
+      starts, in milliseconds. Omit to inherit the global value.

docs/openapi/components/schemas/UsageRecord.yaml

Lines changed: 6 additions & 0 deletions
@@ -251,12 +251,18 @@ properties:
       - success
       - error
       - pending
+      - cancelled
+      - timeout
+      - stall
     description: >
       Outcome of the request.

       - **success** — Upstream returned a successful response.
       - **error** — Upstream returned an error or the request failed.
       - **pending** — Request is in-flight (used for async tracking).
+      - **cancelled** — The downstream client disconnected and Plexus cancelled the upstream request.
+      - **timeout** — Plexus aborted the upstream request because it exceeded the configured timeout.
+      - **stall** — Plexus aborted the request because stream stall detection fired.
   toolsDefined:
     type:
       - integer

docs/openapi/openapi.yaml

Lines changed: 4 additions & 0 deletions
@@ -406,6 +406,10 @@ paths:
     $ref: paths/v0_management_config_vision-fallthrough.yaml
   /v0/management/config/background-exploration:
     $ref: paths/v0_management_config_background-exploration.yaml
+  /v0/management/config/timeout:
+    $ref: paths/v0_management_config_timeout.yaml
+  /v0/management/config/stall:
+    $ref: paths/v0_management_config_stall.yaml
   /v0/management/config/cooldown:
     $ref: paths/v0_management_config_cooldown.yaml
   /v0/management/config/exploration-rate:
