docs(readme): expand TOC, document media-gen valves, fix stale architecture refs
README pass: align with the v1.10.0 surface and fix minor staleness left
over after the audit batch shipped.
TOC
- Add subsection links that were missing under Usage (Common valve
combinations, Reasoning tokens, Citations) and Configuration
(Media Generation, Cost Display, Per-user settings, API key
encryption, Tool calling).
Badges
- Add a release badge driven by github.com/.../releases/latest and a
test-count badge (868 tests).
Configuration — new Media Generation valve table
- VIDEO_GENERATION_TIMEOUT (default 600 s) + VIDEO_POLL_INTERVAL
(default 5 s)
- AUDIO_OUTPUT_FORMAT (default mp3, ignored for openai/gpt-audio*
which is auto-forced to pcm16 + WAV-wrapped)
- AUDIO_OUTPUT_VOICE (default alloy, ignored by music models like
Lyria)
Display & Filtering
- OUTPUT_MODALITIES doc string lists 'video' alongside text/image/
audio/embeddings.
Common valve combinations
- Add rows for flux image gen, grok-imagine cheap video, Lyria
music, gpt-audio-mini speech (with AUDIO_OUTPUT_VOICE hint),
SHOW_REMAINING_CREDIT, SHOW_COST_INFO + currency, ZDR_ENFORCE.
Architecture table
- Replace the stale '_non_stream_response()' row with
'_non_stream_fetch() + _non_stream_with_events()' (the actual
live path) and add rows for the tool loop
(_run_tools_{stream,nonstream} + _stream_one_round), video
generation (_run_video_generation), audio generation
(_materialize_audio_output + _wrap_pcm16_as_wav), the shared
upload helper (_owui_upload_bytes), and the SSRF / size / MIME
security guards.
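The interaction between VIDEO_GENERATION_TIMEOUT and VIDEO_POLL_INTERVAL can be pictured as a deadline-bounded poll loop. The following is only a minimal sketch under stated assumptions, not the code in `_run_video_generation()`: the function name `poll_until_done`, the injected `fetch_status` callable, and the `"completed"` / `"failed"` status strings are all hypothetical and chosen for illustration.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    timeout_s: float = 600.0,       # mirrors the VIDEO_GENERATION_TIMEOUT default
    poll_interval_s: float = 5.0,   # mirrors the VIDEO_POLL_INTERVAL default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until it reports a terminal state or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        # Assumed terminal states; the real API's status values may differ.
        if job.get("status") in ("completed", "failed"):
            return job
        # Give up if the next poll would land after the hard timeout.
        if clock() + poll_interval_s > deadline:
            raise TimeoutError(f"video job still pending after {timeout_s} s")
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a caller would pass a function that performs the actual status request as `fetch_status`.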
-|`OUTPUT_MODALITIES`|`OPENROUTER_OUTPUT_MODALITIES`|`all`| Output modalities to fetch from `/models`. `all` (default) lists every model. Restrict with `text`, `image`, `audio`, `embeddings`, or a comma list (e.g. `text,audio`) |
+|`OUTPUT_MODALITIES`|`OPENROUTER_OUTPUT_MODALITIES`|`all`| Output modalities to fetch from `/models`. `all` (default) lists every model. Restrict with `text`, `image`, `audio`, `video`, `embeddings`, or a comma list (e.g. `text,image,video`) |
 |`HIDE_DEPRECATED_MODELS`|`OPENROUTER_HIDE_DEPRECATED_MODELS`|`false`| Hide models with a non-null `expiration_date`. When False, deprecated models are tagged `⚠ {name} (deprecated)`|
@@ -182,6 +199,18 @@ Every valve accepts an environment variable fallback. The table below lists both
 |`DATA_COLLECTION`|`OPENROUTER_DATA_COLLECTION`|`allow`| Data policy: `allow` or `deny`|
 |`ZDR_ENFORCE`|`OPENROUTER_ZDR_ENFORCE`|`false`| Send `provider.zdr=true` so OpenRouter routes only to ZDR endpoints (request fails if none available) |
 
+### Media Generation
+
+Tunes the new image / video / audio output flows. Defaults are tuned for OpenRouter's
+documented behaviour — most installs never need to change them.
+
+| Valve | Env Var | Default | Description |
+| --- | --- | --- | --- |
+|`VIDEO_GENERATION_TIMEOUT`|`OPENROUTER_VIDEO_GENERATION_TIMEOUT`|`600`| Hard timeout for a video job (seconds). Veo/Kling clips typically finish in 30 s – 5 min; raise for longer or higher-resolution outputs |
+|`VIDEO_POLL_INTERVAL`|`OPENROUTER_VIDEO_POLL_INTERVAL`|`5`| Seconds between `GET /videos/<id>` poll requests. 5 – 10 s is a good range |
+|`AUDIO_OUTPUT_FORMAT`|`OPENROUTER_AUDIO_OUTPUT_FORMAT`|`mp3`| Audio container the pipe requests from audio-output models. Common: `mp3`, `wav`, `flac`, `opus`, `pcm16`. Ignored for OpenAI `gpt-audio*` (forced to `pcm16` because that's the only format the upstream accepts with `stream=true`, then auto-wrapped in a WAV container) |
+|`AUDIO_OUTPUT_VOICE`|`OPENROUTER_AUDIO_OUTPUT_VOICE`|`alloy`| Voice for speech-synthesis audio models (`gpt-audio*`). Common: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`. Music models like Lyria ignore the field |
+
 ### Advanced
 
 | Valve | Env Var | Default | Description |
@@ -251,12 +280,17 @@ The pipe implements the **Manifold** pattern: one pipe entry point that surfaces
 
 | Layer | Files | Responsibility |
 | --- | --- | --- |
-| Entry points |`Pipe.pipes()`, `Pipe.pipe()`| Model listing and chat routing |
+| Entry points |`Pipe.pipes()`, `Pipe.pipe()`| Model listing (with atomic frozenset swap for the audio / video routing sets) and per-request routing |
+| Payload |`_prepare_payload()`| Sanitize OWUI internals, inject provider routing, reasoning, response format, fallbacks, web search, cache control |
+| Transport |`_retryable_request()` + `requests.Session` w/ `HTTPAdapter(pool_maxsize=64)`| Retry wrapper with exponential backoff + Retry-After awareness; one shared connection pool sized for concurrent users |
+| Streaming chat |`_stream_response()` + async `_wrap_stream`| SSE parser, `<think>` management, image/audio capture, final media materialization, mid-stream error sanitisation |
+| Tool loop |`_run_tools_stream()` / `_run_tools_nonstream()` + `_stream_one_round()`| Execute tools, feed results back, cap iterations; both paths now also capture image/audio output via `_stream_media_embeds`|
+| Video generation |`_run_video_generation()`| Submit to `/api/v1/videos`, poll, download with byte cap, embed via block-HTML `<video>`|
+| Audio generation |`_materialize_audio_output()` + `_wrap_pcm16_as_wav()`| Decode base64 audio chunks, wrap PCM in RIFF/WAVE for OpenAI, embed via block-HTML `<audio>`|
+| OWUI file upload |`_owui_upload_bytes()`| Single shared helper backing every image / video / audio re-host through OWUI |
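
The "wrap PCM in RIFF/WAVE" step in the Audio generation row can be illustrated with Python's standard `wave` module. This is a hedged sketch, not the body of `_wrap_pcm16_as_wav()`: the function name below is a stand-in, and the 24 kHz mono defaults are an assumption about the upstream PCM stream rather than something the diff states.

```python
import io
import wave


def wrap_pcm16_as_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Return raw 16-bit little-endian PCM wrapped in a RIFF/WAVE container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)    # assumed mono
        wav.setsampwidth(2)           # pcm16 means 2 bytes per sample
        wav.setframerate(sample_rate) # assumed 24 kHz
        wav.writeframes(pcm)          # header sizes are patched when the writer closes
    return buf.getvalue()
```

Because raw PCM carries no header, most players cannot open it directly; prepending the WAV header is what makes the bytes playable through an `<audio>` element.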