sena-labs · sena-labs · May 7, 2026 · May 7, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,62 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.6.0] — 2026-05-08
+
+### Added
+
+- **Web search plugin** — five new valves (`ENABLE_WEB_SEARCH`, `WEB_SEARCH_MAX_RESULTS`, `WEB_SEARCH_PROMPT`, `WEB_SEARCH_INCLUDE_DOMAINS`, `WEB_SEARCH_EXCLUDE_DOMAINS`) attach OpenRouter's `web` plugin to every request so any model can ground answers in fresh web results, with domain allow/deny lists and a custom search prompt
+- **`MODEL_CATEGORY` valve** — server-side `?category=...` filter on `/models` (e.g. `programming`, `roleplay`, `marketing`, `science`, `legal`, `finance`, `health`, `academia`)
+- **Deprecation handling** — models with a non-null `expiration_date` are tagged `⚠ {name} (deprecated)` in the selector. New `HIDE_DEPRECATED_MODELS` valve removes them entirely
+- **`REASONING_MAX_TOKENS` valve** — hard cap on reasoning tokens per response (sent as `reasoning.max_tokens`) for budget control on deep-thinking models
+- **Provider preferences extras** — `PROVIDER_ONLY` (allowlist), `PROVIDER_QUANTIZATIONS` (e.g. `bf16,fp8`), `PROVIDER_ALLOW_FALLBACKS`, `PROVIDER_MAX_PRICE_PROMPT`, `PROVIDER_MAX_PRICE_COMPLETION`. Translates to `provider.only/quantizations/allow_fallbacks/max_price` per the OpenRouter SDK schema
+- **`SERVICE_TIER` valve** — OpenAI-style tier hint (`auto`/`default`/`flex`/`priority`/`scale`) forwarded to compatible providers
+- **`SHOW_GENERATION_ID` valve** — captures the `id` field from chat-completion responses (works in both streaming and non-streaming modes) and appends `*Generation ID: gen-…*` so users can later call `GET /api/v1/generation?id={id}` for audit trails and per-request usage details
+- **Cached prompt-token cost breakdown** — when the provider reports `prompt_tokens_details.cached_tokens` (Anthropic prompt caching, OpenAI implicit caching, Gemini context caching), the `SHOW_COST_INFO` footer splits out cached vs. non-cached prompt tokens so users can see the savings (Anthropic caches save up to 90% on input cost)
+- **`_build_web_search_plugin()`**, **`_format_generation_id()`** — new helpers on `Pipe`
+
+### Changed
+
+- Model-list cache fingerprint now also includes `MODEL_CATEGORY` and `HIDE_DEPRECATED_MODELS` so toggling either invalidates the cached list
+- `pipes()` now sends `params={"output_modalities": ..., "category": ...}` when a category is set
+- `_prepare_payload()` now emits `service_tier`, `provider.only`, `provider.quantizations`, `provider.allow_fallbacks=false`, `provider.max_price.{prompt,completion}`, `reasoning.max_tokens`, and a `web` entry in `plugins` (without overwriting any user-supplied plugins)
+
+## [1.5.0] — 2026-05-07
+
+### Added
+
+- **Variant model routing** — new `MODEL_VARIANTS` valve (env: `OPENROUTER_MODEL_VARIANTS`). Comma-separated `base_id:variant` entries surface as virtual catalog rows that inherit the base model's display name and provider icon while OpenRouter routes the suffixed ID via its variant logic. Recognised tags: `free`, `thinking`, `online`, `nitro`, `exacto`, `extended`. Example: `MODEL_VARIANTS=openai/gpt-4o:nitro,anthropic/claude-3.5-sonnet:thinking`
+- **Reasoning effort: `minimal` and `xhigh`** — extends `REASONING_EFFORT` with two new levels for fastest/maximum-depth thinking on supporting models
+- **`REASONING_SUMMARY_MODE` valve** (env: `OPENROUTER_REASONING_SUMMARY_MODE`, default `disabled`) — requests a `reasoning.summary` block from supporting models. Options: `auto`, `concise`, `detailed`, `disabled`
+- **Anthropic interleaved thinking** — new `ENABLE_ANTHROPIC_INTERLEAVED_THINKING` valve (default on, env: `OPENROUTER_ANTHROPIC_INTERLEAVED_THINKING`). When the selected model is `anthropic/...`, automatically injects the `anthropic-beta: interleaved-thinking-2025-05-14` header so Claude interleaves reasoning with tool use
+- **`ANTHROPIC_PROMPT_CACHE_TTL` valve** (env: `OPENROUTER_ANTHROPIC_PROMPT_CACHE_TTL`, default `5m`) — extends `ENABLE_CACHE_CONTROL` so the ephemeral cache breakpoint can be set to either `5m` (default) or `1h` for longer cache lifetimes between turns
+- **`TOOL_CALLING_FILTER` valve** (env: `OPENROUTER_TOOL_CALLING_FILTER`, default `all`) — catalog filter for tool-capable models. Options: `all`, `only`, `exclude`. Reads `supported_parameters` from `/models` and matches on `tools`/`tool_choice`
+- **ZDR (Zero Data Retention) support** — two new valves: `ZDR_MODELS_ONLY` (catalog filter — fetches `/endpoints/zdr` and hides models without a ZDR-capable endpoint) and `ZDR_ENFORCE` (request-side — adds `provider.zdr=true` so OpenRouter rejects the call if no ZDR endpoint is available)
+- **`HTTP_REFERER_OVERRIDE` valve** (env: `OPENROUTER_HTTP_REFERER`) — explicit override for the `HTTP-Referer` app-attribution header. Empty falls back to `WEBUI_URL` env or `http://localhost:3000`
+- **`_load_zdr_model_ids()`**, **`_parse_variant_specs()`**, **`_expand_variant_models()`**, **`_resolve_referer()`**, **`_is_anthropic_model()`** — new instance methods on `Pipe`
+
+### Changed
+
+- **Breaking:** `FREE_ONLY` (boolean) replaced by **`FREE_MODEL_FILTER`** (env: `OPENROUTER_FREE_MODEL_FILTER`, default `all`). Options: `all`, `only`, `exclude`. Setups using `FREE_ONLY=true` should switch to `FREE_MODEL_FILTER=only`; setups using `FREE_ONLY=false` need no change
+- **Reasoning payload shape:** when both `REASONING_EFFORT` and `REASONING_SUMMARY_MODE` are set, both fields are merged into the same `reasoning` object instead of overwriting
+- Model-list cache fingerprint now also includes `FREE_MODEL_FILTER`, `TOOL_CALLING_FILTER`, `ZDR_MODELS_ONLY`, and `MODEL_VARIANTS` so toggling any of them invalidates the 5-minute cache
+- `_build_headers()` accepts an optional `model_id` kwarg so it can decide whether to inject Anthropic-specific beta headers
+
+## [1.4.0] — 2026-05-07
+
+### Added
+
+- **`OUTPUT_MODALITIES` valve** (env: `OPENROUTER_OUTPUT_MODALITIES`, default `all`) — controls which model output modalities are fetched from OpenRouter's `/models` endpoint. Accepts `text`, `image`, `audio`, `embeddings`, `all`, or a comma-separated combination
+- **Full-catalog model listing** — TTS (e.g. `openai/gpt-4o-mini-tts-*`), audio-output, image-generation, and embedding models now appear in the Open WebUI model selector by default
+- **Auto-discovered provider icons** — for providers not in the hardcoded fast-path dict, the pipe now lazy-loads OpenRouter's frontend provider registry (`/api/frontend/all-providers`) and resolves the icon from there. Adds icon coverage for ~20 additional model authors (xAI, Inflection, NVIDIA, Arcee, Morph, Cerebras, etc.) including gstatic favicons for providers without an OpenRouter-hosted logo. Slug normalization handles `x-ai` ↔ `xai` style mismatches
+- **`_load_provider_registry()`** and **`_get_provider_icon()`** methods on `Pipe` — layered icon resolution: hardcoded dict → registry exact slug → registry hyphen-stripped slug. Network failures are silent (best-effort fallback)
+
+### Changed
+
+- The `/models` request now passes `output_modalities=all` by default, so the catalog is no longer silently restricted to text-output models. Set `OUTPUT_MODALITIES = text` to restore the previous chat-only behaviour
+- Model-list cache fingerprint now includes `OUTPUT_MODALITIES`, so toggling the valve correctly invalidates the cached list
+- `_is_owui_managed_icon()` now also recognises `https://t0.gstatic.com/faviconV2` URLs as pipe-managed, so registry-sourced gstatic favicons remain overwriteable when OpenRouter updates its provider mapping
+
 ## [1.3.0] — 2026-05-07
 
 ### Added

diff --git a/README.md b/README.md
@@ -4,8 +4,9 @@
 [![Python](https://img.shields.io/badge/Python-%E2%89%A53.10-blue)](https://www.python.org/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 
-Access **340+ AI models** through OpenRouter directly inside Open WebUI — with provider routing,
-reasoning tokens, streaming, fallbacks, and cache control out of the box.
+Access the **full OpenRouter catalog** — chat, TTS, audio, image-generation, and embedding models —
+directly inside Open WebUI, with provider routing, reasoning tokens, streaming, fallbacks, and
+cache control out of the box.
 
 ## Feature gallery
 
@@ -50,15 +51,21 @@ reasoning tokens, streaming, fallbacks, and cache control out of the box.
 
 ## Features
 
-- **Manifold pipe** — exposes all OpenRouter models as native Open WebUI models in the model selector.
+- **Manifold pipe** — exposes the full OpenRouter catalog (chat, TTS, audio, image, embeddings) as native Open WebUI models in the model selector. Configurable via `OUTPUT_MODALITIES` and `MODEL_CATEGORY`.
+- **Web search plugin** — attach OpenRouter's `web` plugin to any model with domain allow/deny lists, custom search prompt, and result-count limits.
+- **Variant routing** — surface virtual `:nitro`/`:exacto`/`:thinking`/`:online`/`:free`/`:extended` model entries that route to OpenRouter's specialized profiles.
+- **Service tier hint** — forward OpenAI-style `flex`/`priority`/`scale` tiers to compatible providers.
+- **Generation auditability** — optional generation ID footer maps each response to OpenRouter's `/generation?id=` activity API.
+- **Cached-input savings** — surface cached vs. non-cached prompt tokens in the cost footer (Anthropic prompt caching, OpenAI implicit caching, Gemini context caching).
+- **Deprecation visibility** — models with an `expiration_date` are tagged with ⚠ in the selector (or hidden via `HIDE_DEPRECATED_MODELS`).
 - **Provider routing** — sort by `price`, `throughput`, or `latency`; prefer or exclude specific providers; enforce `require_parameters`.
 - **Reasoning tokens** — `<think>` blocks streamed in real time with configurable effort (`low`, `medium`, `high`).
 - **Streaming** — full SSE streaming with mid-stream error handling and automatic `<think>` closure on error.
 - **Model fallbacks** — automatic failover to one or more backup models via `FALLBACK_MODELS`.
 - **Middle-out compression** — fits long prompts within context windows (`transforms: ["middle-out"]`).
 - **Cache control** — Anthropic-style `cache_control` injection on the longest message chunk.
 - **Citations** — `[n]` references from web-search-enabled models are converted to markdown links.
-- **Provider icons** — 13 provider logos synced directly into Open WebUI's model database.
+- **Provider icons** — 13 hardcoded fast-path logos plus auto-discovered icons for ~20 more providers (xAI, Inflection, NVIDIA, Arcee, Morph, Cerebras, …) lazy-loaded from OpenRouter's provider registry, all synced directly into Open WebUI's model database.
 - **Retry logic** — exponential backoff with jitter on timeout and connection errors.
 - **FREE_ONLY mode** — filter to show only free-tier models (`:free` suffix or `0/0` pricing).
 - **Pre-flight validation** — invalid API keys are caught at model-fetch time, not after sending a message.
@@ -145,7 +152,10 @@ Every valve accepts an environment variable fallback. The table below lists both
 | Valve | Env Var | Default | Description |
 | --- | --- | --- | --- |
 | `INCLUDE_REASONING` | `OPENROUTER_INCLUDE_REASONING` | `true` | Request reasoning tokens (`<think>` blocks) |
-| `REASONING_EFFORT` | `OPENROUTER_REASONING_EFFORT` | `""` | Effort level: `low`, `medium`, `high`, or empty |
+| `REASONING_EFFORT` | `OPENROUTER_REASONING_EFFORT` | `""` | Effort level: `minimal`, `low`, `medium`, `high`, `xhigh`, or empty |
+| `REASONING_SUMMARY_MODE` | `OPENROUTER_REASONING_SUMMARY_MODE` | `disabled` | Reasoning-summary verbosity: `auto`, `concise`, `detailed`, `disabled` |
+| `REASONING_MAX_TOKENS` | `OPENROUTER_REASONING_MAX_TOKENS` | `0` | Hard cap on reasoning tokens per response (0 disables the cap) |
+| `ENABLE_ANTHROPIC_INTERLEAVED_THINKING` | `OPENROUTER_ANTHROPIC_INTERLEAVED_THINKING` | `true` | Auto-inject `anthropic-beta: interleaved-thinking-2025-05-14` for `anthropic/*` models |
 
 ### Display & Filtering
 
@@ -154,7 +164,13 @@ Every valve accepts an environment variable fallback. The table below lists both
 | `MODEL_PREFIX` | — | `None` | Custom prefix for model names (e.g. `🔥 `) |
 | `MODEL_PROVIDERS` | `OPENROUTER_MODEL_PROVIDERS` | `ALL` | Provider filter (e.g. `openai,anthropic`). `ALL` means no filter |
 | `INVERT_PROVIDER_LIST` | `OPENROUTER_INVERT_PROVIDER_LIST` | `false` | Treat `MODEL_PROVIDERS` as an exclusion list |
-| `FREE_ONLY` | `OPENROUTER_FREE_ONLY` | `false` | Show only free-tier models |
+| `FREE_MODEL_FILTER` | `OPENROUTER_FREE_MODEL_FILTER` | `all` | Free-tier filter: `all` / `only` / `exclude` |
+| `TOOL_CALLING_FILTER` | `OPENROUTER_TOOL_CALLING_FILTER` | `all` | Tool-capable filter (reads `supported_parameters`): `all` / `only` / `exclude` |
+| `OUTPUT_MODALITIES` | `OPENROUTER_OUTPUT_MODALITIES` | `all` | Output modalities to fetch from `/models`. `all` (default) lists every model. Restrict with `text`, `image`, `audio`, `embeddings`, or a comma list (e.g. `text,audio`) |
+| `MODEL_VARIANTS` | `OPENROUTER_MODEL_VARIANTS` | `""` | Comma-separated `base_id:tag` entries that surface virtual variant models (e.g. `openai/gpt-4o:nitro`). Tags: `free`, `thinking`, `online`, `nitro`, `exacto`, `extended` |
+| `MODEL_CATEGORY` | `OPENROUTER_MODEL_CATEGORY` | `""` | Server-side category filter (`?category=`). Common values: `programming`, `roleplay`, `marketing`, `science`, `legal`, `finance`, `health`, `academia` |
+| `HIDE_DEPRECATED_MODELS` | `OPENROUTER_HIDE_DEPRECATED_MODELS` | `false` | Hide models with a non-null `expiration_date`. When False, deprecated models are tagged `⚠ {name} (deprecated)` |
+| `ZDR_MODELS_ONLY` | `OPENROUTER_ZDR_MODELS_ONLY` | `false` | Catalog-side: hide models without a ZDR endpoint (reads `/endpoints/zdr`) |
 
 ### Provider Routing
 
@@ -163,16 +179,30 @@ Every valve accepts an environment variable fallback. The table below lists both
 | `PROVIDER_SORT` | `OPENROUTER_PROVIDER_SORT` | `""` | Sort: `price`, `throughput`, `latency` |
 | `PROVIDER_ORDER` | `OPENROUTER_PROVIDER_ORDER` | `""` | Preferred providers (comma-separated) |
 | `PROVIDER_IGNORE` | `OPENROUTER_PROVIDER_IGNORE` | `""` | Excluded providers (comma-separated) |
+| `PROVIDER_ONLY` | `OPENROUTER_PROVIDER_ONLY` | `""` | Provider allowlist (comma-separated). Merged with account-wide settings |
+| `PROVIDER_QUANTIZATIONS` | `OPENROUTER_PROVIDER_QUANTIZATIONS` | `""` | Allowed quantizations (comma-separated, e.g. `bf16,fp8`) |
+| `PROVIDER_ALLOW_FALLBACKS` | `OPENROUTER_PROVIDER_ALLOW_FALLBACKS` | `true` | When False, OpenRouter fails fast on the primary/ordered provider instead of falling back |
+| `PROVIDER_MAX_PRICE_PROMPT` | `OPENROUTER_PROVIDER_MAX_PRICE_PROMPT` | `""` | Maximum prompt price (USD per 1M tokens) |
+| `PROVIDER_MAX_PRICE_COMPLETION` | `OPENROUTER_PROVIDER_MAX_PRICE_COMPLETION` | `""` | Maximum completion price (USD per 1M tokens) |
+| `SERVICE_TIER` | `OPENROUTER_SERVICE_TIER` | `""` | OpenAI-style service tier: `auto`, `default`, `flex`, `priority`, `scale` |
 | `REQUIRE_PARAMETERS` | `OPENROUTER_REQUIRE_PARAMETERS` | `false` | Only use providers that support all request parameters |
 | `DATA_COLLECTION` | `OPENROUTER_DATA_COLLECTION` | `allow` | Data policy: `allow` or `deny` |
+| `ZDR_ENFORCE` | `OPENROUTER_ZDR_ENFORCE` | `false` | Send `provider.zdr=true` so OpenRouter routes only to ZDR endpoints (request fails if none available) |
 
 ### Advanced
 
 | Valve | Env Var | Default | Description |
 | --- | --- | --- | --- |
 | `FALLBACK_MODELS` | `OPENROUTER_FALLBACK_MODELS` | `""` | Fallback model IDs (comma-separated) |
 | `ENABLE_MIDDLE_OUT` | `OPENROUTER_ENABLE_MIDDLE_OUT` | `false` | Middle-out compression for long prompts |
+| `ENABLE_WEB_SEARCH` | `OPENROUTER_ENABLE_WEB_SEARCH` | `false` | Attach OpenRouter's `web` plugin so any model can ground answers in fresh web results |
+| `WEB_SEARCH_MAX_RESULTS` | `OPENROUTER_WEB_SEARCH_MAX_RESULTS` | `5` | Max search results passed to the model (1-20) |
+| `WEB_SEARCH_PROMPT` | `OPENROUTER_WEB_SEARCH_PROMPT` | `""` | Optional custom search prompt forwarded to the search engine |
+| `WEB_SEARCH_INCLUDE_DOMAINS` | `OPENROUTER_WEB_SEARCH_INCLUDE_DOMAINS` | `""` | Domain allowlist (supports wildcards & paths) |
+| `WEB_SEARCH_EXCLUDE_DOMAINS` | `OPENROUTER_WEB_SEARCH_EXCLUDE_DOMAINS` | `""` | Domain denylist |
 | `ENABLE_CACHE_CONTROL` | `OPENROUTER_ENABLE_CACHE_CONTROL` | `false` | Inject Anthropic `cache_control` on the longest message |
+| `ANTHROPIC_PROMPT_CACHE_TTL` | `OPENROUTER_ANTHROPIC_PROMPT_CACHE_TTL` | `5m` | TTL for the Anthropic ephemeral cache breakpoint: `5m` or `1h` |
+| `SHOW_GENERATION_ID` | `OPENROUTER_SHOW_GENERATION_ID` | `false` | Append the OpenRouter generation ID to each response (for `GET /generation?id=` lookups) |
 | `SYNC_PROVIDER_ICONS` | `OPENROUTER_SYNC_ICONS` | `true` | Sync provider icons into Open WebUI's model database |
 
 ### Network
@@ -181,6 +211,7 @@ Every valve accepts an environment variable fallback. The table below lists both
 | --- | --- | --- | --- |
 | `REQUEST_TIMEOUT` | `OPENROUTER_REQUEST_TIMEOUT` | `90` | HTTP timeout in seconds |
 | `MAX_RETRIES` | — | `2` | Auto-retry count on transient errors |
+| `HTTP_REFERER_OVERRIDE` | `OPENROUTER_HTTP_REFERER` | `""` | Override the `HTTP-Referer` header sent to OpenRouter (must include scheme). Empty falls back to `WEBUI_URL` |
 
 ## Architecture
 
@@ -320,6 +351,15 @@ A: `FALLBACK_MODELS` adds extra model IDs to the `models` array in the OpenRoute
 primary model fails, OpenRouter automatically tries the next one. Non-streaming responses include
 a "Responded by: model-id" attribution when a fallback handled the request.
 
+**Q: I selected a TTS / embeddings / image-generation model and got an error — why?**
+
+A: The pipe routes every request through OpenRouter's `/chat/completions` endpoint. Models that
+only expose a non-chat endpoint (e.g. pure TTS models served via `/audio/speech`) return an
+"endpoint not supported" error from OpenRouter. The pipe surfaces that error verbatim. Chat
+completion models that *output* audio or images (e.g. `openai/gpt-audio`) work normally — their
+audio transcript and generated images are rendered inline. To hide non-chat models from the
+selector entirely, set `OUTPUT_MODALITIES = text`.
+
 ## License
 
 This project is licensed under the **MIT License** — see the [LICENSE](LICENSE) file for details.
diff --git a/function.json b/function.json
@@ -3,13 +3,13 @@
   "name": "OpenRouter Pipe",
   "type": "manifold",
   "meta": {
-    "description": "Access 340+ AI models through OpenRouter directly inside Open WebUI. Features provider routing, reasoning tokens with <think> tags, full SSE streaming, model fallbacks, middle-out compression, Anthropic cache control, citations, 13 provider icons, and configurable retry logic.",
+    "description": "The definitive OpenRouter integration for Open WebUI. Full catalog (chat/TTS/audio/image/embeddings), variant routing (:nitro/:exacto/:thinking/:online/:free/:extended), web search plugin with domain filters, server-side category filter, deprecation warnings, extended reasoning (minimal→xhigh + max_tokens + summary), Anthropic interleaved thinking + cache TTL, ZDR enforcement, tool/free-tier filters, provider preferences (only/quantizations/max_price/allow_fallbacks), service tier routing (auto/flex/priority/scale), generation-ID auditability, cached-input cost breakdown, model fallbacks, middle-out compression, citations, auto-discovered provider icons.",
     "manifest": {
       "title": "OpenRouter Pipe",
       "author": "Sena Labs",
       "author_url": "https://github.com/sena-labs",
       "funding_url": "https://github.com/sponsors/sena-labs",
-      "version": "1.3.0",
+      "version": "1.6.0",
       "license": "MIT",
       "required_open_webui_version": "0.4.0",
       "requirements": ["requests>=2.20", "pydantic>=2.0"]