diff --git a/CHANGELOG.md b/CHANGELOG.md index d0c3db8..6fe0eef 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,62 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [1.6.0] — 2026-05-08 + +### Added + +- **Web search plugin** — five new valves (`ENABLE_WEB_SEARCH`, `WEB_SEARCH_MAX_RESULTS`, `WEB_SEARCH_PROMPT`, `WEB_SEARCH_INCLUDE_DOMAINS`, `WEB_SEARCH_EXCLUDE_DOMAINS`) attach OpenRouter's `web` plugin to every request so any model can ground answers in fresh web results, with domain allow/deny lists and a custom search prompt +- **`MODEL_CATEGORY` valve** — server-side `?category=...` filter on `/models` (e.g. `programming`, `roleplay`, `marketing`, `science`, `legal`, `finance`, `health`, `academia`) +- **Deprecation handling** — models with a non-null `expiration_date` are tagged `⚠ {name} (deprecated)` in the selector. New `HIDE_DEPRECATED_MODELS` valve removes them entirely +- **`REASONING_MAX_TOKENS` valve** — hard cap on reasoning tokens per response (sent as `reasoning.max_tokens`) for budget control on deep-thinking models +- **Provider preferences extras** — `PROVIDER_ONLY` (allowlist), `PROVIDER_QUANTIZATIONS` (e.g. `bf16,fp8`), `PROVIDER_ALLOW_FALLBACKS`, `PROVIDER_MAX_PRICE_PROMPT`, `PROVIDER_MAX_PRICE_COMPLETION`. 
Translates to `provider.only/quantizations/allow_fallbacks/max_price` per the OpenRouter SDK schema +- **`SERVICE_TIER` valve** — OpenAI-style tier hint (`auto`/`default`/`flex`/`priority`/`scale`) forwarded to compatible providers +- **`SHOW_GENERATION_ID` valve** — captures the `id` field from chat-completion responses (works in both streaming and non-streaming modes) and appends `*Generation ID: gen-…*` so users can later call `GET /api/v1/generation?id={id}` for audit trails and per-request usage details +- **Cached prompt-token cost breakdown** — when the provider reports `prompt_tokens_details.cached_tokens` (Anthropic prompt caching, OpenAI implicit caching, Gemini context caching), the `SHOW_COST_INFO` footer splits out cached vs. non-cached prompt tokens so users can see the savings (Anthropic caches save up to 90% on input cost) +- **`_build_web_search_plugin()`**, **`_format_generation_id()`** — new helpers on `Pipe` + +### Changed + +- Model-list cache fingerprint now also includes `MODEL_CATEGORY` and `HIDE_DEPRECATED_MODELS` so toggling either invalidates the cached list +- `pipes()` now sends `params={"output_modalities": ..., "category": ...}` when a category is set +- `_prepare_payload()` now emits `service_tier`, `provider.only`, `provider.quantizations`, `provider.allow_fallbacks=false`, `provider.max_price.{prompt,completion}`, `reasoning.max_tokens`, and a `web` entry in `plugins` (without overwriting any user-supplied plugins) + +## [1.5.0] — 2026-05-07 + +### Added + +- **Variant model routing** — new `MODEL_VARIANTS` valve (env: `OPENROUTER_MODEL_VARIANTS`). Comma-separated `base_id:variant` entries surface as virtual catalog rows that inherit the base model's display name and provider icon while OpenRouter routes the suffixed ID via its variant logic. Recognised tags: `free`, `thinking`, `online`, `nitro`, `exacto`, `extended`. 
Example: `MODEL_VARIANTS=openai/gpt-4o:nitro,anthropic/claude-3.5-sonnet:thinking` +- **Reasoning effort: `minimal` and `xhigh`** — extends `REASONING_EFFORT` with two new levels for fastest/maximum-depth thinking on supporting models +- **`REASONING_SUMMARY_MODE` valve** (env: `OPENROUTER_REASONING_SUMMARY_MODE`, default `disabled`) — requests a `reasoning.summary` block from supporting models. Options: `auto`, `concise`, `detailed`, `disabled` +- **Anthropic interleaved thinking** — new `ENABLE_ANTHROPIC_INTERLEAVED_THINKING` valve (default on, env: `OPENROUTER_ANTHROPIC_INTERLEAVED_THINKING`). When the selected model is `anthropic/...`, automatically injects the `anthropic-beta: interleaved-thinking-2025-05-14` header so Claude interleaves reasoning with tool use +- **`ANTHROPIC_PROMPT_CACHE_TTL` valve** (env: `OPENROUTER_ANTHROPIC_PROMPT_CACHE_TTL`, default `5m`) — extends `ENABLE_CACHE_CONTROL` so the ephemeral cache breakpoint can be set to either `5m` (default) or `1h` for longer cache lifetimes between turns +- **`TOOL_CALLING_FILTER` valve** (env: `OPENROUTER_TOOL_CALLING_FILTER`, default `all`) — catalog filter for tool-capable models. Options: `all`, `only`, `exclude`. Reads `supported_parameters` from `/models` and matches on `tools`/`tool_choice` +- **ZDR (Zero Data Retention) support** — two new valves: `ZDR_MODELS_ONLY` (catalog filter — fetches `/endpoints/zdr` and hides models without a ZDR-capable endpoint) and `ZDR_ENFORCE` (request-side — adds `provider.zdr=true` so OpenRouter rejects the call if no ZDR endpoint is available) +- **`HTTP_REFERER_OVERRIDE` valve** (env: `OPENROUTER_HTTP_REFERER`) — explicit override for the `HTTP-Referer` app-attribution header. 
Empty falls back to `WEBUI_URL` env or `http://localhost:3000` +- **`_load_zdr_model_ids()`**, **`_parse_variant_specs()`**, **`_expand_variant_models()`**, **`_resolve_referer()`**, **`_is_anthropic_model()`** — new instance methods on `Pipe` + +### Changed + +- **Breaking:** `FREE_ONLY` (boolean) replaced by **`FREE_MODEL_FILTER`** (env: `OPENROUTER_FREE_MODEL_FILTER`, default `all`). Options: `all`, `only`, `exclude`. Setups using `FREE_ONLY=true` should switch to `FREE_MODEL_FILTER=only`; setups using `FREE_ONLY=false` need no change +- **Reasoning payload shape:** when both `REASONING_EFFORT` and `REASONING_SUMMARY_MODE` are set, both fields are merged into the same `reasoning` object instead of overwriting +- Model-list cache fingerprint now also includes `FREE_MODEL_FILTER`, `TOOL_CALLING_FILTER`, `ZDR_MODELS_ONLY`, and `MODEL_VARIANTS` so toggling any of them invalidates the 5-minute cache +- `_build_headers()` accepts an optional `model_id` kwarg so it can decide whether to inject Anthropic-specific beta headers + +## [1.4.0] — 2026-05-07 + +### Added + +- **`OUTPUT_MODALITIES` valve** (env: `OPENROUTER_OUTPUT_MODALITIES`, default `all`) — controls which model output modalities are fetched from OpenRouter's `/models` endpoint. Accepts `text`, `image`, `audio`, `embeddings`, `all`, or a comma-separated combination +- **Full-catalog model listing** — TTS (e.g. `openai/gpt-4o-mini-tts-*`), audio-output, image-generation, and embedding models now appear in the Open WebUI model selector by default +- **Auto-discovered provider icons** — for providers not in the hardcoded fast-path dict, the pipe now lazy-loads OpenRouter's frontend provider registry (`/api/frontend/all-providers`) and resolves the icon from there. Adds icon coverage for ~20 additional model authors (xAI, Inflection, NVIDIA, Arcee, Morph, Cerebras, etc.) including gstatic favicons for providers without an OpenRouter-hosted logo. 
Slug normalization handles `x-ai` ↔ `xai` style mismatches +- **`_load_provider_registry()`** and **`_get_provider_icon()`** methods on `Pipe` — layered icon resolution: hardcoded dict → registry exact slug → registry hyphen-stripped slug. Network failures are silent (best-effort fallback) + +### Changed + +- The `/models` request now passes `output_modalities=all` by default, so the catalog is no longer silently restricted to text-output models. Set `OUTPUT_MODALITIES = text` to restore the previous chat-only behaviour +- Model-list cache fingerprint now includes `OUTPUT_MODALITIES`, so toggling the valve correctly invalidates the cached list +- `_is_owui_managed_icon()` now also recognises `https://t0.gstatic.com/faviconV2` URLs as pipe-managed, so registry-sourced gstatic favicons remain overwriteable when OpenRouter updates its provider mapping + ## [1.3.0] — 2026-05-07 ### Added diff --git a/README.md b/README.md index 3f8f542..bc13378 100644 --- a/README.md +++ b/README.md @@ -4,8 +4,9 @@ [![Python](https://img.shields.io/badge/Python-%E2%89%A53.10-blue)](https://www.python.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) -Access **340+ AI models** through OpenRouter directly inside Open WebUI — with provider routing, -reasoning tokens, streaming, fallbacks, and cache control out of the box. +Access the **full OpenRouter catalog** — chat, TTS, audio, image-generation, and embedding models — +directly inside Open WebUI, with provider routing, reasoning tokens, streaming, fallbacks, and +cache control out of the box. ## Feature gallery @@ -50,7 +51,13 @@ reasoning tokens, streaming, fallbacks, and cache control out of the box. ## Features -- **Manifold pipe** — exposes all OpenRouter models as native Open WebUI models in the model selector. +- **Manifold pipe** — exposes the full OpenRouter catalog (chat, TTS, audio, image, embeddings) as native Open WebUI models in the model selector. 
Configurable via `OUTPUT_MODALITIES` and `MODEL_CATEGORY`. +- **Web search plugin** — attach OpenRouter's `web` plugin to any model with domain allow/deny lists, custom search prompt, and result-count limits. +- **Variant routing** — surface virtual `:nitro`/`:exacto`/`:thinking`/`:online`/`:free`/`:extended` model entries that route to OpenRouter's specialized profiles. +- **Service tier hint** — forward OpenAI-style `flex`/`priority`/`scale` tiers to compatible providers. +- **Generation auditability** — optional generation ID footer maps each response to OpenRouter's `/generation?id=` activity API. +- **Cached-input savings** — surface cached vs. non-cached prompt tokens in the cost footer (Anthropic prompt caching, OpenAI implicit caching, Gemini context caching). +- **Deprecation visibility** — models with an `expiration_date` are tagged with ⚠ in the selector (or hidden via `HIDE_DEPRECATED_MODELS`). - **Provider routing** — sort by `price`, `throughput`, or `latency`; prefer or exclude specific providers; enforce `require_parameters`. - **Reasoning tokens** — `` blocks streamed in real time with configurable effort (`low`, `medium`, `high`). - **Streaming** — full SSE streaming with mid-stream error handling and automatic `` closure on error. @@ -58,7 +65,7 @@ reasoning tokens, streaming, fallbacks, and cache control out of the box. - **Middle-out compression** — fits long prompts within context windows (`transforms: ["middle-out"]`). - **Cache control** — Anthropic-style `cache_control` injection on the longest message chunk. - **Citations** — `[n]` references from web-search-enabled models are converted to markdown links. -- **Provider icons** — 13 provider logos synced directly into Open WebUI's model database. 
+- **Provider icons** — 13 hardcoded fast-path logos plus auto-discovered icons for ~20 more providers (xAI, Inflection, NVIDIA, Arcee, Morph, Cerebras, …) lazy-loaded from OpenRouter's provider registry, all synced directly into Open WebUI's model database. - **Retry logic** — exponential backoff with jitter on timeout and connection errors. - **FREE_ONLY mode** — filter to show only free-tier models (`:free` suffix or `0/0` pricing). - **Pre-flight validation** — invalid API keys are caught at model-fetch time, not after sending a message. @@ -145,7 +152,10 @@ Every valve accepts an environment variable fallback. The table below lists both | Valve | Env Var | Default | Description | | --- | --- | --- | --- | | `INCLUDE_REASONING` | `OPENROUTER_INCLUDE_REASONING` | `true` | Request reasoning tokens (`` blocks) | -| `REASONING_EFFORT` | `OPENROUTER_REASONING_EFFORT` | `""` | Effort level: `low`, `medium`, `high`, or empty | +| `REASONING_EFFORT` | `OPENROUTER_REASONING_EFFORT` | `""` | Effort level: `minimal`, `low`, `medium`, `high`, `xhigh`, or empty | +| `REASONING_SUMMARY_MODE` | `OPENROUTER_REASONING_SUMMARY_MODE` | `disabled` | Reasoning-summary verbosity: `auto`, `concise`, `detailed`, `disabled` | +| `REASONING_MAX_TOKENS` | `OPENROUTER_REASONING_MAX_TOKENS` | `0` | Hard cap on reasoning tokens per response (0 disables the cap) | +| `ENABLE_ANTHROPIC_INTERLEAVED_THINKING` | `OPENROUTER_ANTHROPIC_INTERLEAVED_THINKING` | `true` | Auto-inject `anthropic-beta: interleaved-thinking-2025-05-14` for `anthropic/*` models | ### Display & Filtering @@ -154,7 +164,13 @@ Every valve accepts an environment variable fallback. The table below lists both | `MODEL_PREFIX` | — | `None` | Custom prefix for model names (e.g. `🔥 `) | | `MODEL_PROVIDERS` | `OPENROUTER_MODEL_PROVIDERS` | `ALL` | Provider filter (e.g. `openai,anthropic`). 
`ALL` means no filter | | `INVERT_PROVIDER_LIST` | `OPENROUTER_INVERT_PROVIDER_LIST` | `false` | Treat `MODEL_PROVIDERS` as an exclusion list | -| `FREE_ONLY` | `OPENROUTER_FREE_ONLY` | `false` | Show only free-tier models | +| `FREE_MODEL_FILTER` | `OPENROUTER_FREE_MODEL_FILTER` | `all` | Free-tier filter: `all` / `only` / `exclude` | +| `TOOL_CALLING_FILTER` | `OPENROUTER_TOOL_CALLING_FILTER` | `all` | Tool-capable filter (reads `supported_parameters`): `all` / `only` / `exclude` | +| `OUTPUT_MODALITIES` | `OPENROUTER_OUTPUT_MODALITIES` | `all` | Output modalities to fetch from `/models`. `all` (default) lists every model. Restrict with `text`, `image`, `audio`, `embeddings`, or a comma list (e.g. `text,audio`) | +| `MODEL_VARIANTS` | `OPENROUTER_MODEL_VARIANTS` | `""` | Comma-separated `base_id:tag` entries that surface virtual variant models (e.g. `openai/gpt-4o:nitro`). Tags: `free`, `thinking`, `online`, `nitro`, `exacto`, `extended` | +| `MODEL_CATEGORY` | `OPENROUTER_MODEL_CATEGORY` | `""` | Server-side category filter (`?category=`). Common values: `programming`, `roleplay`, `marketing`, `science`, `legal`, `finance`, `health`, `academia` | +| `HIDE_DEPRECATED_MODELS` | `OPENROUTER_HIDE_DEPRECATED_MODELS` | `false` | Hide models with a non-null `expiration_date`. When False, deprecated models are tagged `⚠ {name} (deprecated)` | +| `ZDR_MODELS_ONLY` | `OPENROUTER_ZDR_MODELS_ONLY` | `false` | Catalog-side: hide models without a ZDR endpoint (reads `/endpoints/zdr`) | ### Provider Routing @@ -163,8 +179,15 @@ Every valve accepts an environment variable fallback. 
The table below lists both | `PROVIDER_SORT` | `OPENROUTER_PROVIDER_SORT` | `""` | Sort: `price`, `throughput`, `latency` | | `PROVIDER_ORDER` | `OPENROUTER_PROVIDER_ORDER` | `""` | Preferred providers (comma-separated) | | `PROVIDER_IGNORE` | `OPENROUTER_PROVIDER_IGNORE` | `""` | Excluded providers (comma-separated) | +| `PROVIDER_ONLY` | `OPENROUTER_PROVIDER_ONLY` | `""` | Provider allowlist (comma-separated). Merged with account-wide settings | +| `PROVIDER_QUANTIZATIONS` | `OPENROUTER_PROVIDER_QUANTIZATIONS` | `""` | Allowed quantizations (comma-separated, e.g. `bf16,fp8`) | +| `PROVIDER_ALLOW_FALLBACKS` | `OPENROUTER_PROVIDER_ALLOW_FALLBACKS` | `true` | When False, OpenRouter fails fast on the primary/ordered provider instead of falling back | +| `PROVIDER_MAX_PRICE_PROMPT` | `OPENROUTER_PROVIDER_MAX_PRICE_PROMPT` | `""` | Maximum prompt price (USD per 1M tokens) | +| `PROVIDER_MAX_PRICE_COMPLETION` | `OPENROUTER_PROVIDER_MAX_PRICE_COMPLETION` | `""` | Maximum completion price (USD per 1M tokens) | +| `SERVICE_TIER` | `OPENROUTER_SERVICE_TIER` | `""` | OpenAI-style service tier: `auto`, `default`, `flex`, `priority`, `scale` | | `REQUIRE_PARAMETERS` | `OPENROUTER_REQUIRE_PARAMETERS` | `false` | Only use providers that support all request parameters | | `DATA_COLLECTION` | `OPENROUTER_DATA_COLLECTION` | `allow` | Data policy: `allow` or `deny` | +| `ZDR_ENFORCE` | `OPENROUTER_ZDR_ENFORCE` | `false` | Send `provider.zdr=true` so OpenRouter routes only to ZDR endpoints (request fails if none available) | ### Advanced @@ -172,7 +195,14 @@ Every valve accepts an environment variable fallback. 
The table below lists both | --- | --- | --- | --- | | `FALLBACK_MODELS` | `OPENROUTER_FALLBACK_MODELS` | `""` | Fallback model IDs (comma-separated) | | `ENABLE_MIDDLE_OUT` | `OPENROUTER_ENABLE_MIDDLE_OUT` | `false` | Middle-out compression for long prompts | +| `ENABLE_WEB_SEARCH` | `OPENROUTER_ENABLE_WEB_SEARCH` | `false` | Attach OpenRouter's `web` plugin so any model can ground answers in fresh web results | +| `WEB_SEARCH_MAX_RESULTS` | `OPENROUTER_WEB_SEARCH_MAX_RESULTS` | `5` | Max search results passed to the model (1-20) | +| `WEB_SEARCH_PROMPT` | `OPENROUTER_WEB_SEARCH_PROMPT` | `""` | Optional custom search prompt forwarded to the search engine | +| `WEB_SEARCH_INCLUDE_DOMAINS` | `OPENROUTER_WEB_SEARCH_INCLUDE_DOMAINS` | `""` | Domain allowlist (supports wildcards & paths) | +| `WEB_SEARCH_EXCLUDE_DOMAINS` | `OPENROUTER_WEB_SEARCH_EXCLUDE_DOMAINS` | `""` | Domain denylist | | `ENABLE_CACHE_CONTROL` | `OPENROUTER_ENABLE_CACHE_CONTROL` | `false` | Inject Anthropic `cache_control` on the longest message | +| `ANTHROPIC_PROMPT_CACHE_TTL` | `OPENROUTER_ANTHROPIC_PROMPT_CACHE_TTL` | `5m` | TTL for the Anthropic ephemeral cache breakpoint: `5m` or `1h` | +| `SHOW_GENERATION_ID` | `OPENROUTER_SHOW_GENERATION_ID` | `false` | Append the OpenRouter generation ID to each response (for `GET /generation?id=` lookups) | | `SYNC_PROVIDER_ICONS` | `OPENROUTER_SYNC_ICONS` | `true` | Sync provider icons into Open WebUI's model database | ### Network @@ -181,6 +211,7 @@ Every valve accepts an environment variable fallback. The table below lists both | --- | --- | --- | --- | | `REQUEST_TIMEOUT` | `OPENROUTER_REQUEST_TIMEOUT` | `90` | HTTP timeout in seconds | | `MAX_RETRIES` | — | `2` | Auto-retry count on transient errors | +| `HTTP_REFERER_OVERRIDE` | `OPENROUTER_HTTP_REFERER` | `""` | Override the `HTTP-Referer` header sent to OpenRouter (must include scheme). 
When empty, falls back to the `WEBUI_URL` env var, then `http://localhost:3000` | ## Architecture @@ -320,6 +351,15 @@ A: `FALLBACK_MODELS` adds extra model IDs to the `models` array in the OpenRoute primary model fails, OpenRouter automatically tries the next one. Non-streaming responses include a "Responded by: model-id" attribution when a fallback handled the request. +**Q: I selected a TTS / embeddings / image-generation model and got an error — why?** + +A: The pipe routes every request through OpenRouter's `/chat/completions` endpoint. Models that +only expose a non-chat endpoint (e.g. pure TTS models served via `/audio/speech`) return an +"endpoint not supported" error from OpenRouter. The pipe surfaces that error verbatim. Chat +completion models that *output* audio or images (e.g. `openai/gpt-audio`) work normally — their +audio transcript and generated images are rendered inline. To hide non-chat models from the +selector entirely, set `OUTPUT_MODALITIES = text`. + ## License This project is licensed under the **MIT License** — see the [LICENSE](LICENSE) file for details. diff --git a/function.json b/function.json index c132bb7..f708d95 100644 --- a/function.json +++ b/function.json @@ -3,13 +3,13 @@ "name": "OpenRouter Pipe", "type": "manifold", "meta": { - "description": "Access 340+ AI models through OpenRouter directly inside Open WebUI. Features provider routing, reasoning tokens with tags, full SSE streaming, model fallbacks, middle-out compression, Anthropic cache control, citations, 13 provider icons, and configurable retry logic.", + "description": "The definitive OpenRouter integration for Open WebUI. 
Full catalog (chat/TTS/audio/image/embeddings), variant routing (:nitro/:exacto/:thinking/:online/:free/:extended), web search plugin with domain filters, server-side category filter, deprecation warnings, extended reasoning (minimal→xhigh + max_tokens + summary), Anthropic interleaved thinking + cache TTL, ZDR enforcement, tool/free-tier filters, provider preferences (only/quantizations/max_price/allow_fallbacks), service tier routing (auto/flex/priority/scale), generation-ID auditability, cached-input cost breakdown, model fallbacks, middle-out compression, citations, auto-discovered provider icons.", "manifest": { "title": "OpenRouter Pipe", "author": "Sena Labs", "author_url": "https://github.com/sena-labs", "funding_url": "https://github.com/sponsors/sena-labs", - "version": "1.3.0", + "version": "1.6.0", "license": "MIT", "required_open_webui_version": "0.4.0", "requirements": ["requests>=2.20", "pydantic>=2.0"] diff --git a/integration_test.py b/integration_test.py index 8fdd156..51b5201 100644 --- a/integration_test.py +++ b/integration_test.py @@ -169,13 +169,13 @@ def _check_chat_available() -> bool: ) # ══════════════════════════════════════════════════════════════════════════════ -# 3. FREE_ONLY filter +# 3. FREE_MODEL_FILTER='only' # ══════════════════════════════════════════════════════════════════════════════ -_section("3. FREE_ONLY filter") +_section("3. 
FREE_MODEL_FILTER='only'") pipe_free = Pipe() -pipe_free.valves = Pipe.Valves(OPENROUTER_API_KEY=API_KEY, FREE_ONLY=True) +pipe_free.valves = Pipe.Valves(OPENROUTER_API_KEY=API_KEY, FREE_MODEL_FILTER="only") free_models = pipe_free.pipes() _assert(len(free_models) > 0, f"free models: {len(free_models)}") _assert( diff --git a/openrouter_pipe.py b/openrouter_pipe.py index 8d4f3c8..8281ee0 100644 --- a/openrouter_pipe.py +++ b/openrouter_pipe.py @@ -3,12 +3,12 @@ author: Sena Labs author_url: https://github.com/sena-labs funding_url: https://github.com/sponsors/sena-labs -version: 1.3.0 +version: 1.6.0 license: MIT icon_url: data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAxMDAgMTAwIj48ZGVmcz48bGluZWFyR3JhZGllbnQgaWQ9ImJnIiB4MT0iMCUiIHkxPSIwJSIgeDI9IjEwMCUiIHkyPSIxMDAlIj48c3RvcCBvZmZzZXQ9IjAlIiBzdG9wLWNvbG9yPSIjNmQyOGQ5Ii8+PHN0b3Agb2Zmc2V0PSIxMDAlIiBzdG9wLWNvbG9yPSIjYTc4YmZhIi8+PC9saW5lYXJHcmFkaWVudD48L2RlZnM+PHJlY3Qgd2lkdGg9IjEwMCIgaGVpZ2h0PSIxMDAiIHJ4PSIyMCIgZmlsbD0idXJsKCNiZykiLz48cGF0aCBkPSJNMjAgNTAgQzIwIDMwLCA0MCAzMCwgNTAgMzAgTDUwIDIyIEw2OCA0MCBMNTAgNTggTDUwIDUwIEM0MCA1MCwgMzUgNDUsIDMwIDUwIEMyNSA1NSwgMjAgNzAsIDIwIDUwIFoiIGZpbGw9IndoaXRlIiBvcGFjaXR5PSIwLjk1Ii8+PGNpcmNsZSBjeD0iNzgiIGN5PSIzMCIgcj0iNyIgZmlsbD0id2hpdGUiIG9wYWNpdHk9IjAuOCIvPjxjaXJjbGUgY3g9IjgyIiBjeT0iNTAiIHI9IjciIGZpbGw9IndoaXRlIiBvcGFjaXR5PSIwLjk1Ii8+PGNpcmNsZSBjeD0iNzgiIGN5PSI3MCIgcj0iNyIgZmlsbD0id2hpdGUiIG9wYWNpdHk9IjAuOCIvPjxsaW5lIHgxPSI2OCIgeTE9IjQwIiB4Mj0iNzYiIHkyPSIzMiIgc3Ryb2tlPSJ3aGl0ZSIgc3Ryb2tlLXdpZHRoPSIyIiBvcGFjaXR5PSIwLjUiLz48bGluZSB4MT0iNjgiIHkxPSI0MCIgeDI9Ijc2IiB5Mj0iNTAiIHN0cm9rZT0id2hpdGUiIHN0cm9rZS13aWR0aD0iMiIgb3BhY2l0eT0iMC41Ii8+PGxpbmUgeDE9IjY4IiB5MT0iNDAiIHgyPSI3NiIgeTI9IjY4IiBzdHJva2U9IndoaXRlIiBzdHJva2Utd2lkdGg9IjIiIG9wYWNpdHk9IjAuNSIvPjwvc3ZnPg== required_open_webui_version: 0.4.0 requirements: requests>=2.20, pydantic>=2.0 -description: Access 340+ AI models through OpenRouter directly inside Open WebUI. 
Features provider routing, reasoning tokens with tags, full SSE streaming, model fallbacks, middle-out compression, Anthropic cache control, citations, 13 provider icons, and configurable retry logic. +description: The definitive OpenRouter integration for Open WebUI. Full catalog (chat/TTS/audio/image/embeddings), variant routing (:nitro/:exacto/:thinking/:online/:free/:extended), web search plugin with domain filters, server-side category filter, deprecation warnings, extended reasoning (minimal→xhigh + max_tokens + summary), Anthropic interleaved thinking + cache TTL, ZDR enforcement, tool/free-tier filters, provider preferences (only/quantizations/max_price/allow_fallbacks), service tier routing (auto/flex/priority/scale), generation-ID auditability, cached-input cost breakdown, model fallbacks, middle-out compression, citations, auto-discovered provider icons. """ import copy @@ -35,13 +35,31 @@ # API path constants _API_PATH_MODELS = "/models" _API_PATH_CHAT = "/chat/completions" +_API_PATH_ZDR_ENDPOINTS = "/endpoints/zdr" + +# Beta header for Claude's interleaved-thinking + tool-use mode. +# https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking +_ANTHROPIC_INTERLEAVED_THINKING_BETA = "interleaved-thinking-2025-05-14" + +# OpenRouter variant suffixes that route to specialized providers/profiles. +# https://openrouter.ai/docs/features/preset-routing +_RECOGNISED_VARIANT_TAGS = frozenset( + {"free", "thinking", "online", "nitro", "exacto", "extended"} +) # Cache TTL for model list (seconds) _MODELS_CACHE_TTL = 300.0 # 5 minutes +# OpenRouter's frontend provider registry — gives us icon URLs for ~70 providers +# (hosted SVG/PNG when available, gstatic favicons otherwise). Used as a +# dynamic fallback when a model's author isn't in _PROVIDER_ICONS. 
+_PROVIDER_REGISTRY_URL = "https://openrouter.ai/api/frontend/all-providers" + # Provider icons — synced into the Open WebUI Models database by # _sync_model_icons() so the frontend can serve them via # /models/model/profile/image. Disable with SYNC_PROVIDER_ICONS = False. +# Hardcoded fast path for top model authors; everything else is auto-discovered +# via _load_provider_registry(). # URLs verified against https://openrouter.ai/images/icons/ (May 2025). _PROVIDER_ICONS = { "openai": "https://openrouter.ai/images/icons/OpenAI.svg", @@ -70,8 +88,12 @@ def _is_owui_managed_icon(url: str) -> bool: data: URLs are the pipe's own SVG icon that OWUI assigns as default to all manifold child models. openrouter.ai/images/models/ and - openrouter.ai/images/icons/ are the provider icon paths we write (the - former was the old path, now superseded by the latter). Any other URL is + openrouter.ai/images/icons/ are the OpenRouter-hosted provider icons we + write (the former was the old path, superseded by the latter). + t0.gstatic.com/faviconV2 URLs are the gstatic favicons returned by + OpenRouter's provider registry for providers without a hosted icon — we + write those too as part of icon auto-discovery, so they must remain + overwriteable when OpenRouter updates its mapping. Any other URL is assumed to be a user-set custom icon and must not be overwritten. """ return ( @@ -79,6 +101,7 @@ def _is_owui_managed_icon(url: str) -> bool: or url.startswith("data:") or url.startswith("https://openrouter.ai/images/models/") or url.startswith("https://openrouter.ai/images/icons/") + or url.startswith("https://t0.gstatic.com/faviconV2") ) @@ -126,7 +149,11 @@ def _format_citation_list(citations: Optional[List[str]]) -> str: def _format_cost_info(usage: dict, currency: str = "USD") -> str: - """Format token usage and cost from an OpenRouter usage dict.""" + """Format token usage and cost from an OpenRouter usage dict. 
+ + When the provider reports cached prompt tokens (billed at a discount, + up to 90% on Anthropic), the breakdown is shown so users see the savings. + """ if not usage: return "" prompt = usage.get("prompt_tokens", 0) @@ -134,7 +161,23 @@ def _format_cost_info(usage: dict, currency: str = "USD") -> str: total = usage.get("total_tokens", 0) or (prompt + completion) cost = usage.get("cost") - token_str = f"{prompt:,} prompt + {completion:,} completion = {total:,} total" + # Cached prompt tokens — emitted by Anthropic prompt caching, OpenAI + # implicit caching, and Gemini context caching. Shape varies per provider. + cached_tokens = 0 + details = usage.get("prompt_tokens_details") or {} + if isinstance(details, dict): + cached_tokens = details.get("cached_tokens") or 0 + if not cached_tokens: + cached_tokens = usage.get("cache_read_input_tokens") or 0 + + if cached_tokens: + non_cached = max(prompt - int(cached_tokens), 0) + token_str = ( + f"{non_cached:,} prompt + {int(cached_tokens):,} cached + " + f"{completion:,} completion = {total:,} total" + ) + else: + token_str = f"{prompt:,} prompt + {completion:,} completion = {total:,} total" parts = [f"**Tokens:** {token_str}"] if cost is not None: @@ -156,6 +199,17 @@ def _format_cost_info(usage: dict, currency: str = "USD") -> str: return f"\n\n---\n*{' · '.join(parts)}*" +def _format_generation_id(generation_id: Optional[str]) -> str: + """Format the OpenRouter generation ID footer. + + Users can pass the ID to ``GET /api/v1/generation?id={id}`` to retrieve + detailed usage and routing info for any past request. + """ + if not generation_id: + return "" + return f"\n\n---\n*Generation ID: `{generation_id}`*" + + def _format_image_output(images: list) -> str: """Format OpenRouter image output objects as markdown image tags. @@ -189,23 +243,69 @@ class Valves(BaseModel): ) REASONING_EFFORT: str = Field( default=os.getenv("OPENROUTER_REASONING_EFFORT", ""), - description="Controls reasoning depth. 
Works independently of Include Reasoning", + description=( + "Controls reasoning depth. Works independently of Include Reasoning. " + "'minimal' favors fastest output, 'xhigh' requests maximum depth on " + "supporting models." + ), json_schema_extra={ "input": { "type": "select", "options": [ {"value": "", "label": "Disabled"}, + {"value": "minimal", "label": "Minimal"}, {"value": "low", "label": "Low"}, {"value": "medium", "label": "Medium"}, {"value": "high", "label": "High"}, + {"value": "xhigh", "label": "Extra High"}, + ], + } + }, + ) + REASONING_SUMMARY_MODE: str = Field( + default=os.getenv("OPENROUTER_REASONING_SUMMARY_MODE", "disabled"), + description=( + "Reasoning summary verbosity sent as `reasoning.summary` in the " + "request payload. 'disabled' (default) skips the field entirely; " + "supporting models emit a concise/detailed summary block alongside " + "their reasoning trace." + ), + json_schema_extra={ + "input": { + "type": "select", + "options": [ + {"value": "disabled", "label": "Disabled"}, + {"value": "auto", "label": "Auto"}, + {"value": "concise", "label": "Concise"}, + {"value": "detailed", "label": "Detailed"}, ], } }, ) + REASONING_MAX_TOKENS: int = Field( + default=int(os.getenv("OPENROUTER_REASONING_MAX_TOKENS", "0")), + ge=0, + description=( + "Hard cap on reasoning tokens per response (sent as " + "`reasoning.max_tokens`). 0 (default) leaves the cap to the " + "provider. Useful for budget control on deep-thinking models." + ), + ) INCLUDE_REASONING: bool = Field( default=os.getenv("OPENROUTER_INCLUDE_REASONING", "true").lower() == "true", description="Show model reasoning in blocks. 
Can be used with or without Reasoning Effort", ) + ENABLE_ANTHROPIC_INTERLEAVED_THINKING: bool = Field( + default=os.getenv( + "OPENROUTER_ANTHROPIC_INTERLEAVED_THINKING", "true" + ).lower() + == "true", + description=( + "When True and the selected model is `anthropic/...`, send the " + "`anthropic-beta: interleaved-thinking-2025-05-14` header so Claude " + "interleaves reasoning with tool use. No effect on other providers." + ), + ) MODEL_PREFIX: Optional[str] = Field( default=None, description="Prefix shown before model names (include trailing space if needed, e.g. 'OR: ')" ) @@ -218,9 +318,80 @@ class Valves(BaseModel): == "true", description="When true the provider list becomes an exclusion list", ) - FREE_ONLY: bool = Field( - default=os.getenv("OPENROUTER_FREE_ONLY", "false").lower() == "true", - description="Show only free-tier models (by suffix :free or zero pricing)", + FREE_MODEL_FILTER: str = Field( + default=os.getenv("OPENROUTER_FREE_MODEL_FILTER", "all"), + description=( + "Filter the catalog by free-tier status (':free' suffix or zero " + "prompt+completion pricing). 'all' = no filter (default), " + "'only' = keep just free models, 'exclude' = hide free models." + ), + json_schema_extra={ + "input": { + "type": "select", + "options": [ + {"value": "all", "label": "All"}, + {"value": "only", "label": "Only free"}, + {"value": "exclude", "label": "Exclude free"}, + ], + } + }, + ) + TOOL_CALLING_FILTER: str = Field( + default=os.getenv("OPENROUTER_TOOL_CALLING_FILTER", "all"), + description=( + "Filter the catalog by tool-calling capability " + "(`supported_parameters` containing `tools` or `tool_choice`). " + "'all' (default) keeps everything, 'only' restricts to tool-capable " + "models, 'exclude' hides them." 
+ ), + json_schema_extra={ + "input": { + "type": "select", + "options": [ + {"value": "all", "label": "All"}, + {"value": "only", "label": "Only tool-capable"}, + {"value": "exclude", "label": "Exclude tool-capable"}, + ], + } + }, + ) + MODEL_VARIANTS: str = Field( + default=os.getenv("OPENROUTER_MODEL_VARIANTS", ""), + description=( + "Comma-separated `base_id:variant` entries to expose as virtual " + "models that inherit the base model's metadata (name, icon). " + "Example: 'openai/gpt-4o:nitro, anthropic/claude-3.5-sonnet:thinking'. " + "Recognised tags: free, thinking, online, nitro, exacto, extended. " + "OpenRouter routes the suffixed ID specially " + "(see https://openrouter.ai/docs/features/preset-routing)." + ), + ) + MODEL_CATEGORY: str = Field( + default=os.getenv("OPENROUTER_MODEL_CATEGORY", ""), + description=( + "Server-side category filter for `/models` (passed as " + "`?category=...`). Empty disables. Common values: " + "programming, roleplay, marketing, marketing/seo, technology, " + "science, translation, legal, finance, health, trivia, academia." + ), + ) + HIDE_DEPRECATED_MODELS: bool = Field( + default=os.getenv("OPENROUTER_HIDE_DEPRECATED_MODELS", "false").lower() + == "true", + description=( + "Hide models with a non-null `expiration_date`. When False " + "(default), deprecated models stay visible but are tagged with " + "a ⚠ prefix in the display name." + ), + ) + OUTPUT_MODALITIES: str = Field( + default=os.getenv("OPENROUTER_OUTPUT_MODALITIES", "all"), + description=( + "Output modalities to fetch from OpenRouter's /models endpoint. " + "'all' (default) lists every model — chat, TTS, audio, image, and embeddings. " + "Use 'text' for chat-only, or a comma list e.g. 'text,audio'. " + "Valid tokens: text, image, audio, embeddings, all." 
+ ), ) PROVIDER_SORT: str = Field( default=os.getenv("OPENROUTER_PROVIDER_SORT", ""), @@ -245,6 +416,67 @@ class Valves(BaseModel): default=os.getenv("OPENROUTER_PROVIDER_IGNORE", ""), description="Excluded providers, comma-separated", ) + PROVIDER_ONLY: str = Field( + default=os.getenv("OPENROUTER_PROVIDER_ONLY", ""), + description=( + "Allowlist of provider slugs to use (comma-separated). When " + "set, OpenRouter routes only to these providers. Merged with " + "your account-wide allowlist." + ), + ) + PROVIDER_QUANTIZATIONS: str = Field( + default=os.getenv("OPENROUTER_PROVIDER_QUANTIZATIONS", ""), + description=( + "Comma-separated quantization filters (e.g. 'bf16,fp8'). Only " + "endpoints serving the model at one of these precisions will " + "be used. Common values: bf16, fp16, fp8, int8, int4." + ), + ) + PROVIDER_ALLOW_FALLBACKS: bool = Field( + default=os.getenv("OPENROUTER_PROVIDER_ALLOW_FALLBACKS", "true").lower() + == "true", + description=( + "When True (default), OpenRouter falls back to alternate " + "providers if the primary one (or those in PROVIDER_ORDER) is " + "unavailable. Set False to fail fast on the primary provider." + ), + ) + PROVIDER_MAX_PRICE_PROMPT: str = Field( + default=os.getenv("OPENROUTER_PROVIDER_MAX_PRICE_PROMPT", ""), + description=( + "Maximum prompt price (USD per 1M tokens) you accept for this " + "request, e.g. '3.0'. Empty disables. Sent as " + "`provider.max_price.prompt`." + ), + ) + PROVIDER_MAX_PRICE_COMPLETION: str = Field( + default=os.getenv("OPENROUTER_PROVIDER_MAX_PRICE_COMPLETION", ""), + description=( + "Maximum completion price (USD per 1M tokens) you accept for " + "this request, e.g. '15.0'. Empty disables. Sent as " + "`provider.max_price.completion`." + ), + ) + SERVICE_TIER: str = Field( + default=os.getenv("OPENROUTER_SERVICE_TIER", ""), + description=( + "OpenAI-style service tier hint forwarded to compatible " + "providers. Empty (default) leaves the choice to the provider." 
+ ), + json_schema_extra={ + "input": { + "type": "select", + "options": [ + {"value": "", "label": "Unset (provider decides)"}, + {"value": "auto", "label": "Auto"}, + {"value": "default", "label": "Default tier"}, + {"value": "flex", "label": "Flex (cheaper, slower)"}, + {"value": "priority", "label": "Priority (faster)"}, + {"value": "scale", "label": "Scale"}, + ], + } + }, + ) REQUIRE_PARAMETERS: bool = Field( default=os.getenv("OPENROUTER_REQUIRE_PARAMETERS", "false").lower() == "true", @@ -272,11 +504,89 @@ class Valves(BaseModel): == "true", description="Automatically compress long conversations that exceed the model's context window by summarizing middle messages", ) + ENABLE_WEB_SEARCH: bool = Field( + default=os.getenv("OPENROUTER_ENABLE_WEB_SEARCH", "false").lower() + == "true", + description=( + "Attach OpenRouter's `web` plugin to every request so the " + "model can ground answers in fresh web results. Overlaps with " + "the `:online` variant tag (provider-side); pick one. " + "OpenRouter charges per search call separately from tokens." + ), + ) + WEB_SEARCH_MAX_RESULTS: int = Field( + default=int(os.getenv("OPENROUTER_WEB_SEARCH_MAX_RESULTS", "5")), + ge=1, + le=20, + description="Maximum number of search results returned to the model when ENABLE_WEB_SEARCH is on.", + ) + WEB_SEARCH_PROMPT: str = Field( + default=os.getenv("OPENROUTER_WEB_SEARCH_PROMPT", ""), + description=( + "Optional custom search prompt forwarded to the search engine " + "(`plugins[].search_prompt`). Empty uses OpenRouter's default." + ), + ) + WEB_SEARCH_INCLUDE_DOMAINS: str = Field( + default=os.getenv("OPENROUTER_WEB_SEARCH_INCLUDE_DOMAINS", ""), + description=( + "Comma-separated domain allowlist for web search. Wildcards " + "and path filters supported (e.g. '*.substack.com, " + "openai.com/blog')."
+ ), + ) + WEB_SEARCH_EXCLUDE_DOMAINS: str = Field( + default=os.getenv("OPENROUTER_WEB_SEARCH_EXCLUDE_DOMAINS", ""), + description="Comma-separated domain denylist for web search (same format as include list).", + ) ENABLE_CACHE_CONTROL: bool = Field( default=os.getenv("OPENROUTER_ENABLE_CACHE_CONTROL", "false").lower() == "true", description="Enable prompt caching for Anthropic models (reduces cost on repeated long prompts). No effect on other providers", ) + ANTHROPIC_PROMPT_CACHE_TTL: str = Field( + default=os.getenv("OPENROUTER_ANTHROPIC_PROMPT_CACHE_TTL", "5m"), + description=( + "TTL for the Anthropic ephemeral cache breakpoint when " + "ENABLE_CACHE_CONTROL is on. '5m' (default) keeps the standard " + "short-lived cache; '1h' costs more on cache writes but persists " + "longer between turns." + ), + json_schema_extra={ + "input": { + "type": "select", + "options": [ + {"value": "5m", "label": "5 minutes"}, + {"value": "1h", "label": "1 hour"}, + ], + } + }, + ) + ZDR_ENFORCE: bool = Field( + default=os.getenv("OPENROUTER_ZDR_ENFORCE", "false").lower() == "true", + description=( + "When True, every chat request includes `provider.zdr=true` so " + "OpenRouter rejects the call unless a Zero Data Retention " + "endpoint is available for the chosen model." + ), + ) + ZDR_MODELS_ONLY: bool = Field( + default=os.getenv("OPENROUTER_ZDR_MODELS_ONLY", "false").lower() == "true", + description=( + "Catalog-side filter: when True, fetch OpenRouter's " + "`/endpoints/zdr` list and hide models without any ZDR-capable " + "endpoint. Pairs well with ZDR_ENFORCE for end-to-end privacy " + "guarantees." + ), + ) + HTTP_REFERER_OVERRIDE: str = Field( + default=os.getenv("OPENROUTER_HTTP_REFERER", ""), + description=( + "Override the `HTTP-Referer` header sent to OpenRouter for app " + "attribution (must be a full URL with scheme). Empty falls back " + "to WEBUI_URL or http://localhost:3000." 
+ ), + ) SYNC_PROVIDER_ICONS: bool = Field( default=os.getenv("OPENROUTER_SYNC_ICONS", "true").lower() == "true", description="Automatically sync provider icons into Open WebUI's model database so they appear in the UI", @@ -293,6 +603,15 @@ class Valves(BaseModel): default=False, description="Append token usage and cost to each response", ) + SHOW_GENERATION_ID: bool = Field( + default=os.getenv("OPENROUTER_SHOW_GENERATION_ID", "false").lower() + == "true", + description=( + "Append the OpenRouter generation ID to each response so it " + "can be looked up later via `GET /generation?id=...` for " + "audit trails and per-request usage details." + ), + ) COST_CURRENCY: str = Field( default=os.getenv("OPENROUTER_COST_CURRENCY", "USD"), description="Currency label shown in cost display (display only; OpenRouter bills in USD)", @@ -332,6 +651,12 @@ def __init__(self) -> None: self._models_cache_key: str = "" # Track which model IDs already have icons synced (avoids repeated DB writes) self._icons_synced: set = set() + # Lazy-loaded mirror of OpenRouter's provider registry (slug → icon URL). + # None = not attempted; {} = attempted but failed/empty (do not retry). + self._provider_registry: Optional[dict] = None + # Lazy-loaded set of model IDs that have at least one ZDR endpoint. + # None = not attempted; frozenset() = attempted but failed/empty. 
+ self._zdr_model_ids: Optional[frozenset] = None # Cache function_id once: OWUI sets __module__ to "function_{id}" at load time _fm = type(self).__module__ or "" self._function_id: Optional[str] = ( @@ -367,9 +692,12 @@ def _build_cache_key(self) -> str: else "" ) return ( - f"{api_key_hash}|{self.valves.FREE_ONLY}|" + f"{api_key_hash}|{self.valves.FREE_MODEL_FILTER}|" f"{self.valves.MODEL_PROVIDERS}|{self.valves.INVERT_PROVIDER_LIST}|" - f"{self.valves.MODEL_PREFIX}" + f"{self.valves.MODEL_PREFIX}|{self.valves.OUTPUT_MODALITIES}|" + f"{self.valves.TOOL_CALLING_FILTER}|{self.valves.ZDR_MODELS_ONLY}|" + f"{self.valves.MODEL_VARIANTS}|{self.valves.MODEL_CATEGORY}|" + f"{self.valves.HIDE_DEPRECATED_MODELS}" ) def _models_cache_valid(self) -> bool: @@ -395,10 +723,18 @@ def pipes(self) -> List[dict]: return self._models_cache headers = self._build_headers(include_content_type=False) + modalities = (self.valves.OUTPUT_MODALITIES or "all").strip() or "all" + params: dict = {"output_modalities": modalities} + category = (self.valves.MODEL_CATEGORY or "").strip() + if category: + params["category"] = category response = None try: response = self._session.get( - self.models_url, headers=headers, timeout=self.valves.REQUEST_TIMEOUT + self.models_url, + headers=headers, + params=params, + timeout=self.valves.REQUEST_TIMEOUT, ) # Detect auth errors from the models endpoint itself # 502 from Clerk usually means the key format is invalid @@ -438,6 +774,12 @@ def pipes(self) -> List[dict]: provider_filter = self._parse_provider_filter() prefix = self.valves.MODEL_PREFIX or "" + free_filter = (self.valves.FREE_MODEL_FILTER or "all").strip().lower() + tool_filter = (self.valves.TOOL_CALLING_FILTER or "all").strip().lower() + zdr_only = self.valves.ZDR_MODELS_ONLY + zdr_capable_ids: Optional[frozenset] = ( + self._load_zdr_model_ids() if zdr_only else None + ) models: List[dict] = [] for model in data: @@ -445,7 +787,7 @@ def pipes(self) -> List[dict]: if not model_id: continue - 
if self.valves.FREE_ONLY: + if free_filter in ("only", "exclude"): is_free = ":free" in model_id.lower() if not is_free: pricing = model.get("pricing") or {} @@ -456,9 +798,37 @@ def pipes(self) -> List[dict]: ) except (ValueError, TypeError): is_free = False - if not is_free: + if free_filter == "only" and not is_free: + continue + if free_filter == "exclude" and is_free: + continue + + if tool_filter in ("only", "exclude"): + supported = model.get("supported_parameters") or [] + tool_capable = any( + p in supported for p in ("tools", "tool_choice") + ) + if tool_filter == "only" and not tool_capable: + continue + if tool_filter == "exclude" and tool_capable: continue + if zdr_only and zdr_capable_ids is not None: + # OpenRouter's /endpoints/zdr returns base IDs (no '~' alias prefix + # and no ':variant' suffix). Strip both before comparing. + base_id = model_id.lstrip("~").split(":", 1)[0] + if base_id not in zdr_capable_ids: + continue + + # Deprecation handling: a non-null `expiration_date` means + # OpenRouter has scheduled the model for removal. Hide the entry + # entirely when the operator opts in; otherwise keep it but tag + # the display name so users notice before relying on it. + expiration = model.get("expiration_date") + is_deprecated = expiration is not None and str(expiration).strip() != "" + if is_deprecated and self.valves.HIDE_DEPRECATED_MODELS: + continue + # Split model_id once for provider extraction. # Strip leading '~' (OpenRouter "latest" aliases like ~anthropic/claude-haiku-latest) # so they match the same provider filter as their base provider. @@ -471,6 +841,8 @@ def pipes(self) -> List[dict]: continue model_name = model.get("name", model_id) + if is_deprecated: + model_name = f"⚠ {model_name} (deprecated)" model_dict = { "id": model_id, @@ -479,9 +851,18 @@ def pipes(self) -> List[dict]: models.append(model_dict) + # Append virtual variant entries (e.g. openai/gpt-4o:nitro). 
Variants + # inherit the base model's display name; only the suffix and a tag + # label change — the icon-sync step writes the same provider icon. + models = self._expand_variant_models(models, prefix) + if not models: - if self.valves.FREE_ONLY: - error_text = "No free models available. Disable FREE_ONLY to see paid models." + if free_filter == "only": + error_text = "No free models available. Set FREE_MODEL_FILTER to 'all' to see paid models." + elif tool_filter == "only": + error_text = "No tool-capable models available. Set TOOL_CALLING_FILTER to 'all' to broaden the catalog." + elif zdr_only: + error_text = "No ZDR-capable models available. Disable ZDR_MODELS_ONLY or check your OpenRouter privacy settings." elif provider_filter: providers_str = ", ".join(sorted(provider_filter)) error_text = f"No models match providers: {providers_str}. Check MODEL_PROVIDERS setting." @@ -554,7 +935,7 @@ async def pipe( ) payload = self._prepare_payload(body) - headers = self._build_headers() + headers = self._build_headers(model_id=payload.get("model")) stream = body.get("stream", False) if stream: @@ -630,7 +1011,7 @@ def _sync_model_icons(self, models: List[dict]) -> None: # ~anthropic/claude-haiku-latest) resolve to the correct icon. parts = model_id.split("/", 1) provider_key = parts[0].lstrip("~").lower() if len(parts) > 1 else "" - icon_url = _PROVIDER_ICONS.get(provider_key) + icon_url = self._get_provider_icon(provider_key) # Build the prefixed ID that Open WebUI uses in the frontend db_model_id = f"{function_id}.{model_id}" @@ -730,9 +1111,71 @@ def _sync_model_icons(self, models: List[dict]) -> None: @staticmethod def get_provider_icon(provider: str) -> Optional[str]: - """Return icon URL for the given provider.""" + """Return hardcoded icon URL for the given provider (fast path only). + + Does not consult the dynamic OpenRouter provider registry — for that, + use ``_get_provider_icon`` on a Pipe instance. 
+ """ return _PROVIDER_ICONS.get(provider.lower()) + def _load_provider_registry(self) -> dict: + """Lazy-load OpenRouter's provider registry, cache for the pipe lifetime. + + Returns ``{slug: icon_url}`` (with each slug also indexed under its + hyphen-stripped variant so e.g. ``x-ai`` resolves to the registry's + ``xai`` entry). Network failures are silent — a single empty dict is + cached and the pipe falls back to the hardcoded ``_PROVIDER_ICONS``. + """ + if self._provider_registry is not None: + return self._provider_registry + + registry: dict = {} + try: + resp = self._session.get( + _PROVIDER_REGISTRY_URL, + timeout=min(self.valves.REQUEST_TIMEOUT, 15), + ) + try: + if resp.status_code == 200: + data = resp.json().get("data") or [] + for entry in data: + slug = (entry or {}).get("slug") or "" + icon = ((entry or {}).get("icon") or {}).get("url") or "" + if not slug or not icon: + continue + if icon.startswith("/"): + icon = f"https://openrouter.ai{icon}" + if not _is_safe_url(icon): + continue + registry[slug] = icon + # Also index by hyphen-stripped slug — model-author IDs + # like ``x-ai`` map to provider slug ``xai``. + compact = slug.replace("-", "") + if compact and compact != slug: + registry.setdefault(compact, icon) + finally: + resp.close() + except Exception as exc: # pragma: no cover + print(f"[OpenRouter Pipe] Provider registry fetch failed: {exc}") + + self._provider_registry = registry + return registry + + def _get_provider_icon(self, provider_key: str) -> Optional[str]: + """Resolve a provider icon URL using the layered fallback chain. + + Order: hardcoded ``_PROVIDER_ICONS`` → registry exact match → + registry hyphen-stripped match. Returns ``None`` if no source has it. 
+ """ + if not provider_key: + return None + key = provider_key.lower() + icon = _PROVIDER_ICONS.get(key) + if icon: + return icon + registry = self._load_provider_registry() + return registry.get(key) or registry.get(key.replace("-", "")) or None + def _parse_provider_filter(self) -> Optional[set]: """Parse MODEL_PROVIDERS valve into a set of lowercase provider names.""" val = (self.valves.MODEL_PROVIDERS or "").strip() @@ -747,6 +1190,142 @@ def _parse_csv(value: str) -> List[str]: return [] return [item.strip() for item in value.split(",") if item.strip()] + def _load_zdr_model_ids(self) -> frozenset: + """Lazy-load OpenRouter's ZDR-capable model IDs and cache for the pipe lifetime. + + Returns the cached set on subsequent calls (including the empty-set + sentinel returned on network failure, so we don't retry on every + ``pipes()`` call). The endpoint returns a list of model IDs that have + at least one Zero Data Retention provider endpoint. + """ + if self._zdr_model_ids is not None: + return self._zdr_model_ids + + ids: set = set() + try: + resp = self._session.get( + f"{self._base}{_API_PATH_ZDR_ENDPOINTS}", + headers=self._build_headers(include_content_type=False), + timeout=min(self.valves.REQUEST_TIMEOUT, 30), + ) + try: + if resp.status_code == 200: + payload = resp.json() or {} + raw = payload.get("data") or payload.get("models") or [] + for entry in raw: + if isinstance(entry, str): + ids.add(entry) + elif isinstance(entry, dict): + mid = entry.get("id") or entry.get("model") + if isinstance(mid, str) and mid: + ids.add(mid) + finally: + resp.close() + except Exception as exc: # pragma: no cover + print(f"[OpenRouter Pipe] ZDR endpoint fetch failed: {exc}") + + self._zdr_model_ids = frozenset(ids) + return self._zdr_model_ids + + def _parse_variant_specs(self) -> List[tuple]: + """Parse MODEL_VARIANTS into ``(base_id, variant_tag)`` pairs. 
+ + Recognised tags are listed in ``_RECOGNISED_VARIANT_TAGS`` and ensure + we don't accidentally fabricate IDs OpenRouter wouldn't honour. + Unknown tags are skipped with a console note. + """ + raw = self.valves.MODEL_VARIANTS or "" + out: List[tuple] = [] + for spec in self._parse_csv(raw): + if ":" not in spec: + print(f"[OpenRouter Pipe] Skipping malformed variant spec '{spec}' (expected base_id:variant_tag)") + continue + base_id, _, tag = spec.rpartition(":") + base_id = base_id.strip() + tag = tag.strip().lower() + if not base_id or not tag: + continue + if tag not in _RECOGNISED_VARIANT_TAGS: + print( + f"[OpenRouter Pipe] Skipping unknown variant tag ':{tag}' " + f"(supported: {', '.join(sorted(_RECOGNISED_VARIANT_TAGS))})" + ) + continue + out.append((base_id, tag)) + return out + + def _build_web_search_plugin(self) -> Optional[dict]: + """Assemble the OpenRouter `web` plugin spec from valve settings. + + Returns ``None`` when the feature is disabled. Output mirrors the + WebSearchPlugin schema from the official SDK + (id/enabled/max_results/search_prompt/include_domains/exclude_domains). + """ + if not self.valves.ENABLE_WEB_SEARCH: + return None + plugin: dict = {"id": "web"} + max_results = self.valves.WEB_SEARCH_MAX_RESULTS + if max_results: + plugin["max_results"] = int(max_results) + prompt = (self.valves.WEB_SEARCH_PROMPT or "").strip() + if prompt: + plugin["search_prompt"] = prompt + include = self._parse_csv(self.valves.WEB_SEARCH_INCLUDE_DOMAINS) + if include: + plugin["include_domains"] = include + exclude = self._parse_csv(self.valves.WEB_SEARCH_EXCLUDE_DOMAINS) + if exclude: + plugin["exclude_domains"] = exclude + return plugin + + def _expand_variant_models(self, models: List[dict], prefix: str) -> List[dict]: + """Append virtual variant entries to the catalog. 
+ + Each ``base_id:variant`` entry inherits the base model's display name + (with the tag appended) and reuses the same provider icon; only the + ID changes so OpenRouter routes the request via the variant suffix. + Variants whose base model isn't in the catalog (filtered out, or + unknown to OpenRouter) are skipped with a console note. + """ + specs = self._parse_variant_specs() + if not specs: + return models + + prefix_str = prefix or "" + # Index catalog entries by ID so each variant can find its base entry. + by_id: dict = {} + for entry in models: + mid = entry.get("id") + if isinstance(mid, str): + by_id[mid] = entry + + seen_variant_ids = {entry.get("id") for entry in models} + appended: List[dict] = [] + for base_id, tag in specs: + base_entry = by_id.get(base_id) + if base_entry is None: + print( + f"[OpenRouter Pipe] Variant base not in catalog: " + f"{base_id} (skipping :{tag})" + ) + continue + variant_id = f"{base_id}:{tag}" + if variant_id in seen_variant_ids: + continue + base_name = base_entry.get("name", base_id) + # If the user set a prefix it's already in base_name; we only need + # to suffix the tag label.
+ tag_label = tag.capitalize() + appended.append( + { + "id": variant_id, + "name": f"{base_name} {tag_label}", + } + ) + seen_variant_ids.add(variant_id) + + return models + appended + def _prepare_payload(self, body: dict) -> dict: """Sanitize OWUI internals and inject provider routing, reasoning, and fallbacks.""" payload = copy.deepcopy(body) @@ -769,8 +1348,21 @@ def _prepare_payload(self, body: dict) -> dict: payload["include_reasoning"] = True effort = self.valves.REASONING_EFFORT.strip().lower() - if effort in ("low", "medium", "high"): - payload["reasoning"] = {"effort": effort} + summary = self.valves.REASONING_SUMMARY_MODE.strip().lower() + reasoning_cfg: dict = {} + if effort in ("minimal", "low", "medium", "high", "xhigh"): + reasoning_cfg["effort"] = effort + if summary in ("auto", "concise", "detailed"): + reasoning_cfg["summary"] = summary + if self.valves.REASONING_MAX_TOKENS > 0: + reasoning_cfg["max_tokens"] = int(self.valves.REASONING_MAX_TOKENS) + if reasoning_cfg: + payload["reasoning"] = reasoning_cfg + + # --- Service tier --- + tier = (self.valves.SERVICE_TIER or "").strip().lower() + if tier in ("auto", "default", "flex", "priority", "scale"): + payload["service_tier"] = tier # --- Provider routing --- provider: dict = {} @@ -787,6 +1379,29 @@ def _prepare_payload(self, body: dict) -> dict: if ignore: provider["ignore"] = ignore + only = self._parse_csv(self.valves.PROVIDER_ONLY) + if only: + provider["only"] = only + + quantizations = self._parse_csv(self.valves.PROVIDER_QUANTIZATIONS) + if quantizations: + provider["quantizations"] = [q.lower() for q in quantizations] + + # `allow_fallbacks` defaults to true on OpenRouter, so only emit the + # field when the operator opted out. 
+ if not self.valves.PROVIDER_ALLOW_FALLBACKS: + provider["allow_fallbacks"] = False + + max_price: dict = {} + prompt_cap = (self.valves.PROVIDER_MAX_PRICE_PROMPT or "").strip() + if prompt_cap: + max_price["prompt"] = prompt_cap + completion_cap = (self.valves.PROVIDER_MAX_PRICE_COMPLETION or "").strip() + if completion_cap: + max_price["completion"] = completion_cap + if max_price: + provider["max_price"] = max_price + if self.valves.REQUIRE_PARAMETERS: provider["require_parameters"] = True @@ -794,6 +1409,12 @@ def _prepare_payload(self, body: dict) -> dict: if dc == "deny": provider["data_collection"] = "deny" + # ZDR enforcement: forces OpenRouter to route only to Zero Data + # Retention endpoints; the call fails fast if none exist for the + # selected model. + if self.valves.ZDR_ENFORCE: + provider["zdr"] = True + if provider: payload["provider"] = provider @@ -813,6 +1434,23 @@ def _prepare_payload(self, body: dict) -> dict: if self.valves.ENABLE_MIDDLE_OUT: payload["transforms"] = ["middle-out"] + # --- Web search plugin --- + # Append (don't overwrite) so the user can stack additional plugins + # via the request body. Skip silently if a `web` plugin is already + # present — first-match wins. + web_plugin = self._build_web_search_plugin() + if web_plugin is not None: + existing_plugins = payload.get("plugins") + if not isinstance(existing_plugins, list): + existing_plugins = [] + already_has_web = any( + isinstance(p, dict) and p.get("id") == "web" + for p in existing_plugins + ) + if not already_has_web: + existing_plugins.append(web_plugin) + payload["plugins"] = existing_plugins + # --- Cache control (Anthropic) --- if self.valves.ENABLE_CACHE_CONTROL: self._inject_cache_control(payload) @@ -824,8 +1462,13 @@ def _inject_cache_control(self, payload: dict) -> None: Applies to the first matching role (system, then user) with list-type content. Only one chunk is tagged ('first match wins') to avoid - excessive cache entries. + excessive cache entries. 
The TTL valve (5m/1h) is propagated into the + breakpoint so longer-lived caches are honoured by Anthropic. """ + ttl = (self.valves.ANTHROPIC_PROMPT_CACHE_TTL or "").strip().lower() + cache_payload: dict = {"type": "ephemeral"} + if ttl in ("5m", "1h"): + cache_payload["ttl"] = ttl try: messages = payload.get("messages", []) for role in ("system", "user"): @@ -841,20 +1484,62 @@ def _inject_cache_control(self, payload: dict) -> None: if length > longest_len: longest_idx, longest_len = idx, length if longest_idx >= 0: - content[longest_idx]["cache_control"] = {"type": "ephemeral"} + content[longest_idx]["cache_control"] = dict(cache_payload) return except Exception as exc: # pragma: no cover print(f"[OpenRouter Pipe] cache_control not applied: {exc}") - def _build_headers(self, include_content_type: bool = True) -> dict: - """Build HTTP headers for OpenRouter API requests.""" + @staticmethod + def _is_anthropic_model(model_id: str) -> bool: + """Return True if the (possibly variant-suffixed) model ID is Claude.""" + if not isinstance(model_id, str): + return False + # Strip leading '~' (latest aliases) before the prefix check. + return model_id.lstrip("~").lower().startswith("anthropic/") + + def _resolve_referer(self) -> str: + """Pick the HTTP-Referer header sent to OpenRouter. + + Order: explicit valve override → cached WEBUI_URL env → default. + Validates that an override is a full URL with scheme; falls back + silently otherwise so a misconfigured valve never breaks requests. + """ + override = (self.valves.HTTP_REFERER_OVERRIDE or "").strip() + if override.startswith(("http://", "https://")): + return override + return self._referer + + def _build_headers( + self, + include_content_type: bool = True, + *, + model_id: Optional[str] = None, + ) -> dict: + """Build HTTP headers for OpenRouter API requests. + + ``model_id`` is the (post-clean) ID about to be invoked; passing it + lets us inject provider-specific beta headers (e.g. 
Anthropic's + interleaved-thinking) only when relevant. + """ headers = { "Authorization": f"Bearer {self.valves.OPENROUTER_API_KEY}", - "HTTP-Referer": self._referer, + "HTTP-Referer": self._resolve_referer(), "X-Title": self._title, } if include_content_type: headers["Content-Type"] = "application/json" + + if ( + model_id + and self.valves.ENABLE_ANTHROPIC_INTERLEAVED_THINKING + and self._is_anthropic_model(model_id) + ): + existing = headers.get("anthropic-beta", "") + features = [p.strip() for p in existing.split(",") if p.strip()] + if _ANTHROPIC_INTERLEAVED_THINKING_BETA not in features: + features.append(_ANTHROPIC_INTERLEAVED_THINKING_BETA) + headers["anthropic-beta"] = ",".join(features) + return headers def _non_stream_response(self, headers: dict, payload: dict) -> str: @@ -919,6 +1604,11 @@ def _non_stream_response(self, headers: dict, payload: dict) -> str: if cost_info: final_parts.append(cost_info) + if self.valves.SHOW_GENERATION_ID: + gen_footer = _format_generation_id(res.get("id")) + if gen_footer: + final_parts.append(gen_footer) + return "".join(final_parts) except requests.exceptions.Timeout: return f"OpenRouter Error: Request timed out after {self.valves.REQUEST_TIMEOUT}s. Try increasing REQUEST_TIMEOUT or retry." @@ -937,6 +1627,7 @@ def _stream_response( in_think = False latest_citations: List[str] = [] latest_usage: dict = {} + latest_generation_id: Optional[str] = None def _close_think_tag(): nonlocal in_think @@ -972,6 +1663,11 @@ def _close_think_tag(): yield f"\n\nOpenRouter Error: {msg}" return + # Generation ID arrives on the first chunk and stays stable. 
+ gen_id = chunk.get("id") + if gen_id and not latest_generation_id: + latest_generation_id = gen_id + usage_data = chunk.get("usage") if usage_data: latest_usage = usage_data @@ -1016,6 +1712,11 @@ def _close_think_tag(): cost_info = _format_cost_info(latest_usage, self.valves.COST_CURRENCY) if cost_info: yield cost_info + + if self.valves.SHOW_GENERATION_ID: + gen_footer = _format_generation_id(latest_generation_id) + if gen_footer: + yield gen_footer except requests.exceptions.Timeout: close_tag = _close_think_tag() if close_tag: diff --git a/test_pipe.py b/test_pipe.py index bcd2319..2f0385f 100644 --- a/test_pipe.py +++ b/test_pipe.py @@ -116,11 +116,12 @@ def _section(title: str): "OPENROUTER_API_KEY", "OPENROUTER_BASE_URL", "OPENROUTER_REASONING_EFFORT", "OPENROUTER_INCLUDE_REASONING", "OPENROUTER_MODEL_PROVIDERS", "OPENROUTER_INVERT_PROVIDER_LIST", - "OPENROUTER_FREE_ONLY", "OPENROUTER_PROVIDER_SORT", + "OPENROUTER_FREE_MODEL_FILTER", "OPENROUTER_PROVIDER_SORT", "OPENROUTER_PROVIDER_ORDER", "OPENROUTER_PROVIDER_IGNORE", "OPENROUTER_REQUIRE_PARAMETERS", "OPENROUTER_DATA_COLLECTION", "OPENROUTER_FALLBACK_MODELS", "OPENROUTER_ENABLE_MIDDLE_OUT", "OPENROUTER_ENABLE_CACHE_CONTROL", "OPENROUTER_REQUEST_TIMEOUT", + "OPENROUTER_OUTPUT_MODALITIES", ]: _env_backup[k] = os.environ.pop(k, None) @@ -134,7 +135,15 @@ def _section(title: str): _assert(v.INCLUDE_REASONING is True, "include_reasoning True by default") _assert(v.MODEL_PREFIX is None, "prefix None by default") _assert(v.MODEL_PROVIDERS == "ALL", "MODEL_PROVIDERS default is ALL") -_assert(v.FREE_ONLY is False, "FREE_ONLY false") +_assert(v.FREE_MODEL_FILTER == "all", "FREE_MODEL_FILTER default is 'all'") +_assert(v.TOOL_CALLING_FILTER == "all", "TOOL_CALLING_FILTER default is 'all'") +_assert(v.MODEL_VARIANTS == "", "MODEL_VARIANTS default empty") +_assert(v.ZDR_MODELS_ONLY is False, "ZDR_MODELS_ONLY default False") +_assert(v.ZDR_ENFORCE is False, "ZDR_ENFORCE default False") +_assert(v.REASONING_SUMMARY_MODE 
== "disabled", "REASONING_SUMMARY_MODE default 'disabled'") +_assert(v.ENABLE_ANTHROPIC_INTERLEAVED_THINKING is True, "interleaved thinking default True") +_assert(v.ANTHROPIC_PROMPT_CACHE_TTL == "5m", "ANTHROPIC_PROMPT_CACHE_TTL default '5m'") +_assert(v.HTTP_REFERER_OVERRIDE == "", "HTTP_REFERER_OVERRIDE default empty") _assert(v.PROVIDER_SORT == "", "PROVIDER_SORT empty") _assert(v.PROVIDER_ORDER == "", "PROVIDER_ORDER empty") _assert(v.PROVIDER_IGNORE == "", "PROVIDER_IGNORE empty") @@ -147,6 +156,7 @@ def _section(title: str): _assert(v.MAX_RETRIES == 2, "MAX_RETRIES 2") _assert(v.SHOW_COST_INFO is False, "SHOW_COST_INFO false by default") _assert(v.COST_CURRENCY == "USD", "COST_CURRENCY USD by default") +_assert(v.OUTPUT_MODALITIES == "all", "OUTPUT_MODALITIES default 'all' (full catalog)") try: Pipe.Valves(REQUEST_TIMEOUT=-1) @@ -260,6 +270,74 @@ def _section(title: str): payload4 = pipe2._prepare_payload(body4) _assert(payload4["model"] == "openai/gpt-4o", "model without dot left unchanged") +# ── 5d. 
Extended REASONING_EFFORT levels (minimal, xhigh) ── +_pipe5d = Pipe() +_pipe5d.valves = Pipe.Valves(OPENROUTER_API_KEY="k", REASONING_EFFORT="minimal") +_p5d = _pipe5d._prepare_payload({"model": "openai/o1", "messages": []}) +_assert(_p5d.get("reasoning") == {"effort": "minimal"}, "REASONING_EFFORT='minimal' sent verbatim") + +_pipe5d.valves = Pipe.Valves(OPENROUTER_API_KEY="k", REASONING_EFFORT="xhigh") +_p5d = _pipe5d._prepare_payload({"model": "openai/o1", "messages": []}) +_assert(_p5d.get("reasoning") == {"effort": "xhigh"}, "REASONING_EFFORT='xhigh' sent verbatim") + +# Empty/garbage effort drops the key +_pipe5d.valves = Pipe.Valves(OPENROUTER_API_KEY="k", REASONING_EFFORT="") +_p5d = _pipe5d._prepare_payload({"model": "openai/o1", "messages": []}) +_assert("reasoning" not in _p5d, "empty REASONING_EFFORT: no reasoning field") +_pipe5d.valves = Pipe.Valves(OPENROUTER_API_KEY="k", REASONING_EFFORT="bogus") +_p5d = _pipe5d._prepare_payload({"model": "openai/o1", "messages": []}) +_assert("reasoning" not in _p5d, "garbage REASONING_EFFORT: silently dropped") + +# ── 5e. 
REASONING_SUMMARY_MODE merged into reasoning object ── +_pipe5e = Pipe() +_pipe5e.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", + REASONING_EFFORT="high", + REASONING_SUMMARY_MODE="detailed", +) +_p5e = _pipe5e._prepare_payload({"model": "openai/o1", "messages": []}) +_assert( + _p5e.get("reasoning") == {"effort": "high", "summary": "detailed"}, + "effort + summary merged into one reasoning object", +) +# Summary alone (no effort) +_pipe5e.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", REASONING_EFFORT="", REASONING_SUMMARY_MODE="auto" +) +_p5e = _pipe5e._prepare_payload({"model": "openai/o1", "messages": []}) +_assert(_p5e.get("reasoning") == {"summary": "auto"}, "summary-only reasoning object") +# disabled summary skipped +_pipe5e.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", REASONING_EFFORT="", REASONING_SUMMARY_MODE="disabled" +) +_p5e = _pipe5e._prepare_payload({"model": "openai/o1", "messages": []}) +_assert("reasoning" not in _p5e, "summary='disabled' + no effort: reasoning key dropped") + +# ── 5f. 
ZDR_ENFORCE injects provider.zdr=true ── +_pipe5f = Pipe() +_pipe5f.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ZDR_ENFORCE=True) +_p5f = _pipe5f._prepare_payload({"model": "openai/gpt-4o", "messages": []}) +_assert(_p5f.get("provider", {}).get("zdr") is True, "ZDR_ENFORCE=True: provider.zdr=true injected") + +_pipe5f.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ZDR_ENFORCE=False) +_p5f = _pipe5f._prepare_payload({"model": "openai/gpt-4o", "messages": []}) +_assert( + "provider" not in _p5f or "zdr" not in _p5f.get("provider", {}), + "ZDR_ENFORCE=False: no provider.zdr field", +) + +# ZDR_ENFORCE plays nice with other provider fields +_pipe5f.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", + ZDR_ENFORCE=True, + PROVIDER_SORT="price", + DATA_COLLECTION="deny", +) +_p5f = _pipe5f._prepare_payload({"model": "openai/gpt-4o", "messages": []}) +_assert(_p5f["provider"]["zdr"] is True, "ZDR_ENFORCE coexists with sort") +_assert(_p5f["provider"]["sort"] == "price", "ZDR_ENFORCE: sort preserved") +_assert(_p5f["provider"]["data_collection"] == "deny", "ZDR_ENFORCE: data_collection preserved") + # ── 6. _build_headers ──────────────────────────────────────────────────────── _section("6. _build_headers()") @@ -278,6 +356,56 @@ def _section(title: str): _assert("Content-Type" not in headers_no_ct, "Content-Type omitted") _assert("Authorization" in headers_no_ct, "auth still present") +# 6b. 
ENABLE_ANTHROPIC_INTERLEAVED_THINKING injects beta header for anthropic models only +pipe = Pipe() +pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ENABLE_ANTHROPIC_INTERLEAVED_THINKING=True) +_h_anth = pipe._build_headers(model_id="anthropic/claude-3.5-sonnet") +_assert( + _h_anth.get("anthropic-beta") == "interleaved-thinking-2025-05-14", + "anthropic model: interleaved-thinking beta header injected", +) +_h_oai = pipe._build_headers(model_id="openai/gpt-4o") +_assert( + "anthropic-beta" not in _h_oai, + "non-anthropic model: no interleaved-thinking header", +) +# Tilde latest-alias still picks up the header +_h_alias = pipe._build_headers(model_id="~anthropic/claude-haiku-latest") +_assert( + _h_alias.get("anthropic-beta") == "interleaved-thinking-2025-05-14", + "tilde anthropic alias: interleaved-thinking header injected", +) +# When the valve is off, no header even on Claude +pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ENABLE_ANTHROPIC_INTERLEAVED_THINKING=False) +_h_off = pipe._build_headers(model_id="anthropic/claude-3.5-sonnet") +_assert( + "anthropic-beta" not in _h_off, + "valve off: no interleaved-thinking header even for Claude", +) + +# 6c. 
HTTP_REFERER_OVERRIDE: explicit override > env fallback > default +pipe = Pipe() +pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="k") +_default_ref = pipe._build_headers()["HTTP-Referer"] +_assert( + _default_ref.startswith(("http://", "https://")), + "HTTP-Referer falls back to a valid scheme URL when no override set", +) +pipe.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", + HTTP_REFERER_OVERRIDE="https://my-corp.example.com/owui", +) +_assert( + pipe._build_headers()["HTTP-Referer"] == "https://my-corp.example.com/owui", + "HTTP_REFERER_OVERRIDE: full URL respected", +) +# Bogus override (no scheme) → silently falls back +pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="k", HTTP_REFERER_OVERRIDE="not-a-url") +_assert( + pipe._build_headers()["HTTP-Referer"] != "not-a-url", + "HTTP_REFERER_OVERRIDE: schemeless value silently ignored", +) + # ── 7. _get_provider_icon ──────────────────────────────────────────────────── _section("7. get_provider_icon()") @@ -361,14 +489,37 @@ def _section(title: str): } pipe._inject_cache_control(payload_cc) _assert( - payload_cc["messages"][0]["content"][1].get("cache_control") == {"type": "ephemeral"}, - "cache_control applied to longest text chunk", + payload_cc["messages"][0]["content"][1].get("cache_control") + == {"type": "ephemeral", "ttl": "5m"}, + "cache_control applied to longest text chunk (default 5m TTL)", ) _assert( "cache_control" not in payload_cc["messages"][0]["content"][0], "cache_control NOT on shorter chunk", ) +# Cache TTL valve switches the breakpoint to 1h +_pipe_ttl_1h = Pipe() +_pipe_ttl_1h.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", + ENABLE_CACHE_CONTROL=True, + ANTHROPIC_PROMPT_CACHE_TTL="1h", +) +payload_ttl = { + "messages": [ + { + "role": "system", + "content": [{"type": "text", "text": "long system prompt"}], + } + ] +} +_pipe_ttl_1h._inject_cache_control(payload_ttl) +_assert( + payload_ttl["messages"][0]["content"][0].get("cache_control") + == {"type": "ephemeral", "ttl": "1h"}, + 
"ANTHROPIC_PROMPT_CACHE_TTL='1h' propagated into breakpoint", +) + # No list content → no crash payload_cc2 = {"messages": [{"role": "system", "content": "plain string"}]} pipe._inject_cache_control(payload_cc2) # Should not raise @@ -864,7 +1015,7 @@ async def _test_pipe_stream() -> str: _assert("info" not in models[0], "pipes: info key removed (dead code)") # 15b. FREE_ONLY filter -pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", FREE_ONLY=True) +pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", FREE_MODEL_FILTER="only") # Mock data with pricing info: one :free suffix, one free-by-pricing, one paid mock_models_pricing = { @@ -1045,7 +1196,7 @@ async def _test_pipe_stream() -> str: _assert("No models found" in models[0]["name"], "pipes empty data: correct message") # 15m. FREE_ONLY + all paid models → "No free models available" -pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", FREE_ONLY=True) +pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", FREE_MODEL_FILTER="only") pipe._models_cache = None _mock_all_paid = { "data": [ @@ -1118,15 +1269,20 @@ async def _test_pipe_stream() -> str: "API key: input type is password", ) -# 16b. REASONING_EFFORT uses select with 4 options +# 16b. 
REASONING_EFFORT uses select with 6 options (disabled, minimal, low, medium, high, xhigh) re_field = Pipe.Valves.model_fields["REASONING_EFFORT"] _assert( re_field.json_schema_extra is not None, "REASONING_EFFORT: json_schema_extra present", ) re_options = re_field.json_schema_extra.get("input", {}).get("options", []) -_assert(len(re_options) == 4, "REASONING_EFFORT: 4 options (disabled, low, medium, high)") +_assert( + len(re_options) == 6, + "REASONING_EFFORT: 6 options (disabled, minimal, low, medium, high, xhigh)", +) re_values = [o["value"] for o in re_options] +_assert("minimal" in re_values, "REASONING_EFFORT: minimal option present") +_assert("xhigh" in re_values, "REASONING_EFFORT: xhigh option present") _assert("" in re_values and "high" in re_values, "REASONING_EFFORT: contains empty and high") # 16c. PROVIDER_SORT uses select with 4 options @@ -1222,7 +1378,7 @@ def _counting_get(*args, **kwargs): _assert(_call_count == 1, "cache hit: API called only once for two pipes() calls") # 19b. Changing a valve invalidates cache -_pipe_cache.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", FREE_ONLY=True) +_pipe_cache.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", FREE_MODEL_FILTER="only") _call_count = 0 with patch.object(_pipe_cache._session, "get", side_effect=_counting_get): _pipe_cache.pipes() # should miss cache (valve changed) @@ -1243,6 +1399,300 @@ def _counting_get(*args, **kwargs): _pipe_cache.pipes() _assert(_call_count == 1, "cache expired: API called after TTL") +# ── 19d. OUTPUT_MODALITIES query param ────────────────────────────────────── +_section("19d. 
OUTPUT_MODALITIES query param on /models") + +_mock_modalities_resp = MagicMock() +_mock_modalities_resp.status_code = 200 +_mock_modalities_resp.json.return_value = {"data": [ + {"id": "openai/gpt-4o", "name": "GPT-4o"}, + {"id": "openai/gpt-4o-mini-tts-2025-12-15", "name": "GPT-4o Mini TTS"}, +]} +_mock_modalities_resp.raise_for_status = MagicMock() + +_captured_kwargs = {} + +def _capture_get(*args, **kwargs): + _captured_kwargs.clear() + _captured_kwargs.update(kwargs) + return _mock_modalities_resp + +# Default valve → params should request 'all' +_pipe_mod = Pipe() +_pipe_mod.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key") +_pipe_mod._models_cache = None +with patch.object(_pipe_mod._session, "get", side_effect=_capture_get): + _models = _pipe_mod.pipes() +_assert( + _captured_kwargs.get("params") == {"output_modalities": "all"}, + "default OUTPUT_MODALITIES sends params={'output_modalities':'all'}", +) +_tts_ids = {m["id"] for m in _models} +_assert( + "openai/gpt-4o-mini-tts-2025-12-15" in _tts_ids, + "TTS model surfaced in pipes() output when API returns it", +) + +# Custom valve value → forwarded verbatim +_pipe_mod = Pipe() +_pipe_mod.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", OUTPUT_MODALITIES="text,audio") +_pipe_mod._models_cache = None +with patch.object(_pipe_mod._session, "get", side_effect=_capture_get): + _pipe_mod.pipes() +_assert( + _captured_kwargs.get("params") == {"output_modalities": "text,audio"}, + "custom OUTPUT_MODALITIES forwarded as params value", +) + +# Empty/whitespace valve → falls back to 'all' +_pipe_mod = Pipe() +_pipe_mod.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", OUTPUT_MODALITIES=" ") +_pipe_mod._models_cache = None +with patch.object(_pipe_mod._session, "get", side_effect=_capture_get): + _pipe_mod.pipes() +_assert( + _captured_kwargs.get("params") == {"output_modalities": "all"}, + "blank OUTPUT_MODALITIES falls back to 'all'", +) + +# 19e. 
Cache key includes OUTPUT_MODALITIES — toggling invalidates +_pipe_mod_cache = Pipe() +_pipe_mod_cache.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", OUTPUT_MODALITIES="all") +_key_all = _pipe_mod_cache._build_cache_key() +_pipe_mod_cache.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", OUTPUT_MODALITIES="text") +_key_text = _pipe_mod_cache._build_cache_key() +_assert(_key_all != _key_text, "_build_cache_key differs for different OUTPUT_MODALITIES") + +# Behavioral: pipes() refetches after OUTPUT_MODALITIES changes +_pipe_mod_cache = Pipe() +_pipe_mod_cache.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", OUTPUT_MODALITIES="all") +_pipe_mod_cache._models_cache = None + +_modalities_call_count = 0 + +def _counting_modalities_get(*args, **kwargs): + global _modalities_call_count + _modalities_call_count += 1 + return _mock_modalities_resp + +with patch.object(_pipe_mod_cache._session, "get", side_effect=_counting_modalities_get): + _pipe_mod_cache.pipes() # populates cache + _pipe_mod_cache.pipes() # cache hit +_assert(_modalities_call_count == 1, "OUTPUT_MODALITIES cache hit: 1 API call across 2 pipes() invocations") + +_pipe_mod_cache.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", OUTPUT_MODALITIES="text") +_modalities_call_count = 0 +with patch.object(_pipe_mod_cache._session, "get", side_effect=_counting_modalities_get): + _pipe_mod_cache.pipes() +_assert( + _modalities_call_count == 1, + "OUTPUT_MODALITIES change invalidates cache: API refetched", +) + +# ── 19f. FREE_MODEL_FILTER trinary (all/only/exclude) ──────────────────────── +_section("19f. 
FREE_MODEL_FILTER trinary") + +_mock_pricing = { + "data": [ + {"id": "openai/gpt-4o", "name": "GPT-4o", "pricing": {"prompt": "5", "completion": "15"}}, + {"id": "google/gemini-2.0-flash-exp:free", "name": "Gemini 2.0 Flash (Free)", "pricing": {"prompt": "0", "completion": "0"}}, + {"id": "google/gemma-3-1b-it", "name": "Gemma 3 1B", "pricing": {"prompt": "0", "completion": "0"}}, + ] +} +_mock_pricing_resp = MagicMock() +_mock_pricing_resp.status_code = 200 +_mock_pricing_resp.json.return_value = _mock_pricing +_mock_pricing_resp.raise_for_status = MagicMock() + +# 'all' = no filter +_pipe_ff = Pipe() +_pipe_ff.valves = Pipe.Valves(OPENROUTER_API_KEY="k", FREE_MODEL_FILTER="all") +_pipe_ff._models_cache = None +with patch.object(_pipe_ff._session, "get", return_value=_mock_pricing_resp): + _all_models = _pipe_ff.pipes() +_assert(len(_all_models) == 3, "FREE_MODEL_FILTER='all': all 3 models pass through") + +# 'exclude' hides free models +_pipe_ff = Pipe() +_pipe_ff.valves = Pipe.Valves(OPENROUTER_API_KEY="k", FREE_MODEL_FILTER="exclude") +_pipe_ff._models_cache = None +with patch.object(_pipe_ff._session, "get", return_value=_mock_pricing_resp): + _paid = _pipe_ff.pipes() +_paid_ids = {m["id"] for m in _paid} +_assert("openai/gpt-4o" in _paid_ids, "FREE_MODEL_FILTER='exclude': paid model kept") +_assert(":free" not in str(_paid_ids), "FREE_MODEL_FILTER='exclude': :free suffix excluded") +_assert("google/gemma-3-1b-it" not in _paid_ids, "FREE_MODEL_FILTER='exclude': zero-pricing excluded") + +# ── 19g. TOOL_CALLING_FILTER ──────────────────────────────────────────────── +_section("19g. 
TOOL_CALLING_FILTER") + +_mock_tools = { + "data": [ + {"id": "openai/gpt-4o", "name": "GPT-4o", "supported_parameters": ["tools", "tool_choice", "temperature"]}, + {"id": "openai/o1-mini", "name": "o1-mini", "supported_parameters": ["temperature"]}, + {"id": "openai/gpt-3.5-turbo", "name": "GPT-3.5", "supported_parameters": ["tool_choice"]}, + ] +} +_mock_tools_resp = MagicMock() +_mock_tools_resp.status_code = 200 +_mock_tools_resp.json.return_value = _mock_tools +_mock_tools_resp.raise_for_status = MagicMock() + +_pipe_tc = Pipe() +_pipe_tc.valves = Pipe.Valves(OPENROUTER_API_KEY="k", TOOL_CALLING_FILTER="only") +_pipe_tc._models_cache = None +with patch.object(_pipe_tc._session, "get", return_value=_mock_tools_resp): + _tc_models = _pipe_tc.pipes() +_tc_ids = {m["id"] for m in _tc_models} +_assert("openai/gpt-4o" in _tc_ids, "TOOL_CALLING_FILTER='only': model with 'tools' kept") +_assert("openai/gpt-3.5-turbo" in _tc_ids, "TOOL_CALLING_FILTER='only': model with 'tool_choice' kept") +_assert("openai/o1-mini" not in _tc_ids, "TOOL_CALLING_FILTER='only': non-tool model dropped") + +_pipe_tc = Pipe() +_pipe_tc.valves = Pipe.Valves(OPENROUTER_API_KEY="k", TOOL_CALLING_FILTER="exclude") +_pipe_tc._models_cache = None +with patch.object(_pipe_tc._session, "get", return_value=_mock_tools_resp): + _tc_excl = _pipe_tc.pipes() +_tc_excl_ids = {m["id"] for m in _tc_excl} +_assert(_tc_excl_ids == {"openai/o1-mini"}, "TOOL_CALLING_FILTER='exclude': only non-tool model kept") + +# ── 19h. MODEL_VARIANTS expansion ─────────────────────────────────────────── +_section("19h. 
MODEL_VARIANTS expansion") + +_mock_var = { + "data": [ + {"id": "openai/gpt-4o", "name": "GPT-4o"}, + {"id": "anthropic/claude-3.5-sonnet", "name": "Claude 3.5 Sonnet"}, + ] +} +_mock_var_resp = MagicMock() +_mock_var_resp.status_code = 200 +_mock_var_resp.json.return_value = _mock_var +_mock_var_resp.raise_for_status = MagicMock() + +_pipe_var = Pipe() +_pipe_var.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", + MODEL_VARIANTS="openai/gpt-4o:nitro,anthropic/claude-3.5-sonnet:thinking,openai/gpt-4o:exacto", +) +_pipe_var._models_cache = None +with patch.object(_pipe_var._session, "get", return_value=_mock_var_resp): + _var_models = _pipe_var.pipes() +_var_ids = {m["id"] for m in _var_models} +_assert("openai/gpt-4o" in _var_ids, "MODEL_VARIANTS: base model preserved") +_assert("openai/gpt-4o:nitro" in _var_ids, "MODEL_VARIANTS: :nitro variant added") +_assert("openai/gpt-4o:exacto" in _var_ids, "MODEL_VARIANTS: :exacto variant added") +_assert("anthropic/claude-3.5-sonnet:thinking" in _var_ids, "MODEL_VARIANTS: :thinking variant added") +_nitro_entry = next(m for m in _var_models if m["id"] == "openai/gpt-4o:nitro") +_assert("Nitro" in _nitro_entry["name"], "MODEL_VARIANTS: tag label appended to display name") +_assert("GPT-4o" in _nitro_entry["name"], "MODEL_VARIANTS: base name retained") + +# Variant whose base isn't in the catalog → silently skipped +_pipe_var = Pipe() +_pipe_var.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", + MODEL_VARIANTS="missing/provider-model:nitro,openai/gpt-4o:nitro", +) +_pipe_var._models_cache = None +with patch.object(_pipe_var._session, "get", return_value=_mock_var_resp): + _var_models = _pipe_var.pipes() +_var_ids = {m["id"] for m in _var_models} +_assert("missing/provider-model:nitro" not in _var_ids, "MODEL_VARIANTS: missing base skipped") +_assert("openai/gpt-4o:nitro" in _var_ids, "MODEL_VARIANTS: valid variant still added") + +# Unrecognised tag → skipped +_pipe_var = Pipe() +_pipe_var.valves = Pipe.Valves( + 
OPENROUTER_API_KEY="k", MODEL_VARIANTS="openai/gpt-4o:bogus" +) +_pipe_var._models_cache = None +with patch.object(_pipe_var._session, "get", return_value=_mock_var_resp): + _var_models = _pipe_var.pipes() +_assert( + not any(m["id"] == "openai/gpt-4o:bogus" for m in _var_models), + "MODEL_VARIANTS: unrecognised tag silently dropped", +) + +# Empty MODEL_VARIANTS → no expansion +_pipe_var = Pipe() +_pipe_var.valves = Pipe.Valves(OPENROUTER_API_KEY="k", MODEL_VARIANTS="") +_pipe_var._models_cache = None +with patch.object(_pipe_var._session, "get", return_value=_mock_var_resp): + _var_models = _pipe_var.pipes() +_assert(len(_var_models) == 2, "MODEL_VARIANTS empty: no virtual entries added") + +# ── 19i. ZDR_MODELS_ONLY filter + _load_zdr_model_ids ─────────────────────── +_section("19i. ZDR_MODELS_ONLY filter") + +_mock_zdr_resp = MagicMock() +_mock_zdr_resp.status_code = 200 +_mock_zdr_resp.json.return_value = { + "data": ["openai/gpt-4o", "anthropic/claude-3.5-sonnet"] +} +_mock_zdr_resp.raise_for_status = MagicMock() + +_mock_models_zdr = { + "data": [ + {"id": "openai/gpt-4o", "name": "GPT-4o"}, + {"id": "anthropic/claude-3.5-sonnet", "name": "Claude"}, + {"id": "google/gemini-2.0-flash-exp", "name": "Gemini"}, + ] +} +_mock_models_zdr_resp = MagicMock() +_mock_models_zdr_resp.status_code = 200 +_mock_models_zdr_resp.json.return_value = _mock_models_zdr +_mock_models_zdr_resp.raise_for_status = MagicMock() + + +def _zdr_router(url, *args, **kwargs): + if "/endpoints/zdr" in url: + return _mock_zdr_resp + return _mock_models_zdr_resp + + +_pipe_zdr = Pipe() +_pipe_zdr.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ZDR_MODELS_ONLY=True) +_pipe_zdr._models_cache = None +with patch.object(_pipe_zdr._session, "get", side_effect=_zdr_router): + _zdr_models = _pipe_zdr.pipes() +_zdr_ids = {m["id"] for m in _zdr_models} +_assert(_zdr_ids == {"openai/gpt-4o", "anthropic/claude-3.5-sonnet"}, + "ZDR_MODELS_ONLY: catalog narrowed to ZDR-capable IDs") + +# Loader caches: no 
second HTTP call when called twice +_pipe_zdr2 = Pipe() +_pipe_zdr2.valves = Pipe.Valves(OPENROUTER_API_KEY="k") +_zdr_call_count = 0 + + +def _counting_zdr_router(url, *args, **kwargs): + global _zdr_call_count + if "/endpoints/zdr" in url: + _zdr_call_count += 1 + return _mock_zdr_resp if "/endpoints/zdr" in url else _mock_models_zdr_resp + + +with patch.object(_pipe_zdr2._session, "get", side_effect=_counting_zdr_router): + _ = _pipe_zdr2._load_zdr_model_ids() + _ = _pipe_zdr2._load_zdr_model_ids() +_assert(_zdr_call_count == 1, "_load_zdr_model_ids: cached after first call") + +# ── 19j. _build_cache_key includes new filters ────────────────────────────── +_section("19j. cache key includes FREE_MODEL_FILTER / TOOL_CALLING_FILTER / ZDR_MODELS_ONLY / MODEL_VARIANTS") + +_keys = [] +for v in [ + {}, + {"FREE_MODEL_FILTER": "only"}, + {"TOOL_CALLING_FILTER": "exclude"}, + {"ZDR_MODELS_ONLY": True}, + {"MODEL_VARIANTS": "openai/gpt-4o:nitro"}, +]: + _p = Pipe() + _p.valves = Pipe.Valves(OPENROUTER_API_KEY="k", **v) + _keys.append(_p._build_cache_key()) +_assert(len(set(_keys)) == len(_keys), "cache key fingerprint differs per new-filter valve") + # ── 20. Base URL validator ─────────────────────────────────────────────────── _section("20. Base URL validator") @@ -1381,7 +1831,7 @@ async def _test_pipe_no_msgs_key(): # 24b. 
FREE_ONLY with :free suffix _pipe_free = Pipe() -_pipe_free.valves = Pipe.Valves(OPENROUTER_API_KEY="k", FREE_ONLY=True) +_pipe_free.valves = Pipe.Valves(OPENROUTER_API_KEY="k", FREE_MODEL_FILTER="only") _pipe_free._models_cache = None _mock_free_resp = MagicMock() _mock_free_resp.status_code = 200 @@ -1598,6 +2048,133 @@ async def _test_pipe_no_msgs_key(): _assert(_is_owui("https://openrouter.ai/images/icons/Anthropic.svg"), "_is_owui_managed_icon: icons path anthropic → True") _assert(not _is_owui("https://custom-icon.example.com/icon.png"), "_is_owui_managed_icon: external URL → False") _assert(not _is_owui("https://cdn.openai.com/logo.png"), "_is_owui_managed_icon: other https URL → False") +_assert( + _is_owui("https://t0.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&url=https://x.ai/&size=256"), + "_is_owui_managed_icon: gstatic faviconV2 URL → True (registry-sourced, overwriteable)", +) + +# ── 25j. _load_provider_registry + _get_provider_icon ──────────────────────── +_section("25j. 
provider registry auto-discovery") + +# Mock the OpenRouter frontend providers payload +_registry_payload = { + "data": [ + {"slug": "openai", "name": "OpenAI", "icon": {"url": "/images/icons/OpenAI.svg"}}, + {"slug": "xai", "name": "xAI", "icon": {"url": "https://t0.gstatic.com/faviconV2?url=https://x.ai/&size=256"}}, + {"slug": "arcee-ai", "name": "Arcee AI", "icon": {"url": "https://t0.gstatic.com/faviconV2?url=https://www.arcee.ai/&size=256"}}, + {"slug": "broken", "name": "Broken", "icon": {"url": ""}}, # empty icon — must be skipped + {"slug": "unsafe", "name": "Unsafe", "icon": {"url": "javascript:alert(1)"}}, # unsafe — must be skipped + {"slug": "noicon", "name": "NoIcon"}, # no icon key at all + ] +} +_mock_reg_resp = MagicMock() +_mock_reg_resp.status_code = 200 +_mock_reg_resp.json.return_value = _registry_payload + +_pipe_reg = Pipe() +_pipe_reg.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key") + +_reg_call_count = 0 +def _counting_reg_get(url, *args, **kwargs): + global _reg_call_count + if "all-providers" in url: + _reg_call_count += 1 + return _mock_reg_resp + return _mock_reg_resp # fall-through is fine for this test + +with patch.object(_pipe_reg._session, "get", side_effect=_counting_reg_get): + _r1 = _pipe_reg._load_provider_registry() + _r2 = _pipe_reg._load_provider_registry() # cached, no second fetch + +_assert(_reg_call_count == 1, "registry: HTTP fetched exactly once (caching)") +_assert(_r1 is _r2, "registry: cached object is the same instance on subsequent calls") +_assert( + _r1.get("openai") == "https://openrouter.ai/images/icons/OpenAI.svg", + "registry: relative /images/icons/ URL resolved against openrouter.ai", +) +_assert( + _r1.get("xai", "").startswith("https://t0.gstatic.com/faviconV2"), + "registry: gstatic favicon URL kept verbatim", +) +_assert( + _r1.get("arcee-ai") == _r1.get("arceeai"), + "registry: hyphen-stripped slug also indexed (arcee-ai → arceeai)", +) +_assert("broken" not in _r1, "registry: empty icon URL 
skipped") +_assert("unsafe" not in _r1, "registry: unsafe (non-http) icon URL skipped") +_assert("noicon" not in _r1, "registry: entry without icon key skipped") + +# 25k. _get_provider_icon layered lookup +_pipe_lookup = Pipe() +_pipe_lookup.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key") +with patch.object(_pipe_lookup._session, "get", side_effect=_counting_reg_get): + # Hardcoded fast path — registry never consulted + _icon_openai = _pipe_lookup._get_provider_icon("openai") + _assert( + _icon_openai == "https://openrouter.ai/images/icons/OpenAI.svg", + "_get_provider_icon: hardcoded dict hit returns OpenAI icon", + ) + + # Slug not in dict but in registry (exact) + _icon_arcee = _pipe_lookup._get_provider_icon("arcee-ai") + _assert( + _icon_arcee and _icon_arcee.startswith("https://t0.gstatic.com/faviconV2"), + "_get_provider_icon: registry exact-slug hit (arcee-ai)", + ) + + # Hyphen-strip normalization: x-ai (model author) → xai (registry slug) + _icon_xai = _pipe_lookup._get_provider_icon("x-ai") + _assert( + _icon_xai and _icon_xai.startswith("https://t0.gstatic.com/faviconV2"), + "_get_provider_icon: hyphen-strip normalization (x-ai → xai)", + ) + + # Truly unknown provider returns None (registry has no entry) + _icon_missing = _pipe_lookup._get_provider_icon("totally-unknown-provider") + _assert( + _icon_missing is None, + "_get_provider_icon: unknown provider returns None", + ) + + # Empty/None provider key + _assert(_pipe_lookup._get_provider_icon("") is None, "_get_provider_icon: empty key → None") + +# 25l. 
Registry network failure → cached empty dict, no retry, dict still works +_pipe_fail = Pipe() +_pipe_fail.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key") + +_fail_call_count = 0 +def _failing_reg_get(*args, **kwargs): + global _fail_call_count + _fail_call_count += 1 + raise Exception("simulated network failure") + +with patch.object(_pipe_fail._session, "get", side_effect=_failing_reg_get): + _r_fail = _pipe_fail._load_provider_registry() + _r_fail_2 = _pipe_fail._load_provider_registry() +_assert(_r_fail == {}, "registry: network failure → empty dict") +_assert(_fail_call_count == 1, "registry: failure does not retry (cached empty)") + +# Hardcoded dict still works after registry failure +_assert( + _pipe_fail._get_provider_icon("openai") == "https://openrouter.ai/images/icons/OpenAI.svg", + "_get_provider_icon: hardcoded dict still resolves after registry failure", +) +_assert( + _pipe_fail._get_provider_icon("x-ai") is None, + "_get_provider_icon: x-ai falls back to None when registry failed", +) + +# 25m. Registry HTTP non-200 → empty dict +_mock_reg_403 = MagicMock() +_mock_reg_403.status_code = 403 +_mock_reg_403.json.return_value = {"data": []} + +_pipe_403 = Pipe() +_pipe_403.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key") +with patch.object(_pipe_403._session, "get", return_value=_mock_reg_403): + _r_403 = _pipe_403._load_provider_registry() +_assert(_r_403 == {}, "registry: HTTP 403 → empty dict (no parse, no retry)") # ── 26. 
_stream_response() edge cases ──────────────────────────────────────── @@ -1716,7 +2293,7 @@ async def _test_pipe_no_msgs_key(): {"id": "some/model", "name": "Model", "pricing": {"prompt": "not-a-number", "completion": "0"}}, ] } -pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", FREE_ONLY=True) +pipe.valves = Pipe.Valves(OPENROUTER_API_KEY="test-key", FREE_MODEL_FILTER="only") pipe._models_cache = None with patch.object(pipe._session, "get", return_value=_mock_invalid_price): models = pipe.pipes() @@ -1777,7 +2354,8 @@ async def _test_pipe_no_msgs_key(): "cache_control: image_url chunk skipped in mixed content", ) _assert( - payload_mixed_img["messages"][0]["content"][1].get("cache_control") == {"type": "ephemeral"}, + payload_mixed_img["messages"][0]["content"][1].get("cache_control") + == {"type": "ephemeral", "ttl": "5m"}, "cache_control: text chunk in mixed content gets cache_control", ) @@ -1795,10 +2373,282 @@ async def _test_pipe_no_msgs_key(): } pipe._inject_cache_control(payload_user_list) _assert( - payload_user_list["messages"][0]["content"][1].get("cache_control") == {"type": "ephemeral"}, + payload_user_list["messages"][0]["content"][1].get("cache_control") + == {"type": "ephemeral", "ttl": "5m"}, "cache_control: user role list content gets cache_control when no system role", ) +# ── 28d. v1.6.0 — Web search plugin builder ───────────────────────────────── +_section("28d. 
v1.6.0 web search plugin") + +_pipe_ws = Pipe() +_pipe_ws.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ENABLE_WEB_SEARCH=False) +_assert(_pipe_ws._build_web_search_plugin() is None, "web search disabled → None") + +_pipe_ws.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", + ENABLE_WEB_SEARCH=True, + WEB_SEARCH_MAX_RESULTS=8, + WEB_SEARCH_PROMPT="Find authoritative sources", + WEB_SEARCH_INCLUDE_DOMAINS="*.gov, *.edu", + WEB_SEARCH_EXCLUDE_DOMAINS="reddit.com", +) +_plugin = _pipe_ws._build_web_search_plugin() +_assert(_plugin and _plugin["id"] == "web", "web plugin id is 'web'") +_assert(_plugin["max_results"] == 8, "max_results forwarded") +_assert(_plugin["search_prompt"] == "Find authoritative sources", "custom search_prompt") +_assert(_plugin["include_domains"] == ["*.gov", "*.edu"], "include_domains parsed") +_assert(_plugin["exclude_domains"] == ["reddit.com"], "exclude_domains parsed") + +# Payload integration: appended to existing user plugins, never duplicated +_pipe_ws = Pipe() +_pipe_ws.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ENABLE_WEB_SEARCH=True) +_p_ws = _pipe_ws._prepare_payload({"model": "openai/gpt-4o", "messages": []}) +_assert(any(p.get("id") == "web" for p in _p_ws.get("plugins", [])), + "ENABLE_WEB_SEARCH: plugins[] contains web entry") + +# User plugins preserved alongside web +_pipe_ws.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ENABLE_WEB_SEARCH=True) +_p_ws = _pipe_ws._prepare_payload({ + "model": "openai/gpt-4o", + "messages": [], + "plugins": [{"id": "file-parser"}], +}) +_p_ids = [p.get("id") for p in _p_ws.get("plugins", [])] +_assert("file-parser" in _p_ids and "web" in _p_ids, "user plugins coexist with auto web plugin") + +# Existing user-supplied web plugin wins +_pipe_ws.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ENABLE_WEB_SEARCH=True, WEB_SEARCH_MAX_RESULTS=20) +_p_ws = _pipe_ws._prepare_payload({ + "model": "openai/gpt-4o", + "messages": [], + "plugins": [{"id": "web", "max_results": 3}], +}) +_assert( + sum(1 for p in 
_p_ws["plugins"] if p.get("id") == "web") == 1, + "user-supplied web plugin not duplicated by valve injection", +) +_assert( + _p_ws["plugins"][0].get("max_results") == 3, + "user-supplied web plugin keeps its own max_results", +) + +# Web search disabled → no plugin emitted at all +_pipe_ws.valves = Pipe.Valves(OPENROUTER_API_KEY="k", ENABLE_WEB_SEARCH=False) +_p_ws = _pipe_ws._prepare_payload({"model": "openai/gpt-4o", "messages": []}) +_assert("plugins" not in _p_ws, "web search disabled: no plugins key added") + +# ── 28e. v1.6.0 — REASONING_MAX_TOKENS ────────────────────────────────────── +_section("28e. v1.6.0 reasoning max_tokens") + +_pipe_rmt = Pipe() +_pipe_rmt.valves = Pipe.Valves( + OPENROUTER_API_KEY="k", REASONING_EFFORT="high", REASONING_MAX_TOKENS=2048 +) +_p_rmt = _pipe_rmt._prepare_payload({"model": "openai/o1", "messages": []}) +_assert( + _p_rmt.get("reasoning") == {"effort": "high", "max_tokens": 2048}, + "reasoning.max_tokens emitted alongside effort", +) + +_pipe_rmt.valves = Pipe.Valves(OPENROUTER_API_KEY="k", REASONING_MAX_TOKENS=0) +_p_rmt = _pipe_rmt._prepare_payload({"model": "openai/o1", "messages": []}) +_assert("reasoning" not in _p_rmt, "max_tokens=0 + no effort: reasoning key omitted") + +# ── 28f. v1.6.0 — Provider extras (only/quantizations/allow_fallbacks/max_price) ── +_section("28f. 
v1.6.0 provider preferences extras")
+
+_pipe_pp = Pipe()
+_pipe_pp.valves = Pipe.Valves(
+    OPENROUTER_API_KEY="k",
+    PROVIDER_ONLY="anthropic, openai",
+    PROVIDER_QUANTIZATIONS="bf16, fp8",
+    PROVIDER_ALLOW_FALLBACKS=False,
+    PROVIDER_MAX_PRICE_PROMPT="3.0",
+    PROVIDER_MAX_PRICE_COMPLETION="15.0",
+)
+_p_pp = _pipe_pp._prepare_payload({"model": "openai/gpt-4o", "messages": []})
+_p_provider = _p_pp.get("provider", {})
+_assert(_p_provider.get("only") == ["anthropic", "openai"], "provider.only forwarded")
+_assert(_p_provider.get("quantizations") == ["bf16", "fp8"], "provider.quantizations lower-cased")
+_assert(_p_provider.get("allow_fallbacks") is False, "provider.allow_fallbacks=False emitted only when opted out")
+_assert(
+    _p_provider.get("max_price") == {"prompt": "3.0", "completion": "15.0"},
+    "provider.max_price merged",
+)
+
+# Defaults: allow_fallbacks=true is implicit (omit field)
+_pipe_pp.valves = Pipe.Valves(OPENROUTER_API_KEY="k", PROVIDER_ALLOW_FALLBACKS=True)
+_p_pp = _pipe_pp._prepare_payload({"model": "openai/gpt-4o", "messages": []})
+_assert(
+    "provider" not in _p_pp or "allow_fallbacks" not in _p_pp.get("provider", {}),
+    "PROVIDER_ALLOW_FALLBACKS=True (default): field omitted",
+)
+
+# ── 28g. v1.6.0 — SERVICE_TIER ──────────────────────────────────────────────
+_section("28g. v1.6.0 service tier")
+
+for tier in ("auto", "default", "flex", "priority", "scale"):
+    _pipe_st = Pipe()
+    _pipe_st.valves = Pipe.Valves(OPENROUTER_API_KEY="k", SERVICE_TIER=tier)
+    _p_st = _pipe_st._prepare_payload({"model": "openai/gpt-4o", "messages": []})
+    _assert(_p_st.get("service_tier") == tier, f"SERVICE_TIER='{tier}' forwarded")
+
+# Bogus value silently dropped
+_pipe_st = Pipe()
+_pipe_st.valves = Pipe.Valves(OPENROUTER_API_KEY="k", SERVICE_TIER="bogus")
+_p_st = _pipe_st._prepare_payload({"model": "openai/gpt-4o", "messages": []})
+_assert("service_tier" not in _p_st, "garbage SERVICE_TIER silently ignored")
+
+# ── 28h. v1.6.0 — Cached prompt-token cost breakdown ────────────────────────
+_section("28h. v1.6.0 cached prompt token reporting")
+
+_format_cost_info = mod._format_cost_info
+
+# OpenAI / Anthropic shape: prompt_tokens_details.cached_tokens
+_cost_with_cache = _format_cost_info({
+    "prompt_tokens": 1000,
+    "completion_tokens": 200,
+    "total_tokens": 1200,
+    "prompt_tokens_details": {"cached_tokens": 800},
+    "cost": 0.0030,
+}, "USD")
+_assert("800 cached" in _cost_with_cache, "cached tokens shown in token line")
+_assert("200 prompt" in _cost_with_cache, "non-cached prompt tokens shown (1000-800=200)")
+
+# Alternate shape: cache_read_input_tokens (some Anthropic surfaces)
+_cost_alt = _format_cost_info({
+    "prompt_tokens": 500,
+    "completion_tokens": 100,
+    "cache_read_input_tokens": 400,
+}, "USD")
+_assert("400 cached" in _cost_alt, "cache_read_input_tokens recognised")
+
+# No cache info → original format preserved
+_cost_plain = _format_cost_info({
+    "prompt_tokens": 100, "completion_tokens": 50, "total_tokens": 150
+}, "USD")
+_assert("cached" not in _cost_plain, "no cache field: footer unchanged")
+
+# ── 28i. v1.6.0 — Generation ID footer ──────────────────────────────────────
+_section("28i. v1.6.0 generation id footer")
+
+_format_gen = mod._format_generation_id
+_assert(_format_gen(None) == "", "None → empty string")
+_assert(_format_gen("") == "", "empty → empty string")
+out = _format_gen("gen-abc123")
+_assert("gen-abc123" in out, "generation id appears in footer")
+_assert("`gen-abc123`" in out, "generation id wrapped in backticks for click-to-copy")
+
+# Non-stream response surfaces the id when SHOW_GENERATION_ID=True
+_pipe_gen = Pipe()
+_pipe_gen.valves = Pipe.Valves(OPENROUTER_API_KEY="k", SHOW_GENERATION_ID=True)
+_mock_gen_resp = MagicMock()
+_mock_gen_resp.json.return_value = {
+    "id": "gen-zzz111",
+    "model": "openai/gpt-4o",
+    "choices": [{"message": {"content": "hi", "role": "assistant"}}],
+}
+with patch.object(_pipe_gen, "_retryable_request", return_value=_mock_gen_resp):
+    _out = _pipe_gen._non_stream_response({}, {"model": "openai/gpt-4o"})
+_assert("gen-zzz111" in _out, "non-stream: generation id rendered when SHOW_GENERATION_ID=True")
+
+# Toggled off → no footer
+_pipe_gen.valves = Pipe.Valves(OPENROUTER_API_KEY="k", SHOW_GENERATION_ID=False)
+_mock_gen_resp.json.return_value = {
+    "id": "gen-zzz111",
+    "model": "openai/gpt-4o",
+    "choices": [{"message": {"content": "hi", "role": "assistant"}}],
+}
+with patch.object(_pipe_gen, "_retryable_request", return_value=_mock_gen_resp):
+    _out = _pipe_gen._non_stream_response({}, {"model": "openai/gpt-4o"})
+_assert("gen-zzz111" not in _out, "SHOW_GENERATION_ID=False: footer suppressed")
+
+# ── 28j. v1.6.0 — MODEL_CATEGORY query param ────────────────────────────────
+_section("28j. v1.6.0 MODEL_CATEGORY")
+
+_mock_cat_resp = MagicMock()
+_mock_cat_resp.status_code = 200
+_mock_cat_resp.json.return_value = {"data": [{"id": "openai/gpt-4o", "name": "GPT-4o"}]}
+_mock_cat_resp.raise_for_status = MagicMock()
+
+_captured_params = {}
+
+def _capture_cat(*args, **kwargs):
+    _captured_params.clear()
+    _captured_params.update(kwargs)
+    return _mock_cat_resp
+
+_pipe_cat = Pipe()
+_pipe_cat.valves = Pipe.Valves(OPENROUTER_API_KEY="k", MODEL_CATEGORY="programming")
+_pipe_cat._models_cache = None
+with patch.object(_pipe_cat._session, "get", side_effect=_capture_cat):
+    _pipe_cat.pipes()
+_assert(
+    _captured_params.get("params", {}).get("category") == "programming",
+    "MODEL_CATEGORY: '?category=programming' forwarded to /models",
+)
+
+# Empty category → no category param sent
+_pipe_cat = Pipe()
+_pipe_cat.valves = Pipe.Valves(OPENROUTER_API_KEY="k", MODEL_CATEGORY="")
+_pipe_cat._models_cache = None
+with patch.object(_pipe_cat._session, "get", side_effect=_capture_cat):
+    _pipe_cat.pipes()
+_assert(
+    "category" not in _captured_params.get("params", {}),
+    "empty MODEL_CATEGORY: no category param sent",
+)
+
+# ── 28k. v1.6.0 — Deprecated model tagging ──────────────────────────────────
+_section("28k. v1.6.0 deprecated model handling")
+
+_mock_deprec = {
+    "data": [
+        {"id": "openai/gpt-3.5-turbo", "name": "GPT-3.5", "expiration_date": "2026-09-01"},
+        {"id": "openai/gpt-4o", "name": "GPT-4o"},
+    ]
+}
+_mock_deprec_resp = MagicMock()
+_mock_deprec_resp.status_code = 200
+_mock_deprec_resp.json.return_value = _mock_deprec
+_mock_deprec_resp.raise_for_status = MagicMock()
+
+# Default: deprecated kept and tagged
+_pipe_dep = Pipe()
+_pipe_dep.valves = Pipe.Valves(OPENROUTER_API_KEY="k")
+_pipe_dep._models_cache = None
+with patch.object(_pipe_dep._session, "get", return_value=_mock_deprec_resp):
+    _dep_models = _pipe_dep.pipes()
+_dep_by_id = {m["id"]: m["name"] for m in _dep_models}
+_assert("openai/gpt-3.5-turbo" in _dep_by_id, "deprecated model still listed by default")
+_assert("⚠" in _dep_by_id["openai/gpt-3.5-turbo"], "deprecated model tagged with ⚠ marker")
+_assert("(deprecated)" in _dep_by_id["openai/gpt-3.5-turbo"], "deprecated label appended to name")
+_assert("⚠" not in _dep_by_id["openai/gpt-4o"], "live model untouched")
+
+# HIDE_DEPRECATED_MODELS=True drops them
+_pipe_dep = Pipe()
+_pipe_dep.valves = Pipe.Valves(OPENROUTER_API_KEY="k", HIDE_DEPRECATED_MODELS=True)
+_pipe_dep._models_cache = None
+with patch.object(_pipe_dep._session, "get", return_value=_mock_deprec_resp):
+    _dep_models = _pipe_dep.pipes()
+_dep_ids = {m["id"] for m in _dep_models}
+_assert(_dep_ids == {"openai/gpt-4o"}, "HIDE_DEPRECATED_MODELS=True: deprecated rows removed")
+
+# ── 28l. v1.6.0 — Cache-key invalidates on new filter valves ────────────────
+_section("28l. v1.6.0 cache key includes MODEL_CATEGORY / HIDE_DEPRECATED_MODELS")
+
+_keys_v16 = []
+for v in [
+    {},
+    {"MODEL_CATEGORY": "programming"},
+    {"HIDE_DEPRECATED_MODELS": True},
+]:
+    _p = Pipe()
+    _p.valves = Pipe.Valves(OPENROUTER_API_KEY="k", **v)
+    _keys_v16.append(_p._build_cache_key())
+_assert(len(set(_keys_v16)) == len(_keys_v16), "cache key differs per new v1.6 filter valve")
+
 # ── 29. _non_stream_response() edge cases ────────────────────────────────────
 _section("29. _non_stream_response() edge cases")