feat: add OpenRouter as LLM and embedding provider #56
oglenyaboss wants to merge 4 commits into repowise-dev:main from
Conversation
Add OpenRouter as a first-class provider, enabling access to 200+ models (Claude, GPT, Gemini, Llama, Qwen, etc.) through a single API key.

LLM provider:
- New OpenRouterProvider using the OpenAI-compatible endpoint
- Supports generate() and stream_chat() (ChatProvider protocol)
- Sets recommended HTTP-Referer and X-Title headers
- Default model: anthropic/claude-sonnet-4.6
- Rate limits: 60 RPM / 200K TPM
- Cost tracking intentionally disabled (OpenRouter proxies models with varying prices; users should check the OpenRouter dashboard)

Embedding provider:
- New OpenRouterEmbedder for vector search and chat
- Default model: google/gemini-embedding-001 (768 dims)
- One OPENROUTER_API_KEY covers both LLM and embeddings

Integration:
- Registered in LLM and embedding registries (lazy import)
- CLI auto-detection from OPENROUTER_API_KEY env var
- Interactive provider selection in `repowise init`
- Embedder selection in `repowise serve`
- Server provider catalog for web UI
- No new pip dependency (uses existing openai package)

Tests:
- 13 unit tests (construction, generation, error mapping, headers)
- Registry test updated (builtin count 6 → 7)
- Integration test (skipped without OPENROUTER_API_KEY)
This is really thorough, probably the most complete provider PR we've gotten. LLM + embedder + CLI + server catalog all done, and the honest known limitations section is appreciated. One thing to fix before merge:
Also flagging for awareness (not blocking):
Should be a quick fix, happy to merge after (1).
Remove the cost_tracker parameter and unreachable if-block from generate(). OpenRouter proxies 200+ models with varying prices, so cost tracking is documented as unsupported — users should check the OpenRouter dashboard instead.
- stream_chat: text deltas, tool calls, rate limit error (3 tests)
- OpenRouterEmbedder: construction, dimensions, embedding, base URL (12 tests)
Thanks for the review!
swati510
left a comment
Nice clean provider addition, follows the existing shape well. A few things worth looking at before merge:
- Silent dimension corruption risk in OpenRouterEmbedder: _DIMS falls back to 768 for any model not in the dict, but the actual model might output 1024 or 3072 dims. If a user picks e.g. cohere/embed-english-v3, they'll get 768-dim stored vectors that don't match the model's real output, leaving the vector store silently corrupted. Safer to raise ValueError("unknown embedding model %s, add to _DIMS") instead of falling back.
- OpenAI param compat: you're using max_completion_tokens, which is the newer OpenAI SDK name. OpenRouter proxies 200+ models and not all of them accept that param (some only take max_tokens). Worth smoke-testing against a few of the listed defaults (anthropic/claude-sonnet-4.6, google/gemini-*, meta-llama/*) to confirm it works, and falling back to max_tokens if not.
- The OpenRouterProvider constructor accepts **_kwargs: Any and silently drops them. If the registry later starts passing e.g. cost_tracker=..., rate_limiter=..., or tier=... (now that minimax/zai use tier), those will just vanish. Drop the **_kwargs catchall or convert it to explicit params like every other provider.
Embedder dimensions fallback is the one that could bite users badly, the other two are mostly hygiene.
```python
@property
def dimensions(self) -> int:
    return self._DIMS.get(self._model, 768)
```
Silent 768-dim fallback for unknown models is a footgun. If a user picks a model with different real dimensions (e.g. 1024), stored vectors won't match the model output and the vector store is corrupted without any error. Raise ValueError here instead, force users to add new models to _DIMS explicitly.
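A minimal sketch of the suggested fix, moving the check into the constructor. The 768 value for google/gemini-embedding-001 comes from the PR description; everything else here is illustrative, not the PR's actual code:

```python
class OpenRouterEmbedder:
    # Known embedding models and their output dimensions.
    # google/gemini-embedding-001 -> 768 dims per the PR description.
    _DIMS = {"google/gemini-embedding-001": 768}

    def __init__(self, model: str = "google/gemini-embedding-001") -> None:
        if model not in self._DIMS:
            # Fail loudly instead of silently mis-sizing stored vectors.
            raise ValueError(f"unknown embedding model {model!r}, add it to _DIMS")
        self._model = model

    @property
    def dimensions(self) -> int:
        return self._DIMS[self._model]
```

Raising at construction time means a misconfigured embedder never writes a single vector, rather than corrupting the store on first use.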
```python
rate_limiter: RateLimiter | None = None,
http_referer: str | None = None,
app_title: str = "repowise",
**_kwargs: Any,
```
Silently swallowing unknown kwargs means any future change to how the registry constructs providers (e.g. passing cost_tracker, rate_limiter, tier) will fail silently for this provider. Drop the catchall and declare the params you accept explicitly.
- Embedder: raise ValueError in __init__ for unknown models instead of silently falling back to 768 dims, which would mis-size stored vectors against the model's real output and corrupt the vector store.
- Provider: drop **_kwargs catchall and accept cost_tracker explicitly so unknown kwargs from future registry changes fail loudly instead of vanishing.
- Provider: switch max_completion_tokens → max_tokens in generate() and stream_chat(). Per the OpenRouter API docs, max_tokens is the universal parameter across the 200+ proxied models; max_completion_tokens is an OpenAI-specific newer name not all proxied models accept.
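The parameter switch can be illustrated with a hypothetical request builder (`build_chat_request` is an illustrative name; the actual PR code calls the openai client directly):

```python
def build_chat_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible chat payload for OpenRouter."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # max_tokens is the universal parameter across OpenRouter's
        # proxied models; max_completion_tokens is a newer
        # OpenAI-specific name that not all of them accept.
        "max_tokens": max_tokens,
    }
```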
|
Thanks for the catches! Fixed all three in 1f9f67a:
1. Silent dimension fallback → ValueError. Moved the check to
2.
3. Tests updated: replaced the unknown-model-defaults test with a construction-failure test, switched the kwargs assertion to
Summary
- One `OPENROUTER_API_KEY` covers both LLM and embeddings
- `repowise init` (doc generation) and `repowise serve` (chat/search) work with OpenRouter out of the box
- No new pip dependency: reuses the existing `openai` package (OpenAI-compatible API)

LLM Provider

- `OpenRouterProvider` with `generate()` + `stream_chat()` support
- Default model: `anthropic/claude-sonnet-4.6`
- Sets `HTTP-Referer` and `X-Title` headers for OpenRouter dashboard tracking

Embedding Provider

- `OpenRouterEmbedder` for semantic search and chat RAG
- Default model: `google/gemini-embedding-001` (768 dims)

Integration Points

- CLI auto-detection from `OPENROUTER_API_KEY` env var
- Interactive provider selection in `repowise init` and `repowise serve`

Known Limitations

- `/api/v1/models` endpoint.
- `qwen/qwen3.6-plus` via OpenRouter. Chat and embedding functionality was not integration-tested but follows the same OpenAI-compatible patterns as the existing OpenAI provider.

Test plan

- Integration test (skipped without `OPENROUTER_API_KEY`)
- `repowise init` with OpenRouter on a real project (328 wiki pages generated successfully)