feat(rag): decouple embedding provider from active LLM #741
Wiki RAG used to derive its embedding provider only from the active LLM
config. When the active provider was claude_bridge (or anything without
/v1/embeddings), this fell through to BM25-only — unless Vector Search
was configured. On the server, VS was OOM-crashing, so semantic search
was effectively dead across all assistants.
Add a third tier after the active-LLM lookup: scan the cloud_llm_providers
table for any enabled gemini / openai / deepseek / openrouter provider and
use its key for embeddings. This lets chat keep claude_bridge while
embeddings go through Gemini REST (no local model in memory; free tier
allows 1500 req/min).
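Roughly the shape of the new lookup, as a sketch rather than the actual diff: `init_wiki_rag()`, the `cloud_llm_providers` table, and the provider types come from this PR, while `EmbeddingConfig`, `CloudLLMProvider`, and `embedding_config_from_active` are illustrative names.

```python
# Sketch of the three-tier embedding lookup (illustrative names).
from dataclasses import dataclass

EMBEDDING_CAPABLE = ("gemini", "openai", "deepseek", "openrouter")

@dataclass
class EmbeddingConfig:
    provider_type: str
    api_key: str

def embedding_config_from_active(active_provider) -> EmbeddingConfig | None:
    """Tiers 1-2 (stub): derive an embedding config from the active LLM."""
    ...

def pick_embedding_provider(active_provider, session) -> EmbeddingConfig | None:
    # This whole path only runs when Vector Search is not configured
    # (gated on `not vector_search_configured`, per the notes below).
    cfg = embedding_config_from_active(active_provider)
    if cfg is not None:
        return cfg
    # Tier 3 (new): first enabled embedding-capable row in cloud_llm_providers.
    for ptype in EMBEDDING_CAPABLE:
        row = (
            session.query(CloudLLMProvider)  # assumed ORM model for the table
            .filter_by(provider_type=ptype, enabled=True)
            .first()
        )
        if row is not None:
            return EmbeddingConfig(provider_type=ptype, api_key=row.api_key)
    return None  # caller falls back to BM25-only search
```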
Also tighten the OpenAI-compatible branch to a known provider_type list —
the previous `cloud_config.get("base_url")` fallback was too permissive
and would happily hand a claude_bridge base_url to the OpenAI embeddings
client.
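The tightening itself is small; sketched under the same naming assumptions (`make_openai_embeddings_client` is an assumed helper), the point is an explicit provider_type allowlist instead of a base_url presence check:

```python
OPENAI_COMPATIBLE = {"openai", "deepseek", "openrouter"}

# Before (too permissive): any provider with a base_url was routed to the
# OpenAI embeddings client, including claude_bridge.
#   if cloud_config.get("base_url"): ...

# After: only known OpenAI-compatible provider types qualify.
if cloud_config.get("provider_type") in OPENAI_COMPATIBLE:
    client = make_openai_embeddings_client(cloud_config)  # assumed helper
```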
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- New third tier in `init_wiki_rag()` that picks any enabled `gemini`/`openai`/`deepseek`/`openrouter` provider from the `cloud_llm_providers` table and uses its key for embeddings — only when the active LLM didn't already produce one.
- OpenAI-compatible branch restricted to `provider_type ∈ {openai, deepseek, openrouter}` instead of "any provider with a `base_url`" (previously this would have happily handed a `claude_bridge` base_url to the OpenAI embeddings client).

Why
Server runs `LLM_BACKEND=cloud:claude-bridge-…` for chat. `claude_bridge` has no `/v1/embeddings` endpoint, so before this patch Wiki RAG fell through to BM25-only (or to Vector Search, which was OOM-crashing on a 6 GB VPS with the 768-dim mpnet model). Result: zero semantic search for every assistant on the server.

With this patch, chat keeps `claude_bridge` (Sonnet quality) while embeddings go through the existing enabled `gemini-default` provider via REST. No local model in memory, free tier covers our volume.

Test plan
- `ruff check` + `ruff format` clean
- `cloud_provider_service.get_by_type("gemini")` + `get_provider_with_key()` returns the api_key from DB (sketched below)
- `ai-secretary` → grep startup log for `Wiki RAG: cloud embeddings (Gemini from DB: gemini-default)` (only appears if `VECTOR_SEARCH_URL` is unset or empty)
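One possible shape for that second check in a Python shell; only the two method names come from this plan, while the import path and the `get_provider_with_key()` signature are assumptions:

```python
# Assumed import path; only the service object and method names are from the PR.
from app.services.cloud_provider_service import cloud_provider_service

provider = cloud_provider_service.get_by_type("gemini")
assert provider is not None and provider.enabled

# Signature assumed; the point is that the api_key actually comes back from DB.
record = cloud_provider_service.get_provider_with_key(provider.id)
assert record.api_key, "expected api_key loaded from cloud_llm_providers"
```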
Notes

Vector Search itself was fixed separately by switching to a lighter model (`paraphrase-multilingual-MiniLM-L12-v2`, 384-dim, ~470 MB peak vs the previous 5+ GB). When VS is up, it still wins (this code path is gated on `not vector_search_configured`). When it's down or unconfigured, embeddings still work — that's the new behavior.

🤖 Generated with Claude Code