
feat(rag): decouple embedding provider from active LLM #741

Open

ShaerWare wants to merge 1 commit into main from feat/embeddings-decouple-llm

Conversation

@ShaerWare
Owner

Summary

  • Add a third tier in init_wiki_rag() that picks any enabled gemini / openai / deepseek / openrouter provider from the cloud_llm_providers table and uses its key for embeddings — only when the active LLM didn't already produce one.
  • Tighten the OpenAI-compatible branch in the active-LLM tier: only fire for provider_type ∈ {openai, deepseek, openrouter} instead of "any provider with a base_url" (previously this would have happily handed a claude_bridge base_url to the OpenAI embeddings client).
  • Order stays the same: local sentence-transformers → active-LLM cloud → DB-provider cloud → BM25-only (see the sketch after this list).
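
For reference, a minimal sketch of that selection order. The tier structure, provider types, and table name follow this PR's description; the function signature, data shapes, and example values below are illustrative, not the actual init_wiki_rag() code:

```python
from typing import Optional

OPENAI_COMPATIBLE = {"openai", "deepseek", "openrouter"}   # tightened list
EMBEDDING_CAPABLE = {"gemini"} | OPENAI_COMPATIBLE         # scanned in the DB tier

def pick_embedding_provider(
    local_model_available: bool,
    active_llm: Optional[dict],          # e.g. {"provider_type": "claude_bridge"}
    db_providers: list[dict],            # rows from cloud_llm_providers
) -> str:
    # Tier 1: local sentence-transformers model.
    if local_model_available:
        return "local"
    # Tier 2: reuse the active LLM's credentials, but only for provider types
    # known to expose an embeddings endpoint (no generic base_url fallback).
    if active_llm and active_llm["provider_type"] in EMBEDDING_CAPABLE:
        return f"active-llm:{active_llm['provider_type']}"
    # Tier 3 (new): any enabled embedding-capable row from cloud_llm_providers.
    for row in db_providers:
        if row.get("enabled") and row["provider_type"] in EMBEDDING_CAPABLE:
            return f"db:{row['name']}"
    # Tier 4: no embeddings available; retrieval is BM25-only.
    return "bm25-only"

# Example: chat stays on claude_bridge, embeddings come from gemini-default.
print(pick_embedding_provider(
    local_model_available=False,
    active_llm={"provider_type": "claude_bridge"},
    db_providers=[{"provider_type": "gemini", "name": "gemini-default", "enabled": True}],
))  # -> "db:gemini-default"
```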

Why

Server runs LLM_BACKEND=cloud:claude-bridge-… for chat. claude_bridge has no /v1/embeddings endpoint, so before this patch Wiki RAG fell through to BM25-only (or to Vector Search, which was OOM-crashing on a 6 GB VPS with the 768-dim mpnet model). Result: zero semantic search for every assistant on the server.

With this patch, chat keeps claude_bridge (Sonnet quality) while embeddings go through the existing enabled gemini-default provider via REST. No local model in memory, free tier covers our volume.
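
To illustrate what "embeddings via REST" can look like in this setup, a hedged sketch against Google's public embedContent endpoint. The model name (text-embedding-004) and request shape come from Google's API documentation, not from this repo's code, and the helper below is hypothetical:

```python
import requests

def gemini_embed(text: str, api_key: str) -> list[float]:
    # Google Generative Language API, embedContent method; key taken from the
    # enabled gemini provider row in cloud_llm_providers.
    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        "models/text-embedding-004:embedContent"
    )
    resp = requests.post(
        url,
        params={"key": api_key},
        json={"content": {"parts": [{"text": text}]}},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]["values"]  # 768-dim vector for this model
```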

Test plan

  • ruff check + ruff format clean
  • Verified locally: cloud_provider_service.get_by_type("gemini") + get_provider_with_key() return the api_key from the DB
  • Server pull → restart ai-secretary → grep the startup log for "Wiki RAG: cloud embeddings (Gemini from DB: gemini-default)" (this line only appears if VECTOR_SEARCH_URL is unset or empty)
  • Smoke-test a chat query against a collection — should return ranked semantic matches, not just BM25 hits

Notes

  • This patch is the safety net. On the server we also re-enabled Vector Search with a lighter model (paraphrase-multilingual-MiniLM-L12-v2, 384-dim, ~470 MB peak vs the previous 5+ GB). When VS is up, it still wins, since this code path is gated on not vector_search_configured. When VS is down or unconfigured, embeddings still work; that is the new behavior (a minimal sketch of the gate follows).
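
For context, the gate works roughly like this. The flag name mirrors what the note above describes and the env var comes from the test plan; the surrounding structure is assumed, not copied from the repo:

```python
import os

# Hypothetical sketch: Vector Search counts as configured when its URL is set.
vector_search_configured = bool(os.getenv("VECTOR_SEARCH_URL"))
if not vector_search_configured:
    # Only then do we walk the embedding tiers sketched in the Summary.
    ...
```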

🤖 Generated with Claude Code

Commit message

Wiki RAG used to derive its embedding provider only from the active LLM
config. When the active provider was claude_bridge (or anything without
/v1/embeddings), this fell through to BM25-only — unless Vector Search
was configured. On the server VS was OOM-crashing, so semantic search
was effectively dead across all assistants.

Add a third tier after the active-LLM lookup: scan the cloud_llm_providers
table for any enabled gemini / openai / deepseek / openrouter provider and
use its key for embeddings. This lets chat keep claude_bridge while embeddings
go through Gemini REST (no local model in memory, free tier 1500 req/min).

Also tighten the OpenAI-compatible branch to a known provider_type list —
the previous `cloud_config.get("base_url")` fallback was too permissive
and would happily hand a claude_bridge base_url to the OpenAI embeddings
client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
