Feat/OpenAI embeddings by jonesj38 · Pull Request #689 · tobi/qmd

jonesj38 · 2026-05-28T18:42:52Z

Re-opening of previous pull request #116 and a solution to issue #620

Summary

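Optional OpenAI integration for embeddings, query expansion, and reranking. Much faster for users who prefer API-based inference over local models: the commits in this PR report search at 3-5s (was 30-60s) and a full embed at ~10min (was 2hrs).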

Performance

Features

• OpenAI Embeddings — text-embedding-3-small (1536 dims), native batch API, ~$0.02/1M tokens
• OpenAI Query Expansion — gpt-4o-mini for lex/vec/hyde variants
• OpenAI Reranking — API-based reranking replaces local qwen3-reranker, eliminating model download and GGUF inference overhead
• Tiktoken chunking — eliminates model load time for tokenization
• Robust retry logic — exponential backoff with jitter for rate limits

Usage

export OPENAI_API_KEY="sk-..."
export QMD_OPENAI=1
qmd embed -f  # Re-embed with OpenAI
qmd search "query"

Design

• Opt-in — local models remain the default
• Graceful fallback — errors don't crash, just skip
• Replace local reranking with OpenAI — no GGUF model download or local inference needed
• No breaking changes — existing workflows unchanged

Files Changed

• src/openai-llm.ts — new OpenAI LLM implementation
• src/llm.ts — embedding config, provider switching
• src/store.ts — tiktoken chunking integration
• src/qmd.ts — QMD_OPENAI env var support

Dependencies

• openai — API client
• tiktoken — fast BPE tokenization

Adds support for using OpenAI's text-embedding-3-small model as an alternative to local llama-cpp embeddings.

Changes:
- New openai-llm.ts: OpenAI API client implementing LLM interface
- llm.ts: Embedding config management, getDefaultEmbeddingLLM()
- collections.ts: EmbeddingProviderConfig for YAML config schema
- store.ts: Use configurable embedding LLM, skip local model for query expansion/rerank when using OpenAI
- qmd.ts: Load embedding config on startup
- package.json: Add openai dependency
- README.md: Documentation for OpenAI embeddings

Configuration (in ~/.config/qmd/index.yml):

    embedding:
      provider: openai
      openai:
        api_key: sk-...                # Optional, falls back to OPENAI_API_KEY env
        model: text-embedding-3-small  # Optional, this is the default

Benefits:
- Much faster embedding (~10x vs local models on CPU)
- No GPU/VRAM requirements
- More reliable (no local model loading issues)
- Cost: ~$0.02 per 1M tokens

- OpenAI embeddings (text-embedding-3-small, 1536d) via QMD_OPENAI=1
- Query expansion with gpt-4o-mini (~200ms vs 30s local)
- Tiktoken for fast tokenization (no model loading)
- Exponential backoff with jitter for rate limits (429)
- Inter-batch delay (150ms) to avoid hitting RPM limits
- Performance: search 3-5s (was 30-60s), embed ~10min (was 2hrs)

Files: openai-llm.ts, llm.ts, store.ts, qmd.ts
Deps: openai, tiktoken
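The retry behaviour listed above (exponential backoff with jitter on 429 responses) could look roughly like the sketch below. The helper names `backoffDelayMs` and `withRetry`, the 500 ms base, the 30 s cap, and the "full jitter" variant are all assumptions made for illustration; the PR's actual code in openai-llm.ts may differ.

```typescript
// Hypothetical helpers; not taken from the PR's openai-llm.ts.

/** Delay before retry `attempt` (0-based): exponential growth, capped, with full jitter. */
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly in [0, ceiling) so concurrent clients do not retry in lockstep.
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Run `fn`, retrying only on HTTP 429, for at most `maxRetries` extra attempts. */
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Assumes the thrown error carries a numeric `status` field.
      const status = (err as { status?: number }).status;
      if (status !== 429 || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Passing `random` as a parameter keeps the delay calculation deterministic under test. The fixed 150ms inter-batch delay mentioned above would be a separate `sleep(150)` between batches rather than part of this retry path.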

Replace the rerank() stub with a real listwise reranker using gpt-4o-mini.

- Sends top candidates with query to gpt-4o-mini as a ranking task
- Parses comma-separated index output, handles missing/duplicate indices
- Skips API call for ≤2 documents (not worth the latency)
- Falls back to original order on API failure
- Cost: ~$0.001 per rerank call
- Updated qmd.ts to route through OpenAI reranker instead of skipping

The full qmd query pipeline with OpenAI now:

1. Query expansion (gpt-4o-mini)
2. BM25 + vector search (parallel)
3. RRF fusion
4. Cross-encoder reranking (gpt-4o-mini) ← NEW
5. Position-aware blending
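A minimal sketch of the index-parsing and fallback behaviour this commit describes. It assumes the model replies with 0-based indices; `parseRanking` and the `askModel` callback are hypothetical names, and the prompt format is not shown in the PR text, so it is left out here.

```typescript
/**
 * Turn a model reply such as "2, 0, 2, 7" into a full permutation of [0, n).
 * Non-numeric, out-of-range, and duplicate indices are dropped; indices the
 * model omitted are appended in their original order so no candidate is lost.
 */
function parseRanking(reply: string, n: number): number[] {
  const seen = new Set<number>();
  const order: number[] = [];
  for (const token of reply.split(",")) {
    const trimmed = token.trim();
    if (!/^\d+$/.test(trimmed)) continue;
    const idx = Number(trimmed);
    if (idx >= n || seen.has(idx)) continue;
    seen.add(idx);
    order.push(idx);
  }
  for (let i = 0; i < n; i++) {
    if (!seen.has(i)) order.push(i);
  }
  return order;
}

/** Listwise rerank; `askModel` is a hypothetical wrapper around the chat call. */
async function rerankDocs<T>(
  docs: T[],
  askModel: (docs: T[]) => Promise<string>,
): Promise<T[]> {
  if (docs.length <= 2) return docs; // not worth an API round trip
  try {
    const reply = await askModel(docs);
    return parseRanking(reply, docs.length).map((i) => docs[i]);
  } catch {
    return docs; // fall back to the original order on API failure
  }
}
```

Appending the omitted indices means a partial or malformed reply degrades towards the original order instead of dropping results.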

Accept comma-separated collection names in -c flag for cross-collection search. All three search modes (search, vsearch, query) now support querying multiple collections simultaneously.

Changes:
- resolveCollectionFilter() helper parses and validates comma-separated names
- searchFTS() accepts string | string[] for collection filtering
- searchVec() accepts string | string[] for collection filtering
- SQL uses IN clause for multi-collection filtering
- Updated interface types and test for new parameter types

Usage:

    qmd search 'auth' -c repo-a,repo-b
    qmd vsearch 'auth patterns' -c docs,examples
    qmd query 'OAuth implementation' -c project,patterns,docs

This enables Shad's multi-vault search to pass all vault collections in a single qmd call instead of running separate searches per collection.
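One way the parsing and the IN clause could fit together is sketched below. `resolveCollectionFilter` is named in the commit, but its signature here is a guess; `collectionClause` and the `collection` column name are hypothetical.

```typescript
/** Split "a, b,,a" into ["a", "b"], rejecting names that are not in `known`. */
function resolveCollectionFilter(raw: string | undefined, known: string[]): string[] {
  if (!raw) return [];
  const names = Array.from(
    new Set(raw.split(",").map((s) => s.trim()).filter((s) => s.length > 0)),
  );
  const knownSet = new Set(known);
  const unknown = names.filter((name) => !knownSet.has(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown collection(s): ${unknown.join(", ")}`);
  }
  return names;
}

/** Build a parameterized IN clause; values are bound as parameters, never interpolated. */
function collectionClause(
  filter: string | string[] | undefined,
  column = "collection", // hypothetical column name
): { sql: string; params: string[] } {
  const names = filter === undefined ? [] : Array.isArray(filter) ? filter : [filter];
  if (names.length === 0) return { sql: "", params: [] };
  const placeholders = names.map(() => "?").join(", ");
  return { sql: ` AND ${column} IN (${placeholders})`, params: names };
}
```

Accepting both `string` and `string[]` keeps single-collection callers working unchanged, which matches the `string | string[]` parameter type described above.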

Add support for separate OpenAI-compatible servers for embeddings vs chat (expansion/reranking). Common in setups where local GPU serves embeddings and cloud handles chat. Implements Kaspre's split-URL pattern from PR tobi#116 discussion.

- Add chat_base_url and chat_api_key to YAML config and OpenAIConfig
- Add QMD_OPENAI_* env var prefix (QMD_OPENAI_BASE_URL, QMD_OPENAI_API_KEY, QMD_OPENAI_CHAT_BASE_URL, QMD_OPENAI_CHAT_API_KEY) per alexleach's suggestion
- Wire expansion_model and base_url through YAML config per viniciushsantana's feedback
- Route expandQuery() and rerank() through chatClient, embed()/embedBatch() through embedding client
- Fix upstream rebase issues (Database.transaction type, collectionName rename)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@Kaspre

…ifferent models can be used for each. Thanks to @Kaspre for their comment

    embedding:
      provider: openai
      openai:
        api_key: "sk-..."
        base_url: "http://localhost:8081/v1"           # embeddings
        model: "nomic-embed-text"
        chat_base_url: "https://ollama.com/v1"         # expansion (falls back to base_url)
        chat_api_key: "..."                            # (falls back to api_key)
        expansion_model: "gemma3:4b"
        rerank_base_url: "https://api.cohere.com/v1"   # reranking (falls back to chat_base_url)
        rerank_api_key: "..."                          # (falls back to chat_api_key)
        rerank_model: "rerank-v3"                      # (falls back to expansion_model)

also rebased onto main
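The fallback chain in the comments above (rerank settings fall back to chat settings, which fall back to the embedding settings) can be resolved in one place. This is a sketch only: the config field names come from the YAML above, while `Endpoint` and `resolveEndpoints` are hypothetical names.

```typescript
// Field names mirror the YAML keys above; all are optional.
interface OpenAIConfig {
  api_key?: string;
  base_url?: string;
  model?: string;
  chat_base_url?: string;
  chat_api_key?: string;
  expansion_model?: string;
  rerank_base_url?: string;
  rerank_api_key?: string;
  rerank_model?: string;
}

// Hypothetical shape for one resolved server target.
interface Endpoint {
  baseURL?: string;
  apiKey?: string;
  model?: string;
}

/** Apply the documented fallbacks: rerank -> chat -> embedding. */
function resolveEndpoints(cfg: OpenAIConfig): { embed: Endpoint; chat: Endpoint; rerank: Endpoint } {
  const embed: Endpoint = { baseURL: cfg.base_url, apiKey: cfg.api_key, model: cfg.model };
  const chat: Endpoint = {
    baseURL: cfg.chat_base_url ?? embed.baseURL,
    apiKey: cfg.chat_api_key ?? embed.apiKey,
    model: cfg.expansion_model,
  };
  const rerank: Endpoint = {
    baseURL: cfg.rerank_base_url ?? chat.baseURL,
    apiKey: cfg.rerank_api_key ?? chat.apiKey,
    model: cfg.rerank_model ?? chat.model,
  };
  return { embed, chat, rerank };
}
```

With only `base_url` and `api_key` set, all three endpoints point at the same server, so single-server setups need no extra configuration.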

@alexleach

…ename, embed fix

Changes based on PR comments:

1. Configurable base_url for OpenAI-compatible APIs (Ollama, vLLM, Azure)
   - collections.ts: EmbeddingProviderConfig already has base_url field
   - qmd.ts: now passes base_url and expansion_model from YAML to setEmbeddingConfig
   - openai-llm.ts: constructor accepts baseURL config
2. Env var rename: QMD_OPENAI_API_KEY takes priority over OPENAI_API_KEY
   - Avoids conflict with official openai-node SDK (per @alexleach)
   - Falls back to OPENAI_API_KEY for backwards compatibility
3. generateEmbeddings bypasses LlamaCpp when using OpenAI (per @viniciushsantana)
   - OpenAI path calls API directly, no local model session needed
   - Refactored to shared runEmbedding() with pluggable embed/embedBatch fns
4. expandQuery now actually calls OpenAI for query expansion
   - Was previously returning lex-only fallback when isUsingOpenAI()
   - Now uses gpt-4o-mini via openaiLLM.expandQuery()
5. README updated with base_url, expansion_model docs

Addresses: @alexleach (env naming, base_url), @viniciushsantana (embed fix, expansion_model, base_url YAML wiring)
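The key lookup order from item 2 can be expressed as a small pure function. The precedence of QMD_OPENAI_API_KEY over OPENAI_API_KEY is stated in the commit; placing the YAML `api_key` ahead of both is an assumption based on the earlier note that the YAML value "falls back to OPENAI_API_KEY env". `resolveApiKey` is a hypothetical name.

```typescript
type Env = Record<string, string | undefined>;

/**
 * Resolve the API key. Assumed order: YAML api_key, then QMD_OPENAI_API_KEY,
 * then OPENAI_API_KEY (kept for backwards compatibility).
 * Empty strings are treated as unset.
 */
function resolveApiKey(env: Env, yamlKey?: string): string | undefined {
  return yamlKey || env.QMD_OPENAI_API_KEY || env.OPENAI_API_KEY || undefined;
}
```

In the CLI this would typically be called as `resolveApiKey(process.env, config.api_key)`; taking the environment as a parameter keeps the function easy to test.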

…ments

- Lazy-load node-llama-cpp to skip native compilation in OpenAI mode
- Add tiktoken-based input truncation (QMD_OPENAI_MAX_INPUT_TOKENS)
- QMD_OPENAI_BASE_URL auto-activates OpenAI mode (no QMD_OPENAI=1 needed)
- Skip LlamaCpp init in qmd status when using OpenAI
- Restore terminal cursor on embed error (try/finally)
- Bypass withLLMSession in vectorSearch/querySearch for OpenAI mode

Co-authored-by: ALB.Leach <alexleach@users.noreply.github.com>
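The auto-activation rule above amounts to a two-condition check. The sketch below assumes that OpenAI mode is on when QMD_OPENAI is exactly "1" or when QMD_OPENAI_BASE_URL is non-empty; `isOpenAIMode` is a hypothetical name, and the real check may also consult the YAML config.

```typescript
type EnvVars = Record<string, string | undefined>;

/** True when OpenAI mode is requested explicitly or implied by a custom base URL. */
function isOpenAIMode(env: EnvVars): boolean {
  return env.QMD_OPENAI === "1" || Boolean(env.QMD_OPENAI_BASE_URL);
}
```

A caller can use this result to decide whether to skip loading node-llama-cpp at all, which is what makes the lazy-load in the first bullet worthwhile.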

socket-security · 2026-05-28T18:43:21Z

Review the following changes in direct dependencies.

Package
npm/tiktoken@1.0.22
npm/openai@4.104.0

Shrub24 · 2026-05-29T18:04:45Z

How does this compare to #446 and #619?

jonesj38 and others added 13 commits April 11, 2026 19:23

fix: use default embedding LLM for hybrid vector queries

ce3b061

Merge branch 'main' into feat/openai-embeddings

6668fb7

fix: apply OpenAI embedding config in SDK mode

8eab131

fix: close showStatus local model branch

6d06cf8

Merge upstream main into OpenAI embeddings branch

f9077e5

jonesj38 wants to merge 13 commits into tobi:main from jonesj38:feat/openai-embeddings


2 participants
