Commit 08dd8d6

pluginmd and claude committed
feat: polymorphic provider backend, observability, and secrets hygiene
feat: polymorphic provider backend, observability, and secrets hygiene

This commit is the full enhancement set on top of upstream tobi/qmd:

Provider abstraction
- LlamaCpp constructor now dispatches embed/rerank between local node-llama-cpp and Jina AI via env vars or per-index YAML URIs.
- Generation (query expansion) stays local for latency reasons.
- Jina embeddings: jina-embeddings-v3 (1024d, 8192 ctx, Matryoshka).
- Jina reranking: jina-reranker-v2-base-multilingual.
- Backwards-compatible: no env var set == identical upstream behaviour.

Observability layer
- New append-only jina_usage SQLite table + getUsageSnapshot helper.
- qmd usage command with text/JSON/CSV/ASCII histogram outputs.
- qmd usage chart renders a daily histogram with unicode block bars.
- Quota warnings (ok/warn/critical/over) driven by QMD_JINA_QUOTA.
- qmd status shows a compact quota summary when a remote provider is active.
- Stable qmd.usage.v1 JSON schema for CI gates and scripting.

Benchmarking
- qmd bench jina measures embed_single, embed_batch, and rerank stages against local and/or Jina backends with deterministic synthetic workloads.
- --runs N flattens samples across runs for median/mean/stddev/p95.
- High-variance stages (stddev > 20% of median) are highlighted in yellow.
- Side-by-side comparison table with speedup ratio and winner.
- Stable qmd.bench.jina.v1 JSON output.

Secrets hygiene
- .env auto-load via zero-dep src/dotenv.ts (180 lines, 18 tests).
- .env.example template documenting all env vars.
- .gitignore excludes .env, .env.*, *.key, *.pem, secrets/.
- scripts/scan-secrets.sh detects leaked keys for Jina, OpenAI, Anthropic, Voyage, Cohere, GitHub, AWS, and PEM blocks.
- Scanner reports file:line without re-leaking the key value.
- scripts/pre-commit hook invokes the scanner on staged files.
- scripts/install-hooks.sh installs both pre-commit and pre-push.

Tests
- +68 new tests (50 jina + 18 dotenv), 467 total, 0 regressions.
- All Jina tests mock fetch: zero real API calls during test runs.
- Clean typecheck on tsconfig.build.json.
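The Benchmarking notes above describe flattening samples from `--runs N` and flagging any stage whose standard deviation exceeds 20% of its median. A minimal TypeScript sketch of that summary step follows. `summarize` and `StageStats` are hypothetical names, and using the population standard deviation and a nearest-rank p95 is an assumption; `qmd bench jina` may use different estimators.

```typescript
// Hypothetical summary helper for one benchmark stage (latencies in ms).
// Assumptions: population standard deviation and nearest-rank p95.
interface StageStats {
  n: number;
  median: number;
  mean: number;
  stddev: number;
  p95: number;
  min: number;
  max: number;
  highVariance: boolean; // stddev > 20% of median
}

function nearestRank(sorted: number[], pct: number): number {
  // Smallest sample such that at least pct% of samples are <= it.
  const rank = Math.ceil((pct / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(samples: number[]): StageStats {
  if (samples.length === 0) throw new Error("summarize: no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const n = sorted.length;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  const stddev = Math.sqrt(variance);
  return {
    n,
    median,
    mean,
    stddev,
    p95: nearestRank(sorted, 95),
    min: sorted[0],
    max: sorted[n - 1],
    highVariance: stddev > 0.2 * median,
  };
}
```

With `--runs 3`, the per-run sample arrays for a stage would be concatenated (for example `summarize([...run1, ...run2, ...run3])`). A single noisy run then widens the stddev instead of being averaged away, which is what makes the 20% flag useful.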
Documentation
- README.md rewritten in English with upstream attribution and a clear "what this fork adds" delta table near the top.
- docs/01-architecture-changes.md design rationale (569 lines) covering the six substantive shifts, what deliberately did not change, architectural deltas, and open questions.
- CHANGELOG.md Unreleased section documents every shift.

Upstream compatibility: additive, never subtractive. Every new capability is opt-in. If you set no new env vars and touch no new YAML fields, QMD behaves identically to upstream tobi/qmd.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
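The compatibility rule above says that with no new env vars and no `jina:` YAML entries, QMD behaves like upstream. A small resolver can illustrate this. This is a sketch, not the fork's `LlamaCpp` constructor. `resolveBackend` and `jinaModelFor` are hypothetical, and letting a per-index YAML pin take precedence over the env var is an assumption. The variable names and model defaults come from the `.env.example` and CHANGELOG in this commit.

```typescript
// Hypothetical sketch of opt-in provider selection (not the fork's code).
// Assumption: a per-index YAML pin takes precedence over the env vars.
type Role = "embed" | "rerank";

type Backend =
  | { kind: "local" } // node-llama-cpp GGUF model, the upstream default
  | { kind: "jina"; model: string; apiKey: string };

type Env = Record<string, string | undefined>;

const JINA_DEFAULTS: Record<Role, string> = {
  embed: "jina-embeddings-v3",
  rerank: "jina-reranker-v2-base-multilingual",
};

// Returns the Jina model to use, or undefined when the role stays local.
function jinaModelFor(role: Role, env: Env, yamlModel?: string): string | undefined {
  if (yamlModel !== undefined) {
    // "jina:<model>" selects Jina; any other URI (e.g. "hf:...") stays local.
    if (!yamlModel.startsWith("jina:")) return undefined;
    return yamlModel.slice("jina:".length) || JINA_DEFAULTS[role];
  }
  const provider = role === "embed" ? env.QMD_EMBED_PROVIDER : env.QMD_RERANK_PROVIDER;
  if (provider !== "jina") return undefined; // nothing set: upstream behaviour
  const override = role === "embed" ? env.QMD_JINA_MODEL : env.QMD_JINA_RERANK_MODEL;
  return override || JINA_DEFAULTS[role];
}

function resolveBackend(role: Role, env: Env, yamlModel?: string): Backend {
  const model = jinaModelFor(role, env, yamlModel);
  if (model === undefined) return { kind: "local" };
  const apiKey = env.JINA_API_KEY;
  if (!apiKey) {
    // Fail loudly instead of silently falling back to a local model.
    throw new Error(`JINA_API_KEY is required when ${role} uses the Jina provider`);
  }
  return { kind: "jina", model, apiKey };
}
```

Because query expansion stays local, the sketch models only `embed` and `rerank`, and each role is resolved independently. That matches enabling `QMD_EMBED_PROVIDER` and `QMD_RERANK_PROVIDER` separately.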
1 parent c2f3a40 commit 08dd8d6

17 files changed

Lines changed: 5659 additions & 430 deletions

.env.example

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+# QMD environment configuration template
+#
+# HOW TO USE:
+# 1. Copy this file: cp .env.example .env
+# 2. Edit .env with your real values
+# 3. Source it before running qmd: `set -a && source .env && set +a`
+# (or use direnv / dotenv-cli / your shell's env loader)
+#
+# The .env file is GITIGNORED — safe to put real secrets in.
+# NEVER edit this .env.example file with real values.
+#
+# If you accidentally commit a key: rotate it IMMEDIATELY at the provider.
+
+# =============================================================================
+# Jina AI remote providers (optional)
+# =============================================================================
+
+# Activate Jina as the embedding backend.
+# When set, embed()/embedBatch() route to the Jina cloud API instead of a local
+# GGUF model. Local generation and reranking still work.
+# QMD_EMBED_PROVIDER=jina
+
+# Activate Jina as the reranker backend.
+# QMD_RERANK_PROVIDER=jina
+
+# Jina API key — required when either provider above is set.
+# Get one at https://jina.ai/
+# JINA_API_KEY=jina_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
+# --- Optional Jina overrides ---
+
+# Embedding model (default: jina-embeddings-v3)
+# QMD_JINA_MODEL=jina-embeddings-v3
+
+# Output dimensions. v3 supports Matryoshka truncation: 32/64/128/256/512/1024
+# (default: 1024)
+# QMD_JINA_DIMENSION=1024
+
+# Reranker model (default: jina-reranker-v2-base-multilingual)
+# QMD_JINA_RERANK_MODEL=jina-reranker-v2-base-multilingual
+
+# API base URL — override for proxies or self-hosted Jina instances
+# QMD_JINA_BASE_URL=https://api.jina.ai/v1
+
+# Max inputs per HTTP request (default: 128; Jina accepts up to 2048)
+# QMD_JINA_BATCH=128
+
+# Parallel HTTP requests when a batch is split across multiple calls (default: 4)
+# QMD_JINA_CONCURRENCY=4
+
+# Per-request timeout in milliseconds (default: 60000)
+# QMD_JINA_TIMEOUT_MS=60000
+
+# Retry attempts on 429/5xx/network errors (default: 4)
+# QMD_JINA_MAX_RETRIES=4
+
+# =============================================================================
+# Quota tracking (optional)
+# =============================================================================
+
+# Set to your monthly token quota to enable quota warnings in `qmd status`
+# and `qmd usage`. Accepts plain digits or K/M/B suffixes.
+# QMD_JINA_QUOTA=1B
+
+# Rolling window used to evaluate the quota (24h | 7d | 30d | all)
+# Default: 30d (matches typical monthly billing cycles)
+# QMD_JINA_QUOTA_WINDOW=30d
+
+# Warn threshold as percent of quota (default: 80)
+# QMD_JINA_WARN_PCT=80
+
+# =============================================================================
+# Local model overrides (no remote provider required)
+# =============================================================================
+
+# Override the local embedding model (HuggingFace URI)
+# QMD_EMBED_MODEL=hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf
+
+# Override the local rerank / generation models
+# QMD_RERANK_MODEL=hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf
+# QMD_GENERATE_MODEL=hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf
+
+# Local embed context window (default: 2048)
+# QMD_EMBED_CONTEXT_SIZE=2048
+
+# Force CPU even if GPU is available (for debugging)
+# QMD_LLAMA_GPU=false
+
+# Model cache directory (default: ~/.cache/qmd/models)
+# XDG_CACHE_HOME=~/.cache
+
+# =============================================================================
+# Misc
+# =============================================================================
+
+# Editor URI template for clickable terminal links
+# QMD_EDITOR_URI=vscode://file/{path}:{line}:{col}
+
+# Disable coloured output
+# NO_COLOR=1
+
+# Disable all LLM operations (for CI)
+# CI=true
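The retry settings above (`QMD_JINA_MAX_RETRIES`, `QMD_JINA_TIMEOUT_MS`) and the changelog's mention of exponential backoff with jitter and `Retry-After` support suggest a wrapper along these lines. This is a hedged sketch that assumes Node 18+ (global `fetch`, `Response`, `AbortController`) and the "full jitter" strategy. The base and cap delays and the `fetchWithRetry` / `retryDelayMs` names are illustrative and not taken from the fork.

```typescript
// Sketch only; names and delay constants are illustrative assumptions.
// Requires Node 18+ for the global fetch/Response/AbortController.
interface RetryOptions {
  maxRetries?: number; // mirrors QMD_JINA_MAX_RETRIES (default 4)
  timeoutMs?: number; // mirrors QMD_JINA_TIMEOUT_MS (default 60000)
  baseDelayMs?: number; // assumed base for exponential backoff
  maxDelayMs?: number; // assumed cap for backoff (not applied to Retry-After)
  fetchImpl?: typeof fetch; // injectable so tests need no network
  random?: () => number; // injectable jitter source
}

function isRetryableStatus(status: number): boolean {
  return status === 429 || status >= 500;
}

// Delay before the next attempt. A server-supplied Retry-After (seconds or
// HTTP-date) wins; otherwise "full jitter": uniform in [0, min(cap, base*2^n)).
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number,
): number {
  if (retryAfter !== null) {
    const value = retryAfter.trim();
    if (/^\d+$/.test(value)) return Number(value) * 1000;
    const when = Date.parse(value);
    if (!Number.isNaN(when)) return Math.max(0, when - Date.now());
  }
  return random() * Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url: string, init: RequestInit, opts: RetryOptions = {}): Promise<Response> {
  const {
    maxRetries = 4,
    timeoutMs = 60_000,
    baseDelayMs = 500,
    maxDelayMs = 30_000,
    fetchImpl = fetch,
    random = Math.random,
  } = opts;

  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    // The timer only bounds the wait for response headers; it is cleared
    // before the caller reads the body.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    let retryAfter: string | null = null;
    try {
      const res = await fetchImpl(url, { ...init, signal: controller.signal });
      if (!isRetryableStatus(res.status) || attempt >= maxRetries) return res;
      retryAfter = res.headers.get("retry-after");
      await res.body?.cancel(); // discard the error body before retrying
    } catch (err) {
      // Network errors and timeouts are retried until attempts run out.
      if (attempt >= maxRetries) throw err;
    } finally {
      clearTimeout(timer);
    }
    await sleep(retryDelayMs(attempt, retryAfter, baseDelayMs, maxDelayMs, random));
  }
}
```

Full jitter spreads retries from concurrent batch requests (`QMD_JINA_CONCURRENCY`) across the backoff window, so they do not all hit the API again at the same instant after a 429.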

.gitignore

Lines changed: 9 additions & 0 deletions
@@ -18,3 +18,12 @@ texts/
 finetune/outputs/
 finetune/data/train/
 .claude/
+
+# Secrets — NEVER commit these
+.env
+.env.*
+!.env.example
+*.key
+*.pem
+secrets/
+credentials.json
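The `.gitignore` rules above prevent most accidental commits. The commit also adds `scripts/scan-secrets.sh`, which reports `file:line` without echoing the key. The TypeScript sketch below shows the same redaction idea and is not a port of that script. Only the AWS, GitHub, and PEM patterns are well-known formats. The Jina and Anthropic patterns are assumptions based on the `jina_` and `sk-ant-` prefixes. The placeholder filter, which skips long runs of `x` like the one in `.env.example`, is hypothetical.

```typescript
// Hypothetical illustration of redacted secret scanning (not scan-secrets.sh).
// AWS, GitHub and PEM shapes are well known; Jina/Anthropic are assumptions.
interface Finding {
  file: string;
  line: number;
  rule: string; // which pattern fired; the matched value is never stored
}

const RULES: Array<[string, RegExp]> = [
  ["aws-access-key", /\bAKIA[0-9A-Z]{16}\b/],
  ["github-token", /\bgh[pousr]_[A-Za-z0-9]{36}\b/],
  ["pem-private-key", /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----/],
  ["jina-api-key", /\bjina_[A-Za-z0-9]{20,}/],
  ["anthropic-api-key", /\bsk-ant-[A-Za-z0-9_-]{20,}/],
];

// Template placeholders such as "jina_xxxxxxxx..." should not count as leaks.
function isPlaceholder(match: string): boolean {
  return /x{8,}/i.test(match);
}

function scanText(file: string, text: string): Finding[] {
  const findings: Finding[] = [];
  text.split(/\r?\n/).forEach((content, idx) => {
    for (const [rule, pattern] of RULES) {
      const m = pattern.exec(content);
      if (m && !isPlaceholder(m[0])) {
        findings.push({ file, line: idx + 1, rule });
      }
    }
  });
  return findings;
}

// Report location and rule only, so the log itself cannot re-leak the key.
function formatFindings(findings: Finding[]): string[] {
  return findings.map((f) => `${f.file}:${f.line}: possible ${f.rule}`);
}
```

Because the report is built from the `Finding` record alone, a CI log or terminal scrollback never contains the secret, even when the scan succeeds in catching it.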

CHANGELOG.md

Lines changed: 90 additions & 0 deletions
@@ -2,6 +2,96 @@
 
 ## [Unreleased]
 
+- Remote providers: Jina AI. QMD can now delegate embedding and/or
+reranking to the Jina cloud API instead of loading local GGUF models.
+Enable independently via `QMD_EMBED_PROVIDER=jina`,
+`QMD_RERANK_PROVIDER=jina`, or both. All paths share a single
+`JINA_API_KEY`. Query expansion still runs locally.
+- Embedding defaults to `jina-embeddings-v3` at 1024 dimensions with
+Matryoshka truncation support (32/64/128/256/512/1024). Batches
+inputs with bounded concurrency, retries 429/5xx with exponential
+backoff + jitter, honours `Retry-After`, and uses an 8192-token
+context window (vs 2048 for embeddinggemma).
+- Reranking defaults to `jina-reranker-v2-base-multilingual`. Same
+retry semantics. Deduplicates identical document texts before the
+API call to save tokens, then fans scores back to all matching
+indices.
+- Per-index YAML config: specify `models.embed: "jina:<model>"` or
+`models.rerank: "jina:<model>"` in `~/.config/qmd/<index>.yml`
+to pin a particular index to remote providers. Use `--index` to
+switch between indexes with different providers.
+- Within a single index, all collections must share the same vector
+dimension (the `vectors_vec` table is fixed-width). To mix
+providers, use separate indexes.
+- Usage tracking: every successful Jina API call records token
+consumption to a new `jina_usage` SQLite table. New command
+`qmd usage` shows rolling 24h/7d/30d/all-time totals and a
+per-operation breakdown. `qmd usage reset` clears the history.
+`qmd usage --json` emits a stable `qmd.usage.v1` payload for
+scripting/alerting.
+- Quota warnings: set `QMD_JINA_QUOTA=1B` (or `500M`, `10k`) to
+enable rolling-window quota tracking. `qmd status` and `qmd usage`
+colour-code severity: `ok` (green), `warn` (≥80%, yellow),
+`critical` (≥95%, red), `over` (>100%, red). Window configurable
+via `QMD_JINA_QUOTA_WINDOW` (`24h` | `7d` | `30d` | `all`,
+default `30d` to match typical monthly plans). Threshold
+configurable via `QMD_JINA_WARN_PCT` (default `80`). The
+`--json` output always includes the `severity` field so
+downstream alerting (e.g. CI gates) can react without parsing
+human text.
+- Benchmark command: `qmd bench jina` runs a reproducible
+latency + throughput benchmark measuring `embed_single`,
+`embed_batch`, and `rerank` stages on local and/or Jina backends
+with synthetic documents. Reports median/p95 latency and
+throughput per stage, plus a side-by-side comparison table with
+a speedup ratio and a winner per stage. Flags: `--size`,
+`--doc-len`, `--provider {local|jina|both}`, `--runs` (repeat the
+full workload N times to reduce noise, up to 100), `--skip-rerank`,
+`--skip-single`, `--json`. With `--runs > 1` the report includes
+mean / stddev / min / max across samples; high-variance stages
+(stddev > 20% of median) are highlighted in yellow. Useful for
+answering "is Jina actually faster for my machine and network?"
+before committing to a provider switch.
+- CSV export for `qmd usage --csv` emits the operation breakdown
+in spreadsheet-friendly format (header row + one row per
+`(operation, model)` pair). Use `--json` for the full payload
+including totals and quota state.
+- Daily histogram: `qmd usage chart` renders an ASCII bar chart of
+daily token consumption for the last N days (default 30, UTC),
+with bars scaled to the peak day. Colour-codes high-usage days,
+shows blank bars for days with zero usage, and prints a footer
+with total / active-days / peak / average-per-active-day. Flags:
+`--days <n>` (1-365), `--json` (emits `qmd.usage.chart.v1`).
+
+- Secrets hygiene.
+- `.env` file support: QMD auto-loads `.env` from the cwd (or any
+parent directory up to 5 levels) on every CLI invocation. Zero-dep
+parser supports quoted/escaped values, `export` prefix, inline
+comments, and BOM. Shell env vars still win so per-command
+overrides work. Set `QMD_ENV_FILE=/path` for a custom location.
+- `.env.example` template ships all Jina / quota / local-model
+env vars with inline documentation. `cp .env.example .env`
+and edit to get started safely.
+- `.gitignore` now excludes `.env`, `.env.*`, `*.key`, `*.pem`,
+`secrets/`, and `credentials.json` so common mistakes never
+reach git.
+- `scripts/scan-secrets.sh` detects leaked API keys for Jina,
+OpenAI, Anthropic, Voyage, Cohere, GitHub tokens, AWS access
+keys, and PEM private key blocks. Auto-installed as a
+pre-commit hook via `./scripts/install-hooks.sh`. Modes:
+staged (default), `--tracked`, `--all <path>`. Reports
+`file:line` without re-leaking the key in output.
+- `qmd pull` skips models served by remote providers (no local GGUF
+to fetch). `qmd status` displays each backend with `[remote]` or
+`[local]` tags, and surfaces init errors instead of silently
+falling back.
+- New env vars: `QMD_EMBED_PROVIDER`, `QMD_RERANK_PROVIDER`,
+`QMD_JINA_MODEL`, `QMD_JINA_RERANK_MODEL`, `QMD_JINA_DIMENSION`,
+`QMD_JINA_BASE_URL`, `QMD_JINA_BATCH`, `QMD_JINA_CONCURRENCY`,
+`QMD_JINA_TIMEOUT_MS`, `QMD_JINA_MAX_RETRIES`.
+- Switching embedding providers requires `qmd embed -f` because
+vector dimensions differ between providers.
+
 ## [2.1.0] - 2026-04-05
 
 Code files now chunk at function and class boundaries via tree-sitter,
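The quota rules in the changelog entry above reduce to two small functions. Those rules are: K/M/B suffixes, `warn` at the configurable threshold, `critical` at 95%, and `over` above 100%. This sketch assumes decimal suffixes (`1K` = 1,000) matched case-insensitively. `parseQuota` and `quotaSeverity` are hypothetical names, not the fork's exports.

```typescript
// Hypothetical helpers; names are illustrative, not the fork's API.
// Assumption: suffixes are decimal (1K = 1,000) and case-insensitive.
type Severity = "ok" | "warn" | "critical" | "over";

const MULTIPLIERS: Record<string, number> = { k: 1e3, m: 1e6, b: 1e9 };

// Parses values like "1B", "500M", "10k" or "1234"; returns null if invalid.
function parseQuota(raw: string): number | null {
  const match = /^(\d+(?:\.\d+)?)([kmb])?$/i.exec(raw.trim());
  if (!match) return null; // reject malformed values instead of guessing
  const base = Number(match[1]);
  const mult = match[2] ? MULTIPLIERS[match[2].toLowerCase()] : 1;
  const value = Math.round(base * mult);
  return value > 0 ? value : null;
}

// Thresholds follow the changelog: warn >= warnPct, critical >= 95%, over > 100%.
function quotaSeverity(used: number, quota: number, warnPct = 80): Severity {
  const pct = (used / quota) * 100;
  if (pct > 100) return "over";
  if (pct >= 95) return "critical";
  if (pct >= warnPct) return "warn";
  return "ok";
}
```

For example, with `QMD_JINA_QUOTA=1B` and 850M tokens used in the window, the severity is `warn`. At exactly 100% it is still `critical`, since `over` requires strictly more than the quota.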
