diff --git a/.env.example b/.env.example
Lines changed: 103 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 9 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,103 @@
+# QMD environment configuration template
+#
+# HOW TO USE:
+#   1. Copy this file: cp .env.example .env
+#   2. Edit .env with your real values
+#   3. Source it before running qmd: `set -a && source .env && set +a`
+#      (or use direnv / dotenv-cli / your shell's env loader)
+#
+# The .env file is GITIGNORED — safe to put real secrets in.
+# NEVER edit this .env.example file with real values.
+#
+# If you accidentally commit a key: rotate it IMMEDIATELY at the provider.
+
+# =============================================================================
+# Jina AI remote providers (optional)
+# =============================================================================
+
+# Activate Jina as the embedding backend.
+# When set, embed()/embedBatch() route to the Jina cloud API instead of a local
+# GGUF model. Local generation and reranking still work.
+# QMD_EMBED_PROVIDER=jina
+
+# Activate Jina as the reranker backend.
+# QMD_RERANK_PROVIDER=jina
+
+# Jina API key — required when either provider above is set.
+# Get one at https://jina.ai/
+# JINA_API_KEY=jina_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
+# --- Optional Jina overrides ---
+
+# Embedding model (default: jina-embeddings-v3)
+# QMD_JINA_MODEL=jina-embeddings-v3
+
+# Output dimensions. v3 supports Matryoshka truncation: 32/64/128/256/512/1024
+# (default: 1024)
+# QMD_JINA_DIMENSION=1024
+
+# Reranker model (default: jina-reranker-v2-base-multilingual)
+# QMD_JINA_RERANK_MODEL=jina-reranker-v2-base-multilingual
+
+# API base URL — override for proxies or self-hosted Jina instances
+# QMD_JINA_BASE_URL=https://api.jina.ai/v1
+
+# Max inputs per HTTP request (default: 128; Jina accepts up to 2048)
+# QMD_JINA_BATCH=128
+
+# Parallel HTTP requests when a batch is split across multiple calls (default: 4)
+# QMD_JINA_CONCURRENCY=4
+
+# Per-request timeout in milliseconds (default: 60000)
+# QMD_JINA_TIMEOUT_MS=60000
+
+# Retry attempts on 429/5xx/network errors (default: 4)
+# QMD_JINA_MAX_RETRIES=4
+
+# =============================================================================
+# Quota tracking (optional)
+# =============================================================================
+
+# Set to your monthly token quota to enable quota warnings in `qmd status`
+# and `qmd usage`. Accepts plain digits or K/M/B suffixes.
+# QMD_JINA_QUOTA=1B
+
+# Rolling window used to evaluate the quota (24h | 7d | 30d | all)
+# Default: 30d (matches typical monthly billing cycles)
+# QMD_JINA_QUOTA_WINDOW=30d
+
+# Warn threshold as percent of quota (default: 80)
+# QMD_JINA_WARN_PCT=80
+
+# =============================================================================
+# Local model overrides (no remote provider required)
+# =============================================================================
+
+# Override the local embedding model (HuggingFace URI)
+# QMD_EMBED_MODEL=hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf
+
+# Override the local rerank / generation models
+# QMD_RERANK_MODEL=hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf
+# QMD_GENERATE_MODEL=hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf
+
+# Local embed context window (default: 2048)
+# QMD_EMBED_CONTEXT_SIZE=2048
+
+# Force CPU even if GPU is available (for debugging)
+# QMD_LLAMA_GPU=false
+
+# Cache base directory; models go in $XDG_CACHE_HOME/qmd/models (default: ~/.cache/qmd/models)
+# XDG_CACHE_HOME=~/.cache
+
+# =============================================================================
+# Misc
+# =============================================================================
+
+# Editor URI template for clickable terminal links
+# QMD_EDITOR_URI=vscode://file/{path}:{line}:{col}
+
+# Disable coloured output
+# NO_COLOR=1
+
+# Disable all LLM operations (for CI)
+# CI=true
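
The quota settings in the template (`QMD_JINA_QUOTA`, `QMD_JINA_WARN_PCT`) and the severity levels listed in the changelog can be illustrated with a small sketch. This is not QMD's implementation: `parseQuota` and `classifySeverity` are hypothetical names, and the decimal multipliers (K = 1e3, M = 1e6, B = 1e9) are an assumption, since the template only says that K/M/B suffixes are accepted.

```typescript
type Severity = "ok" | "warn" | "critical" | "over";

// Assumed decimal multipliers; the template does not define them.
const MULTIPLIERS: Record<string, number> = { k: 1e3, m: 1e6, b: 1e9 };

// Parse "1B", "500M", "10k", or plain digits into a token count.
function parseQuota(raw: string): number {
  const match = /^(\d+)([kmb])?$/i.exec(raw.trim());
  if (!match) throw new Error(`Invalid quota value: ${raw}`);
  const suffix = match[2]?.toLowerCase();
  return Number(match[1]) * (suffix ? MULTIPLIERS[suffix] : 1);
}

// Map usage to the severities named in the changelog:
// warn at >= warnPct (default 80), critical at >= 95%, over at > 100%.
function classifySeverity(used: number, quota: number, warnPct = 80): Severity {
  const pct = (used / quota) * 100;
  if (pct > 100) return "over";
  if (pct >= 95) return "critical";
  if (pct >= warnPct) return "warn";
  return "ok";
}

const quota = parseQuota("1B");
console.log(classifySeverity(800_000_000, quota)); // 80% of quota
```

Checking `over` before `critical` before `warn` keeps the bands mutually exclusive, so exactly 100% usage is reported as `critical` rather than `over`.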
@@ -18,3 +18,12 @@ texts/
 finetune/outputs/
 finetune/data/train/
 .claude/
+
+# Secrets — NEVER commit these
+.env
+.env.*
+!.env.example
+*.key
+*.pem
+secrets/
+credentials.json
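
The interaction between `.env.*` and `!.env.example` in the rules above depends on gitignore's "last matching pattern wins" behaviour. A simplified model, assuming basename-only matching with `*` and `?` wildcards, might look like the sketch below. `globToRegExp` and `isIgnored` are hypothetical helpers; real gitignore semantics (directory patterns such as `secrets/`, `**`, path anchoring) are deliberately out of scope.

```typescript
// Convert a simple glob (only `*` and `?` wildcards) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const source = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, "[^/]*")
    .replace(/\?/g, "[^/]");
  return new RegExp(`^${source}$`);
}

// Evaluate patterns in order; the last one that matches decides the outcome.
// A leading `!` re-includes a file that an earlier pattern excluded.
function isIgnored(basename: string, patterns: string[]): boolean {
  let ignored = false;
  for (const raw of patterns) {
    const negated = raw.startsWith("!");
    const glob = negated ? raw.slice(1) : raw;
    if (globToRegExp(glob).test(basename)) ignored = !negated;
  }
  return ignored;
}

// File patterns from this hunk (the directory pattern `secrets/` is omitted).
const patterns = [".env", ".env.*", "!.env.example", "*.key", "*.pem", "credentials.json"];

for (const name of [".env", ".env.local", ".env.example", "server.pem"]) {
  console.log(name, isIgnored(name, patterns));
}
```

Because `!.env.example` comes after `.env.*`, the template stays tracked; with the two lines in the opposite order it would be ignored along with every other `.env.*` file.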
@@ -2,6 +2,96 @@
 
 ## [Unreleased]
 
+- Remote providers: Jina AI. QMD can now delegate embedding and/or
+  reranking to the Jina cloud API instead of loading local GGUF models.
+  Enable independently via `QMD_EMBED_PROVIDER=jina`,
+  `QMD_RERANK_PROVIDER=jina`, or both. All paths share a single
+  `JINA_API_KEY`. Query expansion still runs locally.
+  - Embedding defaults to `jina-embeddings-v3` at 1024 dimensions with
+    Matryoshka truncation support (32/64/128/256/512/1024). Batches
+    inputs with bounded concurrency, retries 429/5xx with exponential
+    backoff + jitter, honours `Retry-After`, and uses an 8192-token
+    context window (vs 2048 for embeddinggemma).
+  - Reranking defaults to `jina-reranker-v2-base-multilingual`. Same
+    retry semantics. Deduplicates identical document texts before the
+    API call to save tokens, then fans scores back to all matching
+    indices.
+  - Per-index YAML config: specify `models.embed: "jina:<model>"` or
+    `models.rerank: "jina:<model>"` in `~/.config/qmd/<index>.yml`
+    to pin a particular index to remote providers. Use `--index` to
+    switch between indexes with different providers.
+  - Within a single index, all collections must share the same vector
+    dimension (the `vectors_vec` table is fixed-width). To mix
+    providers, use separate indexes.
+  - Usage tracking: every successful Jina API call records token
+    consumption to a new `jina_usage` SQLite table. New command
+    `qmd usage` shows rolling 24h/7d/30d/all-time totals and a
+    per-operation breakdown. `qmd usage reset` clears the history.
+    `qmd usage --json` emits a stable `qmd.usage.v1` payload for
+    scripting/alerting.
+  - Quota warnings: set `QMD_JINA_QUOTA=1B` (or `500M`, `10k`) to
+    enable rolling-window quota tracking. `qmd status` and `qmd usage`
+    colour-code severity: `ok` (green), `warn` (≥80%, yellow),
+    `critical` (≥95%, red), `over` (>100%, red). Window configurable
+    via `QMD_JINA_QUOTA_WINDOW` (`24h` | `7d` | `30d` | `all`,
+    default `30d` to match typical monthly plans). Threshold
+    configurable via `QMD_JINA_WARN_PCT` (default `80`). The
+    `--json` output always includes the `severity` field so
+    downstream alerting (e.g. CI gates) can react without parsing
+    human text.
+  - Benchmark command: `qmd bench jina` runs a reproducible
+    latency + throughput benchmark measuring `embed_single`,
+    `embed_batch`, and `rerank` stages on local and/or Jina backends
+    with synthetic documents. Reports median/p95 latency and
+    throughput per stage, plus a side-by-side comparison table with
+    a speedup ratio and a winner per stage. Flags: `--size`,
+    `--doc-len`, `--provider {local|jina|both}`, `--runs` (repeat the
+    full workload N times to reduce noise, up to 100), `--skip-rerank`,
+    `--skip-single`, `--json`. With `--runs > 1` the report includes
+    mean / stddev / min / max across samples; high-variance stages
+    (stddev > 20% of median) are highlighted in yellow. Useful for
+    answering "is Jina actually faster for my machine and network?"
+    before committing to a provider switch.
+  - CSV export: `qmd usage --csv` emits the operation breakdown
+    in spreadsheet-friendly format (header row + one row per
+    `(operation, model)` pair). Use `--json` for the full payload
+    including totals and quota state.
+  - Daily histogram: `qmd usage chart` renders an ASCII bar chart of
+    daily token consumption for the last N days (default 30, UTC),
+    with bars scaled to the peak day. Colour-codes high-usage days,
+    shows blank bars for days with zero usage, and prints a footer
+    with total / active-days / peak / average-per-active-day. Flags:
+    `--days <n>` (1-365), `--json` (emits `qmd.usage.chart.v1`).
+
+- Secrets hygiene.
+  - `.env` file support: QMD auto-loads `.env` from the cwd (or any
+    parent directory up to 5 levels) on every CLI invocation. Zero-dep
+    parser supports quoted/escaped values, `export` prefix, inline
+    comments, and BOM. Shell env vars still win so per-command
+    overrides work. Set `QMD_ENV_FILE=/path` for a custom location.
+  - `.env.example` template ships all Jina / quota / local-model
+    env vars with inline documentation. `cp .env.example .env`
+    and edit to get started safely.
+  - `.gitignore` now excludes `.env`, `.env.*` (except `.env.example`),
+    `*.key`, `*.pem`, `secrets/`, and `credentials.json` so common
+    mistakes never reach git.
+  - `scripts/scan-secrets.sh` detects leaked API keys for Jina,
+    OpenAI, Anthropic, Voyage, Cohere, GitHub tokens, AWS access
+    keys, and PEM private key blocks. Auto-installed as a
+    pre-commit hook via `./scripts/install-hooks.sh`. Modes:
+    staged (default), `--tracked`, `--all <path>`. Reports
+    `file:line` without re-leaking the key in output.
+  - `qmd pull` skips models served by remote providers (no local GGUF
+    to fetch). `qmd status` displays each backend with `[remote]` or
+    `[local]` tags, and surfaces init errors instead of silently
+    falling back.
+  - New env vars: `QMD_EMBED_PROVIDER`, `QMD_RERANK_PROVIDER`,
+    `QMD_JINA_MODEL`, `QMD_JINA_RERANK_MODEL`, `QMD_JINA_DIMENSION`,
+    `QMD_JINA_BASE_URL`, `QMD_JINA_BATCH`, `QMD_JINA_CONCURRENCY`,
+    `QMD_JINA_TIMEOUT_MS`, `QMD_JINA_MAX_RETRIES`.
+  - Switching embedding providers requires `qmd embed -f` because
+    vector dimensions differ between providers.
+
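
The reranking entry above describes deduplicating identical document texts before the API call and then fanning scores back to every matching index. A minimal sketch of that idea, assuming the remote call returns one score per unique document in order, could be written as follows. `dedupeDocs` and `fanOutScores` are hypothetical names, not QMD's actual functions.

```typescript
interface Deduped {
  unique: string[]; // distinct document texts, in first-seen order
  slots: number[];  // for each original index, its position in `unique`
}

// Collapse identical texts so each one is sent to the reranker only once.
function dedupeDocs(docs: string[]): Deduped {
  const seen = new Map<string, number>();
  const unique: string[] = [];
  const slots = docs.map((text) => {
    let slot = seen.get(text);
    if (slot === undefined) {
      slot = unique.length;
      seen.set(text, slot);
      unique.push(text);
    }
    return slot;
  });
  return { unique, slots };
}

// Expand per-unique scores back to one score per original document.
function fanOutScores(uniqueScores: number[], slots: number[]): number[] {
  if (uniqueScores.length <= Math.max(-1, ...slots)) {
    throw new Error("Score count does not cover all unique documents");
  }
  return slots.map((slot) => uniqueScores[slot]);
}

const { unique, slots } = dedupeDocs(["alpha", "beta", "alpha", "gamma", "beta"]);
// Stand-in for the remote rerank call: one made-up score per unique text.
const fakeScores = unique.map((_, i) => 1 - i * 0.25);
console.log(unique.length, fanOutScores(fakeScores, slots));
```

Splitting the work into two pure functions keeps the network call in between free of bookkeeping, and the token saving is simply the difference between `docs.length` and `unique.length`.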
 ## [2.1.0] - 2026-04-05
 
 Code files now chunk at function and class boundaries via tree-sitter,