Problem
GTE-Large FP16 (~670 MB, 512-token context, English-centric) is outperformed by Qwen3-Embedding-0.6B as of April 2026:
| Model | MTEB avg | Context | VRAM |
|---|---|---|---|
| GTE-Large FP16 | 63.1 | 512 tok | ~670 MB |
| Qwen3-Embedding-0.6B | 68.4 | 8192 tok | ~360 MB |
Qwen3-Embedding-0.6B is smaller (~360 MB vs. ~670 MB), scores higher on MTEB (68.4 vs. 63.1), and is multilingual, which matters for a corpus that mixes Italian lecture notes with English whitepapers. Its 8192-token context also eliminates the child-chunk truncation issue (R4) entirely.
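To gauge how much the context limit matters for R4, one could count the child chunks that each model would truncate. The following is only a sketch: `tokenCounts` is a hypothetical array of per-chunk token lengths, the sample values are illustrative rather than measured, and in practice the counts would need to come from each model's own tokenizer, since the two models do not tokenize text the same way.

```javascript
// Context limits taken from the comparison table in this section.
const CONTEXT_LIMITS = {
  "GTE-Large FP16": 512,
  "Qwen3-Embedding-0.6B": 8192,
};

// Count the chunks an embedding model would truncate.
// `tokenCounts` is a hypothetical array of per-chunk token lengths.
function countTruncated(tokenCounts, contextLimit) {
  return tokenCounts.filter((n) => n > contextLimit).length;
}

// Illustrative values only, not measurements from the real corpus.
const sampleTokenCounts = [180, 490, 640, 1200, 3100];

for (const [model, limit] of Object.entries(CONTEXT_LIMITS)) {
  const truncated = countTruncated(sampleTokenCounts, limit);
  console.log(`${model}: ${truncated} chunk(s) truncated`);
}
```

Running the same count over the real child chunks would show whether any chunk still exceeds 8192 tokens after the switch.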
Fix
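When `@qvac/sdk ≥ 1.0` ships `QWEN3_EMBEDDING_0_6B`, update `QVAC_EMB_SRC` in `workers/qvac-service/src/models.js` to point to it. Re-ingest all documents after the switch, because vectors produced by the two models are not comparable and cannot share an index.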
Effort: Small (config change + re-ingest)
Impact: High (direct quality improvement, smaller memory footprint)
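The switch described under Fix could be sketched roughly as follows. This is a minimal illustration, not the actual contents of `models.js`. The `sdk` parameter stands in for whatever object of model constants `@qvac/sdk` exports, and its shape is an assumption. `GTE_LARGE_FP16`, `resolveEmbeddingSource`, `needsReingest`, and the `indexMeta` record are hypothetical names introduced here. Only `QWEN3_EMBEDDING_0_6B` and `QVAC_EMB_SRC` come from the note itself.

```javascript
// Pick the embedding source based on what the SDK object exposes.
// Assumption: model constants are plain properties on `sdk`.
// `GTE_LARGE_FP16` is a hypothetical name for the constant in use today.
function resolveEmbeddingSource(sdk) {
  if (sdk.QWEN3_EMBEDDING_0_6B !== undefined) {
    return { id: "QWEN3_EMBEDDING_0_6B", src: sdk.QWEN3_EMBEDDING_0_6B, maxTokens: 8192 };
  }
  // Fall back to the current model until the SDK ships the new constant.
  return { id: "GTE_LARGE_FP16", src: sdk.GTE_LARGE_FP16, maxTokens: 512 };
}

// The index must be rebuilt whenever the embedding model id changes.
// `indexMeta` is a hypothetical record stored next to the vector index.
function needsReingest(indexMeta, embedding) {
  return !indexMeta || indexMeta.embeddingModelId !== embedding.id;
}

// Stand-in objects for illustration; real values would come from @qvac/sdk.
const oldSdk = { GTE_LARGE_FP16: "gte-large-src" };
const newSdk = { ...oldSdk, QWEN3_EMBEDDING_0_6B: "qwen3-src" };
const indexMeta = { embeddingModelId: "GTE_LARGE_FP16" };

const before = resolveEmbeddingSource(oldSdk);
const after = resolveEmbeddingSource(newSdk);
const QVAC_EMB_SRC = after.src;

console.log(before.id, "re-ingest needed:", needsReingest(indexMeta, before));
console.log(after.id, "re-ingest needed:", needsReingest(indexMeta, after));
```

Recording the model id alongside the index is one way to make a forgotten re-ingest detectable at startup, so that queries embedded with the new model are never silently run against old vectors.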