Q4 — Embedding model bottleneck: GTE-Large FP16 → Qwen3-Embedding-0.6B

## Problem

GTE-Large FP16 (~670 MB, 512-token context, English-centric) is outperformed by Qwen3-Embedding-0.6B as of April 2026:

| Model | MTEB avg | Context | VRAM |
|---|---|---|---|
| GTE-Large FP16 | 63.1 | 512 tok | ~670 MB |
| Qwen3-Embedding-0.6B | 68.4 | 8192 tok | ~360 MB |

Qwen3-Embedding-0.6B is **smaller, higher-scoring, and multilingual**, which matters for a corpus of Italian lecture notes and English whitepapers. Its 8192-token context also removes the child-chunk truncation issue (R4) for any chunk that fits within that limit.
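
To gauge how much the longer context matters for an existing corpus, one option is to count the chunks that exceed each model's limit. This is a minimal sketch, assuming per-chunk token counts are already available. The `countTruncated` helper and the sample counts are hypothetical, and real token counts depend on each model's tokenizer.

```javascript
// Context limits taken from the table above.
const CONTEXT_LIMITS = {
  'GTE-Large FP16': 512,
  'Qwen3-Embedding-0.6B': 8192,
};

// Hypothetical helper: counts chunks whose token count exceeds the limit,
// i.e. chunks that would be truncated before embedding.
function countTruncated(tokenCounts, limit) {
  return tokenCounts.filter((n) => n > limit).length;
}

// Illustrative token counts only, not measured data.
const sampleTokenCounts = [180, 512, 900, 2400, 8192, 9000];

for (const [model, limit] of Object.entries(CONTEXT_LIMITS)) {
  console.log(`${model}: ${countTruncated(sampleTokenCounts, limit)} truncated`);
}
```

A chunk longer than 8192 tokens would still be truncated, so it is worth running a check like this against the real child-chunk sizes before closing R4.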

## Fix

When `@qvac/sdk ≥ 1.0` ships `QWEN3_EMBEDDING_0_6B`, update `QVAC_EMB_SRC` in `workers/qvac-service/src/models.js`. Re-ingest all documents after the switch, because vectors produced by the two models are not comparable.
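
The switch could be guarded so that the service keeps working on SDK versions that do not yet ship the new model. This is a sketch only: it assumes the SDK exposes model sources as named exports, and the `GTE_LARGE_FP16` key, the `pickEmbeddingSource` and `needsReingest` helpers, and the stand-in SDK objects are all hypothetical, not the actual `@qvac/sdk` API.

```javascript
// Sketch for workers/qvac-service/src/models.js.
// Assumption: `sdk` is an object of named model sources; the
// GTE_LARGE_FP16 key is a hypothetical name for the current model.
function pickEmbeddingSource(sdk) {
  // Prefer Qwen3 once the installed SDK version ships it.
  if (sdk.QWEN3_EMBEDDING_0_6B !== undefined) {
    return { name: 'QWEN3_EMBEDDING_0_6B', src: sdk.QWEN3_EMBEDDING_0_6B };
  }
  return { name: 'GTE_LARGE_FP16', src: sdk.GTE_LARGE_FP16 };
}

// Vectors from different embedding models are not comparable, so a stored
// index must be rebuilt whenever the model name changes.
function needsReingest(indexedWith, current) {
  return indexedWith !== current.name;
}

// Stand-in objects instead of the real SDK, for illustration only.
const oldSdk = { GTE_LARGE_FP16: 'gte-src' };
const newSdk = { GTE_LARGE_FP16: 'gte-src', QWEN3_EMBEDDING_0_6B: 'qwen3-src' };

const before = pickEmbeddingSource(oldSdk);
const after = pickEmbeddingSource(newSdk);
console.log(before.name, after.name, needsReingest(before.name, after));
```

Storing the model name alongside the index, as `needsReingest` assumes, makes it possible to detect a stale index at startup instead of relying on someone remembering to re-ingest.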

**Effort:** Small (config change + re-ingest)  
**Impact:** High — direct quality improvement, smaller memory footprint
