You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix: calibration uses domino cascade instead of collapsed MatVec
Removed the hash tokenizer fallback from the hot path — it never worked.
Both lenses now use real BPE (XLM-RoBERTa for Jina, Qwen2 for Reranker).
Calibration rewritten three times this session:
1. avg pairwise centroid distance → ρ=0.07 (no engine, just lookup noise)
2. MatVec think cycle → cos=1.000 for ALL pairs (attractor collapse)
3. Domino cascade 3σ focus → actual differentiation (0.000-0.389)
but ρ=-0.57 (anti-correlated with human judgment)
Diagnosis: 256-centroid codebook is too coarse for text similarity.
Short texts share structural tokens (articles, prepositions) that
map to identical centroids, creating spurious overlap.
The engine differentiates — the codebook doesn't. Next steps:
- 4096-centroid codebook (full L3 table, 16 MB)
- Or: per-role tables (attn_k cos range is wider than token_embd)
- Or: Jina v5 ONNX ground truth to replace expert-assigned scores
https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
0 commit comments