EMBEDDINGS.md
14 additions & 11 deletions
@@ -68,14 +68,15 @@ This option runs embedding models directly on your machine using the library.
### Recommended Models
-These are based on MTEB [datasets](https://huggingface.co/datasets/mteb/results) as of 13-Jun-2026.
+These are based on MTEB [datasets](https://huggingface.co/datasets/mteb/results) as of 15-Jun-2026. All listed models have been verified to work with the `sentence-transformers` provider in `cocoindex-code`.
|**High**|[`geevec-ai/geevec-embeddings-1.0-lite`](https://huggingface.co/geevec-ai/geevec-embeddings-1.0-lite)| 366M |**0.92**| Maximum local accuracy (needs GPU for speed). |
|**Micro**|[`lightonai/LateOn-Code-edge`](https://huggingface.co/lightonai/LateOn-Code-edge)| 17M | 0.82 |**Efficiency King.** Incredible code performance for its size. |
+|**Small**|[`lightonai/LateOn-Code`](https://huggingface.co/lightonai/LateOn-Code)| 149M | 0.85 | Great balance of speed and accuracy on modern laptops. |
+|**Medium**|[`microsoft/harrier-oss-v1-270m`](https://huggingface.co/microsoft/harrier-oss-v1-270m)| 270M |**0.90**|**Performance sweet spot.** High accuracy, runs well on CPUs. |
+|**Multi**|[`ibm-granite/granite-embedding-97m-multilingual-r2`](https://huggingface.co/ibm-granite/granite-embedding-97m-multilingual-r2)| 97M | 0.80 | Multilingual codebases (e.g. Code + Docs in different languages). |
#### Other Model Options
@@ -190,8 +191,8 @@ envs:
## Choosing Based on Your Content
-- **Heavy Source Code**: Use **Jina v5 Nano** (Local) or **Voyage 4 Large** (Cloud). Both score >0.90 on code search benchmarks.
-- **Large Documentation / Files**: Models with large context windows (8k+ tokens) like **Jina v5** (32k) or **OpenAI v3 Large** (8k).
+- **Heavy Source Code**: Use **LateOn-Code** (Micro/Small) or **Harrier 270m** (Medium). Both score >0.85 on code search benchmarks.
+- **Large Documentation / Files**: Models with large context windows like **Voyage 4 Large** (Cloud) or **OpenAI v3 Large** (8k).
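
Reviewer note: the selection guidance in this diff can be cross-checked against the Recommended Models rows in the first hunk with a small script. The following is only a sketch: the `Model` tuple, the `MODELS` list, and `pick_model` are hypothetical names, not part of `cocoindex-code`; the parameter counts and scores are copied from the table rows shown in the diff; and treating the numeric column as a comparable "score" is an assumption, since the table header is not visible here.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    tier: str
    name: str
    params_m: int  # parameter count in millions, as listed in the table
    score: float   # numeric column from the table (assumed comparable)


# Rows copied from the Recommended Models table in the first hunk.
MODELS = [
    Model("High", "geevec-ai/geevec-embeddings-1.0-lite", 366, 0.92),
    Model("Micro", "lightonai/LateOn-Code-edge", 17, 0.82),
    Model("Small", "lightonai/LateOn-Code", 149, 0.85),
    Model("Medium", "microsoft/harrier-oss-v1-270m", 270, 0.90),
    Model("Multi", "ibm-granite/granite-embedding-97m-multilingual-r2", 97, 0.80),
]


def pick_model(max_params_m: int, multilingual: bool = False) -> Optional[Model]:
    """Return the highest-scoring listed model within a parameter budget.

    Hypothetical helper: returns None when nothing fits the budget.
    """
    candidates = [m for m in MODELS if m.params_m <= max_params_m]
    if multilingual:
        # Only the "Multi" row is described as multilingual in the table.
        candidates = [m for m in candidates if m.tier == "Multi"]
    return max(candidates, key=lambda m: m.score, default=None)


if __name__ == "__main__":
    choice = pick_model(max_params_m=300)
    print(choice.name if choice else "no model fits")
```

Under these assumptions, a 300M budget selects the Medium row and a 150M budget selects the Small row. Note that the listed scores for the two LateOn-Code rows (0.82 and 0.85) do not exceed the ">0.85" stated in the new "Heavy Source Code" bullet, which may be worth confirming before merge.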