Commit 5c151dd

vlasky and punkish committed
Fix incomplete documentation in KNN and Matryoshka guides
- Complete unfinished sentence in KNN docs describing manual method trade-offs (slower, more space, but more flexible)
- Fill in TODO placeholders in Matryoshka docs with paper date, title, and explanation of the naming origin (Russian nesting dolls)

Cherry-picked from upstream PRs asg017#208 and asg017#209

Co-Authored-By: punkish <punkish@users.noreply.github.com>
1 parent c6f5b56 commit 5c151dd

2 files changed: +3 -3 lines changed
site/features/knn.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Currently there are two ways to to perform KNN queries with `sqlite-vec`:
 With `vec0` virtual tables and "manually" with regular tables.
 
 The `vec0` virtual table is faster and more compact, but is less flexible and requires `JOIN`s back to your source tables.
-The "manual" method is more flexible and
+The "manual" method is more flexible and allows for more granular queries, but may be slower and use more space.
 
 
 

site/guides/matryoshka.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 # Matryoshka (Adaptive-Length) Embeddings
 
 Matryoshka embeddings are a new class of embedding models introduced in the
-TODO-YYY paper [_TODO title_](https://arxiv.org/abs/2205.13147). They allow one
+26 May 2022 paper titled [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147). They allow one
 to truncate excess dimensions in large vector, without sacrificing much quality.
 
 Let's say your embedding model generate 1024-dimensional vectors. If you have 1
@@ -16,7 +16,7 @@ Matryoshka embeddings, on the other hand, _can_ be truncated, without losing muc
 quality. Using [`mixedbread.ai`](#TODO) `mxbai-embed-large-v1` model, they claim
 that
 
-They are called "Matryoshka" embeddings because ... TODO
+They are called "Matryoshka" embeddings after the "Matryoshka dolls", also known as "Russian nesting dolls", which are a set of wooden dolls of decreasing size that are placed inside one another. In a similar way, Matryoshka embedding can store more important information in earlier dimensions, and less important information in later dimensions. See more about Matryoshka embeddings at [Hugging Face](https://huggingface.co/blog/matryoshka)
 
 ## Matryoshka Embeddings with `sqlite-vec`
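
For readers of this diff, the truncation idea described in the Matryoshka guide can be sketched in plain Python. This is not part of the commit and not the `sqlite-vec` API. The helper name `truncate_embedding` is hypothetical, and rescaling to unit length after truncation is an assumption (it is commonly done when vectors are later compared by cosine or dot-product similarity).

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` values of `vec`, then rescale to unit length.

    Hypothetical helper for illustration only. The rescaling step is an
    assumption, not something this commit's text prescribes.
    """
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = list(vec[:dims])  # earlier dimensions carry the most information
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be rescaled
    return [x / norm for x in head]


# Toy 4-dimensional "embedding" truncated to its first 2 dimensions.
full = [3.0, 4.0, 12.0, 0.0]
short = truncate_embedding(full, 2)
```

A vector from a model that was not trained with Matryoshka Representation Learning can be truncated the same way, but, as the guide notes, quality is only expected to hold up for Matryoshka-trained models.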
