Commit 99a53e3

vlaskymceachen authored and committed
Normalize MMR diversity term for consistent behavior across distance metrics
The MMR relevance term is normalized to [0,1] by dividing by max_dist, but the diversity term used raw distances (1 - d). For cosine distance this works fine since values are already in [0,1], but for L2 and L1 where distances are unbounded, the two terms in the MMR score operated on different scales, making mmr_lambda behave unpredictably. Normalize the diversity term the same way (d / max_dist) so both terms are on a consistent [0,1] scale regardless of the distance metric.
1 parent a62ffd7 commit 99a53e3
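
The scale mismatch described in the commit message can be illustrated with a minimal sketch. It assumes the textbook MMR formulation (`lambda * relevance - (1 - lambda) * max_sim`); the helper `mmr_score`, its parameters, and the `normalize_diversity` switch are hypothetical names introduced for illustration and are not taken from sqlite-vec.

```c
#include <stddef.h>

/* Hypothetical helper, NOT the sqlite-vec implementation.
 * Computes a textbook-style MMR score for one candidate:
 *   lambda * relevance - (1 - lambda) * max_sim
 * dist_to_query     : candidate's distance to the query vector
 * dists_to_selected : distances from the candidate to each already-selected result
 * max_dist          : largest distance used for normalization (assumed > 0)
 * normalize_diversity: 0 = raw form (1 - d), 1 = normalized form (1 - d / max_dist)
 */
static float mmr_score(float lambda, float dist_to_query,
                       const float *dists_to_selected, size_t n_selected,
                       float max_dist, int normalize_diversity) {
  /* Relevance term: already normalized by max_dist. */
  float relevance = 1.0f - dist_to_query / max_dist;

  /* Diversity term: similarity to the closest already-selected result.
   * Stays 0 when nothing has been selected yet. */
  float max_sim = 0.0f;
  for (size_t j = 0; j < n_selected; j++) {
    float d = dists_to_selected[j];
    float sim = normalize_diversity ? 1.0f - d / max_dist : 1.0f - d;
    if (j == 0 || sim > max_sim) max_sim = sim;
  }

  return lambda * relevance - (1.0f - lambda) * max_sim;
}
```

With an unbounded metric such as L2, a candidate at distance 8 from the only selected result gets a raw similarity of -7, which swamps a relevance term confined to [0,1]. Dividing by `max_dist` (10 in this made-up case) gives 0.2 instead, so `lambda` weighs two comparable quantities.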

2 files changed: 5 additions, 5 deletions

sqlite-vec.c

Lines changed: 1 addition & 1 deletion
@@ -7624,7 +7624,7 @@ static int vec0_mmr_rerank(
       for (i64 j = 0; j < step; j++) {
         f32 d = vec0_compute_distance(vector_column,
                                       vectors[i], out_vectors[j]);
-        f32 sim = 1.0f - d;
+        f32 sim = 1.0f - (d / max_dist);
         if (sim > max_sim) max_sim = sim;
       }
 

tests/__snapshots__/test-knn-mmr.ambr

Lines changed: 4 additions & 4 deletions
@@ -192,14 +192,14 @@
         'rowid': 1,
         'distance': 0.0,
       }),
-      OrderedDict({
-        'rowid': 3,
-        'distance': 0.028284257277846336,
-      }),
       OrderedDict({
         'rowid': 2,
         'distance': 0.014142128638923168,
       }),
+      OrderedDict({
+        'rowid': 5,
+        'distance': 1.4142135381698608,
+      }),
     ]),
   })
 # ---
