docs(core): record entity-boost benchmark findings; keep default off
Benchmarked the #951 entity-aware ranking boost against the LoCoMo retrieval
suite (hybrid mode) and a hand-built adversarial corpus.
LoCoMo is insensitive to the boost: sweeping the weight across
0.15/0.3/0.5/1.0/2.0 produced identical recall@5, recall@10, MRR, and
content-hit at every point (no query reordered, no score changed). LoCoMo docs
are keyed by session id and expose speaker names only in body text, never as
entity titles or relation names, so the title/relation-matching boost never
fires there.
An adversarial check found a real regression mode: Title-Case queries inject
spurious entity terms. 'What Is The Plan For Q3' extracts 'Q3' and, even at
weight 0.15, promotes a literal-'Q3' document over the more relevant 'third
quarter' document. Clean proper nouns (Katze) work; lowercase-leading
identifiers (getUserById) are correctly ignored.
Decision: keep search_entity_boost_enabled default off and the weight at 0.15.
LoCoMo provides no signal to raise the weight, and the adversarial check is not
clean. Document the findings and guidance; no code/default changes.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
docs/semantic-search.md
35 additions & 3 deletions
@@ -107,7 +107,7 @@ All settings are fields on `BasicMemoryConfig` and can be set via environment va
 |`semantic_embedding_document_input_type`|`BASIC_MEMORY_SEMANTIC_EMBEDDING_DOCUMENT_INPUT_TYPE`| Auto for known LiteLLM models | Optional LiteLLM `input_type` for indexed document/passages. |
 |`semantic_embedding_query_input_type`|`BASIC_MEMORY_SEMANTIC_EMBEDDING_QUERY_INPUT_TYPE`| Auto for known LiteLLM models | Optional LiteLLM `input_type` for search queries. |
 |`semantic_vector_k`|`BASIC_MEMORY_SEMANTIC_VECTOR_K`|`100`| Candidate count for vector nearest-neighbour retrieval. Higher values improve recall at the cost of latency. |
-|`search_entity_boost_enabled`|`BASIC_MEMORY_SEARCH_ENTITY_BOOST_ENABLED`|`false`| Enable the entity-aware ranking boost in hybrid search (see below). Default off pending benchmark validation. |
+|`search_entity_boost_enabled`|`BASIC_MEMORY_SEARCH_ENTITY_BOOST_ENABLED`|`false`| Enable the entity-aware ranking boost in hybrid search (see below). Default off: benchmark-validated as inert on LoCoMo and prone to Title-Case false positives. |
 |`search_entity_boost_weight`|`BASIC_MEMORY_SEARCH_ENTITY_BOOST_WEIGHT`|`0.15`| Per-matched-term multiplier strength for the entity boost. A candidate matching N query entity terms is scaled by `1 + weight * min(N, max_terms)`. |
 |`search_entity_boost_max_terms`|`BASIC_MEMORY_SEARCH_ENTITY_BOOST_MAX_TERMS`|`3`| Maximum number of distinct matched entity terms that contribute to the boost, bounding the multiplier. |
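
The multiplier described in the `search_entity_boost_weight` row can be sketched in a few lines. This is a minimal illustration of the documented formula `1 + weight * min(N, max_terms)`, not the project's actual code: the function name `entity_boost_multiplier`, the document labels, and the base scores are all hypothetical, chosen only to show how a single spurious match at weight 0.15 can flip a close ranking, as in the 'Q3' case from the commit message.

```python
def entity_boost_multiplier(
    matched_terms: int, weight: float = 0.15, max_terms: int = 3
) -> float:
    """Return 1 + weight * min(N, max_terms), the formula from the settings table.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if matched_terms < 0:
        raise ValueError("matched_terms must be non-negative")
    return 1.0 + weight * min(matched_terms, max_terms)


# Made-up base scores: the 'third quarter' document is slightly more relevant.
base_scores = {"third-quarter plan": 0.55, "literal Q3 note": 0.50}
# Only the literal-'Q3' document matches the extracted entity term.
matched = {"third-quarter plan": 0, "literal Q3 note": 1}

boosted = {
    doc: score * entity_boost_multiplier(matched[doc])
    for doc, score in base_scores.items()
}
ranked = sorted(boosted, key=boosted.get, reverse=True)
print(ranked)
```

With these invented numbers, one matched term scales 0.50 by 1.15, which overtakes the unboosted 0.55. The cap from `search_entity_boost_max_terms` bounds the multiplier at 1.45 with the default settings, however many terms match.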