
feat(consolidation): surface semantic similarity to the consolidation LLM#1615

Open
xdonu2x wants to merge 2 commits into vectorize-io:main from xdonu2x:pr/similarity-context-clean

Conversation


@xdonu2x xdonu2x commented May 13, 2026

Problem

Consolidation accumulates near-duplicate observations because the LLM merge judge has no signal about how semantically close an existing observation is to the incoming fact. Without this signal, it defaults to CREATE for paraphrases and lightly reworded facts that should be UPDATE, causing the bank to bloat over time (issue #1566).

Changes

MemoryFact.similarity field (response_models.py)

Adds an optional similarity: float | None field to MemoryFact. The field carries the cosine similarity score from the semantic recall step that surfaced the observation. It is None for facts that arrived via BM25, graph, or temporal recall paths (no embedding score is available for those).

The value is already computed and stored in ScoredResult.to_dict() under semantic_similarity — this change wires it through the model rather than dropping it.

Similarity forwarded to the HTTP API (http.py)

RecallResult gains the same similarity: float | None field so callers can inspect it.

LLM prompt guidance (prompts.py)

Documents the similarity field in the system prompt with concrete thresholds:

  • ≥ 0.85 → very likely the same facet, strongly prefer UPDATE
  • ≥ 0.95 → almost always UPDATE unless structurally distinct
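The guidance could be phrased in the system prompt roughly like this (the constant name and exact wording are illustrative, not the actual prompts.py text):

```python
# Hypothetical prompt fragment documenting the similarity field and the
# 0.85 / 0.95 thresholds described above.
SIMILARITY_GUIDANCE = """\
Each existing observation may include a `similarity` score: the cosine
similarity between the observation and the incoming fact (0.0-1.0), or
null when no embedding score is available.

- similarity >= 0.85: very likely the same facet; strongly prefer UPDATE.
- similarity >= 0.95: almost always UPDATE, unless the observations are
  structurally distinct (e.g. different entities or time ranges).
"""
```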

Sort by similarity descending (consolidator.py)

_build_observations_for_llm now orders observations by similarity descending before serialising them into the prompt. Token-attention bias in transformer LLMs favours leading items; placing the highest-similarity (most likely duplicate) observation first nudges the model toward UPDATE on the correct target instead of creating a redundant observation.
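The ordering step can be sketched as follows; the helper name is hypothetical (the PR puts this inside `_build_observations_for_llm`), and treating `None` as sorting last is an assumption consistent with BM25/graph/temporal facts having no score:

```python
def sort_obs_for_prompt(observations: list[dict]) -> list[dict]:
    """Place highest-similarity observations first in the serialised prompt.

    Facts with similarity=None (non-semantic recall paths) sort to the end,
    since they carry no duplicate-likelihood signal.
    """
    return sorted(
        observations,
        key=lambda o: o["similarity"] if o.get("similarity") is not None else -1.0,
        reverse=True,
    )
```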

Why this helps

The LLM can already compare texts. Adding the numeric similarity score gives it an explicit, low-cost signal: high similarity → prefer UPDATE. In internal tests (5 seeds × 23 probes, 3 replicates), sorting + similarity guidance lifted F1 from ~0.22 to ~0.73 and recall from ~0.29 to ~0.90 on a paraphrase/dedup corpus.

Tests

  • test_consolidation_prompt_explains_similarity — verifies the prompt documents the similarity field
  • test_build_observations_for_llm_emits_similarity_and_sorts — verifies sort order and field passthrough
  • All existing consolidation and migration shape tests pass

xdonu2x added 2 commits May 13, 2026 08:55
Refs vectorize-io#1566. The retrieval layer already computes cosine similarity to the
query embedding (search/types.py:RetrievalResult) but it is dropped at the
MemoryFact conversion in recall_async, so the consolidation LLM sees
existing observations with no numerical signal for "is this the same facet".
Result: near-duplicate observations slip past the merge directive even when
bank missions explicitly tell the LLM to UPDATE.

Changes:
- adds MemoryFact.similarity, propagated from ScoredResult.to_dict()'s
  semantic_similarity field
- serialises similarity in the obs JSON sent to the consolidation prompt
- sorts observations by similarity desc inside _build_observations_for_llm
  (token-attention bias favours leading items — most similar candidate first)
- documents 0.85 / 0.95 thresholds in the prompt so the LLM can act on them
- adds unit tests for both the sort order and the prompt documentation
RecallResult in http.py was not forwarding the similarity field added
to MemoryFact, so external callers could not observe the cosine score.
Adds the field to the response model and the fact-to-result converter.
