
Commit fb8e040

dzmitrys-dev and claude committed
fix(14-A): write chunk text under payload['document'] to match retrieval contract
Production retrieval (`tuned_hybrid.py`) reads the chunk body via `payload.get("document")`. Phase 14-A's `ingest()` wrote it to `payload["text"]` instead, so every bench query returned chunk objects with empty `.text` attributes. The scoped and unscoped passes ran end to end, but they measured question-only token counts (`tpca ≈ input_tokens_p50`) rather than retrieval-augmented context.

The bug was found by scrolling the Qdrant payloads directly and comparing them, after the first full bench run produced a deceptively good PASS (tpca=25.94 against a gate of ≤962.2) alongside `recall_at_5=0.0` across all 470 records and 5 axes: a classic too-good-to-be-true result.

Add an explicit anti-regression assertion to `test_ingest_upserts_one_point_per_haystack_turn`: the payload MUST contain "document" and MUST NOT contain "text". This locks the contract so future edits cannot silently reintroduce the field-name drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
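A gate PASS paired with zero recall, as in the run above, can be flagged mechanically. The sketch below is hypothetical: `BenchSummary`, `degenerate_reasons`, and the 5% tolerance are illustrative names and values, not part of this repository. It assumes the per-record `recall_at_5` values and the `tpca` / `input_tokens_p50` aggregates are available after a run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchSummary:
    """Hypothetical aggregate of one bench run (field names are illustrative)."""
    tpca: float               # context-token metric that is compared against the gate
    input_tokens_p50: float   # median question-only input tokens
    recall_at_5: list[float]  # per-record recall@5


def degenerate_reasons(s: BenchSummary, rel_tol: float = 0.05) -> list[str]:
    """Return reasons the run looks too good to be true; an empty list means no red flags."""
    reasons: list[str] = []
    # Retrieval that never surfaces a relevant chunk is a red flag, not a win.
    if s.recall_at_5 and max(s.recall_at_5) == 0.0:
        reasons.append("recall_at_5 is 0.0 for every record")
    # If the context metric barely differs from the question alone, retrieval added nothing.
    if s.input_tokens_p50 > 0 and abs(s.tpca - s.input_tokens_p50) <= rel_tol * s.input_tokens_p50:
        reasons.append("tpca is within tolerance of input_tokens_p50 (question-only context)")
    return reasons
```

For example, `degenerate_reasons(BenchSummary(tpca=30.0, input_tokens_p50=29.0, recall_at_5=[0.0] * 470))` returns both reasons even though 30.0 is far under a ≤962.2 gate (the numbers are made up for illustration). Running a check like this before reading the gate verdict turns a silent PASS into a visible failure.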
1 parent d6de38c commit fb8e040

2 files changed

Lines changed: 17 additions & 3 deletions


src/supamem/eval/longmemeval_ingest.py

Lines changed: 6 additions & 1 deletion
```diff
@@ -209,8 +209,13 @@ def ingest(
                     _SPARSE_VECTOR_NAME: sparse_vec,
                 },
                 payload={
+                    # Production retrieval (`tuned_hybrid.py`) reads chunk text
+                    # from `payload["document"]`. Using "text" here makes
+                    # retrieved chunks have an empty `.text` attribute — the
+                    # bench scoped/unscoped passes then measure nothing
+                    # meaningful. Match the production contract.
                     "session_id": sid,
-                    "text": text,
+                    "document": text,
                     "axis": axis,
                 },
             )
```
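The contract this change restores can be pinned to a single name shared by the writer and the reader. This is a minimal sketch, assuming retrieval falls back to an empty string when the key is missing; `CHUNK_TEXT_KEY`, `make_payload`, and `chunk_text` are hypothetical, and the actual `tuned_hybrid.py` lookup may differ.

```python
from typing import Any

# Hypothetical single source of truth shared by the writer (ingest) and the
# reader (retrieval). The commit itself uses the string literal "document".
CHUNK_TEXT_KEY = "document"


def make_payload(session_id: str, text: str, axis: str) -> dict[str, Any]:
    """Writer side: build a point payload shaped like the one ingest() now writes."""
    return {"session_id": session_id, CHUNK_TEXT_KEY: text, "axis": axis}


def chunk_text(payload: dict[str, Any]) -> str:
    """Reader side: lenient lookup that falls back to "" when the key is absent (assumed)."""
    return payload.get(CHUNK_TEXT_KEY, "")


old_payload = {"session_id": "sess-A", "text": "hello", "axis": "demo"}
print(repr(chunk_text(old_payload)))                          # '' (silently empty)
print(repr(chunk_text(make_payload("sess-A", "hello", "demo"))))  # 'hello'
```

A payload written under the old `"text"` key reads back as `""` without any error, which is why the bench ran to completion. A strict lookup (`payload[CHUNK_TEXT_KEY]`) would raise `KeyError` instead and fail loudly on the first query.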

tests/test_longmemeval_ingest.py

Lines changed: 11 additions & 2 deletions
```diff
@@ -196,7 +196,7 @@ def test_ingest_payload_index_idempotent(patch_embedders: None) -> None:
 
 
 def test_ingest_upserts_one_point_per_haystack_turn(patch_embedders: None) -> None:
-    """2 sessions × 3 turns → exactly 6 upserted points; payloads carry session_id + text."""
+    """2 sessions × 3 turns → exactly 6 upserted points; payloads carry session_id + document."""
     client = MagicMock()
     client.get_collections.return_value = MagicMock(collections=[])
 
@@ -231,7 +231,16 @@ def test_ingest_upserts_one_point_per_haystack_turn(patch_embedders: None) -> No
         payload = getattr(pt, "payload", None)
         assert payload is not None
         assert "session_id" in payload
-        assert "text" in payload
+        # Production retrieval contract: chunk text MUST live under
+        # `payload["document"]` so `tuned_hybrid.py` reads it via
+        # `payload.get("document")`. Writing to `payload["text"]` would
+        # make retrieved chunks have empty `.text` attributes and the
+        # bench would silently measure question-only token counts.
+        assert "document" in payload
+        assert "text" not in payload, (
+            "ingest must write chunk text under payload['document'], not 'text', "
+            "to match the production retrieval contract (tuned_hybrid)."
+        )
         assert payload["session_id"] in {"sess-A", "sess-B"}
 
 
```
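The updated test inspects the points captured by a `MagicMock` client. The sketch below reproduces that pattern in isolation. `fake_ingest` and `upserted_points` are hypothetical helpers, the points are `SimpleNamespace` stand-ins rather than Qdrant `PointStruct` objects, and it assumes `ingest()` hands points to `client.upsert` through the `points=` keyword.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def fake_ingest(client, sessions: dict[str, list[str]]) -> None:
    """Hypothetical stand-in for ingest(): one upsert call carrying one point per turn."""
    points = [
        SimpleNamespace(
            id=f"{sid}-{i}",
            payload={"session_id": sid, "document": turn, "axis": "demo"},
        )
        for sid, turns in sessions.items()
        for i, turn in enumerate(turns)
    ]
    client.upsert(collection_name="bench", points=points)


def upserted_points(client: MagicMock) -> list:
    """Collect every point passed as points=... across all recorded upsert calls."""
    collected = []
    for call in client.upsert.call_args_list:
        collected.extend(call.kwargs.get("points", []))
    return collected


client = MagicMock()
fake_ingest(client, {"sess-A": ["a1", "a2", "a3"], "sess-B": ["b1", "b2", "b3"]})
pts = upserted_points(client)

# Same contract checks as the test: 2 sessions x 3 turns, text under "document" only.
assert len(pts) == 6
for pt in pts:
    payload = getattr(pt, "payload", None)
    assert payload is not None
    assert "document" in payload
    assert "text" not in payload, "chunk text must live under payload['document']"
```

Gathering points across `call_args_list` rather than reading only `call_args` keeps the check valid if ingestion is later split into several batched upserts.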
