Skip to content

Commit 9c94cbc

Browse files
earayuclaude
andauthored
test(compat): task #61 P1 — bulk_upsert_entity_with_lineage_parts cross-backend (#1927)
* test(compat): task #61 P1 — bulk_upsert_entity_with_lineage_parts cross-backend PM @不穷 elevated this Protocol method as a P0 audit gap (msg=10b753e8). Until now ``bulk_upsert_entity_with_lineage_parts`` (Wave 8 W8-2) had no cross-backend test in `tests/integration/compat/`, even though all three production backends (Postgres / Neo4j / Nebula) implement it and the indexing worker uses it for the LineageEntityMerger merge step. Bulk write paths are exactly where backend differences emerge — batch size limits, transaction atomicity, error handling, dedup contract — and the lack of a parametrized matrix here meant any silent drift in the bulk semantics would survive merge. This adds 7 new parametrized cases that pin the Protocol contract declared in `aperag/indexing/graph.py:575+`: * empty parts is a no-op (no implicit row creation) * mixed-name parts raise ValueError (atomicity guarantee) * round-trip: 3 distinct (document_id, parse_version) parts visible after * dedup last-wins within a single bulk call * bulk replaces existing rows on matching key (same as single upsert) * bulk with distinct keys appends, never wipes pre-existing lineage * per-part entity_type follows last-wins rule Coverage delta: 30 → 37 cross-backend cases (collect-only verified). Sister to chenyexuan PR #1926 — without that workflow path fix, this test never triggered on PRs that touch `aperag/indexing/graph_storage/*`. Both PRs together restore real CI gating on cross-backend regressions for the LineageGraphStore Protocol surface. Part of task #61 DB compat audit (earayu2 directive msg=f26b703e), testing-lane slice (task #67, claimed via msg=e02c3028). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(compat): task #61 P1 — fold huangheng+ziang NIT into bulk_upsert tests Two non-blocking NITs from @huangheng msg=99b5ffd5 + @ziang msg=84f5c3cc re-CR on PR #1927 — fold-in to land more complete test: * `_rejects_mixed_names` now also asserts post-raise zero-side-effect (`get_entity("Alice") is None` + `get_entity("Bob") is None`) — pins Lesson #12 v6.4 aggregation-chain invariant: a backend that swapped validation order to raise AFTER the first row write would silently leak partial state. * New `_replay_is_idempotent` case — pins the Protocol's "Forward-only retry safety: per-part dedup so replays are idempotent" contract. A backend that appended on replay (instead of dedup-then-replace) would silently duplicate lineage members under retry. Coverage delta: 37 → 38 cross-backend cases. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(compat): task #61 P1 — fold huangzhangshu description_parts NIT Per @huangzhangshu testing primary CR (msg=5bbc5d1a) — the bulk_upsert cases pinned lineage member identity but did not assert ``description_parts`` text content. A backend could write the lineage member key correctly but silently drop or stale-keep the description text, breaking the agent context retrieval contract. Add `description_parts` key→text assertions to 3 cases: * `_round_trip` — all 3 (doc_id, parse_version) parts must carry their source bulk's description text (not silently dropped). * `_dedup_last_wins_within_bulk` — same-key collapse must keep the LAST description text within the bulk (not first). * `_replaces_existing_same_key` — bulk's strip-then-append must overwrite the prior single-write description (not silently keep it). * `_replay_is_idempotent` — replay must overwrite first call's description with the second's (last-wins on replay), not just dedup the member. Coverage delta: same 38 cases, but every dedup/replace/replay case now pins both lineage AND description_parts text contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 43648f9 commit 9c94cbc

1 file changed

Lines changed: 213 additions & 0 deletions

File tree

tests/integration/compat/test_lineage_graph_compat.py

Lines changed: 213 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -838,3 +838,216 @@ async def test_delete_relation_returns_false_when_absent(store, collection_id):
838838
_, s = store
839839
deleted = await s.delete_relation("Alice", "Bob", "knows")
840840
assert deleted is False
841+
842+
843+
# --- task #61 P0 — bulk_upsert_entity_with_lineage_parts cross-backend -----
844+
#
845+
# This Protocol method (Wave 8 W8-2) was previously not exercised by any
846+
# cross-backend test (冬柏 task #67 audit msg=3e93bb64). Bulk write paths
847+
# are exactly where backend differences emerge: batch limits / atomicity /
848+
# error handling / dedup contract. PM @不穷 elevated this to P0 in
849+
# msg=10b753e8. The Protocol contract (`aperag/indexing/graph.py:575+`):
850+
# * All record.name MUST share the same string — raise ValueError otherwise.
851+
# * Empty parts is a no-op.
852+
# * Per-part record.entity_type follows last-wins.
853+
# * compacted_description is intentionally NOT a parameter.
854+
# * Same dedup-by-(document_id, parse_version) key as single upsert.
855+
# * Forward-only retry safety: per-part dedup so replays are idempotent.
856+
857+
858+
@pytest.mark.asyncio
859+
async def test_bulk_upsert_entity_with_lineage_parts_empty_is_noop(store, collection_id):
860+
"""Per Protocol contract `empty parts is a no-op`. The store MUST
861+
not raise and MUST not create any row."""
862+
863+
_, s = store
864+
await s.bulk_upsert_entity_with_lineage_parts(parts=[])
865+
# Round-trip: get_entity should return None for a name that was never
866+
# written — the empty bulk must not implicitly create any entity.
867+
assert await s.get_entity("Alice") is None
868+
869+
870+
@pytest.mark.asyncio
871+
async def test_bulk_upsert_entity_with_lineage_parts_rejects_mixed_names(store, collection_id):
872+
"""Per Protocol contract `all record.name values MUST share the same
873+
string — backends MAY assert and raise ValueError if they don't`.
874+
All 3 production backends MUST raise to keep the bulk path's atomic
875+
guarantee honest (a mixed-name bulk would silently fan out to N
876+
different entity rows — atomicity meaningless)."""
877+
878+
_, s = store
879+
with pytest.raises(ValueError):
880+
await s.bulk_upsert_entity_with_lineage_parts(
881+
parts=[
882+
(_entity("Alice"), _LM_A_V1),
883+
(_entity("Bob"), _LM_B_V1),
884+
],
885+
)
886+
# Per @huangheng msg=99b5ffd5 + @ziang msg=84f5c3cc NIT — Lesson
887+
# #12 v6.4 (aggregation chain): a backend that raised AFTER writing
888+
# the first row would silently leak partial state. Pin
889+
# "raise-then-zero-side-effect" so any backend that swaps validation
890+
# order regresses loudly.
891+
assert await s.get_entity("Alice") is None, "raise must occur before any row write"
892+
assert await s.get_entity("Bob") is None, "raise must occur before any row write"
893+
894+
895+
@pytest.mark.asyncio
896+
async def test_bulk_upsert_entity_with_lineage_parts_round_trip(store, collection_id):
897+
"""Bulk write 3 distinct (document_id, parse_version) parts for the
898+
same entity name; all 3 lineage members + description parts must be
899+
visible after a single round-trip."""
900+
901+
_, s = store
902+
await s.bulk_upsert_entity_with_lineage_parts(
903+
parts=[
904+
(_entity("Alice", description="from-doc-A-v1"), _LM_A_V1),
905+
(_entity("Alice", description="from-doc-A-v2"), _LM_A_V2),
906+
(_entity("Alice", description="from-doc-B-v1"), _LM_B_V1),
907+
],
908+
)
909+
got = await s.get_entity("Alice")
910+
assert got is not None
911+
keys = {(lm.document_id, lm.parse_version) for lm in got.source_lineage}
912+
assert keys == {("doc-A", "v1"), ("doc-A", "v2"), ("doc-B", "v1")}, (
913+
f"all 3 (document_id, parse_version) members must be visible after bulk; got {keys}"
914+
)
915+
# Per @huangzhangshu msg=5bbc5d1a — description_parts text must
916+
# round-trip alongside lineage keys; a backend that wrote the
917+
# lineage member but dropped the description text would silently
918+
# break agent context retrieval.
919+
parts_by_key = {(p.document_id, p.parse_version): p.text for p in got.description_parts}
920+
assert parts_by_key[("doc-A", "v1")] == "from-doc-A-v1"
921+
assert parts_by_key[("doc-A", "v2")] == "from-doc-A-v2"
922+
assert parts_by_key[("doc-B", "v1")] == "from-doc-B-v1"
923+
924+
925+
@pytest.mark.asyncio
926+
async def test_bulk_upsert_entity_with_lineage_parts_dedup_last_wins_within_bulk(store, collection_id):
927+
"""Per Protocol contract `parts sharing the same key collapse
928+
last-wins`. Two parts in the same bulk with the same
929+
(document_id, parse_version) MUST collapse to one row, with the
930+
second part's description winning."""
931+
932+
_, s = store
933+
await s.bulk_upsert_entity_with_lineage_parts(
934+
parts=[
935+
(_entity("Alice", description="first-write"), _LM_A_V1),
936+
(_entity("Alice", description="last-write"), _LM_A_V1),
937+
],
938+
)
939+
got = await s.get_entity("Alice")
940+
assert got is not None
941+
matching = [lm for lm in got.source_lineage if lm.document_id == "doc-A" and lm.parse_version == "v1"]
942+
assert len(matching) == 1, f"same-key parts must collapse to one member; got {len(matching)}"
943+
# Per @huangzhangshu msg=5bbc5d1a — last-wins is on description
944+
# text not just lineage member. A backend that collapsed to one
945+
# member but kept the FIRST description text would silently break
946+
# the contract.
947+
parts_by_key = {(p.document_id, p.parse_version): p.text for p in got.description_parts}
948+
assert parts_by_key[("doc-A", "v1")] == "last-write", (
949+
f"same-key dedup MUST keep last description text; got {parts_by_key.get(('doc-A', 'v1'))!r}"
950+
)
951+
952+
953+
@pytest.mark.asyncio
954+
async def test_bulk_upsert_entity_with_lineage_parts_replaces_existing_same_key(store, collection_id):
955+
"""An existing single upsert with key (doc-A, v1) must be replaced
956+
when a subsequent bulk write contains the same key — same dedup
957+
contract as single upsert. Per Protocol: bulk strips existing rows
958+
whose key matches any incoming part, then appends the new parts."""
959+
960+
_, s = store
961+
await s.upsert_entity_with_lineage(record=_entity("Alice", description="single"), lineage=_LM_A_V1)
962+
await s.bulk_upsert_entity_with_lineage_parts(
963+
parts=[(_entity("Alice", description="bulk"), _LM_A_V1)],
964+
)
965+
got = await s.get_entity("Alice")
966+
assert got is not None
967+
matching = [lm for lm in got.source_lineage if lm.document_id == "doc-A" and lm.parse_version == "v1"]
968+
assert len(matching) == 1, "single + bulk on same key must collapse to one member"
969+
# Per @huangzhangshu msg=5bbc5d1a — bulk's strip-then-append MUST
970+
# keep the bulk-side description text (not the prior single
971+
# upsert's). A backend that kept the single-side text would silently
972+
# fail the strip semantics.
973+
parts_by_key = {(p.document_id, p.parse_version): p.text for p in got.description_parts}
974+
assert parts_by_key[("doc-A", "v1")] == "bulk", (
975+
f"bulk write MUST overwrite single-write description; got {parts_by_key.get(('doc-A', 'v1'))!r}"
976+
)
977+
978+
979+
@pytest.mark.asyncio
980+
async def test_bulk_upsert_entity_with_lineage_parts_appends_distinct_keys(store, collection_id):
981+
"""An existing single upsert with key (doc-A, v1) must coexist with
982+
a subsequent bulk write containing distinct keys — bulk must NOT
983+
wipe unrelated lineage members. This is the cross-backend dedup
984+
invariant the Protocol pins."""
985+
986+
_, s = store
987+
await s.upsert_entity_with_lineage(record=_entity("Alice"), lineage=_LM_A_V1)
988+
await s.bulk_upsert_entity_with_lineage_parts(
989+
parts=[
990+
(_entity("Alice"), _LM_A_V2),
991+
(_entity("Alice"), _LM_B_V1),
992+
],
993+
)
994+
got = await s.get_entity("Alice")
995+
assert got is not None
996+
keys = {(lm.document_id, lm.parse_version) for lm in got.source_lineage}
997+
assert keys == {("doc-A", "v1"), ("doc-A", "v2"), ("doc-B", "v1")}, (
998+
f"bulk with distinct keys MUST NOT wipe pre-existing lineage; got {keys}"
999+
)
1000+
1001+
1002+
@pytest.mark.asyncio
1003+
async def test_bulk_upsert_entity_with_lineage_parts_entity_type_last_wins(store, collection_id):
1004+
"""Per Protocol contract `per-part record.entity_type follows the
1005+
single-upsert "most recently observed value wins" rule (last
1006+
tuple's type is the post-write entity_type for the row)`. The
1007+
final row's entity_type MUST match the last bulk part's type."""
1008+
1009+
_, s = store
1010+
await s.bulk_upsert_entity_with_lineage_parts(
1011+
parts=[
1012+
(_entity("Alice", entity_type="person"), _LM_A_V1),
1013+
(_entity("Alice", entity_type="researcher"), _LM_A_V2),
1014+
],
1015+
)
1016+
got = await s.get_entity("Alice")
1017+
assert got is not None
1018+
assert got.entity_type == "researcher", f"last bulk part's entity_type wins; got {got.entity_type}"
1019+
1020+
1021+
@pytest.mark.asyncio
1022+
async def test_bulk_upsert_entity_with_lineage_parts_replay_is_idempotent(store, collection_id):
1023+
"""Per Protocol contract `Forward-only retry safety: the dedup key
1024+
is per-part, so a mid-flight crash + retry replays each tuple
1025+
idempotently`. Two consecutive bulk calls with the same key MUST
1026+
collapse to one lineage member (idempotent) AND the second call's
1027+
description MUST overwrite the first (last-wins on replay).
1028+
1029+
Per @huangheng msg=99b5ffd5 + @ziang msg=84f5c3cc NIT — replay
1030+
safety is the critical retry-correctness invariant; a backend that
1031+
appended on replay (instead of dedup-then-replace) would silently
1032+
duplicate lineage members under retry.
1033+
"""
1034+
1035+
_, s = store
1036+
await s.bulk_upsert_entity_with_lineage_parts(
1037+
parts=[(_entity("Alice", description="v1-first"), _LM_A_V1)],
1038+
)
1039+
await s.bulk_upsert_entity_with_lineage_parts(
1040+
parts=[(_entity("Alice", description="v1-replay"), _LM_A_V1)],
1041+
)
1042+
got = await s.get_entity("Alice")
1043+
assert got is not None
1044+
matching = [lm for lm in got.source_lineage if lm.document_id == "doc-A" and lm.parse_version == "v1"]
1045+
assert len(matching) == 1, f"replay must be idempotent (no duplicate member); got {len(matching)}"
1046+
# Per @huangzhangshu msg=5bbc5d1a — replay safety is not just about
1047+
# member dedup; the replay's description text MUST overwrite the
1048+
# first call's (last-wins on replay). A backend that kept the first
1049+
# text would silently fail forward-progress on retry.
1050+
parts_by_key = {(p.document_id, p.parse_version): p.text for p in got.description_parts}
1051+
assert parts_by_key[("doc-A", "v1")] == "v1-replay", (
1052+
f"replay MUST overwrite description (last-wins); got {parts_by_key.get(('doc-A', 'v1'))!r}"
1053+
)

0 commit comments

Comments
 (0)