Commit 836fe54
fix(celery T2.1): revert cutover to worker single-TX + add graph cleanup dispatch
Three architect rulings (msg=492315e8) applied as a follow-up on top
of T2.1 commit 7f51d44:
**Ruling 1 — §F.3 cutover MUST be single-TX in worker, not split.**
My earlier T2.1 implementation split §F.3 into:
- worker writes statement 1 (status=ACTIVE) on sync success
- reconciler.reconcile_cutover() writes statements 2+3 (demote+promote)
~30s later on its next cycle
That's a drift from the §F.3 explicit lock "Three statements in one
transaction" + introduces an ACTIVE-but-not-is_serving inconsistency
window the spec disallows in §F.4.
Fix:
- aperag/indexing/orchestrator.py: replace _finalize_active with
_finalize_active_with_cutover — runs all 3 §F.3 statements
(status=ACTIVE → demote prior is_serving for (doc, modality) →
promote new) inside the same session.begin() block. The §F.1
partial unique index (uniq_document_index_v2_serving) is honoured
naturally because statement 2 (demote-FALSE) precedes statement 3
(promote-TRUE) — index never sees two TRUE rows mid-TX.
- aperag/indexing/reconciler.py: remove reconcile_cutover() entirely
+ drop from run_reconcile_loop (now 3 scans: PENDING dispatch +
FAILED retry + RUNNING reclaim). Module docstring updated to
reference the orchestrator-side cutover with explicit pointer to
architect ruling msg=492315e8 Ruling 1.
- aperag/indexing/__init__.py: remove reconcile_cutover from exports.
- tests: 3 reconciler-cutover tests rewritten as 3 orchestrator-side
cutover tests (process_one_task happy path → ACTIVE+is_serving=TRUE
in one TX; partial unique invariant; per-modality independence).
E2E smoke updated — no longer needs a separate reconcile_cutover
step after orchestrator completion.
**Ruling 3 — graph cleanup MUST land in T2.1, not punt to T2.2.**
T2.2 lane is quota + bulkhead per Ruling 2 simplification; cleanup
is squarely chenyexuan's lane. Earlier T2.1 logged a WARN + skipped
graph backend cleanup as a "T2.2 follow-up" — that was a scope leak.
Fix in aperag/indexing/cleanup.py:
- Two distinct entry points with different graph semantics:
(A) cleanup_orphan_parse_versions — orphan parse_v GC. For graph,
this is a backend no-op because §D.3.6 sync supersede already
removed old parse_version's lineage members when the new
parse_version was written (per amended §D.3.2 canonical PR
#1725 head a0a4799 — sync clears by document_id, not parse_v).
Counted under "graph_noop" for telemetry; DB row still GC'd.
(B) cleanup_for_deleted_documents — caller-driven, runs when a
document is removed. For non-graph: flat backend delete per
(document_id, parse_version). For graph: invoke the lineage
cleanup loop on the worker's underlying LineageGraphStore via
EntityLock (Wave 1 conventions exposed as _store + _entity_lock).
Per-document dedup so multiple parse_version rows for the same
doc share one lineage cleanup call.
- _is_graph_worker uses Modality.GRAPH check (no graph.py import)
to avoid pulling Nebula/Neo4j extras into the cleanup module.
- _flat_backend_delete_callable renamed for clarity (was
_backend_delete_callable); only walks worker._backend (vector /
fulltext / summary / vision); explicitly does NOT walk _store.
- _cleanup_graph_lineage_for_document implements §D.3 lineage
cleanup at the storage layer using duck-typed Wave 1 conventions
(find_entity_ids_with_lineage / remove_entity_lineage_member /
optional gc_entity_if_orphan + symmetric relation cleanup). Each
entity is serialized through entity_lock.acquire() so a
concurrent graph sync cannot race.
Tests:
- test_cleanup_orphan_parse_version_for_graph_is_backend_noop —
asserts graph_noop counter increments + backend delete is NOT
called + DB row is still GC'd
- test_cleanup_for_deleted_documents_removes_non_graph_backend_per_parse_version
— vector path: delete_by_filter per parse_version + rows dropped
- test_cleanup_for_deleted_documents_calls_graph_lineage_cleanup_once_per_doc
— graph path: lineage cleanup runs once per doc regardless of how
many parse_version rows; entity GC + relation cleanup invoked
- test_cleanup_for_deleted_documents_handles_empty_input
- _StubLineageGraphStore + _StubAsyncLock + _GraphLikeWorker
test fixtures — duck-typed stand-ins for GraphModalityWorker so we
don't need graph extras to test cleanup
Local pytest: full indexing suite + Phase 3 audit gate = 104 passed,
0 failed (Wave 1 + T2.1 + T2.2 + T2.1 follow-up all green together
on rebased branch on Bryce's commit 057409f).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>1 parent 057409f commit 836fe54
5 files changed
Lines changed: 691 additions & 218 deletions
File tree
- aperag/indexing
- tests/unit_test/indexing
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
| 30 | + | |
30 | 31 | | |
31 | 32 | | |
32 | 33 | | |
| |||
127 | 128 | | |
128 | 129 | | |
129 | 130 | | |
130 | | - | |
131 | 131 | | |
132 | 132 | | |
133 | 133 | | |
| |||
242 | 242 | | |
243 | 243 | | |
244 | 244 | | |
245 | | - | |
246 | 245 | | |
247 | 246 | | |
248 | 247 | | |
249 | 248 | | |
250 | 249 | | |
251 | 250 | | |
| 251 | + | |
252 | 252 | | |
253 | 253 | | |
254 | 254 | | |
| |||
0 commit comments