feat(graph Wave 7 W7-3): GraphModalityWorker.sync() Phase 3 #1757
Adds Phase 3 to ``GraphModalityWorker.sync()``, the four steps that
turn the lineage-store rebuild (Phase 1+2, Wave 4-6) into a complete
LightRAG-style graph index: compact, embed + vector upsert,
snapshot-diff delete, and merge-candidate detect.
Step ordering (per spec §K.12.3 + huangheng msg=16a38734 invariant
list):
1. **Compactor** (W7-2): for each entity / relation just touched by
this sync, run ``GraphIndexCompactor.compact_if_oversized`` over the
per-doc ``description_parts`` and write the unified summary back via
``upsert_*_with_lineage(..., compacted_description=...)``. Returning
``None`` (below threshold or compactor unwired) leaves the
COALESCE-preserved column alone.
2. **Embed + vector upsert**: hash the compacted summary (or the
``name`` + raw parts fallback when the compactor opted out) and
``await asyncio.to_thread(connector.upsert, [VectorPoint(...)])``
the result. Vector point id is
``uuid5(NAMESPACE_DNS, f"graph_entity:{collection_id}:{name}")`` —
deterministic so a re-sync overwrites instead of leaking. Payload is
the 3-field shape locked at spec §K.12.5 ratify msg=acbd0003 /
msg=d3f4e6f8: ``{indexer, entity_name, entity_type}``. No
``collection_id`` payload — that lives in the uuid5 id (cross-
collection uniqueness in a shared backing store) while the connector
handles per-tenant guard.
3. **Snapshot-diff delete**: pre-sync entity-name set
(``find_entity_ids_with_lineage`` from Phase 1) minus post-sync set
(kg.jsonl entities ∪ pre-sync entities still alive after gc) → ids
to delete. Computed against the lineage store, never an ANN
``list_all`` (invariant #7).
4. **MergeCandidateDetector** (W7-4, optional): pass the affected
entity names to ``detect_for_sync`` so PENDING auto-detect
suggestions get persisted for the curator UI. D-3 lock — detector
never auto-merges.
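The deterministic id and the 3-field payload from step 2 can be illustrated with a minimal sketch. The helper names `entity_vector_id` and `entity_payload` are hypothetical, chosen for illustration only; the id key format and the payload keys are the ones stated above.

```python
import uuid


def entity_vector_id(collection_id: str, name: str) -> str:
    """Hypothetical helper: derive the point id from (collection, entity name)."""
    key = f"graph_entity:{collection_id}:{name}"
    # uuid5 is a pure function of (namespace, key), so a re-sync of the same
    # entity yields the same id and the upsert overwrites the old point.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, key))


def entity_payload(name: str, entity_type: str) -> dict:
    """Hypothetical helper: the 3-field payload, with no collection_id key."""
    return {
        "indexer": "graph_entity",
        "entity_name": name,
        "entity_type": entity_type,
    }


first = entity_vector_id("col-1", "Linus Torvalds")
resync = entity_vector_id("col-1", "Linus Torvalds")
other_collection = entity_vector_id("col-2", "Linus Torvalds")
print(first == resync, first == other_collection)  # prints: True False
```

Because `collection_id` is part of the hashed key, the same entity name in two collections maps to two different point ids even when both live in one shared backing store.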
The Phase 3 dependencies (``compactor``, ``embedder``,
``vector_connector``, ``merge_detector``) are all optional kwargs.
Wave 6 callers that don't wire them get the lineage-only behaviour
unchanged — Phase 3 returns early when the vector connector or
embedder is unset. Failures inside any step (compactor LLM flake,
embedder error, vector backend hiccup, detector raise) are logged and
swallowed so the lineage critical path always survives a partial
Phase 3.
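The degradation contract (early return without deps, per-step failures logged and swallowed) might look roughly like the sketch below. It is an illustration under assumptions, not the worker's code: `run_step` and `phase3` are hypothetical names, and the real worker guards each step inline rather than through a shared helper.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("phase3_sketch")

Step = Callable[[], Awaitable[None]]


async def run_step(label: str, step: Step) -> bool:
    """Run one Phase 3 step; log and swallow any failure."""
    try:
        await step()
        return True
    except Exception:
        logger.warning("Phase 3 step %r failed; continuing", label, exc_info=True)
        return False


async def phase3(
    embedder: Optional[object],
    vector_connector: Optional[object],
    steps: Sequence[Tuple[str, Step]],
) -> List[str]:
    """Return the labels of the steps that completed."""
    # Lineage-only behaviour: without embedder + connector, Phase 3 is a no-op.
    if embedder is None or vector_connector is None:
        return []
    completed = []
    for label, step in steps:
        if await run_step(label, step):
            completed.append(label)
    return completed


async def ok() -> None:
    return None


async def boom() -> None:
    raise RuntimeError("vector backend hiccup")


steps = [("compact", ok), ("upsert", boom), ("delete", ok), ("detect", ok)]
skipped = asyncio.run(phase3(None, None, steps))
partial = asyncio.run(phase3(object(), object(), steps))
print(skipped, partial)  # prints: [] ['compact', 'delete', 'detect']
```

The point of the pattern is that an exception in one step never propagates to the caller, so the lineage writes done in Phase 1+2 are unaffected by a partial Phase 3.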
``worker_factory._build_graph_worker`` wires production deps:
* ``compactor`` via the shared ``build_collection_llm_callable`` (the
same resolver the graph extractor uses).
* ``vector_connector`` + ``embedder`` via the existing
``_build_collection_qdrant_connector`` helper — the graph entity /
relation vectors go into the same Qdrant collection as chunk
vectors, distinguished by the ``indexer`` payload key.
* ``merge_detector`` constructed once the connector + embedder
resolved, with a thin shim adapting the sync ``(text -> list[float])``
callable used by the graph worker into the ``embed_query`` shape the
detector expects.
When any of these fail to construct (no completion model,
collection's embedder unresolved, etc.) the factory logs a warning and
falls back to the no-op default, mirroring the summary worker pattern.
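The embedder shim mentioned for ``merge_detector`` can be pictured as a small adapter class. This is a sketch only: the class name `SyncEmbedderShim` is hypothetical, and whether the detector's `embed_query` is synchronous or awaited is an assumption (synchronous here).

```python
from typing import Callable, List


class SyncEmbedderShim:
    """Hypothetical adapter exposing `embed_query` over a text -> vector callable."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn

    def embed_query(self, text: str) -> List[float]:
        # Delegate unchanged; the shim only renames the call shape.
        return self._embed_fn(text)


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder for the example: one dimension, the text length."""
    return [float(len(text))]


shim = SyncEmbedderShim(toy_embed)
print(shim.embed_query("abc"))  # prints: [3.0]
```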
Tests:
* 12 new InMemory unit tests in
``tests/unit_test/indexing/test_t1_2_graph.py``, one per case
(Phase 3 skipped without deps; 3-field payload exact;
uuid5 id deterministic across resyncs; compactor before embed;
fallback on compactor None; snapshot-diff delete on doc cascade;
cross-doc shared entity not deleted; relation Phase 3 mirrors
entity; detector receives correct names; detector failure
non-fatal; compactor failure non-fatal; vector upsert failure
non-fatal). All 1117 unit tests pass.
§K.12 invariant cross-check: this PR materially honours #2 (L1 → L2
single-direction derivation), #3 (Compactor runs before vector
embed), #4 (vector store via
``VectorStoreConnector(Adaptor)`` abstraction — no Qdrant client
import), #5 (3-field payload + ``indexer`` filter), #6 (uuid5
deterministic id), #7 (snapshot-diff via lineage name set, not ANN
list-all), #11 (D-3 detector writes-only, never auto-merges), #12
(no LightRAG strings introduced).
4-pattern pre-check matrix:
* P1 v1: ``GraphModalityWorker.sync`` callers — orchestrator wiring
in ``aperag/indexing/`` only; the new optional kwargs default to
``None`` so call shape is backward-compat.
* P1 v2: caller return-shape expectations — sync still returns
``None``; nothing observable beyond Phase 3 side effects on the
vector store + suggestion table.
* P2 (state binding): ``GraphIndexCompactor`` is W7-2 (merged
``c1c48429``), ``MergeCandidateDetector`` is W7-4 (merged
``0dbf9fd1``), ``VectorStoreConnector`` is the existing Wave 4
abstraction. All three already in main.
* P3 (Protocol method state): no new ``LineageGraphStore`` Protocol
methods needed — Phase 3 reads via existing ``get_entity`` /
``get_relation`` and writes via the ``compacted_description`` kwarg
shipped in W7-1.
Closes Wave 7 task #3.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
earayu
left a comment
🟢 LGTM ✅ (huangheng pass-1, per spec §K.12.11). GitHub does not allow approving from the same account; verdict = ready to merge.
The strict four-step Phase 3 ordering, the 3-field payload, the uuid5 deterministic id, the snapshot-diff via name set, Wave 6 backward compatibility (Phase 3 short-circuits when its optional deps are all None), and failure swallowing that keeps the lineage critical path inviolate all align with spec §K.12.3, my msg=16a38734 7-invariant lock, and the architect's msg=acbd0003 lock. This is the sixth PR to meet the hard-gate format (mirroring the standard of PR #1751/#1752/#1754/#1755/#1756).
12-invariant cross-check (task #3 scope)
| # | Invariant | This PR | Evidence |
|---|---|---|---|
| 1 | L1 graph data not polluted (kg.jsonl raw vs storage view strictly layered) | ✅ | kg.jsonl untouched; Phase 3 only reads description_parts and writes the derived compacted_description and vector points |
| 2 | L1 → L2 single-direction derivation | ✅ | Compactor reads description_parts → writes compacted_description (derivation inside L1) → embed → vector point (L2 from L1); never the reverse |
| 3 | Compactor runs at the end of sync, before vector embed | ✅ | Phase 3 ordering at lines 1428-1463: compactor (Step A) → embed/upsert (Step B, lines 1434-1437) → snapshot-diff (Step C, lines 1445-1461) → detector (Step D); pinned by test_w7_phase3_compactor_runs_before_embed |
| 4 | Vector store via VectorStoreConnectorAdaptor | ✅ | Type hints use the abstract VectorStoreConnector Protocol; asyncio.to_thread(connector.upsert/delete, ...) per the architect's msg=acbd0003 sync-wrap pattern |
| 5 | payload indexer="graph_entity" filter / 3 fields | ✅ for entity | Lines 1537-1541: entity payload is exactly the 3 fields {indexer, entity_name, entity_type} with no collection_id leak; pinned by test_w7_phase3_writes_vector_point_with_3_field_payload |
| 6 | uuid5 deterministic vector point id | ✅ | Lines 1611-1620: entity f"graph_entity:{cid}:{name}" / relation f"graph_relation:{cid}:{src}->{tgt}:{type}"; test_w7_phase3_uuid5_id_is_deterministic pins that a re-sync of the same content yields the same id |
| 7 | snapshot-diff via lineage entity name set | ✅ | _capture_entity_names uses kg_entity_records ∪ store-survived-pre-sync; line 1449 pre_sync - post_sync = gc_entity_names; never calls vector_connector.scroll; test_w7_phase3_snapshot_diff_preserves_cross_doc_entity pins that a cross-doc shared entity is not gc'd |
| 8 | alias_map orphan-persists | n/a | task #6 scope |
| 9 | upsert transparent alias redirect | n/a | task #6 scope |
| 10 | DB column length application-layer cap | n/a | W7-1/W7-2 scope; task #3 does not touch the schema |
| 11 | Candidate detection only writes, never auto-merges (D-3) | ✅ | Step D calls merge_detector.detect_for_sync(...) (verified read-only in PR #1755); test_w7_phase3_merge_detector_invoked_with_affected_names pins the call shape |
| 12 | Naming: grep-zero LightRAG | ✅ | `gh pr diff 1757 \| grep -E '^\+' \| grep -i lightrag` → 0 added lines |
Phase 3 step-ordering strictness check

    # graph.py line 1428-1463
    # Step A — Compactor
    for entity in affected_entities_for_phase3:
        await self._compact_and_persist_entity(entity)  # ← 1st
    # Step B — Embed + vector upsert
    for entity in affected_entities_for_phase3:
        await self._upsert_entity_vector_point(entity)  # ← 2nd
    for relation in affected_relations_for_phase3:
        await self._upsert_relation_vector_point(relation)
    # Step C — Snapshot-diff delete
    gc_entity_names = pre_sync_entity_names - post_sync_entity_names
    ids = [self._entity_vector_id(name) for name in gc_entity_names]
    await asyncio.to_thread(self._vector_connector.delete, ids)  # ← 3rd
    # Step D — Detector
    await self._merge_detector.detect_for_sync(...)  # ← 4th

A perfect mirror of spec §K.12.3 + invariant #3. ✅
Optional deps + degradation check
- Existing Wave 6 sync (no Phase 3 deps) → `if not (self._embedder and self._vector_connector): return` short-circuits → the 1083 existing unit tests are not broken ✅
- Per-step failures swallowed: lines 1525-1531 / 1545-1550 / 1558-1566 / 1578-1587 are all try/except → lineage critical path inviolate ✅
- Fully covered by test_w7_phase3_skipped_when_deps_unwired + test_w7_phase3_compactor_failure_non_fatal + test_w7_phase3_vector_upsert_failure_non_fatal + test_w7_phase3_detector_failure_non_fatal
4-pattern pre-check matrix (pasted in the PR body)
P1 v1 callers + P1 v2 caller method + P2 dependency state + P3 Protocol method state are all pasted; the Phase 3 deps all reference already-merged commits (W7-1 c1499777 / W7-2 c1c48429 / W7-4 0dbf9fd1).
simple-stable directive 4-guardrail
The PR description states all 4 items explicitly: no new Protocol method / smallest surface / production helpers reuse / alembic-free.
Test quality
12 new tests + 1117 total pass:
- Phase 3 skip when unwired (Wave 6 backward-compat)
- 3-field payload exact (no collection_id leak)
- uuid5 id deterministic across resyncs
- Compactor → embed ordering invariant pinned
- Compactor None falls back to name + raw parts
- Snapshot-diff deletes gc'd vector
- Cross-doc shared entity NOT deleted (Linus Torvalds appears in doc_A + doc_B; doc_A is re-parsed and no longer mentions him → not gc'd → vector kept)
- Relation Phase 3 mirror entity (3-field payload, indexer="graph_relation", uuid5 format)
- Detector invoked with affected names
- Detector / Compactor / Vector failure non-fatal × 3
Complete.
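The snapshot-diff rule behind the cross-doc test can be sketched with plain sets. The function names `entity_vector_id` and `vector_ids_to_delete` are hypothetical, and the input sets are illustrative values standing in for the lineage-store name sets; only the set difference and the uuid5 key format come from the PR.

```python
import uuid
from typing import List, Set


def entity_vector_id(collection_id: str, name: str) -> str:
    """Hypothetical helper using the id format quoted in the review."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"graph_entity:{collection_id}:{name}"))


def vector_ids_to_delete(
    collection_id: str, pre_sync: Set[str], post_sync: Set[str]
) -> List[str]:
    """Names present before the sync but gone afterwards map to ids to delete."""
    gone = pre_sync - post_sync
    return [entity_vector_id(collection_id, name) for name in sorted(gone)]


# Before re-parsing doc_A, the lineage store knows two entities.
pre_sync = {"Linus Torvalds", "Git"}
# After the re-parse, doc_A mentions only "Linux". "Linus Torvalds" still has
# lineage from doc_B, so it survives in the store; "Git" has none left.
kg_entities = {"Linux"}
still_alive = {"Linus Torvalds"}
post_sync = kg_entities | still_alive

doomed = vector_ids_to_delete("col-1", pre_sync, post_sync)
print(doomed == [entity_vector_id("col-1", "Git")])  # prints: True
```

Both sets come from lineage data, so the vector store is never scanned to decide what to delete.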
⚠️ 2 minor architecture observations (non-blocking; sediment for future tasks)
Observation 1: relation payload field naming is overloaded
Lines 1572-1576, relation vector point payload:

    payload={
        "indexer": "graph_relation",
        "entity_name": f"{relation.source}->{relation.target}",  # ← composite reuses the entity_name field
        "entity_type": relation.relation_type,
    }

indexer="graph_relation" distinguishes the type, but in the relation context the entity_name payload field is a "source->target" composite, not a single entity name. Readers need indexer-aware handling when parsing it.
Fine for now: MergeCandidateDetector and GraphSearchService both filter strictly on Eq("indexer", "graph_entity"), so a relation hit is never treated as an entity.
Suggested future cleanup (add to the task #10 close-out list): give the relation payload distinct fields, e.g. {indexer, source, target, relation_type}, to avoid the double meaning of entity_name.
Observation 2: relation vectors are currently unused by task #5 search_relations
The task #3 PR writes graph_relation vector points (per spec §K.12.3 "embed entity/relation"), but task #5 PR #1756, already merged, implements search_relations as a 1-hop expansion (it does not query graph_relation vectors).
Result: relation vectors are written to the store but search_relations does not read them; for now they are a forward-compat reserve with no consumer.
Spec check: spec §K.12.3 describes GraphSearchService as "Qdrant recall of entity & relation (payload filter indexer=...)", i.e. the spec expects search_relations to use vector recall. Task #5's 1-hop expansion is a conservative subset (the PR #1756 description explicitly says "vector store carries no per-relation vectors in Wave 7").
So: writing relation vectors in task #3 is spec-correct; task #5's search_relations is currently incomplete and should be upgraded in a follow-up.
Suggested follow-up (does not block this PR): upgrade search_relations to graph_relation vector recall (Eq("indexer", "graph_relation")) in task #5-bis or during the task #7/#8 wiring stage, to make full use of the relation vectors task #3 writes. Without the upgrade, the relation vector writes are storage waste; with it, the LightRAG-style full recall the spec expects is achieved.
This could be one item on the task #10 close-out cleanup list, or a Wave 8 optimization candidate.
Checklist for LGTM after fixes
In fact this is already mergeable ✅. Both minor observations are sediment candidates; within this PR the alignment with the spec is complete.
@bryce Very solid work. The strict 4-step Phase 3 ordering and the cross-doc preserve test show high attention to detail (Linus in doc_A + doc_B, re-parsing doc_A does not delete the vector; this kind of multi-doc shared-entity gc-correctness pin is a hard guarantee for product stability). 👍
@符炫炜 LGTM, ready to merge.
@不穷 Move task #3 → done after merge; tasks #6 / #7 / #8 / #9 / #10 remain on the critical path.
@bryce Additions to the task #10 close-out cleanup list:
- (previous round, task #1) 8 Wave 6 era `# -- LightRAG-style query layer` comments
- (previous round, task #4) line 249 legacy `name` fallback
- (previous round, task #5) 6 docstrings with descriptive "LightRAG-style"
- (this round, task #3) Observation 1: redesign the relation payload fields (entity_name/entity_type → source/target/relation_type); Observation 2: upgrade search_relations to graph_relation vector recall (full alignment with spec §K.12.3)
Wave 7 progress: 6/10 task PRs + 2 spec PRs = 8 PRs; close-out can follow once merged (remaining: #6 / #7 / #8 / #9 / #10 = 5 PRs).
…t + LineageEntityMerger (#1758)
§K.12.6 / §K.12.7 / §K.12.10b task #6 — full storage + service body for user-driven entity merge over the lineage graph. Per architect ratify msg=cf860ae4 + huangheng endorse msg=22816e0d (5-drift lock) + sentinel pick msg=22816e0d (`__curation_merge__`).
What ships
----------
1. **`aperag_lineage_entity_alias` table** (alembic ``b5d2e8f1c9a4``) — composite PK ``(collection_id, alias_name)``, ``canonical_name`` index for reverse lookup.
2. **`LineageEntityAlias` ORM** (`aperag/domains/knowledge_graph/db/models.py`) — re-uses the curation domain's existing ``Base``.
3. **`AliasMapRepository`** (`aperag/graph_curation/alias_map.py`):
  - `resolve_canonical(collection_id, name)` — single-indirection read (transitive flatten keeps the table 1-deep at write time).
  - `upsert_alias(...)` — cycle reject (`AliasCycleError`) + transitive flatten (UPDATE + INSERT in one transaction).
  - `list_aliases_pointing_at(...)` for tests / admin tooling.
  - `purge_collection(...)` for collection teardown.
4. **`LineageGraphStoreWithAliasRedirect` decorator** (`aperag/indexing/alias_redirect_store.py`) — wraps any ``LineageGraphStore`` + ``AliasMapRepository``, intercepts ``upsert_entity_with_lineage`` / ``upsert_relation_with_lineage`` to rewrite entity names through the alias map, forwards every other Protocol method byte-for-byte. Per huangheng CR lock (Option (b), msg=93d9add1 / msg=22816e0d): zero changes to the three backend store implementations.
5. **`LineageEntityMerger`** (`aperag/graph_curation/lineage_merge.py`) — orchestrator for user-driven merge. Step ordering locked (invariant #2): alias upserts → L1 source-parts re-anchored preserving doc lineage → L1 final unified+compacted with ``__curation_merge__`` sentinel → vector upsert (3-field payload, ``uuid5`` deterministic id) → source delete (L1 + vector) last.
6. **24 unit tests**:
  - `test_alias_map.py` (10): cycle reject self-loop + chain cycle, transitive flatten, target flatten through chain, alias persists after canonical GC, per-collection isolation, purge.
  - `test_alias_redirect_store.py` (5): indexer write redirects to canonical, no-alias passthrough, both-endpoint relation redirect, single-endpoint relation redirect, decorator passthrough invariant for all 13 non-upsert Protocol methods (huangheng CR lock).
  - `test_lineage_merge.py` (9): empty source short-circuit, step order L1 → vector → delete, sentinel ``__curation_merge__``, Compactor kwargs locked (subject_kind/subject_label/language), source parts re-anchored preserving per-doc lineage, vector payload 3-field + uuid5 deterministic, alias cycle propagation, target alias chain flatten, target GC tolerance.
§K.12 invariant cross-check
---------------------------
- #1 L1 not polluted: source parts re-anchored under target preserve original `(document_id, parse_version, chunk_ids)` lineage — pinned by `test_source_parts_reanchored_preserving_doc_lineage`.
- #2 L1 → L2 derivation: step ordering pinned by `test_step_order_is_l1_then_vector_then_delete` (L1 writes precede vector writes; deletes last).
- #3 transparent alias redirect: pinned by `test_indexer_upsert_after_merge_redirects_to_canonical`.
- #4 vector store via abstraction: ``VectorStoreConnector.upsert/delete`` via ``asyncio.to_thread``.
- #5 3-field payload: pinned by `test_vector_payload_is_3_field_with_deterministic_uuid5`.
- #6 uuid5 deterministic point id: same test pins ``uuid5(NAMESPACE_DNS, "graph_entity:{cid}:{name}")``.
- #7 alias_map orphan persist: pinned by `test_alias_persists_after_canonical_gc`.
- #9 cycle flatten + reject: pinned by 3 tests (`test_transitive_flatten_rewrites_existing_alias_rows`, `test_cycle_reject_self_loop`, `test_cycle_reject_through_existing_chain`).
- #11 D-3 merge user-driven only: `LineageEntityMerger.merge_entities` is the only entry point; `MergeCandidateDetector` (PR #1755) does not import it.
- #12 grep-zero LightRAG: 0 hits in new files.
Scope notes
-----------
- **Wiring**: `worker_factory._build_lineage_graph_store` is NOT changed in this PR. Bryce's task #3 PR (#1757), which just merged, adds the worker.sync() consumption path; wiring the alias-redirect decorator into `_build_lineage_graph_store` is a one-line follow-up that belongs with task #8 retrieval cutover (where the legacy merge route is also replaced) so we don't ship a partial wiring.
- **REST route cutover**: legacy `KnowledgeGraphService.merge_entities` (route handler) still calls `GraphIndexService.merge_entities`. Replacing it with `LineageEntityMerger` is task #8's job (chenyexuan) — it changes no field on the response shape so cuiwenbo task #9 frontend is zero-touch.
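The cycle-reject and transitive-flatten behaviour described for `AliasMapRepository.upsert_alias` can be illustrated with a minimal in-memory sketch. It is not the repository's implementation: the real one is per-collection and backed by the `aperag_lineage_entity_alias` table, while `InMemoryAliasMap` here is a hypothetical stand-in that drops `collection_id` and the transaction handling.

```python
from typing import Dict


class AliasCycleError(ValueError):
    """Raised when an alias would end up pointing at itself."""


class InMemoryAliasMap:
    """Hypothetical dict-backed alias map that stays one level deep."""

    def __init__(self) -> None:
        self._canonical: Dict[str, str] = {}

    def resolve_canonical(self, name: str) -> str:
        # Single lookup: writes keep every alias pointing at a final canonical.
        return self._canonical.get(name, name)

    def upsert_alias(self, alias: str, canonical: str) -> None:
        # Flatten the target first: if `canonical` is itself an alias, follow it.
        target = self.resolve_canonical(canonical)
        if target == alias:
            raise AliasCycleError(f"{alias!r} -> {canonical!r} would form a cycle")
        # Re-point rows that used `alias` as their canonical name.
        for existing, current in list(self._canonical.items()):
            if current == alias:
                self._canonical[existing] = target
        self._canonical[alias] = target


aliases = InMemoryAliasMap()
aliases.upsert_alias("L. Torvalds", "Linus")
aliases.upsert_alias("Linus", "Linus Torvalds")  # rewrites the earlier row too
print(aliases.resolve_canonical("L. Torvalds"))  # prints: Linus Torvalds
```

Flattening at write time is what lets `resolve_canonical` remain a single-indirection read, at the cost of a slightly more expensive upsert.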
Replaces the Wave 7 conservative 1-hop expansion in
``GraphSearchService.search_relations`` with direct vector recall
filtered on ``Eq("indexer", "graph_relation")``. Task #3 (PR #1757)
has been writing relation vectors all along; this is finally the
consumer side, completing the LightRAG-style full recall the spec
§K.12.6 expected (huangheng W8-1 sediment from task #3 PR CR
msg=d4ad0259).
Algorithm
---------
1. Embed query via ``embedder.embed_query``.
2. ``vector_connector.search`` with ``Eq("indexer", "graph_relation")``
filter + the same threshold/top_k knobs ``search_entities`` uses.
3. Parse each hit's payload (``entity_name="src->tgt"`` +
``entity_type=relation_type`` per task #3 writer
``aperag/indexing/graph.py:1631``) by splitting on ``->``;
skip hits that don't parse cleanly.
4. Reverse-lookup full ``RelationWithLineage`` via
``asyncio.gather(*store.get_relation(...))`` (architect ratify
approach (a), msg=cf860ae4 — preserves ``compose_context``
byte-parity rendering required by the task #5 invariant).
5. Drop ``None`` results (edge GC'd between sync and search).
Failure paths (embedder / vector store down) swallow and return
``[]``, mirroring ``search_entities``.
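Step 3 of the algorithm (splitting ``entity_name`` on ``->`` and skipping hits that do not parse cleanly) might look like the sketch below. The function name `parse_relation_payload` is hypothetical; the payload keys and the skip cases are the ones named in this commit message. One assumption worth flagging: the split happens on the first ``->``, so an entity name that itself contains ``->`` would be mis-parsed.

```python
from typing import Mapping, Optional, Tuple


def parse_relation_payload(
    payload: Optional[Mapping[str, object]],
) -> Optional[Tuple[str, str, str]]:
    """Return (source, target, relation_type), or None if the hit should be skipped."""
    if not payload:
        return None  # missing payload
    name = payload.get("entity_name")
    relation_type = payload.get("entity_type")
    if not isinstance(name, str) or not isinstance(relation_type, str):
        return None
    if not relation_type:
        return None  # missing entity_type
    source, arrow, target = name.partition("->")
    if not arrow or not source or not target:
        return None  # missing arrow, empty source, or empty target
    return source, target, relation_type


hit = {"indexer": "graph_relation", "entity_name": "Linus->Git", "entity_type": "created"}
print(parse_relation_payload(hit))  # prints: ('Linus', 'Git', 'created')
```

Returning ``None`` for every malformed shape keeps the caller simple: it filters out ``None`` before the ``store.get_relation`` reverse lookup.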
§K.12 invariant cross-check
---------------------------
- #4 vector store via abstraction: uses ``VectorStoreConnector``,
no Qdrant-specific imports.
- #5 indexer filter: ``Eq("indexer", "graph_relation")`` pinned in
``test_search_relations_uses_graph_relation_filter``.
- #11 D-3 read-only: never invokes any mutation method.
- #12 grep-zero LightRAG: 0 hits in the new code.
- All other invariants n/a (read-only path, no schema changes).
Test coverage
-------------
10 new/replaced cases in ``test_graph_search_service.py``:
- empty query / zero-topk short-circuit (no embed, no search)
- ``Eq("indexer","graph_relation")`` filter + threshold + top_k
pinning
- payload parse + ``store.get_relation`` reverse lookup happy path
- hit-order preservation + payload-key dedup
- payload skip cases: missing payload, missing arrow, missing
entity_type, empty source, empty target
- GC tolerance: edge deleted between sync and search
- embedder failure swallowed
- vector store failure swallowed
Plus 14 pre-existing ``search_entities`` / ``get_subgraph`` /
``compose_context`` tests retained — total 24/24 pass.
Summary
Wave 7 task #3 — extends `GraphModalityWorker.sync()` with the four-step Phase 3 that turns the lineage-store rebuild (Phase 1+2, Wave 4-6) into a complete LightRAG-style graph index. Activates entity / relation vector recall in production for the first time since Wave 4 (the legacy `_sync_entity_relation_vectors` write path was 0-callers since hard-cut, leaving the Qdrant entity index empty).
Phase 3 step ordering (per spec §K.12.3 + huangheng msg=16a38734)
1. Compactor: run `GraphIndexCompactor.compact_if_oversized` over `description_parts` text and write the unified summary back via `upsert_*_with_lineage(..., compacted_description=...)`. `None` → COALESCE preserves the existing column.
2. Embed + vector upsert: the compacted summary (or the `name + raw parts` fallback) → `await asyncio.to_thread(connector.upsert, [point])`. Point id = `uuid5(NAMESPACE_DNS, f"graph_entity:{collection_id}:{name}")` → deterministic overwrite. Payload = 3 fields locked at spec §K.12.5 ratify msg=acbd0003 / msg=d3f4e6f8: `{indexer, entity_name, entity_type}` (no `collection_id` payload — it lives in the uuid5 id while the connector handles per-tenant guard).
3. Snapshot-diff delete: pre-sync entity-name set minus post-sync set → ids to delete, computed against the lineage store (invariant #7).
4. Merge-candidate detection: affected entity names go to `detect_for_sync`. D-3 lock — detector only writes PENDING suggestions, never auto-merges.
Optional dependencies + degradation
The Phase 3 deps (`compactor`, `embedder`, `vector_connector`, `merge_detector`) are all optional. Without `embedder` + `vector_connector` Phase 3 short-circuits → Wave 6 lineage-only behaviour preserved (every existing test still passes). Failures inside any individual step are logged and swallowed — the lineage critical path always survives a partial Phase 3.
worker_factory wiring
- `compactor` via the shared `build_collection_llm_callable` (same resolver the graph extractor uses).
- `vector_connector` + `embedder` via the existing `_build_collection_qdrant_connector` helper — graph entity / relation vectors share the per-collection Qdrant collection that holds chunk vectors, distinguished by the `indexer` payload key.
- `merge_detector` constructed when both connector + embedder resolved, with a thin sync-embedder shim adapting the graph worker's `(text -> list[float])` callable into the `embed_query` shape the detector expects.
When any production dep fails to construct (no completion model, broken embedder config), the factory logs a warning and the worker falls back to the no-op default — mirrors the summary worker pattern.
§K.12 invariant cross-check (per architect msg=fcf580a6 directive)
- #2 (L1 → L2 single-direction derivation): reads `description_parts`, writes `compacted_description` and vector; never the reverse.
- #3 (Compactor runs before vector embed): `test_w7_phase3_compactor_runs_before_embed` pins this with a "COMPACTED"-stub + capturing embedder.
- #4 (vector store via `VectorStoreConnector(Adaptor)` abstraction): `VectorStoreConnector` Protocol; sync `connector.upsert/delete` wrapped in `await asyncio.to_thread(...)` per architect msg=acbd0003.
- #5 (3-field payload + `indexer="graph_entity"` / `"graph_relation"` payload filter): `test_w7_phase3_writes_vector_point_with_3_field_payload` asserts payload keys exactly + no `collection_id` leakage.
- #6 (uuid5 deterministic id): `test_w7_phase3_uuid5_id_is_deterministic` re-syncs same content + asserts id stable; format `graph_entity:<cid>:<name>` / `graph_relation:<cid>:<src>-><tgt>:<type>`.
- #7 (snapshot-diff via lineage name set): `_capture_entity_names` builds post-sync set from `kg_entity_records` ∪ store-survived-pre-sync; never calls `vector_connector.scroll/list`. `test_w7_phase3_snapshot_diff_preserves_cross_doc_entity` proves shared entities don't get deleted.
- #9: `upsert_entity_with_lineage` transparent alias redirect.
- #11 (D-3 detector writes-only): `merge_detector.detect_for_sync`, which only writes PENDING `GraphCurationSuggestion` rows; `test_w7_phase3_merge_detector_invoked_with_affected_names` pins the call shape.
4-pattern pre-check matrix
- P1 v1: `GraphModalityWorker.sync` callers stay in `aperag/indexing/` orchestrator wiring only; new optional kwargs default to `None` so external call shape is backward-compat — no caller updates needed in this PR.
- P1 v2: `sync()` still returns `None`; nothing observable beyond Phase 3 side effects on the vector store + suggestion table. Existing 1083 unit tests pass unchanged.
- P2 (state binding): `aperag/indexing/graph_compactor.py` (Wave 7 W7-2, merged `c1c48429`); `aperag/indexing/merge_candidate_detector.py` (W7-4, merged `0dbf9fd1`); `aperag/vectorstore/connector.py` `VectorStoreConnector(Adaptor)` (Wave 4).
- P3 (Protocol method state): no new `LineageGraphStore` Protocol methods needed. Phase 3 uses existing `get_entity` / `get_relation` for the read-modify-write loop and writes via the `compacted_description` kwarg shipped in W7-1.
simple-stable directive 4 guardrails
- `worker_factory` wiring follows existing helper patterns
Test plan
- `tests/unit_test/indexing/test_t1_2_graph.py` covering:
  - 3-field payload exact (no `collection_id` leak)
  - Compactor `None` falls back to `name + raw parts`
  - Relation Phase 3 mirrors entity (3-field payload, `indexer="graph_relation"`, uuid5 id format)
  - `MergeCandidateDetector.detect_for_sync` invoked with correct affected names
- `uv run pytest tests/unit_test/`
- `ruff format --check` + `ruff check` clean on touched files
Spec / decision references
- `await asyncio.to_thread(connector.search/upsert/delete)` pattern: architect msg=acbd0003
- `GraphIndexCompactor` (PR #1752 "feat(Wave 7 #2): port graph description compactor to new indexing pipeline" / commit `c1c48429`), `MergeCandidateDetector` (PR #1755 "feat(Wave 7 #4): MergeCandidateDetector — sync-driven candidate detection" / commit `0dbf9fd1`), W7-1 schema (PR #1754 "feat(graph Wave 7 W7-1): compacted_description schema + delete methods" / commit `c1499777`)
🤖 Generated with Claude Code