feat(graph Wave 7 W7-3): GraphModalityWorker.sync() Phase 3 by earayu · Pull Request #1757 · apecloud/ApeRAG

earayu · 2026-04-27T18:24:01Z

Summary

Wave 7 task #3 — extends GraphModalityWorker.sync() with the four-step Phase 3 that turns the lineage-store rebuild (Phase 1+2, Wave 4-6) into a complete LightRAG-style graph index. Activates entity / relation vector recall in production for the first time since Wave 4 (the legacy _sync_entity_relation_vectors write path was 0-callers since hard-cut, leaving the Qdrant entity index empty).

Phase 3 step ordering (per spec §K.12.3 + huangheng msg=16a38734)

1. Compactor (W7-2 dep): for each entity / relation just touched, run GraphIndexCompactor.compact_if_oversized over description_parts text and write the unified summary back via upsert_*_with_lineage(..., compacted_description=...). None → COALESCE preserves the existing column.
2. Embed + vector upsert: hash compacted (or name + raw parts fallback) → await asyncio.to_thread(connector.upsert, [point]). Point id = uuid5(NAMESPACE_DNS, f"graph_entity:{collection_id}:{name}") → deterministic overwrite. Payload = 3 fields locked at spec §K.12.5 ratify msg=acbd0003 / msg=d3f4e6f8: {indexer, entity_name, entity_type} (no collection_id payload; it lives in the uuid5 id while the connector handles the per-tenant guard).
3. Snapshot-diff delete: pre-sync entity-name set minus post-sync set → ids to delete. Source: lineage store, never an ANN list-all (invariant #7).
4. MergeCandidateDetector (W7-4 dep, optional): pass affected entity names to detect_for_sync. D-3 lock: the detector only writes PENDING suggestions, never auto-merges.
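
The id and payload rules from step 2 can be sketched with the standard library alone. This is a minimal illustration, not the worker's code: the helper names `entity_point_id`, `relation_point_id` and `entity_payload` are hypothetical, and only the id format strings and the three payload keys are taken from this description.

```python
import uuid


def entity_point_id(collection_id: str, name: str) -> str:
    """Same (collection, name) always yields the same id, so a re-sync overwrites."""
    key = f"graph_entity:{collection_id}:{name}"
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, key))


def relation_point_id(collection_id: str, src: str, tgt: str, rel_type: str) -> str:
    """Relation ids use the graph_relation:<cid>:<src>-><tgt>:<type> format."""
    key = f"graph_relation:{collection_id}:{src}->{tgt}:{rel_type}"
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, key))


def entity_payload(name: str, entity_type: str) -> dict[str, str]:
    """Exactly three keys; collection_id is deliberately absent from the payload."""
    return {"indexer": "graph_entity", "entity_name": name, "entity_type": entity_type}
```

Because the collection id is folded into the hashed key, two collections that share a backing store get different point ids for the same entity name.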

Optional dependencies + degradation

The Phase 3 deps (compactor, embedder, vector_connector, merge_detector) are all optional. Without embedder + vector_connector Phase 3 short-circuits → Wave 6 lineage-only behaviour preserved (every existing test still passes). Failures inside any individual step are logged and swallowed — the lineage critical path always survives a partial Phase 3.
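
The degradation contract above (short-circuit when the vector deps are missing, swallow per-step failures) might look roughly like the sketch below. The function `run_phase3` and its `steps` parameter are hypothetical stand-ins used only to show the control flow; the real worker's structure may differ.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("phase3_sketch")

Step = tuple[str, Callable[[], Awaitable[None]]]


async def run_phase3(
    embedder: Optional[Callable[[str], list[float]]],
    vector_connector: Optional[object],
    steps: list[Step],
) -> list[str]:
    """Run steps in order and return the names of those that completed."""
    if embedder is None or vector_connector is None:
        return []  # lineage-only behaviour: Phase 3 is skipped entirely
    completed: list[str] = []
    for name, step in steps:
        try:
            await step()
            completed.append(name)
        except Exception:
            # Log and continue so one failing step never aborts the sync.
            logger.warning("Phase 3 step %r failed; continuing", name, exc_info=True)
    return completed
```

A failing embed step in this sketch leaves the compactor and detector steps unaffected, which mirrors the "partial Phase 3" wording above.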

worker_factory wiring

compactor via the shared build_collection_llm_callable (same resolver the graph extractor uses).
vector_connector + embedder via the existing _build_collection_qdrant_connector helper — graph entity / relation vectors share the per-collection Qdrant collection that holds chunk vectors, distinguished by the indexer payload key.
merge_detector constructed when both connector + embedder resolved, with a thin sync-embedder shim adapting the graph worker's (text -> list[float]) callable into the embed_query shape the detector expects.

When any production dep fails to construct (no completion model, broken embedder config), the factory logs a warning and the worker falls back to the no-op default — mirrors the summary worker pattern.
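
As an illustration of the "thin sync-embedder shim" mentioned above, the sketch below wraps a plain `(text -> list[float])` callable behind an `embed_query` method. The class name `SyncEmbedderShim` is hypothetical, and it assumes the detector calls `embed_query` synchronously; the description does not say whether the real method is sync or async.

```python
from typing import Callable


class SyncEmbedderShim:
    """Hypothetical adapter exposing embed_query() over a plain callable."""

    def __init__(self, embed_fn: Callable[[str], list[float]]) -> None:
        self._embed_fn = embed_fn

    def embed_query(self, text: str) -> list[float]:
        # Delegate unchanged; the shim only renames the call shape.
        return self._embed_fn(text)


# Usage: a toy embedder that encodes the text length as a 1-d vector.
shim = SyncEmbedderShim(lambda text: [float(len(text))])
vector = shim.embed_query("abc")
```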

§K.12 invariant cross-check (per architect msg=fcf580a6 directive)

| # | Invariant | This PR |
|---|---|---|
| 1 | L1 graph data not polluted | ✅ kg.jsonl untouched; Phase 3 only reads from / writes derived columns |
| 2 | L1 → L2 single-direction derivation (compacted is derived) | ✅ Compactor reads `description_parts`, writes `compacted_description` and vector; never the reverse |
| 3 | Compactor at sync end, before vector embed | ✅ Step ordering Phase 3 = compactor → embed; `test_w7_phase3_compactor_runs_before_embed` pins this with a "COMPACTED"-stub + capturing embedder |
| 4 | Vector store via `VectorStoreConnector(Adaptor)` abstraction | ✅ no Qdrant client import; type hint is `VectorStoreConnector` Protocol; sync `connector.upsert` / `delete` wrapped in `await asyncio.to_thread(...)` per architect msg=acbd0003 |
| 5 | `indexer="graph_entity"`/`"graph_relation"` payload filter | ✅ `test_w7_phase3_writes_vector_point_with_3_field_payload` asserts payload keys exactly + no `collection_id` leakage |
| 6 | uuid5 deterministic vector point id | ✅ `test_w7_phase3_uuid5_id_is_deterministic` re-syncs same content + asserts id stable; format `graph_entity:<cid>:<name>` / `graph_relation:<cid>:<src>-><tgt>:<type>` |
| 7 | snapshot-diff via lineage entity name set | ✅ `_capture_entity_names` builds post-sync set from `kg_entity_records` ∪ store-survived-pre-sync; never calls `vector_connector.scroll`/list. `test_w7_phase3_snapshot_diff_preserves_cross_doc_entity` proves shared entities don't get deleted |
| 8 | alias_map orphan-persists across canonical gc | N/A (W7-6) |
| 9 | `upsert_entity_with_lineage` transparent alias redirect | N/A (W7-6) |
| 10 | DB column length is application-layer cap, not schema CHECK | N/A (W7-1 / W7-2) |
| 11 | Candidate detection writes-only, no auto-merge (D-3) | ✅ Step D calls `merge_detector.detect_for_sync` which only writes PENDING `GraphCurationSuggestion` rows; `test_w7_phase3_merge_detector_invoked_with_affected_names` pins the call shape |
| 12 | grep-zero LightRAG naming | ✅ no LightRAG strings introduced |
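
Invariant 7 boils down to plain set arithmetic over names that come from the lineage store. The sketch below is a hedged illustration: the function `names_to_delete` and its parameter names are hypothetical, while the formula (pre-sync minus the union of kg.jsonl entities and surviving pre-sync entities) follows the row above.

```python
def names_to_delete(
    pre_sync: set[str],
    kg_entity_names: set[str],
    survived_pre_sync: set[str],
) -> set[str]:
    """Entity names whose vector points should be removed after this sync."""
    post_sync = kg_entity_names | survived_pre_sync
    return pre_sync - post_sync


# doc_A used to mention Linus, Git and Minix; after re-parsing it only mentions Git.
# Linus is still referenced by doc_B, so he survives gc in the lineage store.
gc_names = names_to_delete(
    pre_sync={"Linus", "Git", "Minix"},
    kg_entity_names={"Git"},
    survived_pre_sync={"Linus", "Git"},
)
```

Only `Minix` ends up in the delete set here; the cross-document entity is kept without ever listing the vector index.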

4-pattern pre-check matrix

P1 v1 (caller import count): GraphModalityWorker.sync callers stay in aperag/indexing/ orchestrator wiring only; new optional kwargs default to None so external call shape is backward-compat — no caller updates needed in this PR.
P1 v2 (caller method coverage): sync() still returns None; nothing observable beyond Phase 3 side effects on the vector store + suggestion table. Existing 1083 unit tests pass unchanged.
P2 (state binding): dependencies all already on main —
- aperag/indexing/graph_compactor.py (Wave 7 W7-2, merged c1c48429)
- aperag/indexing/merge_candidate_detector.py (W7-4, merged 0dbf9fd1)
- aperag/vectorstore/connector.py VectorStoreConnector(Adaptor) (Wave 4)
P3 (Protocol method state): no new LineageGraphStore Protocol methods needed. Phase 3 uses existing get_entity / get_relation for the read-modify-write loop and writes via the compacted_description kwarg shipped in W7-1.

simple-stable directive 4 guardrails

| Guardrail | Status |
|---|---|
| #1 don't expand scope | ✅ no new Protocol methods; deps optional; ~330-line worker delta + ~470-line test delta; `worker_factory` wiring follows existing helper patterns |
| #2 ship fast | ✅ behind merged W7-1 / W7-2 / W7-4; production wiring + unit test suite in one PR |
| #3 simple > complex | ✅ failures swallowed (lineage critical path inviolate); deterministic uuid5 ids; same connector + embedder helpers reused across vector / summary / vision workers |
| #4 private-deploy maintenance-free | ✅ all knobs via existing collection LLM / embedder resolvers; no new operator config; alembic-free (vector schema lives in Qdrant collection from chunk vectors) |

Test plan

12 new InMemory unit tests in tests/unit_test/indexing/test_t1_2_graph.py covering:
- Phase 3 skipped when vector deps unwired (Wave 6 backward-compat)
- 3-field payload exact (no collection_id leak)
- uuid5 id deterministic across resyncs
- Compactor runs before embed (ordering invariant)
- Compactor None falls back to name + raw parts
- Snapshot-diff deletes gc'd vector points
- Cross-doc shared entity NOT deleted (Linus mentioned in doc_A + doc_B; doc_A re-parsed without him; vector survives)
- Relation Phase 3 mirrors entity (3-field payload, indexer="graph_relation", uuid5 id format)
- MergeCandidateDetector.detect_for_sync invoked with correct affected names
- Detector failure non-fatal (lineage + vector upsert intact)
- Compactor failure non-fatal (embedder gets fallback text)
- Vector upsert failure non-fatal (lineage row intact)
All 1117 unit tests pass (1083 + 34 new across W7-1 + W7-3): uv run pytest tests/unit_test/
ruff format --check + ruff check clean on touched files
CI compat-graph stage (pure unit testing only — Phase 3 integration with real Qdrant lives in W7-5 / W7-8 verification, intentionally not in this PR scope)
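
To make the ordering and fallback cases in this plan concrete, here is a toy version of the "COMPACTED"-stub + capturing-embedder technique. Everything in it is hypothetical scaffolding: `compact_then_embed` stands in for Steps A and B on one entity, and the fallback text format (name followed by the raw parts, newline-joined) is an assumption rather than the worker's actual format.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class CapturingEmbedder:
    """Records every text it is asked to embed."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    def __call__(self, text: str) -> list[float]:
        self.seen.append(text)
        return [0.0]


async def compact_then_embed(
    name: str,
    parts: list[str],
    compactor: Callable[[list[str]], Awaitable[Optional[str]]],
    embedder: Callable[[str], list[float]],
) -> list[float]:
    compacted = await compactor(parts)  # Step A runs first
    text = compacted if compacted is not None else "\n".join([name, *parts])
    return embedder(text)  # Step B sees the compactor's output


async def compacted_stub(parts: list[str]) -> Optional[str]:
    return "COMPACTED"


async def opt_out_stub(parts: list[str]) -> Optional[str]:
    return None  # below threshold: the caller must fall back to raw text


embedder = CapturingEmbedder()
asyncio.run(compact_then_embed("Linus", ["a", "b"], compacted_stub, embedder))
asyncio.run(compact_then_embed("Linus", ["a", "b"], opt_out_stub, embedder))
```

If the embedder had run before the compactor, the first captured text could not be the stub's marker string, which is what makes the stub a cheap ordering probe.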

Spec / decision references

spec PR docs(celery Wave 7 §K.12): legacy graphindex 全删 + LightRAG 风格 graph 层最终态 spec #1751 §K.12.3 (Phase 3 step ordering) + §K.12.5 (payload + uuid5 id rules)
payload 3-field lock: architect msg=d3f4e6f8 / msg=acbd0003 / huangheng msg=16a38734
await asyncio.to_thread(connector.search/upsert/delete) pattern: architect msg=acbd0003
Phase 3 invariant 7-list (huangheng): msg=16a38734
deps: GraphIndexCompactor (PR feat(Wave 7 #2): port graph description compactor to new indexing pipeline #1752 / commit c1c48429), MergeCandidateDetector (PR feat(Wave 7 #4): MergeCandidateDetector — sync-driven candidate detection #1755 / commit 0dbf9fd1), W7-1 schema (PR feat(graph Wave 7 W7-1): compacted_description schema + delete methods #1754 / commit c1499777)
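
The `await asyncio.to_thread(...)` pattern referenced above keeps a blocking connector call off the event loop thread. The sketch below uses a hypothetical `BlockingConnector` with a simplified `(id, payload)` point shape; it is not the project's `VectorStoreConnector`, only a stand-in that shows the wrapping.

```python
import asyncio
import threading


class BlockingConnector:
    """Hypothetical synchronous connector backed by a dict."""

    def __init__(self) -> None:
        self.points: dict[str, dict] = {}
        self.call_threads: list[int] = []

    def upsert(self, points: list[tuple[str, dict]]) -> None:
        self.call_threads.append(threading.get_ident())
        for point_id, payload in points:
            self.points[point_id] = payload

    def delete(self, ids: list[str]) -> None:
        self.call_threads.append(threading.get_ident())
        for point_id in ids:
            self.points.pop(point_id, None)


async def upsert_then_delete(connector: BlockingConnector) -> int:
    # Each sync call runs in a worker thread; the coroutine just awaits it.
    await asyncio.to_thread(connector.upsert, [("p1", {"indexer": "graph_entity"})])
    await asyncio.to_thread(connector.delete, ["p1"])
    return threading.get_ident()  # id of the event loop thread


connector = BlockingConnector()
loop_thread = asyncio.run(upsert_then_delete(connector))
```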

🤖 Generated with Claude Code

Adds Phase 3 to ``GraphModalityWorker.sync()``: the four steps that turn the lineage-store rebuild (Phase 1+2, Wave 4-6) into a complete LightRAG-style graph index (compact, embed, vector upsert, snapshot-diff delete, and merge-candidate detect).

Step ordering (per spec §K.12.3 + huangheng msg=16a38734 invariant list):

1. **Compactor** (W7-2): for each entity / relation just touched by this sync, run ``GraphIndexCompactor.compact_if_oversized`` over the per-doc ``description_parts`` and write the unified summary back via ``upsert_*_with_lineage(..., compacted_description=...)``. Returning ``None`` (below threshold or compactor unwired) leaves the COALESCE-preserved column alone.
2. **Embed + vector upsert**: hash the compacted summary (or the ``name`` + raw parts fallback when the compactor opted out) and ``await asyncio.to_thread(connector.upsert, [VectorPoint(...)])`` the result. Vector point id is ``uuid5(NAMESPACE_DNS, f"graph_entity:{collection_id}:{name}")``, deterministic so that a re-sync overwrites instead of leaking. Payload is the 3-field shape locked at spec §K.12.5 ratify msg=acbd0003 / msg=d3f4e6f8: ``{indexer, entity_name, entity_type}``. No ``collection_id`` payload: that lives in the uuid5 id (cross-collection uniqueness in a shared backing store) while the connector handles the per-tenant guard.
3. **Snapshot-diff delete**: pre-sync entity-name set (``find_entity_ids_with_lineage`` from Phase 1) minus post-sync set (kg.jsonl entities ∪ pre-sync entities still alive after gc) → ids to delete. Computed against the lineage store, never an ANN ``list_all`` (invariant #7).
4. **MergeCandidateDetector** (W7-4, optional): pass the affected entity names to ``detect_for_sync`` so PENDING auto-detect suggestions get persisted for the curator UI. D-3 lock: the detector never auto-merges.

The Phase 3 dependencies (``compactor``, ``embedder``, ``vector_connector``, ``merge_detector``) are all optional kwargs. Wave 6 callers that don't wire them get the lineage-only behaviour unchanged: Phase 3 returns early when the vector connector or embedder is unset. Failures inside any step (compactor LLM flake, embedder error, vector backend hiccup, detector raise) are logged and swallowed so the lineage critical path always survives a partial Phase 3.

``worker_factory._build_graph_worker`` wires production deps:

* ``compactor`` via the shared ``build_collection_llm_callable`` (the same resolver the graph extractor uses).
* ``vector_connector`` + ``embedder`` via the existing ``_build_collection_qdrant_connector`` helper: the graph entity / relation vectors go into the same Qdrant collection as chunk vectors, distinguished by the ``indexer`` payload key.
* ``merge_detector`` constructed once the connector + embedder resolved, with a thin shim adapting the sync ``(text -> list[float])`` callable used by the graph worker into the ``embed_query`` shape the detector expects.

When any of these fail to construct (no completion model, collection's embedder unresolved, etc.) the factory logs a warning and falls back to the no-op default, mirroring the summary worker pattern.

Tests:

* 12 new InMemory unit tests in ``tests/unit_test/indexing/test_t1_2_graph.py`` covering the following cases (Phase 3 skipped without deps; 3-field payload exact; uuid5 id deterministic across resyncs; compactor before embed; fallback on compactor None; snapshot-diff delete on doc cascade; cross-doc shared entity not deleted; relation Phase 3 mirrors entity; detector receives correct names; detector failure non-fatal; compactor failure non-fatal; vector upsert failure non-fatal). All 1117 unit tests pass.

§K.12 invariant cross-check: this PR materially honours #2 (L1 → L2 single-direction derivation), #3 (Compactor runs before vector embed), #4 (vector store via ``VectorStoreConnector(Adaptor)`` abstraction, no Qdrant client import), #5 (3-field payload + ``indexer`` filter), #6 (uuid5 deterministic id), #7 (snapshot-diff via lineage name set, not ANN list-all), #11 (D-3 detector writes-only, never auto-merges), #12 (no LightRAG strings introduced).

4-pattern pre-check matrix:

* P1 v1: ``GraphModalityWorker.sync`` callers are orchestrator wiring in ``aperag/indexing/`` only; the new optional kwargs default to ``None`` so call shape is backward-compat.
* P1 v2: caller return-shape expectations: sync still returns ``None``; nothing observable beyond Phase 3 side effects on the vector store + suggestion table.
* P2 (state binding): ``GraphIndexCompactor`` is W7-2 (merged ``c1c48429``), ``MergeCandidateDetector`` is W7-4 (merged ``0dbf9fd1``), ``VectorStoreConnector`` is the existing Wave 4 abstraction. All three already in main.
* P3 (Protocol method state): no new ``LineageGraphStore`` Protocol methods needed. Phase 3 reads via existing ``get_entity`` / ``get_relation`` and writes via the ``compacted_description`` kwarg shipped in W7-1.

Closes Wave 7 task #3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

earayu

🟢 LGTM ✅ (huangheng pass-1, per spec §K.12.11). GitHub does not allow approving from the same account; verdict = ready to merge

The four Phase 3 steps are strictly ordered, the payload has exactly 3 fields, point ids are deterministic uuid5, snapshot-diff goes through the name set, Wave 6 backward-compat holds (Phase 3 short-circuits when the optional deps are all None), and failures are swallowed so the lineage critical path stays inviolate. All of this aligns with spec §K.12.3, my msg=16a38734 7-invariant lock, and the architect msg=acbd0003 lock. This is the sixth PR to meet the hard-gate format (mirroring the standard of PR #1751/#1752/#1754/#1755/#1756).

12-invariant cross-check (task #3 scope)

| # | Invariant | This PR | Evidence |
|---|---|---|---|
| 1 | L1 graph data not polluted (kg.jsonl raw vs storage view strictly layered) | ✅ | kg.jsonl untouched; Phase 3 only reads description_parts and writes the derived `compacted_description` and vector points |
| 2 | L1 → L2 single-direction derivation | ✅ | Compactor reads description_parts → writes compacted_description (derivation internal to L1) → embed → vector point (L2 from L1); never the reverse |
| 3 | Compactor at sync end, before vector embed | ✅ | Phase 3 ordering at lines 1428-1463: compactor (Step A) → embed/upsert (Step B, lines 1434-1437) → snapshot-diff (Step C, lines 1445-1461) → detector (Step D); pinned by `test_w7_phase3_compactor_runs_before_embed` |
| 4 | Vector store via `VectorStoreConnectorAdaptor` | ✅ | type hint uses the abstract `VectorStoreConnector` Protocol; `asyncio.to_thread(connector.upsert/delete, ...)` per architect msg=acbd0003 sync wrap pattern |
| 5 | payload `indexer="graph_entity"` filter / 3 fields | ✅ for entity | lines 1537-1541: the entity payload is exactly the 3 fields `{indexer, entity_name, entity_type}` with no collection_id leak; pinned by `test_w7_phase3_writes_vector_point_with_3_field_payload`. ⚠️ For relation, see the minor observations below |
| 6 | uuid5 deterministic vector point id | ✅ | lines 1611-1620: entity `f"graph_entity:{cid}:{name}"` / relation `f"graph_relation:{cid}:{src}->{tgt}:{type}"`; `test_w7_phase3_uuid5_id_is_deterministic` pins that re-syncing the same content yields the same id |
| 7 | snapshot-diff via lineage entity name set | ✅ | `_capture_entity_names` uses `kg_entity_records ∪ store-survived-pre-sync`; line 1449 `pre_sync - post_sync = gc_entity_names`; does not call `vector_connector.scroll`; `test_w7_phase3_snapshot_diff_preserves_cross_doc_entity` pins that cross-doc shared entities are not gc'd |
| 8 | alias_map orphan-persists | n/a | task #6 scope |
| 9 | upsert transparent alias redirect | n/a | task #6 scope |
| 10 | DB column length is an application-layer cap | n/a | W7-1/W7-2 scope; task #3 does not touch the schema |
| 11 | Candidate detection writes only, no auto-merge (D-3) | ✅ | Step D calls `merge_detector.detect_for_sync(...)` (verified read-only in PR #1755); `test_w7_phase3_merge_detector_invoked_with_affected_names` pins the call shape |
| 12 | grep-zero LightRAG naming | ✅ | `gh pr diff 1757 \| grep -E '^\+' \| grep -i lightrag` → 0 added lines |

Phase 3 step ordering strictness check

# graph.py line 1428-1463
# Step A — Compactor
for entity in affected_entities_for_phase3:
    await self._compact_and_persist_entity(entity)  # ← 1st
# Step B — Embed + vector upsert
for entity in affected_entities_for_phase3:
    await self._upsert_entity_vector_point(entity)  # ← 2nd
for relation in affected_relations_for_phase3:
    await self._upsert_relation_vector_point(relation)
# Step C — Snapshot-diff delete
gc_entity_names = pre_sync_entity_names - post_sync_entity_names
ids = [self._entity_vector_id(name) for name in gc_entity_names]
await asyncio.to_thread(self._vector_connector.delete, ids)  # ← 3rd
# Step D — Detector
await self._merge_detector.detect_for_sync(...)  # ← 4th

A perfect mirror of spec §K.12.3 + invariant #3. ✅

Optional deps + degradation check

Wave 6 existing sync (no Phase 3 deps) → `if not (self._embedder and self._vector_connector): return` short-circuit → the 1083 existing unit tests are not broken ✅
Per-step failures swallowed: lines 1525-1531 / 1545-1550 / 1558-1566 / 1578-1587 are all try/except → lineage critical path inviolate ✅
All covered by test_w7_phase3_skipped_when_deps_unwired + test_w7_phase3_compactor_failure_non_fatal + test_w7_phase3_vector_upsert_failure_non_fatal + test_w7_phase3_detector_failure_non_fatal

4-pattern pre-check matrix (PR body paste)

P1 v1 callers + P1 v2 caller method + P2 dependency state + P3 Protocol method state are all pasted; the Phase 3 deps all reference already-merged commits (W7-1 c1499777 / W7-2 c1c48429 / W7-4 0dbf9fd1).

simple-stable directive 4-guardrail

The PR description states all 4 items explicitly: no new Protocol method / smallest surface / production helpers reuse / alembic-free.

Test quality

12 new tests + 1117 total pass:

Phase 3 skip when unwired (Wave 6 backward-compat)
3-field payload exact (no collection_id leak)
uuid5 id deterministic across resyncs
Compactor → embed ordering invariant pinned
Compactor None falls back to name + raw parts
Snapshot-diff deletes gc'd vector
Cross-doc shared entity NOT deleted (Linus Torvalds is in doc_A + doc_B; doc_A is re-parsed without mentioning him → not gc'd → vector retained)
Relation Phase 3 mirrors entity (3-field payload, indexer="graph_relation", uuid5 format)
Detector invoked with affected names
Detector / Compactor / Vector failure non-fatal × 3

Complete.

⚠️ 2 minor architecture observations (non-blocking, sediment for future tasks)

Observation 1: Relation payload field naming is overloaded

Relation vector point payload at lines 1572-1576:

payload={
    "indexer": "graph_relation",
    "entity_name": f"{relation.source}->{relation.target}",  # ← composite value reuses the entity_name field
    "entity_type": relation.relation_type,
}

indexer="graph_relation" distinguishes the type, but in the relation context the payload field entity_name holds a "source->target" composite, not a single entity name. Readers need indexer-aware handling when parsing.

Currently OK: MergeCandidateDetector + GraphSearchService both filter strictly on Eq("indexer", "graph_entity"), so a relation hit is never treated as an entity.

Suggested future cleanup (add to the task #10 close-out list): use distinct fields for the relation payload, e.g. {indexer, source, target, relation_type}, to avoid the double meaning of the entity_name field.
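
To illustrate the suggestion, the sketch below puts the current composite shape next to one possible distinct-field shape. Both helper names are hypothetical, the proposed shape is only the example given above and not a ratified schema, and the name containing "->" is a contrived case chosen to show why the composite needs careful parsing.

```python
def relation_payload_current(source: str, target: str, relation_type: str) -> dict[str, str]:
    """Shape written today: the endpoints are packed into entity_name."""
    return {
        "indexer": "graph_relation",
        "entity_name": f"{source}->{target}",
        "entity_type": relation_type,
    }


def relation_payload_proposed(source: str, target: str, relation_type: str) -> dict[str, str]:
    """Hypothetical distinct-field shape from the suggestion above."""
    return {
        "indexer": "graph_relation",
        "source": source,
        "target": target,
        "relation_type": relation_type,
    }


# A source name that itself contains "->" cannot be recovered from the composite.
current = relation_payload_current("a->b", "c", "links")
recovered_source, recovered_target = current["entity_name"].split("->", 1)
proposed = relation_payload_proposed("a->b", "c", "links")
```

With distinct fields the reader never has to split a string, so the endpoints round-trip regardless of what characters the names contain.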

Observation 2: Relation vectors are currently unused by task #5 search_relations

The task #3 PR writes graph_relation vector points (per spec §K.12.3 "embed entity/relation"), but the task #5 PR #1756, already merged, implements search_relations via 1-hop expansion (it does not query graph_relation vectors).

Result: relation vectors are written to the store but search_relations does not read them; for now they are a forward-compat reserve that nothing consumes.

Spec check: the spec §K.12.3 GraphSearchService description reads "Qdrant recalls entity & relation (payload filter indexer=...)", i.e. the spec expects search_relations to use vector recall. Task #5's 1-hop expansion is a conservative subset (the PR #1756 description explicitly says "vector store carries no per-relation vectors in Wave 7").

So: task #3 writing relation vectors is spec-correct; task #5 search_relations is currently incomplete and should be upgraded in a follow-up.

Suggested follow-up (does not block this PR): in task #5-bis or the task #7/#8 wiring stage, upgrade search_relations to use graph_relation vector recall (Eq("indexer", "graph_relation")) → make full use of the relation vectors task #3 writes. Without the upgrade, writing relation vectors is storage waste; with it, the LightRAG-style full recall the spec expects is achieved.

This could go on the task #10 close-out cleanup list or be a Wave 8 optimisation candidate.

Checklist of fixes needed for LGTM

In fact this can already be merged ✅. Both minor observations are sediment candidates; within this PR, alignment with the spec is complete.

@bryce Very solid work. The strict 4-step Phase 3 ordering and the cross-doc preserve test show strong attention to detail (Linus is in doc_A + doc_B, and re-parsing doc_A does not delete the vector; this kind of multi-doc shared-entity gc-correctness pin is a hard guarantee of product stability). 👍

@符炫炜 LGTM, ready to merge.
@不穷 move task #3 → done after merge; tasks #6 / #7 / #8 / #9 / #10 remain on the critical path.

@bryce additions to the task #10 close-out cleanup list:

(previous round, task #1) 8 Wave 6 era # -- LightRAG-style query layer comments
(previous round, task #4) line 249 legacy name fallback
(previous round, task #5) 6 docstrings that use "LightRAG-style" descriptively
(this round, task #3) Observation 1: redesign the relation payload fields (entity_name/entity_type → source/target/relation_type); Observation 2: upgrade search_relations to graph_relation vector recall (full alignment with spec §K.12.3)

Wave 7 progress: 6/10 task PRs + 2 spec PRs = 8 PRs; close-out can proceed once they are merged (remaining: #6 / #7 / #8 / #9 / #10 = 5 PRs).

…t + LineageEntityMerger (#1758)

§K.12.6 / §K.12.7 / §K.12.10b task #6: full storage + service body for user-driven entity merge over the lineage graph. Per architect ratify msg=cf860ae4 + huangheng endorse msg=22816e0d (5-drift lock) + sentinel pick msg=22816e0d (`__curation_merge__`).

What ships
----------

1. **`aperag_lineage_entity_alias` table** (alembic ``b5d2e8f1c9a4``): composite PK ``(collection_id, alias_name)``, ``canonical_name`` index for reverse lookup.
2. **`LineageEntityAlias` ORM** (`aperag/domains/knowledge_graph/db/models.py`): re-uses the curation domain's existing ``Base``.
3. **`AliasMapRepository`** (`aperag/graph_curation/alias_map.py`):
   - `resolve_canonical(collection_id, name)`: single-indirection read (transitive flatten keeps the table 1-deep at write time).
   - `upsert_alias(...)`: cycle reject (`AliasCycleError`) + transitive flatten (UPDATE + INSERT in one transaction).
   - `list_aliases_pointing_at(...)` for tests / admin tooling.
   - `purge_collection(...)` for collection teardown.
4. **`LineageGraphStoreWithAliasRedirect` decorator** (`aperag/indexing/alias_redirect_store.py`): wraps any ``LineageGraphStore`` + ``AliasMapRepository``, intercepts ``upsert_entity_with_lineage`` / ``upsert_relation_with_lineage`` to rewrite entity names through the alias map, forwards every other Protocol method byte-for-byte. Per huangheng CR lock (Option (b), msg=93d9add1 / msg=22816e0d): zero changes to the three backend store implementations.
5. **`LineageEntityMerger`** (`aperag/graph_curation/lineage_merge.py`): orchestrator for user-driven merge. Step ordering locked (invariant #2): alias upserts → L1 source-parts re-anchored preserving doc lineage → L1 final unified+compacted with ``__curation_merge__`` sentinel → vector upsert (3-field payload, ``uuid5`` deterministic id) → source delete (L1 + vector) last.
6. **24 unit tests**:
   - `test_alias_map.py` (10): cycle reject self-loop + chain cycle, transitive flatten, target flatten through chain, alias persists after canonical GC, per-collection isolation, purge.
   - `test_alias_redirect_store.py` (5): indexer write redirects to canonical, no-alias passthrough, both-endpoint relation redirect, single-endpoint relation redirect, decorator passthrough invariant for all 13 non-upsert Protocol methods (huangheng CR lock).
   - `test_lineage_merge.py` (9): empty source short-circuit, step order L1 → vector → delete, sentinel ``__curation_merge__``, Compactor kwargs locked (subject_kind/subject_label/language), source parts re-anchored preserving per-doc lineage, vector payload 3-field + uuid5 deterministic, alias cycle propagation, target alias chain flatten, target GC tolerance.

§K.12 invariant cross-check
---------------------------

- #1 L1 not polluted: source parts re-anchored under target preserve original `(document_id, parse_version, chunk_ids)` lineage; pinned by `test_source_parts_reanchored_preserving_doc_lineage`.
- #2 L1 → L2 derivation: step ordering pinned by `test_step_order_is_l1_then_vector_then_delete` (L1 writes precede vector writes; deletes last).
- #3 transparent alias redirect: pinned by `test_indexer_upsert_after_merge_redirects_to_canonical`.
- #4 vector store via abstraction: ``VectorStoreConnector.upsert/delete`` via ``asyncio.to_thread``.
- #5 3-field payload: pinned by `test_vector_payload_is_3_field_with_deterministic_uuid5`.
- #6 uuid5 deterministic point id: same test pins ``uuid5(NAMESPACE_DNS, "graph_entity:{cid}:{name}")``.
- #7 alias_map orphan persist: pinned by `test_alias_persists_after_canonical_gc`.
- #9 cycle flatten + reject: pinned by 3 tests (`test_transitive_flatten_rewrites_existing_alias_rows`, `test_cycle_reject_self_loop`, `test_cycle_reject_through_existing_chain`).
- #11 D-3 merge user-driven only: `LineageEntityMerger.merge_entities` is the only entry point; `MergeCandidateDetector` (PR #1755) does not import it.
- #12 grep-zero LightRAG: 0 hits in new files.

Scope notes
-----------

- **Wiring**: `worker_factory._build_lineage_graph_store` is NOT changed in this PR. Bryce's task #3 PR (#1757) just merged adds the worker.sync() consumption path; wiring the alias-redirect decorator into `_build_lineage_graph_store` is a one-line follow-up that belongs with task #8 retrieval cutover (where the legacy merge route is also replaced) so we don't ship a partial wiring.
- **REST route cutover**: legacy `KnowledgeGraphService.merge_entities` (route handler) still calls `GraphIndexService.merge_entities`. Replacing it with `LineageEntityMerger` is task #8's job (chenyexuan); it changes no field on the response shape so cuiwenbo task #9 frontend is zero-touch.
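
The cycle-reject and transitive-flatten rules listed for the alias map can be pictured with a small in-memory model. This is a sketch under assumptions, not the repository's code: `InMemoryAliasMap` is a hypothetical dict-backed stand-in for the SQL-backed `AliasMapRepository`, and the exact conditions under which the real `upsert_alias` raises `AliasCycleError` are inferred from the test names above.

```python
class AliasCycleError(ValueError):
    """Raised when an alias would end up pointing at itself."""


class InMemoryAliasMap:
    def __init__(self) -> None:
        # (collection_id, alias_name) -> canonical_name, always kept 1-deep
        self._rows: dict[tuple[str, str], str] = {}

    def resolve_canonical(self, collection_id: str, name: str) -> str:
        # Single indirection is enough because writes flatten chains.
        return self._rows.get((collection_id, name), name)

    def upsert_alias(self, collection_id: str, alias: str, canonical: str) -> None:
        # Flatten the target first: if canonical is itself an alias, follow it.
        target = self.resolve_canonical(collection_id, canonical)
        if target == alias:
            raise AliasCycleError(f"{alias!r} would alias itself")
        # Re-point existing rows that used `alias` as their canonical.
        for key, value in list(self._rows.items()):
            if key[0] == collection_id and value == alias:
                self._rows[key] = target
        self._rows[(collection_id, alias)] = target


aliases = InMemoryAliasMap()
aliases.upsert_alias("c1", "Torvalds", "Linus")
aliases.upsert_alias("c1", "Linus", "Linus Torvalds")  # rewrites the earlier row too
```

After the second call both `Torvalds` and `Linus` resolve directly to `Linus Torvalds` in collection `c1`, while the same names in another collection are untouched.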

Replaces the Wave 7 conservative 1-hop expansion in ``GraphSearchService.search_relations`` with direct vector recall filtered on ``Eq("indexer", "graph_relation")``. Task #3 (PR #1757) has been writing relation vectors all along; this is finally the consumer side, completing the LightRAG-style full recall the spec §K.12.6 expected (huangheng W8-1 sediment from task #3 PR CR msg=d4ad0259).

Algorithm
---------

1. Embed query via ``embedder.embed_query``.
2. ``vector_connector.search`` with ``Eq("indexer", "graph_relation")`` filter + the same threshold/top_k knobs ``search_entities`` uses.
3. Parse each hit's payload (``entity_name="src->tgt"`` + ``entity_type=relation_type`` per task #3 writer ``aperag/indexing/graph.py:1631``) by splitting on ``->``; skip hits that don't parse cleanly.
4. Reverse-lookup full ``RelationWithLineage`` via ``asyncio.gather(*store.get_relation(...))`` (architect ratify approach (a), msg=cf860ae4; preserves ``compose_context`` byte-parity rendering required by the task #5 invariant).
5. Drop ``None`` results (edge GC'd between sync and search).

Failure paths (embedder / vector store down) swallow and return ``[]``, mirroring ``search_entities``.

§K.12 invariant cross-check
---------------------------

- #4 vector store via abstraction: uses ``VectorStoreConnector``, no Qdrant-specific imports.
- #5 indexer filter: ``Eq("indexer", "graph_relation")`` pinned in ``test_search_relations_uses_graph_relation_filter``.
- #11 D-3 read-only: never invokes any mutation method.
- #12 grep-zero LightRAG: 0 hits in the new code.
- All other invariants n/a (read-only path, no schema changes).

Test coverage
-------------

10 new/replaced cases in ``test_graph_search_service.py``:

- empty query / zero-topk short-circuit (no embed, no search)
- ``Eq("indexer","graph_relation")`` filter + threshold + top_k pinning
- payload parse + ``store.get_relation`` reverse lookup happy path
- hit-order preservation + payload-key dedup
- payload skip cases: missing payload, missing arrow, missing entity_type, empty source, empty target
- GC tolerance: edge deleted between sync and search
- embedder failure swallowed
- vector store failure swallowed

Plus 14 pre-existing ``search_entities`` / ``get_subgraph`` / ``compose_context`` tests retained; total 24/24 pass.
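
Step 3 of the algorithm and the listed payload skip cases can be sketched as a small pure function. The name `parse_relation_hit` is hypothetical, and splitting on the first ``->`` is an assumption; the text above only says the payload is split on ``->``.

```python
from typing import Optional


def parse_relation_hit(payload: Optional[dict]) -> Optional[tuple[str, str, str]]:
    """Return (source, target, relation_type), or None when the hit should be skipped."""
    if not payload:
        return None  # missing payload
    name = payload.get("entity_name")
    relation_type = payload.get("entity_type")
    if not isinstance(name, str) or "->" not in name:
        return None  # missing arrow
    if not isinstance(relation_type, str) or not relation_type:
        return None  # missing entity_type
    source, target = name.split("->", 1)
    if not source or not target:
        return None  # empty source or empty target
    return source, target, relation_type


hit = parse_relation_hit(
    {"indexer": "graph_relation", "entity_name": "Linus->Git", "entity_type": "created"}
)
```

Returning `None` for every malformed shape lets the caller filter hits in one pass before the reverse lookup in step 4.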

earayu commented Apr 27, 2026


earayu merged commit 37bdb73 into main Apr 27, 2026
10 checks passed

earayu deleted the bryce/wave7-task3-sync-extension branch April 27, 2026 18:28

earayu mentioned this pull request Apr 27, 2026

feat(Wave 7 #6): alias_map + transparent redirect + LineageEntityMerger #1758

Merged

earayu mentioned this pull request Apr 28, 2026

feat(Wave 8 W8-1 #12): activate search_relations vector recall #1767

Merged

earayu merged 1 commit into main from bryce/wave7-task3-sync-extension
