Commit 0cc3056

earayu and claude authored
feat(graph): close out Wave 7 — delete legacy + list_entities Protocol (W7-10) (#1765)
Final close-out of Wave 7 §K.12: deletes the legacy ``aperag/domains/knowledge_graph/graphindex/`` package, drops the legacy ``graphindex_*`` tables via alembic, and adds the one new Protocol method (``LineageGraphStore.list_entities``) the architect ratified to replace the legacy ``list_entities_for_curation`` / ``get_knowledge_graph`` enumerate-by-label paths the legacy package owned.

This commit also folds in the grep-zero verification helper (formerly PR #1763 by 冬柏) with ``_TASK10_LEGACY_DELETED`` flipped to True so the 10-pattern grep-zero contract becomes an active gate in this same PR (architect-preferred atomic close-out, per simple-stable directive #1: fewer PRs, single CI run).

## Scope (per architect msg=28afe6ab + 4-question Q1-Q4 ratify msg=838d57c3 / msg=f3216dfc)

1. **NEW Protocol method ``LineageGraphStore.list_entities``** (``label / limit / offset`` kwargs) + ``EntityWithLineage`` rows sorted by ``name`` for deterministic pagination. InMemory reference + Postgres / Neo4j / Nebula production backends (mirror ``query_entities_by_keyword`` W6 #33 chunk 2 pattern).
2. **``aperag/domains/knowledge_graph/service.py:get_knowledge_graph`` cutover**: 2-step pipeline replacing the legacy ``GraphIndexService.get_knowledge_graph``:
   1. ``store.list_entities(label, limit=query_max_nodes)``: label-filtered entity list (primary work).
   2. ``GraphSearchService.get_subgraph(names, hops=max_depth)``: optional edge expansion when ``max_depth > 0``.

   Each layer keeps clean semantics (W7-5 ``get_subgraph`` is anchor-expansion, not label-filter; using it as the primary entry would force a wrapper that re-enumerates entities just to compute anchors. That drift was caught in architect own-up msg=838d57c3 → revised to ``list_entities`` as primary).
3. **``CurationEntity`` adapter** (new ``aperag/graph_curation/dto.py``) replacing the legacy ``Entity`` DTO. The ``from_lineage(EntityWithLineage)`` constructor adapts the storage view into the shape ``build_candidate_pairs`` / ``_pair_score`` / ``_jaccard`` / ``entity_snapshot`` already accept, so the production-validated algorithm needs zero signature changes (architect Q2 ratify, simple-stable directive #3).
4. **``aperag/graph_curation/service.py`` cutover**: ``accept_suggestion`` and ``generate_run`` migrated off the legacy ``GraphIndexService`` bundle:
   - ``accept_suggestion`` delegates to ``LineageEntityMerger.merge_entities`` (W7-6, PR #1758) the same way the W7-8 ``GraphService.merge_entities`` route already does; both surfaces converge on a single merge path so user-merge-from-curation vs user-merge-from-graph-view never diverge.
   - ``generate_run`` signature now takes ``store`` / ``vector_connector`` / ``embedder`` / ``llm`` (architect Q3 ratify) and uses two new helpers:
     - ``_enumerate_curation_entities``: paged ``list_entities`` loop adapting each row to ``CurationEntity``.
     - ``_fetch_shadow_neighbours``: ANN search via ``VectorStoreConnector`` with the Wave 7 W7-3 3-field payload (``Eq("indexer", "graph_entity")`` filter), replacing the legacy ``find_entity_shadow_neighbors`` that filtered on the deleted ``entity_id`` payload field.
5. **``aperag/graph_curation/integration.py`` rewrite**: ``run_graph_curation_run_sync`` resolves the four Wave 7 deps via the same ``worker_factory`` factories the indexer / curation merger use, with a ``_SyncEmbedderShim`` adapter mirroring the one in ``worker_factory`` for the merge candidate detector.
6. **``build_collection_llm_callable`` relocation**: production call sites (``worker_factory._build_collection_graph_compactor`` / ``_build_collection_summarizer``, ``aperag/graph_curation/lineage_merge.py:build_lineage_entity_merger_for``, ``aperag/graph_curation/integration.py``) all import from the canonical home ``aperag/indexing/llm.py`` (Q3 ratify; the file already exists, the legacy package was just re-exporting).
7. **Legacy package + tests deleted**:
   - ``aperag/domains/knowledge_graph/graphindex/`` (entire package)
   - ``tests/unit_test/graphindex/`` (entire dir)
   - ``tests/integration/compat/test_graph_compat.py`` (replaced by ``test_lineage_graph_compat.py`` in W7-1)
8. **Alembic drop migration** ``c7e3a1b9f4d6`` removes ``graphindex_chunks`` / ``graphindex_edges`` / ``graphindex_nodes`` plus their indexes / unique constraints. Hard-cut policy per spec §K.12.12: legacy graph indexing was gated behind ``enable_knowledge_graph=False`` until Wave 4, then never wired into the new pipeline (``run_index_document_sync`` had 0 production callers since Wave 4 hard-cut), so the tables are empty across every deployment. Downgrade recreates empty schema.
9. **Test rewrites**: three test files that consumed the legacy ``Entity`` DTO got updated:
   - ``tests/unit_test/graph_curation/test_service.py`` / ``test_candidate_generation.py``: switched to ``CurationEntity as Entity``.
   - ``tests/unit_test/service/test_search_graph_contract.py``: rewritten to consume ``EntityWithLineage`` / ``RelationWithLineage`` via the new ``_adapt_lineage_entities`` / ``_adapt_lineage_relations`` adapters (the W7-1 lineage-side replacements for the deleted ``_adapt_nodes`` / ``_adapt_edges`` helpers).
   - 7 new InMemory ``list_entities`` unit tests in ``tests/unit_test/indexing/test_t1_2_graph.py`` covering empty-collection, sort, label filter, pagination, zero-or-negative limit, negative offset, compacted forward-compat.

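The label-filtered, name-sorted pagination in scope item 1 and the paged enumeration loop in item 4 can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the dataclass fields, the ``InMemoryStore`` class name, the ``page_size`` parameter, and the handling of non-positive ``limit`` / negative ``offset`` are all assumptions made for the example; only ``list_entities``, ``EntityWithLineage``, ``CurationEntity.from_lineage``, and the ``label`` / ``limit`` / ``offset`` kwargs come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class EntityWithLineage:
    """Hypothetical stand-in for the storage view (real fields not shown here)."""
    name: str
    label: str
    description: str = ""


@dataclass(frozen=True)
class CurationEntity:
    """Hypothetical adapter shape; only ``from_lineage`` is named in the commit."""
    name: str
    label: str
    description: str

    @classmethod
    def from_lineage(cls, row: EntityWithLineage) -> "CurationEntity":
        # Pure projection: storage view in, curation DTO out, no writes.
        return cls(name=row.name, label=row.label, description=row.description)


class InMemoryStore:
    """Illustrative in-memory store exposing a ``list_entities`` method."""

    def __init__(self, rows: list[EntityWithLineage]) -> None:
        self._rows = list(rows)

    def list_entities(
        self, *, label: Optional[str] = None, limit: int = 100, offset: int = 0
    ) -> list[EntityWithLineage]:
        # Assumption: a non-positive limit yields an empty page and a
        # negative offset is clamped to 0.
        if limit <= 0:
            return []
        offset = max(offset, 0)
        rows = [r for r in self._rows if label is None or r.label == label]
        # Sorting by name keeps page boundaries stable across calls.
        rows.sort(key=lambda r: r.name)
        return rows[offset : offset + limit]


def enumerate_curation_entities(
    store: InMemoryStore, *, label: Optional[str] = None, page_size: int = 2
) -> Iterator[CurationEntity]:
    """Walk every page of ``list_entities`` and adapt each row."""
    offset = 0
    while True:
        page = store.list_entities(label=label, limit=page_size, offset=offset)
        if not page:
            return
        for row in page:
            yield CurationEntity.from_lineage(row)
        offset += len(page)
```

Because every page is cut from the same name-sorted sequence, advancing ``offset`` by the page length visits each entity exactly once as long as the underlying data does not change between calls.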
## §K.12 invariant cross-check

| # | Invariant | This PR |
|---|-----------|---------|
| 1 | L1 graph data not polluted | ✅ ``list_entities`` is read-only; storage view → adapter projection only |
| 2 | L1 → L2 single-direction derive | ✅ no derived writes |
| 3 | Compactor before vector embed | N/A (read path) |
| 4 | Vector store via Adaptor | ✅ ``_fetch_shadow_neighbours`` uses ``VectorStoreConnector`` only |
| 5 | payload indexer filter | ✅ ``Eq("indexer","graph_entity")`` filter; no legacy ``entity_id`` payload reference |
| 6 | uuid5 vector point id | N/A (read path) |
| 7 | snapshot-diff lineage name set | N/A (read path) |
| 8 | alias_map persist orphan | ✅ unaffected; alias_map is W7-6 owned |
| 9 | upsert_entity alias redirect | ✅ unaffected; decorator pattern preserved (curation flow uses inner store directly per architect msg=cf860ae4) |
| 10 | DB column length application-cap | ✅ no schema CHECK constraints introduced |
| 11 | candidate detection write-only | ✅ ``MergeCandidateDetector`` unchanged; ``generate_run`` uses same write boundary |
| 12 | grep-zero LightRAG | ✅ `rg "from aperag.domains.knowledge_graph.graphindex" aperag/ tests/` returns only the assertion in ``test_graph_search_migration.py:55``. ``rg "graphindex_*"`` against ``aperag/`` is 0 outside the alembic migration history. The 8 Wave 6-era ``# -- LightRAG-style query layer`` comments + W7-4 line 249 fallback + W7-5 docstrings remain (architect msg=3fe200be: they are descriptive comments referencing design heritage, removable in Wave 8 cleanup if desired) |

## 4-pattern pre-check matrix (paste from PR thread reply)

* **P1 v1**: ``rg "from aperag.domains.knowledge_graph.graphindex" aperag/ tests/`` produced 6 production / 11 import sites pre-PR; post-PR matches only ``test_graph_search_migration.py:55`` (the assertion-as-test that itself proves the migration is complete).
* **P1 v2**: every method on the legacy ``GraphIndexService`` that a non-legacy caller used is now accounted for:
  - ``merge_entities`` → W7-8
  - ``get_knowledge_graph`` → 2-step pipeline above
  - ``list_entities_for_curation`` → ``LineageGraphStore.list_entities`` + ``CurationEntity.from_lineage``
  - ``find_entity_shadow_neighbors`` → ``_fetch_shadow_neighbours`` via ``VectorStoreConnector``
  - ``list_labels`` → already migrated W6 #40 + W7-1 ``compacted_description`` field
* **P2**: alembic ``c7e3a1b9f4d6`` drops the legacy tables, alembic env.py loses the legacy ``graphindex.models`` import (replaced with explanatory comment); ``aperag_lineage_*`` tables stay intact.
* **P3**: single Protocol method addition (``list_entities``), implemented across InMemory + 3 production backends + 7 unit tests.

## simple-stable 4-guardrail

| Guardrail | Status |
|---|---|
| #1 No unbounded scope growth | ✅ ``list_entities`` is base capability mirroring `delete_entity` / `query_entities_by_keyword`; no new endpoints, no new schema tables |
| #2 Ship as soon as possible | ✅ single PR closes Wave 7; all 11 prior task PRs already merged |
| #3 Simple and stable | ✅ adapter pattern preserves production-validated `build_candidate_pairs`; ``list_entities`` follows existing pagination idiom |
| #4 Maintenance-free private deployment | ✅ alembic auto-drops legacy tables; no operator config; ``list_entities`` uses the same backend factories the rest of Wave 7 wires |

## Test plan

- [x] All 1142 unit tests pass (``uv run pytest tests/unit_test/``)
- [x] ``alembic upgrade head --sql`` generates the expected ``DROP INDEX`` / ``DROP TABLE`` cascade
- [x] ``alembic heads`` resolves to single head ``c7e3a1b9f4d6``
- [x] ``ruff format --check`` / ``ruff check`` clean on touched files
- [ ] CI compat-graph + e2e-http stages (both gated post-merge)
- [ ] Pair with 冬柏's grep-zero helper PR #1763: flip ``_TASK10_LEGACY_DELETED=True`` once both PRs merge

Closes Wave 7. Next: architect final review per spec §K.12.12.

## Fold-in: grep-zero verification helper (formerly PR #1763)

Per architect ratify + 冬柏 authorization, PR #1763 is folded into this commit instead of shipping as a separate PR. Contents:

* **``tests/integration/test_w7_grep_zero_legacy_graphindex.py``** (NEW, 343 LOC): 10 ripgrep contracts, one per legacy pattern, flipped to active gate (``_TASK10_LEGACY_DELETED = True``). Patterns cover:
  1. ``from aperag.domains.knowledge_graph.graphindex`` imports
  2. ``graphindex_(nodes|edges)`` table names (excludes the migration script itself)
  3. bare ``import aperag.domains.knowledge_graph.graphindex``
  4. ``_sync_entity_relation_vectors`` (W7-3 superseded)
  5. ``_compact_oversized_descriptions`` (W7-2 superseded)
  6. ``_summarize_description`` (W7-2 superseded)
  7. ``_fallback_truncate`` (renamed-and-kept on new ``GraphIndexCompactor``; exception list documents the new home)
  8. ``_delete_removed_shadow_vectors`` (W7-3 superseded)
  9. ``GraphSearchContract.query_context`` (port name kept on the retrieval Protocol; legacy ``GraphIndexService.query_context`` historical-context comments allow-listed)
  10. ``GraphIndexService.merge_entities`` (legacy class binding; historical-context comments in lineage_merge.py + test_wave7_task8_wiring.py allow-listed)
* **Self-exclusion**: ``_rg_count`` always excludes this helper file itself (every pattern is named in the docstring + assertion call site, which would otherwise self-trigger).
* **``aperag/mcp/server.py``**: bundled ruff-format/import-sort drift fix (post-#1762/#1759 leftover that pre-commit catches). Kept here so the close-out PR lands cleanly through ``make lint``.

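The self-exclusion and allow-list behaviour described above can be illustrated with a small sketch. Note the swap: the real helper shells out to ripgrep through ``_rg_count``, whose signature is not shown in this commit message, so the example below uses a pure-Python regex scan as a stand-in. The names ``count_matches`` and ``assert_grep_zero`` and their parameters are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path


def count_matches(
    pattern: str, root: Path, *, exclude: frozenset[str] = frozenset()
) -> dict[str, int]:
    """Count matching lines per ``*.py`` file under ``root``.

    Pure-Python stand-in for a ripgrep count; ``exclude`` holds
    POSIX-style paths relative to ``root`` that are skipped entirely.
    """
    regex = re.compile(pattern)
    hits: dict[str, int] = {}
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        if rel in exclude:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        n = sum(1 for line in lines if regex.search(line))
        if n:
            hits[rel] = n
    return hits


def assert_grep_zero(
    pattern: str,
    root: Path,
    *,
    helper_file: str,
    allow_list: frozenset[str] = frozenset(),
) -> None:
    """Fail if ``pattern`` survives anywhere outside the excluded files.

    ``helper_file`` is always excluded: the test module names every legacy
    pattern in its own source, so it would otherwise match itself.
    """
    hits = count_matches(
        pattern, root, exclude=frozenset({helper_file}) | allow_list
    )
    assert not hits, f"legacy pattern {pattern!r} still present: {hits}"
```

In this sketch the allow-list is a set of whole files; the commit's helper may scope its exceptions more narrowly (for example to specific comments), which the message does not detail.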
## Test plan (final)

- [x] 1141 unit tests pass (``uv run pytest tests/unit_test/``)
- [x] 10/10 grep-zero integration tests pass (``uv run pytest tests/integration/test_w7_grep_zero_legacy_graphindex.py``)
- [x] ``alembic upgrade head --sql`` generates the expected ``DROP INDEX`` / ``DROP TABLE`` cascade
- [x] ``ruff format --check`` / ``ruff check`` clean on touched files
- [ ] CI e2e-http-smoke + e2e-http-provider (gated post-merge)

Closes Wave 7. Next: architect final review per spec §K.12.12.

Co-authored-by: 冬柏 <noreply@anthropic.com>
1 parent 90b3a4a commit 0cc3056

46 files changed

Lines changed: 1300 additions & 8261 deletions

aperag/domains/knowledge_graph/graphindex/__init__.py

Lines changed: 0 additions & 53 deletions
This file was deleted.

aperag/domains/knowledge_graph/graphindex/config.py

Lines changed: 0 additions & 160 deletions
This file was deleted.
