Commit 0de5f13
feat(graph): close out Wave 7 — delete legacy + list_entities Protocol (W7-10)
Final close-out of Wave 7 §K.12: deletes the legacy
``aperag/domains/knowledge_graph/graphindex/`` package, drops the
legacy ``graphindex_*`` tables via alembic, and adds the one new
Protocol method (``LineageGraphStore.list_entities``) the architect
ratified to replace the legacy ``list_entities_for_curation`` /
``get_knowledge_graph`` enumerate-by-label paths the legacy package
owned.
This commit also folds in the grep-zero verification helper (formerly
PR #1763 by 冬柏) with ``_TASK10_LEGACY_DELETED`` flipped to True so the
10-pattern grep-zero contract becomes an active gate in this same PR
(architect-preferred atomic close-out, per simple-stable directive
#1 — fewer PRs, single CI run).
## Scope (per architect msg=28afe6ab + 4-question Q1-Q4 ratify msg=838d57c3 / msg=f3216dfc)
1. **NEW Protocol method ``LineageGraphStore.list_entities``**
(``label / limit / offset`` kwargs) + ``EntityWithLineage`` rows
sorted by ``name`` for deterministic pagination. InMemory
reference + Postgres / Neo4j / Nebula production backends
(mirroring the ``query_entities_by_keyword`` pattern from W6 #33 chunk 2).
2. **``aperag/domains/knowledge_graph/service.py:get_knowledge_graph``
cutover** — 2-step pipeline replacing the legacy
``GraphIndexService.get_knowledge_graph``:
1. ``store.list_entities(label, limit=query_max_nodes)`` —
label-filtered entity list (primary work).
2. ``GraphSearchService.get_subgraph(names, hops=max_depth)`` —
optional edge expansion when ``max_depth > 0``.
Each layer keeps clean semantics: W7-5 ``get_subgraph`` is
anchor expansion, not a label filter. Using it as the primary entry
point would force a wrapper that re-enumerates entities just to
compute anchors (drift caught in architect own-up msg=838d57c3,
which revised the design to make ``list_entities`` primary).
3. **``CurationEntity`` adapter** (new
``aperag/graph_curation/dto.py``) replacing the legacy ``Entity``
DTO. ``from_lineage(EntityWithLineage)`` constructor adapts the
storage view into the shape ``build_candidate_pairs`` /
``_pair_score`` / ``_jaccard`` / ``entity_snapshot`` already
accept, so the production-validated algorithm ships with zero
signature changes (architect Q2 ratify, simple-stable directive #3).
4. **``aperag/graph_curation/service.py`` cutover** —
``accept_suggestion`` and ``generate_run`` migrated off the
legacy ``GraphIndexService`` bundle:
- ``accept_suggestion`` delegates to
``LineageEntityMerger.merge_entities`` (W7-6, PR #1758) the same
way the W7-8 ``GraphService.merge_entities`` route already does;
both surfaces converge on a single merge path, so merges started
from curation and merges started from the graph view never diverge.
- ``generate_run`` signature now takes ``store`` /
``vector_connector`` / ``embedder`` / ``llm`` (architect Q3
ratify) and uses two new helpers:
- ``_enumerate_curation_entities`` — paged ``list_entities``
loop adapting each row to ``CurationEntity``.
- ``_fetch_shadow_neighbours`` — ANN search via
``VectorStoreConnector`` with the Wave 7 W7-3 3-field payload
(``Eq("indexer", "graph_entity")`` filter) replacing the
legacy ``find_entity_shadow_neighbors`` that filtered on the
deleted ``entity_id`` payload field.
5. **``aperag/graph_curation/integration.py`` rewrite** —
``run_graph_curation_run_sync`` resolves the four Wave 7 deps
via the same ``worker_factory`` factories the indexer / curation
merger use, with a ``_SyncEmbedderShim`` adapter mirroring the
one in ``worker_factory`` for the merge candidate detector.
6. **``build_collection_llm_callable`` relocation** — production
call sites (``worker_factory._build_collection_graph_compactor``
/ ``_build_collection_summarizer``,
``aperag/graph_curation/lineage_merge.py:build_lineage_entity_merger_for``,
``aperag/graph_curation/integration.py``) all import from the
canonical home ``aperag/indexing/llm.py`` (Q3 ratify; the file
already existed and the legacy package was only re-exporting it).
7. **Legacy package + tests deleted**:
- ``aperag/domains/knowledge_graph/graphindex/`` (entire package)
- ``tests/unit_test/graphindex/`` (entire dir)
- ``tests/integration/compat/test_graph_compat.py`` (replaced
by ``test_lineage_graph_compat.py`` in W7-1)
8. **Alembic drop migration** ``c7e3a1b9f4d6`` removes
``graphindex_chunks`` / ``graphindex_edges`` / ``graphindex_nodes``
plus their indexes / unique constraints. Hard-cut policy per
spec §K.12.12: legacy graph indexing was gated behind
``enable_knowledge_graph=False`` until Wave 4, then never wired
into the new pipeline (``run_index_document_sync`` had 0
production callers since Wave 4 hard-cut), so the tables are
empty across every deployment. Downgrade recreates empty schema.
9. **Test rewrites** — three test files that consumed the legacy
``Entity`` DTO were updated:
- ``tests/unit_test/graph_curation/test_service.py`` /
``test_candidate_generation.py`` — switched to
``CurationEntity as Entity``.
- ``tests/unit_test/service/test_search_graph_contract.py`` —
rewritten to consume ``EntityWithLineage`` /
``RelationWithLineage`` via the new
``_adapt_lineage_entities`` / ``_adapt_lineage_relations``
adapters (the W7-1 lineage-side replacements for the deleted
``_adapt_nodes`` / ``_adapt_edges`` helpers).
- 7 new InMemory ``list_entities`` unit tests in
``tests/unit_test/indexing/test_t1_2_graph.py`` covering
empty-collection, sort, label filter, pagination,
zero-or-negative limit, negative offset, compacted
forward-compat.
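The storage-side contract in items 1, 3 and 4 can be sketched in a few lines: filter by label, sort by ``name``, slice by ``limit`` / ``offset``, then page through the results and project each row into a curation DTO. This is a toy, not the shipped code. ``InMemoryGraphStore``, the three-field ``EntityWithLineage`` / ``CurationEntity`` shapes, the ``label`` attribute, and the edge-case choices (a negative offset raises, a non-positive limit returns an empty page) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EntityWithLineage:
    # Hypothetical minimal storage view; the real row carries more fields.
    name: str
    label: str
    description: str = ""


@dataclass(frozen=True)
class CurationEntity:
    # Hypothetical shape of the DTO the pairing code consumes.
    name: str
    label: str
    description: str

    @classmethod
    def from_lineage(cls, row: EntityWithLineage) -> CurationEntity:
        # Pure projection: nothing is written back to the store.
        return cls(name=row.name, label=row.label, description=row.description)


class InMemoryGraphStore:
    """Toy stand-in for the InMemory reference backend."""

    def __init__(self) -> None:
        self._entities: dict[str, EntityWithLineage] = {}

    def upsert(self, entity: EntityWithLineage) -> None:
        self._entities[entity.name] = entity

    def list_entities(
        self, *, label: str | None = None, limit: int = 100, offset: int = 0
    ) -> list[EntityWithLineage]:
        # Assumed edge-case semantics: reject a negative offset and return
        # an empty page for a non-positive limit.
        if offset < 0:
            raise ValueError("offset must be >= 0")
        if limit <= 0:
            return []
        rows = [e for e in self._entities.values() if label is None or e.label == label]
        rows.sort(key=lambda e: e.name)  # fixed order keeps offset paging deterministic
        return rows[offset : offset + limit]


def enumerate_curation_entities(
    store: InMemoryGraphStore, *, page_size: int = 100
) -> list[CurationEntity]:
    """Page through list_entities and adapt each row for curation."""
    if page_size <= 0:
        raise ValueError("page_size must be > 0")  # otherwise the loop never advances
    out: list[CurationEntity] = []
    offset = 0
    while True:
        page = store.list_entities(limit=page_size, offset=offset)
        out.extend(CurationEntity.from_lineage(row) for row in page)
        if len(page) < page_size:  # a short (or empty) page means we are done
            return out
        offset += page_size
```

Sorting before slicing is what makes the paging loop safe. Without a fixed order, two consecutive ``offset`` windows could overlap or skip rows.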
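The two-step ``get_knowledge_graph`` pipeline from item 2 can be illustrated with a toy graph that plays both roles: the store and ``GraphSearchService``. ``ToyGraph``, ``Edge`` and the breadth-first expansion are hypothetical. Only the call shape follows the description above: ``list_entities`` runs first, and ``get_subgraph(names, hops=max_depth)`` runs only when ``max_depth > 0``.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Edge:
    src: str
    dst: str


class ToyGraph:
    """Hypothetical stand-in for both the lineage store and GraphSearchService."""

    def __init__(self, labels: dict[str, str], edges: list[Edge]) -> None:
        self.labels = labels  # entity name -> label
        self.edges = edges

    def list_entities(self, *, label: str, limit: int) -> list[str]:
        # Step-1 primitive: label filter, sorted by name, capped at limit.
        return sorted(n for n, lab in self.labels.items() if lab == label)[:limit]

    def get_subgraph(self, names: list[str], *, hops: int) -> tuple[set[str], set[Edge]]:
        # Step-2 primitive: breadth-first anchor expansion, up to `hops` rings.
        seen, frontier, kept = set(names), set(names), set()
        for _ in range(hops):
            next_frontier = set()
            for edge in self.edges:
                if edge.src in frontier or edge.dst in frontier:
                    kept.add(edge)
                    for node in (edge.src, edge.dst):
                        if node not in seen:
                            seen.add(node)
                            next_frontier.add(node)
            frontier = next_frontier
        return seen, kept


def get_knowledge_graph(
    graph: ToyGraph, label: str, *, query_max_nodes: int, max_depth: int
) -> dict:
    # Step 1 (primary work): label-filtered entity list.
    names = graph.list_entities(label=label, limit=query_max_nodes)
    if max_depth <= 0 or not names:
        return {"nodes": names, "edges": []}
    # Step 2 (optional): expand edges around the enumerated anchors.
    nodes, edges = graph.get_subgraph(names, hops=max_depth)
    return {"nodes": sorted(nodes), "edges": sorted(edges)}
```

In this sketch, expansion can pull in neighbours with other labels and push the node count past ``query_max_nodes``. Whether the real service caps the expanded set is not stated in this commit.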
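Item 5 mentions a ``_SyncEmbedderShim`` that lets synchronous curation code call an embedder. Its actual interface is not shown in this commit. The sketch below assumes the wrapped embedder is an ``async`` callable that takes a list of strings and returns one vector per string, and bridges it to synchronous callers with ``asyncio.run``. The class name ``SyncEmbedderShim`` and its ``embed`` method are hypothetical.

```python
import asyncio
from typing import Any, Callable, Coroutine, Sequence

# Assumed embedder signature: async (texts) -> one vector per text.
AsyncEmbed = Callable[[list[str]], Coroutine[Any, Any, list[list[float]]]]


class SyncEmbedderShim:
    """Hypothetical sync facade over an async embedding callable."""

    def __init__(self, embed_async: AsyncEmbed) -> None:
        self._embed_async = embed_async

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        # asyncio.run() creates a fresh event loop for this call and closes it
        # afterwards. It raises RuntimeError if a loop is already running in
        # this thread, so the shim only suits plain synchronous callers.
        return asyncio.run(self._embed_async(list(texts)))
```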
## §K.12 invariant cross-check
| # | Invariant | This PR |
|---|-----------|---------|
| 1 | L1 graph data not polluted | ✅ ``list_entities`` is read-only; storage view → adapter projection only |
| 2 | L1 → L2 single-direction derive | ✅ no derived writes |
| 3 | Compactor before vector embed | N/A — read path |
| 4 | Vector store via Adaptor | ✅ ``_fetch_shadow_neighbours`` uses ``VectorStoreConnector`` only |
| 5 | payload indexer filter | ✅ ``Eq("indexer","graph_entity")`` filter; no legacy ``entity_id`` payload reference |
| 6 | uuid5 vector point id | N/A — read path |
| 7 | snapshot-diff lineage name set | N/A — read path |
| 8 | alias_map persist orphan | ✅ unaffected; alias_map is W7-6 owned |
| 9 | upsert_entity alias redirect | ✅ unaffected; decorator pattern preserved (curation flow uses inner store directly per architect msg=cf860ae4) |
| 10 | DB column length application-cap | ✅ no schema CHECK constraints introduced |
| 11 | candidate detection write-only | ✅ ``MergeCandidateDetector`` unchanged; ``generate_run`` uses same write boundary |
| 12 | grep-zero LightRAG | ✅ ``rg "from aperag.domains.knowledge_graph.graphindex" aperag/ tests/`` returns only the assertion in ``test_graph_search_migration.py:55``; ``rg "graphindex_*"`` against ``aperag/`` returns 0 matches outside the alembic migration history. Still present (per architect msg=3fe200be, descriptive comments referencing design heritage, removable in Wave 8 cleanup if desired): the 8 Wave 6-era ``# -- LightRAG-style query layer`` comments, the W7-4 line 249 fallback, and the W7-5 docstrings |
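Invariants 4 and 5 require vector access only through the adaptor, with results filtered on the ``indexer`` payload field. The brute-force stand-in below illustrates the ANN query in ``_fetch_shadow_neighbours`` under those rules. It does not use the real ``VectorStoreConnector`` API. ``Point``, the ``name`` payload key, ``top_k`` / ``min_score`` and cosine scoring are assumptions; only the ``indexer == "graph_entity"`` filter comes from the text above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    id: str
    vector: tuple[float, ...]
    payload: dict[str, str] = field(default_factory=dict)


def _cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fetch_shadow_neighbours(points, query_vector, *, self_name, top_k=5, min_score=0.0):
    """Brute-force stand-in for a filtered ANN query (hypothetical helper)."""
    # Stand-in for an Eq("indexer", "graph_entity") payload filter: only
    # graph-entity points are candidates, so chunk vectors never leak in.
    # The entity itself (and points without a name) are skipped.
    candidates = [
        p
        for p in points
        if p.payload.get("indexer") == "graph_entity"
        and p.payload.get("name") not in (None, self_name)
    ]
    scored = [(_cosine(query_vector, p.vector), p.payload["name"]) for p in candidates]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, name breaks ties
    return [(name, score) for score, name in scored[:top_k]]
```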
## 4-pattern pre-check matrix (paste from PR thread reply)
* **P1 v1** — ``rg "from aperag.domains.knowledge_graph.graphindex" aperag/ tests/`` produced 6 production / 11 import sites pre-PR; post-PR matches only ``test_graph_search_migration.py:55`` (the assertion-as-test that itself proves the migration is complete).
* **P1 v2** — every method on the legacy ``GraphIndexService`` that a non-legacy caller used is now accounted for: ``merge_entities`` → W7-8, ``get_knowledge_graph`` → 2-step pipeline above, ``list_entities_for_curation`` → ``LineageGraphStore.list_entities`` + ``CurationEntity.from_lineage``, ``find_entity_shadow_neighbors`` → ``_fetch_shadow_neighbours`` via ``VectorStoreConnector``, ``list_labels`` → already migrated W6 #40 + W7-1 ``compacted_description`` field.
* **P2** — alembic ``c7e3a1b9f4d6`` drops the legacy tables, alembic env.py loses the legacy ``graphindex.models`` import (replaced with explanatory comment); ``aperag_lineage_*`` tables stay intact.
* **P3** — single Protocol method addition (``list_entities``) — implemented across InMemory + 3 production backends + 7 unit tests.
## simple-stable 4-guardrail
| Guardrail | Status |
|---|---|
| #1 Don't expand scope indefinitely | ✅ ``list_entities`` is a base capability mirroring ``delete_entity`` / ``query_entities_by_keyword``; no new endpoints, no new schema tables |
| #2 Ship as soon as possible | ✅ a single PR closes Wave 7; all 11 prior task PRs are already merged |
| #3 Simple and stable | ✅ the adapter pattern preserves the production-validated ``build_candidate_pairs``; ``list_entities`` follows the existing pagination idiom |
| #4 Maintenance-free private deployment | ✅ alembic auto-drops the legacy tables; no operator config; ``list_entities`` uses the same backend factories the rest of Wave 7 wires |
## Test plan
- [x] All 1142 unit tests pass (``uv run pytest tests/unit_test/``)
- [x] ``alembic upgrade head --sql`` generates the expected
``DROP INDEX`` / ``DROP TABLE`` cascade
- [x] ``alembic heads`` resolves to single head ``c7e3a1b9f4d6``
- [x] ``ruff format --check`` / ``ruff check`` clean on touched files
- [ ] CI compat-graph + e2e-http stages — both gated post-merge
- [ ] Pair with 冬柏's grep-zero helper PR #1763 — flip
``_TASK10_LEGACY_DELETED=True`` once both PRs merge
Closes Wave 7. Next: architect final review per spec §K.12.12.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
## Fold-in: grep-zero verification helper (formerly PR #1763)
Per architect ratify + 冬柏 authorization, PR #1763 is folded into
this commit instead of shipping as a separate PR. Contents:
* **``tests/integration/test_w7_grep_zero_legacy_graphindex.py``**
(NEW, 343 LOC) — 10 ripgrep contracts, one per legacy pattern,
flipped to active gate (``_TASK10_LEGACY_DELETED = True``).
Patterns cover:
1. ``from aperag.domains.knowledge_graph.graphindex`` imports
2. ``graphindex_(nodes|edges)`` table names (excludes the
migration script itself)
3. bare ``import aperag.domains.knowledge_graph.graphindex``
4. ``_sync_entity_relation_vectors`` (W7-3 superseded)
5. ``_compact_oversized_descriptions`` (W7-2 superseded)
6. ``_summarize_description`` (W7-2 superseded)
7. ``_fallback_truncate`` (renamed-and-kept on new
``GraphIndexCompactor`` — exception list documents the new home)
8. ``_delete_removed_shadow_vectors`` (W7-3 superseded)
9. ``GraphSearchContract.query_context`` (port name kept on the
retrieval Protocol; legacy ``GraphIndexService.query_context``
historical-context comments allow-listed)
10. ``GraphIndexService.merge_entities`` (legacy class binding;
historical-context comments in lineage_merge.py +
test_wave7_task8_wiring.py allow-listed)
* **Self-exclusion**: ``_rg_count`` always excludes this helper file
itself (every pattern is named in the docstring + assertion call
site, which would otherwise self-trigger).
* **``aperag/mcp/server.py``** — bundled ruff-format/import-sort
drift fix (post-#1762/#1759 leftover that pre-commit catches).
Kept here so the close-out PR lands cleanly through ``make lint``.
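The self-exclusion rule for ``_rg_count`` can be shown with a pure-Python stand-in for ripgrep. The real helper's internals are not shown in this commit, and ``rg_count``, ``assert_grep_zero`` and the ``.py``-only file scan below are hypothetical simplifications. The point is that the helper file, which names every legacy pattern in its own source, is excluded from the count. The same ``exclude`` hook would also cover pattern 2's carve-out for the migration script.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterable


def rg_count(
    pattern: str,
    roots: Iterable[str | Path],
    *,
    exclude: Iterable[str | Path] = (),
    suffix: str = ".py",
) -> int:
    """Count matching lines under `roots`, skipping excluded files (hypothetical)."""
    rx = re.compile(pattern)
    skipped = {Path(p).resolve() for p in exclude}
    hits = 0
    for root in roots:
        for path in sorted(Path(root).rglob(f"*{suffix}")):
            if not path.is_file() or path.resolve() in skipped:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits += sum(1 for line in text.splitlines() if rx.search(line))
    return hits


LEGACY_IMPORT = re.escape("from aperag.domains.knowledge_graph.graphindex")


def assert_grep_zero(pattern: str, roots, *, helper_file) -> None:
    # The helper names every pattern in its own source, so it must never
    # count against itself.
    count = rg_count(pattern, roots, exclude=[helper_file])
    assert count == 0, f"{count} legacy reference(s) still match {pattern!r}"
```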
## Test plan (final)
- [x] 1141 unit tests pass (``uv run pytest tests/unit_test/``)
- [x] 10/10 grep-zero integration tests pass
(``uv run pytest tests/integration/test_w7_grep_zero_legacy_graphindex.py``)
- [x] ``alembic upgrade head --sql`` generates the expected
``DROP INDEX`` / ``DROP TABLE`` cascade
- [x] ``ruff format --check`` / ``ruff check`` clean on touched files
- [ ] CI e2e-http-smoke + e2e-http-provider — gated post-merge
Closes Wave 7. Next: architect final review per spec §K.12.12.
Co-Authored-By: 冬柏 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

1 parent 985b7d5, commit 0de5f13
49 files changed
Lines changed: 1309 additions & 8282 deletions
File tree
- aperag
- domains/knowledge_graph
- graphindex
- engine
- storage
- graph_curation
- indexing
- graph_storage
- mcp
- migration
- versions
- tests
- integration
- compat
- unit_test
- graph_curation
- graphindex
- indexing
- mcp
- service