Skip to content

feat(graph_curation): task #61 P2-S1+S2 — batch alias resolution#1950

Merged
earayu merged 1 commit into
mainfrom
bryce/task-88-p2-batch-alias-resolution
Apr 30, 2026
Merged

feat(graph_curation): task #61 P2-S1+S2 — batch alias resolution#1950
earayu merged 1 commit into
mainfrom
bryce/task-88-p2-batch-alias-resolution

Conversation

@earayu
Copy link
Copy Markdown
Collaborator

@earayu earayu commented Apr 30, 2026

Summary

Closes task #88 per PM @不穷 dispatch (msg=8f130f25). Implements task #61 spec v1 § 2.4 P2-S1 + P2-S2: batch alias resolution to fix the Singapore PG connection saturation observed by Planetegg (msg=4043adf4 SRE diagnostic).

Changes

  • AliasMapRepository.resolve_canonical_many (new) — single SQL SELECT ... WHERE alias_name IN (...) roundtrip; total PG connections per call: 1 regardless of seed count. Caller order preserved (dict insertion-order semantics).
  • LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops — rewritten to use the batch primitive. asyncio.gather per-name fan-out removed; import asyncio no longer needed.
  • Test stub _FakeAliasRepo extended with resolve_canonical_many + call-count tracking so tests can pin the call-graph.

Background

Pre-fix, expand_neighbors_n_hops checked out one PG connection per anchor name via asyncio.gather. At spec-cap seed cardinalities (GET /graphs?max_nodes=1000 → 2000 seeds; /graphs/hybrid → up to 5000 seeds) the PG connection pool saturates and unrelated API requests stall.

Tests

  • tests/unit_test/graph_curation/test_alias_map.py (7 new): unmapped names self-resolve / mixed alias+canonical / input dedupe / empty input / empty string / per-collection isolation / 2000-seed spec-cap correctness.
  • tests/unit_test/indexing/test_alias_redirect_store.py (2 new): call-graph gate (exactly 1 resolve_canonical_many call + 0 resolve_canonical calls) at 5-seed and 2000-seed spec-cap.

Local: uv run pytest tests/unit_test/graph_curation/ tests/unit_test/indexing/test_alias_redirect_store.py56 passed.

Test plan

  • Local unit tests pass (56)
  • ruff check + ruff format --check clean
  • CI lint-and-unit + e2e green
  • @符炫炜 architect ratify
  • @weston architecture CR (call-graph gate sufficient)
  • @huangzhangshu testing CR (spec-cap quantified)
  • @Planetegg SRE acknowledge (production root cause fix verify)

Spec / scope alignment

Follow-up (NOT in this PR)

P3 cross-cutting: other LineageGraphStore consumers (e.g. some GraphCurationService internals) that still invoke per-name alias resolution should migrate to the batch primitive — gated on production data showing residual fan-out is a real bottleneck.

🤖 Generated with Claude Code

…seed PG connection saturation fix

Closes task #88 per PM @不穷 dispatch (msg=8f130f25). Implements task
#61 spec v1 § 2.4 P2-S1 + Planetegg P2-HIGH (msg=db7fb085 +
msg=1314ac59) + Singapore SRE diagnostic (Planetegg msg=4043adf4)
batch alias resolution.

Background
----------
``LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops`` is on
the ``GET /api/v2/collections/{id}/graphs`` and ``/graphs/hybrid``
read paths. Pre-fix, it called ``AliasMapRepository.resolve_canonical``
once per anchor name via ``asyncio.gather``, which checks out one
PG connection per name. Spec § 2.4 P2-S1 quantification:

* ``GET /graphs?max_nodes=1000`` → up to **2 × max_nodes = 2000**
  seeds.
* ``GET /graphs/hybrid``: default 1000 / max 5000 seeds.

At those cardinalities the PG connection pool saturates, observed
in Singapore production (Planetegg msg=4043adf4 SRE diagnostic).

Changes
-------
* ``aperag/graph_curation/alias_map.py``: new
  :meth:`AliasMapRepository.resolve_canonical_many` batch primitive.
  Single SQL ``SELECT alias_name, canonical_name FROM
  aperag_lineage_entity_alias WHERE collection_id=? AND alias_name
  IN (...)`` reads all matching rows in one shot. Names absent from
  the result set fall back to themselves (mirrors single-name
  ``resolve_canonical`` semantics). Empty / falsy names short-
  circuit without an SQL lookup. Total connections checked out: **1**
  per call regardless of seed count. Caller order is preserved on
  the dict iteration order (insertion order semantics).
* ``aperag/indexing/alias_redirect_store.py``: rewrite
  ``LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops`` to
  use the batch primitive. ``asyncio.gather`` per-name fan-out gone;
  ``import asyncio`` no longer needed at module top-level.
* Test stub ``_FakeAliasRepo`` in
  ``tests/unit_test/indexing/test_alias_redirect_store.py``: now
  implements both ``resolve_canonical`` (single, used by
  upsert/get/delete redirect paths) and ``resolve_canonical_many``
  (batch, used by expand) + tracks call counts so tests can pin the
  call-graph (i.e. expand path goes through batch primitive exactly
  once).

Tests
-----
* ``tests/unit_test/graph_curation/test_alias_map.py`` (7 new):
  - ``test_resolve_canonical_many_returns_self_for_unmapped_names``
  - ``test_resolve_canonical_many_mixed_alias_and_canonical``
  - ``test_resolve_canonical_many_dedupes_input``
  - ``test_resolve_canonical_many_empty_input``
  - ``test_resolve_canonical_many_handles_empty_string``
  - ``test_resolve_canonical_many_per_collection_isolation``
  - ``test_resolve_canonical_many_large_seed_cap`` (2000-name spec
    quantification — pinned correctness at the spec-cap so a future
    regression that re-introduces per-name fan-out either times out
    or breaks the result shape).
* ``tests/unit_test/indexing/test_alias_redirect_store.py`` (2 new):
  - ``test_expand_neighbors_uses_batch_alias_resolution`` —
    call-graph gate: exactly 1 ``resolve_canonical_many`` call,
    zero ``resolve_canonical`` calls, regardless of seed count. A
    regression that re-introduces the gather pattern is caught
    immediately.
  - ``test_expand_neighbors_large_seed_cap_uses_single_batch_call``
    — 2000-seed spec-cap pinned at the call-graph level.

Local: ``uv run pytest tests/unit_test/graph_curation/
tests/unit_test/indexing/test_alias_redirect_store.py`` →
**56 passed, 1 warning**.

Spec / scope alignment
----------------------
* task #61 spec v1 § 2.4 P2-S1 — batch resolve primitive ✅
* task #61 spec v1 § 2.4 P2-S2 — ``expand_neighbors_n_hops`` seed
  cap test ✅
* Lesson #17 backend 收敛 contract — single primitive replaces N-
  parallel fan-out at the same caller, no FE / API changes
  required ✅
* Lesson #18 mechanical gate codification — call-graph assertion in
  the redirect-store test is the mechanical gate (caught by CI on
  any future regression that bypasses the batch primitive) ✅

Follow-ups (NOT in this PR)
---------------------------
* P3 cross-cutting concern: every ``LineageGraphStore`` consumer
  that currently invokes the alias path per-name (e.g. some
  GraphCurationService internals) should migrate to the batch
  primitive — independent task gated on production data showing
  the residual N-fan-out is a real bottleneck.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@earayu
Copy link
Copy Markdown
Collaborator Author

earayu commented Apr 30, 2026

Testing CR LGTM ✅

I reviewed the batch alias-resolution path against task #61 P2-S1/S2 and the Singapore PG connection fan-out failure mode.

What I checked:

  • AliasMapRepository.resolve_canonical_many() resolves a batch through one SELECT ... alias_name IN (...) and preserves the single-name semantics: unmapped names resolve to themselves, empty input is safe, empty string is preserved, duplicate input collapses by dict semantics, and collection isolation is preserved.
  • LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops() no longer imports/uses asyncio.gather; it calls resolve_canonical_many() once, then dedupes canonical anchors before calling the inner graph store.
  • The call-graph tests catch the regression we care about: reintroducing per-name resolve_canonical() calls or calling the batch primitive once per seed would fail via call counters.
  • The 2000-seed test covers the documented /graphs?max_nodes=1000 fan-out shape and keeps the high-cardinality path exercised.

Local validation:

uv run --extra test pytest tests/unit_test/graph_curation/test_alias_map.py tests/unit_test/indexing/test_alias_redirect_store.py -q
# 35 passed

uv run ruff check aperag/graph_curation/alias_map.py aperag/indexing/alias_redirect_store.py tests/unit_test/graph_curation/test_alias_map.py tests/unit_test/indexing/test_alias_redirect_store.py
# All checks passed

git diff --check origin/main...HEAD
# passed

No testing blocker from my lane. CI green / accepted flake waiver + remaining architecture/SRE checks are enough to merge.

@earayu earayu merged commit eb4c4f3 into main Apr 30, 2026
21 of 22 checks passed
@earayu earayu deleted the bryce/task-88-p2-batch-alias-resolution branch April 30, 2026 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant