* refactor(task-31-a3): description-free graph_curation 7 call sites
Wave 5 description-NULL invariant (task #31 spec § 3.1.5): the graph
extractor stopped emitting `description` text after Wave 5 task #5
(facts/vectors split). The dedup detection / scoring / snapshot /
accept-apply paths still read `entity.description` /
`compacted_description` / `description_parts`, so they would either
silently degrade scoring (always-empty bag-of-tokens) or leak stale
fragments from pre-Wave-5 rows into reviewer-facing suggestions.
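The degraded-scoring failure mode can be illustrated with a minimal sketch. It assumes the description signal was a Jaccard overlap of whitespace tokens; `jaccard_overlap` is a hypothetical helper, not the project's code, and returning 0.0 for two empty bags is an assumed convention.

```python
def jaccard_overlap(text_a: str, text_b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (hypothetical helper)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Assumed convention: two empty bags contribute no evidence.
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


# Pre-Wave-5 rows carried description text, so the signal was informative.
pre_wave5 = jaccard_overlap("graph database engine", "graph database vendor")

# Post-Wave-5 rows have description == "", so the signal is constant.
post_wave5 = jaccard_overlap("", "")
```

With every description empty, a signal like this is the same for every pair, which is why the commit drops it rather than leaving a dead weight in the score.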
Fix the 6 detector / snapshot call sites + 1 apply path enumerated
in the spec, plus 1 service-layer helper surfaced by the boundary
test grep gate:
1. candidate_generation.py:38 entity_snapshot: drop description
2. candidate_generation.py:179 _lexical_signals: drop description
   Jaccard token overlap
3. candidate_generation.py:196 _pair_score: drop description
   scoring weight (signal no longer emitted; branch is dead)
4. dto.py CurationEntity.from_lineage: set description="" instead
   of deriving from compacted / description_parts; keep field on
   the dataclass for back-compat with callers that still pass it
5. merge_candidate_detector._description_text_for_scoring →
   _embedding_query_text: embed `<name> (<entity_type>)` (mirror
   of how the graph_vectors worker writes the entity vector,
   Wave 5 task #5 / #7); the legacy method always short-circuited
   to "" post Wave 5 so detection produced zero candidates
6. merge_candidate_detector._to_legacy_entity: pass
   description="" instead of reading from entity
7. merge_candidate_detector._snapshot: drop description key from
   persisted entity_snapshots payload
+1 lineage_merge.py: add merge_entities_apply_description_free
   variant for the async accept-apply worker (task #31 § 3.1.5).
   Skips the LLM unified description / Compactor pass /
   __curation_merge__ sentinel description write / vector embed
   write per the spec's «do not call» list. Legacy merge_entities
   path is preserved for manual sync API back-compat
   (Lesson #14 multi-iteration cleanup follow-up).
+1 service._fetch_shadow_neighbors: replace
   `entity.description or entity.name` with `entity.name`;
   post Wave 5 the description is always "" so the expression
   always evaluated to entity.name, and reading description here
   violates the boundary gate.
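The `<name> (<entity_type>)` query text from item 5 can be sketched as follows. This is an illustration only: the standalone function signature and the fallback to the bare name when the type is missing are assumptions, not the project's actual `_embedding_query_text` method.

```python
from typing import Optional


def embedding_query_text(name: str, entity_type: Optional[str]) -> str:
    """Build the text to embed for an entity without touching its description."""
    name = name.strip()
    entity_type = (entity_type or "").strip()
    if not entity_type:
        # Assumed fallback: embed the bare name when no type is known.
        return name
    return f"{name} ({entity_type})"


typed = embedding_query_text("Apache Kafka", "software")
untyped = embedding_query_text("Kafka", None)
```

Because the text is derived from `name` and `entity_type` only, it is never empty for a named entity, unlike the legacy description-based text.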
Boundary gate (tests/boundaries/test_graph_curation_description_free.py,
4 AST-level assertions per spec § 5.2.a):
- graph_curation_modules_do_not_read_entity_description
- merge_candidate_detector_does_not_read_entity_description
- lineage_merge_apply_description_free_does_not_read_entity_description
- lineage_merge_apply_description_free_does_not_call_llm_or_compactor
Allowlist:
- lineage_merge.merge_entities (legacy back-compat) excluded by file
- dto.py field declaration excluded (annotation, not a read)
- LineageMergeResult.compacted_description (non-entity result shape
used by legacy sync handle_action API) excluded by base name
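A gate of this kind can be sketched with the standard `ast` module. This is a rough sketch under stated assumptions, not the real boundary test: `find_description_reads` is a hypothetical name, and the contents of `NON_ENTITY_BASE_NAMES` here are assumed for illustration.

```python
import ast

FORBIDDEN_ATTRS = {"description", "compacted_description", "description_parts"}
# Assumed contents; the real allowlist lives in the boundary test.
NON_ENTITY_BASE_NAMES = {"merge_result"}


def find_description_reads(source: str) -> list:
    """Return sorted (lineno, expression) pairs for forbidden attribute reads."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Attribute):
            continue
        if node.attr not in FORBIDDEN_ATTRS:
            continue
        if not isinstance(node.ctx, ast.Load):
            continue  # only reads are offenders, not assignments
        base = node.value
        if isinstance(base, ast.Name) and base.id in NON_ENTITY_BASE_NAMES:
            continue  # allowlisted by base name (non-entity result shape)
        offenders.append((node.lineno, ast.unparse(node)))
    return sorted(offenders)


sample = (
    "text = entity.description or entity.name\n"
    "shape = merge_result.compacted_description\n"
)
offenders = find_description_reads(sample)
```

In this sketch the `entity.description` read is reported, while the `merge_result.compacted_description` read is skipped because its base name is allowlisted.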
Wave 5 invariant codification pattern (Lesson #18 candidate, per huangheng
PR #1932 + chenyexuan PR #1933 first-application demo): lesson
sediment (cr-checklist § 4, Wave 5 description-NULL family) +
mechanical gate (this boundary test), paired so future regressions
fail in CI rather than at review time.
Tests: 1466 unit + 104 boundary all green.
Risk: 0 production behavior change for legacy sync handle_action API
(merge_entities preserved); new accept-apply async path uses the
description-free variant exclusively.
Spec: docs/zh-CN/architecture/task-31-graph-node-merge-spec-v1.md § 3.1.5
Task: task #77 (Phase A3) under task #31 umbrella
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(task-31-a3): fold huangheng cr-checklist Lesson #14/#18 NITs
Per @huangheng's cr-checklist Lesson #14 + #18 (candidate) cross-link
verification (msg=be330423): 2 non-blocking NITs on PR #1941, fix-forwarded:
NIT 1 (service.py:244 deprecation marker):
Add a deprecation comment on the legacy sync ``handle_action()`` API
return-shape line that reads ``merge_result.compacted_description``.
This aligns with the Lesson #14 «keep the old path + mark it
deprecated» pattern (matching the ``lineage_merge.merge_entities``
deprecation marker added by the main commit), and explicitly
cross-links the boundary test allowlist mechanism
(``NON_ENTITY_BASE_NAMES``) so future grep-based audits don't flag
the read.
NIT 2 (boundary test docstring bonus catch cross-link):
Add an explicit Lesson #18 candidate second-application demo trail to
the ``tests/boundaries/test_graph_curation_description_free.py``
module docstring: cite the ``service.py:845`` bonus catch
(``text = entity.description or entity.name`` inside
``GraphCurationService._fetch_shadow_neighbors``) as canonical
proof of the value of «lesson sediment + mechanical gate two-layer
codification». The spec § 3.1.5 ratification (符炫炜 + Bryce +
ziang + huangzhangshu + Weston multi-source review) listed exactly
6+1 sites, and every reviewer + spec author missed this 7th hidden
read; the boundary gate caught it on first run, turning
``reviewer-as-detector`` into ``CI-as-detector`` per the
Lesson #18 thesis.
0 production code change beyond comment / docstring text.
Tests: 4/4 boundary tests pass + ruff format / check clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(boundary): include dto.py in description-free AST scan
Per @huangzhangshu BLOCKER (PR #1941 testing-lane CR, msg=2deb5407)
+ @ziang second-source ratify (msg=f485803c) + @不穷 PM dispatch
(msg=a6cd42c9): the boundary gate
``test_graph_curation_modules_do_not_read_entity_description`` was
whole-file excluding ``aperag/graph_curation/dto.py`` to avoid
flagging the dataclass field declaration. But spec § 3.1.5 item 4
explicitly lists ``CurationEntity.from_lineage`` as one of the 6
description-free call sites, so the gate must catch future
regressions that re-introduce
``entity.compacted_description`` / ``entity.description_parts``
reads inside ``from_lineage``.
The whole-file exclusion was meant to prevent a false positive but
turned out to be unnecessary: the AST walker matches
``ast.Attribute`` reads only, and dataclass field annotations
(``description: str = ""``) are ``ast.AnnAssign`` nodes with
``target=ast.Name``, while constructor keyword args
(``cls(description="")``) are ``ast.keyword`` nodes — neither is
an ``ast.Attribute`` access on an entity object.
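The node-type distinction above can be checked directly with the standard `ast` module. The snippet parses a simplified stand-in for ``dto.py`` (the class body is an assumption for illustration, not the production file) and counts each node kind that mentions ``description``.

```python
import ast

source = '''
from dataclasses import dataclass

@dataclass
class CurationEntity:
    name: str
    description: str = ""

    @classmethod
    def from_lineage(cls, entity):
        return cls(name=entity.name, description="")
'''

nodes = list(ast.walk(ast.parse(source)))

# Dataclass field declaration: an annotated assignment to a plain Name.
field_decls = [
    n for n in nodes
    if isinstance(n, ast.AnnAssign)
    and isinstance(n.target, ast.Name)
    and n.target.id == "description"
]

# Constructor keyword argument: an ast.keyword node, not an attribute access.
keyword_args = [
    n for n in nodes
    if isinstance(n, ast.keyword) and n.arg == "description"
]

# Attribute reads such as entity.description: none in this stand-in.
attribute_reads = [
    n for n in nodes
    if isinstance(n, ast.Attribute) and n.attr == "description"
]
```

Only the third list is what an ``ast.Attribute``-based walker inspects, so neither the field declaration nor the keyword argument can produce an offender.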
Drops the whole-file exclusion and adds two reinforcing
sister-tests so future maintainers do not regress this:
* ``test_dto_module_is_in_boundary_scope`` — synthetic-AST
positive control: feeds a fake ``from_lineage`` body that reads
``entity.compacted_description`` through the same offender
detector and asserts the offender is surfaced. If a future
refactor breaks the AST walker, this test catches the silent
protection-loss.
* ``test_dto_field_declaration_is_not_a_false_positive`` — live
negative control: confirms the production ``dto.py`` produces
zero offenders, with a docstring directing future maintainers
to fix the walker (NOT re-allowlist the file) if a false-
positive is ever observed.
6/6 boundary tests pass + ruff format / check clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>