
docs(cr-checklist): task #61 + task #30 B3 close-out sediment fold-in #1932

Merged
earayu merged 1 commit into main from huangheng/task-61-cr-checklist-sediment-fold on Apr 30, 2026

@earayu earayu commented Apr 30, 2026

Summary

Follow-up sub-PR by huangheng after all phases of task #30 (A1+A2+A3+B1+B2+B3) and all P0 items of task #61 (W1+D1+V2/V3/V4+G1+spec v1) closed out. It touches one file, docs/zh-CN/architecture/task-17-cr-review-checklist.md: docs-only, +225 / 0 deletions, no production code changes, no risk.

per architect dispatch:

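§ 四: 8 new lesson sediment entries

Cross-PR catch trail: multiple independent sources converging on the same root cause

§ 六: sediment references gain commit cross-links for 6 PRs

All commit hashes for PR #1925 / #1926 / #1927 / #1928 / #1929 / #1930 are recorded in the repo.

§ 八: the revision history gains the full fold-in trail for this PR.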

Weston msg=0f544c29 framing feedback fold-in

The lesson body wording follows Weston's explicit framing guidance:

  • Lesson #12 v7.4 / v9: framed precisely as "external API raw convention != helper canonical raw convention", not "the math itself is wrong"
  • Lesson #17: framed as "when the backend can converge on the same contract, prefer backend adapter convergence so that upper consumers do not replicate the differences", not "FE never branches"

scope NOT in this PR

CR

  • @符炫炜 ratify
  • @weston cross-link sediment framing verify (already folded per the explicit feedback in msg=0f544c29)
  • LGTM from a reviewer in any lane is enough to squash merge

Test plan

🤖 Generated with Claude Code

§ 四 adds 8 lesson sediment entries (cumulative evidence from the task #30 B3 and task #61 all-P0 close-outs) + § 六 sediment references gain commit cross-links for 6 PRs + § 八 revision history gains this PR's fold trail.

New lessons:
- Lesson #12 v7.4: external API raw contract verify (task #61 P0-B PR #1930
  Qdrant euclid raw direction first-application + fix-forward 1e30a00)
- Lesson #12 v8 second-application: test docstring fake guardrail (task #61
  P0-G1 PR #1927 missing description_parts assertion, fix-forward 1953933)
- Lesson #12 v9: first-principles verify catches surface-signal mistakes
  (task #61 P0-V1 re-characterization caught by Bryce + task #61 P0-B Qdrant
  euclid caught by Weston: two independent sources, same root cause,
  first/second-application)
- Lesson #13 v2.3: deploy manifest dual-side rewrite (task #61 P0-D1 PR #1929
  Helm Neo4j worker env first-application)
- Lesson #13 v3 application demo 2: cross-source default value alignment
  (task #30 B3 PR #1925 commit dae43f5, three sources synced, first-application)
- Lesson #14 application demo: multi-iteration cleanup of a default value
  drifting inside a spec (task #30 B3 PR #1925 fix-forward dae43f5 § 3.1.1
  line 85 cleanup, second-application demo; the first-application was the
  6 fix-forward rounds of task #35)
- Lesson #16: anti-pattern of dead references in CI workflow paths filters
  (task #61 P0-W1 PR #1926 first-application demo + Lesson #15 file-move
  3-step verify upgraded to a v2 4-step verify that also greps
  .github/workflows/*.yml paths to keep them in sync)
- Lesson #17: converging the contract in the backend beats forking in upper
  layers (the simple-stable + private-deploy paramount directive, earayu2
  msg=1224bec8, applied when designing cross-adapter contracts; task #69 P0-B
  + task #70 P1 candidate 1 converged in one pass across PRs, first-application)
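
The dead-reference check behind Lesson #16 can be sketched roughly as below. This is a minimal illustration and not the repository's actual tooling: `dead_path_filters`, the sample file list, and the `old_pkg/**` pattern are hypothetical, and `fnmatch` only approximates the glob semantics GitHub applies to workflow `paths` filters.

```python
from fnmatch import fnmatchcase


def dead_path_filters(patterns, repo_files):
    """Return the path-filter patterns that match no file in repo_files."""
    dead = []
    for pattern in patterns:
        # A pattern that matches nothing can never trigger the workflow,
        # which usually means the files were moved or deleted.
        if not any(fnmatchcase(path, pattern) for path in repo_files):
            dead.append(pattern)
    return dead


# Hypothetical repository snapshot and workflow filter list.
repo_files = [
    "aperag/indexing/graph_extractor.py",
    "docs/zh-CN/architecture/task-17-cr-review-checklist.md",
]
patterns = ["aperag/**", "docs/**", "old_pkg/**"]

print(dead_path_filters(patterns, repo_files))
```

A real gate would presumably read the patterns from `.github/workflows/*.yml` and the file list from the working tree; both inputs are passed in explicitly here to keep the sketch self-contained.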

Cross-PR catch trail (multiple independent sources, same root cause):
- Lesson #12 v9: Bryce msg=23a2f514 + Weston msg=86e05a8e, two independent sources
- Lesson #16: chenyexuan msg=f298011e + 冬柏 msg=3e93bb64, two independent sources
- Lesson #17: cuiwenbo msg=cedc7703 + Bryce msg=9895a148, two independent sources
- Lesson #13 v3 application demo 2: huangheng msg=bf785b12 + Planetegg
  msg=c63acbf5 + Weston msg=1e6b0838, three independent sources
per architect msg=c4cdf634 + msg=daaeeab5 + msg=03c892e0 sediment dispatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@earayu earayu merged commit dc79aad into main Apr 30, 2026
10 checks passed
@earayu earayu deleted the huangheng/task-61-cr-checklist-sediment-fold branch April 30, 2026 06:22
earayu added a commit that referenced this pull request Apr 30, 2026
…scope lock

5 BLOCKER cleanup (per ziang/Weston/Bryce/huangzhangshu/Planetegg/dongdong/huangheng converge):

1. lane name `graph_node_merge_suggestion` -> `graph_curation_run` unified throughout (§2.1/§4 A1/§4 B3/§5.1/§5.2 + test file name `test_graph_curation_run_boundaries.py` + removed the hard-coded "11th" number + removed the old implementation name `_dedup_scan`)
2. All leftovers of the new `merge_suggestion` table removed (§2.2 P1-B + §3.1.2 + §6 Migration chain) -> unified throughout as "reuse GraphCurationSuggestion + extend the status enum with 4 new values + add an evidence_refs field"
3. §3.1.4 `LineageGraphStore.merge_entities` (does not exist) -> `LineageEntityMerger` description-free apply path built on LineageGraphStore primitives + a cross-backend boundary test pinning LineageEntityMerger behavior
4. Phase A1 worker `MergeSuggestionWorker` calling `MergeCandidateDetector` -> split into the manual/cron full sweep, which calls the `generate_graph_curation_run_task` integration path, vs the auto_post_ingest sync inline `detect_for_sync()` quick path
5. description-free 4 sites -> 6 enumerated detector/snapshot call sites (added `merge_candidate_detector.py:322-328` + corrected `:263-271` -> `:257-284`, aligned with §3.1.5) + 1 apply path

§5 gate split into 5.2.a scan-generation invariants (lane symbolic dual-side / independent queue family / description-free 6 call sites / trigger split / safe-only write) vs 5.2.b async accept-apply state machine invariants (7-state machine / enum lowercase + dual-side / description-free apply variant / cross-backend apply contract / audit trail / idempotent replay)

§6 sediment cite update: Lesson #13 v3.1 -> v2.3 (per huangheng line 285 verify); "about to fold" -> "already folded per PR #1932 commit dc79aad" (merged); added a Lesson #18 candidate cross-link (lesson sediment + mechanical gate dual-layer codification: one records, the other enforces, per huangheng msg=b18d26ee + chenyexuan PR #1933 first-application demo)

New §3.1.3 entity_type scope lock (per PM msg=05be0b52 + ziang msg=d6d9dc3c + dongdong msg=83783bc6 + Weston msg=78ab2267 three-way converge):
- v1 still uses the entity name as the primary merge target; `entity_type` is only a compatibility / penalty signal
- merge suggestions must tolerate near-matching types (by showing observed_types + type_conflict + suggested_entity_type)
- the `entity_type_alias` suggestion kind moves to Phase B / P1 follow-up (#31-C3) with its own store/API/migration/UI design; a v1 boundary test pins "does not write suggestion_kind='entity_type_alias'" to prevent scope creep
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
earayu added a commit that referenced this pull request Apr 30, 2026
…n_window_size (#1933)

Codify Lesson #13 v3 (cross-source default value alignment) as a CI
unit test gate so future task-#30 B3-class drift is caught by
``cicd-push.yml`` lint+unit instead of by reviewers via fix-forward
rounds.

Background — task #30 B3 (PR #1925, merge ``43648f9``) locked
``graph_extraction_window_size`` default to ``2`` across **four**
sources that all need to agree:

1. ``aperag/indexing/graph_extractor.py``
   ``_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE`` (Python const, runtime
   fallback)
2. ``aperag/schema/common.py``
   ``KnowledgeGraphConfig.graph_extraction_window_size`` Pydantic
   ``Field(examples=[N])`` (OpenAPI / TS schema source)
3. ``web/src/api-v2/schema.d.ts`` JSDoc ``@example N`` (frontend client
   surface — committed to repo, can drift if regen skipped)
4. ``docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md``
   § 3.1.1 line 85 ``**B3 lock default `N`**`` + § 4.2
   ``**`graph_extraction_window_size = N`**`` (architectural source of
   truth that PRs CR against)

PR #1925 itself surfaced the drift class:
- Weston ``msg=1b7d9bef`` BLOCKER 1 caught ``schema.d.ts`` still
  carrying default ``1``
- huangheng ``msg=bf785b12`` NIT 1 caught § 3.1.1 line 85 still saying
  default ``1``
Both required a fix-forward commit (``dae43f5``).

Why a unit test (not a boundary test): ``tests/boundaries/`` is not
currently invoked by ``make test-unit`` / ``test-integration`` /
``cicd-push.yml`` (task #33 Layer 1 audit finding).
``tests/unit_test/`` runs on every push via ``make test-unit``. Per
simple-stable directive (earayu2 ``msg=1224bec8``), the cheapest
reliable gate is a unit test in the existing CI lane, not a new
workflow file.

Scope discipline: pins **default value parity** across four sources
only. Does not pin description text, override-recommendation phrasing,
or rationale wording. If a future change moves the default away from
2, the test fails with a list of all observed values per source plus
the procedural reminder (``≥10 samples + ≥3 models 同时不退步 + PM +
architect + earayu2 三方 confirm``).

Tests:

- ``test_graph_extraction_window_size_default_consistent_across_sources``
  — the main gate (asserts all 4 sources agree)
- ``test_graph_extraction_window_size_default_is_positive_integer`` —
  sanity (window assembler math requires ``>= 1``)
- ``test_individual_source_extractor_does_not_raise[*]`` — separates
  "extractor broken" failures from "values drifted" failures so
  operator immediately knows whether to fix test infra or schema
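
A minimal sketch of the parity-gate idea, under stated assumptions: the regexes and sample strings below are hypothetical stand-ins for how each source spells the default, and the helper names `extract_defaults` and `drifting_sources` are illustrative rather than the names used in the actual test module.

```python
import re

# One regex per source kind. These patterns are assumptions about how each
# file spells the default; the real extractors may differ.
PATTERNS = {
    "python_const": r"_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE\s*=\s*(\d+)",
    "pydantic_examples": r"examples=\[(\d+)\]",
    "ts_jsdoc_example": r"@example\s+(\d+)",
    "spec_lock": r"B3 lock default `(\d+)`",
}


def extract_defaults(texts):
    """Map each source name to the integer default found in its text."""
    observed = {}
    for name, text in texts.items():
        match = re.search(PATTERNS[name], text)
        if match is None:
            # Keeps "extractor broken" distinct from "values drifted".
            raise ValueError(f"extractor for {name} found no value")
        observed[name] = int(match.group(1))
    return observed


def drifting_sources(observed, expected):
    """Return the names of sources whose value differs from expected."""
    return sorted(name for name, value in observed.items() if value != expected)


# Hypothetical snippets standing in for the four real files.
SAMPLE = {
    "python_const": "_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE = 2",
    "pydantic_examples": "Field(examples=[2])",
    "ts_jsdoc_example": "/** @example 2 */",
    "spec_lock": "**B3 lock default `2`**",
}

# Simulate the drift class from PR #1925: the TS schema still says 1.
drifted = dict(SAMPLE, ts_jsdoc_example="/** @example 1 */")
print(drifting_sources(extract_defaults(drifted), expected=2))
```

Raising from the extractor when nothing is found, instead of returning a default, mirrors the separation between infrastructure failures and genuine drift described in the test list above.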

Local validation:

- 5/5 pass in clean state
- Synthetic drift on each of (Python const / TS schema / spec § 3.1.1 /
  spec § 4.2) caught with clear actionable error message naming the
  drifting source
- Full ``tests/unit_test/contracts/`` 58/58 pass
- ruff format + ruff check clean

Sediment cross-link: this gate is the codified counterpart to
huangheng PR #1932 § 四 Lesson #13 v3 application demo 2 + Lesson #14
application demo (PR #1925 § 3.1.1 multi-iteration cleanup) — that PR
records the drift class as a CR-checklist lesson; this PR enforces it
mechanically so the lesson does not have to be remembered.

task #33 Layer 2 P3 (chenyexuan claim, in_progress) per PM dispatch
``msg=65465f9e``.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
earayu added a commit that referenced this pull request Apr 30, 2026
…1931)

task #31 spec v1 lock: the design doc for graph node merge scanning + background suggestion tasks lands in the repo.

## Design core

- **scope reframe**: extract / fix / extend the full Wave 7 §K.12.4 stack rather than build new
- **independent queue family** `q:graph_curation_run`: the lane does not pollute Modality + DocumentIndex + reconciler, and has its own push/pop API
- **reconciling the three trigger strategies**: manual/cron full sweep goes through worker pop → generate_graph_curation_run_task; auto_post_ingest keeps the sync inline detect_for_sync path but under the same description-free invariant
- **reuse the GraphCurationSuggestion table**: no new merge_suggestion table; only extend it with 4 new status enum values + an evidence_refs field
- **state machine Option B (apply_pending + ACCEPTED legacy)**: pending → dismissed | rejected | apply_pending → applying → applied | apply_failed; the existing ACCEPTED status, historically the terminal status of the sync handle_action path, is kept as legacy read-only, and the new async path is gated to never write it
- **description-free 6 call sites + 1 apply path** (Wave 5 invariant): candidate_generation.py:43/179-181/196-197 + dto.py:59-65/101-105 + merge_candidate_detector.py:257-284 + :322-328 + lineage_merge.py:246-317 apply variant
- **LineageEntityMerger application-layer cross-backend contract** (the Protocol does not include merge_entities; it reuses LineageGraphStore primitives)
- **three-layer entity_type scope lock**: v1 uses it only as a compatibility/penalty signal; suggestions tolerate near-matching types by showing observed_types/type_conflict/suggested_entity_type; entity_type_alias as a separate suggestion kind moves to Phase B/P1 follow-up #31-C3
- **reuse the /graphs/merge-suggestions endpoint + extend SUGGESTION_ACTIONS with dismiss + Pydantic Field validator for confidence_score in [0,1]**
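
The Option B state machine in the list above could be modelled roughly as follows. This is a sketch under assumptions: the arrow chain is read as three transition groups, `apply_failed` is treated as terminal, the legacy ACCEPTED status is left out because the new async path never writes it, and the names `SuggestionStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical.

```python
from enum import Enum


class SuggestionStatus(str, Enum):
    # Lowercase values, following the "enum lowercase" invariant in the spec.
    PENDING = "pending"
    DISMISSED = "dismissed"
    REJECTED = "rejected"
    APPLY_PENDING = "apply_pending"
    APPLYING = "applying"
    APPLIED = "applied"
    APPLY_FAILED = "apply_failed"


S = SuggestionStatus

# Assumed reading of:
#   pending -> dismissed | rejected | apply_pending -> applying
#           -> applied | apply_failed
ALLOWED_TRANSITIONS = {
    S.PENDING: {S.DISMISSED, S.REJECTED, S.APPLY_PENDING},
    S.APPLY_PENDING: {S.APPLYING},
    S.APPLYING: {S.APPLIED, S.APPLY_FAILED},
}


def transition(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = transition(S.PENDING, S.APPLY_PENDING)
status = transition(status, S.APPLYING)
status = transition(status, S.APPLIED)
print(status.value)
```

Keeping the transitions in a single table makes it straightforward for a boundary test to assert that no edge leads into a legacy status.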

## LGTM collected from all 8/8 lanes

- @bryce (msg=9e49d440): all 5 BLOCKERs cleared + entity_type scope lock + Migration chain consistency
- @weston (msg=ed202960 + 92dd89ff): five-category consistency sweep + entity_type three-layer architecture + Migration chain
- @huangzhangshu (msg=9a4cbd61 + 68783841): five categories of old wording cleaned up into Phase A/B gates + enum count micro-fix
- @ziang (msg=760b7341 + 0b761117): impl-lane 5 BLOCKER + state machine Option B + enum count
- @huangheng (msg=535de81b): Lesson framework v5/v6/v7/v8/v9/v13/v14/v16/v17 + Lesson #18 candidate cross-link + Migration chain timing, all consistent
- @dongdong (msg=8316b45a): FE/UI scoped + entity_type FE friendliness + state machine
- @Planetegg (msg=7d428e33): SRE/deploy Helm render gate symbolic lane assertion
- @cuiwenbo (msg=594fbd4f): 3 NITs (endpoint reuse + status enum FE typed schema sync + confidence_score [0,1] validator), all folded

## CI status

- lint-and-unit ✅
- e2e-http-smoke 3/3 ✅
- e2e-http-provider-preflight 3/3 ✅
- docs-only lite gate satisfied

## Related

- Does not block PR #1932 (huangheng sediment merged dc79aad) / PR #1933 (chenyexuan merged 1024ef9) / task #61 P1/P2 follow-up / task #11 GC orphan vector follow-up
- The 4 Phase A sub-tasks can be dispatched and started as soon as the spec is locked (recommended owners: A1+A3 Bryce/ziang / A2 ziang / A4 dongdong+cuiwenbo)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
earayu added a commit that referenced this pull request Apr 30, 2026
…1941)

* refactor(task-31-a3): description-free graph_curation 7 call sites

Wave 5 description-NULL invariant (task #31 spec § 3.1.5): graph
extractor stopped emitting `description` text post Wave 5 task #5
(facts/vectors split). The dedup detection / scoring / snapshot /
accept-apply paths still read `entity.description` /
`compacted_description` / `description_parts` and would either
silently degrade scoring (always-empty bag-of-tokens) or leak stale
fragments from pre-Wave-5 rows into reviewer-facing suggestions.

Fix the 6 detector / snapshot call sites + 1 apply path enumerated
in the spec, plus 1 service-layer helper surfaced by the boundary
test grep gate:

  1. candidate_generation.py:38  entity_snapshot — drop description
  2. candidate_generation.py:179 _lexical_signals — drop description
                                  Jaccard token overlap
  3. candidate_generation.py:196 _pair_score — drop description
                                  scoring weight (signal no longer
                                  emitted; branch is dead)
  4. dto.py CurationEntity.from_lineage — set description="" instead
                                  of deriving from compacted /
                                  description_parts; keep field on
                                  the dataclass for back-compat
                                  with callers that still pass it
  5. merge_candidate_detector._description_text_for_scoring →
     _embedding_query_text — embed `<name> (<entity_type>)` (mirror
     of how the graph_vectors worker writes the entity vector,
     Wave 5 task #5 / #7); the legacy method always short-circuited
     to "" post Wave 5 so detection produced zero candidates
  6. merge_candidate_detector._to_legacy_entity — pass
     description="" instead of reading from entity
  7. merge_candidate_detector._snapshot — drop description key from
     persisted entity_snapshots payload
  +1 lineage_merge.py — add merge_entities_apply_description_free
     variant for the async accept-apply worker (task #31 § 3.1.5).
     Skips LLM unified description / Compactor pass /
     __curation_merge__ sentinel description write / vector embed
     write per the spec «不调» list. Legacy merge_entities path is
     preserved for manual sync API back-compat
     (Lesson #14 multi-iteration cleanup follow-up).
  +1 service._fetch_shadow_neighbors — replace
     `entity.description or entity.name` with `entity.name`;
     post Wave 5 the description is always "" so the fallback was
     a no-op, and reading description here violates the boundary
     gate.

Boundary gate (tests/boundaries/test_graph_curation_description_free.py,
4 AST-level assertions per spec § 5.2.a):

  - graph_curation_modules_do_not_read_entity_description
  - merge_candidate_detector_does_not_read_entity_description
  - lineage_merge_apply_description_free_does_not_read_entity_description
  - lineage_merge_apply_description_free_does_not_call_llm_or_compactor

Allowlist:
  - lineage_merge.merge_entities (legacy back-compat) excluded by file
  - dto.py field declaration excluded (annotation, not a read)
  - LineageMergeResult.compacted_description (non-entity result shape
    used by legacy sync handle_action API) excluded by base name

Wave-5 invariant codify pattern (Lesson #18 candidate, per huangheng
PR #1932 + chenyexuan PR #1933 first-application demo): lesson
sediment (cr-checklist § 四 Wave 5 description-NULL family) +
mechanical gate (this boundary test) — paired so future regressions
fail at CI not at review time.

Tests: 1466 unit + 104 boundary all green.
Risk: 0 production behavior change for legacy sync handle_action API
(merge_entities preserved); new accept-apply async path uses the
description-free variant exclusively.

Spec: docs/zh-CN/architecture/task-31-graph-node-merge-spec-v1.md § 3.1.5
Task: task #77 (Phase A3) under task #31 umbrella

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(task-31-a3): fold huangheng cr-checklist Lesson #14/#18 NITs

Per @huangheng cr-checklist Lesson #14 + Lesson #18 candidate cross-link verify
(msg=be330423): 2 non-blocker NITs on PR #1941, fix-forwarded:

NIT 1 (service.py:244 deprecation marker):
  Add deprecation comment on the legacy sync ``handle_action()`` API
  return-shape line that reads ``merge_result.compacted_description``.
  Aligns with the Lesson #14 "keep the old path + mark it deprecated" pattern
  (matches the ``lineage_merge.merge_entities`` deprecation marker
  added by the main commit), and explicitly cross-links the boundary
  test allowlist mechanism (``NON_ENTITY_BASE_NAMES``) so future
  grep-based audits don't dispatch on the read.

NIT 2 (boundary test docstring bonus catch cross-link):
  Add explicit Lesson #18 candidate second-application demo trail in
  ``tests/boundaries/test_graph_curation_description_free.py``
  module docstring — cite the ``service.py:845`` bonus catch
  (``text = entity.description or entity.name`` inside
  ``GraphCurationService._fetch_shadow_neighbors``) as canonical
  proof of the value of "lesson sediment + mechanical gate dual-layer
  codification". The spec § 3.1.5 ratify (符炫炜 + Bryce +
  ziang + huangzhangshu + Weston multi-source review) listed exactly
  6+1 sites and every reviewer + spec author missed this 7th hidden
  read; the boundary gate caught it on first run, turning
  ``reviewer-as-detector`` into ``CI-as-detector`` per the
  Lesson #18 thesis.

0 production code change beyond comment / docstring text.

Tests: 4/4 boundary test pass + ruff format / check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(boundary): include dto.py in description-free AST scan

Per @huangzhangshu BLOCKER (PR #1941 testing-lane CR, msg=2deb5407)
+ @ziang second-source ratify (msg=f485803c) + @不穷 PM dispatch
(msg=a6cd42c9): the boundary gate
``test_graph_curation_modules_do_not_read_entity_description`` was
whole-file excluding ``aperag/graph_curation/dto.py`` to avoid
flagging the dataclass field declaration. But spec § 3.1.5 item 4
explicitly lists ``CurationEntity.from_lineage`` as one of the 6
description-free call sites, so the gate must catch future
regressions that re-introduce
``entity.compacted_description`` / ``entity.description_parts``
reads inside ``from_lineage``.

The whole-file exclusion was a false-positive prevention that
turned out to be unnecessary: the AST walker matches
``ast.Attribute`` reads only, and dataclass field annotations
(``description: str = ""``) are ``ast.AnnAssign`` nodes with
``target=ast.Name``, while constructor keyword args
(``cls(description="")``) are ``ast.keyword`` nodes — neither is
an ``ast.Attribute`` access on an entity object.
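
The node-type distinction described above can be checked with a short sketch. It is illustrative only: `find_description_reads` and the two source snippets are hypothetical and far simpler than the real boundary test, but they show why a field annotation and a keyword argument do not register as attribute reads.

```python
import ast

# Attribute names treated as description reads in this sketch.
DESCRIPTION_ATTRS = {"description", "compacted_description", "description_parts"}


def find_description_reads(source):
    """Return sorted (line, attribute) pairs for description attribute reads."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        # Only attribute loads such as entity.description count. Field
        # annotations are ast.AnnAssign nodes with a Name target, and
        # keyword arguments are ast.keyword nodes, so neither matches.
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.ctx, ast.Load)
            and node.attr in DESCRIPTION_ATTRS
        ):
            offenders.append((node.lineno, node.attr))
    return sorted(offenders)


# Negative control: a declaration and a keyword argument, but no read.
DECLARATION_ONLY = '''
class CurationEntity:
    name: str
    description: str = ""

    @classmethod
    def from_lineage(cls, entity):
        return cls(name=entity.name, description="")
'''

# Positive control: a regression that reads the attribute again.
REGRESSION = '''
def from_lineage(cls, entity):
    return cls(name=entity.name, description=entity.compacted_description)
'''

print(find_description_reads(DECLARATION_ONLY))
print(find_description_reads(REGRESSION))
```

Pairing a negative control with a positive control guards against both failure modes: a walker that flags too much, and a walker that silently stops flagging anything.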

Drops the whole-file exclusion and adds two reinforcing
sister-tests so future maintainers do not regress this:

* ``test_dto_module_is_in_boundary_scope`` — synthetic-AST
  positive control: feeds a fake ``from_lineage`` body that reads
  ``entity.compacted_description`` through the same offender
  detector and asserts the offender is surfaced. If a future
  refactor breaks the AST walker, this test catches the silent
  protection-loss.
* ``test_dto_field_declaration_is_not_a_false_positive`` — live
  negative control: confirms the production ``dto.py`` produces
  zero offenders, with a docstring directing future maintainers
  to fix the walker (NOT re-allowlist the file) if a false-
  positive is ever observed.

6/6 boundary tests pass + ruff format / check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>