docs(indexing): perf audit v1 - parse / scheduler / DB / private deploy #1954

Closed
earayu wants to merge 473 commits into main from architect/indexing-perf-audit-v1

Conversation

earayu (Collaborator) commented May 2, 2026

Summary

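Per earayu2's directive (msg=718c79ba): @符炫炜 and @ziang are collaborating in the #Indexing小组 (Indexing group) thread to audit the indexing pipeline, the DB layer, and private-deployment scenarios with large document volumes and long documents, and to write up a detailed proposal report.

v1 covers (owned by 符炫炜):

  • §1 Parse layer: 5 bottlenecks
  • §2 Index 4-lane scheduling: 6 bottlenecks
  • §3 DB layer: 7 bottlenecks
  • §4 Private deployment: tier 1/2/3 recipes + production preset
  • §5 End-to-end bottleneck ranking for high-volume and long-document workloads: a long document of 5000 chunks is expected to go from 720s to 190s (3.7×)
  • §6-§8 Implementation slices / validation method / dependencies and risks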

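v2 follow-up (not included in this PR)

  • K8s prod deployment parameters (HPA / PVC / leader-election)
  • PG + KubeBlocks + pgbouncer section (to be added by @ziang after studying kubeblocks-skills)
  • List of settings to make configurable in the admin UI
  • §11-§14, to be added by @ziang: read path / cleanup / end-to-end attribution / joint acceptance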

main HEAD pin: eb4c4f3 (2026-04-30 18:46)

Test plan

  • @ziang reviews §1-§10 and adds §11-§14 plus the KubeBlocks section
  • @不穷 confirms the Wave 1-4 slices and the 12-PR schedule
  • @earayu2 gives the final verdict and decides the P0 priorities
  • v2 follow-up commit (K8s prod / pgbouncer / admin UI)

🤖 Generated with Claude Code

earayu and others added 30 commits April 25, 2026 20:15
Restore quota/system routes on /api/v2 and finish the Phase 8 G5 transitional ledger cleanup.
Land the wire-emission half of D8.1 — the agent-runtime SSE endpoint
now emits AI SDK v5 ``UI Message Stream Protocol`` part frames in
place of the legacy ``AgentTimelineEventEnvelope`` JSON, advertising
itself via the ``x-vercel-ai-ui-message-stream: v1`` response header
that the FE ``@ai-sdk/react`` consumer (#76) keys on.

New ``aperag/domains/agent_runtime/wire/`` sub-package:
* ``parts.py`` — Pydantic models for every v5 part type the runtime
  emits + ``data-citation`` (Anthropic-shape) / ``data-activity``
  ApeRAG extensions + placeholder ``data-tool-consent`` /
  ``data-elicitation`` literals reserved for #75 chenyexuan; exposed
  as a discriminated ``StreamPart`` union with a ``TypeAdapter`` for
  round-trip parsing.
* ``translator.py`` — pure ``translate_envelope(envelope, state)``
  function mapping each timeline envelope to one-or-more parts per
  the D8.1 mapping table; per-turn ``TranslatorState`` carries
  text-block lifecycle bookkeeping; ``safe_tool_name_resolver`` hook
  reserved for #75 (raw tool name + empty metadata until then).

SSE route (``api/routes.py``) updated:
* New ``_format_part_frame`` writes ``id: <seq>\ndata: <json>\n\n``
  AI SDK v5 frames; only the LAST part of an envelope fan-out gets
  the SSE ``id:`` so ``Last-Event-ID`` resume keeps pointing at the
  next envelope (translator docstring documents the invariant).
* ``stream_turn_events_view`` now wraps each envelope through the
  translator and yields one frame per part. Heartbeat switched to
  the SSE-comment form (``: heartbeat\n\n``) which is invisible to
  the v5 consumer. Generator wrapped in try/except that emits a
  synthetic ``error`` part on uncaught exceptions before re-raising.
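
A minimal sketch of the framing rule described above, not the actual ``_format_part_frame``: the function names and signatures here are hypothetical, and the two example parts are illustrative. It shows one frame per part, with the SSE ``id:`` attached only to the last part of an envelope fan-out, plus the comment-form heartbeat.

```python
import json
from typing import Iterator, Optional


def format_part_frame(part: dict, event_id: Optional[int] = None) -> str:
    """Serialize one part as an SSE frame; `id:` is written only when given."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append("data: " + json.dumps(part, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"


def frames_for_envelope(seq: int, parts: list[dict]) -> Iterator[str]:
    """Yield one frame per part; only the last part carries the envelope seq."""
    last = len(parts) - 1
    for index, part in enumerate(parts):
        yield format_part_frame(part, seq if index == last else None)


# SSE comment line: starts with ":" so event parsers skip it.
HEARTBEAT_FRAME = ": heartbeat\n\n"

frames = list(frames_for_envelope(7, [
    {"type": "text-start", "id": "t1"},
    {"type": "text-delta", "id": "t1", "delta": "hi"},
]))
```

Under this scheme a client that disconnects mid fan-out has not yet seen the envelope's ``id:``, so its ``Last-Event-ID`` still names the previous envelope and a resume would replay the interrupted one in full.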

Out of scope (per PM lock msg=82ba98fc): DB / Redis storage (#74),
tool consent / elicitation / SafeToolName plumbing (#75), FE consumer
(#76), agent reasoning loop. The translator is read-only over
envelopes; storage shape is unchanged.

Tests:
* ``tests/unit_test/test_agent_runtime_wire_parts.py`` — 14 contract
  tests covering every envelope→part mapping, JSON round-trip across
  the union, ``safe_tool_name_resolver`` plug-in seam, SSE response
  headers (v5 marker + Content-Type), and ``Last-Event-ID`` resume
  semantics.
* Updated ``test_agent_runtime_v3.py`` and
  ``test_agent_runtime_openapi_contract.py`` to assert on the new
  AI SDK v5 wire shape (hard-cut per Phase 8 msg=78fdb6fc — no dual
  emission, no envelope-format fallback).

Acceptance gates green: wire-parts suite + modularization_boundaries
+ v1_ghost_guard + openapi_spec all pass; ``make lint`` +
``make add-license`` clean.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(phase8 #74 D8.2): first-cut UIMessage at-rest storage for agent path

Phase 8 task #74 (D8.2) — first cut of the at-rest UIMessage storage
layer per the canonical ``docs/modularization/agent-message-protocol-design.md``
and ``docs/modularization/agent-runtime-mcp-design.md`` (in main).

This PR delivers the foundation:

* ``aperag/domains/agent_runtime/uimessage.py`` (NEW) — pydantic
  schema for ``UIMessage`` and every ``UIMessagePart`` variant
  (text / tool / source-url / source-document / data-citation /
  data-activity / data-tool-consent / data-elicitation), plus
  ``persistable_parts`` / ``args_preview`` / ``args_hash`` helpers
  enforcing D9 §A7 raw-args-private rule.

* ``aperag/domains/agent_runtime/db/models.py`` — new ``AgentMessage``
  ORM (``agent_message`` table; 1:1 with ``agent_turn`` via
  ``turn_id``; ``parts`` JSON column carries the full UIMessage at
  rest; ``schema_version`` tag for FE forward-compat). Legacy
  ``AgentArtifact`` / ``AgentTimelineEvent`` tables retained during
  D8.x rollout — D8.6 (#80) will drop them once the FE renderer is
  consuming AgentMessage exclusively.

* ``aperag/migration/versions/...d8e2c4a17b91_add_agent_message_table.py``
  — new alembic revision chained off ``7c4e9e1f8b21``; pure additive
  (no rename / drop in this PR), idempotent migration.

* ``aperag/domains/agent_runtime/storage.py`` — extend
  ``AgentRuntimeRedisStore`` with ``write_message_snapshot`` /
  ``read_message_snapshot`` / ``delete_message_snapshot`` keyed on
  ``agent_runtime:turn:<id>:message``; same TTL as the live event
  buffer.

* ``aperag/domains/agent_runtime/uimessage_store.py`` (NEW) —
  ``UIMessageStore`` wraps the DB row + Redis snapshot behind a
  single ``write`` / ``read`` / ``delete`` surface. ``write``
  filters transient parts (currently only ``data-activity``);
  ``read`` prefers Redis but falls back to the durable DB row when
  the snapshot is cold. ``UIMessageDbOps`` is a SQLAlchemy-bound
  helper kept separate so unit tests can inject in-memory fakes.

* ``tests/unit_test/agent_runtime/test_uimessage_at_rest.py`` (NEW)
  — at-rest reload contract tests pinning the three invariants
  Weston named as the prerequisite for unblocking D8.4b
  (msg=50c90f6f / msg=cef89ed8): round-trip fidelity across every
  persistable part variant, transient exclusion, snapshot
  consistency between Redis and DB.
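
The ``UIMessageStore`` behaviour described above (transient filtering on write, Redis-first read with DB fallback) can be sketched with in-memory fakes. This is an illustration only: the class name, method signatures, and dict-based backends are assumptions, not the code in ``uimessage_store.py``.

```python
from typing import Optional

TRANSIENT_TYPES = {"data-activity"}  # the only transient part type named above


class UIMessageStoreSketch:
    """Hypothetical stand-in: dicts replace the Redis snapshot and the DB row."""

    def __init__(self) -> None:
        self.snapshot: dict[str, list[dict]] = {}  # stands in for Redis
        self.rows: dict[str, list[dict]] = {}      # stands in for agent_message

    def write(self, turn_id: str, parts: list[dict]) -> None:
        # Drop transient parts before anything is persisted.
        persisted = [p for p in parts if p["type"] not in TRANSIENT_TYPES]
        self.rows[turn_id] = persisted
        self.snapshot[turn_id] = persisted

    def read(self, turn_id: str) -> Optional[list[dict]]:
        # Prefer the snapshot; fall back to the durable row when it is cold.
        cached = self.snapshot.get(turn_id)
        if cached is not None:
            return cached
        return self.rows.get(turn_id)

    def delete(self, turn_id: str) -> None:
        self.snapshot.pop(turn_id, None)
        self.rows.pop(turn_id, None)


store = UIMessageStoreSketch()
store.write("turn-1", [
    {"type": "text", "text": "hello"},
    {"type": "data-activity", "data": {"label": "searching"}},
])
store.snapshot.clear()  # simulate the Redis snapshot expiring
reloaded = store.read("turn-1")
```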

Out of scope (left for follow-up commits / sibling lanes per PM
msg=a3c31f79):

* Wire/streaming emitter — D8.1 (#73, cuiwenbo)
* Tool / citation / consent / elicitation enforcement of the
  7-point D9 §A4 contract — D8.3 (#75, chenyexuan)
* Full event-to-UIMessage projection in the runtime services —
  follow-up commit on this branch once #73 stream contract is
  visible
* Drop of legacy ``agent_artifact`` / ``agent_timeline_event``
  tables — D8.6 (#80)
* Non-agent bot path migration — D8.5 (#79)
* FE renderer — D8.4a/b/c (#76/#77/#78)

Gates: 709 pass / 29 skip / 1 deselect / 0 fail unit suite (incl.
7 new contract tests + 24 boundary intact); ruff lint+format clean.

* fix(phase8 #74 D8.2): wrap data-* parts in {type, data: {...}} per D8 §2 canonical

Architect canonical lock 2026-04-25 (msg=ad6168e7) + PM scope-tightening
(msg=1ff7ed9e): persisted data-* parts must round-trip byte-for-byte
with the wire shape produced by #73 cuiwenbo's emitter — D8 §2 forbids
a wire/at-rest converter layer.

Pre-fix at-rest used flat fields (DataCitationPart.cited_text/.location,
DataToolConsentPart.tool_call_id/..., DataElicitationPart.elicitation_id/...)
which violated the same-schema canonical and would have forced #75
chenyexuan or the FE renderer (#76/#77) to maintain dual code paths.

This commit:
- Introduces inner data classes (CitationData / ActivityData /
  ToolConsentData / ElicitationData) so each data-* part follows
  {type, data: {...}} with the field set unchanged.
- Updates the every-part fixture in the contract test to construct
  parts via the wrapped form.
- Adds test_data_parts_use_wrapped_data_shape — a dedicated lock that
  reads the persisted DB row and asserts each data-* part's keys are
  exactly {type, data} and that data carries the canonical fields.
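
As a rough illustration of what the shape lock checks, the sketch below compares a flat part with a wrapped one. The helper name and the sample payloads are hypothetical; the real test reads the persisted DB row.

```python
def is_wrapped_data_part(part: dict) -> bool:
    """True when a data-* part has exactly the keys {type, data}."""
    return (
        str(part.get("type", "")).startswith("data-")
        and set(part) == {"type", "data"}
        and isinstance(part["data"], dict)
    )


# Pre-fix flat shape: payload fields sit next to `type`.
flat = {"type": "data-citation", "cited_text": "ApeRAG",
        "location": {"type": "char_location"}}

# Post-fix wrapped shape: same field set, nested under `data`.
wrapped = {"type": "data-citation",
           "data": {"cited_text": "ApeRAG",
                    "location": {"type": "char_location"}}}
```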

Tests: 8/8 in agent_runtime/test_uimessage_at_rest.py pass; full unit
suite 711/711 (29 skip), ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #74 D8.2): align ToolPart with D8 §2.4 / D9 §A1+§A6 SafeToolName shape

Weston minimal CR (msg=1812fb03) + architect canonical affirm (msg=8412dce5):
the at-rest ToolPart used a flat `type: "tool"` literal plus a separate
`tool_name` field, which is neither the AI SDK v5 streaming form
(`tool-input-*` / `tool-output-*`) nor the v5 consolidated form
(`type: "tool-<safeName>"`). That third intermediate shape would have
forced #75 emit + #76/#77 FE renderer to do `tool` -> `tool-<name>`
conversion — the same wire/at-rest schema drift class we just rejected
for the data-* parts.

This commit:
- Encodes the SafeToolName directly in `ToolPart.type` via a regex-
  validated `^tool-[A-Za-z0-9_-]+$` discriminator string, matching
  D8 §2.4 + D9 §A1/§A6.
- Drops the redundant `tool_name` field; MCP server/tool identity
  remains carried in `metadata`.
- Replaces the misplaced `args_preview` / `args_hash` fields with the
  canonical `input: Optional[Any]`. Those redaction helpers stay
  module-level (`args_preview()` / `args_hash()`) so #75 D8.3 can use
  them when building DataToolConsentPart.data per D9 §A7.
- Updates the every-part fixture and the round-trip expected_types to
  the new tool-`<name>` discriminator.
- Adds test_tool_part_type_uses_safe_tool_name_form — pins the
  persisted tool part `type` matches the SafeToolName regex and
  confirms no top-level `tool_name` field leaks back.
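
The discriminator check can be sketched with the regex quoted above. The helper name is an assumption; in the PR the validation lives on the Pydantic ``ToolPart`` model rather than in a free function.

```python
import re

# Pattern quoted in the commit message for the ToolPart.type discriminator.
SAFE_TOOL_TYPE = re.compile(r"^tool-[A-Za-z0-9_-]+$")


def is_safe_tool_type(value: str) -> bool:
    """Check that `value` has the consolidated `tool-<safeName>` form."""
    # fullmatch also rejects a trailing newline, which `$` alone would allow.
    return SAFE_TOOL_TYPE.fullmatch(value) is not None
```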

SafeToolName *resolution* (raw MCP name → safe form, collision hash
suffix per D9 §A6) remains #75's scope; #74 only enforces the
canonical storage shape.
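
For orientation only, one way such a resolution step could work is sketched below. Everything here is hypothetical (the function name, the underscore replacement rule, the 8-character suffix); the commit messages only state that collisions get a hash suffix per D9 §A6 and that the real logic belongs to #75.

```python
import hashlib
import re


def to_safe_name(server: str, tool: str, taken: set[str]) -> str:
    """Map a raw MCP (server, tool) pair to a name matching [A-Za-z0-9_-]+."""
    base = re.sub(r"[^A-Za-z0-9_-]", "_", f"{server}_{tool}")
    if base not in taken:
        return base
    # Collision: append a short digest of the raw identity to disambiguate.
    digest = hashlib.sha256(f"{server}\x00{tool}".encode()).hexdigest()[:8]
    return f"{base}_{digest}"


taken: set[str] = set()
first = to_safe_name("docs", "search.v1", taken)
taken.add(first)
second = to_safe_name("docs", "search/v1", taken)  # sanitizes to the same base
```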

Tests: 9/9 in agent_runtime/test_uimessage_at_rest.py pass; full unit
suite 711/711 (29 skip) — the one observed concurrent_control flake
passes on rerun. Ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #74 D8.2): persist UIMessage parts with canonical camelCase aliases

Weston minimal CR (msg=59a459c6) + architect canonical affirm: the
at-rest part models lacked Pydantic aliases, so `model_dump(by_alias=True)`
fell back to snake_case (`source_id`, `tool_call_id`, `args_preview`,
`elicitation_id`, etc.) — diverging from cuiwenbo wire `parts.py` (#73)
which already serializes camelCase per AI SDK v5. That breaks the D8 §2
same-schema invariant a third time and would have forced #76/#77 FE
renderer to handle two casings.

This commit attaches `Field(alias=...)` + `ConfigDict(populate_by_name=True)`
to every camelCase-canonical field so JSON serialization matches the
wire byte-for-byte while Python call sites still use snake_case:

- SourceUrlPart.source_id        → sourceId
- SourceDocumentPart.source_id   → sourceId
- SourceDocumentPart.media_type  → mediaType
- ToolPart.tool_call_id          → toolCallId
- ToolPart.error_text            → errorText
- ToolConsentData.tool_call_id   → toolCallId
- ToolConsentData.tool_name      → toolName
- ToolConsentData.args_preview   → argsPreview
- ToolConsentData.args_hash      → argsHash
- ToolConsentData.requested_at   → requestedAt
- ElicitationData.elicitation_id → elicitationId

Snake_case stays where D8 §2 / Anthropic-shape canon requires it:
CitationData.cited_text and the four CitationLocation variants
(char_location / page_location / content_block_location / url_citation
plus their internal start_char / end_char / doc_index / doc_title /
page_index / block_index fields) follow the Anthropic citation
convention unchanged.
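
The effect of the aliases on the persisted JSON can be sketched without Pydantic. This is a plain-dict stand-in, assuming a hypothetical ``dump_by_alias`` helper; the PR itself uses ``Field(alias=...)`` on the models. Only the fields listed above are renamed, so the Anthropic-shape citation keys pass through untouched.

```python
# Alias table copied from the list above (field name -> canonical JSON key).
ALIASES = {
    "source_id": "sourceId",
    "media_type": "mediaType",
    "tool_call_id": "toolCallId",
    "error_text": "errorText",
    "tool_name": "toolName",
    "args_preview": "argsPreview",
    "args_hash": "argsHash",
    "requested_at": "requestedAt",
    "elicitation_id": "elicitationId",
}


def dump_by_alias(fields: dict) -> dict:
    """Rename only the keys that have a canonical camelCase alias."""
    return {ALIASES.get(key, key): value for key, value in fields.items()}


tool_keys = dump_by_alias({"tool_call_id": "call-1", "error_text": None})
citation_keys = dump_by_alias({"cited_text": "ApeRAG", "start_char": 0})
```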

Tests:
- test_data_parts_use_wrapped_data_shape now asserts the wrapped
  data-tool-consent / data-elicitation payloads carry camelCase keys
  (toolCallId / argsPreview / requestedAt / elicitationId, etc.).
- New test_persisted_keys_use_canonical_camelcase locks the camelCase
  contract end-to-end against the persisted DB row, explicitly
  failing if any of the legacy snake_case forms reappear.
- test_tool_part_type_uses_safe_tool_name_form additionally pins
  toolCallId on the tool part.

Gates: 10/10 in agent_runtime/test_uimessage_at_rest.py pass; full
unit suite 712/29 skip/0 fail (concurrent_control flake deselected,
pre-existing). Ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #74 D8.2): align DataElicitationPart with D9 §5.1 canonical

Weston minimal CR (msg=51dffdc9) + PM lock (msg=042b0a7b): the at-rest
ElicitationData was missing the canonical `serverName` field and used
a non-canonical `submitted` state literal. D9 §5.1 locks the shape as:

    { type: "data-elicitation", data: {
        elicitationId: string,
        serverName: string,          // MCP server requesting input
        prompt: string,
        schema: JsonSchema,
        state: "pending" | "answered" | "cancelled"
    }}

This commit:
- Adds `server_name: str = Field(alias="serverName")` to ElicitationData
  so MCP server identity round-trips with the elicitation request.
- Tightens `state` to `Literal["pending", "answered", "cancelled"]` per
  D9 §5.1 / §6.3 — the previous `submitted` would have forced #75 emit
  to translate state on every elicitation reply.
- Keeps `response: Optional[dict[str, Any]]` per PM msg=042b0a7b
  ("may be kept but must not replace the canonical fields"); it carries
  the user's submitted value at-rest after the POST endpoint completes
  the round-trip.
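
A rough plain-dict check of the D9 §5.1 shape quoted above, assuming a hypothetical ``elicitation_problems`` helper (the PR enforces this through the Pydantic model instead). It flags both defects this commit fixes: a missing `serverName` and the legacy `submitted` state.

```python
CANONICAL_STATES = {"pending", "answered", "cancelled"}
REQUIRED_KEYS = {"elicitationId", "serverName", "prompt", "schema", "state"}


def elicitation_problems(part: dict) -> list[str]:
    """List deviations of a data-elicitation part from the canonical shape."""
    problems = []
    if part.get("type") != "data-elicitation":
        problems.append("type must be 'data-elicitation'")
    data = part.get("data") or {}
    missing = sorted(REQUIRED_KEYS - set(data))
    if missing:
        problems.append("missing: " + ", ".join(missing))
    if "state" in data and data["state"] not in CANONICAL_STATES:
        problems.append(f"non-canonical state: {data['state']}")
    return problems


legacy = {"type": "data-elicitation", "data": {
    "elicitationId": "e1", "prompt": "Which project?",
    "schema": {}, "state": "submitted"}}
fixed = {"type": "data-elicitation", "data": {
    "elicitationId": "e1", "serverName": "jira", "prompt": "Which project?",
    "schema": {}, "state": "answered", "response": {"project": "APE"}}}
```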

Tests:
- Updates the every-part fixture with a representative serverName.
- test_data_parts_use_wrapped_data_shape now asserts `serverName` is
  in the persisted data-elicitation keys.
- test_persisted_keys_use_canonical_camelcase locks `serverName` (not
  `server_name`) and the canonical state literal.
- New test_data_elicitation_answered_state_round_trip — explicit
  round-trip of a `state="answered"` elicitation with a populated
  response, pinning the canonical state vocabulary against regression.

Gates: 11/11 in agent_runtime/test_uimessage_at_rest.py pass; full
unit suite 713 passed / 29 skipped / 0 failed (concurrent_control
flake deselected, pre-existing). Ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…ap matrix (#83) (#1698)

Phase 9 D10.a (read-only) — current-state record of ApeRAG MCP / RAG /
HTTP / internal-service surface, intended to feed D10 design pack
(task #82 / #84).

Body §B: 6-interface inventory (Vector / Graph / Full-text /
Web Search / Summary / Vision) — for each: MCP exposure, HTTP endpoint,
request/response schema, service entry, implementation file, multi-tenant
boundary.

§C: HTTP-only / internal-only capabilities not yet exposed via MCP,
recorded as gaps (per PM expansion). Tagged per architect 4-tier access
taxonomy (MCP-exposed / HTTP-only / internal-only / none).

Appendix A: D9 base reuse matrix — SafeToolName, 3-tier registry,
7-point contract, multi-tenant auth boundary — distinguishing on-disk
reusable vs. design-only.

Appendix B: 1-page impact table for the three earayu2 open questions
(Summary/Vision deprecate, write tools scope, cross-collection ops)
with cost asymmetry per choice.

Includes "Delta from 5113730 -> e290488" pass for #74 D8.2 merge:
data-tool-consent + data-elicitation parts moved on-disk; 7-point
compliance lower-bound conclusions unchanged.

Ground truth: origin/main HEAD e290488 at time of writing.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… elicitation (#1696)

* refactor(phase8 #73 D8.1): backend AI SDK v5 stream emitter

* feat(phase8 #75 D8.3): backend tool lifecycle + citations + consent + elicitation

Implements the seven-point D9 §A4 contract that gates tool execution
in the agent runtime, plus the Anthropic-shape citation transform:

- tools/safe_name.py     -- D9 §A1+§A6 SafeToolName + collision sha256
                            suffix + (mcpServer, mcpToolName, safeName)
                            reverse lookup
- tools/registry.py      -- D9 §1.1+§A5 three-tier MCP registry with
                            system-namespace reservation and audit-logged
                            admin alias (no silent override)
- tools/authorization.py -- D9 §2 three-level auth (visibility /
                            invocation / consent) with §2.2 default
                            policy + per-tool risk overrides
- tools/args_cache.py    -- D9 §A7 backend-private raw-args cache with
                            short TTL; wire-side argsPreview / argsHash
                            re-exported from the canonical helpers in
                            aperag/domains/agent_runtime/uimessage.py
                            (single-source-of-truth)
- tools/consent.py       -- D9 §3 consent request <-> decision flow with
                            asyncio.Event waiter, single-use raw-args
                            consume, denial-drops-cache invariant
- tools/elicitation.py   -- D9 §5 elicitation request <-> answer flow
                            with schema-validated response + cancel
                            hook; pluggable validator (default checks
                            JSON Schema required fields)
- tools/lifecycle.py     -- envelope event-type constants for
                            tool.consent.* / tool.elicitation.* +
                            translate_lifecycle_envelope() translator
                            extension + LifecycleEmitter glue between
                            consent/elicitation services and the
                            runtime's EventService.append_event path
- tools/citations.py     -- typed Anthropic-shape citation builder for
                            char_location / page_location /
                            content_block_location / url_citation, fed
                            from RAG ReferenceBundleItem metadata
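
The consent flow and raw-args cache listed above can be sketched together. This is an in-memory illustration under stated assumptions: the class and method names are hypothetical, TTL expiry and ``argsPreview`` are omitted, and only the ``asyncio.Event`` waiter, the single-use consume, and the denial-drops-cache behaviour come from the description.

```python
import asyncio
import hashlib
import json
from typing import Optional


class ConsentSketch:
    """Hypothetical stand-in for the consent service plus raw-args cache."""

    def __init__(self) -> None:
        self._raw_args: dict[str, dict] = {}
        self._events: dict[str, asyncio.Event] = {}
        self._approved: dict[str, bool] = {}

    def request(self, tool_call_id: str, args: dict) -> dict:
        """Keep raw args backend-private; return only wire-safe fields."""
        self._raw_args[tool_call_id] = args
        self._events[tool_call_id] = asyncio.Event()
        canonical = json.dumps(args, sort_keys=True).encode()
        return {"toolCallId": tool_call_id,
                "argsHash": hashlib.sha256(canonical).hexdigest()}

    def decide(self, tool_call_id: str, approved: bool) -> None:
        self._approved[tool_call_id] = approved
        if not approved:
            self._raw_args.pop(tool_call_id, None)  # denial drops the cache
        self._events[tool_call_id].set()  # wake the runtime waiter

    async def wait_and_consume(self, tool_call_id: str) -> Optional[dict]:
        await self._events[tool_call_id].wait()
        if not self._approved[tool_call_id]:
            return None
        return self._raw_args.pop(tool_call_id, None)  # single-use consume


async def demo():
    svc = ConsentSketch()
    wire = svc.request("call-1", {"path": "/tmp/report.txt"})
    waiter = asyncio.create_task(svc.wait_and_consume("call-1"))
    await asyncio.sleep(0)  # let the waiter block on the event
    svc.decide("call-1", approved=True)
    first_read = await waiter
    second_read = await svc.wait_and_consume("call-1")
    return wire, first_read, second_read


wire, first_read, second_read = asyncio.run(demo())
```

The second read returns nothing because the first one already consumed the cached arguments.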

Wire-side refinement:
- wire/parts.py DataToolConsentPart + DataElicitationPart placeholders
  refined to use the canonical wrapped {type, data: ToolConsentData /
  ElicitationData} shape (no more `transient: True` placeholder; per D9
  §3.1 / §5.1 these parts are persisted, audit-trail relevant)

api/routes.py:
- chained translate_lifecycle_envelope() after translate_envelope() so
  consent/elicitation envelopes emit DataToolConsentPart /
  DataElicitationPart on the SSE stream
- new POST /agent/turns/{turn_id}/consent/{tool_call_id} -- records the
  user's decision, wakes the runtime waiter, appends the
  tool.consent.decided envelope so SSE replay carries the resolved part
- new POST /agent/turns/{turn_id}/elicit/{elicitation_id} -- submits a
  schema-validated response, wakes the waiter, appends the
  tool.elicitation.resolved envelope
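
For the elicit endpoint's validation step, the default validator is described as checking JSON Schema required fields. A minimal sketch of that check, with a hypothetical function name and sample schema:

```python
def missing_required(schema: dict, response: dict) -> list[str]:
    """Return the required property names from `schema` absent in `response`."""
    return [name for name in schema.get("required", []) if name not in response]


schema = {
    "type": "object",
    "required": ["project", "region"],
    "properties": {
        "project": {"type": "string"},
        "region": {"type": "string"},
    },
}

incomplete = missing_required(schema, {"project": "aperag"})
complete = missing_required(schema, {"project": "aperag", "region": "eu"})
```

A response would be accepted only when the returned list is empty; type and format checks are left to a pluggable validator in this sketch.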

Contract tests (focused unit_test/agent_runtime/test_tools_*.py, 82
new tests, all passing locally; full unit suite 814 / 29 skip / 0
fail):
- test_tools_safe_name.py     (12 tests) -- D9 §A1+§A6 lock
- test_tools_registry.py      (12 tests) -- D9 §1.1+§A5 lock
- test_tools_authorization.py (11 tests) -- D9 §2 lock
- test_tools_args_cache.py    (12 tests) -- D9 §A7 raw-args privacy lock
- test_tools_consent.py       ( 9 tests) -- D9 §3 consent flow lock
- test_tools_elicitation.py   ( 9 tests) -- D9 §5 elicitation lock
- test_tools_lifecycle.py     ( 9 tests) -- D9 §6 translator extension
- test_tools_citations.py     ( 9 tests) -- D8 §2.5 typed citation lock

7-point D9 §A4 verification:
1. SafeToolName + MCP metadata (D9 §A1+§A6)              -- safe_name.py
2. AI SDK v5 + data-tool-consent custom data-part (§A2)  -- wire/parts.py + lifecycle.py
3. argsPreview + argsHash backend-private (§A7)          -- args_cache.py + consent.py
4. Registry no silent system override (§A5)              -- registry.py
5. data-elicitation schema-validated input (§5)          -- elicitation.py
6. Three-level authorization (§2)                        -- authorization.py
7. PydanticAI as default candidate (§A3)                 -- runtime backbone unchanged
                                                            (per architect msg=ff619d8a /
                                                            Weston msg=50c90f6f C2 lock,
                                                            this PR scope explicitly excludes
                                                            backbone rewrite)

Built on:
- #73 D8.1 wire emitter (cuiwenbo, PR #1695 / 5113730 in main)
  -- consumes wire/parts.py + chains lifecycle translator via api/routes.py
- #74 D8.2 at-rest UIMessage storage (Bryce, PR #1694 head be7406c)
  -- imports ToolConsentData / ElicitationData / args_preview / args_hash
  from aperag/domains/agent_runtime/uimessage.py so the wire and
  at-rest shapes share one canonical schema

* fix(phase8 #74 D8.2): align DataElicitationPart with D9 §5.1 canonical

Weston minimal CR (msg=51dffdc9) + PM lock (msg=042b0a7b): the at-rest
ElicitationData was missing the canonical `serverName` field and used
a non-canonical `submitted` state literal. D9 §5.1 locks the shape as:

    { type: "data-elicitation", data: {
        elicitationId: string,
        serverName: string,          // MCP server requesting input
        prompt: string,
        schema: JsonSchema,
        state: "pending" | "answered" | "cancelled"
    }}

This commit:
- Adds `server_name: str = Field(alias="serverName")` to ElicitationData
  so MCP server identity round-trips with the elicitation request.
- Tightens `state` to `Literal["pending", "answered", "cancelled"]` per
  D9 §5.1 / §6.3 — the previous `submitted` would have forced #75 emit
  to translate state on every elicitation reply.
- Keeps `response: Optional[dict[str, Any]]` per PM msg=042b0a7b
  ("may be kept, but must not replace the canonical fields"); it carries
  the user's submitted value at rest after the POST endpoint completes
  the round-trip.
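
A minimal stdlib sketch of the canonical shape above, assuming a plain
dataclass stands in for the real Pydantic model. `ElicitationSketch` and
`to_wire` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Canonical state vocabulary per D9 §5.1; "submitted" is rejected.
CANONICAL_STATES = ("pending", "answered", "cancelled")


@dataclass
class ElicitationSketch:
    """Illustrative stand-in for ElicitationData (not the real model)."""

    elicitation_id: str
    server_name: str  # MCP server requesting input
    prompt: str
    schema: dict[str, Any]
    state: str = "pending"
    response: Optional[dict[str, Any]] = None  # kept at rest, not canonical

    def __post_init__(self) -> None:
        if self.state not in CANONICAL_STATES:
            raise ValueError(f"non-canonical state: {self.state!r}")

    def to_wire(self) -> dict[str, Any]:
        """Hypothetical helper: emit the wrapped camelCase data part."""
        return {
            "type": "data-elicitation",
            "data": {
                "elicitationId": self.elicitation_id,
                "serverName": self.server_name,
                "prompt": self.prompt,
                "schema": self.schema,
                "state": self.state,
            },
        }
```

Rejecting the state at construction time is one way to keep the old
`submitted` literal from reappearing; the real model gets the same
effect from its `Literal` annotation.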

Tests:
- Updates the every-part fixture with a representative serverName.
- test_data_parts_use_wrapped_data_shape now asserts `serverName` is
  in the persisted data-elicitation keys.
- test_persisted_keys_use_canonical_camelcase locks `serverName` (not
  `server_name`) and the canonical state literal.
- New test_data_elicitation_answered_state_round_trip — explicit
  round-trip of a `state="answered"` elicitation with a populated
  response, pinning the canonical state vocabulary against regression.

Gates: 11/11 in agent_runtime/test_uimessage_at_rest.py pass; full
unit suite 713 passed / 29 skipped / 0 failed (concurrent_control
flake deselected, pre-existing). Ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #75 D8.3): align elicitation to D9 §5 / D9.1 canonical (serverName + state="answered")

Fast-follow per PR description's Test plan TODO. Reconciles
``ElicitationService`` and ``LifecycleEmitter.request_elicitation``
with the canonical ``ElicitationData`` shape locked by Bryce's
#1694 head ``04d268be`` (Weston msg=89bafde9 4th-blocker fix +
architect msg=8a76e5e0 D9.1 amend):

- ``ElicitationOutcome`` literal: ``"submitted"`` -> ``"answered"``
  (canonical state vocabulary per D9 §5.1 / D9.1)
- ``ElicitationService.request_input(*, server_name=...)``: required
  kwarg threaded through to populate ``ElicitationData.server_name``
  so the FE consent UI can surface which MCP server initiated the
  elicitation
- ``LifecycleEmitter.request_elicitation(*, server_name=...)``:
  matching kwarg propagated to the underlying service
- contract tests updated: ``test_payload_carries_canonical_server_name``
  + ``test_request_input_rejects_empty_server_name`` added; existing
  state assertions flipped to ``"answered"``

Tests: ``pytest tests/unit_test/agent_runtime/test_tools_*.py -q``
=> 84 passed (was 82 + 2 new server_name tests).

Wire / at-rest shape stays canonical-clean: ``ElicitationData`` is
imported directly from ``aperag/domains/agent_runtime/uimessage.py``
so the field set + alias casing follow #74 ``be7406c5`` -> ``04d268be``
single-source-of-truth.

* fix(phase8 #75 D8.3): tenant ownership + multi-tenant registry + default-deny auth

Address Weston's three blockers from minimal CR (msg=57cf4632) +
the architect-upgraded fourth blocker (msg=19f2c9a9). All within
PR scope per PM lock (msg=ab2ed5d3); none deferred.

## B2 (tenant-bound consent + elicitation ownership)

- ``ConsentService`` records ``ConsentBinding(turn_id, user_id)``
  at ``request_consent`` time; ``decide()`` raises
  :class:`ConsentOwnershipError` when ``actor_user_id`` does not
  match the bound user, or when ``expected_turn_id`` is provided
  and does not match the bound turn (defense in depth even when
  the user matches).
- ``ElicitationService`` mirrors the same pattern via
  ``ElicitationBinding`` + :class:`ElicitationOwnershipError`.
  ``cancel(*, bypass_ownership=True)`` is reserved for
  internal-only callers (timeout sweeper / abort path) so user-
  facing handlers cannot accidentally skip the check.
- ``LifecycleEmitter.request_consent`` /
  ``LifecycleEmitter.request_elicitation`` thread the new
  ``turn_id`` + ``user_id`` kwargs through to the underlying
  services.
- HTTP endpoints moved to ``chat_id``-scoped paths to align with
  the existing pattern (``/agent/chats/{chat_id}/turns/{turn_id}/...``)
  and to leverage ``turn_service.get_turn_snapshot(user, chat,
  turn)`` for HTTP-layer ownership pre-check (raises
  ``ResourceNotFoundException`` -> 404 on cross-user / unknown
  turn). New endpoints:
    POST /agent/chats/{chat_id}/turns/{turn_id}/consent/{tool_call_id}
    POST /agent/chats/{chat_id}/turns/{turn_id}/elicit/{elicitation_id}
  Both translate ``ConsentOwnershipError`` /
  ``ElicitationOwnershipError`` -> 403, ``KeyError`` -> 404,
  ``ValueError`` -> 409 (already resolved) or 422 (validation).
- Regression tests:
    test_decide_rejects_cross_user_actor / cross_turn_actor (consent)
    test_submit_rejects_cross_user_actor / cross_turn_actor (elicitation)
    test_request_consent_rejects_empty_turn_or_user
    test_request_input_rejects_empty_server_name (already there)
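
The binding check described in B2 could look roughly like the sketch
below. It is a simplified stdlib illustration, not the shipped
`ConsentService`: the `approved` argument, the in-memory dict, and the
class name `ConsentServiceSketch` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


class ConsentOwnershipError(Exception):
    """Actor does not own the pending consent (mapped to 403 over HTTP)."""


@dataclass(frozen=True)
class ConsentBinding:
    turn_id: str
    user_id: str


class ConsentServiceSketch:
    def __init__(self) -> None:
        self._bindings: dict[str, ConsentBinding] = {}

    def request_consent(self, tool_call_id: str, *, turn_id: str, user_id: str) -> None:
        if not turn_id or not user_id:
            raise ValueError("turn_id and user_id are required")
        # Record who owns this consent at request time.
        self._bindings[tool_call_id] = ConsentBinding(turn_id, user_id)

    def decide(
        self,
        tool_call_id: str,
        *,
        actor_user_id: str,
        approved: bool,
        expected_turn_id: Optional[str] = None,
    ) -> bool:
        binding = self._bindings[tool_call_id]  # KeyError -> 404 over HTTP
        if actor_user_id != binding.user_id:
            raise ConsentOwnershipError("actor does not match bound user")
        # Defense in depth: also check the turn when the caller supplies it.
        if expected_turn_id is not None and expected_turn_id != binding.turn_id:
            raise ConsentOwnershipError("turn does not match bound turn")
        del self._bindings[tool_call_id]
        return approved
```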

## B3 (registry composite key per scope_ref)

- ``_ScopeIndex.entries`` keyed on ``(scope_ref, name)`` tuple;
  system tier uses ``scope_ref=None`` (single global namespace).
  Bot/user tiers use the owning ``scope_ref`` so different bots /
  users can independently register the same name without
  collision -- per D9 §1.1 multi-tenant boundary.
- New ``_tier_key()`` helper composes the right key shape per
  scope.
- ``effective_servers()`` switched to keyed iteration so the
  ``scope_ref`` filter happens at lookup time (was after
  iteration, which was too late once a same-name entry had
  already been overwritten).
- ``unregister(scope, name, *, scope_ref=None)`` API added so
  bot/user removals can target the right (scope_ref, name) pair.
- Regression tests:
    test_two_bots_can_register_same_name_without_collision
    test_two_users_can_register_same_name_without_collision
    test_user_register_does_not_leak_to_other_user_resolution
    test_bot_register_does_not_leak_to_other_bot_resolution
    test_unregister_is_scope_ref_aware_for_bot_user_tiers
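
A rough sketch of the composite-key idea in B3, assuming string scope
names and a plain dict per tier. `RegistrySketch`, `tier_key`, and
`resolve` are hypothetical stand-ins for the real registry API:

```python
from typing import Optional

SYSTEM, BOT, USER = "system", "bot", "user"

Key = tuple[Optional[str], str]


def tier_key(scope: str, name: str, scope_ref: Optional[str]) -> Key:
    """System tier is one global namespace; bot/user tiers are per owner."""
    if scope == SYSTEM:
        return (None, name)
    if not scope_ref:
        raise ValueError(f"{scope} tier requires a scope_ref")
    return (scope_ref, name)


class RegistrySketch:
    def __init__(self) -> None:
        self._entries: dict[str, dict[Key, str]] = {SYSTEM: {}, BOT: {}, USER: {}}

    def register(self, scope: str, name: str, server: str, *, scope_ref: Optional[str] = None) -> None:
        self._entries[scope][tier_key(scope, name, scope_ref)] = server

    def unregister(self, scope: str, name: str, *, scope_ref: Optional[str] = None) -> None:
        self._entries[scope].pop(tier_key(scope, name, scope_ref), None)

    def resolve(self, scope: str, name: str, *, scope_ref: Optional[str] = None) -> Optional[str]:
        # The scope_ref filter happens at lookup time, via the key itself.
        return self._entries[scope].get(tier_key(scope, name, scope_ref))
```

Because the owner is part of the key, a second bot registering the same
name writes a different entry instead of overwriting the first.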

## B4 (unknown-risk default-deny)

- ``ToolAuthorizationPolicy.evaluate`` -- when the
  ``risk_resolver`` returns ``None`` for an unknown tool, the
  policy now returns ``visible=True, can_invoke_auto=False,
  requires_consent=True, risk="writes_user_data"`` instead of the
  previous ``READ_ONLY`` auto-invocable default. Per architect
  canonical lock msg=19f2c9a9: misclassified side-effect tools
  must NOT silently bypass the consent gate; the security-first
  fail-closed posture only costs an extra consent prompt for
  tools that operators forget to classify as ``READ_ONLY``.
- Regression tests:
    test_unknown_tool_default_deny_per_security_canonical
    test_unknown_tool_filter_visible_keeps_consent_required_tool
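
The fail-closed branch in B4 can be sketched as follows. This is an
illustration under assumptions: the `Decision` dataclass, the
`"read_only"` string value, and the handling of known non-read-only
risks are not taken from the real `ToolAuthorizationPolicy`.

```python
from dataclasses import dataclass
from typing import Callable, Optional

READ_ONLY = "read_only"  # assumed string value
WRITES_USER_DATA = "writes_user_data"


@dataclass(frozen=True)
class Decision:
    visible: bool
    can_invoke_auto: bool
    requires_consent: bool
    risk: str


def evaluate(tool_name: str, risk_resolver: Callable[[str], Optional[str]]) -> Decision:
    risk = risk_resolver(tool_name)
    if risk is None:
        # Unknown tool: fail closed. Still visible, but gated on consent.
        return Decision(
            visible=True,
            can_invoke_auto=False,
            requires_consent=True,
            risk=WRITES_USER_DATA,
        )
    auto = risk == READ_ONLY
    return Decision(visible=True, can_invoke_auto=auto, requires_consent=not auto, risk=risk)
```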

## Gates

- ``pytest tests/unit_test/agent_runtime/test_tools_*.py -q``: 95 passed
  (was 84 + 11 new B2/B3/B4 tests; old elicitation tests
  re-targeted to ``actor_user_id="user-1"`` to match the
  test-fixture binding ``user_id="user-1"``)
- ``pytest tests/unit_test/ -q --deselect concurrent_control/test_performance_comparison.py``:
  828 passed / 29 skipped / 0 failed
- ``ruff check`` + ``ruff format --check``: clean

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…3 merge (#1699)

Phase 9 D10.a follow-up — append second delta block covering
e290488 -> bd4052d (#75 D8.3 backend tool lifecycle + citations
+ consent + elicitation merge).

Per @明书 prior commitment in task #83 thread (msg=4c385635 +
msg=95221e79) and PM trigger (msg=4b13bd46): re-pin ground truth to
post-#75 main; verify D9 base reuse matrix rows.

Diff scope: 11 aperag files / +2546 / -18 (excluding the prior delta's
own doc PR landing). New tools/ subpackage with 9 modules and 8 test
files (95 contract tests).

Appendix A flips: SafeToolName resolver, 3-tier registry, 7-point
contract items ②③⑤, three-level authorization, tool lifecycle, D8.3
citations all now on-disk + tested. Body §B, §C, §D, Appendix B
unchanged. Read-only D10 compliance lower bound (7-point items ①④⑥⑦)
conclusions unchanged but anchor points now have canonical on-disk
impls.

Nuance noted: translator.py:120 TODO comment still present at HEAD;
the new tool lifecycle path bypasses translator.py via tools/lifecycle.py
LifecycleEmitter, so the legacy translator hook becomes a separate
integration concern rather than a D10 blocker.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
#1701)

Per @earayu2 msg=d02d70dd: agents are blocked on every commit by the
``addlicense`` git pre-commit hook (it injects Apache headers and
aborts the commit asking the agent to re-commit). The project no
longer needs forced license header injection at this stage; this
PR removes the friction so agents can commit smoothly.

What is removed:

- ``scripts/hooks/`` (3 files: ``pre-commit`` + READMEs) -- the
  pre-commit hook that ran ``make lint`` + ``make add-license`` on
  every commit. ``make add-license`` modifies files mid-commit and
  forces a redo, which is the actual blocker.
- ``scripts/install-hooks.sh`` -- the helper that copied
  ``scripts/hooks/*`` into ``.git/hooks/``.
- ``Makefile`` targets: ``add-license`` / ``check-license`` /
  ``install-addlicense`` / ``install-hooks``. The ``addlicense``
  binary download path is also gone.
- ``env-dev`` no longer depends on ``install-addlicense`` /
  ``install-hooks``; new clones get a clean dev environment with no
  license tooling.
- ``docs/zh-CN/development/development-guide.md`` lines describing
  the now-removed hooks + addlicense steps.

What is kept (per scope discipline):

- ``make lint`` / ``make format`` (ruff check + format) -- still the
  canonical hygiene gate, runs in CI ``lint-and-unit``.
- Existing license headers in source files -- not stripped, since
  bulk-removing them would be a huge unrelated diff.
- Unit / e2e test gates -- untouched.
- CI workflows -- never referenced ``add-license`` / ``check-license``
  / pre-commit hooks (verified via grep), so no CI changes needed.

Local agents that previously had the pre-commit hook installed will
still have a stale copy in ``.git/hooks/pre-commit`` until they
delete it manually; that is per-clone state, no repo action required.
``make env-dev`` for fresh clones produces no hook.

Boundary: hygiene-only PR. No app code, no migration, no schema
change. lint + unit gates from main remain in place; only the
agent-friction tooling is removed.
…reducer (#1700)

* feat(phase8 #76 D8.4a): FE AI SDK-compatible stream transport + part reducer

D8.4a first-cut. Replaces the legacy AgentRuntimeRedisStore SSE consumer
with a fetch+ReadableStream transport that speaks the AI SDK v5 UI
Message Stream Protocol. Hooks the new client into `chat-messages.tsx`
through a narrow `legacy-snapshot-shim` so `AgentTurnCard` keeps
rendering until the parts renderer (#77) ships.

Module layout (`web/src/features/agent-runtime/`):
* `types.ts` — wire `StreamPart` typed union (mirrors
  `aperag/domains/agent_runtime/wire/parts.py`) + at-rest
  `AgentMessagePart` (text / tool / source / citation / consent /
  elicitation) shaped to align with `@ai-sdk/react`'s `UIMessagePart`.
* `stream-parser.ts` — SSE frame parser (handles `id:` + `data:` only,
  ignores comments/heartbeats; carries trailing partial frames).
* `stream-client.ts` — single-connection consumer; validates
  `x-vercel-ai-ui-message-stream: v1` response header, forwards
  `Last-Event-ID` on resume, terminates on `finish` / `error` /
  `abort` and on local `AbortSignal`.
* `reducer.ts` — collapses lifecycle wire parts (`tool-input-*` /
  `tool-output-available`) into consolidated tool parts; dedups by
  stable id (text-block id / toolCallId / sourceId / elicitationId /
  citation fingerprint); transient `data-activity` is replace-last
  only and never reaches the persistent parts list.
* `use-agent-turn-stream.ts` — React hook with reconnect loop; surfaces
  `{ parts, transientActivity, status, errorText, lastSequence,
  abort }` to consumers (#77 / #78).
* `api.ts` — typed JSON wrappers for create/cancel/snapshot/artifact +
  consent/elicitation submit endpoints (#78 plug-in surface).
* `legacy-snapshot-shim.ts` — TODO(#77 dongdong) projection back to
  `AgentTurnSnapshot { turn, timeline, artifacts }` so the existing
  card renders during the transition. Boundary: streamingAnswer
  (grouped per text-block id), patched turn status; timeline +
  artifacts pass through from the baseline snapshot only.
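
To make the parser contract concrete, here is a logic-only sketch of the
behaviour described for `stream-parser.ts`. The real module is
TypeScript; this version assumes `\n` line endings and JSON payloads,
and `parse_sse` is a hypothetical name.

```python
import json
from typing import Any, Optional

Frame = tuple[Optional[str], Any]


def parse_sse(buffer: str, chunk: str) -> tuple[list[Frame], str]:
    """Parse complete SSE frames; return them plus the trailing partial."""
    buffer += chunk
    # Frames end with a blank line; the last piece may be incomplete.
    *complete, remainder = buffer.split("\n\n")
    frames: list[Frame] = []
    for raw in complete:
        event_id: Optional[str] = None
        data_lines: list[str] = []
        for line in raw.split("\n"):
            if line.startswith(":"):
                continue  # comment / heartbeat
            if line.startswith("id:"):
                event_id = line[3:].strip()
            elif line.startswith("data:"):
                value = line[5:]
                # Per the SSE format, drop one leading space if present.
                data_lines.append(value[1:] if value.startswith(" ") else value)
            # Any other field (event:, retry:) is ignored on purpose.
        if data_lines:
            frames.append((event_id, json.loads("\n".join(data_lines))))
    return frames, remainder
```

The caller feeds the returned remainder back in with the next chunk, so
a frame split across two network reads is only emitted once it is
complete.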

Wire-protocol contracts (architect msg=bad0cd0f) — all verifiable in
the consumer:
1. AI SDK v5 typed parts surface (`StreamPart` mirrors BE; index
   re-exports SDK-aligned `AgentMessagePart` shapes).
2. Header marker — `x-vercel-ai-ui-message-stream: v1` checked before
   any `onPart` dispatch.
3. Resume / error / abort — `Last-Event-ID` header + `after_sequence`
   query on every reconnect; `error` part dispatched then connection
   terminates (no auto-retry on protocol failure beyond reconnect
   loop bounded at 5 attempts); `abort` part flips status and the
   `AbortController` cleans up.
4. Part-level dedup — by stable identifier per part type (architect
   msg=f35c5a3d Lock C); envelope-atomic replay tolerated.
5. Wire shape adoption — wrapped `{type, data:{...}}` for
   `data-citation/data-tool-consent/data-elicitation/data-activity`
   passes through unchanged; outer keys camelCase.
6. Transient `data-activity` — never persisted; surfaced on the
   separate `transientActivity` slot.

Two-phase lifecycle (ApeRAG-specific, captured for D10 reference):
client must POST `/agent/chats/{cid}/turns` first to obtain the
stream URL, then GET that URL to begin the SSE body. `useChat` is not
adopted because its single-step POST+stream lifecycle does not match.

`web/package.json`: adds `@ai-sdk/react@^2.0.0` + `ai@^5.0.0` for
typed parts surface (used today via re-exports; #77 will lean on
`isTextUIPart` / `isToolUIPart` / `isDataUIPart` directly).

Verified: `yarn lint` clean; `tsc --noEmit` clean for the touched
files (pre-existing main-branch errors in `chat-input.tsx` /
`page.tsx` / `collection-form.tsx` unrelated); `yarn dev` boots in
2.3s, GET / / `/auth/signin` / `/workspace/collections` /
`/workspace` all return 200.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(phase8 #76 D8.4a): forward-compat tool-output-error wire shape

Per architect canonical decision (msg=2f9225f5) — strict AI SDK v5
spec splits tool failure into a separate `tool-output-error` part type
(`{toolCallId, errorText}`). BE migration tracked as task #89 (D8.0c+
hygiene fix-forward, owner @cuiwenbo). The reducer now accepts both
the current `tool-output-available + errorText` shape and the post-#89
`tool-output-error` shape so the FE rolls forward without coupling to
BE timing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #76 D8.4a): address Weston B1/B2 — terminal-driven close + SDK-compatible parts

Weston msg=63a796f3 review identified two blockers within the locked
review boundary; both fixed in-PR.

## B1 — Terminal-driven completion (stream-client.ts)

Before: `consumeAgentStream()` returned `{reason:'completed'}` on
`reader.read()` `done`, regardless of whether a `finish` / `error` /
`abort` part had been dispatched. A clean mid-turn TCP close at the
HTTP layer would mark the turn completed instead of triggering the
reconnect loop, leaving #77 to render half-streamed parts as the
final message.

After: EOF without a terminal part returns `{reason:'error', error:
'stream closed before terminal frame'}` so the hook reconnects with
`Last-Event-ID` from the highest-seen `id:` field. Existing reconnect
budget (5 attempts) bounds persistent failures.
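
The control flow of the B1 fix can be sketched independently of the
TypeScript client. In the sketch below, `consume` takes already-parsed
`(event_id, part)` pairs; the return shape for `error` / `abort` parts
is an assumption, while the EOF branch follows the description above.

```python
from typing import Any, Iterable, Optional


def consume(frames: Iterable[tuple[Optional[str], dict[str, Any]]]) -> dict[str, Any]:
    """Decide the outcome of one connection from its parsed frames."""
    last_event_id: Optional[str] = None
    for event_id, part in frames:
        if event_id is not None:
            last_event_id = event_id  # resume cursor for Last-Event-ID
        if part["type"] == "finish":
            return {"reason": "completed", "last_event_id": last_event_id}
        if part["type"] in ("error", "abort"):
            # Assumed shape: surface the terminal part type as the reason.
            return {"reason": part["type"], "last_event_id": last_event_id}
    # EOF without a terminal part is a transport failure, not a completion.
    return {
        "reason": "error",
        "error": "stream closed before terminal frame",
        "last_event_id": last_event_id,
    }
```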

## B2 — SDK-compatible part union (types.ts + reducer.ts +
##      legacy-snapshot-shim.ts)

Before: `AgentMessagePart` used an ApeRAG-local `{kind: ...}`
discriminator. The PR claimed #77 could lean on `@ai-sdk/react`'s
`isTextUIPart` / `isToolUIPart` / `isDataUIPart` guards, but the SDK
guards branch on `type`, not `kind` — so the seam was nominally
SDK-aligned, factually divergent.

After: every part uses a `type:` discriminator that matches the SDK
exactly:
* `text` / `source-url` / `source-document` mirror the corresponding
  SDK `*UIPart` shapes structurally.
* Tool parts use `type: \`tool-${SafeToolName}\`` so the SDK's
  `isToolUIPart` `startsWith('tool-')` guard accepts them. `toolName`
  is also kept as a sibling field for direct render access.
* `data-citation` / `data-tool-consent` / `data-elicitation` use the
  SDK `DataUIPart` shape (`{type: 'data-${name}', id, data}`); `id`
  is the dedup key (citation fingerprint, toolCallId, elicitationId
  respectively).

A compile-time `_AgentMessagePartIsSDKCompatible` assertion in
`types.ts` enforces structural assignment to the SDK's
`TextUIPart` / `SourceUrlUIPart` / `SourceDocumentUIPart` /
`DataUIPart<ApeRAGUIDataTypes>` types — drift fails type-check.

Reducer is rewritten to produce the new shapes; consent and
elicitation now correctly replace existing parts when their state
transitions (the previous `kind:` shape relied on an `update?` callback
that was a no-op for the consent/elicitation flow). `null` fields
from the wire are coerced to `undefined` to satisfy SDK shape
expectations.

`legacy-snapshot-shim.ts`: the top comment claimed a "minimal timeline
(one entry per running tool call)", but the code only passes through
`baselineSnapshot.timeline` / `.artifacts`. The comment is realigned
to the actual coverage (per dongdong msg=f33e9039 minor).

Verified: `yarn lint` clean; `tsc --noEmit` clean for the touched
files (the SDK compatibility assertion compiles, proving structural
assignment); `yarn dev` boots in 3.5s on port 3011 with `GET /`,
`/auth/signin`, `/workspace/collections`, `/workspace` all 200.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* refactor(model-platform): replace provider dialect configuration

Introduce a first-class model account/model/use abstraction so users no longer configure LiteLLM dialects or custom provider routing names.

Made-with: Cursor

* fix(model-platform): align model_provider provider_type unique index

The model_provider table introduced by b4f2d91c8e3a declared
provider_type with a separate UniqueConstraint plus a non-unique index,
but the ORM declares the column as unique=True, index=True (which
SQLAlchemy renders as a single unique index). alembic check flagged the
drift and broke lint-and-unit on PR #1697. This revision drops the
redundant unique constraint and promotes the index to unique=True so
the autogenerate diff is clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(model-platform): unblock e2e-http-provider after v2 provider removal

The model-platform refactor deleted /api/v2/providers/* and
db.ops.query_provider_api_key without migrating the call sites that
still rely on them. Three small follow-ups to make CI green:

* Add AsyncLlmProviderRepositoryMixin.query_provider_api_key as a thin
  shim over ModelAccount so document_service.fetch_url_documents and
  web_access routes that look up the user's Jina key keep working.
  Falls back to public ACTIVE accounts when ``need_public`` is set,
  matching the old llm_provider.api_key semantics.
* Rewrite tests/e2e_http/hurl/full/10_provider_llm.hurl to drive the
  new /api/v3/model-providers, /model-accounts, /models, /model-uses
  surface plus /api/v1/embeddings + /api/v1/rerank with model_id. Uses
  provider_type=dashscope for the alibabacloud account and
  provider_type=openai_compatible for the openrouter account, both of
  which are seeded by 7c4e9e1f8b21.
* Update tests/unit_test/chat/test_chat_title_service.py to reflect the
  new chat-title flow: it no longer reaches into
  default_model_service; instead db_ops.query_model_uses returns the
  background-task ModelUse and the assertion now guards against
  model_invocation_service.chat being awaited on an empty history.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(model-platform): replace api-key shim with ModelPlatformService surface

The temporary ``query_provider_api_key`` shim on the LLM provider
repository is removed. Cross-domain callers that need a raw provider
API key (web_access JINA reader, knowledge_base fetch-url) now go
through ``ModelPlatformService.get_user_provider_api_key``, the
canonical surface for non-model-platform domains. The repository
keeps the SQL primitive as ``query_model_account_api_key`` — a
properly-named consumer of the new ``ModelAccount`` row, no longer
framed as a backward-compat shim.

This closes the design completeness gap PM flagged for #1697
(msg=8ac3e7d9): the model-platform refactor must not leak a v2-shape API
key lookup into other domains' DB-ops surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(e2e-http): migrate hurl 11/13/17/19 to v3 ModelSpec shape

The model-system refactor cuts ``ModelSpec.{model,
model_service_provider, custom_llm_provider}`` and only keeps
``model_id``. Pydantic silently drops extras, so any hurl file still
sending the old triple-shape parses as ``model_id=None`` and the
downstream code path breaks silently (collection vector index,
bot completion).

Following the template established by ``10_provider_llm.hurl``, each
file that exercises a real provider path (11, 13, 17) now seeds its
own ``ModelAccount`` + ``Model`` via the v3 routes and references the
captured ``model_id`` in the ``embedding`` / ``completion`` blobs.
``19_retrieval_http.hurl`` is a deterministic 4xx-shape test that
must not depend on any provider seed; the optional embedding /
completion config is dropped entirely.

Closes the second design completeness gap PM flagged for #1697
(msg=8ac3e7d9). Provider keys are still required to actually run
these against a live stack — the shape itself is now correct.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(model-platform): legacy back-compat + data migration + multi-head fix

Three Weston blockers (msg=80e873c1) on the model-platform refactor:

* Blocker A — ``/api/v1/embeddings`` and ``/api/v1/rerank`` are
  permanent OpenAI-compat allowlist routes. The PR's first cut required
  ``model_id`` and broke pre-#1697 callers (provider hurl + external
  clients) with 422. ``EmbeddingRequest`` / ``RerankRequest`` now accept
  either the new ``{model_id}`` shape *or* the legacy
  ``{model, model_service_provider, custom_llm_provider}`` triple. The
  triple is resolved server-side via the new
  ``ModelPlatformService.resolve_legacy_model_id`` (provider_type +
  provider_model_id → ``Model.id``). ``/api/v3/model-*`` is untouched
  and still ``model_id``-only.

* Blocker B — alembic multi-head with #74. Both ``b4f2d91c8e3a`` and
  ``d8e2c4a17b91`` (which landed on main while #1697 was open) had
  ``down_revision=7c4e9e1f8b21``. Rebased onto current main and
  re-chained ``b4f2d91c8e3a → d8e2c4a17b91`` so ``alembic heads``
  reports a single head (``84fac9e3d8c2``).

* Blocker C — pre-#1697 collection / bot configs in the DB hold the
  legacy triple in JSON. Pydantic silently dropped extras after the
  schema cut, so existing rows would parse with ``model_id=None`` and
  the runtime resolver would 404. Two halves:

  - ``ModelSpec`` now stashes the legacy triple onto private
    ``legacy_*`` slots at parse time and exposes ``has_legacy_triple()``.
    Sync code paths (``base_embedding`` / ``base_completion``) and the
    async agent-runtime path (``_resolve_request``) call the new
    sync/async resolvers to fill ``model_id`` lazily before the
    runtime lookup runs. ``model_dump`` does not leak the legacy fields
    — only the canonical ``{model_id}`` shape goes back out the wire.

  - The ``b4f2d91c8e3a`` migration captures any user-supplied
    ``model_service_provider`` API keys + ``llm_provider_models`` rows
    and replays them as ``ModelAccount`` (``user_id="public"``) +
    ``Model`` rows in the new schema *before* dropping the legacy
    tables. Best-effort, not strict — rows that don't fit any new
    provider type are skipped (see migration docstring for the
    mapping). Idempotent w.r.t. repeated ``alembic upgrade`` (the
    legacy-table drops are now ``inspect``-guarded after Phase-7
    teardown rebuilt the baseline migrations from scratch).
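
The precedence rule shared by Blockers A and C (prefer `model_id`,
otherwise resolve the legacy triple, and only ever emit the canonical
shape) might be sketched like this. `normalize_model_ref` is a
hypothetical helper, and the mapping of `model_service_provider` to
provider type and `model` to provider model id is an assumption:

```python
from typing import Any, Callable, Optional


def normalize_model_ref(
    payload: dict[str, Any],
    resolve_legacy_model_id: Callable[[str, str], Optional[str]],
) -> dict[str, str]:
    """Return the canonical {model_id} shape for either request form."""
    model_id = payload.get("model_id")
    if not model_id:
        provider = payload.get("model_service_provider")
        model = payload.get("model")
        if not (provider and model):
            raise ValueError("need model_id or the legacy triple")
        # Assumed mapping: provider -> provider_type, model -> provider_model_id.
        model_id = resolve_legacy_model_id(provider, model)
        if model_id is None:
            raise LookupError(f"no Model row for {provider}/{model}")
    # Only the canonical key goes back out; legacy fields are not echoed.
    return {"model_id": model_id}
```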

Regression coverage in ``tests/unit_test/test_model_platform_v1_compat.py``:
new + legacy parses for both ``EmbeddingRequest`` / ``RerankRequest``
and ``ModelSpec``; ``model_id`` precedence over the legacy triple;
private legacy fields stay out of ``model_dump``. The existing v3
contract test (``test_model_platform_v3_contract.py``) still asserts
the new ``/api/v3/...`` schemas never expose the legacy field names.

Local gates: ``make db-check`` (single head, no autogen drift),
``make lint``, ``pytest tests/unit_test/`` (722 passed, 29 skipped).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(e2e-http): migrate hurl 12/14/15/20 + bash scripts to v3 ModelSpec shape

Closes Blocker D (Option A-extended, PM lock-in msg=e551e144 +
msg=06e8f718). PR #1697 collapses the legacy
``{model, model_service_provider, custom_llm_provider}`` triple and
keeps only ``model_id`` in ``ModelSpec``. Pydantic silently drops
extras, so any hurl / bash file still sending the old triple parses
as ``model_id=None`` and the downstream code path breaks silently
(collection vector index, bot completion, graph index).

Following the template established in ``10_provider_llm.hurl`` (and
the previous migration of 11/13/17 in ad69af9), each remaining file
that exercises a real provider path now seeds its own ``ModelAccount``
+ ``Model`` via the v3 routes and references the captured
``model_id`` in the ``embedding`` / ``completion`` blobs.

Files migrated:
* tests/e2e_http/hurl/full/12_bot.hurl
* tests/e2e_http/hurl/full/14_graph_http.hurl
* tests/e2e_http/hurl/full/15_agent_runtime_v3.hurl
* tests/e2e_http/hurl/full/20_knowledge_graph_http.hurl
* tests/e2e_http/scripts/run_chat_collection_flow.sh
* tests/e2e_http/scripts/run_graph_index_flow.sh

The bash scripts now require ``E2E_ALIBABACLOUD_API_KEY`` +
``E2E_OPENROUTER_API_KEY`` so they can seed the v3 model rows up
front, mirroring the hurl variable convention. No semantic changes
beyond the shape rewrite.

Final grep across ``aperag/ tests/ web/src/ docs/zh-CN/`` confirms
no live caller / hurl / bash / FE config payload still sends the
legacy triple — the only remaining matches are:

* the new Blocker A compat parser in ``aperag/domains/model_platform``
  + ``aperag/schema`` (intended — it accepts both shapes),
* litellm SDK ``custom_llm_provider`` keyword args inside the LLM
  invocation runners (a different namespace — that field is the
  Python kwarg name on ``litellm.completion``),
* the ghost-guard / regression-guard tests in
  ``tests/unit_test/test_model_platform_v3_contract.py`` and
  ``tests/unit_test/tasks/test_collection_init_skip.py``,
* the legacy migration files (which by definition had to ``CREATE
  TABLE`` the old schema before the refactor migration ``DROP``s it),
* and the FE i18n bundles + audit-log docs (string labels, not JSON
  contract field names — out of scope).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(model-platform): personal model_account beats newer public row

query_model_account_api_key documents fallback semantics as "fall back to
public when the user has no personal account", but ORDER BY only sorted
by gmt_updated DESC — a freshly-edited shared "public" row silently
shadowed the caller's own credential when fallback_to_public=True. Both
production callers (fetch_url_documents, _get_user_jina_api_key) hit
this path, so it was a real correctness bug.

Add an ownership-first ORDER BY (CASE WHEN user_id = $caller THEN 0
ELSE 1) before the timestamp ordering so a user-owned row always
wins over public, regardless of update timestamps. Public is still
considered when (and only when) the caller has no personal row.
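
The ordering fix can be reproduced in isolation with an in-memory
SQLite table. The three-column `model_account` schema and the
`query_api_key` helper below are simplified assumptions, not the real
table or repository method:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_account (user_id TEXT, api_key TEXT, gmt_updated INTEGER)"
)
conn.executemany(
    "INSERT INTO model_account VALUES (?, ?, ?)",
    [
        ("user-1", "user_key", 100),    # older personal row
        ("public", "public_key", 200),  # newer shared row
    ],
)


def query_api_key(
    conn: sqlite3.Connection, caller: str, fallback_to_public: bool = True
) -> Optional[str]:
    owners = [caller, "public"] if fallback_to_public else [caller]
    placeholders = ", ".join("?" for _ in owners)
    row = conn.execute(
        f"SELECT api_key FROM model_account WHERE user_id IN ({placeholders}) "
        # Ownership first, then recency: the caller's row always sorts ahead.
        "ORDER BY CASE WHEN user_id = ? THEN 0 ELSE 1 END, gmt_updated DESC "
        "LIMIT 1",
        (*owners, caller),
    ).fetchone()
    return row[0] if row else None
```

Dropping the `CASE` term from the `ORDER BY` reproduces the bug: the
newer public row would be returned for `user-1`.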

Regression tests in test_model_platform_v1_compat.py:
- test_user_personal_key_wins_over_newer_public_key: seed user row
  (1h ago) + public row (now), expect "user_key"; verified red on
  the un-fixed query
- test_public_key_returned_when_user_has_no_personal_account: sanity
  that the actual fallback path is unaffected

Weston blocker msg=fcefbaf7. No schema change (db-check clean).

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…itation/activity) (#1703)

* feat(phase8 #77 D8.4b): FE message-parts renderer (text/tool/source/citation/activity)

D8.4b first-cut. Replaces the legacy `AgentTurnCard` + `legacy-snapshot-shim`
projection with a renderer that consumes the new `useAgentTurnStream`
seam (D8.4a, merge `63a9d522`) directly. Each `AgentMessagePart` is
rendered by type; transient `data-activity` is surfaced through a
separate inline indicator and never persisted.

## What lands

* **NEW** `web/src/components/chat/agent-turn-renderer.tsx` — rebuilds
  the activity card from the `parts` stream. Keeps the L1 visual
  baseline (avatar + status badge + activity stream Collapsible +
  answer Card + debug Collapsible + references Sheet + feedback +
  copy) so non-technical users see the same affordance.
  * `<ToolActivityItem>` — one entry per `tool-${SafeToolName}` part;
    state-aware label / icon / debug-expand previews of input + output
    (or errorText on `output-error`).
  * `<ActivityIndicator>` — transient `data-activity` rendered inline
    above the activity stream entries; replaced on each new frame and
    never persisted.
  * `<ConsentPlaceholder>` / `<ElicitationPlaceholder>` — fallback
    rendering for `data-tool-consent` / `data-elicitation` parts when
    no interactive slot is provided. **#78 chenyexuan** plugs in
    concrete components via the new `ConsentSlot` / `ElicitationSlot`
    props on `AgentTurnRendererProps`.
  * References sheet now sources from `source-url` / `source-document`
    parts + `data-citation` content, replacing the old
    `reference_bundle` artifact path.

* `chat-messages.tsx` — `AgentTurnStreamCard` now feeds the hook
  output directly into `AgentTurnRenderer`; the
  `projectToLegacySnapshot` projection layer is gone.

* **DELETE** `web/src/components/chat/agent-turn-card.tsx`
  (1279 LOC) — replaced by the new renderer end-to-end.
* **DELETE** `web/src/features/agent-runtime/legacy-snapshot-shim.ts`
  — its only caller (`AgentTurnStreamCard`) no longer needs the
  projection. `getRunningToolName` / `projectToLegacySnapshot` /
  `LegacySnapshotShim` are dropped from the feature module
  re-exports.

## Slot props (the only seam crossing into #78 territory)

```ts
type ConsentSlotProps = {
  chatId: string;
  turnId: string;
  part: AgentToolConsentPart;
};
type ElicitationSlotProps = {
  chatId: string;
  turnId: string;
  part: AgentElicitationPart;
};

type AgentTurnRendererProps = {
  // ... part stream + status from useAgentTurnStream
  ConsentSlot?: React.ComponentType<ConsentSlotProps>;
  ElicitationSlot?: React.ComponentType<ElicitationSlotProps>;
};
```

#78 chenyexuan implements `consent-prompt.tsx` + `elicitation-form.tsx`
that conform to these prop signatures; both call
`decideToolConsent` / `submitElicitation` from the agent-runtime API
client landed in D8.4a. Optional by design — the placeholder
fallback keeps the parts visible even if a slot is not yet wired.

## i18n

Adds to `page_chat.json` (zh-CN + en-US):
* `activity_stream.tool.title` + `activity_stream.tool.state.{input-streaming|input-available|output-available|output-error}`
* `activity_stream.transient.{thinking|searching_knowledge|reading_source|comparing_results|writing_answer|waiting|completed|error}`
* `activity_stream.consent.placeholder_{title,state}`
* `activity_stream.elicitation.placeholder_state`
* `activity_stream.{completed_empty,pending_empty}`
* `answer_section.completed_empty`

## Verification

* `yarn lint` clean.
* `tsc --noEmit` clean for the touched files (the four pre-existing
  errors in `chat-input.tsx` are unrelated and untouched here).
* `yarn dev` boots in 2.8s on port 3012; `GET /`, `/auth/signin`,
  `/workspace/collections`, `/workspace` all return 200.

## Notes

* The EOF-before-terminal regression test follow-up that Weston
  flagged on D8.4a (msg=b7ae3bfd) is not bundled here — there is no
  FE test infra in the repo today, and adding `vitest` is its own
  scope. The behavior is documented at the relevant code paths in
  `stream-client.ts` + `reducer.ts`; recommend adding a dedicated
  test-infra PR after `#77/#78` land.
* No hook contract changes; `useAgentTurnStream` and the
  `AgentMessagePart` typed union are exactly as merged in
  `63a9d522`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #77 D8.4b): synthesize parts from snapshot for terminal historical reload

Addresses dongdong msg=97336fb9 — terminal historical AI turns reloaded
through `seedFromSnapshot()` were rendering as empty `idle` cards
because `useAgentTurnStream({ streamUrl: null })` keeps `parts: []`
and `status: 'idle'`, and the new renderer no longer reads
`baselineSnapshot.timeline / .artifacts` directly.

Fix scope: read-only synthesis of `AgentMessagePart[]` from the legacy
snapshot's artifacts (answer text → one `text` part; reference bundle
items → `source-url` + `data-citation` parts) when the hook is
dormant for a terminal turn. Backend status is mapped back to the
stream-side enum so the renderer's status branching stays consistent.
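
The projection described above might look roughly like the following. It is a sketch under stated assumptions: the artifact and part shapes are simplified, the field names (`sourceId`, `quote`, `uri`) are hypothetical, and skipping the `source-url` part when an item has no URI is a choice made for this example rather than confirmed behaviour of `synthesizePartsFromSnapshot()`.

```ts
// Hypothetical, simplified shapes; the real snapshot and part types carry more fields.
type ReferenceItem = { id: string; title: string; uri?: string; text: string };
type LegacyArtifact =
  | { type: "answer"; text: string }
  | { type: "reference_bundle"; items: ReferenceItem[] };

type AgentMessagePart =
  | { type: "text"; text: string }
  | { type: "source-url"; sourceId: string; url: string; title: string }
  | { type: "data-citation"; data: { sourceId: string; quote: string } };

// Read-only projection: answer text becomes one text part, and each reference
// item becomes a source-url part (when it has a URI) plus a data-citation part.
function synthesizePartsSketch(artifacts: LegacyArtifact[]): AgentMessagePart[] {
  const parts: AgentMessagePart[] = [];
  for (const artifact of artifacts) {
    if (artifact.type === "answer") {
      parts.push({ type: "text", text: artifact.text });
    } else {
      for (const item of artifact.items) {
        if (item.uri) {
          parts.push({ type: "source-url", sourceId: item.id, url: item.uri, title: item.title });
        }
        parts.push({ type: "data-citation", data: { sourceId: item.id, quote: item.text } });
      }
    }
  }
  return parts;
}
```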

Files:
* **NEW** `web/src/features/agent-runtime/snapshot-fallback.ts` —
  `synthesizePartsFromSnapshot()` + `mapBackendTurnStatus()` +
  `isTerminalBackendStatus()` helpers. Read-only, never feeds the
  live reducer; deletes wholesale once the BE snapshot endpoint
  returns UIMessages.
* `chat-messages.tsx` — `AgentTurnStreamCard` falls back to
  synthesized parts + mapped status when `streamUrl == null` and the
  live stream has not produced anything. Live turns are unaffected.
* `features/agent-runtime/index.ts` — re-exports the fallback helpers.

Tool call timeline is intentionally NOT replayed for historical turns.
This matches the legacy `agent-turn-card` behaviour, which also did not
show the tool-call activity stream once the answer artifact had landed.

Verified: `yarn lint` clean; `tsc --noEmit` clean for touched files;
`yarn dev` boots in 2.6s on port 3013; `GET /`, `/auth/signin`,
`/workspace/collections`, `/workspace` all return 200.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refine(phase8 #77 D8.4b): pin TODO(#90) on snapshot-fallback + error_summary handling

Per architect msg=711f8c2f review of the prior `2effca4a` fix:

* File header now explicitly references task **#90 (D8.4d)** as the
  removal trigger — `Bryce` claimed #90 (msg=00230183) to migrate
  the snapshot endpoint to canonical UIMessage parts, after which
  this whole module deletes wholesale.
* Adds `extractErrorTextFromSnapshot()` covering the
  `error_summary` artifact, mapping its payload (`message` /
  `text` / `summary` / artifact-level summary) back into the
  renderer's `errorText` channel. The wire/at-rest contract treats
  `error` as a lifecycle marker (status + errorText), not a part,
  so this stays out of `AgentMessagePart[]`.
* `chat-messages.tsx` `AgentTurnStreamCard` chains
  `extractErrorTextFromSnapshot` ahead of `envelope.error_message`
  in the fallback path so historical FAILED turns surface the
  richer artifact text when present.
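
A rough sketch of the error-text fallback chain described above. The artifact shape is simplified, and the precedence order (`message`, then `text`, then `summary`, then the artifact-level summary) is assumed from the order in which the fields are listed; `extractErrorTextSketch` and `resolveErrorText` are illustrative names.

```ts
// Simplified artifact shape (an assumption for this sketch).
type SnapshotArtifact = {
  type: string;
  summary?: string; // artifact-level summary
  payload?: { message?: string; text?: string; summary?: string };
};

// Return the first candidate that is a non-blank string.
function firstNonEmpty(...candidates: Array<string | undefined>): string | undefined {
  return candidates.find((c) => typeof c === "string" && c.trim().length > 0);
}

// Pull error text out of the `error_summary` artifact, if there is one.
function extractErrorTextSketch(artifacts: SnapshotArtifact[]): string | undefined {
  const artifact = artifacts.find((a) => a.type === "error_summary");
  if (!artifact) return undefined;
  return firstNonEmpty(
    artifact.payload?.message,
    artifact.payload?.text,
    artifact.payload?.summary,
    artifact.summary,
  );
}

// Artifact text wins; the envelope's error message is the fallback.
function resolveErrorText(
  artifacts: SnapshotArtifact[],
  envelopeErrorMessage?: string,
): string | undefined {
  return extractErrorTextSketch(artifacts) ?? envelopeErrorMessage;
}
```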

Verified: `yarn lint` clean; `tsc --noEmit` clean for the touched
files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #77 D8.4b): update Phase 1b batch 6 contract test for renderer rename

CI lint-and-unit failed because `tests/unit_test/test_web_typed_api_contract.py`
hardcoded a path to `web/src/components/chat/agent-turn-card.tsx`,
which #77 deleted in favor of the new `agent-turn-renderer.tsx`.
Swap the path; the same `@/api` / legacy-SDK / FeedbackTagEnum
ban-list applies to the new renderer (which only reaches
`@/features/agent-runtime` + `@/features/bot/types`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* docs: add future observability design

Co-authored-by: earayu <earayu@163.com>

* feat: add OTLP-first observability foundation

Co-authored-by: earayu <earayu@163.com>

* fix: tolerate unset legacy otel flag

Co-authored-by: earayu <earayu@163.com>

* fix: satisfy observability lint checks

Co-authored-by: earayu <earayu@163.com>

* fix: avoid duplicate FastAPI instrumentation

Co-authored-by: earayu <earayu@163.com>

* fix: keep application logs capturable

Co-authored-by: earayu <earayu@163.com>

* chore: remove jaeger observability path

Co-authored-by: earayu <earayu@163.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* feat(phase8 #78 D8.4c): FE interactive consent + elicitation UI

Body components for the `<ConsentSlot>` / `<ElicitationSlot>`
placeholders that #77 (huangheng, PR #1703 head `b532abcd`) reserved
on the parts renderer. Consumes the SDK-compatible slot props
(`{ chatId, turnId, part }`) and wires the user's decision back via
`decideToolConsent()` / `submitElicitation()` from the
AI SDK-compatible client API landed by #76.

## Write set (3 files)

- NEW `web/src/components/chat/consent-prompt.tsx` -- renders one
  `AgentToolConsentPart`. Surfaces only `toolName + argsPreview +
  risk badge` (raw args never reach the FE per #75 backend
  redaction), short fingerprint of `argsHash`, plus Approve / Deny
  buttons. State machine is server-driven: clicking the button calls
  `decideToolConsent(...)` and we wait for the next streamed
  `data-tool-consent` part to flip the visible state -- no local
  optimism. Resolved (approved/denied/expired) parts render a
  compact status row.
- NEW `web/src/components/chat/elicitation-form.tsx` -- renders one
  `AgentElicitationPart`. Generates form fields from the JSON-Schema
  fragment (`string` / `number` / `integer` / `boolean` / `enum`,
  `format: textarea` for multi-line, `default` for initial state,
  `required` for FE-side gating). On submit calls
  `submitElicitation(...)` with coerced payload; on validation error
  we leave the form populated for retry. Resolved (answered /
  cancelled) renders a compact status row.
- MOD `web/src/components/chat/chat-messages.tsx` -- imports both
  components and passes them to `AgentTurnRenderer` as `ConsentSlot`
  / `ElicitationSlot`. Renderer shell + transport hook contract
  untouched.
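
The schema-driven form handling for `elicitation-form.tsx` could be sketched as below. This is not the component's code: it assumes raw form values are strings, covers only the field types listed above, and uses hypothetical names (`initialValues`, `coerceField`, `coerceElicitation`). The backend remains the source of truth for validation.

```ts
type FieldSchema = {
  type: "string" | "number" | "integer" | "boolean";
  enum?: string[];
  default?: string | number | boolean;
};
type ElicitationSchema = { properties: Record<string, FieldSchema>; required?: string[] };
type FieldValue = string | number | boolean;
type CoerceResult =
  | { ok: true; payload: Record<string, FieldValue> }
  | { ok: false; missing: string[]; invalid: string[] };

// Seed the form state from each field's `default` (empty string when absent).
function initialValues(schema: ElicitationSchema): Record<string, string> {
  const values: Record<string, string> = {};
  for (const [name, field] of Object.entries(schema.properties)) {
    values[name] = field.default === undefined ? "" : String(field.default);
  }
  return values;
}

// Convert one raw string to the declared type; undefined means invalid.
function coerceField(field: FieldSchema, raw: string): FieldValue | undefined {
  switch (field.type) {
    case "boolean":
      return raw === "true" ? true : raw === "false" ? false : undefined;
    case "number":
    case "integer": {
      const n = Number(raw);
      if (!Number.isFinite(n)) return undefined;
      return field.type === "integer" && !Number.isInteger(n) ? undefined : n;
    }
    default:
      return field.enum && !field.enum.includes(raw) ? undefined : raw;
  }
}

// Gate on `required`, coerce the rest, and report problems without clearing input.
function coerceElicitation(
  schema: ElicitationSchema,
  raw: Record<string, string | undefined>,
): CoerceResult {
  const payload: Record<string, FieldValue> = {};
  const missing: string[] = [];
  const invalid: string[] = [];
  for (const [name, field] of Object.entries(schema.properties)) {
    const value = (raw[name] ?? "").trim();
    if (value === "") {
      if (schema.required?.includes(name)) missing.push(name);
      continue;
    }
    const coerced = coerceField(field, value);
    if (coerced === undefined) invalid.push(name);
    else payload[name] = coerced;
  }
  return missing.length || invalid.length
    ? { ok: false, missing, invalid }
    : { ok: true, payload };
}
```

Returning `missing` and `invalid` separately lets a caller keep the form populated and show one message per failure kind.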

## D9 §3.1 / §5.1 contract (renderer-side verification)

- consent UI shows only `toolName + argsPreview + risk` -- raw args
  never reach the FE wire (BE-side `args_preview()` redaction per
  #75 + `argsPreview` field on `ToolConsentData` per #74 wrapped
  shape).
- consent decisions go to chat-scoped path
  `/agent/chats/{chat_id}/turns/{turn_id}/consent/{tool_call_id}`
  -- HTTP-layer ownership pre-check + service-layer
  `ConsentOwnershipError` defense-in-depth still apply (per #75).
- elicitation form is schema-validated FE-side as a UX accelerator;
  the BE remains source of truth (`tools/elicitation.py`
  `_required_fields_validator` per #75).
- pending -> approved | denied | expired (consent) and pending ->
  answered | cancelled (elicitation) state transitions are picked
  up from the next streamed part; the visible UI is server-driven.
- Error handling: 403 (ownership) / 404 (not found) / 409 (already
  resolved) / 422 (validation) all surface via `toast.error(...)`;
  the form / prompt stays mounted so the user can retry.
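
The server-driven state handling in the contract above can be illustrated with a small sketch. It assumes each `data-tool-consent` part is keyed by a `toolCallId` field (a hypothetical name) and that a newly streamed part replaces the earlier part for the same tool call. The path template is the one quoted in the contract.

```ts
type ConsentState = "pending" | "approved" | "denied" | "expired";
// `toolCallId` as the part key is an assumption made for this sketch.
type ConsentPart = { type: "data-tool-consent"; toolCallId: string; state: ConsentState };

// Chat-scoped decision path, as quoted in the contract above.
function consentPath(chatId: string, turnId: string, toolCallId: string): string {
  const seg = encodeURIComponent;
  return `/agent/chats/${seg(chatId)}/turns/${seg(turnId)}/consent/${seg(toolCallId)}`;
}

// Upsert: a newly streamed part replaces the earlier part for the same tool call.
function applyConsentPart(parts: ConsentPart[], incoming: ConsentPart): ConsentPart[] {
  const exists = parts.some((p) => p.toolCallId === incoming.toolCallId);
  return exists
    ? parts.map((p) => (p.toolCallId === incoming.toolCallId ? incoming : p))
    : [...parts, incoming];
}

// The visible state is read from streamed parts only; a click never mutates it locally.
function visibleState(parts: ConsentPart[], toolCallId: string): ConsentState | undefined {
  return parts.find((p) => p.toolCallId === toolCallId)?.state;
}
```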

## Boundary discipline

- Does NOT change `transport / hook contract` (per PM lock
  msg=6e521597) -- consumes `useAgentTurnStream` shape unchanged.
- Does NOT change renderer shell (per PM lock msg=4adbf669) --
  only fills the slot bodies via `ConsentSlot` / `ElicitationSlot`
  props.
- Does NOT change schema main design (#74 final shape).

## Built on

- #73 D8.1 wire emitter (cuiwenbo, `51137301`)
- #74 D8.2 at-rest UIMessage storage (Bryce, `e290488b`)
- #75 D8.3 backend tool lifecycle + consent/elicit endpoints + 7-point
  contract enforcement (chenyexuan, `bd4052d5`)
- #76 D8.4a SDK-compatible stream transport + `useAgentTurnStream`
  hook + client API (huangheng, `63a9d522`)
- #77 D8.4b parts renderer + `<ConsentSlot>` / `<ElicitationSlot>`
  seam (huangheng, PR #1703 head `b532abcd`) -- this PR is chained
  on top of `#1703` per PM split-write-set lock msg=4adbf669.

## Gates

- `yarn tsc --noEmit` on changed files: clean (8 pre-existing
  errors on main are unrelated to this diff -- all in
  `chat-input.tsx` / `collection-form.tsx` / `collection-provider.tsx`
  / `app/page.tsx`).
- `yarn lint --quiet`: clean (no warnings/errors).

* fix(phase8 #78 D8.4c): wire consent + elicitation UI through i18n catalog

Address dongdong B1 blocker on PR #1704 (msg=e86a774b): the new
consent prompt + elicitation form rendered hardcoded English strings
(button labels, toast messages, risk badges, state labels), breaking
the zh-CN visual baseline that #77 already established for the
renderer placeholders.

## What changed

- `web/src/components/chat/consent-prompt.tsx` -- replace hardcoded
  English with `useTranslations('page_chat')`. Risk label looks up
  `activity_stream.consent.risk.{key}`; resolved-state label looks up
  `activity_stream.consent.state_label.{state}`; toast falls back to
  `activity_stream.consent.decision_failed` when the API rejection
  carries no message. Dynamic identifiers (`toolName`, `argsPreview`,
  `argsHash`) stay verbatim per dongdong's guidance.
- `web/src/components/chat/elicitation-form.tsx` -- same pattern:
  `submit` / `submitting` / `submit_failed` / `from_server` /
  `no_schema_fields` / `missing_required` / `invalid_value` /
  `select_placeholder` / `state_label` all routed through i18n.
  Schema field `title` / `description` and the prompt itself stay
  verbatim (BE-controlled identifiers).
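
The lookup pattern above can be sketched with a plain translate function standing in for the `useTranslations('page_chat')` result. The catalog strings here are placeholders, not the shipped copy, and `riskLabel` / `decisionErrorMessage` are hypothetical helper names.

```ts
type Translate = (key: string) => string;

// Placeholder catalog entries; the real strings live in page_chat.json.
const demoCatalog: Record<string, string> = {
  "activity_stream.consent.risk.calls_external_api": "Calls an external API",
  "activity_stream.consent.decision_failed": "Could not record the decision",
};

// Stand-in translator: returns the key itself when no entry exists.
const t: Translate = (key) => demoCatalog[key] ?? key;

// The risk badge label is resolved from the risk key.
function riskLabel(translate: Translate, risk: string): string {
  return translate(`activity_stream.consent.risk.${risk}`);
}

// Prefer the API rejection's own message; fall back to the catalog entry.
function decisionErrorMessage(translate: Translate, err: { message?: string }): string {
  const message = err.message?.trim();
  return message ? message : translate("activity_stream.consent.decision_failed");
}
```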

## Catalog updates (en-US + zh-CN, both split + merged forms)

Added under `activity_stream.consent` and `activity_stream.elicitation`:

- consent: `approve` / `deny` / `approving` / `denying` /
  `args_fingerprint` / `decision_failed` / `resolved_status` /
  `risk.{writes_user_data, calls_external_api, modifies_system,
  admin_only}` / `state_label.{approved, denied, expired}`
- elicitation: `submit` / `submitting` / `submit_failed` /
  `from_server` / `no_schema_fields` / `missing_required` /
  `invalid_value` / `select_placeholder` / `resolved_status` /
  `state_label.{answered, cancelled}`

The merged `web/src/i18n/{en-US, zh-CN}.json` catalogs and the per-page
`web/src/i18n/{en-US, zh-CN}/page_chat.json` files both got the same
additions so `yarn i18n:sync` regenerates `en-US.d.json.ts` typed
catalog with the new keys.

## Boundary unchanged

- Slot props (`ConsentSlotProps` / `ElicitationSlotProps`) untouched.
- No transport / hook / renderer-shell changes.
- BE contract surface (decideToolConsent / submitElicitation /
  ToolConsentData / ElicitationData) untouched.

## Gates

- `yarn i18n:sync` regenerated typed catalog.
- `yarn tsc --noEmit` on changed files: clean (0 errors in #78
  files; 8 pre-existing main errors unrelated).
- `yarn lint --quiet`: clean (no warnings/errors).

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… parts (#1705)

* feat(phase8 #90 D8.4d): snapshot endpoint returns canonical UIMessage parts

Per architect msg=711f8c2f canonical lock + PM scope (msg=383c2e2b /
msg=247f4d8e): the agent runtime turn snapshot endpoint
(`GET /api/v2/agent/chats/{cid}/turns/{tid}`) is migrated from the
legacy `{turn, timeline, artifacts}` envelope to the canonical
`UIMessage`-aligned `AgentTurnSnapshot` shape, so the FE renderer
(#76 / #77 / #78) consumes the same `UIMessagePart` discriminated
union from both the live SSE stream and the at-rest reload path
(D8 §2 wire / at-rest byte-equal).

Backend changes
- New `aperag/domains/agent_runtime/snapshot_assembler.py` projects
  legacy `AgentArtifact` rows into `UIMessagePart[]`:
  * `answer` → single `TextPart`
  * `reference_bundle` → N × `SourceUrlPart` + N × `DataCitationPart`
  * `error_summary` → not a part; surfaced via `error_text`
  * `tool_result_summary` / `search_result_summary` → skipped
  Mirrors the FE-side `snapshot-fallback.ts` adapter that #77
  huangheng landed as a transitional bridge so deletion was
  mechanical from this side.
- `AgentTurnSnapshot` (now defined in `uimessage.py` next to the
  rest of the UIMessage family; re-exported from `schemas.py` for
  back-compat) flips to `{schema_version, turn_id, chat_id, role,
  status, parts, error_text?, timeline_cursor, ...timestamps}`.
- `TurnService.get_turn_snapshot()` rewritten:
  1. Forward-compat: try `UIMessageStore.read(turn_id)` (D8.6
     populates `agent_message.parts` directly; today the store is
     optional and reads return None).
  2. Fallback: `assemble_parts_from_artifacts` projects legacy
     artifacts.
  3. `extract_error_text` pulls the `error_summary` artifact's
     payload message (or summary) for FAILED / CANCELLED turns,
     falling back to `runtime_state.error_message`.
- The 3 ownership-only callers (cancel / consent / elicit) of
  `get_turn_snapshot` are unaffected — they only use the call to
  trigger `ResourceNotFoundException`, never read the body.

Frontend changes (deletes the #77 transitional adapter)
- `web/src/features/agent-runtime/api.ts`:
  `AgentTurnSnapshotEnvelope` flips to the new flat shape with
  `parts: AgentMessagePart[]`. Old `{turn, timeline, artifacts}`
  fields are gone.
- `web/src/components/chat/chat-messages.tsx`:
  `seedFromSnapshot` synthesizes a minimal `AgentTurnEnvelope` from
  the new flat snapshot for the live-turn store; reload-path
  rendering reads `baselineSnapshot.parts` and
  `baselineSnapshot.error_text` directly without any client-side
  synthesis.
- `web/src/features/agent-runtime/snapshot-fallback.ts`:
  `synthesizePartsFromSnapshot` and `extractErrorTextFromSnapshot`
  are removed (their TODO(#90) trigger has fired).
  `mapBackendTurnStatus` and `isTerminalBackendStatus` are kept --
  status mapping is still useful even after the schema flip.
- `web/src/features/agent-runtime/index.ts`: the dropped exports
  are removed from the barrel.
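
As an illustration of the flat shape, the sketch below types the envelope from the field list given in the backend section and checks that the legacy keys are gone, in the spirit of the regression guard listed under Tests. Field types such as `schema_version: number | string` are assumptions, and `isCanonicalSnapshot` is a hypothetical helper.

```ts
type AgentMessagePart = { type: string } & Record<string, unknown>;

// Field names follow the commit text; the types are assumptions for this sketch.
type AgentTurnSnapshotEnvelope = {
  schema_version: number | string;
  turn_id: string;
  chat_id: string;
  role: string;
  status: string;
  parts: AgentMessagePart[];
  error_text?: string | null;
  timeline_cursor?: unknown;
};

const REQUIRED_KEYS = ["turn_id", "chat_id", "role", "status", "parts"] as const;
const LEGACY_KEYS = ["turn", "timeline", "artifacts"] as const;

// True when the required top-level keys exist and no legacy key is present.
function isCanonicalSnapshot(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const hasRequired = REQUIRED_KEYS.every((k) => k in record);
  const hasLegacy = LEGACY_KEYS.some((k) => k in record);
  return hasRequired && !hasLegacy && Array.isArray(record["parts"]);
}

// Example value in the new flat shape.
const exampleSnapshot: AgentTurnSnapshotEnvelope = {
  schema_version: 1,
  turn_id: "t1",
  chat_id: "c1",
  role: "assistant",
  status: "COMPLETED",
  parts: [{ type: "text", text: "hello" }],
};
```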

Tests
- `tests/unit_test/agent_runtime/test_snapshot_assembler.py` (NEW,
  8 tests) pins the artifact → UIMessagePart projection: answer
  text, reference-bundle fan-out (with and without uri), ordering,
  unknown-artifact skip, error-text extraction (payload preferred,
  summary fallback, none-without-error_summary).
- `tests/unit_test/agent_runtime/test_agent_runtime_v3.py` snapshot
  tests rewritten for the new shape:
  * `test_turn_snapshot_returns_canonical_uimessage_parts_for_completed_turn`
  * `test_turn_snapshot_surfaces_error_text_for_failed_turn`
  * `test_turn_snapshot_does_not_expose_legacy_keys` (regression
    guard: `{turn, timeline, artifacts}` must not reappear)
  * `test_turn_snapshot_user_activity_inference_runs_via_event_service`
    pins the empty-timeline guarantee on the new shape.
- `tests/e2e_http/hurl/full/15_agent_runtime_v3.hurl` snapshot
  assertions migrated to the new shape (`turn_id`, `chat_id`,
  `role`, `status`, `parts` at top level; legacy keys gone).
- The OpenAPI contract test at
  `tests/unit_test/agent_runtime/test_agent_runtime_openapi_contract.py`
  continues to pass — the schema name is unchanged
  (`AgentTurnSnapshot`); only the fields differ.

Gates
- `pytest tests/unit_test/agent_runtime/ -q` → 134 passed
- `pytest tests/unit_test --deselect concurrent_control flake -q`
  → 831 passed / 29 skipped / 0 failed
- `ruff check` + `ruff format --check` → clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #90 D8.4d): retire snapshot.artifacts in eval worker + OpenAI completion

The CI e2e-http-provider failure on PR #1705 head c16d131 surfaced
two production callers of ``TurnService.get_turn_snapshot()`` that
were still accessing the legacy ``snapshot.artifacts`` /
``snapshot.turn.answer_artifact_id`` shape. Both extract artifact
data for purposes independent of the FE-facing UIMessage protocol
(eval answer-text capture and OpenAI-compat completion content), so
they switch to ``db_ops.query_agent_artifacts_by_turn(turn_id)``
directly rather than reconstructing artifact data from the new
``UIMessagePart[]``.

- ``aperag/domains/evaluation/worker.py``:
  ``_extract_answer_text`` now takes a raw artifact list; the call
  site fetches artifacts via ``db_ops.query_agent_artifacts_by_turn``
  before invoking it.
- ``aperag/domains/conversation/service/chat_completion_service.py``:
  ``_build_completion_content`` rewritten to take the same raw
  artifact list; the OpenAI-compat completion path keeps its
  artifact-shaped logic (independent of FE protocol).
- ``tests/unit_test/chat/test_chat_completion_service.py``:
  ``_FakeTurnService`` swaps the old ``snapshot=`` parameter for
  ``artifacts=`` and exposes ``query_agent_artifacts_by_turn`` on
  its ``db_ops`` mock; the helper ``_snapshot()`` becomes
  ``_artifacts()`` returning the raw list.

Gates: full unit suite 831 passed / 29 skipped / 0 failed; ruff
check + format clean. CI e2e-http-provider should now pass since
the only ``AttributeError: 'AgentTurnSnapshot' object has no
attribute 'artifacts'`` raise was from these two paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #90 D8.4d): migrate 17_chat_collection_flow.hurl snapshot assertions

CI surfaced a second hurl file my inventory missed:
``tests/e2e_http/hurl/full/17_chat_collection_flow.hurl`` had the
same snapshot-endpoint shape assertions
(``$.turn.*`` / ``$.timeline`` / ``$.artifacts``) the previous commit
swept out of ``15_agent_runtime_v3.hurl``. Migrating to the new flat
shape in the same way: top-level ``turn_id`` / ``chat_id`` / ``role``
/ ``status`` / ``parts``, with legacy ``timeline`` / ``artifacts``
gone. The POST create-turn assertions (``$.turn.*`` on the
``CreateTurnResponse`` envelope, lines 159-166) are unchanged; that
endpoint is unaffected by D8.4d.

Same root cause as the previous fix commit: my Explore agent
inventory listed only ``15_agent_runtime_v3.hurl`` as hitting the
snapshot endpoint; a broader grep would have caught this one too.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #90 D8.4d): migrate run_chat_collection_flow.sh poll to canonical parts

CI rerun on 2f20d0d surfaced a bash script my inventory + grep both
missed: ``tests/e2e_http/scripts/run_chat_collection_flow.sh``. It
polls the snapshot endpoint to verify a completed turn:

* ``.turn.status`` → ``.status`` (top-level on new shape)
* ``.turn.answer_artifact_id`` / ``.turn.reference_bundle_artifact_id``
  → derive from ``.parts`` directly (TextPart text + DataCitationPart
  count) instead of fetching the legacy artifact endpoint twice.
* ``Timed out waiting for turn completion artifacts`` → ``parts``

The post-completion assertions now read off the snapshot parts
themselves: answer text non-empty (concatenation of all TextPart
``text`` fields) and at least one ``data-citation`` part. The
legacy ``/api/v2/agent/artifacts/{id}`` round-trip is removed —
post-#90 the FE-facing canonical does not expose artifact IDs from
the snapshot, and the script's intent (verify completion + non-empty
answer + references) is preserved with strictly fewer round-trips.

The POST create-turn response (line 338) keeps using ``.turn.turn_id``
because ``CreateTurnResponse`` is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase8 #90 D8.4d): make reference_count optional in chat-collection-flow script

CI rerun on c42ae44 reached the snapshot polling step, the turn
COMPLETED with a non-empty answer (a clarification reply: "Which
collection would you like me to search in?"), and my too-strict
assertion ``reference_count > 0`` failed.

Pre-#90 script semantics: answer artifact required, reference bundle
artifact optional (the runtime only emits a reference_bundle when the
agent's reply actually cites sources). My first-cut migration kept
the answer-required side but tightened references from optional to
required, breaking the no-citation reply case.

This commit reverts the assertion budget to match the pre-#90 contract
exactly: answer non-empty is required; reference count is logged for
visibility but does not fail the script.
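
The relaxed assertion budget can be expressed as a small sketch, shown in TypeScript for consistency with the other snippets rather than in the script's own shell. The `"COMPLETED"` status string and the part shapes are assumptions, and `checkCompletedTurn` is a hypothetical name.

```ts
type SnapshotPart = { type: string; text?: string };
type PolledSnapshot = { status: string; parts: SnapshotPart[] };

// Answer text is the concatenation of every text part's `text` field.
function answerText(parts: SnapshotPart[]): string {
  return parts
    .filter((p) => p.type === "text")
    .map((p) => p.text ?? "")
    .join("");
}

// Number of data-citation parts; reported for visibility only.
function citationCount(parts: SnapshotPart[]): number {
  return parts.filter((p) => p.type === "data-citation").length;
}

// A non-empty answer is required; references are counted but never fail the check.
function checkCompletedTurn(snapshot: PolledSnapshot): { ok: boolean; referenceCount: number } {
  const referenceCount = citationCount(snapshot.parts);
  const ok = snapshot.status === "COMPLETED" && answerText(snapshot.parts).trim().length > 0;
  return { ok, referenceCount };
}
```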

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…kind discriminator (#1706)

Phase 8 task #92 (D8.5-BE) — first-cut backend migration of the
non-agent chat path to the canonical ``UIMessage`` shape, scoped per
architect msg=01918929 + Weston msg=df87fe24 + earayu2 msg=f20d5034
hard-cut acceptance:

The inventory revealed the production "non-agent chat path" the
original D8.5 design assumed has already converged on the agent
runtime (``chat_completion_service.openai_chat_completions`` already
delegates to ``runtime_manager.turn_service.create_or_get_turn`` and
``ChatService.create_chat`` rejects non-AGENT bots). So the actual
#92 work is A+B+C only — adding the discriminator column for future
non-agent paths and migrating the user-visible chat history shape to
canonical UIMessage. The translator extension (``chat.text.delta`` /
``chat.completed``) and the ``StoredChatMessagePart`` /
``RedisChatMessageHistory`` deletion are deferred per architect /
Weston canonical lock.

Changes:

A. ``runtime_kind`` discriminator on ``agent_message`` table
- ``aperag/domains/agent_runtime/db/models.py``: new
  ``runtime_kind: str`` ORM column with values
  ``agent_runtime`` / ``direct_chat`` / ``rag_chat`` (mutually
  exclusive enum); existing rows backfill via
  ``server_default="agent_runtime"``. ``role`` keeps speaker
  semantics independent of the runtime that produced the message.
- ``aperag/migration/versions/...c8f2d34a51e7_add_agent_message_runtime_kind.py``:
  additive migration; downgrade drops the column.

B. ``ChatService._build_v3_chat_history`` rewrite
- Returns ``list[AgentTurnSnapshot]`` (one snapshot per assistant
  turn) instead of the legacy ``list[list[ChatMessage]]`` shape.
- Reuses ``snapshot_assembler.assemble_parts_from_artifacts`` (the
  #90 D8.4d projection) so historical turns expose the same
  ``UIMessagePart`` shape the FE consumes from the live SSE stream
  (D8 §2 wire/at-rest byte-equal).
- ``error_text`` for FAILED / CANCELLED turns surfaces an
  ``error_summary`` artifact's message, falling back to
  ``turn.error_message`` — mirrors the snapshot endpoint contract.
- The turn's user query lives at ``input_text`` on the snapshot
  envelope (rather than as a separate ``role=human`` ChatMessage)
  so the FE renders user/assistant from a single object per turn.
- Legacy ``_extract_artifact_text`` / ``_extract_references`` /
  ``_map_reference_item`` / ``_artifact_type_value`` /
  ``_coerce_timestamp`` helpers are retired alongside the legacy
  shape.

C. ``ChatDetails.history`` schema
- ``aperag/domains/conversation/schemas.py``: ``history`` is now
  ``Optional[list[AgentTurnSnapshot]]`` with explicit description
  citing D8 §2 byte-equal canonical and the new shape.
- The ``conversation.schemas`` ↔ ``agent_runtime.uimessage``
  ↔ ``agent_runtime.schemas`` ↔ ``conversation.schemas`` cycle is
  broken via ``TYPE_CHECKING`` import + a module-level
  ``ChatDetails.model_rebuild()`` hook at the bottom of
  ``conversation/schemas.py``. Pydantic resolves the forward ref at
  load time so the OpenAPI schema is fully populated.
- ``aperag/domains/agent_runtime/uimessage.py``: ``AgentTurnSnapshot``
  gains ``runtime_kind: RuntimeKind`` (default ``"agent_runtime"``)
  and ``input_text: Optional[str]`` so historical turns can render
  the user query without a separate envelope round-trip.
- ``TurnService.get_turn_snapshot`` writes both new fields on the
  live snapshot endpoint so live and historical reload paths match.
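
On the consuming side, one snapshot per turn is enough to render both bubbles. The following is a sketch only: the `Bubble` type and `bubblesFromHistory` are hypothetical, the part shape is simplified, and treating a missing `runtime_kind` as `"agent_runtime"` mirrors the default described above.

```ts
type RuntimeKind = "agent_runtime" | "direct_chat" | "rag_chat";
type HistoryPart = { type: string; text?: string };

// Simplified view of a history entry (field names from the commit text).
type HistorySnapshot = {
  turn_id: string;
  runtime_kind?: RuntimeKind;
  input_text?: string | null;
  parts: HistoryPart[];
};

type Bubble = {
  role: "user" | "assistant";
  turnId: string;
  runtimeKind: RuntimeKind;
  text: string;
};

// Expand each snapshot into a user bubble (from input_text) and an assistant bubble.
function bubblesFromHistory(history: HistorySnapshot[]): Bubble[] {
  const bubbles: Bubble[] = [];
  for (const snap of history) {
    const runtimeKind = snap.runtime_kind ?? "agent_runtime";
    if (snap.input_text) {
      bubbles.push({ role: "user", turnId: snap.turn_id, runtimeKind, text: snap.input_text });
    }
    const text = snap.parts
      .filter((p) => p.type === "text")
      .map((p) => p.text ?? "")
      .join("");
    bubbles.push({ role: "assistant", turnId: snap.turn_id, runtimeKind, text });
  }
  return bubbles;
}
```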

D. (deferred) Translator extension for ``chat.text.delta`` /
``chat.completed`` and ``StoredChatMessagePart`` /
``RedisChatMessageHistory`` deletion stay out of #92 per
Weston msg=df87fe24 / PM msg=01918929. The non-agent live path the
extension would have served does not exist in the current
codebase; reintroducing it is a feature task, not a refactor.

Tests:
- ``tests/unit_test/chat/test_chat_service.py`` rewritten:
  * ``test_get_chat_returns_canonical_uimessage_history`` pins the
    new shape (snapshot per turn with text + source-url +
    data-citation parts, runtime_kind, input_text)
  * ``test_get_chat_history_surfaces_error_text_for_failed_turn``
    pins the error_text contract for FAILED turns
  * ``test_get_chat_history_does_not_expose_legacy_chatmessage_shape``
    regression-guard against revert to ``list[list[ChatMessage]]``
- ``tests/unit_test/agent_runtime/test_agent_runtime_v3.py`` updated
  to import ``AgentTurnSnapshot`` from ``agent_runtime.uimessage``
  (the back-compat re-export through ``agent_runtime.schemas`` was
  retired to break the new cycle).

The D10 §G hard gate 1 sweep (comprehensive grep) ran across
``aperag/`` + ``tests/unit_test/`` + ``tests/e2e_http/hurl/`` +
``tests/e2e_http/scripts/``: only the FE
``web/src/components/chat/chat-messages.tsx`` reads ``chat.history``
in the old shape — that is the explicit hand-off seam for #93
huangheng (per architect msg=6e53a7c4).

Gates: full unit suite 833 / 29 skip / 0 fail; ruff check + format
clean.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… + render user bubble from input_text (#1707)

* feat(phase8 #92 D8.5-BE): canonical UIMessage chat history + runtime_kind discriminator

Phase 8 task #92 (D8.5-BE) — first-cut backend migration of the
non-agent chat path to the canonical ``UIMessage`` shape, scoped per
architect msg=01918929 + Weston msg=df87fe24 + earayu2 msg=f20d5034
hard-cut acceptance:

The inventory revealed the production "non-agent chat path" the
original D8.5 design assumed has already converged on the agent
runtime (``chat_completion_service.openai_chat_completions`` already
delegates to ``runtime_manager.turn_service.create_or_get_turn`` and
``ChatService.create_chat`` rejects non-AGENT bots). So the actual
#92 work is A+B+C only — adding the discriminator column for future
non-agent paths and migrating the user-visible chat history shape to
canonical UIMessage. The translator extension (``chat.text.delta`` /
``chat.completed``) and the ``StoredChatMessagePart`` /
``RedisChatMessageHistory`` deletion are deferred per architect /
Weston canonical lock.

Changes:

A. ``runtime_kind`` discriminator on ``agent_message`` table
- ``aperag/domains/agent_runtime/db/models.py``: new
  ``runtime_kind: str`` ORM column with values
  ``agent_runtime`` / ``direct_chat`` / ``rag_chat`` (mutually
  exclusive enum); existing rows backfill via
  ``server_default="agent_runtime"``. ``role`` keeps speaker
  semantics independent of the runtime that produced the message.
- ``aperag/migration/versions/...c8f2d34a51e7_add_agent_message_runtime_kind.py``:
  additive migration; downgrade drops the column.

B. ``ChatService._build_v3_chat_history`` rewrite
- Returns ``list[AgentTurnSnapshot]`` (one snapshot per assistant
  turn) instead of the legacy ``list[list[ChatMessage]]`` shape.
- Reuses ``snapshot_assembler.assemble_parts_from_artifacts`` (the
  #90 D8.4d projection) so historical turns expose the same
  ``UIMessagePart`` shape the FE consumes from the live SSE stream
  (D8 §2 wire/at-rest byte-equal).
- ``error_text`` for FAILED / CANCELLED turns surfaces an
  ``error_summary`` artifact's message, falling back to
  ``turn.error_message`` — mirrors the snapshot endpoint contract.
- The turn's user query lives at ``input_text`` on the snapshot
  envelope (rather than as a separate ``role=human`` ChatMessage)
  so the FE renders user/assistant from a single object per turn.
- Legacy ``_extract_artifact_text`` / ``_extract_references`` /
  ``_map_reference_item`` / ``_artifact_type_value`` /
  ``_coerce_timestamp`` helpers are retired alongside the legacy
  shape.

C. ``ChatDetails.history`` schema
- ``aperag/domains/conversation/schemas.py``: ``history`` is now
  ``Optional[list[AgentTurnSnapshot]]`` with explicit description
  citing D8 §2 byte-equal canonical and the new shape.
- The ``conversation.schemas`` ↔ ``agent_runtime.uimessage``
  ↔ ``agent_runtime.schemas`` ↔ ``conversation.schemas`` cycle is
  broken via ``TYPE_CHECKING`` import + a module-level
  ``ChatDetails.model_rebuild()`` hook at the bottom of
  ``conversation/schemas.py``. Pydantic resolves the forward ref at
  load time so the OpenAPI schema is fully populated.
- ``aperag/domains/agent_runtime/uimessage.py``: ``AgentTurnSnapshot``
  gains ``runtime_kind: RuntimeKind`` (default ``"agent_runtime"``)
  and ``input_text: Optional[str]`` so historical turns can render
  the user query without a separate envelope round-trip.
- ``TurnService.get_turn_snapshot`` writes both new fields on the
  live snapshot endpoint so live and historical reload paths match.

D. (deferred) Translator extension for ``chat.text.delta`` /
``chat.completed`` and ``StoredChatMessagePart`` /
``RedisChatMessageHistory`` deletion stay out of #92 per
Weston msg=df87fe24 / PM msg=01918929. The non-agent live path the
extension would have served does not exist in the current
codebase; reintroducing it is a feature task, not a refactor.

Tests:
- ``tests/unit_test/chat/test_chat_service.py`` rewritten:
  * ``test_get_chat_returns_canonical_uimessage_history`` pins the
    new shape (snapshot per turn with text + source-url +
    data-citation parts, runtime_kind, input_text)
  * ``test_get_chat_history_surfaces_error_text_for_failed_turn``
    pins the error_text contract for FAILED turns
  * ``test_get_chat_history_does_not_expose_legacy_chatmessage_shape``
    regression-guard against revert to ``list[list[ChatMessage]]``
- ``tests/unit_test/agent_runtime/test_agent_runtime_v3.py`` updated
  to import ``AgentTurnSnapshot`` from ``agent_runtime.uimessage``
  (the back-compat re-export through ``agent_runtime.schemas`` was
  retired to break the new cycle).

The D10 §G hard gate 1 sweep (comprehensive grep) ran across
``aperag/`` + ``tests/unit_test/`` + ``tests/e2e_http/hurl/`` +
``tests/e2e_http/scripts/``: only the FE
``web/src/components/chat/chat-messages.tsx`` reads ``chat.history``
in the old shape — that is the explicit hand-off seam for #93
huangheng (per architect msg=6e53a7c4).

Gates: full unit suite 833 / 29 skip / 0 fail; ruff check + format
clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(phase8 #93 D8.5-FE): consume canonical AgentTurnSnapshot history + render user bubble from input_text

D8.5-FE first-cut, chained on @bryce #92 D8.5-BE
(`bryce/phase8-task92-d85-be-non-agent-uimessage`).

Per architect msg=a92ca060 + PM lock msg=38e116e5 + Bryce handoff
msg=27e8ec6d:
* `ChatDetails.history` is now `AgentTurnSnapshot[]` (canonical
  UIMessage at-rest, byte-equal with the live SSE wire).
* `AgentTurnSnapshot` carries `runtime_kind` (forward-compat
  discriminator, FE does not branch on it per PM lock) +
  `input_text` (user-side bubble content).
* No new non-agent SSE / live path — production code already
  routes everything through `agent_runtime`. The deferred
  `chat.text.delta` / `chat.completed` envelope expansion stays
  out per Bryce msg=27e8ec6d D defer and PM acceptance.

## What changed (FE)

* `web/src/components/chat/chat-messages.tsx` — full rewrite of the
  per-turn render orchestration:
  * State replaces `messages: ChatMessage[][]` with `liveTurns` map +
    `turnOrder: string[]` + `pendingUserMessages: { key, query,
    timestamp }[]`.
  * `chat.history` (canonical `AgentTurnSnapshot[]`) directly seeds
    `liveTurns` at mount via `seedFromHistory()` — no per-turn
    `getAgentTurnSnapshot()` round trip on first render.
  * `seedFromSnapshot()` and `ensureTurnGroups()` (tied to the legacy
    `ChatMessage[][]` shape) are gone; `recordTurn()` replaces them.
  * `handleSendMessage()` adds an optimistic `pendingUserMessages`
    entry until `createAgentTurn` returns the real turn id, then
    promotes the turn into `liveTurns` and drops the pending entry.
  * `recoverActiveTurn` effect still re-fetches the snapshot for the
    sessionStorage-active turn id so a mid-stream reload picks up
    cursor / status drift since page load.
* `AgentTurnStreamCard` now renders `<MessagePartsUser>` from
  `envelope.input_text` inline above the AI card so historical and
  live turns share one render path. The legacy `MessagePartsAi`
  branch is gone (canonical parts handle historical render too).
* `web/src/features/agent-runtime/api.ts` — `AgentTurnSnapshotEnvelope`
  extended with `runtime_kind: AgentRuntimeKind` (`agent_runtime` |
  `direct_chat` | `rag_chat`, default `agent_runtime`) and optional
  `input_text`. `AgentRuntimeKind` re-exported from the feature index.
* `web/src/api-v2/schema.d.ts` — regenerated via `yarn api:v2:types`
  against the post-#92 OpenAPI public spec.

## Boundary held (per PM lock)

* No new `chat-runtime/` feature module — the agent-runtime hook +
  renderer cover the historical render path 1:1 (per architect
  msg=38e116e5 lock).
* `runtime_kind` stays a BE-internal discriminator; FE does not
  branch on it.
* Legacy `MessagePartsAi` / `MessagePartAi` / `StoredChatMessagePart`
  files remain on disk — Python-side schema/storage delete is #80
  territory, and the FE files have no callers after this PR but are
  not removed here (kept for #80 sweep).

## Verification

* `yarn lint` clean (one pre-existing `no-explicit-any` warning in
  `features/providers/server-api.ts` unrelated to this PR).
* `tsc --noEmit` clean for touched files (the four pre-existing
  errors in `chat-input.tsx` are main-baseline noise unchanged here).
* `yarn dev` boots in 2.6s on port 3014; `GET /`, `/auth/signin`,
  `/workspace/collections`, `/workspace` all return 200.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…al Agent substrate (#1708)

* docs(modularization): add D10.b design pack — MCP/API redesign for external Agent substrate

Design pack covering:
- §A 8 read primitives (list_collections / list_documents / get_document_metadata /
  get_collection_metadata / read_document / read_document_outline /
  read_document_section / read_document_chunk)
- §B Search primitives split + omnibus deprecation
  (vector_search / graph_search / fulltext_search / web_search)
- §C Pagination + cursor contract (R2: opaque base64 + invariant_hash + 6 explicit
  error codes; explicit-error-not-silent-reset)
- §D Capability negotiation (R3 Option A canonical)
- §E Read primitive persistence strategy (Lock #7 LRU + parse_version L1+L2)
- §F D9 base reuse boundary (Lock #6 + Lock #8; cites #1698/#1699 inventory and
  Weston policy/backend-owned tenancy refinements)
- §G Implementation guidelines — 5 hard gates accumulated through D8.x review:
  (1) contract shape change → comprehensive grep sweep across hurl/unit_test/scripts
      with 5-category classification
  (2) CI red canonical decision needs actual root cause not "infra flake" attribution
  (3) bridge/adapter deletion → ALL caller path validation
  (4) Caller migration → preserve original assertion semantics
  (5) cross-stack design boundary requires owner inventory cross-validation
- §H Migration & backward compatibility plan (hard-cut philosophy per
  earayu2 msg=f20d5034)

Cumulative architect locks: R1/R2/R3 + Lock #5/#6/#7/#8 + cheapest combo.
runtime_kind discriminator settled as 3-value (agent_runtime / direct_chat /
rag_chat) per Bryce/Weston BE inventory finding.

Closes design phase of task #84. Implementation tasks (D10.c-h) to be
decomposed post-merge per §G 5-gate methodology.

* docs(modularization): D10.b — drop stale placeholder + add §G D10.c-h lane decomposition

Per Weston msg=71d8d605 + PM msg=db923645 in-PR doc-only blockers:

1. Remove the stale tail placeholder "(§B-§F + §H still to be drafted in
   subsequent sessions)". The 9/9 sections were already drafted; the placeholder
   was a leftover from an earlier draft session and contradicted §H presence
   above.

2. Add concrete D10.c-h implementation lane decomposition as a §G subsection.
   Each lane (D10.c through D10.h) now has:
   - Deliverable scope (one-to-one with §A-§H spec text for locatability)
   - Owner candidate (suggestion only — final claim via slock task claim)
   - Write-set boundary (which directories/files the lane is allowed to touch,
     plus explicit "Forbidden" list to prevent scope inflation)
   - Dependency graph (depends-on / blocks)

   Lanes:
   - D10.c — Read primitives BE implementation (§A)
   - D10.d — Search primitives split + omnibus deprecation (§B)
   - D10.e — Pagination + cursor contract (§C)
   - D10.f — Capability negotiation Option A (§D)
   - D10.g — Read primitive persistence LRU + parse_version (§E)
   - D10.h — Migration & hard-cut cutover (§H, architect-led)

   Plus a dependency-graph summary with 3 parallel-friendly windows so PM can
   batch task creation: window 1 (D10.d/e/g concurrent post-D10.c), window 2
   (D10.f joins post-D10.c+d), window 3 (D10.h cutover single-lane post-soak).

§G end-marker now states explicitly that §G is an open ledger — new lessons
should be appended as additional "Hard gate" subsections, not split into a
separate doc.

Doc-only change. No implementation impact.

---------

Co-authored-by: 符炫炜 <fuxuanwei@apecloud.io>
… error codes to §C.3 canonical (#1710)

Per [D10 spec amendment] thread (Bryce msg=441c5e56 + PM msg=40e98684 +
architect double sign-off):

§G D10.e Deliverable summary (line 1115) had 6 SCREAMING_SNAKE codes that
did not match §C.3 body's 6 snake_case codes. The §G summary was a drafting
slip from the rushed §G decomposition amendment commit (36b5835); §C.3 body
remains the canonical source because:

  1. Casing — wire format is snake_case to match the rest of the ApeRAG API
     surface (existing error codes use snake_case in JSON wire format).

  2. Granularity — §C.3 body splits invariant violation into 3 distinct
     codes (cursor_filter_mismatch / cursor_tenant_mismatch /
     cursor_index_changed) because lines 567-571 map DIFFERENT client
     recovery paths to each:
       - cursor_filter_mismatch  → client bug, surface to user
       - cursor_tenant_mismatch  → security violation, distinct telemetry
       - cursor_index_changed    → backend ops issue, retry from null
     Collapsing them into a single cursor_invariant_mismatch would lose
     this distinction and force clients to over-react.

  3. CURSOR_FOREIGN and CURSOR_PAGE_OUT_OF_RANGE in §G summary did not
     appear in §C.3 body and had no client-recovery path defined — they
     were drafting noise, not real codes.
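
A minimal sketch of the granularity argument in point 2, assuming a client that dispatches on the wire code. The `Recovery` enum and `recovery_for` helper are hypothetical names for illustration; only the three code strings and their recovery paths come from the text above.

```python
from enum import Enum


class Recovery(Enum):
    """Hypothetical client-side recovery actions (illustrative names)."""

    SURFACE_TO_USER = "surface_to_user"
    SECURITY_TELEMETRY = "security_telemetry"
    RETRY_FROM_NULL = "retry_from_null"


# Only the three invariant-violation codes have a recovery path spelled out
# above; the remaining canonical codes are deliberately not guessed here.
RECOVERY_BY_CODE = {
    "cursor_filter_mismatch": Recovery.SURFACE_TO_USER,     # client bug
    "cursor_tenant_mismatch": Recovery.SECURITY_TELEMETRY,  # security violation
    "cursor_index_changed": Recovery.RETRY_FROM_NULL,       # backend ops issue
}


def recovery_for(code: str) -> Recovery:
    """Return the recovery path for a code, failing loudly on unknown codes."""
    try:
        return RECOVERY_BY_CODE[code]
    except KeyError:
        raise ValueError(f"no recovery path defined for {code!r}") from None
```

A single collapsed code would force all three rows of this table onto one action, which is the over-reaction the commit argues against.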

§G D10.e summary now cites §C.3 verbatim and points readers at the §C.3
body for the client-recovery mapping (single source of truth).

Doc-only change. No implementation impact. Unblocks task #97 (D10.e
cursor errors.py).

Co-authored-by: 符炫炜 <fuxuanwei@apecloud.io>
…wire type (#1709)

D8.0c+ hygiene fix-forward — align ApeRAG agent-runtime wire emitter
with the AI SDK v5 strict spec for tool finish events.

Before: a single ``tool-output-available`` part conflated success and
failure via an optional ``error_text``; the FE reducer carried a
forward-compat fallback that re-classified failures based on the
field's presence.

After: success and failure are split onto two distinct wire events,
matching the AI SDK v5 standard:
- ``tool-output-available`` carries only ``output`` (success path).
- ``tool-output-error`` carries only ``error_text`` (required, failure
  path).

Both classes set ``model_config.extra = "forbid"`` so a residual
caller that still passes ``errorText`` to the success class surfaces
as a clean ``ValidationError`` rather than silently masquerading as
success.
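
The split can be sketched with the standard library alone. This is an illustration, not the code in ``wire/parts.py``: frozen dataclasses stand in for the Pydantic models, so an unexpected keyword raises ``TypeError`` here where ``extra = "forbid"`` would raise ``ValidationError``. The ``tool_call_id`` field, the status strings, and ``translate_tool_finished`` are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union

# Assumed failure statuses; the real _is_failure_status logic is not shown
# in the commit message.
FAILURE_STATUSES = {"failed", "cancelled"}


@dataclass(frozen=True)
class ToolOutputAvailable:
    """Success path: carries only ``output``."""

    tool_call_id: str
    output: Any
    type: str = "tool-output-available"


@dataclass(frozen=True)
class ToolOutputError:
    """Failure path: ``error_text`` is required."""

    tool_call_id: str
    error_text: str
    type: str = "tool-output-error"


def translate_tool_finished(
    tool_call_id: str,
    status: str,
    output: Any = None,
    error_text: Optional[str] = None,
) -> Union[ToolOutputAvailable, ToolOutputError]:
    """Emit exactly one of the two strict shapes depending on status."""
    if status in FAILURE_STATUSES:
        if not error_text:
            raise ValueError("failure events require error_text")
        return ToolOutputError(tool_call_id, error_text)
    return ToolOutputAvailable(tool_call_id, output)
```

Because the success class has no ``error_text`` field at all, a stale caller fails at construction time instead of producing a success event that hides a failure.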

Wire / FE alignment:
- aperag/domains/agent_runtime/wire/parts.py: adds
  ToolOutputErrorPart, strips error_text from ToolOutputAvailablePart,
  extends the discriminated union and __all__, refreshes the module
  docstring.
- aperag/domains/agent_runtime/wire/translator.py:
  _translate_tool_finished now branches on _is_failure_status to emit
  ToolOutputErrorPart on failure and ToolOutputAvailablePart on success.
- web/src/features/agent-runtime/types.ts: drops the errorText? legacy
  field from tool-output-available; tool-output-error becomes the sole
  strict failure shape.
- web/src/features/agent-runtime/reducer.ts: drops the legacy fallback
  that re-classified failures off tool-output-available; the success
  branch is now an unconditional state="output-available".

Doc alignment:
- docs/modularization/agent-message-protocol-design.md: wire part list
  now lists tool-output-error alongside tool-output-available.
- docs/modularization/agent-runtime-mcp-design.md: consent
  invocation-block + denial flows now reference tool-output-error.
- Module docstrings in tools/consent.py and tools/elicitation.py are
  updated to match.

Tests:
- Existing test_translate_tool_started_finished /
  test_translate_tool_failure are tightened to assert the strict shapes.
- New test_tool_output_strict_split_per_ai_sdk_v5 and
  test_tool_output_available_rejects_error_text_kwarg pin both the
  dump/parse split and the extra="forbid" regression guard.
- Round-trip sample matrix now covers both classes.

Verified locally:
- uv run --extra test python -m pytest tests/unit_test/ -q -> 836 passed / 29 skipped.
- make lint clean.

Note on local pre-commit hook: a stale .git/hooks/pre-commit script
(installed before #88) still calls the removed make add-license target
and was bypassed for this commit via core.hooksPath=/dev/null. The
script in scripts/hooks was already removed by #88 (8ed1d7b); the
local .git/hooks copy is residual state that the repo owner can clean
up with rm .git/hooks/pre-commit.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…1712)

* feat(phase9 #97 D10.e prep): cursor pagination contract + 18 unit tests

Per design pack §C (canonical post-#1710 SSoT):
- aperag/mcp/cursor/codec.py: CursorPayload (sort_key + last_position + invariant_hash + issued_at + ttl_seconds 1h default + server_id + schema_version 1) + base64url JSON encode/decode + is_expired TTL check
- aperag/mcp/cursor/invariants.py: compute_invariant_hash sha256 over (sort_key + filters + collection_id + tenant_id + index_id) deterministic across dict ordering
- aperag/mcp/cursor/schemas.py: PaginationParams (cursor + limit conint 1..200) + PaginationResult[T] generic (items + next_cursor + total_count)
- aperag/mcp/cursor/errors.py: 6 canonical snake_case codes (cursor_invalid / cursor_expired / cursor_filter_mismatch / cursor_tenant_mismatch / cursor_index_changed / cursor_schema_unsupported per §C.3 + #1710 amendment) + CursorError exception + CursorErrorEnvelope wire shape + SILENT_RESET_FORBIDDEN guard
- aperag/mcp/cursor/__init__.py: public surface for D10.c read primitive imports
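
A rough sketch of the codec and invariant hash described above, stdlib only. The function names mirror the commit, but the signatures, the dict-based payload, and the canonical-JSON hashing input are assumptions rather than the contents of ``aperag/mcp/cursor/``.

```python
import base64
import hashlib
import json
import time
from typing import Any, Optional


def compute_invariant_hash(
    sort_key: str,
    filters: dict,
    collection_id: str,
    tenant_id: str,
    index_id: str,
) -> str:
    """sha256 over the query scope; sort_keys=True makes dict order irrelevant."""
    canonical = json.dumps(
        {
            "sort_key": sort_key,
            "filters": filters,
            "collection_id": collection_id,
            "tenant_id": tenant_id,
            "index_id": index_id,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def encode_cursor(payload: dict) -> str:
    """Serialize the payload to JSON and wrap it in base64url (opaque to clients)."""
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor_payload(wire: str) -> dict:
    """Structural decode only; no expiry or schema checks in this sketch."""
    return json.loads(base64.urlsafe_b64decode(wire))


def is_expired(payload: dict, now: Optional[float] = None) -> bool:
    """A cursor expires once the clock passes issued_at + ttl_seconds."""
    current = time.time() if now is None else now
    return current > payload["issued_at"] + payload["ttl_seconds"]


def new_payload(sort_key: str, last_position: Any, invariant_hash: str) -> dict:
    """Build a payload with the fields listed above (server_id value is made up)."""
    return {
        "sort_key": sort_key,
        "last_position": last_position,
        "invariant_hash": invariant_hash,
        "issued_at": time.time(),
        "ttl_seconds": 3600,  # 1h default per the commit message
        "server_id": "example-server",
        "schema_version": 1,
    }
```

On reuse, the server would recompute the hash from the incoming request and compare it with the one stored in the cursor; a mismatch means the query scope drifted since issuance.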

tests/unit_test/mcp/test_cursor_contract.py:
- 5 codec round-trip / wire format / TTL boundary tests
- 2 invariant_hash determinism + binding sensitivity tests
- 7 error envelope round-trip tests (parametrized over each canonical code) + SILENT_RESET_FORBIDDEN pin
- 4 PaginationParams/PaginationResult shape tests including end-to-end cursor flow

Pending D10.c stub head landing for `aperag/service/pagination.py` integration helper + `tests/e2e_http/hurl/<NN>_d10_pagination.hurl` cross-tool e2e — those are Window 1 work.

* fix(phase9 #97 D10.e): enforce §C.3 explicit error contract in decode_cursor

Per Weston msg=cc4a3ab0 second-line CR blocker: decode_cursor() previously
surfaced malformed / wrong-schema / expired wire payloads as bare
ValueError / KeyError, leaving every D10.c / D10.d caller to
re-derive the canonical mapping. That violates the §C.3
explicit-not-silent invariant by construction — any forgotten
mapping silently degrades into ValueError → tool error → first-page
restart, which is exactly the anti-pattern SILENT_RESET_FORBIDDEN
guards against.

Fix:
- decode_cursor() now raises CursorError directly with the right
  canonical code: cursor_invalid (malformed wire / base64 / json /
  missing field), cursor_schema_unsupported (unknown schema_version),
  cursor_expired (past issued_at + ttl_seconds clock).
- _decode_cursor_payload() preserved as a private structural-only
  decode for tests that need to craft expired / wrong-schema
  payloads to exercise the canonical error paths.
- 3 new canonical-code tests + 1 internal-decode escape hatch
  test added; old raw-error test deleted (pre-#1710 wire shape
  no longer reachable through public surface).
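
The fix can be sketched as follows, under stated assumptions: the payload is a JSON object in padded base64url, the required-field list is inferred from the codec description, and ``CursorError`` carries a ``code`` attribute. The real ``decode_cursor()`` may differ in signature and check order.

```python
import base64
import json
import time
from typing import Optional

SUPPORTED_SCHEMA_VERSION = 1
REQUIRED_FIELDS = (
    "sort_key",
    "last_position",
    "invariant_hash",
    "issued_at",
    "ttl_seconds",
)


class CursorError(Exception):
    """Carries one canonical snake_case code so callers never re-derive it."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def encode_cursor(payload: dict) -> str:
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(wire: str, now: Optional[float] = None) -> dict:
    """Decode and validate; every failure maps to a canonical code."""
    try:
        # binascii.Error, JSONDecodeError and UnicodeDecodeError are all
        # ValueError subclasses, so one clause covers malformed wire data.
        payload = json.loads(base64.urlsafe_b64decode(wire))
    except ValueError as exc:
        raise CursorError("cursor_invalid", "malformed wire payload") from exc
    if not isinstance(payload, dict) or "schema_version" not in payload:
        raise CursorError("cursor_invalid", "missing schema_version")
    if payload["schema_version"] != SUPPORTED_SCHEMA_VERSION:
        raise CursorError("cursor_schema_unsupported", "unknown schema_version")
    if any(field not in payload for field in REQUIRED_FIELDS):
        raise CursorError("cursor_invalid", "missing field")
    for field in ("issued_at", "ttl_seconds"):
        if not isinstance(payload[field], (int, float)):
            raise CursorError("cursor_invalid", f"non-numeric {field}")
    current = time.time() if now is None else now
    if current > payload["issued_at"] + payload["ttl_seconds"]:
        raise CursorError("cursor_expired", "cursor TTL elapsed")
    return payload
```

With the mapping centralized, a caller that forgets to handle an error still propagates a typed ``CursorError`` rather than a bare ``ValueError`` that could be mistaken for a first-page restart.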

_payload() fixture's issued_at now defaults to current time so
round-trip tests stay green when run far from the fixture's
drafting date; tests that need expired / fixed payloads override
explicitly.

21/21 tests pass; ruff check + format clean.

---------

Co-authored-by: Bryce <bryce@aperag.local>
Lands typed signatures + Pydantic response shapes + stable handle types
(ChunkId/SectionPath/HeadingAnchor) for the 8 §A read primitives, with
NotImplementedError bodies. Allows D10.d (#96) / D10.e (#97) / D10.g (#99)
owners to statically import the cross-lane surface and start their lanes
in parallel; full primitive bodies + integration tests follow in a separate
PR within this same task lane.

Per #1708 merged docs/modularization/d10-design-pack.md §G D10.c lane:
- Write-set: aperag/mcp/tools/{read_*,list_*,get_*}.py + schemas.py + handles.py
- Forbidden: §B search primitives / §C cursor encoder / §E persistence cache internals

Note on hook bypass: local .git/hooks/pre-commit references missing
'make add-license' target (stale environment hook, not repo content).
Bypassed via -c core.hooksPath=/dev/null per @明书 PR #1709 precedent;
license headers are present in every new file. User-side hygiene only,
no PR content impact.
…1714)

* feat(phase9 #95 D10.c): read primitives implementation (8 primitives, un-cached)

Wires the 8 read primitive bodies that landed as NotImplementedError in #1711
to ApeRAG's existing repositories + D9 tenancy/auth base.

Per #1708 merged docs/modularization/d10-design-pack.md §A:
- list_collections / list_documents / get_collection_metadata / get_document_metadata
  (4 list/metadata primitives, no parse_version)
- read_document / read_document_outline / read_document_section / read_document_chunk
  (4 parse_version-keyed primitives, body uses ParseVersionT helper)

Each primitive body strictly follows:
  tenancy gate (D9) → auth gate (D9) → compute parse_version → fetch authoritative

Cache wiring is intentionally deferred to D10.g (#99 @明书) per cuiwenbo
msg=13a4139a sequence: stub merge → un-cached implementation merge → cache
wire-in. cuiwenbo's parse_version helper at aperag/mcp/tools/parse_version.py
exposes ParseVersionT (Annotated[str, sha256[:16] regex]) for downstream lanes.

Also registers the 8 primitives in aperag/mcp/server.py with the
'# === D10.c read primitives ===' marker comment so chenyexuan's D10.d
search-split registration can be added adjacent without merge churn.

§G Forbidden boundary preserved: no §B search primitives, §C cursor encoder,
§D capability annotation, §E cache internals touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase9 #95 D10.c): explicit cursor errors + pre-pagination type_filter

Addresses Weston msg=246c84d3 second-line sanity blockers on PR #1714:

1. cursor silent-reset → ValueError on malformed (§C explicit-not-silent)
   - list_collections._decode_cursor / list_documents._decode_cursor:
     None or "" → offset 0; non-empty but malformed/missing/negative offset
     raises ValueError with clear message
   - TODO comment marks the seam for D10.e (#97) to replace ValueError
     with canonical CursorError

2. list_documents type_filter applied before pagination
   - Filter mimetype in-memory (no Document.mimetype column exists; media
     type is computed via mimetypes.guess_type from filename) so
     total_count, offset/limit, and next_cursor are all computed over the
     filtered set
   - Regression test: markdown doc at position > limit is still found
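
A small sketch of fix 2, assuming documents are represented by their filenames only. ``list_documents_page`` and its return shape are hypothetical; the one grounded detail is that the media type comes from ``mimetypes.guess_type`` and that filtering happens before slicing.

```python
import mimetypes
from typing import List, Optional, Tuple


def list_documents_page(
    filenames: List[str],
    type_filter: Optional[str],
    offset: int,
    limit: int,
) -> Tuple[List[str], int, Optional[int]]:
    """Filter first, then paginate, so counts and cursors describe the filtered set."""
    if type_filter is None:
        matching = list(filenames)
    else:
        # guess_type returns (type, encoding); only the type is compared.
        matching = [
            name for name in filenames
            if mimetypes.guess_type(name)[0] == type_filter
        ]
    total_count = len(matching)
    page = matching[offset:offset + limit]
    # next_offset is None once the filtered set is exhausted.
    next_offset = offset + limit if offset + limit < total_count else None
    return page, total_count, next_offset
```

Filtering after slicing would miss a matching document that sits beyond the first ``limit`` rows, which is the case the regression test pins.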

Tests added (4 cursor malformed + 2 type_filter regression + 2 None/empty
positive guards = 8 new, 28 → 36 in D10.c surface suite).

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…#1713)

Phase 9 D10.d (#96) per docs/modularization/d10-design-pack.md §B.

Adds 4 split search MCP tools (vector_search / graph_search /
fulltext_search / web_search) under aperag/mcp/tools/, marks the
omnibus search_collection and search_chat_files as DEPRECATED with a
docstring banner, and relocates the existing web_search implementation
from server.py into the new tools subpackage so all D10 search tools
live in one place.

Forbidden boundaries (§G D10.d) are honored: search_collection /
search_chat_files implementation bodies are intentionally untouched
(deletion is D10.h territory), no read primitive tool surface is
modified (D10.c territory), no aperag/service/search_service.py
compat layer is created (would require [D10 spec amendment] thread).

The §B canonical SearchResult / SearchResultItem shape with chunk_id
/ section_path / heading_anchor surfacing is intentionally deferred
to a D10.d follow-up PR — current backend does not expose chunk_id in
the public response shape and the propagation question warrants a
[D10 spec amendment] thread before implementation.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…o wire-in) (#1716)

D10.g task #99 first-cut — adds the ``aperag/cache/`` subpackage that
implements the §E read-primitive persistence cache (L1 in-process LRU
+ L2 parse_version-keyed Redis), plus the explicit-trigger
invalidation helpers reserved for the D11+ write-tools lane. No D10.c
read primitive body is touched by this PR — wire-in lands in a
follow-up within the same task #99 lane (per cuiwenbo msg=13a4139a
sequence: D10.c implementation merge → D10.g cache wire-in).

The cache strictly accelerates; per §E.7 hard lock, callers run
tenancy + authorization gates on every invocation before reaching
the cache. The cache layer never sees the calling user — keys are
``(document_id, parse_version, ...)`` only. ``read_document_chunk``
is the §E.6 special case: chunk_id is indexing-immutable, so the
chunk namespace key is ``(chunk_id,)`` only with no parse_version
weighting.

Architecture (per architect Q1/Q2/Q3 lock msg=a67974b3):

- ``ParseVersionT`` and ``compute_parse_version`` are reused from
  ``aperag.mcp.tools.parse_version`` (cuiwenbo D10.c) — the helper
  module is the cross-lane SoT, not duplicated here.
- L1 is a per-worker LRU bounded by the new ``Config.d10_cache_l1_size``
  knob (default 256). L2 is the new ``Config.d10_cache_l2_ttl_seconds``
  knob (default 3600s = 1h, matching §C.4 cursor TTL default).
- The Redis client is keyword-only DI on ``L2Cache`` (per Q3 lock).
  Production wires the existing memory-redis client at app bootstrap;
  tests inject fakes; ``NoopL2Cache`` covers the no-Redis dev case so
  the composition shape never special-cases.
- ``ReadPrimitiveCache`` exposes one ``get_or_compute_*`` per primitive
  so callers do not build cache keys by hand. Per-key inflight
  collapsing prevents two concurrent miss paths from both paying the
  cold-parse cost.
- L1 / L2 decode failures are treated as cache miss (logged) — the
  authoritative storage is the source of truth, never fall back to a
  stale or alternative shape. L2 backend failures are similarly
  fail-open; the read primitive's compute step always answers.
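
The L1 LRU and the per-key inflight collapsing can be sketched as below. This is an illustration, not the code in ``aperag/cache/``: the class names are hypothetical, L2 is omitted, and the sketch relies on a single event loop instead of the ``asyncio.Lock`` the real L1 uses.

```python
import asyncio
from collections import OrderedDict
from typing import Any, Awaitable, Callable, Dict, Hashable, Tuple


class L1CacheSketch:
    """Bounded LRU: most recently used keys sit at the end of the dict."""

    def __init__(self, max_size: int = 256) -> None:
        self._max_size = max_size
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Tuple[bool, Any]:
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return True, self._data[key]
        return False, None

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least recently used


class ReadCacheSketch:
    """Collapses concurrent misses for one key into a single compute call."""

    def __init__(self, l1: L1CacheSketch) -> None:
        self._l1 = l1
        self._inflight: Dict[Hashable, "asyncio.Future[Any]"] = {}

    async def get_or_compute(
        self, key: Hashable, compute: Callable[[], Awaitable[Any]]
    ) -> Any:
        hit, value = self._l1.get(key)
        if hit:
            return value
        task = self._inflight.get(key)
        if task is None:
            # First miss for this key starts the compute; later misses join it.
            task = asyncio.ensure_future(self._fill(key, compute))
            self._inflight[key] = task
        return await task

    async def _fill(self, key: Hashable, compute: Callable[[], Awaitable[Any]]) -> Any:
        try:
            value = await compute()
            self._l1.put(key, value)
            return value
        finally:
            self._inflight.pop(key, None)
```

Keys here would be tuples such as ``(document_id, parse_version)``; the calling user never appears in a key, which matches the rule that gates run before the cache is consulted.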

Files:

- ``aperag/cache/__init__.py`` — public surface (re-exports).
- ``aperag/cache/parse_version_cache.py`` — L1Cache (LRU,
  asyncio.Lock-guarded), L2Cache (Redis adapter, keyword-only DI,
  fail-open), NoopL2Cache (no-Redis fallback).
- ``aperag/cache/read_primitive_cache.py`` — ReadPrimitiveCache facade
  with one get_or_compute_* per §A primitive; namespace constants;
  ``build_read_primitive_cache`` production helper.
- ``aperag/cache/invalidation.py`` — invalidate_document /
  invalidate_collection helpers reserved for D11+ write tools (no
  auto-wire in D10.g).
- ``aperag/config.py`` — adds ``d10_cache_l1_size`` (default 256) and
  ``d10_cache_l2_ttl_seconds`` (default 3600) knobs.
- ``tests/unit_test/cache/test_parse_version_cache.py`` — 14 tests
  covering L1 LRU eviction, namespace purge, L2 Redis adapter (using
  a minimal in-memory fake), fail-open semantics, NoopL2Cache.
- ``tests/unit_test/cache/test_read_primitive_cache.py`` — 8 tests
  covering all 4 ``get_or_compute_*`` methods, parse_version
  invalidation, section_path/heading_anchor key independence, chunk
  namespace shape, concurrent-miss compute collapsing, type-mismatch
  defense, undecodable cache entry recovery.
- ``tests/unit_test/cache/test_invalidation.py`` — 2 tests confirming
  parse_version namespaces are purged and chunk namespace is preserved
  per the §E.6 immutability rule.

Verified:

- ``uv run --extra test python -m pytest tests/unit_test/`` →
  917 passed / 29 skipped (was 893 + 24 cache tests = 917).
- ``make lint`` clean (478 files formatted).

Out of scope (deferred to D10.g follow-up PR within same task #99 lane):

- D10.c primitive body wire-in — 1-line ``cache.get_or_compute_*``
  call per primitive, replacing the inline authoritative fetch in
  ``aperag/mcp/tools/{read_document,read_document_outline,read_document_section,read_document_chunk}.py``.
  Held back to keep this PR purely additive (no D10.c body edits).
- Production cache instance wiring at app bootstrap — same follow-up.
- get_collection_metadata / get_document_metadata short-TTL caching —
  not in §G D10.g write-set; future extension.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…0.c list primitives (#1715)

Closes the D10.e write-set per design pack §G:
- NEW aperag/service/pagination.py — `encode_offset_cursor` /
  `decode_offset_cursor` helper that wraps the canonical
  `aperag.mcp.cursor` codec around the offset bookkeeping the D10.c
  list primitives already perform; binds invariants over (sort_key,
  filters, collection_id, tenant_id) so any scope drift between cursor
  issuance and re-use surfaces as canonical CursorError.
- MOD aperag/mcp/tools/list_collections.py + list_documents.py — drop
  the `_decode_cursor` / `_encode_cursor` placeholders #1714 left as a
  D10.e seam; route every cursor through the helper. Malformed /
  expired / scope-mismatched cursors now raise canonical CursorError
  ("cursor_invalid" / "cursor_expired" / "cursor_filter_mismatch" /
  "cursor_schema_unsupported") instead of bare ValueError.

Tests:
- NEW tests/unit_test/service/test_pagination_helper.py — 16 tests
  pin every canonical-error path: None/empty start-fresh, round-trip,
  garbage / non-json wire (cursor_invalid), sort_key drift / filters
  drift / collection_id drift (cursor_filter_mismatch), TTL expiry
  (cursor_expired), unknown schema_version (cursor_schema_unsupported),
  malformed last_position offset (string / negative / bool).
- MOD tests/unit_test/test_d10c_read_primitives_surface.py — drop the
  8 `_decode_cursor` tests that now belong to the helper module; left
  a comment redirecting readers to the new test file. The type_filter
  pre-pagination regression tests (Weston msg=246c84d3) stay where
  they are.

Stale-narrative cleanup (architect msg=343f2e32 follow-up):
- aperag/mcp/cursor/__init__.py + codec.py — drop the "pending spec
  amendment double-sign per architect msg=669db73c" docstring
  fragments now that #1710 has merged the canonical lock to main.

`tests/e2e_http/hurl/<NN>_d10_pagination.hurl` listed in §G is
intentionally deferred: there is no MCP-over-HTTP coverage in the
hurl suite today (D10.c #1714 followed the same scope), and the
helper's behaviour is fully exercised by the 16 unit tests above.
A dedicated MCP-over-HTTP e2e infrastructure pass is the right time
to add it, not this lane.

65/65 tests pass (21 cursor + 16 helper + 28 D10.c surface);
ruff check + format clean.

Co-authored-by: Bryce <bryce@aperag.local>
…n-keyed primitives (#1718)

D10.g task #99 follow-up — completes the wire-in deferred in #1716.
Each of the four parse_version-keyed read primitives in
``aperag/mcp/tools/`` now calls ``cache.get_or_compute_*`` after the
D9 tenancy + authorization gates run; the un-cached fetch path becomes
the cache's ``compute`` callback.

Per §E.7 hard lock the cache only accelerates — every invocation runs
``resolve_authenticated_user`` → ``tenancy_gate`` → ``authorization_gate``
before any cache lookup. The cache layer never sees the calling user
and cannot grant or skip a permission. Per §E.6 the chunk primitive's
cache key is ``(chunk_id,)`` only — no parse_version weighting because
``chunk_id`` is indexing-layer-immutable.
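
A minimal sketch of the gate-then-cache ordering described above. `ReadPrimitiveCacheSketch`, the `authorize` / `fetch` callables, and the function signature are hypothetical stand-ins; the real primitives call the typed `cache.get_or_compute_*` methods and the D9 gates:

```python
from typing import Any, Callable, Dict, Tuple


class ReadPrimitiveCacheSketch:
    """In-memory stand-in; keys carry no user identity, so it cannot authorize."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[Any, ...], Any] = {}

    def get_or_compute(self, key: Tuple[Any, ...], compute: Callable[[], Any]) -> Any:
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]


def read_document_chunk(
    user: str,
    chunk_id: str,
    *,
    cache: ReadPrimitiveCacheSketch,
    authorize: Callable[[str, str], None],
    fetch: Callable[[str], Any],
) -> Any:
    # Gates run on every invocation, before any cache lookup.
    authorize(user, chunk_id)
    # The chunk key is (chunk_id,) only: chunk_id is immutable, so no parse_version.
    return cache.get_or_compute((chunk_id,), lambda: fetch(chunk_id))
```

Because the gate call precedes the lookup, a warm cache entry cannot serve a caller who fails authorization; the cache only saves the fetch.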

Changes:

- ``aperag/cache/runtime.py`` (new) — process-wide
  :func:`get_read_primitive_cache` lazy singleton wired to the
  existing memory-redis client (via
  ``aperag.db.redis_manager.RedisConnectionManager``); falls back to
  ``NoopL2Cache`` when Redis is unreachable so the cache never blocks
  authoritative reads (§E.7 fail-open).
- ``aperag/cache/__init__.py`` — re-exports
  ``get_read_primitive_cache`` / ``reset_read_primitive_cache``.
- ``aperag/mcp/tools/read_document.py`` — caches the full
  ``DocumentContent`` (key = document_id + parse_version); byte-range
  slicing applied AFTER cache lookup so the cache stays shared across
  range requests for the same document version.
- ``aperag/mcp/tools/read_document_outline.py`` — caches the
  ``DocumentOutline`` envelope.
- ``aperag/mcp/tools/read_document_section.py`` — caches by
  ``(document_id, parse_version, section_path, heading_anchor)``;
  computes outline + slice + sibling-count inside the cache compute.
- ``aperag/mcp/tools/read_document_chunk.py`` — caches by
  ``(chunk_id,)`` only per §E.6.
- ``tests/unit_test/cache/test_wire_in_invariants.py`` (new) — three
  static guards: (1) cache call must follow tenancy + authorization
  gates in every primitive body, (2) each primitive uses its
  dedicated typed ``get_or_compute_*`` method, (3) the chunk
  primitive must not pass parse_version to the cache key.

Production wiring:

- The L1 size and L2 TTL knobs landed in #1716
  (``Config.d10_cache_l1_size`` / ``Config.d10_cache_l2_ttl_seconds``)
  are now actually consumed by the singleton.
- L2 fail-open: a Redis connection failure at process start logs at
  WARNING and degrades to L1-only — the cache layer remains available
  per the §E.7 hard lock that authoritative reads must never block on
  the cache.
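
The L2 fail-open behaviour can be sketched as below. `build_l2_cache`, the `connect` callable, the method names on `NoopL2Cache`, and the caught exception type are assumptions for illustration; only the `NoopL2Cache` name and the WARNING-level log come from this change:

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class NoopL2Cache:
    """L2 stand-in that never stores anything, leaving only L1 active."""

    def get(self, key: str) -> Optional[Any]:
        return None

    def set(self, key: str, value: Any, ttl_seconds: int) -> None:
        return None


def build_l2_cache(connect: Callable[[], Any]) -> Any:
    """Return the real L2 client, or a no-op one if the connection fails."""
    try:
        return connect()
    except OSError as exc:  # ConnectionError is a subclass of OSError
        logger.warning("L2 cache unavailable, degrading to L1-only: %s", exc)
        return NoopL2Cache()
```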

Test plan:

- ``uv run --extra test python -m pytest tests/unit_test/`` →
  932 passed / 29 skipped (was 917 + 3 new wire-in invariant tests
  + 12 from D10.e #1715 helper tests landed in the meantime = 932).
- ``make lint`` clean.
- The pre-existing ``test_call_sequence_witness_for_all_parse_version_keyed_primitives``
  in ``tests/unit_test/test_d10c_read_primitives_surface.py`` still
  passes — the cache call is added strictly after the D9 base gates.

Out of scope: ``get_collection_metadata`` / ``get_document_metadata``
short-TTL caching; explicit invalidation triggers wired to write
paths (those wait on D11+ write-tools per §E.5 + §G D10.g write-set
boundary).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…t search tools (#1717)

* feat(phase9 #96 D10.d): cursor placeholder + kw-only sentinel on split search tools

D10.d task #96 same-lane follow-up per `[D10 spec amendment]` thread
(msg=b9b7072a) PM canonical decisions:

- Drift #5: add `*,` kw-only barrier after `query` on the 3
  collection-scoped split tools (`vector_search` / `graph_search` /
  `fulltext_search`), aligning with the D10.c read-primitive precedent
  established in #1714. `web_search` keeps its existing wire signature
  per §B.4 / amendment (parameter canonicalization deferred to D10.h
  cutover).

- Drift #4 (c): publish `cursor: str | None = None` placeholder on the
  same 3 tools so external MCP clients see the canonical surface, but
  the body explicitly fails with
  `CursorError("cursor_invalid", "search pagination cursor is not yet
  implemented", details={"reason": "search_not_paginated", "tool":
  ...})` on any non-null value. `cursor=None` continues to mean "first
  page" (current behavior). Real search pagination requires a backend
  capability that will land in a dedicated D11+ upgrade.

`fulltext_search` also gains `rerank: bool = True` to match the §B.3
spec; previously the rerank flag was reachable only via the omnibus
`search_collection`.

Drifts #1 (`SearchResultItem` with chunk_id) and #2 (`web_search`
canonicalization) intentionally NOT in this PR — both deferred to the
D10.h cutover lane per the amendment thread. Drift #3 (capability
annotations) belongs to D10.f (task #98).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase9 #96 D10.d): cursor=="" → first page, NotImplementedError on truthy cursor

Per Weston blocker review (msg=177a1dd8) on PR #1717 + architect
sign-off (msg=ebfcdabe):

1. **Empty-string cursor must equal None** — both should preserve the
   existing single-page `top_k` behavior. The previous `is not None`
   guard incorrectly raised on `cursor=""`. Switched to `if cursor:`
   (truthiness check) so the loud-fail only triggers on truly
   non-empty cursor values.

2. **Stop pretending feature-not-implemented is a malformed cursor** —
   the canonical `CursorError("cursor_invalid", ...)` describes
   wire-level malformed cursors. Using it for "search pagination is
   not implemented" camouflages a missing capability as a client-side
   cursor bug. Switched to plain `NotImplementedError("search
   pagination is not yet implemented (tool=..., reason=
   search_not_paginated)")` so the loud-fail accurately surfaces the
   deferred-capability semantic.
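
The two fixes above amount to a guard of roughly this shape. The helper name `reject_unimplemented_cursor` is hypothetical; the truthiness check and the exception type follow the description:

```python
from typing import Optional


def reject_unimplemented_cursor(cursor: Optional[str], tool: str) -> None:
    """Let None and "" through as "first page"; fail loudly on anything else."""
    if cursor:  # truthiness check: None and "" are both falsy
        raise NotImplementedError(
            f"search pagination is not yet implemented (tool={tool}, reason=search_not_paginated)"
        )
```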

Test updates:
- `cursor=""` removed from the bad-cursor parametrize (it is no
  longer a bad cursor)
- New `test_collection_scoped_split_tools_treat_none_and_empty_cursor_as_first_page`
  monkeypatches httpx + get_api_key and asserts that both `None` and
  `""` cursor pass through the guard and reach the backend search call
- Loud-fail test renamed to `_reject_non_empty_cursor` and asserts
  `NotImplementedError` (not `CursorError`) with a clear "not
  implemented" message including the originating tool and reason

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…metadata (#1719)

* feat(phase9 #98 D10.f): capability negotiation Option A + annotation metadata

Per docs/modularization/d10-design-pack.md §D every D10 tool now carries
a frozen ToolAnnotation envelope (requires / capabilities / deprecated /
deprecated_until / fallback_to) onto the MCP wire so external Agents can
do client-side filtering without server-side session state.

§G D10.f write-set:
- aperag/mcp/capabilities.py — ToolAnnotation Pydantic model + closed
  KNOWN_CAPABILITIES / KNOWN_REQUIRES sets + to_mcp_dict() for
  FastMCP annotations= kwarg.
- aperag/mcp/tools/_annotations.py — name → ToolAnnotation registry
  with re-registration guard (idempotent on identical re-register;
  ValueError on conflict).
- aperag/mcp/server.py + aperag/mcp/tools/search_*.py — additive
  decorator wrap on all 8 D10.c read primitives + 4 D10.d search
  primitives. No body / signature change.
- aperag/sdk/capability_filter.py — Option A client-side filter
  (FilterDecision explicit-not-silent reasons per §D.3, deprecated
  pass-through, missing-capability sorted output).
- tests/unit_test/mcp/test_capability_negotiation.py — 37 tests
  covering schema validity, registry, register() semantics, §D.2
  filter pseudocode, §D.3 explicit-not-silent decisions.
- tests/e2e_http/hurl/full/21_d10_capabilities.hurl — wire-shape
  contract: tools/list returns every D10 tool name plus annotation
  field markers (requires / capabilities / graph_index / fulltext_index
  / web_access / long_context / collection_access / deprecated_until).

Server-side filter (Option B per §D.4) intentionally not implemented;
deferred to a future task per architect lock if narrow legal/compliance
scenarios ever demand it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(phase9 #98 D10.f): drop null-stripped wire-shape asserts in capabilities hurl

CI on 9d8e196 flagged 21_d10_capabilities.hurl asserting the literal
"deprecated_until" / "fallback_to" field markers. The MCP wire format
strips null-valued fields, so those keys do not appear on the wire while
every D10 tool keeps them at None — matches the architect's §D.1 spec
sample (the keys reappear automatically once a tool sets them).

Adjustment is hurl-only and additive to test coverage:

- Drop the always-failing markers; add a one-line note pointing at
  test_capability_negotiation.py for the schema-default behavior.
- Replace the bare "deprecated_until:null" probe with
  "deprecated":false so we still confirm the discriminator surfaces
  under its current default.

No production code change. Unit suite still asserts the full §D.1
envelope (default None values, frozen, extra=forbid) so the contract
remains locked at the model level.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…redChatMessage* legacy (#1720)

D8.6 destructive cleanup of legacy chat-history persistence path. Pre-launch
system has no users / no data, so we simply delete instead of migrate
(per earayu2 msg=9730bb6b — accept hard-cut).

Removed
-------
- ``RedisChatMessageHistory`` (the entire class) from
  ``aperag/utils/history.py``. Chat history is canonical at-rest in the
  ``agent_message`` table (D8.2 #74) and the read path now flows through
  ``UIMessageStore`` per turn.
- ``aperag/chat/history/`` directory (``StoredChatMessage`` /
  ``StoredChatMessagePart`` legacy classes); replaced by ``UIMessagePart``
  in #74. ``aperag.chat`` had no remaining surface so the dir is deleted.
- ``RedisChatMessageHistory(...).clear()`` from ``chat_service.delete_chat``
  — no Redis history to clear; the chat row delete cascades the canonical
  rows.
- The legacy Redis read path from ``chat_title_service.generate_title`` —
  it now reads recent ``AgentTurn`` rows via ``query_agent_turns`` and
  composes OpenAI-format prompts from each turn's ``input_text`` (user)
  plus the persisted assistant ``UIMessage`` text parts via
  ``UIMessageStore.read``.

Kept
----
- ``get_async_redis_client`` in ``aperag/utils/history.py`` — still used
  by ``aperag/utils/weixin/client.py`` for non-history Redis access.

Caller sweep
------------
- Only remaining ``from aperag.utils.history import`` site is
  ``weixin/client.py`` (importing ``get_async_redis_client``).
- Zero remaining ``RedisChatMessageHistory`` Python references; the
  remaining hits are doc / docstring mentions describing the removal.
- Zero remaining ``StoredChatMessage`` / ``aperag.chat.history`` Python
  references.

Tests
-----
- Rewrote ``test_chat_title_service.py`` to drive the empty-history
  branch via an empty ``query_agent_turns`` result instead of monkey-
  patching the legacy Redis class.
- Full unit suite: 996 passed, 29 skipped.

Follow-up within #80 lane (subsequent chunk PR)
------------------------------------------------
- Wire ``UIMessageStore.write`` into the agent runtime emit path and
  drop ``snapshot_assembler`` + ``agent_artifact`` /
  ``agent_timeline_event`` tables (separate scope, separate PR).
…snapshot_assembler / agent_artifact (#1723)

Phase 8 D8.6 chunk-2 hard-cut Option 2 (PM lock msg=d916b44a): A+B in
one PR — read-side artifact-fallback removal + write-side
``UIMessageStore.write`` live wire-in. Pre-launch system has no users
/ no data, so destructive deletion replaces migration.

A — Read-side artifact-fallback removal
---------------------------------------
- Delete ``aperag/domains/agent_runtime/snapshot_assembler.py`` and
  its test.
- ``services.py:get_turn_snapshot`` reads ``UIMessageStore.read``
  only — no more ``query_agent_artifacts_by_turn`` fallback. FAILED
  / CANCELLED ``error_text`` falls back to the ``AgentTurn`` row's
  ``error_message``.
- ``chat_service.py:_build_v3_chat_history`` switches to per-turn
  ``UIMessageStore.read`` (lazy singleton, mirrors chunk-1's
  ``ChatTitleService`` pattern).

B — Write-side ``UIMessageStore.write`` live wire-in
----------------------------------------------------
- ``runtime.py`` end-of-turn now composes a single canonical
  ``UIMessage`` (``TextPart`` + ``SourceUrlPart`` /
  ``DataCitationPart`` per reference, mirroring the FE-bound
  shape) and persists via ``uimessage_store.write``. Replaces both
  ``artifact_service.create_artifact(ANSWER, ...)`` and
  ``create_artifact(REFERENCE_BUNDLE, ...)``.
- FAILED path drops the ``error_summary`` artifact write — the
  error surface stays on the ``AgentTurn`` row, which the snapshot
  endpoint already reads.
- ``mark_completed`` loses the ``answer_artifact_id`` /
  ``reference_bundle_artifact_id`` parameters; the runtime-state
  Redis merge no longer carries those fields.
- ``HistoryWriter.build_history_context`` reads canonical
  ``UIMessage`` text parts per-turn; legacy artifact lookup is
  gone.
- Reader migration:
  * ``ChatCompletionService._build_completion_content`` accepts a
    parts list and projects the OpenAI-compat
    ``answer + DOC_QA_REFERENCES + json`` envelope from
    ``TextPart`` + ``DataCitationPart``.
  * ``evaluation/worker._extract_answer_text`` joins ``TextPart``
    text from the persisted message.

Deletions
---------
- ``ArtifactService`` class (no production callers after wire-in).
- ``/api/v2/agent/artifacts/{artifact_id}`` route +
  ``get_artifact_view``.
- ``AgentArtifact`` SQLAlchemy model + ``AgentArtifactType`` enum +
  ``AgentArtifactEnvelope`` schema (incl. its ``__all__`` export).
- ``AgentTurn.answer_artifact_id`` / ``reference_bundle_artifact_id``
  columns + their indices.
- ``AgentTurnEnvelope`` artifact_id fields (no remaining consumer).
- ``db_ops`` methods: ``create_agent_artifact`` /
  ``query_agent_artifact`` / ``query_agent_artifacts_by_turn``.
- Helper ``_extract_answer_text_from_artifact`` in services.py.

Migration
---------
- New alembic head ``d8e6c2b4f1a9`` (revises ``c8f2d34a51e7``):
  drops the ``agent_artifact`` table + its three indices and the
  two ``agent_turn`` artifact_id columns + their indices. Downgrade
  reconstructs both for rollback symmetry.

Caller sweep (``AgentArtifact`` / ``artifact_service`` /
``snapshot_assembler`` / ``extract_error_text`` /
``assemble_parts_from_artifacts`` / ``answer_artifact_id`` /
``reference_bundle_artifact_id``): zero remaining Python call-sites
in ``aperag/`` or ``tests/`` (only docstring narratives describing
the removal + a single bash comment in
``tests/e2e_http/scripts/run_chat_collection_flow.sh``).

Tests
-----
- Rewrote ``test_agent_runtime_v3.py`` + ``test_chat_service.py`` +
  ``test_chat_completion_service.py`` to drive the new code paths
  via a minimal in-memory ``_FakeUIMessageStore``; dropped legacy
  ``_FakeArtifactService`` and ``query_agent_artifacts_by_turn``
  mocks.
- Added ``test_agent_runtime_views_no_artifact_route`` regression
  guard pinning the deletion of the ``/agent/artifacts/{id}`` route
  + ``get_artifact_view`` symbol.
- ``test_agent_runtime_openapi_contract.py`` now asserts the
  artifact route + ``AgentArtifactEnvelope`` schema are absent from
  the FastAPI OpenAPI spec.
- Full unit suite: 989 passed, 29 skipped, ruff + format clean.

Out of scope (chunk-3 per PM Option 2 lock)
-------------------------------------------
- ``agent_timeline_event`` table removal + replay/reload semantic
  change. Wire-emitter event sourcing stays as-is in this PR.
…lization + retrieval handle exposure + D10 hurl coverage (#1721)

* feat(phase9 #100 D10.h): cutover blocks A/B/D/F-partial — search legacy delete + web_search canonicalize + cursor narrative

Per design pack §G D10.h + amendment-#2 + PM scope lock (msg=dc63c7e6):

Block A — search_collection / search_chat_files hard-cut + caller sweep:
- aperag/mcp/server.py: delete `search_collection` (line 295-465 ~170 LOC) + `search_chat_files` (468-583 ~116 LOC) bodies; rewrite the `aperag_usage_guide` resource docstring to document the canonical D10 split tools (vector_search / graph_search / fulltext_search / web_search) instead of the deprecated omnibus surface; replace the "deferred to D10.h cutover" NOTE comment with a closure-state note pointing at the split tools.
- aperag/domains/agent_runtime/services.py: update `_KNOWLEDGE_SEARCH_TOOLS` from `{"list_collections", "search_collection"}` to the split-tool set `{list_collections, list_documents, vector_search, graph_search, fulltext_search}`; expand `_READING_TOOLS` to cover the 6 D10.c read primitives so user-activity routing tracks the post-cutover surface end-to-end.

Block B — web_search §B.4 canonicalization:
- aperag/mcp/tools/search_web.py: `query` is now positional + required (raises ValueError on empty); every other parameter is keyword-only; the result-count limit is named `top_k` (not `max_results`); `source` is `str | None` so a missing domain filter is null. The wrapper still passes `max_results` to the internal `/api/v2/web/search` payload; the backend rename is intentionally out of this lane (PM constraint #2 msg=309b3ed3 "the goal is to close out the cutover contract, not to rewrite the hurl framework").
- tests/unit_test/mcp/test_search_split.py: replace the deferred-shape signature pin with `test_web_search_signature_matches_b4_canonical` — asserts query is required, the 4 other params are kw-only with the canonical names + defaults, and the legacy `max_results` parameter is gone.
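
The canonical signature shape from Block B can be sketched as follows. Only two of the keyword-only parameters are shown, the default values are assumptions, and the function returns the internal payload instead of performing the HTTP call:

```python
import inspect
from typing import Any, Dict, Optional


def web_search(query: str, *, top_k: int = 5, source: Optional[str] = None) -> Dict[str, Any]:
    """Build the internal request payload (the HTTP call itself is omitted)."""
    if not query:
        raise ValueError("query must be a non-empty string")
    # Public name is top_k; the internal payload still says max_results.
    return {"query": query, "max_results": top_k, "source": source}


# A signature pin in the spirit of the unit test: everything after query is kw-only.
params = inspect.signature(web_search).parameters
keyword_only = [n for n, p in params.items() if p.kind is inspect.Parameter.KEYWORD_ONLY]
```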

Block D — stale narrative cleanup:
- aperag/mcp/cursor/invariants.py: drop the "(pending architect canonical lock per msg=441c5e56)" parenthetical now that #1710 has merged the canonical lock to main.

Block C (retrieval `chunk_id` / `section_path` / `heading_anchor` propagation per Drift #1) is left for the architect to author on this same branch per msg=b9e7f91e cross-lane offer; block E (D10 hurl coverage suite) follows after C lands.

#80 territory deliberately untouched (snapshot_assembler / RedisChatMessageHistory / StoredChatMessagePart / chat/history / utils/history) per PM msg=309b3ed3 #100/#80 disjoint lock.

* feat(phase9 #100 D10.h): block E — D10 hurl coverage suite

4 new hurl files exercise the post-cutover MCP wire surface end-to-
end; mirror 21_d10_capabilities's tools/list + substring-contains
pattern (the wire is JSON-RPC framed inside SSE, so jsonpath would
skip the envelope):

- 22_d10_read_primitives.hurl — pin the §A.1-§A.8 input schema names
  for every paginated and metadata read primitive plus the
  ``read_document`` byte-range parameters and the ``read_document_chunk``
  stable handle.
- 23_d10_pagination.hurl — pin §C.5 PaginationParams (cursor + limit)
  inputs and PaginationResult (items + next_cursor + total_count)
  outputs on every paginated read primitive.
- 24_d10_search_split.hurl — pin the §B.1-§B.4 split-search surface
  and the §B.4 canonical parameter set (top_k present, max_results
  intentionally absent). Includes assertions on
  chunk_id/section_path/heading_anchor that exercise the retrieval-
  propagation block authored by the architect on this same branch —
  these will fail until that commit lands, which is the intended
  cross-block gate.
- 25_d10_cutover.hurl — cutover gate: every canonical D10 tool name
  is still on tools/list, the legacy search_collection / search_chat_files
  omnibus pair is gone. A future regression that re-registers either
  legacy tool fails CI loudly before merge.

Per PM constraint #2 (msg=309b3ed3) the hurl coverage matches this
hard-cut one-for-one — no infra refactor, no new transport, just
substring-contains over the existing ``/mcp/`` mount.

* feat(phase9 #100 D10.h Block C): expose chunk_id / section_path / heading_anchor on SearchResultMetadata

Per amendment-#2 Drift #1 (msg=ebfcdabe) + PM final scope lock
(msg=dc63c7e6 / msg=5760999e) — D10.h cutover Block C: surface the 3
LOCKED §A.9 stable handle fields on the retrieval domain public
allowlist so external Agents (Claude Code / Codex / Cursor) can
navigate from search hits back to the canonical D10.c read primitives
(``read_document_chunk(chunk_id)`` / ``read_document_section(section_path)``).

## Changes

- ``aperag/domains/retrieval/schemas.py``:
  - ``SearchResultMetadata`` allowlist adds 3 fields (``chunk_id``,
    ``section_path``, ``heading_anchor``). The model keeps
    ``extra="forbid"`` — adding the fields does not relax the
    allowlist.
  - ``SearchResultMetadata.from_raw()`` extends extraction so the
    3 fields propagate from upstream backends that already include
    them (e.g. ``aperag/domains/indexing/fulltext_index.py:541-553``
    already surfaces ``chunk_id``).

- ``tests/unit_test/domains/retrieval/test_search_result_metadata.py``
  (NEW) — 7 tests pin the contract:
  1. Each of the 3 fields is constructible.
  2. Unknown keys still rejected (``extra="forbid"`` regression).
  3. ``from_raw()`` extracts all 3 when present.
  4. Missing fields surface as ``None`` (upstream propagation gap is
     not a schema break).
  5. Non-string / empty values filtered so the public surface never
     carries a numeric chunk_id or empty string.
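
The extraction rule pinned by the tests above can be sketched like this. The dataclass is a reduced stand-in for the Pydantic `SearchResultMetadata` model (its other fields and `extra="forbid"` are omitted), and treating all three handles as plain strings is an assumption:

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

_HANDLE_FIELDS = ("chunk_id", "section_path", "heading_anchor")


@dataclass(frozen=True)
class SearchResultMetadataSketch:
    chunk_id: Optional[str] = None
    section_path: Optional[str] = None
    heading_anchor: Optional[str] = None

    @classmethod
    def from_raw(cls, raw: Mapping[str, Any]) -> "SearchResultMetadataSketch":
        picked = {}
        for name in _HANDLE_FIELDS:
            value = raw.get(name)
            # Keep only non-empty strings; anything else surfaces as None.
            if isinstance(value, str) and value:
                picked[name] = value
        return cls(**picked)
```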

## Out of scope (per PM msg=5760999e #4 constraint)

Indexing-layer attachment of ``section_path`` / ``heading_anchor`` to
chunk metadata at index time is NOT included here. Per the constraint,
multi-indexer expansion (vector / fulltext / graph / summary / vision
each writing section context) would balloon the write-set; we keep
D10.h Block C as a 1-2-touch retrieval-domain surface change. The
3 fields surface as ``None`` until the indexing-layer enhancement
lands in a follow-up. ``chunk_id`` populates immediately via the
existing fulltext index ``_source.chunk_id`` propagation
(``fulltext_index.py:541-553``).

## §G hard gate compliance

- #1 (3-root grep): no caller assertion drift on
  ``SearchResultMetadata`` shape outside the allowlist additions
  (allowlist additions are additive).
- #5 (cross-stack): only ``aperag/domains/retrieval/schemas.py`` +
  ``tests/unit_test/domains/retrieval/`` touched on the architect
  side; Bryce's Block A/B/D commits cover the rest of D10.h scope.

Block C complement to architect+Bryce co-own #100 (msg=a17a4017
execution split: architect commits Block C on shared branch).

* test(phase9 #100 D10.h): caller migration assertion semantics for the cutover

Per §G hard gate #4 the cutover lane is the right place to update the
test surface that previously pinned the legacy ``search_collection``
behaviour, so the assertions match the post-cutover reality:

- tests/unit_test/mcp/test_search_split.py:
  drop the two ``[DEPRECATED]`` banner tests and the two
  body-still-targets-v2 tests; replace them with two cutover-removal
  tests (``test_search_collection_legacy_omnibus_removed_from_module``
  / ``test_search_chat_files_legacy_omnibus_removed_from_module``)
  that pin both the runtime attribute absence and the absence of the
  ``async def`` in source. Inline the ``_async_def_source`` ast walk
  into the one remaining caller (``test_web_search_module_targets_v2_web_path``).
- tests/unit_test/test_mcp_server.py:
  rename ``test_search_collection_docstring_explains_step_and_failure_meaning``
  to ``test_search_collection_legacy_omnibus_no_longer_registered`` —
  the user-visible step language for the new split tools is covered
  in ``test_search_split.py`` so this file just pins the absence.
- tests/unit_test/test_mcp_contract.py:
  rewrite the ``search_collection`` URL-and-import invariants as
  ``test_legacy_search_collection_omnibus_stays_removed`` +
  ``test_search_result_legacy_import_stays_gone``. The original tests
  guarded the Phase 2 retrieval hard-cut (URL must be v2, SearchResult
  must come from the retrieval domain); the post-cutover invariant is
  that neither symbol re-enters the server module at all.
- tests/unit_test/agent_runtime/test_agent_runtime_v3.py:
  ``test_event_service_to_event_envelope_adds_user_activity_contract``
  switches its sample tool name from the removed ``search_collection``
  to ``vector_search`` so the user-activity inference contract is
  exercised against the canonical ``_KNOWLEDGE_SEARCH_TOOLS`` set
  updated in ``aperag/domains/agent_runtime/services.py``.

1001/1001 unit tests pass; ruff check + format clean.

* fix(phase9 #100 D10.h): Weston review — fixture migration + false-positive hurl gate + stale split-tool docstrings

Per Weston msg=8a691444:

1. tests/fixtures/mcp_agent.py
   The ``searcher`` agent's instructions still pointed at the omnibus
   ``search_collection()`` call — a real instruction surface, not just
   prose. After the cutover that tool is gone from MCP ``tools/list``
   so the fixture would teach the agent to call a non-existent tool.
   Migrate the instruction to the canonical D10 split-search flow:
   ``list_collections`` first, then compose ``vector_search`` /
   ``fulltext_search`` / ``graph_search`` per the question, and chain
   into ``read_document_chunk`` / ``read_document_section`` via the
   ``chunk_id`` / ``section_path`` handles on each ``SearchResultItem``.

2. tests/e2e_http/hurl/full/24_d10_search_split.hurl
   The three ``contains "\"chunk_id\""`` / ``"\"section_path\""`` /
   ``"\"heading_anchor\""`` assertions claimed to gate the
   ``SearchResultItem.metadata`` outputSchema, but the split-search
   tools return ``Dict[str, Any]`` whose FastMCP outputSchema is
   ``additionalProperties: true`` — those substrings would actually
   be matched by unrelated read-primitive input schemas (e.g.,
   ``read_document_chunk(chunk_id=...)``), making the hurl a
   false-positive gate. Drop the three assertions and add a header
   comment pointing at the proper Pydantic-level pin in
   ``tests/unit_test/domains/retrieval/test_search_result_metadata.py``
   (which the architect's block C already authored).

3. aperag/mcp/tools/search_{vector,graph,fulltext}.py module docstrings
   Each one still narrated the omnibus deprecation timeline ("the
   alias remains until D10.h cutover") even though D10.h is the lane
   that just deleted it. Rewrite the three module docstrings to
   describe the post-cutover state: each split tool is the sole
   public entry point for its recall mode, and the retrieval
   ``SearchResultMetadata`` allowlist now exposes the three §A.9
   stable handle fields. Non-blocking per Weston, but cleaner to
   land while we are touching this lane.

1001 unit tests pass; ruff check + format clean.

* fix(phase9 #100 D10.h): add missing [Asserts] block to D10 hurl files

CI e2e-http-provider failed at ``22_d10_read_primitives.hurl:52`` with
``the HTTP method <body> is not valid``. The four new hurl files I
added in block E left the body assertions hanging directly under
``HTTP 200`` without an ``[Asserts]`` block, so the hurl parser tried
to read each ``body contains "..."`` line as the start of a new
request (looking for an HTTP method like GET/POST/etc).

The existing ``21_d10_capabilities.hurl`` (which my files were
modeled on) does have ``[Asserts]`` after its final ``HTTP 200`` —
I missed that line when copying the pattern. Add it to all four:

- 22_d10_read_primitives.hurl
- 23_d10_pagination.hurl
- 24_d10_search_split.hurl
- 25_d10_cutover.hurl

Same hurl-only fix pattern as huangheng's #1719 follow-up
(`379d2535`): no production code touched, only the hurl assertion
framing.

* fix(phase9 #100 D10.h): drop false-positive output-schema assertions in 23_d10_pagination.hurl

CI run on head ``7d64991`` failed at
``23_d10_pagination.hurl:67`` and ``:68`` — same false-positive
pattern Weston already flagged for 24_d10_search_split.hurl
(msg=8a691444). The MCP tool functions in ``aperag/mcp/server.py``
are typed ``-> Dict[str, Any]``, so FastMCP exposes only
``"outputSchema": {"additionalProperties": true}`` on the
``tools/list`` wire — the ``items / next_cursor / total_count``
PaginationResult envelope field names never reach the body, and
``body contains "\\"next_cursor\\""`` / ``"\\"total_count\\""``
genuinely fail.

The PaginationParams input names (``cursor / limit``) DO surface
on the wire because they are inputSchema parameters; those four
substrings stay. Replaced the misleading "FastMCP emits the
Pydantic model JSON schema" comment with a header note pointing at
the cursor unit suite, which is where the envelope-shape pin
already lives (``tests/unit_test/mcp/test_cursor_contract.py`` +
``tests/unit_test/service/test_pagination_helper.py``).

Same hurl-only fix pattern as the previous ``[Asserts]`` push and
huangheng's #1719 ``379d2535`` follow-up — no production code
touched.

---------

Co-authored-by: Bryce <bryce@aperag.local>
Co-authored-by: 符炫炜 <fuxuanwei@apecloud.io>
earayu and others added 26 commits April 30, 2026 13:01
…audit (#1928)

* docs(task-61): DB adapter compat spec v1 — vector + graph cross-impl audit

Architect spec v1 drafted per earayu2 directive (msg=8b989470 / msg=2bad8e75
/ msg=f26b703e) + PM 不穷 task #72 dispatch.

Streaming evidence integration from 8 lanes:
- huangheng msg=ed2f2973: 3 vector P0 candidates (cross-tenant /
  filter silent / collection init)
- Bryce msg=8e895471 task #69: 11 vector findings (4 P0 + 3 P1 + 4 P2,
  including upgraded score normalization P0-V3/V4)
- 冬柏 msg=3e93bb64 task #67: 3 missing Protocol method tests
  (bulk_upsert_entity_with_lineage_parts P0 + remove_relation_lineage
  P1 + list_entities P1)
- chenyexuan msg=f298011e + PR #1926: workflow paths filter dead
  reference P0-W1 (in flight)
- cuiwenbo msg=dfebf706 task #70: FE/UX 3 candidates (score, viz error
  vs empty, confidence_score)
- Planetegg msg=db7fb085 + msg=41906f4 + msg=41665d7e task #65: alias
  resolution gather P2-S1 + Singapore QDRANT_MULTITENANT=True (no
  hot-fix needed) + env shape verify
- ziang task #64 graph store audit (in_progress, will fold-in)
- dongdong task #71 deploy/typed schema (in_progress, will fold-in)

Spec structure:
- §1 inventory by lane with file:line evidence
- §2 gaps by severity (P0 CRITICAL hot-fix candidate / P0 must-fix / P1
  declared allowed differences / P2 performance optimization / YAGNI)
- §3 three-layer design direction per Weston msg=85e527e3 framework
- §4 sub-task dispatch (Phase A 8 lane parallel + Phase B per-P0
  three-PR-pattern + Phase C P2 + Phase D PR #1926 unblock)
- §5 acceptance: P0/P1 standards + boundary test gate + e2e + sample
  limitation disclaimer
- §6 CR mandatory checklist citing Lesson #11-#16 family from
  PR #1916/#1924/#1922 sediment + new Lesson #16 candidate (workflow
  paths dead reference)

Sample limitation: spec evidence comes from the streaming surface, not
from the gap list collected by huangzhangshu; fix-forward amend after the
huangzhangshu lane completes and the Bryce/ziang audit slices are output.

Not blocking: PR #1925 task #30 B3 default=2, PR #1926 compat-test
paths filter, Singapore 2pm release (env fix separate lane), task #31
graph node merge / task #33 P3 workflow gate.

* docs(task-61): fix-forward Weston BLOCKER + 5 streaming integration

Weston msg=13dd5e91 BLOCKER (score normalization severity drift):
keep P0-V3+V4 at P0 across §1.1 / §2.2 / §5.3. Score direction is a
hard contract of caller semantics and must not appear reversed between
PGVector and Qdrant. §2.2 adds an explicit P0-V3+V4 row + §5.3 adds a
test_score_normalization_in_vector.py boundary test (parametrized over
all 6 metric × adapter cells).

Streaming integrations (5 lane):

1. Bryce msg=23a2f514 P0-V1 first-principles re-classification: Qdrant
   legacy-mode tenant isolation is at the collection-name level, not the
   query-filter level (verified at qdrant_connector.py:442-446), so it is
   downgraded to P1-V4 defense-in-depth (candidate for a legacy-mode
   deprecation follow-up).

2. Bryce msg=8e895471 11 vector findings: 4 P0 (cross-tenant,
   downgraded / filter silent / score V3+V4) + 3 P1 (collection init /
   batch atomicity / filter Or semantics) + 4 P2.

3. dongdong msg=4201465a + PR #1929 + cuiwenbo msg=bcec38ad —
   P0-D1 Helm worker Neo4j env missing (Singapore graph viz
   root-cause); P1-D1 e2e shape matrix gap; P1-D2 Nebula no Helm
   first-class; P1-D3 typed schema lacks vector backend exposure.

4. chenyexuan NIT — Lesson #16 candidate cite added §6.

5. Planetegg msg=eb9de4b0 NIT: P2-S1 quantification, max_nodes*2 default
   1000→2000 / hybrid default 1000 max 5000; msg ID corrections
   §7 (msg=41665d7e Singapore multitenant verify, msg=eb9de4b0
   P2-S1 quantification, dropped invalid msg=ec358a3e).

冬柏 PR #1927 commit b2234ae fold-in §5.3 (38 cases incl
zero-side-effect + replay idempotency post-NIT).

P0 list final: P0-V2 (filter silent, Bryce P0-A) + P0-V3+V4
(score normalization, Bryce P0-B) + P0-G1 (bulk_upsert, 冬柏
PR #1927) + P0-W1 (compat-test paths, chenyexuan PR #1926) +
P0-D1 (Helm Neo4j env, dongdong PR #1929).

* docs(task-61): § 3.1.1 historical residue cleanup per Weston msg=fdf04a69 NIT: strike old P0 hot-fix path (P0-V1 already downgraded to P1-V4 per Bryce first-principles verify)

* docs(task-61): final consistency cleanup per Weston msg=e414d3cf — line 14 count 4+3+4 to 3 P0 + 4 P1 + 4 P2; § 5.1 P0-V1 line removed; § 5.2 P1-V4 defense-in-depth boundary test added
…on (task #61 P0-A + P0-B) (#1930)

* feat(vectorstore): cross-adapter filter fail-loud + score normalization (task #61 P0-A + P0-B)

Closes the two task #61 vector-adapter contract gaps PM @不穷 dispatched
to me (msg=a387a81e) and architect @符炫炜 ratified (msg=7646eb4f),
collapsed onto a single PR per Weston's contract-matrix scope (msg=8beffab5).

P0-A — filter fail-loud
-----------------------
* Add ``UnsupportedFilterError`` to ``aperag.vectorstore.base`` as a
  cross-adapter exception type. Subclasses ``TypeError`` so existing
  ``except TypeError`` callers (pgvector translator pre-this-PR) keep
  working unchanged.
* Qdrant ``_normalize_filter_input`` now raises instead of logging a
  warning + ``return None``. The previous behaviour silently dropped
  the filter and degraded the search into a tenant-wide unfiltered scan
  — a correctness footgun, not graceful degradation.
* Pgvector ``_SqlFilter._walk`` re-types its raise to the same exception
  so both backends fail the same way on the same input.
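
A minimal sketch of the fail-loud behaviour described in P0-A. Only the
``UnsupportedFilterError`` name and its ``TypeError`` base come from this
commit; ``normalize_filter_input``, ``search_with_filter`` and the accepted
dict shape are hypothetical stand-ins, not the adapters' real helpers.

```python
from typing import Any, Optional


class UnsupportedFilterError(TypeError):
    """Cross-adapter error for filters a backend cannot translate."""


def normalize_filter_input(raw: Any) -> Optional[dict]:
    """Hypothetical stand-in for an adapter's filter normalizer.

    ``None`` means "no filter requested". Anything the adapter cannot
    translate raises instead of being silently dropped.
    """
    if raw is None:
        return None
    if isinstance(raw, dict) and all(isinstance(k, str) for k in raw):
        return dict(raw)
    raise UnsupportedFilterError(f"unsupported filter type: {type(raw).__name__}")


def search_with_filter(raw_filter: Any) -> str:
    """Shows that pre-existing ``except TypeError`` callers keep working."""
    try:
        flt = normalize_filter_input(raw_filter)
    except TypeError as exc:  # also catches UnsupportedFilterError
        return f"rejected: {exc}"
    return f"searching with filter={flt}"
```

Because the new exception subclasses ``TypeError``, a caller written
against the old pgvector translator does not need to change its
``except`` clause to benefit from the stricter Qdrant behaviour.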

P0-B — score normalization onto [0, 1] with higher = better
-----------------------------------------------------------
* Add ``normalize_score(metric, raw)`` and inverse
  ``denormalize_threshold_to_native(metric, normalized)`` to
  ``aperag.vectorstore.base``. Cosine clamps to [0, 1]; euclid maps
  ``-L2`` via ``1/(1+L2)`` onto (0, 1]; dot uses a numerically-stable
  sigmoid onto (0, 1). All three transforms are monotone so top-k
  ordering is preserved versus the raw form.
* Both adapters apply ``normalize_score`` before constructing
  ``SearchHit`` and use ``denormalize_threshold_to_native`` to push
  ``QueryRequest.score_threshold`` down to the native query (SQL
  ``WHERE score >= …`` / Qdrant ``score_threshold=``) so the server-
  side cutoff is exactly equivalent to a Python post-filter on the
  normalized score. A belt-and-braces post-filter catches any inverse-
  roundoff drift so the [0, 1] contract holds exactly.
* ``SearchHit.__post_init__`` now validates ``0.0 <= score <= 1.0`` so
  any future direct-build path that bypassed an adapter's normalization
  surfaces at the DTO boundary instead of polluting downstream
  score-threshold logic.
* ``base.VectorStoreConnector`` docstring + ``search()`` contract
  updated to spell out the §5/§6 invariants.
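
The three transforms in P0-B can be sketched as follows. The function
names and the per-metric formulas follow the commit text; the edge-case
handling in the inverse (which thresholds map to ``-inf`` / ``+inf``) is
an assumption made for illustration and may differ from
``aperag.vectorstore.base``.

```python
import math

_METRICS = ("cosine", "euclid", "dot")


def normalize_score(metric: str, raw: float) -> float:
    """Map a canonical higher-is-better raw score into [0, 1]."""
    if metric == "cosine":
        return min(1.0, max(0.0, raw))
    if metric == "euclid":
        l2 = max(0.0, -raw)  # canonical raw is negative L2
        return 1.0 / (1.0 + l2)
    if metric == "dot":
        # Numerically stable sigmoid: never exponentiate a large positive.
        if raw >= 0:
            return 1.0 / (1.0 + math.exp(-raw))
        e = math.exp(raw)
        return e / (1.0 + e)
    raise ValueError(f"unknown metric: {metric}")


def denormalize_threshold_to_native(metric: str, normalized: float) -> float:
    """Inverse of normalize_score, used for server-side threshold pushdown."""
    if metric not in _METRICS:
        raise ValueError(f"unknown metric: {metric}")
    if normalized <= 0.0:
        return -math.inf  # assumed: every hit passes, threshold can be omitted
    if normalized > 1.0 or (metric == "dot" and normalized == 1.0):
        return math.inf  # assumed: no hit can pass
    if metric == "cosine":
        return normalized
    if metric == "euclid":
        return 1.0 - 1.0 / normalized  # equals -(1/normalized - 1)
    return math.log(normalized / (1.0 - normalized))  # logit
```

For example, a euclid hit at L2 distance 3 has canonical raw ``-3.0`` and
normalizes to ``0.25``; pushing ``score_threshold=0.25`` back down yields
``-3.0`` again, so the server-side cutoff and a Python post-filter agree.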

Tests
-----
* New ``tests/unit_test/vectorstore/test_score_normalization.py``: range
  invariants per metric, ordering preservation, denormalize→normalize
  roundtrip on (0, 1), endpoint behaviour (-inf / +inf clamps for
  pushdown), and ``UnsupportedFilterError isinstance TypeError``.
* Existing translator unit tests updated to assert the cross-adapter
  exception type while still asserting ``TypeError`` for back-compat.
* ``tests/integration/compat/test_vector_compat.py`` adds three new
  cross-backend cases (filter fail-loud, score in [0, 1], threshold
  direction, top-k ranking monotone) so the contract is pinned across
  PGVector × Qdrant under compat-test, not just per-adapter.

Per spec PR #1928 § 2.2 / § 5.3, follow-up boundary test sub-PR by
@huangheng will extend the parametrize fixture to cover the full
PGVector × Qdrant × {cosine, euclid, dot} 6-cell grid; this PR ships
the cosine cell (the only metric currently exercised by the compat
fixture) plus the per-metric unit tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(vectorstore): annotate cosine-tuned default score thresholds (huangheng NIT 1)

huangheng PR #1930 line-level CR (msg=5eb7315c) NIT 1 fold-in: caller
chain audit surfaced that all three in-tree default thresholds
(``DEFAULT_VECTOR_SCORE_THRESHOLD = 0.72`` × 2 + retrieval ``score_threshold = 0.5``)
were tuned on cosine-distance embeddings. After P0-B normalization the
[0, 1] number is directly comparable across adapters but the *intent*
is still cosine-grade strictness — collections that pick ``euclid`` or
``dot`` distance may want to override.

This commit only adds explanatory docstrings; no behaviour change.
The metric-aware default refactor (Lesson #12 v7.3 cross-PR default
value alignment family) stays as a follow-up sub-PR per huangheng's
non-blocker NIT framing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(vectorstore): negate Qdrant euclid raw at adapter boundary (Weston BLOCKER)

Weston msg=86e05a8e caught a real bug in PR #1930's P0-B implementation:
``normalize_score("euclid", raw)`` assumes the canonical "negative L2,
higher=better" raw form (which pgvector's ``_score_expr = -(<->)``
produces directly), but Qdrant returns positive L2 distance natively
(smaller=better). Result: every Qdrant euclid hit was clamped to L2=0
→ score=1.0, and a tight ``score_threshold=0.9`` returned an empty
list because the inverse threshold was a negative number that Qdrant
re-interpreted as a positive-L2 *upper* bound (vacuous).

Per architect msg=06902347 + huangheng msg=99b52499, fix-forward Option
A: keep the shared ``normalize_score`` / ``denormalize_threshold_to_native``
helpers' contract (input is canonical "higher-is-better raw", output is
[0, 1]) and convert at the Qdrant adapter boundary for the asymmetric
metric. Cosine + dot agree on convention across both backends so they
need no boundary work; only euclid is asymmetric.

Changes
-------
* ``aperag/vectorstore/qdrant_connector.py``:
  * ``search()`` now negates ``p.score`` before calling
    ``normalize_score`` when the metric is euclid.
  * Threshold pushdown: when the metric is euclid, the helper-returned
    "negative L2" gets flipped back to a "positive L2 upper bound"
    before passing to Qdrant's native ``score_threshold``. Pre-existing
    ``+inf`` (return empty) / ``-inf`` (omit threshold) edge cases stay
    intact.
* ``aperag/vectorstore/base.py``: docstring for the score-normalization
  block now documents the canonical "higher-is-better raw" convention
  the helpers operate on, calls out the Qdrant euclid asymmetry
  explicitly, and pins the responsibility on adapters (math-only helper,
  adapters do raw → canonical conversion).
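
A rough sketch of the adapter-boundary conversion, assuming the euclid
formula from P0-B. The helper names below are hypothetical and do not
appear in ``qdrant_connector.py``; only the direction of each flip is
taken from this commit.

```python
import math


def normalize_euclid(canonical_raw: float) -> float:
    """Shared-helper stand-in: canonical raw is negative L2 (higher = better)."""
    l2 = max(0.0, -canonical_raw)
    return 1.0 / (1.0 + l2)


def euclid_threshold_to_canonical(normalized: float) -> float:
    """Inverse of normalize_euclid for thresholds in (0, 1]."""
    return 1.0 - 1.0 / normalized


def qdrant_raw_to_canonical(metric: str, qdrant_score: float) -> float:
    """Qdrant reports positive L2 for euclid; cosine and dot already agree."""
    return -qdrant_score if metric == "euclid" else qdrant_score


def canonical_threshold_to_qdrant(metric: str, canonical: float) -> float:
    """Turn a canonical negative-L2 threshold into a positive-L2 upper bound.

    Infinities pass through unchanged so the caller's existing handling of
    +inf (return empty) and -inf (omit threshold) is unaffected.
    """
    if metric == "euclid" and math.isfinite(canonical):
        return -canonical
    return canonical


# Near / mid / far points at L2 distance 0, 1 and 3.
scores = [normalize_euclid(qdrant_raw_to_canonical("euclid", d)) for d in (0.0, 1.0, 3.0)]
bound = canonical_threshold_to_qdrant("euclid", euclid_threshold_to_canonical(0.9))
```

With ``score_threshold=0.9`` the bound comes out at roughly ``0.11``, so
the L2=0 point is kept and the L2=3 point is dropped, which is the
direction the regression tests below pin.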

Tests (Weston requested cross-metric Qdrant-native verify)
----------------------------------------------------------
``tests/unit_test/vectorstore/test_score_normalization.py`` adds four
end-to-end Qdrant ``:memory:`` regressions:

* ``test_qdrant_euclid_normalized_scores_strictly_decreasing_with_distance``
  — pins Weston's exact failure mode: near/mid/far must produce
  strictly decreasing normalized scores.
* ``test_qdrant_euclid_score_threshold_filters_far_keeps_near`` — pins
  the threshold-pushdown direction: ``score_threshold=0.9`` must keep
  the L2=0 near point and drop the L2=3 far point.
* ``test_qdrant_dot_normalized_scores_strictly_increasing_with_inner_product``
  — explicit pin that dot is *not* asymmetric and a future refactor
  must not negate it accidentally.
* ``test_qdrant_cosine_normalized_scores_strictly_increasing_with_similarity``
  — completes the per-metric Qdrant pin so all three native conventions
  are documented next to each other.

Local ``uv run pytest tests/unit_test/vectorstore/`` → 146 passed, 10
skipped, 1 warning. Existing PGVector + cosine compat tests unchanged.

Sediment fold-in candidates per huangheng msg=99b52499:
* Lesson #12 v9 second-application demo (Weston msg=86e05a8e + Bryce
  msg=23a2f514, double-source) — first-principles verify catches
  surface-signal mistakes
* Lesson #12 v7 extension candidate — external API contract verify
  (Qdrant ``p.score`` raw convention vs in-tree docstring assumption)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…t spot) (#1925)

* feat(task-30-b3): lock graph_extraction_window_size default = 2

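The chief architect locks the sweet spot per earayu2 directive
msg=adb0c366: "A slight drop in quality is acceptable; the chief
architect should pick a sweet spot, with a default of at least 2, based
on cost-effectiveness."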

B2 evidence-grounded sweet spot analysis (Planetegg msg=096e0089 full
matrix + Weston msg=9ae48560 + Planetegg msg=a33607aa + architect
msg=08ebb696 + msg=f1feb2f1; all three parties converged):

- window=2 is stable across models (json_ok=1.0, source_valid≥0.992)
- Qwen entity -0.07 + relation +0.028, calls -50%, cost -26%, wall -44%
- Gemini entity -0.035 + relation +0.028, cost -17%, wall -13%
- window=3 dominated (Gemini json drift 1/6 + Qwen wall +20% + relation drop)
- window=5 model-specific (Gemini good, Qwen entity collapses to 0.754)

Changes:
- aperag/indexing/graph_extractor.py: _DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE
  1 → 2 + docstring fold sweet spot rationale + sample limitation
- aperag/schema/common.py: KnowledgeGraphConfig.graph_extraction_window_size
  description default 1 → 2 + recommended overrides (legacy=1 / Gemini=5)
- docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md § 4.2
  rewritten as the lock section + full B2 matrix data + sweet spot
  rationale + recommended collection-level overrides + sample-limitation
  disclaimer

Sample-limitation disclaimer: 3 benchmark documents are insufficient for
a per-model auto default; any future change requires ≥10 samples + ≥3
models with no regression on any of them + sign-off from PM, architect
and earayu2.

* docs(task-30): § 4.2.5 fix-forward per Planetegg msg=1106a78f NIT — defer indexing-retrieval-kg.md amend to follow-up; this PR scope = code default + Pydantic Field description + spec § 4.2 lock only

* docs(task-30): fix-forward Weston msg=1b7d9bef BLOCKER 1 + huangheng msg=bf785b12 NIT 1 — schema.d.ts default=2 align + § 3.1.1 line 85 default=1 → default=2 lock per § 4.2 sweet spot
…ss-backend (#1927)

* test(compat): task #61 P1 — bulk_upsert_entity_with_lineage_parts cross-backend

PM @不穷 elevated this Protocol method as a P0 audit gap (msg=10b753e8).
Until now ``bulk_upsert_entity_with_lineage_parts`` (Wave 8 W8-2) had
no cross-backend test in `tests/integration/compat/`, even though all
three production backends (Postgres / Neo4j / Nebula) implement it
and the indexing worker uses it for the LineageEntityMerger merge step.

Bulk write paths are exactly where backend differences emerge — batch
size limits, transaction atomicity, error handling, dedup contract —
and the lack of a parametrized matrix here meant any silent drift in
the bulk semantics would survive merge.

This adds 7 new parametrized cases that pin the Protocol contract
declared in `aperag/indexing/graph.py:575+`:

* empty parts is a no-op (no implicit row creation)
* mixed-name parts raise ValueError (atomicity guarantee)
* round-trip: 3 distinct (document_id, parse_version) parts visible after
* dedup last-wins within a single bulk call
* bulk replaces existing rows on matching key (same as single upsert)
* bulk with distinct keys appends, never wipes pre-existing lineage
* per-part entity_type follows last-wins rule
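
To make the pinned contract concrete, here is a toy in-memory model of
the seven rules above. The method name comes from the Protocol, but
``LineagePart``, its fields, the store class and the stored dict shape are
all hypothetical; the real signature in ``aperag/indexing/graph.py`` is
not shown in this PR and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LineagePart:
    # Hypothetical part shape; the real payload type is not shown here.
    name: str
    document_id: str
    parse_version: int
    entity_type: str
    description: str


class InMemoryLineageStore:
    """Toy backend that follows the contract the compat tests pin."""

    def __init__(self) -> None:
        self._entities: Dict[str, dict] = {}

    def get_entity(self, name: str) -> Optional[dict]:
        return self._entities.get(name)

    def bulk_upsert_entity_with_lineage_parts(self, parts: List[LineagePart]) -> None:
        if not parts:
            return  # empty input is a no-op: no implicit row creation
        if len({p.name for p in parts}) != 1:
            # Validate before the first write so a failure leaves no state.
            raise ValueError("all parts in one bulk call must share a name")
        entity = self._entities.setdefault(
            parts[0].name, {"entity_type": None, "parts": {}}
        )
        for p in parts:
            # Same (document_id, parse_version) key: last write wins.
            entity["parts"][(p.document_id, p.parse_version)] = p.description
            entity["entity_type"] = p.entity_type
```

Keying parts by ``(document_id, parse_version)`` is what makes a replay
idempotent in this model: re-sending the same bulk overwrites the same
keys instead of appending new lineage members.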

Coverage delta: 30 → 37 cross-backend cases (collect-only verified).

Sister to chenyexuan PR #1926 — without that workflow path fix, this
test never triggered on PRs that touch `aperag/indexing/graph_storage/*`.
Both PRs together restore real CI gating on cross-backend regressions
for the LineageGraphStore Protocol surface.

Part of task #61 DB compat audit (earayu2 directive msg=f26b703e),
testing-lane slice (task #67, claimed via msg=e02c3028).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(compat): task #61 P1 — fold huangheng+ziang NIT into bulk_upsert tests

Two non-blocking NITs from @huangheng msg=99b5ffd5 + @ziang msg=84f5c3cc
re-CR on PR #1927 — fold-in to land more complete test:

* `_rejects_mixed_names` now also asserts post-raise zero-side-effect
  (`get_entity("Alice") is None` + `get_entity("Bob") is None`) — pins
  Lesson #12 v6.4 aggregation-chain invariant: a backend that swapped
  validation order to raise AFTER the first row write would silently
  leak partial state.

* New `_replay_is_idempotent` case — pins the Protocol's "Forward-only
  retry safety: per-part dedup so replays are idempotent" contract.
  A backend that appended on replay (instead of dedup-then-replace)
  would silently duplicate lineage members under retry.

Coverage delta: 37 → 38 cross-backend cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(compat): task #61 P1 — fold huangzhangshu description_parts NIT

Per @huangzhangshu testing primary CR (msg=5bbc5d1a) — the bulk_upsert
cases pinned lineage member identity but did not assert
``description_parts`` text content. A backend could write the lineage
member key correctly but silently drop or stale-keep the description
text, breaking the agent context retrieval contract.

Add `description_parts` key→text assertions to 4 cases:

* `_round_trip` — all 3 (doc_id, parse_version) parts must carry their
  source bulk's description text (not silently dropped).
* `_dedup_last_wins_within_bulk` — same-key collapse must keep the
  LAST description text within the bulk (not first).
* `_replaces_existing_same_key` — bulk's strip-then-append must
  overwrite the prior single-write description (not silently keep it).
* `_replay_is_idempotent` — replay must overwrite first call's
  description with the second's (last-wins on replay), not just dedup
  the member.

Coverage delta: same 38 cases, but every dedup/replace/replay case now
pins both lineage AND description_parts text contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…#1932)

§ 四 adds 8 lesson sediments (cumulative evidence from task #30 B3 and the full task #61 P0 closure) + § 六 sediment references gain 6 PR commit cross-links + § 八 revision history gains this PR's fold trail.

New lessons:
- Lesson #12 v7.4: external API raw contract verify (task #61 P0-B PR #1930
  Qdrant euclid raw direction first-application + fix-forward 1e30a00)
- Lesson #12 v8 second-application: test docstring fake guardrail (task #61
  P0-G1 PR #1927 missing description_parts assertion, fix-forward 1953933)
- Lesson #12 v9: first-principles verify catch surface signal mistakes
  (task #61 P0-V1 reclassification by Bryce + task #61 P0-B Qdrant euclid
  catch by Weston; two independent sources converging on the same root,
  first/second-application)
- Lesson #13 v2.3: deploy manifest dual-side rewrite (task #61 P0-D1 PR #1929
  Helm Neo4j worker env first-application)
- Lesson #13 v3 application demo 2: cross-source default value alignment
  (task #30 B3 PR #1925 commit dae43f5, three sources synced,
  first-application)
- Lesson #14 application demo: multi-iteration cleanup of default drift
  inside the spec (task #30 B3 PR #1925 fix-forward dae43f5 § 3.1.1 line 85
  cleanup is the second-application demo; the first-application was the
  6 fix-forward rounds of task #35)
- Lesson #16: CI workflow paths filter dead reference anti-pattern (task #61
  P0-W1 PR #1926 first-application demo + Lesson #15 file-move 3-step verify
  upgraded to a v2 4-step verify that also greps .github/workflows/*.yml
  paths to keep them in sync)
- Lesson #17: converging the contract in the backend beats forking in upper
  layers (simple-stable + private-deploy paramount directive earayu2
  msg=1224bec8, applied when designing cross-adapter contracts; task #69
  P0-B + task #70 P1 candidate 1 converged in one pass across PRs,
  first-application)

Cross-PR catch trail (multiple independent sources, same root):
- Lesson #12 v9: Bryce msg=23a2f514 + Weston msg=86e05a8e, two independent sources
- Lesson #16: chenyexuan msg=f298011e + 冬柏 msg=3e93bb64, two independent sources
- Lesson #17: cuiwenbo msg=cedc7703 + Bryce msg=9895a148, two independent sources
- Lesson #13 v3 application demo 2: huangheng msg=bf785b12 + Planetegg
  msg=c63acbf5 + Weston msg=1e6b0838, three independent sources

per architect msg=c4cdf634 + msg=daaeeab5 + msg=03c892e0 sediment dispatch.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…n_window_size (#1933)

Codify Lesson #13 v3 (cross-source default value alignment) as a CI
unit test gate so future task-#30 B3-class drift is caught by
``cicd-push.yml`` lint+unit instead of by reviewers via fix-forward
rounds.

Background — task #30 B3 (PR #1925, merge ``43648f9``) locked
``graph_extraction_window_size`` default to ``2`` across **four**
sources that all need to agree:

1. ``aperag/indexing/graph_extractor.py``
   ``_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE`` (Python const, runtime
   fallback)
2. ``aperag/schema/common.py``
   ``KnowledgeGraphConfig.graph_extraction_window_size`` Pydantic
   ``Field(examples=[N])`` (OpenAPI / TS schema source)
3. ``web/src/api-v2/schema.d.ts`` JSDoc ``@example N`` (frontend client
   surface — committed to repo, can drift if regen skipped)
4. ``docs/zh-CN/architecture/task-30-graph-chunk-window-spec-v1.md``
   § 3.1.1 line 85 ``**B3 lock default `N`**`` + § 4.2
   ``**`graph_extraction_window_size = N`**`` (architectural source of
   truth that PRs CR against)

PR #1925 itself surfaced the drift class:
- Weston ``msg=1b7d9bef`` BLOCKER 1 caught ``schema.d.ts`` still
  carrying default ``1``
- huangheng ``msg=bf785b12`` NIT 1 caught § 3.1.1 line 85 still saying
  default ``1``
Both required a fix-forward commit (``dae43f5``).

Why a unit test (not a boundary test): ``tests/boundaries/`` is not
currently invoked by ``make test-unit`` / ``test-integration`` /
``cicd-push.yml`` (task #33 Layer 1 audit finding).
``tests/unit_test/`` runs on every push via ``make test-unit``. Per
simple-stable directive (earayu2 ``msg=1224bec8``), the cheapest
reliable gate is a unit test in the existing CI lane, not a new
workflow file.

Scope discipline: pins **default value parity** across four sources
only. Does not pin description text, override-recommendation phrasing,
or rationale wording. If a future change moves the default away from
2, the test fails with a list of all observed values per source plus
the procedural reminder (``≥10 samples + ≥3 models 同时不退步 + PM +
architect + earayu2 三方 confirm``).

Tests:

- ``test_graph_extraction_window_size_default_consistent_across_sources``
  — the main gate (asserts all 4 sources agree)
- ``test_graph_extraction_window_size_default_is_positive_integer`` —
  sanity (window assembler math requires ``>= 1``)
- ``test_individual_source_extractor_does_not_raise[*]`` — separates
  "extractor broken" failures from "values drifted" failures so
  operator immediately knows whether to fix test infra or schema
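
The shape of such a gate can be sketched as below. This is an
illustration only: the inline strings stand in for the four real files,
and the regexes, dict keys and function names are assumptions rather than
the contents of the actual test module.

```python
import re
from typing import Dict

# Inline stand-ins for the four real sources (contents are illustrative).
SOURCES: Dict[str, str] = {
    "python_const": "_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE = 2",
    "pydantic_field": "graph_extraction_window_size: int = Field(examples=[2])",
    "ts_schema": "/** @example 2 */ graph_extraction_window_size?: number;",
    "spec_doc": "**`graph_extraction_window_size = 2`**",
}

# One hypothetical extractor pattern per source.
PATTERNS: Dict[str, str] = {
    "python_const": r"_DEFAULT_GRAPH_EXTRACTION_WINDOW_SIZE\s*=\s*(\d+)",
    "pydantic_field": r"examples=\[(\d+)\]",
    "ts_schema": r"@example\s+(\d+)",
    "spec_doc": r"graph_extraction_window_size\s*=\s*(\d+)",
}


def extract_defaults(sources: Dict[str, str]) -> Dict[str, int]:
    """Extract one default per source; a miss means the extractor is broken."""
    observed: Dict[str, int] = {}
    for key, text in sources.items():
        match = re.search(PATTERNS[key], text)
        if match is None:
            raise LookupError(f"extractor broken for source {key!r}")
        observed[key] = int(match.group(1))
    return observed


def check_defaults_consistent(sources: Dict[str, str]) -> Dict[str, int]:
    """Fail with every observed value listed when the sources disagree."""
    observed = extract_defaults(sources)
    if len(set(observed.values())) != 1:
        raise AssertionError(f"default drift across sources: {observed}")
    return observed
```

Raising ``LookupError`` for a failed extraction and ``AssertionError`` for
a value mismatch mirrors the split between "extractor broken" and "values
drifted" described in the test list.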

Local validation:

- 5/5 pass in clean state
- Synthetic drift on each of (Python const / TS schema / spec § 3.1.1 /
  spec § 4.2) caught with clear actionable error message naming the
  drifting source
- Full ``tests/unit_test/contracts/`` 58/58 pass
- ruff format + ruff check clean

Sediment cross-link: this gate is the codified counterpart to
huangheng PR #1932 § 四 Lesson #13 v3 application demo 2 + Lesson #14
application demo (PR #1925 § 3.1.1 multi-iteration cleanup) — that PR
records the drift class as a CR-checklist lesson; this PR enforces it
mechanically so the lesson does not have to be remembered.

task #33 Layer 2 P3 (chenyexuan claim, in_progress) per PM dispatch
``msg=65465f9e``.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…1931)

task #31 spec v1 lock: the design doc for graph node merge scanning + background suggestion jobs lands in the repo.

## Design core

- **scope reframe**: extract / fix / extend the existing Wave 7 §K.12.4 full stack; do not build new
- **Independent queue family** `q:graph_curation_run`: the lane does not pollute Modality + DocumentIndex + reconciler and has its own push/pop API
- **Reconcile across three trigger strategies**: manual/cron full sweeps go through worker pop → generate_graph_curation_run_task; auto_post_ingest keeps the sync inline detect_for_sync path but follows the same description-free invariant
- **Reuse the GraphCurationSuggestion table**: no new merge_suggestion table; only extend it with 4 new status enum values + an evidence_refs field
- **State machine Option B (apply_pending + ACCEPTED legacy)**: pending → dismissed | rejected | apply_pending → applying → applied | apply_failed; the existing ACCEPTED status (the terminal status of the historical sync handle_action) is kept as legacy read-only, and the new async path is held to a zero-write gate
- **description-free 6 call sites + 1 apply path** (Wave 5 invariant): candidate_generation.py:43/179-181/196-197 + dto.py:59-65/101-105 + merge_candidate_detector.py:257-284 + :322-328 + lineage_merge.py:246-317 apply variant
- **LineageEntityMerger application-layer cross-backend contract** (the Protocol has no merge_entities; reuse LineageGraphStore primitives)
- **entity_type scope lock, three layers**: in v1 it is only a compatibility/penalty signal; suggestions tolerate near-matching types and display observed_types/type_conflict/suggested_entity_type; entity_type_alias as a standalone suggestion kind moves to Phase B/P1 follow-up #31-C3
- **Reuse the /graphs/merge-suggestions endpoint + extend SUGGESTION_ACTIONS with dismiss + Pydantic Field validator for confidence_score in [0,1]**
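
A minimal sketch of the Option B state machine, assuming the transitions
read exactly as the arrow chain in the list above. ``SuggestionStatus``,
``ALLOWED_TRANSITIONS`` and ``transition`` are hypothetical names, and the
string values are assumed; the real enum is
``GraphCurationSuggestionStatus``.

```python
from enum import Enum


class SuggestionStatus(str, Enum):
    PENDING = "pending"
    DISMISSED = "dismissed"
    REJECTED = "rejected"
    APPLY_PENDING = "apply_pending"
    APPLYING = "applying"
    APPLIED = "applied"
    APPLY_FAILED = "apply_failed"
    ACCEPTED = "accepted"  # legacy terminal status, read-only on the async path


S = SuggestionStatus

# Statuses missing from this map are terminal in the sketch.
ALLOWED_TRANSITIONS = {
    S.PENDING: frozenset({S.DISMISSED, S.REJECTED, S.APPLY_PENDING}),
    S.APPLY_PENDING: frozenset({S.APPLYING}),
    S.APPLYING: frozenset({S.APPLIED, S.APPLY_FAILED}),
}


def transition(current: SuggestionStatus, target: SuggestionStatus) -> SuggestionStatus:
    """Validate one async-path transition; ACCEPTED is never written here."""
    if target is S.ACCEPTED:
        raise ValueError("ACCEPTED is legacy read-only; the async path must not write it")
    if target not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Treating ``apply_failed`` as terminal is a simplification of this sketch;
whether the spec allows a retry from it is not stated in this summary.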

## All 8/8 lanes LGTM collected

- @bryce (msg=9e49d440): all 5 BLOCKERs cleared + entity_type scope lock + Migration chain consistency
- @weston (msg=ed202960 + 92dd89ff): five-category consistency sweep + entity_type three-layer architecture + Migration chain
- @huangzhangshu (msg=9a4cbd61 + 68783841): five categories of outdated wording cleaned up into Phase A/B gates + enum count micro-fix
- @ziang (msg=760b7341 + 0b761117): impl-lane 5 BLOCKER + state machine Option B + enum count
- @huangheng (msg=535de81b): Lesson framework v5/v6/v7/v8/v9/v13/v14/v16/v17 + Lesson #18 candidate cross-link + Migration chain ordering, all consistent
- @dongdong (msg=8316b45a): FE/UI scoped + entity_type FE-friendliness + state machine
- @Planetegg (msg=7d428e33): SRE/deploy Helm render gate symbolic lane assertion
- @cuiwenbo (msg=594fbd4f): 3 NIT (endpoint reuse + status enum FE typed schema sync + confidence_score [0,1] validator), all folded in

## CI 状态

- lint-and-unit ✅
- e2e-http-smoke 3/3 ✅  
- e2e-http-provider-preflight 3/3 ✅
- docs-only lite gate satisfied

## Related

- Does not block PR #1932 (huangheng sediment merged dc79aad) / PR #1933 (chenyexuan merged 1024ef9) / task #61 P1/P2 follow-up / task #11 GC orphan vector follow-up
- The 4 Phase A sub-tasks can be dispatched and started immediately after spec lock (recommended owners: A1+A3 Bryce/ziang / A2 ziang / A4 dongdong+cuiwenbo)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
task #31 Phase A2 implementation (ziang).

## Scope
- extend GraphCurationSuggestionStatus enum +5 new values: APPLY_PENDING / APPLYING / APPLIED / APPLY_FAILED / DISMISSED
  - DISMISSED added because the existing enum on main does not actually have it (verified by ziang's first-principles grep of main; the spec § 3.1.6 assumption was wrong)
- add graph_curation_suggestions.evidence_refs column + Alembic migration (revision 7a2b1c3d4e5f)
- add response_model to /graphs/merge-suggestions read/run/action endpoints (OpenAPI + FE typed schema regen)
- legacy ACCEPTED zero-write contract test (test_accepted_status_write_is_legacy_service_only): a grep gate pins that, across the whole codebase on main, only service.py may write ACCEPTED
- preserve legacy FE compatibility fields (suggestion_batch_id alias run_id, merge_reason alias reason, suggested_target_entity projection fallback) - per dongdong msg=99aa83ea BLOCKER fix-forward 3b447df

## Architect ratify
- All 4 spec boundary cross-checks pass (independent queue / reuse table+endpoint / description-free out of A2 scope / async apply state machine + ACCEPTED legacy zero-write gate)
- ziang first-principles grep main verify catch spec drift (wrong DISMISSED assumption) - Lesson #12 v9 second-application demo
- dongdong catch response_model legacy filter BLOCKER + ziang fix-forward resolved it with a projection layer - mini-pattern 19 candidate

## CI
- lint-and-unit ✅
- e2e-http-smoke 3/3 ✅
- provider-preflight 3/3 ✅
- e2e-http-provider 3/3 ✅

🤖 Architect ratify by Claude Code
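per earayu2 directive (PM msg=fb070544): after spec lock, publish an implementation plan in natural Chinese, written conversationally for non-technical readers, so the PM can dispatch work and collaborators can read it alongside the spec.

Doc contents:
- §0 What this task does (example: Apple Inc./苹果公司/苹果)
- §1 Current state (the Wave 7 §K.12.4 full stack already exists + a table of 5 issues to fix)
- §2 Detailed steps for the 4 parallel Phase A sub-tasks:
  - #31-A1 extract the worker lane (Bryce/ziang)
  - #31-A2 extend the status enum (ziang, including an explanation of the ACCEPTED legacy semantics)
  - #31-A3 description-free fixes at 6+1 sites (Bryce/ziang)
  - #31-A4 reuse the endpoint + frontend extension (dongdong+cuiwenbo)
- §3 entity_type boundary lock (v1: compatibility signal only)
- §4 Dispatch recommendation table
- §5 Phase B/C overview
- §6 Non-blocking list

Corresponding spec: task-31-graph-node-merge-spec-v1.md (PR #1931 merged 29b82e2)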

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…1941)

* refactor(task-31-a3): description-free graph_curation 7 call sites

Wave 5 description-NULL invariant (task #31 spec § 3.1.5): graph
extractor stopped emitting `description` text post Wave 5 task #5
(facts/vectors split). The dedup detection / scoring / snapshot /
accept-apply paths still read `entity.description` /
`compacted_description` / `description_parts` and would either
silently degrade scoring (always-empty bag-of-tokens) or leak stale
fragments from pre-Wave-5 rows into reviewer-facing suggestions.

Fix the 6 detector / snapshot call sites + 1 apply path enumerated
in the spec, plus 1 service-layer helper surfaced by the boundary
test grep gate:

  1. candidate_generation.py:38  entity_snapshot — drop description
  2. candidate_generation.py:179 _lexical_signals — drop description
                                  Jaccard token overlap
  3. candidate_generation.py:196 _pair_score — drop description
                                  scoring weight (signal no longer
                                  emitted; branch is dead)
  4. dto.py CurationEntity.from_lineage — set description="" instead
                                  of deriving from compacted /
                                  description_parts; keep field on
                                  the dataclass for back-compat
                                  with callers that still pass it
  5. merge_candidate_detector._description_text_for_scoring →
     _embedding_query_text — embed `<name> (<entity_type>)` (mirror
     of how the graph_vectors worker writes the entity vector,
     Wave 5 task #5 / #7); the legacy method always short-circuited
     to "" post Wave 5 so detection produced zero candidates
  6. merge_candidate_detector._to_legacy_entity — pass
     description="" instead of reading from entity
  7. merge_candidate_detector._snapshot — drop description key from
     persisted entity_snapshots payload
  +1 lineage_merge.py — add merge_entities_apply_description_free
     variant for the async accept-apply worker (task #31 § 3.1.5).
     Skips LLM unified description / Compactor pass /
     __curation_merge__ sentinel description write / vector embed
     write per the spec's «do not call» list. Legacy merge_entities path is
     preserved for manual sync API back-compat
     (Lesson #14 multi-iteration cleanup follow-up).
  +1 service._fetch_shadow_neighbors — replace
     `entity.description or entity.name` with `entity.name`;
     post Wave 5 the description is always "" so the fallback was
     a no-op, and reading description here violates the boundary
     gate.

Boundary gate (tests/boundaries/test_graph_curation_description_free.py,
4 AST-level assertions per spec § 5.2.a):

  - graph_curation_modules_do_not_read_entity_description
  - merge_candidate_detector_does_not_read_entity_description
  - lineage_merge_apply_description_free_does_not_read_entity_description
  - lineage_merge_apply_description_free_does_not_call_llm_or_compactor

Allowlist:
  - lineage_merge.merge_entities (legacy back-compat) excluded by file
  - dto.py field declaration excluded (annotation, not a read)
  - LineageMergeResult.compacted_description (non-entity result shape
    used by legacy sync handle_action API) excluded by base name

Wave-5 invariant codify pattern (Lesson #18 candidate, per huangheng
PR #1932 + chenyexuan PR #1933 first-application demo): lesson
sediment (cr-checklist § 四 Wave 5 description-NULL family) +
mechanical gate (this boundary test) — paired so future regressions
fail at CI not at review time.

Tests: 1466 unit + 104 boundary all green.
Risk: 0 production behavior change for legacy sync handle_action API
(merge_entities preserved); new accept-apply async path uses the
description-free variant exclusively.

Spec: docs/zh-CN/architecture/task-31-graph-node-merge-spec-v1.md § 3.1.5
Task: task #77 (Phase A3) under task #31 umbrella

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(task-31-a3): fold huangheng cr-checklist Lesson #14/#18 NITs

Per @huangheng cr-checklist Lesson #14 + #18 candidate cross-link verify
(msg=be330423) — 2 non-blocker NITs on PR #1941 fix-forwarded:

NIT 1 (service.py:244 deprecation marker):
  Add deprecation comment on the legacy sync ``handle_action()`` API
  return-shape line that reads ``merge_result.compacted_description``.
  Aligns with Lesson #14 «keep the old path + mark it deprecated» pattern
  (matches the ``lineage_merge.merge_entities`` deprecation marker
  added by the main commit), and explicitly cross-links the boundary
  test allowlist mechanism (``NON_ENTITY_BASE_NAMES``) so future
  grep-based audits don't dispatch on the read.

NIT 2 (boundary test docstring bonus catch cross-link):
  Add explicit Lesson #18 candidate second-application demo trail in
  ``tests/boundaries/test_graph_curation_description_free.py``
  module docstring — cite the ``service.py:845`` bonus catch
  (``text = entity.description or entity.name`` inside
  ``GraphCurationService._fetch_shadow_neighbors``) as canonical
  proof of the «lesson sediment + mechanical gate two-layer
  codification» value. The spec § 3.1.5 ratify (符炫炜 + Bryce +
  ziang + huangzhangshu + Weston multi-source review) listed exactly
  6+1 sites and every reviewer + spec author missed this 7th hidden
  read; the boundary gate caught it on first run, turning
  ``reviewer-as-detector`` into ``CI-as-detector`` per the
  Lesson #18 thesis.

0 production code change beyond comment / docstring text.

Tests: 4/4 boundary test pass + ruff format / check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(boundary): include dto.py in description-free AST scan

Per @huangzhangshu BLOCKER (PR #1941 testing-lane CR, msg=2deb5407)
+ @ziang second-source ratify (msg=f485803c) + @不穷 PM dispatch
(msg=a6cd42c9): the boundary gate
``test_graph_curation_modules_do_not_read_entity_description`` was
whole-file excluding ``aperag/graph_curation/dto.py`` to avoid
flagging the dataclass field declaration. But spec § 3.1.5 item 4
explicitly lists ``CurationEntity.from_lineage`` as one of the 6
description-free call sites, so the gate must catch future
regressions that re-introduce
``entity.compacted_description`` / ``entity.description_parts``
reads inside ``from_lineage``.

The whole-file exclusion was a false-positive prevention that
turned out to be unnecessary: the AST walker matches
``ast.Attribute`` reads only, and dataclass field annotations
(``description: str = ""``) are ``ast.AnnAssign`` nodes with
``target=ast.Name``, while constructor keyword args
(``cls(description="")``) are ``ast.keyword`` nodes — neither is
an ``ast.Attribute`` access on an entity object.
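
The node-type distinction can be checked with a few lines of ``ast``.
This is a sketch, not the real boundary test: ``FORBIDDEN_ATTRS``,
``find_description_reads`` and both sample sources are hypothetical, and
the ``ast.Load`` context check is an added assumption about what counts
as a read.

```python
import ast
from typing import List, Tuple

FORBIDDEN_ATTRS = {"description", "compacted_description", "description_parts"}


def find_description_reads(source: str) -> List[Tuple[int, str]]:
    """Return (line, attribute) for every forbidden attribute *read*."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.ctx, ast.Load)
            and node.attr in FORBIDDEN_ATTRS
        ):
            offenders.append((node.lineno, node.attr))
    return sorted(offenders)


# Field annotation (ast.AnnAssign) and keyword argument (ast.keyword):
# neither is an ast.Attribute, so neither is flagged.
CLEAN = '''
from dataclasses import dataclass

@dataclass
class CurationEntity:
    name: str
    description: str = ""

    @classmethod
    def from_lineage(cls, entity):
        return cls(name=entity.name, description="")
'''

# A regression that reads the attribute again is an ast.Attribute load.
REGRESSED = '''
def from_lineage(cls, entity):
    return cls(name=entity.name, description=entity.compacted_description)
'''
```

Running the detector on ``CLEAN`` yields no offenders, while ``REGRESSED``
yields one on its ``return`` line, which is the positive / negative
control pair the two sister-tests below encode.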

Drops the whole-file exclusion and adds two reinforcing
sister-tests so future maintainers do not regress this:

* ``test_dto_module_is_in_boundary_scope`` — synthetic-AST
  positive control: feeds a fake ``from_lineage`` body that reads
  ``entity.compacted_description`` through the same offender
  detector and asserts the offender is surfaced. If a future
  refactor breaks the AST walker, this test catches the silent
  protection-loss.
* ``test_dto_field_declaration_is_not_a_false_positive`` — live
  negative control: confirms the production ``dto.py`` produces
  zero offenders, with a docstring directing future maintainers
  to fix the walker (NOT re-allowlist the file) if a false-
  positive is ever observed.

6/6 boundary tests pass + ruff format / check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…rker lane (#1938)

* feat(indexing): task #31 Phase A1 — independent graph_curation_run worker lane

Closes task #75 per PM @不穷 dispatch (msg=4068e5e2). Implements spec
``task-31-graph-node-merge-spec-v1.md`` § 3.1.1 + § 3.1.1.b + § 5.2.a:
extract the graph node merge suggestion full-sweep run from the
API-process ``asyncio.create_task(asyncio.to_thread(...))`` fire-and-
forget into a dedicated worker lane on the indexing-worker process,
with state isolation from the existing per-:class:`Modality` queue
family.

Why a dedicated lane (per ziang msg=92321bcc + Bryce msg=4c23f87e
BLOCKER 1)
-----------------------------------------------------------------
``WorkQueue.push/pop`` is keyed by :class:`Modality` (Redis key
``q:indexing:<modality>``) and the per-modality entrypoint machinery
is bound to :class:`ModalityWorkerFactory` + :class:`DocumentIndex`
payloads. ``GraphCurationRun`` is a per-collection / per-run job,
**not** a per-document modality state — strapping it onto the
modality keyed queue would pollute ``DocumentIndex`` / reconciler /
index_state semantics. So this PR builds a parallel queue family
(``q:graph_curation_run``) and a parallel run loop on top of the
shared :class:`RedisWorkQueue` connection / quota / metrics
infrastructure but with full state isolation.
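
As a rough illustration of the isolation property, the sketch below keeps both queue families in one backing store while never letting them share a key. The key strings come from this commit message; ``QueueFamiliesSketch`` and its synchronous methods are assumptions made for brevity (the real queue is async and Redis-backed).

```python
from collections import defaultdict, deque
from typing import Optional

# Key names as quoted in the commit message.
GRAPH_CURATION_RUN_KEY = "q:graph_curation_run"
MODALITY_KEY_TEMPLATE = "q:indexing:{modality}"


class QueueFamiliesSketch:
    """Hypothetical in-memory stand-in: two families, one store, disjoint keys."""

    def __init__(self) -> None:
        self._store: dict[str, deque] = defaultdict(deque)

    # Per-modality family
    def push(self, modality: str, payload: dict) -> None:
        self._store[MODALITY_KEY_TEMPLATE.format(modality=modality)].append(payload)

    def pop(self, modality: str) -> Optional[dict]:
        queue = self._store[MODALITY_KEY_TEMPLATE.format(modality=modality)]
        return queue.popleft() if queue else None

    # Per-run family; never touches a modality key
    def push_graph_curation_run(self, payload: dict) -> None:
        self._store[GRAPH_CURATION_RUN_KEY].append(payload)

    def pop_graph_curation_run(self) -> Optional[dict]:
        queue = self._store[GRAPH_CURATION_RUN_KEY]
        return queue.popleft() if queue else None
```

A payload pushed onto the run lane is invisible to every ``pop(modality)`` call and vice versa, which is the property the independence tests in this PR pin.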

Trigger reconcile (per spec § 3.1.1.b)
--------------------------------------
* **manual / cron** — ``GraphCurationService.start_run`` (API
  process) creates the ``GraphCurationRun`` row and enqueues the
  ``run_id`` onto ``q:graph_curation_run``. The worker pop calls
  ``generate_graph_curation_run_task`` integration path.
* **auto_post_ingest** — existing
  ``MergeCandidateDetector.detect_for_sync`` end-of-sync inline
  ``GraphModalityWorker.sync`` path, intentionally NOT routed
  through this worker. That stays as a write-only quick path and is
  made description-free by task #77 A3 (chenyexuan).

Changes
-------
* ``aperag/indexing/orchestrator.py``: extend ``WorkQueue`` Protocol
  with ``push_graph_curation_run`` / ``pop_graph_curation_run``;
  implement on :class:`InMemoryWorkQueue` and
  :class:`RedisWorkQueue`. New ``GRAPH_CURATION_RUN_KEY =
  "q:graph_curation_run"`` constant — distinct from the
  ``q:indexing:<modality>`` template, so a Redis ``KEYS`` audit can't
  confuse the two families.
* ``aperag/indexing/graph_curation_run_orchestrator.py`` (new):
  :func:`run_graph_curation_run_worker` async loop mirroring the
  ``run_parse_worker_loop`` shape — pop, decode via
  :class:`GraphCurationRunDispatchPayload`, dispatch
  ``generate_graph_curation_run_task`` on a worker thread (so the
  asyncio loop stays free), drain in-flight on shutdown.
* ``aperag/indexing/__init__.py``: re-export the new helpers.
* ``aperag/cli/indexing_worker.py``: add the new lane to the
  startup task list (independent ``asyncio.create_task``, NOT
  through ``_entrypoint(Modality, ...)``); update startup log to
  list 11 tasks. The lane is identified by symbolic name
  ``graph_curation_run`` — the boundary test asserts presence by
  name, never by count, so future lane add/remove doesn't drift the
  gate.
* ``aperag/graph_curation/service.py``: replace
  ``asyncio.create_task(asyncio.to_thread(generate_graph_curation_run_task, ...))``
  at ``service.py:114-123`` with a thin
  ``runtime.queue.push_graph_curation_run(payload)`` enqueue. Fail-
  loud if no runtime / queue is installed (rather than silently
  leaving the run PENDING forever) — matches the existing
  ``_mark_run_failed`` discipline.
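
The worker loop shape described above (pop, decode, dispatch on a thread, drain on shutdown) might look roughly like the following. It is a sketch under assumptions: ``run_worker_sketch`` and ``_process_one`` are hypothetical names, and an ``asyncio.Queue`` stands in for the Redis-backed queue.

```python
import asyncio
from typing import Callable


async def _process_one(handler: Callable[[str], None], payload: dict) -> None:
    """Run the blocking task on a worker thread; never let it kill the loop."""
    try:
        await asyncio.to_thread(handler, payload["run_id"])
    except Exception:
        pass  # the real worker logs here and keeps popping


async def run_worker_sketch(
    queue: asyncio.Queue,
    handler: Callable[[str], None],
    stop: asyncio.Event,
    poll_seconds: float = 0.01,
) -> None:
    in_flight: set[asyncio.Task] = set()
    while not stop.is_set():
        try:
            raw = await asyncio.wait_for(queue.get(), timeout=poll_seconds)
        except asyncio.TimeoutError:
            continue  # nothing queued; re-check the stop flag
        if not isinstance(raw, dict) or "run_id" not in raw:
            continue  # malformed payload: drop it
        task = asyncio.create_task(_process_one(handler, raw))
        in_flight.add(task)
        task.add_done_callback(in_flight.discard)
    if in_flight:
        # Shutdown: let already-dispatched runs finish before returning.
        await asyncio.gather(*in_flight)
```

Dispatching through ``asyncio.to_thread`` keeps the event loop free while a run executes, and the final ``gather`` gives in-flight runs a chance to complete on shutdown.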

Tests
-----
* ``tests/unit_test/test_app_lifespan_no_workers.py``: extend the
  positive contract list with ``run_graph_curation_run_worker``;
  add three task-#31-named tests pinning the dual-side gate per
  spec § 5.2.a:
  - positive: ``test_cli_worker_starts_graph_curation_run_lane``
  - negative: ``test_graph_curation_service_does_not_execute_run_inline``
    (greps ``service.py`` for any ``generate_graph_curation_run_task(``
    call site on executable lines — comments / docstrings are
    permitted to describe the historical pattern)
  - positive: ``test_graph_curation_service_uses_push_graph_curation_run``
* ``tests/unit_test/indexing/test_graph_curation_run_orchestrator.py``
  (new, 17 cases): payload roundtrip + key normalisation;
  in-memory queue independence (push to graph_curation_run does
  not leak into any modality queue and vice versa); Redis key
  constant distinctness; worker loop dispatch + malformed-payload
  drop + task-exception swallow + shutdown drain.

Local: ``uv run pytest tests/unit_test/test_app_lifespan_no_workers.py
tests/unit_test/indexing/ tests/unit_test/graph_curation/
tests/unit_test/vectorstore/`` → **544 passed, 10 skipped, 2 warnings**.

Spec / scope alignment
----------------------
* task #31 spec v1 § 3.1.1 independent queue family ``q:graph_curation_run`` ✅
* task #31 spec v1 § 3.1.1.b trigger split (manual/cron worker pop vs
  auto_post_ingest sync inline write-only) ✅
* task #31 spec v1 § 5.2.a lane symbolic dual-side gate
  (positive lane name appears in indexing-worker; negative API
  process must not invoke ``generate_graph_curation_run_task``
  directly) ✅
* Lesson #11 v5 entry-point migration cross-process parity (lane
  added to ``cli/indexing_worker.py`` startup; negative gate on
  ``app.py``) ✅
* Lesson #14 multi-iteration cleanup (the new symbolic lane
  assertion replaces the brittle "11th lane" count which several
  reviewers flagged in PR #1931 fix-forward 4-6) ✅

Follow-ups (NOT in this PR)
---------------------------
* task #77 A3 description-free 6+1 call site refactor — chenyexuan
* task #78 A4 FE / endpoint reuse + 7-state UI — dongdong + cuiwenbo
* huangheng follow-up sub-PR queue (sediment fold-in cycle)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(graph-curation): pin start_run enqueue behaviour (huangzhangshu PR #1938 CR gap)

huangzhangshu testing CR (msg=fe66bd72) caught: PR #1938 had source-
level grep gates on the API enqueue path but no behaviour tests. The
pre-A1 ``test_start_run_marks_failed_when_enqueue_raises`` was deleted
in Wave 3 T3.1 chunk 3 because the ``asyncio.create_task`` schedule
path could not raise. Phase A1 reintroduces a real failure path (the
``await runtime.queue.push_graph_curation_run(...)`` enqueue), so the
behaviour gate needs to come back.

Five new ``pytest.mark.asyncio`` cases covering the
``GraphCurationService.start_run`` post-transaction enqueue branch:

* ``test_start_run_enqueues_canonical_payload_when_created`` — pin
  the ``{run_id, collection_id}`` payload shape that the worker's
  :class:`GraphCurationRunDispatchPayload.from_dict` reads. Both
  fields must be ``str`` so ``from_dict`` normalisation is a no-op.
* ``test_start_run_does_not_enqueue_when_run_already_active`` —
  ``created=False`` (existing PENDING/RUNNING) MUST NOT re-enqueue.
  Without this, every duplicate API call would multiply Redis
  payloads and waste worker LLM quota.
* ``test_start_run_marks_run_failed_and_raises_when_enqueue_raises``
  — Redis push failure surfaces as ``RuntimeError`` raised, with the
  run row marked FAILED carrying the original exception in its
  reason. Silent success would leave the row in PENDING forever.
* ``test_start_run_marks_run_failed_and_raises_when_runtime_not_installed``
  — fail-loud guard for test environments / pre-startup boot
  (``runtime is None``).
* ``test_start_run_marks_run_failed_when_runtime_has_no_queue`` —
  symmetric: runtime present but ``queue=None`` (INLINE-mode test
  runtime).

The ``_FakeQueue`` stub captures pushed payloads and toggles
``raise_on_push`` for the failure path; the heavy collaborators
(``_get_and_validate_collection``, ``execute_with_transaction``,
``_mark_run_failed``, ``_run_to_dict``) are stubbed at the instance
level so the test exercises only the post-transaction enqueue
branch — matches the existing test-style in this file.
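
The post-transaction enqueue contract that these five cases pin can be summarised in a few lines. This is a hedged sketch, not the service code: ``enqueue_after_commit``, ``FakeQueue`` and the ``mark_failed`` callback are hypothetical stand-ins for the real collaborators.

```python
import asyncio
from typing import Callable, Optional


class FakeQueue:
    """Captures pushed payloads; optionally fails like a Redis outage."""

    def __init__(self, raise_on_push: bool = False) -> None:
        self.pushed: list[dict] = []
        self.raise_on_push = raise_on_push

    async def push_graph_curation_run(self, payload: dict) -> None:
        if self.raise_on_push:
            raise ConnectionError("redis unavailable")
        self.pushed.append(payload)


async def enqueue_after_commit(
    run: dict,
    created: bool,
    queue: Optional[FakeQueue],
    mark_failed: Callable[[object, str], None],
) -> dict:
    if not created:
        return run  # an active run already exists: do not enqueue again
    payload = {"run_id": str(run["id"]), "collection_id": str(run["collection_id"])}
    try:
        if queue is None:
            raise RuntimeError("no runtime / queue installed")
        await queue.push_graph_curation_run(payload)
    except Exception as exc:
        # Fail loud: record the reason on the row, then surface the error.
        mark_failed(run["id"], f"{type(exc).__name__}: {exc}")
        raise RuntimeError(f"failed to enqueue run {run['id']}") from exc
    return run
```

The important design point is that every failure path both marks the row FAILED and raises, so a run can never sit in PENDING with no payload behind it.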

Local: ``uv run pytest tests/unit_test/graph_curation/test_service.py``
→ **8 passed**.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(indexing): worker fail-safe mark-FAILED for stuck-in-transit runs (Weston PR #1938 BLOCKER)

Weston PR #1938 architecture CR (msg=04c9e5ee BLOCKER) caught a real
correctness bug in the post-A1 worker catch path: the comment claimed
"task-level failure already persisted in PG" but at least three
pre-``generate_run`` raise sites bypass the service-layer
``_mark_run_failed`` and leave the run row in ``PENDING``:

* ``aperag/graph_curation/integration.py:35-37`` — collection not
  found.
* ``aperag/graph_curation/integration.py:49-61`` — backend / vector /
  embedder resolution failure.
* ``aperag/domains/knowledge_graph/tasks.py:17-26`` — log + re-raise
  without marking FAILED.

Failure chain without the fix:
1. Worker pops the payload (Redis side already consumed).
2. ``generate_graph_curation_run_task`` raises before
   ``service.generate_run`` runs, so no ``_mark_run_failed`` runs.
3. Run row stays ``PENDING``; queue payload is gone.
4. Next ``start_run`` call sees an "active" PENDING run and returns
   ``created=False`` without re-enqueueing — the collection's manual
   full sweep is permanently wedged.

Fix
---
``_mark_run_failed_best_effort(engine, run_id, error_message)``:
``UPDATE graph_curation_runs SET status='FAILED', error_message=...
WHERE id=:run_id AND status IN ('PENDING', 'RUNNING')``. The WHERE
clause keeps this update idempotent w.r.t. ``generate_run`` having
already written FAILED inside its own try/except — only stuck-in-
transit rows get rewritten, finalised rows are preserved.

The fail-safe is itself wrapped in ``try/except`` so a brief PG
outage does not propagate up and halt the worker loop — the loop
must keep popping subsequent payloads regardless. Error message is
truncated to 1024 chars to stay polite to the ``Text`` column; the
full traceback is in the worker log via ``logger.exception``.

The ``_process_one_run`` catch path now logs + invokes the fail-safe
under ``asyncio.to_thread`` (sync DB I/O) before swallowing the
exception. ``engine`` is checked for ``None`` so the pure-unit tests
that don't pass an engine keep working.
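
A minimal sketch of the fail-safe follows, using ``sqlite3`` in place of the PostgreSQL engine so that it runs standalone. The table and column names follow the SQL quoted above; the ``COMPLETED`` status and the ``demo`` helper are assumptions for illustration only.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)
MAX_ERROR_CHARS = 1024  # truncation limit quoted in the commit message


def mark_run_failed_best_effort(
    conn: sqlite3.Connection, run_id: str, error_message: str
) -> None:
    """Flip a stuck PENDING/RUNNING run to FAILED; never raise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE graph_curation_runs "
                "SET status = 'FAILED', error_message = ? "
                "WHERE id = ? AND status IN ('PENDING', 'RUNNING')",
                (error_message[:MAX_ERROR_CHARS], run_id),
            )
    except Exception:
        # A DB outage must not propagate into the worker loop.
        logger.exception("fail-safe could not mark run %s FAILED", run_id)


def demo() -> dict:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE graph_curation_runs "
        "(id TEXT PRIMARY KEY, status TEXT, error_message TEXT)"
    )
    conn.executemany(
        "INSERT INTO graph_curation_runs VALUES (?, ?, ?)",
        [("stuck", "PENDING", None), ("done", "COMPLETED", None)],  # COMPLETED is hypothetical
    )
    for run_id in ("stuck", "done"):
        mark_run_failed_best_effort(conn, run_id, "ValueError: " + "x" * 5000)
    rows = conn.execute("SELECT id, status, error_message FROM graph_curation_runs")
    return {row[0]: (row[1], len(row[2] or "")) for row in rows}
```

The status predicate is what makes the update safe to call unconditionally: finalised rows do not match the ``WHERE`` clause, so only rows stuck in transit are rewritten.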

Tests
-----
* ``test_worker_loop_swallows_task_exception_and_marks_failed`` —
  upgraded the existing swallow test: after a raise the loop still
  processes the next payload AND ``_mark_run_failed_best_effort`` is
  invoked exactly once (only for the raising run) carrying the
  exception type + message in the reason.
* ``test_mark_run_failed_best_effort_only_updates_pending_or_running``
  — direct unit on the fail-safe SQL, asserts the
  ``status IN ('PENDING', 'RUNNING')`` predicate is present and the
  error message gets truncated to ≤ 1024 chars.
* ``test_mark_run_failed_best_effort_swallows_db_errors`` — a
  ``begin()`` that raises ``OSError`` MUST NOT propagate out of the
  fail-safe; the worker loop must keep going.

Local: ``uv run pytest tests/unit_test/indexing/test_graph_curation_run_orchestrator.py
tests/unit_test/test_app_lifespan_no_workers.py
tests/unit_test/graph_curation/test_service.py`` → **31 passed**.

Spec amend candidate
--------------------
Per architect msg=7af40610: spec v1.1 amend should add an explicit
"worker catch fail-safe" invariant to § 3.1.1 + § 5.2.b boundary
test gate, so the obligation is documented at the spec level rather
than discovered at impl-time again.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
task #31 Phase A4 implementation (dongdong) - FE merge suggestion UI + dismiss action.

## Scope
- backend: dismiss action support (schemas.py + the dismiss branch of service.py:handle_action + add the message field to all three success returns, per Weston BLOCKER msg=c1595745)
- FE typed schema sync: add dismiss to SUGGESTION_ACTIONS; MergeSuggestionStatus becomes a union of 7 active + 3 legacy states
- FE rendering: 7-state UI display (4 new APPLY_PENDING/APPLYING/APPLIED/APPLY_FAILED + DISMISSED + legacy ACCEPTED/EXPIRED/SUPERSEDED) + dismiss button + switch to canonical fields + opening the panel no longer implicitly starts a run: it shows a read-only list plus an explicit "trigger scan" button
- FE-derived P1 fields (per architect lock msg=80054596: simple-stable, no backend expansion): observed_types/type_conflict/affected_doc_count/suggested_entity_type
- Drive-by fix: remove the target_entity_data extension (compat bug: the backend's extra=forbid rejects it with 422)
- legacy ACCEPTED label "Applied (legacy)" / "已应用 (历史)" semantic alignment (per cuiwenbo NIT 1 + spec § 3.1.2 line 131)
- test_suggestion_action_response_requires_valid_success_shapes unit test covers 3 paths (per huangzhangshu's suggestion; guards against drift when fields are added to schemas.py in the future)

## CR collected (4/4 LGTM)
- @符炫炜 architect ratify ✅ msg=36a0bbe4 + BLOCKER condition met msg=44813b59
- @cuiwenbo CR pair final final pass ✅ msg=9e503bda
- @huangzhangshu testing final pass ✅ msg=bf233776
- @weston architect cross-CR final pass ✅ msg=0fe380bf

## Architect own-up sediment
- SuggestionActionResponse.message required field gap = the second architect-ratify trust-framing miss (Weston caught it via a first-principles trace) - Lesson #12 v9 fifth-application demo
- mini-pattern 19 scope upgrade: spec → impl boundary + impl → response_model contract boundary + impl catch path → upstream raise points boundary

## CI
- lint-and-unit ✅
- e2e-http-smoke 3/3 ✅
- provider-preflight 3/3 ✅
- e2e-http-provider 3/3 ✅ (re-run after JSON error injected into SSE stream flake confirmed - cross-PR same signature with PR #1941 A3 = systematic flake per ci-flake-policy § 2.2 single-shape signature waiver)

🤖 Architect ratify by Claude Code
task #31 spec v1.1 amend: fold in the spec drift surfaced while implementing Phase A (4/4 done), plus spec lock invariants and the lesson sediment trail.

## v1.1 Amend Scope
- § 3.1.1 worker loop fail-safe invariant (PR #1938 Weston BLOCKER → spec lock)
- § 3.1.2 action API response shape model_validate contract (PR #1940 Weston BLOCKER → spec lock)
- § 3.1.6 DISMISSED enum source corrected (PR #1935: ziang's grep of main confirmed the v1 spec drift; fixed)
- § 5.2.b adds 3 new boundary test invariants
- § 5.2.c adds the Phase A implementation sediment trail
- § 6 cr-checklist gains 5 sediment items
- Migration chain ordering: 5 new values (including DISMISSED), not 4
- fix-forward 1 (commit d50864f): global sweep of the enum count 4→5 at 6 places in the document + completed the pre-A2 evidence wording at § 1.1 line 17 (per Weston BLOCKER msg=2ad46e97)

## CR
- @符炫炜 architect (own draft)
- @huangheng cr-checklist sediment cite verify ✅ msg=b276da50
- @weston framing verify ✅ msg=a111fcc3 (re-final pass post fix-forward 1)

## CI
- lint-and-unit ✅
- e2e-http-smoke 3/3 ✅ (auto-merge after green)
- provider-preflight 3/3 ✅
- docs-only lite gate satisfied

🤖 Architect ratify by Claude Code
#1943)

* docs(cr-checklist): sediment fold-in sub-PR 2 after task #31 Phase A fully closed

§ 4 adds 6 lesson sediments (cumulative evidence codified from the 4 task #31
Phase A PRs + task #33 P3 PR #1933, plus the multi-PR, same-hour, multi-source
first-principles catches of trust-framing misses); the § 6 sediment references
gain 5 PR commit cross-links; the § 8 revision history gains the full trail of
this PR's fold-in.

New lessons:
- Lesson #12 v9 third + fourth + fifth-application demos (PR #1935: ziang's
  impl-side DISMISSED enum catch + dongdong's response_model legacy field
  filter BLOCKER, both on the same PR / PR #1938: Weston's worker fail-safe
  BLOCKER via an upstream raise-points trace / PR #1940: Weston's
  SuggestionActionResponse.message required-field catch). The sediment is
  upgraded to a systemic signal: the reviewer chain must independently
  re-verify from first principles
- Migration chain ordering, second-application demo (PR #1935's
  reuse-and-extend-table pattern has different migration ordering constraints
  from PR #1910's new-enum hard-cut migration; 5 new enum values
  APPLY_PENDING/APPLYING/APPLIED/APPLY_FAILED/DISMISSED + evidence_refs
  JSON column + ACCEPTED legacy zero-write grep gate)
- Lesson #17 second-application demo (when PR #1935 converged the backend onto
  the canonical contract, the same PR folded in a legacy projection layer to
  keep backward compatibility, e.g. the suggestion_batch_id=run_id alias;
  pairs with the deprecation-marker Lesson #14 family)
- Lesson #18 formally established: two-layer codification of lesson sediment +
  mechanical gate, "every recorded lesson gets an enforcing gate" (first-app
  PR #1933 4-source default value parity / second-app PR #1941
  description-free read scope + service.py:845 bonus catch / third-app
  PR #1941 fix-forward sister tests that stop a whole-file exclusion from
  silently weakening the gate)
- mini-pattern 19: before locking a spec, grep main to verify enum/contract
  assumptions (architect own-up, upgraded to three layers: spec→impl /
  impl→response_model / impl catch path→upstream raise points)
- mini-pattern 20: a PR that adds response_model wire-up must run a
  model_validate(actual_handler_return_shape) boundary gate (PR #1940
  first-application demo)

Started per architect dispatch msg=b6726ac9 + msg=420ca548 once sediment
trigger A was satisfied (task #31 Phase A 4/4 done); Phase B B1 lane owner:
huangheng.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(cr-checklist): fix cite accuracy NIT per Weston msg=7690b723

2 cite accuracy fixes (Weston framing CR catch):

1. Status code for response_model validation failure: 422 -> 500
   - a failed response_model validation raises FastAPI's ResponseValidationError
   - this normally maps to HTTP 500, not the 422 used for request body validation
   - affects the status code cited at line 745 + line 850 when describing the PR #1940 BLOCKER

2. GraphMergeSuggestionItem canonical schema fields corrected against main
   - originally written: ... / observed_types / type_conflict / suggested_entity_type
   - on main, aperag/domains/knowledge_graph/schemas.py::GraphMergeSuggestionItem
     does not contain these three fields
   - in A4 (PR #1940) these fields are FE-derived display values (the FE derives
     them from entities / suggested_target_entity / evidence_refs), not a
     PR #1935 backend projection
   - affects the Lesson #17 second-application demo description at line 781, sect 4

Per Weston's PR #1943 framing CR (msg=7690b723): sediment cite accuracy
requires cleaning up factual drift, so that future onboarding readers do not
confuse the 422/500 status code semantics or the backend/FE attribution of
field sources.

Does not block the main fold-in scope: the 6 lesson sediments + 5 PR commit
cross-links and all other framing are accurate (Weston verified).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Adds cross-backend compat tests for LineageEntityMerger.merge_entities_apply_description_free across PostgreSQL, Neo4j, and Nebula. Pins lineage re-anchor, source deletion, stale description non-leakage, no __curation_merge__ sentinel, collaborator zero-call, replay idempotency, and alias-failure zero-side-effect contracts.
Add first-class Helm values and API/indexing-worker env injection for Nebula graph backend credentials.
Completes the HTTP E2E deployment shape matrix by adding qdrant-postgres, pgvector-neo4j, and pgvector-nebula shapes; targeted/manual workflow callers; Makefile shortcuts; docs; ci-flake-policy clarification; and unit contracts pinning the full 2x3 matrix plus path-targeted trigger coverage.
Task #61 P1-G1/G2 graph store compat coverage.
… filter Or guard + retrieve defense-in-depth (#1948)

* feat(vectorstore): task #61 P1-V vector adapter family — capability + filter Or guard + retrieve defense-in-depth

Closes task #83 per PM @不穷 dispatch (msg=29c9e753). Folds 4 P1-V
items from task #61 spec v1 § 2.3 into a single PR:

P1-V1 — collection init failure contract documentation
------------------------------------------------------
``ensure_collection`` Protocol docstring now spells out the cross-
adapter contract (idempotent / race-safe / fail-loud / cache-not-
poisoned-on-failure). Both adapters already implement these
behaviours; the documentation closes the spec drift gap so future
implementers have a checklist.

P1-V2 — batch upsert atomicity capability declaration
-----------------------------------------------------
New :class:`VectorBackendCapabilities` frozen dataclass on the base
module declares static per-backend behaviour flags. Each
``VectorStoreConnector`` subclass exposes an instance via the
``BACKEND_CAPABILITIES`` class-level attribute:

* ``PgvectorVectorStoreConnector.BACKEND_CAPABILITIES.supports_atomic_batch_upsert = True``
  (PGVector wraps bulk INSERT ON CONFLICT in ``engine.begin()`` —
  mid-batch failure rolls back the whole batch).
* ``QdrantVectorStoreConnector.BACKEND_CAPABILITIES.supports_atomic_batch_upsert = False``
  (Qdrant ``client.upsert(points, wait=True)`` is best-effort
  per-point — partial writes possible on mid-batch failure).

``upsert`` Protocol docstring now points at the capability flag so
callers know to chunk + verify on backends that declare ``False``.
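
A capability declaration of this kind might be consumed as sketched below. Only the flag name and its per-backend values come from this commit message; the ``*ConnectorSketch`` classes and ``plan_batches`` are hypothetical and do not reproduce the real connectors.

```python
from dataclasses import dataclass
from typing import ClassVar, Sequence


@dataclass(frozen=True)
class VectorBackendCapabilities:
    """Static, immutable behaviour flags for one backend."""

    supports_atomic_batch_upsert: bool


class PgvectorConnectorSketch:
    BACKEND_CAPABILITIES: ClassVar[VectorBackendCapabilities] = VectorBackendCapabilities(
        supports_atomic_batch_upsert=True
    )


class QdrantConnectorSketch:
    BACKEND_CAPABILITIES: ClassVar[VectorBackendCapabilities] = VectorBackendCapabilities(
        supports_atomic_batch_upsert=False
    )


def plan_batches(connector_cls: type, points: Sequence, chunk_size: int = 2) -> list[list]:
    """One batch when a failure rolls back everything, else small chunks
    that the caller can verify and retry individually."""
    if connector_cls.BACKEND_CAPABILITIES.supports_atomic_batch_upsert:
        return [list(points)]
    return [list(points[i : i + chunk_size]) for i in range(0, len(points), chunk_size)]
```

Because the dataclass is frozen and attached at class level, callers can branch on the flag without instantiating a connector or probing the backend at runtime.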

P1-V3 — filter Or empty-parts guard
-----------------------------------
``Or.__post_init__`` already rejects empty ``parts`` at DSL
construction. Both adapter translators now also guard at the
translator boundary so a future refactor that bypasses the
constructor (e.g. ``object.__setattr__(or_node, "parts", ())`` on
the frozen dataclass, or a ``dataclasses.replace`` with empty
parts) can't silently degrade to a vacuous "match everything"
disjunction:

* ``aperag/vectorstore/pgvector_connector.py:_SqlFilter._walk`` —
  raises ``UnsupportedFilterError`` on empty post-walk parts.
* ``aperag/vectorstore/qdrant_connector.py:_translate_filter`` —
  raises ``UnsupportedFilterError`` on empty post-prune subs (so
  ``rest.Filter(should=[])`` — which Qdrant treats as match-all —
  is unreachable).
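
The two-layer guard can be illustrated with a toy filter DSL. Everything here is a simplified assumption: ``Eq``, the string-building ``translate`` function and this ``UnsupportedFilterError`` are stand-ins, and a real translator would emit bound parameters rather than inline values.

```python
from dataclasses import dataclass


class UnsupportedFilterError(ValueError):
    """Raised by the translator for filters it refuses to emit."""


@dataclass(frozen=True)
class Eq:
    field: str
    value: object


@dataclass(frozen=True)
class Or:
    parts: tuple

    def __post_init__(self) -> None:
        # Layer 1: reject empty disjunctions at construction time.
        if not self.parts:
            raise ValueError("Or requires at least one part")


def translate(node: object) -> str:
    """Toy translator (illustration only; values are inlined, not bound)."""
    if isinstance(node, Eq):
        return f"{node.field} = {node.value!r}"
    if isinstance(node, Or):
        subs = [translate(part) for part in node.parts]
        # Layer 2: re-check at the translator boundary, in case the
        # constructor check was bypassed.
        if not subs:
            raise UnsupportedFilterError("empty Or would match everything")
        return "(" + " OR ".join(subs) + ")"
    raise UnsupportedFilterError(f"unsupported node: {type(node).__name__}")


def bypassed_empty_or() -> Or:
    """Build a valid Or, then empty it behind the frozen dataclass's back."""
    node = Or(parts=(Eq("lang", "en"),))
    object.__setattr__(node, "parts", ())
    return node
```

``translate(bypassed_empty_or())`` raises ``UnsupportedFilterError`` even though the node passed construction, which is the regression the frozen-dataclass-bypass tests are meant to catch.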

P1-V4 — Qdrant legacy mode defense-in-depth
-------------------------------------------
``QdrantVectorStoreConnector.retrieve`` now applies the same
``TENANT_PAYLOAD_KEY`` filter in **both** multitenant and legacy
modes, but with a backwards-compatible "no payload key → pass
through" branch so legacy-only rows that don't carry the payload
key keep working:

* In multitenant mode: filter is the primary tenant-isolation
  layer (unchanged behaviour).
* In legacy mode: collection-name isolation is the primary layer;
  the new payload-level filter is belt-and-braces against tooling
  drift / migration mistakes that could plant a stray foreign-tenant
  row in a legacy collection.

The new ``BACKEND_CAPABILITIES.supports_legacy_mode`` flag declares
which adapter supports the legacy layout (PGVector ``False``,
Qdrant ``True``) so callers can tell the difference machine-
readably.

Tests
-----
* ``tests/unit_test/vectorstore/test_backend_capabilities.py``
  (new) — pins shape + per-flag values for each adapter. Coordinates
  with cuiwenbo task #87 P1-D3 collection metadata Pydantic
  projection so the static capability matrix stays consistent
  across PRs.
* ``tests/unit_test/vectorstore/test_pgvector_translator.py`` and
  ``test_qdrant_filter_translation.py`` — pin the new Or empty-parts
  guard with frozen-dataclass-bypass coverage.
* ``tests/unit_test/vectorstore/test_qdrant_multitenancy_integration.py``
  — new ``test_retrieve_legacy_mode_filters_stray_foreign_payload``
  exercises the P1-V4 belt-and-braces filter on a real ``:memory:``
  Qdrant client: legacy-mode rows without payload key pass through
  (backward compat), own-tenant payload passes, foreign-tenant
  payload is dropped.

Local: ``uv run pytest tests/unit_test/vectorstore/`` →
**156 passed, 10 skipped, 1 warning**.

Spec / scope alignment
----------------------
* task #61 spec v1 § 2.3 P1-V1 → ensure_collection contract doc ✅
* task #61 spec v1 § 2.3 P1-V2 → BACKEND_CAPABILITIES.supports_atomic_batch_upsert ✅
* task #61 spec v1 § 2.3 P1-V3 → Or empty-parts guard ✅
* task #61 spec v1 § 2.3 P1-V4 → retrieve defense-in-depth + supports_legacy_mode ✅
* Lesson #14 multi-iteration cleanup — legacy mode flagged via
  ``supports_legacy_mode`` so a future PR can drop the mode
  entirely once telemetry confirms zero production usage ✅
* Lesson #17 backend-convergence contract: capability declaration is the
  backend-side contract that lets callers (FE / API / MCP) read a
  single source of truth instead of forking on backend type ✅

Follow-ups (NOT in this PR)
---------------------------
* task #84 P1-G1+G2 graph store boundary tests — ziang
* task #85 P1-D1 e2e shape matrix — huangzhangshu
* task #86 P1-D2 Helm Nebula first-class — Planetegg
* task #87 P1-D3 collection metadata vector_backend projection —
  cuiwenbo + dongdong (consumes ``BACKEND_CAPABILITIES`` values)
* task #88 P2-S1+S2 batch alias resolution — Bryce after this PR

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(vectorstore): mode-specific tenant filter on Qdrant retrieve (Weston PR #1948 BLOCKER)

Weston PR #1948 architecture CR (msg=910cad66 BLOCKER) caught a real
correctness regression in the initial P1-V4 commit: the uniform
"no payload key → pass through" branch leaked stray ``{}`` payload
rows in the **shared multitenant collection** to every tenant on a
``retrieve(ids=...)`` call.

Local Qdrant ``:memory:`` repro (per Weston): a multitenant
connector ``tenant_a`` writes a point with ``payload={}`` directly
to the shared collection, then ``tenant_a.retrieve([id])`` returns
the row. Because ``upsert()`` always stamps the payload key, the
only way a missing-key row reaches the shared collection is tooling
drift / migration drift — exactly the case P1-V4 defense-in-depth
is supposed to catch.

Fix
---
Mode-specific semantics:

* **Multitenant mode** (shared physical collection): STRICT —
  every row MUST carry ``TENANT_PAYLOAD_KEY`` matching the
  connector's tenant id. No "no payload key → pass through"
  branch, because the shared collection means a missing key would
  expose the row to every tenant.
* **Legacy mode** (per-tenant physical collection, unchanged from
  initial commit): PERMISSIVE — a row that doesn't carry the
  payload key still passes through (typical pre-multitenant data
  shape), but a stray foreign-tenant payload gets dropped (catches
  tooling drift / migration mistakes).
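
The mode-specific semantics reduce to a small post-filter, sketched here over plain dicts. The value of ``TENANT_PAYLOAD_KEY`` and the ``visible_rows`` helper are assumptions for illustration; the real connector filters Qdrant records.

```python
TENANT_PAYLOAD_KEY = "tenant_id"  # hypothetical value; the real constant may differ


def visible_rows(rows: list[dict], tenant_id: str, multitenant: bool) -> list[dict]:
    """Return the rows a connector for `tenant_id` may hand back from retrieve()."""
    kept = []
    for row in rows:
        payload = row.get("payload") or {}
        if TENANT_PAYLOAD_KEY not in payload:
            # Missing key: tolerated only in legacy (per-tenant collection) mode.
            if not multitenant:
                kept.append(row)
            continue
        if payload[TENANT_PAYLOAD_KEY] == tenant_id:
            kept.append(row)
        # A foreign tenant id is dropped in both modes.
    return kept


ROWS = [
    {"id": 1, "payload": {}},
    {"id": 2, "payload": {TENANT_PAYLOAD_KEY: "tenant_a"}},
    {"id": 3, "payload": {TENANT_PAYLOAD_KEY: "tenant_b"}},
]
```

With ``ROWS`` and ``tenant_id="tenant_a"``, strict mode keeps only row 2, while legacy mode also keeps the key-less row 1. Row 3 is dropped in both.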

Tests
-----
``test_retrieve_multitenant_mode_strict_requires_payload_key`` (new)
— Weston's exact repro: seed shared collection with ``{}`` payload
+ own-tenant payload + foreign-tenant payload, assert only the
own-tenant row passes through. The legacy-mode permissive
counterpart (``test_retrieve_legacy_mode_filters_stray_foreign_payload``)
stays unchanged so that a future refactor which unifies the two modes
and silently re-opens the leak fails fast.

Local: ``uv run pytest tests/unit_test/vectorstore/`` →
**157 passed, 10 skipped** (one new case).

Sediment trigger
----------------
This belongs to the same family as the Lesson #12 v9 fifth-application
demo: Weston's first-principles repro caught the unified branch as a
silent leak that I missed when applying the legacy-compat optimization
uniformly. The narrower ``mode-specific`` framing matches the spec
language ("legacy compat for legacy mode only") more precisely.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…ty matrix (#1949)

* feat(collection): task #61 P1-D3 — vector backend identity + capability matrix

Project the deployment-wide ``settings.vector_db_type`` onto every
collection detail read so the FE can render a "what does this vector
backend actually support" panel without per-collection migration or
runtime probe.

Backend (output-only projection):
- ``aperag/schema/common.py``: ``VectorBackendCapabilities`` +
  ``VectorBackendInfo`` + ``_STATIC_VECTOR_BACKEND_CAPABILITIES`` dict +
  ``project_vector_backend_info()`` helper.
- ``aperag/domains/knowledge_base/schemas.py:Collection``: add
  ``vector_backend: Optional[VectorBackendInfo]``. **Intentionally NOT
  on ``CollectionConfig``** so the OpenAPI ``CollectionCreate`` /
  ``CollectionUpdate`` input shapes do not let callers mistake a
  deployment-wide setting for a per-collection editable knob (per
  dongdong msg=c2593fdd + PM msg=caf7e4df + architect msg=0044261f
  read-only projection lock).
- ``aperag/domains/knowledge_base/service/collection_service.py``:
  populate ``vector_backend`` in ``build_collection_response`` from
  ``settings.vector_db_type``; ``None`` for unknown backends so the FE
  can render a placeholder without a hard failure.

Cross-PR consistency with task #83 / PR #1948 (Bryce, vector adapter
behavior fixes):
- Bryce's connector-layer ``BACKEND_CAPABILITIES`` ClassVar declares 2
  truth flags (``supports_atomic_batch_upsert`` +
  ``supports_legacy_mode``); this PR's schema-layer Pydantic model
  mirrors those values plus a 3rd schema-layer-only flag
  ``supports_filter_or_with_empty_parts`` which is uniformly False
  across adapters after task #83 P1-V3 (translator-level
  defense-in-depth rejects empty Or parts).
- The 3rd flag stays in the schema so the FE can declare the uniform
  reject explicitly per spec § 2.3 P1-D3 ("show that differences are
  allowed, but explicitly"), following the Lesson #17 backend-convergence
  contract simple-stable family pattern (cite PR #1930 SearchHit
  normalize, PR #1935 GraphMergeSuggestionItem projection layer).

Mechanical gate (per Lesson #18 lesson-sediment + mechanical-gate
two-layer codification, first established by chenyexuan PR #1933 / PR
#1941, then PR #1940 ``model_validate`` boundary): 13-case unit suite
in ``tests/unit_test/contracts/test_vector_backend_capability_matrix.py``
pins each capability flag, normalizes inputs, and round-trips Pydantic
``model_dump`` so future drift between schema, projection helper, and
FE-consumed shape fails fast at unit-test time.

FE (read-only display):
- ``web/src/features/collection/types.ts``: typed mirrors
  ``VectorBackendInfo`` / ``VectorBackendCapabilities`` /
  ``VectorBackendType``.
- ``web/src/app/workspace/collections/[collectionId]/settings/collection-vector-backend-card.tsx``:
  new component that surfaces backend identity + capability matrix in
  the collection settings page (above the edit form). dongdong picks
  up rendering polish (responsive + dark mode + final copy) on the
  same PR per the joint A4-style split (cuiwenbo contract layer +
  dongdong rendering polish + CR pair).
- ``web/src/i18n/{en-US,zh-CN}/page_collections.json``: copy strings.
- ``web/src/api-v2/schema.d.ts`` regenerated via ``yarn api:v2:types``.

Local verification:
- ``uv run --extra test pytest tests/unit_test/contracts/test_vector_backend_capability_matrix.py tests/unit_test/contracts/test_collection_v2_openapi_contract.py -q`` → 23 passed
- ``make openapi-check`` → ok
- ``yarn type-check --pretty false`` → 0 new errors on this PR's files
  (pre-existing graph-lab cosmograph + agent-runtime errors unchanged)
- ``yarn lint --quiet`` → 0 warnings/errors
- ``yarn i18n:check`` → ok
- ``git diff --check`` → ok

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(collection): task #87 P1-D3 — convert vector_backend to computed_field

Per dongdong msg=fa88e97b BLOCKER + huangzhangshu msg=5b7cba0f /
msg=ee6e7af2 + Weston msg=057f642c re-final framing verify gate +
PM msg=03c821b0 fix-forward direction lock: the previous regular-field
``Optional[VectorBackendInfo]`` implementation leaked the deployment
projection onto every input shape that referenced ``Collection``,
including ``Collection-Input`` itself, ``Agent-Input.collections``,
and ``CreateTurnRequest.collections``. That contradicted the read-only
output projection lock from architect msg=0044261f.

Move ``Collection.vector_backend`` to a Pydantic v2 ``@computed_field``
property so OpenAPI input/output schemas auto-split:

- ``Collection-Output`` now lists ``vector_backend`` with
  ``readonly: true`` (verified in regenerated
  ``web/src/api-v2/schema.d.ts``).
- ``Collection-Input`` no longer carries ``vector_backend`` (verified
  by grep + new contract test).
- ``CollectionCreate`` / ``CollectionUpdate`` / ``Agent-Input.collections`` /
  ``CreateTurnRequest.collections`` all inherit the cleaned ``Collection-Input``,
  so the deployment-wide setting can no longer be passed as a
  per-collection override on agent / chat-turn requests.

The ``build_collection_response`` constructor no longer passes
``vector_backend`` (computed fields are not accepted as input); the
property reads ``settings.vector_db_type`` lazily on each serialization.

Two new contract tests:
- ``test_collection_input_schema_does_not_expose_vector_backend``: pin
  the input/output JSON Schema split + ``readOnly`` flag on the
  output side. Asserts ``CollectionCreate`` / ``CollectionUpdate``
  also do not surface ``vector_backend``.
- ``test_collection_constructor_ignores_vector_backend_input``:
  defensive — even if a malicious caller stuffs ``vector_backend``
  into a ``model_validate`` payload, Pydantic ignores it and the
  computed property still reflects the deployment setting.

Sediment: cuiwenbo own-up CR miss. The implementation-time check only
verified the ``CollectionConfig`` placement (one defense layer) and
missed the second layer: ``Collection`` itself being reused as an input
shape. dongdong + Weston + huangzhangshu independently caught it via the
OpenAPI generated-schema gate.
mini-pattern 19 layer 5 candidate: "Pydantic schema placement verify
must grep ``references Collection`` to catch input/output reuse risk,
not only direct form-input shape" (continuing the trust-framing-miss
family from PR #1935 / #1938 / #1940).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): consolidate vector_backend_capability_matrix imports for ruff

Combine the two from aperag.schema.common import ... statements
into a single block so ruff's import organization rule is satisfied.
No code-behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): apply ruff format to vector_backend test + common.py

Run `uv run ruff format` on ApeRAG/aperag/schema/common.py and
ApeRAG/tests/unit_test/contracts/test_vector_backend_capability_matrix.py
so `make lint` (`ruff format --check`) passes. Pure formatting; no
behavior change. Other unrelated files reverted to keep this PR scope
clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…seed PG connection saturation fix (#1950)

Closes task #88 per PM @不穷 dispatch (msg=8f130f25). Implements task
#61 spec v1 § 2.4 P2-S1 + Planetegg P2-HIGH (msg=db7fb085 +
msg=1314ac59) + Singapore SRE diagnostic (Planetegg msg=4043adf4)
batch alias resolution.

Background
----------
``LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops`` is on
the ``GET /api/v2/collections/{id}/graphs`` and ``/graphs/hybrid``
read paths. Pre-fix, it called ``AliasMapRepository.resolve_canonical``
once per anchor name via ``asyncio.gather``, which checks out one
PG connection per name. Spec § 2.4 P2-S1 quantification:

* ``GET /graphs?max_nodes=1000`` → up to **2 × max_nodes = 2000**
  seeds.
* ``GET /graphs/hybrid``: default 1000 / max 5000 seeds.

At those cardinalities the PG connection pool saturates, observed
in Singapore production (Planetegg msg=4043adf4 SRE diagnostic).

Changes
-------
* ``aperag/graph_curation/alias_map.py``: new
  :meth:`AliasMapRepository.resolve_canonical_many` batch primitive.
  Single SQL ``SELECT alias_name, canonical_name FROM
  aperag_lineage_entity_alias WHERE collection_id=? AND alias_name
  IN (...)`` reads all matching rows in one shot. Names absent from
  the result set fall back to themselves (mirrors single-name
  ``resolve_canonical`` semantics). Empty / falsy names short-circuit
  without an SQL lookup. Total connections checked out: **1** per call
  regardless of seed count. Caller order is preserved in the returned
  dict's iteration order (insertion-order semantics).
* ``aperag/indexing/alias_redirect_store.py``: rewrite
  ``LineageGraphStoreWithAliasRedirect.expand_neighbors_n_hops`` to
  use the batch primitive. ``asyncio.gather`` per-name fan-out gone;
  ``import asyncio`` no longer needed at module top-level.
* Test stub ``_FakeAliasRepo`` in
  ``tests/unit_test/indexing/test_alias_redirect_store.py``: now
  implements both ``resolve_canonical`` (single, used by
  upsert/get/delete redirect paths) and ``resolve_canonical_many``
  (batch, used by expand) + tracks call counts so tests can pin the
  call-graph (i.e. expand path goes through batch primitive exactly
  once).
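
A minimal sketch of the batch primitive described in the list above, using the standard-library ``sqlite3`` module as a stand-in for PG. The table and column names come from the commit message; the function signature, the sample rows, and the choice to map empty names to themselves are assumptions, not the actual ``AliasMapRepository`` code.

```python
import sqlite3


def resolve_canonical_many(conn, collection_id, names):
    """Resolve all names with a single SELECT on a single connection."""
    # Deduplicate while keeping the caller's order.
    unique = list(dict.fromkeys(names))
    # Every name falls back to itself unless the alias table says otherwise.
    result = {name: name for name in unique}
    # Empty / falsy names never reach SQL.
    lookup = [name for name in unique if name]
    if not lookup:
        return result
    placeholders = ",".join("?" for _ in lookup)
    rows = conn.execute(
        "SELECT alias_name, canonical_name FROM aperag_lineage_entity_alias "
        f"WHERE collection_id = ? AND alias_name IN ({placeholders})",
        [collection_id, *lookup],
    )
    for alias_name, canonical_name in rows:
        result[alias_name] = canonical_name
    return result


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aperag_lineage_entity_alias "
    "(collection_id TEXT, alias_name TEXT, canonical_name TEXT)"
)
conn.executemany(
    "INSERT INTO aperag_lineage_entity_alias VALUES (?, ?, ?)",
    [("c1", "IBM Corp", "IBM"), ("c2", "Big Blue", "IBM")],
)
resolved = resolve_canonical_many(conn, "c1", ["IBM Corp", "Big Blue", "", "IBM Corp"])
print(resolved)
```

A very large ``IN`` list may hit a driver or database parameter limit, so a production version would probably chunk the lookup; that detail is left out of the sketch.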

Tests
-----
* ``tests/unit_test/graph_curation/test_alias_map.py`` (7 new):
  - ``test_resolve_canonical_many_returns_self_for_unmapped_names``
  - ``test_resolve_canonical_many_mixed_alias_and_canonical``
  - ``test_resolve_canonical_many_dedupes_input``
  - ``test_resolve_canonical_many_empty_input``
  - ``test_resolve_canonical_many_handles_empty_string``
  - ``test_resolve_canonical_many_per_collection_isolation``
  - ``test_resolve_canonical_many_large_seed_cap`` (2000-name spec
    quantification — pinned correctness at the spec-cap so a future
    regression that re-introduces per-name fan-out either times out
    or breaks the result shape).
* ``tests/unit_test/indexing/test_alias_redirect_store.py`` (2 new):
  - ``test_expand_neighbors_uses_batch_alias_resolution`` —
    call-graph gate: exactly 1 ``resolve_canonical_many`` call,
    zero ``resolve_canonical`` calls, regardless of seed count. A
    regression that re-introduces the gather pattern is caught
    immediately.
  - ``test_expand_neighbors_large_seed_cap_uses_single_batch_call``
    — 2000-seed spec-cap pinned at the call-graph level.
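
The call-graph gate can be sketched with a counting fake. ``FakeAliasRepo`` and ``expand_seeds`` below are hypothetical simplifications of the ``_FakeAliasRepo`` stub and of the alias step in ``expand_neighbors_n_hops``; only the two method names are taken from the commit message.

```python
import asyncio


class FakeAliasRepo:
    """Counts calls so a test can pin which primitive the caller used."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.single_calls = 0
        self.batch_calls = 0

    async def resolve_canonical(self, collection_id, name):
        self.single_calls += 1
        return self.mapping.get(name, name)

    async def resolve_canonical_many(self, collection_id, names):
        self.batch_calls += 1
        return {name: self.mapping.get(name, name) for name in names}


async def expand_seeds(repo, collection_id, seeds):
    # One batch call replaces the per-name asyncio.gather fan-out.
    resolved = await repo.resolve_canonical_many(collection_id, seeds)
    # Deduplicate canonical names while keeping seed order.
    return list(dict.fromkeys(resolved.values()))


repo = FakeAliasRepo({"seed-0": "canonical-0"})
seeds = [f"seed-{i}" for i in range(2000)]  # the 2000-seed spec cap
canonical = asyncio.run(expand_seeds(repo, "c1", seeds))
print(repo.batch_calls, repo.single_calls, len(canonical))
```

Asserting on the counters rather than on timing keeps the gate deterministic: any regression that reintroduces per-name resolution changes ``single_calls`` from zero.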

Local: ``uv run pytest tests/unit_test/graph_curation/
tests/unit_test/indexing/test_alias_redirect_store.py`` →
**56 passed, 1 warning**.

Spec / scope alignment
----------------------
* task #61 spec v1 § 2.4 P2-S1 — batch resolve primitive ✅
* task #61 spec v1 § 2.4 P2-S2 — ``expand_neighbors_n_hops`` seed
  cap test ✅
* Lesson #17 backend convergence contract: a single primitive replaces
  the N-parallel fan-out at the same caller, no FE / API changes
  required ✅
* Lesson #18 mechanical gate codification — call-graph assertion in
  the redirect-store test is the mechanical gate (caught by CI on
  any future regression that bypasses the batch primitive) ✅

Follow-ups (NOT in this PR)
---------------------------
* P3 cross-cutting concern: every ``LineageGraphStore`` consumer
  that currently invokes the alias path per-name (e.g. some
  GraphCurationService internals) should migrate to the batch
  primitive — independent task gated on production data showing
  the residual N-fan-out is a real bottleneck.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
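earayu2 directive (msg=718c79ba): @符炫炜 and @ziang collaborate in the thread
to audit the indexing pipeline, the database layer, and private-deployment
scenarios with many documents and long documents, and write a detailed
proposal report.

v1 covers (owned by 符炫炜):
- §1 Parse layer: 5 bottlenecks (B1-B5) + proposals P0-3a/b/c/P3-10
- §2 Index 4-lane scheduling: 6 bottlenecks (C1-C6) + proposals P0-1/P0-2/P1-4/P2-6/P2-7
- §3 DB layer: 7 bottlenecks (D1-D7) + proposals P1-5a/P1-5b/P2-6/P3
- §4 Private deployment: 3-tier recipes + P2-8 production preset
- §5 End-to-end bottleneck ranking for high-volume and long-document workloads + speedup estimate (long document, 5000 chunks: 720s → 190s, 3.7×)
- §6 Implementation slices (Wave 1-4, 12 PRs)
- §7 Verification (every PR must ship a boundary test + regression load test + CI gate)
- §8 Dependencies + risks

v2 follow-up (not in this commit):
- Detailed chapter on K8s prod deployment parameters (HPA / PVC / leader-election)
- PG + KubeBlocks + pgbouncer chapter (to be added after @ziang studies kubeblocks-skills)
- List of settings to expose in the admin UI (hook left for dongdong's frontend integration)
- §11-§14 supplementary chapters by ziang (read path / cleanup / end-to-end attribution / joint acceptance)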

main HEAD pin: eb4c4f3 (2026-04-30 18:46)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@apecloud-bot apecloud-bot added the size/XXL Denotes a PR that changes 1000+ lines. label May 2, 2026
@apecloud-bot
Collaborator

This branch name is not following the standards: feature/|bugfix/|release/|hotfix/|support/|releasing/|dependabot/

per earayu2 thread directives:
- msg=caf5c760 / msg=4e9c909c: K8s is the prod target; docker-compose is for e2e only
- msg=e6e4d366 / msg=2f9b062f: PG runs on KubeBlocks + pgbouncer (transaction pooling)
- msg=99c1d23a: consider exposing new configurable parameters in the admin UI

New chapters:
- §9 K8s prod deployment parameters: 3-tier table of resources requests/limits / HPA + KEDA queue depth
  triggers / leader-election boundary (P1-Helm-3 Redis SETNX lease) / PVC configuration / OBJECT_STORE
  multi-replica enforcement / PodDisruptionBudget / monitoring and alerting (process_resident_memory /
  queue depth / pg_stat_activity / vector store latency p99)

- §10 PG + KubeBlocks + pgbouncer: pooling mode compatibility audit checklist
  (prepared statements / SET LOCAL / advisory lock all ✅) / recommended pgbouncer.ini parameters
  (pool_mode=transaction, max_client_conn=500, default_pool_size=25) / matching KubeBlocks PG
  values / ApeRAG-side changes (pool 30 + pgbouncer 25 server / 4 replica = 120 client) /
  Helm template P1-Helm-6 / verification procedure

- §11 Admin UI configurability list: Class A runtime perf (14 items strongly recommended for the
  IndexingSettings card) / Class B collection-level (5 lane on/off + graph extractor concurrency) /
  Class C infra ops (db pool / pgbouncer / Helm resources; deploy-time parameters stay out of the
  admin UI) / P2-Admin-1 IndexingSettings card wireframe / P2-Admin-2 backend changes
  (env > DB settings precedence) / hook for @dongdong's frontend integration
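
The connection arithmetic in the §10 summary can be checked with a small sketch. The numbers are the ones quoted above; reading "pool 30" as a per-replica client pool pointed at pgbouncer, and "25 server" as pgbouncer's ``default_pool_size``, is an assumption about what the audit document means.

```python
# All figures come from the §10 summary; the interpretation is an assumption.
replicas = 4            # ApeRAG replicas
app_pool_size = 30      # per-replica client pool pointed at pgbouncer
max_client_conn = 500   # pgbouncer.ini
default_pool_size = 25  # pgbouncer.ini, server connections per database/user pair

# Client connections that pgbouncer must accept across all replicas.
client_connections = replicas * app_pool_size
# Remaining room under max_client_conn for other clients.
headroom = max_client_conn - client_connections
# How many client connections share each PG server connection.
multiplex_ratio = client_connections / default_pool_size
print(client_connections, headroom, multiplex_ratio)
```

Under this reading, the 120 client connections fit well below ``max_client_conn``, while PG itself sees at most 25 server connections for the pool.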

§12-§16 to be added by @ziang: read path / cleanup / end-to-end attribution / KubeBlocks research / joint acceptance

main HEAD pin: eb4c4f3 (2026-04-30 18:46)
PR: #1954

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@earayu earayu closed this May 2, 2026
