
Commit 127b6d5

feat(runbook): expose ann calibration thresholds in workspace
1 parent 69466f6 commit 127b6d5

16 files changed

Lines changed: 288 additions & 22 deletions

docs/diataxis/en/explanation/development-progress-dashboard.md

Lines changed: 4 additions & 4 deletions
@@ -22,9 +22,9 @@ It tracks what is already implemented, where the hard gaps remain, and how to ve
2222
- ANN-style prefilter, representation telemetry, circuit health, remote index sync, and live `external_http` connector proof now exist in `src/learning/queryBackend.ts` and `src/learning/vectorAccelerationAdapter.ts`,
2323
- runtime capability/runbook governance now includes explicit ANN remote index-sync health (`query_vector_acceleration_index_sync_health`) in addition to prefilter, health, traceability, and circuit checks,
2424
- `server.ts` now closes the corresponding operator loop: the index-sync gate participates in verification escalation, remediation action-queue generation, and per-check runbook history summaries,
25-
- the agent workspace runtime-runbook surfaces now render operator-facing ANN governance directly in the frontend shell: verify/checks now expose sync-health plus circuit-budget, traceability, and prefilter summaries, while action-queue keeps the index-sync incident drilldown,
25+
- the agent workspace runtime-runbook surfaces now render operator-facing ANN governance directly in the frontend shell: verify/checks now expose sync-health plus circuit-budget, traceability, and prefilter summaries, and they now also show threshold/signal drilldowns needed for calibration work, while action-queue keeps the index-sync incident drilldown,
2626
- the modular `src/routes/knowledge.ts` runtime-runbook surfaces now delegate to live server-side runbook ops with full query-parameter passthrough, so browser/runtime consumers no longer hit the old KLP placeholder payloads for verify/history/checks/action-queue/remediation/schedule flows,
27-
- the browser strict smoke gate now also proves those ANN runbook surfaces from real browser evidence: verify-card ANN sync/circuit/traceability/prefilter content, checks-card first-check ANN sync plus circuit/traceability/prefilter snapshots, and action-queue index-sync drilldown are now asserted end to end instead of remaining component-test-only,
27+
- the browser strict smoke gate now also proves those ANN runbook surfaces from real browser evidence: verify-card ANN sync/circuit/traceability/prefilter content plus threshold/signal labels, checks-card first-check ANN sync plus circuit/traceability/prefilter snapshots, and action-queue index-sync drilldown are now asserted end to end instead of remaining component-test-only,
2828
- locale governance for the agent workspace is now tighter on both static and runtime surfaces: bilingual locale bundles now cover the query/quality/runbook cards exercised by strict browser smoke, `src/agent_workspace.locale.contract.test.ts` blocks source-referenced `agentWorkspace.*` key drift, and startup-time translate helpers no longer emit false missing-key warnings before locale initialization finishes,
2929
- Phase-2 runtime diagnostics are now materially implemented in `src/learning/KnowledgeLearningPlatform.ts` for query-backend comparison/history/trend, knowledge staleness diagnostics/rebuild planning, learning-quality history/trend, session-plan quality evaluation/history/trend/runtime-threshold diagnostics, query-backend config, and query-backend diagnostics,
3030
- Phase-3 tutor/memory diagnostics remain real and now include an active default runtime tutor adapter path in `src/server.ts`, so normal server execution can emit adapter telemetry instead of staying catalog-only.
@@ -106,7 +106,7 @@ Current branch status for this slice:
106106
- CI now has an always-on strict desktop evidence job in `.github/workflows/migration-gates.yml` (`agent-workspace-tauri-strict-evidence`) that runs `verify:agent-workspace:tauri:rust:strict` and `verify:agent-workspace:tauri:window-evidence:strict` on Linux hosts with explicit `javascriptcoregtk-4.1` / `libsoup-3.0` dependencies, and release workflow `.github/workflows/release-desktop-multi-os.yml` now enforces the same strict evidence gate on the Linux desktop build path before bundle generation; both workflows also generate a strict evidence index (`verify:agent-workspace:tauri:evidence:index:strict`), enforce a strict evidence manifest gate (`verify:agent-workspace:tauri:evidence:manifest:strict`), and upload tauri evidence artifacts (retention policy pinned to 30 days) for audit traceability, while the Linux release path now publishes `release-fragment-latest.md` into GitHub Release notes using marker-based idempotent upsert,
107107
- migration workflow now also includes a dedicated always-on `agent-workspace-contract-gates` job that runs `test:agent-workspace:contracts` (parity/frontend/tauri contract suites) plus `test:conversation-turn-cache:durability` (restart durability check for turn-cache trend index/export consistency), closing the CI drift-detection gap for agent-workspace contract evolution,
108108
- license governance now adds `test:license:contract` to enforce `GPL-3.0-only` parity across `LICENSE`, `README`, `package.json`, and `src-tauri/Cargo.toml`, and this gate is wired into `migration-gates` CI to block license drift,
109-
- browser smoke now exercises real `conversation/path/query-compare/quality/session/runbook` backend slices (including trend + history diagnostics plus runbook verify/checks/action-queue), real graph runtime, and real path runtime, and now asserts ANN sync-health plus verify/checks circuit/traceability/prefilter card content from browser evidence before emitting screenshot/console/network-summary artifacts (`scripts/verify-agent-workspace-browser.js`, `src/agent_workspace.browser.contract.test.ts`),
109+
- browser smoke now exercises real `conversation/path/query-compare/quality/session/runbook` backend slices (including trend + history diagnostics plus runbook verify/checks/action-queue), real graph runtime, and real path runtime, and now asserts ANN sync-health plus verify/checks circuit/traceability/prefilter threshold/signal content from browser evidence before emitting screenshot/console/network-summary artifacts (`scripts/verify-agent-workspace-browser.js`, `src/agent_workspace.browser.contract.test.ts`),
110110
- scoped conversation-memory foundation is now wired end-to-end (typed contracts, backend normalizers/routes, capability operation registry, locale keys, lifecycle tests, browser/runtime verification) through `/api/knowledge/conversation-memory/{list,add,search,delete,feedback}` (`src/learning/api.ts`, `src/learning/types.ts`, `src/learning/KnowledgeLearningPlatform.ts`, `src/server.ts`, `src/frontend/agent_workspace.js`, `src/knowledge.api.contract.test.ts`, `src/learning/KnowledgeLearningPlatform.test.ts`, `src/agent_workspace.frontend.test.ts`),
111111
- unified turn streaming baseline is now delivered on `/api/knowledge/conversation` via `Accept: text/event-stream` negotiation with a minimal event set (`turn_started`/`capability_planned`/`capability_progress`/`capability_result`/`turn_completed`/`turn_failed`) and frontend stream-first + sync fallback behavior (`src/server.ts`, `src/frontend/agent_workspace.js`, `src/knowledge.api.contract.test.ts`, `src/agent_workspace.frontend.test.ts`),
112112
- M8.2 recovery semantics are now in place on top of the stream baseline: frontend requests propagate client turn IDs across stream-first + sync fallback, server route `/api/knowledge/conversation` now enforces replay-window idempotency with turn-level dedupe/conflict protection (`turn_id_conflict`), and resumed stream requests replay cached turn events instead of re-running execution (`src/server.ts`, `src/frontend/agent_workspace.js`, `src/knowledge.api.contract.test.ts`, `src/agent_workspace.frontend.test.ts`),
@@ -132,7 +132,7 @@ Current branch status for this slice:
132132
## Latest Validation Snapshot (2026-05-14)
133133

134134
- Reconfirmed on the current Windows host in this turn: `node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`, `npm run test:agent-workspace:contracts`, `npm run build:with-vite`, `npm run docs:diataxis:check`, `npm run docs:site:build`, `NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_DYNAMIC_STRICT=1 node scripts/verify-agent-workspace-browser.js`.
135-
- The strict browser proof now explicitly verifies the bilingual runtime-runbook verify/checks ANN governance labels that were added in this slice: sync-health plus circuit, traceability, and prefilter summaries.
135+
- The strict browser proof now explicitly verifies the bilingual runtime-runbook verify/checks ANN governance labels that were added in this slice: sync-health plus circuit, traceability, and prefilter summaries, along with the threshold/signal drilldowns that support calibration work.
136136
- Tauri strict evidence is implementation-closed but still host-dependent:
137137
- the current Windows host proves non-strict tauri/runtime behavior and load-flow parity,
138138
- Linux strict evidence commands (`verify:agent-workspace:tauri:rust:strict`, `verify:agent-workspace:tauri:window-evidence:strict`, strict evidence index/manifest) still require provisioned `webkit2gtk-4.1`, `javascriptcoregtk-4.1`, and `libsoup-3.0`.
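
The threshold/signal drilldowns described in this dashboard can be pictured with a small sketch. It assumes each ANN governance check carries one observed signal, one configured threshold, and a direction; the type and function names below (`AnnCheckSignal`, `buildDrilldown`) are hypothetical and do not come from the repository.

```typescript
// Hypothetical shapes for illustration only; not the repository's actual types.
type Comparator = "min" | "max";

interface AnnCheckSignal {
  label: string;          // operator-facing name of the signal
  value: number;          // observed signal
  threshold: number;      // configured budget
  comparator: Comparator; // "min": value must be >= threshold; "max": value must be <= threshold
}

interface Drilldown {
  label: string;
  status: "pass" | "fail";
  margin: number; // positive or zero = headroom, negative = breach
  text: string;   // one-line summary an operator could read on a card
}

// Compare the observed signal against its threshold and build a summary line.
function buildDrilldown(s: AnnCheckSignal): Drilldown {
  const margin =
    s.comparator === "min" ? s.value - s.threshold : s.threshold - s.value;
  const status = margin >= 0 ? "pass" : "fail";
  const op = s.comparator === "min" ? ">=" : "<=";
  return {
    label: s.label,
    status,
    margin,
    text: `${s.label}: signal ${s.value} (threshold ${op} ${s.threshold}) ${status}`,
  };
}
```

Showing the margin next to the pass/fail status is what would make such a card useful for calibration: an operator can see how close a passing check sits to its budget, not only whether it passed.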

docs/diataxis/zh/explanation/development-progress-dashboard.md

Lines changed: 4 additions & 4 deletions
@@ -12,9 +12,9 @@
1212
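- `src/learning/queryBackend.ts` / `src/learning/vectorAccelerationAdapter.ts` now provide ANN-style prefilter, representation telemetry, circuit health, remote index sync, and live `external_http` connector proof,
1313
- runtime capability / runbook governance now also adds an explicit ANN remote index-sync health check (`query_vector_acceleration_index_sync_health`), alongside prefilter, health, traceability, and circuit,
1414
- `server.ts` now closes the corresponding operator loop: the sync-health gate participates in verification escalation, the remediation action queue, and per-check runbook history summaries,
15-
- the agent workspace runtime-runbook frontend surfaces now push operator-facing ANN governance directly into the shell: verify/checks show not only sync-health but also circuit-budget, traceability, and prefilter summaries; action-queue continues to carry the index-sync incident drilldown,
15+
- the agent workspace runtime-runbook frontend surfaces now push operator-facing ANN governance directly into the shell: verify/checks show not only sync-health but also circuit-budget, traceability, and prefilter summaries, plus the threshold/signal drilldowns used for calibration; action-queue continues to carry the index-sync incident drilldown,
1616
- the modular `src/routes/knowledge.ts` runtime-runbook routes now also delegate to the real server-side runbook ops with full query-parameter passthrough, so browser/runtime consumers no longer hit the old KLP placeholder verify/history/checks/action-queue/remediation/schedule responses,
17-
- browser strict smoke now also proves these ANN runbook surfaces with real browser evidence: the verify card's ANN sync/circuit/traceability/prefilter content, the checks card's first-check ANN sync plus circuit/traceability/prefilter snapshots, and the action-queue index-sync drilldown are all covered by end-to-end assertions instead of remaining at the component-test layer,
17+
- browser strict smoke now also proves these ANN runbook surfaces with real browser evidence: the verify card's ANN sync/circuit/traceability/prefilter content plus threshold/signal labels, the checks card's first-check ANN sync plus circuit/traceability/prefilter snapshots, and the action-queue index-sync drilldown are all covered by end-to-end assertions instead of remaining at the component-test layer,
1818
- locale governance for the agent workspace is now stricter as well: the bilingual locale bundles now cover the query/quality/runbook card copy actually exercised by strict browser smoke, `src/agent_workspace.locale.contract.test.ts` blocks drift in source-referenced `agentWorkspace.*` keys, and the startup-time `translate()` no longer emits false missing-key warnings before locale initialization finishes,
1919
- the Phase-2 runtime diagnostics in `src/learning/KnowledgeLearningPlatform.ts` are now wired to real implementations, including query-backend comparison/history/trend, knowledge staleness diagnostics/rebuild planning, learning-quality history/trend, session-plan quality evaluate/history/trend/runtime-threshold diagnostics, query-backend config, and query-backend diagnostics,
2020
- Phase-3 tutor/memory diagnostics remain real implementations, and `src/server.ts` now injects a default active tutor adapter, so the normal server path can emit adapter telemetry directly.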
@@ -118,7 +118,7 @@
118118
- CI now has an always-on strict desktop evidence job in `.github/workflows/migration-gates.yml` (`agent-workspace-tauri-strict-evidence`) that installs the `javascriptcoregtk-4.1` / `libsoup-3.0` dependencies on a Linux host and then runs `verify:agent-workspace:tauri:rust:strict` and `verify:agent-workspace:tauri:window-evidence:strict`; the release workflow `.github/workflows/release-desktop-multi-os.yml` also enforces the same strict evidence gate on the Linux desktop build path (before bundle artifacts are built); both workflows additionally generate a strict evidence index (`verify:agent-workspace:tauri:evidence:index:strict`), run a strict evidence manifest gate (`verify:agent-workspace:tauri:evidence:manifest:strict`), and upload tauri evidence artifacts for audit traceability (retention fixed at 30 days), while the Linux release path writes `release-fragment-latest.md` into GitHub Release notes via marker-based idempotent upsert,
119119
- `migration-gates` now also adds an always-on `agent-workspace-contract-gates` job that runs `test:agent-workspace:contracts` (the parity/frontend/tauri contract suites) and `test:conversation-turn-cache:durability` (a cross-restart consistency check for the turn-cache trend index/export), closing the CI drift-blocking gap for agent-workspace contract evolution,
120120
- license governance now adds a license-consistency gate: `test:license:contract` verifies that `LICENSE`, `README`, `package.json`, and `src-tauri/Cargo.toml` all stay at `GPL-3.0-only` on mainline, and it is wired into the `migration-gates` CI job,
121-
- browser smoke now covers real `conversation/path/query-compare/quality/session/runbook` backend slices (including trend + history diagnostics and runbook verify/checks/action-queue), real graph runtime, and real path runtime, and before emitting screenshot / console / network-summary evidence it first asserts that the ANN sync-health + circuit/traceability/prefilter content in verify/checks and the action-queue index-sync drilldown are actually rendered (`scripts/verify-agent-workspace-browser.js`, `src/agent_workspace.browser.contract.test.ts`),
121+
- browser smoke now covers real `conversation/path/query-compare/quality/session/runbook` backend slices (including trend + history diagnostics and runbook verify/checks/action-queue), real graph runtime, and real path runtime, and before emitting screenshot / console / network-summary evidence it first asserts that the ANN sync-health + circuit/traceability/prefilter content plus threshold/signal labels in verify/checks and the action-queue index-sync drilldown are actually rendered (`scripts/verify-agent-workspace-browser.js`, `src/agent_workspace.browser.contract.test.ts`),
122122
- the scoped conversation-memory baseline is now wired end to end (typed contracts, backend normalizers/routes, frontend operation registry, bilingual keys, lifecycle tests, runtime/browser verification) through the endpoints `/api/knowledge/conversation-memory/{list,add,search,delete,feedback}` (`src/learning/api.ts`, `src/learning/types.ts`, `src/learning/KnowledgeLearningPlatform.ts`, `src/server.ts`, `src/frontend/agent_workspace.js`, `src/knowledge.api.contract.test.ts`, `src/learning/KnowledgeLearningPlatform.test.ts`, `src/agent_workspace.frontend.test.ts`),
123123
- the minimal unified turn streaming baseline has landed: `/api/knowledge/conversation` negotiates an event stream via `Accept: text/event-stream` (`turn_started`/`capability_planned`/`capability_progress`/`capability_result`/`turn_completed`/`turn_failed`), and the frontend uses stream-first with a synchronous JSON fallback (`src/server.ts`, `src/frontend/agent_workspace.js`, `src/knowledge.api.contract.test.ts`, `src/agent_workspace.frontend.test.ts`),
124124
- M8.2 recovery semantics have landed: the frontend propagates a single `turnId` across stream-first and sync fallback, `/api/knowledge/conversation` now adds a turn-level replay window with dedupe/conflict protection (`turn_id_conflict`), and a retry after an interruption replays cached events instead of re-executing the turn (`src/server.ts`, `src/frontend/agent_workspace.js`, `src/knowledge.api.contract.test.ts`, `src/agent_workspace.frontend.test.ts`),
@@ -144,7 +144,7 @@
144144
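## Latest Validation Snapshot (2026-05-14)
145145

146146
- Reconfirmed as passing on the current Windows host in this round: `node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`, `npm run test:agent-workspace:contracts`, `npm run build:with-vite`, `npm run docs:diataxis:check`, `npm run docs:site:build`, `NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_DYNAMIC_STRICT=1 node scripts/verify-agent-workspace-browser.js`.
147-
- The strict browser evidence now explicitly verifies the bilingual runtime-runbook verify/checks ANN governance labels added in this round: not only sync-health but also circuit, traceability, and prefilter summaries.
147+
- The strict browser evidence now explicitly verifies the bilingual runtime-runbook verify/checks ANN governance labels added in this round: not only sync-health but also circuit, traceability, and prefilter summaries, as well as the threshold/signal drilldowns that support calibration work.
148148
- The Tauri strict evidence chain is closed at the implementation level but remains constrained by host dependencies:
149149
- the current Windows host already proves non-strict tauri/runtime behavior and load-flow parity,
150150
- Linux strict evidence commands (`verify:agent-workspace:tauri:rust:strict`, `verify:agent-workspace:tauri:window-evidence:strict`, and the strict evidence index/manifest) still require the host to have `webkit2gtk-4.1`, `javascriptcoregtk-4.1`, and `libsoup-3.0` preinstalled.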

docs/en/TEST_REPORT.md

Lines changed: 3 additions & 2 deletions
@@ -49,9 +49,10 @@
4949
### What This Refresh Adds
5050

5151
1. The Phase-2 ANN governance slice is now operator-visible through the frontend runbook shell, not only backend JSON:
52-
- verify/checks now surface ANN sync-health, circuit-budget, traceability, and prefilter summaries,
52+
- verify/checks now surface ANN sync-health, circuit-budget, traceability, and prefilter summaries plus threshold/signal drilldowns,
5353
- action-queue continues to carry the index-sync incident drilldown.
54-
2. This refresh still does **not** prove release-grade Phase-2 closure:
54+
2. `query_vector_acceleration_prefilter_effectiveness` now shares the ANN fast-lane escalation path instead of using the slower generic escalation branch.
55+
3. This refresh still does **not** prove release-grade Phase-2 closure:
5556
- it closes visibility and browser/runtime proof for the new ANN governance summaries,
5657
- it does **not** close workload/threshold calibration for those budgets.
5758
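
The fast-lane change noted in item 2 can be read as a routing decision keyed on the check ID. The sketch below is an assumption about the shape of that decision, not the code in `server.ts`: the lane names, the `ANN_FAST_LANE_CHECKS` set, and `escalationLane` are hypothetical, and only the two check IDs are taken from this commit's docs. Which other ANN checks belong in the fast lane is not stated here, so the set is deliberately short.

```typescript
// Hypothetical routing sketch; lane names and function are illustrative only.
type EscalationLane = "ann_fast_lane" | "generic";

// Check IDs quoted in the docs above. Treating index-sync health as a
// fast-lane member is an assumption made for this example.
const ANN_FAST_LANE_CHECKS: ReadonlySet<string> = new Set([
  "query_vector_acceleration_index_sync_health",
  "query_vector_acceleration_prefilter_effectiveness",
]);

// Route a failing check to the ANN fast lane when its ID is registered,
// otherwise fall back to the generic escalation branch.
function escalationLane(checkId: string): EscalationLane {
  return ANN_FAST_LANE_CHECKS.has(checkId) ? "ann_fast_lane" : "generic";
}
```

Keeping the membership in one set means that adding a check to the fast lane is a one-line change, which is roughly what this commit describes for the prefilter-effectiveness check.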

docs/en/TODO.md

Lines changed: 2 additions & 1 deletion
@@ -11,7 +11,8 @@
1111
- [x] Default runtime graph backend is no longer `local-file-graphdb`: the server now targets embedded `graphdb/sqlite` with explicit file fallback.
1212
- [~] The new embedded `graphdb/sqlite` baseline is now restart-durability-proved, but packaged/runtime proof and heavier workload gates still remain before calling A8 production-closed.
1313
- [~] Phase-1 A9 now has a live `external_http` connector baseline with remote index sync and end-to-end query proof, but recall/latency thresholds and larger-workload validation still remain before production closure.
14-
- [x] Agent-workspace runbook verify/checks now surface ANN index-sync, circuit, traceability, and prefilter summaries, while action-queue keeps the index-sync incident drilldown.
14+
- [x] Agent-workspace runbook verify/checks now surface ANN index-sync, circuit, traceability, and prefilter summaries plus threshold/signal drilldowns, while action-queue keeps the index-sync incident drilldown.
15+
- [x] `query_vector_acceleration_prefilter_effectiveness` now shares the ANN fast-lane escalation path instead of lagging behind the other ANN governance checks.
1516
- [ ] Move the newly surfaced ANN governance budgets from visibility closure to workload/threshold calibration closure, then promote the new Phase-2 diagnostics to release-grade gates only after the same checks run on a release-grade graphdb/ANN baseline.
1617
- [ ] Extend tutor routing from the new local-first baseline into a production-proven multi-provider policy.
1718
- [ ] Continue FR-009 evidence freshness, Linux strict Tauri host provisioning, and final Electron decommission review.
