Commit 516ca4c

feat(runbook): add ann calibration readiness gate
1 parent 5a09512 commit 516ca4c

12 files changed

Lines changed: 246 additions & 10 deletions

docs/diataxis/en/explanation/development-progress-dashboard.md

Lines changed: 3 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -21,6 +21,7 @@ It tracks what is already implemented, where the hard gaps remain, and how to ve
2121
- the embedded sqlite baseline now also has restart-durability proof: shutdown closes the store cleanly, the adapter can reopen safely, and server integration covers ingest -> shutdown -> fresh module reload -> diagnostics/query/readiness continuity,
2222
- ANN-style prefilter, representation telemetry, circuit health, remote index sync, and live `external_http` connector proof now exist in `src/learning/queryBackend.ts` and `src/learning/vectorAccelerationAdapter.ts`,
2323
- runtime capability/runbook governance now includes explicit ANN remote index-sync health (`query_vector_acceleration_index_sync_health`) in addition to prefilter, health, traceability, and circuit checks,
24+
- runtime capability governance now also includes explicit gate `query_vector_acceleration_calibration_readiness`, which formalizes whether the ANN path is even ready for release-grade threshold tuning,
2425
- `server.ts` now closes the corresponding operator loop: the index-sync gate participates in verification escalation, remediation action-queue generation, and per-check runbook history summaries,
2526
- the agent workspace runtime-runbook surfaces now render operator-facing ANN governance directly in the frontend shell: verify/checks now expose sync-health plus circuit-budget, traceability, and prefilter summaries, and they now also show threshold/signal drilldowns plus calibration-readiness state needed for budget tuning work, while action-queue keeps the index-sync incident drilldown,
2627
- the modular `src/routes/knowledge.ts` runtime-runbook surfaces now delegate to live server-side runbook ops with full query-parameter passthrough, so browser/runtime consumers no longer hit the old KLP placeholder payloads for verify/history/checks/action-queue/remediation/schedule flows,
@@ -31,7 +32,7 @@ It tracks what is already implemented, where the hard gaps remain, and how to ve
3132
- What is not closed yet:
3233
- Phase-1 A8 has advanced beyond a file-only default: `src/server.ts` now defaults to `graphdb/sqlite` with explicit file fallback, and restart durability is already proved, but packaged/runtime proof and heavier-workload hardening are still open before calling the local graph backend production-closed,
3334
- Phase-1 A9 is now operational rather than scaffold-only, but recall/latency calibration and larger-workload validation are still open before calling the ANN layer production-closed,
34-
- Phase-2 quality/session/query observability is now real, but it is not yet release-closed because these gates still require release-grade calibration on top of the current graph/ANN operational baseline,
35+
- Phase-2 quality/session/query observability is now real, but it is not yet release-closed because these gates still require release-grade calibration on top of the current graph/ANN operational baseline; the new ANN calibration-readiness gate only formalizes prerequisites, not closure,
3536
- default tutor routing is no longer catalog-only, but the runtime is still effectively `local`-first and retains explicit rule-engine fallback rather than a production-proven multi-provider routing policy.
3637
- Active execution focus therefore shifts to truth-first foundation recovery:
3738
- finish the remaining packaged/runtime + heavier-workload closure for the embedded graph backend baseline,
@@ -131,7 +132,7 @@ Current branch status for this slice:
131132

132133
## Latest Validation Snapshot (2026-05-14)
133134

134-
- Reconfirmed on the current Windows host in this turn: `node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`, `npm run test:agent-workspace:contracts`, `npm run build:with-vite`, `npm run docs:diataxis:check`, `npm run docs:site:build`, `NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_DYNAMIC_STRICT=1 node scripts/verify-agent-workspace-browser.js`.
135+
- Reconfirmed on the current Windows host in this turn: `node node_modules/jest/bin/jest.js src/learning/runtimeCapability.test.ts src/knowledge.api.contract.test.ts --runInBand --no-cache`, `node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`, `npm run test:agent-workspace:contracts`, `npm run build:with-vite`, `npm run docs:diataxis:check`, `npm run docs:site:build`, `NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_DYNAMIC_STRICT=1 node scripts/verify-agent-workspace-browser.js`.
135136
- The strict browser proof now explicitly verifies the bilingual runtime-runbook verify/checks ANN governance labels that were added in this slice: sync-health plus circuit, traceability, and prefilter summaries, along with the threshold/signal drilldowns and calibration-readiness cues that support budget-tuning work.
136137
- Tauri strict evidence is implementation-closed but still host-dependent:
137138
- the current Windows host proves non-strict tauri/runtime behavior and load-flow parity,

docs/diataxis/zh/explanation/development-progress-dashboard.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
- embedded sqlite 基线现在还具备了重启耐久性证明:shutdown 会干净关闭 store,adapter 可安全重开,server integration 已覆盖 ingest -> shutdown -> fresh module reload -> diagnostics/query/readiness 连续性,
1212
- `src/learning/queryBackend.ts` / `src/learning/vectorAccelerationAdapter.ts` 现已具备 ANN 风格 prefilter、representation telemetry、circuit health、远端索引同步,以及 live `external_http` connector 证明,
1313
- runtime capability / runbook 治理也已新增显式的 ANN 远端索引同步健康度检查(`query_vector_acceleration_index_sync_health`),与 prefilter、health、traceability、circuit 并列,
14+
- runtime capability 治理现在也新增了显式门禁 `query_vector_acceleration_calibration_readiness`,用来正式回答当前 ANN 路径是否已经具备进入发布级阈值校准的前提条件,
1415
- `server.ts` 现已补齐对应的 operator 闭环:该 sync-health 门禁已经进入 verification escalation、remediation action queue、以及 per-check runbook history summary,
1516
- agent workspace 的 runtime-runbook 前端面现已把面向运维的 ANN 治理直接前推到壳层:verify/checks 不仅能看到 sync-health,还能看到熔断预算、可追踪性、预筛选摘要,并进一步看到用于校准的阈值/信号钻取和校准就绪态;action-queue 则继续承载 index-sync 事故钻取,
1617
- modular `src/routes/knowledge.ts` 的 runtime-runbook 路由面现在也会委托到真实 server 侧 runbook ops,并完整透传 query 参数,因此浏览器/运行时消费者不再命中旧的 KLP placeholder verify/history/checks/action-queue/remediation/schedule 响应,
@@ -21,7 +22,7 @@
2122
- 当前仍未闭环的部分:
2223
- Phase-1 A8 已经超出 file-only 默认态:`src/server.ts` 现在默认走 `graphdb/sqlite` 并保留显式 file fallback,且重启耐久性已证明;但在宣布本地图后端达到生产闭环之前,packaged/runtime 证明与更重工作负载级加固仍未完成;
2324
- Phase-1 A9 现已进入 operational baseline,而不再只是 scaffolding:但在宣布 ANN 层达到生产闭环前,仍需补齐 recall/latency 校准与更大工作负载验证;
24-
- Phase-2 的 quality/session/query 可观测性已不再是空占位,但它们仍需要建立在当前 graph/ANN operational baseline 之上的发布级校准,因此还不能宣称发布级闭环;
25+
- Phase-2 的 quality/session/query 可观测性已不再是空占位,但它们仍需要建立在当前 graph/ANN operational baseline 之上的发布级校准,因此还不能宣称发布级闭环;新的 ANN calibration-readiness gate 只是把前提条件正式化,并不等于校准完成;
2526
- 默认 tutor routing 已不再只是 catalog-only,但当前 runtime 仍是 `local`-first,并保留显式 rule-engine fallback,而不是已验证的生产级多 provider 路由策略。
2627
- 因此当前活跃重心不是“默认认为 Phase-1 已完成然后推进上层”,而是:
2728
1. 先补完 embedded graph backend 基线剩余的 packaged/runtime + 更重工作负载闭环,
@@ -143,7 +144,7 @@
143144

144145
## 最新验证快照(2026-05-14)
145146

146-
- 本轮已在当前 Windows 宿主重新确认通过:`node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`、`npm run test:agent-workspace:contracts`、`npm run build:with-vite`、`npm run docs:diataxis:check`、`npm run docs:site:build`、`NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_DYNAMIC_STRICT=1 node scripts/verify-agent-workspace-browser.js`。
147+
- 本轮已在当前 Windows 宿主重新确认通过:`node node_modules/jest/bin/jest.js src/learning/runtimeCapability.test.ts src/knowledge.api.contract.test.ts --runInBand --no-cache`、`node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`、`npm run test:agent-workspace:contracts`、`npm run build:with-vite`、`npm run docs:diataxis:check`、`npm run docs:site:build`、`NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_STRICT=1 NOTE_CONNECTION_AGENT_WORKSPACE_BROWSER_UI_DYNAMIC_STRICT=1 node scripts/verify-agent-workspace-browser.js`。
147148
- 严格浏览器证据现在已显式校验本轮新增的双语 runtime-runbook verify/checks ANN 治理标签:不仅验证 sync-health,也验证熔断、可追踪性、预筛选摘要,以及支撑校准工作的阈值/信号钻取和校准就绪态。
148149
- Tauri strict 证据链在实现层面已经闭环,但仍受宿主依赖约束:
149150
- 当前 Windows 宿主已经证明 non-strict tauri/runtime 行为与 load-flow parity,

docs/en/TEST_REPORT.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,8 @@
3535

3636
- [x] `node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`
3737
- PASS
38+
- [x] `node node_modules/jest/bin/jest.js src/learning/runtimeCapability.test.ts src/knowledge.api.contract.test.ts --runInBand --no-cache`
39+
- PASS
3840
- [x] `npm run test:agent-workspace:contracts`
3941
- PASS
4042
- [x] `npm run build:with-vite`
@@ -53,7 +55,8 @@
5355
- they now also surface ANN circuit budget flags and prefilter calibration-readiness cues,
5456
- action-queue continues to carry the index-sync incident drilldown.
5557
2. `query_vector_acceleration_prefilter_effectiveness` now shares the ANN fast-lane escalation path instead of using the slower generic escalation branch.
56-
3. This refresh still does **not** prove release-grade Phase-2 closure:
58+
3. Runtime capability governance now has explicit gate `query_vector_acceleration_calibration_readiness`, which fails or warns until the ANN path has representative sync/prefilter/traceability/stability telemetry in the same runtime window.
59+
4. This refresh still does **not** prove release-grade Phase-2 closure:
5760
- it closes visibility and browser/runtime proof for the new ANN governance summaries,
5861
- it does **not** close workload/threshold calibration for those budgets.
5962

docs/en/TODO.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
- [x] Agent-workspace runbook verify/checks now surface ANN index-sync, circuit, traceability, and prefilter summaries plus threshold/signal drilldowns, while action-queue keeps the index-sync incident drilldown.
1515
- [x] `query_vector_acceleration_prefilter_effectiveness` now shares the ANN fast-lane escalation path instead of lagging behind the other ANN governance checks.
1616
- [x] Agent-workspace runbook verify/checks now also surface ANN circuit budget flags and prefilter calibration-readiness cues, so budget tuning no longer depends on raw JSON inspection.
17+
- [x] Runtime capability matrix/runbook now has explicit gate `query_vector_acceleration_calibration_readiness` to formalize whether ANN threshold tuning can start.
1718
- [ ] Move the newly surfaced ANN governance budgets from visibility closure to workload/threshold calibration closure, then promote the new Phase-2 diagnostics to release-grade gates only after the same checks run on a release-grade graphdb/ANN baseline.
1819
- [ ] Extend tutor routing from the new local-first baseline into a production-proven multi-provider policy.
1920
- [ ] Continue FR-009 evidence freshness, Linux strict Tauri host provisioning, and final Electron decommission review.
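
Reviewer note: the docs in this commit describe `query_vector_acceleration_calibration_readiness` as a gate that fails or warns until sync telemetry, stable connector state, prefilter sample readiness, evaluable candidate ratios, and external traceability signals are all present in the same runtime window. The sketch below is one way such a check could be shaped, not the code in `src/learning/`. Only the gate id string is taken from the docs; the type names, field names, the `MIN_PREFILTER_SAMPLES` threshold, and the fail-versus-warn policy are all assumptions made for illustration.

```typescript
// Illustrative sketch only. Everything except the gate id is hypothetical.
type GateStatus = "pass" | "warn" | "fail";

// Telemetry assumed to be collected over a single runtime window.
interface AnnWindowTelemetry {
  syncTelemetryPresent: boolean;       // remote index-sync telemetry was observed
  connectorStable: boolean;            // connector state stayed stable in the window
  prefilterSampleCount: number;        // prefilter samples observed
  evaluableCandidateRatios: number;    // candidate ratios that can be evaluated
  externalTraceabilitySignals: number; // traceability signals from the external path
}

interface GateResult {
  id: string;
  status: GateStatus;
  missing: string[];
}

const GATE_ID = "query_vector_acceleration_calibration_readiness";
const MIN_PREFILTER_SAMPLES = 20; // assumed threshold, not from the repository

function evaluateCalibrationReadiness(t: AnnWindowTelemetry): GateResult {
  const missing: string[] = [];
  if (!t.syncTelemetryPresent) missing.push("sync_telemetry");
  if (!t.connectorStable) missing.push("stable_connector_state");
  if (t.prefilterSampleCount < MIN_PREFILTER_SAMPLES) missing.push("prefilter_sample_readiness");
  if (t.evaluableCandidateRatios <= 0) missing.push("evaluable_candidate_ratios");
  if (t.externalTraceabilitySignals <= 0) missing.push("external_traceability_signals");

  // Assumed policy: missing sync telemetry or an unstable connector is a hard
  // failure; any other missing prerequisite is only a warning.
  const hardFailure = !t.syncTelemetryPresent || !t.connectorStable;
  const status: GateStatus =
    missing.length === 0 ? "pass" : hardFailure ? "fail" : "warn";

  return { id: GATE_ID, status, missing };
}

// Usage: a window with sync and connector health but too few prefilter samples.
const result = evaluateCalibrationReadiness({
  syncTelemetryPresent: true,
  connectorStable: true,
  prefilterSampleCount: 5,
  evaluableCandidateRatios: 12,
  externalTraceabilitySignals: 3,
});
console.log(JSON.stringify(result));
```

Returning the list of missing prerequisites alongside the status keeps the result usable for operator-facing drilldowns; a `pass` here would only mean threshold tuning can start, matching the docs' point that the gate formalizes prerequisites rather than closure.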

docs/en/implementation_plan.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ Bring code truth, active progress docs, and next execution order back into align
2020
- `src/query_backend.external_http.integration.test.ts` now proves a live `external_http` connector path end to end: ingest -> remote index sync -> query -> diagnostics.
2121
- runtime capability governance now treats ANN remote index sync as a first-class check: `query_vector_acceleration_index_sync_health` is emitted in the matrix/runbook alongside health, traceability, prefilter, and circuit checks.
2222
- `server.ts` now closes the runbook loop for that new gate: ANN index-sync health is included in verification escalation, remediation action-queue generation, and per-check history summaries.
23+
- runtime capability governance now also has an explicit ANN calibration prerequisite gate: `query_vector_acceleration_calibration_readiness` blocks release-grade threshold tuning until sync telemetry, stable connector state, prefilter sample readiness, evaluable candidate ratios, and external traceability signals are all present in the same runtime window.
2324
- the agent workspace runtime runbook surfaces now expose ANN sync-health metrics across verify/checks/action-queue flows, and the verify/checks cards now also surface ANN circuit-budget, traceability, and prefilter summaries plus threshold/signal drilldowns and calibration-readiness state, so operator-facing governance no longer stops at `index_sync_health`.
2425
- modular knowledge-route wiring for `runtime-capability-runbook/*` is now backed by live server-side runbook ops instead of KLP placeholder payloads, and the route layer now preserves `checkId` / `sinceMinutes` / queue-filter query params rather than dropping them.
2526
- the real browser smoke gate now proves those verify/checks/action-queue surfaces end to end: strict browser evidence must show the ANN sync-health verify card, the new verify/checks ANN circuit/traceability/prefilter drilldowns, the first-check ANN sync metric, and the index-sync action-queue drilldown instead of only proving that the cards can open.
@@ -37,7 +38,7 @@ Bring code truth, active progress docs, and next execution order back into align
3738
|---|---|---|---|
3839
| Phase-1 A8 graph backend | production-grade local graph backend | ops semantics exist, default runtime now targets embedded `graphdb/sqlite` with explicit file fallback, and restart durability is integration-proved; packaged/runtime proof and heavier-workload hardening are still open | Operational baseline |
3940
| Phase-1 A9 ANN connector | production-grade ANN connector | `external_http` now supports remote index sync plus live end-to-end query proof under strict failure/representation semantics, but recall/latency calibration and larger-workload validation are still open | Operational baseline |
40-
| Phase-2 quality gates | live mastery/divergence quality trend gates | query-backend comparison, staleness, learning-quality, and session-plan-quality runtime surfaces are now live in `KnowledgeLearningPlatform.ts`; operator-facing ANN governance now surfaces index-sync, circuit, traceability, and prefilter summaries plus threshold/signal drilldowns and calibration-readiness cues through runbook verify/checks, but the full gate set still needs release-grade calibration on top of the current graph/ANN operational baseline | Operational baseline |
41+
| Phase-2 quality gates | live mastery/divergence quality trend gates | query-backend comparison, staleness, learning-quality, and session-plan-quality runtime surfaces are now live in `KnowledgeLearningPlatform.ts`; operator-facing ANN governance now surfaces index-sync, circuit, traceability, and prefilter summaries plus threshold/signal drilldowns and calibration-readiness cues through runbook verify/checks, and runtime now carries explicit gate `query_vector_acceleration_calibration_readiness`, but the full gate set still needs release-grade calibration on top of the current graph/ANN operational baseline | Operational baseline |
4142
| Phase-3 tutor + memory | tutor and memory operating layer becomes real | tutor telemetry/trace/provider trends + conversation memory + memory-policy diagnostics are real, and default runtime now injects a local tutor adapter; production-proven multi-provider routing is still open | Operational baseline |
4243
| Architecture compaction | major monoliths reduced to sustainable size | `server.ts` 14,992, `KnowledgeLearningPlatform.ts` 7,706, `path_app.js` 4,649, `app.js` 4,713, `routes/knowledge.ts` 690 | Open |
4344

docs/zh/TEST_REPORT.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,8 @@
3535

3636
- [x] `node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`
3737
- 通过
38+
- [x] `node node_modules/jest/bin/jest.js src/learning/runtimeCapability.test.ts src/knowledge.api.contract.test.ts --runInBand --no-cache`
39+
- 通过
3840
- [x] `npm run test:agent-workspace:contracts`
3941
- 通过
4042
- [x] `npm run build:with-vite`
@@ -53,7 +55,8 @@
5355
- 也会进一步展示 ANN 熔断预算标志与预筛选校准就绪态,
5456
- action-queue 继续承载 index-sync 事故钻取。
5557
2. `query_vector_acceleration_prefilter_effectiveness` 现已进入 ANN 快速升级路径,不再沿用较慢的通用升级分支。
56-
3. 这轮刷新仍然**不等于**发布级 Phase-2 闭环:
58+
3. runtime capability 治理现在已经具备显式门禁 `query_vector_acceleration_calibration_readiness`,会在 ANN 路径尚未形成同窗口 sync/prefilter/traceability/stability 代表性遥测时直接给出 fail/warn。
59+
4. 这轮刷新仍然**不等于**发布级 Phase-2 闭环:
5760
- 它闭合的是新 ANN 治理摘要的可见性与 browser/runtime 证明,
5861
- **没有**闭合这些预算的工作负载/阈值校准。
5962

docs/zh/TODO.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77
- [x] agent-workspace 的 runbook verify/checks 现已显式暴露 ANN index-sync、熔断、可追踪性、预筛选摘要及阈值/信号钻取,action-queue 继续承载 index-sync 事故钻取。
88
- [x] `query_vector_acceleration_prefilter_effectiveness` 现已进入 ANN 快速升级路径,不再落后于其他 ANN 治理检查。
99
- [x] agent-workspace 的 runbook verify/checks 现已进一步显式暴露 ANN 熔断预算标志与预筛选校准就绪态,预算调优不再依赖人工翻 raw JSON。
10+
- [x] runtime capability matrix/runbook 现已具备显式门禁 `query_vector_acceleration_calibration_readiness`,正式约束 ANN 阈值校准何时可以开始。
1011
- [ ] 先把这批新暴露出来的 ANN 治理预算从“可见”推进到“可校准”,再在同一套检查运行在发布级 graphdb/ANN 基线之上后,把新的 Phase-2 诊断面升级为发布级门禁。
1112

1213
- [x] agent-workspace 的 browser/runtime/Tauri 验证闭环已经是真实状态。
