
Commit cd67781

feat(learning): measure backend sufficiency query telemetry
1 parent c12a4ca commit cd67781

11 files changed

Lines changed: 256 additions & 22 deletions

docs/diataxis/en/explanation/development-progress-dashboard.md

Lines changed: 5 additions & 1 deletion
@@ -55,6 +55,10 @@ Execution anchor:
 - `scripts/backend-baseline-sufficiency-utils.js`
 - `scripts/verify-backend-baseline-sufficiency.js`
 - `src/backend.baseline.sufficiency.contract.test.ts`
+- Upgraded the first real measured observation path:
+- runtime query traffic now promotes sufficiency query-latency observation from `policy_default` to `measured`,
+- the warm-query threshold now carries `observedValue` when runtime telemetry exists,
+- docs/verifier output now expose query observation mode and latest persistence event timestamps.
 - Locked the current architectural posture to an evidence-first rule:
 - local sqlite + deterministic ANN remains the default backend baseline,
 - heavier backend work now needs explicit escalation triggers,
@@ -64,7 +68,7 @@ Execution anchor:
 - `mkdocs.yml`
 - EN/ZH Diataxis overview progress entry points.
 - Changed the next bounded backend-governance slice:
-- replace `policy_default` thresholds with measured capture rather than redoing initial route/report wiring.
+- extend measurement beyond query latency into rebuild/recovery duration capture rather than redoing initial route/report wiring.

 ## Latest Mainline Increment (2026-04-14 M1 Baseline)
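The promotion rule in this entry comes down to three checks on runtime telemetry, matching the `measuredQueryLatency` expression in `scripts/backend-baseline-sufficiency-utils.js`: a positive query count, a finite p95, and a non-negative p95. A minimal TypeScript sketch of just that rule; the `RuntimeQueryState` interface and the function name are hypothetical, not exports of the repository:

```typescript
// Sketch only: EvidenceMode mirrors the union in the contract test;
// RuntimeQueryState and deriveQueryLatencyObservationMode are hypothetical names.
type EvidenceMode = 'policy_default' | 'measured' | 'observed_event';

interface RuntimeQueryState {
  queryCount?: number;
  queryP95Ms?: number;
}

// Promote to 'measured' only when runtime traffic produced a usable p95:
// at least one recorded query, a finite p95, and a non-negative p95.
function deriveQueryLatencyObservationMode(state: RuntimeQueryState): EvidenceMode {
  const count = Number(state.queryCount || 0);
  const p95 = state.queryP95Ms;
  const measured = count > 0 && typeof p95 === 'number' && Number.isFinite(p95) && p95 >= 0;
  return measured ? 'measured' : 'policy_default';
}

console.log(deriveQueryLatencyObservationMode({ queryCount: 0 })); // policy_default
console.log(deriveQueryLatencyObservationMode({ queryCount: 3, queryP95Ms: 12.5 })); // measured
console.log(deriveQueryLatencyObservationMode({ queryCount: 3, queryP95Ms: Number.NaN })); // policy_default
```

Without any recorded queries the observation stays at `policy_default`, which is why the existing contract tests (no runtime traffic) still expect that mode.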

docs/diataxis/en/explanation/local-backend-sufficiency-and-escalation-plan.md

Lines changed: 3 additions & 3 deletions
@@ -36,7 +36,7 @@ This decision is no longer prose-only.

 Important limitation:

-- current evidence mode is still `policy_default`, not measured performance capture.
+- top-level evidence mode is still `policy_default`, but query-latency observation now upgrades to `measured` when runtime query history exists.

 That means the next backend-governance step is **measurement hardening**, not backend replacement.

@@ -223,8 +223,8 @@ That is done.

 The remaining slice is:

-- replace `policy_default` thresholds with measured capture where feasible,
-- add explicit rebuild / recovery / query-latency observation fields,
+- replace the remaining `policy_default` thresholds with measured capture where feasible,
+- extend measurement beyond query latency into rebuild / recovery duration fields,
 - keep escalation tied to measured pressure instead of architectural anxiety.

 ## Recommended Next Lane
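The rule "keep escalation tied to measured pressure" shows up in this commit as the new `query_latency_threshold_exceeded` trigger: it fires only when the query-latency observation is `measured` and its p95 exceeds the 300 ms warm-query target. A sketch of that check as a standalone function. Unlike the commit, which writes the numeric literal 300 in three places (threshold target, within-threshold check, trigger condition), this version takes the target as a parameter; the function, interface, and parameter names are hypothetical:

```typescript
// Sketch only: the trigger id, summary wording, and 300 ms target come from this commit;
// queryLatencyTrigger, QueryLatencySnapshot, and targetMs are hypothetical names.
interface QueryLatencySnapshot {
  queryCount: number;
  queryP95Ms?: number;
}

interface LatencyTrigger {
  triggerId: 'query_latency_threshold_exceeded';
  summary: string;
}

const WARM_QUERY_P95_TARGET_MS = 300;

function queryLatencyTrigger(
  snapshot: QueryLatencySnapshot,
  targetMs: number = WARM_QUERY_P95_TARGET_MS,
): LatencyTrigger | null {
  const p95 = snapshot.queryP95Ms;
  // Unmeasured state (policy_default) can never escalate on its own.
  if (snapshot.queryCount <= 0 || typeof p95 !== 'number' || !Number.isFinite(p95) || p95 < 0) {
    return null;
  }
  // Measured and within budget: hold local.
  if (p95 <= targetMs) {
    return null;
  }
  return {
    triggerId: 'query_latency_threshold_exceeded',
    summary: `Measured runtime query p95 ${p95.toFixed(4)} ms exceeds the current warm-query target of ${targetMs} ms.`,
  };
}

console.log(queryLatencyTrigger({ queryCount: 0 })); // null
console.log(queryLatencyTrigger({ queryCount: 40, queryP95Ms: 120 })); // null
console.log(queryLatencyTrigger({ queryCount: 40, queryP95Ms: 450 })?.triggerId); // query_latency_threshold_exceeded
```

A p95 exactly at the target does not trigger, matching the commit's `<= 300` within-threshold check and `> 300` trigger condition.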

docs/diataxis/zh/explanation/development-progress-dashboard.md

Lines changed: 5 additions & 1 deletion
@@ -55,6 +55,10 @@
 - `scripts/backend-baseline-sufficiency-utils.js`
 - `scripts/verify-backend-baseline-sufficiency.js`
 - `src/backend.baseline.sufficiency.contract.test.ts`
+- Added the first real measured observation path:
+- runtime query traffic upgrades the sufficiency query-latency observation from `policy_default` to `measured`,
+- the warm-query threshold carries `observedValue` when runtime telemetry exists,
+- docs/verifier output now exposes the query observation mode and the latest persistence event timestamps.
 - Locked the current architectural posture to an evidence-first rule:
 - local sqlite + deterministic ANN remains the default backend baseline,
 - heavier backend work must provide explicit escalation triggers,
@@ -64,7 +68,7 @@
 - `mkdocs.yml`
 - progress entry points in the EN/ZH Diataxis overviews.
 - Updated the next bounded backend-governance slice:
-- upgrade `policy_default` thresholds to measured capture, rather than re-patching the first-version route/report.
+- extend measurement from query latency to rebuild/recovery duration, rather than re-patching the first-version route/report.

 ## Latest Mainline Increment (2026-04-14 M1 Baseline)
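On the persistence side, this entry only exposes event timestamps. In the utils script, a non-empty `lastLoadAt` or `lastSaveAt` in `storeDiagnostics` sets the persistence evidence mode to `observed_event`, which is presumably weaker than `measured` because a timestamp shows that an event happened without timing it. A sketch of that mapping; `StoreDiagnostics`, `PersistenceObservation`, and `summarizePersistence` are hypothetical names:

```typescript
// Sketch only: the names are hypothetical; the fallback logic mirrors the
// observations.persistence block in scripts/backend-baseline-sufficiency-utils.js.
interface StoreDiagnostics {
  lastLoadAt?: unknown;
  lastSaveAt?: unknown;
}

interface PersistenceObservation {
  lastLoadAt?: string;
  lastSaveAt?: string;
  evidenceMode: 'policy_default' | 'observed_event';
}

// Same check as the commit's isNonEmptyString helper, typed as a guard.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === 'string' && value.trim().length > 0;
}

function summarizePersistence(diag: StoreDiagnostics): PersistenceObservation {
  const lastLoadAt = isNonEmptyString(diag.lastLoadAt) ? diag.lastLoadAt : undefined;
  const lastSaveAt = isNonEmptyString(diag.lastSaveAt) ? diag.lastSaveAt : undefined;
  return {
    lastLoadAt,
    lastSaveAt,
    // Any recorded load or save event is enough; whitespace-only values are ignored.
    evidenceMode: lastLoadAt || lastSaveAt ? 'observed_event' : 'policy_default',
  };
}

console.log(summarizePersistence({}));
console.log(summarizePersistence({ lastSaveAt: '2026-04-14T09:30:00.000Z' }));
```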

docs/diataxis/zh/explanation/local-backend-sufficiency-and-escalation-plan.md

Lines changed: 3 additions & 3 deletions
@@ -36,7 +36,7 @@

 Current limitation:

-- the current evidence mode is still `policy_default`, not measurement-driven performance capture.
+- the top-level evidence mode is still `policy_default`, but query-latency observation can now upgrade to `measured` once the runtime has produced query history.

 So the next backend-governance step should be **measurement hardening**, not backend replacement.

@@ -222,8 +222,8 @@

 The remaining slice is:

-- replace `policy_default` thresholds with measured capture where feasible,
-- add explicit rebuild / recovery / query-latency observation fields,
+- continue replacing `policy_default` thresholds with measured capture on the remaining dimensions,
+- extend measurement from query latency to rebuild / recovery duration fields,
 - keep escalation driven by measured pressure rather than by architectural anxiety.

 ## Recommended Next Mainline
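The remaining slice extends measurement to rebuild and recovery durations. This commit already widens the contract's threshold entries with optional `observedValue` and `observedAt` fields, although the utils script only fills `observedValue` for the warm-query threshold and never sets `observedAt`. One hypothetical way to capture a duration in that shape is to time an async operation with an injectable clock. `measureDurationMinutes` and `DurationObservation` are illustrative names, and nothing here is taken from the repository's rebuild or recovery code:

```typescript
// Hypothetical sketch: times an async operation (for example a deterministic
// rebuild) and reports the result using the contract's threshold field names.
interface DurationObservation {
  observationStatus: 'measured';
  observedValue: number; // elapsed minutes, matching the contract's 'minutes' unit
  observedAt: string; // ISO-8601 completion time
}

async function measureDurationMinutes(
  operation: () => Promise<void>,
  now: () => number = Date.now, // injectable clock so tests stay deterministic
): Promise<DurationObservation> {
  const startedMs = now();
  await operation();
  const finishedMs = now();
  return {
    observationStatus: 'measured',
    observedValue: (finishedMs - startedMs) / 60000,
    observedAt: new Date(finishedMs).toISOString(),
  };
}

// Usage with a fake clock that advances 90 seconds across the operation.
const ticks = [0, 90000];
let tick = 0;
measureDurationMinutes(async () => {}, () => ticks[tick++]).then((observation) => {
  console.log(observation.observedValue); // 1.5
  console.log(observation.observedAt); // 1970-01-01T00:01:30.000Z
});
```

If the operation rejects, no observation is produced; whether a failed rebuild should still record a duration is a policy decision this sketch leaves open.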

scripts/backend-baseline-sufficiency-utils.js

Lines changed: 75 additions & 8 deletions
@@ -29,6 +29,10 @@ function fileExists(filePath) {
   }
 }

+function isNonEmptyString(value) {
+  return typeof value === 'string' && value.trim().length > 0;
+}
+
 function createThreshold(thresholdId, unit, target, summary) {
   return {
     thresholdId,
@@ -64,6 +68,14 @@ function evaluateBackendBaselineSufficiency(options = {}) {
     repoRoot,
     provenance,
   });
+  const runtimeState = options.runtimeState || {
+    queryCount: 0,
+    queryP95Ms: undefined,
+    queryAverageMs: undefined,
+    queryMaxMs: undefined,
+    evidenceMode: 'policy_default',
+  };
+  const storeDiagnostics = options.storeDiagnostics || {};
   const packageJson = readJsonIfExists(path.join(repoRoot, 'package.json'));
   const scripts = packageJson.scripts || {};

@@ -116,14 +128,27 @@ function evaluateBackendBaselineSufficiency(options = {}) {
     'Local backup, restore, and deterministic rebuild remain acceptable operator tactics.',
     'Product value is currently gated more by conversation/memory/operator surfaces than by backend distribution.',
   ];
+  const measuredQueryLatency =
+    Number(runtimeState.queryCount || 0) > 0
+    && Number.isFinite(runtimeState.queryP95Ms)
+    && Number(runtimeState.queryP95Ms) >= 0;
+  const queryLatencyObservationMode = measuredQueryLatency ? 'measured' : 'policy_default';
+  const persistenceObservationMode =
+    isNonEmptyString(storeDiagnostics.lastLoadAt) || isNonEmptyString(storeDiagnostics.lastSaveAt)
+      ? 'observed_event'
+      : 'policy_default';

   const thresholds = [
-    createThreshold(
-      'warm_query_p95_ms',
-      'ms',
-      300,
-      'Hold local while warm interactive query latency stays within a 300 ms p95 budget on a reference device.'
-    ),
+    {
+      ...createThreshold(
+        'warm_query_p95_ms',
+        'ms',
+        300,
+        'Hold local while warm interactive query latency stays within a 300 ms p95 budget on a reference device.'
+      ),
+      observationStatus: queryLatencyObservationMode,
+      observedValue: measuredQueryLatency ? Number(runtimeState.queryP95Ms) : undefined,
+    },
     createThreshold(
       'cold_query_p95_ms',
       'ms',
@@ -149,6 +174,7 @@ function evaluateBackendBaselineSufficiency(options = {}) {
     foundationReadiness.status === 'integrated'
     && foundationReadiness.decision === 'go'
     && foundationReadiness.promotionCriteriaPassed === foundationReadiness.promotionCriteriaTotal;
+  const queryLatencyWithinThreshold = !measuredQueryLatency || Number(runtimeState.queryP95Ms) <= 300;

   const dimensions = [
     createDimension(
@@ -159,8 +185,12 @@ function evaluateBackendBaselineSufficiency(options = {}) {
     ),
     createDimension(
       'query_latency',
-      foundationReadiness.baseline.vectorAdapterIndependent && foundationReadiness.baseline.queryBackendDefaultMode === 'local_hybrid',
-      'Local hybrid query + ANN acceleration is present; no measured repo evidence currently forces backend escalation.',
+      foundationReadiness.baseline.vectorAdapterIndependent
+        && foundationReadiness.baseline.queryBackendDefaultMode === 'local_hybrid'
+        && queryLatencyWithinThreshold,
+      measuredQueryLatency
+        ? `Measured runtime query p95 is ${Number(runtimeState.queryP95Ms).toFixed(4)} ms across ${Number(runtimeState.queryCount || 0)} recorded queries.`
+        : 'Local hybrid query + ANN acceleration is present; no measured repo evidence currently forces backend escalation.',
       'Escalate when measured warm or cold p95 latency stays above target after local tuning.'
     ),
     createDimension(
@@ -219,6 +249,15 @@ function evaluateBackendBaselineSufficiency(options = {}) {
     );
   }

+  if (measuredQueryLatency && Number(runtimeState.queryP95Ms) > 300) {
+    activeTriggers.push(
+      createTrigger(
+        'query_latency_threshold_exceeded',
+        `Measured runtime query p95 ${Number(runtimeState.queryP95Ms).toFixed(4)} ms exceeds the current warm-query target of 300 ms.`
+      )
+    );
+  }
+
   const sufficient = activeTriggers.length === 0;

   if (sufficient) {
@@ -268,6 +307,24 @@ function evaluateBackendBaselineSufficiency(options = {}) {
     packageScripts: {
       sufficiencyVerifierPresent,
     },
+    observations: {
+      queryLatency: {
+        queryCount: Number(runtimeState.queryCount || 0),
+        queryP95Ms: measuredQueryLatency ? Number(runtimeState.queryP95Ms) : undefined,
+        queryAverageMs: measuredQueryLatency && Number.isFinite(runtimeState.queryAverageMs)
+          ? Number(runtimeState.queryAverageMs)
+          : undefined,
+        queryMaxMs: measuredQueryLatency && Number.isFinite(runtimeState.queryMaxMs)
+          ? Number(runtimeState.queryMaxMs)
+          : undefined,
+        evidenceMode: queryLatencyObservationMode,
+      },
+      persistence: {
+        lastLoadAt: isNonEmptyString(storeDiagnostics.lastLoadAt) ? storeDiagnostics.lastLoadAt : undefined,
+        lastSaveAt: isNonEmptyString(storeDiagnostics.lastSaveAt) ? storeDiagnostics.lastSaveAt : undefined,
+        evidenceMode: persistenceObservationMode,
+      },
+    },
     provenance,
     mandatoryChecks: [
       {
@@ -357,6 +414,16 @@ function formatBackendBaselineSufficiencyMarkdown(result, reportPaths) {
   lines.push('', '## Package Scripts', '');
   lines.push(`- Sufficiency verifier present: ${result.packageScripts.sufficiencyVerifierPresent ? 'yes' : 'no'}`);

+  lines.push('', '## Observations', '');
+  lines.push(`- Query latency evidence mode: ${result.observations.queryLatency.evidenceMode}`);
+  lines.push(`- Query count: ${result.observations.queryLatency.queryCount}`);
+  lines.push(`- Query p95 ms: ${typeof result.observations.queryLatency.queryP95Ms === 'number' ? result.observations.queryLatency.queryP95Ms : 'unmeasured'}`);
+  lines.push(`- Query average ms: ${typeof result.observations.queryLatency.queryAverageMs === 'number' ? result.observations.queryLatency.queryAverageMs : 'unmeasured'}`);
+  lines.push(`- Query max ms: ${typeof result.observations.queryLatency.queryMaxMs === 'number' ? result.observations.queryLatency.queryMaxMs : 'unmeasured'}`);
+  lines.push(`- Persistence evidence mode: ${result.observations.persistence.evidenceMode}`);
+  lines.push(`- Last load at: ${result.observations.persistence.lastLoadAt || 'unobserved'}`);
+  lines.push(`- Last save at: ${result.observations.persistence.lastSaveAt || 'unobserved'}`);
+
   lines.push('', '## Provenance', '');
   lines.push(`- Repo root source: ${result.provenance.repoRootSource}`);
   lines.push(`- Runtime project root aligned: ${result.provenance.runtimeProjectRootAligned ? 'yes' : 'no'}`);
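The utils script consumes `queryCount`, `queryP95Ms`, `queryAverageMs`, and `queryMaxMs` from `options.runtimeState`, but the producer of those numbers is not part of this diff. A sketch of one possible producer, assuming raw per-query durations in milliseconds and a nearest-rank p95; the function names and the percentile method are both assumptions:

```typescript
// Hypothetical producer for the runtimeState object that
// evaluateBackendBaselineSufficiency reads. The nearest-rank percentile is an
// assumption: this commit does not show how queryP95Ms is computed at runtime.
interface RuntimeQueryState {
  queryCount: number;
  queryP95Ms?: number;
  queryAverageMs?: number;
  queryMaxMs?: number;
  evidenceMode: 'policy_default' | 'measured';
}

// Nearest-rank percentile over an ascending, non-empty sample array.
function nearestRankPercentile(sortedAsc: readonly number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p * sortedAsc.length) / 100));
  return sortedAsc[rank - 1];
}

function summarizeQueryDurations(durationsMs: readonly number[]): RuntimeQueryState {
  // Ignore non-finite or negative samples so they cannot distort the summary.
  const samples = durationsMs.filter((d) => Number.isFinite(d) && d >= 0);
  if (samples.length === 0) {
    // Same shape as the script's fallback when no runtimeState is supplied.
    return {
      queryCount: 0,
      queryP95Ms: undefined,
      queryAverageMs: undefined,
      queryMaxMs: undefined,
      evidenceMode: 'policy_default',
    };
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const total = sorted.reduce((sum, d) => sum + d, 0);
  return {
    queryCount: sorted.length,
    queryP95Ms: nearestRankPercentile(sorted, 95),
    queryAverageMs: total / sorted.length,
    queryMaxMs: sorted[sorted.length - 1],
    evidenceMode: 'measured',
  };
}

const durations = Array.from({ length: 20 }, (_, i) => (i + 1) * 10); // 10, 20, ..., 200 ms
console.log(summarizeQueryDurations(durations)); // queryCount 20, queryP95Ms 190, queryAverageMs 105, queryMaxMs 200
```

Computing the rank as `ceil(p * n / 100)` rather than `ceil(p / 100 * n)` keeps the arithmetic in integers for integer `p` and `n`, which avoids floating-point surprises at exact rank boundaries.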

scripts/verify-backend-baseline-sufficiency.js

Lines changed: 11 additions & 0 deletions
@@ -33,6 +33,17 @@ function printHumanReport(result, reportPaths) {
   console.log(
     `[Backend Sufficiency] Sufficiency verifier present: ${result.packageScripts.sufficiencyVerifierPresent ? 'yes' : 'no'}`
   );
+  console.log(`[Backend Sufficiency] Query latency evidence mode: ${result.observations.queryLatency.evidenceMode}`);
+  console.log(`[Backend Sufficiency] Query count: ${result.observations.queryLatency.queryCount}`);
+  console.log(
+    `[Backend Sufficiency] Query p95 ms: ${typeof result.observations.queryLatency.queryP95Ms === 'number' ? result.observations.queryLatency.queryP95Ms : 'unmeasured'}`
+  );
+  console.log(
+    `[Backend Sufficiency] Last load at: ${result.observations.persistence.lastLoadAt || 'unobserved'}`
+  );
+  console.log(
+    `[Backend Sufficiency] Last save at: ${result.observations.persistence.lastSaveAt || 'unobserved'}`
+  );
   console.log(`[Backend Sufficiency] Repo root source: ${result.provenance.repoRootSource}`);
   console.log(
     `[Backend Sufficiency] Runtime project root aligned: ${result.provenance.runtimeProjectRootAligned ? 'yes' : 'no'}`
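The human report prints a subset of the Markdown report's new observation lines (it omits average, max, and the persistence evidence mode), and both reports repeat the same `typeof ... === 'number'` and `|| 'unobserved'` fallbacks inline. A hypothetical refactor sketch that keeps the fallback labels in one place; the helper names are illustrative, while the label strings are the ones used in this commit:

```typescript
// Hypothetical shared formatters; 'unmeasured' and 'unobserved' match the commit's labels.
function formatOptionalMs(value: number | undefined): string {
  return typeof value === 'number' ? String(value) : 'unmeasured';
}

function formatOptionalTimestamp(value: string | undefined): string {
  return value || 'unobserved';
}

interface QueryLatencyObservation {
  queryCount: number;
  queryP95Ms?: number;
  evidenceMode: string;
}

// Builds the same five observation lines the human report prints.
function observationLines(
  query: QueryLatencyObservation,
  persistence: { lastLoadAt?: string; lastSaveAt?: string },
): string[] {
  const prefix = '[Backend Sufficiency]';
  return [
    `${prefix} Query latency evidence mode: ${query.evidenceMode}`,
    `${prefix} Query count: ${query.queryCount}`,
    `${prefix} Query p95 ms: ${formatOptionalMs(query.queryP95Ms)}`,
    `${prefix} Last load at: ${formatOptionalTimestamp(persistence.lastLoadAt)}`,
    `${prefix} Last save at: ${formatOptionalTimestamp(persistence.lastSaveAt)}`,
  ];
}

for (const line of observationLines({ queryCount: 0, evidenceMode: 'policy_default' }, {})) {
  console.log(line);
}
```

The Markdown formatter could reuse `formatOptionalMs` and `formatOptionalTimestamp` with its own `- ` prefix, so both outputs stay consistent if a label ever changes.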

src/backend.baseline.sufficiency.contract.test.ts

Lines changed: 38 additions & 4 deletions
@@ -9,7 +9,7 @@ type PackageJson = {
 type BackendBaselineSufficiencyResult = {
   status: 'sufficient' | 'escalation_required';
   decision: 'hold_local' | 'escalate';
-  evidenceMode: 'policy_default' | 'measured';
+  evidenceMode: 'policy_default' | 'measured' | 'observed_event';
   baseline: {
     foundationStatus: 'planned' | 'in_progress' | 'integrated';
     foundationDecision: 'go' | 'no-go';
@@ -32,7 +32,9 @@ type BackendBaselineSufficiencyResult = {
     comparator: '<=';
     unit: 'ms' | 'minutes';
     target: number;
-    observationStatus: 'policy_default' | 'measured';
+    observationStatus: 'policy_default' | 'measured' | 'observed_event';
+    observedValue?: number;
+    observedAt?: string;
   }>;
   dimensions: Array<{
     dimensionId:
@@ -43,10 +45,14 @@ type BackendBaselineSufficiencyResult = {
       | 'recovery_ops'
       | 'observability';
     sufficient: boolean;
-    evidenceMode: 'policy_default' | 'measured';
+    evidenceMode: 'policy_default' | 'measured' | 'observed_event';
   }>;
   activeTriggers: Array<{
-    triggerId: 'foundation_not_integrated' | 'sufficiency_docs_not_aligned' | 'sufficiency_verifier_missing';
+    triggerId:
+      | 'foundation_not_integrated'
+      | 'sufficiency_docs_not_aligned'
+      | 'sufficiency_verifier_missing'
+      | 'query_latency_threshold_exceeded';
     summary: string;
   }>;
   documents: {
@@ -56,6 +62,20 @@ type BackendBaselineSufficiencyResult = {
   packageScripts: {
     sufficiencyVerifierPresent: boolean;
   };
+  observations: {
+    queryLatency: {
+      queryCount: number;
+      queryP95Ms?: number;
+      queryAverageMs?: number;
+      queryMaxMs?: number;
+      evidenceMode: 'policy_default' | 'measured' | 'observed_event';
+    };
+    persistence: {
+      lastLoadAt?: string;
+      lastSaveAt?: string;
+      evidenceMode: 'policy_default' | 'measured' | 'observed_event';
+    };
+  };
   provenance: {
     repoRootSource: 'explicit' | 'project_root_hint' | 'cwd' | 'module_root';
     runtimeProjectRootAligned: boolean;
@@ -185,6 +205,18 @@ describe('backend baseline sufficiency contract', () => {
     expect(result.packageScripts).toEqual({
       sufficiencyVerifierPresent: true,
     });
+    expect(result.observations.queryLatency).toEqual({
+      queryCount: 0,
+      queryP95Ms: undefined,
+      queryAverageMs: undefined,
+      queryMaxMs: undefined,
+      evidenceMode: 'policy_default',
+    });
+    expect(result.observations.persistence).toEqual({
+      lastLoadAt: undefined,
+      lastSaveAt: undefined,
+      evidenceMode: 'policy_default',
+    });
     expect(result.nextLaneRecommendation).toBe('product_memory_operator');
     expect(result.recommendations).toContain(
       'Keep sqlite + local ANN as the default backend baseline while the sufficiency report remains trigger-free.'
@@ -259,6 +291,8 @@ describe('backend baseline sufficiency contract', () => {
     expect(result.packageScripts).toEqual({
       sufficiencyVerifierPresent: false,
     });
+    expect(result.observations.queryLatency.evidenceMode).toBe('policy_default');
+    expect(result.observations.persistence.evidenceMode).toBe('policy_default');
     expect(result.nextLaneRecommendation).toBe('backend_escalation');
   });
 });
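The contract type now spells out `'policy_default' | 'measured' | 'observed_event'` five times, while the utils script only ever emits `measured` or `policy_default` for query latency and `observed_event` or `policy_default` for persistence. A sketch of a single alias with narrower per-observation types and a runtime guard; all names here are hypothetical and not part of the repository's test file:

```typescript
// Hypothetical refactor sketch for the contract test's types.
const EVIDENCE_MODES = ['policy_default', 'measured', 'observed_event'] as const;
type EvidenceMode = (typeof EVIDENCE_MODES)[number];

// Narrower per-observation unions matching what the utils script emits.
type QueryLatencyObservation = {
  queryCount: number;
  queryP95Ms?: number;
  queryAverageMs?: number;
  queryMaxMs?: number;
  evidenceMode: Extract<EvidenceMode, 'policy_default' | 'measured'>;
};

type PersistenceObservation = {
  lastLoadAt?: string;
  lastSaveAt?: string;
  evidenceMode: Extract<EvidenceMode, 'policy_default' | 'observed_event'>;
};

// Runtime guard, e.g. for checking evidence modes in JSON read back from a report file.
function isEvidenceMode(value: unknown): value is EvidenceMode {
  return typeof value === 'string' && (EVIDENCE_MODES as readonly string[]).indexOf(value) !== -1;
}

const persistence: PersistenceObservation = { evidenceMode: 'observed_event', lastSaveAt: '2026-04-14T09:30:00.000Z' };
const query: QueryLatencyObservation = { queryCount: 0, evidenceMode: 'policy_default' };
console.log(isEvidenceMode(persistence.evidenceMode), isEvidenceMode(query.evidenceMode), isEvidenceMode('estimated')); // true true false
```

With the narrower types, assigning `evidenceMode: 'measured'` to a persistence observation becomes a compile-time error instead of a silently accepted value.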

src/learning/KnowledgeLearningPlatform.test.ts

Lines changed: 40 additions & 0 deletions
@@ -1404,12 +1404,52 @@ describe('KnowledgeLearningPlatform', () => {
         'observability',
       ]);
       expect(sufficiency.activeTriggers).toEqual([]);
+      expect(sufficiency.observations.queryLatency).toEqual({
+        queryCount: 0,
+        queryP95Ms: undefined,
+        queryAverageMs: undefined,
+        queryMaxMs: undefined,
+        evidenceMode: 'policy_default',
+      });
+      expect(sufficiency.observations.persistence).toEqual({
+        lastLoadAt: undefined,
+        lastSaveAt: undefined,
+        evidenceMode: 'policy_default',
+      });
       expect(sufficiency.nextLaneRecommendation).toBe('product_memory_operator');
     } finally {
       fs.rmSync(tempRoot, { recursive: true, force: true });
     }
   });

+  test('backend baseline sufficiency upgrades query latency observation to measured after runtime queries', async () => {
+    await platform.ingestKnowledge({
+      incremental: true,
+      documents: [
+        {
+          documentId: 'doc_backend_query_measure',
+          sourcePath: 'Knowledge_Base/backend_query_measure.md',
+          language: 'en',
+          content: '# Backend Query Measure\nLocal ANN query telemetry should become measurable after runtime traffic.',
+        },
+      ],
+    });
+
+    await platform.queryKnowledge({
+      query: 'local ann telemetry',
+      topK: 2,
+    });
+
+    const sufficiency = await platform.getBackendBaselineSufficiency();
+    const warmThreshold = sufficiency.thresholds.find((entry) => entry.thresholdId === 'warm_query_p95_ms');
+
+    expect(sufficiency.observations.queryLatency.queryCount).toBeGreaterThan(0);
+    expect(sufficiency.observations.queryLatency.evidenceMode).toBe('measured');
+    expect(sufficiency.observations.queryLatency.queryP95Ms).toBeGreaterThanOrEqual(0);
+    expect(warmThreshold?.observationStatus).toBe('measured');
+    expect(warmThreshold?.observedValue).toBeGreaterThanOrEqual(0);
+  });
+
   test('tutor action uses misconception context for targeted guidance', async () => {
     const ingest = await platform.ingestKnowledge({
       incremental: true,
