docs/plans/trace-evaluation-architecture.md
4 additions & 4 deletions
@@ -110,10 +110,10 @@ flowchart TB
 Q --> R
 ```

-The normalized trajectory should have two layers:
+The normalized trace model should keep one canonical source of truth plus derived read models:

-- A compact summary for cheap storage and dashboard aggregation: counts, durations, token usage, cost, error count, and tool-call counts.
-- A full trajectory for grading and explanation: ordered model turns, tool calls/results, branch metadata, source event IDs, content redaction state, and raw evidence handles.
+- The full trajectory is the canonical artifact for grading, replay, and explanation: ordered model turns, tool calls/results, branch metadata, source event IDs, content redaction state, and raw evidence handles.
+- The compact summary is a derived compatibility/read model for cheap result storage and dashboard aggregation: counts, durations, token usage, cost, error count, and tool-call counts. It must be recomputable from a full trajectory and should not be authored as separate trace state when the trajectory is available.

 Directional wire shape:

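The "derived read model" rule in this hunk can be illustrated with a minimal sketch in which the summary is a pure function of the full trajectory. All names here (`TrajectoryEvent`, `Trajectory`, `DerivedSummary`, `deriveSummary`) are hypothetical and are not the project's existing `TraceSummary` or `TokenUsage` types. Summing per-event durations is also an assumption that only holds for strictly sequential events.

```typescript
// Hypothetical shapes for illustration only; the real schema may differ.
type TrajectoryEvent =
  | {
      kind: "model_turn";
      durationMs: number;
      inputTokens: number;
      outputTokens: number;
      costUsd: number;
    }
  | { kind: "tool_call"; toolName: string; durationMs: number; isError: boolean };

interface Trajectory {
  version: number;
  events: TrajectoryEvent[]; // ordered; the canonical source of truth
}

interface DerivedSummary {
  eventCount: number;
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  errorCount: number;
  toolCallCounts: Record<string, number>;
}

// Pure function: the same trajectory always yields the same summary,
// so the summary never has to be stored as separate trace state.
function deriveSummary(trajectory: Trajectory): DerivedSummary {
  const summary: DerivedSummary = {
    eventCount: 0,
    durationMs: 0,
    inputTokens: 0,
    outputTokens: 0,
    costUsd: 0,
    errorCount: 0,
    toolCallCounts: {},
  };
  for (const event of trajectory.events) {
    summary.eventCount += 1;
    // Assumption: events run sequentially, so durations simply add up.
    summary.durationMs += event.durationMs;
    if (event.kind === "model_turn") {
      summary.inputTokens += event.inputTokens;
      summary.outputTokens += event.outputTokens;
      summary.costUsd += event.costUsd;
    } else {
      if (event.isError) summary.errorCount += 1;
      summary.toolCallCounts[event.toolName] =
        (summary.toolCallCounts[event.toolName] ?? 0) + 1;
    }
  }
  return summary;
}
```

Because `deriveSummary` reads nothing except its argument, a dashboard could cache its output freely and rebuild it whenever the trajectory is available.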
@@ -178,7 +178,7 @@ The exact schema belongs in implementation, but these concepts should be stable:
 - **Files:** `packages/core/src/evaluation/trace.ts`, `packages/core/src/evaluation/types.ts`, `packages/eval/src/schemas.ts`, new focused files under `packages/core/src/evaluation/trace/` if the existing file becomes too large.
 - **Patterns:** Follow the existing `TraceSummary`, `TokenUsage`, and project wire conversion conventions. Keep internal fields camelCase and wire fields snake_case.
 - **Test Scenarios:** Add tests that validate round-trip conversion, version rejection, missing optional content, inferred duration flags, branch metadata, and raw evidence handles.
-- **Verification:** Unit tests should prove summaries can be derived from full trajectories without changing current summary behavior.
+- **Verification:** Unit tests should prove summaries can be derived from full trajectories without changing current summary behavior, and that normalized trajectory artifacts do not embed a separate summary payload.
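A rough sketch of the camelCase/snake_case boundary and the checks listed above (round-trip conversion, version rejection, no embedded summary) might look like the following. Everything in it is an assumption made for illustration: the type names, the `SUPPORTED_VERSION` constant, and the `summary` key used to detect an embedded payload are hypothetical and do not come from the project's actual wire conversion helpers.

```typescript
// Hypothetical internal (camelCase) and wire (snake_case) shapes.
interface ToolCall {
  toolName: string;
  durationMs: number;
  sourceEventId: string;
}
interface ToolCallWire {
  tool_name: string;
  duration_ms: number;
  source_event_id: string;
}
interface TrajectoryArtifact {
  version: number;
  toolCalls: ToolCall[];
}
interface TrajectoryArtifactWire {
  version: number;
  tool_calls: ToolCallWire[];
  [extra: string]: unknown; // tolerate unknown keys so they can be inspected
}

const SUPPORTED_VERSION = 1; // assumed version number

function toWire(artifact: TrajectoryArtifact): TrajectoryArtifactWire {
  return {
    version: artifact.version,
    tool_calls: artifact.toolCalls.map((call) => ({
      tool_name: call.toolName,
      duration_ms: call.durationMs,
      source_event_id: call.sourceEventId,
    })),
  };
}

function fromWire(wire: TrajectoryArtifactWire): TrajectoryArtifact {
  if (wire.version !== SUPPORTED_VERSION) {
    throw new Error(`unsupported trajectory version: ${wire.version}`);
  }
  // Assumed key name: a stored summary would duplicate derivable state.
  if ("summary" in wire) {
    throw new Error("trajectory artifacts must not embed a summary payload");
  }
  return {
    version: wire.version,
    toolCalls: wire.tool_calls.map((call) => ({
      toolName: call.tool_name,
      durationMs: call.duration_ms,
      sourceEventId: call.source_event_id,
    })),
  };
}
```

Rejecting an embedded summary at parse time is one possible way to enforce the rule; a schema-level check in `packages/eval/src/schemas.ts` could serve the same purpose.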