Why
Today the dashboard chat bot (and Claude Code) infer field semantics from examples — read a find_plays row, see video_quality_pct: 3.35, guess it's a percentage. There's no anchor explaining what each field means, what units it's in, how it's populated, or what's a known gotcha.
Real symptom from issue #497 chat handoffs: the bot kept saying things like "Let me check the actual schema first" and re-running query to inspect columns mid-investigation — burning tool budget on metadata it should already have. It also got field semantics wrong (e.g. interpreted transfer_ms as wall-clock when it's actually proxy→upstream socket time, while the wire-shaped time has to be derived from ts deltas).
What
One canonical standard at .claude/standards/data-fields.md, read via read_standard("data-fields"), with sections per data source and one entry per field:
- Name (qualified path, e.g. network_requests.bytes_out or session_events.player_metrics.video_quality_pct)
- Type / units (UInt32, ms, KB, bool, IANA tz, ratio 0-1, …)
- Description — what it semantically represents
- How populated — live by player SDK / derived by forwarder / proxy synthesised / operator-set
- Known gotchas — e.g. "0 means unknown, not zero throughput"; "missing on iOS before first frame decoded"; "forwarder writes proxy→upstream timing, not wire timing"
- Cross-references — links to standards / findings / skills that interpret it
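To make the per-entry format concrete, here is a minimal sketch of one entry as a Python dataclass that renders itself to markdown. The class name `FieldEntry`, the `###` heading layout, and every value in the example are assumptions for illustration; only the six attribute names come from the list above, and the issue does not prescribe how entries are laid out on the page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldEntry:
    """One entry in the data-fields standard (hypothetical in-memory shape)."""
    name: str            # qualified path, e.g. "network_requests.bytes_out"
    type_units: str      # e.g. "UInt32, ms"
    description: str     # what the field semantically represents
    how_populated: str   # e.g. "derived by forwarder"
    gotchas: List[str] = field(default_factory=list)
    cross_refs: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the entry as a heading plus a bullet list (assumed layout)."""
        lines = [
            f"### {self.name}",
            f"- Type / units: {self.type_units}",
            f"- Description: {self.description}",
            f"- How populated: {self.how_populated}",
        ]
        # Optional parts are omitted entirely when empty.
        if self.gotchas:
            lines.append("- Known gotchas: " + "; ".join(self.gotchas))
        if self.cross_refs:
            lines.append("- Cross-references: " + ", ".join(self.cross_refs))
        return "\n".join(lines)


# Placeholder values only: the real type, description, and gotchas
# must come from whoever writes the actual entry.
entry = FieldEntry(
    name="network_requests.bytes_out",
    type_units="placeholder",
    description="placeholder",
    how_populated="placeholder",
    gotchas=["0 means unknown, not zero throughput"],
)
print(entry.to_markdown())
```

Writing entries against one fixed shape like this keeps them uniform, which also makes the doc easy to check mechanically later.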
Sources to cover
| Source | Approx fields | Doc lives where today |
| --- | --- | --- |
| session_events table (and the nested player_metrics / current_play.player_metrics / server_metrics blobs) | ~60 | 01-schema.sql comments + ad-hoc inference |
| network_requests table | ~25 | 01-schema.sql |
| control_events table | ~12 | 01-schema.sql + chat.md hints |
| characterization_runs table + report shape (runner.Report / Step / Cycle / StartupCycleResult / variant_activity) | ~60 across the nested types | Go struct tags in tests/characterization/runner/report.go + the per-test standards docs |
| Plays summary (the find_plays return shape) | ~25 | None (derived in plays/find.go) |
| Label vocabulary | ~50 known strings + the synthesis rules | labels.go source + scattered system-prompt hints |
Rough total: 200-250 field entries. Estimated 1500-3000 lines of markdown.
How
Phased delivery so the doc is useful early:
- Phase 1 (~2h): session_events + network_requests field tables. These are the most-used in every chat investigation, so the highest-impact coverage goes live first.
- Phase 2 (~1h): control_events + plays summary + label vocabulary (cross-referencing existing system-prompt hints rather than duplicating them).
- Phase 3 (~1-2h): characterization shapes. Carry forward the per-test standards' field reasoning where it exists; fill the gaps.
Each phase ships independently; doc is incrementally useful. The bot starts benefiting from Phase 1 immediately (most investigations only touch events + network requests).
How to ensure the bot actually uses it
Update prompts/chat.md with one sentence: "Before reasoning about the meaning of a specific field, call read_standard(name="data-fields") once and look up the entry." The bot reads it on the first field question, then has it in context for the rest of the conversation.
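The read-once, look-up-many pattern in that instruction can be modelled as below. This is only a sketch of the intended behaviour, not how the bot or the read_standard tool is implemented: `FieldLookup`, `parse_entries`, `fake_read_standard`, and the assumption that each entry sits under a `### <qualified.name>` heading are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional

HEADING = re.compile(r"^(#+)[ \t]+(.*?)[ \t]*$")


def parse_entries(markdown: str) -> Dict[str, str]:
    """Map each '### <qualified.name>' heading to the text beneath it."""
    entries: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m and len(m.group(1)) == 3:
            current = m.group(2)
            entries[current] = []
        elif m:
            current = None  # any other heading level ends the previous entry
        elif current is not None:
            entries[current].append(line)
    return {name: "\n".join(body).strip() for name, body in entries.items()}


class FieldLookup:
    """Loads the standard on first use, then answers from memory."""

    def __init__(self, load_standard: Callable[[str], str]):
        self._load = load_standard  # stand-in for the read_standard tool
        self._entries: Optional[Dict[str, str]] = None

    def lookup(self, qualified_name: str) -> Optional[str]:
        if self._entries is None:
            self._entries = parse_entries(self._load("data-fields"))
        return self._entries.get(qualified_name)


# Placeholder document; the real standard's content is not reproduced here.
SAMPLE = """\
## network_requests

### network_requests.bytes_out
- Type / units: placeholder
- Description: placeholder

## session_events

### session_events.player_metrics.video_quality_pct
- Type / units: placeholder
"""

calls: List[str] = []


def fake_read_standard(name: str) -> str:
    calls.append(name)
    return SAMPLE


lookup = FieldLookup(fake_read_standard)
first = lookup.lookup("network_requests.bytes_out")
second = lookup.lookup("session_events.player_metrics.video_quality_pct")
print(len(calls))  # the loader is invoked once for both lookups
```

Two lookups cost one load, which is the tool-budget saving the prompt change is after.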
Acceptance
- .claude/standards/data-fields.md exists and covers Phase-1 fields with the schema-per-entry format above.
- prompts/chat.md references it.
- A spot-check chat conversation that asks "what does video_quality_pct mean" returns the answer without re-inspecting CH schema.
- (Future) phases 2 + 3 close their own follow-up issues that link back here.
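The "covers Phase-1 fields" criterion could be spot-checked mechanically with something like the sketch below. It assumes each entry is introduced by a `### <qualified.name>` heading, which this issue does not specify; `missing_entries`, the sample draft, and the short required list are hypothetical stand-ins for the real doc and the real Phase-1 field list.

```python
import re
from typing import List

# Assumed convention: one '### qualified.field.name' heading per entry.
ENTRY_HEADING = re.compile(r"^###[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def missing_entries(markdown: str, required: List[str]) -> List[str]:
    """Return the required field names that have no entry heading."""
    documented = set(ENTRY_HEADING.findall(markdown))
    return [name for name in required if name not in documented]


# Placeholder draft that documents only one of the two required fields.
draft = """\
## network_requests

### network_requests.bytes_out
- Type / units: placeholder
"""

phase1 = [
    "network_requests.bytes_out",
    "session_events.player_metrics.video_quality_pct",
]

gaps = missing_entries(draft, phase1)
print(gaps)
```

A check like this only proves an entry exists; whether the prose semantics are right still needs a human read.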
Out of scope
- Auto-generation from CH schema / Go struct tags. Tempting but the doc's value is the prose semantics, not the type info. Worth revisiting if the doc gets too out-of-date to maintain by hand.
- A field-level browser UI. The standards file is the substrate; a UI can come later.