feat(knowledge): canonical data-fields.md — single LLM/Claude Code field reference #512

@jonathaneoliver

Why

Today the dashboard chat bot (and Claude Code) infer field semantics from examples: they read a find_plays row, see video_quality_pct: 3.35, and guess it's a percentage. There's no anchor explaining what each field means, what units it's in, how it's populated, or what its known gotchas are.

Real symptom from issue #497 chat handoffs: the bot kept saying things like "Let me check the actual schema first" and re-running query to inspect columns mid-investigation — burning tool budget on metadata it should already have. It also got field semantics wrong (e.g. interpreted transfer_ms as wall-clock when it's actually proxy→upstream socket time, while the wire-shaped time has to be derived from ts deltas).

What

One canonical standard at .claude/standards/data-fields.md (read via read_standard("data-fields")), with sections per data source and one entry per field:

  • Name (qualified path — network_requests.bytes_out or session_events.player_metrics.video_quality_pct)
  • Type / units (UInt32, ms, KB, bool, IANA tz, ratio 0-1, …)
  • Description — what it semantically represents
  • How populated — live by player SDK / derived by forwarder / proxy synthesised / operator-set
  • Known gotchas — e.g. "0 means unknown, not zero throughput"; "missing on iOS before first frame decoded"; "forwarder writes proxy→upstream timing, not wire timing"
  • Cross-references — links to standards / findings / skills that interpret it
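
As a rough illustration of the entry format above, the six attributes could be modelled and rendered like this. It is a sketch only: the `FieldEntry` class, the `to_markdown` helper, the heading layout, the `network_requests.` table prefix, and the `UInt32` type shown for `transfer_ms` are assumptions for illustration, not the agreed file format or the real column definition.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FieldEntry:
    """Hypothetical model of one data-fields.md entry (six attributes)."""

    name: str           # qualified path, e.g. "network_requests.transfer_ms"
    type_units: str     # e.g. "UInt32, ms"
    description: str    # what the field semantically represents
    how_populated: str  # player SDK / forwarder / proxy / operator-set
    gotchas: list[str] = field(default_factory=list)
    cross_refs: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the entry as a small markdown block (layout is assumed)."""
        return "\n".join([
            f"### {self.name}",
            f"- Type / units: {self.type_units}",
            f"- Description: {self.description}",
            f"- How populated: {self.how_populated}",
            f"- Known gotchas: {'; '.join(self.gotchas) or 'none recorded'}",
            f"- Cross-references: {', '.join(self.cross_refs) or 'none'}",
        ])


# Table prefix and column type are assumed; the semantics come from the issue text.
entry = FieldEntry(
    name="network_requests.transfer_ms",
    type_units="UInt32, ms (type assumed)",
    description="Proxy-to-upstream socket time for the request.",
    how_populated="derived by forwarder",
    gotchas=["Not wall-clock time; wire-shaped time is derived from ts deltas."],
)
print(entry.to_markdown())
```

Keeping the gotchas as a list makes it harder to write an entry that silently omits them, which is the part of the doc that type information alone cannot supply.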

Sources to cover

| Source | Approx fields | Doc lives where today |
| --- | --- | --- |
| session_events table (and the nested player_metrics / current_play.player_metrics / server_metrics blobs) | ~60 | 01-schema.sql comments + ad-hoc inference |
| network_requests table | ~25 | 01-schema.sql |
| control_events table | ~12 | 01-schema.sql + chat.md hints |
| characterization_runs table + report shape (runner.Report / Step / Cycle / StartupCycleResult / variant_activity) | ~60 across the nested types | Go struct tags in tests/characterization/runner/report.go + the per-test standards docs |
| Plays summary (the find_plays return shape) | ~25 | None (derived in plays/find.go) |
| Label vocabulary | ~50 known strings + the synthesis rules | labels.go source + scattered system-prompt hints |

Rough total: 200-250 field entries. Estimated 1500-3000 lines of markdown.
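
The rough total can be sanity-checked by summing the per-source estimates listed above. The snippet below is only back-of-envelope arithmetic on those approximate counts; the lines-per-entry figure is derived from the stated ranges, not a measured value.

```python
# Approximate field counts per source, as listed above.
approx_fields = {
    "session_events": 60,
    "network_requests": 25,
    "control_events": 12,
    "characterization_runs + report shape": 60,
    "plays summary": 25,
    "label vocabulary": 50,
}

total = sum(approx_fields.values())
print(total)  # 232, inside the stated 200-250 range

# Implied markdown budget per entry for the 1500-3000 line estimate.
lines_low, lines_high = 1500, 3000
print(round(lines_low / total, 1), round(lines_high / total, 1))  # 6.5 12.9
```

That works out to roughly 6 to 13 lines of markdown per entry, which is consistent with a six-attribute entry plus some headroom for gotchas.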

How

Phased delivery so the doc is useful early:

  1. Phase 1 (~2h): session_events + network_requests field tables. These are the most-used in every chat investigation, so this gets the highest-impact coverage live first.
  2. Phase 2 (~1h): control_events + plays summary + label vocabulary (cross-referencing existing system-prompt hints rather than duplicating).
  3. Phase 3 (~1-2h): characterization shapes. Carry forward the per-test standards' field reasoning where it exists; fill the gaps.

Each phase ships independently; doc is incrementally useful. The bot starts benefiting from Phase 1 immediately (most investigations only touch events + network requests).

How to ensure the bot actually uses it

Update prompts/chat.md with one sentence: "Before reasoning about the meaning of a specific field, call read_standard(name="data-fields") once and look up the entry." The bot reads it on the first field question, then has it in context for the rest of the conversation.
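
One way to picture the "read once, then reuse" behaviour is a small per-conversation cache. Everything below is hypothetical: `ConversationContext`, the callable signature used for `read_standard`, and the stub loader are stand-ins, since this issue does not specify how the bot runtime holds context.

```python
from __future__ import annotations

from typing import Callable


class ConversationContext:
    """Hypothetical per-conversation cache of standards already read."""

    def __init__(self, read_standard: Callable[[str], str]) -> None:
        self._read_standard = read_standard
        self._standards: dict[str, str] = {}

    def standard(self, name: str) -> str:
        # Only the first request for a given standard costs a tool call;
        # later lookups are served from the conversation's own context.
        if name not in self._standards:
            self._standards[name] = self._read_standard(name)
        return self._standards[name]


calls: list[str] = []


def fake_read_standard(name: str) -> str:
    """Stub standing in for the real read_standard tool."""
    calls.append(name)
    return f"# {name}\n(stub contents)"


ctx = ConversationContext(fake_read_standard)
ctx.standard("data-fields")
ctx.standard("data-fields")
print(len(calls))  # 1: the second lookup did not spend tool budget
```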

Acceptance

  • .claude/standards/data-fields.md exists and covers Phase-1 fields with the schema-per-entry format above.
  • prompts/chat.md references it.
  • A spot-check chat conversation that asks "what does video_quality_pct mean" returns the answer without re-inspecting CH schema.
  • (Future) phases 2 + 3 close their own follow-up issues that link back here.
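
The spot-check could be partly automated by scanning a conversation's tool calls for schema inspection. The sketch below assumes a transcript shaped as a list of `{"tool", "args"}` dicts and assumes that schema inspection shows up as `query` calls using DESCRIBE, SHOW CREATE, or system.columns; both the transcript shape and the pattern list are illustrative guesses, not the bot's real log format.

```python
import re

# Statements that inspect schema rather than data (assumed pattern list).
# DESC is only matched at the start, so "ORDER BY ts DESC" is not flagged.
SCHEMA_INSPECTION = re.compile(
    r"^\s*(describe|desc|show\s+create)\b|\bsystem\.columns\b",
    re.IGNORECASE,
)


def schema_inspections(tool_calls):
    """Return the query calls that look like schema inspection."""
    return [
        call
        for call in tool_calls
        if call["tool"] == "query"
        and SCHEMA_INSPECTION.search(call["args"].get("sql", ""))
    ]


# Hypothetical transcripts for the "what does video_quality_pct mean" check.
good = [
    {"tool": "read_standard", "args": {"name": "data-fields"}},
    {"tool": "query", "args": {"sql": "SELECT ts FROM session_events ORDER BY ts DESC LIMIT 5"}},
]
bad = good + [{"tool": "query", "args": {"sql": "DESCRIBE TABLE session_events"}}]

print(len(schema_inspections(good)))  # 0
print(len(schema_inspections(bad)))   # 1
```

A check like this only catches the tool-budget symptom; whether the answer about the field is actually correct still needs a human read.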

Out of scope

  • Auto-generation from CH schema / Go struct tags. Tempting but the doc's value is the prose semantics, not the type info. Worth revisiting if the doc gets too out-of-date to maintain by hand.
  • A field-level browser UI. The standards file is the substrate; a UI can come later.
