docs(observability): add Grafana dashboard starter #6970
Draft
everettVT wants to merge 1 commit into
5-panel Grafana dashboard JSON + docs page covering throughput, bytes flow, task lifecycle, top operators by duration, and a failed-task counter. Reads OTel-exported Daft metrics via Prometheus. Complement to the in-process Daft Dashboard: in-process for live per-query debugging, Grafana for fleet monitoring / on-call / SLO tracking alongside the rest of your infrastructure observability. Built from the documented OTel metric surface in docs/observability/telemetry.md. Shipped untested end-to-end — needs verification against a real Daft + OTel + Prometheus + Grafana stack (see PR test plan). Adds entry to docs/SUMMARY.md under Guide → Configuration & Optimization → Observability.
Rust Dependency Diff
Head: ✅ OK: Within budget.
Summary
Adds an importable Grafana dashboard for Daft observability, built on the OTel metrics already documented in docs/observability/telemetry.md. 5 panels: rows/sec by operator type, bytes flow, task lifecycle, top operators by cumulative duration, and a failed-tasks stat panel.

Companion to the in-process Daft Dashboard (daft dashboard start):

- In-process Daft Dashboard: live per-query debugging
- Grafana: fleet monitoring / on-call / SLO tracking alongside the rest of your infrastructure observability

Why a draft
Shipped untested end-to-end. I built this from the documented metric surface but haven't run it against a live Daft + OTel + Prometheus + Grafana stack. The metric query shapes assume the standard OTel Collector → Prometheus exporter naming convention (daft.X.Y → daft_x_y_total); this should be verified against an actual scrape before users follow the docs.

Files
- docs/observability/grafana.md: new docs page (prerequisites, import instructions, panel reference, metric naming convention, "what's not covered yet", adaptation tips)
- docs/observability/grafana/daft-dashboard.json: the dashboard JSON (Grafana 10+, schema v39, $DS_PROMETHEUS datasource variable)
- docs/SUMMARY.md: adds "Grafana Dashboard" entry under Observability nav, after Telemetry

Test plan
- Import daft-dashboard.json into a real Grafana instance pointed at a Prometheus scraping OTel-exported Daft metrics
- […] daft_rows_out_total, daft_duration_total, daft_task_*, etc.)
- Top 10 operators by cumulative duration query (topk(10, sum by (node_type, node_id) (daft_duration_total))) returns expected shape; the µs counter may need to be rescaled in the panel display
- […] daft_bytes_read_total, daft_task_active, daft_task_* as distributed-only per telemetry.md)
- Once bytes_in/bytes_out Prometheus exposure is confirmed, add inflation/deflation panels

Context
This started as a marketing-repo canon artifact (Eventual-Inc/marketing#360) supporting the W21 observability launch. Moved here because:

- […] docs.daft.ai/observability/ will find it where they're already looking

The marketing PR (#360 in Eventual-Inc/marketing) will be closed in favor of this one.
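
For reviewers working through the test plan, the naming convention and the top-operators query assumed by this PR can be sketched in Python. This is a minimal, unverified sketch, not the exporter's actual logic: the helper names `otel_to_prometheus_counter` and `top_operators_query` are hypothetical, the OTel-side name `daft.rows.out` is inferred from `daft_rows_out_total`, and dividing by 1e6 in the query is only one possible way to rescale the µs counter (the panel's display unit is another). Real exporters may also append unit suffixes, which this sketch ignores.

```python
import re


def otel_to_prometheus_counter(otel_name: str) -> str:
    """Map an OTel counter name to the Prometheus name this PR assumes.

    Assumption: characters that Prometheus does not allow in metric names
    (such as dots) become underscores, and counters gain a `_total` suffix.
    """
    sanitized = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    if not sanitized.endswith("_total"):
        sanitized += "_total"
    return sanitized


def top_operators_query(k: int = 10, metric: str = "daft_duration_total") -> str:
    """Build the top-k cumulative-duration query from the test plan.

    Assumption: the counter is in microseconds, so the result is divided
    by 1e6 to report seconds.
    """
    return f"topk({k}, sum by (node_type, node_id) ({metric})) / 1e6"


if __name__ == "__main__":
    # "daft.rows.out" is a hypothetical OTel-side name used for illustration.
    print(otel_to_prometheus_counter("daft.rows.out"))
    print(top_operators_query())
```

Comparing the first helper's output against the metric names in an actual Prometheus scrape is a quick way to confirm or refute the naming assumption before anyone imports the dashboard.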