Downsample and throttle timeseries plots #942

Open
SimonHeybrock wants to merge 9 commits into main from 940-timeseries-downsample

@SimonHeybrock SimonHeybrock commented May 26, 2026

Summary

Closes #940.

Long-running timeseries plots are laggy because every Kafka delta rebuilds the full hv.Curve from the entire buffered history and ships it through pipe.send. The timeseries plotter now exposes three time-based knobs — Period, Recent Window, Floor Period — that together bound the displayed point count and throttle plot updates to the chosen period.

Defaults (1 s / 1 h / 5 min) keep a multi-day 1 Hz run at roughly 4 000 points indefinitely, regardless of buffer size.

Setting "Floor Period" to zero turns the plot into a moving window, such as the last hour.
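
As a rough illustration of how the three knobs bound the point count, the sketch below estimates the number of displayed points for a steadily sampled run. The function name `estimate_displayed_points` and its parameters are hypothetical and not part of this PR. It assumes the input rate is at least one sample per Period, that history older than the recent window is kept at one point per Floor Period, and that a Floor Period of zero drops older history entirely.

```python
def estimate_displayed_points(
    run_seconds: float,
    period: float = 1.0,           # assumed default "Period" (1 s)
    recent_window: float = 3600.0, # assumed default "Recent Window" (1 h)
    floor_period: float = 300.0,   # assumed default "Floor Period" (5 min)
) -> int:
    """Rough estimate of points on screen; an illustration, not the PR's code."""
    # Part of the run that falls inside the recent window.
    recent_span = min(run_seconds, recent_window)
    # Part of the run that is older than the recent window.
    older_span = max(run_seconds - recent_window, 0.0)
    recent_points = int(recent_span // period)
    # Assumption: floor_period == 0 means "show only the moving window".
    older_points = int(older_span // floor_period) if floor_period > 0 else 0
    return recent_points + older_points


one_day = estimate_displayed_points(24 * 3600)
three_days = estimate_displayed_points(3 * 24 * 3600)
window_only = estimate_displayed_points(24 * 3600, floor_period=0.0)
```

Under these assumptions the recent window contributes 3 600 points and each additional day of older history adds 288, so the total grows slowly with run length while staying close to 4 000 over several days.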

Example

[Image: Downsampling settings in plot config]

[Image: Plot showing transition from downsampled floor rate to downsampled rate in recent window]

Design notes

  • Downsampling lives at the plotter, not the extractor. The subscription stays a plain full-history pull; per-plot config can change without re-subscribing.
  • The throttle short-circuits compute() when no new data has crossed Period, which means no autoscaler run, no hv.Curve build, no _set_cached_state, no presenter dirty bit, and crucially no pipe.send / Bokeh patch / WebSocket flush / browser repaint.
  • The extractor's _to_local_datetime O(N) work is not skipped — that would need a gate in DataService. The skipped portion is the dominant cost (~100 ms pipe.send at N=100 k per the issue's measurements).
  • Knobs are time-based, not point-based: users reason in time, not in budgets. max_points was considered and dropped.
  • Bucket boundaries are anchored to the epoch (fixed time grid), so kept samples don't slide as new data arrives. The recent-window cutoff is quantized to the floor period; actual recent length runs from recent_seconds to recent_seconds + floor_period_seconds (soft lower bound).
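
The throttle described in the notes above can be pictured with a minimal sketch. The class `ThrottledCompute` and its `build` callback are hypothetical stand-ins, not the PR's `LinePlotter.compute()` override. The sketch also assumes one plausible reading of "crossed Period": the newest timestamp must have advanced by at least one period since the last rebuild.

```python
from typing import Callable, Optional


class ThrottledCompute:
    """Hypothetical wrapper that rebuilds only when enough new time has passed."""

    def __init__(self, period_ns: int, build: Callable[[int], object]) -> None:
        self._period_ns = period_ns
        self._build = build  # stands in for the expensive plot rebuild
        self._last_built_ns: Optional[int] = None
        self._cached: object = None

    def compute(self, latest_time_ns: int) -> object:
        # Short-circuit: newest sample has not advanced by a full period yet,
        # so skip the rebuild (and everything downstream of it).
        if (
            self._last_built_ns is not None
            and latest_time_ns - self._last_built_ns < self._period_ns
        ):
            return self._cached
        # Otherwise do the expensive work once and remember when it happened.
        self._cached = self._build(latest_time_ns)
        self._last_built_ns = latest_time_ns
        return self._cached
```

In this sketch a skipped call returns the cached result. The real override instead avoids marking the presenter dirty, which is what prevents the `pipe.send` and the browser repaint.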

🤖 Generated with Claude Code

SimonHeybrock and others added 6 commits May 26, 2026 08:19
Long-running timeseries (~100k points after a day at 1 Hz) make the
dashboard sluggish because the full hv.Curve is rebuilt and shipped
through pipe.send on every tick. The timeseries plotter now exposes
three time-based knobs - Period, Recent Window, Floor Period - that
together bound point count and throttle plot updates to the chosen
period. Defaults keep a multi-day 1 Hz run near ~4000 points
indefinitely.

Downsampling happens at the plotter (not the extractor) so the
subscription remains a simple full-history pull and the per-plot
config can change without re-subscribing. The throttle short-circuits
compute() when no new data has crossed Period, which skips the
autoscaler, hv.Curve build, pipe.send, and downstream browser repaint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plan doc moved into code: the "where logic lives" rationale becomes a
docstring on LinePlotter.from_timeseries_params, and the throttle
semantics (what compute() skips on a short-circuit) become a docstring
on the compute() override. The downsample_timeseries module docstring
no longer references the deleted plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
FullHistoryExtractor guarantees a datetime64 time coord by the time data
reaches LinePlotter.compute(), so the int64-with-time-unit fallback in
_to_int64_ns and _latest_time_ns was dead code that silently masked any
upstream regression. The helper _to_int64_ns is removed and its cast
inlined; _latest_time_ns now assumes datetime64 directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
FullHistoryExtractor guarantees a non-empty datetime64 time coord on
every DataArray it produces, so the per-key dim/coord/size guards in
_latest_time_ns were redundant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous bucket-first scheme anchored buckets at the band's first
sample and force-included the very last index as a special case to keep
the curve's tail aligned with the lag indicator. Anchoring buckets at
the band's last sample instead makes bucket 0 contain the latest by
construction, so the special case disappears.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scipp.DataArray supports `data[dim, np_indices]` directly, which slices
the data values, dim-aligned coords, and variances in one step and
leaves scalar coords untouched. The hand-rolled `_select_indices`
helper that rebuilt the DataArray field-by-field was reinvention.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the numpy datetime64->int64 cast and per-band index arithmetic
with scipp operations: `.to(unit='ns')` for the coord, scipp arithmetic
on datetime/timedelta variables for the bucket math, and a single
boolean keep-mask via `sc.where` selecting per-sample band reference
and period. The final selection uses scipp boolean indexing rather
than fancy-int indexing of concatenated index arrays.

Benchmarked on 1k-1M point inputs with typical dashboard parameters
(recent=1h@1s, floor=5min):
- n>=10k: 1.2-1.4x faster (1M points: 13ms -> 9ms)
- n=1k: slower in absolute terms (~200us overhead) but still sub-ms

Outputs are identical to the previous implementation at all sizes.

The previous design end-anchored buckets at each band's latest sample.
That made bucket IDs of all existing samples shift by +1 every tick,
so although the keep *pattern* was preserved, the actual kept samples
slid one position forward with every update.

Anchoring buckets to the epoch instead gives a fixed time grid: kept
samples sit on absolute time-quanta and don't move as new samples
arrive. The recent-band cutoff is quantized to the floor period so
band membership is stable between quantum crossings; the actual
recent length is now `recent_seconds` to `recent_seconds +
floor_period_seconds` (soft lower bound). At each crossing one floor
period of samples retires from the recent band as a batch.

Side benefits: the `latest_older` lookup and per-band reference time
disappear. At 1M points the function drops from ~10 ms to ~4.6 ms
(2.3x); cumulative ~2.8x over the original numpy implementation.
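
The epoch-anchored scheme described in the commit message above can be sketched in plain NumPy. This is an illustration, not the scipp implementation from the PR. The function `keep_mask` and its parameter names are hypothetical; it assumes a sorted, non-empty array of int64 nanosecond timestamps, a floor period greater than zero, and that the first sample of each bucket is the one kept.

```python
import numpy as np


def keep_mask(times_ns, period_ns, recent_ns, floor_period_ns):
    """Boolean keep-mask on a fixed, epoch-anchored time grid (illustrative)."""
    times_ns = np.asarray(times_ns, dtype=np.int64)
    latest = times_ns[-1]
    # Quantize the recent-band cutoff down to the floor-period grid, so band
    # membership only changes when `latest` crosses a floor-period quantum.
    cutoff = ((latest - recent_ns) // floor_period_ns) * floor_period_ns
    in_recent = times_ns >= cutoff
    # Per-sample bucket width: fine in the recent band, coarse in the older one.
    width = np.where(in_recent, period_ns, floor_period_ns)
    # Bucket IDs are counted from the epoch, so they do not shift as data arrives.
    bucket = times_ns // width
    # Keep the first sample of every bucket; a band change also starts a bucket.
    keep = np.ones(times_ns.shape, dtype=bool)
    keep[1:] = (bucket[1:] != bucket[:-1]) | (in_recent[1:] != in_recent[:-1])
    return keep


S = 1_000_000_000  # nanoseconds per second
times = np.arange(7200, dtype=np.int64) * S  # two hours of 1 Hz data
mask = keep_mask(times, period_ns=S, recent_ns=3600 * S, floor_period_ns=300 * S)
kept_seconds = times[mask] // S
```

With these inputs the cutoff lands at 3 300 s, so the recent band spans between one hour and one hour plus one floor period, and the older samples that survive sit on multiples of 300 s regardless of when the next sample arrives.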
@SimonHeybrock SimonHeybrock marked this pull request as ready for review May 27, 2026 08:55
@SimonHeybrock SimonHeybrock requested a review from nvaytet May 27, 2026 08:55

Development

Successfully merging this pull request may close these issues.

Long-running timeseries plots cause dashboard lag (server-side too, not just rendering)
