Downsample and throttle timeseries plots #942
Open
SimonHeybrock wants to merge 9 commits into
Long-running timeseries (~100k points after a day at 1 Hz) make the dashboard sluggish because the full hv.Curve is rebuilt and shipped through pipe.send on every tick. The timeseries plotter now exposes three time-based knobs - Period, Recent Window, Floor Period - that together bound point count and throttle plot updates to the chosen period. Defaults keep a multi-day 1 Hz run near ~4000 points indefinitely. Downsampling happens at the plotter (not the extractor) so the subscription remains a simple full-history pull and the per-plot config can change without re-subscribing. The throttle short-circuits compute() when no new data has crossed Period, which skips the autoscaler, hv.Curve build, pipe.send, and downstream browser repaint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plan doc moved into code: the "where logic lives" rationale becomes a docstring on LinePlotter.from_timeseries_params, and the throttle semantics (what compute() skips on a short-circuit) become a docstring on the compute() override. The downsample_timeseries module docstring no longer references the deleted plan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
FullHistoryExtractor guarantees a datetime64 time coord by the time data reaches LinePlotter.compute(), so the int64-with-time-unit fallback in _to_int64_ns and _latest_time_ns was dead code that silently masked any upstream regression. The helper _to_int64_ns is removed and its cast inlined; _latest_time_ns now assumes datetime64 directly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
FullHistoryExtractor guarantees a non-empty datetime64 time coord on every DataArray it produces, so the per-key dim/coord/size guards in _latest_time_ns were redundant. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous bucket-first scheme anchored buckets at the band's first sample and force-included the very last index as a special case to keep the curve's tail aligned with the lag indicator. Anchoring buckets at the band's last sample instead makes bucket 0 contain the latest by construction, so the special case disappears. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scipp.DataArray supports `data[dim, np_indices]` directly, which slices the data values, dim-aligned coords, and variances in one step and leaves scalar coords untouched. The hand-rolled `_select_indices` helper that rebuilt the DataArray field-by-field was reinvention. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced May 26, 2026
Replaces the numpy datetime64->int64 cast and per-band index arithmetic with scipp operations: `.to(unit='ns')` for the coord, scipp arithmetic on datetime/timedelta variables for the bucket math, and a single boolean keep-mask via `sc.where` selecting per-sample band reference and period. The final selection uses scipp boolean indexing rather than fancy-int indexing of concatenated index arrays.

Benchmarked on 1k-1M point inputs with typical dashboard parameters (recent=1h@1s, floor=5min):

- n>=10k: 1.2-1.4x faster (1M points: 13ms -> 9ms)
- n=1k: slower in absolute terms (~200us overhead) but still sub-ms

Outputs are identical to the previous implementation at all sizes.
The previous design end-anchored buckets at each band's latest sample. That made bucket IDs of all existing samples shift by +1 every tick, so although the keep *pattern* was preserved, the actual kept samples slid one position forward with every update. Anchoring buckets to the epoch instead gives a fixed time grid: kept samples sit on absolute time-quanta and don't move as new samples arrive. The recent-band cutoff is quantized to the floor period so band membership is stable between quantum crossings; the actual recent length is now `recent_seconds` to `recent_seconds + floor_period_seconds` (soft lower bound). At each crossing one floor period of samples retires from the recent band as a batch. Side benefits: the `latest_older` lookup and per-band reference time disappear. At 1M points the function drops from ~10 ms to ~4.6 ms (2.3x); cumulative ~2.8x over the original numpy implementation.
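The epoch-anchored scheme described above can be illustrated with a minimal pure-Python sketch. This is not the PR's scipp implementation: the function name `downsample_epoch_anchored`, the integer-nanosecond inputs, and the choice to keep the first sample of each bucket are all assumptions made for illustration.

```python
def downsample_epoch_anchored(times_ns, period_ns, recent_ns, floor_period_ns):
    """Return indices of samples to keep (hypothetical helper, not the PR's code).

    times_ns: ascending integer timestamps in ns since the epoch.
    Assumes period_ns > 0. Which sample represents a bucket is an
    assumption here: the first sample seen in each bucket is kept.
    """
    if not times_ns:
        return []
    latest = times_ns[-1]
    if floor_period_ns > 0:
        # Quantize the cutoff down to the floor-period grid, so band membership
        # only changes when (latest - recent_ns) crosses a grid line.
        cutoff = (latest - recent_ns) // floor_period_ns * floor_period_ns
    else:
        # Floor period of zero: plain moving window, older samples are dropped.
        cutoff = latest - recent_ns

    keep = []
    last_bucket = None
    for i, t in enumerate(times_ns):
        if t >= cutoff:
            # Recent band: buckets on a fixed grid of width period_ns.
            bucket = ("recent", t // period_ns)
        elif floor_period_ns > 0:
            # Older band: buckets on a fixed grid of width floor_period_ns.
            bucket = ("older", t // floor_period_ns)
        else:
            continue
        if bucket != last_bucket:
            keep.append(i)
            last_bucket = bucket
    return keep
```

Because bucket IDs depend only on the absolute timestamp, a sample kept in the older band stays kept when new samples arrive; only the quantized cutoff moves, and it moves in steps of one floor period.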
Summary
Closes #940.
Long-running timeseries plots are laggy because every Kafka delta rebuilds the full `hv.Curve` from the entire buffered history and ships it through `pipe.send`. The timeseries plotter now exposes three time-based knobs (Period, Recent Window, Floor Period) that together bound the displayed point count and throttle plot updates to the chosen period. Defaults (1 s / 1 h / 5 min) keep a multi-day 1 Hz run near 4 000 points indefinitely, regardless of buffer size.
Setting "Floor Period" to zero turns the display into a moving window, such as the last hour.
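As a rough check on the point-count claim, the following sketch estimates the displayed points for a given run length. The function `estimated_points` and its parameter names are hypothetical, and the estimate ignores the quantization slack of up to one floor period in the recent band.

```python
def estimated_points(run_s, sample_period_s=1, period_s=1,
                     recent_s=3600, floor_period_s=300):
    """Rough displayed-point estimate (hypothetical helper, integer seconds).

    Assumes evenly spaced samples and one kept sample per bucket.
    """
    # The display cannot be denser than the data itself.
    recent_step = max(period_s, sample_period_s)
    recent_points = min(run_s, recent_s) // recent_step

    older_span = max(run_s - recent_s, 0)
    if floor_period_s > 0:
        older_points = older_span // max(floor_period_s, sample_period_s)
    else:
        older_points = 0  # floor period 0: moving window, older data dropped
    return recent_points + older_points


one_day = estimated_points(24 * 3600)
two_days = estimated_points(48 * 3600)
moving_window = estimated_points(48 * 3600, floor_period_s=0)
```

Under these assumptions a one-day 1 Hz run shows 3 876 points instead of 86 400, and each further day adds about 288 points (86 400 s / 300 s), so the count grows slowly rather than staying strictly constant. With a floor period of zero it stays at the 3 600 points of the recent window.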
Example
Design notes
- The throttle short-circuits `compute()` when no new data has crossed `Period`, which means no autoscaler run, no `hv.Curve` build, no `_set_cached_state`, no presenter dirty bit, and crucially no `pipe.send` / Bokeh patch / WebSocket flush / browser repaint.
- `_to_local_datetime` O(N) work is not skipped; that would need a gate in `DataService`. The skipped portion is the dominant cost (~100 ms `pipe.send` at N=100 k per the issue's measurements).
- `max_points` was considered and dropped.
- The actual recent-band length is `recent_seconds` to `recent_seconds + floor_period_seconds` (soft lower bound).

🤖 Generated with Claude Code
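The throttle from the design notes can be sketched as a guard at the top of `compute()`. The class `ThrottledPlotter` below is a hypothetical stand-in, not the PR's `LinePlotter`, and it assumes that "crossed Period" means the latest data timestamp has advanced by at least one period since the last rebuild.

```python
class ThrottledPlotter:
    """Hypothetical stand-in for a plotter with a throttled compute()."""

    def __init__(self, period_ns):
        self.period_ns = period_ns
        self._last_computed_ns = None  # data time of the last full rebuild
        self.builds = 0                # counts the expensive rebuilds

    def compute(self, latest_time_ns):
        """Return True if the plot was rebuilt, False on a short-circuit."""
        if (self._last_computed_ns is not None
                and latest_time_ns - self._last_computed_ns < self.period_ns):
            # Short-circuit: nothing downstream runs (no autoscaler,
            # no curve build, no send to the browser).
            return False
        self._last_computed_ns = latest_time_ns
        self.builds += 1  # placeholder for the expensive rebuild and send
        return True


# Usage: 1 Hz ticks with a 5 s period trigger a rebuild only every fifth tick.
S = 10**9
plotter = ThrottledPlotter(period_ns=5 * S)
results = [plotter.compute(t * S) for t in range(11)]
```

Keeping the guard inside `compute()` means the cheap timestamp comparison runs on every tick, while everything after it is skipped until enough new data has arrived.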