Gate plotter.compute on viewer interest by SimonHeybrock · Pull Request #946 · scipp/esslivedata

SimonHeybrock · 2026-05-26T12:47:10Z

Motivation

plotter.compute() runs on every Kafka delta for every layer, regardless of whether anyone is watching. Hidden tabs (Panel Tabs(dynamic=True)), backgrounded sessions, and a running backend with no browser attached all still pay the full build cost on every update. For expensive plotters this can consume significant CPU, especially when many plot tabs are used. The motivating trigger for this change was #942, which adds downsampling with significant cost to timeseries plotting.

Approach

Add an interest gate on LayerStateMachine. A layer's plotter only builds when at least one viewer holds an interest token; otherwise the latest Kafka input is stashed and collapsed (last-writer-wins) until a viewer arrives. On a 0→1 token transition with pending input, the build runs synchronously so the same polling pass's component rebuild sees fresh has_cached_state.
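A minimal, single-threaded sketch of the gate semantics described above. The class `InterestGate` and its return convention (returning the input to build, rather than a ComputeTask) are illustrative assumptions, not the code in this PR; locking is omitted here on purpose.

```python
from typing import Any, Hashable, Optional


class InterestGate:
    """Hypothetical stand-in for the gate on LayerStateMachine."""

    def __init__(self) -> None:
        self._tokens: set = set()
        self._pending: Optional[Any] = None

    def stash_pending(self, data: Any) -> Optional[Any]:
        """Return data to build now if a viewer holds a token, else stash it."""
        if self._tokens:
            return data
        self._pending = data  # last-writer-wins: older input is overwritten
        return None

    def set_active(self, token: Hashable, active: bool) -> Optional[Any]:
        """Acquire or release a token; return stashed input on a 0->1 transition."""
        was_idle = not self._tokens
        if active:
            self._tokens.add(token)
        else:
            self._tokens.discard(token)  # releasing an unknown token is a no-op
        if active and was_idle and self._pending is not None:
            flushed, self._pending = self._pending, None
            return flushed
        return None


gate = InterestGate()
builds = []

# No viewer yet: three deltas arrive and collapse into one stashed input.
for delta in ("d1", "d2", "d3"):
    ready = gate.stash_pending(delta)
    if ready is not None:
        builds.append(ready)

# First viewer arrives (0->1): only the latest input is flushed.
flushed = gate.set_active("session-a", True)
if flushed is not None:
    builds.append(flushed)

# Second viewer (1->2) does not flush again; new deltas now build directly.
second = gate.set_active("session-b", True)
live = gate.stash_pending("d4")
```

With no token held, the three deltas cost no builds; the first activation triggers exactly one build on the most recent input.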

Tokens are ref-counted via LayerStateMachine.set_active(token, active). The polling loop drives PlotOrchestrator.activate_layer(layer_id, session_layer, is_active) per (grid, layer): the currently-visible grid's layers hold tokens; everything else is released. Cleanup of dropped sessions is automatic via weakref.finalize, so a session disposed without PlotGridTabs.shutdown running cannot leak interest.
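The automatic cleanup can be sketched with the standard-library `weakref.finalize`. The `TokenRegistry` and `SessionLayerStub` names below are hypothetical, and keying by `id(owner)` is an assumption made for illustration; the point is only that the finalizer drops the token when the owner is collected, and that an explicit release detaches the finalizer.

```python
import gc
import weakref


class TokenRegistry:
    """Hypothetical token store; the real gate lives on LayerStateMachine."""

    def __init__(self) -> None:
        self._finalizers = {}  # id(owner) -> weakref.finalize

    def _release(self, key: int) -> None:
        self._finalizers.pop(key, None)

    def set_active(self, owner: object, active: bool) -> None:
        key = id(owner)
        if active:
            if key not in self._finalizers:
                # The finalizer holds only a weak reference to owner, so it
                # does not keep the owner alive.
                self._finalizers[key] = weakref.finalize(owner, self._release, key)
        else:
            finalizer = self._finalizers.pop(key, None)
            if finalizer is not None:
                finalizer.detach()  # explicit release: finalizer no longer needed

    @property
    def active_count(self) -> int:
        return len(self._finalizers)


class SessionLayerStub:
    """Placeholder for a per-session object that holds interest."""


registry = TokenRegistry()
a = SessionLayerStub()
b = SessionLayerStub()
registry.set_active(a, True)
registry.set_active(b, True)

registry.set_active(a, False)  # fast path: explicit release
count_after_explicit = registry.active_count

del b  # owner dropped without explicit release
gc.collect()
count_after_gc = registry.active_count
```

Because the entry is removed as soon as the owner dies, a later object that happens to reuse the same `id()` starts from a clean slate rather than inheriting or discarding a stale token.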

Visibility is orthogonal to the layer FSM. It is per-(session, layer) — interest is held by SessionLayer instances, not by the layer itself — so it cannot fit into the layer-singular WAITING_FOR_JOB → … → READY chain. The gate lives on LayerStateMachine because that object already owns the plotter reference and is shared across sessions, but it is decoupled from the lifecycle states.

Threading

stash_pending runs on the background ingestion thread (data-subscriber callback).
set_active runs on the per-session polling thread (Bokeh main).
Both return a ComputeTask when a build is due; the caller invokes task.run() on its own thread, outside the gate lock.
The 0→1 activation flush therefore runs on the polling thread — deliberately, so the same poll pass sees fresh has_cached_state and renders without waiting another tick.
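A sketch of the threading contract, under stated assumptions: `LockedGate`, the `ComputeTask` fields, and the `build` callback are illustrative and not the PR's actual signatures. The gate mutates state under its lock and hands back a task; the caller runs it on its own thread after the lock is released.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ComputeTask:
    build: Callable[[Any], None]
    data: Any

    def run(self) -> None:
        self.build(self.data)


class LockedGate:
    """Hypothetical gate: state changes under the lock, builds outside it."""

    def __init__(self, build: Callable[[Any], None]) -> None:
        self._build = build
        self._lock = threading.Lock()
        self._tokens = set()
        self._pending = None

    def stash_pending(self, data: Any) -> Optional[ComputeTask]:
        with self._lock:
            if self._tokens:
                return ComputeTask(self._build, data)
            self._pending = data
            return None

    def set_active(self, token: Any, active: bool) -> Optional[ComputeTask]:
        with self._lock:
            was_idle = not self._tokens
            if active:
                self._tokens.add(token)
            else:
                self._tokens.discard(token)
            if active and was_idle and self._pending is not None:
                data, self._pending = self._pending, None
                return ComputeTask(self._build, data)
            return None


built_on = []  # (data, thread ident) for each build


def build(data: Any) -> None:
    built_on.append((data, threading.get_ident()))


def ingest(gate: LockedGate, data: Any) -> None:
    task = gate.stash_pending(data)
    if task is not None:
        task.run()  # runs on the ingestion thread, lock already released


gate = LockedGate(build)
polling_ident = threading.get_ident()  # this thread plays the polling thread

# Delta arrives with no viewer: stashed, nothing built.
t1 = threading.Thread(target=ingest, args=(gate, "delta-1"))
t1.start()
t1.join()

# Activation flush runs on the polling thread.
task = gate.set_active("session", True)
if task is not None:
    task.run()

# With a token held, the next delta builds on the ingestion thread.
t2 = threading.Thread(target=ingest, args=(gate, "delta-2"))
t2.start()
t2.join()
```

Returning the task instead of building inside the gate keeps the lock hold time short, so a slow build on one thread does not block token changes or stashing on the other.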

Notes on the design

ComputeTask is a frozen dataclass returned from gate operations so callers run the build outside the lock. _run_compute (Kafka delta) and activate_layer (tab activation) both funnel through _dispatch_compute_task for uniform error reporting and state transitions.
Plotter replacement via job_started clears the stash so old input cannot be flushed through a new plotter.
weakref.finalize auto-releases tokens if the owning SessionLayer is garbage-collected. Explicit set_active(False) remains the fast path; the finalizer is belt-and-braces against missed cleanup, and it removes the id()-reuse footgun where a stale int could be discarded by an unrelated caller.
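The funnel idea can be sketched as follows. Everything here is an assumption for illustration: the function name mirrors _dispatch_compute_task, but `LayerStatus`, the "ERROR" state, and the single-callable `ComputeTask` are hypothetical. The only point is that both entry points share one place where failures are reported and the state is updated.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ComputeTask:
    run: Callable[[], None]  # hypothetical: the real task carries more context


@dataclass
class LayerStatus:
    state: str = "WAITING_FOR_JOB"
    error: Optional[str] = None


def dispatch_compute_task(task: Optional[ComputeTask], status: LayerStatus) -> bool:
    """Run task if a build is due; return True if a build was attempted."""
    if task is None:
        return False  # gate closed or nothing pending
    try:
        task.run()
    except Exception as exc:
        # Same reporting whether the build came from a delta or an activation.
        status.state = "ERROR"
        status.error = f"{type(exc).__name__}: {exc}"
    else:
        status.state = "READY"
        status.error = None
    return True


def failing_build() -> None:
    raise ValueError("bad input")


ok_status = LayerStatus()
bad_status = LayerStatus()
idle_status = LayerStatus()

ran_ok = dispatch_compute_task(ComputeTask(lambda: None), ok_status)
ran_bad = dispatch_compute_task(ComputeTask(failing_build), bad_status)
ran_idle = dispatch_compute_task(None, idle_status)
```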

Test plan

New unit tests on LayerStateMachine: stash-without-token, 0→1 flush, only-once flush, multi-token ref counting, intermediate-update collapse, unknown-token release, pending-cleared-on-plotter-replacement, no-flush-before-job_started, gc auto-release.
Existing dashboard suite: 1009 passed (pre-existing Panel/Widget deprecation failures unchanged from main).
Manual smoke against dummy instrument: hidden-tab plots stop computing; tab switches still show fresh state.

Relationship to #944

#944 implements the same runtime semantics — gating compute on viewer interest with a synchronous 0→1 flush — but places the gate inside Plotter itself (set_active, _pending, _dirty, _active_tokens, _build_lock, a compute → _build rename). That couples a layer-orchestration concern to every plotter subclass: CorrelationHistogramPlotter has to forward set_active, StaticPlotter.set_active is a no-op stub, six subclasses inherit a _build/compute split they don't need, and ~30 Plotter test sites need fixtures for token acquisition.

This PR puts the gate on LayerStateMachine — which is already per-layer, shared across sessions, and owns both the plotter reference and the lifecycle. Plotter and its subclasses are untouched; no new abstractions in the plotting layer.

| | #944 | this PR |
|---|---|---|
| Files touched | 15 | 5 |
| Plotter API change | yes (`set_active`, `_build`) | none |
| Plotter subclasses refactored | 8 | 0 |
| Plotter unit tests changed | yes (~30 sites) | no |
| Static / correlation special-cases | yes | none |

Benchmarks from #944 (tab-switch latency, hidden-tab CPU) apply unchanged — the build path itself is identical.

Alternative shape to #944: same lazy-compute semantics, but the gate lives on LayerStateMachine instead of inside Plotter. Plotter stays a pure compute artifact: no set_active, no compute/_build split, no internal locking. Every Plotter subclass, static/correlation special-cases, and ~30 plotter test fixtures are left untouched.

LayerStateMachine gains a token set, pending input stash, and gate lock. stash_pending() and set_active() return a ComputeTask when a build should run now; callers (PlotOrchestrator._run_compute and the new activate_layer entry point) invoke the task outside any lock. The 0->1 token transition flushes synchronously on the polling thread so the same poll pass sees fresh has_cached_state. Plotter replacement via job_started clears the stash so old input cannot be flushed through a new plotter.

The polling loop in plot_grid_tabs now drives orchestrator.activate_layer(layer_id, session_layer, is_active) once per (grid, layer); orphan and shutdown paths release tokens.

Test migration is one helper update (add_cell_with_layer activates the new layer) plus two inline activate_layer calls in tests that bypass the helper. Plotter unit tests are unchanged. 8 new gate tests cover stash-without-token, 0->1 flush, intermediate-update collapse, multi-token ref counting, plotter replacement, and pre-job_started stash behaviour.

The gate spans two threads by design: stash_pending runs on the background ingestion thread, set_active on the per-session polling thread. The 0→1 flush deliberately runs on the polling thread so the same poll pass's component rebuild observes fresh has_cached_state.

Attach a weakref.finalize when a token is first acquired, so the gate releases it if the caller (typically a SessionLayer) is garbage-collected without an explicit set_active(False). Explicit release remains the fast path; the finalizer is belt-and-braces against missed cleanup (e.g., a session disposed without PlotGridTabs.shutdown running) and removes the id()-reuse footgun where a stale int could be discarded by an unrelated caller.

SimonHeybrock added 3 commits May 26, 2026 12:40

SimonHeybrock changed the title ~~Gate plotter.compute on viewer interest at LayerStateMachine (alt to #944)~~ Gate plotter.compute on viewer interest May 27, 2026

SimonHeybrock marked this pull request as ready for review May 27, 2026 07:35

SimonHeybrock requested a review from MridulS May 27, 2026 07:35

Merge branch 'main' into lazy-compute-on-layer-state

96fd49d

SimonHeybrock wants to merge 4 commits into main from lazy-compute-on-layer-state
