Gate plotter.compute on viewer interest#946

Open
SimonHeybrock wants to merge 4 commits into main from lazy-compute-on-layer-state

@SimonHeybrock SimonHeybrock commented May 26, 2026

Motivation

plotter.compute() runs on every Kafka delta for every layer, regardless of whether anyone is watching. Hidden tabs (Panel Tabs(dynamic=True)), backgrounded sessions, and a running backend with no browser attached all still pay the full build cost on every update. For expensive plotters this can consume significant CPU, especially when many plot tabs are used. The motivating trigger for this change was #942, which adds downsampling with significant cost to timeseries plotting.

Approach

Add an interest gate on LayerStateMachine. A layer's plotter only builds when at least one viewer holds an interest token; otherwise the latest Kafka input is stashed and collapsed (last-writer-wins) until a viewer arrives. On a 0→1 token transition with pending input, the build runs synchronously so the same polling pass's component rebuild sees fresh has_cached_state.
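The stash-and-flush behaviour described above can be sketched in a few lines. This is a minimal single-threaded model, not the code in this PR: the class name `InterestGateSketch`, the method name `on_update`, and the `build` callback (standing in for `plotter.compute`) are assumptions made for illustration, and locking is left out entirely.

```python
from typing import Any, Callable, Hashable, Optional


class InterestGateSketch:
    """Hypothetical single-threaded model of the interest gate (no locking)."""

    def __init__(self, build: Callable[[Any], None]) -> None:
        self._build = build  # stands in for plotter.compute
        self._tokens: set = set()  # tokens of currently interested viewers
        self._pending: Optional[Any] = None  # latest stashed input, if any
        self._has_pending = False

    def on_update(self, data: Any) -> None:
        """Handle one incoming delta."""
        if self._tokens:
            # At least one viewer is interested: build right away.
            self._build(data)
        else:
            # Nobody is watching: keep only the newest input (last-writer-wins).
            self._pending = data
            self._has_pending = True

    def set_active(self, token: Hashable, active: bool) -> None:
        """Acquire or release one viewer's interest token."""
        was_idle = not self._tokens
        if active:
            self._tokens.add(token)
        else:
            self._tokens.discard(token)
        # Flush only on the 0 -> 1 transition, and only if input is waiting.
        if was_idle and self._tokens and self._has_pending:
            data = self._pending
            self._pending, self._has_pending = None, False
            self._build(data)


built = []
gate = InterestGateSketch(built.append)
gate.on_update("a")  # stashed
gate.on_update("b")  # overwrites "a"
gate.set_active("viewer-1", True)  # 0 -> 1: flushes "b" synchronously
gate.on_update("c")  # viewer present: built immediately
```

After this sequence `built` holds `["b", "c"]`: the intermediate update `"a"` was collapsed away while no viewer held a token.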

Tokens are ref-counted via LayerStateMachine.set_active(token, active). The polling loop drives PlotOrchestrator.activate_layer(layer_id, session_layer, is_active) per (grid, layer): the currently-visible grid's layers hold tokens; everything else is released. Cleanup of dropped sessions is automatic via weakref.finalize, so a session disposed without PlotGridTabs.shutdown running cannot leak interest.
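As a rough illustration of how a polling pass might drive the gate, the sketch below walks all (grid, layer) pairs and activates only those on the visible grid. Only the `activate_layer(layer_id, session_layer, is_active)` call shape is taken from the description above; `sync_layer_interest`, `RecordingOrchestrator`, and the layout of `session_layers` are hypothetical.

```python
from typing import Any, Hashable, List, Mapping, Tuple


def sync_layer_interest(
    orchestrator: Any,
    session_layers: Mapping[Tuple[Hashable, Hashable], Any],
    visible_grid: Hashable,
) -> None:
    """Hold tokens for the visible grid's layers and release all others.

    `session_layers` maps (grid_id, layer_id) to this session's layer object;
    that layout is an assumption of this sketch.
    """
    for (grid_id, layer_id), session_layer in session_layers.items():
        is_active = grid_id == visible_grid
        orchestrator.activate_layer(layer_id, session_layer, is_active)


class RecordingOrchestrator:
    """Hypothetical stub that records calls instead of touching a real gate."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Hashable, bool]] = []

    def activate_layer(self, layer_id: Hashable, session_layer: Any, is_active: bool) -> None:
        self.calls.append((layer_id, is_active))


orchestrator = RecordingOrchestrator()
layers = {
    ("grid-a", "layer-1"): object(),
    ("grid-a", "layer-2"): object(),
    ("grid-b", "layer-3"): object(),
}
sync_layer_interest(orchestrator, layers, visible_grid="grid-a")
```

Calling the function on every poll is safe as long as activation is idempotent per token, which matches the ref-counted `set_active(token, active)` semantics described above.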

Visibility is orthogonal to the layer FSM. It is per-(session, layer) — interest is held by SessionLayer instances, not by the layer itself — so it cannot fit into the layer-singular WAITING_FOR_JOB → … → READY chain. The gate lives on LayerStateMachine because that object already owns the plotter reference and is shared across sessions, but it is decoupled from the lifecycle states.

Threading

  • stash_pending runs on the background ingestion thread (data-subscriber callback).
  • set_active runs on the per-session polling thread (Bokeh main).
  • Both return a ComputeTask when a build is due; the caller invokes task.run() on its own thread, outside the gate lock.
  • The 0→1 activation flush therefore runs on the polling thread — deliberately, so the same poll pass sees fresh has_cached_state and renders without waiting another tick.
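The threading rules above amount to "decide under the lock, build outside it". The sketch below shows one way that pattern could look. `LockedGateSketch` and `dispatch` are hypothetical names, and the fields of `ComputeTask` are assumed, since the description only states that it is a frozen dataclass with a `run()` method.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Hashable, Optional


@dataclass(frozen=True)
class ComputeTask:
    """Immutable description of one build (field names are assumptions)."""

    build: Callable[[Any], Any]
    data: Any

    def run(self) -> Any:
        return self.build(self.data)


class LockedGateSketch:
    """Hypothetical gate: state changes happen under the lock, builds do not."""

    def __init__(self, build: Callable[[Any], Any]) -> None:
        self._build = build
        self._lock = threading.Lock()
        self._tokens: set = set()
        self._pending: Any = None
        self._has_pending = False

    def stash_pending(self, data: Any) -> Optional[ComputeTask]:
        # Intended for the ingestion thread.
        with self._lock:
            if self._tokens:
                return ComputeTask(self._build, data)
            self._pending, self._has_pending = data, True
            return None

    def set_active(self, token: Hashable, active: bool) -> Optional[ComputeTask]:
        # Intended for the polling thread.
        with self._lock:
            was_idle = not self._tokens
            if active:
                self._tokens.add(token)
            else:
                self._tokens.discard(token)
            if was_idle and self._tokens and self._has_pending:
                data = self._pending
                self._pending, self._has_pending = None, False
                return ComputeTask(self._build, data)
            return None


def dispatch(task: Optional[ComputeTask]) -> None:
    """Run a due task on the caller's thread, after the gate lock is released."""
    if task is not None:
        task.run()
```

A caller would write `dispatch(gate.stash_pending(data))` on the ingestion thread and `dispatch(gate.set_active(token, True))` on the polling thread. The sketch makes no attempt to order builds that two threads run concurrently.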

Notes on the design

  • ComputeTask is a frozen dataclass returned from gate operations so callers run the build outside the lock. _run_compute (Kafka delta) and activate_layer (tab activation) both funnel through _dispatch_compute_task for uniform error reporting and state transitions.
  • Plotter replacement via job_started clears the stash so old input cannot be flushed through a new plotter.
  • weakref.finalize auto-releases tokens if the owning SessionLayer is garbage-collected. Explicit set_active(False) remains the fast path; the finalizer is belt-and-braces against missed cleanup, and it removes the id()-reuse footgun where a stale int could be discarded by an unrelated caller.
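The finalizer-based cleanup from the last note can be illustrated with the standard library alone. The sketch assumes tokens are keyed by `id(owner)`, which the PR text hints at but does not spell out; `TokenRegistrySketch` and `Owner` are hypothetical names, and locking is again omitted.

```python
import gc
import weakref
from typing import Any, Dict, Set


class TokenRegistrySketch:
    """Hypothetical token set that releases a token when its owner is collected."""

    def __init__(self) -> None:
        self._tokens: Set[int] = set()
        self._finalizers: Dict[int, weakref.finalize] = {}

    @property
    def active_count(self) -> int:
        return len(self._tokens)

    def acquire(self, owner: Any) -> None:
        key = id(owner)  # assumption: tokens are keyed by id(owner)
        if key in self._tokens:
            return
        self._tokens.add(key)
        # The callback receives only the integer key, never the owner itself,
        # so the finalizer does not keep the owner alive.
        self._finalizers[key] = weakref.finalize(owner, self._drop, key)

    def release(self, owner: Any) -> None:
        """Explicit fast path: drop the token and disarm its finalizer."""
        key = id(owner)
        finalizer = self._finalizers.pop(key, None)
        if finalizer is not None:
            # Detach so a later collection cannot discard a token that a
            # different object with a reused id() has acquired meanwhile.
            finalizer.detach()
        self._tokens.discard(key)

    def _drop(self, key: int) -> None:
        self._finalizers.pop(key, None)
        self._tokens.discard(key)


class Owner:
    """Stand-in for a SessionLayer; any weak-referenceable object works."""


registry = TokenRegistrySketch()
owner = Owner()
registry.acquire(owner)
del owner     # the owner disappears without an explicit release
gc.collect()  # not needed for this case on CPython, but makes the intent explicit
```

Once the owner has been collected the registry no longer counts its token. A finalizer may run on whichever thread triggers collection, so a real gate would need to take its lock inside the callback.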

Test plan

  • New unit tests on LayerStateMachine: stash-without-token, 0→1 flush, only-once flush, multi-token ref counting, intermediate-update collapse, unknown-token release, pending-cleared-on-plotter-replacement, no-flush-before-job_started, gc auto-release.
  • Existing dashboard suite: 1009 passed (pre-existing Panel/Widget deprecation failures unchanged from main).
  • Manual smoke against dummy instrument: hidden-tab plots stop computing; tab switches still show fresh state.

Relationship to #944

#944 implements the same runtime semantics — gating compute on viewer interest with a synchronous 0→1 flush — but places the gate inside Plotter itself (set_active, _pending, _dirty, _active_tokens, _build_lock, a compute_build rename). That couples a layer-orchestration concern to every plotter subclass: CorrelationHistogramPlotter has to forward set_active, StaticPlotter.set_active is a no-op stub, six subclasses inherit a _build/compute split they don't need, and ~30 Plotter test sites need fixtures for token acquisition.

This PR puts the gate on LayerStateMachine — which is already per-layer, shared across sessions, and owns both the plotter reference and the lifecycle. Plotter and its subclasses are untouched; no new abstractions in the plotting layer.

| | #944 | this PR |
|---|---|---|
| Files touched | 15 | 5 |
| Plotter API change | yes (set_active, _build) | none |
| Plotter subclasses refactored | 8 | 0 |
| Plotter unit tests changed | yes (~30 sites) | no |
| Static / correlation special-cases | yes | none |

Benchmarks from #944 (tab-switch latency, hidden-tab CPU) apply unchanged — the build path itself is identical.

Alternative shape to #944: same lazy-compute semantics, but the gate lives
on LayerStateMachine instead of inside Plotter. Plotter stays a pure
compute artifact -- no set_active, no compute/_build split, no internal
locking. Every Plotter subclass, static/correlation special-cases, and
~30 plotter test fixtures are left untouched.

LayerStateMachine gains a token set, pending input stash, and gate lock.
stash_pending() and set_active() return a ComputeTask when a build should
run now; callers (PlotOrchestrator._run_compute and the new
activate_layer entry point) invoke the task outside any lock. The 0->1
token transition flushes synchronously on the polling thread so the
same poll pass sees fresh has_cached_state. Plotter replacement via
job_started clears the stash so old input cannot be flushed through a
new plotter.

The polling loop in plot_grid_tabs now drives
orchestrator.activate_layer(layer_id, session_layer, is_active) once
per (grid, layer); orphan and shutdown paths release tokens.

Test migration is one helper update (add_cell_with_layer activates the
new layer) plus two inline activate_layer calls in tests that bypass the
helper. Plotter unit tests are unchanged. 8 new gate tests cover
stash-without-token, 0->1 flush, intermediate-update collapse,
multi-token ref counting, plotter replacement, and pre-job_started
stash behaviour.

The gate spans two threads by design: stash_pending runs on the bg
ingestion thread, set_active on the per-session polling thread. The
0→1 flush deliberately runs on the polling thread so the same poll
pass's component rebuild observes fresh has_cached_state.

Attach a weakref.finalize when a token is first acquired, so the gate
releases it if the caller (typically a SessionLayer) is garbage-collected
without an explicit set_active(False). Explicit release remains the fast
path; the finalizer is belt-and-braces against missed cleanup (e.g., a
session disposed without PlotGridTabs.shutdown running) and removes the
id()-reuse footgun where a stale int could be discarded by an unrelated
caller.

@SimonHeybrock SimonHeybrock changed the title from "Gate plotter.compute on viewer interest at LayerStateMachine (alt to #944)" to "Gate plotter.compute on viewer interest" May 27, 2026
@SimonHeybrock SimonHeybrock marked this pull request as ready for review May 27, 2026 07:35
@SimonHeybrock SimonHeybrock requested a review from MridulS May 27, 2026 07:35