HanSur94 · HanSur94 · Jun 24, 2026
diff --git a/benchmarks/.reports/coverage.md b/benchmarks/.reports/coverage.md
@@ -0,0 +1,137 @@
+# Benchmark coverage notes
+
+Tracks what `/bench-evolve` has added and which performance-critical paths
+still lack isolated benchmark coverage. Change-log entries are listed oldest first.
+
+## Performance-critical surface (ranked) and coverage status
+
+| Path | Why it matters | Coverage |
+|------|----------------|----------|
+| **Downsampling kernels** (`minmax_downsample` / `lttb_downsample` → `minmax_core_mex` / `lttb_core_mex`) | Runs on every render + every zoom/pan, over the full dataset (≤50M pts). The library's core value. | ✅ `bench_downsample_kernels.m` (isolated, both methods) — *added 2026-06-24*. Also exercised indirectly in `benchmark.m` / `benchmark_zoom.m` / `benchmark_features.m` (render-mixed). |
+| **`binary_search`** (`binary_search_mex`) | Range-window lookup on raw full-N sorted arrays; on the resolve path for every zoom/pan + every tag range query (`FastSense.m`, `FastSenseToolbar.m`, `SensorTag.m`). | ✅ `bench_binary_search.m` (isolated, log-scaling gate) — *added 2026-06-24*. |
+| **Violation marker path** (`violation_cull` → `violation_cull_mex`; constant + step-function branches) | Fused detect+cull on every threshold render/zoom for thresholds with `ShowViolations` (incl. time-varying step thresholds). | ✅ `bench_violation_cull.m` (isolated, both branches, linear-scaling gate) — *added 2026-06-24*. |
+| **Disk range-query** (`FastSenseDataStore.getRange`, `resolve_disk_mex`) | Out-of-core read on every zoom/pan of a disk-backed line. The large-data story's hot read path. | ✅ `bench_datastore_range.m` (fixed-window query, indexed-read gate) — *added 2026-06-24*. Store create/slice still only exploratory (`benchmark_datastore.m` / `profile_datastore.m`). |
+| **CSV ingestion** (`dispatchDelimitedParse_` → `delimited_parse_mex`, fallback `readRawDelimited_`) | Front door for raw sensor data into the Tag pipeline; MEX is ~10–40× faster than the textscan fallback. Slow parse = slow load for big logs. | ✅ `bench_delimited_parse.m` (isolated, row-scaling gate) — *added 2026-06-24*. |
+| **Pyramid build** (`FastSense.buildPyramidLevel`) | Multi-level pre-downsample cache built at render for large lines (powers O(1) zoom). Full-N at render. | ◐ Partial — it is essentially `minmax_downsample` per level (already gated by `bench_downsample_kernels.m`) + chunked disk reads; only *memory*-benchmarked end-to-end (`benchmark_memory.m`). Low marginal value to isolate; private method. |
+| **`to_step_function_mex`** | SIMD step-function conversion — a compiled, deployed, correctness-tested kernel (`TestToStepFunctionMex`). | ⏸️ **DEFERRED** — no confirmed production caller. `MonitorTag.recompute_` emits a binary vector (no step conversion); `StateTag.getXY` is pass-through; only the test suite calls it. The `dispatchDelimitedParse_` comment citing it is stale. **Investigate whether it's still wired into any render path (or is vestigial) before benchmarking.** |
+| **Tag layer** (SensorTag/MonitorTag/CompositeTag getXY, resolve, append) | Live-tick recompute path. | ✅ `bench_sensortag_getxy`, `bench_monitortag_tick`, `bench_monitortag_append`, `bench_compositetag_merge`, `bench_consumer_migration_tick`, `bench_tag_pipeline_1k`. |
+| **Dashboard refresh / load** | Live dashboard refresh rate. | ✅ `bench_dashboard`, `bench_dashboard_live`, `bench_dashboard_load`. |
+| **Full render vs plot(), zoom/pan, memory, features** | End-to-end render comparison. | ✅ `benchmark.m`, `benchmark_zoom.m`, `benchmark_memory.m`, `benchmark_features.m`. |
+
+## Change log
+
+### 2026-06-24 — `bench_downsample_kernels.m`
+- **Gap closed:** isolated downsampling-kernel microbenchmark. Previously the
+  only coverage was a single `minmax_downsample(x,y,1000)` call buried inside
+  the render-heavy `benchmark.m`; **LTTB had zero coverage anywhere**.
+- **What it does:** times `minmax_downsample` and `lttb_downsample` as pure
+  computation (no figure/render) across a 10K→10M size sweep, same ~2000-pt
+  output budget for both, reporting per-call ms + throughput (Mpts/s).
+- **Gate:** machine-independent — fits the empirical log-log scaling exponent
+  over the large-N portion and asserts it stays ≤ 1.3 (catches super-linear
+  creep regardless of host speed).
+- **Reaches the private wrappers** by `cd`-ing into `libs/FastSense/private`
+  (current folder is always searched, even when named `private`) — works in
+  both MATLAB and Octave, unlike the `addpath(.../private)` trick that
+  `benchmark.m` uses (Octave-only; MATLAB rejects private dirs on the path).
+- **First run (MATLAB R2025b, MEX active):** MinMax ~764 Mpts/s @ 10M,
+  LTTB ~349 Mpts/s @ 10M; scaling exponents 0.88 / 0.87 → PASS.
+
+### 2026-06-24 — `bench_binary_search.m`
+- **Gap closed:** isolated range-lookup microbenchmark. `binary_search` is the
+  most broadly-used uncovered kernel — the resolve/zoom window lookup in
+  `FastSense.m` (4060/4103/4178/4460), timestamp lookup (1683), toolbar
+  click/range, and tag range resolve (`SensorTag.m:152`), on raw full-N sorted
+  arrays, every zoom/pan. Re-prioritised **above** the violation marker path
+  this run: `violation_cull` runs on already-downsampled display data
+  (small-N, per-frame), whereas `binary_search` hits the raw full-N array.
+- **What it does:** times 20k scalar `'left'`/`'right'` lookups across a
+  10K→50M sweep, reporting per-query µs + Mqueries/s.
+- **Gate:** machine-independent — fits the per-query log-log exponent over the
+  large-N portion and asserts it stays ≤ 0.6, catching the catastrophic
+  O(log N)→O(N) (linear-scan) regression regardless of host speed.
+- **MEX detection caveat (baked into the bench):** `binary_search_mex` lives in
+  `libs/FastSense/private`, visible to `binary_search.m` in the parent folder but
+  NOT from `benchmarks/`. A plain `exist('binary_search_mex','file')` in the bench
+  would misreport the MEX as missing (fallback), so the bench instead checks on
+  disk for the current platform's built binary (`['binary_search_mex.' mexext]`).
+- **First run (MATLAB R2025b, MEX active):** ~0.95 µs/query @ 10K → ~1.8 µs @ 50M;
+  exponent 0.09 (firmly logarithmic), growth 1.9× over the sweep → PASS.
+
+### 2026-06-24 — `bench_violation_cull.m`
+- **Gap closed:** isolated threshold-marker microbenchmark. `violation_cull` is
+  the fused detect+cull kernel called per (threshold x line) on every
+  render/zoom (`FastSense.m:1368/1371`, `4468/4471`); only
+  `bench_event_marker_regression.m` touched a neighbouring path before.
+- **What it does:** times both threshold branches as pure computation — a
+  constant threshold (thX=0 sentinel) and a 5-knot step-function threshold —
+  across a 1K→1M input sweep, reporting per-call ms + throughput. Annotated
+  that production input is the displayed/downsampled data (~few thousand pts,
+  the low end); upper sizes verify linear scaling.
+- **Gate:** machine-independent — log-log scaling exponent over N >= 1e4 must
+  stay <= 1.3 (catches super-linear creep in detect+cull).
+- **Reaches the private wrapper** via the `cd`-into-`libs/FastSense/private`
+  trick (see [[benchmarking-private-mex-kernels]]).
+- **First run (MATLAB R2025b, MEX active):** constant ~288 Mpts/s @ 1M, step
+  ~261 Mpts/s @ 1M; at the realistic ~1K size both are sub-10 µs. Scaling
+  exponents 0.93 / 0.92 → PASS.
+
+### 2026-06-24 — `bench_datastore_range.m`
+- **Gap closed:** focused, deterministic gate for the disk-backed range-query
+  path (`FastSenseDataStore.getRange`), which every zoom/pan on a disk-backed
+  line hits. Previously only exploratory scripts existed (`benchmark_datastore.m`
+  is a .mat-vs-SQLite sweep and Linux-only — shells out to `free`;
+  `profile_datastore.m` is a profiler script). No figure needed.
+- **What it does:** builds a chunked store at each size, fires fixed-size view
+  windows (width scaled so each query returns ~10k pts regardless of N), times
+  `getRange`, and reports create time + per-query ms + queries/s.
+- **Gate:** machine-independent — the indexed store must read only the window,
+  so per-query time must stay ~constant as the dataset grows; asserts the
+  query-time-vs-total-N exponent <= 0.5 (a full-scan regression → ~1.0).
+- **Robustness:** warms up a throwaway store first (absorbs one-time SQLite/MEX
+  init), and always `cleanup()`s each store (try/catch + post-loop) so temp DBs
+  never leak even if the gate trips.
+- **First run (MATLAB R2025b, mksqlite active):** query time flat at ~0.16 ms
+  across 100K→5M (50× more data), exponent −0.11, exactly 10002 pts/query → PASS.
+
+### Pivot note this run
+Intended target was `to_step_function_mex`, but a fresh survey found it has **no
+confirmed production caller** (see table) — benchmarking it would violate the
+"path that matters" rule. Deferred it (flagged for investigation) and pivoted to
+the disk range-query gate instead.
+
+### 2026-06-24 — `bench_delimited_parse.m`
+- **Gap closed:** isolated CSV-ingestion microbenchmark. `delimited_parse_mex`
+  (via `dispatchDelimitedParse_`) is the parse front door for the Tag pipeline,
+  documented as ~10–40× faster than the textscan fallback, yet it had zero
+  coverage (BatchTagPipeline / delimited ingestion was entirely unbenchmarked).
+- **What it does:** generates deterministic 4-column CSVs of growing row count,
+  times `dispatchDelimitedParse_` (file generation excluded), reports parse ms +
+  rows/s + MB/s. Always deletes its temp files (per-iter + onCleanup backstop).
+- **Gate:** machine-independent — log-log row-scaling exponent over rows ≥ 1e4
+  must stay ≤ 1.3 (catches super-linear parse creep, e.g. O(rows²) realloc).
+- **Reaches the private wrapper** via `cd`-into-`libs/SensorThreshold/private`
+  (see [[benchmarking-private-mex-kernels]]).
+- **First run (MATLAB R2025b, MEX active):** ~5.7 M rows/s (~205 MB/s) at 100K–500K
+  rows; exponent 0.98 (essentially linear) → PASS.
+
+### Pivot notes this run
+Two earmarked targets were rejected on fresh survey:
+- **Pyramid build** — `buildPyramidLevel` is just `minmax_downsample` per level
+  (already gated) + chunked reads; private; low marginal value. Downgraded to
+  ◐ Partial in the table, not benchmarked.
+- **DerivedTag.recompute_** — thin dispatch around a user-supplied `ComputeFn`
+  (`[X,Y] = ComputeFn(Parents)`), so a microbench would mostly measure the test
+  closure, not a FastSense kernel. Deferred unless paired with a built-in
+  compute/alignment path worth isolating.
+
+### Next gap for the following iteration
+Survey fresh, but leading candidates (higher-level paths now that the core MEX
+kernels are covered):
+- **EventStore persistence scaling** — `EventStore.save` (atomic temp-rename
+  write) / `load` as event count grows; relevant for long-running live
+  dashboards. Confirm it isn't already covered by `bench_event_marker_regression`
+  / `bench_dashboard_*` (those attach stores but may not stress save/load at scale).
+- **LiveEventPipeline per-tick processing** (`processMonitorTag_`) on the live
+  refresh path — confirm it isn't already covered by `bench_monitortag_tick`.
+- Still open: the `to_step_function_mex` wiring question (filed as a background task).
diff --git a/benchmarks/bench_binary_search.m b/benchmarks/bench_binary_search.m
@@ -0,0 +1,152 @@
+function bench_binary_search()
+%BENCH_BINARY_SEARCH Isolated microbenchmark of the range-lookup hot path.
+%
+%   binary_search is the gateway to every range query in FastSense. On each
+%   zoom/pan and render it locates the visible index window in a raw, sorted,
+%   full-length X array — FastSense.m (resolve/zoom window, timestamp lookup),
+%   FastSenseToolbar.m (click-to-point, range select) and SensorTag.m (tag
+%   range resolve) all call it, against arrays up to tens of millions of
+%   points. It is MEX-accelerated (binary_search_mex) with a pure-MATLAB
+%   fallback, yet had no dedicated benchmark before this one.
+%
+%   The cost of any single call is tiny (O(log N) comparisons), so absolute
+%   throughput is not the point. The point is the GATE: binary search must
+%   stay logarithmic. If the MEX silently stops loading, or a change turns
+%   the search into a linear scan, large-data zoom/pan responsiveness
+%   collapses — and nothing else in the suite would catch it. This benchmark
+%   times many scalar lookups (both 'left' and 'right') across a wide size
+%   sweep and asserts the per-query time scales sub-linearly with N.
+%
+%   Per-query time grows only weakly with N (a mix of ~log2(N) comparisons
+%   and cache-miss penalty as the array spills out of cache), so the
+%   empirical log-log exponent stays well below the linear-scan exponent of
+%   ~1.0. The gate (exponent <= 0.6) cleanly separates the two regimes and
+%   is machine-independent.
+%
+%   A warmup absorbs first-call/JIT overhead; each measurement loops over a
+%   fixed query batch so per-call dispatch stays representative of production
+%   (binary_search is always called scalar); the median of nRuns damps spikes.
+%
+%   Run:
+%     octave --no-gui --eval "install(); bench_binary_search();"
+%
+%   Exits 0 with "PASS: ..." on success; raises assert() (non-zero exit) if
+%   either direction's per-query scaling exponent exceeds the gate.
+%
+%   See also binary_search, binary_search_mex, bench_downsample_kernels.
+
+    here = fileparts(mfilename('fullpath'));
+    addpath(fullfile(here, '..'));
+    install();
+    % binary_search lives in libs/FastSense/ (not a private/ folder), so
+    % install() puts it on the path and it is directly callable here.
+
+    sizes  = [1e4, 1e5, 1e6, 1e7, 5e7];
+    labels = {'10K', '100K', '1M', '10M', '50M'};
+
+    nQueries = 20000;   % scalar lookups timed per (size, direction, run)
+    nRuns    = 5;       % median of nRuns
+
+    % Deterministic seed — works in both MATLAB and Octave
+    if exist('rng', 'file') == 2
+        rng(0);
+    else
+        rand('state', 0); %#ok<RAND>
+    end
+
+    % binary_search_mex lives in libs/FastSense/private. It is visible to
+    % binary_search.m (in the parent folder), which is what actually
+    % dispatches to it, but it is NOT visible from this benchmark's context,
+    % so a plain exist('binary_search_mex','file') here would misreport as a
+    % fallback. Detect the built binary for THIS platform on disk instead.
+    mexPath = fullfile(here, '..', 'libs', 'FastSense', 'private', ...
+        ['binary_search_mex.' mexext]);
+    useMex = (exist(mexPath, 'file') ~= 0);
+
+    nSizes = numel(sizes);
+    tLeft  = zeros(1, nSizes);   % per-query seconds, 'left'
+    tRight = zeros(1, nSizes);   % per-query seconds, 'right'
+
+    fprintf('\n=== binary_search range-lookup microbenchmark ===\n');
+    fprintf('  binary_search_mex: %s\n', tf_(useMex));
+    fprintf('  %d scalar lookups per measurement, median of %d runs\n', nQueries, nRuns);
+    fprintf('  %s\n', repmat('-', 1, 74));
+    fprintf('  %-6s | %-14s %-12s | %-14s %-12s\n', ...
+        'N', 'left (us/q)', 'left Mq/s', 'right (us/q)', 'right Mq/s');
+    fprintf('  %s\n', repmat('-', 1, 74));
+
+    for c = 1:nSizes
+        n = sizes(c);
+        x = linspace(0, 100, n);          % sorted ascending (binary_search contract)
+        vals = 100 * rand(1, nQueries);   % query targets within range (not timed)
+
+        tLeft(c)  = timeSearch_(x, vals, 'left',  nRuns);
+        tRight(c) = timeSearch_(x, vals, 'right', nRuns);
+
+        fprintf('  %-6s | %12.4f   %10.2f   | %12.4f   %10.2f\n', ...
+            labels{c}, ...
+            tLeft(c)  * 1e6, 1 / tLeft(c)  / 1e6, ...
+            tRight(c) * 1e6, 1 / tRight(c) / 1e6);
+
+        clear x vals;
+    end
+    fprintf('  %s\n', repmat('-', 1, 74));
+
+    % ---- Scaling gate: per-query time must stay sub-linear in N ----
+    % Fit over N >= 1e5 (small N is dominated by fixed call/dispatch overhead
+    % and would flatten the slope). O(log N) + cache effects keep the exponent
+    % well under 1.0; a linear-scan regression drives it toward 1.0.
+    fitMask = sizes >= 1e5;
+    slopeLeft  = scalingExponent_(sizes(fitMask), tLeft(fitMask));
+    slopeRight = scalingExponent_(sizes(fitMask), tRight(fitMask));
+    growthLeft = tLeft(end) / max(tLeft(1), eps);
+
+    gate = 0.6;
+    fprintf('  Per-query scaling exponent (large-N fit, linear-scan ~1.0):\n');
+    fprintf('    left  : %.2f   (gate: <= %.1f)\n', slopeLeft, gate);
+    fprintf('    right : %.2f   (gate: <= %.1f)\n', slopeRight, gate);
+    fprintf('    per-query growth 10K->50M (left): %.1fx\n', growthLeft);
+    fprintf('  %s\n', repmat('-', 1, 74));
+
+    assert(slopeLeft <= gate, ...
+        sprintf(['FAIL: binary_search ''left'' per-query exponent %.2f exceeds %.1f — ' ...
+                 'search is no longer logarithmic (linear-scan regression?).'], slopeLeft, gate));
+    assert(slopeRight <= gate, ...
+        sprintf(['FAIL: binary_search ''right'' per-query exponent %.2f exceeds %.1f — ' ...
+                 'search is no longer logarithmic (linear-scan regression?).'], slopeRight, gate));
+    fprintf('  PASS: lookups stay sub-linear (gate: exponent <= %.1f).\n\n', gate);
+end
+
+function t = timeSearch_(x, vals, direction, nRuns)
+    %TIMESEARCH_ Median-of-nRuns per-query time of binary_search over a batch.
+    %   Warms up first, then times numel(vals) back-to-back scalar lookups per
+    %   run and returns the median run time divided by the batch size.
+    %   The argument is named 'direction' to avoid shadowing the builtin dir().
+    nq = numel(vals);
+    binary_search(x, vals(1),   direction); %#ok<*NASGU> % warmup
+    binary_search(x, vals(end), direction);
+    runTimes = zeros(1, nRuns);
+    for r = 1:nRuns
+        t0 = tic;
+        for q = 1:nq, binary_search(x, vals(q), direction); end
+        runTimes(r) = toc(t0);
+    end
+    t = median(runTimes) / nq;
+end
+
+
+function slope = scalingExponent_(ns, times)
+    %SCALINGEXPONENT_ Log-log slope of per-query time vs N (the growth exponent).
+    %   slope -> 0 indicates flat/logarithmic scaling; -> 1 indicates linear.
+    times = max(times, eps);
+    p = polyfit(log10(ns(:)), log10(times(:)), 1);
+    slope = p(1);
+end
+
+function s = tf_(b)
+    if b
+        s = 'active';
+    else
+        s = 'fallback (pure MATLAB)';
+    end
+end
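
A quick way to sanity-check the 0.6 gate is to run the same log-log fit on synthetic timings. The sketch below is illustrative only: the timing constants are made up (they are not measurements from this PR), and the only things it borrows from the bench are the `polyfit`-based slope used in `scalingExponent_` and the N >= 1e5 fit range.

```matlab
% Synthetic per-query timings (hypothetical constants, not measured data).
% Sizes mirror the fit range used by the gate in the bench (N >= 1e5).
ns   = [1e5, 1e6, 1e7, 5e7];
tLog = 1e-8 * log2(ns);   % models a healthy O(log N) binary search
tLin = 1e-9 * ns;         % models a linear-scan regression, O(N)

% Log-log slope = empirical growth exponent (same fit as scalingExponent_).
pLog = polyfit(log10(ns), log10(tLog), 1);
pLin = polyfit(log10(ns), log10(tLin), 1);
slopeLog = pLog(1);
slopeLin = pLin(1);

gate = 0.6;
fprintf('logarithmic model exponent: %.2f\n', slopeLog);
fprintf('linear-scan model exponent: %.2f\n', slopeLin);

assert(slopeLog <= gate);   % a logarithmic search passes the gate
assert(slopeLin >  gate);   % a linear scan trips it
```

Under these assumptions the logarithmic model lands far below the gate while the linear model fits an exponent of 1, which is the separation the bench relies on. Real timings also carry cache-miss effects, so measured exponents will differ from the idealised logarithmic curve.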