File: `benchmarks/.reports/coverage.md` (new file, 137 additions)
# Benchmark coverage notes

Tracks what `/bench-evolve` has added and which performance-critical paths
still lack isolated benchmark coverage. Change-log entries are listed in the
order they were added (oldest first).

## Performance-critical surface (ranked) and coverage status

| Path | Why it matters | Coverage |
|------|----------------|----------|
| **Downsampling kernels** (`minmax_downsample` / `lttb_downsample` → `minmax_core_mex` / `lttb_core_mex`) | Runs on every render + every zoom/pan, over the full dataset (≤50M pts). The library's core value. | ✅ `bench_downsample_kernels.m` (isolated, both methods) — *added 2026-06-24*. Also exercised indirectly in `benchmark.m` / `benchmark_zoom.m` / `benchmark_features.m` (render-mixed). |
| **`binary_search`** (`binary_search_mex`) | Range-window lookup on raw full-N sorted arrays; on the resolve path for every zoom/pan + every tag range query (`FastSense.m`, `FastSenseToolbar.m`, `SensorTag.m`). | ✅ `bench_binary_search.m` (isolated, log-scaling gate) — *added 2026-06-24*. |
| **Violation marker path** (`violation_cull` → `violation_cull_mex`; constant + step-function branches) | Fused detect+cull on every threshold render/zoom for thresholds with `ShowViolations` (incl. time-varying step thresholds). | ✅ `bench_violation_cull.m` (isolated, both branches, linear-scaling gate) — *added 2026-06-24*. |
| **Disk range-query** (`FastSenseDataStore.getRange`, `resolve_disk_mex`) | Out-of-core read on every zoom/pan of a disk-backed line. The large-data story's hot read path. | ✅ `bench_datastore_range.m` (fixed-window query, indexed-read gate) — *added 2026-06-24*. Store create/slice still only exploratory (`benchmark_datastore.m` / `profile_datastore.m`). |
| **CSV ingestion** (`dispatchDelimitedParse_` → `delimited_parse_mex`, fallback `readRawDelimited_`) | Front door for raw sensor data into the Tag pipeline; MEX is ~10–40× the textscan fallback. Slow parse = slow load for big logs. | ✅ `bench_delimited_parse.m` (isolated, row-scaling gate) — *added 2026-06-24*. |
| **Pyramid build** (`FastSense.buildPyramidLevel`) | Multi-level pre-downsample cache built at render for large lines (powers O(1) zoom). Full-N at render. | ◐ Partial — it is essentially `minmax_downsample` per level (already gated by `bench_downsample_kernels.m`) + chunked disk reads; only *memory*-benchmarked end-to-end (`benchmark_memory.m`). Low marginal value to isolate; private method. |
| **`to_step_function_mex`** | SIMD step-function conversion — a compiled, deployed, correctness-tested kernel (`TestToStepFunctionMex`). | ⏸️ **DEFERRED** — no confirmed production caller. `MonitorTag.recompute_` emits a binary vector (no step conversion); `StateTag.getXY` is pass-through; only the test suite calls it. The `dispatchDelimitedParse_` comment citing it is stale. **Investigate whether it's still wired into any render path (or is vestigial) before benchmarking.** |
| **Tag layer** (SensorTag/MonitorTag/CompositeTag getXY, resolve, append) | Live-tick recompute path. | ✅ `bench_sensortag_getxy`, `bench_monitortag_tick`, `bench_monitortag_append`, `bench_compositetag_merge`, `bench_consumer_migration_tick`, `bench_tag_pipeline_1k`. |
| **Dashboard refresh / load** | Live dashboard refresh rate. | ✅ `bench_dashboard`, `bench_dashboard_live`, `bench_dashboard_load`. |
| **Full render vs plot(), zoom/pan, memory, features** | End-to-end render comparison. | ✅ `benchmark.m`, `benchmark_zoom.m`, `benchmark_memory.m`, `benchmark_features.m`. |

## Change log

### 2026-06-24 — `bench_downsample_kernels.m`
- **Gap closed:** isolated downsampling-kernel microbenchmark. Previously the
only coverage was a single `minmax_downsample(x,y,1000)` call buried inside
the render-heavy `benchmark.m`; **LTTB had zero coverage anywhere**.
- **What it does:** times `minmax_downsample` and `lttb_downsample` as pure
computation (no figure/render) across a 10K→10M size sweep, same ~2000-pt
output budget for both, reporting per-call ms + throughput (Mpts/s).
- **Gate:** machine-independent — fits the empirical log-log scaling exponent
over the large-N portion and asserts it stays ≤ 1.3 (catches super-linear
creep regardless of host speed).
- **Reaches the private wrappers** by `cd`-ing into `libs/FastSense/private`
(current folder is always searched, even when named `private`) — works in
both MATLAB and Octave, unlike the `addpath(.../private)` trick that
`benchmark.m` uses (Octave-only; MATLAB rejects private dirs on the path).
- **First run (MATLAB R2025b, MEX active):** MinMax ~764 Mpts/s @ 10M,
LTTB ~349 Mpts/s @ 10M; scaling exponents 0.88 / 0.87 → PASS.
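
The gate comes down to fitting a straight line in log-log space and checking its slope. Below is a minimal sketch that uses synthetic timings, not measured data, and relies only on base MATLAB/Octave `polyfit`. The sizes, the `c * N.^k` timing models and all variable names are illustrative assumptions. The shipped benches wrap the same `polyfit` idea in a local helper (`scalingExponent_`).

```matlab
% Illustrative large-N sweep and the 1.3 ceiling used by the kernel gates.
ns   = [1e5, 1e6, 1e7];
gate = 1.3;

% Synthetic per-call times (seconds): a linear kernel and a quadratic regression.
tLinear    = 2e-9  * ns;          % t ~ N
tQuadratic = 2e-15 * ns .^ 2;     % t ~ N^2 (e.g. an accidental realloc loop)
tSlowHost  = 10 * tLinear;        % same kernel on a 10x slower machine

% Empirical scaling exponent = slope of log10(t) vs log10(N); eps guards log10(0).
pLin  = polyfit(log10(ns), log10(max(tLinear, eps)), 1);
pQuad = polyfit(log10(ns), log10(max(tQuadratic, eps)), 1);
pSlow = polyfit(log10(ns), log10(max(tSlowHost, eps)), 1);
expLinear    = pLin(1);
expQuadratic = pQuad(1);
expSlowHost  = pSlow(1);

passLinear    = expLinear    <= gate;   % linear work passes
passQuadratic = expQuadratic <= gate;   % super-linear creep trips the gate
fprintf('linear %.2f (pass=%d), quadratic %.2f (pass=%d), slow host %.2f\n', ...
    expLinear, passLinear, expQuadratic, passQuadratic, expSlowHost);
```

Scaling every time by a constant (the slow-host case) shifts only the intercept of the fit and leaves the slope alone. That is why the slope can be gated without a per-machine baseline.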

### 2026-06-24 — `bench_binary_search.m`
- **Gap closed:** isolated range-lookup microbenchmark. `binary_search` is the
most broadly-used uncovered kernel — the resolve/zoom window lookup in
`FastSense.m` (4060/4103/4178/4460), timestamp lookup (1683), toolbar
click/range, and tag range resolve (`SensorTag.m:152`), on raw full-N sorted
arrays, every zoom/pan. Re-prioritised **above** the violation marker path
this run: `violation_cull` runs on already-downsampled display data
(small-N, per-frame), whereas `binary_search` hits the raw full-N array.
- **What it does:** times 20k scalar `'left'`/`'right'` lookups across a
10K→50M sweep, reporting per-query µs + Mqueries/s.
- **Gate:** machine-independent — fits the per-query log-log exponent over the
large-N portion and asserts it stays ≤ 0.6, catching the catastrophic
O(log N)→O(N) (linear-scan) regression regardless of host speed.
- **MEX detection caveat (baked into the bench):** `binary_search_mex` lives in
`libs/FastSense/private` and is visible to `binary_search.m` (its parent) but
NOT from `benchmarks/`. A plain `exist('binary_search_mex','file')` in the
bench misreports as fallback; the bench instead checks the built binary for
the current platform on disk (`['binary_search_mex.' mexext]`).
- **First run (MATLAB R2025b, MEX active):** ~0.95 µs/query @ 10K → ~1.8 µs @ 50M;
exponent 0.09 (firmly logarithmic), growth 1.9× over the sweep → PASS.
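
The 0.6 ceiling leaves a wide margin between the two regimes. The back-of-envelope sketch below models per-query cost as a comparison count over the gate's fit range (N ≥ 1e5) and fits the same log-log slope. Counting comparisons is an idealisation: it ignores the cache-miss growth described in the bench header, so it will not reproduce the measured 0.09 exactly. All names are illustrative.

```matlab
% Fit range used by the bench gate (N >= 1e5) and its exponent ceiling.
ns   = [1e5, 1e6, 1e7, 5e7];
gate = 0.6;

% Idealised per-query cost models (comparison counts, not seconds).
costBisect = ceil(log2(ns));   % binary search: ~log2(N) comparisons
costScan   = ns / 2;           % linear scan: ~N/2 comparisons on average

pB = polyfit(log10(ns), log10(costBisect), 1);
pS = polyfit(log10(ns), log10(costScan), 1);
expBisect = pB(1);             % small, far below the gate
expScan   = pS(1);             % 1.0, far above the gate
fprintf('bisect exponent %.3f, scan exponent %.3f, gate %.1f\n', ...
    expBisect, expScan, gate);
```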

### 2026-06-24 — `bench_violation_cull.m`
- **Gap closed:** isolated threshold-marker microbenchmark. `violation_cull` is
the fused detect+cull kernel called per (threshold × line) on every
render/zoom (`FastSense.m:1368/1371`, `4468/4471`); before this, only
`bench_event_marker_regression.m` touched a neighbouring path.
- **What it does:** times both threshold branches as pure computation (a
constant threshold using the thX=0 sentinel, and a 5-knot step-function
threshold) across a 1K→1M input sweep, reporting per-call ms + throughput.
The bench notes that production input is the displayed (downsampled) data,
typically a few thousand points at the low end of the sweep; the larger sizes
are there only to verify linear scaling.
- **Gate:** machine-independent — log-log scaling exponent over N >= 1e4 must
stay <= 1.3 (catches super-linear creep in detect+cull).
- **Reaches the private wrapper** via the `cd`-into-`libs/FastSense/private`
trick (see [[benchmarking-private-mex-kernels]]).
- **First run (MATLAB R2025b, MEX active):** constant ~288 Mpts/s @ 1M, step
~261 Mpts/s @ 1M; at the realistic ~1K size both are sub-10 µs. Scaling
exponents 0.93 / 0.92 → PASS.
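
The sketch below shows in plain MATLAB what "detect + cull" could mean for the two branches. It is a hypothetical illustration, not the `violation_cull` implementation, and it rests on three assumptions: a violation means y is above the threshold, a step threshold holds each knot's value until the next knot, and culling keeps one marker per display-pixel column. The real kernel selects its constant branch through the thX=0 sentinel; here the constant case is just a scalar.

```matlab
% Hypothetical displayed (downsampled) data and a 5-knot step threshold.
x   = linspace(0, 10, 1000);
y   = 3 * sin(x);
thX = [0, 2, 4, 6, 8];           % knot positions (ascending)
thY = [1, 2, 0.5, 2.5, 1];       % value held from each knot onward (assumed)

% Step evaluation: index of the last knot at or before each x
% (clamped to the first knot for any x before thX(1)).
knotIdx = max(sum(bsxfun(@ge, x, thX(:)), 1), 1);
thrAtX  = thY(knotIdx);

% Detect: upper-threshold violations for both branches.
violStep  = y > thrAtX;
violConst = y > 1;               % constant threshold branch

% Cull (assumed semantics): keep the first violation in each pixel column.
nPx = 200;
col = min(floor((x - x(1)) / (x(end) - x(1)) * nPx) + 1, nPx);
idxStep   = find(violStep);
[~, keep] = unique(col(violStep), 'first');
markers   = idxStep(keep);       % indices into x/y of the surviving markers
fprintf('%d step violations culled to %d markers\n', numel(idxStep), numel(markers));
```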

### 2026-06-24 — `bench_datastore_range.m`
- **Gap closed:** focused, deterministic gate for the disk-backed range-query
path (`FastSenseDataStore.getRange`), which every zoom/pan on a disk-backed
line hits. Previously only exploratory scripts existed (`benchmark_datastore.m`
is a .mat-vs-SQLite sweep that is Linux-only because it shells out to `free`;
`profile_datastore.m` is a profiler script). The new bench needs no figure.
- **What it does:** builds a chunked store at each size, fires fixed-size view
windows (width scaled so each query returns ~10k pts regardless of N), times
`getRange`, and reports create time + per-query ms + queries/s.
- **Gate:** machine-independent — the indexed store must read only the window,
so per-query time must stay ~constant as the dataset grows; asserts the
query-time-vs-total-N exponent <= 0.5 (a full-scan regression → ~1.0).
- **Robustness:** warms up a throwaway store first (absorbs one-time SQLite/MEX
init), and always `cleanup()`s each store (try/catch + post-loop) so temp DBs
never leak even if the gate trips.
- **First run (MATLAB R2025b, mksqlite active):** query time flat at ~0.16 ms
across 100K→5M (50× more data), exponent −0.11, exactly 10002 pts/query → PASS.
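
The window-sizing rule that keeps the returned point count constant can be sketched without a store. This hypothetical example swaps in an in-memory `linspace` array for `FastSenseDataStore`, and a `find`-based lookup (which scans) for the store's indexed read. It therefore demonstrates only the sizing rule, not the query cost.

```matlab
% Hypothetical store sizes and the ~10k-point window target from the notes.
sizes  = [1e5, 1e6, 5e6];
target = 1e4;
counts = zeros(size(sizes));

for k = 1:numel(sizes)
    n  = sizes(k);
    x  = linspace(0, 100, n);                 % sorted timestamps
    dx = (x(end) - x(1)) / (n - 1);           % uniform sample spacing
    width = target * dx;                      % window holds ~target samples at any N
    lo = x(1) + 0.25 * (x(end) - x(1));       % fixed window start, a quarter in
    hi = lo + width;
    iLo = find(x >= lo, 1, 'first');          % stand-in for the indexed lookup
    iHi = find(x <= hi, 1, 'last');
    counts(k) = iHi - iLo + 1;
end

fracRead = counts ./ sizes;                   % shrinks as the dataset grows
fprintf('N=%g -> %d pts (%.4f%% of data)\n', [sizes; counts; 100 * fracRead]);
```

Because the returned count stays flat while N grows, an indexed read should take roughly constant time, which is what the ≤ 0.5 exponent gate checks.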

### Pivot note this run
Intended target was `to_step_function_mex`, but a fresh survey found it has **no
confirmed production caller** (see table) — benchmarking it would violate the
"path that matters" rule. Deferred it (flagged for investigation) and pivoted to
the disk range-query gate instead.

### 2026-06-24 — `bench_delimited_parse.m`
- **Gap closed:** isolated CSV-ingestion microbenchmark. `delimited_parse_mex`
(via `dispatchDelimitedParse_`) is the parse front door for the Tag pipeline,
documented at ~10–40× the textscan fallback, with zero coverage
(BatchTagPipeline / delimited ingestion was entirely unbenchmarked).
- **What it does:** generates deterministic 4-column CSVs of growing row count,
times `dispatchDelimitedParse_` (file generation excluded), reports parse ms +
rows/s + MB/s. Always deletes its temp files (per-iter + onCleanup backstop).
- **Gate:** machine-independent — log-log row-scaling exponent over rows ≥ 1e4
must stay ≤ 1.3 (catches super-linear parse creep, e.g. O(rows²) realloc).
- **Reaches the private wrapper** via `cd`-into-`libs/SensorThreshold/private`
(see [[benchmarking-private-mex-kernels]]).
- **First run (MATLAB R2025b, MEX active):** ~5.7 M rows/s (~205 MB/s) at 100K–500K
rows; exponent 0.98 (essentially linear) → PASS.
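
The generate, parse and clean-up pattern can be sketched with built-ins only. In this hypothetical example `textscan` stands in for `dispatchDelimitedParse_` (a private wrapper whose signature is not reproduced here), and the row count is kept small. The file is written before the timer starts, so only the parse is measured.

```matlab
% Deterministic 4-column payload (no RNG needed).
nRows = 20000;
t     = (0:nRows-1).' * 1e-3;
data  = [t, sin(t), cos(t), mod((0:nRows-1).', 7)];

% Write the CSV (not timed) and register deletion of the temp file.
csvPath = [tempname() '.csv'];
fid = fopen(csvPath, 'w');
assert(fid > 0, 'could not open temp file');
fprintf(fid, '%.6f,%.6f,%.6f,%d\n', data.');   % data.' so rows print in order
fclose(fid);
cleaner = onCleanup(@() delete(csvPath));

% Time only the parse.
fid  = fopen(csvPath, 'r');
t0   = tic;
cols = textscan(fid, '%f%f%f%f', 'Delimiter', ',');
parseSec = max(toc(t0), eps);
fclose(fid);

info       = dir(csvPath);
parsedRows = numel(cols{1});
rowsPerSec = parsedRows / parseSec;
mbPerSec   = info.bytes / 1e6 / parseSec;
fprintf('%d rows in %.2f ms: %.2f M rows/s, %.1f MB/s\n', ...
    parsedRows, 1e3 * parseSec, rowsPerSec / 1e6, mbPerSec);

clear cleaner   % runs the onCleanup callback now, deleting the temp file
```

Inside a function (as in the real bench), the `onCleanup` handle is also destroyed if the function errors, which provides the backstop behaviour. At script level it fires only when the variable is cleared.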

### Pivot notes this run
Two earmarked targets were rejected on fresh survey:
- **Pyramid build** — `buildPyramidLevel` is just `minmax_downsample` per level
(already gated) + chunked reads; private; low marginal value. Downgraded to
◐ Partial in the table, not benchmarked.
- **DerivedTag.recompute_** — thin dispatch around a user-supplied `ComputeFn`
(`[X,Y] = ComputeFn(Parents)`), so a microbench would mostly measure the test
closure, not a FastSense kernel. Deferred unless paired with a built-in
compute/alignment path worth isolating.

### Next gap for the following iteration
Re-survey first, but the leading candidates are higher-level paths, now that
the core MEX kernels are covered:
- **EventStore persistence scaling** — `EventStore.save` (atomic temp-rename
write) / `load` as event count grows; relevant for long-running live
dashboards. Confirm it isn't already covered by `bench_event_marker_regression`
/ `bench_dashboard_*` (those attach stores but may not stress save/load at scale).
- **LiveEventPipeline per-tick processing** (`processMonitorTag_`) on the live
refresh path — confirm it isn't already covered by `bench_monitortag_tick`.
- Still open: the `to_step_function_mex` wiring question (filed as a background task).

File: `benchmarks/bench_binary_search.m` (new file, 152 additions)
function bench_binary_search()
%BENCH_BINARY_SEARCH Isolated microbenchmark of the range-lookup hot path.
%
% binary_search is the gateway to every range query in FastSense. On each
% zoom/pan and render it locates the visible index window in a raw, sorted,
% full-length X array — FastSense.m (resolve/zoom window, timestamp lookup),
% FastSenseToolbar.m (click-to-point, range select) and SensorTag.m (tag
% range resolve) all call it, against arrays up to tens of millions of
% points. It is MEX-accelerated (binary_search_mex) with a pure-MATLAB
% fallback, yet had no benchmark anywhere before this one.
%
% The cost of any single call is tiny (O(log N) comparisons), so absolute
% throughput is not the point. The point is the GATE: binary search must
% stay logarithmic. If the MEX silently stops loading, or a change turns
% the search into a linear scan, large-data zoom/pan responsiveness
% collapses — and nothing else in the suite would catch it. This benchmark
% times many scalar lookups (both 'left' and 'right') across a wide size
% sweep and asserts the per-query time scales sub-linearly with N.
%
% Per-query time grows only weakly with N (a mix of ~log2(N) comparisons
% and cache-miss penalty as the array spills out of cache), so the
% empirical log-log exponent stays well below the linear-scan exponent of
% ~1.0. The gate (exponent <= 0.6) cleanly separates the two regimes and
% is machine-independent.
%
% A warmup call absorbs first-call/JIT overhead; each measurement loops over
% a fixed query batch so per-call dispatch cost stays representative of
% production (binary_search is always called with a scalar target); taking
% the median of nRuns damps outliers.
%
% Run:
% octave --no-gui --eval "install(); bench_binary_search();"
%
% Prints "PASS: ..." and exits 0 on success; fails an assert() (non-zero
% exit) if either direction's per-query scaling exponent exceeds the gate.
%
% See also binary_search, binary_search_mex, bench_downsample_kernels.

here = fileparts(mfilename('fullpath'));
addpath(fullfile(here, '..'));
install();
% binary_search lives in libs/FastSense/ (not a private/ folder), so
% install() puts it on the path and it is directly callable here.

sizes = [1e4, 1e5, 1e6, 1e7, 5e7];
labels = {'10K', '100K', '1M', '10M', '50M'};

nQueries = 20000; % scalar lookups timed per (size, direction, run)
nRuns = 5; % median of nRuns

% Deterministic seed — works in both MATLAB and Octave
if exist('rng', 'file') == 2
    rng(0);
else
    rand('state', 0); %#ok<RAND>
end

% binary_search_mex lives in libs/FastSense/private. It is visible to
% binary_search.m (its parent folder) and is what the wrapper actually
% dispatches to — but it is NOT visible from this benchmark's context,
% so a plain exist('binary_search_mex','file') here would misreport as a
% fallback. Detect the built binary for THIS platform on disk instead.
mexPath = fullfile(here, '..', 'libs', 'FastSense', 'private', ...
    ['binary_search_mex.' mexext]);
useMex = (exist(mexPath, 'file') ~= 0);

nSizes = numel(sizes);
tLeft = zeros(1, nSizes); % per-query seconds, 'left'
tRight = zeros(1, nSizes); % per-query seconds, 'right'

fprintf('\n=== binary_search range-lookup microbenchmark ===\n');
fprintf(' binary_search_mex: %s\n', tf_(useMex));
fprintf(' %d scalar lookups per measurement, median of %d runs\n', nQueries, nRuns);
fprintf(' %s\n', repmat('-', 1, 74));
fprintf(' %-6s | %-14s %-12s | %-14s %-12s\n', ...
    'N', 'left (us/q)', 'left Mq/s', 'right (us/q)', 'right Mq/s');
fprintf(' %s\n', repmat('-', 1, 74));

for c = 1:nSizes
    n = sizes(c);
    x = linspace(0, 100, n);          % sorted ascending (binary_search contract)
    vals = 100 * rand(1, nQueries);   % query targets within range (not timed)

    tLeft(c)  = timeSearch_(x, vals, 'left',  nRuns);
    tRight(c) = timeSearch_(x, vals, 'right', nRuns);

    fprintf(' %-6s | %12.4f %10.2f | %12.4f %10.2f\n', ...
        labels{c}, ...
        tLeft(c) * 1e6, 1 / tLeft(c) / 1e6, ...
        tRight(c) * 1e6, 1 / tRight(c) / 1e6);

    clear x vals;
end
fprintf(' %s\n', repmat('-', 1, 74));

% ---- Scaling gate: per-query time must stay sub-linear in N ----
% Fit over N >= 1e5 (small N is dominated by fixed call/dispatch overhead
% and would flatten the slope). O(log N) + cache effects keep the exponent
% well under 1.0; a linear-scan regression drives it toward 1.0.
fitMask = sizes >= 1e5;
slopeLeft = scalingExponent_(sizes(fitMask), tLeft(fitMask));
slopeRight = scalingExponent_(sizes(fitMask), tRight(fitMask));
growthLeft = tLeft(end) / max(tLeft(1), eps);

gate = 0.6;
fprintf(' Per-query scaling exponent (large-N fit, linear-scan ~1.0):\n');
fprintf(' left : %.2f (gate: <= %.1f)\n', slopeLeft, gate);
fprintf(' right : %.2f (gate: <= %.1f)\n', slopeRight, gate);
fprintf(' per-query growth 10K->50M (left): %.1fx\n', growthLeft);
fprintf(' %s\n', repmat('-', 1, 74));

assert(slopeLeft <= gate, ...
    sprintf(['FAIL: binary_search ''left'' per-query exponent %.2f exceeds %.1f — ' ...
        'search is no longer logarithmic (linear-scan regression?).'], slopeLeft, gate));
assert(slopeRight <= gate, ...
    sprintf(['FAIL: binary_search ''right'' per-query exponent %.2f exceeds %.1f — ' ...
        'search is no longer logarithmic (linear-scan regression?).'], slopeRight, gate));
fprintf(' PASS: lookups stay sub-linear (gate: exponent <= %.1f).\n\n', gate);
end

function t = timeSearch_(x, vals, side, nRuns)
%TIMESEARCH_ Median-of-nRuns per-query time of binary_search over a batch.
%   Warms up first, then times numel(vals) back-to-back scalar lookups per
%   run and returns the median run time divided by the number of lookups.
%   SIDE is 'left' or 'right' (named to avoid shadowing the built-in dir).
nq = numel(vals);
binary_search(x, vals(1), side); %#ok<*NASGU> % warmup
binary_search(x, vals(end), side);
runTimes = zeros(1, nRuns);
for r = 1:nRuns
    t0 = tic;
    for q = 1:nq
        binary_search(x, vals(q), side);
    end
    runTimes(r) = toc(t0);
end
t = median(runTimes) / nq;
end

function slope = scalingExponent_(ns, times)
%SCALINGEXPONENT_ Log-log slope of per-query time vs N (the growth exponent).
% slope -> 0 indicates flat/logarithmic scaling; -> 1 indicates linear.
times = max(times, eps);
p = polyfit(log10(ns(:)), log10(times(:)), 1);
slope = p(1);
end

function s = tf_(b)
%TF_ Human-readable label for whether the built MEX binary was found.
if b
    s = 'active';
else
    s = 'fallback (pure MATLAB)';
end
end