From 95e203db83b8700a63c7825aa8e9b0c12b57534b Mon Sep 17 00:00:00 2001
From: Hannes Suhr
Date: Wed, 24 Jun 2026 18:40:43 +0200
Subject: [PATCH] test(benchmarks): add isolated microbenchmarks for core hot-path kernels

Grows benchmark coverage over the performance-critical surface with five
focused, deterministic bench_*.m files, each isolating one hot-path kernel as
pure computation (no figure/render) and guarding it with a machine-independent
scaling gate:

- bench_downsample_kernels.m  MinMax + LTTB downsampling (minmax_core_mex /
  lttb_core_mex). LTTB had zero coverage anywhere before this.
- bench_binary_search.m  range-window lookup (binary_search_mex); log-scaling
  gate catches an O(log N) -> O(N) regression.
- bench_violation_cull.m  fused threshold-marker detect+cull
  (violation_cull_mex), constant + step-function branches.
- bench_datastore_range.m  disk-backed range query
  (FastSenseDataStore.getRange); gate asserts fixed-window query time stays
  ~constant as the dataset grows (catches a full-scan regression).
- bench_delimited_parse.m  CSV ingestion (delimited_parse_mex); row-scaling
  gate. Tag-pipeline ingestion was previously unbenchmarked.

Each follows the existing bench_*.m house style and reaches private wrappers by
cd-ing into the owning private/ folder (works in both MATLAB and Octave, unlike
addpath of a private dir). benchmarks/.reports/coverage.md records the ranked
surface, what each run added, and the remaining gaps so coverage keeps
expanding toward what matters.

No library/production code changed; new files only.

Co-Authored-By: Claude Opus 4.8 --- benchmarks/.reports/coverage.md | 137 +++++++++++++++++++ benchmarks/bench_binary_search.m | 152 +++++++++++++++++++++ benchmarks/bench_datastore_range.m | 158 ++++++++++++++++++++++ benchmarks/bench_delimited_parse.m | 182 ++++++++++++++++++++++++++ benchmarks/bench_downsample_kernels.m | 163 +++++++++++++++++++++++ benchmarks/bench_violation_cull.m | 174 ++++++++++++++++++++++++ 6 files changed, 966 insertions(+) create mode 100644 benchmarks/.reports/coverage.md create mode 100644 benchmarks/bench_binary_search.m create mode 100644 benchmarks/bench_datastore_range.m create mode 100644 benchmarks/bench_delimited_parse.m create mode 100644 benchmarks/bench_downsample_kernels.m create mode 100644 benchmarks/bench_violation_cull.m diff --git a/benchmarks/.reports/coverage.md b/benchmarks/.reports/coverage.md new file mode 100644 index 00000000..b1198353 --- /dev/null +++ b/benchmarks/.reports/coverage.md @@ -0,0 +1,137 @@ +# Benchmark coverage notes + +Tracks what `/bench-evolve` has added and which performance-critical paths +still lack isolated benchmark coverage. Newest entries first. + +## Performance-critical surface (ranked) and coverage status + +| Path | Why it matters | Coverage | +|------|----------------|----------| +| **Downsampling kernels** (`minmax_downsample` / `lttb_downsample` → `minmax_core_mex` / `lttb_core_mex`) | Runs on every render + every zoom/pan, over the full dataset (≤50M pts). The library's core value. | ✅ `bench_downsample_kernels.m` (isolated, both methods) — *added 2026-06-24*. Also exercised indirectly in `benchmark.m` / `benchmark_zoom.m` / `benchmark_features.m` (render-mixed). | +| **`binary_search`** (`binary_search_mex`) | Range-window lookup on raw full-N sorted arrays; on the resolve path for every zoom/pan + every tag range query (`FastSense.m`, `FastSenseToolbar.m`, `SensorTag.m`). | ✅ `bench_binary_search.m` (isolated, log-scaling gate) — *added 2026-06-24*. 
| +| **Violation marker path** (`violation_cull` → `violation_cull_mex`; constant + step-function branches) | Fused detect+cull on every threshold render/zoom for thresholds with `ShowViolations` (incl. time-varying step thresholds). | ✅ `bench_violation_cull.m` (isolated, both branches, linear-scaling gate) — *added 2026-06-24*. | +| **Disk range-query** (`FastSenseDataStore.getRange`, `resolve_disk_mex`) | Out-of-core read on every zoom/pan of a disk-backed line. The large-data story's hot read path. | ✅ `bench_datastore_range.m` (fixed-window query, indexed-read gate) — *added 2026-06-24*. Store create/slice still only exploratory (`benchmark_datastore.m` / `profile_datastore.m`). | +| **CSV ingestion** (`dispatchDelimitedParse_` → `delimited_parse_mex`, fallback `readRawDelimited_`) | Front door for raw sensor data into the Tag pipeline; MEX is ~10–40× the textscan fallback. Slow parse = slow load for big logs. | ✅ `bench_delimited_parse.m` (isolated, row-scaling gate) — *added 2026-06-24*. | +| **Pyramid build** (`FastSense.buildPyramidLevel`) | Multi-level pre-downsample cache built at render for large lines (powers O(1) zoom). Full-N at render. | ◐ Partial — it is essentially `minmax_downsample` per level (already gated by `bench_downsample_kernels.m`) + chunked disk reads; only *memory*-benchmarked end-to-end (`benchmark_memory.m`). Low marginal value to isolate; private method. | +| **`to_step_function_mex`** | SIMD step-function conversion — a compiled, deployed, correctness-tested kernel (`TestToStepFunctionMex`). | ⏸️ **DEFERRED** — no confirmed production caller. `MonitorTag.recompute_` emits a binary vector (no step conversion); `StateTag.getXY` is pass-through; only the test suite calls it. The `dispatchDelimitedParse_` comment citing it is stale. **Investigate whether it's still wired into any render path (or is vestigial) before benchmarking.** | +| **Tag layer** (SensorTag/MonitorTag/CompositeTag getXY, resolve, append) | Live-tick recompute path. 
| ✅ `bench_sensortag_getxy`, `bench_monitortag_tick`, `bench_monitortag_append`, `bench_compositetag_merge`, `bench_consumer_migration_tick`, `bench_tag_pipeline_1k`. | +| **Dashboard refresh / load** | Live dashboard refresh rate. | ✅ `bench_dashboard`, `bench_dashboard_live`, `bench_dashboard_load`. | +| **Full render vs plot(), zoom/pan, memory, features** | End-to-end render comparison. | ✅ `benchmark.m`, `benchmark_zoom.m`, `benchmark_memory.m`, `benchmark_features.m`. | + +## Change log + +### 2026-06-24 — `bench_downsample_kernels.m` +- **Gap closed:** isolated downsampling-kernel microbenchmark. Previously the + only coverage was a single `minmax_downsample(x,y,1000)` call buried inside + the render-heavy `benchmark.m`; **LTTB had zero coverage anywhere**. +- **What it does:** times `minmax_downsample` and `lttb_downsample` as pure + computation (no figure/render) across a 10K→10M size sweep, same ~2000-pt + output budget for both, reporting per-call ms + throughput (Mpts/s). +- **Gate:** machine-independent — fits the empirical log-log scaling exponent + over the large-N portion and asserts it stays ≤ 1.3 (catches super-linear + creep regardless of host speed). +- **Reaches the private wrappers** by `cd`-ing into `libs/FastSense/private` + (current folder is always searched, even when named `private`) — works in + both MATLAB and Octave, unlike the `addpath(.../private)` trick that + `benchmark.m` uses (Octave-only; MATLAB rejects private dirs on the path). +- **First run (MATLAB R2025b, MEX active):** MinMax ~764 Mpts/s @ 10M, + LTTB ~349 Mpts/s @ 10M; scaling exponents 0.88 / 0.87 → PASS. + +### 2026-06-24 — `bench_binary_search.m` +- **Gap closed:** isolated range-lookup microbenchmark. 
`binary_search` is the + most broadly-used uncovered kernel — the resolve/zoom window lookup in + `FastSense.m` (4060/4103/4178/4460), timestamp lookup (1683), toolbar + click/range, and tag range resolve (`SensorTag.m:152`), on raw full-N sorted + arrays, every zoom/pan. Re-prioritised **above** the violation marker path + this run: `violation_cull` runs on already-downsampled display data + (small-N, per-frame), whereas `binary_search` hits the raw full-N array. +- **What it does:** times 20k scalar `'left'`/`'right'` lookups across a + 10K→50M sweep, reporting per-query µs + Mqueries/s. +- **Gate:** machine-independent — fits the per-query log-log exponent over the + large-N portion and asserts it stays ≤ 0.6, catching the catastrophic + O(log N)→O(N) (linear-scan) regression regardless of host speed. +- **MEX detection caveat (baked into the bench):** `binary_search_mex` lives in + `libs/FastSense/private` and is visible to `binary_search.m` (its parent) but + NOT from `benchmarks/`. A plain `exist('binary_search_mex','file')` in the + bench misreports as fallback; the bench instead checks the built binary for + the current platform on disk (`['binary_search_mex.' mexext]`). +- **First run (MATLAB R2025b, MEX active):** ~0.95 µs/query @ 10K → ~1.8 µs @ 50M; + exponent 0.09 (firmly logarithmic), growth 1.9× over the sweep → PASS. + +### 2026-06-24 — `bench_violation_cull.m` +- **Gap closed:** isolated threshold-marker microbenchmark. `violation_cull` is + the fused detect+cull kernel called per (threshold x line) on every + render/zoom (`FastSense.m:1368/1371`, `4468/4471`); only + `bench_event_marker_regression.m` touched a neighbouring path before. +- **What it does:** times both threshold branches as pure computation — a + constant threshold (thX=0 sentinel) and a 5-knot step-function threshold — + across a 1K→1M input sweep, reporting per-call ms + throughput. 
Annotated + that production input is the displayed/downsampled data (~few thousand pts, + the low end); upper sizes verify linear scaling. +- **Gate:** machine-independent — log-log scaling exponent over N >= 1e4 must + stay <= 1.3 (catches super-linear creep in detect+cull). +- **Reaches the private wrapper** via the `cd`-into-`libs/FastSense/private` + trick (see [[benchmarking-private-mex-kernels]]). +- **First run (MATLAB R2025b, MEX active):** constant ~288 Mpts/s @ 1M, step + ~261 Mpts/s @ 1M; at the realistic ~1K size both are sub-10 µs. Scaling + exponents 0.93 / 0.92 → PASS. + +### 2026-06-24 — `bench_datastore_range.m` +- **Gap closed:** focused, deterministic gate for the disk-backed range-query + path (`FastSenseDataStore.getRange`), which every zoom/pan on a disk-backed + line hits. Previously only exploratory scripts existed (`benchmark_datastore.m` + is a .mat-vs-SQLite sweep and Linux-only — shells out to `free`; + `profile_datastore.m` is a profiler script). No figure needed. +- **What it does:** builds a chunked store at each size, fires fixed-size view + windows (width scaled so each query returns ~10k pts regardless of N), times + `getRange`, and reports create time + per-query ms + queries/s. +- **Gate:** machine-independent — the indexed store must read only the window, + so per-query time must stay ~constant as the dataset grows; asserts the + query-time-vs-total-N exponent <= 0.5 (a full-scan regression → ~1.0). +- **Robustness:** warms up a throwaway store first (absorbs one-time SQLite/MEX + init), and always `cleanup()`s each store (try/catch + post-loop) so temp DBs + never leak even if the gate trips. +- **First run (MATLAB R2025b, mksqlite active):** query time flat at ~0.16 ms + across 100K→5M (50× more data), exponent −0.11, exactly 10002 pts/query → PASS. 
+ +### Pivot note this run +Intended target was `to_step_function_mex`, but a fresh survey found it has **no +confirmed production caller** (see table) — benchmarking it would violate the +"path that matters" rule. Deferred it (flagged for investigation) and pivoted to +the disk range-query gate instead. + +### 2026-06-24 — `bench_delimited_parse.m` +- **Gap closed:** isolated CSV-ingestion microbenchmark. `delimited_parse_mex` + (via `dispatchDelimitedParse_`) is the parse front door for the Tag pipeline, + documented at ~10–40× the textscan fallback, with zero coverage + (BatchTagPipeline / delimited ingestion was entirely unbenchmarked). +- **What it does:** generates deterministic 4-column CSVs of growing row count, + times `dispatchDelimitedParse_` (file generation excluded), reports parse ms + + rows/s + MB/s. Always deletes its temp files (per-iter + onCleanup backstop). +- **Gate:** machine-independent — log-log row-scaling exponent over rows ≥ 1e4 + must stay ≤ 1.3 (catches super-linear parse creep, e.g. O(rows²) realloc). +- **Reaches the private wrapper** via `cd`-into-`libs/SensorThreshold/private` + (see [[benchmarking-private-mex-kernels]]). +- **First run (MATLAB R2025b, MEX active):** ~5.7 M rows/s (~205 MB/s) at 100K–500K + rows; exponent 0.98 (essentially linear) → PASS. + +### Pivot notes this run +Two earmarked targets were rejected on fresh survey: +- **Pyramid build** — `buildPyramidLevel` is just `minmax_downsample` per level + (already gated) + chunked reads; private; low marginal value. Downgraded to + ◐ Partial in the table, not benchmarked. +- **DerivedTag.recompute_** — thin dispatch around a user-supplied `ComputeFn` + (`[X,Y] = ComputeFn(Parents)`), so a microbench would mostly measure the test + closure, not a FastSense kernel. Deferred unless paired with a built-in + compute/alignment path worth isolating. 
+ +### Next gap for the following iteration +Survey fresh, but leading candidates (higher-level paths now that the core MEX +kernels are covered): +- **EventStore persistence scaling** — `EventStore.save` (atomic temp-rename + write) / `load` as event count grows; relevant for long-running live + dashboards. Confirm it isn't already covered by `bench_event_marker_regression` + / `bench_dashboard_*` (those attach stores but may not stress save/load at scale). +- **LiveEventPipeline per-tick processing** (`processMonitorTag_`) on the live + refresh path — confirm it isn't already covered by `bench_monitortag_tick`. +- Still open: the `to_step_function_mex` wiring question (filed as a background task). diff --git a/benchmarks/bench_binary_search.m b/benchmarks/bench_binary_search.m new file mode 100644 index 00000000..a6921229 --- /dev/null +++ b/benchmarks/bench_binary_search.m @@ -0,0 +1,152 @@ +function bench_binary_search() +%BENCH_BINARY_SEARCH Isolated microbenchmark of the range-lookup hot path. +% +% binary_search is the gateway to every range query in FastSense. On each +% zoom/pan and render it locates the visible index window in a raw, sorted, +% full-length X array — FastSense.m (resolve/zoom window, timestamp lookup), +% FastSenseToolbar.m (click-to-point, range select) and SensorTag.m (tag +% range resolve) all call it, against arrays up to tens of millions of +% points. It is MEX-accelerated (binary_search_mex) with a pure-MATLAB +% fallback, yet has no benchmark anywhere. +% +% The cost of any single call is tiny (O(log N) comparisons), so absolute +% throughput is not the point. The point is the GATE: binary search must +% stay logarithmic. If the MEX silently stops loading, or a change turns +% the search into a linear scan, large-data zoom/pan responsiveness +% collapses — and nothing else in the suite would catch it. 
This benchmark +% times many scalar lookups (both 'left' and 'right') across a wide size +% sweep and asserts the per-query time scales sub-linearly with N. +% +% Per-query time grows only weakly with N (a mix of ~log2(N) comparisons +% and cache-miss penalty as the array spills out of cache), so the +% empirical log-log exponent stays well below the linear-scan exponent of +% ~1.0. The gate (exponent <= 0.6) cleanly separates the two regimes and +% is machine-independent. +% +% Warmup dissolves first-call/JIT overhead; each measurement loops over a +% fixed query batch so per-call dispatch stays representative of production +% (binary_search is always called scalar); median of nRuns defuses spikes. +% +% Run: +% octave --no-gui --eval "install(); bench_binary_search();" +% +% Exits 0 with "PASS: ..." on success; raises assert() (non-zero exit) if +% either direction's per-query scaling exponent exceeds the gate. +% +% See also binary_search, binary_search_mex, bench_downsample_kernels. + + here = fileparts(mfilename('fullpath')); + addpath(fullfile(here, '..')); + install(); + % binary_search lives in libs/FastSense/ (not a private/ folder), so + % install() puts it on the path and it is directly callable here. + + sizes = [1e4, 1e5, 1e6, 1e7, 5e7]; + labels = {'10K', '100K', '1M', '10M', '50M'}; + + nQueries = 20000; % scalar lookups timed per (size, direction, run) + nRuns = 5; % median of nRuns + + % Deterministic seed — works in both MATLAB and Octave + if exist('rng', 'file') == 2 + rng(0); + else + rand('state', 0); %#ok + end + + % binary_search_mex lives in libs/FastSense/private. It is visible to + % binary_search.m (its parent folder) and is what the wrapper actually + % dispatches to — but it is NOT visible from this benchmark's context, + % so a plain exist('binary_search_mex','file') here would misreport as a + % fallback. Detect the built binary for THIS platform on disk instead. + mexPath = fullfile(here, '..', 'libs', 'FastSense', 'private', ... 
+ ['binary_search_mex.' mexext]); + useMex = (exist(mexPath, 'file') ~= 0); + + nSizes = numel(sizes); + tLeft = zeros(1, nSizes); % per-query seconds, 'left' + tRight = zeros(1, nSizes); % per-query seconds, 'right' + + fprintf('\n=== binary_search range-lookup microbenchmark ===\n'); + fprintf(' binary_search_mex: %s\n', tf_(useMex)); + fprintf(' %d scalar lookups per measurement, median of %d runs\n', nQueries, nRuns); + fprintf(' %s\n', repmat('-', 1, 74)); + fprintf(' %-6s | %-14s %-12s | %-14s %-12s\n', ... + 'N', 'left (us/q)', 'left Mq/s', 'right (us/q)', 'right Mq/s'); + fprintf(' %s\n', repmat('-', 1, 74)); + + for c = 1:nSizes + n = sizes(c); + x = linspace(0, 100, n); % sorted ascending (binary_search contract) + vals = 100 * rand(1, nQueries); % query targets within range (not timed) + + tLeft(c) = timeSearch_(x, vals, 'left', nRuns); + tRight(c) = timeSearch_(x, vals, 'right', nRuns); + + fprintf(' %-6s | %12.4f %10.2f | %12.4f %10.2f\n', ... + labels{c}, ... + tLeft(c) * 1e6, 1 / tLeft(c) / 1e6, ... + tRight(c) * 1e6, 1 / tRight(c) / 1e6); + + clear x vals; + end + fprintf(' %s\n', repmat('-', 1, 74)); + + % ---- Scaling gate: per-query time must stay sub-linear in N ---- + % Fit over N >= 1e5 (small N is dominated by fixed call/dispatch overhead + % and would flatten the slope). O(log N) + cache effects keep the exponent + % well under 1.0; a linear-scan regression drives it toward 1.0. + fitMask = sizes >= 1e5; + slopeLeft = scalingExponent_(sizes(fitMask), tLeft(fitMask)); + slopeRight = scalingExponent_(sizes(fitMask), tRight(fitMask)); + growthLeft = tLeft(end) / max(tLeft(1), eps); + + gate = 0.6; + fprintf(' Per-query scaling exponent (large-N fit, linear-scan ~1.0):\n'); + fprintf(' left : %.2f (gate: <= %.1f)\n', slopeLeft, gate); + fprintf(' right : %.2f (gate: <= %.1f)\n', slopeRight, gate); + fprintf(' per-query growth 10K->50M (left): %.1fx\n', growthLeft); + fprintf(' %s\n', repmat('-', 1, 74)); + + assert(slopeLeft <= gate, ... 
+ sprintf(['FAIL: binary_search ''left'' per-query exponent %.2f exceeds %.1f — ' ... + 'search is no longer logarithmic (linear-scan regression?).'], slopeLeft, gate)); + assert(slopeRight <= gate, ... + sprintf(['FAIL: binary_search ''right'' per-query exponent %.2f exceeds %.1f — ' ... + 'search is no longer logarithmic (linear-scan regression?).'], slopeRight, gate)); + fprintf(' PASS: lookups stay sub-linear (gate: exponent <= %.1f).\n\n', gate); +end + +function t = timeSearch_(x, vals, dir, nRuns) + %TIMESEARCH_ Median-of-nRuns per-query time of binary_search over a batch. + % Warms up first, then times nQueries back-to-back scalar lookups per + % run and returns the median run divided by nQueries. + nq = numel(vals); + binary_search(x, vals(1), dir); %#ok<*NASGU> % warmup + binary_search(x, vals(end), dir); + runTimes = zeros(1, nRuns); + for r = 1:nRuns + t0 = tic; + for q = 1:nq + binary_search(x, vals(q), dir); + end + runTimes(r) = toc(t0); + end + t = median(runTimes) / nq; +end + +function slope = scalingExponent_(ns, times) + %SCALINGEXPONENT_ Log-log slope of per-query time vs N (the growth exponent). + % slope -> 0 indicates flat/logarithmic scaling; -> 1 indicates linear. + times = max(times, eps); + p = polyfit(log10(ns(:)), log10(times(:)), 1); + slope = p(1); +end + +function s = tf_(b) + if b + s = 'active'; + else + s = 'fallback (pure MATLAB)'; + end +end diff --git a/benchmarks/bench_datastore_range.m b/benchmarks/bench_datastore_range.m new file mode 100644 index 00000000..231c0633 --- /dev/null +++ b/benchmarks/bench_datastore_range.m @@ -0,0 +1,158 @@ +function bench_datastore_range() +%BENCH_DATASTORE_RANGE Isolated gate for the disk-backed range-query hot path. +% +% FastSenseDataStore is FastSense's out-of-core backend: datasets too large +% for RAM live in a chunked SQLite store, and every zoom/pan on a +% disk-backed line issues a range query (getRange) to pull just the visible +% window before downsampling. 
This is the large-data story's hot read path +% (resolve_disk_mex + chunked SQLite reads, WAL mode for live use). +% +% Today the store has only EXPLORATORY coverage: benchmark_datastore.m +% (a .mat-vs-SQLite size sweep, and Linux-only — it shells out to `free`) +% and profile_datastore.m (a MATLAB-profiler bottleneck script). Neither is +% a focused, deterministic regression GATE. This benchmark fills that gap. +% +% The key property a chunked, indexed store must preserve: for a FIXED-size +% view window, query latency should stay roughly CONSTANT as the total +% dataset grows — the store seeks to the window (≈ O(log N) index/chunk +% lookup) and reads only the window's points, never the whole dataset. To +% hold the returned point count constant across sizes, the window width is +% scaled inversely with dataset density (each query returns ~targetPts +% points regardless of N). +% +% Gate (machine-independent): the log-log exponent of per-query time vs +% TOTAL dataset size must stay near zero (<= 0.5). A full-scan regression — +% where query cost grows with the whole dataset rather than the window — +% drives the exponent toward 1.0 and trips the gate, regardless of host +% speed or which backend (SQLite vs binary fallback) is active. +% +% Store creation time (inherently O(N) chunked write) is reported for +% context but NOT gated. +% +% Run: +% octave --no-gui --eval "install(); bench_datastore_range();" +% +% Exits 0 with "PASS: ..." on success; raises assert() (non-zero exit) if +% range-query latency scales with total dataset size. +% +% See also FastSenseDataStore, benchmark_datastore, profile_datastore, +% bench_binary_search. 
+ + here = fileparts(mfilename('fullpath')); + addpath(fullfile(here, '..')); + install(); + + sizes = [1e5, 5e5, 1e6, 5e6]; + labels = {'100K', '500K', '1M', '5M'}; + + targetPts = 10000; % each range query returns ~this many points + nQueries = 30; % random view windows per measurement + nRuns = 3; % median of nRuns + + xSpan = 1000; % data spans X in [0, xSpan] + + % Deterministic seed — works in both MATLAB and Octave + if exist('rng', 'file') == 2 + rng(0); + else + rand('state', 0); randn('state', 0); %#ok + end + + hasSqlite = (exist('mksqlite', 'file') == 3); + + % Warmup: absorb one-time SQLite/MEX/file-creation init on a throwaway + % store so the first sized store below isn't penalised (which would bias + % the scaling fit negative and inflate its create time). + wx = linspace(0, xSpan, 1000); + wds = FastSenseDataStore(wx, sin(wx / 50)); + wds.getRange(0, xSpan / 10); + wds.cleanup(); + clear wx wds; + + nSizes = numel(sizes); + tQuery = zeros(1, nSizes); % per-query seconds + tCreate = zeros(1, nSizes); % store creation seconds + avgPts = zeros(1, nSizes); % avg points returned per query + + fprintf('\n=== FastSenseDataStore range-query microbenchmark ===\n'); + fprintf(' backend: %s\n', backend_(hasSqlite)); + fprintf(' fixed view window ~%d pts, %d queries x median of %d runs\n', ... + targetPts, nQueries, nRuns); + fprintf(' %s\n', repmat('-', 1, 76)); + fprintf(' %-6s | %-12s | %-14s %-12s | %-10s\n', ... 
+ 'N', 'create (s)', 'query (ms)', 'queries/s', 'pts/query'); + fprintf(' %s\n', repmat('-', 1, 76)); + + for c = 1:nSizes + n = sizes(c); + x = linspace(0, xSpan, n); + y = sin(x / 50) + 0.1 * randn(1, n); + + % Window width that returns ~targetPts points at this density + w = max(xSpan * targetPts / n, eps); + centers = (w / 2) + (xSpan - w) * rand(1, nQueries); + + t0 = tic; + ds = FastSenseDataStore(x, y); + tCreate(c) = toc(t0); + clear x y; + + try + % Warmup + measure average returned point count + [wx, ~] = ds.getRange(centers(1) - w/2, centers(1) + w/2); + ds.getRange(centers(2) - w/2, centers(2) + w/2); + avgPts(c) = numel(wx); + + runTimes = zeros(1, nRuns); + for r = 1:nRuns + tq = tic; + for q = 1:nQueries + ds.getRange(centers(q) - w/2, centers(q) + w/2); + end + runTimes(r) = toc(tq); + end + tQuery(c) = median(runTimes) / nQueries; + catch err + ds.cleanup(); % never leak the temp store on failure + rethrow(err); + end + ds.cleanup(); % release SQLite handle + temp file before next size + + fprintf(' %-6s | %10.3f | %12.4f %10.1f | %9.0f\n', ... + labels{c}, tCreate(c), tQuery(c) * 1000, 1 / tQuery(c), avgPts(c)); + end + fprintf(' %s\n', repmat('-', 1, 76)); + + % ---- Gate: fixed-window query time must NOT scale with total dataset ---- + slope = scalingExponent_(sizes, tQuery); + growth = tQuery(end) / max(tQuery(1), eps); + + gate = 0.5; + fprintf(' Query-time vs total-N exponent (indexed read ~0, full scan ~1.0):\n'); + fprintf(' exponent : %.2f (gate: <= %.1f)\n', slope, gate); + fprintf(' 100K->5M query-time growth: %.2fx (50x more data)\n', growth); + fprintf(' %s\n', repmat('-', 1, 76)); + + assert(slope <= gate, ... + sprintf(['FAIL: getRange per-query time scales with total dataset size ' ... + '(exponent %.2f > %.1f) — fixed-window queries should be ~constant; ' ... 
+ 'full-scan / unindexed-read regression suspected.'], slope, gate)); + fprintf(' PASS: fixed-window range queries stay ~constant vs dataset size (exponent <= %.1f).\n\n', gate); +end + +function slope = scalingExponent_(ns, times) + %SCALINGEXPONENT_ Log-log slope of per-query time vs total dataset size. + % slope -> 0 indicates the indexed store reads only the window; + % slope -> 1 indicates cost grows with the whole dataset (full scan). + times = max(times, eps); + p = polyfit(log10(ns(:)), log10(times(:)), 1); + slope = p(1); +end + +function s = backend_(hasSqlite) + if hasSqlite + s = 'mksqlite/SQLite (chunked)'; + else + s = 'binary fallback (mksqlite absent)'; + end +end diff --git a/benchmarks/bench_delimited_parse.m b/benchmarks/bench_delimited_parse.m new file mode 100644 index 00000000..f0bce1c0 --- /dev/null +++ b/benchmarks/bench_delimited_parse.m @@ -0,0 +1,182 @@ +function bench_delimited_parse() +%BENCH_DELIMITED_PARSE Isolated microbenchmark of the CSV-ingestion hot path. +% +% The Tag pipeline ingests raw sensor data from delimited text (CSV/TSV) +% files. dispatchDelimitedParse_ is the parse entry point: it prefers the +% compiled delimited_parse_mex kernel and falls back to the pure +% MATLAB/Octave textscan-based readRawDelimited_ when the binary is absent. +% Per the in-repo note (Phase 1028), the MEX is ~10-40x faster than the +% fallback at harness scale, yet BatchTagPipeline / delimited ingestion has +% no benchmark at all. This is the front door for getting data into +% FastSense, and slow parsing directly inflates load time for large logs. +% +% This benchmark generates deterministic multi-column CSV files of growing +% row count, times dispatchDelimitedParse_ on each (using whichever path is +% active, MEX or fallback, and reporting which), and reports parse latency plus row and +% byte throughput. File generation is done once per size and is NOT timed. 
+% +% Gate (machine-independent): delimited parsing is an O(rows) sweep, so the +% empirical log-log scaling exponent over the large-N portion must stay +% near-linear (<= 1.3). Super-linear creep (e.g. an accidental O(rows^2) +% reallocation in the fallback, or per-row overhead growth) trips the gate +% regardless of host speed. +% +% Warmup parse dissolves first-call/JIT overhead; small files are parsed +% over an inner repeat loop so sub-millisecond parses stay measurable; +% median of nRuns defuses one-off spikes. Temp files are always deleted. +% +% Run: +% octave --no-gui --eval "install(); bench_delimited_parse();" +% +% Exits 0 with "PASS: ..." on success; raises assert() (non-zero exit) if +% parse time scales super-linearly with row count. +% +% See also dispatchDelimitedParse_, readRawDelimited_, delimited_parse_mex, +% BatchTagPipeline, bench_datastore_range. + + here = fileparts(mfilename('fullpath')); + addpath(fullfile(here, '..')); + install(); + % dispatchDelimitedParse_ / delimited_parse_mex / readRawDelimited_ live in + % SensorThreshold's private/ folder, which cannot be put on the path. The + % current working folder is always searched regardless of its name, so + % cd-ing into private makes them directly callable in both MATLAB and + % Octave (and makes the exist() MEX check accurate). onCleanup restores the + % original folder even if an assert below trips. 
+ privDir = fullfile(here, '..', 'libs', 'SensorThreshold', 'private'); + origDir = pwd; + restoreDir = onCleanup(@() cd(origDir)); %#ok + cd(privDir); + + rows = [1e3, 1e4, 1e5, 5e5]; + labels = {'1K', '10K', '100K', '500K'}; + nCols = 4; % time + 3 value columns (a modest "wide" sensor CSV) + + nRuns = 5; % median of nRuns per size + targetRows = 2e5; % inner-loop repeats sized to parse ~this many rows + + % Deterministic seed (works in both MATLAB and Octave) + if exist('rng', 'file') == 2 + rng(0); + else + rand('state', 0); randn('state', 0); %#ok + end + + useMex = (exist('delimited_parse_mex', 'file') == 3); + + nSizes = numel(rows); + tParse = zeros(1, nSizes); % per-parse seconds + fileMB = zeros(1, nSizes); + + % Track temp files so they are always cleaned up, even on a gate failure. An anonymous function captures tmpFiles by value (still empty here), so the per-file guard inside the loop is the effective backstop. + tmpFiles = {}; + cleanupTmp = onCleanup(@() deleteFiles_(tmpFiles)); %#ok + + fprintf('\n=== Delimited-parse (CSV ingestion) microbenchmark ===\n'); + fprintf(' delimited_parse_mex: %s\n', tf_(useMex)); + fprintf(' %d columns (time + %d values), median of %d runs\n', nCols, nCols - 1, nRuns); + fprintf(' %s\n', repmat('-', 1, 72)); + fprintf(' %-6s | %-9s | %-13s %-12s %-10s\n', ... + 'rows', 'file MB', 'parse (ms)', 'rows/s (M)', 'MB/s'); + fprintf(' %s\n', repmat('-', 1, 72)); + + for c = 1:nSizes + n = rows(c); + path = [tempname, '.csv']; + tmpFiles{end+1} = path; fileGuard = onCleanup(@() deleteFiles_({path})); %#ok + writeCsv_(path, n, nCols); + d = dir(path); + fileMB(c) = d.bytes / 1e6; + + nInner = max(1, ceil(targetRows / n)); + tParse(c) = timeParse_(path, nInner, nRuns); + + fprintf(' %-6s | %8.2f | %11.4f %10.2f %8.1f\n', ... + labels{c}, fileMB(c), ... 
+ tParse(c) * 1000, n / tParse(c) / 1e6, fileMB(c) / tParse(c)); + + delete(path); % free disk eagerly between sizes + tmpFiles{c} = ''; % already gone — don't double-delete + end + fprintf(' %s\n', repmat('-', 1, 72)); + + % ---- Scaling gate: fit exponent over the large-N portion (>= 1e4) ---- + fitMask = rows >= 1e4; + slope = scalingExponent_(rows(fitMask), tParse(fitMask)); + + gate = 1.3; + fprintf(' Scaling exponent (large-N fit, ideal ~1.0): %.2f (gate: <= %.1f)\n', slope, gate); + fprintf(' %s\n', repmat('-', 1, 72)); + + assert(slope <= gate, ... + sprintf(['FAIL: delimited parse scaling exponent %.2f exceeds %.1f — ' ... + 'super-linear creep in the CSV-ingestion path.'], slope, gate)); + fprintf(' PASS: parsing scales near-linearly (gate: exponent <= %.1f).\n\n', gate); +end + +function writeCsv_(path, n, nCols) + %WRITECSV_ Write a deterministic n-row, nCols-column CSV with a header. + % Column 1 is a monotonic time axis; remaining columns are smooth + % signals plus light noise. Generation is intentionally outside the + % timed region. + x = linspace(0, 1000, n); + M = zeros(n, nCols); + M(:, 1) = x(:); + for k = 2:nCols + M(:, k) = sin(x(:) / (10 * k)) + 0.1 * randn(n, 1); + end + + fid = fopen(path, 'w'); + if fid == -1 + error('bench:fileOpen', 'Cannot open temp file for writing: %s', path); + end + closer = onCleanup(@() fclose(fid)); %#ok + + hdr = 't'; + for k = 2:nCols + hdr = [hdr, sprintf(',c%d', k - 1)]; %#ok + end + fprintf(fid, '%s\n', hdr); + + rowFmt = ['%.6g', repmat(',%.6g', 1, nCols - 1), '\n']; + fprintf(fid, rowFmt, M.'); % transpose: fprintf consumes column-major +end + +function t = timeParse_(path, nInner, nRuns) + %TIMEPARSE_ Median-of-nRuns per-parse time of dispatchDelimitedParse_. 
+ dispatchDelimitedParse_(path); % warmup (also primes OS file cache) + runTimes = zeros(1, nRuns); + for r = 1:nRuns + t0 = tic; + for i = 1:nInner + dispatchDelimitedParse_(path); + end + runTimes(r) = toc(t0); + end + t = median(runTimes) / nInner; +end + +function slope = scalingExponent_(ns, times) + %SCALINGEXPONENT_ Log-log slope of per-parse time vs row count. + times = max(times, eps); + p = polyfit(log10(ns(:)), log10(times(:)), 1); + slope = p(1); +end + +function deleteFiles_(files) + %DELETEFILES_ Best-effort cleanup of any temp files still present. + for i = 1:numel(files) + f = files{i}; + if ~isempty(f) && exist(f, 'file') + delete(f); + end + end +end + +function s = tf_(b) + if b + s = 'active'; + else + s = 'fallback (pure MATLAB/Octave)'; + end +end diff --git a/benchmarks/bench_downsample_kernels.m b/benchmarks/bench_downsample_kernels.m new file mode 100644 index 00000000..992c5162 --- /dev/null +++ b/benchmarks/bench_downsample_kernels.m @@ -0,0 +1,163 @@ +function bench_downsample_kernels() +%BENCH_DOWNSAMPLE_KERNELS Isolated microbenchmark of the downsampling hot path. +% +% Downsampling is the single most performance-critical computation in +% FastSense: minmax_downsample / lttb_downsample run on every render and +% on every zoom/pan, over the full dataset (up to tens of millions of +% points). They are the reason the library exists. Yet the only existing +% coverage is a single minmax_downsample(x, y, 1000) call buried inside +% the render-heavy benchmark.m — mixed with figure creation and drawnow, +% never isolated, and LTTB is not benchmarked anywhere at all. +% +% This benchmark times BOTH downsamplers as PURE computation (no figure, +% no rendering) across a size sweep, reporting per-call latency and +% throughput (Mpts/s). With the MEX kernels compiled it exercises +% minmax_core_mex / lttb_core_mex (the production path); without them it +% transparently times the pure-MATLAB fallbacks (a flag reports which). 
+% +% Both methods are driven to the same output budget (~2000 points, a +% realistic display width) so their throughput is directly comparable. +% +% Gate (machine-independent): downsampling is an O(N) sweep, so per-call +% time must scale near-linearly with N. The benchmark fits the empirical +% scaling exponent over the large-N portion of the sweep (where O(N) +% dominates measurement noise) and asserts it stays near-linear +% (exponent <= 1.3). Super-linear creep (the classic downsampling +% regression) trips this gate regardless of absolute machine speed. +% +% Warmup passes dissolve JIT first-call overhead; small sizes are timed +% over an inner repeat loop so sub-millisecond calls stay measurable; +% median of nRuns defuses one-off spikes. +% +% Run: +% octave --no-gui --eval "install(); bench_downsample_kernels();" +% +% Exits 0 with "PASS: ..." on success; raises assert() (non-zero exit) if +% either kernel's empirical scaling exponent exceeds the gate. +% +% See also minmax_downsample, lttb_downsample, benchmark, benchmark_zoom. + + here = fileparts(mfilename('fullpath')); + addpath(fullfile(here, '..')); + install(); + % minmax_downsample / lttb_downsample live in FastSense's private/ folder. + % A private/ folder cannot portably be put on the path (Octave permits it + % but MATLAB rejects it), so the wrappers are not callable from here. The + % current working folder is ALWAYS searched regardless of its name, + % however, so cd-ing into the private folder makes them directly callable + % in both MATLAB and Octave: no path manipulation, no touching libs/. + % onCleanup restores the original folder even if an assert below trips. 
+ privDir = fullfile(here, '..', 'libs', 'FastSense', 'private'); + origDir = pwd; + restoreDir = onCleanup(@() cd(origDir)); %#ok + cd(privDir); + + sizes = [1e4, 1e5, 1e6, 5e6, 1e7]; + labels = {'10K', '100K', '1M', '5M', '10M'}; + + % Equal output budget so the two methods are directly comparable: + % minmax emits ~2*numBuckets points -> numBuckets = 1000 -> ~2000 pts + % lttb emits numOut points -> numOut = 2000 -> 2000 pts + minmaxBuckets = 1000; + lttbOut = 2000; + + nRuns = 5; % median of nRuns per (method, size) + targetWork = 2e6; % inner-loop repeats sized to process ~this many pts + + % Deterministic seed — works in both MATLAB and Octave + if exist('rng', 'file') == 2 + rng(0); + else + rand('state', 0); randn('state', 0); %#ok + end + + mexMinmax = (exist('minmax_core_mex', 'file') == 3); + mexLttb = (exist('lttb_core_mex', 'file') == 3); + + nSizes = numel(sizes); + tMinmax = zeros(1, nSizes); % per-call seconds + tLttb = zeros(1, nSizes); + + fprintf('\n=== Downsampling kernel microbenchmark (pure computation) ===\n'); + fprintf(' MinMax MEX: %s | LTTB MEX: %s\n', tf_(mexMinmax), tf_(mexLttb)); + fprintf(' Output budget: minmax numBuckets=%d (~%d pts) lttb numOut=%d\n', ... + minmaxBuckets, 2 * minmaxBuckets, lttbOut); + fprintf(' %s\n', repmat('-', 1, 74)); + fprintf(' %-6s | %-13s %-12s | %-13s %-12s\n', ... + 'N', 'MinMax (ms)', 'MinMax Mpts/s', 'LTTB (ms)', 'LTTB Mpts/s'); + fprintf(' %s\n', repmat('-', 1, 74)); + + for c = 1:nSizes + n = sizes(c); + x = linspace(0, 100, n); + y = sin(x * 2 * pi / 10) + 0.5 * randn(1, n); + + nInner = max(1, ceil(targetWork / n)); + + tMinmax(c) = timeCall_(@() minmax_downsample(x, y, minmaxBuckets), nInner, nRuns); + tLttb(c) = timeCall_(@() lttb_downsample(x, y, lttbOut), nInner, nRuns); + + fprintf(' %-6s | %11.3f %10.1f | %11.3f %10.1f\n', ... + labels{c}, ... + tMinmax(c) * 1000, n / tMinmax(c) / 1e6, ... 
+ tLttb(c) * 1000, n / tLttb(c) / 1e6); + + clear x y; + end + fprintf(' %s\n', repmat('-', 1, 74)); + + % ---- Scaling gate: fit exponent over the large-N portion (>= 1e5) ---- + % Small N is dominated by fixed dispatch/allocation overhead and would + % bias the slope; restrict the fit to where the O(N) sweep dominates. + fitMask = sizes >= 1e5; + slopeMinmax = scalingExponent_(sizes(fitMask), tMinmax(fitMask)); + slopeLttb = scalingExponent_(sizes(fitMask), tLttb(fitMask)); + + gate = 1.3; + fprintf(' Scaling exponent (large-N fit, ideal ~1.0):\n'); + fprintf(' MinMax : %.2f (gate: <= %.1f)\n', slopeMinmax, gate); + fprintf(' LTTB : %.2f (gate: <= %.1f)\n', slopeLttb, gate); + fprintf(' %s\n', repmat('-', 1, 74)); + + assert(slopeMinmax <= gate, ... + sprintf(['FAIL: minmax_downsample scaling exponent %.2f exceeds %.1f — ' ... + 'super-linear creep in the downsampling hot path.'], slopeMinmax, gate)); + assert(slopeLttb <= gate, ... + sprintf(['FAIL: lttb_downsample scaling exponent %.2f exceeds %.1f — ' ... + 'super-linear creep in the downsampling hot path.'], slopeLttb, gate)); + fprintf(' PASS: both kernels scale near-linearly (gate: exponent <= %.1f).\n\n', gate); +end + +function t = timeCall_(fn, nInner, nRuns) + %TIMECALL_ Median-of-nRuns per-call time of fn, averaged over nInner reps. + % Warms up first to dissolve JIT/first-call overhead, then times nInner + % back-to-back calls per run and returns the median run divided by + % nInner — a robust per-call estimate that keeps sub-ms calls measurable. + fn(); fn(); % warmup + runTimes = zeros(1, nRuns); + for r = 1:nRuns + t0 = tic; + for i = 1:nInner + fn(); + end + runTimes(r) = toc(t0); + end + t = median(runTimes) / nInner; +end + +function slope = scalingExponent_(ns, times) + %SCALINGEXPONENT_ Log-log slope of per-call time vs N (the O(N) exponent). + % slope ~ 1.0 indicates linear scaling; > 1 indicates super-linear creep. + % Guards against a degenerate fit when timings are too small to resolve. 
+ times = max(times, eps); + p = polyfit(log10(ns(:)), log10(times(:)), 1); + slope = p(1); +end + +function s = tf_(b) + if b + s = 'active'; + else + s = 'fallback (pure MATLAB)'; + end +end diff --git a/benchmarks/bench_violation_cull.m b/benchmarks/bench_violation_cull.m new file mode 100644 index 00000000..6d9d53ae --- /dev/null +++ b/benchmarks/bench_violation_cull.m @@ -0,0 +1,174 @@ +function bench_violation_cull() +%BENCH_VIOLATION_CULL Isolated microbenchmark of the threshold-marker hot path. +% +% violation_cull is the fused detect-and-cull kernel behind threshold +% violation markers. On every render and every zoom/pan, FastSense calls it +% once per (threshold x line) for each threshold with ShowViolations: it +% finds the points that cross the threshold and culls them to one marker per +% pixel column in a single pass (FastSense.m:1368/1371, 4468/4471). It is +% MEX-accelerated (violation_cull_mex) with a pure-MATLAB fallback, and +% handles both constant thresholds and time-varying (step-function) +% thresholds; the latter is a recent feature (per-widget time-varying spec). +% No benchmark exercises it directly; only bench_event_marker_regression +% touches a neighbouring render path (getEventsForTag). +% +% This benchmark times BOTH threshold branches as pure computation (no +% figure, no rendering): a constant threshold (thX = 0 sentinel) and a +% multi-knot step-function threshold, across an input-size sweep, reporting +% per-call latency and throughput (input Mpts/s). +% +% In production the input is the line's DISPLAYED (downsampled) data, +% typically a few thousand points (~2 x pixel width). The lower sizes here +% bracket that realistic range; the larger sizes exist to verify the kernel +% scales linearly with input length (the regression we actually guard). +% +% Gate (machine-independent): detection + culling is an O(N) sweep, so the +% empirical log-log scaling exponent over the large-N portion must stay +% near-linear (<= 1.3).
Super-linear creep trips the gate regardless of +% absolute host speed. +% +% Warmup dissolves JIT first-call overhead; small sizes are timed over an +% inner repeat loop so sub-millisecond calls stay measurable; median of +% nRuns defuses one-off spikes. +% +% Run: +% octave --no-gui --eval "install(); bench_violation_cull();" +% +% Exits 0 with "PASS: ..." on success; raises assert() (non-zero exit) if +% either branch's scaling exponent exceeds the gate. +% +% See also violation_cull, compute_violations, compute_violations_dynamic, +% downsample_violations, bench_downsample_kernels. + + here = fileparts(mfilename('fullpath')); + addpath(fullfile(here, '..')); + install(); + % violation_cull (and violation_cull_mex) live in FastSense's private/ + % folder, which cannot be put on the path. The current working folder is + % always searched regardless of its name, so cd-ing into private makes the + % wrapper directly callable in both MATLAB and Octave. onCleanup restores + % the original folder even if an assert below trips. + privDir = fullfile(here, '..', 'libs', 'FastSense', 'private'); + origDir = pwd; + restoreDir = onCleanup(@() cd(origDir)); %#ok + cd(privDir); + + sizes = [1e3, 1e4, 1e5, 1e6]; + labels = {'1K', '10K', '100K', '1M'}; + + nRuns = 5; % median of nRuns per (branch, size) + targetWork = 2e6; % inner-loop repeats sized to process ~this many pts + + % Deterministic seed — works in both MATLAB and Octave + if exist('rng', 'file') == 2 + rng(0); + else + rand('state', 0); randn('state', 0); %#ok + end + + useMex = (exist('violation_cull_mex', 'file') == 3); + + % Threshold configuration. Signal oscillates ~[-1.5, 1.5]; an upper + % threshold at 0.5 yields a healthy fraction of violations so the culling + % stage does real work. The step-function branch uses 5 knots across the + % X range to exercise the piecewise-constant interpolation path. 
+ direction = 'upper'; + constLevel = 0.5; + PixelWidth = 1000; % nominal axis width in pixels + stepKnotsN = 5; + + nSizes = numel(sizes); + tConst = zeros(1, nSizes); % per-call seconds, constant threshold + tStep = zeros(1, nSizes); % per-call seconds, step-function threshold + + fprintf('\n=== violation_cull threshold-marker microbenchmark (pure computation) ===\n'); + fprintf(' violation_cull_mex: %s\n', tf_(useMex)); + fprintf(' direction=%s constLevel=%.2f stepKnots=%d pixelWidth=%d\n', ... + direction, constLevel, stepKnotsN, PixelWidth); + fprintf(' (production input = displayed/downsampled data, ~few thousand pts)\n'); + fprintf(' %s\n', repmat('-', 1, 74)); + fprintf(' %-6s | %-13s %-12s | %-13s %-12s\n', ... + 'N', 'const (ms)', 'const Mpts/s', 'step (ms)', 'step Mpts/s'); + fprintf(' %s\n', repmat('-', 1, 74)); + + for c = 1:nSizes + n = sizes(c); + x = linspace(0, 100, n); % sorted ascending + y = sin(x * 2 * pi / 10) + 0.5 * randn(1, n); % ~[-1.5, 1.5] + + pw = (x(end) - x(1)) / PixelWidth; % X units per pixel + xmin = x(1); + + % Step-function threshold: knots across the X range, varying levels + thX = linspace(x(1), x(end), stepKnotsN); + thY = constLevel + 0.2 * sin(1:stepKnotsN); + + nInner = max(1, ceil(targetWork / n)); + + % Constant threshold uses the thX = 0 sentinel (matches FastSense.m) + tConst(c) = timeCall_(@() violation_cull(x, y, 0, constLevel, direction, pw, xmin), ... + nInner, nRuns); + tStep(c) = timeCall_(@() violation_cull(x, y, thX, thY, direction, pw, xmin), ... + nInner, nRuns); + + fprintf(' %-6s | %11.4f %10.1f | %11.4f %10.1f\n', ... + labels{c}, ... + tConst(c) * 1000, n / tConst(c) / 1e6, ... 
+ tStep(c) * 1000, n / tStep(c) / 1e6); + + clear x y; + end + fprintf(' %s\n', repmat('-', 1, 74)); + + % ---- Scaling gate: fit exponent over the large-N portion (>= 1e4) ---- + fitMask = sizes >= 1e4; + slopeConst = scalingExponent_(sizes(fitMask), tConst(fitMask)); + slopeStep = scalingExponent_(sizes(fitMask), tStep(fitMask)); + + gate = 1.3; + fprintf(' Scaling exponent (large-N fit, ideal ~1.0):\n'); + fprintf(' constant : %.2f (gate: <= %.1f)\n', slopeConst, gate); + fprintf(' step : %.2f (gate: <= %.1f)\n', slopeStep, gate); + fprintf(' %s\n', repmat('-', 1, 74)); + + assert(slopeConst <= gate, ... + sprintf(['FAIL: violation_cull (constant) scaling exponent %.2f exceeds %.1f — ' ... + 'super-linear creep in the threshold-marker path.'], slopeConst, gate)); + assert(slopeStep <= gate, ... + sprintf(['FAIL: violation_cull (step) scaling exponent %.2f exceeds %.1f — ' ... + 'super-linear creep in the threshold-marker path.'], slopeStep, gate)); + fprintf(' PASS: both branches scale near-linearly (gate: exponent <= %.1f).\n\n', gate); +end + +function t = timeCall_(fn, nInner, nRuns) + %TIMECALL_ Median-of-nRuns per-call time of fn, averaged over nInner reps. + % Warms up first to dissolve JIT/first-call overhead, then times nInner + % back-to-back calls per run and returns the median run divided by + % nInner — a robust per-call estimate that keeps sub-ms calls measurable. + fn(); fn(); % warmup + runTimes = zeros(1, nRuns); + for r = 1:nRuns + t0 = tic; + for i = 1:nInner + fn(); + end + runTimes(r) = toc(t0); + end + t = median(runTimes) / nInner; +end + +function slope = scalingExponent_(ns, times) + %SCALINGEXPONENT_ Log-log slope of per-call time vs N (the O(N) exponent). + % slope ~ 1.0 indicates linear scaling; > 1 indicates super-linear creep. + times = max(times, eps); + p = polyfit(log10(ns(:)), log10(times(:)), 1); + slope = p(1); +end + +function s = tf_(b) + if b + s = 'active'; + else + s = 'fallback (pure MATLAB)'; + end +end