3 changes: 2 additions & 1 deletion .planning/STATE.md
@@ -27,7 +27,7 @@ Phase: — (none active; latest shipped = Phase 1041, MERGED via PR #189 on 2026
Plan: —
Milestone: v3.0 FastSense Companion — SHIPPED 2026-04-30; v3.1 Plant Log Integration — SHIPPED 2026-05-19; v4.0 Multi-User LAN Concurrency — SHIPPED 2026-05 via PR #152 (parallel branch); v1.0 perf line — COMPLETE via PR #114. No milestone in flight.
Status: Phase 1041 complete — inline time-range control (toolbar dropdown + Custom date strip) shipped; PR #189 MERGED 2026-06-03. No planned milestone in flight — repo in polish/housekeeping. Outstanding: 12 wiki-bot dup PRs were closed to 1 (#190) + workflow root-caused (260609-mcz); backlog Phase 999.1 (in-app help system) unplanned; ROADMAP v4.0 boxes stale (shipped on main via #152 — router misreports, see memory gsd-router-stale-v4-misroute).
Last activity: 2026-06-10 - PR #197 MERGED (dashboard perf pass ~ idle tick + crash fixes + preview restoration; quick tasks 260609-v5p, 260610-fta, 260610-g0w). Quick task 260610-hwj (review-sweep fixes batch 2: serialization round-trips, gauge MonitorTag construction crash, disk-backed export, listener leaks, marker test helper) shipped via PR #198 from claude/review-fixes-batch2.
Last activity: 2026-06-10 - PRs #197 + #198 MERGED (perf pass ~8x idle tick, crash fixes, serialization round-trips, leak fixes). Quick task 260610-nwa: -ffast-math NaN-detection fix in MEX kernels (ToStepFunctionMex 9 FAIL -> 13/13) + CI perf gates report-only — PR from claude/mex-nan-fastmath.

### Note on parallel v4.0 work (main branch state)

@@ -106,6 +106,7 @@ Other main PRs (#138, #139, #141, #144, #145, #146) auto-merged without conflict
| 260610-g0w | Perf round 2 (profiler-driven) + latent preview bug: `getPreviewSeries` now derives bucket count from minmax output (260512 bucket-math bumps nb inside the MEX; old exact-shape check silently returned [] → slider previews missing + 0% preview-cache hits in every mex-on-path session, i.e. all test envs + any session that ran add_fastsense_private_path; clean production used the MATLAB fallback and was unaffected). Vectorized getEventMarkers extraction (isprop 8280→280 per 20 ticks); per-class ismethod cache in computeEventMarkers (~6 ms/tick); stale-banner set-skip; TimeRangeSelector isLive_ guards kill 'Invalid or deleted object' mouse-motion spam from chained WindowButtonMotionFcn closures outliving deleted selectors. Profiled idle tick 26.5→22.5 ms with previews now drawing. perf_fixes 10/10, preview_envelope 7/7 (case 6 → adaptive contract), preview_overlay 10/10, range_selector 2/2, time_window 8/8. | 2026-06-10 | e1079c20 | — | [260610-g0w-fix-getpreviewseries-mex-shape-mismatch-](./quick/260610-g0w-fix-getpreviewseries-mex-shape-mismatch-/) |
| 260609-v5p | Speed up DashboardEngine live refresh: data-unchanged fast path in FastSenseWidget.update()/refresh() (fingerprint [n,x1,xend,yend], same append-only contract as PreviewCacheKey_) skips updateData/preview-invalidate/formatTimeAxis on idle ticks; single Tag.getXY per tick (updateTimeRangeCache(x) optional arg); refreshEventMarkers_ O(nE²)→O(nE) isequal diff; computeEventMarkers vectorized accumulators + sortrows-based dedup (max-severity-wins preserved, non-finite sev→1); getEventMarkers preallocation + per-unique-severity color lookup; vectorized formatTimeAxis_. New bench_dashboard_live.m (8 Tag-bound widgets × 50k pts + 200 events): idle tick 281→34 ms (~8×), active tick ~50→30 ms. Verified R2025b: test_dashboard_perf_fixes 9/9 (2 new), preview-envelope 7/7, events-toggle 22/22, time-window 8/8, TestDashboardEngine 18/18, TestDashboardEngineEventMarkers 8/8, TestFastSenseWidgetUpdate 2/2, TestFastSenseWidgetEventMarkers 12/12, TestDashboardDirtyFlag 6/6. Known stale test: flat test_dashboard_engine_event_markers case_render assumes one handle per marker — broken since 260508 color-group batching, fails pre- and post-change identically (Octave mirror that self-skips on Octave). | 2026-06-09 | 8cd6443f, c29be759, cbd66937, 98184f36 | — | [260609-v5p-speed-up-dashboardengine-live-refresh-pa](./quick/260609-v5p-speed-up-dashboardengine-live-refresh-pa/) |
| 260610-hwj | Review-sweep fixes batch 2 (branch claude/review-fixes-batch2, separate PR from the perf pass): GaugeWidget.fromStruct restores Threshold (was Tag — threshold coloring dead after load) + constructor probes for allValues() (pre-v2.0-only method; MonitorTag-bound gauges crashed at construction since the migration); GroupWidget round-trips ExpandedHeight (collapsed groups were stuck collapsed after load); central themeOverride backfill in DashboardWidgetRegistry.fromStruct (dropped on load for every widget except GroupWidget); FastSense exportData routes through lineFullData (disk-backed lines exported empty columns); markerXData test helper parses batched NaN-separated marker polylines (stale since 260508). New test_review_fixes_batch2.m 4/4 R2025b / 3/3+gate Octave 11; event_markers 9/9 (first MATLAB pass ever); SerializerRoundTrip 15/15; Serializer 12/12; toolbar 19/19. | 2026-06-10 | 18387785 | — | [260610-hwj-review-sweep-fixes-batch-2-widget-serial](./quick/260610-hwj-review-sweep-fixes-batch-2-widget-serial/) |
| 260610-nwa | Fix -ffast-math breaking NaN detection in MEX kernels + make CI perf gates report-only. Root cause: -ffast-math implies -ffinite-math-only → compiler folds the IEEE self-compare NaN test (v==v) to true, incl. clang's IR lowering of NEON/AVX compare intrinsics — to_step_function_mex scanned all-NaN as fully active, NaN gaps rendered as solid steps (all 9 TestToStepFunctionMex failures on ARM64 MATLAB). Fix: build_mex.m appends -fno-finite-math-only after every -ffast-math (6 sites); kernel scalar tails use mxIsNaN (opaque — survives MSVC /fp:fast); constraint documented in both files; ARM64 binary rebuilt, refresh-mex-binaries workflow regenerates the rest. Audit: only this kernel used the idiom (build_store already used isnan with a warning comment; violation_cull uses mxIsNaN; minmax/lttb get pre-segmented data). Also: TestTagPerfRegression timing gates now report-only on shared CI runners (3 false failures across 2 unrelated PRs on 2026-06-10; FASTSENSE_PERF_GATES=strict opts back in; hard locally). Verified R2025b: ToStepFunctionMex 13/13 (was 9 FAIL), StateTag 18/18, gate policy both modes. | 2026-06-10 | (PR #199) | — | [260610-nwa-fix-ffast-math-breaking-nan-detection-in](./quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/) |

## Progress Bar

@@ -0,0 +1,39 @@
---
quick_id: 260610-nwa
description: Fix -ffast-math breaking NaN detection in MEX kernels + make CI perf gates report-only
date: 2026-06-10
mode: quick-inline
---

# Quick Task 260610-nwa: fast-math NaN fix + CI perf-gate policy

## Task 1 — -ffast-math NaN bug (chip task_88d3d685)
Diagnosed 2026-06-09: all 9 TestToStepFunctionMex failures on macOS ARM64 MATLAB.
-ffast-math implies -ffinite-math-only -> compiler assumes no NaNs -> folds the
IEEE self-compare NaN test (v == v) to true, including clang's IR lowering of
NEON/AVX compare intrinsics. to_step_function_mex scanned all-NaN input as fully
active; NaN gaps in StateTag step functions rendered as solid steps.
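
The failure mode above can be sketched in isolation. The following is a minimal illustration, not the kernel's code: `make_nan`, `is_nan_bits` and `scan_active` are hypothetical names, and the bit-pattern test is an alternative to the `mxIsNaN` call the PR actually uses. It shows the same branchless compaction as the scalar tail, with a NaN test built from integer operations only, so finite-math assumptions have no floating-point compare to fold.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a quiet NaN from its IEEE-754 bit pattern. */
static double make_nan(void)
{
    uint64_t bits = UINT64_C(0x7FF8000000000000);
    double v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Hypothetical helper (not in the PR): NaN test on the raw bits.
 * Integer-only logic, so -ffinite-math-only cannot fold it to a constant. */
static int is_nan_bits(double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits); /* well-defined type pun */
    /* NaN: exponent all ones and a non-zero mantissa; sign bit ignored. */
    return (bits & UINT64_C(0x7FFFFFFFFFFFFFFF)) > UINT64_C(0x7FF0000000000000);
}

/* Branchless compaction in the style of the kernel's scalar tail:
 * always store the index, advance the count only for non-NaN values.
 * activeIdx must have room for n entries. */
static size_t scan_active(const double *values, size_t n, uint32_t *activeIdx)
{
    size_t cnt = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        activeIdx[cnt] = (uint32_t)i;
        cnt += (size_t)(!is_nan_bits(values[i]));
    }
    return cnt;
}
```

With the original `values[i] == values[i]` test compiled under plain `-ffast-math`, the increment can be folded to a constant 1, which is how an all-NaN input came to be counted as fully active.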

Audit: only to_step_function_mex.c used the vulnerable idiom. build_store_mex.c
already used isnan() with a comment warning about this exact hazard;
violation_cull_mex.c uses mxIsNaN; minmax/lttb receive pre-segmented NaN-free data.

Fix: (a) build_mex.m appends -fno-finite-math-only after every -ffast-math
(6 sites; keeps reassociation/FMA wins); (b) scalar tails in the kernel use
mxIsNaN (opaque libmx call — survives any fast-math mode incl. MSVC /fp:fast);
(c) fast-math constraint documented in both files. Local mexmaca64 rebuilt
(both FastSense + SensorThreshold copies); refresh-mex-binaries workflow
triggers on this change and regenerates all platforms.
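
One way the documented constraint could also be checked in code is sketched below. This is an assumption, not part of the PR: `nan_semantics_intact` is a hypothetical name. It relies on the `__FINITE_MATH_ONLY__` macro that GCC and Clang predefine as 1 when `-ffinite-math-only` is in effect and as 0 otherwise; MSVC does not define it, so `/fp:fast` is not detected this way.

```c
/* Hypothetical guard (not in the PR): reports whether this translation unit
 * was compiled with NaN semantics intact, as far as GCC/Clang can tell us.
 * Returns 0 under -ffinite-math-only (including plain -ffast-math),
 * 1 otherwise. Compilers that do not define the macro report 1. */
static int nan_semantics_intact(void)
{
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return 0;
#else
    return 1;
#endif
}
```

A stricter variant could put an `#error` in the first branch so that a build missing `-fno-finite-math-only` fails at compile time instead of at test time.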

## Task 2 — CI perf gates report-only
All five TestTagPerfRegression benches gate on wall-clock measurements; shared
GitHub runners produced three false failures across two unrelated PRs on
2026-06-10. invokeBenchOrSkip_ now converts gate trips into assume-skips WITH
the measurement diagnostic when CI is set (unless FASTSENSE_PERF_GATES=strict).
Gates stay hard on developer machines.
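
The gate policy reduces to a small predicate over two environment variables. The sketch below restates it in C for illustration only; the real logic lives in MATLAB in `invokeBenchOrSkip_`, and the function names here are hypothetical. Inputs are passed as arguments so the rule can be exercised without touching the environment.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: case-insensitive equality for ASCII strings. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Returns 1 when a tripped timing gate should be reported as a skip,
 * 0 when it should fail the run.
 * ci and gates are the values of CI and FASTSENSE_PERF_GATES;
 * NULL or "" for ci means the variable is unset. */
static int gate_is_report_only(const char *ci, const char *gates)
{
    int on_ci = (ci != NULL && ci[0] != '\0');
    int strict = (gates != NULL && equals_ignore_case(gates, "strict"));
    return on_ci && !strict;
}
```

In a real program the arguments would come from `getenv("CI")` and `getenv("FASTSENSE_PERF_GATES")`. Only an exact, case-insensitive `strict` opts back in, mirroring the `strcmpi` check in the test suite.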

## Verification
- to_step_function_mex([1 5 10],[NaN NaN NaN],20) -> empty (was 6 elements).
- TestToStepFunctionMex 13/13 (was 9 failures). TestStateTag 18/18; flat statetag green.
- Gate policy: CI=true simulated in-session -> 3 passed / 2 report-only skips / 0 failed;
CI unset -> hard gates unchanged.
@@ -0,0 +1,21 @@
---
quick_id: 260610-nwa
status: complete
date: 2026-06-10
---

# Summary: fast-math NaN fix + CI perf-gate policy

See PLAN.md for the full diagnosis. Both tasks landed:
- build_mex.m: -fno-finite-math-only after every -ffast-math (6 sites) + rationale comment.
- to_step_function_mex.c: scalar tails use mxIsNaN; FAST-MATH CONSTRAINT documented.
- Local ARM64 binary rebuilt (FastSense + SensorThreshold copies); CI refresh workflow
regenerates the other platforms (triggers on build_mex.m / mex_src changes).
- TestTagPerfRegression.invokeBenchOrSkip_: timing gates report-only on CI
(FASTSENSE_PERF_GATES=strict opts back in), hard locally.

Verified live R2025b: TestToStepFunctionMex 13/13 (was 9 FAIL), TestStateTag 18/18,
flat test_statetag green; gate policy verified in both CI and local modes.
Session gotcha hit again: the fastsense_private_proxy temp dir shadowed the rebuilt
binary — first verification ran the STALE copy; refreshed proxies before retesting
(see memory matlab-session-test-gotchas).
21 changes: 15 additions & 6 deletions libs/FastSense/build_mex.m
@@ -105,13 +105,22 @@ function build_mex()

% Set optimization and SIMD flags — MSVC uses /flags while GCC/Clang
% use -flags. The boolean useMSVC distinguishes the two conventions.
%
% -fno-finite-math-only MUST follow -ffast-math (260610-nwa):
% -ffast-math implies -ffinite-math-only, which lets the compiler
% assume no NaNs and fold the IEEE self-compare NaN test (v == v)
% to constant true. Clang lowers even NEON/AVX compare intrinsics
% through IR, so this folding applies to them too. That silently broke
% to_step_function_mex's NaN-segment scan (all-NaN inputs scanned as
% fully active; NaN gaps rendered as solid steps). The reassociation,
% FMA-contraction and no-signed-zero wins of -ffast-math are kept.
useMSVC = ispc && ~isOctave;
switch arch
case 'x86_64'
if useMSVC
opt_flags = {'/O2', '/arch:AVX2', '/fp:fast'};
else
opt_flags = {'-O3', '-mavx2', '-mfma', '-ftree-vectorize', '-ffast-math'};
opt_flags = {'-O3', '-mavx2', '-mfma', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'};
end
fprintf('SIMD target: AVX2 + FMA\n');
case 'arm64'
@@ -120,17 +129,17 @@ function build_mex()
opt_flags = {'/O2', '/fp:fast'};
elseif isOctave && ~isempty(compiler)
% GCC on ARM needs explicit CPU target
opt_flags = {'-O3', '-mcpu=apple-m3', '-ftree-vectorize', '-ffast-math'};
opt_flags = {'-O3', '-mcpu=apple-m3', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'};
else
% Clang on Apple Silicon: NEON enabled by default
opt_flags = {'-O3', '-ffast-math'};
opt_flags = {'-O3', '-ffast-math', '-fno-finite-math-only'};
end
fprintf('SIMD target: ARM NEON\n');
otherwise
if useMSVC
opt_flags = {'/O2', '/fp:fast'};
else
opt_flags = {'-O3', '-ffast-math'};
opt_flags = {'-O3', '-ffast-math', '-fno-finite-math-only'};
end
fprintf('SIMD target: scalar fallback\n');
end
@@ -207,7 +216,7 @@ function build_mex()
if useMSVC
sse_flags = {'/O2', '/arch:SSE2', '/fp:fast'};
else
sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math'};
sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'};
end
compile_mex(src_file, out_name, outDir, include_flag, ...
[sse_flags, extra_flags], compiler, extra_srcs);
@@ -312,7 +321,7 @@ function build_mex()
if useMSVC
sse_flags = {'/O2', '/arch:SSE2', '/fp:fast'};
else
sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math'};
sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'};
end
compile_mex(srcFile, outName, sensorOutDir, sensorIncFlag, ...
[sse_flags, extraFlags], compiler, extraSrcs);
21 changes: 15 additions & 6 deletions libs/FastSense/private/mex_src/to_step_function_mex.c
@@ -13,8 +13,17 @@
*
* Algorithm:
* Phase 1: SIMD NaN scan — detect active segments in SIMD_WIDTH chunks
* using self-compare (v == v is false for NaN). Branchless
* conditional store builds the active index array.
* via compare intrinsics (NaN lanes compare unequal to
* themselves). Branchless conditional store builds the
* active index array.
*
* FAST-MATH CONSTRAINT (260610-nwa): this kernel MUST be compiled with
* NaN semantics intact (-fno-finite-math-only after -ffast-math —
* build_mex.m sets this). Under plain -ffast-math the compiler assumes
* no NaNs and folds self-compares — including clang's IR lowering of
* the compare intrinsics — making every NaN segment scan as active.
* Scalar tails use mxIsNaN (an opaque libmx call) so they survive any
* fast-math mode, including MSVC /fp:fast.
* Phase 2: SIMD bulk copy to compute segEnds (shifted segBounds).
* Phase 3: SIMD gap detection — gather prevEnd/currStart pairs and
* compare in SIMD_WIDTH-wide batches.
@@ -78,7 +87,7 @@ static size_t simd_nan_scan(const double *values, size_t nB,
/* Scalar tail */
for (; i < nB; i++) {
activeIdx[cnt] = (uint32_t)i;
cnt += (values[i] == values[i]); /* false for NaN */
cnt += (size_t)(!mxIsNaN(values[i])); /* opaque call: survives fast-math */
}
return cnt;
}
@@ -141,7 +150,7 @@ static size_t simd_nan_scan(const double *values, size_t nB,
}
for (; i < nB; i++) {
activeIdx[cnt] = (uint32_t)i;
cnt += (values[i] == values[i]);
cnt += (size_t)(!mxIsNaN(values[i]));
}
return cnt;
}
@@ -203,7 +212,7 @@ static size_t simd_nan_scan(const double *values, size_t nB,
}
for (; i < nB; i++) {
activeIdx[cnt] = (uint32_t)i;
cnt += (values[i] == values[i]);
cnt += (size_t)(!mxIsNaN(values[i]));
}
return cnt;
}
@@ -251,7 +260,7 @@ static size_t simd_nan_scan(const double *values, size_t nB,
size_t i;
for (i = 0; i < nB; i++) {
activeIdx[cnt] = (uint32_t)i;
cnt += (values[i] == values[i]);
cnt += (size_t)(!mxIsNaN(values[i]));
}
return cnt;
}
Binary file modified libs/FastSense/private/to_step_function_mex.mexmaca64
Binary file not shown.
Binary file modified libs/SensorThreshold/private/to_step_function_mex.mexmaca64
Binary file not shown.
15 changes: 15 additions & 0 deletions tests/suite/TestTagPerfRegression.m
@@ -74,6 +74,21 @@ function invokeBenchOrSkip_(testCase, benchName)
testCase.assumeFalse(true, sprintf( ...
'%s blocked by pre-existing v2.0-migration bug (%s: %s) — see deferred-items.md', ...
benchName, ex.identifier, ex.message));
elseif ~isempty(getenv('CI')) && ~strcmpi(getenv('FASTSENSE_PERF_GATES'), 'strict')
% Shared-CI-runner policy (260610-nwa): every bench in this suite
% gates on wall-clock measurements (relative overhead, absolute ms,
% or micro-timing ratios). On GitHub's shared runners those gates
% produced three false failures across two unrelated PRs in one
% day (consumer-migration 23-35% on identical code that also
% passed; getxy tripping alongside) — runner noise, not
% regressions. On CI, surface the measurement as a SKIP with the
% full diagnostic instead of failing the run. Gates stay HARD on
% developer machines (no CI env var), and a CI job can opt back
% in with FASTSENSE_PERF_GATES=strict (e.g. on a dedicated or
% self-hosted runner).
testCase.assumeFalse(true, sprintf( ...
'%s perf gate tripped on a shared CI runner (report-only; set FASTSENSE_PERF_GATES=strict to enforce): %s', ...
benchName, ex.message));
else
% Genuine regression — re-throw so the suite fails.
rethrow(ex);