diff --git a/.planning/STATE.md b/.planning/STATE.md index b7e9e12b..652119ce 100644 --- a/.planning/STATE.md +++ b/.planning/STATE.md @@ -27,7 +27,7 @@ Phase: — (none active; latest shipped = Phase 1041, MERGED via PR #189 on 2026 Plan: — Milestone: v3.0 FastSense Companion — SHIPPED 2026-04-30; v3.1 Plant Log Integration — SHIPPED 2026-05-19; v4.0 Multi-User LAN Concurrency — SHIPPED 2026-05 via PR #152 (parallel branch); v1.0 perf line — COMPLETE via PR #114. No milestone in flight. Status: Phase 1041 complete — inline time-range control (toolbar dropdown + Custom date strip) shipped; PR #189 MERGED 2026-06-03. No planned milestone in flight — repo in polish/housekeeping. Outstanding: 12 wiki-bot dup PRs were closed to 1 (#190) + workflow root-caused (260609-mcz); backlog Phase 999.1 (in-app help system) unplanned; ROADMAP v4.0 boxes stale (shipped on main via #152 — router misreports, see memory gsd-router-stale-v4-misroute). -Last activity: 2026-06-10 - PR #197 MERGED (dashboard perf pass ~8× idle tick + crash fixes + preview restoration; quick tasks 260609-v5p, 260610-fta, 260610-g0w). Quick task 260610-hwj (review-sweep fixes batch 2: serialization round-trips, gauge MonitorTag construction crash, disk-backed export, listener leaks, marker test helper) shipped via PR #198 from claude/review-fixes-batch2. +Last activity: 2026-06-10 - PRs #197 + #198 MERGED (perf pass ~8x idle tick, crash fixes, serialization round-trips, leak fixes). Quick task 260610-nwa: -ffast-math NaN-detection fix in MEX kernels (ToStepFunctionMex 9 FAIL -> 13/13) + CI perf gates report-only — PR from claude/mex-nan-fastmath. 
### Note on parallel v4.0 work (main branch state) @@ -106,6 +106,7 @@ Other main PRs (#138, #139, #141, #144, #145, #146) auto-merged without conflict | 260610-g0w | Perf round 2 (profiler-driven) + latent preview bug: `getPreviewSeries` now derives bucket count from minmax output (260512 bucket-math bumps nb inside the MEX; old exact-shape check silently returned [] → slider previews missing + 0% preview-cache hits in every mex-on-path session, i.e. all test envs + any session that ran add_fastsense_private_path; clean production used the MATLAB fallback and was unaffected). Vectorized getEventMarkers extraction (isprop 8280→280 per 20 ticks); per-class ismethod cache in computeEventMarkers (~6 ms/tick); stale-banner set-skip; TimeRangeSelector isLive_ guards kill 'Invalid or deleted object' mouse-motion spam from chained WindowButtonMotionFcn closures outliving deleted selectors. Profiled idle tick 26.5→22.5 ms with previews now drawing. perf_fixes 10/10, preview_envelope 7/7 (case 6 → adaptive contract), preview_overlay 10/10, range_selector 2/2, time_window 8/8. | 2026-06-10 | e1079c20 | — | [260610-g0w-fix-getpreviewseries-mex-shape-mismatch-](./quick/260610-g0w-fix-getpreviewseries-mex-shape-mismatch-/) | | 260609-v5p | Speed up DashboardEngine live refresh: data-unchanged fast path in FastSenseWidget.update()/refresh() (fingerprint [n,x1,xend,yend], same append-only contract as PreviewCacheKey_) skips updateData/preview-invalidate/formatTimeAxis on idle ticks; single Tag.getXY per tick (updateTimeRangeCache(x) optional arg); refreshEventMarkers_ O(nE²)→O(nE) isequal diff; computeEventMarkers vectorized accumulators + sortrows-based dedup (max-severity-wins preserved, non-finite sev→1); getEventMarkers preallocation + per-unique-severity color lookup; vectorized formatTimeAxis_. New bench_dashboard_live.m (8 Tag-bound widgets × 50k pts + 200 events): idle tick 281→34 ms (~8×), active tick ~50→30 ms. 
Verified R2025b: test_dashboard_perf_fixes 9/9 (2 new), preview-envelope 7/7, events-toggle 22/22, time-window 8/8, TestDashboardEngine 18/18, TestDashboardEngineEventMarkers 8/8, TestFastSenseWidgetUpdate 2/2, TestFastSenseWidgetEventMarkers 12/12, TestDashboardDirtyFlag 6/6. Known stale test: flat test_dashboard_engine_event_markers case_render assumes one handle per marker — broken since 260508 color-group batching, fails pre- and post-change identically (Octave mirror that self-skips on Octave). | 2026-06-09 | 8cd6443f, c29be759, cbd66937, 98184f36 | — | [260609-v5p-speed-up-dashboardengine-live-refresh-pa](./quick/260609-v5p-speed-up-dashboardengine-live-refresh-pa/) | | 260610-hwj | Review-sweep fixes batch 2 (branch claude/review-fixes-batch2, separate PR from the perf pass): GaugeWidget.fromStruct restores Threshold (was Tag — threshold coloring dead after load) + constructor probes for allValues() (pre-v2.0-only method; MonitorTag-bound gauges crashed at construction since the migration); GroupWidget round-trips ExpandedHeight (collapsed groups were stuck collapsed after load); central themeOverride backfill in DashboardWidgetRegistry.fromStruct (dropped on load for every widget except GroupWidget); FastSense exportData routes through lineFullData (disk-backed lines exported empty columns); markerXData test helper parses batched NaN-separated marker polylines (stale since 260508). New test_review_fixes_batch2.m 4/4 R2025b / 3/3+gate Octave 11; event_markers 9/9 (first MATLAB pass ever); SerializerRoundTrip 15/15; Serializer 12/12; toolbar 19/19. | 2026-06-10 | 18387785 | — | [260610-hwj-review-sweep-fixes-batch-2-widget-serial](./quick/260610-hwj-review-sweep-fixes-batch-2-widget-serial/) | +| 260610-nwa | Fix -ffast-math breaking NaN detection in MEX kernels + make CI perf gates report-only. Root cause: -ffast-math implies -ffinite-math-only → compiler folds the IEEE self-compare NaN test (v==v) to true, incl. 
clang's IR lowering of NEON/AVX compare intrinsics — to_step_function_mex scanned all-NaN as fully active, NaN gaps rendered as solid steps (all 9 TestToStepFunctionMex failures on ARM64 MATLAB). Fix: build_mex.m appends -fno-finite-math-only after every -ffast-math (6 sites); kernel scalar tails use mxIsNaN (opaque — survives MSVC /fp:fast); constraint documented in both files; ARM64 binary rebuilt, refresh-mex-binaries workflow regenerates the rest. Audit: only this kernel used the idiom (build_store already used isnan with a warning comment; violation_cull uses mxIsNaN; minmax/lttb get pre-segmented data). Also: TestTagPerfRegression timing gates now report-only on shared CI runners (3 false failures across 2 unrelated PRs on 2026-06-10; FASTSENSE_PERF_GATES=strict opts back in; hard locally). Verified R2025b: ToStepFunctionMex 13/13 (was 9 FAIL), StateTag 18/18, gate policy both modes. | 2026-06-10 | (PR #199) | — | [260610-nwa-fix-ffast-math-breaking-nan-detection-in](./quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/) | ## Progress Bar diff --git a/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-PLAN.md b/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-PLAN.md new file mode 100644 index 00000000..6dfea11d --- /dev/null +++ b/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-PLAN.md @@ -0,0 +1,39 @@ +--- +quick_id: 260610-nwa +description: Fix -ffast-math breaking NaN detection in MEX kernels + make CI perf gates report-only +date: 2026-06-10 +mode: quick-inline +--- + +# Quick Task 260610-nwa: fast-math NaN fix + CI perf-gate policy + +## Task 1 — -ffast-math NaN bug (chip task_88d3d685) +Diagnosed 2026-06-09: all 9 TestToStepFunctionMex failures on macOS ARM64 MATLAB. +-ffast-math implies -ffinite-math-only -> compiler assumes no NaNs -> folds the +IEEE self-compare NaN test (v == v) to true, including clang's IR lowering of +NEON/AVX compare intrinsics. 
to_step_function_mex scanned all-NaN input as fully +active; NaN gaps in StateTag step functions rendered as solid steps. + +Audit: only to_step_function_mex.c used the vulnerable idiom. build_store_mex.c +already used isnan() with a comment warning about this exact hazard; +violation_cull_mex.c uses mxIsNaN; minmax/lttb receive pre-segmented NaN-free data. + +Fix: (a) build_mex.m appends -fno-finite-math-only after every -ffast-math +(6 sites; keeps reassociation/FMA wins); (b) scalar tails in the kernel use +mxIsNaN (opaque libmx call — survives any fast-math mode incl. MSVC /fp:fast); +(c) fast-math constraint documented in both files. Local mexmaca64 rebuilt +(both FastSense + SensorThreshold copies); refresh-mex-binaries workflow +triggers on this change and regenerates all platforms. + +## Task 2 — CI perf gates report-only +All five TestTagPerfRegression benches gate on wall-clock measurements; shared +GitHub runners produced three false failures across two unrelated PRs on +2026-06-10. invokeBenchOrSkip_ now converts gate trips into assume-skips WITH +the measurement diagnostic when CI is set (unless FASTSENSE_PERF_GATES=strict). +Gates stay hard on developer machines. + +## Verification +- to_step_function_mex([1 5 10],[NaN NaN NaN],20) -> empty (was 6 elements). +- TestToStepFunctionMex 13/13 (was 9 failures). TestStateTag 18/18; flat statetag green. +- Gate policy: CI=true simulated in-session -> 3 passed / 2 report-only skips / 0 failed; + CI unset -> hard gates unchanged. 
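Reviewer note: the scalar-tail change in this plan can be illustrated outside MATLAB. The sketch below is not the kernel's code. It assumes a hypothetical helper `is_nan_bits` that tests the raw IEEE-754 bit pattern instead of calling `mxIsNaN`, which is only available inside a MEX build. `scan_active` is likewise a made-up name that mirrors the shape of the kernel's branchless tail loop. In practice GCC and Clang do not apply their finite-math assumptions to integer compares, so this form is a common stand-in where an opaque library call is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): NaN test on the raw bits.
 * The compare is an integer compare, so the (v == v) folding that
 * -ffinite-math-only permits does not apply to it. */
static int is_nan_bits(double v)
{
    uint64_t bits;
    assert(sizeof bits == sizeof v);      /* requires 64-bit IEEE-754 doubles */
    memcpy(&bits, &v, sizeof bits);
    /* NaN has an all-ones exponent and a non-zero mantissa. With the sign
     * bit masked off, that is any pattern strictly above +Inf. */
    return (bits & UINT64_C(0x7FFFFFFFFFFFFFFF)) > UINT64_C(0x7FF0000000000000);
}

/* Branchless compaction in the style of the kernel's scalar tail:
 * always store the index, advance the cursor only for non-NaN values.
 * active_idx must have room for n entries. Returns the active count. */
static size_t scan_active(const double *values, size_t n, uint32_t *active_idx)
{
    size_t cnt = 0;
    for (size_t i = 0; i < n; i++) {
        active_idx[cnt] = (uint32_t)i;
        cnt += (size_t)!is_nan_bits(values[i]);
    }
    return cnt;
}
```

With an all-NaN input the cursor never advances and the function returns 0, which corresponds to the "empty" result listed under Verification. A mixed input such as {1.0, NaN, 3.0} yields the indices 0 and 2.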
diff --git a/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-SUMMARY.md b/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-SUMMARY.md new file mode 100644 index 00000000..10a97f38 --- /dev/null +++ b/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-SUMMARY.md @@ -0,0 +1,21 @@ +--- +quick_id: 260610-nwa +status: complete +date: 2026-06-10 +--- + +# Summary: fast-math NaN fix + CI perf-gate policy + +See PLAN.md for the full diagnosis. Both tasks landed: +- build_mex.m: -fno-finite-math-only after every -ffast-math (6 sites) + rationale comment. +- to_step_function_mex.c: scalar tails use mxIsNaN; FAST-MATH CONSTRAINT documented. +- Local ARM64 binary rebuilt (FastSense + SensorThreshold copies); CI refresh workflow + regenerates the other platforms (triggers on build_mex.m / mex_src changes). +- TestTagPerfRegression.invokeBenchOrSkip_: timing gates report-only on CI + (FASTSENSE_PERF_GATES=strict opts back in), hard locally. + +Verified live R2025b: TestToStepFunctionMex 13/13 (was 9 FAIL), TestStateTag 18/18, +flat test_statetag green; gate policy verified in both CI and local modes. +Session gotcha hit again: the fastsense_private_proxy temp dir shadowed the rebuilt +binary — first verification ran the STALE copy; refreshed proxies before retesting +(see memory matlab-session-test-gotchas). diff --git a/libs/FastSense/build_mex.m b/libs/FastSense/build_mex.m index 72f50b4f..e73b57c0 100644 --- a/libs/FastSense/build_mex.m +++ b/libs/FastSense/build_mex.m @@ -105,13 +105,22 @@ function build_mex() % Set optimization and SIMD flags — MSVC uses /flags while GCC/Clang % use -flags. The boolean useMSVC distinguishes the two conventions. 
+ % + % -fno-finite-math-only MUST follow -ffast-math (260610-nwa): + % -ffast-math implies -ffinite-math-only, which lets the compiler + % assume no NaNs and fold the IEEE self-compare NaN test (v == v) + % to constant true — clang lowers even NEON/AVX compare intrinsics + % through IR this folding applies to. That silently broke + % to_step_function_mex's NaN-segment scan (all-NaN inputs scanned as + % fully active; NaN gaps rendered as solid steps). The reassociation, + % FMA-contraction and no-signed-zero wins of -ffast-math are kept. useMSVC = ispc && ~isOctave; switch arch case 'x86_64' if useMSVC opt_flags = {'/O2', '/arch:AVX2', '/fp:fast'}; else - opt_flags = {'-O3', '-mavx2', '-mfma', '-ftree-vectorize', '-ffast-math'}; + opt_flags = {'-O3', '-mavx2', '-mfma', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'}; end fprintf('SIMD target: AVX2 + FMA\n'); case 'arm64' @@ -120,17 +129,17 @@ function build_mex() opt_flags = {'/O2', '/fp:fast'}; elseif isOctave && ~isempty(compiler) % GCC on ARM needs explicit CPU target - opt_flags = {'-O3', '-mcpu=apple-m3', '-ftree-vectorize', '-ffast-math'}; + opt_flags = {'-O3', '-mcpu=apple-m3', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'}; else % Clang on Apple Silicon: NEON enabled by default - opt_flags = {'-O3', '-ffast-math'}; + opt_flags = {'-O3', '-ffast-math', '-fno-finite-math-only'}; end fprintf('SIMD target: ARM NEON\n'); otherwise if useMSVC opt_flags = {'/O2', '/fp:fast'}; else - opt_flags = {'-O3', '-ffast-math'}; + opt_flags = {'-O3', '-ffast-math', '-fno-finite-math-only'}; end fprintf('SIMD target: scalar fallback\n'); end @@ -207,7 +216,7 @@ function build_mex() if useMSVC sse_flags = {'/O2', '/arch:SSE2', '/fp:fast'}; else - sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math'}; + sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'}; end compile_mex(src_file, out_name, outDir, include_flag, ... 
[sse_flags, extra_flags], compiler, extra_srcs); @@ -312,7 +321,7 @@ function build_mex() if useMSVC sse_flags = {'/O2', '/arch:SSE2', '/fp:fast'}; else - sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math'}; + sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'}; end compile_mex(srcFile, outName, sensorOutDir, sensorIncFlag, ... [sse_flags, extraFlags], compiler, extraSrcs); diff --git a/libs/FastSense/private/mex_src/to_step_function_mex.c b/libs/FastSense/private/mex_src/to_step_function_mex.c index 618f63af..5ea09ef0 100644 --- a/libs/FastSense/private/mex_src/to_step_function_mex.c +++ b/libs/FastSense/private/mex_src/to_step_function_mex.c @@ -13,8 +13,17 @@ * * Algorithm: * Phase 1: SIMD NaN scan — detect active segments in SIMD_WIDTH chunks - * using self-compare (v == v is false for NaN). Branchless - * conditional store builds the active index array. + * via compare intrinsics (NaN lanes compare unequal to + * themselves). Branchless conditional store builds the + * active index array. + * + * FAST-MATH CONSTRAINT (260610-nwa): this kernel MUST be compiled with + * NaN semantics intact (-fno-finite-math-only after -ffast-math — + * build_mex.m sets this). Under plain -ffast-math the compiler assumes + * no NaNs and folds self-compares — including clang's IR lowering of + * the compare intrinsics — making every NaN segment scan as active. + * Scalar tails use mxIsNaN (an opaque libmx call) so they survive any + * fast-math mode, including MSVC /fp:fast. * Phase 2: SIMD bulk copy to compute segEnds (shifted segBounds). * Phase 3: SIMD gap detection — gather prevEnd/currStart pairs and * compare in SIMD_WIDTH-wide batches. 
@@ -78,7 +87,7 @@ static size_t simd_nan_scan(const double *values, size_t nB, /* Scalar tail */ for (; i < nB; i++) { activeIdx[cnt] = (uint32_t)i; - cnt += (values[i] == values[i]); /* false for NaN */ + cnt += (size_t)(!mxIsNaN(values[i])); /* opaque call: survives fast-math */ } return cnt; } @@ -141,7 +150,7 @@ static size_t simd_nan_scan(const double *values, size_t nB, } for (; i < nB; i++) { activeIdx[cnt] = (uint32_t)i; - cnt += (values[i] == values[i]); + cnt += (size_t)(!mxIsNaN(values[i])); } return cnt; } @@ -203,7 +212,7 @@ static size_t simd_nan_scan(const double *values, size_t nB, } for (; i < nB; i++) { activeIdx[cnt] = (uint32_t)i; - cnt += (values[i] == values[i]); + cnt += (size_t)(!mxIsNaN(values[i])); } return cnt; } @@ -251,7 +260,7 @@ static size_t simd_nan_scan(const double *values, size_t nB, size_t i; for (i = 0; i < nB; i++) { activeIdx[cnt] = (uint32_t)i; - cnt += (values[i] == values[i]); + cnt += (size_t)(!mxIsNaN(values[i])); } return cnt; } diff --git a/libs/FastSense/private/to_step_function_mex.mexmaca64 b/libs/FastSense/private/to_step_function_mex.mexmaca64 index 6134159d..b47ea0f8 100755 Binary files a/libs/FastSense/private/to_step_function_mex.mexmaca64 and b/libs/FastSense/private/to_step_function_mex.mexmaca64 differ diff --git a/libs/SensorThreshold/private/to_step_function_mex.mexmaca64 b/libs/SensorThreshold/private/to_step_function_mex.mexmaca64 index 6134159d..b47ea0f8 100755 Binary files a/libs/SensorThreshold/private/to_step_function_mex.mexmaca64 and b/libs/SensorThreshold/private/to_step_function_mex.mexmaca64 differ diff --git a/tests/suite/TestTagPerfRegression.m b/tests/suite/TestTagPerfRegression.m index e38db79a..d7c1584c 100644 --- a/tests/suite/TestTagPerfRegression.m +++ b/tests/suite/TestTagPerfRegression.m @@ -74,6 +74,21 @@ function invokeBenchOrSkip_(testCase, benchName) testCase.assumeFalse(true, sprintf( ... '%s blocked by pre-existing v2.0-migration bug (%s: %s) — see deferred-items.md', ... 
benchName, ex.identifier, ex.message)); + elseif ~isempty(getenv('CI')) && ~strcmpi(getenv('FASTSENSE_PERF_GATES'), 'strict') + % Shared-CI-runner policy (260610-nwa): every bench in this suite + % gates on wall-clock measurements (relative overhead, absolute ms, + % or micro-timing ratios). On GitHub's shared runners those gates + % produced three false failures across two unrelated PRs in one + % day (consumer-migration 23-35% on identical code that also + % passed; getxy tripping alongside) — runner noise, not + % regressions. On CI, surface the measurement as a SKIP with the + % full diagnostic instead of failing the run. Gates stay HARD on + % developer machines (no CI env var), and a CI job can opt back + % in with FASTSENSE_PERF_GATES=strict (e.g. on a dedicated or + % self-hosted runner). + testCase.assumeFalse(true, sprintf( ... + '%s perf gate tripped on a shared CI runner (report-only; set FASTSENSE_PERF_GATES=strict to enforce): %s', ... + benchName, ex.message)); else % Genuine regression — re-throw so the suite fails. rethrow(ex);
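Reviewer note: the new `elseif` condition combines two environment variables, and its truth table is easy to misread. The sketch below restates only that condition in C (the suite itself is MATLAB). `perf_gate_mode`, `equals_ignore_case`, and the `gate_mode` enum are hypothetical names introduced for illustration. The sketch assumes an unset variable arrives as NULL or an empty string, matching MATLAB's `getenv` returning an empty value for unset variables. It does not model the earlier migration-bug branch, which is checked first in the real test.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum gate_mode { GATE_HARD, GATE_REPORT_ONLY };

/* Case-insensitive string equality, standing in for MATLAB's strcmpi. */
static int equals_ignore_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* ci and perf_gates hold the values of CI and FASTSENSE_PERF_GATES.
 * NULL or "" is treated as unset. */
static enum gate_mode perf_gate_mode(const char *ci, const char *perf_gates)
{
    int on_ci = (ci != NULL && ci[0] != '\0');
    int strict = (perf_gates != NULL && equals_ignore_case(perf_gates, "strict"));
    /* Report-only applies on CI unless strict enforcement was requested. */
    return (on_ci && !strict) ? GATE_REPORT_ONLY : GATE_HARD;
}
```

A caller would pass `getenv("CI")` and `getenv("FASTSENSE_PERF_GATES")` from `<stdlib.h>`. Any non-empty `CI` value counts as "on CI", and any `FASTSENSE_PERF_GATES` value other than `strict` (in any letter case) leaves the gates report-only there.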