diff --git a/.planning/STATE.md b/.planning/STATE.md
index b7e9e12b..652119ce 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -27,7 +27,7 @@ Phase: — (none active; latest shipped = Phase 1041, MERGED via PR #189 on 2026
 Plan: —
 Milestone: v3.0 FastSense Companion — SHIPPED 2026-04-30; v3.1 Plant Log Integration — SHIPPED 2026-05-19; v4.0 Multi-User LAN Concurrency — SHIPPED 2026-05 via PR #152 (parallel branch); v1.0 perf line — COMPLETE via PR #114. No milestone in flight.
 Status: Phase 1041 complete — inline time-range control (toolbar dropdown + Custom date strip) shipped; PR #189 MERGED 2026-06-03. No planned milestone in flight — repo in polish/housekeeping. Outstanding: 12 wiki-bot dup PRs were closed to 1 (#190) + workflow root-caused (260609-mcz); backlog Phase 999.1 (in-app help system) unplanned; ROADMAP v4.0 boxes stale (shipped on main via #152 — router misreports, see memory gsd-router-stale-v4-misroute).
-Last activity: 2026-06-10 - PR #197 MERGED (dashboard perf pass ~8× idle tick + crash fixes + preview restoration; quick tasks 260609-v5p, 260610-fta, 260610-g0w). Quick task 260610-hwj (review-sweep fixes batch 2: serialization round-trips, gauge MonitorTag construction crash, disk-backed export, listener leaks, marker test helper) shipped via PR #198 from claude/review-fixes-batch2.
+Last activity: 2026-06-10 - PRs #197 + #198 MERGED (perf pass ~8x idle tick, crash fixes, serialization round-trips, leak fixes). Quick task 260610-nwa: -ffast-math NaN-detection fix in MEX kernels (ToStepFunctionMex 9 FAIL -> 13/13) + CI perf gates report-only — PR from claude/mex-nan-fastmath.
 
 ### Note on parallel v4.0 work (main branch state)
 
@@ -106,6 +106,7 @@ Other main PRs (#138, #139, #141, #144, #145, #146) auto-merged without conflict
 | 260610-g0w | Perf round 2 (profiler-driven) + latent preview bug: `getPreviewSeries` now derives bucket count from minmax output (260512 bucket-math bumps nb inside the MEX; old exact-shape check silently returned [] → slider previews missing + 0% preview-cache hits in every mex-on-path session, i.e. all test envs + any session that ran add_fastsense_private_path; clean production used the MATLAB fallback and was unaffected). Vectorized getEventMarkers extraction (isprop 8280→280 per 20 ticks); per-class ismethod cache in computeEventMarkers (~6 ms/tick); stale-banner set-skip; TimeRangeSelector isLive_ guards kill 'Invalid or deleted object' mouse-motion spam from chained WindowButtonMotionFcn closures outliving deleted selectors. Profiled idle tick 26.5→22.5 ms with previews now drawing. perf_fixes 10/10, preview_envelope 7/7 (case 6 → adaptive contract), preview_overlay 10/10, range_selector 2/2, time_window 8/8. | 2026-06-10 | e1079c20 | — | [260610-g0w-fix-getpreviewseries-mex-shape-mismatch-](./quick/260610-g0w-fix-getpreviewseries-mex-shape-mismatch-/) |
 | 260609-v5p | Speed up DashboardEngine live refresh: data-unchanged fast path in FastSenseWidget.update()/refresh() (fingerprint [n,x1,xend,yend], same append-only contract as PreviewCacheKey_) skips updateData/preview-invalidate/formatTimeAxis on idle ticks; single Tag.getXY per tick (updateTimeRangeCache(x) optional arg); refreshEventMarkers_ O(nE²)→O(nE) isequal diff; computeEventMarkers vectorized accumulators + sortrows-based dedup (max-severity-wins preserved, non-finite sev→1); getEventMarkers preallocation + per-unique-severity color lookup; vectorized formatTimeAxis_. New bench_dashboard_live.m (8 Tag-bound widgets × 50k pts + 200 events): idle tick 281→34 ms (~8×), active tick ~50→30 ms. Verified R2025b: test_dashboard_perf_fixes 9/9 (2 new), preview-envelope 7/7, events-toggle 22/22, time-window 8/8, TestDashboardEngine 18/18, TestDashboardEngineEventMarkers 8/8, TestFastSenseWidgetUpdate 2/2, TestFastSenseWidgetEventMarkers 12/12, TestDashboardDirtyFlag 6/6. Known stale test: flat test_dashboard_engine_event_markers case_render assumes one handle per marker — broken since 260508 color-group batching, fails pre- and post-change identically (Octave mirror that self-skips on Octave). | 2026-06-09 | 8cd6443f, c29be759, cbd66937, 98184f36 | — | [260609-v5p-speed-up-dashboardengine-live-refresh-pa](./quick/260609-v5p-speed-up-dashboardengine-live-refresh-pa/) |
 | 260610-hwj | Review-sweep fixes batch 2 (branch claude/review-fixes-batch2, separate PR from the perf pass): GaugeWidget.fromStruct restores Threshold (was Tag — threshold coloring dead after load) + constructor probes for allValues() (pre-v2.0-only method; MonitorTag-bound gauges crashed at construction since the migration); GroupWidget round-trips ExpandedHeight (collapsed groups were stuck collapsed after load); central themeOverride backfill in DashboardWidgetRegistry.fromStruct (dropped on load for every widget except GroupWidget); FastSense exportData routes through lineFullData (disk-backed lines exported empty columns); markerXData test helper parses batched NaN-separated marker polylines (stale since 260508). New test_review_fixes_batch2.m 4/4 R2025b / 3/3+gate Octave 11; event_markers 9/9 (first MATLAB pass ever); SerializerRoundTrip 15/15; Serializer 12/12; toolbar 19/19. | 2026-06-10 | 18387785 | — | [260610-hwj-review-sweep-fixes-batch-2-widget-serial](./quick/260610-hwj-review-sweep-fixes-batch-2-widget-serial/) |
+| 260610-nwa | Fix -ffast-math breaking NaN detection in MEX kernels + make CI perf gates report-only. Root cause: -ffast-math implies -ffinite-math-only → compiler folds the IEEE self-compare NaN test (v==v) to true, incl. clang's IR lowering of NEON/AVX compare intrinsics — to_step_function_mex scanned all-NaN as fully active, NaN gaps rendered as solid steps (all 9 TestToStepFunctionMex failures on ARM64 MATLAB). Fix: build_mex.m appends -fno-finite-math-only after every -ffast-math (6 sites); kernel scalar tails use mxIsNaN (opaque — survives MSVC /fp:fast); constraint documented in both files; ARM64 binary rebuilt, refresh-mex-binaries workflow regenerates the rest. Audit: only this kernel used the idiom (build_store already used isnan with a warning comment; violation_cull uses mxIsNaN; minmax/lttb get pre-segmented data). Also: TestTagPerfRegression timing gates now report-only on shared CI runners (3 false failures across 2 unrelated PRs on 2026-06-10; FASTSENSE_PERF_GATES=strict opts back in; hard locally). Verified R2025b: ToStepFunctionMex 13/13 (was 9 FAIL), StateTag 18/18, gate policy both modes. | 2026-06-10 | (PR #199) | — | [260610-nwa-fix-ffast-math-breaking-nan-detection-in](./quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/) |
 
 ## Progress Bar
 
diff --git a/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-PLAN.md b/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-PLAN.md
new file mode 100644
index 00000000..6dfea11d
--- /dev/null
+++ b/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-PLAN.md
@@ -0,0 +1,39 @@
+---
+quick_id: 260610-nwa
+description: Fix -ffast-math breaking NaN detection in MEX kernels + make CI perf gates report-only
+date: 2026-06-10
+mode: quick-inline
+---
+
+# Quick Task 260610-nwa: fast-math NaN fix + CI perf-gate policy
+
+## Task 1 — -ffast-math NaN bug (chip task_88d3d685)
+Diagnosed 2026-06-09: all 9 TestToStepFunctionMex failures on macOS ARM64 MATLAB.
+-ffast-math implies -ffinite-math-only -> compiler assumes no NaNs -> folds the
+IEEE self-compare NaN test (v == v) to true, including clang's IR lowering of
+NEON/AVX compare intrinsics. to_step_function_mex scanned all-NaN input as fully
+active; NaN gaps in StateTag step functions rendered as solid steps.
+
+Audit: only to_step_function_mex.c used the vulnerable idiom. build_store_mex.c
+already used isnan() with a comment warning about this exact hazard;
+violation_cull_mex.c uses mxIsNaN; minmax/lttb receive pre-segmented NaN-free data.
+
+Fix: (a) build_mex.m appends -fno-finite-math-only after every -ffast-math
+(6 sites; keeps reassociation/FMA wins); (b) scalar tails in the kernel use
+mxIsNaN (opaque libmx call — survives any fast-math mode incl. MSVC /fp:fast);
+(c) fast-math constraint documented in both files. Local mexmaca64 rebuilt
+(both FastSense + SensorThreshold copies); refresh-mex-binaries workflow
+triggers on this change and regenerates all platforms.
+
+## Task 2 — CI perf gates report-only
+All five TestTagPerfRegression benches gate on wall-clock measurements; shared
+GitHub runners produced three false failures across two unrelated PRs on
+2026-06-10. invokeBenchOrSkip_ now converts gate trips into assume-skips that
+carry the measurement diagnostic whenever the CI env var is non-empty (unless
+FASTSENSE_PERF_GATES=strict). Gates stay hard on developer machines.
+
+## Verification
+- to_step_function_mex([1 5 10],[NaN NaN NaN],20) -> empty (was 6 elements).
+- TestToStepFunctionMex 13/13 (was 9 failures). TestStateTag 18/18; flat statetag green.
+- Gate policy: CI=true simulated in-session -> 3 passed / 2 report-only skips / 0 failed;
+  CI unset -> hard gates unchanged.
diff --git a/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-SUMMARY.md b/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-SUMMARY.md
new file mode 100644
index 00000000..10a97f38
--- /dev/null
+++ b/.planning/quick/260610-nwa-fix-ffast-math-breaking-nan-detection-in/260610-nwa-SUMMARY.md
@@ -0,0 +1,21 @@
+---
+quick_id: 260610-nwa
+status: complete
+date: 2026-06-10
+---
+
+# Summary: fast-math NaN fix + CI perf-gate policy
+
+See PLAN.md for the full diagnosis. Both tasks landed:
+- build_mex.m: -fno-finite-math-only after every -ffast-math (6 sites) + rationale comment.
+- to_step_function_mex.c: scalar tails use mxIsNaN; FAST-MATH CONSTRAINT documented.
+- Local ARM64 binary rebuilt (FastSense + SensorThreshold copies); CI refresh workflow
+  regenerates the other platforms (triggers on build_mex.m / mex_src changes).
+- TestTagPerfRegression.invokeBenchOrSkip_: timing gates report-only on CI
+  (FASTSENSE_PERF_GATES=strict opts back in), hard locally.
+
+Verified live R2025b: TestToStepFunctionMex 13/13 (was 9 FAIL), TestStateTag 18/18,
+flat test_statetag green; gate policy verified in both CI and local modes.
+Session gotcha hit again: the fastsense_private_proxy temp dir shadowed the rebuilt
+binary — first verification ran the STALE copy; refreshed proxies before retesting
+(see memory matlab-session-test-gotchas).
diff --git a/libs/FastSense/build_mex.m b/libs/FastSense/build_mex.m
index 72f50b4f..e73b57c0 100644
--- a/libs/FastSense/build_mex.m
+++ b/libs/FastSense/build_mex.m
@@ -105,13 +105,22 @@ function build_mex()
 
     % Set optimization and SIMD flags — MSVC uses /flags while GCC/Clang
     % use -flags.  The boolean useMSVC distinguishes the two conventions.
+    %
+    % -fno-finite-math-only MUST follow -ffast-math (260610-nwa):
+    % -ffast-math implies -ffinite-math-only, which lets the compiler
+    % assume no NaNs and fold the IEEE self-compare NaN test (v == v)
+    % to constant true. Clang lowers even NEON/AVX compare intrinsics to
+    % generic IR compares, so the folding hits them too. That silently broke
+    % to_step_function_mex's NaN-segment scan (all-NaN inputs scanned as
+    % fully active; NaN gaps rendered as solid steps). The reassociation,
+    % FMA-contraction and no-signed-zero wins of -ffast-math are kept.
     useMSVC = ispc && ~isOctave;
     switch arch
         case 'x86_64'
             if useMSVC
                 opt_flags = {'/O2', '/arch:AVX2', '/fp:fast'};
             else
-                opt_flags = {'-O3', '-mavx2', '-mfma', '-ftree-vectorize', '-ffast-math'};
+                opt_flags = {'-O3', '-mavx2', '-mfma', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'};
             end
             fprintf('SIMD target: AVX2 + FMA\n');
         case 'arm64'
@@ -120,17 +129,17 @@ function build_mex()
                 opt_flags = {'/O2', '/fp:fast'};
             elseif isOctave && ~isempty(compiler)
                 % GCC on ARM needs explicit CPU target
-                opt_flags = {'-O3', '-mcpu=apple-m3', '-ftree-vectorize', '-ffast-math'};
+                opt_flags = {'-O3', '-mcpu=apple-m3', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'};
             else
                 % Clang on Apple Silicon: NEON enabled by default
-                opt_flags = {'-O3', '-ffast-math'};
+                opt_flags = {'-O3', '-ffast-math', '-fno-finite-math-only'};
             end
             fprintf('SIMD target: ARM NEON\n');
         otherwise
             if useMSVC
                 opt_flags = {'/O2', '/fp:fast'};
             else
-                opt_flags = {'-O3', '-ffast-math'};
+                opt_flags = {'-O3', '-ffast-math', '-fno-finite-math-only'};
             end
             fprintf('SIMD target: scalar fallback\n');
     end
@@ -207,7 +216,7 @@ function build_mex()
                     if useMSVC
                         sse_flags = {'/O2', '/arch:SSE2', '/fp:fast'};
                     else
-                        sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math'};
+                        sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'};
                     end
                     compile_mex(src_file, out_name, outDir, include_flag, ...
                                 [sse_flags, extra_flags], compiler, extra_srcs);
@@ -312,7 +321,7 @@ function build_mex()
                         if useMSVC
                             sse_flags = {'/O2', '/arch:SSE2', '/fp:fast'};
                         else
-                            sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math'};
+                            sse_flags = {'-O3', '-msse2', '-ftree-vectorize', '-ffast-math', '-fno-finite-math-only'};
                         end
                         compile_mex(srcFile, outName, sensorOutDir, sensorIncFlag, ...
                                     [sse_flags, extraFlags], compiler, extraSrcs);
diff --git a/libs/FastSense/private/mex_src/to_step_function_mex.c b/libs/FastSense/private/mex_src/to_step_function_mex.c
index 618f63af..5ea09ef0 100644
--- a/libs/FastSense/private/mex_src/to_step_function_mex.c
+++ b/libs/FastSense/private/mex_src/to_step_function_mex.c
@@ -13,8 +13,17 @@
  *
  * Algorithm:
  *   Phase 1: SIMD NaN scan — detect active segments in SIMD_WIDTH chunks
- *            using self-compare (v == v is false for NaN).  Branchless
- *            conditional store builds the active index array.
+ *            via compare intrinsics (NaN lanes compare unequal to
+ *            themselves).  Branchless conditional store builds the
+ *            active index array.
+ *
+ *   FAST-MATH CONSTRAINT (260610-nwa): this kernel MUST be compiled with
+ *   NaN semantics intact: -fno-finite-math-only must follow -ffast-math
+ *   (build_mex.m sets this). Under plain -ffast-math the compiler assumes
+ *   no NaNs and folds self-compares to true, including the IR compares
+ *   clang emits for the compare intrinsics, so NaN segments scan as active.
+ *   Scalar tails use mxIsNaN (an opaque libmx call) so they survive any
+ *   fast-math mode, including MSVC /fp:fast.
  *   Phase 2: SIMD bulk copy to compute segEnds (shifted segBounds).
  *   Phase 3: SIMD gap detection — gather prevEnd/currStart pairs and
  *            compare in SIMD_WIDTH-wide batches.
@@ -78,7 +87,7 @@ static size_t simd_nan_scan(const double *values, size_t nB,
     /* Scalar tail */
     for (; i < nB; i++) {
         activeIdx[cnt] = (uint32_t)i;
-        cnt += (values[i] == values[i]); /* false for NaN */
+        cnt += (size_t)(!mxIsNaN(values[i])); /* opaque call: survives fast-math */
     }
     return cnt;
 }
@@ -141,7 +150,7 @@ static size_t simd_nan_scan(const double *values, size_t nB,
     }
     for (; i < nB; i++) {
         activeIdx[cnt] = (uint32_t)i;
-        cnt += (values[i] == values[i]);
+        cnt += (size_t)(!mxIsNaN(values[i]));
     }
     return cnt;
 }
@@ -203,7 +212,7 @@ static size_t simd_nan_scan(const double *values, size_t nB,
     }
     for (; i < nB; i++) {
         activeIdx[cnt] = (uint32_t)i;
-        cnt += (values[i] == values[i]);
+        cnt += (size_t)(!mxIsNaN(values[i]));
     }
     return cnt;
 }
@@ -251,7 +260,7 @@ static size_t simd_nan_scan(const double *values, size_t nB,
     size_t i;
     for (i = 0; i < nB; i++) {
         activeIdx[cnt] = (uint32_t)i;
-        cnt += (values[i] == values[i]);
+        cnt += (size_t)(!mxIsNaN(values[i]));
     }
     return cnt;
 }
diff --git a/libs/FastSense/private/to_step_function_mex.mexmaca64 b/libs/FastSense/private/to_step_function_mex.mexmaca64
index 6134159d..b47ea0f8 100755
Binary files a/libs/FastSense/private/to_step_function_mex.mexmaca64 and b/libs/FastSense/private/to_step_function_mex.mexmaca64 differ
diff --git a/libs/SensorThreshold/private/to_step_function_mex.mexmaca64 b/libs/SensorThreshold/private/to_step_function_mex.mexmaca64
index 6134159d..b47ea0f8 100755
Binary files a/libs/SensorThreshold/private/to_step_function_mex.mexmaca64 and b/libs/SensorThreshold/private/to_step_function_mex.mexmaca64 differ
diff --git a/tests/suite/TestTagPerfRegression.m b/tests/suite/TestTagPerfRegression.m
index e38db79a..d7c1584c 100644
--- a/tests/suite/TestTagPerfRegression.m
+++ b/tests/suite/TestTagPerfRegression.m
@@ -74,6 +74,21 @@ function invokeBenchOrSkip_(testCase, benchName)
             testCase.assumeFalse(true, sprintf( ...
                 '%s blocked by pre-existing v2.0-migration bug (%s: %s) — see deferred-items.md', ...
                 benchName, ex.identifier, ex.message));
+        elseif ~isempty(getenv('CI')) && ~strcmpi(getenv('FASTSENSE_PERF_GATES'), 'strict')
+            % Shared-CI-runner policy (260610-nwa): every bench in this suite
+            % gates on wall-clock measurements (relative overhead, absolute ms,
+            % or micro-timing ratios). On GitHub's shared runners those gates
+            % produced three false failures across two unrelated PRs in one
+            % day (consumer-migration 23-35% on identical code that also
+            % passed; getxy tripping alongside) — runner noise, not
+            % regressions. On CI, surface the measurement as a SKIP with the
+            % full diagnostic instead of failing the run. Gates stay HARD on
+            % developer machines (no CI env var), and a CI job can opt back
+            % in with FASTSENSE_PERF_GATES=strict (e.g. on a dedicated or
+            % self-hosted runner).
+            testCase.assumeFalse(true, sprintf( ...
+                '%s perf gate tripped on a shared CI runner (report-only; set FASTSENSE_PERF_GATES=strict to enforce): %s', ...
+                benchName, ex.message));
         else
             % Genuine regression — re-throw so the suite fails.
             rethrow(ex);