Claude/optimize sensor resolve 1 xpjc #38
Merged
toStepFunction was O(n²) due to repeated cell array growth and array concatenation inside the loop. Replace with single-pass pre-allocated output: vectorized active-segment detection, vectorized gap detection, and direct index writes with a final trim. Also fix the allChanges concatenation in resolve() Step 1 — pre-compute total length and fill via block copy instead of growing with []. https://claude.ai/code/session_01GgjQM4v4dyk378ZHJbBCTJ
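As an illustration of the allChanges fix described above, here is a minimal C sketch of "pre-compute total length, then fill via block copy". The PR's actual change is in MATLAB's resolve(); the function name `concat_blocks` and its signature are hypothetical and not part of the repository.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the PR): concatenate n_blocks arrays into one
 * freshly allocated buffer. The total length is computed first so the output
 * is allocated exactly once, instead of being regrown for every block. */
double *concat_blocks(const double *const *blocks, const size_t *lens,
                      size_t n_blocks, size_t *total_out)
{
    size_t total = 0;
    for (size_t b = 0; b < n_blocks; ++b) {
        total += lens[b];               /* pass 1: sum the block lengths */
    }
    *total_out = total;
    if (total == 0) {
        return NULL;                    /* nothing to copy */
    }

    double *out = malloc(total * sizeof *out);
    if (out == NULL) {
        return NULL;                    /* allocation failure */
    }

    size_t offset = 0;
    for (size_t b = 0; b < n_blocks; ++b) {
        if (lens[b] > 0) {
            /* pass 2: one block copy per input, written at a running offset */
            memcpy(out + offset, blocks[b], lens[b] * sizeof *out);
        }
        offset += lens[b];
    }
    return out;                         /* caller frees */
}
```

Growing the output inside the loop copies the already-accumulated data again on every iteration, which is where the quadratic cost comes from; the two-pass form touches each element once.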
New MEX file replaces the MATLAB toStepFunction inner loop with a single-pass C implementation: count active segments, pre-allocate output, fill with gap detection, trim once. Eliminates all MATLAB interpreter overhead for this hot path.

- to_step_function_mex.c: C MEX source with pre-allocated buffers
- build_mex.m: register new MEX + copy to SensorThreshold/private
- mergeResolvedByLabel.m: persistent useMex gate dispatches to MEX when compiled, falls back to pure-MATLAB implementation otherwise

https://claude.ai/code/session_01GgjQM4v4dyk378ZHJbBCTJ
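The count / pre-allocate / fill / trim sequence named in this commit can be sketched in plain C as follows. This is not the contents of to_step_function_mex.c. The data layout is an assumption made for illustration: segment `i` starts at `bounds[i]` and ends at `bounds[i+1]` (or at `data_end` for the last one), a NaN value marks an inactive segment, and a NaN pair is emitted as a separator when two consecutive active segments do not touch. The names `StepFn` and `to_step_function_sketch` are hypothetical.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical output container: parallel x/y arrays of length len. */
typedef struct {
    double *x;
    double *y;
    size_t len;
} StepFn;

/* Sketch under assumed semantics (see lead-in). Returns 0 on success,
 * -1 on allocation failure. Caller frees out->x and out->y. */
int to_step_function_sketch(const double *bounds, const double *vals,
                            size_t n, double data_end, StepFn *out)
{
    out->x = NULL;
    out->y = NULL;
    out->len = 0;

    /* Count active (non-NaN) segments; v == v is false only for NaN. */
    size_t n_active = 0;
    for (size_t i = 0; i < n; ++i) {
        n_active += (size_t)(vals[i] == vals[i]);
    }
    if (n_active == 0) {
        return 0;
    }

    /* Pre-allocate the worst case: separator + start + end per segment. */
    size_t cap = 3 * n_active;
    double *x = malloc(cap * sizeof *x);
    double *y = malloc(cap * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1;
    }

    /* Fill by direct index writes, detecting gaps as we go. */
    size_t k = 0;
    int have_prev = 0;
    double prev_end = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (vals[i] != vals[i]) {
            continue;                   /* inactive segment */
        }
        double start = bounds[i];
        double end = (i + 1 < n) ? bounds[i + 1] : data_end;
        if (have_prev && prev_end != start) {
            x[k] = NAN;                 /* gap: break the line */
            y[k] = NAN;
            ++k;
        }
        x[k] = start; y[k] = vals[i]; ++k;
        x[k] = end;   y[k] = vals[i]; ++k;
        prev_end = end;
        have_prev = 1;
    }

    /* Trim once to the number of points actually written. */
    double *tx = realloc(x, k * sizeof *x);
    if (tx != NULL) { x = tx; }
    double *ty = realloc(y, k * sizeof *y);
    if (ty != NULL) { y = ty; }

    out->x = x;
    out->y = y;
    out->len = k;
    return 0;
}
```

With `bounds = {0, 1, 2, 3}`, `vals = {5, NaN, 7, 7}` and `data_end = 4`, this sketch writes seven points: two for the first segment, a NaN separator, and two each for the last two contiguous segments.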
Rewrite with platform-specific SIMD for all hot phases:

- Phase 1: NaN scan uses SIMD self-compare (v==v is false for NaN) with branchless conditional-store index collection and early-exit skip when all lanes are NaN. AVX2: 4 doubles per vector, SSE2/NEON: 2.
- Phase 2: segEnds shifted copy via SIMD load/store (simd_copy).
- Phase 3: Gap detection gathers prevEnd/currStart into packed buffers then uses SIMD compare + movemask (AVX2/SSE2) or lane extract (NEON).
- Phase 5: Final trim-to-size copy via simd_copy.

All four SIMD backends supported: AVX2, SSE2, ARM NEON, scalar fallback. Uses simd_utils.h indirectly (same include path) and adds its own intrinsics directly for NaN-specific ops not in simd_utils.h.

https://claude.ai/code/session_01GgjQM4v4dyk378ZHJbBCTJ
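The Phase 1 idea (self-compare NaN test plus branchless conditional store) can be shown in scalar form, roughly what a scalar fallback might look like. This is a sketch, not the PR's code; the function name `collect_active_indices` is hypothetical, and the SIMD variants are not reproduced here.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical scalar sketch: write the indices of all non-NaN entries of v
 * into idx (which must have room for n entries) and return how many there are.
 * The store is unconditional; only the write cursor advances conditionally,
 * so the loop body contains no data-dependent branch. */
size_t collect_active_indices(const double *v, size_t n, size_t *idx)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        idx[count] = i;                     /* always store at the cursor */
        count += (size_t)(v[i] == v[i]);    /* advance only if v[i] is not NaN */
    }
    return count;
}
```

One caveat worth keeping in mind: the `v == v` trick relies on IEEE 754 NaN semantics, so compiler options that assume finite math (for example GCC's `-ffast-math`) may optimize the comparison away.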
Function-based tests (Octave-compatible):

- All NaN, single active, all contiguous, different values
- NaN gap separator, mixed contiguous+gap, dataEnd edge
- Single boundary, MEX parity check (when compiled)

Class-based MEX parity tests (MATLAB unittest):

- Same edge cases as above, plus:
- 20 randomized small trials with ~40% NaN density
- 100K segment stress test exercising full SIMD paths
- 50K all-active (no gaps) test
- 10K all-NaN large test
- 10K alternating NaN worst-case for gap detection

https://claude.ai/code/session_01GgjQM4v4dyk378ZHJbBCTJ
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'FastSense Performance'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.
| Benchmark suite | Current: d3356a5 | Previous: 763306b | Ratio |
|---|---|---|---|
| Downsample mean std(1M) | 0.069 ms | 0.033 ms | 2.09 |
| Instantiation mean std(1M) | 1.492 ms | 1.082 ms | 1.38 |
| Zoom cycle mean (1M) | 16.405 ms | 14.501 ms | 1.13 |
| Downsample mean std(5M) | 0.085 ms | 0.031 ms | 2.74 |
| Render mean std(5M) | 15.119 ms | 1.436 ms | 10.53 |
| Zoom cycle mean (5M) | 15.82 ms | 13.757 ms | 1.15 |
| Downsample mean std10M) | 0.215 ms | 0.096 ms | 2.24 |
| Instantiation mean std10M) | 1.618 ms | 1.351 ms | 1.20 |
| Render mean std10M) | 4.126 ms | 2.062 ms | 2.00 |
| Zoom cycle mean (10M) | 15.5 ms | 13.693 ms | 1.13 |
| Zoom cycle mean std10M) | 0.982 ms | 0.707 ms | 1.39 |
| Downsample mean std50M) | 1.129 ms | 0.516 ms | 2.19 |
| Zoom cycle mean (50M) | 15.681 ms | 13.608 ms | 1.15 |
| Downsample mean (100M) | 213.427 ms | 190.334 ms | 1.12 |
| Downsample mean ( std00M) | 10.31 ms | 0.463 ms | 22.27 |
| Zoom cycle mean (100M) | 15.812 ms | 13.617 ms | 1.16 |
| Downsample mean ( std00M) | 33.218 ms | 0.463 ms | 71.75 |
| Instantiation mean ( std00M) | 1241.429 ms | 183.504 ms | 6.77 |
| Render mean (500M) | 688.837 ms | 440.434 ms | 1.56 |
| Render mean ( std00M) | 504.688 ms | 2.383 ms | 211.79 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @HanSur94
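For readers checking the table, the Ratio column is consistent with current divided by previous (for example 0.069 / 0.033 ≈ 2.09), and the alert text states a threshold of 1.10. A minimal sketch of that check follows. The function name `exceeds_threshold` is hypothetical, and this is an assumption about the arithmetic, not the benchmark workflow's actual code.

```c
/* Hypothetical check (not from the benchmark workflow): returns 1 when the
 * current timing is worse than the previous one by more than the given
 * ratio threshold, assuming smaller timings are better, and 0 otherwise. */
int exceeds_threshold(double current_ms, double previous_ms, double threshold)
{
    double ratio = current_ms / previous_ms;
    return ratio > threshold;
}
```

Under this reading, a row such as 16.405 ms against 14.501 ms (ratio about 1.13) is flagged, while an unchanged timing (ratio 1.00) is not.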
toStepFunction was a local function inside mergeResolvedByLabel.m, making it invisible outside that file. The Octave test failed because local functions cannot be called from external test files, even when the private directory is on the path. Extracting it to its own .m file in private/ keeps the same encapsulation (only SensorThreshold code can call it) while making it accessible to the test's proxy-directory pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The needs_build check in install.m only probed for binary_search_mex. If older MEX files existed but to_step_function_mex was missing, install() would skip build_mex() entirely. Now probes both binary_search_mex and to_step_function_mex so any missing MEX triggers an incremental rebuild. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stress-tests the full resolve pipeline with 500M datapoints, 2 state channels (~9K total transitions), and 4 threshold rules with different condition types (single-condition, multi-condition, upper, lower). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MEX files in private/ directories are invisible to exist() from outside the parent package. Check actual file paths instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renders the 500M-point sensor with all resolved thresholds and violations after the timing runs complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves a tiny 4-point sensor before the timed runs to force MATLAB's JIT compiler to compile all code paths (Sensor.resolve, binary_search, compute_violations, toStepFunction, mergeResolvedByLabel). This way all 3 timed runs measure steady-state performance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs a tiny end-to-end workflow (Sensor, StateChannel, resolve, FastSense render) on trivial data during install(). This forces MATLAB's JIT to compile all hot code paths once per session, so the first real call to resolve() or render() has no warmup penalty. Uses a persistent flag so repeated install() calls skip the warmup. Wrapped in try/catch so it never blocks installation on failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents a visible window flash during install() and avoids display issues on headless CI runners. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>