Problem
The benchmark CI step Analyze and Filter fails on all three platforms (Linux, Windows, macOS) with:
[!] SECURITY FAILURE: HighSecPolicy ANOVA p-value < 0.01. Opcodes are distinguishable.
| Platform | ANOVA p-value | Run |
| --- | --- | --- |
| Linux | 2.44e-06 | job |
| Windows | 4.29e-05 | job |
| macOS | 1.15e-07 | job |
Linux consistently fails. Windows and macOS fail intermittently.
Root Cause
bench_analyzer.py --fail-on-leak runs a one-way ANOVA across per-opcode timing distributions under HighSecPolicy + RollingKeyOram. The test detects statistically significant differences between opcode execution times (p < 0.01), which indicates a potential timing side-channel.
The DebugPolicy benchmark passes because no security check is applied to it. The HighSecPolicy benchmark fails the check (consistently on Linux, intermittently on Windows and macOS), so the constant-time execution goal is not yet achieved for this policy.
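For anyone reproducing the check locally, here is a minimal sketch of a one-way ANOVA over per-opcode timing groups. It is not the code in bench_analyzer.py: the function names and the synthetic timings are assumptions, and the p-value comes from a permutation test rather than the parametric F-distribution, so that the sketch needs only the standard library.

```python
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of per-opcode timing lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, rounds=2000, seed=0):
    """Share of label shufflings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


# Synthetic timings (hypothetical values, arbitrary units).
leaky = [[100 + i for i in range(6)],
         [200 + i for i in range(6)],
         [300 + i for i in range(6)]]
flat = [[100 + i for i in range(6)] for _ in range(3)]

print("leaky groups flagged:", permutation_p_value(leaky) < 0.01)
print("flat groups flagged:", permutation_p_value(flat) < 0.01)
```

With only a few thousand shufflings the smallest reportable p-value is 1/(rounds + 1), so this sketch can tell whether p is below 0.01 but cannot reproduce values such as 1e-07 from the table.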
Current Workaround
The step uses continue-on-error: true so CI doesn't block:
# .github/workflows/benchmark.yml:131
continue-on-error: true # TODO: fix the side channel attacks
Goals
- ANOVA p-value: Achieve p > 0.01 (opcodes statistically indistinguishable)
- Mutual Information leakage_bits: Reduce to < 10⁻⁴ bits (currently computed in bench_analyzer.py)
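To make the second goal concrete, the sketch below estimates mutual information between the opcode label and a binned timing value. The estimator actually used in bench_analyzer.py is not shown in this issue, so the histogram plug-in approach, the function name `estimate_leakage_bits`, the `bin_width` parameter, and the opcode names are all assumptions for illustration.

```python
import math
from collections import Counter


def estimate_leakage_bits(samples, bin_width):
    """Plug-in estimate of I(opcode; binned timing) in bits.

    samples: list of (opcode, timing) pairs.
    bin_width: width of each timing histogram bin, in timing units.
    """
    n = len(samples)
    binned = [(op, int(t // bin_width)) for op, t in samples]
    joint = Counter(binned)
    op_counts = Counter(op for op, _ in binned)
    bin_counts = Counter(b for _, b in binned)
    mi = 0.0
    for (op, b), c in joint.items():
        # p(op, b) * log2( p(op, b) / (p(op) * p(b)) ), written with counts.
        mi += (c / n) * math.log2(c * n / (op_counts[op] * bin_counts[b]))
    return mi


# Hypothetical opcodes: fully separated timings leak the opcode completely.
separated = [("ADD", 100)] * 4 + [("MUL", 200)] * 4
# Identical timing distributions leak nothing.
identical = [("ADD", 100), ("ADD", 200), ("MUL", 100), ("MUL", 200)]

print(estimate_leakage_bits(separated, bin_width=10))
print(estimate_leakage_bits(identical, bin_width=10))
```

A plug-in estimate like this is biased upward on finite samples, so a measured value slightly above 10⁻⁴ bits may reflect sample size and binning rather than a real leak. That is worth checking before treating the target as missed.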
What Needs to Happen
- Investigate why HighSecPolicy opcode timings are distinguishable (the ANOVA F-stat and per-opcode deltas in the CI artifacts can guide this)
- Achieve constant-time execution across all opcodes under HighSecPolicy, or adjust the statistical threshold / methodology if the current test is too sensitive for shared CI runners
- Drive leakage_bits below 10⁻⁴
- Remove continue-on-error: true once the fix is verified
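As a starting point for the investigation step, the per-opcode deltas can be ranked to see which opcodes pull the ANOVA away from the null. This is a sketch under assumptions: the layout of the CI artifacts is not described here, so the input is taken to be a plain mapping from opcode name to a list of timings, and the opcode names and numbers are made up.

```python
from statistics import fmean


def per_opcode_deltas(timings):
    """Rank opcodes by how far their mean timing sits from the grand mean.

    timings: dict mapping opcode name -> list of timing samples.
    Returns (opcode, delta) pairs, largest absolute delta first.
    """
    grand_mean = fmean([x for ts in timings.values() for x in ts])
    deltas = {op: fmean(ts) - grand_mean for op, ts in timings.items()}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical measurements in arbitrary units.
example = {
    "ADD": [100, 100],
    "MUL": [100, 100],
    "LOAD": [130, 130],
}

for opcode, delta in per_opcode_deltas(example):
    print(f"{opcode:>6} {delta:+.1f}")
```

Opcodes at the top of the ranking are the first candidates for a code path that is not yet constant-time. A ranking that changes from run to run points more toward runner noise than toward a stable leak.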
Context
Recent commits addressing this area:
1880c93 fix(security): constant-time operand resolution to eliminate timing side-channel
334cc34 refactor(security): remove runtime MBA, exclude NATIVE_CALL from ANOVA
fa6035a perf(runtime): reduce fixed overhead without changing security semantics (P1-P8)
a0885e3 test(security): add isolated verify_bb_mac coverage for enc_state evolution