Benchmark ANOVA side-channel detection consistently fails on all platforms #16

## Problem

The benchmark CI step **Analyze and Filter** fails on **all three platforms** (Linux, Windows, macOS) with:

```
[!] SECURITY FAILURE: HighSecPolicy ANOVA p-value < 0.01. Opcodes are distinguishable.
```

| Platform | ANOVA p-value | Run |
|----------|--------------|-----|
| Linux | 2.44e-06 | [job](https://github.com/scc-tw/VMPilot/actions/runs/23999694656/job/69993629081) |
| Windows | 4.29e-05 | [job](https://github.com/scc-tw/VMPilot/actions/runs/23999694656/job/69993629075) |
| macOS | 1.15e-07 | [job](https://github.com/scc-tw/VMPilot/actions/runs/23999462671/job/69993045363) |

Linux consistently fails. Windows and macOS fail intermittently.

## Root Cause

`bench_analyzer.py --fail-on-leak` runs a one-way ANOVA across per-opcode timing distributions under HighSecPolicy + RollingKeyOram. The null hypothesis is that every opcode has the same mean execution time. A p-value below 0.01 rejects it, meaning at least one opcode's timing is statistically distinguishable from the others, which indicates a potential timing side channel.

The DebugPolicy benchmark passes because no security check is run against it. The HighSecPolicy benchmark consistently fails: the constant-time execution goal has not yet been achieved for this policy.

## Current Workaround

The step uses `continue-on-error: true` so CI doesn't block:

```yaml
# .github/workflows/benchmark.yml:131
continue-on-error: true # TODO: fix the side channel attacks
```

## Goals

1. **ANOVA p-value**: Achieve p > 0.01 (no statistically significant timing difference between opcodes)
2. **Mutual Information leakage_bits**: Reduce to < 10⁻⁴ bits (currently computed in `bench_analyzer.py`)
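
The estimator used by `bench_analyzer.py` is not shown in this issue. As a rough illustration of what `leakage_bits` measures, the sketch below assumes a simple histogram (plug-in) estimate of the mutual information between the opcode label and a binned execution time; the function name, the equal-width binning, and the `n_bins` default are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Mapping, Sequence


def leakage_bits(samples: Mapping[str, Sequence[float]], n_bins: int = 16) -> float:
    """Plug-in estimate of I(opcode; timing bin) in bits (hypothetical helper)."""
    all_times = [t for ts in samples.values() for t in ts]
    n = len(all_times)
    lo, hi = min(all_times), max(all_times)
    width = (hi - lo) / n_bins

    def bin_of(t: float) -> int:
        if width == 0:
            return 0  # every sample has the same timing, so one bin holds everything
        # Clamp so that t == hi lands in the last bin instead of one past it.
        return min(int((t - lo) / width), n_bins - 1)

    # Joint counts of (opcode, timing bin) and the two marginal distributions.
    joint = Counter((op, bin_of(t)) for op, ts in samples.items() for t in ts)
    p_op = {op: len(ts) / n for op, ts in samples.items()}
    p_bin: defaultdict[int, float] = defaultdict(float)
    for (_, b), count in joint.items():
        p_bin[b] += count / n

    # I(O; T) = sum over cells of p(o, t) * log2(p(o, t) / (p(o) * p(t))).
    mi = 0.0
    for (op, b), count in joint.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / (p_op[op] * p_bin[b]))
    return mi
```

Identical timing distributions for every opcode give 0 bits; two equally frequent opcodes whose timings never overlap give 1 bit. Plug-in estimates like this one are biased upward for finite sample counts, so at a target of 10⁻⁴ bits it may help to compare against the same estimate computed on shuffled opcode labels.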

## What Needs to Happen

1. Investigate why HighSecPolicy opcode timings are distinguishable (the ANOVA F-stat and per-opcode deltas in the CI artifacts can guide this)
2. Achieve constant-time execution across all opcodes under HighSecPolicy, or adjust the statistical threshold / methodology if the current test is too sensitive for shared CI runners
3. Drive `leakage_bits` below 10⁻⁴
4. Remove `continue-on-error: true` once the fix is verified
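
Steps 1 and 2 can be supported by two small diagnostics, sketched below as hypothetical helpers that are not part of `bench_analyzer.py`. `opcode_deltas` ranks opcodes by how far their mean timing sits from the pooled mean, assuming that is how a per-opcode delta is defined (the CI artifacts may use a different baseline). `eta_squared` reports the fraction of timing variance explained by the opcode; unlike the p-value, this effect size does not keep falling as more samples are collected for a fixed difference, so it could be reported alongside the ANOVA when deciding whether the test is too sensitive for shared CI runners.

```python
from typing import Mapping, Sequence


def opcode_deltas(samples: Mapping[str, Sequence[float]]) -> list[tuple[str, float]]:
    """Per-opcode mean minus pooled mean, largest absolute deviation first."""
    pooled = [t for ts in samples.values() for t in ts]
    grand_mean = sum(pooled) / len(pooled)
    deltas = [(op, sum(ts) / len(ts) - grand_mean) for op, ts in samples.items()]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)


def eta_squared(samples: Mapping[str, Sequence[float]]) -> float:
    """Share of total timing variance explained by the opcode: SS_between / SS_total."""
    pooled = [t for ts in samples.values() for t in ts]
    grand_mean = sum(pooled) / len(pooled)
    ss_total = sum((t - grand_mean) ** 2 for t in pooled)
    ss_between = sum(
        len(ts) * (sum(ts) / len(ts) - grand_mean) ** 2 for ts in samples.values()
    )
    return ss_between / ss_total
```

An opcode that tops the `opcode_deltas` ranking on all three platforms is a natural first handler to inspect for data-dependent branches or memory accesses.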

## Context

Recent commits addressing this area:
- `1880c93` fix(security): constant-time operand resolution to eliminate timing side-channel
- `334cc34` refactor(security): remove runtime MBA, exclude NATIVE_CALL from ANOVA
- `fa6035a` perf(runtime): reduce fixed overhead without changing security semantics (P1-P8)
- `a0885e3` test(security): add isolated verify_bb_mac coverage for enc_state evolution
