fix(permutation): make bitwise_permute constant-time#165

Merged
coderdan merged 2 commits into main from harden/permutation-bitwise
May 11, 2026

@coderdan coderdan commented May 10, 2026

Stacked on #164 — review/merge that one first.

Summary

  • bitwise_permute previously used BitArray::get_unchecked(secret_index) to fetch bits at key-derived positions. The bit array fits in a single cache line (≤16 bytes) so it's not exploitable by classic cache-line probing, but it's still a microarchitectural hazard — port pressure, store-buffer effects, and other intra-cache-line side channels can leak secret indices on some CPUs.
  • Reimplemented as: unpack input to one byte per bit (MSB-first) → permute through the constant-time permute_array primitive → repack to bytes. This routes the bit-level operation through the same scan-and-mask machinery introduced in fix(permutation): close cache-timing side-channel in permute_array #164.
  • Secret intermediates (bytes, bits, permuted) wrapped in Zeroizing<_> for unwind safety, matching the elementwise pattern.
  • Drops direct bitvec usage in bitwise.rs; the BitArray machinery is no longer needed there.
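
The unpack → permute → repack pipeline described above can be sketched roughly as follows for a single `u8`. This is a minimal illustration, not the crate's code: `ct_eq_mask`, `permute_array_ct`, and `bitwise_permute_u8` are hypothetical names, the convention `out[i] = input[key[i]]` is an assumption about the permutation direction, and the `Zeroizing<_>` wrappers are omitted to keep the block dependency-free. Whether the compiled output is actually constant-time depends on the compiler and target, which is why the PR checks the release asm separately.

```rust
/// Returns 0xFF if `a == b`, 0x00 otherwise, without branching on the inputs.
/// (Hypothetical helper; real code would typically also guard against
/// compiler optimizations that reintroduce branches.)
pub fn ct_eq_mask(a: u8, b: u8) -> u8 {
    let x = a ^ b; // zero iff a == b
    // For any nonzero u8, `x | -x` has its top bit set.
    let nonzero = (x | x.wrapping_neg()) >> 7; // 1 if different, 0 if equal
    nonzero.wrapping_sub(1) // 0xFF if equal, 0x00 if different
}

/// Scan-and-mask permutation: every output slot reads *every* input slot,
/// so no load address depends on the secret `key`.
/// Assumed convention: out[i] = input[key[i]], with `key` a permutation of 0..N.
pub fn permute_array_ct<const N: usize>(input: &[u8; N], key: &[u8; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for i in 0..N {
        let mut acc = 0u8;
        for j in 0..N {
            // N <= 128 in this sketch, so `j as u8` does not truncate.
            let mask = ct_eq_mask(j as u8, key[i]);
            acc |= input[j] & mask; // keeps input[j] only when j == key[i]
        }
        out[i] = acc;
    }
    out
}

/// Unpack to one byte per bit (MSB-first), permute, repack.
pub fn bitwise_permute_u8(x: u8, key: &[u8; 8]) -> u8 {
    // Unpack: bits[0] is the most significant bit.
    let mut bits = [0u8; 8];
    for (i, bit) in bits.iter_mut().enumerate() {
        *bit = (x >> (7 - i)) & 1;
    }

    let permuted = permute_array_ct(&bits, key);

    // Repack in the same MSB-first order.
    let mut out = 0u8;
    for (i, &b) in permuted.iter().enumerate() {
        out |= (b & 1) << (7 - i);
    }
    out
}
```

With the key `[7, 6, 5, 4, 3, 2, 1, 0]` this sketch reverses the bit order of its input, and the inner loop makes the O(N²) cost behind the measured overhead visible.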

Performance impact

Apple silicon (arm64), --release, 500K iterations, key bytes pre-extracted via the public Permute API for the "old" reference:

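| N   | old (variable-time) | new (constant-time) | overhead |
|-----|---------------------|---------------------|----------|
| 8   | 1.9 ns              | 53.9 ns             | 28×      |
| 16  | 5.7 ns              | 267 ns              | 47×      |
| 32  | 34 ns               | 1.45 µs             | 42×      |
| 64  | 80 ns               | 5.6 µs              | 70×      |
| 128 | 137 ns              | 27.3 µs             | 199×     |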

In line with the elementwise overhead — bit-level bitwise_permute reuses permute_array internally, so the cost profile matches. Same SIMD-shuffle follow-up (tbl/pshufb) applies if these sizes show up in a hot path.

Verification

  • cargo test -p vitaminc-permutation — 6 unit + 3 doctests pass (round-trip, key inversion, bitwise correctness across u8/u16/u32/u64/u128 + NonZero variants)
  • cargo fmt --check — clean
  • ct_analyzer on release asm (arm64 + x86_64) — PASSED, 0 errors / 0 warnings
  • Constant-time scan-via-permute_array, no secret-indexed loads remain in bitwise.rs

Test plan

  • Existing bitwise_permute_case exercises N=8/16/32/64/128 across all int and NonZero variants
  • CI green

@coderdan coderdan force-pushed the fix/permutation-cache-timing branch from 9bdec01 to 60fe1e9 Compare May 10, 2026 12:31
@coderdan coderdan force-pushed the harden/permutation-bitwise branch from 28a092f to 2e89ac6 Compare May 10, 2026 12:32

@tobyhede tobyhede left a comment

LGTM.
Adding some unit tests would improve this, but it's not a deal breaker.

coderdan added 2 commits May 11, 2026 15:32
The previous `bitwise_permute` used `BitArray::get_unchecked(secret_index)`
to read bits at secret-derived positions. Even though all sized bit
arrays here fit in a single cache line (≤16 bytes), the secret-indexed
load is still a microarchitectural hazard — port pressure and other
intra-cache-line side channels can leak the index on some CPUs.

Reimplement as: unpack to one byte per bit (MSB-first), permute through
the now-constant-time `permute_array` primitive, repack. This routes the
bit-level operation through the same scan-and-mask machinery that
already protects element-wise permutation.

Secret intermediates (`bytes`, `bits`, `permuted`) are wrapped in
`Zeroizing<_>` for unwind safety, matching the pattern in elementwise.
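
The unwind-safety point rests on `Drop` running during panic unwinding as well as on normal scope exit. A std-only sketch of that idea is below; `WipeOnDrop` and `sum_secret` are hypothetical stand-ins written for illustration, not the `zeroize` crate's `Zeroizing<_>` type, and the volatile-write-plus-fence approach is an assumption about how such a wipe is usually kept from being optimized away.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical stand-in for a zeroize-on-drop wrapper around a byte array.
pub struct WipeOnDrop<const N: usize>(pub [u8; N]);

impl<const N: usize> WipeOnDrop<N> {
    /// Overwrite the contents with zeros.
    pub fn wipe(&mut self) {
        for b in self.0.iter_mut() {
            // Volatile write so the store is not elided as a dead store.
            unsafe { ptr::write_volatile(b, 0) };
        }
        // Keep the compiler from reordering later accesses before the wipe.
        compiler_fence(Ordering::SeqCst);
    }
}

impl<const N: usize> Drop for WipeOnDrop<N> {
    fn drop(&mut self) {
        // Runs on normal scope exit and while unwinding from a panic.
        self.wipe();
    }
}

/// Example use: the intermediate buffer is wiped when `secret` goes out
/// of scope, whichever way the function exits.
pub fn sum_secret(bits: [u8; 8]) -> u32 {
    let secret = WipeOnDrop(bits);
    let total: u32 = secret.0.iter().map(|&b| u32::from(b)).sum();
    total
}
```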

Drops the direct `bitvec` usage in this file; the `BitArray` machinery
is no longer needed.

Bench results extended to cover bitwise_permute. Overhead: 28×–199×
across N=8..128, in line with the elementwise result.

Addresses Toby's review feedback on PR #165: the existing
`bitwise_permute_case` only asserts `output != input`, which is too weak
to catch most plausible regressions in the bit-pack/unpack glue.

Adds four targeted invariant checks:

- `zero_in_zero_out`: 0 -> 0 across every supported width. Catches stray
  bits introduced by sign/endian bugs in the unpack/repack path.
- `all_ones_invariant`: !0 is a fixed point of every bit permutation
  (all bit positions hold 1, so no movement is observable). Catches
  missing-bit bugs at high/low width boundaries.
- `preserves_hamming_weight`: popcount(permute(x)) == popcount(x). The
  strongest single property check available here: a bug that drops or
  duplicates a set bit changes the count (a bit that only lands in the
  wrong position does not). Also justifies the `unsafe new_unchecked` in
  the NonZero impls (popcount > 0 in => popcount > 0 out).
- `is_deterministic`: cheap guard against future nondeterminism
  (hash-seeds, uninit-memory branches, etc.) creeping into the path.

Existing `permute_array` correctness is already covered in elementwise
tests, so these new tests focus on the bit-shuffle glue specific to
`bitwise.rs` rather than retesting the underlying primitive.
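
As a rough illustration of the four invariants, the sketch below checks them against a simple stand-in permutation on `u16`. Everything here is hypothetical: `permute_bits_u16` is a plain, variable-time reference written for this example (it is not the crate's `bitwise_permute`), `rotation_key` is an arbitrary test key, and the check functions only mirror the test names from the commit message.

```rust
/// Stand-in bit permutation (NOT constant-time): output bit `i` takes
/// input bit `key[i]`, counting bits MSB-first. `key` must be a
/// permutation of 0..16.
pub fn permute_bits_u16(x: u16, key: &[u8; 16]) -> u16 {
    let mut out = 0u16;
    for (i, &src) in key.iter().enumerate() {
        let bit = (x >> (15 - src as u32)) & 1;
        out |= bit << (15 - i as u32);
    }
    out
}

/// Arbitrary example key: key[i] = (i + 3) mod 16.
pub fn rotation_key() -> [u8; 16] {
    let mut key = [0u8; 16];
    for (i, k) in key.iter_mut().enumerate() {
        *k = ((i + 3) % 16) as u8;
    }
    key
}

/// 0 -> 0: no stray bits appear in the unpack/repack path.
pub fn zero_in_zero_out(key: &[u8; 16]) -> bool {
    permute_bits_u16(0, key) == 0
}

/// All-ones is a fixed point of every bit permutation.
pub fn all_ones_invariant(key: &[u8; 16]) -> bool {
    permute_bits_u16(u16::MAX, key) == u16::MAX
}

/// A permutation moves bits but never changes how many are set.
pub fn preserves_hamming_weight(x: u16, key: &[u8; 16]) -> bool {
    permute_bits_u16(x, key).count_ones() == x.count_ones()
}

/// Same input and key must give the same output on every call.
pub fn is_deterministic(x: u16, key: &[u8; 16]) -> bool {
    permute_bits_u16(x, key) == permute_bits_u16(x, key)
}
```

Feeding a deliberately broken key with a duplicated index (for example, one that starts `[0, 0, 2, ...]`) into `preserves_hamming_weight` with `x = 0x8000` makes the check fail, since the top bit is copied into two output positions.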
@coderdan coderdan force-pushed the harden/permutation-bitwise branch from 2e89ac6 to 1d247ad Compare May 11, 2026 05:35
@coderdan coderdan changed the base branch from fix/permutation-cache-timing to main May 11, 2026 05:37
@coderdan coderdan closed this May 11, 2026
@coderdan coderdan reopened this May 11, 2026
@coderdan coderdan merged commit c1e0e03 into main May 11, 2026
1 check passed
coderdan added a commit that referenced this pull request May 11, 2026
The wasm32 backend (#163), permutation cache-timing fixes (#164, #165),
and digest 0.11 migration (#161) accumulated enough surface change since
0.1.0-pre4.2 to warrant a minor-version bump rather than another point
release in the 0.1.0-pre4.x series — `0.2.0-pre` opens a new pre-release
track that better signals the scope to downstream consumers.