k256: endomorphism-aware wNAF for vartime scalar multiplication #1745
42Pupusas wants to merge 5 commits
Conversation
Replaces the placeholder MulVartime / MulByGeneratorVartime impls (which just called the constant-time path and had TODOs to match) with a width-5 wNAF that uses the GLV endomorphism to split each scalar into two ~128-bit halves. Schnorr verify: ~62 µs -> ~53 µs (14% faster, no precomputed-tables; ~55 µs with tables). Addresses RustCrypto#1725. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
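For orientation, here is a minimal sketch of the width-5 signed-digit recoding this commit introduces (the helper name and the `u128` residual are illustrative; the real code tracks the residual in 64-bit limbs, and this sketch assumes inputs below 2^127, since the full-range carry case is what a later commit in this PR fixes):

```rust
/// Width-5 wNAF sketch: emits odd digits in [-15, 15] with at least four
/// zeros between nonzero digits, such that sum(digits[i] * 2^i) == k.
/// Assumes k < 2^127 so the add-back on a negative digit cannot overflow.
fn wnaf5(mut k: u128) -> Vec<i8> {
    let mut digits = Vec::new();
    while k != 0 {
        if k & 1 == 1 {
            // Low 5 bits, recentered from 0..32 into the odd range -15..=15.
            let w = (k & 0x1f) as i8;
            let d = if w > 16 { w - 32 } else { w };
            // Subtracting a negative digit adds |d|, clearing the low 5 bits
            // either way (possibly with a carry into bit 5).
            k = (k as i128 - d as i128) as u128;
            digits.push(d);
        } else {
            digits.push(0);
        }
        k >>= 1;
    }
    digits
}
```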
Folds the combined `mul_by_generator_and_mul_add_vartime` into a single wNAF ladder over all 4 GLV sub-scalars (s1, s2 for G and the endomorphism; e1, e2 for P and the endomorphism). One `double()` per step instead of two independent ladders. Factors out a small `WnafSlot` (odd-multiples table + digits) and a `wnaf_ladder` helper so the single-point `mul_vartime` and the combined op share the same loop body. Schnorr verify: ~53 µs -> ~50 µs (no precomputed-tables; ~51 µs with tables). Total vs. pre-wNAF baseline: ~62 µs -> ~50 µs (~19% faster). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
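The shared-doublings loop has roughly this shape (a sketch: the `WnafSlot` field names and the `ladder` signature are illustrative, not the PR's exact code):

```rust
use k256::ProjectivePoint;

/// Illustrative slot: odd multiples [1P, 3P, ..., 15P] plus that
/// sub-scalar's wNAF digit stream. All streams are padded to length `len`.
struct WnafSlot {
    table: [ProjectivePoint; 8],
    digits: Vec<i8>,
}

/// One ladder over all slots: a single double() per bit position, then at
/// most one add (or subtract, for negative digits) per slot.
fn ladder(slots: &[WnafSlot], len: usize) -> ProjectivePoint {
    let mut acc = ProjectivePoint::IDENTITY;
    for i in (0..len).rev() {
        acc = acc.double();
        for slot in slots {
            let d = slot.digits[i];
            if d > 0 {
                acc += slot.table[(d as usize - 1) / 2];
            } else if d < 0 {
                acc -= slot.table[((-d) as usize - 1) / 2];
            }
        }
    }
    acc
}
```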
`wnaf_128` writes into a fixed 130-entry buffer; the bound holds for the current `WNAF_WIDTH = 5` and the ≤128-bit GLV sub-scalars, but it's implicit. Add a `debug_assert!` in the loop so that any future change to `WNAF_WIDTH` that invalidates the bound is caught at test time rather than silently writing out of bounds in worst-case inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
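The guard itself is a one-liner; illustratively (variable names assumed):

```rust
// Inside the digit-emission loop: `pos` is the next write index into the
// fixed 130-entry output buffer. For WNAF_WIDTH = 5 and <=128-bit
// sub-scalars the index can never reach 130; this documents that bound
// and enforces it under test.
debug_assert!(pos < 130, "wNAF output exceeded its buffer; revisit WNAF_WIDTH");
```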
`wnaf_128` tracked the residual scalar in two u64 limbs, but a negative recentered digit adds up to 2^(W-1) − 1 to the value, which can legitimately overflow past bit 127 when the input is close to 2^128 − 1. The old code let `hi.wrapping_add(1)` silently wrap, losing the carried bit and producing a NAF that reconstructs to the wrong value. The GLV decomposition's `(r1, r2)` each have magnitude strictly less than 2^128, so values in the carry-out window are possible (though vanishingly rare in random scalars — which is why the existing randomized tests never caught it). Fix by carrying the overflow bit into a third limb `top` that is absorbed back on the next right-shift. Perf impact is in the noise: the `top` branch is almost never taken and the predictor handles it cleanly. Add two regression tests: - `test_wnaf_128_reconstruction_adversarial` — reconstructs the NAF of a scalar with low 128 bits = 0xFF..FF and asserts it equals 2^128 − 1. - `test_mul_vartime_adversarial_scalars` — end-to-end check that `mul_vartime(P, k)` matches the constant-time reference when `k`'s low 128 bits trigger the carry window. Also add a `debug_assert!` on `idx` in `WnafSlot::apply` to guard the parallel invariant (`idx < WNAF_TABLE_SIZE`) if `WNAF_WIDTH` is ever widened without growing the table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
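A sketch of the fixed limb arithmetic (function and variable names are illustrative): the residual is `lo + 2^64*hi + 2^128*top`, and `top` only ever holds the carry-out bit, which the next right-shift folds back into `hi`.

```rust
/// Subtract a wNAF digit from the residual lo + 2^64*hi + 2^128*top.
fn subtract_digit(lo: &mut u64, hi: &mut u64, top: &mut u64, d: i8) {
    if d >= 0 {
        // Positive digit: plain two-limb subtraction with borrow.
        let (l, borrow) = lo.overflowing_sub(d as u64);
        *lo = l;
        *hi -= borrow as u64; // cannot underflow: the digit <= the residual
    } else {
        // Negative digit: adds |d| <= 15, which can carry all the way out
        // of bit 127 when the residual is near 2^128 - 1. The old code
        // dropped this carry; it now lands in `top`.
        let (l, c0) = lo.overflowing_add((-d) as u64);
        *lo = l;
        let (h, c1) = hi.overflowing_add(c0 as u64);
        *hi = h;
        *top += c1 as u64;
    }
}

/// Shift the residual right by one bit, absorbing `top` back into `hi`.
fn shift_right(lo: &mut u64, hi: &mut u64, top: &mut u64) {
    *lo = (*lo >> 1) | (*hi << 63);
    *hi = (*hi >> 1) | (*top << 63);
    *top = 0;
}
```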
It would be good if this could be implemented in terms of types from the `group` crate. That will probably require API changes to `group`.
that makes sense, I should have thought of that but was kind of tunnel-visioned on this crate. will explore this direction @tarcieri and see if an API change is needed to the `group` crate
Yes, and you should be able to use `WnafBase`/`WnafScalar` for it
42Pupusas force-pushed from 9e3e659 to d22cd8a
Replace custom wNAF implementation (wnaf_128, build_odd_multiples, WnafSlot, wnaf_ladder) with the group crate's WnafBase/WnafScalar types and WnafBase::multiscalar_mul_array. A new WnafScalar::from_le_bytes constructor accepts short (128-bit) GLV half-scalars, producing ~half the wNAF digits and ~half the doublings in the evaluation loop. multiscalar_mul_array avoids the two collect() heap allocations of the iterator-based multiscalar_mul. Depends on RustCrypto/group#15 for the group crate changes (wnaf_table size fix, from_le_bytes, multiscalar_mul_array, pre-sized Vec allocations). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
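For orientation, the pre-existing `group` wNAF types are used roughly like this for the single-point case (a sketch assuming the trait bounds line up for `k256::ProjectivePoint`; the short-scalar `from_le_bytes` constructor and `multiscalar_mul_array` are the group#15 additions, so their exact signatures live in that PR):

```rust
use group::{WnafBase, WnafScalar};
use k256::{ProjectivePoint, Scalar};

// Window-5 wNAF via the existing group API: precompute odd multiples of p
// once, recode k once, then evaluate. Both halves are reusable across calls.
fn mul_vartime(p: ProjectivePoint, k: &Scalar) -> ProjectivePoint {
    let base = WnafBase::<ProjectivePoint, 5>::new(p);
    let scalar = WnafScalar::<Scalar, 5>::new(k);
    &base * &scalar
}
```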
42Pupusas force-pushed from d22cd8a to 577a58b
@tarcieri the new approach maintained most of the performance gains, but still falls a bit behind the hand-rolled version due to some extra allocations that happen in the `group` crate. From my initial exploration, those allocations look avoidable with further `group` API changes. If that approach seems valuable to you, I can follow up with a PR for that
@42Pupusas I would suggest keeping the changes as minimal as possible, as they need to get upstreamed to https://github.com/zkcrypto/group. It does look like there are a few things in the PR you opened which could be split out into their own PRs and upstreamed directly though
Summary
Replaces the placeholder `MulVartime`/`MulByGeneratorVartime` impls for `k256::ProjectivePoint` (which fell back to the constant-time path and had TODOs to match) with a real endomorphism-aware width-5 wNAF, then folds the combined `mul_by_generator_and_mul_add_vartime` into a single shared-doublings ladder over all 4 GLV sub-scalars.
Closes #1725.
What changes
- Splits each scalar into two ~128-bit GLV halves, computes a width-5 signed-digit NAF of each magnitude, and runs a standard left-to-right double-and-add with precomputed odd multiples `[P, 3P, ..., 15P]` (table setup sketched after this list). Sign is folded into the precomputed points at setup.
- Factors out a small `WnafSlot` (odd-multiples table + digits) and a `wnaf_ladder` helper. The combined `a*G + b*P` variant runs one ladder over all 4 GLV slots (G + βG + P + βP), doing one `double()` per step instead of two independent ladders.
- Adds a `debug_assert!` where `wnaf_128` writes into its fixed 130-entry buffer; the bound is currently implicit in `WNAF_WIDTH = 5`, and this makes it explicit at test time.
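As referenced in the first bullet, the odd-multiples setup can be sketched as follows (hypothetical free function; in the PR this lives inside `WnafSlot` construction):

```rust
use k256::ProjectivePoint;

// Precompute [P, 3P, 5P, ..., 15P] for a width-5 window. If the GLV
// sub-scalar was negative, store the negated multiples instead, so the
// ladder body never has to branch on the sub-scalar's sign.
fn odd_multiples(p: ProjectivePoint, negate: bool) -> [ProjectivePoint; 8] {
    let base = if negate { -p } else { p };
    let twice = base.double();
    let mut table = [base; 8];
    for i in 1..8 {
        table[i] = table[i - 1] + twice; // (2i+1)P = (2i-1)P + 2P
    }
    table
}
```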
Perf (Schnorr verify, default features, x86_64)
~62 µs -> ~50 µs, ~19% faster end-to-end. Also speeds up any other user of `MulVartime`/`MulByGeneratorVartime` on the k256 curve.
Test plan
- `cargo test -p k256 --lib --features getrandom` — 89 passed, including randomized comparisons of `mul_vartime` and `mul_and_mul_add_vartime` vs. the constant-time reference (32 iterations each with `ProjectivePoint::generate()` and `Scalar::generate()`; sketched below).
- `cargo bench -p k256 --bench schnorr -- verify` on an idle host confirms the numbers above (criterion-reported change is stable across runs).
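The randomized comparison has roughly this shape (illustrative: `Group::random`/`Field::random` stand in for the PR's `generate()` helpers, and the `mul_vartime` method name is assumed from the `MulVartime` trait):

```rust
use elliptic_curve::group::{ff::Field, Group};
use k256::{ProjectivePoint, Scalar};
use rand_core::OsRng;

// Vartime multiplication must agree bit-for-bit with the constant-time
// `*` operator for random points and scalars.
#[test]
fn mul_vartime_matches_constant_time() {
    for _ in 0..32 {
        let p = ProjectivePoint::random(&mut OsRng);
        let k = Scalar::random(&mut OsRng);
        // `mul_vartime` is an assumed method name; import of the
        // MulVartime trait omitted.
        assert_eq!(p.mul_vartime(&k), p * k);
    }
}
```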
Notes
- `SECURITY:` comments are on the two vartime impls. Only reachable via the `MulVartime`/`MulByGeneratorVartime` traits that callers opt into for non-secret scalars.
- Alternatives were explored (odd-multiples via Montgomery's trick, sketched below; static precomputed G tables with mixed-add; wider window for the G side). The first two regressed perf at this width/scalar size; the last gave ~4% more but added a ~6 KB static, const-generics, and a new `LazyLock` path — not worth the complexity for a single-curve specialization. This PR sticks to the change that's pure upside.
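For reference, Montgomery's trick is batch inversion: it would normalize all the odd multiples to affine with a single field inversion. A generic sketch over `ff::Field` (not the PR's code, which dropped this approach):

```rust
use ff::Field;

/// Montgomery's trick: invert every element of `xs` in place using one
/// field inversion plus ~3n multiplications. All inputs must be nonzero.
fn batch_invert<F: Field>(xs: &mut [F]) {
    if xs.is_empty() {
        return;
    }
    // Prefix products: prefix[i] = x_0 * x_1 * ... * x_i.
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = F::ONE;
    for x in xs.iter() {
        acc *= *x;
        prefix.push(acc);
    }
    // One inversion of the running total, then unwind it back down:
    // at the top of each iteration, inv == (x_0 * ... * x_i)^-1.
    let mut inv = prefix[xs.len() - 1].invert().unwrap();
    for i in (1..xs.len()).rev() {
        let x_inv = inv * prefix[i - 1];
        inv *= xs[i];
        xs[i] = x_inv;
    }
    xs[0] = inv;
}
```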