Wnaf Optimizations #15
Open
42Pupusas wants to merge 2 commits into
For window size w, wNAF digits have a maximum magnitude of 2^(w-1) - 1. The table is indexed by |digit| / 2 (integer division), so the maximum index is (2^(w-1) - 1) / 2 = 2^(w-2) - 1, which requires 2^(w-2) entries. The previous 2^(w-1) allocation computed twice as many odd multiples as needed, wasting point additions during table setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
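The table-size arithmetic in this commit message can be checked with a short sketch. The function names `max_digit_magnitude` and `table_len` are hypothetical and are not part of the `group` crate; the sketch only restates the indexing rule described above.

```rust
/// Largest magnitude a width-`w` wNAF digit can take: 2^(w-1) - 1 (always odd).
fn max_digit_magnitude(w: u32) -> usize {
    (1usize << (w - 1)) - 1
}

/// Number of odd multiples (1P, 3P, 5P, ...) the lookup table has to hold.
fn table_len(w: u32) -> usize {
    // A digit d is looked up at index |d| / 2 (integer division), so the
    // largest index is (2^(w-1) - 1) / 2 and the table needs one more entry.
    max_digit_magnitude(w) / 2 + 1
}

fn main() {
    for w in 2..=8u32 {
        // The closed form from the commit message: 2^(w-2) entries.
        assert_eq!(table_len(w), 1usize << (w - 2));
        println!(
            "w={w}: max |digit| = {}, table entries = {}",
            max_digit_magnitude(w),
            table_len(w)
        );
    }
}
```

For `w = 5` this gives a maximum digit magnitude of 15 and 8 table entries, half of the 16 entries the previous `2^(w-1)` sizing would allocate.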
- WnafBase::multiscalar_mul_array: accepts fixed-size arrays instead of iterators, avoiding the two collect() heap allocations in multiscalar_mul.
- WnafScalar::new/from_le_bytes: use Vec::with_capacity to avoid reallocation during wnaf_form.
- WnafBase::new: use Vec::with_capacity for exact table sizing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
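The allocation difference between an iterator-based and an array-based entry point can be illustrated with a generic sketch. This is not the `group` crate's code: `sum_rows`, `via_iter`, and `via_array` are hypothetical stand-ins, with `sum_rows` playing the role of an inner routine that needs a slice of slices.

```rust
/// Hypothetical inner routine that wants a slice of slices.
fn sum_rows(rows: &[&[i64]]) -> i64 {
    rows.iter().map(|r| r.iter().sum::<i64>()).sum()
}

/// Iterator-based entry point: the item count is unknown at compile time,
/// so the references must be collected into a heap-allocated Vec.
fn via_iter<'a>(rows: impl Iterator<Item = &'a Vec<i64>>) -> i64 {
    let refs: Vec<&[i64]> = rows.map(|r| r.as_slice()).collect(); // heap allocation
    sum_rows(&refs)
}

/// Array-based entry point: N is a const generic, so the slice of slices
/// can live in a stack array and no allocation is needed.
fn via_array<const N: usize>(rows: &[Vec<i64>; N]) -> i64 {
    let refs: [&[i64]; N] = core::array::from_fn(|i| rows[i].as_slice());
    sum_rows(&refs)
}

fn main() {
    let rows = [vec![1, 2, 3], vec![4, 5]];
    // Both entry points compute the same result; only the allocation differs.
    assert_eq!(via_iter(rows.iter()), via_array(&rows));
}
```

The trade-off is that the array variant requires the number of inputs to be known at compile time, which fits a fixed two-term or four-term multiscalar multiplication but not a variable-length batch.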
42Pupusas added a commit to 42Pupusas/elliptic-curves that referenced this pull request on May 14, 2026
Replace custom wNAF implementation (wnaf_128, build_odd_multiples, WnafSlot, wnaf_ladder) with the group crate's WnafBase/WnafScalar types and WnafBase::multiscalar_mul_array.

A new WnafScalar::from_le_bytes constructor accepts short (128-bit) GLV half-scalars, producing ~half the wNAF digits and ~half the doublings in the evaluation loop. multiscalar_mul_array avoids the two collect() heap allocations of the iterator-based multiscalar_mul.

Depends on RustCrypto/group#15 for the group crate changes (wnaf_table size fix, from_le_bytes, multiscalar_mul_array, pre-sized Vec allocations).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
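Why a shorter byte encoding yields fewer digits can be seen in a standalone wNAF sketch. It assumes, as the commit message implies, that one digit is produced per input bit position, so the digit count follows the input length rather than the numeric value. `wnaf_digits` and `eval` are hypothetical helpers written for this illustration; they are not the `group` crate's `wnaf_form`.

```rust
/// Width-`w` wNAF digits (least significant first) of a little-endian byte
/// string. One digit is emitted per bit position that is processed.
fn wnaf_digits(bytes_le: &[u8], w: u32) -> Vec<i64> {
    assert!((2..=16).contains(&w), "window size out of range for this sketch");
    let bit_len = bytes_le.len() * 8;
    // Bit i of the input, treating everything past the end as zero.
    let bit = |i: usize| -> i64 {
        if i < bit_len { i64::from((bytes_le[i / 8] >> (i % 8)) & 1) } else { 0 }
    };
    let (full, half) = (1i64 << w, 1i64 << (w - 1));
    let mut digits = Vec::with_capacity(bit_len + 2 * w as usize);
    let (mut pos, mut carry) = (0usize, 0i64);
    while pos < bit_len || carry != 0 {
        if bit(pos) + carry != 1 {
            // Remaining value is even: emit a zero digit, carry is unchanged.
            digits.push(0);
            pos += 1;
            continue;
        }
        // Remaining value is odd: read a w-bit window and centre it on zero.
        let mut window = carry;
        for j in 0..w as usize {
            window += bit(pos + j) << j;
        }
        let digit = if window >= half { window - full } else { window };
        carry = i64::from(window >= half);
        digits.push(digit);
        // A nonzero digit is always followed by w - 1 zero digits.
        digits.extend(std::iter::repeat(0).take(w as usize - 1));
        pos += w as usize;
    }
    digits
}

/// Recombine digits into an integer (only valid for small test values).
fn eval(digits: &[i64]) -> i128 {
    digits
        .iter()
        .enumerate()
        .filter(|(_, d)| **d != 0)
        .map(|(i, d)| i128::from(*d) << i)
        .sum()
}

fn main() {
    let value: u64 = 0xDEAD_BEEF_0123_4567;
    let (mut short, mut long) = ([0u8; 17], [0u8; 32]);
    short[..8].copy_from_slice(&value.to_le_bytes());
    long[..8].copy_from_slice(&value.to_le_bytes());

    let (ds, dl) = (wnaf_digits(&short, 4), wnaf_digits(&long, 4));
    assert_eq!(eval(&ds), i128::from(value));
    assert_eq!(eval(&dl), i128::from(value));
    println!("17-byte input: {} digits, 32-byte input: {} digits", ds.len(), dl.len());
}
```

An evaluation loop that performs one doubling per digit position does proportionally less work on the 17-byte encoding, even though both encodings represent the same value.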
This is the base work for integrating optimizations discussed here with the proper shared types.
- `wnaf_table` now allocates `2^(w-2)` entries instead of `2^(w-1)`: For window size w, wNAF digits have a maximum magnitude of `2^(w-1) - 1`, so the table is indexed by `|digit| / 2` with a maximum index of `2^(w-2) - 1`. The previous `2^(w-1)` allocation computed twice as many odd multiples as needed, wasting point additions during table setup. For `w=5` this halves table construction from 16 to 8 entries.
- `WnafScalar::from_le_bytes`: Constructs wNAF digits from a raw little-endian byte slice, enabling callers to pass pre-decomposed scalars shorter than the full field representation. This is needed for endomorphism-based (GLV) scalar multiplication, where a 256-bit scalar is split into two ~128-bit halves: using 17-byte inputs instead of 32-byte inputs produces ~half the wNAF digits and ~half the doublings in the evaluation loop.
- `WnafBase::multiscalar_mul_array`: Fixed-size array variant of `multiscalar_mul` that avoids the two `collect::<Vec<_>>()` heap allocations the iterator-based version requires to build the slice-of-slices for `wnaf_multi_exp`.
- Pre-sized `Vec` allocations: `WnafScalar::new`, `from_le_bytes`, and `WnafBase::new` now use `Vec::with_capacity` sized to the exact output length, avoiding potential reallocation during `wnaf_form`/`wnaf_table`.

Verification
- `cargo test` passes (`group` crate).
- With `[patch.crates-io]` pointing at this branch: 211 tests pass across `k256` (90), `p256` (39), `p384` (34), `p521` (26), `sm2` (22).
- `k256` Schnorr verify (`cargo bench --bench schnorr`): the table size fix alone recovered the ~10% regression incurred by using the group crate's wNAF instead of a hand-rolled implementation; combined with `multiscalar_mul_array` and pre-sizing, total performance matches the hand-rolled baseline (~52 µs).
- `perf record` + `perf report` confirm that the table fix roughly halved the time spent in `wnaf_table` (14.6% → 8.4% of total verify time).
- `multiscalar_mul_array` eliminates 2 of 10 per-call heap allocations.

@tarcieri I can add the dhat profiling harness if you are interested, but it adds a dev dependency, so I did not include it.