k256: endomorphism-aware wNAF implementation

Originally posted as https://github.com/RustCrypto/elliptic-curves/issues/1656#issuecomment-4239647512

I tried wiring up the generic `group::Wnaf` implementation to provide variable-time scalar multiplication in `k256`, similar to what #1722 did for the generic implementation in `primeorder`. Unfortunately, the benchmark regressed:

```
high-level operations/point-scalar mul (variable-time)
                        time:   [45.005 µs 45.223 µs 45.495 µs]
                        change: [+21.943% +22.999% +24.075%] (p = 0.00 < 0.05)
                        Performance has regressed.
```

My best guess is that the endomorphism-optimized constant-time scalar multiplication in `k256` is beating the generic wNAF implementation, which is not endomorphism-accelerated.

For comparison, I ran the benchmarks of the libsecp256k1 C library on the same hardware, where the constant-time scalar multiplication implementation in `k256` takes ~37 µs:

```
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)

ecmult_gen                    ,    11.4       ,    11.6       ,    12.4
ecmult_const                  ,    25.7       ,    25.9       ,    26.2
ecmult_const_xonly            ,    28.5       ,    28.6       ,    28.8
ecmult_1p                     ,    20.7       ,    20.9       ,    21.0
ecmult_0p_g                   ,    14.6       ,    14.8       ,    15.0
ecmult_1p_g                   ,    12.2       ,    12.3       ,    12.8
ecmult_multi_0p_g             ,    14.6       ,    14.7       ,    14.8
ecmult_multi_1p_g             ,    12.3       ,    12.3       ,    12.4
ecmult_multi_2p_g             ,    11.5       ,    11.6       ,    11.7
```

The `*const*` benchmarks are the constant-time implementations. Their constant-time scalar multiplication (`ecmult_const`, ~26 µs) is ~30% faster than ours.

I believe the one to compare a `MulVartime` impl against is `ecmult_1p`, which is ~20% faster than their constant-time implementation.

To get a similar speedup over the constant-time implementation in `k256`, I believe the implementation of wNAF would need to be endomorphism-optimized as it is in the libsecp256k1 C library. That would effectively involve `k256` providing its own implementation of wNAF and using it behind the scenes to optimize `MulVartime` and linear combinations.
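To make the idea concrete, here is a toy sketch of why splitting the scalar helps wNAF. It is not `k256` or libsecp256k1 code: the "group" is plain integers under addition, the "endomorphism" is multiplication by a hypothetical `lambda = 2^32`, and the split is a trivial 32/32-bit split of a 64-bit scalar. On secp256k1 the endomorphism is a cheap map on the point and the split needs a lattice-based decomposition, but the effect is the same: two half-length scalars evaluated in one interleaved wNAF pass share their doublings. All function names below are made up for illustration.

```rust
/// Width-`w` NAF recoding of `k`, least-significant digit first.
/// Every nonzero digit is odd and lies strictly between -2^(w-1) and 2^(w-1).
fn wnaf(k: u64, w: u32) -> Vec<i64> {
    assert!((2..=8).contains(&w));
    let mut k = k as u128; // headroom so adding |d| cannot overflow
    let modulus = 1i64 << w;
    let half = 1i64 << (w - 1);
    let mut digits = Vec::new();
    while k != 0 {
        if k & 1 == 1 {
            // Signed residue of k modulo 2^w.
            let mut d = (k & (modulus as u128 - 1)) as i64;
            if d >= half {
                d -= modulus;
            }
            if d < 0 {
                k += (-d) as u128;
            } else {
                k -= d as u128;
            }
            digits.push(d);
        } else {
            digits.push(0);
        }
        k >>= 1;
    }
    digits
}

/// Single-scalar wNAF evaluation in the toy group (integers under addition).
/// Returns the result and the number of doublings performed.
fn mul_wnaf(k: u64, p: i128, w: u32) -> (i128, usize) {
    let mut acc: i128 = 0;
    let mut doublings = 0;
    for &d in wnaf(k, w).iter().rev() {
        acc *= 2; // "point doubling"
        doublings += 1;
        acc += d as i128 * p; // a real impl looks up a precomputed odd multiple
    }
    (acc, doublings)
}

/// Interleaved two-scalar wNAF: computes k1*p1 + k2*p2 with shared doublings.
fn mul2_wnaf(k1: u64, p1: i128, k2: u64, p2: i128, w: u32) -> (i128, usize) {
    let (d1, d2) = (wnaf(k1, w), wnaf(k2, w));
    let mut acc: i128 = 0;
    let mut doublings = 0;
    for i in (0..d1.len().max(d2.len())).rev() {
        acc *= 2;
        doublings += 1;
        if let Some(&d) = d1.get(i) {
            acc += d as i128 * p1;
        }
        if let Some(&d) = d2.get(i) {
            acc += d as i128 * p2;
        }
    }
    (acc, doublings)
}

fn main() {
    let k: u64 = 0xDEAD_BEEF_1234_5678;
    let p: i128 = 7;
    let lambda: i128 = 1 << 32; // toy stand-in for the endomorphism eigenvalue

    let (full, full_dbl) = mul_wnaf(k, p, 5);

    // k = k1 + k2 * lambda, so k*P = k1*P + k2*phi(P) with phi(P) = lambda*P.
    let (k1, k2) = (k & 0xFFFF_FFFF, k >> 32);
    let (split, split_dbl) = mul2_wnaf(k1, p, k2, lambda * p, 5);

    assert_eq!(full, k as i128 * p);
    assert_eq!(split, full);
    println!("doublings: full = {full_dbl}, split = {split_dbl}");
}
```

The split version does roughly half the doublings at the cost of a second small lookup table, which is the saving the generic `group::Wnaf` path cannot get because it knows nothing about the curve's endomorphism.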

k256: endomorphism-aware wNAF implementation #1725

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

k256: endomorphism-aware wNAF implementation #1725

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions