Originally posted as #1656 (comment)
I tried wiring up the generic `group::Wnaf` implementation to provide variable-time scalar multiplication in `k256`, similar to what #1722 did for the generic implementation in `primeorder`. Unfortunately:
high-level operations/point-scalar mul (variable-time)
time: [45.005 µs 45.223 µs 45.495 µs]
change: [+21.943% +22.999% +24.075%] (p = 0.00 < 0.05)
Performance has regressed.
My best guess is that the endomorphism-optimized constant-time scalar multiplication implementation in `k256` is beating the generic wNAF implementation, which is not endomorphism-accelerated.
For comparison, I ran the benchmarks of the libsecp256k1 C library on the same hardware, where the constant-time scalar multiplication implementation in `k256` takes ~37 µs:
Benchmark , Min(us) , Avg(us) , Max(us)
ecmult_gen , 11.4 , 11.6 , 12.4
ecmult_const , 25.7 , 25.9 , 26.2
ecmult_const_xonly , 28.5 , 28.6 , 28.8
ecmult_1p , 20.7 , 20.9 , 21.0
ecmult_0p_g , 14.6 , 14.8 , 15.0
ecmult_1p_g , 12.2 , 12.3 , 12.8
ecmult_multi_0p_g , 14.6 , 14.7 , 14.8
ecmult_multi_1p_g , 12.3 , 12.3 , 12.4
ecmult_multi_2p_g , 11.5 , 11.6 , 11.7
The `*const*` benchmarks are the constant-time implementations. At ~26 µs, their constant-time scalar multiply is ~30% faster than ours (~37 µs).
I believe the benchmark above to compare a `MulVartime` impl against is `ecmult_1p`, which at ~21 µs is ~20% faster than their constant-time implementation (`ecmult_const`).
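For reference, the percentages above follow from the average timings. This is a minimal sketch of the arithmetic; `percent_faster` is a hypothetical helper, and the ~37 µs figure is the approximate `k256` constant-time number quoted earlier:

```rust
/// Percentage by which `candidate_us` is faster than `baseline_us`
/// (positive means the candidate takes less time).
fn percent_faster(baseline_us: f64, candidate_us: f64) -> f64 {
    (1.0 - candidate_us / baseline_us) * 100.0
}

/// Figures taken from the benchmark output above (average microseconds).
const K256_CONST_US: f64 = 37.0; // approximate k256 constant-time mul
const ECMULT_CONST_US: f64 = 25.9; // libsecp256k1 ecmult_const
const ECMULT_1P_US: f64 = 20.9; // libsecp256k1 ecmult_1p

/// libsecp256k1 constant-time vs. k256 constant-time: about 30%.
fn const_vs_k256() -> f64 {
    percent_faster(K256_CONST_US, ECMULT_CONST_US)
}

/// libsecp256k1 variable-time vs. its own constant-time: about 19%.
fn vartime_vs_const() -> f64 {
    percent_faster(ECMULT_CONST_US, ECMULT_1P_US)
}
```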
To get a similar speedup over the constant-time implementation in `k256`, I believe the wNAF implementation would need to be endomorphism-optimized, as it is in the libsecp256k1 C library. That would effectively involve `k256` providing its own implementation of wNAF and using it behind the scenes to optimize `MulVartime` and linear combinations.
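To make the baseline concrete, here is a minimal sketch of plain (non-endomorphism) wNAF scalar multiplication. It is not the `group::Wnaf` or libsecp256k1 code: the additive group of integers modulo a toy constant `M` stands in for the curve group, and `wnaf_digits`, `mul_wnaf`, `add` and `neg` are hypothetical names chosen for illustration.

```rust
/// Toy modulus (assumption): Z/M under addition stands in for the curve group.
const M: u128 = 1_000_000_007;

/// Group "addition" and "negation" in the toy group.
fn add(a: u128, b: u128) -> u128 { (a + b) % M }
fn neg(a: u128) -> u128 { (M - a % M) % M }

/// Recode `k` into width-`w` NAF digits, least significant first.
/// Every nonzero digit is odd and lies strictly between -2^(w-1) and 2^(w-1).
fn wnaf_digits(k: u64, w: u32) -> Vec<i64> {
    assert!((2..=8).contains(&w));
    let width = 1i128 << w;
    let half = width >> 1;
    let mut k = k as i128;
    let mut digits = Vec::new();
    while k > 0 {
        if k & 1 == 1 {
            // Take k mod 2^w as a signed residue, then clear those bits.
            let mut d = k & (width - 1);
            if d >= half {
                d -= width;
            }
            k -= d;
            digits.push(d as i64);
        } else {
            digits.push(0);
        }
        k >>= 1;
    }
    digits
}

/// Variable-time k * P: one doubling per digit, one addition per nonzero digit.
/// Branching on the digit value is what makes this variable-time.
fn mul_wnaf(k: u64, p: u128, w: u32) -> u128 {
    let p = p % M;
    // table[i] = (2i + 1) * P, the odd multiples up to (2^(w-1) - 1) * P.
    let two_p = add(p, p);
    let mut table = vec![p];
    for i in 1..(1usize << (w - 2)) {
        table.push(add(table[i - 1], two_p));
    }
    let mut acc = 0u128; // group identity
    for &d in wnaf_digits(k, w).iter().rev() {
        acc = add(acc, acc); // double
        if d > 0 {
            acc = add(acc, table[(d as usize) / 2]);
        } else if d < 0 {
            acc = add(acc, neg(table[((-d) as usize) / 2]));
        }
    }
    acc
}
```

For a 256-bit scalar this loop performs roughly 256 doublings. An endomorphism-optimized variant would instead split the scalar into two roughly half-length scalars and interleave their wNAF digits over a shared doubling chain, which is the part the generic implementation does not do.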