Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging #1118
hanno-becker wants to merge 1 commit into
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46538 cycles | 46536 cycles | 1.00 |
| ML-DSA-44 sign | 131062 cycles | 131058 cycles | 1.00 |
| ML-DSA-44 verify | 47344 cycles | 47346 cycles | 1.00 |
| ML-DSA-65 keypair | 81686 cycles | 81682 cycles | 1.00 |
| ML-DSA-65 sign | 215381 cycles | 215367 cycles | 1.00 |
| ML-DSA-65 verify | 79305 cycles | 79306 cycles | 1.00 |
| ML-DSA-87 keypair | 132409 cycles | 132411 cycles | 1.00 |
| ML-DSA-87 sign | 277469 cycles | 277415 cycles | 1.00 |
| ML-DSA-87 verify | 134241 cycles | 134234 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113288 cycles | 112746 cycles | 1.00 |
| ML-DSA-44 sign | 403510 cycles | 400854 cycles | 1.01 |
| ML-DSA-44 verify | 121569 cycles | 120116 cycles | 1.01 |
| ML-DSA-65 keypair | 193992 cycles | 192886 cycles | 1.01 |
| ML-DSA-65 sign | 651150 cycles | 649888 cycles | 1.00 |
| ML-DSA-65 verify | 194758 cycles | 192947 cycles | 1.01 |
| ML-DSA-87 keypair | 319376 cycles | 318753 cycles | 1.00 |
| ML-DSA-87 sign | 831047 cycles | 828832 cycles | 1.00 |
| ML-DSA-87 verify | 329038 cycles | 326641 cycles | 1.01 |
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 45392 cycles | 45452 cycles | 1.00 |
| ML-DSA-44 sign | 136338 cycles | 136127 cycles | 1.00 |
| ML-DSA-44 verify | 47463 cycles | 47248 cycles | 1.00 |
| ML-DSA-65 keypair | 78478 cycles | 78548 cycles | 1.00 |
| ML-DSA-65 sign | 221924 cycles | 222310 cycles | 1.00 |
| ML-DSA-65 verify | 77951 cycles | 77415 cycles | 1.01 |
| ML-DSA-87 keypair | 126284 cycles | 124515 cycles | 1.01 |
| ML-DSA-87 sign | 279614 cycles | 275775 cycles | 1.01 |
| ML-DSA-87 verify | 123991 cycles | 122738 cycles | 1.01 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 719602 cycles | 820317 cycles | 0.88 |
| ML-DSA-44 sign | 2246699 cycles | 3224057 cycles | 0.70 |
| ML-DSA-44 verify | 762835 cycles | 917185 cycles | 0.83 |
| ML-DSA-65 keypair | 1251304 cycles | 1391201 cycles | 0.90 |
| ML-DSA-65 sign | 3627167 cycles | 5232394 cycles | 0.69 |
| ML-DSA-65 verify | 1252102 cycles | 1464903 cycles | 0.85 |
| ML-DSA-87 keypair | 2106818 cycles | 2299598 cycles | 0.92 |
| ML-DSA-87 sign | 4794070 cycles | 6620374 cycles | 0.72 |
| ML-DSA-87 verify | 2120585 cycles | 2408309 cycles | 0.88 |
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 94177 cycles | 94270 cycles | 1.00 |
| ML-DSA-44 sign | 310203 cycles | 329827 cycles | 0.94 |
| ML-DSA-44 verify | 97031 cycles | 98781 cycles | 0.98 |
| ML-DSA-65 keypair | 158476 cycles | 161555 cycles | 0.98 |
| ML-DSA-65 sign | 500936 cycles | 538788 cycles | 0.93 |
| ML-DSA-65 verify | 157551 cycles | 160081 cycles | 0.98 |
| ML-DSA-87 keypair | 261642 cycles | 264477 cycles | 0.99 |
| ML-DSA-87 sign | 650338 cycles | 695417 cycles | 0.94 |
| ML-DSA-87 verify | 260657 cycles | 266020 cycles | 0.98 |
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 56284 cycles | 57221 cycles | 0.98 |
| ML-DSA-44 sign | 167843 cycles | 166930 cycles | 1.01 |
| ML-DSA-44 verify | 60027 cycles | 58283 cycles | 1.03 |
| ML-DSA-65 keypair | 99358 cycles | 96734 cycles | 1.03 |
| ML-DSA-65 sign | 272455 cycles | 270287 cycles | 1.01 |
| ML-DSA-65 verify | 99857 cycles | 97285 cycles | 1.03 |
| ML-DSA-87 keypair | 158208 cycles | 161661 cycles | 0.98 |
| ML-DSA-87 sign | 334417 cycles | 335089 cycles | 1.00 |
| ML-DSA-87 verify | 157879 cycles | 153800 cycles | 1.03 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: dfda657 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-87 verify | 158518 cycles | 153800 cycles | 1.03 |
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46964 cycles | 47162 cycles | 1.00 |
| ML-DSA-44 sign | 144418 cycles | 144655 cycles | 1.00 |
| ML-DSA-44 verify | 49951 cycles | 50104 cycles | 1.00 |
| ML-DSA-65 keypair | 84031 cycles | 83041 cycles | 1.01 |
| ML-DSA-65 sign | 232816 cycles | 229850 cycles | 1.01 |
| ML-DSA-65 verify | 83951 cycles | 83119 cycles | 1.01 |
| ML-DSA-87 keypair | 131766 cycles | 131179 cycles | 1.00 |
| ML-DSA-87 sign | 281804 cycles | 281956 cycles | 1.00 |
| ML-DSA-87 verify | 129740 cycles | 129801 cycles | 1.00 |
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 130860 cycles | 133677 cycles | 0.98 |
| ML-DSA-44 sign | 492691 cycles | 522396 cycles | 0.94 |
| ML-DSA-44 verify | 142552 cycles | 146685 cycles | 0.97 |
| ML-DSA-65 keypair | 219946 cycles | 223803 cycles | 0.98 |
| ML-DSA-65 sign | 797000 cycles | 850834 cycles | 0.94 |
| ML-DSA-65 verify | 227832 cycles | 233807 cycles | 0.97 |
| ML-DSA-87 keypair | 366825 cycles | 375278 cycles | 0.98 |
| ML-DSA-87 sign | 1017924 cycles | 1083775 cycles | 0.94 |
| ML-DSA-87 verify | 377803 cycles | 387875 cycles | 0.97 |
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 61870 cycles | 61851 cycles | 1.00 |
| ML-DSA-44 sign | 190809 cycles | 191604 cycles | 1.00 |
| ML-DSA-44 verify | 66296 cycles | 66346 cycles | 1.00 |
| ML-DSA-65 keypair | 111571 cycles | 116244 cycles | 0.96 |
| ML-DSA-65 sign | 320974 cycles | 322314 cycles | 1.00 |
| ML-DSA-65 verify | 111271 cycles | 113021 cycles | 0.98 |
| ML-DSA-87 keypair | 172676 cycles | 172777 cycles | 1.00 |
| ML-DSA-87 sign | 380745 cycles | 384407 cycles | 0.99 |
| ML-DSA-87 verify | 172559 cycles | 174711 cycles | 0.99 |
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115084 cycles | 118552 cycles | 0.97 |
| ML-DSA-44 sign | 407771 cycles | 446050 cycles | 0.91 |
| ML-DSA-44 verify | 123860 cycles | 128938 cycles | 0.96 |
| ML-DSA-65 keypair | 195940 cycles | 202120 cycles | 0.97 |
| ML-DSA-65 sign | 650499 cycles | 718282 cycles | 0.91 |
| ML-DSA-65 verify | 200657 cycles | 207260 cycles | 0.97 |
| ML-DSA-87 keypair | 325260 cycles | 334395 cycles | 0.97 |
| ML-DSA-87 sign | 836275 cycles | 919394 cycles | 0.91 |
| ML-DSA-87 verify | 331657 cycles | 342499 cycles | 0.97 |
Graviton4
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67289 cycles | 67284 cycles | 1.00 |
| ML-DSA-44 sign | 201443 cycles | 201465 cycles | 1.00 |
| ML-DSA-44 verify | 70197 cycles | 70236 cycles | 1.00 |
| ML-DSA-65 keypair | 119340 cycles | 119592 cycles | 1.00 |
| ML-DSA-65 sign | 327977 cycles | 328455 cycles | 1.00 |
| ML-DSA-65 verify | 116780 cycles | 116975 cycles | 1.00 |
| ML-DSA-87 keypair | 196703 cycles | 196660 cycles | 1.00 |
| ML-DSA-87 sign | 425032 cycles | 424673 cycles | 1.00 |
| ML-DSA-87 verify | 193211 cycles | 193003 cycles | 1.00 |
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 144207 cycles | 150243 cycles | 0.96 |
| ML-DSA-44 sign | 482947 cycles | 543993 cycles | 0.89 |
| ML-DSA-44 verify | 152571 cycles | 162793 cycles | 0.94 |
| ML-DSA-65 keypair | 248619 cycles | 253828 cycles | 0.98 |
| ML-DSA-65 sign | 797728 cycles | 879250 cycles | 0.91 |
| ML-DSA-65 verify | 250855 cycles | 261051 cycles | 0.96 |
| ML-DSA-87 keypair | 414648 cycles | 428028 cycles | 0.97 |
| ML-DSA-87 sign | 1021025 cycles | 1133779 cycles | 0.90 |
| ML-DSA-87 verify | 417342 cycles | 438707 cycles | 0.95 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112457 cycles | 112463 cycles | 1.00 |
| ML-DSA-44 sign | 354616 cycles | 354285 cycles | 1.00 |
| ML-DSA-44 verify | 117054 cycles | 117088 cycles | 1.00 |
| ML-DSA-65 keypair | 194541 cycles | 194650 cycles | 1.00 |
| ML-DSA-65 sign | 584282 cycles | 584287 cycles | 1.00 |
| ML-DSA-65 verify | 193240 cycles | 192995 cycles | 1.00 |
| ML-DSA-87 keypair | 320612 cycles | 321252 cycles | 1.00 |
| ML-DSA-87 sign | 748693 cycles | 749933 cycles | 1.00 |
| ML-DSA-87 verify | 317879 cycles | 318651 cycles | 1.00 |
Graviton4 (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 125226 cycles | 128439 cycles | 0.97 |
| ML-DSA-44 sign | 421018 cycles | 444902 cycles | 0.95 |
| ML-DSA-44 verify | 134116 cycles | 136577 cycles | 0.98 |
| ML-DSA-65 keypair | 216986 cycles | 220139 cycles | 0.99 |
| ML-DSA-65 sign | 681415 cycles | 718637 cycles | 0.95 |
| ML-DSA-65 verify | 218343 cycles | 221218 cycles | 0.99 |
| ML-DSA-87 keypair | 361874 cycles | 365464 cycles | 0.99 |
| ML-DSA-87 sign | 886549 cycles | 917775 cycles | 0.97 |
| ML-DSA-87 verify | 368411 cycles | 371436 cycles | 0.99 |
Graviton3
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71566 cycles | 71503 cycles | 1.00 |
| ML-DSA-44 sign | 211564 cycles | 211366 cycles | 1.00 |
| ML-DSA-44 verify | 74848 cycles | 74967 cycles | 1.00 |
| ML-DSA-65 keypair | 125946 cycles | 125922 cycles | 1.00 |
| ML-DSA-65 sign | 347535 cycles | 348013 cycles | 1.00 |
| ML-DSA-65 verify | 123867 cycles | 124042 cycles | 1.00 |
| ML-DSA-87 keypair | 206188 cycles | 206707 cycles | 1.00 |
| ML-DSA-87 sign | 443030 cycles | 447437 cycles | 0.99 |
| ML-DSA-87 verify | 204440 cycles | 204174 cycles | 1.00 |
Graviton3 (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135363 cycles | 137989 cycles | 0.98 |
| ML-DSA-44 sign | 457401 cycles | 481848 cycles | 0.95 |
| ML-DSA-44 verify | 145386 cycles | 148733 cycles | 0.98 |
| ML-DSA-65 keypair | 237653 cycles | 240592 cycles | 0.99 |
| ML-DSA-65 sign | 742121 cycles | 785306 cycles | 0.95 |
| ML-DSA-65 verify | 236161 cycles | 241073 cycles | 0.98 |
| ML-DSA-87 keypair | 390893 cycles | 395138 cycles | 0.99 |
| ML-DSA-87 sign | 958819 cycles | 1005113 cycles | 0.95 |
| ML-DSA-87 verify | 396238 cycles | 403185 cycles | 0.98 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 204952 cycles | 212493 cycles | 0.96 |
| ML-DSA-44 sign | 690273 cycles | 756435 cycles | 0.91 |
| ML-DSA-44 verify | 218086 cycles | 229158 cycles | 0.95 |
| ML-DSA-65 keypair | 369004 cycles | 378664 cycles | 0.97 |
| ML-DSA-65 sign | 1129113 cycles | 1240500 cycles | 0.91 |
| ML-DSA-65 verify | 356558 cycles | 372168 cycles | 0.96 |
| ML-DSA-87 keypair | 589071 cycles | 602034 cycles | 0.98 |
| ML-DSA-87 sign | 1454044 cycles | 1579603 cycles | 0.92 |
| ML-DSA-87 verify | 596654 cycles | 618336 cycles | 0.96 |
Graviton2
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112686 cycles | 112405 cycles | 1.00 |
| ML-DSA-44 sign | 356052 cycles | 354779 cycles | 1.00 |
| ML-DSA-44 verify | 117667 cycles | 117271 cycles | 1.00 |
| ML-DSA-65 keypair | 194353 cycles | 194498 cycles | 1.00 |
| ML-DSA-65 sign | 585143 cycles | 584927 cycles | 1.00 |
| ML-DSA-65 verify | 193113 cycles | 193003 cycles | 1.00 |
| ML-DSA-87 keypair | 321039 cycles | 321197 cycles | 1.00 |
| ML-DSA-87 sign | 749458 cycles | 749906 cycles | 1.00 |
| ML-DSA-87 verify | 318256 cycles | 318296 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 270334 cycles | 270813 cycles | 1.00 |
| ML-DSA-44 sign | 814667 cycles | 814217 cycles | 1.00 |
| ML-DSA-44 verify | 274970 cycles | 273907 cycles | 1.00 |
| ML-DSA-65 keypair | 467712 cycles | 467318 cycles | 1.00 |
| ML-DSA-65 sign | 1367463 cycles | 1320861 cycles | 1.04 |
| ML-DSA-65 verify | 456340 cycles | 451480 cycles | 1.01 |
| ML-DSA-87 keypair | 805783 cycles | 802075 cycles | 1.00 |
| ML-DSA-87 sign | 1881318 cycles | 1880613 cycles | 1.00 |
| ML-DSA-87 verify | 787853 cycles | 779252 cycles | 1.01 |
Graviton2 (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 206083 cycles | 212221 cycles | 0.97 |
| ML-DSA-44 sign | 691174 cycles | 758032 cycles | 0.91 |
| ML-DSA-44 verify | 218535 cycles | 229778 cycles | 0.95 |
| ML-DSA-65 keypair | 369344 cycles | 378417 cycles | 0.98 |
| ML-DSA-65 sign | 1129681 cycles | 1241106 cycles | 0.91 |
| ML-DSA-65 verify | 356857 cycles | 372482 cycles | 0.96 |
| ML-DSA-87 keypair | 589415 cycles | 603782 cycles | 0.98 |
| ML-DSA-87 sign | 1455802 cycles | 1581844 cycles | 0.92 |
| ML-DSA-87 verify | 596976 cycles | 618440 cycles | 0.97 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 391571 cycles | 458494 cycles | 0.85 |
| ML-DSA-44 sign | 1483786 cycles | 2126863 cycles | 0.70 |
| ML-DSA-44 verify | 443220 cycles | 552683 cycles | 0.80 |
| ML-DSA-65 keypair | 676713 cycles | 770631 cycles | 0.88 |
| ML-DSA-65 sign | 2440668 cycles | 3460057 cycles | 0.71 |
| ML-DSA-65 verify | 707703 cycles | 857490 cycles | 0.83 |
| ML-DSA-87 keypair | 1128071 cycles | 1249666 cycles | 0.90 |
| ML-DSA-87 sign | 3173423 cycles | 4303345 cycles | 0.74 |
| ML-DSA-87 verify | 1174442 cycles | 1370001 cycles | 0.86 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 223259 cycles | 222102 cycles | 1.01 |
| ML-DSA-44 sign | 616370 cycles | 622325 cycles | 0.99 |
| ML-DSA-44 verify | 223427 cycles | 227406 cycles | 0.98 |
| ML-DSA-65 keypair | 396333 cycles | 385188 cycles | 1.03 |
| ML-DSA-65 sign | 1033678 cycles | 1017117 cycles | 1.02 |
| ML-DSA-65 verify | 378398 cycles | 371026 cycles | 1.02 |
| ML-DSA-87 keypair | 656252 cycles | 657858 cycles | 1.00 |
| ML-DSA-87 sign | 1362649 cycles | 1413224 cycles | 0.96 |
| ML-DSA-87 verify | 638462 cycles | 647577 cycles | 0.99 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 290502 cycles | 317617 cycles | 0.91 |
| ML-DSA-44 sign | 980030 cycles | 1205867 cycles | 0.81 |
| ML-DSA-44 verify | 311188 cycles | 362564 cycles | 0.86 |
| ML-DSA-65 keypair | 549669 cycles | 577543 cycles | 0.95 |
| ML-DSA-65 sign | 1648606 cycles | 1961272 cycles | 0.84 |
| ML-DSA-65 verify | 505767 cycles | 556667 cycles | 0.91 |
| ML-DSA-87 keypair | 835538 cycles | 912005 cycles | 0.92 |
| ML-DSA-87 sign | 2090280 cycles | 2489549 cycles | 0.84 |
| ML-DSA-87 verify | 852209 cycles | 953121 cycles | 0.89 |
CBMC Results (ML-DSA-87, REDUCE-RAM)
Full Results (196 proofs)
CBMC Results (ML-DSA-65, REDUCE-RAM)
Full Results (196 proofs)
CBMC Results (ML-DSA-44, REDUCE-RAM)
Full Results (196 proofs)
CBMC Results (ML-DSA-87)
Full Results (196 proofs)
CBMC Results (ML-DSA-65)
Full Results (196 proofs)
CBMC Results (ML-DSA-44)
Full Results (196 proofs)
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-65 sign | 1367463 cycles | 1320861 cycles | 1.04 |
Replace the single-layer C-reference forward and inverse NTT in
`mldsa/src/poly.c` with one that merges two layers each.
Also, store each twiddle alongside its precomputed twist, letting
`mld_fqmul(a, b, b_twisted)` drop the multiply with MLDSA_Q^{-1}
that was previously hidden inside `mld_montgomery_reduce`.
Mirrors pq-code-package/mlkem-native#463 and pq-code-package/mlkem-native#683.
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
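As a rough illustration of the twist described in the commit message, the sketch below contrasts a plain Montgomery reduction with a twisted multiplication. It is not the PR's code: all `sketch_*` names are hypothetical, and it assumes MLDSA_Q = 8380417, Q^{-1} mod 2^32 = 58728449, two's-complement wraparound on the casts, and an arithmetic right shift for signed values.

```c
#include <stdint.h>

#define SKETCH_Q 8380417     /* ML-DSA modulus (assumed value of MLDSA_Q) */
#define SKETCH_QINV 58728449 /* Q^{-1} mod 2^32 */

/* Plain Montgomery reduction: returns a * 2^{-32} mod Q.
 * Every call pays for one multiplication by QINV. */
static int32_t sketch_montgomery_reduce(int64_t a)
{
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)SKETCH_QINV);
  return (int32_t)((a - (int64_t)t * SKETCH_Q) >> 32);
}

/* Twist of a constant b: b * QINV mod 2^32, computed once ahead of time. */
static int32_t sketch_twist(int32_t b)
{
  return (int32_t)((uint32_t)b * (uint32_t)SKETCH_QINV);
}

/* Twisted multiply: a * b * 2^{-32} mod Q, with no QINV multiply at run time. */
static int32_t sketch_fqmul_twisted(int32_t a, int32_t b, int32_t b_twisted)
{
  /* Low 32 bits of a * b * QINV, obtained from the precomputed twist. */
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)b_twisted);
  int32_t hi = (int32_t)(((int64_t)a * b) >> 32);
  int32_t t_hi = (int32_t)(((int64_t)t * SKETCH_Q) >> 32);
  /* The low halves of a*b and t*Q agree, so only the high halves matter. */
  return hi - t_hi;
}

/* Returns 1 if both variants agree on a few sample inputs and the result r
 * satisfies r * 2^32 == a * b (mod Q); returns 0 otherwise. */
int sketch_twisted_matches_plain(void)
{
  static const int32_t as[] = {-8380416, -12345, -1, 0,
                               1, 4190208, 8380416, 2000000000};
  static const int32_t bs[] = {-4186625, 25847, 1, -1, 8380416};
  unsigned i, j;
  for (i = 0; i < sizeof(as) / sizeof(as[0]); i++)
  {
    for (j = 0; j < sizeof(bs) / sizeof(bs[0]); j++)
    {
      int64_t prod = (int64_t)as[i] * bs[j];
      int32_t plain = sketch_montgomery_reduce(prod);
      int32_t twisted = sketch_fqmul_twisted(as[i], bs[j], sketch_twist(bs[j]));
      if (plain != twisted)
      {
        return 0;
      }
      if (((int64_t)plain * 4294967296LL - prod) % SKETCH_Q != 0)
      {
        return 0;
      }
    }
  }
  return 1;
}
```

Calling `sketch_twisted_matches_plain()` checks the two variants against each other on a handful of inputs. The point of the twist is that `b_twisted` can sit next to `b` in the twiddle table, so the per-butterfly cost of the QINV multiplication moves to table-generation time.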
rod-chapman left a comment
Looks good. One suggestion to improve proof times.
    # Disable any setting of EXTERNAL_SAT_SOLVER, and choose SMT backend instead
    EXTERNAL_SAT_SOLVER=
    CBMCFLAGS=--bitwuzla
On my laptop, the proof of this new implementation takes 131s with bitwuzla, so I tried z3, which completes the proof in about 26s. I suggest switching to `CBMCFLAGS=--smt2`.
Replace the single-layer C-reference forward and inverse NTT in
`mldsa/src/poly.c` with one that merges two layers each.
Also, store each twiddle alongside its precomputed twist, letting
`mld_fqmul(a, b, b_twisted)` drop the multiply with MLDSA_Q^{-1}
that was previously hidden inside `mld_montgomery_reduce`.
Mirrors pq-code-package/mlkem-native#463 (@rod-chapman) and pq-code-package/mlkem-native#683
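One way to picture the two-layer merging is the sketch below, which runs two consecutive Cooley-Tukey layers on four coefficients at a time, so each coefficient is loaded and stored once per pair of layers instead of once per layer. All names are hypothetical, and the `%`-based multiplication is a toy stand-in for the Montgomery arithmetic; the PR's actual loop structure, twiddle layout and bounds handling may differ.

```c
#include <stdint.h>

#define TOY_Q 8380417 /* ML-DSA modulus */

/* Toy modular multiply; a stand-in for a Montgomery-based fqmul. */
static int32_t toy_mulmod(int32_t a, int32_t b)
{
  return (int32_t)(((int64_t)a * b) % TOY_Q);
}

/* One Cooley-Tukey butterfly on the pair (*x, *y) with twiddle z. */
static void toy_butterfly(int32_t *x, int32_t *y, int32_t z)
{
  int32_t t = toy_mulmod(*y, z);
  *y = *x - t;
  *x = *x + t;
}

/* Reference: a single layer over one block of 2*len coefficients. */
static void toy_layer_single(int32_t *r, unsigned len, int32_t z)
{
  unsigned j;
  for (j = 0; j < len; j++)
  {
    toy_butterfly(&r[j], &r[j + len], z);
  }
}

/* Merged: two consecutive layers over one block of 4*len coefficients.
 * z1 is the outer-layer twiddle, z2 and z3 are the inner-layer twiddles
 * for the lower and upper half of the block. */
static void toy_layer_merged2(int32_t *r, unsigned len, int32_t z1,
                              int32_t z2, int32_t z3)
{
  unsigned j;
  for (j = 0; j < len; j++)
  {
    /* Load four coefficients once. */
    int32_t c0 = r[j], c1 = r[j + len];
    int32_t c2 = r[j + 2 * len], c3 = r[j + 3 * len];
    /* Outer layer: distance 2*len. */
    toy_butterfly(&c0, &c2, z1);
    toy_butterfly(&c1, &c3, z1);
    /* Inner layer: distance len. */
    toy_butterfly(&c0, &c1, z2);
    toy_butterfly(&c2, &c3, z3);
    /* Store four coefficients once. */
    r[j] = c0;
    r[j + len] = c1;
    r[j + 2 * len] = c2;
    r[j + 3 * len] = c3;
  }
}

/* Returns 1 if the merged pass matches two single-layer passes on a small
 * 8-coefficient block with arbitrary twiddles, 0 otherwise. */
int toy_merged_matches_single(void)
{
  int32_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  int32_t b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  unsigned i;
  toy_layer_single(a, 4, 1753);
  toy_layer_single(a, 2, 3073009);
  toy_layer_single(a + 4, 2, 6757063);
  toy_layer_merged2(b, 2, 1753, 3073009, 6757063);
  for (i = 0; i < 8; i++)
  {
    if (a[i] != b[i])
    {
      return 0;
    }
  }
  return 1;
}
```

The merged pass performs the same butterflies in the same order per coefficient as the two separate passes, which is why `toy_merged_matches_single()` can compare the outputs exactly.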