Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging #1118

Draft
hanno-becker wants to merge 1 commit into main from c_ntt_2

Contributor

@hanno-becker hanno-becker commented May 12, 2026

Replace the single-layer C-reference forward and inverse NTT in
mldsa/src/poly.c with versions that each merge two layers per pass.
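
To illustrate what merging two layers means, here is a minimal sketch; it is not the code from this PR. It runs two Cooley-Tukey layers in one pass over each group of four coefficients, so every coefficient is loaded and stored once instead of twice, and it checks that the result matches two separate single-layer passes. The names `mulmod`, `ntt_layer`, `ntt_two_layers`, and `two_layer_merge_matches` are hypothetical, the twiddles are placeholders rather than the real ML-DSA zeta table, and a plain `%` reduction stands in for the Montgomery multiply.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define Q 8380417 /* ML-DSA modulus */

/* Plain modular product in (-Q, Q); stands in for the Montgomery multiply. */
static int32_t mulmod(int32_t a, int32_t b)
{
  return (int32_t)(((int64_t)a * b) % Q);
}

/* One Cooley-Tukey layer: butterflies at distance len, one zeta per block. */
static void ntt_layer(int32_t *r, unsigned n, unsigned len,
                      const int32_t *zetas)
{
  unsigned start, j, k = 0;
  for (start = 0; start < n; start += 2 * len, k++)
  {
    for (j = start; j < start + len; j++)
    {
      int32_t t = mulmod(zetas[k], r[j + len]);
      r[j + len] = (r[j] - t) % Q;
      r[j] = (r[j] + t) % Q;
    }
  }
}

/* Two layers (distances len and len/2) in a single pass over the data. */
static void ntt_two_layers(int32_t *r, unsigned n, unsigned len,
                           const int32_t *z_outer, const int32_t *z_inner)
{
  unsigned start, j, k = 0, h = len / 2;
  assert(len % 2 == 0 && n % (2 * len) == 0);
  for (start = 0; start < n; start += 2 * len, k++)
  {
    for (j = start; j < start + h; j++)
    {
      int32_t a = r[j], b = r[j + h], c = r[j + len], d = r[j + len + h], t;
      /* Outer layer: pairs (a, c) and (b, d) share one zeta. */
      t = mulmod(z_outer[k], c); c = (a - t) % Q; a = (a + t) % Q;
      t = mulmod(z_outer[k], d); d = (b - t) % Q; b = (b + t) % Q;
      /* Inner layer: pairs (a, b) and (c, d) use the two child zetas. */
      t = mulmod(z_inner[2 * k], b); b = (a - t) % Q; a = (a + t) % Q;
      t = mulmod(z_inner[2 * k + 1], d); d = (c - t) % Q; c = (c + t) % Q;
      r[j] = a; r[j + h] = b; r[j + len] = c; r[j + len + h] = d;
    }
  }
}

/* Returns 1 if the merged pass matches two single-layer passes. */
int two_layer_merge_matches(void)
{
  /* Placeholder twiddles, NOT the real ML-DSA zeta table. */
  const int32_t z_outer[1] = {1753};
  const int32_t z_inner[2] = {-3572223, 3765607};
  int32_t x[16], y[16];
  unsigned i;
  for (i = 0; i < 16; i++)
  {
    x[i] = y[i] = (int32_t)(i * 1000003u % Q) - Q / 2;
  }
  ntt_layer(x, 16, 8, z_outer);
  ntt_layer(x, 16, 4, z_inner);
  ntt_two_layers(y, 16, 8, z_outer, z_inner);
  return memcmp(x, y, sizeof x) == 0;
}
```

The arithmetic per coefficient is identical in both variants; only the number of passes over memory changes, which would be consistent with the gains showing up mainly in the no-opt C builds.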

Also, store each twiddle alongside its precomputed twist, letting
mld_fqmul(a, b, b_twisted) drop the multiplication by MLDSA_Q^{-1}
that was previously hidden inside mld_montgomery_reduce.
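
A minimal sketch of the twisted multiply, assuming the twist of a constant `b` is `b * Q^{-1} mod 2^32` with Q = 8380417 and Q^{-1} mod 2^32 = 58728449. The helper names (`as_signed32`, `montgomery_reduce`, `twist`, `fqmul_twisted`, `fqmul_twisted_selfcheck`) are hypothetical and need not match the PR's `mld_fqmul` or `mld_montgomery_reduce`. The self-check confirms that the twisted form returns the same value as a classic Montgomery reduction of `a * b`.

```c
#include <assert.h>
#include <stdint.h>

#define Q 8380417          /* ML-DSA modulus */
#define QINV 58728449u     /* Q^{-1} mod 2^32 */
#define TWO32 4294967296LL /* 2^32 */

/* Read 32 bits as a signed value without implementation-defined casts. */
static int64_t as_signed32(uint32_t u)
{
  return (u < 0x80000000u) ? (int64_t)u : (int64_t)u - TWO32;
}

/* Classic Montgomery reduction: returns a * 2^{-32} mod Q. */
static int32_t montgomery_reduce(int64_t a)
{
  /* This is the multiply by Q^{-1} that the twist removes. */
  int64_t t = as_signed32((uint32_t)((uint32_t)a * QINV));
  /* a - t*Q is a multiple of 2^32, so the division is exact. */
  return (int32_t)((a - t * Q) / TWO32);
}

/* Twist of a constant b, computed once when the table is built. */
static int32_t twist(int32_t b)
{
  return (int32_t)as_signed32((uint32_t)((uint32_t)b * QINV));
}

/* Twisted multiply: Q^{-1} is already folded into b_twisted. */
static int32_t fqmul_twisted(int32_t a, int32_t b, int32_t b_twisted)
{
  int64_t t = as_signed32((uint32_t)((uint32_t)a * (uint32_t)b_twisted));
  return (int32_t)(((int64_t)a * b - t * Q) / TWO32);
}

/* Returns 1 after checking the twisted form against the classic one. */
int fqmul_twisted_selfcheck(void)
{
  const int32_t as[] = {0, 1, -1, 12345678, -2147483647, 2147483647};
  const int32_t bs[] = {1753, -3572223, 3765607, 25847, -8380416};
  unsigned i, j;
  for (i = 0; i < sizeof as / sizeof as[0]; i++)
  {
    for (j = 0; j < sizeof bs / sizeof bs[0]; j++)
    {
      int32_t r = fqmul_twisted(as[i], bs[j], twist(bs[j]));
      /* Same value as reducing the full 64-bit product. */
      assert(r == montgomery_reduce((int64_t)as[i] * bs[j]));
      /* r * 2^32 is congruent to a * b modulo Q. */
      assert(((int64_t)r * TWO32 - (int64_t)as[i] * bs[j]) % Q == 0);
    }
  }
  return 1;
}
```

Because the twist depends only on the constant operand, it can be computed once per twiddle, which leaves two multiplications per butterfly at run time instead of three.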

Mirrors pq-code-package/mlkem-native/463 (@rod-chapman) and pq-code-package/mlkem-native/683

@hanno-becker hanno-becker changed the title from "[WIP] Speed up C-reference NTT/invNTT with twisted zetas + 2-layer me…" to "[WIP] Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging" on May 12, 2026

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46538 cycles | 46536 cycles | 1.00 |
| ML-DSA-44 sign | 131062 cycles | 131058 cycles | 1.00 |
| ML-DSA-44 verify | 47344 cycles | 47346 cycles | 1.00 |
| ML-DSA-65 keypair | 81686 cycles | 81682 cycles | 1.00 |
| ML-DSA-65 sign | 215381 cycles | 215367 cycles | 1.00 |
| ML-DSA-65 verify | 79305 cycles | 79306 cycles | 1.00 |
| ML-DSA-87 keypair | 132409 cycles | 132411 cycles | 1.00 |
| ML-DSA-87 sign | 277469 cycles | 277415 cycles | 1.00 |
| ML-DSA-87 verify | 134241 cycles | 134234 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113288 cycles | 112746 cycles | 1.00 |
| ML-DSA-44 sign | 403510 cycles | 400854 cycles | 1.01 |
| ML-DSA-44 verify | 121569 cycles | 120116 cycles | 1.01 |
| ML-DSA-65 keypair | 193992 cycles | 192886 cycles | 1.01 |
| ML-DSA-65 sign | 651150 cycles | 649888 cycles | 1.00 |
| ML-DSA-65 verify | 194758 cycles | 192947 cycles | 1.01 |
| ML-DSA-87 keypair | 319376 cycles | 318753 cycles | 1.00 |
| ML-DSA-87 sign | 831047 cycles | 828832 cycles | 1.00 |
| ML-DSA-87 verify | 329038 cycles | 326641 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 45392 cycles | 45452 cycles | 1.00 |
| ML-DSA-44 sign | 136338 cycles | 136127 cycles | 1.00 |
| ML-DSA-44 verify | 47463 cycles | 47248 cycles | 1.00 |
| ML-DSA-65 keypair | 78478 cycles | 78548 cycles | 1.00 |
| ML-DSA-65 sign | 221924 cycles | 222310 cycles | 1.00 |
| ML-DSA-65 verify | 77951 cycles | 77415 cycles | 1.01 |
| ML-DSA-87 keypair | 126284 cycles | 124515 cycles | 1.01 |
| ML-DSA-87 sign | 279614 cycles | 275775 cycles | 1.01 |
| ML-DSA-87 verify | 123991 cycles | 122738 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 719602 cycles | 820317 cycles | 0.88 |
| ML-DSA-44 sign | 2246699 cycles | 3224057 cycles | 0.70 |
| ML-DSA-44 verify | 762835 cycles | 917185 cycles | 0.83 |
| ML-DSA-65 keypair | 1251304 cycles | 1391201 cycles | 0.90 |
| ML-DSA-65 sign | 3627167 cycles | 5232394 cycles | 0.69 |
| ML-DSA-65 verify | 1252102 cycles | 1464903 cycles | 0.85 |
| ML-DSA-87 keypair | 2106818 cycles | 2299598 cycles | 0.92 |
| ML-DSA-87 sign | 4794070 cycles | 6620374 cycles | 0.72 |
| ML-DSA-87 verify | 2120585 cycles | 2408309 cycles | 0.88 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 94177 cycles | 94270 cycles | 1.00 |
| ML-DSA-44 sign | 310203 cycles | 329827 cycles | 0.94 |
| ML-DSA-44 verify | 97031 cycles | 98781 cycles | 0.98 |
| ML-DSA-65 keypair | 158476 cycles | 161555 cycles | 0.98 |
| ML-DSA-65 sign | 500936 cycles | 538788 cycles | 0.93 |
| ML-DSA-65 verify | 157551 cycles | 160081 cycles | 0.98 |
| ML-DSA-87 keypair | 261642 cycles | 264477 cycles | 0.99 |
| ML-DSA-87 sign | 650338 cycles | 695417 cycles | 0.94 |
| ML-DSA-87 verify | 260657 cycles | 266020 cycles | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 56284 cycles | 57221 cycles | 0.98 |
| ML-DSA-44 sign | 167843 cycles | 166930 cycles | 1.01 |
| ML-DSA-44 verify | 60027 cycles | 58283 cycles | 1.03 |
| ML-DSA-65 keypair | 99358 cycles | 96734 cycles | 1.03 |
| ML-DSA-65 sign | 272455 cycles | 270287 cycles | 1.01 |
| ML-DSA-65 verify | 99857 cycles | 97285 cycles | 1.03 |
| ML-DSA-87 keypair | 158208 cycles | 161661 cycles | 0.98 |
| ML-DSA-87 sign | 334417 cycles | 335089 cycles | 1.00 |
| ML-DSA-87 verify | 157879 cycles | 153800 cycles | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: dfda657 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-87 verify | 158518 cycles | 153800 cycles | 1.03 |
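
The alert fires at a displayed ratio of 1.03, while the c6a table for 38f3f91 also shows several rows at 1.03 without an alert. One plausible reading, assumed here rather than taken from the github-action-benchmark documentation, is that the comparison uses the unrounded ratio and a strict "greater than" test. The sketch below uses hypothetical helper names to show the arithmetic for the two ML-DSA-87 verify measurements.

```c
#include <assert.h>

/* Ratio as printed in the tables: current cycles over previous cycles. */
double bench_ratio(long current, long previous)
{
  assert(previous > 0);
  return (double)current / (double)previous;
}

/* Assumed alert rule: strictly above the threshold, before rounding. */
int exceeds_threshold(double ratio, double threshold)
{
  return ratio > threshold;
}
```

Under that assumption, 158518 / 153800 (commit dfda657) lies just above 1.03 and triggers the alert, whereas 157879 / 153800 (commit 38f3f91) lies just below it and only rounds to 1.03.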

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46964 cycles | 47162 cycles | 1.00 |
| ML-DSA-44 sign | 144418 cycles | 144655 cycles | 1.00 |
| ML-DSA-44 verify | 49951 cycles | 50104 cycles | 1.00 |
| ML-DSA-65 keypair | 84031 cycles | 83041 cycles | 1.01 |
| ML-DSA-65 sign | 232816 cycles | 229850 cycles | 1.01 |
| ML-DSA-65 verify | 83951 cycles | 83119 cycles | 1.01 |
| ML-DSA-87 keypair | 131766 cycles | 131179 cycles | 1.00 |
| ML-DSA-87 sign | 281804 cycles | 281956 cycles | 1.00 |
| ML-DSA-87 verify | 129740 cycles | 129801 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 130860 cycles | 133677 cycles | 0.98 |
| ML-DSA-44 sign | 492691 cycles | 522396 cycles | 0.94 |
| ML-DSA-44 verify | 142552 cycles | 146685 cycles | 0.97 |
| ML-DSA-65 keypair | 219946 cycles | 223803 cycles | 0.98 |
| ML-DSA-65 sign | 797000 cycles | 850834 cycles | 0.94 |
| ML-DSA-65 verify | 227832 cycles | 233807 cycles | 0.97 |
| ML-DSA-87 keypair | 366825 cycles | 375278 cycles | 0.98 |
| ML-DSA-87 sign | 1017924 cycles | 1083775 cycles | 0.94 |
| ML-DSA-87 verify | 377803 cycles | 387875 cycles | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 61870 cycles | 61851 cycles | 1.00 |
| ML-DSA-44 sign | 190809 cycles | 191604 cycles | 1.00 |
| ML-DSA-44 verify | 66296 cycles | 66346 cycles | 1.00 |
| ML-DSA-65 keypair | 111571 cycles | 116244 cycles | 0.96 |
| ML-DSA-65 sign | 320974 cycles | 322314 cycles | 1.00 |
| ML-DSA-65 verify | 111271 cycles | 113021 cycles | 0.98 |
| ML-DSA-87 keypair | 172676 cycles | 172777 cycles | 1.00 |
| ML-DSA-87 sign | 380745 cycles | 384407 cycles | 0.99 |
| ML-DSA-87 verify | 172559 cycles | 174711 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115084 cycles | 118552 cycles | 0.97 |
| ML-DSA-44 sign | 407771 cycles | 446050 cycles | 0.91 |
| ML-DSA-44 verify | 123860 cycles | 128938 cycles | 0.96 |
| ML-DSA-65 keypair | 195940 cycles | 202120 cycles | 0.97 |
| ML-DSA-65 sign | 650499 cycles | 718282 cycles | 0.91 |
| ML-DSA-65 verify | 200657 cycles | 207260 cycles | 0.97 |
| ML-DSA-87 keypair | 325260 cycles | 334395 cycles | 0.97 |
| ML-DSA-87 sign | 836275 cycles | 919394 cycles | 0.91 |
| ML-DSA-87 verify | 331657 cycles | 342499 cycles | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton4

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67289 cycles | 67284 cycles | 1.00 |
| ML-DSA-44 sign | 201443 cycles | 201465 cycles | 1.00 |
| ML-DSA-44 verify | 70197 cycles | 70236 cycles | 1.00 |
| ML-DSA-65 keypair | 119340 cycles | 119592 cycles | 1.00 |
| ML-DSA-65 sign | 327977 cycles | 328455 cycles | 1.00 |
| ML-DSA-65 verify | 116780 cycles | 116975 cycles | 1.00 |
| ML-DSA-87 keypair | 196703 cycles | 196660 cycles | 1.00 |
| ML-DSA-87 sign | 425032 cycles | 424673 cycles | 1.00 |
| ML-DSA-87 verify | 193211 cycles | 193003 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 144207 cycles | 150243 cycles | 0.96 |
| ML-DSA-44 sign | 482947 cycles | 543993 cycles | 0.89 |
| ML-DSA-44 verify | 152571 cycles | 162793 cycles | 0.94 |
| ML-DSA-65 keypair | 248619 cycles | 253828 cycles | 0.98 |
| ML-DSA-65 sign | 797728 cycles | 879250 cycles | 0.91 |
| ML-DSA-65 verify | 250855 cycles | 261051 cycles | 0.96 |
| ML-DSA-87 keypair | 414648 cycles | 428028 cycles | 0.97 |
| ML-DSA-87 sign | 1021025 cycles | 1133779 cycles | 0.90 |
| ML-DSA-87 verify | 417342 cycles | 438707 cycles | 0.95 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112457 cycles | 112463 cycles | 1.00 |
| ML-DSA-44 sign | 354616 cycles | 354285 cycles | 1.00 |
| ML-DSA-44 verify | 117054 cycles | 117088 cycles | 1.00 |
| ML-DSA-65 keypair | 194541 cycles | 194650 cycles | 1.00 |
| ML-DSA-65 sign | 584282 cycles | 584287 cycles | 1.00 |
| ML-DSA-65 verify | 193240 cycles | 192995 cycles | 1.00 |
| ML-DSA-87 keypair | 320612 cycles | 321252 cycles | 1.00 |
| ML-DSA-87 sign | 748693 cycles | 749933 cycles | 1.00 |
| ML-DSA-87 verify | 317879 cycles | 318651 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 125226 cycles | 128439 cycles | 0.97 |
| ML-DSA-44 sign | 421018 cycles | 444902 cycles | 0.95 |
| ML-DSA-44 verify | 134116 cycles | 136577 cycles | 0.98 |
| ML-DSA-65 keypair | 216986 cycles | 220139 cycles | 0.99 |
| ML-DSA-65 sign | 681415 cycles | 718637 cycles | 0.95 |
| ML-DSA-65 verify | 218343 cycles | 221218 cycles | 0.99 |
| ML-DSA-87 keypair | 361874 cycles | 365464 cycles | 0.99 |
| ML-DSA-87 sign | 886549 cycles | 917775 cycles | 0.97 |
| ML-DSA-87 verify | 368411 cycles | 371436 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton3

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71566 cycles | 71503 cycles | 1.00 |
| ML-DSA-44 sign | 211564 cycles | 211366 cycles | 1.00 |
| ML-DSA-44 verify | 74848 cycles | 74967 cycles | 1.00 |
| ML-DSA-65 keypair | 125946 cycles | 125922 cycles | 1.00 |
| ML-DSA-65 sign | 347535 cycles | 348013 cycles | 1.00 |
| ML-DSA-65 verify | 123867 cycles | 124042 cycles | 1.00 |
| ML-DSA-87 keypair | 206188 cycles | 206707 cycles | 1.00 |
| ML-DSA-87 sign | 443030 cycles | 447437 cycles | 0.99 |
| ML-DSA-87 verify | 204440 cycles | 204174 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135363 cycles | 137989 cycles | 0.98 |
| ML-DSA-44 sign | 457401 cycles | 481848 cycles | 0.95 |
| ML-DSA-44 verify | 145386 cycles | 148733 cycles | 0.98 |
| ML-DSA-65 keypair | 237653 cycles | 240592 cycles | 0.99 |
| ML-DSA-65 sign | 742121 cycles | 785306 cycles | 0.95 |
| ML-DSA-65 verify | 236161 cycles | 241073 cycles | 0.98 |
| ML-DSA-87 keypair | 390893 cycles | 395138 cycles | 0.99 |
| ML-DSA-87 sign | 958819 cycles | 1005113 cycles | 0.95 |
| ML-DSA-87 verify | 396238 cycles | 403185 cycles | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 204952 cycles | 212493 cycles | 0.96 |
| ML-DSA-44 sign | 690273 cycles | 756435 cycles | 0.91 |
| ML-DSA-44 verify | 218086 cycles | 229158 cycles | 0.95 |
| ML-DSA-65 keypair | 369004 cycles | 378664 cycles | 0.97 |
| ML-DSA-65 sign | 1129113 cycles | 1240500 cycles | 0.91 |
| ML-DSA-65 verify | 356558 cycles | 372168 cycles | 0.96 |
| ML-DSA-87 keypair | 589071 cycles | 602034 cycles | 0.98 |
| ML-DSA-87 sign | 1454044 cycles | 1579603 cycles | 0.92 |
| ML-DSA-87 verify | 596654 cycles | 618336 cycles | 0.96 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton2

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112686 cycles | 112405 cycles | 1.00 |
| ML-DSA-44 sign | 356052 cycles | 354779 cycles | 1.00 |
| ML-DSA-44 verify | 117667 cycles | 117271 cycles | 1.00 |
| ML-DSA-65 keypair | 194353 cycles | 194498 cycles | 1.00 |
| ML-DSA-65 sign | 585143 cycles | 584927 cycles | 1.00 |
| ML-DSA-65 verify | 193113 cycles | 193003 cycles | 1.00 |
| ML-DSA-87 keypair | 321039 cycles | 321197 cycles | 1.00 |
| ML-DSA-87 sign | 749458 cycles | 749906 cycles | 1.00 |
| ML-DSA-87 verify | 318256 cycles | 318296 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 270334 cycles | 270813 cycles | 1.00 |
| ML-DSA-44 sign | 814667 cycles | 814217 cycles | 1.00 |
| ML-DSA-44 verify | 274970 cycles | 273907 cycles | 1.00 |
| ML-DSA-65 keypair | 467712 cycles | 467318 cycles | 1.00 |
| ML-DSA-65 sign | 1367463 cycles | 1320861 cycles | 1.04 |
| ML-DSA-65 verify | 456340 cycles | 451480 cycles | 1.01 |
| ML-DSA-87 keypair | 805783 cycles | 802075 cycles | 1.00 |
| ML-DSA-87 sign | 1881318 cycles | 1880613 cycles | 1.00 |
| ML-DSA-87 verify | 787853 cycles | 779252 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 206083 cycles | 212221 cycles | 0.97 |
| ML-DSA-44 sign | 691174 cycles | 758032 cycles | 0.91 |
| ML-DSA-44 verify | 218535 cycles | 229778 cycles | 0.95 |
| ML-DSA-65 keypair | 369344 cycles | 378417 cycles | 0.98 |
| ML-DSA-65 sign | 1129681 cycles | 1241106 cycles | 0.91 |
| ML-DSA-65 verify | 356857 cycles | 372482 cycles | 0.96 |
| ML-DSA-87 keypair | 589415 cycles | 603782 cycles | 0.98 |
| ML-DSA-87 sign | 1455802 cycles | 1581844 cycles | 0.92 |
| ML-DSA-87 verify | 596976 cycles | 618440 cycles | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 391571 cycles | 458494 cycles | 0.85 |
| ML-DSA-44 sign | 1483786 cycles | 2126863 cycles | 0.70 |
| ML-DSA-44 verify | 443220 cycles | 552683 cycles | 0.80 |
| ML-DSA-65 keypair | 676713 cycles | 770631 cycles | 0.88 |
| ML-DSA-65 sign | 2440668 cycles | 3460057 cycles | 0.71 |
| ML-DSA-65 verify | 707703 cycles | 857490 cycles | 0.83 |
| ML-DSA-87 keypair | 1128071 cycles | 1249666 cycles | 0.90 |
| ML-DSA-87 sign | 3173423 cycles | 4303345 cycles | 0.74 |
| ML-DSA-87 verify | 1174442 cycles | 1370001 cycles | 0.86 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 223259 cycles | 222102 cycles | 1.01 |
| ML-DSA-44 sign | 616370 cycles | 622325 cycles | 0.99 |
| ML-DSA-44 verify | 223427 cycles | 227406 cycles | 0.98 |
| ML-DSA-65 keypair | 396333 cycles | 385188 cycles | 1.03 |
| ML-DSA-65 sign | 1033678 cycles | 1017117 cycles | 1.02 |
| ML-DSA-65 verify | 378398 cycles | 371026 cycles | 1.02 |
| ML-DSA-87 keypair | 656252 cycles | 657858 cycles | 1.00 |
| ML-DSA-87 sign | 1362649 cycles | 1413224 cycles | 0.96 |
| ML-DSA-87 verify | 638462 cycles | 647577 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 290502 cycles | 317617 cycles | 0.91 |
| ML-DSA-44 sign | 980030 cycles | 1205867 cycles | 0.81 |
| ML-DSA-44 verify | 311188 cycles | 362564 cycles | 0.86 |
| ML-DSA-65 keypair | 549669 cycles | 577543 cycles | 0.95 |
| ML-DSA-65 sign | 1648606 cycles | 1961272 cycles | 0.84 |
| ML-DSA-65 verify | 505767 cycles | 556667 cycles | 0.91 |
| ML-DSA-87 keypair | 835538 cycles | 912005 cycles | 0.92 |
| ML-DSA-87 sign | 2090280 cycles | 2489549 cycles | 0.84 |
| ML-DSA-87 verify | 852209 cycles | 953121 cycles | 0.89 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

oqs-bot commented May 12, 2026

CBMC Results (ML-DSA-87, REDUCE-RAM)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| **TOTAL** | ⚠️ | 2924s | 1460s | +100.3% |
| fqmul | ⚠️ | 219s | 29s | +655% |
Full Results (196 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 2924s 1460s +100.3%
mld_invntt_2_layers_block 799s - new
mld_ntt_2_layers_block 508s - new
fqmul ⚠️ 219s 29s +655%
polyvec_matrix_pointwise_montgomery_yvec 129s 122s +6%
rej_uniform_native 110s 107s +3%
poly_pointwise_montgomery_c 108s 157s -31%
mld_ntt_2_layers 81s - new
mld_invntt_2_layers 74s - new
mld_ct_memcmp 73s 70s +4%
sign_verify_internal 39s 40s -3%
mld_attempt_signature_generation 25s 24s +4%
keccakf1600x4_permute_native 20s 24s -17%
polyt0_unpack 18s 11s +64%
poly_chknorm_c 17s 14s +21%
polyeta_unpack 17s 15s +13%
poly_invntt_tomont_c 14s 9s +56%
polyveck_decompose 14s 16s -12%
mld_check_pct 12s 11s +9%
poly_add 12s 11s +9%
poly_uniform_eta_4x 11s 13s -15%
rej_uniform_c 11s 20s -45%
keccak_absorb_once_x4 9s 10s -10%
keccakf1600_permute_native 9s 5s +80%
keccakf1600_permute 8s 6s +33%
polyvec_matrix_pointwise_montgomery_row 8s 10s -20%
polyz_unpack_c 8s 5s +60%
rej_uniform 8s 18s -56%
compute_pack_t0_t1 7s 7s +0%
mld_sample_s1_s2_serial 7s 7s +0%
poly_caddq_c 7s 6s +17%
polyveck_chknorm 7s 6s +17%
polyveck_reduce 7s 8s -12%
polyvecl_ntt 7s 9s -22%
sign 7s 10s -30%
sign_signature_internal 7s 6s +17%
keccak_absorb 6s 6s +0%
pointwise_acc_native_aarch64 6s 6s +0%
pointwise_acc_native_x86_64 6s 6s +0%
poly_power2round 6s 7s -14%
polyt0_pack 6s 5s +20%
polyveck_invntt_tomont 6s 6s +0%
sign_verify_pre_hash_internal 6s 5s +20%
sign_verify_pre_hash_shake256 6s 4s +50%
caddq 5s 2s +150%
mld_prepare_domain_separation_prefix 5s 3s +67%
mld_sample_s1_s2 5s 4s +25%
ntt_native_x86_64 5s 3s +67%
pack_sig_z 5s 2s +150%
poly_caddq 5s 4s +25%
poly_decompose_native 5s 3s +67%
poly_ntt_native 5s 2s +150%
poly_uniform_eta 5s 3s +67%
polyt1_unpack 5s 2s +150%
polyveck_caddq 5s 4s +25%
polyveck_ntt 5s 2s +150%
polyveck_unpack_eta 5s 3s +67%
polyvecl_unpack_z 5s 6s -17%
polyz_unpack_19_native_aarch64 5s 4s +25%
shake128_squeeze 5s 3s +67%
sign_open 5s 6s -17%
sign_pk_from_sk 5s 6s -17%
sign_signature 5s 5s +0%
unpack_sk_t0hat 5s 5s +0%
keccak_f1600_x4_native_avx2 4s 4s +0%
poly_chknorm_native_aarch64 4s 3s +33%
poly_decompose_32_native_aarch64 4s 3s +33%
poly_invntt_tomont 4s 3s +33%
poly_invntt_tomont_native 4s 2s +100%
poly_permute_bitrev_to_custom_optional_native 4s 2s +100%
poly_pointwise_montgomery 4s 2s +100%
poly_reduce 4s 4s +0%
poly_shiftl 4s 4s +0%
poly_sub 4s 5s -20%
poly_uniform 4s 6s -33%
poly_use_hint 4s 3s +33%
polyvecl_chknorm 4s 5s -20%
polyvecl_uniform_gamma1 4s 3s +33%
polyw1_pack 4s 6s -33%
polyz_pack 4s 3s +33%
rej_eta_c 4s 5s -20%
shake256x4_absorb_once 4s 2s +100%
sign_signature_pre_hash_internal 4s 4s +0%
sign_signature_pre_hash_shake256 4s 2s +100%
sign_verify 4s 3s +33%
sign_verify_extmu 4s 5s -20%
sys_check_capability 4s 2s +100%
decompose 3s 2s +50%
fqscale 3s 2s +50%
intt_native_aarch64 3s 5s -40%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 3s +0%
keccak_squeezeblocks_x4 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 1s +200%
keccakf1600x4_extract_bytes 3s 3s +0%
keccakf1600x4_permute 3s 3s +0%
keccakf1600x4_xor_bytes 3s 5s -40%
keccakf1600x4_xor_bytes_native 3s 3s +0%
make_hint 3s 2s +50%
mld_compute_pack_z 3s 5s -40%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_value_barrier_u32 3s 2s +50%
ntt_native_aarch64 3s 4s -25%
pack_sig_h 3s 3s +0%
pointwise_native_aarch64 3s 3s +0%
pointwise_native_x86_64 3s 2s +50%
poly_challenge 3s 5s -40%
poly_chknorm 3s 3s +0%
poly_chknorm_native 3s 5s -40%
poly_decompose 3s 3s +0%
poly_decompose_88_native_aarch64 3s 3s +0%
poly_decompose_c 3s 2s +50%
poly_ntt 3s 2s +50%
poly_pointwise_montgomery_native 3s 3s +0%
poly_use_hint_c 3s 4s -25%
poly_use_hint_native 3s 3s +0%
polyeta_pack 3s 2s +50%
polyt1_pack 3s 4s -25%
polyvec_matrix_expand 3s 2s +50%
polyveck_pack_w1 3s 2s +50%
polyvecl_pack_eta 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_pointwise_acc_montgomery_native 3s 1s +200%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_eta 3s 6s -50%
polyz_unpack_native 3s 3s +0%
power2round 3s 4s -25%
rej_eta 3s 1s +200%
shake128_release 3s 4s -25%
shake128x4_absorb_once 3s 3s +0%
shake256_finalize 3s 3s +0%
shake256_release 3s 2s +50%
shake256_squeeze 3s 2s +50%
sign_keypair 3s 4s -25%
sign_keypair_internal 3s 4s -25%
sign_signature_extmu 3s 2s +50%
sk_s1hat_get_poly 3s 2s +50%
sk_s2hat_get_poly 3s 4s -25%
sk_t0hat_get_poly 3s 3s +0%
unpack_sk_s1hat 3s 1s +200%
unpack_sk_s2hat 3s 3s +0%
yvec_init 3s 2s +50%
intt_native_x86_64 2s 5s -60%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_init 2s 3s -33%
keccak_squeeze 2s 4s -50%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_ct_sel_int32 2s 1s +100%
mld_h 2s 3s -33%
mld_value_barrier_u8 2s 2s +0%
montgomery_reduce 2s 5s -60%
nttunpack_native_x86_64 2s 2s +0%
pack_sk_rho_key_tr_s2 2s 3s -33%
pack_sk_s1 2s 1s +100%
poly_caddq_native 2s 5s -60%
poly_ntt_c 2s 3s -33%
poly_permute_bitrev_to_custom_optional 2s 2s +0%
poly_uniform_4x 2s 3s -33%
poly_uniform_gamma1 2s 4s -50%
poly_uniform_gamma1_4x 2s 4s -50%
poly_use_hint_native_aarch64 2s 4s -50%
polyvec_matrix_expand_serial 2s 3s -33%
polyveck_pack_eta 2s 3s -33%
polyvecl_pointwise_acc_montgomery_c 2s 2s +0%
polyz_unpack_17_native_aarch64 2s 3s -33%
reduce32 2s 2s +0%
rej_eta_native 2s 4s -50%
shake128_absorb 2s 2s +0%
shake128_finalize 2s 2s +0%
shake128x4_squeezeblocks 2s 3s -33%
shake256 2s 3s -33%
shake256_absorb 2s 2s +0%
shake256_init 2s 2s +0%
shake256x4_squeezeblocks 2s 3s -33%
unpack_pk_t1 2s 3s -33%
use_hint 2s 3s -33%
yvec_get_poly 2s 4s -50%
keccak_f1600_x1_native_aarch64 1s 1s +0%
keccak_finalize 1s 1s +0%
keccakf1600_xor_bytes 1s 2s -50%
mld_keccakf1600_extract_bytes 1s 1s +0%
mld_polymat_expand_entry 1s 2s -50%
mld_value_barrier_i64 1s 1s +0%
pack_sig_c 1s 3s -67%
poly_caddq_native_aarch64 1s 2s -50%
polyz_unpack 1s 3s -67%
shake128_init 1s 3s -67%
sig_unpack_hints 1s 4s -75%
unpack_sk 1s 2s -50%

Contributor

oqs-bot commented May 12, 2026

CBMC Results (ML-DSA-65, REDUCE-RAM)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| **TOTAL** | ⚠️ | 3004s | 1496s | +100.8% |
| fqmul | ⚠️ | 225s | 28s | +704% |
Full Results (196 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3004s 1496s +100.8%
mld_invntt_2_layers_block 819s - new
mld_ntt_2_layers_block 510s - new
fqmul ⚠️ 225s 28s +704%
polyvec_matrix_pointwise_montgomery_yvec 163s 149s +9%
poly_pointwise_montgomery_c 112s 167s -33%
rej_uniform_native 112s 106s +6%
mld_ntt_2_layers 85s - new
mld_ct_memcmp 77s 75s +3%
mld_invntt_2_layers 77s - new
mld_attempt_signature_generation 30s 27s +11%
keccakf1600x4_permute_native 23s 23s +0%
poly_chknorm_c 19s 14s +36%
polyvecl_chknorm 19s 19s +0%
sign_verify_internal 19s 18s +6%
poly_invntt_tomont_c 16s 7s +129%
polyt0_unpack 16s 10s +60%
mld_check_pct 14s 14s +0%
polyveck_decompose 14s 14s +0%
poly_add 13s 13s +0%
rej_uniform_c 13s 19s -32%
poly_uniform_eta_4x 11s 11s +0%
polyvec_matrix_pointwise_montgomery_row 10s 10s +0%
polyveck_caddq 10s 8s +25%
compute_pack_t0_t1 9s 9s +0%
keccak_absorb_once_x4 9s 9s +0%
keccakf1600_permute 9s 6s +50%
polyvecl_ntt 9s 6s +50%
rej_uniform 9s 18s -50%
keccak_absorb 8s 7s +14%
polyveck_reduce 8s 8s +0%
keccakf1600_permute_native 7s 7s +0%
mld_compute_pack_z 7s 5s +40%
poly_caddq_c 7s 7s +0%
poly_decompose_native 7s 5s +40%
polyveck_invntt_tomont 7s 5s +40%
polyveck_pack_w1 7s 3s +133%
polyz_unpack_c 7s 6s +17%
shake256_init 7s 3s +133%
sign 7s 7s +0%
sign_verify_pre_hash_internal 7s 3s +133%
keccak_f1600_x1_native_aarch64 6s 2s +200%
keccak_f1600_x4_native_avx2 6s 4s +50%
mld_h 6s 2s +200%
pointwise_acc_native_aarch64 6s 3s +100%
poly_shiftl 6s 8s -25%
intt_native_aarch64 5s 2s +150%
keccak_squeezeblocks_x4 5s 4s +25%
pack_sig_h 5s 2s +150%
pointwise_acc_native_x86_64 5s 6s -17%
poly_uniform_gamma1 5s 4s +25%
poly_use_hint_native 5s 1s +400%
shake128_init 5s 3s +67%
shake256_squeeze 5s 4s +25%
sign_keypair_internal 5s 3s +67%
sign_pk_from_sk 5s 5s +0%
sign_signature 5s 3s +67%
sign_verify_extmu 5s 4s +25%
sys_check_capability 5s 4s +25%
intt_native_x86_64 4s 3s +33%
mld_ct_cmask_neg_i32 4s 1s +300%
mld_keccakf1600_extract_bytes 4s 3s +33%
mld_sample_s1_s2 4s 5s -20%
nttunpack_native_x86_64 4s 4s +0%
pack_sig_z 4s 2s +100%
pointwise_native_x86_64 4s 4s +0%
poly_challenge 4s 4s +0%
poly_chknorm 4s 4s +0%
poly_chknorm_native_aarch64 4s 2s +100%
poly_decompose_32_native_aarch64 4s 4s +0%
poly_decompose_c 4s 4s +0%
poly_invntt_tomont_native 4s 3s +33%
poly_permute_bitrev_to_custom_optional_native 4s 2s +100%
poly_uniform 4s 4s +0%
poly_use_hint_c 4s 5s -20%
polyt0_pack 4s 3s +33%
polyt1_pack 4s 4s +0%
polyveck_unpack_eta 4s 3s +33%
polyw1_pack 4s 2s +100%
polyz_unpack_native 4s 2s +100%
sign_signature_pre_hash_internal 4s 3s +33%
sign_verify 4s 4s +0%
sk_s1hat_get_poly 4s 3s +33%
unpack_sk_t0hat 4s 1s +300%
yvec_init 4s 2s +100%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 1s +200%
keccak_init 3s 4s -25%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 3s +0%
keccakf1600x4_extract_bytes_native 3s 4s -25%
keccakf1600x4_xor_bytes_native 3s 5s -40%
make_hint 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 2s +50%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_polymat_expand_entry 3s 2s +50%
mld_sample_s1_s2_serial 3s 5s -40%
mld_value_barrier_i64 3s 2s +50%
mld_value_barrier_u32 3s 2s +50%
mld_value_barrier_u8 3s 1s +200%
ntt_native_x86_64 3s 3s +0%
poly_caddq_native_aarch64 3s 3s +0%
poly_chknorm_native 3s 4s -25%
poly_decompose_88_native_aarch64 3s 3s +0%
poly_invntt_tomont 3s 1s +200%
poly_ntt 3s 4s -25%
poly_ntt_native 3s 4s -25%
poly_permute_bitrev_to_custom_optional 3s 3s +0%
poly_pointwise_montgomery_native 3s 2s +50%
poly_reduce 3s 5s -40%
poly_uniform_eta 3s 8s -62%
poly_uniform_gamma1_4x 3s 6s -50%
polyeta_unpack 3s 2s +50%
polyveck_chknorm 3s 3s +0%
polyveck_ntt 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_pointwise_acc_montgomery_c 3s 2s +50%
polyvecl_uniform_gamma1 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyvecl_unpack_z 3s 3s +0%
polyz_pack 3s 5s -40%
polyz_unpack 3s 3s +0%
polyz_unpack_19_native_aarch64 3s 3s +0%
rej_eta 3s 2s +50%
rej_eta_c 3s 3s +0%
shake128_absorb 3s 2s +50%
shake128x4_squeezeblocks 3s 4s -25%
shake256_release 3s 2s +50%
shake256x4_absorb_once 3s 3s +0%
sign_keypair 3s 3s +0%
sign_open 3s 4s -25%
sign_signature_extmu 3s 4s -25%
sign_signature_internal 3s 7s -57%
sign_signature_pre_hash_shake256 3s 6s -50%
sign_verify_pre_hash_shake256 3s 4s -25%
unpack_pk_t1 3s 2s +50%
unpack_sk 3s 5s -40%
unpack_sk_s2hat 3s 3s +0%
caddq 2s 3s -33%
decompose 2s 4s -50%
fqscale 2s 5s -60%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 4s -50%
keccak_finalize 2s 3s -33%
keccakf1600_xor_bytes 2s 4s -50%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_permute 2s 2s +0%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_prepare_domain_separation_prefix 2s 4s -50%
montgomery_reduce 2s 4s -50%
ntt_native_aarch64 2s 1s +100%
pack_sk_rho_key_tr_s2 2s 2s +0%
pack_sk_s1 2s 3s -33%
pointwise_native_aarch64 2s 4s -50%
poly_caddq 2s 2s +0%
poly_caddq_native 2s 5s -60%
poly_decompose 2s 2s +0%
poly_ntt_c 2s 4s -50%
poly_pointwise_montgomery 2s 3s -33%
poly_power2round 2s 7s -71%
poly_uniform_4x 2s 3s -33%
poly_use_hint 2s 4s -50%
poly_use_hint_native_aarch64 2s 2s +0%
polyeta_pack 2s 3s -33%
polyt1_unpack 2s 2s +0%
polyvec_matrix_expand 2s 4s -50%
polyveck_pack_eta 2s 2s +0%
polyvecl_pack_eta 2s 3s -33%
polyvecl_unpack_eta 2s 2s +0%
polyz_unpack_17_native_aarch64 2s 4s -50%
power2round 2s 3s -33%
reduce32 2s 3s -33%
rej_eta_native 2s 4s -50%
shake128_finalize 2s 1s +100%
shake128_release 2s 2s +0%
shake128_squeeze 2s 1s +100%
shake256 2s 1s +100%
shake256_absorb 2s 2s +0%
shake256_finalize 2s 2s +0%
shake256x4_squeezeblocks 2s 6s -67%
sig_unpack_hints 2s 2s +0%
sk_s2hat_get_poly 2s 3s -33%
sk_t0hat_get_poly 2s 2s +0%
use_hint 2s 4s -50%
yvec_get_poly 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 1s 4s -75%
keccakf1600x4_extract_bytes 1s 3s -67%
keccakf1600x4_xor_bytes 1s 2s -50%
mld_ct_abs_i32 1s 4s -75%
mld_ct_sel_int32 1s 2s -50%
pack_sig_c 1s 2s -50%
poly_sub 1s 4s -75%
polyvec_matrix_expand_serial 1s 2s -50%
polyvecl_pointwise_acc_montgomery_native 1s 3s -67%
shake128x4_absorb_once 1s 2s -50%
unpack_sk_s1hat 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented May 12, 2026

CBMC Results (ML-DSA-44, REDUCE-RAM)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| **TOTAL** | ⚠️ | 3130s | 1443s | +116.9% |
| fqmul | ⚠️ | 240s | 28s | +757% |
| poly_chknorm_c | ⚠️ | 21s | 14s | +50% |
Full Results (196 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3130s 1443s +116.9%
mld_invntt_2_layers_block 889s - new
mld_ntt_2_layers_block 565s - new
fqmul ⚠️ 240s 28s +757%
poly_pointwise_montgomery_c 119s 176s -32%
rej_uniform_native 112s 108s +4%
polyvec_matrix_pointwise_montgomery_yvec 94s 88s +7%
mld_ntt_2_layers 87s - new
mld_ct_memcmp 79s 76s +4%
mld_invntt_2_layers 76s - new
mld_attempt_signature_generation 26s 26s +0%
keccakf1600x4_permute_native 22s 23s -4%
poly_chknorm_c ⚠️ 21s 14s +50%
sign_verify_internal 19s 21s -10%
poly_invntt_tomont_c 17s 8s +112%
polyt0_unpack 17s 10s +70%
polyeta_unpack 16s 16s +0%
mld_check_pct 14s 15s -7%
rej_uniform_c 14s 18s -22%
polyveck_chknorm 13s 12s +8%
polyz_unpack_c 13s 14s -7%
poly_uniform_eta_4x 12s 11s +9%
keccak_absorb_once_x4 11s 11s +0%
poly_decompose_c 11s 9s +22%
rej_uniform 11s 20s -45%
compute_pack_t0_t1 10s 7s +43%
poly_add 10s 10s +0%
keccakf1600_permute_native 9s 7s +29%
poly_caddq_c 9s 7s +29%
polyvec_matrix_pointwise_montgomery_row 9s 8s +12%
sign 9s 6s +50%
poly_challenge 7s 4s +75%
poly_pointwise_montgomery_native 7s 5s +40%
polyveck_reduce 7s 6s +17%
sign_signature_pre_hash_shake256 7s 5s +40%
keccak_squeezeblocks_x4 6s 5s +20%
keccakf1600_permute 6s 8s -25%
keccakf1600x4_extract_bytes_native 6s 2s +200%
mld_compute_pack_z 6s 6s +0%
mld_ct_cmask_nonzero_u32 6s 2s +200%
pointwise_acc_native_aarch64 6s 5s +20%
pointwise_acc_native_x86_64 6s 5s +20%
poly_invntt_tomont_native 6s 3s +100%
poly_ntt_native 6s 3s +100%
polyveck_unpack_eta 6s 3s +100%
polyvecl_ntt 6s 3s +100%
sign_keypair 6s 2s +200%
unpack_sk_s2hat 6s 2s +200%
keccak_absorb 5s 9s -44%
make_hint 5s 2s +150%
mld_h 5s 3s +67%
mld_polymat_expand_entry 5s 3s +67%
mld_value_barrier_i64 5s 3s +67%
nttunpack_native_x86_64 5s 2s +150%
pack_sk_s1 5s 3s +67%
poly_caddq_native 5s 3s +67%
poly_chknorm 5s 3s +67%
poly_chknorm_native 5s 6s -17%
poly_shiftl 5s 4s +25%
poly_uniform 5s 4s +25%
polyvec_matrix_expand 5s 5s +0%
polyvec_matrix_expand_serial 5s 3s +67%
polyvecl_uniform_gamma1_serial 5s 3s +67%
polyz_pack 5s 3s +67%
polyz_unpack_native 5s 4s +25%
rej_eta_c 5s 3s +67%
rej_eta_native 5s 2s +150%
sign_verify 5s 3s +67%
sign_verify_pre_hash_shake256 5s 4s +25%
sk_s2hat_get_poly 5s 4s +25%
sk_t0hat_get_poly 5s 2s +150%
unpack_pk_t1 5s 1s +400%
intt_native_x86_64 4s 4s +0%
keccakf1600_xor_bytes 4s 2s +100%
keccakf1600_xor_bytes (big endian) 4s 2s +100%
mld_sample_s1_s2_serial 4s 4s +0%
poly_caddq_native_aarch64 4s 3s +33%
poly_invntt_tomont 4s 4s +0%
poly_ntt_c 4s 3s +33%
poly_use_hint 4s 4s +0%
poly_use_hint_native_aarch64 4s 1s +300%
polyt1_unpack 4s 3s +33%
polyveck_caddq 4s 4s +0%
polyveck_decompose 4s 5s -20%
polyveck_invntt_tomont 4s 5s -20%
polyveck_pack_eta 4s 2s +100%
polyvecl_pointwise_acc_montgomery_native 4s 5s -20%
polyvecl_unpack_eta 4s 1s +300%
polyw1_pack 4s 3s +33%
shake128_finalize 4s 3s +33%
shake128x4_squeezeblocks 4s 3s +33%
shake256_squeeze 4s 2s +100%
shake256x4_absorb_once 4s 4s +0%
sig_unpack_hints 4s 2s +100%
sign_keypair_internal 4s 2s +100%
sign_signature 4s 6s -33%
sign_signature_internal 4s 4s +0%
sign_signature_pre_hash_internal 4s 3s +33%
sign_verify_pre_hash_internal 4s 4s +0%
unpack_sk 4s 2s +100%
use_hint 4s 4s +0%
yvec_init 4s 5s -20%
caddq 3s 2s +50%
fqscale 3s 1s +200%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccak_finalize 3s 2s +50%
keccak_init 3s 2s +50%
keccakf1600_extract_bytes (big endian) 3s 4s -25%
keccakf1600x4_permute 3s 2s +50%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u8 3s 1s +200%
mld_sample_s1_s2 3s 3s +0%
montgomery_reduce 3s 4s -25%
ntt_native_x86_64 3s 5s -40%
pack_sig_c 3s 4s -25%
pack_sk_rho_key_tr_s2 3s 4s -25%
pointwise_native_aarch64 3s 3s +0%
poly_chknorm_native_aarch64 3s 4s -25%
poly_decompose 3s 1s +200%
poly_decompose_native 3s 2s +50%
poly_permute_bitrev_to_custom_optional_native 3s 3s +0%
poly_reduce 3s 2s +50%
poly_uniform_eta 3s 4s -25%
poly_uniform_gamma1_4x 3s 3s +0%
polyeta_pack 3s 2s +50%
polyt1_pack 3s 7s -57%
polyvecl_chknorm 3s 5s -40%
polyvecl_pack_eta 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 2s +50%
polyz_unpack 3s 3s +0%
polyz_unpack_17_native_aarch64 3s 4s -25%
power2round 3s 3s +0%
reduce32 3s 3s +0%
rej_eta 3s 4s -25%
shake128_absorb 3s 3s +0%
shake128_init 3s 3s +0%
shake128_release 3s 3s +0%
shake128_squeeze 3s 2s +50%
sign_open 3s 4s -25%
sign_pk_from_sk 3s 6s -50%
sign_signature_extmu 3s 3s +0%
sign_verify_extmu 3s 3s +0%
sk_s1hat_get_poly 3s 2s +50%
unpack_sk_s1hat 3s 4s -25%
yvec_get_poly 3s 3s +0%
decompose 2s 5s -60%
intt_native_aarch64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 1s +100%
keccak_squeeze 2s 1s +100%
keccakf1600x4_extract_bytes 2s 4s -50%
keccakf1600x4_xor_bytes 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 4s -50%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_ct_sel_int32 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 1s +100%
mld_prepare_domain_separation_prefix 2s 3s -33%
mld_value_barrier_u32 2s 3s -33%
ntt_native_aarch64 2s 3s -33%
pack_sig_h 2s 5s -60%
pack_sig_z 2s 3s -33%
pointwise_native_x86_64 2s 2s +0%
poly_caddq 2s 3s -33%
poly_decompose_32_native_aarch64 2s 3s -33%
poly_decompose_88_native_aarch64 2s 4s -50%
poly_permute_bitrev_to_custom_optional 2s 1s +100%
poly_power2round 2s 6s -67%
poly_sub 2s 2s +0%
poly_uniform_4x 2s 3s -33%
poly_uniform_gamma1 2s 2s +0%
poly_use_hint_c 2s 3s -33%
poly_use_hint_native 2s 2s +0%
polyt0_pack 2s 4s -50%
polyveck_ntt 2s 5s -60%
polyveck_pack_w1 2s 5s -60%
polyvecl_pointwise_acc_montgomery_c 2s 2s +0%
polyvecl_uniform_gamma1 2s 4s -50%
polyvecl_unpack_z 2s 2s +0%
polyz_unpack_19_native_aarch64 2s 2s +0%
shake128x4_absorb_once 2s 2s +0%
shake256 2s 2s +0%
shake256_absorb 2s 2s +0%
shake256_finalize 2s 4s -50%
shake256_init 2s 2s +0%
shake256_release 2s 3s -33%
shake256x4_squeezeblocks 2s 2s +0%
sys_check_capability 2s 2s +0%
unpack_sk_t0hat 2s 4s -50%
keccak_f1600_x1_native_aarch64 1s 1s +0%
mld_ct_cmask_neg_i32 1s 2s -50%
mld_value_barrier_u8 1s 2s -50%
poly_ntt 1s 2s -50%
poly_pointwise_montgomery 1s 1s +0%

@oqs-bot
Contributor

oqs-bot commented May 12, 2026

CBMC Results (ML-DSA-87)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| **TOTAL** | ⚠️ | 3813s | 1958s | +94.7% |
| fqmul | ⚠️ | 244s | 27s | +804% |
| poly_pointwise_montgomery_c | ⚠️ | 158s | 93s | +70% |
Full Results (196 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3813s 1958s +94.7%
mld_invntt_2_layers_block 854s - new
mld_ntt_2_layers_block 565s - new
polyvecl_pointwise_acc_montgomery_c 298s 237s +26%
fqmul ⚠️ 244s 27s +804%
polyvec_matrix_expand 180s 165s +9%
poly_pointwise_montgomery_c ⚠️ 158s 93s +70%
rej_uniform_native 138s 122s +13%
mld_attempt_signature_generation 104s 102s +2%
mld_ntt_2_layers 87s - new
mld_invntt_2_layers 78s - new
mld_ct_memcmp 77s 76s +1%
sign_verify_internal 59s 56s +5%
sign_signature_internal 53s 54s -2%
polyvec_matrix_expand_serial 39s 36s +8%
compute_pack_t0_t1 25s 23s +9%
keccakf1600x4_permute_native 24s 22s +9%
rej_uniform 23s 16s +44%
polyt0_unpack 19s 13s +46%
polyvec_matrix_pointwise_montgomery_yvec 19s 22s -14%
mld_check_pct 17s 14s +21%
poly_chknorm_c 17s 16s +6%
rej_uniform_c 17s 20s -15%
poly_uniform_eta_4x 13s 15s -13%
poly_uniform_4x 12s 13s -8%
polyveck_decompose 12s 11s +9%
mld_compute_pack_z 11s 9s +22%
poly_add 11s 11s +0%
polyeta_unpack 11s 12s -8%
keccakf1600_permute 10s 7s +43%
polyveck_ntt 10s 12s -17%
keccak_absorb_once_x4 9s 10s -10%
polyveck_caddq 9s 8s +12%
polyveck_invntt_tomont 9s 7s +29%
keccakf1600_permute_native 8s 10s -20%
pointwise_acc_native_x86_64 8s 8s +0%
poly_invntt_tomont_c 8s 9s -11%
unpack_sk_t0hat 8s 5s +60%
keccak_absorb 7s 7s +0%
pointwise_acc_native_aarch64 7s 10s -30%
polyveck_chknorm 7s 5s +40%
polyvecl_ntt 7s 6s +17%
sign_pk_from_sk 7s 7s +0%
mld_h 6s 3s +100%
mld_sample_s1_s2 6s 6s +0%
mld_sample_s1_s2_serial 6s 4s +50%
poly_uniform_eta 6s 5s +20%
polyvecl_chknorm 6s 4s +50%
polyz_unpack_c 6s 6s +0%
sig_unpack_hints 6s 3s +100%
sign 6s 9s -33%
sign_keypair_internal 6s 5s +20%
sign_verify_extmu 6s 3s +100%
fqscale 5s 1s +400%
keccak_squeezeblocks_x4 5s 7s -29%
keccakf1600_extract_bytes (big endian) 5s 3s +67%
keccakf1600x4_extract_bytes 5s 2s +150%
mld_ct_cmask_nonzero_u32 5s 4s +25%
mld_prepare_domain_separation_prefix 5s 4s +25%
montgomery_reduce 5s 3s +67%
poly_caddq_native 5s 4s +25%
poly_chknorm_native 5s 3s +67%
poly_ntt 5s 4s +25%
poly_ntt_c 5s 2s +150%
poly_use_hint_c 5s 3s +67%
polyt0_pack 5s 5s +0%
polyveck_unpack_eta 5s 3s +67%
polyvecl_pointwise_acc_montgomery_native 5s 4s +25%
polyvecl_unpack_eta 5s 5s +0%
polyvecl_unpack_z 5s 3s +67%
shake128_release 5s 1s +400%
shake256 5s 2s +150%
sign_verify_pre_hash_shake256 5s 4s +25%
caddq 4s 3s +33%
decompose 4s 5s -20%
intt_native_aarch64 4s 6s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 4s 1s +300%
keccak_init 4s 2s +100%
keccakf1600_xor_bytes 4s 5s -20%
keccakf1600x4_extract_bytes_native 4s 3s +33%
keccakf1600x4_xor_bytes 4s 2s +100%
mld_ct_cmask_nonzero_u8 4s 2s +100%
mld_polymat_expand_entry 4s 2s +100%
ntt_native_aarch64 4s 3s +33%
ntt_native_x86_64 4s 4s +0%
nttunpack_native_x86_64 4s 2s +100%
pack_sig_c 4s 1s +300%
pack_sig_z 4s 2s +100%
pack_sk_rho_key_tr_s2 4s 4s +0%
pointwise_native_aarch64 4s 3s +33%
pointwise_native_x86_64 4s 3s +33%
poly_caddq_c 4s 6s -33%
poly_caddq_native_aarch64 4s 5s -20%
poly_decompose_32_native_aarch64 4s 1s +300%
poly_decompose_c 4s 5s -20%
poly_invntt_tomont_native 4s 4s +0%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_sub 4s 1s +300%
poly_uniform 4s 5s -20%
poly_uniform_gamma1 4s 2s +100%
poly_use_hint_native_aarch64 4s 2s +100%
polyveck_pack_w1 4s 4s +0%
polyvecl_pointwise_acc_montgomery 4s 6s -33%
polyvecl_uniform_gamma1 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 3s +33%
polyw1_pack 4s 5s -20%
polyz_unpack 4s 3s +33%
polyz_unpack_17_native_aarch64 4s 2s +100%
polyz_unpack_native 4s 2s +100%
power2round 4s 4s +0%
rej_eta_c 4s 4s +0%
shake128_init 4s 1s +300%
shake128x4_absorb_once 4s 3s +33%
shake256_init 4s 3s +33%
sign_keypair 4s 5s -20%
sign_open 4s 6s -33%
sign_signature_extmu 4s 3s +33%
sign_signature_pre_hash_internal 4s 6s -33%
sk_s1hat_get_poly 4s 2s +100%
sk_s2hat_get_poly 4s 3s +33%
unpack_pk_t1 4s 2s +100%
unpack_sk 4s 3s +33%
keccak_f1600_x4_native_avx2 3s 2s +50%
keccak_finalize 3s 3s +0%
keccak_squeeze 3s 3s +0%
make_hint 3s 3s +0%
pack_sig_h 3s 5s -40%
pack_sk_s1 3s 3s +0%
poly_caddq 3s 4s -25%
poly_challenge 3s 5s -40%
poly_chknorm_native_aarch64 3s 4s -25%
poly_decompose_native 3s 3s +0%
poly_pointwise_montgomery_native 3s 3s +0%
poly_reduce 3s 3s +0%
poly_shiftl 3s 4s -25%
poly_use_hint 3s 4s -25%
poly_use_hint_native 3s 3s +0%
polyt1_pack 3s 3s +0%
polyt1_unpack 3s 3s +0%
reduce32 3s 4s -25%
shake128_absorb 3s 3s +0%
shake128x4_squeezeblocks 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_release 3s 2s +50%
shake256x4_absorb_once 3s 3s +0%
shake256x4_squeezeblocks 3s 3s +0%
sign_signature 3s 3s +0%
sign_signature_pre_hash_shake256 3s 3s +0%
sign_verify 3s 7s -57%
sign_verify_pre_hash_internal 3s 4s -25%
sk_t0hat_get_poly 3s 2s +50%
sys_check_capability 3s 2s +50%
unpack_sk_s1hat 3s 2s +50%
unpack_sk_s2hat 3s 3s +0%
intt_native_x86_64 2s 2s +0%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccakf1600x4_permute 2s 4s -50%
keccakf1600x4_xor_bytes_native 2s 5s -60%
mld_ct_abs_i32 2s 4s -50%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_get_optblocker_i64 2s 1s +100%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_ct_get_optblocker_u8 2s 4s -50%
mld_ct_sel_int32 2s 3s -33%
mld_value_barrier_i64 2s 2s +0%
mld_value_barrier_u32 2s 3s -33%
mld_value_barrier_u8 2s 6s -67%
poly_chknorm 2s 3s -33%
poly_decompose 2s 3s -33%
poly_decompose_88_native_aarch64 2s 3s -33%
poly_invntt_tomont 2s 2s +0%
poly_ntt_native 2s 5s -60%
poly_permute_bitrev_to_custom_optional_native 2s 4s -50%
poly_pointwise_montgomery 2s 4s -50%
poly_power2round 2s 8s -75%
poly_uniform_gamma1_4x 2s 6s -67%
polyvec_matrix_pointwise_montgomery_row 2s 2s +0%
polyveck_pack_eta 2s 3s -33%
polyveck_reduce 2s 6s -67%
polyvecl_pack_eta 2s 2s +0%
polyz_pack 2s 2s +0%
polyz_unpack_19_native_aarch64 2s 3s -33%
rej_eta 2s 3s -33%
rej_eta_native 2s 4s -50%
shake128_finalize 2s 1s +100%
shake128_squeeze 2s 2s +0%
shake256_finalize 2s 2s +0%
use_hint 2s 3s -33%
yvec_get_poly 2s 3s -33%
yvec_init 2s 3s -33%
keccakf1600_xor_bytes (big endian) 1s 2s -50%
mld_keccakf1600_extract_bytes 1s 2s -50%
polyeta_pack 1s 4s -75%
shake256_squeeze 1s 4s -75%

@oqs-bot
Contributor

oqs-bot commented May 12, 2026

CBMC Results (ML-DSA-65)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| **TOTAL** | ⚠️ | 3629s | 1912s | +89.8% |
| fqmul | ⚠️ | 237s | 28s | +746% |
| poly_pointwise_montgomery_c | ⚠️ | 159s | 95s | +67% |
Full Results (196 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3629s 1912s +89.8%
mld_invntt_2_layers_block 841s - new
mld_ntt_2_layers_block 539s - new
polyvecl_pointwise_acc_montgomery_c 315s 286s +10%
fqmul ⚠️ 237s 28s +746%
poly_pointwise_montgomery_c ⚠️ 159s 95s +67%
polyvec_matrix_expand 147s 146s +1%
rej_uniform_native 136s 124s +10%
mld_ntt_2_layers 94s - new
mld_ct_memcmp 78s 73s +7%
mld_invntt_2_layers 72s - new
mld_attempt_signature_generation 67s 71s -6%
sign_verify_internal 50s 55s -9%
sign_signature_internal 49s 45s +9%
polyvec_matrix_pointwise_montgomery_yvec 28s 29s -3%
polyvec_matrix_expand_serial 25s 25s +0%
keccakf1600x4_permute_native 22s 23s -4%
rej_uniform 21s 16s +31%
polyt0_unpack 17s 15s +13%
compute_pack_t0_t1 15s 13s +15%
poly_uniform_eta_4x 15s 13s +15%
polyveck_decompose 14s 14s +0%
rej_uniform_c 14s 21s -33%
poly_uniform_4x 13s 10s +30%
mld_check_pct 12s 13s -8%
poly_chknorm_c 12s 17s -29%
poly_add 11s 11s +0%
polyvecl_ntt 11s 8s +38%
keccakf1600_permute_native 10s 9s +11%
poly_decompose_c 10s 6s +67%
polyveck_chknorm 10s 10s +0%
polyveck_caddq 9s 7s +29%
polyveck_invntt_tomont 9s 9s +0%
keccak_absorb_once_x4 8s 9s -11%
mld_compute_pack_z 8s 9s -11%
poly_invntt_tomont_c 8s 10s -20%
poly_ntt_native 7s 2s +250%
polyvecl_chknorm 7s 3s +133%
polyvecl_uniform_gamma1_serial 7s 3s +133%
sign 7s 7s +0%
unpack_sk 7s 3s +133%
keccak_absorb 6s 8s -25%
keccak_squeezeblocks_x4 6s 4s +50%
keccakf1600_permute 6s 9s -33%
mld_h 6s 4s +50%
poly_caddq_c 6s 4s +50%
poly_chknorm 6s 1s +500%
poly_pointwise_montgomery_native 6s 4s +50%
polyveck_ntt 6s 7s -14%
sign_pk_from_sk 6s 5s +20%
sign_verify 6s 4s +50%
pointwise_acc_native_x86_64 5s 7s -29%
poly_caddq_native_aarch64 5s 2s +150%
poly_invntt_tomont_native 5s 2s +150%
poly_permute_bitrev_to_custom_optional 5s 3s +67%
poly_uniform_gamma1 5s 3s +67%
polyveck_unpack_eta 5s 3s +67%
sig_unpack_hints 5s 4s +25%
sign_keypair_internal 5s 5s +0%
fqscale 4s 2s +100%
intt_native_aarch64 4s 4s +0%
intt_native_x86_64 4s 3s +33%
keccakf1600x4_xor_bytes_native 4s 4s +0%
mld_ct_get_optblocker_i64 4s 4s +0%
mld_sample_s1_s2 4s 3s +33%
mld_sample_s1_s2_serial 4s 6s -33%
ntt_native_aarch64 4s 4s +0%
nttunpack_native_x86_64 4s 4s +0%
pack_sig_c 4s 4s +0%
pack_sig_h 4s 3s +33%
pack_sk_s1 4s 3s +33%
pointwise_acc_native_aarch64 4s 6s -33%
poly_chknorm_native_aarch64 4s 2s +100%
poly_decompose_88_native_aarch64 4s 4s +0%
poly_decompose_native 4s 3s +33%
poly_invntt_tomont 4s 4s +0%
poly_shiftl 4s 6s -33%
poly_uniform 4s 2s +100%
poly_uniform_gamma1_4x 4s 3s +33%
poly_use_hint_native 4s 2s +100%
poly_use_hint_native_aarch64 4s 4s +0%
polyeta_pack 4s 3s +33%
polyt0_pack 4s 3s +33%
polyt1_pack 4s 5s -20%
polyveck_reduce 4s 3s +33%
polyvecl_pack_eta 4s 2s +100%
polyvecl_pointwise_acc_montgomery_native 4s 2s +100%
polyvecl_uniform_gamma1 4s 3s +33%
polyvecl_unpack_eta 4s 5s -20%
polyz_unpack_c 4s 7s -43%
shake256_absorb 4s 4s +0%
shake256x4_absorb_once 4s 2s +100%
sign_keypair 4s 5s -20%
sign_open 4s 4s +0%
sign_signature_pre_hash_internal 4s 5s -20%
sign_verify_pre_hash_internal 4s 6s -33%
sign_verify_pre_hash_shake256 4s 3s +33%
sys_check_capability 4s 2s +100%
unpack_pk_t1 4s 3s +33%
unpack_sk_s1hat 4s 3s +33%
unpack_sk_t0hat 4s 5s -20%
yvec_init 4s 3s +33%
decompose 3s 4s -25%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 2s +50%
keccak_finalize 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600x4_extract_bytes 3s 2s +50%
keccakf1600x4_extract_bytes_native 3s 2s +50%
keccakf1600x4_permute 3s 3s +0%
make_hint 3s 2s +50%
mld_ct_abs_i32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 2s +50%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_value_barrier_i64 3s 4s -25%
mld_value_barrier_u32 3s 4s -25%
mld_value_barrier_u8 3s 3s +0%
ntt_native_x86_64 3s 3s +0%
pack_sig_z 3s 2s +50%
pointwise_native_x86_64 3s 4s -25%
poly_caddq 3s 4s -25%
poly_caddq_native 3s 2s +50%
poly_challenge 3s 3s +0%
poly_chknorm_native 3s 4s -25%
poly_ntt 3s 4s -25%
poly_ntt_c 3s 4s -25%
poly_pointwise_montgomery 3s 3s +0%
poly_power2round 3s 8s -62%
poly_sub 3s 3s +0%
poly_use_hint 3s 3s +0%
polyeta_unpack 3s 5s -40%
polyvec_matrix_pointwise_montgomery_row 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_unpack_z 3s 4s -25%
polyz_pack 3s 3s +0%
polyz_unpack 3s 2s +50%
polyz_unpack_19_native_aarch64 3s 2s +50%
reduce32 3s 1s +200%
rej_eta 3s 4s -25%
rej_eta_native 3s 3s +0%
shake128_finalize 3s 2s +50%
shake128_squeeze 3s 2s +50%
shake256_finalize 3s 3s +0%
shake256_init 3s 3s +0%
shake256_release 3s 2s +50%
sign_signature 3s 6s -50%
sign_signature_extmu 3s 5s -40%
sign_verify_extmu 3s 6s -50%
sk_s1hat_get_poly 3s 2s +50%
sk_s2hat_get_poly 3s 2s +50%
sk_t0hat_get_poly 3s 4s -25%
unpack_sk_s2hat 3s 3s +0%
use_hint 3s 3s +0%
caddq 2s 6s -67%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 5s -60%
keccak_f1600_x4_native_aarch64_v84a 2s 4s -50%
keccak_init 2s 4s -50%
keccak_squeeze 2s 2s +0%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_ct_sel_int32 2s 1s +100%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_prepare_domain_separation_prefix 2s 3s -33%
montgomery_reduce 2s 3s -33%
pack_sk_rho_key_tr_s2 2s 3s -33%
pointwise_native_aarch64 2s 1s +100%
poly_decompose 2s 3s -33%
poly_decompose_32_native_aarch64 2s 2s +0%
poly_permute_bitrev_to_custom_optional_native 2s 2s +0%
poly_uniform_eta 2s 3s -33%
poly_use_hint_c 2s 5s -60%
polyt1_unpack 2s 2s +0%
polyveck_pack_eta 2s 4s -50%
polyveck_pack_w1 2s 3s -33%
polyw1_pack 2s 3s -33%
polyz_unpack_17_native_aarch64 2s 4s -50%
power2round 2s 3s -33%
rej_eta_c 2s 3s -33%
shake128_absorb 2s 4s -50%
shake128_init 2s 4s -50%
shake128_release 2s 1s +100%
shake128x4_absorb_once 2s 1s +100%
shake256 2s 3s -33%
shake256_squeeze 2s 4s -50%
shake256x4_squeezeblocks 2s 2s +0%
sign_signature_pre_hash_shake256 2s 4s -50%
yvec_get_poly 2s 2s +0%
keccak_f1600_x4_native_avx2 1s 2s -50%
mld_polymat_expand_entry 1s 4s -75%
poly_reduce 1s 4s -75%
polyz_unpack_native 1s 2s -50%
shake128x4_squeezeblocks 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented May 12, 2026

CBMC Results (ML-DSA-44)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| **TOTAL** | ⚠️ | 3595s | 1601s | +124.5% |
| fqmul | ⚠️ | 251s | 27s | +830% |
| poly_pointwise_montgomery_c | ⚠️ | 172s | 86s | +100% |
Full Results (196 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3595s 1601s +124.5%
mld_invntt_2_layers_block 915s - new
mld_ntt_2_layers_block 577s - new
polyvecl_pointwise_acc_montgomery_c 284s 237s +20%
fqmul ⚠️ 251s 27s +830%
poly_pointwise_montgomery_c ⚠️ 172s 86s +100%
rej_uniform_native 141s 115s +23%
mld_ntt_2_layers 93s - new
mld_ct_memcmp 89s 72s +24%
mld_invntt_2_layers 80s - new
mld_attempt_signature_generation 58s 54s +7%
sign_verify_internal 32s 30s +7%
sign_signature_internal 31s 28s +11%
polyvec_matrix_expand 29s 27s +7%
keccakf1600x4_permute_native 24s 23s +4%
rej_uniform 22s 17s +29%
polyt0_unpack 20s 15s +33%
polyvecl_chknorm 17s 16s +6%
rej_uniform_c 16s 16s +0%
compute_pack_t0_t1 15s 15s +0%
poly_chknorm_c 15s 15s +0%
poly_uniform_4x 15s 11s +36%
poly_uniform_eta_4x 15s 14s +7%
polyeta_unpack 15s 11s +36%
polyvec_matrix_pointwise_montgomery_yvec 14s 14s +0%
polyz_unpack_c 14s 13s +8%
polyvec_matrix_expand_serial 13s 10s +30%
poly_add 12s 13s -8%
mld_check_pct 11s 15s -27%
poly_invntt_tomont_c 11s 6s +83%
keccak_absorb_once_x4 10s 9s +11%
mld_compute_pack_z 8s 8s +0%
poly_decompose_c 8s 6s +33%
keccak_absorb 7s 6s +17%
keccakf1600_permute_native 7s 7s +0%
polyveck_caddq 7s 5s +40%
sign 7s 7s +0%
sign_keypair_internal 7s 3s +133%
keccakf1600x4_xor_bytes_native 6s 4s +50%
poly_decompose_32_native_aarch64 6s 3s +100%
poly_ntt_native 6s 2s +200%
poly_uniform_eta 6s 4s +50%
poly_use_hint_c 6s 4s +50%
polyveck_decompose 6s 7s -14%
shake256_squeeze 6s 2s +200%
sign_pk_from_sk 6s 6s +0%
sign_signature_pre_hash_shake256 6s 6s +0%
intt_native_x86_64 5s 2s +150%
keccak_squeezeblocks_x4 5s 5s +0%
keccakf1600_permute 5s 7s -29%
mld_ct_get_optblocker_i64 5s 4s +25%
mld_ct_get_optblocker_u8 5s 3s +67%
mld_h 5s 3s +67%
mld_value_barrier_i64 5s 2s +150%
montgomery_reduce 5s 2s +150%
ntt_native_aarch64 5s 5s +0%
pointwise_acc_native_aarch64 5s 5s +0%
pointwise_acc_native_x86_64 5s 6s -17%
poly_caddq_c 5s 4s +25%
poly_chknorm_native 5s 3s +67%
poly_invntt_tomont_native 5s 3s +67%
polyt0_pack 5s 4s +25%
polyveck_ntt 5s 5s +0%
polyvecl_ntt 5s 6s -17%
shake128x4_squeezeblocks 5s 3s +67%
shake256x4_squeezeblocks 5s 4s +25%
sign_open 5s 3s +67%
sign_signature_extmu 5s 4s +25%
unpack_pk_t1 5s 2s +150%
caddq 4s 3s +33%
decompose 4s 1s +300%
keccak_init 4s 3s +33%
keccakf1600x4_extract_bytes_native 4s 1s +300%
mld_keccakf1600_extract_bytes 4s 3s +33%
mld_prepare_domain_separation_prefix 4s 3s +33%
mld_sample_s1_s2 4s 2s +100%
pack_sig_h 4s 2s +100%
pack_sig_z 4s 3s +33%
pack_sk_s1 4s 3s +33%
pointwise_native_aarch64 4s 3s +33%
poly_caddq 4s 3s +33%
poly_caddq_native_aarch64 4s 4s +0%
poly_challenge 4s 7s -43%
poly_decompose_native 4s 5s -20%
poly_invntt_tomont 4s 2s +100%
poly_ntt_c 4s 3s +33%
poly_pointwise_montgomery_native 4s 3s +33%
poly_power2round 4s 6s -33%
poly_shiftl 4s 9s -56%
poly_uniform 4s 4s +0%
poly_use_hint 4s 4s +0%
polyveck_invntt_tomont 4s 4s +0%
polyveck_pack_w1 4s 3s +33%
polyvecl_pointwise_acc_montgomery 4s 3s +33%
polyvecl_pointwise_acc_montgomery_native 4s 2s +100%
polyw1_pack 4s 4s +0%
polyz_pack 4s 3s +33%
polyz_unpack_17_native_aarch64 4s 4s +0%
polyz_unpack_native 4s 2s +100%
reduce32 4s 3s +33%
rej_eta_native 4s 6s -33%
shake128_squeeze 4s 2s +100%
shake256_init 4s 2s +100%
sign_signature 4s 4s +0%
sign_signature_pre_hash_internal 4s 6s -33%
sign_verify_extmu 4s 3s +33%
sign_verify_pre_hash_internal 4s 4s +0%
use_hint 4s 2s +100%
fqscale 3s 2s +50%
keccakf1600_extract_bytes (big endian) 3s 3s +0%
keccakf1600x4_extract_bytes 3s 5s -40%
make_hint 3s 3s +0%
mld_ct_cmask_neg_i32 3s 1s +200%
mld_ct_get_optblocker_u32 3s 1s +200%
mld_ct_sel_int32 3s 4s -25%
mld_polymat_expand_entry 3s 1s +200%
pack_sig_c 3s 3s +0%
pack_sk_rho_key_tr_s2 3s 5s -40%
pointwise_native_x86_64 3s 3s +0%
poly_caddq_native 3s 3s +0%
poly_decompose 3s 3s +0%
poly_ntt 3s 3s +0%
poly_permute_bitrev_to_custom_optional 3s 1s +200%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_pointwise_montgomery 3s 4s -25%
poly_sub 3s 4s -25%
poly_uniform_gamma1_4x 3s 3s +0%
poly_use_hint_native_aarch64 3s 2s +50%
polyeta_pack 3s 5s -40%
polyvec_matrix_pointwise_montgomery_row 3s 2s +50%
polyveck_reduce 3s 2s +50%
polyveck_unpack_eta 3s 2s +50%
polyvecl_uniform_gamma1 3s 4s -25%
polyvecl_unpack_eta 3s 4s -25%
polyz_unpack 3s 4s -25%
power2round 3s 4s -25%
rej_eta_c 3s 3s +0%
shake128_absorb 3s 1s +200%
shake256 3s 2s +50%
shake256_absorb 3s 1s +200%
shake256_finalize 3s 1s +200%
shake256_release 3s 4s -25%
shake256x4_absorb_once 3s 3s +0%
sign_keypair 3s 4s -25%
sign_verify 3s 2s +50%
sign_verify_pre_hash_shake256 3s 3s +0%
sk_s1hat_get_poly 3s 3s +0%
sk_s2hat_get_poly 3s 3s +0%
unpack_sk 3s 3s +0%
unpack_sk_s1hat 3s 2s +50%
unpack_sk_t0hat 3s 3s +0%
yvec_init 3s 2s +50%
intt_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccak_finalize 2s 1s +100%
keccak_squeeze 2s 2s +0%
keccakf1600x4_permute 2s 3s -33%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_abs_i32 2s 4s -50%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_sample_s1_s2_serial 2s 5s -60%
mld_value_barrier_u8 2s 2s +0%
ntt_native_x86_64 2s 5s -60%
nttunpack_native_x86_64 2s 4s -50%
poly_chknorm 2s 3s -33%
poly_chknorm_native_aarch64 2s 3s -33%
poly_decompose_88_native_aarch64 2s 2s +0%
poly_reduce 2s 3s -33%
poly_uniform_gamma1 2s 1s +100%
poly_use_hint_native 2s 2s +0%
polyt1_pack 2s 4s -50%
polyt1_unpack 2s 2s +0%
polyveck_chknorm 2s 7s -71%
polyveck_pack_eta 2s 1s +100%
polyvecl_pack_eta 2s 4s -50%
polyvecl_uniform_gamma1_serial 2s 2s +0%
polyvecl_unpack_z 2s 1s +100%
polyz_unpack_19_native_aarch64 2s 5s -60%
rej_eta 2s 3s -33%
shake128_init 2s 2s +0%
shake128x4_absorb_once 2s 2s +0%
sig_unpack_hints 2s 2s +0%
sys_check_capability 2s 2s +0%
yvec_get_poly 2s 4s -50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 2s -50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 1s +0%
keccakf1600_xor_bytes 1s 3s -67%
keccakf1600_xor_bytes (big endian) 1s 6s -83%
mld_value_barrier_u32 1s 2s -50%
shake128_finalize 1s 2s -50%
shake128_release 1s 1s +0%
sk_t0hat_get_poly 1s 1s +0%
unpack_sk_s2hat 1s 4s -75%

@hanno-becker hanno-becker changed the title [WIP] Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging May 13, 2026

@github-actions github-actions Bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 38f3f91 | Previous: a71b5d2 | Ratio |
|---|---|---|---|
| ML-DSA-65 sign | 1367463 cycles | 1320861 cycles | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.

Replace the single-layer C-reference forward and inverse NTT in
`mldsa/src/poly.c` with one that merges two layers each.
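
As a rough illustration of what merging two layers means, here is a minimal sketch on a toy 8-coefficient block. It is not the code from this PR: the `sketch_*` names, the block size, and the sample twiddle constants are all assumptions made for the example. The baseline runs each layer as its own pass; the merged variant loads four coefficients once, applies the butterflies of both layers in locals, and stores them once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define Q 8380417      /* ML-DSA modulus */
#define QINV 58728449u /* Q^{-1} mod 2^32 */

/* Montgomery multiplication: a value congruent to a * b * 2^{-32} mod Q. */
int32_t sketch_fqmul(int32_t a, int32_t b)
{
  int64_t prod = (int64_t)a * b;
  /* Low 32 bits of prod * QINV; the uint32_t -> int32_t cast wraps on
   * mainstream compilers (implementation-defined in ISO C). */
  int32_t t = (int32_t)((uint32_t)prod * QINV);
  /* prod - t * Q is an exact multiple of 2^32, so the division is exact. */
  return (int32_t)((prod - (int64_t)t * Q) / ((int64_t)1 << 32));
}

/* One Cooley-Tukey butterfly on the pair (*x, *y). */
void sketch_butterfly(int32_t *x, int32_t *y, int32_t zeta)
{
  int32_t t = sketch_fqmul(*y, zeta);
  *y = *x - t;
  *x = *x + t;
}

/* Baseline: each layer is its own pass over the 8 coefficients. */
void sketch_two_passes(int32_t r[8], int32_t z1, int32_t z2, int32_t z3)
{
  int j;
  for (j = 0; j < 4; j++) /* layer with distance 4 */
    sketch_butterfly(&r[j], &r[j + 4], z1);
  for (j = 0; j < 2; j++) /* layer with distance 2, lower half */
    sketch_butterfly(&r[j], &r[j + 2], z2);
  for (j = 4; j < 6; j++) /* layer with distance 2, upper half */
    sketch_butterfly(&r[j], &r[j + 2], z3);
}

/* Merged: load 4 coefficients, run both layers in locals, store once. */
void sketch_two_layers_merged(int32_t r[8], int32_t z1, int32_t z2, int32_t z3)
{
  int j;
  for (j = 0; j < 2; j++)
  {
    int32_t c0 = r[j], c1 = r[j + 2], c2 = r[j + 4], c3 = r[j + 6];
    sketch_butterfly(&c0, &c2, z1); /* first layer */
    sketch_butterfly(&c1, &c3, z1);
    sketch_butterfly(&c0, &c1, z2); /* second layer */
    sketch_butterfly(&c2, &c3, z3);
    r[j] = c0;
    r[j + 2] = c1;
    r[j + 4] = c2;
    r[j + 6] = c3;
  }
}

/* Both variants perform the same arithmetic, so results must be identical. */
void sketch_check_merged_matches_two_passes(void)
{
  int32_t a[8] = {1, -2, 3, -4, 5, -6, 7, -8};
  int32_t b[8];
  memcpy(b, a, sizeof a);
  /* Arbitrary sample twiddles, chosen only for the demonstration. */
  sketch_two_passes(a, 25847, -2608894, -518909);
  sketch_two_layers_merged(b, 25847, -2608894, -518909);
  assert(memcmp(a, b, sizeof a) == 0);
}
```

The arithmetic is unchanged; what the merged form saves is memory traffic, since each coefficient is read and written once per two layers instead of once per layer.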

Also, store each twiddle alongside its precomputed twist, letting
`mld_fqmul(a, b, b_twisted)` drop the multiply with MLDSA_Q^{-1}
that was previously hidden inside `mld_montgomery_reduce`.
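
The twist idea can be sketched as follows, assuming a 32-bit Montgomery radix and the usual constant Q^{-1} mod 2^32 = 58728449. The `sketch_*` functions are hypothetical stand-ins, not the PR's `mld_fqmul` or `mld_montgomery_reduce`. For a fixed constant `b`, the value `b * Q^{-1} mod 2^32` is computed once ahead of time, so the reduction multiplies `a` by that stored twist instead of multiplying the product `a * b` by Q^{-1}.

```c
#include <assert.h>
#include <stdint.h>

#define Q 8380417      /* ML-DSA modulus */
#define QINV 58728449u /* Q^{-1} mod 2^32 */

/* Plain Montgomery multiplication: a value congruent to a * b * 2^{-32}
 * mod Q. The multiply by QINV depends on the product a * b. */
int32_t sketch_fqmul_plain(int32_t a, int32_t b)
{
  int64_t prod = (int64_t)a * b;
  /* Low 32 bits of prod * QINV; the uint32_t -> int32_t cast wraps on
   * mainstream compilers (implementation-defined in ISO C). */
  int32_t t = (int32_t)((uint32_t)prod * QINV);
  /* prod - t * Q is an exact multiple of 2^32, so the division is exact. */
  return (int32_t)((prod - (int64_t)t * Q) / ((int64_t)1 << 32));
}

/* Precompute the twist of a constant b: b * QINV mod 2^32. */
int32_t sketch_twist(int32_t b)
{
  return (int32_t)((uint32_t)b * QINV);
}

/* Twisted variant: low32(a * b_twisted) equals low32(a * b * QINV), so t is
 * the same value as in the plain version, but no multiply by QINV is left
 * at run time. */
int32_t sketch_fqmul_twisted(int32_t a, int32_t b, int32_t b_twisted)
{
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)b_twisted);
  return (int32_t)(((int64_t)a * b - (int64_t)t * Q) / ((int64_t)1 << 32));
}

/* The two variants should agree bit for bit on every input. */
void sketch_check_twisted_matches_plain(void)
{
  /* Arbitrary sample constants, chosen only for the demonstration. */
  const int32_t consts[] = {1753, -3572223, 4193792, 25847};
  int i;
  int32_t a;
  for (i = 0; i < 4; i++)
  {
    int32_t b = consts[i];
    int32_t bt = sketch_twist(b);
    for (a = -(Q - 1); a <= Q - 1; a += 9973)
    {
      assert(sketch_fqmul_plain(a, b) == sketch_fqmul_twisted(a, b, bt));
    }
  }
}
```

In this sketch the multiplication by the twist does not depend on `a * b`, so the two products can be computed independently; whether that translates into a speed-up depends on the compiler and target.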

Mirrors pq-code-package/mlkem-native#463 and pq-code-package/mlkem-native#683

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Contributor

@rod-chapman rod-chapman left a comment


Looks good. One suggestion to improve proof times.


# Disable any setting of EXTERNAL_SAT_SOLVER, and choose SMT backend instead
EXTERNAL_SAT_SOLVER=
CBMCFLAGS=--bitwuzla

On my laptop, the proof of this new implementation takes 131s with bitwuzla, so I tried z3, which completes the proof in about 26s. I suggest switching to CBMCFLAGS=--smt2.

3 participants