Namespace STACK_SIZE #1396

Draft
willieyz wants to merge 1 commit into main from namespace-STACK_SIZE

Conversation

@willieyz
Contributor

@willieyz willieyz marked this pull request as ready for review December 17, 2025 02:28
@willieyz willieyz requested a review from a team as a code owner December 17, 2025 02:28

-#define STACK_SIZE (16*6 + 8*8 + 6*8 + (STACK_LOCS) * 8)
+#define MLK_STACK_SIZE (16*6 + 8*8 + 6*8 + (STACK_LOCS) * 8)
 #define STACK_BASE_GPRS (6*8)
Contributor

We shouldn't namespace STACK_SIZE while leaving the other macros unprefixed; either namespace all of them, or none.

Contributor

@hanno-becker hanno-becker left a comment

If we do this, we should namespace all macros in the assembly files.

Note that the lack of namespacing here has little effect, since all affected files are in dev/ and not in the main source tree.
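
A minimal sketch of what consistent namespacing could look like, assuming the `MLK_` prefix is applied to every macro in a file rather than to the stack size alone. The names `MLK_STACK_LOCS` and `MLK_STACK_BASE_GPRS` are hypothetical, and the value 4 is a placeholder that is not taken from this PR.

```c
#include <assert.h>

/* Placeholder value for illustration only; the real count is defined
 * in the assembly file and is not shown in this thread. */
#define MLK_STACK_LOCS 4

/* Every helper macro carries the same MLK_ prefix (names assumed),
 * not only the stack size. */
#define MLK_STACK_BASE_GPRS (6 * 8)
#define MLK_STACK_SIZE (16 * 6 + 8 * 8 + 6 * 8 + (MLK_STACK_LOCS) * 8)

/* Returns the frame size in bytes so that C code can inspect the macro. */
int mlk_example_stack_size(void)
{
  /* Sanity check: the expression expands to a positive constant. */
  assert(MLK_STACK_SIZE > 0);
  return MLK_STACK_SIZE;
}
```

With the placeholder value above, `mlk_example_stack_size()` evaluates to 96 + 64 + 48 + 32 bytes.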

@willieyz willieyz force-pushed the namespace-STACK_SIZE branch 3 times, most recently from a7001df to 59a18fd Compare December 17, 2025 10:50
@willieyz willieyz marked this pull request as draft December 18, 2025 02:00
@willieyz willieyz force-pushed the namespace-STACK_SIZE branch 2 times, most recently from e96e223 to 0878495 Compare December 31, 2025 10:24
@willieyz willieyz marked this pull request as ready for review December 31, 2025 10:39
@willieyz willieyz marked this pull request as draft December 31, 2025 11:06
@willieyz willieyz force-pushed the namespace-STACK_SIZE branch 4 times, most recently from f855045 to b6a4787 Compare January 3, 2026 20:24
@willieyz willieyz force-pushed the namespace-STACK_SIZE branch 2 times, most recently from 4db09cb to 5f64f67 Compare January 12, 2026 03:46
@willieyz willieyz requested a review from hanno-becker January 22, 2026 02:27
@willieyz willieyz marked this pull request as ready for review January 22, 2026 02:27
@willieyz willieyz marked this pull request as draft January 23, 2026 02:29
@willieyz willieyz force-pushed the namespace-STACK_SIZE branch from 0b7e708 to 17d96d2 Compare January 23, 2026 02:35
@oqs-bot
Contributor

oqs-bot commented Jan 23, 2026

CBMC Results (ML-KEM-512)

Full Results (187 proofs)
Proof Status Current Previous Change
**TOTAL** 1403s 1261s +11.3%
mlk_indcpa_keypair_derand 169s 160s +6%
mlk_indcpa_enc 157s 145s +8%
mlk_keccak_squeezeblocks_x4 138s 123s +12%
mlk_rej_uniform_c 95s 76s +25%
mlk_polyvec_basemul_acc_montgomery_cached_c 77s 70s +10%
mlk_poly_rej_uniform 44s 40s +10%
poly_ntt_native 40s 35s +14%
polyvec_basemul_acc_montgomery_cached_native 25s 22s +14%
keccakf1600x4_permute_native_x4 21s 18s +17%
mlk_polyvec_add 20s 18s +11%
mlk_ntt_layer 16s 12s +33%
mlk_poly_reduce_native 16s 15s +7%
mlk_poly_decompress_d10_native 15s 13s +15%
mlk_poly_decompress_d4_native 15s 13s +15%
mlk_poly_frombytes_native 13s 10s +30%
mlk_indcpa_dec 12s 12s +0%
mlk_poly_frommsg 11s 11s +0%
mlk_ntt_butterfly_block 10s 7s +43%
mlk_polymat_permute_bitrev_to_custom 10s 7s +43%
keccakf1600_permute_native 9s 4s +125%
mlk_keccak_squeezeblocks 9s 9s +0%
mlk_fqmul 7s 8s -12%
mlk_invntt_layer 7s 7s +0%
mlk_keccak_absorb_once_x4 7s 8s -12%
mlk_poly_rej_uniform_x4 7s 8s -12%
kem_dec 6s 4s +50%
mlk_keccak_absorb_once 6s 4s +50%
mlk_keccak_squeeze_once 6s 6s +0%
mlk_poly_getnoise_eta1122_4x 6s 3s +100%
mlk_polyvec_mulcache_compute 6s 5s +20%
poly_frombytes_native_x86_64 6s 7s -14%
poly_invntt_tomont_native 6s 3s +100%
mlk_poly_add 5s 5s +0%
mlk_poly_cbd_eta2 5s 6s -17%
mlk_poly_reduce_c 5s 4s +25%
mlk_polyvec_decompress_du 5s 2s +150%
mlk_value_barrier_i32 5s 3s +67%
nttunpack_native_x86_64 5s 4s +25%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 5s 4s +25%
intt_native_x86_64 4s 4s +0%
kem_check_pk 4s 3s +33%
kem_enc_derand 4s 3s +33%
kem_keypair_derand 4s 4s +0%
mlk_barrett_reduce 4s 3s +33%
mlk_check_pct 4s 3s +33%
mlk_ct_cmask_neg_i16 4s 2s +100%
mlk_ct_sel_uint8 4s 2s +100%
mlk_montgomery_reduce 4s 3s +33%
mlk_poly_cbd_eta1 4s 2s +100%
mlk_poly_compress_d10 4s 2s +100%
mlk_poly_decompress_d10_c 4s 5s -20%
mlk_poly_decompress_d11_native 4s 2s +100%
mlk_poly_decompress_d4 4s 2s +100%
mlk_poly_getnoise_eta1_4x_native 4s 3s +33%
mlk_poly_invntt_tomont 4s 2s +100%
mlk_poly_mulcache_compute_c 4s 3s +33%
mlk_poly_tobytes_c 4s 3s +33%
mlk_polyvec_frombytes 4s 1s +300%
mlk_polyvec_permute_bitrev_to_custom 4s 2s +100%
mlk_polyvec_reduce 4s 4s +0%
mlk_scalar_compress_d1 4s 3s +33%
mlk_scalar_decompress_d4 4s 2s +100%
mlk_shake128_absorb_once 4s 2s +100%
mlk_shake256 4s 1s +300%
mlk_shake256x4 4s 5s -20%
poly_decompress_d10_native_x86_64 4s 4s +0%
poly_decompress_d11_native_x86_64 4s 1s +300%
poly_decompress_d4_native_x86_64 4s 5s -20%
poly_decompress_d5_native_x86_64 4s 4s +0%
poly_getnoise_eta1122_4x_native 4s 2s +100%
intt_native_aarch64 3s 3s +0%
keccakf1600x4_xor_bytes_native 3s 3s +0%
kem_enc 3s 3s +0%
kem_keypair 3s 3s +0%
mlk_ct_cmask_nonzero_u8 3s 1s +200%
mlk_ct_memcmp 3s 2s +50%
mlk_ct_sel_int16 3s 3s +0%
mlk_gen_matrix 3s 3s +0%
mlk_keccakf1600_extract_bytes (big endian) 3s 1s +200%
mlk_keccakf1600_permute 3s 3s +0%
mlk_keccakf1600x4_extract_bytes 3s 2s +50%
mlk_poly_compress_d10_native 3s 1s +200%
mlk_poly_compress_d11_c 3s 1s +200%
mlk_poly_compress_d11_native 3s 2s +50%
mlk_poly_compress_d4_native 3s 3s +0%
mlk_poly_compress_d5 3s 1s +200%
mlk_poly_compress_d5_native 3s 3s +0%
mlk_poly_compress_du 3s 2s +50%
mlk_poly_compress_dv 3s 1s +200%
mlk_poly_decompress_d11 3s 2s +50%
mlk_poly_decompress_d4_c 3s 1s +200%
mlk_poly_decompress_d5_c 3s 2s +50%
mlk_poly_decompress_d5_native 3s 2s +50%
mlk_poly_decompress_du 3s 2s +50%
mlk_poly_decompress_dv 3s 1s +200%
mlk_poly_frombytes 3s 2s +50%
mlk_poly_getnoise_eta1_4x 3s 2s +50%
mlk_poly_invntt_tomont_c 3s 1s +200%
mlk_poly_mulcache_compute_native 3s 2s +50%
mlk_poly_ntt_c 3s 2s +50%
mlk_poly_tobytes_native 3s 2s +50%
mlk_poly_tomont 3s 2s +50%
mlk_poly_tomont_native 3s 4s -25%
mlk_polyvec_basemul_acc_montgomery_cached 3s 2s +50%
mlk_polyvec_invntt_tomont 3s 3s +0%
mlk_polyvec_permute_bitrev_to_custom_native 3s 2s +50%
mlk_scalar_compress_d5 3s 3s +0%
mlk_shake128x4_absorb_once 3s 2s +50%
mlk_value_barrier_u32 3s 1s +200%
ntt_native_x86_64 3s 2s +50%
poly_compress_d11_native_x86_64 3s 3s +0%
poly_reduce_native_x86_64 3s 1s +200%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 3s +0%
rej_uniform_native 3s 5s -40%
keccak_f1600_x1_native_aarch64 2s 1s +100%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 1s +100%
keccakf1600x4_extract_bytes_native 2s 2s +0%
kem_check_sk 2s 3s -33%
mlk_ct_cmask_nonzero_u16 2s 1s +100%
mlk_ct_get_optblocker_u32 2s 3s -33%
mlk_ct_get_optblocker_u8 2s 2s +0%
mlk_gen_matrix_serial 2s 2s +0%
mlk_keccakf1600_extract_bytes 2s 1s +100%
mlk_keccakf1600x4_permute 2s 1s +100%
mlk_keypair_getnoise_eta1 2s 3s -33%
mlk_poly_compress_d10_c 2s 4s -50%
mlk_poly_compress_d4 2s 3s -33%
mlk_poly_compress_d4_c 2s 6s -67%
mlk_poly_decompress_d5 2s 2s +0%
mlk_poly_frombytes_c 2s 2s +0%
mlk_poly_getnoise_eta2 2s 2s +0%
mlk_poly_mulcache_compute 2s 2s +0%
mlk_poly_ntt 2s 3s -33%
mlk_poly_tobytes 2s 2s +0%
mlk_poly_tomont_c 2s 4s -50%
mlk_polyvec_compress_du 2s 3s -33%
mlk_polyvec_tobytes 2s 4s -50%
mlk_polyvec_tomont 2s 2s +0%
mlk_rej_uniform 2s 3s -33%
mlk_scalar_compress_d10 2s 2s +0%
mlk_scalar_compress_d11 2s 1s +100%
mlk_scalar_compress_d4 2s 3s -33%
mlk_scalar_signed_to_unsigned_q 2s 4s -50%
mlk_sha3_256 2s 2s +0%
mlk_sha3_512 2s 2s +0%
mlk_shake128_squeezeblocks 2s 2s +0%
ntt_native_aarch64 2s 2s +0%
poly_compress_d4_native_x86_64 2s 3s -33%
poly_compress_d5_native_x86_64 2s 2s +0%
poly_mulcache_compute_native_x86_64 2s 3s -33%
poly_reduce_native_aarch64 2s 2s +0%
poly_tobytes_native_aarch64 2s 2s +0%
poly_tomont_native_x86_64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 2s 1s +100%
rej_uniform_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 1s +0%
mlk_ct_cmov_zero 1s 2s -50%
mlk_ct_get_optblocker_i32 1s 2s -50%
mlk_keccakf1600_xor_bytes 1s 2s -50%
mlk_keccakf1600_xor_bytes (big endian) 1s 1s +0%
mlk_keccakf1600x4_xor_bytes 1s 1s +0%
mlk_matvec_mul 1s 3s -67%
mlk_poly_compress_d11 1s 3s -67%
mlk_poly_compress_d5_c 1s 2s -50%
mlk_poly_decompress_d10 1s 1s +0%
mlk_poly_decompress_d11_c 1s 3s -67%
mlk_poly_reduce 1s 3s -67%
mlk_poly_sub 1s 1s +0%
mlk_poly_tomsg 1s 1s +0%
mlk_polyvec_ntt 1s 5s -80%
mlk_scalar_decompress_d10 1s 3s -67%
mlk_scalar_decompress_d11 1s 2s -50%
mlk_scalar_decompress_d5 1s 1s +0%
mlk_shake128x4_squeezeblocks 1s 2s -50%
mlk_value_barrier_u8 1s 3s -67%
poly_compress_d10_native_x86_64 1s 2s -50%
poly_mulcache_compute_native_aarch64 1s 2s -50%
poly_tobytes_native_x86_64 1s 3s -67%
poly_tomont_native_aarch64 1s 2s -50%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 1s 5s -80%
rej_uniform_native_x86_64 1s 3s -67%
sys_check_capability 1s 1s +0%
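
The Change column appears consistent with a plain relative difference between the Current and Previous timings. The sketch below is an assumption about how the figure is derived, not the bot's actual code; `pct_change` is a hypothetical helper name.

```c
#include <assert.h>

/* Relative change of `current` versus `previous`, in percent.
 * Assumed formula; it reproduces the TOTAL rows reported above. */
double pct_change(double current, double previous)
{
  assert(previous > 0.0); /* avoid division by zero */
  return (current - previous) / previous * 100.0;
}
```

For the ML-KEM-512 TOTAL row, `pct_change(1403, 1261)` is roughly 11.26, which matches the reported +11.3% after rounding to one decimal place.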

@oqs-bot
Contributor

oqs-bot commented Jan 23, 2026

CBMC Results (ML-KEM-1024)

Full Results (187 proofs)
Proof Status Current Previous Change
**TOTAL** 1422s 1386s +2.6%
mlk_indcpa_enc 255s 246s +4%
polyvec_basemul_acc_montgomery_cached_native 130s 123s +6%
mlk_keccak_squeezeblocks_x4 122s 122s +0%
mlk_indcpa_keypair_derand 84s 81s +4%
mlk_rej_uniform_c 72s 74s -3%
mlk_polyvec_basemul_acc_montgomery_cached_c 68s 69s -1%
mlk_poly_rej_uniform 35s 33s +6%
poly_ntt_native 30s 32s -6%
mlk_polyvec_add 25s 25s +0%
keccakf1600x4_permute_native_x4 18s 20s -10%
mlk_polyvec_ntt 16s 14s +14%
mlk_poly_reduce_native 15s 15s +0%
mlk_poly_decompress_d5_native 14s 14s +0%
mlk_ntt_layer 13s 11s +18%
mlk_indcpa_dec 12s 13s -8%
mlk_poly_decompress_d11_native 12s 12s +0%
mlk_poly_frommsg 11s 10s +10%
mlk_polymat_permute_bitrev_to_custom 10s 10s +0%
mlk_keccak_absorb_once_x4 9s 7s +29%
mlk_poly_rej_uniform_x4 9s 8s +12%
keccakf1600_permute_native 8s 6s +33%
mlk_gen_matrix 8s 9s -11%
mlk_ntt_butterfly_block 8s 10s -20%
mlk_poly_compress_d11_c 8s 13s -38%
mlk_poly_frombytes_native 8s 7s +14%
poly_frombytes_native_x86_64 8s 5s +60%
kem_dec 7s 7s +0%
mlk_fqmul 7s 7s +0%
mlk_keccak_squeezeblocks 7s 5s +40%
mlk_gen_matrix_serial 6s 5s +20%
mlk_invntt_layer 6s 4s +50%
mlk_keccak_squeeze_once 6s 6s +0%
mlk_keypair_getnoise_eta1 6s 3s +100%
mlk_poly_compress_dv 6s 2s +200%
poly_decompress_d11_native_x86_64 6s 6s +0%
poly_tomont_native_aarch64 6s 2s +200%
intt_native_x86_64 5s 2s +150%
mlk_poly_add 5s 4s +25%
mlk_poly_decompress_d5 5s 3s +67%
mlk_shake256x4 5s 3s +67%
poly_tobytes_native_x86_64 5s 2s +150%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 5s 4s +25%
kem_enc_derand 4s 2s +100%
mlk_ct_cmask_nonzero_u16 4s 3s +33%
mlk_keccak_absorb_once 4s 3s +33%
mlk_keccakf1600_permute 4s 3s +33%
mlk_poly_compress_du 4s 2s +100%
mlk_poly_mulcache_compute_native 4s 1s +300%
mlk_polyvec_decompress_du 4s 3s +33%
mlk_polyvec_permute_bitrev_to_custom 4s 2s +100%
mlk_polyvec_permute_bitrev_to_custom_native 4s 2s +100%
mlk_polyvec_reduce 4s 1s +300%
mlk_polyvec_tobytes 4s 1s +300%
mlk_shake128_absorb_once 4s 3s +33%
ntt_native_x86_64 4s 1s +300%
poly_decompress_d5_native_x86_64 4s 3s +33%
poly_mulcache_compute_native_aarch64 4s 2s +100%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccakf1600x4_xor_bytes_native 3s 4s -25%
kem_check_pk 3s 4s -25%
kem_enc 3s 1s +200%
kem_keypair_derand 3s 2s +50%
mlk_ct_cmask_neg_i16 3s 2s +50%
mlk_ct_sel_uint8 3s 3s +0%
mlk_poly_cbd_eta2 3s 2s +50%
mlk_poly_compress_d11 3s 2s +50%
mlk_poly_compress_d4 3s 1s +200%
mlk_poly_compress_d5_c 3s 2s +50%
mlk_poly_decompress_d10 3s 3s +0%
mlk_poly_decompress_d5_c 3s 2s +50%
mlk_poly_getnoise_eta1_4x 3s 3s +0%
mlk_poly_getnoise_eta1_4x_native 3s 3s +0%
mlk_poly_invntt_tomont_c 3s 5s -40%
mlk_poly_tobytes_native 3s 2s +50%
mlk_poly_tomsg 3s 2s +50%
mlk_polyvec_tomont 3s 1s +200%
mlk_scalar_decompress_d10 3s 2s +50%
mlk_scalar_decompress_d4 3s 2s +50%
mlk_shake256 3s 3s +0%
poly_compress_d11_native_x86_64 3s 2s +50%
poly_compress_d5_native_x86_64 3s 1s +200%
poly_mulcache_compute_native_x86_64 3s 1s +200%
poly_tomont_native_x86_64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 3s 2s +50%
rej_uniform_native_aarch64 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 2s +0%
kem_check_sk 2s 3s -33%
mlk_check_pct 2s 2s +0%
mlk_ct_get_optblocker_i32 2s 1s +100%
mlk_ct_get_optblocker_u32 2s 4s -50%
mlk_ct_get_optblocker_u8 2s 2s +0%
mlk_ct_memcmp 2s 2s +0%
mlk_ct_sel_int16 2s 2s +0%
mlk_keccakf1600_extract_bytes 2s 2s +0%
mlk_keccakf1600_xor_bytes 2s 2s +0%
mlk_keccakf1600_xor_bytes (big endian) 2s 2s +0%
mlk_keccakf1600x4_permute 2s 2s +0%
mlk_keccakf1600x4_xor_bytes 2s 2s +0%
mlk_matvec_mul 2s 2s +0%
mlk_montgomery_reduce 2s 2s +0%
mlk_poly_cbd_eta1 2s 1s +100%
mlk_poly_compress_d10 2s 3s -33%
mlk_poly_compress_d10_c 2s 2s +0%
mlk_poly_compress_d10_native 2s 3s -33%
mlk_poly_compress_d4_c 2s 2s +0%
mlk_poly_compress_d4_native 2s 3s -33%
mlk_poly_decompress_d10_c 2s 4s -50%
mlk_poly_decompress_d11 2s 2s +0%
mlk_poly_decompress_d11_c 2s 2s +0%
mlk_poly_decompress_d4 2s 3s -33%
mlk_poly_decompress_d4_c 2s 3s -33%
mlk_poly_decompress_d4_native 2s 2s +0%
mlk_poly_decompress_du 2s 3s -33%
mlk_poly_decompress_dv 2s 2s +0%
mlk_poly_frombytes 2s 1s +100%
mlk_poly_getnoise_eta1122_4x 2s 1s +100%
mlk_poly_getnoise_eta2 2s 5s -60%
mlk_poly_mulcache_compute 2s 3s -33%
mlk_poly_mulcache_compute_c 2s 3s -33%
mlk_poly_ntt 2s 4s -50%
mlk_poly_ntt_c 2s 2s +0%
mlk_poly_reduce_c 2s 4s -50%
mlk_poly_sub 2s 3s -33%
mlk_poly_tobytes 2s 2s +0%
mlk_poly_tobytes_c 2s 2s +0%
mlk_poly_tomont 2s 6s -67%
mlk_poly_tomont_c 2s 1s +100%
mlk_polyvec_basemul_acc_montgomery_cached 2s 4s -50%
mlk_polyvec_compress_du 2s 2s +0%
mlk_polyvec_frombytes 2s 3s -33%
mlk_polyvec_invntt_tomont 2s 3s -33%
mlk_polyvec_mulcache_compute 2s 2s +0%
mlk_scalar_compress_d1 2s 2s +0%
mlk_scalar_compress_d10 2s 1s +100%
mlk_scalar_compress_d11 2s 1s +100%
mlk_scalar_decompress_d11 2s 3s -33%
mlk_scalar_decompress_d5 2s 3s -33%
mlk_scalar_signed_to_unsigned_q 2s 2s +0%
mlk_sha3_256 2s 4s -50%
mlk_sha3_512 2s 1s +100%
mlk_value_barrier_i32 2s 1s +100%
mlk_value_barrier_u32 2s 3s -33%
mlk_value_barrier_u8 2s 3s -33%
ntt_native_aarch64 2s 4s -50%
nttunpack_native_x86_64 2s 2s +0%
poly_compress_d10_native_x86_64 2s 4s -50%
poly_compress_d4_native_x86_64 2s 2s +0%
poly_decompress_d10_native_x86_64 2s 1s +100%
poly_decompress_d4_native_x86_64 2s 2s +0%
poly_getnoise_eta1122_4x_native 2s 2s +0%
poly_reduce_native_aarch64 2s 3s -33%
poly_reduce_native_x86_64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 2s 3s -33%
rej_uniform_native 2s 1s +100%
rej_uniform_native_x86_64 2s 3s -33%
intt_native_aarch64 1s 1s +0%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_aarch64_v84a 1s 1s +0%
keccak_f1600_x4_native_avx2 1s 1s +0%
kem_keypair 1s 2s -50%
mlk_barrett_reduce 1s 2s -50%
mlk_ct_cmask_nonzero_u8 1s 2s -50%
mlk_ct_cmov_zero 1s 1s +0%
mlk_keccakf1600_extract_bytes (big endian) 1s 1s +0%
mlk_keccakf1600x4_extract_bytes 1s 1s +0%
mlk_poly_compress_d11_native 1s 1s +0%
mlk_poly_compress_d5 1s 3s -67%
mlk_poly_compress_d5_native 1s 1s +0%
mlk_poly_decompress_d10_native 1s 3s -67%
mlk_poly_frombytes_c 1s 1s +0%
mlk_poly_invntt_tomont 1s 2s -50%
mlk_poly_reduce 1s 1s +0%
mlk_poly_tomont_native 1s 3s -67%
mlk_rej_uniform 1s 1s +0%
mlk_scalar_compress_d4 1s 3s -67%
mlk_scalar_compress_d5 1s 3s -67%
mlk_shake128_squeezeblocks 1s 3s -67%
mlk_shake128x4_absorb_once 1s 4s -75%
mlk_shake128x4_squeezeblocks 1s 3s -67%
poly_invntt_tomont_native 1s 2s -50%
poly_tobytes_native_aarch64 1s 4s -75%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 1s 2s -50%
sys_check_capability 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented Jan 23, 2026

CBMC Results (ML-KEM-768)

Full Results (187 proofs)
Proof Status Current Previous Change
**TOTAL** 1438s 1373s +4.7%
mlk_indcpa_enc 252s 256s -2%
mlk_indcpa_keypair_derand 202s 187s +8%
mlk_keccak_squeezeblocks_x4 123s 119s +3%
mlk_rej_uniform_c 74s 68s +9%
polyvec_basemul_acc_montgomery_cached_native 62s 60s +3%
mlk_polyvec_basemul_acc_montgomery_cached_c 47s 45s +4%
mlk_poly_rej_uniform 32s 32s +0%
poly_ntt_native 27s 24s +12%
mlk_polyvec_add 26s 26s +0%
keccakf1600x4_permute_native_x4 18s 19s -5%
mlk_poly_reduce_native 16s 15s +7%
mlk_indcpa_dec 14s 16s -12%
mlk_poly_decompress_d4_native 13s 11s +18%
mlk_poly_decompress_d10_native 12s 10s +20%
mlk_ntt_layer 11s 9s +22%
mlk_poly_frommsg 10s 8s +25%
mlk_keccak_absorb_once_x4 9s 9s +0%
mlk_poly_frombytes_native 9s 7s +29%
mlk_fqmul 7s 6s +17%
mlk_ntt_butterfly_block 7s 6s +17%
mlk_poly_rej_uniform_x4 7s 7s +0%
keccakf1600_permute_native 6s 6s +0%
mlk_gen_matrix_serial 6s 4s +50%
mlk_poly_add 6s 5s +20%
mlk_polymat_permute_bitrev_to_custom 6s 6s +0%
mlk_shake256x4 6s 8s -25%
mlk_invntt_layer 5s 5s +0%
mlk_keccak_squeeze_once 5s 8s -38%
mlk_keccak_squeezeblocks 5s 7s -29%
mlk_poly_decompress_d10_c 5s 4s +25%
mlk_poly_decompress_du 5s 3s +67%
mlk_poly_invntt_tomont 5s 2s +150%
mlk_poly_mulcache_compute_c 5s 3s +67%
mlk_value_barrier_u32 5s 1s +400%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 5s 2s +150%
intt_native_x86_64 4s 2s +100%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 2s +100%
keccakf1600x4_xor_bytes_native 4s 4s +0%
kem_check_sk 4s 1s +300%
kem_dec 4s 4s +0%
kem_enc_derand 4s 2s +100%
kem_keypair_derand 4s 3s +33%
mlk_keccak_absorb_once 4s 4s +0%
mlk_keccakf1600_extract_bytes 4s 2s +100%
mlk_keccakf1600_extract_bytes (big endian) 4s 2s +100%
mlk_keccakf1600x4_xor_bytes 4s 2s +100%
mlk_poly_compress_d10 4s 1s +300%
mlk_poly_compress_d11_c 4s 3s +33%
mlk_poly_compress_d11_native 4s 3s +33%
mlk_poly_decompress_dv 4s 1s +300%
mlk_poly_mulcache_compute_native 4s 2s +100%
mlk_poly_ntt 4s 2s +100%
mlk_poly_ntt_c 4s 6s -33%
mlk_polyvec_permute_bitrev_to_custom_native 4s 2s +100%
mlk_polyvec_tobytes 4s 3s +33%
mlk_scalar_signed_to_unsigned_q 4s 3s +33%
mlk_shake128x4_squeezeblocks 4s 1s +300%
ntt_native_x86_64 4s 3s +33%
nttunpack_native_x86_64 4s 1s +300%
poly_compress_d4_native_x86_64 4s 2s +100%
poly_frombytes_native_x86_64 4s 5s -20%
poly_getnoise_eta1122_4x_native 4s 3s +33%
poly_mulcache_compute_native_x86_64 4s 3s +33%
poly_reduce_native_aarch64 4s 1s +300%
poly_tobytes_native_x86_64 4s 5s -20%
poly_tomont_native_x86_64 4s 3s +33%
rej_uniform_native 4s 2s +100%
rej_uniform_native_aarch64 4s 4s +0%
kem_check_pk 3s 3s +0%
kem_keypair 3s 2s +50%
mlk_barrett_reduce 3s 3s +0%
mlk_ct_cmask_neg_i16 3s 1s +200%
mlk_ct_cmask_nonzero_u16 3s 3s +0%
mlk_ct_memcmp 3s 3s +0%
mlk_gen_matrix 3s 3s +0%
mlk_keccakf1600_permute 3s 3s +0%
mlk_keccakf1600x4_permute 3s 1s +200%
mlk_montgomery_reduce 3s 2s +50%
mlk_poly_compress_d10_c 3s 3s +0%
mlk_poly_compress_d11 3s 1s +200%
mlk_poly_compress_d4 3s 3s +0%
mlk_poly_compress_d5_c 3s 2s +50%
mlk_poly_compress_dv 3s 2s +50%
mlk_poly_decompress_d10 3s 1s +200%
mlk_poly_decompress_d4 3s 2s +50%
mlk_poly_decompress_d5_native 3s 4s -25%
mlk_poly_frombytes 3s 4s -25%
mlk_poly_getnoise_eta1122_4x 3s 4s -25%
mlk_poly_getnoise_eta1_4x 3s 3s +0%
mlk_poly_reduce_c 3s 1s +200%
mlk_poly_tobytes 3s 1s +200%
mlk_poly_tobytes_native 3s 1s +200%
mlk_poly_tomont 3s 2s +50%
mlk_polyvec_frombytes 3s 3s +0%
mlk_polyvec_invntt_tomont 3s 3s +0%
mlk_polyvec_ntt 3s 1s +200%
mlk_rej_uniform 3s 1s +200%
mlk_scalar_compress_d10 3s 2s +50%
mlk_scalar_compress_d11 3s 2s +50%
mlk_scalar_compress_d4 3s 1s +200%
mlk_scalar_compress_d5 3s 2s +50%
poly_decompress_d10_native_x86_64 3s 4s -25%
poly_decompress_d4_native_x86_64 3s 6s -50%
poly_reduce_native_x86_64 3s 2s +50%
poly_tobytes_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 3s +0%
sys_check_capability 3s 2s +50%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
keccak_f1600_x4_native_avx2 2s 3s -33%
keccakf1600x4_extract_bytes_native 2s 1s +100%
kem_enc 2s 3s -33%
mlk_check_pct 2s 3s -33%
mlk_ct_get_optblocker_i32 2s 2s +0%
mlk_ct_get_optblocker_u32 2s 2s +0%
mlk_ct_get_optblocker_u8 2s 3s -33%
mlk_ct_sel_uint8 2s 3s -33%
mlk_keccakf1600_xor_bytes 2s 2s +0%
mlk_keccakf1600_xor_bytes (big endian) 2s 3s -33%
mlk_matvec_mul 2s 5s -60%
mlk_poly_cbd_eta2 2s 2s +0%
mlk_poly_compress_d10_native 2s 1s +100%
mlk_poly_compress_d4_c 2s 4s -50%
mlk_poly_compress_d4_native 2s 3s -33%
mlk_poly_compress_d5 2s 2s +0%
mlk_poly_decompress_d11 2s 2s +0%
mlk_poly_decompress_d11_c 2s 1s +100%
mlk_poly_decompress_d11_native 2s 2s +0%
mlk_poly_decompress_d4_c 2s 1s +100%
mlk_poly_decompress_d5 2s 2s +0%
mlk_poly_decompress_d5_c 2s 3s -33%
mlk_poly_getnoise_eta2 2s 2s +0%
mlk_poly_invntt_tomont_c 2s 3s -33%
mlk_poly_mulcache_compute 2s 2s +0%
mlk_poly_reduce 2s 1s +100%
mlk_poly_sub 2s 3s -33%
mlk_poly_tobytes_c 2s 1s +100%
mlk_poly_tomsg 2s 2s +0%
mlk_polyvec_basemul_acc_montgomery_cached 2s 1s +100%
mlk_polyvec_compress_du 2s 3s -33%
mlk_polyvec_decompress_du 2s 1s +100%
mlk_polyvec_mulcache_compute 2s 2s +0%
mlk_polyvec_permute_bitrev_to_custom 2s 3s -33%
mlk_scalar_compress_d1 2s 1s +100%
mlk_sha3_256 2s 4s -50%
mlk_sha3_512 2s 3s -33%
mlk_shake128_absorb_once 2s 3s -33%
mlk_shake128_squeezeblocks 2s 1s +100%
mlk_shake128x4_absorb_once 2s 4s -50%
mlk_shake256 2s 3s -33%
mlk_value_barrier_i32 2s 2s +0%
mlk_value_barrier_u8 2s 1s +100%
ntt_native_aarch64 2s 4s -50%
poly_compress_d10_native_x86_64 2s 1s +100%
poly_compress_d11_native_x86_64 2s 4s -50%
poly_decompress_d11_native_x86_64 2s 3s -33%
poly_decompress_d5_native_x86_64 2s 1s +100%
poly_mulcache_compute_native_aarch64 2s 2s +0%
poly_tomont_native_aarch64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 2s 4s -50%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 2s 4s -50%
intt_native_aarch64 1s 3s -67%
keccak_f1600_x4_native_aarch64_v84a 1s 1s +0%
mlk_ct_cmask_nonzero_u8 1s 2s -50%
mlk_ct_cmov_zero 1s 2s -50%
mlk_ct_sel_int16 1s 3s -67%
mlk_keccakf1600x4_extract_bytes 1s 3s -67%
mlk_keypair_getnoise_eta1 1s 2s -50%
mlk_poly_cbd_eta1 1s 4s -75%
mlk_poly_compress_d5_native 1s 2s -50%
mlk_poly_compress_du 1s 1s +0%
mlk_poly_frombytes_c 1s 4s -75%
mlk_poly_getnoise_eta1_4x_native 1s 3s -67%
mlk_poly_tomont_c 1s 3s -67%
mlk_poly_tomont_native 1s 2s -50%
mlk_polyvec_reduce 1s 1s +0%
mlk_polyvec_tomont 1s 3s -67%
mlk_scalar_decompress_d10 1s 4s -75%
mlk_scalar_decompress_d11 1s 1s +0%
mlk_scalar_decompress_d4 1s 3s -67%
mlk_scalar_decompress_d5 1s 1s +0%
poly_compress_d5_native_x86_64 1s 4s -75%
poly_invntt_tomont_native 1s 1s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 1s 2s -50%
rej_uniform_native_x86_64 1s 2s -50%

@willieyz willieyz force-pushed the namespace-STACK_SIZE branch from 17d96d2 to 4be4259 Compare January 23, 2026 06:19
@willieyz willieyz marked this pull request as ready for review January 23, 2026 10:05
@willieyz willieyz marked this pull request as draft January 23, 2026 10:12
@willieyz willieyz marked this pull request as ready for review January 23, 2026 11:00
@hanno-becker
Contributor

@willieyz Could you rebase this please?

@willieyz
Contributor Author

> @willieyz Could you rebase this please?

Hello @hanno-becker,
sorry for the late reply.
I’m on vacation right now and will be back on March 23. Is it okay if I rebase the PR then?
I will let you know on Discord when the rebase is done!

- STACK_SIZE should be namespaced as MLK_STACK_SIZE
  to avoid clashing with symbols in consuming libraries.
- KECCAK_F1600_ROUNDS was also missing a namespace prefix;
  namespace it as well.

- This commit also namespaces rej_uniform_asm.

Signed-off-by: willieyz <willie.zhao@chelpis.com>
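
A minimal sketch of the clash the commit message refers to, under stated assumptions: the consumer-side macro, the prefixed name `MLK_KECCAK_F1600_ROUNDS`, and both values are hypothetical and chosen only for illustration (24 is the round count of Keccak-f[1600]; the value used in the actual assembly file is not shown in this thread). If both sides used the unprefixed name with different bodies, the preprocessor would issue a redefinition diagnostic; with a prefix, the two definitions coexist.

```c
#include <assert.h>

/* Hypothetical consumer library that already owns the generic name. */
#define KECCAK_F1600_ROUNDS 12

/* Prefixed library macro (name assumed); it no longer collides with
 * the consumer's definition above. */
#define MLK_KECCAK_F1600_ROUNDS 24

/* Each side sees its own value, independent of the other. */
int example_rounds_consumer(void) { return KECCAK_F1600_ROUNDS; }
int example_rounds_library(void) { return MLK_KECCAK_F1600_ROUNDS; }

/* Both macros are visible in the same translation unit without conflict. */
void example_no_clash(void)
{
  assert(example_rounds_consumer() != example_rounds_library());
}
```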
@willieyz willieyz force-pushed the namespace-STACK_SIZE branch from 4be4259 to 68eabf4 Compare March 23, 2026 17:11
@willieyz
Contributor Author

Hello @hanno-becker,
I have rebased this on top of main. Could you please take another look when you have time?
Thank you for your help!

Contributor

@mkannwischer mkannwischer left a comment

Thanks @willieyz. Generally I support this change for consistency and I checked that you have not missed any non-namespaced macros.

A couple of comments got wrongly changed in the x86_64 code (see below); please change them back.

Since the scope of this PR has changed, please update the commit message and PR description to reflect that.

 //
 // Notes:
-// - We exit early if we find the required number of good values,
+// - We exit early if we find the required number of MLK_GOOD values,
Contributor

Suggested change
-// - We exit early if we find the required number of MLK_GOOD values,
+// - We exit early if we find the required number of good values,

// occupies a corresponding 16-bit element of `MLK_VALS` xmm register,
// 2. Compute an 8-bit value `MLK_GOOD` such that
// MLK_GOOD[i] = MLK_VALS[i] < MLKEM_Q ? 1 : 0, for i in [0, 7],
// 3. Shuffle the elements in `MLK_VALS` such that all MLK_GOOD elements
Contributor

Suggested change
-// 3. Shuffle the elements in `MLK_VALS` such that all MLK_GOOD elements
+// 3. Shuffle the elements in `MLK_VALS` such that all good elements
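
Steps 2 and 3 quoted in the snippet above can be sketched in scalar C. This is an illustration under stated assumptions, not the assembly under review: the function names are hypothetical, and because the quoted comment is cut off after "all good elements", the sketch assumes the accepted elements are moved to the front in their original order. `MLKEM_Q` is 3329, the ML-KEM modulus.

```c
#include <assert.h>
#include <stdint.h>

#define MLKEM_Q 3329

/* Step 2: bit i of the result is 1 iff vals[i] < MLKEM_Q. */
uint8_t good_mask(const uint16_t vals[8])
{
  uint8_t good = 0;
  for (int i = 0; i < 8; i++)
  {
    good |= (uint8_t)((vals[i] < MLKEM_Q) << i);
  }
  return good;
}

/* Step 3 (assumed behaviour): copy the accepted elements to the front
 * of `out`, preserving their order, and return how many were kept. */
unsigned compact_good(uint16_t out[8], const uint16_t vals[8], uint8_t good)
{
  unsigned n = 0;
  for (int i = 0; i < 8; i++)
  {
    if ((good >> i) & 1)
    {
      out[n++] = vals[i];
    }
  }
  assert(n <= 8);
  return n;
}
```

The reviewer's point is visible here: "good" in the prose describes the accepted values, while `MLK_GOOD` names the register holding the mask, so only the latter should carry the prefix.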

-movq $0, cnt // cnt counts the number of good values we've found.
-movq $0, pos // pos is the current position in the input buffer.
-movq $0x5555, pext_mask // 0x5555 mask to extract every second bit.
+movq $0, MLK_CNT // MLK_CNT counts the number of MLK_GOOD values we've found.
Contributor

Suggested change
-movq $0, MLK_CNT // MLK_CNT counts the number of MLK_GOOD values we've found.
+movq $0, MLK_CNT // MLK_CNT counts the number of good values we've found.
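
The `0x5555` mask mentioned in the snippet above selects every second bit. A portable sketch of that extraction follows, written as a software equivalent of a parallel bit extract. The helper names are hypothetical, and the interpretation that the input is a 16-bit mask carrying two bits per 16-bit element is an assumption rather than something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Software parallel bit extract: gathers the bits of `src` selected by
 * `mask` into the low bits of the result, in ascending bit order. */
uint32_t pext_sw(uint32_t src, uint32_t mask)
{
  uint32_t out = 0;
  unsigned k = 0; /* next output bit position */
  for (unsigned bit = 0; bit < 32; bit++)
  {
    if ((mask >> bit) & 1u)
    {
      out |= ((src >> bit) & 1u) << k;
      k++;
    }
  }
  return out;
}

/* Assumed use: reduce a 16-bit mask with two bits per element to one
 * bit per element by keeping the even-numbered bits. */
uint8_t every_second_bit(uint16_t byte_mask)
{
  uint32_t r = pext_sw(byte_mask, 0x5555u);
  assert(r <= 0xFFu); /* 0x5555 has eight set bits, so r fits in 8 bits */
  return (uint8_t)r;
}
```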

pinsrq $1, %rax, MLK_BOUND

-// Broadcast 12-bit mask 0xFFF to all 16-bit elements of bound reg.
+// Broadcast 12-bit mask 0xFFF to all 16-bit elements of MLK_BOUND reg.
Contributor

Suggested change
-// Broadcast 12-bit mask 0xFFF to all 16-bit elements of MLK_BOUND reg.
+// Broadcast 12-bit mask 0xFFF to all 16-bit elements of MLK_AND_MASK reg.

@mkannwischer
Contributor

Marking as draft for now. Please mark it as ready when my comments have been addressed.

@mkannwischer mkannwischer marked this pull request as draft April 4, 2026 02:21
@mkannwischer
Contributor

@willieyz - gentle ping. Could you please get this updated so we can get it merged?

Development

Successfully merging this pull request may close these issues.

Namespace STACK_SIZE in various assembly files

4 participants