native: Unify asm backend symbol naming#1663

Merged
mkannwischer merged 5 commits into main from unify-asm-symbols on Apr 24, 2026

@mkannwischer

No description provided.

Comment thread scripts/autogen Outdated

oqs-bot commented Apr 23, 2026

CBMC Results (ML-KEM-512)

Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** 1318s 1477s -10.8%
mlk_indcpa_keypair_derand 257s 273s -6%
mlk_indcpa_enc 177s 186s -5%
mlk_rej_uniform_c 140s 157s -11%
mlk_polyvec_basemul_acc_montgomery_cached_c 56s 62s -10%
mlk_poly_rej_uniform 34s 35s -3%
mlk_ntt_layer 26s 43s -40%
mlk_keccak_squeezeblocks_x4 25s 31s -19%
poly_ntt_native 25s 31s -19%
mlk_poly_reduce_native 20s 22s -9%
keccakf1600x4_permute_native_x4 19s 18s +6%
mlk_indcpa_dec 15s 18s -17%
mlk_fqmul 14s 20s -30%
mlk_poly_decompress_d4_native 13s 17s -24%
mlk_poly_decompress_d10_native 11s 16s -31%
mlk_polyvec_add 10s 13s -23%
mlk_poly_frommsg 9s 13s -31%
mlk_keccak_squeezeblocks 8s 10s -20%
rej_uniform_native_x86_64 8s 3s +167%
mlk_keccak_absorb_once_x4 7s 7s +0%
mlk_keccak_squeeze_once 7s 9s -22%
mlk_poly_frombytes_native 7s 12s -42%
mlk_ct_cmask_neg_i16 6s 4s +50%
mlk_poly_cbd_eta2 6s 8s -25%
mlk_poly_ntt 6s 9s -33%
polyvec_basemul_acc_montgomery_cached_native 6s 7s -14%
intt_native_aarch64 5s 1s +400%
mlk_keccakf1600_permute_c 5s 7s -29%
mlk_ntt_butterfly_block 5s 10s -50%
mlk_poly_rej_uniform_x4 5s 6s -17%
mlk_polyvec_permute_bitrev_to_custom 5s 4s +25%
mlk_rej_uniform 5s 3s +67%
poly_decompress_d4_native_x86_64 5s 4s +25%
kem_check_pk 4s 5s -20%
kem_dec 4s 5s -20%
kem_enc_derand 4s 3s +33%
mlk_enc_getnoise_eta1_eta2 4s 5s -20%
mlk_gen_matrix 4s 3s +33%
mlk_invntt_layer 4s 6s -33%
mlk_poly_cbd_eta1 4s 4s +0%
mlk_poly_compress_d4_c 4s 2s +100%
mlk_poly_decompress_d11_c 4s 2s +100%
mlk_poly_decompress_du 4s 2s +100%
mlk_poly_mulcache_compute_c 4s 4s +0%
mlk_poly_ntt_c 4s 3s +33%
mlk_poly_tobytes 4s 3s +33%
mlk_polyvec_invntt_tomont 4s 3s +33%
mlk_polyvec_reduce 4s 2s +100%
mlk_scalar_decompress_d10 4s 2s +100%
mlk_shake256x4 4s 4s +0%
poly_compress_d4_native_x86_64 4s 1s +300%
poly_decompress_d10_native_x86_64 4s 6s -33%
poly_decompress_d11_native_x86_64 4s 4s +0%
intt_native_x86_64 3s 2s +50%
keccak_f1600_x1_native_aarch64_v84a 3s 4s -25%
kem_check_sk 3s 4s -25%
kem_keypair 3s 1s +200%
kem_keypair_derand 3s 4s -25%
mlk_ct_get_optblocker_u32 3s 1s +200%
mlk_ct_get_optblocker_u8 3s 2s +50%
mlk_ct_sel_int16 3s 1s +200%
mlk_gen_matrix_serial 3s 1s +200%
mlk_keccak_absorb_once 3s 5s -40%
mlk_keccakf1600_extract_bytes (big endian) 3s 2s +50%
mlk_keccakf1600_permute 3s 2s +50%
mlk_keccakf1600x4_extract_bytes 3s 1s +200%
mlk_keccakf1600x4_permute 3s 2s +50%
mlk_keypair_getnoise_eta1 3s 2s +50%
mlk_poly_compress_d10_native 3s 3s +0%
mlk_poly_compress_d4 3s 2s +50%
mlk_poly_compress_d4_native 3s 2s +50%
mlk_poly_compress_d5_native 3s 3s +0%
mlk_poly_compress_du 3s 3s +0%
mlk_poly_compress_dv 3s 3s +0%
mlk_poly_decompress_dv 3s 4s -25%
mlk_poly_getnoise_eta1122_4x 3s 2s +50%
mlk_poly_invntt_tomont_c 3s 3s +0%
mlk_poly_mulcache_compute_native 3s 2s +50%
mlk_poly_reduce_c 3s 1s +200%
mlk_poly_tomont_c 3s 2s +50%
mlk_polymat_permute_bitrev_to_custom 3s 2s +50%
mlk_polyvec_compress_du 3s 2s +50%
mlk_polyvec_ntt 3s 5s -40%
mlk_polyvec_permute_bitrev_to_custom_native 3s 4s -25%
mlk_scalar_decompress_d5 3s 2s +50%
mlk_sha3_512 3s 3s +0%
mlk_shake128x4_squeezeblocks 3s 1s +200%
mlk_shake256 3s 2s +50%
nttunpack_native_x86_64 3s 2s +50%
poly_compress_d10_native_x86_64 3s 5s -40%
poly_frombytes_native_x86_64 3s 8s -62%
poly_getnoise_eta1122_4x_native 3s 2s +50%
poly_tomont_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 1s +200%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 3s 4s -25%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 3s 4s -25%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 1s +100%
keccakf1600_permute_native 2s 4s -50%
kem_enc 2s 4s -50%
mlk_barrett_reduce 2s 2s +0%
mlk_ct_cmask_nonzero_u16 2s 4s -50%
mlk_ct_cmask_nonzero_u8 2s 3s -33%
mlk_ct_cmov_zero 2s 3s -33%
mlk_ct_sel_uint8 2s 1s +100%
mlk_keccakf1600_xor_bytes (big endian) 2s 2s +0%
mlk_keccakf1600x4_extract_bytes_c 2s 3s -33%
mlk_keccakf1600x4_xor_bytes_c 2s 2s +0%
mlk_poly_add 2s 1s +100%
mlk_poly_compress_d10_c 2s 2s +0%
mlk_poly_compress_d11 2s 2s +0%
mlk_poly_compress_d11_c 2s 1s +100%
mlk_poly_compress_d11_native 2s 3s -33%
mlk_poly_compress_d5_c 2s 2s +0%
mlk_poly_decompress_d10 2s 5s -60%
mlk_poly_decompress_d10_c 2s 2s +0%
mlk_poly_decompress_d11_native 2s 1s +100%
mlk_poly_decompress_d4_c 2s 1s +100%
mlk_poly_decompress_d5 2s 1s +100%
mlk_poly_decompress_d5_c 2s 2s +0%
mlk_poly_decompress_d5_native 2s 1s +100%
mlk_poly_frombytes 2s 4s -50%
mlk_poly_frombytes_c 2s 2s +0%
mlk_poly_getnoise_eta1_4x 2s 2s +0%
mlk_poly_getnoise_eta1_4x_native 2s 2s +0%
mlk_poly_getnoise_eta2 2s 1s +100%
mlk_poly_invntt_tomont 2s 2s +0%
mlk_poly_mulcache_compute 2s 2s +0%
mlk_poly_reduce 2s 3s -33%
mlk_poly_tobytes_c 2s 2s +0%
mlk_poly_tobytes_native 2s 4s -50%
mlk_poly_tomont 2s 3s -33%
mlk_poly_tomsg 2s 4s -50%
mlk_polyvec_basemul_acc_montgomery_cached 2s 2s +0%
mlk_polyvec_decompress_du 2s 3s -33%
mlk_polyvec_tobytes 2s 1s +100%
mlk_polyvec_tomont 2s 2s +0%
mlk_scalar_compress_d1 2s 1s +100%
mlk_scalar_compress_d10 2s 2s +0%
mlk_scalar_compress_d11 2s 3s -33%
mlk_scalar_compress_d5 2s 2s +0%
mlk_scalar_signed_to_unsigned_q 2s 2s +0%
mlk_sha3_256 2s 4s -50%
mlk_shake128_squeezeblocks 2s 2s +0%
mlk_shake128x4_absorb_once 2s 2s +0%
mlk_value_barrier_i32 2s 2s +0%
ntt_native_aarch64 2s 3s -33%
ntt_native_x86_64 2s 6s -67%
poly_compress_d11_native_x86_64 2s 2s +0%
poly_decompress_d5_native_x86_64 2s 3s -33%
poly_invntt_tomont_native 2s 1s +100%
poly_mulcache_compute_native_aarch64 2s 1s +100%
poly_mulcache_compute_native_x86_64 2s 3s -33%
poly_reduce_native_aarch64 2s 4s -50%
poly_reduce_native_x86_64 2s 5s -60%
poly_tomont_native_x86_64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 2s 2s +0%
rej_uniform_native 2s 2s +0%
keccak_f1600_x1_native_aarch64 1s 2s -50%
keccak_f1600_x4_native_aarch64_v84a 1s 2s -50%
keccakf1600x4_extract_bytes_native 1s 1s +0%
keccakf1600x4_xor_bytes_native 1s 1s +0%
mlk_check_pct 1s 2s -50%
mlk_ct_get_optblocker_i32 1s 3s -67%
mlk_ct_memcmp 1s 2s -50%
mlk_keccakf1600_extract_bytes 1s 2s -50%
mlk_keccakf1600_xor_bytes 1s 2s -50%
mlk_keccakf1600x4_xor_bytes 1s 2s -50%
mlk_matvec_mul 1s 4s -75%
mlk_montgomery_reduce 1s 1s +0%
mlk_poly_compress_d10 1s 1s +0%
mlk_poly_compress_d5 1s 3s -67%
mlk_poly_decompress_d11 1s 3s -67%
mlk_poly_decompress_d4 1s 1s +0%
mlk_poly_sub 1s 2s -50%
mlk_poly_tomont_native 1s 4s -75%
mlk_polyvec_frombytes 1s 2s -50%
mlk_polyvec_mulcache_compute 1s 6s -83%
mlk_scalar_compress_d4 1s 2s -50%
mlk_scalar_decompress_d11 1s 4s -75%
mlk_scalar_decompress_d4 1s 4s -75%
mlk_shake128_absorb_once 1s 4s -75%
mlk_value_barrier_u32 1s 1s +0%
mlk_value_barrier_u8 1s 4s -75%
poly_compress_d5_native_x86_64 1s 4s -75%
poly_tobytes_native_aarch64 1s 3s -67%
poly_tobytes_native_x86_64 1s 3s -67%
rej_uniform_native_aarch64 1s 4s -75%
sys_check_capability 1s 3s -67%

oqs-bot commented Apr 23, 2026

CBMC Results (ML-KEM-768)

Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** 1274s 1214s +4.9%
mlk_indcpa_keypair_derand 198s 168s +18%
mlk_indcpa_enc 179s 173s +3%
mlk_rej_uniform_c 130s 124s +5%
mlk_polyvec_basemul_acc_montgomery_cached_c 46s 44s +5%
mlk_poly_rej_uniform 31s 27s +15%
mlk_ntt_layer 29s 27s +7%
mlk_keccak_squeezeblocks_x4 24s 25s -4%
poly_ntt_native 24s 26s -8%
mlk_poly_reduce_native 22s 21s +5%
keccakf1600x4_permute_native_x4 19s 17s +12%
polyvec_basemul_acc_montgomery_cached_native 19s 18s +6%
mlk_fqmul 15s 16s -6%
mlk_poly_decompress_d4_native 15s 15s +0%
mlk_poly_decompress_d10_native 14s 15s -7%
mlk_indcpa_dec 12s 13s -8%
mlk_keccak_squeezeblocks 9s 7s +29%
mlk_poly_frombytes_native 9s 10s -10%
mlk_poly_frommsg 9s 8s +12%
mlk_polyvec_add 9s 8s +12%
kem_dec 7s 5s +40%
mlk_invntt_layer 7s 5s +40%
mlk_keccak_absorb_once_x4 7s 8s -12%
mlk_keccak_squeeze_once 7s 9s -22%
mlk_poly_ntt 7s 7s +0%
kem_check_pk 6s 4s +50%
mlk_gen_matrix 6s 5s +20%
mlk_ntt_butterfly_block 6s 6s +0%
mlk_poly_rej_uniform_x4 6s 7s -14%
rej_uniform_native_x86_64 6s 3s +100%
mlk_keccak_absorb_once 5s 5s +0%
mlk_matvec_mul 5s 2s +150%
mlk_polyvec_compress_du 5s 3s +67%
mlk_scalar_compress_d11 5s 3s +67%
mlk_scalar_decompress_d5 5s 3s +67%
poly_decompress_d10_native_x86_64 5s 6s -17%
poly_frombytes_native_x86_64 5s 5s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 5s 2s +150%
rej_uniform_native 5s 2s +150%
kem_enc_derand 4s 2s +100%
mlk_barrett_reduce 4s 2s +100%
mlk_ct_get_optblocker_u32 4s 2s +100%
mlk_keccakf1600_permute_c 4s 4s +0%
mlk_poly_cbd_eta2 4s 1s +300%
mlk_poly_decompress_d11_native 4s 3s +33%
mlk_polymat_permute_bitrev_to_custom 4s 4s +0%
mlk_polyvec_mulcache_compute 4s 3s +33%
mlk_scalar_compress_d4 4s 1s +300%
mlk_shake256x4 4s 4s +0%
poly_decompress_d4_native_x86_64 4s 3s +33%
poly_getnoise_eta1122_4x_native 4s 1s +300%
poly_mulcache_compute_native_x86_64 4s 2s +100%
poly_tobytes_native_aarch64 4s 3s +33%
poly_tomont_native_x86_64 4s 2s +100%
intt_native_aarch64 3s 3s +0%
intt_native_x86_64 3s 2s +50%
keccak_f1600_x1_native_aarch64 3s 1s +200%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 4s -25%
keccak_f1600_x4_native_avx2 3s 1s +200%
kem_keypair_derand 3s 4s -25%
mlk_ct_get_optblocker_i32 3s 3s +0%
mlk_gen_matrix_serial 3s 4s -25%
mlk_keccakf1600_permute 3s 2s +50%
mlk_keccakf1600_xor_bytes 3s 2s +50%
mlk_keccakf1600x4_permute 3s 2s +50%
mlk_keypair_getnoise_eta1 3s 4s -25%
mlk_poly_compress_d11 3s 1s +200%
mlk_poly_compress_d4_c 3s 3s +0%
mlk_poly_compress_d4_native 3s 3s +0%
mlk_poly_compress_d5 3s 2s +50%
mlk_poly_compress_d5_native 3s 1s +200%
mlk_poly_compress_du 3s 2s +50%
mlk_poly_compress_dv 3s 1s +200%
mlk_poly_decompress_d10_c 3s 3s +0%
mlk_poly_getnoise_eta1_4x 3s 2s +50%
mlk_poly_ntt_c 3s 3s +0%
mlk_poly_reduce 3s 3s +0%
mlk_poly_tomsg 3s 2s +50%
mlk_polyvec_decompress_du 3s 1s +200%
mlk_polyvec_frombytes 3s 3s +0%
mlk_polyvec_invntt_tomont 3s 1s +200%
mlk_polyvec_ntt 3s 4s -25%
mlk_polyvec_permute_bitrev_to_custom 3s 2s +50%
mlk_polyvec_permute_bitrev_to_custom_native 3s 3s +0%
mlk_polyvec_reduce 3s 1s +200%
mlk_polyvec_tomont 3s 2s +50%
mlk_scalar_signed_to_unsigned_q 3s 5s -40%
mlk_shake128_absorb_once 3s 1s +200%
mlk_shake128x4_squeezeblocks 3s 1s +200%
mlk_value_barrier_u8 3s 2s +50%
ntt_native_aarch64 3s 3s +0%
nttunpack_native_x86_64 3s 3s +0%
poly_compress_d11_native_x86_64 3s 3s +0%
poly_compress_d4_native_x86_64 3s 1s +200%
poly_compress_d5_native_x86_64 3s 3s +0%
poly_invntt_tomont_native 3s 1s +200%
poly_mulcache_compute_native_aarch64 3s 2s +50%
poly_reduce_native_x86_64 3s 1s +200%
poly_tobytes_native_x86_64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 3s 4s -25%
rej_uniform_native_aarch64 3s 2s +50%
keccak_f1600_x1_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccakf1600_permute_native 2s 1s +100%
keccakf1600x4_extract_bytes_native 2s 1s +100%
kem_enc 2s 2s +0%
mlk_ct_cmask_nonzero_u16 2s 1s +100%
mlk_ct_cmov_zero 2s 3s -33%
mlk_ct_get_optblocker_u8 2s 2s +0%
mlk_ct_sel_int16 2s 4s -50%
mlk_ct_sel_uint8 2s 3s -33%
mlk_keccakf1600_extract_bytes 2s 2s +0%
mlk_keccakf1600_extract_bytes (big endian) 2s 2s +0%
mlk_keccakf1600_xor_bytes (big endian) 2s 4s -50%
mlk_keccakf1600x4_extract_bytes_c 2s 1s +100%
mlk_keccakf1600x4_xor_bytes 2s 2s +0%
mlk_montgomery_reduce 2s 2s +0%
mlk_poly_add 2s 2s +0%
mlk_poly_cbd_eta1 2s 2s +0%
mlk_poly_compress_d10 2s 2s +0%
mlk_poly_compress_d10_c 2s 3s -33%
mlk_poly_compress_d10_native 2s 4s -50%
mlk_poly_compress_d11_native 2s 2s +0%
mlk_poly_compress_d5_c 2s 1s +100%
mlk_poly_decompress_d11 2s 2s +0%
mlk_poly_decompress_d4_c 2s 2s +0%
mlk_poly_decompress_d5 2s 2s +0%
mlk_poly_decompress_d5_native 2s 2s +0%
mlk_poly_decompress_du 2s 3s -33%
mlk_poly_decompress_dv 2s 2s +0%
mlk_poly_getnoise_eta1_4x_native 2s 1s +100%
mlk_poly_getnoise_eta2 2s 3s -33%
mlk_poly_invntt_tomont 2s 2s +0%
mlk_poly_invntt_tomont_c 2s 4s -50%
mlk_poly_mulcache_compute 2s 3s -33%
mlk_poly_mulcache_compute_c 2s 4s -50%
mlk_poly_mulcache_compute_native 2s 1s +100%
mlk_poly_reduce_c 2s 2s +0%
mlk_poly_sub 2s 1s +100%
mlk_poly_tobytes 2s 3s -33%
mlk_poly_tobytes_c 2s 3s -33%
mlk_poly_tomont 2s 2s +0%
mlk_poly_tomont_c 2s 1s +100%
mlk_polyvec_tobytes 2s 1s +100%
mlk_rej_uniform 2s 4s -50%
mlk_scalar_compress_d1 2s 3s -33%
mlk_scalar_decompress_d10 2s 2s +0%
mlk_scalar_decompress_d4 2s 2s +0%
mlk_value_barrier_i32 2s 2s +0%
ntt_native_x86_64 2s 2s +0%
poly_compress_d10_native_x86_64 2s 3s -33%
poly_decompress_d5_native_x86_64 2s 2s +0%
poly_reduce_native_aarch64 2s 4s -50%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 2s +0%
sys_check_capability 2s 2s +0%
keccakf1600x4_xor_bytes_native 1s 1s +0%
kem_check_sk 1s 3s -67%
kem_keypair 1s 4s -75%
mlk_check_pct 1s 3s -67%
mlk_ct_cmask_neg_i16 1s 1s +0%
mlk_ct_cmask_nonzero_u8 1s 2s -50%
mlk_ct_memcmp 1s 2s -50%
mlk_enc_getnoise_eta1_eta2 1s 3s -67%
mlk_keccakf1600x4_extract_bytes 1s 5s -80%
mlk_keccakf1600x4_xor_bytes_c 1s 1s +0%
mlk_poly_compress_d11_c 1s 2s -50%
mlk_poly_compress_d4 1s 1s +0%
mlk_poly_decompress_d10 1s 3s -67%
mlk_poly_decompress_d11_c 1s 2s -50%
mlk_poly_decompress_d4 1s 3s -67%
mlk_poly_decompress_d5_c 1s 1s +0%
mlk_poly_frombytes 1s 2s -50%
mlk_poly_frombytes_c 1s 4s -75%
mlk_poly_getnoise_eta1122_4x 1s 2s -50%
mlk_poly_tobytes_native 1s 4s -75%
mlk_poly_tomont_native 1s 1s +0%
mlk_polyvec_basemul_acc_montgomery_cached 1s 2s -50%
mlk_scalar_compress_d10 1s 3s -67%
mlk_scalar_compress_d5 1s 4s -75%
mlk_scalar_decompress_d11 1s 3s -67%
mlk_sha3_256 1s 1s +0%
mlk_sha3_512 1s 3s -67%
mlk_shake128_squeezeblocks 1s 3s -67%
mlk_shake128x4_absorb_once 1s 3s -67%
mlk_shake256 1s 2s -50%
mlk_value_barrier_u32 1s 3s -67%
poly_decompress_d11_native_x86_64 1s 3s -67%
poly_tomont_native_aarch64 1s 2s -50%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 1s 2s -50%

oqs-bot commented Apr 23, 2026

CBMC Results (ML-KEM-1024)

Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** 1173s 1154s +1.6%
mlk_indcpa_enc 133s 133s +0%
mlk_indcpa_keypair_derand 120s 124s -3%
mlk_rej_uniform_c 108s 108s +0%
mlk_polyvec_basemul_acc_montgomery_cached_c 73s 72s +1%
polyvec_basemul_acc_montgomery_cached_native 35s 31s +13%
mlk_poly_rej_uniform 30s 28s +7%
mlk_ntt_layer 29s 27s +7%
mlk_keccak_squeezeblocks_x4 25s 25s +0%
poly_ntt_native 22s 21s +5%
mlk_poly_reduce_native 20s 20s +0%
keccakf1600x4_permute_native_x4 19s 17s +12%
mlk_poly_decompress_d11_native 16s 13s +23%
mlk_fqmul 15s 16s -6%
mlk_polyvec_add 12s 13s -8%
mlk_poly_decompress_d5_native 11s 14s -21%
mlk_poly_frommsg 10s 7s +43%
mlk_poly_frombytes_native 8s 10s -20%
kem_dec 7s 6s +17%
mlk_indcpa_dec 7s 7s +0%
mlk_keccak_squeeze_once 7s 8s -12%
mlk_keccak_squeezeblocks 7s 11s -36%
mlk_ntt_butterfly_block 7s 7s +0%
mlk_poly_ntt 7s 5s +40%
mlk_polymat_permute_bitrev_to_custom 7s 6s +17%
rej_uniform_native_x86_64 7s 3s +133%
mlk_invntt_layer 6s 6s +0%
mlk_keccak_absorb_once_x4 6s 6s +0%
mlk_ct_cmask_neg_i16 5s 3s +67%
mlk_gen_matrix 5s 5s +0%
mlk_keccak_absorb_once 5s 6s -17%
mlk_poly_mulcache_compute_c 5s 2s +150%
mlk_poly_rej_uniform_x4 5s 7s -29%
mlk_polyvec_ntt 5s 3s +67%
mlk_scalar_signed_to_unsigned_q 5s 3s +67%
poly_frombytes_native_x86_64 5s 5s +0%
kem_keypair 4s 2s +100%
mlk_ct_get_optblocker_u32 4s 1s +300%
mlk_ct_get_optblocker_u8 4s 3s +33%
mlk_gen_matrix_serial 4s 4s +0%
mlk_keccakf1600_permute_c 4s 5s -20%
mlk_matvec_mul 4s 3s +33%
mlk_poly_compress_d10 4s 3s +33%
mlk_poly_compress_d11_c 4s 5s -20%
mlk_poly_compress_du 4s 1s +300%
mlk_poly_frombytes 4s 2s +100%
mlk_poly_mulcache_compute 4s 1s +300%
mlk_polyvec_decompress_du 4s 2s +100%
mlk_polyvec_permute_bitrev_to_custom_native 4s 3s +33%
mlk_shake128_squeezeblocks 4s 1s +300%
mlk_shake256x4 4s 3s +33%
ntt_native_x86_64 4s 3s +33%
poly_decompress_d11_native_x86_64 4s 6s -33%
poly_decompress_d5_native_x86_64 4s 3s +33%
poly_mulcache_compute_native_aarch64 4s 4s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 4s 2s +100%
keccakf1600_permute_native 3s 3s +0%
keccakf1600x4_extract_bytes_native 3s 3s +0%
keccakf1600x4_xor_bytes_native 3s 2s +50%
kem_check_pk 3s 2s +50%
kem_enc 3s 2s +50%
mlk_barrett_reduce 3s 2s +50%
mlk_check_pct 3s 4s -25%
mlk_ct_cmask_nonzero_u16 3s 3s +0%
mlk_ct_cmov_zero 3s 1s +200%
mlk_enc_getnoise_eta1_eta2 3s 3s +0%
mlk_keccakf1600_xor_bytes (big endian) 3s 1s +200%
mlk_keccakf1600x4_extract_bytes_c 3s 3s +0%
mlk_keccakf1600x4_permute 3s 2s +50%
mlk_keccakf1600x4_xor_bytes_c 3s 2s +50%
mlk_poly_cbd_eta1 3s 1s +200%
mlk_poly_cbd_eta2 3s 3s +0%
mlk_poly_compress_d4_native 3s 2s +50%
mlk_poly_compress_d5_native 3s 2s +50%
mlk_poly_decompress_d5_c 3s 3s +0%
mlk_poly_decompress_du 3s 2s +50%
mlk_poly_decompress_dv 3s 3s +0%
mlk_poly_getnoise_eta1122_4x 3s 2s +50%
mlk_poly_getnoise_eta1_4x_native 3s 2s +50%
mlk_poly_invntt_tomont_c 3s 1s +200%
mlk_poly_mulcache_compute_native 3s 4s -25%
mlk_poly_ntt_c 3s 3s +0%
mlk_poly_tobytes 3s 1s +200%
mlk_poly_tobytes_c 3s 2s +50%
mlk_poly_tomont 3s 4s -25%
mlk_poly_tomont_c 3s 2s +50%
mlk_polyvec_basemul_acc_montgomery_cached 3s 4s -25%
mlk_polyvec_compress_du 3s 3s +0%
mlk_polyvec_permute_bitrev_to_custom 3s 3s +0%
mlk_polyvec_tobytes 3s 3s +0%
mlk_scalar_compress_d11 3s 2s +50%
mlk_scalar_compress_d5 3s 3s +0%
mlk_sha3_512 3s 1s +200%
mlk_shake128_absorb_once 3s 4s -25%
mlk_shake128x4_absorb_once 3s 3s +0%
mlk_shake128x4_squeezeblocks 3s 4s -25%
ntt_native_aarch64 3s 1s +200%
poly_compress_d11_native_x86_64 3s 3s +0%
poly_compress_d5_native_x86_64 3s 3s +0%
poly_getnoise_eta1122_4x_native 3s 1s +200%
poly_mulcache_compute_native_x86_64 3s 4s -25%
poly_reduce_native_x86_64 3s 1s +200%
poly_tomont_native_x86_64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 3s 2s +50%
rej_uniform_native 3s 2s +50%
rej_uniform_native_aarch64 3s 4s -25%
intt_native_aarch64 2s 2s +0%
intt_native_x86_64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 1s +100%
keccak_f1600_x4_native_avx2 2s 2s +0%
kem_check_sk 2s 3s -33%
mlk_ct_cmask_nonzero_u8 2s 1s +100%
mlk_ct_get_optblocker_i32 2s 1s +100%
mlk_ct_memcmp 2s 3s -33%
mlk_ct_sel_uint8 2s 4s -50%
mlk_keccakf1600_extract_bytes 2s 3s -33%
mlk_keccakf1600_permute 2s 1s +100%
mlk_keccakf1600_xor_bytes 2s 2s +0%
mlk_keypair_getnoise_eta1 2s 3s -33%
mlk_montgomery_reduce 2s 4s -50%
mlk_poly_add 2s 1s +100%
mlk_poly_compress_d10_c 2s 2s +0%
mlk_poly_compress_d11 2s 3s -33%
mlk_poly_compress_d4 2s 1s +100%
mlk_poly_compress_d4_c 2s 2s +0%
mlk_poly_compress_d5 2s 2s +0%
mlk_poly_compress_d5_c 2s 5s -60%
mlk_poly_decompress_d10 2s 2s +0%
mlk_poly_decompress_d10_c 2s 1s +100%
mlk_poly_decompress_d10_native 2s 2s +0%
mlk_poly_decompress_d4_c 2s 2s +0%
mlk_poly_decompress_d4_native 2s 2s +0%
mlk_poly_decompress_d5 2s 2s +0%
mlk_poly_getnoise_eta1_4x 2s 3s -33%
mlk_poly_getnoise_eta2 2s 3s -33%
mlk_poly_reduce 2s 1s +100%
mlk_poly_reduce_c 2s 2s +0%
mlk_poly_sub 2s 1s +100%
mlk_poly_tobytes_native 2s 2s +0%
mlk_poly_tomont_native 2s 1s +100%
mlk_poly_tomsg 2s 3s -33%
mlk_polyvec_frombytes 2s 4s -50%
mlk_polyvec_invntt_tomont 2s 2s +0%
mlk_polyvec_mulcache_compute 2s 3s -33%
mlk_polyvec_reduce 2s 2s +0%
mlk_polyvec_tomont 2s 4s -50%
mlk_rej_uniform 2s 1s +100%
mlk_scalar_compress_d10 2s 3s -33%
mlk_scalar_compress_d4 2s 2s +0%
mlk_scalar_decompress_d10 2s 2s +0%
mlk_scalar_decompress_d11 2s 1s +100%
mlk_sha3_256 2s 1s +100%
mlk_shake256 2s 2s +0%
mlk_value_barrier_i32 2s 2s +0%
mlk_value_barrier_u8 2s 2s +0%
nttunpack_native_x86_64 2s 3s -33%
poly_compress_d10_native_x86_64 2s 3s -33%
poly_compress_d4_native_x86_64 2s 2s +0%
poly_decompress_d10_native_x86_64 2s 3s -33%
poly_decompress_d4_native_x86_64 2s 3s -33%
poly_reduce_native_aarch64 2s 1s +100%
poly_tobytes_native_x86_64 2s 2s +0%
poly_tomont_native_aarch64 2s 4s -50%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 2s +0%
sys_check_capability 2s 4s -50%
keccak_f1600_x1_native_aarch64 1s 3s -67%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_aarch64_v84a 1s 2s -50%
kem_enc_derand 1s 1s +0%
kem_keypair_derand 1s 3s -67%
mlk_ct_sel_int16 1s 2s -50%
mlk_keccakf1600_extract_bytes (big endian) 1s 2s -50%
mlk_keccakf1600x4_extract_bytes 1s 1s +0%
mlk_keccakf1600x4_xor_bytes 1s 5s -80%
mlk_poly_compress_d10_native 1s 2s -50%
mlk_poly_compress_d11_native 1s 2s -50%
mlk_poly_compress_dv 1s 2s -50%
mlk_poly_decompress_d11 1s 3s -67%
mlk_poly_decompress_d11_c 1s 3s -67%
mlk_poly_decompress_d4 1s 1s +0%
mlk_poly_frombytes_c 1s 1s +0%
mlk_poly_invntt_tomont 1s 3s -67%
mlk_scalar_compress_d1 1s 3s -67%
mlk_scalar_decompress_d4 1s 5s -80%
mlk_scalar_decompress_d5 1s 2s -50%
mlk_value_barrier_u32 1s 2s -50%
poly_invntt_tomont_native 1s 4s -75%
poly_tobytes_native_aarch64 1s 2s -50%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 1s 2s -50%

@mkannwischer force-pushed the unify-asm-symbols branch 3 times, most recently from 5945253 to 41bf8d8 (April 23, 2026 10:08)
Comment thread mlkem/src/native/aarch64/src/arith_native_aarch64.h Outdated
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Every aarch64 asm symbol now ends in `_aarch64_asm`; every x86_64
avx2 symbol now ends in `_avx2_asm`.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
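
The suffix rule stated in the commit message above can be sketched as a small check. This is only an illustration under assumptions: the `asm_backend` enum, the helper names, and the example symbol names in the usage note are hypothetical and are not taken from the repository or from `scripts/autogen`.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` ends with `suffix`. */
static bool has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t s = strlen(suffix);
    return n >= s && memcmp(name + n - s, suffix, s) == 0;
}

/* Hypothetical backend tags, introduced only for this sketch. */
enum asm_backend { BACKEND_AARCH64, BACKEND_X86_64_AVX2 };

/* Checks a symbol name against the suffix rule from the commit message:
 * aarch64 asm symbols end in `_aarch64_asm`, x86_64 avx2 symbols end in
 * `_avx2_asm`. */
static bool symbol_name_ok(enum asm_backend backend, const char *name)
{
    switch (backend) {
    case BACKEND_AARCH64:
        return has_suffix(name, "_aarch64_asm");
    case BACKEND_X86_64_AVX2:
        return has_suffix(name, "_avx2_asm");
    }
    return false;
}
```

With made-up names, `symbol_name_ok(BACKEND_AARCH64, "mlk_ntt_aarch64_asm")` is accepted, while `symbol_name_ok(BACKEND_X86_64_AVX2, "mlk_ntt_avx2")` is rejected because the `_asm` part of the suffix is missing.
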
@mkannwischer marked this pull request as ready for review April 23, 2026 12:16
@mkannwischer requested a review from a team as a code owner April 23, 2026 12:16
Comment thread test/bench/bench_components_mlkem.c Outdated
@mkannwischer added the `benchmark` label (this PR should be benchmarked in CI) Apr 23, 2026

@oqs-bot left a comment

Mac Mini (M1, 2020) benchmarks

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12319 cycles | 12319 cycles | 1 |
| ML-KEM-512 encaps | 14997 cycles | 14999 cycles | 1.00 |
| ML-KEM-512 decaps | 19549 cycles | 19553 cycles | 1.00 |
| ML-KEM-768 keypair | 21264 cycles | 21264 cycles | 1 |
| ML-KEM-768 encaps | 23870 cycles | 23873 cycles | 1.00 |
| ML-KEM-768 decaps | 30422 cycles | 30416 cycles | 1.00 |
| ML-KEM-1024 keypair | 30328 cycles | 30327 cycles | 1.00 |
| ML-KEM-1024 encaps | 34573 cycles | 34574 cycles | 1.00 |
| ML-KEM-1024 decaps | 44189 cycles | 44192 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12029 cycles | 12033 cycles | 1.00 |
| ML-KEM-512 encaps | 13633 cycles | 13743 cycles | 0.99 |
| ML-KEM-512 decaps | 17834 cycles | 17770 cycles | 1.00 |
| ML-KEM-768 keypair | 21180 cycles | 21022 cycles | 1.01 |
| ML-KEM-768 encaps | 21967 cycles | 22045 cycles | 1.00 |
| ML-KEM-768 decaps | 27985 cycles | 28383 cycles | 0.99 |
| ML-KEM-1024 keypair | 30010 cycles | 29907 cycles | 1.00 |
| ML-KEM-1024 encaps | 31745 cycles | 31725 cycles | 1.00 |
| ML-KEM-1024 decaps | 39438 cycles | 39485 cycles | 1.00 |

@oqs-bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59732 cycles | 59743 cycles | 1.00 |
| ML-KEM-512 encaps | 67426 cycles | 67428 cycles | 1.00 |
| ML-KEM-512 decaps | 86091 cycles | 86147 cycles | 1.00 |
| ML-KEM-768 keypair | 97452 cycles | 97421 cycles | 1.00 |
| ML-KEM-768 encaps | 111054 cycles | 110877 cycles | 1.00 |
| ML-KEM-768 decaps | 137799 cycles | 137695 cycles | 1.00 |
| ML-KEM-1024 keypair | 155079 cycles | 154604 cycles | 1.00 |
| ML-KEM-1024 encaps | 172564 cycles | 171622 cycles | 1.01 |
| ML-KEM-1024 decaps | 210021 cycles | 207533 cycles | 1.01 |

@oqs-bot left a comment

ppc64le (POWER10) benchmarks

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59212 cycles | 59193 cycles | 1.00 |
| ML-KEM-512 encaps | 72019 cycles | 71878 cycles | 1.00 |
| ML-KEM-512 decaps | 91710 cycles | 91690 cycles | 1.00 |
| ML-KEM-768 keypair | 98099 cycles | 98764 cycles | 0.99 |
| ML-KEM-768 encaps | 114462 cycles | 115317 cycles | 0.99 |
| ML-KEM-768 decaps | 140083 cycles | 140839 cycles | 0.99 |
| ML-KEM-1024 keypair | 149092 cycles | 148527 cycles | 1.00 |
| ML-KEM-1024 encaps | 167487 cycles | 167276 cycles | 1.00 |
| ML-KEM-1024 decaps | 198880 cycles | 198267 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12794 cycles | 12761 cycles | 1.00 |
| ML-KEM-512 encaps | 14277 cycles | 14248 cycles | 1.00 |
| ML-KEM-512 decaps | 19151 cycles | 19120 cycles | 1.00 |
| ML-KEM-768 keypair | 22500 cycles | 22417 cycles | 1.00 |
| ML-KEM-768 encaps | 23072 cycles | 23038 cycles | 1.00 |
| ML-KEM-768 decaps | 30064 cycles | 30055 cycles | 1.00 |
| ML-KEM-1024 keypair | 34279 cycles | 32991 cycles | 1.04 |
| ML-KEM-1024 encaps | 33074 cycles | 32995 cycles | 1.00 |
| ML-KEM-1024 decaps | 42450 cycles | 42448 cycles | 1.00 |

@oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-1024 keypair | 34279 cycles | 32991 cycles | 1.04 |
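
The alert condition can be reproduced from the numbers in the table. The sketch below assumes that the ratio is the current cycle count divided by the previous one and that an alert fires when the ratio is strictly greater than the threshold. The function names are hypothetical and the exact comparison used by github-action-benchmark is an assumption here.

```c
#include <stdbool.h>

/* Ratio of current to previous cycle counts; a value above 1.0 means the
 * current commit is slower. The caller must pass previous > 0. */
static double bench_ratio(unsigned long current, unsigned long previous)
{
    return (double)current / (double)previous;
}

/* Assumed alert rule: flag the result when the ratio exceeds the threshold. */
static bool is_regression(unsigned long current, unsigned long previous,
                          double threshold)
{
    return bench_ratio(current, previous) > threshold;
}
```

For the flagged row, 34279 / 32991 is roughly 1.039, which exceeds 1.03 and is shown rounded as 1.04. The ML-KEM-512 keypair row on the same machine (12794 versus 12761 cycles) stays well under the threshold.
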

@oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28278 cycles | 28147 cycles | 1.00 |
| ML-KEM-512 encaps | 36608 cycles | 36558 cycles | 1.00 |
| ML-KEM-512 decaps | 45130 cycles | 45150 cycles | 1.00 |
| ML-KEM-768 keypair | 46203 cycles | 46280 cycles | 1.00 |
| ML-KEM-768 encaps | 55777 cycles | 55581 cycles | 1.00 |
| ML-KEM-768 decaps | 69885 cycles | 69890 cycles | 1.00 |
| ML-KEM-1024 keypair | 70411 cycles | 70335 cycles | 1.00 |
| ML-KEM-1024 encaps | 82426 cycles | 82474 cycles | 1.00 |
| ML-KEM-1024 decaps | 99288 cycles | 98864 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 14248 cycles | 14310 cycles | 1.00 |
| ML-KEM-512 encaps | 15995 cycles | 16022 cycles | 1.00 |
| ML-KEM-512 decaps | 21555 cycles | 21523 cycles | 1.00 |
| ML-KEM-768 keypair | 25117 cycles | 24794 cycles | 1.01 |
| ML-KEM-768 encaps | 25683 cycles | 25510 cycles | 1.01 |
| ML-KEM-768 decaps | 33555 cycles | 33318 cycles | 1.01 |
| ML-KEM-1024 keypair | 37387 cycles | 37273 cycles | 1.00 |
| ML-KEM-1024 encaps | 36158 cycles | 37026 cycles | 0.98 |
| ML-KEM-1024 decaps | 47254 cycles | 46782 cycles | 1.01 |

@oqs-bot left a comment

Graviton4

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17671 cycles | 17649 cycles | 1.00 |
| ML-KEM-512 encaps | 20596 cycles | 20599 cycles | 1.00 |
| ML-KEM-512 decaps | 27070 cycles | 27066 cycles | 1.00 |
| ML-KEM-768 keypair | 29914 cycles | 29900 cycles | 1.00 |
| ML-KEM-768 encaps | 32725 cycles | 32770 cycles | 1.00 |
| ML-KEM-768 decaps | 41982 cycles | 41963 cycles | 1.00 |
| ML-KEM-1024 keypair | 43717 cycles | 43745 cycles | 1.00 |
| ML-KEM-1024 encaps | 48774 cycles | 48728 cycles | 1.00 |
| ML-KEM-1024 decaps | 61377 cycles | 61389 cycles | 1.00 |

@oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17598 cycles | 17549 cycles | 1.00 |
| ML-KEM-512 encaps | 19899 cycles | 19884 cycles | 1.00 |
| ML-KEM-512 decaps | 26413 cycles | 26413 cycles | 1 |
| ML-KEM-768 keypair | 31194 cycles | 33114 cycles | 0.94 |
| ML-KEM-768 encaps | 31568 cycles | 31081 cycles | 1.02 |
| ML-KEM-768 decaps | 41462 cycles | 41514 cycles | 1.00 |
| ML-KEM-1024 keypair | 44361 cycles | 43946 cycles | 1.01 |
| ML-KEM-1024 encaps | 45812 cycles | 45954 cycles | 1.00 |
| ML-KEM-1024 decaps | 58039 cycles | 58219 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 36585 cycles | 36629 cycles | 1.00 |
| ML-KEM-512 encaps | 43090 cycles | 43067 cycles | 1.00 |
| ML-KEM-512 decaps | 55715 cycles | 55695 cycles | 1.00 |
| ML-KEM-768 keypair | 58681 cycles | 58664 cycles | 1.00 |
| ML-KEM-768 encaps | 67556 cycles | 67602 cycles | 1.00 |
| ML-KEM-768 decaps | 84514 cycles | 84441 cycles | 1.00 |
| ML-KEM-1024 keypair | 89040 cycles | 88993 cycles | 1.00 |
| ML-KEM-1024 encaps | 99242 cycles | 99200 cycles | 1.00 |
| ML-KEM-1024 decaps | 120612 cycles | 120553 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 40222 cycles | 40253 cycles | 1.00 |
| ML-KEM-512 encaps | 48433 cycles | 48414 cycles | 1.00 |
| ML-KEM-512 decaps | 62654 cycles | 62588 cycles | 1.00 |
| ML-KEM-768 keypair | 63818 cycles | 63684 cycles | 1.00 |
| ML-KEM-768 encaps | 74918 cycles | 74995 cycles | 1.00 |
| ML-KEM-768 decaps | 93781 cycles | 93558 cycles | 1.00 |
| ML-KEM-1024 keypair | 95292 cycles | 95138 cycles | 1.00 |
| ML-KEM-1024 encaps | 109370 cycles | 109353 cycles | 1.00 |
| ML-KEM-1024 decaps | 132156 cycles | 132141 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Graviton3

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18660 cycles | 18637 cycles | 1.00 |
| ML-KEM-512 encaps | 21885 cycles | 21875 cycles | 1.00 |
| ML-KEM-512 decaps | 28885 cycles | 28868 cycles | 1.00 |
| ML-KEM-768 keypair | 31591 cycles | 31541 cycles | 1.00 |
| ML-KEM-768 encaps | 34745 cycles | 34773 cycles | 1.00 |
| ML-KEM-768 decaps | 44822 cycles | 44779 cycles | 1.00 |
| ML-KEM-1024 keypair | 46075 cycles | 46074 cycles | 1.00 |
| ML-KEM-1024 encaps | 51510 cycles | 51495 cycles | 1.00 |
| ML-KEM-1024 decaps | 65021 cycles | 65024 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 35445 cycles | 35410 cycles | 1.00 |
| ML-KEM-512 encaps | 40094 cycles | 40132 cycles | 1.00 |
| ML-KEM-512 decaps | 51092 cycles | 51139 cycles | 1.00 |
| ML-KEM-768 keypair | 56737 cycles | 56670 cycles | 1.00 |
| ML-KEM-768 encaps | 64543 cycles | 65147 cycles | 0.99 |
| ML-KEM-768 decaps | 79371 cycles | 79293 cycles | 1.00 |
| ML-KEM-1024 keypair | 87847 cycles | 87860 cycles | 1.00 |
| ML-KEM-1024 encaps | 97110 cycles | 96879 cycles | 1.00 |
| ML-KEM-1024 decaps | 115947 cycles | 115820 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 45675 cycles | 45732 cycles | 1.00 |
| ML-KEM-512 encaps | 54414 cycles | 54473 cycles | 1.00 |
| ML-KEM-512 decaps | 69757 cycles | 69796 cycles | 1.00 |
| ML-KEM-768 keypair | 74145 cycles | 74157 cycles | 1.00 |
| ML-KEM-768 encaps | 86116 cycles | 86026 cycles | 1.00 |
| ML-KEM-768 decaps | 106614 cycles | 106622 cycles | 1.00 |
| ML-KEM-1024 keypair | 112106 cycles | 112077 cycles | 1.00 |
| ML-KEM-1024 encaps | 124636 cycles | 124624 cycles | 1.00 |
| ML-KEM-1024 decaps | 150559 cycles | 150546 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28269 cycles | 28220 cycles | 1.00 |
| ML-KEM-512 encaps | 34109 cycles | 34106 cycles | 1.00 |
| ML-KEM-512 decaps | 44367 cycles | 44334 cycles | 1.00 |
| ML-KEM-768 keypair | 47687 cycles | 47612 cycles | 1.00 |
| ML-KEM-768 encaps | 53900 cycles | 53939 cycles | 1.00 |
| ML-KEM-768 decaps | 68354 cycles | 68364 cycles | 1.00 |
| ML-KEM-1024 keypair | 70253 cycles | 70250 cycles | 1.00 |
| ML-KEM-1024 encaps | 78732 cycles | 78721 cycles | 1.00 |
| ML-KEM-1024 decaps | 98426 cycles | 98438 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 38933 cycles | 38884 cycles | 1.00 |
| ML-KEM-512 encaps | 44528 cycles | 44604 cycles | 1.00 |
| ML-KEM-512 decaps | 56587 cycles | 56681 cycles | 1.00 |
| ML-KEM-768 keypair | 62341 cycles | 62294 cycles | 1.00 |
| ML-KEM-768 encaps | 71079 cycles | 72306 cycles | 0.98 |
| ML-KEM-768 decaps | 87353 cycles | 87694 cycles | 1.00 |
| ML-KEM-1024 keypair | 96221 cycles | 96166 cycles | 1.00 |
| ML-KEM-1024 encaps | 106376 cycles | 106126 cycles | 1.00 |
| ML-KEM-1024 decaps | 126806 cycles | 126586 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Graviton2

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28265 cycles | 28269 cycles | 1.00 |
| ML-KEM-512 encaps | 34166 cycles | 34120 cycles | 1.00 |
| ML-KEM-512 decaps | 44400 cycles | 44374 cycles | 1.00 |
| ML-KEM-768 keypair | 47674 cycles | 47689 cycles | 1.00 |
| ML-KEM-768 encaps | 54014 cycles | 53927 cycles | 1.00 |
| ML-KEM-768 decaps | 68460 cycles | 68398 cycles | 1.00 |
| ML-KEM-1024 keypair | 70362 cycles | 70285 cycles | 1.00 |
| ML-KEM-1024 encaps | 78748 cycles | 78783 cycles | 1.00 |
| ML-KEM-1024 decaps | 98557 cycles | 98502 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 50675 cycles | 50978 cycles | 0.99 |
| ML-KEM-512 encaps | 58502 cycles | 58933 cycles | 0.99 |
| ML-KEM-512 decaps | 74639 cycles | 76615 cycles | 0.97 |
| ML-KEM-768 keypair | 86414 cycles | 86388 cycles | 1.00 |
| ML-KEM-768 encaps | 94789 cycles | 94629 cycles | 1.00 |
| ML-KEM-768 decaps | 116995 cycles | 117573 cycles | 1.00 |
| ML-KEM-1024 keypair | 130638 cycles | 129988 cycles | 1.01 |
| ML-KEM-1024 encaps | 142358 cycles | 141966 cycles | 1.00 |
| ML-KEM-1024 decaps | 175006 cycles | 173240 cycles | 1.01 |

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59196 cycles | 59137 cycles | 1.00 |
| ML-KEM-512 encaps | 68638 cycles | 68635 cycles | 1.00 |
| ML-KEM-512 decaps | 87372 cycles | 87344 cycles | 1.00 |
| ML-KEM-768 keypair | 95297 cycles | 95310 cycles | 1.00 |
| ML-KEM-768 encaps | 110305 cycles | 109871 cycles | 1.00 |
| ML-KEM-768 decaps | 134531 cycles | 134344 cycles | 1.00 |
| ML-KEM-1024 keypair | 148104 cycles | 148038 cycles | 1.00 |
| ML-KEM-1024 encaps | 163848 cycles | 163878 cycles | 1.00 |
| ML-KEM-1024 decaps | 195607 cycles | 195579 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks

Details
| Benchmark suite | Current: fd3457a | Previous: fb00ece | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 155516 cycles | 155504 cycles | 1.00 |
| ML-KEM-512 encaps | 163413 cycles | 163433 cycles | 1.00 |
| ML-KEM-512 decaps | 206632 cycles | 206692 cycles | 1.00 |
| ML-KEM-768 keypair | 249906 cycles | 249933 cycles | 1.00 |
| ML-KEM-768 encaps | 270399 cycles | 270445 cycles | 1.00 |
| ML-KEM-768 decaps | 332197 cycles | 332250 cycles | 1.00 |
| ML-KEM-1024 keypair | 395782 cycles | 395718 cycles | 1.00 |
| ML-KEM-1024 encaps | 422774 cycles | 422709 cycles | 1.00 |
| ML-KEM-1024 decaps | 506229 cycles | 506207 cycles | 1.00 |

Comment thread test/bench/bench_components_mlkem.c
… API

Replace the AArch64-only `MLK_ARITH_BACKEND_AARCH64` guard with per-function
`MLK_USE_NATIVE_xxx` / `MLK_USE_FIPS202_xxx_NATIVE` gates, so each native
component benchmark is enabled exactly when the active backend provides that
function. Extend coverage to all entry points in `mlkem/src/native/api.h`
(adds rej_uniform, poly_frombytes, and the D4/D5/D10/D11 compress/decompress
families) and `mlkem/src/fips202/native/api.h` (adds keccak_f1600 x1/x4 and
the x4 xor_bytes/extract_bytes natives).

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
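
The per-function gating described in the commit message above can be sketched roughly as follows. This is an illustration only, not the actual contents of `test/bench/bench_components_mlkem.c`: `MLK_USE_NATIVE_NTT` and `MLK_USE_NATIVE_REJ_UNIFORM` are picked as plausible instances of the `MLK_USE_NATIVE_xxx` pattern, and the registry array and `count_enabled_benchmarks` are hypothetical names.

```c
#include <stddef.h>

/* Assumption for illustration: the active backend provides a native NTT but
 * no native rej_uniform. In a real build these macros would come from the
 * backend configuration rather than being defined here. */
#define MLK_USE_NATIVE_NTT
/* #define MLK_USE_NATIVE_REJ_UNIFORM */

/* Hypothetical registry: each entry is compiled in only when the matching
 * per-function gate is defined, instead of one architecture-wide guard. */
static const char *const enabled_benchmarks[] = {
#if defined(MLK_USE_NATIVE_NTT)
    "ntt_native",
#endif
#if defined(MLK_USE_NATIVE_REJ_UNIFORM)
    "rej_uniform_native",
#endif
    NULL /* sentinel; also keeps the array non-empty when no gate is set */
};

/* Counts the entries before the NULL sentinel. */
static size_t count_enabled_benchmarks(void)
{
  size_t n = 0;
  while (enabled_benchmarks[n] != NULL)
  {
    n++;
  }
  return n;
}
```

With only the NTT gate defined, one benchmark is registered. A backend that also defines the second gate would pick up the additional benchmark without any architecture-specific check.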
@hanno-becker hanno-becker left a comment

This looks great, thank you very much @mkannwischer. The lack of uniformity in the naming of assembly files annoyed me for a while. Thanks for actually doing something about it 👍

@mkannwischer mkannwischer merged commit 3807e30 into main Apr 24, 2026
425 checks passed
@mkannwischer mkannwischer deleted the unify-asm-symbols branch April 24, 2026 05:00
mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Apr 24, 2026
Every aarch64 asm symbol now ends in `_aarch64_asm`; every x86_64 avx2
symbol now ends in `_avx2_asm`.

Port of pq-code-package/mlkem-native#1663 (commit 2/4).

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
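
The suffix convention stated in this commit message is easy to check mechanically. The following C sketch is a hypothetical helper, not part of mlkem-native or mldsa-native, and the symbol names in the usage note are made up; it only tests whether a name ends in one of the two suffixes quoted above.

```c
#include <string.h>

/* Returns 1 if s ends with suffix, 0 otherwise. */
static int has_suffix(const char *s, const char *suffix)
{
  size_t ls = strlen(s);
  size_t lx = strlen(suffix);
  return ls >= lx && strcmp(s + (ls - lx), suffix) == 0;
}

/* Hypothetical check: returns 1 if the symbol name carries one of the two
 * suffixes named in the commit message. */
static int is_unified_asm_symbol(const char *sym)
{
  return has_suffix(sym, "_aarch64_asm") || has_suffix(sym, "_avx2_asm");
}
```

For example, the made-up names `example_ntt_aarch64_asm` and `example_ntt_avx2_asm` pass the check, while `example_ntt_asm` does not.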
mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Apr 24, 2026
Port of pq-code-package/mlkem-native#1663 (commit 3/4).

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Apr 25, 2026
Every aarch64 asm symbol now ends in `_aarch64_asm`; every x86_64 avx2
symbol now ends in `_avx2_asm`.

Port of pq-code-package/mlkem-native#1663 (commit 2/4).

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Apr 25, 2026
Port of pq-code-package/mlkem-native#1663 (commit 3/4).

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Labels

benchmark this PR should be benchmarked in CI

3 participants