AMA Cryptography — Benchmarks

This directory contains the benchmarking, regression-detection, and validation tools for measuring the performance of the AMA Cryptography library.

Provenance Policy

Every number quoted in a chart, README, or dashboard must be traceable to one of the artifacts below (or to live output regenerated from them):

Artifact	Scope	Produced by
`baseline.json`	CI regression tolerances (65% of measured performance)	Edited manually when primitives land/change
`phase0_baseline_results.json`	Python/ctypes-path per-op medians	`python benchmarks/phase0_baseline.py`
`benchmark_results.json` (runtime-only)	Suite output consumed by dashboards	`python benchmarks/benchmark_suite.py --json benchmarks/benchmark_results.json`
`../build/bin/benchmark_c_raw` (runtime-only)	Raw C per-op medians (no ctypes overhead)	`cmake -B build -DAMA_USE_NATIVE_PQC=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build --target benchmark_c_raw && build/bin/benchmark_c_raw --json`
`../docs/compliance/CSRC_ALIGN_REPORT.md`	NIST ACVP vector counts (1,215/1,215/0 — 815 AFT + 400 SHA-3 MCT)	Updated with each alignment run

If a chart cannot cite one of these, it should not be in the repository. The fallback tables in generate_charts.py are anchored to phase0_baseline_results.json and benchmark_c_raw; they are overridden by live data from benchmark_results.json when available.
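
As a rough illustration of what a 65% tolerance means in practice, the sketch below derives a regression floor from a measured throughput and flags a run that falls under it. The function names and the numbers are hypothetical, and it assumes the floor is simply 65% of the measured ops/sec; the actual baseline.json schema and the logic in benchmark_runner.py may differ.

```python
TOLERANCE = 0.65  # fraction of measured performance, taken from the artifact table


def regression_floor(measured_ops_per_sec: float, tolerance: float = TOLERANCE) -> float:
    """Lowest throughput that still passes, assuming floor = tolerance * measured."""
    return measured_ops_per_sec * tolerance


def is_regression(current_ops_per_sec: float, measured_ops_per_sec: float) -> bool:
    """True when the current run falls below the floor."""
    return current_ops_per_sec < regression_floor(measured_ops_per_sec)


if __name__ == "__main__":
    measured = 10_000.0  # illustrative number, not a measured result
    print(f"floor: {regression_floor(measured):.1f} ops/sec")
    print("7000 ops/sec regresses:", is_regression(7_000.0, measured))
    print("6000 ops/sec regresses:", is_regression(6_000.0, measured))
```

Under this reading, a run may lose up to 35% of the measured throughput before CI fails, which leaves headroom for noisy shared runners.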

Raw C Benchmark (`benchmark_c_raw.c`)

Calls the C library functions directly, with no Python or ctypes layer in between, which makes it the most accurate measurement of the C library's own performance.

Build

# Option 1: Build via the benchmarks Makefile (auto-detects library)
make -C benchmarks benchmark_c_raw

# Option 2: Build via cmake (adds benchmark_c_raw target)
cmake -B build -DAMA_USE_NATIVE_PQC=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build
make -C benchmarks benchmark_c_raw

Run

# Human-readable table (default)
./benchmarks/benchmark_c_raw

# Machine-parseable CSV
./benchmarks/benchmark_c_raw --csv

# Machine-parseable JSON
./benchmarks/benchmark_c_raw --json

What It Benchmarks

Category	Operations
Hash	SHA3-256 (32B, 1KB), SHA3-512 (32B, 1KB)
MAC	HMAC-SHA3-256 (32B, 1KB)
KDF	HKDF-SHA3-256 (96B output)
Signatures (classical)	Ed25519 (keygen/sign/verify, plus Shamir wNAF double-scalar-mult)
Signatures (PQC)	ML-DSA-65 (keygen/sign/verify); ML-DSA-65 NTT / invNTT kernel isolation (scalar vs dispatched); SLH-DSA-SHAKE-128s (FIPS 205 L1 keygen/sign/verify)
KEM	ML-KEM-1024 (keygen/encaps/decaps); ML-KEM-1024 poly add/sub/reduce kernels (scalar vs dispatched)
AEAD	AES-256-GCM (1KB/4KB/16KB/64KB enc+dec), ChaCha20-Poly1305 (256B/1KB/4KB/64KB)
Password hashing	Argon2id (m=64 KiB, m=1 MiB)
Key Exchange	X25519 (keygen, DH exchange, batch×{1,4,8,16}); X25519 DH with MULX/ADX kernel off vs on (BMI2+ADX gate quantification)
Elliptic curves (Bitcoin)	secp256k1 pubkey-from-privkey (SEC1 compressed)
Threshold signatures	FROST 2-of-3 round1 commit / round2 sign / aggregate (RFC 9591)

Kernel-isolation rows

Three benchmark families produce paired (scalar) / (dispatch) rows so the per-kernel SIMD win is measurable directly, not folded into an end-to-end primitive cost:

ML-DSA-65 NTT / invNTT — uses the benchmark-only ama_dilithium_ntt_bench() / ama_dilithium_invntt_bench() entry points (in include/ama_cryptography.h) which route through the exact same dil_ntt_cached / dil_invntt_cached code paths as production sign/verify but bind the dispatch slot explicitly.
ML-KEM-1024 poly_{add,sub,reduce} — runs the SVE2-targeted helper-kernel pair against the inline scalar reference; on AVX2 / NEON hosts the dispatch slot is NULL and both rows time the same compiler auto-vectorised loop (which is itself useful: the zero-delta is the documented contract on those hosts).
X25519 DH (MULX off / MULX on) — uses the benchmark/test-only ama_x25519_set_mulx_override() API to pin the runtime BMI2+ADX gate without rebuilding. On hosts without the kernel (CPUID gate fails or AMA_HAVE_X25519_FE64_MULX_IMPL not defined at build), both rows time the pure-C fe64 path and the equal numbers are themselves informative ("no kernel on this host").
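
One way to consume the paired rows is to divide the optimised rate by the baseline rate for each pair. The sketch below assumes the results have already been loaded into a plain name-to-ops/sec mapping; that mapping, the function name, and the sample numbers are hypothetical, since the exact shape of the --json output is not described here.

```python
# Suffix pairs used by the kernel-isolation rows: (baseline label, optimised label).
PAIR_LABELS = [("scalar", "dispatch"), ("MULX off", "MULX on")]


def paired_speedups(ops_per_sec: dict[str, float]) -> dict[str, float]:
    """Return {row stem: optimised ops/sec divided by baseline ops/sec}."""
    speedups = {}
    for base_label, opt_label in PAIR_LABELS:
        suffix = f" ({base_label})"
        for name, base_rate in ops_per_sec.items():
            if not name.endswith(suffix):
                continue
            stem = name[: -len(suffix)]
            opt_rate = ops_per_sec.get(f"{stem} ({opt_label})")
            # Skip rows whose partner is missing or whose baseline is zero.
            if opt_rate is not None and base_rate > 0:
                speedups[stem] = opt_rate / base_rate
    return speedups


if __name__ == "__main__":
    # Placeholder rates, not measured results.
    sample = {
        "ML-DSA-65 NTT (scalar)": 100_000.0,
        "ML-DSA-65 NTT (dispatch)": 250_000.0,
        "X25519 DH (MULX off)": 20_000.0,
        "X25519 DH (MULX on)": 20_000.0,
    }
    for stem, ratio in paired_speedups(sample).items():
        print(f"{stem}: {ratio:.2f}x")
```

A ratio of 1.00 is the expected result on hosts where the dispatch slot is NULL or the MULX kernel is unavailable, as described above.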

Methodology

Timer: clock_gettime(CLOCK_MONOTONIC) (nanosecond resolution)
Warmup: 50 iterations discarded before measurement
Iterations: 200–5,000 depending on operation speed
Statistics: mean, median, stddev, min, max, ops/sec
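
The same measurement loop can be sketched in Python for readers who want to see the shape of the methodology. This is not the C harness: it uses time.perf_counter_ns() in place of clock_gettime(CLOCK_MONOTONIC), hashlib's SHA3-256 as a stand-in workload rather than the AMA library, and it assumes ops/sec is derived from the mean latency, which the list above does not specify.

```python
import hashlib
import statistics
import time


def bench(func, iterations=1000, warmup=50):
    """Time func() per call and return summary statistics in nanoseconds."""
    for _ in range(warmup):  # warmup runs are discarded
        func()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()  # monotonic clock, nanosecond units
        func()
        samples.append(time.perf_counter_ns() - start)
    mean = statistics.fmean(samples)
    return {
        "mean_ns": mean,
        "median_ns": statistics.median(samples),
        "stddev_ns": statistics.stdev(samples),
        "min_ns": min(samples),
        "max_ns": max(samples),
        # Assumption: throughput is computed from the mean latency.
        "ops_per_sec": 1e9 / mean if mean > 0 else float("inf"),
    }


if __name__ == "__main__":
    message = b"\x00" * 32
    stats = bench(lambda: hashlib.sha3_256(message).digest(), iterations=5000)
    for key, value in stats.items():
        print(f"{key}: {value:.1f}")
```

Reporting the median alongside the mean matters because a few scheduler interruptions can inflate the mean without moving the median.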

Output Format

The default table output includes a comparison-ready format:

Operation                      | Raw C ops/sec  | Raw C latency
-------------------------------|----------------|---------------
SHA3-256 (32B)                 |        555556  |       1.80 us
Ed25519 Sign                   |         15625  |      64.00 us
ML-DSA-65 Sign                 |          1053  |     950.00 us
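
The two numeric columns in the sample rows are consistent with ops/sec = 1,000,000 / latency in microseconds. The sketch below parses one row of that table and checks the relationship; the helper names are hypothetical, and whether the harness derives the rate from the mean or the median latency is not stated here.

```python
def parse_row(line: str) -> dict:
    """Split one 'Operation | ops/sec | latency us' row into typed fields."""
    name, ops, latency = (cell.strip() for cell in line.split("|"))
    value, unit = latency.split()
    if unit != "us":
        raise ValueError(f"unexpected latency unit: {unit!r}")
    return {"operation": name, "ops_per_sec": int(ops), "latency_us": float(value)}


def ops_from_latency(latency_us: float) -> int:
    """Throughput implied by a per-op latency in microseconds."""
    return round(1_000_000 / latency_us)


if __name__ == "__main__":
    rows = [
        "SHA3-256 (32B)                 |        555556  |       1.80 us",
        "Ed25519 Sign                   |         15625  |      64.00 us",
        "ML-DSA-65 Sign                 |          1053  |     950.00 us",
    ]
    for line in rows:
        row = parse_row(line)
        consistent = row["ops_per_sec"] == ops_from_latency(row["latency_us"])
        print(row["operation"], "consistent:", consistent)
```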

Python/ctypes Benchmarks

Script	Purpose
`benchmark_runner.py`	CI/CD regression detection against `baseline.json`
`performance_suite.py`	Comprehensive Python vs Cython vs C comparison
`phase0_baseline.py`	Establishes ctypes-based performance baselines
`validation_suite.py`	Validates documented claims against measured performance
`comparative_benchmark.py`	Compares AMA vs OpenSSL+liboqs

Running CI Benchmarks

python benchmarks/benchmark_runner.py --verbose

Running the Full Suite

python benchmarks/benchmark_suite.py --markdown BENCHMARKS.md --json benchmarks/benchmark_results.json

Properties Table — Beyond ops/sec

Throughput is only one axis on which a cryptographic library should be compared. The table below fixes five other axes that readers routinely care about but rarely find side-by-side. Citations are linked inline; every row points to the project's own documentation rather than a second-hand summary. "—" in a cell indicates the project does not make an explicit claim on that axis (an absence of a claim, not necessarily a "no").

Property	AMA Cryptography	PyNaCl / libsodium	cryptography / OpenSSL	liboqs
Constant-time guarantee	Documented per-primitive in `CONSTANT_TIME_VERIFICATION.md`; empirical verification via in-tree dudect harness under `-DAMA_ENABLE_DUDECT=ON`.	Documented at the library level; "designed to be constant-time in secret data" per libsodium docs. No dudect harness shipped.	OpenSSL 3.x documents constant-time for ECDSA, ECDH; historical timing side-channel findings (CVE-2018-0734 in DSA signing, CVE-2018-0735 in ECDSA signing) show carve-outs on some codepaths.	liboqs Constant-time Policy applies to PQC kernels; upstream CI runs `valgrind --tool=memcheck` with uninit-tainting.
Supply-chain surface	Zero runtime crypto deps (INVARIANT-1); only in-tree C + optional vendored ed25519-donna (public domain, compiled from source). SBOM at `schemas/*.spdx`.	libsodium native binary (`libsodium.so`) installed via OS package or wheel; PyNaCl is a thin CFFI wrapper.	OpenSSL native binary (`libcrypto.so`); `cryptography` wheels ship a Rust binding on top.	liboqs native binary (`liboqs.so`) compiled from C source; Python wrapper is a thin ctypes shim.
Self-test (FIPS-style POST)	Startup POST via `ama_self_test()` covers AES-GCM, SHA3, Ed25519, ML-DSA, ML-KEM, HKDF self-tests (see `src/c/ama_core.c`).	None shipped with libsodium / PyNaCl.	OpenSSL FIPS provider (when built) runs full FIPS 140-3 KAT suite at init; default build does not.	liboqs ships KAT test binaries (`tests/kat_sig_algs`, `tests/kat_kem_algs`), run offline; no startup POST.
Audit LoC (native C)	~12 k lines in `src/c/*.c` + `*.h`; plus ~3 k lines of vendored ed25519-donna under `src/c/vendor/`. Per-primitive provenance in `src/c/PROVENANCE.md`.	libsodium: ~45 k lines of hand-tuned C + assembly (including portable / assembly / ARM / SSE variants).	OpenSSL libcrypto: ~500 k lines of C covering TLS, PKCS#11, legacy algorithms, engines, FIPS module.	liboqs: ~60 k lines for PQC primitives alone (ML-KEM, ML-DSA, SLH-DSA, HQC, Classic McEliece, Frodo, etc.).
ACVP / NIST KAT coverage	1,215 / 1,215 vectors in-tree (see `docs/compliance/CSRC_ALIGN_REPORT.md`); continuously enforced in `.github/workflows/acvp_validation.yml`.	No ACVP self-attestation; libsodium has not pursued FIPS.	OpenSSL FIPS provider validated against NIST ACVP when built in FIPS mode.	PQClean upstream ships KAT vectors; liboqs CI runs them.

Why this table matters

Throughput comparisons alone rank a PQC or classical-crypto library on one narrow axis. A production deployment routinely has to weigh a 2× throughput gain against a 10× expansion in supply-chain surface or the loss of a constant-time guarantee. Making those trade-offs explicit is the point of this table: a 5× speedup that comes from pulling in 500 k additional audit LoC is a different engineering choice from a 5× speedup that comes from a better in-tree algorithm.

Benchmark coverage map (2026-05)

The raw-C harness was extended in 2026-05 to close the explicit coverage gaps called out in the May 2026 brief. Each gap is now backed by at least one benchmark row in benchmark_c_raw.c, named so it can be grepped from JSON output:

Gap (May 2026 brief)	Status	Row(s) in `benchmark_c_raw`
MULX/ADX on-vs-off X25519 ratio	✅ closed	`X25519 DH (MULX off)`, `X25519 DH (MULX on)`
SLH-DSA / SPHINCS+ (FIPS 205)	✅ closed	`SLH-DSA-SHAKE-128s KeyGen` / `Sign` / `Verify` (NIST L1)
secp256k1	✅ closed	`secp256k1 pubkey` (compressed SEC1)
FROST	✅ closed	`FROST round1 commit` / `round2 sign` / `aggregate` (2-of-3, RFC 9591)
Dilithium NTT kernel isolation	✅ closed	`ML-DSA-65 NTT (scalar)` / `NTT (dispatch)` / `invNTT (scalar)` / `invNTT (dispatch)`
ML-KEM-1024 decapsulate	✅ already covered	`ML-KEM-1024 Decaps` (`benchmark_c_raw.c` decaps row)
X25519 batch×4 (no env gating)	✅ already covered	`X25519 DH Batch×4` (unconditional)
Argon2id	✅ already covered	`Argon2id (m=64KiB,t=1)`, `Argon2id (m=1MiB,t=1)`
Raw HKDF-SHA3-256	✅ already covered	`HKDF-SHA3-256 (96B)`

The benchmark/test-only entry points that enable the new rows (ama_x25519_set_mulx_override, ama_dilithium_ntt_bench, ama_dilithium_invntt_bench) are documented in include/ama_cryptography.h and explicitly marked as not part of the production crypto surface. They exist so a single shipped binary can measure paired scalar-vs-dispatched and kernel-on-vs-off rows without rebuilding the library with different -D flags.
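
Because the rows are named to be greppable, a coverage check can be written without knowing the JSON layout. The sketch below walks every key and string value in the parsed output and reports which expected row names never appear. The function name is hypothetical, the list includes only names that the table above spells out in full, and the sample JSON in the usage block is made up for illustration.

```python
import json

# Row names spelled out in full in the coverage table.
EXPECTED_ROWS = [
    "X25519 DH (MULX off)",
    "X25519 DH (MULX on)",
    "secp256k1 pubkey",
    "ML-DSA-65 NTT (scalar)",
    "ML-KEM-1024 Decaps",
    "HKDF-SHA3-256 (96B)",
]


def missing_rows(json_text: str, expected=EXPECTED_ROWS) -> list[str]:
    """Return expected row names that never appear as a key or string value."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                found.add(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            found.add(node)

    walk(json.loads(json_text))
    return [name for name in expected if name not in found]


if __name__ == "__main__":
    # Made-up layout; the real --json schema may differ.
    sample = json.dumps({"results": [{"name": "secp256k1 pubkey", "ops_per_sec": 1.0}]})
    print(missing_rows(sample))
```

In practice the input would be the captured output of ./benchmarks/benchmark_c_raw --json, and a non-empty result would indicate a coverage gap that has reopened.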

VAES AES-256-GCM dispatch (PR A, 2026-04)

The library grew an optional bulk-throughput AES-256-GCM kernel in PR A: VAES + VPCLMULQDQ on YMM, runtime-dispatched behind an OSXSAVE-gated CPUID probe (see ama_cpuid_has_vaes_aesgcm()).

The VAES + VPCLMULQDQ AES-GCM path targets YMM (256-bit), not ZMM. Zen 3+ / Ice Lake+ CPUs execute these without the AVX-512 ZMM frequency penalty documented for Skylake-SP / Cascade Lake. Cloud VM variance on shared hosts is still the dominant noise source; published throughput numbers are from bare-metal runs, not CI. The regression baseline tracked in baseline.json continues to target the AVX2 AES-NI + PCLMULQDQ path shipped in #253 / #254 / #260 / #261, and the VAES kernel will only be promoted to the published throughput number after a 3-run stability check on dedicated silicon — mirroring the 2026-04-21 dilithium_sign recalibration pattern already in baseline.json::metadata.baseline_change_log.

Hosts without VAES (or any non-x86-64 host) automatically route through the AVX2 AES-NI + PCLMULQDQ fallback, which was already validated by ACVP and is byte-identical to the VAES path (see tests/c/test_aes_gcm_vaes_equiv.c).

Interpreting Results

See BENCHMARKS.md (generated locally by running python benchmarks/benchmark_suite.py; not checked into version control) for the authoritative performance document, including:

Three-column comparison (Raw C | Python/ctypes | ctypes overhead)
Competitive context against libsodium, liboqs, and OpenSSL
Detailed analysis of ctypes overhead impact per operation
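
To make the third column concrete, the sketch below computes a ctypes overhead figure from a raw C latency and a Python/ctypes latency for the same operation. The definition used here (the latency difference, plus that difference as a share of the ctypes-path latency) is an assumption, and the input numbers are placeholders; BENCHMARKS.md may define the column differently.

```python
def ctypes_overhead(raw_c_us: float, ctypes_us: float) -> dict:
    """Absolute and relative cost of the Python/ctypes layer for one operation."""
    overhead_us = ctypes_us - raw_c_us
    return {
        "overhead_us": overhead_us,
        # Share of the ctypes-path latency that is not spent in the C library.
        "overhead_pct": 100.0 * overhead_us / ctypes_us if ctypes_us > 0 else 0.0,
    }


if __name__ == "__main__":
    # Placeholder latencies in microseconds, not measured results.
    result = ctypes_overhead(raw_c_us=64.0, ctypes_us=80.0)
    print(f"overhead: {result['overhead_us']:.2f} us ({result['overhead_pct']:.1f}%)")
```

Under this definition the overhead share shrinks as operations get slower, since the ctypes cost per call does not grow with the work done inside the C function.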
