This directory contains intermediate development artifacts that are not part of the final mlkem-native sources.
It is only relevant to you if you are developing mlkem-native or would like to understand the origin of the assembly source files.
aarch64_clean contains the 'clean' assembly underlying the AArch64 native backend of mlkem-native.
The files in aarch64_clean are handwritten and kept readable through extensive use of register aliases and macros.
aarch64_opt contains the results of running the SLOTHY superoptimizer on the clean assembly files in aarch64_clean.
The optimized sections are 'raw' assembly in the sense that they no longer use register macros or aliases, but the surrounding code (such as the
function preamble and postamble) typically still uses those register aliases/macros. The macro and alias
definitions themselves are also kept. See the SLOTHY paper [1] for more details on SLOTHY.
The final AArch64 arithmetic assembly in mlkem/src/native/aarch64/src is auto-generated
from the optimized assembly using the simpasm script, which simplifies it by assembling
and then disassembling it. This final assembly no longer contains any register aliases or macros.
The autogen script performs this generation of the final assembly from the optimized assembly.
Non-assembly files are synchronized by copy between this directory and mlkem.
To test the clean assembly, run autogen --aarch64-clean. This imports the clean backend into mlkem/src/native/aarch64/*,
replacing the optimized one. With autogen --aarch64-clean --no-simplify or autogen --no-simplify, you can additionally reinstate
the non-simplified assembly (clean or optimized, respectively) in the main source tree.
Alternatively, you can manually copy the entire aarch64_clean or aarch64_opt tree into mlkem/src/native/aarch64/.
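The autogen variants described above can be collected in a small shell sketch. It uses only the autogen flags named in this README. The helper names (run, use_clean, use_clean_unsimplified, use_opt_unsimplified), the AUTOGEN and DRY_RUN variables, and the assumption that the script can be invoked as autogen from the current shell are illustrative and not part of mlkem-native. By default the sketch only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only. Assumptions (not taken from the README):
#  - the autogen script is invocable as "autogen"; override with AUTOGEN=/path/to/autogen
#  - commands are printed but not executed unless DRY_RUN=0 is set
AUTOGEN="${AUTOGEN:-autogen}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Import the clean AArch64 backend into the main source tree,
# replacing the optimized one.
use_clean() {
    run "$AUTOGEN" --aarch64-clean
}

# Same as above, but keep the non-simplified clean assembly.
use_clean_unsimplified() {
    run "$AUTOGEN" --aarch64-clean --no-simplify
}

# Keep the optimized backend, but in its non-simplified form.
use_opt_unsimplified() {
    run "$AUTOGEN" --no-simplify
}

# Example: show the command used to test the clean assembly.
use_clean
```

Keeping the dry-run default makes it possible to review which command would modify mlkem/src/native/aarch64/ before running it.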
As with the AArch64 arithmetic assembly, the final FIPS-202 assembly is the result of running simpasm
on the assembly in fips202/aarch64/src. Non-assembly files are synchronized by copy.
As with the AArch64 arithmetic assembly, the final x86_64 arithmetic assembly is the result of running simpasm
on the assembly in x86_64/src. Non-assembly files are synchronized by copy.
Footnotes
[1] Abdulrahman, Becker, Kannwischer, Klein: Fast and Clean: Auditable high-performance assembly via constraint solving, https://eprint.iacr.org/2022/1303