- code/poly1305-core: Rust code for the core of Poly1305. This contains the implementation of the Poly1305 hash that is used in the next two sub-crates.
- code/poly1305-gen: Rust code for the first binary, which uses poly1305-core to compute a tag from a given key and a file (both given as arguments).
- code/poly1305-check: Rust code for the second binary, which uses poly1305-core to check that a given tag is indeed correct for a given key and file.
- Build the binaries
make build
You can then execute poly1305-gen and poly1305-check, which are now in the current directory.
- Build (without target cpu optimizations, maximize compatibility)
make build generic
- Bench :
make bench
- Test :
make test
- Clean :
make clean
Note : please run at least
make build
before running the commands below.
The test vector from RFC 8439, section 2.5.2, is of course embedded in a test in the Rust crate, but you can also check it yourself :
# Don't add \n at the end
echo -n 'Cryptographic Forum Research Group' > ./test-vector.txt
./poly1305-gen 85d6be7857556d337f4452fe42d506a80103808afb0db2fd4abff6af4149f51b ./test-vector.txt
# Should output : a8061dc1305136c6c22b8baf0c0127a9
./poly1305-check 85d6be7857556d337f4452fe42d506a80103808afb0db2fd4abff6af4149f51b ./test-vector.txt a8061dc1305136c6c22b8baf0c0127a9
# Should output : ACCEPT
You can also check that the two programs agree with each other :
export AUTH_KEY="[any 32-byte key, in hex]" FILE="/path/to/your/favorite/file"
./poly1305-check "$AUTH_KEY" "$FILE" "$(./poly1305-gen "$AUTH_KEY" "$FILE")"
I tried to implement it in an optimized way. To do that, I used an optimized multiplication implementation for the field of integers modulo 2^130 - 5 from here.
This implementation is tested (both the arithmetic functions and Poly1305 itself) using :
- the test vector from RFC 8439 for Poly1305,
- proptest for property testing, with malachite as a reference for the field operations.
The idea is to split the 130-bit field element into five separate 26-bit limbs, each stored in a u64. This makes it possible to handle such integers without any dependency and to propagate carries more efficiently. I implemented a naive addition on top of that. It might be faster to split the 130 bits differently, using u128 integers instead of u64 to reduce the number of limbs, but I did not try it. This first "naive" implementation, focused on arithmetic optimization, gave a throughput of approx. 3.7 cycles/byte.
After that first implementation, I tried profiling it with flamegraph but did not find a real bottleneck, so I kept this implementation, which seems quite efficient (even though a better one is possible).
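To make the limb layout more concrete, here is a minimal sketch of arithmetic modulo 2^130 - 5 on five 26-bit limbs. It is not the code from poly1305-core: the names `Fe`, `propagate`, `canonical`, `add` and `mul` are hypothetical, the multiplication is a plain schoolbook loop rather than the optimized one mentioned above, and the carry loop is not constant-time. It only illustrates why 26-bit limbs in a u64 leave enough headroom to delay carries, and how the identity 2^130 ≡ 5 folds the high part back into the low limbs.

```rust
/// Hypothetical representation: five 26-bit limbs, least significant first.
type Fe = [u64; 5];

const MASK: u64 = (1 << 26) - 1;

/// Push carries upward until every limb fits in 26 bits.
/// A carry out of the top limb has weight 2^130, which is congruent to 5.
/// Note: this loop is data-dependent, so it is NOT constant-time.
fn propagate(mut h: Fe) -> Fe {
    loop {
        let mut c = 0u64;
        for limb in h.iter_mut() {
            *limb += c;
            c = *limb >> 26;
            *limb &= MASK;
        }
        if c == 0 {
            return h;
        }
        h[0] += 5 * c;
    }
}

/// Fully reduce to the unique representative in [0, 2^130 - 5).
fn canonical(h: Fe) -> Fe {
    let h = propagate(h);
    // g = h + 5: if this overflows 2^130, then h >= p and h - p is g mod 2^130.
    let mut g = h;
    let mut c = 5u64;
    for limb in g.iter_mut() {
        *limb += c;
        c = *limb >> 26;
        *limb &= MASK;
    }
    if c == 1 { g } else { h }
}

/// Naive addition: add limb by limb, then propagate the carries.
fn add(a: Fe, b: Fe) -> Fe {
    let mut h = [0u64; 5];
    for i in 0..5 {
        h[i] = a[i] + b[i];
    }
    propagate(h)
}

/// Schoolbook multiplication. Inputs are assumed to have limbs below 2^26,
/// so each product is below 2^52 and the accumulated sums stay far below 2^64.
fn mul(a: Fe, b: Fe) -> Fe {
    let mut h = [0u64; 5];
    for i in 0..5 {
        for j in 0..5 {
            let prod = a[i] * b[j];
            if i + j < 5 {
                h[i + j] += prod;
            } else {
                // Weight 2^(26 * (i + j)) wraps around: 2^130 is congruent to 5.
                h[i + j - 5] += 5 * prod;
            }
        }
    }
    propagate(h)
}
```

With this layout, p - 1 is `[MASK - 5, MASK, MASK, MASK, MASK]`, and squaring it should give 1 after `canonical`, which is a handy sanity check for any limb-based implementation.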
This program computes a tag for a given key and file, both passed as arguments :
./poly1305-gen [32 BYTES KEY IN HEX] /path/to/file
Note : hex key is case insensitive
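As an illustration of what case-insensitive key parsing can look like, here is a small sketch using only the standard library. The function name `parse_key` is hypothetical and the actual binary may well parse its argument differently; the sketch only shows that `char::to_digit(16)` accepts both upper-case and lower-case hex digits, and that anything other than exactly 64 hex characters is rejected.

```rust
/// Hypothetical helper: decode a 64-character hex string into a 32-byte key.
/// Returns None if the length is wrong or a character is not a hex digit.
fn parse_key(hex: &str) -> Option<[u8; 32]> {
    let bytes = hex.as_bytes();
    if bytes.len() != 64 {
        return None;
    }
    let mut key = [0u8; 32];
    for (out, pair) in key.iter_mut().zip(bytes.chunks_exact(2)) {
        // to_digit(16) accepts 0-9, a-f and A-F, hence the case insensitivity.
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        *out = (hi * 16 + lo) as u8;
    }
    Some(key)
}
```

Returning an `Option` instead of panicking lets the caller print a usage message when the key is malformed.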
This program checks whether a tag is correct for a given key and file, all passed as arguments :
./poly1305-check [32 BYTES KEY IN HEX] /path/to/file [16 BYTES TAG IN HEX]
If the given tag is valid, it outputs ACCEPT to stdout, REJECT otherwise.
Note : hex key and tag are case insensitive
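The ACCEPT / REJECT decision comes down to comparing the recomputed tag with the one given on the command line. The sketch below shows one common way to write that comparison so that it does not stop at the first differing byte. The names `tags_match` and `verdict` are hypothetical, this is not necessarily how poly1305-check does it, and the compiler gives no formal constant-time guarantee for this code.

```rust
/// Hypothetical helper: compare two 16-byte tags without an early exit.
/// All byte differences are OR-ed together, so the loop always runs 16 times.
fn tags_match(expected: &[u8; 16], given: &[u8; 16]) -> bool {
    let mut diff = 0u8;
    for (a, b) in expected.iter().zip(given.iter()) {
        diff |= a ^ b;
    }
    diff == 0
}

/// Hypothetical helper: map the comparison result to the output string.
fn verdict(expected: &[u8; 16], given: &[u8; 16]) -> &'static str {
    if tags_match(expected, given) {
        "ACCEPT"
    } else {
        "REJECT"
    }
}
```

Because both tags are compared as decoded bytes, the case of the hex digits in the command-line argument does not affect the result.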