ARM: default to NUDUPL squaring, drop phased pipeline, and add NUDUPL profiling + CI reliability fixes #297
Closed
hoffmang9 wants to merge 46 commits into
- Add ARM C++ fallback for `gcd_unsigned` via `asm_arm_fallback_impl.inc`
- Dispatch to ARM fallback in `threading.h` when `ARCH_ARM`; declare in `asm_main.h`
- Define `ARCH_ARM` for `__aarch64__` in `parameters.h`
- Guard x86-only code: `get_time_cycles` (rdtsc) and `x86intrin` in `vdf.h`
- Fix `mpz_cmp_si` callers in `threading.h` to use `_()` for GMP pointer
- Makefile: detect arm64/aarch64, use C++ fallback instead of x86 asm; add Homebrew paths on macOS
- Skip `TEST_ASM` verification blocks on ARM in `gcd_unsigned.h`, `gcd_128.h`, `gcd_base_continued_fractions.h`
- Include ARM fallback `.inc` from `vdf_client`, `prover_test`, `1weso_test`, `2weso_test`, `vdf_bench`
- README: add Running tests section (C++ and Python, ARM note)
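The `ARCH_ARM` define and the `rdtsc` guard in this commit follow a common pattern: detect the target from compiler-predefined macros, then compile in either the x86 intrinsic or a portable fallback. A minimal sketch, assuming GCC or Clang (which predefine `__aarch64__`, `__x86_64__` and `__i386__`); `SKETCH_ARCH_ARM`, `get_time_cycles_sketch` and `using_arm_path` are hypothetical names, not the actual definitions in `parameters.h` or `vdf.h`:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the ARCH_ARM define in parameters.h.
#if defined(__aarch64__)
#define SKETCH_ARCH_ARM 1
#else
#define SKETCH_ARCH_ARM 0
#endif

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
// x86/x64: read the time-stamp counter directly.
inline uint64_t get_time_cycles_sketch() { return __rdtsc(); }
#else
// ARM64 and other targets have no rdtsc: fall back to a monotonic clock (nanoseconds).
inline uint64_t get_time_cycles_sketch() {
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}
#endif

// True when the ARM code path was compiled in.
constexpr bool using_arm_path() { return SKETCH_ARCH_ARM != 0; }
```

On ARM the fallback counts nanoseconds rather than cycles, so any code that turns the result into "cycles per iteration" would need to account for the different unit.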
- `gcd_unsigned_arm_uv_callback`: make `thread_local` in `gcd_unsigned.h` and `asm_arm_fallback_impl.inc` so concurrent threads do not overwrite it.
- `vdf_bench`: include `vdf.h` and define `gcd_base_bits`/`gcd_128_max_iter` unconditionally; only `asm_arm_fallback_impl.inc` stays under `#if ARCH_ARM`.
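Why `thread_local` matters for `gcd_unsigned_arm_uv_callback`: a plain global hook is shared by every thread, so a second thread installing its own callback silently redirects the first thread's reports. The sketch below uses a hypothetical callback signature and a stand-in loop (not the real gcd code) to show that with `thread_local` each thread only ever sees the hook it installed itself:

```cpp
#include <atomic>
#include <thread>

// Hypothetical hook type; the real gcd_unsigned_arm_uv_callback signature may differ.
using uv_callback_t = void (*)(int owner, long u, long v);

// thread_local: every thread gets its own slot, so installing a hook in one
// thread cannot clobber the hook another thread is in the middle of using.
thread_local uv_callback_t uv_callback = nullptr;

struct HookStats {
    std::atomic<int> calls_a{0};
    std::atomic<int> calls_b{0};
    std::atomic<int> wrong_owner{0};  // hook invoked on behalf of the other thread
};

HookStats g_stats;  // shared counters, updated atomically

void hook_a(int owner, long, long) { if (owner != 0) ++g_stats.wrong_owner; ++g_stats.calls_a; }
void hook_b(int owner, long, long) { if (owner != 1) ++g_stats.wrong_owner; ++g_stats.calls_b; }

// Stand-in for a gcd loop that reports each (u, v) step through the current thread's hook.
void fake_gcd_loop(int owner, int steps) {
    for (int i = 0; i < steps; ++i) {
        if (uv_callback != nullptr) uv_callback(owner, i, -i);
        std::this_thread::yield();  // encourage interleaving between the threads
    }
}

void worker(int owner, uv_callback_t hook, int steps) {
    uv_callback = hook;  // changes only the calling thread's copy
    fake_gcd_loop(owner, steps);
}

// Runs two workers concurrently, each with a different hook.
void run_two_threads(int steps) {
    std::thread a(worker, 0, hook_a, steps);
    std::thread b(worker, 1, hook_b, steps);
    a.join();
    b.join();
}
```

Each thread still has to install its own hook before calling into the gcd code, because a `thread_local` variable starts out as `nullptr` in every new thread.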
hoffmang9 (Member, Author): @cursor review
…ating “after single slow” cases when recovery does multiple slow steps.
hoffmang9 (Member, Author): @cursor review
hoffmang9 (Member, Author): @cursor review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
hoffmang9 (Member, Author): Superseded by #298 (new branch with squashed/clean history and updated PR scope/title). Closing this PR for easier review going forward.
PR Description
Summary
This PR makes VDF squaring fast and maintainable on ARM64 (macOS arm64 / aarch64) by routing ARM builds to the C++ NUDUPL squaring path and removing ARM’s dependency on the x86-centric phased pipeline. It also adds NUDUPL-focused profiling hooks, tightens several numeric invariants that were unstable on ARM, and improves CI/build reliability across platforms.
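For readers new to the underlying arithmetic: a VDF evaluation here is T sequential squarings of a binary quadratic form, each followed by reduction. The toy below shows that loop with `int64_t` in place of GMP's `mpz_t`, so it is only safe for tiny discriminants. It is plain composition-based squaring plus full reduction, not NUDUPL itself (NUDUPL interleaves a partial Euclidean reduction so intermediate values stay roughly the size of the reduced coefficients), and `Form`, `square` and `repeated_square_sketch` are hypothetical names, not chiavdf's API:

```cpp
#include <cstdint>
#include <utility>

// Toy form a*x^2 + b*x*y + c*y^2 with discriminant D = b^2 - 4ac < 0.
struct Form { int64_t a, b, c; };

int64_t floor_div(int64_t n, int64_t d) {  // requires d > 0
    int64_t q = n / d;
    if (n % d != 0 && n < 0) --q;
    return q;
}

int64_t mod_pos(int64_t n, int64_t m) { return ((n % m) + m) % m; }  // m > 0

// Extended Euclid: returns g = gcd(a, b) >= 0 with x*a + y*b = g.
int64_t ext_gcd(int64_t a, int64_t b, int64_t& x, int64_t& y) {
    int64_t r0 = a, r1 = b, x0 = 1, x1 = 0, y0 = 0, y1 = 1;
    while (r1 != 0) {
        int64_t q = r0 / r1, t;
        t = r0 - q * r1; r0 = r1; r1 = t;
        t = x0 - q * x1; x0 = x1; x1 = t;
        t = y0 - q * y1; y0 = y1; y1 = t;
    }
    if (r0 < 0) { r0 = -r0; x0 = -x0; y0 = -y0; }
    x = x0; y = y0;
    return r0;
}

// Substitute x -> x + r*y to bring b into (-a, a]; the form stays equivalent.
void normalize(Form& f) {
    int64_t r = floor_div(f.a - f.b, 2 * f.a);
    f.c = f.a * r * r + f.b * r + f.c;  // uses the old b
    f.b = f.b + 2 * r * f.a;
}

// Full reduction: ends with -a < b <= a <= c, and b >= 0 when a == c.
void reduce(Form& f) {
    normalize(f);
    while (f.a > f.c || (f.a == f.c && f.b < 0)) {
        std::swap(f.a, f.c);
        f.b = -f.b;
        normalize(f);
    }
}

// Square f by composing it with itself, then reduce.
Form square(const Form& f) {
    int64_t D = f.b * f.b - 4 * f.a * f.c;
    int64_t x, y;
    int64_t g = ext_gcd(f.a, f.b, x, y);  // x*a + y*b = g
    int64_t by = f.a / g;
    int64_t bx = mod_pos(y * f.c, by);    // y inverts b/g modulo a/g
    Form r;
    r.a = by * by;
    r.b = f.b - 2 * by * bx;
    r.c = (r.b * r.b - D) / (4 * r.a);
    reduce(r);
    return r;
}

// The sequential VDF loop: computes f^(2^T).
Form repeated_square_sketch(Form f, uint64_t T) {
    for (uint64_t i = 0; i < T; ++i) f = square(f);
    return f;
}
```

With discriminant -23 the class group is cyclic of order 3, which makes results easy to check by hand: squaring (2, 1, 3) gives its inverse (2, -1, 3), and squaring twice returns (2, 1, 3).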
In addition, it adds a fast/streaming C wrapper API for one-Wesolowski proving, restructures C++ test build targets, and speeds up Rust fuzzing in CI by running fuzz targets in parallel (without reducing per-target fuzz time).
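The streaming variants rest on one property: producing checkpoints incrementally must give exactly the same values as producing them all at once, including across chunk boundaries. A hedged sketch of such an equivalence check, using modular squaring as a stand-in for class-group squaring; `checkpoints_batch`, `CheckpointStream` and `streaming_matches_batch` are hypothetical and are not the `fast_wrapper.h` API:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Toy step: x -> x^2 mod n, standing in for one class-group squaring.
// Requires n < 2^32 so that x * x fits in 64 bits.
uint64_t toy_step(uint64_t x, uint64_t n) { return (x * x) % n; }

// Batch: run T steps from x and return the value after every k-th step (k > 0).
std::vector<uint64_t> checkpoints_batch(uint64_t x, uint64_t n, uint64_t T, uint64_t k) {
    std::vector<uint64_t> out;
    for (uint64_t i = 1; i <= T; ++i) {
        x = toy_step(x, n);
        if (i % k == 0) out.push_back(x);
    }
    return out;
}

// Streaming: the same computation split across arbitrary advance() calls;
// each checkpoint is handed to the callback as soon as it exists.
struct CheckpointStream {
    uint64_t x;     // current value
    uint64_t n;     // modulus
    uint64_t k;     // checkpoint interval (> 0)
    uint64_t done;  // iterations completed so far, carried across calls
    std::function<void(uint64_t, uint64_t)> on_checkpoint;  // (iteration, value)

    void advance(uint64_t iters) {
        for (uint64_t i = 0; i < iters; ++i) {
            x = toy_step(x, n);
            ++done;
            if (done % k == 0) on_checkpoint(done, x);
        }
    }
};

// Feed T iterations to the stream in chunks and compare with the batch result.
bool streaming_matches_batch(uint64_t x0, uint64_t n, uint64_t T, uint64_t k, uint64_t chunk) {
    if (chunk == 0 || k == 0) return false;  // avoid an infinite loop / modulo by zero
    const std::vector<uint64_t> expected = checkpoints_batch(x0, n, T, k);
    std::vector<uint64_t> got;
    CheckpointStream s{x0, n, k, 0, [&got](uint64_t, uint64_t v) { got.push_back(v); }};
    for (uint64_t left = T; left > 0;) {
        const uint64_t c = left < chunk ? left : chunk;
        s.advance(c);
        left -= c;
    }
    return got == expected;
}
```

Picking a chunk size that does not divide the checkpoint interval (say chunks of 13 with checkpoints every 7 iterations) is what exercises the resume logic.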
What changed
ARM64: prefer NUDUPL, avoid phased pipeline
- `repeated_square()` on `ARCH_ARM` uses the C++ NUDUPL path (`repeated_square_nudupl`) and does not attempt the phased pipeline. (`src/vdf.h`)
- `vdf_bench` compatibility: on ARM, `vdf_bench square_asm` is treated as a NUDUPL benchmark for script compatibility. (`src/vdf_bench.cpp`)
- … `asm_arm_func_gcd_unsigned(...)` under `ARCH_ARM` and provide the implementation via `src/asm_arm_fallback_impl.inc` included in ARM binaries. (`src/asm_main.h`, `src/asm_arm_fallback_impl.inc`)

NUDUPL performance work + profiling
- …(`qfb_nudupl`) and in `mpz_xgcd_partial` to eliminate per-iteration `mpz_init`/`mpz_clear` churn. (`src/nucomp.h`, `src/xgcd_partial.c`)
- … `src/chiavdf_profile.h` and wired through `repeated_square_nudupl()` and `qfb_nudupl()` so we can attribute time to `gcdext`, `xgcd_partial`, and the `a>=L` branch when needed. (`src/chiavdf_profile.h`, `src/vdf.h`, `src/nucomp.h`)
- `CHIAVDF_DIAG=1` (or `CHIAVDF_VDF_TEST_STATS=1`)
- `CHIAVDF_PROFILE=1` (or `CHIAVDF_NUDUPL_PROFILE=1`)
- `CHIAVDF_BENCH_DIAG` was removed (use `CHIAVDF_DIAG` / `CHIAVDF_VDF_TEST_STATS`).

GCD / numeric stability fixes
- `gcd_128` boundary contract tightening: reject invalid partial steps and enforce expected invariants at the double→integer boundary. (`src/gcd_128.h`)
- … (`src/gcd_unsigned.h`, `src/gcd_base_continued_fractions.h`)

Streaming fast C wrapper
- `src/c_bindings/fast_wrapper.{h,cpp}` for fast proving + streaming variants.
- `src/fast_wrapper_test.cpp` to validate streaming and non-streaming outputs match.

Build system + CI reliability
- `Makefile.vdf-client`:
  - … `tests` target).
  - `-mcpu=native` tuning on macOS arm64 (disabled under CI).
- … (`.github/workflows/test.yaml`):
  - … `macos-13-arm64` to the matrix.
  - `apt-get update` with retries on Ubuntu to avoid transient mirror/index 404s.
  - … `make ... tests`.
  - … `main`:
    - `optimized=1`: runs `./1weso_test` and `./2weso_test` with no args (defaults).
    - ASAN/TSAN: runs `./1weso_test 1000` and `./2weso_test 1000`.
  - `prover_test` in fast mode in CI: `CHIAVDF_PROVER_TEST_FAST=1 ./prover_test`.
- … (`.github/workflows/rust.yml`):
- `setup.py` reads `README.md` as UTF-8 to avoid Windows metadata encoding failures.
- `pyproject.toml` uses conditional Homebrew installs on macOS for wheel builds.
- `.gitignore` updated for additional binaries/artifacts (e.g. `src/fast_wrapper_test`, `*.o.tmp`, `*.dSYM/`).

Test plan
From `src/`:

Notes:
- `prover_test` is a soak/stress test by default; set `CHIAVDF_PROVER_TEST_FAST=1` to run a quick correctness test.
- `square_asm` in `vdf_bench` benchmarks NUDUPL (compatibility alias).

Note
High Risk
Changes core VDF squaring/gcd hot paths and adds architecture-conditional behavior (ARM vs x86), which can impact correctness and performance across platforms; CI updates mitigate but don’t eliminate the need for careful cross-arch validation.
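Of the gcd hot-path changes flagged in this note, the `gcd_128` boundary contract is the easiest to illustrate. A minimal sketch of that kind of contract, assuming a Lehmer-style partial step that estimates quotients in floating point and emits a 2×2 cofactor matrix: convert a double only if it is finite, integral and in range, and accept a cofactor matrix only if it is unimodular and strictly shrinks the pair. `checked_double_to_int64`, `PartialStep` and `apply_partial_step` are hypothetical names, and `__int128` assumes GCC or Clang:

```cpp
#include <cmath>
#include <cstdint>

// Convert a double to int64_t only if it is finite, integral, and in [-2^63, 2^63).
// A quotient estimate failing any check is rejected instead of silently truncated.
bool checked_double_to_int64(double d, int64_t& out) {
    if (!std::isfinite(d)) return false;
    if (d != std::trunc(d)) return false;  // must already be an integer value
    if (d < -9223372036854775808.0 || d >= 9223372036854775808.0) return false;
    out = static_cast<int64_t>(d);
    return true;
}

// Hypothetical cofactor matrix from a partial step:
// (a', b') = (u0*a + v0*b, u1*a + v1*b).
struct PartialStep { int64_t u0, v0, u1, v1; };

// Accept the step only if the matrix is unimodular (det = +/-1, so gcd(a, b)
// is preserved) and the pair strictly shrinks: 0 <= b' < a' < a.
// __int128 (GCC/Clang extension) keeps the products from overflowing.
bool apply_partial_step(int64_t a, int64_t b, const PartialStep& m,
                        int64_t& a_out, int64_t& b_out) {
    const __int128 det = static_cast<__int128>(m.u0) * m.v1 - static_cast<__int128>(m.v0) * m.u1;
    if (det != 1 && det != -1) return false;
    const __int128 na = static_cast<__int128>(m.u0) * a + static_cast<__int128>(m.v0) * b;
    const __int128 nb = static_cast<__int128>(m.u1) * a + static_cast<__int128>(m.v1) * b;
    if (!(0 <= nb && nb < na && na < a)) return false;
    a_out = static_cast<int64_t>(na);
    b_out = static_cast<int64_t>(nb);
    return true;
}
```

For example, one Euclid step on (100, 37) with quotient 2 uses the matrix {0, 1, 1, -2} and yields (37, 26); an overshot quotient of 3 gives b' = -11 and is rejected, which is the kind of failure a caller can detect and handle rather than propagate.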
Overview
Routes `ARCH_ARM` VDF squaring to a new `repeated_square_nudupl()` loop (NUDUPL + conditional reduction) and gates the x86 phased/asm pipeline behind `ARCH_X86`/`ARCH_X64` includes, with `vdf_bench square_asm` falling back to NUDUPL on non-x86.

Reworks NUDUPL hot loops for speed/robustness by reusing thread-local GMP temporaries (in `qfb_nudupl` and `mpz_xgcd_partial`), adjusting remainder/division handling, and adding optional `VDF_TEST` profiling hooks via `chiavdf_profile.h`.

Improves build/packaging/CI reliability: adds macOS arm64 to C++ test matrix, makes `prover_test` CI-friendly via `CHIAVDF_PROVER_TEST_FAST`, adds apt update retries, updates Rust fuzzing to run per-target in a matrix with caching, hardens macOS wheel deps in `pyproject.toml`, and reads `README.md` as UTF-8 in `setup.py`.

Written by Cursor Bugbot for commit 16abd8f.