
ARM: default to NUDUPL squaring, drop phased pipeline, and add NUDUPL profiling + CI reliability fixes #297

Closed
hoffmang9 wants to merge 46 commits into main from arm-vdf-fallback

@hoffmang9 hoffmang9 commented Jan 31, 2026

PR Description

Summary

This PR makes VDF squaring fast and maintainable on ARM64 (macOS arm64 / aarch64) by routing ARM builds to the C++ NUDUPL squaring path and removing ARM’s dependency on the x86-centric phased pipeline. It also adds NUDUPL-focused profiling hooks, tightens several numeric invariants that were unstable on ARM, and improves CI/build reliability across platforms.

In addition, it adds a fast/streaming C wrapper API for one-Wesolowski proving, restructures C++ test build targets, and speeds up Rust fuzzing in CI by running fuzz targets in parallel (without reducing per-target fuzz time).

What changed

ARM64: prefer NUDUPL, avoid phased pipeline

  • ARM default squaring: repeated_square() on ARCH_ARM uses the C++ NUDUPL path (repeated_square_nudupl) and does not attempt the phased pipeline. (src/vdf.h)
  • vdf_bench compatibility: on ARM, vdf_bench square_asm is treated as a NUDUPL benchmark for script compatibility. (src/vdf_bench.cpp)
  • ARM “asm GCD/UV stream” fallback symbol: declare asm_arm_func_gcd_unsigned(...) under ARCH_ARM and provide the implementation via src/asm_arm_fallback_impl.inc included in ARM binaries. (src/asm_main.h, src/asm_arm_fallback_impl.inc)
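The routing described above can be sketched as a compile-time switch. This is a minimal illustration, not the code in src/vdf.h: the `SquaringPath` enum, `default_squaring_path()`, and `path_name()` are hypothetical names, and the sketch assumes `ARCH_ARM` is defined for `__aarch64__` builds, as the PR does in parameters.h.

```cpp
#include <string>

// Assumption: mirror the PR's behavior of defining ARCH_ARM on aarch64.
#if defined(__aarch64__) && !defined(ARCH_ARM)
#define ARCH_ARM 1
#endif

// Hypothetical labels for the two squaring strategies.
enum class SquaringPath { Nudupl, Phased };

#if defined(ARCH_ARM)
constexpr bool kIsArm = true;
#else
constexpr bool kIsArm = false;
#endif

// On ARM the choice is made at compile time, so the phased pipeline is
// never attempted. Other builds may still try the phased pipeline.
constexpr SquaringPath default_squaring_path() {
    return kIsArm ? SquaringPath::Nudupl : SquaringPath::Phased;
}

// Human-readable name, e.g. for a benchmark banner.
inline std::string path_name(SquaringPath p) {
    return p == SquaringPath::Nudupl ? "nudupl" : "phased";
}
```

Deciding at compile time keeps x86-only code out of ARM binaries entirely, which a runtime check would not do.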

NUDUPL performance work + profiling

  • Hot-loop allocation removal: reuse thread-local GMP temporaries in NUDUPL (qfb_nudupl) and in mpz_xgcd_partial to eliminate per-iteration mpz_init/mpz_clear churn. (src/nucomp.h, src/xgcd_partial.c)
  • Profiling hooks (counts + optional timings): centralized in src/chiavdf_profile.h and wired through repeated_square_nudupl() and qfb_nudupl() so we can attribute time to gcdext, xgcd_partial, and the a>=L branch when needed. (src/chiavdf_profile.h, src/vdf.h, src/nucomp.h)
  • Supported knobs:
    • Counts / lightweight summaries: CHIAVDF_DIAG=1 (or CHIAVDF_VDF_TEST_STATS=1)
    • Heavier timing instrumentation: CHIAVDF_PROFILE=1 (or CHIAVDF_NUDUPL_PROFILE=1)
    • Note: CHIAVDF_BENCH_DIAG was removed (use CHIAVDF_DIAG / CHIAVDF_VDF_TEST_STATS).
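The hot-loop allocation removal listed above follows a common pattern: construct scratch storage once per thread and reuse it on every iteration. The sketch below illustrates that pattern only. `ScratchTemps`, `thread_scratch()`, and `step()` are hypothetical stand-ins, and `std::vector` buffers take the place of GMP temporaries so the example builds without GMP. In GMP-based code each member would presumably be an `mpz_t` initialised once with `mpz_init` and released with `mpz_clear`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a set of big-integer temporaries.
struct ScratchTemps {
    std::vector<std::uint64_t> a, b;
    ScratchTemps() { ++constructions(); }
    // Per-thread count of how many times the scratch set was constructed.
    static int& constructions() {
        thread_local int n = 0;
        return n;
    }
};

// Returns the calling thread's scratch set, constructing it on first use only.
inline ScratchTemps& thread_scratch() {
    thread_local ScratchTemps temps;
    return temps;
}

// Toy hot-loop step: writes into the reused buffers instead of creating
// fresh temporaries on every call.
inline std::uint64_t step(std::uint64_t x) {
    ScratchTemps& t = thread_scratch();
    t.a.assign(1, x);      // reuses existing capacity after the first call
    t.b.assign(1, x + 1);
    return t.a[0] * t.b[0];
}
```

Because the storage is `thread_local`, concurrent threads each get their own temporaries and no locking is needed.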

GCD / numeric stability fixes

  • gcd_128 boundary contract tightening: reject invalid partial steps and enforce expected invariants at the double→integer boundary. (src/gcd_128.h)
  • Misc. correctness/robustness fixes across GCD helpers and fast-path interfaces that were fragile on ARM. (e.g. src/gcd_unsigned.h, src/gcd_base_continued_fractions.h)
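As an illustration of what enforcing invariants at a double→integer boundary can look like, here is a minimal sketch of a checked conversion. It is not the logic in src/gcd_128.h: `checked_double_to_i64` is a hypothetical helper, and the specific checks (finite, integral, in range) are assumptions about which invariants matter.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: converts d to int64_t only when the conversion is
// exact and well defined. Returns false and leaves `out` untouched otherwise.
inline bool checked_double_to_i64(double d, std::int64_t& out) {
    if (!std::isfinite(d)) return false;          // reject NaN and infinities
    if (std::trunc(d) != d) return false;         // reject fractional values
    const double limit = 9223372036854775808.0;   // 2^63, exactly representable
    if (d < -limit || d >= limit) return false;   // outside the int64_t range
    out = static_cast<std::int64_t>(d);
    return true;
}
```

Casting an out-of-range or NaN double to an integer type is undefined behavior in C++, so results can differ between architectures. Rejecting such inputs before the cast removes that source of platform-dependent behavior.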

Streaming fast C wrapper

  • Adds src/c_bindings/fast_wrapper.{h,cpp} for fast proving + streaming variants.
  • Adds src/fast_wrapper_test.cpp to validate streaming and non-streaming outputs match.

Build system + CI reliability

  • Makefile.vdf-client:
    • Detects ARM and disables x86 asm objects on ARM.
    • Splits default binaries vs extra test binaries (tests target).
    • Adds TSAN define, improves macOS include/lib discovery, and supports local-only -mcpu=native tuning on macOS arm64 (disabled under CI).
  • C++ CI (.github/workflows/test.yaml):
    • Adds macos-13-arm64 to the matrix.
    • Runs apt-get update with retries on Ubuntu to avoid transient mirror/index 404s.
    • Builds extra tests via make ... tests.
    • Iteration counts match main:
      • optimized=1: runs ./1weso_test and ./2weso_test with no args (defaults).
      • ASAN/TSAN: runs ./1weso_test 1000 and ./2weso_test 1000.
    • Runs prover_test in fast mode in CI: CHIAVDF_PROVER_TEST_FAST=1 ./prover_test.
  • Rust CI (.github/workflows/rust.yml):
    • Runs fuzz targets in parallel via a matrix; same per-target time budget.
    • Adds caching for cargo registry/git and build artifacts.
    • Adds defensive dependency installation steps on macOS and Ubuntu.
  • Packaging hardening:
    • setup.py reads README.md as UTF-8 to avoid Windows metadata encoding failures.
    • pyproject.toml uses conditional Homebrew installs on macOS for wheel builds.
  • Repo hygiene:
    • .gitignore updated for additional binaries/artifacts (e.g. src/fast_wrapper_test, *.o.tmp, *.dSYM/).

Test plan

From src/:

make -f Makefile.vdf-client clean all tests

# Short correctness runs (sanitizer/CI style)
./1weso_test 1000
./2weso_test 1000
CHIAVDF_PROVER_TEST_FAST=1 ./prover_test

# Benchmark / diagnostics (optional)
CHIAVDF_DIAG=1 ./vdf_bench square_asm 400000 --recover-a
CHIAVDF_PROFILE=1 ./vdf_bench square_vdf 2000000

Notes:

  • prover_test is a soak/stress test by default; set CHIAVDF_PROVER_TEST_FAST=1 to run a quick correctness test.
  • On ARM, square_asm in vdf_bench benchmarks NUDUPL (compatibility alias).
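An environment-variable switch such as CHIAVDF_PROVER_TEST_FAST can be read with `std::getenv`. The sketch below shows one way to do it. `env_flag_enabled` and `prover_test_iterations` are hypothetical names, the iteration counts are placeholders rather than values from this PR, and treating an empty value or `0` as disabled is an assumption.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: a flag counts as enabled when the variable is set to
// anything other than the empty string or "0".
inline bool env_flag_enabled(const char* name) {
    const char* value = std::getenv(name);
    return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Placeholder iteration counts: a short run in fast mode, a long soak otherwise.
inline unsigned long prover_test_iterations() {
    return env_flag_enabled("CHIAVDF_PROVER_TEST_FAST") ? 1000UL : 1000000UL;
}
```

The same helper could back the diagnostic knobs, for example by checking CHIAVDF_DIAG first and then its alias CHIAVDF_VDF_TEST_STATS.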

Note

High Risk
Changes core VDF squaring/gcd hot paths and adds architecture-conditional behavior (ARM vs x86), which can impact correctness and performance across platforms; CI updates mitigate but don’t eliminate the need for careful cross-arch validation.

Overview
Routes ARCH_ARM VDF squaring to a new repeated_square_nudupl() loop (NUDUPL + conditional reduction) and gates the x86 phased/asm pipeline behind ARCH_X86/ARCH_X64 includes, with vdf_bench square_asm falling back to NUDUPL on non-x86.

Reworks NUDUPL hot loops for speed/robustness by reusing thread-local GMP temporaries (in qfb_nudupl and mpz_xgcd_partial), adjusting remainder/division handling, and adding optional VDF_TEST profiling hooks via chiavdf_profile.h.

Improves build/packaging/CI reliability: adds macOS arm64 to C++ test matrix, makes prover_test CI-friendly via CHIAVDF_PROVER_TEST_FAST, adds apt update retries, updates Rust fuzzing to run per-target in a matrix with caching, hardens macOS wheel deps in pyproject.toml, and reads README.md as UTF-8 in setup.py.

Written by Cursor Bugbot for commit 16abd8f. This will update automatically on new commits.

- Add ARM C++ fallback for gcd_unsigned via asm_arm_fallback_impl.inc
- Dispatch to ARM fallback in threading.h when ARCH_ARM; declare in asm_main.h
- Define ARCH_ARM for __aarch64__ in parameters.h
- Guard x86-only code: get_time_cycles (rdtsc) and x86intrin in vdf.h
- Fix mpz_cmp_si callers in threading.h to use _() for GMP pointer
- Makefile: detect arm64/aarch64, use C++ fallback instead of x86 asm; add Homebrew paths on macOS
- Skip TEST_ASM verification blocks on ARM in gcd_unsigned.h, gcd_128.h, gcd_base_continued_fractions.h
- Include ARM fallback .inc from vdf_client, prover_test, 1weso_test, 2weso_test, vdf_bench
- README: add Running tests section (C++ and Python, ARM note)
Comment thread src/asm_arm_fallback_impl.inc Outdated
Comment thread src/vdf_bench.cpp Outdated
- gcd_unsigned_arm_uv_callback: make thread_local in gcd_unsigned.h and
  asm_arm_fallback_impl.inc so concurrent threads do not overwrite it.
- vdf_bench: include vdf.h and define gcd_base_bits/gcd_128_max_iter
  unconditionally; only asm_arm_fallback_impl.inc stays under #if ARCH_ARM.
Comment thread src/asm_arm_fallback_impl.inc Outdated
Comment thread src/asm_arm_fallback_impl.inc Outdated
Comment thread src/asm_arm_fallback_impl.inc Outdated
@hoffmang9 hoffmang9 marked this pull request as draft January 31, 2026 21:26
@hoffmang9 (Member, Author) commented:

@cursor review

Comment thread src/vdf.h Outdated
@hoffmang9 hoffmang9 marked this pull request as ready for review February 4, 2026 09:43
@hoffmang9 (Member, Author) commented:

@cursor review

Comment thread src/fast_wrapper_test.cpp Outdated
@hoffmang9 (Member, Author) commented:

@cursor review

Comment thread src/threading.h Outdated

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!


Comment thread src/vdf.h Outdated
Comment thread .github/workflows/test.yaml
Comment thread src/asm_arm_fallback_impl.inc Outdated
@hoffmang9 hoffmang9 changed the title from "ARM64 fast VDF fallback + streaming C wrapper + diagnostics/tests + CI reliability fixes" to "ARM: default to NUDUPL squaring, drop phased pipeline, and add NUDUPL profiling + CI reliability fixes" Feb 5, 2026
Comment thread src/gcd_unsigned.h Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread src/vdf.h
@hoffmang9 (Member, Author) commented:

Superseded by #298 (new branch with squashed/clean history and updated PR scope/title). Closing this PR for easier review going forward.

@hoffmang9 hoffmang9 closed this Feb 5, 2026
@hoffmang9 hoffmang9 deleted the arm-vdf-fallback branch February 5, 2026 06:53