Closed
Changes from 2 commits
46 commits
c2f6b39
Port optimized VDF GCD path to ARM (aarch64/arm64)
hoffmang9 Jan 31, 2026
eae35d1
thread_local ARM UV callback; vdf_bench include/defs at file scope
hoffmang9 Jan 31, 2026
63cf20c
Address cursor review issues and enable vdf-client testing on Macos ARM
hoffmang9 Jan 31, 2026
70d2127
Revert to minimal changes to compile vdf on macos-arm
hoffmang9 Feb 2, 2026
9b5751e
fix bug and add make clean to the vdf makefile
hoffmang9 Feb 2, 2026
ffcb8a6
Fix bugs found by ASAN
hoffmang9 Feb 3, 2026
edc4461
various fixes to threading etc for ARM
hoffmang9 Feb 3, 2026
acd6cb4
Fix various TSAN and ASAN surfaced issues
hoffmang9 Feb 3, 2026
4841124
conditionally check before brew installing e.g. cmake on MacOS
hoffmang9 Feb 3, 2026
2ef7bbb
take a different conditional strategy for cibuildwheel
hoffmang9 Feb 3, 2026
d9a1e65
Refactor to enhance speed and tighten restrictions on ARM
hoffmang9 Feb 3, 2026
63f27e9
Tighten tolerances and fix TSAN bug
hoffmang9 Feb 3, 2026
cfa3198
Continue optimizing on ARM
hoffmang9 Feb 4, 2026
4b322d5
consolidate setting base bits
hoffmang9 Feb 4, 2026
e4d9a85
IPS in the 90K range and lots of instrumenting
hoffmang9 Feb 4, 2026
1d7e74a
Don't automatically build fast_wrapper_test
hoffmang9 Feb 4, 2026
9a5dc61
Address TSAN issue
hoffmang9 Feb 4, 2026
15104dc
Additional uv optimizations and fixes for CI failures
hoffmang9 Feb 4, 2026
a846d1d
Fix gcc issues
hoffmang9 Feb 4, 2026
f0ec03c
Fix the rust fuzz test and add some more performance
hoffmang9 Feb 4, 2026
043a3af
Additional significant optimizations on ARM64 Mac
hoffmang9 Feb 4, 2026
0e784f3
Lets call 135.3K ips on MacOS ARM64 good enough for now
hoffmang9 Feb 4, 2026
295a88b
Final optimizations, update the README, attempt to fix long rust fuzz …
hoffmang9 Feb 4, 2026
0353c91
make entered_after_single_slow diagnostics consistent and stop overst…
hoffmang9 Feb 4, 2026
69ed50a
Add more tests for these changes
hoffmang9 Feb 4, 2026
1a7c8b6
Restore rust fuzzer
hoffmang9 Feb 4, 2026
32a2261
Update the Readme and speed up Rust fuzzing
hoffmang9 Feb 4, 2026
7ac15d7
Fix readme parsing on Windows
hoffmang9 Feb 4, 2026
832b931
get rust fuzzer running again
hoffmang9 Feb 4, 2026
965f07c
fix MacOS Intel rust build issue
hoffmang9 Feb 4, 2026
f342713
Fix transient ubuntu apt get issues
hoffmang9 Feb 4, 2026
b96d68f
fix test passes 1-byte buffer as 100-byte form
hoffmang9 Feb 4, 2026
022a5de
Fix leftover dead code path
hoffmang9 Feb 4, 2026
d0391b0
Fix possible divide by zero in metrics
hoffmang9 Feb 4, 2026
58c4865
Note the changes to prover_test in README.md
hoffmang9 Feb 4, 2026
a88eede
Fix possible hang from reading an uninitialized “iter 0” entry
hoffmang9 Feb 4, 2026
7201f58
Fix README statements about rust tests
hoffmang9 Feb 4, 2026
22e1261
NUDUPL won. remove ARM alt path and much of the instrumentation to ti…
hoffmang9 Feb 5, 2026
194ed80
Fix wheel builds
hoffmang9 Feb 5, 2026
84534b7
revert-unrelated-changes
hoffmang9 Feb 5, 2026
f58f2c1
Add a few things back in
hoffmang9 Feb 5, 2026
5e453f8
fix Windows builds, work around ubuntu runner silliness
hoffmang9 Feb 5, 2026
292dfb8
Fixed the ASAN LeakSanitizer leak
hoffmang9 Feb 5, 2026
46ef9b3
address mac cibuildwheel flakiness, bring back the short prover test …
hoffmang9 Feb 5, 2026
d15e489
fix PulmarkReducer created inside loop
hoffmang9 Feb 5, 2026
16abd8f
update README.md
hoffmang9 Feb 5, 2026
11 changes: 11 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -60,6 +60,17 @@ If you're running a timelord, the following tests are available, depending of wh

Those tests will simulate the vdf_client and verify for correctness the produced proofs.

## Running tests

**C++ tests** (vdf_client, prover_test, 1weso_test, 2weso_test): build from `src/` with
`make -f Makefile.vdf-client` (requires GMP and Boost; on macOS with Homebrew, install
with `brew install gmp boost`). Then run `./1weso_test`, `./2weso_test`, `./prover_test`
as above. On **ARM** (aarch64/arm64) the fast VDF path uses a C++ GCD fallback instead of
x86 assembly, so these tests run correctly but are slower than on x86.

**Python tests**: run `pytest tests/ -v` after installing the package (e.g. `pip install -e .`).
They require the chiavdf extension to be built (CMake, GMP, pybind11).

## Contributing and workflow

Contributions are welcome and more details are available in chia-blockchain's
4 changes: 4 additions & 0 deletions src/1weso_test.cpp
@@ -66,3 +66,7 @@ catch (std::exception const& e) {
std::cerr << "Exception " << e.what() << '\n';
return 1;
}

#if defined(ARCH_ARM)
#include "asm_arm_fallback_impl.inc"
#endif
4 changes: 4 additions & 0 deletions src/2weso_test.cpp
@@ -77,3 +77,7 @@ catch (std::exception const& e) {
std::cerr << "Exception " << e.what() << '\n';
return 1;
}

#if defined(ARCH_ARM)
#include "asm_arm_fallback_impl.inc"
#endif
22 changes: 21 additions & 1 deletion src/Makefile.vdf-client
@@ -11,6 +11,15 @@ LDLIBS += -lgmpxx -lgmp -pthread
CXXFLAGS += -flto -std=c++1z -D VDF_MODE=0 -D FAST_MACHINE=1 -pthread $(NOPIE) -fvisibility=hidden
ifeq ($(UNAME),Darwin)
CXXFLAGS += -D CHIAOSX=1
# Homebrew (common on macOS) installs boost/gmp to /opt/homebrew or /usr/local
ifneq ($(wildcard /opt/homebrew/include/boost/asio.hpp),)
CXXFLAGS += -I/opt/homebrew/include
LDFLAGS += -L/opt/homebrew/lib
endif
ifneq ($(wildcard /usr/local/include/boost/asio.hpp),)
CXXFLAGS += -I/usr/local/include
LDFLAGS += -L/usr/local/lib
endif
endif

OPT_CFLAGS = -O3 -g
@@ -27,13 +36,24 @@ endif

.PHONY: all clean

ARCH := $(shell uname -m)
# aarch64 (Linux), arm64 (macOS Apple Silicon)
ifneq ($(filter $(ARCH),aarch64 arm64),)
# ARM: use C++ fallback for GCD (no x86 assembly); implementation is in each BIN via .inc
ASM_OBJS =
CXXFLAGS += -D ARCH_ARM
else
# x86: use hand-tuned assembly
ASM_OBJS = asm_compiled.o avx2_asm_compiled.o avx512_asm_compiled.o
endif

BINS = vdf_client prover_test 1weso_test 2weso_test vdf_bench
all: $(BINS)

clean:
rm -f *.o hw/*.o $(BINS) compile_asm emu_hw_test hw_test hw_vdf_client emu_hw_vdf_client

-$(BINS) avx512_test: %: %.o lzcnt.o asm_compiled.o avx2_asm_compiled.o avx512_asm_compiled.o
+$(BINS) avx512_test: %: %.o lzcnt.o $(ASM_OBJS)
$(CXX) $(LDFLAGS) -o $@ $^ $(LDLIBS)

$(addsuffix .o,$(BINS)) avx512_test.o: CXXFLAGS += $(OPT_CFLAGS)
95 changes: 95 additions & 0 deletions src/asm_arm_fallback_impl.inc
@@ -0,0 +1,95 @@
/*
 * ARM C++ fallback implementation. Include this from exactly one .cpp per binary
 * when ARCH_ARM (after vdf.h so all types are available).
 */
#if defined(ARCH_ARM)

thread_local void (*gcd_unsigned_arm_uv_callback)(int index, const array<array<uint64, 2>, 2>& uv, int parity) = nullptr;

namespace {

thread_local asm_code::asm_func_gcd_unsigned_data* g_arm_gcd_data = nullptr;
thread_local int g_arm_iter_count = 0;

void arm_uv_callback(int index, const array<array<uint64, 2>, 2>& uv, int parity) {
    g_arm_iter_count = index + 1;
    asm_code::asm_func_gcd_unsigned_data* data = g_arm_gcd_data;
    if (!data || !data->out_uv_addr) return;

    uint64* entry = data->out_uv_addr + index * 8;
    entry[0] = uv[0][0];
    entry[1] = uv[1][0];
    entry[2] = uv[0][1];
    entry[3] = uv[1][1];
    entry[4] = static_cast<uint64>(parity);
    entry[5] = 0;
    entry[6] = 0;
    entry[7] = 0;
}

void load_limbs_to_fixed_integer(uint64* limbs, fixed_integer<uint64, gcd_size>& out) {
    integer tmp;
    mpz_import(tmp.impl, gcd_size, -1, sizeof(uint64), -1, 0, limbs);
    out = fixed_integer<uint64, gcd_size>(tmp);
}

void store_fixed_integer_to_limbs(const fixed_integer<uint64, gcd_size>& in, uint64* limbs) {
    for (int i = 0; i < gcd_size; i++) {
        limbs[i] = in[i];
    }
}

} // namespace

extern "C" int asm_arm_func_gcd_unsigned(asm_code::asm_func_gcd_unsigned_data* data) {
    assert(data != nullptr);
    assert(data->a != nullptr && data->b != nullptr);
    assert(data->a_2 != nullptr && data->b_2 != nullptr);
    assert(data->threshold != nullptr);

    fixed_integer<uint64, gcd_size> a, b, threshold;
    load_limbs_to_fixed_integer(data->a, a);
    load_limbs_to_fixed_integer(data->b, b);
    load_limbs_to_fixed_integer(data->threshold, threshold);

    if (a < b) {
        std::swap(a, b);
    }
    if (b <= threshold) {
        return -1;
    }

    array<fixed_integer<uint64, gcd_size>, 2> ab = {a, b};
    array<fixed_integer<uint64, gcd_size>, 2> uv;
    uv[0] = fixed_integer<uint64, gcd_size>(integer(1));
    uv[1] = fixed_integer<uint64, gcd_size>(integer(0));
    int parity = 1;

    g_arm_gcd_data = data;
    gcd_unsigned_arm_uv_callback = arm_uv_callback;

    gcd_unsigned<gcd_size>(ab, uv, parity, threshold);

    gcd_unsigned_arm_uv_callback = nullptr;
    int iter = g_arm_iter_count;
    g_arm_gcd_data = nullptr;
    g_arm_iter_count = 0;

    store_fixed_integer_to_limbs(ab[0], data->a_2);
    store_fixed_integer_to_limbs(ab[1], data->b_2);
    store_fixed_integer_to_limbs(ab[0], data->a);
    store_fixed_integer_to_limbs(ab[1], data->b);

    data->iter = iter;

    if (data->out_uv_counter_addr) {
        *data->out_uv_counter_addr = data->uv_counter_start + iter;
    }
    if (data->out_uv_addr && iter > 0) {
        data->out_uv_addr[(iter - 1) * 8 + 5] = 1;
    }

    return 0;
}

#endif
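The callback in this file packs each 2x2 matrix into an 8-word record: words 0 to 3 hold `uv[0][0]`, `uv[1][0]`, `uv[0][1]`, `uv[1][1]`, word 4 holds the parity, and word 5 is set to 1 on the final record after the GCD finishes, which appears to mark the last entry. The sketch below reproduces only that layout in isolation. `Matrix2`, `write_uv_entry`, `mark_last`, and `kWordsPerEntry` are hypothetical names for illustration, `std::uint64_t` stands in for the project's `uint64`, and the capacity check is an addition of the sketch, not something the PR code performs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration of the 8-word record written per GCD iteration.
constexpr std::size_t kWordsPerEntry = 8;  // 8 * 8 bytes = 64 bytes per record

using Matrix2 = std::array<std::array<std::uint64_t, 2>, 2>;

// Write the record for iteration `index` into a flat buffer of 8-word records.
// The bounds check is an assumption of this sketch; the PR code has none.
inline void write_uv_entry(std::uint64_t* out, std::size_t capacity_entries,
                           std::size_t index, const Matrix2& uv, int parity) {
    assert(out != nullptr);
    assert(index < capacity_entries);
    std::uint64_t* e = out + index * kWordsPerEntry;
    e[0] = uv[0][0];
    e[1] = uv[1][0];  // note the column-major order of words 0..3
    e[2] = uv[0][1];
    e[3] = uv[1][1];
    e[4] = static_cast<std::uint64_t>(parity);
    e[5] = 0;  // flag word, set later on the final record only
    e[6] = 0;
    e[7] = 0;
}

// After the loop, flag the final record (word 5 of entry iter_count - 1).
inline void mark_last(std::uint64_t* out, std::size_t iter_count) {
    if (out != nullptr && iter_count > 0) {
        out[(iter_count - 1) * kWordsPerEntry + 5] = 1;
    }
}
```

A caller would write one record per iteration and then call `mark_last` with the iteration count, mirroring the order of operations in `asm_arm_func_gcd_unsigned`.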
3 changes: 3 additions & 0 deletions src/asm_main.h
@@ -182,6 +182,9 @@ struct asm_func_gcd_unsigned_data {

extern "C" int asm_avx2_func_gcd_unsigned(asm_func_gcd_unsigned_data* data);
extern "C" int asm_cel_func_gcd_unsigned(asm_func_gcd_unsigned_data* data);
#if defined(ARCH_ARM)
extern "C" int asm_arm_func_gcd_unsigned(asm_func_gcd_unsigned_data* data);
#endif
#ifdef COMPILE_ASM
void compile_asm_gcd_unsigned() {
EXPAND_MACROS_SCOPE_PUBLIC;
2 changes: 1 addition & 1 deletion src/gcd_128.h
@@ -213,7 +213,7 @@ bool gcd_128(
//todo break;
}

-#ifdef TEST_ASM
+#if defined(TEST_ASM) && !defined(ARCH_ARM)
#ifndef GENERATE_ASM_TRACKING_DATA
if (test_asm_run) {
if (test_asm_print) {
2 changes: 1 addition & 1 deletion src/gcd_base_continued_fractions.h
@@ -649,7 +649,7 @@ bool gcd_base_continued_fraction(vector2& ab, matrix2& uv, bool is_lehmer, doubl

//print( " gcd_base", iter_table+iter_slow, iter_table, iter_slow );

-#ifdef TEST_ASM
+#if defined(TEST_ASM) && !defined(ARCH_ARM)
#ifndef GENERATE_ASM_TRACKING_DATA
if (test_asm_run) {
if (test_asm_print) {
12 changes: 11 additions & 1 deletion src/gcd_unsigned.h
@@ -1,6 +1,11 @@
#ifndef GCD_UNSIGNED_H
#define GCD_UNSIGNED_H

#ifdef ARCH_ARM
// Callback used by ARM fallback to capture UV matrix at each gcd_unsigned iteration (thread-local so concurrent threads do not overwrite each other).
extern thread_local void (*gcd_unsigned_arm_uv_callback)(int index, const array<array<uint64, 2>, 2>& uv, int parity);
#endif

//threshold is 0 to calculate the normal gcd
template<int size> void gcd_unsigned_slow(
array<fixed_integer<uint64, size>, 2>& ab,
@@ -207,6 +212,11 @@ template<int size> void gcd_unsigned(

matricies.push_back(uv_uint64);
local_parities.push_back(local_parity);
#ifdef ARCH_ARM
if (gcd_unsigned_arm_uv_callback) {
gcd_unsigned_arm_uv_callback(static_cast<int>(matricies.size()) - 1, uv_uint64, local_parity);
}
#endif
} else {
//can just make the gcd fail if this happens in the asm code
print( " gcd_unsigned slow" );
@@ -253,7 +263,7 @@ template<int size> void gcd_unsigned(
}
}

-#ifdef TEST_ASM
+#if defined(TEST_ASM) && !defined(ARCH_ARM)
if (test_asm_run) {
if (test_asm_print) {
print( "test asm gcd_unsigned", test_asm_counter );
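The hook added to `gcd_unsigned` is a `thread_local` function pointer, so each thread installs and clears its own callback without affecting others. The PR sets and resets the pointer by hand around the call. One alternative that keeps the pairing intact on early exits is a small RAII guard, sketched below with stand-in types. `StepCallback`, `g_step_callback`, `CallbackGuard`, `run_steps`, and `count_steps_with_guard` are all hypothetical names, not identifiers from the PR.

```cpp
#include <cassert>

// Hypothetical per-thread hook, similar in shape to the ARM UV callback.
using StepCallback = void (*)(int index, int parity);
thread_local StepCallback g_step_callback = nullptr;

// Installs a callback for the current thread and restores the previous
// value when the guard goes out of scope.
class CallbackGuard {
public:
    explicit CallbackGuard(StepCallback cb) : previous_(g_step_callback) {
        g_step_callback = cb;
    }
    ~CallbackGuard() { g_step_callback = previous_; }
    CallbackGuard(const CallbackGuard&) = delete;
    CallbackGuard& operator=(const CallbackGuard&) = delete;

private:
    StepCallback previous_;
};

thread_local int g_steps_seen = 0;

// Records how many iterations have been reported on this thread.
inline void count_step(int index, int /*parity*/) { g_steps_seen = index + 1; }

// Stand-in for the iteration loop: reports each step if a hook is installed.
inline void run_steps(int n) {
    for (int i = 0; i < n; ++i) {
        if (g_step_callback) {
            g_step_callback(i, i & 1);
        }
    }
}

// Runs n steps with the counting hook installed and returns the count.
inline int count_steps_with_guard(int n) {
    assert(n >= 0);
    g_steps_seen = 0;
    {
        CallbackGuard guard(count_step);
        run_steps(n);
    }  // hook is restored here, even if run_steps had thrown
    return g_steps_seen;
}
```

After `count_steps_with_guard` returns, `g_step_callback` is back to whatever it was before the call, which is `nullptr` unless the thread had already installed a hook.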
2 changes: 1 addition & 1 deletion src/parameters.h
@@ -38,7 +38,7 @@ bool enable_avx512_ifma=false;
#define ARCH_X86
#elif defined(__x86_64__) || defined(_M_X64)
#define ARCH_X64
-#elif (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 5) || (defined(_M_ARM) && _M_ARM >= 5) || defined(__ARM_FEATURE_CLZ) /* ARM (Architecture Version 5) */
+#elif defined(__aarch64__) || (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 5) || (defined(_M_ARM) && _M_ARM >= 5) || defined(__ARM_FEATURE_CLZ) /* ARM (aarch64 or Architecture Version 5+) */
#define ARCH_ARM
#endif
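The `#elif` chain above selects one architecture macro at compile time from compiler-predefined macros. A minimal sketch of the same idea, reduced to a function that reports which branch was taken, is shown below. `arch_name` and `is_arm_build` are hypothetical helpers, and the `_M_ARM64` check (MSVC on 64-bit ARM) is an assumption of this sketch that the PR does not include.

```cpp
#include <cassert>
#include <cstring>

// Reports which architecture family the translation unit was compiled for.
// The macro names are the usual GCC/Clang/MSVC predefined macros.
inline const char* arch_name() {
#if defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x64";
#elif defined(__aarch64__) || defined(_M_ARM64) || defined(__arm__) || defined(_M_ARM)
    return "arm";
#else
    return "unknown";
#endif
}

// True when the build targets 32-bit or 64-bit ARM.
inline bool is_arm_build() {
    const char* name = arch_name();
    assert(name != nullptr);
    return std::strcmp(name, "arm") == 0;
}
```

Because the result is fixed at compile time, a check like this only says what the binary was built for, not what CPU it is running on.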

4 changes: 4 additions & 0 deletions src/prover_test.cpp
@@ -74,3 +74,7 @@ int main() {
delete(fast_storage);
delete(weso);
}

#if defined(ARCH_ARM)
#include "asm_arm_fallback_impl.inc"
#endif
27 changes: 19 additions & 8 deletions src/threading.h
@@ -11,7 +11,8 @@ static_assert(sizeof(unsigned long int)==8, "");
static_assert(sizeof(long int)==8, "");

static uint64 get_time_cycles() {
-// Returns the time in EDX:EAX.
+#if defined(ARCH_X86) || defined(ARCH_X64)
+// Returns the time in EDX:EAX (x86 rdtsc).
uint64 high;
uint64 low;
asm volatile(
@@ -23,6 +24,11 @@ static uint64 get_time_cycles() {
: "=a"(low), "=d"(high) :: "memory");

return (high<<32) | low;
#elif defined(ARCH_ARM)
return static_cast<uint64>(__builtin_readcyclecounter());
#else
return 0;
#endif
}

#ifdef ENABLE_TRACK_CYCLES
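The final `#else` branch of `get_time_cycles` returns 0, so any cycle delta measured on an unrecognized architecture reads as zero. Clang's documentation for `__builtin_readcyclecounter` also notes that the counter may be unavailable or depend on run-time privilege on some targets. Where only relative timing is needed, a `std::chrono::steady_clock` reading could serve as a portable stand-in. The sketch below shows that alternative; it is not what the PR does, `monotonic_ticks` and `elapsed_ticks` are hypothetical names, and the unit is nanoseconds rather than CPU cycles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Returns a monotonically non-decreasing tick count in nanoseconds.
// This measures elapsed time, not CPU cycles.
inline std::uint64_t monotonic_ticks() {
    using clock = std::chrono::steady_clock;
    const auto since_epoch = clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch).count());
}

// Difference between two readings taken with monotonic_ticks().
inline std::uint64_t elapsed_ticks(std::uint64_t start, std::uint64_t end) {
    assert(end >= start);  // steady_clock never goes backwards
    return end - start;
}
```

Code that compares these values against cycle-based thresholds would need those thresholds rescaled, since a nanosecond count and a cycle count are not interchangeable.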
@@ -456,12 +462,12 @@ template<int d_expected_size, int d_padded_size> struct alignas(64) mpz {
bool operator>(const mpz_struct* t) const { return mpz_cmp(*this, t)>0; }
bool operator!=(const mpz_struct* t) const { return mpz_cmp(*this, t)!=0; }

-bool operator<(int64 i) const { return mpz_cmp_si(*this, i)<0; }
-bool operator<=(int64 i) const { return mpz_cmp_si(*this, i)<=0; }
-bool operator==(int64 i) const { return mpz_cmp_si(*this, i)==0; }
-bool operator>=(int64 i) const { return mpz_cmp_si(*this, i)>=0; }
-bool operator>(int64 i) const { return mpz_cmp_si(*this, i)>0; }
-bool operator!=(int64 i) const { return mpz_cmp_si(*this, i)!=0; }
+bool operator<(int64 i) const { return mpz_cmp_si(_(), i)<0; }
+bool operator<=(int64 i) const { return mpz_cmp_si(_(), i)<=0; }
+bool operator==(int64 i) const { return mpz_cmp_si(_(), i)==0; }
+bool operator>=(int64 i) const { return mpz_cmp_si(_(), i)>=0; }
+bool operator>(int64 i) const { return mpz_cmp_si(_(), i)>0; }
+bool operator!=(int64 i) const { return mpz_cmp_si(_(), i)!=0; }

bool operator<(uint64 i) const { return mpz_cmp_ui(_(), i)<0; }
bool operator<=(uint64 i) const { return mpz_cmp_ui(_(), i)<=0; }
@@ -789,9 +795,14 @@ template<class mpz_type> bool gcd_unsigned(
assert((uint64(data.out_uv_addr)&63)==0); //should be cache line aligned
}

-int error_code=hasAVX2()?
+int error_code=
+#if defined(ARCH_ARM)
+asm_code::asm_arm_func_gcd_unsigned(&data);
+#else
+hasAVX2()?
 asm_code::asm_avx2_func_gcd_unsigned(&data):
 asm_code::asm_cel_func_gcd_unsigned(&data);
+#endif

if (error_code!=0) {
c_thread_state.raise_error();
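The hunk above picks the GCD routine at compile time on ARM and keeps the run-time AVX2 check on x86. The pattern can be sketched in isolation with stub functions. Everything here is a stand-in: `GcdData`, the three `*_stub` routines, `has_avx2_stub`, and `run_gcd` are hypothetical names, and the stub bodies only tag which path ran.

```cpp
#include <cassert>

// Hypothetical stand-in for asm_func_gcd_unsigned_data.
struct GcdData {
    int iter = 0;
};

// Stub routines; each records which path ran and reports success (0).
inline int gcd_arm_stub(GcdData* d)  { d->iter = 1; return 0; }
inline int gcd_avx2_stub(GcdData* d) { d->iter = 2; return 0; }
inline int gcd_base_stub(GcdData* d) { d->iter = 3; return 0; }

// Hypothetical run-time probe; real code would query the CPU for AVX2.
inline bool has_avx2_stub() { return false; }

// Compile-time choice on ARM, run-time choice between two routines otherwise.
inline int run_gcd(GcdData* d) {
    assert(d != nullptr);
#if defined(__aarch64__) || defined(__arm__)
    return gcd_arm_stub(d);
#else
    return has_avx2_stub() ? gcd_avx2_stub(d) : gcd_base_stub(d);
#endif
}
```

Keeping the ARM branch behind the preprocessor means the x86 symbols are never referenced on ARM builds, which is what lets the Makefile drop the assembly objects from the link there.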
2 changes: 2 additions & 0 deletions src/vdf.h
@@ -3,7 +3,9 @@

#include "include.h"

#if defined(ARCH_X86) || defined(ARCH_X64)
#include <x86intrin.h>
#endif

#include "parameters.h"

8 changes: 8 additions & 0 deletions src/vdf_bench.cpp
@@ -4,6 +4,7 @@
#include "parameters.h"
#include "asm_main.h"
#include "integer.h"
#include "vdf.h"
#include "vdf_new.h"
#include "nucomp.h"
#include "picosha2.h"
@@ -18,6 +19,9 @@

#define CH_SIZE 32

int gcd_base_bits = 50;
int gcd_128_max_iter = 3;

static void usage(const char *progname)
{
fprintf(stderr, "Usage: %s {square_asm|square|discr} N\n", progname);
@@ -115,3 +119,7 @@ int main(int argc, char **argv)
}
return 0;
}

#if defined(ARCH_ARM)
#include "asm_arm_fallback_impl.inc"
#endif
4 changes: 4 additions & 0 deletions src/vdf_client.cpp
@@ -352,3 +352,7 @@ catch (std::exception& e) {
std::cerr << "Exception: " << e.what() << "\n";
return 1;
}

#if defined(ARCH_ARM)
#include "asm_arm_fallback_impl.inc"
#endif