Commit 394482a

AztecBot and iakovenkos authored
fix(bb): unaligned SIMD store in pippenger_constantine tests to stop debug-build segfault (#23847)
## Problem

The nightly barretenberg **debug** build has been failing (aztec-claude run [26935061960](https://github.com/AztecProtocol/aztec-claude/actions/runs/26935061960); same failure in aztec-packages runs #105/#106). The build dies with `exit status 139` (SIGSEGV) on:

```
FAILED ... ecc_tests PippengerConstantine.SimdX4MatchesScalarPathLanewise (code: 139)
[ RUN ] PippengerConstantine.SimdX4MatchesScalarPathLanewise
timeout: the monitored command dumped core
```

## Root cause

In `barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/pippenger_constantine.hpp`, `simd_u32x4_store` writes the result vector with:

```cpp
*reinterpret_cast<SimdU32x4*>(dst) = v;
```

`SimdU32x4` is `uint32_t __attribute__((vector_size(16)))`, which carries **16-byte alignment**, so this is an *aligned* 128-bit store. But `dst` is an arbitrary `uint32_t*`: the test and fuzzer pass a stack `std::array<uint32_t, 4>` (4-byte aligned). At `-O0` (debug) the store lowers to an alignment-requiring `movaps`/`movdqa` and faults whenever `dst` is not 16-byte aligned.

This only surfaces in the **debug** nightly: the helper is `[[gnu::always_inline]]`, so at `-O2` SROA promotes the local `out` array into registers and the memory store is elided, which is why the full (release) CI is green while the debug build segfaults. The SIMD x4 helpers are currently consumed only by the unit test and fuzzer (not yet wired into the MSM hot loop), so the blast radius is the test/fuzzer.

## Fix

Store via `__builtin_memcpy`, which has no alignment precondition and lowers to the intended unaligned `movdqu` / NEON `st1` (the WASM `wasm_v128_store` path is unchanged). This matches the helper's documented intent.

## Verification (red/green, debug preset)

Built `ecc_tests` with the `debug` CMake preset (`build-debug`, `-O0 -D_GLIBCXX_DEBUG`), matching the nightly:

- **Without the fix:** `PippengerConstantine.SimdX4MatchesScalarPathLanewise` → exit **139** (SIGSEGV), reproducing the nightly.
- **With the fix:** all 6 `PippengerConstantine.*` tests pass.

A standalone repro confirmed the mechanism independently: the aligned store to a 4-byte-aligned destination segfaults at `-O0`; the `memcpy` form stores correctly.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/cadf49316638602b) · group: `slackbot`*

---------

Co-authored-by: iakovenkos <sergey.s.yakovenko@gmail.com>
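The `memcpy` store form described in the message can be sketched in isolation as follows. This is a minimal illustration and not the repository's code: `store_u32x4_unaligned` and `demo_misaligned_store` are hypothetical names, and the vector type is redeclared here with the GCC/Clang `vector_size` extension so the block compiles on its own.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Same shape as the vector type named in the message (GCC/Clang extension).
using SimdU32x4 = uint32_t __attribute__((vector_size(16)));

// Hypothetical helper: memcpy has no alignment precondition on `dst`,
// so this is safe for any valid pointer to four writable uint32_t lanes.
inline void store_u32x4_unaligned(uint32_t* dst, SimdU32x4 v) noexcept
{
    std::memcpy(dst, &v, sizeof(v));
}

// Hypothetical demo: store to an address that is deliberately 4 bytes
// past a 16-byte boundary, the case that breaks an aligned store.
inline std::array<uint32_t, 5> demo_misaligned_store()
{
    alignas(16) std::array<uint32_t, 5> buf{};
    const SimdU32x4 v = { 10, 20, 30, 40 };
    store_u32x4_unaligned(buf.data() + 1, v);
    return buf;
}
```

With this form the caller does not need to know anything about the destination's alignment, at the cost of giving up the guarantee of an aligned store instruction.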
1 parent b82384a commit 394482a

3 files changed

Lines changed: 5 additions & 7 deletions

barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/pippenger_constantine.fuzzer.cpp

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
     }
 
     // Check 2: SIMD x4 path agrees with scalar path lane-by-lane.
-    std::array<uint32_t, 4> simd_out{};
+    alignas(16) std::array<uint32_t, 4> simd_out{};
     production_simd(scalars, bit_offset, window_bits, simd_out);
     for (size_t lane = 0; lane < 4; ++lane) {
         const uint32_t want = production_scalar(scalars[lane].data(), bit_offset, window_bits);

barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/pippenger_constantine.hpp

Lines changed: 3 additions & 5 deletions
@@ -211,11 +211,9 @@ struct ConstantineSliceParamsU32 {
 }
 
 // Store a `SimdU32x4` to a 4-lane uint32 destination as a single 128-bit op.
-// On WASM the explicit `wasm_v128_store` is used because earlier codegen for
-// the equivalent struct-wrapper assignment was observed to round-trip the
-// vector through 4 scalar memory slots; the intrinsic guarantees the
-// `i32x4.store` opcode. On native the `vector_size` store lowers directly to
-// SSE2 `movdqu` / NEON `st1`.
+// Precondition: `dst` is 16-byte aligned.
+// On WASM the explicit intrinsic guarantees a `v128.store`; on native the typed
+// vector store lets the compiler use aligned SIMD stores (e.g. x86 movaps/movdqa).
 [[gnu::always_inline]] inline void simd_u32x4_store(uint32_t* dst, SimdU32x4 v) noexcept
 {
 #ifdef __wasm_simd128__
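
One way a 16-byte alignment precondition like the one in the new comment could be checked in debug builds is sketched below. This is an assumption-laden illustration, not part of the commit: `is_aligned_16`, `store_u32x4_checked` and `demo_checked_store` are hypothetical names, and `__builtin_assume_aligned` is a GCC/Clang builtin used here in place of the header's typed vector store.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Redeclared locally so the sketch is self-contained (GCC/Clang extension).
using SimdU32x4 = uint32_t __attribute__((vector_size(16)));

// Hypothetical helper: true when `p` sits on a 16-byte boundary.
inline bool is_aligned_16(const void* p) noexcept
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical checked store: the assert turns a violated precondition into
// a readable failure instead of a SIGSEGV in debug builds; the builtin then
// tells the optimizer that the destination is 16-byte aligned.
inline void store_u32x4_checked(uint32_t* dst, SimdU32x4 v) noexcept
{
    assert(is_aligned_16(dst) && "dst must be 16-byte aligned");
    std::memcpy(__builtin_assume_aligned(dst, 16), &v, sizeof(v));
}

// Caller side: `alignas(16)` on the destination satisfies the precondition,
// mirroring the change made to the test and fuzzer buffers.
inline std::array<uint32_t, 4> demo_checked_store()
{
    alignas(16) std::array<uint32_t, 4> out{};
    const SimdU32x4 v = { 1, 2, 3, 4 };
    store_u32x4_checked(out.data(), v);
    return out;
}
```

The assert compiles away under `NDEBUG`, so a check of this kind would only cost anything in the debug configuration where the original fault appeared.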

barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/pippenger_constantine.test.cpp

Lines changed: 1 addition & 1 deletion
@@ -207,7 +207,7 @@ TEST(PippengerConstantine, SimdX4MatchesScalarPathLanewise)
     std::array<std::array<uint64_t, NUM_LIMBS_U64>, 4> scalars{
         random_scalar_limbs(), random_scalar_limbs(), random_scalar_limbs(), random_scalar_limbs()
     };
-    std::array<uint32_t, 4> got_simd{};
+    alignas(16) std::array<uint32_t, 4> got_simd{};
     production_simd_path(scalars.data(), bit_offset, window_bits, got_simd.data());
     for (size_t lane = 0; lane < 4; ++lane) {
         const uint32_t want = production_scalar_path(scalars[lane].data(), bit_offset, window_bits);
