fix(bb): allocate gemini masking poly with virtual size == dyadic#22937
Merged
iakovenkos merged 4 commits intoMay 6, 2026
Merged
Conversation
4e8ddef to
a8d081c
Compare
iakovenkos
approved these changes
May 6, 2026
This was referenced May 7, 2026
federicobarbacovi
added a commit
that referenced
this pull request
May 7, 2026
…g SIGABRT (#23019) ## Summary Unblocks the `Nightly Debug Build` workflow, which has been failing again on `next` after #22937 landed. With the `gemini_masking_poly` fix in #22937 the dsl_tests abort site is gone; the next abort site is in `BatchMergeProver`, hit by `BatchMergeTests/{2,3}.TooManySubtablesFails`. Most recent failing run: https://github.com/AztecProtocol/aztec-packages/actions/runs/25478845904 (b30fe8f, exit 134, 7m46s). ## Root cause `TooManySubtablesFails` deliberately calls `BB_DISABLE_ASSERTS()` so it can drive the verifier with `N = max_subtables + 1` subtables. With BB asserts demoted to warnings, control flow falls past the `BB_ASSERT_LTE(N, M, ...)` guard at the top of `BatchMergeProver::construct_proof` and reaches `compute_degree_check_polynomial`, which iterates over `flattened_columns.size()` (sized for `N`) but indexes `degree_check_challenges` (sized for `(M+1) * NUM_WIRES`). In release the over-read returns garbage that the verifier rejects — exactly what the test wants. In debug, libstdc++'s `_GLIBCXX_DEBUG` (set by the `debug` preset's `CXXFLAGS`) makes `std::vector::operator[]` bounds-checked and the OOB at `idx == 40` traps: ``` Error: attempt to subscript container with out-of-bounds index 40, but container only holds 40 elements. timeout: the monitored command dumped core ``` `BB_DISABLE_ASSERTS()` can't suppress this — the bounds check is libstdc++'s, not bb's. ## Fix Clamp the loop to `min(flattened_columns.size(), degree_check_challenges.size())`. Normal paths are unaffected because both sizes equal `(M+1) * NUM_WIRES` whenever `N <= M` (the assert holds). The misuse path now produces a partial degree-check polynomial that the verifier still rejects in both Debug and Release — same observable behaviour, no UB. ## Verification Reproduced inside the ClaudeBox container on `b30fe8f401` with the same flags as the nightly: ```bash cd barretenberg/cpp rm -rf build-debug CXXFLAGS="-gdwarf-4 -D_GLIBCXX_DEBUG" CC=clang-20 CXX=clang++-20 \ cmake -S . -B build-debug -G Ninja \ -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=OFF -DDISABLE_ASM=OFF -DENABLE_STACKTRACES=ON cd build-debug && ninja goblin_tests ./bin/goblin_tests --gtest_filter='*TooManySubtablesFails*' ``` - Pre-fix: aborts on `BatchMergeTests/2.TooManySubtablesFails` with the OOB shown above (exit 134). - Post-fix: 2 passed, 2 skipped (the native-curve cases self-skip). Full `goblin_tests`: 68 passed / 7 skipped / 0 failed. ## Relationship to other PRs - #22937 (merged) fixed the prior abort site (`gemini_masking_poly` virtual size). Without it the build never reached `BatchMergeProver`; with it, this is the next abort site. - #22976 is a draft from an earlier ClaudeBox session carrying the same one-file change against an older base. This PR re-opens the fix on top of current `next` (`b30fe8f401`) and is ready for review so it doesn't sit silently in draft. Once this merges, #22976 can be closed. - #22811 (BB_DISABLE_ASSERTS workaround for `HonkRecursionConstraintTestWithoutPredicate.GenerateVKFromConstraints`) is obsoleted by #22937 and intentionally not included. Detailed analysis: https://gist.github.com/AztecBot/3ea741b9545337f7a3adcfb60b088378 ClaudeBox log: https://claudebox.work/s/6f2f86264b1de77e?run=1 ClaudeBox log: https://claudebox.work/s/6f2f86264b1de77e?run=1 --------- Co-authored-by: federicobarbacovi <171914500+federicobarbacovi@users.noreply.github.com>
federicobarbacovi
added a commit
that referenced
this pull request
May 8, 2026
…g SIGABRT (#23077) Fix BatchMergeTests/*.TooManySubtablesFails by making the code use the TweakableBatchMergeProver. ## Summary Unblocks the [Nightly Debug Build](https://github.com/AztecProtocol/aztec-packages/actions/runs/25539203050) on `next`. With `_GLIBCXX_DEBUG` enabled, `BatchMergeTests/2.TooManySubtablesFails` aborts (exit 134) with: ``` Error: attempt to subscript container with out-of-bounds index 40, but container only holds 40 elements. ``` ## Root cause `TooManySubtablesFails` uses `BB_DISABLE_ASSERTS()` so it can drive the verifier with `N = max_subtables + 1` subtables. With BB asserts demoted to warnings, control flow falls past the `BB_ASSERT_LTE(N, M, ...)` guard at the top of `BatchMergeProver::construct_proof` and reaches `compute_degree_check_polynomial`, which iterates over `flattened_columns.size()` (sized for `N`) but indexes `degree_check_challenges` (sized for `(M+1) * NUM_WIRES`). In Release the over-read returns garbage that the verifier rejects — exactly what the test wants. In Debug, libstdc++'s `_GLIBCXX_DEBUG` (set by the `debug` preset's `CXXFLAGS`) bounds-checks `std::vector::operator[]` and traps. `BB_DISABLE_ASSERTS()` cannot suppress this — the bounds check is libstdc++'s, not bb's. ## Why the previously-merged fix didn't fix anything PR #23019 (already merged into `merge-train/barretenberg` via the open #23025) was *titled* the same as this PR but only modified `TweakableBatchMergeProver::construct_proof`. The actual `TooManySubtablesFails` test path uses `prove_and_verify` with the default `fault_mode = FaultMode::NONE`, which routes through `BatchMergeProver` (the base class), not `TweakableBatchMergeProver`. I confirmed this in this container by building `goblin_tests` against `origin/merge-train/barretenberg` HEAD with the debug preset — the same OOB still aborts. The merge train cannot deliver this fix without a real prover-side change. ## Fix Clamp the loop to `min(flattened_columns.size(), degree_check_challenges.size())` inside `BatchMergeProver::compute_degree_check_polynomial`. Normal paths are unaffected because both sizes equal `(M+1) * NUM_WIRES` whenever `N <= M` (the assert holds). The misuse path now produces a partial degree-check polynomial that the verifier still rejects in both Debug and Release — same observable behaviour, no UB. This is the same one-file change as the open draft #22976; opening this fresh PR against current `next` so it doesn't sit silently in draft. ## Verification Reproduced inside the ClaudeBox container on `f23aa82c52` (current `next` HEAD) with the same flags as the nightly: ```bash cd barretenberg/cpp HOME=/tmp cmake --preset debug -DAVM_TRANSPILER_LIB= cd build-debug && ninja goblin_tests ./bin/goblin_tests --gtest_filter='*TooManySubtablesFails*' ``` - Pre-fix: aborts on `BatchMergeTests/2.TooManySubtablesFails` with the OOB shown above (exit 134). Confirmed identical abort on `origin/merge-train/barretenberg` HEAD. - Post-fix: 2 passed, 2 skipped (the native-curve cases self-skip). Full `goblin_tests` post-fix: **68 passed / 7 skipped / 0 failed** (342s). ## Related PRs - #22937 (merged) — fixed the prior abort site (`gemini_masking_poly` virtual size). - #23019 (merged into merge-train/barretenberg via open #23025) — same title, but the diff only modified `TweakableBatchMergeProver`, which is unused on this test path; it is a no-op for the nightly. - #22976 (open draft) — same one-file prover clamp as this PR, against an older base. Detailed analysis: https://gist.github.com/AztecBot/7be72c96a1d3d18458dce92a828116a2 ClaudeBox log: https://claudebox.work/s/8ad866e315acbe92?run=1 ClaudeBox log: https://claudebox.work/s/8ad866e315acbe92?run=1 --------- Co-authored-by: federicobarbacovi <171914500+federicobarbacovi@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Nightly debug build was failing again because
OinkProver::commit_to_masking_polyallocatesgemini_masking_polywith sizemax_end_index(). When that value is odd, sumcheck's pairwise iteration over(edge_idx, edge_idx + 1)reads one element past the polynomial —compute_effective_round_sizeitself rounds the iteration bound up to even, but the masking polynomial wasn't padded to match. In release builds this is silent UB; in debug, the bounds-checked accessor insidePolynomial::operator[]trips and the build fails very early (~2 minutes wall-clock, consistent with the failing job duration).Fix: round the masking polynomial's allocation up to even at construction so the layout matches what sumcheck assumes. The change only ever enlarges the polynomial by at most one element and never affects non-ZK flavors (gated by
flavor_has_gemini_masking<Flavor>()).Recurrence
This is the fifth independent claudebox session converging on this exact patch over the past four days; the prior four (
claudebox/fix-bb-debug-nightly2026-05-01,claudebox/fix-nightly-bb-debug-build2026-05-03, PR #22918 2026-05-04, plus an earlier branch) were not merged, so the bug recurred each night. The diff here is byte-for-byte identical to the closed PR #22918.Most recent visible failed run: https://github.com/AztecProtocol/aztec-packages/actions/runs/25303760883
Verification
cmake --preset debugconfigures cleanly.ninja ultra_honk_testsbuilds cleanly in thedebugpreset../bin/ultra_honk_tests --gtest_filter='*Gemini*:*ZK*:*Mask*'— 28 passed, 6 skipped (the 6 skips are flavor-gated, not regressions).Detailed analysis: https://gist.github.com/AztecBot/97de8e254a5df2dad90f895be7d28f08
ClaudeBox log: https://claudebox.work/s/bd19299558fa14bc?run=1
ClaudeBox log: https://claudebox.work/s/bd19299558fa14bc?run=1