fix(bb): clamp BatchMergeProver degree-check loop to fix nightly debug SIGABRT#22976
Closed
AztecBot wants to merge 1 commit into
Closed
fix(bb): clamp BatchMergeProver degree-check loop to fix nightly debug SIGABRT#22976AztecBot wants to merge 1 commit into
AztecBot wants to merge 1 commit into
Conversation
federicobarbacovi
added a commit
that referenced
this pull request
May 7, 2026
…g SIGABRT (#23019) ## Summary Unblocks the `Nightly Debug Build` workflow, which has been failing again on `next` after #22937 landed. With the `gemini_masking_poly` fix in #22937 the dsl_tests abort site is gone; the next abort site is in `BatchMergeProver`, hit by `BatchMergeTests/{2,3}.TooManySubtablesFails`. Most recent failing run: https://github.com/AztecProtocol/aztec-packages/actions/runs/25478845904 (b30fe8f, exit 134, 7m46s). ## Root cause `TooManySubtablesFails` deliberately calls `BB_DISABLE_ASSERTS()` so it can drive the verifier with `N = max_subtables + 1` subtables. With BB asserts demoted to warnings, control flow falls past the `BB_ASSERT_LTE(N, M, ...)` guard at the top of `BatchMergeProver::construct_proof` and reaches `compute_degree_check_polynomial`, which iterates over `flattened_columns.size()` (sized for `N`) but indexes `degree_check_challenges` (sized for `(M+1) * NUM_WIRES`). In release the over-read returns garbage that the verifier rejects — exactly what the test wants. In debug, libstdc++'s `_GLIBCXX_DEBUG` (set by the `debug` preset's `CXXFLAGS`) makes `std::vector::operator[]` bounds-checked and the OOB at `idx == 40` traps: ``` Error: attempt to subscript container with out-of-bounds index 40, but container only holds 40 elements. timeout: the monitored command dumped core ``` `BB_DISABLE_ASSERTS()` can't suppress this — the bounds check is libstdc++'s, not bb's. ## Fix Clamp the loop to `min(flattened_columns.size(), degree_check_challenges.size())`. Normal paths are unaffected because both sizes equal `(M+1) * NUM_WIRES` whenever `N <= M` (the assert holds). The misuse path now produces a partial degree-check polynomial that the verifier still rejects in both Debug and Release — same observable behaviour, no UB. ## Verification Reproduced inside the ClaudeBox container on `b30fe8f401` with the same flags as the nightly: ```bash cd barretenberg/cpp rm -rf build-debug CXXFLAGS="-gdwarf-4 -D_GLIBCXX_DEBUG" CC=clang-20 CXX=clang++-20 \ cmake -S . -B build-debug -G Ninja \ -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=OFF -DDISABLE_ASM=OFF -DENABLE_STACKTRACES=ON cd build-debug && ninja goblin_tests ./bin/goblin_tests --gtest_filter='*TooManySubtablesFails*' ``` - Pre-fix: aborts on `BatchMergeTests/2.TooManySubtablesFails` with the OOB shown above (exit 134). - Post-fix: 2 passed, 2 skipped (the native-curve cases self-skip). Full `goblin_tests`: 68 passed / 7 skipped / 0 failed. ## Relationship to other PRs - #22937 (merged) fixed the prior abort site (`gemini_masking_poly` virtual size). Without it the build never reached `BatchMergeProver`; with it, this is the next abort site. - #22976 is a draft from an earlier ClaudeBox session carrying the same one-file change against an older base. This PR re-opens the fix on top of current `next` (`b30fe8f401`) and is ready for review so it doesn't sit silently in draft. Once this merges, #22976 can be closed. - #22811 (BB_DISABLE_ASSERTS workaround for `HonkRecursionConstraintTestWithoutPredicate.GenerateVKFromConstraints`) is obsoleted by #22937 and intentionally not included. Detailed analysis: https://gist.github.com/AztecBot/3ea741b9545337f7a3adcfb60b088378 ClaudeBox log: https://claudebox.work/s/6f2f86264b1de77e?run=1 ClaudeBox log: https://claudebox.work/s/6f2f86264b1de77e?run=1 --------- Co-authored-by: federicobarbacovi <171914500+federicobarbacovi@users.noreply.github.com>
federicobarbacovi
added a commit
that referenced
this pull request
May 8, 2026
…g SIGABRT (#23077) Fix BatchMergeTests/*.TooManySubtablesFails by making the code use the TweakableBatchMergeProver. ## Summary Unblocks the [Nightly Debug Build](https://github.com/AztecProtocol/aztec-packages/actions/runs/25539203050) on `next`. With `_GLIBCXX_DEBUG` enabled, `BatchMergeTests/2.TooManySubtablesFails` aborts (exit 134) with: ``` Error: attempt to subscript container with out-of-bounds index 40, but container only holds 40 elements. ``` ## Root cause `TooManySubtablesFails` uses `BB_DISABLE_ASSERTS()` so it can drive the verifier with `N = max_subtables + 1` subtables. With BB asserts demoted to warnings, control flow falls past the `BB_ASSERT_LTE(N, M, ...)` guard at the top of `BatchMergeProver::construct_proof` and reaches `compute_degree_check_polynomial`, which iterates over `flattened_columns.size()` (sized for `N`) but indexes `degree_check_challenges` (sized for `(M+1) * NUM_WIRES`). In Release the over-read returns garbage that the verifier rejects — exactly what the test wants. In Debug, libstdc++'s `_GLIBCXX_DEBUG` (set by the `debug` preset's `CXXFLAGS`) bounds-checks `std::vector::operator[]` and traps. `BB_DISABLE_ASSERTS()` cannot suppress this — the bounds check is libstdc++'s, not bb's. ## Why the previously-merged fix didn't fix anything PR #23019 (already merged into `merge-train/barretenberg` via the open #23025) was *titled* the same as this PR but only modified `TweakableBatchMergeProver::construct_proof`. The actual `TooManySubtablesFails` test path uses `prove_and_verify` with the default `fault_mode = FaultMode::NONE`, which routes through `BatchMergeProver` (the base class), not `TweakableBatchMergeProver`. I confirmed this in this container by building `goblin_tests` against `origin/merge-train/barretenberg` HEAD with the debug preset — the same OOB still aborts. The merge train cannot deliver this fix without a real prover-side change. ## Fix Clamp the loop to `min(flattened_columns.size(), degree_check_challenges.size())` inside `BatchMergeProver::compute_degree_check_polynomial`. Normal paths are unaffected because both sizes equal `(M+1) * NUM_WIRES` whenever `N <= M` (the assert holds). The misuse path now produces a partial degree-check polynomial that the verifier still rejects in both Debug and Release — same observable behaviour, no UB. This is the same one-file change as the open draft #22976; opening this fresh PR against current `next` so it doesn't sit silently in draft. ## Verification Reproduced inside the ClaudeBox container on `f23aa82c52` (current `next` HEAD) with the same flags as the nightly: ```bash cd barretenberg/cpp HOME=/tmp cmake --preset debug -DAVM_TRANSPILER_LIB= cd build-debug && ninja goblin_tests ./bin/goblin_tests --gtest_filter='*TooManySubtablesFails*' ``` - Pre-fix: aborts on `BatchMergeTests/2.TooManySubtablesFails` with the OOB shown above (exit 134). Confirmed identical abort on `origin/merge-train/barretenberg` HEAD. - Post-fix: 2 passed, 2 skipped (the native-curve cases self-skip). Full `goblin_tests` post-fix: **68 passed / 7 skipped / 0 failed** (342s). ## Related PRs - #22937 (merged) — fixed the prior abort site (`gemini_masking_poly` virtual size). - #23019 (merged into merge-train/barretenberg via open #23025) — same title, but the diff only modified `TweakableBatchMergeProver`, which is unused on this test path; it is a no-op for the nightly. - #22976 (open draft) — same one-file prover clamp as this PR, against an older base. Detailed analysis: https://gist.github.com/AztecBot/7be72c96a1d3d18458dce92a828116a2 ClaudeBox log: https://claudebox.work/s/8ad866e315acbe92?run=1 ClaudeBox log: https://claudebox.work/s/8ad866e315acbe92?run=1 --------- Co-authored-by: federicobarbacovi <171914500+federicobarbacovi@users.noreply.github.com>
Collaborator
Author
|
This issue was automatically closed because it was referenced in PR #23019 which has been merged to the default branch. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the SIGABRT (exit code 134) that the Nightly Debug Build hits in
goblin_tests. Complementary to #22811 (which workarounds the ZKvirtual_paddingissue indsl_tests); both are needed to fully unblock the nightly.BatchMergeTests/{2,3}.TooManySubtablesFailsdeliberately callsBB_DISABLE_ASSERTS()to test verifier rejection when the prover sendsN = max_subtables + 1subtables. With assertions promoted to warnings, control flow falls past theBB_ASSERT_LTE(N, M, ...)guard at the top ofBatchMergeProver::construct_proofand lands incompute_degree_check_polynomial, which iterates overflattened_columns.size()(sized for N = 10) but indexesdegree_check_challenges(sized for M+1 = 10 →(M+1)*NUM_WIRES = 40).In Release the over-read returns garbage that the verifier rejects — exactly what the test wants. In Debug, libstdc++'s
_GLIBCXX_DEBUGtraps thevector::operator[]:This is unsolvable by
BB_DISABLE_ASSERTS()because the bounds check is libstdc++'s, not bb's. The fix has to actually stop the OOB.Fix
Clamp the loop to
min(flattened_columns.size(), degree_check_challenges.size()). Normal paths are unaffected becauseflattened_columns.size() == degree_check_challenges.size() == (M+1) * NUM_WIRESwheneverN ≤ M(the assert holds). The misuse path now produces a partial polynomial that the verifier rejects in both Debug and Release — same observable behaviour, no UB.Verification
Reproduced inside the ClaudeBox container with the same flags as the nightly:
cd barretenberg/cpp cmake --preset debug -DAVM_TRANSPILER_LIB= cmake --build --preset debug ./build-debug/bin/goblin_testsBatchMergeTests/2.TooManySubtablesFails(exit 134, dumps core).Background
Last green nightly: 2026-04-21 (sha
e26ff3a16). First red: 2026-04-22. Most recent failure: run 25360353769. The nightly's GH Actions log is admin-only, so I reproduced the build locally to find the abort site.Full triage (including the bisected timeline, the related ZK
virtual_paddingissue covered by #22811, and notes on debugging the EC2-side build): https://gist.github.com/AztecBot/98978cbb1ed448c665b6de0bbc72cbc1ClaudeBox log: https://claudebox.work/s/4082a6c0c166e9fb?run=1
ClaudeBox log: https://claudebox.work/s/4082a6c0c166e9fb?run=1