Skip to content

fix(bb): clamp BatchMergeProver degree-check loop to fix nightly debug SIGABRT#22976

Closed
AztecBot wants to merge 1 commit into
nextfrom
claudebox/fix-bb-debug-batch-merge-oob
Closed

fix(bb): clamp BatchMergeProver degree-check loop to fix nightly debug SIGABRT#22976
AztecBot wants to merge 1 commit into
nextfrom
claudebox/fix-bb-debug-batch-merge-oob

Conversation

@AztecBot
Copy link
Copy Markdown
Collaborator

@AztecBot AztecBot commented May 6, 2026

Summary

Fixes the SIGABRT (exit code 134) that the Nightly Debug Build hits in goblin_tests. Complementary to #22811 (which workarounds the ZK virtual_padding issue in dsl_tests); both are needed to fully unblock the nightly.

BatchMergeTests/{2,3}.TooManySubtablesFails deliberately calls BB_DISABLE_ASSERTS() to test verifier rejection when the prover sends N = max_subtables + 1 subtables. With assertions promoted to warnings, control flow falls past the BB_ASSERT_LTE(N, M, ...) guard at the top of BatchMergeProver::construct_proof and lands in compute_degree_check_polynomial, which iterates over flattened_columns.size() (sized for N = 10) but indexes degree_check_challenges (sized for M+1 = 10 → (M+1)*NUM_WIRES = 40).

In Release the over-read returns garbage that the verifier rejects — exactly what the test wants. In Debug, libstdc++'s _GLIBCXX_DEBUG traps the vector::operator[]:

Error: attempt to subscript container with out-of-bounds index 40,
       but container only holds 40 elements.
timeout: the monitored command dumped core

This is unsolvable by BB_DISABLE_ASSERTS() because the bounds check is libstdc++'s, not bb's. The fix has to actually stop the OOB.

Fix

Clamp the loop to min(flattened_columns.size(), degree_check_challenges.size()). Normal paths are unaffected because flattened_columns.size() == degree_check_challenges.size() == (M+1) * NUM_WIRES whenever N ≤ M (the assert holds). The misuse path now produces a partial polynomial that the verifier rejects in both Debug and Release — same observable behaviour, no UB.

Verification

Reproduced inside the ClaudeBox container with the same flags as the nightly:

cd barretenberg/cpp
cmake --preset debug -DAVM_TRANSPILER_LIB=
cmake --build --preset debug
./build-debug/bin/goblin_tests
  • Pre-fix: aborts on BatchMergeTests/2.TooManySubtablesFails (exit 134, dumps core).
  • Post-fix: 68 passed / 7 skipped / 0 failed.

Background

Last green nightly: 2026-04-21 (sha e26ff3a16). First red: 2026-04-22. Most recent failure: run 25360353769. The nightly's GH Actions log is admin-only, so I reproduced the build locally to find the abort site.

Full triage (including the bisected timeline, the related ZK virtual_padding issue covered by #22811, and notes on debugging the EC2-side build): https://gist.github.com/AztecBot/98978cbb1ed448c665b6de0bbc72cbc1

ClaudeBox log: https://claudebox.work/s/4082a6c0c166e9fb?run=1

ClaudeBox log: https://claudebox.work/s/4082a6c0c166e9fb?run=1

@AztecBot AztecBot added ci-draft Run CI on draft PRs. claudebox Owned by claudebox. it can push to this PR. labels May 6, 2026
federicobarbacovi added a commit that referenced this pull request May 7, 2026
…g SIGABRT (#23019)

## Summary

Unblocks the `Nightly Debug Build` workflow, which has been failing
again on `next` after #22937 landed. With the `gemini_masking_poly` fix
in #22937 the dsl_tests abort site is gone; the next abort site is in
`BatchMergeProver`, hit by
`BatchMergeTests/{2,3}.TooManySubtablesFails`.

Most recent failing run:
https://github.com/AztecProtocol/aztec-packages/actions/runs/25478845904
(b30fe8f, exit 134, 7m46s).

## Root cause

`TooManySubtablesFails` deliberately calls `BB_DISABLE_ASSERTS()` so it
can drive the verifier with `N = max_subtables + 1` subtables. With BB
asserts demoted to warnings, control flow falls past the
`BB_ASSERT_LTE(N, M, ...)` guard at the top of
`BatchMergeProver::construct_proof` and reaches
`compute_degree_check_polynomial`, which iterates over
`flattened_columns.size()` (sized for `N`) but indexes
`degree_check_challenges` (sized for `(M+1) * NUM_WIRES`).

In release the over-read returns garbage that the verifier rejects —
exactly what the test wants. In debug, libstdc++'s `_GLIBCXX_DEBUG` (set
by the `debug` preset's `CXXFLAGS`) makes `std::vector::operator[]`
bounds-checked and the OOB at `idx == 40` traps:

```
Error: attempt to subscript container with out-of-bounds index 40,
       but container only holds 40 elements.
timeout: the monitored command dumped core
```

`BB_DISABLE_ASSERTS()` can't suppress this — the bounds check is
libstdc++'s, not bb's.

## Fix

Clamp the loop to `min(flattened_columns.size(),
degree_check_challenges.size())`. Normal paths are unaffected because
both sizes equal `(M+1) * NUM_WIRES` whenever `N <= M` (the assert
holds). The misuse path now produces a partial degree-check polynomial
that the verifier still rejects in both Debug and Release — same
observable behaviour, no UB.

## Verification

Reproduced inside the ClaudeBox container on `b30fe8f401` with the same
flags as the nightly:

```bash
cd barretenberg/cpp
rm -rf build-debug
CXXFLAGS="-gdwarf-4 -D_GLIBCXX_DEBUG" CC=clang-20 CXX=clang++-20 \
  cmake -S . -B build-debug -G Ninja \
        -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=OFF -DDISABLE_ASM=OFF -DENABLE_STACKTRACES=ON
cd build-debug && ninja goblin_tests
./bin/goblin_tests --gtest_filter='*TooManySubtablesFails*'
```

- Pre-fix: aborts on `BatchMergeTests/2.TooManySubtablesFails` with the
OOB shown above (exit 134).
- Post-fix: 2 passed, 2 skipped (the native-curve cases self-skip).

Full `goblin_tests`: 68 passed / 7 skipped / 0 failed.

## Relationship to other PRs

- #22937 (merged) fixed the prior abort site (`gemini_masking_poly`
virtual size). Without it the build never reached `BatchMergeProver`;
with it, this is the next abort site.
- #22976 is a draft from an earlier ClaudeBox session carrying the same
one-file change against an older base. This PR re-opens the fix on top
of current `next` (`b30fe8f401`) and is ready for review so it doesn't
sit silently in draft. Once this merges, #22976 can be closed.
- #22811 (BB_DISABLE_ASSERTS workaround for
`HonkRecursionConstraintTestWithoutPredicate.GenerateVKFromConstraints`)
is obsoleted by #22937 and intentionally not included.

Detailed analysis:
https://gist.github.com/AztecBot/3ea741b9545337f7a3adcfb60b088378

ClaudeBox log: https://claudebox.work/s/6f2f86264b1de77e?run=1


ClaudeBox log: https://claudebox.work/s/6f2f86264b1de77e?run=1

---------

Co-authored-by: federicobarbacovi <171914500+federicobarbacovi@users.noreply.github.com>
federicobarbacovi added a commit that referenced this pull request May 8, 2026
…g SIGABRT (#23077)

Fix BatchMergeTests/*.TooManySubtablesFails by making the code use the TweakableBatchMergeProver.

## Summary

Unblocks the [Nightly Debug
Build](https://github.com/AztecProtocol/aztec-packages/actions/runs/25539203050)
on `next`. With `_GLIBCXX_DEBUG` enabled,
`BatchMergeTests/2.TooManySubtablesFails` aborts (exit 134) with:

```
Error: attempt to subscript container with out-of-bounds index 40,
       but container only holds 40 elements.
```

## Root cause

`TooManySubtablesFails` uses `BB_DISABLE_ASSERTS()` so it can drive the
verifier with `N = max_subtables + 1` subtables. With BB asserts demoted
to warnings, control flow falls past the `BB_ASSERT_LTE(N, M, ...)`
guard at the top of `BatchMergeProver::construct_proof` and reaches
`compute_degree_check_polynomial`, which iterates over
`flattened_columns.size()` (sized for `N`) but indexes
`degree_check_challenges` (sized for `(M+1) * NUM_WIRES`). In Release
the over-read returns garbage that the verifier rejects — exactly what
the test wants. In Debug, libstdc++'s `_GLIBCXX_DEBUG` (set by the
`debug` preset's `CXXFLAGS`) bounds-checks `std::vector::operator[]` and
traps. `BB_DISABLE_ASSERTS()` cannot suppress this — the bounds check is
libstdc++'s, not bb's.

## Why the previously-merged fix didn't fix anything

PR #23019 (already merged into `merge-train/barretenberg` via the open
#23025) was *titled* the same as this PR but only modified
`TweakableBatchMergeProver::construct_proof`. The actual
`TooManySubtablesFails` test path uses `prove_and_verify` with the
default `fault_mode = FaultMode::NONE`, which routes through
`BatchMergeProver` (the base class), not `TweakableBatchMergeProver`. I
confirmed this in this container by building `goblin_tests` against
`origin/merge-train/barretenberg` HEAD with the debug preset — the same
OOB still aborts. The merge train cannot deliver this fix without a real
prover-side change.

## Fix

Clamp the loop to `min(flattened_columns.size(),
degree_check_challenges.size())` inside
`BatchMergeProver::compute_degree_check_polynomial`. Normal paths are
unaffected because both sizes equal `(M+1) * NUM_WIRES` whenever `N <=
M` (the assert holds). The misuse path now produces a partial
degree-check polynomial that the verifier still rejects in both Debug
and Release — same observable behaviour, no UB.

This is the same one-file change as the open draft #22976; opening this
fresh PR against current `next` so it doesn't sit silently in draft.

## Verification

Reproduced inside the ClaudeBox container on `f23aa82c52` (current
`next` HEAD) with the same flags as the nightly:

```bash
cd barretenberg/cpp
HOME=/tmp cmake --preset debug -DAVM_TRANSPILER_LIB=
cd build-debug && ninja goblin_tests
./bin/goblin_tests --gtest_filter='*TooManySubtablesFails*'
```

- Pre-fix: aborts on `BatchMergeTests/2.TooManySubtablesFails` with the
OOB shown above (exit 134). Confirmed identical abort on
`origin/merge-train/barretenberg` HEAD.
- Post-fix: 2 passed, 2 skipped (the native-curve cases self-skip).

Full `goblin_tests` post-fix: **68 passed / 7 skipped / 0 failed**
(342s).

## Related PRs

- #22937 (merged) — fixed the prior abort site (`gemini_masking_poly`
virtual size).
- #23019 (merged into merge-train/barretenberg via open #23025) — same
title, but the diff only modified `TweakableBatchMergeProver`, which is
unused on this test path; it is a no-op for the nightly.
- #22976 (open draft) — same one-file prover clamp as this PR, against
an older base.

Detailed analysis:
https://gist.github.com/AztecBot/7be72c96a1d3d18458dce92a828116a2

ClaudeBox log: https://claudebox.work/s/8ad866e315acbe92?run=1

ClaudeBox log: https://claudebox.work/s/8ad866e315acbe92?run=1

---------

Co-authored-by: federicobarbacovi <171914500+federicobarbacovi@users.noreply.github.com>
@AztecBot
Copy link
Copy Markdown
Collaborator Author

AztecBot commented May 8, 2026

This issue was automatically closed because it was referenced in PR #23019 which has been merged to the default branch.

View workflow run

@AztecBot AztecBot closed this May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-draft Run CI on draft PRs. claudebox Owned by claudebox. it can push to this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant