Skip to content

fix(bb): allocate gemini masking poly with virtual size == dyadic#22937

Merged
iakovenkos merged 4 commits into
merge-train/barretenbergfrom
claudebox/fix-nightly-bb-debug-build
May 6, 2026
Merged

fix(bb): allocate gemini masking poly with virtual size == dyadic#22937
iakovenkos merged 4 commits into
merge-train/barretenbergfrom
claudebox/fix-nightly-bb-debug-build

Conversation

@AztecBot
Copy link
Copy Markdown
Collaborator

@AztecBot AztecBot commented May 5, 2026

Summary

Nightly debug build was failing again because OinkProver::commit_to_masking_poly allocates gemini_masking_poly with size max_end_index(). When that value is odd, sumcheck's pairwise iteration over (edge_idx, edge_idx + 1) reads one element past the polynomial — compute_effective_round_size itself rounds the iteration bound up to even, but the masking polynomial wasn't padded to match. In release builds this is silent UB; in debug, the bounds-checked accessor inside Polynomial::operator[] trips and the build fails very early (~2 minutes wall-clock, consistent with the failing job duration).

Fix: round the masking polynomial's allocation up to even at construction so the layout matches what sumcheck assumes. The change only ever enlarges the polynomial by at most one element and never affects non-ZK flavors (gated by flavor_has_gemini_masking<Flavor>()).

Recurrence

This is the fifth independent claudebox session converging on this exact patch over the past four days; the prior four (claudebox/fix-bb-debug-nightly 2026-05-01, claudebox/fix-nightly-bb-debug-build 2026-05-03, PR #22918 2026-05-04, plus an earlier branch) were not merged, so the bug recurred each night. The diff here is byte-for-byte identical to the closed PR #22918.

Most recent visible failed run: https://github.com/AztecProtocol/aztec-packages/actions/runs/25303760883

Verification

  • cmake --preset debug configures cleanly.
  • ninja ultra_honk_tests builds cleanly in the debug preset.
  • ./bin/ultra_honk_tests --gtest_filter='*Gemini*:*ZK*:*Mask*'28 passed, 6 skipped (the 6 skips are flavor-gated, not regressions).

Detailed analysis: https://gist.github.com/AztecBot/97de8e254a5df2dad90f895be7d28f08

ClaudeBox log: https://claudebox.work/s/bd19299558fa14bc?run=1

ClaudeBox log: https://claudebox.work/s/bd19299558fa14bc?run=1

@AztecBot AztecBot force-pushed the claudebox/fix-nightly-bb-debug-build branch from 4e8ddef to a8d081c Compare May 5, 2026 06:01
@AztecBot AztecBot added ci-draft Run CI on draft PRs. claudebox Owned by claudebox. it can push to this PR. labels May 5, 2026
@iakovenkos iakovenkos self-assigned this May 6, 2026
@iakovenkos iakovenkos changed the base branch from next to merge-train/barretenberg May 6, 2026 16:14
@iakovenkos iakovenkos changed the title fix(bb): pad gemini_masking_poly size to even to avoid sumcheck OOB fix(bb): allocate gemini masking poly with virtual size == dyadic May 6, 2026
@iakovenkos iakovenkos self-requested a review May 6, 2026 17:13
@iakovenkos iakovenkos marked this pull request as ready for review May 6, 2026 17:13
@iakovenkos iakovenkos enabled auto-merge (squash) May 6, 2026 17:14
@iakovenkos iakovenkos merged commit 1f9d87b into merge-train/barretenberg May 6, 2026
14 checks passed
@iakovenkos iakovenkos deleted the claudebox/fix-nightly-bb-debug-build branch May 6, 2026 20:54
federicobarbacovi added a commit that referenced this pull request May 7, 2026
…g SIGABRT (#23019)

## Summary

Unblocks the `Nightly Debug Build` workflow, which has been failing
again on `next` after #22937 landed. With the `gemini_masking_poly` fix
in #22937 the dsl_tests abort site is gone; the next abort site is in
`BatchMergeProver`, hit by
`BatchMergeTests/{2,3}.TooManySubtablesFails`.

Most recent failing run:
https://github.com/AztecProtocol/aztec-packages/actions/runs/25478845904
(b30fe8f, exit 134, 7m46s).

## Root cause

`TooManySubtablesFails` deliberately calls `BB_DISABLE_ASSERTS()` so it
can drive the verifier with `N = max_subtables + 1` subtables. With BB
asserts demoted to warnings, control flow falls past the
`BB_ASSERT_LTE(N, M, ...)` guard at the top of
`BatchMergeProver::construct_proof` and reaches
`compute_degree_check_polynomial`, which iterates over
`flattened_columns.size()` (sized for `N`) but indexes
`degree_check_challenges` (sized for `(M+1) * NUM_WIRES`).

In release the over-read returns garbage that the verifier rejects —
exactly what the test wants. In debug, libstdc++'s `_GLIBCXX_DEBUG` (set
by the `debug` preset's `CXXFLAGS`) makes `std::vector::operator[]`
bounds-checked and the OOB at `idx == 40` traps:

```
Error: attempt to subscript container with out-of-bounds index 40,
       but container only holds 40 elements.
timeout: the monitored command dumped core
```

`BB_DISABLE_ASSERTS()` can't suppress this — the bounds check is
libstdc++'s, not bb's.

## Fix

Clamp the loop to `min(flattened_columns.size(),
degree_check_challenges.size())`. Normal paths are unaffected because
both sizes equal `(M+1) * NUM_WIRES` whenever `N <= M` (the assert
holds). The misuse path now produces a partial degree-check polynomial
that the verifier still rejects in both Debug and Release — same
observable behaviour, no UB.

## Verification

Reproduced inside the ClaudeBox container on `b30fe8f401` with the same
flags as the nightly:

```bash
cd barretenberg/cpp
rm -rf build-debug
CXXFLAGS="-gdwarf-4 -D_GLIBCXX_DEBUG" CC=clang-20 CXX=clang++-20 \
  cmake -S . -B build-debug -G Ninja \
        -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=OFF -DDISABLE_ASM=OFF -DENABLE_STACKTRACES=ON
cd build-debug && ninja goblin_tests
./bin/goblin_tests --gtest_filter='*TooManySubtablesFails*'
```

- Pre-fix: aborts on `BatchMergeTests/2.TooManySubtablesFails` with the
OOB shown above (exit 134).
- Post-fix: 2 passed, 2 skipped (the native-curve cases self-skip).

Full `goblin_tests`: 68 passed / 7 skipped / 0 failed.

## Relationship to other PRs

- #22937 (merged) fixed the prior abort site (`gemini_masking_poly`
virtual size). Without it the build never reached `BatchMergeProver`;
with it, this is the next abort site.
- #22976 is a draft from an earlier ClaudeBox session carrying the same
one-file change against an older base. This PR re-opens the fix on top
of current `next` (`b30fe8f401`) and is ready for review so it doesn't
sit silently in draft. Once this merges, #22976 can be closed.
- #22811 (BB_DISABLE_ASSERTS workaround for
`HonkRecursionConstraintTestWithoutPredicate.GenerateVKFromConstraints`)
is obsoleted by #22937 and intentionally not included.

Detailed analysis:
https://gist.github.com/AztecBot/3ea741b9545337f7a3adcfb60b088378

ClaudeBox log: https://claudebox.work/s/6f2f86264b1de77e?run=1


ClaudeBox log: https://claudebox.work/s/6f2f86264b1de77e?run=1

---------

Co-authored-by: federicobarbacovi <171914500+federicobarbacovi@users.noreply.github.com>
federicobarbacovi added a commit that referenced this pull request May 8, 2026
…g SIGABRT (#23077)

Fix BatchMergeTests/*.TooManySubtablesFails by making the code use the TweakableBatchMergeProver.

## Summary

Unblocks the [Nightly Debug
Build](https://github.com/AztecProtocol/aztec-packages/actions/runs/25539203050)
on `next`. With `_GLIBCXX_DEBUG` enabled,
`BatchMergeTests/2.TooManySubtablesFails` aborts (exit 134) with:

```
Error: attempt to subscript container with out-of-bounds index 40,
       but container only holds 40 elements.
```

## Root cause

`TooManySubtablesFails` uses `BB_DISABLE_ASSERTS()` so it can drive the
verifier with `N = max_subtables + 1` subtables. With BB asserts demoted
to warnings, control flow falls past the `BB_ASSERT_LTE(N, M, ...)`
guard at the top of `BatchMergeProver::construct_proof` and reaches
`compute_degree_check_polynomial`, which iterates over
`flattened_columns.size()` (sized for `N`) but indexes
`degree_check_challenges` (sized for `(M+1) * NUM_WIRES`). In Release
the over-read returns garbage that the verifier rejects — exactly what
the test wants. In Debug, libstdc++'s `_GLIBCXX_DEBUG` (set by the
`debug` preset's `CXXFLAGS`) bounds-checks `std::vector::operator[]` and
traps. `BB_DISABLE_ASSERTS()` cannot suppress this — the bounds check is
libstdc++'s, not bb's.

## Why the previously-merged fix didn't fix anything

PR #23019 (already merged into `merge-train/barretenberg` via the open
#23025) was *titled* the same as this PR but only modified
`TweakableBatchMergeProver::construct_proof`. The actual
`TooManySubtablesFails` test path uses `prove_and_verify` with the
default `fault_mode = FaultMode::NONE`, which routes through
`BatchMergeProver` (the base class), not `TweakableBatchMergeProver`. I
confirmed this in this container by building `goblin_tests` against
`origin/merge-train/barretenberg` HEAD with the debug preset — the same
OOB still aborts. The merge train cannot deliver this fix without a real
prover-side change.

## Fix

Clamp the loop to `min(flattened_columns.size(),
degree_check_challenges.size())` inside
`BatchMergeProver::compute_degree_check_polynomial`. Normal paths are
unaffected because both sizes equal `(M+1) * NUM_WIRES` whenever `N <=
M` (the assert holds). The misuse path now produces a partial
degree-check polynomial that the verifier still rejects in both Debug
and Release — same observable behaviour, no UB.

This is the same one-file change as the open draft #22976; opening this
fresh PR against current `next` so it doesn't sit silently in draft.

## Verification

Reproduced inside the ClaudeBox container on `f23aa82c52` (current
`next` HEAD) with the same flags as the nightly:

```bash
cd barretenberg/cpp
HOME=/tmp cmake --preset debug -DAVM_TRANSPILER_LIB=
cd build-debug && ninja goblin_tests
./bin/goblin_tests --gtest_filter='*TooManySubtablesFails*'
```

- Pre-fix: aborts on `BatchMergeTests/2.TooManySubtablesFails` with the
OOB shown above (exit 134). Confirmed identical abort on
`origin/merge-train/barretenberg` HEAD.
- Post-fix: 2 passed, 2 skipped (the native-curve cases self-skip).

Full `goblin_tests` post-fix: **68 passed / 7 skipped / 0 failed**
(342s).

## Related PRs

- #22937 (merged) — fixed the prior abort site (`gemini_masking_poly`
virtual size).
- #23019 (merged into merge-train/barretenberg via open #23025) — same
title, but the diff only modified `TweakableBatchMergeProver`, which is
unused on this test path; it is a no-op for the nightly.
- #22976 (open draft) — same one-file prover clamp as this PR, against
an older base.

Detailed analysis:
https://gist.github.com/AztecBot/7be72c96a1d3d18458dce92a828116a2

ClaudeBox log: https://claudebox.work/s/8ad866e315acbe92?run=1

ClaudeBox log: https://claudebox.work/s/8ad866e315acbe92?run=1

---------

Co-authored-by: federicobarbacovi <171914500+federicobarbacovi@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-draft Run CI on draft PRs. claudebox Owned by claudebox. it can push to this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants