
chore: release v2.0.0#2941

Draft
shuklaayush wants to merge 635 commits into main from develop-v2.0.0-rc.3

Conversation

@shuklaayush

Collaborator

No description provided.

stephenh-axiom-xyz and others added 30 commits June 27, 2026 00:27
~~- we need `cargo-openvm` from `develop-v2.0.0-rc.1` to run the reth
benchmark for this branch since it adds new opcodes~~

resolves int-6949
## Summary

- Adds `scripts/pre-push.sh`, a lightweight bash script that runs
formatting, clippy, and tests locally on only the crates changed
compared to a target branch (default: `develop-v2.0.0-beta`)
- Auto-detects NVIDIA GPU via `nvidia-smi` and enables `--features
cuda,touchemall` when present; CPU-only otherwise
- Uses `cargo nextest` with `--cargo-profile=fast` when available (falls
back to `cargo test`), and applies the `heavy` nextest profile for
integration test crates
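
The script itself is bash; the feature-selection logic described above could be sketched in Rust roughly as follows. The function names are hypothetical, and using `parallel` as the CPU-only feature is an assumption taken from the test plan rather than from the script source.

```rust
use std::process::Command;

/// Returns true when `nvidia-smi` can be spawned and exits successfully.
/// If the binary is missing, spawning fails and this returns false.
fn has_nvidia_gpu() -> bool {
    Command::new("nvidia-smi")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Maps GPU availability to cargo feature flags.
/// Assumption: the CPU-only path uses `parallel`, as the test plan suggests.
fn feature_flags(gpu_present: bool) -> Vec<&'static str> {
    if gpu_present {
        vec!["--features", "cuda,touchemall"]
    } else {
        vec!["--features", "parallel"]
    }
}
```

A caller would pass `feature_flags(has_nvidia_gpu())` on to the clippy and test invocations.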

## Details

The script follows three steps:
1. **`cargo +nightly fmt --all -- --check`** — full workspace format
check (fast, nightly required per repo config)
2. **`cargo clippy`** — per changed crate, with appropriate feature
flags and `-D warnings`
3. **`cargo nextest run`** — per changed crate, with
`--cargo-profile=fast` and feature detection

Crate detection walks changed file paths upward to find the nearest
`Cargo.toml` with a `[package]` section. Benchmark and guest program
crates are skipped (they require nightly `rust-src`).
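
As a rough illustration of the upward walk, here is a minimal Rust sketch (the real script is bash). The names `has_package_section` and `nearest_crate_dir` are hypothetical, and the line-based `[package]` check is a simplifying assumption; it does not parse TOML.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Returns true if the manifest text contains a `[package]` table header.
/// Assumption: a line-based check is good enough for this sketch.
fn has_package_section(manifest: &str) -> bool {
    manifest.lines().any(|line| line.trim() == "[package]")
}

/// Walks upward from `changed_file` and returns the directory of the nearest
/// `Cargo.toml` that declares a `[package]`, skipping workspace-only manifests.
fn nearest_crate_dir(changed_file: &Path) -> Option<PathBuf> {
    // `ancestors()` yields the path itself first, so skip it to start at the parent.
    for dir in changed_file.ancestors().skip(1) {
        let manifest = dir.join("Cargo.toml");
        if let Ok(text) = fs::read_to_string(&manifest) {
            if has_package_section(&text) {
                return Some(dir.to_path_buf());
            }
        }
    }
    None
}
```

Collecting the results for every changed path into a set would give the list of crates to lint and test.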

## Usage

```bash
./scripts/pre-push.sh                          # compare against develop-v2.0.0-beta
./scripts/pre-push.sh main                     # compare against main
./scripts/pre-push.sh origin/my-feature        # any valid git ref
```

## Test plan

- [ ] Run on a branch with Rust changes and verify only affected crates
are checked
- [ ] Run on a branch with no Rust changes (e.g., docs-only) and verify
early exit
- [ ] Run on a machine with GPU and verify `cuda,touchemall` features
appear in output
- [ ] Run on a machine without GPU and verify only `parallel` feature is
used
- [ ] Verify formatting failures are caught and reported

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Wang <jonathanpwang@users.noreply.github.com>
Resolves INT-6981 and INT-6982. The PR to remove `DagCommitSubAir` is
[here](https://github.com/openvm-org/openvm/pull/2533/changes).

This branch does two primary things:

1. Re-introduces `DagCommitSubAir` into the recursion circuit (reversing
the recursion-side functional removal from
`c6e042f3b8f833dde0222cb48d5c81f04e2274f8`).
2. Moves the root prover path to **no cached trace** mode (`has_cached =
false`), using `CachedTraceCtx::Records` instead of cached PCS data.

Branch commits:
- `a97b89e37` — re-introduce `DagCommitSubAir` into recursion + API
plumbing updates
- `27e1c48d9` — root prover switched to no-cached mode
- `6a25c5b35` — follow-up compile/test/type fixes

Suggested review order:
1. **Recursion DagCommit reintroduction** (`crates/recursion/**`)
2. **Root prover no-cached flow**
(`crates/continuations/src/prover/root/**`)
3. **Cross-crate API updates** (`continuations` non-root traces,
`guest-libs/verify-stark/circuit`, `sdk`, tests)

---

Files:
- `crates/recursion/src/batch_constraint/expr_eval/dag_commit.rs` (new)
- `crates/recursion/src/batch_constraint/expr_eval/mod.rs`

What changed:
- Restores `DagCommitSubAir`, `DagCommitCols`, `DagCommitPvs`, and
helper logic for digest packing/collapse.
- Adds `DagCommitInfo` generation (`commit` + Poseidon2 input stream)
for no-cached mode.

Files:
- `crates/recursion/src/batch_constraint/expr_eval/symbolic_expression/air.rs`

What changed:
- `SymbolicExpressionAir` now carries `dag_commit_subair:
Option<Arc<DagCommitSubAir<_>>>`.
- In cached mode:
  - uses cached main width (`CachedSymbolicExpressionColumns`), zero
    public values.
- In no-cached mode:
  - prefixes common main with `DagCommitCols`,
  - exposes `DagCommitPvs` public values,
  - evaluates DagCommit sub-AIR directly from common main prefix.

Files:
- `crates/recursion/src/batch_constraint/expr_eval/symbolic_expression/trace.rs`

What changed:
- `SymbolicExpressionTraceGenerator` now takes `has_cached`.
- `CachedTraceRecord` again carries optional `dag_commit_info`.
- `build_cached_trace_record(child_vk, has_cached)` computes DagCommit
inputs/commit when `has_cached == false`.
- CPU tracegen writes DagCommit columns + symbolic-expression columns in
no-cached mode.

Files:
- `crates/recursion/src/batch_constraint/mod.rs`
- `crates/recursion/src/system/mod.rs`

What changed:
- `BatchConstraintModule` now stores `has_cached` and configures
`SymbolicExpressionAir` accordingly.
- `VerifierConfig` includes `has_cached`.
- Reintroduces explicit `CachedTraceCtx<PB>` API:
  - `PcsData(CommittedTraceData<PB>)`
  - `Records(CachedTraceRecord)`
- `VerifierTraceGen::generate_proving_ctxs(_base)` now accepts
`CachedTraceCtx` again.
- CPU/GPU verifier tracegen paths now set either:
  - `cached_mains` from `PcsData`, or
  - symbolic-expression public values from `Records` (DagCommit commit).
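
The two-variant context can be pictured with a simplified sketch. This is not the crate's actual definition: the real `CachedTraceCtx<PB>` is generic over the prover backend, and the field layouts, `TracegenInput`, and `select_input` below are hypothetical placeholders.

```rust
/// Hypothetical placeholder for committed PCS data of a cached trace.
#[derive(Debug, Clone, PartialEq)]
struct CommittedTraceData {
    commitment: [u32; 8],
}

/// Hypothetical placeholder for the record carrying the DagCommit commit.
#[derive(Debug, Clone, PartialEq)]
struct CachedTraceRecord {
    dag_commit: [u32; 8],
}

/// Simplified, non-generic version of the context described above.
enum CachedTraceCtx {
    PcsData(CommittedTraceData),
    Records(CachedTraceRecord),
}

/// What verifier tracegen ends up populating in each mode (illustrative only).
#[derive(Debug, PartialEq)]
enum TracegenInput {
    CachedMains(CommittedTraceData),
    PublicValues([u32; 8]),
}

/// Cached mode forwards the PCS data; no-cached mode exposes the commit
/// as symbolic-expression public values.
fn select_input(ctx: CachedTraceCtx) -> TracegenInput {
    match ctx {
        CachedTraceCtx::PcsData(data) => TracegenInput::CachedMains(data),
        CachedTraceCtx::Records(rec) => TracegenInput::PublicValues(rec.dag_commit),
    }
}
```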

Note:
- `DagCommitBus` is **not** reintroduced. DagCommit remains folded as
sub-AIR, so bus-level wiring is still intentionally absent.

Files:
- `crates/recursion/cuda/src/batch_constraint/expr_eval/dag_commit.cuh`
(new)
- `crates/recursion/cuda/src/batch_constraint/expr_eval/symbolic_expression.cu`
- `crates/recursion/src/batch_constraint/cuda_abi.rs`
- `crates/recursion/src/batch_constraint/cuda_utils.rs`

What changed:
- Restores DagCommit column writing in CUDA symbolic-expression tracegen
when no-cached mode is used.
- Adds per-row cached metadata (`CachedGpuRecord`) carrying Poseidon2
start state + `is_constraint`.
- `_sym_expr_common_tracegen`/Rust ABI now takes `d_cached_records`
pointer.

---

Files:
- `crates/continuations/src/prover/root/mod.rs`
- `crates/continuations/src/prover/root/trace.rs`
- `crates/continuations/src/prover/mod.rs`

What changed:
- Root verifier subcircuit is built with `VerifierConfig { has_cached:
false, ... }`.
- Root prover now computes/stores `CachedTraceRecord` and feeds
recursion via:
  - `CachedTraceCtx::Records(self.cached_trace_record.clone())`
- Removed root prover dependency on cached PCS data for recursion
tracegen (`child_vk_pcs_data` removed).
- Removed root cached-commit getter (no longer needed by root path).
- Simplified `RootProver` type shape by removing struct-level `PB`;
backend type is now method-generic.
- Updated root prover aliases accordingly.
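
To illustrate the change in type shape, here is a minimal sketch contrasting a struct-level backend parameter with a method-level one. All types here (`ProverBackend`, `CpuBackend`, `GpuBackend`, `RootProverBefore`, and this toy `RootProver`) are hypothetical stand-ins, not the crate's definitions.

```rust
use std::marker::PhantomData;

/// Hypothetical backend trait; the real one carries far more than a name.
trait ProverBackend {
    const NAME: &'static str;
}

struct CpuBackend;
struct GpuBackend;

impl ProverBackend for CpuBackend {
    const NAME: &'static str = "cpu";
}
impl ProverBackend for GpuBackend {
    const NAME: &'static str = "gpu";
}

/// Before: the backend is fixed when the prover type is named.
struct RootProverBefore<PB: ProverBackend> {
    _backend: PhantomData<PB>,
}

impl<PB: ProverBackend> RootProverBefore<PB> {
    fn new() -> Self {
        Self { _backend: PhantomData }
    }
    fn backend_name(&self) -> &'static str {
        PB::NAME
    }
}

/// After: the struct has no backend parameter; each call chooses one.
struct RootProver;

impl RootProver {
    fn backend_name<PB: ProverBackend>(&self) -> &'static str {
        PB::NAME
    }
}
```

With the method-generic form, a call such as `RootProver.backend_name::<GpuBackend>()` has to name the backend whenever nothing else constrains it, which is the kind of call site where explicit typing becomes necessary.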

---

`CachedTraceCtx`
Files:
- `crates/continuations/src/prover/inner/trace.rs`
- `crates/continuations/src/prover/deferral/inner/trace.rs`
- `crates/continuations/src/prover/deferral/hook/trace.rs`

What changed:
- Calls to recursion `generate_proving_ctxs` now pass
`CachedTraceCtx::PcsData(...)`.

Files:
- `guest-libs/verify-stark/circuit/src/prover/mod.rs`
- `guest-libs/verify-stark/circuit/src/prover/trace.rs`

What changed:
- Sets `VerifierConfig.has_cached = true` explicitly where required.
- Uses `CachedTraceCtx::PcsData(...)` for recursion tracegen calls.

changes
Files:
- `crates/sdk/src/prover/root.rs`
- `crates/continuations/src/tests/mod.rs`
- `crates/continuations/src/tests/e2e.rs`

What changed:
- Adds explicit PB typing where inference became ambiguous after moving
root prover backend typing to method generics.

---

Files:
- `crates/recursion/src/tests.rs`

What changed:
- Adds coverage for no-cached DagCommit-subAIR path:
  - `test_recursion_circuit_dag_commit_subair`
  - uses `MixtureFixture::standard(...)`
  - builds verifier with `has_cached: false`
  - runs with `CachedTraceCtx::Records(...)`.
- Existing recursion tests updated to pass
`CachedTraceCtx::PcsData(...)` where appropriate.

---

If you only review a subset, prioritize:
1. `recursion` symbolic-expression + DagCommit integration (`air.rs`,
`trace.rs`, `dag_commit.rs`, system wiring)
2. root prover no-cached migration (`continuations/prover/root/*`)
3. CUDA DagCommit tracegen plumbing (`symbolic_expression.cu`,
`cuda_abi.rs`, `cuda_utils.rs`)

---------

Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
Removes all `openvm-native-circuit`, `openvm-native-recursion`, and
`openvm-native-compiler` dependencies from the three benchmark crates,
which no longer exist in the workspace.

- **Deleted** `src/generate-fixtures.rs` — depended on
`openvm-native-circuit`, `openvm-native-recursion`, `openvm-sdk`
- **Cargo.toml**: Removed all optional native-circuit deps
(`openvm-circuit`, `openvm-continuations`, `openvm-native-circuit`,
`openvm-native-recursion`, `openvm-sdk`, `openvm-stark-sdk`, `bitcode`,
`serde`). Removed `generate-fixtures` feature and its `[[bin]]` entry.
Set `default = []`.
- `src/lib.rs` unchanged — ELF building utilities remain intact.

- **Deleted** `src/execute-verifier.rs` — the entire binary used
`NativeCpuBuilder` and `NATIVE_MAX_TRACE_HEIGHTS` from native-circuit.
- **Deleted** `src/lib.rs` — was a placeholder; the crate is bench-only.
- **Cargo.toml**:
  - Removed `openvm-continuations` and `openvm-sdk` deps (only used by
    verifier code).
  - Removed `[[bin]]` for `execute-verifier` and the `evm-prove` feature.
  - Expanded `tco` and `aot` features to propagate to all extension
    circuit crates directly (previously went through `openvm-sdk`).
  - Added `openvm-circuit` with `test-utils` to dev-dependencies (for
    `SystemParams::new_for_testing`).
- **`benches/execute.rs`**:
  - Removed all 6 verifier benchmarks (`benchmark_leaf_verifier_*`,
    `benchmark_internal_verifier_*`) and their setup functions
    (`setup_leaf_verifier`, `setup_internal_verifier`).
  - Removed `#[cfg(feature = "aot")]` branches and
    `create_aot_instance`/`create_metered_aot_instance` functions — kept
    only the interpreter-based execution paths.
  - Updated engine/params API: `BabyBearPoseidon2Engine` →
    `BabyBearPoseidon2CpuEngine`, `FriParameters::standard_fast()` →
    `SystemParams::new_for_testing(21)`.
  - Removed unused imports: `fs`, `Arc`, `ContinuationVmProof`, `Proof`,
    `NativeCpuBuilder`, etc.
  - App execution benchmarks (`benchmark_execute`,
    `benchmark_execute_metered`, `benchmark_execute_metered_cost`) remain
    unchanged.

**Major restructuring** to match the v2-proof-system reference pattern.

- **`src/lib.rs` rewritten** (replaces `src/util.rs`):
  - `BenchmarkCli` with `--max-segment-length` and `--app-only` flags.
  - `BenchmarkCli::run()` dispatches to `run_default_benchmark` (full
    aggregation) or `run_default_app_benchmark` (app-only).
  - `BenchmarkCli::apply_config()` helper for shared segmentation logic.
  - `run_benchmark()` — full aggregation with `Sdk::prove()`, proof size
    metrics (total + zstd-compressed), and verification via
    `verify_vm_stark_proof_decoded`.
  - `run_app_benchmark()` — app-only proof with `verify_app_proof`.

- **Deleted** `src/util.rs` — consolidated into `lib.rs`.

- **Bins rewritten** to use pre-compiled ELFs via `include_bytes!`
instead of runtime `build_bench_program()`:
  - `fibonacci.rs` — loads ELF from `guest/fibonacci/elf/`, n=800k.
  - `pairing.rs` — BN254 pairing check.
  - `ecrecover.rs` — ECDSA secp256k1 recovery (5 signatures).
  - `regex.rs` — email regex matching with `regex_email.txt` fixture.
  - `base64_json.rs`, `bincode.rs`, `rkyv.rs`, `revm_transfer.rs` —
    serialization/EVM benchmarks.

- **New bins**:
  - `keccak.rs` — keccak256 iteration benchmark (4096 iterations),
    builder-constructed config.
  - `keccak_par.rs` — parallel keccak proving with `--concurrency` flag,
    shared `app_pk`/`app_vk` across threads.
  - `kitchen_sink.rs` — restored (previously deleted due to native-circuit
    deps), now loads pre-compiled ELF.

- **Deleted bins**:
  - `fib_e2e.rs` — replaced by `fibonacci.rs` (use `--app-only` for
    app-only mode).
  - `verify_fibair.rs` — depended on `openvm-native-circuit` and
    `openvm-native-recursion`.
  - `async_regex.rs` — `AsyncAppProver` no longer exists in the SDK.

- **Cargo.toml**:
  - Added `openvm-sdk-config`, `openvm-verify-stark-host`, `zstd`,
    `p3-field`.
  - Removed `tokio`, `derive_more`, `rand`.
  - Added `openvm-stark-backend/parallel` and
    `openvm-stark-backend/jemalloc` to `parallel`/`jemalloc` features.
  - Removed `async` feature.
Merge openvm-org/stark-backend#304 first so we
can update root params here.

Introduces the `openvm-static-verifier` crate — a Halo2-based circuit
that verifies STARK proofs inside a BN254 SNARK, enabling on-chain EVM
verification. This is the core component that bridges the STARK proof
system to Ethereum.

- **BabyBear field arithmetic in Halo2**: base and extension field chips
(`BabyBearChip`) for constrained BabyBear arithmetic over BN254
- **Transcript verification**: `TranscriptChip` that replays Fiat-Shamir
transcript inside the Halo2 circuit
- **Poseidon2 hashing**: in-circuit Poseidon2 permutation for BabyBear
state, matching the STARK hash configuration
- **Multi-stage verification pipeline**:
  - `batch_constraints` — batches and verifies all AIR constraints
  - `stacked_reduction` — stacked trace commitment reduction
  - `whir` — WHIR (FRI-like) polynomial proximity verification
- `full_pipeline` — end-to-end STARK proof verification orchestrating
all stages
- **Public values handling**: constrained public value extraction and
validation
- **Keygen and proving API**: `StaticVerifierProvingKey`, `Halo2Prover`,
auto-tuned circuit degree (`k`)
- **EVM wrapper circuit**: SHPLONK wrapper for generating on-chain
verifiable proofs (`evm-prove` / `evm-verify` features)
- **Cell-count profiling**: optional flamegraph profiling of Halo2 cell
usage (`cell-profiling` feature)
- **Integration test**: `real_prover_roundtrip` — full Halo2 KZG keygen
+ prove + verify on a `MixtureFixture` proof

- Wire up `openvm-static-verifier` behind `evm-prove` / `evm-verify`
feature flags (previously stubbed out)
- Add `evm-verify-fmt` feature (optional Solidity formatting, requires
Rust 1.91+)
- New `Halo2Prover` and `StaticVerifierProvingKey` types in SDK prover
pipeline
- Dummy witness generation for keygen (`keygen/dummy.rs`)
- Halo2 params helpers (`halo2_params.rs`)

- Add `static-verifier.yml` CI workflow for the new crate
- Remove unused SSH key step (`webfactory/ssh-agent`) from all workflows
- Remove outdated `OPENVM_FAST_TEST` env var from all workflows,
`AGENTS.md`, and `pre-push.sh`

- [x] `cargo nextest run --cargo-profile fast` in
`crates/static-verifier` (unit + integration tests)
- [ ] `cargo nextest run --release --features parallel,evm-verify` in
`crates/sdk`
- [ ] CI workflows pass on this branch

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: stephenh-axiom-xyz <stephenh@intrinsictech.xyz>
Co-authored-by: Ayush Shukla <shuklaayush247@gmail.com>
- Fix CUDA stream safety issue: `generate_gpu_proving_ctx` was called
inside `par_iter` in 4 modules (batch_constraint, gkr, stacking, whir),
causing GPU buffer allocations and H2D transfers to run on rayon worker
threads instead of the main/default CUDA stream
- Split all affected sites into two phases: (1) CPU trace generation in
parallel via `par_iter`, (2) H2D transfer serially on the main thread
via `transport_matrix_h2d_row`
- Delete the now-unused `generate_gpu_proving_ctx` helper from
`tracegen.rs`
- Inline H2D transfer in the `_ =>` fallback arms of
`ModuleChip<GpuBackend>` impls for stacking and whir
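
The two-phase split can be sketched without any GPU dependency. In this illustration `std::thread::scope` stands in for rayon's `par_iter`, and `DeviceMatrix`, `generate_cpu_trace`, `upload`, and `generate_proving_ctxs` are hypothetical stand-ins for the real tracegen and `transport_matrix_h2d_row`; the `uploaded_on` field exists only to make the threading visible.

```rust
use std::thread;

/// Hypothetical stand-in for a device buffer; records the uploading thread.
struct DeviceMatrix {
    data: Vec<u32>,
    uploaded_on: thread::ThreadId,
}

/// Phase 1 work: pure CPU trace generation, safe on any worker thread.
fn generate_cpu_trace(seed: u32, height: usize) -> Vec<u32> {
    (0..height as u32)
        .map(|row| seed.wrapping_mul(31).wrapping_add(row))
        .collect()
}

/// Phase 2 work: placeholder for the host-to-device copy.
fn upload(trace: Vec<u32>) -> DeviceMatrix {
    DeviceMatrix {
        data: trace,
        uploaded_on: thread::current().id(),
    }
}

fn generate_proving_ctxs(seeds: &[u32], height: usize) -> Vec<DeviceMatrix> {
    // Phase 1: generate all CPU traces in parallel on scoped worker threads.
    let traces: Vec<Vec<u32>> = thread::scope(|s| {
        let handles: Vec<_> = seeds
            .iter()
            .map(|&seed| s.spawn(move || generate_cpu_trace(seed, height)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("tracegen thread panicked"))
            .collect()
    });
    // Phase 2: upload serially on the calling thread, so no device work
    // is issued from a worker thread.
    traces.into_iter().map(upload).collect()
}
```

The point of the structure is that only the calling thread ever touches the (simulated) device, while the CPU-heavy part still runs in parallel.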
## Summary

Fixes CUDA race condition in recursion preflight where `run_preflight`
was called from parallel threads via `thread::scope`, each launching
`merkle_precomputation_hash_vectors` GPU kernels on different CUDA
streams.

Extracted `apply_merkle_precomputation` from `run_preflight` so that:
- **CPU**: merkle precomputation runs inline within parallel
`thread::scope` (pure CPU, safe)
- **CUDA**: merkle precomputation runs serially after `thread::scope` on
the main thread
## Summary

- Reject `HINT_BUFFER` instructions with `num_words == 0` at execution
time, before record allocation or trace generation
- Previously this was only guarded by `debug_assert_ne!`, which is
stripped in release builds
- Adds new `ExecutionError::HintBufferZeroWords` variant for a clear
error message

## Motivation

In release builds, a `HINT_BUFFER` with `num_words == 0` passes through
to CUDA tracegen where it produces an empty offsets array, a zero-height
`DeviceMatrix` (instead of a proper `dummy()`), and can cause undefined
behavior in the GPU kernel (zero-stride `RowSlice`, invalid launch
params).

Since `num_words` is read from a register at runtime, this cannot be
caught statically during transpilation or program construction. The fix
promotes the existing debug assertion to a proper runtime error in both
the preflight and interpreter executors, matching the pattern used by
the existing `HintBufferTooLarge` check.
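
A minimal sketch of promoting the assertion to a runtime check might look like this. Only the variant names and the `pc` field of `HintBufferZeroWords` come from the description; the fields of `HintBufferTooLarge`, the limit constant, and the helper name are assumptions for illustration.

```rust
#[derive(Debug, PartialEq)]
enum ExecutionError {
    HintBufferZeroWords { pc: u32 },
    // Field layout is hypothetical; only the variant name is from the source.
    HintBufferTooLarge { pc: u32, num_words: u32, max_words: u32 },
}

/// Hypothetical upper bound, used only to show the two checks side by side.
const MAX_HINT_BUFFER_WORDS: u32 = 1 << 10;

/// Validates a register-supplied word count before any record allocation.
/// Unlike `debug_assert_ne!`, these checks also run in release builds.
fn check_hint_buffer_len(pc: u32, num_words: u32) -> Result<u32, ExecutionError> {
    if num_words == 0 {
        return Err(ExecutionError::HintBufferZeroWords { pc });
    }
    if num_words > MAX_HINT_BUFFER_WORDS {
        return Err(ExecutionError::HintBufferTooLarge {
            pc,
            num_words,
            max_words: MAX_HINT_BUFFER_WORDS,
        });
    }
    Ok(num_words)
}
```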

## Changes

- `crates/vm/src/arch/execution.rs`: Add `HintBufferZeroWords { pc }` to
`ExecutionError`
- `extensions/rv32im/circuit/src/hintstore/mod.rs`: Add `num_words == 0`
check in preflight executor, remove redundant `debug_assert_ne!`
- `extensions/rv32im/circuit/src/hintstore/execution.rs`: Replace
`debug_assert_ne!` with runtime error in interpreter executor

resolves int-7155
This should please the ci lint check
## Summary

- Fix GPU cached program trace generation to match CPU behavior for
empty or small programs
- Replace `next_power_of_two_or_zero(num_records)` with
`num_records.next_power_of_two()` so the minimum trace height is 1
instead of 0

## Motivation

The CPU path (`generate_cached_trace` in `trace.rs`) pads instructions
with `while !len.is_power_of_two()`, which rounds 0 up to 1 and emits a
TERMINATE padding row. The GPU path used `next_power_of_two_or_zero`,
which returns 0 for empty input — producing a zero-height trace with no
TERMINATE row. This is a CPU/GPU semantic mismatch.

`next_power_of_two()` returns 1 for input 0, matching the CPU behavior.
The CUDA kernel already fills padding rows with TERMINATE +
EXIT_CODE_FAIL, so both height and content now agree.
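
The mismatch at zero is easy to see in isolation. In this sketch `next_power_of_two_or_zero` is reimplemented from its described behavior and `cpu_padded_len` is a hypothetical helper mirroring the CPU padding loop; neither is copied from the repository.

```rust
/// Reimplementation of the old GPU helper as described: 0 stays 0.
fn next_power_of_two_or_zero(n: usize) -> usize {
    if n == 0 {
        0
    } else {
        n.next_power_of_two()
    }
}

/// Mirrors the CPU padding loop: grow until the length is a power of two.
/// Zero is not a power of two, so an empty program is padded to length 1.
fn cpu_padded_len(mut len: usize) -> usize {
    while !len.is_power_of_two() {
        len += 1;
    }
    len
}
```

For every length, `cpu_padded_len(n)` agrees with `n.next_power_of_two()`, while the old helper differs only at `n == 0`.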

## Changes

- `crates/vm/src/system/cuda/program.rs`: Use
`num_records.next_power_of_two()` instead of
`next_power_of_two_or_zero(num_records)`, remove unused import

Closes INT-7167
…wo or zero (#2637)

This resolves INT-7147.

I decided not to impose the power-of-two requirement as a global
restriction, but it will most likely always hold, at least in the near
future.

The extra stream usage increases the maintenance burden and makes
debugging harder, without much performance gain at the moment.
)

This resolves INT-7146

Not sure about the target branch
@shuklaayush added the run-benchmark and run-benchmark-e2e labels Jun 26, 2026
zlangley and others added 3 commits June 26, 2026 19:38
## Summary
- Track `.cargo/config.toml` with `git-fetch-with-cli` and a
commented-out SSH patch block for `stark-backend`.
- Load `GH_ACTIONS_DEPLOY_PRIVATE_KEY` before workflow checkouts so SSH
patch URLs can resolve in CI.
- Stop ignoring `.cargo/config.toml` so the rc3 patching setup is
available on the branch.

## Validation
- Parsed all workflow YAML files.
- Checked every `actions/checkout` has an earlier SSH-agent step in the
same job.
- Ran `git diff --check`.
@shuklaayush force-pushed the develop-v2.0.0-rc.3 branch from 8c44867 to dc4f0bf June 30, 2026 19:11
@github-actions

github-actions Bot commented Jul 2, 2026

| group | app.proof_time_ms | app.cycles | leaf.proof_time_ms |
|---|---:|---:|---:|
| fibonacci | 3,090 | 12,000,265 | (-3522 [-83.9%]) 677 |
| keccak | 16,946 | 18,655,329 | 3,149 |
| sha2_bench | 9,198 | 14,793,960 | 1,125 |
| regex | 1,170 | 4,137,067 | (-11769 [-97.1%]) 352 |
| ecrecover | 607 | 123,583 | (-5977 [-95.3%]) 294 |
| pairing | 936 | 1,745,757 | (-6345 [-95.4%]) 308 |
| kitchen_sink | 4,109 | 2,579,903 | 884 |
| fibonacci_e2e | 1,523 | 12,000,265 | 289 |
| regex_e2e | 770 | 4,137,067 | 166 |
| ecrecover_e2e | 507 | 123,583 | 142 |
| pairing_e2e | 654 | 1,745,757 | 147 |
| kitchen_sink_e2e | 2,334 | 2,579,903 | 387 |

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: 15c6048

Benchmark Workflow

@shuklaayush changed the title from "test(DON'T MERGE): develop-v2.0.0-rc.3" to "chore: release v2.0.0" Jul 2, 2026

Labels: run-benchmark, run-benchmark-e2e, run-sdk-tests
