chore: release v2.0.0 #2941
Draft
shuklaayush wants to merge 635 commits into
~~- we need `cargo-openvm` from `develop-v2.0.0-rc.1` to run the reth benchmark for this branch since it adds new opcodes~~ resolves int-6949
## Summary

- Adds `scripts/pre-push.sh`, a lightweight bash script that runs formatting, clippy, and tests locally on only the crates changed compared to a target branch (default: `develop-v2.0.0-beta`)
- Auto-detects an NVIDIA GPU via `nvidia-smi` and enables `--features cuda,touchemall` when present; CPU-only otherwise
- Uses `cargo nextest` with `--cargo-profile=fast` when available (falls back to `cargo test`), and applies the `heavy` nextest profile for integration test crates

## Details

The script follows three steps:

1. **`cargo +nightly fmt --all -- --check`**: full workspace format check (fast; nightly required per repo config)
2. **`cargo clippy`**: per changed crate, with appropriate feature flags and `-D warnings`
3. **`cargo nextest run`**: per changed crate, with `--cargo-profile=fast` and feature detection

Crate detection walks changed file paths upward to find the nearest `Cargo.toml` with a `[package]` section. Benchmark and guest program crates are skipped (they require nightly `rust-src`).

## Usage

```bash
./scripts/pre-push.sh                      # compare against develop-v2.0.0-beta
./scripts/pre-push.sh main                 # compare against main
./scripts/pre-push.sh origin/my-feature    # any valid git ref
```

## Test plan

- [ ] Run on a branch with Rust changes and verify only affected crates are checked
- [ ] Run on a branch with no Rust changes (e.g., docs-only) and verify early exit
- [ ] Run on a machine with GPU and verify `cuda,touchemall` features appear in output
- [ ] Run on a machine without GPU and verify only `parallel` feature is used
- [ ] Verify formatting failures are caught and reported

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Wang <jonathanpwang@users.noreply.github.com>
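The script itself does crate detection in bash; the Rust sketch below only illustrates the same walk-up logic (nearest ancestor directory whose `Cargo.toml` has a `[package]` section, deduplicated across changed files). The names `owning_crate`, `changed_crates`, and `has_package_manifest` are hypothetical, the filesystem check is injected as a closure so the logic can be exercised without a checkout, and the benchmark/guest skip list is intentionally omitted.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Walk upward from a changed file to the nearest directory that is a package root.
/// `is_package_dir` abstracts the "Cargo.toml with [package]" check (hypothetical helper).
fn owning_crate<F: Fn(&Path) -> bool>(changed_file: &Path, is_package_dir: &F) -> Option<PathBuf> {
    let mut dir = changed_file.parent();
    while let Some(d) = dir {
        if is_package_dir(d) {
            return Some(d.to_path_buf());
        }
        dir = d.parent();
    }
    None // e.g. docs-only changes outside any crate
}

/// Deduplicated, sorted set of crates touched by the changed files.
fn changed_crates<F: Fn(&Path) -> bool>(changed_files: &[&str], is_package_dir: &F) -> BTreeSet<PathBuf> {
    changed_files
        .iter()
        .filter_map(|f| owning_crate(Path::new(f), is_package_dir))
        .collect()
}

/// Real filesystem check: does `dir/Cargo.toml` exist and declare a `[package]` section?
/// (A workspace-only manifest has no `[package]`, so it is skipped.)
fn has_package_manifest(dir: &Path) -> bool {
    std::fs::read_to_string(dir.join("Cargo.toml"))
        .map(|s| s.lines().any(|l| l.trim() == "[package]"))
        .unwrap_or(false)
}
```

In a real checkout you would pass `&has_package_manifest` as the check; an empty result set corresponds to the script's early exit.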
Resolves INT-6981 and INT-6982. The PR to remove `DagCommitSubAir` is [here](https://github.com/openvm-org/openvm/pull/2533/changes).

This branch does two primary things:

1. Re-introduces `DagCommitSubAir` into the recursion circuit (reversing the recursion-side functional removal from `c6e042f3b8f833dde0222cb48d5c81f04e2274f8`).
2. Moves the root prover path to **no cached trace** mode (`has_cached = false`), using `CachedTraceCtx::Records` instead of cached PCS data.

Branch commits:

- `a97b89e37`: re-introduce `DagCommitSubAir` into recursion + API plumbing updates
- `27e1c48d9`: root prover switched to no-cached mode
- `6a25c5b35`: follow-up compile/test/type fixes

Suggested review order:

1. **Recursion DagCommit reintroduction** (`crates/recursion/**`)
2. **Root prover no-cached flow** (`crates/continuations/src/prover/root/**`)
3. **Cross-crate API updates** (`continuations` non-root traces, `guest-libs/verify-stark/circuit`, `sdk`, tests)

---

Files:

- `crates/recursion/src/batch_constraint/expr_eval/dag_commit.rs` (new)
- `crates/recursion/src/batch_constraint/expr_eval/mod.rs`

What changed:

- Restores `DagCommitSubAir`, `DagCommitCols`, `DagCommitPvs`, and helper logic for digest packing/collapse.
- Adds `DagCommitInfo` generation (`commit` + Poseidon2 input stream) for no-cached mode.

Files:

- `crates/recursion/src/batch_constraint/expr_eval/symbolic_expression/air.rs`

What changed:

- `SymbolicExpressionAir` now carries `dag_commit_subair: Option<Arc<DagCommitSubAir<_>>>`.
- In cached mode:
  - uses cached main width (`CachedSymbolicExpressionColumns`) and zero public values.
- In no-cached mode:
  - prefixes common main with `DagCommitCols`,
  - exposes `DagCommitPvs` public values,
  - evaluates the DagCommit sub-AIR directly from the common main prefix.

Files:

- `crates/recursion/src/batch_constraint/expr_eval/symbolic_expression/trace.rs`

What changed:

- `SymbolicExpressionTraceGenerator` now takes `has_cached`.
- `CachedTraceRecord` again carries optional `dag_commit_info`.
- `build_cached_trace_record(child_vk, has_cached)` computes DagCommit inputs/commit when `has_cached == false`.
- CPU tracegen writes DagCommit columns + symbolic-expression columns in no-cached mode.

Files:

- `crates/recursion/src/batch_constraint/mod.rs`
- `crates/recursion/src/system/mod.rs`

What changed:

- `BatchConstraintModule` now stores `has_cached` and configures `SymbolicExpressionAir` accordingly.
- `VerifierConfig` includes `has_cached`.
- Reintroduces explicit `CachedTraceCtx<PB>` API:
  - `PcsData(CommittedTraceData<PB>)`
  - `Records(CachedTraceRecord)`
- `VerifierTraceGen::generate_proving_ctxs(_base)` now accepts `CachedTraceCtx` again.
- CPU/GPU verifier tracegen paths now set either:
  - `cached_mains` from `PcsData`, or
  - symbolic-expression public values from `Records` (DagCommit commit).

Note:

- `DagCommitBus` is **not** reintroduced. DagCommit remains folded in as a sub-AIR, so bus-level wiring is still intentionally absent.

Files:

- `crates/recursion/cuda/src/batch_constraint/expr_eval/dag_commit.cuh` (new)
- `crates/recursion/cuda/src/batch_constraint/expr_eval/symbolic_expression.cu`
- `crates/recursion/src/batch_constraint/cuda_abi.rs`
- `crates/recursion/src/batch_constraint/cuda_utils.rs`

What changed:

- Restores DagCommit column writing in CUDA symbolic-expression tracegen when no-cached mode is used.
- Adds per-row cached metadata (`CachedGpuRecord`) carrying the Poseidon2 start state + `is_constraint`.
- `_sym_expr_common_tracegen`/Rust ABI now takes a `d_cached_records` pointer.

---

Files:

- `crates/continuations/src/prover/root/mod.rs`
- `crates/continuations/src/prover/root/trace.rs`
- `crates/continuations/src/prover/mod.rs`

What changed:

- Root verifier subcircuit is built with `VerifierConfig { has_cached: false, ... }`.
- Root prover now computes/stores `CachedTraceRecord` and feeds recursion via:
  - `CachedTraceCtx::Records(self.cached_trace_record.clone())`
- Removed root prover dependency on cached PCS data for recursion tracegen (`child_vk_pcs_data` removed).
- Removed root cached-commit getter (no longer needed by the root path).
- Simplified `RootProver` type shape by removing struct-level `PB`; the backend type is now method-generic.
- Updated root prover aliases accordingly.

---

… `CachedTraceCtx`

Files:

- `crates/continuations/src/prover/inner/trace.rs`
- `crates/continuations/src/prover/deferral/inner/trace.rs`
- `crates/continuations/src/prover/deferral/hook/trace.rs`

What changed:

- Calls to recursion `generate_proving_ctxs` now pass `CachedTraceCtx::PcsData(...)`.

Files:

- `guest-libs/verify-stark/circuit/src/prover/mod.rs`
- `guest-libs/verify-stark/circuit/src/prover/trace.rs`

What changed:

- Sets `VerifierConfig.has_cached = true` explicitly where required.
- Uses `CachedTraceCtx::PcsData(...)` for recursion tracegen calls.

… changes

Files:

- `crates/sdk/src/prover/root.rs`
- `crates/continuations/src/tests/mod.rs`
- `crates/continuations/src/tests/e2e.rs`

What changed:

- Adds explicit `PB` typing where inference became ambiguous after moving root prover backend typing to method generics.

---

Files:

- `crates/recursion/src/tests.rs`

What changed:

- Adds coverage for the no-cached DagCommit sub-AIR path:
  - `test_recursion_circuit_dag_commit_subair`
  - uses `MixtureFixture::standard(...)`
  - builds the verifier with `has_cached: false`
  - runs with `CachedTraceCtx::Records(...)`.
- Existing recursion tests updated to pass `CachedTraceCtx::PcsData(...)` where appropriate.

---

If you only review a subset, prioritize:

1. `recursion` symbolic-expression + DagCommit integration (`air.rs`, `trace.rs`, `dag_commit.rs`, system wiring)
2. root prover no-cached migration (`continuations/prover/root/*`)
3. CUDA DagCommit tracegen plumbing (`symbolic_expression.cu`, `cuda_abi.rs`, `cuda_utils.rs`)

---------

Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
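To make the two `CachedTraceCtx` paths concrete, here is a heavily simplified Rust sketch of the dispatch. Every type is a hypothetical stand-in: the real `CachedTraceCtx<PB>` is generic over the proving backend, `CommittedTraceData` holds actual PCS data, and the DagCommit commit is modeled as a placeholder `[u32; 8]`. It only shows the shape of the logic: `PcsData` feeds `cached_mains`, while `Records` feeds symbolic-expression public values.

```rust
/// Hypothetical stand-in for cached PCS data (cached mode).
#[derive(Clone, Debug, PartialEq)]
struct CommittedTraceData {
    commit: [u32; 8],
}

/// Hypothetical stand-in for `CachedTraceRecord`; `dag_commit` plays the role of
/// the optional DagCommit info populated when `has_cached == false`.
#[derive(Clone, Debug, PartialEq)]
struct CachedTraceRecord {
    dag_commit: Option<[u32; 8]>,
}

enum CachedTraceCtx {
    PcsData(CommittedTraceData),
    Records(CachedTraceRecord),
}

/// Simplified output of symbolic-expression tracegen.
#[derive(Debug, PartialEq)]
struct SymExprCtx {
    cached_mains: Vec<CommittedTraceData>,
    public_values: Vec<u32>,
}

/// Cached mode: attach the cached main, no public values.
/// No-cached mode: no cached main, expose the DagCommit commit as public values.
fn symbolic_expression_ctx(ctx: CachedTraceCtx) -> SymExprCtx {
    match ctx {
        CachedTraceCtx::PcsData(data) => SymExprCtx { cached_mains: vec![data], public_values: vec![] },
        CachedTraceCtx::Records(rec) => {
            let commit = rec.dag_commit.expect("no-cached mode requires DagCommit info");
            SymExprCtx { cached_mains: vec![], public_values: commit.to_vec() }
        }
    }
}

/// `has_cached == true` pairs with `PcsData`; `has_cached == false` pairs with `Records`.
fn matches_config(has_cached: bool, ctx: &CachedTraceCtx) -> bool {
    matches!(
        (has_cached, ctx),
        (true, CachedTraceCtx::PcsData(_)) | (false, CachedTraceCtx::Records(_))
    )
}
```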
Closes INT-6717
Closes INT-6989
Removes all `openvm-native-circuit`, `openvm-native-recursion`, and `openvm-native-compiler` dependencies (these crates no longer exist in the workspace) from the three benchmark crates.

- **Deleted** `src/generate-fixtures.rs`: depended on `openvm-native-circuit`, `openvm-native-recursion`, `openvm-sdk`
- **Cargo.toml**: Removed all optional native-circuit deps (`openvm-circuit`, `openvm-continuations`, `openvm-native-circuit`, `openvm-native-recursion`, `openvm-sdk`, `openvm-stark-sdk`, `bitcode`, `serde`). Removed the `generate-fixtures` feature and its `[[bin]]` entry. Set `default = []`.
- `src/lib.rs` unchanged: ELF building utilities remain intact.

- **Deleted** `src/execute-verifier.rs`: the entire binary used `NativeCpuBuilder` and `NATIVE_MAX_TRACE_HEIGHTS` from native-circuit.
- **Deleted** `src/lib.rs`: it was a placeholder; the crate is bench-only.
- **Cargo.toml**:
  - Removed `openvm-continuations` and `openvm-sdk` deps (only used by verifier code).
  - Removed the `[[bin]]` for `execute-verifier` and the `evm-prove` feature.
  - Expanded the `tco` and `aot` features to propagate to all extension circuit crates directly (previously went through `openvm-sdk`).
  - Added `openvm-circuit` with `test-utils` to dev-dependencies (for `SystemParams::new_for_testing`).
- **`benches/execute.rs`**:
  - Removed all 6 verifier benchmarks (`benchmark_leaf_verifier_*`, `benchmark_internal_verifier_*`) and their setup functions (`setup_leaf_verifier`, `setup_internal_verifier`).
  - Removed `#[cfg(feature = "aot")]` branches and the `create_aot_instance`/`create_metered_aot_instance` functions; kept only the interpreter-based execution paths.
  - Updated engine/params API: `BabyBearPoseidon2Engine` → `BabyBearPoseidon2CpuEngine`, `FriParameters::standard_fast()` → `SystemParams::new_for_testing(21)`.
  - Removed unused imports: `fs`, `Arc`, `ContinuationVmProof`, `Proof`, `NativeCpuBuilder`, etc.
  - App execution benchmarks (`benchmark_execute`, `benchmark_execute_metered`, `benchmark_execute_metered_cost`) remain unchanged.

**Major restructuring** to match the v2-proof-system reference pattern.

- **`src/lib.rs` rewritten** (replaces `src/util.rs`):
  - `BenchmarkCli` with `--max-segment-length` and `--app-only` flags.
  - `BenchmarkCli::run()` dispatches to `run_default_benchmark` (full aggregation) or `run_default_app_benchmark` (app-only).
  - `BenchmarkCli::apply_config()` helper for shared segmentation logic.
  - `run_benchmark()`: full aggregation with `Sdk::prove()`, proof size metrics (total + zstd-compressed), and verification via `verify_vm_stark_proof_decoded`.
  - `run_app_benchmark()`: app-only proof with `verify_app_proof`.
- **Deleted** `src/util.rs`: consolidated into `lib.rs`.
- **Bins rewritten** to use pre-compiled ELFs via `include_bytes!` instead of runtime `build_bench_program()`:
  - `fibonacci.rs`: loads the ELF from `guest/fibonacci/elf/`, n=800k.
  - `pairing.rs`: BN254 pairing check.
  - `ecrecover.rs`: ECDSA secp256k1 recovery (5 signatures).
  - `regex.rs`: email regex matching with the `regex_email.txt` fixture.
  - `base64_json.rs`, `bincode.rs`, `rkyv.rs`, `revm_transfer.rs`: serialization/EVM benchmarks.
- **New bins**:
  - `keccak.rs`: keccak256 iteration benchmark (4096 iterations), builder-constructed config.
  - `keccak_par.rs`: parallel keccak proving with a `--concurrency` flag, sharing `app_pk`/`app_vk` across threads.
  - `kitchen_sink.rs`: restored (previously deleted due to native-circuit deps), now loads a pre-compiled ELF.
- **Deleted bins**:
  - `fib_e2e.rs`: replaced by `fibonacci.rs` (use `--app-only` for app-only mode).
  - `verify_fibair.rs`: depended on `openvm-native-circuit` and `openvm-native-recursion`.
  - `async_regex.rs`: `AsyncAppProver` no longer exists in the SDK.
- **Cargo.toml**:
  - Added `openvm-sdk-config`, `openvm-verify-stark-host`, `zstd`, `p3-field`.
  - Removed `tokio`, `derive_more`, `rand`.
  - Added `openvm-stark-backend/parallel` and `openvm-stark-backend/jemalloc` to the `parallel`/`jemalloc` features.
  - Removed the `async` feature.
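The `BenchmarkCli` dispatch between full aggregation and app-only proving can be sketched with std-only Rust. This is a hypothetical stand-in: the real struct's parsing mechanism and fields beyond the two named flags are not shown in this PR, and `Mode` is an invented enum used in place of calling `run_default_benchmark` / `run_default_app_benchmark`.

```rust
/// Hypothetical, std-only stand-in for `BenchmarkCli` covering only the two flags named above.
#[derive(Debug, Default, PartialEq)]
struct BenchmarkCli {
    max_segment_length: Option<usize>,
    app_only: bool,
}

/// Which benchmark path `run()` would take (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Mode {
    FullAggregation, // stands in for `run_default_benchmark`
    AppOnly,         // stands in for `run_default_app_benchmark`
}

impl BenchmarkCli {
    /// Minimal flag parser; unknown flags and missing values are errors.
    fn parse(args: &[&str]) -> Result<Self, String> {
        let mut cli = BenchmarkCli::default();
        let mut it = args.iter();
        while let Some(&arg) = it.next() {
            match arg {
                "--app-only" => cli.app_only = true,
                "--max-segment-length" => {
                    let v = it.next().ok_or("--max-segment-length needs a value")?;
                    let n = v.parse::<usize>().map_err(|e| format!("bad length: {e}"))?;
                    cli.max_segment_length = Some(n);
                }
                other => return Err(format!("unknown flag {other}")),
            }
        }
        Ok(cli)
    }

    /// Mirrors the dispatch described above: app-only proof vs. full aggregation.
    fn mode(&self) -> Mode {
        if self.app_only { Mode::AppOnly } else { Mode::FullAggregation }
    }
}
```

Keeping the segmentation setting separate from the mode matches the description of `apply_config()` as a helper shared by both paths.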
Merge openvm-org/stark-backend#304 first so we can update root params here.

Introduces the `openvm-static-verifier` crate: a Halo2-based circuit that verifies STARK proofs inside a BN254 SNARK, enabling on-chain EVM verification. This is the core component that bridges the STARK proof system to Ethereum.

- **BabyBear field arithmetic in Halo2**: base and extension field chips (`BabyBearChip`) for constrained BabyBear arithmetic over BN254
- **Transcript verification**: `TranscriptChip` that replays the Fiat-Shamir transcript inside the Halo2 circuit
- **Poseidon2 hashing**: in-circuit Poseidon2 permutation for BabyBear state, matching the STARK hash configuration
- **Multi-stage verification pipeline**:
  - `batch_constraints`: batches and verifies all AIR constraints
  - `stacked_reduction`: stacked trace commitment reduction
  - `whir`: WHIR (FRI-like) polynomial proximity verification
  - `full_pipeline`: end-to-end STARK proof verification orchestrating all stages
- **Public values handling**: constrained public value extraction and validation
- **Keygen and proving API**: `StaticVerifierProvingKey`, `Halo2Prover`, auto-tuned circuit degree (`k`)
- **EVM wrapper circuit**: SHPLONK wrapper for generating on-chain verifiable proofs (`evm-prove` / `evm-verify` features)
- **Cell-count profiling**: optional flamegraph profiling of Halo2 cell usage (`cell-profiling` feature)
- **Integration test**: `real_prover_roundtrip`, a full Halo2 KZG keygen + prove + verify on a `MixtureFixture` proof

- Wire up `openvm-static-verifier` behind `evm-prove` / `evm-verify` feature flags (previously stubbed out)
- Add `evm-verify-fmt` feature (optional Solidity formatting, requires Rust 1.91+)
- New `Halo2Prover` and `StaticVerifierProvingKey` types in the SDK prover pipeline
- Dummy witness generation for keygen (`keygen/dummy.rs`)
- Halo2 params helpers (`halo2_params.rs`)

- Add `static-verifier.yml` CI workflow for the new crate
- Remove unused SSH key step (`webfactory/ssh-agent`) from all workflows
- Remove outdated `OPENVM_FAST_TEST` env var from all workflows, `AGENTS.md`, and `pre-push.sh`

- [x] `cargo nextest run --cargo-profile fast` in `crates/static-verifier` (unit + integration tests)
- [ ] `cargo nextest run --release --features parallel,evm-verify` in `crates/sdk`
- [ ] CI workflows pass on this branch

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: stephenh-axiom-xyz <stephenh@intrinsictech.xyz>
Co-authored-by: Ayush Shukla <shuklaayush247@gmail.com>
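For intuition on "BabyBear arithmetic over BN254": a BabyBear element (p = 15 · 2^27 + 1 = 2013265921) fits in a BN254 scalar, and a product of two canonical elements is below p^2 < 2^62, so it never wraps in the much larger native field. One common non-native technique is to supply a quotient/remainder witness and constrain `a * b = q * p + r` with range checks on `q` and `r`. The sketch below computes and checks that witness natively; it is an illustration of the general technique under that assumption, not a description of how `BabyBearChip` is actually implemented.

```rust
/// BabyBear modulus p = 15 * 2^27 + 1.
const BABYBEAR_P: u64 = 2_013_265_921;

/// Witness for a reduced product: a * b = q * p + r with 0 <= r < p.
/// In a circuit, (q, r) would be advice values; the identity would be constrained
/// over the native field, plus range checks on q and r.
fn mul_witness(a: u64, b: u64) -> (u64, u64) {
    assert!(a < BABYBEAR_P && b < BABYBEAR_P, "inputs must be canonical");
    let prod = a * b; // < p^2 < 2^62, so no u64 overflow
    (prod / BABYBEAR_P, prod % BABYBEAR_P)
}

/// What the constraints enforce. u128 mirrors the fact that the native field is
/// large enough that the identity holds over the integers (no wraparound).
fn check_mul(a: u64, b: u64, q: u64, r: u64) -> bool {
    r < BABYBEAR_P
        && (a as u128) * (b as u128) == (q as u128) * (BABYBEAR_P as u128) + (r as u128)
}
```

The range check on `r` is what makes the remainder unique; without it a prover could pick `r + p` with `q - 1`.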
- Fix CUDA stream safety issue: `generate_gpu_proving_ctx` was called inside `par_iter` in 4 modules (`batch_constraint`, `gkr`, `stacking`, `whir`), causing GPU buffer allocations and H2D transfers to run on rayon worker threads instead of the main/default CUDA stream
- Split all affected sites into two phases: (1) CPU trace generation in parallel via `par_iter`, (2) H2D transfer serially on the main thread via `transport_matrix_h2d_row`
- Delete the now-unused `generate_gpu_proving_ctx` helper from `tracegen.rs`
- Inline the H2D transfer in the `_ =>` fallback arms of the `ModuleChip<GpuBackend>` impls for stacking and whir
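The two-phase split can be illustrated without a GPU. In this sketch `std::thread::scope` stands in for rayon's `par_iter`, and `fake_h2d` is a hypothetical stand-in for `transport_matrix_h2d_row` that simply records which thread performed the "transfer". The property being demonstrated is that all transfers happen on the calling thread (and therefore, in the real code, on one CUDA stream) while CPU trace generation still runs in parallel.

```rust
use std::thread::{self, ThreadId};

/// Phase 1 stand-in: pure-CPU trace generation, safe on any worker thread.
fn generate_cpu_trace(module: usize) -> Vec<u32> {
    (0..4).map(|i| (module * 10 + i) as u32).collect()
}

/// Phase 2 stand-in for a host-to-device copy (hypothetical).
/// Returns the id of the thread that performed it.
fn fake_h2d(trace: Vec<u32>) -> (ThreadId, Vec<u32>) {
    (thread::current().id(), trace)
}

fn two_phase(modules: &[usize]) -> Vec<(ThreadId, Vec<u32>)> {
    // Phase 1: CPU tracegen in parallel; spawn all workers before joining any.
    let traces: Vec<Vec<u32>> = thread::scope(|s| {
        let handles: Vec<_> = modules
            .iter()
            .map(|&m| s.spawn(move || generate_cpu_trace(m)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Phase 2: serial transfers on the calling thread, preserving module order.
    traces.into_iter().map(fake_h2d).collect()
}
```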
## Summary

Fixes a CUDA race condition in recursion preflight where `run_preflight` was called from parallel threads via `thread::scope`, each launching `merkle_precomputation_hash_vectors` GPU kernels on different CUDA streams. Extracted `apply_merkle_precomputation` from `run_preflight` so that:

- **CPU**: merkle precomputation runs inline within the parallel `thread::scope` (pure CPU, safe)
- **CUDA**: merkle precomputation runs serially after `thread::scope` on the main thread
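A minimal sketch of the backend-dependent placement, assuming simplified signatures: `run_preflight` and `apply_merkle_precomputation` borrow the PR's names, but their bodies and the `Preflight` struct here are hypothetical stand-ins. The point is that both placements produce the same result, and only the GPU path defers the merkle step until after the scope has joined.

```rust
use std::thread;

#[derive(Debug, Clone, PartialEq)]
struct Preflight {
    proof_id: usize,
    merkle_done: bool,
}

/// Stand-in for the rest of preflight (pure CPU).
fn run_preflight(proof_id: usize) -> Preflight {
    Preflight { proof_id, merkle_done: false }
}

/// Stand-in for `apply_merkle_precomputation`; on CUDA this would launch kernels.
fn apply_merkle_precomputation(p: &mut Preflight) {
    p.merkle_done = true;
}

fn preflight_all(proof_ids: &[usize], gpu: bool) -> Vec<Preflight> {
    let mut out: Vec<Preflight> = thread::scope(|s| {
        let handles: Vec<_> = proof_ids
            .iter()
            .map(|&id| {
                s.spawn(move || {
                    let mut p = run_preflight(id);
                    if !gpu {
                        // CPU backend: safe to do inline on the worker thread.
                        apply_merkle_precomputation(&mut p);
                    }
                    p
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    if gpu {
        // CUDA backend: serially, on the calling thread, after all workers joined.
        for p in &mut out {
            apply_merkle_precomputation(p);
        }
    }
    out
}
```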
## Summary
- Reject `HINT_BUFFER` instructions with `num_words == 0` at execution
time, before record allocation or trace generation
- Previously this was only guarded by `debug_assert_ne!`, which is
stripped in release builds
- Adds new `ExecutionError::HintBufferZeroWords` variant for a clear
error message
## Motivation
In release builds, a `HINT_BUFFER` with `num_words == 0` passes through
to CUDA tracegen where it produces an empty offsets array, a zero-height
`DeviceMatrix` (instead of a proper `dummy()`), and can cause undefined
behavior in the GPU kernel (zero-stride `RowSlice`, invalid launch
params).
Since `num_words` is read from a register at runtime, this cannot be
caught statically during transpilation or program construction. The fix
promotes the existing debug assertion to a proper runtime error in both
the preflight and interpreter executors, matching the pattern used by
the existing `HintBufferTooLarge` check.
## Changes
- `crates/vm/src/arch/execution.rs`: Add `HintBufferZeroWords { pc }` to
`ExecutionError`
- `extensions/rv32im/circuit/src/hintstore/mod.rs`: Add `num_words == 0`
check in preflight executor, remove redundant `debug_assert_ne!`
- `extensions/rv32im/circuit/src/hintstore/execution.rs`: Replace
`debug_assert_ne!` with runtime error in interpreter executor
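The shape of the fix can be sketched as a small validation function that runs before any record allocation. This is a simplified stand-in: only the `HintBufferZeroWords { pc }` variant is named in this PR; the fields of `HintBufferTooLarge` and the `MAX_HINT_BUFFER_WORDS` limit below are hypothetical placeholders for the existing check.

```rust
/// Simplified stand-in for the VM's `ExecutionError`.
/// `HintBufferTooLarge`'s fields here are hypothetical.
#[derive(Debug, PartialEq)]
enum ExecutionError {
    HintBufferZeroWords { pc: u32 },
    HintBufferTooLarge { pc: u32, num_words: u32 },
}

/// Hypothetical cap standing in for whatever limit the existing check uses.
const MAX_HINT_BUFFER_WORDS: u32 = 1 << 16;

/// `num_words` comes from a register at runtime, so it is validated here rather
/// than at transpile time. Unlike `debug_assert_ne!`, this survives release builds.
fn validate_hint_buffer(pc: u32, num_words: u32) -> Result<(), ExecutionError> {
    if num_words == 0 {
        return Err(ExecutionError::HintBufferZeroWords { pc });
    }
    if num_words > MAX_HINT_BUFFER_WORDS {
        return Err(ExecutionError::HintBufferTooLarge { pc, num_words });
    }
    Ok(())
}
```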
resolves int-7155
This should please the CI lint check.
## Summary

- Fix GPU cached program trace generation to match CPU behavior for empty or small programs
- Replace `next_power_of_two_or_zero(num_records)` with `num_records.next_power_of_two()` so the minimum trace height is 1 instead of 0

## Motivation

The CPU path (`generate_cached_trace` in `trace.rs`) pads instructions with `while !len.is_power_of_two()`, which rounds 0 up to 1 and emits a TERMINATE padding row. The GPU path used `next_power_of_two_or_zero`, which returns 0 for empty input, producing a zero-height trace with no TERMINATE row. This is a CPU/GPU semantic mismatch.

`next_power_of_two()` returns 1 for input 0, matching the CPU behavior. The CUDA kernel already fills padding rows with TERMINATE + EXIT_CODE_FAIL, so both height and content now agree.

## Changes

- `crates/vm/src/system/cuda/program.rs`: Use `num_records.next_power_of_two()` instead of `next_power_of_two_or_zero(num_records)`, remove unused import

Closes INT-7167
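A quick way to see that the two height rules now agree: the CPU padding loop treats 0 as "not a power of two" and pads it up to 1, which is exactly what `usize::next_power_of_two` returns for 0. In this sketch `cpu_padded_len` models the padding loop (one increment per TERMINATE row) and `next_power_of_two_or_zero` is a local re-implementation of the old GPU helper's semantics for comparison; neither is the crate's actual code.

```rust
/// Models the CPU padding loop: append padding rows until the length is a power of two.
/// `0.is_power_of_two()` is false, so an empty program still gets one padding row.
fn cpu_padded_len(num_records: usize) -> usize {
    let mut len = num_records;
    while !len.is_power_of_two() {
        len += 1; // one iteration == one TERMINATE padding row
    }
    len
}

/// Local re-implementation of the old GPU helper's semantics (hypothetical).
fn next_power_of_two_or_zero(n: usize) -> usize {
    if n == 0 { 0 } else { n.next_power_of_two() }
}
```

The two rules only ever disagree at `n == 0`, which is exactly the empty-program case this PR fixes.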
…wo or zero (#2637)

This resolves INT-7147. I decided not to impose the power-of-two requirement as a global restriction, but it will most likely always hold, at least in the near future.
The extra stream usage increases the maintenance burden and makes debugging harder, without much performance gain at the moment.
## Summary

- Track `.cargo/config.toml` with `git-fetch-with-cli` and a commented-out SSH patch block for `stark-backend`.
- Load `GH_ACTIONS_DEPLOY_PRIVATE_KEY` before workflow checkouts so SSH patch URLs can resolve in CI.
- Stop ignoring `.cargo/config.toml` so the rc3 patching setup is available on the branch.

## Validation

- Parsed all workflow YAML files.
- Checked that every `actions/checkout` has an earlier SSH-agent step in the same job.
- Ran `git diff --check`.
resolves int-8606
resolves int-8602
Note: `cells_used` metrics are omitted because CUDA tracegen does not expose unpadded trace heights. Commit: 15c6048
resolves int-8615
Towards int-4393. We can't test this until the action exists on `main`, so I'll defer testing until after we merge the v2 branch into main.
v2.0.0