perf(v2.1): improve poseidon2 memory metering #2944
Open
shuklaayush wants to merge 16 commits into
118745a to 012a16a
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights. Commit: 020479d
gdmlcjs approved these changes on Jul 1, 2026
debug_assert!(trace_heights.len() >= 2);
let poseidon2_idx = trace_heights.len() - 2;
let initial_default_poseidon_rows = global_first_touches.estimated_default_poseidon_rows();
The default Poseidon2 rows seem to be added for each (1000-instruction) interval in the segment. Could we ignore these duplicates?
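A minimal sketch of the deduplication being asked about, assuming each interval reports a bit mask of heights that had first-touch default internal nodes, and that each such height costs exactly one shared Poseidon2 row. IntervalDefaults, default_rows_per_interval, default_rows_per_segment and charge_default_rows are hypothetical names, not the PR's code; only the trace_heights.len() - 2 indexing is taken from the snippet above.

```rust
/// Hypothetical per-interval summary: bit `h` is set when height `h` had at
/// least one first-touch default internal node in that interval.
#[derive(Clone, Copy, Default)]
pub struct IntervalDefaults {
    pub height_mask: u64,
}

/// Naive accounting: every interval charges one shared row per set height,
/// so a height that appears in several intervals is charged several times.
pub fn default_rows_per_interval(intervals: &[IntervalDefaults]) -> u32 {
    intervals.iter().map(|i| i.height_mask.count_ones()).sum()
}

/// Deduplicated accounting: OR the masks first so that each height is
/// charged once per segment (assumes one shared row per default height).
pub fn default_rows_per_segment(intervals: &[IntervalDefaults]) -> u32 {
    intervals
        .iter()
        .fold(0u64, |acc, i| acc | i.height_mask)
        .count_ones()
}

/// Add the deduplicated count to the Poseidon2 trace height, which is
/// assumed (as in the reviewed snippet) to sit at `trace_heights.len() - 2`.
pub fn charge_default_rows(trace_heights: &mut [u32], intervals: &[IntervalDefaults]) {
    debug_assert!(trace_heights.len() >= 2);
    let poseidon2_idx = trace_heights.len() - 2;
    trace_heights[poseidon2_idx] += default_rows_per_segment(intervals);
}
```

With masks 0b0111 and 0b0011 from two intervals, the per-interval sum charges 5 rows while the per-segment union charges 3.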
/// New segment internal nodes whose initial-side value is a canonical default node.
pub(super) merkle_nodes: u32,
/// Bit `h` is set when height `h` has at least one first-touch internal node.
merkle_height_mask: u64,
Would it be simpler to change this to something like "max_height" instead of "mask", since the set bits of the mask are all contiguous?
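The suggestion works if the set bits always form one run starting at the lowest internal height. That is plausible: a canonical default node has default children, so a touched default node at height h implies touched default nodes at every lower internal height on the same path. The sketch below checks the mask/max-height equivalence under that assumption; MIN_INTERNAL_HEIGHT = 1 (leaves at height 0) and all helper names are hypothetical.

```rust
/// Assumed lowest height of an internal node (leaves sit at height 0).
pub const MIN_INTERNAL_HEIGHT: u32 = 1;

/// Highest set bit of the mask, or None for an empty mask.
pub fn max_height(mask: u64) -> Option<u32> {
    if mask == 0 {
        None
    } else {
        Some(63 - mask.leading_zeros())
    }
}

/// Rebuild the mask from a max height: bits MIN_INTERNAL_HEIGHT..=h set.
pub fn mask_from_max_height(max: Option<u32>) -> u64 {
    match max {
        None => 0,
        Some(h) => {
            debug_assert!(h >= MIN_INTERNAL_HEIGHT && h < 64);
            let up_to_h = if h == 63 { u64::MAX } else { (1u64 << (h + 1)) - 1 };
            up_to_h & !((1u64 << MIN_INTERNAL_HEIGHT) - 1)
        }
    }
}

/// True when the mask is empty or is a single run starting at MIN_INTERNAL_HEIGHT,
/// i.e. when storing only max_height loses no information.
pub fn is_contiguous_from_min(mask: u64) -> bool {
    match max_height(mask) {
        None => true,
        Some(h) if h < MIN_INTERNAL_HEIGHT => false,
        Some(h) => mask == mask_from_max_height(Some(h)),
    }
}

/// Number of default heights (one shared row each, by assumption) given max_height.
pub fn default_height_count(max: Option<u32>) -> u32 {
    match max {
        None => 0,
        Some(h) => h + 1 - MIN_INTERNAL_HEIGHT,
    }
}
```

Under this assumption, the popcount of the mask reduces to max_height - MIN_INTERNAL_HEIGHT + 1, so a single small integer replaces the u64.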
Summary
This PR brings Poseidon2 memory metering closer to the number of rows that trace generation actually emits.
Each segment involves two memory trees: the memory tree at the start of the segment and the memory tree at the end of the segment. Metering already counts the leaves and Merkle nodes touched by the segment. This PR also checks whether each touched leaf or node was already non-default at the latest checkpoint.
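A minimal sketch of the leaf-level check, assuming 64 leaves per page so that one u64 mask per page records which leaves were non-default at the latest checkpoint. CheckpointLeaves and its methods are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical checkpoint tracker: for each page id, bit i is set when
/// leaf i of that page was non-default at the latest checkpoint
/// (assumes 64 leaves per page).
#[derive(Default)]
pub struct CheckpointLeaves {
    nondefault: HashMap<u32, u64>,
}

impl CheckpointLeaves {
    /// Mark leaves of a page as non-default (e.g. when seeding or checkpointing).
    pub fn mark_nondefault(&mut self, page_id: u32, leaf_mask: u64) {
        *self.nondefault.entry(page_id).or_insert(0) |= leaf_mask;
    }

    /// Of the leaves touched on this page during the segment, count those
    /// that were still default at the checkpoint. Pages never marked are
    /// treated as entirely default.
    pub fn first_touch_default_leaves(&self, page_id: u32, touched_mask: u64) -> u32 {
        let nondefault = self.nondefault.get(&page_id).copied().unwrap_or(0);
        (touched_mask & !nondefault).count_ones()
    }
}
```

Keeping the check as a single AND-NOT plus popcount per page keeps the per-access cost independent of how many leaves the page holds.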
That matters for Poseidon2 because default memory values share hash inputs. If many old leaves or old internal nodes are still default, trace generation can reuse the same Poseidon2 row for those default inputs instead of emitting one row per node.
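To make the sharing concrete, here is a small counting sketch under two assumptions: every non-default node needs its own Poseidon2 row, and all default nodes at the same height hash identical inputs, so one row covers them. TouchedNode and both functions are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical description of one touched node on the initial (checkpoint) side.
pub struct TouchedNode {
    pub height: u32,
    /// True if the node still held the canonical default value at the checkpoint.
    pub is_default: bool,
}

/// Naive estimate: one Poseidon2 row per touched node.
pub fn rows_one_per_node(nodes: &[TouchedNode]) -> usize {
    nodes.len()
}

/// Shared-by-value estimate: each non-default node gets its own row, while
/// all default nodes at a given height share a single row.
pub fn rows_sharing_defaults(nodes: &[TouchedNode]) -> usize {
    let non_default = nodes.iter().filter(|n| !n.is_default).count();
    let default_heights: BTreeSet<u32> = nodes
        .iter()
        .filter(|n| n.is_default)
        .map(|n| n.height)
        .collect();
    non_default + default_heights.len()
}
```

For two default nodes at height 1, one default node at height 2, and two non-default nodes, the naive count is 5 rows and the shared count is 4.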
Model
The code uses two memory trackers:
Each memory page stores one u64 leaf mask:
For each leaf or Merkle node counted in the segment, the trackers answer two questions:
The result is:
A first touch looks like this:
Default rows are shared by value:
So the estimate is:
first_touch_default_nodes means leaves and Merkle nodes that are touched in this segment and were still default at the checkpoint.
default_internal_heights is a bit mask of Merkle heights that had at least one such default internal node.
Initial Memory
The global tracker is seeded from the executable's sparse initial memory image:
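A minimal seeding sketch, assuming the sparse image is a map from byte address to byte, leaves are 8 bytes wide, pages hold 64 leaves, and the default value is zero. LEAF_BYTES, LEAVES_PER_PAGE and seed_from_sparse_image are illustrative assumptions; the real image format and leaf geometry may differ, and the DEFERRAL_AS unit conversion is not modeled here.

```rust
use std::collections::{BTreeMap, HashMap};

/// Assumed geometry: 8-byte leaves, 64 leaves per page (one u64 mask per page).
pub const LEAF_BYTES: u64 = 8;
pub const LEAVES_PER_PAGE: u64 = 64;

/// Build per-page non-default leaf masks from a sparse byte image
/// (address -> byte). Assumes the default memory value is zero, so only
/// nonzero bytes make their leaf non-default.
pub fn seed_from_sparse_image(image: &BTreeMap<u64, u8>) -> HashMap<u64, u64> {
    let mut pages: HashMap<u64, u64> = HashMap::new();
    for (&addr, &byte) in image {
        if byte == 0 {
            continue; // zero bytes leave the leaf at its default value
        }
        let leaf = addr / LEAF_BYTES;
        let page_id = leaf / LEAVES_PER_PAGE;
        let bit = leaf % LEAVES_PER_PAGE;
        *pages.entry(page_id).or_insert(0) |= 1u64 << bit;
    }
    pages
}
```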
For DEFERRAL_AS, sparse image bytes are converted back to deferral field-cell units before they are mapped to Merkle leaves.
RVR
RVR uses the same Rust accounting path as the interpreter.
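One way the narrow C-to-Rust boundary could look, assuming a PageAccess { page_id, leaf_mask } record is passed by value as a #[repr(C)] struct and Rust ORs each leaf_mask into a per-segment map. The field widths, SegmentTouches and record_page_access are hypothetical; a real export would also need a stable symbol name (no_mangle) for the generated C to link against.

```rust
use std::collections::HashMap;

/// Hypothetical C-compatible layout for one page access (field widths assumed).
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct PageAccess {
    pub page_id: u32,
    pub leaf_mask: u64,
}

/// Hypothetical Rust-side accumulator of leaves touched in the current segment.
#[derive(Default)]
pub struct SegmentTouches {
    touched: HashMap<u32, u64>,
}

impl SegmentTouches {
    /// OR the accessed leaves into the page's touched mask.
    pub fn record(&mut self, access: PageAccess) {
        *self.touched.entry(access.page_id).or_insert(0) |= access.leaf_mask;
    }

    /// Number of distinct leaves touched so far across all pages.
    pub fn touched_leaves(&self) -> u32 {
        self.touched.values().map(|m| m.count_ones()).sum()
    }
}

/// Hypothetical callback the generated C code would invoke per access.
/// A null `tracker` is ignored.
///
/// # Safety
/// `tracker` must be null or point to a live, exclusively borrowed `SegmentTouches`.
pub unsafe extern "C" fn record_page_access(tracker: *mut SegmentTouches, access: PageAccess) {
    if let Some(t) = unsafe { tracker.as_mut() } {
        t.record(access);
    }
}
```

Passing only a page id and a leaf mask keeps the C side free of any Merkle or Poseidon2 logic; all accounting stays in the shared Rust path.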
The generated C hot path is kept narrow:
PageAccess { page_id, leaf_mask } into Rust
AS_MEMORY
Performance
RVR reth benchmark comparison for block 23992138:
The branch reduces the segment count by one on this benchmark without increasing compile-metered time.
Testing
RUSTC_WRAPPER= cargo test --profile fast -p rvr-openvm aligned_memory_access_uses_single_leaf_trace --lib
RUSTC_WRAPPER= cargo test --profile fast -p openvm-circuit metered --lib
git diff --check origin/develop-v2.1.0-rv64