
perf(v2.1): improve poseidon2 memory metering #2944

Open
shuklaayush wants to merge 16 commits into develop-v2.1.0-rv64 from fix/metered-occupancy-default-old

shuklaayush (Collaborator) commented Jun 29, 2026

Summary

This PR makes Poseidon2 memory metering closer to the rows that trace generation actually emits.

A memory segment has two trees: the memory tree at the start of the segment and the memory tree at the end of the segment. Metering already counts the leaves and Merkle nodes touched by the segment. This PR also checks whether each touched leaf or node was already non-default at the latest checkpoint.

That matters for Poseidon2 because default memory values share hash inputs. If many old leaves or old internal nodes are still default, trace generation can reuse the same Poseidon2 row for those default inputs instead of emitting one row per node.

Model

The code uses two memory trackers:

segment_memory
    clears at segment start
    counts leaves and Merkle nodes touched by this segment

global_memory
    persists across checkpoints
    tracks memory that is already known to be non-default

Each memory page stores one u64 leaf mask:

one page = 64 memory leaves

leaf index:   0  1  2  3  ... 63
leaf_mask:    1  0  1  0  ...  0
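
The page layout above can be sketched in Rust as follows. This is only an illustration of the bit bookkeeping: `LEAVES_PER_PAGE`, `locate`, and `mark_leaf` are hypothetical names, not identifiers from this PR.

```rust
/// Hypothetical constant: the description states one page = 64 memory leaves.
const LEAVES_PER_PAGE: u64 = 64;

/// Split a global leaf index into (page id, bit position inside the page).
fn locate(leaf_index: u64) -> (u64, u32) {
    (
        leaf_index / LEAVES_PER_PAGE,
        (leaf_index % LEAVES_PER_PAGE) as u32,
    )
}

/// Set bit `bit` in the page's `u64` leaf mask.
/// Returns true when the leaf was not marked before this call.
fn mark_leaf(mask: &mut u64, bit: u32) -> bool {
    let flag = 1u64 << bit;
    let newly_set = (*mask & flag) == 0;
    *mask |= flag;
    newly_set
}
```

Marking leaves 0 and 2 of an empty page leaves the mask at `0b101`, matching the diagram; marking leaf 2 a second time returns `false`.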

For each leaf or Merkle node counted in the segment, the trackers answer two questions:

segment_memory: is this new in the current segment?
global_memory:  was this already non-default at the checkpoint?

The result is:

new in segment + already non-default -> count one old Poseidon2 row for this node
new in segment + still default       -> count it in the shared default bucket
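
A minimal sketch of this two-question classification, assuming each leaf or node can be identified by a `u64` id. The `Trackers` struct and `Touch` enum are hypothetical stand-ins backed by hash sets; the PR itself stores per-page leaf masks, and promoting touched nodes into `global_memory` at a checkpoint is left out here.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
enum Touch {
    /// Already counted earlier in this segment; contributes nothing new.
    Repeat,
    /// First touch in the segment, already non-default at the checkpoint.
    OldNonDefault,
    /// First touch in the segment, still default at the checkpoint.
    OldDefault,
}

/// Hypothetical stand-in for the two trackers, keyed by a node id.
struct Trackers {
    segment_memory: HashSet<u64>,
    global_memory: HashSet<u64>,
}

impl Trackers {
    /// Classify one touched leaf or Merkle node.
    fn touch(&mut self, node: u64) -> Touch {
        // `insert` returns false when the node was already in the segment set.
        if !self.segment_memory.insert(node) {
            return Touch::Repeat;
        }
        if self.global_memory.contains(&node) {
            Touch::OldNonDefault
        } else {
            Touch::OldDefault
        }
    }
}
```

`OldNonDefault` corresponds to counting one old Poseidon2 row for the node, and `OldDefault` to the shared default bucket.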

A first touch looks like this:

checkpoint / old side              segment end / new side

          D3                                X3
        /    \                            /    \
      D2      D2          write         X2      D2
     /  \    /  \        ------>       /  \    /  \
   D1   D1 D1   D1                    X1  D1 D1   D1
   |                                  |
default leaf                       touched leaf

D = default node input shared with other default nodes at the same height
X = node changed by this segment and counted normally

Default rows are shared by value:

default leaves                     -> 1 old Poseidon2 row
default internal nodes at height h -> 1 old Poseidon2 row for that height

So the estimate is:

new_rows           = segment_leaves + segment_merkle_nodes
old_nondefault     = new_rows - first_touch_default_nodes
old_default_rows   = has_default_leaf + count(default_internal_heights)

Poseidon2 rows = new_rows + old_nondefault + old_default_rows

first_touch_default_nodes means leaves and Merkle nodes that are touched in this segment and were still default at the checkpoint. default_internal_heights is a bit mask of Merkle heights that had at least one such default internal node.
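
The estimate can be written out as a small function. This is a sketch under the definitions above: the field names follow the description, but the `SegmentCounts` struct and the function name are assumptions, not the PR's code.

```rust
/// Inputs named after the PR description; the struct itself is hypothetical.
struct SegmentCounts {
    segment_leaves: u32,
    segment_merkle_nodes: u32,
    /// Leaves plus internal nodes first touched while still default.
    first_touch_default_nodes: u32,
    /// True if at least one first-touch leaf was still default.
    has_default_leaf: bool,
    /// Bit `h` is set when height `h` has a first-touch default internal node.
    default_internal_heights: u64,
}

fn estimated_poseidon2_rows(c: &SegmentCounts) -> u32 {
    let new_rows = c.segment_leaves + c.segment_merkle_nodes;
    // Default first touches can never exceed the nodes counted in the segment.
    debug_assert!(c.first_touch_default_nodes <= new_rows);
    let old_nondefault = new_rows - c.first_touch_default_nodes;
    // One shared row for default leaves, plus one per default internal height.
    let old_default_rows =
        c.has_default_leaf as u32 + c.default_internal_heights.count_ones();
    new_rows + old_nondefault + old_default_rows
}
```

As a worked case, take the four-leaf tree from the diagram and write to the leftmost and rightmost leaves, both still default: 2 leaves and 5 internal nodes are touched, all 7 are default first touches, and heights 1 to 3 are set in the mask. The estimate is 7 + 0 + (1 + 3) = 11 rows, compared with 14 if every old node were counted separately.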

Initial Memory

The global tracker is seeded from the executable's sparse initial memory image:

nonzero initial byte -> containing memory leaf is non-default
zero initial byte    -> leaf stays default

For DEFERRAL_AS, sparse image bytes are converted back to deferral field-cell units before mapping them to Merkle leaves.
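
The seeding rule for ordinary byte-addressed memory might look like the sketch below. The function name, the `BTreeMap` image representation, and the `leaf_bytes` parameter are all assumptions for illustration; the DEFERRAL_AS unit conversion is not modelled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Collect the indices of leaves that contain at least one nonzero initial byte.
/// `image` maps byte address -> byte value (a sparse image);
/// `leaf_bytes` is an assumed leaf width in bytes.
fn seed_nondefault_leaves(image: &BTreeMap<u64, u8>, leaf_bytes: u64) -> BTreeSet<u64> {
    assert!(leaf_bytes > 0, "leaf width must be positive");
    image
        .iter()
        // Zero bytes leave their leaf in the default state.
        .filter(|(_, byte)| **byte != 0)
        // A nonzero byte marks the leaf that contains its address.
        .map(|(addr, _)| *addr / leaf_bytes)
        .collect()
}
```

With an assumed 8-byte leaf, nonzero bytes at addresses 5, 24, and 25 seed leaves 0 and 3; explicit zero bytes in the image seed nothing.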

RVR

RVR uses the same Rust accounting path as the interpreter.

generated C page buffers
        |
        v
PageAccess { page_id, leaf_mask }
        |
        v
MemoryCtx
        |
        +--> segment_memory: Boundary/Merkle rows for this segment
        +--> global_memory: old Poseidon2 rows for default vs non-default memory

The generated C hot path is kept narrow:

  • aligned normal memory loads/stores record one memory leaf directly
  • page buffers pass PageAccess { page_id, leaf_mask } into Rust
  • extension calls that already trace their memory do not flush and reload local page state again
  • public-values and deferral tracing skip normal-memory tracer work when the address space is not AS_MEMORY
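
To illustrate what a `PageAccess { page_id, leaf_mask }` value carries across the C-to-Rust boundary, here is a hedged sketch that expands one access into the leaf indices it covers. The field types and the `touched_leaves` helper are assumptions; the real `MemoryCtx` may consume the mask directly without materializing a list.

```rust
/// Mirrors the shape named in the PR text; the field types are assumptions.
#[derive(Clone, Copy)]
struct PageAccess {
    page_id: u32,
    leaf_mask: u64,
}

/// Expand one page access into the global leaf indices it touches,
/// assuming 64 leaves per page as stated in the description.
fn touched_leaves(access: PageAccess) -> Vec<u64> {
    let mut mask = access.leaf_mask;
    let mut leaves = Vec::with_capacity(mask.count_ones() as usize);
    while mask != 0 {
        let bit = mask.trailing_zeros() as u64;
        leaves.push(access.page_id as u64 * 64 + bit);
        mask &= mask - 1; // clear the lowest set bit
    }
    leaves
}
```

Passing a single mask per page keeps the generated C side to one OR per recorded leaf, which is consistent with the narrow hot path described above.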

Performance

RVR reth benchmark comparison for block 23992138:

latest origin/develop-v2.1.0-rv64: 75 segments, compile 266.322s, execute 1.999s
this branch:                         74 segments, compile 266.230s, execute 1.932s

The branch reduces the segment count by one on this benchmark without increasing compile-metered time.

Testing

  • RUSTC_WRAPPER= cargo test --profile fast -p rvr-openvm aligned_memory_access_uses_single_leaf_trace --lib
  • RUSTC_WRAPPER= cargo test --profile fast -p openvm-circuit metered --lib
  • git diff --check
  • RVR reth metered benchmark against latest origin/develop-v2.1.0-rv64

@shuklaayush shuklaayush marked this pull request as draft June 29, 2026 15:42
@shuklaayush shuklaayush changed the title from "fix(v2.1): tighter poseidon2 metering" to "perf(v2.1): tighten default-memory Poseidon2 metering" Jun 29, 2026
@shuklaayush shuklaayush changed the title from "perf(v2.1): tighten default-memory Poseidon2 metering" to "perf(v2.1): improve Poseidon2 memory metering" Jun 29, 2026
@shuklaayush shuklaayush changed the title from "perf(v2.1): improve Poseidon2 memory metering" to "perf(v2.1): improve poseidon2 memory metering" Jun 29, 2026

@shuklaayush shuklaayush force-pushed the fix/metered-occupancy-default-old branch from 118745a to 012a16a June 30, 2026 21:36
@shuklaayush shuklaayush requested a review from gdmlcjs June 30, 2026 21:40
@shuklaayush shuklaayush marked this pull request as ready for review June 30, 2026 21:40
@github-actions


Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@github-actions

github-actions Bot commented Jul 1, 2026

group         app.proof_time_ms  app.cycles  leaf.proof_time_ms
fibonacci                 1,025   4,000,051                 388
keccak                   15,768  14,365,133               3,040
sha2_bench                8,178  11,167,961               1,000
regex                     1,156   4,090,656                 354
ecrecover                   431     112,210                 280
pairing                     588     592,827                 300
kitchen_sink              3,919   1,979,971                 870

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: 020479d

Benchmark Workflow

@gdmlcjs gdmlcjs left a comment


No errors found!


debug_assert!(trace_heights.len() >= 2);
let poseidon2_idx = trace_heights.len() - 2;
let initial_default_poseidon_rows = global_first_touches.estimated_default_poseidon_rows();


The default Poseidon2 rows seem to be added once per 1000-instruction interval in the segment. Could we ignore these duplicates?

/// New segment internal nodes whose initial-side value is a canonical default node.
pub(super) merkle_nodes: u32,
/// Bit `h` is set when height `h` has at least one first-touch internal node.
merkle_height_mask: u64,


Would it perhaps be simpler to change this to something like "max_height" instead of "mask" since the elements of the bit mask are all contiguous?
