
Introduction of REST based + SSZ serialized new-payload-with-witness#773

Open
developeruche wants to merge 5 commits into ethereum:main from developeruche:feat/new-payload-with-witness


@developeruche developeruche commented Mar 20, 2026

Currently, for a zkVM prover to statelessly validate a block, the Execution Witness (the primary input needed for this validation) can only be obtained by calling two separate Engine API methods sequentially: first engine_newPayload, then debug_executionWitness. This introduces two fundamental problems:

1. Impossible for zkAttestors to follow the exact head of the chain

A block must first be fully executed via engine_newPayload before debug_executionWitness can be called. This means in the very best case, attestation happens one block behind the chain's head — making true real-time attestation impossible.

2. Slow witness retrieval, especially for large witnesses

For an average Ethereum block, the Execution Witness is about 17 MB. Using the current two-call method:

| Witness Size (Approx) | debug_executionWitness Retrieval (Geth) |
| --------------------- | --------------------------------------- |
| 17 MB (typical) | ~1.4 s |
| 100 MB | ~3.9 s |
| 300 MB | ~11 s |
| 500 MB | ~23 s |

These numbers are from the Geth client.

Approach

The key question was: how do we cut this down enough for true real-time attestation?

Step 1: Combine the two calls

The first step was introducing engine_newPayloadWithWitness, a JSON-RPC method that combines engine_newPayload and debug_executionWitness into a single call. This gave roughly a 2× improvement — but was still not enough. The timeout for engine_newPayload is 8 seconds; we need the new method to be comfortably within this budget even for worst-case blocks (~500 MB witness).
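
As a rough illustration of the difference in request flow, the sketch below builds the JSON-RPC request bodies for both approaches without doing any network I/O. The method names are the ones used in this description; the parameter lists (a trimmed `payload` object, a block hash for `debug_executionWitness`) are placeholders and are not meant to reflect the exact parameter schema of any client.

```python
import json
from itertools import count

_ids = count(1)

def rpc_request(method, params):
    """Build a JSON-RPC 2.0 request body (no network I/O)."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}

# Placeholder payload: a real ExecutionPayload has many more fields.
payload = {"blockHash": "0x" + "11" * 32, "blockNumber": "0x1"}

# Current flow: two sequential round trips. The second request can only be
# issued once the first has returned, i.e. after the block has been executed.
two_call_flow = [
    rpc_request("engine_newPayload", [payload]),
    rpc_request("debug_executionWitness", [payload["blockHash"]]),
]

# Combined flow (Step 1): one round trip returns status and witness together.
one_call_flow = [rpc_request("engine_newPayloadWithWitness", [payload])]

print(len(two_call_flow), "requests vs", len(one_call_flow))
print(json.dumps(one_call_flow[0])[:60])
```

The combined call removes one round trip, but the witness still travels as hex-encoded JSON, which is why this step alone was not enough.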

Step 2: Fix the transport and serialization bottlenecks

Profiling engine_newPayloadWithWitness over JSON-RPC revealed two dominant bottlenecks:

  1. Transport: Sending hundreds of megabytes of witness data over JSON-RPC was extremely expensive — the 500 MB witness case spent 74% of total time just on transport overhead.
  2. Serialization: The witness is JSON-serialized with hex encoding, roughly doubling the on-the-wire size (a ~300 MB raw witness becomes ~500–600 MB in JSON hex).
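
The size doubling from hex encoding can be checked with a few lines. This is only a sketch on random bytes standing in for witness data; `json_hex_size` is a helper name introduced here for illustration.

```python
import json
import os

def json_hex_size(raw: bytes) -> int:
    """Size in bytes of `raw` encoded as a 0x-prefixed hex JSON string."""
    return len(json.dumps("0x" + raw.hex()).encode("utf-8"))

raw = os.urandom(1_000_000)  # stand-in for a chunk of raw witness data
encoded = json_hex_size(raw)

# Two hex characters per byte, plus "0x" and the two JSON quotes.
print(len(raw), encoded, round(encoded / len(raw), 3))
```

Every byte costs two characters on the wire, so the ratio approaches 2 for large inputs, before any JSON-RPC envelope or array structure is added.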

To address both, this spec introduces POST /new-payload-with-witness:

  • Pure HTTP for transport (no JSON-RPC framing overhead)
  • SSZ serialization for the response (compact binary encoding, no hex doubling)
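
A minimal client-side sketch of such a request follows. It builds the request but does not send it. The host, port, and `Accept` header are assumptions made for illustration, Engine API authentication is left out, and the JSON body follows the prototype note in this PR that the request takes the same JSON parameters as engine_newPayloadV5.

```python
import json
import urllib.request

# Hypothetical endpoint URL: host and port are assumptions.
ENGINE_URL = "http://localhost:8551/new-payload-with-witness"

def build_request(payload_params: list) -> urllib.request.Request:
    """Build (but do not send) the POST request for the REST endpoint."""
    body = json.dumps(payload_params).encode("utf-8")
    return urllib.request.Request(
        ENGINE_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",    # JSON request parameters
            "Accept": "application/octet-stream",  # assumed type for SSZ bytes
        },
    )

# Placeholder payload with a single field; a real payload has many more.
req = build_request([{"blockHash": "0x" + "11" * 32}])
print(req.get_method(), req.full_url)
# Sending would look like: urllib.request.urlopen(req, timeout=8).read()
```

The response body would then be raw SSZ bytes rather than a JSON document, so the client can slice fields out of it without hex decoding.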

Step 3: Optimize the EL pipeline

After eliminating serialization and transport bottlenecks, the dominant cost was the EL's "store" phase (3,791 ms, 98% of block time). Profiling revealed that this was not CPU-bound by the Merkle trie update itself, but rather by I/O contention and blocking synchronization:

  1. Trie worker optimization: The trie update worker used a zero-buffered channel (sync_channel(0)), forcing block N's in-memory trie update to wait until block N−1's disk flush completed. Decoupling these into two threads with a buffered channel eliminated the blocking.
  2. Blocking witness persistence: store_witness was on the critical path, serializing a ~300 MB witness and writing it to RocksDB before store_block could proceed. This was moved to a background thread since it's only needed for debug_executionWitness RPC lookups.
  3. Concurrent Merkle trie updates: Storage trie updates for individual accounts are independent and can be parallelized. A 16-shard worker pool was introduced to process account and storage trie updates concurrently, with cross-worker communication for storage root aggregation.
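
The sharding idea in the last item can be sketched as follows. This is not the Ethrex implementation: the routing rule (top nibble of the hashed address), the use of SHA-256 as a stand-in for keccak256, and the placeholder `update_storage_trie` are all assumptions made to keep the example self-contained. Python threads also do not give real CPU parallelism here; the sketch only shows the partition-then-aggregate structure.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 16  # matches the 16-shard pool described above

def shard_of(address: bytes) -> int:
    """Assumed routing rule: top nibble of the hashed address."""
    # sha256 is a stand-in; Ethereum tries key accounts by keccak256.
    return hashlib.sha256(address).digest()[0] >> 4

def update_storage_trie(address: bytes, slots: dict) -> bytes:
    """Placeholder for a per-account storage trie update; returns a fake root."""
    h = hashlib.sha256(address)
    for key in sorted(slots):
        h.update(key + slots[key])
    return h.digest()

def process_shard(accounts: list) -> dict:
    # Accounts within one shard are handled sequentially by one worker.
    return {addr: update_storage_trie(addr, slots) for addr, slots in accounts}

def parallel_storage_roots(changes: dict) -> dict:
    shards = [[] for _ in range(NUM_SHARDS)]
    for addr, slots in changes.items():
        shards[shard_of(addr)].append((addr, slots))
    roots = {}
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        # Aggregation step: collect every shard's storage roots into one map.
        for partial in pool.map(process_shard, shards):
            roots.update(partial)
    return roots

changes = {bytes([i]) * 20: {b"\x00" * 32: bytes([i]) * 32} for i in range(100)}
roots = parallel_storage_roots(changes)
print(len(roots))
```

Because each account's storage trie is independent, the shards never touch the same data; only the final collection of storage roots needs coordination.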

These optimizations reduced the store phase from 3,791 ms to 726 ms (~5.2× improvement).

Benchmark Results

All benchmarks were performed using the Ethrex client running on a Kurtosis local testnet with Lighthouse as the CL. The test block contained 203 transactions consuming 36 Mgas, producing a ~302 MB SSZ-encoded witness, which would be about 500 MB if JSON-encoded.

engine_newPayloadWithWitness (JSON-RPC + JSON)

Implementation: ethrex engine_newPayloadWithWitness

Latency by witness size

| Witness Size (Approx) | Total Round-Trip (ms) |
| --------------------- | --------------------- |
| 100 MB | 1,117 ms |
| 300 MB | 6,794 ms |
| 500 MB | 8,131 ms |

Component breakdown (500 MB witness)

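| Component | Duration (ms) | % of Total | Notes |
| --------- | ------------- | ---------- | ----- |
| Block Production | 60 ms | 0.7% | |
| Witness Generation | 528 ms | 6.5% | |
| Serialization (RLP + Hex) | 1,499 ms | 18% | |
| Transport / Overhead | 6,044 ms | 74% | The bottleneck |
| TOTAL | 8,131 ms | 100% | |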
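
The percentages in this breakdown can be reproduced from the durations alone. The short script below is just that arithmetic, using the figures from the table.

```python
components_ms = {
    "Block Production": 60,
    "Witness Generation": 528,
    "Serialization (RLP + Hex)": 1499,
    "Transport / Overhead": 6044,
}
total_ms = sum(components_ms.values())

# Share of the total round trip spent in each component.
for name, ms in components_ms.items():
    print(f"{name:28s} {ms:6d} ms  {100 * ms / total_ms:5.1f}%")
print(f"{'TOTAL':28s} {total_ms:6d} ms")
```

Transport and serialization together account for roughly 93% of the round trip, which is what motivates changing both the transport and the encoding.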

POST /new-payload-with-witness (HTTP + SSZ) — Initial

Implementation: ethrex feat/zkengine-http (before EL optimizations)

Component breakdown (500 MB witness → 306 MB SSZ)

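| Component | Duration | % of Block Total |
| --------- | -------- | ---------------- |
| Validate | 0.01 ms | |
| Execute | 95 ms | 2% |
| Merkle | 0.58 ms | |
| Store | 3,791 ms | 98% |
| Block total | 3,887 ms | |
| Execution + Witness | 3,889 ms | ≈ block total |
| SSZ Encoding | 71 ms | |
| EL total | 3,984 ms | |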

CL side:

| Metric | Value |
| ------ | ----- |
| Round-trip latency | 4,653 ms ✓ within 8 s budget |
| Witness size received | 306 MB ✓ (500 MB shrunk to 306 MB) |

POST /new-payload-with-witness (HTTP + SSZ) — Optimized

Implementation: ethrex feat/zkengine-http (with EL pipeline optimizations)

Component breakdown (302 MB SSZ witness, 203 txs, 36 Mgas)

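| Component | Duration | % of Block Total |
| --------- | -------- | ---------------- |
| Validate | 0.00 ms | |
| Execute | 71 ms | 9% |
| Merkle | 0.35 ms | |
| Store | 726 ms | 91% |
| Block total | 798 ms | |
| Execution + Witness | 800 ms | ≈ block total |
| SSZ Encoding | 117 ms | |
| EL total | 932 ms | |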

Store breakdown (726 ms)

| Sub-step | Duration | % of Store |
| -------- | -------- | ---------- |
| Witness generation | 574 ms | 79% |
| Witness persistence | ~149 ms | 20% |
| Store block (RocksDB writes + trie channel + commit) | 3.5 ms | 0.5% |

Summary comparison

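| Metric | JSON-RPC + JSON | HTTP + SSZ (initial) | HTTP + SSZ (optimized) |
| ------ | --------------- | -------------------- | ---------------------- |
| EL total | 8,131 ms | 3,984 ms | 932 ms |
| Wire size | ~500 MB | 306 MB | 302 MB |
| Serialization | 1,499 ms | 71 ms | 117 ms |
| Store phase | | 3,791 ms | 726 ms |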
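
Taking the figures in this comparison at face value, the speedups and the remaining headroom against the 8-second engine_newPayload timeout work out as below. Note that the JSON-RPC figure is the total round trip from the earlier breakdown, so the first ratio compares slightly different measurements.

```python
BUDGET_MS = 8_000  # engine_newPayload timeout quoted in the description

el_total_ms = {"json_rpc": 8131, "http_ssz_initial": 3984, "http_ssz_optimized": 932}
store_ms = {"initial": 3791, "optimized": 726}

speedup_vs_json = el_total_ms["json_rpc"] / el_total_ms["http_ssz_optimized"]
speedup_vs_initial = el_total_ms["http_ssz_initial"] / el_total_ms["http_ssz_optimized"]
store_speedup = store_ms["initial"] / store_ms["optimized"]
headroom_ms = BUDGET_MS - el_total_ms["http_ssz_optimized"]

print(f"optimized vs JSON-RPC:     {speedup_vs_json:.1f}x")
print(f"optimized vs initial SSZ:  {speedup_vs_initial:.1f}x")
print(f"store phase:               {store_speedup:.1f}x")
print(f"headroom under budget:     {headroom_ms} ms")
```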

Note

With the EL pipeline optimizations, the dominant cost has shifted from the Merkle trie update (previously ~3,500 ms) to witness generation (574 ms, 79% of the store phase). Witness generation involves re-traversing the state and storage tries with logging enabled to capture the pre-state and post-update trie nodes required for stateless validation. This is the current irreducible floor.

For more context on "Store"

The "store" phase is the EL's state commitment step. It makes the executed block's state available for subsequent blocks. After EL pipeline optimizations, its sub-steps are:

| Sub-step | Duration | Description |
| -------- | -------- | ----------- |
| Witness generation | ~574 ms | Re-traverses the state and storage tries to collect all accessed trie nodes, contract bytecodes, and block headers into the ExecutionWitness. This is the dominant cost, I/O-bound by MPT node reads. |
| Witness persistence | ~149 ms | Serializes the witness and writes it to the database for debug_executionWitness RPC lookups. Can be deferred to a background task. |
| Trie channel + RocksDB writes | ~3.5 ms | Sends the trie diff to the background worker (0.01 ms), writes block data to RocksDB (1.3 ms), waits for in-memory trie update (0.0 ms), and commits (2.2 ms). |

The witness generation cost scales with the number of state accesses in the block and is unaffected by the choice of transport or serialization format.

Prototype Code

  1. engine_newPayloadWithWitness (JSON-RPC method): The Ethrex prototype introduces a new JSON-RPC method. It behaves exactly like engine_newPayloadV5 but additionally returns the block's ExecutionWitness in the response.

  2. POST /new-payload-with-witness (HTTP + SSZ endpoint): The Ethrex prototype exposes the same logic as a REST endpoint on the Engine API server. The request accepts the same JSON parameters as engine_newPayloadV5. The response is SSZ-encoded bytes containing the PayloadStatus and the ExecutionWitness.
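
To make the response layout concrete, here is a hand-rolled sketch of SSZ container encoding for a four-field response (`status`, `latest_valid_hash`, `validation_error`, `witness`). The field list is taken from the table under review in src/engine/rest.md in this PR; that thread is marked outdated, so the final layout may differ. The status value and witness bytes are placeholders, and a real client would use an SSZ library instead of this code.

```python
import struct

OFFSET_SIZE = 4  # SSZ offsets are 4-byte little-endian integers

def ssz_union(selector: int, value: bytes = b"") -> bytes:
    """SSZ Union: one selector byte followed by the serialized value.
    Selector 0 with an empty value encodes None."""
    return bytes([selector]) + value

def encode_response(status, latest_valid_hash, validation_error, witness):
    # Variable-size fields, in declaration order.
    variable_parts = [
        ssz_union(1, latest_valid_hash) if latest_valid_hash is not None else ssz_union(0),
        ssz_union(1, validation_error) if validation_error is not None else ssz_union(0),
        witness,  # List[uint8, N] serializes to its raw bytes
    ]
    fixed_len = 1 + OFFSET_SIZE * len(variable_parts)  # uint8 + three offsets
    out = bytearray([status])
    offset = fixed_len
    for part in variable_parts:
        out += struct.pack("<I", offset)  # where this field's data starts
        offset += len(part)
    for part in variable_parts:
        out += part
    return bytes(out)

def decode_witness(data: bytes) -> bytes:
    """Read the third offset and slice out the witness, which is the last field."""
    (witness_offset,) = struct.unpack_from("<I", data, 1 + 2 * OFFSET_SIZE)
    return data[witness_offset:]

encoded = encode_response(0, b"\xaa" * 32, None, b"witness-bytes")
print(len(encoded), decode_witness(encoded))
```

Because the witness is the last variable-size field, a consumer can locate it from a single offset and hand the slice on without decoding the rest of the container.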

Comment thread src/engine/rest.md Outdated
| 0 | `status` | `uint8` |
| 1 | `latest_valid_hash` | `Union[None, ByteVector[32]]` |
| 2 | `validation_error` | `Union[None, List[uint8, VALIDATION_ERROR_MAX]]` |
| 3 | `witness` | `List[uint8, MAX_WITNESS_BYTES]` |

@jsign jsign Apr 11, 2026


Is there a reason why witness is SSZ encoded if the container is also meant to be SSZ serialized?

As in why not:

| Index | Field name | SSZ type |
| ----- | ---------- | -------- |
| 0 | `status` | `uint8` |
| 1 | `latest_valid_hash` | `Union[None, ByteVector[32]]` |
| 2 | `validation_error` | `Union[None, List[uint8, VALIDATION_ERROR_MAX]]` |
| 3 | `witness` | `Union[None, ExecutionWitnessV1]` |

or even

| Index | Field name | SSZ type |
| ----- | ---------- | -------- |
| 0 | `payload_status` | `PayloadStatusV1` |
| 1 | `witness` | `Union[None, ExecutionWitnessV1]` |

Or is this embedding of SSZ bytes done intentionally, to avoid re-serializing on the client side? For example, to use it verbatim and inject it as part of the guest program input (I say "inject" since the guest program input isn't only the execution witness, which is a field in a bigger container).

Author


Yeah, the goal was to prevent re-serialization when preparing the zkVM I/O buffer. I kept validation_error and latest_valid_hash so that this response can still be used like an engine_newPayload response if it is consumed by the CL.


@jsign jsign Apr 14, 2026


Makes sense -- I'm wondering whether SSZ libs support "injecting" the serialized SSZ of a field that is part of a container. As in, see here for the StatelessInput struct. This struct is what we send as the unique input to the guest program. The ExecutionWitness is a field there, so if we want to leverage this "already serialized" reality, I feel SSZ libs should probably support this feature?

I think I see three potential options:

  1. We assume SSZ libs should support this kind of feature
  2. Pass the ExecutionInput as an independent input from the rest of StatelessInput -- so the guest program has two inputs instead of one. Here, using the already serialized execution witness feels more natural -- it might feel a bit awkward from an API design perspective (but not much, I think), but if it saves enough time it might be justified.
  3. Don't do this optimization in the Engine API layer, and just make it part of the StatelessInput serialization that is happening anyway. I'm not sure if we have already benchmarked how much time overhead this is? (i.e. don't do this optimization)

cc @kevaundray if he might want to chime in too.

Author


For option 1, SSZ libraries don't currently have this feature. I ran the benchmark for option 3: for our worst case (500 MB), the overhead that would be saved was 1 ms, and for a regular block it is 0.2 ms. This makes the optimization almost irrelevant. I would propose we just move on with option 3.


Yep sounds good to me!
