@@ -1152,3 +1152,112 @@ After the fix, all bundles 13450–13454 finalized successfully on-chain:
11521152- [ ] Verifier digests match proofs
11531153- [ ] Relayer started with `--config <path>` and `--min-codec-version 10`
11541154- [ ] Target bundles/batches reset to `rollup_status = 1`
1155+
1156+ ---
1157+
1158+ ## 2026-06-09: Mainnet Shadow Fork — Re-prove + Real On-Chain Finalize (Bundles 17297–17301)
1159+
1160+ Full end-to-end run on a **mainnet** Anvil fork using the dockerized coordinator/prover images
1161+ (`zhuoatscroll/{coordinator-api,prover}:v4.7.13-openvm16`): imported bundles 17297–17301
1162+ (batches 517761–517765, codec v10, single-batch each) from mainnet RDS, cleared production proofs,
1163+ re-proved all 20 chunks + 5 batches + 5 bundles locally on 4×RTX 3090, deployed a fresh
1164+ `ZkEvmVerifierPostFeynman`, and finalized all 5 bundles with **real** `finalizeBundlePostEuclidV2`
1165+ transactions. `lastFinalizedBatchIndex` advanced 517760 → 517765 and `finalizedStateRoots[517765]`
1166+ matched the DB batch state root. Three new traps surfaced; they are documented below.
1167+
1168+ ### Trap A: halo2 SRS files must live in `~/.openvm/params/`, NOT `~/.openvm/`
1169+
1170+ **Symptom**: chunk and batch proofs succeed, but the **first bundle proof** crashes the prover at the
1171+ halo2 wrapping stage:
1172+ ```
1173+ thread 'tokio-rt-worker' panicked at .../halo2/utils.rs:127:
1174+ Params file "/root/.openvm/params/kzg_bn254_23.srs" does not exist
1175+ ```
1176+ The container exits; the bundle stays stuck at `proving_status = 2`.
1177+
1178+ **Root cause**: `CacheHalo2ParamsReader` reads the KZG SRS from `$HOME/.openvm/params/kzg_bn254_{k}.srs`
1179+ (openvm `extensions/native/recursion/src/halo2/utils.rs`). Only the bundle proof's `halo2_outer` /
1180+ `halo2_wrapper` stages need it (k = 22/23/24); chunk/batch proofs use smaller in-tree params, so the
1181+ problem stays hidden until the first bundle reaches halo2. If the `.srs` files are placed at the
1182+ `~/.openvm/` root (or any other dir), the reader does not find them and nothing flags it until that panic.
1183+
1184+ **Fix**: ensure the SRS files are under the `params/` subdir of the mounted openvm dir:
1185+ ```bash
1186+ mkdir -p ~/.openvm/params
1187+ mv ~/.openvm/kzg_bn254_2{2,3,4}.srs ~/.openvm/params/ # if they landed in the wrong place
1188+ # files: kzg_bn254_22.srs (~513MB), _23.srs (~1.1GB), _24.srs (~2.1GB)
1189+ ```
1190+ When running the prover in Docker, mount the host openvm dir to `/root/.openvm` (writable) and confirm
1191+ `/root/.openvm/params/kzg_bn254_23.srs` resolves inside the container.
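A pre-flight check along the following lines could catch a misplaced SRS file before the first bundle proof starts. This is a minimal sketch, not part of the prover tooling: the function name `check_srs_params` is hypothetical, and the `k` values 22/23/24 are taken from the root-cause note above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that the halo2 SRS files sit under
# <openvm_dir>/params/ (the location named in the panic message above).
# Prints one line per missing file and returns non-zero if any is absent.
check_srs_params() {
  local openvm_dir="${1:-$HOME/.openvm}"
  local missing=0 k path
  for k in 22 23 24; do
    path="$openvm_dir/params/kzg_bn254_${k}.srs"
    if [ ! -f "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it against the host directory that gets mounted at `/root/.openvm`, for example `check_srs_params ~/.openvm`, before launching the prover containers.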
1192+
1193+ ### Trap B: prover Docker `--gpus device=N` renumbers the GPU to index 0 inside the container
1194+
1195+ **Symptom**: prover container exits immediately (code 139) with:
1196+ ```
1197+ CudaError { code: 100, name: "cudaErrorNoDevice", message: "no CUDA-capable device is detected" }
1198+ ```
1199+ Only the prover on GPU 0 works; provers for GPUs 1/2/3 crash on boot.
1200+
1201+ **Root cause**: `docker run --gpus "device=N"` exposes **only** that one GPU to the container and
1202+ **renumbers it to index 0** inside. Setting `CUDA_VISIBLE_DEVICES=N` (the host index) then points at a
1203+ device that doesn't exist in the container.
1204+
1205+ **Fix**: pair `--gpus "device=$i"` with `CUDA_VISIBLE_DEVICES=0` (the only visible device in-container):
1206+ ```bash
1207+ docker run -d --name shadow-prover-$i --network host \
1208+   --gpus "device=$i" -e CUDA_VISIBLE_DEVICES=0 -e RUST_MIN_STACK=16777216 \
1209+   -v .../prover-$i.json:/prover/conf/config.json:ro \
1210+   -v .../prover-$i:/prover/.work -v ~/.openvm:/root/.openvm \
1211+   zhuoatscroll/prover:v4.7.13-openvm16 --config /prover/conf/config.json
1212+ ```
1213+ (Alternative: `--gpus all` + `CUDA_VISIBLE_DEVICES=$i`.)
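To make the pairing harder to get wrong across all four GPUs, the command can be generated per host index and reviewed before it is run. The sketch below only prints the command; `prover_cmd` and its `conf_dir` / `work_dir` arguments are hypothetical stand-ins for the host paths elided above, and the remaining flags are copied from the fix.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the docker command for the prover bound
# to host GPU index $1. The host index appears only in --gpus; inside the
# container that GPU is device 0, so CUDA_VISIBLE_DEVICES is always 0.
# $2 and $3 are placeholder host directories for the config and work dirs.
prover_cmd() {
  local i="$1" conf_dir="$2" work_dir="$3" cmd
  cmd="docker run -d --name shadow-prover-$i --network host"
  cmd="$cmd --gpus device=$i -e CUDA_VISIBLE_DEVICES=0 -e RUST_MIN_STACK=16777216"
  cmd="$cmd -v $conf_dir/prover-$i.json:/prover/conf/config.json:ro"
  cmd="$cmd -v $work_dir/prover-$i:/prover/.work -v $HOME/.openvm:/root/.openvm"
  cmd="$cmd zhuoatscroll/prover:v4.7.13-openvm16 --config /prover/conf/config.json"
  printf '%s\n' "$cmd"
}
```

For example, `for i in 0 1 2 3; do prover_cmd "$i" "$CONF_DIR" "$WORK_DIR"; done` prints one command per GPU, where `CONF_DIR` and `WORK_DIR` are whatever host directories hold the per-prover configs and work dirs. The output is meant for review and copy-paste; it does not quote paths that contain spaces.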
1214+
1215+ ### Trap C: galileoV2 verifier assets are under S3 `v0.8.0/`, prover circuits under `galileov2/`
1216+
1217+ The coordinator verifier assets (`openVmVk.json`, `verifier.bin`, `root_verifier_vk`) for the galileoV2
1218+ fork are served from `scroll-zkvm/v0.8.0/verifier/` (the `galileov2/verifier/` path returns **403**),
1219+ while the prover downloads its circuits (`{chunk,batch,bundle}/<vk_hash>/app.vmexe`) from
1220+ `scroll-zkvm/galileov2/`. They are nonetheless consistent: the VK hashes in
1221+ `v0.8.0/verifier/openVmVk.json` (`chunk 64cf16…`, `batch e9d653…`, `bundle 6b155f…`) match the circuit
1222+ objects available under `galileov2/`. Download coordinator assets from `v0.8.0/verifier/`; point the
1223+ prover `circuits.galileoV2.base_url` at `…/scroll-zkvm/galileov2/`.
1224+
1225+ ### Other notes from this run
1226+
1227+ - **Coordinator and prover run from the prebuilt Docker images** (the native prover build fails on CUDA on
1228+ this host). Run with `--network host` so the coordinator reaches the shadow DB on `localhost:5433`,
1229+ the prover reaches the coordinator on `localhost:8390`, and the relayer reaches Anvil on
1230+ `localhost:18545`. Coordinator entrypoint is `/bin/coordinator_api`; `LD_LIBRARY_PATH` for `libzkp.so`
1231+ is already baked into the image. Coordinator config: `l2.chain_id = 534352` (Scroll **mainnet** L2),
1232+ `l2.l2geth.endpoint` = internal debug-enabled proxy, one `verifiers[]` entry with
1233+ `fork_name: galileoV2` + low `min_prover_version`.
1234+ - **`l2_block` export by JOIN on `chunk_hash` is pathologically slow** against the prod RDS (full scan of
1235+ a huge table). Export by **block-number range** instead (`WHERE number BETWEEN <min_start> AND <max_end>`,
1236+ PK-indexed, ~tens of seconds). The block range is the min `start_block_number` / max
1237+ `end_block_number` across the target batches' chunks.
1238+ - **`chunk_proofs_status` / `batch_proofs_status` may not auto-promote** in the shadow setup. A small
1239+ watcher loop that sets `batch.chunk_proofs_status = 2` once all of a batch's chunks reach
1240+ `proving_status = 4`, and `bundle.batch_proofs_status = 2` once all of a bundle's batches reach
1241+ `proving_status = 4`, keeps the chunk→batch→bundle pipeline flowing without stalls.
1242+ - **The batch committer fails harmlessly during a finalize-only run**: the relayer's commit sender
1243+ (derived from `commit_sender_signer_config`, e.g. `0xBC732a76…`) is unfunded and not a sequencer, so
1244+ `commitBatch` loops with "Insufficient funds"/`ErrorCallerIsNotSequencer`. This is expected and does
1245+ **not** affect the bundle finalizer, which runs independently and uses the (funded, prover-authorized)
1246+ finalize sender.
1247+ - **Set `bundle_index_seq` above the imported max** (e.g. `SELECT setval('bundle_index_seq', 18000)`)
1248+ before starting the relayer, so any proposer-created bundle gets a higher index and cannot block
1249+ `GetFirstPendingBundle` (orders by `index ASC`). Also clear stale `finalize_tx_hash` on imported
1250+ bundles/batches.
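The block-range lookup and the status watcher from the notes above could be sketched as follows. This is illustrative only: the function names are hypothetical, and the join columns (`chunk.batch_hash`, `batch.bundle_hash`, `batch.hash`, `bundle.hash`, `batch.index`) are assumptions about the schema that the notes do not spell out. The status values 2 and 4 are the ones quoted above.

```bash
#!/usr/bin/env bash
# Sketch under stated assumptions: join columns are guesses at the schema.

# Print SQL that finds the l2_block export range (min start / max end block)
# across the chunks of batches with index in [$1, $2].
block_range_sql() {
  printf '%s\n' \
    "SELECT MIN(c.start_block_number), MAX(c.end_block_number)" \
    "FROM chunk c JOIN batch b ON c.batch_hash = b.hash" \
    "WHERE b.index BETWEEN $1 AND $2;"
}

# Print SQL that promotes a batch once all its chunks are at
# proving_status = 4, and a bundle once all its batches are.
promote_sql() {
  cat <<'SQL'
UPDATE batch b SET chunk_proofs_status = 2
WHERE b.chunk_proofs_status <> 2
  AND EXISTS (SELECT 1 FROM chunk c WHERE c.batch_hash = b.hash)
  AND NOT EXISTS (SELECT 1 FROM chunk c
                  WHERE c.batch_hash = b.hash AND c.proving_status <> 4);
UPDATE bundle u SET batch_proofs_status = 2
WHERE u.batch_proofs_status <> 2
  AND EXISTS (SELECT 1 FROM batch b WHERE b.bundle_hash = u.hash)
  AND NOT EXISTS (SELECT 1 FROM batch b
                  WHERE b.bundle_hash = u.hash AND b.proving_status <> 4);
SQL
}

# Watcher loop: apply the promotion every ${2:-15} seconds against the
# connection string in $1 (a placeholder for the shadow DB DSN).
run_watcher() {
  while true; do
    promote_sql | psql "$1" -v ON_ERROR_STOP=1 -q
    sleep "${2:-15}"
  done
}
```

For this run the range query would be generated with `block_range_sql 517761 517765`, and the watcher started with `run_watcher "$SHADOW_DB_DSN"`, where `SHADOW_DB_DSN` stands for the connection string of the shadow DB on `localhost:5433`. Printing the SQL separately from running it makes it easy to inspect the statements before pointing them at a database.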
1251+
1252+ ### Successful finalize transactions (mainnet fork)
1253+
1254+ | Bundle | Batch | Finalize Tx | Status |
1255+ |--------|-------|-------------|--------|
1256+ | 17297 | 517761 | `0x5e8a7e01…cd1b` | ✅ |
1257+ | 17298 | 517762 | `0xf55e9f00…c407f` | ✅ |
1258+ | 17299 | 517763 | `0xa6466c0a…412f` | ✅ |
1259+ | 17300 | 517764 | `0x1deca7d1…2d8f` | ✅ |
1260+ | 17301 | 517765 | `0x204b28de…a100` | ✅ |
1261+
1262+ Final `lastFinalizedBatchIndex = 517765`; verifier deployed at `0xf74BcAA17bbb3B0a996aF04a7b301E69501C4bf0`
1263+ (plonk `0x1d710357818776073705b29482486AbCF586f33b`), digests `0x00398b78…` / `0x0021785a…`.