@@ -401,9 +401,10 @@ Non-Nix users: install Zisk manually per the
401401 cached on disk (`CUDA_CACHE_MAXSIZE` above), so even a *fresh* process stays
402402 warm — the ~12 s figure above is a separate process from the cold run. So a
403403 single small one-off proof looks slow (it eats the cold-start); amortize by
404- batching, or just disregard it. Proving is stateful with no checkpointing —
405- if a run is killed it loses in-flight shard proofs and restarts from the
406- first shard.
404+ batching, or just disregard it. By default proving is stateful with no
405+ checkpointing — if a run is killed it loses in-flight shard proofs and
406+ restarts from the first shard; use a proof store (`--store-dir`, see
407+ *Sharding large environments* below) to make a sharded run resumable.
407408
408409 ```
409410 RUST_LOG=info cargo run --release -- --gpu \
@@ -464,7 +465,9 @@ Non-Nix users: install Zisk manually per the
464465 or scope to a single constant with `--only-const <name>` (both backends),
465466 which resolves the name to its work item and ships only that constant's
466467 closure sub-env to the guest — so individual constants of a large env still
467- fit the cap.
468+ fit the cap. To prove a large env in full under Zisk, shard it (see
469+ *Sharding large environments* below): each shard ships only its own closure
470+ sub-env, so the pieces fit the cap even when the whole env does not.
468471
469472 **Host RAM cap (`--max-witness-stored`).** Distinct from the in-guest
470473 heap cap above, the prover side (Zisk's `proofman`) holds in-flight
@@ -494,6 +497,99 @@ Non-Nix users: install Zisk manually per the
494497 `CALCULATING_CONTRIBUTIONS`. Not relevant for `--execute` or
495498 `--verify-constraints` modes.
496499
500+ 3. **Sharding large environments.** A whole large env (e.g. all of
501+ `Init`) does not fit a single Zisk proof: it overruns both the 512 MB
502+ guest heap and, more bindingly, the prover's minimal-trace shared-memory
503+ ceiling (64 GB per job on the assembly path). Sharding splits the env's
504+ type-checking work into pieces that are proven independently — each
505+ shipped with only its own dependency closure — and then folded into one
506+ aggregate proof. This is a two-part workflow: produce a shard manifest
507+ (a `.ixes` file), then prove it with `--shard-plan`.
508+
509+ **(a) Produce a shard manifest.** A manifest partitions the env's
510+ per-constant work, balancing shards by real type-checking cost and
511+ cutting where cross-shard dependencies are fewest. Two ways today:
512+
513+ - *Fixed shard count*, via the `ix` CLI. Profiling runs the kernel over
514+ the env (slow) and writes a `.ixesp` cost map; sharding is instant
515+ graph work on that map, so the shard count is cheap to re-tune without
516+ re-profiling:
517+
518+ ```
519+ # Profile once (slow): writes the .ixesp cost map.
520+ lake exe ix profile init.ixe --out init.ixesp
521+ # Cut into N shards (instant): writes the .ixes manifest. Re-run with a
522+ # different N to re-tune — no re-profiling needed.
523+ lake exe ix shard init.ixesp --shards 16 --out init.ixes
524+ ```
525+
526+ - *Sized by a cycle or peak-RAM budget* (so you pick a resource limit
527+ instead of a shard count). This currently lives in a kernel example
528+ binary rather than the `ix` CLI; it profiles and shards in one step
529+ (re-profiling on each run). With no flags it auto-sizes the shard count
530+ from this machine's RAM (peak prover RAM ≈ `45 + 32 × cycles_billions`
531+ GB, targeting ~78 % of total):
532+
533+ ```
534+ # Auto-size N from this box's RAM; writes init.ixe.ixes.
535+ cargo run -p ix-kernel --release --example shard_plan -- init.ixe
536+ # Or force a per-shard guest-cycle budget / a RAM figure / a fixed count:
537+ # --max-cycles 4500000000 --ram-gb 256 --shards 16
538+ ```
539+
540+ **(b) Prove the manifest.** Pass the `.ixes` to the host with
541+ `--shard-plan` alongside the same `--ixe` it was built for. Each shard
542+ becomes one leaf proof (built over only that shard's closure sub-env),
543+ and the leaves fold up the manifest's bisection tree into a single root
544+ proof:
545+
546+ ```
547+ cd zisk
548+ RUST_LOG=info cargo run --release -- --gpu \
549+ --ixe ../init.ixe --shard-plan ../init.ixes
550+ ```
551+
552+ **Resumable proving (`--store-dir`).** A proof store makes a sharded run
553+ restartable and lets proofs be reused across runs and envs:
554+
555+ ```
556+ RUST_LOG=info cargo run --release -- --gpu \
557+ --ixe ../init.ixe --shard-plan ../init.ixes \
558+ --store-dir ../init-store --require-closed
559+ ```
560+
561+ - `--store-dir DIR`: each completed shard proof is banked to
562+ `DIR/proofs/<n>.{proof,cover,asm}` as it finishes. **Re-run the same
563+ command to resume** — shards already covered by the store are skipped
564+ (not re-proven) and folded into the aggregate instead. Reuse is
565+ trustless: a stored proof is re-verified in-circuit at aggregation,
566+ never trusted by the host. Proofs are content-addressed, so a constant
567+ proven in one env is reused in any other env that contains it.
568+ - `--no-reuse`: ignore the store for this run (the "no sharing" baseline);
569+ the existing store is neither read nor overwritten.
570+ - `--require-closed`: error *before* proving unless every typing
571+ dependency of every certified constant is either proven in this run or
572+ already in the store — axioms are the only permitted residual. Turns
573+ "result modulo axioms" from a printed hope into an enforced
574+ precondition. Pairs with `--plan` for a no-prove check.
575+
576+ **Other shard modes and tuning flags.**
577+
578+ - *Static partitioning without a manifest:* `--shard-consts N` (N work
579+ items per shard) or `--shard-bytes B` (pack contiguous items until
580+ serialized bytes exceed `B`). Simpler, but not balanced by real
581+ type-checking cost — `--shard-plan` is the performant path.
582+ - `--plan`: print the planned partition and exit, no proving.
583+ - `--execute`: run the shards in the VM only and print per-shard cycle
584+ counts — use it to measure a shard's true cycle cost (e.g. to pick or
585+ recalibrate a cycle budget) without proving.
586+ - `--only-shard K`: prove just the 1-indexed shard `K` (skips
587+ aggregation) for smoke-testing a single shard.
588+ - `--dump-input FILE`: write one shard's guest stdin to `FILE` for
589+ standalone profiling with `ziskemu` or `cargo-zisk`.
590+ - `--agg-arity A` (default 16): how many child proofs each aggregation
591+ step folds at once.
592+
497593### Nix
498594
499595#### Prerequisites
0 commit comments