35 changes: 35 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -21,6 +21,7 @@ revive-build-utils = { version = "1.2.0", path = "crates/build-utils" }
revive-builtins = { version = "1.1.0", path = "crates/builtins" }
revive-common = { version = "1.1.0", path = "crates/common" }
revive-differential = { version = "1.0.0", path = "crates/differential" }
revive-fuzz = { version = "1.0.0", path = "crates/fuzz" }
revive-integration = { version = "1.3.0", path = "crates/integration" }
revive-linker = { version = "1.0.0", path = "crates/linker" }
revive-llvm-context = { version = "1.3.0", path = "crates/llvm-context" }
18 changes: 18 additions & 0 deletions Makefile

Contributor

Does the clean target also need an update? I saw that fuzz's README suggested some cleanup.

@@ -5,6 +5,7 @@
install-wasm \
install-llvm-builder \
install-llvm \
install-llvm-sancov \
install-revive-runner \
format \
clippy \
@@ -19,6 +20,7 @@
test-wasm \
test-llvm-builder \
test-book \
fuzz-libfuzzer \
bench \
bench-pvm \
bench-evm \
@@ -45,6 +47,15 @@ install-llvm: install-llvm-builder
git submodule update --init --recursive --depth 1
revive-llvm build --llvm-projects lld --llvm-projects clang

# LLVM built with `-fsanitize=fuzzer-no-link` so libFuzzer sees C++
# edges. Shares `target-llvm/<env>/` with `install-llvm` — run
# `revive-llvm clean` when switching. Resulting archives only link
# into fuzz-target binaries (which supply libFuzzer's runtime).
# `JOBS=N` caps thread count.
install-llvm-sancov: install-llvm-builder
git submodule update --init --recursive --depth 1
CMAKE_BUILD_PARALLEL_LEVEL=$(JOBS) revive-llvm build --llvm-projects lld --llvm-projects clang --enable-sancov
Comment on lines +55 to +57

Contributor

We could consider moving git submodule update --init --recursive --depth 1 (from install-llvm-sancov and install-llvm) into install-llvm-builder instead so that it's only in one place.


install-revive-runner:
cargo install --locked --force --path crates/runner --no-default-features

@@ -91,6 +102,13 @@ test-book:
cargo install mdbook --version 0.5.1 --locked
mdbook test book

# Coverage-guided differential fuzzer (libFuzzer + SanCov). Generates
# Solidity, runs solc → EVM and resolc → PVM, diffs the executions.
# `JOBS=N` shards across N forked workers. Requires `solc` and geth's
# `evm` in $PATH. Toolchain pinned via `fuzz/rust-toolchain.toml`.
fuzz-libfuzzer:
cd fuzz && cargo +nightly fuzz run solidity_differential -- -fork=$(or $(JOBS),4) -ignore_crashes=0
Comment on lines +109 to +110

Contributor

Could this also `cd` back to `../` so that we don't have to do it manually after running the command?


bench: install-bin
cargo criterion --all --all-features --message-format=json \
| criterion-table > crates/benchmarks/BENCHMARKS.md
1 change: 1 addition & 0 deletions book/src/SUMMARY.md
@@ -17,6 +17,7 @@
- [IR reference](./developer_guide/newyork_ir.md)
- [PVM and the pallet-revive runtime target](./developer_guide/target.md)
- [Testing strategy](./developer_guide/testing.md)
- [Differential fuzzing](./developer_guide/fuzzing.md)
- [Cross compilation](./developer_guide/cross_compilation.md)
- [FAQ](./faq.md)
- [Roadmap and Vision](./roadmap.md)
277 changes: 277 additions & 0 deletions book/src/developer_guide/fuzzing.md
@@ -0,0 +1,277 @@
# Differential fuzzing

`revive` ships a coverage-guided differential fuzzer that compares the
same logical contract execution between resolc's PVM lowering and
Comment on lines +3 to +4

Contributor

Could you update this file to not split paragraphs mid-sentence (just have paragraphs on one line)? It makes it more consistent and also easier when updating the file without having to deal with certain formatting 👍

solc's EVM lowering. Any byte-level mismatch on revert flags or
return data is treated as a backend bug.

The harness uses libFuzzer with SanitizerCoverage feedback over every
Rust crate in resolc's dep graph, so the mutation engine learns from
edge coverage in the parser, IR lowering, and pallet-revive
simulation.

> [!TIP]
>
> The fuzzer shells out to `solc` and geth's `evm`. Both must be on
> `$PATH`. See the [Testing strategy](./testing.md) chapter for the
> geth EVM-tool installation snippet.

## Running the fuzzer

```bash
# 4 forked workers, runs until interrupted
make fuzz-libfuzzer

# Tune fork count
make fuzz-libfuzzer JOBS=8

# Equivalent direct invocation (gives access to every libFuzzer flag)
cd fuzz
cargo +nightly fuzz run solidity_differential -- -fork=8
```

Useful libFuzzer flags (everything after the bare `--` is passed
through to the libFuzzer runtime):

| Flag | Effect |
|---|---|
| `-fork=N` | N parallel forked workers, sharing the corpus dir. |
| `-max_total_time=S` | Wall-clock budget in seconds. |
| `-runs=N` | Iteration budget instead of wall-clock. |
| `-max_len=N` | Cap input length (default 4096). |
| `-rss_limit_mb=N` | OOM threshold per worker (default 2048). |
| `-ignore_crashes=1` | Keep running after a crash instead of stopping. |
| `-print_final_stats=1` | Print coverage / corpus stats at exit. |

> [!NOTE]
>
> libFuzzer needs a nightly Rust toolchain because the SanitizerCoverage
> flags it relies on (`-Zsanitizer-coverage-*`, `-Cpasses=sancov-module`,
> etc.) are nightly-only. The `rust-toolchain.toml` inside `fuzz/`
> scopes nightly to that directory — the rest of the workspace stays
> on stable.

### Reproducing a crash

libFuzzer writes crash inputs to
`fuzz/artifacts/solidity_differential/crash-<sha1>`, where the suffix
is the SHA-1 hash of the input. The bytes are deterministic, so
re-feeding them produces the same `SolidityCase`:

```bash
cd fuzz
cargo +nightly fuzz run solidity_differential \
  artifacts/solidity_differential/crash-<sha1>
```

The panic message embeds the rendered contract source plus the action
sequence in hex — enough to file an issue without keeping the input
bytes around.

## What the fuzzer does

```text
libFuzzer mutator (random bytes)
Unstructured ──► TemplateKind::arbitrary
▼ pick template + per-template op selectors
SolidityCase { source, constructor_args, actions }
┌───────────┴────────────┐
▼ ▼
resolc → PVM blob solc → EVM bytecode
│ │
▼ ▼
revive_runner::Specs.run geth `evm` subprocess
(pallet-revive sim) (constructor + per-action calls)
│ │
└───────────┬────────────┘
compare()
├─ deploy_reverted flag
├─ per-action revert flag
└─ per-action return-data bytes
any mismatch → Divergence → panic
```

Both observers replay the same calldata sequence (`constructor_args`
concatenated as constructor input; `fn_0(uint256)` selector + 32-byte
arg per subsequent action). State carries across actions on both
backends. The harness compares revert flags and return-data bytes
only — gas-cost differences between geth `evm` and pallet-revive sim
are by design not part of the comparison.
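
The per-action calldata layout described above (4-byte selector followed by one 32-byte argument word) can be sketched as follows. This is a minimal illustration, not the harness's `observe::action_calldata`: the function names are hypothetical, and the selector is passed in as a parameter because deriving it (keccak-256 of the signature string) needs a hashing crate.

```rust
/// Left-pad a `u64` into a 32-byte big-endian ABI word (a `uint256` value).
fn abi_word_from_u64(value: u64) -> [u8; 32] {
    let mut word = [0u8; 32];
    // The low 8 bytes hold the value; the upper 24 bytes stay zero.
    word[24..].copy_from_slice(&value.to_be_bytes());
    word
}

/// Concatenate a 4-byte function selector with one 32-byte argument word.
/// Hypothetical helper: the selector is supplied by the caller rather than
/// computed here.
fn action_calldata_sketch(selector: [u8; 4], arg: [u8; 32]) -> Vec<u8> {
    let mut calldata = Vec::with_capacity(4 + 32);
    calldata.extend_from_slice(&selector);
    calldata.extend_from_slice(&arg);
    calldata
}
```

Because every template exposes the same `fn_0(uint256)` entry point, both observers can be fed the identical 36-byte buffer for each action.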

## Coverage signal

cargo-fuzz compiles every Rust crate in the dep graph with
SanitizerCoverage, so the libFuzzer mutation engine sees edges in:

* `revive-yul` parser
* `resolc` standard-json pipeline
* `revive-llvm-context` codegen (every lowering pattern)
* `revive-runner` / pallet-revive simulation
* `arbitrary` and the generator itself
Comment on lines +109 to +115

Contributor

Needs a revive-newyork once main is merged into this branch. (Same for revive/fuzz/README.md.)

Contributor Author

Agreed. Also, after reading Georgiy's post, I think this fuzzer can be improved in many ways.


Opaque (not instrumented by default):

* `solc` subprocess
* geth `evm` subprocess
* LLVM C++ libraries inside resolc

To extend SanCov instrumentation into LLVM's C++ codebase, rebuild
LLVM via `make install-llvm-sancov`. That target adds
`-fsanitize=fuzzer-no-link` to LLVM's C/CXX flags, so every basic
block in the resulting static archives carries libFuzzer edge
counters. The `-no-link` suffix avoids requiring `libclang_rt.fuzzer.a`
at LLVM-build time — the libFuzzer runtime comes from the fuzz-target
binary.

> [!WARNING]
>
> A SanCov-instrumented LLVM at `$LLVM_SYS_221_PREFIX` will break
> non-fuzz `cargo build` invocations: the linker needs
> `__sanitizer_cov_*` symbols that only the libFuzzer runtime
> supplies. Keep two LLVM trees if you need both: switch via
> `LLVM_SYS_221_PREFIX`.
Comment on lines +133 to +137

Contributor

Out of curiosity, what's your own workflow/setup when using both non-instrumented LLVM and the fuzzer (needing instrumented LLVM)?

Contributor Author

I don't have enough disk space to keep both LLVM builds, so I rebuild. Otherwise I would keep two LLVM builds as mentioned here, i.e. build LLVM once, move it to something like `target-llvm-1`, and then build the other LLVM.


This is the right shape: the engine explores **resolc's** Rust-side
input space (and optionally LLVM internals), not solc's or geth's. A
5-minute run on 4 forks against a non-instrumented LLVM loads ~1.17M
edges, of which the templated generator reaches ~28.5K.

## Generator

`crates/fuzz/src/templates.rs` defines eight contract templates. Each
exposes the same wire shape:

```solidity
constructor(uint256 seed) { ... }
function fn_0(uint256 arg) external returns (T) { ... }
```

so the observer doesn't need to know which template it's running.

| Kind | What it exercises |
|---|---|
| `Srem` | `int256` storage slot + `slot0 % arg` — the original [paritytech/revive#527](https://github.com/paritytech/revive/pull/527) probe |
| `ArithChain` | Two storage slots, three signed-arithmetic ops chained |
| `UncheckedArith` | `unchecked { … }` wrapping arithmetic on `uint256` |
| `Mapping` | `mapping(uint256 => uint256)` increment — exercises keccak-derived storage slots |
| `DynArray` | Dynamic `uint256[]` push + indexed read — exercises array layout + length update |
| `RequireGuard` | `require(predicate, "guard")` with eight predicate shapes |
| `LoopAccum` | Bounded `for` accumulator (`bound = arg & 0x1F`, ≤ 31 iterations) |
| `Bitwise` | Pure-bitwise composition (`& \| ^` + `<<` / `>>`) |

Op selectors inside each template are themselves `arbitrary`-driven,
so one template covers many distinct opcode lowerings. The
`#[ignore]`d `every_template_compiles` test pipes each template
through `solc --standard-json` and asserts no fatal errors:

```bash
cargo test -p revive-fuzz --lib -- --ignored every_template_compiles
```

A 256-bit boundary-value pool (`0`, `1`, `-1`, `INT_MIN`, `INT_MAX`,
`2^128`, `2^64`, alternating-bit patterns, …) is mixed into operands
with 1-in-5 probability so corner-case pairs surface within a minute
under pure-random. libFuzzer's mutator preserves the biasing because
it operates on the same byte tape `Unstructured` consumes.
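
The boundary-value biasing can be sketched with a tiny stand-in for the byte tape. Everything here is an assumption made for illustration: `Tape` is a hypothetical substitute for `arbitrary::Unstructured`, the pool holds only four of the values listed above, and the `% 5` selector is one possible way to get a roughly 1-in-5 rate.

```rust
/// Minimal stand-in for a fuzzer byte tape (hypothetical; the harness
/// itself consumes bytes through `arbitrary::Unstructured`).
struct Tape<'a> {
    bytes: &'a [u8],
}

impl<'a> Tape<'a> {
    /// Pop one byte, yielding 0 once the tape is exhausted.
    fn next_byte(&mut self) -> u8 {
        match self.bytes.split_first() {
            Some((first, rest)) => {
                self.bytes = rest;
                *first
            }
            None => 0,
        }
    }
}

/// A few 256-bit boundary values as big-endian words (illustrative subset).
fn boundary_pool() -> [[u8; 32]; 4] {
    let zero = [0u8; 32];
    let mut one = [0u8; 32];
    one[31] = 1;
    let minus_one = [0xFFu8; 32]; // two's-complement -1
    let mut int_min = [0u8; 32];
    int_min[0] = 0x80; // only the sign bit set
    [zero, one, minus_one, int_min]
}

/// Draw an operand: roughly 1 in 5 draws come from the pool, the rest
/// are filled with raw tape bytes.
fn draw_operand(tape: &mut Tape<'_>) -> [u8; 32] {
    let pool = boundary_pool();
    if tape.next_byte() % 5 == 0 {
        pool[tape.next_byte() as usize % pool.len()]
    } else {
        let mut word = [0u8; 32];
        for byte in word.iter_mut() {
            *byte = tape.next_byte();
        }
        word
    }
}
```

Since the pool-or-raw decision is itself read from the tape, a mutation that flips the selector byte turns a random operand into a boundary value (or back) without any extra plumbing.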

## Divergence taxonomy

`Divergence` (in `crates/fuzz/src/differential.rs`) categorises every

@elle-j, Contributor, Jul 2, 2026

Nit: Could you update the file to find wherever spellings should be updated to American spelling (e.g. categorises -> categorizes, optimiser -> optimizer, etc) for consistency.

Contributor Author

sure!

outcome:

| Variant | Meaning | libFuzzer treatment |
|---|---|---|
| `EvmCompile(msg)` | `solc → EVM` panicked. Almost always a **generator bug** (template emitted Solidity solc rejects). | **Silent skip** in the libFuzzer panic helper — doesn't burn a corpus slot. |
| `PvmCompile(msg)` | `resolc → PVM` panicked. solc accepted but resolc choked. | **Crash** — exactly the kind of resolc ICE the fuzzer is meant to find. |
| `DeployRevert { … }` | Constructor reverted on one backend but not the other. | Crash. |
| `ActionCount { … }` | Action result vectors of unequal length. Defensive; should not happen. | Crash. |
| `ActionRevert { … }` | One backend reverted on a call the other completed. | Crash. |
| `ActionReturnData { … }` | Both completed; return-data bytes differ. | Crash. |
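
The runtime variants in the table suggest a comparison that checks coarse signals before fine ones. The sketch below is a hedged illustration of that ordering, not the crate's `compare()`: `Observation`, `Mismatch`, and all field names are hypothetical, and skipping the return-data check on reverted calls is an assumption drawn from the "both completed" wording.

```rust
/// Observable result of one backend run (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Observation {
    deploy_reverted: bool,
    /// One entry per action: (reverted, return data).
    actions: Vec<(bool, Vec<u8>)>,
}

/// Hypothetical mismatch categories, named after the table's variants.
#[derive(Debug, PartialEq)]
enum Mismatch {
    DeployRevert { evm: bool, pvm: bool },
    ActionCount { evm: usize, pvm: usize },
    ActionRevert { index: usize },
    ActionReturnData { index: usize },
}

/// Compare two observations and report the first mismatch found.
fn compare_sketch(evm: &Observation, pvm: &Observation) -> Result<(), Mismatch> {
    if evm.deploy_reverted != pvm.deploy_reverted {
        return Err(Mismatch::DeployRevert {
            evm: evm.deploy_reverted,
            pvm: pvm.deploy_reverted,
        });
    }
    if evm.actions.len() != pvm.actions.len() {
        return Err(Mismatch::ActionCount {
            evm: evm.actions.len(),
            pvm: pvm.actions.len(),
        });
    }
    for (index, (e, p)) in evm.actions.iter().zip(&pvm.actions).enumerate() {
        if e.0 != p.0 {
            return Err(Mismatch::ActionRevert { index });
        }
        // Assumption: return data is only compared when both calls completed.
        if !e.0 && e.1 != p.1 {
            return Err(Mismatch::ActionReturnData { index });
        }
    }
    Ok(())
}
```

Returning the first mismatch keeps the report small: a deploy-level disagreement makes every later per-action difference uninformative.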

Compile failures used to panic the whole process via
`.expect("source should compile")` inside resolc's `test_utils`. The
harness wraps both calls in `std::panic::catch_unwind` and routes the
payload into a dedicated variant, so a generator bug doesn't poison
the whole libFuzzer run.
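
The panic-to-variant routing can be sketched with the standard library alone. This is an illustration of the technique under assumptions, not the harness code: `catch_compile_panic` is a hypothetical name, and the resulting `String` stands in for the message carried by the compile variants.

```rust
use std::panic::{self, UnwindSafe};

/// Run `compile` and turn a panic into an `Err` carrying the panic message.
/// Hypothetical helper for illustration only.
fn catch_compile_panic<T, F>(compile: F) -> Result<T, String>
where
    F: FnOnce() -> T + UnwindSafe,
{
    panic::catch_unwind(compile).map_err(|payload| {
        // Panic payloads are `Box<dyn Any + Send>`. A literal message is a
        // `&'static str`; a formatted message is a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            (*msg).to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "non-string panic payload".to_string()
        }
    })
}
```

Note that `catch_unwind` only works when the crate is built with unwinding panics; under `panic = "abort"` the process still terminates.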
Comment on lines +196 to +200

Contributor

👍


## Performance

Templated Solidity on an M-class laptop:

| Step | Cost |
|---|---|
| `arbitrary → SolidityCase` | <1 ms |
| `solc → EVM` (subprocess) | ~80–100 ms |
| `resolc → PVM` (in-process) | ~150 ms |
| geth `evm` per action | ~10 ms × 2–4 actions |
| `revive_runner::Specs.run()` per action | ~5 ms |

≈ 250 ms / iter end-to-end. With `-fork=12`: ~30–40 iter/sec total.
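
As a rough cross-check of these figures, the step costs in the table can be summed directly. The sketch assumes both backends replay the same number of actions, ignores the sub-millisecond generator step, and assumes forks scale linearly with no contention; the function names are hypothetical.

```rust
/// Per-iteration cost in milliseconds from the step costs in the table.
fn iteration_ms(
    solc_ms: f64,
    resolc_ms: f64,
    actions: u32,
    evm_action_ms: f64,
    pvm_action_ms: f64,
) -> f64 {
    solc_ms + resolc_ms + f64::from(actions) * (evm_action_ms + pvm_action_ms)
}

/// Ideal aggregate throughput, assuming forks scale linearly.
fn ideal_iters_per_sec(forks: u32, iter_ms: f64) -> f64 {
    f64::from(forks) * 1000.0 / iter_ms
}
```

With the table's values, `iteration_ms(80.0, 150.0, 2, 10.0, 5.0)` gives 260 ms and `iteration_ms(100.0, 150.0, 4, 10.0, 5.0)` gives 310 ms, so the ideal rate for 12 forks lands between about 38 and 46 iter/sec. The measured 30–40 iter/sec sits slightly below that, which is consistent with some per-fork overhead.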

Five-minute runs from an empty corpus, four forks:

| Generator | Iters | `cov` (edges) | `ft` (features) | Corpus |
|---|---|---|---|---|
| SREM-only | 14,077 | 26,669 | 32,392 | 101 |
| Templated | 6,087 | **28,566** | **46,530** | **313** |

The templated generator opens ~2K more edges and 14K more features
than the SREM-only baseline, and keeps a 3× larger corpus.

> [!WARNING]
>
> libFuzzer is single-threaded per process. Use `-fork=N` for
> parallelism — not Rust threads, not rayon. Rayon inside a fuzz
> target would interleave coverage counters and produce useless data.

## Known limitations

* **Subprocess overhead dominates.** `solc` + `evm` subprocess costs
cap throughput at ~30 iter/sec on 12 cores. A native-Rust EVM on the
EVM side would be ~10× faster but is out of scope.
* **Recursive resolc isn't instrumented.** `resolc::test_utils` spawns
the installed `~/.cargo/bin/resolc` as a subprocess via
`--recursive-process` for per-contract lowering. Only the
in-process call sites carry SanCov instrumentation; the subprocess
is opaque to libFuzzer. `revive_fuzz::warn_if_resolc_stale` logs a
warning when the installed binary is older than workspace source,
to flag the case where a local fix isn't visible to the fuzzer.
* **One external function shape.** The harness hardcodes
`fn_0(uint256)` so the observer doesn't have to vary calldata
encoding. Removing that assumption requires generalising
`observe::action_calldata`.
* **Solc internals are opaque.** libFuzzer can't see solc's Yul
optimiser. Fine for resolc-side bug finding; not useful for solc
bug finding.
* **Stack traces aren't captured** in `catch_unwind` payloads. Easy
follow-up to wire `std::backtrace::Backtrace::capture()` through.
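
The staleness check from the list above comes down to comparing modification times. The following is a minimal sketch assuming an mtime-based comparison; it is not the body of `revive_fuzz::warn_if_resolc_stale`, and both function names are hypothetical.

```rust
use std::{fs, io, path::Path, time::SystemTime};

/// Read a file's last-modified time.
fn modified(path: &Path) -> io::Result<SystemTime> {
    fs::metadata(path)?.modified()
}

/// True when the installed binary was last modified before the newest
/// source file, i.e. a local change may not be visible to the subprocess.
fn is_stale(binary_modified: SystemTime, newest_source_modified: SystemTime) -> bool {
    binary_modified < newest_source_modified
}
```

A caller would feed `is_stale` the result of `modified` on the installed `resolc` binary and the newest timestamp found among the workspace sources, then log a warning rather than abort so the fuzz run can proceed.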

## Code map

```text
crates/fuzz/ # revive-fuzz harness library (stable Rust, main workspace)
├── Cargo.toml # `panic-on-divergence` feature
├── src/
│ ├── lib.rs # re-exports + `panic_on_divergence` helper
│ ├── generator.rs # SolidityCase + Arbitrary impl
│ ├── templates.rs # 8 template renderers + solc self-test
│ ├── pipeline.rs # solc / resolc invocation helpers
│ ├── observe.rs # observe_evm / observe_pvm
│ ├── differential.rs # Divergence + run_case_solc_evm
│ └── stale.rs # `warn_if_resolc_stale`

fuzz/ # cargo-fuzz package (separate workspace, nightly)
├── Cargo.toml # libfuzzer-sys + path-dep on revive-fuzz
├── rust-toolchain.toml # nightly, scoped here only
└── fuzz_targets/
└── solidity_differential.rs # libFuzzer entry
```

The split exists because cargo-fuzz needs a nightly toolchain and
pulls in `libfuzzer-sys` — keeping that in a separate workspace
prevents either from leaking into the main `cargo build`.