Commit d24679f

def-claude
andcommitted
tests: Add the cargo-fuzz suite and all of its supporting changes
Squashes every fuzzing-infrastructure change in this branch into one commit,
separate from the individual bug fixes the fuzzing surfaced (one commit each):

* The cargo-fuzz crates under `src/*/fuzz` (targets, seed corpora,
  dictionaries, and `prepare-corpus.sh` scripts), covering the SQL parser /
  pretty-printer, repr (strconv, jsonb, Row codec/proto, arithmetic oracles),
  the expr optimizer transforms, Avro/Protobuf/CSV/pgwire/pgcopy decoders,
  pgrepr/pgtz, the upsert state machine, persist durable-state decode, and the
  proto round-trips across storage-types/persist/catalog/external table descs.
* The harness and runner wiring: `--profile fruitful`, `--jobs auto`,
  per-crate sharding, artifact-based crash detection, `.repro.txt` sidecars, a
  time-capped post-fuzz corpus minimize/upload, and the auto-generated
  `buf.yaml` fuzz-crate excludes.
* CI: move cargo-fuzz from nightly to release qualification (24h, 48-core).
* The production-side enablement the targets require: the `fuzzing` Cargo
  feature and the `#[doc(hidden)]` / `cfg`-gated re-exports that expose upsert,
  persist-client, and pgwire internals to the fuzz crates.
* The macOS build fix those exports necessitated: switching the affected
  storage `Stream::inspect` calls to `InspectCore::inspect_container`, which
  avoids the objc2-driven trait-solver overflow the `Inspect` bound triggers
  on macOS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 5faf988 commit d24679f

132 files changed

Lines changed: 20034 additions & 13 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -68,3 +68,4 @@ uv.lock
 /plots/
 /test/testdrive/types.parquet*
 /test/mz-deploy/**/target/
+target-fuzz/

Cargo.toml

Lines changed: 5 additions & 0 deletions
@@ -260,6 +260,11 @@ exclude = [
     "misc/wasm/*",
     # Ignore any Rust dependencies that python packages might pull in.
     "misc/python/venv/*",
+    # The `src/*/fuzz` cargo-fuzz crates need no entry here: each sets
+    # `package.workspace = "../../../test/cargo-fuzz"`, attaching it to the fuzz
+    # workspace, and a crate nested under a workspace member is never
+    # auto-included in the root workspace. They build on a nightly toolchain
+    # (libFuzzer) via `cargo +nightly fuzz run ...` or `ci/test/cargo-fuzz.sh`.
 ]
 
 # Use Cargo's new feature resolver, which can handle target-specific features,

ci/builder/Dockerfile

Lines changed: 8 additions & 0 deletions
@@ -247,6 +247,14 @@ RUN mkdir rust \
     && cargo install --root /usr/local --version "=0.1.61" --locked --features=vendored-openssl cargo-udeps \
     && cargo install --root /usr/local --version "=0.4.0" --locked cargo-binutils \
     && cargo install --root /usr/local --version "=0.13.1" --locked wasm-pack \
+    && if [ "$RUST_VERSION" = "nightly" ]; then \
+        # NOTE: no --locked, unlike the installs above. cargo-fuzz 0.13.1's \
+        # bundled Cargo.lock pins deps that fail on the pinned nightly \
+        # (yanked futures-util/zip, plus a crate using the perma-unstable \
+        # `rustc_layout_scalar_valid_range_*` attribute). Let cargo resolve \
+        # compatible versions instead. \
+        cargo install --root /usr/local --version "=0.13.1" cargo-fuzz; \
+    fi \
     && rm -rf /cargo/registry /cargo/git
 
 # Shims for sanitizers

ci/plugins/mzcompose/hooks/command

Lines changed: 1 addition & 0 deletions
@@ -353,6 +353,7 @@ cleanup() {
     killall -9 -q clusterd || true # There might be remaining processes from a cargo-test run
 
     if [ ! -s services.log ] \
+        && [ "$BUILDKITE_LABEL" != ":rust: cargo-fuzz" ] \
         && [ "$BUILDKITE_LABEL" != "Maelstrom coverage of persist" ] \
         && [ "$BUILDKITE_LABEL" != "Long single-node Maelstrom coverage of persist" ] \
         && [ "$BUILDKITE_LABEL" != "Maelstrom coverage of txn-wal" ] \

ci/plugins/mzcompose/plugin.yml

Lines changed: 2 additions & 0 deletions
@@ -21,5 +21,7 @@ configuration:
       type: string
     composition:
       type: string
+    ci_builder:
+      type: string
   required: ["composition"]
   additionalProperties: false

ci/release-qualification/pipeline.template.yml

Lines changed: 23 additions & 0 deletions
@@ -254,6 +254,29 @@ steps:
           composition: sqlsmith
           args: [--max-joins=15, --explain-only, --runtime=6000]
 
+  - id: cargo-fuzz
+    label: ":rust: cargo-fuzz"
+    depends_on: []
+    timeout_in_minutes: 1440
+    agents:
+      queue: hetzner-x86-64-dedi-48cpu-192gb
+    sanitizer: skip
+    plugins:
+      - ./ci/plugins/mzcompose:
+          composition: cargo-fuzz
+          ci_builder: nightly
+          args:
+            - --profile=fruitful
+            - --max-seconds=86400
+            - --wall-budget=84600
+            # Step hard-times out at 1440min (86400s). --wall-budget ends fuzzing
+            # at 84600s, leaving 1800s; cap minimize at 1200s so the corpus
+            # upload has ~600s of headroom before the kill.
+            - --minimize-timeout=1200
+            - --corpus-sync
+    artifact_paths:
+      - src/*/fuzz/artifacts/**/*
+
   - id: test-preflight-check-rollback
     label: Test with preflight check and rollback
     depends_on: []
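
The time budget described in the step's comment can be checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the step definition above, while the variable names are illustrative and do not come from the runner.

```python
# Values copied from the cargo-fuzz pipeline step; names are illustrative only.
timeout_in_minutes = 1440    # Buildkite hard timeout for the step
wall_budget_s = 84600        # --wall-budget: point at which fuzzing stops
minimize_timeout_s = 1200    # --minimize-timeout: cap on corpus minimization

step_timeout_s = timeout_in_minutes * 60
after_fuzzing_s = step_timeout_s - wall_budget_s          # time left once fuzzing ends
upload_headroom_s = after_fuzzing_s - minimize_timeout_s  # time left for corpus upload

print(step_timeout_s, after_fuzzing_s, upload_headroom_s)  # 86400 1800 600
```

If `timeout_in_minutes` or `--wall-budget` changes, the minimize cap should be revisited so the upload headroom stays positive.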

ci/test/lint-buf/generate-buf-config.py

Lines changed: 27 additions & 0 deletions
@@ -18,6 +18,11 @@
 
 SOURCE_DIR = "src/"
 PROTO_FILE_GLOB = f"{SOURCE_DIR}**/*.proto"
+# Each fuzz crate (`src/<crate>/fuzz`) is its own cargo `[workspace]`, so
+# building it standalone creates `src/<crate>/fuzz/target/`. Build-script deps
+# (e.g. `protobuf-src`) extract `.proto` files into that tree, which buf must
+# not scan. We exclude every fuzz crate's `target/` instead of hand-listing them.
+FUZZ_CRATE_GLOB = f"{SOURCE_DIR}*/fuzz"
 
 GENERATION_COMMENT = "File generated by generate-buf-config.py - DO NOT EDIT"
 BUF_INSTRUCTION_PREFIX = "// buf breaking:"
@@ -37,6 +42,11 @@ def is_ignore(self) -> bool:
 def collect_proto_files() -> list[ProtoFile]:
     print(f"Working dir: {os.getcwd()}")
     proto_file_paths = glob.glob(PROTO_FILE_GLOB, recursive=True)
+    # Filter out build artifacts: each fuzz crate is its own `[workspace]`, so
+    # building it standalone creates `src/<crate>/fuzz/target/`. A build-script
+    # dep (`protobuf-src`) vendors protoc's bundled `.proto` files (Google's
+    # well-known types) into that tree, which is not source we want buf to scan.
+    proto_file_paths = [p for p in proto_file_paths if "/target/" not in p]
     return [ProtoFile(path) for path in proto_file_paths]
 
 
@@ -82,6 +92,20 @@ def generate_buf_ignore_section(ignored_files: list[ProtoFile]) -> str:
     return "\n".join(ignore_entry_lines).strip()
 
 
+def generate_fuzz_target_excludes() -> str:
+    fuzz_crate_dirs = sorted(d for d in glob.glob(FUZZ_CRATE_GLOB) if os.path.isdir(d))
+    exclude_lines = []
+    for fuzz_dir in fuzz_crate_dirs:
+        # e.g. "src/transform/fuzz" -> "transform/fuzz/target"
+        relative_path = fuzz_dir.removeprefix(SOURCE_DIR)
+        exclude_lines.append(f" - {relative_path}/target")
+
+    if len(exclude_lines) == 0:
+        exclude_lines.append(" # none")
+
+    return "\n".join(exclude_lines).strip()
+
+
 def write_buf_configuration(
     template_path: str, target_path: str, ignored_files: list[ProtoFile]
 ) -> None:
@@ -92,6 +116,9 @@ def write_buf_configuration(
     content = content.replace(
         "${ignore-entries}", generate_buf_ignore_section(ignored_files)
     )
+    content = content.replace(
+        "${fuzz-target-excludes}", generate_fuzz_target_excludes()
+    )
 
     with open(target_path, "w") as output_file:
         output_file.write(content)
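
To illustrate what the two additions in `generate-buf-config.py` compute, here is a minimal standalone sketch. It assumes hard-coded path lists in place of the script's `glob.glob` calls, and the helper names `fuzz_target_excludes` and `drop_build_artifacts` are hypothetical; the `.proto` paths are invented for the example.

```python
SOURCE_DIR = "src/"


def fuzz_target_excludes(fuzz_crate_dirs: list[str]) -> list[str]:
    """Map each fuzz crate dir to the target/ path relative to SOURCE_DIR."""
    return [f"{d.removeprefix(SOURCE_DIR)}/target" for d in sorted(fuzz_crate_dirs)]


def drop_build_artifacts(proto_paths: list[str]) -> list[str]:
    """Keep only .proto paths that do not sit under a target/ directory."""
    return [p for p in proto_paths if "/target/" not in p]


excludes = fuzz_target_excludes(["src/transform/fuzz", "src/avro/fuzz"])
print(excludes)

kept = drop_build_artifacts(
    [
        "src/repr/src/row.proto",  # hypothetical source file
        "src/avro/fuzz/target/debug/build/x/out/google/protobuf/any.proto",  # hypothetical artifact
    ]
)
print(kept)
```

The two mechanisms overlap on purpose in this sketch: the path filter keeps vendored files out of the generated ignore list, while the exclude entries keep buf itself from scanning the `target/` trees.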

misc/python/materialize/cli/ci_annotate_errors.py

Lines changed: 3 additions & 0 deletions
@@ -100,6 +100,9 @@
     # \s\S is any character including newlines, so this matches multiline strings
     # non-greedy using ? so that we don't match all the result comparison issues into one block
     | ----------\ RESULT\ COMPARISON\ ISSUE\ START\ ----------[\s\S]*?----------\ RESULT\ COMPARISON\ ISSUE\ END\ ------------
+    # cargo-fuzz crash, emitted by the cargo-fuzz mzcompose runner (one block
+    # per failing target, with the crash input and a reproduce command)
+    | ----------\ CARGO-FUZZ\ FAILURE\ START\ ----------[\s\S]*?----------\ CARGO-FUZZ\ FAILURE\ END\ ----------
     # output consistency tests
     # | possibly\ invalid\ operation\ specification # disabled
     # for miri test summary
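
The new alternative relies on verbose-mode escaping (`\ ` for a literal space) and a non-greedy `[\s\S]*?` so that each failure block is matched separately. A minimal sketch of that behaviour follows, assuming a `str` pattern compiled on its own with `re.VERBOSE`; the name `FAILURE_RE` and the block bodies in the sample log are invented, since the runner's actual output format is not shown in this diff.

```python
import re

# Only the cargo-fuzz alternative, compiled standalone for illustration.
FAILURE_RE = re.compile(
    r"""
    ----------\ CARGO-FUZZ\ FAILURE\ START\ ----------[\s\S]*?----------\ CARGO-FUZZ\ FAILURE\ END\ ----------
    """,
    re.VERBOSE,
)

# Hypothetical log with two failure blocks separated by unrelated output.
log = (
    "build ok\n"
    "---------- CARGO-FUZZ FAILURE START ----------\n"
    "target: reader_decode\n"
    "---------- CARGO-FUZZ FAILURE END ----------\n"
    "unrelated output\n"
    "---------- CARGO-FUZZ FAILURE START ----------\n"
    "target: schema_resolve\n"
    "---------- CARGO-FUZZ FAILURE END ----------\n"
)

blocks = FAILURE_RE.findall(log)
print(len(blocks))  # 2
```

With a greedy `[\s\S]*` the two blocks and the text between them would collapse into a single match, which is the situation the existing comment about non-greedy matching warns against.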

src/avro/fuzz/.gitignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+target/
+corpus/
+artifacts/
+coverage/
+Cargo.lock

src/avro/fuzz/Cargo.toml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Fuzz crate for mz-avro decoders. Avro bytes arrive from Kafka, so a
+# decoder bug here is a crash/poisoning risk for source ingestion.
+#
+# Excluded from the main workspace because libFuzzer requires nightly Rust.
+# Run via the repo-wide runner: `bin/ci-builder run nightly ci/test/cargo-fuzz.sh`,
+# or locally:
+#   cd src/avro/fuzz
+#   cargo +nightly fuzz run reader_decode -- -max_total_time=60
+
+[package]
+workspace = "../../../test/cargo-fuzz"
+name = "mz-avro-fuzz"
+version = "0.0.0"
+publish = false
+edition = "2021"
+
+[package.metadata]
+cargo-fuzz = true
+
+[dependencies]
+libfuzzer-sys = "0.4"
+mz-avro = { path = ".." }
+
+[[bin]]
+name = "reader_decode"
+path = "fuzz_targets/reader_decode.rs"
+test = false
+doc = false
+bench = false
+
+[[bin]]
+name = "schema_resolve"
+path = "fuzz_targets/schema_resolve.rs"
+test = false
+doc = false
+bench = false
+
+[[bin]]
+name = "avro_schema_parse"
+path = "fuzz_targets/avro_schema_parse.rs"
+test = false
+doc = false
+bench = false
