clusterd-test-driver: headless frontend to clusterd for scripted compute tests by antiguru · Pull Request #37008 · MaterializeInc/materialize

antiguru · 2026-06-12T10:53:25Z

Motivation

Running a small compute experiment today means standing up a full environmentd — the SQL layer, the catalog, the coordinator — even when all you want is to hand clusterd a few commands and watch what it does. That is slow to set up and hard to control precisely.

This adds mz-clusterd-test-driver: a headless frontend that speaks the compute protocol to clusterd directly, with no environmentd. A test drives it from a text script, so it controls the exact persist state, the exact commands the replica sees, and the exact timestamps. Design doc: doc/developer/design/20260612_headless_clusterd_test_driver.md.

What a script looks like

Each stanza is a command followed by a ---- block holding its expected output. That block is the assertion, and REWRITE=1 regenerates it in place. Here is the whole lifecycle of a count(*) materialized view over a persist shard, read back at the end:

create-instance
----
ok

initialization-complete
----
ok

write-single-ts shard=events ts=0 count=3000
----
wrote 3000

define-schema name=count_out
  count bigint
----
ok

create-dataflow name=count-mv as-of=0
  import source=1000 shard=events upper=1
  build id=2000
    Reduce aggregates=[count(*)]
      Get u1000
  export kind=materialized-view sink=2001 on=2000 shard=mv schema=count_out
----
ok

schedule id=2001
----
ok

allow-writes id=2001
----
ok

peek id=2001 schema=count_out ts=0
----
3000

A command that fails renders as error: <message>, so an expected failure is just another golden block — there is no special assertion command. Because the waits are level-triggered on monotonic frontiers, the order a script waits in does not change the result, so a single sequential script stays deterministic.

How it works

The crate is a generic mechanism, a dataflow builder, and the scripting layer on top.

The mechanism hosts the persist PubSub server, connects over CTP (sending only Hello), and exposes a Driver that sends any ComputeCommand, submits dataflows without auto-scheduling, watches frontiers, and peeks.
DataflowBuilder takes generic parts — persist imports, MIR objects to build, and index/materialized-view/subscribe exports — and runs the real MIR → LIR → RenderPlan lowering, because a RenderPlan can't be hand-built outside mz-compute-types. A build's computation is written in the mz-expr-parser .spec syntax (Reduce aggregates=[count(*)] over Get u1000) rather than a bespoke vocabulary, since MirRelationExpr's own serde isn't hand-authorable (Row literals are opaque bytes).
create-dataflow is the one abstraction behind index, materialized-view, and subscribe exports (copy-to isn't implemented yet). A materialized view is read back by peeking the sink id — that becomes a persist peek of its output shard, the same path SELECT * FROM mv takes — and a subscribe streams responses that the driver buffers and await-subscribe drains. Dataflows start read-only, so a sink needs allow-writes before its writes land.
The handshake is explicit (create-instance, optional update-configuration, initialization-complete), and reconnect re-runs it without initialization-complete to exercise reconciliation.

Verification

mzcompose runs each scenario script against a real clusterd and fails on any golden mismatch; the scenarios are index, deep-history, side-effects, multi-dataflow, reconciliation, error-behavior, reduce, materialized-view, and subscribe. Unit tests cover the direct-write round trip, the lowered dataflow structure, and the script parser, and run-local.py runs the same scripts on the host (with REWRITE=1) without docker images.

antiguru

What we're missing is a specific language to script the headless driver. Encoding the interactions in Rust is a good first step, but ultimately we want something that is easier to iterate on. I could even imagine Python scripts that we load through pyo3 or so.

antiguru · 2026-06-12T11:11:56Z

Agreed on a scripting language being the goal. I captured it under "Future work" in the design doc: the mechanism is already a thin scriptable surface (send, submit_dataflow, schedule, expect_frontier, peek, subscribe_raw), so a declarative script or Python-via-pyo3 layer would bind to it rather than replace it. Left it out of this PR to keep the first step focused.

def-

I have once built something very similar at a previous company, also using json inputs to test an execution engine directly instead of going through the entire database stack. Back then the test framework turned out to not be very useful since it was easier to write unit tests and full system tests instead of the middle-ground headless json tests.

Do we have some concrete examples of regression tests we've wanted to build and failed before, which would be possible with clusterd-test-driver?

I still think it's a worthy experiment, and I can especially imagine that building a fuzzer for clusterd-test-driver with some invariants could lead us to interesting bugs.

DAlperin · 2026-06-18T18:17:38Z

+# Join the two sources on their key column — `#0` (left key) equals `#2` (right key)
+# in the 4-column output — and arrange the result. `optimize` is what lets the
+# `Join` lower.
+create-dataflow name=join as-of=0 optimize


Not blocking, but it occurs to me that when the caller opts into the optimize verb they might want to be able to assert that the optimized variant looks a certain way so that subtle optimizer behavior drift can't unknowingly meaningfully change the test.

Actually the more I think about this its probably pretty easy and worth doing since it otherwise breaks the spirit of the test framework a bit IMO

Good call — done in a stacked PR (antiguru#182). Added an explain verb: same body as create-dataflow, but it renders the lowered LIR (EXPLAIN PHYSICAL PLAN form) as the golden instead of submitting. join.spec now asserts the differential-join plan shape alongside the count, so optimizer/lowering drift surfaces. The render is e.g.:

u2000: →Differential Join %0:u1000[#0] » %1:u1001[#0] →Arrange (#0) →Stream u1000 →Arrange (#0) →Stream u1001

New crate providing an alternate frontend to `clusterd` that stands in for environmentd's controller in compute tests: it hosts persist PubSub, connects the compute CTP channel, and drives dataflows directly, with no SQL layer, catalog, or timestamp oracle. This first piece is the harness library: connection-target resolution, the CTP connect + `Hello` handshake, persist-PubSub hosting, synthetic-row and direct persist writes, response demultiplexing (frontiers + peeks), the `Driver` (submit/schedule/await-frontier/peek), and `DataflowBuilder`, which lowers caller-supplied MIR to a shippable `RenderPlan` dataflow (optionally running the MIR optimizer). Exercised end-to-end by `tests/index_smoke.rs` against a real clusterd. The text script runner that drives this harness follows in a stacked PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Drives the headless harness from text command scripts instead of recompiled Rust scenarios. A script is a sequence of commands, each with a `----` golden block that is the assertion; `create-dataflow` carries arbitrary MIR (parsed by `mz-expr-parser`, the `.spec` syntax) over the full `DataflowBuilder` surface, including index / materialized-view / subscribe exports and an opt-in `optimize`. Adds the `text` parser, the `script` interpreter, the `headless-driver` binary, the mzbuild image and CI pipeline entry, the mzcompose composition and a host-local runner (`run-local.py`), and the scenario scripts (index, deep-history, side-effects, reconciliation, error-behavior, reduce, materialized-view, subscribe, join, index-and-mv, custom-schema). `Composition.run` gains `use_aliases` so the one-shot driver container can host the PubSub that clusterd dials. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

DAlperin

Seems sane to me and like a good test framework to have.

DAlperin · 2026-06-18T18:23:17Z

+/// and closes when it sends `InitializationComplete`; in between, the replica
+/// reconciles the replayed dataflows against its live ones rather than
+/// rehydrating.
+pub async fn connect_and_hello(compute_addr: &str) -> anyhow::Result<ComputeCtpClient> {


This is only called from the driver. Might not need to be its own file

It's shared by two modules, not just the driver: driver.rs uses connect_and_hello, and responses.rs uses the ComputeCtpClient type for the response pump. Folding it into driver.rs would make the response pump depend on the driver just for the transport type, so I'd keep the CTP primitives in their own small module. Happy to revisit if you feel strongly.

Missed it in my grep. My bad!

DAlperin · 2026-06-18T18:27:44Z

+            }
+            PeekResponse::Error(e) => anyhow::bail!("peek error: {e}"),
+            PeekResponse::Canceled => anyhow::bail!("peek canceled"),
+            PeekResponse::Stashed(_) => anyhow::bail!("unexpected stashed peek result"),


I thought for sure I had claude out here but it explicitly disabled the peek stash so this is fine:)

antiguru · 2026-06-18T19:15:37Z

Thanks for the review!

The framework-selection list in the mz-test skill had no entry for dataflow/replica-level compute tests. Add the clusterd test driver (#37008) with its run commands and REWRITE workflow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_012QXMSBfD9bXzJaqs52URyV

### Motivation The `mz-clusterd-test-driver` crate (added in #37008) is not in the miri exclude list, so its unit tests run under nightly miri. Every `dataflow::tests::*` test drives MIR-to-LIR lowering, which reaches `mz_ore::stack::maybe_grow` -> `stacker` -> `psm::stack_pointer`, a foreign function miri cannot call. The tests therefore abort under miri (`unsupported operation: can't call foreign function rust_psm_stack_pointer on OS linux`). ### Description Mark the six lowering tests `#[cfg_attr(miri, ignore)]`, matching the existing convention in `mz-expr` (e.g. `src/expr/src/visit.rs`, `scalar.rs`, `relation/join_input_mapper.rs`), rather than excluding the crate or shimming `mz_ore::stack`. ### Verification `CARGO_TARGET_DIR=$PWD/miri-target MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-strict-provenance" cargo +nightly miri nextest run -p mz-clusterd-test-driver dataflow::tests` skips them (exit 0); a normal `cargo nextest run` still runs all of them. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

antiguru commented Jun 12, 2026

View reviewed changes

Comment thread src/compute-test-driver/src/lib.rs Outdated

antiguru changed the title ~~compute-test-driver: headless frontend to clusterd for scripted compute tests~~ clusterd-test-driver: headless frontend to clusterd for scripted compute tests Jun 12, 2026

antiguru marked this pull request as ready for review June 12, 2026 18:04

antiguru requested a review from a team as a code owner June 12, 2026 18:04

antiguru requested review from def- and frankmcsherry June 12, 2026 18:04

def- reviewed Jun 12, 2026

View reviewed changes

antiguru commented Jun 15, 2026

View reviewed changes

Comment thread test/clusterd-test-driver/mzcompose.py Outdated

antiguru commented Jun 15, 2026

View reviewed changes

Comment thread src/clusterd-test-driver/src/script.rs Outdated

Comment thread src/clusterd-test-driver/src/script.rs Outdated

antiguru requested review from DAlperin and def- June 15, 2026 12:45

antiguru force-pushed the headless-compute-driver branch 5 times, most recently from 610ac37 to 1855839 Compare June 16, 2026 09:44

DAlperin reviewed Jun 18, 2026

View reviewed changes

antiguru and others added 2 commits June 18, 2026 20:25

antiguru force-pushed the headless-compute-driver branch from 1855839 to 45780a2 Compare June 18, 2026 18:28

DAlperin approved these changes Jun 18, 2026

View reviewed changes

antiguru mentioned this pull request Jun 18, 2026

clusterd-test-driver: add explain verb to assert optimized plan shape antiguru/materialize#182

Closed

antiguru merged commit 6d4c0fb into MaterializeInc:main Jun 18, 2026
132 checks passed

antiguru mentioned this pull request Jun 18, 2026

clusterd-test-driver: add explain verb to assert optimized plan shape #37141

Open

antiguru mentioned this pull request Jun 19, 2026

clusterd-test-driver: ignore lowering tests under miri #37149

Merged

Uh oh!

Conversation

antiguru commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

What a script looks like

How it works

Verification

Uh oh!

antiguru left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

antiguru commented Jun 12, 2026

Uh oh!

def- left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DAlperin Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

DAlperin Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

antiguru Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

DAlperin left a comment

Choose a reason for hiding this comment

Uh oh!

DAlperin Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

antiguru Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

DAlperin Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

DAlperin Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

antiguru commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

antiguru commented Jun 12, 2026 •

edited

Loading