Skip to content

clusterd-test-driver: headless frontend to clusterd for scripted compute tests#37008

Merged
antiguru merged 2 commits into
MaterializeInc:mainfrom
antiguru:headless-compute-driver
Jun 18, 2026
Merged

clusterd-test-driver: headless frontend to clusterd for scripted compute tests#37008
antiguru merged 2 commits into
MaterializeInc:mainfrom
antiguru:headless-compute-driver

Conversation

@antiguru

@antiguru antiguru commented Jun 12, 2026

Copy link
Copy Markdown
Member

Motivation

Running a small compute experiment today means standing up a full environmentd — the SQL layer, the catalog, the coordinator — even when all you want is to hand clusterd a few commands and watch what it does. That is slow to set up and hard to control precisely.

This adds mz-clusterd-test-driver: a headless frontend that speaks the compute protocol to clusterd directly, with no environmentd. A test drives it from a text script, so it controls the exact persist state, the exact commands the replica sees, and the exact timestamps. Design doc: doc/developer/design/20260612_headless_clusterd_test_driver.md.

What a script looks like

Each stanza is a command followed by a ---- block holding its expected output. That block is the assertion, and REWRITE=1 regenerates it in place. Here is the whole lifecycle of a count(*) materialized view over a persist shard, read back at the end:

create-instance
----
ok

initialization-complete
----
ok

write-single-ts shard=events ts=0 count=3000
----
wrote 3000

define-schema name=count_out
  count bigint
----
ok

create-dataflow name=count-mv as-of=0
  import source=1000 shard=events upper=1
  build id=2000
    Reduce aggregates=[count(*)]
      Get u1000
  export kind=materialized-view sink=2001 on=2000 shard=mv schema=count_out
----
ok

schedule id=2001
----
ok

allow-writes id=2001
----
ok

peek id=2001 schema=count_out ts=0
----
3000

A command that fails renders as error: <message>, so an expected failure is just another golden block — there is no special assertion command. Because the waits are level-triggered on monotonic frontiers, the order a script waits in does not change the result, so a single sequential script stays deterministic.

How it works

The crate is a generic mechanism, a dataflow builder, and the scripting layer on top.

  • The mechanism hosts the persist PubSub server, connects over CTP (sending only Hello), and exposes a Driver that sends any ComputeCommand, submits dataflows without auto-scheduling, watches frontiers, and peeks.
  • DataflowBuilder takes generic parts — persist imports, MIR objects to build, and index/materialized-view/subscribe exports — and runs the real MIR → LIR → RenderPlan lowering, because a RenderPlan can't be hand-built outside mz-compute-types. A build's computation is written in the mz-expr-parser .spec syntax (Reduce aggregates=[count(*)] over Get u1000) rather than a bespoke vocabulary, since MirRelationExpr's own serde isn't hand-authorable (Row literals are opaque bytes).
  • create-dataflow is the one abstraction behind index, materialized-view, and subscribe exports (copy-to isn't implemented yet). A materialized view is read back by peeking the sink id — that becomes a persist peek of its output shard, the same path SELECT * FROM mv takes — and a subscribe streams responses that the driver buffers and await-subscribe drains. Dataflows start read-only, so a sink needs allow-writes before its writes land.
  • The handshake is explicit (create-instance, optional update-configuration, initialization-complete), and reconnect re-runs it without initialization-complete to exercise reconciliation.

Verification

mzcompose runs each scenario script against a real clusterd and fails on any golden mismatch; the scenarios are index, deep-history, side-effects, multi-dataflow, reconciliation, error-behavior, reduce, materialized-view, and subscribe. Unit tests cover the direct-write round trip, the lowered dataflow structure, and the script parser, and run-local.py runs the same scripts on the host (with REWRITE=1) without docker images.

@antiguru antiguru left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we're missing is a specific language to script the headless driver. Encoding the interactions in Rust is a good first step, but ultimately we want something that is easier to iterate on. I could even imagine Python scripts that we load through pyo3 or so.

Comment thread src/compute-test-driver/src/lib.rs Outdated
@antiguru

Copy link
Copy Markdown
Member Author

Agreed on a scripting language being the goal. I captured it under "Future work" in the design doc: the mechanism is already a thin scriptable surface (send, submit_dataflow, schedule, expect_frontier, peek, subscribe_raw), so a declarative script or Python-via-pyo3 layer would bind to it rather than replace it. Left it out of this PR to keep the first step focused.

@antiguru antiguru changed the title compute-test-driver: headless frontend to clusterd for scripted compute tests clusterd-test-driver: headless frontend to clusterd for scripted compute tests Jun 12, 2026
@antiguru antiguru marked this pull request as ready for review June 12, 2026 18:04
@antiguru antiguru requested a review from a team as a code owner June 12, 2026 18:04
@antiguru antiguru requested review from def- and frankmcsherry June 12, 2026 18:04

@def- def- left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have once built something very similar at a previous company, also using json inputs to test an execution engine directly instead of going through the entire database stack. Back then the test framework turned out to not be very useful since it was easier to write unit tests and full system tests instead of the middle-ground headless json tests.

Do we have some concrete examples of regression tests we've wanted to build and failed before, which would be possible with clusterd-test-driver?

I still think it's a worthy experiment, and I can especially imagine that building a fuzzer for clusterd-test-driver with some invariants could lead us to interesting bugs.

Comment thread test/clusterd-test-driver/mzcompose.py Outdated
Comment thread src/clusterd-test-driver/src/script.rs Outdated
Comment thread src/clusterd-test-driver/src/script.rs Outdated
@antiguru antiguru requested review from DAlperin and def- June 15, 2026 12:45
@antiguru antiguru force-pushed the headless-compute-driver branch 5 times, most recently from 610ac37 to 1855839 Compare June 16, 2026 09:44
# Join the two sources on their key column — `#0` (left key) equals `#2` (right key)
# in the 4-column output — and arrange the result. `optimize` is what lets the
# `Join` lower.
create-dataflow name=join as-of=0 optimize

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but it occurs to me that when the caller opts into the optimize verb they might want to be able to assert that the optimized variant looks a certain way so that subtle optimizer behavior drift can't unknowingly meaningfully change the test.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the more I think about this its probably pretty easy and worth doing since it otherwise breaks the spirit of the test framework a bit IMO

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — done in a stacked PR (antiguru#182). Added an explain verb: same body as create-dataflow, but it renders the lowered LIR (EXPLAIN PHYSICAL PLAN form) as the golden instead of submitting. join.spec now asserts the differential-join plan shape alongside the count, so optimizer/lowering drift surfaces. The render is e.g.:

u2000:
  →Differential Join %0:u1000[#0] » %1:u1001[#0]
    →Arrange (#0)
      →Stream u1000
    →Arrange (#0)
      →Stream u1001

antiguru and others added 2 commits June 18, 2026 20:25
New crate providing an alternate frontend to `clusterd` that stands in for
environmentd's controller in compute tests: it hosts persist PubSub, connects the
compute CTP channel, and drives dataflows directly, with no SQL layer, catalog,
or timestamp oracle.

This first piece is the harness library: connection-target resolution, the CTP
connect + `Hello` handshake, persist-PubSub hosting, synthetic-row and direct
persist writes, response demultiplexing (frontiers + peeks), the `Driver`
(submit/schedule/await-frontier/peek), and `DataflowBuilder`, which lowers
caller-supplied MIR to a shippable `RenderPlan` dataflow (optionally running the
MIR optimizer). Exercised end-to-end by `tests/index_smoke.rs` against a real
clusterd.

The text script runner that drives this harness follows in a stacked PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drives the headless harness from text command scripts instead of recompiled Rust
scenarios. A script is a sequence of commands, each with a `----` golden block
that is the assertion; `create-dataflow` carries arbitrary MIR (parsed by
`mz-expr-parser`, the `.spec` syntax) over the full `DataflowBuilder` surface,
including index / materialized-view / subscribe exports and an opt-in `optimize`.

Adds the `text` parser, the `script` interpreter, the `headless-driver` binary,
the mzbuild image and CI pipeline entry, the mzcompose composition and a
host-local runner (`run-local.py`), and the scenario scripts (index,
deep-history, side-effects, reconciliation, error-behavior, reduce,
materialized-view, subscribe, join, index-and-mv, custom-schema).
`Composition.run` gains `use_aliases` so the one-shot driver container can host
the PubSub that clusterd dials.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@antiguru antiguru force-pushed the headless-compute-driver branch from 1855839 to 45780a2 Compare June 18, 2026 18:28

@DAlperin DAlperin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems sane to me and like a good test framework to have.

/// and closes when it sends `InitializationComplete`; in between, the replica
/// reconciles the replayed dataflows against its live ones rather than
/// rehydrating.
pub async fn connect_and_hello(compute_addr: &str) -> anyhow::Result<ComputeCtpClient> {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only called from the driver. Might not need to be its own file

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's shared by two modules, not just the driver: driver.rs uses connect_and_hello, and responses.rs uses the ComputeCtpClient type for the response pump. Folding it into driver.rs would make the response pump depend on the driver just for the transport type, so I'd keep the CTP primitives in their own small module. Happy to revisit if you feel strongly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed it in my grep. My bad!

}
PeekResponse::Error(e) => anyhow::bail!("peek error: {e}"),
PeekResponse::Canceled => anyhow::bail!("peek canceled"),
PeekResponse::Stashed(_) => anyhow::bail!("unexpected stashed peek result"),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought for sure I had claude out here but it explicitly disabled the peek stash so this is fine:)

@antiguru antiguru merged commit 6d4c0fb into MaterializeInc:main Jun 18, 2026
132 checks passed
@antiguru

Copy link
Copy Markdown
Member Author

Thanks for the review!

antiguru pushed a commit that referenced this pull request Jun 18, 2026
The framework-selection list in the mz-test skill had no entry for
dataflow/replica-level compute tests. Add the clusterd test driver
(#37008) with its run commands and REWRITE workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXMSBfD9bXzJaqs52URyV
antiguru added a commit that referenced this pull request Jun 19, 2026
### Motivation

The `mz-clusterd-test-driver` crate (added in #37008) is not in the miri
exclude list, so its unit tests run under nightly miri.
Every `dataflow::tests::*` test drives MIR-to-LIR lowering, which
reaches `mz_ore::stack::maybe_grow` -> `stacker` ->
`psm::stack_pointer`, a foreign function miri cannot call.
The tests therefore abort under miri (`unsupported operation: can't call
foreign function rust_psm_stack_pointer on OS linux`).

### Description

Mark the six lowering tests `#[cfg_attr(miri, ignore)]`, matching the
existing convention in `mz-expr` (e.g. `src/expr/src/visit.rs`,
`scalar.rs`, `relation/join_input_mapper.rs`), rather than excluding the
crate or shimming `mz_ore::stack`.

### Verification

`CARGO_TARGET_DIR=$PWD/miri-target MIRIFLAGS="-Zmiri-disable-isolation
-Zmiri-strict-provenance" cargo +nightly miri nextest run -p
mz-clusterd-test-driver dataflow::tests` skips them (exit 0); a normal
`cargo nextest run` still runs all of them.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants