clusterd-test-driver: headless frontend to clusterd for scripted compute tests #37008
antiguru left a comment:
What we're missing is a specific language to script the headless driver. Encoding the interactions in Rust is a good first step, but ultimately we want something that is easier to iterate on. I could even imagine Python scripts that we load through pyo3 or so.
Agreed on a scripting language being the goal. I captured it under "Future work" in the design doc: the mechanism is already a thin scriptable surface (
def- left a comment:
I once built something very similar at a previous company, also using JSON inputs to test an execution engine directly instead of going through the entire database stack. Back then the test framework turned out not to be very useful, since it was easier to write unit tests and full system tests than the middle-ground headless JSON tests.
Do we have some concrete examples of regression tests we've wanted to build and failed before, which would be possible with clusterd-test-driver?
I still think it's a worthy experiment, and I can especially imagine that building a fuzzer for clusterd-test-driver with some invariants could lead us to interesting bugs.
610ac37 to 1855839
# Join the two sources on their key column — `#0` (left key) equals `#2` (right key)
# in the 4-column output — and arrange the result. `optimize` is what lets the
# `Join` lower.
create-dataflow name=join as-of=0 optimize
Not blocking, but it occurs to me that when the caller opts into the optimize verb, they might want to assert that the optimized variant looks a certain way, so that subtle drift in optimizer behavior can't silently change what the test exercises.
Actually, the more I think about this, it's probably pretty easy and worth doing, since it otherwise breaks the spirit of the test framework a bit, IMO.
Good call — done in a stacked PR (antiguru#182). Added an explain verb: same body as create-dataflow, but it renders the lowered LIR (EXPLAIN PHYSICAL PLAN form) as the golden instead of submitting. join.spec now asserts the differential-join plan shape alongside the count, so optimizer/lowering drift surfaces. The render is e.g.:
u2000:
  →Differential Join %0:u1000[#0] » %1:u1001[#0]
    →Arrange (#0)
      →Stream u1000
    →Arrange (#0)
      →Stream u1001
New crate providing an alternate frontend to `clusterd` that stands in for environmentd's controller in compute tests: it hosts persist PubSub, connects the compute CTP channel, and drives dataflows directly, with no SQL layer, catalog, or timestamp oracle.

This first piece is the harness library: connection-target resolution, the CTP connect + `Hello` handshake, persist-PubSub hosting, synthetic-row and direct persist writes, response demultiplexing (frontiers + peeks), the `Driver` (submit/schedule/await-frontier/peek), and `DataflowBuilder`, which lowers caller-supplied MIR to a shippable `RenderPlan` dataflow (optionally running the MIR optimizer). Exercised end-to-end by `tests/index_smoke.rs` against a real clusterd. The text script runner that drives this harness follows in a stacked PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drives the headless harness from text command scripts instead of recompiled Rust scenarios. A script is a sequence of commands, each with a `----` golden block that is the assertion; `create-dataflow` carries arbitrary MIR (parsed by `mz-expr-parser`, the `.spec` syntax) over the full `DataflowBuilder` surface, including index / materialized-view / subscribe exports and an opt-in `optimize`.

Adds the `text` parser, the `script` interpreter, the `headless-driver` binary, the mzbuild image and CI pipeline entry, the mzcompose composition and a host-local runner (`run-local.py`), and the scenario scripts (index, deep-history, side-effects, reconciliation, error-behavior, reduce, materialized-view, subscribe, join, index-and-mv, custom-schema). `Composition.run` gains `use_aliases` so the one-shot driver container can host the PubSub that clusterd dials.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1855839 to 45780a2
DAlperin left a comment:
Seems sane to me and like a good test framework to have.
/// and closes when it sends `InitializationComplete`; in between, the replica
/// reconciles the replayed dataflows against its live ones rather than
/// rehydrating.
pub async fn connect_and_hello(compute_addr: &str) -> anyhow::Result<ComputeCtpClient> {
This is only called from the driver. Might not need to be its own file
It's shared by two modules, not just the driver: driver.rs uses connect_and_hello, and responses.rs uses the ComputeCtpClient type for the response pump. Folding it into driver.rs would make the response pump depend on the driver just for the transport type, so I'd keep the CTP primitives in their own small module. Happy to revisit if you feel strongly.
}
PeekResponse::Error(e) => anyhow::bail!("peek error: {e}"),
PeekResponse::Canceled => anyhow::bail!("peek canceled"),
PeekResponse::Stashed(_) => anyhow::bail!("unexpected stashed peek result"),
I thought for sure I had caught Claude out here, but it explicitly disabled the peek stash, so this is fine :)
Thanks for the review!
The framework-selection list in the mz-test skill had no entry for dataflow/replica-level compute tests. Add the clusterd test driver (#37008) with its run commands and REWRITE workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXMSBfD9bXzJaqs52URyV
### Motivation

The `mz-clusterd-test-driver` crate (added in #37008) is not in the miri exclude list, so its unit tests run under nightly miri. Every `dataflow::tests::*` test drives MIR-to-LIR lowering, which reaches `mz_ore::stack::maybe_grow` -> `stacker` -> `psm::stack_pointer`, a foreign function miri cannot call. The tests therefore abort under miri (`unsupported operation: can't call foreign function rust_psm_stack_pointer on OS linux`).

### Description

Mark the six lowering tests `#[cfg_attr(miri, ignore)]`, matching the existing convention in `mz-expr` (e.g. `src/expr/src/visit.rs`, `scalar.rs`, `relation/join_input_mapper.rs`), rather than excluding the crate or shimming `mz_ore::stack`.

### Verification

`CARGO_TARGET_DIR=$PWD/miri-target MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-strict-provenance" cargo +nightly miri nextest run -p mz-clusterd-test-driver dataflow::tests` skips them (exit 0); a normal `cargo nextest run` still runs all of them.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
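The attribute named in the commit message above behaves as in this minimal sketch. Only `#[cfg_attr(miri, ignore)]` comes from the commit message; `depth` and `recurses_deeply` are hypothetical stand-ins for the lowering code and its tests, not names from the crate.

```rust
/// Hypothetical stand-in for a recursive lowering step (not from the crate).
fn depth(n: u32) -> u32 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}

#[cfg(test)]
mod tests {
    use super::depth;

    // Under `cargo miri` the `miri` cfg is set, so this expands to `#[ignore]`
    // and the test is skipped. In a normal test run the attribute expands to
    // nothing and the test runs as usual.
    #[test]
    #[cfg_attr(miri, ignore)]
    fn recurses_deeply() {
        assert_eq!(depth(64), 64);
    }
}
```

Because the attribute is scoped to individual tests, the rest of the crate's tests stay covered by miri, which is the trade-off the commit message describes against excluding the whole crate.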
### Motivation

Running a small compute experiment today means standing up a full `environmentd` (the SQL layer, the catalog, the coordinator), even when all you want is to hand `clusterd` a few commands and watch what it does. That is slow to set up and hard to control precisely.

This adds `mz-clusterd-test-driver`: a headless frontend that speaks the compute protocol to `clusterd` directly, with no `environmentd`. A test drives it from a text script, so it controls the exact persist state, the exact commands the replica sees, and the exact timestamps. Design doc: `doc/developer/design/20260612_headless_clusterd_test_driver.md`.

### What a script looks like
Each stanza is a command followed by a `----` block holding its expected output. That block is the assertion, and `REWRITE=1` regenerates it in place. Here is the whole lifecycle of a `count(*)` materialized view over a persist shard, read back at the end:

A command that fails renders as `error: <message>`, so an expected failure is just another golden block; there is no special assertion command. Because the waits are level-triggered on monotonic frontiers, the order a script waits in does not change the result, so a single sequential script stays deterministic.

### How it works
The crate is a generic mechanism, a dataflow builder, and the scripting layer on top.
- [...] `Hello`), and exposes a `Driver` that sends any `ComputeCommand`, submits dataflows without auto-scheduling, watches frontiers, and peeks.
- `DataflowBuilder` takes generic parts (persist imports, MIR objects to build, and index/materialized-view/subscribe exports) and runs the real MIR → LIR → `RenderPlan` lowering, because a `RenderPlan` can't be hand-built outside `mz-compute-types`. A build's computation is written in the `mz-expr-parser` `.spec` syntax (`Reduce aggregates=[count(*)]` over `Get u1000`) rather than a bespoke vocabulary, since `MirRelationExpr`'s own serde isn't hand-authorable (`Row` literals are opaque bytes).
- `create-dataflow` is the one abstraction behind index, materialized-view, and subscribe exports (`copy-to` isn't implemented yet). A materialized view is read back by `peek`ing the sink id, which becomes a persist peek of its output shard (the same path `SELECT * FROM mv` takes), and a subscribe streams responses that the driver buffers and `await-subscribe` drains. Dataflows start read-only, so a sink needs `allow-writes` before its writes land.
- [...] `create-instance`, optional `update-configuration`, `initialization-complete`), and `reconnect` re-runs it without `initialization-complete` to exercise reconciliation.

### Verification
`mzcompose` runs each scenario script against a real `clusterd` and fails on any golden mismatch; the scenarios are index, deep-history, side-effects, multi-dataflow, reconciliation, error-behavior, reduce, materialized-view, and subscribe. Unit tests cover the direct-write round trip, the lowered dataflow structure, and the script parser, and `run-local.py` runs the same scripts on the host (with `REWRITE=1`) without docker images.
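The golden-block check and the `REWRITE=1` behavior described in this PR can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's actual `text` parser or `script` interpreter: `Stanza`, `parse_script`, `run_script`, and the `exec` callback are hypothetical names, and the sketch assumes stanzas are separated by blank lines and golden blocks contain no blank lines.

```rust
/// One script stanza: the command text and the golden block after `----`.
/// (Hypothetical type, not taken from the crate.)
#[derive(Debug, PartialEq)]
struct Stanza {
    command: String,
    expected: String,
}

/// Split a script into stanzas. Assumption: blank lines separate stanzas and
/// never appear inside a golden block.
fn parse_script(src: &str) -> Vec<Stanza> {
    let mut stanzas = Vec::new();
    for chunk in src.split("\n\n") {
        let chunk = chunk.trim_matches('\n');
        if chunk.is_empty() {
            continue;
        }
        // Lines before the first `----` form the command; the rest is golden.
        let mut command = Vec::new();
        let mut expected = Vec::new();
        let mut in_golden = false;
        for line in chunk.lines() {
            if !in_golden && line == "----" {
                in_golden = true;
            } else if in_golden {
                expected.push(line);
            } else {
                command.push(line);
            }
        }
        stanzas.push(Stanza {
            command: command.join("\n"),
            expected: expected.join("\n"),
        });
    }
    stanzas
}

/// Run every stanza through `exec`. A failing command renders as
/// `error: <message>`, so an expected failure is compared like any other
/// output. With `rewrite` set, a mismatch is not an error; the returned
/// script carries the actual output as its new golden blocks.
fn run_script<F>(src: &str, rewrite: bool, mut exec: F) -> Result<String, String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut out = Vec::new();
    for stanza in parse_script(src) {
        let actual = match exec(&stanza.command) {
            Ok(output) => output,
            Err(message) => format!("error: {message}"),
        };
        if !rewrite && actual != stanza.expected {
            return Err(format!(
                "golden mismatch for `{}`:\nexpected:\n{}\nactual:\n{}",
                stanza.command, stanza.expected, actual
            ));
        }
        out.push(format!("{}\n----\n{}\n", stanza.command, actual));
    }
    Ok(out.join("\n"))
}
```

A caller would pass a closure that forwards each command to the driver and set `rewrite` from the environment, then write the returned text back to the script file when rewriting. Rendering failures as ordinary output keeps the comparison to a single string equality, which is why no separate assertion command is needed.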