test(e2e): golden behavioural-equivalence harness (Tier A + B)#213
Merged
meld's central claim is "fusion preserves observable behaviour." This harness falsifies it differentially: run a real component unfused under deterministic wasmtime (the result IS the golden), meld-fuse it, run the fused output the same way, assert identical observable behaviour. No hand-authored expected values.

Tier A (active, green): single-component round-trip equivalence over the wit-bindgen ABI fixtures + real cross-language command components (hello_rust/c/cpp), both memory strategies. 12 fuse-and-run checks; the hello_* ones assert byte-identical stdout. SharedMemory+rebasing correctly declines memory.grow fixtures (logged skip, not a divergence).

Tier B (discovery oracle, #[ignore] on meld#212): fuse a real two-component composition (consumer.runner.compute -> provider.add(20,22) = 42, built offline via wasm-tools + wac, see compose/build.sh) and assert the fused output computes 42 standalone. Building it surfaced three real multi-component fusion gaps (meld#212): separate-input cross-component links not internalised, wac-composed exports dropped, bare-world func-export result type dropped. The test body is the fix's acceptance test; un-ignore when #212 lands.

Honest boundary: equivalence under wasmtime (reference runtime), NOT the synth/kiln MCU target; that hardware smoke is tracked separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LS-N verification gate
Approved Failed LS entries(none) Missing regression tests
Answers "how do I know it really works" with a differential end-to-end test of meld's central claim, that fusion preserves observable behaviour:

1. Run a real component unfused under deterministic wasmtime; the result is the golden.
2. meld fuse it.
3. Run the fused output the same way.
4. Assert identical observable behaviour.

Tier A: active, green (12 checks)

Single-component round-trip equivalence over the wit-bindgen ABI fixtures and real cross-language command components (hello_rust / hello_c_cli / hello_cpp_cli), both memory strategies. The hello_* cases assert byte-identical stdout: a real behavioural diff, not just "didn't trap." SharedMemory+rebasing correctly declines memory.grow fixtures (logged skip; meld refusing is not a divergence).

Tier B: discovery oracle (#[ignore] on #212)

Fuses a real two-component composition built offline with wasm-tools + wac (compose/build.sh): consumer.runner.compute() calls provider.add(20,22) = 42. The body asserts the meld-fused output computes 42 standalone. It is the acceptance test for #212; un-ignore when it lands.

Building Tier B surfaced three real multi-component fusion gaps (filed as #212):

- Separate-input cross-component links are not internalised.
- wac-composed inputs lose their top-level export (empty world root {}).
- The bare-world func-export result type is dropped.

Tier A proves real wit-bindgen compositions fuse + run with identical behaviour today; Tier B marks the boundary of what multi-component fusion doesn't yet handle.
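The shape of the differential check can be sketched in a few lines of Rust. This is a minimal illustration, not the harness in this PR: `Observed`, `FuseResult`, `Verdict`, `check_equivalence`, and the stub component and fusers are all hypothetical names, and the stubs stand in for real wasmtime runs and real meld invocations. It assumes "observable behaviour" means stdout bytes plus exit code, and it treats a fuser that declines an input as a skip rather than a divergence, as Tier A does for memory.grow fixtures.

```rust
/// What a run exposes to the outside (assumed here: stdout bytes and exit code).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Observed {
    stdout: Vec<u8>,
    exit_code: i32,
}

/// Hypothetical outcome of asking a fuser to process a component.
enum FuseResult<C> {
    Fused(C),
    Declined(String), // the fuser refused the input and gave a reason
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Equivalent,
    Skipped(String),
    Diverged { golden: Observed, fused: Observed },
}

/// Run unfused (that result is the golden), fuse, run again, compare.
/// No expected values are written by hand.
fn check_equivalence<C>(
    component: &C,
    run: impl Fn(&C) -> Observed,
    fuse: impl Fn(&C) -> FuseResult<C>,
) -> Verdict {
    let golden = run(component);
    match fuse(component) {
        FuseResult::Declined(reason) => Verdict::Skipped(reason),
        FuseResult::Fused(fused_component) => {
            let fused = run(&fused_component);
            if fused == golden {
                Verdict::Equivalent
            } else {
                Verdict::Diverged { golden, fused }
            }
        }
    }
}

// Stand-ins so the sketch runs without a wasm runtime (all hypothetical).
#[derive(Clone)]
struct StubComponent {
    output: String,
}

fn run_stub(c: &StubComponent) -> Observed {
    Observed { stdout: c.output.as_bytes().to_vec(), exit_code: 0 }
}

fn faithful_fuse(c: &StubComponent) -> FuseResult<StubComponent> {
    FuseResult::Fused(c.clone())
}

fn lossy_fuse(_c: &StubComponent) -> FuseResult<StubComponent> {
    // Simulates a fusion bug that loses the program's output.
    FuseResult::Fused(StubComponent { output: String::new() })
}

fn declining_fuse(_c: &StubComponent) -> FuseResult<StubComponent> {
    FuseResult::Declined("memory.grow unsupported".to_string())
}

fn main() {
    let component = StubComponent { output: "hello\n".to_string() };
    let verdicts = vec![
        check_equivalence(&component, run_stub, faithful_fuse),
        check_equivalence(&component, run_stub, lossy_fuse),
        check_equivalence(&component, run_stub, declining_fuse),
    ];
    for verdict in verdicts {
        match verdict {
            Verdict::Equivalent => println!("equivalent"),
            Verdict::Skipped(reason) => println!("skipped: {}", reason),
            Verdict::Diverged { golden, fused } => {
                println!("diverged: golden {:?} vs fused {:?}", golden, fused)
            }
        }
    }
}
```

Keeping `Skipped` distinct from `Diverged` is the point of the three-way verdict: a refusal is logged but does not fail the run, while any byte-level difference in stdout does.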
Honest boundary
Equivalence is proven under wasmtime (the reference runtime), not the synth/kiln MCU target — a module passing here can still break after synth transcodes it. That cross-repo hardware smoke is tracked separately (owner: you).
Fixtures are committed (*.wasm is gitignored; force-added like the existing fixtures) and regenerable via compose/build.sh.

🤖 Generated with Claude Code