test(mtp): add RED-bar scripts for converter + e2e determinism by dusterbloom · Pull Request #242 · Luce-Org/lucebox-hub

dusterbloom · 2026-05-21T11:19:51Z

Summary

Two shell test scripts carved out of #241 per @howard0su's review request to split the rename PR from the test additions.

What's added

dflash/test/test_mtp_converter.sh — Phase 1 RED test for convert_dflash_to_gguf.py --mtp-assistant mode. Validates the converter's GGUF output matches the AtomicChat gold reference on (1) tensor list, (2) required MTP tensors, (3) required GGUF metadata keys, (4) requires_target_arch == "gemma4". Exits 77 (autotools skip code) cleanly if the gold reference at $MTP_GOLD_GGUF is missing.
dflash/test/test_mtp_e2e.sh — Phase 3 RED test for --draft-method mtp byte-identical output vs --draft-method none at temp=0 seed=0. Exits 77 cleanly if the test_gemma4_dflash binary or the model files are missing.

Safe to land now

Both scripts:

Use #!/usr/bin/env bash + set -euo pipefail
Have no C++ namespace dependencies
Exit 77 (skip) when prerequisites aren't on disk
Have no global state, no side effects beyond writing to /tmp

They'll turn GREEN when the corresponding MTP work lands. Until then they're CI-stable skips.

Not in this PR

The third test file flagged in #241, dflash/test/test_flash_attn_sparse_mask.cpp, is deferred. It's a TDD red-bar that intentionally references symbols (flash_prefill_forward_bf16_masked(), 6-arg ggml_flash_attn_sparse()) that don't exist yet, so adding it to main without the spike that turns it green would leave a permanently broken build target. It will ship with the sparse-mask spike PR.

@howard0su

Two shell tests carved out of Luce-Org#241 per @howard0su's review request: - test_mtp_converter.sh: Phase 1 RED for convert_dflash_to_gguf.py --mtp-assistant mode. Validates the converter output matches the AtomicChat gold reference (tensor list, required MTP tensors, GGUF metadata keys, requires_target_arch == gemma4). Exits 77 cleanly if gold reference missing. - test_mtp_e2e.sh: Phase 3 RED for --draft-method mtp byte-identical output vs --draft-method none at temp=0 seed=0. Exits 77 cleanly if test_gemma4_dflash binary or models missing. Both safe to land now: scripts skip gracefully when their prerequisites aren't on disk, so they're CI-stable. Will turn GREEN when the corresponding MTP work lands.

cubic-dev-ai

2 issues found across 2 files

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

Two cubic findings on PR Luce-Org#242 (both valid P2): - Tensor-list extraction used 'grep -E \d' which matches a literal backslash-d in POSIX ERE, not a digit. Switched to [0-9]. - target-arch check was substring-based and would pass for values merely containing 'gemma4'. Switched to exact-equality. Script still exits 77 cleanly when prerequisites (gold model, safetensors) are absent.

davide221 · 2026-05-22T08:15:53Z

Submodule rollback hidden in diff, can you fix?

cubic-dev-ai Bot reviewed May 21, 2026

View reviewed changes

Comment thread dflash/test/test_mtp_converter.sh Outdated

Comment thread dflash/test/test_mtp_converter.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(mtp): add RED-bar scripts for converter + e2e determinism#242

test(mtp): add RED-bar scripts for converter + e2e determinism#242
dusterbloom wants to merge 2 commits into
Luce-Org:mainfrom
dusterbloom:test/mtp-red-bar-scripts

dusterbloom commented May 21, 2026

Uh oh!

cubic-dev-ai Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

davide221 commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dusterbloom commented May 21, 2026

Summary

What's added

Safe to land now

Not in this PR

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

davide221 commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cubic-dev-ai Bot left a comment •

edited

Loading