Skip to content

test(mtp): add RED-bar scripts for converter + e2e determinism#242

Open
dusterbloom wants to merge 2 commits into
Luce-Org:mainfrom
dusterbloom:test/mtp-red-bar-scripts
Open

test(mtp): add RED-bar scripts for converter + e2e determinism#242
dusterbloom wants to merge 2 commits into
Luce-Org:mainfrom
dusterbloom:test/mtp-red-bar-scripts

Conversation

@dusterbloom
Copy link
Copy Markdown
Contributor

Summary

Two shell test scripts carved out of #241 per @howard0su's review request to split the rename PR from the test additions.

What's added

  • dflash/test/test_mtp_converter.shPhase 1 RED test for convert_dflash_to_gguf.py --mtp-assistant mode. Validates the converter's GGUF output matches the AtomicChat gold reference on (1) tensor list, (2) required MTP tensors, (3) required GGUF metadata keys, (4) requires_target_arch == "gemma4". Exits 77 (autotools skip code) cleanly if the gold reference at $MTP_GOLD_GGUF is missing.

  • dflash/test/test_mtp_e2e.shPhase 3 RED test for --draft-method mtp byte-identical output vs --draft-method none at temp=0 seed=0. Exits 77 cleanly if the test_gemma4_dflash binary or the model files are missing.

Safe to land now

Both scripts:

  • Use #!/usr/bin/env bash + set -euo pipefail
  • Have no C++ namespace dependencies
  • Exit 77 (skip) when prerequisites aren't on disk
  • Have no global state, no side effects beyond writing to /tmp

They'll turn GREEN when the corresponding MTP work lands. Until then they're CI-stable skips.

Not in this PR

The third test file flagged in #241, dflash/test/test_flash_attn_sparse_mask.cpp, is deferred. It's a TDD red-bar that intentionally references symbols (flash_prefill_forward_bf16_masked(), 6-arg ggml_flash_attn_sparse()) that don't exist yet, so adding it to main without the spike that turns it green would leave a permanently broken build target. It will ship with the sparse-mask spike PR.

Two shell tests carved out of Luce-Org#241 per @howard0su's review request:

- test_mtp_converter.sh: Phase 1 RED for convert_dflash_to_gguf.py
  --mtp-assistant mode. Validates the converter output matches the
  AtomicChat gold reference (tensor list, required MTP tensors, GGUF
  metadata keys, requires_target_arch == gemma4). Exits 77 cleanly if
  gold reference missing.

- test_mtp_e2e.sh: Phase 3 RED for --draft-method mtp byte-identical
  output vs --draft-method none at temp=0 seed=0. Exits 77 cleanly if
  test_gemma4_dflash binary or models missing.

Both safe to land now: scripts skip gracefully when their prerequisites
aren't on disk, so they're CI-stable. Will turn GREEN when the
corresponding MTP work lands.
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 2 files

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread dflash/test/test_mtp_converter.sh Outdated
Comment thread dflash/test/test_mtp_converter.sh
Two cubic findings on PR Luce-Org#242 (both valid P2):
- Tensor-list extraction used 'grep -E \d' which matches a literal
  backslash-d in POSIX ERE, not a digit. Switched to [0-9].
- target-arch check was substring-based and would pass for values
  merely containing 'gemma4'. Switched to exact-equality.

Script still exits 77 cleanly when prerequisites (gold model,
safetensors) are absent.
@davide221
Copy link
Copy Markdown
Contributor

Submodule rollback hidden in diff, can you fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants