K3 Block A vast evidence (honest drafter handling)
Re-ran the feasibility smoke on H200 with the DFlash-honesty fix.
summary now: verifier_loadable/forward_ok=true; drafter_loadable=true
(backbone memory probe); drafter_faithful_transformers_load=false;
drafter_forward_ok=null (n/a — spec-decode-only); validation_path=
vllm_pr_41703_or_sglang. Verifier 2.77 tok/s. Confirms hardware
feasibility for the verifier; DFlash drafting protocol intentionally
NOT claimed here (deferred to the vLLM/SGLang run).
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
"reason": "architectures=['DFlashDraftModel'] is not loadable as a standalone transformers model (no auto_map / not a built-in class). DFlash is a block-diffusion speculative-decoding drafter; run it via vLLM (PR #41703) or SGLang per the model card. The transformers path here only loads the qwen3 backbone as a memory probe and does NOT exercise the DFlash drafting protocol.",
81
+
"validation_path": "vllm_pr_41703_or_sglang"
82
+
}
83
+
],
84
+
"summary": {
85
+
"status": "pass",
86
+
"verifier_loadable": true,
87
+
"verifier_forward_ok": true,
88
+
"drafter_loadable": true,
89
+
"drafter_faithful_transformers_load": false,
90
+
"drafter_forward_ok": null,
91
+
"drafter_note": "architectures=['DFlashDraftModel'] is not loadable as a standalone transformers model (no auto_map / not a built-in class). DFlash is a block-diffusion speculative-decoding drafter; run it via vLLM (PR #41703) or SGLang per the model card. The transformers path here only loads the qwen3 backbone as a memory probe and does NOT exercise the DFlash drafting protocol.",
[k3-smoke] NOTE: architectures=['DFlashDraftModel'] is not loadable as a standalone transformers model (no auto_map / not a built-in class). DFlash is a block-diffusion speculative-decoding drafter; run it via vLLM (PR #41703) or SGLang per the model card. The transformers path here only loads the qwen3 backbone as a memory probe and does NOT exercise the DFlash drafting protocol.
[k3-smoke] -> loading qwen3 backbone as a MEMORY PROBE ONLY (not a faithful DFlash load; standalone forward will be skipped).
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING: those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
[k3-smoke] drafter loaded in 4.1s (backbone memory probe)
[transformers] The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
[k3-smoke] verifier forward OK; gen=8 tokens in 2.89s (2.77 tok/s)
[k3-smoke] drafter forward SKIPPED (spec-decode-only drafter; validate via vLLM PR #41703 / SGLang — not transformers).
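The reported verifier throughput can be cross-checked against the token count and wall time on the same log line (8 tokens / 2.89 s). The sketch below parses that line and recomputes the rate. The regular expression and the `check_throughput` helper are illustrative assumptions based only on the line format shown above, not part of the smoke script.

```python
import re

# Log line copied from the smoke output above.
LINE = "[k3-smoke] verifier forward OK; gen=8 tokens in 2.89s (2.77 tok/s)"

# Assumed format: "gen=<int> tokens in <float>s (<float> tok/s)".
PATTERN = re.compile(r"gen=(\d+) tokens in ([\d.]+)s \(([\d.]+) tok/s\)")


def check_throughput(line: str, tol: float = 0.01):
    """Return (recomputed tok/s, whether it matches the reported value)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not match the expected k3-smoke format")
    tokens = int(match.group(1))
    seconds = float(match.group(2))
    reported = float(match.group(3))
    recomputed = tokens / seconds
    # The logged seconds value is already rounded, so compare with a tolerance.
    return recomputed, abs(recomputed - reported) <= tol


recomputed, consistent = check_throughput(LINE)
print(f"recomputed={recomputed:.3f} tok/s, consistent={consistent}")
```

A tolerance is used because both the elapsed time and the rate are rounded to two decimals in the log.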