add: DFlash block diffusion speculative decoding #1128
Status: Closed
Commits (75):
- 3f89ea9 add: DFlash block diffusion speculative decoding (ChenhanYu)
- 190cb3a fix: rewrite DFlash to match SpecForge reference (ChenhanYu)
- b7a2a7b fix: correct mask_token_id and base model forward dispatch (ChenhanYu)
- a310d96 add: auto-detect mask_token_id for DFlash across model families (ChenhanYu)
- 972dfaa fix: prevent DDP deadlock during AR validation (ChenhanYu)
- 6c4eb80 fix: avoid DynamicModule dispatch loop in forward/training paths (ChenhanYu)
- 2c42363 fix: revert training/eval to super().forward() matching EAGLE pattern (ChenhanYu)
- a279960 fix: DDP deadlock when no valid loss positions on a rank (ChenhanYu)
- cbddc30 add: logit distillation option for DFlash training (ChenhanYu)
- c53a66a fix: print training accuracy to console at each log step (ChenhanYu)
- 2eabf57 fix: use response-only loss mask for DFlash training (ChenhanYu)
- 2a16232 fix: apply assistant_masks to labels in LanguageDataCollator (ChenhanYu)
- e3b9930 fix: robust response-only loss mask via regex assistant span detection (ChenhanYu)
- 07066c2 docs: add DFlash section to speculative decoding README (ChenhanYu)
- a32de63 fix: resolve DFlash components from base model architecture (ChenhanYu)
- 6a6a9ca fix: enable response-only loss mask for DFlash training (ChenhanYu)
- a777849 add: DFlash launcher example for Qwen3-8B (ChenhanYu)
- 2c56aca fix: inline values in DFlash launcher YAML for --yaml compatibility (ChenhanYu)
- 306fc3e add: unit tests for DFlash speculative decoding (ChenhanYu)
- c4a3ecb fix: add docstrings to DFlash classes for coverage check (ChenhanYu)
- 1c23ced add: AR validation step to DFlash launcher pipeline (ChenhanYu)
- 38450b0 fix: split DFlash tests into CPU (unit) and GPU tests (ChenhanYu)
- 4c2fc77 fix: correct DFlash attention mask test for reverse-causal pattern (ChenhanYu)
- bce17cf fix: remove __init__.py from GPU test dirs to avoid conftest conflict (ChenhanYu)
- 1165272 fix: match dtype in DFlash GPU tests to model dtype (ChenhanYu)
- 273ba32 fix: use Optional types for nullable DFlash arguments (ChenhanYu)
- 9bf9c34 fix: merge AR validation into DFlash training script (ChenhanYu)
- d19cd3b fix: align pseudo_speculative_generate with training masks (ChenhanYu)
- 73bb0cc fix: use standard causal mask within DFlash blocks (ChenhanYu)
- 3fa0d64 fix: increase DDP timeout to 1800s for DFlash training (ChenhanYu)
- 80afde2 fix: revert to SpecForge's reverse-causal mask (j >= i) (ChenhanYu)
- bfdd582 fix: use continuing position IDs for DFlash inference block (ChenhanYu)
- fb7acab fix: remove attention mask at DFlash inference, matching SpecForge (ChenhanYu)
- 290670f add: standalone DFlash training script with SpecForge data pipeline (ChenhanYu)
- eb6a0c9 fix: create attention mask in f32 then cast, matching SpecForge (ChenhanYu)
- 2c853c1 fix: use HF attention dispatch in DFlashAttention for SpecForge parity (ChenhanYu)
- d6adadb fix: default DFlash attention to sdpa matching SpecForge (ChenhanYu)
- 65df160 fix: initialize DFlash weights with normal_(std=0.02) matching SpecForge (ChenhanYu)
- 4451101 debug: add attn_fn resolution and per-layer comparison prints (ChenhanYu)
- 2726068 feat: update DFlash training to match SpecForge latest (post-PR #473) (ChenhanYu)
- 3516c0b fix: remove extra unsqueeze in DFlash training attention mask (ChenhanYu)
- 606e31d fix: create training attention mask in f32 to avoid bf16 overflow (ChenhanYu)
- e1237f7 fix: add dflash_num_anchors/loss_decay_gamma to launch_train.sh (ChenhanYu)
- b8e5eb7 feat: add logit distillation to new random-anchor DFlash training (ChenhanYu)
- b0df28c fix: add dflash_use_logit_distillation to launch_train.sh (ChenhanYu)
- 4226349 fix: shift teacher logits by -1 for DFlash logit distillation (ChenhanYu)
- 818eb74 fix: mask all tokens when assistant pattern not found (ChenhanYu)
- 49038c7 feat: auto-inject generation tags for reliable answer_only_loss (ChenhanYu)
- c49f6d9 fix: add <think> wrapper to simplified ChatML template for Qwen3 (ChenhanYu)
- ae2e7bd fix: remove think wrapper from simplified ChatML template (ChenhanYu)
- 82eedb2 feat: add chatml_think template variant for Qwen3 think injection (ChenhanYu)
- 4ebb9de docs: document simplified generation templates and limitations (ChenhanYu)
- 4561515 fix: ensure zero-loss path has gradient for DDP sync (ChenhanYu)
- 047ba1d fix: prefer conversations field when messages lacks assistant turn (ChenhanYu)
- bdcc0de cleanup: remove debug prints and regex fallback in DFlash and dataset… (ChenhanYu)
- e526016 fix: unwrap DDP model for AR validation to avoid deadlock (ChenhanYu)
- 633da55 fix: skip samples without assistant turns instead of crashing (ChenhanYu)
- 43afb06 fix: handle empty batch with dummy assistant turn (ChenhanYu)
- 90e9b4b fix: AR validation deadlock - eval all ranks, validate on rank 0 (ChenhanYu)
- 870db23 feat: add TensorBoard logging for DFlash training (ChenhanYu)
- 0d3f1fa fix: skip AR validation during DDP training to prevent deadlock (ChenhanYu)
- 9b76b7d feat: add DFlash export to z-lab compatible HF format (ChenhanYu)
- 1cfd558 fix: checkpoint resume + export-then-validate pipeline (ChenhanYu)
- 6a153bf feat: auto-detect HEAD_NODE_IP for multi-node DFlash training (ChenhanYu)
- c0c4330 fix: use explicit bfloat16 and device_map for export loading (ChenhanYu)
- 6684f47 fix: improve HEAD_NODE_IP auto-detection for multi-node (ChenhanYu)
- dd6e282 fix: multi-method HEAD_NODE_IP detection for multi-node (ChenhanYu)
- 3efd659 fix: force dp_shard_size=1 for DFlash DDP training (ChenhanYu)
- 7d0028e fix: support both DDP and FSDP for DFlash training (ChenhanYu)
- 496830d fix: use AutoModelForCausalLM directly for export loading (ChenhanYu)
- 7255969 fix: use export_hf_checkpoint.py script for DFlash export (ChenhanYu)
- 3a8ff9c fix: load model from output_dir for checkpoint resume (ChenhanYu)
- ba0132c add: DFlash results page + online validation for AR evaluation (ChenhanYu)
- d5a5200 fix: rename dataset to nvidia/Nemotron-Post-Training-Dataset-v2 (ChenhanYu)
- 7605414 chg: replace HTML results page with Markdown for GitHub rendering (ChenhanYu)
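Several of the mask-related commits (73bb0cc, 4c2fc77, 80afde2) converge on SpecForge's "reverse-causal" attention pattern, in which position i of a draft block attends to positions j >= i — the mirror image of a standard causal mask. A minimal boolean sketch of that pattern, in pure Python; the function names and list-of-lists representation are illustrative, not taken from the PR:

```python
def reverse_causal_mask(block_size):
    # Reverse-causal pattern named in commit 80afde2:
    # row i may attend to column j iff j >= i.
    return [[j >= i for j in range(block_size)] for i in range(block_size)]

def causal_mask(block_size):
    # Standard causal pattern (commit 73bb0cc briefly used this
    # within blocks before 80afde2 reverted it): j <= i.
    return [[j <= i for j in range(block_size)] for i in range(block_size)]
```

For a block of size 3, the reverse-causal mask is upper-triangular (including the diagonal), whereas the standard causal mask is lower-triangular — which is why the attention-mask unit test had to be corrected in 4c2fc77 once the pattern was settled.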