[codex] DeepSeek FP4 MTP decode safeguards and MLA hooks #779
Draft
josusanmartin wants to merge 1 commit into
Conversation
ATOM PR Draft
Title
DeepSeek FP4 MTP decode safeguards and guarded small-batch MLA path
Summary
This PR contains DeepSeek R1 FP4 + MTP changes developed for the AMD MI355X finals environment. The patch focuses on correctness-preserving decode improvements and guarded experiment surfaces, including a guarded small-batch MLA decode path for q_len=4, qh=32, gqa=32 with FP8 KV decode.
Motivation
DeepSeek R1 FP4 with MTP spends substantial time in decode attention/MoE and MTP verifier plumbing. The competition workload uses fixed 8192/1024 random prompts at CONC=4,32,128, so safe decode-path improvements are valuable, but GSM8K correctness must remain intact. The submitted leaderboard runs using this stack passed GSM8K at all three concurrencies.
Implementation Notes
- atom/model_engine/scheduler.py: clamp speculative rollback and keep sequence token counts consistent after preemption.
- atom/model_ops/attention_mla.py: add a guarded direct AITER mla_decode_stage1_asm_fwd + mla_reduce_v1 path and scratch cache.
- atom/model_engine/model_runner.py: support an optional skip_logits path for exact greedy argmax from hidden states.
- atom/model_ops/embed_head.py: expose local LM-head projection for TP-local argmax.
- atom/model_ops/rejection_sampler.py: add an argmax-only verifier path for exact greedy speculative acceptance.
- atom/spec_decode/eagle.py and atom/utils/forward_context.py: carry optional completion counts and speculative metadata.
- atom/utils/envs.py: add guarded env vars, all default-off except existing behavior-compatible defaults.
Rule / Correctness Notes
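To illustrate the acceptance rule behind the argmax-only verifier path, the following is a minimal sketch (not the actual atom/model_ops/rejection_sampler.py code; the function name and shapes are assumptions). Under exact greedy decoding, a draft token is accepted iff it equals the target model's argmax at that position, so the verifier can accept the longest matching prefix, emit the target's token at the first mismatch, and append a bonus token when every draft token matches:

```python
import numpy as np

def greedy_accept(draft_tokens, target_logits):
    """Hypothetical argmax-only verifier for exact greedy speculative
    acceptance.

    draft_tokens:  list of num_spec proposed token ids.
    target_logits: array of shape [num_spec + 1, vocab] from the target
                   model scored over the draft positions plus one bonus slot.
    Returns the tokens actually emitted this step.
    """
    target_argmax = target_logits.argmax(axis=-1)  # greedy choice per position
    emitted = []
    for i, tok in enumerate(draft_tokens):
        if tok == int(target_argmax[i]):
            emitted.append(tok)  # draft agrees with greedy target: accept
        else:
            # First mismatch: substitute the target's greedy token and stop.
            emitted.append(int(target_argmax[i]))
            return emitted
    # All draft tokens accepted: the target's last position yields a bonus token.
    emitted.append(int(target_argmax[len(draft_tokens)]))
    return emitted
```

Because both sides take a hard argmax, this acceptance is bit-exact with non-speculative greedy decoding, which is why such a path can be enabled without risking GSM8K correctness.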
Validation
Submitted leaderboard results:
Known limitation: these submitted numbers are accuracy-safe but do not clear all DeepSeek performance gates.