[Feat,Refactor]: Offline Dflash; Spec Mixin; Deprecate parallel draft;#1271
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 Walkthrough

Consolidates speculative-decoding logic into dedicated modules, adds HFSpecDecMixin for shared base-model utilities, extracts EAGLE/DFlash draft models into new modeling modules, introduces a standalone HFMedusa plugin, adds DFlash offline/config validators, and updates call sites, imports, and defaults accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant Mixin as HFSpecDecMixin
    participant Base as BaseModel
    participant Draft as DraftModule
    participant Head as LMHead
    Caller->>Mixin: forward(input_ids, attention_mask, ...)
    Mixin->>Base: _base_model_forward(input_ids, attention_mask, freeze=...)
    Base-->>Mixin: BaseOutputs (hidden_states, logits, aux)
    Mixin->>Draft: forward(noise_embedding, target_hidden, position_ids, mask)
    Draft-->>Mixin: draft_logits / draft_hidden
    Mixin->>Head: compute final logits (optionally from draft/base)
    Head-->>Mixin: logits
    Mixin-->>Caller: ModelOutput (loss, logits, draft_logits, aux)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Actionable comments posted: 5
🧹 Nitpick comments (1)
modelopt/torch/speculative/plugins/modeling_dflash.py (1)
118-124: Consider handling `None` for `_attn_implementation` more explicitly.

The code assumes `config._attn_implementation` is set (per the comment referencing `dflash/default_config.py`), but if it's `None`, `ALL_ATTENTION_FUNCTIONS.get(None, ...)` would still work and fall back to SDPA. However, this could be made more explicit for clarity.

♻️ Optional: More explicit None handling

```diff
 def _get_attn_fn(self):
     """Lazily resolve the HF attention function (default: sdpa)."""
     if self._attn_fn is not None:
         return self._attn_fn
-    impl = self.config._attn_implementation  # default set in dflash/default_config.py
+    impl = self.config._attn_implementation or "sdpa"
     self._attn_fn = ALL_ATTENTION_FUNCTIONS.get(impl, ALL_ATTENTION_FUNCTIONS["sdpa"])
     return self._attn_fn
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/speculative/plugins/modeling_dflash.py` around lines 118 - 124, The _get_attn_fn method should explicitly handle a None or missing config._attn_implementation instead of relying on dict.get's default; update _get_attn_fn to read impl = self.config._attn_implementation and if impl is None or impl not in ALL_ATTENTION_FUNCTIONS explicitly assign impl = "sdpa" (or the intended default) before setting self._attn_fn via ALL_ATTENTION_FUNCTIONS[impl], so callers of _get_attn_fn and readers of the code see clear, intentional fallback behavior referencing the _get_attn_fn method, config._attn_implementation, and ALL_ATTENTION_FUNCTIONS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/speculative/plugins/hf_medusa.py`:
- Line 137: The call in hf_medusa.py is passing an incorrectly named keyword
rcache_position which causes the cache position to be ignored; update the
invocation (around the code that constructs the KV cache / calls the method
using rcache_position) to use the correct keyword name cache_position so the
value is honored (search for rcache_position in the hf_medusa.py plugin and
replace it with cache_position in that function/method call).
- Around line 162-171: The loop mutates labels in-place causing misaligned
targets for later heads; preserve the original labels (e.g., store
original_labels = labels before the for loop) and inside the loop compute a
shifted copy like shifted = original_labels[..., 1 + i :].contiguous() (instead
of reassigning labels), then use loss_logits = medusa_logits[i][:, : -(1 +
i)].contiguous(), loss_labels = shifted.view(-1), and compute loss as before
with loss_fct, medusa_decay_coefficient, and medusa_heads_coefficient to avoid
cumulative shifts across iterations.
In `@modelopt/torch/speculative/plugins/hf_spec_mixin.py`:
- Around line 157-165: The code temporarily mutates the global
torch._dynamo.config.suppress_errors but never restores it; wrap the change
around the compile loop by saving the original value of
torch._dynamo.config.suppress_errors, set it to True before iterating
self._compile_targets and compiling each target (using getattr(self, name),
torch.compile(...), setattr(self, name, ...)), and ensure you restore the
original suppress_errors value in a finally block so the global config is
returned to its prior state regardless of compilation success or exceptions.
- Around line 169-175: HFDFlashModel currently inherits
HFSpecDecMixin.get_dummy_inputs which raises NotImplementedError; add an
override in HFDFlashModel (hf_dflash.py) implementing get_dummy_inputs()
following the same shape/keys pattern used by HFEagleModel (so it returns a dict
of dummy tensors/arrays for the export forward pass rather than raising),
ensuring the method signature matches the base and provides the expected keys
consumed by unified_export_hf.py (called at the export flow around
unified_export_hf.py:386) so speculative/offline export does not fail at
runtime.
In `@modelopt/torch/speculative/plugins/modeling_eagle.py`:
- Around line 159-163: The code always assigns self._input_embeds from
self.layers[0].input_layernorm(inputs_embeds) even when the EAGLE-3 pre-hook
that consumes it is not registered; guard the assignment so it only runs when
the pre-hook will consume the value (e.g., check
self.config.use_aux_hidden_state or the same condition used when registering the
pre-hook) to avoid retaining an unnecessary activation tensor. Update the block
that sets self._input_embeds (referencing inputs_embeds,
self.layers[0].input_layernorm, and self._input_embeds) to perform the
assignment only when use_aux_hidden_state (or the hook-registered flag) is true.
---
Nitpick comments:
In `@modelopt/torch/speculative/plugins/modeling_dflash.py`:
- Around line 118-124: The _get_attn_fn method should explicitly handle a None
or missing config._attn_implementation instead of relying on dict.get's default;
update _get_attn_fn to read impl = self.config._attn_implementation and if impl
is None or impl not in ALL_ATTENTION_FUNCTIONS explicitly assign impl = "sdpa"
(or the intended default) before setting self._attn_fn via
ALL_ATTENTION_FUNCTIONS[impl], so callers of _get_attn_fn and readers of the
code see clear, intentional fallback behavior referencing the _get_attn_fn
method, config._attn_implementation, and ALL_ATTENTION_FUNCTIONS.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0cc18e20-ee50-4dfc-bcc2-bad5a1a6e6e1
📒 Files selected for processing (16)
- .pre-commit-config.yaml
- examples/speculative_decoding/eagle_utils.py
- examples/speculative_decoding/main.py
- examples/speculative_decoding/scripts/ar_validate.py
- modelopt/torch/export/plugins/hf_spec_export.py
- modelopt/torch/speculative/config.py
- modelopt/torch/speculative/dflash/dflash_model.py
- modelopt/torch/speculative/eagle/default_config.py
- modelopt/torch/speculative/plugins/__init__.py
- modelopt/torch/speculative/plugins/hf_dflash.py
- modelopt/torch/speculative/plugins/hf_eagle.py
- modelopt/torch/speculative/plugins/hf_medusa.py
- modelopt/torch/speculative/plugins/hf_spec_mixin.py
- modelopt/torch/speculative/plugins/modeling_dflash.py
- modelopt/torch/speculative/plugins/modeling_eagle.py
- modelopt/torch/speculative/utils.py
💤 Files with no reviewable changes (1)
- modelopt/torch/speculative/eagle/default_config.py
```python
inputs_embeds = inputs_embeds.to(hidden_states.dtype).to(hidden_states.device)
# In EAGLE-3, we save input embeddings to attribute, and use it in first decoder layer by hook function
# Also, we normalize input embeddings and hidden states before concatenating them.
# The default input norm in first layer attn will be disabled.
self._input_embeds = self.layers[0].input_layernorm(inputs_embeds)
```
Only stash _input_embeds when the EAGLE-3 pre-hook will consume it.
These lines run for every config, but the pre-hook is only registered under use_aux_hidden_state. In the normal path this leaves one extra activation tensor hanging off self for no reason, which increases memory pressure and can keep autograd state alive longer than necessary.
♻️ Proposed fix

```diff
 inputs_embeds = inputs_embeds.to(hidden_states.dtype).to(hidden_states.device)
 # In EAGLE-3, we save input embeddings to attribute, and use it in first decoder layer by hook function
 # Also, we normalize input embeddings and hidden states before concatenating them.
 # The default input norm in first layer attn will be disabled.
-self._input_embeds = self.layers[0].input_layernorm(inputs_embeds)
+if self.config.use_aux_hidden_state:
+    self._input_embeds = self.layers[0].input_layernorm(inputs_embeds)
+else:
+    self._input_embeds = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/speculative/plugins/modeling_eagle.py` around lines 159 - 163,
The code always assigns self._input_embeds from
self.layers[0].input_layernorm(inputs_embeds) even when the EAGLE-3 pre-hook
that consumes it is not registered; guard the assignment so it only runs when
the pre-hook will consume the value (e.g., check
self.config.use_aux_hidden_state or the same condition used when registering the
pre-hook) to avoid retaining an unnecessary activation tensor. Update the block
that sets self._input_embeds (referencing inputs_embeds,
self.layers[0].input_layernorm, and self._input_embeds) to perform the
assignment only when use_aux_hidden_state (or the hook-registered flag) is true.
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1271      +/-   ##
==========================================
- Coverage   75.58%   72.23%    -3.36%
==========================================
  Files         459      463        +4
  Lines       48613    48661       +48
==========================================
- Hits        36745    35149    -1596
- Misses      11868    13512    +1644
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Force-pushed from 6df234f to f91cf9d (Compare)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Force-pushed from f91cf9d to f82a9ef (Compare)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/speculative/plugins/__init__.py (1)
16-21: ⚠️ Potential issue | 🟡 Minor

Keep the plugin list in this docstring in sync with the exports.

The module now re-exports `hf_dflash` and `hf_medusa` at Lines 33-35, so saying the package currently supports only `hf_eagle` is misleading for readers and generated docs. Either list all three HuggingFace entrypoints here or rephrase this bullet as a non-exhaustive example.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/speculative/plugins/__init__.py` around lines 16 - 21, The docstring currently lists only hf_eagle but the module also re-exports hf_dflash and hf_medusa; update the top-level docstring in modelopt.torch.speculative.plugins (the triple-quoted module docstring) to either enumerate all three plugins (hf_eagle, hf_dflash, hf_medusa) or change the bullet to indicate it is a non-exhaustive/example list; ensure the text matches the actual exports so generated docs are accurate and consistent with the re-exports of hf_eagle, hf_dflash and hf_medusa.
♻️ Duplicate comments (5)
modelopt/torch/speculative/plugins/hf_medusa.py (2)
113-123: ⚠️ Potential issue | 🔴 Critical

Use the correct Transformers kwarg here: `cache_position`.

`rcache_position` is not a recognized forward argument. On models without a `**kwargs` escape hatch this raises immediately; otherwise the cache position is silently ignored and KV-cache behavior becomes wrong.

🐛 Proposed fix

```diff
-    rcache_position=cache_position,
+    cache_position=cache_position,
```

In transformers 4.56.0, is `rcache_position` a valid `PreTrainedModel.forward` keyword argument, or should callers use `cache_position`?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/speculative/plugins/hf_medusa.py` around lines 113 - 123, The forward call is passing an invalid kwarg name rcache_position to self.model when building outputs; change that argument to the correct Transformers kwarg cache_position so the model receives the cache position (i.e., replace the rcache_position keyword with cache_position in the call that produces outputs) to ensure KV-cache behavior is applied correctly in the self.model(...) invocation.
147-156: ⚠️ Potential issue | 🔴 Critical

Don't shift `labels` in-place across heads.

Each iteration reuses the already-shifted tensor, so later heads train against cumulatively truncated targets instead of a fresh `(i + 1)`-token shift from the original labels.

🐛 Proposed fix

```diff
+original_labels = labels
 # Medusa loss
 for i in range(self.medusa_num_heads):
-    labels = labels[..., 1:].contiguous()
+    shifted_labels = original_labels[..., (i + 1) :].contiguous()
     loss_logits = medusa_logits[i][:, : -(1 + i)].contiguous()
     loss_logits = loss_logits.view(-1, loss_logits.shape[-1])
-    loss_labels = labels.view(-1)
+    loss_labels = shifted_labels.view(-1)
     loss += (
         loss_fct(loss_logits, loss_labels)
         * medusa_decay_coefficient**i
         * medusa_heads_coefficient
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/speculative/plugins/hf_medusa.py` around lines 147 - 156, The loop currently mutates labels in-place so each head uses a cumulatively shifted target; instead, capture the original labels before the loop (e.g., original_labels = labels) and inside the loop compute a per-head shifted view like head_labels = original_labels[..., 1 + i :].contiguous(); use head_labels (viewed into loss_labels) when computing loss with medusa_logits[i][:, :-(1 + i)].contiguous(), leaving the original_labels untouched and preserving correct (i+1)-token shifts per head while keeping loss_fct, medusa_decay_coefficient, and medusa_heads_coefficient usage unchanged.

modelopt/torch/speculative/plugins/modeling_eagle.py (1)
159-163: ⚠️ Potential issue | 🟠 Major

Only stash `_input_embeds` when the EAGLE-3 hook will consume it.

For non-`use_aux_hidden_state` configs, this tensor is never used but remains attached to `self` for the whole forward pass, which increases memory pressure and can retain autograd state longer than necessary.

♻️ Proposed fix

```diff
 inputs_embeds = inputs_embeds.to(hidden_states.dtype).to(hidden_states.device)
 # In EAGLE-3, we save input embeddings to attribute, and use it in first decoder layer by hook function
 # Also, we normalize input embeddings and hidden states before concatenating them.
 # The default input norm in first layer attn will be disabled.
-self._input_embeds = self.layers[0].input_layernorm(inputs_embeds)
+if self.config.use_aux_hidden_state:
+    self._input_embeds = self.layers[0].input_layernorm(inputs_embeds)
+else:
+    self._input_embeds = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/speculative/plugins/modeling_eagle.py` around lines 159 - 163, The code unconditionally saves self._input_embeds which retains memory/autograd even when not used; change the assignment to only stash the normalized input embeddings when the model is configured to use the EAGLE-3 hook (check the flag use_aux_hidden_state or self.config.use_aux_hidden_state), e.g. compute inputs_embeds = self.layers[0].input_layernorm(inputs_embeds) as now but only set self._input_embeds = ... when the hook flag is true (otherwise avoid storing it or explicitly del/release it), referencing the attribute self._input_embeds and the config flag use_aux_hidden_state and the layer method layers[0].input_layernorm to locate the change.

modelopt/torch/speculative/plugins/hf_spec_mixin.py (1)
151-165: ⚠️ Potential issue | 🟠 Major

Restore `torch._dynamo.config.suppress_errors` after compilation.

This flips a process-global Dynamo flag and never restores it, so one compile attempt here can silently change later `torch.compile()` behavior elsewhere in the same process.

♻️ Proposed fix

```diff
 import torch._dynamo

-torch._dynamo.config.suppress_errors = True  # Allow fallback to eager mode
-
-for name, kwargs in self._compile_targets:
-    try:
-        setattr(self, name, torch.compile(getattr(self, name), dynamic=False, **kwargs))
-    except Exception:  # noqa: PERF203
-        print(f"Disabling torch.compile for {name} due to compilation error.")
+prev_suppress_errors = torch._dynamo.config.suppress_errors
+torch._dynamo.config.suppress_errors = True  # Allow fallback to eager mode
+try:
+    for name, kwargs in self._compile_targets:
+        try:
+            setattr(self, name, torch.compile(getattr(self, name), dynamic=False, **kwargs))
+        except Exception:  # noqa: PERF203
+            print(f"Disabling torch.compile for {name} due to compilation error.")
+finally:
+    torch._dynamo.config.suppress_errors = prev_suppress_errors
```

Verification script:

```bash
#!/bin/bash
set -euo pipefail
sed -n '151,166p' modelopt/torch/speculative/plugins/hf_spec_mixin.py
rg -n 'suppress_errors' modelopt/torch/speculative/plugins -C2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/speculative/plugins/hf_spec_mixin.py` around lines 151 - 165, The method _activate_torch_compile temporarily flips the process-global torch._dynamo.config.suppress_errors and never restores it; change it to save the original value, set suppress_errors=True for the compile loop, and restore the original value in a finally block (so even if setattr/torch.compile raises in the for-loop or try/except, torch._dynamo.config.suppress_errors is reset). Update the function around the existing import and for name, kwargs in self._compile_targets loop to wrap the change in a try/finally that restores the saved config flag.

modelopt/torch/speculative/plugins/hf_dflash.py (1)
159-163: ⚠️ Potential issue | 🔴 Critical

Implement `get_dummy_inputs()` before advertising export support.

`HFDFlashModel` now provides `get_exporter()`, but it still inherits `HFSpecDecMixin.get_dummy_inputs()` from `modelopt/torch/speculative/plugins/hf_spec_mixin.py` (Lines 169-171), which raises `NotImplementedError`. The first export path that requests dummy inputs will fail at runtime.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/speculative/plugins/hf_dflash.py` around lines 159 - 163, HFDFlashModel advertises export support via get_exporter() but still inherits HFSpecDecMixin.get_dummy_inputs() which raises NotImplementedError; implement a get_dummy_inputs(self, device=None, dtype=None, batch_size=1, seq_len=1) method on HFDFlashModel that returns the same dummy input structure the exporter expects (or delegate to DFlashExporter.get_dummy_inputs(self, ...)) so export callers won't fail—update the HFDFlashModel class to override get_dummy_inputs and produce tensors/inputs matching DFlashExporter input names/shapes.
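A minimal sketch of such an override, using nested lists as stand-ins for torch tensors so it runs anywhere; the key names and shapes below are assumptions and would need to match whatever the DFlash exporter and `unified_export_hf.py` actually consume:

```python
def get_dummy_inputs_sketch(batch_size=1, seq_len=8, hidden_size=16):
    """Sketch of an HFDFlashModel.get_dummy_inputs override for export.

    The real method would return torch tensors (torch.long ids, model-dtype
    hidden states) on the model's device; plain lists keep this sketch
    dependency-free.
    """

    def zeros(*shape):
        if len(shape) == 1:
            return [0] * shape[0]
        return [zeros(*shape[1:]) for _ in range(shape[0])]

    return {
        "input_ids": zeros(batch_size, seq_len),
        "attention_mask": [[1] * seq_len for _ in range(batch_size)],
        # Hypothetical key: offline DFlash trains from precomputed base-model
        # hidden states, so the exporter presumably needs a dummy for them too.
        "target_hidden_states": zeros(batch_size, seq_len, hidden_size),
    }
```

The point is only the shape of the contract: return a dict of correctly shaped dummies instead of inheriting the `NotImplementedError` stub.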
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/speculative/plugins/hf_dflash.py`:
- Around line 145-154: The code assumes every base model has a .layers
attribute; instead use the parts discovered by _find_base_model_parts() or guard
accesses: obtain base_device from self._base_model_lm_head.weight.device as the
safe default, and only override it with
next(self._base_model.layers[-1].parameters()).device when
hasattr(self._base_model, "layers") and self._base_model.layers is non-empty;
likewise only call self._base_model._modules.pop("layers") if "layers" is
actually present in self._base_model._modules. Ensure you still respect
self.dflash_offline, the dtype/device conversion of self.dflash_module, and the
existing base_device.type != "meta" check.
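The guarded lookup that prompt asks for can be sketched with duck-typed stand-ins; the attribute names (`layers`, `_modules`, `weight.device`) follow the prompt, and the surrounding offline-DFlash logic (`dflash_offline`, dtype conversion, the `meta`-device check) is omitted:

```python
def resolve_base_device(base_model, lm_head):
    """Pick a device for the DFlash module without assuming `.layers` exists.

    The lm_head weight's device is the safe default; the last decoder
    layer's device overrides it only when the base model really has layers.
    """
    device = lm_head.weight.device
    layers = getattr(base_model, "layers", None)
    if layers:  # present and non-empty
        device = next(layers[-1].parameters()).device
    return device


def pop_layers_if_present(base_model):
    """Drop decoder layers to save memory, only if they are registered."""
    if "layers" in getattr(base_model, "_modules", {}):
        base_model._modules.pop("layers")
```

Both helpers degrade to safe no-op-like behavior on models whose decoder stack is not exposed as a `.layers` attribute.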
In `@modelopt/torch/speculative/plugins/hf_medusa.py`:
- Around line 102-103: The parameters medusa_heads_coefficient and
medusa_decay_coefficient are declared as float | None but later used directly in
numeric loss computations; either change their annotations to plain float or
coerce None to a numeric default before the loss loop. Fix by updating the
signature or, if you want to keep None allowed, add normalization at the start
of the function that computes the Medusa loss (e.g., where the loss loop
runs—search for medusa_heads_coefficient/medusa_decay_coefficient usage) and set
medusa_heads_coefficient = 0.2 and medusa_decay_coefficient = 0.8 (or the
desired defaults) if they are None so downstream numeric operations never
receive None.
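That normalization is small enough to sketch directly; the 0.2 / 0.8 defaults come from the prompt above and may differ from what the project actually settles on:

```python
def normalize_medusa_coefficients(heads_coefficient=None, decay_coefficient=None):
    """Coerce None coefficients to numeric defaults before the loss loop.

    Keeping the `float | None` signature at the call site, this guarantees
    the downstream arithmetic (decay**i * heads_coefficient) never sees None.
    """
    if heads_coefficient is None:
        heads_coefficient = 0.2  # assumed default, per the review prompt
    if decay_coefficient is None:
        decay_coefficient = 0.8  # assumed default, per the review prompt
    return heads_coefficient, decay_coefficient
```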
In `@modelopt/torch/speculative/plugins/modeling_dflash.py`:
- Around line 54-65: The build_target_layer_ids function can produce duplicate
indices when num_draft_layers is close to num_target_layers; after computing the
candidate list (currently returned at the end of build_target_layer_ids)
validate that all IDs are unique (e.g., compare len(set(ids)) to
num_draft_layers) and if duplicates exist raise a ValueError rejecting the
config with a clear message (mentioning num_target_layers and num_draft_layers
and advising to reduce num_draft_layers or increase num_target_layers). Keep
this validation inside build_target_layer_ids so callers receive an explicit
error rather than silently feeding the same hidden state multiple times to the
fusion block.
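A sketch of that validation; the evenly spaced selection below is an assumed stand-in for the real layer-selection rule in `modeling_dflash.py`, and only the uniqueness check mirrors the prompt:

```python
def build_target_layer_ids(num_target_layers, num_draft_layers):
    """Pick one target-layer index per draft layer, rejecting duplicates."""
    # Assumed selection rule: spread draft layers evenly over target layers.
    ids = [
        round(i * (num_target_layers - 1) / max(num_draft_layers - 1, 1))
        for i in range(num_draft_layers)
    ]
    # Rounding can collapse two draft layers onto the same target hidden
    # state; reject the config explicitly instead of silently reusing it.
    if len(set(ids)) != num_draft_layers:
        raise ValueError(
            f"Duplicate target layer ids {ids} for "
            f"num_target_layers={num_target_layers}, num_draft_layers={num_draft_layers}; "
            "reduce num_draft_layers or use a target model with more layers."
        )
    return ids
```

Raising inside the builder means callers get an explicit config error rather than feeding the same hidden state twice to the fusion block.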
---
Outside diff comments:
In `@modelopt/torch/speculative/plugins/__init__.py`:
- Around line 16-21: The docstring currently lists only hf_eagle but the module
also re-exports hf_dflash and hf_medusa; update the top-level docstring in
modelopt.torch.speculative.plugins (the triple-quoted module docstring) to
either enumerate all three plugins (hf_eagle, hf_dflash, hf_medusa) or change
the bullet to indicate it is a non-exhaustive/example list; ensure the text
matches the actual exports so generated docs are accurate and consistent with
the re-exports of hf_eagle, hf_dflash and hf_medusa.
---
Duplicate comments:
In `@modelopt/torch/speculative/plugins/hf_dflash.py`:
- Around line 159-163: HFDFlashModel advertises export support via
get_exporter() but still inherits HFSpecDecMixin.get_dummy_inputs() which raises
NotImplementedError; implement a get_dummy_inputs(self, device=None, dtype=None,
batch_size=1, seq_len=1) method on HFDFlashModel that returns the same dummy
input structure the exporter expects (or delegate to
DFlashExporter.get_dummy_inputs(self, ...)) so export callers won't fail—update
the HFDFlashModel class to override get_dummy_inputs and produce tensors/inputs
matching DFlashExporter input names/shapes.
In `@modelopt/torch/speculative/plugins/hf_medusa.py`:
- Around line 113-123: The forward call is passing an invalid kwarg name
rcache_position to self.model when building outputs; change that argument to the
correct Transformers kwarg cache_position so the model receives the cache
position (i.e., replace the rcache_position keyword with cache_position in the
call that produces outputs) to ensure KV-cache behavior is applied correctly in
the self.model(...) invocation.
- Around line 147-156: The loop currently mutates labels in-place so each head
uses a cumulatively shifted target; instead, capture the original labels before
the loop (e.g., original_labels = labels) and inside the loop compute a per-head
shifted view like head_labels = original_labels[..., 1 + i :].contiguous(); use
head_labels (viewed into loss_labels) when computing loss with
medusa_logits[i][:, :-(1 + i)].contiguous(), leaving the original_labels
untouched and preserving correct (i+1)-token shifts per head while keeping
loss_fct, medusa_decay_coefficient, and medusa_heads_coefficient usage
unchanged.
In `@modelopt/torch/speculative/plugins/hf_spec_mixin.py`:
- Around line 151-165: The method _activate_torch_compile temporarily flips the
process-global torch._dynamo.config.suppress_errors and never restores it;
change it to save the original value, set suppress_errors=True for the compile
loop, and restore the original value in a finally block (so even if
setattr/torch.compile raises in the for-loop or try/except,
torch._dynamo.config.suppress_errors is reset). Update the function around the
existing import and for name, kwargs in self._compile_targets loop to wrap the
change in a try/finally that restores the saved config flag.
In `@modelopt/torch/speculative/plugins/modeling_eagle.py`:
- Around line 159-163: The code unconditionally saves self._input_embeds which
retains memory/autograd even when not used; change the assignment to only stash
the normalized input embeddings when the model is configured to use the EAGLE-3
hook (check the flag use_aux_hidden_state or self.config.use_aux_hidden_state),
e.g. compute inputs_embeds = self.layers[0].input_layernorm(inputs_embeds) as
now but only set self._input_embeds = ... when the hook flag is true (otherwise
avoid storing it or explicitly del/release it), referencing the attribute
self._input_embeds and the config flag use_aux_hidden_state and the layer method
layers[0].input_layernorm to locate the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 1ffffb46-7fb0-4b7d-8ca7-5e7583948d47
📒 Files selected for processing (16)
- .pre-commit-config.yaml
- examples/speculative_decoding/eagle_utils.py
- examples/speculative_decoding/main.py
- examples/speculative_decoding/scripts/ar_validate.py
- modelopt/torch/export/plugins/hf_spec_export.py
- modelopt/torch/speculative/config.py
- modelopt/torch/speculative/dflash/dflash_model.py
- modelopt/torch/speculative/eagle/default_config.py
- modelopt/torch/speculative/plugins/__init__.py
- modelopt/torch/speculative/plugins/hf_dflash.py
- modelopt/torch/speculative/plugins/hf_eagle.py
- modelopt/torch/speculative/plugins/hf_medusa.py
- modelopt/torch/speculative/plugins/hf_spec_mixin.py
- modelopt/torch/speculative/plugins/modeling_dflash.py
- modelopt/torch/speculative/plugins/modeling_eagle.py
- modelopt/torch/speculative/utils.py
💤 Files with no reviewable changes (1)
- modelopt/torch/speculative/eagle/default_config.py
✅ Files skipped from review due to trivial changes (3)
- modelopt/torch/speculative/utils.py
- .pre-commit-config.yaml
- examples/speculative_decoding/eagle_utils.py
🚧 Files skipped from review as they are similar to previous changes (5)
- examples/speculative_decoding/scripts/ar_validate.py
- modelopt/torch/export/plugins/hf_spec_export.py
- modelopt/torch/speculative/dflash/dflash_model.py
- modelopt/torch/speculative/config.py
- modelopt/torch/speculative/plugins/hf_eagle.py
🧹 Nitpick comments (1)
tests/unit/torch/export/test_hf_spec_rope_export.py (1)
68-72: Consider parameterizing these two fallback tests to remove duplication.

Lines 61-72 can be a single `pytest.mark.parametrize` test with different `(rope_type, eagle_export_rope_scaling, expected)` tuples.

♻️ Proposed refactor

```diff
+import pytest
+
+
+@pytest.mark.parametrize(
+    "rope_type,eagle_export_rope_scaling,expected_rope_scaling",
+    [
+        ("llama3", DEFAULT_ROPE_SCALING, {"rope_type": "llama3", "rope_theta": 10000}),
+        ("default", {}, {"rope_type": "default", "rope_theta": 10000}),
+    ],
+)
+def test_rope_scaling_falls_through_to_training_config(
+    rope_type, eagle_export_rope_scaling, expected_rope_scaling
+):
+    """Export override is not applied when conditions are unmet; fallback uses training config."""
+    config = _make_exporter(
+        rope_type=rope_type,
+        eagle_export_rope_scaling=eagle_export_rope_scaling,
+    )._export_config()
+    assert config["rope_scaling"] == expected_rope_scaling
-def test_rope_not_overridden_when_non_default_training_rope():
-    """Export override is not applied when training rope_type is not 'default';
-    rope_scaling falls through to the training config."""
-    config = _make_exporter(rope_type="llama3")._export_config()
-    assert config["rope_scaling"] == {"rope_type": "llama3", "rope_theta": 10000}
-
-
-def test_rope_not_overridden_when_eagle_export_rope_scaling_is_empty():
-    """Export override is not applied when eagle_export_rope_scaling is empty;
-    rope_scaling falls through to the training config."""
-    config = _make_exporter(eagle_export_rope_scaling={})._export_config()
-    assert config["rope_scaling"] == {"rope_type": "default", "rope_theta": 10000}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/export/test_hf_spec_rope_export.py` around lines 68 - 72, Combine the two duplicate fallback tests into a single parametrized pytest by replacing the separate test functions (e.g., test_rope_not_overridden_when_eagle_export_rope_scaling_is_empty) with one test decorated with pytest.mark.parametrize over tuples of (rope_type, eagle_export_rope_scaling, expected); inside the test call _make_exporter(...)._export_config() with the parametrized eagle_export_rope_scaling and assert config["rope_scaling"] == expected, keeping the original docstring/context as needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/unit/torch/export/test_hf_spec_rope_export.py`:
- Around line 68-72: Combine the two duplicate fallback tests into a single
parametrized pytest by replacing the separate test functions (e.g.,
test_rope_not_overridden_when_eagle_export_rope_scaling_is_empty) with one test
decorated with pytest.mark.parametrize over tuples of (rope_type,
eagle_export_rope_scaling, expected); inside the test call
_make_exporter(...)._export_config() with the parametrized
eagle_export_rope_scaling and assert config["rope_scaling"] == expected, keeping
the original docstring/context as needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5c5d29d0-b9d3-4a22-8c05-21a28a3ed5e2
📒 Files selected for processing (1)
tests/unit/torch/export/test_hf_spec_rope_export.py
Thanks for working on this! I have a number of concerns with the current state of this PR. Given the volume of issues below, I don't think this is ready for merge — it needs significant rework.

1. Please split this into separate PRs

This PR bundles 5 independent changes: file renames, mixin extraction, offline DFlash, ParallelDraft removal, and config validation changes. These are hard to review together and make rollback difficult. I'd suggest at minimum:

2. The mixin is too broad

The core idea of deduplicating base-model discovery (

Suggestion: keep the mixin slim — just the base-model property discovery. Move compile, NVTX, export, and forward logic back to the individual classes.

3. Breaking changes need a migration path
4. Offline DFlash implementation has multiple bugs
5. No tests for new feature
6. Minor issues
@yeyu-nvidia, Hi Ye, did you look at the code yourself or at least go through the comments by AI? I think over half of the "concerns" (2 and 3, and partially 4) are nonsense, and it's a waste of time to explain them item by item. Let me know if you agree with any of them. There are a few points that make sense to me; I will address them.
@yeyu-nvidia Since the original comment is too long, I put the item-by-item technical feedback here: https://docs.google.com/document/d/1Uc4yj-bmvOA4zMAhh7QQTedn0_7BDwPO1LaiC1AlE7Q/edit?usp=sharing
What does this PR do?
Type of change: New feature, refactoring
- `HFSpecDecMixin`: Extract duplicated base-model discovery, forward pass, NVTX profiling, and `torch.compile` logic from `HFEagleModel`/`HFDFlashModel` into a shared mixin.
- Offline DFlash: Add a `dflash_offline` config flag for training from pre-computed hidden states; deletes base model layers to save memory.
- `ParallelDraft`: Remove `parallel_draft_step`, the `ParallelDraft` module, and all related logic from Eagle.
- Renames: `transformers.py` → `hf_eagle.py`; `HFMedusaModel` → `hf_medusa.py`; `DFlashModule` → `modeling_dflash.py`; `EagleModule` → `modeling_eagle.py`.
- Move `dflash_mask_token_id` auto-detection from `main.py` into `DFlashConfig` Pydantic validators.

Testing
Validated with existing Eagle and DFlash training scripts (online + offline modes).

Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- Removes the `parallel_draft_step` config; renames `transformers.py` → `hf_eagle.py`.
- `CONTRIBUTING.md`: N/A

Additional Information
Breaking changes:
- `modelopt.torch.speculative.plugins.transformers` → `.hf_eagle`
- `parallel_draft_step` / `parallel_draft_heads_num_layers` removed from Eagle config
- `_draft_model_config` → `eagle_config` in export plugin

Summary by CodeRabbit
New Features
Improvements
Chores
Tests