Add hc_mult support to DFlash for DeepSeek-V4-Flash #524
Generalize the DFlash speculator model and training data pipeline to handle verifier models with hc_mult > 1 (e.g. DSv4 where hc_mult=4). All changes degenerate to current behavior when hc_mult=1.

- Read hc_mult from verifier config and thread it through to the draft model's transformer_layer_config
- Expand FC layer input dimension to len(target_layer_ids) * hc_mult * hidden_size
- Register hc_head_fn/base/scale buffers when hc_mult > 1
- Apply hc_head projection to verifier_last_hidden_states before verifier_norm in the forward pass loss computation path
- Override load_verifier_weights to handle DSv4's non-standard weight names (embed.weight, head.weight, norm.weight) and load hc_head parameters
- Pass effective_hidden_size (hc_mult * hidden_size) to dataloaders so empty sample creation and collation use correct tensor shapes

Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
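The dimension bookkeeping in the changes listed above can be illustrated with a small sketch. The helper names `read_hc_mult`, `effective_hidden_size`, and `fc_input_dim` are hypothetical and only mirror the formulas stated in the commit message; the config objects are stand-ins rather than the real verifier config class, and the numeric values are made up for illustration.

```python
from types import SimpleNamespace


def read_hc_mult(verifier_config) -> int:
    """Return hc_mult from a config object, defaulting to 1 when absent."""
    return getattr(verifier_config, "hc_mult", 1)


def effective_hidden_size(hidden_size: int, hc_mult: int) -> int:
    """Width of one verifier hidden state: hc_mult * hidden_size."""
    return hc_mult * hidden_size


def fc_input_dim(target_layer_ids, hidden_size: int, hc_mult: int) -> int:
    """FC input width: len(target_layer_ids) * hc_mult * hidden_size."""
    return len(target_layer_ids) * hc_mult * hidden_size


# Illustrative values only; not taken from any real checkpoint.
hidden_size = 1024
target_layer_ids = [2, 10, 18]

# Stand-in configs: one without an hc_mult attribute, one DSv4-style.
standard_cfg = SimpleNamespace(hidden_size=hidden_size)
hc_cfg = SimpleNamespace(hidden_size=hidden_size, hc_mult=4)

for cfg in (standard_cfg, hc_cfg):
    mult = read_hc_mult(cfg)
    print(
        mult,
        effective_hidden_size(hidden_size, mult),
        fc_input_dim(target_layer_ids, hidden_size, mult),
    )
```

With `hc_mult` absent the helpers reduce to `hidden_size` and `len(target_layer_ids) * hidden_size`, which is the sense in which the changes degenerate to the previous behavior.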
The quality checks have failed. Please run |
…t#414) Updates the requirements on [pytest-mock](https://github.com/pytest-dev/pytest-mock) to permit the latest version.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The concern I have with this approach is that the transformation logic will have to be mirrored in vLLM as well. Maybe we should be doing the transformation in vLLM before extracting the hidden states? Also, ideally we could generalize some of this. There is already a base implementation of
Purpose
Enable training DFlash speculators for DeepSeek-V4-Flash (DSv4), which uses Manifold-Constrained Hyper-Connection (mHC) with `hc_mult=4`. DSv4's hidden states are `(N, hc_mult * hidden_size)` per layer rather than the standard `(N, hidden_size)`, and its checkpoint uses non-standard weight names (`embed.weight`, `head.weight`, `norm.weight`). Without these changes the DFlash pipeline cannot initialize or train a speculator for DSv4.
Description
- `scripts/train.py`: Read `hc_mult` from the verifier config (default 1) and thread it into the draft model's `transformer_layer_config`. Pass `hc_mult * hidden_size` as the effective hidden size to both train and val dataloaders so `create_collate_fn` / `create_empty_sample` create correctly shaped tensors.
- `src/speculators/models/dflash/core.py`:
  - `__init__`: Read `self.hc_mult` from config. Expand FC layer input dim to `len(target_layer_ids) * hc_mult * hidden_size`. Register `hc_head_fn`, `hc_head_base`, `hc_head_scale` buffers when `hc_mult > 1`.
  - `forward`: Apply `hc_head_project()` to collapse `verifier_last_hidden_states` from `(N, hc_mult * hidden_size)` → `(N, hidden_size)` before `verifier_norm` in the loss path (only when `hc_mult > 1`).
  - `load_verifier_weights`: New override that handles DSv4's non-standard weight names and loads hc_head parameters. Delegates to `super()` when `hc_mult == 1`.
- `src/speculators/models/dflash/utils.py`: `hc_head_project()` added in a prior commit (pure-PyTorch port of vLLM's `_hc_head_fused_reference`).

All changes are backward-compatible: when `hc_mult=1` (all non-DSv4 models), every dimension calculation is algebraically identical to the prior code.
Related Issue
Part of the DFlash DSv4 code changes effort (Diff 1: Hidden States from the companion PRD).
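The weight-name handling described for `load_verifier_weights` can be pictured as a key remap over the checkpoint's state dict. This is only a sketch under stated assumptions: the function `remap_verifier_keys`, the table `DSV4_KEY_MAP`, and the Hugging Face-style target names on the right-hand side are hypothetical and not taken from the PR; only the three DSv4 source names come from the description. Plain strings stand in for tensors so the example runs without PyTorch.

```python
# Hypothetical mapping: DSv4 checkpoint key -> Hugging Face-style key.
# The left-hand names come from the PR description; the right-hand
# names are assumptions chosen for illustration only.
DSV4_KEY_MAP = {
    "embed.weight": "model.embed_tokens.weight",
    "head.weight": "lm_head.weight",
    "norm.weight": "model.norm.weight",
}


def remap_verifier_keys(state_dict: dict, hc_mult: int) -> dict:
    """Rename DSv4-style keys; return an unchanged copy when hc_mult == 1."""
    if hc_mult == 1:
        # Mirrors the "delegate to the default path" behavior for standard models.
        return dict(state_dict)
    return {DSV4_KEY_MAP.get(key, key): value for key, value in state_dict.items()}


# Placeholder checkpoint: strings stand in for tensors, and the
# "hc_head_fn" key name is itself an assumption.
checkpoint = {
    "embed.weight": "E",
    "head.weight": "H",
    "norm.weight": "N",
    "hc_head_fn": "F",
}

print(remap_verifier_keys(checkpoint, hc_mult=4))
print(remap_verifier_keys(checkpoint, hc_mult=1))
```

Keeping the mapping in a single table makes the `hc_mult == 1` path a plain pass-through, which matches the backward-compatibility claim in the description.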
Tests
`python -m pytest tests/unit/ -x -q`).
I have filled in: