fix(dflash): derive n_target_layers fallback in gguf_draft_loader #138
Open
javierpazo wants to merge 1 commit into
Follow-up to merged Luce-Org#79 ("read model params from GGUF at runtime, support any qwen35 size"). Luce-Org#79 covers the target loader and the common drafter fields, but the fallback chain in gguf_draft_loader still requires the legacy `dflash.n_target_layers` key to be present.

Drafters published with the new metadata key naming (`dflash-draft.dflash.target_layer_ids` plus `n_target_features`) hit the path where the legacy key is missing and the loader fails. Concrete case: the published Q8 GGUF drafter for Qwen3.6-27B-DFlash.

This change derives `n_target_layers` in two steps:

1. If `target_layer_ids` is present, use its length.
2. Otherwise, if `n_target_features` and `n_embd` are both present, use `n_target_features / n_embd` (with a sanity check that the division is exact).

If neither is available, the loader still fails with the same honest error as before. The legacy key path is untouched.

Validation (RTX 6000 Ada sm_89, Qwen3.6-27B Heretic Q4_K_M target, Q8 GGUF drafter via the new metadata):

Loaded `SWA layers: 4/5`, decode 21.06 tok/s, no fallback chain errors during init.

Verification vs existing community PRs:

COMP-COMPL with Luce-Org#79 (merged 2026-05-03). Luce-Org#79 covered target loader and drafter fields generically. This PR is a small follow-up for the case where only the new metadata is present on the drafter side.

Author: Javier Pazo <xabicasa@gmail.com>
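
For reviewers who want the fallback order at a glance, the derivation described above can be sketched roughly as follows. This is a minimal Python illustration, not the loader's actual code: the function name `derive_n_target_layers`, the plain-dict metadata, the bare `n_target_features` and `n_embd` key spellings, and the error messages are all assumptions made for the example. Only `dflash.n_target_layers` and `dflash-draft.dflash.target_layer_ids` are key names taken from the description.

```python
from typing import Any, Mapping

# Key names quoted in the PR description.
LEGACY_KEY = "dflash.n_target_layers"
LAYER_IDS_KEY = "dflash-draft.dflash.target_layer_ids"
# Assumed spellings: the description gives only the short names.
N_TARGET_FEATURES_KEY = "n_target_features"
N_EMBD_KEY = "n_embd"


def derive_n_target_layers(meta: Mapping[str, Any]) -> int:
    """Hypothetical helper mirroring the fallback chain in the description."""
    # Legacy path, left untouched: use the explicit key when it exists.
    legacy = meta.get(LEGACY_KEY)
    if legacy is not None:
        return int(legacy)

    # Step 1: the number of target layers is the length of the id list.
    layer_ids = meta.get(LAYER_IDS_KEY)
    if layer_ids is not None:
        return len(layer_ids)

    # Step 2: derive the count from the feature width and embedding size.
    n_features = meta.get(N_TARGET_FEATURES_KEY)
    n_embd = meta.get(N_EMBD_KEY)
    if n_features is not None and n_embd is not None:
        n_features, n_embd = int(n_features), int(n_embd)
        # Sanity check: the division must be exact.
        if n_embd <= 0 or n_features % n_embd != 0:
            raise ValueError(
                f"n_target_features ({n_features}) is not an exact "
                f"multiple of n_embd ({n_embd})"
            )
        return n_features // n_embd

    # Nothing usable: fail as before (placeholder message).
    raise KeyError("cannot determine n_target_layers from drafter metadata")
```

With illustrative (made-up) values, a drafter carrying five entries in `target_layer_ids` yields 5, and one carrying only `n_target_features = 25600` with `n_embd = 5120` also yields 5. A non-exact ratio raises instead of silently truncating, which matches the sanity check mentioned in step 2.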