fix(dflash): derive n_target_layers fallback in gguf_draft_loader#138

Open: javierpazo wants to merge 1 commit into Luce-Org:main from javierpazo:xabicasa/dflash-gguf-draft-loader-target-layers

@javierpazo
Contributor

fix(dflash): derive n_target_layers fallback in gguf_draft_loader

Follow-up to merged #79 ("read model params from GGUF at runtime,
support any qwen35 size"). #79 covers the target loader and the
common drafter fields, but the fallback chain in `gguf_draft_loader`
still requires the legacy `dflash.n_target_layers` key to be
present.

Drafters published with the new metadata key naming
(`dflash-draft.dflash.target_layer_ids` plus
`n_target_features`) hit the path where the legacy key is missing
and the loader fails. Concrete case: the published Q8 GGUF drafter
for Qwen3.6-27B-DFlash.

This change derives `n_target_layers` in two steps:

  1. If `target_layer_ids` is present, use its length.
  2. Otherwise, if `n_target_features` and `n_embd` are both
     present, use `n_target_features / n_embd` (with a sanity
     check that the division is exact).

If neither is available, the loader still fails with the same
explicit error as before. The legacy key path is untouched.
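
For reviewers, the fallback order described above can be sketched roughly as follows. This is an illustration in Python, not the loader's actual code: the function name `derive_n_target_layers`, the use of a plain dict for already-parsed GGUF metadata, the exception types, and the exact key prefixes are all assumptions. It also assumes the legacy key takes priority when present.

```python
from typing import Any, Mapping

# Key names follow the PR description; the exact prefixes the real
# loader reads are an assumption made for this sketch.
LEGACY_KEY = "dflash.n_target_layers"
LAYER_IDS_KEY = "dflash-draft.dflash.target_layer_ids"
N_FEATURES_KEY = "n_target_features"
N_EMBD_KEY = "n_embd"


def derive_n_target_layers(meta: Mapping[str, Any]) -> int:
    """Hypothetical helper: resolve n_target_layers from parsed metadata."""
    # Legacy path (assumed to win when present): use the key as-is.
    if LEGACY_KEY in meta:
        return int(meta[LEGACY_KEY])

    # Step 1: the number of target layers is the length of the id list.
    layer_ids = meta.get(LAYER_IDS_KEY)
    if layer_ids is not None:
        return len(layer_ids)

    # Step 2: derive the count from the feature width, but only when
    # the division is exact.
    n_features = meta.get(N_FEATURES_KEY)
    n_embd = meta.get(N_EMBD_KEY)
    if n_features is not None and n_embd is not None:
        if n_embd <= 0 or n_features % n_embd != 0:
            raise ValueError(
                f"n_target_features ({n_features}) is not an exact "
                f"multiple of n_embd ({n_embd})"
            )
        return n_features // n_embd

    # Neither source is available: fail, as the loader did before.
    raise KeyError("cannot determine n_target_layers from metadata")
```

With made-up values, a metadata dict holding only `n_target_features = 20` and `n_embd = 4` resolves to 5 layers, while `n_target_features = 21` with the same `n_embd` is rejected by the exact-division check.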

Validation (RTX 6000 Ada sm_89, Qwen3.6-27B Heretic Q4_K_M target,
Q8 GGUF drafter via the new metadata):

Loaded `SWA layers: 4/5`, decode 21.06 tok/s, no fallback chain
errors during init.

Verification vs existing community PRs:

COMP-COMPL with #79 (merged 2026-05-03). #79 covered target
loader and drafter fields generically. This PR is a small
follow-up for the case where only the new metadata is present
on the drafter side.

Author: Javier Pazo <xabicasa@gmail.com>


@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file
