fix(dflash): derive n_target_layers fallback in gguf_draft_loader by javierpazo · Pull Request #138 · Luce-Org/lucebox-hub

javierpazo · 2026-05-09T12:13:48Z

fix(dflash): derive n_target_layers fallback in gguf_draft_loader

Follow-up to merged #79 ("read model params from GGUF at runtime,
support any qwen35 size"). #79 covers the target loader and the
common drafter fields, but the fallback chain in gguf_draft_loader
still requires the legacy dflash.n_target_layers key to be
present.

Drafters published with the new metadata key naming
(dflash-draft.dflash.target_layer_ids plus
n_target_features) hit the path where the legacy key is missing
and the loader fails. Concrete case: the published Q8 GGUF drafter
for Qwen3.6-27B-DFlash.

This change derives n_target_layers in two steps:

1. If target_layer_ids is present, use its length.
2. Otherwise, if n_target_features and n_embd are both
   present, use n_target_features / n_embd (with a sanity
   check that the division is exact).

If neither is available, the loader still fails with the same
honest error as before. The legacy key path is untouched.
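
The fallback order described above can be sketched as follows. This is an illustrative sketch in Python, not the loader's actual code: the function name `derive_n_target_layers`, the flat dictionary standing in for GGUF metadata, the shortened key names, and the error messages are all assumptions made for the example.

```python
from typing import Any, Mapping


def derive_n_target_layers(meta: Mapping[str, Any]) -> int:
    """Resolve n_target_layers from drafter metadata (hypothetical helper).

    `meta` is a plain dict standing in for the GGUF key/value store; the
    short key names used here are assumptions, not the real GGUF keys.
    """
    # Legacy key: used as-is when present (this path is untouched by the PR).
    legacy = meta.get("dflash.n_target_layers")
    if legacy is not None:
        return int(legacy)

    # Step 1: if target_layer_ids is present, use its length.
    layer_ids = meta.get("target_layer_ids")
    if layer_ids is not None:
        return len(layer_ids)

    # Step 2: otherwise derive from n_target_features / n_embd,
    # with a sanity check that the division is exact.
    n_feat = meta.get("n_target_features")
    n_embd = meta.get("n_embd")
    if n_feat is not None and n_embd is not None:
        if n_embd <= 0 or n_feat % n_embd != 0:
            raise ValueError(
                f"n_target_features ({n_feat}) is not an exact multiple "
                f"of n_embd ({n_embd})"
            )
        return n_feat // n_embd

    # Neither source is available: fail, as the loader did before.
    raise KeyError("cannot determine n_target_layers from drafter metadata")
```

With made-up values, `derive_n_target_layers({"n_target_features": 25600, "n_embd": 5120})` returns 5, and a metadata dict with none of the keys raises `KeyError`.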

Validation (RTX 6000 Ada sm_89, Qwen3.6-27B Heretic Q4_K_M target,
Q8 GGUF drafter via the new metadata):

Loaded SWA layers: 4/5, decode 21.06 tok/s, no fallback chain
errors during init.

Verification vs existing community PRs:

COMP-COMPL with #79 (merged 2026-05-03). #79 covered target
loader and drafter fields generically. This PR is a small
follow-up for the case where only the new metadata is present
on the drafter side.

Author: Javier Pazo <xabicasa@gmail.com>


cubic-dev-ai

No issues found across 1 file

cubic-dev-ai Bot reviewed May 9, 2026

javierpazo wants to merge 1 commit into Luce-Org:main from javierpazo:xabicasa/dflash-gguf-draft-loader-target-layers
