refactor(mtp): MtpSource enum + auto-detect MTP tensors#2
Merged
Merged
Conversation
Per @howard0su's review on Luce-Org#237 (lines 57, 59): - 57: 'shall we use the unsloth single-file MTP-in-target GGUF?' - 59: 'why not a enum?' Replaces: const char * mtp_gguf_path = nullptr; const char * mtp_draft_source = nullptr; // "chain" | "mtp_topk" with: enum class MtpSource { None, Native, ExternalDrafter, Auto }; MtpSource mtp_source = MtpSource::None; const char * mtp_gguf_path = nullptr; // only for ExternalDrafter bool mtp_use_topk = false; // false=chain, true=mtp_topk Adds gguf_contains_mtp_tensors() probe (keyed on qwen35.nextn_predict_layers metadata) so --mtp-gguf becomes optional when the primary GGUF embeds MTP tensors (unsloth single-file case). Stacked on Luce-Org#237. dflash_server arg parsing updated to: - --mtp-source [none|native|external|auto] (new explicit flag) - --mtp-gguf PATH (now optional; only needed for ExternalDrafter) - Old --mtp-draft-source string flag warns + ignored (migration aid) - --mtp-gamma alone triggers Auto detection All test_common_mtp_orchestrator tests still pass (mock-based, unaffected by the config-surface change).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stacked on Luce-Org#237 (Luce-Org/lucebox-hub). Addresses @howard0su's review comments on
dflash/src/common/backend_factory.h:--mtp-ggufis now optional. When absent,gguf_contains_mtp_tensors()probes the primary GGUF forqwen35.nextn_predict_layers > 0and selectsMtpSource::Nativeautomatically.enum class MtpSource { None, Native, ExternalDrafter, Auto }replaces the string-keyedmtp_draft_sourceand the load-shape-implied-by-pointermtp_gguf_path.Changes
dflash_server CLI
--mtp-source [none|native|external|auto]— new explicit flag--mtp-gguf PATH— now optional; only required for ExternalDrafter mode--mtp-gamma Nalone (no--mtp-source) triggers Auto detection--mtp-draft-sourceflag emits a stderr warning and is ignored (migration aid)Auto-detect
gguf_contains_mtp_tensors(path)opens the GGUF, checks forqwen35.nextn_predict_layers > 0— the same fieldqwen35_mtp_loader.cpprequires. Pure metadata scan, no tensor allocation, no GPU touch.Backward compat
--mtp-gguf PATHwithout--mtp-source→ inferred asExternalDrafter(old behavior preserved)--mtp-gamma Nwithout--mtp-sourceand without--mtp-gguf→Auto(probes primary GGUF)Tests
test_common_mtp_orchestrator(20-test mock-based suite) all pass. Build clean.Note on tensor naming
gguf_contains_mtp_tensors()uses theqwen35.nextn_predict_layersmetadata key as the detection heuristic — same criterion asqwen35_mtp_loader.cpp(which fails early with "not an MTP variant" if this key is absent). This covers the unsloth single-file format. If a different MTP GGUF format emerges without this key, howard can advise and we'll add a tensor-name scan as a fallback.