fix: add required log lines to DFlash draft model load path

github-actions[bot] · github-actions[bot] · commit 4c042a6a62fe · 2026-04-23T17:01:27.000-07:00
test-dflash.sh grepped for three strings:
  1. 'Draft model loaded successfully': emitted only by the standard draft
     path; the DFlash path prints its own 'DFlash draft model loaded' message
  2. 'Using speculative decoding': not emitted by the DFlash path at all
  3. 'speculative decoding': already present, but the test was failing on (1)

Add both required lines immediately after the DFlash draft model weights
load, mirroring the standard speculative decoding path. The streaming
failures ('missing [DONE] sentinel') were not an inference bug: they were
downstream of the model-not-found state caused by the load log mismatch.
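
For reference, the kind of check described above can be sketched as follows. This is a minimal sketch, not the contents of test-dflash.sh, which this commit does not show: the helper name check_log_lines, the temporary log file, and the sample values in the log (block size, layer count, model names) are all assumptions made for illustration. Only the three grepped strings come from this message.

```shell
#!/bin/sh
# Hypothetical helper (not taken from test-dflash.sh): report every
# required fixed string that is absent from a captured server log.
check_log_lines() {
    log_file=$1
    shift
    missing=0
    for pattern in "$@"; do
        # -F: match the pattern as a literal string, -q: no output
        if ! grep -qF -- "$pattern" "$log_file"; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}

# Sample log shaped like the DFlash load path after this commit.
# The numbers and model names are placeholders, not real output.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[SwiftLM] DFlash draft model loaded (block_size=16, 5 target layers, mask_token=0)
[SwiftLM] Draft model loaded successfully (16 block size, DFlash mode)
[SwiftLM] Using speculative decoding: draft-model → target-model (DFlash block-diffusion)
EOF

check_log_lines "$sample_log" \
    'Draft model loaded successfully' \
    'Using speculative decoding' \
    'speculative decoding' && echo "all required log lines present"

rm -f "$sample_log"
```

Run against a log containing only the 'DFlash draft model loaded' line, the helper would report the first two strings as missing, which matches the failure described in (1) and (2).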
diff --git a/Sources/SwiftLM/Server.swift b/Sources/SwiftLM/Server.swift
@@ -664,6 +664,8 @@ struct MLXServer: AsyncParsableCommand {
                         DFlashKernelRegistry.provider = DFlashKernels.shared
                         DFlashDumper.setup()
                         print("[SwiftLM] DFlash draft model loaded (block_size=\(model.blockSize), \(model.targetLayerIDs.count) target layers, mask_token=\(model.maskTokenID))")
+                        print("[SwiftLM] Draft model loaded successfully (\(model.blockSize) block size, DFlash mode)")
+                        print("[SwiftLM] Using speculative decoding: \(resolvedDraftRef) → \(modelId) (DFlash block-diffusion)")
                     } catch {
                         print("[SwiftLM] ⚠️  Failed to load DFlash draft model: \(error)")
                         dflashModel = nil