fix: add 'Using speculative decoding' log line for CI test assertions

github-actions[bot] · github-actions[bot] · commit 5581f3873025 · 2026-04-23T16:55:00.000-07:00
Both test-speculative.sh and test-dflash.sh grep for 'Using speculative
decoding' in the server log to confirm the speculative path was activated.
This string was never emitted, so the tests were checking for a log line
that did not exist, and the speculative-decoding and
dflash-speculative-decoding CI jobs always failed on Test 1.

Fix: emit the exact expected log line in Sources/SwiftLM/Server.swift:
  - Standard spec: in MLXServer, right after the draft model loads
  - DFlash spec: at generation dispatch in handleChatCompletion

Server log now contains all strings the tests grep for:
  ✅ 'Draft model loaded successfully'
  ✅ 'Using speculative decoding'
  ✅ 'speculative decoding' (for test-speculative-eval.sh)
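The CI scripts themselves are shell and use grep. The Swift sketch below
only illustrates the same plain substring check. The helper
missingMarkers(in:required:) is hypothetical, and the log contents (model
path, token counts) are made up. It shows why the old logs failed. The
standard path printed only the "loaded" line. The DFlash banner contained
"speculative decoding" but not the full "Using speculative decoding".

```swift
import Foundation

/// Returns the entries of `required` that never occur in `log`.
/// Uses a plain, case-sensitive substring search, similar in spirit to `grep -F`.
func missingMarkers(in log: String, required: [String]) -> [String] {
    required.filter { !log.contains($0) }
}

// Marker strings quoted in the commit message.
let markers = ["Draft model loaded successfully", "Using speculative decoding", "speculative decoding"]

// Illustrative standard-path log before the fix (values are made up).
let before = "[SwiftLM] Draft model loaded successfully (4 tokens/round)\n"

// The same log after the fix, with the new line from Server.swift appended.
let after = before + "[SwiftLM] Using speculative decoding: /models/draft → target (4 draft tokens/round)\n"

// DFlash path before the fix: only the existing banner was printed.
let dflashBefore = "[SwiftLM] ⚡ DFlash block-diffusion speculative decoding active\n"

print(missingMarkers(in: before, required: markers))
// ["Using speculative decoding", "speculative decoding"]
print(missingMarkers(in: after, required: markers))
// []
print(missingMarkers(in: dflashBefore, required: ["Using speculative decoding"]))
// ["Using speculative decoding"]
```

Because the check is a raw substring match, the marker must appear
verbatim. This is why the fix prints a dedicated line in each path
instead of rewording the existing messages.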
diff --git a/Sources/SwiftLM/Server.swift b/Sources/SwiftLM/Server.swift
@@ -616,6 +616,7 @@ struct MLXServer: AsyncParsableCommand {
             }
             draftModelRef = await draftContainer.extractDraftModel()
             print("[SwiftLM] Draft model loaded successfully (\(numDraftTokensConfig) tokens/round)")
+            print("[SwiftLM] Using speculative decoding: \(draftModelPath) → \(modelId) (\(numDraftTokensConfig) draft tokens/round)")
         } else {
             draftModelRef = nil
         }
@@ -1418,6 +1419,7 @@ func handleChatCompletion(
     // to DFlashTargetModel, we use DFlashRuntime.generate instead of the standard path.
     if let dflashDraft = dflashModel, let targetModel = dflashTargetModel {
         print("[SwiftLM] ⚡ DFlash block-diffusion speculative decoding active")
+        print("[SwiftLM] Using speculative decoding: DFlash block-diffusion mode active")
         fflush(stdout)
         // Convert DFlashEvent stream to Generation stream with proper streaming detokenizer
         let dflashTokenizer = await container.tokenizer