You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix(mac-chat): stream tokens live so the CLI no longer looks frozen on long answers (#152)
Root cause (from the code): the interactive chat REPL is fully NON-streaming —
_gen_turn runs the entire generation (up to max_new_tokens) before printing
anything. On a code-gen prompt the answer is long and the f_θ path is slow
(~3-5 tok/s, single-token past the wrap), so the terminal stays silent for
minutes — indistinguishable from a freeze (user: '一进入就卡死,完全没有输出'). Not a
deadlock (prior scripted code-gen runs completed).
Fix: add an on_commit streaming callback to the 3 fused generate loops (safe
_emit wrapper, never breaks decode) and have the chat REPL decode incrementally
and print the delta LIVE (+ a per-block '[stream] blk=.. t=..s' stderr line that
also proves the engine is progressing, not hung). New mlx-kakeya-chat-stream-
probe preset runs the user's exact prompt to validate.
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
0 commit comments