Skip to content

Add --max-tokens CLI argument to server#936

Open
nnorris7 wants to merge 1 commit into
Blaizzy:mainfrom
nnorris7:feat/server-max-tokens
Open

Add --max-tokens CLI argument to server#936
nnorris7 wants to merge 1 commit into
Blaizzy:mainfrom
nnorris7:feat/server-max-tokens

Conversation

@nnorris7
Copy link
Copy Markdown

@nnorris7 nnorris7 commented Apr 5, 2026

Summary

  • The server defaults to 256 max tokens when the client doesn't send max_tokens in the request, which is too low for conversational use (e.g. via Open WebUI)
  • Adds a --max-tokens CLI flag so operators can set a higher default at launch time (e.g. --max-tokens 8192)
  • Per-request max_tokens from clients still takes precedence

Test plan

  • python -m mlx_vlm.server --help shows the new --max-tokens flag with default 256
  • Server launched with --max-tokens 8192 uses 8192 when client omits max_tokens
  • Client-provided max_tokens still overrides the server default

🤖 Generated with Claude Code

The server's default max_tokens (256) is too low for conversational use
and there was no way to change it without the client sending the value
in every request. This adds a --max-tokens flag so operators can set a
sensible default at launch time (e.g. --max-tokens 8192).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant