Commit c830f99

server : support max_completion_tokens request property (ggml-org#19831)
"max_tokens" is deprectated in favor of "max_completion_tokens" which sets the upper bound for reasoning+output token. Closes: ggml-org#13700
1 parent aa6f918 commit c830f99

1 file changed: tools/server/server-task.cpp

Lines changed: 2 additions & 1 deletion
@@ -204,7 +204,8 @@ task_params server_task::params_from_json_cmpl(
     params.cache_prompt    = json_value(data, "cache_prompt", defaults.cache_prompt);
     params.return_tokens   = json_value(data, "return_tokens", false);
     params.return_progress = json_value(data, "return_progress", false);
-    params.n_predict       = json_value(data, "n_predict", json_value(data, "max_tokens", defaults.n_predict));
+    auto max_tokens        = json_value(data, "max_tokens", defaults.n_predict);
+    params.n_predict       = json_value(data, "n_predict", json_value(data, "max_completion_tokens", max_tokens));
     params.n_indent        = json_value(data, "n_indent", defaults.n_indent);
     params.n_keep          = json_value(data, "n_keep", defaults.n_keep);
     params.n_discard       = json_value(data, "n_discard", defaults.n_discard);

0 commit comments