docs/parameters.md: 5 additions, 0 deletions
@@ -15,6 +15,11 @@ When using FastDeploy to deploy models (including offline inference and service
|```engine_worker_queue_port```|`list[int]`| FastDeploy internal engine communication port list, auto-allocated based on data_parallel_size |
|```cache_queue_port```|`list[int]`| FastDeploy internal KVCache process communication port list, auto-allocated based on data_parallel_size |
|```max_model_len```|`int`| Default maximum supported context length for inference, default: 2048 |
+|```max_completion_tokens```|`int`| Server-level maximum allowed completion token length (hard cap). Per-request max_tokens will be clamped to this value. Default: None (bounded by max_model_len - input_len) |
+|```reasoning_max_tokens```|`int`| Server-level maximum allowed reasoning/thinking token length (hard cap). Per-request value will be clamped to this value. Default: None (no cap) |
+|```response_max_tokens```|`int`| Server-level maximum allowed response token length (hard cap). Per-request value will be clamped to this value. Default: None (no cap) |
+|```input_max_tokens```|`int`| Server-level maximum input token length. Requests with prompt longer than this will be rejected. Default: None (no limit, bounded by max_model_len) |
|```tensor_parallel_size```|`int`| Default tensor parallelism degree for model, default: 1 |
|```data_parallel_size```|`int`| Default data parallelism degree for model, default: 1 |
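
The clamping and rejection rules described in the added rows can be sketched as below. This is a minimal illustration, not FastDeploy's implementation: the helper name `effective_max_tokens` and the order in which the caps are applied are assumptions, and only `max_model_len`, `max_completion_tokens`, and `input_max_tokens` are modeled.

```python
from typing import Optional


def effective_max_tokens(
    requested: Optional[int],
    input_len: int,
    max_model_len: int = 2048,
    max_completion_tokens: Optional[int] = None,
    input_max_tokens: Optional[int] = None,
) -> int:
    """Hypothetical helper: resolve the completion budget for one request."""
    # Reject prompts longer than the server-level input cap, if one is set.
    if input_max_tokens is not None and input_len > input_max_tokens:
        raise ValueError(
            f"prompt has {input_len} tokens, input_max_tokens is {input_max_tokens}"
        )

    # The context window always bounds the completion: max_model_len - input_len.
    remaining = max_model_len - input_len
    if remaining <= 0:
        raise ValueError(
            f"prompt has {input_len} tokens, max_model_len is {max_model_len}"
        )

    # With no per-request value, fall back to whatever the context window allows.
    budget = remaining if requested is None else min(requested, remaining)

    # Clamp to the server-level hard cap when it is configured.
    if max_completion_tokens is not None:
        budget = min(budget, max_completion_tokens)
    return budget
```

Under these assumptions, a request asking for 4096 tokens against a cap of 512 is clamped to 512, while a request with no `max_tokens` and no cap gets the remainder of the context window. `reasoning_max_tokens` and `response_max_tokens` would presumably be clamped the same way against their own per-request values.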