Type of Change
Bug fix
Problem Statement
The SGLang multimodal encode worker loads the entire model (including LLM weights), which causes out-of-memory (OOM) errors on GPUs with limited memory.
Proposed Solution
Set server_args.encoder_only = True in args.py when dynamo_config.multimodal_encode_worker is True, right after ServerArgs.from_cli_args() is called.
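A minimal sketch of the proposed change, under stated assumptions. The two dataclasses are stand-ins for SGLang's ServerArgs and the Dynamo config object, reduced to the single field each one contributes here. The helper name apply_encode_worker_override is hypothetical; in args.py the same check would presumably sit inline, right after ServerArgs.from_cli_args() returns.

```python
from dataclasses import dataclass


@dataclass
class ServerArgs:
    """Stand-in for SGLang's ServerArgs (assumption: only this field matters here)."""
    encoder_only: bool = False


@dataclass
class DynamoConfig:
    """Stand-in for the Dynamo config object (assumption: only this field matters here)."""
    multimodal_encode_worker: bool = False


def apply_encode_worker_override(server_args: ServerArgs, dynamo_config: DynamoConfig) -> ServerArgs:
    """Hypothetical helper: force encoder-only loading for the multimodal encode worker."""
    if dynamo_config.multimodal_encode_worker:
        # Skip the LLM weights; the encode worker only needs the encoder.
        server_args.encoder_only = True
    return server_args
```

Other worker types are left untouched: the flag is only set when multimodal_encode_worker is True, so an existing encoder_only value passed on the command line is never reset to False.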
Estimated PR Size
XS (1-10 lines)
Files/Components Affected
components/src/dynamo/sglang/args.py