
[CONTRIBUTION]: Enable encoder_only mode for SGLang multimodal encode workers #9291

@dsocek

Description

Type of Change

Bug fix

Problem Statement

The SGLang multimodal encode worker loads the entire model, including the LLM weights, even though it only runs the multimodal encoder. This causes out-of-memory (OOM) errors on GPUs with limited memory.

Proposed Solution

Set server_args.encoder_only = True in args.py when dynamo_config.multimodal_encode_worker is True, right after ServerArgs.from_cli_args() is called.
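The proposed change can be sketched as follows. This is a minimal, self-contained illustration, not the actual contents of args.py. ServerArgs and DynamoConfig below are simplified, hypothetical stand-ins that model only the fields involved, and build_server_args is a hypothetical helper name. The key point is the ordering: the override is applied right after ServerArgs.from_cli_args() returns, and only when the worker is a multimodal encode worker.

```python
import argparse
from dataclasses import dataclass


# Hypothetical, simplified stand-in for SGLang's ServerArgs; only the
# fields needed to illustrate this change are modeled.
@dataclass
class ServerArgs:
    model_path: str
    encoder_only: bool = False

    @classmethod
    def from_cli_args(cls, args: argparse.Namespace) -> "ServerArgs":
        return cls(model_path=args.model_path)


# Hypothetical stand-in for Dynamo's parsed worker configuration.
@dataclass
class DynamoConfig:
    multimodal_encode_worker: bool = False


def build_server_args(args: argparse.Namespace, dynamo_config: DynamoConfig) -> ServerArgs:
    server_args = ServerArgs.from_cli_args(args)
    # Proposed fix: an encode worker only runs the multimodal encoder, so
    # tell SGLang not to load the LLM weights (avoids OOM on small GPUs).
    if dynamo_config.multimodal_encode_worker:
        server_args.encoder_only = True
    return server_args
```

Other worker types are left untouched: when multimodal_encode_worker is False, whatever encoder_only value came from the CLI is kept.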

Estimated PR Size

XS (1-10 lines)

Files/Components Affected

components/src/dynamo/sglang/args.py

Metadata

Assignees

No one assigned

Labels

backend::sglang, bug, contribution-request, language::python

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
