[BUG] vLLM LoRA runtime updates can make versioned model routes return 404 #1193

@Wangxiaoxiaoa

Checklist

  • The error occurs when using our provided Docker image.
  • I can consistently reproduce the bug across multiple trials or random seeds.
  • If the error causes experiment abortion, I've verified that this error is the root
    cause, not a secondary error caused by peer workers.

Detailed Information

Describe the bug

When using LoRA with versioned model names on the vLLM backend, runtime weight updates can make earlier versioned model routes unavailable while rollout requests are still using them.

As a result, /v1/chat/completions requests fail with HTTP 404 even after all retries are exhausted, and the rollout workflow execution fails.

This issue is specific to the current vLLM integration path. The SGLang backend does not exhibit the same failure mode here because versioned LoRA adapters can coexist there, so loading a newer version does not immediately invalidate older versioned adapter names.
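The difference between the two behaviors can be illustrated with a toy model. This is a minimal sketch, not vLLM or SGLang code: `AdapterRegistry`, `load`, `resolve`, the `/tmp/...` paths, and the `gui-lora-r64-v12` name are all hypothetical, and the "replace on load" policy is only an assumption about what the current vLLM integration path effectively does, inferred from the 404s reported here.

```python
class ModelNotFound(LookupError):
    """Stands in for the server's 404 NotFoundError."""


class AdapterRegistry:
    """Toy registry of versioned LoRA adapter names (hypothetical, not a real API)."""

    def __init__(self, coexist: bool):
        # coexist=True: older versions stay registered (SGLang-like behavior).
        # coexist=False: loading a new version drops the others (assumed vLLM path).
        self.coexist = coexist
        self._adapters = {}  # adapter name -> adapter path

    def load(self, name: str, path: str) -> None:
        if not self.coexist:
            self._adapters.clear()
        self._adapters[name] = path

    def resolve(self, name: str) -> str:
        try:
            return self._adapters[name]
        except KeyError:
            raise ModelNotFound(f"The model `{name}` does not exist.") from None


def simulate(coexist: bool) -> str:
    registry = AdapterRegistry(coexist=coexist)
    registry.load("gui-lora-r64-v11", "/tmp/v11")
    # A rollout request has chosen v11, but a weight update loads v12
    # before that request is served.
    registry.load("gui-lora-r64-v12", "/tmp/v12")
    try:
        registry.resolve("gui-lora-r64-v11")
        return "ok"
    except ModelNotFound as exc:
        return f"404: {exc}"
```

With `coexist=True` the in-flight request still resolves; with `coexist=False` it hits the same "does not exist" error shown in the logs below.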

Expected behavior

In-flight rollout requests that target an older versioned LoRA model name should not fail with a "model does not exist" error immediately after a newer LoRA version is loaded.
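One way to get this behavior is to retire an old adapter name only after its in-flight requests have drained. The following is a minimal sketch of that bookkeeping under stated assumptions: `VersionedAdapters` and its methods are hypothetical names for illustration, not part of vLLM or this project, and a real fix would also need to call the backend's actual load and unload operations.

```python
class VersionedAdapters:
    """Toy bookkeeping: keep a superseded adapter until its requests finish."""

    def __init__(self):
        self._inflight = {}    # adapter name -> number of in-flight requests
        self._retired = set()  # names superseded by a newer version
        self.current = None

    def load(self, name):
        """Register a new version; the previous one is marked for retirement."""
        if name == self.current:
            return
        if self.current is not None:
            self._retired.add(self.current)
        self._retired.discard(name)  # reloading a retired name revives it
        self._inflight.setdefault(name, 0)
        self.current = name
        self._evict_idle()

    def acquire(self, name):
        """Called when a request starts using an adapter name."""
        if name not in self._inflight:
            raise LookupError(f"The model `{name}` does not exist.")
        self._inflight[name] += 1

    def release(self, name):
        """Called when a request finishes, successfully or not."""
        self._inflight[name] -= 1
        self._evict_idle()

    def _evict_idle(self):
        # Drop retired names only once nothing is using them.
        for name in list(self._retired):
            if self._inflight.get(name, 0) == 0:
                self._retired.discard(name)
                self._inflight.pop(name, None)

    def available(self):
        return sorted(self._inflight)
```

In this sketch, a request that acquired `gui-lora-r64-v11` before a newer version was loaded can still complete, and the old name disappears only after the matching `release` call.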

Full logs

HTTP request to .../v1/chat/completions failed with ClientResponseError: 404, message='Not Found' (attempt 3/3)

Error with model error=ErrorInfo(message='The model `gui-lora-r64-v11` does not exist.', type='NotFoundError', param='model', code=404)

Workflow execution failed

Another occurrence from the same failure class:

Error with model error=ErrorInfo(message='The model `gui-lora-r64` does not exist.', type='NotFoundError', param='model', code=404)

Metadata

Labels: bug, stale
