[BUG] vLLM LoRA runtime updates can make versioned model routes return 404 #1193

## Checklist

  - [ ] The error occurs when using our provided Docker image.
  - [ ] I can consistently reproduce the bug across multiple trials or random seeds.
  - [ ] If the error causes experiment abortion, I've verified that this error is the root
    cause, not a secondary error caused by peer workers.

  ## Detailed Information

  ### Describe the bug

  When using LoRA with versioned model names on the vLLM backend, runtime weight updates can make earlier versioned model routes unavailable while rollout requests are still using them.

  As a result, `/v1/chat/completions` requests that still name the earlier version fail with HTTP 404 (`NotFoundError`), and the rollout workflow execution fails.

  This issue is specific to the current vLLM integration path. The SGLang backend does not exhibit the same failure mode here because versioned LoRA adapters can coexist there, so loading a newer version does not immediately invalidate older versioned adapter names.
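To make the difference between the two behaviors concrete, here is a minimal in-memory sketch. It is a toy model only: `ReplacingRegistry`, `CoexistingRegistry`, and `rollout_survives_update` are hypothetical names, the adapter name `gui-lora-r64-v12` is invented for illustration, and none of this is vLLM's or SGLang's actual code or API.

```python
class ReplacingRegistry:
    """Toy model: loading a new adapter version drops the previous one."""

    def __init__(self):
        self._active = None

    def load(self, name):
        # The newly loaded version becomes the only resolvable name.
        self._active = name

    def resolve(self, name):
        if name != self._active:
            raise LookupError(f"The model `{name}` does not exist.")
        return name


class CoexistingRegistry:
    """Toy model: versioned adapters stay loaded side by side."""

    def __init__(self):
        self._loaded = set()

    def load(self, name):
        self._loaded.add(name)

    def resolve(self, name):
        if name not in self._loaded:
            raise LookupError(f"The model `{name}` does not exist.")
        return name


def rollout_survives_update(registry):
    """Return True if a request for the old version still resolves
    after a newer version has been loaded."""
    registry.load("gui-lora-r64-v11")
    registry.load("gui-lora-r64-v12")  # hypothetical newer version
    try:
        registry.resolve("gui-lora-r64-v11")
        return True
    except LookupError:
        return False
```

Under these assumptions, `rollout_survives_update(ReplacingRegistry())` models the reported 404, while `rollout_survives_update(CoexistingRegistry())` models the behavior described for SGLang.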

  ### Expected behavior

  Older in-flight rollout requests targeting versioned LoRA model names should not fail with `model does not exist` immediately after a newer LoRA version is loaded.
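One way to picture the expected behavior is a registry that retires an older version when a newer one loads, but only evicts it once no in-flight request still references it. The sketch below is an assumption-laden illustration, not a proposed patch: `RetainingRegistry` and its methods are hypothetical and do not correspond to any existing vLLM interface.

```python
class RetainingRegistry:
    """Toy model: older versions are retired on update but kept
    until their in-flight request count drops to zero."""

    def __init__(self):
        self._inflight = {}    # adapter name -> in-flight request count
        self._retired = set()  # names superseded by a newer load

    def load(self, name):
        # Mark every other loaded adapter as retired, then register the new one.
        for other in self._inflight:
            if other != name:
                self._retired.add(other)
        self._inflight.setdefault(name, 0)
        self._retired.discard(name)
        self._evict_idle()

    def acquire(self, name):
        # Called when a request starts using an adapter.
        if name not in self._inflight:
            raise LookupError(f"The model `{name}` does not exist.")
        self._inflight[name] += 1

    def release(self, name):
        # Called when a request finishes.
        self._inflight[name] -= 1
        self._evict_idle()

    def _evict_idle(self):
        # Drop retired adapters that no request is using any more.
        idle = [n for n in self._retired if self._inflight.get(n, 0) == 0]
        for name in idle:
            self._retired.discard(name)
            del self._inflight[name]

    def loaded(self):
        return sorted(self._inflight)
```

In this sketch, a request that acquired `gui-lora-r64-v11` before the update can finish normally, and the old name only starts raising `LookupError` after the last such request has been released.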

  ### Full logs

  ```text
  HTTP request to .../v1/chat/completions failed with ClientResponseError: 404, message='Not Found' (attempt 3/3)

  Error with model error=ErrorInfo(message='The model `gui-lora-r64-v11` does not exist.', type='NotFoundError', param='model', code=404)

  Workflow execution failed
  ```

  Another occurrence from the same failure class:

  ```text
  Error with model error=ErrorInfo(message='The model `gui-lora-r64` does not exist.', type='NotFoundError', param='model', code=404)
  ```
