SGLang unified backend: prefill worker has no drain() — NIXL KV transfer can be torn down mid-flight #9345

@tanmayv25

Describe the Bug

SglangLLMEngine (unified backend, components/src/dynamo/sglang/llm_engine.py) does not override LLMEngine.drain() — it inherits the default no-op. On a prefill worker shutdown (SIGTERM), the unified Worker runs:

  1. discovery unregister
  2. DYN_GRACEFUL_SHUTDOWN_GRACE_PERIOD_SECS sleep (default 5s)
  3. engine.drain() → no-op for SGLang
  4. engine.cleanup() → cancels _prefill_consume_tasks, then engine.shutdown()

If a decode peer is mid-NIXL-pull on a bootstrap room when step 4 fires, SGLang tears down the engine and the bootstrap room while the transfer is in flight. The decode peer's NIXL connect then fails or returns garbage data (issue #7319).
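The ordering problem in the four steps above can be reproduced with a toy asyncio model. This is only a sketch: `ToyEngine`, `shutdown`, and `simulate` are hypothetical stand-ins, not Dynamo or SGLang code, and a "transfer" is just a sleep that outlives the grace period.

```python
import asyncio


class ToyEngine:
    """Hypothetical stand-in for a prefill engine (not SglangLLMEngine)."""

    def __init__(self, drain_waits: bool) -> None:
        self.drain_waits = drain_waits
        self.transfers: set[asyncio.Task] = set()
        self.completed = 0
        self.aborted = 0

    async def _transfer(self, seconds: float) -> None:
        # Simulates a decode peer pulling KV blocks from this worker.
        try:
            await asyncio.sleep(seconds)
            self.completed += 1
        except asyncio.CancelledError:
            self.aborted += 1
            raise

    def start_transfer(self, seconds: float) -> None:
        self.transfers.add(asyncio.create_task(self._transfer(seconds)))

    async def drain(self) -> None:
        # drain_waits=False models the inherited no-op drain().
        if self.drain_waits and self.transfers:
            await asyncio.wait(self.transfers)

    async def cleanup(self) -> None:
        # Models cleanup() cancelling outstanding tasks before shutdown.
        for task in self.transfers:
            task.cancel()
        await asyncio.gather(*self.transfers, return_exceptions=True)


async def shutdown(engine: ToyEngine, grace_period: float) -> None:
    # Step 1 (discovery unregister) is omitted in this toy model.
    await asyncio.sleep(grace_period)  # step 2: grace period
    await engine.drain()               # step 3
    await engine.cleanup()             # step 4


def simulate(drain_waits: bool) -> tuple[int, int]:
    """Return (completed, aborted) transfer counts after shutdown."""

    async def run() -> tuple[int, int]:
        engine = ToyEngine(drain_waits)
        engine.start_transfer(0.05)  # longer than the grace period
        await shutdown(engine, grace_period=0.01)
        return engine.completed, engine.aborted

    return asyncio.run(run())
```

With `drain_waits=False` the single transfer is cancelled in `cleanup()`; with `drain_waits=True` it finishes before `cleanup()` runs.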

This is at parity with the legacy python -m dynamo.sglang path — the legacy shutdown.py:install_graceful_shutdown only does discovery unregister + grace period, not transfer draining. Not a regression from the unified abstraction.

Comparison: TRT-LLM has this fix. TrtllmLLMEngine.drain() polls engine.llm.get_stats_async() until idle. SGLang likely needs an analogous poll on tokenizer_manager's scheduler state (e.g. get_internal_state() or similar) — needs investigation.

Steps to Reproduce

  1. Launch unified SGLang disagg with prefill + decode on separate GPUs.
  2. Send a steady stream of requests so the prefill worker has in-flight NIXL transfers when shutdown fires.
  3. SIGTERM the prefill worker.
  4. Decode peer's NIXL pull will fail because the prefill side's bootstrap room was torn down mid-transfer.

Expected Behavior

SglangLLMEngine.drain() should poll SGLang's scheduler / tokenizer_manager for outstanding KV transfers and wait until they complete (with a sensible timeout). Add the override and mirror the TRT-LLM pattern.
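A minimal sketch of the poll-with-timeout shape such an override could take. The probe is deliberately left as a caller-supplied coroutine: which SGLang call reports outstanding KV transfers still needs investigation, so `count_inflight`, `drain_until_idle`, and the default timeout and interval are all hypothetical names and values, not existing Dynamo or SGLang APIs.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def drain_until_idle(
    count_inflight: Callable[[], Awaitable[int]],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> bool:
    """Poll until count_inflight() returns 0. Return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if await count_inflight() == 0:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        await asyncio.sleep(min(poll_interval_s, remaining))


def demo(counts: list[int], timeout_s: float) -> bool:
    """Drive drain_until_idle with a fake probe that replays `counts`.

    The last value repeats forever, so [1] models a transfer that never ends.
    """
    pending = list(counts)

    async def fake_probe() -> int:
        return pending.pop(0) if len(pending) > 1 else pending[0]

    return asyncio.run(
        drain_until_idle(fake_probe, timeout_s=timeout_s, poll_interval_s=0.01)
    )
```

Returning a bool instead of raising lets the caller log a timeout and still proceed to cleanup(), so a stuck transfer cannot block shutdown indefinitely.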

Actual Behavior

drain() is a no-op; cleanup proceeds immediately. The cancellation of _prefill_consume_tasks in cleanup() then aborts in-flight prefill streams.

Environment

  • OS: Ubuntu 24.04.4 LTS (host); Ubuntu container in dynamo:latest-sglang-test
  • Dynamo Runtime Version: branch tanmayv-unified-disagg
  • SGLang Version: (latest in dynamo:latest-sglang-test)
  • CPU Architecture: x86_64
  • CUDA Version: 12.9.1
  • GPU Architecture: NVIDIA RTX 5880 Ada Generation
  • Python Version: 3.12.3

Labels: backend::sglang, bug, nixl
