
vLLM unified backend: python process hangs after Worker.run() returns on SIGTERM #9343

@tanmayv25

Description


Describe the Bug

When the vLLM unified backend (python -m dynamo.vllm.unified_main) receives SIGTERM, the Rust Worker orchestrator and engine_client.shutdown() both complete cleanly, but the Python process does not exit on its own. After the final "Phase 3: All endpoints ended gracefully" log line, the interpreter sits idle until the shutdown deadline (~30s) expires, and it then has to be force-killed with SIGKILL.

The hang occurs after every line of code owned by the unified Worker has finished. Something in vLLM's AsyncLLM / EngineCore subprocess / NixlConnector cleanup appears to be holding the interpreter open. This is likely not introduced by the unified abstraction; the legacy python -m dynamo.vllm path probably exhibits the same behavior under similar conditions.

Steps to Reproduce

  1. Build the dynamo wheels and install them into dynamo:latest-vllm-test (per the test-unified-disagg skill harness).
  2. Launch a unified vLLM disagg pair on a single GPU with scripts/run_vllm_disagg.sh.
  3. Once both workers register, fire a few cancelled streaming requests:
    for i in 1 2 3 4 5; do
      timeout 0.05 curl -sS -N -X POST http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model":"Qwen/Qwen3-0.6B","messages":[{"role":"user","content":"hi"}],
             "max_tokens":512,"stream":true}' >/dev/null 2>&1 &
    done
    wait
  4. Send SIGTERM to the decode worker PID.
  5. Observe: the Worker logs "Engine cleanup complete" and then "Phase 3: All endpoints ended gracefully", after which the Python process hangs.
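Steps 4 and 5 can also be scripted, so that the exit time is measured rather than eyeballed. What follows is a minimal sketch that assumes a Linux host and a known decode worker PID. sigterm_and_time_exit and _is_alive are hypothetical helper names, and the 30 s default only mirrors the observed shutdown deadline.

```python
import os
import signal
import time
from typing import Optional


def _is_alive(pid: int) -> bool:
    """Return True while `pid` still exists (hypothetical helper)."""
    try:
        # If pid is our own child, reap it so an exited child is not
        # mistaken for a live process (a zombie still answers signal 0).
        waited, _ = os.waitpid(pid, os.WNOHANG)
        return waited == 0  # 0 means the child is still running.
    except ChildProcessError:
        pass  # Not our child: fall back to a signal-0 existence probe.
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # Exists, but is owned by another user.
    return True


def sigterm_and_time_exit(pid: int, deadline_s: float = 30.0,
                          poll_s: float = 0.1) -> Optional[float]:
    """Send SIGTERM; return seconds until exit, or None on timeout."""
    os.kill(pid, signal.SIGTERM)
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        if not _is_alive(pid):
            return time.monotonic() - start
        time.sleep(poll_s)
    return None  # Still alive at the deadline: this is the hang.
```

A return value of None from sigterm_and_time_exit(decode_pid) reproduces the report. A float means the worker exited within the deadline.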

Expected Behavior

After Worker.run() returns and engine_client.shutdown() has been awaited, the Python interpreter should exit cleanly within a few seconds. As things stand, operators running unified vLLM in environments with strict shutdown timeouts (k8s terminationGracePeriodSeconds, systemd TimeoutStopSec) get SIGKILLed unnecessarily.
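Until the root cause is found, one possible stopgap (not a fix) is to bound the exit time. Once Worker.run() has returned and engine_client.shutdown() has been awaited, arm a daemon timer that dumps every thread's stack and then hard-exits if the interpreter is still alive. The sketch assumes you control the entry point. arm_exit_watchdog is a hypothetical name. Note that os._exit skips atexit handlers and buffered I/O flushing, so anything that must be persisted needs to be flushed beforehand.

```python
import faulthandler
import os
import sys
import threading


def arm_exit_watchdog(timeout_s: float, exit_code: int = 0) -> threading.Timer:
    """Hard-exit the process if it is still alive `timeout_s` from now.

    Hypothetical stopgap. The timer thread is a daemon, so it never keeps
    the interpreter alive by itself, and it keeps running while interpreter
    shutdown is blocked joining non-daemon threads or child processes.
    """
    def _fire() -> None:
        # Record what is still running before bailing out.
        faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
        sys.stderr.flush()
        sys.stdout.flush()
        os._exit(exit_code)  # Skips atexit handlers and pending joins.

    timer = threading.Timer(timeout_s, _fire)
    timer.daemon = True
    timer.start()
    return timer
```

For example, calling arm_exit_watchdog(10.0) right after the awaited shutdown caps the exit at roughly 10 s, well inside a 30 s grace period. exit_code defaults to 0 because the graceful orchestration itself already succeeded. A nonzero code would make the forced exit visible to the supervisor.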

Actual Behavior

The Worker shutdown sequence completes cleanly:

Received shutdown signal; running graceful orchestration
Endpoint unregistered from discovery
Engine cleanup complete
Phase 1: Cancelling endpoint shutdown token
Phase 2: Waiting for graceful endpoints to complete
Phase 3: All endpoints ended gracefully. Connections to backend services...

After that last line, no further logs appear and the Python process stays alive until it is force-killed with SIGKILL (verified: still alive after a 35s wait).
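Because the hung process logs nothing, the next useful datum is a stack dump of every thread at the moment it is stuck. The minimal sketch below uses the standard-library faulthandler module and assumes that SIGUSR1 is not already claimed by vLLM or Dynamo inside the worker. install_hang_diagnostics is a hypothetical name. Call it right after Worker.run() returns. You can then run kill -USR1 <pid> against the hung worker, or let the one-shot timer fire on its own.

```python
import faulthandler
import signal
import sys
from typing import TextIO


def install_hang_diagnostics(dump_after_s: float,
                             file: TextIO = sys.stderr) -> None:
    """Dump all thread stacks on SIGUSR1 and once after `dump_after_s`.

    Hypothetical helper, intended to be called right after Worker.run()
    returns so that the one-shot dump lands while the process is stuck.
    """
    # On-demand dump: `kill -USR1 <pid>` prints every thread's stack.
    faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)
    # One-shot dump from faulthandler's own watchdog thread; exit=False
    # leaves the process running so it can still be inspected afterwards.
    faulthandler.dump_traceback_later(dump_after_s, repeat=False,
                                      file=file, exit=False)
```

faulthandler only sees threads that have a Python thread state. A native NIXL transport thread with no Python frames would need py-spy dump --pid <pid> --native or gdb instead.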

The deferred-abort guards (_DeferredAbort.close() in components/src/dynamo/vllm/llm_engine.py) are NOT the cause. All in-flight cancelled requests were verified to log "request completed cleanly" before SIGTERM, and "Engine cleanup complete" fires successfully. The suspected culprits are in vLLM's own teardown:

  • EngineCore subprocess not reaped after AsyncLLM.shutdown()
  • NixlConnector / NIXL transport thread not joined
  • multiprocessing resource_tracker or manager threads holding the interpreter
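The thread and process suspects correspond to concrete CPython exit behavior. At interpreter shutdown, threading joins every non-daemon thread, and multiprocessing's atexit handler joins any non-daemonic child processes that are still running. The minimal sketch below checks both from inside the worker right after Worker.run() returns. report_exit_blockers is a hypothetical name. It cannot see native threads that lack a Python thread state, nor children started outside multiprocessing.Process, such as the resource_tracker process.

```python
import multiprocessing
import threading
from typing import Dict, List


def report_exit_blockers() -> Dict[str, List[str]]:
    """List what could block a normal interpreter exit right now."""
    main = threading.main_thread()
    # Non-daemon threads are joined by threading at shutdown.
    threads = [
        f"{t.name} (ident={t.ident})"
        for t in threading.enumerate()
        if t is not main and not t.daemon and t.is_alive()
    ]
    # At exit, daemonic children are terminated and the rest are joined;
    # the daemon flag is reported so the two cases can be told apart.
    children = [
        f"{p.name} (pid={p.pid}, daemon={p.daemon})"
        for p in multiprocessing.active_children()
    ]
    return {"non_daemon_threads": threads, "mp_children": children}
```

Logging the result (e.g. logger.warning("exit blockers: %s", report_exit_blockers())) just after Phase 3 would confirm or rule out the thread and multiprocessing suspects before resorting to a native debugger.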

Environment

  • OS: Ubuntu 24.04.4 LTS (host); Ubuntu container in dynamo:latest-vllm-test
  • Dynamo Runtime Version: local build, branch tanmayv-unified-disagg
  • vLLM Version: v0.20.1
  • CPU Architecture: x86_64
  • CUDA Version: 12.9.1 (container), driver 580.126.09 (host)
  • GPU Architecture: NVIDIA RTX 5880 Ada Generation
  • Python Version: 3.12.3 (in container)

Labels

  • backend::vllm (Relates to the vllm backend)
  • bug (Something isn't working)
  • fault tolerance
  • language::python (Issues/PRs that reference Python code)
  • nixl (Relates to NIXL)
