Describe the Bug
When the vLLM unified backend (python -m dynamo.vllm.unified_main) receives SIGTERM, the Rust Worker orchestrator and engine_client.shutdown() complete cleanly — but the Python process does not exit on its own. After the final Phase 3: All endpoints ended gracefully log, the interpreter sits idle and must be force-killed with SIGKILL after the shutdown deadline (~30s) expires.
The hang is downstream of every line of code the unified Worker owns — something in vLLM's AsyncLLM / EngineCore subprocess / NixlConnector cleanup is holding the interpreter open. This is likely not introduced by the unified abstraction; the legacy python -m dynamo.vllm path probably exhibits the same behavior under similar conditions.
Steps to Reproduce
- Build the dynamo wheels and install into dynamo:latest-vllm-test (per the test-unified-disagg skill harness).
- Launch a unified vLLM disagg pair on a single GPU with scripts/run_vllm_disagg.sh.
- Once both workers register, fire a few cancelled streaming requests:
for i in 1 2 3 4 5; do
timeout 0.05 curl -sS -N -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"Qwen/Qwen3-0.6B","messages":[{"role":"user","content":"hi"}],
"max_tokens":512,"stream":true}' >/dev/null 2>&1 &
done
wait
- Send SIGTERM to the decode worker PID.
- Observe: the Worker logs Engine cleanup complete → Phase 3: All endpoints ended gracefully, then the python process hangs.
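The last two steps (send SIGTERM, then check whether the process is still alive) can be scripted so the hang is measured the same way on every run. This is a minimal sketch and not part of the dynamo test harness: the helper names pid_alive and sigterm_and_wait are hypothetical, and the 35 s default simply mirrors the wait used in this report.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """Return True if `pid` still exists.

    If `pid` is a child of this process, it is reaped first so that a
    zombie is not mistaken for a live process.
    """
    try:
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        # reaped == pid: our child has exited; reaped == 0: still running.
        return reaped != pid
    except ChildProcessError:
        pass  # not our child; fall back to a signal-0 probe
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def sigterm_and_wait(pid: int, deadline_s: float = 35.0, poll_s: float = 0.5) -> bool:
    """Send SIGTERM to `pid`; return True if it exits before the deadline."""
    os.kill(pid, signal.SIGTERM)
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if not pid_alive(pid):
            return True
        time.sleep(poll_s)
    return not pid_alive(pid)
```

Calling sigterm_and_wait(decode_worker_pid) from a Python shell is expected to return False when the hang described here reproduces, and True once the process exits on its own.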
Expected Behavior
After Worker.run() returns and engine_client.shutdown() has been awaited, the Python interpreter should exit cleanly within a few seconds. When it does not, operators running unified vLLM under strict shutdown timeouts (k8s terminationGracePeriodSeconds, systemd TimeoutStopSec) get SIGKILLed unnecessarily.
Actual Behavior
The Worker shutdown sequence completes cleanly:
Received shutdown signal; running graceful orchestration
Endpoint unregistered from discovery
Engine cleanup complete
Phase 1: Cancelling endpoint shutdown token
Phase 2: Waiting for graceful endpoints to complete
Phase 3: All endpoints ended gracefully. Connections to backend services...
After that last line, no further logs fire and the python process remains alive until force-killed with SIGKILL (verified with 35s wait → still alive).
The deferred-abort guards (_DeferredAbort.close() in components/src/dynamo/vllm/llm_engine.py) are NOT the cause — verified that all in-flight cancelled requests log request completed cleanly before SIGTERM, and Engine cleanup complete fires successfully. Suspected culprits are in vLLM's own teardown:
- EngineCore subprocess not reaped after AsyncLLM.shutdown()
- NixlConnector / NIXL transport thread not joined
- multiprocessing resource_tracker or manager threads holding the interpreter
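To narrow down which of these suspects is responsible, one option is to list what can still block interpreter exit right after engine_client.shutdown() has been awaited. The sketch below is an assumption about how one might instrument the process, not existing dynamo or vLLM code; report_exit_blockers is a hypothetical helper that uses only the standard library. It reports non-daemon Python threads and children started through multiprocessing. Native threads created inside C/C++ extensions (for example by a transport library) do not appear in threading.enumerate(), so an empty report does not rule them out.

```python
import multiprocessing
import sys
import threading


def report_exit_blockers(out=sys.stderr) -> list[str]:
    """List Python-level objects that can keep the interpreter from exiting."""
    lines = []
    # Non-daemon threads are joined during interpreter shutdown.
    for t in threading.enumerate():
        if t is threading.main_thread() or t.daemon or not t.is_alive():
            continue
        lines.append(f"non-daemon thread: name={t.name!r} ident={t.ident}")
    # Live children that this process started via multiprocessing.
    for p in multiprocessing.active_children():
        lines.append(f"child process: name={p.name!r} pid={p.pid} daemon={p.daemon}")
    for line in lines:
        print(line, file=out)
    return lines
```

Calling report_exit_blockers() just before the entry point returns would show whether a leftover thread or child exists at that point. For stack traces of every Python thread in a hung process, the standard library's faulthandler.dump_traceback_later(timeout) can be armed at the same place.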
Environment
- OS: Ubuntu 24.04.4 LTS (host); Ubuntu container in dynamo:latest-vllm-test
- Dynamo Runtime Version: local build, branch tanmayv-unified-disagg
- vLLM Version: v0.20.1
- CPU Architecture: x86_64
- CUDA Version: 12.9.1 (container), driver 580.126.09 (host)
- GPU Architecture: NVIDIA RTX 5880 Ada Generation
- Python Version: 3.12.3 (in container)