Commit f96369a
[None][fix] Post-rebase fixes + coverage tests for DeepSeek-V4 disaggregation
Squash of the post-cherry-pick work layered on top of the 8 DeepSeek-V4
disaggregation cherry-picks.
Fixes:
- ADP disagg error path: restore per-request hang signal (_event_loop_error),
scan all candidates + prefer CTX role for mixed-batch dummy padding, and
keep charge_budget=False on KV-transfer timeouts so they don't exhaust the
global error budget and shut down the executor.
- _count_schedulable_active_requests: revert the GENERATION_TO_COMPLETE
exclusion that #13900 added (upstream has no such exclusion). Under V1 ADP
the exclusion undercounted active requests, so a spurious ADP dummy was
inserted on top of a real request; the batch then exceeded
max_batch_size=1 and tripped the mamba dummy-mask assert / "No free slots".
Restores upstream's exact method
(fixes test_ptp_quickstart_advanced_deepseek_v3_lite_4gpus_adp_balance).
- _handle_responses KV-timeout: gate the tp_allgather on disagg
(kv_cache_transceiver is not None), not just enable_attention_dp.
py_kv_transfer_timed_out is disagg-only, so in non-disagg ADP this added a
spurious per-iteration collective that desynced adp_router's tp_allgather
(gather_all_rank_states received a bool -> TypeError). Verified by 4-GPU
DeepSeek-V3-Lite adp_balance e2e A/B (buggy: timeout hang; fixed: completes).
- transceiver: only short-circuit the tp_allgather skip when pp_size==1
(_ctx_need_pp_sync) -- the PP>1 path asymmetrically flips send/recv markers
across pipeline stages and deadlocks the _ctx_consensus pp_allgather.
- py_executor: restore main's immediate benchmark fail-fast guard.
- resource_manager: do NOT narrow trim_to_history's except (resize() can raise
non-ValueError under v2 SWA + uneven-PP; narrowing leaked KV blocks).
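The _handle_responses fix above comes down to one rule: a collective must be issued only under a condition that every rank evaluates identically and that matches what peer code expects. The following is a minimal sketch of that gating, not the actual executor code. `RecordingDist` and `handle_kv_timeout` are hypothetical names introduced for illustration; only `tp_allgather`, `enable_attention_dp`, and `kv_cache_transceiver` come from the commit message.

```python
from typing import List


class RecordingDist:
    """Toy single-rank communicator that records every collective it runs."""

    def __init__(self) -> None:
        self.calls: List[object] = []

    def tp_allgather(self, payload: object) -> List[object]:
        self.calls.append(payload)
        return [payload]  # a one-rank "world" simply echoes its own payload


def handle_kv_timeout(dist, enable_attention_dp, kv_cache_transceiver, timed_out):
    """Hypothetical helper: return True if any rank saw a KV-transfer timeout."""
    # Assumption for this sketch: "disagg" means a transceiver object exists.
    is_disagg = kv_cache_transceiver is not None
    if enable_attention_dp and is_disagg:
        # Only disaggregated ADP runs pay for (and stay in sync on) this collective.
        return any(dist.tp_allgather(timed_out))
    # Non-disagg ADP: no collective, so other tp_allgather users are not desynced.
    return timed_out


non_disagg = RecordingDist()
non_disagg_result = handle_kv_timeout(non_disagg, True, None, False)

disagg = RecordingDist()
disagg_result = handle_kv_timeout(disagg, True, object(), True)
```

In this toy, the non-disagg run issues no collective at all, while the disagg run issues exactly one with a boolean payload. Gating on `enable_attention_dp` alone would have added the boolean allgather to the non-disagg run as well.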
Tests (added to existing files):
- test_py_executor.py: disagg cache-error sync + ADP no-op paths; ADP dummy-role
behavior; GENERATION_TO_COMPLETE counts as active (adp_balance regression);
CTX dummy padding for disagg idle ranks (incl. awaiting-KV-transfer-only).
- test_kv_cache_v2_scheduler.py: trim_to_history.
- test_cache_reuse_adapter.py: trim-to-prompt-history + transceiver ctx mgr;
_create_kv_slice TokenRange (stub sets py_beam_width for the #14876 path).
- test_router.py: finish_request explicit-session forwarding.
- test_agent.py: BindingsNixlTransferStatus + shutdown idempotency (#14137).
- transferAgentTest.cpp: status-outlives-agent (weak_ptr UAF safety) +
concurrent submitTransferRequests.
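The GENERATION_TO_COMPLETE regression covered in test_py_executor.py can be pictured with a toy model. This is a sketch under stated assumptions, not the real _count_schedulable_active_requests: `State`, `Request`, `count_active`, and `batch_size_after_padding` are hypothetical, and the padding rule (one dummy when a rank looks idle) is a simplification of the ADP behavior the message describes.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class State(Enum):
    GENERATION_IN_PROGRESS = auto()
    GENERATION_TO_COMPLETE = auto()


@dataclass
class Request:
    state: State


def count_active(requests: List[Request], exclude_to_complete: bool) -> int:
    """Toy counter; the flag models the exclusion that the commit reverts."""
    if exclude_to_complete:
        return sum(r.state is not State.GENERATION_TO_COMPLETE for r in requests)
    return len(requests)


def batch_size_after_padding(requests: List[Request], exclude_to_complete: bool) -> int:
    # Assumed rule for this sketch: a rank that looks idle gets one dummy request.
    dummy = 1 if count_active(requests, exclude_to_complete) == 0 else 0
    return len(requests) + dummy


MAX_BATCH_SIZE = 1
batch = [Request(State.GENERATION_TO_COMPLETE)]

# With the exclusion, the rank looks idle and a dummy is stacked on a real request.
buggy = batch_size_after_padding(batch, exclude_to_complete=True)
# Without it, the real request is counted and no dummy is added.
fixed = batch_size_after_padding(batch, exclude_to_complete=False)
```

Under these assumptions the excluded-state variant yields a batch of two, which exceeds `MAX_BATCH_SIZE`, while the restored counting keeps the batch at one.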
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
1 parent f02b4c2 commit f96369a
8 files changed
Lines changed: 566 additions & 104 deletions
File tree
- cpp/tests/unit_tests/executor
- tensorrt_llm/_torch
  - disaggregation
  - pyexecutor
- tests/unittest
  - _torch/executor
  - disaggregated