Commit cdef3d0
[None][fix] Post-rebase fixes + coverage tests for DeepSeek-V4 disaggregation

Squash of the post-cherry-pick work layered on top of the 8 DeepSeek-V4
disaggregation cherry-picks.

Fixes:
- ADP disagg error path: restore per-request hang signal (_event_loop_error),
  scan all candidates + prefer CTX role for mixed-batch dummy padding, and
  keep charge_budget=False on KV-transfer timeouts so they don't exhaust the
  global error budget and shut down the executor.
- _count_schedulable_active_requests: gate the GENERATION_TO_COMPLETE exclusion
  on the V2 KV-cache manager. Only the V2 scheduler skips state
  >= GENERATION_TO_COMPLETE; the V1 scheduler still forwards those requests, so
  excluding them under V1 ADP undercounted and spuriously inserted an ADP dummy
  on top of a real request -- overflowing a small batch and tripping the mamba
  dummy-mask assert (n <= _dummy_request_mask_host.shape[0]) / "No free slots".
  Fixes test_ptp_quickstart_advanced_deepseek_v3_lite_4gpus_adp_balance.
- transceiver: only short-circuit the tp_allgather skip when pp_size==1
  (_ctx_need_pp_sync) -- the PP>1 path asymmetrically flips send/recv markers
  across pipeline stages and deadlocks the _ctx_consensus pp_allgather.
- py_executor: restore main's immediate benchmark fail-fast guard.
- resource_manager: do NOT narrow trim_to_history's except (resize() can raise
  non-ValueError under v2 SWA + uneven-PP; narrowing leaked KV blocks).
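
The _count_schedulable_active_requests gating described above can be
illustrated with a minimal sketch. This is not the code from py_executor: the
RequestState values, the Request class, the uses_v2_kv_cache_manager flag, and
needs_adp_dummy are hypothetical stand-ins, and the V1 branch is simplified to
"count everything".

```python
from dataclasses import dataclass
from enum import IntEnum


class RequestState(IntEnum):
    # Hypothetical, simplified ordering; the real enum has more states.
    CONTEXT_INIT = 1
    GENERATION_IN_PROGRESS = 2
    GENERATION_TO_COMPLETE = 3
    GENERATION_COMPLETE = 4


@dataclass
class Request:
    state: RequestState


def count_schedulable_active_requests(requests, uses_v2_kv_cache_manager):
    """Count the requests the scheduler is expected to forward this step."""
    if not uses_v2_kv_cache_manager:
        # Assumption taken from the commit message: the V1 scheduler still
        # forwards requests at or past GENERATION_TO_COMPLETE, so they count.
        return len(requests)
    # The V2 scheduler skips state >= GENERATION_TO_COMPLETE, so exclude them.
    return sum(1 for r in requests
               if r.state < RequestState.GENERATION_TO_COMPLETE)


def needs_adp_dummy(requests, uses_v2_kv_cache_manager):
    """A rank needs a dummy only if it has nothing schedulable (simplified)."""
    return count_schedulable_active_requests(
        requests, uses_v2_kv_cache_manager) == 0
```

With a single request in GENERATION_TO_COMPLETE, the V1 path counts one
schedulable request and inserts no dummy, while the V2 path counts zero and
pads. Applying the V2 exclusion under V1 is the undercount the fix removes.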

Tests (added to existing files):
- test_py_executor.py: disagg cache-error sync + ADP no-op paths; ADP dummy-role
  and _pad_attention_dp_dummy_request V1/V2 GENERATION_TO_COMPLETE behavior
  (adp_balance regression).
- test_kv_cache_v2_scheduler.py: trim_to_history.
- test_cache_reuse_adapter.py: trim-to-prompt-history + transceiver ctx mgr.
- test_router.py: finish_request explicit-session forwarding.
- test_agent.py: BindingsNixlTransferStatus + shutdown idempotency (#14137).
- transferAgentTest.cpp: status-outlives-agent (weak_ptr UAF safety) +
  concurrent submitTransferRequests.
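
A trim_to_history regression test could take roughly the following shape. This
is a sketch under stated assumptions, not the test added in this commit:
FakeKVCache, its resize()/close() methods, and both trim functions are
hypothetical, and the cleanup-in-except behavior is inferred from the note that
narrowing the except leaked KV blocks.

```python
class FakeKVCache:
    """Hypothetical stand-in for a per-request KV cache handle."""

    def __init__(self, blocks, resize_error=None):
        self.blocks = blocks
        self.resize_error = resize_error
        self.closed = False

    def resize(self, num_tokens):
        # Simulate a resize that may fail with an arbitrary exception type.
        if self.resize_error is not None:
            raise self.resize_error
        self.blocks = num_tokens

    def close(self):
        # Release everything the handle still owns.
        self.blocks = 0
        self.closed = True


def trim_to_history(cache, history_len):
    """Broad except: any resize failure still releases the blocks."""
    try:
        cache.resize(history_len)
        return True
    except Exception:
        cache.close()
        return False


def trim_to_history_narrow(cache, history_len):
    """Narrowed except: a non-ValueError escapes before cleanup runs."""
    try:
        cache.resize(history_len)
        return True
    except ValueError:
        cache.close()
        return False
```

In this model a RuntimeError from resize() leaves the narrowed variant holding
all of its blocks, whereas the broad variant closes the handle and reports
failure.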

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

1 parent 8b5e0d4 commit cdef3d0
9 files changed
Lines changed: 593 additions & 83 deletions
File tree
- cpp/tests/unit_tests/executor
- tensorrt_llm/_torch
  - disaggregation
  - pyexecutor
- tests/unittest
  - _torch/executor
  - disaggregated