Commit 453923c
[RLlib] Handle the all-evaluation-workers-unhealthy case uniformly across modes
When all *configured* remote evaluation EnvRunners are unhealthy at the
start of an evaluation step, `Algorithm.evaluate()` previously did one
of two things, depending on `evaluation_parallel_to_training`:
- `evaluation_parallel_to_training=True`: fall back to the local eval
EnvRunner, which raises `ValueError: Cannot run on local evaluation
worker parallel to training!`. This hard-crashes long training runs.
- `evaluation_parallel_to_training=False`: silently fall back to the
local eval EnvRunner. It "works", but the eval numbers are quietly
produced on the driver process by a different EnvRunner than the ones
the user configured, with potentially different performance and env
settings.
Both behaviors are gone: in the failure case, RLlib no longer silently
falls back to local eval. Instead, two new orthogonal config knobs on
`AlgorithmConfig.evaluation()` control the behavior:
- `evaluation_unhealthy_workers_timeout_s` (float, in seconds, default 0):
how long to wait for at least one remote evaluation EnvRunner to recover.
- `evaluation_error_on_no_workers` (bool, default False): if no remote
evaluation EnvRunner is healthy after the wait, raise `RuntimeError`
(True) or skip evaluation for this iteration (False).
Both knobs apply uniformly regardless of `evaluation_parallel_to_training`.
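A minimal sketch of the decision the two knobs describe, assuming a simple polling loop: wait up to `evaluation_unhealthy_workers_timeout_s` for a healthy remote evaluation EnvRunner, then either raise `RuntimeError` or signal that this iteration's evaluation is skipped. This is an illustration, not RLlib's implementation: `FakeClock`, `FakeEvalRunnerGroup`, `wait_for_healthy_eval_runners`, `num_healthy_remote_workers()` and the polling interval are hypothetical; only `num_remote_env_runners()` is named in this commit. Note that the function never looks at `evaluation_parallel_to_training` and never falls back to a local runner.

```python
import time
from typing import Callable, Optional


class FakeClock:
    """Hypothetical deterministic clock so the sketch runs without real sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


class FakeEvalRunnerGroup:
    """Hypothetical stand-in for the evaluation EnvRunner group.

    All remote runners start unhealthy; one becomes healthy once the clock
    reaches `recover_at` (never, if `recover_at` is None).
    """

    def __init__(self, num_remote: int, recover_at: Optional[float],
                 clock: Callable[[], float]) -> None:
        self._num_remote = num_remote
        self._recover_at = recover_at
        self._clock = clock

    def num_remote_env_runners(self) -> int:
        return self._num_remote

    def num_healthy_remote_workers(self) -> int:
        recovered = self._recover_at is not None and self._clock() >= self._recover_at
        return 1 if recovered else 0


def wait_for_healthy_eval_runners(
    group,
    timeout_s: float = 0.0,             # evaluation_unhealthy_workers_timeout_s
    error_on_no_workers: bool = False,  # evaluation_error_on_no_workers
    poll_interval_s: float = 1.0,       # hypothetical polling period
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True to evaluate on remote runners, False to skip this iteration.

    Assumes at least one remote runner is configured. The same logic runs
    whether or not evaluation is parallel to training.
    """
    deadline = clock() + timeout_s
    while group.num_healthy_remote_workers() == 0:
        if clock() >= deadline:
            if error_on_no_workers:
                raise RuntimeError(
                    f"All {group.num_remote_env_runners()} remote evaluation "
                    f"EnvRunners are still unhealthy after {timeout_s}s."
                )
            return False  # Skip evaluation for this iteration.
        # Sleep until the next poll, but never past the deadline
        # (and never a negative amount, which time.sleep rejects).
        sleep(min(poll_interval_s, max(deadline - clock(), 0.0)))
    return True


# One runner recovers at t=3s, within a 10s timeout, so evaluation runs.
clock = FakeClock()
group = FakeEvalRunnerGroup(num_remote=2, recover_at=3.0, clock=clock)
print(wait_for_healthy_eval_runners(group, timeout_s=10.0, clock=clock, sleep=clock.sleep))
```

With the default timeout of 0 the loop exits on its first check, so an all-unhealthy group is skipped (or raises) immediately. Injecting `clock` and `sleep` only keeps the sketch testable; in an actual config the knobs would be set through `AlgorithmConfig.evaluation()`, e.g. `config.evaluation(evaluation_unhealthy_workers_timeout_s=30.0, evaluation_error_on_no_workers=True)`.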
The intentional `evaluation_num_env_runners=0` case (the user explicitly
asked for local-only eval) is preserved: it is not a fallback but the
user's chosen configuration, and it is recognized via
`num_remote_env_runners() == 0`.
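The distinction between the intentional local-only case and the failure case can be sketched as a three-way check. This is an illustration, not RLlib's code: `EvalPath` and `classify_eval_path` are hypothetical names, and the function takes plain counts rather than RLlib's EnvRunner group object; only the `num_remote_env_runners() == 0` test comes from the commit.

```python
from enum import Enum


class EvalPath(Enum):
    """Hypothetical labels for the three situations the commit distinguishes."""

    LOCAL_BY_CHOICE = "local"    # evaluation_num_env_runners=0: the user's chosen config
    REMOTE = "remote"            # at least one healthy remote evaluation EnvRunner
    ALL_UNHEALTHY = "unhealthy"  # governed by the two new knobs; never runs locally


def classify_eval_path(num_remote_env_runners: int, num_healthy_remote: int) -> EvalPath:
    # Zero *configured* remote runners means local-only eval was requested,
    # so running on the local EnvRunner is not a fallback.
    if num_remote_env_runners == 0:
        return EvalPath.LOCAL_BY_CHOICE
    if num_healthy_remote > 0:
        return EvalPath.REMOTE
    # Remote runners were configured but none is healthy: wait, then raise or skip.
    return EvalPath.ALL_UNHEALTHY


for configured, healthy in [(0, 0), (2, 1), (2, 0)]:
    print(configured, healthy, classify_eval_path(configured, healthy).value)
```

Keying the local-only branch on the *configured* count rather than the healthy count is what keeps a fully failed remote group from being mistaken for a deliberate local-only setup.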
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent bbf5fef
4 files changed
Lines changed: 248 additions & 3 deletions
File tree
- rllib
- algorithms
- tests
Lines changed: 126 additions & 0 deletions