Commit 402470f
Surface repair-needed age on operator metrics

Adds `runs.oldest_repair_needed_at` and `runs.max_repair_needed_age_ms`
to `OperatorMetrics::snapshot()` so operators can answer "how long has
the worst-case run been stuck without progress?" from the metric alone,
without walking `workflow_run_summaries`. This is the canonical
stuck-workflow duplicate-risk age indicator, paired with the
`durable_resume_paths` health check.

The summary's `updated_at` is sourced by `RunSummaryProjector` from
`WorkflowRun::last_progress_at`, so it advances when the run makes
forward progress and stalls when the run stops progressing. For runs
already pinned at `liveness_state = repair_needed` it is therefore the
closest available proxy for "when this run last made progress before
being marked broken." The timestamp falls back to the run's `started_at`
when the projection has not recorded a progress boundary (a fresh run
that was projected straight into `repair_needed` without a prior
progress write).
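The fallback rule above can be sketched as follows. This is only an illustration in Python, not the project's code: the `RunSummary` class and the `stuck_since` helper are hypothetical names, and it assumes a missing progress boundary shows up as an empty `updated_at`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RunSummary:
    """Hypothetical stand-in for one row of workflow_run_summaries."""
    liveness_state: str
    started_at: datetime
    # Assumption: None means the projection recorded no progress boundary.
    updated_at: Optional[datetime] = None


def stuck_since(summary: RunSummary) -> datetime:
    """Return the last recorded progress time, else the run's start time."""
    if summary.updated_at is not None:
        return summary.updated_at
    return summary.started_at


started = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
progressed = datetime(2024, 1, 1, 9, 45, tzinfo=timezone.utc)

with_progress = RunSummary("repair_needed", started, progressed)
fresh_run = RunSummary("repair_needed", started)
```

Here `stuck_since(with_progress)` yields the 09:45 progress time, while `stuck_since(fresh_run)` falls back to the 09:00 start time.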

The `repair_needed` predicate matches `liveness_state = 'repair_needed'`
exactly, mirroring the existing count under `runs.repair_needed`. It
deliberately excludes the routing-blocked variant
`workflow_task_waiting_for_compatible_worker` (a wait state, not a
broken state); compatibility-blocked age is already surfaced under
`backlog.oldest_compatibility_blocked_started_at` /
`max_compatibility_blocked_age_ms`. The new signal pairs with that
routing-block age the same way `tasks.oldest_unhealthy_at` /
`max_unhealthy_age_ms` roll up the four contributing per-path ages.
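As a rough sketch of the aggregation described above: filter on the exact state, take the oldest stuck-since timestamp, and report its age in milliseconds. The function name, the tuple-shaped rows, and the returned dictionary are assumptions made for illustration; the real `OperatorMetrics::snapshot()` is not shown in this commit message and may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

REPAIR_NEEDED = "repair_needed"
COMPAT_BLOCKED = "workflow_task_waiting_for_compatible_worker"


def repair_needed_age(rows: Iterable[Tuple[str, datetime]], now: datetime) -> dict:
    """rows are hypothetical (liveness_state, stuck_since) pairs."""
    # Exact match only: routing-blocked wait states do not contribute.
    stuck = [ts for state, ts in rows if state == REPAIR_NEEDED]
    if not stuck:
        return {"oldest_repair_needed_at": None, "max_repair_needed_age_ms": 0}
    oldest: Optional[datetime] = min(stuck)
    # Integer milliseconds, clamped at zero in case of clock skew.
    age_ms = max(0, (now - oldest) // timedelta(milliseconds=1))
    return {"oldest_repair_needed_at": oldest, "max_repair_needed_age_ms": age_ms}


def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


rows = [
    (REPAIR_NEEDED, at(11, 50)),
    (REPAIR_NEEDED, at(11, 30)),   # oldest broken run wins
    (COMPAT_BLOCKED, at(10, 0)),   # older, but a wait state, so ignored
]
snapshot = repair_needed_age(rows, now=at(12, 0))
empty = repair_needed_age([], now=at(12, 0))
```

With these sample rows the oldest timestamp is 11:30 and the age is 30 minutes (1,800,000 ms); the older compatibility-blocked row is excluded, and an empty input reads as `None` / 0.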

Pinned in the Frozen metric keys table of
`docs/architecture/rollout-safety.md` and asserted by
`RolloutSafetyDocumentationTest` (frozen-keys list plus a dedicated row
regex). Covered end-to-end by two new `V2OperatorMetricsTest` cases that
verify the predicate matches `liveness_state` exactly
(compatibility-blocked rows do not contribute their older `updated_at`),
that the oldest stuck-since timestamp wins across multiple
`repair_needed` runs, and that the keys read as `null` / 0 when no run
is in `repair_needed`.

1 parent b43d992
4 files changed
Lines changed: 155 additions & 0 deletions
File tree
- docs/architecture
- src/V2/Support
- tests
  - Feature/V2
  - Unit/V2