Commit 5451d17
Rescues @imwyvern's #1757 against current main, addressing both items
in @phil-opp's 2026-04-28 review (changes-requested).
The bug (#1682)
================
`binaries/coordinator/src/lib.rs:1558` (pre-fix) had:

    Ok(_) => {
        dataflow.node_to_daemon.insert(...);
        dataflow.descriptor.nodes.push(...);
        dataflow.nodes.insert(...);
        Ok(ControlRequestReply::NodeAdded { ... })
    }

The `Ok(_)` arm accepted ANY successful `send_and_receive` reply from
the daemon as proof that AddNode applied. The daemon's actual handler
returned `reply_tx.send(None)`, a null reply, and the coordinator
still committed state.

When the daemon's TCP connection carried a stale or out-of-order
`SetParamResult` or other variant (which can happen under concurrent
CLI requests against a flaky daemon), the coordinator committed
AddNode state for a node the daemon never actually added.
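
The failure mode described above can be reproduced in miniature. The sketch below is not the coordinator's code: `Dataflow`, `commit_add_node_pre_fix`, and the two-variant enum are hypothetical stand-ins, and errors are plain `String`s. It only illustrates how a catch-all `Ok(_)` arm commits state for a reply that belongs to a different request.

```rust
// Simplified stand-in for the real reply enum (an assumption for this
// sketch); the pre-fix enum had no AddNode-specific variant.
#[allow(dead_code)]
#[derive(Debug)]
enum DaemonCoordinatorReply {
    SetParamResult(Result<(), String>),
    DeleteParamResult(Result<(), String>),
}

/// Hypothetical model of the coordinator's bookkeeping for one dataflow.
#[derive(Debug, Default)]
struct Dataflow {
    nodes: Vec<String>,
}

/// Pre-fix behaviour in miniature: any successfully received reply
/// is treated as confirmation that the node was added.
fn commit_add_node_pre_fix(
    dataflow: &mut Dataflow,
    node_id: &str,
    reply: Result<DaemonCoordinatorReply, String>,
) -> Result<(), String> {
    match reply {
        // `Ok(_)` never inspects the variant, so a stale reply lands here too.
        Ok(_) => {
            dataflow.nodes.push(node_id.to_string());
            Ok(())
        }
        Err(err) => Err(err),
    }
}
```

Feeding this function a `SetParamResult(Ok(()))` reply records the node even though no AddNode confirmation was ever received.
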
The user-visible symptom from #1682:

    $ dora node add <node>
    <times out>
    $ dora node list
    <node> shows as present in descriptor
    <subsequent commands corrupt or hang>

The fix
========
Three coordinated changes:

1. `libraries/message/src/daemon_to_coordinator.rs`: new
   `DaemonCoordinatorReply::AddNodeResult(Result<(), String>)` variant
   so the daemon can identify the reply specifically.
2. `binaries/daemon/src/lib.rs`: the `DaemonCoordinatorEvent::AddNode`
   handler now sends `AddNodeResult(result.map_err(|e| format!("{e:?}")))`
   instead of the previous `None` placeholder.
3. `binaries/coordinator/src/lib.rs`:
   * The `Ok(reply_raw) =>` arm now calls a new `ensure_add_node_applied`
     helper that pattern-matches the reply against `AddNodeResult`.
   * The validator returns `Err(eyre!(...))` on either an explicit
     daemon failure or a wrong-variant reply.
   * The validator's error is funneled into the existing `Err` branch
     of the `result` binding, which the coordinator's main loop sends
     back to the CLI as a `ControlRequestReply::Error`. **The error
     does NOT propagate via `?` past the dispatch arm.** This addresses
     @phil-opp's concern that the original PR's `?` would tear down the
     coordinator's main loop on a recoverable per-request failure.

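
The coordinator-side change can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the diff: the enums are cut down to the variants named above, errors are `String` rather than `eyre` reports, the signature of `ensure_add_node_applied` is guessed, `handle_add_node` is a hypothetical stand-in for the dispatch arm, and the error wording is illustrative.

```rust
// Cut-down stand-ins for the real message types (assumptions).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DaemonCoordinatorReply {
    SetParamResult(Result<(), String>),
    AddNodeResult(Result<(), String>),
}

#[derive(Debug, PartialEq)]
enum ControlRequestReply {
    NodeAdded { node_id: String },
    Error(String),
}

/// Only `AddNodeResult(Ok(()))` counts as proof that the daemon added the node.
fn ensure_add_node_applied(reply: &DaemonCoordinatorReply, node_id: &str) -> Result<(), String> {
    match reply {
        DaemonCoordinatorReply::AddNodeResult(Ok(())) => Ok(()),
        // Explicit daemon failure: name the operation and the node.
        DaemonCoordinatorReply::AddNodeResult(Err(err)) => {
            Err(format!("AddNode failed for node `{node_id}`: {err}"))
        }
        // Any other variant is a stale or misrouted reply.
        other => Err(format!(
            "unexpected daemon reply to AddNode for node `{node_id}`: {other:?}"
        )),
    }
}

/// Hypothetical dispatch arm: a validation failure becomes a reply value
/// instead of being propagated with `?`, so the caller's loop keeps running.
fn handle_add_node(
    nodes: &mut Vec<String>,
    node_id: &str,
    daemon_reply: Result<DaemonCoordinatorReply, String>,
) -> ControlRequestReply {
    let result = daemon_reply.and_then(|reply| ensure_add_node_applied(&reply, node_id));
    match result {
        Ok(()) => {
            // State is committed only after the reply has been validated.
            nodes.push(node_id.to_string());
            ControlRequestReply::NodeAdded { node_id: node_id.to_string() }
        }
        Err(err) => ControlRequestReply::Error(err),
    }
}
```

The design point is that validation happens before any state is touched, and that both failure kinds end up in the same `Err` branch that already handled transport errors.
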
Addressing @phil-opp's review
==============================
Two items from the 2026-04-28 review:

* "This brings the whole coordinator down on error, no? We should
  instead forward the result to the CLI and let it deal with the
  failure."
  Fixed. The helper returns `Err`, the call site converts
  it into the `Err` arm of `result`, which becomes a
  `ControlRequestReply::Error` sent back to the CLI. The coordinator
  loop continues handling subsequent requests.
* "These tests are not testing anything non-trivial. Ideally, there
  would be a test that fails before this PR and succeeds after this
  PR."
  The three new tests now explicitly cover:
  - happy path (`AddNodeResult(Ok)` accepted)
  - daemon-rejection path (`AddNodeResult(Err)` rejected with named
    operation + node id)
  - **regression scenario for #1682**: a wrong-variant reply
    (`SetParamResult(Ok)`) is rejected with "unexpected daemon
    reply" instead of silently committing state. This is the
    specific failure mode that caused #1682's state corruption.

  A full end-to-end test against a mock daemon would require
  extracting the AddNode handler from the `start_inner` async loop,
  which is out of scope for the rescue. The helper-level tests cover
  the contract regression-style: if the validator is removed or
  weakened, these tests fail.

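
The three helper-level cases listed above might look roughly like this. The test names, the `String` error type, and the helper's signature are assumptions made for the sketch (the names merely match the `add_node_reply` filter used under Verification); the actual tests in the diff may differ.

```rust
// Cut-down stand-in for the real reply enum (an assumption).
#[allow(dead_code)]
#[derive(Debug)]
enum DaemonCoordinatorReply {
    SetParamResult(Result<(), String>),
    AddNodeResult(Result<(), String>),
}

/// Assumed shape of the validator under test.
fn ensure_add_node_applied(reply: DaemonCoordinatorReply, node_id: &str) -> Result<(), String> {
    match reply {
        DaemonCoordinatorReply::AddNodeResult(Ok(())) => Ok(()),
        DaemonCoordinatorReply::AddNodeResult(Err(err)) => {
            Err(format!("AddNode failed for node `{node_id}`: {err}"))
        }
        other => Err(format!(
            "unexpected daemon reply to AddNode for node `{node_id}`: {other:?}"
        )),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn add_node_reply_ok_is_accepted() {
        let reply = DaemonCoordinatorReply::AddNodeResult(Ok(()));
        assert!(ensure_add_node_applied(reply, "camera").is_ok());
    }

    #[test]
    fn add_node_reply_err_names_operation_and_node() {
        let reply = DaemonCoordinatorReply::AddNodeResult(Err("spawn failed".into()));
        let err = ensure_add_node_applied(reply, "camera").unwrap_err();
        assert!(err.contains("AddNode") && err.contains("camera"));
    }

    #[test]
    fn add_node_reply_wrong_variant_is_rejected() {
        // Regression case for #1682: a reply meant for another request.
        let reply = DaemonCoordinatorReply::SetParamResult(Ok(()));
        let err = ensure_add_node_applied(reply, "camera").unwrap_err();
        assert!(err.contains("unexpected daemon reply"));
    }
}
```

Replacing the validator's body with an unconditional `Ok(())`, which mimics the old `Ok(_)` arm, makes the second and third tests fail.
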
What did NOT change
====================
* No new public API. The daemon/coordinator protocol gains one reply
  variant, consistent with the existing `SetParamResult` /
  `DeleteParamResult` pattern in the same enum.
* No semver-affecting changes to user-facing types.
* No behavioral change for the happy path — a successful AddNode
still returns `ControlRequestReply::NodeAdded { dataflow_id,
node_id }` to the CLI exactly as before.
* No new dependencies.
Verification
=============
    cargo fmt --all -- --check
    cargo clippy --all --exclude dora-{node-api,operator-api,ros2-bridge}-python -- -D warnings
    cargo test -p dora-coordinator --lib add_node_reply
      (3/3 new tests pass)
    cargo test -p dora-coordinator -p dora-daemon -p dora-message --lib
      (103+/all pass)
    cargo check --examples

Manual verification of the original #1682 repro recipe (`dora up` →
`dora build` → `dora start --detach` → `dora node list` → `dora node
add`) should now surface daemon errors as CLI-visible failures
instead of timeouts + state corruption.
Co-authored-by: imwyvern <imwyvern@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 54605dc commit 5451d17
3 files changed
Lines changed: 153 additions & 26 deletions