evidence: "remove_peer() is implemented and unit-tested but never triggered by peer disconnect in either bridge loop. No peer-disconnect event arm exists in tokio::select! loops. Stale entries persist until heartbeat replace-not-merge (2s interval) or node restart."
evidence: "has_announced() is defined at p2p.rs:422 but never called in production bridge loops (clippy dead_code). get_recipients() falls back to Recipients::All at service level (zero subscribers), but excludes pre-v1.3 peers when at least one v1.3 peer subscribes to the same service."
### SUB-03: Peer disconnect cleanup not triggered (medium severity)
+Phase 18 was created to close the two gaps identified in the initial audit:

-**What:** `PeerSubscriptionMap.remove_peer()` is implemented at `p2p.rs:407` and fully unit-tested, but never called from the bridge loops when a peer disconnects. Neither `run_lookup_network` nor `run_discovery_network` has a peer-disconnect event arm in its `tokio::select!` loop.
+### SUB-03: Peer disconnect cleanup (CLOSED)

-**Impact:** Stale subscription entries accumulate for disconnected peers. `get_recipients()` may include departed peers in `Recipients::Some(...)`, causing sends to unreachable peers (silently dropped by commonware-p2p).
+**Prior gap:** `remove_peer()` was implemented and unit-tested but never triggered by peer disconnect. No disconnect event arm existed in either bridge loop.

-**Mitigation:** Heartbeat replace-not-merge (every 2s) overwrites stale entries when a peer reconnects. For truly departed peers, entries persist until node restart. The Engine broadcast channel (channel 0) still delivers to all peers as catch-up reliability.
+**Fix (Phase 18):** Heartbeat-driven pruning via `tracked_peers().difference(&connected_peer_set)` in both bridge loops. On each 2-second heartbeat tick, peers tracked in `PeerSubscriptionMap` but absent from the connected peer set are pruned via `remove_peer()`. `known_peers` is also pruned to allow ANN-04 re-hello on reconnect.
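The pruning step described in this fix can be illustrated with a minimal, std-only sketch. It is not the code in `p2p.rs`: `PeerId` (a `u64` standing in for `ed25519::PublicKey`), the map's internal layout, the `announce` helper, and the `prune_departed` function are all hypothetical names introduced for illustration. Only `tracked_peers()`, `remove_peer()`, `known_peers`, and the set-difference approach come from the audit text.

```rust
use std::collections::{HashMap, HashSet};

// Assumption: a u64 stands in for ed25519::PublicKey so the sketch needs only std.
type PeerId = u64;

/// Hypothetical, simplified stand-in for PeerSubscriptionMap.
#[derive(Default)]
struct PeerSubscriptionMap {
    // peer -> services that peer announced interest in
    subscriptions: HashMap<PeerId, HashSet<String>>,
}

impl PeerSubscriptionMap {
    /// Replace-not-merge: a new announcement overwrites the peer's previous entry.
    fn announce(&mut self, peer: PeerId, services: &[&str]) {
        let set = services.iter().map(|s| s.to_string()).collect();
        self.subscriptions.insert(peer, set);
    }

    /// All peers that currently have an entry in the map.
    fn tracked_peers(&self) -> HashSet<PeerId> {
        self.subscriptions.keys().copied().collect()
    }

    /// Drop every subscription entry for `peer`.
    fn remove_peer(&mut self, peer: &PeerId) {
        self.subscriptions.remove(peer);
    }
}

/// Hypothetical helper modelling one heartbeat tick: every peer that is tracked
/// but absent from the connected set is pruned from both the subscription map
/// and `known_peers`. Returns the pruned peers in ascending order.
fn prune_departed(
    map: &mut PeerSubscriptionMap,
    known_peers: &mut HashSet<PeerId>,
    connected_peer_set: &HashSet<PeerId>,
) -> Vec<PeerId> {
    let mut departed: Vec<PeerId> = map
        .tracked_peers()
        .difference(connected_peer_set)
        .copied()
        .collect();
    departed.sort();
    for peer in &departed {
        map.remove_peer(peer);
        // Forgetting the peer here is what would allow a fresh hello on reconnect.
        known_peers.remove(peer);
    }
    departed
}
```

In the bridge loops this logic would presumably run inside the heartbeat arm of the `tokio::select!` loop; the sketch leaves out the async plumbing and shows only the set arithmetic.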

-**Root cause:** commonware-p2p does not expose a peer-disconnect callback/event channel, so there is no mechanism to trigger `remove_peer()` directly.
+**Evidence:** `tracked_peers()` called at lines 1126 (lookup) and 1624 (discovery). `remove_peer(departed)` called at lines 1128 and 1626. 45/45 unit tests pass, including `test_heartbeat_prune_departed_peer`.
**What:** `has_announced()` at `p2p.rs:422` was designed for per-peer COMPAT-03 checking but is never called in production code (dead_code warning). `get_recipients()` falls back to `Recipients::All` only when the service has zero subscribers. That is correct for the zero-subscriber case, but it excludes pre-v1.3 peers when at least one v1.3 peer has subscribed to the same service.
+**Prior gap:** `has_announced()` was dead production code. `get_recipients()` excluded pre-v1.3 peers when at least one v1.3 peer subscribed to the same service.

-**Impact:** During rolling upgrade with mixed v1.3 + pre-v1.3 operators, legacy peers are excluded from targeted delivery for services where at least one v1.3 peer has subscribed. Engine channel (channel 0) catch-up mitigates this by delivering via `Recipients::All` on the broadcast channel.
+**Fix (Phase 18):** `get_recipients()` now takes a `connected_peers: &HashSet<ed25519::PublicKey>` parameter. For each connected peer that has never announced (`!has_announced(peer)`), the peer is unconditionally included in the recipient set. All 6 production call sites pass `&connected_peer_set`.
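A minimal sketch of the recipient selection described in this fix, using only the standard library. It is an illustration under stated assumptions, not the production implementation: `PeerId` (a `u64` standing in for `ed25519::PublicKey`), the local `Recipients` enum (modelled on the `Recipients::All` / `Recipients::Some(...)` variants named in the audit), the map layout, and the `announce` helper are hypothetical. The sketch also assumes the zero-subscriber fallback to `Recipients::All` is checked before legacy peers are added; the audit does not state the order.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

// Assumption: a u64 stands in for ed25519::PublicKey so the sketch needs only std.
type PeerId = u64;

/// Local stand-in for the recipients type; variant names follow the audit text.
#[derive(Debug, PartialEq)]
enum Recipients {
    All,
    Some(Vec<PeerId>),
}

/// Hypothetical, simplified stand-in for PeerSubscriptionMap.
#[derive(Default)]
struct PeerSubscriptionMap {
    // peer -> services that peer announced interest in
    subscriptions: HashMap<PeerId, HashSet<String>>,
}

impl PeerSubscriptionMap {
    /// Replace-not-merge: a new announcement overwrites the peer's previous entry.
    fn announce(&mut self, peer: PeerId, services: &[&str]) {
        let set = services.iter().map(|s| s.to_string()).collect();
        self.subscriptions.insert(peer, set);
    }

    /// True if the peer has ever sent a subscription announcement.
    fn has_announced(&self, peer: &PeerId) -> bool {
        self.subscriptions.contains_key(peer)
    }

    /// Subscribers of `service`, plus every connected peer that never announced
    /// (treated as a legacy peer that cannot express subscriptions).
    fn get_recipients(&self, service: &str, connected_peers: &HashSet<PeerId>) -> Recipients {
        let mut targets: BTreeSet<PeerId> = self
            .subscriptions
            .iter()
            .filter(|(_, services)| services.contains(service))
            .map(|(peer, _)| *peer)
            .collect();

        // Zero subscribers: fall back to broadcasting to everyone.
        if targets.is_empty() {
            return Recipients::All;
        }

        // Unconditionally include connected peers that have never announced.
        for peer in connected_peers {
            if !self.has_announced(peer) {
                targets.insert(*peer);
            }
        }

        // BTreeSet iteration yields a deterministic, ascending order.
        Recipients::Some(targets.into_iter().collect())
    }
}
```

With peers 1, 2, and 3 connected, peer 1 subscribed to the service, peer 2 subscribed to a different service, and peer 3 silent, this sketch targets peers 1 and 3: the announced non-subscriber is skipped, while the un-announced peer is kept for compatibility.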

-**Fix path:** Call `has_announced(peer)` per connected peer in `get_recipients()` and include un-announced peers unconditionally in the recipient set.
+**Evidence:** `has_announced()` called at line 473 inside `get_recipients()`. No dead_code clippy warning. Tests `test_get_recipients_includes_unannounced_connected_peers` and `test_get_recipients_all_announced_no_legacy` verify behavior.

## Cross-Phase Integration

-### E2E Flow: Service → Subscription → Targeting → Observability
-
-| Step | Status |
-|------|--------|
-| Service add → AggregatorCommand::SubscribeService | CONNECTED |
-| Subscribe arm → SubscriptionAnnouncement broadcast via direct_sender | CONNECTED (ANN-01) |
-`has_announced()`: defined, tested, never called in production (dead_code)
+None. `has_announced()` dead_code resolved by Phase 18.

## Nyquist Compliance

@@ -136,18 +118,20 @@ nyquist:
| 15 | exists | false | `/gsd:validate-phase 15` |
| 16 | exists | false | `/gsd:validate-phase 16` |
| 17 | exists | false | `/gsd:validate-phase 17` |
+| 18 | exists | false | `/gsd:validate-phase 18` |

-All 4 phases have VALIDATION.md but none are signed off (nyquist_compliant: false, wave_0_complete: false). Validation strategies were defined but not executed during phase work.
+All 5 phases have VALIDATION.md but none are signed off (nyquist_compliant: false, wave_0_complete: false).

## Tech Debt Summary

| Phase | Items |
|-------|-------|
-| All phases | No VERIFICATION.md files (process gap) |
+| All phases | No VERIFICATION.md files (verifier disabled in config) |
| All phases | VALIDATION.md unsigned (nyquist_compliant: false) |