Skip to content

Commit f0daee0

Browse files
joostjagerclaude
authored andcommitted
Replace dual-sync-async persistence panic with Watch contract
Commit 0760f99 ("Disallow dual-sync-async persistence without restarting") added a panic in non-test builds when a Persist implementation returns both Completed and InProgress from the same ChannelManager instance. However, this check runs against the status that ChainMonitor returns to ChannelManager, not the raw Persist result. When ChannelMonitor::update_monitor fails (e.g. a counterparty commitment_signed arrives after a funding spend confirms), ChainMonitor persists the full monitor successfully but overrides the return value to InProgress. If the user's Persist impl only ever returns Completed, this override triggers a false mode-mismatch panic. This replaces the panic with a per-channel contract at the Watch trait level: a Watch implementation must not return Completed for a channel update while prior InProgress updates are still pending. Switching from Completed to InProgress is always allowed, but switching back is impractical because the Watch implementation cannot observe when ChannelManager has finished processing a MonitorEvent::Completed. The documentation on ChannelMonitorUpdateStatus is updated to describe these rules. The mode tracking and panic checks from 0760f99 are removed and replaced with a panic that validates the new contract directly on the in-flight update state. Legacy tests that switch the persister between modes mid-flight can opt out via Node::disable_monitor_completeness_assertion(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c525295 commit f0daee0

7 files changed

Lines changed: 43 additions & 29 deletions

File tree

lightning/src/chain/channelmonitor.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7048,6 +7048,7 @@ mod tests {
70487048
let legacy_cfg = test_legacy_channel_config();
70497049
let node_chanmgrs = create_node_chanmgrs(3, &node_cfgs, &[Some(legacy_cfg.clone()), Some(legacy_cfg.clone()), Some(legacy_cfg)]);
70507050
let nodes = create_network(3, &node_cfgs, &node_chanmgrs);
7051+
nodes[1].disable_monitor_completeness_assertion();
70517052
let channel = create_announced_chan_between_nodes(&nodes, 0, 1);
70527053
create_announced_chan_between_nodes(&nodes, 1, 2);
70537054

lightning/src/chain/mod.rs

Lines changed: 4 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -233,11 +233,10 @@ pub enum ChannelMonitorUpdateStatus {
233233
/// This includes performing any `fsync()` calls required to ensure the update is guaranteed to
234234
/// be available on restart even if the application crashes.
235235
///
236-
/// If you return this variant, you cannot later return [`InProgress`] from the same instance of
237-
/// [`Persist`]/[`Watch`] without first restarting.
236+
/// You cannot switch from [`InProgress`] to this variant for the same channel without first
237+
/// restarting. However, switching from this variant to [`InProgress`] is always allowed.
238238
///
239239
/// [`InProgress`]: ChannelMonitorUpdateStatus::InProgress
240-
/// [`Persist`]: chainmonitor::Persist
241240
Completed,
242241
/// Indicates that the update will happen asynchronously in the background or that a transient
243242
/// failure occurred which is being retried in the background and will eventually complete.
@@ -263,12 +262,7 @@ pub enum ChannelMonitorUpdateStatus {
263262
/// reliable, this feature is considered beta, and a handful of edge-cases remain. Until the
264263
/// remaining cases are fixed, in rare cases, *using this feature may lead to funds loss*.
265264
///
266-
/// If you return this variant, you cannot later return [`Completed`] from the same instance of
267-
/// [`Persist`]/[`Watch`] without first restarting.
268-
///
269265
/// [`InProgress`]: ChannelMonitorUpdateStatus::InProgress
270-
/// [`Completed`]: ChannelMonitorUpdateStatus::Completed
271-
/// [`Persist`]: chainmonitor::Persist
272266
InProgress,
273267
/// Indicates that an update has failed and will not complete at any point in the future.
274268
///
@@ -328,6 +322,8 @@ pub trait Watch<ChannelSigner: EcdsaChannelSigner> {
328322
/// cannot be retried, the node should shut down immediately after returning
329323
/// [`ChannelMonitorUpdateStatus::UnrecoverableError`], see its documentation for more info.
330324
///
325+
/// See [`ChannelMonitorUpdateStatus`] for requirements on when each variant may be returned.
326+
///
331327
/// [`ChannelManager`]: crate::ln::channelmanager::ChannelManager
332328
fn update_channel(
333329
&self, channel_id: ChannelId, update: &ChannelMonitorUpdate,

lightning/src/ln/chanmon_update_fail_tests.rs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -176,6 +176,7 @@ fn do_test_simple_monitor_temporary_update_fail(disconnect: bool) {
176176
let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
177177
let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
178178
let mut nodes = create_network(2, &node_cfgs, &node_chanmgrs);
179+
nodes[0].disable_monitor_completeness_assertion();
179180

180181
let node_a_id = nodes[0].node.get_our_node_id();
181182
let node_b_id = nodes[1].node.get_our_node_id();
@@ -317,6 +318,7 @@ fn do_test_monitor_temporary_update_fail(disconnect_count: usize) {
317318
let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
318319
let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
319320
let mut nodes = create_network(2, &node_cfgs, &node_chanmgrs);
321+
nodes[0].disable_monitor_completeness_assertion();
320322

321323
let node_a_id = nodes[0].node.get_our_node_id();
322324
let node_b_id = nodes[1].node.get_our_node_id();
@@ -970,6 +972,7 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
970972
let node_cfgs = create_node_cfgs(3, &chanmon_cfgs);
971973
let node_chanmgrs = create_node_chanmgrs(3, &node_cfgs, &[None, None, None]);
972974
let mut nodes = create_network(3, &node_cfgs, &node_chanmgrs);
975+
nodes[1].disable_monitor_completeness_assertion();
973976

974977
let node_a_id = nodes[0].node.get_our_node_id();
975978
let node_b_id = nodes[1].node.get_our_node_id();
@@ -1501,6 +1504,7 @@ fn claim_while_disconnected_monitor_update_fail() {
15011504
let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
15021505
let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
15031506
let mut nodes = create_network(2, &node_cfgs, &node_chanmgrs);
1507+
nodes[1].disable_monitor_completeness_assertion();
15041508

15051509
let node_a_id = nodes[0].node.get_our_node_id();
15061510
let node_b_id = nodes[1].node.get_our_node_id();
@@ -1728,6 +1732,7 @@ fn first_message_on_recv_ordering() {
17281732
let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
17291733
let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
17301734
let mut nodes = create_network(2, &node_cfgs, &node_chanmgrs);
1735+
nodes[1].disable_monitor_completeness_assertion();
17311736

17321737
let node_a_id = nodes[0].node.get_our_node_id();
17331738
let node_b_id = nodes[1].node.get_our_node_id();
@@ -3850,6 +3855,7 @@ fn do_test_durable_preimages_on_closed_channel(
38503855
// Now reload node B
38513856
let manager_b = nodes[1].node.encode();
38523857
reload_node!(nodes[1], &manager_b, &[&mon_ab, &mon_bc], persister, chain_mon, node_b_reload);
3858+
nodes[1].disable_monitor_completeness_assertion();
38533859

38543860
nodes[0].node.peer_disconnected(node_b_id);
38553861
nodes[2].node.peer_disconnected(node_b_id);

lightning/src/ln/channelmanager.rs

Lines changed: 20 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -2870,12 +2870,12 @@ pub struct ChannelManager<
28702870
#[cfg(any(test, feature = "_test_utils"))]
28712871
pub(super) per_peer_state: FairRwLock<HashMap<PublicKey, Mutex<PeerState<SP>>>>,
28722872

2873-
/// We only support using one of [`ChannelMonitorUpdateStatus::InProgress`] and
2874-
/// [`ChannelMonitorUpdateStatus::Completed`] without restarting. Because the API does not
2875-
/// otherwise directly enforce this, we enforce it in non-test builds here by storing which one
2876-
/// is in use.
2877-
#[cfg(not(any(test, feature = "_externalize_tests")))]
2878-
monitor_update_type: AtomicUsize,
2873+
/// When set, disables the panic when `Watch::update_channel` returns `Completed` while
2874+
/// prior updates are still `InProgress`. Some legacy tests switch the persister between
2875+
/// `InProgress` and `Completed` mid-flight, which violates this contract but is otherwise
2876+
/// harmless in a test context.
2877+
#[cfg(test)]
2878+
pub(crate) skip_monitor_update_assertion: AtomicBool,
28792879

28802880
/// The set of events which we need to give to the user to handle. In some cases an event may
28812881
/// require some further action after the user handles it (currently only blocking a monitor
@@ -3618,8 +3618,8 @@ impl<
36183618

36193619
per_peer_state: FairRwLock::new(new_hash_map()),
36203620

3621-
#[cfg(not(any(test, feature = "_externalize_tests")))]
3622-
monitor_update_type: AtomicUsize::new(0),
3621+
#[cfg(test)]
3622+
skip_monitor_update_assertion: AtomicBool::new(false),
36233623

36243624
pending_events: Mutex::new(VecDeque::new()),
36253625
pending_events_processor: AtomicBool::new(false),
@@ -10380,6 +10380,15 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
1038010380
if update_completed {
1038110381
let _ = in_flight_updates.remove(update_idx);
1038210382
}
10383+
// A Watch implementation must not return Completed while prior updates are
10384+
// still InProgress, as this would violate the async persistence contract.
10385+
#[cfg(test)]
10386+
let skip_check = self.skip_monitor_update_assertion.load(Ordering::Relaxed);
10387+
#[cfg(not(test))]
10388+
let skip_check = false;
10389+
if !skip_check && update_completed && !in_flight_updates.is_empty() {
10390+
panic!("Watch::update_channel returned Completed while prior updates are still InProgress");
10391+
}
1038310392
(update_completed, update_completed && in_flight_updates.is_empty())
1038410393
} else {
1038510394
// We blindly assume that the ChannelMonitorUpdate will be regenerated on startup if we
@@ -10445,23 +10454,13 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
1044510454
panic!("{}", err_str);
1044610455
},
1044710456
ChannelMonitorUpdateStatus::InProgress => {
10448-
#[cfg(not(any(test, feature = "_externalize_tests")))]
10449-
if self.monitor_update_type.swap(1, Ordering::Relaxed) == 2 {
10450-
panic!("Cannot use both ChannelMonitorUpdateStatus modes InProgress and Completed without restart");
10451-
}
1045210457
log_debug!(
1045310458
logger,
1045410459
"ChannelMonitor update in flight, holding messages until the update completes.",
1045510460
);
1045610461
false
1045710462
},
10458-
ChannelMonitorUpdateStatus::Completed => {
10459-
#[cfg(not(any(test, feature = "_externalize_tests")))]
10460-
if self.monitor_update_type.swap(2, Ordering::Relaxed) == 1 {
10461-
panic!("Cannot use both ChannelMonitorUpdateStatus modes InProgress and Completed without restart");
10462-
}
10463-
true
10464-
},
10463+
ChannelMonitorUpdateStatus::Completed => true,
1046510464
}
1046610465
}
1046710466

@@ -20160,8 +20159,8 @@ impl<
2016020159

2016120160
per_peer_state: FairRwLock::new(per_peer_state),
2016220161

20163-
#[cfg(not(any(test, feature = "_externalize_tests")))]
20164-
monitor_update_type: AtomicUsize::new(0),
20162+
#[cfg(test)]
20163+
skip_monitor_update_assertion: AtomicBool::new(false),
2016520164

2016620165
pending_events: Mutex::new(pending_events_read),
2016720166
pending_events_processor: AtomicBool::new(false),

lightning/src/ln/functional_test_utils.rs

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -598,6 +598,14 @@ impl<'a, 'b, 'c> Node<'a, 'b, 'c> {
598598
self.node.init_features() | self.onion_messenger.provided_init_features(peer_node_id)
599599
})
600600
}
601+
602+
/// Disables the panic when `Watch::update_channel` returns `Completed` while prior updates
603+
/// are still `InProgress`. Some legacy tests switch the persister between modes mid-flight,
604+
/// which violates this contract but is otherwise harmless.
605+
#[cfg(test)]
606+
pub fn disable_monitor_completeness_assertion(&self) {
607+
self.node.skip_monitor_update_assertion.store(true, core::sync::atomic::Ordering::Relaxed);
608+
}
601609
}
602610

603611
impl<'a, 'b, 'c> std::panic::UnwindSafe for Node<'a, 'b, 'c> {}

lightning/src/ln/monitor_tests.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3384,6 +3384,7 @@ fn test_claim_event_never_handled() {
33843384
let chan_0_monitor_serialized = get_monitor!(nodes[1], chan.2).encode();
33853385
let mons = &[&chan_0_monitor_serialized[..]];
33863386
reload_node!(nodes[1], &init_node_ser, mons, persister, new_chain_mon, nodes_1_reload);
3387+
nodes[1].disable_monitor_completeness_assertion();
33873388

33883389
expect_payment_claimed!(nodes[1], payment_hash_a, 1_000_000);
33893390
// The reload logic spuriously generates a redundant payment preimage-containing

lightning/src/ln/reload_tests.rs

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -823,12 +823,14 @@ fn do_test_partial_claim_before_restart(persist_both_monitors: bool, double_rest
823823

824824
// Now restart nodes[3].
825825
reload_node!(nodes[3], original_manager.clone(), &[&updated_monitor.0, &original_monitor.0], persist_d_1, chain_d_1, node_d_1);
826+
nodes[3].disable_monitor_completeness_assertion();
826827

827828
if double_restart {
828829
// Previously, we had a bug where we'd fail to reload if we re-persist the `ChannelManager`
829830
// without updating any `ChannelMonitor`s as we'd fail to double-initiate the claim replay.
830831
// We test that here ensuring that we can reload again.
831832
reload_node!(nodes[3], node_d_1.encode(), &[&updated_monitor.0, &original_monitor.0], persist_d_2, chain_d_2, node_d_2);
833+
nodes[3].disable_monitor_completeness_assertion();
832834
}
833835

834836
// Until the startup background events are processed (in `get_and_clear_pending_events`,
@@ -2216,6 +2218,7 @@ fn test_reload_with_mpp_claims_on_same_channel() {
22162218
nodes_1_deserialized,
22172219
Some(true)
22182220
);
2221+
nodes[1].disable_monitor_completeness_assertion();
22192222

22202223
// When the claims are reconstructed during reload, PaymentForwarded events are regenerated.
22212224
let events = nodes[1].node.get_and_clear_pending_events();

0 commit comments

Comments
 (0)