Skip to content

Commit 2050955

Browse files
grunchclaude
andcommitted
fix(pow): run PoW probe concurrently with the DM wait
Previous attempt (commit 72ced79) capped the postflight info-event lookup at 3s, but it still ran sequentially after the 15s DM wait timed out, so the user-visible failure path could take up to 18s when the relay was slow. Run the probe concurrently with the DM wait instead: by the time the wait elapses the probe's answer is typically already in hand, so the timeout branch consumes a resolved JoinHandle with ~0s added latency. POW_PROBE_TIMEOUT is kept as a safety net for pathological relays that outlive the 15s wait. The probe needs a 'static future for tokio::spawn, so the work moves into a new fetch_required_pow_with(client, mostro_pubkey) — the existing fetch_required_pow(ctx) becomes a thin wrapper around it. On the happy path (DM arrives in time) the spawned probe is aborted so we don't leak a stray relay request. Addresses review feedback on #173. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 72ced79 commit 2050955

3 files changed

Lines changed: 70 additions & 28 deletions

File tree

docs/pow_error_handling.md

Lines changed: 28 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -122,20 +122,31 @@ through (`add_invoice`, `take_order`, `take_dispute`, `send_msg`, `new_order`,
122122
`rate_user`, `orders_info`, `restore`, `last_trade_index`, `add_bond_invoice`).
123123
Centralizing the fix here covers every command in one place.
124124

125-
Postflight check (chosen — see Alternatives below):
125+
Concurrent probe (chosen — see Alternatives below):
126126

127127
```rust
128+
// Kick off the PoW probe alongside the DM wait so its answer is in hand
129+
// the moment the wait times out. The probe is cheap to start and cheap to
130+
// cancel via JoinHandle::abort() on the happy path.
131+
let pow_probe = tokio::spawn(fetch_required_pow_with(
132+
ctx.client.clone(),
133+
ctx.mostro_pubkey,
134+
));
135+
136+
let waited = tokio::time::timeout(FETCH_EVENTS_TIMEOUT, /* notification loop */).await;
137+
128138
let event = match waited {
129-
Ok(inner) => inner?,
139+
Ok(inner) => {
140+
pow_probe.abort();
141+
inner?
142+
}
130143
Err(_elapsed) => {
131-
// Before declaring this a generic timeout, check whether the daemon
132-
// advertises a PoW requirement we didn't meet — that's the real
133-
// cause "deadline has elapsed" was hiding. Bounded by
134-
// POW_PROBE_TIMEOUT so a slow/unreachable relay can't double the
135-
// user-visible wait; if the probe doesn't return in time we fall
136-
// through to the generic timeout error instead of hanging.
137-
let probe = tokio::time::timeout(POW_PROBE_TIMEOUT, fetch_required_pow(ctx)).await;
138-
if let Ok(Some(required)) = probe {
144+
// Probe has been running for FETCH_EVENTS_TIMEOUT alongside the
145+
// wait; it should already be done. POW_PROBE_TIMEOUT is a safety
146+
// net for pathological relays — if the answer isn't in by then,
147+
// fall through to the generic timeout error.
148+
let probe_result = tokio::time::timeout(POW_PROBE_TIMEOUT, pow_probe).await;
149+
if let Ok(Ok(Some(required))) = probe_result {
139150
let configured = parse_pow_env().unwrap_or(0);
140151
if required > configured {
141152
return Err(PowRequirementUnmet { required, configured }.into());
@@ -146,9 +157,13 @@ let event = match waited {
146157
};
147158
```
148159

149-
`POW_PROBE_TIMEOUT` is a small constant (currently 3 s) — well below
150-
`FETCH_EVENTS_TIMEOUT` (15 s). Worst-case user-visible wait stays at one
151-
`FETCH_EVENTS_TIMEOUT` plus the probe budget instead of doubling.
160+
The probe lives in `events::fetch_required_pow_with(client, mostro_pubkey)`
161+
— an owned-args sibling of `fetch_required_pow(ctx)`, used so the spawned
162+
future is `'static`. The 3 s `POW_PROBE_TIMEOUT` is now a safety net rather
163+
than the typical wait: in the common timeout case the probe is already
164+
resolved when we look at it, so the user-visible wait stays at
165+
`FETCH_EVENTS_TIMEOUT` (15 s) plus ~0 s, instead of doubling to 30 s as the
166+
naive sequential version would.
152167

153168
Add an `&Context` parameter? Look at the signature today —
154169
`wait_for_dm(ctx, order_trade_keys, sent_message)``ctx` is already

src/util/events.rs

Lines changed: 17 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -144,9 +144,24 @@ pub async fn fetch_bond_claim_window_days(ctx: &crate::cli::Context) -> Option<i
144144
/// Used by [`crate::util::messaging::wait_for_dm`] to distinguish a real
145145
/// timeout from a silent PoW rejection — see `docs/pow_error_handling.md`.
146146
pub async fn fetch_required_pow(ctx: &crate::cli::Context) -> Option<u8> {
147-
fetch_info_tag(ctx, "pow")
147+
fetch_required_pow_with(ctx.client.clone(), ctx.mostro_pubkey).await
148+
}
149+
150+
/// Owned-args variant of [`fetch_required_pow`], suitable for `tokio::spawn`.
151+
///
152+
/// `wait_for_dm` kicks the probe off concurrently with the DM wait so the
153+
/// answer is already in hand by the time the wait times out (zero added
154+
/// latency in the timeout path, instead of a sequential second fetch).
155+
pub async fn fetch_required_pow_with(client: Client, mostro_pubkey: PublicKey) -> Option<u8> {
156+
let filter = Filter::new()
157+
.author(mostro_pubkey)
158+
.kind(nostr_sdk::Kind::Custom(NOSTR_INFO_EVENT_KIND));
159+
let events = client
160+
.fetch_events(filter, FETCH_EVENTS_TIMEOUT)
148161
.await
149-
.and_then(|v| v.parse::<u8>().ok())
162+
.ok()?;
163+
let event = events.iter().max_by_key(|e| e.created_at)?;
164+
read_info_tag_from_event(event, "pow").and_then(|v| v.parse::<u8>().ok())
150165
}
151166

152167
#[allow(clippy::too_many_arguments)]

src/util/messaging.rs

Lines changed: 25 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -329,6 +329,17 @@ where
329329
// Send message here after opening notifications to avoid missing messages.
330330
sent_message.await?;
331331

332+
// Kick off the PoW probe concurrently with the DM wait. By running the
333+
// kind-38385 lookup alongside the 15s `FETCH_EVENTS_TIMEOUT` instead of
334+
// *after* it, the timeout branch doesn't pay a second sequential
335+
// `fetch_events` round-trip — by then the probe has typically already
336+
// returned. `JoinHandle` lets us `abort()` the probe cheaply on the happy
337+
// path (DM arrives in time) without leaking the task.
338+
let pow_probe = tokio::spawn(super::events::fetch_required_pow_with(
339+
ctx.client.clone(),
340+
ctx.mostro_pubkey,
341+
));
342+
332343
// Wait for the DM or gift wrap event
333344
let waited = tokio::time::timeout(super::events::FETCH_EVENTS_TIMEOUT, async move {
334345
loop {
@@ -346,22 +357,23 @@ where
346357
// Keep a genuine timeout (the only "no reply" outcome) distinguishable from
347358
// a notification-channel error so callers can treat them differently.
348359
let event = match waited {
349-
Ok(inner) => inner?,
360+
Ok(inner) => {
361+
// Happy path: DM arrived. Cancel the probe; the answer is no
362+
// longer needed and we don't want a stray relay request lingering.
363+
pow_probe.abort();
364+
inner?
365+
}
350366
Err(_elapsed) => {
351367
// mostrod silently drops events whose outer GiftWrap doesn't meet
352368
// its NIP-13 PoW requirement (relay accepts → daemon discards →
353-
// no reply ever comes). Before declaring this a generic timeout,
354-
// ask the daemon's kind-38385 info event whether that's actually
355-
// the cause we're hiding behind "no reply".
356-
//
357-
// The probe is bounded by `POW_PROBE_TIMEOUT` instead of the full
358-
// `FETCH_EVENTS_TIMEOUT` so a slow/unreachable relay can't double
359-
// the user-visible wait. If the probe doesn't return in time, fall
360-
// through to the generic timeout error rather than hanging.
361-
let probe =
362-
tokio::time::timeout(POW_PROBE_TIMEOUT, super::events::fetch_required_pow(ctx))
363-
.await;
364-
if let Ok(Some(required)) = probe {
369+
// no reply ever comes). The probe has already been running for
370+
// `FETCH_EVENTS_TIMEOUT` alongside the wait, so it is almost
371+
// certainly done. Cap the await with `POW_PROBE_TIMEOUT` as a
372+
// safety net so a pathological relay can't keep us hanging — if
373+
// the probe isn't back by then, fall through to the generic
374+
// timeout error instead of waiting any longer.
375+
let probe_result = tokio::time::timeout(POW_PROBE_TIMEOUT, pow_probe).await;
376+
if let Ok(Ok(Some(required))) = probe_result {
365377
let configured = parse_pow_env().unwrap_or(0);
366378
if required > configured {
367379
return Err(PowRequirementUnmet {

0 commit comments

Comments
 (0)