Skip to content

Commit 25e07ae

Browse files
authored
Merge pull request #173 from MostroP2P/fix/pow-rejection-error
fix(pow): surface explicit PoW rejection instead of generic timeout
2 parents 5623f71 + 2050955 commit 25e07ae

4 files changed

Lines changed: 497 additions & 20 deletions

File tree

docs/pow_error_handling.md

Lines changed: 269 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,269 @@
1+
# Spec — Surface explicit PoW rejection error
2+
3+
Tracking issue: [MostroP2P/mostro-cli#172](https://github.com/MostroP2P/mostro-cli/issues/172)
4+
5+
## 1. Problem
6+
7+
When a user sends an event from `mostro-cli` to a `mostrod` instance that
8+
**requires NIP‑13 Proof‑of‑Work** without providing the required PoW, the
9+
daemon silently drops the event (logs `Not POW verified event!` on its side)
10+
and never replies. The CLI eventually times out and surfaces a generic
11+
timeout error to the user:
12+
13+
```text
14+
deadline has elapsed
15+
```
16+
17+
(or, in current `main`, the slightly less awful but still ambiguous
18+
`Timeout waiting for DM or gift wrap event`).
19+
20+
This is misleading. From the user perspective the message can mean almost
21+
anything: the relay is slow, mostrod is down, the network is broken, the CLI
22+
is buggy, or the command was wrong. The real cause — PoW not satisfied — is
23+
hidden.
24+
25+
## 2. Root cause
26+
27+
The flow that breaks today:
28+
29+
1. `mostro-cli` mines PoW on the outer GiftWrap based on the `POW` env var
30+
(default `0`) — see `src/util/messaging.rs::parse_pow_env` and the
31+
`WrapOptions { pow, .. }` plumbing.
32+
2. The wrapped event is published to relays via `client.send_event(...)`.
33+
3. `wait_for_dm` opens a subscription on the trade key and waits for
34+
`FETCH_EVENTS_TIMEOUT` (15 s) for an inbound GiftWrap.
35+
4. If `mostrod` requires PoW above what the client provided, mostrod
36+
silently drops the event in
37+
[`mostro/src/app.rs`](https://github.com/MostroP2P/mostro) — the relay
38+
accepted it (so there's no NIP‑01 `OK false` from the relay either) and
39+
no reply is ever produced.
40+
5. `tokio::time::timeout` elapses → `wait_for_dm` returns `WaitForDmTimeout`
41+
→ the user sees the generic timeout message.
42+
43+
The CLI has **no way to distinguish** "daemon silently dropped my event for
44+
PoW reasons" from "transient relay or network issue".
45+
46+
## 3. Signal we can use
47+
48+
`mostrod` already publishes its required PoW in its **kind‑38385 info event**
49+
under the tag `["pow", "<difficulty>"]`. See
50+
[`mostro/src/nip33.rs`](https://github.com/MostroP2P/mostro) (`new_info_event`)
51+
where the daemon writes:
52+
53+
```rust
54+
Tag::custom(
55+
TagKind::Custom(Cow::Borrowed("pow")),
56+
vec![mostro_settings.pow.to_string()],
57+
),
58+
```
59+
60+
Reading that tag on the CLI side lets us answer the only question we care
61+
about: *did this Mostro instance reject our event because we didn't provide
62+
enough PoW?*
63+
64+
We already fetch kind‑38385 elsewhere in the CLI — see
65+
`src/util/events.rs::fetch_bond_claim_window_days` — so this is a small
66+
extension of an existing pattern.
67+
68+
## 4. Design
69+
70+
### 4.1 Error variant
71+
72+
Add a typed error so callers (and tests) can match the PoW failure mode
73+
distinctly from a generic timeout:
74+
75+
```rust
76+
#[derive(Debug)]
77+
pub struct PowRequirementUnmet {
78+
pub required: u8,
79+
pub configured: u8,
80+
}
81+
```
82+
83+
The user-facing `Display` will be explicit:
84+
85+
```text
86+
This Mostro instance requires NIP-13 proof of work of N bits, but the
87+
client sent the event with M bits. Re-run with `--pow N` or set
88+
`POW=N` and try again.
89+
```
90+
91+
The error lives next to `WaitForDmTimeout` in `src/util/messaging.rs` (same
92+
module, same idiom: small zero/struct error wrapped via `anyhow`).
93+
94+
### 4.2 Helper — fetch required PoW from kind‑38385
95+
96+
Add a helper in `src/util/events.rs`:
97+
98+
```rust
99+
/// Best-effort: fetch the Mostro instance's kind-38385 info event and read
100+
/// the `pow` tag. Returns `None` when no info event is available or the tag
101+
/// is missing/unparseable. Mirrors `fetch_bond_claim_window_days`.
102+
pub async fn fetch_required_pow(ctx: &crate::cli::Context) -> Option<u8>;
103+
```
104+
105+
Implementation details:
106+
107+
- Author filter on `ctx.mostro_pubkey`, kind `NOSTR_INFO_EVENT_KIND`.
108+
- Pick the **newest** revision by `created_at` (replaceable but lagging
109+
relays can still serve old copies; same caveat as the bond-window helper).
110+
- Scan tags for `["pow", "<value>"]`, parse as `u8`.
111+
- Any error (fetch failure, missing tag, unparseable value) → `None`.
112+
113+
Returning `None` is the "I don't know" signal; the caller treats it as
114+
*"can't blame PoW"* and falls back to the generic timeout error. This keeps
115+
the helper non-fatal: older daemons that don't publish the tag, or unreachable
116+
relays, never break flows that worked before.
117+
118+
### 4.3 Where to plug it in — `wait_for_dm`
119+
120+
`wait_for_dm` is the single chokepoint that every request/reply flow goes
121+
through (`add_invoice`, `take_order`, `take_dispute`, `send_msg`, `new_order`,
122+
`rate_user`, `orders_info`, `restore`, `last_trade_index`, `add_bond_invoice`).
123+
Centralizing the fix here covers every command in one place.
124+
125+
Concurrent probe (chosen — see Alternatives below):
126+
127+
```rust
128+
// Kick off the PoW probe alongside the DM wait so its answer is in hand
129+
// the moment the wait times out. The probe is cheap to start and cheap to
130+
// cancel via JoinHandle::abort() on the happy path.
131+
let pow_probe = tokio::spawn(fetch_required_pow_with(
132+
ctx.client.clone(),
133+
ctx.mostro_pubkey,
134+
));
135+
136+
let waited = tokio::time::timeout(FETCH_EVENTS_TIMEOUT, /* notification loop */).await;
137+
138+
let event = match waited {
139+
Ok(inner) => {
140+
pow_probe.abort();
141+
inner?
142+
}
143+
Err(_elapsed) => {
144+
// Probe has been running for FETCH_EVENTS_TIMEOUT alongside the
145+
// wait; it should already be done. POW_PROBE_TIMEOUT is a safety
146+
// net for pathological relays — if the answer isn't in by then,
147+
// fall through to the generic timeout error.
148+
let probe_result = tokio::time::timeout(POW_PROBE_TIMEOUT, pow_probe).await;
149+
if let Ok(Ok(Some(required))) = probe_result {
150+
let configured = parse_pow_env().unwrap_or(0);
151+
if required > configured {
152+
return Err(PowRequirementUnmet { required, configured }.into());
153+
}
154+
}
155+
return Err(WaitForDmTimeout.into());
156+
}
157+
};
158+
```
159+
160+
The probe lives in `events::fetch_required_pow_with(client, mostro_pubkey)`
161+
— an owned-args sibling of `fetch_required_pow(ctx)`, used so the spawned
162+
future is `'static`. The 3 s `POW_PROBE_TIMEOUT` is now a safety net rather
163+
than the typical wait: in the common timeout case the probe is already
164+
resolved when we look at it, so the user-visible wait stays at
165+
`FETCH_EVENTS_TIMEOUT` (15 s) plus ~0 s, instead of doubling to 30 s as the
166+
naive sequential version would.
167+
168+
Add an `&Context` parameter? Look at the signature today —
169+
`wait_for_dm(ctx, order_trade_keys, sent_message)``ctx` is already
170+
passed. We just need to grant the helper access to `ctx.client` and
171+
`ctx.mostro_pubkey` (already does).
172+
173+
### 4.4 `add_bond_invoice` interplay
174+
175+
`add_bond_invoice` treats `WaitForDmTimeout` as the happy path (Mostro pays
176+
the invoice without acking over Nostr). The new variant must **not** be
177+
caught by that branch — a PoW failure on `add-bond-invoice` is *not* a
178+
successful submission. Concretely:
179+
180+
- The existing `Err(e) if e.downcast_ref::<WaitForDmTimeout>().is_some()`
181+
arm continues to match only `WaitForDmTimeout`.
182+
- A `PowRequirementUnmet` falls through to the generic `Err(e) => return
183+
Err(e)` arm and bubbles up to the user, which is the desired behavior.
184+
185+
No code change needed in `add_bond_invoice.rs` — the existing pattern match
186+
already handles the right split. We assert this with a regression note.
187+
188+
## 5. Alternatives considered
189+
190+
### 5.1 Preflight check (fail before sending)
191+
192+
Run `fetch_required_pow` *before* `send_dm`. Advantages: fail fast (no
193+
15‑second wait). Disadvantages:
194+
195+
- Adds a relay roundtrip to the happy path for every command.
196+
- Doubles "you didn't set PoW" errors for daemons that publish the tag but
197+
also accept PoW=0 (no real check on that side today).
198+
- Couples every command to the info event being reachable.
199+
200+
→ Rejected. Postflight has the right cost model: zero overhead when things
201+
work, and *only* pays for the info-event fetch in the failure path where the
202+
user is already waiting anyway.
203+
204+
### 5.2 Auto-mine the required PoW
205+
206+
Bump `WrapOptions.pow` to `max(configured, required)` instead of erroring.
207+
Tempting, but high PoW targets (e.g. 28+ bits) can take minutes to mine on
208+
a laptop, and the user wouldn't see *why* the CLI is suddenly chewing CPU.
209+
Better to fail explicitly with an actionable hint and let the user opt in
210+
by setting `POW` themselves.
211+
212+
→ Rejected. Defer to explicit user opt-in via `--pow` / `POW` env.
213+
214+
### 5.3 Read NIP‑01 `OK false` from the relay
215+
216+
Won't help: mostrod drops the event *after* the relay accepts it, so the
217+
relay returns `OK true`. There is no rejection signal on the publish path.
218+
219+
→ Rejected.
220+
221+
## 6. Acceptance criteria
222+
223+
(Reproducing the issue's checklist, with concrete bindings.)
224+
225+
- Sending an event from `mostro-cli` to a PoW‑enabled `mostrod` without
226+
satisfying PoW must surface an error that mentions PoW. Specifically the
227+
`PowRequirementUnmet { required, configured }` variant rendered as:
228+
`"This Mostro instance requires NIP-13 proof of work of N bits, but the
229+
client sent the event with M bits. Re-run with --pow N or set POW=N and
230+
try again."`
231+
- Genuine timeouts (daemon reachable, no PoW requirement, just slow / no
232+
reply) keep returning `WaitForDmTimeout` — its existing message stays.
233+
- `add_bond_invoice`'s "timeout is the happy path" arm only matches
234+
`WaitForDmTimeout`; a `PowRequirementUnmet` propagates and is shown to
235+
the user.
236+
- Older daemons that don't publish `["pow", ...]` in kind‑38385 behave
237+
exactly as before (helper returns `None` → generic timeout).
238+
239+
## 7. Test plan
240+
241+
Unit tests in `src/util/messaging.rs` and `src/util/events.rs`:
242+
243+
- `pow_requirement_unmet_display_mentions_required_and_configured`
244+
formatting check on the error message.
245+
- `fetch_required_pow_returns_none_when_tag_missing` — synthesize a
246+
kind-38385 event with no `pow` tag, assert `None`.
247+
- `fetch_required_pow_picks_newest_revision` — two events, newer with a
248+
different pow value; helper returns the newer value. Mirrors the existing
249+
`fetch_bond_claim_window_days` pattern.
250+
- `fetch_required_pow_parses_pow_tag` — single event with `["pow", "12"]`,
251+
assert `Some(12)`.
252+
253+
A manual end-to-end check (out of the unit-test scope) is to run against a
254+
PoW‑enabled local `mostrod`:
255+
256+
```sh
257+
POW=0 mostro-cli neworder ...
258+
# expected: PowRequirementUnmet error, not "deadline has elapsed"
259+
POW=<required> mostro-cli neworder ...
260+
# expected: normal flow, no error
261+
```
262+
263+
## 8. Out of scope
264+
265+
- Auto-mining the missing PoW (see 5.2).
266+
- Removing or restructuring the `POW` env var / `--pow` flag.
267+
- Surfacing the PoW requirement during `list-orders` / informational
268+
flows. The fix is scoped to the request/reply path where the missing
269+
PoW silently kills the flow.

0 commit comments

Comments
 (0)