[Epic] LN payments stability on mainnet #1003

@piotr-iohk

Context

Users and team members have reported unreliable Lightning payments on mainnet, particularly for larger amounts (see sub-issues). Until recently we had no continuous, objective signal to measure this — reports were anecdotal and hard to verify or track over time.

Since June 2026 we have run nightly mainnet jobs in bitkit-nightly, which give us daily data:

  • Mainnet lightning probe: routes test payments (1k / 25k / 50k sats) from a Bitkit wallet (built from master, with ~100k sats outbound via a Blocktank channel) to external targets. Invoice targets: Cake, WoS, Strike, Blitz, Blink, sats.mobi. Keysend targets: Boltz Mini, Megalith LSP, Mullvad VPN, Phoenix/ACINQ.
  • Mainnet nightly E2E: a real LN payment (WoS) plus CJIT/channel-order flows, run against release APKs.

Results are posted daily to #bitkit-nightly-results.
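
For reference, the probe coverage described above can be written out as a target × amount matrix. This is only an illustrative sketch: the amounts and targets come from this issue, but the data layout and the `probe_matrix` helper are hypothetical and are not taken from the bitkit-nightly workflows.

```python
from itertools import product

# Amounts and targets as listed in this issue. The dict layout and the
# function name are illustrative assumptions, not the real workflow config.
AMOUNTS_SATS = [1_000, 25_000, 50_000]
TARGETS = {
    "invoice": ["Cake", "WoS", "Strike", "Blitz", "Blink", "sats.mobi"],
    "keysend": ["Boltz Mini", "Megalith LSP", "Mullvad VPN", "Phoenix/ACINQ"],
}


def probe_matrix():
    """Return every (method, target, amount_sats) combination to probe."""
    return [
        (method, target, amount)
        for method, targets in TARGETS.items()
        for target, amount in product(targets, AMOUNTS_SATS)
    ]


if __name__ == "__main__":
    # 10 targets x 3 amounts
    print(len(probe_matrix()))
```

A "fully green" night in this framing means every entry of the matrix succeeded.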

This epic tracks diagnosing and fixing the reliability issues the probes surface, using the nightly results as the measure of progress.

Current signal (as of 2026-06-10)

Consistent pattern across recent nightly runs:

| Amount | Result |
|---|---|
| 1k sats | ✅ succeeds to all targets, every night |
| 25k sats | ⚠️ intermittent failures to Cake, WoS, Blitz |
| 50k sats | ❌ fails to Cake, WoS, Blitz nearly every night |

Strike, Blink, sats.mobi and all keysend targets pass at all amounts. The nightly E2E real payment (WoS) also passes every night. This points to route liquidity or pathfinding limitations for higher amounts to specific destinations, rather than a general payment-code regression. Root cause analysis is still needed; probe diagnostics artifacts are attached to each workflow run.
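
To make the ✅ / ⚠️ / ❌ labels reproducible across runs, the per-night outcomes for one target and amount could be bucketed mechanically. The sketch below is an assumption about how that might look: the `classify` helper and the 0.8 cut-off for "nearly every night" are hypothetical and are not part of the current reporting.

```python
# Hypothetical cut-off: at or above this failure rate, a target/amount pair
# counts as failing "nearly every night". Not defined anywhere in the probes.
NEARLY_ALWAYS = 0.8


def classify(outcomes):
    """Map per-night booleans (True = payment succeeded) to a status label."""
    if not outcomes:
        return "no data"
    failure_rate = outcomes.count(False) / len(outcomes)
    if failure_rate == 0:
        return "pass"
    if failure_rate >= NEARLY_ALWAYS:
        return "fail"
    return "intermittent"


if __name__ == "__main__":
    # Made-up example week, for illustration only (not real probe results).
    print(classify([True, False, True, True, False, True, True]))
```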

A likely contributor is the pathfinding scorer: Bitkit uses an external scorer (Blocktank scorer-prod) on top of LDK's ProbabilisticScorer, and stale or overly pessimistic liquidity estimates would produce exactly this pattern — small amounts route fine while higher amounts to the same destination find no acceptable path, even when liquidity exists. Scorer data quality/freshness and penalty configuration should be part of the root cause analysis.
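
The mechanism can be illustrated with a deliberately simplified model. The sketch below assumes a channel's liquidity is uniformly distributed between an estimated lower and upper bound and turns the resulting success probability into a log-scaled penalty. It is not LDK's actual ProbabilisticScorer code or the Blocktank scorer; the function names, the multiplier, the cap and the example bounds are all illustrative assumptions.

```python
import math


def success_probability(amount_sat, min_liquidity_sat, max_liquidity_sat):
    """Chance a channel can forward amount_sat, assuming its liquidity is
    uniformly distributed between the two estimated bounds."""
    if amount_sat <= min_liquidity_sat:
        return 1.0
    if amount_sat >= max_liquidity_sat:
        return 0.0
    span = max_liquidity_sat - min_liquidity_sat
    return (max_liquidity_sat - amount_sat) / span


def liquidity_penalty(amount_sat, min_liq, max_liq, multiplier=1.0, cap=2.0):
    """Penalty grows as -log10(success probability), clamped at `cap`.
    `multiplier` and `cap` are illustrative knobs, not real config values."""
    p = success_probability(amount_sat, min_liq, max_liq)
    if p >= 1.0:
        return 0.0
    if p <= 0.0:
        return cap * multiplier
    return min(-math.log10(p), cap) * multiplier


if __name__ == "__main__":
    # Hypothetical stale estimate: the scorer believes at most 60k sats can
    # pass through a channel, regardless of its real balance.
    for amount in (1_000, 25_000, 50_000):
        print(amount, round(liquidity_penalty(amount, 0, 60_000), 3))
```

Under these assumptions the penalty rises steeply as the amount approaches the believed upper bound, so a stale or pessimistic bound can price out every path for 50k sats while leaving 1k sats almost unpenalised.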

Note the timeline of existing reports: #790 and #761 (both Feb 2026) describe the same class of failure — sends above a destination-dependent threshold failing — months before the probes went live. This looks like a long-standing liquidity/pathfinding limitation, not a recent regression.

Goal / exit criteria

  • Root cause identified for the recurring 25k/50k failures (scorer data/config vs. app-side pathfinding vs. network-side liquidity), with findings documented
  • Fixes or mitigations shipped where the cause is on our side (scorer / app / ldk-node / Blocktank config)
  • Probe runs fully green (all targets, all amounts) for 7 consecutive nights
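
The last criterion could be checked mechanically from the nightly results. The sketch below is one possible reading, assuming each night is recorded as a mapping from (target, amount) to success and that "7 consecutive nights" means the most recent seven runs; the `exit_criterion_met` helper and the record shape are hypothetical.

```python
def exit_criterion_met(nights, required=7):
    """nights: chronological list of dicts mapping (target, amount_sats) to
    True/False. Returns True if the last `required` nights were all green.

    Assumption: a night with no recorded probes does not count as green.
    This does not verify that every expected target/amount pair is present.
    """
    if len(nights) < required:
        return False
    recent = nights[-required:]
    return all(bool(night) and all(night.values()) for night in recent)


if __name__ == "__main__":
    # Made-up records for illustration only.
    green = {("WoS", 50_000): True, ("Cake", 50_000): True}
    red = {("WoS", 50_000): False, ("Cake", 50_000): True}
    print(exit_criterion_met([red] + [green] * 7))
    print(exit_criterion_met([green] * 6 + [red]))
```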

Sub-issues

Related (not sub-issues)

Notes

  • Issue is likely not platform-specific (shared ldk-node / bitkit-core / Blocktank layers, incl. the external scorer); tracked here since the probe APK is built from this repo and most existing reports live here. iOS-specific manifestations should be filed in bitkit-ios and cross-linked (see ios#449).
  • New probe targets or reporting changes can be requested in bitkit-nightly.
