
feat(cbf): add auto-restart with exponential backoff #9

Closed
febyeji wants to merge 21 commits into add-cbf-chain-source from test-ci/add-cbf-auto-restart

Conversation

@febyeji
Owner

@febyeji febyeji commented Apr 4, 2026

Extract a build_cbf_node helper and add auto-restart with backoff, a liveness check,
scan state cleanup, and a per-block timeout. Default required_peers to 1 and fix tests.

Fmt Bot and others added 21 commits March 29, 2026 02:10
…ll-request/patch

Automated nightly rustfmt (2026-03-29)
Extract the connection logic into `do_connect_peer_internal` and have
`do_connect_peer` act as a thin wrapper that always calls
`propagate_result_to_subscribers` with the result. This removes the
need to manually propagate at every error site, making the code less
error-prone.
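
The wrapper pattern described in this commit can be sketched as follows. This is a minimal, synchronous illustration, not the project's code: the real functions are presumably async, and `ConnectionManager`, `ConnectError`, and the `delivered` field are hypothetical names introduced only so the behaviour is observable.

```rust
use std::sync::Mutex;

/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum ConnectError {
    Unreachable,
}

/// Hypothetical stand-in for whatever owns the pending subscribers.
#[derive(Default)]
pub struct ConnectionManager {
    /// Results handed to subscribers, recorded so the sketch can be inspected.
    pub delivered: Mutex<Vec<Result<(), ConnectError>>>,
}

impl ConnectionManager {
    /// Thin wrapper: run the connection logic, then always propagate the result.
    pub fn do_connect_peer(&self, addr: &str) -> Result<(), ConnectError> {
        let res = self.do_connect_peer_internal(addr);
        self.propagate_result_to_subscribers(&res);
        res
    }

    /// The inner function is free to return early on any error,
    /// because propagation no longer happens at each error site.
    fn do_connect_peer_internal(&self, addr: &str) -> Result<(), ConnectError> {
        if addr.is_empty() {
            return Err(ConnectError::Unreachable);
        }
        Ok(())
    }

    fn propagate_result_to_subscribers(&self, res: &Result<(), ConnectError>) {
        self.delivered.lock().unwrap().push(res.clone());
    }
}
```

Because propagation lives in exactly one place, adding a new early return to the inner function cannot leave subscribers waiting.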

Co-Authored-By: HAL 9000
Replace the split setuptools configuration (pyproject.toml + setup.cfg) with
a unified hatchling-based setup. This adds a [build-system] section pointing
to hatchling and a build hook (hatch_build.py) that marks wheels as
platform-specific since we bundle native shared libraries.

Hatchling includes all files in the package directory by default, which also
fixes the missing *.dll glob that setup.cfg had for Windows.

Bump requires-python from >=3.6 to >=3.8 as 3.6/3.7 are long EOL.

Co-Authored-By: HAL 9000
…ripts

Add `python_build_wheel.sh` which generates bindings and builds a
platform-specific wheel via `uv build`, and `python_publish_package.sh`
which publishes collected wheels via `uv publish`.

The intended workflow is to run the build script on each target platform
(Linux, macOS), collect the wheels, and then publish them in one go.

Co-Authored-By: HAL 9000
Replace `actions/setup-python` with `astral-sh/setup-uv` and use `uv
run` to run tests.

Co-Authored-By: HAL 9000
Replace the synchronous, blocking `std::net::ToSocketAddrs::to_socket_addrs()`
calls with async `tokio::net::lookup_host` to avoid blocking the tokio
runtime during DNS resolution.

Additionally, instead of only using the first resolved address, we now
iterate over all resolved addresses and try connecting to each in
sequence until one succeeds. This improves connectivity for hostnames
that resolve to multiple addresses (e.g., dual-stack IPv4/IPv6).
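
The "try each resolved address in sequence" part of this change can be sketched with the standard library alone. The sketch leaves out the resolution step (the commit uses `tokio::net::lookup_host` for that) and takes the connect operation as a closure; `connect_first_reachable` is a hypothetical name, not a function from the codebase.

```rust
use std::net::SocketAddr;

/// Tries `connect` on each address in order and returns the first success.
/// If every attempt fails, the error from the last attempt is returned.
pub fn connect_first_reachable<C, T>(addrs: &[SocketAddr], mut connect: C) -> Result<T, String>
where
    C: FnMut(&SocketAddr) -> Result<T, String>,
{
    let mut last_err = "no addresses resolved".to_string();
    for addr in addrs {
        match connect(addr) {
            Ok(conn) => return Ok(conn),
            // Remember the failure and fall through to the next address.
            Err(e) => last_err = format!("{}: {}", addr, e),
        }
    }
    Err(last_err)
}
```

With a dual-stack hostname, a failing IPv6 address no longer prevents the IPv4 address from being tried.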

Co-Authored-By: HAL 9000
Enforce HTTPS for non-localhost URLs per LNURL spec and disable
redirect following since the auth flow is a single GET request.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…v-python-build

Switch to `uv` python build system
…sync-lookup

Switch to async DNS resolution
We previously dropped the pin when moving the MSRV to 1.85, but that
no longer seems to be sufficient.
…adapter-pin

Re-pin `idna_adapter` for MSRV builds
Move the GC pass from after insertion to before, so that stale entries
are reclaimed before allocating a new bucket. This avoids unnecessary
growth of the user map between GC cycles.
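
A minimal sketch of the reordering, under stated assumptions: the actual rate limiter's types are not shown in this commit, so `RateLimiter`, `Bucket`, `idle_ttl`, and `allow` are hypothetical, token refill is omitted, and GC runs on every new user rather than on whatever cycle the real code uses.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-user bucket.
struct Bucket {
    tokens: u32,
    last_seen: Instant,
}

/// Hypothetical rate limiter keyed by user id.
pub struct RateLimiter {
    users: HashMap<u64, Bucket>,
    idle_ttl: Duration,
    capacity: u32,
}

impl RateLimiter {
    pub fn new(capacity: u32, idle_ttl: Duration) -> Self {
        Self { users: HashMap::new(), idle_ttl, capacity }
    }

    /// Drops buckets that have been idle for at least `idle_ttl`.
    fn garbage_collect(&mut self, now: Instant) {
        let ttl = self.idle_ttl;
        self.users.retain(|_, b| now.duration_since(b.last_seen) < ttl);
    }

    /// Returns true if the request from `user` is allowed.
    pub fn allow(&mut self, user: u64, now: Instant) -> bool {
        if !self.users.contains_key(&user) {
            // GC before inserting, so stale entries are reclaimed
            // before a new bucket is allocated.
            self.garbage_collect(now);
        }
        let capacity = self.capacity;
        let bucket = self
            .users
            .entry(user)
            .or_insert(Bucket { tokens: capacity, last_seen: now });
        bucket.last_seen = now;
        if bucket.tokens == 0 {
            return false;
        }
        bucket.tokens -= 1;
        true
    }

    pub fn tracked_users(&self) -> usize {
        self.users.len()
    }
}
```

Running the pass first means the map never holds both the stale entries and the new bucket at the same time.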

AI tools were used in preparing this commit.
Run rate limiter garbage collection before inserting new user
Squashed base CBF commits (rebased onto upstream/main):
- Add optional fee source from esplora/electrum
- Add BIP 157 compact block filter chain source
- Add CBF integration tests and documentation
- Fix CBF chain source build errors and UniFFI bindings
- Remove last_synced_height from cbf
Preparation for auto-restart: extract bip157 node build logic into a
reusable helper method, add chain_state() from wallet checkpoint to avoid
genesis re-sync, and thread Arc<Wallet> through start().

AI: claude
When node.run() exits (e.g. NoReachablePeers from kyoto lightningdevkit#558), the
background task rebuilds the node, swaps the requester, and respawns
channel processing tasks, up to MAX_RESTART_RETRIES (5) attempts with
doubling backoff starting at 500ms.

- Change cbf_runtime_status from Mutex<> to Arc<Mutex<>> so it can be
  shared with the async restart loop
- Extract build_cbf_node_static() that takes explicit params instead of
  &self, enabling calls from 'static async blocks
- Move all task spawning (info/warn/event + node.run) into a single
  restart loop inside spawn_background_task

AI: claude
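
The retry schedule described above (up to 5 restarts, doubling from 500ms) can be sketched as below. Only the two constants come from the commit message. The synchronous `run_with_restarts` helper, the injected `sleep` closure, and the `String` error type are assumptions made to keep the example self-contained; the real loop is async and also rebuilds the node, swaps the requester, and respawns the channel tasks on each iteration.

```rust
use std::time::Duration;

/// Values taken from the commit message.
pub const MAX_RESTART_RETRIES: u32 = 5;
pub const INITIAL_BACKOFF: Duration = Duration::from_millis(500);

/// Delay before restart number `attempt` (0-based): 500ms, 1s, 2s, 4s, 8s.
pub fn backoff_delay(attempt: u32) -> Duration {
    INITIAL_BACKOFF * 2u32.pow(attempt)
}

/// Runs `run_node` once, then restarts it after each failure, waiting
/// `backoff_delay(attempt)` before every restart. Gives up with the last
/// error once MAX_RESTART_RETRIES restarts have failed.
pub fn run_with_restarts<R, S>(mut run_node: R, mut sleep: S) -> Result<(), String>
where
    R: FnMut() -> Result<(), String>,
    S: FnMut(Duration),
{
    let mut last_err = match run_node() {
        Ok(()) => return Ok(()),
        Err(e) => e,
    };
    for attempt in 0..MAX_RESTART_RETRIES {
        sleep(backoff_delay(attempt));
        match run_node() {
            Ok(()) => return Ok(()),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Passing `sleep` as a closure keeps the schedule testable without real delays; production code would pass `std::thread::sleep` or an async equivalent. Whether the retry counter resets after a period of healthy operation is not stated in the commit, so this sketch does not model it.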
requester() now checks is_running() to give callers an immediate failure
signal instead of waiting for SendError to propagate through the channel.
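
A sketch of the liveness check, assuming the running state is tracked in a shared flag. `CbfSource`, `CbfError::NotRunning`, the `Requester` placeholder, and `set_running` are hypothetical; only the `requester()` and `is_running()` names come from the commit message.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Debug, PartialEq)]
pub enum CbfError {
    NotRunning,
}

/// Placeholder for the handle used to send requests to the node.
#[derive(Clone, Debug, PartialEq)]
pub struct Requester {
    pub id: u32,
}

pub struct CbfSource {
    running: Arc<AtomicBool>,
    requester: Requester,
}

impl CbfSource {
    pub fn new() -> Self {
        Self {
            running: Arc::new(AtomicBool::new(false)),
            requester: Requester { id: 1 },
        }
    }

    pub fn set_running(&self, running: bool) {
        self.running.store(running, Ordering::Release);
    }

    pub fn is_running(&self) -> bool {
        self.running.load(Ordering::Acquire)
    }

    /// Fails fast when the node is down, instead of handing out a handle
    /// whose first send would fail later.
    pub fn requester(&self) -> Result<Requester, CbfError> {
        if !self.is_running() {
            return Err(CbfError::NotRunning);
        }
        Ok(self.requester.clone())
    }
}
```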
Extract cleanup_scan_state() helper and call it on error paths in
run_filter_scan() to prevent stale watched_scripts, matched_block_hashes,
and filter_skip_height from leaking between scans.
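
The cleanup-on-error idea can be sketched as below. The three field names come from the commit message; their types, the `ScanState` struct, and the closure-based `run_filter_scan` signature are assumptions. The sketch also resets the state on success, which the commit message does not confirm.

```rust
use std::collections::HashSet;

/// Hypothetical container for per-scan state.
#[derive(Default)]
pub struct ScanState {
    pub watched_scripts: HashSet<Vec<u8>>,
    pub matched_block_hashes: Vec<[u8; 32]>,
    pub filter_skip_height: Option<u32>,
}

impl ScanState {
    /// Resets everything that must not leak into the next scan.
    pub fn cleanup_scan_state(&mut self) {
        self.watched_scripts.clear();
        self.matched_block_hashes.clear();
        self.filter_skip_height = None;
    }

    /// Runs one scan and returns the number of matched blocks.
    /// `scan` stands in for the real filter-matching work.
    pub fn run_filter_scan<F>(
        &mut self,
        scripts: Vec<Vec<u8>>,
        skip_height: u32,
        scan: F,
    ) -> Result<usize, String>
    where
        F: FnOnce(&mut ScanState) -> Result<(), String>,
    {
        self.watched_scripts = scripts.into_iter().collect();
        self.filter_skip_height = Some(skip_height);

        if let Err(e) = scan(&mut *self) {
            // Error path: clear the state before returning.
            self.cleanup_scan_state();
            return Err(e);
        }

        let matched = self.matched_block_hashes.len();
        self.cleanup_scan_state(); // assumed: success path resets as well
        Ok(matched)
    }
}
```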
Wrap individual get_block() calls in tokio::time::timeout using the
existing per_request_timeout_secs config. Previously only the overall sync
had a timeout; individual block fetches could hang indefinitely (kyoto lightningdevkit#556).
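
To illustrate the per-request bound without pulling in an async runtime, the sketch below swaps `tokio::time::timeout` for a worker thread plus `recv_timeout` from the standard library. It is a stand-in for the technique, not the commit's implementation: `fetch_with_timeout` and `FetchError` are hypothetical, and unlike a cancelled future, the worker thread keeps running after the deadline passes.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum FetchError {
    Timeout,
    WorkerFailed,
}

/// Runs `fetch` on a worker thread and waits at most `timeout` for its result.
pub fn fetch_with_timeout<T, F>(timeout: Duration, fetch: F) -> Result<T, FetchError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the deadline passed; ignore that.
        let _ = tx.send(fetch());
    });
    match rx.recv_timeout(timeout) {
        Ok(block) => Ok(block),
        Err(RecvTimeoutError::Timeout) => Err(FetchError::Timeout),
        // The worker exited (for example, panicked) without sending a result.
        Err(RecvTimeoutError::Disconnected) => Err(FetchError::WorkerFailed),
    }
}
```

Applying the bound to each block fetch means one stalled request fails on its own deadline, rather than holding up the scan until the overall sync timeout fires.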
@febyeji febyeji closed this Apr 6, 2026

4 participants