Skip to content

htlcswitch+channeldb: batch fwd pkg loads at startup#10826

Open
Roasbeef wants to merge 3 commits into
lightningnetwork:masterfrom
Roasbeef:batched-fwdpkg-load
Open

htlcswitch+channeldb: batch fwd pkg loads at startup#10826
Roasbeef wants to merge 3 commits into
lightningnetwork:masterfrom
Roasbeef:batched-fwdpkg-load

Conversation

@Roasbeef
Copy link
Copy Markdown
Member

In this PR, we rework the switch's startup reforward path so that all
forwarding packages across the active channel set are loaded in a single
read transaction, instead of opening one read tx per channel. For nodes
with 1k+ channels, the old per-channel-tx approach can take a while at
startup, and the cost gets dramatically worse on remote-tx backends
(etcd, postgres) where each tx is a network round-trip.

The shape of the change is small. We extend the GlobalFwdPkgReader
interface (and its SwitchPackager implementation) with a new
LoadChannelFwdPkgsSet(tx, sources) query that walks the
fwd-packages root bucket once and returns a flat []*FwdPkg across
the requested set. Each FwdPkg already carries its originating
channel via its Source field, and reforwardSettleFails already
keys off fwdPkg.Source for ack/forward routing, so a flat slice is
all the switch needs.

Switch.reforwardResponses now collects the set of non-pending,
non-local channels up front and hands the whole set to the batched
query, replacing the prior loop that called loadChannelFwdPkgs (and
opened a fresh kvdb.View) per channel.

Benchmarks

A new pair of benchmarks in channeldb exercises the prior
per-channel-tx path against the new batched query across a matrix of
channel-count + pkgs-per-channel scenarios, so we can compare both via
benchstat. On a local bolt backend the per-iteration time is roughly
at parity for both variants (bolt read tx setup is cheap, so
amortizing it across channels doesn't move the needle much), while the
batched path consistently drops ~10% of allocs/op and ~4% of B/op
across the matrix. The bigger win lands on the remote-tx backends
where the per-tx network round-trip dominates.

Sample benchstat output on darwin/arm64 (M4 Max), -benchtime=2s -count=6:

                                           │  old (per-tx)  │      new (batched)       │
                                           │     sec/op     │   sec/op    vs base      │
LoadChannelFwdPkgs/channels=100/pkgs=1-16     352.4µ ±  5%   341.9µ ± 3%       ~
LoadChannelFwdPkgs/channels=500/pkgs=1-16     1.954m ±  3%   1.959m ± 3%       ~
LoadChannelFwdPkgs/channels=1000/pkgs=1-16    4.036m ±  6%   3.990m ± 7%       ~
LoadChannelFwdPkgs/channels=2000/pkgs=1-16    8.819m ± 18%   8.120m ± 44%      ~
LoadChannelFwdPkgs/channels=1000/pkgs=4-16    15.86m ±  4%   17.11m ± 32%      ~
geomean                                       3.296m         3.265m        -0.93%

                                           │     B/op      │    B/op     vs base   │
LoadChannelFwdPkgs/channels=1000/pkgs=1-16   11.57Mi ± 0%   11.04Mi ± 0%  -4.61%
geomean                                      9.478Mi        9.107Mi       -3.91%

                                           │   allocs/op   │ allocs/op   vs base   │
LoadChannelFwdPkgs/channels=1000/pkgs=1-16    160.4k ± 0%   144.4k ± 0%   -9.97%
geomean                                       123.3k        112.5k        -8.74%

See each commit message for a detailed description w.r.t the
incremental changes.

Test plan

  • Existing channeldb tests pass (go test ./channeldb/...).
  • New TestSwitchPackagerLoadChannelFwdPkgsSet covers correctness:
    batched output matches the per-channel concatenation, channels with
    no fwd pkgs on disk are silently skipped, each pkg's Source tracks
    its originating channel, and an empty input is a no-op.
  • htlcswitch builds and existing reforward path is exercised by
    the integration of reforwardResponses with the new query.
  • Benchmarks compile and run via
    go test -run NONE -bench BenchmarkLoadChannelFwdPkgs ./channeldb/.

Roasbeef added 3 commits May 20, 2026 16:52
In this commit, we extend `GlobalFwdPkgReader` (and its `SwitchPackager`
implementation) with a new `LoadChannelFwdPkgsSet` method that loads
fwd pkgs for a set of channels within a single read transaction.

The existing `LoadChannelFwdPkgs` is per-channel, so callers that want
to read fwd pkgs for the entire active channel set end up opening one
read tx per channel. On nodes with 1k+ channels, this gets expensive
fast, especially on remote-tx backends (etcd, postgres) where each tx
carries network round-trip overhead.

The new query takes a `[]lnwire.ShortChannelID`, walks the
`fwd-packages` root bucket once, and returns a flat `[]*FwdPkg` across
the requested set. Each `FwdPkg` already carries its originating
channel via its `Source` field, so callers that only need to operate
on the pkgs (e.g. the switch's reforward path) don't need a keyed map.
Channels with no fwd pkgs on disk are silently skipped.
In this commit, we add a pair of benchmarks that exercise the prior
per-channel-tx path against the new batched `LoadChannelFwdPkgsSet`
query, across a matrix of channel-count + pkgs-per-channel scenarios
(up to 2k channels). The bench names are structured so they can be
paired up under a common name and fed to benchstat.

On a local bolt backend the per-iteration time is roughly at parity
for both variants (bolt read tx setup is cheap, so amortizing it
across channels doesn't move the needle much), while the batched path
consistently drops ~10% of allocs/op and ~4% of B/op across the
matrix. The bigger win lands on the remote-tx backends (etcd,
postgres) where the per-tx network round-trip dominates and the
batched query collapses N round-trips into one.

Sample benchstat output on darwin/arm64 (M4 Max), `-benchtime=2s
-count=6`:

```
                                           │  old (per-tx)  │      new (batched)       │
                                           │     sec/op     │   sec/op    vs base      │
LoadChannelFwdPkgs/channels=100/pkgs=1-16     352.4µ ±  5%   341.9µ ± 3%       ~
LoadChannelFwdPkgs/channels=500/pkgs=1-16     1.954m ±  3%   1.959m ± 3%       ~
LoadChannelFwdPkgs/channels=1000/pkgs=1-16    4.036m ±  6%   3.990m ± 7%       ~
LoadChannelFwdPkgs/channels=2000/pkgs=1-16    8.819m ± 18%   8.120m ± 44%      ~
LoadChannelFwdPkgs/channels=1000/pkgs=4-16    15.86m ±  4%   17.11m ± 32%      ~
geomean                                       3.296m         3.265m        -0.93%

                                           │     B/op      │    B/op     vs base   │
LoadChannelFwdPkgs/channels=1000/pkgs=1-16   11.57Mi ± 0%   11.04Mi ± 0%  -4.61%
geomean                                      9.478Mi        9.107Mi       -3.91%

                                           │   allocs/op   │ allocs/op   vs base   │
LoadChannelFwdPkgs/channels=1000/pkgs=1-16    160.4k ± 0%   144.4k ± 0%   -9.97%
geomean                                       123.3k        112.5k        -8.74%
```
In this commit, we rework `reforwardResponses` to load fwd pkgs for
every active channel in a single read transaction via the new
`SwitchPackager.LoadChannelFwdPkgsSet` query, instead of opening one
read tx per channel.

The prior loop opened a fresh `kvdb.View` for each channel returned by
`FetchAllChannels` so that it could call `LoadChannelFwdPkgs(source)`.
On nodes with 1k+ channels this adds up to 1k+ read txs at startup,
which is particularly painful on remote-tx backends (etcd, postgres)
where each tx is a network round-trip. We now collect the set of
non-pending, non-local sources up front and hand the whole set to
`LoadChannelFwdPkgsSet`, which walks the `fwd-packages` bucket once
and returns a flat slice of fwd pkgs across all channels.

`reforwardSettleFails` already keys off `fwdPkg.Source` for its
ack/forward routing, so flattening the result doesn't lose any
information: each pkg still knows the channel it came from.
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes the startup process of the htlcswitch by batching the loading of forwarding packages. By moving from a per-channel read transaction model to a single batched query, the system reduces the number of database round-trips required during startup. This change is particularly beneficial for nodes with a large number of channels or those utilizing remote database backends where transaction latency is a significant bottleneck.

Highlights

  • Batched Forwarding Package Loading: Introduced a new LoadChannelFwdPkgsSet method to the GlobalFwdPkgReader interface, allowing all forwarding packages for a set of channels to be loaded within a single database read transaction.
  • Performance Optimization: Optimized the switch's startup reforwarding path by replacing the per-channel transaction loop with a single batched query, significantly reducing overhead especially on remote database backends like etcd or Postgres.
  • Benchmark Coverage: Added comprehensive benchmarks in channeldb to compare the performance of the legacy per-transaction approach against the new batched strategy across various channel counts.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a batched method for loading forwarding packages, LoadChannelFwdPkgsSet, designed to improve performance for nodes with a large number of channels by reducing database transaction overhead. The htlcswitch has been updated to utilize this new method during the re-forwarding of responses on startup, and comprehensive benchmarks and unit tests have been added to verify the implementation. Feedback focuses on optimizing the batched load logic by reducing redundant bucket lookups, utilizing the standard library for integer-to-string conversion in tests, and ensuring cross-platform compatibility for file paths.

}

for _, height := range heights {
fwdPkg, err := loadFwdPkg(fwdPkgBkt, source, height)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of loadChannelFwdPkgsSet calls loadFwdPkg for each height, which internally re-fetches the sourceBkt from fwdPkgBkt using source. Since sourceBkt is already retrieved in the outer loop (line 656), this results in redundant bucket lookups for every forwarding package. For nodes with many channels and packages, this can add up, especially on remote backends where even in-transaction lookups may have overhead. Consider refactoring loadFwdPkg to accept the sourceBkt directly to avoid this redundancy.

Comment on lines +162 to +176
func itoa(n int) string {
if n == 0 {
return "0"
}

var buf [20]byte
i := len(buf)
for n > 0 {
i--
buf[i] = byte('0' + n%10)
n /= 10
}

return string(buf[i:])
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Reimplementing itoa is unnecessary as the standard library provides strconv.Itoa. Using the standard library is more idiomatic, robust (e.g., handling negative numbers), and easier to maintain. Since this is a test file, the overhead of an additional import is negligible and preferred over custom implementations of basic utilities.

func makeFwdPkgBenchDB(b *testing.B) kvdb.Backend {
b.Helper()

path := b.TempDir() + "/fwdpkg.db"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use filepath.Join instead of manual string concatenation for file paths. This ensures the code is cross-platform compatible and correctly handles path separators across different operating systems.

Suggested change
path := b.TempDir() + "/fwdpkg.db"
path := filepath.Join(b.TempDir(), "fwdpkg.db")

@github-actions github-actions Bot added the severity-critical Requires expert review - security/consensus critical label May 21, 2026
@github-actions
Copy link
Copy Markdown

🔴 PR Severity: CRITICAL

Automated classification | 4 files (2 non-test) | 413 lines changed (127 non-test)

🔴 Critical (2 files)
  • channeldb/forwarding_package.go - Channel state persistence (channeldb/*)
  • htlcswitch/switch.go - HTLC forwarding / payment routing state machine (htlcswitch/*)
Excluded (tests) (2 files)
  • channeldb/forwarding_package_bench_test.go - benchmark test (excluded from severity count)
  • channeldb/forwarding_package_test.go - unit test (excluded from severity count)

Analysis

Both non-test files in this PR touch CRITICAL packages:

  1. channeldb/forwarding_package.go - channeldb/* is critical as it manages channel state persistence; the forwarding package tracks on-disk HTLC add/settle/fail circuit records across restarts.

  2. htlcswitch/switch.go - htlcswitch/* is critical as it implements the HTLC forwarding and payment routing state machine.

The PR modifies two distinct critical packages simultaneously, which satisfies the multi-critical-package bump rule (though already at the ceiling). This warrants expert review to ensure the interaction between the forwarding package persistence changes and the switch logic remains consistent and safe.


To override, add a severity-override-{critical,high,medium,low} label.
<!-- pr-severity-bot -->

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

severity-critical Requires expert review - security/consensus critical

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant