Context
Connection::poll (swarm/src/connection.rs:274) is a fixed-point loop
that re-polls every child sub-state-machine whenever any one returns
Ready, and only yields once every child returns Pending
simultaneously. With no upper bound on iterations per call, single
invocations regularly cross tokio's 50 µs slow-poll threshold —
holding a tokio worker thread back from timers, the I/O reactor, and
colocated tasks for hundreds of microseconds at a time.
Findings
120-cell sweep (N ∈ {1,2,5,10,15} × RTT ∈ {0,5,25,50,100,200} ms × payload ∈ {4,16,64,256} KiB) — real TCP, per-peer netns + tc netem,
single-worker tokio runtime, taskset-pinned per process:
- 106 / 120 cells have ≥ 1 % of
Connection::poll invocations
cross 50 µs. The 14 cells under 1 % are all at 4 KiB / N ≤ 2.
- Worst cells (256 KiB / RTT=0 / N ≥ 2): 32–36 % of polls slow,
p99 = 350–375 µs (~7× the threshold), and ~80 % of measurement
wall-clock is spent inside slow polls.
- Idle isn't clean either: 64 KiB / N=1 / RTT=0 = ~8 % slow ratio.
- Swarm::poll is fine: peak 2.1 %, mean 0.22 %, under 1 % in 119/120
cells. The cost is inside Connection::poll, not the event loop
above it.
Reproducing
Full methodology, source, raw CSVs, and rendered heatmaps live in
latency-benchmark/.
One-liner: ./latency-benchmark/sweep.sh on Linux. sudo is only
required for tc/netns; RTT=0 cells skip both and run as a regular
user. ~50–60 min for the default grid.
Possible direction
A tokio-coop-style work budget on the inner loop — bound iterations
per Connection::poll invocation, self-wake on saturation — would cap
per-poll duration without changing observable behaviour beyond one extra
scheduling round per saturated poll. Sketched in
the bench README.
The specific budget value is empirical; we don't have a strong preference.
Happy to refine the methodology, add measurements, or produce
before/after numbers once a fix shape lands.
Context
Connection::poll(swarm/src/connection.rs:274) is a fixed-point loopthat re-polls every child sub-state-machine whenever any one returns
Ready, and only yields once every child returnsPendingsimultaneously. With no upper bound on iterations per call, single
invocations regularly cross tokio's 50 µs slow-poll threshold —
holding a tokio worker thread back from timers, the I/O reactor, and
colocated tasks for hundreds of microseconds at a time.
Findings
120-cell sweep (
N ∈ {1,2,5,10,15} × RTT ∈ {0,5,25,50,100,200} ms × payload ∈ {4,16,64,256} KiB) — real TCP, per-peer netns +tc netem,single-worker tokio runtime,
taskset-pinned per process:Connection::pollinvocationscross 50 µs. The 14 cells under 1 % are all at 4 KiB / N ≤ 2.
p99 = 350–375 µs (~7× the threshold), and ~80 % of measurement
wall-clock is spent inside slow polls.
cells. The cost is inside
Connection::poll, not the event loopabove it.
Reproducing
Full methodology, source, raw CSVs, and rendered heatmaps live in
latency-benchmark/.One-liner:
./latency-benchmark/sweep.shon Linux.sudois onlyrequired for
tc/netns; RTT=0 cells skip both and run as a regularuser. ~50–60 min for the default grid.
Possible direction
A tokio-
coop-style work budget on the inner loop — bound iterationsper
Connection::pollinvocation, self-wake on saturation — would capper-poll duration without changing observable behaviour beyond one extra
scheduling round per saturated poll. Sketched in
the bench README.
The specific budget value is empirical; we don't have a strong preference.
Happy to refine the methodology, add measurements, or produce
before/after numbers once a fix shape lands.