experiment: enable tokio eager driver handoff (3.7.0 follow-up) #230

Open
vadv wants to merge 2 commits into feature/3.7.0-rust-tokio-edition from experiment/eager-driver-handoff

vadv (Collaborator) commented Apr 30, 2026

Goal

Measure whether Tokio 1.52's opt-in Builder::enable_eager_driver_handoff (#8010) reduces p99/p999 latency for pg_doorman under heavy concurrency.

What this branch does

Branched off feature/3.7.0-rust-tokio-edition (PR #229). On top of the 3.7.0 base it:

  • Adds .cargo/config.toml with --cfg tokio_unstable (required by the API).
  • Calls .enable_eager_driver_handoff() on the multi-thread runtime in src/app/server.rs.

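In the first commit that is the entire diff (3 lines added across 2 files); the second commit passes the same cfg through the RPM and Debian build scripts.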

What we are testing

The mechanism: a worker that was previously parked on the I/O/time driver and is about to poll a task first wakes another worker and hands the driver off, so the notified worker can take ownership before this one starts polling. This avoids a pathology where one worker holds the driver through a long poll and stalls socket-readiness wakeups for tasks running on the other workers.
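To make the handoff concrete, here is a deliberately simplified, single-threaded model of the idea. It is not Tokio's scheduler: Scheduler, WorkerState, park_on_driver, start_polling and driver_is_serviced are hypothetical names, and "owning the driver" is reduced to one field. It only shows who is left responsible for I/O and timer events once a parked worker starts polling, with and without the handoff.

```rust
// Toy model of eager driver handoff. NOT Tokio's implementation: the worker
// that "owns the driver" is the one responsible for collecting I/O and timer
// events; everything else about scheduling is left out.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WorkerState {
    Parked,  // idle, able to wait on the driver
    Polling, // busy running a (possibly long) task poll
}

struct Scheduler {
    states: Vec<WorkerState>,
    driver_owner: Option<usize>,
    eager_handoff: bool,
}

impl Scheduler {
    fn new(workers: usize, eager_handoff: bool) -> Self {
        Scheduler {
            states: vec![WorkerState::Parked; workers],
            driver_owner: None,
            eager_handoff,
        }
    }

    /// Worker `id` parks and takes ownership of the driver.
    fn park_on_driver(&mut self, id: usize) {
        self.states[id] = WorkerState::Parked;
        self.driver_owner = Some(id);
    }

    /// Worker `id` is about to poll a task. With eager handoff enabled and
    /// `id` owning the driver, it first passes the driver to a parked peer
    /// (in Tokio the peer is woken and takes over; here we just reassign).
    fn start_polling(&mut self, id: usize) {
        if self.eager_handoff && self.driver_owner == Some(id) {
            let peer = (0..self.states.len())
                .find(|&w| w != id && self.states[w] == WorkerState::Parked);
            if let Some(peer) = peer {
                self.driver_owner = Some(peer);
            }
        }
        self.states[id] = WorkerState::Polling;
    }

    /// True if the driver's owner is free to process readiness events now.
    fn driver_is_serviced(&self) -> bool {
        matches!(self.driver_owner, Some(w) if self.states[w] == WorkerState::Parked)
    }
}
```

With two workers and the flag off, worker 0 keeps the driver while it polls, so in this model nobody collects readiness events until that poll returns; with the flag on, worker 1 takes over. With a single worker there is no idle peer and the handoff cannot help, which is one reason gating it on worker_threads may be worth measuring.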

For a TCP-heavy connection pooler like pg_doorman, this is the most directly applicable change in tokio 1.49–1.52: vectored writes (#7871) would require migrating off write_all, and sharded spawn_blocking (#7757) was reverted in 1.52.1.

Validation plan

  • Run the Ubicloud Benchmarks workflow on this branch with the default profile (standard-30, bench_duration=30s, doorman_workers=auto, pgbench_jobs=4).
  • Compare against the 3.7.0 baseline (head of PR #229: Rust 1.95, tokio 1.52, edition 2024) using the same workflow.
  • Decide: keep and promote into 3.7.0, tune (e.g. enable only above N worker_threads), or revert if the numbers do not justify carrying tokio_unstable.
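For the "tune" option, one possible way to gate the call on worker count is sketched below. It assumes the runtime is built with Builder::new_multi_thread(), worker_threads(n) and enable_all(), which may not match src/app/server.rs exactly. EAGER_HANDOFF_MIN_WORKERS, should_enable_eager_handoff and build_runtime are hypothetical names, and the threshold of 4 is only a placeholder for whatever N the benchmarks suggest. The Tokio-specific function is compiled only under cfg(tokio_unstable), so the decision logic builds and tests without the unstable flag or a tokio dependency.

```rust
/// Hypothetical threshold (placeholder value, not an existing pg_doorman
/// setting): below this many worker threads the handoff stays off.
const EAGER_HANDOFF_MIN_WORKERS: usize = 4;

/// Pure decision: enable the handoff only for runtimes with at least
/// `min_workers` worker threads.
fn should_enable_eager_handoff(worker_threads: usize, min_workers: usize) -> bool {
    worker_threads >= min_workers
}

/// Only compiled with --cfg tokio_unstable (via RUSTFLAGS or the
/// .cargo/config.toml entry) and with tokio available as a dependency.
#[cfg(tokio_unstable)]
fn build_runtime(worker_threads: usize) -> std::io::Result<tokio::runtime::Runtime> {
    let mut builder = tokio::runtime::Builder::new_multi_thread();
    // worker_threads must be non-zero; Tokio panics on 0.
    builder.worker_threads(worker_threads).enable_all();
    if should_enable_eager_handoff(worker_threads, EAGER_HANDOFF_MIN_WORKERS) {
        // Method name taken from this PR; assumed to be a no-argument,
        // &mut Self builder method like the others.
        builder.enable_eager_driver_handoff();
    }
    builder.build()
}
```

Keeping the predicate separate from the builder also makes it easy to log the decision at startup, so each benchmark run can be matched to whether the handoff was actually on.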

Risk note

  • tokio_unstable is explicitly opt-in; the API may change or be removed in minor releases. A sister change from the same release window, sharded spawn_blocking (#7757), was added in 1.52.0 and reverted in 1.52.1 due to a hang regression (#8056). Treat this as an experiment.
  • No user-facing behavior change beyond runtime scheduling; no config or wire-protocol changes.

Do not merge before #229.

dmitrivasilyev added 2 commits April 30, 2026 16:16
A worker holding the I/O / time driver while polling a long task can stall
socket readiness on other workers. Tokio 1.52 added an opt-in eager handoff
on the multi-threaded runtime that wakes a previously parked worker before
polling and lets it own the driver. We turn it on under tokio_unstable to
see whether p99/p999 latency improves under heavy concurrency.

Pure experiment branched off feature/3.7.0-rust-tokio-edition. The decision
to keep, tune, or revert depends on the Ubicloud benchmark vs. the 3.7.0
baseline.

Both pkg/rpm/pg-doorman.spec and debian/rules regenerate .cargo/config.toml
to point at vendored sources before cargo build, which silently overwrites
the rustflags entry committed for this experiment. As a result, COPR and
Launchpad builds failed to find Builder::enable_eager_driver_handoff.

Re-deliver the cfg via RUSTFLAGS in both build scripts so the unstable API
is available regardless of how the project's .cargo/config.toml is rewritten.
Local cargo builds, the Dockerfile and the Ubicloud bench are unaffected;
they pick up the shipped .cargo/config.toml as-is.
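Why re-delivering the flag through RUSTFLAGS is robust follows from Cargo's documented precedence for extra compiler flags: CARGO_ENCODED_RUSTFLAGS, then RUSTFLAGS, then target-specific rustflags from config, then build.rustflags, with the first source found used on its own rather than merged with the others. The function below is a small model of that rule, not Cargo's code; effective_rustflags is a hypothetical name, and all target-specific config entries are collapsed into one argument.

```rust
use std::collections::HashMap;

/// Model of Cargo's rule for extra rustc flags: the first source that is
/// present wins outright; sources are never merged.
fn effective_rustflags(
    env: &HashMap<&str, &str>,
    target_rustflags: Option<Vec<String>>,
    build_rustflags: Option<Vec<String>>,
) -> Vec<String> {
    // 1. CARGO_ENCODED_RUSTFLAGS: flags separated by the ASCII unit separator.
    if let Some(encoded) = env.get("CARGO_ENCODED_RUSTFLAGS") {
        return encoded
            .split('\u{1f}')
            .filter(|f| !f.is_empty())
            .map(String::from)
            .collect();
    }
    // 2. RUSTFLAGS: whitespace-separated; what the spec/rules now export.
    if let Some(flags) = env.get("RUSTFLAGS") {
        return flags.split_whitespace().map(String::from).collect();
    }
    // 3./4. Config: target-specific rustflags, else build.rustflags
    // (the entry the packaging scripts end up overwriting).
    target_rustflags.or(build_rustflags).unwrap_or_default()
}
```

A consequence worth keeping in mind: because RUSTFLAGS replaces rather than extends the config value, any rustflags that the packaging scripts' generated .cargo/config.toml might set would be ignored in those builds.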