io_uring improvements #167

Merged: ioquatix merged 2 commits into socketry:main from tavianator:io-uring-improvements on May 12, 2026

@tavianator
Contributor

Some io_uring changes, feel free to take some or all of these.

I tried also adding IORING_SETUP_DEFER_TASKRUN but it didn't pass the test suite, I assume because with SINGLE_ISSUER we're waiting with ppoll() instead of io_uring_enter() (and thus some task_work never runs).

  • Handle short io_uring submissions
  • Try setting up io_uring with IORING_SETUP_SUBMIT_ALL
  • Use IORING_SETUP_SINGLE_ISSUER if possible

Types of Changes

  • Performance improvement.
  • Maintenance.

Contribution

@samuel-williams-shopify
Contributor

samuel-williams-shopify commented May 12, 2026

Took a look at this against current main (which now has #166 merged — the eventfd wakeup + SINGLE_ISSUER + DEFER_TASKRUN + TASKRUN_FLAG work). Reviewing the three commits:

b57476a — Handle short io_uring submissions → take

Genuine, important bug fix that's independent of #166. The old submit_flush assumes io_uring_submit is all-or-nothing and resets pending = 0 on any non-negative return, but io_uring_submit returns the number of submitted SQEs and can be short. Without SUBMIT_ALL an SQE prep error aborts the rest of the batch; even with it, ENOMEM / transient EAGAIN can shorten it. The fix (move the increment into io_get_sqe, loop decrementing by the actual submitted count) is the right approach.
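
The accounting described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `submit_fn` is a hypothetical stand-in for `io_uring_submit()` so the example runs without liburing, and `selector_sketch`, `flush_pending`, `fake_ring`, and `fake_submit` are names invented here. Returning `-EAGAIN` when a call makes no progress is a choice of this sketch, made so that it cannot spin.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for io_uring_submit(): returns the number of SQEs
 * the kernel consumed, or a negative errno value. */
typedef int (*submit_fn)(void *ring);

struct selector_sketch {
    void *ring;
    size_t pending; /* SQEs prepared but not yet submitted */
};

/* Keep submitting until nothing is pending, subtracting only what was
 * actually submitted. Returns 0 on success or a negative errno value. */
int flush_pending(struct selector_sketch *selector, submit_fn submit)
{
    while (selector->pending > 0) {
        int result = submit(selector->ring);

        if (result < 0) return result;    /* hard error: caller decides */
        if (result == 0) return -EAGAIN;  /* no progress: do not spin (sketch choice) */

        if ((size_t)result >= selector->pending) {
            selector->pending = 0;
        } else {
            selector->pending -= (size_t)result; /* short submit: loop again */
        }
    }

    return 0;
}

/* Fake ring that accepts at most `batch` SQEs per call, to imitate short submits. */
struct fake_ring {
    int queued;
    int batch;
    int calls;
};

int fake_submit(void *opaque)
{
    struct fake_ring *ring = opaque;
    int count = ring->queued < ring->batch ? ring->queued : ring->batch;

    ring->queued -= count;
    ring->calls += 1;

    return count;
}
```

With five queued SQEs and a fake ring that takes two per call, `flush_pending` needs three calls before `pending` reaches zero. Code that reset `pending` to 0 after the first call would leave three SQEs stranded.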

7a468e7: IORING_SETUP_SUBMIT_ALL → take

Strictly additive. The EINVAL retry-without-flag pattern is also nice: we can lift the same shape later for the DEFER_TASKRUN → COOP_TASKRUN fallback on kernels 5.19–6.0.
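
For readers unfamiliar with the retry-without-flag shape, here is a small sketch of it. It assumes a setup call that returns 0 or a negative errno value and rejects unknown flags with `-EINVAL`. `setup_fn` is a hypothetical stand-in for the ring setup call, `SKETCH_SUBMIT_ALL` is an illustrative bit rather than the real header constant, and `fake_kernel` and `fake_setup` exist only to make the example runnable without kernel support.

```c
#include <errno.h>

/* Hypothetical stand-in for the ring setup call: returns 0 or a negative errno. */
typedef int (*setup_fn)(unsigned entries, unsigned flags, void *context);

/* Illustrative flag bit; real code would use IORING_SETUP_SUBMIT_ALL from the kernel headers. */
#define SKETCH_SUBMIT_ALL (1u << 7)

/* Try required | optional flags first. If the kernel rejects the combination
 * with EINVAL, retry with the required flags only. */
int setup_with_optional_flags(setup_fn setup, unsigned entries,
                              unsigned required, unsigned optional,
                              void *context, unsigned *used_flags)
{
    unsigned flags = required | optional;
    int result = setup(entries, flags, context);

    if (result == -EINVAL && optional != 0) {
        flags = required;
        result = setup(entries, flags, context);
    }

    if (result == 0 && used_flags) *used_flags = flags;

    return result;
}

/* Fake kernel that only understands the flags in `supported`. */
struct fake_kernel {
    unsigned supported;
    int calls;
};

int fake_setup(unsigned entries, unsigned flags, void *context)
{
    struct fake_kernel *kernel = context;
    (void)entries;

    kernel->calls += 1;

    return (flags & ~kernel->supported) ? -EINVAL : 0;
}
```

On a fake kernel that does not know the optional flag, the helper makes two setup calls and reports that no optional flag is in use. On one that does, it succeeds on the first call.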

96f899c: SINGLE_ISSUER + register_eventfd + ppoll wait → skip (#166 supersedes)

Solves the same problem as #166 but with register_eventfd + ppoll(). Your own commit message notes the limitation: "I tried also adding IORING_SETUP_DEFER_TASKRUN but it didn't pass the test suite, I assume because with SINGLE_ISSUER we're waiting with ppoll() instead of io_uring_enter() (and thus some task_work never runs)." — exactly right. #166 instead submits an async read on the interrupt fd via the ring and waits with io_uring_wait_cqe_timeout (which is io_uring_enter(GETEVENTS)), so DEFER_TASKRUN works.

Interaction with #166

Rebasing commits 1 and 2 onto main is mostly clean, but there's a one-line interaction bug: #166 added a manual selector->pending += 1; in select_internal_without_gvl (because the old io_get_sqe didn't increment). After b57476a moves the increment into io_get_sqe, that manual line becomes a double-count. The duplicated pending keeps io_uring_submit_flush busy-looping (submit returns 0 because the SQ is empty after wait_cqe_timeout drained it internally, but pending > 0 from the double-count). Caught it on the first benchmark — full hang.
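
A toy model of the double-count may make the hang easier to see. It is not the selector code: `accounting_sketch` and the `sketch_*` functions are invented for illustration, and the submit stand-in simply drains the whole queue.

```c
#include <stdbool.h>

struct accounting_sketch {
    unsigned queued;  /* SQEs sitting in the submission queue */
    unsigned pending; /* what the selector believes is still unsubmitted */
};

/* New model: acquiring an SQE is the only place where pending grows. */
void sketch_get_sqe(struct accounting_sketch *state)
{
    state->queued += 1;
    state->pending += 1;
}

/* Stand-in for a submit that drains the queue and reports how many SQEs it took. */
unsigned sketch_submit(struct accounting_sketch *state)
{
    unsigned count = state->queued;

    state->queued = 0;
    state->pending -= (count < state->pending) ? count : state->pending;

    return count;
}

/* True when a flush loop would spin: nothing left to submit, yet pending > 0. */
bool sketch_would_spin(const struct accounting_sketch *state)
{
    return state->queued == 0 && state->pending > 0;
}
```

If a caller also runs its own `pending += 1` after `sketch_get_sqe`, one submit drains the queue but leaves `pending == 1`. Every later submit returns 0, which is the busy loop described above.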

I've rebased the branch onto current main, dropped commit 3, and folded the one-line fix into commit 1 (since that's the commit that introduced the new accounting model). Force-pushed via maintainer-edit.

Benchmark on hana (Linux 6.19.11, Ruby 3.4.9, dedicated)

| Benchmark | main | with #167 rebased |
|---|---|---|
| select(0) empty, tight loop (200k iter, n=5) | ~56 ns/call | ~56 ns/call |
| Cross-thread wakeup roundtrip (n=12) | 11.53 µs ± 1.09 | 10.57 µs ± 0.76 |
| Idle wakeup (n=12) | 1.55 µs ± 0.12 | 1.58 µs ± 0.16 |

No regression on existing benchmarks (roundtrip is within the run-to-run noise envelope I've seen for #166). The wins from these commits are correctness under error conditions — short submits, ENOMEM, SQE prep errors — which the microbenchmarks don't exercise.

samuel-williams-shopify force-pushed the io-uring-improvements branch 2 times, most recently from 3db30c7 to b06e10c on May 12, 2026 05:23
tavianator and others added 2 commits May 12, 2026 14:29

Handle short io_uring submissions

io_uring does not necessarily submit all the pending SQEs when you call
io_uring_submit().  In particular, in the default configuration, an
error while processing an SQE will cause the rest of the batch to be
aborted.  Even with IORING_SETUP_SUBMIT_ALL, some errors (like ENOMEM)
will still lead to a short submit.

Fix the accounting to keep selector->pending equal to the number of
unsubmitted SQEs at all times, so that the entire queue can be flushed
reliably.

Co-authored-by: Cursor <cursoragent@cursor.com>

Try setting up io_uring with IORING_SETUP_SUBMIT_ALL

This decreases the likelihood of a short submit.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ioquatix ioquatix merged commit a0c57a1 into socketry:main May 12, 2026
27 of 30 checks passed
@ioquatix
Member

Thanks for your contributions!

@samuel-williams-shopify samuel-williams-shopify added this to the v1.16.0 milestone May 12, 2026
@tavianator tavianator deleted the io-uring-improvements branch May 12, 2026 13:04