Skip to content

io_uring: add write_buffer_high_watermark for io_uring sockets#44666

Open
aburan28 wants to merge 4 commits into
envoyproxy:mainfrom
aburan28:aburan28/io-uring-write-watermark
Open

io_uring: add write_buffer_high_watermark for io_uring sockets#44666
aburan28 wants to merge 4 commits into
envoyproxy:mainfrom
aburan28:aburan28/io-uring-write-watermark

Conversation

@aburan28
Copy link
Copy Markdown

@aburan28 aburan28 commented Apr 27, 2026

Commit Message:
Expose IoUringOptions.write_buffer_high_watermark to bound the size of
IoUringServerSocket::write_buf_.

Without a cap, the upper layer always sees writes as fully accepted (because
the io_uring socket is async and stages bytes internally), so connection-level
back-pressure — and overload protections that depend on it, e.g. HTTP flood
protection — never engages.

When the new field is set to a non-zero value, IoUringServerSocket::write
returns a short write once write_buf_ reaches the configured size, and
IoUringSocketHandleImpl::write reports the actual bytes moved (rather than
the original buffer length). The upper layer's connection-level write buffer
holds the remainder, where the existing watermark mechanism applies. After
the in-flight write completes and the buffer drops below the threshold, an
injected Write completion is delivered so the upper layer retries.

The default of 0 disables the cap and preserves the previous (uncapped)
behavior. Resolves the existing TODO in IoUringServerSocket referencing the
IntegrationTest.TestFloodUpstreamErrors timeout.

Additional Description:
This is a draft PR opened for review and discussion. The new field is opt-in
(default 0) so existing deployments are unaffected.

AI usage disclosure: Portions of the code and/or PR description were drafted
with the assistance of Claude (Anthropic). I reviewed and understand all
submitted code.

Risk Level: Low — opt-in via a new config field; default-0 path preserves the
previous (uncapped) behavior. Behavior change only kicks in when the field is
set to a non-zero value.

Testing:
Added unit tests (IoUringWorkerImplTest.WriteBufferHighWatermark and
WriteSliceBufferHighWatermark) for both write overloads, exercising the
cap, the short-write reporting via the socket handle, and the back-pressure
release (injected Write completion) when the in-flight write completes.

Docs Changes:
New field is documented inline in
api/envoy/extensions/network/socket_interface/v3/default_socket_interface.proto.

Release Notes:
Added a new_features entry in changelogs/current.yaml under io_uring.

Platform Specific Features:
io_uring (Linux >= 5.11). No platform support change beyond the existing
io_uring build gating.

Runtime guard: N/A. The new field defaults to 0, which is the previous
behavior; the config field itself is the gate.

API Considerations:
New optional field on IoUringOptions
(api/envoy/extensions/network/socket_interface/v3/default_socket_interface.proto).
Default of 0 preserves prior behavior. No removed fields, no breaking
changes.

Expose ``IoUringOptions.write_buffer_high_watermark`` to bound the size of
``IoUringServerSocket::write_buf_``. Without a cap, the upper layer always
sees writes as fully accepted (because the io_uring socket is async and
stages bytes internally), so connection-level back-pressure -- and overload
protections that depend on it, e.g. HTTP flood protection -- never engages.

When the new field is set to a non-zero value, ``IoUringServerSocket::write``
returns a short write once ``write_buf_`` reaches the configured size, and
``IoUringSocketHandleImpl::write`` reports the actual bytes moved (rather
than the original buffer length). The upper layer's connection-level write
buffer holds the remainder, where the existing watermark mechanism applies.
After the in-flight write completes and the buffer drops below the
threshold, an injected Write completion is delivered so the upper layer
retries.

The default of 0 disables the cap and preserves the previous (uncapped)
behavior. Resolves the existing TODO in ``IoUringServerSocket`` referencing
the ``IntegrationTest.TestFloodUpstreamErrors`` timeout.

Risk Level: Low
Testing: Added unit tests for the buffer- and slice-based ``write``
overloads, including the back-pressure release on completion.
Docs Changes: New field documented inline in the proto.
Release Notes: Added.
Platform Specific Features: io_uring (Linux >=5.11)

Signed-off-by: Adam Buran <aburan28@gmail.com>
@aburan28 aburan28 had a problem deploying to external-contributors April 27, 2026 00:31 — with GitHub Actions Error
@repokitteh-read-only
Copy link
Copy Markdown

Hi @aburan28, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #44666 was opened by aburan28.

see: more, trace.

@repokitteh-read-only
Copy link
Copy Markdown

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #44666 was opened by aburan28.

see: more, trace.

@aburan28 aburan28 marked this pull request as ready for review April 27, 2026 01:29
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #44666 was ready_for_review by aburan28.

see: more, trace.

@ravenblackx
Copy link
Copy Markdown
Contributor

@aburan can you please merge main to resolve the conflict? (We recognize merge conflicts on this file are a pain and are working on making it less annoying, but until that's done, this annoyance is an unfortunate part of getting a PR merged.)

…rite-watermark

# Conflicts:
#	changelogs/current.yaml
@aburan28 aburan28 had a problem deploying to external-contributors May 4, 2026 22:50 — with GitHub Actions Error
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Left a high-level question, as I'm not very familiar with io_uring.

// asynchronously. If the remote stops reading, the io_uring write operation may never complete.
// The operation is canceled and the socket is closed after the timeout. The default is 1000.
google.protobuf.UInt32Value write_timeout_ms = 4;
// High watermark in bytes for the io_uring socket's internal write buffer. Because io_uring
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level question:
Is this actually a bug in the way io_uring is used in Envoy?
Would it make more sense to have the same watermark threshold values that are used for the rest of Envoy's flow-control mechanism (in other words, why should io_uring be considered something different than the non-io_uring socket)?

cc @KBaichoo who may have more context.

@kyessenov
Copy link
Copy Markdown
Contributor

Needs main merge.
/wait

aburan28 added 2 commits May 14, 2026 22:23
Signed-off-by: Adam Buran <aburan28@gmail.com>
Signed-off-by: Adam Buran <aburan28@gmail.com>

# Conflicts:
#	test/common/io/io_uring_worker_impl_test.cc
@aburan28 aburan28 requested a deployment to external-contributors May 15, 2026 05:29 — with GitHub Actions Waiting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants