env/posix: improve io_uring init failure diagnostics in ReadAsync #14736
krhancoc wants to merge 1 commit
When io_uring_queue_init fails, ReadAsync previously returned a single
generic IOStatus::NotSupported("ReadAsync: failed to init io_uring")
regardless of why the ring could not be created. ENOMEM (memlock quota
exhausted) and ENOSYS (kernel has no io_uring) have very different
remediation paths, yet both produced identical, opaque error messages.
Changes:
1. CreateIOUring (env/io_posix.h)
   - Add optional int* err_out parameter so callers receive the raw
     positive errno from io_uring_queue_init.
   - Log to stderr (was stdout) so the message is captured by log
     pipelines that suppress stdout.
   - Emit a dedicated message for ENOMEM that names RLIMIT_MEMLOCK as
     the likely cause and suggests raising it.
2. ReadAsync (env/io_posix.cc)
   - Capture the errno via the new err_out parameter.
   - Map errno to a precise IOStatus:
       ENOMEM          -> IOError (resource exhaustion; io_uring IS
                          supported, resources are just depleted)
       ENOSYS / EPERM  -> NotSupported (kernel/permissions barrier)
       other non-zero  -> IOError with the strerror description
       zero (TLS null) -> NotSupported (io_uring disabled at file-open)
   - Using IOError rather than NotSupported for ENOMEM is intentional:
     callers that treat NotSupported as a silent "retry sync" signal
     would otherwise hide the resource-exhaustion problem from operators.
3. MultiRead (env/io_posix.cc)
   - Update CreateIOUring() call to pass &init_err so the improved
     ENOMEM log line fires when MultiRead's ring init fails (no
     behavioral change; MultiRead already falls back to serialized
     reads on init failure).
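
The errno-to-status mapping described for ReadAsync can be sketched roughly as below. This is an illustration only, not the code in the PR: `InitStatusKind`, `InitStatus`, and `MapIOUringInitError` are hypothetical names standing in for RocksDB's `IOStatus`, and the message strings are assumptions.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical stand-in for the two IOStatus codes discussed above.
enum class InitStatusKind { kNotSupported, kIOError };

struct InitStatus {
  InitStatusKind kind;
  std::string message;
};

// Maps the positive errno captured from ring creation to a status.
// A value of 0 models the case where no errno was recorded because
// io_uring was disabled when the file was opened.
inline InitStatus MapIOUringInitError(int init_err) {
  const std::string prefix = "ReadAsync: failed to init io_uring";
  switch (init_err) {
    case 0:
      return {InitStatusKind::kNotSupported, prefix + " (io_uring disabled)"};
    case ENOMEM:
      // io_uring is supported; resources are depleted, so surface an error
      // instead of a status that callers may treat as "retry synchronously".
      return {InitStatusKind::kIOError,
              prefix + ": out of memory, check RLIMIT_MEMLOCK"};
    case ENOSYS:
    case EPERM:
      // Kernel or permissions barrier.
      return {InitStatusKind::kNotSupported,
              prefix + ": " + std::strerror(init_err)};
    default:
      return {InitStatusKind::kIOError,
              prefix + ": " + std::strerror(init_err)};
  }
}
```

Keeping the mapping in one small function makes the ENOMEM special case easy to spot and easy to unit-test with plain errno values.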
This is motivated by a flaky CI failure (Not implemented: ReadAsync in
AnnIndexSPANNTest) where ENOMEM was the root cause but the error message
gave no indication of RLIMIT_MEMLOCK being relevant.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
When io_uring_queue_init fails with ENOMEM (memlock limit exhausted, which is common on systems running many concurrent threads), CreateIOUring returns nullptr and ReadAsync propagates a generic IOStatus::NotSupported("ReadAsync") with no indication of why the ring creation failed. This makes the failure opaque and hard to diagnose.
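
When the memlock limit is the suspect, one way to check it from inside a process is the POSIX `getrlimit` call. The helper below is a minimal sketch and is not part of this PR; `PrintMemlockLimit` is a hypothetical name.

```cpp
#include <sys/resource.h>

#include <cstdio>

// Prints the soft and hard RLIMIT_MEMLOCK values to stderr.
// Returns false if getrlimit itself fails.
inline bool PrintMemlockLimit() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
    std::perror("getrlimit(RLIMIT_MEMLOCK)");
    return false;
  }
  auto print = [](const char* name, rlim_t value) {
    if (value == RLIM_INFINITY) {
      std::fprintf(stderr, "%s: unlimited\n", name);
    } else {
      std::fprintf(stderr, "%s: %llu bytes\n", name,
                   static_cast<unsigned long long>(value));
    }
  };
  print("RLIMIT_MEMLOCK soft", lim.rlim_cur);
  print("RLIMIT_MEMLOCK hard", lim.rlim_max);
  return true;
}
```

A small soft limit on a host that runs many threads, each creating its own ring, would be consistent with the ENOMEM failure described above.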
Root Cause Chain
Fix
CreateIOUring now logs the failure to stderr (with a memlock-specific hint when errno is ENOMEM) and, if the caller passes a pointer, fills it with the raw errno so the caller can respond appropriately.
with no argument (default nullptr), relying on the internal stderr logging.
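
The shape of the optional out-parameter can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual CreateIOUring: `FakeRing`, `RingInitFn`, and `CreateRingSketch` are hypothetical names, and the init callback stands in for io_uring_queue_init (which returns 0 on success or a negative errno) so the sketch builds without liburing. The log wording is also an assumption.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <functional>
#include <memory>

// Hypothetical placeholder for struct io_uring.
struct FakeRing {};

// Stand-in for io_uring_queue_init: returns 0 on success, -errno on failure.
using RingInitFn = std::function<int(unsigned entries, FakeRing* ring)>;

// Creates a ring, logs failures to stderr, and optionally reports the
// positive errno through err_out (0 on success).
inline std::unique_ptr<FakeRing> CreateRingSketch(unsigned entries,
                                                  const RingInitFn& init,
                                                  int* err_out = nullptr) {
  auto ring = std::make_unique<FakeRing>();
  const int ret = init(entries, ring.get());
  const int err = ret < 0 ? -ret : ret;  // normalize to a positive errno
  if (err_out != nullptr) {
    *err_out = err;
  }
  if (err == 0) {
    return ring;
  }
  if (err == ENOMEM) {
    std::fprintf(stderr,
                 "io_uring init failed: %s (RLIMIT_MEMLOCK may be too low; "
                 "consider raising it)\n",
                 std::strerror(err));
  } else {
    std::fprintf(stderr, "io_uring init failed: %s\n", std::strerror(err));
  }
  return nullptr;
}
```

Because `err_out` defaults to `nullptr`, existing call sites that pass no argument keep compiling and still get the stderr log line, while callers that need to distinguish failure causes can pass a pointer.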