Skip to content

librdmacm: extend rsocket for Redis, iperf3, memcached and more Linux APIs#1717

Merged
rleon merged 13 commits intolinux-rdma:masterfrom
BatshevaBlack:rsocket_upstream_no_epoll
Apr 15, 2026
Merged

librdmacm: extend rsocket for Redis, iperf3, memcached and more Linux APIs#1717
rleon merged 13 commits intolinux-rdma:masterfrom
BatshevaBlack:rsocket_upstream_no_epoll

Conversation

@BatshevaBlack
Copy link
Copy Markdown
Contributor

Extend the rsocket implementation in librdmacm so that applications such as Redis, iperf3, and memcached can use rsocket transparently via LD_PRELOAD (librspreload), and so rsocket aligns with more standard Linux socket and I/O behavior.

Motivation
The rsocket library did not fully support several POSIX/Linux interfaces (epoll, select, accept4, sendfile, fcntl64, and various socket options). Applications that rely on these either failed or fell back to TCP. This change extends the rsocket implementation to implement or fix those interfaces, so the preload can intercept them and route traffic over RDMA.

Changes
fix rpoll timeout handling and select,
fix cm_svc_run
add accept4, dup, fcntl64, sendfile64;
fix rfcntl
extend getsockopt/setsockopt;
fix SOCK_STREAM/SOCK_DGRAM handling and connect service/TCP behavior;
adjust wake-up timeout from rpoll.

Copy link
Copy Markdown
Contributor

@shefty shefty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good. A few minor comments, though one asks to revert a change request I made on the prior review. :/ My bad.

Comment thread librdmacm/preload.c Outdated
Comment thread librdmacm/preload.c Outdated
Comment thread librdmacm/preload.c Outdated
@BatshevaBlack BatshevaBlack force-pushed the rsocket_upstream_no_epoll branch from 77bbfe7 to 3317462 Compare March 17, 2026 21:40
@BatshevaBlack
Copy link
Copy Markdown
Contributor Author

@shefty
I resolved all comments
thanks!

Copy link
Copy Markdown
Contributor

@shefty shefty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my comments have been addressed. Thanks!

@BatshevaBlack
Copy link
Copy Markdown
Contributor Author

@shefty thanks!
Will you merge it? or what is the next step?

@rleon
Copy link
Copy Markdown
Member

rleon commented Mar 19, 2026

@shefty thanks! Will you merge it? or what is the next step?

I will just perform some sanity review and will merge.

Thanks

Comment thread librdmacm/CMakeLists.txt Outdated
Comment thread librdmacm/CMakeLists.txt Outdated
Comment thread librdmacm/preload.c Outdated
Comment thread librdmacm/preload.c Outdated
void *optval, socklen_t *optlen);
int (*fcntl)(int socket, int cmd, ... /* arg */);
#if (!defined(_FILE_OFFSET_BITS) || _FILE_OFFSET_BITS != 64) && \
(!defined(RDMA_PRELOAD_HAVE_64) || RDMA_PRELOAD_HAVE_64)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this extra "#if ..."?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need both: non-LFS vs LFS (_FILE_OFFSET_BITS) and whether libc has fcntl64/sendfile64. I’ll combine those in one CMake flag and use a single #if in preload.c so we don’t duplicate the logic in many places.

Comment thread librdmacm/preload.c
rfcntl(fd, cmd) : real.fcntl64(fd, cmd);
break;
case F_DUPFD:
/*case F_DUPFD_CLOEXEC:*/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is legacy from the original fcntl commit, there’s no documented reason.

Comment thread librdmacm/rsocket.c
*optlen = sizeof(int);
break;
case TCP_INFO:
//TODO: support other tcp_info fields.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We implemented TCP_INFO incrementally by demand, not as a full TCP stack mirror—that’s why the TODO calls out other fields.

@BatshevaBlack BatshevaBlack force-pushed the rsocket_upstream_no_epoll branch 7 times, most recently from 3317462 to 1b81b9d Compare March 22, 2026 07:56
in case of timeout which causes poll to return, clear all signals that
arrived by calling rs_poll_exit.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
Without the user_fds mapping, select() could set bits for internal fds
instead of the user fds the application passed in, so the wrong (or no)
sockets were reported as ready.

Keep the list of the fds that are sent to poll in order to know
which fd belongs to each rfd when returning the revents to the
fds list.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
The accept4 implementation extends accept to support the additional
atomic flag-setting functionality provided by accept4.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
Add preload interception for fcntl64 so rsocket file descriptors
support the same flag semantics as the glibc fcntl64 API.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
getsockopt: TCP_INFO, TCP_CONGESTION, SO_BROADCAST & IP_TOS.
setsockopt: IP_TOS & TCP_CONGESTION.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
rfcntl keeps the files flags all in the fd_flags argument.
Adding the new field fs_flags to the rs struct allows the
fcntl function to keep the file status flags separately
from the file descriptor flags.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
Add preload interception for sendfile64 so applications using the
64-bit offset sendfile64 API work correctly with rsocket file descriptors.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
Add preload interception for dup so that duplicating an rsocket file
descriptor produces another rsocket fd that refers to the same connection.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
Previously we only added it when rconnect() returned EINPROGRESS.
Now also add when connect succeeds so the progress thread can
drive state and handle disconnects.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
The changes to rpoll to use a signaling fd to wake up blocked threads,
combined with suspending polling while rsockets states may be changing
_should_ prevent any threads from blocking indefinitely in rpoll()
when a desired state change occurs.

We periodically wake up any polling thread, so that it can recheck its
rsocket states. The sleeping interval was set to an arbitrary value of
5 seconds, this interval is too long for apps that request a connection
and are dependent on the thread waking up, so it's changed now to 0.5
seconds, but can be overridden using config files.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
Updated type checks to identify socket types even when additional flags
are present in the type field. Changed the comparison to use bitwise AND
for more accurate detection.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
@BatshevaBlack BatshevaBlack force-pushed the rsocket_upstream_no_epoll branch from 1b81b9d to 945db75 Compare March 30, 2026 06:46
@jgunthorpe
Copy link
Copy Markdown
Member

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@BatshevaBlack BatshevaBlack force-pushed the rsocket_upstream_no_epoll branch from 945db75 to 87c7aaa Compare April 13, 2026 10:39
        The cmake detection test for fcntl64/sendfile64 was clearing
        CMAKE_REQUIRED_DEFINITIONS without adding -D_GNU_SOURCE, causing
        the symbols to be invisible in glibc headers and the test to fail.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
        The check_c_source_compiles test inherited -Wredundant-decls -Werror
        from CMAKE_C_FLAGS, causing it to wrongly conclude fcntl64/sendfile64
        are not in headers. Remove the fragile compile test and infer
        IN_HEADER directly from HAVE_LFS_WRAPPER_SYMS.

Signed-off-by: Batsheva Black <bblack@nvidia.com>
@rleon rleon merged commit 5e7ab6b into linux-rdma:master Apr 15, 2026
14 checks passed
@BatshevaBlack BatshevaBlack deleted the rsocket_upstream_no_epoll branch April 15, 2026 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants