librdmacm: extend rsocket for Redis, iperf3, memcached and more Linux APIs#1717
Conversation
shefty
left a comment
There was a problem hiding this comment.
Overall, looks good. A few minor comments, though one asks to revert a change request I made on the prior review. :/ My bad.
77bbfe7 to
3317462
Compare
|
@shefty |
shefty
left a comment
There was a problem hiding this comment.
All my comments have been addressed. Thanks!
|
@shefty thanks! |
I will just perform some sanity review and will merge. Thanks |
| void *optval, socklen_t *optlen); | ||
| int (*fcntl)(int socket, int cmd, ... /* arg */); | ||
| #if (!defined(_FILE_OFFSET_BITS) || _FILE_OFFSET_BITS != 64) && \ | ||
| (!defined(RDMA_PRELOAD_HAVE_64) || RDMA_PRELOAD_HAVE_64) |
There was a problem hiding this comment.
why do we need this extra "#if ..."?
There was a problem hiding this comment.
We need both: non-LFS vs LFS (_FILE_OFFSET_BITS) and whether libc has fcntl64/sendfile64. I’ll combine those in one CMake flag and use a single #if in preload.c so we don’t duplicate the logic in many places.
| rfcntl(fd, cmd) : real.fcntl64(fd, cmd); | ||
| break; | ||
| case F_DUPFD: | ||
| /*case F_DUPFD_CLOEXEC:*/ |
There was a problem hiding this comment.
This is legacy from the original fcntl commit, there’s no documented reason.
| *optlen = sizeof(int); | ||
| break; | ||
| case TCP_INFO: | ||
| //TODO: support other tcp_info fields. |
There was a problem hiding this comment.
We implemented TCP_INFO incrementally by demand, not as a full TCP stack mirror—that’s why the TODO calls out other fields.
3317462 to
1b81b9d
Compare
in case of timeout which causes poll to return, clear all signals that arrived by calling rs_poll_exit. Signed-off-by: Batsheva Black <bblack@nvidia.com>
Without the user_fds mapping, select() could set bits for internal fds instead of the user fds the application passed in, so the wrong (or no) sockets were reported as ready. Keep the list of the fds that are sent to poll in order to know which fd belongs to each rfd when returning the revents to the fds list. Signed-off-by: Batsheva Black <bblack@nvidia.com>
The accept4 implementation extends accept to support the additional atomic flag-setting functionality provided by accept4. Signed-off-by: Batsheva Black <bblack@nvidia.com>
Add preload interception for fcntl64 so rsocket file descriptors support the same flag semantics as the glibc fcntl64 API. Signed-off-by: Batsheva Black <bblack@nvidia.com>
getsockopt: TCP_INFO, TCP_CONGESTION, SO_BROADCAST & IP_TOS. setsockopt: IP_TOS & TCP_CONGESTION. Signed-off-by: Batsheva Black <bblack@nvidia.com>
rfcntl keeps the files flags all in the fd_flags argument. Adding the new field fs_flags to the rs struct allows the fcntl function to keep the file status flags separately from the file descriptor flags. Signed-off-by: Batsheva Black <bblack@nvidia.com>
Add preload interception for sendfile64 so applications using the 64-bit offset sendfile64 API work correctly with rsocket file descriptors. Signed-off-by: Batsheva Black <bblack@nvidia.com>
Add preload interception for dup so that duplicating an rsocket file descriptor produces another rsocket fd that refers to the same connection. Signed-off-by: Batsheva Black <bblack@nvidia.com>
Previously we only added it when rconnect() returned EINPROGRESS. Now also add when connect succeeds so the progress thread can drive state and handle disconnects. Signed-off-by: Batsheva Black <bblack@nvidia.com>
The changes to rpoll to use a signaling fd to wake up blocked threads, combined with suspending polling while rsockets states may be changing _should_ prevent any threads from blocking indefinitely in rpoll() when a desired state change occurs. We periodically wake up any polling thread, so that it can recheck its rsocket states. The sleeping interval was set to an arbitrary value of 5 seconds, this interval is too long for apps that request a connection and are dependent on the thread waking up, so it's changed now to 0.5 seconds, but can be overridden using config files. Signed-off-by: Batsheva Black <bblack@nvidia.com>
Updated type checks to identify socket types even when additional flags are present in the type field. Changed the comparison to use bitwise AND for more accurate detection. Signed-off-by: Batsheva Black <bblack@nvidia.com>
1b81b9d to
945db75
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
945db75 to
87c7aaa
Compare
The cmake detection test for fcntl64/sendfile64 was clearing
CMAKE_REQUIRED_DEFINITIONS without adding -D_GNU_SOURCE, causing
the symbols to be invisible in glibc headers and the test to fail.
Signed-off-by: Batsheva Black <bblack@nvidia.com>
The check_c_source_compiles test inherited -Wredundant-decls -Werror
from CMAKE_C_FLAGS, causing it to wrongly conclude fcntl64/sendfile64
are not in headers. Remove the fragile compile test and infer
IN_HEADER directly from HAVE_LFS_WRAPPER_SYMS.
Signed-off-by: Batsheva Black <bblack@nvidia.com>
Extend the rsocket implementation in librdmacm so that applications such as Redis, iperf3, and memcached can use rsocket transparently via LD_PRELOAD (librspreload), and so rsocket aligns with more standard Linux socket and I/O behavior.
Motivation
The rsocket library did not fully support several POSIX/Linux interfaces (epoll, select, accept4, sendfile, fcntl64, and various socket options). Applications that rely on these either failed or fell back to TCP. This change extends the rsocket implementation to implement or fix those interfaces, so the preload can intercept them and route traffic over RDMA.
Changes
fix rpoll timeout handling and select,
fix cm_svc_run
add accept4, dup, fcntl64, sendfile64;
fix rfcntl
extend getsockopt/setsockopt;
fix SOCK_STREAM/SOCK_DGRAM handling and connect service/TCP behavior;
adjust wake-up timeout from rpoll.