prov/rxm: add pluggable multi-QP routing with round-robin selector#12234
Open
imasuari wants to merge 4 commits into
ooststep (Contributor) reviewed on May 20, 2026 and left a comment:
The underlying transport is mostly transparent to rxm, which deals in endpoint types (msg_ep) rather than sockets, QPs, etc. In that respect, this is more a prioritized multi-ep/multi-rail scheme than something specific to multi-QP, though I understand multi-QP to be the primary intent.
struct rxm_conn;

enum rxm_op_type {
Contributor
We already have rxm_ctrl_* to mark each message in the pkt. Can those not be used instead of adding another enum and function argument to track the same information?
Introduce a pluggable rxm_ep_selector interface that maps each TX operation to a msg_ep index. The selector receives the op type (EAGER, SAR_FIRST/MIDDLE/LAST, RNDV_CTRL/RMA, RMA, ATOMIC) and the SAR msg_id, allowing future policies to steer traffic across multiple underlying msg endpoints.

Wire the selector and a msg_eps[] array into rxm_conn, replacing the single scalar msg_ep alias. All TX paths (eager/tagged sends, SAR, RNDV control and payload, pass-through RMA, atomics) are updated to call rxm_conn_msg_ep() instead of accessing msg_ep directly.

The initial implementation ships rxm_selector_single_ep, which always returns index 0 and preserves current single-msg_ep behaviour.

Signed-off-by: Itai Masuari <imasuari@habana.ai>
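The selector indirection described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: every `sketch_*` name, the `void *` endpoint stand-in, and the out-of-range guard are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not the PR's definitions. */
enum sketch_op_type {
	SKETCH_OP_EAGER,
	SKETCH_OP_SAR_FIRST,
	SKETCH_OP_SAR_MIDDLE,
	SKETCH_OP_SAR_LAST,
	SKETCH_OP_RNDV_CTRL,
	SKETCH_OP_RNDV_RMA,
	SKETCH_OP_RMA,
	SKETCH_OP_ATOMIC,
};

struct sketch_selector {
	/* Map one TX operation to an index into the connection's msg_eps[]. */
	size_t (*select)(struct sketch_selector *sel,
			 enum sketch_op_type op, uint64_t msg_id);
};

#define SKETCH_MAX_MSG_EPS 4

struct sketch_conn {
	void *msg_eps[SKETCH_MAX_MSG_EPS]; /* stand-in for real endpoint handles */
	size_t num_msg_eps;
	struct sketch_selector *selector;
};

/* Stateless policy: every operation goes to index 0. */
static size_t sketch_single_ep_select(struct sketch_selector *sel,
				      enum sketch_op_type op, uint64_t msg_id)
{
	(void) sel;
	(void) op;
	(void) msg_id;
	return 0;
}

static struct sketch_selector sketch_selector_single_ep = {
	.select = sketch_single_ep_select,
};

/* TX paths ask the connection for an endpoint instead of reading a field. */
static void *sketch_conn_msg_ep(struct sketch_conn *conn,
				enum sketch_op_type op, uint64_t msg_id)
{
	size_t idx = conn->selector->select(conn->selector, op, msg_id);

	/* Defensive guard (an assumption of this sketch): fall back to slot 0. */
	if (idx >= conn->num_msg_eps)
		idx = 0;
	return conn->msg_eps[idx];
}
```

With the single-ep policy installed, every call to `sketch_conn_msg_ep()` returns slot 0 regardless of op type, which is how the existing single-endpoint behaviour would be preserved.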
Implement a round-robin msg_ep selector policy and SAR segment pinning.

RR policy routes RMA and rendezvous-RMA ops round-robin across msg_ep[1..N-1] while keeping control-plane traffic (eager, tagged, RNDV_CTRL, atomics) pinned to msg_ep[0]. Falls back to msg_ep[0] when num_msg_eps == 1.

SAR pinning ensures in-order delivery: SAR_FIRST stays on msg_ep[0]; the first SAR_MIDDLE picks a data msg_ep via RR and records the choice in an index_map keyed by msg_id; subsequent MIDDLEs inherit the pin; SAR_LAST clears the entry.

A destroy hook is added to the selector vtable so stateful selectors (like the RR selector with its sar_pins map) can own their cleanup.

Signed-off-by: Itai Masuari <imasuari@habana.ai>
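The round-robin policy and SAR pinning described above might look roughly like the sketch below. It is written under stated assumptions: all `sketch_*` names are hypothetical, a small linked list stands in for the index_map, SAR_LAST is assumed to go out on the pinned endpoint before the pin is cleared (the commit message only says the entry is cleared), and allocation failure is reported as -1 for the caller to handle.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical op types mirroring the list in the commit message. */
enum sketch_op {
	SKETCH_OP_EAGER, SKETCH_OP_SAR_FIRST, SKETCH_OP_SAR_MIDDLE,
	SKETCH_OP_SAR_LAST, SKETCH_OP_RNDV_CTRL, SKETCH_OP_RNDV_RMA,
	SKETCH_OP_RMA, SKETCH_OP_ATOMIC,
};

/* One pinned SAR message; a linked list stands in for the index_map. */
struct sketch_pin {
	uint64_t msg_id;
	int idx;
	struct sketch_pin *next;
};

struct sketch_rr {
	int num_msg_eps;
	int next_data;           /* offset into the data range msg_ep[1..N-1] */
	struct sketch_pin *pins;
};

/* Return the next data index in 1..N-1 and advance the cursor. */
static int sketch_rr_next(struct sketch_rr *rr)
{
	int idx = 1 + rr->next_data;

	rr->next_data = (rr->next_data + 1) % (rr->num_msg_eps - 1);
	return idx;
}

/* Return the link that points at the pin for msg_id, or the tail link. */
static struct sketch_pin **sketch_pin_find(struct sketch_rr *rr, uint64_t msg_id)
{
	struct sketch_pin **pp;

	for (pp = &rr->pins; *pp; pp = &(*pp)->next) {
		if ((*pp)->msg_id == msg_id)
			break;
	}
	return pp;
}

/* Returns a msg_ep index, or -1 if a pin could not be allocated. */
static int sketch_rr_select(struct sketch_rr *rr, enum sketch_op op,
			    uint64_t msg_id)
{
	struct sketch_pin **pp, *pin;
	int idx;

	if (rr->num_msg_eps <= 1)
		return 0;                      /* single-ep fallback */

	switch (op) {
	case SKETCH_OP_RMA:
	case SKETCH_OP_RNDV_RMA:
		return sketch_rr_next(rr);     /* data plane: round-robin */
	case SKETCH_OP_SAR_MIDDLE:
		pp = sketch_pin_find(rr, msg_id);
		if (*pp)
			return (*pp)->idx;     /* inherit the existing pin */
		pin = malloc(sizeof(*pin));
		if (!pin)
			return -1;
		pin->msg_id = msg_id;
		pin->idx = sketch_rr_next(rr); /* first MIDDLE picks the ep */
		pin->next = NULL;
		*pp = pin;                     /* append at the tail */
		return pin->idx;
	case SKETCH_OP_SAR_LAST:
		pp = sketch_pin_find(rr, msg_id);
		if (!*pp)
			return 0;              /* no MIDDLE was ever pinned */
		pin = *pp;
		idx = pin->idx;
		*pp = pin->next;               /* clear the entry */
		free(pin);
		return idx;
	default:
		return 0;                      /* control plane stays on msg_ep[0] */
	}
}

/* Destroy hook: a stateful selector frees its own pins. */
static void sketch_rr_destroy(struct sketch_rr *rr)
{
	struct sketch_pin *pin;

	while (rr->pins) {
		pin = rr->pins;
		rr->pins = pin->next;
		free(pin);
	}
}
```

Keying the pin on msg_id is what lets several SAR transfers interleave on one connection while each transfer's middle segments stay on a single endpoint.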
Add runtime configuration for multi-msg_ep routing via the FI_OFI_RXM_NUM_MSG_EPS environment variable (default 1, clamped to [1, 255]).

When num_msg_eps == 1, use the stateless single-ep selector to avoid an unnecessary heap allocation and SAR-pin map. When num_msg_eps > 1, allocate the round-robin selector.

Currently all msg_eps[] slots alias the single real msg endpoint so wire behaviour is unchanged; the selector still picks among indices, paving the way for real multi-msg_ep support (e.g. multi-QP over verbs).

Multi-ep (num_msg_eps > 1) is restricted to the verbs provider, where each msg_ep maps to a distinct QP; other providers clamp to 1 with an info-level log.

Signed-off-by: Itai Masuari <imasuari@habana.ai>
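The default, clamping, and verbs-only rules described above can be illustrated with a small standalone sketch. It reads the variable with plain `getenv`/`strtol` rather than libfabric's parameter API, and the helper names, the `is_verbs` flag, and the choice to treat unparsable input as the default are all assumptions of the example.

```c
#include <stdlib.h>

#define SKETCH_MIN_MSG_EPS 1
#define SKETCH_MAX_MSG_EPS 255

/*
 * Hypothetical helper: turn the raw variable value into an endpoint count.
 * Unset, empty, or unparsable input yields the default of 1.
 */
static int sketch_parse_num_msg_eps(const char *value, int is_verbs)
{
	char *end;
	long n;

	if (!value || !*value)
		return SKETCH_MIN_MSG_EPS;

	n = strtol(value, &end, 10);
	if (end == value || *end != '\0')
		return SKETCH_MIN_MSG_EPS;

	/* Clamp to [1, 255]; strtol saturates on overflow, so this covers it. */
	if (n < SKETCH_MIN_MSG_EPS)
		n = SKETCH_MIN_MSG_EPS;
	if (n > SKETCH_MAX_MSG_EPS)
		n = SKETCH_MAX_MSG_EPS;

	/* Multi-ep is restricted to verbs; other providers clamp to 1. */
	if (n > 1 && !is_verbs)
		n = 1;

	return (int) n;
}

/* Read the variable named in the commit message from the environment. */
static int sketch_num_msg_eps_from_env(int is_verbs)
{
	return sketch_parse_num_msg_eps(getenv("FI_OFI_RXM_NUM_MSG_EPS"),
					is_verbs);
}
```

A result of 1 would select the stateless single-ep policy, and anything larger would allocate the round-robin selector.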
Log conn, op type, msg_id, and selected msg_ep index at FI_LOG_EP_DATA level to aid tracing multi-ep path selection.

Signed-off-by: Itai Masuari <imasuari@habana.ai>
de7b1e8 to 4146705
Summary
Design
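The selector is a simple vtable (select + destroy callbacks) attached to each rxm_conn. Two implementations are provided:
- rxm_selector_single_ep: always returns index 0, preserving the current single-msg_ep behaviour; used when num_msg_eps == 1.
- A round-robin selector: keeps control-plane traffic on msg_ep[0], spreads RMA and rendezvous-RMA ops across msg_ep[1..N-1], and pins SAR segments by msg_id; used when num_msg_eps > 1.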
Wire behaviour is unchanged in this series: all msg_eps[] slots point to the same underlying endpoint. This lays the groundwork for real multi-QP connections in a follow-up series.