fabtests/efa: Add multi-QP support for efa_rma_bw by shijin-aws · Pull Request #12298 · ofiwg/libfabric

shijin-aws · 2026-05-28T00:51:19Z

Add support for multiple endpoints (-q / --num-eps) with per-EP flow control and FI_MORE doorbell batching. Each EP independently tracks its posted/completed counts via an efa_rma_bw_ctx embedded in the operation context, with pool slots partitioned per-EP to prevent cross-EP reuse. The benchmark loop follows the perftest pattern: iterating all EPs each pass to avoid starvation, with per-EP window checks gating both posting and FI_MORE. The writedata rx path uses fi_recvmsg with per-EP tracking when FI_RX_CQ_DATA is set.

Add support for multiple endpoints (-q / --num-eps) with per-EP flow control and FI_MORE doorbell batching. Each EP independently tracks its posted/completed counts via an efa_rma_bw_ctx embedded in the operation context, with pool slots partitioned per-EP to prevent cross-EP reuse. The benchmark loop follows the perftest pattern: iterating all EPs each pass to avoid starvation, with per-EP window checks gating both posting and FI_MORE. The writedata rx path uses fi_recvmsg with per-EP tracking when FI_RX_CQ_DATA is set. Signed-off-by: Shi Jin <sjina@amazon.com>

sunkuamzn · 2026-05-28T22:44:27Z

+ *   Each posted operation carries an efa_rma_bw_ctx containing the EP index.
+ *   On completion, the CQ entry's op_context is used (via container_of) to
+ *   recover the efa_rma_bw_ctx and attribute the completion to the correct EP.
+ *   The ep_idx field is placed before fi_context2 in the struct to prevent


The provider shouldn't clobber any memory beyond fi_context2 anyway

Lame comments generated by AI, I will fix it

sunkuamzn · 2026-05-28T22:46:48Z

+ *
+ *   For the writedata receiver (unsolicited write-with-imm without posted
+ *   receives, i.e. !FI_RX_CQ_DATA), completions have no user context, so
+ *   per_ep_completed is passed as NULL and only global counting is performed.


Is the benchmark actually useful in this case? With only a global counter, some QPs can be idle because other QPs are waiting for completions. Should the test just fail when multiple EPs are used without FI_RX_CQ_DATA?

That completion counter is only used as credits for each EP to repost rx buffer, when unsolicited write recv is used, such credit doesn't exist by nature

sunkuamzn · 2026-05-28T22:48:23Z

 * Options:
 *   --high-pps        Enable FI_EFA_WR_HIGH_PPS flag on writes.
 *   -o write|writedata|read  Select RMA operation (default: write).
+ *   --post-list <n>   Batch n posts per doorbell using FI_MORE (default: 1).


Should the default be min(window_size, tx_queue_depth)?

good catch, will do

sunkuamzn · 2026-05-28T22:55:05Z

+	per_ep_posted[ep_idx]++;
+
+	ret = fi_recvmsg(eps[ep_idx], &msg, rx_flags);
+	if (ret) {


No retry on EAGAIN?

will do. We have a large while loop until completed_cnt reaching the total cnt, so as long as we don't return early in the function for EAGAIN we finally retry

sunkuamzn · 2026-05-28T22:59:28Z

+
+# Testing the batch mode of fi_efa_rma_bw (--post-list) which batch multiple WQEs with FI_MORE
+@pytest.mark.pr_ci
+@pytest.mark.fabric(params=["efa", "efa-direct"])


Since you don't retry on EAGAIN, this test won't work well for the EFA protocol path. You could either set the homogeneous peers option or fail the test for the protocol path.

It should retry on EAGAIN, I just missed the handling on fi_recvmsg, will add that

post_rma also does not retry on EAGAIN https://github.com/shijin-aws/libfabric/blob/d164f90ddea65ab012cea46ce5ab479faaa2f603/fabtests/prov/efa/src/efa_rma_bw.c#L134-L147

If you do want to retry on EAGAIN, then it's probably better to re-use the FT_POST macro

ret = post_rma(eps[ep_idx], remote_addrs[ep_idx], buf, opts.transfer_size, remote, &tx_ctx_pool[slot].context, flags); if (ret == -FI_EAGAIN) { per_ep_posted[ep_idx]--; break; }

This finally get retried in the outmost while

while (posted_cnt < total_posts || completed_cnt < total_posts) {

I intend to use such semantic to keep post and poll separated in the while loop while both finally get progressed

shijin-aws requested a review from a team May 28, 2026 00:51

shijin-aws force-pushed the pps-dev branch from 1ed3905 to d164f90 Compare May 28, 2026 20:06

sunkuamzn reviewed May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fabtests/efa: Add multi-QP support for efa_rma_bw#12298

fabtests/efa: Add multi-QP support for efa_rma_bw#12298
shijin-aws wants to merge 1 commit into
ofiwg:mainfrom
shijin-aws:pps-dev

shijin-aws commented May 28, 2026

Uh oh!

sunkuamzn May 28, 2026

Uh oh!

shijin-aws May 28, 2026

Uh oh!

sunkuamzn May 28, 2026

Uh oh!

shijin-aws May 28, 2026 •

edited

Loading

Uh oh!

sunkuamzn May 28, 2026

Uh oh!

shijin-aws May 29, 2026

Uh oh!

sunkuamzn May 28, 2026

Uh oh!

shijin-aws May 28, 2026

Uh oh!

sunkuamzn May 28, 2026

Uh oh!

shijin-aws May 28, 2026

Uh oh!

sunkuamzn May 29, 2026

Uh oh!

shijin-aws May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shijin-aws commented May 28, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shijin-aws May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shijin-aws May 28, 2026 •

edited

Loading