prov/shm: Revert inject protocol to use receiver side resources by zachdworkin · Pull Request #12313 · ofiwg/libfabric

zachdworkin · 2026-05-30T00:26:41Z

Convert the parallel inject stack into an smr_freestack inject pool. Senders now try to allocate an inject buffer from the receiver's pool, either to copy their own data into (write) or for the receiver to copy data into on the sender's behalf (read).
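As a rough illustration of the write path described above, here is a minimal sketch of a receiver-owned free-stack pool that a sender allocates from. All names (`sketch_pool`, `sketch_sender_inject`, the slot count, and the buffer size) are hypothetical and are not taken from the provider; the spinlock stands in for whatever lock the real pool uses, and the pool lives in ordinary memory rather than a shared region.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SLOTS 4        /* hypothetical pool depth */
#define SKETCH_INJECT_SIZE 64 /* hypothetical inject buffer size */

/* Hypothetical receiver-owned pool: a LIFO stack of free slot indices. */
struct sketch_pool {
    atomic_flag lock;
    int top;                               /* number of free slots */
    int free_idx[SKETCH_SLOTS];
    char buf[SKETCH_SLOTS][SKETCH_INJECT_SIZE];
};

static void sketch_pool_init(struct sketch_pool *p)
{
    atomic_flag_clear(&p->lock);
    p->top = SKETCH_SLOTS;
    for (int i = 0; i < SKETCH_SLOTS; i++)
        p->free_idx[i] = i;
}

static void sketch_lock(struct sketch_pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ; /* spin until the lock is released */
}

static void sketch_unlock(struct sketch_pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Pop a free slot; returns its index, or -1 if the pool is exhausted. */
static int sketch_pool_alloc(struct sketch_pool *p)
{
    int idx = -1;

    sketch_lock(p);
    if (p->top > 0)
        idx = p->free_idx[--p->top];
    sketch_unlock(p);
    return idx;
}

/* Push a slot back once the receiver has consumed it. */
static void sketch_pool_free(struct sketch_pool *p, int idx)
{
    sketch_lock(p);
    p->free_idx[p->top++] = idx;
    sketch_unlock(p);
}

/* Sender-side write: take a slot from the receiver's pool and copy into it.
 * Returns the slot index, or -1 if the message is too large or no slot is free. */
static int sketch_sender_inject(struct sketch_pool *rx_pool,
                                const void *data, size_t len)
{
    int idx;

    if (len > SKETCH_INJECT_SIZE)
        return -1;
    idx = sketch_pool_alloc(rx_pool);
    if (idx < 0)
        return -1;
    memcpy(rx_pool->buf[idx], data, len);
    return idx;
}
```

In this sketch the sender is done as soon as the copy finishes; the receiver later reads the slot and calls `sketch_pool_free`.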

Align smr_flags to a uint8 and rename op_flags to smr_flags.

Remove "format_inject" because its steps need to be spread out across the "do_inject" function. We need to try to get an inject buffer first, because that is the most expensive grab and has the highest chance of failing. It does not make sense to keep "format_inject", since allocating the inject buffer and copying into it were the main things the format function would have done.
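The ordering argument can be sketched as follows. This is a hedged illustration, not the provider's code: `sketch_do_inject`, `sketch_try_get_inject_buf`, the one-slot pool, and the `sketch_cmd` fields are all hypothetical, and `-EAGAIN` from `<errno.h>` is used as an assumed "retry later" result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_INJECT_SIZE 64 /* hypothetical inject buffer size */

enum sketch_proto { SKETCH_PROTO_NONE, SKETCH_PROTO_INJECT };

/* Hypothetical command with only the fields this sketch needs. */
struct sketch_cmd {
    enum sketch_proto proto;
    size_t size;
    char *inject_buf;
};

/* Hypothetical stand-in for the receiver's pool: a single buffer. */
struct sketch_one_slot_pool {
    int in_use;
    char buf[SKETCH_INJECT_SIZE];
};

static char *sketch_try_get_inject_buf(struct sketch_one_slot_pool *p)
{
    if (p->in_use)
        return NULL;
    p->in_use = 1;
    return p->buf;
}

static int sketch_do_inject(struct sketch_one_slot_pool *rx_pool,
                            struct sketch_cmd *cmd,
                            const void *data, size_t len)
{
    char *buf;

    if (len > SKETCH_INJECT_SIZE)
        return -EINVAL;

    /* Step 1: the grab most likely to fail goes first, so a failure
     * costs nothing else and leaves cmd untouched. */
    buf = sketch_try_get_inject_buf(rx_pool);
    if (!buf)
        return -EAGAIN;

    /* Steps 2 and 3: what a separate format function would have done. */
    memcpy(buf, data, len);
    cmd->inject_buf = buf;
    cmd->size = len;
    cmd->proto = SKETCH_PROTO_INJECT;
    return 0;
}
```

Because the buffer grab comes first, a caller that sees the retry result has no partially formatted command to undo.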

Place the lighter protocol fields in the same cache line as the atomic-queue cmd_entry. This way, if we are running on a CPU without prefetching algorithms (which would fetch the adjacent cache line for us), access to the fields needed by the lightweight/fast protocols is still optimized. The heavier/slower protocols, which use the fields in the second cache line, are unaffected by this change on older CPUs since they already need to access both cache lines. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com> Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
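A minimal sketch of this kind of layout, assuming a 64-byte cache line. The struct and its field names (`sketch_cmd`, `entry`, `rma_addr`, and so on) are illustrative only and do not reproduce the provider's actual command layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache-line size */

struct sketch_cmd {
    /* First cache line: queue entry plus the fields the fast paths read. */
    alignas(SKETCH_CACHE_LINE) uintptr_t entry;
    uint64_t size;
    uint64_t tag;
    uint8_t proto;
    uint8_t smr_flags;

    /* Second cache line: fields only the heavier protocols touch. */
    alignas(SKETCH_CACHE_LINE) uint64_t rma_addr;
    uint64_t rma_key;
};

/* Compile-time checks that the layout matches the intent. */
static_assert(offsetof(struct sketch_cmd, smr_flags) < SKETCH_CACHE_LINE,
              "fast-path fields must share the first cache line");
static_assert(offsetof(struct sketch_cmd, rma_addr) == SKETCH_CACHE_LINE,
              "slow-path fields must start on the second cache line");

/* Which cache line (0-based, relative to the struct) holds this offset? */
static size_t sketch_cache_line_of(size_t offset)
{
    return offset / SKETCH_CACHE_LINE;
}
```

With this arrangement, a fast-path reader that has already pulled in the queue entry touches no further cache line.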

Replace hdr.status with hdr.smr_flags to indicate any error. Errors are signaled with the SMR_OP_ERROR flag, which the sender processes when the cmd is returned. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com> Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
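A small sketch of signaling an error through a bit in a one-byte flags field instead of a separate status field. The bit positions, the second flag, and the helper names are assumptions made for illustration; only the `smr_flags` field name and the idea of an error flag come from the description above, and which side sets the flag is likewise assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; the real values are not given in the source. */
#define SKETCH_OP_ERROR ((uint8_t)(1u << 0))
#define SKETCH_OP_OTHER ((uint8_t)(1u << 1))

struct sketch_hdr {
    uint8_t smr_flags; /* one byte carries all per-command flags */
};

/* Assumed receiver side: record a failure without disturbing other flags. */
static void sketch_mark_error(struct sketch_hdr *hdr)
{
    hdr->smr_flags = (uint8_t)(hdr->smr_flags | SKETCH_OP_ERROR);
}

/* Sender side: check the returned command for an error. */
static int sketch_has_error(const struct sketch_hdr *hdr)
{
    return (hdr->smr_flags & SKETCH_OP_ERROR) != 0;
}
```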

The command stack is less likely to be used in the inject protocol when resources are on the receiver side. If the inject pool is placed above it, accessing the pool requires a smaller jump and does not have to jump over the command stack. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>

SAR should never be handling 0 byte copies anymore since the inject protocol can handle delivery complete. Instead we will check and WARN the user if we accidentally do a 0-byte copy in SAR. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>
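A sketch of the "check and WARN" behavior, under the assumption that a 0-byte request is treated as a no-op after the warning. `sketch_sar_copy` and the warning counter are hypothetical, and `fprintf` stands in for the provider's own logging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter so callers and tests can observe the warning. */
static unsigned sketch_sar_zero_warnings;

/* Copy one SAR segment; returns the number of bytes copied. */
static size_t sketch_sar_copy(void *dst, const void *src, size_t len)
{
    if (len == 0) {
        /* Not expected to happen: 0-byte messages should go through
         * the inject protocol instead. */
        fprintf(stderr, "WARN: unexpected 0-byte copy in SAR path\n");
        sketch_sar_zero_warnings++;
        return 0;
    }
    memcpy(dst, src, len);
    return len;
}
```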

Reuse the cmd entry field as the command pointer. The entry and the ptr are never in use at the same time, so the field can be shared. This allows casting the cmd to an entry and dereferencing the cmd to get the ptr. It lets us maximize header field caching and increases the maximum inline payload by 8 bytes. Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
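One way to picture the field reuse is a union placed at offset 0 of the command. This is a sketch under that assumption; the type and helper names are hypothetical, and the source does not say whether the real code uses a union or plain casts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_cmd;

/* The two uses are mutually exclusive, so they can share storage. */
union sketch_shared_hdr {
    uintptr_t entry;        /* meaningful while the cmd sits in the queue */
    struct sketch_cmd *ptr; /* meaningful when the cmd refers to another cmd */
};

/* For comparison only: the layout with two separate fields. */
struct sketch_separate_hdr {
    uintptr_t entry;
    struct sketch_cmd *ptr;
};

struct sketch_cmd {
    union sketch_shared_hdr hdr; /* must stay the first member */
    uint8_t payload[56];         /* hypothetical inline payload */
};

static_assert(offsetof(struct sketch_cmd, hdr) == 0,
              "shared field must be first so the casts below are valid");

/* Cast the cmd to its queue entry. */
static uintptr_t *sketch_cmd_entry(struct sketch_cmd *cmd)
{
    return (uintptr_t *)cmd;
}

/* Dereference the cmd to get the stored command pointer. */
static struct sketch_cmd *sketch_cmd_ptr(struct sketch_cmd *cmd)
{
    return *(struct sketch_cmd **)cmd;
}
```

Sharing the storage saves one pointer-sized field compared with `sketch_separate_hdr`, which is 8 bytes on a typical 64-bit target and matches the payload gain mentioned above.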

Push the cmd back to the stack on error. Only atomic needs this; msg and rma already push it back properly. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>

Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>

Remove the parallel command-inject resources and revert to using a lock-unlock inject buffer pool. Update the inject protocol to use the old method.

There is a performance regression when using the "new shm" command-inject parallel data structure, because the sender cannot complete its transmission until the receiver returns the sender's command to the sender's return queue. In the old lock-unlock method, the sender allocates receiver-side resources, copies its data into the receiver's inject buffer, and then completes. The old method lets MPIs and applications assume that their inject message transmissions will complete quickly; since the new method does not complete as quickly, this is likely the reason for the regression.

Remove the smr_format_inject function. We need to try to get a tx_buf as early as possible and retry if we run out. Since we always need a tx_buf, we avoid the ofi_buf_alloc call in the case where there are no more inject buffers. With this code pulled out, format_inject would only copy into the tx_buf and set the proto to inject, which does not need to be its own function. Instead, we can place those operations in more optimal locations inside the do_inject function.

This also reverts to the "old-shm" method of buffering all unexpected inject messages. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com> Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>

zachdworkin and others added 9 commits May 28, 2026 13:09

prov/shm: Remove 0-byte copy SAR

4dfe427

SAR should never be handling 0 byte copies anymore since the inject protocol can handle delivery complete. Instead we will check and WARN the user if we accidentally do a 0-byte copy in SAR. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>

prov/shm: Push cmd back to stack on error

0e9bfbb

Only atomic needs this, msg and rma properly push it back. Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>

prov/shm: Do not check for total_len < inject_size on rma_fast

06e2d53

Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>

prov/shm: Do not take inject for op_read_req

b576fa6

Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>

zachdworkin added the "⚠️ Do not merge" and "prov/shm" labels May 30, 2026

zachdworkin wants to merge 9 commits into ofiwg:main from zachdworkin:lock2
