[Store] Implement dual rdma forward path #2092

Open
Shichang-Zhang wants to merge 14 commits into kvcache-ai:P2P-Mooncake-Store from Shichang-Zhang:dual-rdma-p2p-forward-direction

Conversation

@Shichang-Zhang
Contributor

Description

This PR implements the forward transfer control path and RPC surface for Mooncake Store.

When forward mode is enabled, writes use PreWrite → RDMA write → WriteCommit and reads use PinKey → RDMA read → UnPinKey. The default reverse path (WriteRemoteData / ReadRemoteData) is unchanged unless callers explicitly select forward.
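The call order on the forward path can be sketched with an in-memory stand-in for the remote peer. Everything below is an assumption for illustration: `MockPeer`, `ForwardWrite`, and `ForwardRead` are hypothetical names, the method signatures are not taken from this PR, and a plain copy stands in for the one-sided RDMA operation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the remote peer. None of this is
// Mooncake's real API; only the call order mirrors the description.
struct MockPeer {
    std::unordered_map<std::string, std::vector<char>> pending;    // staged, not yet visible
    std::unordered_map<std::string, std::vector<char>> committed;  // visible to readers
    std::unordered_map<std::string, int> pins;                     // outstanding pins per key

    // PreWrite: reserve a staging buffer and hand back its address.
    char* PreWrite(const std::string& key, std::size_t size) {
        auto& buf = pending[key];
        buf.assign(size, 0);
        return buf.data();
    }
    // WriteCommit: publish the staged buffer under the key.
    bool WriteCommit(const std::string& key) {
        auto it = pending.find(key);
        if (it == pending.end()) return false;
        committed[key] = std::move(it->second);
        pending.erase(it);
        return true;
    }
    // PinKey: keep the buffer alive while the caller reads it.
    const std::vector<char>* PinKey(const std::string& key) {
        auto it = committed.find(key);
        if (it == committed.end()) return nullptr;
        ++pins[key];
        return &it->second;
    }
    // UnPinKey: drop one pin; fails if the key is not pinned.
    bool UnPinKey(const std::string& key) {
        auto it = pins.find(key);
        if (it == pins.end() || it->second == 0) return false;
        --it->second;
        return true;
    }
};

// Forward write: PreWrite -> (RDMA write) -> WriteCommit.
bool ForwardWrite(MockPeer& peer, const std::string& key, const std::string& value) {
    char* remote = peer.PreWrite(key, value.size());
    std::copy(value.begin(), value.end(), remote);  // stand-in for the RDMA write
    return peer.WriteCommit(key);
}

// Forward read: PinKey -> (RDMA read) -> UnPinKey.
std::optional<std::string> ForwardRead(MockPeer& peer, const std::string& key) {
    const std::vector<char>* remote = peer.PinKey(key);
    if (remote == nullptr) return std::nullopt;
    std::string out(remote->begin(), remote->end());  // stand-in for the RDMA read
    peer.UnPinKey(key);
    return out;
}
```

In this sketch the data only becomes readable after `WriteCommit`, and a read holds a pin for exactly the duration of the copy, which is the property the three-step sequences are meant to give.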

Use ReadConfigExt/WriteConfigExt (formerly ReadRouteConfig/WriteRouteConfig) to configure the transfer path direction.

Module

  • Transfer Engine (mooncake-transfer-engine)
  • Mooncake Store (mooncake-store)
  • Mooncake EP (mooncake-ep)
  • Integration (mooncake-integration)
  • P2P Store (mooncake-p2p-store)
  • Python Wheel (mooncake-wheel)
  • PyTorch Backend (mooncake-pg)
  • Mooncake RL (mooncake-rl)
  • CI/CD
  • Docs
  • Other

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Breaking change
  • Documentation update
  • Other

How Has This Been Tested?

  • mooncake-store/tests/peer_client_test.cpp: PreWrite / WriteCommit / WriteRevoke, PinKey / UnPinKey (including refcount: double pin same token, double unpin, new token after full unpin), ReadRemoteData / WriteRemoteData param and error paths, not-connected RPC_FAIL cases.
  • mooncake-store/tests/client_rpc_service_test.cpp: ReadRemoteData (including INTERNAL_ERROR when TE transfer is not available in fixture), PinKey / UnPinKey, existing write-path tests.
  • mooncake-store/tests/p2p_client_integration_test.cpp: forward scenarios as implemented (adjust this line to match what you actually ran).
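The refcount cases listed for PinKey / UnPinKey can be read as the following behaviour. This is only a sketch of one plausible interpretation: the `PinTable` class is hypothetical, and the assumed semantics (a second pin reuses the live token, an unpin past zero fails, and a pin after a full unpin issues a fresh token) are inferred from the test names rather than confirmed by the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical pin table; names and semantics are assumptions, not the
// DataManager implementation from this PR.
class PinTable {
public:
    // Pins the key and returns its token. Pinning a key that is already
    // pinned reuses the live token and increments the refcount.
    std::uint64_t Pin(const std::string& key) {
        auto [it, inserted] = entries_.try_emplace(key, Entry{0, 0});
        if (inserted) it->second.token = next_token_++;
        ++it->second.refcount;
        return it->second.token;
    }

    // Drops one pin. Returns false if the key is not pinned or the token
    // is stale; the entry is erased when the refcount reaches zero.
    bool UnPin(const std::string& key, std::uint64_t token) {
        auto it = entries_.find(key);
        if (it == entries_.end() || it->second.token != token) return false;
        if (--it->second.refcount == 0) entries_.erase(it);
        return true;
    }

    // Current refcount for the key, or 0 if it is not pinned.
    int RefCount(const std::string& key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? 0 : it->second.refcount;
    }

private:
    struct Entry {
        std::uint64_t token;
        int refcount;
    };
    std::unordered_map<std::string, Entry> entries_;
    std::uint64_t next_token_ = 1;
};
```

Under these assumptions, two pins followed by two unpins leave the key unpinned, a third unpin with the old token is rejected, and the next pin returns a token different from the first.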

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a forward RDMA data plane path for the Mooncake store, complementing the existing reverse path. Key changes include the introduction of a three-phase control plane (PreWrite/Pin, Transfer, and Commit/UnPin) and a lease-based resource management system within the DataManager to handle pending writes and pinned keys. Client interfaces and Python bindings have been extended to allow users to specify the RDMA direction mode. Feedback identifies a significant limitation where the forward RDMA path currently requires contiguous buffers, recommending improved documentation, more descriptive error codes, and future support for non-contiguous transfers.
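As a rough illustration of what lease-based tracking of pending writes and pinned keys could look like, the sketch below grants each entry an expiry time and reclaims entries whose lease has lapsed. The `LeaseTable` class, its method names, and the time-to-live policy are all assumptions made for this example; the PR's DataManager may handle leases differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Hypothetical lease table; not the DataManager from this PR.
// Time is passed in explicitly so the behaviour is deterministic.
class LeaseTable {
public:
    // Grants (or renews) a lease that expires at now + ttl.
    void Grant(const std::string& key, Clock::time_point now,
               std::chrono::milliseconds ttl) {
        expiry_[key] = now + ttl;
    }

    // Releases a lease, as a commit or unpin would. Returns false if the
    // lease is missing or had already expired by `now`.
    bool Release(const std::string& key, Clock::time_point now) {
        auto it = expiry_.find(key);
        if (it == expiry_.end()) return false;
        bool live = now < it->second;
        expiry_.erase(it);
        return live;
    }

    // Drops every lease that has expired by `now`; returns how many.
    std::size_t ReapExpired(Clock::time_point now) {
        std::size_t reaped = 0;
        for (auto it = expiry_.begin(); it != expiry_.end();) {
            if (now >= it->second) {
                it = expiry_.erase(it);
                ++reaped;
            } else {
                ++it;
            }
        }
        return reaped;
    }

    std::size_t Size() const { return expiry_.size(); }

private:
    std::unordered_map<std::string, Clock::time_point> expiry_;
};
```

The point of a lease in this sketch is that a client which crashes between the first and last phase cannot hold a staging buffer or a pin forever: a periodic `ReapExpired` call bounds how long the resource stays reserved.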

Comment thread mooncake-store/src/p2p_client_service.cpp
@Shichang-Zhang Shichang-Zhang force-pushed the dual-rdma-p2p-forward-direction branch from 3978a85 to 0871e70 Compare May 13, 2026 11:45
@Shichang-Zhang Shichang-Zhang force-pushed the dual-rdma-p2p-forward-direction branch from 116cc61 to 92f0829 Compare May 15, 2026 09:30