Move Ascend P2P by keeping outbound P2P socket operations on the dedicated Ascend P2P event loop. #234

Merged: matthewygf merged 6 commits into LMCache:main from matthewygf:loop_ownership on May 19, 2026

Conversation

@matthewygf (Collaborator) commented May 14, 2026

Changes

  • Route all P2P lookup, connection, pull, and signal operations through the P2P loop instead of the storage manager loop, so they are not blocked by storage manager operations and replies do not sit ready while the wrong loop is delayed.
  • Ensure the Done signal uses the backend's async ZMQ context instead of creating sockets on a different loop.
  • Serialize async HCCL peer handshakes to avoid concurrent setup.
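
A minimal sketch of what serializing peer handshakes can look like, assuming one `asyncio.Lock` per peer. `HandshakeRegistry`, `ensure_connected`, and `_do_handshake` are hypothetical names for illustration, not the identifiers used in this PR, and the sleep stands in for the real HCCL setup.

```python
import asyncio
from collections import defaultdict


class HandshakeRegistry:
    """Hypothetical helper: serializes handshakes per peer id."""

    def __init__(self) -> None:
        # One lock per peer, created lazily on first access.
        self._locks = defaultdict(asyncio.Lock)
        self._connected = set()
        self.handshake_count = 0

    async def _do_handshake(self, peer_id: str) -> None:
        # Stand-in for the real (slow) peer setup.
        self.handshake_count += 1
        await asyncio.sleep(0.01)

    async def ensure_connected(self, peer_id: str) -> None:
        # Concurrent callers for the same peer queue up on the lock;
        # only the first one performs the handshake.
        async with self._locks[peer_id]:
            if peer_id in self._connected:
                return
            await self._do_handshake(peer_id)
            self._connected.add(peer_id)


async def demo() -> int:
    registry = HandshakeRegistry()
    # Five concurrent requests for the same peer.
    await asyncio.gather(
        *(registry.ensure_connected("peer-a") for _ in range(5))
    )
    return registry.handshake_count
```

Keying the lock by peer keeps handshakes to different peers concurrent while still preventing two setups against the same peer from interleaving.
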
flowchart LR
  subgraph ownership["Socket / I/O ownership"]
    A["req_socket"] --> B["must run on worker.loop"]
    C["P2P peer + transfer ZMQ"] --> D["must run on AscendP2PBackend.loop"]
    E["prefetch orchestration"] --> F["runs on StorageManager.loop"]
  end
sequenceDiagram
  participant SL as StorageManager.loop
  participant BE as Backend coroutine<br/>(e.g. AscendP2PBackend)
  participant PL as AscendP2PBackend.loop
  participant WL as worker.loop

  SL->>BE: await batched_async_contains / batched_get_non_blocking
  alt Ascend P2P needs P2P-only I/O
    BE->>PL: run_coroutine_threadsafe(_run_on_p2p_loop coro)
    PL-->>BE: result via Future → wrap_future
  end
  opt Needs controller via worker req_socket
    BE->>WL: async_put_and_wait_msg → run_coroutine_threadsafe(...)
    WL-->>BE: result via wrap_future + wait_for
  end
  BE-->>SL: return to prefetch task
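
The hand-off in the sequence diagram can be sketched with standard `asyncio` calls only. This is an illustration under assumptions, not the code from this PR: `DedicatedLoop`, `run_on_loop`, and `socket_op` are hypothetical names, and the socket operation is replaced by a coroutine that reports which thread ran it.

```python
import asyncio
import threading


class DedicatedLoop:
    """Hypothetical stand-in for a backend that owns its own loop thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self.loop.run_forever, daemon=True
        )
        self._thread.start()
        self.thread_id = self._thread.ident

    async def run_on_loop(self, coro, timeout=None):
        """Run `coro` on self.loop and await the result from the caller's loop."""
        if asyncio.get_running_loop() is self.loop:
            # Already on the owning loop: no marshaling needed.
            return await coro
        # Schedule on the owning loop from this thread, then bridge the
        # concurrent.futures.Future back into the caller's loop.
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return await asyncio.wait_for(asyncio.wrap_future(future), timeout)

    def close(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def socket_op() -> int:
    # Stand-in for socket I/O that must stay on the owning loop.
    await asyncio.sleep(0)
    return threading.get_ident()


async def caller(backend: DedicatedLoop):
    ran_on = await backend.run_on_loop(socket_op())
    return ran_on, threading.get_ident()


def demo():
    backend = DedicatedLoop()
    try:
        ran_on, caller_thread = asyncio.run(caller(backend))
    finally:
        backend.close()
    return ran_on, caller_thread, backend.thread_id
```

In this sketch the calling loop only awaits a wrapped future, so it stays free to run other tasks while the dedicated loop performs the I/O.
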

Why

  • Because the dedicated sockets are created and used on the P2P loop, lookup replies should no longer be delayed by, or depend on, the StorageManager and worker loops.

@matthewygf (Collaborator, Author) commented May 14, 2026

Current main:

============ Serving Benchmark Result ============
Successful requests:                     29        
Failed requests:                         1         
Request rate configured (RPS):           1.00      
Benchmark duration (s):                  32.05     
Total input tokens:                      653602    
Total generated tokens:                  2900      
Request throughput (req/s):              0.90      
Output token throughput (tok/s):         90.48     
Peak output token throughput (tok/s):    460.00    
Peak concurrent requests:                26.00     
Total token throughput (tok/s):          20482.79  
---------------Time to First Token----------------
Mean TTFT (ms):                          3337.24   
Median TTFT (ms):                        3686.39   
P99 TTFT (ms):                           6067.23   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          96.40     
Median TPOT (ms):                        77.17     
P99 TPOT (ms):                           219.72    
---------------Inter-token Latency----------------
Mean ITL (ms):                           96.40     
Median ITL (ms):                         38.08     
P99 ITL (ms):                            890.86    
==================================================

Current commit:

image

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request implements event loop marshaling for Ascend-specific LMCache components to ensure ZMQ socket operations are executed on their designated event loops, preventing thread-safety issues. Key updates include patching the LMCacheWorker to handle cross-loop requests, introducing a mechanism in the P2P backend to marshal operations to its internal loop, and adding per-peer handshake locks in the HCCL channel. Review feedback recommends enhancing robustness by handling asyncio.CancelledError to maintain ZMQ socket state, replacing a busy-wait polling loop with an asyncio.Event, and using an initialization helper instead of a potentially crashing assertion.
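
As an illustration of the review's suggestion to replace a busy-wait polling loop with an `asyncio.Event`, here is a minimal sketch. `Initializer`, `initialize`, and `wait_ready` are hypothetical names, and the sleep stands in for whatever setup the real code waits on.

```python
import asyncio


class Initializer:
    """Hypothetical component whose setup finishes asynchronously."""

    def __init__(self) -> None:
        self._ready = asyncio.Event()
        self.value = None

    async def initialize(self) -> None:
        await asyncio.sleep(0.01)  # stand-in for real setup work
        self.value = 42
        self._ready.set()  # wake every waiter at once

    async def wait_ready(self, timeout: float = 1.0) -> int:
        # Instead of polling, e.g.
        #     while self.value is None: await asyncio.sleep(0.001)
        # suspend until initialize() signals completion.
        await asyncio.wait_for(self._ready.wait(), timeout)
        return self.value


async def demo() -> int:
    component = Initializer()
    init_task = asyncio.create_task(component.initialize())
    value = await component.wait_ready()
    await init_task
    return value
```

Note that `asyncio.Event` is not thread-safe: if the signal originates on a different loop or thread, it has to be delivered with `loop.call_soon_threadsafe(event.set)` on the loop that owns the event.
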

Comment thread lmcache_ascend/v1/cache_controller/worker.py
Comment thread lmcache_ascend/v1/storage_backend/p2p_backend.py Outdated
Comment thread lmcache_ascend/v1/storage_backend/p2p_backend.py Outdated
@matthewygf matthewygf marked this pull request as ready for review May 14, 2026 13:22
@matthewygf matthewygf merged commit 61047fa into LMCache:main May 19, 2026
1 check passed
@matthewygf matthewygf deleted the loop_ownership branch May 19, 2026 06:22