dora manage pinned memory about 1000MB/S #1623
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Please use English in commit messages, code comments, and docs. Please also add a PR description to tell us what this change is about. Thanks!
Hi @tang-canran, thanks for the substantial engineering effort. After reading the actual code (not just the design doc), I think the technical idea has real value, but this specific PR isn't the right vehicle. Closing this. Please read carefully: the path forward is in the follow-up issue linked at the bottom.

What works architecturally

After reading
That's a much friendlier shape than

Where the real value sits

There's a concrete gap in dora today that this addresses: CPU-pinned host buffer → cudaMemcpyHostToDevice at high throughput. Zenoh SHM (the ≥4 KiB transport) is page-aligned but NOT pinned, so feeding CUDA from a zenoh-SHM payload requires either (a) an extra

For high-rate CPU→GPU workloads (camera frames, sensor batches, real-time perception) that overhead is real. The 3-5× speedup claim against the non-pinned baseline is plausible on that specific path.
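
As a rough sanity check on why that overhead matters at camera rates, here is a back-of-the-envelope sketch. Every input is an assumption for illustration, not a measurement: the frame size and rate are hypothetical, the 1000 MB/s figure is taken from the PR title and assumed to describe the pinned path (the thread does not say which path it measures), and the 3-5× range is applied directly to the copy rate.

```python
# Illustrative arithmetic only: every input below is an assumption, not a measurement.
FRAME_BYTES = 1920 * 1080 * 3   # hypothetical 1080p RGB8 camera frame
FPS = 30                        # hypothetical camera rate
PINNED_MB_PER_S = 1000.0        # rough figure from the PR title, assumed to be the pinned path
SPEEDUPS = (3.0, 5.0)           # the claimed 3-5x range over the non-pinned baseline


def copy_time_ms(nbytes: int, mb_per_s: float) -> float:
    """Time to move nbytes at mb_per_s (1 MB = 10**6 bytes), in milliseconds."""
    return nbytes / (mb_per_s * 1e6) * 1e3


def budget_fraction(nbytes: int, mb_per_s: float, fps: float) -> float:
    """Share of one frame period (1/fps) that the copy alone consumes."""
    return copy_time_ms(nbytes, mb_per_s) / (1e3 / fps)


def report(label: str, mb_per_s: float) -> str:
    """Format the per-frame copy time and its share of the frame period."""
    ms = copy_time_ms(FRAME_BYTES, mb_per_s)
    share = budget_fraction(FRAME_BYTES, mb_per_s, FPS)
    return f"{label}: {ms:.1f} ms per frame, {share:.0%} of the frame period"


if __name__ == "__main__":
    print(report("pinned (assumed)", PINNED_MB_PER_S))
    for s in SPEEDUPS:
        # Baseline rate implied by dividing the assumed pinned rate by the claimed speedup.
        print(report(f"non-pinned at {s:.0f}x slower", PINNED_MB_PER_S / s))
```

Under these assumptions the copy costs about 6.2 ms per frame on the pinned path and roughly 18.7 to 31.1 ms on the baseline, which is 56% to 93% of a 33.3 ms frame period before any compute runs. Real numbers depend on the hardware and on which stage the 1000 MB/s figure actually measures.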
Why this specific PR can't land

Five issues, in order of severity:
Path forward

I'm opening a follow-up issue: #1872 describes the CPU-pinned-host-buffer opportunity in English, doesn't commit to an architecture, and invites focused proposals. If you (or anyone else) want to engage with the opportunity at the right scope and in English, that's the place to start. Specifically, a reviewable future PR would look like:
Thanks again for the engineering. The paper is yours to pursue on your fork regardless of what upstream decides. If you do engage with the follow-up issue, please write in English and start with the smaller architectural slice. cc @phil-opp
No description provided.