dora manage pinned memory about 1000MB/S by tang-canran · Pull Request #1623 · dora-rs/dora

tang-canran · 2026-04-17T12:00:18Z

No description provided.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

phil-opp · 2026-04-23T14:20:56Z

Please use English in commit messages, code comments, and docs. Please also add a PR description to tell us what this change is about. Thanks!

heyong4725 · 2026-05-19T21:51:32Z

Hi @tang-canran, thanks for the substantial engineering effort. After reading the actual code (not just the design doc), the technical idea has real value but this specific PR isn't the right vehicle. Closing this — please read carefully, the path forward is in the follow-up issue linked at the bottom.

What works architecturally

After reading binaries/daemon/src/memory_manager.rs, the new DaemonRequest variants in libraries/message/src/node_to_daemon.rs, and the control_channel.rs additions, the actual design is side-channel, not replacement:

3 new opt-in protocol variants (RegisterPinnedMemory, ReadPinnedMemory, FreePinnedMemory) — additive, no modification to existing message types
New MemoryPoolManager in the daemon — metadata-only HashMap, isolated module
Bulk data lives in /dev/shm/dora_pool_* shared-memory files; daemon brokers metadata exchange
Existing send_output / recv_event paths are untouched. Zenoh SHM, Arrow IPC, mpsc routing, cudaIpc — all unchanged

That's a much friendlier shape than docs/design.md's "replace message-passing with shared-state" rhetoric suggests. The code is opt-in; the docs aren't.

Where the real value sits

There's a concrete gap in dora today that this addresses: CPU-pinned host buffer → cudaMemcpyHostToDevice at high throughput. Zenoh SHM (the ≥4 KiB transport) is page-aligned but NOT pinned, so feeding CUDA from a zenoh-SHM payload requires either (a) an extra cudaMemcpyHostToDevice from non-pinned memory (slower than DMA from pinned) or (b) manual pinning that the application has to manage.

For high-rate CPU→GPU workloads — camera frames, sensor batches, real-time perception — that overhead is real. The 3-5× speedup claim against the non-pinned baseline is plausible on that specific path.

cudaIpc handles solve GPU↔GPU sharing but don't help when the producer is on CPU. So pinned-host-buffer is a real gap, not a synthetic benchmark.

Why this specific PR can't land

Five issues, in order of severity:

Implementation sprawl. The feature surface (RPC + memory manager + opt-in API) is reasonably ~300-500 lines. This PR adds 2,717. The 1,092-line Python binding additions in apis/python/node/src/lib.rs and the 348/-60 rewrite of apis/python/node/dora/cuda.py are 7× larger than the feature needs. A focused implementation would fit in a reviewable PR; this one cannot.
Design doc framing. docs/design.md is structured as a EuroSys paper draft ("面向 EuroSys 投稿的系统设计与创新...3.7× 加速比") arguing dora should replace its data plane. That's not a feature proposal — it's a research artifact. A future PR's design doc has to be in English and framed as "opt-in additional transport for pinned-host-buffer CUDA workloads," not "replace upstream architecture."
/dev/shm/dora_pool_* is Linux-only with no Windows/macOS story. dora supports all three. Cross-platform isn't optional — needs at least a graceful "this feature is unavailable on platform X" path.
No comparison against existing alternatives. The design doc doesn't address: why not extend zenoh-shm to support optional pinning? Why not a CUDA-aware Arrow IPC variant? Why not improve cudaIpc reuse with handle caching? Those are the questions a reviewer needs answered before evaluating a new transport.
Maintainer language feedback unresolved. @phil-opp asked on 2026-04-23 for English commits/comments/docs. Commits after that date are still in Chinese ("dora统一管理页锁内存", "完成自释放功能", etc.) and docs/design.md is entirely in Chinese. Project working language is English.

Path forward

I'm opening a follow-up issue: #1872 describes the CPU-pinned-host-buffer opportunity in English, doesn't commit to an architecture, and invites focused proposals. If you (or anyone else) want to engage with the opportunity at the right scope and in English, that's the place to start.

Specifically, a reviewable future PR would look like:

An issue or RFC first, in English, proposing the architecture and addressing the four "why not X" alternatives above
After maintainer feedback, a code PR roughly: ~150 lines of Rust (protocol + daemon + control_channel), ~100 lines of Python wrapper, ~50 lines of cross-platform gating, ~50 lines of focused test, and an English design note explaining what it is and what it isn't
Cross-platform fallback (clear "feature unavailable" on non-Linux, or graceful degradation)
Comparison numbers against zenoh-shm + manual pinning, not just against zenoh-shm without pinning

Thanks again for the engineering. The paper is yours to pursue on your fork regardless of what upstream decides. If you do engage with the follow-up issue, please write in English and start with the smaller architectural slice.

cc @phil-opp

tang-canran and others added 10 commits April 17, 2026 13:49

dora manage pinned memory about 1000MB/S

8b80eb6

解决内存泄漏问题，速度1500MB/S

6713b41

Add .worktrees/ to .gitignore

f869943

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

daemon 统一管理，1500MB/S

acb6241

fix: remove free_pinned_buffer dependency, use daemon memory manager

4105039

Merge main into feature/pinned-memory

789178a

chore: update pinned memory implementation and documentation

99e60d3

dora统一管理页锁内存1500MB/S cargo fmt

226412b

stage4:1800MB/S，已完成重复释放测试和空读测试

3183c9e

stage4:完成自释放功能

1e15bf1

phil-opp added the waiting-for-author The pull request requires adjustments by the PR author. label Apr 23, 2026

tang-canran added 5 commits April 25, 2026 20:09

2500MB/S

23ba858

pinned memory 6000MB/S

9982eec

pinned memory cuda->cpu 7000MB/S

4d64f9b

memory pool for about 7500MB/S

77beebf

remove slot-3

ca09a80

heyong4725 mentioned this pull request May 19, 2026

Opt-in pinned-host memory pool for high-throughput CPU→GPU transfers #1872

Open

heyong4725 closed this May 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dora manage pinned memory about 1000MB/S#1623

dora manage pinned memory about 1000MB/S#1623
tang-canran wants to merge 15 commits into
dora-rs:mainfrom
tang-canran:main

tang-canran commented Apr 17, 2026

Uh oh!

phil-opp commented Apr 23, 2026

Uh oh!

heyong4725 commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

tang-canran commented Apr 17, 2026

Uh oh!

phil-opp commented Apr 23, 2026

Uh oh!

heyong4725 commented May 19, 2026

What works architecturally

Where the real value sits

Why this specific PR can't land

Path forward

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants