This document describes the Python RPC microbenchmark added for comparing SlimeRPC against Ray (and optionally Pulsing) on a single machine.
The benchmark measures round-trip latency and effective bandwidth for a raw
bytes echo RPC across payload sizes from 1KB up to 16MB by default.
It runs two implementations by default, with a third opt-in baseline:
- SlimeRPC: RDMA-backed PeerAgent RPC echo between bench-driver and bench-worker
- Ray: a local EchoActor baseline using the same payload sizes and metrics
- Pulsing (optional, off by default): a @pul.remote actor echo using the same payload sizes and metrics. Enable with --with-pulsing.
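All three implementations time the same operation: send a bytes payload, wait for the echo, record the round trip. The sketch below shows that measurement pattern in isolation. It is not the code in the benchmark scripts; `local_echo`, `time_echo`, and the `iters`/`warmup` parameters are hypothetical names, and the in-process echo stands in for a real SlimeRPC, Ray, or Pulsing call.

```python
import time


def local_echo(payload: bytes) -> bytes:
    """Hypothetical stand-in for a remote echo call; returns the payload unchanged."""
    return payload


def time_echo(echo, size_bytes, iters=100, warmup=10):
    """Return per-call round-trip latencies in seconds for one payload size."""
    payload = bytes(size_bytes)  # zero-filled buffer of the requested size
    for _ in range(warmup):
        echo(payload)  # warm-up calls are not recorded
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        reply = echo(payload)
        latencies.append(time.perf_counter() - start)
        if len(reply) != size_bytes:
            raise RuntimeError("echo returned a payload of the wrong size")
    return latencies


samples = time_echo(local_echo, size_bytes=1024, iters=50)
```

Passing the echo callable as a parameter keeps the timing loop identical across backends, so any latency difference comes from the transport rather than from the harness.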
The comparison script prints:
- average latency
- p50 latency
- p99 latency
- effective round-trip bandwidth
- S/Ray = Ray avg latency / SlimeRPC avg latency (> 1 means SlimeRPC wins)
- S/Pul = Pulsing avg latency / SlimeRPC avg latency (only shown when Pulsing was enabled)
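As a rough illustration of how these metrics relate, the sketch below derives them from a list of per-call latencies. Several details are assumptions not stated in this document: the nearest-rank percentile method, microseconds and GB/s as units, and bandwidth defined as twice the payload size divided by the average latency (the payload travels out and back). The function and key names are hypothetical.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct in 0..100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_s, payload_bytes):
    """Reduce per-call latencies (seconds) to the reported metrics."""
    ordered = sorted(latencies_s)
    avg = statistics.fmean(ordered)
    return {
        "avg_us": avg * 1e6,
        "p50_us": percentile(ordered, 50) * 1e6,
        "p99_us": percentile(ordered, 99) * 1e6,
        # Assumption: an echo moves the payload twice (request + reply).
        "bw_gb_s": 2 * payload_bytes / avg / 1e9,
    }


def speedup(baseline_avg_us, slime_avg_us):
    """S/Ray or S/Pul: baseline avg latency over SlimeRPC avg latency."""
    return baseline_avg_us / slime_avg_us
```

With this definition, a ratio above 1 means the baseline took longer on average than SlimeRPC for that payload size.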
Benchmark files:
- dlslime/bench/python/run_rpc_bench.sh
- dlslime/bench/python/rpc_bench_slime_worker.py
- dlslime/bench/python/rpc_bench_slime_driver.py
- dlslime/bench/python/rpc_bench_ray.py
- dlslime/bench/python/rpc_bench_pulsing.py
- dlslime/bench/python/rpc_bench_compare.py
Before running the SlimeRPC side:
- Start NanoCtrl and make sure it is reachable.
- Make sure Redis is reachable through NanoCtrl.
- Build and install DLSlime with Python bindings and RDMA support.
For the optional Pulsing baseline, also install pulsing
(pip install pulsing) in the same environment.
Default run (SlimeRPC + Ray, Pulsing disabled):
bash dlslime/bench/python/run_rpc_bench.sh
Include the Pulsing baseline:
bash dlslime/bench/python/run_rpc_bench.sh --with-pulsing
# or
WITH_PULSING=1 bash dlslime/bench/python/run_rpc_bench.sh
Specify control-plane address or buffer size:
bash dlslime/bench/python/run_rpc_bench.sh \
--ctrl http://127.0.0.1:4479 \
--buf-mb 256 \
--max-size-mb 16
Environment-variable form:
CTRL=http://127.0.0.1:4479 BUF_MB=256 MAX_SIZE_MB=16 \
bash dlslime/bench/python/run_rpc_bench.sh
The script always writes:
- bench/results/slime_rpc.csv
- bench/results/ray_rpc.csv
and, when --with-pulsing is passed, additionally writes:
- bench/results/pulsing_rpc.csv
It then prints a merged comparison table. The S/Pul column only appears in
the table when the Pulsing CSV is present.
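The merge step can be pictured as a join on payload size, with the Pulsing column added only when its data exists. The sketch below is an assumption-heavy illustration, not the logic of rpc_bench_compare.py: the CSV column names `size_bytes` and `avg_us` are hypothetical, and the inline numbers are made-up placeholders rather than measurements.

```python
import csv
import io


def load_avg_by_size(csv_text):
    """Map payload size -> average latency, parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["size_bytes"]): float(row["avg_us"]) for row in reader}


def merge(slime, ray, pulsing=None):
    """Build comparison rows; the S/Pul key exists only when pulsing is given."""
    rows = []
    for size in sorted(slime):
        row = {"size_bytes": size, "S/Ray": ray[size] / slime[size]}
        if pulsing is not None:
            row["S/Pul"] = pulsing[size] / slime[size]
        rows.append(row)
    return rows


# Placeholder inputs for illustration only; not benchmark results.
slime = load_avg_by_size("size_bytes,avg_us\n1024,10.0\n2048,12.0\n")
ray = load_avg_by_size("size_bytes,avg_us\n1024,200.0\n2048,240.0\n")
table = merge(slime, ray)
table_with_pulsing = merge(slime, ray, pulsing=ray)
```

Treating the Pulsing input as optional mirrors the behavior described above, where the extra column depends on whether the Pulsing CSV is present.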
The default --max-size-mb is 16.
That limit is intentional: the current raw mailbox RPC path is validated and
stable through 16MB in this benchmark. Larger payloads still need a dedicated
bulk-transfer path instead of the mailbox-oriented RPC data path.
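To make the effect of the limit concrete, the sketch below generates a payload sweep capped by a maximum size. It assumes the sizes double at each step and that KB and MB mean 1024-based units; neither is stated in this document, and `payload_sizes` is a hypothetical helper.

```python
def payload_sizes(max_size_mb=16, start_bytes=1024):
    """Doubling sweep from start_bytes up to max_size_mb (1024-based), inclusive."""
    limit = max_size_mb * 1024 * 1024
    sizes = []
    size = start_bytes
    while size <= limit:
        sizes.append(size)
        size *= 2  # assumption: the sweep doubles each step
    return sizes


sizes = payload_sizes()
```

Under these assumptions, raising the cap only appends larger sizes to the end of the sweep; the smaller sizes are unaffected.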
The benchmark work also exercised and hardened several runtime behaviors:
- peer rendezvous retries no longer get stuck behind stale in-flight state
- stale Redis exchange and mailbox MR keys are cleaned on startup
- cleanup events now unblock pending RDMA waits when peers exit
- RDMAEndpoint.shutdown() is exposed to Python for cleanup-driven teardown
- Ray benchmark setup defaults to an isolated local runtime instead of attaching to an ambient cluster