Commit e5305e0

refactor(ray.sub): drop NETWORK_INIT_CMDS — MC_TCP_BIND_ADDRESS suffices
The NETWORK_INIT_CMDS block (pkill avahi-autoipd / ifconfig usb0 down / ip addr flush + a 2-second relaunch loop) was a workaround for an outdated diagnosis in data-plane-bench/DEBUG_TQ_BACKENDS.md (Issue 1): "MC_TCP_BIND_ADDRESS controls server_name (registration) but NOT the RPC listener bind address."

Re-reading current Mooncake main (commit fast-forwarded today):

- mooncake-transfer-engine/src/transfer_engine_impl.cpp:159-170
  If MC_TCP_BIND_ADDRESS is set, it goes directly into desc.ip_or_host_name, which is the address registered via addRpcMetaEntry, i.e. the address peers receive from the metadata service. This was added by PR #226 (caef1ef, merged 2025-04-10) and IS in the pinned wheel 0.3.10.post2 (bumped 2026-04-22).

- mooncake-transfer-engine/src/transfer_metadata_plugin.cpp:1292
  The TCP listener binds INADDR_ANY and accepts on all interfaces. Bind itself was never the bug; the announce was.

So per-process MC_TCP_BIND_ADDRESS in TQDataPlaneClient.__init__ (unchanged in this commit, runs on every process) gives Mooncake the routable announce address, and peer connections work cross-node without OS-level interface stripping.

The pkill+sleep loop fought a symptom (avahi-autoipd respawning the APIPA address). With the announce now correct regardless of usb0, that fight is unnecessary.

Removing the block also makes ray.sub's network init a no-op for non-mooncake_cpu backends (simple, mooncake_rdma); they were paying the host-process-kill cost for no reason.

If multi-node smoke regresses with peers connecting to 169.254.x, revert this commit only; the (A) codec/adapter cleanup stays.

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
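The fix depends on every process announcing a routable address rather than the usb0 link-local one. As a rough illustration only, here is a minimal shell sketch of picking a non-link-local IPv4 address and exporting it as MC_TCP_BIND_ADDRESS. The helper names `is_link_local` and `pick_routable_ip` and the candidate list are hypothetical; in this repository the variable is set from TQDataPlaneClient.__init__, not from shell.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of ray.sub or Mooncake.

# Return 0 if the IPv4 address is in 169.254.0.0/16 (link-local / APIPA).
is_link_local() {
  case "$1" in
    169.254.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the first argument that is neither loopback nor link-local.
# Returns 1 if no such address is found.
pick_routable_ip() {
  local ip
  for ip in "$@"; do
    case "$ip" in
      127.*) continue ;;
    esac
    if ! is_link_local "$ip"; then
      printf '%s\n' "$ip"
      return 0
    fi
  done
  return 1
}

# Assumed candidate list: usb0's APIPA address first, then a routable one.
candidates="169.254.3.1 10.0.0.12"
# Word splitting of $candidates is intentional here.
if routable=$(pick_routable_ip $candidates); then
  export MC_TCP_BIND_ADDRESS="$routable"
fi
```

With the assumed candidates above, the link-local address is skipped and the routable one is exported, which is the address peers would then receive from the metadata service.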
1 parent ddb9a02 commit e5305e0

1 file changed

Lines changed: 0 additions & 31 deletions

ray.sub

@@ -205,40 +205,10 @@ head_node_ip=${ip_addresses_array[0]}
 
 ip_head=$head_node_ip:$PORT
 
-# Network init for Mooncake-cpu (TQ data-plane backend "mooncake_cpu").
-# Mooncake's transfer_metadata_plugin.cpp:1127 calls getifaddrs() and binds
-# the RPC listener to the FIRST interface with an IP — usually usb0
-# (169.254.3.1 link-local APIPA), which is unreachable cross-node. The fix
-# kills avahi-autoipd (the daemon that re-assigns 169.254.3.1), tells
-# NetworkManager to stop managing usb0, flushes the address, and runs a
-# 2 s relaunch loop as a failsafe. Lifted from data-plane-bench/ray.sub
-# (proven at 32-node and 48-node scales). Belt-and-braces: ifconfig +
-# ip variants both attempted because the container set varies.
-# Without this, mooncake_cpu fails with metadata 404s and a
-# MemcpyWorkerPool segfault during the first kv_batch_put.
-# See research/data_plane_mooncake_status.md.
-NETWORK_INIT_CMDS='# Kill avahi-autoipd for usb0: it is the daemon that re-assigns 169.254.3.1.
-pkill avahi-autoipd 2>/dev/null || true
-if [ -f /run/avahi-autoipd.usb0.pid ]; then kill $(cat /run/avahi-autoipd.usb0.pid) 2>/dev/null || true; fi
-nmcli device set usb0 managed no 2>/dev/null || true
-ifconfig usb0 0.0.0.0 2>/dev/null || true
-ifconfig usb0 down 2>/dev/null || true
-ip link set usb0 down 2>/dev/null || true
-ip addr flush dev usb0 2>/dev/null || true
-{ while :; do
-pkill avahi-autoipd 2>/dev/null || true
-ifconfig usb0 0.0.0.0 2>/dev/null || true
-ifconfig usb0 down 2>/dev/null || true
-ip link set usb0 down 2>/dev/null || true
-ip addr flush dev usb0 2>/dev/null || true
-sleep 2
-done; } &'
-
 # First we start the head of the ray cluster on one of the physical nodes
 # Give the head node actual resources to make it schedulable
 
 head_cmd=$(cat <<EOF
-$NETWORK_INIT_CMDS
 # Touch a file to indicate that the head node has started
 # Overlapping srun commands will check this file to determine if we can overlap a container command
 touch $LOG_DIR/STARTED_RAY_HEAD
@@ -347,7 +317,6 @@ for ((i = 1; i < SLURM_JOB_NUM_NODES; i++)); do
 node_i=${nodes_array[$i]}
 
 worker_cmd=$(cat <<EOF
-$NETWORK_INIT_CMDS
 env
 
 exit-dramatically() {
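
With the usb0 teardown removed, the revert criterion in the commit message (peers connecting to 169.254.x) is the signal to watch for in a multi-node smoke run. A minimal sketch of such a check follows. The function name `has_link_local_peer` is hypothetical, and the log format is an assumption: the check only looks for any 169.254.x.x address in whatever text is piped in.

```shell
#!/usr/bin/env bash
# Hypothetical regression check; not part of ray.sub.

# Read text on stdin and return 0 if it mentions any 169.254.x.x address.
# The leading and trailing guards avoid matching inside a longer number
# such as 1169.254.3.1.
has_link_local_peer() {
  grep -Eq '(^|[^0-9.])169\.254\.[0-9]{1,3}\.[0-9]{1,3}([^0-9]|$)'
}
```

A possible usage, assuming a worker log is available under a path of your choosing: `has_link_local_peer < "$LOG_DIR/worker.log" && echo "link-local peer address seen; consider reverting e5305e0"`. The file name `worker.log` is illustrative only.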
