@@ -6,7 +6,7 @@
 - `CUDA_DEVICE_MAX_CONNECTIONS=1` must be enabled in the environment.
 - For best performance, point-to-point communication via _CUDA Multicast_ needs CUDA Toolkit 12.0+
   and CUDA driver 535+ on devices with compute capability 9.0 or newer.
-- Devices older than compute capability 9.0 require `UB_SKIPMC=1` in the environment in order fall
+- Devices older than compute capability 9.0 require `UB_SKIPMC=1` in the environment in order to fall
   back on a less performant implementation based on CUDA Inter-Process Communication (IPC) handles.
 
 ## Examples
@@ -22,7 +22,7 @@ $ torchrun --nnodes=1 --nproc-per-node=$(nvidia-smi -L | wc -l) te_layer_with_ov
 # [rank0:node0] |-- Created tensor-parallel group: [0, 1, 2, 3, 4, 5, 6, 7]
 # !!! [UB] Create UbufP2PCommOverlap Communicator
 # UB_TIMEOUT is set to 110 sec, 217800000000 cycles, freq: 1980000khz
-# MC initialized succesfully, window size = 549755813888
+# MC initialized successfully, window size = 549755813888
 # !!! [UBP2P] Register UBuf 1
 # !!! [UBP2P] Register UBuf 2
 # !!! [UBP2P] Register UBuf 3
@@ -66,7 +66,7 @@ $ torchrun --nnodes=1 --nproc-per-node=$(nvidia-smi -L | wc -l) te_layer_with_ov
 ```
 ### Single node, mixed data- and tensor-parallel LayerNormMLP:
 
-Uses `torch.nn.parallel.DistributedDataParallel` for replicatin the model across 2 tensor-parallel
+Uses `torch.nn.parallel.DistributedDataParallel` for replicating the model across 2 tensor-parallel
 groups in a single node.
 
 ```bash
@@ -81,7 +81,7 @@ $ torchrun --nnodes=1 --nproc-per-node=$(nvidia-smi -L | wc -l) te_layer_with_ov
 # [rank2:node0] |-- Created data-parallel group: [2, 6]
 # !!! [UB] Create UbufP2PCommOverlap Communicator
 # UB_TIMEOUT is set to 110 sec, 217800000000 cycles, freq: 1980000khz
-# MC initialized succesfully, window size = 549755813888
+# MC initialized successfully, window size = 549755813888
 # !!! [UBP2P] Register UBuf 1
 # !!! [UBP2P] Register UBuf 2
 # !!! [UBP2P] Register UBuf 3