Commit b629e6e

docs: fix comm GEMM overlap README typos (#3010)
Signed-off-by: LeSingh1 <sshaurya914@gmail.com>
1 parent ca50bbf commit b629e6e

1 file changed

Lines changed: 4 additions & 4 deletions

examples/pytorch/comm_gemm_overlap/README.md

@@ -6,7 +6,7 @@
 - `CUDA_DEVICE_MAX_CONNECTIONS=1` must be enabled in the environment.
 - For best performance, point-to-point communication via _CUDA Multicast_ needs CUDA Toolkit 12.0+
 and CUDA driver 535+ on devices with compute capability 9.0 or newer.
-- Devices older than compute capability 9.0 require `UB_SKIPMC=1` in the environment in order fall
+- Devices older than compute capability 9.0 require `UB_SKIPMC=1` in the environment in order to fall
 back on a less performant implementation based on CUDA Inter-Process Communication (IPC) handles.
 
 ## Examples
@@ -22,7 +22,7 @@ $ torchrun --nnodes=1 --nproc-per-node=$(nvidia-smi -L | wc -l) te_layer_with_ov
 # [rank0:node0] |-- Created tensor-parallel group: [0, 1, 2, 3, 4, 5, 6, 7]
 # !!! [UB] Create UbufP2PCommOverlap Communicator
 # UB_TIMEOUT is set to 110 sec, 217800000000 cycles, freq: 1980000khz
-# MC initialized succesfully, window size = 549755813888
+# MC initialized successfully, window size = 549755813888
 # !!! [UBP2P] Register UBuf 1
 # !!! [UBP2P] Register UBuf 2
 # !!! [UBP2P] Register UBuf 3
@@ -66,7 +66,7 @@ $ torchrun --nnodes=1 --nproc-per-node=$(nvidia-smi -L | wc -l) te_layer_with_ov
 ```
 ### Single node, mixed data- and tensor-parallel LayerNormMLP:
 
-Uses `torch.nn.parallel.DistributedDataParallel` for replicatin the model across 2 tensor-parallel
+Uses `torch.nn.parallel.DistributedDataParallel` for replicating the model across 2 tensor-parallel
 groups in a single node.
 
 ```bash
@@ -81,7 +81,7 @@ $ torchrun --nnodes=1 --nproc-per-node=$(nvidia-smi -L | wc -l) te_layer_with_ov
 # [rank2:node0] |-- Created data-parallel group: [2, 6]
 # !!! [UB] Create UbufP2PCommOverlap Communicator
 # UB_TIMEOUT is set to 110 sec, 217800000000 cycles, freq: 1980000khz
-# MC initialized succesfully, window size = 549755813888
+# MC initialized successfully, window size = 549755813888
 # !!! [UBP2P] Register UBuf 1
 # !!! [UBP2P] Register UBuf 2
 # !!! [UBP2P] Register UBuf 3
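
For readers of this diff, the environment rules stated in the first hunk (`CUDA_DEVICE_MAX_CONNECTIONS=1` always, `UB_SKIPMC=1` on devices older than compute capability 9.0) could be sketched roughly as below. This is an illustration only, not code from the repository: the helper name `overlap_env` and the `(major, minor)` tuple argument are assumptions made for the example.

```python
def overlap_env(compute_capability):
    """Return the environment variables described in the README hunk.

    `compute_capability` is a (major, minor) tuple, e.g. (9, 0).
    The helper name and signature are hypothetical.
    """
    # The README states this variable must be enabled in every case.
    env = {"CUDA_DEVICE_MAX_CONNECTIONS": "1"}
    if tuple(compute_capability) < (9, 0):
        # Devices older than 9.0 fall back on the CUDA IPC-based path.
        env["UB_SKIPMC"] = "1"
    return env


if __name__ == "__main__":
    # (8, 0) stands in for a pre-9.0 device in this example.
    for name, value in sorted(overlap_env((8, 0)).items()):
        print(f"export {name}={value}")
```

On a machine with PyTorch installed, the tuple would presumably come from `torch.cuda.get_device_capability()`, and the printed `export` lines would be applied in the shell before invoking `torchrun`.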
