Commit e044bc7

- Update torch version in requirements.txt
- Remove CPU execution option since DDP requires 2 GPUs for this example.
- Refine README.md for DDP RPC example clarity and detail

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>
1 parent a84f91c commit e044bc7

3 files changed

Lines changed: 7 additions & 18 deletions

distributed/rpc/ddp_rpc/README.md

Lines changed: 4 additions & 12 deletions
@@ -1,18 +1,10 @@
 Distributed DataParallel + Distributed RPC Framework Example

-The example shows how to combine Distributed DataParallel with the Distributed
-RPC Framework. There are two trainer nodes, 1 master node and 1 parameter
-server in the example.
+This example demonstrates how to combine Distributed DataParallel (DDP) with the Distributed RPC Framework. It requires two trainer nodes (each with a GPU), one master node, and one parameter server.

-The master node creates an embedding table on the parameter server and drives
-the training loop on the trainers. The model consists of a dense part
-(nn.Linear) replicated on the trainers via Distributed DataParallel and a
-sparse part (nn.EmbeddingBag) which resides on the parameter server. Each
-trainer performs an embedding lookup on the parameter server (using the
-Distributed RPC Framework) and then executes its local nn.Linear module.
-During the backward pass, the gradients for the dense part are aggregated via
-allreduce by DDP and the distributed backward pass updates the parameters for
-the embedding table on the parameter server.
+The master node initializes an embedding table on the parameter server and orchestrates the training loop across the trainers. The model is composed of a dense component (`nn.Linear`), which is replicated on the trainers using DDP, and a sparse component (`nn.EmbeddingBag`), which resides on the parameter server.
+
+Each trainer performs embedding lookups on the parameter server via RPC, then processes the results through its local `nn.Linear` module. During the backward pass, DDP aggregates gradients for the dense part using allreduce, while the distributed backward pass updates the embedding table parameters on the parameter server.


 ```
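
The dense/sparse split described in the updated README can be sketched in a single process, without RPC or DDP. This is only an illustration: the layer sizes, the sample indices, and the variable names `sparse` and `dense` are assumptions, not values taken from this commit. In the real example the `nn.EmbeddingBag` lives on the parameter server and is reached over RPC, and the `nn.Linear` is wrapped in DDP on each trainer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumed, not taken from this commit).
NUM_EMBEDDINGS, EMBEDDING_DIM, NUM_CLASSES = 100, 16, 8

# Sparse part: held on the parameter server in the real example.
sparse = nn.EmbeddingBag(NUM_EMBEDDINGS, EMBEDDING_DIM, mode="sum")
# Dense part: replicated on each trainer via DDP in the real example.
dense = nn.Linear(EMBEDDING_DIM, NUM_CLASSES)

# Two bags of indices: indices[0:3] and indices[3:5].
indices = torch.tensor([1, 5, 7, 2, 9])
offsets = torch.tensor([0, 3])
target = torch.tensor([3, 1])

pooled = sparse(indices, offsets)  # embedding lookup, shape (2, EMBEDDING_DIM)
logits = dense(pooled)             # local dense forward, shape (2, NUM_CLASSES)
loss = nn.functional.cross_entropy(logits, target)

# One backward pass produces gradients for both the dense and sparse parts.
loss.backward()
```

Only the embedding rows that were actually looked up receive a nonzero gradient, so each step touches a small slice of the table, while every entry of the dense layer's weight gets a gradient that DDP would average across trainers.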

distributed/rpc/ddp_rpc/main.py

Lines changed: 2 additions & 5 deletions
@@ -150,11 +150,8 @@ def run_worker(rank, world_size):
         for fut in futs:
             fut.wait()
     elif rank <= 1:
-        if torch.accelerator.is_available():
-            acc = torch.accelerator.current_accelerator()
-            device = torch.device(acc)
-        else:
-            device = torch.device("cpu")
+        acc = torch.accelerator.current_accelerator()
+        device = torch.device(acc)
         backend = torch.distributed.get_default_backend_for_device(device)
         torch.accelerator.device_index(rank)
         # Initialize process group for Distributed DataParallel on trainers.
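
With the CPU fallback removed, `torch.accelerator.current_accelerator()` is now called unconditionally on the trainer ranks. One possible way to make the no-accelerator case fail with a clear message is a small guard like the sketch below. The helper name `pick_trainer_device` and the error text are hypothetical and not part of this commit, and the sketch assumes trainer rank N uses accelerator index N.

```python
import torch


def pick_trainer_device(rank: int) -> torch.device:
    """Return the accelerator device for a trainer rank, or raise if none exists."""
    # Hypothetical helper: older PyTorch builds may not ship torch.accelerator.
    accel = getattr(torch, "accelerator", None)
    if accel is None or not accel.is_available():
        raise RuntimeError(
            "No accelerator found: this example needs one accelerator per trainer."
        )
    acc = accel.current_accelerator()
    # Assumption: trainer rank N is pinned to accelerator index N.
    return torch.device(acc.type, rank)
```

A trainer could then call `device = pick_trainer_device(rank)` and pass the result to `torch.distributed.get_default_backend_for_device(device)` as the diff above does.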

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
-torch>=2.7.1
+torch>=2.7.0
 numpy
