Commit afab02f

translate : recipes_source/distributed_device_mesh.rst (#1132)
* translate : recipes_source/distributed_device_mesh.rst
* modify : 177 line
* 0530 modify : apply feedback
1 parent 14f7a1c commit afab02f

1 file changed

Lines changed: 49 additions & 49 deletions

@@ -1,42 +1,42 @@
-Getting Started with DeviceMesh
+Getting Started with DeviceMesh
 =====================================================
 
-**Author**: `Iris Zhang <https://github.com/wz337>`__, `Wanchao Liang <https://github.com/wanchaol>`__
+**Author**: `Iris Zhang <https://github.com/wz337>`__, `Wanchao Liang <https://github.com/wanchaol>`__
+**Translator:** `강동석 <https://github.com/ehdtjr>`_
 
 .. note::
-   |edit| View and edit this tutorial in `github <https://github.com/pytorchkorea/tutorials-kr/blob/main/recipes_source/distributed_device_mesh.rst>`__.
+   |edit| You can view and edit this tutorial on `github <https://github.com/PyTorchKorea/tutorials-kr/blob/master/recipes_source/distributed_device_mesh.rst>`__.
 
-Prerequisites:
+Prerequisites:
 
-- `Distributed Communication Package - torch.distributed <https://pytorch.org/docs/stable/distributed.html>`__
+- `Distributed Communication Package - torch.distributed <https://pytorch.org/docs/stable/distributed.html>`__
 - Python 3.8 - 3.11
 - PyTorch 2.2
 
 
-Setting up distributed communicators, i.e. NVIDIA Collective Communication Library (NCCL) communicators, for distributed training can pose a significant challenge. For workloads where users need to compose different parallelisms,
-users would need to manually set up and manage NCCL communicators (for example, :class:`ProcessGroup`) for each parallelism solution. This process could be complicated and susceptible to errors.
-:class:`DeviceMesh` can simplify this process, making it more manageable and less prone to errors.
+Setting up distributed communicators, that is, NVIDIA Collective Communication Library (NCCL) communicators, for distributed training can be a significant challenge. For workloads that need to combine different parallelism strategies,
+you must manually set up and manage an NCCL communicator (for example, :class:`ProcessGroup`) for each parallelism strategy. This process is complicated and error-prone.
+:class:`DeviceMesh` simplifies this process, making it more manageable and less error-prone.
 
-What is DeviceMesh
-------------------
-:class:`DeviceMesh` is a higher level abstraction that manages :class:`ProcessGroup`. It allows users to effortlessly
-create inter-node and intra-node process groups without worrying about how to set up ranks correctly for different sub process groups.
-Users can also easily manage the underlying process_groups/devices for multi-dimensional parallelism via :class:`DeviceMesh`.
+What is DeviceMesh
+---------------------
+:class:`DeviceMesh` is a higher-level abstraction that manages :class:`ProcessGroup`.
+It lets you create inter-node and intra-node process groups easily, without worrying about how to set up ranks correctly for the different sub-process groups.
+You can also use :class:`DeviceMesh` to easily manage the underlying process groups and devices for multi-dimensional parallelism.
 
 .. figure:: /_static/img/distributed/device_mesh.png
    :width: 100%
    :align: center
    :alt: PyTorch DeviceMesh
 
-Why DeviceMesh is Useful
+Why DeviceMesh is Useful
 ------------------------
-DeviceMesh is useful when working with multi-dimensional parallelism (i.e. 3-D parallel) where parallelism composability is required. For example, when your parallelism solutions require both communication across hosts and within each host.
-The image above shows that we can create a 2D mesh that connects the devices within each host, and connects each device with its counterpart on the other hosts in a homogeneous setup.
+DeviceMesh is useful for multi-dimensional parallelism (for example, 3-D parallelism) that requires composing several parallelism strategies, such as when a strategy needs communication both across hosts and within each host.
+The image above shows that, in a homogeneous setup, we can create a 2D mesh that connects the devices within each host and connects each device with its counterpart on the other hosts.
 
-Without DeviceMesh, users would need to manually set up NCCL communicators, cuda devices on each process before applying any parallelism, which could be quite complicated.
-The following code snippet illustrates a hybrid sharding 2-D Parallel pattern setup without :class:`DeviceMesh`.
-First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
-replicate group to each rank.
+Without DeviceMesh, you must manually set up the NCCL communicators and CUDA devices on each process before applying any parallelism, which is quite complicated.
+The following code sets up a hybrid sharding 2-D parallel pattern without :class:`DeviceMesh`.
+First, the shard groups and replicate groups must be computed manually, and then each rank must be assigned its correct groups.
 
 .. code-block:: python
 
@@ -45,17 +45,17 @@ replicate group to each rank.
     import torch
     import torch.distributed as dist
 
-    # Understand world topology
+    # Understand world topology
     rank = int(os.environ["RANK"])
     world_size = int(os.environ["WORLD_SIZE"])
     print(f"Running example on {rank=} in a world with {world_size=}")
 
-    # Create process groups to manage 2-D like parallel pattern
+    # Create process groups to manage a 2-D parallel pattern
     dist.init_process_group("nccl")
     torch.cuda.set_device(rank)
 
-    # Create shard groups (e.g. (0, 1, 2, 3), (4, 5, 6, 7))
-    # and assign the correct shard group to each rank
+    # Create shard groups (for example, (0, 1, 2, 3), (4, 5, 6, 7))
+    # and assign the correct shard group to each rank
     num_node_devices = torch.cuda.device_count()
     shard_rank_lists = list(range(0, num_node_devices // 2)), list(range(num_node_devices // 2, num_node_devices))
     shard_groups = (
@@ -66,8 +66,8 @@ replicate group to each rank.
         shard_groups[0] if rank in shard_rank_lists[0] else shard_groups[1]
     )
 
-    # Create replicate groups (for example, (0, 4), (1, 5), (2, 6), (3, 7))
-    # and assign the correct replicate group to each rank
+    # Create replicate groups (for example, (0, 4), (1, 5), (2, 6), (3, 7))
+    # and assign the correct replicate group to each rank
     current_replicate_group = None
     shard_factor = len(shard_rank_lists[0])
     for i in range(num_node_devices // 2):
@@ -76,44 +76,44 @@ replicate group to each rank.
         if rank in replicate_group_ranks:
             current_replicate_group = replicate_group
 
-To run the above code snippet, we can leverage PyTorch Elastic. Let's create a file named ``2d_setup.py``.
-Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
+To run the code above, we can use PyTorch Elastic. Create a file named ``2d_setup.py``,
+then run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
 
 .. code-block:: python
 
     torchrun --nproc_per_node=8 --rdzv_id=100 --rdzv_endpoint=localhost:29400 2d_setup.py
 
 .. note::
-   For simplicity of demonstration, we are simulating 2D parallel using only one node. Note that this code snippet can also be used when running on multi hosts setup.
+   To keep the demonstration simple, we simulate 2D parallelism using only a single node. The same code can also be used as-is in a multi-host setup.
 
-With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines, and we can still
-access the underlying :class:`ProcessGroup` if needed.
+With :func:`init_device_mesh`, the 2D setup above takes just two lines, and when needed
+the underlying :class:`ProcessGroup` is still accessible.
 
 
 .. code-block:: python
 
     from torch.distributed.device_mesh import init_device_mesh
     mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
 
-    # Users can access the underlying process group thru `get_group` API.
+    # The underlying process groups are accessible through the `get_group` API.
     replicate_group = mesh_2d.get_group(mesh_dim="replicate")
     shard_group = mesh_2d.get_group(mesh_dim="shard")
 
-Let's create a file named ``2d_setup_with_device_mesh.py``.
-Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
+Create a file named ``2d_setup_with_device_mesh.py``,
+then run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
 
 .. code-block:: python
 
     torchrun --nproc_per_node=8 2d_setup_with_device_mesh.py
 
 
-How to use DeviceMesh with HSDP
+How to use DeviceMesh with HSDP
 -------------------------------
 
-Hybrid Sharding Data Parallel(HSDP) is 2D strategy to perform FSDP within a host and DDP across hosts.
+Hybrid Sharding Data Parallel (HSDP) is a 2D strategy that performs FSDP within a host and DDP across hosts.
 
-Let's see an example of how DeviceMesh can assist with applying HSDP to your model with a simple setup. With DeviceMesh,
-users would not need to manually create and manage shard group and replicate group.
+Let's look at an example of how DeviceMesh helps apply HSDP to a model with a simple setup. With DeviceMesh,
+there is no need to manually create and manage the shard group and replicate group.
 
 .. code-block:: python
 
@@ -141,39 +141,39 @@ users would not need to manually create and manage shard group and replicate gro
         ToyModel(), device_mesh=mesh_2d
     )
 
-Let's create a file named ``hsdp.py``.
-Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
+Create a file named ``hsdp.py``,
+then run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
 
 .. code-block:: python
 
     torchrun --nproc_per_node=8 hsdp.py
 
-How to use DeviceMesh for your custom parallel solutions
+How to use DeviceMesh for your custom parallel solutions
 --------------------------------------------------------
-When working with large scale training, you might have more complex custom parallel training composition. For example, you may need to slice out sub-meshes for different parallelism solutions.
-DeviceMesh allows users to slice child mesh from the parent mesh and re-use the NCCL communicators already created when the parent mesh is initialized.
+Large-scale training may call for a more complex custom parallel training composition. For example, you may need to slice out sub-meshes for different parallelism strategies.
+With DeviceMesh, you can slice a child mesh from the parent mesh and reuse the NCCL communicators that were already created when the parent mesh was initialized.
 
 .. code-block:: python
 
     from torch.distributed.device_mesh import init_device_mesh
     mesh_3d = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("replicate", "shard", "tp"))
 
-    # Users can slice child meshes from the parent mesh.
+    # Child meshes can be sliced from the parent mesh.
     hsdp_mesh = mesh_3d["replicate", "shard"]
     tp_mesh = mesh_3d["tp"]
 
-    # Users can access the underlying process group thru `get_group` API.
+    # The underlying process groups are accessible through the `get_group` API.
     replicate_group = hsdp_mesh["replicate"].get_group()
     shard_group = hsdp_mesh["shard"].get_group()
     tp_group = tp_mesh.get_group()
 
 
-Conclusion
+Conclusion
 ----------
-In conclusion, we have learned about :class:`DeviceMesh` and :func:`init_device_mesh`, as well as how
-they can be used to describe the layout of devices across the cluster.
+So far we have looked at :class:`DeviceMesh` and :func:`init_device_mesh`,
+and learned how to use them to describe the layout of devices across the cluster.
 
-For more information, please see the following:
+For more information, see the following resources.
 
 - `2D parallel combining Tensor/Sequence Parallel with FSDP <https://github.com/pytorch/examples/blob/main/distributed/tensor_parallelism/fsdp_tp_example.py>`__
 - `Composable PyTorch Distributed with PT2 <https://static.sched.com/hosted_files/pytorch2023/d1/%5BPTC%2023%5D%20Composable%20PyTorch%20Distributed%20with%20PT2.pdf>`__
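
The rank groupings named in the tutorial's code comments, shard groups ``(0, 1, 2, 3), (4, 5, 6, 7)`` and replicate groups ``(0, 4), (1, 5), (2, 6), (3, 7)``, can be reproduced without any GPUs. The sketch below is only an illustration in plain Python, not part of the commit or of the PyTorch API: ``mesh_groups`` is a hypothetical helper, and it assumes ranks are laid out over the mesh shape in row-major order, which is consistent with the groups listed in those comments.

```python
from itertools import product


def mesh_groups(mesh_shape, dim):
    """Return the rank groups that communicate along mesh dimension `dim`.

    Hypothetical helper for illustration only. Assumes ranks 0..N-1 are
    arranged over `mesh_shape` in row-major order.
    """
    # Row-major strides: the last dimension varies fastest.
    strides = []
    stride = 1
    for size in reversed(mesh_shape):
        strides.insert(0, stride)
        stride *= size

    # Fix the coordinates of every other dimension, then walk along `dim`.
    other_dims = [d for d in range(len(mesh_shape)) if d != dim]
    groups = []
    for fixed in product(*(range(mesh_shape[d]) for d in other_dims)):
        base = sum(idx * strides[d] for d, idx in zip(other_dims, fixed))
        groups.append(
            tuple(base + i * strides[dim] for i in range(mesh_shape[dim]))
        )
    return groups


# 2D mesh with dimensions ("replicate", "shard"), as in the (2, 4) example.
print(mesh_groups((2, 4), dim=1))  # [(0, 1, 2, 3), (4, 5, 6, 7)]
print(mesh_groups((2, 4), dim=0))  # [(0, 4), (1, 5), (2, 6), (3, 7)]

# 3D mesh with dimensions ("replicate", "shard", "tp"): groups along "tp".
print(mesh_groups((2, 2, 2), dim=2))  # [(0, 1), (2, 3), (4, 5), (6, 7)]
```

Each tuple lists the ranks that would share one process group along the chosen dimension, which is the bookkeeping the manual setup does by hand and ``init_device_mesh`` takes over.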
