Commit 441e62c

Taeyoung96 authored
advanced/torch-script-parallelism.rst translation (#386)
* Translate torch-script-parallelism.rst Co-authored-by: Taeyoung96 <[tyoung96@naver.com]>
1 parent c8294d3 commit 441e62c

1 file changed

Lines changed: 110 additions & 110 deletions

@@ -1,20 +1,20 @@
1-
Dynamic Parallelism in TorchScript
2-
==================================
1+
Dynamic Parallelism in TorchScript
2+
==================================
33

4-
In this tutorial, we introduce the syntax for doing *dynamic inter-op parallelism*
5-
in TorchScript. This parallelism has the following properties:
4+
In this tutorial, we introduce the syntax for *dynamic inter-op parallelism* in TorchScript.
5+
This parallelism has the following properties:
66

7-
* dynamic - The number of parallel tasks created and their workload can depend on the control flow of the program.
8-
* inter-op - The parallelism is concerned with running TorchScript program fragments in parallel. This is distinct from *intra-op parallelism*, which is concerned with splitting up individual operators and running subsets of the operator's work in parallel.
9-
Basic Syntax
7+
* dynamic - The number of parallel tasks created, and their workload, can depend on the control flow of the program.
8+
* inter-op - The parallelism is concerned with running fragments of a TorchScript program in parallel. This is distinct from *intra-op parallelism*, which splits up an individual operator and runs subsets of that operator's work in parallel.
9+
Basic Syntax
1010
------------
1111

12-
The two important APIs for dynamic parallelism are:
12+
The two important APIs for dynamic parallelism are:
1313

1414
* ``torch.jit.fork(fn : Callable[..., T], *args, **kwargs) -> torch.jit.Future[T]``
1515
* ``torch.jit.wait(fut : torch.jit.Future[T]) -> T``
1616

17-
A good way to demonstrate how these work is by way of an example:
17+
How these work is best understood through the following example:
1818

1919
.. code-block:: python
2020
@@ -25,37 +25,37 @@ A good way to demonstrate how these work is by way of an example:
2525
2626
@torch.jit.script
2727
def example(x):
28-
# Call `foo` using parallelism:
29-
# First, we "fork" off a task. This task will run `foo` with argument `x`
28+
# Call `foo` in parallel.
29+
# First, we "fork" off a task. This task runs `foo` with the argument `x`.
3030
future = torch.jit.fork(foo, x)
3131
32-
# Call `foo` normally
32+
# Call `foo` normally.
3333
x_normal = foo(x)
3434
35-
# Second, we "wait" on the task. Since the task may be running in
36-
# parallel, we have to "wait" for its result to become available.
37-
# Notice that by having lines of code between the "fork()" and "wait()"
38-
# call for a given Future, we can overlap computations so that they
39-
# run in parallel.
35+
# Second, we "wait" on the task.
36+
# Since the task may be running in parallel, we have to "wait" until its result is available.
37+
# Note that placing other code between the
38+
# "fork()" and "wait()" calls for a given Future
39+
# lets the computations overlap and run in parallel.
4040
x_parallel = torch.jit.wait(future)
4141
4242
return x_normal, x_parallel
4343
4444
print(example(torch.ones(1))) # (-1., -1.)
4545
4646
47-
``fork()`` takes the callable ``fn`` and arguments to that callable ``args``
48-
and ``kwargs`` and creates an asynchronous task for the execution of ``fn``.
49-
``fn`` can be a function, method, or Module instance. ``fork()`` returns a
50-
reference to the value of the result of this execution, called a ``Future``.
51-
Because ``fork`` returns immediately after creating the async task, ``fn`` may
52-
not have been executed by the time the line of code after the ``fork()`` call
53-
is executed. Thus, ``wait()`` is used to wait for the async task to complete
54-
and return the value.
47+
``fork()`` takes a callable ``fn`` and the arguments to that callable, ``args``
48+
and ``kwargs``, and creates an asynchronous task that executes ``fn``.
49+
``fn`` can be a function, a method, or a Module instance.
50+
``fork()`` returns a reference to the result of this execution, called a ``Future``.
51+
Because ``fork()`` returns immediately after creating the asynchronous task,
52+
``fn`` may not have run yet when the line after the ``fork()`` call executes.
53+
Thus, ``wait()`` is used to wait for the asynchronous task to complete and
54+
return its value.
5555

56-
These constructs can be used to overlap the execution of statements within a
57-
function (shown in the worked example section) or be composed with other language
58-
constructs like loops:
56+
These constructs can be used to overlap the execution of statements within a
57+
function (shown in the worked example section), or be composed with other
58+
language constructs such as loops:
5959

6060
.. code-block:: python
6161
@@ -81,55 +81,55 @@ constructs like loops:
8181
8282
.. note::
8383

84-
When we initialized an empty list of Futures, we needed to add an explicit
85-
type annotation to ``futures``. In TorchScript, empty containers default
86-
to assuming they contain Tensor values, so we annotate the list constructor
87-
as being of type ``List[torch.jit.Future[torch.Tensor]]``.
84+
When we initialized an empty list of Futures, we had to add an explicit type annotation to ``futures``.
85+
In TorchScript, empty containers are assumed by default to contain Tensor values,
86+
so we annotated the list constructor
87+
as being of type ``List[torch.jit.Future[torch.Tensor]]``.
8888

89-
This example uses ``fork()`` to launch 100 instances of the function ``foo``,
90-
waits on the 100 tasks to complete, then sums the results, returning ``-100.0``.
89+
This example uses ``fork()`` to launch 100 instances of the function ``foo``, waits for the 100 tasks
90+
to complete, then sums the results, returning ``-100.0``.
9191

92-
Applied Example: Ensemble of Bidirectional LSTMs
93-
------------------------------------------------
92+
Applied Example: Ensemble of Bidirectional LSTMs
93+
------------------------------------------------
9494

95-
Let's try to apply parallelism to a more realistic example and see what sort
96-
of performance we can get out of it. First, let's define the baseline model: an
97-
ensemble of bidirectional LSTM layers.
95+
Let's apply parallelism to a more realistic example and see what performance we can get out of it.
96+
First, let's define the baseline model,
97+
an ensemble of bidirectional LSTM layers.
9898

9999
.. code-block:: python
100100
101101
import torch, time
102102
103-
# In RNN parlance, the dimensions we care about are:
104-
# # of time-steps (T)
105-
# Batch size (B)
106-
# Hidden size/number of "channels" (C)
103+
# In RNN parlance, the dimensions we care about are:
104+
# Number of time steps (T)
105+
# Batch size (B)
106+
# Hidden size/number of "channels" (C)
107107
T, B, C = 50, 50, 1024
108108
109-
# A module that defines a single "bidirectional LSTM". This is simply two
110-
# LSTMs applied to the same sequence, but one in reverse
109+
# A module that defines a single "bidirectional LSTM".
110+
# This is simply two LSTMs applied to the same sequence, one of them in reverse.
111111
class BidirectionalRecurrentLSTM(torch.nn.Module):
112112
def __init__(self):
113113
super().__init__()
114114
self.cell_f = torch.nn.LSTM(input_size=C, hidden_size=C)
115115
self.cell_b = torch.nn.LSTM(input_size=C, hidden_size=C)
116116
117117
def forward(self, x : torch.Tensor) -> torch.Tensor:
118-
# Forward layer
118+
# Forward layer
119119
output_f, _ = self.cell_f(x)
120120
121-
# Backward layer. Flip input in the time dimension (dim 0), apply the
122-
# layer, then flip the outputs in the time dimension
121+
# Backward layer. Flip the input in the time dimension (dim 0),
122+
# apply the layer, then flip the outputs in the time dimension.
123123
x_rev = torch.flip(x, dims=[0])
124124
output_b, _ = self.cell_b(torch.flip(x, dims=[0]))
125125
output_b_rev = torch.flip(output_b, dims=[0])
126126
127127
return torch.cat((output_f, output_b_rev), dim=2)
128128
129129
130-
# An "ensemble" of `BidirectionalRecurrentLSTM` modules. The modules in the
131-
# ensemble are run one-by-one on the same input then their results are
132-
# stacked and summed together, returning the combined result.
130+
# An "ensemble" of `BidirectionalRecurrentLSTM` modules.
131+
# The modules in the ensemble are run one by one on the same input,
132+
# and their results are stacked and summed into the combined result.
133133
class LSTMEnsemble(torch.nn.Module):
134134
def __init__(self, n_models):
135135
super().__init__()
@@ -143,110 +143,110 @@ ensemble of bidirectional LSTM layers.
143143
results.append(model(x))
144144
return torch.stack(results).sum(dim=0)
145145
146-
# For a head-to-head comparison to what we're going to do with fork/wait, let's
147-
# instantiate the model and compile it with TorchScript
146+
# For a head-to-head comparison with what we are going to do with fork/wait,
147+
# let's instantiate the model and compile it with TorchScript.
148148
ens = torch.jit.script(LSTMEnsemble(n_models=4))
149149
150-
# Normally you would pull this input out of an embedding table, but for the
151-
# purpose of this demo let's just use random data.
150+
# Normally you would pull this input out of an embedding table,
151+
# but for this demo we will just use random data.
152152
x = torch.rand(T, B, C)
153153
154-
# Let's run the model once to warm up things like the memory allocator
154+
# Run the model once first to warm up things like the memory allocator.
155155
ens(x)
156156
157157
x = torch.rand(T, B, C)
158158
159-
# Let's see how fast it runs!
159+
# Let's see how fast it runs!
160160
s = time.time()
161161
ens(x)
162162
print('Inference took', time.time() - s, ' seconds')
163163
164-
On my machine, this network runs in ``2.05`` seconds. We can do a lot better!
164+
On my machine, this network runs in ``2.05`` seconds. We can do a lot better!
165165
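A single ``time.time()`` measurement like the one above can be noisy. As a hedged sketch, a small helper (the name ``time_call`` and its parameters are hypothetical, not part of the tutorial) can warm up first and then report the best of several runs using ``time.perf_counter``:

```python
import time
from typing import Any, Callable


def time_call(fn: Callable[..., Any], *args: Any,
              warmup: int = 1, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn(*args)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload; in the tutorial this would be `time_call(ens, x)`.
elapsed = time_call(sum, range(100_000))
print(f"best of 5 runs: {elapsed:.6f} seconds")
```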

166-
Parallelizing Forward and Backward Layers
167-
-----------------------------------------
166+
Parallelizing Forward and Backward Layers
167+
-----------------------------------------
168168

169-
A very simple thing we can do is parallelize the forward and backward layers
170-
within ``BidirectionalRecurrentLSTM``. For this, the structure of the computation
171-
is static, so we don't actually even need any loops. Let's rewrite the ``forward``
172-
method of ``BidirectionalRecurrentLSTM`` like so:
169+
A very simple thing we can do is parallelize the forward and backward layers within ``BidirectionalRecurrentLSTM``.
170+
Here the structure of the computation is static, so we do not need any loops.
171+
Let's rewrite the ``forward`` method of ``BidirectionalRecurrentLSTM`` as follows:
173172

174173
.. code-block:: python
175174
176175
def forward(self, x : torch.Tensor) -> torch.Tensor:
177-
# Forward layer - fork() so this can run in parallel to the backward
178-
# layer
176+
177+
# Forward layer: fork() it so that it can run in parallel with the backward layer.
179178
future_f = torch.jit.fork(self.cell_f, x)
180179
181-
# Backward layer. Flip input in the time dimension (dim 0), apply the
182-
# layer, then flip the outputs in the time dimension
180+
# Backward layer. Flip the input in the time dimension (dim 0),
181+
# apply the layer, then flip the outputs in the time dimension.
183182
x_rev = torch.flip(x, dims=[0])
184183
output_b, _ = self.cell_b(torch.flip(x, dims=[0]))
185184
output_b_rev = torch.flip(output_b, dims=[0])
186185
187-
# Retrieve the output from the forward layer. Note this needs to happen
188-
# *after* the stuff we want to parallelize with
186+
# Retrieve the output of the forward layer.
187+
# Note that this must happen *after* the work we want to parallelize with.
189188
output_f, _ = torch.jit.wait(future_f)
190189
191190
return torch.cat((output_f, output_b_rev), dim=2)
192191
193-
In this example, ``forward()`` delegates execution of ``cell_f`` to another thread,
194-
while it continues to execute ``cell_b``. This causes the execution of both the
195-
cells to be overlapped with each other.
192+
In this example, ``forward()`` delegates the execution of ``cell_f`` to another thread
193+
while it continues to execute ``cell_b``.
194+
This causes the execution of the two cells to overlap.
196195

197-
Running the script again with this simple modification yields a runtime of
198-
``1.71`` seconds for an improvement of ``17%``!
199196

200-
Aside: Visualizing Parallelism
201-
------------------------------
197+
Running the script again with this simple modification yields
198+
a runtime of ``1.71`` seconds, an improvement of ``17%``!
202199

203-
We're not done optimizing our model but it's worth introducing the tooling we
204-
have for visualizing performance. One important tool is the `PyTorch profiler <https://pytorch.org/docs/stable/autograd.html#profiler>`_.
200+
Aside: Visualizing Parallelism
201+
------------------------------
205202

206-
Let's use the profiler along with the Chrome trace export functionality to
207-
visualize the performance of our parallelized model:
203+
We're not done optimizing our model yet, but this is a good point to introduce the tooling for visualizing performance.
204+
One important tool is the `PyTorch profiler <https://pytorch.org/docs/stable/autograd.html#profiler>`_.
205+
206+
Let's use the profiler together with the Chrome trace export functionality
207+
to visualize the performance of our parallelized model:
208208

209209
.. code-block:: python
210210
211211
with torch.autograd.profiler.profile() as prof:
212212
ens(x)
213213
prof.export_chrome_trace('parallel.json')
214214
215-
This snippet of code will write out a file named ``parallel.json``. If you
216-
navigate Google Chrome to ``chrome://tracing``, click the ``Load`` button, and
217-
load in that JSON file, you should see a timeline like the following:
215+
This snippet of code writes out a file named ``parallel.json``.
216+
If you navigate Google Chrome to ``chrome://tracing``, click the ``Load`` button,
217+
and load that JSON file, you should see a timeline like the following:
218218

219219
.. image:: https://i.imgur.com/rm5hdG9.png
220220

221-
The horizontal axis of the timeline represents time and the vertical axis
222-
represents threads of execution. As we can see, we are running two ``lstm``
223-
instances at a time. This is the result of our hard work parallelizing the
224-
bidirectional layers!
221+
The horizontal axis of the timeline represents time and the vertical axis represents threads of execution.
222+
As we can see, we are running two ``lstm`` instances at a time.
223+
This is the result of our hard work parallelizing
224+
the bidirectional (forward and backward) layers!
225225
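When a browser is not at hand, the same profiler object can also print a text summary. The sketch below uses a small ``torch.nn.Linear`` model and a temporary file path as stand-ins (both are assumptions for illustration) in place of the tutorial's ``ens`` and ``parallel.json``.

```python
import os
import tempfile

import torch

# Stand-in model and input; the tutorial profiles `ens(x)` instead.
model = torch.nn.Linear(16, 16)
x = torch.rand(4, 16)

with torch.autograd.profiler.profile() as prof:
    model(x)

# Write the Chrome trace to a temporary location.
trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
prof.export_chrome_trace(trace_path)

# Print a per-operator summary sorted by self CPU time.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
```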

226-
Parallelizing Models in the Ensemble
226+
Parallelizing Models in the Ensemble
227227
------------------------------------
228228

229-
You may have noticed that there is a further parallelization opportunity in our
230-
code: we can also run the models contained in ``LSTMEnsemble`` in parallel with
231-
each other. The way to do that is simple enough, this is how we should change
232-
the ``forward`` method of ``LSTMEnsemble``:
229+
You may have noticed that there is a further parallelization opportunity in this code:
230+
we can also run the models contained in ``LSTMEnsemble`` in parallel with each other.
231+
Doing so is simple enough.
232+
We just change the ``forward`` method of ``LSTMEnsemble``:
233233

234234
.. code-block:: python
235235
236236
def forward(self, x : torch.Tensor) -> torch.Tensor:
237-
# Launch tasks for each model
237+
# Launch a task for each model.
238238
futures : List[torch.jit.Future[torch.Tensor]] = []
239239
for model in self.models:
240240
futures.append(torch.jit.fork(model, x))
241241
242-
# Collect the results from the launched tasks
242+
# Collect the results from the launched tasks.
243243
results : List[torch.Tensor] = []
244244
for future in futures:
245245
results.append(torch.jit.wait(future))
246246
247247
return torch.stack(results).sum(dim=0)
248248
249-
Or, if you value brevity, we can use list comprehensions:
249+
Or, if you value brevity, you can use list comprehensions:
250250

251251
.. code-block:: python
252252
@@ -255,25 +255,25 @@ Or, if you value brevity, we can use list comprehensions:
255255
results = [torch.jit.wait(fut) for fut in futures]
256256
return torch.stack(results).sum(dim=0)
257257
258-
Like described in the intro, we've used loops to fork off tasks for each of the
259-
models in our ensemble. We've then used another loop to wait for all of the
260-
tasks to be completed. This provides even more overlap of computation.
258+
As described in the introduction, we used a loop to fork off a task for each model in the ensemble.
259+
We then used another loop to wait for all of the tasks to complete.
260+
This provides even more overlap of computation.
261261

262-
With this small update, the script runs in ``1.4`` seconds, for a total speedup
263-
of ``32%``! Pretty good for two lines of code.
262+
With this small update, the script runs in ``1.4`` seconds, for a total speedup of ``32%``!
263+
Pretty good for just two lines of code.
264264
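The quoted percentages are the reduction in runtime relative to the ``2.05`` second baseline. A short check of that arithmetic (the helper name ``speedup_percent`` is illustrative):

```python
def speedup_percent(baseline_s: float, optimized_s: float) -> float:
    """Runtime reduction relative to the baseline, in percent."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


baseline = 2.05  # seconds, sequential ensemble
forward_backward = speedup_percent(baseline, 1.71)  # forward/backward layers forked
full_ensemble = speedup_percent(baseline, 1.4)      # ensemble members forked as well

print(round(forward_backward))  # 17
print(round(full_ensemble))     # 32
```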

265-
We can also use the Chrome tracer again to see what's going on:
265+
We can also use the Chrome tracer again to see what is going on:
266266

267267
.. image:: https://i.imgur.com/kA0gyQm.png
268268

269-
We can now see that all ``LSTM`` instances are being run fully in parallel.
269+
We can now see that all ``LSTM`` instances are running fully in parallel.
270270

271-
Conclusion
271+
Conclusion
272272
----------
273273

274-
In this tutorial, we learned about ``fork()`` and ``wait()``, the basic APIs
275-
for doing dynamic, inter-op parallelism in TorchScript. We saw a few typical
276-
usage patterns for using these functions to parallelize the execution of
277-
functions, methods, or ``Modules`` in TorchScript code. Finally, we worked through
278-
an example of optimizing a model using this technique and explored the performance
279-
measurement and visualization tooling available in PyTorch.
274+
In this tutorial, we learned about ``fork()`` and ``wait()``, the basic APIs for
275+
doing dynamic, inter-op parallelism in TorchScript.
276+
We also saw a few typical usage patterns for using these functions to parallelize the execution of functions, methods, or
277+
``Modules`` in TorchScript code.
278+
Finally, we worked through an example of optimizing a model with this technique and explored
279+
the performance measurement and visualization tooling available in PyTorch.
