1- Dynamic Parallelism in TorchScript
2- ==================================
1+ Dynamic Parallelism in TorchScript
2+ ==================================
33
4- In this tutorial, we introduce the syntax for doing * dynamic inter-op parallelism *
5- in TorchScript. This parallelism has the following properties :
4+ This tutorial introduces the syntax for *dynamic inter-op parallelism* in TorchScript.
5+ This form of parallelism has the following properties:
66
7- * dynamic - The number of parallel tasks created and their workload can depend on the control flow of the program .
8- * inter-op - The parallelism is concerned with running TorchScript program fragments in parallel. This is distinct from * intra-op parallelism *, which is concerned with splitting up individual operators and running subsets of the operator's work in parallel .
9- Basic Syntax
7+ * dynamic - The number of parallel tasks created, and the workload of each, can depend on the program's control flow.
8+ * inter-op - The parallelism consists of running TorchScript program fragments in parallel. This is distinct from *intra-op parallelism*, which splits an individual operator and runs subsets of that operator's work in parallel.
9+ Basic Syntax
1010------------
1111
12- The two important APIs for dynamic parallelism are :
12+ The two key APIs for dynamic parallelism are:
1313
1414* ``torch.jit.fork(fn: Callable[..., T], *args, **kwargs) -> torch.jit.Future[T]``
1515* ``torch.jit.wait(fut: torch.jit.Future[T]) -> T``
1616
17- A good way to demonstrate how these work is by way of an example :
17+ The following example shows how they work:
1818
1919.. code-block :: python
2020
@@ -25,37 +25,37 @@ A good way to demonstrate how these work is by way of an example:
2525
2626 @torch.jit.script
2727 def example (x ):
28- # Call `foo` using parallelism:
29- # First, we "fork" off a task. This task will run `foo` with argument `x`
28+ # Call `foo` in parallel.
29+ # First, we "fork" off a task. This task runs `foo` with the argument `x`.
3030 future = torch.jit.fork(foo, x)
3131
32- # Call `foo` normally
32+ # Call `foo` normally.
3333 x_normal = foo(x)
3434
35- # Second, we "wait" on the task. Since the task may be running in
36- # parallel, we have to "wait" for its result to become available .
37- # Notice that by having lines of code between the "fork()" and "wait()"
38- # call for a given Future, we can overlap computations so that they
39- # run in parallel .
35+ # Second, we "wait" on the task.
36+ # Because the task may be running in parallel, we have to "wait" until its result is available.
37+ # Note that the code placed between the
38+ # "fork()" and "wait()" calls for a given Future
39+ # overlaps with the forked task, so the two computations run in parallel.
4040 x_parallel = torch.jit.wait(future)
4141
4242 return x_normal, x_parallel
4343
4444 print (example(torch.ones(1 ))) # (-1., -1.)
4545
4646
47- ``fork() `` takes the callable ``fn `` and arguments to that callable ``args ``
48- and ``kwargs `` and creates an asynchronous task for the execution of ``fn ``.
49- ``fn `` can be a function, method, or Module instance. `` fork() `` returns a
50- reference to the value of the result of this execution, called a `` Future `` .
51- Because ``fork `` returns immediately after creating the async task, `` fn `` may
52- not have been executed by the time the line of code after the `` fork() `` call
53- is executed. Thus , ``wait() `` is used to wait for the async task to complete
54- and return the value .
47+ ``fork()`` takes a callable ``fn`` and the arguments to that callable, ``args``
48+ and ``kwargs``, and creates an asynchronous task that executes ``fn``.
49+ ``fn`` can be a function, a method, or a Module instance.
50+ ``fork()`` returns a reference to the result of this execution, called a ``Future``.
51+ Because ``fork()`` returns immediately after creating the asynchronous task,
52+ ``fn`` may not have run by the time the line after the ``fork()`` call executes.
53+ ``wait()`` is therefore used to block until the asynchronous task completes
54+ and to return its value.
5555
56- These constructs can be used to overlap the execution of statements within a
57- function (shown in the worked example section) or be composed with other language
58- constructs like loops :
56+ These constructs can overlap the execution of statements within a function
57+ (shown in the worked example section) or be composed with other
58+ language constructs such as loops:
5959
6060.. code-block :: python
6161
@@ -81,55 +81,55 @@ constructs like loops:
8181
8282 .. note ::
8383
84- When we initialized an empty list of Futures, we needed to add an explicit
85- type annotation to `` futures ``. In TorchScript, empty containers default
86- to assuming they contain Tensor values, so we annotate the list constructor
87- # as being of type ``List[torch.jit.Future[torch.Tensor]] ``
84+ When we initialized the empty list of Futures, we had to add an explicit type annotation to ``futures``.
85+ In TorchScript, empty containers are assumed by default to hold Tensor values,
86+ so we annotated the list constructor
87+ as being of type ``List[torch.jit.Future[torch.Tensor]]``.
8888
89- This example uses ``fork() `` to launch 100 instances of the function ``foo ``,
90- waits on the 100 tasks to complete, then sums the results, returning ``-100.0 ``.
89+ This example uses ``fork()`` to launch 100 instances of the function ``foo``, waits for all 100 tasks
90+ to complete, and then sums the results, returning ``-100.0``.
9191
92- Applied Example: Ensemble of Bidirectional LSTMs
93- ------------------------------------------------
92+ Applied Example: Ensemble of Bidirectional LSTMs
93+ ------------------------------------------------
9494
95- Let's try to apply parallelism to a more realistic example and see what sort
96- of performance we can get out of it. First, let's define the baseline model: an
97- ensemble of bidirectional LSTM layers .
95+ Let's apply parallelism to a more realistic example and see what performance we can get.
96+ First, let's define the baseline model:
97+ an ensemble of bidirectional LSTM layers.
9898
9999.. code-block :: python
100100
101101 import torch, time
102102
103- # In RNN parlance, the dimensions we care about are :
104- # # of time-steps (T)
105- # Batch size (B)
106- # Hidden size/number of "channels" (C)
103+ # In RNN terminology, the dimensions we care about are:
104+ # number of time steps (T)
105+ # batch size (B)
106+ # hidden size / number of "channels" (C)
107107 T, B, C = 50 , 50 , 1024
108108
109- # A module that defines a single "bidirectional LSTM". This is simply two
110- # LSTMs applied to the same sequence, but one in reverse
109+ # A module that defines a single "bidirectional LSTM".
110+ # This is simply two LSTMs applied to the same sequence, with one of them applied in reverse.
111111 class BidirectionalRecurrentLSTM (torch .nn .Module ):
112112 def __init__ (self ):
113113 super ().__init__ ()
114114 self .cell_f = torch.nn.LSTM(input_size = C, hidden_size = C)
115115 self .cell_b = torch.nn.LSTM(input_size = C, hidden_size = C)
116116
117117 def forward (self , x : torch.Tensor) -> torch.Tensor:
118- # Forward layer
118+ # Forward layer
119119 output_f, _ = self .cell_f(x)
120120
121- # Backward layer. Flip input in the time dimension (dim 0), apply the
122- # layer, then flip the outputs in the time dimension
121+ # Backward layer. Flip the input along the time dimension (dim 0),
122+ # apply the layer, then flip the output along the time dimension.
123123 x_rev = torch.flip(x, dims = [0 ])
124124 output_b, _ = self .cell_b(torch.flip(x, dims = [0 ]))
125125 output_b_rev = torch.flip(output_b, dims = [0 ])
126126
127127 return torch.cat((output_f, output_b_rev), dim = 2 )
128128
129129
130- # An "ensemble" of `BidirectionalRecurrentLSTM` modules. The modules in the
131- # ensemble are run one-by-one on the same input then their results are
132- # stacked and summed together, returning the combined result .
130+ # An "ensemble" of `BidirectionalRecurrentLSTM` modules.
131+ # The modules in the ensemble are run one by one on the same input;
132+ # their results are then stacked and summed, and the combined result is returned.
133133 class LSTMEnsemble (torch .nn .Module ):
134134 def __init__ (self , n_models ):
135135 super ().__init__ ()
@@ -143,110 +143,110 @@ ensemble of bidirectional LSTM layers.
143143 results.append(model(x))
144144 return torch.stack(results).sum(dim = 0 )
145145
146- # For a head-to-head comparison to what we're going to do with fork/wait, let's
147- # instantiate the model and compile it with TorchScript
146+ # For a head-to-head comparison with what we are about to do with fork/wait,
147+ # let's instantiate the model and compile it with TorchScript.
148148 ens = torch.jit.script(LSTMEnsemble(n_models = 4 ))
149149
150- # Normally you would pull this input out of an embedding table, but for the
151- # purpose of this demo let's just use random data .
150+ # Normally this input would come from an embedding table,
151+ # but for this demo we just use random data.
152152 x = torch.rand(T, B, C)
153153
154- # Let's run the model once to warm up things like the memory allocator
154+ # Run the model once first to warm up things like the memory allocator.
155155 ens(x)
156156
157157 x = torch.rand(T, B, C)
158158
159- # Let's see how fast it runs !
159+ # Let's see how fast it runs!
160160 s = time.time()
161161 ens(x)
162162 print (' Inference took' , time.time() - s, ' seconds' )
163163
164- On my machine, this network runs in ``2.05 `` seconds. We can do a lot better !
164+ On my machine, this network ran in ``2.05`` seconds. We can do much better!
165165
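A single wall-clock measurement can be noisy. As a rough sketch that is not part of the original script, a helper such as the hypothetical ``time_call`` below repeats the call a few times after a warm-up and reports the fastest run. It uses ``time.perf_counter``, a monotonic high-resolution clock from the standard library.

```python
import time
from typing import Callable

def time_call(fn: Callable[[], object], warmup: int = 1, repeats: int = 5) -> float:
    """Return the fastest wall-clock time, in seconds, over `repeats` calls."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; with the model above this would be `lambda: ens(x)`.
elapsed = time_call(lambda: sum(range(100_000)))
print("Fastest run took", elapsed, "seconds")
```

Reporting the minimum rather than the mean is a design choice: it filters out interference from other processes, at the cost of hiding run-to-run variance.
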
166- Parallelizing Forward and Backward Layers
167- -----------------------------------------
166+ Parallelizing Forward and Backward Layers
167+ -----------------------------------------
168168
169- A very simple thing we can do is parallelize the forward and backward layers
170- within ``BidirectionalRecurrentLSTM ``. For this, the structure of the computation
171- is static, so we don't actually even need any loops. Let's rewrite the ``forward ``
172- method of ``BidirectionalRecurrentLSTM `` like so:
169+ One simple thing we can do is parallelize the forward and backward layers within ``BidirectionalRecurrentLSTM``.
170+ Here the structure of the computation is static, so we do not need any loops.
171+ Let's rewrite the ``forward`` method of ``BidirectionalRecurrentLSTM`` as follows:
173172
174173.. code-block :: python
175174
176175 def forward (self , x : torch.Tensor) -> torch.Tensor:
177- # Forward layer - fork() so this can run in parallel to the backward
178- # layer
176+
177+ # Forward layer: fork() it so that it can run in parallel with the backward layer.
179178 future_f = torch.jit.fork(self .cell_f, x)
180179
181- # Backward layer. Flip input in the time dimension (dim 0), apply the
182- # layer, then flip the outputs in the time dimension
180+ # Backward layer. Flip the input along the time dimension (dim 0),
181+ # apply the layer, then flip the output along the time dimension.
183182 x_rev = torch.flip(x, dims = [0 ])
184183 output_b, _ = self .cell_b(torch.flip(x, dims = [0 ]))
185184 output_b_rev = torch.flip(output_b, dims = [0 ])
186185
187- # Retrieve the output from the forward layer. Note this needs to happen
188- # *after* the stuff we want to parallelize with
186+ # Retrieve the output of the forward layer.
187+ # Note that this must happen *after* the work we want to run in parallel with it.
189188 output_f, _ = torch.jit.wait(future_f)
190189
191190 return torch.cat((output_f, output_b_rev), dim = 2 )
192191
193- In this example , ``forward() `` delegates execution of `` cell_f `` to another thread,
194- while it continues to execute `` cell_b ``. This causes the execution of both the
195- cells to be overlapped with each other .
192+ In this example, ``forward()`` delegates the execution of ``cell_f`` to another thread
193+ while it continues to execute ``cell_b``.
194+ As a result, the execution of the two cells overlaps.
196195
197- Running the script again with this simple modification yields a runtime of
198- ``1.71 `` seconds for an improvement of ``17% ``!
199196
200- Aside: Visualizing Parallelism
201- ------------------------------
197+ Running the script again with this simple modification yields
198+ a runtime of ``1.71`` seconds, an improvement of ``17%``!
202199
203- We're not done optimizing our model but it's worth introducing the tooling we
204- have for visualizing performance. One important tool is the ` PyTorch profiler < https://pytorch.org/docs/stable/autograd.html#profiler >`_.
200+ Aside: Visualizing Parallelism
201+ ------------------------------
205202
206- Let's use the profiler along with the Chrome trace export functionality to
207- visualize the performance of our parallelized model:
203+ We are not done optimizing the model, but this is a good point to introduce the tooling for visualizing performance.
204+ One important tool is the `PyTorch profiler <https://pytorch.org/docs/stable/autograd.html#profiler>`_.
205+
206+ Let's use the profiler together with its Chrome trace export functionality
207+ to visualize the performance of the parallelized model:
208208
209209.. code-block :: python
210210
211211 with torch.autograd.profiler.profile() as prof:
212212 ens(x)
213213 prof.export_chrome_trace(' parallel.json' )
214214
215- This snippet of code will write out a file named ``parallel.json ``. If you
216- navigate Google Chrome to ``chrome://tracing ``, click the ``Load `` button, and
217- load in that JSON file, you should see a timeline like the following :
215+ This snippet writes out a file named ``parallel.json``.
216+ If you navigate Google Chrome to ``chrome://tracing``, click the ``Load`` button,
217+ and load that JSON file, you should see a timeline like the following:
218218
219219.. image :: https://i.imgur.com/rm5hdG9.png
220220
221- The horizontal axis of the timeline represents time and the vertical axis
222- represents threads of execution. As we can see, we are running two ``lstm ``
223- instances at a time. This is the result of our hard work parallelizing the
224- bidirectional layers !
221+ The horizontal axis of the timeline represents time, and the vertical axis represents threads of execution.
222+ As we can see, two ``lstm`` instances are running at a time.
223+ This is the result of our work parallelizing
224+ the bidirectional (forward and backward) layers!
225225
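When a browser is not at hand, the same profiler object can also print a plain-text summary. The sketch below profiles a small stand-in function rather than the ensemble; ``work`` is a hypothetical name, and the operators listed will differ from those of the LSTM model.

```python
import torch

# Hypothetical stand-in workload; with the tutorial's model this would be `ens(x)`.
def work(x: torch.Tensor) -> torch.Tensor:
    return torch.mm(x, x).relu().sum()

x = torch.rand(64, 64)
with torch.autograd.profiler.profile() as prof:
    work(x)

# Aggregate events by operator name and show the most expensive ones on the CPU.
summary = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(summary)
```

The table aggregates time per operator, so it complements the trace view: the trace shows which operations overlap, while the table shows where the time goes overall.
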
226- Parallelizing Models in the Ensemble
226+ Parallelizing Models in the Ensemble
227227------------------------------------
228228
229- You may have noticed that there is a further parallelization opportunity in our
230- code: we can also run the models contained in ``LSTMEnsemble `` in parallel with
231- each other. The way to do that is simple enough, this is how we should change
232- the ``forward `` method of `` LSTMEnsemble `` :
229+ You may have noticed that the code offers a further parallelization opportunity:
230+ the models contained in ``LSTMEnsemble`` can also run in parallel with one another.
231+ Doing so is straightforward.
232+ We change the ``forward`` method of ``LSTMEnsemble`` as follows:
233233
234234.. code-block :: python
235235
236236 def forward (self , x : torch.Tensor) -> torch.Tensor:
237- # Launch tasks for each model
237+ # Launch a task for each model.
238238 futures : List[torch.jit.Future[torch.Tensor]] = []
239239 for model in self .models:
240240 futures.append(torch.jit.fork(model, x))
241241
242- # Collect the results from the launched tasks
242+ # Collect the results from the launched tasks.
243243 results : List[torch.Tensor] = []
244244 for future in futures:
245245 results.append(torch.jit.wait(future))
246246
247247 return torch.stack(results).sum(dim = 0 )
248248
249- Or, if you value brevity, we can use list comprehensions:
249+ Or, if you value brevity, you can use list comprehensions:
250250
251251.. code-block :: python
252252
@@ -255,25 +255,25 @@ Or, if you value brevity, we can use list comprehensions:
255255 results = [torch.jit.wait(fut) for fut in futures]
256256 return torch.stack(results).sum(dim = 0 )
257257
258- Like described in the intro, we've used loops to fork off tasks for each of the
259- models in our ensemble. We've then used another loop to wait for all of the
260- tasks to be completed. This provides even more overlap of computation .
258+ As described in the introduction, we used a loop to fork off a task for each model in the ensemble.
259+ We then used a second loop to wait for all of the tasks to complete.
260+ This provides even more overlap of computation.
261261
262- With this small update, the script runs in ``1.4 `` seconds, for a total speedup
263- of `` 32% ``! Pretty good for two lines of code .
262+ With this small update, the script runs in ``1.4`` seconds, a total speedup of ``32%``!
263+ Not bad for just two lines of code.
264264
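The quoted percentages are reductions in runtime relative to the ``2.05`` second baseline. The short calculation below reproduces them from the three timings reported in this tutorial; the helper name ``reduction_pct`` is hypothetical.

```python
baseline = 2.05   # seconds, sequential ensemble
bidir = 1.71      # seconds, forward/backward layers forked
ensemble = 1.4    # seconds, ensemble members forked as well

def reduction_pct(before: float, after: float) -> float:
    """Percentage by which the runtime shrank relative to `before`."""
    return 100.0 * (before - after) / before

print(round(reduction_pct(baseline, bidir)))     # 17
print(round(reduction_pct(baseline, ensemble)))  # 32
```

Expressed as a throughput ratio instead, ``2.05 / 1.4`` is roughly a 1.46x speedup.
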
265- We can also use the Chrome tracer again to see where's going on :
265+ We can also use the Chrome tracer again to see what is going on:
266266
267267.. image :: https://i.imgur.com/kA0gyQm.png
268268
269- We can now see that all ``LSTM `` instances are being run fully in parallel .
269+ We can now see that all ``LSTM`` instances run fully in parallel.
270270
271- Conclusion
271+ Conclusion
272272----------
273273
274- In this tutorial, we learned about `` fork() `` and `` wait() ``, the basic APIs
275- for doing dynamic, inter-op parallelism in TorchScript. We saw a few typical
276- usage patterns for using these functions to parallelize the execution of
277- functions, methods, or ``Modules `` in TorchScript code. Finally, we worked through
278- an example of optimizing a model using this technique and explored the performance
279- measurement and visualization tooling available in PyTorch .
274+ In this tutorial, we learned about ``fork()`` and ``wait()``,
275+ the basic APIs for dynamic, inter-op parallelism in TorchScript.
276+ We saw a few typical usage patterns for using these functions to parallelize the execution of functions, methods, or
277+ ``Modules`` in TorchScript code.
278+ Finally, we worked through an example of optimizing a model with this technique and explored
279+ the performance measurement and visualization tooling available in PyTorch.