22
33"""
44.. meta::
5- :description: An end-to-end example of how to use AOTInductor for Python runtime.
5+ :description: An end-to-end example of using AOTInductor in the Python runtime.
66 :keywords: torch.export, AOTInductor, torch._inductor.aoti_compile_and_package, aot_compile, torch._export.aoti_load_package
77
8- ``torch.export`` AOTInductor Tutorial for Python runtime (Beta)
9- ===============================================================
10- **Author:** Ankith Gunapal, Bin Bao, Angela Yi
8+ (Beta) ``torch.export`` AOTInductor Tutorial for Python runtime
9+ ==================================================================
10+ **Author:** Ankith Gunapal, Bin Bao, Angela Yi
11+ **Translator:** `jykimai <https://github.com/jykimai>`_
1112"""
1213
1314######################################################################
1415#
1516# .. warning::
1617#
17- # ``torch._inductor.aoti_compile_and_package`` and
18- # ``torch._inductor.aoti_load_package`` are in Beta status and are subject
19- # to backwards compatibility breaking changes. This tutorial provides an
20- # example of how to use these APIs for model deployment using Python
21- # runtime.
18+ # ``torch._inductor.aoti_compile_and_package`` and
19+ # ``torch._inductor.aoti_load_package`` are in Beta status and are subject to
20+ # backward-compatibility-breaking changes. This tutorial shows, by example,
21+ # how to use these APIs to deploy a model with the Python runtime.
2222#
23- # It has been shown `previously
24- # <https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html#>`__ how
25- # AOTInductor can be used to do Ahead-of-Time compilation of PyTorch exported
26- # models by creating an artifact that can be run in a non-Python environment.
27- # In this tutorial, you will learn an end-to-end example of how to use
28- # AOTInductor for Python runtime.
23+ # The `earlier documentation <https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html#>`__ showed
24+ # how to use AOTInductor to compile a PyTorch exported model ahead of time,
25+ # producing an artifact that can also run in a non-Python environment.
26+ # This tutorial walks through an end-to-end example of using AOTInductor in the Python runtime.
2927#
30- # **Contents**
28+ # **Contents**
3129#
3230# .. contents::
3331# :local:
3432
3533######################################################################
36- # Prerequisites
34+ # Prerequisites
3735# -------------
38- # * PyTorch 2.6 or later
39- # * Basic understanding of ``torch.export`` and AOTInductor
40- # * Complete the `AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models <https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html#>`_ tutorial
36+ # * PyTorch 2.6 or later
37+ # * Basic understanding of ``torch.export`` and AOTInductor
38+ # * Completion of the `AOTInductor: Ahead-of-Time Compilation for Torch.Export-ed Models <https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html#>`_ tutorial
4139
4240######################################################################
43- # What you will learn
41+ # What you will learn
4442# ----------------------
45- # * How to use AOTInductor for Python runtime.
46- # * How to use :func:`torch._inductor.aoti_compile_and_package` along with :func:`torch.export.export` to generate a compiled artifact
47- # * How to load and run the artifact in a Python runtime using :func:`torch._export.aot_load`.
48- # * When to you use AOTInductor with a Python runtime
43+ # * How to use AOTInductor in the Python runtime
44+ # * How to use :func:`torch._inductor.aoti_compile_and_package` together with :func:`torch.export.export` to generate a compiled artifact
45+ # * How to load and run the artifact in the Python runtime using :func:`torch._export.aot_load`
46+ # * When to use AOTInductor with the Python runtime
4947
5048######################################################################
51- # Model Compilation
49+ # Model Compilation
5250# -----------------
5351#
54- # We will use the TorchVision pretrained ``ResNet18`` model as an example.
52+ # We use the TorchVision pretrained ``ResNet18`` model as an example.
5553#
56- # The first step is to export the model to a graph representation using
57- # :func:`torch.export.export`. To learn more about using this function, you can
58- # check out the `docs <https://pytorch.org/docs/main/export.html>`_ or the
59- # `tutorial <https://tutorials.pytorch.kr/intermediate/torch_export_tutorial.html>`_.
54+ # The first step is to export the model to a graph representation using
55+ # :func:`torch.export.export`. To learn more about this function, see the
56+ # `docs <https://pytorch.org/docs/main/export.html>`_ or the
57+ # `tutorial <https://tutorials.pytorch.kr/intermediate/torch_export_tutorial.html>`_.
6058#
61- # Once we have exported the PyTorch model and obtained an ``ExportedProgram``,
62- # we can apply :func:`torch._inductor.aoti_compile_and_package` to AOTInductor
63- # to compile the program to a specified device, and save the generated contents
64- # into a ".pt2" artifact.
59+ # Once the PyTorch model has been exported and we have an ``ExportedProgram``,
60+ # we can pass it to :func:`torch._inductor.aoti_compile_and_package`, which uses AOTInductor
61+ # to compile the program for a specified device and saves the generated contents as a ".pt2" artifact.
6562#
6663# .. note::
6764#
68- # This API supports the same available options that :func:`torch.compile`
69- # has, such as ``mode`` and ``max_autotune`` (for those who want to enable
70- # CUDA graphs and leverage Triton based matrix multiplications and
71- # convolutions)
65+ # This API supports the same options as :func:`torch.compile`,
66+ # such as ``mode`` and ``max_autotune``,
67+ # for those who want to enable CUDA graphs and leverage Triton-based matrix multiplications and
68+ # convolutions.
7269
7370import os
7471import torch
10198 )
10299
103100######################################################################
104- # The result of :func:`aoti_compile_and_package` is an artifact "resnet18.pt2"
105- # which can be loaded and executed in Python and C++.
101+ # :func:`aoti_compile_and_package` produces the artifact "resnet18.pt2",
102+ # which can be loaded and executed in both Python and C++.
106103#
107- # The artifact itself contains a bunch of AOTInductor generated code, such as
108- # a generated C++ runner file, a shared library compiled from the C++ file, and
109- # CUDA binary files, aka cubin files, if optimizing for CUDA.
104+ # The artifact itself contains a variety of AOTInductor-generated code,
105+ # such as a generated C++ runner file, a shared library compiled from the C++ file,
106+ # and, when optimizing for CUDA, CUDA binary files (cubin files).
110107#
111- # Structure-wise, the artifact is a structured ``.zip`` file, with the following
112- # specification:
108+ # Structurally, the artifact is a structured ``.zip`` file with the following specification:
113109#
114110# .. code::
115111# .
118114# ├── data
119115# │   ├── aotinductor
120116# │   │   └── model
121- # │   │       ├── xxx.cpp            # AOTInductor generated cpp file
122- # │   │       ├── xxx.so             # AOTInductor generated shared library
123- # │   │       ├── xxx.cubin          # Cubin files (if running on CUDA)
124- # │   │       └── xxx_metadata.json  # Additional metadata to save
117+ # │   │       ├── xxx.cpp            # AOTInductor generated cpp file
118+ # │   │       ├── xxx.so             # AOTInductor generated shared library
119+ # │   │       ├── xxx.cubin          # Cubin files (if running on CUDA)
120+ # │   │       └── xxx_metadata.json  # Additional metadata to save
125121# │   ├── weights
126122# │   │   └── TBD
127123# │   └── constants
128124# │       └── TBD
129125# └── extra
130126#     └── metadata.json
131127#
132- # We can use the following command to inspect the artifact contents:
128+ # We can use the following command to inspect the artifact contents:
129+ #
130+ # .. code:: bash
131+ #
132+ # $ unzip -l resnet18.pt2
133133#
134134# .. code:: bash
135135#
163163
164164
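Because the artifact is an ordinary ``.zip`` archive, its contents can also be listed from Python with the standard-library ``zipfile`` module instead of ``unzip -l``. The following is a minimal sketch, assuming the package was written to ``resnet18.pt2`` in the working directory; the helper name ``list_pt2_contents`` is hypothetical and is not part of PyTorch.

```python
import os
import zipfile


def list_pt2_contents(package_path):
    """Return (name, size_in_bytes) pairs for every entry in a .pt2 archive.

    Hypothetical helper, not a PyTorch API: it relies only on the fact
    that the package is a zip file.
    """
    with zipfile.ZipFile(package_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Assumed path: the artifact produced by the compilation step.
package_path = "resnet18.pt2"
if os.path.exists(package_path):
    for name, size in list_pt2_contents(package_path):
        print(f"{size:>12}  {name}")
```

The sketch only reads the archive index, so it does not load the model or require a GPU.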
165165######################################################################
166- # Model Inference in Python
166+ # Model Inference in Python
167167# -------------------------
168168#
169- # To load and run the artifact in Python, we can use :func:`torch._inductor.aoti_load_package`.
169+ # To load and run the artifact in Python, we can use :func:`torch._inductor.aoti_load_package`.
170170#
171171
172172import os
183183
184184
185185######################################################################
186- # When to use AOTInductor with a Python Runtime
186+ # When to use AOTInductor with a Python Runtime
187187# ---------------------------------------------
188188#
189- # There are mainly two reasons why one would use AOTInductor with a Python Runtime:
189+ # There are two main reasons to use AOTInductor with a Python runtime:
190190#
191- # - ``torch._inductor.aoti_compile_and_package`` generates a singular
192- # serialized artifact. This is useful for model versioning for deployments
193- # and tracking model performance over time.
194- # - With :func:`torch.compile` being a JIT compiler, there is a warmup
195- # cost associated with the first compilation. Your deployment needs to
196- # account for the compilation time taken for the first inference. With
197- # AOTInductor, the compilation is done ahead of time using
198- # ``torch.export.export`` and ``torch._inductor.aoti_compile_and_package``.
199- # At deployment time, after loading the model, running inference does not
200- # have any additional cost.
191+ # - ``torch._inductor.aoti_compile_and_package`` generates a single serialized artifact.
192+ # This is useful for versioning models for deployment and for tracking model performance over time.
193+ # - Because :func:`torch.compile` is a JIT compiler, the first compilation incurs a warmup cost.
194+ # A deployment therefore has to account for the compilation time spent on the first inference.
195+ # With AOTInductor, the compilation is done ahead of time using ``torch.export.export`` and
196+ # ``torch._inductor.aoti_compile_and_package``.
197+ # At deployment time, after the model is loaded, running inference incurs no additional compilation cost.
201198#
202199#
203- # The section below shows the speedup achieved with AOTInductor for first inference
200+ # The section below shows the speedup that AOTInductor achieves for the first inference.
204201#
205- # We define a utility function ``timed`` to measure the time taken for inference
202+ # We define a utility function ``timed`` to measure the time taken for inference.
206203#
207204
208205import time
209206def timed(fn):
210- # Returns the result of running `fn()` and the time it took for `fn()` to run,
211- # in seconds. We use CUDA events and synchronization for accurate
212- # measurement on CUDA enabled devices.
207+ # Returns the result of running `fn()` and the time `fn()` took to run,
208+ # in milliseconds. CUDA events and synchronization are used for accurate
209+ # measurement on CUDA-enabled devices.
213210     if torch.cuda.is_available():
214211         start = torch.cuda.Event(enable_timing=True)
215212         end = torch.cuda.Event(enable_timing=True)
@@ -224,7 +221,7 @@ def timed(fn):
224221     else:
225222         end = time.time()
226223
227- # Measure time taken to execute the function in miliseconds
224+ # Measure the time taken to execute the function, in milliseconds
228225     if torch.cuda.is_available():
229226         duration = start.elapsed_time(end)
230227     else:
@@ -234,7 +231,7 @@ def timed(fn):
234231
235232
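On a machine without CUDA, the same measurement reduces to wall-clock timing. The following is a minimal CPU-only sketch, assuming ``time.perf_counter`` as the clock (the tutorial's own fallback branch uses ``time.time``); the name ``timed_cpu`` is hypothetical and exists only for illustration.

```python
import time


def timed_cpu(fn):
    """Run fn() once and return (result, elapsed time in milliseconds).

    Hypothetical CPU-only variant of the tutorial's ``timed`` helper.
    """
    start = time.perf_counter()
    result = fn()
    end = time.perf_counter()
    # perf_counter() reports seconds, so scale to milliseconds.
    return result, (end - start) * 1000.0


# Usage: time a small pure-Python workload.
total, elapsed_ms = timed_cpu(lambda: sum(range(100_000)))
print(f"sum={total}, took {elapsed_ms:.2f} ms")
```

Wall-clock timing is not sufficient on CUDA devices because kernels run asynchronously, which is why the tutorial's helper switches to CUDA events and synchronization there.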
236233######################################################################
237- # Lets measure the time for first inference using AOTInductor
234+ # Let's measure the time for the first inference using AOTInductor.
238235
239236torch._dynamo.reset()
240237
@@ -247,7 +244,7 @@ def timed(fn):
247244
248245
249246######################################################################
250- # Lets measure the time for first inference using ``torch.compile``
247+ # Let's measure the time for the first inference using ``torch.compile``.
251248
252249torch._dynamo.reset()
253250
@@ -262,15 +259,13 @@ def timed(fn):
262259     print(f"Time taken for first inference for torch.compile is {time_taken:.2f} ms")
263260
264261######################################################################
265- # We see that there is a drastic speedup in first inference time using AOTInductor compared
266- # to ``torch.compile``
262+ # We see that AOTInductor gives a drastic speedup in first inference time compared to ``torch.compile``.
267263
268264######################################################################
269- # Conclusion
265+ # Conclusion
270266# ----------
271267#
272- # In this recipe, we have learned how to effectively use the AOTInductor for Python runtime by
273- # compiling and loading a pretrained ``ResNet18`` model. This process
274- # demonstrates the practical application of generating a compiled artifact and
275- # running it within a Python environment. We also looked at the advantage of using
276- # AOTInductor in model deployments, with regards to speed up in first inference time.
268+ # In this tutorial, we learned how to use AOTInductor effectively in the Python runtime
269+ # by compiling and loading a pretrained ``ResNet18`` model.
270+ # The process demonstrates the practical steps of generating a compiled artifact and running it in a Python environment.
271+ # We also looked at the advantage of using AOTInductor in model deployment: a shorter first inference time.