|
18 | 18 | PyProf - PyTorch Profiling tool |
19 | 19 | =============================== |
20 | 20 |
|
| 21 | + **NOTE: You are currently on the r20.09 branch which tracks |
| 22 | + stabilization towards the release. This branch is not usable |
| 23 | + during stabilization.** |
| 24 | + |
21 | 25 | .. overview-begin-marker-do-not-remove |
22 | 26 |
|
23 | | -PyProf is a tool that profiles and analyzes the GPU performance of PyTorch |
24 | | -models. PyProf aggregates kernel performance from `Nsight Systems |
25 | | -<https://developer.nvidia.com/nsight-systems>`_ or `NvProf |
26 | | -<https://developer.nvidia.com/nvidia-visual-profiler>`_. |
27 | | - |
28 | | -What's New in 3.4.0 |
29 | | -------------------- |
30 | | - |
31 | | -* README and User Guide documentation has been updated with more installation |
32 | | - options and pointers |
33 | | - |
34 | | -Known Issues |
35 | | ------------- |
36 | | - |
37 | | -* Forward-Backward kernel correlation heuristics do not work correctly with |
38 | | - PyTorch 1.6. Recommended workarounds include: |
39 | | - |
40 | | - * Use with PyTorch 1.5 |
41 | | - * Use DLProf in the `20.09 NGC PyTorch container <https://ngc.nvidia.com/catalog/containers/nvidia:pytorch>`_ |
42 | | - |
43 | | -Features |
44 | | --------- |
45 | | - |
46 | | -* Identifies the layer that launched a kernel: e.g. the association of |
47 | | - `ComputeOffsetsKernel` with a concrete PyTorch layer or API is not obvious. |
48 | | - |
49 | | -* Identifies the tensor dimensions and precision: without knowing the tensor |
50 | | - dimensions and precision, it's impossible to reason about whether the actual |
51 | | - (silicon) kernel time is close to maximum performance of such a kernel on |
52 | | - the GPU. Knowing the tensor dimensions and precision, we can figure out the |
53 | | - FLOPs and bandwidth required by a layer, and then determine how close to |
54 | | - maximum performance the kernel is for that operation. |
55 | | - |
56 | | -* Forward-backward correlation: PyProf determines what the forward pass step |
57 | | - is that resulted in the particular weight and data gradients (wgrad, dgrad), |
58 | | - which makes it possible to determine the tensor dimensions required by these |
59 | | - backprop steps to assess their performance. |
60 | | - |
61 | | -* Determines Tensor Core usage: PyProf can highlight the kernels that use |
62 | | - `Tensor Cores <https://developer.nvidia.com/tensor-cores>`_. |
63 | | - |
64 | | -* Correlate the line in the user's code that launched a particular kernel (program trace). |
65 | | - |
66 | 27 | .. overview-end-marker-do-not-remove |
67 | 28 |
|
68 | | -The current release of PyProf is 3.4.0 and is available in the 20.09 release of |
69 | | -the PyTorch container on `NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The |
70 | | -branch for this release is `r20.09 |
71 | | -<https://github.com/NVIDIA/PyProf/tree/r20.09>`_. |
72 | | - |
73 | | -Quick Installation Instructions |
74 | | -------------------------------- |
75 | | - |
76 | 29 | .. quick-install-start-marker-do-not-remove |
77 | 30 |
|
78 | | -* Clone the git repository :: |
79 | | - |
80 | | - $ git clone https://github.com/NVIDIA/PyProf.git |
81 | | - |
82 | | -* Navigate to the top level PyProf directory |
83 | | - |
84 | | -* Install PyProf :: |
85 | | - |
86 | | - $ pip install . |
87 | | - |
88 | | -* Verify installation is complete with pip list :: |
89 | | - |
90 | | - $ pip list | grep pyprof |
91 | | - |
92 | | -* Should display :: |
93 | | - |
94 | | - pyprof 3.3.0.dev0 |
95 | | - |
96 | 31 | .. quick-install-end-marker-do-not-remove |
97 | 32 |
|
98 | | -Quick Start Instructions |
99 | | ------------------------- |
100 | | - |
101 | 33 | .. quick-start-start-marker-do-not-remove |
102 | 34 |
|
103 | | -* Add the following lines to the PyTorch network you want to profile: :: |
104 | | - |
105 | | - import torch.cuda.profiler as profiler |
106 | | - import pyprof |
107 | | - pyprof.init() |
108 | | - |
109 | | -* Profile with NVProf or Nsight Systems to generate a SQL file. :: |
110 | | - |
111 | | - $ nsys profile -f true -o net --export sqlite python net.py |
112 | | - |
113 | | -* Run the parse.py script to generate the dictionary. :: |
114 | | - |
115 | | - $ python -m pyprof.parse net.sqlite > net.dict |
116 | | - |
117 | | -* Run the prof.py script to generate the reports. :: |
118 | | - |
119 | | - $ python -m pyprof.prof --csv net.dict |
120 | | - |
121 | 35 | .. quick-start-end-marker-do-not-remove |
122 | 36 |
|
123 | | -Documentation |
124 | | -------------- |
125 | | - |
126 | | -The User Guide can be found in the |
127 | | -`documentation for current release |
128 | | -<https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/index.html>`_, and |
129 | | -provides instructions on how to install and profile with PyProf. |
130 | | - |
131 | | -A complete `Quick Start Guide <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/quickstart.html>`_ |
132 | | -provides step-by-step instructions to get you quickly started using PyProf. |
133 | | - |
134 | | -An `FAQ <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/faqs.html>`_ provides |
135 | | -answers for frequently asked questions. |
136 | | - |
137 | | -The `Release Notes |
138 | | -<https://docs.nvidia.com/deeplearning/frameworks/pyprof-release-notes/index.html>`_ |
139 | | -indicate the required versions of the NVIDIA Driver and CUDA, and also describe |
140 | | -which GPUs are supported by PyProf. |
141 | | - |
142 | | -Presentation and Papers |
143 | | -^^^^^^^^^^^^^^^^^^^^^^^ |
144 | | - |
145 | | -* `Automating End-to-End PyTorch Profiling <https://developer.nvidia.com/gtc/2020/video/s21143>`_. |
146 | | - * `Presentation slides <https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21143-automating-end-to-end-pytorch-profiling.pdf>`_. |
147 | | - |
148 | | -Contributing |
149 | | ------------- |
150 | | - |
151 | | -Contributions to PyProf are more than welcome. To |
152 | | -contribute make a pull request and follow the guidelines outlined in |
153 | | -the `Contributing <CONTRIBUTING.md>`_ document. |
154 | | - |
155 | | -Reporting problems, asking questions |
156 | | ------------------------------------- |
157 | | - |
158 | | -We appreciate any feedback, questions or bug reporting regarding this |
159 | | -project. When help with code is needed, follow the process outlined in |
160 | | -the Stack Overflow (https://stackoverflow.com/help/mcve) |
161 | | -document. Ensure posted examples are: |
162 | | - |
163 | | -* minimal – use as little code as possible that still produces the |
164 | | - same problem |
165 | | - |
166 | | -* complete – provide all parts needed to reproduce the problem. Check |
167 | | - if you can strip external dependency and still show the problem. The |
168 | | - less time we spend on reproducing problems the more time we have to |
169 | | - fix it |
170 | | - |
171 | | -* verifiable – test the code you're about to provide to make sure it |
172 | | - reproduces the problem. Remove all other problems that are not |
173 | | - related to your request/question. |
174 | | - |
175 | 37 | .. |License| image:: https://img.shields.io/badge/License-Apache2-green.svg |
176 | 38 | :target: http://www.apache.org/licenses/LICENSE-2.0 |