 PyProf - PyTorch Profiling tool
 ===============================
 
-**ANNOUNCEMENT: The default branch for PyProf has changed to 'main'. Please
-update all pulls and PRs accordingly.**
-
-**LATEST RELEASE: You are currently working on the main branch which
-tracks under-development progress towards the next release. The
-latest release of the PyProf is 3.7.0 and is available on branch** `r20.12
-<https://github.com/NVIDIA/PyProf/blob/r20.12>`_.
+**NOTE: You are currently on the r21.03 branch which tracks stabilization
+towards the release. This branch is not usable during stabilization.**
 
 .. overview-begin-marker-do-not-remove
 
-PyProf is a tool that profiles and analyzes the GPU performance of PyTorch
-models. PyProf aggregates kernel performance from `Nsight Systems
-<https://developer.nvidia.com/nsight-systems>`_ or `NvProf
-<https://developer.nvidia.com/nvidia-visual-profiler>`_ and provides the
-following additional features:
-
-* Identifies the layer that launched a kernel: e.g. the association of
-  `ComputeOffsetsKernel` with a concrete PyTorch layer or API is not obvious.
-
-* Identifies the tensor dimensions and precision: without knowing the tensor
-  dimensions and precision, it's impossible to reason about whether the actual
-  (silicon) kernel time is close to maximum performance of such a kernel on
-  the GPU. Knowing the tensor dimensions and precision, we can figure out the
-  FLOPs and bandwidth required by a layer, and then determine how close to
-  maximum performance the kernel is for that operation.
-
-* Forward-backward correlation: PyProf determines what the forward pass step
-  is that resulted in the particular weight and data gradients (wgrad, dgrad),
-  which makes it possible to determine the tensor dimensions required by these
-  backprop steps to assess their performance.
-
-* Determines Tensor Core usage: PyProf can highlight the kernels that use
-  `Tensor Cores <https://developer.nvidia.com/tensor-cores>`_.
-
-* Correlate the line in the user's code that launched a particular kernel (program trace).
-
 .. overview-end-marker-do-not-remove
 
-The current release of PyProf is 3.7.0 and is available in the 20.12 release of
-the PyTorch container on `NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The
-branch for this release is `r20.12
-<https://github.com/NVIDIA/PyProf/tree/r20.12>`_.
-
 Quick Installation Instructions
 -------------------------------
 
@@ -82,7 +46,7 @@ Quick Installation Instructions
 
 * Should display ::
 
-  pyprof 3.9.0.dev0
+  pyprof 3.9.0
 
 .. quick-install-end-marker-do-not-remove
 
@@ -111,57 +75,5 @@ Quick Start Instructions
 
 .. quick-start-end-marker-do-not-remove
 
-Documentation
--------------
-
-The User Guide can be found in the
-`documentation for current release
-<https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/index.html>`_, and
-provides instructions on how to install and profile with PyProf.
-
-A complete `Quick Start Guide <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/quickstart.html>`_
-provides step-by-step instructions to get you quickly started using PyProf.
-
-An `FAQ <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/faqs.html>`_ provides
-answers for frequently asked questions.
-
-The `Release Notes
-<https://docs.nvidia.com/deeplearning/frameworks/pyprof-release-notes/index.html>`_
-indicate the required versions of the NVIDIA Driver and CUDA, and also describe
-which GPUs are supported by PyProf
-
-Presentation and Papers
-^^^^^^^^^^^^^^^^^^^^^^^
-
-* `Automating End-toEnd PyTorch Profiling <https://developer.nvidia.com/gtc/2020/video/s21143>`_.
-  * `Presentation slides <https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21143-automating-end-to-end-pytorch-profiling.pdf>`_.
-
-Contributing
-------------
-
-Contributions to PyProf are more than welcome. To
-contribute make a pull request and follow the guidelines outlined in
-the `Contributing <CONTRIBUTING.md>`_ document.
-
-Reporting problems, asking questions
-------------------------------------
-
-We appreciate any feedback, questions or bug reporting regarding this
-project. When help with code is needed, follow the process outlined in
-the Stack Overflow (https://stackoverflow.com/help/mcve)
-document. Ensure posted examples are:
-
-* minimal – use as little code as possible that still produces the
-  same problem
-
-* complete – provide all parts needed to reproduce the problem. Check
-  if you can strip external dependency and still show the problem. The
-  less time we spend on reproducing problems the more time we have to
-  fix it
-
-* verifiable – test the code you're about to provide to make sure it
-  reproduces the problem. Remove all other problems that are not
-  related to your request/question.
-
 .. |License| image:: https://img.shields.io/badge/License-Apache2-green.svg
    :target: http://www.apache.org/licenses/LICENSE-2.0