Commit b4a6235

Update README and NGC versions post-20.10 release
1 parent 3347bf5 commit b4a6235

1 file changed

Lines changed: 101 additions & 3 deletions

README.rst
@@ -18,13 +18,59 @@
 PyProf - PyTorch Profiling tool
 ===============================
 
-**NOTE: You are currently on the r20.10 branch which tracks stabilization
-towards the release. This branch is not usable during stabilization.**
-
 .. overview-begin-marker-do-not-remove
 
+PyProf is a tool that profiles and analyzes the GPU performance of PyTorch
+models. PyProf aggregates kernel performance from `Nsight Systems
+<https://developer.nvidia.com/nsight-systems>`_ or `NvProf
+<https://developer.nvidia.com/nvidia-visual-profiler>`_.
+
+What's New in 3.5.0
+-------------------
+* Nsight Systems database lookup has been improved, speeding up profile
+  analysis by 50x.
+
+* Node names now include class info and can be linked back to the original
+  Python source.
+
+Known Issues
+------------
+* Forward-backward kernel correlation heuristics do not work correctly with
+  PyTorch 1.6. Recommended workarounds include:
+
+  * Use PyProf with PyTorch 1.5
+  * Use DLProf in the `20.10 NGC PyTorch container <https://ngc.nvidia.com/catalog/containers/nvidia:pytorch>`_
+
+Features
+--------
+
+* Identifies the layer that launched a kernel: e.g. the association of
+  `ComputeOffsetsKernel` with a concrete PyTorch layer or API is not obvious.
+
+* Identifies the tensor dimensions and precision: without knowing the tensor
+  dimensions and precision, it's impossible to reason about whether the actual
+  (silicon) kernel time is close to maximum performance of such a kernel on
+  the GPU. Knowing the tensor dimensions and precision, we can figure out the
+  FLOPs and bandwidth required by a layer, and then determine how close to
+  maximum performance the kernel is for that operation.
+
+* Forward-backward correlation: PyProf determines which forward pass step
+  resulted in the particular weight and data gradients (wgrad, dgrad),
+  which makes it possible to determine the tensor dimensions required by these
+  backprop steps to assess their performance.
+
+* Determines Tensor Core usage: PyProf can highlight the kernels that use
+  `Tensor Cores <https://developer.nvidia.com/tensor-cores>`_.
+
+* Correlates a particular kernel with the line in the user's code that launched it (program trace).
+
 .. overview-end-marker-do-not-remove
 
+The current release of PyProf is 3.5.0 and is available in the 20.10 release of
+the PyTorch container on `NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The
+branch for this release is `r20.10
+<https://github.com/NVIDIA/PyProf/tree/r20.10>`_.
+
 Quick Installation Instructions
 -------------------------------
 
@@ -75,5 +121,57 @@ Quick Start Instructions
 
 .. quick-start-end-marker-do-not-remove
 
+Documentation
+-------------
+
+The User Guide can be found in the
+`documentation for the current release
+<https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/index.html>`_, and
+provides instructions on how to install and profile with PyProf.
+
+A complete `Quick Start Guide <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/quickstart.html>`_
+provides step-by-step instructions to get you quickly started using PyProf.
+
+An `FAQ <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/faqs.html>`_ provides
+answers to frequently asked questions.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/frameworks/pyprof-release-notes/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also describe
+which GPUs are supported by PyProf.
+
+Presentation and Papers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+* `Automating End-to-End PyTorch Profiling <https://developer.nvidia.com/gtc/2020/video/s21143>`_.
+* `Presentation slides <https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21143-automating-end-to-end-pytorch-profiling.pdf>`_.
+
+Contributing
+------------
+
+Contributions to PyProf are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems the more time we have to
+  fix them
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-Apache2-green.svg
    :target: http://www.apache.org/licenses/LICENSE-2.0
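
The Features text added in this commit says that knowing the tensor dimensions and precision makes it possible to work out the FLOPs and bandwidth a layer needs, and from there how close a kernel is to maximum performance. The sketch below illustrates that reasoning for a single matrix multiply. It is not PyProf code: the function name, the dataclass, and the peak throughput and bandwidth figures are hypothetical, and the byte count assumes each operand is read or written exactly once.

```python
from dataclasses import dataclass

# Bytes per element for a few common precisions.
BYTES_PER_ELEMENT = {"fp16": 2, "fp32": 4, "fp64": 8}


@dataclass
class GemmEstimate:
    flops: int          # floating-point operations required
    bytes_moved: int    # minimum bytes read and written
    min_time_s: float   # lower bound on kernel time
    efficiency: float   # min_time_s / measured time (1.0 = at the bound)


def estimate_gemm(m, n, k, precision, measured_time_s,
                  peak_flops, peak_bandwidth):
    """Estimate how close an (m x k) @ (k x n) matmul is to peak.

    peak_flops (FLOP/s) and peak_bandwidth (bytes/s) are supplied by the
    caller; they are assumptions about the target GPU, not measured here.
    """
    elem = BYTES_PER_ELEMENT[precision]
    # One multiply and one add per (m, n, k) triple.
    flops = 2 * m * n * k
    # Read A (m*k) and B (k*n), write C (m*n), each exactly once.
    bytes_moved = elem * (m * k + k * n + m * n)
    # The kernel cannot finish faster than the slower of the two limits.
    min_time_s = max(flops / peak_flops, bytes_moved / peak_bandwidth)
    return GemmEstimate(flops, bytes_moved, min_time_s,
                        min_time_s / measured_time_s)


if __name__ == "__main__":
    # Hypothetical round numbers: 100 TFLOP/s peak, 1 TB/s bandwidth,
    # and a measured kernel time of 50 microseconds.
    est = estimate_gemm(1024, 1024, 1024, "fp16",
                        measured_time_s=50e-6,
                        peak_flops=100e12,
                        peak_bandwidth=1e12)
    print(f"FLOPs: {est.flops}, bytes: {est.bytes_moved}, "
          f"efficiency: {est.efficiency:.1%}")
```

With these assumed figures the compute limit dominates the memory limit, so the kernel is compute bound, and the measured time corresponds to roughly 43% of the assumed peak.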
