Commit ab4dc5a

Update README for stabilization
1 parent 662ddc0 commit ab4dc5a

1 file changed

Lines changed: 4 additions & 142 deletions

README.rst

@@ -18,159 +18,21 @@
 PyProf - PyTorch Profiling tool
 ===============================
 
+**NOTE: You are currently on the r20.09 branch which tracks
+stabilization towards the release. This branch is not usable
+during stabilization.**
+
 .. overview-begin-marker-do-not-remove
 
-PyProf is a tool that profiles and analyzes the GPU performance of PyTorch
-models. PyProf aggregates kernel performance from `Nsight Systems
-<https://developer.nvidia.com/nsight-systems>`_ or `NvProf
-<https://developer.nvidia.com/nvidia-visual-profiler>`_.
-
-What's New in 3.4.0
--------------------
-
-* README and User Guide documentation has been updated with more installation
-  options and pointers
-
-Known Issues
-------------
-
-* Forward-Backward kernel correlation heuristics do not work correctly with
-  PyTorch 1.6. Recommended work arounds include:
-
-  * Use with PyTorch 1.5
-  * Use DLProf in the `20.09 NGC Pytorch container <https://ngc.nvidia.com/catalog/containers/nvidia:pytorch>`_
-
-Features
---------
-
-* Identifies the layer that launched a kernel: e.g. the association of
-  `ComputeOffsetsKernel` with a concrete PyTorch layer or API is not obvious.
-
-* Identifies the tensor dimensions and precision: without knowing the tensor
-  dimensions and precision, it's impossible to reason about whether the actual
-  (silicon) kernel time is close to maximum performance of such a kernel on
-  the GPU. Knowing the tensor dimensions and precision, we can figure out the
-  FLOPs and bandwidth required by a layer, and then determine how close to
-  maximum performance the kernel is for that operation.
-
-* Forward-backward correlation: PyProf determines what the forward pass step
-  is that resulted in the particular weight and data gradients (wgrad, dgrad),
-  which makes it possible to determine the tensor dimensions required by these
-  backprop steps to assess their performance.
-
-* Determines Tensor Core usage: PyProf can highlight the kernels that use
-  `Tensor Cores <https://developer.nvidia.com/tensor-cores>`_.
-
-* Correlate the line in the user's code that launched a particular kernel (program trace).
-
 .. overview-end-marker-do-not-remove
 
-The current release of PyProf is 3.4.0 and is available in the 20.09 release of
-the PyTorch container on `NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The
-branch for this release is `r20.09
-<https://github.com/NVIDIA/PyProf/tree/r20.09>`_.
-
-Quick Installation Instructions
--------------------------------
-
 .. quick-install-start-marker-do-not-remove
 
-* Clone the git repository ::
-
-    $ git clone https://github.com/NVIDIA/PyProf.git
-
-* Navigate to the top level PyProf directory
-
-* Install PyProf ::
-
-    $ pip install .
-
-* Verify installation is complete with pip list ::
-
-    $ pip list | grep pyprof
-
-* Should display ::
-
-    pyprof 3.3.0.dev0
-
 .. quick-install-end-marker-do-not-remove
 
-Quick Start Instructions
-------------------------
-
 .. quick-start-start-marker-do-not-remove
 
-* Add the following lines to the PyTorch network you want to profile: ::
-
-    import torch.cuda.profiler as profiler
-    import pyprof
-    pyprof.init()
-
-* Profile with NVProf or Nsight Systems to generate a SQL file. ::
-
-    $ nsys profile -f true -o net --export sqlite python net.py
-
-* Run the parse.py script to generate the dictionary. ::
-
-    $ python -m pyprof.parse net.sqlite > net.dict
-
-* Run the prof.py script to generate the reports. ::
-
-    $ python -m pyprof.prof --csv net.dict
-
 .. quick-start-end-marker-do-not-remove
 
-Documentation
--------------
-
-The User Guide can be found in the
-`documentation for current release
-<https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/index.html>`_, and
-provides instructions on how to install and profile with PyProf.
-
-A complete `Quick Start Guide <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/quickstart.html>`_
-provides step-by-step instructions to get you quickly started using PyProf.
-
-An `FAQ <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/faqs.html>`_ provides
-answers for frequently asked questions.
-
-The `Release Notes
-<https://docs.nvidia.com/deeplearning/frameworks/pyprof-release-notes/index.html>`_
-indicate the required versions of the NVIDIA Driver and CUDA, and also describe
-which GPUs are supported by PyProf
-
-Presentation and Papers
-^^^^^^^^^^^^^^^^^^^^^^^
-
-* `Automating End-toEnd PyTorch Profiling <https://developer.nvidia.com/gtc/2020/video/s21143>`_.
-* `Presentation slides <https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21143-automating-end-to-end-pytorch-profiling.pdf>`_.
-
-Contributing
-------------
-
-Contributions to PyProf are more than welcome. To
-contribute make a pull request and follow the guidelines outlined in
-the `Contributing <CONTRIBUTING.md>`_ document.
-
-Reporting problems, asking questions
-------------------------------------
-
-We appreciate any feedback, questions or bug reporting regarding this
-project. When help with code is needed, follow the process outlined in
-the Stack Overflow (https://stackoverflow.com/help/mcve)
-document. Ensure posted examples are:
-
-* minimal – use as little code as possible that still produces the
-  same problem
-
-* complete – provide all parts needed to reproduce the problem. Check
-  if you can strip external dependency and still show the problem. The
-  less time we spend on reproducing problems the more time we have to
-  fix it
-
-* verifiable – test the code you're about to provide to make sure it
-  reproduces the problem. Remove all other problems that are not
-  related to your request/question.
-
 .. |License| image:: https://img.shields.io/badge/License-Apache2-green.svg
    :target: http://www.apache.org/licenses/LICENSE-2.0
