This repository was archived by the owner on Apr 2, 2025. It is now read-only.

Commit e33feb4

restore code-centric view.
1 parent e192eca commit e33feb4

1 file changed

Lines changed: 9 additions & 2 deletions


doc/manual/HPCToolkit-users-manual.tex

Original file line number | Diff line number | Diff line change
@@ -215,6 +215,11 @@ \chapter{Introduction}
215215
binary analysis exclusively.
216216
On x86\_64 processors, HPCToolkit employs both strategies in an integrated fashion.
217217

218+
\begin{figure}[t]
219+
\centering{\includegraphics[width=.8\textwidth]{fig/hpctoolkit-code-centric}}
220+
\caption{A code-centric view of an execution of the University of Chicago's FLASH code executing on 8192 cores of a Blue Gene/P. This bottom-up view shows that 16\% of the execution time was spent in IBM's DCMF messaging layer. By tracking these costs up the call chain, we can see that most of this time was spent on behalf of calls to {\tt pmpi\_allreduce} on line 419 of {\tt amr\_comm\_setup}.}
221+
\label{fig:code-centric}
222+
\end{figure}
218223

219224
\begin{figure}[t]
220225
\centering{\includegraphics[width=.8\textwidth]{fig/hpctoolkit-thread-centric}}
@@ -234,7 +239,8 @@ \chapter{Introduction}
234239
\HPCToolkit{} assembles performance measurements into a call path profile that associates the costs of each function call with its full calling context.
235240
In addition, \HPCToolkit{} uses binary analysis to attribute program performance metrics with uniquely detailed precision -- full dynamic calling contexts augmented with information about call sites, inlined functions and templates, loops, and source lines.
236241
Measurements can be analyzed in a variety of ways: top-down in a calling context tree, which associates costs with the full calling context in which they are incurred; bottom-up in a view that apportions costs associated with a function to each of the contexts in which the function is called; and in a flat view that aggregates all costs associated with a function independent of calling context.
237-
This multiplicity of code-centric perspectives is essential to understanding a program's performance for tuning under various circumstances. \HPCToolkit{} also supports a thread-centric perspective, which enables one to see how a performance metric for a calling context differs across threads, and a time-centric perspective, which enables a user to see how an execution unfolds over time. Figures~\ref{fig:code-centric}--\ref{fig:time-centric} show samples of the code-centric, thread-centric, and time-centric views.
242+
This multiplicity of code-centric perspectives is essential to understanding a program's performance for tuning under various circumstances.
243+
\HPCToolkit{} also supports a thread-centric perspective, which enables one to see how a performance metric for a calling context differs across threads, and a time-centric perspective, which enables a user to see how an execution unfolds over time. Figures~\ref{fig:code-centric}--\ref{fig:time-centric} show samples of HPCToolkit's code-centric, thread-centric, and time-centric views.
238244

239245
By working at the machine-code level, \HPCToolkit{} accurately measures and attributes costs in executions of multilingual programs, even if they are linked with libraries available only in binary form.
240246
\HPCToolkit{} supports performance analysis of fully optimized code -- the only form of a program worth measuring; it even measures and attributes performance metrics to shared libraries that are dynamically loaded at run time.
@@ -268,7 +274,8 @@ \chapter{\HPCToolkit{} Overview}
268274

269275
\HPCToolkit{}'s work flow is organized around four principal capabilities, as shown in Figure~\ref{fig:hpctoolkit-overview:a}:
270276
\begin{enumerate}
271-
\item \emph{measurement} of context-sensitive performance metrics while an application executes;
277+
\item \emph{measurement} of context-sensitive performance metrics using call-stack unwinding
278+
while an application executes;
272279
\item \emph{binary analysis} to recover program structure from application binaries;
273280
\item \emph{attribution} of performance metrics by correlating dynamic performance metrics with static program structure; and
274281
\item \emph{presentation} of performance metrics and associated source code.
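
The three code-centric views described in the manual text in this diff (top-down, bottom-up, and flat) can be illustrated with a small sketch. This is not HPCToolkit code and does not use any HPCToolkit API. The sample data, the function names `top_down`, `bottom_up`, and `flat`, and the choice to attribute only exclusive (leaf) costs in the bottom-up and flat views are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical call-path samples: (call path from root to leaf, cost).
# These numbers are invented for illustration; they are not measurements.
SAMPLES = [
    (("main", "solve", "allreduce"), 6),
    (("main", "setup", "allreduce"), 2),
    (("main", "solve"), 3),
    (("main", "setup"), 1),
]


def top_down(samples):
    """Inclusive cost of every calling-context prefix (a calling context tree node)."""
    costs = defaultdict(int)
    for path, cost in samples:
        # A sample charges every ancestor context on its path, root included.
        for depth in range(1, len(path) + 1):
            costs[path[:depth]] += cost
    return dict(costs)


def bottom_up(samples, function):
    """Apportion a function's exclusive cost to each chain of callers.

    Keys list the callers from the innermost caller outward to the root.
    """
    costs = defaultdict(int)
    for path, cost in samples:
        if path[-1] == function:
            costs[tuple(reversed(path[:-1]))] += cost
    return dict(costs)


def flat(samples):
    """Exclusive cost per function, independent of calling context."""
    costs = defaultdict(int)
    for path, cost in samples:
        costs[path[-1]] += cost
    return dict(costs)


if __name__ == "__main__":
    print(top_down(SAMPLES))
    print(bottom_up(SAMPLES, "allreduce"))
    print(flat(SAMPLES))
```

In this sketch the flat view reports a single total for `allreduce`, the bottom-up view splits that total between the `solve` and `setup` callers, and the top-down view shows how much of `main`'s inclusive cost flows through each child context.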
