Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 66 additions & 1 deletion content/docs/tooling/tooling-performance-analysis.md
Original file line number Diff line number Diff line change
Expand Up @@ -287,7 +287,7 @@ This trace format can then be visualized using the following tools:

Use `precice-cli profiling trace --web` to directly open the exported trace in the browser.

These visualization tools cannot handle large runs though.
Even with these tools, very large runs can become difficult to inspect interactively.
There are two options to reduce the trace size:
You can select the first `N` ranks using `-l N` and you can pick specific ranks using `-r RANK ...`
These two selectors are combined.
Expand Down Expand Up @@ -315,6 +315,71 @@ This second version contains all events using the configuration `<profiling mode

![example of parallel ASTE with all events visualized by perfetto](images/docs/tooling/profiling-aste-perfetto-parallel-all.png)

### Analyze larger traces with PerfettoSQL

For larger runs, it is often faster to query the trace data directly instead of navigating the UI.
`trace_processor_shell` can execute PerfettoSQL on `trace.json` and gives reproducible results.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find any trace_processor_shell, only a trace_processor, which at first run downloads trace_processor_shell:

$ curl -LO https://get.perfetto.dev/trace_processor
$ chmod +x ./trace_processor 
$ ./trace_processor trace.json 
Downloading https://commondatastorage.googleapis.com/perfetto-luci-artifacts/v55.2/linux-amd64/trace_processor_shell

You can download `trace_processor_shell` from the [Perfetto releases page](https://perfetto.dev/docs/quickstart/trace-analysis#get-trace-processor).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can download `trace_processor_shell` from the [Perfetto releases page](https://perfetto.dev/docs/quickstart/trace-analysis#get-trace-processor).
You can download `trace_processor_shell` from the [Perfetto releases page](https://perfetto.dev/docs/analysis/trace-processor).


First, inspect which event names are present:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of trace_processor_shell -q "...", I would rather execute ./trace_processor first, and then run queries in it.

And mention that one can exit with .q.


```console
trace_processor_shell -q "SELECT name, COUNT(*) AS n FROM slice GROUP BY name ORDER BY n DESC LIMIT 30;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output:

> SELECT name, COUNT(*) AS n FROM slice GROUP BY name ORDER BY n DESC LIMIT 30;
name                 n                    
-------------------- -------------------- 
solver.advance                        428 
mapping                               428 
waitAndSendData                       426 
waitAndReceiveData                    426 
mn2.sendData.interSy                  426 
m2n.sendData                          426 
m2n.receiveData.inte                  426 
m2n.receiveData                       426 
advanceCoupling                       426 
advance                               426 
readData.Solid-Mesh_                  214 
readData.Fluid-Mesh_                  214 
handleExports                         214 
writeData.Solid-Mesh                  213 
writeData.Fluid-Mesh                  213 
sendConvergence                       213 
receiveConvergence                    213 
map.nn.mapData.FromS                  213 
map.nn.mapData.FromF                  213 
accelerate                            213 
m2n.exchangeVertexDi                    4 
m2n.createCommunicat                    4 
m2n.buildCommunicati                    4 
m2n.broadcastVertexD                    4 
startProfilingBacken                    2 
solver.initialize                       2 
repartitioning                          2 
readConfiguration                       2 
initialize                              2 
initalizeCouplingSch                    2 

Query executed in 3.294 ms

```

Then focus on preCICE events, for example to get total runtime per event:

```console
trace_processor_shell -q "SELECT name, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name IN ('advance', 'initialize', 'solver.advance', 'solver.initialize', 'construction', 'finalize') GROUP BY name ORDER BY total_s DESC;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output:

> SELECT name, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name IN ('advance', 'initialize', 'solver.advance', 'solver.initialize', 'construction', 'finalize') GROUP BY name ORDER BY total_s DESC;
name                 n                    total_s              avg_ms               
-------------------- -------------------- -------------------- -------------------- 
advance                               426            28.813540            67.637000 
solver.advance                        428            27.648382            64.599000 
initialize                              2             3.481007          1740.504000 
construction                            2             0.357415           178.708000 
finalize                                2             0.001848             0.924000 
solver.initialize                       2             0.000714             0.357000 

Query executed in 1.510 ms

```

To find unusually expensive `advance` calls:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To find unusually expensive `advance` calls:
To find the most expensive `advance` calls:


```console
trace_processor_shell -q "SELECT ROUND(ts/1e9, 6) AS start_s, ROUND(dur/1e6, 3) AS dur_ms, track_id FROM slice WHERE name = 'advance' ORDER BY dur DESC LIMIT 20;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output:

> SELECT ROUND(ts/1e9, 6) AS start_s, ROUND(dur/1e6, 3) AS dur_ms, track_id FROM slice WHERE name = 'advance' ORDER BY dur DESC LIMIT 20;
start_s              dur_ms               track_id             
-------------------- -------------------- -------------------- 
   1779026672.600307           276.929000                    1 
   1779026673.084940           268.537000                    1 
   1779026671.265589           218.855000                    1 
   1779026679.218089           215.254000                    1 
   1779026671.486101           214.927000                    1 
   1779026673.558230           212.789000                    1 
   1779026667.692958           209.477000                    1 
   1779026673.970424           200.914000                    1 
   1779026673.356204           200.625000                    1 
   1779026672.883513           199.199000                    1 
   1779026673.772448           196.284000                    1 
   1779026666.334844           194.339000                    1 
   1779026672.225841           192.858000                    1 
   1779026667.015985           187.507000                    1 
   1779026666.830590           184.151000                    1 
   1779026654.190474           182.940000                    1 
   1779026672.420540           178.525000                    1 
   1779026672.048006           176.499000                    1 
   1779026671.702332           173.871000                    1 
   1779026675.158442           173.198000                    1 

Query executed in 0.743 ms

```

To estimate rank/participant imbalance on `solver.advance`, compare totals per track:

```console
trace_processor_shell -q "SELECT track_id, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name = 'solver.advance' GROUP BY track_id ORDER BY total_s DESC;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what rank/participant imbalance means in this context. Do I understand correctly that this query is mainly to list total and average duration of solver.advance() for each rank?

Output:

> SELECT track_id, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name = 'solver.advance' GROUP BY track_id ORDER BY total_s DESC;
track_id             n                    total_s              avg_ms               
-------------------- -------------------- -------------------- -------------------- 
                   0                  214            27.402214           128.048000 
                   1                  214             0.246168             1.150000 

Query executed in 1.328 ms

```

If you need the mapping from `track_id` to names:

```console
trace_processor_shell -q "SELECT t.id AS track_id, COALESCE(t.name, tt.name, pt.name, p.name, th.name) AS track_name, p.pid AS pid, p.name AS process_name, tt.utid AS utid, th.name AS thread_name FROM track t LEFT JOIN thread_track tt ON tt.id = t.id LEFT JOIN process_track pt ON pt.id = t.id LEFT JOIN thread th ON th.utid = tt.utid LEFT JOIN process p ON p.upid = COALESCE(tt.upid, pt.upid) ORDER BY t.id;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work for me. Output:

 SELECT t.id AS track_id, COALESCE(t.name, tt.name, pt.name, p.name, th.name) AS track_name, p.pid AS pid, p.name AS process_name, tt.utid AS utid, th.name AS thread_name FROM track t LEFT JOIN thread_track tt ON tt.id = t.id LEFT JOIN process_track pt ON pt.id = t.id LEFT JOIN thread th ON th.utid = tt.utid LEFT JOIN process p ON p.upid = COALESCE(tt.upid, pt.upid) ORDER BY t.id;
Traceback (most recent call last):
  File "stdin" line 1 col 1
    SELECT t.id AS track_id, COALESCE(t.name, tt.name, pt.name, p.name, th.name) AS track_name, p.pid AS pid, p.name AS process_name
    ^
no such column: tt.upid

Query executed in 0.503 ms

```

### Scripting trace analysis with Python

For recurring analyses in CI or larger studies, you can script queries using Perfetto Trace Processor:
installable via `pip install perfetto`.

```python
from perfetto.trace_processor import TraceProcessor

tp = TraceProcessor(file_path="trace.json")

query = """
SELECT
name,
COUNT(*) AS n,
SUM(dur) / 1e9 AS total_s,
AVG(dur) / 1e6 AS avg_ms
FROM slice
WHERE name IN ('advance', 'initialize', 'solver.advance', 'solver.initialize')
GROUP BY name
ORDER BY total_s DESC
"""

for row in tp.query(query):
print(row.name, row.n, row.total_s, row.avg_ms)
Comment on lines +360 to +377
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output:

$ python -m venv venv && source venv/bin/activate
$ pip install perfetto
$ python test.py 
advance 426 28.81354 67.63741784037559
solver.advance 428 27.648382 64.59902336448597
initialize 2 3.481007 1740.5035
solver.initialize 2 0.000714 0.357

```
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I was curious to try this snippet since I'm currently trying to plot preCICE performance in a more automated way. I'm basically looking for an easy way to extract the performance values, and such a small Python script would be great.
However, when using this, I see that it adds up the values for e.g., "advance" for my two participants. I think it would be more useful to see the sum for each participant separately. Also, it would be useful if all names are supported. Currently only the names that appear in the top level when doing analyze seem to be supported (for exampleWHERE name IN ('initialize/mapping')does not seem to work).


Recent Perfetto Python versions expose selected columns as row attributes (`row.name`, `row.total_s`, ...).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since which versions? What does that mean for the script?

This approach scales better than manual exploration and allows you to keep a stable analysis script for your preCICE runs.

### Analyzing participants

You can run `precice-cli profiling analyze NAME` to analyze the participant `NAME` and display a table of recorded events.
Expand Down
Loading