Skip to content

Add large-run PerfettoSQL guidance to performance analysis docs#819

Open
Flamki wants to merge 2 commits into
precice:masterfrom
Flamki:issue-631-perfetto-large-runs
Open

Add large-run PerfettoSQL guidance to performance analysis docs#819
Flamki wants to merge 2 commits into
precice:masterfrom
Flamki:issue-631-perfetto-large-runs

Conversation

@Flamki
Copy link
Copy Markdown

@Flamki Flamki commented Feb 26, 2026

Summary

Add practical guidance for analyzing large preCICE traces with PerfettoSQL and scripted Trace Processor queries.

What changed

  • Updated content/docs/tooling/tooling-performance-analysis.md
  • Added a new section: Analyze larger traces with PerfettoSQL
    • event inventory query
    • runtime aggregation for key preCICE events
    • expensive advance call query
    • per-track imbalance query
    • robust track_id to participant/process/thread mapping query
  • Added a new section: Scripting trace analysis with Python
    • perfetto.trace_processor.TraceProcessor example for repeatable analysis
  • Added installation hints for:
    • trace_processor_shell
    • Python package perfetto

Scope

  • Content-only docs update
  • No JS/CSS/layout/behavior changes

Context

Validation

  • pre-commit run --files content/docs/tooling/tooling-performance-analysis.md (passed)
  • docker run --rm -v "${PWD}:/srv/jekyll" -w /srv/jekyll jekyll/jekyll:4 bash -lc "bundle install && bundle exec jekyll build" (passed)

@MakisH MakisH added content Content-only issues GSoC Contributed in the context of the Google Summer of Code labels Feb 27, 2026
Copy link
Copy Markdown
Member

@MakisH MakisH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, @Flamki. This requires more in-depth knowledge of the profiling functionality to make sense, but some aspects of this PR are already useful for communicating that PerfettoSQL exists and what the user can do with it. Some queries, however, were not very well-motivated/clear.

I executed all queries and documented the outputs. @fsimonis since you opened the original issue and have written most of this page, please take a look. I would suggest directly editing to not leave this open much longer (or closing if it is too far from what you wanted to have).

For my outputs, I just used the flow-over-heated-plate tutorial with fluid-openfoam, solid-openfoam, all serial, and with <profiling mode="all" synchronize="true" />.

trace.json

I am not sure where you got trace-processor-shell from the linked Perfetto documentation page, but I found the equivalent trace-processor. One query also did not work for me.


For larger runs, it is often faster to query the trace data directly instead of navigating the UI.
`trace_processor_shell` can execute PerfettoSQL on `trace.json` and gives reproducible results.
You can download `trace_processor_shell` from the [Perfetto releases page](https://perfetto.dev/docs/quickstart/trace-analysis#get-trace-processor).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can download `trace_processor_shell` from the [Perfetto releases page](https://perfetto.dev/docs/quickstart/trace-analysis#get-trace-processor).
You can download `trace_processor_shell` from the [Perfetto releases page](https://perfetto.dev/docs/analysis/trace-processor).

First, inspect which event names are present:

```console
trace_processor_shell -q "SELECT name, COUNT(*) AS n FROM slice GROUP BY name ORDER BY n DESC LIMIT 30;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output:

> SELECT name, COUNT(*) AS n FROM slice GROUP BY name ORDER BY n DESC LIMIT 30;
name                 n                    
-------------------- -------------------- 
solver.advance                        428 
mapping                               428 
waitAndSendData                       426 
waitAndReceiveData                    426 
mn2.sendData.interSy                  426 
m2n.sendData                          426 
m2n.receiveData.inte                  426 
m2n.receiveData                       426 
advanceCoupling                       426 
advance                               426 
readData.Solid-Mesh_                  214 
readData.Fluid-Mesh_                  214 
handleExports                         214 
writeData.Solid-Mesh                  213 
writeData.Fluid-Mesh                  213 
sendConvergence                       213 
receiveConvergence                    213 
map.nn.mapData.FromS                  213 
map.nn.mapData.FromF                  213 
accelerate                            213 
m2n.exchangeVertexDi                    4 
m2n.createCommunicat                    4 
m2n.buildCommunicati                    4 
m2n.broadcastVertexD                    4 
startProfilingBacken                    2 
solver.initialize                       2 
repartitioning                          2 
readConfiguration                       2 
initialize                              2 
initalizeCouplingSch                    2 

Query executed in 3.294 ms

### Analyze larger traces with PerfettoSQL

For larger runs, it is often faster to query the trace data directly instead of navigating the UI.
`trace_processor_shell` can execute PerfettoSQL on `trace.json` and gives reproducible results.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find any trace_processor_shell, only a trace_processor, which at first run downloads trace_processor_shell:

$ curl -LO https://get.perfetto.dev/trace_processor
$ chmod +x ./trace_processor 
$ ./trace_processor trace.json 
Downloading https://commondatastorage.googleapis.com/perfetto-luci-artifacts/v55.2/linux-amd64/trace_processor_shell

Then focus on preCICE events, for example to get total runtime per event:

```console
trace_processor_shell -q "SELECT name, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name IN ('advance', 'initialize', 'solver.advance', 'solver.initialize', 'construction', 'finalize') GROUP BY name ORDER BY total_s DESC;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output:

> SELECT name, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name IN ('advance', 'initialize', 'solver.advance', 'solver.initialize', 'construction', 'finalize') GROUP BY name ORDER BY total_s DESC;
name                 n                    total_s              avg_ms               
-------------------- -------------------- -------------------- -------------------- 
advance                               426            28.813540            67.637000 
solver.advance                        428            27.648382            64.599000 
initialize                              2             3.481007          1740.504000 
construction                            2             0.357415           178.708000 
finalize                                2             0.001848             0.924000 
solver.initialize                       2             0.000714             0.357000 

Query executed in 1.510 ms

trace_processor_shell -q "SELECT name, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name IN ('advance', 'initialize', 'solver.advance', 'solver.initialize', 'construction', 'finalize') GROUP BY name ORDER BY total_s DESC;" trace.json
```

To find unusually expensive `advance` calls:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To find unusually expensive `advance` calls:
To find the most expensive `advance` calls:

To find unusually expensive `advance` calls:

```console
trace_processor_shell -q "SELECT ROUND(ts/1e9, 6) AS start_s, ROUND(dur/1e6, 3) AS dur_ms, track_id FROM slice WHERE name = 'advance' ORDER BY dur DESC LIMIT 20;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output:

> SELECT ROUND(ts/1e9, 6) AS start_s, ROUND(dur/1e6, 3) AS dur_ms, track_id FROM slice WHERE name = 'advance' ORDER BY dur DESC LIMIT 20;
start_s              dur_ms               track_id             
-------------------- -------------------- -------------------- 
   1779026672.600307           276.929000                    1 
   1779026673.084940           268.537000                    1 
   1779026671.265589           218.855000                    1 
   1779026679.218089           215.254000                    1 
   1779026671.486101           214.927000                    1 
   1779026673.558230           212.789000                    1 
   1779026667.692958           209.477000                    1 
   1779026673.970424           200.914000                    1 
   1779026673.356204           200.625000                    1 
   1779026672.883513           199.199000                    1 
   1779026673.772448           196.284000                    1 
   1779026666.334844           194.339000                    1 
   1779026672.225841           192.858000                    1 
   1779026667.015985           187.507000                    1 
   1779026666.830590           184.151000                    1 
   1779026654.190474           182.940000                    1 
   1779026672.420540           178.525000                    1 
   1779026672.048006           176.499000                    1 
   1779026671.702332           173.871000                    1 
   1779026675.158442           173.198000                    1 

Query executed in 0.743 ms

To estimate rank/participant imbalance on `solver.advance`, compare totals per track:

```console
trace_processor_shell -q "SELECT track_id, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name = 'solver.advance' GROUP BY track_id ORDER BY total_s DESC;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what rank/participant imbalance means in this context. Do I understand correctly that this query is mainly to list total and average duration of solver.advance() for each rank?

Output:

> SELECT track_id, COUNT(*) AS n, ROUND(SUM(dur)/1e9, 6) AS total_s, ROUND(AVG(dur)/1e6, 3) AS avg_ms FROM slice WHERE name = 'solver.advance' GROUP BY track_id ORDER BY total_s DESC;
track_id             n                    total_s              avg_ms               
-------------------- -------------------- -------------------- -------------------- 
                   0                  214            27.402214           128.048000 
                   1                  214             0.246168             1.150000 

Query executed in 1.328 ms

If you need the mapping from `track_id` to names:

```console
trace_processor_shell -q "SELECT t.id AS track_id, COALESCE(t.name, tt.name, pt.name, p.name, th.name) AS track_name, p.pid AS pid, p.name AS process_name, tt.utid AS utid, th.name AS thread_name FROM track t LEFT JOIN thread_track tt ON tt.id = t.id LEFT JOIN process_track pt ON pt.id = t.id LEFT JOIN thread th ON th.utid = tt.utid LEFT JOIN process p ON p.upid = COALESCE(tt.upid, pt.upid) ORDER BY t.id;" trace.json
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work for me. Output:

 SELECT t.id AS track_id, COALESCE(t.name, tt.name, pt.name, p.name, th.name) AS track_name, p.pid AS pid, p.name AS process_name, tt.utid AS utid, th.name AS thread_name FROM track t LEFT JOIN thread_track tt ON tt.id = t.id LEFT JOIN process_track pt ON pt.id = t.id LEFT JOIN thread th ON th.utid = tt.utid LEFT JOIN process p ON p.upid = COALESCE(tt.upid, pt.upid) ORDER BY t.id;
Traceback (most recent call last):
  File "stdin" line 1 col 1
    SELECT t.id AS track_id, COALESCE(t.name, tt.name, pt.name, p.name, th.name) AS track_name, p.pid AS pid, p.name AS process_name
    ^
no such column: tt.upid

Query executed in 0.503 ms

Comment on lines +360 to +377
from perfetto.trace_processor import TraceProcessor

tp = TraceProcessor(file_path="trace.json")

query = """
SELECT
name,
COUNT(*) AS n,
SUM(dur) / 1e9 AS total_s,
AVG(dur) / 1e6 AS avg_ms
FROM slice
WHERE name IN ('advance', 'initialize', 'solver.advance', 'solver.initialize')
GROUP BY name
ORDER BY total_s DESC
"""

for row in tp.query(query):
print(row.name, row.n, row.total_s, row.avg_ms)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output:

$ python -m venv venv && source venv/bin/activate
$ pip install perfetto
$ python test.py 
advance 426 28.81354 67.63741784037559
solver.advance 428 27.648382 64.59902336448597
initialize 2 3.481007 1740.5035
solver.initialize 2 0.000714 0.357

print(row.name, row.n, row.total_s, row.avg_ms)
```

Recent Perfetto Python versions expose selected columns as row attributes (`row.name`, `row.total_s`, ...).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since which versions? What does that mean for the script?


for row in tp.query(query):
print(row.name, row.n, row.total_s, row.avg_ms)
```
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I was curious to try this snippet since I'm currently trying to plot preCICE performance in a more automated way. I'm basically looking for an easy way to extract the performance values, and such a small Python script would be great.
However, when using this, I see that it adds up the values for e.g., "advance" for my two participants. I think it would be more useful to see the sum for each participant separately. Also, it would be useful if all names are supported. Currently only the names that appear in the top level when doing analyze seem to be supported (for exampleWHERE name IN ('initialize/mapping')does not seem to work).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

content Content-only issues GSoC Contributed in the context of the Google Summer of Code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants