From f5a1b2d879c538bfb8cd3657bda087a39014ad7e Mon Sep 17 00:00:00 2001
From: Timm Leon Erxleben
Date: Sun, 11 Jun 2023 18:47:19 +0200
Subject: [PATCH 1/2] documentation on flamgraphs

---
 doc/README.md      |  1 +
 doc/flamegraphs.md | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)
 create mode 100644 doc/flamegraphs.md

diff --git a/doc/README.md b/doc/README.md
index 8533cdde1..b3e360b55 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -16,6 +16,7 @@ Its goal is to provide a solid foundation for storage research and teaching.
 * [Installation and Usage](installation-usage.md)
 * [Configuration](configuration.md)
 * [Debugging](debugging.md)
+* [Using FlameGraphs to analyse performance](flamegraphs.md)
 * [Implementing a Backend](implementing-backend.md)
 * [JULEA-DB Details](db-code.md)
 * [HDF5 Support](hdf5.md)
diff --git a/doc/flamegraphs.md b/doc/flamegraphs.md
new file mode 100644
index 000000000..3feb095b0
--- /dev/null
+++ b/doc/flamegraphs.md
@@ -0,0 +1,66 @@
+# Using perf and FlameGraphs to profile JULEA
+
+This is a short guide on how to use `perf` and FlameGraphs to analyse the performance of JULEA's server.
+
+## Prerequisites
+
+- `perf` is installed
+- necessary permissions are set (`sudo sysctl kernel.perf_event_paranoid=0`, see [Kernel Documentation](https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html))
+- [FlameGraph Repository](https://github.com/brendangregg/FlameGraph) is cloned
+- JULEA's development environment is loaded
+
+## Profiling
+
+### Record stack traces
+
+Stack traces are recorded using perf as shown below.
+
+```bash
+perf record --call-graph dwarf -F 200 julea-server
+# start desired workload (e.g., JULEA's benchmarks)
+# ...
+# server can be terminated using Ctrl+C
+```
+
+A few points to note:
+- `-F` sets the sampling frequency in Hertz.
+You may want to specify a higher value depending on the needed precision.
+- Because traces can grow quite fast in size, it might be useful to get an overview first and use a higher resolution to examine specific performance issues or functions only.
+- JULEA's tests are not suited for profiling because of too few operations performed.
+If no specific application is given, JULEA's benchmarks are suitable as a profiling workload.
+- Adding `-a` to the perf record call will take samples from all running applications.
+This might be useful to also profile the JULEA client application if running on the same machine.
+
+### Fold stack traces
+
+If you want to use Brendan Gregg's FlameGraph script, you need to fold the generated stack traces.
+Though Speedscope, theoretically, does not require this step, it is beneficial for the performance the tool.
+Since Linux 4.5, perf can generate folded stack traces.
+However, the script from Brendan Gregg produces better color mappings in Speedscope.
+
+```bash
+perf script | ./stackcollapse-perf.pl > out.perf-folded
+```
+
+```bash
+perf report --stdio --no-children -n -g folded,0,caller,count -s comm | \
+awk '/^ / { comm = $3 } /^[0-9]/ { print comm ";" $2, $1 }' > out.perf-folded
+```
+
+### Visualize
+
+The folded stack traces can be visualized using Brendan Gregg's original FlameGraphs:
+
+```bash
+./flamegraph.pl out.perf-folded > perf.svg
+firefox perf.svg # or chrome, etc.
+```
+
+Alternatively, [Speedscope](https://www.speedscope.app/) is a nice web app to interactively explore the data.
+
+## Further Reading
+
+- [Step by step guide from Brendan Gregg](https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html)
+- [FlameGraph Repository](https://github.com/brendangregg/FlameGraph)
+- [Speedscope](https://www.speedscope.app/)
+- [perf record manpage](https://www.man7.org/linux/man-pages/man1/perf-record.1.html)

From ef7fcecf165bde6183b371356bc43c2e4bf57852 Mon Sep 17 00:00:00 2001
From: Timm Leon Erxleben
Date: Sun, 11 Jun 2023 18:54:55 +0200
Subject: [PATCH 2/2] clarification

---
 doc/flamegraphs.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/flamegraphs.md b/doc/flamegraphs.md
index 3feb095b0..9bbffd4f1 100644
--- a/doc/flamegraphs.md
+++ b/doc/flamegraphs.md
@@ -25,7 +25,7 @@ perf record --call-graph dwarf -F 200 julea-server
 A few points to note:
 - `-F` sets the sampling frequency in Hertz.
 You may want to specify a higher value depending on the needed precision.
-- Because traces can grow quite fast in size, it might be useful to get an overview first and use a higher resolution to examine specific performance issues or functions only.
+- Because traces can grow quite fast in size, it might be useful to get an overview first and use a higher resolution to examine only specific performance issues if possible.
 - JULEA's tests are not suited for profiling because of too few operations performed.
 If no specific application is given, JULEA's benchmarks are suitable as a profiling workload.
 - Adding `-a` to the perf record call will take samples from all running applications.
@@ -34,7 +34,7 @@ This might be useful to also profile the JULEA client application if running on
 ### Fold stack traces
 
 If you want to use Brendan Gregg's FlameGraph script, you need to fold the generated stack traces.
-Though Speedscope, theoretically, does not require this step, it is beneficial for the performance the tool.
+Though Speedscope, theoretically, does not require this step, it is beneficial for the performance of the tool.
 Since Linux 4.5, perf can generate folded stack traces.
 However, the script from Brendan Gregg produces better color mappings in Speedscope.
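
The folded file from the "Fold stack traces" step can be sanity-checked before it is rendered. The following is a minimal sketch, assuming the usual folded format of one stack per line (semicolon-separated frames, then a space and a sample count). It sums the samples per leaf frame, i.e. the function that was executing when the sample was taken. The helper name `top_leaf_frames` is made up for illustration and is not part of the FlameGraph repository or JULEA.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FlameGraph or JULEA):
# print "<samples> <leaf frame>" for a folded stack file, largest first.
# Assumes lines of the form "frame1;frame2;...;frameN count".
top_leaf_frames() {
    awk '
        NF >= 2 && $NF ~ /^[0-9]+$/ {
            count = $NF
            stack = $0
            sub(/ [0-9]+$/, "", stack)    # drop the trailing sample count
            n = split(stack, frames, ";") # frames[n] is the leaf frame
            samples[frames[n]] += count
        }
        END {
            for (f in samples) print samples[f], f
        }
    ' "$1" | sort -k1,1nr -k2
}
```

A possible use is `top_leaf_frames out.perf-folded | head`. If the output is empty or contains only a handful of samples, the recording was probably too short or the sampling frequency too low to produce a meaningful flame graph.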