Commit 8f8a7e8

Fix pointers of the docs in the README
1 parent bafd0b9 commit 8f8a7e8

4 files changed

Lines changed: 47 additions & 52 deletions

README.md

Lines changed: 4 additions & 4 deletions
@@ -51,7 +51,7 @@ With these tools you can find:
 
 For more details on each of the performance analysis methods see the following
 documents:
-* GHC RTS Stats
-* threadCPUTime# RTS primitive
-* GHC Event logging
-* GHC patches details
+* [GHC RTS Stats](docs/ghc-rts-performance-analysis.md)
+* [threadCPUTime# RTS primitive](docs/thread-cputime-primop.md)
+* [GHC Event logging](docs/eventlog-performance-analysis.md)
+* [GHC patches details](dev/ghc-work.md)

docs/eventlog-performance-analysis.md

Lines changed: 17 additions & 1 deletion
@@ -1,6 +1,22 @@
-# haskell-perf
+# GHC Event logging
 
+Available in GHC 9.2.8 RTS patch. Can be ported to later GHCs.
+
+<!--
 GHC Patch: https://github.com/composewell/ghc/tree/ghc-8.10.7-eventlog-enhancements
+-->
+
+Eventlog based Haskell thread aware time and allocation analysis is
+possible with stock GHC but there are some limitations and drawbacks
+which are fixed in the RTS patch described below. The patch adds
+accurate timing and allocation information and hardware performance
+counters, and we then use a custom event log analysis program to provide
+an accurate and comprehensive analysis of all the threads in the entire
+program not just the current thread.
+
+<!--
+TBD: document the exact limitations and differences.
+-->
 
 ## Enable Linux perf counters
 

docs/ghc-rts-performance-analysis.md

Lines changed: 24 additions & 0 deletions
@@ -1,3 +1,27 @@
+<!--
+# GHC RTS Stats
+
+RTS Stats give entire OS process level (not OS thread level) cpu time
+and not Haskell thread cpu time. When multiple OS threads are used, the
+cpu time recorded is the cpu time of all the threads combined. Also, the
+way kernel accounts this time it could be off by a little (microseconds)
+because each thread's cpu time is recorded at the last accounting
+event. Allocations are recorded by the GHC RTS only at the GC boundary,
+so the allocations reported are from the point when the last GC
+happened. So we need to be careful when using or interpreting these
+stats.
+
+If we built the program without -threaded and we are using a single
+Haskell thread then we can get cpu time between any two points in the
+program accurately. Accurate accounting of allocations will require a GC
+to be forced which is not usually practical.
+
+In a multithreaded program using RTS stats we can only tell time how
+much total CPU time (and allocations) the entire Haskell process (all
+threads) spent between two points, but we cannot tell which Haskell
+thread spent how much time.
+-->
+
 # Components of a Haskell Process
 
 * An OS level process
Lines changed: 2 additions & 47 deletions
@@ -1,39 +1,4 @@
-# Haskell Performance Analysis
-
-## GHC RTS Stats
-
-RTS Stats give entire OS process level (not OS thread level) cpu time
-and not Haskell thread cpu time. When multiple OS threads are used, the
-cpu time recorded is the cpu time of all the threads combined. Also, the
-way kernel accounts this time it could be off by a little (microseconds)
-because each thread's cpu time is recorded at the last accounting
-event. Allocations are recorded by the GHC RTS only at the GC boundary,
-so the allocations reported are from the point when the last GC
-happened. So we need to be careful when using or interpreting these
-stats.
-
-If we built the program without -threaded and we are using a single
-Haskell thread then we can get cpu time between any two points in the
-program accurately. Accurate accounting of allocations will require a GC
-to be forced which is not usually practical.
-
-In a multithreaded program using RTS stats we can only tell time how
-much total CPU time (and allocations) the entire Haskell process (all
-threads) spent between two points, but we cannot tell which Haskell
-thread spent how much time.
-
-## GHC Event logging
-
-Eventlog based Haskell thread aware time and allocation analysis is
-possible with stock GHC but there are some limitations and drawbacks
-which are fixed in the RTS patch described below. The patch basically
-adds accurate information and more information, and we then use a custom
-event log analysis program to provide an accurate and comprehensive
-picture of the entire program.
-
-TBD: document the exact limitations and differences.
-
-## threadCPUTime# prim op
+# threadCPUTime# prim op
 
 Available in the
 [GHC 9.2.8 RTS patch](https://github.com/composewell/ghc/releases/tag/ghc-9.2.8-perf-counters-1-rc1).
@@ -56,7 +21,7 @@ and B in a program, diff will tell us the time spent and allocations
 between the two points.
 
 We have to ensure that we are diffing the data for the same thread id at
-both the points. See [this example program](./threadCPUTime.hs).
+both the points. See [this example program](../examples/threadCPUTime.hs).
 
 The API has some measurement overhead but it is not very high. If we
 are nesting measurements be aware that outer measurement will measure
@@ -82,13 +47,3 @@ though. For accurate synchronization (if needed) of all threads at the
 given points we can stop-the-world, can be useful in testing but not a
 good idea in production though. Also, managing windows with possible
 nesting can complicate the RTS code.
-
-## Eventlog based perf counters
-
-Available in GHC 8.10.7 RTS patch. Can be ported to later GHCs.
-
-This gives you a more comprehensive picture of the entire program
-between any two specified points, it gives a detailed report about all
-the threads in the system not just the current thread.
-
-See the [README](../README.md) for more details on this.
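
Note on the RTS stats text moved by this commit: the caveats it describes (process-wide CPU time, counters refreshed only at GC boundaries) can be illustrated with a minimal sketch against stock `GHC.Stats` from `base`. This is not code from the repository and does not use the patched `threadCPUTime#` primop; the helper name `measureProcess` is hypothetical, and the program is assumed to run with RTS stats enabled (for example, linked with `-with-rtsopts=-T`).

```haskell
import Control.Exception (evaluate)
import Data.Int (Int64)
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- | Hypothetical helper: run an action and report the process-wide CPU time
-- (nanoseconds) and allocated bytes between its start and end.
-- The stats are Nothing when the RTS was not started with stats enabled.
measureProcess :: IO a -> IO (a, Maybe (Int64, Word64))
measureProcess action = do
    enabled <- getRTSStatsEnabled
    if not enabled
        then do
            r <- action
            return (r, Nothing)
        else do
            -- Force a GC so the counters reflect "now" rather than the
            -- last GC that happened to run.
            performGC
            before <- getRTSStats
            r <- action
            performGC
            after <- getRTSStats
            -- These are totals for the whole process (all threads), not
            -- for the calling Haskell thread.
            let cpu = cpu_ns after - cpu_ns before
                alloc = allocated_bytes after - allocated_bytes before
            return (r, Just (cpu, alloc))

main :: IO ()
main = do
    (total, stats) <- measureProcess (evaluate (sum [1 .. 1000000 :: Int]))
    print total
    case stats of
        Nothing -> putStrLn "RTS stats disabled; run with +RTS -T"
        Just (cpu, alloc) ->
            putStrLn ("cpu ns: " ++ show cpu ++ ", allocated bytes: " ++ show alloc)
```

The two forced GCs are what make the allocation figure meaningful, and their cost is included in the measured CPU time, which is why the moved text calls this approach "not usually practical". In a program with several Haskell threads, the figures also include whatever the other threads did in the same window.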
