1- # Haskell Performance Analysis
2-
3- ## GHC RTS Stats
4-
5- RTS Stats give entire OS process level (not OS thread level) cpu time
6- and not Haskell thread cpu time. When multiple OS threads are used, the
7- cpu time recorded is the cpu time of all the threads combined. Also, the
8- way kernel accounts this time it could be off by a little (microseconds)
9- because each thread's cpu time is recorded at the last accounting
10- event. Allocations are recorded by the GHC RTS only at the GC boundary,
11- so the allocations reported are from the point when the last GC
12- happened. So we need to be careful when using or interpreting these
13- stats.
14-
15- If we built the program without -threaded and we are using a single
16- Haskell thread then we can get cpu time between any two points in the
17- program accurately. Accurate accounting of allocations will require a GC
18- to be forced which is not usually practical.
19-
20- In a multithreaded program using RTS stats we can only tell how
21- much total CPU time (and allocations) the entire Haskell process (all
22- threads) spent between two points, but we cannot tell which Haskell
23- thread spent how much time.
24-
25- ## GHC Event logging
26-
27- Eventlog based Haskell thread aware time and allocation analysis is
28- possible with stock GHC but there are some limitations and drawbacks
29- which are fixed in the RTS patch described below. The patch basically
30- adds accurate information and more information, and we then use a custom
31- event log analysis program to provide an accurate and comprehensive
32- picture of the entire program.
33-
34- TBD: document the exact limitations and differences.
35-
36- ## threadCPUTime# prim op
1+ # threadCPUTime# prim op
37 2
38 3 Available in the
39 4 [GHC 9.2.8 RTS patch](https://github.com/composewell/ghc/releases/tag/ghc-9.2.8-perf-counters-1-rc1).
@@ -56,7 +21,7 @@ and B in a program, diff will tell us the time spent and allocations
56 21 between the two points.
57 22
58 23 We have to ensure that we are diffing the data for the same thread id at
59- both the points. See [this example program](./threadCPUTime.hs).
24+ both the points. See [this example program](../examples/threadCPUTime.hs).
60 25
61 26 The API has some measurement overhead but it is not very high. If we
62 27 are nesting measurements be aware that outer measurement will measure
@@ -82,13 +47,3 @@ though. For accurate synchronization (if needed) of all threads at the
82 47 given points we can stop-the-world, can be useful in testing but not a
83 48 good idea in production though. Also, managing windows with possible
84 49 nesting can complicate the RTS code.
85-
86- ## Eventlog based perf counters
87-
88- Available in GHC 8.10.7 RTS patch. Can be ported to later GHCs.
89-
90- This gives you a more comprehensive picture of the entire program
91- between any two specified points, it gives a detailed report about all
92- the threads in the system not just the current thread.
93-
94- See the [README](../README.md) for more details on this.