@@ -12,14 +12,15 @@ Originally posted as Fast TotW #60 on June 6, 2022
 
 *By [Chris Kennelly](mailto:ckennelly@google.com)*
 
-Updated 2025-03-24
+Updated 2025-09-29
 
 Quicklink: [abseil.io/fast/60](https://abseil.io/fast/60)
 
 
 [Google-Wide Profiling](https://research.google/pubs/pub36575/) collects data
 not just from our hardware performance counters, but also from in-process
-profilers.
+profilers. These have been covered in previous episodes covering
+[hashtables](/fast/26).
 
 In-process profilers can give deeper insights about the state of the program
 that are hard to observe from the outside, such as lock contention, where memory
@@ -39,8 +40,8 @@ decisions faster, shortening our
 The value is in pulling in the area-under-curve and landing in a better spot. An
 "imperfect" profiler that can help make a decision is better than a "perfect"
 profiler that is unwieldy to collect for performance or privacy reasons. Extra
-information or precision is only useful insofar as it helps us make a *better*
-decision or *changes* the outcome.
+information or precision is only useful insofar as it helps us make a
+[*better* decision or *changes* the outcome](/fast/94).
 
 For example, most new optimizations to
 [TCMalloc](https://github.com/google/tcmalloc/blob/master/tcmalloc) start from
@@ -54,7 +55,7 @@ steps didn't directly save any CPU usage or bytes of RAM, but they enabled
 better decisions. Capabilities are harder to directly quantify, but they are the
 motor of progress.
 
-## Leveraging existing profilers: the "No build" option
+## Leveraging existing profilers: the "No build" option {#no-build}
 
 Developing a new profiler takes considerable time, both in terms of
 implementation and wallclock time to ready the fleet for collection at scale.
@@ -65,19 +66,19 @@ For example, if the case for hashtable profiling was just reporting the capacity
 of hashtables, then we could also derive that information from heap profiles,
 TCMalloc's heap profiles of the fleet. Even where heap profiles might not be
 able to provide precise insights--the actual "size" of the hashtable, rather
-than its capacity--we can make an informed guess from the profile combined with
-knowledge about the typical load factors due to SwissMap's design.
+than its capacity--we can make an [informed guess](/fast/90) from the profile
+combined with knowledge about the typical load factors due to SwissMap's design.
 
 It is important to articulate the value of the new profiler over what is already
 provided. A key driver for hashtable-specific profiling is that the CPU profiles
 of a hashtable with a
 [bad hash function look similar to those](https://youtu.be/JZE3_0qvrMg?t=1864)
-with a good hash function. The added information collected for stuck bits helps
-us drive optimization decisions we wouldn't have been able to make. The capacity
-information collected during hashtable-profiling is incidental to the profiler's
-richer, hashtable-specific details, but wouldn't be a particularly compelling
-reason to collect it on its own given the redundant information available from
-ordinary heap profiles.
+with a good hash function. The [added information collected](/fast/26) for stuck
+bits helps us drive optimization decisions we wouldn't have been able to make.
+The capacity information collected during hashtable-profiling is incidental to
+the profiler's richer, hashtable-specific details, but wouldn't be a
+particularly compelling reason to collect it on its own given the redundant
+information available from ordinary heap profiles.
 
 ## Sampling strategies
 