abseil
diff --git a/_posts/2023-03-02-fast-21.md b/_posts/2023-03-02-fast-21.md
Lines changed: 1 addition & 1 deletion
diff --git a/_posts/2023-03-02-fast-39.md b/_posts/2023-03-02-fast-39.md
Lines changed: 6 additions & 5 deletions
diff --git a/_posts/2023-03-02-fast-53.md b/_posts/2023-03-02-fast-53.md
Lines changed: 7 additions & 5 deletions
diff --git a/_posts/2023-03-02-fast-9.md b/_posts/2023-03-02-fast-9.md
Lines changed: 1 addition & 1 deletion
diff --git a/_posts/2023-09-14-fast-7.md b/_posts/2023-09-14-fast-7.md
Lines changed: 1 addition & 1 deletion
diff --git a/_posts/2023-09-30-fast-52.md b/_posts/2023-09-30-fast-52.md
Lines changed: 1 addition & 1 deletion
diff --git a/_posts/2023-10-10-fast-64.md b/_posts/2023-10-10-fast-64.md
Lines changed: 2 additions & 2 deletions
diff --git a/_posts/2023-10-15-fast-60.md b/_posts/2023-10-15-fast-60.md
Lines changed: 14 additions & 13 deletions
diff --git a/_posts/2023-10-20-fast-70.md b/_posts/2023-10-20-fast-70.md
Lines changed: 8 additions & 1 deletion
diff --git a/_posts/2023-11-10-fast-74.md b/_posts/2023-11-10-fast-74.md
Lines changed: 12 additions & 12 deletions
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #21 on January 16, 2020
 
 *By [Paul Wankadia](mailto:junyer@google.com) and [Darryl Gove](mailto:djgove@google.com)*
 
-Updated 2024-10-21
+Updated 2025-09-03
 
 Quicklink: [abseil.io/fast/21](https://abseil.io/fast/21)
 
 
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #39 on January 22, 2021
 
 *By [Chris Kennelly](mailto:ckennelly@google.com) and [Alkis Evlogimenos](mailto:alkis@evlogimenos.com)*
 
-Updated 2025-03-24
+Updated 2025-09-29
 
 Quicklink: [abseil.io/fast/39](https://abseil.io/fast/39)
 
@@ -112,10 +112,11 @@ challenging: Microbenchmarks tend to have small working sets that tend to be
 cache resident. Real code, particularly Google C++, is not.
 
 In production, the cacheline holding `kMasks` might be evicted, leading to much
-worse stalls (hundreds of cycles to access main memory). Additionally, on x86
-processors since Haswell, this [optimization can be past its prime](/fast/9):
-BMI2's `bzhi` instruction is both faster than loading and masking *and* delivers
-more consistent performance.
+worse stalls
+([hundreds of cycles to access main memory](https://sre.google/static/pdf/rule-of-thumb-latency-numbers-letter.pdf)).
+Additionally, on x86 processors since Haswell, this
+[optimization can be past its prime](/fast/9): BMI2's `bzhi` instruction is both
+faster than loading and masking *and* delivers more consistent performance.
 
 When developing benchmarks for
 [SwissMap](https://abseil.io/blog/20180927-swisstables), individual operations
 
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #53 on October 14, 2021
 
 *By [Mircea Trofin](mailto:mtrofin@google.com)*
 
-Updated 2024-11-19
+Updated 2025-09-03
 
 Quicklink: [abseil.io/fast/53](https://abseil.io/fast/53)
 
@@ -73,7 +73,7 @@ the process of writing a benchmark. An example of its use may be seen
 [here](https://github.com/llvm/llvm-test-suite/tree/main/MicroBenchmarks/LoopVectorization)
 
 The benchmark harness support for performance counters consists of allowing the
-user to specify up to 3 counters in a comma-separated list, via the
+user to specify counters in a comma-separated list, via the
 `--benchmark_perf_counters` flag, to be measured alongside the time measurement.
 Just like time measurement, each counter value is captured right before the
 benchmarked code is run, and right after. The difference is reported to the user
@@ -131,13 +131,15 @@ instructions, and 6 memory ops per iteration.
 
 -   *Number of counters*: At most 32 events may be requested for simultaneous
     collection. Note however, that the number of hardware counters available is
-    much lower (usually 4-8 on modern CPUs) -- requesting more events than the
+    much lower (usually 4-8 on modern CPUs, see
+    `PerfCounterValues::kMaxCounters`) -- requesting more events than the
     hardware counters will cause
     [multiplexing](https://perf.wiki.kernel.org/index.php/Tutorial#multiplexing_and_scaling_events)
     and decreased accuracy.
 
--   *Visualization*: There is no visualization available, so the user needs to
-    rely on collecting JSON result files and summarizing the results.
+-   *Visualization*: There is no dedicated visualization UI available, so for
+    complex analysis, users may need to collect JSON result files and summarize
+    the results.
 
 -   *Counting vs. Sampling*: The framework only collects counters in "counting"
     mode -- it answers how many cycles/cache misses/etc. happened, but not does
 
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #9 on June 24, 2019
 
 *By [Chris Kennelly](mailto:ckennelly@google.com)*
 
-Updated 2025-03-27
+Updated 2025-10-03
 
 Quicklink: [abseil.io/fast/9](https://abseil.io/fast/9)
 
 
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #7 on June 6, 2019
 
 *By [Chris Kennelly](mailto:ckennelly@google.com)*
 
-Updated 2025-03-25
+Updated 2025-10-03
 
 Quicklink: [abseil.io/fast/7](https://abseil.io/fast/7)
 
 
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #52 on September 30, 2021
 
 *By [Chris Kennelly](mailto:ckennelly@google.com)*
 
-Updated 2025-03-24
+Updated 2025-10-03
 
 Quicklink: [abseil.io/fast/52](https://abseil.io/fast/52)
 
 
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #64 on October 21, 2022
 
 *By [Chris Kennelly](mailto:ckennelly@google.com)*
 
-Updated 2025-03-24
+Updated 2025-09-29
 
 Quicklink: [abseil.io/fast/64](https://abseil.io/fast/64)
 
@@ -192,7 +192,7 @@ that can be returned. This approach has two problems:
     variable small string object buffer sizes. Returning `const std::string&`
     constrains the implementation to that particular size of buffer.
 
-In contrast, by returning `std::string_view` (or our
+In contrast, by returning [`std::string_view`](/tips/1) (or our
 [internal predecessor](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html),
 `StringPiece`), we decouple callers from the internal representation. The API is
 the same, independent of whether the string is constant data (backed by the
 
@@ -12,14 +12,15 @@ Originally posted as Fast TotW #60 on June 6, 2022
 
 *By [Chris Kennelly](mailto:ckennelly@google.com)*
 
-Updated 2025-03-24
+Updated 2025-09-29
 
 Quicklink: [abseil.io/fast/60](https://abseil.io/fast/60)
 
 
 [Google-Wide Profiling](https://research.google/pubs/pub36575/) collects data
 not just from our hardware performance counters, but also from in-process
-profilers.
+profilers. These have been covered in previous episodes covering
+[hashtables](/fast/26).
 
 In-process profilers can give deeper insights about the state of the program
 that are hard to observe from the outside, such as lock contention, where memory
@@ -39,8 +40,8 @@ decisions faster, shortening our
 The value is in pulling in the area-under-curve and landing in a better spot. An
 "imperfect" profiler that can help make a decision is better than a "perfect"
 profiler that is unwieldy to collect for performance or privacy reasons. Extra
-information or precision is only useful insofar as it helps us make a *better*
-decision or *changes* the outcome.
+information or precision is only useful insofar as it helps us make a
+[*better* decision or *changes* the outcome](/fast/94).
 
 For example, most new optimizations to
 [TCMalloc](https://github.com/google/tcmalloc/blob/master/tcmalloc) start from
@@ -54,7 +55,7 @@ steps didn't directly save any CPU usage or bytes of RAM, but they enabled
 better decisions. Capabilities are harder to directly quantify, but they are the
 motor of progress.
 
-## Leveraging existing profilers: the "No build" option
+## Leveraging existing profilers: the "No build" option {#no-build}
 
 Developing a new profiler takes considerable time, both in terms of
 implementation and wallclock time to ready the fleet for collection at scale.
@@ -65,19 +66,19 @@ For example, if the case for hashtable profiling was just reporting the capacity
 of hashtables, then we could also derive that information from heap profiles,
 TCMalloc's heap profiles of the fleet. Even where heap profiles might not be
 able to provide precise insights--the actual "size" of the hashtable, rather
-than its capacity--we can make an informed guess from the profile combined with
-knowledge about the typical load factors due to SwissMap's design.
+than its capacity--we can make an [informed guess](/fast/90) from the profile
+combined with knowledge about the typical load factors due to SwissMap's design.
 
 It is important to articulate the value of the new profiler over what is already
 provided. A key driver for hashtable-specific profiling is that the CPU profiles
 of a hashtable with a
 [bad hash function look similar to those](https://youtu.be/JZE3_0qvrMg?t=1864)
-with a good hash function. The added information collected for stuck bits helps
-us drive optimization decisions we wouldn't have been able to make. The capacity
-information collected during hashtable-profiling is incidental to the profiler's
-richer, hashtable-specific details, but wouldn't be a particularly compelling
-reason to collect it on its own given the redundant information available from
-ordinary heap profiles.
+with a good hash function. The [added information collected](/fast/26) for stuck
+bits helps us drive optimization decisions we wouldn't have been able to make.
+The capacity information collected during hashtable-profiling is incidental to
+the profiler's richer, hashtable-specific details, but wouldn't be a
+particularly compelling reason to collect it on its own given the redundant
+information available from ordinary heap profiles.
 
 ## Sampling strategies
 
 
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #70 on June 26, 2023
 
 *By [Chris Kennelly](mailto:ckennelly@google.com)*
 
-Updated 2025-03-25
+Updated 2025-10-03
 
 Quicklink: [abseil.io/fast/70](https://abseil.io/fast/70)
 
@@ -129,6 +129,13 @@ performance improvements. We still need to measure the impact on application and
 service-level performance, but the proxies help us hone in on an optimization
 that we want to deploy faster.
 
+When we are considering multiple options for a project, secondary metrics can
+give us confirmation after the fact that our expectations were correct. For
+example, suppose we chose option A over option B because both provided
+comparable performance but A would not impact reliability. We should measure
+both the performance and reliability outcomes to support our engineering
+decision. This lets us close the loop between expectations and reality.
+
 ## Aligning with success
 
 The metrics we pick need to align with success. If a metric tells us to do the
 
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #74 on September 29, 2023
 
 *By [Chris Kennelly](mailto:ckennelly@google.com) and [Matt Kulukundis](mailto:kfm@google.com)*
 
-Updated 2025-03-25
+Updated 2025-10-03
 
 Quicklink: [abseil.io/fast/74](https://abseil.io/fast/74)
 
@@ -74,12 +74,12 @@ understand, we might be tempted to remove it. TCMalloc's fast path would appear
 cheaper, but other code somewhere else would experience a cache miss and
 [application productivity](/fast/7) would decline.
 
-To make matters worse, the cost is partly a profiling artifact. The TLB miss
-blocks instruction retirement, but our processors are superscalar, out-of-order
-behemoths. The processor can continue to execute further instructions in the
-meantime, but this execution is not visible to a sampling profiler like
-Google-Wide Profiling. IPC in the application may be improved, but not in a way
-immediately associated with TCMalloc.
+To make matters worse, the cost is partly [a profiling artifact](/fast/94). The
+TLB miss blocks instruction retirement, but our processors are superscalar,
+out-of-order behemoths. The processor can continue to execute further
+instructions in the meantime, but this execution is not visible to a sampling
+profiler like Google-Wide Profiling. IPC in the application may be improved, but
+not in a way immediately associated with TCMalloc.
 
 ### Hidden context switch costs
 
@@ -104,11 +104,11 @@ increase apparent kernel scheduler latency.
 
 ### Sweeping away protocol buffers
 
-Consider an extreme example. When our hashtable profiler for Abseil's hashtables
-indicates a problematic hashtable, a user could switch the offending table from
-`absl::flat_hash_map` to `std::unordered_map`. Since the profiler doesn't
-collect information about `std` containers, the offending table would no longer
-show up, although the fleet itself would be dramatically worse.
+Consider an extreme example. When [our hashtable profiler](/fast/26) for
+Abseil's hashtables indicates a problematic hashtable, a user could switch the
+offending table from `absl::flat_hash_map` to `std::unordered_map`. Since the
+profiler doesn't collect information about `std` containers, the offending table
+would no longer show up, although the fleet itself would be dramatically worse.
 
 While the above example may seem contrived, an almost entirely analogous
 recommendation comes up with some regularity: migrate users from protos to