Merge pull request #460 from aalexand/update-53

tituswinters · web-flow · commit af238cc7c4bd · 2023-09-14T12:16:42.000-04:00
Update abseil.io/fast/53 with recent changes.
diff --git a/_posts/2023-03-02-fast-53.md b/_posts/2023-03-02-fast-53.md
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #53 on October 14, 2021
 
 *By [Mircea Trofin](mailto:mtrofin@google.com)*
 
-Updated 2023-03-02
+Updated 2023-09-04
 
 Quicklink: [abseil.io/fast/53](https://abseil.io/fast/53)
 
@@ -77,10 +77,9 @@ user to specify up to 3 counters in a comma-separated list, via the
 `--benchmark_perf_counters` flag, to be measured alongside the time measurement.
 Just like time measurement, each counter value is captured right before the
 benchmarked code is run, and right after. The difference is reported to the user
-as per-iteration values (similar to the time measurement). The report is only
-available in the JSON output (`--benchmark_format=json`).
+as per-iteration values (similar to the time measurement).
 
-### Simple example
+### Basic usage
 
 **Note**: counter names are specific to the hardware vendor and version. The example
 here assumes Intel Skylake. Check how this maps to other versions of Intel CPUs,
@@ -92,51 +91,41 @@ Build a benchmark executable - for example, let's use "swissmap" from
 [fleetbench](https://github.com/google/fleetbench):
 
 <pre class="prettyprint code">
-bazel build -c opt //fleetbench/swissmap:swissmap_benchmark
+bazel build -c opt //fleetbench/swissmap:cold_swissmap_benchmark
 </pre>
 
 Run the benchmark, selecting a few `absl::flat_hash_set` cases with `--benchmark_filter`, and ask for instructions, cycles, and loads:
 
 <pre class="prettyprint code">
-bazel-bin/fleetbench/swissmap/swissmap_benchmark --benchmarks=all --benchmark_perf_counters=INSTRUCTIONS,CYCLES,MEM_UOPS_RETIRED:ALL_LOADS --benchmark_format=json
+bazel-bin/fleetbench/swissmap/cold_swissmap_benchmark \
+  --benchmark_filter='BM_.*::absl::flat_hash_set.*64.*set_size:64.*density:0' \
+  --benchmark_perf_counters=INSTRUCTIONS,CYCLES,MEM_UOPS_RETIRED:ALL_LOADS
 </pre>
 
-The output JSON file is organized as follows:
-
-<pre class="prettyprint code">
-{
-  "benchmarks": [
-    {
-      "CYCLES": 183357.29158733244,
-      "INSTRUCTIONS": 603772.790402176,
-      "MEM_UOPS_RETIRED:ALL_LOADS": 121.63652613172722,
-      "bytes_per_second": 1804401396.9863303,
-      "cpu_time_ns": 56750.122323683696,
-      "iterations": 25735,
-      "label": "html",
-      "name": "BM_UDataBuffer/0",
-      "real_time_ns": 56900.075383718671
-    },
-    {
-      "CYCLES": 183782.38686892079,
-      "INSTRUCTIONS": 603772.91427358345,
-      "MEM_UOPS_RETIRED:ALL_LOADS": 119.59456538520921,
-      "bytes_per_second": 1825391775.0291102,
-      "cpu_time_ns": 56097.546510730273,
-      "iterations": 25908,
-      "label": "html",
-      "name": "BM_UDataBuffer/0",
-      "real_time_ns": 56245.906090782773
-    },
-    [...]
-}
-</pre>
-
-For each run of the benchmark, the requested counters and their values are
-captured in a JSON dictionary. The values are per-iteration (note the
-`iterations` field). In the first run the benchmark completed `25735`
-iterations, so the total value for CYCLES measured by the benchmark was
-`183357.29158733244 * 25735`.
+The output looks similar to this (exact numbers depend on the machine):
+
+```
+Running ./cold_swissmap_benchmark
+Run on (8 X 4667.91 MHz CPU s)
+CPU Caches:
+  L1 Data 32 KiB (x4)
+  L1 Instruction 32 KiB (x4)
+  L2 Unified 256 KiB (x4)
+  L3 Unified 8192 KiB (x1)
+Load Average: 2.31, 2.08, 1.95
+---------------------------------------------------------------------------------------------------------------------------------------
+Benchmark                                                                             Time             CPU   Iterations UserCounters...
+---------------------------------------------------------------------------------------------------------------------------------------
+BM_FindMiss_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0                  18.4 ns         18.4 ns     39048136 CYCLES=82.9019 INSTRUCTIONS=35.7284 MEM_UOPS_RETIRED:ALL_LOADS=6.05507
+BM_FindHit_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0                   33.3 ns         33.3 ns     20600490 CYCLES=152.156 INSTRUCTIONS=55.0354 MEM_UOPS_RETIRED:ALL_LOADS=15.0034
+BM_InsertHit_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0                 34.8 ns         34.8 ns     19004416 CYCLES=157.956 INSTRUCTIONS=59.0354 MEM_UOPS_RETIRED:ALL_LOADS=16.0013
+BM_Iterate_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0                   33.5 ns         33.5 ns     25444389 CYCLES=152.431 INSTRUCTIONS=57.9225 MEM_UOPS_RETIRED:ALL_LOADS=13.3892
+BM_InsertManyOrdered_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0         54.9 ns         54.8 ns     14141958 CYCLES=242.373 INSTRUCTIONS=111.455 MEM_UOPS_RETIRED:ALL_LOADS=33.1838
+BM_InsertManyUnordered_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0       50.0 ns         50.0 ns     14234753 CYCLES=227.516 INSTRUCTIONS=111.415 MEM_UOPS_RETIRED:ALL_LOADS=33.1781
+```
+
+For example, `BM_FindMiss_Cold` took approximately 83 cycles, 36 instructions, and 6
+retired load micro-ops per iteration. Multiply by `Iterations` to get whole-run totals.
 
 ## Summary