Commit af238cc

Merge pull request #460 from aalexand/update-53
Update abseil.io/fast/53 with recent changes.
2 parents c215078 + b618d09 commit af238cc

1 file changed

Lines changed: 31 additions & 42 deletions

File tree

_posts/2023-03-02-fast-53.md

@@ -12,7 +12,7 @@ Originally posted as Fast TotW #53 on October 14, 2021
1212

1313
*By [Mircea Trofin](mailto:mtrofin@google.com)*
1414

15-
Updated 2023-03-02
15+
Updated 2023-09-04
1616

1717
Quicklink: [abseil.io/fast/53](https://abseil.io/fast/53)
1818

@@ -77,10 +77,9 @@ user to specify up to 3 counters in a comma-separated list, via the
7777
`--benchmark_perf_counters` flag, to be measured alongside the time measurement.
7878
Just like time measurement, each counter value is captured right before the
7979
benchmarked code is run, and right after. The difference is reported to the user
80-
as per-iteration values (similar to the time measurement). The report is only
81-
available in the JSON output (`--benchmark_format=json`).
80+
as per-iteration values (similar to the time measurement).
8281

83-
### Simple example
82+
### Basic usage
8483

8584
**Note**: counter names are hardware vendor and version specific. The example
8685
here assumes Intel Skylake. Check how this maps to other versions of Intel CPUs,
@@ -92,51 +91,41 @@ Build a benchmark executable - for example, let's use "swissmap" from
9291
[fleetbench](https://github.com/google/fleetbench):
9392

9493
<pre class="prettyprint code">
95-
bazel build -c opt //fleetbench/swissmap:swissmap_benchmark
94+
bazel build -c opt //fleetbench/swissmap:cold_swissmap_benchmark
9695
</pre>
9796

9897
Run the benchmark; let's ask for instructions, cycles, and loads:
9998

10099
<pre class="prettyprint code">
101-
bazel-bin/fleetbench/swissmap/swissmap_benchmark --benchmarks=all --benchmark_perf_counters=INSTRUCTIONS,CYCLES,MEM_UOPS_RETIRED:ALL_LOADS --benchmark_format=json
100+
bazel-bin/fleetbench/swissmap/cold_swissmap_benchmark \
101+
--benchmark_filter='BM_.*::absl::flat_hash_set.*64.*set_size:64.*density:0' \
102+
--benchmark_perf_counters=INSTRUCTIONS,CYCLES,MEM_UOPS_RETIRED:ALL_LOADS
102103
</pre>
103104

104-
The output JSON file is organized as follows:
105-
106-
<pre class="prettyprint code">
107-
{
108-
"benchmarks": [
109-
{
110-
"CYCLES": 183357.29158733244,
111-
"INSTRUCTIONS": 603772.790402176,
112-
"MEM_UOPS_RETIRED:ALL_LOADS": 121.63652613172722,
113-
"bytes_per_second": 1804401396.9863303,
114-
"cpu_time_ns": 56750.122323683696,
115-
"iterations": 25735,
116-
"label": "html",
117-
"name": "BM_UDataBuffer/0",
118-
"real_time_ns": 56900.075383718671
119-
},
120-
{
121-
"CYCLES": 183782.38686892079,
122-
"INSTRUCTIONS": 603772.91427358345,
123-
"MEM_UOPS_RETIRED:ALL_LOADS": 119.59456538520921,
124-
"bytes_per_second": 1825391775.0291102,
125-
"cpu_time_ns": 56097.546510730273,
126-
"iterations": 25908,
127-
"label": "html",
128-
"name": "BM_UDataBuffer/0",
129-
"real_time_ns": 56245.906090782773
130-
},
131-
[...]
132-
}
133-
</pre>
134-
135-
For each run of the benchmark, the requested counters and their values are
136-
captured in a JSON dictionary. The values are per-iteration (note the
137-
`iterations` field). In the first run the benchmark completed `25735`
138-
iterations, so the total value for CYCLES measured by the benchmark was
139-
`183357.29158733244 * 25735`.
105+
The output looks like:
106+
107+
```
108+
Running ./cold_swissmap_benchmark
109+
Run on (8 X 4667.91 MHz CPU s)
110+
CPU Caches:
111+
L1 Data 32 KiB (x4)
112+
L1 Instruction 32 KiB (x4)
113+
L2 Unified 256 KiB (x4)
114+
L3 Unified 8192 KiB (x1)
115+
Load Average: 2.31, 2.08, 1.95
116+
---------------------------------------------------------------------------------------------------------------------------------------
117+
Benchmark Time CPU Iterations UserCounters...
118+
---------------------------------------------------------------------------------------------------------------------------------------
119+
BM_FindMiss_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0 18.4 ns 18.4 ns 39048136 CYCLES=82.9019 INSTRUCTIONS=35.7284 MEM_UOPS_RETIRED:ALL_LOADS=6.05507
120+
BM_FindHit_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0 33.3 ns 33.3 ns 20600490 CYCLES=152.156 INSTRUCTIONS=55.0354 MEM_UOPS_RETIRED:ALL_LOADS=15.0034
121+
BM_InsertHit_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0 34.8 ns 34.8 ns 19004416 CYCLES=157.956 INSTRUCTIONS=59.0354 MEM_UOPS_RETIRED:ALL_LOADS=16.0013
122+
BM_Iterate_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0 33.5 ns 33.5 ns 25444389 CYCLES=152.431 INSTRUCTIONS=57.9225 MEM_UOPS_RETIRED:ALL_LOADS=13.3892
123+
BM_InsertManyOrdered_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0 54.9 ns 54.8 ns 14141958 CYCLES=242.373 INSTRUCTIONS=111.455 MEM_UOPS_RETIRED:ALL_LOADS=33.1838
124+
BM_InsertManyUnordered_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0 50.0 ns 50.0 ns 14234753 CYCLES=227.516 INSTRUCTIONS=111.415 MEM_UOPS_RETIRED:ALL_LOADS=33.1781
125+
```
126+
127+
So we can see that `BM_FindMiss_Cold` took approximately 83 cycles, 36
128+
instructions, and 6 memory ops per iteration.
140129

141130
## Summary
142131
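
The per-iteration counters in the console output above can be turned back into totals by multiplying by the `Iterations` column. The following is a minimal sketch of that arithmetic, not part of the post or of the benchmark library: the helper name `parse_row` is hypothetical, and it assumes counters are printed as plain `NAME=value` decimals, as in the sample row (it fails with a `ValueError` on any other formatting).

```python
def parse_row(line: str) -> tuple[int, dict[str, float]]:
    """Split one console result row into (iterations, per-iteration counters).

    Assumption: counters appear at the end of the row as NAME=value tokens,
    and the token just before the first counter is the iteration count.
    """
    tokens = line.split()
    first = next(i for i, tok in enumerate(tokens) if "=" in tok)
    iterations = int(tokens[first - 1])
    counters = {}
    for tok in tokens[first:]:
        name, _, value = tok.rpartition("=")
        counters[name] = float(value)  # raises ValueError on unexpected formats
    return iterations, counters


# Row copied from the sample output in the diff above.
row = (
    "BM_FindMiss_Cold<::absl::flat_hash_set, 64>/set_size:64/density:0 "
    "18.4 ns 18.4 ns 39048136 CYCLES=82.9019 INSTRUCTIONS=35.7284 "
    "MEM_UOPS_RETIRED:ALL_LOADS=6.05507"
)

iterations, per_iter = parse_row(row)

# Reported values are per iteration, so total = per-iteration value * iterations.
totals = {name: value * iterations for name, value in per_iter.items()}

# A derived ratio: instructions retired per cycle for this benchmark.
ipc = per_iter["INSTRUCTIONS"] / per_iter["CYCLES"]

print(f"iterations: {iterations}")
for name, total in totals.items():
    print(f"{name}: {per_iter[name]:.4f} per iteration, about {total:.3e} in total")
print(f"instructions per cycle: {ipc:.3f}")
```

The same multiplication applies to the removed JSON example, where each entry carried an `iterations` field next to the per-iteration counter values.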
