Commit 5f9d5bf
Optimize crc32 on AMD Milan+
Milan supports the AVX-encoded (VEX) vector form of PCLMULQDQ, so use it to
make crc32c computations ~10% faster at benchmark sizes of 2048 and up
(size 100 regresses by ~6%; see the results below). The instruction is
emitted via inline asm, because building this code twice with different
compiler flags for dynamic dispatch performed worse due to missing inlining.
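The inline-asm approach described above can be illustrated with a minimal sketch. It is not the code from this commit: the names `ClmulPortable`, `ClmulVexAsm`, `CpuHasVexClmul`, and `Clmul` are hypothetical, and it uses the 128-bit VEX form on a single pair of 64-bit operands for simplicity. The point it shows is that the instruction is written as inline asm, so the file needs no special `-m` flags and the helper can still be inlined into its caller, with a runtime CPU check (GCC/Clang builtins) selecting the path.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <emmintrin.h>  // SSE2 only (baseline on x86-64): __m128i helpers
#define SKETCH_HAVE_X86_ASM 1
#endif

// Portable carry-less (GF(2)) multiply: 64 x 64 -> 128 bits as {lo, hi}.
inline std::pair<uint64_t, uint64_t> ClmulPortable(uint64_t a, uint64_t b) {
  uint64_t lo = 0, hi = 0;
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      lo ^= a << i;
      if (i != 0) hi ^= a >> (64 - i);
    }
  }
  return {lo, hi};
}

#ifdef SKETCH_HAVE_X86_ASM
// Hypothetical helper: same product via the VEX-encoded instruction.
// Because the instruction is spelled out in inline asm, this translation
// unit does not have to be compiled with -mavx or -mpclmul.
inline std::pair<uint64_t, uint64_t> ClmulVexAsm(uint64_t a, uint64_t b) {
  __m128i va = _mm_cvtsi64_si128(static_cast<long long>(a));
  __m128i vb = _mm_cvtsi64_si128(static_cast<long long>(b));
  __m128i r;
  // AT&T operand order: imm, src2, src1, dst. imm 0x00 selects the low
  // 64-bit halves of both sources.
  asm("vpclmulqdq $0x00, %2, %1, %0" : "=x"(r) : "x"(va), "x"(vb));
  uint64_t lo = static_cast<uint64_t>(_mm_cvtsi128_si64(r));
  uint64_t hi =
      static_cast<uint64_t>(_mm_cvtsi128_si64(_mm_unpackhi_epi64(r, r)));
  return {lo, hi};
}

// Runtime check that the CPU (and OS) can execute the VEX form.
inline bool CpuHasVexClmul() {
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx") && __builtin_cpu_supports("pclmul");
}
#endif

// Dispatches once at first use; falls back to the portable loop elsewhere.
inline std::pair<uint64_t, uint64_t> Clmul(uint64_t a, uint64_t b) {
#ifdef SKETCH_HAVE_X86_ASM
  static const bool use_asm = CpuHasVexClmul();
  if (use_asm) {
    // Debug-only cross-check of the asm path against the portable path.
    assert(ClmulVexAsm(a, b) == ClmulPortable(a, b));
    return ClmulVexAsm(a, b);
  }
#endif
  return ClmulPortable(a, b);
}
```

A call such as `Clmul(3, 3)` yields `{5, 0}`, since (x + 1)^2 = x^2 + 1 over GF(2). A real CRC kernel would keep the dispatch decision outside the hot loop and fold many such products per iteration.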
Benchmark | before | after | change
BM_Calculate/0 1.136n ± 0% 1.136n ± 1% ~ (p=0.968 n=6)
BM_Calculate/1 1.420n ± 0% 1.421n ± 1% ~ (p=0.870 n=6)
BM_Calculate/100 9.089n ± 0% 9.660n ± 1% +6.29% (p=0.002 n=6)
BM_Calculate/2048 75.30n ± 1% 67.67n ± 1% -10.13% (p=0.002 n=6)
BM_Calculate/10000 313.1n ± 0% 286.1n ± 0% -8.63% (p=0.002 n=6)
BM_Calculate/500000 14.91µ ± 4% 13.49µ ± 1% -9.48% (p=0.002 n=6)
BM_Extend/0 1.136n ± 1% 1.136n ± 1% ~ (p=0.636 n=6)
BM_Extend/1 1.420n ± 0% 1.420n ± 1% ~ (p=0.636 n=6)
BM_Extend/100 9.247n ± 2% 9.800n ± 2% +5.99% (p=0.002 n=6)
BM_Extend/2048 75.73n ± 1% 67.37n ± 1% -11.04% (p=0.002 n=6)
BM_Extend/10000 313.2n ± 1% 286.2n ± 0% -8.62% (p=0.002 n=6)
BM_Extend/500000 14.87µ ± 1% 13.57µ ± 1% -8.74% (p=0.002 n=6)
BM_Extend/100000000 3.185m ± 2% 2.816m ± 3% -11.60% (p=0.002 n=6)
BM_ExtendCacheMiss/10 26.07m ± 1% 26.06m ± 1% ~ (p=1.000 n=6)
BM_ExtendCacheMiss/100 13.86m ± 4% 14.36m ± 2% +3.61% (p=0.026 n=6)
BM_ExtendCacheMiss/1000 27.02m ± 4% 27.28m ± 4% ~ (p=0.699 n=6)
BM_ExtendCacheMiss/100000 5.114m ± 5% 4.600m ± 8% -10.07% (p=0.002 n=6)
BM_ExtendByZeroes/1 1.420n ± 0% 1.420n ± 0% ~ (p=0.670 n=12)
BM_ExtendByZeroes/10 1.704n ± 1% 1.704n ± 0% ~ (p=1.000 n=6)
BM_ExtendByZeroes/100 3.128n ± 0% 3.128n ± 0% ~ (p=1.000 n=6)
BM_ExtendByZeroes/1000 6.758n ± 0% 6.638n ± 1% -1.78% (p=0.002 n=6)
BM_ExtendByZeroes/10000 6.619n ± 1% 6.503n ± 0% -1.75% (p=0.002 n=6)
BM_ExtendByZeroes/100000 8.537n ± 1% 8.479n ± 0% -0.67% (p=0.019 n=6)
BM_ExtendByZeroes/1000000 9.766n ± 1% 9.692n ± 1% -0.75% (p=0.002 n=6)
PiperOrigin-RevId: 900870516
Change-Id: I1382ae2ffeed35e1d55a0916290144cae5256fe0
1 parent cd0423d, commit 5f9d5bf
2 files changed
Lines changed: 182 additions & 80 deletions