Commit b85d169
Optimize crc32 on AMD Milan+
Milan supports the AVX-encoded vector form of PCLMULQDQ (vpclmulqdq), so
use it to make crc32c computations ~10% faster on inputs of about 2 KiB
and larger. We need to use inline asm, since building this code twice
with different compiler flags for dynamic dispatch performed worse due
to missing inlining.
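
The inline-asm approach described above can be illustrated with a minimal sketch. This is not the code from this commit: every name in it (U128, ClmulPortable, HasVexClmul, ClmulLowAsm, Clmul, CheckClmul) is hypothetical, and it uses only the 128-bit VEX-encoded form of the instruction on GCC or Clang for x86-64. The point it shows is that the asm statement needs no special compiler flags, so the helper can live in a translation unit built with baseline flags, sit behind a runtime CPU check, and still be inlined into its caller.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <emmintrin.h>  // SSE2 vector types only; part of the x86-64 baseline
#define SKETCH_X86_ASM 1
#else
#define SKETCH_X86_ASM 0
#endif

// 128-bit carry-less product, split into two 64-bit halves.
struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Portable bit-by-bit carry-less multiply; serves as fallback and reference.
inline U128 ClmulPortable(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);
    }
  }
  return r;
}

#if SKETCH_X86_ASM
// Runtime check (assumption: AVX plus PCLMUL is enough for the 128-bit
// VEX-encoded form used below; wider forms need further feature checks).
inline bool HasVexClmul() {
  static const bool ok =
      __builtin_cpu_supports("avx") && __builtin_cpu_supports("pclmul");
  return ok;
}

// Multiplies the low 64-bit lanes of a and b. Because this is inline asm
// rather than an intrinsic, it compiles without -mavx/-mpclmul and can be
// inlined into callers built with baseline flags.
inline __m128i ClmulLowAsm(__m128i a, __m128i b) {
  __m128i r;
  __asm__("vpclmulqdq $0x00, %2, %1, %0" : "=x"(r) : "x"(a), "x"(b));
  return r;
}
#endif

// Dispatches at run time inside a single, inlinable function.
inline U128 Clmul(uint64_t a, uint64_t b) {
#if SKETCH_X86_ASM
  if (HasVexClmul()) {
    const __m128i va = _mm_cvtsi64_si128(static_cast<long long>(a));
    const __m128i vb = _mm_cvtsi64_si128(static_cast<long long>(b));
    const __m128i r = ClmulLowAsm(va, vb);
    U128 out;
    std::memcpy(&out, &r, sizeof(out));  // little-endian: lo first, then hi
    return out;
  }
#endif
  return ClmulPortable(a, b);
}

// Debug helper: the fast path and the fallback must agree.
inline void CheckClmul(uint64_t a, uint64_t b) {
  const U128 x = Clmul(a, b);
  const U128 y = ClmulPortable(a, b);
  assert(x.lo == y.lo && x.hi == y.hi);
  (void)x;
  (void)y;
}
```

By contrast, the intrinsic equivalent generally has to be compiled with the matching target flag or target attribute, and GCC and Clang will not inline such a function into a caller compiled without that target, which is consistent with the missing-inlining cost mentioned above.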
name old time/op new time/op delta
BM_Calculate/0 1.136n ± 0% 1.136n ± 1% ~ (p=0.968 n=6)
BM_Calculate/1 1.420n ± 0% 1.421n ± 1% ~ (p=0.870 n=6)
BM_Calculate/100 9.089n ± 0% 9.660n ± 1% +6.29% (p=0.002 n=6)
BM_Calculate/2048 75.30n ± 1% 67.67n ± 1% -10.13% (p=0.002 n=6)
BM_Calculate/10000 313.1n ± 0% 286.1n ± 0% -8.63% (p=0.002 n=6)
BM_Calculate/500000 14.91µ ± 4% 13.49µ ± 1% -9.48% (p=0.002 n=6)
BM_Extend/0 1.136n ± 1% 1.136n ± 1% ~ (p=0.636 n=6)
BM_Extend/1 1.420n ± 0% 1.420n ± 1% ~ (p=0.636 n=6)
BM_Extend/100 9.247n ± 2% 9.800n ± 2% +5.99% (p=0.002 n=6)
BM_Extend/2048 75.73n ± 1% 67.37n ± 1% -11.04% (p=0.002 n=6)
BM_Extend/10000 313.2n ± 1% 286.2n ± 0% -8.62% (p=0.002 n=6)
BM_Extend/500000 14.87µ ± 1% 13.57µ ± 1% -8.74% (p=0.002 n=6)
BM_Extend/100000000 3.185m ± 2% 2.816m ± 3% -11.60% (p=0.002 n=6)
BM_ExtendCacheMiss/10 26.07m ± 1% 26.06m ± 1% ~ (p=1.000 n=6)
BM_ExtendCacheMiss/100 13.86m ± 4% 14.36m ± 2% +3.61% (p=0.026 n=6)
BM_ExtendCacheMiss/1000 27.02m ± 4% 27.28m ± 4% ~ (p=0.699 n=6)
BM_ExtendCacheMiss/100000 5.114m ± 5% 4.600m ± 8% -10.07% (p=0.002 n=6)
BM_ExtendByZeroes/1 1.420n ± 0% 1.420n ± 0% ~ (p=0.670 n=12)
BM_ExtendByZeroes/10 1.704n ± 1% 1.704n ± 0% ~ (p=1.000 n=6)
BM_ExtendByZeroes/100 3.128n ± 0% 3.128n ± 0% ~ (p=1.000 n=6)
BM_ExtendByZeroes/1000 6.758n ± 0% 6.638n ± 1% -1.78% (p=0.002 n=6)
BM_ExtendByZeroes/10000 6.619n ± 1% 6.503n ± 0% -1.75% (p=0.002 n=6)
BM_ExtendByZeroes/100000 8.537n ± 1% 8.479n ± 0% -0.67% (p=0.019 n=6)
BM_ExtendByZeroes/1000000 9.766n ± 1% 9.692n ± 1% -0.75% (p=0.002 n=6)
PiperOrigin-RevId: 900897540
Change-Id: I57d8df2bf10690afc07009d61f8c4ea61e88ce501
Parent: 5f9d5bf
2 files changed
Lines changed: 80 additions & 182 deletions