Building with PGO speeds up `loda benchmark` by roughly 0%-35%. However, it is quite difficult to maintain: at least three profiles are needed (cl, gcc, clang, and possibly more to cover x86_64 and arm64), and each profile depends on the toolchain version. Example of the changes: #547
Results from Apple M3, clang 17
Without PGO
| Sequence | Terms | Reg Eval | Inc Eval | Vir Eval |
|----------|-------|----------|----------|----------|
| A000040 | 1000 | 3.48s | - | 0.43s |
| A000394 | 1000 | 1.72s | - | 0.22s |
| A000401 | 1000 | 2.98s | - | 0.22s |
| A000796 | 300 | 0.70s | - | - |
| A001041 | 300 | 0.73s | - | - |
| A001113 | 300 | 0.63s | - | - |
| A002110 | 300 | 0.77s | - | - |
| A002760 | 200 | 24.71s | - | 1.18s |
| A057552 | 300 | 2.81s | 0.03s | - |
| A079309 | 300 | 2.80s | 0.03s | - |
| A002193 | 400 | 0.56s | 0.23s | - |
| A035856 | 500 | 1.62s | - | - |
| A001609 | 1000 | 0.52s | 0.00s | - |
| A003411 | 1000 | 0.59s | 0.00s | - |
| A012866 | 1000 | 1.00s | 0.00s | - |
| A000045 | 2000 | 1.82s | 0.00s | - |
| A001304 | 3000 | 0.98s | 0.00s | - |
| A000005 | 5000 | 1.04s | - | - |
| A130487 | 5000 | 1.70s | 0.00s | - |
| A000030 | 500000 | 0.38s | - | - |
With PGO (instrumented profile, generated from `loda mine -H 1`)
| Sequence | Terms | Reg Eval | Inc Eval | Vir Eval |
|----------|-------|----------|----------|----------|
| A000040 | 1000 | 2.22s | - | 0.32s |
| A000394 | 1000 | 1.41s | - | 0.17s |
| A000401 | 1000 | 2.34s | - | 0.17s |
| A000796 | 300 | 0.67s | - | - |
| A001041 | 300 | 0.72s | - | - |
| A001113 | 300 | 0.60s | - | - |
| A002110 | 300 | 0.73s | - | - |
| A002760 | 200 | 19.51s | - | 0.98s |
| A057552 | 300 | 2.79s | 0.03s | - |
| A079309 | 300 | 2.77s | 0.03s | - |
| A002193 | 400 | 0.56s | 0.21s | - |
| A035856 | 500 | 1.55s | - | - |
| A001609 | 1000 | 0.50s | 0.00s | - |
| A003411 | 1000 | 0.58s | 0.00s | - |
| A012866 | 1000 | 0.99s | 0.00s | - |
| A000045 | 2000 | 1.82s | 0.00s | - |
| A001304 | 3000 | 0.84s | 0.00s | - |
| A000005 | 5000 | 0.74s | - | - |
| A130487 | 5000 | 1.12s | 0.00s | - |
| A000030 | 500000 | 0.28s | - | - |
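
To make the two sets of results easier to compare, here is a small sketch that computes the relative reduction in Reg Eval time for each sequence. The timings are copied from the measurements above; the names `reg_eval` and `reduction_pct` are only illustrative and are not part of loda.

```python
# Reg Eval timings in seconds as (without PGO, with PGO),
# copied from the results above (Apple M3, clang 17).
reg_eval = {
    "A000040": (3.48, 2.22),
    "A000394": (1.72, 1.41),
    "A000401": (2.98, 2.34),
    "A000796": (0.70, 0.67),
    "A001041": (0.73, 0.72),
    "A001113": (0.63, 0.60),
    "A002110": (0.77, 0.73),
    "A002760": (24.71, 19.51),
    "A057552": (2.81, 2.79),
    "A079309": (2.80, 2.77),
    "A002193": (0.56, 0.56),
    "A035856": (1.62, 1.55),
    "A001609": (0.52, 0.50),
    "A003411": (0.59, 0.58),
    "A012866": (1.00, 0.99),
    "A000045": (1.82, 1.82),
    "A001304": (0.98, 0.84),
    "A000005": (1.04, 0.74),
    "A130487": (1.70, 1.12),
    "A000030": (0.38, 0.28),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage of run time saved relative to the non-PGO build."""
    return (before - after) / before * 100.0


# Map each sequence to its saving, then list the largest savings first.
reductions = {seq: reduction_pct(base, pgo) for seq, (base, pgo) in reg_eval.items()}

for seq, pct in sorted(reductions.items(), key=lambda item: item[1], reverse=True):
    base, pgo = reg_eval[seq]
    print(f"{seq}: {base:.2f}s -> {pgo:.2f}s ({pct:.1f}% less time)")
```

On these numbers the saving ranges from 0% (A002193, A000045) to about 36% (A000040), with the largest gains on A000040, A130487 and A000005. These are single measurements, so small differences of a few hundredths of a second may well be noise.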