cuda.bindings latency benchmarks part 4 #1959
All of these look good. I continue to be worried by the high stddev on some of the C++ benchmarks. With stddev this high, these differences don't really tell us anything:
For a follow-up PR, I’d suggest a calibrated "slow/data-collection" mode: instead of fixed loop counts, first estimate the repetitions needed to reach a target sample time (similar to
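A minimal Python sketch of that calibration idea, assuming a plain callable workload and `time.perf_counter` as the clock: double the loop count until one timed sample reaches a target duration, then reuse that count for every sample. The names `calibrate_loops`, `collect_samples`, `target_sample_s`, and the placeholder `workload` are hypothetical, not part of cuda.bindings or its benchmark suite; the C++ benchmarks would need an equivalent in their own harness.

```python
import statistics
import time


def calibrate_loops(func, target_sample_s=0.1, max_loops=1 << 30):
    """Double the loop count until one timed sample lasts at least target_sample_s."""
    loops = 1
    while True:
        start = time.perf_counter()
        for _ in range(loops):
            func()
        elapsed = time.perf_counter() - start
        # Stop once a sample is long enough (or we hit the safety cap).
        if elapsed >= target_sample_s or loops >= max_loops:
            return loops
        loops *= 2


def collect_samples(func, n_samples=20, target_sample_s=0.1):
    """Return (loops, per-call seconds for each sample) using the calibrated loop count."""
    loops = calibrate_loops(func, target_sample_s)
    samples = []
    for _ in range(n_samples):
        start = time.perf_counter()
        for _ in range(loops):
            func()
        samples.append((time.perf_counter() - start) / loops)
    return loops, samples


if __name__ == "__main__":
    # Hypothetical stand-in for the cuda.bindings call being benchmarked.
    def workload():
        sum(range(100))

    loops, samples = collect_samples(workload, n_samples=10, target_sample_s=0.05)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    print(f"loops={loops} mean={mean * 1e9:.1f} ns stdev={stdev * 1e9:.1f} ns "
          f"({100 * stdev / mean:.1f}% of mean)")
```

Doubling keeps calibration cheap (roughly two samples' worth of extra time in total). The standard library's `timeit.Timer.autorange` follows a similar idea, stepping through 1, 2, 5, 10, ... loops until a run takes at least 0.2 s.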
Yes, agreed that we should take that next. We are pretty much done with the P0s and P1s I wanted to target for bindings, so I'll work on improving that collection in the next PR.
/ok to test f5b1621 |
(Linux runners appear to have a major outage.) |
Yes, that may be the issue. The way the data looks (random, rather than converging) also suggests it may be due to reusing the same process continuously. (The Python benchmarks start a number of fresh processes.)
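One way to test the long-lived-process hypothesis is to repeat the measurement across fresh interpreters and compare the per-process means with the spread inside each process. A minimal sketch, assuming a self-contained worker snippet run via `sys.executable -c`; the `WORKER` code, its `workload`, the helper `run_in_fresh_processes`, and the loop/sample counts are hypothetical placeholders rather than the project's actual harness.

```python
import json
import statistics
import subprocess
import sys

# Hypothetical worker: times a placeholder workload inside its own interpreter
# and prints the per-call seconds for each sample as a JSON list.
WORKER = r"""
import json, time

def workload():
    sum(range(100))

loops = 10_000
times = []
for _ in range(5):
    start = time.perf_counter()
    for _ in range(loops):
        workload()
    times.append((time.perf_counter() - start) / loops)
print(json.dumps(times))
"""


def run_in_fresh_processes(n_processes=5):
    """Spawn n_processes separate interpreters and collect each one's samples."""
    per_process = []
    for _ in range(n_processes):
        out = subprocess.run(
            [sys.executable, "-c", WORKER],
            check=True, capture_output=True, text=True,
        ).stdout
        per_process.append(json.loads(out))
    return per_process


if __name__ == "__main__":
    per_process = run_in_fresh_processes()
    means = [statistics.mean(p) for p in per_process]
    within = [statistics.stdev(p) for p in per_process]
    print("per-process means (ns):", [round(m * 1e9, 1) for m in means])
    print("within-process stdev (ns):", [round(s * 1e9, 1) for s in within])
```

If the per-process means scatter much more widely than the within-process stdev, state tied to a single process is a plausible contributor to the noise, consistent with the random, non-converging pattern described above.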
Description
Follow up #1580
Mostly generated by AI agents.
It also migrates most of the existing pytest-benchmarks; I will finish that in the next PR.
Results on my dev station:
Checklist