These are the results for a run on my dev machine (4090)
This is the first time I'm looking at this code. My second question was: what is the purpose of the benchmarks? Cursor (GPT-5.4 Extra High Fast) offered some answers. I asked it to generate a "Motivation" section based on what it found, see below. I think it'd be a great addition to

Motivation

These benchmarks are intended to measure the latency overhead of calling CUDA Driver APIs through

The main goal is to help answer questions such as:

The paired C++ benchmarks are included to provide a lower-level reference point for the same operation. Comparing Python and C++ results helps estimate the additional cost introduced by the Python-to-C boundary and by binding-specific marshalling work.

These benchmarks are not intended to measure overall GPU performance, kernel throughput, or end-to-end application speed. Most of the benchmarked operations are deliberately tiny, so the reported numbers are best interpreted as binding/API-call latency measurements and regression signals, rather than as predictions of full application performance.

Because the benchmarked operations are so small, methodology matters a lot. The most useful comparisons are between Python and C++ benchmarks that perform as nearly the same work as possible and are run under similar conditions.
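To make the "tiny operation" point concrete, here is a minimal sketch of a per-call latency measurement using only the Python standard library. It is not the benchmark code from this PR: `tiny_operation` and `per_call_latency_ns` are hypothetical names, and the no-op function is only a stand-in for a single binding call.

```python
import timeit


def tiny_operation():
    # Hypothetical stand-in for one very cheap binding call.
    return None


def per_call_latency_ns(func, number=100_000, repeat=5):
    # Run `func` `number` times per repeat and keep the fastest repeat,
    # which is the one least disturbed by other activity on the machine.
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e9


if __name__ == "__main__":
    latency = per_call_latency_ns(tiny_operation)
    print(f"{latency:.1f} ns per call")
```

With operations this small, the loop and timer overhead are a noticeable part of the result, which is one reason the Python and C++ variants need to do as nearly the same work as possible.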
My first question (to Cursor) when reviewing this PR was:
After it gave me the response below I started thinking about the motivation, with the result in the previous comment. In light of that, the findings below still seem relevant, but I'd need to look closer to be more certain which of the "not clean apples-to-apples" aspects it found are actually meaningful. I hope they are at least a good starting point for figuring it out together, so I'm copy-pasting them below.

Findings
What Looks Reasonably Matched
Bottom Line
Note
Description
closes #1580
Follow up #1580
Adding a couple more benchmarks here and fixing a couple of issues with the pyperf JSON handling.
Checklist