[ET-VK][benchmarking][ez] Don't perform copies when benchmarking #9468
facebook-github-bot merged 2 commits into gh/SS-JIA/199/base
## Context

The generated operator benchmarks currently incur a large amount of copy overhead:

1. Copy from CPU to staging
2. Copy from staging to GPU Buffer/Image

This is done for both inputs and outputs.

Since benchmarks are not correctness tests, copying data in and out is not really necessary, especially if the compute shader's behaviour does not depend on the contents of the input/output tensor.

Make it so that by default, the benchmark only executes the op, without adding copy overhead. However, test cases can optionally specify that the copy overhead should be included in the benchmark.

Differential Revision: [D71570143](https://our.internmc.facebook.com/intern/diff/D71570143/)
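As a rough illustration of the default described above, here is a minimal Python sketch of a benchmark loop that times only the op unless a test case opts in to copies. It is not the ET-VK benchmark code: `BenchmarkCase`, `include_copy_overhead`, and `run_benchmark` are hypothetical names, and plain Python lists stand in for the staging and GPU buffers.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkCase:
    """Hypothetical test-case description; not the ExecuTorch API."""
    name: str
    op: Callable[[List[float]], List[float]]
    include_copy_overhead: bool = False  # default: time the op only


def run_benchmark(case: BenchmarkCase, host_input: List[float],
                  iterations: int = 100) -> dict:
    # One-time setup outside the timed region: the "device" buffer must
    # exist, but its contents do not matter when only the op is timed.
    device_input = [0.0] * len(host_input)
    copies = 0

    start = time.perf_counter()
    for _ in range(iterations):
        if case.include_copy_overhead:
            staging = list(host_input)     # CPU -> staging
            device_input = list(staging)   # staging -> "GPU" buffer
            copies += 2
        device_output = case.op(device_input)
        if case.include_copy_overhead:
            staging = list(device_output)  # "GPU" buffer -> staging
            _host_output = list(staging)   # staging -> CPU
            copies += 2
    elapsed = time.perf_counter() - start

    return {"name": case.name, "seconds": elapsed, "copies": copies}


def double(values: List[float]) -> List[float]:
    # Stand-in for a compute shader whose cost does not depend on the data.
    return [2.0 * v for v in values]


data = [1.0] * 1024
op_only = run_benchmark(BenchmarkCase("double", double), data, iterations=10)
with_copies = run_benchmark(
    BenchmarkCase("double_with_copies", double, include_copy_overhead=True),
    data,
    iterations=10,
)
print(op_only["copies"], with_copies["copies"])
```

Making the flag opt-in keeps the common case free of transfer cost, while a test case whose shader does depend on tensor contents can still request the copies.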
ghstack-source-id: 273059929
Pull Request resolved: #9468
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9468

❌ 1 New Failure, 1 Pending as of commit f4c4585 with merge base 7159650.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D71570143
Pull Request resolved: #9468
ghstack-source-id: 274197244
Merged commit e68552d into gh/SS-JIA/199/base.