Repeat performance tests up to 5 times on GHA #9000

Merged
alexreinking merged 2 commits into main from alexreinking/retry-performance-tests
Mar 10, 2026

@alexreinking
Member

The GHA runners are very noisy and our performance tests aren't very stable anyway.

Labeled skip_buildbots because this change only touches the GHA workflow.
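
For illustration, the retry policy named in the title (rerun a noisy test until it passes, giving up after 5 attempts) can be sketched as below. This is a minimal Python sketch, not the workflow change made in this PR; `run_with_retries` and the simulated test outcomes are hypothetical.

```python
from typing import Callable


def run_with_retries(test: Callable[[], bool], max_attempts: int = 5) -> int:
    """Run `test` until it passes and return the 1-based attempt that passed.

    Raises RuntimeError if all `max_attempts` attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        if test():
            return attempt
    raise RuntimeError(f"test failed on all {max_attempts} attempts")


# Hypothetical noisy test: fails twice, then passes on the third run.
outcomes = iter([False, False, True])
attempt = run_with_retries(lambda: next(outcomes))
print(attempt)  # prints 3
```

Note that retrying until the first pass only hides noise. A test that is consistently too slow still fails every attempt.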

@alexreinking added the skip_buildbots label ("Do not run buildbots on this PR. Must add before opening PR as we scan labels immediately.") on Mar 10, 2026
@alexreinking merged commit 3c1d47f into main on Mar 10, 2026
16 checks passed
@mcourteaux
Contributor

The memcpy test does not pass on my Linux machine, ever. The Halide version is 3x slower than libc's memcpy. I looked at the disassembly of both: libc seems to have a ~5 times unrolled, ymm-vectorized move with streaming stores and memory prefetching.

So it looked like 6x prefetch, 6x load, 6x stream store, repeat.
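
To make the failure mode concrete, here is a rough Python sketch of a relative performance check: time a candidate and a reference, take the best of several runs to dampen noise, and fail if the candidate exceeds an allowed slowdown. The names `best_time` and `within_slowdown` and the threshold of 2.0 are assumptions for illustration; the actual test's threshold is not stated in this thread.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5,
              clock: Callable[[], float] = time.perf_counter) -> float:
    """Return the fastest of `repeats` timed runs of `fn`, in clock units."""
    best = float("inf")
    for _ in range(repeats):
        start = clock()
        fn()
        best = min(best, clock() - start)
    return best


def within_slowdown(candidate_s: float, reference_s: float,
                    max_slowdown: float) -> bool:
    """True if the candidate is at most `max_slowdown` times the reference."""
    return candidate_s <= reference_s * max_slowdown


# A candidate 3x slower than the reference fails a hypothetical 2x threshold.
print(within_slowdown(3.0, 1.0, max_slowdown=2.0))  # prints False
print(within_slowdown(1.5, 1.0, max_slowdown=2.0))  # prints True

# Timing real work: a built-in copy versus an element-by-element copy.
data = bytes(1 << 16)
reference_s = best_time(lambda: bytearray(data))
candidate_s = best_time(lambda: bytearray(b for b in data))
```

A consistent 3x gap like the one described above fails such a check on every attempt, so retries would not help.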

@abadams
Member

abadams commented Mar 13, 2026

Maybe we should just delete it. It's true that a Halide memcpy shouldn't be inherently slower than a libc memcpy, but there are various reasons that might be the case. The test is really asking "do we generate a sane inner loop for a memcpy?", but a sane inner loop might be a long way from the best inner loop on a particular machine. And if we didn't generate a sane inner loop for the most trivial pipeline possible, much, much more would be broken than just that test.
