ci: remove "max-workers" and test with our gradle defaults#15307
Conversation
If our defaults need improvement, let's improve them to make tests run faster. We want to take advantage of the hardware.
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
I'll force a couple re-builds here to see how our mac runner does with the change. Don't want to make things temporarily worse. |
|
The results are not promising. Would rather fix other slowdowns first before merging this one. I do think it is the correct approach, but we can't afford any slowdown on the mac runners. @dweiss I see very different slow tests in these CI runs, one big difference is that they write the test data to "real" filesystem. Who knows how slow the disks are. Some ideas:
|
|
The tmpfs trick is neat and I'm all for it but it won't help people who can't use it for some reason. So it'd be better to fix the tests so that they run faster but it's a humongous task, given the size of these tests. |
|
@dweiss I see differently, instead as simply respecting my I just happened to have configured my system to use tmpfs for these temporary files. |
|
We can change the defaults to respect TMPDIR... It's the way it is now because I thought it'd be easier to locate the generated files if they reside within the project. But maybe it's not the best idea. Don't know. |
Currently these don't use all the cpus: * Ubuntu and Windows runners have 4 cpus * MacOS X has 3 cpus (!) Dividing by two (assuming hyperthreads?) doesn't yield a speedup either. I set the value to 4 on 2-core system for this reason.
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
@dweiss this needs a second look, you may not like it. Here is the problem with the previous logic: Up to 67% of mac runner hardware was not used. |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
Set back to draft, we can't really benchmark the build times right now, so erratic, seems to be the bottleneck is something else, looks to be around the filesystem. |
|
These CPU count defaults date back to hyperthreading - that's why they're halved (saturating cpus wasn't helping much + there are CPUs actively used by the GC and some slack was left for the filesystem). Times have changed. Maybe this patch is indeed more realistic since on VMs the cpu count is actually what's available and on beefy hardware I/O saturates much more quickly. |
|
OK, I'm back. The reason this change didn't work the first time, was C2 being enabled by It is my opinion java 25 set the CI system over the edge and started causing the timeouts. MacOS gets some tweaks because it is a crazy operating system. It is not totally tamed here, but just trying to reduce the variance in the worst-case builds (where you get the crazy slow runners). I am happy if it always stays under 20 minutes which is what I'm seeing now. I checked, there are no tmpfs mounts, so we create one. Combined with #15312 I am seeing much more stable, and faster results, e.g: ubuntu-latest: 7m 33s |
Those are also good defaults if you are running lucene tests with C2, which we were doing before. You need to spare half your CPUs, so they can handle all the JIT compilation :) |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
I've separately got a nice list of tests that were slowest on the macos runs (after painfully running it many times LOL). The same ones keep popping up, so they are doing something to make the macos or the ARM angry. Tomorrow can create an issue, maybe best to just temp- |
|
I will open a followup PR to cleanup the macos part of this, I'm sorry its a bit ugly, it is just putting out a fire for now. Any concerns, just revert it, but this PR should give us some breathing room. |
| mkdir /tmp/tmpfs | ||
| sudo mount_tmpfs -o noowners -s 1g /tmp/tmpfs | ||
| sudo sysctl debug.lowpri_throttle_enabled=0 | ||
| echo "tests.workDir=/tmp/tmpfs/lucene" >> build-options.local.properties |
There was a problem hiding this comment.
Thank you for using these local build option overrides.
If our defaults need improvement, let's improve them to make tests run faster. We want to take advantage of the hardware.
Relates: #15212