Port test-gpu.yml to Open-Athena/ec2 runner #308
jder left a comment:
Thanks! Just one question/suggestion from me.
Force-pushed from 1310f2d to cde1ff9
Force-pushed from ba13991 to 352f33b
Hey @ryan-williams, just wanted to check on the state of this PR. What's left before we can merge it?
@jder, merging this now is reasonable. The only reason to wait is that it uses ec2-gha@v2, which I've only just sent for review in Open-Athena/ec2-gha#3. It works empirically with this repo's GPU tests, so if you're comfortable with that, we can merge this without waiting on that PR.
uses: Open-Athena/ec2-gha/.github/workflows/runner.yml@v2
with:
  ec2_instance_type: g4dn.xlarge
  ec2_image_id: ami-00096836009b16a22 # Deep Learning OSS Nvidia Driver AMI GPU PyTorch
FYI, when you update this, there will be a change here to a new AMI.
I'd actually pushed a previous merge here that missed this! I've corrected it now: I force-pushed over the previous bad merge with one that also brings in the latest main, so this thread is now orphaned 🫠. Ty!
Force-pushed from f09e1d6 to 823b4cb
Force-pushed from 823b4cb to 0c7b02f
- In [#308], I neglected to update `benchmarks.yml` to use ec2-gha, which resulted in an invalid workflow file.
- However, [`benchmarks.yml`] has been broken on `main` since [#384], which added a FOMO model benchmark that uses more memory (and more GPU memory).
  - I haven't yet found instances that can handle either; e.g., [benchmarks#175] uses an [m7i.8xlarge] for the CPU benchmarks and a [g6.xlarge] for the GPU benchmarks, and both fail.

This PR fixes the former by updating `benchmarks.yml` to use ec2-gha, and works around the latter by restoring the benchmark configs to their pre-[#384] state. [benchmarks#172] is a passing run from [`f313865`].

[Open-Athena/Ocean_Emulator#399]: https://github.com/Open-Athena/Ocean_Emulator/pull/399
[benchmarks#172]: https://github.com/Open-Athena/Ocean_Emulator/actions/runs/17985502557/job/51162739635
[benchmarks#175]: https://github.com/Open-Athena/Ocean_Emulator/actions/runs/17986640677/job/51166555978
[m7i.8xlarge]: https://instances.vantage.sh/aws/ec2/m7i.8xlarge
[g6.xlarge]: https://instances.vantage.sh/aws/ec2/g6.xlarge
[`benchmarks.yml`]: https://github.com/Open-Athena/Ocean_Emulator/actions/workflows/benchmarks.yml?query=branch%3Amain
[#308]: https://github.com/Open-Athena/Ocean_Emulator/pull/308
[#384]: https://github.com/Open-Athena/Ocean_Emulator/pull/384
[`f313865`]: https://github.com/Open-Athena/Ocean_Emulator/pull/399/commits/f313865a76db1b401243a4189ae876954b94c4c9

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jesse Rusak <jesse@openathena.ai>
Use Open-Athena/ec2-gha#3 (self-terminating GHA EC2 runner)
Before/After gif: