This is a quick-start to inference, discussed yesterday... For now it is only a draft to get feedback; then we will add some more info (especially about formalizing a process for "simulation pages" on Confluence). To review the resulting doc, see this page: https://docs.e3sm.org/aigroup/pr-preview/pr-19/ace2-inference/
Thanks @mahf708 for putting up this inference quickstart. I faithfully followed the steps and obtained a successful test1.
After going through the process, I feel it would be useful to have some additional guidance:
How to create an environment that contains PyTorch and uv, and how to ensure both run with the same Python interpreter. Which other packages should also be included? I likely started with an unclean environment, which caused uv to run a different Python interpreter. I had to force UV_PYTHON to get it to work.
A GPU node needs to be requested to run the uv command, at least on Perlmutter. Although the login node also has a GPU, running there gives a CUDA out-of-memory error. Should n_initial_conditions always equal num_data_workers? I requested 2 GPUs on 1 node to run the example. Is that a suitable match (the best use of the requested resources)?
A restart.nc file is written. How does one perform a restart run?
Some info about the output data: what are the pressure values for levels 0 to 7? How are the monthly_mean data computed? Are all months of equal length? (The example test1 ran 1000 steps, i.e. 250 days, and has 11 monthly values.)
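For the interpreter mismatch described in the first item, a minimal sketch of a check that could be run both directly and through uv to compare results. It only uses the Python standard library; the helper name `describe_interpreter` and the script name mentioned afterwards are hypothetical, not part of the quickstart or of ACE.

```python
import importlib.util
import os
import sys


def describe_interpreter(env=os.environ):
    """Collect basic facts about the Python interpreter running this script.

    `describe_interpreter` is a hypothetical helper for illustration only.
    """
    return {
        # Full path of the interpreter that is executing this script.
        "executable": sys.executable,
        # Interpreter version as major.minor.micro.
        "version": "%d.%d.%d" % sys.version_info[:3],
        # Value of UV_PYTHON if it is set, otherwise None.
        "UV_PYTHON": env.get("UV_PYTHON"),
        # Whether this interpreter can locate a torch installation
        # (checked without actually importing torch).
        "torch_importable": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Saving this as, say, `check_env.py` and running it once with `python check_env.py` and once with `uv run python check_env.py` should show whether the two invocations resolve to the same executable and whether torch is visible from each.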
@wlin7 I added some info about python envs separately; the expectation is that people will figure this out, but uv is used as a shortcut. In the future, we can potentially host an aigroup env like e3sm-unified so that people can just activate it
Hi @elynnwu, do you know what the restart.nc does in the context of ACE? I didn't look into it yet. Also, adding this quickstart for people to try out inference using ACE, any comments appreciated!