
Running tests using the CESM CIME test infrastructure

Kate Thayer-Calder edited this page Apr 17, 2026 · 11 revisions

Overview

The CESM case control system includes a powerful testing infrastructure that is used throughout the model for a variety of purposes. The testing tools allow for reproducible tests that exercise the UI scripts and multiple code configurations. They also generate baseline results that can be stored and compared against in later tests. This document describes how to set up CISM-Wrapper to include a custom branch of CISM and how to run a single test or a suite of tests using the CIME create_test command. It also gives a brief overview of the test string and points to where you can find more information.

Step 1: Set up CISM-Wrapper with CISM code

Clone a copy of CISM-Wrapper to your work directory. If you need to check out a working branch of CISM-wrapper to match your CISM code, this is a good time to do that.

git clone https://github.com/ESCOMP/CISM-wrapper.git CISM-wrapper-testing

cd CISM-wrapper-testing

Optionally...

git checkout my_feature_branch

At this point, you will need to edit the .gitmodules file to point to your desired CISM checkout within the wrapper. For more information about editing this file, see the appendix at the end of this page. Once this file points to the code you want, you can run git-fleximod to check out or update all of your externals and have a working CISM-Wrapper codebase.

./bin/git-fleximod update

Step 2: Run a test

From the root of your CISM-Wrapper checkout, go into the cime/scripts directory to find the create_test script.

cd cime/scripts

A good, basic test to run is a simple "smoke test". This just runs the model for a specified period of time and checks that there are no errors with the default workflow. On Derecho, you will need to run the script through qcmd, so set the PBS_ACCOUNT environment variable before calling it there.
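As a small safeguard, the check for PBS_ACCOUNT can be wrapped in a shell function. This is only a sketch: the function name require_pbs_account is hypothetical, and the project code in the usage comment is a placeholder that you would replace with your own.

```shell
# Hypothetical helper (not part of CIME): stop early if PBS_ACCOUNT is unset or empty.
require_pbs_account() {
    if [ -z "${PBS_ACCOUNT:-}" ]; then
        echo "PBS_ACCOUNT is not set; export it before running qcmd" >&2
        return 1
    fi
    echo "Using PBS account: ${PBS_ACCOUNT}"
}

# Example usage (P00000000 is a placeholder, not a real project code):
#   export PBS_ACCOUNT=P00000000
#   require_pbs_account && qcmd -- ./create_test SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel
```

Failing before the qcmd call avoids waiting on a submission that would be rejected for a missing account.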

Basic usage

./create_test SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel

On Derecho, run this with qcmd:

qcmd -- ./create_test SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel

Note that the default use of this script on Derecho puts a job in the regular queue with a 12:00:00 wallclock time. I will often adjust this in the queue with a qalter command.

qalter -lwalltime=01:55:00 [jobid]

You can request a walltime other than the default with the --walltime flag. In general, it is a good idea to run ./create_test --help and read through the available flags and options.

So, if Derecho is heavily used, you might consider running this test with a short wallclock request with a command like the following:

qcmd -- ./create_test SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel --walltime 00:45:00

Step 3. Checking the results

The result of this simple create_test command is a directory in your scratch space with the name of the test followed by the date and an automatically generated test ID. The command also creates a script in your scratch space named cs.status.[date]_[testid] and a script named testreporter. Running the cs.status.[date]_[testid] script will give you an overview of the results of your test. For more detail, you can change directories into your test directory.

cd /glade/derecho/scratch/[user]

cd SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel.[date]_[testid]/

This is the case directory for your test case. The model was built in the bld subdirectory and ran out of the run subdirectory. In this case directory, you can see the CaseStatus file, which is the same as a regular CaseStatus and describes all of the xml changes that were made to the case. There is a README.case file that describes the compset and grid information for the test. The TestStatus file includes the text from the cs.status script that we ran before. There are also several logs that help in understanding the test results, such as the test.[testname].o[jobid] stdout log file from the supercomputer and the TestStatus.log file, which goes into detail on all of the steps and comparisons done within the test.
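When a test has many phases, it can help to filter the TestStatus file down to the lines that did not pass. The sketch below assumes each line of TestStatus starts with a status word (such as PASS, FAIL, or PEND) followed by the test name and the phase; the function name show_non_passing is hypothetical.

```shell
# Hypothetical helper: print every TestStatus line whose first field is not PASS.
# Assumes lines look like: "<STATUS> <test name> <PHASE> [comment]".
show_non_passing() {
    awk 'NF > 0 && $1 != "PASS" { print }' "$1"
}

# Example usage from inside the test's case directory:
#   show_non_passing TestStatus
```

If the function prints nothing, every phase recorded in the file passed.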

Finally, if you go into the run subdirectory for this test, you will see all of the required input for the model to run (including the cism.gris.config file) and all of the logs and output for the model. In default test cases, the history files are not archived (the short-term archiver does not run) so there should be history files in your run directory. Using a --generate flag to generate a baseline from a test might move or copy these history files to another location so they can be accessed to compare with other tests.

Step 4. Decoding and modifying the test string

In our example case here, we described our test with a complicated and cryptic string of acronyms and codes. Here is a closer look at that string with a decoding line below it.

SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel

[Test Type]_[debug]_[Length in unitNumber].[grid alias].[compset].[computer]_[compiler]

Let's explain this in detail from back to front, moving from the simplest piece to the most complicated.

[compiler] can be any compiler available in your CESM port on that machine. On Derecho, we usually test with intel and gnu.

[computer] is the CESM name for your machine. On Derecho, that's derecho.

[compset] specifies the active component configuration in this test. Most stand-alone CISM tests can be run in a T-G compset, so it will likely be one of T1850Gg, T1850Ga, or T1850Gag, where the trailing lowercase letters stand for "Greenland" (g) or "Antarctica" (a).

[grid alias] is the CESM short name for the grid you are using in this test. It needs to be one of the grid aliases listed in CISM-wrapper-testing/ccs_config/modelgrid_aliases_nuopc.xml that match the compset chosen in this test.

[Length in unitNumber] describes the length of the test. For a smoke test, this just describes how long to run the model. In our example case, the "L" stands for length, "y" stands for "in years", and "6" means 6 years. So, this test runs the model for 6 years. You can also specify test lengths in "m" (months), "d" (days), or "n" (number of steps). In atmospheric model tests, often the length is "Ln9" for 9 timesteps.

[debug] is a "D" flag that is added when we want to run the test without optimization and with enhanced debugging output. So, the string above would be increased to include a D like "SMS_D_Ly6" for a debug-mode test.

[Test type] is a short string that describes the type of test that is running. Since CESM tests are written in Python and can cover almost any test situation, there are a great many of these. The ones we typically use are:

SMS smoke test - run the model for a specific period of time.

ERS exact restart test - run the model for a specific period of time and write restart files halfway through. Restart the model from those files and ensure that the results are bit-for-bit (b4b) the same as if the model had continued running.

ERI hybrid branch restart test - Create "ref1case" and "ref2case" that are clones of the main test case. Do an initial run of ref1case writing restarts at the end. Next, do a hybrid run in "ref2case" running with ref1 restarts, and writing restarts part-way through. Finally, do a branch run in the main case, starting from restarts written in "ref2case", and write restarts part-way through. Restart from these restarts and compare the results with the hybrid restarts to ensure they are the same.

ERP exact restart with changing processors - Initial PES are set up to their default values. Do an initial run and write restarts halfway through (file suffix base). Halve the number of tasks and threads for each component. Do a restart starting from the files written midway (file suffix rest). Compare component history files ‘.base’ and ‘.rest’ at the end of the test.

More information and a longer list of test types can be found in the CIME documentation.
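To make the decoding concrete, here is a minimal shell sketch that splits a test string into the pieces described above. It is illustrative only: the function name decode_test_string is hypothetical, CIME has its own parser, and the sketch handles only the four dot-separated fields shown in this document.

```shell
# Hypothetical helper: split TYPE[_MODIFIERS].GRID.COMPSET.MACHINE_COMPILER
# into labeled fields. The body runs in a subshell so its variables stay local.
decode_test_string() (
    name="$1"
    head="${name%%.*}";    rest="${name#*.}"    # test type plus modifiers
    grid="${rest%%.*}";    rest="${rest#*.}"    # grid alias
    compset="${rest%%.*}"; target="${rest#*.}"  # compset, then machine_compiler
    echo "test_type=${head%%_*}"
    case "$head" in
        *_*) echo "modifiers=${head#*_}" ;;     # e.g. D_Ly6 (debug flag and length)
        *)   echo "modifiers=" ;;
    esac
    echo "grid=${grid}"
    echo "compset=${compset}"
    echo "machine=${target%_*}"                 # everything before the last underscore
    echo "compiler=${target##*_}"               # everything after the last underscore
)

# Example usage:
#   decode_test_string SMS_D_Ly6.f09_g17_gris4.T1850Gg.derecho_intel
```

Splitting on the dots first keeps the underscores inside the grid alias from being confused with the underscores that separate the test type, debug flag, and length.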

So, as an exercise for the user, could you create a test that runs the Antarctic ice sheet for 10 years with a check for an exact restart on Derecho using the GNU compiler?

Answer `ERS_Ly10.f09_g17_ais8.T1850Ga.derecho_gnu`

Step 5. Generate and compare against baselines

The CIME test infrastructure doesn't just run complicated tests on all of our grids and compsets. It will also generate a "baseline" directory for you, where the history files are stored and easily referenced. Then, you can create a test that compares against those baselines to make sure that any code modifications you have made do not change answers unexpectedly. The baselines also include namelists, so a comparison will check for new, changed, or missing namelist fields.

--generate [path/to/baselines] This flag to create_test will create a subdirectory of the given path with the same name as the test configuration string and copy history and namelist files into it when the test is completed successfully.

--compare [path/to/baselines] This flag causes create_test to look for a subdirectory of the given path with the same test configuration string and use it in a new step within the test where final history files are compared against the files in the matching test subdirectory. So, baselines must match all testing options or the compare step will fail with an error in the TestStatus that says "Missing Baselines". The TestStatus.log file includes detailed analysis of the comparisons including how many fields are different, what the RMS errors are, along with max and min differences in fields.

So, for our example test, you might run it once on "clean" code that has not been modified like:

qcmd -- ./create_test SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel --walltime 00:45:00 --generate /glade/derecho/scratch/katec/test_feature_baselines

And then future tests can be compared against that baseline with a command like:

qcmd -- ./create_test SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel --walltime 00:45:00 --compare /glade/derecho/scratch/katec/test_feature_baselines
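Because --compare looks for a baseline subdirectory named after the test string, a quick check before submitting can catch a "Missing Baselines" failure early. The sketch below assumes the layout described above, with one subdirectory per test string directly under the baseline path; the function name has_baseline is hypothetical and the real layout on your system may differ.

```shell
# Hypothetical helper: succeed only if <baseline_root>/<test_name> exists as a directory.
has_baseline() {
    baseline_root="$1"
    test_name="$2"
    [ -d "${baseline_root}/${test_name}" ]
}

# Example usage (set BASELINES to the path you passed to --generate):
#   TEST=SMS_Ly6.f09_g17_gris4.T1850Gg.derecho_intel
#   has_baseline "$BASELINES" "$TEST" \
#       && qcmd -- ./create_test "$TEST" --walltime 00:45:00 --compare "$BASELINES"
```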

Step 6. Test Suites

Coming Soon

Step 7. Re-running tests

Step 8. Changing Namelist or Configuration options

Appx. Editing the .gitmodules file

Coming Soon