Commit d7752d4

Merge pull request #400 from xylar/add-agent-to-update-config-machines
Add an action to ask Copilot to update `config_machines.xml` and spack templates
2 parents 90b5b2f + 9649b69 commit d7752d4

9 files changed

Lines changed: 1259 additions & 24 deletions

.github/workflows/cime_machine_config_update.yml

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+name: Daily config_machines update check
+
+on:
+  schedule:
+    - cron: '0 8 * * *'
+  workflow_dispatch:
+
+env:
+  PIXI_ENV: py314
+  ISSUE_TITLE: Daily config_machines drift detected
+  PRIMARY_ASSIGNEE: xylar
+  REPORT_JSON: cime_machine_config_report.json
+  REPORT_MARKDOWN: cime_machine_config_report.md
+
+jobs:
+  check-config-machines:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    steps:
+      - name: Checkout main
+        uses: actions/checkout@v6
+        with:
+          ref: main
+
+      - name: Set up Pixi
+        uses: prefix-dev/setup-pixi@v0.9.5
+        with:
+          pixi-version: v0.62.2
+          cache: ${{ hashFiles('pixi.lock') != '' }}
+          environments: ${{ env.PIXI_ENV }}
+
+      - name: Install mache from main
+        run: |
+          pixi run -e ${PIXI_ENV} python -m pip install --no-deps \
+            --no-build-isolation -e .
+
+      - name: Generate machine update report
+        env:
+          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+        run: |
+          pixi run -e ${PIXI_ENV} python utils/update_cime_machine_config.py \
+            --json-output ${REPORT_JSON} \
+            --markdown-output ${REPORT_MARKDOWN} \
+            --run-url ${RUN_URL}
+
+      - name: Upload machine update report
+        uses: actions/upload-artifact@v4
+        with:
+          name: cime-machine-config-report
+          path: |
+            ${{ env.REPORT_JSON }}
+            ${{ env.REPORT_MARKDOWN }}
+
+      - name: Synchronize automation issue
+        env:
+          GH_CLI_TOKEN: ${{ secrets.GH_CLI_TOKEN }}
+          DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
+        run: |
+          if [ -z "${GH_CLI_TOKEN}" ]; then
+            echo "GH_CLI_TOKEN is not configured; report generated" \
+              " but no issue was synchronized."
+            exit 0
+          fi
+
+          pixi run -e ${PIXI_ENV} python \
+            utils/manage_cime_machine_config_issue.py \
+            --report-json ${REPORT_JSON} \
+            --report-markdown ${REPORT_MARKDOWN} \
+            --repository ${GITHUB_REPOSITORY} \
+            --token ${GH_CLI_TOKEN} \
+            --issue-title "${ISSUE_TITLE}" \
+            --base-branch ${DEFAULT_BRANCH} \
+            --primary-assignee ${PRIMARY_ASSIGNEE}
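
The `RUN_URL` value in the report step is assembled from three GitHub context fields. As a rough illustration, the same link could be rebuilt inside a script from the default environment variables that GitHub Actions sets for each run (`GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_RUN_ID`). The helper name `build_run_url` is hypothetical and is not part of this commit.

```python
import os


def build_run_url(env=None):
    """Return the Actions run URL, or None when the variables are missing.

    `env` defaults to os.environ; passing a dict makes the helper testable.
    """
    env = os.environ if env is None else env
    try:
        server = env["GITHUB_SERVER_URL"]
        repository = env["GITHUB_REPOSITORY"]
        run_id = env["GITHUB_RUN_ID"]
    except KeyError:
        # Not running under GitHub Actions (for example, a local dry run).
        return None
    return f"{server}/{repository}/actions/runs/{run_id}"


# Illustrative values only; a real run supplies its own.
example = {
    "GITHUB_SERVER_URL": "https://github.com",
    "GITHUB_REPOSITORY": "owner/repo",
    "GITHUB_RUN_ID": "123",
}
print(build_run_url(example))
```

Returning `None` outside of Actions would let a script treat the run URL as optional, which matches how the workflow passes it as a separate flag.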

.github/workflows/copilot-setup-steps.yml

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+name: Copilot Setup Steps
+
+on:
+  workflow_dispatch:
+  push:
+    paths:
+      - .github/workflows/copilot-setup-steps.yml
+      - pixi.toml
+      - pyproject.toml
+  pull_request:
+    paths:
+      - .github/workflows/copilot-setup-steps.yml
+      - pixi.toml
+      - pyproject.toml
+
+jobs:
+  copilot-setup-steps:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    timeout-minutes: 20
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v6
+
+      - name: Set up Pixi
+        uses: prefix-dev/setup-pixi@v0.9.5
+        with:
+          pixi-version: v0.62.2
+          cache: ${{ hashFiles('pixi.lock') != '' }}
+          environments: py314
+
+      - name: Install mache in the Pixi environment
+        run: |
+          pixi run -e py314 python -m pip install --no-deps \
+            --no-build-isolation -e .
+
+      - name: Verify the agent environment
+        run: |
+          pixi run -e py314 python --version
+          pixi run -e py314 mache --help

docs/developers_guide/adding_new_machine.md

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ can be added to mache. This list is a *copy* of the
 which we try to keep up-to-date. If you wish to add a machine that is not
 included in this list, you must contact the E3SM-Project developers to add your
 machine.
+
+For details on the automated workflow that detects upstream drift in this file
+and assigns follow-up work to Copilot, see
+{doc}`config_machines_updates`.
 :::
 
 (dev-new-config-file)=

docs/developers_guide/config_machines_updates.md

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
+# Automated `config_machines.xml` updates
+
+This page describes the automation that watches for upstream changes to
+E3SM's `config_machines.xml`, opens or refreshes a Copilot task when drift is
+detected, and explains how maintainers are expected to review the resulting
+pull request.
+
+## Goal
+
+`mache` keeps a repository-local copy of the upstream E3SM machine list in
+`mache/cime_machine_config/config_machines.xml`.
+
+The automation added here does **not** edit that file directly. Instead, it:
+
+1. Compares the copy in `mache` against the current upstream E3SM source.
+2. Produces a structured report describing any drift for supported machines.
+3. Creates or updates one GitHub issue that assigns the work to Copilot.
+4. Lets Copilot open a PR that updates `config_machines.xml` and any related
+   Spack configuration.
+
+This keeps the source-of-truth update in a reviewed pull request rather than a
+silent CI-side commit.
+
+## Pieces of the automation
+
+### Daily workflow
+
+`.github/workflows/cime_machine_config_update.yml`
+: Runs once a day at `0 8 * * *` and can also be started manually with
+  `workflow_dispatch`.
+
+  The job:
+
+  1. Checks out `main`.
+  2. Sets up the `py314` Pixi environment.
+  3. Installs `mache` from the checked-out repository.
+  4. Runs `utils/update_cime_machine_config.py`.
+  5. Uploads the generated JSON and Markdown report artifacts.
+  6. Runs `utils/manage_cime_machine_config_issue.py` when `GH_CLI_TOKEN` is
+     configured.
+
+### Copilot environment workflow
+
+`.github/workflows/copilot-setup-steps.yml`
+: Defines the setup steps the Copilot cloud agent can use on the default
+  branch so it starts from a working Pixi environment with `mache` installed.
+
+### Drift report builder
+
+`utils/update_cime_machine_config.py`
+: Downloads the current upstream E3SM `config_machines.xml`, compares it with
+  `mache/cime_machine_config/config_machines.xml`, prints a short console
+  summary, and optionally writes:
+
+  - a JSON report for machine-readable automation,
+  - a Markdown issue body for Copilot and human reviewers.
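
The command-line surface implied by the daily workflow can be sketched with `argparse`. Only the three flag names (`--json-output`, `--markdown-output`, `--run-url`) come from the workflow; the function name `parse_args`, the defaults, and the help strings are assumptions, and the real script may well differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the report-builder options; every output is optional."""
    parser = argparse.ArgumentParser(
        description="Report drift in config_machines.xml (illustrative sketch)."
    )
    parser.add_argument(
        "--json-output", default=None,
        help="path for the machine-readable JSON report",
    )
    parser.add_argument(
        "--markdown-output", default=None,
        help="path for the Markdown issue body",
    )
    parser.add_argument(
        "--run-url", default=None,
        help="URL of the workflow run to cite in the report",
    )
    return parser.parse_args(argv)


# Only the JSON report is requested here; the other outputs stay unset.
args = parse_args(["--json-output", "/tmp/report.json"])
print(args.json_output, args.markdown_output, args.run_url)
```

Leaving every output optional fits the description above, where the console summary is always printed and the files are written only on request.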
+
+`mache/cime_machine_config/report.py`
+: Contains the structured comparison logic. It determines which supported
+  machines changed, identifies module and environment-variable drift, infers
+  related package groups, and lists candidate Spack template files to review.
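
As a minimal sketch of what per-machine drift detection can look like, the following compares environment variables for a set of supported machines in two XML documents. The XML layout (`<machine MACH="...">` elements containing `<env name="...">` entries) is a simplified assumption about `config_machines.xml`, the machine names and paths are made up, and `env_drift` is a hypothetical helper rather than the logic in `report.py`.

```python
import xml.etree.ElementTree as ET


def machine_env(xml_text):
    """Map machine name -> {env var name: value} for one XML document."""
    root = ET.fromstring(xml_text)
    result = {}
    for machine in root.iter("machine"):
        result[machine.get("MACH")] = {
            env.get("name"): (env.text or "").strip()
            for env in machine.iter("env")
        }
    return result


def env_drift(local_xml, upstream_xml, supported):
    """Return {machine: {var: (local, upstream)}} for supported machines."""
    local = machine_env(local_xml)
    upstream = machine_env(upstream_xml)
    drift = {}
    for name in sorted(supported):
        ours = local.get(name, {})
        theirs = upstream.get(name, {})
        changed = {
            var: (ours.get(var), theirs.get(var))
            for var in sorted(set(ours) | set(theirs))
            if ours.get(var) != theirs.get(var)
        }
        if changed:
            drift[name] = changed
    return drift


LOCAL = """<config_machines>
  <machine MACH="alpha"><environment_variables>
    <env name="NETCDF_PATH">/opt/netcdf/4.8</env>
  </environment_variables></machine>
  <machine MACH="beta"><environment_variables>
    <env name="NETCDF_PATH">/opt/netcdf/4.9</env>
  </environment_variables></machine>
</config_machines>"""

# Simulate an upstream change that touches only machine "alpha".
UPSTREAM = LOCAL.replace("/opt/netcdf/4.8", "/opt/netcdf/4.9.2")

print(env_drift(LOCAL, UPSTREAM, supported={"alpha", "beta"}))
```

This sketch keeps one value per variable name, so it would need extending for files where the same variable appears in several compiler-specific or MPI-specific blocks.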
+
+### Issue synchronization
+
+`utils/manage_cime_machine_config_issue.py`
+: Owns the GitHub-side lifecycle for the automation issue.
+
+  If drift exists, it creates or updates the issue.
+
+  If no drift exists, it closes the existing issue.
+
+  If Copilot assignment fails, it falls back to creating or updating the same
+  issue without Copilot assignment so the report is still visible.
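
The fallback described here amounts to a try-then-retry pattern. In this sketch, `upsert_issue` is a hypothetical callable standing in for whatever function talks to GitHub, and `AssignmentError` is an invented exception type; neither name comes from the commit.

```python
class AssignmentError(Exception):
    """Raised by the hypothetical upsert_issue when Copilot cannot be assigned."""


def sync_with_fallback(upsert_issue, body):
    """Create or update the issue with Copilot assigned; retry without it."""
    try:
        return upsert_issue(body, assign_copilot=True)
    except AssignmentError:
        # Keep the report visible even when the agent cannot be assigned.
        return upsert_issue(body, assign_copilot=False)


def fake_upsert(body, assign_copilot):
    """Stand-in for a GitHub call in which Copilot assignment always fails."""
    if assign_copilot:
        raise AssignmentError("Copilot is not enabled for this repository")
    return {"body": body, "assigned_to_copilot": False}


print(sync_with_fallback(fake_upsert, "drift report"))
```

Catching only the assignment failure, rather than every exception, would keep unrelated errors such as a bad token from being silently retried.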
+
+### Tests
+
+`tests/test_cime_machine_config_report.py`
+: Verifies that the report builder detects relevant drift and that the rendered
+  issue body contains the required maintainer instructions.
+
+## How `config_machines.xml` gets updated
+
+The important point is that the scheduled workflow never edits
+`mache/cime_machine_config/config_machines.xml` itself.
+
+The update path is:
+
+1. The workflow detects drift between the `mache` copy and upstream E3SM.
+2. The workflow creates or refreshes a GitHub issue.
+3. Copilot is assigned to that issue.
+4. Copilot opens a pull request against `main`.
+5. That PR updates `mache/cime_machine_config/config_machines.xml` first, then
+   any related Spack templates or version strings that the report indicates
+   should be reviewed.
+6. A maintainer reviews and merges the PR.
+7. The next daily run compares the merged repository state against upstream
+   again.
+
+If the PR fully resolved the drift, the issue is closed automatically on the
+next run.
+
+If only part of the drift was resolved, the issue stays open and its body is
+updated to reflect the remaining work.
+
+## What Copilot is told to do
+
+Copilot receives instructions from two places.
+
+### Fixed API-level instructions
+
+`utils/manage_cime_machine_config_issue.py` adds the following guidance in the
+`agent_assignment` payload:
+
+- Use the issue body as the task definition.
+- Update `config_machines.xml` first.
+- Then update related Spack templates and version strings.
+- Add TODO comments in the PR when prefix or path changes need reviewer
+  confirmation.
+
+### Generated issue-body instructions
+
+`mache/cime_machine_config/report.py` renders the issue body for the current
+drift and includes:
+
+- the timestamp and upstream source URL,
+- the workflow run URL,
+- the list of affected supported machines,
+- the required work list,
+- per-machine details such as package groups, prefix or path variables, and
+  candidate Spack templates to inspect.
+
+The required work section tells Copilot to:
+
+- update `mache/cime_machine_config/config_machines.xml` for the affected
+  supported machines,
+- update Spack templates and version strings when module or environment drift
+  implies different package versions,
+- keep the PR focused when the change is only version or module drift,
+- add a TODO in the PR instead of guessing when a new prefix or path is not
+  obvious.
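
To make the shape of such an issue body concrete, here is a small rendering sketch. The report dictionary keys (`generated_at`, `upstream_url`, `run_url`, `machines`, and the per-machine fields) are hypothetical, as are the sample values; the actual schema produced by `report.py` is not visible in this commit view.

```python
def render_issue_body(report):
    """Render a Markdown issue body from a (hypothetical) report dictionary."""
    lines = [
        "## config_machines drift report",
        "",
        f"- Generated: {report['generated_at']}",
        f"- Upstream source: {report['upstream_url']}",
        f"- Workflow run: {report['run_url']}",
        "",
        "### Affected supported machines",
        "",
    ]
    for machine in report["machines"]:
        lines.append(f"- `{machine['name']}`")
        # Candidate templates are listed as nested items under the machine.
        for template in machine.get("spack_templates", []):
            lines.append(f"  - review `{template}`")
    return "\n".join(lines) + "\n"


sample = {
    "generated_at": "2025-01-01T08:00:00Z",
    "upstream_url": "https://example.invalid/config_machines.xml",
    "run_url": "https://example.invalid/actions/runs/1",
    "machines": [
        {"name": "alpha", "spack_templates": ["alpha_gnu.yaml"]},
        {"name": "beta"},
    ],
}
print(render_issue_body(sample))
```

Building the body as a list of lines and joining once keeps the rendering easy to assert against in a test, which is the kind of check the focused test file is described as performing.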
+
+## Why this does not create a new issue every day
+
+The workflow is designed to reuse one open issue rather than create a new one
+for every scheduled run.
+
+`utils/manage_cime_machine_config_issue.py` looks for an existing open issue
+with the fixed title stored in the workflow environment:
+
+- `ISSUE_TITLE: Daily config_machines drift detected`
+
+The lifecycle is:
+
+1. If no matching open issue exists and drift is detected, create one.
+2. If a matching open issue already exists and drift is still present, update
+   that same issue.
+3. If no drift remains and the issue exists, close it.
+
+That means an unresolved drift while you are away does **not** produce a fresh
+issue every day. The same issue remains open and is refreshed in place.
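
The three lifecycle rules reduce to a decision over two booleans. The sketch below is one way to express them; the function name `decide_action` and the action labels are illustrative, and the fourth case (no drift, no open issue) is assumed to be a no-op because the text does not describe it.

```python
def decide_action(drift_detected, open_issue_exists):
    """Return the issue action implied by the lifecycle rules above."""
    if drift_detected:
        # Reuse the open issue when there is one; otherwise open a new one.
        return "update" if open_issue_exists else "create"
    # No drift: close a leftover issue, or do nothing (assumed).
    return "close" if open_issue_exists else "none"


for drift in (True, False):
    for issue in (True, False):
        print(drift, issue, decide_action(drift, issue))
```

Because the decision depends only on the current state, repeated daily runs with unchanged drift keep returning `"update"` and never `"create"`.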
+
+A new issue would only be created if one of these is true:
+
+- the existing automation issue was manually closed while drift still exists,
+- the issue title configured in the workflow was changed,
+- the existing issue was deleted or otherwise no longer appears as an open
+  issue in the repository.
+
+## Reviewer workflow
+
+When Copilot opens a PR from this issue, the reviewer should check the changes
+in this order.
+
+### 1. `config_machines.xml` changes
+
+Verify that the PR updates
+`mache/cime_machine_config/config_machines.xml` only for supported machines
+reported by the workflow, and that those changes match the current upstream
+E3SM machine definitions.
+
+In practice, the easiest cross-check is to compare the PR against the report
+artifact from the workflow run that opened or refreshed the issue.
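
Part of this cross-check could be automated by comparing the machines touched by the PR with the machines named in the report. The helper below is only a sketch: how the two collections are obtained (for example, from the JSON artifact and from a diff of the XML) is left out, and the function name and machine names are hypothetical.

```python
def cross_check(changed_in_pr, reported):
    """Compare machines edited in a PR with machines listed in the report.

    Returns (unexpected, missing): machines the PR touched without being
    reported, and reported machines the PR did not touch.
    """
    changed = set(changed_in_pr)
    expected = set(reported)
    unexpected = sorted(changed - expected)
    missing = sorted(expected - changed)
    return unexpected, missing


unexpected, missing = cross_check(
    changed_in_pr=["alpha", "gamma"],
    reported=["alpha", "beta"],
)
print("unexpected:", unexpected)
print("missing:", missing)
```

A non-empty `unexpected` list would suggest the PR went beyond the reported scope, while a non-empty `missing` list would suggest the issue is likely to stay open after the next daily run.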
+
+### 2. Related Spack updates
+
+If the report lists package groups or candidate Spack templates, check that the
+PR updated the relevant `mache/spack/*.yaml` inputs and any version strings
+that should track the new module or environment values.
+
+If the report does not indicate Spack-relevant drift, the PR should usually be
+limited to `config_machines.xml`.
+
+### 3. Ambiguous path or prefix changes
+
+When upstream changes a path-like variable such as `NETCDF_PATH`, the correct
+replacement in `mache` may not be obvious from the XML alone.
+
+In that case, the expected behavior is **not** to guess. The PR should leave a
+TODO note for the reviewer and explain what needs confirmation.
+
+### 4. Validation
+
+At minimum, reviewers or PR authors should run the same local checks used by
+development in this repository.
+
+Generate the current report locally:
+
+```bash
+pixi run -e py314 python utils/update_cime_machine_config.py \
+  --json-output /tmp/cime_machine_config_report.json \
+  --markdown-output /tmp/cime_machine_config_report.md
+```
+
+Run the focused tests:
+
+```bash
+pixi run -e py314 pytest tests/test_cime_machine_config_report.py
+```
+
+Run pre-commit on changed files before merging:
+
+```bash
+pixi run -e py314 pre-commit run --files <changed files>
+```
+
+## Manual dry run for maintainers
+
+To exercise the detection path without waiting for the cron schedule:
+
+1. Trigger the workflow manually with `workflow_dispatch`, or
+2. Run `utils/update_cime_machine_config.py` locally in the Pixi environment.
+
+If `GH_CLI_TOKEN` is not configured, the workflow still generates and uploads
+the report artifacts but skips issue synchronization.
+
+That is a safe way to validate the comparison and report rendering logic
+without asking Copilot to act on the result.
+
+## Operational notes
+
+- `GH_CLI_TOKEN` should be a user token with access to create and update
+  issues in the repository. A classic PAT with `repo` scope is sufficient.
+- Copilot assignment additionally depends on Copilot cloud agent being enabled
+  for the repository.
+- The workflow uses the repository's current `main` branch as the comparison
+  baseline and as the branch Copilot is asked to target.

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ users_guide/sync/diags
 developers_guide/quick_start
 developers_guide/contributing
 developers_guide/deploy
+developers_guide/config_machines_updates
 developers_guide/adding_new_machine
 developers_guide/spack
 developers_guide/jigsaw
