Background
Even once the collation script is packaged (#280), there is currently no workflow that regenerates _web/entities/{datasets,tasks,models}.json when a dataset, task, or model changes on main, so the website keeps fetching stale data indefinitely.
The entity files on _web were last updated on 2025-05-20 ("Update prelim entity files.") and are visibly stale: AUMCdb is missing, several abnormal_lab/cbc and vital/hypotension tasks are missing, and the embedded MIMIC-IV config still uses the obsolete MEDS_cohort_dir= parameter.
What needs to be done
Add a workflow .github/workflows/regenerate_entities.yaml that:
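Triggers on push to main when files under src/MEDS_DEV/{datasets,tasks,models}/** change, and also exposes workflow_dispatch for manual runs.
Checks out main (for source) and _web (in a separate path, for output).
Installs MEDS-DEV from the local checkout so it picks up the latest collation script.
Runs meds-dev-collate-entities --repo_dir . --output_dir <web>/entities --do_overwrite (CLI shape per #280, "Promote web tooling (collate_entities, aggregate_results) into the package").
Commits and pushes to _web if anything changed, skipping the commit when there is no diff.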
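A minimal sketch of how the steps above might fit together. It assumes the meds-dev-collate-entities entry point and flags proposed in #280, an existing _web branch, and a Python version MEDS-DEV supports (3.12 is a guess). The checkout directory names source and web and the commit message format are illustrative choices, not project conventions.

```yaml
# Hypothetical sketch of .github/workflows/regenerate_entities.yaml.
# Assumes the meds-dev-collate-entities CLI from #280 and an existing _web branch.
name: Regenerate entity files

on:
  push:
    branches: [main]
    paths:
      - "src/MEDS_DEV/datasets/**"
      - "src/MEDS_DEV/tasks/**"
      - "src/MEDS_DEV/models/**"
  workflow_dispatch:

permissions:
  contents: write # lets the default GITHUB_TOKEN push to _web

jobs:
  regenerate:
    runs-on: ubuntu-latest
    steps:
      - name: Check out main (source)
        uses: actions/checkout@v4
        with:
          ref: main
          path: source

      - name: Check out _web (output)
        uses: actions/checkout@v4
        with:
          ref: _web
          path: web

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12" # assumption: use a version MEDS-DEV supports

      - name: Install MEDS-DEV from the local checkout
        run: pip install ./source

      - name: Collate entity files into _web
        working-directory: source
        run: meds-dev-collate-entities --repo_dir . --output_dir ../web/entities --do_overwrite

      - name: Commit and push only if something changed
        working-directory: web
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add -A entities
          # Nothing staged means the regenerated files match _web: exit without committing.
          if git diff --cached --quiet; then
            echo "Entity files unchanged; skipping commit."
            exit 0
          fi
          git commit -m "Regenerate entity files from main@$(git -C ../source rev-parse --short HEAD)"
          git push origin HEAD:_web
```

The paths are listed individually because GitHub's filter patterns are glob-based and may not expand the {datasets,tasks,models} brace form. Staging first and testing with git diff --cached --quiet makes the no-diff case a clean no-op. The push relies on actions/checkout persisting the token in the web checkout's git config, which it does by default.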
Open questions
Should the workflow run on every push to main, or only on PR merges that touched the relevant paths? (A paths: filter on the push trigger answers this naturally.)
For the push back to _web: the same GITHUB_TOKEN triggering issue described in #238 ("aggregate benchmark results workflow is not triggering on new result submissions") would apply if any downstream automation is ever wired up to react to _web changes. For now there is no downstream consumer (the website fetches at request time), so the default GITHUB_TOKEN is fine.
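For the first question, a "PR merges only" variant could look roughly like the fragment below (a sketch; the job body is a placeholder). A push trigger with a paths filter already fires for merge commits landing on main and differs mainly in also covering direct pushes. One caveat worth checking against the GitHub docs: in pull_request runs for PRs opened from forks, GITHUB_TOKEN is read-only, which would likely block the push to _web.

```yaml
# Hypothetical alternative trigger: regenerate only when a PR into main is merged.
on:
  pull_request:
    types: [closed]
    branches: [main]
    paths:
      - "src/MEDS_DEV/datasets/**"
      - "src/MEDS_DEV/tasks/**"
      - "src/MEDS_DEV/models/**"
  workflow_dispatch:

jobs:
  regenerate:
    # "closed" also fires for PRs closed without merging, so skip those.
    if: github.event_name == 'workflow_dispatch' || github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - name: Placeholder for the regeneration steps
        run: echo "checkout, install, collate, and commit steps go here"
```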
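For the second question, a sketch of the token setup. The explicit permissions block matters if the repository's default workflow permissions are read-only and is harmless otherwise. Events created with GITHUB_TOKEN do not start new workflow runs (workflow_dispatch and repository_dispatch excepted), which is the #238 limitation. If a downstream consumer is added later, one option is to pass a personal access token or GitHub App token to actions/checkout through its token input. The secret name WEB_PUSH_TOKEN is hypothetical.

```yaml
permissions:
  contents: write # lets the default GITHUB_TOKEN push to _web

jobs:
  regenerate:
    runs-on: ubuntu-latest
    steps:
      - name: Check out _web (output)
        uses: actions/checkout@v4
        with:
          ref: _web
          path: web
          # Only needed if a downstream workflow must react to _web pushes (see #238):
          # token: ${{ secrets.WEB_PUSH_TOKEN }}  # hypothetical secret name
```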
Acceptance criteria
Editing any dataset.yaml / task.yaml / model.yaml (or related files) in a merged PR results in _web/entities/*.json being regenerated within minutes.
Missing or stale entries on _web (e.g., the missing AUMCdb dataset) are corrected after the workflow runs once.
Workflow is idempotent (running it twice yields no new commits).
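One way to exercise the idempotency criterion in CI is an extra step, placed after the commit step, that reruns the collation and fails if the entity files change again. A change would point to non-deterministic output such as unstable key ordering or embedded timestamps. This is a sketch that assumes main is checked out at a hypothetical ./source directory and _web at ./web, and that the #280 CLI flags are as described above. It checks collation determinism only. "No new commits" additionally depends on the commit step skipping when there is no diff.

```yaml
# Hypothetical extra step, appended to the job's steps after the commit step.
# Assumes main is checked out at ./source and _web at ./web.
- name: Verify a second collation run is a no-op
  working-directory: source
  run: |
    meds-dev-collate-entities --repo_dir . --output_dir ../web/entities --do_overwrite
    # Any modified or untracked file under entities/ means the output is not stable.
    if [ -n "$(git -C ../web status --porcelain -- entities)" ]; then
      echo "Second run changed the entity files; collation output is not deterministic." >&2
      git -C ../web status --short -- entities
      exit 1
    fi
```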