Add workflow to auto-regenerate _web entity manifests on source changes #281

@mmcdermott

Background

Even once the collation script is packaged (#280), there is currently no workflow that regenerates _web/entities/{datasets,tasks,models}.json when a dataset / task / model changes on main. The website happily fetches stale data forever.

Today the entity files on _web were last updated 2025-05-20 ("Update prelim entity files.") and are visibly stale: AUMCdb is missing, several abnormal_lab/cbc and vital/hypotension tasks are missing, and the embedded MIMIC-IV config still uses the obsolete MEDS_cohort_dir= parameter.

What needs to be done

Add a workflow .github/workflows/regenerate_entities.yaml that:

  1. Triggers on push to main when files under src/MEDS_DEV/{datasets,tasks,models}/** change. Also exposes workflow_dispatch for manual runs.
  2. Checks out main (for source) and _web (in a separate path, for output).
  3. Installs MEDS-DEV from the local checkout so it picks up the latest collation script.
  4. Runs meds-dev-collate-entities --repo_dir . --output_dir <web>/entities --do_overwrite (CLI shape per #280, "Promote web tooling (collate_entities, aggregate_results) into the package").
  5. Commits and pushes to _web if anything changed. Skip the commit if no diff.
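
A minimal sketch of what the five steps above could look like as a workflow file. It assumes _web is a branch of the same repository and that `pip install .` exposes the meds-dev-collate-entities entry point from #280. The action versions, Python version, `web` checkout path, concurrency group, commit message, and bot identity are illustrative choices, not taken from the repository.

```yaml
# Sketch only: versions, paths, and identities below are assumptions.
name: Regenerate _web entity manifests

on:
  push:
    branches: [main]
    paths:
      - "src/MEDS_DEV/datasets/**"
      - "src/MEDS_DEV/tasks/**"
      - "src/MEDS_DEV/models/**"
  workflow_dispatch:  # manual runs

permissions:
  contents: write  # lets the default GITHUB_TOKEN push to _web

concurrency:
  group: regenerate-entities  # serialize runs so pushes to _web do not race
  cancel-in-progress: false

jobs:
  regenerate:
    runs-on: ubuntu-latest
    steps:
      - name: Check out main (source)
        uses: actions/checkout@v4
        with:
          ref: main

      - name: Check out _web (output)
        uses: actions/checkout@v4
        with:
          ref: _web
          path: web  # assumed directory name for the output checkout

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"  # assumption

      - name: Install MEDS-DEV from the local checkout
        run: pip install .

      - name: Collate entities
        run: meds-dev-collate-entities --repo_dir . --output_dir web/entities --do_overwrite

      - name: Commit and push if anything changed
        working-directory: web
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add entities
          # Exit status 0 from `git diff --cached --quiet` means nothing is staged.
          if git diff --cached --quiet; then
            echo "No entity changes; skipping commit."
          else
            git commit -m "Regenerate entity manifests"
            git push origin HEAD:_web
          fi
```

Staging first and then testing `git diff --cached --quiet` catches newly created JSON files as well as modified ones, which a plain `git diff --quiet` on the working tree would miss.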

Open questions

  • Should the workflow run on every push to main, or only on PR merges that touched the relevant paths? (A paths: filter on the push trigger answers this naturally.)
  • For the push back to _web: the GITHUB_TOKEN triggering issue described in #238 ("aggregate benchmark results workflow is not triggering on new result submissions") applies only if downstream automation is ever wired up to react to _web changes. For now there is no downstream (the website fetches at request time), so the default GITHUB_TOKEN is fine.
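
If a downstream workflow is ever added, pushes made with the default GITHUB_TOKEN will not trigger it. One possible workaround, sketched here as an assumption rather than a decided design, is to check out _web with a different credential. WEB_PUSH_TOKEN is a hypothetical secret name (for example a fine-grained personal access token or GitHub App token with write access to repository contents), and `web` is an assumed checkout path.

```yaml
# Sketch only: WEB_PUSH_TOKEN is a hypothetical secret, not an existing one.
- name: Check out _web (output)
  uses: actions/checkout@v4
  with:
    ref: _web
    path: web
    # Pushes authenticated with this token can trigger other workflows,
    # unlike pushes made with the default GITHUB_TOKEN.
    token: ${{ secrets.WEB_PUSH_TOKEN }}
```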

Acceptance criteria

  • Editing any dataset.yaml / task.yaml / model.yaml (or related files) in a merged PR results in _web/entities/*.json being regenerated within minutes.
  • Entries currently missing from _web (e.g. AUMCdb) appear after the workflow runs once.
  • Workflow is idempotent (running it a second time with no source changes yields no new commit).

Labels: Testing, Website / Branding, priority:high
