Promote web tooling (collate_entities, aggregate_results) into the package #280

@mmcdermott

Background

The _web branch holds JSON manifests (entities/datasets.json, entities/tasks.json, entities/models.json) that the MEDS website fetches at runtime to render the dataset / task / model catalog pages. These files are currently manually maintained and have drifted significantly from the source code (see #186, #187, #188 for specifics).

A working prototype of the regeneration logic exists on the unmerged add_web_scripts branch:

  • src/MEDS_DEV/web/collate_entities.py — walks src/MEDS_DEV/{datasets,tasks,models}/ and produces the JSON manifests
  • src/MEDS_DEV/web/aggregate_results.py — improved version of the existing _web/scripts/aggregate_results.py (currently lives only on _web, not in the package, not under CI)

Both have docstring tests and are written as proper package modules.
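For anyone re-porting, the walk-and-collate step could look roughly like the sketch below. This is not the code on add_web_scripts: the rule that an entity is any directory containing a README.md, the `readme` field, and the function names are all assumptions for illustration, and the real manifests likely carry more fields.

```python
import json
import tempfile
from pathlib import Path


def collate_entities(root: Path, kind: str) -> dict[str, dict]:
    """Map each entity directory under ``root/kind`` to a small metadata record.

    Assumption (hypothetical): an entity is any sub-directory that holds a
    ``README.md``. Nested entities are keyed by their POSIX-style relative path.

    >>> with tempfile.TemporaryDirectory() as d:
    ...     readme = Path(d) / "datasets" / "MIMIC-IV" / "README.md"
    ...     readme.parent.mkdir(parents=True)
    ...     _ = readme.write_text("# MIMIC-IV")
    ...     collate_entities(Path(d), "datasets")
    {'MIMIC-IV': {'readme': '# MIMIC-IV'}}
    """
    base = root / kind
    if not base.is_dir():
        return {}
    entities = {}
    for readme in sorted(base.rglob("README.md")):
        if readme.parent == base:
            continue  # a README at the top of the category is not an entity
        name = readme.parent.relative_to(base).as_posix()
        entities[name] = {"readme": readme.read_text()}
    return entities


def write_manifests(root: Path, out_dir: Path) -> None:
    """Write datasets.json, tasks.json and models.json into ``out_dir``."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for kind in ("datasets", "tasks", "models"):
        manifest = collate_entities(root, kind)
        text = json.dumps(manifest, indent=2, sort_keys=True)
        (out_dir / f"{kind}.json").write_text(text)
```

Sorting the paths and the JSON keys keeps the output deterministic, so regenerated manifests diff cleanly against what is committed on _web.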

What needs to be done

  1. Rebuild on current main: the branch is stale and the source layout has changed since it was written (e.g., _results task vs. nested category readmes, MIMIC-IV dataset.yaml updates, AUMCdb addition). Re-port the scripts cleanly rather than merging the branch as-is.
  2. Move under the package: place at src/MEDS_DEV/web/{collate_entities,aggregate_results}.py. Keep doctests.
  3. Expose CLI entry points in pyproject.toml:
    • meds-dev-collate-entities → writes datasets.json / tasks.json / models.json
    • meds-dev-aggregate-results → consolidates per-issue result.json blobs
  4. Replace the legacy script on _web: once the package version is shipped, update aggregate_benchmark_results.yaml to install MEDS-DEV from PyPI and invoke meds-dev-aggregate-results, then delete _web/scripts/aggregate_results.py.
  5. Delete or archive the add_web_scripts branch once content is incorporated (currently a dangling reference).
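To make item 3 concrete, here is a minimal sketch of a module whose `main` could back the meds-dev-aggregate-results command, by pointing a `[project.scripts]` entry in pyproject.toml at it. The directory layout (one result.json per sub-directory), keying by parent directory name, and the two positional arguments are assumptions for illustration, not the behaviour of the existing script.

```python
import argparse
import json
from pathlib import Path
from typing import Optional, Sequence


def aggregate_results(results_dir: Path) -> dict[str, dict]:
    """Merge every ``result.json`` under ``results_dir`` into one mapping.

    Assumption (hypothetical): each blob sits in its own directory, and that
    directory's name is a usable unique key (for example an issue number).
    """
    merged = {}
    for path in sorted(results_dir.rglob("result.json")):
        merged[path.parent.name] = json.loads(path.read_text())
    return merged


def main(argv: Optional[Sequence[str]] = None) -> int:
    """CLI wrapper; ``argv=None`` makes argparse read ``sys.argv``."""
    parser = argparse.ArgumentParser(prog="meds-dev-aggregate-results")
    parser.add_argument("results_dir", type=Path, help="directory of per-issue results")
    parser.add_argument("output", type=Path, help="path of the consolidated JSON file")
    args = parser.parse_args(argv)
    merged = aggregate_results(args.results_dir)
    args.output.write_text(json.dumps(merged, indent=2, sort_keys=True))
    return 0
```

Taking `argv` as a parameter lets doctests and CI call `main([...])` directly instead of spawning a subprocess.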

Acceptance criteria

Related

Metadata

Labels

  • Code Cleanliness/Tech Debt
  • Website / Branding: For website/branding/tutorial content issues (beyond pure technical documentation)
  • priority:high: High priority; should be included in subsequent release candidate.
