Background
The _web branch holds JSON manifests (entities/datasets.json, entities/tasks.json, entities/models.json) that the MEDS website fetches at runtime to render the dataset / task / model catalog pages. These files are currently maintained by hand and have drifted significantly from the source code (see #186, #187, #188 for specifics).
A working prototype of the regeneration logic exists on the unmerged add_web_scripts branch:
- src/MEDS_DEV/web/collate_entities.py: walks src/MEDS_DEV/{datasets,tasks,models}/ and produces the JSON manifests
- src/MEDS_DEV/web/aggregate_results.py: improved version of the existing _web/scripts/aggregate_results.py (which currently lives only on _web, not in the package and not under CI)
Both have docstring tests and are written as proper package modules.
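For orientation, the collation step can be pictured roughly as follows. This is a minimal sketch and not the prototype's code: the function names, the manifest shape, and the idea that an entity is "any directory containing a marker file" are all assumptions made for illustration. Only dataset.yaml is mentioned in this issue as a per-entity file; the markers for tasks and models are not specified here.

```python
from __future__ import annotations

import json
from pathlib import Path


def collate_entities(root: Path, marker: str) -> dict[str, dict]:
    """Map each entity's path (relative to ``root``) to basic metadata.

    Assumption for this sketch: an entity is any directory under ``root``
    that directly contains a file named ``marker`` (e.g. "dataset.yaml").
    """
    entities: dict[str, dict] = {}
    for marker_file in sorted(root.rglob(marker)):
        entity_dir = marker_file.parent
        name = entity_dir.relative_to(root).as_posix()
        entities[name] = {
            "name": name,
            # List sibling files so the website could link to them.
            "files": sorted(p.name for p in entity_dir.iterdir() if p.is_file()),
        }
    return entities


def write_manifest(entities: dict[str, dict], out_path: Path) -> None:
    """Write the manifest as stable, diff-friendly JSON."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(entities, indent=2, sort_keys=True) + "\n")
```

Under these assumptions, calling collate_entities(Path("src/MEDS_DEV/datasets"), "dataset.yaml") and passing the result to write_manifest would regenerate a datasets manifest from the source tree. Sorted keys and a trailing newline keep the output deterministic, so regenerated files only show real changes in a diff.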
What needs to be done
- Rebuild on current main: the branch is stale and the source layout has changed since it was written (e.g., _results task vs. nested category readmes, MIMIC-IV dataset.yaml updates, AUMCdb addition). Re-port the scripts cleanly rather than merging the branch as-is.
- Move under the package: place at src/MEDS_DEV/web/{collate_entities,aggregate_results}.py. Keep doctests.
- Expose CLI entry points in pyproject.toml:
  - meds-dev-collate-entities → writes datasets.json / tasks.json / models.json
  - meds-dev-aggregate-results → consolidates per-issue result.json blobs
- Replace the legacy script on _web: once the package version is shipped, update aggregate_benchmark_results.yaml to install MEDS-DEV from PyPI and invoke meds-dev-aggregate-results, then delete _web/scripts/aggregate_results.py.
- Delete or archive the add_web_scripts branch once content is incorporated (currently a dangling reference).
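To make the entry-point item concrete, here is one possible shape for the aggregate command. It is a sketch under stated assumptions: the main function name, the positional arguments, the output format, and the layout of one result.json per subdirectory are hypothetical and are not taken from the existing script.

```python
from __future__ import annotations

import argparse
import json
from pathlib import Path

# Hypothetical pyproject.toml wiring (the function name `main` is an assumption):
#
#   [project.scripts]
#   meds-dev-aggregate-results = "MEDS_DEV.web.aggregate_results:main"


def aggregate_results(input_dir: Path) -> list[dict]:
    """Load every ``result.json`` found under ``input_dir`` into one list."""
    results = []
    for path in sorted(input_dir.rglob("result.json")):
        results.append(
            {
                # Record where each blob came from, relative to the input root.
                "source": path.parent.relative_to(input_dir).as_posix(),
                "result": json.loads(path.read_text()),
            }
        )
    return results


def main(argv: list[str] | None = None) -> int:
    """CLI entry point; returns a process exit code."""
    parser = argparse.ArgumentParser(prog="meds-dev-aggregate-results")
    parser.add_argument("input_dir", type=Path, help="directory holding result.json files")
    parser.add_argument("output", type=Path, help="path of the consolidated JSON file")
    args = parser.parse_args(argv)

    merged = aggregate_results(args.input_dir)
    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(json.dumps(merged, indent=2) + "\n")
    return 0
```

Accepting argv as a parameter keeps the command testable without a subprocess, and the installed console script would simply call main() with the real command line. The workflow in aggregate_benchmark_results.yaml could then run the installed command instead of a script checked in on _web.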
Acceptance criteria
- meds-dev-collate-entities and meds-dev-aggregate-results are installable from PyPI as CLI commands.
- aggregate_benchmark_results.yaml no longer depends on a script committed to _web.
Related