More datasets #258
|
@mmcdermott What would be the standard procedure for datasets without a demo? Should the command just be an echo statement or something to make the tests run? |
Review
This is a substantial PR: 6 new datasets (EHRShot, HIRID, INSPIRE, NWICU, SICdb, eICU), AUMCdb updates, task

Retargeting
This PR targets

Scope
The PR bundles several distinct concerns:
Once retargeted to

Dataset definitions
The new datasets look well-structured overall. A few issues:
Open question
@rvandewater's question about datasets without demos still needs an answer; this affects testability of most of the new datasets.

Recommendation
|
Detailed review (follow-up)
A deeper pass found several critical data correctness issues beyond the structural points in my earlier comment.

Critical (will produce wrong results or break existing functionality)
Significant
Before merge (updated checklist)
|
Replicates the dataset additions from #258 on top of current dev (the original branch is too far behind to merge directly; this PR keeps only the dataset/task content, not the stale infra reverts).

Datasets added (`src/MEDS_DEV/datasets/<name>/`):
- EHRShot: Stanford EHR cohort with pre-built MEDS extraction.
- HIRID: Bern ICU dataset via MEDS_extract-HIRID.
- INSPIRE: perioperative dataset via MEDS_extract-INSPIRE.
- NWICU: Northwestern ICU dataset via NWICU_MEDS.
- SICdb: Salzburg ICU dataset via MEDS_extract-SICdb.
- eICU: multi-center US ICU dataset via MEDS_extract-eICU (with demo).

AUMCdb is also completed (was previously just predicates.yaml + README): adds dataset.yaml, requirements.txt, refs.bib, and the full ICU predicate set from the upstream PR.

Tasks: mortality/in_icu/first_24h now lists AUMCdb and NWICU under supported_datasets in addition to MIMIC-IV.

MIMIC-IV/README.md: pulled the longer description + access-requirements write-up from #258 (replaces the TODO placeholders).

Each dataset.yaml has a `build_demo` command. For datasets without a real demo recipe, this is a stub echo so registry validation passes (matching the pattern HIRID already used in the source PR).

Co-Authored-By: Robin P. van de Water <rvandewater@users.noreply.github.com>
Co-Authored-By: Patrick Rockenschaub <prockenschaub@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
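For reference, the stub `build_demo` pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual MEDS-DEV registry validation: the `commands` / `build_full` layout, the echo message, and the helper names `has_build_demo`, `is_stub_demo`, and `run_build_demo` are all assumptions made for this example. Only the `build_demo` key and the "stub echo" idea come from this PR.

```python
import subprocess

# Hypothetical, simplified stand-in for a parsed dataset.yaml.
# The real schema may differ; only `build_demo` is named in this PR.
STUB_DATASET = {
    "commands": {
        "build_full": "echo 'full build omitted in this sketch'",
        "build_demo": "echo 'No demo available for this dataset.'",
    }
}


def has_build_demo(dataset_cfg: dict) -> bool:
    """Return True if the config defines a non-empty build_demo command."""
    cmd = dataset_cfg.get("commands", {}).get("build_demo")
    return isinstance(cmd, str) and bool(cmd.strip())


def is_stub_demo(dataset_cfg: dict) -> bool:
    """Heuristic: a build_demo that only echoes a message is treated as a stub."""
    cmd = dataset_cfg.get("commands", {}).get("build_demo") or ""
    return cmd.strip().startswith("echo ")


def run_build_demo(dataset_cfg: dict) -> str:
    """Run build_demo through the shell and return its stdout.

    check=True raises CalledProcessError on a non-zero exit code,
    which a stub echo should never produce.
    """
    result = subprocess.run(
        dataset_cfg["commands"]["build_demo"],
        shell=True,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(has_build_demo(STUB_DATASET))
    print(is_stub_demo(STUB_DATASET))
    print(run_build_demo(STUB_DATASET), end="")
```

A check like `is_stub_demo` could let tests skip anything that needs real demo data while still confirming the command exists and exits cleanly. Whether MEDS-DEV should distinguish stubs this way is an open design choice.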
No description provided.