Add the eICU dataset #311
Draft
mmcdermott wants to merge 1 commit into
This was referenced May 13, 2026
❌ 1 test failed.
mmcdermott added a commit that referenced this pull request May 13, 2026
Prerequisite for the per-dataset registration PRs (#305 AUMCdb, #306 EHRShot, #307 HIRID, #308 INSPIRE, #309 NWICU, #310 SICdb, #311 eICU). Most of those datasets' upstream extractors don't ship a publicly installable demo, and the existing registry validation requires every dataset to declare a `build_demo` command.

Switches the convention to: a dataset has a demo iff its commands declare `build_demo`. Absence is the signal; there is no separate metadata field.

- `test_all_datasets_have_commands` now requires `build_full` (which every dataset still needs) and allows missing `build_demo`.
- `tests/conftest.py` drops datasets without `build_demo` from the integration test matrix, so a per-dataset CI lane for one collects zero parametrized tests and passes cleanly rather than trying to build data the dataset can't produce.
- `src/MEDS_DEV/datasets/__main__.py` raises a clear error when called with `demo=True` against a dataset that doesn't declare a `build_demo` command (instead of the previous `KeyError`).

No `dataset.yaml` files change here; those changes ship with the sister per-dataset PRs that depend on this one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
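The convention described in this commit message can be illustrated with a minimal sketch. This is not the MEDS-DEV implementation: the in-memory `REGISTRY`, the helper names (`has_demo`, `validate_commands`, `demo_datasets`, `build_command`), the `demo_less_example` entry, and the placeholder command strings are all hypothetical, chosen only to show the three behaviours listed above (required `build_full`, filtering the demo test matrix, and a clear error instead of a `KeyError`).

```python
from typing import Mapping

# Hypothetical stand-in for the dataset registry. The real project reads each
# dataset's dataset.yaml; the command strings here are placeholders.
REGISTRY = {
    "eICU": {
        "commands": {
            "build_full": "<full build command>",
            "build_demo": "<demo build command>",
        }
    },
    # Hypothetical dataset that declares no demo command.
    "demo_less_example": {"commands": {"build_full": "<full build command>"}},
}


def has_demo(dataset_cfg: Mapping) -> bool:
    """A dataset has a demo iff its commands declare build_demo."""
    return "build_demo" in dataset_cfg.get("commands", {})


def validate_commands(registry: Mapping) -> None:
    """build_full is mandatory for every dataset; build_demo is optional."""
    for name, cfg in registry.items():
        if "build_full" not in cfg.get("commands", {}):
            raise ValueError(f"Dataset {name!r} must declare a build_full command.")


def demo_datasets(registry: Mapping) -> list[str]:
    """Names that would enter a demo-based test matrix (demo-less ones are dropped)."""
    return sorted(name for name, cfg in registry.items() if has_demo(cfg))


def build_command(registry: Mapping, name: str, demo: bool = False) -> str:
    """Return the build command, failing clearly when a demo is requested but absent."""
    cfg = registry[name]
    if demo and not has_demo(cfg):
        raise ValueError(
            f"Dataset {name!r} does not declare a build_demo command; "
            "it can only be built with demo=False."
        )
    return cfg["commands"]["build_demo" if demo else "build_full"]
```

Under these assumptions, `demo_datasets(REGISTRY)` keeps only `eICU`, so a test lane parametrized over that list would collect nothing for `demo_less_example`, and `build_command(REGISTRY, "demo_less_example", demo=True)` raises a `ValueError` that names the missing command.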
Summary

Registers eICU (multi-center US ICU) in `src/MEDS_DEV/datasets/eICU/`. Extraction is via `MEDS_extract-eICU` from the upstream `eICU-MEDS` package, which makes eICU the one new dataset that ships an end-to-end demo recipe.

Files added:

- `dataset.yaml`: metadata plus both `build_full` and `build_demo` commands.
- `predicates.yaml`: admission/discharge plus a wider lab/vital predicate set.
- `requirements.txt`: bumped from the contributor's `eICU-MEDS==0.0.1` to the latest `eICU-MEDS==0.0.2`. The 0.0.1 demo end-to-end failed in CI on the bundled PR "Add six new datasets (EHRShot, HIRID, INSPIRE, NWICU, SICdb, eICU) + complete AUMCdb" #299: the `shard_events` stage couldn't find its parquet lock under demo mode. Hoping the 0.0.2 release picks up either an upstream bug-fix or a `MEDS_transforms` compatibility update.
- `refs.bib`: Pollard et al., 2018.
- `README.md`.

Since eICU declares `build_demo`, the integration test matrix exercises it (no opt-out needed; #312's mechanism only skips datasets that omit `build_demo` entirely).

eICU is not added to any task's `supported_datasets` in this PR (worth a follow-up to wire it into `mortality/in_icu/first_24h` once we've confirmed the demo extraction produces well-formed predicates).

Depends on #312

Targeted at `feat/dataset-demo-availability` for now; once #312 merges, this PR retargets to `dev`. (eICU itself doesn't need #312's relaxation, but staying stacked keeps the seven dataset PRs visually consistent.)

Test plan

- `build_demo` optional, skip demo-less datasets in tests #312.
- `full-integration` lane: `test_datasets_configured[eICU]` should actually exercise the demo. If green, eICU is the first new dataset with an actually-testable demo; if red, treat as an upstream blocker.

Supersedes / refs
🤖 Generated with Claude Code