Add the NWICU dataset#309

Draft
mmcdermott wants to merge 1 commit into dev from feat/add-dataset-nwicu
Conversation

Collaborator

@mmcdermott mmcdermott commented May 13, 2026

Summary

Registers NWICU (Northwestern ICU) in src/MEDS_DEV/datasets/NWICU/. Extraction is via NWICU_MEDS from the upstream nwicu-meds package (pinned to 0.0.11).

Files added/modified:

  • dataset.yaml: metadata plus a build_full command that shells out to NWICU_MEDS. There is no build_demo because upstream doesn't ship a demo recipe; #312 ("Make build_demo optional, skip demo-less datasets in tests") makes the key optional and ensures the integration tests skip datasets that lack it.
  • predicates.yaml: admission/discharge plus a few lab predicates.
  • requirements.txt: NWICU-MEDS==0.0.11.
  • refs.bib, README.md.
  • tasks/mortality/in_icu/first_24h.yaml: adds NWICU to supported_datasets. NWICU defines the required icu_admission / icu_discharge predicates.

Depends on #312

#312 makes build_demo optional in the registry and skips datasets that don't declare it from the integration test matrix. Targeted at feat/dataset-demo-availability for now; once #312 merges, this PR retargets to dev.
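
A rough sketch of the convention #312 introduces (a dataset has a demo iff its commands declare build_demo), to make the skip behaviour concrete. The registry dict, its layout, the placeholder dataset name, and the helper names below are assumptions for illustration only; they are not the actual MEDS_DEV registry or conftest code.

```python
from typing import Any, Mapping

# Hypothetical in-memory registry; the real one is loaded from dataset.yaml
# files, and its exact shape is an assumption here.
EXAMPLE_REGISTRY: dict[str, dict[str, Any]] = {
    "EXAMPLE_WITH_DEMO": {
        "commands": {
            "build_full": "<full build command>",
            "build_demo": "<demo build command>",
        }
    },
    "NWICU": {"commands": {"build_full": "<full build command>"}},
}


def has_demo(entry: Mapping[str, Any]) -> bool:
    """A dataset has a demo iff its commands declare build_demo."""
    return "build_demo" in entry.get("commands", {})


def demo_datasets(registry: Mapping[str, Mapping[str, Any]]) -> list[str]:
    """Names of datasets that should appear in the demo-based test matrix."""
    return sorted(name for name, entry in registry.items() if has_demo(entry))


if __name__ == "__main__":
    # NWICU declares no build_demo, so it is left out of the matrix.
    print(demo_datasets(EXAMPLE_REGISTRY))
```

With a filter like this feeding test parametrization, a CI lane scoped to NWICU alone would collect zero parametrized tests rather than attempt a demo build.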

Test plan

Supersedes / refs

🤖 Generated with Claude Code

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/MEDS_DEV/datasets/__init__.py | 97.43% <100.00%> (+0.06%) ⬆️ |

@mmcdermott mmcdermott changed the base branch from dev to feat/dataset-demo-availability May 13, 2026 16:33
@mmcdermott mmcdermott force-pushed the feat/add-dataset-nwicu branch from 54accc2 to cc9e1b1 Compare May 13, 2026 16:33
mmcdermott added a commit that referenced this pull request May 13, 2026
Prerequisite for the per-dataset registration PRs (#305 AUMCdb, #306
EHRShot, #307 HIRID, #308 INSPIRE, #309 NWICU, #310 SICdb, #311 eICU).
Most of those datasets' upstream extractors don't ship a publicly
installable demo, and the existing registry validation requires every
dataset to declare a build_demo command.

Switches the convention to: a dataset has a demo iff its commands
declare build_demo. Absence is the signal — no separate metadata field.

- `test_all_datasets_have_commands` now requires `build_full` (which
  every dataset still needs) and allows missing `build_demo`.
- `tests/conftest.py` drops datasets without `build_demo` from the
  integration test matrix, so a per-dataset CI lane for one collects
  zero parametrized tests and passes cleanly rather than trying to
  build data the dataset can't produce.
- `src/MEDS_DEV/datasets/__main__.py` raises a clear error when called
  with `demo=True` against a dataset that doesn't declare a
  build_demo command (instead of the previous KeyError).

No dataset.yaml files change here — those changes ship with the sister
per-dataset PRs that depend on this one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
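
As a minimal illustration of the third bullet in the commit message above (a clear error instead of a bare KeyError when a demo build is requested for a dataset without build_demo), something along these lines would do it. The function name, signature, exception type, and message wording are all assumptions; the actual code in `src/MEDS_DEV/datasets/__main__.py` may differ.

```python
from typing import Mapping


def get_build_command(
    dataset: str, commands: Mapping[str, str], demo: bool = False
) -> str:
    """Pick the build command for a dataset (hypothetical helper).

    Raises a descriptive ValueError when demo=True but the dataset does not
    declare build_demo, rather than letting a KeyError escape.
    """
    if demo:
        if "build_demo" not in commands:
            raise ValueError(
                f"Dataset {dataset!r} does not declare a build_demo command, "
                "so no demo build is available; rerun with demo=False."
            )
        return commands["build_demo"]
    # build_full is still required for every dataset.
    return commands["build_full"]
```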
@mmcdermott mmcdermott force-pushed the feat/dataset-demo-availability branch from 17019a0 to a2d4038 Compare May 13, 2026 17:20
@mmcdermott mmcdermott force-pushed the feat/add-dataset-nwicu branch from cc9e1b1 to 0f49a92 Compare May 13, 2026 17:22
@mmcdermott mmcdermott changed the base branch from feat/dataset-demo-availability to dev May 13, 2026 17:43
@mmcdermott mmcdermott force-pushed the feat/add-dataset-nwicu branch from 0f49a92 to 365b0b9 Compare May 13, 2026 17:43