Add the HIRID dataset #307
Draft
mmcdermott wants to merge 1 commit into
This was referenced May 13, 2026
Codecov Report: ✅ All modified and coverable lines are covered by tests.
mmcdermott added a commit that referenced this pull request May 13, 2026
Prerequisite for the per-dataset registration PRs (#305 AUMCdb, #306 EHRShot, #307 HIRID, #308 INSPIRE, #309 NWICU, #310 SICdb, #311 eICU). Most of those datasets' upstream extractors don't ship a publicly installable demo, and the existing registry validation requires every dataset to declare a `build_demo` command.

Switches the convention to: a dataset has a demo iff its commands declare `build_demo`. Absence is the signal; there is no separate metadata field.

- `test_all_datasets_have_commands` now requires `build_full` (which every dataset still needs) and allows missing `build_demo`.
- `tests/conftest.py` drops datasets without `build_demo` from the integration test matrix, so a per-dataset CI lane for one collects zero parametrized tests and passes cleanly rather than trying to build data the dataset can't produce.
- `src/MEDS_DEV/datasets/__main__.py` raises a clear error when called with `demo=True` against a dataset that doesn't declare a `build_demo` command (instead of the previous `KeyError`).

No `dataset.yaml` files change here; those changes ship with the sister per-dataset PRs that depend on this one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
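As a rough illustration of the convention in the commit message above, here is a minimal Python sketch. It is not the actual MEDS-DEV code: the function names (`has_demo`, `get_build_command`, `demo_test_matrix`), the registry layout, and the shell commands are all hypothetical and chosen only to show the "absence is the signal" idea.

```python
# Hypothetical sketch; names and data below are illustrative, not MEDS-DEV's actual API.
from typing import Mapping


def has_demo(commands: Mapping[str, str]) -> bool:
    """A dataset has a demo iff its commands declare `build_demo`."""
    return "build_demo" in commands


def get_build_command(name: str, commands: Mapping[str, str], demo: bool = False) -> str:
    """Pick the build command, failing clearly when a demo is requested but not declared."""
    if "build_full" not in commands:
        raise ValueError(f"Dataset {name!r} must declare a build_full command.")
    if demo and not has_demo(commands):
        # A descriptive error instead of a bare KeyError on commands["build_demo"].
        raise ValueError(
            f"Dataset {name!r} does not declare a build_demo command, "
            "so demo=True is not supported."
        )
    return commands["build_demo" if demo else "build_full"]


def demo_test_matrix(registry: Mapping[str, Mapping[str, str]]) -> list[str]:
    """Keep only datasets that can build a demo (the filtering idea described for conftest)."""
    return sorted(name for name, cmds in registry.items() if has_demo(cmds))


# Made-up registry entries for illustration only.
registry = {
    "HIRID": {"build_full": "echo full-hirid"},
    "TOY": {"build_full": "echo full-toy", "build_demo": "echo demo-toy"},
}
```

Under these assumptions, `demo_test_matrix(registry)` keeps only `TOY`, and asking for `get_build_command("HIRID", registry["HIRID"], demo=True)` raises a `ValueError` that names the dataset rather than surfacing a `KeyError`.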
Summary
Registers HIRID (Bern University Hospital ICU dataset) in `src/MEDS_DEV/datasets/HIRID/`. Extraction is via `MEDS_extract-HIRID` from the upstream `hirid-meds` package (pinned to `0.0.2`).

Files added:

- `dataset.yaml`: metadata + `build_full` shelling out to `MEDS_extract-HIRID`. No `build_demo` (upstream doesn't ship a demo recipe; #312, "Make `build_demo` optional, skip demo-less datasets in tests", makes the key optional and ensures integration tests skip it).
- `predicates.yaml`: admission/discharge plus a handful of lab predicates. Regex syntax bugs from the contributor's branch (`^X*` matched `X` minus its last char + zero-or-more of that char, not `X//.*`) were corrected to the standard `^X//.*` form.
- `requirements.txt`: `HIRID-MEDS==0.0.2`.
- `refs.bib`: three entries.
- `README.md`.

HIRID is not added to any task's `supported_datasets`.

Known issue (latent, not blocking)
The `icu_admission`/`icu_discharge` predicates use the same regex as `hospital_admission`/`hospital_discharge` (matching `^HOSPITAL_ADMISSION.*` etc.), so they don't actually distinguish ICU vs hospital events. Since HIRID isn't wired into the `mortality/in_icu/first_24h` task here, this is latent. Worth fixing before listing HIRID in any ICU-specific task; doing so needs HIRID-MEDS knowledge to pick the right code patterns.

Depends on #312
#312 makes `build_demo` optional in the registry and skips datasets that don't declare it from the integration test matrix. Targeted at `feat/dataset-demo-availability` for now; once #312 merges, this PR retargets to `dev`.

Test plan
`build_demo` optional, skip demo-less datasets in tests #312.

Supersedes / refs
🤖 Generated with Claude Code
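
The regex correction to `predicates.yaml` noted in the Summary can be illustrated with a short Python sketch. It assumes patterns are applied with `re.match` semantics, which may not be exactly how the task tooling evaluates predicates. The prefix `HOSPITAL_ADMISSION` comes from the description above, but the full code strings are made up for illustration and are not taken from HIRID-MEDS.

```python
import re

# Buggy form: `*` binds only to the final "N", so this means
# "HOSPITAL_ADMISSIO" followed by zero or more "N" characters.
buggy = re.compile(r"^HOSPITAL_ADMISSION*")

# Corrected form: the literal prefix, the "//" separator, then anything.
fixed = re.compile(r"^HOSPITAL_ADMISSION//.*")

# Hypothetical code strings, for illustration only.
intended = "HOSPITAL_ADMISSION//EMERGENCY"
unintended = "HOSPITAL_ADMISSIO_OTHER"

# The buggy pattern accepts the unintended code because "N*" can match zero characters.
buggy_unintended = buggy.match(unintended)
# The corrected pattern rejects it because the "N//" part is required.
fixed_unintended = fixed.match(unintended)

# On the intended code, the buggy pattern stops after the prefix,
# while the corrected pattern consumes the whole code.
buggy_span = buggy.match(intended).group(0)
fixed_span = fixed.match(intended).group(0)
```

Here `buggy_unintended` is a match object whose matched text is `HOSPITAL_ADMISSIO`, `fixed_unintended` is `None`, `buggy_span` is `HOSPITAL_ADMISSION`, and `fixed_span` is the full `HOSPITAL_ADMISSION//EMERGENCY` string.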