Add the SICdb dataset #310
Draft
mmcdermott wants to merge 1 commit into
This was referenced May 13, 2026
mmcdermott added a commit that referenced this pull request May 13, 2026
Prerequisite for the per-dataset registration PRs (#305 AUMCdb, #306 EHRShot, #307 HIRID, #308 INSPIRE, #309 NWICU, #310 SICdb, #311 eICU). Most of those datasets' upstream extractors don't ship a publicly installable demo, and the existing registry validation requires every dataset to declare a `build_demo` command.

Switches the convention to: a dataset has a demo iff its commands declare `build_demo`. Absence is the signal; there is no separate metadata field.

- `test_all_datasets_have_commands` now requires `build_full` (which every dataset still needs) and allows missing `build_demo`.
- `tests/conftest.py` drops datasets without `build_demo` from the integration test matrix, so a per-dataset CI lane for one collects zero parametrized tests and passes cleanly rather than trying to build data the dataset can't produce.
- `src/MEDS_DEV/datasets/__main__.py` raises a clear error when called with `demo=True` against a dataset that doesn't declare a `build_demo` command (instead of the previous KeyError).

No dataset.yaml files change here; those changes ship with the sister per-dataset PRs that depend on this one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
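The convention described in this commit message (a dataset has a demo iff its commands declare `build_demo`) can be illustrated with a minimal sketch. This is not the code in MEDS_DEV: the `REGISTRY` dict, the dataset names, the placeholder commands, the helper names, and the choice of `ValueError` are all assumptions made for illustration.

```python
# Hypothetical stand-in for the parsed dataset.yaml files; the names and
# placeholder commands below are illustrative, not taken from MEDS_DEV.
REGISTRY = {
    "with_demo": {"commands": {"build_full": "echo full", "build_demo": "echo demo"}},
    "without_demo": {"commands": {"build_full": "echo full"}},
}


def has_demo(entry: dict) -> bool:
    """A dataset has a demo iff its commands declare build_demo."""
    return "build_demo" in entry.get("commands", {})


def demo_test_matrix(registry: dict) -> list[str]:
    """Datasets eligible for demo-based integration tests (demo-less ones are dropped)."""
    return sorted(name for name, entry in registry.items() if has_demo(entry))


def build_command(registry: dict, name: str, demo: bool = False) -> str:
    """Return the build command, failing clearly instead of with a bare KeyError."""
    entry = registry[name]
    if demo and not has_demo(entry):
        # ValueError is an assumed choice; the commit only says "a clear error".
        raise ValueError(f"Dataset {name!r} does not declare a build_demo command.")
    return entry["commands"]["build_demo" if demo else "build_full"]
```

With this shape, a dataset that omits `build_demo` simply never appears in the demo test matrix, so no separate "has demo" flag needs to be kept in sync with the commands.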
Summary
Registers SICdb (Salzburg Intensive Care database) in `src/MEDS_DEV/datasets/SICdb/`. Extraction is via `MEDS_extract-SICdb` from the upstream `sicdb-meds` package (pinned to `0.0.4`).

Files added:
- `dataset.yaml`: metadata + `build_full` shelling out to `MEDS_extract-SICdb`. No `build_demo` (upstream doesn't ship a demo recipe; #312 makes the key optional and ensures integration tests skip it).
- `predicates.yaml`: admission/discharge plus a couple of lab predicates. Regex syntax bugs (`^X*` form) and commented-out predicate stubs from the contributor's branch were cleaned up.
- `requirements.txt`: `SICdb_MEDS==0.0.4` (note: underscore in package name matches upstream).
- `refs.bib`, `README.md`.

SICdb is not added to any task's `supported_datasets`.

Known issue (latent, not blocking)
The `icu_admission`/`icu_discharge` predicates use the same regex as `hospital_admission`/`hospital_discharge` (`^ADMISSION//.*` and `^DISCHARGE//.*`), so they don't distinguish ICU vs hospital events. Since SICdb isn't wired into the `mortality/in_icu/first_24h` task here, this is latent, but worth fixing before listing SICdb in any ICU-specific task.

Depends on #312
#312 makes `build_demo` optional in the registry and drops datasets that don't declare it from the integration test matrix. Targeted at `feat/dataset-demo-availability` for now; once #312 merges, this PR retargets to `dev`.

Test plan
`build_demo` optional, skip demo-less datasets in tests #312.

Supersedes / refs
🤖 Generated with Claude Code
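
The latent predicate issue noted under "Known issue" can be reproduced with a small sketch. The two regexes are the ones quoted in the description; the `PREDICATES` dict layout and the event codes are hypothetical and are not taken from `predicates.yaml` or from SICdb's actual code vocabulary.

```python
import re

# Regexes as quoted in the PR description; the dict layout is illustrative
# and does not mirror the real predicates.yaml structure.
PREDICATES = {
    "hospital_admission": r"^ADMISSION//.*",
    "icu_admission": r"^ADMISSION//.*",
    "hospital_discharge": r"^DISCHARGE//.*",
    "icu_discharge": r"^DISCHARGE//.*",
}


def matching_predicates(code: str) -> list[str]:
    """Return the names of all predicates whose regex matches the event code."""
    return sorted(name for name, pattern in PREDICATES.items() if re.match(pattern, code))


# Hypothetical event codes, for illustration only.
for code in ["ADMISSION//ICU", "ADMISSION//HOSPITAL", "DISCHARGE//ICU", "LAB//lactate"]:
    print(code, "->", matching_predicates(code))
```

Any admission code satisfies both the hospital and the ICU admission predicate (and likewise for discharge), which is why an ICU-specific task could not rely on these predicates as written.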