Skip to content

Add the SICdb dataset#310

Draft
mmcdermott wants to merge 1 commit into
devfrom
feat/add-dataset-sicdb
Draft

Add the SICdb dataset#310
mmcdermott wants to merge 1 commit into
devfrom
feat/add-dataset-sicdb

Conversation

@mmcdermott
Copy link
Copy Markdown
Collaborator

@mmcdermott mmcdermott commented May 13, 2026

Summary

Registers SICdb (Salzburg Intensive Care database) in src/MEDS_DEV/datasets/SICdb/. Extraction is via MEDS_extract-SICdb from the upstream sicdb-meds package (pinned to 0.0.4).

Files added:

  • dataset.yaml — metadata + build_full shelling out to MEDS_extract-SICdb. No build_demo (upstream doesn't ship a demo recipe; Make build_demo optional, skip demo-less datasets in tests #312 makes the key optional and ensures integration tests skip it).
  • predicates.yaml — admission/discharge plus a couple of lab predicates. Regex syntax bugs (^X* form) and commented-out predicate stubs from the contributor's branch were cleaned up.
  • requirements.txtSICdb_MEDS==0.0.4 (note: underscore in package name matches upstream).
  • refs.bib, README.md.

SICdb is not added to any task's supported_datasets.

Known issue (latent, not blocking)

The icu_admission / icu_discharge predicates use the same regex as hospital_admission / hospital_discharge (^ADMISSION//.* and ^DISCHARGE//.*), so they don't distinguish ICU vs hospital events. Since SICdb isn't wired into the mortality/in_icu/first_24h task here, this is latent — but worth fixing before listing SICdb in any ICU-specific task.

Depends on #312

#312 makes build_demo optional in the registry and skips datasets that don't declare it from the integration test matrix. Targeted at feat/dataset-demo-availability for now; once #312 merges, this PR retargets to dev.

Test plan

Supersedes / refs

🤖 Generated with Claude Code

@mmcdermott mmcdermott changed the base branch from dev to feat/dataset-demo-availability May 13, 2026 16:34
@mmcdermott mmcdermott force-pushed the feat/add-dataset-sicdb branch from d393567 to 8aa8d9e Compare May 13, 2026 16:34
mmcdermott added a commit that referenced this pull request May 13, 2026
Prerequisite for the per-dataset registration PRs (#305 AUMCdb, #306
EHRShot, #307 HIRID, #308 INSPIRE, #309 NWICU, #310 SICdb, #311 eICU).
Most of those datasets' upstream extractors don't ship a publicly
installable demo, and the existing registry validation requires every
dataset to declare a build_demo command.

Switches the convention to: a dataset has a demo iff its commands
declare build_demo. Absence is the signal — no separate metadata field.

- `test_all_datasets_have_commands` now requires `build_full` (which
  every dataset still needs) and allows missing `build_demo`.
- `tests/conftest.py` drops datasets without `build_demo` from the
  integration test matrix, so a per-dataset CI lane for one collects
  zero parametrized tests and passes cleanly rather than trying to
  build data the dataset can't produce.
- `src/MEDS_DEV/datasets/__main__.py` raises a clear error when called
  with `demo=True` against a dataset that doesn't declare a
  build_demo command (instead of the previous KeyError).

No dataset.yaml files change here — those changes ship with the sister
per-dataset PRs that depend on this one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mmcdermott mmcdermott force-pushed the feat/dataset-demo-availability branch from 17019a0 to a2d4038 Compare May 13, 2026 17:20
@mmcdermott mmcdermott force-pushed the feat/add-dataset-sicdb branch from 8aa8d9e to 197591d Compare May 13, 2026 17:22
@mmcdermott mmcdermott changed the base branch from feat/dataset-demo-availability to dev May 13, 2026 17:43
@mmcdermott mmcdermott force-pushed the feat/add-dataset-sicdb branch from 197591d to 13eb783 Compare May 13, 2026 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant