Commit 8048c83

mmcdermott, rvandewater, and claude committed
Register the INSPIRE dataset
INSPIRE (perioperative medicine): `MEDS_extract-INSPIRE` from the upstream `inspire-meds` package handles the extraction. No demo recipe upstream, so marked `demo_available: false`.

Also lands the shared `demo_available` registry mechanism (identical across sister per-dataset PRs).

Replicated from the closed bundled PR #299 (which itself replicated @rvandewater's #258).

Co-Authored-By: Robin P. van de Water <rvandewater@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ed1d56c commit 8048c83

8 files changed

Lines changed: 139 additions & 3 deletions

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# INSPIRE: a publicly available research dataset for perioperative medicine
+
+## Description
+
+The INSPIRE dataset is a publicly available research dataset in perioperative medicine, which includes approximately 130,000 cases (50% of all surgical cases) who underwent anesthesia for surgery at an academic institution in South Korea between 2011 and 2020. This comprehensive dataset includes patient characteristics such as age, sex, American Society of Anesthesiologists physical status classification, diagnosis, surgical procedure code, department, and type of anesthesia. It also includes vital signs in the operating theatre, general wards, and intensive care units (ICUs), laboratory results from six months before admission to six months after discharge, and medication during hospitalization. Complications include total hospital and ICU length of stay and in-hospital death.[1]
+
+## Access Requirements
+
+Taken from [PhysioNet](https://physionet.org/content/inspire/1.3/):
+
+- **Access Policy**: Complete the credentialed data access requirements on PhysioNet[1]
+- **License (for files)**: PhysioNet Credentialed Health Data License Version 1.5.0[1]
+- **Data Use Agreement**: Agreement requires verified institutional affiliation and commitment to use data solely for lawful scientific research[1]
+- **Required training**: Valid CITI training certification in human research subject protection and HIPAA regulations[1]
+- **License Term**: 3 years from account creation date[1]
+- **Code Sharing**: Agreement to contribute code associated with publications to open research repository[1]
+
+## Supported Tasks
+
+INSPIRE includes several classification tasks organized into categories such as:[1]
+
+**Operational Outcomes:**
+
+- `tasks/mortality/long_length_of_stay.yaml`
+- `tasks/readmission/30_day_readmission.yaml`
+- `tasks/transfer/icu_transfer.yaml`
+
+## MEDS-transformation
+
+The INSPIRE ETL is found at https://github.com/rvandewater/INSPIRE_MEDS.
+
+## Sources
+
+1. [INSPIRE Physionet Website](https://physionet.org/content/inspire/1.3/)
+
+## Disclaimer
+
+Please refer to the data owners and the most up-to-date information when using this dataset in your research. The INSPIRE dataset has not been reviewed or approved by the Food and Drug Administration and is for non-clinical, research and education use only.[1]
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+metadata:
+  demo_available: false
+  description: >-
+    A MEDS version of INSPIRE.
+  links:
+    - https://github.com/rvandewater/INSPIRE_MEDS
+    - https://physionet.org/content/inspire/
+  contacts:
+    - name: "Robin P. van de Water"
+      github_username: "rvandewater"
+
+commands:
+  build_full: >-
+    MEDS_extract-INSPIRE
+    raw_input_dir="{temp_dir}/raw"
+    pre_MEDS_dir="{temp_dir}/pre_MEDS"
+    MEDS_cohort_dir="{output_dir}"
+    log_dir="{output_dir}/.pipeline_logs"
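The `build_full` entry above is a command template with `{temp_dir}` and `{output_dir}` placeholders. This commit does not show how the registry fills them in, so the following is only a sketch under the assumption that plain `str.format` substitution is used. The helper name `render_command` and the example paths are hypothetical.

```python
import shlex

# Template copied from the `build_full` entry. A YAML `>-` folded scalar
# joins its lines with single spaces, which is reproduced here by hand.
BUILD_FULL = (
    "MEDS_extract-INSPIRE "
    'raw_input_dir="{temp_dir}/raw" '
    'pre_MEDS_dir="{temp_dir}/pre_MEDS" '
    'MEDS_cohort_dir="{output_dir}" '
    'log_dir="{output_dir}/.pipeline_logs"'
)


def render_command(template: str, temp_dir: str, output_dir: str) -> list[str]:
    """Hypothetical helper: fill the placeholders, then split into argv tokens."""
    filled = template.format(temp_dir=temp_dir, output_dir=output_dir)
    # shlex.split applies POSIX shell quoting rules, so the double quotes
    # around each value are removed from the resulting tokens.
    return shlex.split(filled)


# Example paths are assumptions chosen for illustration only.
argv = render_command(BUILD_FULL, temp_dir="/tmp/inspire", output_dir="/data/inspire_meds")
for token in argv:
    print(token)
```

Splitting into tokens keeps the rendered command usable with `subprocess.run(argv)` without going through a shell.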
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+predicates:
+  hospital_admission:
+    code: { regex: "^ADMISSION//.*" }
+  hospital_discharge:
+    code: { regex: "^HOSPITAL_DISCHARGE//.*" }
+
+  icu_admission:
+    code: { regex: "^ICU_ADMISSION//.*" }
+  icu_discharge:
+    code: { regex: "^ICU_DISCHARGE//.*" }
+
+  creatinine_mgdl:
+    code: { any: ["LAB//mg/dL//creatinine"] }
+
+  #TODO: convert units from mmol/L to mg/dL
+  sodium_meql:
+    code: { any: ["LAB//mmol/L//sodium"] }
+
+  wbc_kul:
+    code: { any: ["LAB///nL//wbc"] }
+
+  platelets_kul:
+    code: { any: ["LAB///nL//platelet"] }
+
+  map_mmhg:
+    code:
+      {
+        any:
+          [
+            "VITAL//pap_mbp//mmHg",
+            "VITAL//nibp_mbp//mmHg",
+            "VITAL//art_mbp//mmHg",
+            "WARD_VITAL//mmHg//nibp_mbp",
+          ],
+      }
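The predicates above select events by their `code`, either with an anchored `regex` or with an `any` list of exact codes. The sketch below illustrates those two matching modes in plain Python. It is not the implementation used by the task-extraction tooling, and the sample codes `ADMISSION//ELECTIVE` and `ICU_ADMISSION//SICU` are hypothetical.

```python
import re

# A few predicate specs transcribed from the file above.
PREDICATES = {
    "hospital_admission": {"regex": r"^ADMISSION//.*"},
    "icu_admission": {"regex": r"^ICU_ADMISSION//.*"},
    "creatinine_mgdl": {"any": ["LAB//mg/dL//creatinine"]},
}


def matches(code: str, spec: dict) -> bool:
    """Return True if `code` satisfies a predicate spec (illustrative only)."""
    if "regex" in spec:
        # The leading ^ anchors the pattern to the start of the code string.
        return re.search(spec["regex"], code) is not None
    # `any` is treated as exact membership in the listed codes.
    return code in spec["any"]


print(matches("ADMISSION//ELECTIVE", PREDICATES["hospital_admission"]))
print(matches("ICU_ADMISSION//SICU", PREDICATES["hospital_admission"]))
print(matches("LAB//mg/dL//creatinine", PREDICATES["creatinine_mgdl"]))
```

Because of the `^` anchor, an `ICU_ADMISSION//...` code does not satisfy `hospital_admission` even though it contains the substring `ADMISSION//`.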
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+@article{limINSPIREPubliclyAvailable2024,
+  title = {{{INSPIRE}}, a Publicly Available Research Dataset for Perioperative Medicine},
+  author = {Lim, Leerang and Lee, Hyeonhoon and Jung, Chul-Woo and Sim, Dayeon and Borrat, Xavier and Pollard, Tom J. and Celi, Leo A. and Mark, Roger G. and Vistisen, Simon T. and Lee, Hyung-Chul},
+  year = 2024,
+  month = jun,
+  journal = {Scientific Data},
+  volume = {11},
+  number = {1},
+  pages = {655},
+  publisher = {Nature Publishing Group},
+  issn = {2052-4463},
+  doi = {10.1038/s41597-024-03517-4},
+  urldate = {2024-11-06},
+  abstract = {We present the INSPIRE dataset, a publicly available research dataset in perioperative medicine, which includes approximately 130,000 surgical operations at an academic institution in South Korea over a ten-year period between 2011 and 2020. This comprehensive dataset includes patient characteristics such as age, sex, American Society of Anesthesiologists physical status classification, diagnosis, surgical procedure code, department, and type of anaesthesia. The dataset also includes vital signs in the operating theatre, general wards, and intensive care units (ICUs), laboratory results from six months before admission to six months after discharge, and medication during hospitalisation. Complications include total hospital and ICU length of stay and in-hospital death. We hope this dataset will inspire collaborative research and development in perioperative medicine and serve as a reproducible external validation dataset to improve surgical outcomes.},
+  copyright = {2024 The Author(s)},
+  langid = {english},
+  keywords = {Outcomes research,Public health,Risk factors},
+}
+
+@misc{robinvandewaterRvandewaterINSPIRE_MEDSZenodo2025,
+  title = {Rvandewater/{{INSPIRE}}\_{{MEDS}}: {{Zenodo}}},
+  shorttitle = {Rvandewater/{{INSPIRE}}\_{{MEDS}}},
+  author = {{Robin van de Water}},
+  year = 2025,
+  month = feb,
+  doi = {10.5281/ZENODO.14891940},
+  urldate = {2025-10-16},
+  abstract = {First Zenodo release.},
+  copyright = {MIT License},
+  howpublished = {Zenodo}
+}
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+INSPIRE-MEDS==0.0.11

src/MEDS_DEV/datasets/__init__.py

Lines changed: 9 additions & 3 deletions
@@ -38,23 +38,28 @@ class DatasetMetadata(Metadata):
         access_policy: The level of accessibility of the dataset. Limited to the values in the AccessPolicy
             enum.
         access_details: A string describing the access policy in more detail. May be empty.
+        demo_available: Whether the dataset has a usable ``build_demo`` recipe. Datasets where the upstream
+            extractor offers no public demo set this to False; integration tests skip them and they are
+            excluded from the CI dataset matrix.
 
     Examples:
         >>> DatasetMetadata(description="foo", contacts=[{"name": "bar"}]) # doctest: +NORMALIZE_WHITESPACE
         DatasetMetadata(description='foo',
                         contacts=[Contact(name='bar', email=None, github_username=None)],
                         links=None,
                         access_policy=<AccessPolicy.PRIVATE_SINGLE_USE: 'private_single_use'>,
-                        access_details=None)
+                        access_details=None,
+                        demo_available=True)
         >>> DatasetMetadata(
         ...     description="foo", contacts=[{"name": "bar"}], access_policy="public_unrestricted",
-        ...     access_details="baz"
+        ...     access_details="baz", demo_available=False
         ... ) # doctest: +NORMALIZE_WHITESPACE
         DatasetMetadata(description='foo',
                         contacts=[Contact(name='bar', email=None, github_username=None)],
                         links=None,
                         access_policy=<AccessPolicy.PUBLIC_UNRESTRICTED: 'public_unrestricted'>,
-                        access_details='baz')
+                        access_details='baz',
+                        demo_available=False)
         >>> DatasetMetadata(
         ...     description="foo", contacts=[{"name": "bar"}], access_policy="foo"
         ... ) # doctest: +NORMALIZE_WHITESPACE
@@ -74,6 +79,7 @@ class DatasetMetadata(Metadata):
 
     access_policy: AccessPolicy = AccessPolicy.PRIVATE_SINGLE_USE
     access_details: str | None = None
+    demo_available: bool = True
 
     def __post_init__(self):
         super().__post_init__()
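The new field defaults to `True`, so registry entries that never mention `demo_available` keep their current behavior and only datasets such as INSPIRE opt out. The sketch below shows that default with a reduced stand-in dataclass. `DatasetMetadataSketch` and `from_registry_entry` are hypothetical names, and the real `DatasetMetadata` carries more fields and validation than shown here.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadataSketch:
    """Reduced stand-in for DatasetMetadata; only two fields are modeled."""

    description: str
    demo_available: bool = True  # same default as the field added in this commit


def from_registry_entry(metadata_block: dict) -> DatasetMetadataSketch:
    """Hypothetical loader: build the dataclass from a parsed `metadata:` mapping."""
    return DatasetMetadataSketch(**metadata_block)


# An entry that omits the key falls back to the default.
legacy = from_registry_entry({"description": "foo"})
# INSPIRE's entry sets the flag explicitly.
inspire = from_registry_entry(
    {"description": "A MEDS version of INSPIRE.", "demo_available": False}
)
print(legacy.demo_available, inspire.demo_available)
```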

tests/conftest.py

Lines changed: 3 additions & 0 deletions
@@ -325,6 +325,9 @@ def get_opts(config, opt: str) -> list[str]:
     if isinstance(out, dict):
         out = list(out.keys())
 
+    if opt == "dataset":
+        out = [name for name in out if DATASETS[name]["metadata"].demo_available]
+
     return out
 
 
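The three added lines drop datasets without a demo recipe from the `dataset` test option, which is how INSPIRE stays out of the integration-test matrix. A self-contained sketch of that filter follows. The registry contents are stand-ins built with `SimpleNamespace`, and `SOME_DEMO_DATASET` is a hypothetical name rather than an entry in the real registry.

```python
from types import SimpleNamespace

# Stand-in registry: each entry exposes metadata.demo_available, mirroring
# the attribute access used in tests/conftest.py.
DATASETS = {
    "INSPIRE": {"metadata": SimpleNamespace(demo_available=False)},
    "SOME_DEMO_DATASET": {"metadata": SimpleNamespace(demo_available=True)},
}


def filter_demo_datasets(names: list[str]) -> list[str]:
    """Keep only datasets whose metadata advertises a demo recipe."""
    return [name for name in names if DATASETS[name]["metadata"].demo_available]


selected = filter_demo_datasets(list(DATASETS.keys()))
print(selected)
```

The filter reads `DATASETS[name]["metadata"]` directly, so it assumes every registered dataset has a metadata object.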

tests/test_registry_validation.py

Lines changed: 4 additions & 0 deletions
@@ -31,6 +31,10 @@ def test_all_datasets_have_commands():
     for name, dataset in DATASETS.items():
         commands = dataset.get("commands")
         assert commands is not None, f"Dataset {name} missing commands"
+        metadata = dataset.get("metadata")
+        if metadata is not None and not metadata.demo_available:
+            # Datasets that explicitly opt out of a demo recipe don't need build_demo.
+            continue
         assert "build_demo" in commands, f"Dataset {name} missing build_demo command"
 
 
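The relaxed check skips the `build_demo` requirement only when metadata exists and explicitly sets `demo_available` to false. A dataset with no metadata at all is still expected to provide `build_demo`. The sketch below restates that rule as a function returning the offending names instead of asserting. The function name and the `HYPOTHETICAL_*` entries are made up for illustration.

```python
from types import SimpleNamespace


def datasets_missing_build_demo(datasets: dict) -> list[str]:
    """Return names of datasets that should define build_demo but do not."""
    missing = []
    for name, dataset in datasets.items():
        commands = dataset.get("commands") or {}
        metadata = dataset.get("metadata")
        if metadata is not None and not metadata.demo_available:
            # Explicit opt-out: no demo recipe is required.
            continue
        if "build_demo" not in commands:
            missing.append(name)
    return missing


registry = {
    "INSPIRE": {
        "metadata": SimpleNamespace(demo_available=False),
        "commands": {"build_full": "..."},
    },
    "HYPOTHETICAL_OK": {
        "metadata": SimpleNamespace(demo_available=True),
        "commands": {"build_full": "...", "build_demo": "..."},
    },
    "HYPOTHETICAL_BAD": {
        "metadata": SimpleNamespace(demo_available=True),
        "commands": {"build_full": "..."},
    },
    "HYPOTHETICAL_NO_METADATA": {"commands": {"build_full": "..."}},
}

problems = datasets_missing_build_demo(registry)
print(problems)
```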
