Commit 7bb12b8

update README for mimic-iv
1 parent 51c6c4f commit 7bb12b8

2 files changed

Lines changed: 136 additions & 73 deletions

mimic-iv/README.md

Lines changed: 61 additions & 8 deletions
@@ -1,14 +1,67 @@
1-
# MIMIC-IV
1+
# MIMIC-IV Concepts
2 2

3-
## Brief introduction
3+
* [buildmimic](/mimic-iv/buildmimic) - Scripts to build MIMIC-IV in various relational database management systems (RDBMS); [postgres](/mimic-iv/buildmimic/postgres) is a popular open-source option
4+
* [concepts](/mimic-iv/concepts) - SQL scripts to extract data from MIMIC-IV including demographics, organ failure scores, severity of illness scores, durations of treatment, and so on. These concepts are written in the BigQuery dialect.
5+
* [concepts_postgres](/mimic-iv/concepts_postgres) - above concepts converted to the PostgreSQL dialect
6+
* [concepts_duckdb](/mimic-iv/concepts_duckdb) - above concepts converted to the DuckDB dialect
7+
* [notebooks](/mimic-iv/notebooks) - Jupyter notebooks to demonstrate how to use the data in MIMIC-IV
8+
* [mapping](/mimic-iv/mapping) - Mapping concepts within MIMIC-IV to standard ontologies
4 9

5-
The repository consists of a number of Structured Query Language (SQL) scripts which build the MIMIC-IV database in a number of systems and extract useful concepts from the raw data. Subfolders include:
10+
## Concepts
6 11

7-
* [buildmimic](/mimic-iv/buildmimic) - Scripts to build MIMIC-IV in various relational database management system (RDMS), in particular [postgres](/buildmimic/postgres) is a popular open source option
8-
* [concepts](/mimic-iv/concepts) - Useful views/summaries of the data in MIMIC-IV, e.g. demographics, organ failure scores, severity of illness scores, durations of treatment, easier to analyze views, etc. The paper above describes these in detail, and a README in the subfolder lists concepts generated.
12+
The [MIMIC-IV concepts](/mimic-iv/concepts) are written in an SQL syntax compatible with BigQuery.
13+
The BigQuery [physionet-data.mimiciv_derived](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1sphysionet-data!2smimiciv_derived) dataset contains the output of the SQL scripts in the concepts folder. These tables are generated using the code in the [latest release on GitHub](https://github.com/MIT-LCP/mimic-code/releases). Access to this dataset is available to MIMIC-IV approved users: see the [cloud instructions](https://mimic.mit.edu/docs/gettingstarted/cloud/).
9 14

10-
### Concepts
15+
* [List of the concept folders and their content](#concept-index)
16+
* [Generating the concept tables on BigQuery](#generating-the-concepts-on-bigquery)
17+
* [Generating the concept tables on PostgreSQL](#generating-the-concepts-on-postgresql)
11 18

12-
The [MIMIC-IV concepts](/mimic-iv/concepts) are written in an SQL syntax compatible with BigQuery. These scripts have been converted to PostgreSQL by a script. To generate the concepts in PostgreSQL, see the [MIMIC-IV postgresql concepts subfolder](/mimic-iv/concepts/postgres).
19+
## Generating the concepts
13 20

14-
Tables in the BigQuery `physionet-data.mimic_derived` dataset are generated using the concepts made available in this folder. These tables are generated using the code in the [latest release on GitHub](https://github.com/MIT-LCP/mimic-code/releases).
21+
**If you just want to use the data generated by the concepts scripts, you can access each table as `physionet-data.mimiciv_derived.*` on BigQuery.** See the [cloud instructions](https://mimic.mit.edu/docs/gettingstarted/cloud/) for access details.
22+
23+
These concepts assume the output schema is `mimiciv_derived`. If you would like a different schema, you will need to make a few edits to the scripts.
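
As a rough illustration of such an edit, the sketch below rewrites the schema prefix of one script with `sed`. It assumes the scripts reference the output schema literally as `mimiciv_derived.`; the function name `rename_schema` and the variable `NEW_SCHEMA` are hypothetical and not part of this repository.

```sh
# Hypothetical helper: copy a concept script with the output schema renamed.
# NEW_SCHEMA and rename_schema are illustrative names, not repository code.
NEW_SCHEMA="${NEW_SCHEMA:-my_schema}"

rename_schema() {
  # $1: input .sql file, $2: output .sql file
  # Replace every literal "mimiciv_derived." prefix with "<NEW_SCHEMA>."
  sed "s/mimiciv_derived\./${NEW_SCHEMA}./g" "$1" > "$2"
}
```

Writing to a separate output file keeps the original script untouched, so the result can be reviewed with `diff` before it is used.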
24+
25+
All concepts are originally written in the **BigQuery Standard SQL Dialect**. A Python package is used to convert these BigQuery scripts into other dialects such as PostgreSQL.
26+
To generate the concepts in PostgreSQL, see the [MIMIC-IV PostgreSQL concepts subfolder](/mimic-iv/concepts_postgres).
27+
[See below for how scripts in non-BigQuery dialects were generated](#transpile).
28+
29+
### BigQuery
30+
31+
Generating the concepts requires the [Google Cloud SDK](https://cloud.google.com/sdk) to be installed.
32+
A shell script, [make_concepts.sh](/mimic-iv/concepts/make_concepts.sh), is provided which iterates over each folder and creates a table with the same name as the concept file. Concept names have been chosen to avoid collisions.
33+
34+
Generating a single concept can be done by calling the Google Cloud SDK as follows:
35+
36+
```sh
37+
bq query --use_legacy_sql=False --replace --destination_table=my_bigquery_dataset.age < demographics/age.sql
38+
```
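
The folder iteration described above can be sketched roughly as follows. This is not the contents of `make_concepts.sh`: the function name `build_concepts` and the variables `TARGET_DATASET` and `DRY_RUN` are assumptions made for illustration. By default the sketch only prints the `bq query` commands it would run.

```sh
# Hypothetical sketch: one `bq query` call per concept file.
# TARGET_DATASET, DRY_RUN and build_concepts are assumed names.
TARGET_DATASET="${TARGET_DATASET:-my_bigquery_dataset}"
DRY_RUN="${DRY_RUN:-1}"

build_concepts() {
  # $1: root folder containing one subfolder per concept category
  for sql_file in "$1"/*/*.sql; do
    [ -e "$sql_file" ] || continue        # skip when the glob matches nothing
    table=$(basename "$sql_file" .sql)    # table name = file name without .sql
    if [ "$DRY_RUN" = "1" ]; then
      # Print the command instead of running it
      echo "bq query --use_legacy_sql=False --replace --destination_table=${TARGET_DATASET}.${table} < ${sql_file}"
    else
      bq query --use_legacy_sql=False --replace \
        --destination_table="${TARGET_DATASET}.${table}" < "$sql_file"
    fi
  done
}
```

The loop visits files in glob order and does not account for concepts that depend on other concept tables, so a real run may need an explicit ordering.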
39+
40+
### PostgreSQL
41+
42+
You should already have created a database and loaded the MIMIC-IV data into it using the [build scripts](/mimic-iv/buildmimic).
43+
The [concepts_postgres](/mimic-iv/concepts_postgres) folder contains the concepts in a PostgreSQL-compatible dialect.
44+
45+
### DuckDB
46+
47+
You should already have created a database and loaded the MIMIC-IV data into it using the [build scripts](/mimic-iv/buildmimic).
48+
The [concepts_duckdb](/mimic-iv/concepts_duckdb) folder contains the concepts in a DuckDB-compatible dialect.
49+
50+
## Transpile
51+
52+
The Python package [sqlglot](https://sqlglot.com/) is used to convert the concepts from BigQuery syntax to other SQL dialects ("transpile").
53+
This package parses the SQL into an abstract syntax tree (AST), after which it can be re-written into a specific dialect.
54+
Not all functions are supported by sqlglot, so a helper package was written which adds support for the missing functions.
55+
Most of this process is done in the [transpile.py](/src/mimic_utils/transpile.py) file.
56+
57+
An entrypoint is provided for convenience. To transpile a single file, run:
58+
59+
```sh
60+
mimic_utils convert_file mimic-iv/concepts/demographics/age.sql age.sql --destination_dialect duckdb
61+
```
62+
63+
To transpile all files in a folder, run:
64+
65+
```sh
66+
mimic_utils convert_folder mimic-iv/concepts mimic-iv/concepts_duckdb --destination_dialect duckdb
67+
```
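
To regenerate several dialect folders in one pass, the `convert_folder` call above could be wrapped in a small loop. The sketch below is an assumption-laden illustration: `transpile_all`, `DIALECTS` and `DRY_RUN` are hypothetical names, and `postgres` as a dialect value is inferred from the `concepts_postgres` folder name rather than confirmed here. By default it only prints the commands.

```sh
# Hypothetical wrapper around the `mimic_utils convert_folder` entrypoint.
# DIALECTS, DRY_RUN and transpile_all are assumed names; "postgres" as a
# dialect value is an assumption based on the concepts_postgres folder.
DIALECTS="${DIALECTS:-postgres duckdb}"
DRY_RUN="${DRY_RUN:-1}"

transpile_all() {
  # $1: folder holding the BigQuery concepts, e.g. mimic-iv/concepts
  for dialect in $DIALECTS; do
    dest="${1}_${dialect}"                # e.g. mimic-iv/concepts_duckdb
    if [ "$DRY_RUN" = "1" ]; then
      # Print the command instead of running it
      echo "mimic_utils convert_folder $1 $dest --destination_dialect $dialect"
    else
      mimic_utils convert_folder "$1" "$dest" --destination_dialect "$dialect"
    fi
  done
}
```

Calling `transpile_all mimic-iv/concepts` with `DRY_RUN=0` would run one conversion per dialect listed in `DIALECTS`.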

mimic-iv/concepts/README.md

Lines changed: 75 additions & 65 deletions
@@ -1,67 +1,77 @@
1 1
# MIMIC-IV Concepts
2 2

3-
Highlights
4-
5-
* This folder contains SQL scripts to generate useful abstractions of raw MIMIC-IV data ("concepts").
6-
* The BigQuery dataset [mimiciv_derived](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1sphysionet-data!2smimiciv_derived) contains the data generated by this code
7-
* A Python package is used to translate BigQuery dialect code into other SQL dialects
8-
9-
## Organization
10-
11-
Concepts are categorized into folders if possible, otherwise they remain in the top-level directory. All concepts are originally written in the **BigQuery Standard SQL Dialect**. A Python package is used to convert these BigQuery scripts into other dialects such as PostgreSQL.
12-
13-
Currently supported dialects include:
14-
15-
* [bigquery](/mimic-iv/concepts/bigquery)
16-
* [postgres](/mimic-iv/concepts/postgres)
17-
* [duckdb](/mimic-iv/concepts/duckdb)
18-
19-
[See below for how scripts in non-bigquery dialects were generated](#transpile).
20-
21-
The concepts are organized into individual SQL scripts, with each script generating a table. Access to this dataset is available to MIMIC-IV approved users: see the [cloud instructions](https://mimic.mit.edu/docs/gettingstarted/cloud/) on how to access MIMIC-IV on BigQuery (which includes the derived concepts).
22-
23-
* [List of the concept folders and their content](#concept-index)
24-
* [Generating the concept tables on BigQuery](#generating-the-concepts-on-bigquery)
25-
* [Generating the concept tables on PostgreSQL](#generating-the-concepts-on-postgresql)
26-
27-
## Concept Index
28-
29-
## Generating the concepts on BigQuery
30-
31-
**If you just want to use the data here, you can access each table as `physionet-data.mimiciv_derived.*`.**
32-
33-
Generating the concepts requires the [Google Cloud SDK](https://cloud.google.com/sdk) to be installed.
34-
A shell script, [make_concepts.sh](/mimic-iv/concepts/make_concepts.sh), is provided which iterates over each folder and creates a table with the same name as the concept file. Concept names have been chosen to avoid collisions.
35-
36-
Generating a single concept can be done by calling the Google Cloud SDK as follows:
37-
38-
```sh
39-
bq query --use_legacy_sql=False --replace --destination_table=my_bigquery_dataset.age < demographics/age.sql
40-
```
41-
42-
In general the concepts may be generated in any order, except for the *first_day_sofa* and *kdigo_stages* tables, which depend on other tables.
43-
44-
## Generating the concepts on PostgreSQL
45-
46-
You should have already created a database with the MIMIC-IV data loaded in using the build scripts.
47-
The [postgres](/mimic-iv/concepts/postgres) folder contains concepts in a PostgreSQL compatible dialect.
48-
These concepts assume the output schema is `mimiciv_derived`. If you would like a different schema, you will need to make a few edits to the scripts.
49-
50-
## Transpile
51-
52-
The Python package [sqlglot]() is used to convert from one SQL dialect to another ("transpile").
53-
This package parses the SQL into the abstract syntax tree (AST), after which is can be re-written into a specific dialect.
54-
Not all functions are supported by sqlglot, so a helper package was written which adds support for the missing functions.
55-
Most code is provided in the [transpile.py](/src/mimic_utils/transpile.py) file.
56-
57-
An entrypoint is provided for convenience. To transpile a single file, run:
58-
59-
```sh
60-
mimic_utils convert_file mimic-iv/concepts/demographics/age.sql age.sql --destination_dialect duckdb
61-
```
62-
63-
To transpile all files in a folder, run:
64-
65-
```sh
66-
mimic_utils convert_folder mimic-iv/concepts mimic-iv/concepts_duckdb --destination_dialect duckdb
67-
```
3+
Concepts in this folder:
4+
5+
├── comorbidity
6+
│ └── charlson.sql
7+
├── demographics
8+
│ ├── age.sql
9+
│ ├── icustay_detail.sql
10+
│ ├── icustay_hourly.sql
11+
│ ├── icustay_times.sql
12+
│ └── weight_durations.sql
13+
├── firstday
14+
│ ├── first_day_bg.sql
15+
│ ├── first_day_bg_art.sql
16+
│ ├── first_day_gcs.sql
17+
│ ├── first_day_height.sql
18+
│ ├── first_day_lab.sql
19+
│ ├── first_day_rrt.sql
20+
│ ├── first_day_sofa.sql
21+
│ ├── first_day_urine_output.sql
22+
│ ├── first_day_vitalsign.sql
23+
│ └── first_day_weight.sql
24+
├── measurement
25+
│ ├── bg.sql
26+
│ ├── blood_differential.sql
27+
│ ├── cardiac_marker.sql
28+
│ ├── chemistry.sql
29+
│ ├── coagulation.sql
30+
│ ├── complete_blood_count.sql
31+
│ ├── creatinine_baseline.sql
32+
│ ├── enzyme.sql
33+
│ ├── gcs.sql
34+
│ ├── height.sql
35+
│ ├── icp.sql
36+
│ ├── inflammation.sql
37+
│ ├── oxygen_delivery.sql
38+
│ ├── rhythm.sql
39+
│ ├── urine_output.sql
40+
│ ├── urine_output_rate.sql
41+
│ ├── ventilator_setting.sql
42+
│ └── vitalsign.sql
43+
├── medication
44+
│ ├── acei.sql
45+
│ ├── antibiotic.sql
46+
│ ├── dobutamine.sql
47+
│ ├── dopamine.sql
48+
│ ├── epinephrine.sql
49+
│ ├── milrinone.sql
50+
│ ├── neuroblock.sql
51+
│ ├── norepinephrine.sql
52+
│ ├── norepinephrine_equivalent_dose.sql
53+
│ ├── nsaid.sql
54+
│ ├── phenylephrine.sql
55+
│ ├── vasoactive_agent.sql
56+
│ └── vasopressin.sql
57+
├── organfailure
58+
│ ├── kdigo_creatinine.sql
59+
│ ├── kdigo_stages.sql
60+
│ ├── kdigo_uo.sql
61+
│ └── meld.sql
62+
├── score
63+
│ ├── apsiii.sql
64+
│ ├── lods.sql
65+
│ ├── oasis.sql
66+
│ ├── sapsii.sql
67+
│ ├── sirs.sql
68+
│ └── sofa.sql
69+
├── sepsis
70+
│ ├── sepsis3.sql
71+
│ └── suspicion_of_infection.sql
72+
└── treatment
73+
  ├── code_status.sql
74+
  ├── crrt.sql
75+
  ├── invasive_line.sql
76+
  ├── rrt.sql
77+
  └── ventilation.sql
