| 1 | +# Examples |
| 2 | + |
| 3 | +This directory contains worked examples that demonstrate the [RWD-Lineage Data Standard](../documents/RWD-Lineage_Data_Standard_Specification.md) — a machine-readable CDISC data exchange format for capturing the lineage of Real-World Data (RWD) as it is transformed into SDTM datasets. |
| 4 | + |
| 5 | +Each example provides a complete, self-contained package: source EHR data, target SDTM datasets, a Define-XML 2.1 file with the `rwdl` namespace extension, and a companion `rwd-lineage.xml` file that traces every cell in the SDTM output back to its origin in the source data. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Quick Start |
| 10 | + |
| 11 | +``` |
| 12 | +examples/ |
| 13 | +├── README.md ← You are here |
| 14 | +├── example1/ ← CE domain: diagnoses, vitals, and clinical notes |
| 15 | +│ ├── README.md |
| 16 | +│ ├── Example1.xlsx |
| 17 | +│ └── data/ |
| 18 | +│ ├── define/ |
| 19 | +│ │ ├── define.xml |
| 20 | +│ │ └── rwd-lineage.xml |
| 21 | +│ ├── sdtm/ |
| 22 | +│ │ └── ce.csv |
| 23 | +│ └── source/ |
| 24 | +│ ├── pt_dx.csv |
| 25 | +│ ├── vitals.csv |
| 26 | +│ └── notes.csv |
| 27 | +└── example2/ ← AE + LB domains: lab results and adverse events |
| 28 | + ├── README.md |
| 29 | + ├── Example2.xlsx |
| 30 | + └── data/ |
| 31 | + ├── define/ |
| 32 | + │ ├── define.xml |
| 33 | + │ └── rwd-lineage.xml |
| 34 | + ├── sdtm/ |
| 35 | + │ ├── AE.csv |
| 36 | + │ └── LB.csv |
| 37 | + └── source/ |
| 38 | + └── LabResults.csv |
| 39 | +``` |
| 40 | + |
| 41 | +### Validating an example |
| 42 | + |
| 43 | +From the repository root: |
| 44 | + |
| 45 | +```bash |
| 46 | +# Validate the RWD-Lineage XML |
| 47 | +python3 tools/validate.py rwd-lineage examples/example1/data/define/rwd-lineage.xml |
| 48 | + |
| 49 | +# Validate the Define-XML (requires lxml) |
| 50 | +python3 tools/validate.py define-xml examples/example1/data/define/define.xml |
| 51 | + |
| 52 | +# Check that every SDTM cell has lineage coverage |
| 53 | +python3 tools/validate.py coverage examples/example1/data/sdtm examples/example1/data/define/rwd-lineage.xml |
| 54 | +``` |
| 55 | + |
| 56 | +See the [repository README](../README.md) for full validation instructions and requirements. |
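To make the idea of "lineage coverage" concrete, here is a minimal Python sketch of what such a check could look like. It is not the implementation in `tools/validate.py`. The function names are hypothetical, and it assumes that `RowIndex` is 1-based and counts data rows only (header excluded), which should be confirmed against the specification.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"rwdl": "http://www.cdisc.org/ns/rwd-lineage/v1"}


def target_cells(lineage_xml: str, dataset_name: str) -> set[tuple[int, str]]:
    """Collect (RowIndex, ColumnName) for <Target> coordinates in one dataset."""
    root = ET.fromstring(lineage_xml)
    cells = set()
    for coord in root.iterfind(".//rwdl:Target/rwdl:Coordinate", NS):
        uri = coord.findtext("rwdl:URI", default="", namespaces=NS)
        # Match on the file name at the end of the URI (illustrative rule only).
        if uri != dataset_name and not uri.endswith("/" + dataset_name):
            continue
        row = int(coord.findtext("rwdl:RowIndex", namespaces=NS))
        column = coord.findtext("rwdl:ColumnName", namespaces=NS)
        cells.add((row, column))
    return cells


def uncovered_cells(
    sdtm_csv: str, lineage_xml: str, dataset_name: str
) -> list[tuple[int, str]]:
    """Return SDTM cells that no <Target> coordinate points at.

    ASSUMPTION: RowIndex is 1-based and excludes the CSV header row.
    """
    covered = target_cells(lineage_xml, dataset_name)
    reader = csv.DictReader(io.StringIO(sdtm_csv))
    missing = []
    for row_index, record in enumerate(reader, start=1):
        for column in record:
            if (row_index, column) not in covered:
                missing.append((row_index, column))
    return missing
```

An empty list from `uncovered_cells` would mean every cell in that dataset has at least one lineage entry under the stated indexing assumption.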
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +## Example Summaries |
| 61 | + |
| 62 | +### Example 1 — Clinical Events (CE) from EHR Diagnoses, Vitals, and Notes |
| 63 | + |
| 64 | +**SDTM domain:** CE (Clinical Events) |
| 65 | +**Source tables:** `pt_dx` (ICD-10 diagnoses), `vitals` (blood pressure, BMI), `notes` (free-text clinical notes) |
| 66 | +**Subjects:** 2 (001, 002) × 2 prespecified conditions = 4 CE records |
| 67 | +**Lineage entries:** 20 `<MapID>` elements |
| 68 | + |
| 69 | +This example models two prespecified clinical events — **hypertension** and **acute myocardial infarction** — and shows how each `CEOCCUR` determination draws on multiple evidence sources: |
| 70 | + |
| 71 | +| Transformation Type | Count | Description | |
| 72 | +|---------------------|-------|-------------| |
| 73 | +| `DirectMap` | 7 | One-to-one mapping of a source value to a target field (e.g., ICD-10 code → `CEOCCUR` evidence) | |
| 74 | +| `AfterIndexDate` | 5 | Temporal filter ensuring the source event falls within the study follow-up period | |
| 75 | +| `NLPExtraction` | 5 | Structured data extracted from free-text clinical notes via NLP | |
| 76 | +| `FilterByValue` | 3 | Conditional inclusion based on a source value (e.g., blood pressure ≥ threshold) | |
| 77 | + |
| 78 | +**Key concepts illustrated:** |
| 79 | +- Multi-source evidence: a single SDTM cell (`CEOCCUR`) can trace to diagnosis codes, vital-sign measurements, *and* NLP-extracted findings simultaneously. |
| 80 | +- Prespecified event algorithms: the lineage captures each step of a composite clinical algorithm (diagnosis code check → temporal filter → vitals threshold → NLP confirmation). |
| 81 | +- NLP lineage: free-text clinical notes are treated as a legitimate source, with the `NLPExtraction` transformation type documenting the extraction. |
| 82 | + |
| 83 | +→ See [`example1/README.md`](example1/README.md) for the full algorithm definitions. |
| 84 | + |
| 85 | +--- |
| 86 | + |
| 87 | +### Example 2 — Laboratory Results (LB) and Adverse Events (AE) from EHR Labs |
| 88 | + |
| 89 | +**SDTM domains:** LB (Laboratory Test Results), AE (Adverse Events) |
| 90 | +**Source table:** `LabResults` (LOINC-coded lab results with raw values in original units) |
| 91 | +**Subjects:** 2 (001, 002) × 3 liver-enzyme tests × 2 visits = 12 LB records + 1 AE record |
| 92 | +**Lineage entries:** 99 `<MapID>` elements |
| 93 | + |
| 94 | +This example traces LOINC-coded EHR lab data through unit conversion into the SDTM LB domain, then derives an adverse event (hepatic enzyme elevation) in the AE domain: |
| 95 | + |
| 96 | +| Transformation Type | Count | Description | |
| 97 | +|--------------------------|-------|-------------| |
| 98 | +| `DirectMap` | 39 | One-to-one mappings (LOINC → `LBTESTCD`, visit date → `LBDTC`, patient ID → `USUBJID`, etc.) | |
| 99 | +| `LabValueParsing` | 24 | Parsing composite result strings (e.g., `"0.3507 µkat/L"`) into numeric value and unit components | |
| 100 | +| `UnitConversion` | 24 | Converting original units (µkat/L) to standard units (U/L), with the results stored in `LBSTRES`/`LBSTRESU` | |
| 101 | +| `ElevatedLiverEnzyme` | 12 | Algorithmic derivation identifying elevated ALT/AST/ALP to produce the AE record | |
| 102 | + |
| 103 | +**Key concepts illustrated:** |
| 104 | +- Multi-step transformations: a single lab result passes through parsing → conversion → standardization, each step recorded as a separate lineage entry. |
| 105 | +- Cross-domain derivation: the AE domain record is derived from the LB domain, which is itself derived from source EHR data — the lineage captures both hops. |
| 106 | +- High coverage density: 99 lineage entries across 13 SDTM records demonstrate cell-level traceability at scale, including every reference-range variable (`LBSTNRLO`, `LBSTNRHI`, `LBNRIND`). |
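The parsing and conversion steps can be sketched in a few lines of Python. This is an illustration only, not the code used to build the example: the function names are hypothetical, and the parser assumes the value and unit are separated by whitespace, as in `"0.3507 µkat/L"`. The factor of 60 follows from the unit definitions (1 katal = 1 mol/s, 1 U = 1 µmol/min).

```python
# 1 µkat/L = 1 µmol/s per litre = 60 µmol/min per litre = 60 U/L.
UKAT_PER_L_TO_U_PER_L = 60.0

# Accept both the micro sign (U+00B5) and Greek mu (U+03BC) spellings.
MICROKATAL_UNITS = {"\u00b5kat/L", "\u03bckat/L"}


def parse_lab_value(raw: str) -> tuple[float, str]:
    """Split a composite result string into (numeric value, unit).

    ASSUMPTION: value and unit are separated by whitespace.
    """
    parts = raw.split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"expected '<value> <unit>', got {raw!r}")
    value_text, unit = parts
    return float(value_text), unit.strip()


def to_u_per_l(value: float, unit: str) -> float:
    """Convert an enzyme activity concentration to U/L."""
    if unit == "U/L":
        return value
    if unit in MICROKATAL_UNITS:
        return value * UKAT_PER_L_TO_U_PER_L
    raise ValueError(f"unsupported unit: {unit!r}")


value, unit = parse_lab_value("0.3507 \u00b5kat/L")
converted = to_u_per_l(value, unit)  # roughly 21.04 U/L
```

In the lineage model, each of these two steps would be recorded as its own entry (`LabValueParsing`, then `UnitConversion`) rather than as one combined transformation.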
| 107 | + |
| 108 | +→ See [`example2/README.md`](example2/README.md) for the full algorithm definitions. |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +## Anatomy of an Example |
| 113 | + |
| 114 | +Every example follows the same internal structure: |
| 115 | + |
| 116 | +``` |
| 117 | +exampleN/ |
| 118 | +├── README.md # Scenario description, algorithm definitions, file inventory |
| 119 | +├── ExampleN.xlsx # Human-readable workbook with all tables and lineage in spreadsheet form |
| 120 | +└── data/ |
| 121 | + ├── define/ |
| 122 | + │ ├── define.xml # Define-XML 2.1 with rwdl namespace extension |
| 123 | + │ └── rwd-lineage.xml # RWD-Lineage XML: the cell-level lineage map |
| 124 | + ├── sdtm/ |
| 125 | + │ └── *.csv # Target SDTM domain datasets |
| 126 | + └── source/ |
| 127 | + └── *.csv # Source EHR/RWD tables |
| 128 | +``` |
| 129 | + |
| 130 | +### `define.xml` |
| 131 | + |
| 132 | +A standard [CDISC Define-XML 2.1](https://www.cdisc.org/standards/data-exchange/define-xml) file with one addition — the `rwdl` namespace extension that references the companion lineage file: |
| 133 | + |
| 134 | +```xml |
| 135 | +<ODM xmlns:rwdl="http://www.cdisc.org/ns/rwd-lineage/v1" ...> |
| 136 | + <Study> |
| 137 | + <MetaDataVersion> |
| 138 | + <rwdl:lineage> |
| 139 | + <rwdl:ref leafID="LF.RWDLINEAGE">rwd-lineage.xml</rwdl:ref> |
| 140 | + </rwdl:lineage> |
| 141 | + <!-- Standard ItemGroupDef / ItemDef elements follow --> |
| 142 | + </MetaDataVersion> |
| 143 | + </Study> |
| 144 | +</ODM> |
| 145 | +``` |
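A consumer could locate the companion lineage file by reading this extension. The sketch below uses Python's standard `xml.etree.ElementTree`. The function name `lineage_refs` is hypothetical, and the embedded document is a trimmed stand-in for a real `define.xml` (it omits the ODM default namespace and the attributes elided above).

```python
import xml.etree.ElementTree as ET

RWDL_NS = {"rwdl": "http://www.cdisc.org/ns/rwd-lineage/v1"}

# Trimmed stand-in for define.xml; real files carry many more attributes.
DEFINE_SNIPPET = """\
<ODM xmlns:rwdl="http://www.cdisc.org/ns/rwd-lineage/v1">
  <Study>
    <MetaDataVersion>
      <rwdl:lineage>
        <rwdl:ref leafID="LF.RWDLINEAGE">rwd-lineage.xml</rwdl:ref>
      </rwdl:lineage>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def lineage_refs(define_xml: str) -> list[tuple[str, str]]:
    """Return (leafID, file name) for each rwdl:ref in a Define-XML document."""
    root = ET.fromstring(define_xml)
    return [
        (ref.get("leafID"), (ref.text or "").strip())
        for ref in root.iterfind(".//rwdl:lineage/rwdl:ref", RWDL_NS)
    ]


refs = lineage_refs(DEFINE_SNIPPET)
```

Because the search path is namespace-qualified and uses a descendant search, it should not depend on where `rwdl:lineage` sits relative to the ODM root.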
| 146 | + |
| 147 | +### `rwd-lineage.xml` |
| 148 | + |
| 149 | +The core deliverable. Each `<MapID>` element represents one source-to-target cell mapping: |
| 150 | + |
| 151 | +```xml |
| 152 | +<RWDLineage xmlns="http://www.cdisc.org/ns/rwd-lineage/v1" ...> |
| 153 | + <MapID uuid="35060134-fc2f-4cdf-9abe-491924739bd5"> |
| 154 | + <Transformation type="DirectMap">Direct Map</Transformation> |
| 155 | + <Source> |
| 156 | + <Coordinate storage="Filesystem" structure="Tabular"> |
| 157 | + <URI>...source/pt_dx.csv</URI> |
| 158 | + <RowIndex>4</RowIndex> |
| 159 | + <ColumnName>ICD10</ColumnName> |
| 160 | + </Coordinate> |
| 161 | + </Source> |
| 162 | + <Target> |
| 163 | + <Coordinate storage="Filesystem" structure="Tabular"> |
| 164 | + <URI>...sdtm/ce.csv</URI> |
| 165 | + <RowIndex>2</RowIndex> |
| 166 | + <ColumnName>CEOCCUR</ColumnName> |
| 167 | + </Coordinate> |
| 168 | + </Target> |
| 169 | + </MapID> |
| 170 | + <!-- ... --> |
| 171 | +</RWDLineage> |
| 172 | +``` |
| 173 | + |
| 174 | +Key attributes are documented in the [RWD-Lineage Data Standard Specification](../documents/RWD-Lineage_Data_Standard_Specification.md). |
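As a rough illustration of how the excerpt above could be consumed, the following Python sketch reads each `<MapID>` into a small record. The class and function names are hypothetical, and the code assumes exactly one `<Coordinate>` under each `<Source>` and `<Target>`, as in the excerpt. The specification may allow other shapes. The truncated URIs are copied as they appear above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

NS = {"rwdl": "http://www.cdisc.org/ns/rwd-lineage/v1"}

SAMPLE = """\
<RWDLineage xmlns="http://www.cdisc.org/ns/rwd-lineage/v1">
  <MapID uuid="35060134-fc2f-4cdf-9abe-491924739bd5">
    <Transformation type="DirectMap">Direct Map</Transformation>
    <Source>
      <Coordinate storage="Filesystem" structure="Tabular">
        <URI>...source/pt_dx.csv</URI>
        <RowIndex>4</RowIndex>
        <ColumnName>ICD10</ColumnName>
      </Coordinate>
    </Source>
    <Target>
      <Coordinate storage="Filesystem" structure="Tabular">
        <URI>...sdtm/ce.csv</URI>
        <RowIndex>2</RowIndex>
        <ColumnName>CEOCCUR</ColumnName>
      </Coordinate>
    </Target>
  </MapID>
</RWDLineage>
"""


@dataclass(frozen=True)
class Cell:
    uri: str
    row: int
    column: str


@dataclass(frozen=True)
class Mapping:
    uuid: str
    transformation: str
    source: Cell
    target: Cell


def _cell(parent: ET.Element) -> Cell:
    # ASSUMPTION: one <Coordinate> per <Source>/<Target>, as in the excerpt.
    coord = parent.find("rwdl:Coordinate", NS)
    return Cell(
        uri=coord.findtext("rwdl:URI", namespaces=NS),
        row=int(coord.findtext("rwdl:RowIndex", namespaces=NS)),
        column=coord.findtext("rwdl:ColumnName", namespaces=NS),
    )


def read_mappings(xml_text: str) -> list[Mapping]:
    """Parse every <MapID> under the root into a Mapping record."""
    root = ET.fromstring(xml_text)
    return [
        Mapping(
            uuid=el.get("uuid"),
            transformation=el.find("rwdl:Transformation", NS).get("type"),
            source=_cell(el.find("rwdl:Source", NS)),
            target=_cell(el.find("rwdl:Target", NS)),
        )
        for el in root.iterfind("rwdl:MapID", NS)
    ]


mappings = read_mappings(SAMPLE)
```

Read this way, the single mapping says that the `CEOCCUR` cell in row 2 of `ce.csv` traces back to the `ICD10` cell in row 4 of `pt_dx.csv` through a `DirectMap` transformation.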
| 175 | + |
| 176 | +### `ExampleN.xlsx` |
| 177 | + |
| 178 | +A companion Excel workbook containing all source tables, SDTM output tables, and the lineage mappings in tabular form. This is provided for human review and is not a normative artifact — the XML files are the machine-readable standard. |
| 179 | + |
| 180 | +--- |
| 181 | + |
| 182 | +## Transformation Types Used Across Examples |
| 183 | + |
| 184 | +| Type | Example 1 | Example 2 | Description | |
| 185 | +|------|-----------|-----------|-------------| |
| 186 | +| `DirectMap` | ✓ | ✓ | One-to-one value copy from source to target | |
| 187 | +| `AfterIndexDate` | ✓ | | Temporal filter relative to a study index date | |
| 188 | +| `FilterByValue` | ✓ | | Conditional inclusion based on source data value | |
| 189 | +| `NLPExtraction` | ✓ | | Value extracted from unstructured text via NLP | |
| 190 | +| `LabValueParsing` | | ✓ | Numeric/unit parsing from composite lab result strings | |
| 191 | +| `UnitConversion` | | ✓ | Conversion between measurement unit systems | |
| 192 | +| `ElevatedLiverEnzyme` | | ✓ | Algorithmic derivation of adverse events from lab data | |
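One way to check a lineage file against the per-type counts reported in the example summaries is to tally the `type` attribute of every `<Transformation>` element. The sketch below is illustrative: `transformation_counts` is a hypothetical helper, not part of `tools/validate.py`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"rwdl": "http://www.cdisc.org/ns/rwd-lineage/v1"}


def transformation_counts(lineage_xml: str) -> Counter:
    """Count <Transformation> elements by their `type` attribute."""
    root = ET.fromstring(lineage_xml)
    return Counter(
        el.get("type") for el in root.iterfind(".//rwdl:Transformation", NS)
    )


# Tiny stand-in document with three mappings; real files have many more.
SAMPLE = """\
<RWDLineage xmlns="http://www.cdisc.org/ns/rwd-lineage/v1">
  <MapID uuid="a"><Transformation type="DirectMap">Direct Map</Transformation></MapID>
  <MapID uuid="b"><Transformation type="DirectMap">Direct Map</Transformation></MapID>
  <MapID uuid="c"><Transformation type="UnitConversion">Unit Conversion</Transformation></MapID>
</RWDLineage>
"""

counts = transformation_counts(SAMPLE)
total = sum(counts.values())
```

Run against a real example file, the total should equal the number of `<MapID>` elements listed in that example's summary, provided each mapping carries exactly one `<Transformation>`.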
| 193 | + |
| 194 | +--- |
| 195 | + |
| 196 | +## Contributing a New Example |
| 197 | + |
| 198 | +New examples are welcome. To maintain consistency: |
| 199 | + |
| 200 | +1. Create a directory named `exampleN/` following the structure above. |
| 201 | +2. Include a `README.md` with the scenario description, algorithm, and file inventory. |
| 202 | +3. Provide both CSV data files and a companion `.xlsx` workbook. |
| 203 | +4. Ensure the `rwd-lineage.xml` passes validation: |
| 204 | + ```bash |
| 205 | + python3 tools/validate.py rwd-lineage examples/exampleN/data/define/rwd-lineage.xml |
| 206 | + ``` |
| 207 | +5. Ensure full lineage coverage of all SDTM cells: |
| 208 | + ```bash |
| 209 | + python3 tools/validate.py coverage examples/exampleN/data/sdtm examples/exampleN/data/define/rwd-lineage.xml |
| 210 | + ``` |
| 211 | + |
| 212 | +See [CONTRIBUTING.md](../CONTRIBUTING.md) for general contribution guidelines. |