# Example 2: Laboratory Results and Adverse Events from EHR Lab Data

This example demonstrates RWD lineage traceability for two SDTM domains, **LB (Laboratory Test Results)** and **AE (Adverse Events)**, both derived from a single EHR source table of LOINC-coded lab results. It showcases multi-step transformations (value parsing, unit conversion) and cross-domain derivation (lab abnormalities triggering an adverse event record).

## Scenario

A study collects liver-enzyme lab panels (ALT, AST, ALP) for two subjects across two visits each. Raw EHR lab results arrive as composite strings that combine value and unit (e.g., `"0.3507 µkat/L"`). These strings are parsed, converted to standard units, and mapped into the SDTM LB domain. When a subject's results exceed the upper limit of the normal range, a hepatic enzyme elevation adverse event is derived in the AE domain.

### Source Data

| Table | Columns | Records | Description |
|-------|---------|---------|-------------|
| `LabResults.csv` | `PATID`, `LOINC Code`, `Lab Test`, `Visit Date`, `Lab Result` | 12 | Raw LOINC-coded lab results from the EHR, with results as composite value+unit strings |

### Target Data

| Table | Columns | Records | Description |
|-------|---------|---------|-------------|
| `LB.csv` | `USUBJID`, `LBSEQ`, `LBTESTCD`, `LBTEST`, `LBDTC`, `LBORRES`, `LBORRESU`, `LBSTRES`, `LBSTRESU`, `LBSTNRLO`, `LBSTNRHI`, `LBNRIND` | 12 | SDTM LB domain: 2 subjects × 3 tests × 2 visits |
| `AE.csv` | `USUBJID`, `AESEQ`, `AETERM`, `AEDECOD`, `AELLTCD`, `AESER`, `AEREL`, `AESTDTC` | 1 | SDTM AE domain: 1 hepatic enzyme elevation event for subject 002 |

### Lab Tests Covered

| LOINC Code | Test Name | SDTM `LBTESTCD` |
|------------|-----------|-----------------|
| 1742-6 | Alanine Aminotransferase | ALT |
| 1920-8 | Aspartate Aminotransferase | AST |
| 1775-6 | Alkaline Phosphatase | ALP |

---

## Algorithms

### Laboratory Test Results (LB)

Lab results from the EHR are mapped to the SDTM LB domain through the following steps:

1. **Direct mapping**: LOINC codes and lab test names are mapped to `LBTESTCD`/`LBTEST`; visit dates are mapped to `LBDTC`
2. **Lab value parsing**: Raw result strings (e.g., `"0.3507 µkat/L"`) are parsed to extract the numeric value into `LBORRES` and the unit into `LBORRESU`
3. **Unit conversion**: Original units (µkat/L) are converted to standard units (U/L), with results stored in `LBSTRES`/`LBSTRESU`
4. **Normal range evaluation**: Standard results are compared against reference ranges (`LBSTNRLO`, `LBSTNRHI`) to determine the normal-range indicator (`LBNRIND`)
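
The four LB steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code that produced `LB.csv`: the regular expression, the helper names (`parse_lab_result`, `to_standard_units`, `normal_range_indicator`, `map_lb_row`), and passing the reference limits in as arguments (the README does not state them) are all hypothetical. The one fixed fact it relies on is the unit identity 1 µkat/L = 60 U/L, since 1 U = 1 µmol/min and 1 µkat = 1 µmol/s.

```python
import re
import unicodedata

# LOINC code -> LBTESTCD, taken from the "Lab Tests Covered" table
LOINC_TO_TESTCD = {"1742-6": "ALT", "1920-8": "AST", "1775-6": "ALP"}

# 1 U = 1 µmol/min and 1 µkat = 1 µmol/s, so 1 µkat/L = 60 U/L
UKAT_PER_L_TO_U_PER_L = 60.0

# Assumed shape of a raw result: a number, optional whitespace, then a unit token
RESULT_PATTERN = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(\S+)\s*$")


def _normalize_unit(unit: str) -> str:
    # NFKC folds MICRO SIGN (U+00B5) into GREEK SMALL LETTER MU (U+03BC)
    return unicodedata.normalize("NFKC", unit)


def parse_lab_result(raw: str) -> tuple[str, str]:
    """Split a composite result string into (LBORRES, LBORRESU)."""
    match = RESULT_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unparseable lab result: {raw!r}")
    return match.group(1), match.group(2)


def to_standard_units(value: float, unit: str) -> tuple[float, str]:
    """Convert an original result to (LBSTRES, LBSTRESU) expressed in U/L."""
    normalized = _normalize_unit(unit)
    if normalized == _normalize_unit("µkat/L"):
        return value * UKAT_PER_L_TO_U_PER_L, "U/L"
    if normalized == "U/L":
        return value, "U/L"
    raise ValueError(f"unsupported unit: {unit!r}")


def normal_range_indicator(result: float, low: float, high: float) -> str:
    """Derive LBNRIND by comparing a standard result with LBSTNRLO/LBSTNRHI."""
    if result < low:
        return "LOW"
    if result > high:
        return "HIGH"
    return "NORMAL"


def map_lb_row(row: dict[str, str], low: float, high: float) -> dict:
    """Map one LabResults.csv row to an LB record (LBSEQ assignment omitted)."""
    orres, orresu = parse_lab_result(row["Lab Result"])
    stres, stresu = to_standard_units(float(orres), orresu)
    return {
        "USUBJID": row["PATID"],
        "LBTESTCD": LOINC_TO_TESTCD[row["LOINC Code"]],
        "LBTEST": row["Lab Test"],
        "LBDTC": row["Visit Date"],
        "LBORRES": orres,
        "LBORRESU": orresu,
        "LBSTRES": stres,
        "LBSTRESU": stresu,
        "LBSTNRLO": low,
        "LBSTNRHI": high,
        "LBNRIND": normal_range_indicator(stres, low, high),
    }
```

For instance, `parse_lab_result("0.3507 µkat/L")` returns `("0.3507", "µkat/L")`, and converting that value gives 0.3507 × 60 = 21.042 U/L (up to floating-point rounding). Keeping `LBORRES` as the parsed string preserves the source precision, while `LBSTRES` carries the converted number.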

### Adverse Events (AE)

A hepatic enzyme elevation adverse event is derived from the LB domain:

1. LB records where `LBNRIND = HIGH` for ALT, AST, or ALP are identified
2. The dictionary-derived term `AEDECOD` is populated as "Hepatic enzyme increased"
3. The adverse event start date `AESTDTC` is taken from the earliest elevated lab result date

In this example, subject 002's second visit (2025-11-06) shows all three liver enzymes elevated above the normal range, producing a single AE record.
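
The AE derivation can be sketched the same way. The sketch assumes LB records are dicts keyed by SDTM LB variable names, emits one AE per subject with at least one `HIGH` liver-enzyme result, and hard-codes the `AEDECOD` value stated above. The one-AE-per-subject grouping, `AESEQ = 1`, and the `elevated_lb` field are illustrative assumptions; the remaining AE columns (`AETERM`, `AELLTCD`, `AESER`, `AEREL`) are left out.

```python
from collections import defaultdict

LIVER_TESTS = {"ALT", "AST", "ALP"}
AE_DECOD = "Hepatic enzyme increased"


def derive_hepatic_aes(lb_records: list[dict]) -> list[dict]:
    """Return one AE per subject with at least one HIGH liver-enzyme result."""
    elevated_by_subject: dict[str, list[dict]] = defaultdict(list)
    for rec in lb_records:
        if rec["LBTESTCD"] in LIVER_TESTS and rec["LBNRIND"] == "HIGH":
            elevated_by_subject[rec["USUBJID"]].append(rec)

    aes = []
    for usubjid in sorted(elevated_by_subject):
        elevated = elevated_by_subject[usubjid]
        aes.append({
            "USUBJID": usubjid,
            "AESEQ": 1,  # assumption: at most one hepatic AE per subject
            "AEDECOD": AE_DECOD,
            # full YYYY-MM-DD dates sort chronologically as plain strings
            "AESTDTC": min(r["LBDTC"] for r in elevated),
            # the LB rows that triggered this AE (hypothetical helper field)
            "elevated_lb": [(r["LBTESTCD"], r["LBDTC"]) for r in elevated],
        })
    return aes
```

Taking `min` over `LBDTC` implements step 3 directly; if `LBDTC` ever carried partial dates or times with mixed precision, a real parser would be needed instead of string comparison.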

---

## Lineage Overview

The `rwd-lineage.xml` file contains **99 `<MapID>` elements** tracing source EHR lab data to the SDTM LB and AE domains.

### Transformation Types

| Type | Count | Target Domain | Description |
|------|-------|---------------|-------------|
| `DirectMap` | 36 | LB | One-to-one mapping (LOINC code → `LBTEST`, visit date → `LBDTC`, patient ID → `USUBJID`, etc.) |
| `LabValueParsing` | 24 | LB | Parsing composite result strings into numeric value and unit components |
| `UnitConversion` | 24 | LB | Converting original units (µkat/L) to standard units (U/L) |
| `DirectMap` | 3 | AE | Direct mappings for AE fields derived from LB data |
| `ElevatedLiverEnzyme` | 12 | AE | Algorithmic derivation identifying elevated lab values to produce the adverse event |

### Lineage by Domain

**LB domain (84 lineage entries across 12 records)**
Each of the 12 LB records receives 7 lineage entries (84 ÷ 12), matching the per-type counts above: 3 `DirectMap` entries (36 ÷ 12; e.g., `USUBJID`, `LBTEST`, `LBDTC`), 2 `LabValueParsing` entries (24 ÷ 12; one each for `LBORRES` and `LBORRESU`), and 2 `UnitConversion` entries (24 ÷ 12; one each for `LBSTRES` and `LBSTRESU`). This pattern demonstrates the multi-step transformation pipeline in which a single raw result string is first parsed and then converted.

**AE domain (15 lineage entries for 1 record)**
The single AE record for subject 002 traces to 12 `ElevatedLiverEnzyme` entries (one per source lab result evaluated in the elevated-enzyme determination) plus 3 `DirectMap` entries for derived AE fields such as `AEDECOD` and `AESTDTC`. This demonstrates cross-domain derivation: the AE record's lineage points back through the LB transformation pipeline to the original EHR source.
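
The counts in the Transformation Types table can be cross-checked with a little arithmetic. This sketch only restates the table's numbers (it does not read `rwd-lineage.xml`), derives the per-domain totals, and splits the LB counts per record, assuming, as the LB paragraph states, that every LB record receives the same number of entries of each type.

```python
# (transformation type, target domain) -> number of MapID elements, from the table
MAPID_COUNTS = {
    ("DirectMap", "LB"): 36,
    ("LabValueParsing", "LB"): 24,
    ("UnitConversion", "LB"): 24,
    ("DirectMap", "AE"): 3,
    ("ElevatedLiverEnzyme", "AE"): 12,
}
RECORD_COUNTS = {"LB": 12, "AE": 1}


def totals_by_domain(counts: dict[tuple[str, str], int]) -> dict[str, int]:
    """Sum MapID counts per target domain."""
    totals: dict[str, int] = {}
    for (_, domain), n in counts.items():
        totals[domain] = totals.get(domain, 0) + n
    return totals


def per_record(counts: dict[tuple[str, str], int], records: dict[str, int], domain: str) -> dict[str, int]:
    """Entries of each type per record; raises if a count does not divide evenly."""
    result = {}
    for (kind, d), n in counts.items():
        if d == domain:
            quotient, remainder = divmod(n, records[domain])
            if remainder:
                raise ValueError(f"{kind}: {n} is not a multiple of {records[domain]}")
            result[kind] = quotient
    return result


totals = totals_by_domain(MAPID_COUNTS)
print(totals)                # {'LB': 84, 'AE': 15}
print(sum(totals.values()))  # 99
print(per_record(MAPID_COUNTS, RECORD_COUNTS, "LB"))
# {'DirectMap': 3, 'LabValueParsing': 2, 'UnitConversion': 2}
```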

---

## Contents

```
example2/
├── README.md                   # This file
├── Example2.xlsx               # Companion workbook (SDTM AE, SDTM LB, Source LabResults, RWDLineage-Table)
└── data/
    ├── sdtm/
    │   ├── AE.csv              # SDTM AE domain: 1 record (subject 002 hepatic enzyme elevation)
    │   └── LB.csv              # SDTM LB domain: 12 records (2 subjects × 3 tests × 2 visits)
    ├── source/
    │   └── LabResults.csv      # EHR lab results: 12 rows (PATID, LOINC Code, Lab Test, Visit Date, Lab Result)
    └── define/
        ├── define.xml          # Define-XML 2.1 with rwdl namespace extension referencing rwd-lineage.xml
        └── rwd-lineage.xml     # RWD-Lineage XML: 99 MapID elements linking source EHR data to SDTM AE and LB
```

---

## Validation

From the repository root:

```bash
# Validate the RWD-Lineage XML structure
python3 tools/validate.py rwd-lineage examples/example2/data/define/rwd-lineage.xml

# Validate the Define-XML against CDISC XSD (requires lxml)
python3 tools/validate.py define-xml examples/example2/data/define/define.xml

# Check that every SDTM cell has lineage coverage
python3 tools/validate.py coverage examples/example2/data/sdtm examples/example2/data/define/rwd-lineage.xml
```
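
Conceptually, the coverage check is a set difference between all SDTM cells and the cells that some `<MapID>` points at. The sketch below is not the implementation of `tools/validate.py`; because this README does not describe how `rwd-lineage.xml` identifies a target cell, it assumes the lineage has already been reduced to a hypothetical set of `(domain, row_index, column)` keys.

```python
import csv
from pathlib import Path

# Hypothetical key for one SDTM cell: (domain, zero-based row index, column name)
CellKey = tuple[str, int, str]


def load_sdtm(sdtm_dir: Path) -> dict[str, list[dict[str, str]]]:
    """Read every <DOMAIN>.csv in sdtm_dir into a list of row dicts."""
    domains = {}
    for path in sorted(sdtm_dir.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            domains[path.stem] = list(csv.DictReader(f))
    return domains


def uncovered_cells(sdtm: dict[str, list[dict[str, str]]], covered: set[CellKey]) -> list[CellKey]:
    """Return every SDTM cell with no lineage entry pointing at it."""
    missing = []
    for domain, rows in sdtm.items():
        for index, row in enumerate(rows):
            for column in row:
                key = (domain, index, column)
                if key not in covered:
                    missing.append(key)
    return missing
```

An empty result from `uncovered_cells(load_sdtm(Path("examples/example2/data/sdtm")), covered)` would mean full cell-level coverage under this keying assumption.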

---

## Key Concepts Demonstrated

- **Multi-step transformations**: A single lab result passes through parsing and then unit conversion, with each step recorded as a separate lineage entry, before the standardized result is evaluated against the normal range.
- **Cross-domain derivation**: The AE record is derived from LB domain data, which is itself derived from source EHR data; the lineage captures both hops.
- **High coverage density**: 99 lineage entries across 13 SDTM records (12 LB, 1 AE) demonstrate dense, cell-level traceability, with 7 entries per LB record and 15 for the AE record.
- **Composite string parsing**: The `LabValueParsing` transformation type documents how a single source field (`"0.3507 µkat/L"`) fans out into multiple target fields (numeric value + unit).