Commit 6ccc1af

tnagamine and claude committed
Add README.md for examples folder
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 4bdbffe commit 6ccc1af

1 file changed

Lines changed: 212 additions & 0 deletions

examples/README.md

# Examples

This directory contains worked examples that demonstrate the [RWD-Lineage Data Standard](../documents/RWD-Lineage_Data_Standard_Specification.md), a machine-readable CDISC data exchange format for capturing the lineage of Real-World Data (RWD) as it is transformed into SDTM datasets.

Each example provides a complete, self-contained package: source EHR data, target SDTM datasets, a Define-XML 2.1 file with the `rwdl` namespace extension, and a companion `rwd-lineage.xml` file that traces every cell in the SDTM output back to its origin in the source data.

---

## Quick Start

```
examples/
├── README.md                  ← You are here
├── example1/                  ← CE domain: diagnoses, vitals, and clinical notes
│   ├── README.md
│   ├── Example1.xlsx
│   └── data/
│       ├── define/
│       │   ├── define.xml
│       │   └── rwd-lineage.xml
│       ├── sdtm/
│       │   └── ce.csv
│       └── source/
│           ├── pt_dx.csv
│           ├── vitals.csv
│           └── notes.csv
└── example2/                  ← AE + LB domains: lab results and adverse events
    ├── README.md
    ├── Example2.xlsx
    └── data/
        ├── define/
        │   ├── define.xml
        │   └── rwd-lineage.xml
        ├── sdtm/
        │   ├── AE.csv
        │   └── LB.csv
        └── source/
            └── LabResults.csv
```

### Validating an example

From the repository root:

```bash
# Validate the RWD-Lineage XML
python3 tools/validate.py rwd-lineage examples/example1/data/define/rwd-lineage.xml

# Validate the Define-XML (requires lxml)
python3 tools/validate.py define-xml examples/example1/data/define/define.xml

# Check that every SDTM cell has lineage coverage
python3 tools/validate.py coverage examples/example1/data/sdtm examples/example1/data/define/rwd-lineage.xml
```
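
To run the same three checks for every example, the command lines can be generated rather than typed. The following is a minimal sketch, not part of the repository: the function name `validation_commands` is hypothetical, and the only things taken from the commands above are the script path, the three subcommand names, and the directory layout.

```python
from pathlib import Path


def validation_commands(example_dir: str) -> list[list[str]]:
    """Build the three validate.py invocations for one example directory.

    Hypothetical helper: it only assembles argument lists and runs nothing.
    """
    data = Path(example_dir) / "data"
    lineage = (data / "define" / "rwd-lineage.xml").as_posix()
    define = (data / "define" / "define.xml").as_posix()
    sdtm = (data / "sdtm").as_posix()
    base = ["python3", "tools/validate.py"]
    return [
        base + ["rwd-lineage", lineage],     # validate the RWD-Lineage XML
        base + ["define-xml", define],       # validate the Define-XML
        base + ["coverage", sdtm, lineage],  # check cell-level coverage
    ]


# Print the commands for both examples shipped in this directory.
for example in ("examples/example1", "examples/example2"):
    for command in validation_commands(example):
        print(" ".join(command))
```

Each returned list could be passed to `subprocess.run(command, check=True)` from the repository root to execute the check instead of printing it.
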

See the [repository README](../README.md) for full validation instructions and requirements.

---

## Example Summaries

### Example 1: Clinical Events (CE) from EHR Diagnoses, Vitals, and Notes

- **SDTM domain:** CE (Clinical Events)
- **Source tables:** `pt_dx` (ICD-10 diagnoses), `vitals` (blood pressure, BMI), `notes` (free-text clinical notes)
- **Subjects:** 2 (001, 002) &nbsp;×&nbsp; 2 prespecified conditions = 4 CE records
- **Lineage entries:** 20 `<MapID>` elements

This example models two prespecified clinical events (**hypertension** and **acute myocardial infarction**) and shows how each `CEOCCUR` determination draws on multiple evidence sources:

| Transformation Type | Count | Description |
|---------------------|-------|-------------|
| `DirectMap` | 7 | One-to-one mapping of a source value to a target field (e.g., ICD-10 code → `CEOCCUR` evidence) |
| `AfterIndexDate` | 5 | Temporal filter ensuring the source event falls within the study follow-up period |
| `NLPExtraction` | 5 | Structured data extracted from free-text clinical notes via NLP |
| `FilterByValue` | 3 | Conditional inclusion based on a source value (e.g., blood pressure ≥ threshold) |

**Key concepts illustrated:**
- Multi-source evidence: a single SDTM cell (`CEOCCUR`) can trace to diagnosis codes, vital-sign measurements, *and* NLP-extracted findings simultaneously.
- Prespecified event algorithms: the lineage captures each step of a composite clinical algorithm (diagnosis code check → temporal filter → vitals threshold → NLP confirmation).
- NLP lineage: free-text clinical notes are treated as a legitimate source, with the `NLPExtraction` transformation type documenting the extraction.

→ See [`example1/README.md`](example1/README.md) for the full algorithm definitions.
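
The composite algorithm for a prespecified event (diagnosis code check → temporal filter → vitals threshold → NLP confirmation) might look roughly like the sketch below. It is an illustration only, not the definition in `example1/README.md`: the function name, the ICD-10 prefix `I10`, the 140 mmHg systolic cut-off, the keyword match standing in for NLP, and the rule that any single supporting source yields `CEOCCUR = "Y"` are all assumptions. What it shows is how each step that contributes evidence can be recorded under its transformation type, which is why one `CEOCCUR` cell can carry several lineage entries.

```python
from datetime import date


def hypertension_evidence(icd10, dx_date, index_date, systolic, note_text):
    """Return (CEOCCUR, transformation types that contributed evidence).

    Every threshold and rule in this function is a hypothetical placeholder.
    """
    steps = []
    # Steps 1-2: diagnosis code check plus temporal filter. The diagnosis
    # counts only if it is dated on or after the index date (assumed rule).
    if icd10.startswith("I10") and dx_date >= index_date:
        steps += ["DirectMap", "AfterIndexDate"]
    # Step 3: vitals threshold (assumed cut-off of 140 mmHg systolic).
    if systolic is not None and systolic >= 140:
        steps.append("FilterByValue")
    # Step 4: NLP confirmation, reduced here to a naive keyword match.
    if "hypertension" in note_text.lower():
        steps.append("NLPExtraction")
    # Assumed combination rule: any supporting evidence marks the event.
    return ("Y" if steps else "N"), steps


# Made-up inputs for one subject; none of these values come from the example data.
print(hypertension_evidence("I10", date(2024, 3, 1), date(2024, 1, 1), 152,
                            "Patient with long-standing hypertension."))
```
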

---

### Example 2: Laboratory Results (LB) and Adverse Events (AE) from EHR Labs

- **SDTM domains:** LB (Laboratory Test Results), AE (Adverse Events)
- **Source table:** `LabResults` (LOINC-coded lab results with raw values in original units)
- **Subjects:** 2 (001, 002) &nbsp;×&nbsp; 3 liver-enzyme tests &nbsp;×&nbsp; 2 visits = 12 LB records + 1 AE record
- **Lineage entries:** 99 `<MapID>` elements

This example traces LOINC-coded EHR lab data through unit conversion into the SDTM LB domain, then derives an adverse event (hepatic enzyme elevation) in the AE domain:

| Transformation Type | Count | Description |
|--------------------------|-------|-------------|
| `DirectMap` | 39 | One-to-one mappings (LOINC → `LBTESTCD`, visit date → `LBDTC`, patient ID → `USUBJID`, etc.) |
| `LabValueParsing` | 24 | Parsing composite result strings (e.g., `"0.3507 µkat/L"`) into numeric value and unit components |
| `UnitConversion` | 24 | Converting original units (µkat/L) to standard units (U/L), with results stored in `LBSTRES`/`LBSTRESU` |
| `ElevatedLiverEnzyme` | 12 | Algorithmic derivation identifying elevated ALT/AST/ALP to produce the AE record |

**Key concepts illustrated:**
- Multi-step transformations: a single lab result passes through parsing → conversion → standardization, each step recorded as a separate lineage entry.
- Cross-domain derivation: the AE domain record is derived from the LB domain, which is itself derived from source EHR data; the lineage captures both hops.
- High coverage density: 99 lineage entries across 13 SDTM records demonstrate cell-level traceability at scale, including every reference-range variable (`LBSTNRLO`, `LBSTNRHI`, `LBNRIND`).

→ See [`example2/README.md`](example2/README.md) for the full algorithm definitions.
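
The parsing and conversion steps can be pictured with a small sketch. The function names and the regular expression are assumptions, not the code that produced the example data. The conversion factor is standard: 1 U is 1 µmol/min and 1 kat is 1 mol/s, so 1 µkat/L equals 60 U/L, and `"0.3507 µkat/L"` becomes about 21.04 U/L. Each step returns its own intermediate result, mirroring the way the lineage records one entry per step.

```python
import re

MICRO = "\u00b5"  # micro sign used in the unit string
# 1 U = 1 umol/min and 1 kat = 1 mol/s, so 1 ukat/L = 60 U/L.
U_PER_UKAT = 60.0


def parse_lab_value(raw):
    """LabValueParsing step: split a composite string into (number, unit)."""
    match = re.fullmatch(r"\s*([-+]?\d+(?:\.\d+)?)\s*(\S.*?)\s*", raw)
    if match is None:
        raise ValueError(f"unparseable lab result: {raw!r}")
    unit = match.group(2).replace("\u03bc", MICRO)  # tolerate Greek mu
    return float(match.group(1)), unit


def convert_to_standard(value, unit):
    """UnitConversion step: only the ukat/L to U/L case is handled here."""
    if unit != f"{MICRO}kat/L":
        raise ValueError(f"no conversion defined for unit {unit!r}")
    return value * U_PER_UKAT, "U/L"


def standardize(raw):
    """Run both steps and keep each intermediate result, one entry per step."""
    parsed = parse_lab_value(raw)
    converted = convert_to_standard(*parsed)
    return [("LabValueParsing", parsed), ("UnitConversion", converted)]


for step, (value, unit) in standardize(f"0.3507 {MICRO}kat/L"):
    print(step, round(value, 4), unit)
```
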
109+
110+
---
111+
112+
## Anatomy of an Example
113+
114+
Every example follows the same internal structure:
115+
116+
```
117+
exampleN/
118+
├── README.md # Scenario description, algorithm definitions, file inventory
119+
├── ExampleN.xlsx # Human-readable workbook with all tables and lineage in spreadsheet form
120+
└── data/
121+
├── define/
122+
│ ├── define.xml # Define-XML 2.1 with rwdl namespace extension
123+
│ └── rwd-lineage.xml # RWD-Lineage XML: the cell-level lineage map
124+
├── sdtm/
125+
│ └── *.csv # Target SDTM domain datasets
126+
└── source/
127+
└── *.csv # Source EHR/RWD tables
128+
```

### `define.xml`

A standard [CDISC Define-XML 2.1](https://www.cdisc.org/standards/data-exchange/define-xml) file with one addition: the `rwdl` namespace extension, which references the companion lineage file:

```xml
<ODM xmlns:rwdl="http://www.cdisc.org/ns/rwd-lineage/v1" ...>
  <Study>
    <MetaDataVersion>
      <rwdl:lineage>
        <rwdl:ref leafID="LF.RWDLINEAGE">rwd-lineage.xml</rwdl:ref>
      </rwdl:lineage>
      <!-- Standard ItemGroupDef / ItemDef elements follow -->
    </MetaDataVersion>
  </Study>
</ODM>
```
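
A consumer could locate the companion file by reading that reference. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `lineage_refs` is hypothetical, and the inline sample is a trimmed copy of the snippet above with the elided attributes simply omitted.

```python
import xml.etree.ElementTree as ET

# Namespace URI as shown in the snippet above.
RWDL_NS = "http://www.cdisc.org/ns/rwd-lineage/v1"


def lineage_refs(define_xml: str) -> list[tuple[str, str]]:
    """Return (leafID, file name) for every rwdl:ref element in a Define-XML string."""
    root = ET.fromstring(define_xml)
    return [
        (ref.get("leafID", ""), (ref.text or "").strip())
        for ref in root.iter(f"{{{RWDL_NS}}}ref")
    ]


# Trimmed sample; a real Define-XML file carries many more attributes and elements.
SAMPLE = """<ODM xmlns:rwdl="http://www.cdisc.org/ns/rwd-lineage/v1">
  <Study>
    <MetaDataVersion>
      <rwdl:lineage>
        <rwdl:ref leafID="LF.RWDLINEAGE">rwd-lineage.xml</rwdl:ref>
      </rwdl:lineage>
    </MetaDataVersion>
  </Study>
</ODM>"""

print(lineage_refs(SAMPLE))
```

Because the lookup uses `iter` with the fully qualified tag name, it does not depend on which default namespace the surrounding ODM elements use. For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.
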

### `rwd-lineage.xml`

The core deliverable. Each `<MapID>` element represents one source-to-target cell mapping:

```xml
<RWDLineage xmlns="http://www.cdisc.org/ns/rwd-lineage/v1" ...>
  <MapID uuid="35060134-fc2f-4cdf-9abe-491924739bd5">
    <Transformation type="DirectMap">Direct Map</Transformation>
    <Source>
      <Coordinate storage="Filesystem" structure="Tabular">
        <URI>...source/pt_dx.csv</URI>
        <RowIndex>4</RowIndex>
        <ColumnName>ICD10</ColumnName>
      </Coordinate>
    </Source>
    <Target>
      <Coordinate storage="Filesystem" structure="Tabular">
        <URI>...sdtm/ce.csv</URI>
        <RowIndex>2</RowIndex>
        <ColumnName>CEOCCUR</ColumnName>
      </Coordinate>
    </Target>
  </MapID>
  <!-- ... -->
</RWDLineage>
```

Key attributes are documented in the [RWD-Lineage Data Standard Specification](../documents/RWD-Lineage_Data_Standard_Specification.md).
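
As an illustration of how such a file might be consumed, the sketch below reads each `<MapID>` into a plain dictionary using the standard library. It assumes exactly one `<Source>` and one `<Target>`, each with a single `<Coordinate>`, as in the snippet above; the specification may allow other shapes, and the helper names are hypothetical. The sample keeps the elided `...` URIs exactly as shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"l": "http://www.cdisc.org/ns/rwd-lineage/v1"}


def read_coordinate(parent):
    """Return (URI, row index, column name) from the Coordinate under parent."""
    coord = parent.find("l:Coordinate", NS)
    return (
        coord.findtext("l:URI", namespaces=NS),
        int(coord.findtext("l:RowIndex", namespaces=NS)),
        coord.findtext("l:ColumnName", namespaces=NS),
    )


def read_mappings(xml_text):
    """Parse every MapID element into a dict (assumes one Source and one Target)."""
    root = ET.fromstring(xml_text)
    return [
        {
            "uuid": map_id.get("uuid"),
            "type": map_id.find("l:Transformation", NS).get("type"),
            "source": read_coordinate(map_id.find("l:Source", NS)),
            "target": read_coordinate(map_id.find("l:Target", NS)),
        }
        for map_id in root.findall("l:MapID", NS)
    ]


SAMPLE = """<RWDLineage xmlns="http://www.cdisc.org/ns/rwd-lineage/v1">
  <MapID uuid="35060134-fc2f-4cdf-9abe-491924739bd5">
    <Transformation type="DirectMap">Direct Map</Transformation>
    <Source>
      <Coordinate storage="Filesystem" structure="Tabular">
        <URI>...source/pt_dx.csv</URI>
        <RowIndex>4</RowIndex>
        <ColumnName>ICD10</ColumnName>
      </Coordinate>
    </Source>
    <Target>
      <Coordinate storage="Filesystem" structure="Tabular">
        <URI>...sdtm/ce.csv</URI>
        <RowIndex>2</RowIndex>
        <ColumnName>CEOCCUR</ColumnName>
      </Coordinate>
    </Target>
  </MapID>
</RWDLineage>"""

mappings = read_mappings(SAMPLE)
print(mappings[0]["source"], "->", mappings[0]["target"])
print(Counter(m["type"] for m in mappings))
```

Applied to a full lineage file, the `Counter` line would give a per-type tally comparable to the count tables in the example summaries.
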

### `ExampleN.xlsx`

A companion Excel workbook containing all source tables, SDTM output tables, and the lineage mappings in tabular form. This is provided for human review and is not a normative artifact; the XML files are the machine-readable standard.

---

## Transformation Types Used Across Examples

| Type | Example 1 | Example 2 | Description |
|------|-----------|-----------|-------------|
| `DirectMap` | ✓ | ✓ | One-to-one value copy from source to target |
| `AfterIndexDate` | ✓ | | Temporal filter relative to a study index date |
| `FilterByValue` | ✓ | | Conditional inclusion based on source data value |
| `NLPExtraction` | ✓ | | Value extracted from unstructured text via NLP |
| `LabValueParsing` | | ✓ | Numeric/unit parsing from composite lab result strings |
| `UnitConversion` | | ✓ | Conversion between measurement unit systems |
| `ElevatedLiverEnzyme` | | ✓ | Algorithmic derivation of adverse events from lab data |

---

## Contributing a New Example

New examples are welcome. To maintain consistency:

1. Create a directory named `exampleN/` following the structure above.
2. Include a `README.md` with the scenario description, algorithm, and file inventory.
3. Provide both CSV data files and a companion `.xlsx` workbook.
4. Ensure the `rwd-lineage.xml` passes validation:
   ```bash
   python3 tools/validate.py rwd-lineage examples/exampleN/data/define/rwd-lineage.xml
   ```
5. Ensure full lineage coverage of all SDTM cells:
   ```bash
   python3 tools/validate.py coverage examples/exampleN/data/sdtm examples/exampleN/data/define/rwd-lineage.xml
   ```
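
To make "full lineage coverage" concrete, here is a rough sketch of the idea: every (row, column) cell in an SDTM CSV should appear as the target of at least one lineage entry. It is not the implementation behind `tools/validate.py coverage`. The function name is hypothetical, the sample rows are made up, and the row numbering (starting at 1, header excluded) is an assumption rather than something taken from the specification.

```python
import csv
import io


def uncovered_cells(sdtm_csv_text, covered):
    """Return the (row_index, column_name) cells that have no lineage target.

    `covered` is a set of (row_index, column_name) pairs, for instance collected
    from the <Target> coordinates of a lineage file. Rows are numbered from 1
    with the header excluded (an assumption of this sketch).
    """
    reader = csv.DictReader(io.StringIO(sdtm_csv_text))
    missing = []
    for row_index, row in enumerate(reader, start=1):
        for column in row:
            if (row_index, column) not in covered:
                missing.append((row_index, column))
    return missing


# Made-up two-row dataset and a lineage target set that misses one cell.
SDTM_CSV = "USUBJID,CEOCCUR\n001,Y\n002,N\n"
COVERED = {(1, "USUBJID"), (1, "CEOCCUR"), (2, "USUBJID")}

print(uncovered_cells(SDTM_CSV, COVERED))
```
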

See [CONTRIBUTING.md](../CONTRIBUTING.md) for general contribution guidelines.
