Commit d75ba74

tnagamine and claude committed
docs: update examples README to current spec and add PDF reference
- Add RWD-Lineage-Examples.pdf to directory tree with description
- Update define.xml snippet: new rwdl namespace, MethodDef elements, def:leaf + rwdl:LineageRef pattern replacing old rwdl:lineage/rwdl:ref
- Update rwd-lineage.xml snippet: rwdl:Lineage root, rwdl:SourceMetadata, rwdl:LineageTrail, rwdl:MapID with UUID (PascalCase), rwdl:Coordinate with Storage/Structure (uppercase enums), MethodDefOID attribute
- Replace transformation type table with MethodDefOID table
- Update example summaries to use MethodDefOID language and correct MapID count for example 2 (101, was 99)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent fbe6d44 commit d75ba74

1 file changed

Lines changed: 126 additions & 63 deletions

examples/README.md

@@ -10,8 +10,9 @@ Each example provides a complete, self-contained package: source EHR data, targe
1010

1111
```
1212
examples/
13-
├── README.md ← You are here
14-
├── example1/ ← CE domain: diagnoses, vitals, and clinical notes
13+
├── README.md ← You are here
14+
├── RWD-Lineage-Examples.pdf ← Slide deck: background, motivation, and example walkthroughs
15+
├── example1/ ← CE domain: diagnoses, vitals, and clinical notes
1516
│ ├── README.md
1617
│ ├── Example1.xlsx
1718
│ └── data/
@@ -24,7 +25,7 @@ examples/
2425
│ ├── pt_dx.csv
2526
│ ├── vitals.csv
2627
│ └── notes.csv
27-
└── example2/ ← AE + LB domains: lab results and adverse events
28+
└── example2/ ← AE + LB domains: lab results and adverse events
2829
├── README.md
2930
├── Example2.xlsx
3031
└── data/
@@ -38,6 +39,8 @@ examples/
3839
└── LabResults.csv
3940
```
4041

42+
`RWD-Lineage-Examples.pdf` is a presentation deck covering the background and motivation for the standard, the RWD Lineage in Define-XML architecture, and annotated walkthroughs of both examples. It is a good starting point for understanding why the standard exists before reading the XML files.
43+
4144
### Validating an example
4245

4346
From the repository root:
@@ -59,51 +62,51 @@ See the [repository README](../README.md) for full validation instructions and r
5962

6063
## Example Summaries
6164

62-
### Example 1 — Clinical Events (CE) from EHR Diagnoses, Vitals, and Notes
65+
### Example 1 — Clinical Events (CE): Hypertension and Myocardial Infarction
6366

6467
**SDTM domain:** CE (Clinical Events)
6568
**Source tables:** `pt_dx` (ICD-10 diagnoses), `vitals` (blood pressure, BMI), `notes` (free-text clinical notes)
6669
**Subjects:** 2 (001, 002) &nbsp;×&nbsp; 2 prespecified conditions = 4 CE records
67-
**Lineage entries:** 20 `<MapID>` elements
70+
**Lineage entries:** 20 `<rwdl:MapID>` elements
6871

6972
This example models two prespecified clinical events — **hypertension** and **acute myocardial infarction** — and shows how each `CEOCCUR` determination draws on multiple evidence sources:
7073

71-
| Transformation Type | Count | Description |
72-
|---------------------|-------|-------------|
73-
| `DirectMap` | 7 | One-to-one mapping of a source value to a target field (e.g., ICD-10 code → `CEOCCUR` evidence) |
74-
| `AfterIndexDate` | 5 | Temporal filter ensuring the source event falls within the study follow-up period |
75-
| `NLPExtraction` | 5 | Structured data extracted from free-text clinical notes via NLP |
76-
| `FilterByValue` | 3 | Conditional inclusion based on a source value (e.g., blood pressure ≥ threshold) |
74+
| `MethodDefOID` | Count | Description |
75+
|----------------|-------|-------------|
76+
| *(none — direct map)* | 7 | Source record directly supports the target determination; no algorithmic transformation |
77+
| `MT.AFTERIDXDATE` | 5 | Temporal filter — include only source records dated on or after the patient's study index date |
78+
| `MT.NLPEXTRACTION` | 5 | Structured data extracted from free-text clinical notes via NLP |
79+
| `MT.FILTERBYVAL` | 3 | Source vitals filtered by vital type to match the target clinical event |
7780

7881
**Key concepts illustrated:**
79-
- Multi-source evidence: a single SDTM cell (`CEOCCUR`) can trace to diagnosis codes, vital-sign measurements, *and* NLP-extracted findings simultaneously.
80-
- Prespecified event algorithms: the lineage captures each step of a composite clinical algorithm (diagnosis code check → temporal filter → vitals threshold → NLP confirmation).
81-
- NLP lineage: free-text clinical notes are treated as a legitimate source, with the `NLPExtraction` transformation type documenting the extraction.
82+
- **Multi-source evidence:** a single SDTM cell (`CEOCCUR`) can trace to diagnosis codes, vital-sign measurements, *and* NLP-extracted findings simultaneously.
83+
- **Prespecified event algorithms:** the lineage captures each step of a composite clinical algorithm (diagnosis code check → temporal filter → vitals threshold → NLP confirmation).
84+
- **NLP lineage:** free-text clinical notes are a legitimate source, with `MT.NLPEXTRACTION` referencing the Define-XML `MethodDef` that documents the extraction logic.
8285

8386
→ See [`example1/README.md`](example1/README.md) for the full algorithm definitions.
8487

8588
---
8689

87-
### Example 2 — Laboratory Results (LB) and Adverse Events (AE) from EHR Labs
90+
### Example 2 — Labs (LB) and Adverse Events (AE): Elevated Liver Enzyme
8891

8992
**SDTM domains:** LB (Laboratory Test Results), AE (Adverse Events)
9093
**Source table:** `LabResults` (LOINC-coded lab results with raw values in original units)
9194
**Subjects:** 2 (001, 002) &nbsp;×&nbsp; 3 liver-enzyme tests &nbsp;×&nbsp; 2 visits = 12 LB records + 1 AE record
92-
**Lineage entries:** 99 `<MapID>` elements
95+
**Lineage entries:** 101 `<rwdl:MapID>` elements
9396

9497
This example traces LOINC-coded EHR lab data through unit conversion into the SDTM LB domain, then derives an adverse event (hepatic enzyme elevation) in the AE domain:
9598

96-
| Transformation Type | Count | Description |
97-
|--------------------------|-------|-------------|
98-
| `DirectMap` | 39 | One-to-one mappings (LOINC → `LBTESTCD`, visit date → `LBDTC`, patient ID → `USUBJID`, etc.) |
99-
| `LabValueParsing` | 24 | Parsing composite result strings (e.g., `"0.3507 µkat/L"`) into numeric value and unit components |
100-
| `UnitConversion` | 24 | Converting original units (µkat/L) to standard units (U/L) with stored results in `LBSTRES`/`LBSTRESU` |
101-
| `ElevatedLiverEnzyme` | 12 | Algorithmic derivation identifying elevated ALT/AST/ALP to produce the AE record |
99+
| `MethodDefOID` | Count | Description |
100+
|----------------|-------|-------------|
101+
| *(none — direct map)* | 39 | LOINC code → `LBTEST`, visit date → `LBDTC`, `AETERM` → `AEDECOD`/`AELLTCD`, etc. |
102+
| `MT.LABVALPARSING` | 24 | Parse composite result strings (e.g., `"0.3507 µkat/L"`) into numeric value and unit components |
103+
| `MT.UNITCONV` | 24 | Convert original units (µkat/L) to standard units (U/L); results in `LBSTRES`/`LBSTRESU` |
104+
| `MT.ELEVATEDLIVERENZYME` | 12 | Evaluate ALT/AST/ALP against reference range upper limits to derive the AE record |
102105

103106
**Key concepts illustrated:**
104-
- Multi-step transformations: a single lab result passes through parsing → conversion → standardization, each step recorded as a separate lineage entry.
105-
- Cross-domain derivation: the AE domain record is derived from the LB domain, which is itself derived from source EHR data — the lineage captures both hops.
106-
- High coverage density: 99 lineage entries across 13 SDTM records demonstrates cell-level traceability at scale, including every standard-range indicator (`LBSTNRLO`, `LBSTNRHI`, `LBNRIND`).
107+
- **Multi-step transformations:** a single lab result passes through parsing → conversion → standardization, each step a separate lineage entry.
108+
- **Cross-domain derivation:** the AE domain record is derived from the LB domain, which is itself derived from source EHR data — the lineage captures both hops.
109+
- **High coverage density:** 101 lineage entries across 13 SDTM records demonstrates cell-level traceability at scale, including every standard-range indicator (`LBSTNRLO`, `LBSTNRHI`, `LBNRIND`).
107110

108111
→ See [`example2/README.md`](example2/README.md) for the full algorithm definitions.
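
The parsing → conversion chain described for Example 2 can be sketched in Python. This is a minimal illustration only, not the logic defined by `MT.LABVALPARSING` or `MT.UNITCONV`: the function names and the regular expression are hypothetical, and the factor of 60 is derived from the unit definitions (1 kat = 1 mol/s, 1 U = 1 µmol/min) rather than taken from the example files.

```python
import re

# 1 µkat = 1 µmol/s = 60 µmol/min = 60 U, so µkat/L -> U/L multiplies by 60.
UKAT_PER_L_TO_U_PER_L = 60.0

# Accept either the micro sign (U+00B5) or Greek mu (U+03BC) in the unit.
RESULT_PATTERN = re.compile(
    r"^\s*([-+]?\d+(?:\.\d+)?)\s*([\u00b5\u03bc]kat/L)\s*$"
)


def parse_lab_value(raw):
    """Split a composite result string into (numeric value, unit)."""
    match = RESULT_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unrecognized lab result string: {raw!r}")
    return float(match.group(1)), match.group(2)


def to_units_per_litre(value_ukat_per_l):
    """Convert an enzyme activity from µkat/L to U/L."""
    return value_ukat_per_l * UKAT_PER_L_TO_U_PER_L


# Step 1 (parsing) and step 2 (conversion) stay separate, mirroring the
# separate lineage entries recorded for each step.
value, unit = parse_lab_value("0.3507 µkat/L")
standard_value = to_units_per_litre(value)
```

Keeping the two steps as separate functions mirrors the README's point that each step is recorded as its own lineage entry.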
109112

@@ -129,15 +132,32 @@ exampleN/
129132

130133
### `define.xml`
131134

132-
A standard [CDISC Define-XML 2.1](https://www.cdisc.org/standards/data-exchange/define-xml) file with one addition — the `rwdl` namespace extension that references the companion lineage file:
135+
A standard [CDISC Define-XML 2.1](https://www.cdisc.org/standards/data-exchange/define-xml) file extended with two additions:
136+
137+
1. **`MethodDef` elements** — one per non-direct transformation, describing the algorithm applied. Each `MethodDefOID` in `rwd-lineage.xml` resolves to one of these.
138+
2. **`rwdl:LineageRef`** — points to the companion lineage file via a standard `def:leaf` reference.
133139

134140
```xml
135-
<ODM xmlns:rwdl="http://www.cdisc.org/ns/rwd-lineage/v1" ...>
141+
<ODM xmlns:rwdl="http://www.cdisc.org/ns/rwdl/v1.0"
142+
xmlns:def="http://www.cdisc.org/ns/def/v2.1" ...>
136143
<Study>
137144
<MetaDataVersion>
138-
<rwdl:lineage>
139-
<rwdl:ref leafID="LF.RWDLINEAGE">rwd-lineage.xml</rwdl:ref>
140-
</rwdl:lineage>
145+
146+
<!-- Transformation definitions referenced by MethodDefOID in rwd-lineage.xml -->
147+
<MethodDef OID="MT.AFTERIDXDATE" Name="After Index Date filter" Type="Computation">
148+
<Description>
149+
<TranslatedText xml:lang="en">Include source records only when the source date
150+
falls on or after the patient's index date.</TranslatedText>
151+
</Description>
152+
</MethodDef>
153+
<!-- ... additional MethodDef elements ... -->
154+
155+
<!-- Lineage file reference: def:leaf declares the file; rwdl:LineageRef points to it -->
156+
<def:leaf ID="LF.RWDLINEAGE" xlink:href="rwd-lineage.xml">
157+
<def:title>RWD Lineage Traceability</def:title>
158+
</def:leaf>
159+
<rwdl:LineageRef leafID="LF.RWDLINEAGE"/>
160+
141161
<!-- Standard ItemGroupDef / ItemDef elements follow -->
142162
</MetaDataVersion>
143163
</Study>
@@ -146,50 +166,93 @@ A standard [CDISC Define-XML 2.1](https://www.cdisc.org/standards/data-exchange/
146166

147167
### `rwd-lineage.xml`
148168

149-
The core deliverable. Each `<MapID>` element represents one source-to-target cell mapping:
169+
The core deliverable. The document has two top-level layers inside `rwdl:Lineage`:
170+
171+
- **`rwdl:SourceMetadata`** (optional) — assertions about the source systems: their names, data models, and the controlled terminologies their coded values are encoded in.
172+
- **`rwdl:LineageTrail`** — the forensic record: an array of `rwdl:MapID` elements, each a Source→Target pair.
173+
174+
Each `rwdl:MapID` element represents one source-to-target data point mapping:
150175

151176
```xml
152-
<RWDLineage xmlns="http://www.cdisc.org/ns/rwd-lineage/v1" ...>
153-
<MapID uuid="35060134-fc2f-4cdf-9abe-491924739bd5">
154-
<Transformation type="DirectMap">Direct Map</Transformation>
155-
<Source>
156-
<Coordinate storage="Filesystem" structure="Tabular">
157-
<URI>...source/pt_dx.csv</URI>
158-
<RowIndex>4</RowIndex>
159-
<ColumnName>ICD10</ColumnName>
160-
</Coordinate>
161-
</Source>
162-
<Target>
163-
<Coordinate storage="Filesystem" structure="Tabular">
164-
<URI>...sdtm/ce.csv</URI>
165-
<RowIndex>2</RowIndex>
166-
<ColumnName>CEOCCUR</ColumnName>
167-
</Coordinate>
168-
</Target>
169-
</MapID>
170-
<!-- ... -->
171-
</RWDLineage>
177+
<rwdl:Lineage xmlns:rwdl="http://www.cdisc.org/ns/rwdl/v1.0">
178+
179+
<!-- LAYER 1: Assertions about the source systems -->
180+
<rwdl:SourceMetadata>
181+
<rwdl:SourceSystem OID="SRC.CSV.1"
182+
Name="Example 1 Clinical Source CSV Files"
183+
Description="CSV exports from clinical source system">
184+
<rwdl:ExternalCodeList Dictionary="ICD-10-CM" Version="2024"
185+
AppliesTo="pt_dx.csv ICD10"/>
186+
</rwdl:SourceSystem>
187+
</rwdl:SourceMetadata>
188+
189+
<!-- LAYER 2: Forensic trail — Source -> Target pairs -->
190+
<rwdl:LineageTrail>
191+
192+
<!-- Direct map: no MethodDefOID -->
193+
<rwdl:MapID UUID="35060134-fc2f-4cdf-9abe-491924739bd5">
194+
<rwdl:Source>
195+
<rwdl:Coordinate Storage="FILESYSTEM" Structure="TABULAR" Format="CSV">
196+
<rwdl:URI>.../source/pt_dx.csv</rwdl:URI>
197+
<rwdl:RowIndex>4</rwdl:RowIndex>
198+
<rwdl:ColumnName>ICD10</rwdl:ColumnName>
199+
</rwdl:Coordinate>
200+
</rwdl:Source>
201+
<rwdl:Target>
202+
<rwdl:Coordinate Storage="FILESYSTEM" Structure="TABULAR" Format="CSV">
203+
<rwdl:URI>.../sdtm/ce.csv</rwdl:URI>
204+
<rwdl:RowIndex>2</rwdl:RowIndex>
205+
<rwdl:ColumnName>CEOCCUR</rwdl:ColumnName>
206+
</rwdl:Coordinate>
207+
</rwdl:Target>
208+
</rwdl:MapID>
209+
210+
<!-- Non-direct map: MethodDefOID references the MethodDef in define.xml -->
211+
<rwdl:MapID UUID="7e376beb-7dad-4f5c-a212-88283ac22eba"
212+
MethodDefOID="MT.AFTERIDXDATE">
213+
<rwdl:Source>
214+
<rwdl:Coordinate Storage="FILESYSTEM" Structure="TABULAR" Format="CSV">
215+
<rwdl:URI>.../source/pt_dx.csv</rwdl:URI>
216+
<rwdl:RowIndex>4</rwdl:RowIndex>
217+
<rwdl:ColumnName>DATE</rwdl:ColumnName>
218+
</rwdl:Coordinate>
219+
</rwdl:Source>
220+
<rwdl:Target>
221+
<rwdl:Coordinate Storage="FILESYSTEM" Structure="TABULAR" Format="CSV">
222+
<rwdl:URI>.../sdtm/ce.csv</rwdl:URI>
223+
<rwdl:RowIndex>2</rwdl:RowIndex>
224+
<rwdl:ColumnName>CEOCCUR</rwdl:ColumnName>
225+
</rwdl:Coordinate>
226+
</rwdl:Target>
227+
</rwdl:MapID>
228+
229+
<!-- ... -->
230+
</rwdl:LineageTrail>
231+
232+
</rwdl:Lineage>
172233
```
173234

174-
Key attributes are documented in the [RWD-Lineage Data Standard Specification](../documents/RWD-Lineage_Data_Standard_Specification.md).
235+
Key attributes and elements are documented in the [RWD-Lineage Data Standard Specification](../documents/RWD-Lineage_Data_Standard_Specification.md).
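
As a rough illustration of how the per-`MethodDefOID` counts in the example summaries could be reproduced, the sketch below tallies `rwdl:MapID` elements using Python's standard library. It assumes the namespace URI and attribute names shown in the snippet above; the inline sample is a trimmed stand-in (the `Source`/`Target` children are omitted), and the `(direct map)` label is a hypothetical placeholder for entries without a `MethodDefOID`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URI as shown in the rwd-lineage.xml snippet.
RWDL_NS = {"rwdl": "http://www.cdisc.org/ns/rwdl/v1.0"}

# Trimmed sample: real MapID elements also carry Source and Target children.
SAMPLE = """<rwdl:Lineage xmlns:rwdl="http://www.cdisc.org/ns/rwdl/v1.0">
  <rwdl:LineageTrail>
    <rwdl:MapID UUID="35060134-fc2f-4cdf-9abe-491924739bd5"/>
    <rwdl:MapID UUID="7e376beb-7dad-4f5c-a212-88283ac22eba"
                MethodDefOID="MT.AFTERIDXDATE"/>
  </rwdl:LineageTrail>
</rwdl:Lineage>"""


def count_by_method(xml_text):
    """Count MapID elements per MethodDefOID; direct maps have none."""
    root = ET.fromstring(xml_text)
    map_ids = root.findall("rwdl:LineageTrail/rwdl:MapID", RWDL_NS)
    return Counter(m.get("MethodDefOID", "(direct map)") for m in map_ids)


counts = count_by_method(SAMPLE)
```

To run this against a real file, the string could be replaced by the contents of an example's `rwd-lineage.xml`.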
175236

176237
### `ExampleN.xlsx`
177238

178239
A companion Excel workbook containing all source tables, SDTM output tables, and the lineage mappings in tabular form. This is provided for human review and is not a normative artifact — the XML files are the machine-readable standard.
179240

180241
---
181242

182-
## Transformation Types Used Across Examples
183-
184-
| Type | Example 1 | Example 2 | Description |
185-
|------|-----------|-----------|-------------|
186-
| `DirectMap` | ✓ | ✓ | One-to-one value copy from source to target |
187-
| `AfterIndexDate` | ✓ | | Temporal filter relative to a study index date |
188-
| `FilterByValue` | ✓ | | Conditional inclusion based on source data value |
189-
| `NLPExtraction` | ✓ | | Value extracted from unstructured text via NLP |
190-
| `LabValueParsing` | | ✓ | Numeric/unit parsing from composite lab result strings |
191-
| `UnitConversion` | | ✓ | Conversion between measurement unit systems |
192-
| `ElevatedLiverEnzyme` | | ✓ | Algorithmic derivation of adverse events from lab data |
243+
## Transformations Across Examples
244+
245+
Transformations are expressed via the `MethodDefOID` attribute on `rwdl:MapID`, which references a `MethodDef` element in `define.xml`. Direct maps — where the source record directly supports the target without an algorithmic transformation — carry no `MethodDefOID`.
246+
247+
| `MethodDefOID` | Example 1 | Example 2 | Description |
248+
|----------------|-----------|-----------|-------------|
249+
| *(none)* | ✓ | ✓ | Direct map — no transformation applied |
250+
| `MT.AFTERIDXDATE` | ✓ | | Temporal filter relative to the patient's study index date |
251+
| `MT.FILTERBYVAL` | ✓ | | Conditional inclusion based on source vital type value |
252+
| `MT.NLPEXTRACTION` | ✓ | | Concept extracted from free-text clinical notes via NLP |
253+
| `MT.LABVALPARSING` | | ✓ | Numeric value and unit parsed from a composite lab result string |
254+
| `MT.UNITCONV` | | ✓ | Conversion between measurement unit systems (µkat/L → U/L) |
255+
| `MT.ELEVATEDLIVERENZYME` | | ✓ | Algorithmic derivation of an AE record from elevated lab results |
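
Because every `MethodDefOID` is meant to resolve to a `MethodDef` in `define.xml`, a consistency check along the following lines may be useful. This is a sketch under assumptions, not part of the repository's validation tooling: the function name is hypothetical, both inline documents are trimmed stand-ins, the third UUID is a placeholder, and elements are matched by local name with the `{*}` wildcard (Python 3.8+) so the check does not depend on the exact default namespace of `define.xml`.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for define.xml; only one MethodDef is declared here.
DEFINE_SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study><MetaDataVersion>
    <MethodDef OID="MT.AFTERIDXDATE" Name="After Index Date filter"
               Type="Computation"/>
  </MetaDataVersion></Study>
</ODM>"""

# Trimmed stand-in for rwd-lineage.xml. The last UUID is a placeholder, and
# MT.NLPEXTRACTION is deliberately left undeclared in DEFINE_SAMPLE.
LINEAGE_SAMPLE = """<rwdl:Lineage xmlns:rwdl="http://www.cdisc.org/ns/rwdl/v1.0">
  <rwdl:LineageTrail>
    <rwdl:MapID UUID="35060134-fc2f-4cdf-9abe-491924739bd5"/>
    <rwdl:MapID UUID="7e376beb-7dad-4f5c-a212-88283ac22eba"
                MethodDefOID="MT.AFTERIDXDATE"/>
    <rwdl:MapID UUID="00000000-0000-0000-0000-000000000000"
                MethodDefOID="MT.NLPEXTRACTION"/>
  </rwdl:LineageTrail>
</rwdl:Lineage>"""


def unresolved_method_oids(define_xml, lineage_xml):
    """Return MethodDefOIDs used in the lineage but not declared in define.xml."""
    define_root = ET.fromstring(define_xml)
    lineage_root = ET.fromstring(lineage_xml)
    defined = {m.get("OID") for m in define_root.findall(".//{*}MethodDef")}
    referenced = {
        m.get("MethodDefOID")
        for m in lineage_root.findall(".//{*}MapID")
        if m.get("MethodDefOID") is not None  # direct maps carry no OID
    }
    return sorted(referenced - defined)


missing = unresolved_method_oids(DEFINE_SAMPLE, LINEAGE_SAMPLE)
```

An empty result would indicate that every non-direct mapping points at a declared `MethodDef`; in the trimmed sample, `missing` contains the one undeclared OID.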
193256

194257
---
195258
