docs: update examples README to current spec and add PDF reference

- Add RWD-Lineage-Examples.pdf to directory tree with description
- Update define.xml snippet: new rwdl namespace, MethodDef elements,
  def:leaf + rwdl:LineageRef pattern replacing old rwdl:lineage/rwdl:ref
- Update rwd-lineage.xml snippet: rwdl:Lineage root, rwdl:SourceMetadata,
  rwdl:LineageTrail, rwdl:MapID with UUID (PascalCase), rwdl:Coordinate
  with Storage/Structure (uppercase enums), MethodDefOID attribute
- Replace transformation type table with MethodDefOID table
- Update example summaries to use MethodDefOID language and correct
  MapID count for example 2 (101, was 99)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+`RWD-Lineage-Examples.pdf` is a presentation deck covering the background and motivation for the standard, the RWD Lineage in Define-XML architecture, and annotated walkthroughs of both examples. It is a good starting point for understanding why the standard exists before reading the XML files.
+
 ### Validating an example

 From the repository root:
@@ -59,51 +62,51 @@ See the [repository README](../README.md) for full validation instructions and r
 ## Example Summaries

-### Example 1 — Clinical Events (CE) from EHR Diagnoses, Vitals, and Notes
+### Example 1 — Clinical Events (CE): Hypertension and Myocardial Infarction
 **Subjects:** 2 (001, 002) × 2 prespecified conditions = 4 CE records
-**Lineage entries:** 20 `<MapID>` elements
+**Lineage entries:** 20 `<rwdl:MapID>` elements

 This example models two prespecified clinical events — **hypertension** and **acute myocardial infarction** — and shows how each `CEOCCUR` determination draws on multiple evidence sources:

-| Transformation Type | Count | Description |
-|---------------------|-------|-------------|
-| `DirectMap` | 7 | One-to-one mapping of a source value to a target field (e.g., ICD-10 code → `CEOCCUR` evidence) |
-| `AfterIndexDate` | 5 | Temporal filter ensuring the source event falls within the study follow-up period |
-| `NLPExtraction` | 5 | Structured data extracted from free-text clinical notes via NLP |
-| `FilterByValue` | 3 | Conditional inclusion based on a source value (e.g., blood pressure ≥ threshold) |
+| `MethodDefOID` | Count | Description |
+|----------------|-------|-------------|
+| *(none — direct map)* | 7 | Source record directly supports the target determination; no algorithmic transformation |
+| `MT.AFTERIDXDATE` | 5 | Temporal filter — include only source records dated on or after the patient's study index date |
+| `MT.NLPEXTRACTION` | 5 | Structured data extracted from free-text clinical notes via NLP |
+| `MT.FILTERBYVAL` | 3 | Source vitals filtered by vital type to match the target clinical event |

 **Key concepts illustrated:**
-- Multi-source evidence: a single SDTM cell (`CEOCCUR`) can trace to diagnosis codes, vital-sign measurements, *and* NLP-extracted findings simultaneously.
-- Prespecified event algorithms: the lineage captures each step of a composite clinical algorithm (diagnosis code check → temporal filter → vitals threshold → NLP confirmation).
-- NLP lineage: free-text clinical notes are treated as a legitimate source, with the `NLPExtraction` transformation type documenting the extraction.
+- **Multi-source evidence:** a single SDTM cell (`CEOCCUR`) can trace to diagnosis codes, vital-sign measurements, *and* NLP-extracted findings simultaneously.
+- **Prespecified event algorithms:** the lineage captures each step of a composite clinical algorithm (diagnosis code check → temporal filter → vitals threshold → NLP confirmation).
+- **NLP lineage:** free-text clinical notes are a legitimate source, with `MT.NLPEXTRACTION` referencing the Define-XML `MethodDef` that documents the extraction logic.

 → See [`example1/README.md`](example1/README.md) for the full algorithm definitions.
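
The `MT.AFTERIDXDATE` step described in the Example 1 table amounts to a date comparison per patient. The sketch below is only an illustration of that idea, not the repository's implementation: the record layout, the field names (`patient_id`, `event_date`, `code`), and the index dates are all hypothetical.

```python
from datetime import date

# Hypothetical per-patient study index dates (not taken from the example data).
INDEX_DATES = {"001": date(2023, 1, 15), "002": date(2023, 3, 1)}

# Hypothetical source diagnosis records; field names are illustrative only.
records = [
    {"patient_id": "001", "event_date": date(2022, 12, 30), "code": "I10"},
    {"patient_id": "001", "event_date": date(2023, 1, 15), "code": "I10"},
    {"patient_id": "002", "event_date": date(2023, 4, 2), "code": "I21.9"},
]


def after_index_date(rows, index_dates):
    """Keep rows dated on or after the owning patient's index date."""
    return [
        row for row in rows
        if row["event_date"] >= index_dates[row["patient_id"]]
    ]


# The 2022-12-30 record precedes patient 001's index date and is dropped.
kept = after_index_date(records, INDEX_DATES)
```

A record dated exactly on the index date is kept, matching the "on or after" wording of the table.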

 ---

-### Example 2 — Laboratory Results (LB) and Adverse Events (AE) from EHR Labs
+### Example 2 — Labs (LB) and Adverse Events (AE): Elevated Liver Enzyme

 **SDTM domains:** LB (Laboratory Test Results), AE (Adverse Events)
 **Source table:** `LabResults` (LOINC-coded lab results with raw values in original units)
 **Subjects:** 2 (001, 002) × 3 liver-enzyme tests × 2 visits = 12 LB records + 1 AE record
-**Lineage entries:** 99 `<MapID>` elements
+**Lineage entries:** 101 `<rwdl:MapID>` elements

 This example traces LOINC-coded EHR lab data through unit conversion into the SDTM LB domain, then derives an adverse event (hepatic enzyme elevation) in the AE domain:
-| `DirectMap` | 39 | One-to-one mappings (LOINC → `LBTESTCD`, visit date → `LBDTC`, patient ID → `USUBJID`, etc.) |
-| `LabValueParsing` | 24 | Parsing composite result strings (e.g., `"0.3507 µkat/L"`) into numeric value and unit components |
-| `UnitConversion` | 24 | Converting original units (µkat/L) to standard units (U/L) with stored results in `LBSTRES`/`LBSTRESU` |
-| `ElevatedLiverEnzyme` | 12 | Algorithmic derivation identifying elevated ALT/AST/ALP to produce the AE record |
+| `MethodDefOID` | Count | Description |
+|----------------|-------|-------------|
+| *(none — direct map)* | 39 | LOINC code → `LBTEST`, visit date → `LBDTC`, `AETERM` → `AEDECOD`/`AELLTCD`, etc. |
+| `MT.LABVALPARSING` | 24 | Parse composite result strings (e.g., `"0.3507 µkat/L"`) into numeric value and unit components |
+| `MT.UNITCONV` | 24 | Convert original units (µkat/L) to standard units (U/L); results in `LBSTRES`/`LBSTRESU` |
+| `MT.ELEVATEDLIVERENZYME` | 12 | Evaluate ALT/AST/ALP against reference range upper limits to derive the AE record |

 **Key concepts illustrated:**
-- Multi-step transformations: a single lab result passes through parsing → conversion → standardization, each step recorded as a separate lineage entry.
-- Cross-domain derivation: the AE domain record is derived from the LB domain, which is itself derived from source EHR data — the lineage captures both hops.
-- High coverage density: 99 lineage entries across 13 SDTM records demonstrates cell-level traceability at scale, including every standard-range indicator (`LBSTNRLO`, `LBSTNRHI`, `LBNRIND`).
+- **Multi-step transformations:** a single lab result passes through parsing → conversion → standardization, each step a separate lineage entry.
+- **Cross-domain derivation:** the AE domain record is derived from the LB domain, which is itself derived from source EHR data — the lineage captures both hops.
+- **High coverage density:** 101 lineage entries across 13 SDTM records demonstrates cell-level traceability at scale, including every standard-range indicator (`LBSTNRLO`, `LBSTNRHI`, `LBNRIND`).

 → See [`example2/README.md`](example2/README.md) for the full algorithm definitions.
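
The parsing and conversion steps named in the Example 2 table (`MT.LABVALPARSING`, then `MT.UNITCONV`, then a reference-range check) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the algorithm in `example2/README.md`: the function names are hypothetical, and the upper limit passed to the range check is a made-up value. The factor of 60 follows from the unit definitions (1 kat = 1 mol/s, 1 U = 1 µmol/min).

```python
import re

MICRO = "\u00b5"    # micro sign, as in "µkat/L"
UKAT_TO_U = 60.0    # 1 µkat = 1 µmol/s = 60 µmol/min = 60 U


def parse_lab_value(raw):
    """Split a composite result such as '0.3507 µkat/L' into (value, unit)."""
    match = re.fullmatch(r"\s*(\d*\.?\d+)\s+(\S+)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized lab result: {raw!r}")
    # Normalize Greek mu (U+03BC) to the micro sign so both spellings match.
    unit = match.group(2).replace("\u03bc", MICRO)
    return float(match.group(1)), unit


def to_standard_units(value, unit):
    """Convert a catalytic-activity concentration to U/L."""
    if unit == MICRO + "kat/L":
        return value * UKAT_TO_U, "U/L"
    if unit == "U/L":
        return value, unit
    raise ValueError(f"no conversion defined for unit {unit!r}")


def exceeds_upper_limit(value_u_per_l, upper_limit):
    """Return True when the standardized value is above the reference upper limit."""
    return value_u_per_l > upper_limit


value, unit = parse_lab_value("0.3507 \u00b5kat/L")
standard_value, standard_unit = to_standard_units(value, unit)
```

With the sample string from the table, `standard_value` comes out at roughly 21.04 U/L. Keeping the three functions separate mirrors the point made above that each step is recorded as its own lineage entry.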
@@ -129,15 +132,32 @@ exampleN/
 ### `define.xml`

-A standard [CDISC Define-XML 2.1](https://www.cdisc.org/standards/data-exchange/define-xml) file with one addition — the `rwdl` namespace extension that references the companion lineage file:
+A standard [CDISC Define-XML 2.1](https://www.cdisc.org/standards/data-exchange/define-xml) file extended with two additions:
+
+1. **`MethodDef` elements** — one per non-direct transformation, describing the algorithm applied. Each `MethodDefOID` in `rwd-lineage.xml` resolves to one of these.
+2. **`rwdl:LineageRef`** — points to the companion lineage file via a standard `def:leaf` reference.
 <!-- Standard ItemGroupDef / ItemDef elements follow -->
 </MetaDataVersion>
 </Study>
@@ -146,50 +166,93 @@ A standard [CDISC Define-XML 2.1](https://www.cdisc.org/standards/data-exchange/

 ### `rwd-lineage.xml`

-The core deliverable. Each `<MapID>` element represents one source-to-target cell mapping:
+The core deliverable. The document has two top-level layers inside `rwdl:Lineage`:
+
+- **`rwdl:SourceMetadata`** (optional) — assertions about the source systems: their names, data models, and the controlled terminologies their coded values are encoded in.
+- **`rwdl:LineageTrail`** — the forensic record: an array of `rwdl:MapID` elements, each a Source→Target pair.
+
+Each `rwdl:MapID` element represents one source-to-target data point mapping:
-Key attributes are documented in the [RWD-Lineage Data Standard Specification](../documents/RWD-Lineage_Data_Standard_Specification.md).
+Key attributes and elements are documented in the [RWD-Lineage Data Standard Specification](../documents/RWD-Lineage_Data_Standard_Specification.md).
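
Because every `MethodDefOID` in `rwd-lineage.xml` is meant to resolve to a `MethodDef` in `define.xml`, a cross-file reference check is a natural sanity test. The sketch below is one possible way to do it with the Python standard library. The `rwdl` namespace URI is a placeholder (the real URI is not shown in this diff), the inline XML is invented sample data, and the function name is hypothetical. The ODM namespace is the one Define-XML 2.x documents use.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"   # ODM namespace used by Define-XML 2.x
RWDL_NS = "urn:example:rwdl"                  # PLACEHOLDER: substitute the real rwdl URI


def unresolved_method_oids(define_xml, lineage_xml):
    """Return MethodDefOID values in the lineage file with no matching MethodDef."""
    define_root = ET.fromstring(define_xml)
    lineage_root = ET.fromstring(lineage_xml)
    defined = {
        el.get("OID") for el in define_root.iter(f"{{{ODM_NS}}}MethodDef")
    }
    referenced = {
        el.get("MethodDefOID")
        for el in lineage_root.iter(f"{{{RWDL_NS}}}MapID")
        if el.get("MethodDefOID") is not None   # direct maps carry no MethodDefOID
    }
    return referenced - defined


# Invented sample documents, trimmed to the elements the check needs.
SAMPLE_DEFINE = f"""<ODM xmlns="{ODM_NS}">
  <Study OID="S1"><MetaDataVersion OID="MDV1">
    <MethodDef OID="MT.UNITCONV" Name="Unit conversion" Type="Computation"/>
  </MetaDataVersion></Study>
</ODM>"""

SAMPLE_LINEAGE = f"""<rwdl:Lineage xmlns:rwdl="{RWDL_NS}">
  <rwdl:LineageTrail>
    <rwdl:MapID MethodDefOID="MT.UNITCONV"/>
    <rwdl:MapID/>
    <rwdl:MapID MethodDefOID="MT.MISSING"/>
  </rwdl:LineageTrail>
</rwdl:Lineage>"""

missing = unresolved_method_oids(SAMPLE_DEFINE, SAMPLE_LINEAGE)
```

An empty result means every reference resolves. Here the sample deliberately includes one dangling reference so the check has something to report.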

 ### `ExampleN.xlsx`

 A companion Excel workbook containing all source tables, SDTM output tables, and the lineage mappings in tabular form. This is provided for human review and is not a normative artifact — the XML files are the machine-readable standard.

 ---

-## Transformation Types Used Across Examples
-
-| Type | Example 1 | Example 2 | Description |
-|------|-----------|-----------|-------------|
-| `DirectMap` | ✓ | ✓ | One-to-one value copy from source to target |
-| `AfterIndexDate` | ✓ | | Temporal filter relative to a study index date |
-| `FilterByValue` | ✓ | | Conditional inclusion based on source data value |
-| `NLPExtraction` | ✓ | | Value extracted from unstructured text via NLP |
-| `LabValueParsing` | | ✓ | Numeric/unit parsing from composite lab result strings |
-| `UnitConversion` | | ✓ | Conversion between measurement unit systems |
-| `ElevatedLiverEnzyme` | | ✓ | Algorithmic derivation of adverse events from lab data |
+## Transformations Across Examples
+
+Transformations are expressed via the `MethodDefOID` attribute on `rwdl:MapID`, which references a `MethodDef` element in `define.xml`. Direct maps — where the source record directly supports the target without an algorithmic transformation — carry no `MethodDefOID`.
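
Since direct maps are identified only by the absence of `MethodDefOID`, per-method counts like those in the example summaries can be reproduced by tallying that attribute over all `rwdl:MapID` elements. The following is a minimal sketch, assuming the attribute sits directly on `rwdl:MapID` as described above. The namespace URI is a placeholder, and the inline XML is invented sample data rather than an excerpt from either example.

```python
from collections import Counter
import xml.etree.ElementTree as ET

RWDL_NS = "urn:example:rwdl"   # PLACEHOLDER: substitute the real rwdl URI
DIRECT_MAP = "(direct map)"    # label used here for MapIDs without a MethodDefOID


def tally_methods(lineage_xml):
    """Count rwdl:MapID elements per MethodDefOID; missing attribute = direct map."""
    root = ET.fromstring(lineage_xml)
    return Counter(
        el.get("MethodDefOID", DIRECT_MAP)
        for el in root.iter(f"{{{RWDL_NS}}}MapID")
    )


# Invented sample lineage trail with two direct maps and two method references.
SAMPLE = f"""<rwdl:Lineage xmlns:rwdl="{RWDL_NS}">
  <rwdl:LineageTrail>
    <rwdl:MapID/>
    <rwdl:MapID MethodDefOID="MT.AFTERIDXDATE"/>
    <rwdl:MapID MethodDefOID="MT.AFTERIDXDATE"/>
    <rwdl:MapID/>
  </rwdl:LineageTrail>
</rwdl:Lineage>"""

counts = tally_methods(SAMPLE)
total = sum(counts.values())
```

Run against a real lineage file, the total should equal the stated number of `<rwdl:MapID>` elements. For Example 1 the summary rows (7, 5, 5, 3) add up to the stated 20. For Example 2 the rows as listed (39, 24, 24, 12) add up to 99 while the stated count is 101, so a tally of the file itself is one way to see which figure is current.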
+
+| `MethodDefOID` | Example 1 | Example 2 | Description |