| 1 |`storage`| string (Enum) | Required | The container type. Values from the **RWDL Storage Type** codelist (see Controlled Terminology): `DATABASE`, `FILESYSTEM`, `API`, `MESSAGE`. |
35
+
| 2 |`structure`| string (Enum) | Required | The addressing mechanism for locating a value within the source. Values from the **RWDL Structure Type** codelist (see Controlled Terminology): `TABULAR`, `PATH`, `OBJECT`. |
36
36
| 3 |`URI`| string | Conditional | The full connection string, file path, or API endpoint. |
37
37
| 4 |`Database`| string | Conditional | The specific database name (Required for `storage="DATABASE"`). |
38
38
| 5 |`Schema`| string | Conditional | The schema name (Required for `storage="DATABASE"`). |
39
39
| 6 |`Table`| string | Conditional | The table name (Required for `storage="DATABASE"`). |
40
-
| 7 |`RowIndex`| integer | Conditional | The row number (One of `RowIndex` or `RowKey` required for `structure="Tabular"`). |
41
-
| 8 |`RowKey`| string/integer | Conditional | The Primary Key field (One of `RowIndex` or `RowKey` required for `structure="Tabular"`). |
40
+
| 7 |`RowIndex`| integer | Conditional | The row number (One of `RowIndex` or `RowKey` required for `structure="TABULAR"`). |
41
+
| 8 |`RowKey`| string/integer | Conditional | The Primary Key field (One of `RowIndex` or `RowKey` required for `structure="TABULAR"`). |
42
42
| 9 |`RowKeyValue`| string/integer | Conditional | The Primary Key value (Required if `RowKey` is used). |
43
-
| 10 |`ColumnName`| string | Conditional | The header/variable name (Required for `structure="Tabular"`). |
44
-
| 11 |`Path`| string | Conditional | The navigation string (XPath/JSONPath) (Required for `structure="Tree"`). |
45
-
| 12 |`Format`| string | Optional | The specific format of the file or response (e.g., "JSON", "XML", "CSV"). |
43
+
| 10 |`ColumnName`| string | Conditional | The header/variable name (Optional for `structure="TABULAR"` — omitted for key-value-shaped data with row identifiers but no distinct column dimension). |
44
+
| 11 |`Path`| string | Conditional | The navigation string used to address a value (e.g., XPath, JSONPath, FHIRPath, Cypher, SPARQL) (Required for `structure="PATH"`). The syntax is declared on the `Path` element via the `syntax` attribute. |
45
+
| 12 |`Format`| string (Enum) | Optional | The serialization format of the source. Values from the **RWDL Data Format** codelist (see Controlled Terminology), e.g., `JSON`, `XML`, `CSV`, `PARQUET`, `XLSX`, `PDF`. |
46
46
47
47
### Coordinates
48
48
49
-
List of supported coordinates — designed to be extensible by defining the `structure` attribute in the XML schema (e.g., `type="Graph"` or `type="Stream"`).
49
+
The `structure` and `storage` attributes are governed by controlled terminology. See the Controlled Terminology section for the full codelists, definitions, and submission values.
50
50
51
51
#### Structural Formats
52
52
53
-
-**Tabular** — Data organized in a row-and-column format (e.g., CSV, SQL Tables, SAS Datasets).
54
-
-**Tree** — Data organized in a hierarchical, nested format (e.g., JSON, XML, FHIR resources).
55
-
-**Files** — Data treated as a singular object or blob within a directory structure (e.g., PDF reports, images).
53
+
The `structure` attribute classifies how a value within a source is addressed, not the data model of the source itself.
54
+
55
+
-**TABULAR** — Value addressed by row identifier (index or key) and column name (e.g., SQL tables, SAS XPT, CSV files, key-value stores).
56
+
-**PATH** — Value addressed by a path or query expression that locates the value within a structured source (e.g., JSON, XML, FHIR resources, property graphs, RDF triplestores). The syntax of the path expression is declared on the `Path` element.
57
+
-**OBJECT** — Value addressed as a whole object with no sub-addressing; the URI is the location (e.g., PDF reports, medical images, binary blobs).
56
58
57
59
**Scope:**
58
-
-*In Scope (Current):* Deterministic, static structures where a value's location can be explicitly defined by a rigid index, key, or path (e.g., "Row 5, Col A" or `$.patient.id`).
59
-
-*Out of Scope (Extensible):* Non-deterministic or unstructured data requiring semantic interpretation (e.g., free-text clinical notes requiring NLP, video/audio streams, graph databases relying on complex pattern matching).
60
+
-*In Scope (Current):* Deterministic, static structures where a value's location can be explicitly defined by an index, key, path expression, or URI alone.
61
+
-*Out of Scope:* Non-deterministic or unstructured data requiring semantic interpretation (e.g., free-text clinical notes requiring NLP, video/audio streams).
-**Filesystem** — Flat files stored on a local disk, network drive, or object storage (e.g., S3).
65
-
-**API** — Data accessible via web service endpoints (e.g., REST, SOAP).
65
+
-**DATABASE** — Structured data engines accessed via connection protocol (e.g., SQL, NoSQL).
66
+
-**FILESYSTEM** — Flat files on local disk, network share, or object storage (e.g., POSIX, S3, Azure Blob, GCS).
67
+
-**API** — Data accessed via request/response web service endpoint (e.g., REST, SOAP, GraphQL, FHIR API).
68
+
-**MESSAGE** — Data delivered as discrete units over a message transport or event stream (e.g., HL7 v2 over MLLP, FHIR Messaging, Kafka, Kinesis, AMQP, MQTT, webhooks).
66
69
67
70
**Scope:**
68
-
-*In Scope (Current):* Standard digital repositories accessible via common, widely supported protocols (JDBC/ODBC, POSIX/S3, HTTP/REST).
69
-
-*Out of Scope (Extensible):* Physical media (paper records requiring OCR), proprietary legacy systems without standard connectivity, and Distributed Ledger Technology (blockchain).
71
+
-*In Scope (Current):* Standard digital repositories accessible via common, widely supported protocols (JDBC/ODBC, POSIX/S3, HTTP/REST, message broker protocols).
72
+
-*Out of Scope:* Physical media (paper records requiring OCR), proprietary legacy systems without standard connectivity, and Distributed Ledger Technology (blockchain).
70
73
71
74
### Lineage Trail Attributes
72
75
@@ -87,28 +90,110 @@ List of supported coordinates — designed to be extensible by defining the `str
87
90
88
91
#### Storage Coordinates
89
92
90
-
**Database:**
93
+
**Database (`storage="DATABASE"`):**
91
94
-`URI` — The connection string (e.g., `jdbc:postgresql://host:port/db`).
92
95
-`Database` — The specific database name context.
93
96
-`Schema` — The schema name (e.g., `public`, `dbo`, `clinical_data`).
97
+
-`Table` — The table name.
94
98
95
-
**Filesystem:**
99
+
**Filesystem (`storage="FILESYSTEM"`):**
96
100
-`URI` — The full file path or object storage URI (e.g., `file://server/share/data.csv` or `s3://bucket/key`).
97
101
98
-
**API:**
102
+
**API (`storage="API"`):**
99
103
-`URI` — The full endpoint URL including query parameters (e.g., `https://api.hospital.org/fhir/Patient/123`).
100
104
105
+
**Message (`storage="MESSAGE"`):**
106
+
-`URI` — The transport endpoint or topic identifier (e.g., `kafka://broker:9092/topic-adt`, `mllp://hospital-feed:2575`).
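
The storage coordinates listed above can be checked mechanically. The following is a minimal sketch, not part of the specification: it assumes a coordinate is held as a plain Python dict keyed by attribute name, and it treats every coordinate listed under a storage type as required (the attribute table marks `URI` only as Conditional, so that part is an assumption). The names `REQUIRED_BY_STORAGE` and `missing_storage_fields` are hypothetical.

```python
# Hypothetical mapping from RWDL Storage Type to the coordinates listed for it.
# Assumption: every listed coordinate is treated as required.
REQUIRED_BY_STORAGE = {
    "DATABASE": ("URI", "Database", "Schema", "Table"),
    "FILESYSTEM": ("URI",),
    "API": ("URI",),
    "MESSAGE": ("URI",),
}


def missing_storage_fields(coord: dict) -> list:
    """Return the names of storage coordinates that are absent or empty."""
    storage = coord.get("storage")
    if storage not in REQUIRED_BY_STORAGE:
        # The Storage Type codelist is non-extensible, so any other value is rejected.
        raise ValueError(f"unknown storage type: {storage!r}")
    return [name for name in REQUIRED_BY_STORAGE[storage] if not coord.get(name)]


db_coord = {
    "storage": "DATABASE",
    "URI": "jdbc:postgresql://host:port/db",
    "Database": "db",
    "Schema": "public",
}
print(missing_storage_fields(db_coord))
```

Here the database coordinate omits `Table`, so that name is the only one reported as missing.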
107
+
101
108
#### Structural Coordinates
102
109
103
-
**Tabular Data:**
104
-
-`RowIndex` — The specific row number or Primary Key value identifying the record.
105
-
-`ColumnName` — The header name or variable name of the specific cell.
110
+
**Tabular (`structure="TABULAR"`):**
111
+
-`RowIndex` — The specific row number, OR
112
+
-`RowKey` + `RowKeyValue` — The primary key field name and its value.
113
+
-`ColumnName` — The header or variable name (omitted for key-value-shaped data).
114
+
115
+
**Path-Addressable (`structure="PATH"`):**
116
+
-`Path` — The navigation or query expression used to address the value, with `syntax` attribute declaring the expression language (e.g., XPath for XML, JSONPath for JSON, FHIRPath for FHIR resources, Cypher for property graphs, SPARQL for RDF triplestores).
117
+
118
+
**Object (`structure="OBJECT"`):**
119
+
-`URI` — The identifier of the object as a whole. No sub-addressing.
120
+
121
+
122
+
123
+
## Controlled Terminology
124
+
125
+
This section defines the controlled terminology (codelists) governing enumerated attributes in RWD Lineage. Codelists are submitted to the CDISC Controlled Terminology team under the `RWDL` prefix and are intended to be published through CDISC and NCI Enterprise Vocabulary Services (NCI-EVS) on the standard CDISC release cadence.
126
+
127
+
The codelists in this section are finalized for V1. Additional codelists (Path Syntax, Data Model) are under discussion and will be added in a future revision once decisions are settled.
128
+
129
+
### RWDL Storage Type
130
+
131
+
Governs the `storage` attribute on the Coordinate element.
132
+
133
+
**Extensibility:** Non-extensible. The four values comprehensively cover the architectural categories of data access (query-connection, file-path, request/response, message transport).
134
+
135
+
| Submission Value | Preferred Term | Definition |
|`DATABASE`| Database | Structured data engine accessed via connection protocol (SQL, NoSQL). |
138
+
|`FILESYSTEM`| Filesystem | Flat files on local disk, network share, or object storage (POSIX, S3, Azure Blob, GCS). |
139
+
|`API`| Application Programming Interface | Data accessed via request/response web service endpoint (REST, SOAP, GraphQL, FHIR API). |
140
+
|`MESSAGE`| Message | Data delivered as discrete units over a message transport or event stream (HL7 v2, FHIR Messaging, Kafka, Kinesis, AMQP, MQTT, webhooks). |
141
+
142
+
### RWDL Structure Type
143
+
144
+
Governs the `structure` attribute on the Coordinate element. Each value corresponds to a distinct addressing mechanism rather than to the data model of the source.
145
+
146
+
**Extensibility:** Non-extensible. The three values correspond directly to the addressing mechanisms the specification itself defines (row-and-column, path expression, whole-object).
147
+
148
+
| Submission Value | Preferred Term | Definition | Required Addressing |
|`TABULAR`| Tabular | Value addressed by row identifier and column name. |`RowIndex` or (`RowKey` + `RowKeyValue`); plus `ColumnName` (optional for key-value-shaped data). |
151
+
|`PATH`| Path-Addressable | Value addressed by a path or query expression that locates the value within a structured source. |`Path` element with `syntax` attribute. |
152
+
|`OBJECT`| Object | Value is addressed as a whole object with no sub-addressing; the URI is the location. |`URI` only. No `RowIndex`, `ColumnName`, or `Path`. |
153
+
154
+
**Coverage notes:**
155
+
- Tree-structured sources (JSON, XML, FHIR resources) are addressed as `structure="PATH"` with `syntax="JSONPath"`, `"XPath"`, or `"FHIRPath"`.
156
+
- Graph sources (property graphs, RDF triplestores) are addressed as `structure="PATH"` with `syntax="Cypher"` or `"SPARQL"`.
157
+
- Key-value stores (Redis, DynamoDB) are addressed as `structure="TABULAR"` with `RowKey`/`RowKeyValue` populated and `ColumnName` omitted.
158
+
- Whole-object sources (PDF reports, medical images, opaque blobs) are addressed as `structure="OBJECT"`.
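
The Required Addressing column and the coverage notes above can be read as a small set of consistency rules. The sketch below is illustrative only and makes several assumptions: a coordinate is a plain Python dict, the `Path` element is modelled as a nested dict with hypothetical `value` and `syntax` keys, and a `TABULAR` coordinate carrying both `RowIndex` and `RowKey` is accepted because the specification text does not say whether that is allowed. The function name `check_structure` is hypothetical.

```python
def check_structure(coord: dict) -> list:
    """Return a list of addressing problems; an empty list means none were found."""
    problems = []
    structure = coord.get("structure")

    if structure == "TABULAR":
        # One of RowIndex or RowKey identifies the row; ColumnName stays optional.
        if "RowIndex" not in coord and "RowKey" not in coord:
            problems.append("TABULAR requires RowIndex or RowKey")
        if "RowKey" in coord and "RowKeyValue" not in coord:
            problems.append("RowKeyValue is required when RowKey is used")
    elif structure == "PATH":
        # Assumption: Path is modelled as {"value": ..., "syntax": ...}.
        path = coord.get("Path")
        if not isinstance(path, dict) or not path.get("value"):
            problems.append("PATH requires a Path expression")
        elif not path.get("syntax"):
            problems.append("Path must declare its syntax")
    elif structure == "OBJECT":
        # Whole-object addressing: URI only, no sub-addressing.
        if not coord.get("URI"):
            problems.append("OBJECT requires URI")
        for name in ("RowIndex", "ColumnName", "Path"):
            if name in coord:
                problems.append(f"OBJECT must not carry {name}")
    else:
        problems.append(f"unknown structure type: {structure!r}")
    return problems


# Key-value-shaped data: row key populated, ColumnName omitted.
kv = {"structure": "TABULAR", "RowKey": "id", "RowKeyValue": "u1"}
print(check_structure(kv))
```

A key-value coordinate with `RowKey` and `RowKeyValue` but no `ColumnName` passes, matching the coverage note for key-value stores.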
159
+
160
+
### RWDL Data Format
161
+
162
+
Governs the `Format` attribute on the Coordinate element. Scoped strictly to the serialization layer, that is, how the bytes are arranged.
106
163
107
-
**Tree:**
108
-
-`Path` — The navigation string used to traverse the hierarchy (e.g., XPath for XML, JSONPath for JSON).
164
+
**Extensibility:** Extensible. Sponsors populating a value not present in the published codelist flag the value as an extension using the Define-XML convention (`def:ExtendedValue="Yes"` on the relevant `EnumeratedItem` or `CodeListItem` within the CodeList) and are encouraged to contribute commonly used extensions back to CDISC for consideration in future codelist versions.
109
165
110
-
**Files:**
111
-
-`URI` — The identifier of the specific file if the lineage points to the file as a whole object.
166
+
| Submission Value | Preferred Term | Definition |