Commit 0797ff6

Merge pull request #79 from calypr/fixes/doc-links
Fixing some of the dead links
2 parents d679456 + d039db2 commit 0797ff6

83 files changed

Lines changed: 844 additions & 874 deletions

docs/calypr/data/git-drs.md

Lines changed: 2 additions & 2 deletions
@@ -26,9 +26,9 @@ Visit the [Quick Start Guide](../quick-start.md) for detailed, OS-specific insta
2626
| :--- | :--- |
2727
| **git-drs** | Manages large file tracking, storage, and DRS indexing. |
2828
| **forge** | Handles metadata validation, transformation (ETL), and publishing. |
29-
| **data-client** | Administrative tool for managing [collaborators and access requests](../../tools/data-client/access_requests.md). |
29+
| **data-client** | Administrative tool for managing [collaborators and access requests](../../tools/data-client/docs/access_requests.md). |
3030
{: .caption }
3131

3232
## Git DRS Workflows
3333

34-
For complete Git DRS documentation including project initialization, file management, and upload workflows, see the [Git DRS Quick Start](../../tools/git-drs/quickstart.md).
34+
For complete Git DRS documentation including project initialization, file management, and upload workflows, see the [Git DRS Quick Start](../../tools/git-drs/docs/quickstart.md).

docs/calypr/data/integration.md

Lines changed: 61 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -1,39 +1,59 @@
11

22
# Integrating your data
33

4-
Converting tabular data (CSV, TSV, spreadsheet, database table) into FHIR (Fast Healthcare Interoperability Resources) involves several steps to map the data in the spreadsheet to FHIR's resource structure. Here is what you need to know to get started:
4+
Converting tabular data (CSV, TSV, spreadsheet, database table) into FHIR (Fast Healthcare Interoperability Resources) involves mapping your data to FHIR's resource structure. This guide walks you through the integration process from data preparation to validation.
55

6-
As you create a upload files, you can tag them with identifiers which by default will create minimal, skeleton graph.
6+
## Overview
77

8-
You can retrieve that data using the [git-drs](../../tools/git-drs/index.md) command line tool, and update the metadata using [forge](../../tools/forge/index.md) to create a more complete graph representing your study.
8+
When you create and upload files, you can tag them with identifiers to establish an initial skeleton graph. You can then retrieve that data using the [git-drs](../../tools/git-drs/docs/index.md) command line tool and enhance the metadata using [forge](../../tools/forge/docs/index.md) to create a more complete graph representing your study.
99

10-
You may choose to work with the data in its "native" JSON format, or convert it to a tabular format for integration. The system will re-convert tabular data back to JSON for submittal.
10+
You may work with data in its "native" JSON format or convert it to a tabular format for integration. The system automatically re-converts tabular data back to JSON for submission.
1111

12-
The process of integrating your data into the graph involves several steps:
12+
## Integration process
1313

14-
* Step 1: Identify Data and FHIR Resources
15-
* Inventory tabular data: Review the spreadsheet to understand the types of data it contains (e.g., patient demographics, lab results, medications).
16-
* Understand FHIR Resources: Familiarize yourself with FHIR resources relevant to the data in your spreadsheet (e.g., Patient, Observation, Specimen, etc.).
14+
The process of integrating your data into the graph involves six key steps:
1715

18-
* Step 2: Mapping Spreadsheet Columns to FHIR Fields
19-
* Analyze Columns: Map each column in the spreadsheet to corresponding fields in FHIR resources. For instance, you may have a field called biopsy_anatomical_location with content of "Prostate needle biopsies", that would map to Specimen.collection.method and Specimen.collection.bodySite.
20-
* Handle Relationships: Identify how different pieces of data relate to each other and how they map to FHIR resource relationships (e.g., linking patients to their observations).
16+
### Step 1: Identify Data and FHIR Resources
17+
18+
Before mapping, understand what data you have and which FHIR resources it should map to.
19+
20+
* **Inventory your tabular data**: Review your spreadsheet to understand the types of data it contains (e.g., patient demographics, lab results, medications, specimen information).
21+
* **Understand relevant FHIR resources**: Familiarize yourself with FHIR resources that match your data types (e.g., [Patient](https://hl7.org/fhir/patient.html), [Observation](https://hl7.org/fhir/observation.html), [Specimen](https://hl7.org/fhir/specimen.html), [DocumentReference](https://hl7.org/fhir/documentreference.html)).
22+
23+
### Step 2: Map Spreadsheet Columns to FHIR Fields
24+
25+
Create a mapping between your data columns and FHIR resource fields.
26+
27+
* **Analyze your columns**: Systematically map each column to corresponding FHIR resource fields. For example, if you have a `biopsy_anatomical_location` field with values like "Prostate needle biopsies", map it to the appropriate FHIR Specimen fields such as `collection.method` and `collection.bodySite`.
28+
* **Handle relationships**: Identify how different data pieces relate to each other and map them to FHIR resource references (e.g., linking Patient resources to their Observation resources).
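
As an illustration of the column mapping described in Step 2, the sketch below turns one spreadsheet row into a FHIR Specimen resource. The column names `specimen_id` and `subject_id`, the `LOCATION_MAP` lookup table, and the `specimen_from_row` helper are hypothetical and not part of forge or git-drs. The shapes of `collection.method` and `collection.bodySite` assume FHIR R4, where both are CodeableConcepts.

```python
# Hypothetical lookup table: splits a free-text location into a
# collection method and a body site. Real projects would likely use coded values.
LOCATION_MAP = {
    "Prostate needle biopsies": {"method": "Needle biopsy", "bodySite": "Prostate"},
}


def specimen_from_row(row: dict) -> dict:
    """Map one spreadsheet row (assumed column names) to a FHIR R4 Specimen."""
    location = LOCATION_MAP[row["biopsy_anatomical_location"]]
    return {
        "resourceType": "Specimen",
        "id": row["specimen_id"],
        # Relationship: link the specimen back to its Patient via a reference.
        "subject": {"reference": f"Patient/{row['subject_id']}"},
        "collection": {
            "method": {"text": location["method"]},
            "bodySite": {"text": location["bodySite"]},
        },
    }


example_row = {
    "specimen_id": "SPEC-12345",
    "subject_id": "SUBJ-001",
    "biopsy_anatomical_location": "Prostate needle biopsies",
}
specimen = specimen_from_row(example_row)
```

An explicit lookup table keeps the mapping decisions reviewable in one place, which may help when the same free-text value has to be split across several FHIR fields.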
2129

22-
* Step 3: Data Transformation and Structure
23-
* Prepare Data: Ensure data consistency and format alignment. Dates, codes, and identifiers should comply with FHIR standards.
24-
* Normalize Data: Split the spreadsheet data into FHIR-compliant resources.
30+
### Step 3: Transform and Structure Your Data
31+
32+
Prepare your data to comply with FHIR standards.
33+
34+
* **Ensure consistency**: Validate data formats, especially dates (ISO 8601 format), codes (use appropriate code systems), and identifiers. Remove duplicates and address missing values.
35+
* **Normalize into resources**: Split your spreadsheet data into separate FHIR resources. For example, separate patient demographics into Patient resources and test results into Observation resources.
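
A minimal sketch of the normalization described in Step 3, assuming a row with hypothetical columns `subject_id`, `sex`, `test_name`, `test_date`, `result`, and `unit`, and assuming dates arrive as MM/DD/YYYY. The helpers `to_iso_date` and `split_row` are illustrative only and are not part of forge.

```python
from datetime import datetime


def to_iso_date(value: str) -> str:
    """Convert an assumed MM/DD/YYYY date string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, "%m/%d/%Y").date().isoformat()


def split_row(row: dict) -> tuple[dict, dict]:
    """Split one flat row into a Patient resource and an Observation resource."""
    patient = {
        "resourceType": "Patient",
        "id": row["subject_id"],
        "gender": row["sex"].lower(),
    }
    observation = {
        "resourceType": "Observation",
        "id": f"{row['subject_id']}-{row['test_name']}",
        "status": "final",
        "code": {"text": row["test_name"]},
        # The Observation points back at the Patient it was measured on.
        "subject": {"reference": f"Patient/{row['subject_id']}"},
        "effectiveDateTime": to_iso_date(row["test_date"]),
        "valueQuantity": {"value": float(row["result"]), "unit": row["unit"]},
    }
    return patient, observation


patient, observation = split_row({
    "subject_id": "SUBJ-001",
    "sex": "Female",
    "test_name": "hemoglobin",
    "test_date": "03/07/2024",
    "result": "13.5",
    "unit": "g/dL",
})
```

When several rows share a `subject_id`, the resulting Patient resources would need to be de-duplicated before they are written to an `.ndjson` file.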
36+
37+
### Step 4: Use FHIR Tooling and Validation
2538

26-
* Step 4: Utilize provided FHIR Tooling or Libraries
27-
* FHIR Tooling: Use `forge meta` and associated libraries to support data conversion and validation.
28-
* Validation: Use `forge validate` to validate the transformed data against FHIR specifications to ensure compliance and accuracy.
39+
Leverage provided tools to convert and validate your data.
2940

30-
* Step 5: Import into FHIR-Compatible System
31-
* Load Data: Use `git commit` and `git push` to manage your local data state.
32-
* Testing and Verification: Ensure your data appears correctly in the portal and analysis tools after a successful push.
41+
* **Convert your data**: Use `forge meta` to transform your data into FHIR-compliant JSON format.
42+
* **Validate compliance**: Use `forge validate` to check that your transformed data conforms to FHIR specifications, so that errors are caught before submission.
3343

34-
* Step 6: Iterate and Refine
35-
* Review and Refine: Check for any discrepancies or issues during the import process. Refine the conversion process as needed.
36-
* Feedback Loop: Gather feedback from users or stakeholders to improve the mapping and conversion process.
44+
### Step 5: Commit and Deploy Your Data
45+
46+
Submit your validated data to the system.
47+
48+
* **Version and commit**: Use `git commit` to track your changes with descriptive messages.
49+
* **Deploy**: Use `git push` to submit your data. Verify that your data appears correctly in the portal and analysis tools after deployment.
50+
51+
### Step 6: Iterate and Improve
52+
53+
Refine your data based on feedback and validation results.
54+
55+
* **Review and validate**: Check for discrepancies or issues in how your data appears in the system. Review user feedback.
56+
* **Refine mappings**: Make adjustments to your data transformations and FHIR mappings as needed to improve accuracy and completeness.
3757

3858

3959
## Ontologies
@@ -72,41 +92,48 @@ The mapping process typically involves several steps:
7292

7393
## Identifiers
7494

75-
Identifiers in FHIR references typically include the following components: [see more](https://hl7.org/fhir/datatypes.html#Identifier)
95+
[Identifiers in FHIR](https://hl7.org/fhir/datatypes.html#Identifier) are strings (typically numeric or alphanumeric) that uniquely identify an object or entity within a system. They are essential for connecting resources within FHIR to external systems and maintaining data integrity across platforms.
7696

77-
> A string, typically numeric or alphanumeric, that is associated with a single object or entity within a given system. Typically, identifiers are used to connect content in resources to external content available in other frameworks or protocols.
97+
Identifiers have two key components:
7898

79-
System: Indicates the system or namespace to which the identifier belongs. By default the namespace is `http://calypr-public.ohsu.edu/<project-id>`.
99+
* **System**: The namespace or system to which the identifier belongs. The default namespace is `http://calypr-public.ohsu.edu/<project-id>`. This ensures your identifiers don't conflict with identifiers from other systems.
100+
* **Value**: The actual identifier string (e.g., a subject ID like "SUBJ-001" or a specimen ID like "SPEC-12345").
80101

81-
Value: The actual value of the identifier within the specified system. For instance, a lab controlled subject identifier or a specimen identifier.
102+
**Example**: A patient identifier might be represented as:
103+
- System: `http://calypr-public.ohsu.edu/study-123`
104+
- Value: `PAT-00542`
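
The system/value pair above could be assembled in code roughly as follows. This is a sketch: `make_identifier` and `DEFAULT_NAMESPACE` are hypothetical names introduced here, and the only Calypr-specific assumption is the default namespace pattern stated above.

```python
import json

# Default namespace prefix from the documentation; the project ID is appended.
DEFAULT_NAMESPACE = "http://calypr-public.ohsu.edu"


def make_identifier(project_id: str, value: str) -> dict:
    """Build a FHIR Identifier with a project-scoped system and a local value."""
    return {"system": f"{DEFAULT_NAMESPACE}/{project_id}", "value": value}


patient = {
    "resourceType": "Patient",
    # `identifier` is a list because a resource may carry several identifiers.
    "identifier": [make_identifier("study-123", "PAT-00542")],
}
patient_json = json.dumps(patient)
```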
82105

83106

84107

85108
## References
86109

87-
By using identifiers in references, FHIR ensures that data can be accurately linked, retrieved, and interpreted across different systems and contexts within the healthcare domain, promoting interoperability and consistency in data exchange. [see more](https://hl7.org/fhir/references.html)
110+
By using [identifiers in references](https://hl7.org/fhir/references.html), FHIR ensures that data can be accurately linked, retrieved, and interpreted across different systems and contexts within the healthcare domain, promoting interoperability and consistency in data exchange.
88111

89112
> Many of the defined elements in a resource are references to other resources. Using these references, the resources combine to build a web of information about healthcare.
90113
91114

92115
## Key resources
93116

94117
### ResearchStudy
95-
> A scientific study of nature that sometimes includes processes involved in health and disease. [see more](https://hl7.org/fhir/researchstudy.html)
118+
A [scientific study](https://hl7.org/fhir/researchstudy.html) of nature that sometimes includes processes involved in health and disease.
96119

97120
### ResearchSubject
98-
> A ResearchSubject is a participant or object which is the recipient of investigative activities in a research study. [see more](https://hl7.org/fhir/researchsubject.html)
121+
A [ResearchSubject](https://hl7.org/fhir/researchsubject.html) is a participant or object which is the recipient of investigative activities in a research study.
99122

100123

101124
### Patient
102-
> Demographics and other administrative information about an individual or animal receiving care or other health-related services. [see more](https://hl7.org/fhir/patient.html)
125+
A [Patient](https://hl7.org/fhir/patient.html) resource holds demographics and other administrative information about an individual or animal receiving care or other health-related services.
103126

104127
### Specimen
105128

106-
> A sample to be used for analysis. [see more](https://hl7.org/fhir/specimen.html)
129+
A [Specimen](https://hl7.org/fhir/specimen.html) represents a sample collected during a healthcare event and used for analysis or testing. It includes information about the sample type, collection method, and processing.
107130

108131
### DocumentReference
109-
> A reference to a document of any kind for any purpose. [see more](https://hl7.org/fhir/documentreference.html)
132+
A [DocumentReference](https://hl7.org/fhir/documentreference.html) is a reference to a document of any kind for any purpose.
133+
134+
135+
## Next steps
110136

137+
With your data integrated into FHIR format, you can now manage and enhance your metadata. See the [data management section](managing-metadata.md) for detailed guidance on creating, updating, and maintaining your metadata.
111138

112-
See the [data management section](managing-metadata.md) for more information on how to create and upload metadata.
139+
For more information on ontologies and how SNOMED CT codes enhance your data, see the [Ontologies](#ontologies) section above.

docs/calypr/data/introduction.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ Given all of the intricacies healthcare and experimental data, we use Fast Healt
55

66
## What is FHIR?
77

8-
> In a Gen3 data commons, a semantic distinction is made between two types of data: "data files" and "metadata". [more](https://gen3.org/resources/user/dictionary/#understanding-data-representation-in-gen3)
8+
In a Gen3 data commons, a semantic distinction is made between two types of data: ["data files" and "metadata"](https://gen3.org/resources/user/dictionary/#understanding-data-representation-in-gen3).
99

1010
A "data file" could be information like tabulated data values in a spreadsheet or a fastq/bam file containing DNA sequences. The contents of the file are not exposed to the API as queryable properties, so the file must be downloaded to view its content.
1111

docs/calypr/data/managing-metadata.md

Lines changed: 36 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,26 @@
11
# Managing Metadata
22

3-
Metadata in Calypr is formatted using the Fast Healthcare Interoperability Resources (FHIR) schema. If you choose to bring your own FHIR newline delimited json data, you will need to create a directory called “META” in your git-drs repository in the same directory that you initialized your git-drs repository, and place your metadata files in that directory.
4-
The META/ folder contains newline-delimited JSON (.ndjson) files representing FHIR resources describing the project, its data, and related entities. Large files are tracked using Git LFS, with a required correlation between each data file and a DocumentReference resource. This project follows a standardized structure to manage large research data files and associated FHIR metadata in a version-controlled, DRS and FHIR compatible format.
5-
Each file must contain only one type of FHIR resource type, for example META/ResearchStudy.ndjson only contains research study resource typed FHIR objects. The name of the file doesn’t have to match the resource type name, unless you bring your own document references, then you must use DocumentReference.ndjson. For all other FHIR file types this is simply a good organizational practice for organizing your FHIR metadata.
3+
Metadata in Calypr is represented as FHIR resources in newline-delimited JSON files (.ndjson).
4+
5+
If you are bringing your own FHIR metadata, create a `META/` directory at the root of your initialized `git-drs` repository and place your metadata files there.
6+
7+
The `META/` directory contains one file per FHIR resource type. For example, `META/ResearchStudy.ndjson` should contain only `ResearchStudy` resources. Using one file per resource type keeps validation and troubleshooting straightforward.
8+
9+
For projects with Git LFS-managed data files, each data file must have a corresponding `DocumentReference` resource.
610

711
## META/ResearchStudy.ndjson
812

9-
* The File directory structure root research study is based on the 1st Research Study in the document. This research study is the research study that the autogenerated document references are connected to. Any additional research studies that are provided will be ignored when populating the miller table file tree.
13+
* The file tree root in the portal is based on the first `ResearchStudy` record in this file.
14+
* Auto-generated `DocumentReference` resources are linked to that first `ResearchStudy`.
15+
* Additional `ResearchStudy` records may be preserved in metadata, but they are not used to build the default file tree root.
1016
* Contains at least one FHIR ResearchStudy resource describing the project.
1117
* Defines project identifiers, title, description, and key attributes.
1218

1319
## META/DocumentReference.ndjson
1420

1521
* Contains one FHIR DocumentReference resource per Git LFS-managed file.
1622
* Each `DocumentReference.content.attachment.url` field:
17-
* Must exactly match the relative path of the corresponding file in the repository.
23+
* Must exactly match the relative path of the corresponding file in the repository (for example, `data/file1.bam`).
1824
* Example:
1925

2026
```json
@@ -43,19 +49,26 @@ cp ~/my-data/specimens.ndjson META/
4349
cp ~/my-data/document-references.ndjson META/
4450
```
4551

46-
## Other FHIR data
52+
## Other FHIR Data
4753

48-
\[TODO More intro text here\]
54+
You can include additional resource types to represent subjects, specimens, assays, and measurements.
4955

50-
* Patient.ndjson: Participant records.
51-
* Specimen.ndjson: Biological specimens.
52-
* ServiceRequest.ndjson: Requested procedures.
53-
* Observation.ndjson: Measurements or results.
56+
Common examples:
57+
58+
* `Patient.ndjson`: Participant records.
59+
* `ResearchSubject.ndjson`: Participant enrollment in a study.
60+
* `Specimen.ndjson`: Biological specimens.
61+
* `Task.ndjson` or `ServiceRequest.ndjson`: Procedures, pipeline steps, or assay workflow context.
62+
* `Observation.ndjson`: Measurements or results.
5463
* Other valid FHIR resource types as required.
5564

56-
Ensure your FHIR `DocumentReference` resources reference the DRS URIs:
65+
When these files are present, ensure references are internally consistent (for example, a `DocumentReference.subject.reference` should point to an existing `Patient`, `Specimen`, or `ResearchStudy` record).
66+
67+
### Important: `DocumentReference` URL Format
68+
69+
In a `git-drs` repository, `DocumentReference.content.attachment.url` should be the repository-relative file path, not a `drs://` URI.
5770

58-
Example `DocumentReference` linking to S3 file:
71+
Example:
5972

6073
```json
6174
{
@@ -64,7 +77,7 @@ Example `DocumentReference` linking to S3 file:
6477
"status": "current",
6578
"content": [{
6679
"attachment": {
67-
"url": "drs://calypr-public.ohsu.edu/your-drs-id",
80+
"url": "data/sample1.bam",
6881
"title": "sample1.bam",
6982
"contentType": "application/octet-stream"
7083
}
@@ -80,10 +93,10 @@ Example `DocumentReference` linking to S3 file:
8093

8194
## Validating Metadata
8295

83-
To ensure that the FHIR files you have added to the project are correct and pass schema checking, you can use the [Forge tool](../../tools/forge/index.md).
96+
To ensure that the FHIR files you added are valid and graph-consistent, use [Forge validation](../../tools/forge/docs/validation.md).
8497

8598
```bash
86-
forge validate
99+
forge validate data --path META
87100
```
88101

89102
Successful output:
@@ -99,21 +112,21 @@ Fix any validation errors and re-run until all files pass.
99112

100113
### Forge Data Quality Assurance Command Line Commands
101114

102-
If you have provided your own FHIR resources there are two commands that might be useful to you for ensuring that your FHIR metadata will appear on the CALYPR data platform as expected. These commands are validate and check-edge
115+
If you provide your own FHIR resources, these two commands are the most useful checks before submission.
103116

104117
**Validate:**
105118
```bash
106-
forge validate META
119+
forge validate data --path META
107120
# or
108-
forge validate META/DocumentReference.ndjson
121+
forge validate data --path META/DocumentReference.ndjson
109122
```
110123
Validation checks if the provided directory or file will be accepted by the CALYPR data platform. It catches improper JSON formatting and FHIR schema errors.
111124

112125
**Check-edge:**
113126
```bash
114-
forge check-edge META
127+
forge validate edge --path META
115128
# or
116-
forge validate META/DocumentReference.ndjson
129+
forge validate edge --path META --out-dir tmp/graph-check
117130
```
118131
Check-edge ensures that references within your files (e.g., a Patient ID in an Observation) connect to known vertices and aren't "orphaned".
119132

@@ -145,13 +158,13 @@ Check-edge ensures that references within your files (e.g., a Patient ID in an O
145158
* Validates that DocumentReference resources reference the same ResearchStudy via relatesTo or other linking mechanisms.
146159
* If FHIR resources like Patient, Specimen, ServiceRequest, Observation are present, ensures:
147160
* Their id fields are unique.
148-
* DocumentReference correctly refers to those resources (e.g., via subject or related fields).
161+
* DocumentReference correctly refers to those resources (for example, via `subject`).
149162

150163
#### 5\. Cross-Entity Consistency
151164

152165
* If multiple optional FHIR .ndjson files exist:
153166
* Confirms IDs referenced in one file exist in others.
154-
* Detects dangling references (e.g., a DocumentReference.patient ID that's not in Patient.ndjson).
167+
* Detects dangling references (for example, a `DocumentReference.subject.reference` that points to a missing `Patient`).
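
To make the idea of a dangling reference concrete, here is a small sketch of such a check over already-parsed resources. It is not how `forge validate edge` is implemented; `load_ndjson`, `known_references`, and `find_dangling_subjects` are hypothetical helpers, and only `subject.reference` is inspected.

```python
import json


def load_ndjson(path: str) -> list[dict]:
    """Read a newline-delimited JSON file into a list of resources."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def known_references(resources: list[dict]) -> set[str]:
    """Collect 'ResourceType/id' strings for every resource that has an id."""
    return {f"{r['resourceType']}/{r['id']}" for r in resources if "id" in r}


def find_dangling_subjects(doc_refs: list[dict], known: set[str]) -> list[str]:
    """Return 'doc id -> reference' for subject references with no target."""
    dangling = []
    for doc in doc_refs:
        ref = doc.get("subject", {}).get("reference")
        if ref and ref not in known:
            dangling.append(f"{doc.get('id')} -> {ref}")
    return dangling


patients = [{"resourceType": "Patient", "id": "p1"}]
doc_refs = [
    {"resourceType": "DocumentReference", "id": "d1",
     "subject": {"reference": "Patient/p1"}},
    {"resourceType": "DocumentReference", "id": "d2",
     "subject": {"reference": "Patient/p2"}},
]
problems = find_dangling_subjects(doc_refs, known_references(patients))
```

In a real repository, the in-memory lists could be replaced with `load_ndjson("META/Patient.ndjson")` and `load_ndjson("META/DocumentReference.ndjson")`, assuming those files exist.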
155168

156169
---
157170
