calypr
diff --git a/docs/calypr/data/git-drs.md b/docs/calypr/data/git-drs.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/calypr/data/integration.md b/docs/calypr/data/integration.md
Lines changed: 61 additions & 34 deletions
diff --git a/docs/calypr/data/introduction.md b/docs/calypr/data/introduction.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/calypr/data/managing-metadata.md b/docs/calypr/data/managing-metadata.md
Lines changed: 36 additions & 23 deletions
@@ -26,9 +26,9 @@ Visit the [Quick Start Guide](../quick-start.md) for detailed, OS-specific insta
 | :--- | :--- |
 | **git-drs** | Manages large file tracking, storage, and DRS indexing. |
 | **forge** | Handles metadata validation, transformation (ETL), and publishing. |
-| **data-client** | Administrative tool for managing [collaborators and access requests](../../tools/data-client/access_requests.md). |
+| **data-client** | Administrative tool for managing [collaborators and access requests](../../tools/data-client/docs/access_requests.md). |
 {: .caption }
 
 ## Git DRS Workflows
 
-For complete Git DRS documentation including project initialization, file management, and upload workflows, see the [Git DRS Quick Start](../../tools/git-drs/quickstart.md).
+For complete Git DRS documentation including project initialization, file management, and upload workflows, see the [Git DRS Quick Start](../../tools/git-drs/docs/quickstart.md).
@@ -1,39 +1,59 @@
 
 # Integrating your data
 
-Converting tabular data (CSV, TSV, spreadsheet, database table) into FHIR (Fast Healthcare Interoperability Resources) involves several steps to map the data in the spreadsheet to FHIR's resource structure. Here is what you need to know to get started:
+Converting tabular data (CSV, TSV, spreadsheet, database table) into FHIR (Fast Healthcare Interoperability Resources) involves mapping your data to FHIR's resource structure. This guide walks you through the integration process from data preparation to validation.
 
-As you create a upload files, you can tag them with identifiers which by default will create minimal, skeleton graph.
+## Overview
 
-You can retrieve that data using the [git-drs](../../tools/git-drs/index.md) command line tool, and update the metadata using [forge](../../tools/forge/index.md) to create a more complete graph representing your study.
+When you create and upload files, you can tag them with identifiers to establish an initial skeleton graph. You can then retrieve that data using the [git-drs](../../tools/git-drs/docs/index.md) command line tool and enhance the metadata using [forge](../../tools/forge/docs/index.md) to create a more complete graph representing your study.
 
-You may choose to work with the data in its "native" JSON format, or convert it to a tabular format for integration. The system will re-convert tabular data back to JSON for submittal.
+You may work with data in its "native" JSON format or convert it to a tabular format for integration. The system automatically re-converts tabular data back to JSON for submission.
 
-The process of integrating your data into the graph involves several steps:
+## Integration process
 
-* Step 1: Identify Data and FHIR Resources
-    * Inventory tabular data: Review the spreadsheet to understand the types of data it contains (e.g., patient demographics, lab results, medications).
-    * Understand FHIR Resources: Familiarize yourself with FHIR resources relevant to the data in your spreadsheet (e.g., Patient, Observation, Specimen, etc.).
+The process of integrating your data into the graph involves six key steps:
 
-* Step 2: Mapping Spreadsheet Columns to FHIR Fields
-    * Analyze Columns: Map each column in the spreadsheet to corresponding fields in FHIR resources. For instance, you may have a field called biopsy_anatomical_location with content of "Prostate needle biopsies", that would map to Specimen.collection.method and Specimen.collection.bodySite.
-    * Handle Relationships: Identify how different pieces of data relate to each other and how they map to FHIR resource relationships (e.g., linking patients to their observations).
+### Step 1: Identify Data and FHIR Resources
+
+Before mapping, understand what data you have and which FHIR resources it should map to.
+
+* **Inventory your tabular data**: Review your spreadsheet to understand the types of data it contains (e.g., patient demographics, lab results, medications, specimen information).
+* **Understand relevant FHIR resources**: Familiarize yourself with FHIR resources that match your data types (e.g., [Patient](https://hl7.org/fhir/patient.html), [Observation](https://hl7.org/fhir/observation.html), [Specimen](https://hl7.org/fhir/specimen.html), [DocumentReference](https://hl7.org/fhir/documentreference.html)).
+
+### Step 2: Map Spreadsheet Columns to FHIR Fields
+
+Create a mapping between your data columns and FHIR resource fields.
+
+* **Analyze your columns**: Systematically map each column to corresponding FHIR resource fields. For example, if you have a `biopsy_anatomical_location` field with values like "Prostate needle biopsies", map it to the appropriate FHIR Specimen fields such as `collection.method` and `collection.bodySite`.
+* **Handle relationships**: Identify how different data pieces relate to each other and map them to FHIR resource references (e.g., linking Patient resources to their Observation resources).
 
-* Step 3: Data Transformation and Structure
-    * Prepare Data: Ensure data consistency and format alignment. Dates, codes, and identifiers should comply with FHIR standards.
-    * Normalize Data: Split the spreadsheet data into FHIR-compliant resources.
+### Step 3: Transform and Structure Your Data
+
+Prepare your data to comply with FHIR standards.
+
+* **Ensure consistency**: Validate data formats, especially dates (ISO 8601 format), codes (use appropriate code systems), and identifiers. Remove duplicates and address missing values.
+* **Normalize into resources**: Split your spreadsheet data into separate FHIR resources. For example, separate patient demographics into Patient resources and test results into Observation resources.
+
+### Step 4: Use FHIR Tooling and Validation
 
-* Step 4: Utilize provided FHIR Tooling or Libraries
-    * FHIR Tooling: Use `forge meta` and associated libraries to support data conversion and validation.
-    * Validation: Use `forge validate` to validate the transformed data against FHIR specifications to ensure compliance and accuracy.
+Leverage provided tools to convert and validate your data.
 
-* Step 5: Import into FHIR-Compatible System
-    * Load Data: Use `git commit` and `git push` to manage your local data state.
-    * Testing and Verification: Ensure your data appears correctly in the portal and analysis tools after a successful push.
+* **Convert your data**: Use `forge meta` to transform your data into FHIR-compliant JSON format.
+* **Validate compliance**: Use `forge validate` to check that your transformed data conforms to FHIR specifications. This catches errors before submission and ensures your data is valid.
 
-* Step 6: Iterate and Refine
-    * Review and Refine: Check for any discrepancies or issues during the import process. Refine the conversion process as needed.
-    * Feedback Loop: Gather feedback from users or stakeholders to improve the mapping and conversion process.
+### Step 5: Commit and Deploy Your Data
+
+Submit your validated data to the system.
+
+* **Version and commit**: Use `git commit` to track your changes with descriptive messages.
+* **Deploy**: Use `git push` to submit your data. Verify that your data appears correctly in the portal and analysis tools after deployment.
+
+### Step 6: Iterate and Improve
+
+Refine your data based on feedback and validation results.
+
+* **Review and validate**: Check for discrepancies or issues in how your data appears in the system. Review user feedback.
+* **Refine mappings**: Make adjustments to your data transformations and FHIR mappings as needed to improve accuracy and completeness.
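
The mapping in Steps 1 to 3 could be sketched roughly as follows. This is a minimal illustration, not the behavior of `forge meta`: the column names (`subject_id`, `specimen_id`, `collection_date`), the `study-123` project ID, the MM/DD/YYYY input date format, and the helper names are all assumptions, and `bodySite` uses the FHIR R4 CodeableConcept shape with free text only.

```python
import json
from datetime import datetime

# Hypothetical project namespace, following the default pattern in this guide.
SYSTEM = "http://calypr-public.ohsu.edu/study-123"


def to_iso_date(value: str) -> str:
    """Convert an assumed MM/DD/YYYY date to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, "%m/%d/%Y").date().isoformat()


def row_to_resources(row: dict) -> list[dict]:
    """Split one tabular row into a Patient and a Specimen that references it."""
    patient_id = row["subject_id"]
    patient = {
        "resourceType": "Patient",
        "id": patient_id,
        "identifier": [{"system": SYSTEM, "value": patient_id}],
    }
    specimen = {
        "resourceType": "Specimen",
        "id": row["specimen_id"],
        "identifier": [{"system": SYSTEM, "value": row["specimen_id"]}],
        # The relationship from Step 2: the Specimen points at its Patient.
        "subject": {"reference": f"Patient/{patient_id}"},
        "collection": {
            "collectedDateTime": to_iso_date(row["collection_date"]),
            # Free text only; a real mapping would add coded values.
            "bodySite": {"text": row["biopsy_anatomical_location"]},
        },
    }
    return [patient, specimen]


if __name__ == "__main__":
    example_row = {
        "subject_id": "PAT-00542",
        "specimen_id": "SPEC-12345",
        "collection_date": "03/07/2021",
        "biopsy_anatomical_location": "Prostate needle biopsies",
    }
    # Emit one JSON object per line, as in an .ndjson file.
    for resource in row_to_resources(example_row):
        print(json.dumps(resource))
```

In practice, each resource type would be written to its own `.ndjson` file and then checked with `forge validate`.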
 
 
 ## Ontologies
@@ -72,41 +92,48 @@ The mapping process typically involves several steps:
 
 ## Identifiers
 
-Identifiers in FHIR references typically include the following components: [see more](https://hl7.org/fhir/datatypes.html#Identifier)
+[Identifiers in FHIR](https://hl7.org/fhir/datatypes.html#Identifier) are strings (typically numeric or alphanumeric) that uniquely identify an object or entity within a system. They are essential for connecting resources within FHIR to external systems and maintaining data integrity across platforms.
 
-> A string, typically numeric or alphanumeric, that is associated with a single object or entity within a given system. Typically, identifiers are used to connect content in resources to external content available in other frameworks or protocols.
+Identifiers have two key components:
 
-System: Indicates the system or namespace to which the identifier belongs. By default the namespace is `http://calypr-public.ohsu.edu/<project-id>`.
+* **System**: The namespace or system to which the identifier belongs. The default namespace is `http://calypr-public.ohsu.edu/<project-id>`. This ensures your identifiers don't conflict with identifiers from other systems.
+* **Value**: The actual identifier string (e.g., a subject ID like "SUBJ-001" or a specimen ID like "SPEC-12345").
 
-Value: The actual value of the identifier within the specified system. For instance, a lab controlled subject identifier or a specimen identifier.
+**Example**: A patient identifier might be represented as:
+- System: `http://calypr-public.ohsu.edu/study-123`
+- Value: `PAT-00542`
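
As a small sketch of the system/value pairing described above, the hypothetical helper below builds an identifier under the default namespace. The function name and the `study-123` and `study-456` project IDs are illustrative assumptions, not part of any CALYPR tool.

```python
BASE_NAMESPACE = "http://calypr-public.ohsu.edu"


def make_identifier(project_id: str, value: str) -> dict:
    """Build a FHIR Identifier dict scoped to a project namespace."""
    return {"system": f"{BASE_NAMESPACE}/{project_id}", "value": value}


def identifier_key(identifier: dict) -> tuple[str, str]:
    """Return the (system, value) pair that makes an identifier unique."""
    return (identifier["system"], identifier["value"])


if __name__ == "__main__":
    a = make_identifier("study-123", "PAT-00542")
    b = make_identifier("study-456", "PAT-00542")
    # Same value, different systems: the two identifiers do not collide.
    print(identifier_key(a) != identifier_key(b))
```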
 
 
 
 ## References
 
-By using identifiers in references, FHIR ensures that data can be accurately linked, retrieved, and interpreted across different systems and contexts within the healthcare domain, promoting interoperability and consistency in data exchange. [see more](https://hl7.org/fhir/references.html)
+By using [identifiers in references](https://hl7.org/fhir/references.html), FHIR ensures that data can be accurately linked, retrieved, and interpreted across different systems and contexts within the healthcare domain, promoting interoperability and consistency in data exchange.
 
 > Many of the defined elements in a resource are references to other resources. Using these references, the resources combine to build a web of information about healthcare.
 
 
 ## Key resources
 
 ### ResearchStudy
-> A scientific study of nature that sometimes includes processes involved in health and disease. [see more](https://hl7.org/fhir/researchstudy.html)
+A [scientific study](https://hl7.org/fhir/researchstudy.html) of nature that sometimes includes processes involved in health and disease.
 
 ### ResearchSubject
-> A ResearchSubject is a participant or object which is the recipient of investigative activities in a research study. [see more](https://hl7.org/fhir/researchsubject.html)
+A [ResearchSubject](https://hl7.org/fhir/researchsubject.html) is a participant or object which is the recipient of investigative activities in a research study.
 
 
 ### Patient 
-> Demographics and other administrative information about an individual or animal receiving care or other health-related services. [see more](https://hl7.org/fhir/patient.html)
+A [Patient](https://hl7.org/fhir/patient.html) resource holds demographics and other administrative information about an individual or animal receiving care or other health-related services.
 
 ### Specimen
 
-> A sample to be used for analysis. [see more](https://hl7.org/fhir/specimen.html)
+A [Specimen](https://hl7.org/fhir/specimen.html) represents a sample collected during a healthcare event and used for analysis or testing. It includes information about the sample type, collection method, and processing.
 
 ### DocumentReference
-> A reference to a document of any kind for any purpose. [see more](https://hl7.org/fhir/documentreference.html)
+A [DocumentReference](https://hl7.org/fhir/documentreference.html) is a reference to a document of any kind for any purpose.
+
+
+## Next steps
 
+With your data integrated into FHIR format, you can now manage and enhance your metadata. See the [data management section](managing-metadata.md) for detailed guidance on creating, updating, and maintaining your metadata.
 
-See the [data management section](managing-metadata.md) for more information on how to create and upload metadata.
+For more information on ontologies and how SNOMED CT codes enhance your data, see the [Ontologies](#ontologies) section above.
@@ -5,7 +5,7 @@ Given all of the intricacies healthcare and experimental data, we use Fast Healt
 
 ## What is FHIR?
 
-> In a Gen3 data commons, a semantic distinction is made between two types of data: "data files" and "metadata". [more](https://gen3.org/resources/user/dictionary/#understanding-data-representation-in-gen3)
+In a Gen3 data commons, a semantic distinction is made between two types of data: ["data files" and "metadata"](https://gen3.org/resources/user/dictionary/#understanding-data-representation-in-gen3).
 
 A "data file" could be information like tabulated data values in a spreadsheet or a fastq/bam file containing DNA sequences. The contents of the file are not exposed to the API as queryable properties, so the file must be downloaded to view its content.
 
 
@@ -1,20 +1,26 @@
 # Managing Metadata
 
-Metadata in Calypr is formatted using the Fast Healthcare Interoperability Resources (FHIR) schema. If you choose to bring your own FHIR newline delimited json data, you will need to create a directory called “META” in your git-drs repository in the same directory that you initialized your git-drs repository, and place your metadata files in that directory.   
-The META/ folder contains newline-delimited JSON (.ndjson) files representing FHIR resources describing the project, its data, and related entities. Large files are tracked using Git LFS, with a required correlation between each data file and a DocumentReference resource. This project follows a standardized structure to manage large research data files and associated FHIR metadata in a version-controlled, DRS and FHIR compatible format.  
-Each file must contain only one type of FHIR resource type, for example META/ResearchStudy.ndjson only contains research study resource typed FHIR objects. The name of the file doesn’t have to match the resource type name, unless you bring your own document references, then you must use DocumentReference.ndjson. For all other FHIR file types this is simply a good organizational practice for organizing your FHIR metadata.
+Metadata in Calypr is represented as FHIR resources in newline-delimited JSON files (.ndjson).
+
+If you are bringing your own FHIR metadata, create a `META/` directory at the root of your initialized `git-drs` repository and place your metadata files there.
+
+The `META/` directory contains one file per FHIR resource type. For example, `META/ResearchStudy.ndjson` should contain only `ResearchStudy` resources. Using one file per resource type keeps validation and troubleshooting straightforward.
+
+For projects with Git LFS-managed data files, each data file must have a corresponding `DocumentReference` resource.
 
 ## META/ResearchStudy.ndjson
 
-* The File directory structure root research study is based on the 1st Research Study in the document. This research study is the research study that the autogenerated document references are connected to. Any additional research studies that are provided will be ignored when populating the miller table file tree.  
+* The file tree root in the portal is based on the first `ResearchStudy` record in this file.
+* Auto-generated `DocumentReference` resources are linked to that first `ResearchStudy`.
+* Additional `ResearchStudy` records may be preserved in metadata, but they are not used to build the default file tree root.
 * Contains at least one FHIR ResearchStudy resource describing the project.  
 * Defines project identifiers, title, description, and key attributes.
 
 ## META/DocumentReference.ndjson
 
 * Contains one FHIR DocumentReference resource per Git LFS-managed file.  
 * Each `DocumentReference.content.attachment.url` field:
-  * Must exactly match the relative path of the corresponding file in the repository.
+  * Must exactly match the relative path of the corresponding file in the repository (for example, `data/file1.bam`).
   * Example:
 
 ```json
@@ -43,19 +49,26 @@ cp ~/my-data/specimens.ndjson META/
 cp ~/my-data/document-references.ndjson META/
 ```
 
-## Other FHIR data 
+## Other FHIR Data
 
-\[TODO More intro text here\]
+You can include additional resource types to represent subjects, specimens, assays, and measurements.
 
-* Patient.ndjson: Participant records.  
-* Specimen.ndjson: Biological specimens.  
-* ServiceRequest.ndjson: Requested procedures.  
-* Observation.ndjson: Measurements or results.  
+Common examples:
+
+* `Patient.ndjson`: Participant records.
+* `ResearchSubject.ndjson`: Participant enrollment in a study.
+* `Specimen.ndjson`: Biological specimens.
+* `Task.ndjson` or `ServiceRequest.ndjson`: Procedures, pipeline steps, or assay workflow context.
+* `Observation.ndjson`: Measurements or results.
 * Other valid FHIR resource types as required.
 
-Ensure your FHIR `DocumentReference` resources reference the DRS URIs:
+When these files are present, ensure references are internally consistent (for example, a `DocumentReference.subject.reference` should point to an existing `Patient`, `Specimen`, or `ResearchStudy` record).
+
+### Important: `DocumentReference` URL Format
+
+In a `git-drs` repository, `DocumentReference.content.attachment.url` should be the repository-relative file path, not a `drs://` URI.
 
-Example `DocumentReference` linking to S3 file:
+Example:
 
 ```json
 {
@@ -64,7 +77,7 @@ Example `DocumentReference` linking to S3 file:
   "status": "current",
   "content": [{
     "attachment": {
-      "url": "drs://calypr-public.ohsu.edu/your-drs-id",
+      "url": "data/sample1.bam",
       "title": "sample1.bam",
       "contentType": "application/octet-stream"
     }
@@ -80,10 +93,10 @@ Example `DocumentReference` linking to S3 file:
 
 ## Validating Metadata
 
-To ensure that the FHIR files you have added to the project are correct and pass schema checking, you can use the [Forge tool](../../tools/forge/index.md).
+To ensure that the FHIR files you added are valid and graph-consistent, use [Forge validation](../../tools/forge/docs/validation.md).
 
 ```bash
-forge validate
+forge validate data --path META
 ```
 
 Successful output:
@@ -99,21 +112,21 @@ Fix any validation errors and re-run until all files pass.
 
 ### Forge Data Quality Assurance Command Line Commands
 
-If you have provided your own FHIR resources there are two commands that might be useful to you for ensuring that your FHIR metadata will appear on the CALYPR data platform as expected. These commands are validate and check-edge
+If you provide your own FHIR resources, these two commands are the most useful checks before submission.
 
 **Validate:**
 ```bash
-forge validate META
+forge validate data --path META
 # or
-forge validate META/DocumentReference.ndjson
+forge validate data --path META/DocumentReference.ndjson
 ```
 Validation checks if the provided directory or file will be accepted by the CALYPR data platform. It catches improper JSON formatting and FHIR schema errors.
 
 **Check-edge:**
 ```bash
-forge check-edge META
+forge validate edge --path META
 # or
-forge validate META/DocumentReference.ndjson
+forge validate edge --path META --out-dir tmp/graph-check
 ```
 Check-edge ensures that references within your files (e.g., a Patient ID in an Observation) connect to known vertices and aren't "orphaned".
 
@@ -145,13 +158,13 @@ Check-edge ensures that references within your files (e.g., a Patient ID in an O
 * Validates that DocumentReference resources reference the same ResearchStudy via relatesTo or other linking mechanisms.  
 * If FHIR resources like Patient, Specimen, ServiceRequest, Observation are present, ensures:  
   * Their id fields are unique.  
-  * DocumentReference correctly refers to those resources (e.g., via subject or related fields).
+  * DocumentReference correctly refers to those resources (for example, via `subject`).
 
 #### 5\. Cross-Entity Consistency 
 
 * If multiple optional FHIR .ndjson files exist:  
   * Confirms IDs referenced in one file exist in others.  
-  * Detects dangling references (e.g., a DocumentReference.patient ID that's not in Patient.ndjson).
+  * Detects dangling references (for example, a `DocumentReference.subject.reference` that points to a missing `Patient`).
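
To make the dangling-reference check concrete, here is a minimal sketch of the idea. It is not how `forge validate edge` is implemented: the function names are hypothetical, and it only handles relative references of the form `ResourceType/id`.

```python
import json
from pathlib import Path


def load_ndjson(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def find_references(node):
    """Yield every 'reference' string found anywhere inside a resource."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from find_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_references(item)


def dangling_references(resources: list[dict]) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose target is not among the resources."""
    known = {f"{r['resourceType']}/{r['id']}" for r in resources}
    missing = []
    for resource in resources:
        source = f"{resource['resourceType']}/{resource['id']}"
        for target in find_references(resource):
            if target not in known:
                missing.append((source, target))
    return missing


if __name__ == "__main__":
    # Collect every resource under META/ and report unresolved references.
    resources = [
        resource
        for path in sorted(Path("META").glob("*.ndjson"))
        for resource in load_ndjson(path)
    ]
    for source, target in dangling_references(resources):
        print(f"{source} -> {target} (missing)")
```

A quick local check like this can narrow down which file holds the broken reference before re-running the Forge commands.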
 
 ---