Data Contract

The CSVToFHIR converter Data Contract file is a JSON-based configuration file that defines the CSVToFHIR conversion process. A single data contract file supports multiple CSVToFHIR conversions for a single tenant.

Resource Mapping Specification

Top Level Keys

{
  "general": {},
  "fileDefinitions": {}
}
  • general (required): General settings for the CSVToFHIR service which apply to all file definitions, such as tenant id and timezone
  • fileDefinitions (required): The CSVToFHIR mapping configuration for each CSV source file

General

{
  "general": {
    "timeZone": "US/Eastern",
    "tenantId": "tenant1",
    "assigningAuthority": "default-authority",
    "streamType": "live",
    "emptyFieldValues": [
      "empty",
      "\\n"
    ],
    "regexFilenames": true
  }
}
  • timeZone (required): The default timezone applied to datetime values as necessary. The timezone must be a valid tz database/IANA value
  • tenantId (required): The customer tenant id
  • assigningAuthority (optional): The default assigning authority/system of record, applied to code values where needed
  • streamType (required): Indicates whether the incoming data is "historical" or "live"
  • emptyFieldValues (optional): Additional field values which are treated as "empty" or NULL
  • regexFilenames (optional): Determines whether filename-to-fileDefinition matching is regex based or a simple string comparison. Defaults to false

Validations

  • timeZone is a valid value as specified by pytz.common_timezones
  • streamType is either historical or live

FileDefinition

The top-level key within a FileDefinition serves as the FileDefinition name. This name is matched against the input CSV file using either string match (case-insensitive) or regex (case-sensitive) [see general.regexFilenames setting].
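For example, with general.regexFilenames set to true, a fileDefinition key can be a regular expression matched against the incoming filename. A minimal sketch, in which the pattern, tenant, and field values are illustrative rather than taken from a real contract:

```json
{
  "general": {
    "tenantId": "tenant1",
    "timeZone": "US/Eastern",
    "streamType": "live",
    "regexFilenames": true
  },
  "fileDefinitions": {
    "patient_\\d{8}\\.csv": {
      "resourceType": "Patient",
      "groupByKey": "patientId",
      "tasks": []
    }
  }
}
```

Because regex matching is case-sensitive, a file named Patient_20230101.csv would not match the pattern above, while patient_20230101.csv would.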

Two methods of providing a fileDefinition for a file are supported: inline and external.

Inline

The fileDefinition is provided inline as the value of the filename pattern key:

{
 "fileDefinitions": {
    "Patient": {
     "comment": "patient demographic fields",
      "fileType": "csv",
      "valueDelimiter": ",",
      "convertColumnsToString": true,
      "resourceType": "Patient",
      "groupByKey": "patientId",
      "skiprows": [2],
      "headers": [],
      "tasks": []
    }
  }
}
  • fileType (optional): The type of source file: "csv" or "fixed-width". Defaults to "csv"
  • valueDelimiter (optional): The value (field) delimiter used in the CSV file. Defaults to ","
  • comment (optional): An additional description/comment for the file definition
  • convertColumnsToString (optional): When true, converts all input columns to Python's "str" data type; when false, Pandas infers the datatype. Defaults to true
  • resourceType (required): The target FHIR resource type
  • groupByKey (required): The field used to associate the record with other records in separate CSV payloads
  • skiprows (optional): Rows to skip in the CSV file. The value can be an integer to skip that many lines from the top, or an array of 0-based row indexes to skip; e.g. [2, 3] skips rows 3 and 4 of the file (including headers)
  • headers (optional): Provides a header record for a CSV source file without a header; column names reflect the target record format. When fileType is "fixed-width", headers is required and must be a dictionary of <col_name>: <col_width> entries
  • tasks (optional): A list of tasks to execute against the CSV source data prior to FHIR conversion
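As an illustration, a fixed-width file definition declares headers as a <col_name>: <col_width> dictionary. The definition name, column names, and widths below are assumptions for the sketch, not values from a real contract:

```json
{
  "fileDefinitions": {
    "PatientFixed": {
      "fileType": "fixed-width",
      "resourceType": "Patient",
      "groupByKey": "patientId",
      "headers": {
        "patientId": 10,
        "nameLast": 20,
        "nameFirstMiddle": 20,
        "birthDate": 8
      },
      "tasks": []
    }
  }
}
```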

External

References an external JSON file that contains the fileDefinition model. The path can be absolute or relative to the main data contract.

{
 "fileDefinitions": {
    "Patient": "external-patient-file-definition.json"
  }
}

Validations

  • resourceType is a valid FHIR resource type name
  • tasks definitions align with pipeline task function implementations
  • if fileType is fixed-width, headers is mandatory

Tasks

{
 "name": "add_constant",
 "comment": "adds a default ethnic system code to the source data",
 "params": {
    "name": "ethnicitySystem",
    "value": "http://terminology.hl7.org/CodeSystem/v3-Ethnicity"
 }
}
  • name (required): The task name
  • comment (optional): Additional comment/documentation for the task
  • params (optional): A dictionary of task parameters

Supported Tasks

Each supported task is listed below, with its description, parameters, and examples where available.
add_constant: Creates an additional column with a constant value assigned.
Parameters:
  • name: the constant name, used as the new column name
  • value: the constant value
Example:
{
  "name": "add_constant",
  "params": {
    "name": "ssnSystem",
    "value": "http://hl7.org/fhir/sid/us-ssn"
  }
}
add_row_num

append_list

build_object_array

change_case

compare_to_date
conditional_column: Creates a new column by mapping the values from a source column to a target value. Supports inline mappings as a dictionary, and external mappings using a file name. If a mapping is not found, the "default" value is used if present; otherwise the existing source value is kept.
Parameters:
  • source_column: the source column for the new conditional column
  • condition_map: maps values from the source column to the desired target values, or a filename that contains the mappings
  • target_column: the new target column
Inline mapping:
{
  "name": "conditional_column",
  "params": {
    "source_column": "raceText",
    "target_column": "raceCode",
    "condition_map": {
      "american indian": "1002-5",
      "asian": "2028-9",
      "black": "2054-5",
      "pacific islander": "2076-8",
      "white": "2106-3",
      "default": "2131-1"
    }
  }
}

External file map:
{
  "name": "conditional_column",
  "params": {
    "source_column": "raceText",
    "target_column": "raceCode",
    "condition_map": "race.csv"
  }
}

conditional_column_update

condition_column_with_prerequisite

convert_to_list
copy_columns: Copies one or more source columns to a target column.
Parameters:
  • columns: a list of column(s) to copy
  • target_column: the name of the column to be created
  • value_separator: the character used when multiple columns are concatenated; defaults to " "
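The source does not include a copy_columns example; a sketch using the documented parameters (the column names are illustrative):

```json
{
  "name": "copy_columns",
  "params": {
    "columns": [
      "nameFirst",
      "nameMiddle"
    ],
    "target_column": "nameFirstMiddle",
    "value_separator": " "
  }
}
```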
      
filter_to_columns

find_not_null_value
format_date: Formats date string values within a column to a target format.
Parameters:
  • columns: the column name(s) to update
  • date_format: the date format to apply to the column(s); defaults to "%Y-%m-%d"
Example:
{
  "name": "format_date",
  "params": {
    "columns": [
      "dateOfBirth"
    ],
    "date_format": "%Y-%m-%d"
  }
}

map_codes: Maps "code" values to a target representation. map_codes supports inline mappings as a dictionary, and external mappings using a file name. If a "default" mapping is provided, any value that does not match another mapping key is given this value.
Parameters:
  • code_map: a mapping from source value to target value for a given set of fields, or the name of a file which contains the mappings
Internal data contract mapping:
{
  "name": "map_codes",
  "params": {
    "code_map": {
      "sex": {
        "default": "unknown",
        "F": "female",
        "M": "male",
        "O": "other"
      }
    }
  }
}

External mapping:
{
  "name": "map_codes",
  "params": {
    "code_map": {
      "sex": "sex.csv"
    }
  }
}

rename_columns: Renames column(s).
Parameters:
  • column_map: a dictionary which maps the source column names to the target column names
Example:
{
  "name": "rename_columns",
  "params": {
    "column_map": {
      "hospitalId": "assigningAuthority",
      "givenName": "nameFirstMiddle",
      "familyName": "nameLast",
      "sex": "gender",
      "dateOfBirth": "birthDate"
    }
  }
}

replace_text

remove_whitespace_from_columns

set_nan_to_none

split_column
split_row: Splits a record "row" on a column or columns, creating N additional rows for each column included within the split operation. Creates additional columns for the "label" and "value".
Parameters:
  • columns: the column(s) to split on
  • split_column_name: the column name or header used for the "label" column
  • split_value_column_name: the column name or header used for the "value" column
Example:
{
  "name": "split_row",
  "params": {
    "columns": [
      "height",
      "weight",
      "bmi"
    ],
    "split_column_name": "observationCodeText",
    "split_value_column_name": "observationValue"
  }
}

validate_value: Validates the value of the column against the provided regex to confirm a complete match. An alternative value can be provided, which is used when the regex does not match.
Parameters:
  • column_name: the name of the column to validate
  • regex: the regex to validate against
  • no_match_replacement: the replacement value if the regex does not match; defaults to None
Example:
{
  "name": "validate_value",
  "params": {
    "column_name": "company email",
    "regex": "^[a-zA-Z0-9_\\.]+@company\\.com",
    "no_match_replacement": "invalid company email"
  }
}
      
join_data: Takes a secondary file (CSV or fixed-width) and joins the supplementary data with the primary dataframe based on a common joining key.
Parameters:
  • secondary_data_source: the path to the secondary data file; can be absolute or relative to the data contract directory
  • join_type: one of "left", "right", "outer", "inner", "cross", which correspond roughly to the relational database join types of the same name. See the "how" parameter of pandas.DataFrame.merge: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
  • join_on: the key used to correlate the two data sets; the key must be named exactly the same in both datasets
  • source_type: "csv" or "fixed-width"; defaults to "csv"
  • reader_params: any additional parameters to pass to pandas when reading the secondary file; defaults to None
Example:
{
  "name": "join_data",
  "params": {
    "secondary_data_source": "/path/to/secondary/file.csv",
    "join_type": "outer",
    "join_on": "MRN",
    "source_type": "csv",
    "reader_params": {
      "some_panda_reader_param": "param value"
    }
  }
}

Alternative Data Contract Locations

CsvToFHIR uses the smart_open library to read the data contract and any referenced file definitions within it. This allows CsvToFHIR to seamlessly support data contract files stored in external cloud storage such as S3, Azure Blob Storage, etc. (see the smart_open documentation for a full list of supported platforms).

In order to use an external cloud storage vendor, additional dependencies may be required which are not automatically installed by CsvToFHIR. For example, to support Azure storage, install smart_open's azure extras package: pip install smart_open[azure]. Again, see the smart_open documentation for additional information and examples.

A sample configuration to use a data contract stored in azure would look like:

export mapping_config_directory=azure://my_bucket/my_prefix/
export mapping_config_file_name=data-contract.json