| 1 | | -# Versioning |
| | 1 | +# versioning |
| 2 | 2 | |
| 3 | 3 | An R package for versioned file I/O using a configuration file. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +R data pipelines commonly read data from and write data to versioned directories. Each |
| 8 | +directory might correspond to one step of a multi-step process, and each version of that |
| 9 | +directory to particular settings for the step, along with a chain of upstream steps that |
| 10 | +each have their own versions. |
| 11 | + |
| 12 | +The **versioning** package simplifies management of project settings and file I/O by |
| 13 | +combining them in a single `Config` object, backed by YAML configuration files that are |
| 14 | +loaded from and saved to each versioned folder. |
| 15 | + |
| 16 | +## Installation |
| 17 | + |
| 18 | +```r |
| 19 | +install.packages('versioning') |
| 20 | +``` |
| 21 | + |
| 22 | +## Quick Start |
| 23 | + |
| 24 | +```r |
| 25 | +library(versioning) |
| 26 | + |
| 27 | +# Load the example config bundled with the package |
| 28 | +example_config_fp <- system.file('extdata', 'example_config.yaml', package = 'versioning') |
| 29 | +config <- Config$new(config_list = example_config_fp) |
| 30 | + |
| 31 | +# Print the full config |
| 32 | +print(config) |
| 33 | + |
| 34 | +# Access settings (throws an error if the key doesn't exist) |
| 35 | +config$get('a') #> [1] "foo" |
| 36 | +config$get('group_c', 'd') #> [1] 1e+05 |
| 37 | + |
| 38 | +# Point directories at temporary folders for this example |
| 39 | +config$config_list$directories$raw_data$path <- tempdir() |
| 40 | +config$config_list$directories$prepared_data$path <- tempdir() |
| 41 | + |
| 42 | +# Get directory and file paths |
| 43 | +config$get_dir_path('prepared_data') # <tempdir>/v1 (versioned) |
| 44 | +config$get_file_path('raw_data', 'a') # <tempdir>/example_input_file.csv |
| 45 | + |
| 46 | +# Copy the bundled input file into the raw_data directory |
| 47 | +file.copy( |
| 48 | +  from = system.file('extdata', 'example_input_file.csv', package = 'versioning'), |
| 49 | +  to = config$get_file_path('raw_data', 'a') |
| 50 | +) |
| 51 | + |
| 52 | +# Read and write files (format inferred from extension) |
| 53 | +df <- config$read(dir_name = 'raw_data', file_name = 'a') |
| 54 | +config$write(df, dir_name = 'prepared_data', file_name = 'prepared_table') |
| 55 | + |
| 56 | +# Save the config itself to the prepared_data directory as config.yaml |
| 57 | +config$write_self(dir_name = 'prepared_data') |
| 58 | +``` |
| 59 | + |
| 60 | +## Config File Format |
| 61 | + |
| 62 | +The package uses YAML files for configuration. Settings can be any mix of scalar values, |
| 63 | +lists, and nested groups. Two top-level keys have special meaning: `directories` and |
| 64 | +`versions`. |
| 65 | + |
| 66 | +```yaml |
| 67 | +# Arbitrary settings |
| 68 | +a: 'foo' |
| 69 | +b: ['bar', 'baz'] |
| 70 | +group_c: |
| 71 | +  d: 1e5 |
| 72 | +  e: false |
| 73 | + |
| 74 | +# Directory definitions |
| 75 | +directories: |
| 76 | +  raw_data: |
| 77 | +    versioned: FALSE # no versioned sub-directory |
| 78 | +    path: '~/project/raw_data' |
| 79 | +    files: |
| 80 | +      a: 'example_input_file.csv' |
| 81 | +  prepared_data: |
| 82 | +    versioned: TRUE # paths include a version sub-directory |
| 83 | +    path: '~/project/prepared_data' |
| 84 | +    files: |
| 85 | +      prepared_table: 'example_prepared_table.csv' |
| 86 | +      summary_text: 'summary_of_rows.txt' |
| 87 | + |
| 88 | +# Current version for each versioned directory |
| 89 | +versions: |
| 90 | +  prepared_data: 'v1' |
| 91 | +``` |
| 92 | + |
| 93 | +Each entry in `directories` contains: |
| 94 | + |
| 95 | +| Field | Type | Description | |
| 96 | +|---|---|---| |
| 97 | +| `versioned` | logical | Whether paths include a version sub-directory (e.g. `.../v1/`) | |
| 98 | +| `path` | character | Base path to the directory | |
| 99 | +| `files` | list | Named file references within the directory | |
| 100 | + |
| 101 | +When `versioned: TRUE`, `config$get_dir_path('prepared_data')` returns |
| 102 | +`~/project/prepared_data/v1` (appending the version from `versions$prepared_data`). |
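The path rule described above can be sketched in a few lines of base R. This is only an illustration of the documented behaviour, not the package's implementation; `resolve_dir_path()` and the hand-built `example_config_list` are hypothetical names introduced for this sketch.

```r
# Hypothetical helper that mimics the documented rule: append the version
# sub-directory only when the directory is marked as versioned.
resolve_dir_path <- function(config_list, dir_name) {
  dir_info <- config_list$directories[[dir_name]]
  if (isTRUE(dir_info$versioned)) {
    file.path(dir_info$path, config_list$versions[[dir_name]])
  } else {
    dir_info$path
  }
}

# Hand-built list mirroring the YAML example in this README
example_config_list <- list(
  directories = list(
    raw_data = list(versioned = FALSE, path = '~/project/raw_data'),
    prepared_data = list(versioned = TRUE, path = '~/project/prepared_data')
  ),
  versions = list(prepared_data = 'v1')
)

versioned_path <- resolve_dir_path(example_config_list, 'prepared_data')
unversioned_path <- resolve_dir_path(example_config_list, 'raw_data')
```

Under these assumptions, the versioned directory gains a `v1` suffix while the unversioned one is returned unchanged.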
| 103 | + |
| 104 | +## Overriding Versions Programmatically |
| 105 | + |
| 106 | +You can override specific versions at load time without editing the YAML file. This is |
| 107 | +useful for passing versions as command-line arguments to a script: |
| 108 | + |
| 109 | +```r |
| 110 | +# Load config but change the "prepared_data" version to "v2" |
| 111 | +config_v2 <- Config$new( |
| 112 | +  config_list = 'path/to/config.yaml', |
| 113 | +  versions = list(prepared_data = 'v2') |
| 114 | +) |
| 115 | +config_v2$get_dir_path('prepared_data') # ~/project/prepared_data/v2 |
| 116 | +``` |
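One way the command-line use case might look is sketched below. It assumes a hypothetical script invoked as `Rscript prepare_data.R v2`; the script name, the `parse_version_arg()` helper, and the fallback to `'v1'` are illustrative choices, not part of the package. Only `Config$new()` with its `config_list` and `versions` arguments and `get_dir_path()` come from the README itself.

```r
# Hypothetical helper: take the first trailing argument as the version,
# falling back to a default when none is supplied.
parse_version_arg <- function(args, default = 'v1') {
  if (length(args) >= 1) args[[1]] else default
}

prepared_version <- parse_version_arg(commandArgs(trailingOnly = TRUE))

# Guarded so the sketch still runs where the package is not installed
if (requireNamespace('versioning', quietly = TRUE)) {
  config <- versioning::Config$new(
    config_list = system.file('extdata', 'example_config.yaml', package = 'versioning'),
    versions = list(prepared_data = prepared_version)
  )
  print(config$get_dir_path('prepared_data'))
}
```

Keeping the argument parsing in a small function makes the default explicit and lets the same script run interactively without any arguments.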
| 117 | + |
| 118 | +## Supported File Formats |
| 119 | + |
| 120 | +`config$read()` and `config$write()` dispatch on file extension via `autoread()` and |
| 121 | +`autowrite()`. Supported formats: |
| 122 | + |
| 123 | +| Operation | Extensions | |
| 124 | +|---|---| |
| 125 | +| Read | `csv`, `dbf`, `dta`, `rda`, `rds`, `shp`, `tif` / `geotiff`, `txt`, `xls` / `xlsx`, `yaml` / `yml` | |
| 126 | +| Write | `csv`, `rda`, `rds`, `shp`, `tif` / `geotiff`, `txt`, `yaml` / `yml` | |
| 127 | + |
| 128 | +Required packages for each format are loaded on demand (e.g. **data.table** for CSV, |
| 129 | +**sf** for shapefiles, **terra** for rasters). |
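A minimal round trip through the two dispatch functions might look like the sketch below. It assumes `autowrite()` takes the object followed by the file path and `autoread()` takes the file path, both passed positionally here; check the package help pages before relying on this. The `.rds` extension is chosen so that no optional packages should be needed.

```r
# Small example table and a temporary output path (both illustrative)
df <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
rds_fp <- file.path(tempdir(), 'example_table.rds')

# Guarded so the sketch still runs where the package is not installed
if (requireNamespace('versioning', quietly = TRUE)) {
  versioning::autowrite(df, rds_fp)        # writer chosen from the '.rds' extension
  df_back <- versioning::autoread(rds_fp)  # reader chosen from the '.rds' extension
  print(all.equal(df, df_back))
}
```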
| 130 | + |
| 131 | +## Further Reading |
| 132 | + |
| 133 | +- Vignette: `vignette('versioning', package = 'versioning')` |
| 134 | +- Config class reference: `help(Config, package = 'versioning')` |