Commit 8e01b1b

Update README for humans and machines.
1 parent 915e234 commit 8e01b1b

2 files changed

Lines changed: 143 additions & 1 deletion

CLAUDE.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# Versioning

The purpose of this package is to parse YAML config files that simplify file reading and writing, with some opinionated package choices for file reading and writing of particular file types. The package is also intended to make it easy to deploy different versions of data pipelines over time.

## Documentation

Sphinx docs live in `docs/` and are auto-deployed to GitHub Pages on every push to `main` via `.github/workflows/docs.yml`.

Docstring changes and signature updates are picked up automatically. However, when you **add a new public function, class, or module**, you must manually update `docs/api.rst` with the corresponding `.. autofunction::`, `.. autoclass::`, or `.. automodule::` directive. If the new symbol depends on a new optional third-party package, also add that package to `autodoc_mock_imports` in `docs/conf.py`.

The version shown in the docs is read automatically from `__version__` in `src/versioning/__init__.py`; update that string when releasing a new version.

README.md

Lines changed: 132 additions & 1 deletion
@@ -1,3 +1,134 @@
-# Versioning
+# versioning

An R package for versioned file I/O using a configuration file.

## Overview

R data pipelines commonly read and write data in versioned directories. Each
directory might correspond to one step of a multi-step process, where a given version
reflects particular settings for that step and a chain of previous steps, each with
its own version.

The **versioning** package simplifies management of project settings and file I/O by
combining them in a single `Config` object, backed by YAML configuration files that are
loaded from and saved to each versioned folder.

## Installation

```r
install.packages('versioning')
```

## Quick Start

```r
library(versioning)

# Load the example config bundled with the package
example_config_fp <- system.file('extdata', 'example_config.yaml', package = 'versioning')
config <- Config$new(config_list = example_config_fp)

# Print the full config
print(config)

# Access settings (throws an error if the key doesn't exist)
config$get('a')             #> [1] "foo"
config$get('group_c', 'd')  #> [1] 1e+05

# Point directories at temporary folders for this example
config$config_list$directories$raw_data$path <- tempdir()
config$config_list$directories$prepared_data$path <- tempdir()

# Get directory and file paths
config$get_dir_path('prepared_data')   # <tempdir>/v1 (versioned)
config$get_file_path('raw_data', 'a')  # <tempdir>/example_input_file.csv

# Copy the bundled input file into the raw_data directory
file.copy(
  from = system.file('extdata', 'example_input_file.csv', package = 'versioning'),
  to = config$get_file_path('raw_data', 'a')
)

# Read and write files (format inferred from extension)
df <- config$read(dir_name = 'raw_data', file_name = 'a')
config$write(df, dir_name = 'prepared_data', file_name = 'prepared_table')

# Save the config itself to the prepared_data directory as config.yaml
config$write_self(dir_name = 'prepared_data')
```
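
The Quick Start notes that `config$get()` throws an error when a key does not exist. The following is a minimal base-R sketch of that kind of nested lookup, written only to illustrate the behavior. `get_setting()` is a hypothetical helper and not the package's implementation; the example settings mirror the `a` and `group_c` keys shown above.

```r
# Hypothetical helper (not part of versioning): walk a nested list key by key
# and stop with an error as soon as a key is missing.
get_setting <- function(settings, ...) {
  keys <- c(...)
  value <- settings
  for (key in keys) {
    if (!is.list(value) || !(key %in% names(value))) {
      stop("Setting not found: ", paste(keys, collapse = " -> "))
    }
    value <- value[[key]]
  }
  value
}

# Settings modelled on the example config's arbitrary keys
settings <- list(a = "foo", group_c = list(d = 1e5, e = FALSE))

get_setting(settings, "a")
get_setting(settings, "group_c", "d")

# A missing key raises an error; tryCatch() can convert it into a fallback value
missing_value <- tryCatch(
  get_setting(settings, "group_c", "z"),
  error = function(e) NA
)
```

Failing loudly on a missing key, rather than returning `NULL`, means a typo in a setting name surfaces immediately instead of propagating through a pipeline.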

## Config File Format

The package uses YAML files for configuration. Settings can be any mix of scalar values,
lists, and nested groups. Two top-level keys have special meaning: `directories` and
`versions`.

```yaml
# Arbitrary settings
a: 'foo'
b: ['bar', 'baz']
group_c:
  d: 1e5
  e: false

# Directory definitions
directories:
  raw_data:
    versioned: FALSE  # no versioned sub-directory
    path: '~/project/raw_data'
    files:
      a: 'example_input_file.csv'
  prepared_data:
    versioned: TRUE  # paths include a version sub-directory
    path: '~/project/prepared_data'
    files:
      prepared_table: 'example_prepared_table.csv'
      summary_text: 'summary_of_rows.txt'

# Current version for each versioned directory
versions:
  prepared_data: 'v1'
```

Each entry in `directories` contains:

| Field | Type | Description |
|---|---|---|
| `versioned` | logical | Whether paths include a version sub-directory (e.g. `.../v1/`) |
| `path` | character | Base path to the directory |
| `files` | list | Named file references within the directory |

When `versioned: TRUE`, `config$get_dir_path('prepared_data')` returns
`~/project/prepared_data/v1` (appending the version from `versions$prepared_data`).
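
The path rule described above can be sketched in a few lines of base R. This is an illustration only, assuming the config has already been parsed into a nested list. `resolve_dir_path()` is a hypothetical helper, not the package's `get_dir_path()` method.

```r
# Hypothetical helper (not part of versioning): build a directory path,
# appending the version only when the directory is marked as versioned.
resolve_dir_path <- function(config_list, dir_name) {
  dir_info <- config_list$directories[[dir_name]]
  if (is.null(dir_info)) stop("Unknown directory: ", dir_name)
  if (!isTRUE(dir_info$versioned)) return(dir_info$path)
  version <- config_list$versions[[dir_name]]
  if (is.null(version)) stop("No version set for directory: ", dir_name)
  file.path(dir_info$path, version)
}

# Nested list shaped like the YAML example above
config_list <- list(
  directories = list(
    raw_data = list(versioned = FALSE, path = "~/project/raw_data"),
    prepared_data = list(versioned = TRUE, path = "~/project/prepared_data")
  ),
  versions = list(prepared_data = "v1")
)

resolve_dir_path(config_list, "raw_data")       # "~/project/raw_data"
resolve_dir_path(config_list, "prepared_data")  # "~/project/prepared_data/v1"
```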

## Overriding Versions Programmatically

You can override specific versions at load time without editing the YAML file. This is
useful for passing versions as command-line arguments to a script:

```r
# Load config but change the "prepared_data" version to "v2"
config_v2 <- Config$new(
  config_list = 'path/to/config.yaml',
  versions = list(prepared_data = 'v2')
)
config_v2$get_dir_path('prepared_data') # ~/project/prepared_data/v2
```
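
For the command-line use case, one possible approach is to turn `name=version` arguments into the named list that `versions` expects. The sketch below is an assumption about how a script might do this: `parse_version_args()` and the `name=version` argument format are hypothetical, and the `Config$new()` call is left as a comment so the block runs without the package.

```r
# Hypothetical helper: convert arguments such as "prepared_data=v2"
# into a named list like list(prepared_data = "v2").
parse_version_args <- function(args) {
  parts <- strsplit(args, "=", fixed = TRUE)
  bad <- vapply(parts, function(p) length(p) != 2 || any(!nzchar(p)), logical(1))
  if (any(bad)) stop("Expected arguments of the form name=version")
  versions <- lapply(parts, function(p) p[2])
  names(versions) <- vapply(parts, function(p) p[1], character(1))
  versions
}

# In a script run as `Rscript run_step.R prepared_data=v2`, the arguments
# would come from commandArgs(trailingOnly = TRUE); they are hard-coded here.
args <- c("prepared_data=v2")
versions <- parse_version_args(args)
str(versions)

# The result could then be passed to the constructor shown above:
# config <- Config$new(config_list = 'path/to/config.yaml', versions = versions)
```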

## Supported File Formats

`config$read()` and `config$write()` dispatch on file extension via `autoread()` and
`autowrite()`. Supported formats:

| Operation | Extensions |
|---|---|
| Read | `csv`, `dbf`, `dta`, `rda`, `rds`, `shp`, `tif` / `geotiff`, `txt`, `xls` / `xlsx`, `yaml` / `yml` |
| Write | `csv`, `rda`, `rds`, `shp`, `tif` / `geotiff`, `txt`, `yaml` / `yml` |

Required packages for each format are loaded on demand (e.g. **data.table** for CSV,
**sf** for shapefiles, **terra** for rasters).
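
To illustrate what dispatching on file extension means, here is a small sketch using only base R readers. It is not how `autoread()` is implemented: `read_by_extension()` is a hypothetical function, it covers just three of the formats listed above, and it uses `read.csv()` where the package is described as using **data.table**.

```r
# Hypothetical dispatcher (not part of versioning): pick a reader
# based on the lower-cased file extension.
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    rds = readRDS(path),
    txt = readLines(path),
    stop("Unsupported file extension: '", ext, "'")
  )
}

# Write a small CSV and a text file to temporary paths, then read them back
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c("x", "y", "z")), csv_path, row.names = FALSE)
df <- read_by_extension(csv_path)

txt_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt_path)
lines <- read_by_extension(txt_path)
```

Keeping the extension-to-reader mapping in one place means callers only need a file name, which is what lets a config refer to files purely by their named references.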

## Further Reading

- Vignette: `vignette('versioning', package = 'versioning')`
- Config class reference: `help(Config, package = 'versioning')`
