henryspatialanalysis
diff --git a/.gitignore b/.gitignore
Lines changed: 216 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 5 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,216 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[codz]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#   Usually these files are written by a python script from a template
+#   before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py.cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+# Pipfile.lock
+
+# UV
+#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+# uv.lock
+
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+# poetry.lock
+# poetry.toml
+
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#   pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
+#   https://pdm-project.org/en/latest/usage/project/#working-with-version-control
+# pdm.lock
+# pdm.toml
+.pdm-python
+.pdm-build/
+
+# pixi
+#   Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
+# pixi.lock
+#   Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
+#   in the .venv directory. It is recommended not to include this directory in version control.
+.pixi
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# Redis
+*.rdb
+*.aof
+*.pid
+
+# RabbitMQ
+mnesia/
+rabbitmq/
+rabbitmq-data/
+
+# ActiveMQ
+activemq-data/
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+#   JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#   be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#   and can be added to the global gitignore or merged into this file.  For a more nuclear
+#   option (not recommended) you can uncomment the following to ignore the entire idea folder.
+# .idea/
+
+# Abstra
+#   Abstra is an AI-powered process automation framework.
+#   Ignore directories containing user credentials, local state, and settings.
+#   Learn more at https://abstra.io/docs
+.abstra/
+
+# Visual Studio Code
+#   Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore 
+#   that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
+#   and can be added to the global gitignore or merged into this file. However, if you prefer, 
+#   you could uncomment the following to ignore the entire vscode folder
+# .vscode/
+
+# Ruff stuff:
+.ruff_cache/
+
+# PyPI configuration file
+.pypirc
+
+# Marimo
+marimo/_static/
+marimo/_lsp/
+__marimo__/
+
+# Streamlit
+.streamlit/secrets.toml
@@ -0,0 +1,5 @@
+# Versioning
+
+We are recreating the R package found in `../versioning.R/` as a clean Python package with tests and documentation, so that it is fully ready to be uploaded to PyPI. The package name is "versioning".
+
+The purpose of this package is to parse YAML config files that simplify file reading and writing, with opinionated choices about which library reads and writes each file type. The package is also intended to make it easy to deploy different versions of a data pipeline over time.
@@ -0,0 +1,152 @@
+# versioning
+
+A Python package for YAML-based configuration management in data pipelines, with versioned directory support and automatic file I/O by extension.
+
+## Installation
+
+```bash
+pip install versioning
+```
+
+Install optional extras for specific file formats:
+
+```bash
+pip install 'versioning[pandas]'   # CSV, TSV, Excel, Stata
+pip install 'versioning[geo]'      # Shapefiles, GeoJSON, GeoPackage, etc.
+pip install 'versioning[raster]'   # GeoTIFF, rasterio formats
+pip install 'versioning[xarray]'   # NetCDF
+pip install 'versioning[dbfread]'  # DBF files
+pip install 'versioning[all]'      # All of the above
+```
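You can check which of these extras are already importable without importing the heavy libraries themselves. The following is a minimal sketch using the standard library's `importlib.util.find_spec`. The extra-to-module mapping is inferred from the "Requires" column of the file-format table in this README (for example, `geo` maps to `geopandas`). The names `EXTRA_MODULES` and `installed_extras` are hypothetical.

```python
import importlib.util
from typing import Dict

# Hypothetical mapping of each extra to the module it is expected to provide,
# inferred from the "Requires" column of the supported-formats table.
EXTRA_MODULES: Dict[str, str] = {
    "pandas": "pandas",
    "geo": "geopandas",
    "raster": "rasterio",
    "xarray": "xarray",
    "dbfread": "dbfread",
}


def installed_extras() -> Dict[str, bool]:
    """Report, per extra, whether its backing module can be found (without importing it)."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


for extra, ok in installed_extras().items():
    print(f"{extra:8} {'installed' if ok else 'missing'}")
```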
+
+## Quick Start
+
+### 1. Create a config YAML file
+
+```yaml
+# project_config.yaml
+project_name: 'my_analysis'
+
+directories:
+  raw_data:
+    versioned: false
+    path: '~/data/raw'
+    files:
+      input_table: 'records.csv'
+
+  results:
+    versioned: true
+    path: '~/data/results'
+    files:
+      output_table: 'processed.csv'
+      summary: 'summary.txt'
+
+versions:
+  results: 'v1'
+```
+
+### 2. Load the config
+
+```python
+from versioning import Config
+
+cfg = Config('project_config.yaml')
+```
+
+### 3. Access settings
+
+```python
+cfg.get('project_name')           # 'my_analysis'
+cfg.get('versions', 'results')    # 'v1'
+cfg.get()                         # full config dict
+```
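Judging from these calls, `get` follows one key per positional argument into the nested config and returns the whole dict when called with no keys. The sketch below shows that lookup on a plain dict. `get_nested` is a hypothetical name, and raising `KeyError` on a missing key is an assumption rather than documented behavior.

```python
from functools import reduce
from typing import Any


def get_nested(config: dict, *keys: str) -> Any:
    """Hypothetical stand-in for Config.get: follow keys left to right."""
    # With no keys, reduce returns its initial value: the full config dict.
    # A missing key raises KeyError (assumed behavior).
    return reduce(lambda node, key: node[key], keys, config)


config = {"project_name": "my_analysis", "versions": {"results": "v1"}}
print(get_nested(config, "project_name"))         # my_analysis
print(get_nested(config, "versions", "results"))  # v1
```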
+
+### 4. Build paths
+
+```python
+# Non-versioned: returns ~/data/raw
+cfg.get_dir_path('raw_data')
+
+# Versioned: returns ~/data/results/v1
+cfg.get_dir_path('results')
+
+# With a custom version override
+cfg.get_dir_path('results', custom_version='v2')
+
+# Full file path
+cfg.get_file_path('raw_data', 'input_table')   # ~/data/raw/records.csv
+cfg.get_file_path('results', 'output_table')   # ~/data/results/v1/processed.csv
+```
+
+All path methods return `pathlib.Path` objects.
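The path rules above reduce to three cases:

- A non-versioned directory resolves to its `path`.
- A versioned directory appends its entry from `versions`, or `custom_version` when one is given.
- A file path joins the resolved directory with the file name listed under `files`.

The following is a minimal sketch of those rules on a plain dict, using hypothetical helpers `dir_path` and `file_path`. It is not the package's code. It uses absolute paths and does not expand `~`.

```python
from pathlib import Path
from typing import Optional

# Plain-dict mirror of the example config (absolute paths for clarity)
CONFIG = {
    "directories": {
        "raw_data": {"versioned": False, "path": "/data/raw",
                     "files": {"input_table": "records.csv"}},
        "results": {"versioned": True, "path": "/data/results",
                    "files": {"output_table": "processed.csv"}},
    },
    "versions": {"results": "v1"},
}


def dir_path(config: dict, dir_name: str, custom_version: Optional[str] = None) -> Path:
    """Hypothetical: resolve a directory, appending a version only if it is versioned."""
    entry = config["directories"][dir_name]
    base = Path(entry["path"])  # '~' is not expanded in this sketch
    if not entry["versioned"]:
        return base
    if custom_version is not None:
        return base / custom_version
    return base / config["versions"][dir_name]


def file_path(config: dict, dir_name: str, file_key: str,
              custom_version: Optional[str] = None) -> Path:
    """Hypothetical: join the resolved directory with the file name under `files`."""
    filename = config["directories"][dir_name]["files"][file_key]
    return dir_path(config, dir_name, custom_version) / filename


print(file_path(CONFIG, "results", "output_table").as_posix())  # /data/results/v1/processed.csv
```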
+
+### 5. Read and write files
+
+```python
+import pandas as pd
+
+# Read a file (path resolved from config)
+df = cfg.read('raw_data', 'input_table')
+
+# Process data
+processed = df.head(10)
+
+# Write results (directory must exist)
+cfg.write(processed, 'results', 'output_table')
+cfg.write(['Summary: 10 rows written\n'], 'results', 'summary')
+
+# Write the config itself to the results directory
+cfg.write_self('results')
+```
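`write` expects the target directory to exist, and path methods return `pathlib.Path` objects. You can therefore create the directory up front with `Path.mkdir`. The sketch below uses a temporary directory in place of `cfg.get_dir_path('results')`, so it runs without the package. The `ensure_dir` helper is hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_dir(path: Path) -> Path:
    """Create path and any missing parents; a no-op if it already exists."""
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for cfg.get_dir_path('results'), e.g. ~/data/results/v1
    results_dir = ensure_dir(Path(tmp) / "results" / "v1")
    ensure_dir(results_dir)  # safe to call again on a rerun
    print(results_dir.is_dir())  # True
```

With the package installed, the equivalent call before the `write` calls would be `cfg.get_dir_path('results').mkdir(parents=True, exist_ok=True)`.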
+
+### 6. Override versions at load time
+
+```python
+# Run the same pipeline with a new version
+cfg_v2 = Config('project_config.yaml', versions={'results': 'v2'})
+cfg_v2.get_dir_path('results')  # ~/data/results/v2
+```
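Passing `versions=` at load time presumably replaces only the listed entries and leaves the rest of `versions` intact. The following is a minimal sketch of that merge, using a hypothetical `apply_version_overrides` helper on a plain dict. This README does not say whether the package rejects override keys that are not already in `versions`, so this sketch simply accepts them.

```python
import copy
from typing import Optional


def apply_version_overrides(config: dict, versions: Optional[dict] = None) -> dict:
    """Hypothetical: return a copy of config with selected versions replaced."""
    merged = copy.deepcopy(config)  # leave the caller's config untouched
    merged.setdefault("versions", {}).update(versions or {})
    return merged


# 'models' is a hypothetical second versioned directory
base = {"versions": {"results": "v1", "models": "v3"}}
v2 = apply_version_overrides(base, {"results": "v2"})
print(v2["versions"])    # {'results': 'v2', 'models': 'v3'}
print(base["versions"])  # {'results': 'v1', 'models': 'v3'}
```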
+
+## Standalone autoread / autowrite
+
+```python
+from versioning import autoread, autowrite
+
+# Read by extension
+df = autoread('data/records.csv')
+config = autoread('config.yaml')
+lines = autoread('notes.txt')
+
+# Write by extension
+autowrite(df, 'output/results.csv')
+autowrite({'key': 'value'}, 'output/config.yaml')
+autowrite(['line one\n', 'line two\n'], 'output/notes.txt')
+```
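Both functions choose a handler from the file extension. The following is a minimal sketch of that dispatch using only the standard library. `READERS` and `autoread_sketch` are hypothetical names. Stdlib `csv` stands in for pandas, and returning text files as a list of lines with newlines kept is an assumption.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def _read_txt(path: Path) -> list:
    # Assumption: text files come back as a list of lines, newlines kept
    with open(path, encoding="utf-8") as f:
        return f.readlines()


def _read_csv(path: Path) -> list:
    # Stand-in only: stdlib csv replaces pandas so the sketch runs anywhere
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical extension -> reader table (lowercase, no leading dot)
READERS: Dict[str, Callable[[Path], Any]] = {"txt": _read_txt, "csv": _read_csv}


def autoread_sketch(path) -> Any:
    """Pick a reader from the final suffix, e.g. 'a.CSV' -> 'csv', 'a.csv.gz' -> 'gz'."""
    path = Path(path)
    ext = path.suffix.lower().lstrip(".")
    if ext not in READERS:
        raise ValueError(f"No reader registered for extension {ext!r}")
    return READERS[ext](path)


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes.txt"
    notes.write_text("line one\nline two\n", encoding="utf-8")
    print(autoread_sketch(notes))  # ['line one\n', 'line two\n']
```

A writer table keyed the same way would mirror this for `autowrite`.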
+
+## Supported File Extensions
+
+| Format | Extensions | Requires |
+|--------|-----------|---------|
+| CSV / TSV | csv, tsv, gz, bz2 | `pandas` |
+| Excel | xls, xlsx | `pandas`, `openpyxl` |
+| Stata | dta | `pandas` |
+| DBF | dbf | `dbfread` |
+| YAML | yaml, yml | *(core)* |
+| Text | txt | *(core)* |
+| Shapefile / Vector | shp, geojson, gpkg, fgb, gml, kml, and more | `geopandas` |
+| Raster | tif, geotiff | `rasterio` |
+| NetCDF | nc | `xarray` |
+
+For raster files, `autoread` returns `{"data": np.ndarray, "profile": dict}` and `autowrite` accepts that same structure (or a `(data, profile)` tuple).
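Writing a raster therefore has to accept two input shapes. The following is a minimal sketch that normalizes both into the dict form. `to_raster_dict` is a hypothetical helper, not part of the package's documented API, and a nested list stands in for the `numpy` array.

```python
from typing import Any, Dict, Tuple, Union

# Hypothetical input type: the dict autoread returns, or a (data, profile) tuple
RasterLike = Union[Dict[str, Any], Tuple[Any, Dict[str, Any]]]


def to_raster_dict(obj: RasterLike) -> Dict[str, Any]:
    """Normalize both accepted raster shapes to {'data': ..., 'profile': ...}."""
    if isinstance(obj, tuple) and len(obj) == 2:
        data, profile = obj
        return {"data": data, "profile": profile}
    if isinstance(obj, dict) and "data" in obj and "profile" in obj:
        return obj
    raise TypeError("expected {'data': ..., 'profile': ...} or a (data, profile) tuple")


# A nested list stands in for the numpy array to keep the sketch dependency-free
raster = to_raster_dict(([[0, 1], [2, 3]], {"driver": "GTiff", "count": 1}))
print(sorted(raster))  # ['data', 'profile']
```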
+
+## Example Config File
+
+A bundled example is included with the package:
+
+```python
+import importlib.resources as r
+from versioning import Config
+
+path = str(r.files("versioning") / "data" / "example_config.yaml")
+cfg = Config(path)
+```