| 1 | +Installation |
| 2 | +============ |
| 3 | + |
| 4 | +Core package (YAML and plain-text support only): |
| 5 | + |
| 6 | +.. code-block:: bash |
| 7 | + |
| 8 | +    pip install versioning |
| 9 | + |
| 10 | +With optional file-format extras: |
| 11 | + |
| 12 | +.. code-block:: bash |
| 13 | + |
| 14 | +    pip install "versioning[pandas]"    # CSV, TSV, Excel, Stata |
| 15 | +    pip install "versioning[geo]"       # Shapefiles, GeoJSON, GeoPackage, etc. |
| 16 | +    pip install "versioning[raster]"    # GeoTIFF and other raster formats |
| 17 | +    pip install "versioning[xarray]"    # NetCDF |
| 18 | +    pip install "versioning[dbfread]"   # DBF files |
| 19 | +    pip install "versioning[all]"       # All of the above |
| 20 | + |
| 21 | +Config file structure |
| 22 | +--------------------- |
| 23 | + |
| 24 | +The config YAML has two special top-level keys — ``directories`` and |
| 25 | +``versions`` — alongside any arbitrary settings your pipeline needs: |
| 26 | + |
| 27 | +.. code-block:: yaml |
| 28 | + |
| 29 | +    project_name: 'my_analysis' |
| 30 | + |
| 31 | +    directories: |
| 32 | +      raw_data: |
| 33 | +        versioned: false |
| 34 | +        path: '~/data/raw' |
| 35 | +        files: |
| 36 | +          input_table: 'records.csv' |
| 37 | + |
| 38 | +      results: |
| 39 | +        versioned: true |
| 40 | +        path: '~/data/results' |
| 41 | +        files: |
| 42 | +          output_table: 'processed.csv' |
| 43 | +          summary: 'summary.txt' |
| 44 | + |
| 45 | +    versions: |
| 46 | +      results: 'v1' |
| 47 | + |
| 48 | +Each entry under ``directories`` requires three fields: |
| 49 | + |
| 50 | +- **versioned** (bool) — whether the directory uses version subdirectories. |
| 51 | +- **path** (str) — base path (tilde expansion is applied). |
| 52 | +- **files** (dict) — named file stubs within the directory. |
| 53 | + |
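| 54 | +For versioned directories the full path is ``{path}/{version}``, where the |
| 55 | +version comes from the ``versions`` dict (or a ``custom_version`` argument). |
| 55 | +With the config above, ``results`` resolves to ``~/data/results/v1`` and the |
| 55 | +unversioned ``raw_data`` stays at ``~/data/raw`` (both with ``~`` expanded). |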
| 56 | + |
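The path rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the package's own code: ``resolve_file`` and ``CONFIG`` are hypothetical names, the config is written as a plain dict rather than loaded from YAML, and the sketch assumes that an explicit ``custom_version`` takes precedence over the ``versions`` entry.

```python
from pathlib import Path

# Hypothetical stand-in for the parsed config YAML shown earlier.
CONFIG = {
    "directories": {
        "raw_data": {
            "versioned": False,
            "path": "~/data/raw",
            "files": {"input_table": "records.csv"},
        },
        "results": {
            "versioned": True,
            "path": "~/data/results",
            "files": {"output_table": "processed.csv", "summary": "summary.txt"},
        },
    },
    "versions": {"results": "v1"},
}

REQUIRED_FIELDS = ("versioned", "path", "files")


def resolve_file(config, directory, stub, custom_version=None):
    """Illustrative helper (not the package API): build the full path of a file stub."""
    entry = config["directories"][directory]
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise KeyError(f"{directory!r} is missing required fields: {missing}")

    base = Path(entry["path"]).expanduser()  # apply tilde expansion
    if entry["versioned"]:
        # Assumption: custom_version, when given, overrides the `versions` dict.
        version = custom_version or config["versions"][directory]
        base = base / version
    return base / entry["files"][stub]


print(resolve_file(CONFIG, "raw_data", "input_table"))
print(resolve_file(CONFIG, "results", "summary"))
print(resolve_file(CONFIG, "results", "summary", custom_version="v2"))
```

The first call ignores versions entirely because ``raw_data`` is unversioned; the other two differ only in the version subdirectory.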
| 57 | +Supported file extensions |
| 58 | +------------------------- |
| 59 | + |
| 60 | +.. list-table:: |
| 61 | +   :header-rows: 1 |
| 62 | +   :widths: 20 30 30 |
| 63 | + |
| 64 | +   * - Format |
| 65 | +     - Extensions |
| 66 | +     - Requires |
| 67 | +   * - CSV / TSV |
| 68 | +     - csv, tsv, gz, bz2 |
| 69 | +     - ``pandas`` |
| 70 | +   * - Excel |
| 71 | +     - xls, xlsx |
| 72 | +     - ``pandas``, ``openpyxl`` |
| 73 | +   * - Stata |
| 74 | +     - dta |
| 75 | +     - ``pandas`` |
| 76 | +   * - DBF |
| 77 | +     - dbf |
| 78 | +     - ``dbfread`` |
| 79 | +   * - YAML |
| 80 | +     - yaml, yml |
| 81 | +     - *(core)* |
| 82 | +   * - Plain text |
| 83 | +     - txt |
| 84 | +     - *(core)* |
| 85 | +   * - Vector geospatial |
| 86 | +     - shp, geojson, gpkg, fgb, gml, kml, … |
| 87 | +     - ``geopandas`` |
| 88 | +   * - Raster |
| 89 | +     - tif, geotiff |
| 90 | +     - ``rasterio`` |
| 91 | +   * - NetCDF |
| 92 | +     - nc |
| 93 | +     - ``xarray`` |
| 94 | + |
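To check ahead of time which optional packages a given file would need, the table above can be transcribed into a small lookup. This is a rough sketch, not something the package is known to provide: ``REQUIRES``, ``required_packages`` and ``missing_packages`` are hypothetical names, the lookup uses only the final suffix of the filename, and the vector list contains just the extensions named in the table because the rest of that row is elided.

```python
import importlib.util
from pathlib import Path

# Extension -> packages from the "Requires" column of the table above.
# An empty tuple means the core install is enough.
REQUIRES = {
    "csv": ("pandas",), "tsv": ("pandas",), "gz": ("pandas",), "bz2": ("pandas",),
    "xls": ("pandas", "openpyxl"), "xlsx": ("pandas", "openpyxl"),
    "dta": ("pandas",),
    "dbf": ("dbfread",),
    "yaml": (), "yml": (), "txt": (),
    # Only the vector extensions the table names explicitly.
    "shp": ("geopandas",), "geojson": ("geopandas",), "gpkg": ("geopandas",),
    "fgb": ("geopandas",), "gml": ("geopandas",), "kml": ("geopandas",),
    "tif": ("rasterio",), "geotiff": ("rasterio",),
    "nc": ("xarray",),
}


def required_packages(filename):
    """Hypothetical helper: packages needed for a file, judged by its last suffix."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext not in REQUIRES:
        raise ValueError(f"Extension not listed in the table: {ext!r}")
    return REQUIRES[ext]


def missing_packages(filename):
    """Return the required packages that cannot be found in this environment."""
    return [
        name for name in required_packages(filename)
        if importlib.util.find_spec(name) is None
    ]


for name in ("summary.txt", "records.csv.gz", "survey.xlsx"):
    print(name, "needs", required_packages(name), "missing:", missing_packages(name))
```

An empty ``missing`` list means the file can be handled with what is already installed; otherwise the matching extra from the installation section is the likely fix.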
| 95 | +For raster files, :func:`~versioning.autoread` returns a dict of the form |
| 96 | +``{"data": np.ndarray, "profile": dict}``, and |
| 97 | +:func:`~versioning.autowrite` accepts either that same dict or a |
| 98 | +``(data, profile)`` tuple. |