Commit a2147e0

Eli authored and committed
Update documentation.
1 parent 0a9def6 commit a2147e0

13 files changed

Lines changed: 1116 additions & 53 deletions

README.md

Lines changed: 34 additions & 17 deletions
@@ -1,21 +1,38 @@
 # dms_datastore

-Delta Modeling Section Datastore provides tools for downloading and managing continuous data. This repository is a work in progress.
-
-## Table of Contents
-- [Overview](#overview)
-- [Data Repository Structure](#data-repository-structure)
-- [Data Quality and Flags](#data-quality-and-flags)
-- [Data Screening and Error Detection Methods](#data-screening-and-error-detection-methods)
-- [Metadata and Station Concepts](#metadata-and-station-concepts)
-- [File Naming Conventions](#file-naming-conventions)
-- [Units and Standardization](#units-and-standardization)
-- [Data Fetching and Priority](#data-fetching-and-priority)
-- [Configuration System](#configuration-system)
-- [Accessing Datastore Data](#accessing-datastore-data)
-- [Challenges and Exceptions](#challenges-and-exceptions)
-- [Installation](#installation)
-- [CLI Commands](#cli-commands)
+Delta Modeling Section Datastore — tools for downloading, formatting, screening, and
+managing continuous time-series data from water-quality and hydrological agencies
+(USGS, CDEC, NOAA, NCRO, DES, and others).
+
+**Full documentation:** https://cadwrdeltamodeling.github.io/dms_datastore/
+
+## Quick Install
+
+```bash
+git clone https://github.com/CADWRDeltaModeling/dms_datastore
+conda env create -f environment.yml
+conda activate dms_datastore
+```
+
+Development install after the above:
+
+```bash
+pip install --no-deps -e .
+```
+
+## Quick Start
+
+```python
+from dms_datastore.read_multi import read_ts_repo
+
+# Read EC data for station "dsj"
+data = read_ts_repo("dsj", "ec")
+```
+
+For the complete data model, CLI reference, configuration system, and API docs see
+the [documentation site](https://cadwrdeltamodeling.github.io/dms_datastore/) or the
+[`docsrc/`](docsrc/) folder for the Sphinx source.
+

 ## Overview

@@ -170,7 +187,7 @@ The Dropbox Data Processing System provides a mechanism for importing ad-hoc or

 The system uses a YAML configuration file (`dropbox_spec.yaml`) to define data sources, collection patterns, and metadata handling rules. The `dropbox_data.py` script processes these configurations to locate, transform, and store the data in the standardized repository format.

-See [README-dropbox.md](README-dropbox.md) for detailed documentation on this system.
+See the Dropbox documentation in the project docs (docsrc/dropbox.rst) for detailed information on this system.

 ## Configuration System

README.rst

Lines changed: 12 additions & 28 deletions
@@ -1,41 +1,25 @@
-===============================
+===============
 dms_datastore
-===============================
+===============

-Downloading tools and data repository management. This repository is a work in progress. It is not recommended for any purpose while it is under construction.
+Delta Modeling Section Datastore — tools for downloading, formatting, screening, and
+managing continuous time-series data from water-quality and hydrological agencies
+(USGS, CDEC, NOAA, NCRO, DES, and others).

+**Full documentation:** https://cadwrdeltamodeling.github.io/dms_datastore/

-
-There are improvements that are needed to the downloading system tools.
-
-1. The downloading scripts should be called download_XXX.py where XXX is noaa, nwis, cdec etc.
-2. The API should be made uniform between these scripts.
-a. It should be able to use our new id or the agency_id with the new id preferred
-b. It should be able to use our variable name or the agency variable code
-c. It should produce files that are named according to the file naming convention in the data plan: http://msb-confluence/display/DMKB/Strawman+Data+Organization+Plan:
-
-usgs_sjj@bgc_11337190_turbidity_2021.csv
-
-This is all potentially destabilizing, so perhaps it should be done on a shortlived branch
-
-The station files don't have a uniform format. I prefer all look like this:
-id,agency_id,subloc,variable
-sjj,11337190,bgc,turbidity
-
-The agency_id column is optional.
-
-===============================
 Installation
-===============================
+============

 .. code-block:: bash
-
+
     git clone https://github.com/CADWRDeltaModeling/dms_datastore
-    conda env create -f environment.yml # should create a dms_datastore and pip install the package
-    # alternatively, pip install -e . after running the above command if you want to develop the package
+    conda env create -f environment.yml
     conda activate dms_datastore

+Development install after the above:

+.. code-block:: bash

-
+    pip install --no-deps -e .
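The README.rst text removed by this commit gives one example of the file naming convention, `usgs_sjj@bgc_11337190_turbidity_2021.csv`, alongside the station columns `id,agency_id,subloc,variable`. A minimal Python sketch of splitting such a name into its parts follows. It assumes the field order is agency, station id with an optional `@subloc`, agency id, variable, and a four-digit year; that order is inferred from the single example, not taken from a documented rule. `RepoFileName` and `parse_repo_filename` are hypothetical names for illustration and are not part of dms_datastore.

```python
import re
from typing import NamedTuple, Optional


class RepoFileName(NamedTuple):
    """Fields inferred from the example name; hypothetical, not a dms_datastore type."""
    agency: str
    station_id: str
    subloc: Optional[str]
    agency_id: str
    variable: str
    year: int


# Assumed layout: agency_id[@subloc]_agencyid_variable_year.csv
_PATTERN = re.compile(
    r"^(?P<agency>[^_@]+)_"
    r"(?P<station_id>[^_@]+)(?:@(?P<subloc>[^_@]+))?_"
    r"(?P<agency_id>[^_@]+)_"
    r"(?P<variable>.+)_"
    r"(?P<year>\d{4})\.csv$"
)


def parse_repo_filename(name: str) -> RepoFileName:
    """Split a file name of the assumed layout into its fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed convention: {name!r}")
    return RepoFileName(
        agency=match["agency"],
        station_id=match["station_id"],
        subloc=match["subloc"],  # None when the name has no @subloc part
        agency_id=match["agency_id"],
        variable=match["variable"],
        year=int(match["year"]),
    )


if __name__ == "__main__":
    print(parse_repo_filename("usgs_sjj@bgc_11337190_turbidity_2021.csv"))
```

The sub-location is optional in the pattern, so a name without `@subloc` parses with `subloc` set to `None`. Names that do not fit the assumed layout raise `ValueError` rather than being guessed at.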

docsrc/commands.rst

Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
dms_datastore Command Reference
2+
===============================
3+
4+
This page collects CLI command help pointers and workflow examples for the commands
5+
exposed by the package entry points.
6+
7+
Main Entrypoint
8+
---------------
9+
10+
Use the `dms` grouped CLI (or call commands directly by script name).
11+
12+
.. code-block:: bash
13+
14+
# grouped help
15+
dms --help
16+
17+
# subcommand help (example)
18+
dms download_ncro --help
19+
20+
Command Help Shortcuts
21+
-----------------------
22+
23+
.. code-block:: bash
24+
25+
dms --help
26+
download_noaa --help
27+
download_hycom --help
28+
download_hrrr --help
29+
download_cdec --help
30+
download_wdl --help
31+
download_nwis --help
32+
download_des --help
33+
download_ncro --help
34+
download_mokelumne --help
35+
download_ucdipm --help
36+
download_cimis --help
37+
download_dcc --help
38+
download_montezuma_gates --help
39+
download_smscg --help
40+
compare_directories --help
41+
populate_repo --help
42+
station_info --help
43+
reformat --help
44+
auto_screen --help
45+
inventory --help
46+
usgs_multi --help
47+
delete_from_filelist --help
48+
data_cache --help
49+
merge_files --help
50+
dropbox --help
51+
coarsen --help
52+
update_repo --help
53+
update_flagged_data --help
54+
rationalize_time_partitions --help
55+
56+
Workflow A: Repository Build Pipeline (Download → Reformat → Auto Screen)
57+
-----------------------------------------------------------------------
58+
59+
Stage 1: Download into raw/staging
60+
61+
.. code-block:: bash
62+
63+
# helper pattern used by all downloaders
64+
download_noaa --help
65+
download_nwis --help
66+
download_des --help
67+
download_ncro --help
68+
69+
Examples
70+
~~~~~~~~
71+
72+
.. code-block:: bash
73+
74+
# NOAA
75+
download_noaa --start 2024-01-01 --end 2024-01-31 --param water_level --stations ccc --dest <raw_dir>
76+
77+
# NWIS
78+
download_nwis --start 2024-01-01 --end 2024-01-31 --stations sjj --param 00060 --dest <raw_dir>
79+
80+
# DES
81+
download_des --start 2024-01-01 --end 2024-01-31 --stations cll --param flow --dest <raw_dir>
82+
83+
# NCRO timeseries
84+
download_ncro --start 2024-01-01 --end 2024-12-31 --stations orm --param elev --dest <raw_dir>
85+
86+
# NCRO inventory only
87+
download_ncro --inventory-only
88+
89+
# CDEC
90+
download_cdec --start 2024-01-01 --end 2024-01-31 --stations cse --param elev --dest <raw_dir>
91+
92+
# HYCOM
93+
download_hycom --sdate 2024-01-01 --edate 2024-01-31 --raw_dest <hycom_raw_dir> --processed_dest <hycom_processed_dir>
94+
95+
# HRRR
96+
download_hrrr --sdate 2024-01-01 --edate 2024-01-03 --dest <hrrr_raw_dir>
97+
98+
# UCD IPM (positional dates)
99+
download_ucdipm 2024-01-01 2024-01-31 --stnkey 281
100+
101+
Stage 2: Reformat raw → formatted
102+
103+
.. code-block:: bash
104+
105+
reformat --inpath <raw_dir> --outpath <formatted_dir>
106+
107+
Stage 2b: USGS multivariate cleanup
108+
109+
.. code-block:: bash
110+
111+
usgs_multi --fpath <formatted_dir>
112+
113+
Stage 3: Auto screen formatted → screened
114+
115+
.. code-block:: bash
116+
117+
auto_screen --fpath <formatted_dir> --dest <screened_dir>
118+
119+
Workflow B: Dropbox Ingest (separate workflow)
120+
----------------------------------------------
121+
122+
.. code-block:: bash
123+
124+
dropbox --input dms_datastore/config_data/dropbox_spec.yaml
125+
126+
Workflow C: Staging → Repo update and utilities
127+
------------------------------------------------
128+
129+
.. code-block:: bash
130+
131+
update_repo <staging_formatted_dir> <repo_formatted_dir> --plan --out-actions update_plan.csv
132+
update_repo <staging_formatted_dir> <repo_formatted_dir> --apply
133+
134+
Additional Utilities
135+
--------------------
136+
137+
See the documentation for details and examples for `station_info`, `merge_files`, `coarsen`, and other utilities.
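
Workflow A in the new docsrc/commands.rst runs four commands in sequence: download, reformat, USGS multivariate cleanup, and auto screen. A minimal Python sketch of chaining those stages is shown here. It uses only the command names and flags that appear in the file above, with the NWIS download as the example. `build_pipeline` and `run_pipeline` are hypothetical helpers for illustration, not part of dms_datastore, and the directory names in the usage section are placeholders. The sketch defaults to a dry run that only prints each command, since it assumes nothing about how the commands behave.

```python
import shlex
import subprocess
from typing import List


def build_pipeline(raw_dir: str, formatted_dir: str, screened_dir: str,
                   start: str, end: str, station: str, param: str) -> List[List[str]]:
    """Return the Workflow A commands, in order, as argument lists."""
    return [
        # Stage 1: download into the raw/staging directory (NWIS example)
        ["download_nwis", "--start", start, "--end", end,
         "--stations", station, "--param", param, "--dest", raw_dir],
        # Stage 2: reformat raw -> formatted
        ["reformat", "--inpath", raw_dir, "--outpath", formatted_dir],
        # Stage 2b: USGS multivariate cleanup
        ["usgs_multi", "--fpath", formatted_dir],
        # Stage 3: auto screen formatted -> screened
        ["auto_screen", "--fpath", formatted_dir, "--dest", screened_dir],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failed stage
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder directories; substitute real paths before a non-dry run.
    stages = build_pipeline("raw", "formatted", "screened",
                            "2024-01-01", "2024-01-31", "sjj", "00060")
    run_pipeline(stages)  # dry run: prints the four commands without executing them
```

Passing argument lists to `subprocess.run` avoids shell quoting problems with paths, and `check=True` keeps a failed download from feeding an empty directory into the later stages.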
