# dms_datastore — Workspace Instructions

- ## Project Overview
+ Follow organization standards from BayDeltaSCHISM https://raw.githubusercontent.com/CADWRDeltaModeling/BayDeltaSCHISM/refs/heads/master/AGENTS.md

- `dms_datastore` is a Python library and CLI toolkit for the Delta Modeling Section (DMS) that downloads, formats, screens, and manages continuous time-series data from water-quality and hydrological agencies (USGS, CDEC, NOAA, NCRO, DES, etc.). Data flows through four stages: **raw → formatted → screened → processed**.
+ Follow local project rules in AGENTS.md.

- ## Build and Test
+ Local project rules override organization defaults.
+
+
+ # Build and Test

The `dms_datastore` conda environment is assumed to exist. Always activate it before running any tests or install commands.

@@ -13,84 +16,7 @@ The `dms_datastore` conda environment is assumed to exist. Always activate it be
conda activate dms_datastore
pip install --no-deps -e .

- # Unit/integration tests (no real repo required)
- conda activate dms_datastore && pytest
-
- # Integration tests against a real repository
- conda activate dms_datastore && pytest test_repo/ --repo=<path_to_repo>
-
- # Single file
- conda activate dms_datastore && pytest tests/test_filename.py
```

pytest is configured in `pyproject.toml` (`[tool.pytest.ini_options]`): strict markers, JUnit XML output, ignores `setup.py` and `build/`.

- ## Architecture
-
- | Layer | Modules | Purpose |
- | --- | --- | --- |
- | Public API | `__init__.py` | Re-exports `read_ts_repo`, `read_ts`, `write_ts_csv` |
- | CLI | `__main__.py` | Click group `dms` aggregating all subcommands |
- | Config | `dstore_config.py`, `config_data/dstore_config.yaml` | Repo roots, station DBs, variable/source mappings |
- | File naming | `filename.py` | Parse/render filenames via `interpret_fname` / `meta_to_filename` |
- | I/O | `read_ts.py`, `write_ts.py` | Low-level CSV read/write with YAML front-matter |
- | Multi-file read | `read_multi.py` | `read_ts_repo` — resolves source priority, merges year-sharded files |
- | Download | `download_*.py` | One module per agency (CDEC, NWIS, NOAA, NCRO, DES, HRRR, HYCOM, …) |
- | Pipeline | `populate_repo.py`, `update_repo.py` | Orchestrate download → format → screen |
- | QA/QC | `auto_screen.py`, `screeners.py` | YAML-driven screening; flags stored as `user_flag` column |
- | Utilities | `inventory.py`, `merge_files.py`, `coarsen_file.py`, `rationalize_time_partitions.py`, `reconcile_data.py` | Repo maintenance |
-
- ## File Naming Convention
-
- Pattern: `{agency}_{station_id@subloc}_{agency_id}_{variable}_{syear}_{eyear}.csv`
-
- - `@subloc` is omitted when subloc is `default`/`None`
- - End year `9999` means open-ended (actively updated)
- - `variable@modifier` encodes e.g. `ec@daily`
-
- Examples:
- - `usgs_anh@north_11303500_flow_2024.csv`
- - `cdec_sac_11447650_flow_2020_9999.csv`
-
- See [dms_datastore/filename.py](../dms_datastore/filename.py) for `meta_to_filename` / `interpret_fname`.
-
- ## Data File Format
-
- CSV files with `#`-commented YAML front-matter:
-
- ```csv
- # format: dwr-dms-1.0
- # date_formatted: 2024-01-15T12:00:00
- # source_info:
- # siteName: MOKELUMNE R A ANDRUS ISLAND
- datetime,value,user_flag
- 2020-01-01 00:00:00,1.5,0
- ```
-
- - Index column: `datetime`
- - Always two data columns: `value` (float) and `user_flag` (`Int64`, nullable)
- - `user_flag != 0` → anomalous; masked by `read_ts` by default (`read_flagged=True`)
- - Files are year-sharded; wildcards handled automatically by `read_ts`
-
- ## Key Conventions
-
- - **Station IDs with sublocation**: `station_id@subloc` (e.g. `anh@north`, `msd@bottom`)
- - **Variables with modifier**: `param@modifier` (e.g. `ec@daily`)
- - **Units**: SI for most variables; stage/flow in ft / cfs; salinity as specific conductivity at 25°C (µS/cm)
- - **Source priority** is declared per agency in `dstore_config.yaml` and resolved by `read_ts_repo` — do not hard-code provider preferences in code
- - **Config paths** are resolved by `dstore_config.config_file(label)` — checks cwd first, then `config_data/`
- - New download modules must register as a Click command in `__main__.py` and add an entry point in `pyproject.toml`
-
- ## Tests
-
- - `tests/` — unit and integration tests with monkeypatched config; no real repo needed
- - `test_repo/` — integration tests; pass `--repo=<path>` to pytest
- - Use `tmp_path` and `monkeypatch` for config isolation
- - Do not couple unit tests to the shared repo path
-
- ## Key Reference Files
-
- - [README.md](../README.md) — full data model, flags, units, configuration system
- - [README-dropbox.md](../README-dropbox.md) — Dropbox data ingestion via `dropbox_spec.yaml`
- - [README-commands.md](../README-commands.md) — CLI command reference
- - [dms_datastore/config_data/dstore_config.yaml](../dms_datastore/config_data/dstore_config.yaml) — central config