`.claude/CLAUDE.md` (22 additions, 13 deletions)

Code style is enforced by Black (format on save in VSCode). Linting via flake8 a…

**openpois** models POI (Point of Interest) stability over time using historical OpenStreetMap data. The workflow is:

1. **Download OSM history** (`src/openpois/io/osm_history.py`) — queries the Overpass API for element histories within a bounding box and date range, producing version/change tables
2. **Format observations** (`src/openpois/osm/format_observations.py`) — converts raw OSM version histories into observation records (one row per version) with flags for tag changes and deletions
3. **Model change rates** (`src/openpois/models/`) — fits an empirical Bayes model using PyTorch to estimate per-group POI change rates (λ) as a Poisson process
4. **Visualize stability** (`src/openpois/osm/change_plots.py`) — plots how long POI tags remain unchanged

The **scripts/** directory contains end-to-end pipelines that call library functions using settings from `config.yaml`. They are not part of the installed package and serve as reference implementations.
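Steps 2 and 3 can be illustrated with a minimal standard-library sketch. Everything here is an assumption for illustration: the `Version` and `Observation` records and the `format_versions` and `pooled_change_rate` names are hypothetical and are not the API of `format_observations.py` or `models/`. The rate is the plain constant-rate Poisson maximum-likelihood estimate (total events divided by total exposure time), a swapped-in simplification of the per-group empirical Bayes fit described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Version:
    """One historical version of an OSM element (hypothetical, simplified)."""
    element_id: str
    version: int
    timestamp: datetime
    tags: dict
    visible: bool  # OSM marks deleted versions as not visible


@dataclass
class Observation:
    """One row per version, flagged for tag changes and deletions."""
    element_id: str
    version: int
    timestamp: datetime
    tags_changed: bool
    deleted: bool


def format_versions(versions: list[Version]) -> list[Observation]:
    """Convert raw version histories into observation records."""
    observations = []
    previous: dict[str, Version] = {}
    for v in sorted(versions, key=lambda ver: (ver.element_id, ver.version)):
        prev = previous.get(v.element_id)
        # Compare tags only between visible versions: the first version has no
        # predecessor, and a deletion is flagged as a deletion instead.
        tags_changed = (
            prev is not None and prev.visible and v.visible and prev.tags != v.tags
        )
        observations.append(
            Observation(v.element_id, v.version, v.timestamp, tags_changed, not v.visible)
        )
        previous[v.element_id] = v
    return observations


def pooled_change_rate(observations: list[Observation], end: datetime) -> float:
    """Constant-rate Poisson MLE: total changes / total exposure, in events per day.

    Exposure runs from each element's first version to its deletion, or to
    `end` if it was never deleted. Assumes observations are sorted by version
    within each element, as format_versions returns them.
    """
    first_seen: dict[str, datetime] = {}
    stopped: dict[str, datetime] = {}
    events = 0
    for o in observations:
        first_seen.setdefault(o.element_id, o.timestamp)
        if o.tags_changed or o.deleted:
            events += 1
        if o.deleted:
            stopped[o.element_id] = o.timestamp
    exposure_days = sum(
        (stopped.get(eid, end) - start).total_seconds() / 86400
        for eid, start in first_seen.items()
    )
    return events / exposure_days
```

The sketch flags a tag change only between visible versions, so a deletion is counted once as a deletion rather than also as a tag change. A per-group version would run the same estimate on each group's observations; the empirical Bayes step would then shrink those group rates toward a shared prior.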
### Key classes and files

- `EventRate` (`models/event_rate.py`) — wraps a constant or time-varying λ; computes change probabilities via integration
- `ModelFitter` (`models/model_fitter.py`) — fits λ using PyTorch L-BFGS optimizer with optional priors; supports parameter draws for uncertainty
- `pytorch_setup()` / `prepare_data_for_model()` (`models/setup.py`) — initializes torch (GPU/CPU) and prepares filtered, grouped observation data
- `download_element_histories()` (`io/osm_history.py`) — main entry point for OSM history acquisition (Overpass, `download.osm.history_bbox` config key, Seattle-scoped — do NOT repurpose for nationwide use; Overpass cannot serve US-wide histories)
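For a Poisson process with rate λ(t), the probability that a POI changes at least once in [t0, t1] is 1 − exp(−∫λ(t) dt), which is the kind of integral `EventRate` is described as computing. A minimal sketch, assuming a hypothetical `SketchEventRate` class (not the real `models/event_rate.py` interface) and the trapezoidal rule as the integration method:

```python
import math
from typing import Callable, Union

Rate = Union[float, Callable[[float], float]]  # constant λ or λ(t)


class SketchEventRate:
    """Hypothetical stand-in for EventRate: constant or time-varying λ."""

    def __init__(self, rate: Rate, n_steps: int = 1000):
        self.rate = rate
        self.n_steps = n_steps  # trapezoid intervals for time-varying λ

    def cumulative_hazard(self, t0: float, t1: float) -> float:
        """Integral of λ(t) over [t0, t1]."""
        if not callable(self.rate):
            return self.rate * (t1 - t0)  # closed form for constant λ
        h = (t1 - t0) / self.n_steps
        total = 0.5 * (self.rate(t0) + self.rate(t1))
        for i in range(1, self.n_steps):
            total += self.rate(t0 + i * h)
        return total * h

    def change_probability(self, t0: float, t1: float) -> float:
        """P(at least one change in [t0, t1]) = 1 - exp(-integral of λ)."""
        return 1.0 - math.exp(-self.cumulative_hazard(t0, t1))
```

With a constant λ the integral collapses to λ(t1 − t0), so no numerical integration is needed; exp(−∫λ) on its own is the complementary probability that the tags stay unchanged over the interval.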
### Configuration

`config.yaml` holds all shared settings (spatial boundary, date ranges, OSM tag keys, model hyperparameters, output directory paths with versioning). The `config_versioned` package (external dependency) reads this file. Scripts load config at startup; library functions accept parameters directly.

- `.get()` raises `ValueError` for null config values — pass `fail_if_none=False` for optional fields like `release_date: null`
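The null-handling rule can be sketched as follows. `SketchConfig` and its `get(*keys, fail_if_none=True)` signature are hypothetical stand-ins, not the `config_versioned` API; only the behavior (a `ValueError` for null values unless `fail_if_none=False`) comes from the note above, and a nested dict stands in for the parsed YAML.

```python
from typing import Any


class SketchConfig:
    """Hypothetical stand-in for the config_versioned reader (not its real API)."""

    def __init__(self, data: dict):
        self._data = data  # stands in for the parsed config.yaml

    def get(self, *keys: str, fail_if_none: bool = True) -> Any:
        """Walk nested keys; raise ValueError on a null value unless told not to."""
        node: Any = self._data
        for key in keys:
            node = node[key]  # a key that is absent entirely still raises KeyError
        if node is None and fail_if_none:
            raise ValueError(f"config value {'.'.join(keys)!r} is null")
        return node
```

Distinguishing "present but null" from "missing" this way keeps typos in key names loud while letting optional fields be explicitly left empty.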
## POI Snapshot Downloads

Three utilities, separate from the historical OSM workflow, download current snapshots covering the 50 US states, DC, and Puerto Rico:
- Single source of truth for the US+PR extent used by all three snapshot downloaders
- Downloads the Census 1:20M cartographic state shapefile (`cb_2023_us_state_20m`) on first use; cached under `directories.boundary`
- `get_us_pr_boundary()` returns `(boundary_gdf, coarse_bboxes)` — a single-row dissolved+buffered polygon (EPSG:4326) plus a list of bboxes for predicate pushdown
- Buffering is done in `EPSG:6933` (World Equal-Area Cylindrical) so the `coastline_buffer_m` (default 100 m) is accurate across CONUS / AK / HI / PR. Because `.dissolve()` removes internal state borders, the uniform outward buffer effectively expands only the coastline; the spill across the Canada and Mexico land borders is negligible.
- `coarse_bboxes` splits the Aleutians at the antimeridian into two bboxes (Near Islands at +172°E vs. rest of AK at negative longitudes)
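The antimeridian split can be sketched with plain tuples. `BBox`, `split_antimeridian`, and `in_any_bbox` are hypothetical helpers, and the sketch assumes a wrapping bbox is encoded with `xmin > xmax`; the real `coarse_bboxes` are derived from the dissolved boundary polygon, which this does not reproduce.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float  # longitude, degrees
    ymin: float  # latitude, degrees
    xmax: float
    ymax: float


def split_antimeridian(b: BBox) -> list[BBox]:
    """Split a bbox that wraps the antimeridian (encoded as xmin > xmax) in two."""
    if b.xmin <= b.xmax:
        return [b]  # ordinary bbox, nothing to split
    return [
        BBox(b.xmin, b.ymin, 180.0, b.ymax),   # eastern piece (e.g. Near Islands)
        BBox(-180.0, b.ymin, b.xmax, b.ymax),  # western piece (rest of AK)
    ]


def in_any_bbox(lon: float, lat: float, bboxes: list[BBox]) -> bool:
    """Coarse test: one OR-ed disjunct per bbox, like the pushdown filter."""
    return any(b.xmin <= lon <= b.xmax and b.ymin <= lat <= b.ymax for b in bboxes)
```

Collapsing Alaska's extent into one ordinary min/max bbox would give longitudes from about −180 to about +180, covering almost the whole globe, which is why the coarse filter keeps two pieces.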
- Two Geofabrik extracts: `us-latest.osm.pbf` (~11 GB, 50 states incl. AK+HI) + `puerto-rico-latest.osm.pbf` (PR is NOT in the US extract) → osmium tags-filter → pyosmium parse → concat → GeoParquet
- Geofabrik extracts are pre-cut to admin boundaries, so no polygon post-filter is needed
- `osmium` is in the conda env bin but NOT on shell PATH; code resolves it via `Path(sys.executable).parent / "osmium"`
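Resolving the binary and building the filter command might look like this. The function names and tag keys are hypothetical; the `osmium tags-filter INPUT nwr/KEY ... -o OUTPUT --overwrite` form is standard osmium-tool syntax, and the sketch assumes a Unix-style conda layout where the interpreter and `osmium` share a `bin/` directory.

```python
import sys
from pathlib import Path


def osmium_path() -> Path:
    """Locate osmium next to the running interpreter (conda env bin/, not PATH)."""
    return Path(sys.executable).parent / "osmium"


def tags_filter_cmd(src: Path, dst: Path, tag_keys: list[str]) -> list[str]:
    """Build an `osmium tags-filter` command keeping nodes/ways/relations with any key."""
    filters = [f"nwr/{key}" for key in tag_keys]  # nwr = nodes, ways, relations
    return [str(osmium_path()), "tags-filter", str(src), *filters,
            "-o", str(dst), "--overwrite"]
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting and raises `CalledProcessError` if osmium exits with an error.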
- DuckDB + httpfs + spatial extensions; queries public S3 directly, no auth
- **Two-stage spatial filter:** DuckDB `WHERE` clause ORs one disjunct per coarse bbox (predicate pushdown on Overture's `bbox` struct column), then a GeoPandas `sjoin(predicate='within')` post-filter against the exact US+PR polygon
- `taxonomy` field is a named STRUCT: use `taxonomy.hierarchy[1]` (not `taxonomy[1]`)
- `brand` is a singular struct (not array); geometry is native DuckDB GEOMETRY type requiring `LOAD spatial` and `ST_X()/ST_Y()`
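The first stage of the two-stage filter, with the STRUCT and geometry notes applied, can be sketched as a query builder. `places_query` and the `source` path are hypothetical; the `bbox.xmin`/`bbox.ymin` fields assume Overture's `bbox` struct layout, and filtering on the min corner alone works only because places are points. The GeoPandas `sjoin` second stage is not shown.

```python
from typing import Sequence

Bounds = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in degrees


def places_query(source: str, bboxes: Sequence[Bounds]) -> str:
    """Build a DuckDB query whose WHERE clause ORs one disjunct per coarse bbox."""
    disjuncts = [
        # Places are points, so bbox.xmin/bbox.ymin are the point's coordinates.
        f"(bbox.xmin BETWEEN {xmin} AND {xmax} AND bbox.ymin BETWEEN {ymin} AND {ymax})"
        for xmin, ymin, xmax, ymax in bboxes
    ]
    where = "\n   OR ".join(disjuncts)
    return f"""SELECT
    id,
    taxonomy.hierarchy[1] AS taxonomy_level1,  -- named STRUCT: not taxonomy[1]
    ST_X(geometry) AS lon,                     -- requires LOAD spatial
    ST_Y(geometry) AS lat
FROM read_parquet('{source}', hive_partitioning = true)
WHERE {where}"""
```

The string would run on a DuckDB connection after `INSTALL httpfs; LOAD httpfs; INSTALL spatial; LOAD spatial;`. Interpolating floats directly into SQL is acceptable here only because the bboxes come from code, not user input.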
- Row filter: `country IN ('US', 'PR') AND date_closed IS NULL` — Foursquare uses ISO alpha-2 codes, so PR must be listed explicitly
- PyIceberg has no spatial predicate support, so an exact `sjoin(predicate='within')` post-filter runs after the rows are loaded
- `fsq_category_ids` arrives as a numpy/pyarrow array — use `len(x) == 0`, not `if not x:`
- Token in `FSQ_PORTAL_TOKEN` env var; run: `python scripts/foursquare/download.py`
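The row filter and the emptiness check can be mirrored in plain Python, for example when testing post-load logic. `keep_row` and `has_no_categories` are hypothetical helpers; in the downloader the row filter is applied when the table is read, not row by row like this.

```python
from collections.abc import Sized
from datetime import date
from typing import Optional

KEEP_COUNTRIES = {"US", "PR"}  # ISO alpha-2: Puerto Rico has its own code


def keep_row(country: str, date_closed: Optional[date]) -> bool:
    """Python mirror of: country IN ('US', 'PR') AND date_closed IS NULL."""
    return country in KEEP_COUNTRIES and date_closed is None


def has_no_categories(ids: Sized) -> bool:
    """Emptiness check that also works for numpy/pyarrow arrays.

    `if not ids:` raises ValueError for a numpy array with more than one
    element, so compare the length explicitly instead.
    """
    return len(ids) == 0
```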