Commit 83746a8

Clean documentation and remove orphaned scripts in preparation for public release.
1 parent 71e94eb commit 83746a8

19 files changed

Lines changed: 129 additions & 532 deletions

.claude/CLAUDE.md

Lines changed: 3 additions & 1 deletion
@@ -9,7 +9,9 @@ make build_env # Create conda env from environment.yml (name: openpois, Py
 make install_package # pip install -e . (editable install)
 ```

-Python executable: `/home/nathenry/miniforge3/envs/openpois/bin/python`.
+Python executable: `$CONDA_PREFIX/bin/python` after `conda activate openpois`.
+
+> **Note on data paths.** Examples in this directory show resolved paths under `~/data/openpois/...`, which is the maintainer's configured `directories.*.path` value in `config.yaml`. Substitute your own `directories.*.path` root when reading them; the layout under that root is set by `config.yaml`.

 ## Common commands

.claude/docs/data-sources.md

Lines changed: 1 addition & 9 deletions
@@ -12,7 +12,7 @@ Reference for every external data source openpois ingests. For the workflow that
 - **Auth**: OAuth — any OSM account works. Produce a Netscape-format cookie jar (browser export or Geofabrik's `oauth_cookie_client.py`). Path: `download.osm.history_cookie_file` (default `~/data/openpois/.creds/geofabrik_cookies.txt`).
 - **Pipeline**: `osmium tags-filter --omit-referenced` → `osmium time-filter` → pyosmium streams to `osm_versions.parquet` + `osm_changes.parquet`.
 - **Entry**: [src/openpois/io/osm_history_pbf.py](../../src/openpois/io/osm_history_pbf.py) (`download_osm_history`).
-- **Config**: `download.osm.start_date`, `end_date`, `date_interval_days`, `filter_keys`, `extract_keys`.
+- **Config**: `download.osm.start_date`, `end_date`, `filter_keys`, `extract_keys`.

 ## OSM snapshot (Geofabrik standard PBFs)

@@ -55,11 +55,3 @@ Reference for every external data source openpois ingests. For the workflow that
 - **Returns**: `(boundary_gdf, coarse_bboxes)` — single-row dissolved+buffered polygon (EPSG:4326) plus a list of bboxes for predicate pushdown.
 - **Antimeridian**: Aleutians split into two bboxes (Near Islands at +172°E vs. rest of AK at negative longitudes).

-## Legacy: Overpass-based OSM history
-
-Still wired up but superseded by the PBF pipeline. Queries Overpass API for element IDs in a bbox, then fetches per-element histories from the OSM API.
-
-- **Config**: `download.osm.history_bbox` (Seattle-scoped; Overpass can't serve US-wide histories).
-- **Entry**: [src/openpois/io/osm_history.py](../../src/openpois/io/osm_history.py) (`download_element_histories`).
-- **Script**: `scripts/osm_data/download.py`.
-- **When to use**: city-scale testing, or if Geofabrik OAuth is unavailable.
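
With the Overpass section removed, the **Pipeline** bullet in the first hunk of this file is the only documented history path. As a reading aid, here is a minimal sketch of how its two osmium command lines might be assembled. The function name `build_history_commands`, the file names, and the tag keys are hypothetical placeholders (the real keys come from `download.osm.filter_keys`), and the sketch only builds argument lists; it does not claim to mirror `download_osm_history`.

```python
from datetime import date


def build_history_commands(src, filtered, windowed, keys, start, end):
    """Return the two osmium argument lists; nothing is executed here."""
    tags_filter = [
        "osmium", "tags-filter", src, *keys,
        "--omit-referenced",  # flag named in the Pipeline bullet
        "-o", filtered, "--overwrite",
    ]
    time_filter = [
        "osmium", "time-filter", filtered,
        f"{start.isoformat()}T00:00:00Z",  # FROM-TIME
        f"{end.isoformat()}T00:00:00Z",    # TO-TIME
        "-o", windowed, "--overwrite",
    ]
    return tags_filter, time_filter


# Hypothetical file names and tag keys, with the date window from config.yaml.
tags_cmd, time_cmd = build_history_commands(
    "us-history.osh.pbf",
    "us-poi.osh.pbf",
    "us-poi-window.osh.pbf",
    ["amenity", "shop"],
    date(2016, 1, 1),
    date(2025, 12, 31),
)
```

Each list could be passed to `subprocess.run(..., check=True)`; keeping command construction separate from execution makes the window arguments easy to inspect before a long download-and-filter run.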

.claude/skills/conflate-snapshots/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ upload for web consumption.
1818
> credentials (`aws_access_key_id` starting with `ASIA…`). **Before** running
1919
> step 7, ask the user to regenerate them at
2020
> <https://source.coop/repositories/henryspatialanalysis/openpois/manage>
21-
> and overwrite `~/repos/openpois/.env.json`. The upload script will warn if
21+
> and overwrite `.env.json` at the repo root. The upload script will warn if
2222
> the file looks stale, but it cannot tell whether the token itself has
2323
> expired until it actually fails.
2424
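
The note above says the upload script warns when the credentials file "looks stale". One way such a check could work is sketched below, under stated assumptions: the JSON is assumed to have a top-level `aws_access_key_id` field, the one-hour threshold is taken from the "~1 hour" token lifetime mentioned in `docs/api.rst`, and `credentials_warnings` is a hypothetical name, not the project's actual loader.

```python
import json
import tempfile
import time
from pathlib import Path


def credentials_warnings(path, max_age_seconds=3600, now=None):
    """Return human-readable warnings about a credentials JSON file."""
    path = Path(path)
    if not path.exists():
        return [f"{path} is missing"]
    warnings = []
    creds = json.loads(path.read_text())
    # Temporary keys are described in the note as starting with "ASIA".
    if not creds.get("aws_access_key_id", "").startswith("ASIA"):
        warnings.append("aws_access_key_id does not look like a temporary key")
    # File age is only a proxy: the token itself may expire earlier or later.
    age = (time.time() if now is None else now) - path.stat().st_mtime
    if age > max_age_seconds:
        warnings.append(f"file is {age / 60:.0f} minutes old; token may have expired")
    return warnings


# Demonstration with a throwaway file and an obviously fake key.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / ".env.json"
    demo.write_text(json.dumps({"aws_access_key_id": "ASIAEXAMPLEONLY"}))
    fresh = credentials_warnings(demo)
    stale = credentials_warnings(demo, now=time.time() + 7200)
```

As the note points out, a check like this can only flag a file that is probably out of date; an expired token is confirmed only when the upload fails.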

Makefile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ CONDA_PYTHON := $(shell conda run -n openpois which python 2>/dev/null || echo p
2020
CONDA_BIN := $(dir $(CONDA_PYTHON))
2121

2222
lint:
23-
@$(CONDA_BIN)flake8 src/ exploratory/ tests/
23+
@$(CONDA_BIN)flake8 src/ scripts/ tests/
2424
@$(CONDA_BIN)pylint src/openpois/
2525

2626
# Build the site for production

config.yaml

Lines changed: 0 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -22,14 +22,6 @@ download:
2222
osm:
2323
start_date: 2016-01-01
2424
end_date: 2025-12-31
25-
date_interval_days: 7
26-
# Seattle-area bbox used only by the Overpass-based historical download.
27-
# Overpass cannot serve US-wide histories; keep this scoped to a city.
28-
history_bbox:
29-
xmin: -122.45
30-
ymin: 47.50
31-
xmax: -122.25
32-
ymax: 47.70
3325
pbf_url: "https://download.geofabrik.de/north-america/us-latest.osm.pbf"
3426
pr_pbf_url: "https://download.geofabrik.de/north-america/us/puerto-rico-latest.osm.pbf"
3527
# Full-history PBFs live on Geofabrik's OAuth-protected internal server.
@@ -156,9 +148,6 @@ directories:
 us_changes: us_osm_changes.parquet
 pr_versions: pr_osm_versions.parquet
 pr_changes: pr_osm_changes.parquet
-# Legacy Overpass-based pipeline (still used by scripts/osm_data/download.py)
-osm_elements: osm_elements.csv
-osm_failed_elements: osm_failed_elements.csv
 # Modelling-ready observations (one row per POI version × shared_label)
 osm_observations: osm_observations.parquet
 model_output:
@@ -208,7 +197,6 @@ directories:

 # Settings for POI conflation
 conflation:
-osm_confidence_weight: 0.8
 overture_confidence_weight: 0.7
 min_match_score: 0.50
 max_radius_m: 200

docs/api.rst

Lines changed: 27 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -51,15 +51,16 @@ used for type-agreement scoring.
5151
io
5252
--
5353

54-
openpois.io.osm_history
55-
~~~~~~~~~~~~~~~~~~~~~~~
54+
openpois.io.osm_history_pbf
55+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
5656

57-
Download OpenStreetMap element change histories via the Overpass and OSM APIs.
58-
Builds Overpass queries across a configured date range to collect element IDs,
59-
then fetches the full version history of each element, producing per-version
60-
and per-change tables suitable for the change-rate model.
57+
Download Geofabrik full-history PBFs (US + Puerto Rico), filter to POI tags
58+
with ``osmium tags-filter``, time-window with ``osmium time-filter``, and parse
59+
with pyosmium into per-version and per-change Parquet tables suitable for the
60+
change-rate model. Uses an OAuth cookie jar against Geofabrik's internal
61+
server.
6162

62-
.. automodule:: openpois.io.osm_history
63+
.. automodule:: openpois.io.osm_history_pbf
6364
:members:
6465
:undoc-members:
6566
:show-inheritance:
@@ -105,15 +106,27 @@ sorts rows within each partition by a finer geohash for spatial locality.
 :undoc-members:
 :show-inheritance:

-openpois.io.s3
-~~~~~~~~~~~~~~
+openpois.io.source_coop
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Upload a locally partitioned dataset to Source Cooperative's S3-compatible
+storage. Walks the Hive partition directory, uploads each Parquet file under
+a versioned prefix, and reports the public URL on completion. Credentials
+come from a JSON file at the repo root (``publish.credentials_file``).
+
+.. automodule:: openpois.io.source_coop
+:members:
+:undoc-members:
+:show-inheritance:
+
+openpois.io.credentials
+~~~~~~~~~~~~~~~~~~~~~~~

-Upload a locally partitioned dataset to a public S3 bucket. Walks the Hive
-partition directory, uploads each Parquet file under a versioned S3 prefix
-with public-read ACL, and reports the public base URL on completion. Requires
-AWS credentials via environment variables or ``~/.aws/credentials``.
+Load Source Cooperative AWS-compatible credentials from a JSON file. Tokens
+are short-lived (~1 hour); the loader logs a clear error pointing at the
+credentials regeneration URL when the file is stale or missing.

-.. automodule:: openpois.io.s3
+.. automodule:: openpois.io.credentials
 :members:
 :undoc-members:
 :show-inheritance:
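
The new `openpois.io.source_coop` description says the uploader walks the Hive partition directory and places each Parquet file under a versioned prefix. A minimal sketch of that mapping step follows, with no network calls. The function name `plan_upload_keys`, the `geohash_prefix=` directory naming, and the version string are illustrative assumptions rather than the module's real interface.

```python
import tempfile
from pathlib import Path, PurePosixPath


def plan_upload_keys(partition_root, version_prefix):
    """Map each local Parquet file to the object key it would be uploaded under."""
    root = Path(partition_root)
    plan = {}
    for local in sorted(root.rglob("*.parquet")):
        relative = local.relative_to(root)
        # Object keys always use forward slashes, whatever the local OS uses.
        plan[local] = str(PurePosixPath(version_prefix, *relative.parts))
    return plan


# Build a tiny fake partition tree and list the keys it would produce.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "conflated_partitioned"
    for cell in ("9q", "c2"):
        part_dir = root / f"geohash_prefix={cell}"
        part_dir.mkdir(parents=True)
        (part_dir / "part-0.parquet").write_bytes(b"")
    keys = list(plan_upload_keys(root, "v0.1.0/conflated").values())
```

Computing the full plan before uploading anything makes it possible to log or dry-run the key layout while the short-lived credentials are still valid.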

docs/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
# -- Project information -------------------------------------------------------
1010

1111
project = "openpois"
12-
copyright = "2024, Nathaniel Henry"
12+
copyright = "2026, Nathaniel Henry"
1313
author = "Nathaniel Henry"
1414
release = "0.1.0"
1515

docs/workflows.md

Lines changed: 0 additions & 4 deletions
This file was deleted.

docs/workflows.rst

Lines changed: 76 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,46 @@
11
Workflows
22
=========
33

4-
This page describes the four end-to-end pipelines that make up the openpois
5-
data processing system, in the order they should be executed. Each pipeline
6-
is implemented as a series of scripts in the ``scripts/`` directory; the
7-
scripts call library functions documented in the :doc:`api`.
4+
This page describes the four end-to-end pipelines that produce the openpois
5+
dataset, in the order they are executed. Each pipeline is implemented as a
6+
series of scripts in the ``scripts/`` directory; the scripts call library
7+
functions documented in the :doc:`api`.
88

99
All scripts read their configuration from ``config.yaml`` via
1010
``config_versioned.Config``. See the individual script docstrings for the
1111
exact config keys each script uses.
1212

13-
---
13+
14+
Prerequisites
15+
-------------
16+
17+
**Python environment.** Install the conda env from ``environment.yml`` and
18+
the package itself in editable mode:
19+
20+
.. code-block:: bash
21+
22+
make build_env # conda env create -f environment.yml (env name: openpois)
23+
conda activate openpois
24+
make install_package # pip install -e .
25+
26+
**Geofabrik OAuth (Pipeline 2 only).** Pipeline 2 downloads full-history
27+
PBFs from Geofabrik's OAuth-protected internal server. Any OSM account grants
28+
access. Generate a Netscape-format cookie jar by logging in at
29+
``https://osm-internal.download.geofabrik.de/`` and exporting cookies, or by
30+
running Geofabrik's ``oauth_cookie_client.py``. Save the cookie jar at the
31+
path configured in ``config.yaml`` under ``download.osm.history_cookie_file``.
32+
33+
**Source Cooperative credentials (publishing only).** Publishing the data
34+
back to Source Cooperative requires short-lived AWS-style credentials in a
35+
JSON file at the path configured under ``publish.credentials_file`` (default:
36+
``.env.json`` at the repo root). The format is documented in the
37+
``scripts/publish/upload_to_source_coop.py`` docstring. Replicators who do
38+
not intend to publish can stop after Pipeline 4 Step 3.
39+
40+
----
1441

1542
Pipeline 1: POI Snapshot Downloads
16-
------------------------------------
43+
----------------------------------
1744

1845
These two scripts are independent and can be run in any order (or in
1946
parallel). Each downloads a current US-wide POI snapshot from one data
@@ -51,27 +78,29 @@ See :mod:`openpois.io.overture`.
 Reads the first 100 rows of each snapshot without loading the full files,
 saving snippet CSVs to the ``testing/`` directory for column inspection.

----
+----

 Pipeline 2: OSM Historical Change-Rate Model
 --------------------------------------------

-This pipeline downloads OpenStreetMap element histories for a Seattle-area
-bounding box and fits a Poisson change-rate model to estimate how quickly
-different POI categories become outdated.
+This pipeline downloads OpenStreetMap full-history PBFs (US + Puerto Rico)
+and fits a Poisson change-rate model to estimate how quickly different POI
+categories become outdated.

-**Step 1 — Download OSM element histories**
+**Step 1 — Download full-history PBFs**

 .. code-block:: bash

-python scripts/osm_data/download.py
+python scripts/osm_data/download_history.py

-Queries the Overpass API across a configured date range to collect element IDs
-for each tag key, then fetches the full version history of each element via the
-OSM API. Outputs ``osm_elements.csv``, ``osm_versions.csv``,
-``osm_changes.csv``, and ``osm_failed_elements.csv``.
+Requires the Geofabrik OAuth cookie jar described in *Prerequisites* above.
+Downloads the US-mainland and Puerto Rico full-history extracts, filters
+each with ``osmium tags-filter`` (POI tag keys only) and ``osmium
+time-filter`` (the ``download.osm.start_date`` / ``end_date`` window), then
+parses with pyosmium into per-version and per-change Parquet tables.
+Outputs: ``osm_versions.parquet`` and ``osm_changes.parquet``.

-See :mod:`openpois.io.osm_history`.
+See :mod:`openpois.io.osm_history_pbf`.

 **Step 2 — Reformat into observations**

@@ -82,7 +111,7 @@ See :mod:`openpois.io.osm_history`.
 Converts raw version histories into one-row-per-observation records, each
 flagged for whether the configured ``osm_data.tag_key`` changed, then
 assigns a shared taxonomy label and explodes rows for POIs mapping to
-multiple labels. Output: ``osm_observations.csv``.
+multiple labels. Output: ``osm_observations.parquet``.

 See :mod:`openpois.osm.format_observations`.

@@ -111,10 +140,10 @@ Produces Kaplan-Meier-style survival curve plots saved to

 See :mod:`openpois.osm.change_plots`.

----
+----

 Pipeline 3: Rate the OSM Snapshot
------------------------------------
+---------------------------------

 This pipeline applies the fitted change-rate model (Pipeline 2) to the OSM
 snapshot (Pipeline 1) to assign a confidence score to every POI.
@@ -141,31 +170,30 @@ See :mod:`openpois.models.apply`.

 python scripts/osm_snapshot/format_for_upload.py

-Adds geohash columns and writes a Hive-style partitioned dataset so that the
-web map can fetch only the tiles it needs. Output:
-``osm_snapshot_partitioned/``.
+Adds geohash columns and writes a Hive-style partitioned dataset so the web
+map can fetch only the tiles it needs. Output: ``osm_snapshot_partitioned/``.

 See :mod:`openpois.io.geohash_partition`.

-**Step 3 — Upload to S3**
+**Step 3 — Build OSM PMTiles**

 .. code-block:: bash

-python scripts/osm_snapshot/upload_to_s3.py
+python scripts/osm_snapshot/prepare_pmtiles.py

-Uploads the partitioned dataset to the configured public S3 bucket with
-public-read ACL. Requires AWS credentials (``AWS_ACCESS_KEY_ID`` /
-``AWS_SECRET_ACCESS_KEY`` env vars or ``~/.aws/credentials``).
+Generates a single-zoom (z14) PMTiles archive from the partitioned dataset
+for use by the web map. Output: ``osm_snapshot.pmtiles``.

-See :mod:`openpois.io.s3`.
+See :mod:`openpois.io.pmtiles`.

----
+----

-Pipeline 4: Conflation and Upload
------------------------------------
+Pipeline 4: Conflation and Publishing
+-------------------------------------

-This pipeline conflates the rated OSM snapshot with the Overture Maps snapshot
-into a single unified POI dataset for the web map.
+This pipeline conflates the rated OSM snapshot with the Overture Maps
+snapshot into a single unified POI dataset and publishes it to Source
+Cooperative.

 **Prerequisites:** Pipeline 3 rated OSM snapshot and Pipeline 1 Overture
 snapshot.
@@ -193,23 +221,29 @@ See :mod:`openpois.conflation.match`, :mod:`openpois.conflation.merge`, and
 Produces a summary CSV with match counts and average match scores per
 shared taxonomy label. Output: ``summary_by_label.csv``.

-**Step 3 — Partition for upload**
+**Step 3 — Partition and build conflated PMTiles**

 .. code-block:: bash

 python scripts/conflation/format_for_upload.py
+python scripts/conflation/prepare_pmtiles.py

-Adds geohash columns and writes a Hive-style partitioned dataset.
-Output: ``conflated_partitioned/``.
+Adds geohash columns and writes a Hive-style partitioned dataset, then
+builds a single-zoom (z14) PMTiles archive of the conflated points.
+Outputs: ``conflated_partitioned/`` and ``conflated.pmtiles``.

-See :mod:`openpois.io.geohash_partition`.
+See :mod:`openpois.io.geohash_partition` and :mod:`openpois.io.pmtiles`.

-**Step 4 — Upload to S3**
+**Step 4 — Publish to Source Cooperative** *(optional)*

 .. code-block:: bash

-python scripts/conflation/upload_to_s3.py
+python scripts/publish/upload_to_source_coop.py

-Uploads the partitioned conflated dataset to S3 with public-read ACL.
+Uploads the partitioned conflated dataset, the partitioned OSM dataset,
+both PMTiles archives, and a per-version README to Source Cooperative
+under the ``versions.source_coop`` folder. Requires the credentials file
+described in *Prerequisites*. Skip this step if you only want the data
+locally.

-See :mod:`openpois.io.s3`.
+See :mod:`openpois.io.source_coop` and :mod:`openpois.publish.build_readme`.

scripts/conflation/conflate.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
#!/home/nathenry/miniforge3/envs/openpois/bin/python
1+
#!/usr/bin/env python
22
"""
33
Conflate rated OSM POIs with Overture Maps POIs into a unified dataset.
44
