Commit 235ea0a

Move all orchestration scripts to scripts/ and update documentation.
1 parent 8f1368e commit 235ea0a

18 files changed

Lines changed: 223 additions & 65 deletions
File renamed without changes.
Lines changed: 30 additions & 8 deletions
@@ -1,15 +1,37 @@
 #!/home/nathenry/miniforge3/envs/openpois/bin/python
 """
-Conflate rated OSM POIs with Overture Maps POIs.
-
-Reads both snapshots, builds a taxonomy crosswalk, finds spatial
-candidates via BallTree, scores candidates on distance + name +
-type + identifiers, performs greedy one-to-one matching, and merges
-into a unified superset saved as GeoParquet.
+Conflate rated OSM POIs with Overture Maps POIs into a unified dataset.
+
+Reads both snapshots, assigns each POI a shared taxonomy label via CSV
+crosswalk files, finds spatial candidates within per-category radii using a
+BallTree, scores candidate pairs on distance, name similarity, type agreement,
+and shared identifiers, performs greedy one-to-one matching, and merges all
+POIs (matched and unmatched) into a single GeoParquet output.
+
+Config keys used (config.yaml):
+snapshot_osm.rated_snapshot — rated OSM GeoParquet input path
+snapshot_overture.snapshot — Overture GeoParquet input path
+conflation.conflated — output GeoParquet path
+download.osm.filter_keys — tag keys used for taxonomy assignment
+conflation.overture_confidence_weight — weight on Overture confidence in scoring
+conflation.min_match_score — minimum composite score to accept a match
+conflation.max_radius_m — maximum candidate search radius in meters
+conflation.default_radius_m — fallback radius for unclassified POIs
+conflation.distance_weight — scoring weight for spatial distance
+conflation.name_weight — scoring weight for name similarity
+conflation.type_weight — scoring weight for taxonomy agreement
+conflation.identifier_weight — scoring weight for shared identifiers
+conflation.chunk_size — BallTree chunk size for memory management
+conflation.test_bbox — small bbox used with --test flag

 Usage:
-python exploratory/conflation/conflate.py # full run
-python exploratory/conflation/conflate.py --test # small bbox
+python scripts/conflation/conflate.py # full CONUS run
+python scripts/conflation/conflate.py --test # Seattle test bbox
+
+Output file:
+conflated.parquet — GeoParquet with all OSM + Overture POIs, columns:
+shared_label, source (matched/osm/overture), match_score,
+osm_id, overture_id, name, conf_mean/lower/upper, geometry, ...
 """
 from __future__ import annotations
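The conflation docstring describes a weighted composite score followed by greedy one-to-one matching. The sketch below illustrates that general technique only; it is not the repository's implementation. The `Candidate` type, the weight values, and the function names are hypothetical, and the Overture confidence weight mentioned in the config keys is left out for brevity.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One OSM/Overture candidate pair with component scores in [0, 1]."""
    osm_id: str
    overture_id: str
    distance_score: float    # 1.0 = same location, 0.0 = at the search radius
    name_score: float        # name similarity
    type_score: float        # 1.0 if the shared taxonomy labels agree
    identifier_score: float  # 1.0 if an identifier is shared


# Hypothetical weights; the real values would come from config.yaml.
WEIGHTS = {"distance": 0.3, "name": 0.4, "type": 0.2, "identifier": 0.1}


def composite_score(c: Candidate) -> float:
    """Weighted sum of the four component scores."""
    return (
        WEIGHTS["distance"] * c.distance_score
        + WEIGHTS["name"] * c.name_score
        + WEIGHTS["type"] * c.type_score
        + WEIGHTS["identifier"] * c.identifier_score
    )


def greedy_match(candidates, min_score):
    """Accept pairs from best to worst score; each POI is used at most once."""
    scored = sorted(
        ((composite_score(c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    used_osm, used_overture, matches = set(), set(), []
    for score, c in scored:
        if score < min_score:
            break  # every remaining pair scores lower still
        if c.osm_id in used_osm or c.overture_id in used_overture:
            continue  # one side is already matched
        used_osm.add(c.osm_id)
        used_overture.add(c.overture_id)
        matches.append((c.osm_id, c.overture_id, score))
    return matches
```

Greedy matching is simple and fast, but it does not guarantee the assignment with the highest total score; an optimal assignment would need a different algorithm.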

File renamed without changes.
Lines changed: 15 additions & 6 deletions
@@ -2,11 +2,20 @@
 """
 Summarize the conflated dataset by shared_label and source.

-Produces a CSV with one row per shared_label showing counts by
-source (matched, osm, overture) and average match score.
+Reads conflated.parquet and produces a CSV with one row per shared_label
+showing POI counts broken down by source (matched, osm, overture) and the
+average composite match score for matched pairs.

-Usage:
-python exploratory/conflation/summarize.py
+Config keys used (config.yaml):
+conflation.conflated — input GeoParquet path (conflated.parquet)
+conflation.summary_by_label — output CSV path
+
+Prerequisites:
+Run scripts/conflation/conflate.py first.
+
+Output file:
+summary_by_label.csv — columns: shared_label, matched, osm, overture,
+total, avg_match_score; sorted by total descending
 """
 from __future__ import annotations

@@ -53,6 +62,6 @@
 )
 summary.index.name = "shared_label"

-summary.to_csv(OUTPUT_PATH)
-print(f"\nSaved to {OUTPUT_PATH}")
+summary.to_csv(output_path)
+print(f"\nSaved to {output_path}")
 print(f"\n{summary.to_string()}")
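The summary table described in the docstring (counts by source per shared_label, a total, and the mean match score of matched pairs) can be sketched with pandas as follows. The toy rows are invented for illustration, and the real script may build the table differently; only the column names come from the docstring.

```python
import pandas as pd

# Toy stand-in for conflated.parquet (illustrative values only).
df = pd.DataFrame(
    {
        "shared_label": ["cafe", "cafe", "cafe", "bank", "bank", "park"],
        "source": ["matched", "matched", "osm", "overture", "matched", "osm"],
        "match_score": [0.9, 0.7, None, None, 0.6, None],
    }
)

# Count POIs per label and source, keeping all three source columns.
summary = pd.crosstab(df["shared_label"], df["source"]).reindex(
    columns=["matched", "osm", "overture"], fill_value=0
)
summary["total"] = summary.sum(axis=1)

# Average composite score over matched pairs only; NaN where none matched.
matched = df[df["source"] == "matched"]
summary["avg_match_score"] = matched.groupby("shared_label")["match_score"].mean()

summary = summary.sort_values("total", ascending=False)
summary.index.name = "shared_label"
summary.columns.name = None
```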
Lines changed: 22 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,30 @@
11
"""
2-
Exploratory script for downloading a current Foursquare OS Places snapshot.
2+
Download the current US Foursquare OS Places snapshot as a GeoParquet file.
33
4-
This script uses openpois.foursquare.download to:
5-
1. Authenticate to the Foursquare Places Portal Iceberg catalog.
6-
2. Auto-detect or use a pinned release date.
7-
3. Load US places filtered by L1 category and save as GeoParquet.
4+
Authenticates to the Foursquare Places Portal Apache Iceberg REST catalog
5+
using a portal token, loads the unpartitioned places_os table filtered to US
6+
records with no closed date, joins against categories_os to resolve L1
7+
category names, and saves the result as a GeoParquet file.
88
99
Authentication:
10-
Set the FSQ_PORTAL_TOKEN environment variable to your portal token before
11-
running. Register at https://places.foursquare.com to obtain a token.
12-
13-
Example (bash):
10+
Set the FSQ_PORTAL_TOKEN environment variable before running:
1411
export FSQ_PORTAL_TOKEN="<your_token>"
15-
python exploratory/foursquare/download.py
12+
Register at https://places.foursquare.com to obtain a token.
13+
14+
Config keys used (config.yaml):
15+
download.foursquare.release_date — pinned release (null = auto-detect)
16+
download.foursquare.catalog_uri — REST catalog endpoint URL
17+
download.foursquare.catalog_warehouse — warehouse name ("places")
18+
download.foursquare.catalog_namespace — namespace ("datasets")
19+
download.foursquare.places_table — places table name ("places_os")
20+
download.foursquare.categories_table — categories table name ("categories_os")
21+
download.foursquare.token_env_var — env var name for the portal token
22+
download.foursquare.l1_category_names — L1 category filter list
23+
directories.snapshot_foursquare — output directory
24+
25+
Output file:
26+
foursquare_snapshot.parquet — GeoParquet with ~8.3M US POIs
27+
Columns: fsq_place_id, name, fsq_category_ids, geometry, source
1628
"""
1729
from config_versioned import Config
1830
from openpois.io.foursquare import download_foursquare_snapshot
Lines changed: 26 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,32 @@
11
"""
2-
PyTorch model testing
2+
Fit an empirical Bayes PyTorch model for OSM POI tag change rates.
33
4-
Created February 12, 2026
5-
Purpose: Explore a simple empirical Bayes PyTorch model framework for change data
4+
Reads osm_observations_{tag_key}.csv and fits a Poisson change-rate model
5+
using L-BFGS optimization via PyTorch. The model estimates a per-group change
6+
rate λ (events per year). Predictions give the probability that a tag remains
7+
unchanged after t years for t = 0.0, 0.1, ..., 10.0. Supports constant and
8+
random-effects (by type) model specifications.
69
7-
Reads data prepared in `osm/format_tabular.py`
10+
Config keys used (config.yaml):
11+
directories.osm_data — input data directory
12+
directories.model_output — output directory for results
13+
osm_turnover_model.tag_key — tag key to model (e.g. "amenity")
14+
osm_turnover_model.group_key — column to group by (null = constant)
15+
osm_turnover_model.group_values — subset of group values (null = all)
16+
osm_turnover_model.min_value_count — minimum observations to include a group
17+
osm_turnover_model.model_type — "constant" or "random_by_type"
18+
osm_turnover_model.var_prior — prior variance on log(λ)
19+
osm_turnover_model.n_draws — number of posterior parameter draws
20+
osm_turnover_model.save_full_model — save param_draws and serialized model
21+
22+
Prerequisites:
23+
Run osm_data/format_tabular.py first.
24+
25+
Output files (in model_output directory):
26+
fitted_params.csv — estimated λ with uncertainty per group
27+
predictions.csv — p(unchanged) at t = 0.0..10.0 years per group
28+
param_draws.csv — posterior draws (if save_full_model = true)
29+
fitted_model.pt — serialized ModelFitter (if save_full_model = true)
830
"""
931

1032
import numpy as np
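Under a constant-rate Poisson process with rate λ events per year, the probability of zero events in t years is exp(-λt). The sketch below evaluates that expression on the prediction grid named in the docstring. It assumes the script's p(unchanged) has this form; the rate value 0.2 and the function name are hypothetical.

```python
import numpy as np


def prob_unchanged(rate_per_year, years):
    """P(no change within `years`) for a Poisson process: exp(-rate * t)."""
    return np.exp(-rate_per_year * np.asarray(years, dtype=float))


# Prediction grid t = 0.0, 0.1, ..., 10.0 (101 points).
years = np.arange(0, 101) / 10.0

# Hypothetical fitted rate of 0.2 tag changes per year.
p_unchanged = prob_unchanged(0.2, years)

# Time at which half of the tags are expected to have changed.
half_life_years = np.log(2.0) / 0.2
```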
Lines changed: 23 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,27 @@
11
"""
2-
Exploratory data viz script for OSM observations.
3-
4-
This script:
5-
1. Reads in the OSM observations from a CSV file.
6-
2. Creates time series plots of the observations, showing how many remain open over time.
2+
Plot OSM tag stability curves from observation data.
3+
4+
Reads osm_observations_{tag_key}.csv and computes Kaplan-Meier-style survival
5+
estimates showing what fraction of tag assignments remain unchanged over time.
6+
Saves two types of PNG figures:
7+
1. Overall stability curve — all tags pooled into a single panel.
8+
2. Per-subtype multi-panel curves — top-N values for each key in
9+
download_keys, shown as separate facets on one figure per key.
10+
11+
Config keys used (config.yaml):
12+
directories.osm_data — directory containing input CSV and viz/ output
13+
download.download_keys — tag keys used as grouping variables for subplots
14+
osm_data.tag_key — the tag being analysed (e.g. "amenity")
15+
osm_data.timestamp_cols — columns to parse as timestamps (rows with nulls dropped)
16+
osm_data.top_n_types — number of top subtype values per multi-panel figure
17+
download.osm.end_date — right-censoring date for still-unchanged tags
18+
19+
Prerequisites:
20+
Run osm_data/format_tabular.py first.
21+
22+
Output files (in osm_data/viz/):
23+
osm_changes_{tag_key}_all.png — overall survival curve
24+
osm_changes_{tag_key}_{key}.png — per-subtype facet grid, one per key
725
"""
826

927
import numpy as np
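For reference, a textbook Kaplan-Meier estimator with right-censoring can be written in a few lines of NumPy. This is a generic sketch, not the script's own code (the docstring only says "Kaplan-Meier-style"); the function name and the toy durations are hypothetical.

```python
import numpy as np


def kaplan_meier(durations, changed):
    """Return (event_times, survival) for right-censored duration data.

    durations: time from tag assignment to change or to censoring.
    changed:   True if the tag changed, False if still unchanged (censored).
    """
    durations = np.asarray(durations, dtype=float)
    changed = np.asarray(changed, dtype=bool)

    event_times = np.unique(durations[changed])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)              # still under observation at t
        events = np.sum((durations == t) & changed)   # changes occurring exactly at t
        s *= 1.0 - events / at_risk
        survival.append(s)
    return event_times, np.array(survival)


# Toy data: five tag assignments, two of them censored.
times, surv = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Censored records count toward the at-risk set until their censoring time but never trigger a drop in the curve, which is why a right-censoring date is needed for tags that are still unchanged.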
Lines changed: 19 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,24 @@
11
"""
2-
Exploratory script for downloading OSM data.
2+
Download OSM element change histories for a configured bbox and date range.
33
4-
This script uses the openpois.osm_data module to:
5-
1. Collect element IDs across a date range using the Overpass API.
6-
2. Download element histories from the OSM API.
7-
3. Save the results to CSV files.
4+
Uses the Overpass API to collect element IDs for each configured tag key across
5+
a series of snapshot dates, then fetches the full version history of each
6+
element via the OSM API.
7+
8+
Config keys used (config.yaml):
9+
download.general.bbox — WGS-84 bbox [xmin, ymin, xmax, ymax]
10+
download.general.timeout — request timeout in seconds
11+
download.osm.start_date — earliest snapshot date (min: 2012-09-13)
12+
download.osm.end_date — latest snapshot date
13+
download.osm.date_interval_days — spacing between Overpass queries in days
14+
download.download_keys — OSM tag keys to search for (e.g. amenity, shop)
15+
directories.osm_data — output directory
16+
17+
Output files (in osm_data directory):
18+
osm_elements.csv — element IDs observed at each snapshot date
19+
osm_versions.csv — one row per element version with all tag fields
20+
osm_changes.csv — one row per version pair flagging tag changes
21+
osm_failed_elements.csv — elements whose history could not be retrieved
822
"""
923
import datetime
1024
from config_versioned import Config
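The series of snapshot dates implied by start_date, end_date, and date_interval_days could be generated along these lines. This is a sketch with a hypothetical helper name; whether the real script includes the end date or handles a final partial interval differently is an assumption.

```python
import datetime


def snapshot_dates(start, end, interval_days):
    """Dates from `start` to `end` (inclusive) spaced `interval_days` apart."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    step = datetime.timedelta(days=interval_days)
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += step
    return dates


# Example: one Overpass query every 10 days in January 2020.
dates = snapshot_dates(
    datetime.date(2020, 1, 1), datetime.date(2020, 1, 31), interval_days=10
)
```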
Lines changed: 20 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,24 @@
11
"""
2-
Exploratory script for reformatting OSM data into a tabular format.
3-
4-
This script:
5-
1. Reads in the OSM versions and changes data from CSV files.
6-
2. Reconfigures POI changesets into 'observations', which are either changes to the
7-
relevant POI tag or confirmation that the tag is unchanged.
8-
3. Saves the observations to a new CSV file.
2+
Reformat raw OSM version histories into modelling-ready observations.
3+
4+
Reads osm_versions.csv and osm_changes.csv produced by osm_data/download.py,
5+
then converts them into an observation-per-version format suitable for the
6+
change-rate model. Each observation records the tag value, the timestamps of
7+
the previous tag assignment and the current observation, and a flag for whether
8+
the tag changed.
9+
10+
Config keys used (config.yaml):
11+
directories.osm_data — directory containing input and output CSVs
12+
download.download_keys — all tag keys collected (passed as keep_keys)
13+
osm_data.tag_key — single tag key to model (e.g. "amenity")
14+
15+
Prerequisites:
16+
Run osm_data/download.py first to produce osm_versions.csv and osm_changes.csv.
17+
18+
Output file (in osm_data directory):
19+
osm_observations_{tag_key}.csv — one row per version observation with columns:
20+
id, version, tag_key, last_tag_timestamp, obs_timestamp, changed,
21+
plus all keep_keys columns for grouping
922
"""
1023

1124
import pandas as pd
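One way to picture the observation-per-version format is the pandas sketch below. It is an illustration under stated assumptions, not the repository's logic: the input column `timestamp`, the helper name, and the rule that last_tag_timestamp resets only when the tag value changes are all hypothetical. Only the output column names come from the docstring.

```python
import pandas as pd

# Toy stand-in for osm_versions.csv (illustrative values only).
versions = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2],
        "version": [1, 2, 3, 1, 2],
        "timestamp": pd.to_datetime(
            ["2020-01-01", "2021-01-01", "2022-01-01", "2019-06-01", "2020-06-01"]
        ),
        "amenity": ["cafe", "cafe", "bar", "bank", "bank"],
    }
)


def to_observations(versions, tag_key):
    """Emit one observation per version after the first for each element."""
    rows = []
    ordered = versions.sort_values(["id", "version"])
    for element_id, group in ordered.groupby("id"):
        prev_value = None
        last_tag_timestamp = None
        for rec in group.to_dict("records"):
            if prev_value is not None:
                changed = rec[tag_key] != prev_value
                rows.append(
                    {
                        "id": element_id,
                        "version": rec["version"],
                        "tag_key": tag_key,
                        "last_tag_timestamp": last_tag_timestamp,
                        "obs_timestamp": rec["timestamp"],
                        "changed": changed,
                    }
                )
            # Assumption: the clock restarts only when the tag value changes.
            if prev_value is None or rec[tag_key] != prev_value:
                last_tag_timestamp = rec["timestamp"]
            prev_value = rec[tag_key]
    return pd.DataFrame(rows)


observations = to_observations(versions, "amenity")
```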
