.claude/CLAUDE.md (3 additions & 3 deletions)
@@ -22,13 +22,13 @@ Style: Black (format-on-save in VSCode). Lint: flake8 + pylint, configured in `p
## Architecture at a glance

-
**openpois** models POI stability over time from OpenStreetMap history, and produces unified OSM + Overture + Foursquare snapshots for web consumption. Work splits into four pipelines:
+
**openpois** models POI stability over time from OpenStreetMap history, and produces unified OSM + Overture snapshots for web consumption. Work splits into four pipelines:

| Pipeline | Skill |
|---|---|
| Fit λ from OSM history, rate current snapshots | [skills/model-history-pipeline](skills/model-history-pipeline/SKILL.md) |
| Iterate model variants on a pinned history run | [skills/iterate-model-types](skills/iterate-model-types/SKILL.md) |
-
| Refresh the three POI snapshots (OSM / Overture / FSQ) | [skills/full-data-pull](skills/full-data-pull/SKILL.md) |
+
| Refresh the POI snapshots (OSM / Overture) | [skills/full-data-pull](skills/full-data-pull/SKILL.md) |
| Conflate OSM + Overture, partition, upload to S3 | [skills/conflate-snapshots](skills/conflate-snapshots/SKILL.md) |
| Bump the frontend to the new data version | [skills/update-site](skills/update-site/SKILL.md) |
| Post-run QA on any of the above | [skills/verify-pipeline-run](skills/verify-pipeline-run/SKILL.md) |
@@ -37,7 +37,7 @@ Style: Black (format-on-save in VSCode). Lint: flake8 + pylint, configured in `p
| Path | Purpose |
|---|---|
-
| [src/openpois/io/](../src/openpois/io/) | I/O adapters: OSM history/snapshot, Overture, Foursquare, Census boundary |
+
| [src/openpois/io/](../src/openpois/io/) | I/O adapters: OSM history/snapshot, Overture, Census boundary |
- **Pipeline**: row filter `country IN ('US', 'PR') AND date_closed IS NULL` → PyIceberg scan → sjoin against exact US+PR polygon (PyIceberg has no spatial predicates).
**Prefer `get_file_path` over composing `get_dir_path()` + `get()` manually.**
-
`.get()` raises `ValueError` on null values — pass `fail_if_none=False` for optional fields like `download.overture.release_date: null` and `download.foursquare.release_date: null`.
+
`.get()` raises `ValueError` on null values — pass `fail_if_none=False` for optional fields like `download.overture.release_date: null`.

`config.write_self(section)` snapshots the effective config into the output directory — used by model and conflation scripts to record the state of a run.
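
The null-handling rule for `.get()` noted above can be illustrated with a small stand-in. This is only a sketch: `SketchConfig`, its constructor, and the positional-key call style are hypothetical and are not the project's actual config class. Only the `fail_if_none` keyword and the `ValueError` on null values come from the text.

```python
from typing import Any


class SketchConfig:
    """Hypothetical stand-in for the project's config object (not the real class)."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def get(self, *keys: str, fail_if_none: bool = True) -> Any:
        """Walk nested keys; raise ValueError on a null leaf unless told otherwise."""
        value: Any = self._data
        for key in keys:
            value = value[key]
        if value is None and fail_if_none:
            raise ValueError(f"Config value at {'.'.join(keys)} is null")
        return value


# Assumed config shape, mirroring `download.overture.release_date: null`.
config = SketchConfig({"download": {"overture": {"release_date": None}}})

# Optional field: opt out of the null check and handle None explicitly.
release_date = config.get("download", "overture", "release_date", fail_if_none=False)

# Default behavior: a null value is treated as an error.
try:
    config.get("download", "overture", "release_date")
    raised = False
except ValueError:
    raised = True
```

Opting out per call keeps the strict default for required fields while letting optional ones, such as an unpinned release date, stay null.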
## Naming conventions

- **Dates**: `YYYYMMDD`, e.g., `20260416`.
- **Model variants**: `{date}_by_{group_key}` (e.g., `20260416_by_leisure`, `20260416_by_amenity`) or `{date}_constant`. See [skills/iterate-model-types](../skills/iterate-model-types/SKILL.md).
-
- **Independent cadences**: snapshot versions can (and should) differ across sources — Overture releases ~monthly, Foursquare separately. Don't force them to match.
+
- **Independent cadences**: snapshot versions can (and should) differ across sources — Overture releases ~monthly. Don't force them to match.
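
The date and model-variant conventions above can be checked mechanically. The following is a minimal sketch, assuming group keys are plain word characters. `VARIANT_PATTERN` and `parse_model_variant` are hypothetical names for illustration and are not part of the repository.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Matches `{date}_by_{group_key}` or `{date}_constant`, with an 8-digit date.
VARIANT_PATTERN = re.compile(r"^(?P<date>\d{8})_(?:by_(?P<group_key>\w+)|constant)$")


def parse_model_variant(name: str) -> Tuple[str, Optional[str]]:
    """Split a variant name into (date, group_key); group_key is None for constant."""
    match = VARIANT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a valid model variant name: {name!r}")
    date = match.group("date")
    # strptime raises ValueError if the digits do not form a real calendar date.
    datetime.strptime(date, "%Y%m%d")
    return date, match.group("group_key")


by_leisure = parse_model_variant("20260416_by_leisure")
constant = parse_model_variant("20260416_constant")
```

Under these assumptions, a name such as `2026-04-16_constant` is rejected because the date is not in `YYYYMMDD` form.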
## External references (hand-update when bumping)
@@ -48,7 +47,7 @@ Version strings appear in these places outside `versions:` — grep before any c
| File | References |
|---|---|
| [config.yaml](../../config.yaml) | `upload.latest_url_osm`, `upload.latest_url_conflation` (full URL with date) |
| [site/public/about.html](../../site/public/about.html) | Hardcoded S3 browse links in the data-access section |
| `osm_data.apply_model.model_stub` (config.yaml) | Which model family [scripts/osm_snapshot/apply_model.py](../../scripts/osm_snapshot/apply_model.py) ingests |
.claude/skills/full-data-pull/SKILL.md (4 additions & 8 deletions)
@@ -1,16 +1,15 @@
---
name: full-data-pull
-
description: Use when the user wants to refresh the three independent POI snapshots (OSM, Overture, Foursquare) and rate the OSM snapshot for conflation. Triggers: "refresh all snapshots", "do a new data pull", "download new OSM/Overture/Foursquare", "monthly data refresh", "pull the latest POI data". Does NOT include conflation or S3 upload — those live in conflate-snapshots.
+
description: Use when the user wants to refresh the independent POI snapshots (OSM, Overture) and rate the OSM snapshot for conflation. Triggers: "refresh all snapshots", "do a new data pull", "download new OSM/Overture", "monthly data refresh", "pull the latest POI data". Does NOT include conflation or S3 upload — those live in conflate-snapshots.
---

# Full data pull

-
Downloads the three snapshot sources (50 US states + DC + PR) and applies the rating model to OSM so conflation can run.
+
Downloads the snapshot sources (50 US states + DC + PR) and applies the rating model to OSM so conflation can run.

## Prerequisites

- conda env `openpois` active.
-
- For Foursquare: `FSQ_PORTAL_TOKEN` env var set.
- For OSM: `osmium` in env bin (resolved automatically via `Path(sys.executable).parent / "osmium"`).
- Boundary cache at `directories.boundary` (auto-downloads on first use).
- A fitted model exists for the OSM rating step (see [skills/model-history-pipeline](../model-history-pipeline/SKILL.md)).
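
The `osmium` lookup mentioned in the prerequisites can be sketched as follows. The expression `Path(sys.executable).parent / "osmium"` is taken from the text. The helper name `resolve_env_binary` and the up-front existence check are assumptions added for illustration, not the project's code.

```python
import sys
from pathlib import Path


def resolve_env_binary(name: str) -> Path:
    """Return the expected path of a binary next to the active Python interpreter."""
    return Path(sys.executable).parent / name


osmium_path = resolve_env_binary("osmium")

# Checking existence up front gives a clearer failure than a subprocess error later.
osmium_available = osmium_path.is_file()
```

Resolving relative to `sys.executable` ties the binary to whichever environment is running the script, so the lookup does not depend on `PATH`.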
@@ -22,16 +21,14 @@ Downloads the three snapshot sources (50 US states + DC + PR) and applies the ra
versions:
  snapshot_osm: "YYYYMMDD"
  snapshot_overture: "YYYYMMDD"
-
  snapshot_foursquare: "YYYYMMDD"
```
See [docs/data-versioning.md](../../docs/data-versioning.md).

-
2. **Run the three downloads** (independent — order doesn't matter, can run in parallel):
+
2. **Run the downloads** (independent — order doesn't matter, can run in parallel):
Per-source details, auth, and schema quirks are in [docs/data-sources.md](../../docs/data-sources.md).
@@ -51,9 +48,8 @@ Downloads the three snapshot sources (50 US states + DC + PR) and applies the ra
Hand off to [skills/verify-pipeline-run](../verify-pipeline-run/SKILL.md). Baseline totals (as of 2026-04-17):
- OSM: ~7.78M POIs
- Overture: ~13.05M POIs (jumped from ~7.23M after widening `download.overture.taxonomy_allowlist` to include `services_and_business` + `lifestyle_services` sub-branches)
-
- Foursquare: ~8.32M POIs

-
Flag >5% drops — Foursquare in particular has had silent country-filter regressions (PR alpha-2 code quirk).
.claude/skills/verify-pipeline-run/SKILL.md (2 additions & 4 deletions)
@@ -7,12 +7,11 @@ description: Use when the user wants a QA/sanity check on a recently completed p
Post-run QA runbook. Pick the subsection that matches what just ran.

-
## Snapshots (OSM / Overture / Foursquare)
+
## Snapshots (OSM / Overture)

Baseline row counts (2026-04-17):
- OSM: ~7.78M
- Overture: ~13.05M (up from ~7.23M after widening `taxonomy_allowlist`; pre-2026-04-17 runs will be lower)
-
- Foursquare: ~8.32M

Check:
```python
@@ -21,7 +20,6 @@ pd.read_parquet(path).shape[0]
```
Flag >5% drops. Known regression patterns:
-
- **Foursquare**: PR alpha-2 code — filter must be `country IN ('US', 'PR')`, not `'US'` only.
- **OSM**: PR is a *separate* PBF — confirm both `us-latest.osm.pbf` and `puerto-rico-latest.osm.pbf` got downloaded, filtered, and concat'd.
- **Overture**: coarse-bbox pushdown + final DuckDB `ST_Within` — drop means the Aleutian antimeridian split was lost or the Census boundary failed to load. If the run crashed with "Information loss on integer cast", the DuckDB pin was bumped off 1.4.1 (see [docs/data-sources.md](../../docs/data-sources.md) → Overture Maps).
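
The ">5% drop" rule above can be sketched as a small check against the baseline row counts. The baselines below are the approximate figures quoted in this runbook. The names `BASELINE_ROWS`, `DROP_THRESHOLD`, and `flag_drops`, and the example counts, are made up for illustration. In practice the observed counts would come from something like `pd.read_parquet(path).shape[0]`.

```python
# Approximate baselines from the runbook (2026-04-17); treat them as rough values.
BASELINE_ROWS = {"osm": 7_780_000, "overture": 13_050_000}

# Fraction of the baseline that may be lost before a run is flagged.
DROP_THRESHOLD = 0.05


def relative_drop(baseline: int, observed: int) -> float:
    """Fraction of rows lost relative to the baseline (negative means growth)."""
    return (baseline - observed) / baseline


def flag_drops(observed: dict) -> dict:
    """Return {source: drop_fraction} for every source that lost more than 5%."""
    flagged = {}
    for source, baseline in BASELINE_ROWS.items():
        drop = relative_drop(baseline, observed[source])
        if drop > DROP_THRESHOLD:
            flagged[source] = drop
    return flagged


# Hypothetical row counts from a fresh run (illustrative numbers only).
example_counts = {"osm": 7_700_000, "overture": 12_000_000}
flagged = flag_drops(example_counts)
```

With these example numbers only Overture is flagged: it lost roughly 8% of its baseline, while the OSM drop of about 1% stays under the threshold.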
@@ -63,7 +61,7 @@ Confirm `conf_mean`, `conf_lower`, `conf_upper` columns are populated for every
- Open the deployed site (or `npm run dev` locally after a constants.js bump).