.claude/TODO.md
+5 lines changed: 5 additions & 0 deletions
@@ -6,6 +6,11 @@ Short running list of in-progress / upcoming work. Edit freely; trim older compl
## Upcoming
+- [ ] **Auto-capture the three per-version README fields** so the publish step doesn't need `publish.version_metadata` overrides. Added 2026-04-24. Today `build_version_readme` in [src/openpois/publish/build_readme.py](../src/openpois/publish/build_readme.py) falls back to config overrides or best-effort guesses; the aim is for the pipeline to write authoritative values alongside the data it produces, and for the publish step to just read them.
+  - *OSM snapshot date*: `scripts/osm_snapshot/download.py` should write a `~/data/openpois/snapshots/osm/<version>/download_metadata.json` containing `{"downloaded_at": "<ISO date>", "pbf_url": "..."}` after the PBF download completes. `_resolve_osm_snapshot_date` then reads that file before falling back to the version string.
+  - *Overture release*: `scripts/overture/download.py` already resolves a concrete release (pinned or auto-detected) inside `download_overture_snapshot`; currently only the `.parts/<release>/` directory records it and `.parts/` is deleted on success. Surface the resolved release by writing `~/data/openpois/snapshots/overture/<version>/download_metadata.json` with `{"release": "2026-04-15.0", ...}` before the cleanup step. `_resolve_overture_release` reads that file ahead of the `.parts/` heuristic.
+  - *Turnover-model commit*: `scripts/models/osm_turnover.py` should capture `git rev-parse HEAD` at training time and either (a) extend `config.write_self("model_output")` to include a `git_commit` entry or (b) drop a `git_commit.txt` next to the model artifacts. `_resolve_model_commit` reads that value instead of the publish-time HEAD, which is the right fingerprint if code has changed between training and publishing.
+  - Publishing behaviour: if any of the three files is missing, keep the current fallback (and print a visible warning) so old pipeline runs still publish cleanly.
- [ ] Watch for a DuckDB release that fixes the WSL2 httpfs "Information loss on integer cast" crash (issue #21669, fix PR #21395). Once a tagged release ships with the fix and a full `scripts/overture/download.py` run on WSL2 completes, we can unpin from `duckdb==1.4.1` and revert the per-part download to a single-query DuckDB scan. Added 2026-04-17.
- [ ] Auto-check taxonomy changes whenever we switch to a new Overture Maps version (detect new/removed L0/L1/L2 categories vs. `taxonomy_crosswalk_overture_maps.csv` and flag gaps). Added 2026-04-16.
- [ ] Watch for Overture L0/L1 → flat `basic_category` migration (~June 2026). Crosswalk CSV + `assign_overture_shared_label` will need updating. See [docs/taxonomy-setup.md](docs/taxonomy-setup.md).
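
The auto-capture item above (write a `download_metadata.json` at download time, read it at publish time, warn and fall back if it is missing) could look roughly like the sketch below. It is only an illustration: the helper names `write_download_metadata` and `read_snapshot_date` are hypothetical and are not the project's `_resolve_osm_snapshot_date`; only the file name and the `downloaded_at` / `pbf_url` keys come from the TODO entry.

```python
import json
import warnings
from datetime import date
from pathlib import Path
from typing import Optional

METADATA_NAME = "download_metadata.json"  # file name taken from the TODO entry


def write_download_metadata(
    version_dir: Path, pbf_url: str, downloaded_at: Optional[date] = None
) -> Path:
    """Record when and from where the snapshot was downloaded (hypothetical helper)."""
    version_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "downloaded_at": (downloaded_at or date.today()).isoformat(),
        "pbf_url": pbf_url,
    }
    path = version_dir / METADATA_NAME
    path.write_text(json.dumps(payload, indent=2))
    return path


def read_snapshot_date(version_dir: Path, fallback: str) -> str:
    """Prefer the recorded date; if the file is missing, warn and keep the fallback."""
    path = version_dir / METADATA_NAME
    if path.is_file():
        return json.loads(path.read_text())["downloaded_at"]
    warnings.warn(f"{path} not found; falling back to {fallback!r}")
    return fallback
```

Keeping the fallback inside the reader means older pipeline runs without the metadata file still resolve to a value, while the warning makes the guess visible.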
- **Model variants**: `{date}_by_{group_key}` (e.g., `20260416_by_leisure`, `20260416_by_amenity`) or `{date}_constant`. See [skills/iterate-model-types](../skills/iterate-model-types/SKILL.md).
+- **Source Coop folder**: `YYYY-MM-DD-v<IDX>`. Default `v0` for every fresh publish; only bump `v1`, `v2`, … if republishing under the same calendar date (e.g. a hot-fix). The Source Coop upload script writes the per-version README into this folder, so the suffix must be unique per upload round.
- **Independent cadences**: snapshot versions can (and should) differ across sources — Overture releases ~monthly. Don't force them to match.
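
As a rough illustration of the `YYYY-MM-DD-v<IDX>` rule for Source Coop folders, the sketch below picks `v0` for a fresh date and the next free index when that date has already been published. The function name `next_source_coop_folder` and the idea of passing in a list of existing folder names are assumptions made for this example, not part of the repository.

```python
import re
from datetime import date
from typing import Iterable

# Matches folder names such as "2026-04-24-v0".
_FOLDER_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-v(\d+)$")


def next_source_coop_folder(publish_date: date, existing: Iterable[str]) -> str:
    """Return the folder name for a new upload round on publish_date (hypothetical helper)."""
    day = publish_date.isoformat()
    used = []
    for name in existing:
        match = _FOLDER_RE.match(name)
        if match and match.group(1) == day:
            used.append(int(match.group(2)))
    # Fresh date: v0. Same-date republish: one past the highest index seen.
    index = max(used) + 1 if used else 0
    return f"{day}-v{index}"
```

For example, with `["2026-04-24-v0"]` already published, a hot-fix on the same day would resolve to `2026-04-24-v1`, while folders from other dates are ignored.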
## External references (hand-update when bumping)
@@ -46,16 +47,15 @@ Version strings appear in these places outside `versions:` — grep before any c
| File | References |
|---|---|
-| [config.yaml](../../config.yaml) | `upload.latest_url_osm`, `upload.latest_url_conflation` (full URL with date) |
| [site/public/about.html](../../site/public/about.html) | Hardcoded Source Coop browse links in the data-access section |
| `osm_data.apply_model.model_stub` (config.yaml) | Which model family [scripts/osm_snapshot/apply_model.py](../../scripts/osm_snapshot/apply_model.py) ingests |
-[skills/update-site](../skills/update-site/SKILL.md) covers the frontend side; [skills/conflate-snapshots](../skills/conflate-snapshots/SKILL.md) covers the upload + config side.
+[skills/update-site](../skills/update-site/SKILL.md) covers the frontend side; [skills/conflate-snapshots](../skills/conflate-snapshots/SKILL.md) covers the publish + config side.
## Workflow
-1. Bump the relevant `versions.*` keys before running a pipeline.
+1. Bump the relevant `versions.*` keys before running a pipeline. For a public release, also bump `versions.source_coop` to the new `YYYY-MM-DD-v0`.
2. Run the pipeline — outputs land in the versioned directory.
-3. After upload, update `upload.latest_url_*` and the frontend references.
-4. Old versions stay on disk / S3; delete manually when confident nothing references them.
+3. After publishing, update the frontend references in `site/src/constants.js` and `site/public/about.html`.
+4. Old local versions stay on disk; delete manually when confident nothing references them. Old Source Coop folders stay published indefinitely and serve as an immutable archive.
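
Step 3 of the workflow depends on finding every hard-coded version string by hand. One possible aid, sketched here under assumptions, is a small scan for Source Coop folder names that differ from the current one. The helper name `find_stale_versions` is hypothetical, and the pattern only covers the `YYYY-MM-DD-v<IDX>` form described above.

```python
import re
from typing import List

# Source Coop folder names of the form YYYY-MM-DD-v<IDX>.
_VERSION_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}-v\d+\b")


def find_stale_versions(text: str, current: str) -> List[str]:
    """Return the sorted, de-duplicated version folders in text that are not `current`."""
    return sorted({v for v in _VERSION_RE.findall(text) if v != current})
```

Running this over the contents of `site/src/constants.js` and `site/public/about.html` after a publish would list any references that still point at an older folder.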
.claude/docs/partitioning-strategy.md
+3 -3 lines changed: 3 additions & 3 deletions
@@ -96,7 +96,7 @@ GROUP BY 1, 2;
## When NOT to use this layout
-The geohash-partitioned layout is a better fit for **small-bbox, many-types-at-once** queries, which is exactly the web-map viewport case we moved away from. If the S3 / map-viewport path comes back, the helpers are still in place: see `add_geohash_columns` and `write_partitioned_dataset` in [src/openpois/io/geohash_partition.py](../../src/openpois/io/geohash_partition.py), and the original S3 upload step in [scripts/conflation/upload_to_s3.py](../../scripts/conflation/upload_to_s3.py). Swap the function calls in the two `format_for_upload.py` scripts back to the geohash variants.
+The geohash-partitioned layout is a better fit for **small-bbox, many-types-at-once** queries, which is exactly the web-map viewport case we moved away from. If the map-viewport path comes back, the helpers are still in place: see `add_geohash_columns` and `write_partitioned_dataset` in [src/openpois/io/geohash_partition.py](../../src/openpois/io/geohash_partition.py), and the Source Cooperative publish step in [scripts/publish/upload_to_source_coop.py](../../scripts/publish/upload_to_source_coop.py). Swap the function calls in the two `format_for_upload.py` scripts back to the geohash variants.
```
python -u scripts/conflation/format_for_upload.py 2>&1 | tee ~/data/openpois/logs/conflated_repartition_<version>.log
```
-Each script deletes the existing partitioned directory at its versioned path and rewrites it. Geohash precision is controlled by `upload.geohash_precision_sort` in [config.yaml](../../config.yaml) (currently 6 ≈ 0.6 × 1.2 km).
+Each script deletes the existing partitioned directory at its versioned path and rewrites it. Geohash precision is controlled by `publish.geohash_precision_sort` in [config.yaml](../../config.yaml) (currently 6 ≈ 0.6 × 1.2 km).
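
The "6 ≈ 0.6 × 1.2 km" figure can be checked with a short calculation. The sketch below assumes the standard geohash encoding (5 bits per character, longitude taking the first bit) and a rounded 111.32 km per degree; the function name `geohash_cell_size_km` is made up for this example and is not taken from `geohash_partition.py`.

```python
import math
from typing import Tuple

KM_PER_DEGREE = 111.32  # rough length of one degree along the equator or a meridian


def geohash_cell_size_km(precision: int, latitude_deg: float = 0.0) -> Tuple[float, float]:
    """Approximate (height_km, width_km) of one geohash cell at the given latitude."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when the total is odd
    lat_bits = total_bits // 2
    lat_span_deg = 180.0 / 2 ** lat_bits
    lon_span_deg = 360.0 / 2 ** lon_bits
    height_km = lat_span_deg * KM_PER_DEGREE
    # East-west distance per degree shrinks with the cosine of the latitude.
    width_km = lon_span_deg * KM_PER_DEGREE * math.cos(math.radians(latitude_deg))
    return height_km, width_km
```

At precision 6 both axes get 15 bits, so a cell spans 180/2^15 degrees of latitude and 360/2^15 degrees of longitude, which is roughly 0.61 km by 1.22 km at the equator; cells get narrower east to west at higher latitudes.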
**Where the code lives:**
@@ -116,4 +116,4 @@ Each script deletes the existing partitioned directory at its versioned path and
- [scripts/osm_snapshot/format_for_upload.py](../../scripts/osm_snapshot/format_for_upload.py): OSM partitioning entry point.
- [tests/test_geohash_partition.py](../../tests/test_geohash_partition.py): unit tests + a DuckDB Hive-decode round-trip.
-**S3 upload is currently disabled**: `scripts/conflation/upload_to_s3.py` is not run as part of this flow. The `upload.latest_url_*` / `upload.s3_*` keys in `config.yaml` are stale but harmless; clean them up in a later pass if the frontend integration is formally retired.
+The Source Cooperative publish flow ([scripts/publish/upload_to_source_coop.py](../../scripts/publish/upload_to_source_coop.py)) uploads these same partitioned trees to `<version>/osm-parquet/` and `<version>/conflated-parquet/`. PMTiles generation remains downstream of partitioning.
.claude/skills/conflate-snapshots/SKILL.md
+37 -22 lines changed: 37 additions & 22 deletions
@@ -1,21 +1,33 @@
---
name: conflate-snapshots
-description: Use when the user wants to match rated OSM POIs with Overture POIs into a unified dataset, partition it for web consumption, and push to S3. Triggers: "run conflation", "push new conflated data to S3", "bump conflation version", "reconflate with new parameters", "re-upload the partitioned parquet".
+description: Use when the user wants to match rated OSM POIs with Overture POIs into a unified dataset, partition it for web consumption, and push to Source Cooperative. Triggers: "run conflation", "publish new data", "push new conflated data to Source Cooperative", "bump conflation version", "reconflate with new parameters", "re-upload the partitioned parquet".
---
-# Conflate snapshots + publish to S3
+# Conflate snapshots + publish to Source Cooperative
-Taxonomy-aware matching between rated OSM and Overture, then partition and upload for web consumption.
+Taxonomy-aware matching between rated OSM and Overture, then partition and
+upload for web consumption.
## Prerequisites
- Rated OSM snapshot (`osm_snapshot_rated.parquet`) at `versions.snapshot_osm` — produced by [skills/full-data-pull](../full-data-pull/SKILL.md) step 3.
- Overture snapshot (`overture_snapshot.parquet`) at `versions.snapshot_overture`.
-- AWS credentials configured for the `openpois-public` bucket (region `us-west-2`).
+- **Fresh Source Cooperative temp credentials** in `.env.json` at the repo root. Tokens expire in ~1 hour.
.claude/skills/full-data-pull/SKILL.md
+1 -1 lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
---
name: full-data-pull
-description: Use when the user wants to refresh the independent POI snapshots (OSM, Overture) and rate the OSM snapshot for conflation. Triggers: "refresh all snapshots", "do a new data pull", "download new OSM/Overture", "monthly data refresh", "pull the latest POI data". Does NOT include conflation or S3 upload; those live in conflate-snapshots.
+description: Use when the user wants to refresh the independent POI snapshots (OSM, Overture) and rate the OSM snapshot for conflation. Triggers: "refresh all snapshots", "do a new data pull", "download new OSM/Overture", "monthly data refresh", "pull the latest POI data". Does NOT include conflation or Source Cooperative publishing; those live in conflate-snapshots.