.claude/skills/iterate-model-types/SKILL.md (7 additions, 6 deletions)
@@ -9,20 +9,21 @@ Rerun just the modeling step (step 3 of the full pipeline) for a fixed source-da
## Prerequisites

- - `osm_observations_{tag_key}.csv` already generated by `scripts/osm_data/format_tabular.py` at the pinned `versions.osm_data`.
+ - `osm_observations.parquet` already generated by `scripts/osm_data/format_tabular.py` at the pinned `versions.osm_data`. Each row is one (POI version, shared_label) pair; POIs mapping to multiple taxonomy categories are exploded into multiple rows.
- You know which model variant you want next.

## Steps

1. **Pin `versions.osm_data`** in `config.yaml`. Do *not* change it; that's the whole point.

2. **Bump `versions.model_output`** using the convention:
-    - `{date}_by_{group_key}` for grouped models (e.g., `20260416_by_leisure`, `20260416_by_amenity`)
+    - `{date}_by_shared_label` for the unified random-effects model (one intercept per taxonomy category)
+    - `{date}_by_{group_key}` for ad-hoc groupings on other columns
   - `{date}_constant` for the single-rate baseline

3. **Edit `osm_turnover_model`** in `config.yaml`:
-    - `model_type`: one of `constant`, `random_by_type`, `pseudo_varying` (registry at [src/openpois/models/osm_models.py](../../../src/openpois/models/osm_models.py))
-    - `group_key`: column to group by (e.g., `leisure_last_value`, `shop`, `amenity`). Null for constant.
+    - `model_type`: one of `constant`, `random_by_type` (registry at [src/openpois/models/osm_models.py](../../../src/openpois/models/osm_models.py))
+    - `group_key`: column to group by. Default `shared_label` (unified taxonomy). Null for constant. Other observation columns (raw OSM keys like `shop`, `amenity`, ...) are still accepted.
   - `group_values`: restrict to specific values, or null for all
   - `min_value_count`: drop groups below this count

@@ -35,7 +36,7 @@ Rerun just the modeling step (step 3 of the full pipeline) for a fixed source-da
   ```bash
   python scripts/osm_snapshot/apply_model.py
   ```
-    `apply_model.py` picks up every `{stub}_by_*` directory at the stub date and falls back to `{stub}_constant` for unmatched groups.
+    `apply_model.py` loads a single `{stub}_by_shared_label` random-effects model (if present) and falls back to `{stub}_constant` for rows with no matching taxonomy label.
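
The fallback described in the added line can be pictured with a small sketch. This is not the code in `apply_model.py`; the function name, the dictionary of per-label rates, and the numeric values are all hypothetical, chosen only to show the lookup order (label-specific rate first, constant baseline otherwise).

```python
from typing import Mapping, Optional


def rate_for_label(
    shared_label: Optional[str],
    rates_by_label: Mapping[str, float],
    constant_rate: float,
) -> float:
    """Return the per-label rate, or the constant baseline when no label matches.

    Hypothetical helper: the real pipeline may organize this differently.
    """
    if shared_label is not None and shared_label in rates_by_label:
        return rates_by_label[shared_label]
    return constant_rate


# Illustrative values only; these are not fitted parameters from the project.
rates = {"restaurant": 0.12, "park": 0.03}
baseline = 0.08

matched = rate_for_label("park", rates, baseline)         # label-specific rate
unmatched = rate_for_label("lighthouse", rates, baseline)  # falls back to baseline
unlabeled = rate_for_label(None, rates, baseline)          # falls back to baseline
```

If the `{stub}_by_shared_label` model is absent, the same sketch applies with an empty `rates` mapping, so every row receives the constant rate.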
## Comparing variants
@@ -46,5 +47,5 @@ Rerun just the modeling step (step 3 of the full pipeline) for a fixed source-da
## Pitfalls

- Forgetting to change `versions.model_output` overwrites the previous variant's outputs.
- - `group_key` must exist as a column in the observations CSV; run `format_tabular.py` first if adding a new tag key to `osm_data.tag_key`.
+ - `group_key` must exist as a column in the observations CSV. `shared_label` is populated by `format_tabular.py` from the conflation taxonomy crosswalk; if you change the crosswalk, rerun `format_tabular.py`.
- `min_value_count` filters groups silently; check `fitted_params.csv` row count vs. expected group count.
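
The row-count check in the last pitfall could be scripted along these lines. This is a sketch under assumptions: it supposes that `fitted_params.csv` carries one row per group with a column named after `group_key`, which the diff does not confirm, and the helper name `dropped_groups` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def dropped_groups(
    observation_rows: Iterable[Dict[str, str]],
    fitted_rows: Iterable[Dict[str, str]],
    group_key: str,
    min_value_count: int,
) -> Tuple[Set[str], Set[str]]:
    """Compare groups in the observations with groups that received fitted params.

    Returns (below_threshold, unexpectedly_missing):
    - below_threshold: groups with fewer than min_value_count observations
    - unexpectedly_missing: groups that met the threshold but have no fitted row
    """
    counts = Counter(row[group_key] for row in observation_rows)
    expected = {group for group, n in counts.items() if n >= min_value_count}
    below_threshold = set(counts) - expected
    fitted = {row[group_key] for row in fitted_rows}
    return below_threshold, expected - fitted


# Tiny in-memory example; real rows would come from the observations and fitted files.
observations = [{"shared_label": "cafe"}] * 3 + [{"shared_label": "park"}]
fitted = [{"shared_label": "cafe"}]
below, missing = dropped_groups(observations, fitted, "shared_label", 2)
```

A non-empty `below` set explains a shortfall in fitted rows; a non-empty `missing` set points to something other than `min_value_count`.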
-    Uses `osm_data.tag_key` (e.g., `name`) to produce one observation row per version with change/deletion flags.
+    Uses `osm_data.tag_key` (e.g., `name`) to flag change/deletion per POI version, then assigns shared taxonomy labels from the conflation crosswalk and explodes rows per label. One row = (POI version, shared_label). Rows with no matching taxonomy category are dropped.

3. **Pick a modeling config and fit λ**: see [skills/iterate-model-types](../iterate-model-types/SKILL.md) for choosing `model_type` / `group_key`.
   ```bash
   python scripts/models/osm_turnover.py
   ```
-    Writes `fitted_params.csv`, `param_draws.csv`, `predictions.csv`, `fitted_model.pt` to `{date}_by_{group_key}` (or `{date}_constant`) under `directories.model_output.path`.
+    Writes `fitted_params.csv`, `param_draws.csv`, `predictions.csv` to `{date}_by_shared_label` (the unified random-effects model) or `{date}_constant` (single-rate baseline) under `directories.model_output.path`.

4. **Apply predictions to the OSM snapshot** → `osm_snapshot_rated.parquet`
   ```bash
   python scripts/osm_snapshot/apply_model.py
   ```
-    Reads the `osm_data.apply_model.model_stub` date, loads all `{stub}_by_*` dirs (plus a `{stub}_constant` fallback), and rates every POI in `osm_snapshot.parquet`.
+    Reads the `osm_data.apply_model.model_stub` date, loads the `{stub}_by_shared_label` random-effects model (if present), falls back to `{stub}_constant` for rows with no matching taxonomy label, and rates every POI in `osm_snapshot.parquet`.
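
The row explosion that the formatting step now performs (one row per POI version and shared label, with unmatched rows dropped) can be sketched as follows. The crosswalk is assumed here to be a mapping from a raw tag value to a list of taxonomy labels; its real structure is not shown in the diff, and the field names `poi_id`, `version`, and `tag_value` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def explode_by_label(
    versions: Iterable[Dict[str, object]],
    crosswalk: Dict[str, List[str]],
) -> Iterator[Dict[str, object]]:
    """Yield one row per (POI version, shared_label) pair.

    Versions whose tag value has no crosswalk entry yield nothing,
    which mirrors "rows with no matching taxonomy category are dropped".
    """
    for version in versions:
        for label in crosswalk.get(str(version["tag_value"]), []):
            yield {**version, "shared_label": label}


# Illustrative input; values are made up for the example.
poi_versions = [
    {"poi_id": 1, "version": 1, "tag_value": "cafe"},
    {"poi_id": 2, "version": 1, "tag_value": "unmapped_value"},
]
example_crosswalk = {"cafe": ["food_and_drink", "coffee_shop"]}

exploded = list(explode_by_label(poi_versions, example_crosswalk))
```

Because one POI version can appear under several labels, downstream counts per label will sum to more than the number of POI versions.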
## Verification
@@ -51,5 +51,5 @@ Hand off to [skills/verify-pipeline-run](../verify-pipeline-run/SKILL.md) — in