print("Test:"); display(panel_data.loc[test_idx])
```
### Spatio-Temporal Cross-Validation
panelsplit can also build combined spatio-temporal holdouts. It takes entity groupings (e.g., countries, states, or cities) into account, which prevents leakage between observations from the same cluster. This lets you validate on unobserved time periods *and* unobserved groups at the same time:
```python
from sklearn.model_selection import StratifiedGroupKFold

# Assumes PanelSplit is imported and panel_data is defined as in the example above.
# Combine temporal splits with group-level (spatial) holdouts:
panel_split = PanelSplit(
    periods=panel_data.year,
    n_splits=2,
    groups=panel_data["country_id"],
    group_splitter=StratifiedGroupKFold(n_splits=3),  # any scikit-learn splitter that accepts groups
)

# Multi-column groups are also supported. PanelSplit combines them into a
# single composite group identifier before splitting, e.g.
# groups = panel_data[["country_id", "city_id"]]

# StratifiedGroupKFold stratifies on y, so these splits are computed lazily:
# pass X and y explicitly, e.g. panel_split.split(X, y).
# This yields 6 splits in total (2 temporal cuts x 3 group folds).
```
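The split arithmetic can be made concrete with a standalone sketch of one way to cross temporal cuts with group folds. This is not panelsplit's implementation. The helper `spatio_temporal_splits` is hypothetical, and the sketch makes two assumptions:

- scikit-learn's `TimeSeriesSplit` is applied to the sorted unique periods to produce the temporal cuts.
- `GroupKFold` serves as the default group splitter.

Training rows come from earlier periods in the training groups. Test rows come from later periods in the held-out groups.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit


def spatio_temporal_splits(periods, groups, n_splits=2, group_splitter=None, y=None):
    """Hypothetical helper: cross temporal cuts with group folds.

    Yields (train_idx, test_idx) row-index pairs. Train rows lie in earlier
    periods and training groups; test rows lie in later periods and
    held-out groups.
    """
    periods = np.asarray(periods)
    groups = np.asarray(groups)
    if group_splitter is None:
        group_splitter = GroupKFold(n_splits=2)  # assumed default
    unique_periods = np.unique(periods).reshape(-1, 1)  # sorted, one row per period
    row_X = np.zeros((len(periods), 1))  # placeholder features for the group splitter

    # Expanding-window temporal cuts over the unique periods
    for p_train, p_test in TimeSeriesSplit(n_splits=n_splits).split(unique_periods):
        in_train_period = np.isin(periods, unique_periods[p_train, 0])
        in_test_period = np.isin(periods, unique_periods[p_test, 0])
        # Group folds over all rows; a group never lands on both sides
        for g_train, g_test in group_splitter.split(row_X, y, groups):
            in_train_group = np.zeros(len(periods), dtype=bool)
            in_train_group[g_train] = True
            in_test_group = np.zeros(len(periods), dtype=bool)
            in_test_group[g_test] = True
            yield (
                np.flatnonzero(in_train_period & in_train_group),
                np.flatnonzero(in_test_period & in_test_group),
            )


# Usage: 4 years x 4 countries, 2 temporal cuts x 2 group folds
years = np.repeat([2000, 2001, 2002, 2003], 4)
countries = np.tile(["A", "B", "C", "D"], 4)
splits = list(spatio_temporal_splits(years, countries, n_splits=2))
for train_idx, test_idx in splits:
    print("test years:", np.unique(years[test_idx]).tolist(),
          "test countries:", np.unique(countries[test_idx]).tolist())
```

With 2 temporal cuts and a 2-fold group splitter, this yields 2 x 2 = 4 splits. That is the same multiplication as the 2 x 3 = 6 count in the PanelSplit example. Passing `y` lets a stratified splitter such as `StratifiedGroupKFold` be used in place of the default.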
For more examples and detailed usage instructions, refer to the [examples](examples) directory in this repository. Also feel free to check out [an introductory article on panelsplit](https://towardsdatascience.com/how-to-cross-validate-your-panel-data-in-python-9ad981ddd043).
Whether to include all training sets in their respective test sets. If set to
True, overrides ``include_first_train_in_test``. Default is False.
groups : Optional[Any]
A 1D/2D array or DataFrame of spatial group IDs used for spatio-temporal holdouts.
If provided, each temporal split is further split over these groups with ``group_splitter``, so test sets hold out later time periods and entire groups at the same time. Default is None.
group_splitter : Optional[Any]
A scikit-learn compatible group splitter (e.g., ``StratifiedGroupKFold(n_splits=3)``) used to split the groups. Splitters that stratify on ``y`` require passing ``X`` and ``y`` to ``split()``. Default is ``GroupKFold(n_splits=2)``.
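The `groups` parameter accepts 2D input. A composite ID matters when child IDs repeat across parents, for example the same `city_id` appearing in two countries. The sketch below shows one way to collapse several group columns into a single composite ID. It assumes pandas, and `composite_group_ids` is a hypothetical helper, not panelsplit's internal function.

```python
import numpy as np
import pandas as pd


def composite_group_ids(groups):
    """Hypothetical helper: map each unique combination of group columns to one integer ID."""
    frame = pd.DataFrame(groups)  # accepts a 1D array, Series, 2D array, or DataFrame
    # ngroup() numbers the groups 0..n-1 in sorted key order (sort=True)
    return frame.groupby(list(frame.columns), sort=True).ngroup().to_numpy()


# city_id 10 exists in both countries, so (country_id, city_id) pairs are the real clusters
df = pd.DataFrame({"country_id": [1, 1, 2, 2], "city_id": [10, 11, 10, 10]})
ids = composite_group_ids(df[["country_id", "city_id"]])
print(ids)
```

Any group splitter can then treat `ids` as an ordinary 1D `groups` array.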
"train_test_splits has not been computed yet. The selected group_splitter needs X and y to compute its strata, so pass them explicitly to .split()."