Add partition_by support for lag transforms #609
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e62aa03ee5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex Review: Didn't find any major issues. Keep it up!
simonez-tuidi left a comment
I think there are some inconsistent results when using the partition_by aggregations in global_ or groupby mode; it probably has to do with how the operations are combined. The partition_by results all look good when executed in the default mode.
Overall, great work, thanks so much @janrth!
("a", 1): np.nan,
("a", 2): 1.0,
("a", 3): 10.0,
("a", 4): 11.5,
@janrth I think the value for the 4th timestamp here should be the average of all previous values that have promo=True across all series, i.e. the average of [1, 2, 20], which is 7.67
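A quick numeric check of the values quoted in this comment (not code from the PR):

```python
import numpy as np

# Past promo=True values across all series, per the comment above
past_true = np.array([1.0, 2.0, 20.0])
print(round(past_true.mean(), 2))  # 7.67
```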
@simonez-tuidi thx for your close review!! super helpful. I will try to check all your comments asap :)
"promo": [True, True, False, True, False, True, False, True],
}
if include_brand:
    data["brand"] = ["x"] * 8
Might be worth expanding the test coverage to include more timestamps and scenarios where there's more than one value in the groupby column, e.g. two brands.
("a", 1): np.nan,
("a", 2): 1.0,
("a", 3): 10.0,
("a", 4): 11.5,
Same as for the groupby case
col = tfm._get_name(1)
np.testing.assert_allclose(
    features.loc[features["step"].eq(0), col].to_numpy(),
    np.array([np.nan, 30.0]),
Shouldn't the result here be [3.0, 30.0]?
The third value of series a has promo=False and y=3, so it's the only one used in the ExpandingMean.
For series b, 30.0 is the mean of [20, 40], which have promo=True, so it's correct.
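Checking the two expected entries from the lists quoted in this comment (a quick sanity check, not part of the PR's test suite):

```python
import numpy as np

# series a: the only past promo=False value is y=3
# series b: past promo=True values are [20, 40]
expected = np.array([np.mean([3.0]), np.mean([20.0, 40.0])])  # [3.0, 30.0]
```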
)
np.testing.assert_allclose(
    features.loc[features["step"].eq(1), col].to_numpy(),
    np.array([2.3333333333333335, 20.0]),
col = tfm._get_name(1)
np.testing.assert_allclose(
    features.loc[features["step"].eq(0), col].to_numpy(),
    np.array([21.5, 22.333333333333332]),
Step 5 of series a has promo=False, so all previous False values are [3, 10, 30] -> mean = 14.33
Step 5 of series b has promo=True, so all previous True values are [1, 2, 4, 20, 40] -> mean = 13.4
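Checking the means of the lists quoted in this comment (note that [1, 2, 4, 20, 40] averages to 13.4):

```python
import numpy as np

# step 5, series a (promo=False): previous False values
mean_a = np.mean([3.0, 10.0, 30.0])            # ≈ 14.33
# step 5, series b (promo=True): previous True values
mean_b = np.mean([1.0, 2.0, 4.0, 20.0, 40.0])  # 13.4
```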
col = tfm._get_name(1)
np.testing.assert_allclose(
    feats[col].to_numpy(),
    np.array([2.3333333333333335, 20.0]),
(
    ExpandingMean(groupby=["brand"], partition_by=["promo"]),
    {"static_features": ["brand"]},
    np.array([29.25, 16.0]),
Step 6 of series a has promo=True, so all previous True values across series are [1, 2, 4, 20, 40, 50] -> mean = 19.5
Step 6 of series b has promo=False, so all previous False values are [3, 5, 10, 30] -> mean = 12
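Checking the means of the lists quoted in this comment (a quick check, not part of the PR):

```python
import numpy as np

# step 6, series a (promo=True): previous True values across series
mean_a = np.mean([1.0, 2.0, 4.0, 20.0, 40.0, 50.0])  # 19.5
# step 6, series b (promo=False): previous False values
mean_b = np.mean([3.0, 5.0, 10.0, 30.0])             # 12.0
```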
This PR adds partition_by support to lag/rolling transforms.
partition_by enables SQL-like PARTITION BY behavior for lag transforms: features are still computed in time order, but only using past observations that match the current row on one or more partition columns.
This is useful for regime-aware forecasting features, e.g. an expanding mean computed only over past observations that share the current row's promo flag.
All three execution modes are supported:
local: partition within each unique_id
groupby=[...]: aggregate across series within each (groupby, partition_by) bucket
global_=True: aggregate globally within each partition_by bucket
Integrated the new behavior across:
fit_transform
recursive/direct predict
update
cross_validation
transfer-learning predict(new_df=...)
AutoMLForecast
Added tests covering core functionality, update behavior, CV, transfer learning, and AutoML integration.
Usage example:
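The snippet itself is not shown above. Based on the API exercised in the tests (`ExpandingMean(groupby=["brand"], partition_by=["promo"])`), a hedged sketch of what usage might look like — the surrounding MLForecast setup (freq, empty model list, column names) is illustrative, not taken from the PR, and partition_by only exists on this branch:

```python
import pandas as pd
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean

# df: long-format panel with unique_id, ds, y, a static brand column,
# and a boolean promo column (illustrative names).
fcst = MLForecast(
    models=[],  # preprocess-only here; pass real models to train
    freq=1,     # integer timestamps in this sketch
    lags=[1],
    lag_transforms={
        # Expanding mean of lag 1, pooled across series that share a brand,
        # restricted to past rows whose promo value matches the current row.
        1: [ExpandingMean(groupby=["brand"], partition_by=["promo"])],
    },
)
features = fcst.preprocess(df, static_features=["brand"])
```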
Closes #587
Checklist: