Add partition_by support for lag transforms #609

Draft
janrth wants to merge 6 commits into Nixtla:main from janrth:feature/partition_by_window_functions

Conversation

@janrth
Contributor

@janrth janrth commented Mar 30, 2026

This PR adds partition_by support to lag/rolling transforms.

partition_by enables SQL-like PARTITION BY behavior for lag transforms: features are still computed in time order, but only using past observations that match the current row on one or more partition columns.
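For the local mode, the intended semantics can be sketched in plain pandas. This is only an illustration of the idea, not the implementation in this PR: the toy data and the helper name `lagged_expanding_mean` are assumptions, and it shows one plausible reading of a lag-1 expanding mean partitioned by `promo`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series in the usual unique_id/ds/y layout.
df = pd.DataFrame(
    {
        "unique_id": ["a"] * 4,
        "ds": pd.date_range("2024-01-01", periods=4, freq="D"),
        "y": [1.0, 2.0, 3.0, 4.0],
        "promo": [True, True, False, True],
    }
).sort_values(["unique_id", "ds"])


def lagged_expanding_mean(s: pd.Series) -> pd.Series:
    # Shift by one position inside the bucket so a row never sees its own
    # value, then average everything seen so far in that bucket.
    return s.shift(1).expanding().mean()


# Rows only "see" earlier rows of the same series with the same promo flag.
df["expanding_mean_by_promo"] = df.groupby(["unique_id", "promo"])["y"].transform(
    lagged_expanding_mean
)
print(df[["ds", "promo", "y", "expanding_mean_by_promo"]])
```

The third row has no earlier promo=False observation, so its feature is NaN, while the fourth row averages the two earlier promo=True values.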

This is useful for regime-aware forecasting features, such as rolling statistics computed separately for promo and non-promo periods.

All three execution modes are supported:

  • local: partition within each unique_id
  • groupby=[...]: aggregate across series within each (groupby, partition_by) bucket
  • global_=True: aggregate globally within each partition_by bucket

The new behavior is integrated across:

  • fit_transform
  • recursive/direct predict
  • update
  • cross_validation
  • transfer-learning predict(new_df=...)
  • AutoMLForecast

Tests cover core functionality, update behavior, CV, transfer learning, and AutoML integration.

Usage example:

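from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from sklearn.ensemble import HistGradientBoostingRegressor

fcst = MLForecast(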
    models={"hgb": HistGradientBoostingRegressor()},
    freq="D",
    lags=[1, 7, 28],
    lag_transforms={
        1: [
            ExpandingMean(partition_by=["promo"]),
            RollingMean(window_size=7, partition_by=["promo"]),
            RollingMean(
                window_size=7,
                groupby=["brand"],
                partition_by=["promo"],
            ),
        ]
    },
    date_features=["dayofweek", "month"],
)

Closes #587

Checklist:

  • This PR has a meaningful title and a clear description.
  • The tests pass.
  • All linting tasks pass.
  • The notebooks are clean.

@codspeed-hq

codspeed-hq Bot commented Mar 30, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing janrth:feature/partition_by_window_functions (fa0da62) with main (36ecd97)


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e62aa03ee5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread mlforecast/core.py
Comment thread mlforecast/core.py Outdated
@janrth
Contributor Author

janrth commented Mar 31, 2026

@codex

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!

Contributor

@simonez-tuidi simonez-tuidi left a comment

I think there are some inconsistent results when using the partition_by aggregations in global_ or groupby mode, which probably has to do with how the operations are combined. The partition_by results all look good in the default (local) mode.

Overall, great work, thanks so much @janrth!

Comment thread tests/test_core.py
("a", 1): np.nan,
("a", 2): 1.0,
("a", 3): 10.0,
("a", 4): 11.5,
Contributor

@janrth I think the value for the 4th timestamp here should be the average of all previous values that have promo=True across all series, i.e. the average of [1, 2, 20], so about 7.67.
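The expectation described above can be reproduced with a brute-force pandas sketch. The data frame is an assumption, reconstructed from the values quoted in this review thread (y = 1..4 for series a, 10..40 for series b, and the promo flags from the test fixture), and `pooled_expanding_mean` is a hypothetical helper, not code from this PR.

```python
import numpy as np
import pandas as pd

# Assumed toy data, reconstructed from the values quoted in this thread.
df = pd.DataFrame(
    {
        "unique_id": ["a"] * 4 + ["b"] * 4,
        "ds": [1, 2, 3, 4] * 2,
        "y": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
        "promo": [True, True, False, True, False, True, False, True],
    }
)


def pooled_expanding_mean(row: pd.Series) -> float:
    # Pool all series and keep only strictly earlier timestamps
    # that fall in the same promo bucket as the current row.
    past = df[(df["ds"] < row["ds"]) & (df["promo"] == row["promo"])]
    return past["y"].mean() if len(past) else np.nan


df["pooled_mean"] = df.apply(pooled_expanding_mean, axis=1)
a4 = df.loc[(df["unique_id"] == "a") & (df["ds"] == 4), "pooled_mean"].item()
print(round(a4, 2))  # 7.67
```

Because every row in this fixture shares the same brand, the groupby bucket and the global bucket coincide here, so the same check applies to both modes.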

Contributor Author

@simonez-tuidi thx for your close review!! super helpful. I will try to check all your comments asap :)

Comment thread tests/test_core.py
"promo": [True, True, False, True, False, True, False, True],
}
if include_brand:
data["brand"] = ["x"] * 8
Contributor

Might be worth expanding the test coverage to include more timestamps and scenarios with more than one value in the groupby column, e.g. two brands.

Comment thread tests/test_core.py
("a", 1): np.nan,
("a", 2): 1.0,
("a", 3): 10.0,
("a", 4): 11.5,
Contributor

Same as for the groupby case

Comment thread tests/test_core.py
col = tfm._get_name(1)
np.testing.assert_allclose(
features.loc[features["step"].eq(0), col].to_numpy(),
np.array([np.nan, 30.0]),
Contributor

Shouldn't the result here be [3.0, 30.0]?

The third value in series a has promo=False and y=3, so it's the only one used in the ExpandingMean.

For series b, 30.0 is the mean of [20, 40], which have promo=True, so it's correct.

Comment thread tests/test_core.py
)
np.testing.assert_allclose(
features.loc[features["step"].eq(1), col].to_numpy(),
np.array([2.3333333333333335, 20.0]),
Contributor

correct 👍

Comment thread tests/test_core.py
col = tfm._get_name(1)
np.testing.assert_allclose(
features.loc[features["step"].eq(0), col].to_numpy(),
np.array([21.5, 22.333333333333332]),
Contributor

Step 5 of series a has promo=False, so all previous False values are [3, 10, 30] -> mean = 14.33
Step 5 of series b has promo=True, so all previous True values are [1, 2, 4, 20, 40] -> mean = 13.4
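The arithmetic for these two buckets can be checked with a few lines of plain Python. The `history` list is an assumption reconstructed from the values quoted in this thread, and `global_bucket_mean` is a hypothetical helper used only for this check.

```python
from statistics import mean

# Assumed history, reconstructed from the values quoted in this thread:
# (series, y, promo) for the four observed timestamps of each series.
history = [
    ("a", 1, True), ("a", 2, True), ("a", 3, False), ("a", 4, True),
    ("b", 10, False), ("b", 20, True), ("b", 30, False), ("b", 40, True),
]


def global_bucket_mean(promo: bool) -> float:
    # Global mode: pool every series and keep only rows in the same promo bucket.
    return mean(y for _, y, p in history if p == promo)


step5_a = global_bucket_mean(False)  # series a has promo=False at step 5
step5_b = global_bucket_mean(True)   # series b has promo=True at step 5
print(round(step5_a, 2), round(step5_b, 2))
```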

Comment thread tests/test_core.py
col = tfm._get_name(1)
np.testing.assert_allclose(
feats[col].to_numpy(),
np.array([2.3333333333333335, 20.0]),
Contributor

correct 👍

Comment thread tests/test_core.py
(
ExpandingMean(groupby=["brand"], partition_by=["promo"]),
{"static_features": ["brand"]},
np.array([29.25, 16.0]),
Contributor

Step 6 of series a has promo=True, so all previous True values across series are [1, 2, 4, 20, 40, 50] -> mean = 19.5
Step 6 of series b has promo=False, so all previous False values are [3, 5, 10, 30] -> mean = 12

Development

Successfully merging this pull request may close these issues.

Feature Request: partition_by support for window aggregations

2 participants