
Fix log_training_metric causing IndexError for time series models#1469

Merged
thinkall merged 1 commit into main from copilot/fix-time-series-model-logging on Jan 10, 2026
Conversation

Contributor

Copilot AI commented Jan 10, 2026

Close #1464

Time series models (ARIMA, SARIMAX, Holt-Winters) fail with IndexError when log_training_metric=True because _eval_estimator() attempts to compute training predictions on TimeSeriesDataset objects, which expect test_data for prediction but receive training data with empty/unset test splits.

Changes

flaml/automl/ml.py

  • Skip training metric computation for TimeSeriesDataset instances in _eval_estimator()
  • Regular ML models continue computing training metrics as before

test/automl/test_forecast.py

  • Add test_log_training_metric_ts_models() validating all three affected models with log_training_metric=True
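A minimal sketch of the kind of guard described above, not the actual patch. `TimeSeriesDataset` and `eval_train_metric` here are stand-ins for FLAML's internal `flaml.automl.time_series.TimeSeriesDataset` and the relevant logic in `_eval_estimator()`:

```python
class TimeSeriesDataset:
    """Stand-in for FLAML's time series data container."""


def eval_train_metric(X_train, y_train, predict, metric):
    """Return the training metric, or None for time series datasets.

    TS models' predict() expects a dataset with a configured test split,
    so calling it on the raw training split raises IndexError.
    """
    if isinstance(X_train, TimeSeriesDataset):
        return None  # skip: training predictions are undefined here
    return metric(y_train, predict(X_train))
```

Regular ML estimators fall through the `isinstance` check unchanged, so their training metrics are still computed and logged as before.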

Example

from flaml import AutoML

# df: a pandas DataFrame with a timestamp column and target column "y"
automl = AutoML()
automl.fit(
    dataframe=df,
    label="y",
    task="ts_forecast",
    metric="mape",
    eval_method="holdout",
    estimator_list=["arima", "sarimax", "holt-winters"],
    log_training_metric=True,  # Previously failed, now works
    period=12
)
Original prompt

This section details the original issue to resolve.

<issue_title>[Bug]: Forecasting: log_training_metric causes arima, sarimax, holt-winters to fail when set to true.</issue_title>
<issue_description>### Describe the bug

The key findings are:

  • Individual TS estimators (arima, sarimax, holt-winters) FAIL with log_training_metric=True
  • ML estimators (xgboost, lgbm, catboost) PASS
  • When log_training_metric is NOT set, arima PASSES (see the holdout split test)

ROOT CAUSE HYPOTHESIS:

  • log_training_metric=True causes FLAML to call get_y_pred() on X_train
  • For time series models (arima, sarimax, holt-winters), this fails because
    the TS model's predict() method expects X to have timestamps, but during
    internal validation, X_train can be empty or malformed.
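A hedged illustration of this hypothesis (the container and accessor names are made up for the example): indexing into an empty test split raises exactly the IndexError the TS estimators report.

```python
# Hypothetical: the training-side dataset carries an empty/unset test split
test_timestamps = []


def first_forecast_point(timestamps):
    # A TS predict() path typically reads the first timestamp to forecast from
    return timestamps[0]


try:
    first_forecast_point(test_timestamps)
except IndexError as exc:
    print(f"IndexError: {exc}")
```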

Steps to reproduce

Script for reproduction

"""
FLAML Root Cause Verification Test

Hypothesis: The bug is triggered by `log_training_metric=True` with time series models.

When log_training_metric=True, FLAML tries to compute training predictions
via get_y_pred() which calls estimator.predict(X_train). For TS models,
this fails because X_train can be empty during certain validation scenarios.
"""

import sys
import os
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import numpy as np
import pandas as pd
import sktime.datasets
from flaml import AutoML

def prepare_airline_data():
    """Prepare Airline data in FLAML format."""
    airline = sktime.datasets.load_airline()
    airline.index = airline.index.to_timestamp()
    
    return pd.DataFrame({
        "ds": airline.index,
        "y": airline.values.astype(np.float64),
    })


def test_log_training_metric_hypothesis():
    """Test if log_training_metric=True is the root cause."""
    print("\n" + "="*70)
    print("ROOT CAUSE VERIFICATION: log_training_metric")
    print("="*70)
    
    train_df = prepare_airline_data()
    
    # Base config
    base_config = {
        "task": "ts_forecast",
        "time_budget": 10,
        "metric": "mape",
        "eval_method": "holdout",
        "seed": 42,
        "verbose": 0,
        "estimator_list": ["arima"],
    }
    
    # Test 1: WITHOUT log_training_metric
    print("\n--- Test 1: WITHOUT log_training_metric ---")
    config1 = base_config.copy()
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config1)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")
    
    # Test 2: WITH log_training_metric=True
    print("\n--- Test 2: WITH log_training_metric=True ---")
    config2 = base_config.copy()
    config2["log_training_metric"] = True
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config2)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")
    
    # Test 3: WITH log_training_metric=False (explicit)
    print("\n--- Test 3: WITH log_training_metric=False ---")
    config3 = base_config.copy()
    config3["log_training_metric"] = False
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config3)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")


def test_all_ts_estimators_with_and_without_logging():
    """Test all TS estimators with and without log_training_metric."""
    print("\n" + "="*70)
    print("ALL TS ESTIMATORS: with/without log_training_metric")
    print("="*70)
    
    train_df = prepare_airline_data()
    
    ts_estimators = ["arima", "sarimax", "holt-winters"]
    
    for est in ts_estimators:
        print(f"\n--- Estimator: {est} ---")
        
        # Without logging
        config_no_log = {
            "task": "ts_forecast",
            "time_budget": 5,
            "metric": "mape",
            "eval_method": "holdout",
            "seed": 42,
            "verbose": 0,
            "estimator_list": [est],
        }
        
        try:
            automl = AutoML()
            automl.fit(dataframe=train_df, label="y", period=1, **config_no_log)
            print(f"  log_training_metric=False: ✅ SUCCESS")
        except Exception as e:
            print(f"  log_training_metric=False: ❌ FAILED - {str(e)[:50]}")
        
        # With logging
        config_with_log = config_no_log.copy()
        config_with_log["log_training_metric"] = True
        
        try:
            automl = AutoML()
            automl.fit(dataframe=train_df, label="y", period=1, **config_with_log)
            print(f"  log_training_metric=True:  ✅ SUCCESS")
        except Exception as e:
            print(f"  log_training_metric=True:  ❌ FAILED - {str(e)[:50]}")


def test_fix_remove_log_training_metric():
    """Test the fix: remove log_trainin...

</details>




- Fixes microsoft/FLAML#1464



Copilot AI left a comment


Pull request overview

This pull request fixes a bug where setting log_training_metric=True causes failures for time series forecasting models (arima, sarimax, holt-winters). The root cause was that when logging training metrics, FLAML attempted to call predict() on the training data (X_train), which fails for time series models because they expect a TimeSeriesDataset with properly configured test_data for prediction.

Changes:

  • Added conditional logic to skip training metric computation for TimeSeriesDataset objects when log_training_metric=True
  • Added a new test to verify that time series models work correctly with log_training_metric=True

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • flaml/automl/ml.py — Added check to skip training metric computation when X_train is a TimeSeriesDataset, preventing the predict() call that would fail for time series models
  • test/automl/test_forecast.py — Added test function to validate that arima, sarimax, and holt-winters models work with log_training_metric=True


Copilot AI changed the title [WIP] Fix bug in time series model logging Fix log_training_metric causing IndexError for time series models Jan 10, 2026
Copilot AI requested a review from thinkall January 10, 2026 04:42
@thinkall thinkall force-pushed the copilot/fix-time-series-model-logging branch from 8d53675 to 566972a Compare January 10, 2026 08:09
@thinkall thinkall marked this pull request as ready for review January 10, 2026 08:10
@thinkall thinkall merged commit 0b138d9 into main Jan 10, 2026
14 of 15 checks passed
@thinkall thinkall deleted the copilot/fix-time-series-model-logging branch January 10, 2026 10:07


Development

Successfully merging this pull request may close these issues.

[Bug]: Forecasting: log_training_metric causes arima, sarimax, holt-winters to fail when set to true.

4 participants