
Fix log_training_metric causing IndexError for time series models#1469

Merged
thinkall merged 1 commit into main from copilot/fix-time-series-model-logging on Jan 10, 2026
Conversation

Contributor

Copilot AI commented Jan 10, 2026

Close #1464

Time series models (ARIMA, SARIMAX, Holt-Winters) fail with IndexError when log_training_metric=True because _eval_estimator() attempts to compute training predictions on TimeSeriesDataset objects, which expect test_data for prediction but receive training data with empty/unset test splits.

Changes

flaml/automl/ml.py

  • Skip training metric computation for TimeSeriesDataset instances in _eval_estimator()
  • Regular ML models continue computing training metrics as before

test/automl/test_forecast.py

  • Add test_log_training_metric_ts_models() validating all three affected models with log_training_metric=True
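A minimal sketch of the kind of guard described above, not the actual patch. `TimeSeriesDataset` and `eval_train_metric` here are stand-ins for FLAML's internal `flaml.automl.time_series.TimeSeriesDataset` and the relevant logic in `_eval_estimator()`:

```python
class TimeSeriesDataset:
    """Stand-in for FLAML's time series data container."""


def eval_train_metric(X_train, y_train, predict, metric):
    """Return the training metric, or None for time series datasets.

    TS models' predict() expects a dataset with a configured test split,
    so calling it on the raw training split raises IndexError.
    """
    if isinstance(X_train, TimeSeriesDataset):
        return None  # skip: training predictions are undefined here
    return metric(y_train, predict(X_train))
```

Regular ML estimators fall through the `isinstance` check unchanged, so their training metrics are still computed and logged as before.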

Example

from flaml import AutoML

# df: a pandas DataFrame with a timestamp column and target column "y"
automl = AutoML()
automl.fit(
    dataframe=df,
    label="y",
    task="ts_forecast",
    metric="mape",
    eval_method="holdout",
    estimator_list=["arima", "sarimax", "holt-winters"],
    log_training_metric=True,  # Previously failed, now works
    period=12
)
Original prompt

This section details the original issue to resolve.

<issue_title>[Bug]: Forecasting: log_training_metric causes arima, sarimax, holt-winters to fail when set to true.</issue_title>
<issue_description>### Describe the bug

The key findings are:

  • Individual TS estimators (arima, sarimax, holt-winters) FAIL with log_training_metric=True
  • ML estimators (xgboost, lgbm, catboost) PASS
  • When log_training_metric is NOT set, arima PASSES (see the holdout split test)

ROOT CAUSE HYPOTHESIS:

  • log_training_metric=True causes FLAML to call get_y_pred() on X_train
  • For time series models (arima, sarimax, holt-winters), this fails because
    the TS model's predict() method expects X to have timestamps, but during
    internal validation, X_train can be empty or malformed.
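A hedged illustration of this hypothesis (the container and accessor names are made up for the example): indexing into an empty test split raises exactly the IndexError the TS estimators report.

```python
# Hypothetical: the training-side dataset carries an empty/unset test split
test_timestamps = []


def first_forecast_point(timestamps):
    # A TS predict() path typically reads the first timestamp to forecast from
    return timestamps[0]


try:
    first_forecast_point(test_timestamps)
except IndexError as exc:
    print(f"IndexError: {exc}")
```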

Steps to reproduce

Script for reproduction

"""
FLAML Root Cause Verification Test

Hypothesis: The bug is triggered by `log_training_metric=True` with time series models.

When log_training_metric=True, FLAML tries to compute training predictions
via get_y_pred() which calls estimator.predict(X_train). For TS models,
this fails because X_train can be empty during certain validation scenarios.
"""

import sys
import os
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import numpy as np
import pandas as pd
import sktime.datasets
from flaml import AutoML

def prepare_airline_data():
    """Prepare Airline data in FLAML format."""
    airline = sktime.datasets.load_airline()
    airline.index = airline.index.to_timestamp()
    
    return pd.DataFrame({
        "ds": airline.index,
        "y": airline.values.astype(np.float64),
    })


def test_log_training_metric_hypothesis():
    """Test if log_training_metric=True is the root cause."""
    print("\n" + "="*70)
    print("ROOT CAUSE VERIFICATION: log_training_metric")
    print("="*70)
    
    train_df = prepare_airline_data()
    
    # Base config
    base_config = {
        "task": "ts_forecast",
        "time_budget": 10,
        "metric": "mape",
        "eval_method": "holdout",
        "seed": 42,
        "verbose": 0,
        "estimator_list": ["arima"],
    }
    
    # Test 1: WITHOUT log_training_metric
    print("\n--- Test 1: WITHOUT log_training_metric ---")
    config1 = base_config.copy()
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config1)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")
    
    # Test 2: WITH log_training_metric=True
    print("\n--- Test 2: WITH log_training_metric=True ---")
    config2 = base_config.copy()
    config2["log_training_metric"] = True
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config2)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")
    
    # Test 3: WITH log_training_metric=False (explicit)
    print("\n--- Test 3: WITH log_training_metric=False ---")
    config3 = base_config.copy()
    config3["log_training_metric"] = False
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config3)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")


def test_all_ts_estimators_with_and_without_logging():
    """Test all TS estimators with and without log_training_metric."""
    print("\n" + "="*70)
    print("ALL TS ESTIMATORS: with/without log_training_metric")
    print("="*70)
    
    train_df = prepare_airline_data()
    
    ts_estimators = ["arima", "sarimax", "holt-winters"]
    
    for est in ts_estimators:
        print(f"\n--- Estimator: {est} ---")
        
        # Without logging
        config_no_log = {
            "task": "ts_forecast",
            "time_budget": 5,
            "metric": "mape",
            "eval_method": "holdout",
            "seed": 42,
            "verbose": 0,
            "estimator_list": [est],
        }
        
        try:
            automl = AutoML()
            automl.fit(dataframe=train_df, label="y", period=1, **config_no_log)
            print(f"  log_training_metric=False: ✅ SUCCESS")
        except Exception as e:
            print(f"  log_training_metric=False: ❌ FAILED - {str(e)[:50]}")
        
        # With logging
        config_with_log = config_no_log.copy()
        config_with_log["log_training_metric"] = True
        
        try:
            automl = AutoML()
            automl.fit(dataframe=train_df, label="y", period=1, **config_with_log)
            print(f"  log_training_metric=True:  ✅ SUCCESS")
        except Exception as e:
            print(f"  log_training_metric=True:  ❌ FAILED - {str(e)[:50]}")


def test_fix_remove_log_training_metric():
    """Test the fix: remove log_trainin...

</details>




- Fixes microsoft/FLAML#1464



Copilot AI left a comment


Pull request overview

This pull request fixes a bug where setting log_training_metric=True causes failures for time series forecasting models (arima, sarimax, holt-winters). The root cause was that when logging training metrics, FLAML attempted to call predict() on the training data (X_train), which fails for time series models because they expect a TimeSeriesDataset with properly configured test_data for prediction.

Changes:

  • Added conditional logic to skip training metric computation for TimeSeriesDataset objects when log_training_metric=True
  • Added a new test to verify that time series models work correctly with log_training_metric=True

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • flaml/automl/ml.py — Added check to skip training metric computation when X_train is a TimeSeriesDataset, preventing the predict() call that would fail for time series models
  • test/automl/test_forecast.py — Added test function to validate that arima, sarimax, and holt-winters models work with log_training_metric=True


Copilot AI changed the title [WIP] Fix bug in time series model logging Fix log_training_metric causing IndexError for time series models Jan 10, 2026
Copilot AI requested a review from thinkall January 10, 2026 04:42
@thinkall thinkall force-pushed the copilot/fix-time-series-model-logging branch from 8d53675 to 566972a Compare January 10, 2026 08:09
@thinkall thinkall marked this pull request as ready for review January 10, 2026 08:10
@thinkall thinkall merged commit 0b138d9 into main Jan 10, 2026
14 of 15 checks passed
@thinkall thinkall deleted the copilot/fix-time-series-model-logging branch January 10, 2026 10:07


Development

Successfully merging this pull request may close these issues.

[Bug]: Forecasting: log_training_metric causes arima, sarimax, holt-winters to fail when set to true.

4 participants