Fix log_training_metric crash for statistical time series models#1468

Closed
Copilot wants to merge 5 commits into main from copilot/fix-log-training-metric-bug-again
Closed

Conversation


Copilot AI commented Jan 10, 2026

Statistical time series models (ARIMA, SARIMAX, Holt-Winters) fail with IndexError: single positional indexer is out-of-bounds when log_training_metric=True.

Root Cause

The _eval_estimator function attempts to compute training metrics by calling predict() on the training data. The statistical models wrap statsmodels' predict interface, which is designed for out-of-sample forecasting: it expects future timestamps and cannot produce in-sample predictions on the training data the way ML models can.
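The error class itself comes from pandas positional indexing. A minimal, standalone illustration of the same IndexError (this reproduces the message only, not FLAML's internal call path):

```python
import pandas as pd

# A 3-row frame stands in for the (too-short) data that ends up being indexed.
df = pd.DataFrame({"y": [1.0, 2.0, 3.0]})

try:
    df.iloc[10]  # positional index past the end of the frame
except IndexError as e:
    print(f"{type(e).__name__}: {e}")
```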

Changes

  • flaml/automl/ml.py: Skip training metric computation for ARIMA, SARIMAX, and HoltWinters when log_training_metric=True
  • test/automl/test_log_training_metric_ts.py: Add tests covering all three statistical models individually and together, plus ML models to verify normal behavior preserved
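The skip logic can be sketched roughly as follows. The estimator names match the PR description, but the helper name and signature are illustrative assumptions, not the merged code in flaml/automl/ml.py:

```python
# Illustrative sketch of the guard described above; the function is
# hypothetical, not the actual code in flaml/automl/ml.py.
STATISTICAL_TS_ESTIMATORS = {"arima", "sarimax", "holt-winters"}

def should_compute_training_metric(estimator_name: str, log_training_metric: bool) -> bool:
    """Return False for statistical TS models, whose predict() is
    out-of-sample only and cannot score the training data."""
    if not log_training_metric:
        return False
    return estimator_name not in STATISTICAL_TS_ESTIMATORS
```

ML estimators such as lgbm or xgboost fall through the guard and keep computing training metrics as before.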

Example

from flaml import AutoML

automl = AutoML()
automl.fit(
    dataframe=df,
    label="y",
    task="ts_forecast",
    estimator_list=["arima", "sarimax", "holt-winters"],
    log_training_metric=True,  # Now works without IndexError
    period=12
)

Statistical models will log validation metrics but not training loss (which is less meaningful for models that fit the entire sequence). ML models continue computing training metrics normally.

Original prompt

This section details the original issue you should resolve

<issue_title>[Bug]: Forecasting: log_training_metric causes arima, sarimax, holt-winters to fail when set to true.</issue_title>
<issue_description>### Describe the bug

The key findings are:

  • Individual TS estimators (arima, sarimax, holt-winters) FAIL with log_training_metric=True
  • ML estimators (xgboost, lgbm, catboost) PASS
  • When log_training_metric is NOT set, arima PASSES (see the holdout split test)

ROOT CAUSE HYPOTHESIS:

  • log_training_metric=True causes FLAML to call get_y_pred() on X_train
  • For time series models (arima, sarimax, holt-winters), this fails because
    the TS model's predict() method expects X to have timestamps, but during
    internal validation, X_train can be empty or malformed.

Steps to reproduce

Script for reproduction

"""
FLAML Root Cause Verification Test

Hypothesis: The bug is triggered by `log_training_metric=True` with time series models.

When log_training_metric=True, FLAML tries to compute training predictions
via get_y_pred() which calls estimator.predict(X_train). For TS models,
this fails because X_train can be empty during certain validation scenarios.
"""

import sys
import os
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import numpy as np
import pandas as pd
import sktime.datasets
from flaml import AutoML

def prepare_airline_data():
    """Prepare Airline data in FLAML format."""
    airline = sktime.datasets.load_airline()
    airline.index = airline.index.to_timestamp()
    
    return pd.DataFrame({
        "ds": airline.index,
        "y": airline.values.astype(np.float64),
    })


def test_log_training_metric_hypothesis():
    """Test if log_training_metric=True is the root cause."""
    print("\n" + "="*70)
    print("ROOT CAUSE VERIFICATION: log_training_metric")
    print("="*70)
    
    train_df = prepare_airline_data()
    
    # Base config
    base_config = {
        "task": "ts_forecast",
        "time_budget": 10,
        "metric": "mape",
        "eval_method": "holdout",
        "seed": 42,
        "verbose": 0,
        "estimator_list": ["arima"],
    }
    
    # Test 1: WITHOUT log_training_metric
    print("\n--- Test 1: WITHOUT log_training_metric ---")
    config1 = base_config.copy()
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config1)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")
    
    # Test 2: WITH log_training_metric=True
    print("\n--- Test 2: WITH log_training_metric=True ---")
    config2 = base_config.copy()
    config2["log_training_metric"] = True
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config2)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")
    
    # Test 3: WITH log_training_metric=False (explicit)
    print("\n--- Test 3: WITH log_training_metric=False ---")
    config3 = base_config.copy()
    config3["log_training_metric"] = False
    
    try:
        automl = AutoML()
        automl.fit(dataframe=train_df, label="y", period=1, **config3)
        print(f"  ✅ SUCCESS - Best: {automl.best_estimator}")
    except Exception as e:
        print(f"  ❌ FAILED - {type(e).__name__}: {e}")


def test_all_ts_estimators_with_and_without_logging():
    """Test all TS estimators with and without log_training_metric."""
    print("\n" + "="*70)
    print("ALL TS ESTIMATORS: with/without log_training_metric")
    print("="*70)
    
    train_df = prepare_airline_data()
    
    ts_estimators = ["arima", "sarimax", "holt-winters"]
    
    for est in ts_estimators:
        print(f"\n--- Estimator: {est} ---")
        
        # Without logging
        config_no_log = {
            "task": "ts_forecast",
            "time_budget": 5,
            "metric": "mape",
            "eval_method": "holdout",
            "seed": 42,
            "verbose": 0,
            "estimator_list": [est],
        }
        
        try:
            automl = AutoML()
            automl.fit(dataframe=train_df, label="y", period=1, **config_no_log)
            print(f"  log_training_metric=False: ✅ SUCCESS")
        except Exception as e:
            print(f"  log_training_metric=False: ❌ FAILED - {str(e)[:50]}")
        
        # With logging
        config_with_log = config_no_log.copy()
        config_with_log["log_training_metric"] = True
        
        try:
            automl = AutoML()
            automl.fit(dataframe=train_df, label="y", period=1, **config_with_log)
            print(f"  log_training_metric=True:  ✅ SUCCESS")
        except Exception as e:
            print(f"  log_training_metric=True:  ❌ FAILED - {str(e)[:50]}")


def test_fix_remove_log_training_metric():
    """Test the fix: remove log_trainin...





- Fixes microsoft/FLAML#1464


Copilot AI and others added 3 commits January 10, 2026 02:29
Skip training metric computation for ARIMA, SARIMAX, and Holt-Winters models when log_training_metric=True to avoid IndexError. These statistical models don't support in-sample predictions the same way ML models do.

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Remove print statements and extract test data preparation to helper function

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix bug with log_training_metric causing time series models to fail Fix log_training_metric crash for statistical time series models Jan 10, 2026
Copilot AI requested a review from thinkall January 10, 2026 02:37
@thinkall thinkall closed this Jan 10, 2026
@thinkall thinkall deleted the copilot/fix-log-training-metric-bug-again branch January 10, 2026 04:21