[Bug]: 'y' parameter lacking when using sklearn cross validation objects #1447

@santortiz

Describe the bug

In the evaluate_model_CV method of the GenericTask class, the split method is called without the 'y' parameter when a scikit-learn cross-validation object such as StratifiedKFold is used. Because such splitters need the labels, cross-validation fails.
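
The underlying failure can be reproduced with scikit-learn alone, without FLAML. The following is a minimal sketch on synthetic data; the arrays and variable names are illustrative assumptions and do not come from FLAML's code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic, illustrative data: 20 samples, 2 balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Stratification needs the labels, so splitting on X alone fails.
try:
    next(iter(cv.split(X)))
    split_without_y_failed = False
except (TypeError, ValueError) as exc:
    split_without_y_failed = True
    print(f"split(X) failed: {type(exc).__name__}")

# With the labels supplied, the splitter yields (train, test) index arrays.
train_idx, test_idx = next(iter(cv.split(X, y)))
print(len(train_idx), len(test_idx))
```

Both exception types are caught because the exact error may differ between scikit-learn versions.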

Steps to reproduce

from flaml import AutoML
from sklearn.model_selection import StratifiedKFold

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

automl_settings = {
    "task": "classification",
    "time_budget": 180,  # total time in seconds
    "metric": "roc_auc",
    "estimator_list": ["xgboost", "lgbm", "catboost"],
    "eval_method": "cv",
    "split_type": cv,  # scikit-learn splitter object passed directly
    "ensemble": True,
}

# X_train_transformed and y_train are the training features and labels (not shown here)
flaml_automl = AutoML()
flaml_automl.fit(
    X_train=X_train_transformed,
    y_train=y_train,
    **automl_settings,
)

Model Used

Not specific to any model. The error occurs when cross-validating the optimal model.

Expected Behavior

The splits are generated and cross-validation runs.

Screenshots and logs

This is the current code (check last line):

        X_train_split, y_train_split = X_train_all, y_train_all
        shuffle = getattr(kf, "shuffle", not self.is_ts_forecast())
        if isinstance(kf, RepeatedStratifiedKFold):
            kf = kf.split(X_train_split, y_train_split)
        elif isinstance(kf, (GroupKFold, StratifiedGroupKFold)):
            groups = kf.groups
            kf = kf.split(X_train_split, y_train_split, groups)
            shuffle = False
        elif isinstance(kf, TimeSeriesSplit):
            kf = kf.split(X_train_split, y_train_split)
        else:
            kf = kf.split(X_train_split)

The last line should pass the "y" parameter:

            kf = kf.split(X_train_split, y_train_split)
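
Passing y unconditionally in the fallback branch should also be safe for non-stratified splitters, since scikit-learn's KFold.split accepts y and ignores it. A minimal sketch of that idea follows; `make_splits` is a hypothetical helper written for illustration, not a FLAML function, and the data is synthetic.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold


def make_splits(kf, X, y):
    """Hypothetical helper: always forward y to the splitter."""
    return list(kf.split(X, y))


# Synthetic, illustrative data: 20 samples, 2 balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

kfold_splits = make_splits(KFold(n_splits=5), X, y)
strat_splits = make_splits(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=42), X, y
)

# KFold ignores y, so its folds match a call made without y.
kfold_no_y = list(KFold(n_splits=5).split(X))
same_folds = all(
    np.array_equal(with_y[1], without_y[1])
    for with_y, without_y in zip(kfold_splits, kfold_no_y)
)
print(len(kfold_splits), len(strat_splits), same_folds)
```

Under these assumptions the same call works for both splitter types, which is why a single `split(X, y)` call in the fallback branch looks sufficient.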

Additional Information

flaml 2.3.5
ubuntu 20.04.6 LTS
python 3.12
