Merged
63 commits
6711934
Update gitignore
thinkall Dec 23, 2025
a49492f
Bump version to 2.4.0
thinkall Dec 23, 2025
8152679
Update readme
thinkall Dec 23, 2025
4397fc1
Pre-download california housing data
thinkall Dec 25, 2025
3b82f52
Use pre-downloaded california housing data
thinkall Dec 25, 2025
519cc5e
Pin lightning<=2.5.6
thinkall Dec 25, 2025
229826b
Fix typo in find and replace
thinkall Dec 25, 2025
d4d1e2e
Fix estimators has no attribute __sklearn_tags__
thinkall Dec 25, 2025
6c731ec
Pin torch to 2.2.2 in tests
thinkall Dec 25, 2025
5f1fe8e
Fix conflict
thinkall Dec 25, 2025
f59b667
Update pytorch-forecasting
thinkall Dec 25, 2025
4173a9b
Update pytorch-forecasting
thinkall Dec 26, 2025
12106d0
Update pytorch-forecasting
thinkall Dec 26, 2025
022d64a
Use numpy<2 for testing
thinkall Dec 26, 2025
28afa46
Update scikit-learn
thinkall Dec 26, 2025
5915ae0
Run Build and UT every other day
thinkall Dec 26, 2025
21d346a
Pin pip<24.1
thinkall Dec 26, 2025
768a8d8
Pin pip<24.1 in pipeline
thinkall Dec 26, 2025
843e860
Loosen pip, install pytorch_forecasting only in py311
thinkall Dec 26, 2025
dd5b992
Add support to new versions of nlp dependencies
thinkall Jan 7, 2026
54e3101
Fix formats
thinkall Jan 7, 2026
ffb87f4
Remove redefinition
thinkall Jan 7, 2026
bc30b13
Update mlflow versions
thinkall Jan 7, 2026
a688c5d
Fix mlflow version syntax
thinkall Jan 7, 2026
71ccd5a
Update gitignore
thinkall Jan 7, 2026
1902c91
Clean up cache to free space
thinkall Jan 7, 2026
61fe7ef
Remove clean up action cache
thinkall Jan 7, 2026
e551629
Fix blendsearch
thinkall Jan 7, 2026
7ea4bc6
Update test workflow
thinkall Jan 7, 2026
86afb2a
Update setup.py
thinkall Jan 7, 2026
e90836f
Fix catboost version
thinkall Jan 7, 2026
a051966
Update workflow
thinkall Jan 7, 2026
b5732ad
Prepare for python 3.14
thinkall Jan 7, 2026
3362875
Support no catboost
thinkall Jan 7, 2026
d7c77f1
Fix tests
thinkall Jan 7, 2026
fe1b144
Fix python_requires
thinkall Jan 7, 2026
c28d9db
Update test workflow
thinkall Jan 8, 2026
ffcfacc
Fix vw tests
thinkall Jan 8, 2026
ca1e0d1
Remove python 3.9
thinkall Jan 8, 2026
e962e83
Fix nlp tests
thinkall Jan 8, 2026
0157b4d
Fix prophet
thinkall Jan 8, 2026
12e3502
Print pip freeze for better debugging
thinkall Jan 8, 2026
560b7c7
Fix Optuna search does not support parameters of type Float with samp…
thinkall Jan 8, 2026
557c250
Save dependencies for later inspection
thinkall Jan 8, 2026
7b2aec9
Fix coverage.xml not exists
thinkall Jan 8, 2026
3196551
Fix github action permission
thinkall Jan 8, 2026
2eb598a
Handle python 3.13
thinkall Jan 8, 2026
fbe4192
Address openml is not installed
thinkall Jan 8, 2026
ae0e687
Check dependencies before run tests
thinkall Jan 8, 2026
3c49b6c
Update dependencies
thinkall Jan 8, 2026
6a7cddf
Fix syntax error
thinkall Jan 8, 2026
c5d6937
Use bash
thinkall Jan 8, 2026
2176cfc
Update dependencies
thinkall Jan 8, 2026
8ac1583
Fix git error
thinkall Jan 8, 2026
caffde4
Loosen mlflow constraints
thinkall Jan 8, 2026
e609bf2
Add rerun, use mlflow-skinny
thinkall Jan 8, 2026
545f9ea
Fix git error
thinkall Jan 9, 2026
803acb3
Remove ray tests
thinkall Jan 9, 2026
308965c
Update xgboost versions
thinkall Jan 9, 2026
0287011
Fix automl pickle error
thinkall Jan 9, 2026
ed4b1ac
Don't test python 3.10 on macos as it's stuck
thinkall Jan 9, 2026
36c0745
Rebase before push
thinkall Jan 9, 2026
cc0089e
Reduce number of branches
thinkall Jan 9, 2026
95 changes: 54 additions & 41 deletions .github/workflows/python-package.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,12 @@ on:
- 'setup.py'
merge_group:
types: [checks_requested]
schedule:
# Every other day at 02:00 UTC
- cron: '0 2 */2 * *'

permissions: {}
permissions:
contents: write
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
Expand All @@ -36,15 +40,18 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.9", "3.10", "3.11"]
python-version: ["3.10", "3.11"]
exclude:
- os: macos-latest
python-version: "3.10"
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: On mac, install libomp to facilitate lgbm and xgboost install
if: matrix.os == 'macOS-latest'
if: matrix.os == 'macos-latest'
run: |
brew update
brew install libomp
Expand All @@ -70,62 +77,68 @@ jobs:
run: |
pip install pyspark==3.5.1
pip list | grep "pyspark"
- name: If linux and python<3.11, install ray 2
if: matrix.os == 'ubuntu-latest' && matrix.python-version != '3.11'
- name: On Ubuntu python 3.12, install pyspark 4.0.1
if: matrix.python-version == '3.12' && matrix.os == 'ubuntu-latest'
run: |
pip install "ray[tune]<2.5.0"
- name: If mac and python 3.10, install ray and xgboost 1
if: matrix.os == 'macOS-latest' && matrix.python-version == '3.10'
run: |
pip install -e .[ray]
# use macOS to test xgboost 1, but macOS also supports xgboost 2
pip install "xgboost<2"
- name: If linux, install prophet on python < 3.9
if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.8'
pip install pyspark==4.0.1
pip list | grep "pyspark"
# # TODO: support ray
# - name: If linux and python<3.11, install ray 2
# if: matrix.os == 'ubuntu-latest' && matrix.python-version < '3.11'
# run: |
# pip install "ray[tune]<2.5.0"
- name: Install prophet when on linux
if: matrix.os == 'ubuntu-latest'
run: |
pip install -e .[forecast]
- name: Install vw on python < 3.10
if: matrix.python-version == '3.8' || matrix.python-version == '3.9'
# TODO: support vw for python 3.10+
- name: If linux and python<3.10, install vw
if: matrix.os == 'ubuntu-latest' && matrix.python-version < '3.10'
run: |
pip install -e .[vw]
- name: Pip freeze
run: |
pip freeze
- name: Check dependencies
run: |
python test/check_dependency.py
- name: Clear pip cache
run: |
pip cache purge
- name: Test with pytest
if: matrix.python-version != '3.10'
run: |
pytest test/ --ignore=test/autogen
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10
- name: Coverage
if: matrix.python-version == '3.10'
run: |
pip install coverage
coverage run -a -m pytest test --ignore=test/autogen
coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml
- name: Upload coverage to Codecov
if: matrix.python-version == '3.10'
uses: codecov/codecov-action@v3
with:
file: ./coverage.xml
flags: unittests
- name: Save dependencies
shell: bash
run: |
git config --global user.name 'github-actions[bot]'
git config --global user.email 'github-actions[bot]@users.noreply.github.com'
git config advice.addIgnoredFile false

# docs:

# runs-on: ubuntu-latest
BRANCH=unit-tests-installed-dependencies
git fetch origin
git checkout -B "$BRANCH"
if git show-ref --verify --quiet "refs/remotes/origin/$BRANCH"; then
git rebase "origin/$BRANCH"
fi

# steps:
# - uses: actions/checkout@v3
# - name: Setup Python
# uses: actions/setup-python@v4
# with:
# python-version: '3.8'
# - name: Compile documentation
# run: |
# pip install -e .
# python -m pip install sphinx sphinx_rtd_theme
# cd docs
# make html
# - name: Deploy to GitHub pages
# if: ${{ github.ref == 'refs/heads/main' }}
# uses: JamesIves/github-pages-deploy-action@3.6.2
# with:
# GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# BRANCH: gh-pages
# FOLDER: docs/_build/html
# CLEAN: true
pip freeze > installed_all_dependencies_${{ matrix.python-version }}_${{ matrix.os }}.txt
python test/check_dependency.py > installed_first_tier_dependencies_${{ matrix.python-version }}_${{ matrix.os }}.txt
git add installed_*dependencies*.txt
mv coverage.xml ./coverage_${{ matrix.python-version }}_${{ matrix.os }}.xml || true
git add -f ./coverage_${{ matrix.python-version }}_${{ matrix.os }}.xml || true
git commit -m "Update installed dependencies for Python ${{ matrix.python-version }} on ${{ matrix.os }}" || exit 0
git push origin "$BRANCH"
6 changes: 5 additions & 1 deletion .gitignore
Expand Up @@ -172,7 +172,7 @@ test/default
test/housing.json
test/nlp/default/transformer_ms/seq-classification.json

flaml/fabric/fanova/_fanova.c
flaml/fabric/fanova/*fanova.c
# local config files
*.config.local

Expand All @@ -184,3 +184,7 @@ notebook/lightning_logs/
lightning_logs/
flaml/autogen/extensions/tmp/
test/autogen/my_tmp/
catboost_*

# Internal configs
.pypirc
49 changes: 5 additions & 44 deletions README.md
Expand Up @@ -14,23 +14,17 @@
<br>
</p>

:fire: FLAML supports AutoML and Hyperparameter Tuning in [Microsoft Fabric Data Science](https://learn.microsoft.com/en-us/fabric/data-science/automated-machine-learning-fabric). In addition, we've introduced Python 3.11 support, along with a range of new estimators, and comprehensive integration with MLflow—thanks to contributions from the Microsoft Fabric product team.
:fire: FLAML supports AutoML and Hyperparameter Tuning in [Microsoft Fabric Data Science](https://learn.microsoft.com/en-us/fabric/data-science/automated-machine-learning-fabric). In addition, we've introduced Python 3.11 and 3.12 support, along with a range of new estimators, and comprehensive integration with MLflow—thanks to contributions from the Microsoft Fabric product team.

:fire: Heads-up: We have migrated [AutoGen](https://microsoft.github.io/autogen/) into a dedicated [github repository](https://github.com/microsoft/autogen). Alongside this move, we have also launched a dedicated [Discord](https://discord.gg/pAbnFJrkgZ) server and a [website](https://microsoft.github.io/autogen/) for comprehensive documentation.

:fire: The automated multi-agent chat framework in [AutoGen](https://microsoft.github.io/autogen/) is in preview from v2.0.0.

:fire: FLAML is highlighted in OpenAI's [cookbook](https://github.com/openai/openai-cookbook#related-resources-from-around-the-web).

:fire: [autogen](https://microsoft.github.io/autogen/) is released with support for ChatGPT and GPT-4, based on [Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference](https://arxiv.org/abs/2303.04673).
:fire: Heads-up: [AutoGen](https://microsoft.github.io/autogen/) has moved to a dedicated [GitHub repository](https://github.com/microsoft/autogen). FLAML no longer includes the `autogen` module—please use AutoGen directly.

## What is FLAML

FLAML is a lightweight Python library for efficient automation of machine
learning and AI operations. It automates workflow based on large language models, machine learning models, etc.
and optimizes their performance.

- FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weakness.
- FLAML enables economical automation and tuning for ML/AI workflows, including model selection and hyperparameter optimization under resource constraints.
- For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range.
- It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.

Expand All @@ -46,50 +40,17 @@ FLAML requires **Python version >= 3.9**. It can be installed from pip:
pip install flaml
```

Minimal dependencies are installed without extra options. You can install extra options based on the feature you need. For example, use the following to install the dependencies needed by the [`autogen`](https://microsoft.github.io/autogen/) package.
Minimal dependencies are installed without extra options. You can install extra options based on the feature you need. For example, use the following to install the dependencies needed by the [`automl`](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML) module.

```bash
pip install "flaml[autogen]"
pip install "flaml[automl]"
```

Find more options in [Installation](https://microsoft.github.io/FLAML/docs/Installation).
Each of the [`notebook examples`](https://github.com/microsoft/FLAML/tree/main/notebook) may require a specific option to be installed.

## Quickstart

- (New) The [autogen](https://microsoft.github.io/autogen/) package enables the next-gen GPT-X applications with a generic multi-agent conversation framework.
It offers customizable and conversable agents which integrate LLMs, tools and human.
By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code. For example,

```python
from flaml import autogen

assistant = autogen.AssistantAgent("assistant")
user_proxy = autogen.UserProxyAgent("user_proxy")
user_proxy.initiate_chat(
assistant,
message="Show me the YTD gain of 10 largest technology companies as of today.",
)
# This initiates an automated chat between the two agents to solve the task
```

Autogen also helps maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4. It offers a drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalites like tuning, caching, templating, filtering. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets.

```python
# perform tuning
config, analysis = autogen.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_func,
inference_budget=0.05,
optimization_budget=3,
num_samples=-1,
)
# perform inference for a test instance
response = autogen.Completion.create(context=test_instance, **config)
```

- With three lines of code, you can start using this economical and fast
AutoML engine as a [scikit-learn style estimator](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML).

Expand Down
18 changes: 18 additions & 0 deletions flaml/automl/automl.py
Expand Up @@ -401,6 +401,24 @@ def custom_metric(
self._estimator_type = "classifier" if settings["task"] in CLASSIFICATION else "regressor"
self.best_run_id = None

def __getstate__(self):
"""Customize pickling to avoid serializing runtime-only objects.

MLflow's sklearn flavor serializes estimators via (cloud)pickle. During
AutoML fitting we may attach an internal mlflow integration instance
which holds `concurrent.futures.Future` objects and executors containing
thread locks, which are not picklable.
"""

state = self.__dict__.copy()
state.pop("mlflow_integration", None)
return state

def __setstate__(self, state):
self.__dict__.update(state)
# Ensure attribute exists post-unpickle.
self.mlflow_integration = None

def get_params(self, deep: bool = False) -> dict:
return self._settings.copy()

Expand Down
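The `__getstate__`/`__setstate__` hunk above can be exercised in isolation. Below is a minimal, self-contained sketch of the same pattern — `TrainedModel`, its attributes, and the `threading.Lock` standing in for the mlflow integration object are illustrative, not FLAML's real types:

```python
import pickle
import threading


class TrainedModel:
    """Toy model holding one unpicklable runtime-only attribute."""

    def __init__(self):
        self.best_loss = 0.25
        # Stand-in for the mlflow integration instance: a thread lock
        # cannot be pickled, so a plain pickle.dumps() would fail.
        self.mlflow_integration = threading.Lock()

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("mlflow_integration", None)  # drop the unpicklable attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.mlflow_integration = None  # ensure the attribute exists post-unpickle


model = TrainedModel()
clone = pickle.loads(pickle.dumps(model))  # succeeds despite the lock
```

After the round trip, `clone.best_loss` survives and `clone.mlflow_integration` is reset to `None` rather than a deserialized lock — the same contract the docstring in the hunk describes.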
11 changes: 7 additions & 4 deletions flaml/automl/data.py
Expand Up @@ -50,7 +50,10 @@ def load_openml_dataset(dataset_id, data_dir=None, random_state=0, dataset_forma
"""
import pickle

import openml
try:
import openml
except ImportError:
openml = None
from sklearn.model_selection import train_test_split

filename = "openml_ds" + str(dataset_id) + ".pkl"
Expand All @@ -61,15 +64,15 @@ def load_openml_dataset(dataset_id, data_dir=None, random_state=0, dataset_forma
dataset = pickle.load(f)
else:
print("download dataset from openml")
dataset = openml.datasets.get_dataset(dataset_id)
dataset = openml.datasets.get_dataset(dataset_id) if openml else None
if not os.path.exists(data_dir):
os.makedirs(data_dir)
with open(filepath, "wb") as f:
pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
print("Dataset name:", dataset.name)
print("Dataset name:", dataset.name) if dataset else None
try:
X, y, *__ = dataset.get_data(target=dataset.default_target_attribute, dataset_format=dataset_format)
except ValueError:
except (ValueError, AttributeError, TypeError):
from sklearn.datasets import fetch_openml

X, y = fetch_openml(data_id=dataset_id, return_X_y=True)
Expand Down
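The optional-import guard added to `load_openml_dataset` is a reusable pattern: import if available, fall back to `None`, and branch on it later. A minimal sketch — the helper name `optional_import` and the module names in the usage lines are illustrative, not part of FLAML's API:

```python
def optional_import(module_name):
    """Return the named module if importable, else None."""
    try:
        return __import__(module_name)
    except ImportError:  # ModuleNotFoundError subclasses ImportError
        return None


json_mod = optional_import("json")            # stdlib module: importable
missing = optional_import("no_such_module")   # hypothetical name: not installed
```

Callers then guard each use (`dataset = mod.get(...) if mod else None`), exactly as the patched `data.py` does with `openml`.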
14 changes: 13 additions & 1 deletion flaml/automl/ml.py
Expand Up @@ -127,9 +127,21 @@ def metric_loss_score(
import datasets

datasets_metric_name = huggingface_submetric_to_metric.get(metric_name, metric_name.split(":")[0])
metric = datasets.load_metric(datasets_metric_name, trust_remote_code=True)
metric_mode = huggingface_metric_to_mode[datasets_metric_name]

# datasets>=3 removed load_metric; prefer evaluate if available
try:
import evaluate

metric = evaluate.load(datasets_metric_name, trust_remote_code=True)
except Exception:
if hasattr(datasets, "load_metric"):
metric = datasets.load_metric(datasets_metric_name, trust_remote_code=True)
else:
from datasets import load_metric as _load_metric # older datasets

metric = _load_metric(datasets_metric_name, trust_remote_code=True)

if metric_name.startswith("seqeval"):
y_processed_true = [[labels[tr] for tr in each_list] for each_list in y_processed_true]
elif metric in ("pearsonr", "spearmanr"):
Expand Down
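The `ml.py` hunk works around `datasets>=3` removing `load_metric` by preferring `evaluate.load` and falling back. A standalone sketch of the same dispatch, assuming the calls shown in the hunk (`evaluate.load(..., trust_remote_code=True)`, `datasets.load_metric(...)`); the smoke test at the bottom installs a stub `evaluate` module so the sketch runs even where neither library is present:

```python
import importlib
import sys
import types


def load_metric_compat(name):
    """Prefer `evaluate.load`; fall back to `datasets.load_metric` on older stacks."""
    try:
        evaluate = importlib.import_module("evaluate")
        return evaluate.load(name, trust_remote_code=True)
    except Exception:
        datasets = importlib.import_module("datasets")
        if hasattr(datasets, "load_metric"):
            return datasets.load_metric(name, trust_remote_code=True)
        raise


# Smoke test with a stub module, so the sketch is runnable without either library.
stub = types.ModuleType("evaluate")
stub.load = lambda name, trust_remote_code=False: ("loaded", name)
sys.modules["evaluate"] = stub
metric = load_metric_compat("accuracy")
```

With real libraries installed, the stub lines are unnecessary and the first branch returns an actual `evaluate` metric object.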
2 changes: 1 addition & 1 deletion flaml/automl/model.py
Expand Up @@ -111,7 +111,7 @@ def limit_resource(memory_limit, time_limit):
pass


class BaseEstimator:
class BaseEstimator(sklearn.base.ClassifierMixin, sklearn.base.BaseEstimator):
"""The abstract class for all learners.

Typical examples:
Expand Down
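The one-line `model.py` change exists because newer scikit-learn releases call `__sklearn_tags__` on estimators, which plain classes lack (matching the "Fix estimators has no attribute __sklearn_tags__" commit). A dependency-free sketch of the mechanism — `SklearnBaseLike` is a toy stand-in for `sklearn.base.BaseEstimator`, and the tags dict is illustrative (real scikit-learn returns a `Tags` object, not a dict):

```python
class SklearnBaseLike:
    """Stand-in for a base class that supplies the tags protocol."""

    def __sklearn_tags__(self):
        return {"estimator_type": "classifier"}


class OldEstimator:
    """Pre-fix shape: no tags protocol, so newer sklearn would raise AttributeError."""


class NewEstimator(SklearnBaseLike):
    """Post-fix shape: inherits the protocol from the base class."""


has_old = hasattr(OldEstimator(), "__sklearn_tags__")   # False
tags = NewEstimator().__sklearn_tags__()                # inherited, works
```

Inheriting from the real `sklearn.base.BaseEstimator` (as the diff does) supplies this and related protocol methods without FLAML reimplementing them.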
8 changes: 8 additions & 0 deletions flaml/automl/nlp/huggingface/training_args.py
Expand Up @@ -77,6 +77,14 @@ class TrainingArgumentsForAuto(TrainingArguments):

logging_steps: int = field(default=500, metadata={"help": "Log every X updates steps."})

# Newer versions of HuggingFace Transformers may access `TrainingArguments.generation_config`
# (e.g., in generation-aware trainers/callbacks). Keep this attribute to remain compatible
# while defaulting to None for non-generation tasks.
generation_config: Optional[object] = field(
default=None,
metadata={"help": "Optional generation config (or path) used by generation-aware trainers."},
)

@staticmethod
def load_args_from_console():
from dataclasses import fields
Expand Down
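The new `generation_config` field can be reproduced in a toy dataclass to confirm the default behavior; `ArgsSketch` below is illustrative and independent of transformers:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArgsSketch:
    """Toy version of TrainingArgumentsForAuto showing the compatibility field."""

    logging_steps: int = field(default=500, metadata={"help": "Log every X updates steps."})
    # Defaults to None so non-generation tasks are unaffected, while code that
    # probes `args.generation_config` finds the attribute instead of raising.
    generation_config: Optional[object] = field(
        default=None,
        metadata={"help": "Optional generation config used by generation-aware trainers."},
    )


args = ArgsSketch()
```

Because dataclass fields with defaults are plain attributes, `getattr(args, "generation_config")` succeeds everywhere the newer Transformers code paths expect it.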