Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion flaml/version.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = "2.4.0"
__version__ = "2.4.1"
4 changes: 2 additions & 2 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -116,14 +116,14 @@
"scikit-learn",
],
"hf": [
"transformers[torch]==4.26",
"transformers[torch]>=4.26",
"datasets",
"nltk<=3.8.1",
"rouge_score",
"seqeval",
],
"nlp": [ # for backward compatibility; hf is the new option name
"transformers[torch]==4.26",
"transformers[torch]>=4.26",
"datasets",
"nltk<=3.8.1",
"rouge_score",
Expand Down
132 changes: 132 additions & 0 deletions website/docs/Best-Practices.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
````markdown
# Best Practices

This page collects practical guidance for using FLAML effectively across common tasks.

## General tips

- Start simple: set `task`, `time_budget`, and keep `metric="auto"` unless you have a strong reason to override.
- Prefer correct splits: ensure your evaluation strategy matches your data (time series vs i.i.d., grouped data, etc.).
- Keep estimator lists explicit when debugging: start with a small `estimator_list` and expand.
- Use built-in discovery helpers to avoid stale hardcoded lists:

```python
from flaml import AutoML
from flaml.automl.task.factory import task_factory

automl = AutoML()
print("Built-in sklearn metrics:", sorted(automl.supported_metrics[0]))
print("classification estimators:", sorted(task_factory("classification").estimators.keys()))
```

## Classification

- **Metric**: for binary classification, `metric="roc_auc"` is common; for multiclass, `metric="log_loss"` is often robust.
- **Imbalanced data**:
- pass `sample_weight` to `AutoML.fit()`;
- consider setting class weights via `custom_hp` / `fit_kwargs_by_estimator` for specific estimators (see [FAQ](FAQ)).
- **Probability vs label metrics**: use `roc_auc` / `log_loss` when you care about calibrated probabilities.

## Regression

- **Default metric**: `metric="r2"` (minimizes `1 - r2`).
- If your target scale matters (e.g., dollar error), consider `mae`/`rmse`.

## Learning to rank

- Use `task="rank"` with group information (`groups` / `groups_val`) so metrics like `ndcg` and `ndcg@k` are meaningful.
- If you pass `metric="ndcg@10"`, also pass `groups` so FLAML can compute group-aware NDCG.

## Time series forecasting

- Use time-aware splitting. For holdout validation, set `eval_method="holdout"` and use a time-ordered dataset.
- Prefer supplying a DataFrame with a clear time column when possible.
- Optional time-series estimators depend on optional dependencies. To list what is available in your environment:

```python
from flaml.automl.task.factory import task_factory

print("forecast:", sorted(task_factory("forecast").estimators.keys()))
```

## NLP (Transformers)

- Install the optional dependency: `pip install "flaml[hf]"`.
- When you provide a custom metric, ensure it returns `(metric_to_minimize, metrics_to_log)` with stable keys.

## Speed, stability, and tricky settings

- **Time budget vs convergence**: if you see warnings about not all estimators converging, increase `time_budget` or reduce `estimator_list`.
- **Memory pressure / OOM**:
- set `free_mem_ratio` (e.g., `0.2`) to keep free memory above a threshold;
- set `model_history=False` to reduce stored artifacts;
- **Reproducibility**: set `seed` and keep `n_jobs` fixed; expect some runtime variance.

## Persisting models

FLAML supports **both** MLflow logging and pickle-based persistence. For production deployment, MLflow logging is typically the most important option because it plugs into the MLflow ecosystem (tracking, model registry, serving, governance). For quick local reuse, persisting the whole `AutoML` object via pickle is often the most convenient.

### Option 1: MLflow logging (recommended for production)

When you run `AutoML.fit()` inside an MLflow run, FLAML can log metrics/params automatically (disable via `mlflow_logging=False` if needed). To persist the trained `AutoML` object as a model artifact and reuse MLflow tooling end-to-end:

```python
import mlflow
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from flaml import AutoML


X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)

automl = AutoML()
mlflow.set_experiment("flaml")
with mlflow.start_run(run_name="flaml_run") as run:
automl.fit(X_train, y_train, task="classification", time_budget=3, retrain_full=False, eval_method="holdout")

run_id = run.info.run_id

# Later (or in a different process)
automl2 = mlflow.sklearn.load_model(f"runs:/{run_id}/model")
assert np.array_equal(automl2.predict(X_test), automl.predict(X_test))
```

### Option 2: Pickle the full `AutoML` instance (convenient / Fabric)

Pickling stores the *entire* `AutoML` instance (not just the best estimator). This is useful when you prefer not to rely on MLflow or when you want to reuse additional attributes of the AutoML object without retraining.

In Microsoft Fabric scenarios, this is particularly important for re-plotting visualization figures without requiring model retraining.

```python
import mlflow
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from flaml import AutoML


X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)

automl = AutoML()
mlflow.set_experiment("flaml")
with mlflow.start_run(run_name="flaml_run") as run:
automl.fit(X_train, y_train, task="classification", time_budget=3, retrain_full=False, eval_method="holdout")

automl.pickle("automl.pkl")
automl2 = AutoML.load_pickle("automl.pkl")
assert np.array_equal(automl2.predict(X_test), automl.predict(X_test))
assert automl.best_config == automl2.best_config
assert automl.best_loss == automl2.best_loss
assert automl.mlflow_integration.infos == automl2.mlflow_integration.infos
```

See also: [Task-Oriented AutoML](Use-Cases/Task-Oriented-AutoML) and [FAQ](FAQ).

````
4 changes: 2 additions & 2 deletions website/docs/Contribute.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,10 +62,10 @@ There is currently no formal reviewer solicitation process. Current reviewers id

```bash
git clone https://github.com/microsoft/FLAML.git
pip install -e FLAML[notebook,autogen]
pip install -e ".[notebook]"
```

In case the `pip install` command fails, try escaping the brackets such as `pip install -e FLAML\[notebook,autogen\]`.
In case the `pip install` command fails, try escaping the brackets such as `pip install -e .\[notebook\]`.

### Docker

Expand Down
54 changes: 54 additions & 0 deletions website/docs/FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -115,3 +115,57 @@ from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(**best_params)
model.fit(X, y)
```

### How to save and load an AutoML object? (`pickle` / `load_pickle`)

FLAML provides `AutoML.pickle()` / `AutoML.load_pickle()` as a convenient and robust way to persist an AutoML run.

```python
from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)

# Save
automl.pickle("automl.pkl")

# Load
automl_loaded = AutoML.load_pickle("automl.pkl")
pred = automl_loaded.predict(X_test)
```

Notes:

- If you used Spark estimators, `AutoML.pickle()` externalizes Spark ML models into an adjacent artifact folder and keeps
the pickle itself lightweight.
- If you want to skip re-loading externalized Spark models (e.g., in an environment without Spark), use:

```python
automl_loaded = AutoML.load_pickle("automl.pkl", load_spark_models=False)
```

### How to list all available estimators for a task?

The available estimator set is task-dependent and can vary with optional dependencies. You can list the estimator keys
that FLAML currently has registered in your environment:

```python
from flaml.automl.task.factory import task_factory

print(sorted(task_factory("classification").estimators.keys()))
print(sorted(task_factory("regression").estimators.keys()))
print(sorted(task_factory("forecast").estimators.keys()))
print(sorted(task_factory("rank").estimators.keys()))
```

### How to list supported built-in metrics?

```python
from flaml import AutoML

automl = AutoML()
sklearn_metrics, hf_metrics, spark_metrics = automl.supported_metrics
print(sorted(sklearn_metrics))
print(sorted(hf_metrics))
print(spark_metrics)
```
43 changes: 4 additions & 39 deletions website/docs/Getting-Started.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,53 +8,17 @@ and optimizes their performance.

### Main Features

- FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weakness.
- For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend.
- It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.

FLAML is powered by a series of [research studies](/docs/Research) from Microsoft Research and collaborators such as Penn State University, Stevens Institute of Technology, University of Washington, and University of Waterloo.

### Quickstart

Install FLAML from pip: `pip install flaml`. Find more options in [Installation](/docs/Installation).
Install FLAML from pip: `pip install flaml` (**requires Python >= 3.10**). Find more options in [Installation](/docs/Installation).

There are several ways of using flaml:

#### (New) [AutoGen](https://microsoft.github.io/autogen/)

Autogen enables the next-gen GPT-X applications with a generic multi-agent conversation framework.
It offers customizable and conversable agents which integrate LLMs, tools and human.
By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code. For example,

```python
from flaml import autogen

assistant = autogen.AssistantAgent("assistant")
user_proxy = autogen.UserProxyAgent("user_proxy")
user_proxy.initiate_chat(
assistant,
message="Show me the YTD gain of 10 largest technology companies as of today.",
)
# This initiates an automated chat between the two agents to solve the task
```

Autogen also helps maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4. It offers a drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalites like tuning, caching, error handling, templating. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets.

```python
# perform tuning
config, analysis = autogen.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_func,
inference_budget=0.05,
optimization_budget=3,
num_samples=-1,
)
# perform inference for a test instance
response = autogen.Completion.create(context=test_instance, **config)
```

#### [Task-oriented AutoML](/docs/Use-Cases/task-oriented-automl)

With three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
Expand Down Expand Up @@ -140,9 +104,10 @@ Then, you can use it just like you use the original `LGMBClassifier`. Your other

### Where to Go Next?

- Understand the use cases for [AutoGen](https://microsoft.github.io/autogen/), [Task-oriented AutoML](/docs/Use-Cases/Task-Oriented-Automl), [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML).
- Find code examples under "Examples": from [AutoGen - AgentChat](/docs/Examples/AutoGen-AgentChat) to [Tune - PyTorch](/docs/Examples/Tune-PyTorch).
- Understand the use cases for [Task-oriented AutoML](/docs/Use-Cases/Task-Oriented-Automl), [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML).
- Find code examples under "Examples": from [AutoML - Classification](/docs/Examples/AutoML-Classification) to [Tune - PyTorch](/docs/Examples/Tune-PyTorch).
- Learn about [research](/docs/Research) around FLAML and check [blogposts](/blog).
- Apply practical guidance in [Best Practices](/docs/Best-Practices).
- Chat on [Discord](https://discord.gg/Cppx2vSPVP).

If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](/docs/Contribute).
Expand Down
8 changes: 1 addition & 7 deletions website/docs/Installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

## Python

FLAML requires **Python version >= 3.7**. It can be installed from pip:
FLAML requires **Python version >= 3.10**. It can be installed from pip:

```bash
pip install flaml
Expand All @@ -16,12 +16,6 @@ conda install flaml -c conda-forge

### Optional Dependencies

#### [Autogen](Use-Cases/Autogen)

```bash
pip install "flaml[autogen]"
```

#### [Task-oriented AutoML](Use-Cases/Task-Oriented-AutoML)

```bash
Expand Down
Loading
Loading