Commit 22dcfcd

Copilot and thinkall authored
Add comprehensive metric documentation and URL reference to AutoML docstrings (#1471)
* Initial plan
* Update AutoML metric documentation with full list and documentation link
* Apply black and mdformat formatting to code and documentation
* Apply pre-commit formatting fixes

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
1 parent d7208b3 commit 22dcfcd

2 files changed (39 additions, 0 deletions)

flaml/automl/automl.py (4 additions, 0 deletions)

@@ -118,6 +118,8 @@ def __init__(self, **settings):
     e.g., 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_weighted',
     'roc_auc_ovo_weighted', 'roc_auc_ovr_weighted', 'f1', 'micro_f1', 'macro_f1',
     'log_loss', 'mae', 'mse', 'r2', 'mape'. Default is 'auto'.
+    For a full list of supported built-in metrics, please refer to
+    https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric
     If passing a customized metric function, the function needs to
     have the following input arguments:

@@ -1765,6 +1767,8 @@ def fit(
     e.g., 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_weighted',
     'roc_auc_ovo_weighted', 'roc_auc_ovr_weighted', 'f1', 'micro_f1', 'macro_f1',
     'log_loss', 'mae', 'mse', 'r2', 'mape'. Default is 'auto'.
+    For a full list of supported built-in metrics, please refer to
+    https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric
     If passing a customized metric function, the function needs to
     have the following input arguments:
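The docstring above mixes "higher is better" scores ('accuracy', 'roc_auc', 'f1', 'r2') with error metrics ('mae', 'mse', 'log_loss', 'mape'). Since FLAML minimizes its objective, score-style metrics are optimized as 1 - score. The helper below is purely an illustrative sketch of that convention, not part of FLAML's API; the names `SCORE_STYLE` and `objective_value` are invented for this example.

```python
# Illustrative sketch only: NOT part of FLAML's API.
# FLAML minimizes its objective, so score-style metrics from the
# docstring above are optimized as 1 - score, while error metrics
# are minimized directly.
SCORE_STYLE = {"accuracy", "roc_auc", "f1", "micro_f1", "macro_f1", "r2"}


def objective_value(metric_name: str, value: float) -> float:
    """Return the quantity that would be minimized for a given metric value."""
    if metric_name in SCORE_STYLE:
        return 1.0 - value  # invert scores so that lower is better
    return value  # error metrics (mae, mse, log_loss, mape) are already losses


print(round(objective_value("accuracy", 0.95), 2))  # 0.05
print(objective_value("mse", 0.3))  # 0.3
```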

website/docs/Use-Cases/Task-Oriented-AutoML.md (35 additions, 0 deletions)
@@ -51,6 +51,7 @@ If users provide the minimal inputs only, `AutoML` uses the default settings for
 The optimization metric is specified via the `metric` argument. It can be either a string which refers to a built-in metric, or a user-defined function.

 - Built-in metric.
+
   - 'accuracy': 1 - accuracy as the corresponding metric to minimize.
   - 'log_loss': default metric for multiclass classification.
   - 'r2': 1 - r2_score as the corresponding metric to minimize. Default metric for regression.
@@ -70,6 +71,40 @@ The optimization metric is specified via the `metric` argument. It can be either
   - 'ap': minimize 1 - average_precision_score.
   - 'ndcg': minimize 1 - ndcg_score.
   - 'ndcg@k': minimize 1 - ndcg_score@k. k is an integer.
+  - 'pr_auc': minimize 1 - precision-recall AUC score. (Spark-specific)
+  - 'var': minimize variance. (Spark-specific)
+
+- Built-in HuggingFace metrics (for NLP tasks).
+
+  - 'accuracy': minimize 1 - accuracy.
+  - 'bertscore': minimize 1 - BERTScore.
+  - 'bleu': minimize 1 - BLEU score.
+  - 'bleurt': minimize 1 - BLEURT score.
+  - 'cer': minimize character error rate.
+  - 'chrf': minimize ChrF score.
+  - 'code_eval': minimize 1 - code evaluation score.
+  - 'comet': minimize 1 - COMET score.
+  - 'competition_math': minimize 1 - competition math score.
+  - 'coval': minimize 1 - CoVal score.
+  - 'cuad': minimize 1 - CUAD score.
+  - 'f1': minimize 1 - F1 score.
+  - 'gleu': minimize 1 - GLEU score.
+  - 'google_bleu': minimize 1 - Google BLEU score.
+  - 'matthews_correlation': minimize 1 - Matthews correlation coefficient.
+  - 'meteor': minimize 1 - METEOR score.
+  - 'pearsonr': minimize 1 - Pearson correlation coefficient.
+  - 'precision': minimize 1 - precision.
+  - 'recall': minimize 1 - recall.
+  - 'rouge': minimize 1 - ROUGE score.
+  - 'rouge1': minimize 1 - ROUGE-1 score.
+  - 'rouge2': minimize 1 - ROUGE-2 score.
+  - 'sacrebleu': minimize 1 - SacreBLEU score.
+  - 'sari': minimize 1 - SARI score.
+  - 'seqeval': minimize 1 - SeqEval score.
+  - 'spearmanr': minimize 1 - Spearman correlation coefficient.
+  - 'ter': minimize translation error rate.
+  - 'wer': minimize word error rate.
+
 - User-defined function.
   A customized metric function that requires the following (input) signature, and returns the input config's value in terms of the metric you want to minimize, and a dictionary of auxiliary information at your choice:
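The user-defined function option above can be sketched as follows. This is a minimal, self-contained example assuming the signature documented on the linked FLAML page (validation data, the fitted estimator, labels, training data, and optional sample weights); `_StubEstimator` is a stand-in invented here only so the function can be exercised without FLAML installed.

```python
# Minimal sketch of a user-defined metric, assuming the signature from
# FLAML's documentation. It returns (value_to_minimize, dict_to_log).
def custom_metric(X_val, y_val, estimator, labels, X_train, y_train,
                  weight_val=None, weight_train=None, *args):
    y_pred = estimator.predict(X_val)
    accuracy = sum(p == y for p, y in zip(y_pred, y_val)) / len(y_val)
    # FLAML minimizes the first return value, so return 1 - accuracy;
    # the dict holds auxiliary information to log.
    return 1 - accuracy, {"val_accuracy": accuracy}


# Stand-in estimator (hypothetical, for demonstration only): always predicts 1.
class _StubEstimator:
    def predict(self, X):
        return [1] * len(X)


loss, info = custom_metric(
    X_val=[[0], [1], [2], [3]], y_val=[1, 1, 0, 1],
    estimator=_StubEstimator(), labels=[0, 1],
    X_train=[], y_train=[],
)
print(loss, info)  # 0.25 {'val_accuracy': 0.75}
```

In FLAML, such a function would be passed via the same `metric` argument, e.g. `automl.fit(X_train, y_train, metric=custom_metric, ...)`.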
