Commit 1c9835d

Add support to Python 3.12, Sync Fabric till dc382961 (#1467)
* Merged PR 1686010: Bump version to 2.3.5.post2, Distribute source and wheel, Fix license-file, Only log better models

  - Fix license-file
  - Bump version to 2.3.5.post2
  - Distribute source and wheel
  - Log better models only
  - Add artifact_path to register_automl_pipeline
  - Improve logging of _automl_user_configurations

  ----
  This pull request fixes the project’s configuration by updating the license metadata for compliance with FLAML OSS 2.3.5.

  The changes in `/pyproject.toml` update the project’s license and readme metadata by replacing deprecated keys with the new structured fields.
  - `/pyproject.toml`: Replaced `license_file` with `license = { text = "MIT" }`.
  - `/pyproject.toml`: Replaced `description-file` with `readme = "README.md"`.

  Related work items: #4252053

* Merged PR 1688479: Handle feature_importances_ is None, Catch RuntimeError and wait for spark cluster to recover

  - Add warning message when feature_importances_ is None (#3982120)
  - Catch RuntimeError and wait for spark cluster to recover (#3982133)

  ----
  Bug fix.

  This pull request prevents an AttributeError in the feature importance plotting function by adding a check for a `None` value with an informative warning message.
  - `flaml/fabric/visualization.py`: Checks if `result.feature_importances_` is `None`, logs a warning with possible reasons, and returns early.
  - `flaml/fabric/visualization.py`: Imports `logger` from `flaml.automl.logger` to support the warning message.

  Related work items: #3982120, #3982133

* Removed deprecated metadata section
* Fix log_params, log_artifact doesn't support run_id in mlflow 2.6.0
* Remove autogen
* Remove autogen
* Remove autogen
* Merged PR 1776547: Fix flaky test test_automl

  Don't throw error when time budget is not enough

  ----
  #### AI description (iteration 1)
  #### PR Classification
  Bug fix addressing a failing test in the AutoML notebook example.
  #### PR Summary
  This PR fixes a flaky test by adding a conditional check in the AutoML test that prints a message and exits early if no best estimator is set, thereby preventing unpredictable test failures.
  - `test/automl/test_notebook_example.py`: Introduced a check to print "Training budget is not sufficient" and return if `automl.best_estimator` is not found.

  Related work items: #4573514

* Merged PR 1777952: Fix unrecognized or malformed field 'license-file' when uploading wheel to feed

  Try to fix InvalidDistribution: Invalid distribution metadata: unrecognized or malformed field 'license-file'

  ----
  Bug fix addressing package metadata configuration.

  This pull request fixes the error with unrecognized or malformed license file fields during wheel uploads by updating the setup configuration.
  - In `setup.py`, added `license="MIT"` and `license_files=["LICENSE"]` to provide proper license metadata.

  Related work items: #4560034

* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0
* Cherry-pick Merged PR 1890869: Improve time_budget estimation for mlflow logging
* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0
* Disable openai workflow
* Add python 3.12 to test envs
* Manually trigger openai
* Support markdown files with underscore-prefixed file names
* Improve save dependencies
* SynapseML is not installed
* Fix syntax error:Module !flaml/autogen was never imported
* macos 3.12 also hangs
* fix syntax error
* Update python version in actions
* Install setuptools for using pkg_resources
* Fix test_automl_performance in Github actions
* Fix test_nested_run
1 parent 1285700 commit 1c9835d

33 files changed

Lines changed: 426 additions & 135 deletions

.coveragerc

Lines changed: 4 additions & 2 deletions
@@ -1,5 +1,7 @@
 [run]
 branch = True
-source = flaml
+source =
+    flaml
 omit =
-    *test*
+    */test/*
+    */flaml/autogen/*

.github/workflows/CD.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ jobs:
     strategy:
       matrix:
         os: ["ubuntu-latest"]
-        python-version: ["3.10"]
+        python-version: ["3.12"]
     runs-on: ${{ matrix.os }}
     environment: package
     steps:

.github/workflows/deploy-website.yml

Lines changed: 4 additions & 4 deletions
@@ -37,11 +37,11 @@ jobs:
       - name: setup python
         uses: actions/setup-python@v4
         with:
-          python-version: "3.10"
+          python-version: "3.12"
       - name: pydoc-markdown install
         run: |
           python -m pip install --upgrade pip
-          pip install pydoc-markdown==4.7.0
+          pip install pydoc-markdown==4.7.0 setuptools
       - name: pydoc-markdown run
         run: |
           pydoc-markdown
@@ -73,11 +73,11 @@ jobs:
       - name: setup python
         uses: actions/setup-python@v4
         with:
-          python-version: "3.10"
+          python-version: "3.12"
       - name: pydoc-markdown install
         run: |
           python -m pip install --upgrade pip
-          pip install pydoc-markdown==4.7.0
+          pip install pydoc-markdown==4.7.0 setuptools
       - name: pydoc-markdown run
         run: |
           pydoc-markdown

.github/workflows/openai.yml

Lines changed: 9 additions & 8 deletions
@@ -4,14 +4,15 @@
 name: OpenAI

 on:
-  pull_request:
-    branches: ['main']
-    paths:
-      - 'flaml/autogen/**'
-      - 'test/autogen/**'
-      - 'notebook/autogen_openai_completion.ipynb'
-      - 'notebook/autogen_chatgpt_gpt4.ipynb'
-      - '.github/workflows/openai.yml'
+  workflow_dispatch:
+  # pull_request:
+  # branches: ['main']
+  # paths:
+  # - 'flaml/autogen/**'
+  # - 'test/autogen/**'
+  # - 'notebook/autogen_openai_completion.ipynb'
+  # - 'notebook/autogen_chatgpt_gpt4.ipynb'
+  # - '.github/workflows/openai.yml'

 permissions: {}

.github/workflows/python-package.yml

Lines changed: 9 additions & 15 deletions
@@ -40,10 +40,12 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, macos-latest, windows-latest]
-        python-version: ["3.10", "3.11"]
+        python-version: ["3.10", "3.11", "3.12"]
         exclude:
           - os: macos-latest
-            python-version: "3.10"
+            python-version: "3.10" # macOS runners will hang on python 3.10 for unknown reasons
+          - os: macos-latest
+            python-version: "3.12" # macOS runners will hang on python 3.12 for unknown reasons
     steps:
       - uses: actions/checkout@v4
       - name: Set up Python ${{ matrix.python-version }}
@@ -67,11 +69,6 @@ jobs:
           pip install -e .
           python -c "import flaml"
           pip install -e .[test]
-      - name: On Ubuntu python 3.10, install pyspark 3.4.1
-        if: matrix.python-version == '3.10' && matrix.os == 'ubuntu-latest'
-        run: |
-          pip install pyspark==3.4.1
-          pip list | grep "pyspark"
       - name: On Ubuntu python 3.11, install pyspark 3.5.1
         if: matrix.python-version == '3.11' && matrix.os == 'ubuntu-latest'
         run: |
@@ -106,17 +103,17 @@ jobs:
         run: |
           pip cache purge
       - name: Test with pytest
-        if: matrix.python-version != '3.10'
+        if: matrix.python-version != '3.11'
         run: |
           pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10
       - name: Coverage
-        if: matrix.python-version == '3.10'
+        if: matrix.python-version == '3.11'
         run: |
           pip install coverage
           coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
           coverage xml
       - name: Upload coverage to Codecov
-        if: matrix.python-version == '3.10'
+        if: matrix.python-version == '3.11'
         uses: codecov/codecov-action@v3
         with:
           file: ./coverage.xml
@@ -130,15 +127,12 @@ jobs:

           BRANCH=unit-tests-installed-dependencies
           git fetch origin
-          git checkout -B "$BRANCH"
-          if git show-ref --verify --quiet "refs/remotes/origin/$BRANCH"; then
-            git rebase "origin/$BRANCH"
-          fi
+          git checkout -B "$BRANCH" "origin/$BRANCH"

           pip freeze > installed_all_dependencies_${{ matrix.python-version }}_${{ matrix.os }}.txt
           python test/check_dependency.py > installed_first_tier_dependencies_${{ matrix.python-version }}_${{ matrix.os }}.txt
           git add installed_*dependencies*.txt
           mv coverage.xml ./coverage_${{ matrix.python-version }}_${{ matrix.os }}.xml || true
           git add -f ./coverage_${{ matrix.python-version }}_${{ matrix.os }}.xml || true
           git commit -m "Update installed dependencies for Python ${{ matrix.python-version }} on ${{ matrix.os }}" || exit 0
-          git push origin "$BRANCH"
+          git push origin "$BRANCH" --force

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ coverage.xml
 .hypothesis/
 .pytest_cache/
 cover/
+junit

 # Translations
 *.mo

flaml/autogen/__init__.py

Lines changed: 9 additions & 0 deletions
@@ -1,3 +1,12 @@
+import warnings
+
 from .agentchat import *
 from .code_utils import DEFAULT_MODEL, FAST_MODEL
 from .oai import *
+
+warnings.warn(
+    "The `flaml.autogen` module is deprecated and will be removed in a future release. "
+    "Please refer to `https://github.com/microsoft/autogen` for latest usage.",
+    DeprecationWarning,
+    stacklevel=2,
+)
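
Note on the change above: the warning is emitted at import time with `stacklevel=2`. The following is a minimal sketch of how such a warning can be observed in a test. It uses a stand-in function rather than importing `flaml.autogen`; the names `load_deprecated_module` and `capture_deprecation_messages` are hypothetical and not part of the commit.

```python
import warnings


def load_deprecated_module() -> None:
    """Hypothetical stand-in for the import-time code in flaml/autogen/__init__.py."""
    warnings.warn(
        "The `flaml.autogen` module is deprecated and will be removed in a future release.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's frame, not this function
    )


def capture_deprecation_messages() -> list[str]:
    """Run the stand-in and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Without this filter, DeprecationWarning is usually suppressed
        # unless it is triggered directly by code in __main__.
        warnings.simplefilter("always")
        load_deprecated_module()
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]
```

Calling `capture_deprecation_messages()` returns a list with the single deprecation message, which a test can assert on.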

flaml/automl/automl.py

Lines changed: 13 additions & 7 deletions
@@ -4,6 +4,7 @@
 # * project root for license information.
 from __future__ import annotations

+import inspect
 import json
 import logging
 import os
@@ -177,10 +178,11 @@ def custom_metric(
                 ['auto', 'cv', 'holdout'].
             split_ratio: A float of the valiation data percentage for holdout.
             n_splits: An integer of the number of folds for cross - validation.
-            log_type: A string of the log type, one of
-                ['better', 'all'].
-                'better' only logs configs with better loss than previos iters
-                'all' logs all the tried configs.
+            log_type: Specifies which logs to save. One of ['better', 'all']. Default is 'better'.
+                - 'better': Logs configs and models (if `model_history` is True) only when the loss improves,
+                to `log_file_name` and MLflow, respectively.
+                - 'all': Logs all configs and models (if `model_history` is True), regardless of performance.
+                Note: Configs are always logged to MLflow if MLflow logging is enabled.
             model_history: A boolean of whether to keep the best
                 model per estimator. Make sure memory is large enough if setting to True. Default False.
             log_training_metric: A boolean of whether to log the training
@@ -2174,7 +2176,7 @@ def _search_parallel(self):
                 use_spark=True,
                 force_cancel=self._force_cancel,
                 mlflow_exp_name=self._mlflow_exp_name,
-                automl_info=(mlflow_log_latency,), # pass automl info to tune.run
+                automl_info=(mlflow_log_latency, self._log_type), # pass automl info to tune.run
                 extra_tag=self.autolog_extra_tag,
                 # raise_on_failed_trial=False,
                 # keep_checkpoints_num=1,
@@ -2237,7 +2239,9 @@ def _search_parallel(self):
                 if better or self._log_type == "all":
                     self._log_trial(search_state, estimator)
                 if self.mlflow_integration:
-                    self.mlflow_integration.record_state(self, search_state, estimator)
+                    self.mlflow_integration.record_state(
+                        self, search_state, estimator, better or self._log_type == "all"
+                    )

     def _log_trial(self, search_state, estimator):
         if self._training_log:
@@ -2479,7 +2483,9 @@ def _search_sequential(self):
                     if better or self._log_type == "all":
                         self._log_trial(search_state, estimator)
                     if self.mlflow_integration:
-                        self.mlflow_integration.record_state(self, search_state, estimator)
+                        self.mlflow_integration.record_state(
+                            self, search_state, estimator, better or self._log_type == "all"
+                        )

                     logger.info(
                         " at {:.1f}s,\testimator {}'s best error={:.4f},\tbest estimator {}'s best error={:.4f}".format(

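Note on the `record_state` change above: both search loops now pass the value of `better or self._log_type == "all"` as an extra argument. A minimal sketch of that expression as a standalone function follows. The function name `should_log_model` and the input validation are assumptions for illustration; how `record_state` uses the flag is not shown in this diff.

```python
def should_log_model(better: bool, log_type: str) -> bool:
    """Mirror the expression `better or self._log_type == "all"` from the diff.

    The ValueError check is an addition for this sketch, not part of the commit.
    """
    if log_type not in ("better", "all"):
        raise ValueError(f"log_type must be 'better' or 'all', got {log_type!r}")
    # 'all' always logs; 'better' logs only when the trial improved the loss.
    return better or log_type == "all"
```

With `log_type="better"` the flag is true only for improving trials; with `log_type="all"` it is true for every trial, which matches the updated docstring.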
flaml/automl/data.py

Lines changed: 14 additions & 1 deletion
@@ -5,6 +5,7 @@
 import json
 import os
 import random
+import re
 import uuid
 from datetime import datetime, timedelta
 from decimal import ROUND_HALF_UP, Decimal
@@ -708,6 +709,14 @@ def auto_convert_dtypes_pandas(
     """
     if na_values is None:
         na_values = {"NA", "na", "NULL", "null", ""}
+    # Remove the empty string separately (handled by the regex `^\s*$`)
+    vals = [re.escape(v) for v in na_values if v != ""]
+    # Build inner alternation group
+    inner = "|".join(vals) if vals else ""
+    if inner:
+        pattern = re.compile(rf"^\s*(?:{inner})?\s*$")
+    else:
+        pattern = re.compile(r"^\s*$")

     df_converted = df.convert_dtypes()
     schema = {}
@@ -721,7 +730,11 @@ def auto_convert_dtypes_pandas(
     for col in df.columns:
         series = df[col]
         # Replace NA-like values if string
-        series_cleaned = series.map(lambda x: np.nan if isinstance(x, str) and x.strip() in na_values else x)
+        if series.dtype == object:
+            mask = series.astype(str).str.match(pattern)
+            series_cleaned = series.where(~mask, np.nan)
+        else:
+            series_cleaned = series

         # Skip conversion if already non-object data type, except bool which can potentially be categorical
         if (
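
Note on the pattern built above: the per-element `strip()` lookup is replaced by one compiled regex applied to the whole column. The sketch below reproduces only the pattern construction with the standard library, so its matching behavior can be checked without pandas. The helper name `build_na_pattern` is hypothetical, and `sorted()` is added here only to make the alternation order deterministic.

```python
import re


def build_na_pattern(na_values: set[str]) -> re.Pattern[str]:
    """Build a regex matching NA-like tokens with optional surrounding whitespace."""
    # The empty string is excluded here; the optional group below covers it.
    vals = [re.escape(v) for v in sorted(na_values) if v != ""]
    inner = "|".join(vals)
    if inner:
        return re.compile(rf"^\s*(?:{inner})?\s*$")
    return re.compile(r"^\s*$")


pattern = build_na_pattern({"NA", "na", "NULL", "null", ""})
for text in ["NA", "  null ", "", "   ", "NAN", "value"]:
    print(repr(text), bool(pattern.match(text)))
```

One consequence visible in the pattern itself: because the token group is optional, empty and whitespace-only strings match even when `""` is not listed in `na_values`.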

flaml/automl/model.py

Lines changed: 5 additions & 2 deletions
@@ -2347,8 +2347,11 @@ def config2params(self, config: dict) -> dict:
         params = super().config2params(config)
         params["tol"] = params.get("tol", 0.0001)
         params["loss"] = params.get("loss", None)
-        if params["loss"] is None and self._task.is_classification():
-            params["loss"] = "log_loss" if SKLEARN_VERSION >= "1.1" else "log"
+        if params["loss"] is None:
+            if self._task.is_classification():
+                params["loss"] = "log_loss" if SKLEARN_VERSION >= "1.1" else "log"
+            else:
+                params["loss"] = "squared_error"
         if not self._task.is_classification() and "n_jobs" in params:
             params.pop("n_jobs")
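
Note on the change above: before this commit, a missing `loss` stayed `None` for non-classification tasks; it now falls back to `"squared_error"`. A minimal sketch of the new defaulting logic as a pure function follows. The name `default_loss` and its parameters are hypothetical, and the version check is the same plain string comparison used in the diff.

```python
from typing import Optional


def default_loss(loss: Optional[str], is_classification: bool, sklearn_version: str) -> str:
    """Return the loss after applying the defaulting rules shown in the diff."""
    if loss is not None:
        return loss  # an explicitly configured loss is kept as-is
    if is_classification:
        # Lexicographic string comparison, as in the original code.
        return "log_loss" if sklearn_version >= "1.1" else "log"
    return "squared_error"
```

For example, `default_loss(None, False, "1.3.0")` yields `"squared_error"`, while `default_loss("hinge", True, "1.3.0")` leaves the configured value untouched.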
