Skip to content

Support float and bool targets in khiops classifiers#354

Merged
nairbenrekia merged 1 commit intodev-v10from
support-float-and-bool-targets-in-khiops-classifiers
Mar 24, 2025
Merged

Support float and bool targets in khiops classifiers#354
nairbenrekia merged 1 commit intodev-v10from
support-float-and-bool-targets-in-khiops-classifiers

Conversation

@nairbenrekia
Copy link
Copy Markdown
Collaborator

@nairbenrekia nairbenrekia commented Feb 7, 2025

closes #74 , #128


TODO Before Asking for a Review

  • Rebase your branch to the latest version of dev (or main for release PRs)
  • Make sure all CI workflows are green
  • When adding a public feature/fix: Update the Unreleased section of CHANGELOG.md (no date)
  • Self-Review: Review "Files Changed" tab and fix any problems you find
  • API Docs (only if there are changes in docstrings, rst files or samples):
    • Check the docs build without warning: see the log of the API Docs workflow
    • Check that your changes render well in HTML: download the API Docs artifact and open index.html
    • If there are any problems it is faster to iterate by building locally the API Docs

@nairbenrekia nairbenrekia force-pushed the support-float-and-bool-targets-in-khiops-classifiers branch from ee7a636 to 93c9ff3 Compare February 7, 2025 19:10
@nairbenrekia nairbenrekia force-pushed the support-float-and-bool-targets-in-khiops-classifiers branch 2 times, most recently from c5a5644 to 3d56a94 Compare February 11, 2025 12:16
@popescu-v popescu-v linked an issue Feb 11, 2025 that may be closed by this pull request
@popescu-v
Copy link
Copy Markdown
Collaborator

Rebase PR branch on dev-v10.
Change base branch of the PR to dev-v10.

Copy link
Copy Markdown
Collaborator

@popescu-v popescu-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comments.
Update docstring of KhiopsClassifier.predict with the mention of the bool and float target types: https://github.com/KhiopsML/khiops-python/pull/354/files#diff-802caefa84456ad964818a51e2c00f1decf6553480a063759a060c229fb16166R2308.

@nairbenrekia nairbenrekia changed the base branch from dev to dev-v10 February 11, 2025 17:26
@nairbenrekia nairbenrekia force-pushed the support-float-and-bool-targets-in-khiops-classifiers branch from 3d56a94 to d7d6526 Compare February 11, 2025 18:14
@popescu-v
Copy link
Copy Markdown
Collaborator

Commit 5816fc6 should be dropped.

@popescu-v
Copy link
Copy Markdown
Collaborator

popescu-v commented Feb 12, 2025

Add sklearn sample illustrating the boolean and / or floating-point targets? For instance, update samples_sklearn.py khiops_classifier_multiclass by adding the use of boolean targets (a new train / test by having as target, e.g. "is Iris-setosa" with values True / False) ?

@nairbenrekia nairbenrekia force-pushed the support-float-and-bool-targets-in-khiops-classifiers branch from d7d6526 to 85d6b30 Compare February 13, 2025 13:22
Copy link
Copy Markdown
Collaborator

@popescu-v popescu-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update samples_sklearn.py to use boolean and / or floating-point classification targets (see relevant comment on the PR).
Check API doc building workflow for no warnings; visually inspect API docs rendering from the CI-produced artifact.

@nairbenrekia nairbenrekia force-pushed the support-float-and-bool-targets-in-khiops-classifiers branch from 85d6b30 to a404e40 Compare February 17, 2025 11:22
Copy link
Copy Markdown
Collaborator

@popescu-v popescu-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, LGTM. However, rebase of the PR branch on the latest dev-v10 is needed before merging.

@nairbenrekia nairbenrekia force-pushed the support-float-and-bool-targets-in-khiops-classifiers branch from 3baff2c to 3155507 Compare March 18, 2025 15:27
Copy link
Copy Markdown
Collaborator

@popescu-v popescu-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@folmos-at-orange folmos-at-orange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small things + 1 that can be done in another PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a samples for these features are necessary. This is the kind of thing that "it should just work".

Copy link
Copy Markdown
Collaborator

@popescu-v popescu-v Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but as it didn't, it's also a way of ensuring non-regression and to show that this (user-required) feature, which was not supported, is supported from now on.

Copy link
Copy Markdown
Member

@folmos-at-orange folmos-at-orange Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the tests for this, which were updated in this PR. Should we have also samples for every feature combination?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but the key word in my argument is "to show" (i.e. [historical] users can see that this is now supported; a contrario, the tests are not read by the users).

Copy link
Copy Markdown
Member

@folmos-at-orange folmos-at-orange Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again: I think this should "just work", and it is not an essential feature to showcase as it is not related to ML nor multi-table. Should we also add samples for 1-column dataframes as y?, and for python value lists?. All these are accepted types in sklearn.

Copy link
Copy Markdown
Collaborator

@popescu-v popescu-v Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, I tend to agree that dedicated samples for float and bool targets are not needed.

Hence, I am OK with removing these two dedicated samples; the automatic tests indeed cover this feature.

However, I would still add a realistic sample emphasizing this, e.g. using directly the results of some previous processing (as an array of floats) as targets for a classification problem, without needing any conversion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine to me. Maybe in another PR ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed ! OK to remove them.

# 'dtype' has been introduced on `column_or_1d' since Scikit-learn 1.2;
if sklearn.__version__ < "1.2":
if pd.api.types.is_string_dtype(dtype) and y.isin(["True", "False"]).all():
warnings.warn("'y' contains strings of 'True' and 'False'")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rephrase: y stores strings restricted to 'True'/'False' values: The predict method may return a bool vector.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.


# Check the return type of predict
y_pred = khc.predict(X)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eliminate this blank line

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@nairbenrekia nairbenrekia force-pushed the support-float-and-bool-targets-in-khiops-classifiers branch from 3155507 to 7e7f044 Compare March 24, 2025 11:10
@nairbenrekia nairbenrekia force-pushed the support-float-and-bool-targets-in-khiops-classifiers branch from 7e7f044 to e7674ec Compare March 24, 2025 16:47
@nairbenrekia nairbenrekia merged commit 6373ff5 into dev-v10 Mar 24, 2025
19 checks passed
@nairbenrekia nairbenrekia deleted the support-float-and-bool-targets-in-khiops-classifiers branch March 24, 2025 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support bool column types in KhiopsClassifier Support floating-point targets in Khiops classifiers

3 participants