Support float and bool targets in khiops classifiers#354
Conversation
ee7a636 to
93c9ff3
Compare
c5a5644 to
3d56a94
Compare
|
Rebase PR branch on |
There was a problem hiding this comment.
See the comments.
Update docstring of KhiopsClassifier.predict with the mention of the bool and float target types: https://github.com/KhiopsML/khiops-python/pull/354/files#diff-802caefa84456ad964818a51e2c00f1decf6553480a063759a060c229fb16166R2308.
3d56a94 to
d7d6526
Compare
|
Commit 5816fc6 should be dropped. |
|
Add sklearn sample illustrating the boolean and / or floating-point targets? For instance, update |
d7d6526 to
85d6b30
Compare
popescu-v
left a comment
There was a problem hiding this comment.
Update samples_sklearn.py to use boolean and / or floating-point classification targets (see relevant comment on the PR).
Check API doc building workflow for no warnings; visually inspect API docs rendering from the CI-produced artifact.
85d6b30 to
a404e40
Compare
popescu-v
left a comment
There was a problem hiding this comment.
Technically, LGTM. However, rebase of the PR branch on the latest dev-v10 is needed before merging.
3baff2c to
3155507
Compare
folmos-at-orange
left a comment
There was a problem hiding this comment.
Small things + 1 that can be done in another PR.
There was a problem hiding this comment.
I don't think a samples for these features are necessary. This is the kind of thing that "it should just work".
There was a problem hiding this comment.
Right, but as it didn't, it's also a way of ensuring non-regression and to show that this (user-required) feature, which was not supported, is supported from now on.
There was a problem hiding this comment.
We have the tests for this, which were updated in this PR. Should we have also samples for every feature combination?
There was a problem hiding this comment.
No, but the key word in my argument is "to show" (i.e. [historical] users can see that this is now supported; a contrario, the tests are not read by the users).
There was a problem hiding this comment.
Again: I think this should "just work", and it is not an essential feature to showcase as it is not related to ML nor multi-table. Should we also add samples for 1-column dataframes as y?, and for python value lists?. All these are accepted types in sklearn.
There was a problem hiding this comment.
Finally, I tend to agree that dedicated samples for float and bool targets are not needed.
Hence, I am OK with removing these two dedicated samples; the automatic tests indeed cover this feature.
However, I would still add a realistic sample emphasizing this, e.g. using directly the results of some previous processing (as an array of floats) as targets for a classification problem, without needing any conversion.
There was a problem hiding this comment.
That seems fine to me. Maybe in another PR ?
There was a problem hiding this comment.
Agreed ! OK to remove them.
khiops/sklearn/dataset.py
Outdated
| # 'dtype' has been introduced on `column_or_1d' since Scikit-learn 1.2; | ||
| if sklearn.__version__ < "1.2": | ||
| if pd.api.types.is_string_dtype(dtype) and y.isin(["True", "False"]).all(): | ||
| warnings.warn("'y' contains strings of 'True' and 'False'") |
There was a problem hiding this comment.
I'd rephrase: y stores strings restricted to 'True'/'False' values: The predict method may return a bool vector.
tests/test_sklearn_output_types.py
Outdated
|
|
||
| # Check the return type of predict | ||
| y_pred = khc.predict(X) | ||
|
|
There was a problem hiding this comment.
Eliminate this blank line
3155507 to
7e7f044
Compare
7e7f044 to
e7674ec
Compare
closes #74 , #128
TODO Before Asking for a Review
dev(ormainfor release PRs)Unreleasedsection ofCHANGELOG.md(no date)index.html