Skip to content

Add sklearn estimators feature_importances_ attribute#486

Closed
popescu-v wants to merge 9 commits into
devfrom
480-improve-feature-importance-support-in-sklearn-khiops-estimators
Closed

Add sklearn estimators feature_importances_ attribute#486
popescu-v wants to merge 9 commits into
devfrom
480-improve-feature-importance-support-in-sklearn-khiops-estimators

Conversation

@popescu-v

@popescu-v popescu-v commented Oct 1, 2025

Copy link
Copy Markdown
Collaborator

Also:

  • drop feature_evaluated_importances_ and associated
  • only put importances in feature_used_importances.

TODO Before Asking for a Review

  • Rebase your branch to the latest version of dev (or main for release PRs)
  • Make sure all CI workflows are green
  • When adding a public feature/fix: Update the Unreleased section of CHANGELOG.md (no date)
  • Self-Review: Review "Files Changed" tab and fix any problems you find
  • API Docs (only if there are changes in docstrings, rst files or samples):
    • Check the docs build without warning: see the log of the API Docs workflow
    • Check that your changes render well in HTML: download the API Docs artifact and open index.html
    • If there are any problems it is faster to iterate by building locally the API Docs

@popescu-v popescu-v linked an issue Oct 1, 2025 that may be closed by this pull request
@popescu-v popescu-v marked this pull request as draft October 1, 2025 17:13
@popescu-v popescu-v force-pushed the 480-improve-feature-importance-support-in-sklearn-khiops-estimators branch 3 times, most recently from 262381e to e5dbf4c Compare October 2, 2025 16:25
@popescu-v popescu-v self-assigned this Oct 2, 2025
@popescu-v popescu-v force-pushed the 480-improve-feature-importance-support-in-sklearn-khiops-estimators branch 2 times, most recently from 06c2784 to 230dd96 Compare October 3, 2025 13:33
@popescu-v popescu-v marked this pull request as ready for review October 3, 2025 13:35
@popescu-v popescu-v force-pushed the 480-improve-feature-importance-support-in-sklearn-khiops-estimators branch 3 times, most recently from 0052b91 to 7501724 Compare October 3, 2025 14:02
Comment thread khiops/sklearn/estimators.py Outdated
Comment thread khiops/sklearn/estimators.py Outdated
Comment thread tests/test_estimator_attributes.py Outdated
Comment thread khiops/sklearn/estimators.py Outdated
Comment thread khiops/sklearn/estimators.py Outdated
Comment thread doc/samples/samples_sklearn.rst Outdated
@popescu-v popescu-v force-pushed the 480-improve-feature-importance-support-in-sklearn-khiops-estimators branch 3 times, most recently from 3b1b81a to 254c962 Compare October 7, 2025 17:46
Comment thread khiops/sklearn/estimators.py Outdated
n_features : int, default 100
*Multi-table only* : Maximum number of multi-table aggregate features to
construct. See :doc:`/multi_table_primer` for more details.
Maximum number of features to construct. See :doc:`/multi_table_primer`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put "AutoML features"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid having "AutoML" and rather use "Maximum number of features to construct automatically".

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for me

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread khiops/sklearn/estimators.py Outdated
Comment thread CHANGELOG.md
Comment thread khiops/samples/samples_sklearn.py Outdated
Comment thread khiops/sklearn/estimators.py Outdated
Comment on lines +1562 to +1563
# feature_used_names_ is not set if no variable is selected in the model
feature_used_names = getattr(self, "feature_used_names_", [])

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the else branch above.

@popescu-v popescu-v Oct 9, 2025

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, there is no need for an else here: either the condition in if modeling_report.selected_variables is not None is true, in which case feature_used_names_ is created and set, or the condition is false, in which case it is not. The getattr that follows the if block guarantees that feature_used_names is a list: either as retrieved from feature_used_names_ (which was set in the if block above), or the empty list.

@popescu-v popescu-v Oct 9, 2025

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But on second thought I see your point: we set the three attributes to empty / zero structures if there are no selected variables. Thusly, we can tap into the attributes directly afterwards, without any getattr needed. Hence, I'll do this (if we keep the importances).

Comment thread khiops/sklearn/estimators.py
Comment thread khiops/sklearn/estimators.py
This attribute quantifies the importance of each of the _input_
features, in their order of occurrence in the input dataset:
- if the feature is used, then its importance is retrieved from the
  report (as the average of its exact Shapley values across the training
  dataset)
- else, the importance is set to 0.0.
…ators

Indeed, these characterize the analysis process, not the resulting model
itself.
…ttribute

The level and weight of the features characterize the analysis process,
not the resulting model itself.
…ample

This enables a more even parallel with the (input) feature importances.
@popescu-v popescu-v force-pushed the 480-improve-feature-importance-support-in-sklearn-khiops-estimators branch from 254c962 to b7389c1 Compare October 9, 2025 15:59
Comment thread CHANGELOG.md Outdated
comments, and dictionary and variable block internal comments.
- (`core`) Dictionary `Rule` class and supporting API for serializing `Rule` instances.
- (`core`) New way to add a variable to a dictionary using a complete specification.
- (`core`) New API constants for rule names:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for rule names -> for rules used in automatic variable construction.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread khiops/sklearn/estimators.py
- separate construction rules into:
  - rules applied by default (`DEFAULT_CONSTRUCTION_RULES`);
  - calendar-related rules (`CALENDRICAL_CONSTRUCTION_RULES`);
- document the construction rules;
- fix the `construction_rules` parameter documentation in the Core API;
- fix the `n_features` parameter documentation the Sklearn estimator API;
- update the `feature_importances_` attribute documentation in the
  Sklearn estimator API accordingly.
@popescu-v popescu-v force-pushed the 480-improve-feature-importance-support-in-sklearn-khiops-estimators branch from b7389c1 to 1cfb7a6 Compare October 10, 2025 12:11
@popescu-v popescu-v force-pushed the 480-improve-feature-importance-support-in-sklearn-khiops-estimators branch from 1cfb7a6 to 9d19f9a Compare October 10, 2025 12:35
@popescu-v

Copy link
Copy Markdown
Collaborator Author

Not merging it as per the latest decisions (see #480 (comment)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve Feature Importance Support in Sklearn Khiops Estimators

2 participants