allow passing additional arguments to scorers in dataops #1995

Merged
jeromedockes merged 15 commits into skrub-data:main from jeromedockes:dataop-with-scoring
Apr 9, 2026
@jeromedockes
Member

@jeromedockes jeromedockes commented Mar 26, 2026

We can now call .skb.with_scoring to define which scorer should be used and to pass additional arguments to it (such as sample weights). Those arguments can themselves be DataOps, i.e. the sample weights can be computed dynamically from the inputs.

>>> import skrub
>>> from sklearn.dummy import DummyClassifier

>>> df = skrub.datasets.toy_products()
>>> data = skrub.var("df", df)
>>> X = data[["description", "price"]].skb.mark_as_X(cv=2)
>>> y = data["category"].skb.mark_as_y()
>>> sample_weight = X["price"]

>>> pred = X.skb.apply(DummyClassifier(), y=y).skb.with_scoring(
...     "accuracy", kwargs={"sample_weight": sample_weight}
... )

>>> pred.skb.cross_validate()
   fit_time  score_time  test_score
0  0.002466    0.003201    0.888889
1  0.002033    0.002878    0.647059
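
To make the effect of the `sample_weight` argument concrete, here is a minimal sketch of what a weighted accuracy computes, written in plain Python rather than with skrub or scikit-learn. The helper name `weighted_accuracy` and the toy labels and weights are illustrative assumptions and not part of this PR. The sketch assumes the usual definition of weighted accuracy: the average of per-sample correctness, weighted by the sample weights.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of correct predictions (hypothetical helper)."""
    if sample_weight is None:
        # Without weights, every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    # Sum the weights of the correctly predicted samples.
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


# Toy data, made up for illustration: the only mistake is on the priciest item.
y_true = ["toy", "toy", "book", "book"]
y_pred = ["toy", "toy", "book", "toy"]
price = [1.0, 2.0, 3.0, 14.0]

unweighted = weighted_accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, sample_weight=price)
print(unweighted)  # 0.75
print(weighted)    # 0.3
```

In the example above the weights come from the price column, so under this definition misclassifying an expensive product lowers the score more than misclassifying a cheap one.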

@jeromedockes jeromedockes changed the title allow passing additional arguments to scorers in dataopsfd allow passing additional arguments to scorers in dataops Mar 26, 2026
@jeromedockes jeromedockes force-pushed the dataop-with-scoring branch from a8a4d92 to 7435464 Compare April 1, 2026 08:19
@jeromedockes jeromedockes force-pushed the dataop-with-scoring branch from 7435464 to fb7e23f Compare April 1, 2026 08:58
@jeromedockes jeromedockes marked this pull request as ready for review April 1, 2026 16:14
@jeromedockes jeromedockes added the data_ops Something related to the skrub DataOps label Apr 2, 2026
Member

@rcap107 rcap107 left a comment

Thanks a lot for this PR @jeromedockes, this will be very useful. I left a few comments for clarification (as usual), and suggested a few small cosmetic changes. I think the main thing that is missing is an entry in the API reference. We might also want to update one of the examples in the gallery with the new method at some point.

Comment thread doc/modules/data_ops/validation/tuning_validating_data_ops.rst Outdated
Comment thread doc/modules/data_ops/validation/tuning_validating_data_ops.rst Outdated
Comment thread doc/modules/data_ops/validation/tuning_validating_data_ops.rst
Comment thread doc/modules/data_ops/validation/tuning_validating_data_ops.rst Outdated
Comment thread doc/modules/data_ops/validation/tuning_validating_data_ops.rst Outdated
Comment thread skrub/_data_ops/_skrub_namespace.py Outdated
Comment thread skrub/_data_ops/_skrub_namespace.py
Comment thread skrub/_data_ops/_skrub_namespace.py Outdated
Comment thread skrub/_data_ops/_skrub_namespace.py Outdated
Comment thread CHANGES.rst
@jeromedockes
Member Author

thanks for the review @rcap107, I checked the links and they should be good now

Member

@rcap107 rcap107 left a comment

Looks good to me! Thanks a lot @jeromedockes

@jeromedockes
Member Author

thanks for the reviews :)

@jeromedockes jeromedockes merged commit 29aa183 into skrub-data:main Apr 9, 2026
29 checks passed
@jeromedockes jeromedockes deleted the dataop-with-scoring branch April 9, 2026 09:24
Labels

data_ops Something related to the skrub DataOps

2 participants