<!DOCTYPE html>
<html lang="en" data-content_root="./" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<meta property="og:title" content="Release history" />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://skrub-data.github.io/stable/CHANGES.html" />
<meta property="og:site_name" content="skrub" />
<meta property="og:description" content="Release 0.8.0: New Features: The eager_data_ops configuration option has been added. When set to False, no previews are computed and validation is deferred until the DataOp is actually used (e.g. w..." />
<meta property="og:image" content="https://skrub-data.github.io/stable/_static/skrub.svg" />
<meta property="og:image:alt" content="skrub" />
<meta name="description" content="Release 0.8.0: New Features: The eager_data_ops configuration option has been added. When set to False, no previews are computed and validation is deferred until the DataOp is actually used (e.g. w..." />
<title>Release history — skrub</title>
<script data-cfasync="false">
document.documentElement.dataset.mode = localStorage.getItem("mode") || "";
document.documentElement.dataset.theme = localStorage.getItem("theme") || "";
</script>
<!--
this gives us a CSS class that will be invisible only if JS is disabled
-->
<noscript>
<style>
.pst-js-only { display: none !important; }
</style>
</noscript>
<!-- Loaded before other Sphinx assets -->
<link href="_static/styles/theme.css?digest=8878045cc6db502f8baf" rel="stylesheet" />
<link href="_static/styles/pydata-sphinx-theme.css?digest=8878045cc6db502f8baf" rel="stylesheet" />
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=8f2a1f02" />
<link rel="stylesheet" type="text/css" href="_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Vibur" />
<link rel="stylesheet" type="text/css" href="_static/jupyterlite_sphinx.css?v=8ee2c72c" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery.css?v=d2d258e8" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-binder.css?v=f4aeca0c" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-dataframe.css?v=2082cf3c" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-rendered-html.css?v=1277b6f3" />
<link rel="stylesheet" type="text/css" href="_static/css/custom.css?v=7821ceae" />
<!-- So that users can add custom icons -->
<script src="_static/scripts/fontawesome.js?digest=8878045cc6db502f8baf"></script>
<!-- Pre-loaded scripts that we'll load fully later -->
<link rel="preload" as="script" href="_static/scripts/bootstrap.js?digest=8878045cc6db502f8baf" />
<link rel="preload" as="script" href="_static/scripts/pydata-sphinx-theme.js?digest=8878045cc6db502f8baf" />
<script src="_static/documentation_options.js?v=486e5634"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/clipboard.min.js?v=a7894cd8"></script>
<script src="_static/copybutton.js?v=fd10adb8"></script>
<script src="_static/jupyterlite_sphinx.js?v=96e329c5"></script>
<script>DOCUMENTATION_OPTIONS.pagename = 'CHANGES';</script>
<script>
DOCUMENTATION_OPTIONS.theme_version = '0.16.1';
DOCUMENTATION_OPTIONS.theme_switcher_json_url = 'https://raw.githubusercontent.com/skrub-data/skrub/main/doc/version.json';
DOCUMENTATION_OPTIONS.theme_switcher_version_match = '0.8.0';
DOCUMENTATION_OPTIONS.show_version_warning_banner =
true;
</script>
<script src="_static/scripts/sg_plotly_resize.js?v=a751aa24"></script>
<link rel="canonical" href="https://skrub-data.org/stable/CHANGES.html" />
<link rel="icon" href="_static/skrub.svg"/>
<link rel="author" title="About these documents" href="about.html" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Development" href="development.html" />
<link rel="prev" title="Learning Materials" href="learning_materials.html" />
<meta name="docsearch:language" content="en"/>
<meta name="docsearch:version" content="0.8.0" />
</head>
<body data-bs-spy="scroll" data-bs-target=".bd-toc-nav" data-offset="180" data-bs-root-margin="0px 0px -60%" data-default-mode="">
<div id="pst-skip-link" class="skip-link d-print-none"><a href="#main-content">Skip to main content</a></div>
<div id="pst-scroll-pixel-helper"></div>
<button type="button" class="btn rounded-pill" id="pst-back-to-top">
<i class="fa-solid fa-arrow-up"></i>Back to top</button>
<dialog id="pst-search-dialog">
<form class="bd-search d-flex align-items-center"
action="search.html"
method="get">
<i class="fa-solid fa-magnifying-glass"></i>
<input type="search"
class="form-control"
name="q"
placeholder="Search the docs ..."
aria-label="Search the docs ..."
autocomplete="off"
autocorrect="off"
autocapitalize="off"
spellcheck="false"/>
<span class="search-button__kbd-shortcut"><kbd class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd>K</kbd></span>
</form>
</dialog>
<div class="pst-async-banner-revealer d-none">
<aside id="bd-header-version-warning" class="d-none d-print-none" aria-label="Version warning"></aside>
</div>
<header class="bd-header navbar navbar-expand-lg bd-navbar d-print-none">
<div class="bd-header__inner bd-page-width">
<button class="pst-navbar-icon sidebar-toggle primary-toggle" aria-label="Site navigation">
<span class="fa-solid fa-bars"></span>
</button>
<div class=" navbar-header-items__start">
<div class="navbar-item">
<a class="navbar-brand logo" href="index.html">
<img src="_static/skrub.svg" class="logo__image only-light" alt="skrub - Home"/>
<img src="_static/skrub.svg" class="logo__image only-dark pst-js-only" alt="skrub - Home"/>
</a></div>
</div>
<div class=" navbar-header-items">
<div class="me-auto navbar-header-items__center">
<div class="navbar-item">
<nav>
<ul class="bd-navbar-elements navbar-nav">
<li class="nav-item ">
<a class="nav-link nav-internal" href="install.html">
Install
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="documentation.html">
User Guide
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="reference/index.html">
API Reference
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="auto_examples/index.html">
Examples
</a>
</li>
<li class="nav-item dropdown">
<button class="btn dropdown-toggle nav-item" type="button"
data-bs-toggle="dropdown" aria-expanded="false"
aria-controls="pst-nav-more-links">
More
</button>
<ul id="pst-nav-more-links" class="dropdown-menu">
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="learning_materials.html">
Learning Materials
</a>
</li>
<li class=" current active">
<a class="nav-link dropdown-item nav-internal" href="#">
Release history
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="development.html">
Development
</a>
</li>
</ul>
</li>
</ul>
</nav></div>
</div>
<div class="navbar-header-items__end">
<div class="navbar-item navbar-persistent--container">
<button class="btn search-button-field search-button__button pst-js-only" title="Search" aria-label="Search" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="fa-solid fa-magnifying-glass"></i>
<span class="search-button__default-text">Search</span>
<span class="search-button__kbd-shortcut"><kbd class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd class="kbd-shortcut__modifier">K</kbd></span>
</button>
</div>
<div class="navbar-item">
<div class="version-switcher__container dropdown pst-js-only">
<button id="pst-version-switcher-button-2"
type="button"
class="version-switcher__button btn btn-sm dropdown-toggle"
data-bs-toggle="dropdown"
aria-haspopup="listbox"
aria-controls="pst-version-switcher-list-2"
aria-label="Version switcher list"
>
Choose version <!-- this text may get changed later by javascript -->
<span class="caret"></span>
</button>
<div id="pst-version-switcher-list-2"
class="version-switcher__menu dropdown-menu list-group-flush py-0"
role="listbox" aria-labelledby="pst-version-switcher-button-2">
<!-- dropdown will be populated by javascript on page load -->
</div>
</div></div>
<div class="navbar-item">
<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button pst-js-only" aria-label="Color mode" data-bs-title="Color mode" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="theme-switch fa-solid fa-sun fa-lg" data-mode="light" title="Light"></i>
<i class="theme-switch fa-solid fa-moon fa-lg" data-mode="dark" title="Dark"></i>
<i class="theme-switch fa-solid fa-circle-half-stroke fa-lg" data-mode="auto" title="System Settings"></i>
</button></div>
<div class="navbar-item"><ul class="navbar-icon-links"
aria-label="Icon Links">
<li class="nav-item">
<a href="https://github.com/skrub-data/skrub/" title="GitHub" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-github fa-lg" aria-hidden="true"></i>
<span class="sr-only">GitHub</span></a>
</li>
<li class="nav-item">
<a href="https://discord.gg/ABaPnm7fDC" title="Discord" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-discord fa-lg" aria-hidden="true"></i>
<span class="sr-only">Discord</span></a>
</li>
<li class="nav-item">
<a href="https://bsky.app/profile/skrub-data.bsky.social" title="Bluesky" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-bluesky fa-lg" aria-hidden="true"></i>
<span class="sr-only">Bluesky</span></a>
</li>
<li class="nav-item">
<a href="https://x.com/skrub_data" title="X (ex-Twitter)" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-x-twitter fa-lg" aria-hidden="true"></i>
<span class="sr-only">X (ex-Twitter)</span></a>
</li>
</ul></div>
</div>
</div>
<div class="navbar-persistent--mobile">
<button class="btn search-button-field search-button__button pst-js-only" title="Search" aria-label="Search" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="fa-solid fa-magnifying-glass"></i>
<span class="search-button__default-text">Search</span>
<span class="search-button__kbd-shortcut"><kbd class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd class="kbd-shortcut__modifier">K</kbd></span>
</button>
</div>
<button class="pst-navbar-icon sidebar-toggle secondary-toggle" aria-label="On this page">
<span class="fa-solid fa-outdent"></span>
</button>
</div>
</header>
<div class="bd-container">
<div class="bd-container__inner bd-page-width">
<dialog id="pst-primary-sidebar-modal"></dialog>
<div id="pst-primary-sidebar" class="bd-sidebar-primary bd-sidebar hide-on-wide">
<div class="sidebar-header-items sidebar-primary__section">
<div class="sidebar-header-items__center">
<div class="navbar-item">
<nav>
<ul class="bd-navbar-elements navbar-nav">
<li class="nav-item ">
<a class="nav-link nav-internal" href="install.html">
Install
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="documentation.html">
User Guide
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="reference/index.html">
API Reference
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="auto_examples/index.html">
Examples
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="learning_materials.html">
Learning Materials
</a>
</li>
<li class="nav-item current active">
<a class="nav-link nav-internal" href="#">
Release history
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="development.html">
Development
</a>
</li>
</ul>
</nav></div>
</div>
<div class="sidebar-header-items__end">
<div class="navbar-item">
<div class="version-switcher__container dropdown pst-js-only">
<button id="pst-version-switcher-button-3"
type="button"
class="version-switcher__button btn btn-sm dropdown-toggle"
data-bs-toggle="dropdown"
aria-haspopup="listbox"
aria-controls="pst-version-switcher-list-3"
aria-label="Version switcher list"
>
Choose version <!-- this text may get changed later by javascript -->
<span class="caret"></span>
</button>
<div id="pst-version-switcher-list-3"
class="version-switcher__menu dropdown-menu list-group-flush py-0"
role="listbox" aria-labelledby="pst-version-switcher-button-3">
<!-- dropdown will be populated by javascript on page load -->
</div>
</div></div>
<div class="navbar-item">
<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button pst-js-only" aria-label="Color mode" data-bs-title="Color mode" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="theme-switch fa-solid fa-sun fa-lg" data-mode="light" title="Light"></i>
<i class="theme-switch fa-solid fa-moon fa-lg" data-mode="dark" title="Dark"></i>
<i class="theme-switch fa-solid fa-circle-half-stroke fa-lg" data-mode="auto" title="System Settings"></i>
</button></div>
<div class="navbar-item"><ul class="navbar-icon-links"
aria-label="Icon Links">
<li class="nav-item">
<a href="https://github.com/skrub-data/skrub/" title="GitHub" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-github fa-lg" aria-hidden="true"></i>
<span class="sr-only">GitHub</span></a>
</li>
<li class="nav-item">
<a href="https://discord.gg/ABaPnm7fDC" title="Discord" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-discord fa-lg" aria-hidden="true"></i>
<span class="sr-only">Discord</span></a>
</li>
<li class="nav-item">
<a href="https://bsky.app/profile/skrub-data.bsky.social" title="Bluesky" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-bluesky fa-lg" aria-hidden="true"></i>
<span class="sr-only">Bluesky</span></a>
</li>
<li class="nav-item">
<a href="https://x.com/skrub_data" title="X (ex-Twitter)" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-x-twitter fa-lg" aria-hidden="true"></i>
<span class="sr-only">X (ex-Twitter)</span></a>
</li>
</ul></div>
</div>
</div>
<div class="sidebar-primary-items__end sidebar-primary__section">
<div class="sidebar-primary-item">
<div id="ethical-ad-placement"
class="flat"
data-ea-publisher="readthedocs"
data-ea-type="readthedocs-sidebar"
data-ea-manual="true">
</div></div>
</div>
</div>
<main id="main-content" class="bd-main" role="main">
<div class="bd-content">
<div class="bd-article-container">
<div class="bd-header-article d-print-none">
<div class="header-article-items header-article__inner">
<div class="header-article-items__start">
<div class="header-article-item">
<nav aria-label="Breadcrumb" class="d-print-none">
<ul class="bd-breadcrumbs">
<li class="breadcrumb-item breadcrumb-home">
<a href="index.html" class="nav-link" aria-label="Home">
<i class="fa-solid fa-home"></i>
</a>
</li>
<li class="breadcrumb-item active" aria-current="page"><span class="ellipsis">Release history</span></li>
</ul>
</nav>
</div>
</div>
</div>
</div>
<div id="searchbox"></div>
<article class="bd-article">
<section id="release-history">
<span id="changes"></span><h1>Release history<a class="headerlink" href="#release-history" title="Link to this heading">#</a></h1>
<section id="release-0-8-0">
<h2>Release 0.8.0<a class="headerlink" href="#release-0-8-0" title="Link to this heading">#</a></h2>
<section id="new-features">
<h3>New Features<a class="headerlink" href="#new-features" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>The <code class="docutils literal notranslate"><span class="pre">eager_data_ops</span></code> <a class="reference internal" href="modules/configuration_and_utils/customizing_configuration.html#user-guide-configuration-parameters"><span class="std std-ref">configuration</span></a> option has been added. When set to
False, no previews are computed and validation is deferred until the DataOp is
actually used (e.g. with <code class="docutils literal notranslate"><span class="pre">.skb.eval()</span></code>) rather than as soon as it is
defined. This can speed up the definition of complex DataOps with many nodes
(the overhead it removes typically becomes noticeable only in DataOps
with 50-100 nodes or more). The evaluation of large DataOps is also
faster. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1890">#1890</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>The reports produced by <a class="reference internal" href="reference/generated/skrub.DataOp.skb.full_report.html#skrub.DataOp.skb.full_report" title="skrub.DataOp.skb.full_report"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.full_report()</span></code></a> and
<a class="reference internal" href="reference/generated/skrub.SkrubLearner.html#skrub.SkrubLearner.report" title="skrub.SkrubLearner.report"><code class="xref py py-meth docutils literal notranslate"><span class="pre">SkrubLearner.report()</span></code></a> now also display the values provided in the
environment. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1920">#1920</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.SkrubLearner.html#skrub.SkrubLearner" title="skrub.SkrubLearner"><code class="xref py py-class docutils literal notranslate"><span class="pre">SkrubLearner</span></code></a>, <a class="reference internal" href="reference/generated/skrub.ParamSearch.html#skrub.ParamSearch" title="skrub.ParamSearch"><code class="xref py py-class docutils literal notranslate"><span class="pre">ParamSearch</span></code></a> and <a class="reference internal" href="reference/generated/skrub.OptunaParamSearch.html#skrub.OptunaParamSearch" title="skrub.OptunaParamSearch"><code class="xref py py-class docutils literal notranslate"><span class="pre">OptunaParamSearch</span></code></a> expose
additional attributes that scikit-learn can inspect: <code class="docutils literal notranslate"><span class="pre">__sklearn_tags__</span></code>,
<code class="docutils literal notranslate"><span class="pre">classes_</span></code>, <code class="docutils literal notranslate"><span class="pre">_estimator_type</span></code>. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1931">#1931</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>It is now possible to pass additional (dynamically computed) arguments to the
cross-validation splitter used by <a class="reference internal" href="reference/generated/skrub.DataOp.html#skrub.DataOp" title="skrub.DataOp"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataOp</span></code></a> objects for validation,
hyperparameter search etc. For example, the groups for a
<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html#sklearn.model_selection.GroupKFold" title="(in scikit-learn v1.8)"><code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.model_selection.GroupKFold</span></code></a> can be computed as part of the
DataOp evaluation and used for splitting. This is achieved by passing the
splitter and its arguments to <a class="reference internal" href="reference/generated/skrub.DataOp.skb.mark_as_X.html#skrub.DataOp.skb.mark_as_X" title="skrub.DataOp.skb.mark_as_X"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.mark_as_X()</span></code></a>. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1943">#1943</a> by
<a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.selectors.has_nulls.html#skrub.selectors.has_nulls" title="skrub.selectors.has_nulls"><code class="xref py py-func docutils literal notranslate"><span class="pre">selectors.has_nulls()</span></code></a> now takes a <code class="docutils literal notranslate"><span class="pre">proportion</span></code> parameter, which allows
selecting columns that have a fraction of null values above the given threshold.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1881">#1881</a> by <a class="reference external" href="https://github.com/gabrielapgomezji">Gabriela Gómez Jiménez</a>.</p></li>
</ul>
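<p>The selection rule behind the new <code class="docutils literal notranslate"><span class="pre">proportion</span></code> parameter of <code class="docutils literal notranslate"><span class="pre">selectors.has_nulls()</span></code> can be pictured with a plain-Python sketch. It is an illustration under stated assumptions, not skrub's implementation: it does not import skrub, columns are modelled as lists in which <code class="docutils literal notranslate"><span class="pre">None</span></code> marks a null, the helper names are hypothetical, and both the strict “above” comparison and the default of 0.0 (any null value selects the column) are assumptions read from the wording of the entry.</p>

```python
# Plain-Python sketch of the rule described for the ``proportion`` parameter
# of ``selectors.has_nulls()``. NOT skrub's implementation: columns are lists,
# ``None`` stands for a null value, and the helper names are hypothetical.


def null_fraction(values):
    """Return the fraction of entries in ``values`` that are None."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def select_columns_with_nulls(table, proportion=0.0):
    """Return the names of columns whose null fraction is above ``proportion``.

    ``table`` maps column names to lists of values. "Above" is read as a
    strict comparison (an assumption based on the changelog wording), so the
    default of 0.0 selects every column containing at least one null.
    """
    return [
        name for name, values in table.items()
        if null_fraction(values) > proportion
    ]


table = {
    "a": [1, None, 3, 4],        # 25% nulls
    "b": [None, None, None, 4],  # 75% nulls
    "c": [1, 2, 3, 4],           # no nulls
}
print(select_columns_with_nulls(table))                  # ['a', 'b']
print(select_columns_with_nulls(table, proportion=0.5))  # ['b']
```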
</section>
<section id="id1">
<h3>Changes<a class="headerlink" href="#id1" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>Increased the minimum version of polars from 0.20 to 1.5.0.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1897">#1897</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">ApplyToCols</span></code> and <code class="docutils literal notranslate"><span class="pre">ApplyToFrame</span></code> have been merged into a single class,
<a class="reference internal" href="reference/generated/skrub.ApplyToCols.html#skrub.ApplyToCols" title="skrub.ApplyToCols"><code class="xref py py-class docutils literal notranslate"><span class="pre">ApplyToCols</span></code></a>, which covers the functionality of both previous classes by
automatically detecting whether the provided transformer should be applied
independently to each column or to all selected columns as a single dataframe.
As a result, <code class="docutils literal notranslate"><span class="pre">ApplyToCols</span></code> and <code class="docutils literal notranslate"><span class="pre">ApplyToFrame</span></code> have been removed.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1913">#1913</a>, <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1919">#1919</a> and <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1962">#1962</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The dataset fetcher functions now include a “path” field for each table in the dataset.
For example, the dataset “employee_salaries” now has the field <code class="docutils literal notranslate"><span class="pre">employee_salaries_path</span></code>.
Additionally, datasets that include a single table have the field <code class="docutils literal notranslate"><span class="pre">path</span></code>. These
fields contain the paths to the datasets stored in the <code class="docutils literal notranslate"><span class="pre">skrub_data</span></code> folder.
The default <code class="docutils literal notranslate"><span class="pre">skrub_data</span></code> folder can now be set in the skrub configuration or with
the <code class="docutils literal notranslate"><span class="pre">SKB_DATA_DIRECTORY</span></code> environment variable. The environment variable <code class="docutils literal notranslate"><span class="pre">SKRUB_DATA_DIRECTORY</span></code>
is deprecated and will be removed in a future version of skrub.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1852">#1852</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>. Examples in the gallery have
been updated accordingly in <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1940">#1940</a> and <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1964">#1964</a> by <a class="reference external" href="https://github.com/MuditAtrey">MuditAtrey</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.core.SingleColumnTransformer.html#skrub.core.SingleColumnTransformer" title="skrub.core.SingleColumnTransformer"><code class="xref py py-class docutils literal notranslate"><span class="pre">SingleColumnTransformer</span></code></a> and the associated exception
<a class="reference internal" href="reference/generated/skrub.core.RejectColumn.html#skrub.core.RejectColumn" title="skrub.core.RejectColumn"><code class="xref py py-class docutils literal notranslate"><span class="pre">RejectColumn</span></code></a> (used internally by many skrub estimators) have
been added to the public API in the newly created <code class="docutils literal notranslate"><span class="pre">skrub.core</span></code> module.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1851">#1851</a> by <a class="reference external" href="https://github.com/emassoulie">Eloi Massoulié</a>.</p></li>
<li><p>Added the strings <code class="docutils literal notranslate"><span class="pre">"None"</span></code> and <code class="docutils literal notranslate"><span class="pre">"none"</span></code> to the list of null string values in
<a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a>. The list of strings that are converted to null by the <a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a>
is now also exposed as the <code class="docutils literal notranslate"><span class="pre">null_strings</span></code> parameter.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1952">#1952</a> and <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1954">#1954</a> by <a class="reference external" href="https://github.com/lisaleemcb">Lisa McBride</a>.</p></li>
<li><p>The configuration parameter “use_table_report” has been removed from the skrub
configuration. Use <a class="reference internal" href="reference/generated/skrub.patch_display.html#skrub.patch_display" title="skrub.patch_display"><code class="xref py py-meth docutils literal notranslate"><span class="pre">patch_display()</span></code></a> instead.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1973">#1973</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">column_filters</span></code> parameter of <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> has changed.
It now accepts a dictionary whose keys are the names displayed in the
dropdown menu and whose values specify the columns to display. Each value
can be a list of column indices, a list of column names,
or an instance of <code class="xref py py-class docutils literal notranslate"><span class="pre">Selector</span></code>.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1976">#1976</a> by <a class="reference external" href="https://github.com/lisaleemcb">Lisa McBride</a>.</p></li>
<li><p>The counts displayed on top of the vertical histogram bars in the
<a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> have been removed due to formatting issues.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1984">#1984</a> by <a class="reference external" href="https://github.com/lisaleemcb">Lisa McBride</a>.</p></li>
</ul>
</section>
<section id="bug-fixes">
<h3>Bug Fixes<a class="headerlink" href="#bug-fixes" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>The <a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a> now correctly handles the case where one of the
provided encoders is a scikit-learn Pipeline that starts with a skrub
single-column transformer. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1899">#1899</a> and <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1900">#1900</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>Errors raised when a polars LazyFrame is passed where an eager DataFrame is
expected are now clearer. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1916">#1916</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.DataOp.skb.cross_validate.html#skrub.DataOp.skb.cross_validate" title="skrub.DataOp.skb.cross_validate"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.cross_validate()</span></code></a> used to raise an error when passed
<code class="docutils literal notranslate"><span class="pre">return_indices=True</span></code>. It now returns the train and test indices of each
fold in the <code class="docutils literal notranslate"><span class="pre">train_indices</span></code> and <code class="docutils literal notranslate"><span class="pre">test_indices</span></code> columns of the resulting
dataframe. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1953">#1953</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>Polars LazyFrames are no longer collected automatically anywhere in the library;
a <code class="docutils literal notranslate"><span class="pre">TypeError</span></code> is now raised instead.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1941">#1941</a> by <a class="reference external" href="https://github.com/MuditAtrey">Mudit Atrey</a>.</p></li>
</ul>
</section>
</section>
<section id="release-0-7-2">
<h2>Release 0.7.2<a class="headerlink" href="#release-0-7-2" title="Link to this heading">#</a></h2>
<section id="id2">
<h3>Changes<a class="headerlink" href="#id2" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>The <a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a> now exposes the <code class="docutils literal notranslate"><span class="pre">vocabulary</span></code> parameter of the underlying
<code class="xref py py-class docutils literal notranslate"><span class="pre">TfidfVectorizer</span></code>.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1819">#1819</a> by <a class="reference external" href="https://github.com/emassoulie">Eloi Massoulié</a>.</p></li>
<li><p><code class="xref py py-func docutils literal notranslate"><span class="pre">compute_ngram_distance()</span></code> has been renamed to <code class="xref py py-func docutils literal notranslate"><span class="pre">_compute_ngram_distance()</span></code> and is now a private function.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1838">#1838</a> by <a class="reference external" href="https://github.com/siddharthbaleja">Siddharth Baleja</a>.</p></li>
<li><p>The distributed wheel has been made smaller by removing material that is not
needed to use the library. Benchmarks are now available in a separate
<a class="reference external" href="https://github.com/skrub-data/skrub-benchmarks">repository</a>.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1893">#1893</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
</ul>
</section>
<section id="bugfixes">
<h3>Bugfixes<a class="headerlink" href="#bugfixes" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>Fixed some issues related to the release of Pandas 3.0. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1855">#1855</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
</ul>
</section>
</section>
<section id="release-0-7-1">
<h2>Release 0.7.1<a class="headerlink" href="#release-0-7-1" title="Link to this heading">#</a></h2>
<section id="id3">
<h3>New features<a class="headerlink" href="#id3" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>A new dataset, <code class="xref py py-func docutils literal notranslate"><span class="pre">fetch_california_housing()</span></code>, has been added to the
<code class="xref py py-mod docutils literal notranslate"><span class="pre">skrub.datasets</span></code> module. It provides a redundant copy of the dataset loaded by the scikit-learn
<code class="xref py py-func docutils literal notranslate"><span class="pre">fetch_california_housing()</span></code> function.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1830">#1830</a> by <a class="reference external" href="https://github.com/glemaitre">Guillaume Lemaitre</a>.</p></li>
</ul>
</section>
<section id="id4">
<h3>Bugfixes<a class="headerlink" href="#id4" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>The attributes of <a class="reference internal" href="reference/generated/skrub.DropCols.html#skrub.DropCols" title="skrub.DropCols"><code class="xref py py-class docutils literal notranslate"><span class="pre">DropCols</span></code></a> and <code class="xref py py-class docutils literal notranslate"><span class="pre">SelectCols</span></code> were renamed to end
with an underscore, following the scikit-learn convention that is
used to determine whether an estimator is fitted. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1813">#1813</a> by <a class="reference external" href="https://github.com/auguste-probabl">Auguste
Baum</a>.</p></li>
</ul>
</section>
</section>
<section id="release-0-7-0">
<h2>Release 0.7.0<a class="headerlink" href="#release-0-7-0" title="Link to this heading">#</a></h2>
<section id="id5">
<h3>New features<a class="headerlink" href="#id5" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>It is now possible to tune the choices in a <a class="reference internal" href="reference/generated/skrub.DataOp.html#skrub.DataOp" title="skrub.DataOp"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataOp</span></code></a> with <a class="reference external" href="https://optuna.readthedocs.io/en/stable/">Optuna</a>. See
<a class="reference internal" href="auto_examples/data_ops/1131_optuna_choices.html#example-optuna-choices"><span class="std std-ref">Tuning DataOps with Optuna</span></a> for an example.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1661">#1661</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.DataOp.skb.apply.html#skrub.DataOp.skb.apply" title="skrub.DataOp.skb.apply"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.apply()</span></code></a> now allows passing extra named arguments to the
estimator’s methods through parameters such as <code class="docutils literal notranslate"><span class="pre">fit_kwargs</span></code> and <code class="docutils literal notranslate"><span class="pre">predict_kwargs</span></code>.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1642">#1642</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>TableReport now displays the mean statistic for boolean columns.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1647">#1647</a> by <a class="reference external" href="https://github.com/abenechehab">Abdelhakim Benechehab</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.DataOp.skb.get_vars.html#skrub.DataOp.skb.get_vars" title="skrub.DataOp.skb.get_vars"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.get_vars()</span></code></a> allows inspecting all the variables, or all the
named DataOps, in a <a class="reference internal" href="reference/generated/skrub.DataOp.html#skrub.DataOp" title="skrub.DataOp"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataOp</span></code></a>. This makes it easy to see which keys must
be present in the <code class="docutils literal notranslate"><span class="pre">environment</span></code> dictionary passed to
<a class="reference internal" href="reference/generated/skrub.DataOp.skb.eval.html#skrub.DataOp.skb.eval" title="skrub.DataOp.skb.eval"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.eval()</span></code></a> or to <code class="xref py py-meth docutils literal notranslate"><span class="pre">SkrubLearner.fit()</span></code>,
<code class="xref py py-meth docutils literal notranslate"><span class="pre">SkrubLearner.predict()</span></code>, etc.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1646">#1646</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.DataOp.skb.iter_cv_splits.html#skrub.DataOp.skb.iter_cv_splits" title="skrub.DataOp.skb.iter_cv_splits"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.iter_cv_splits()</span></code></a> iterates over the training and testing
environments produced by a CV splitter – similar to
<a class="reference internal" href="reference/generated/skrub.DataOp.skb.train_test_split.html#skrub.DataOp.skb.train_test_split" title="skrub.DataOp.skb.train_test_split"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.train_test_split()</span></code></a> but for multiple cross-validation splits.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1653">#1653</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> now accepts NumPy arrays (<code class="docutils literal notranslate"><span class="pre">np.ndarray</span></code>). <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1676">#1676</a> by <a class="reference external" href="https://github.com/Nismamjad1">Nisma Amjad</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.DataOp.skb.full_report.html#skrub.DataOp.skb.full_report" title="skrub.DataOp.skb.full_report"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.full_report()</span></code></a> now accepts a <code class="docutils literal notranslate"><span class="pre">title</span></code> parameter, whose value is displayed
in the HTML report.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1654">#1654</a> by <a class="reference external" href="https://github.com/MarieSacksick">Marie Sacksick</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> now includes the <code class="docutils literal notranslate"><span class="pre">open_tab</span></code> parameter, which lets the
user select which tab should be opened when the <code class="docutils literal notranslate"><span class="pre">TableReport</span></code> is
rendered. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1737">#1737</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.selectors.Selector.html#skrub.selectors.Selector" title="skrub.selectors.Selector"><code class="xref py py-class docutils literal notranslate"><span class="pre">selectors.Selector</span></code></a> now documents its <a class="reference internal" href="reference/generated/skrub.selectors.Selector.html#skrub.selectors.Selector.expand" title="skrub.selectors.Selector.expand"><code class="xref py py-meth docutils literal notranslate"><span class="pre">selectors.Selector.expand()</span></code></a>
and <a class="reference internal" href="reference/generated/skrub.selectors.Selector.html#skrub.selectors.Selector.expand_index" title="skrub.selectors.Selector.expand_index"><code class="xref py py-meth docutils literal notranslate"><span class="pre">selectors.Selector.expand_index()</span></code></a> methods. The user guide adds explanations and examples,
and the documentation of the corresponding constructor functions now refers to them.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1841">#1841</a> by <a class="reference external" href="https://github.com/emassoulie">Eloi Massoulié</a>.</p></li>
</ul>
</section>
<section id="id6">
<h3>Changes<a class="headerlink" href="#id6" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>The minimum supported version of Python has been increased to 3.10. Additionally,
the minimum supported versions of scikit-learn and requests are 1.4.2 and 2.27.1
respectively. Support for Python 3.14 has been added.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1572">#1572</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The <a class="reference internal" href="reference/generated/skrub.DataOp.skb.full_report.html#skrub.DataOp.skb.full_report" title="skrub.DataOp.skb.full_report"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.full_report()</span></code></a> method now deletes reports created with
<code class="docutils literal notranslate"><span class="pre">output_dir=None</span></code> after 7 days. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1657">#1657</a> by <a class="reference external" href="https://github.com/dierickxsimon">Simon Dierickx</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.tabular_pipeline.html#skrub.tabular_pipeline" title="skrub.tabular_pipeline"><code class="xref py py-func docutils literal notranslate"><span class="pre">tabular_pipeline()</span></code></a> now uses a <a class="reference internal" href="reference/generated/skrub.SquashingScaler.html#skrub.SquashingScaler" title="skrub.SquashingScaler"><code class="xref py py-class docutils literal notranslate"><span class="pre">SquashingScaler</span></code></a> instead of a
<code class="xref py py-class docutils literal notranslate"><span class="pre">StandardScaler</span></code> for centering and scaling numerical features
when linear models are used.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1644">#1644</a> by <a class="reference external" href="https://github.com/dierickxsimon">Simon Dierickx</a>.</p></li>
<li><p>The transformer <a class="reference internal" href="reference/generated/skrub.ToFloat.html#skrub.ToFloat" title="skrub.ToFloat"><code class="xref py py-class docutils literal notranslate"><span class="pre">ToFloat</span></code></a>, previously called <code class="docutils literal notranslate"><span class="pre">ToFloat32</span></code>, is now public.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1687">#1687</a> by <a class="reference external" href="https://github.com/MarieSacksick">Marie Sacksick</a>.</p></li>
<li><p>Improved the error message raised when a Polars LazyFrame is passed to
<a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a>, clarifying that <code class="docutils literal notranslate"><span class="pre">.collect()</span></code> must be called first.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1767">#1767</a> by <a class="reference external" href="https://github.com/fatiben2002">Fatima Ben Kadour</a>.</p></li>
<li><p>Computing the associations in <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> is now deterministic and can
be controlled by the new parameter <code class="docutils literal notranslate"><span class="pre">subsampling_seed</span></code> of the global configuration.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1775">#1775</a> by <a class="reference external" href="https://github.com/thomass-dev">Thomas S.</a>.</p></li>
<li><p>Added the <code class="docutils literal notranslate"><span class="pre">cast_to_str</span></code> parameter to <a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a>. List-like and object
columns are no longer converted to strings unless this parameter is explicitly enabled.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1789">#1789</a> by <a class="reference external" href="https://github.com/PilliSiddharth">@PilliSiddharth</a>.</p></li>
</ul>
</section>
<section id="id7">
<h3>Bugfixes<a class="headerlink" href="#id7" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>The <a class="reference internal" href="reference/generated/skrub.cross_validate.html#skrub.cross_validate" title="skrub.cross_validate"><code class="xref py py-meth docutils literal notranslate"><span class="pre">skrub.cross_validate()</span></code></a> function now raises a specific exception if an argument of the wrong
type is passed.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1799">#1799</a> by <a class="reference external" href="https://github.com/emassoulie">Eloi Massoulié</a>.</p></li>
<li><p>Fixed various issues with some transformers by adding <code class="docutils literal notranslate"><span class="pre">get_feature_names_out</span></code>
to all single-column transformers.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1666">#1666</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>Issues occurring when <a class="reference internal" href="reference/generated/skrub.DataOp.skb.apply.html#skrub.DataOp.skb.apply" title="skrub.DataOp.skb.apply"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.apply()</span></code></a> was passed a DataOp as the
estimator have been fixed in <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1671">#1671</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> could raise an error while trying to check if Polars
columns with some dtypes (lists, structs) are sorted, and it did not flag
Polars columns sorted in descending order. Fixed in <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1673">#1673</a> by
<a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>Fixed nightly checks and added support for upcoming library versions, including Pandas
v3.0. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1664">#1664</a> by <a class="reference external" href="https://github.com/auguste-probabl">Auguste Baum</a> and
<a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>Fixed the use of <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> and <a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a> with Polars dataframes
containing a column whose name is the empty string.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1722">#1722</a> by <a class="reference external" href="https://github.com/MarieSacksick">Marie Sacksick</a>.</p></li>
<li><p>Fixed an issue where <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> would fail when computing associations
for Polars dataframes if PyArrow was not installed.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1742">#1742</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>Fixed an issue in the DataOps report generation when the DataOp
contained escape characters or spanned multiple lines.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1764">#1764</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>Added <code class="xref py py-meth docutils literal notranslate"><span class="pre">get_feature_names_out()</span></code> to <a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a> for consistency with the
<a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a> and other transformers. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1762">#1762</a> by
<a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>Improved the error message raised when <a class="reference internal" href="reference/generated/skrub.TextEncoder.html#skrub.TextEncoder" title="skrub.TextEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextEncoder</span></code></a> is used without the optional
transformers dependencies. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1769">#1769</a> by <a class="reference external" href="https://github.com/fxzhou22">Fangxuan Zhou</a>.</p></li>
<li><p>Accessing <code class="docutils literal notranslate"><span class="pre">.skb.applied_estimator</span></code> on a <a class="reference internal" href="reference/generated/skrub.DataOp.html#skrub.DataOp" title="skrub.DataOp"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataOp</span></code></a> after calling
<code class="docutils literal notranslate"><span class="pre">.skb.set_name()</span></code>, <code class="docutils literal notranslate"><span class="pre">.skb.set_description()</span></code>, <code class="docutils literal notranslate"><span class="pre">.skb.mark_as_X()</span></code> or
<code class="docutils literal notranslate"><span class="pre">.skb.mark_as_y()</span></code> used to raise an error; this has been fixed in <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1782">#1782</a>
by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>Fixed potential issues that could arise in <a class="reference internal" href="reference/generated/skrub.ParamSearch.html#skrub.ParamSearch.plot_results" title="skrub.ParamSearch.plot_results"><code class="xref py py-meth docutils literal notranslate"><span class="pre">ParamSearch.plot_results()</span></code></a>
when NaN values were present in the cross-validation results.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1800">#1800</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
</ul>
</section>
</section>
<section id="release-0-6-2">
<h2>Release 0.6.2<a class="headerlink" href="#release-0-6-2" title="Link to this heading">#</a></h2>
<section id="id8">
<h3>New features<a class="headerlink" href="#id8" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p><a class="reference internal" href="reference/generated/skrub.DataOp.skb.full_report.html#skrub.DataOp.skb.full_report" title="skrub.DataOp.skb.full_report"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.full_report()</span></code></a> now displays the time each node took to
evaluate. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1596">#1596</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
</ul>
</section>
<section id="id9">
<h3>Changes<a class="headerlink" href="#id9" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>KEN embeddings are now deprecated; the functions <code class="xref py py-func docutils literal notranslate"><span class="pre">datasets.get_ken_embeddings()</span></code>,
<code class="xref py py-func docutils literal notranslate"><span class="pre">datasets.get_ken_table_aliases()</span></code>, and <code class="xref py py-func docutils literal notranslate"><span class="pre">datasets.get_ken_types()</span></code> will be
removed in the next release of skrub.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1546">#1546</a> by <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.</p></li>
<li><p>Improved the error messages raised when a DataOp is passed to a dispatched function.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1607">#1607</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The accepted values for the parameter <code class="docutils literal notranslate"><span class="pre">how</span></code> of <a class="reference internal" href="reference/generated/skrub.DataOp.skb.apply.html#skrub.DataOp.skb.apply" title="skrub.DataOp.skb.apply"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.apply()</span></code></a> have
changed. The new values are <code class="docutils literal notranslate"><span class="pre">"auto"</span></code> (unchanged), <code class="docutils literal notranslate"><span class="pre">"cols"</span></code> to wrap the
transformer in <a class="reference internal" href="reference/generated/skrub.ApplyToCols.html#skrub.ApplyToCols" title="skrub.ApplyToCols"><code class="xref py py-class docutils literal notranslate"><span class="pre">ApplyToCols</span></code></a>, <code class="docutils literal notranslate"><span class="pre">"frame"</span></code> to wrap the transformer in
<code class="xref py py-class docutils literal notranslate"><span class="pre">ApplyToFrame</span></code>, or <code class="docutils literal notranslate"><span class="pre">"no_wrap"</span></code> for no wrapping. The old values are
deprecated and will result in an error in a future release.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1628">#1628</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>The parameter <code class="docutils literal notranslate"><span class="pre">splitter</span></code> of <a class="reference internal" href="reference/generated/skrub.DataOp.skb.train_test_split.html#skrub.DataOp.skb.train_test_split" title="skrub.DataOp.skb.train_test_split"><code class="xref py py-meth docutils literal notranslate"><span class="pre">DataOp.skb.train_test_split()</span></code></a> has been
renamed to <code class="docutils literal notranslate"><span class="pre">split_func</span></code>. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1630">#1630</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>KEN embeddings and all the relevant functions have been removed from skrub.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1567">#1567</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The objects <code class="docutils literal notranslate"><span class="pre">tabular_learner</span></code> and <code class="docutils literal notranslate"><span class="pre">DropIfTooManyNulls</span></code> were removed. Use
<a class="reference internal" href="reference/generated/skrub.tabular_pipeline.html#skrub.tabular_pipeline" title="skrub.tabular_pipeline"><code class="xref py py-func docutils literal notranslate"><span class="pre">tabular_pipeline()</span></code></a> and <a class="reference internal" href="reference/generated/skrub.DropUninformative.html#skrub.DropUninformative" title="skrub.DropUninformative"><code class="xref py py-class docutils literal notranslate"><span class="pre">DropUninformative</span></code></a> instead.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1567">#1567</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The skrub global configuration now includes a parameter for setting the default
verbosity of the <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a>.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1567">#1567</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
</ul>
</section>
<section id="id10">
<h3>Bugfixes<a class="headerlink" href="#id10" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>Fixed a compatibility bug with Polars 1.32.3 that may cause <cite>ToFloat32</cite> to fail
when applied to categorical columns. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1570">#1570</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>Fixed the display of DataOp objects in Google Colab cell outputs, where previously
no output was displayed. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1590">#1590</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>Fixed an error that occurred when using <code class="docutils literal notranslate"><span class="pre">.skb.concat</span></code> with a pandas dataframe
whose column names are not strings. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1594">#1594</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>Fixed the range from which <a class="reference internal" href="reference/generated/skrub.choose_float.html#skrub.choose_float" title="skrub.choose_float"><code class="xref py py-func docutils literal notranslate"><span class="pre">choose_float()</span></code></a> and <a class="reference internal" href="reference/generated/skrub.choose_int.html#skrub.choose_int" title="skrub.choose_int"><code class="xref py py-func docutils literal notranslate"><span class="pre">choose_int()</span></code></a> sample
values when <code class="docutils literal notranslate"><span class="pre">log=False</span></code> and <code class="docutils literal notranslate"><span class="pre">n_steps</span></code> is <code class="docutils literal notranslate"><span class="pre">None</span></code>. Values were previously drawn between <code class="docutils literal notranslate"><span class="pre">low</span></code>
and <code class="docutils literal notranslate"><span class="pre">low</span> <span class="pre">+</span> <span class="pre">high</span></code>; they are now drawn between <code class="docutils literal notranslate"><span class="pre">low</span></code> and <code class="docutils literal notranslate"><span class="pre">high</span></code>. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1603">#1603</a> by
<a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>Hyperparameter search on a DataOp raised an error for classification tasks
when the <code class="docutils literal notranslate"><span class="pre">scoring</span></code> parameter was set and the DataOp contained no variables.
Fixed in <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1601">#1601</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.SkrubLearner.html#skrub.SkrubLearner" title="skrub.SkrubLearner"><code class="xref py py-class docutils literal notranslate"><span class="pre">SkrubLearner</span></code></a> no longer computes predictions on the training set during
<code class="docutils literal notranslate"><span class="pre">fit()</span></code>.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1610">#1610</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.DataOp.html#skrub.DataOp" title="skrub.DataOp"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataOp</span></code></a> raised errors when it contained subclasses of list, tuple
or dict that cannot be initialized with an instance of the builtin type (such
as classes created by <code class="docutils literal notranslate"><span class="pre">collections.namedtuple</span></code>). This has been fixed:
DataOps now only recurse into the builtin collections to evaluate their items,
not into their subclasses. If you need the items evaluated (i.e., if they
contain DataOps or Choices), store them in one of the builtin collections.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1612">#1612</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.SkrubLearner.html#skrub.SkrubLearner.report" title="skrub.SkrubLearner.report"><code class="xref py py-meth docutils literal notranslate"><span class="pre">SkrubLearner.report()</span></code></a> with <code class="docutils literal notranslate"><span class="pre">mode="fit"</span></code> used to display the DataOps
themselves, rather than their outputs, in the report. This has been fixed in
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1623">#1623</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>Fixed a bug that happened when <code class="docutils literal notranslate"><span class="pre">get_feature_names_out</span></code> was called on instances
of the <a class="reference internal" href="reference/generated/skrub.DatetimeEncoder.html#skrub.DatetimeEncoder" title="skrub.DatetimeEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatetimeEncoder</span></code></a>. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1622">#1622</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
</ul>
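<p>The DataOp evaluation rule described above (recurse only into the builtin list, tuple and dict) can be illustrated with a minimal pure-Python sketch. It is not skrub's implementation: <code class="docutils literal notranslate"><span class="pre">Lazy</span></code> and <code class="docutils literal notranslate"><span class="pre">evaluate</span></code> are hypothetical stand-ins for a DataOp and its evaluation. The sketch checks the exact type rather than using <code class="docutils literal notranslate"><span class="pre">isinstance</span></code>, so a namedtuple, whose constructor expects one argument per field rather than a single iterable, is returned unchanged instead of being rebuilt.</p>

```python
from collections import namedtuple


class Lazy:
    """Hypothetical stand-in for a DataOp or Choice: a value computed on demand."""

    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


def evaluate(obj):
    """Evaluate Lazy items, recursing only into builtin list, tuple and dict.

    The exact-type check means subclasses (e.g. namedtuples) are not entered
    and are returned as-is.
    """
    if isinstance(obj, Lazy):
        return obj.evaluate()
    if type(obj) in (list, tuple):
        # The builtin constructors accept a single iterable.
        return type(obj)(evaluate(item) for item in obj)
    if type(obj) is dict:
        return {key: evaluate(value) for key, value in obj.items()}
    return obj


Point = namedtuple("Point", ["x", "y"])

# Rebuilding a namedtuple the way a builtin tuple is rebuilt fails,
# because the generator is bound to 'x' and 'y' is missing.
try:
    Point(item for item in [1, 2])
except TypeError as exc:
    print("cannot rebuild a namedtuple from an iterable:", exc)

nested = [Lazy(1), (Lazy(2), {"a": Lazy(3)})]
print(evaluate(nested))  # [1, (2, {'a': 3})]

p = Point(Lazy(1), Lazy(2))
print(evaluate(p) is p)  # True: the namedtuple is left untouched
```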
</section>
</section>
<section id="release-0-6-1">
<h2>Release 0.6.1<a class="headerlink" href="#release-0-6-1" title="Link to this heading">#</a></h2>
<section id="id11">
<h3>Bugfixes<a class="headerlink" href="#id11" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">get_feature_names_out</span></code> now works correctly when <a class="reference internal" href="reference/generated/skrub.GapEncoder.html#skrub.GapEncoder" title="skrub.GapEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">GapEncoder</span></code></a>,
<a class="reference internal" href="reference/generated/skrub.DropCols.html#skrub.DropCols" title="skrub.DropCols"><code class="xref py py-class docutils literal notranslate"><span class="pre">DropCols</span></code></a> and <code class="xref py py-class docutils literal notranslate"><span class="pre">SelectCols</span></code> are used within a scikit-learn <code class="docutils literal notranslate"><span class="pre">Pipeline</span></code>. In
addition, <a class="reference internal" href="reference/generated/skrub.DropCols.html#skrub.DropCols" title="skrub.DropCols"><code class="xref py py-class docutils literal notranslate"><span class="pre">DropCols</span></code></a>’s <code class="docutils literal notranslate"><span class="pre">get_feature_names_out</span></code> method now returns the
names of the columns that are not dropped, rather than the names of the columns
that are dropped. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1543">#1543</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
</ul>
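<p>The corrected behavior of <code class="docutils literal notranslate"><span class="pre">DropCols.get_feature_names_out</span></code> can be summarized with a small sketch: the output names are the input names minus the dropped ones. The helper <code class="docutils literal notranslate"><span class="pre">drop_cols_feature_names_out</span></code> is hypothetical (in skrub the method is called on a fitted <code class="docutils literal notranslate"><span class="pre">DropCols</span></code> instance), and the sketch assumes the remaining columns keep their input order.</p>

```python
def drop_cols_feature_names_out(feature_names_in, cols_to_drop):
    """Hypothetical helper: names of the columns kept by a DropCols-like step."""
    to_drop = set(cols_to_drop)
    # Keep only the surviving columns, preserving the input order (assumption).
    return [name for name in feature_names_in if name not in to_drop]


print(drop_cols_feature_names_out(["id", "age", "city", "income"], ["id", "city"]))
# ['age', 'income']
```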
</section>
</section>
<section id="release-0-6-0">
<h2>Release 0.6.0<a class="headerlink" href="#release-0-6-0" title="Link to this heading">#</a></h2>
<section id="highlights">
<h3>Highlights<a class="headerlink" href="#highlights" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>Major feature! Skrub DataOps are a powerful new way of
combining dataframe transformations over multiple tables with machine learning
pipelines. DataOps can be combined to form complex data plans, which can be used
to train and tune machine learning models. The DataOps plans can then be exported
as <code class="docutils literal notranslate"><span class="pre">Learners</span></code> (<a class="reference internal" href="reference/generated/skrub.SkrubLearner.html#skrub.SkrubLearner" title="skrub.SkrubLearner"><code class="xref py py-class docutils literal notranslate"><span class="pre">skrub.SkrubLearner</span></code></a>), standalone objects that can be
used on new data. More details about the DataOps can be found in the
<a class="reference internal" href="data_ops.html#user-guide-data-ops-index"><span class="std std-ref">User guide</span></a> and in the
<a class="reference internal" href="auto_examples/data_ops/index.html#data-ops-examples-ref"><span class="std std-ref">examples</span></a>.</p></li>
<li><p>The <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> has been improved with many new features. Series are
now supported directly. It is now
possible to skip computing column associations and generating plots when the
number of columns in the dataframe exceeds a user-defined threshold. Columns with
high cardinality and sorted columns are now highlighted in the report.</p></li>
<li><p><code class="xref py py-mod docutils literal notranslate"><span class="pre">skrub.selectors</span></code>, <a class="reference internal" href="reference/generated/skrub.ApplyToCols.html#skrub.ApplyToCols" title="skrub.ApplyToCols"><code class="xref py py-class docutils literal notranslate"><span class="pre">ApplyToCols</span></code></a> and <code class="xref py py-class docutils literal notranslate"><span class="pre">ApplyToFrame</span></code> are now available,
providing utilities for selecting columns to which a transformer should be applied
in a flexible way. For more details, see the <a class="reference internal" href="modules/multi_column_operations/selectors.html#user-guide-selectors"><span class="std std-ref">User guide</span></a>
and the <a class="reference internal" href="auto_examples/0090_apply_to_cols.html#sphx-glr-auto-examples-0090-apply-to-cols-py"><span class="std std-ref">example</span></a>.</p></li>
<li><p>The <a class="reference internal" href="reference/generated/skrub.SquashingScaler.html#skrub.SquashingScaler" title="skrub.SquashingScaler"><code class="xref py py-class docutils literal notranslate"><span class="pre">SquashingScaler</span></code></a> has been added: it robustly rescales and smoothly
clips numeric columns, enabling neural networks to handle numeric columns more reliably.
See the <a class="reference internal" href="auto_examples/0100_squashing_scaler.html#sphx-glr-auto-examples-0100-squashing-scaler-py"><span class="std std-ref">example</span></a>.</p></li>
</ul>
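<p>The idea behind the <code class="docutils literal notranslate"><span class="pre">SquashingScaler</span></code> (robust rescaling followed by smooth clipping) can be sketched in pure Python. This is a simplified approximation, not skrub's code: values are centered on the median, divided by the interquartile range, then passed through the soft-clipping function z / sqrt(1 + (z / B)**2), which is close to the identity near 0 and stays strictly within (-B, B). The function name <code class="docutils literal notranslate"><span class="pre">squash</span></code>, the fallback used when the interquartile range is zero and the default B = 3 are assumptions; skrub's exact constants and edge-case handling may differ.</p>

```python
import math
import statistics


def squash(values, max_absolute_value=3.0):
    """Hypothetical sketch: robust scaling followed by smooth clipping.

    Assumption: center on the median, scale by the interquartile range,
    then apply z / sqrt(1 + (z / B) ** 2), which tends to +/-B for large |z|.
    """
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed fallback so that a constant column does not divide by zero.
    scale = iqr if iqr != 0 else 1.0
    b = max_absolute_value
    squashed = []
    for x in values:
        z = (x - median) / scale
        squashed.append(z / math.sqrt(1 + (z / b) ** 2))
    return squashed


data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]  # one extreme outlier
# The outlier ends up just below 3 instead of dominating the scale.
print([round(v, 3) for v in squash(data)])
```

<p>Because the soft-clipping function is monotonic, the order of the values is preserved; only extreme values are compressed toward the bound.</p>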
</section>
<section id="id12">
<h3>New features<a class="headerlink" href="#id12" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p>The Skrub DataOps are a new mechanism for building machine-learning
pipelines that handle multiple tables and make it easy to describe their
hyperparameter spaces. Main PR: <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1233">#1233</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.
Additional work from other contributors can be found
<a class="reference external" href="https://github.com/skrub-data/skrub/issues?q=merged%3A%3C2025-07-24%20label%3Adata_ops">here</a>:
<a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a> provided very important help by
trying the DataOps on many use-cases and datasets, providing feedback and
suggesting improvements, improving the examples (including creating all the
figures in the examples) and adding jitter to the parallel coordinate plots,
<a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a> experimented with the DataOps,
suggested improvements and improved the examples, <a class="reference external" href="https://github.com/gaelvaroquaux">Gaël Varoquaux</a>, <a class="reference external" href="https://github.com/glemaitre">Guillaume Lemaitre</a>, <a class="reference external" href="https://github.com/adrinjalali">Adrin Jalali</a>, <a class="reference external" href="https://github.com/ogrisel">Olivier Grisel</a> and others participated
through many discussions in defining the requirements and the public API.
See <a class="reference internal" href="auto_examples/data_ops/index.html#data-ops-examples-ref"><span class="std std-ref">the examples</span></a> for
an introduction.</p></li>
<li><p>The <code class="xref py py-mod docutils literal notranslate"><span class="pre">skrub.selectors</span></code> module provides utilities for selecting columns to which
a transformer should be applied in a flexible way. The module was created in
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/895">#895</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a> and added to the public API
in <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1341">#1341</a> by <a class="reference external" href="https://github.com/jeromedockes">Jérôme Dockès</a>.</p></li>
<li><p>The <a class="reference internal" href="reference/generated/skrub.DropUninformative.html#skrub.DropUninformative" title="skrub.DropUninformative"><code class="xref py py-class docutils literal notranslate"><span class="pre">DropUninformative</span></code></a> transformer is now available. This transformer
employs different heuristics to detect columns that are not likely to bring
useful information for training a model.
The current implementation includes detection of columns that contain only a
single value (constant columns), only missing values, or all unique values (such
as IDs). <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1313">#1313</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.get_config.html#skrub.get_config" title="skrub.get_config"><code class="xref py py-func docutils literal notranslate"><span class="pre">get_config()</span></code></a>, <a class="reference internal" href="reference/generated/skrub.set_config.html#skrub.set_config" title="skrub.set_config"><code class="xref py py-func docutils literal notranslate"><span class="pre">set_config()</span></code></a> and <a class="reference internal" href="reference/generated/skrub.config_context.html#skrub.config_context" title="skrub.config_context"><code class="xref py py-func docutils literal notranslate"><span class="pre">config_context()</span></code></a> are now available
to configure settings for dataframe display and expressions. <a class="reference internal" href="reference/generated/skrub.patch_display.html#skrub.patch_display" title="skrub.patch_display"><code class="xref py py-func docutils literal notranslate"><span class="pre">patch_display()</span></code></a>
and <a class="reference internal" href="reference/generated/skrub.unpatch_display.html#skrub.unpatch_display" title="skrub.unpatch_display"><code class="xref py py-func docutils literal notranslate"><span class="pre">unpatch_display()</span></code></a> are deprecated and will be removed in the next release
of skrub. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1427">#1427</a> by <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.
The global configuration includes the parameter <code class="docutils literal notranslate"><span class="pre">cardinality_threshold</span></code>, which
controls the threshold used to warn users when their dataset contains high-cardinality
columns. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1498">#1498</a> by <a class="reference external" href="https://github.com/rouk1">rouk1</a>.
Additionally, the parameter <code class="docutils literal notranslate"><span class="pre">float_precision</span></code>
controls the number of significant digits displayed for floating-point values
in reports. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1470">#1470</a> by <a class="reference external" href="https://github.com/georgescutelnicu">George S</a>.</p></li>
<li><p>Added the <a class="reference internal" href="reference/generated/skrub.SquashingScaler.html#skrub.SquashingScaler" title="skrub.SquashingScaler"><code class="xref py py-class docutils literal notranslate"><span class="pre">SquashingScaler</span></code></a>, a transformer that
robustly rescales and smoothly clips numeric columns,
enabling more robust handling of numeric columns
with neural networks. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1310">#1310</a> by <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a> and
<a class="reference external" href="https://github.com/dholzmueller">David Holzmüller</a>.</p></li>
<li><p><code class="xref py py-func docutils literal notranslate"><span class="pre">datasets.toy_order()</span></code> is now available to create a toy dataframe and
corresponding targets for examples.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1485">#1485</a> by <a class="reference external" href="https://github.com/canag">Antoine Canaguier-Durand</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.ApplyToCols.html#skrub.ApplyToCols" title="skrub.ApplyToCols"><code class="xref py py-class docutils literal notranslate"><span class="pre">ApplyToCols</span></code></a> and <code class="xref py py-class docutils literal notranslate"><span class="pre">ApplyToFrame</span></code> are now available: the former applies a transformer
to each selected column independently, the latter to all selected columns jointly.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1478">#1478</a> by <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.</p></li>
</ul>
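<p>The three heuristics listed above for <code class="docutils literal notranslate"><span class="pre">DropUninformative</span></code> (only missing values, a single value, all values distinct) can be illustrated with a small sketch. The function <code class="docutils literal notranslate"><span class="pre">uninformative_columns</span></code> and the dict-of-lists table format are hypothetical; in skrub the checks are controlled by parameters of the transformer and applied to dataframe columns, and the way missing values interact with the constant and unique checks here is an assumption.</p>

```python
import math


def is_missing(value):
    # Treat None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def uninformative_columns(table):
    """Hypothetical sketch: names of columns flagged by three heuristics.

    ``table`` maps column names to lists of (hashable) values.
    """
    flagged = []
    for name, values in table.items():
        present = [v for v in values if not is_missing(v)]
        if not present:
            flagged.append(name)  # only missing values
        elif len(set(present)) == 1:
            flagged.append(name)  # constant column
        elif len(set(present)) == len(values):
            flagged.append(name)  # every value distinct, e.g. an ID
    return flagged


table = {
    "id": [101, 102, 103, 104],
    "country": ["FR", "FR", "FR", "FR"],
    "comment": [None, None, None, None],
    "age": [34, 51, 34, 28],
}
print(uninformative_columns(table))  # ['id', 'country', 'comment']
```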
</section>
<section id="id13">
<h3>Changes<a class="headerlink" href="#id13" title="Link to this heading">#</a></h3>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>The default high cardinality encoder for both <a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a> and
<code class="xref py py-meth docutils literal notranslate"><span class="pre">tabular_learner()</span></code> (now <a class="reference internal" href="reference/generated/skrub.tabular_pipeline.html#skrub.tabular_pipeline" title="skrub.tabular_pipeline"><code class="xref py py-meth docutils literal notranslate"><span class="pre">tabular_pipeline()</span></code></a>) has been changed from
<a class="reference internal" href="reference/generated/skrub.GapEncoder.html#skrub.GapEncoder" title="skrub.GapEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">GapEncoder</span></code></a> to <a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a>. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1354">#1354</a> by
<a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p>
</div>
<ul class="simple">
<li><p>The <code class="docutils literal notranslate"><span class="pre">tabular_learner</span></code> function has been deprecated in favor of <a class="reference internal" href="reference/generated/skrub.tabular_pipeline.html#skrub.tabular_pipeline" title="skrub.tabular_pipeline"><code class="xref py py-func docutils literal notranslate"><span class="pre">tabular_pipeline()</span></code></a>, a name that
better reflects that it builds a scikit-learn pipeline and avoids ambiguity with the DataOps
Learner. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1493">#1493</a> by <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a> now exposes the <code class="docutils literal notranslate"><span class="pre">stop_words</span></code> argument, which is passed to the
underlying vectorizer (<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer" title="(in scikit-learn v1.8)"><code class="xref py py-class docutils literal notranslate"><span class="pre">TfidfVectorizer</span></code></a>
or <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.HashingVectorizer.html#sklearn.feature_extraction.text.HashingVectorizer" title="(in scikit-learn v1.8)"><code class="xref py py-class docutils literal notranslate"><span class="pre">HashingVectorizer</span></code></a>). <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1415">#1415</a> by
<a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.</p></li>
<li><p>A new parameter <code class="docutils literal notranslate"><span class="pre">max_association_columns</span></code> has been added to the
<a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> to skip association computation when the number of columns
exceeds the specified value. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1304">#1304</a> by <a class="reference external" href="https://github.com/victoris93">Victoria Shevchenko</a>.</p></li>
<li><p>The <cite>packaging</cite> dependency was removed.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1307">#1307</a> by <a class="reference external" href="https://github.com/jovan-stojanovic">Jovan Stojanovic</a>.</p></li>
<li><p><a class="reference internal" href="reference/generated/skrub.TextEncoder.html#skrub.TextEncoder" title="skrub.TextEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextEncoder</span></code></a>, <a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a> and <a class="reference internal" href="reference/generated/skrub.GapEncoder.html#skrub.GapEncoder" title="skrub.GapEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">GapEncoder</span></code></a> now compute the
total standard deviation norm during training (a single global constant) and
normalize their vector outputs by dividing every entry by this constant.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1274">#1274</a> by <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.</p></li>
<li><p>The <code class="xref py py-class docutils literal notranslate"><span class="pre">DropIfTooManyNulls</span></code> transformer is superseded by the
<a class="reference internal" href="reference/generated/skrub.DropUninformative.html#skrub.DropUninformative" title="skrub.DropUninformative"><code class="xref py py-class docutils literal notranslate"><span class="pre">DropUninformative</span></code></a> transformer and will be removed in a future release.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1313">#1313</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The <code class="xref py py-func docutils literal notranslate"><span class="pre">concat_horizontal()</span></code> function was replaced with <code class="xref py py-func docutils literal notranslate"><span class="pre">concat()</span></code>. Horizontal or vertical concatenation
is now controlled by the <cite>axis</cite> parameter. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1334">#1334</a> by <a class="reference external" href="https://github.com/pvprajwal">Parasa V Prajwal</a>.</p></li>
<li><p>The <a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a> and <a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a> now accept a <cite>datetime_format</cite>
parameter for specifying the format to use when parsing datetime columns.
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1358">#1358</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The <code class="xref py py-class docutils literal notranslate"><span class="pre">SimpleCleaner</span></code> has been removed. Use <a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a> instead. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1370">#1370</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The periodic encoding for <code class="docutils literal notranslate"><span class="pre">day_in_year</span></code> has been removed from the <a class="reference internal" href="reference/generated/skrub.DatetimeEncoder.html#skrub.DatetimeEncoder" title="skrub.DatetimeEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatetimeEncoder</span></code></a> because it was
redundant. The feature itself is still added if the corresponding flag is set to <code class="docutils literal notranslate"><span class="pre">True</span></code>. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1396">#1396</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
<li><p>The naming scheme used for the features generated by <a class="reference internal" href="reference/generated/skrub.TextEncoder.html#skrub.TextEncoder" title="skrub.TextEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextEncoder</span></code></a>, <a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a>, <a class="reference internal" href="reference/generated/skrub.MinHashEncoder.html#skrub.MinHashEncoder" title="skrub.MinHashEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">MinHashEncoder</span></code></a>,
<a class="reference internal" href="reference/generated/skrub.DatetimeEncoder.html#skrub.DatetimeEncoder" title="skrub.DatetimeEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatetimeEncoder</span></code></a> has been standardized. Now features generated by all encoders have indices in the range
<code class="docutils literal notranslate"><span class="pre">[0,</span> <span class="pre">n_components-1]</span></code>, rather than <code class="docutils literal notranslate"><span class="pre">[1,</span> <span class="pre">n_components]</span></code>. Additionally, columns with an empty name are assigned a default