dev/CHANGES.html
+11 -7 (11 additions & 7 deletions)
@@ -611,8 +611,8 @@ <h3>Changes<a class="headerlink" href="#id1" title="Link to this heading">#</a><
611
611
</div>
612
612
<ul class="simple">
613
613
<li><p><a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a> now exposes the <code class="docutils literal notranslate"><span class="pre">stop_words</span></code> argument, which is passed to the
<li><p>A new parameter <code class="docutils literal notranslate"><span class="pre">max_association_columns</span></code> has been added to the
618
618
<a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> to skip association computation when the number of columns
@@ -628,6 +628,10 @@ <h3>Changes<a class="headerlink" href="#id1" title="Link to this heading">#</a><
628
628
parameter for specifying the format to use when parsing datetime columns.
629
629
<a class="reference external" href="https://github.com/skrub-data/skrub/pull/1358">#1358</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
630
630
<li><p>The <code class="xref py py-class docutils literal notranslate"><span class="pre">SimpleCleaner</span></code> has been removed. Use <a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a> instead. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1370">#1370</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
631
+
<li><p>The naming scheme used for the features generated by <a class="reference internal" href="reference/generated/skrub.TextEncoder.html#skrub.TextEncoder" title="skrub.TextEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextEncoder</span></code></a>, <a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a>, <a class="reference internal" href="reference/generated/skrub.MinHashEncoder.html#skrub.MinHashEncoder" title="skrub.MinHashEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">MinHashEncoder</span></code></a>,
632
+
<a class="reference internal" href="reference/generated/skrub.DatetimeEncoder.html#skrub.DatetimeEncoder" title="skrub.DatetimeEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatetimeEncoder</span></code></a> has been standardized. Now features generated by all encoders have indices in the range
633
+
<code class="docutils literal notranslate"><span class="pre">[0,</span> <span class="pre">n_components-1]</span></code>, rather than <code class="docutils literal notranslate"><span class="pre">[1,</span> <span class="pre">n_components]</span></code>. Additionally, columns with an empty name are assigned a default
634
+
name that depends on the encoder used. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1405">#1405</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
631
635
<li><p>The optional dependencies ‘dev’, ‘doc’, ‘lint’ and ‘test’ have been coalesced into
632
636
‘dev’. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1404">#1404</a> by <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.</p></li>
633
637
<li><p>The <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> now supports Series in addition to Dataframes. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1420">#1420</a> by <a class="reference external" href="https://github.com/vitorpohlenz">Vitor Pohlenz</a>.</p></li>
<h3>Major changes<a class="headerlink" href="#id19" title="Link to this heading">#</a></h3>
1032
1036
<ul class="simple">
1033
1037
<li><p><code class="xref py py-class docutils literal notranslate"><span class="pre">TargetEncoder</span></code> has been removed in favor of
1034
-
<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.TargetEncoder.html#sklearn.preprocessing.TargetEncoder" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.preprocessing.TargetEncoder</span></code></a>, available since scikit-learn 1.3.</p></li>
1038
+
<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.TargetEncoder.html#sklearn.preprocessing.TargetEncoder" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.preprocessing.TargetEncoder</span></code></a>, available since scikit-learn 1.3.</p></li>
1035
1039
<li><p><a class="reference internal" href="reference/generated/skrub.Joiner.html#skrub.Joiner" title="skrub.Joiner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Joiner</span></code></a> and <a class="reference internal" href="reference/generated/skrub.fuzzy_join.html#skrub.fuzzy_join" title="skrub.fuzzy_join"><code class="xref py py-func docutils literal notranslate"><span class="pre">fuzzy_join()</span></code></a> support several ways of rescaling
1036
1040
distances; <code class="docutils literal notranslate"><span class="pre">match_score</span></code> has been replaced by <code class="docutils literal notranslate"><span class="pre">max_dist</span></code>; bugs which
1037
1041
prevented the Joiner from consistently vectorizing inputs and accepting or rejecting
@@ -1282,8 +1286,8 @@ <h3>Major changes<a class="headerlink" href="#id27" title="Link to this heading"
1282
1286
<li><p>The <a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a> has seen some major improvements and bug fixes:</p>
1283
1287
<ul class="simple">
1284
1288
<li><p>Fixes the automatic casting logic in <code class="docutils literal notranslate"><span class="pre">transform</span></code>.</p></li>
1285
-
<li><p>To avoid dimensionality explosion when a feature has two unique values, the default encoder (<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">OneHotEncoder</span></code></a>) now drops one of the two vectors (see parameter <cite>drop=”if_binary”</cite>).</p></li>
1286
-
<li><p><code class="docutils literal notranslate"><span class="pre">fit_transform</span></code> and <code class="docutils literal notranslate"><span class="pre">transform</span></code> can now return unencoded features, like the <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">ColumnTransformer</span></code></a>’s behavior. Previously, a <code class="docutils literal notranslate"><span class="pre">RuntimeError</span></code> was raised.</p></li>
1289
+
<li><p>To avoid dimensionality explosion when a feature has two unique values, the default encoder (<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">OneHotEncoder</span></code></a>) now drops one of the two vectors (see parameter <cite>drop=”if_binary”</cite>).</p></li>
1290
+
<li><p><code class="docutils literal notranslate"><span class="pre">fit_transform</span></code> and <code class="docutils literal notranslate"><span class="pre">transform</span></code> can now return unencoded features, like the <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">ColumnTransformer</span></code></a>’s behavior. Previously, a <code class="docutils literal notranslate"><span class="pre">RuntimeError</span></code> was raised.</p></li>
1287
1291
</ul>
1288
1292
<p><a class="reference external" href="https://github.com/skrub-data/skrub/pull/300">#300</a> by <a class="reference external" href="https://github.com/LilianBoulard">Lilian Boulard</a></p>
1289
1293
</li>
@@ -1344,7 +1348,7 @@ <h3>Major changes<a class="headerlink" href="#id29" title="Link to this heading"
1344
1348
<li><dl class="simple">
1345
1349
<dt>Improvements to the <a class="reference internal" href="reference/generated/skrub.MinHashEncoder.html#skrub.MinHashEncoder" title="skrub.MinHashEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">MinHashEncoder</span></code></a></dt><dd><ul class="simple">
1346
1350
<li><p>It is now possible to fit multiple columns simultaneously with the <a class="reference internal" href="reference/generated/skrub.MinHashEncoder.html#skrub.MinHashEncoder" title="skrub.MinHashEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">MinHashEncoder</span></code></a>.
1347
-
Very useful when using for instance the <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html#sklearn.compose.make_column_transformer" title="(in scikit-learn v1.6)"><code class="xref py py-func docutils literal notranslate"><span class="pre">make_column_transformer()</span></code></a> function,
1351
+
Very useful when using for instance the <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html#sklearn.compose.make_column_transformer" title="(in scikit-learn v1.7)"><code class="xref py py-func docutils literal notranslate"><span class="pre">make_column_transformer()</span></code></a> function,
1348
1352
on multiple columns.</p></li>
1349
1353
</ul>
1350
1354
</dd>
@@ -1448,7 +1452,7 @@ <h3>Major changes<a class="headerlink" href="#id34" title="Link to this heading"
1448
1452
<li><p><a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a>: Added automatic transform through the
columns automatically based on their type. It provides a replacement
1451
-
for scikit-learn’s <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">ColumnTransformer</span></code></a> simpler to use on heterogeneous
1455
+
for scikit-learn’s <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">ColumnTransformer</span></code></a> simpler to use on heterogeneous
1452
1456
pandas DataFrame. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/167">#167</a> by <a class="reference external" href="https://github.com/LilianBoulard">Lilian Boulard</a></p></li>
1453
1457
<li><p><strong>Backward incompatible change to</strong> <a class="reference internal" href="reference/generated/skrub.GapEncoder.html#skrub.GapEncoder" title="skrub.GapEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">GapEncoder</span></code></a>: The <a class="reference internal" href="reference/generated/skrub.GapEncoder.html#skrub.GapEncoder" title="skrub.GapEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">GapEncoder</span></code></a> now only
1454
1458
supports two-dimensional inputs of shape (n_samples, n_features).
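Note on the #1405 entry added in this diff: the changelog says encoder feature indices now run over `[0, n_components-1]` instead of `[1, n_components]`. The sketch below only illustrates that index shift and how downstream code that refers to the old column names might build a rename map. The helper `feature_names` and the `<column>_<index>` name format are assumptions made for illustration; they are not skrub's API, and the actual names an encoder emits may be formatted differently.

```python
def feature_names(column, n_components, zero_based=True):
    """Return illustrative output names for one encoded column.

    Hypothetical helper: the "<column>_<index>" pattern is an assumption,
    only the start of the index range reflects the changelog entry.
    """
    start = 0 if zero_based else 1
    return [f"{column}_{i}" for i in range(start, start + n_components)]


# Names under the old scheme, indices in [1, n_components]
old_names = feature_names("city", 3, zero_based=False)
# Names under the new scheme, indices in [0, n_components-1]
new_names = feature_names("city", 3)

# Map each old name to the new name at the same position, e.g. to update
# code that selects encoded columns by name.
rename_map = dict(zip(old_names, new_names))

print(old_names)
print(new_names)
print(rename_map)
```

Since the old and new ranges overlap (`city_1` exists in both), any renaming based on such a map should be applied in a single pass rather than one name at a time.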