Commit 4f30133

Pushing the docs to dev/ for branch: main, commit f070bdec4f71d88fd0ceaca4a7f299129d71c9cf
1 parent e6619b6 commit 4f30133

206 files changed

Lines changed: 22006 additions & 21882 deletions

dev/CHANGES.html

Lines changed: 11 additions & 7 deletions
@@ -611,8 +611,8 @@ <h3>Changes<a class="headerlink" href="#id1" title="Link to this heading">#</a><
 </div>
 <ul class="simple">
 <li><p><a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a> now exposes the <code class="docutils literal notranslate"><span class="pre">stop_words</span></code> argument, which is passed to the
-underlying vectorizer (<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">TfidfVectorizer</span></code></a>,
-or <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.HashingVectorizer.html#sklearn.feature_extraction.text.HashingVectorizer" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">HashingVectorizer</span></code></a>). <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1415">#1415</a> by
+underlying vectorizer (<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">TfidfVectorizer</span></code></a>,
+or <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.HashingVectorizer.html#sklearn.feature_extraction.text.HashingVectorizer" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">HashingVectorizer</span></code></a>). <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1415">#1415</a> by
 <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.</p></li>
 <li><p>A new parameter <code class="docutils literal notranslate"><span class="pre">max_association_columns</span></code> has been added to the
 <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> to skip association computation when the number of columns
@@ -628,6 +628,10 @@ <h3>Changes<a class="headerlink" href="#id1" title="Link to this heading">#</a><
 parameter for specifying the format to use when parsing datetime columns.
 <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1358">#1358</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
 <li><p>The <code class="xref py py-class docutils literal notranslate"><span class="pre">SimpleCleaner</span></code> has been removed. use <a class="reference internal" href="reference/generated/skrub.Cleaner.html#skrub.Cleaner" title="skrub.Cleaner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Cleaner</span></code></a> instead. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1370">#1370</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
+<li><p>The naming scheme used for the features generated by <a class="reference internal" href="reference/generated/skrub.TextEncoder.html#skrub.TextEncoder" title="skrub.TextEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextEncoder</span></code></a>, <a class="reference internal" href="reference/generated/skrub.StringEncoder.html#skrub.StringEncoder" title="skrub.StringEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringEncoder</span></code></a>, <a class="reference internal" href="reference/generated/skrub.MinHashEncoder.html#skrub.MinHashEncoder" title="skrub.MinHashEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">MinHashEncoder</span></code></a>,
+<a class="reference internal" href="reference/generated/skrub.DatetimeEncoder.html#skrub.DatetimeEncoder" title="skrub.DatetimeEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatetimeEncoder</span></code></a> has been standardized. Now features generated by all encoders have indices in the range
+<code class="docutils literal notranslate"><span class="pre">[0,</span> <span class="pre">n_components-1]</span></code>, rather than <code class="docutils literal notranslate"><span class="pre">[1,</span> <span class="pre">n_components]</span></code>. Additionally, columns with empty name are assigned a default
+name that depends on the encoder used. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1405">#1405</a> by <a class="reference external" href="https://github.com/rcap107">Riccardo Cappuzzo</a>.</p></li>
 <li><p>The optional dependencies ‘dev’, ‘doc’, ‘lint’ and ‘test’ have been coalesced into
 ‘dev’. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1404">#1404</a> by <a class="reference external" href="https://github.com/Vincent-Maladiere">Vincent Maladiere</a>.</p></li>
 <li><p>The <a class="reference internal" href="reference/generated/skrub.TableReport.html#skrub.TableReport" title="skrub.TableReport"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableReport</span></code></a> now supports Series in addition to Dataframes. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/1420">#1420</a> by <a class="reference external" href="https://github.com/vitorpohlenz">Vitor Pohlenz</a>.</p></li>
@@ -1031,7 +1035,7 @@ <h2>skrub release 0.1.0<a class="headerlink" href="#skrub-release-0-1-0" title="
 <h3>Major changes<a class="headerlink" href="#id19" title="Link to this heading">#</a></h3>
 <ul class="simple">
 <li><p><code class="xref py py-class docutils literal notranslate"><span class="pre">TargetEncoder</span></code> has been removed in favor of
-<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.TargetEncoder.html#sklearn.preprocessing.TargetEncoder" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.preprocessing.TargetEncoder</span></code></a>, available since scikit-learn 1.3.</p></li>
+<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.TargetEncoder.html#sklearn.preprocessing.TargetEncoder" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.preprocessing.TargetEncoder</span></code></a>, available since scikit-learn 1.3.</p></li>
 <li><p><a class="reference internal" href="reference/generated/skrub.Joiner.html#skrub.Joiner" title="skrub.Joiner"><code class="xref py py-class docutils literal notranslate"><span class="pre">Joiner</span></code></a> and <a class="reference internal" href="reference/generated/skrub.fuzzy_join.html#skrub.fuzzy_join" title="skrub.fuzzy_join"><code class="xref py py-func docutils literal notranslate"><span class="pre">fuzzy_join()</span></code></a> support several ways of rescaling
 distances; <code class="docutils literal notranslate"><span class="pre">match_score</span></code> has been replaced by <code class="docutils literal notranslate"><span class="pre">max_dist</span></code>; bugs which
 prevented the Joiner to consistently vectorize inputs and accept or reject
@@ -1282,8 +1286,8 @@ <h3>Major changes<a class="headerlink" href="#id27" title="Link to this heading"
 <li><p>The <a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a> has seen some major improvements and bug fixes:</p>
 <ul class="simple">
 <li><p>Fixes the automatic casting logic in <code class="docutils literal notranslate"><span class="pre">transform</span></code>.</p></li>
-<li><p>To avoid dimensionality explosion when a feature has two unique values, the default encoder (<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">OneHotEncoder</span></code></a>) now drops one of the two vectors (see parameter <cite>drop=”if_binary”</cite>).</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">fit_transform</span></code> and <code class="docutils literal notranslate"><span class="pre">transform</span></code> can now return unencoded features, like the <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">ColumnTransformer</span></code></a>’s behavior. Previously, a <code class="docutils literal notranslate"><span class="pre">RuntimeError</span></code> was raised.</p></li>
+<li><p>To avoid dimensionality explosion when a feature has two unique values, the default encoder (<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">OneHotEncoder</span></code></a>) now drops one of the two vectors (see parameter <cite>drop=”if_binary”</cite>).</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">fit_transform</span></code> and <code class="docutils literal notranslate"><span class="pre">transform</span></code> can now return unencoded features, like the <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">ColumnTransformer</span></code></a>’s behavior. Previously, a <code class="docutils literal notranslate"><span class="pre">RuntimeError</span></code> was raised.</p></li>
 </ul>
 <p><a class="reference external" href="https://github.com/skrub-data/skrub/pull/300">#300</a> by <a class="reference external" href="https://github.com/LilianBoulard">Lilian Boulard</a></p>
 </li>
@@ -1344,7 +1348,7 @@ <h3>Major changes<a class="headerlink" href="#id29" title="Link to this heading"
 <li><dl class="simple">
 <dt>Improvements to the <a class="reference internal" href="reference/generated/skrub.MinHashEncoder.html#skrub.MinHashEncoder" title="skrub.MinHashEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">MinHashEncoder</span></code></a></dt><dd><ul class="simple">
 <li><p>It is now possible to fit multiple columns simultaneously with the <a class="reference internal" href="reference/generated/skrub.MinHashEncoder.html#skrub.MinHashEncoder" title="skrub.MinHashEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">MinHashEncoder</span></code></a>.
-Very useful when using for instance the <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html#sklearn.compose.make_column_transformer" title="(in scikit-learn v1.6)"><code class="xref py py-func docutils literal notranslate"><span class="pre">make_column_transformer()</span></code></a> function,
+Very useful when using for instance the <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html#sklearn.compose.make_column_transformer" title="(in scikit-learn v1.7)"><code class="xref py py-func docutils literal notranslate"><span class="pre">make_column_transformer()</span></code></a> function,
 on multiple columns.</p></li>
 </ul>
 </dd>
@@ -1448,7 +1452,7 @@ <h3>Major changes<a class="headerlink" href="#id34" title="Link to this heading"
 <li><p><a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a>: Added automatic transform through the
 <a class="reference internal" href="reference/generated/skrub.TableVectorizer.html#skrub.TableVectorizer" title="skrub.TableVectorizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">TableVectorizer</span></code></a> class. It transforms
 columns automatically based on their type. It provides a replacement
-for scikit-learn’s <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer" title="(in scikit-learn v1.6)"><code class="xref py py-class docutils literal notranslate"><span class="pre">ColumnTransformer</span></code></a> simpler to use on heterogeneous
+for scikit-learn’s <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer" title="(in scikit-learn v1.7)"><code class="xref py py-class docutils literal notranslate"><span class="pre">ColumnTransformer</span></code></a> simpler to use on heterogeneous
 pandas DataFrame. <a class="reference external" href="https://github.com/skrub-data/skrub/pull/167">#167</a> by <a class="reference external" href="https://github.com/LilianBoulard">Lilian Boulard</a></p></li>
 <li><p><strong>Backward incompatible change to</strong> <a class="reference internal" href="reference/generated/skrub.GapEncoder.html#skrub.GapEncoder" title="skrub.GapEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">GapEncoder</span></code></a>: The <a class="reference internal" href="reference/generated/skrub.GapEncoder.html#skrub.GapEncoder" title="skrub.GapEncoder"><code class="xref py py-class docutils literal notranslate"><span class="pre">GapEncoder</span></code></a> now only
 supports two-dimensional inputs of shape (n_samples, n_features).
Binary file not shown.