Simplify code using skrub TableReport and TableVectorizer

- Add a notebook and a video showing how all the pandas code in the [Visual inspection of data subsection](https://inria.github.io/scikit-learn-mooc/python_scripts/01_tabular_data_exploration.html#visual-inspection-of-the-data) can be simplified using [skrub.TableReport](https://skrub-data.org/stable/reference/generated/skrub.TableReport.html)
- Replace `ColumnTransformer` with [skrub.TableVectorizer](https://skrub-data.org/stable/reference/generated/skrub.TableVectorizer.html) starting from the [Using numerical and categorical variables together notebook](https://inria.github.io/scikit-learn-mooc/python_scripts/03_categorical_pipeline_column_transformer.html)
  - In the same notebook, section [Fitting a more powerful model](https://inria.github.io/scikit-learn-mooc/python_scripts/03_categorical_pipeline_column_transformer.html#fitting-a-more-powerful-model), replace `OrdinalEncoder` with `skrub.ToCategorical`
  - Explicitly mention that `TableVectorizer` selects columns automatically based on each column's `dtype`
  - Introduce the concept of "low/high cardinality" and demonstrate the effect of `cardinality_threshold` on the "native-country" column of the Adult Census dataset
  - Update the "Visualizing scikit-learn pipelines" video to use `TableVectorizer` (with scikit-learn version >= 1.8)
  - Modify the wrap-up quizzes that use the Ames Housing dataset (i.e. M1, M4 and M5) to select a subset of numerical columns with pandas
- Redo the dataset descriptions using `TableReport`
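
To make the "low/high cardinality" item concrete, here is a minimal pure-Python sketch of the idea. It is not skrub's implementation: the helper `split_by_cardinality` and the toy values are hypothetical, and it assumes a column counts as low cardinality when its number of distinct values is strictly below the threshold, which is how the `cardinality_threshold` documentation reads to me (worth double-checking before reusing this in the notebook).

```python
# Toy illustration of "low" vs "high" cardinality; this is NOT skrub code.
# Assumption: a column is "low cardinality" when its number of distinct
# values is strictly below the threshold.


def split_by_cardinality(columns, cardinality_threshold):
    """Group column names by how many distinct values each column holds."""
    groups = {"low_cardinality": [], "high_cardinality": []}
    for name, values in columns.items():
        n_unique = len(set(values))
        if n_unique < cardinality_threshold:
            groups["low_cardinality"].append(name)
        else:
            groups["high_cardinality"].append(name)
    return groups


# Made-up values that only mimic the shape of two Adult Census columns.
toy_columns = {
    "sex": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "native-country": ["United-States", "Mexico", "India", "Cuba", "Peru", "Japan"],
}

# With a small threshold, "native-country" (6 distinct values) is high cardinality.
print(split_by_cardinality(toy_columns, cardinality_threshold=5))

# With a larger threshold, both columns fall in the low-cardinality group.
print(split_by_cardinality(toy_columns, cardinality_threshold=10))
```

In `TableVectorizer`, the two groups would then be routed to the encoders given by its `low_cardinality` and `high_cardinality` parameters, so moving the threshold changes which encoder "native-country" receives.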

Simplify code using skrub TableReport and TableVectorizer #866