|
1 | 1 | """ |
2 | | -Introduction to wrangling pipelines for machine-learning skrub DataOps |
| 2 | +Tutorial: Using Data Ops to build a machine-learning pipeline |
3 | 3 | ======================================================================= |
4 | 4 |
|
5 | | -This example shows data wrangling for machine learning using Skrub's |
6 | | -:ref:`DataOps <user_guide_data_ops_index>`. |
7 | | -
|
8 | | -The challenge of data-wrangling for machine learning is the need to |
9 | | -apply the wrangling operations to new data, for prediction. |
10 | | -
|
11 | | -Skrub's DataOps build pipelines that blend data wrangling and machine |
12 | | -learning by recording all the operations involved in pre-processing data |
13 | | -and training models. They result in an a full *learner* that starts from the |
14 | | -raw data. We will also how show it can be saved, loaded back, and then used to make |
15 | | -predictions on new, unseen data. |
16 | | -
|
17 | | -This example is meant to be an introduction to Skrub DataOps, and as such it |
18 | | -will not cover all the features. Further examples in the gallery |
19 | | -:ref:`data_ops_examples_ref` go into more detail on Skrub DataOps |
20 | | -for more complex tasks. |
21 | | -
|
22 | 5 | .. currentmodule:: skrub |
23 | 6 |
|
24 | 7 | .. |fetch_employee_salaries| replace:: :func:`datasets.fetch_employee_salaries` |
|
27 | 10 | .. |skb.mark_as_X| replace:: :meth:`DataOp.skb.mark_as_X` |
28 | 11 | .. |skb.mark_as_y| replace:: :meth:`DataOp.skb.mark_as_y` |
29 | 12 | .. |TableVectorizer| replace:: :class:`TableVectorizer` |
| 13 | +.. |ToDatetime| replace:: :class:`ToDatetime` |
30 | 14 | .. |skb.apply| replace:: :meth:`.skb.apply() <DataOp.skb.apply>` |
31 | 15 | .. |HistGradientBoostingRegressor| replace:: |
32 | 16 | :class:`~sklearn.ensemble.HistGradientBoostingRegressor` |
|
35 | 19 | .. |make_randomized_search| replace:: |
36 | 20 | :meth:`.skb.make_randomized_search <DataOp.skb.make_randomized_search>` |
37 | 21 |
|
| 22 | +This example shows how we can use skrub's |
| 23 | +:ref:`DataOps <user_guide_data_ops_index>` for building a machine learning pipeline. |
| 24 | +
|
| 25 | +The challenge of preparing data for machine learning is the need to |
| 26 | +apply the same data preparation and wrangling operations to new data, for prediction. |
| 27 | +
|
| 28 | +Skrub's DataOps build pipelines that blend data wrangling and machine |
| 29 | +learning by recording all the operations involved in pre-processing data |
| 30 | +and training models, as well as the state of the transformers and models used to |
| 31 | +make predictions. |
| 32 | +
|
| 33 | +.. admonition:: What is a state? |
| 34 | + :collapsible: closed |
| 35 | +
|
| 36 | + The state of a transformer or model refers to the internal parameters and |
| 37 | + attributes that are learned or set during the fitting process. For example, |
| 38 | + in a :class:`~sklearn.preprocessing.StandardScaler`, the state would include |
| 39 | + the mean and standard deviation calculated from the training data. |
| 40 | + In a pre-processing transformer like |ToDatetime|, the state would include the |
| 41 | + inferred datetime format based on the data it was fitted on. |
| 42 | + In a machine learning model like |HistGradientBoostingRegressor|, the state |
| 43 | + would include the fitted parameters of the model after training on the data. |
| 44 | +
|
| 45 | +The result of building a DataOps plan is a *learner*, an object with an interface |
| 46 | +similar to that of a scikit-learn estimator, but which contains all the steps in the |
| 47 | +data preparation and model training process, along with the state of all the |
| 48 | +transformers and models. This makes it possible to save the learner, load it back later, |
| 49 | +and use it to make predictions on new data. |
| 50 | +
|
| 51 | +This example is meant to be an introduction to Skrub DataOps, and as such it |
| 52 | +will not cover all the features. Further examples in the gallery |
| 53 | +:ref:`data_ops_examples_ref` go into more detail on Skrub DataOps |
| 54 | +for more complex tasks. |
| 55 | +
|
| 56 | +
|
38 | 57 | """ |
39 | 58 |
|
40 | 59 | # %% |
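
The notion of *state* described in the "What is a state?" admonition can be illustrated with a minimal sketch. It uses only scikit-learn's `StandardScaler` and the standard-library `pickle` module, not skrub's DataOps or learner API, and the toy values in `X_train` and `X_new` are assumptions chosen for illustration. The same save, load, and predict idea is what the docstring attributes to a learner.

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data with a single numeric feature (illustrative values only).
X_train = np.array([[1.0], [2.0], [3.0]])

# Fitting computes the state from the training data.
scaler = StandardScaler()
scaler.fit(X_train)

# scikit-learn stores learned state in attributes ending with an underscore.
learned_mean = scaler.mean_    # per-feature mean of the training data
learned_scale = scaler.scale_  # per-feature standard deviation of the training data

# Serialize the fitted object and restore it, as one would when saving it to
# disk and loading it back later.
restored = pickle.loads(pickle.dumps(scaler))

# The restored object applies the state learned on X_train to unseen data.
X_new = np.array([[4.0]])
scaled_new = restored.transform(X_new)
```

Because the mean and scale travel with the pickled object, new data is transformed with the statistics learned on the training data rather than with statistics recomputed on the new data.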
|