Sync with Inria repo

ArturoAmorQ · web-flow · commit 18a19cf49f78 · 2026-03-20T04:14:21.000-06:00
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -10,7 +10,7 @@ repos:
       exclude: notebooks
       exclude_types: [svg]
 - repo: https://github.com/psf/black
-  rev: 23.1.0
+  rev: 25.11.0
   hooks:
   -   id: black
 - repo: https://github.com/astral-sh/ruff-pre-commit
diff --git a/environment-dev.yml b/environment-dev.yml
@@ -14,4 +14,4 @@ dependencies:
   - packaging
   - pip
   - pip:
-    - jupyter-book >= 0.11
+    - jupyter-book < 2.0
diff --git a/jupyter-book/_config.yml b/jupyter-book/_config.yml
@@ -47,7 +47,7 @@ html:
     <div>
       <div class="mooc_add">
        <a href="https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn">Join the full MOOC experience</a>
-       <a href="https://certification.probabl.ai/">Get officially certified!</a>
+       <a href="https://probabl.ai/certification?utm_source=inria&utm_medium=mooc&utm_campaign=2026_inria_mooc_referrals">Get officially certified!</a>
       </div>
       Brought to you under a <a href="https://github.com/INRIA/scikit-learn-mooc/blob/main/LICENSE">CC-BY License</a> by
       <a href="https://learninglab.inria.fr">Inria Learning Lab</a>,
diff --git a/notebooks/cross_validation_validation_curve.ipynb b/notebooks/cross_validation_validation_curve.ipynb
@@ -259,18 +259,18 @@
     "errors made during the data collection process (besides not measuring the\n",
     "unobserved input feature).\n",
     "\n",
-    "One extreme case could happen if there where samples in the dataset with\n",
-    "exactly the same input feature values but different values for the target\n",
-    "variable. That is very unlikely in real life settings, but could the case if\n",
-    "all features are categorical or if the numerical features were discretized\n",
-    "or rounded up naively. In our example, we can imagine two houses having\n",
-    "the exact same features in our dataset, but having different prices because\n",
-    "of the (unmeasured) seller's rush.\n",
-    "\n",
-    "Apart from these extreme case, it's hard to know for sure what should qualify\n",
-    "or not as noise and which kind of \"noise\" as introduced above is dominating.\n",
-    "But in practice, the best ways to make our predictive models robust to noise\n",
-    "are to avoid overfitting models by:\n",
+    "One extreme case could happen if there where samples in the dataset with exactly\n",
+    "the same input feature values but different values for the target variable. That\n",
+    "is very unlikely in real life settings, but could be the case if all features\n",
+    "are categorical or if the numerical features were discretized or rounded up\n",
+    "naively. In our example, we can imagine two houses having the exact same\n",
+    "features in our dataset, but having different prices because of the (unmeasured)\n",
+    "seller's rush.\n",
+    "\n",
+    "Apart from this extreme case, it's hard to know for sure what should qualify or\n",
+    "not as noise and which kind of \"noise\" as introduced above is dominating. But in\n",
+    "practice, the best way to make our predictive models robust to noise is to\n",
+    "avoid overfitting models by:\n",
     "\n",
     "- selecting models that are simple enough or with tuned hyper-parameters as\n",
     "  explained in this module;\n",
diff --git a/notebooks/linear_models_ex_01.ipynb b/notebooks/linear_models_ex_01.ipynb
@@ -43,7 +43,7 @@
     "penguins = pd.read_csv(\"../datasets/penguins_regression.csv\")\n",
     "feature_name = \"Flipper Length (mm)\"\n",
     "target_name = \"Body Mass (g)\"\n",
-    "data, target = penguins[[feature_name]], penguins[target_name]"
+    "data, target = penguins[[feature_name]], penguins[[target_name]]"
    ]
   },
   {
diff --git a/notebooks/linear_models_sol_01.ipynb b/notebooks/linear_models_sol_01.ipynb
@@ -43,7 +43,7 @@
     "penguins = pd.read_csv(\"../datasets/penguins_regression.csv\")\n",
     "feature_name = \"Flipper Length (mm)\"\n",
     "target_name = \"Body Mass (g)\"\n",
-    "data, target = penguins[[feature_name]], penguins[target_name]"
+    "data, target = penguins[[feature_name]], penguins[[target_name]]"
    ]
   },
   {
@@ -153,7 +153,7 @@
     "def goodness_fit_measure(true_values, predictions):\n",
     "    # we compute the error between the true values and the predictions of our\n",
     "    # model\n",
-    "    errors = np.ravel(true_values) - np.ravel(predictions)\n",
+    "    errors = true_values - predictions\n",
     "    # We have several possible strategies to reduce all errors to a single value.\n",
     "    # Computing the mean error (sum divided by the number of element) might seem\n",
     "    # like a good solution. However, we have negative errors that will misleadingly\n",
diff --git a/python_scripts/cross_validation_validation_curve.py b/python_scripts/cross_validation_validation_curve.py
@@ -202,16 +202,16 @@
 #
 # One extreme case could happen if there where samples in the dataset with
 # exactly the same input feature values but different values for the target
-# variable. That is very unlikely in real life settings, but could the case if
-# all features are categorical or if the numerical features were discretized
-# or rounded up naively. In our example, we can imagine two houses having
-# the exact same features in our dataset, but having different prices because
-# of the (unmeasured) seller's rush.
+# variable. That is very unlikely in real life settings, but could be the case
+# if all features are categorical or if the numerical features were discretized
+# or rounded up naively. In our example, we can imagine two houses having the
+# exact same features in our dataset, but having different prices because of the
+# (unmeasured) seller's rush.
 #
-# Apart from these extreme case, it's hard to know for sure what should qualify
+# Apart from this extreme case, it's hard to know for sure what should qualify
 # or not as noise and which kind of "noise" as introduced above is dominating.
-# But in practice, the best ways to make our predictive models robust to noise
-# are to avoid overfitting models by:
+# But in practice, the best way to make our predictive models robust to noise
+# is to avoid overfitting models by:
 #
 # - selecting models that are simple enough or with tuned hyper-parameters as
 #   explained in this module;
diff --git a/python_scripts/datasets_ames_housing.py b/python_scripts/datasets_ames_housing.py
@@ -169,7 +169,6 @@
 from sklearn.impute import SimpleImputer
 from sklearn.pipeline import make_pipeline
 
-
 numerical_features = [
     "LotFrontage",
     "LotArea",
diff --git a/python_scripts/ensemble_bagging.py b/python_scripts/ensemble_bagging.py
@@ -356,7 +356,6 @@ def bootstrap_sample(data, target, seed=0):
 from sklearn.preprocessing import MinMaxScaler
 from sklearn.pipeline import make_pipeline
 
-
 polynomial_regressor = make_pipeline(
     MinMaxScaler(),
     PolynomialFeatures(degree=4, include_bias=False),
diff --git a/python_scripts/feature_selection_introduction.py b/python_scripts/feature_selection_introduction.py
@@ -57,7 +57,6 @@
 from sklearn.feature_selection import f_classif
 from sklearn.pipeline import make_pipeline
 
-
 model_with_selection = make_pipeline(
     SelectKBest(score_func=f_classif, k=2),
     RandomForestClassifier(n_jobs=2),
diff --git a/python_scripts/linear_models_ex_01.py b/python_scripts/linear_models_ex_01.py
@@ -40,7 +40,7 @@
 penguins = pd.read_csv("../datasets/penguins_regression.csv")
 feature_name = "Flipper Length (mm)"
 target_name = "Body Mass (g)"
-data, target = penguins[[feature_name]], penguins[target_name]
+data, target = penguins[[feature_name]], penguins[[target_name]]
 
 # %% [markdown]
 # ### Model definition
diff --git a/python_scripts/linear_models_feature_engineering_classification.py b/python_scripts/linear_models_feature_engineering_classification.py
@@ -84,7 +84,6 @@
 import matplotlib.pyplot as plt
 from matplotlib.colors import ListedColormap
 
-
 _, axs = plt.subplots(ncols=3, figsize=(14, 4), constrained_layout=True)
 
 common_scatter_plot_params = dict(
diff --git a/python_scripts/linear_models_sol_01.py b/python_scripts/linear_models_sol_01.py
@@ -34,7 +34,7 @@
 penguins = pd.read_csv("../datasets/penguins_regression.csv")
 feature_name = "Flipper Length (mm)"
 target_name = "Body Mass (g)"
-data, target = penguins[[feature_name]], penguins[target_name]
+data, target = penguins[[feature_name]], penguins[[target_name]]
 
 # %% [markdown]
 # ### Model definition
@@ -107,7 +107,7 @@ def linear_model_flipper_mass(
 def goodness_fit_measure(true_values, predictions):
     # we compute the error between the true values and the predictions of our
     # model
-    errors = np.ravel(true_values) - np.ravel(predictions)
+    errors = true_values - predictions
     # We have several possible strategies to reduce all errors to a single value.
     # Computing the mean error (sum divided by the number of element) might seem
     # like a good solution. However, we have negative errors that will misleadingly
diff --git a/python_scripts/parameter_tuning_sol_03.py b/python_scripts/parameter_tuning_sol_03.py
@@ -80,8 +80,8 @@
 from sklearn.model_selection import RandomizedSearchCV
 
 param_distributions = {
-    "kneighborsregressor__n_neighbors": np.logspace(0, 3, num=10).astype(
-        np.int32
+    "kneighborsregressor__n_neighbors": (
+        np.logspace(0, 3, num=10).astype(np.int32)
     ),
     "standardscaler__with_mean": [True, False],
     "standardscaler__with_std": [True, False],
diff --git a/python_scripts/trees_sol_01.py b/python_scripts/trees_sol_01.py
@@ -71,7 +71,6 @@
 
 from sklearn.inspection import DecisionBoundaryDisplay
 
-
 tab10_norm = mpl.colors.Normalize(vmin=-0.5, vmax=8.5)
 
 palette = ["tab:blue", "tab:green", "tab:orange"]
diff --git a/requirements-dev.txt b/requirements-dev.txt
@@ -4,7 +4,7 @@ matplotlib
 seaborn >= 0.13
 plotly
 skrub
-jupyter-book>=0.11
+jupyter-book < 2.0
 jupytext
 beautifulsoup4
 IPython
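
Note on the `linear_models_*_01` hunks: the diff does not state why `np.ravel` was dropped, so the following is only one plausible reading. In pandas, `penguins[target_name]` returns a 1-D Series while `penguins[[target_name]]` returns a 2-D single-column DataFrame. If the predictions are also 2-D with one column, a plain subtraction already lines up element by element. The sketch below uses NumPy arrays with made-up body-mass values (not taken from the penguins dataset) to show the shapes involved; all variable names are hypothetical.

```python
import numpy as np

# Hypothetical values standing in for body masses (g); not real dataset rows.
true_1d = np.array([3750.0, 3800.0, 3250.0])        # shape (3,), like a single-bracket selection
true_2d = true_1d.reshape(-1, 1)                    # shape (3, 1), like a double-bracket selection
pred_2d = np.array([[3700.0], [3900.0], [3300.0]])  # shape (3, 1), assumed prediction shape

# Matching 2-D shapes: element-wise errors, no flattening required.
errors_matched = true_2d - pred_2d

# Mixed 1-D / 2-D shapes: NumPy broadcasts to a (3, 3) matrix, which is
# rarely what an error computation intends.
errors_broadcast = true_1d - pred_2d

# Flattening both sides first (the approach on the removed line) also gives
# element-wise errors, as a 1-D array.
errors_raveled = np.ravel(true_1d) - np.ravel(pred_2d)

print(errors_matched.shape, errors_broadcast.shape, errors_raveled.shape)
```

Under these assumptions, the matched 2-D subtraction and the raveled subtraction hold the same numbers and differ only in shape.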
