Commit d262f9d

committed
add some narrative for the quantile
1 parent a48d4d0 commit d262f9d

1 file changed

Lines changed: 44 additions & 3 deletions

content/python_files/feature_engineering.py

@@ -1429,6 +1429,12 @@ def scoring(regressor, X, y):
14291429
# In this section, we show how one can use a gradient boosting but modify the loss
14301430
# function to predict different quantiles and thus obtain an uncertainty quantification
14311431
# of the predictions.
1432+
#
1433+
# In terms of evaluation, we reuse the R2 and MAPE scores. However, they are not helpful
1434+
# to assess the reliability of quantile models. For this purpose, we use a derivative of
1435+
# the metric minimized by those models: the pinball loss. We use the D2 score that is
1436+
# easier to interpret since the best possible score is bounded by 1 and a score of 0
1437+
# corresponds to constant predictions at the target quantile.
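
Note on the narrative above: the relationship between the pinball loss and the D2 score can be checked by hand. The sketch below is a pure-Python illustration, not the scikit-learn implementation; the helper names `pinball_loss`, `empirical_quantile` and `d2_pinball` are hypothetical, and the constant baseline is assumed to be the empirical quantile of the observed targets.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball loss at quantile level alpha."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


def empirical_quantile(values, alpha):
    """Empirical quantile with linear interpolation (an assumed baseline)."""
    ordered = sorted(values)
    pos = alpha * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def d2_pinball(y_true, y_pred, alpha):
    """Fraction of the baseline pinball loss explained by the predictions."""
    baseline = [empirical_quantile(y_true, alpha)] * len(y_true)
    model_loss = pinball_loss(y_true, y_pred, alpha)
    baseline_loss = pinball_loss(y_true, baseline, alpha)
    return 1 - model_loss / baseline_loss


y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
perfect = d2_pinball(y_true, y_true, alpha=0.5)        # perfect predictions
constant = d2_pinball(y_true, [14.0] * 5, alpha=0.5)   # constant median prediction
```

With these toy values, perfect predictions give a score of 1 and predicting the median everywhere gives a score of 0, which matches the interpretation given in the comment.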
14321438

14331439
# %%
14341440
from sklearn.metrics import d2_pinball_score
@@ -1441,6 +1447,14 @@ def scoring(regressor, X, y):
14411447
"d2_pinball_95": make_scorer(d2_pinball_score, alpha=0.95),
14421448
}
14431449

1450+
# %% [markdown]
1451+
#
1452+
# We now define three different models:
1453+
#
1454+
# - a model predicting the 5th percentile of the load
1455+
# - a model predicting the median of the load
1456+
# - a model predicting the 95th percentile of the load
1457+
14441458
# %%
14451459
common_params = dict(
14461460
loss="quantile", learning_rate=0.1, max_leaf_nodes=100, random_state=0
@@ -1458,6 +1472,10 @@ def scoring(regressor, X, y):
14581472
y=target,
14591473
)
14601474

1475+
# %% [markdown]
1476+
#
1477+
# Finally, we cross-validate each model and compute the above scores.
1478+
14611479
# %%
14621480
cv_results_hgbr_05 = predictions_hgbr_05.skb.cross_validate(
14631481
cv=ts_cv_5,
@@ -1481,6 +1499,10 @@ def scoring(regressor, X, y):
14811499
n_jobs=-1,
14821500
)
14831501

1502+
# %% [markdown]
1503+
#
1504+
# Let's now show the test scores for each model.
1505+
14841506
# %%
14851507
cv_results_hgbr_05[
14861508
[col for col in cv_results_hgbr_05.columns if col.startswith("test_")]
@@ -1496,6 +1518,16 @@ def scoring(regressor, X, y):
14961518
[col for col in cv_results_hgbr_95.columns if col.startswith("test_")]
14971519
].mean(axis=0).round(3)
14981520

1521+
# %% [markdown]
1522+
#
1523+
# Focusing on the different D2 scores, we observe that each model maximizes the D2 score
1524+
# associated with the target quantile that we set. For instance, the model predicting the
1525+
# 5th percentile obtained the highest D2 pinball score with `alpha=0.05`. This is expected
1526+
# and confirms which loss each model minimizes.
1527+
#
1528+
# Now, let's make a plot of the predictions for each model. Let's first gather all
1529+
# the predictions in a single dataframe.
1530+
14991531
# %%
15001532
results = pl.concat(
15011533
[
@@ -1507,6 +1539,12 @@ def scoring(regressor, X, y):
15071539
how="horizontal",
15081540
).tail(24 * 7)
15091541

1542+
# %% [markdown]
1543+
#
1544+
# Now, we plot the observed values and the predicted median with a line. In addition,
1545+
# we plot the band between the predicted 5th and 95th percentiles as a shaded area. If
1546+
# the models are well calibrated, we expect about 90% of the observed values to fall
1547+
# within this band.
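
Note on the 90% claim: it is a nominal coverage, and the empirical coverage can be measured directly from the predictions. The sketch below assumes three equal-length sequences of observed values and predicted lower and upper bounds; the function name `interval_coverage` and the toy numbers are hypothetical and are not taken from the notebook.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of observed values that fall within [lower, upper]."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)


# Toy data for illustration only.
observed = [10.0, 12.0, 9.0, 15.0, 11.0]
lower = [8.0, 9.0, 10.0, 11.0, 9.0]     # stand-in for the 5th percentile predictions
upper = [12.0, 13.0, 14.0, 14.0, 12.0]  # stand-in for the 95th percentile predictions

coverage = interval_coverage(observed, lower, upper)
```

In the toy data, three of the five observations fall inside the band, so the empirical coverage is 0.6. On real predictions, a value well below 0.9 would suggest that the band is too narrow.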
1547+
15101548
# %%
15111549
median_chart = (
15121550
altair.Chart(results)
@@ -1515,19 +1553,22 @@ def scoring(regressor, X, y):
15151553
.encode(x="prediction_time:T", y="value:Q", color="key:N")
15161554
)
15171555

1556+
# Add a column for the band legend
1557+
results_with_band = results.with_columns(pl.lit("90% interval").alias("band_type"))
1558+
15181559
quantile_band_chart = (
1519-
altair.Chart(results)
1560+
altair.Chart(results_with_band)
15201561
.mark_area(opacity=0.4, tooltip=True)
15211562
.encode(
15221563
x="prediction_time:T",
15231564
y="quantile_05:Q",
15241565
y2="quantile_95:Q",
1525-
color=altair.value("lightgreen"),
1566+
color=altair.Color("band_type:N", scale=altair.Scale(range=["lightgreen"])),
15261567
)
15271568
)
15281569

15291570
combined_chart = quantile_band_chart + median_chart
1530-
combined_chart.interactive()
1571+
combined_chart.resolve_scale(color="independent").interactive()
15311572

15321573
# %%
15331574
cv_predictions_hgbr_05 = collect_cv_predictions(
