Commit bb212c2

avehtari and fweber144 authored
add proj_epred (#560)
* add proj_epred
* re-document
* create NEWS entry

---------

Co-authored-by: fweber144 <fweber144@protonmail.com>
1 parent 72eb5eb commit bb212c2

7 files changed

Lines changed: 435 additions & 44 deletions

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ export(get_refmodel)
 export(init_refmodel)
 export(performances)
 export(predictor_terms)
+export(proj_epred)
 export(proj_linpred)
 export(proj_predict)
 export(project)

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@ If you read this from a place other than <https://mc-stan.org/projpred/news/inde
 
 ## Minor changes
 
+* Added `proj_epred()`, which is essentially a wrapper around `proj_linpred()` with `transform = TRUE`. (GitHub: #559, #560)
+
 ## Bug fixes
 
 
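The NEWS entry describes `proj_epred()` as `proj_linpred()` with `transform = TRUE`, that is, linear predictors mapped to the response scale through the inverse-link function. The following base-R sketch illustrates only that mapping. It does not call projpred, and the matrix `eta` and the choice of a binomial family with logit link are assumptions made for illustration.

```r
# Hypothetical linear-predictor draws: 4 draws (rows) x 3 observations
# (columns). These values are simulated, not produced by projpred.
set.seed(1)
eta <- matrix(rnorm(12), nrow = 4, ncol = 3)

# Inverse-link function of an assumed binomial family with logit link.
linkinv <- binomial()$linkinv

# Response-scale values: conceptually what `transform = TRUE` asks for.
# Assigning into `mu[]` keeps the draws-by-observations layout of `eta`.
mu <- eta
mu[] <- linkinv(eta)

print(round(mu, 3))
```

For the logit link, every entry of `mu` lies strictly between 0 and 1, whereas `eta` is unbounded.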
R/methods.R

Lines changed: 76 additions & 23 deletions
@@ -2,19 +2,39 @@
 
 #' Predictions from a submodel (after projection)
 #'
-#' After the projection of the reference model onto a submodel, the linear
-#' predictors (for the original or a new dataset) based on that submodel can be
-#' calculated by [proj_linpred()]. These linear predictors can also be
-#' transformed to response scale and averaged across the projected parameter
-#' draws. Furthermore, [proj_linpred()] returns the corresponding log predictive
-#' density values if the (original or new) dataset contains response values. The
-#' [proj_predict()] function draws from the predictive distributions (there is
-#' one such distribution for each observation from the original or new dataset)
-#' of the submodel that the reference model has been projected onto. If the
-#' projection has not been performed yet, both functions call [project()]
-#' internally to perform the projection. Both functions can also handle multiple
-#' submodels at once (for `object`s of class `vsel` or `object`s returned by a
-#' [project()] call to an object of class `vsel`; see [project()]).
+#' The [proj_predict()] function draws from the projected posterior
+#' predictive distribution of the submodel that the reference model
+#' has been projected onto. By definition, these draws have higher
+#' variability than draws of the expected value of the posterior
+#' predictive distribution computed by [proj_epred()]. This is because
+#' the aleatoric uncertainty from the data model is incorporated in
+#' [proj_predict()].
+#'
+#' The [proj_epred()] function draws from the distribution of the
+#' expected value of the projected posterior predictive distribution.
+#' By definition, these predictions have smaller variability than the
+#' projected posterior predictions performed by [proj_predict()]. This
+#' is because only the epistemic uncertainty in the expected value of
+#' the projected posterior predictive distribution is incorporated in
+#' the draws, while the aleatoric uncertainty from the data model is
+#' not included. However, the estimated means of both methods averaged
+#' across draws should be very similar.
+#'
+#' The [proj_linpred()] function draws from the projected posterior of the
+#' linear predictors, that is, draws before applying any inverse-link functions
+#' or other transformations. These linear predictors can also be
+#' transformed to response scale with argument `transform = TRUE`, which
+#' produces draws equivalent to draws produced by [proj_epred()].
+#' Furthermore, [proj_linpred()] returns the corresponding log predictive
+#' density values if the (original or new) dataset contains response values.
+#'
+#' All these predictions can be performed for the data used to fit the
+#' reference model or for new data. If the projection has not been
+#' performed yet, all three functions call [project()] internally to
+#' perform the projection. All three functions can also handle
+#' multiple submodels at once (for `object`s of class `vsel` or
+#' `object`s returned by a [project()] call to an object of class
+#' `vsel`; see [project()]).
 #'
 #' @name pred-projection
 #'
@@ -26,13 +46,14 @@
 #' for only those elements (submodels) with a number of predictor terms in
 #' `filter_nterms`. Therefore, needs to be a numeric vector or `NULL`. If
 #' `NULL`, use all submodels.
-#' @param transform For [proj_linpred()] only. A single logical value indicating
+#' @param transform For [proj_linpred()] only (not applicable for [proj_epred()]
+#' which always uses `transform = TRUE` internally). A single logical value indicating
 #' whether the linear predictor should be transformed to response scale using
 #' the inverse-link function (`TRUE`) or not (`FALSE`). In case of the latent
 #' projection, argument `transform` is similar in spirit to argument
 #' `resp_oscale` from other functions and affects the scale of both output
 #' elements `pred` and `lpd` (see sections "Details" and "Value" below).
-#' @param integrated For [proj_linpred()] only. A single logical value
+#' @param integrated For [proj_linpred()] and [proj_epred()] only. A single logical value
 #' indicating whether the output should be averaged across the projected
 #' posterior draws (`TRUE`) or not (`FALSE`).
 #' @param nresample_clusters For [proj_predict()] with clustered projection (and
@@ -42,8 +63,8 @@
 #' gives the number of draws (*with* replacement) from the set of clustered
 #' posterior draws after projection (with this set being determined by
 #' argument `nclusters` of [project()]).
-#' @param allow_nonconst_wdraws_prj Only relevant for [proj_linpred()] and only
-#' if `integrated` is `FALSE`. A single logical value indicating whether to
+#' @param allow_nonconst_wdraws_prj Only relevant for [proj_linpred()] and
+#' [proj_epred()] and only if `integrated` is `FALSE`. A single logical value indicating whether to
 #' allow projected draws with different (i.e., nonconstant) weights (`TRUE`)
 #' or not (`FALSE`). If `return_draws_matrix` is `TRUE`,
 #' `allow_nonconst_wdraws_prj` is internally set to `TRUE` as well.
@@ -53,16 +74,18 @@
 #' matrices).
 #' @param return_draws_matrix A single logical value indicating whether to
 #' return an object (in case of [proj_predict()]) or objects (in case of
-#' [proj_linpred()]) of class `draws_matrix` (see
-#' [posterior::draws_matrix()]). In case of [proj_linpred()] and projected
+#' [proj_linpred()] and [proj_epred()]) of class `draws_matrix` (see
+#' [posterior::draws_matrix()]). In case of [proj_linpred()] or
+#' [proj_epred()] and projected
 #' draws with nonconstant weights (as well as `integrated` being `FALSE`),
 #' [posterior::weight_draws()] is applied internally.
 #' @param .seed Pseudorandom number generation (PRNG) seed by which the same
 #' results can be obtained again if needed. Passed to argument `seed` of
 #' [set.seed()], but can also be `NA` to not call [set.seed()] at all. If not
 #' `NA`, then the PRNG state is reset (to the state before calling
-#' [proj_linpred()] or [proj_predict()]) upon exiting [proj_linpred()] or
-#' [proj_predict()]. Here, `.seed` is used for drawing new group-level effects
+#' [proj_linpred()], [proj_epred()], or [proj_predict()]) upon exiting
+#' [proj_linpred()], [proj_epred()], or [proj_predict()]. Here, `.seed` is
+#' used for drawing new group-level effects
 #' in case of a multilevel submodel (however, not yet in case of a GAMM) and
 #' for drawing from the predictive distributions of the submodel(s) in case of
 #' [proj_predict()]. If a clustered projection was performed, then in
@@ -140,6 +163,12 @@
 #' `return_draws_matrix`, `allow_nonconst_wdraws_prj`, and `integrated`
 #' are all `FALSE`, then projected draws with nonconstant weights cause an
 #' error.)
+#' * [proj_epred()] is a wrapper around [proj_linpred()] with `transform =
+#' TRUE` and returns only the draws of the expected value of the projected
+#' posterior predictive distribution on the response scale (i.e., the `pred`
+#' element of the `list` returned by [proj_linpred()], without the `lpd`
+#' element). The structure of the returned object is the same as that of the
+#' `pred` element described for [proj_linpred()] above.
 #' * [proj_predict()] returns an \eqn{S_{\mathrm{prj}} \times N}{S_prj x N}
 #' matrix of predictions where \eqn{S_{\mathrm{prj}}}{S_prj} denotes
 #' `nresample_clusters` in case of clustered projection (or, more generally,
@@ -179,14 +208,15 @@
 #' # Predictions (at the training points) from the submodel onto which the
 #' # reference model was projected:
 #' prjl <- proj_linpred(prj)
+#' prje <- proj_epred(prj)
 #' prjp <- proj_predict(prj, .seed = 7364)
 #'
 NULL
 
 # Function definitions ----------------------------------------------------
 
-## The 'helper' for proj_linpred and proj_predict, ie. does all the
-## functionality that is common to them. It essentially checks all the arguments
+## The 'helper' for proj_linpred, proj_epred, and proj_predict, ie. does all
+## the functionality that is common to them. It essentially checks all the arguments
 ## and sets them to their respective defaults and then loops over the
 ## projections. For each projection, it evaluates the fun-function, which
 ## calculates the linear predictor if called from proj_linpred and samples from
@@ -506,6 +536,29 @@ proj_predict <- function(object, newdata = NULL, offsetnew = NULL,
   )
 }
 
+#' @rdname pred-projection
+#' @export
+proj_epred <- function(object, newdata = NULL, offsetnew = NULL,
+                       weightsnew = NULL, filter_nterms = NULL,
+                       integrated = FALSE,
+                       allow_nonconst_wdraws_prj = return_draws_matrix,
+                       return_draws_matrix = FALSE, .seed = NA, ...) {
+  out <- proj_linpred(
+    object = object, newdata = newdata,
+    offsetnew = offsetnew, weightsnew = weightsnew,
+    filter_nterms = filter_nterms, transform = TRUE,
+    integrated = integrated,
+    allow_nonconst_wdraws_prj = allow_nonconst_wdraws_prj,
+    return_draws_matrix = return_draws_matrix,
+    .seed = .seed, ...
+  )
+  if (is.list(out) && "pred" %in% names(out)) {
+    return(out$pred)
+  }
+  ## Multiple submodels: each element is a list with `pred` and `lpd`.
+  return(lapply(out, "[[", "pred"))
+}
+
 ## function applied to each projected submodel in case of proj_predict()
 proj_predict_aux <- function(proj, newdata, offsetnew, weightsnew,
                              nresample_clusters = 1000, resp_oscale = TRUE,
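
The added `proj_epred()` body distinguishes a single-submodel result (a list with `pred` and `lpd`) from a multi-submodel result (a list of such lists). The sketch below reproduces that branching on mock objects so it can be run without projpred. The helper name `extract_pred()` and the mock values are hypothetical, and the mock structure is only what the diff's comments describe, not a verified `proj_linpred()` return value.

```r
# Hypothetical stand-in for a single-submodel result: a list with the
# elements `pred` and `lpd` (2 draws x 3 observations each).
mock_single <- list(
  pred = matrix(0.5, nrow = 2, ncol = 3),
  lpd = matrix(-0.7, nrow = 2, ncol = 3)
)

# Hypothetical multi-submodel result: an unnamed list of such lists.
mock_multi <- list(mock_single, mock_single)

# Same branching as in the added function body, wrapped in a helper whose
# name is made up for this sketch.
extract_pred <- function(out) {
  if (is.list(out) && "pred" %in% names(out)) {
    # Single submodel: return the `pred` element directly.
    return(out$pred)
  }
  # Multiple submodels: keep one `pred` element per submodel.
  lapply(out, "[[", "pred")
}

res_single <- extract_pred(mock_single)
res_multi <- extract_pred(mock_multi)
str(res_single)
str(res_multi)
```

Under these assumptions the single-submodel case yields a matrix and the multi-submodel case a list of matrices, and `lpd` is dropped in both.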

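The roxygen text in this diff states that draws from `proj_predict()` vary more than draws from `proj_epred()`, while their means should be similar. The base-R simulation below illustrates that statement for one observation. It does not use projpred, and the Gaussian data model with identity link, the value of `sigma`, and the distribution of `eta` are all assumptions chosen for illustration.

```r
set.seed(2024)
n_draws <- 4000

# Hypothetical draws of the linear predictor for ONE observation. They carry
# only epistemic (parameter) uncertainty and are simulated, not projected.
eta <- rnorm(n_draws, mean = 1, sd = 0.2)

# Assumed Gaussian data model with identity link and known residual SD.
sigma <- 1.5

# Expected-value draws: with an identity link they equal the linear predictor.
epred <- eta

# Predictive draws: add the aleatoric noise of the data model to each draw.
ppred <- rnorm(n_draws, mean = eta, sd = sigma)

# Compare the spread and the centre of the two sets of draws.
print(c(sd_epred = sd(epred), sd_ppred = sd(ppred)))
print(c(mean_epred = mean(epred), mean_ppred = mean(ppred)))
```

In this setup the predictive draws have a clearly larger standard deviation, while the two means differ only by Monte Carlo error.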
man/pred-projection.Rd

Lines changed: 64 additions & 21 deletions
Some generated files are not rendered by default.
