Commit 5aa2984

Updated group-sequential-testing.rmd
1 parent 065eba2 commit 5aa2984

1 file changed

Lines changed: 99 additions & 46 deletions

vignettes/group-sequential-testing.Rmd

@@ -1,6 +1,9 @@
11
---
22
title: "Group Sequential Design with Graphical Approaches"
3-
output: rmarkdown::html_vignette
3+
output:
4+
  rmarkdown::html_vignette:
5+
    number_sections: true
6+
    toc: true
47
vignette: >
58
  %\VignetteIndexEntry{Group Sequential Design with Graphical Approaches}
69
  %\VignetteEngine{knitr::rmarkdown}
@@ -81,7 +84,11 @@ $$P(Z_1 < b_1, \ldots, Z_k < b_k) = 1 - f(\alpha, t_k)$$
8184

8285
where $f(\alpha, t_k)$ is the cumulative spending and
8386
$(Z_1, \ldots, Z_k)$ follow the canonical joint distribution with
84-
$\text{Cor}(Z_i, Z_j) = \sqrt{t_i / t_j}$ for $i \le j$.
87+
$\text{Cor}(Z_i, Z_j) = \sqrt{t_i / t_j}$ for $i \le j$. We consider
88+
one-sided tests with the upper alternative (i.e., larger effects are better).
89+
At analysis $k$, the null hypothesis is rejected if $Z_k \ge b_k$, or
90+
equivalently, if the observed p-value $p_k \le \Phi(-b_k)$ where $\Phi$ is
91+
the standard normal CDF.
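
The equivalence between the Z-scale and p-value-scale rejection rules can be checked numerically. The following is a minimal base R sketch; the boundary `b_k` and statistic `z_k` are hypothetical values chosen for illustration and are not taken from the vignette.

```r
# Hypothetical boundary and observed statistic at analysis k (illustration only)
b_k <- 2.5
z_k <- 2.7

# One-sided p-value for the upper alternative, and the nominal level Phi(-b_k)
p_k <- pnorm(-z_k)
nominal_level <- pnorm(-b_k)

# Both rules give the same decision because pnorm() is strictly increasing
reject_by_z <- z_k >= b_k
reject_by_p <- p_k <= nominal_level
c(reject_by_z = reject_by_z, reject_by_p = reject_by_p)
```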
8592

8693
```{r boundaries-example}
8794
# Compute boundaries for OBF spending at alpha = 0.025 with 3 equally spaced analyses
@@ -101,13 +108,15 @@ knitr::kable(boundary_table, digits = 6,
101108

102109
## Repeated and Sequential P-values
103110

104-
Two types of p-values are central to the group sequential graphical procedure:
111+
Two types of p-values are central to the group sequential graphical procedure.
112+
Let $\hat{p}_k$ denote the repeated p-value and $\tilde{p}_k$ denote the
113+
sequential p-value at analysis $k$.
105114

106-
- **Repeated p-value** at analysis $k$: the minimum significance level at which
115+
- **Repeated p-value** $\hat{p}_k$: the minimum significance level at which
107116
the group sequential boundary *at analysis $k$ specifically* would be crossed.
108117
It only considers the boundary at the current analysis.
109118

110-
- **Sequential p-value** at analysis $k$: the minimum significance level at
119+
- **Sequential p-value** $\tilde{p}_k$: the minimum significance level at
111120
which any group sequential boundary *at analyses $1, \ldots, k$* would be
112121
crossed. It equals the cumulative minimum of repeated p-values:
113122
$\tilde{p}_k = \min_{1 \le l \le k} \hat{p}_l$.
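
As a small illustration of the cumulative-minimum relationship, base R's `cummin()` turns a vector of repeated p-values into sequential p-values. The numbers below are hypothetical and only meant to show the mechanics.

```r
# Hypothetical repeated p-values for one hypothesis at analyses 1, 2, 3
repeated_p <- c(0.030, 0.012, 0.020)

# Sequential p-value at analysis k = smallest repeated p-value seen so far
sequential_p <- cummin(repeated_p)
sequential_p
```

Note that the sequential p-value never increases from one analysis to the next, whereas the repeated p-value can.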
@@ -140,6 +149,16 @@ since it considers all prior analyses. A hypothesis that nearly crossed its
140149
boundary at an earlier analysis will have a much smaller sequential p-value
141150
than its repeated p-value at the current analysis.
142151

152+
The `graph_test_shortcut_gsd()` function supports two modes controlled by the
153+
`look_back` parameter. When `look_back = FALSE` (the default), rejection
154+
decisions at each analysis are based on repeated p-values only — i.e., only
155+
the boundary at the current analysis is considered. When `look_back = TRUE`,
156+
rejection decisions are based on sequential p-values, which "look back" at
157+
all prior analyses by taking the cumulative minimum of repeated p-values.
158+
This means that strong evidence from an earlier analysis is carried forward
159+
and can contribute to a rejection at a later analysis. Both modes are
160+
illustrated in the case studies below.
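
To make the difference between the two modes concrete, here is a simplified single-hypothesis sketch in base R. It does not call `graph_test_shortcut_gsd()`; the p-values and significance levels are hypothetical, and the increase in level at analysis 2 stands in for a weight increase after some other hypothesis is rejected.

```r
# Hypothetical repeated p-values for one hypothesis at analyses 1 and 2
repeated_p <- c(0.015, 0.020)

# Hypothetical levels available to this hypothesis: weight 0.5 of alpha = 0.025
# at analysis 1, weight 0.75 at analysis 2 after a graph update
level <- c(0.0125, 0.01875)

# look_back = FALSE analogue: compare the current repeated p-value only
reject_no_look_back <- repeated_p <= level

# look_back = TRUE analogue: compare the cumulative minimum (sequential p-value)
sequential_p <- cummin(repeated_p)
reject_look_back <- sequential_p <= level

rbind(reject_no_look_back, reject_look_back)
```

In this sketch the analysis-1 evidence (0.015) is not enough at the initial level but is carried forward and suffices once the level rises at analysis 2, so only the look-back comparison rejects.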
161+
143162
## Case Study: Maurer and Bretz (2013), Section 4
144163

145164
We replicate the numerical example from Section 4 of Maurer and Bretz (2013).
@@ -159,7 +178,8 @@ The trial has four hypotheses:
159178
The testing strategy follows the successiveness principle: secondary hypotheses
160179
cannot be tested until their parent primary hypothesis is rejected. Both
161180
primary hypotheses start with equal weight (0.5 each), and upon rejection,
162-
the full weight propagates to the corresponding secondary hypothesis.
181+
the weight is split equally between the other primary hypothesis and the
182+
corresponding secondary hypothesis.
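
The weight propagation described above can be sketched by hand for the rejection of $H_1$. This is an illustrative base R calculation, not the package's update routine: the outgoing transition weights for $H_1$ are reconstructed from the prose (an equal split between $H_2$ and $H_3$), and only the hypothesis weights are updated, not the remaining transition weights.

```r
# Initial hypothesis weights from the vignette
weights <- c(H1 = 0.5, H2 = 0.5, H3 = 0, H4 = 0)

# Assumed outgoing transition weights from H1: equal split to H2 and H3
from_h1 <- c(H1 = 0, H2 = 0.5, H3 = 0.5, H4 = 0)

# After rejecting H1, each remaining hypothesis j gains w_1 * g_{1j}
updated <- weights + weights[["H1"]] * from_h1
updated["H1"] <- 0
updated
```

The total weight is preserved, with $H_2$ moving from 0.5 to 0.75 and $H_3$ from 0 to 0.25.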
163183

164184
```{r graph-setup}
165185
hypotheses <- c(0.5, 0.5, 0, 0)
@@ -194,26 +214,28 @@ p <- rbind(
194214
H4 = c(0.13, 0.06)
195215
)
196216
217+
p_display <- as.data.frame(p)
218+
colnames(p_display) <- paste("Analysis", 1:2)
197219
knitr::kable(
198-
data.frame(
199-
Analysis = 1:2,
200-
`Info Fraction` = c("1/3", "2/3"),
201-
H1 = p["H1", ], H2 = p["H2", ], H3 = p["H3", ], H4 = p["H4", ],
202-
check.names = FALSE
203-
),
204-
caption = "Observed nominal p-values (Table 1 of Maurer and Bretz, 2013)"
220+
p_display,
221+
caption = "Observed nominal p-values"
205222
)
206223
```
207224

208225
### Running the Procedure (look_back = FALSE)
209226

210227
The default mode is `look_back = FALSE`, which means the procedure does **not**
211-
look back at evidence from prior analyses. At each analysis, rejection decisions
212-
are based solely on the data observed at that analysis.
228+
look back at test statistics from prior analyses. At each analysis $k$,
229+
rejection decisions are based solely on the repeated p-value $\hat{p}_k$,
230+
which is computed from the test statistic at analysis $k$. That test
231+
statistic is itself computed from all data accumulated up to analysis
232+
$k$, so earlier data still contribute; what the decision ignores is the
233+
separate test statistics (and repeated p-values) from analyses
234+
$1, \ldots, k-1$.
213235

214236
There are two equivalent ways to understand the rejection decisions:
215237

216-
1. **Repeated p-values** (default): A repeated p-value at analysis $k$ is the
238+
1. **Repeated p-values** (default): The repeated p-value $\hat{p}_k$ is the
217239
minimum significance level at which the group sequential boundary at
218240
analysis $k$ would be crossed. These are passed to the graphical shortcut
219241
procedure (`graph_test_shortcut()`) for multiplicity adjustment.
@@ -301,35 +323,41 @@ with their new (increased) weights, potentially enabling further rejections
301323
at the same analysis.
302324

303325
```{r test-values-tables}
304-
knitr::kable(result$test_values[[1]], digits = 6,
326+
format_test_values <- function(tv) {
327+
  tv$Boundary <- formatC(tv$Boundary, format = "f", digits = 6)
328+
  tv
329+
}
330+
knitr::kable(format_test_values(result$test_values[[1]]), digits = 6,
305331
caption = "Analysis 1: nominal boundaries and rejection decisions")
306-
knitr::kable(result$test_values[[2]], digits = 6,
332+
knitr::kable(format_test_values(result$test_values[[2]]), digits = 6,
307333
caption = "Analysis 2: nominal boundaries and rejection decisions")
308334
```
309335

310336
**Analysis 1 (t = 1/3).** The initial weights are $(0.5, 0.5, 0, 0)$. The
311337
OBF spending function allocates very little alpha to the first interim
312-
analysis — the nominal boundary for $H_1$ and $H_2$ is approximately
313-
`r sprintf("%.5f", result$test_values[[1]]$Boundary[1])`. Since both observed
314-
p-values ($p_{1,1} = 0.0062$ and $p_{2,1} = 0.017$) exceed this boundary, no
315-
hypothesis is rejected.
338+
analysis — as shown in the Analysis 1 table above, the nominal boundary for
339+
$H_1$ and $H_2$ is approximately
340+
`r formatC(result$test_values[[1]]$Boundary[1], format = "f", digits = 6)`.
341+
Since both observed p-values (0.0062 for $H_1$ and 0.017 for $H_2$) exceed
342+
this boundary, no hypothesis is rejected.
316343

317-
An important note from the paper: the nominal significance level
318-
$\alpha^*_{1,1}(w \cdot \alpha)$ is **not** equal to
319-
$w \cdot \alpha^*_{1,1}(\alpha)$:
344+
An important note: the nominal boundary computed at a fraction of alpha is
345+
**not** equal to the same fraction of the boundary computed at the full alpha.
346+
For example, consider the boundary at the first analysis under the OBF spending function:
320347

321348
```{r key-inequality}
322349
b_half <- gs_boundaries(0.0125, c(1/3, 2/3, 1), spending_of)
323350
b_full <- gs_boundaries(0.025, c(1/3, 2/3, 1), spending_of)
324351
cat(sprintf(
325-
"alpha*_1(0.0125) = %.6f\n0.5 * alpha*_1(0.025) = %.6f\n",
352+
"Boundary at alpha = 0.0125: %.6f\n0.5 * Boundary at alpha = 0.025: %.6f\n",
326353
b_half$bounds_nominal[1],
327354
0.5 * b_full$bounds_nominal[1]
328355
))
329356
```
330357

331-
This demonstrates why one must evaluate the spending function at
332-
$w_i \cdot \alpha$, not apply the weight to boundaries computed at $\alpha$.
358+
This demonstrates why the spending function must be evaluated at the
359+
hypothesis-specific significance level (weight times alpha), rather than
360+
applying the weight to boundaries computed at the full alpha.
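
The same inequality can be reproduced without the package. The sketch below assumes the Lan-DeMets O'Brien-Fleming-type spending function $f(\alpha, t) = 2 - 2\Phi(z_{1-\alpha/2} / \sqrt{t})$; the vignette's `spending_of` may be defined differently, and `obf_spend` is a hypothetical helper name. At the first analysis the nominal level equals the cumulative spending $f(\alpha, t_1)$, so no multivariate integration is needed.

```r
# Lan-DeMets O'Brien-Fleming-type spending function (an assumption; the
# vignette's `spending_of` is not shown here and may differ)
obf_spend <- function(alpha, t) {
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}

t1 <- 1 / 3

# Nominal level at analysis 1 when spending is evaluated at half of alpha
level_half <- obf_spend(0.0125, t1)

# Nominal level at analysis 1 at the full alpha, to be halved afterwards
level_full <- obf_spend(0.025, t1)

c(at_half_alpha = level_half, half_of_full_alpha = 0.5 * level_full)
```

Under this assumed spending function, the level obtained by evaluating at half of alpha is smaller than half of the level obtained at the full alpha, so scaling the full-alpha boundary by the weight would be anti-conservative at the first analysis.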
333361

334362
**Analysis 2 (t = 2/3).** The `test_values` table above shows the boundary for
335363
each hypothesis at the point when it is tested, reflecting sequential graph
@@ -505,14 +533,10 @@ via graph update) at a later analysis. We illustrate this using the same graph
505533
but with modified p-values and Pocock spending:
506534

507535
```{r look-back-difference}
508-
# Same graph as MB case study, but different p-values and spending function
509-
# H3 has strong evidence at analysis 1 (p = 0.0008) but just misses at analysis 2
510-
p_modified <- rbind(
511-
H1 = c(0.02, 0.0002),
512-
H2 = c(0.02, 0.003),
513-
H3 = c(0.0008, 0.006),
514-
H4 = c(0.3, 0.2)
515-
)
536+
# Same graph and p-values as MB case study, except H3's p-values are modified
537+
# H3 has strong evidence at analysis 1 (p = 0.0008) but weaker at analysis 2
538+
p_modified <- p
539+
p_modified["H3", ] <- c(0.0008, 0.006)
516540
517541
# look_back = FALSE: only considers repeated p-values at each analysis
518542
result_no_lb <- graph_test_shortcut_gsd(
@@ -738,18 +762,34 @@ The transition structure follows the hierarchy: within each population, alpha
738762
flows from OS to PFS to ORR, and ORR recycles to OS. Between populations,
739763
the all-subjects hypotheses share alpha with the subgroup hypotheses.
740764

741-
```{r oncology-graph-plot, eval = requireNamespace("igraph", quietly = TRUE), fig.height=8, fig.width=6}
765+
```{r oncology-graph-plot, eval = requireNamespace("igraph", quietly = TRUE), fig.height=6, fig.width=6}
742766
onc_layout <- rbind(
743-
c(1, 3), # H1_OS_S
744-
c(2, 3), # H2_OS_A
745-
c(1, 2), # H3_PFS_S
746-
c(2.5, 2), # H4_PFS_A (shifted right)
747-
c(1, 1), # H5_ORR_S
748-
c(2, 1) # H6_ORR_A
767+
c(0, 3), # H1_OS_S
768+
c(2, 3), # H2_OS_A
769+
c(0, 1.8), # H3_PFS_S
770+
c(1.3, 1.8), # H4_PFS_A
771+
c(0, 0.5), # H5_ORR_S
772+
c(2, 0.5) # H6_ORR_A
749773
)
750-
plot(g_onc, layout = onc_layout, vertex.size = 60,
751-
edge_curves = c("H6_ORR_A|H2_OS_A" = 0.01,
752-
"H4_PFS_A|H5_ORR_S" = 0.01,
774+
# Edge label positions: NA = auto, explicit coords to move specific labels
775+
# Edge order: 1=H6->H1, 2=H1->H2, 3=H6->H2, 4=H2->H3, 5=H2->H4,
776+
# 6=H3->H4, 7=H4->H5, 8=H4->H6, 9=H5->H6
777+
label_x <- rep(NA, 9)
778+
label_y <- rep(NA, 9)
779+
label_x[1] <- 1.35; label_y[1] <- 1.1 # H6->H1: on the curved edge
780+
label_x[7] <- 0.65; label_y[7] <- 1.1 # H4->H5: between nodes, near arrow
781+
782+
plot(g_onc, layout = onc_layout, vertex.size = 60, asp = 1,
783+
vertex.label.cex = 0.7,
784+
rescale = FALSE,
785+
xlim = c(-1.2, 3.5),
786+
ylim = c(-0.2, 3.8),
787+
edge.label.x = label_x,
788+
edge.label.y = label_y,
789+
edge_curves = c("H6_ORR_A|H2_OS_A" = 0,
790+
"H6_ORR_A|H1_OS_S" = 0.2,
791+
"H4_PFS_A|H6_ORR_A" = 0,
792+
"H4_PFS_A|H5_ORR_S" = 0,
753793
"H3_PFS_S|H4_PFS_A" = 0))
754794
```
755795

@@ -939,6 +979,19 @@ For this example, both modes produce the same rejection decisions. This is
939979
because the evidence at the rejection analyses is strong enough that looking
940980
back at earlier analyses does not change the outcome.
941981

982+
**Note on differences from gMCPLite.** This case study is adapted from the
983+
[gMCPLite vignette](https://cran.r-project.org/web/packages/gMCPLite/vignettes/huyett-burnett-example.html).
984+
The rejection decisions (H1, H3, H5 rejected; H2, H4, H6 not rejected) agree
985+
between the two implementations. However, sequential p-values may differ
986+
slightly for some hypotheses. The reason is that gMCPLite (via gsDesign)
987+
separates *spending time* from *information fraction*: for all-subjects
988+
hypotheses (H2 and H4), gMCPLite uses the subgroup event counts as the
989+
spending time while using the all-subjects event counts for the correlation
990+
structure. In contrast, `graphicalMCP` uses `info_frac` for both alpha
991+
spending and the correlation structure. This difference affects the group
992+
sequential boundaries and hence the sequential p-values, but in this example
993+
it does not change which hypotheses are rejected.
994+
942995
This case study demonstrates that `graph_test_shortcut_gsd()` handles trials
943996
where different endpoints have different numbers of analyses — a common
944997
situation in oncology trials with OS, PFS, and ORR endpoints.
