README.md: 48 additions & 66 deletions
@@ -4,117 +4,99 @@ This repository contains all materials associated with the manuscript:
4
4
**"From Multilevel Modeling to GEE: Revisiting the Within- and Between-Person Debate with Binary Predictors and Outcomes"**
5
5
It includes code, supplementary documentation, and simulation results to ensure full transparency and reproducibility of the study.
6
6
7
+
7
8
## Ethics Assessment
8
9
9
-
This simulation study was approved by the Ethical Review Board of the Faculty of Social and Behavioural Sciences of Utrecht University. The approval is based on the documents sent by the researchers as requested in the form of the Ethics committee and filed under FETC number 24-2003. The approval is valid through 31 May 2025. The approval of the Ethical Review Board concerns ethical aspects, as well as data management and privacy issues (including the GDPR).
10
+
This simulation study was approved by the Ethical Review Board of the Faculty of Social and Behavioural Sciences of Utrecht University. The approval is based on the documents submitted by the researchers as required by the Ethics Committee and filed under FETC number 24-2003. The approval is valid through 31 May 2025. The approval pertains to ethical considerations, data management, and privacy issues (including GDPR compliance).
10
11
11
12
## Study Design
12
13
13
-
This simulation study evaluates the generalizability of disaggregation methods—commonly applied in multilevel linear models (MLMs)—to *generalized* multilevel models (GLMMs) and *generalized estimating equations* (GEEs) when dealing with binary predictors and/or binary outcomes.
14
-
15
-
We address two primary questions:
16
-
17
-
1. Can disaggregation methods (UC, CWC, MuCo) reliably recover within-person and contextual effects in GLMMs with binary predictors and/or outcomes?
18
-
2. Do GEEs require explicit disaggregation to correctly estimate within-person effects, especially when contextual effects are present?
19
-
20
-
We simulate data across four data-generating models (DGMs) that vary in the scale of the predictor and outcome variables (continuous or binary). Across DGMs, we keep constant the within-cluster standard deviation (SD) of the continuous predictor, the fixed intercept, the within-cluster effect, and the level-1 residual SD (for DGMs with a continuous outcome). For each of these DGMs, we systematically vary:
14
+
This simulation study evaluates the generalizability of disaggregation methods—commonly applied in multilevel linear models (MLMs)—to *generalized* multilevel models (GLMMs) and *generalized estimating equations* (GEEs) in the context of binary predictors and/or outcomes.
21
15
22
-
CREATE MARKDOWN TABLE
16
+
We address two questions:
23
17
24
-
* Sample size *(N = 100, 200)*
25
-
* Number of time points *(T = 5, 10, 20)*
26
-
* Between-cluster SD in continuous predictor (0, 1, 3)
27
-
* SD in Z (the latent trait underlying between-person variability in binary X): (0, 1, 3)
28
-
* Contextual effect: (0, 1, 3)
29
-
* Random intercept residual SD: (1, 3)
18
+
1. Can disaggregation methods (uncentered, centering-within-cluster, Mundlak's contextual model) reliably recover within-person and contextual effects in GLMMs with binary predictors and/or outcomes?
19
+
2. Do GEEs require explicit disaggregation to correctly estimate within-person effects in the presence of contextual effects?
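As an illustration of what the three disaggregation methods do to a time-varying predictor, the following base-R sketch builds the person mean and the within-person deviation on toy data. The data frame `d` and the column names `id`, `x`, `x_mean`, and `x_cwc` are hypothetical and are not taken from the repository's scripts. Whether the person mean is also included under centering within cluster is an assumption here.

```r
# Toy long-format data: 3 persons (clusters), 4 time points each.
set.seed(1)
d <- data.frame(
  id = rep(1:3, each = 4),
  x  = rbinom(12, size = 1, prob = 0.5)  # binary time-varying predictor
)

# Person mean of x: the between-person component.
d$x_mean <- ave(d$x, d$id, FUN = mean)

# Centering within cluster: deviation from the person's own mean.
d$x_cwc <- d$x - d$x_mean

# Predictors entering the fixed part under each method:
#   uncentered:                x
#   centering within cluster:  x_cwc + x_mean  (assumption: person mean included)
#   Mundlak's contextual:      x + x_mean
head(d)
```

By construction, `x_cwc` sums to zero within each person, so it carries no between-person information.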
30
20
31
-
Each dataset is analyzed using 12 strategies: combinations of 3 disaggregation methods (uncentered, centering within clusters, and Mundlak's contextual model) and 4 estimation approaches (GLMM, and GEE with independence, exchangeable, and AR(1) correlation structures).
21
+
We simulate data under four data-generating mechanisms (DGMs) that vary the scale of the predictor and outcome variables (binary or continuous). Across DGMs, the following parameters are held constant: the within-cluster SD of the continuous predictor, the fixed intercept, the within-cluster effect, and the level-1 residual SD (for DGMs with a continuous outcome). The table below summarizes the manipulated design factors:
32
22
33
-
Model performance is assessed via estimation bias in fixed effects (within-person: β₁; contextual: γ₀₁).
Each dataset is analyzed using 12 strategies: all combinations of 3 disaggregation methods and 4 estimation approaches (GLMM and GEE with independence, exchangeable, and AR(1) correlation structures). Model performance is evaluated in terms of estimation bias for the within-person effect ($\beta_1$) and the contextual effect ($\gamma_{01}$).
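A minimal sketch of how such a design could be laid out in R, assuming a fully crossed design. The factor levels are the ones listed in the earlier version of this section (N = 100, 200; T = 5, 10, 20; SDs and contextual effects of 0, 1, 3; random intercept SD of 1, 3). All object and column names are hypothetical. The single `predictor_between_sd` column stands in for whichever between-person SD applies to a given DGM, which is an assumption.

```r
# Manipulated design factors for one DGM (levels from the design description).
design <- expand.grid(
  N                    = c(100, 200),   # number of persons
  T                    = c(5, 10, 20),  # number of time points
  predictor_between_sd = c(0, 1, 3),    # assumption: one predictor SD factor per DGM
  contextual_effect    = c(0, 1, 3),
  random_intercept_sd  = c(1, 3)
)
nrow(design)  # 2 * 3 * 3 * 3 * 2 conditions

# The 12 analysis strategies: 3 disaggregation methods x 4 estimators.
strategies <- expand.grid(
  disaggregation = c("uncentered", "cwc", "mundlak"),
  estimator      = c("glmm", "gee_independence", "gee_exchangeable", "gee_ar1"),
  stringsAsFactors = FALSE
)
nrow(strategies)  # 3 * 4 strategies

# Estimation bias: mean estimate across replications minus the true value.
bias <- function(estimates, truth) mean(estimates) - truth
```

For example, `bias(c(0.9, 1.1, 1.3), truth = 1)` gives a bias of 0.1 (up to floating-point error).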
36
33
37
34
## Repository Structure
38
35
39
36
**`renv.lock`**
40
37
41
-
Contains information on the requirements of all the dependencies of R-packages used in the simulation study.
38
+
Contains dependency information for full reproducibility with `renv`.
42
39
43
40
### `scripts/`
44
41
45
-
Contains all core scripts for running and analyzing the simulation study.
Modularized main script that executes the simulation across various design conditions for DGM 1.
46
+
Runs simulations for DGM 1.
51
47
52
48
* **`results-plotting.R`**
53
-
Scripts used to produce the plots featured in the Results section of the manuscript.
49
+
Produces plots used in the manuscript.
54
50
55
-
* **`helper-functions/`** (subfolder with modular components):
51
+
* **`helper-functions/`**
56
52
57
-
  * `data-generation-centeredX.R`: Data-generating mechanisms based on the hybrid model.
58
-
  * `data-generation-mundlak.R`: Data-generating mechanisms based on Mundlak’s contextual model (the model used in the main manuscript).
59
-
  * `model-fitting.R`: Model-fitting procedures for both GLMMs and GEEs.
60
-
  * `result-formatting.R`: Functions to clean and format simulation output.
53
+
  * `data-generation-centeredX.R`: Hybrid model data generation.
54
+
  * `data-generation-mundlak.R`: Contextual model data generation.
55
+
  * `model-fitting.R`: GLMM and GEE fitting procedures.
56
+
  * `result-formatting.R`: Formatting and summarizing outputs.
61
57
62
58
### `docs/`
63
59
64
-
Contains interactive and rendered documents to explore the simulation designs.
65
-
66
-
* **`data-exploration.qmd`**: Allows users to explore all four data-generating mechanisms (DGMs) considered in the study.
67
-
* **`data-exploration.html`**: Rendered HTML version for [direct inspection](https://wardeiling.github.io/multilevel-vs-gee-binary/data-exploration.html).
60
+
* **`data-exploration.qmd`** / **`.html`**
61
+
Interactive Quarto document for exploring DGMs. [View HTML](https://wardeiling.github.io/multilevel-vs-gee-binary/data-exploration.html)
68
62
69
-
Supplementary materials accompanying the manuscript.
70
-
71
-
* **`supplementary_materials.qmd`**: Quarto document with:
72
-
73
-
1. A comparison between the hybrid and Mundlak's contextual model.
74
-
2. Discussion of boundary/extreme estimates in GEEs and how they were handled.
75
-
* **`supplementary_materials.html`**: Rendered HTML version for [direct inspection](https://wardeiling.github.io/multilevel-vs-gee-binary/supplementary_materials.html).
Contains documents that save the settings of the `renv` environment.
81
+
Contains internal `renv` files storing the project-specific package environment.
95
82
96
83
## Reproducibility via `renv`
97
84
98
-
This repository uses the [`renv`](https://rstudio.github.io/renv/) package to create a reproducible R environment. To replicate the computational setup:
85
+
This project uses the [`renv`](https://rstudio.github.io/renv/) package to ensure a reproducible R environment. To replicate the computational setup:
99
86
100
-
1. Download `R` version 4.2.2 from CRAN ([link](https://cran.rstudio.com/bin/windows/base/old/4.4.2/R-4.4.2-win.exe)), install in RStudio and set as R version.
101
-
2. Clone or download the entire repository.
87
+
1. Install R version 4.2.2 from [CRAN](https://cran.rstudio.com/bin/windows/base/old/4.4.2/R-4.4.2-win.exe).
88
+
2. Clone or download the repository.
102
89
3. Open the project in RStudio.
103
90
4. Run:
104
91
105
92
```r
106
93
renv::restore()
107
94
```
108
95
109
-
This restores all package versions as specified in the `renv.lock` file, ensuring consistent results across systems and over time.
110
-
111
-
After the computational setup is replicated, we can run the main simulation and reproduce the output as follows
112
-
113
-
1. Run **`scripts/main-simulation-function-future-simul-part1.R`**, which should automatically retrieve the helper functions and produce output in the folder `April10_fullsimulation/`
114
-
2. Run **`scripts/main-simulation-function-future-simul-part2.R`**, which should automatically retrieve the helper functions and produce output in the folder `April17_fullsimulation_contXY/`
115
-
116
-
Now we can use the output to reproduce the figures shown in the result section of the manuscript as follows
96
+
This will install all required package versions as specified in `renv.lock`.
117
97
118
-
1. Run **`post-processing/results-plotting.R`**, which should retrieve the simulation output, merge them together and process it for creating the boxplots.
98
+
## Reproducing Results
119
99
120
-
---
100
+
1. Run `scripts/main-simulation-function-future-simul-part1.R` to simulate DGMs 2–4. Output is saved to `April10_fullsimulation/`.
101
+
2. Run `scripts/main-simulation-function-future-simul-part2.R` for DGM 1. Output is saved to `April17_fullsimulation_contxy/`.
102
+
3. Run `scripts/results-plotting.R` to process and visualize the simulation results used in the manuscript.
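The three steps above can also be chained from the R console. The sketch below assumes the working directory is the repository root and that each script can be run non-interactively with `source()`. The helper `run_pipeline()` is hypothetical and not part of the repository.

```r
# Hypothetical helper: run the scripts in order, failing early if one is missing.
run_pipeline <- function(scripts, root = ".") {
  paths <- file.path(root, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing script(s): ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    source(p)  # evaluated in the global environment, as when run by hand
  }
  invisible(paths)
}

# Script paths as listed in the steps above.
scripts <- c(
  "scripts/main-simulation-function-future-simul-part1.R",
  "scripts/main-simulation-function-future-simul-part2.R",
  "scripts/results-plotting.R"
)

# Uncomment to run the full pipeline from the repository root:
# run_pipeline(scripts)
```

Checking that all files exist before sourcing any of them avoids starting a long simulation run that would stop at a later step because of a path problem.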