Commit a289734

2 parents 1351efa + 9fa87b4 commit a289734

1 file changed

Lines changed: 91 additions & 29 deletions

README.md

@@ -4,67 +4,129 @@ This repository contains all materials associated with the manuscript:
44
**"From Multilevel Modeling to GEE: Revisiting the Within- and Between-Person Debate with Binary Predictors and Outcomes"**
55
It includes code, supplementary documentation, and simulation results to ensure full transparency and reproducibility of the study.
66

7-
## Reproducibility via `renv`
7+
## Ethics Assessment
88

9-
This repository uses the [`renv`](https://rstudio.github.io/renv/) package to create a reproducible R environment. To replicate the computational setup:
9+
This simulation study was approved by the Ethical Review Board of the Faculty of Social and Behavioural Sciences of Utrecht University. The approval is based on the documents submitted by the researchers as required by the Ethics Committee and filed under FETC number 24-2003. The approval is valid through 31 May 2025. The approval pertains to ethical considerations, data management, and privacy issues (including GDPR compliance).
1010

11-
1. Clone or download the entire repository.
12-
2. Open the project in RStudio (or start an R session in the project folder).
13-
3. Run:
11+
## Study Design
1412

15-
```r
16-
renv::restore()
17-
```
13+
This simulation study evaluates the generalizability of disaggregation methods—commonly applied in multilevel linear models (MLMs)—to *generalized* multilevel models (GLMMs) and *generalized estimating equations* (GEEs) in the context of binary predictors and/or outcomes.
14+
15+
We address two questions:
16+
17+
1. Can disaggregation methods (uncentered, centering-within-cluster, Mundlak's contextual model) reliably recover within-person and contextual effects in GLMMs with binary predictors and/or outcomes?
18+
2. Do GEEs require explicit disaggregation to correctly estimate within-person effects in the presence of contextual effects?
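The three disaggregation methods named in question 1 differ only in how the predictor enters the model. The following is a minimal sketch in base R, not the repository's implementation: the toy data, the variable names (`id`, `x`, `x_mean`, `x_cwc`, `y`), and the choice to enter the cluster mean alongside the centered predictor are all assumptions made for illustration.

```r
# Toy long-format data: 3 persons (clusters), 4 time points each.
# All names below are illustrative and not taken from the repository.
set.seed(123)
d <- data.frame(
  id = rep(1:3, each = 4),
  x  = rbinom(12, size = 1, prob = 0.5)  # binary predictor
)

# Person-specific mean of the predictor (between-person component).
d$x_mean <- ave(d$x, d$id)

# Within-person deviation from that mean (centering within cluster).
d$x_cwc <- d$x - d$x_mean

# Fixed-effects parts of the three specifications, written as formulas.
f_uncentered <- y ~ x               # no disaggregation
f_cwc        <- y ~ x_cwc + x_mean  # x_mean carries the between-person effect
f_mundlak    <- y ~ x + x_mean      # x_mean carries the contextual effect
```

In the linear case, the coefficient of `x_mean` in the Mundlak specification equals the between-person effect minus the within-person effect. Whether that relationship carries over to binary predictors and outcomes is what the study examines.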
1819

19-
This restores all package versions as specified in the `renv.lock` file, ensuring consistent results across systems and over time.
20+
We simulate data under four data-generating mechanisms (DGMs) that vary the scale of the predictor and outcome variables (binary or continuous). Across DGMs, the following parameters are held constant: the within-cluster SD of the continuous predictor, the fixed intercept, the within-cluster effect, and the level-1 residual SD (for DGMs with a continuous outcome). The table below summarizes the manipulated design factors:
21+
22+
| Factor | Levels |
23+
| ------------------------------------ | --------- |
24+
| Sample size *(N)* | 100, 200 |
25+
| Number of time points *(T)* | 5, 10, 20 |
26+
| Between-cluster SD in continuous *X* | 0, 1, 3 |
27+
| SD in latent *Z* for binary *X* | 0, 1, 3 |
28+
| Contextual effect | 0, 1, 3 |
29+
| Random intercept residual SD | 1, 3 |
30+
31+
Each dataset is analyzed using 12 strategies: all combinations of 3 disaggregation methods and 4 estimation approaches (GLMM and GEE with independence, exchangeable, and AR(1) correlation structures). Model performance is evaluated in terms of estimation bias for the within-person effect ($\beta_1$) and the contextual effect ($\gamma_{01}$).
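The design table and the 12 analysis strategies can be enumerated with `expand.grid()`. This is a sketch under two assumptions that the README does not state explicitly: the factors are fully crossed within each DGM, and the two predictor-SD rows act as a single factor (between-cluster SD of continuous *X*, or SD of latent *Z* for binary *X*, depending on the DGM). The column names are hypothetical.

```r
# Manipulated design factors for a single DGM (levels from the table above).
design <- expand.grid(
  n_persons           = c(100, 200),
  n_timepoints        = c(5, 10, 20),
  sd_between_pred     = c(0, 1, 3),  # SD of X (continuous) or latent Z (binary)
  contextual_effect   = c(0, 1, 3),
  sd_random_intercept = c(1, 3)
)
nrow(design)  # 2 * 3 * 3 * 3 * 2 = 108 under the full-crossing assumption

# Analysis strategies: 3 disaggregation methods x 4 estimation approaches.
strategies <- expand.grid(
  disaggregation = c("uncentered", "centering-within-cluster", "mundlak"),
  estimator      = c("GLMM", "GEE-independence", "GEE-exchangeable", "GEE-AR1"),
  stringsAsFactors = FALSE
)
nrow(strategies)  # 12
```

The settings actually used are stored in `settings.RDS` in each results folder and take precedence over this enumeration.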
2032

2133
## Repository Structure
2234

35+
**`renv.lock`**
36+
37+
Contains dependency information for full reproducibility with `renv`.
38+
2339
### `scripts/`
2440

2541
Contains all core scripts for running and analyzing the simulation study.
2642

27-
* **`main-simulation-function-future-simul.R`**
28-
Modularized main script that executes the simulation across various design conditions.
43+
* **`main-simulation-function-future-simul-part1.R`**
44+
Modularized script that runs simulations for DGMs 2–4.
45+
46+
* **`main-simulation-function-future-simul-part2.R`**
47+
Modularized script that runs simulations for DGM 1.
2948

3049
* **`results-plotting.R`**
31-
Scripts used to produce the plots featured in the Results section of the manuscript.
50+
Produces the plots used in the Results section of the manuscript.
3251

3352
* **`helper-functions/`** (subfolder with modular components):
3453

3554
* `data-generation-centeredX.R`: Data-generating mechanisms based on the hybrid model.
3655
* `data-generation-mundlak.R`: Data-generating mechanisms based on Mundlak’s contextual model (the model used in the main manuscript).
3756
* `model-fitting.R`: Model-fitting procedures for both GLMMs and GEEs.
38-
* `result-formatting.R`: Functions to clean and format simulation output.
57+
* `result-formatting.R`: Function to clean and format model-fitting output.
3958

4059
### `docs/`
4160

42-
Contains interactive and rendered documents to explore the simulation designs.
43-
44-
* **`data-exploration.qmd`**: Allows users to explore all four data-generating mechanisms (DGMs) considered in the study.
45-
* **`data-exploration.html`**: Rendered HTML version for [direct inspection](https://wardeiling.github.io/multilevel-vs-gee-binary/data-exploration.html).
46-
47-
Supplementary materials accompanying the manuscript.
61+
Contains supporting materials that provide additional context and in-depth explanations of specific aspects of the study.
4862

49-
* **`supplementary_materials.qmd`**: Quarto document with:
63+
* **`data-exploration.qmd`** / **`.html`**
64+
Allows users to explore all four data-generating mechanisms (DGMs) considered in the study. [View HTML](https://wardeiling.github.io/multilevel-vs-gee-binary/data-exploration.html)
5065

66+
* **`supplementary_materials.qmd`** / **`.html`**
67+
Supplementary materials accompanying the manuscript. [View HTML](https://wardeiling.github.io/multilevel-vs-gee-binary/supplementary_materials.html)
5168
1. A comparison between the hybrid and Mundlak's contextual model.
5269
2. Discussion of boundary/extreme estimates in GEEs and how they were handled.
53-
* **`supplementary_materials.html`**: Rendered HTML version for [direct inspection](https://wardeiling.github.io/multilevel-vs-gee-binary/supplementary_materials.html).
5470

55-
### `simulation_results/`
71+
### `output/`
5672

5773
Contains raw and processed simulation outputs, organized into subfolders corresponding to different simulation runs:
5874

59-
* **`April10_fullsimulation/`**: Part 1, covering DGMs 2–4.
60-
* **`April17_fullsimulation_contxy/`**: Part 2, covering DGM 1.
61-
* **`April18_fullsimulation_combined/figures/`**: Final figures used in the manuscript.
75+
* **`April10_fullsimulation/`**: Part 1 of the simulations, covering DGMs 2–4.
76+
* **`April17_fullsimulation_contxy/`**: Part 2 of the simulations, covering DGM 1.
77+
* **`April18_fullsimulation_combined/figures/`**: Final figures used in the Results section of the manuscript.
6278

63-
Each run folder contains:
79+
Each simulation-run folder includes:
6480

65-
* `i.RDS`: Raw output for each design/scenario `i`.
66-
* `settings.RDS`: Simulation settings used for that batch.
81+
* `i.RDS`: Raw output for each scenario *i*.
82+
* `settings.RDS`: Simulation settings used for that part.
6783
* `log.txt`: Logs containing warnings and errors during simulation.
68-
* `summary-results-bias.RDS` & `.csv`: Summary files quantifying bias in the estimates.
84+
* `summary-results-bias.RDS` and `.csv`: Summary files quantifying bias in the estimates.
85+
86+
### `renv/`
87+
88+
Contains internal `renv` files storing the project-specific package environment.
89+
90+
## Reproducibility: Step-by-Step Guide
91+
92+
This repository uses the [`renv`](https://rstudio.github.io/renv/) package to create a reproducible R environment. To replicate the computational setup and rerun the analyses:
93+
94+
### Step 1: Set Up R and RStudio
95+
96+
1. Install **R version 4.2.2** from CRAN ([Windows installer](https://cran.rstudio.com/bin/windows/base/old/4.2.2/R-4.2.2-win.exe))
97+
2. Install **RStudio** (latest stable release)
98+
99+
### Step 2: Clone or Download the Repository
100+
101+
Clone the repository via GitHub or download the ZIP file and unzip it locally.
102+
103+
### Step 3: Restore the Project Environment via `renv`
104+
105+
1. Open **`Master-Thesis.Rproj`** with RStudio.
106+
2. Run the following in the R console:
107+
108+
```r
109+
renv::restore()
110+
```
111+
112+
This restores the exact package versions as specified in the `renv.lock` file, ensuring a consistent and reproducible computational environment.
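Before restoring packages, it can help to confirm that the running R version matches the one named in Step 1. The helper `r_version_matches()` below is a hypothetical convenience function, not part of the repository. The call to `renv::status()` is guarded so that it runs only in an interactive session with `renv` installed.

```r
# Hypothetical helper: does the running R version equal the required one?
r_version_matches <- function(required, current = getRversion()) {
  current == package_version(required)
}

required_r <- "4.2.2"  # version named in Step 1
if (!r_version_matches(required_r)) {
  message("Running R ", as.character(getRversion()),
          "; the README specifies R ", required_r, ".")
}

# Optionally report whether the library is in sync with renv.lock.
if (interactive() && requireNamespace("renv", quietly = TRUE)) {
  renv::status()
}
```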
113+
114+
### Step 4: Run the Simulation Scripts
115+
116+
1. Execute **`scripts/main-simulation-function-future-simul-part1.R`**
117+
118+
* Runs simulations for DGMs 2–4
119+
* Outputs will be saved in `output/April10_fullsimulation/`
120+
121+
2. Execute **`scripts/main-simulation-function-future-simul-part2.R`**
122+
123+
* Runs simulations for DGM 1
124+
* Outputs will be saved in `output/April17_fullsimulation_contxy/`
125+
126+
### Step 5: Reproduce the Figures
127+
128+
Run **`scripts/results-plotting.R`** to generate the plots used in the manuscript.
129+
130+
This script automatically collects and merges the simulation outputs from both parts and creates boxplots for each design condition.
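Steps 4 and 5 can also be run from a single R session. The driver below is a sketch, assuming the working directory is the project root. `run_pipeline()` is a hypothetical helper, not a function provided by the repository. It checks that every script exists before anything runs, and it defaults to a dry run because the full simulations are presumably time-consuming.

```r
# Scripts named in Steps 4 and 5, in execution order.
scripts <- c(
  "scripts/main-simulation-function-future-simul-part1.R",
  "scripts/main-simulation-function-future-simul-part2.R",
  "scripts/results-plotting.R"
)

# Hypothetical helper: source each script in order, failing early if any is missing.
run_pipeline <- function(paths, dry_run = TRUE) {
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Script(s) not found (is the working directory the project root?): ",
         paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message(if (dry_run) "Would run " else "Running ", p)
    if (!dry_run) source(p, echo = FALSE)
  }
  invisible(paths)
}
```

Calling `run_pipeline(scripts)` only lists what would be executed. `run_pipeline(scripts, dry_run = FALSE)` sources the three scripts in sequence.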
69131

70132
---
