
Commit 0db6e14

Update README.md
1 parent bf9de33 commit 0db6e14

1 file changed

Lines changed: 48 additions & 66 deletions


README.md

```diff
@@ -4,117 +4,99 @@ This repository contains all materials associated with the manuscript:
 **"From Multilevel Modeling to GEE: Revisiting the Within- and Between-Person Debate with Binary Predictors and Outcomes"**
 It includes code, supplementary documentation, and simulation results to ensure full transparency and reproducibility of the study.

+
 ## Ethics Assessment

-This simulation study was approved by the Ethical Review Board of the Faculty of Social and Behavioural Sciences of Utrecht University. The approval is based on the documents sent by the researchers as requested in the form of the Ethics committee and filed under FETC number 24-2003. The approval is valid through 31 May 2025. The approval of the Ethical Review Board concerns ethical aspects, as well as data management and privacy issues (including the GDPR).
+This simulation study was approved by the Ethical Review Board of the Faculty of Social and Behavioural Sciences of Utrecht University. The approval is based on the documents submitted by the researchers as required by the Ethics Committee and filed under FETC number 24-2003. The approval is valid through 31 May 2025. The approval pertains to ethical considerations, data management, and privacy issues (including GDPR compliance).

```
```diff
 ## Study Design

-This simulation study evaluates the generalizability of disaggregation methods—commonly applied in multilevel linear models (MLMs)—to *generalized* multilevel models (GLMMs) and *generalized estimating equations* (GEEs) when dealing with binary predictors and/or binary outcomes.
-
-We address two primary questions:
-
-1. Can disaggregation methods (UC, CWC, MuCo) reliably recover within-person and contextual effects in GLMMs with binary predictors and/or outcomes?
-2. Do GEEs require explicit disaggregation to correctly estimate within-person effects, especially when contextual effects are present?
-
-We simulate data across four data-generating models (DGMs) that vary in the scale of the predictor and outcome variables (continuous or binary). Across DGMs, we keep constant the within-cluser standard deviation (SD) of the continuous predictor, the fixed intercept, the within-cluster effect and the level 1 residual SD (for DGMs with continuous outcome). For each of these DGMs, we systematically vary:
+This simulation study evaluates the generalizability of disaggregation methods—commonly applied in multilevel linear models (MLMs)—to *generalized* multilevel models (GLMMs) and *generalized estimating equations* (GEEs) in the context of binary predictors and/or outcomes.

-CREATE MARKDOWN TABLE
+We address two questions:

-* Sample size *(N = 100, 200)*
-* Number of time points *(T = 5, 10, 20)*
-* Between-cluster SD in continuous predictor (0, 1, 3)
-* SD in Z (the latent trait underlying between-person variability in binary X): (0, 1, 3)
-* Contextual effect: (0, 1, 3)
-* Random intercept residual SD: (1, 3)
+1. Can disaggregation methods (uncentered, centering-within-cluster, Mundlak's contextual model) reliably recover within-person and contextual effects in GLMMs with binary predictors and/or outcomes?
+2. Do GEEs require explicit disaggregation to correctly estimate within-person effects in the presence of contextual effects?

-Each dataset is analyzed using 12 strategies: combinations of 3 disaggregation methods (uncentered, centering-within-clusters and mundlak's contextual model) and 4 estimation approaches (GLMM, and GEE with independence, exchangeable, and AR(1) correlation structures).
+We simulate data under four data-generating mechanisms (DGMs) that vary the scale of the predictor and outcome variables (binary or continuous). Across DGMs, the following parameters are held constant: the within-cluster SD of the continuous predictor, the fixed intercept, the within-cluster effect, and the level-1 residual SD (for DGMs with a continuous outcome). The table below summarizes the manipulated design factors:

-Model performance is assessed via estimation bias in fixed effects (within-person: β₁; contextual: γ₀₁).
+| Factor | Levels |
+| ------------------------------------ | --------- |
+| Sample size *(N)* | 100, 200 |
+| Number of time points *(T)* | 5, 10, 20 |
+| Between-cluster SD in continuous *X* | 0, 1, 3 |
+| SD in latent *Z* for binary *X* | 0, 1, 3 |
+| Contextual effect | 0, 1, 3 |
+| Random intercept residual SD | 1, 3 |

----
+Each dataset is analyzed using 12 strategies: all combinations of 3 disaggregation methods and 4 estimation approaches (GLMM and GEE with independence, exchangeable, and AR(1) correlation structures). Model performance is evaluated in terms of estimation bias for the within-person effect ($\beta_1$) and the contextual effect ($\gamma_{01}$).

```
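The three disaggregation methods and the 12 analysis strategies described in the Study Design section can be sketched in base R. This is an illustrative sketch, not code from the repository: the toy data, the column names `id`, `x` and `y`, and the choice to enter only the within-person deviation under centering within cluster are assumptions.

```r
# Toy long-format data (hypothetical): 3 persons (id) x 4 time points,
# with a binary predictor x.
dat <- data.frame(
  id = rep(1:3, each = 4),
  x  = c(1, 0, 1, 1,   0, 0, 1, 0,   1, 1, 1, 0)
)

# Person mean of x (between-person component). For binary x this is the
# proportion of occasions with x = 1. ave() keeps the rows of dat aligned.
dat$x_mean <- ave(dat$x, dat$id, FUN = mean)

# Centering within cluster (CWC): deviation from the person mean.
dat$x_cwc <- dat$x - dat$x_mean

# Predictors entering the model under each disaggregation method:
#   uncentered: raw x, whose coefficient blends within and between effects
#   cwc:        within-person deviation only (assumed specification)
#   mundlak:    raw x plus the person mean; the x_mean coefficient is the
#               contextual effect (between minus within)
predictors <- list(
  uncentered = "x",
  cwc        = "x_cwc",
  mundlak    = c("x", "x_mean")
)

# Model formulas for a hypothetical outcome column y.
formulas <- lapply(predictors, function(p) reformulate(p, response = "y"))

# The 12 analysis strategies: 3 disaggregation methods x 4 estimators.
strategies <- expand.grid(
  method    = names(predictors),
  estimator = c("GLMM", "GEE-independence", "GEE-exchangeable", "GEE-AR(1)"),
  stringsAsFactors = FALSE
)
```

Under the Mundlak specification the linear predictor is $\beta_1 x + \gamma_{01} \bar{x}$. Substituting $x = x_{cwc} + \bar{x}$ gives a between-person effect of $\beta_1 + \gamma_{01}$, which is why the `x_mean` coefficient in this sketch is read as a contextual effect rather than a between-person effect.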
```diff
 ## Repository Structure

 **`renv.lock`**

-Contains information on the requirements of all the dependencies of R-packages used in the simulation study.
+Contains dependency information for full reproducibility with `renv`.

 ### `scripts/`

-Contains all core scripts for running and analyzing the simulation study.
-
 * **`main-simulation-function-future-simul-part1.R`**
-Modularized main script that executes the simulation across various design conditions for DGM 2, 3 and 4
+Runs simulations for DGMs 2–4.
+
 * **`main-simulation-function-future-simul-part2.R`**
-Modularized main script that executes the simulation across various design conditions for DGM 1.
+Runs simulations for DGM 1.

 * **`results-plotting.R`**
-Scripts used to produce the plots featured in the Results section of the manuscript.
+Produces plots used in the manuscript.

-* **`helper-functions/`** (subfolder with modular components):
+* **`helper-functions/`**

-* `data-generation-centeredX.R`: Data-generating mechanisms based on the hybrid model.
-* `data-generation-mundlak.R`: Data-generating mechanisms based on Mundlak’s contextual model (the model used in the main manuscript).
-* `model-fitting.R`: Model-fitting procedures for both GLMMs and GEEs.
-* `result-formatting.R`: Functions to clean and format simulation output.
+* `data-generation-centeredX.R`: Hybrid model data generation.
+* `data-generation-mundlak.R`: Contextual model data generation.
+* `model-fitting.R`: GLMM and GEE fitting procedures.
+* `result-formatting.R`: Formatting and summarizing outputs.

 ### `docs/`

-Contains interactive and rendered documents to explore the simulation designs.
-
-* **`data-exploration.qmd`**: Allows users to explore all four data-generating mechanisms (DGMs) considered in the study.
-* **`data-exploration.html`**: Rendered HTML version for [direct inspection](https://wardeiling.github.io/multilevel-vs-gee-binary/data-exploration.html).
+* **`data-exploration.qmd`** / **`.html`**
+Interactive Quarto document for exploring DGMs. [View HTML](https://wardeiling.github.io/multilevel-vs-gee-binary/data-exploration.html)

-Supplementary materials accompanying the manuscript.
-
-* **`supplementary_materials.qmd`**: Quarto document with:
-
-1. A comparison between the hybrid and Mundlak's contextual model.
-2. Discussion of boundary/extreme estimates in GEEs and how they were handled.
-* **`supplementary_materials.html`**: Rendered HTML version for [direct inspection](https://wardeiling.github.io/multilevel-vs-gee-binary/supplementary_materials.html).
+* **`supplementary_materials.qmd`** / **`.html`**
+Additional theoretical discussion. [View HTML](https://wardeiling.github.io/multilevel-vs-gee-binary/supplementary_materials.html)

 ### `simulation_results/`

-Contains raw and processed simulation outputs, organized into subfolders corresponding to different simulation runs:
-
-* **`April10_fullsimulation/`**: Part 1, covering DGMs 2–4.
-* **`April17_fullsimulation_contxy/`**: Part 2, covering DGM 1.
-* **`April18_fullsimulation_combined/figures/`**: Final figures used in the manuscript.
+* **`April10_fullsimulation/`**: DGMs 2–4
+* **`April17_fullsimulation_contxy/`**: DGM 1
+* **`April18_fullsimulation_combined/figures/`**: Final manuscript figures

-Each run folder contains:
+Each folder includes:

-* `i.RDS`: Raw output for each design/scenario `i`.
-* `settings.RDS`: Simulation settings used for that batch.
-* `log.txt`: Logs containing warnings and errors during simulation.
-* `summary-results-bias.RDS` & `.csv`: Summary files quantifying bias in the estimates.
+* `i.RDS`: Raw results for design scenario *i*
+* `settings.RDS`: Design settings metadata
+* `log.txt`: Warnings/errors from execution
+* `summary-results-bias.RDS` and `.csv`: Summary performance metrics

-### **`renv/`**
+### `renv/`

-Contains documents that save the settings of the `renv` environment.
+Contains internal `renv` files storing the project-specific package environment.

```
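The run folders listed under `simulation_results/` store one raw `i.RDS` file per design scenario. Below is a minimal base-R sketch for collecting these files from one folder. The helper name `read_run_folder()` is hypothetical and does not exist in the repository. The sketch assumes the raw files are named `<i>.RDS` with an integer `i`, and it makes no assumption about what each file contains.

```r
# Hypothetical helper (not part of the repository): read all raw scenario
# files "<i>.RDS" in one run folder into a list named and ordered by i.
read_run_folder <- function(run_dir) {
  # Only integer file names match, so settings.RDS and
  # summary-results-bias.RDS are skipped.
  files <- list.files(run_dir, pattern = "^[0-9]+\\.RDS$", full.names = TRUE)
  ids <- as.integer(sub("\\.RDS$", "", basename(files)))
  # list.files() sorts alphabetically ("10.RDS" before "2.RDS"),
  # so reorder numerically by scenario number.
  ord <- order(ids)
  results <- lapply(files[ord], readRDS)
  names(results) <- ids[ord]
  results
}

# Usage, from the project root:
# raw <- read_run_folder("simulation_results/April10_fullsimulation")
```

Matching only integer file names keeps the batch-level files (`settings.RDS`, `summary-results-bias.RDS`) out of the scenario list. Those files can be read separately with `readRDS()`.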
````diff
 ## Reproducibility via `renv`

-This repository uses the [`renv`](https://rstudio.github.io/renv/) package to create a reproducible R environment. To replicate the computational setup:
+This project uses the [`renv`](https://rstudio.github.io/renv/) package to ensure a reproducible R environment. To replicate the computational setup:

-1. Download `R` version 4.2.2 from CRAN ([link](https://cran.rstudio.com/bin/windows/base/old/4.4.2/R-4.4.2-win.exe)), install in RStudio and set as R version.
-2. Clone or download the entire repository.
+1. Install R version 4.2.2 from [CRAN](https://cran.rstudio.com/bin/windows/base/old/4.4.2/R-4.4.2-win.exe).
+2. Clone or download the repository.
 3. Open the project in RStudio.
 4. Run:

 ```r
 renv::restore()
 ```

-This restores all package versions as specified in the `renv.lock` file, ensuring consistent results across systems and over time.
-
-After the computational setup is replicated, we can run the main simulation and reproduce the output as follows
-
-1. Run **`scripts/main-simulation-function-future-simul-part1.R`**, which should automatically retrieve the helper functions and produce output in the folder `April10_fullsimulation/`
-2. Run **`scripts/main-simulation-function-future-simul-part2.R`**, which should automatically retrieve the helper functions and produce output in the folder `April17_fullsimulation_contXY/`
-
-Now we can use the output to reproduce the figures shown in the result section of the manuscript as follows
+This will install all required package versions as specified in `renv.lock`.

-1. Run **`post-processing/results-plotting.R`**, which should retrieve the simulation output, merge them together and process it for creating the boxplots.
+## Reproducing Results

----
+1. Run `scripts/main-simulation-function-future-simul-part1.R` to simulate DGMs 2–4. Output is saved to `April10_fullsimulation/`.
+2. Run `scripts/main-simulation-function-future-simul-part2.R` for DGM 1. Output is saved to `April17_fullsimulation_contxy/`.
+3. Run `scripts/results-plotting.R` to process and visualize the simulation results used in the manuscript.
````
