Online Supplementary Materials

From Multilevel Modeling to GEE: Revisiting the Within- and Between-Person Debate with Binary Predictors and Outcomes

Author

Affiliation

Ward B. Eiling

Utrecht University

Published

May 12, 2024

1 Introduction

This document provides supplementary materials for the manuscript “From Multilevel Modeling to GEE: Revisiting the Within- and Between-Person Debate with Binary Predictors and Outcomes”. We include:

A comparison of the hybrid and Mundlak’s contextual method.
Details on the treatment of extreme parameter estimates in Generalized Estimating Equations (GEEs).

All analyses were conducted in R. Package citations are provided at the end of this document.

Code

# Load packages
library(purrr)       # Functional programming
library(dplyr)       # Data manipulation
library(tidyr)       # Data reshaping
library(stringr)     # String handling
library(ggplot2)     # Plotting
library(here)        # File paths
library(patchwork)   # Combining ggplots
library(knitr)       # For bib generation

# Save citations to BibTeX for reproducibility
write_bib(c("purrr", "dplyr", "tidyr", "stringr", "ggplot2", "here", "patchwork", "knitr"), file = "packages.bib")

2 Loading Simulation Results

First, we need to load the simulation results. As the simulations were ran in two parts, we will load the results from both parts. The first part includes DGMs 2-4 (binary X and continuous Y, continuous X and binary Y, and binary X and Y), while the second part includes DGM 1 (continuous predictor and outcome). The results are saved in the output folder.

Code

### PART 1: DGM 2 (binary X and continuous Y), DGM 3 (continuous X and binary Y) and DGM 4 (binary X and Y) ----
runname <- "April10_fullsimulation"

# retrieve the design and parameterization
settings <- readRDS(here(paste0("output/", runname, "/settings.RDS")))
design <- settings$design
parametrization <- settings$parametrization
nsim <- settings$nsim

# repeat each row of the design nsim times and create replication index
design_long <- design[rep(1:nrow(design), each = nsim), ] %>%
  mutate(replication = rep(1:nsim, times = nrow(design)),
         design_id = rep(1:nrow(design), each = nsim))

# initialize list to store simulation results
results_list <- list()

# loop over the design
for (idesign in 1:nrow(design)) {
  
  ### Extract parameter values from design
  g.00 <- design$g.00[idesign]
  g.01 <- design$g.01[idesign]
  g.10 <- design$g.10[idesign]
  
  # determine the beta values based on the parametrization
  if (parametrization == "centeredX") {
    beta_between <- g.01
    beta_within <- g.10
    beta_contextual <- beta_between - beta_within
  } else if (parametrization == "mundlak") {
    beta_contextual <- g.01
    beta_within <- g.10
    beta_between <- beta_within + beta_contextual
  }
  
  # read in the results
  parallel_results_setting <- readRDS(here(paste0("output/", runname, "/", idesign, ".RDS")))
  
  # unlist the lists inside the list
  df <- map_dfr(parallel_results_setting, function(rep) {
    map_dfr(rep, ~ as.data.frame(as.list(.x)), .id = "model")
  }, .id = "replication")
  
  df_wide <- df %>%
    select(-X.Intercept.) %>%
    pivot_wider(id_cols = replication, names_from = model, values_from = c("X", "X.cent", "X.cluster.means"), names_glue = "{model}_{.value}") %>%
    select(where(~ !all(is.na(.)))) %>%
    mutate(replication = as.integer(replication))  # ensure numeric
  
  # compute bias
  df_wide2 <- df_wide %>%
    mutate(l1_g.10_bias = l1_X - beta_within,
           l2_g.10_bias = l2_X.cent - beta_within,
           l3a_g.10_bias = l3a_X.cent - beta_within,
           l3a_g.01_bias = l3a_X.cluster.means - beta_between,
           l4_g.10_bias = l4_X - beta_within,
           l4_g.01_bias = l4_X.cluster.means - beta_contextual,
           # also for GEE
           # parametrization 1
           g.independence1_g.10_bias = g.independence1_X - beta_within,
           g.exchangeable1_g.10_bias = g.exchangeable1_X - beta_within,
           g.ar11_g.10_bias = g.ar11_X - beta_within,
           # parametrization 2
           g.independence2_g.10_bias = g.independence2_X.cent - beta_within,
           g.exchangeable2_g.10_bias = g.exchangeable2_X.cent - beta_within,
           g.ar12_g.10_bias = g.ar12_X.cent - beta_within,
           # parametrization 3
           g.independence3_g.10_bias = g.independence3_X.cent - beta_within,
           g.independence3_g.01_bias = g.independence3_X.cluster.means - beta_between,
           g.exchangeable3_g.10_bias = g.exchangeable3_X.cent - beta_within,
           g.exchangeable3_g.01_bias = g.exchangeable3_X.cluster.means - beta_between,
           g.ar13_g.10_bias = g.ar13_X.cent - beta_within,
           g.ar13_g.01_bias = g.ar13_X.cluster.means - beta_between,
           # parametrization 4
           g.independence4_g.10_bias = g.independence4_X - beta_within,
           g.independence4_g.01_bias = g.independence4_X.cluster.means - beta_contextual,
           g.exchangeable4_g.10_bias = g.exchangeable4_X - beta_within,
           g.exchangeable4_g.01_bias = g.exchangeable4_X.cluster.means - beta_contextual,
           g.ar14_g.10_bias = g.ar14_X - beta_within,
           g.ar14_g.01_bias = g.ar14_X.cluster.means - beta_contextual,
           # add design_id for merging later
           design_id = idesign) 
  
  # store the result
  results_list[[idesign]] <- df_wide2
}

# combine all results
sim_results_all <- bind_rows(results_list)

# merge with the long design
final_df <- left_join(design_long, sim_results_all, by = c("design_id", "replication"))

# save the final data frame
saveRDS(final_df, here(paste0("output/", runname, "/plotting_bias_df1.RDS")))

### PART 2: DGM 1 (continuous X and Y) ----

runname <- "April17_fullsimulation_contXY"

# retrieve the design and parameterization
settings <- readRDS(here(paste0("output/", runname, "/settings.RDS")))
design <- settings$design
parametrization <- settings$parametrization
nsim <- settings$nsim

# repeat each row of the design nsim times and create replication index
design_long <- design[rep(1:nrow(design), each = nsim), ] %>%
  mutate(replication = rep(1:nsim, times = nrow(design)),
         design_id = rep(1:nrow(design), each = nsim) + 378)

# initialize list to store simulation results
results_list <- list()

# loop over the design
for (idesign in 1:nrow(design)) {
  
  ### Extract parameter values from design
  g.00 <- design$g.00[idesign]
  g.01 <- design$g.01[idesign]
  g.10 <- design$g.10[idesign]
  
  # determine the beta values based on the parametrization
  if (parametrization == "centeredX") {
    beta_between <- g.01
    beta_within <- g.10
    beta_contextual <- beta_between - beta_within
  } else if (parametrization == "mundlak") {
    beta_contextual <- g.01
    beta_within <- g.10
    beta_between <- beta_within + beta_contextual
  }
  
  # read in the results
  parallel_results_setting <- readRDS(here(paste0("output/", runname, "/", idesign, ".RDS")))
  
  # unlist the lists inside the list
  df <- map_dfr(parallel_results_setting, function(rep) {
    map_dfr(rep, ~ as.data.frame(as.list(.x)), .id = "model")
  }, .id = "replication")
  
  df_wide <- df %>%
    select(-X.Intercept.) %>%
    pivot_wider(id_cols = replication, names_from = model, values_from = c("X", "X.cent", "X.cluster.means"), names_glue = "{model}_{.value}") %>%
    select(where(~ !all(is.na(.)))) %>%
    mutate(replication = as.integer(replication))  # ensure numeric
  
  # compute bias
  df_wide2 <- df_wide %>%
    mutate(l1_g.10_bias = l1_X - beta_within,
           l2_g.10_bias = l2_X.cent - beta_within,
           l3a_g.10_bias = l3a_X.cent - beta_within,
           l3a_g.01_bias = l3a_X.cluster.means - beta_between,
           l4_g.10_bias = l4_X - beta_within,
           l4_g.01_bias = l4_X.cluster.means - beta_contextual,
           # also for GEE
           # parametrization 1
           g.independence1_g.10_bias = g.independence1_X - beta_within,
           g.exchangeable1_g.10_bias = g.exchangeable1_X - beta_within,
           g.ar11_g.10_bias = g.ar11_X - beta_within,
           # parametrization 2
           g.independence2_g.10_bias = g.independence2_X.cent - beta_within,
           g.exchangeable2_g.10_bias = g.exchangeable2_X.cent - beta_within,
           g.ar12_g.10_bias = g.ar12_X.cent - beta_within,
           # parametrization 3
           g.independence3_g.10_bias = g.independence3_X.cent - beta_within,
           g.independence3_g.01_bias = g.independence3_X.cluster.means - beta_between,
           g.exchangeable3_g.10_bias = g.exchangeable3_X.cent - beta_within,
           g.exchangeable3_g.01_bias = g.exchangeable3_X.cluster.means - beta_between,
           g.ar13_g.10_bias = g.ar13_X.cent - beta_within,
           g.ar13_g.01_bias = g.ar13_X.cluster.means - beta_between,
           # parametrization 4
           g.independence4_g.10_bias = g.independence4_X - beta_within,
           g.independence4_g.01_bias = g.independence4_X.cluster.means - beta_contextual,
           g.exchangeable4_g.10_bias = g.exchangeable4_X - beta_within,
           g.exchangeable4_g.01_bias = g.exchangeable4_X.cluster.means - beta_contextual,
           g.ar14_g.10_bias = g.ar14_X - beta_within,
           g.ar14_g.01_bias = g.ar14_X.cluster.means - beta_contextual,
           # add design_id for merging later
           design_id = idesign) 
  
  # store the result
  results_list[[idesign]] <- df_wide2
}

# combine all results
sim_results_all <- bind_rows(results_list) %>%
  mutate(design_id = design_id + 378) # adjust design_id to match the first simulation

# merge with the long design
final_df <- left_join(design_long, sim_results_all, by = c("design_id", "replication"))

# save the final data frame
saveRDS(final_df, here(paste0("output/", runname, "/plotting_bias_df2.RDS")))

Let us now combine the results from both simulations

Code

### COMBINE RESULTS FROM BOTH SIMULATIONS ### ----

# read in the first simulation results
runname1 <- "April10_fullsimulation"
final_df1 <- readRDS(here(paste0("output/", runname1, "/plotting_bias_df1.RDS")))
# read in the second simulation results
runname2 <- "April17_fullsimulation_contXY"
final_df2 <- readRDS(here(paste0("output/", runname2, "/plotting_bias_df2.RDS")))

# combine the two data frames
final_df <- bind_rows(final_df1, final_df2)
# save the final data frame
newrunname <- "April18_fullsimulation_combined"
saveRDS(final_df, here(paste0("output/", newrunname, "/plotting_bias_df.RDS")))

Code

# read in the final data frame
newrunname <- "April18_fullsimulation_combined"
final_df <- readRDS(here(paste0("output/", newrunname, "/plotting_bias_df.RDS")))

3 Comparison of Hybrid and Mundlak’s Contextual Method

To demonstrate numerical similarity between the hybrid (Model 4) and mundlak’s contextual (Model 3a) specifications, we computed the bias difference per replication. Results show that the within-person effects are equivalent across all scenarios. For contextual effects, we find no systematic differences under continuous outcomes (a linear link), but do find systematic differences in GEE under binary outcomes (a logit link). However, these do not affect conclusions made in the manuscript.

For continuous outcomes (DGMs 1 and 2; see Figure 1), we can see that there are no differences in the within-person effect estimated by the two methods across estimation frameworks and the 1000 replications. In contrast, we see slight differences in the contextual effect estimated by the two methods, although the distribution is closely centered around 0, implying that any imbalance in the estimates is very small and will balance out in the long run.

Code

# filter for the hybrid and contextual method
hybrid_contextual_df <- final_df %>%
  select(-c(ends_with("_success"), ends_with("_X"), ends_with("_X.cent"), ends_with("_X.cluster.means"),
            "l1_X", "g.independence1_X", "g.exchangeable1_X", "g.ar11_X", contains("2_"))) %>%
  filter(outcome.type == "continuous") %>%
  # create a column for the difference for within-person effect beta_1
  mutate(GLMM_beta1_dif = l4_g.10_bias - l3a_g.10_bias,
         GEE_indep_beta1_dif = g.independence4_g.10_bias - g.independence3_g.10_bias,
         GEE_exch_beta1_dif = g.exchangeable4_g.10_bias - g.exchangeable3_g.10_bias,
         GEE_ar_beta1_dif = g.ar14_g.10_bias - g.ar13_g.10_bias) %>%
  # create columns for the difference in contextual effect 
  mutate(GLMM_g01_dif = l4_g.01_bias - l3a_g.01_bias,
         GEE_indep_g01_dif = g.independence4_g.01_bias - g.independence3_g.01_bias,
         GEE_exch_g01_dif = g.exchangeable4_g.01_bias - g.exchangeable3_g.01_bias,
         GEE_ar_g01_dif = g.ar14_g.01_bias - g.ar13_g.01_bias)

# plot the differences in a density plot
plot1 <- hybrid_contextual_df %>%
  ggplot(aes(x = GLMM_beta1_dif)) +
  geom_density() +
  labs(x = "Difference in within-person effect", y = "Density", title = "GLMM within-person effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot2 <- hybrid_contextual_df %>%
  ggplot(aes(x = GLMM_g01_dif)) +
  geom_density() +
  labs(x = "Difference in contextual effect", y = "Density", title = "GLMM contextual effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot3 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_indep_beta1_dif)) +
  geom_density() +
  labs(x = "Difference in within-person effect", y = "Density", title = "GEE-indep within-person effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot4 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_indep_g01_dif)) +
  geom_density() +
  labs(x = "Difference in contextual effect", y = "Density", title = "GEE-indep contextual effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot5 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_exch_beta1_dif)) +
  geom_density() +
  labs(x = "Difference in within-person effect", y = "Density", title = "GEE-exch within-person effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot6 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_exch_g01_dif)) +
  geom_density() +
  labs(x = "Difference in contextual effect", y = "Density", title = "GEE-exch contextual effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot7 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_ar_beta1_dif)) +
  geom_density() +
  labs(x = "Difference in within-person effect", y = "Density", title = "GEE-AR(1) within-person effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot8 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_ar_g01_dif)) +
  geom_density() +
  labs(x = "Difference in contextual effect", y = "Density", title = "GEE-AR(1) contextual effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

# combine plots with title
combined_plot <- (plot1 + plot2) / (plot3 + plot4) / (plot5 + plot6) / (plot7 + plot8)
combined_plot

Figure 1: Differences in Estimates Between Hybrid and Contextual Methods in DGM 1 and 2 (continuous outcome)

For binary outcomes (DGM 3 and 4; see Figure 2), the story is different. We see again that the within-person effect estimates are similar across the two methods. For the contextual effect, we see that GLMM shows substantially more variation than for the continuous outcomes, but that the distribution is still symmetric and centered around 0. However, for the GEEs estimation approaches, we see bi-modal distributions (two peaks) for the contextual effect estimates and that the distribution is not centered around 0 anymore. This indicates a difference in estimation of the identity (linear) and logit (non-linear) link functions.

Code

# filter for the hybrid and contextual method
hybrid_contextual_df <- final_df %>%
  select(-c(ends_with("_success"), ends_with("_X"), ends_with("_X.cent"), ends_with("_X.cluster.means"),
            "l1_X", "g.independence1_X", "g.exchangeable1_X", "g.ar11_X", contains("2_"))) %>%
  filter(outcome.type == "binary") %>%
  # create a column for the difference for within-person effect beta_1
  mutate(GLMM_beta1_dif = l4_g.10_bias - l3a_g.10_bias,
         GEE_indep_beta1_dif = g.independence4_g.10_bias - g.independence3_g.10_bias,
         GEE_exch_beta1_dif = g.exchangeable4_g.10_bias - g.exchangeable3_g.10_bias,
         GEE_ar_beta1_dif = g.ar14_g.10_bias - g.ar13_g.10_bias) %>%
  # create columns for the difference in contextual effect 
  mutate(GLMM_g01_dif = l4_g.01_bias - l3a_g.01_bias,
         GEE_indep_g01_dif = g.independence4_g.01_bias - g.independence3_g.01_bias,
         GEE_exch_g01_dif = g.exchangeable4_g.01_bias - g.exchangeable3_g.01_bias,
         GEE_ar_g01_dif = g.ar14_g.01_bias - g.ar13_g.01_bias)

# plot the differences in a density plot
plot1 <- hybrid_contextual_df %>%
  ggplot(aes(x = GLMM_beta1_dif)) +
  geom_density() +
  labs(x = "Difference in within-person effect", y = "Density", title = "GLMM within-person effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot2 <- hybrid_contextual_df %>%
  ggplot(aes(x = GLMM_g01_dif)) +
  geom_density() +
  labs(x = "Difference in contextual effect", y = "Density", title = "GLMM contextual effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot3 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_indep_beta1_dif)) +
  geom_density() +
  labs(x = "Difference in within-person effect", y = "Density", title = "GEE-indep within-person effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot4 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_indep_g01_dif)) +
  geom_density() +
  labs(x = "Difference in contextual effect", y = "Density", title = "GEE-indep contextual effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot5 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_exch_beta1_dif)) +
  geom_density() +
  labs(x = "Difference in within-person effect", y = "Density", title = "GEE-exch within-person effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot6 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_exch_g01_dif)) +
  geom_density() +
  labs(x = "Difference in contextual effect", y = "Density", title = "GEE-exch contextual effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot7 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_ar_beta1_dif)) +
  geom_density() +
  labs(x = "Difference in within-person effect", y = "Density", title = "GEE-AR(1) within-person effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

plot8 <- hybrid_contextual_df %>%
  ggplot(aes(x = GEE_ar_g01_dif)) +
  geom_density() +
  labs(x = "Difference in contextual effect", y = "Density", title = "GEE-AR(1) contextual effect") +
  theme_bw() +
  xlim(-5, 5) +
  theme(legend.position = "bottom")

# combine plots with title
combined_plot <- (plot1 + plot2) / (plot3 + plot4) / (plot5 + plot6) / (plot7 + plot8)
combined_plot

Figure 2: Differences in Estimates Between Hybrid and Contextual Methods in DGM 3 and 4 (binary outcome)

Let us investigate whether this influenced the figures shown in the manuscript for binary outcomes. Let us first consider DGM 3 (continuous predictor, binary outcome), which is visualized in Figure 3 . As expected based on the density plots, we see that the GLMM estimates of the two methods are very similar, but that the GEE estimates are not. Specifically, we see that the HB method produces estimates that have greater bias than the MuCo method. Nevertheless, this does not change any conclusions, as both methods under GEE with a binary outcome consistently fail to retrieve the true contextual effect.

Code

runname <- "April18_fullsimulation_combined"

# read in the final data frame
final_df <- readRDS(here(paste0("output/", runname, "/plotting_bias_df.RDS")))
  
# select relevant variables and cases (select and filter)
plot_df <- final_df %>%
# select T = 20, N = 200, sd.u0 = 1, predictor.type = "binary" and outcome.type = "continuous"
filter(T_total %in% c(5, 20), N_total == 200, sd.u0 %in% c(1, 3), predictor.type == "continuous", outcome.type == "binary",
       sdX.between == 3, g.01 == 3) %>%
select(-c(ends_with("_success"), ends_with("_X"), ends_with("_X.cent"), ends_with("_X.cluster.means")))

plot_df_g01 <- plot_df %>%
select(-ends_with("_g.10_bias")) %>%
# turn the bias variables into long format, with a new column indicating the model name
pivot_longer(cols = ends_with("_bias"), names_to = "model", values_to = "g01_bias") %>%
# remove the "_g.01_bias" suffix from the model names
mutate(model = str_remove(model, "_g.01_bias")) %>%
# change model names
mutate(model = recode(model,
                      "l3a" = "GLMM-HB",
                      "l4" = "GLMM-MuCo",
                      "g.independence3" = "GEE-independence-HB",
                      "g.independence4" = "GEE-independence-MuCo",
                      "g.exchangeable3" = "GEE-exchangeable-HB",
                      "g.exchangeable4" = "GEE-exchangeable-MuCo",
                      "g.ar13" = "GEE-AR1-HB",
                      "g.ar14" = "GEE-AR1-MuCo")) %>%
# set factor levels of model to ensure correct order in the plot
mutate(model = factor(model, levels = c("GLMM-HB", "GLMM-MuCo", "GEE-independence-HB", "GEE-independence-MuCo",
                                        "GEE-exchangeable-HB", "GEE-exchangeable-MuCo",
                                        "GEE-AR1-HB", "GEE-AR1-MuCo"))) %>%
# create new variable indicating method type (so M1 and G1 are "Method 1")
mutate(method_type = case_when(
  str_detect(model, "HB") ~ "HB",
  str_detect(model, "MuCo") ~ "MuCo",
)) %>%
mutate(estimation_type = case_when(
  str_detect(model, "GLMM") ~ "GLMM",
  str_detect(model, "independence") ~ "GEE-independence",
  str_detect(model, "exchangeable") ~ "GEE-exchangeable",
  str_detect(model, "AR1") ~ "GEE-AR(1)"
)) %>%
# set factor levels of method_type to ensure correct order in the plot
mutate(estimation_type = factor(estimation_type, levels = c("GLMM", "GEE-independence", "GEE-exchangeable", "GEE-AR(1)")),
       method_type = factor(method_type, levels = c("HB", "MuCo"))) %>%
# Turn label variables (sdX.between, g.01 and sd.u0) into strings with an underscore
mutate(sd.u0_label = factor(sd.u0,
                            levels = c(1, 3),
                            labels = c(expression(sigma[u] == 1), expression(sigma[u] == 3)))) %>%
mutate(T_total_label = factor(T_total,
                              levels = c(5, 20),
                              labels = c("T == 5", "T == 20"))) %>%
# remove all bias values exceeding 100
mutate(g01_bias = ifelse(abs(g01_bias) > 100, NA, g01_bias)) 

# For the contextual effect
ggplot(plot_df_g01, aes(x = method_type, y = g01_bias, col = estimation_type)) +
geom_boxplot() +
geom_hline(yintercept = 0, linetype = "dashed") +  # Dashed horizontal line at 0
coord_cartesian(ylim = c(-1.5, 1.5)) +  # Set y-axis limits
# add tick mark at Y for every 0.5
scale_y_continuous(breaks = seq(-1.5, 1.5, by = 0.5)) +
labs(x = "Method", y = "Bias") +
facet_grid(sd.u0_label ~ T_total_label, labeller = label_parsed) + # Show T and N values in labels
theme_bw() +
# remove X axis labels
theme(# remove vertical grid lines
  panel.grid.major.x = element_blank(),
  # remove X tick marks
  axis.ticks.x = element_blank(),
  # increase font size for grid titles
  strip.text.x = element_text(size = 12),
  strip.text.y = element_text(size = 12),
  # increase font size for X entries
  axis.text.x = element_text(size = 12),
  axis.text.y = element_text(size = 12),
  axis.title.y = element_text(size = 13),
  axis.title.x = element_text(size = 13),
  # remove legend
  # increase legend font size
  legend.text = element_text(size = 11),
  legend.title = element_text(size = 13),
  legend.position = "bottom",
) +
# change legend title to "Estimation"
scale_color_brewer(name = "Estimation", palette = "Spectral")

Figure 3: Bias in contextual effect estimates for DGM 3 (continuous predictor, binary outcome)

For DGM 4 (binary predictor and outcome), we see the same pattern found for DGM 3, indicating that the differences in bias between the two methods do not alter the conclusions.

Code

runname <- "April18_fullsimulation_combined"

# read in the final data frame
final_df <- readRDS(here(paste0("output/", runname, "/plotting_bias_df.RDS")))
  
# select relevant variables and cases (select and filter)
plot_df <- final_df %>%
# select T = 20, N = 200, sd.u0 = 1, predictor.type = "binary" and outcome.type = "continuous"
filter(T_total %in% c(5, 20), N_total == 200, sd.u0 %in% c(1, 3), predictor.type == "binary", outcome.type == "binary",
       sdX.between == 3, g.01 == 3) %>%
select(-c(ends_with("_success"), ends_with("_X"), ends_with("_X.cent"), ends_with("_X.cluster.means")))

plot_df_g01 <- plot_df %>%
select(-ends_with("_g.10_bias")) %>%
# turn the bias variables into long format, with a new column indicating the model name
pivot_longer(cols = ends_with("_bias"), names_to = "model", values_to = "g01_bias") %>%
# remove the "_g.01_bias" suffix from the model names
mutate(model = str_remove(model, "_g.01_bias")) %>%
# change model names
mutate(model = recode(model,
                      "l3a" = "GLMM-HB",
                      "l4" = "GLMM-MuCo",
                      "g.independence3" = "GEE-independence-HB",
                      "g.independence4" = "GEE-independence-MuCo",
                      "g.exchangeable3" = "GEE-exchangeable-HB",
                      "g.exchangeable4" = "GEE-exchangeable-MuCo",
                      "g.ar13" = "GEE-AR1-HB",
                      "g.ar14" = "GEE-AR1-MuCo")) %>%
# set factor levels of model to ensure correct order in the plot
mutate(model = factor(model, levels = c("GLMM-HB", "GLMM-MuCo", "GEE-independence-HB", "GEE-independence-MuCo",
                                        "GEE-exchangeable-HB", "GEE-exchangeable-MuCo",
                                        "GEE-AR1-HB", "GEE-AR1-MuCo"))) %>%
# create new variable indicating method type (so M1 and G1 are "Method 1")
mutate(method_type = case_when(
  str_detect(model, "HB") ~ "HB",
  str_detect(model, "MuCo") ~ "MuCo",
)) %>%
mutate(estimation_type = case_when(
  str_detect(model, "GLMM") ~ "GLMM",
  str_detect(model, "independence") ~ "GEE-independence",
  str_detect(model, "exchangeable") ~ "GEE-exchangeable",
  str_detect(model, "AR1") ~ "GEE-AR(1)"
)) %>%
# set factor levels of method_type to ensure correct order in the plot
mutate(estimation_type = factor(estimation_type, levels = c("GLMM", "GEE-independence", "GEE-exchangeable", "GEE-AR(1)")),
       method_type = factor(method_type, levels = c("HB", "MuCo"))) %>%
# Turn label variables (sdX.between, g.01 and sd.u0) into strings with an underscore
mutate(sd.u0_label = factor(sd.u0,
                            levels = c(1, 3),
                            labels = c(expression(sigma[u] == 1), expression(sigma[u] == 3)))) %>%
mutate(T_total_label = factor(T_total,
                              levels = c(5, 20),
                              labels = c("T == 5", "T == 20"))) %>%
# remove all bias values exceeding 100
mutate(g01_bias = ifelse(abs(g01_bias) > 100, NA, g01_bias)) 

# For the contextual effect
ggplot(plot_df_g01, aes(x = method_type, y = g01_bias, col = estimation_type)) +
geom_boxplot() +
geom_hline(yintercept = 0, linetype = "dashed") +  # Dashed horizontal line at 0
coord_cartesian(ylim = c(-1.5, 1.5)) +  # Set y-axis limits
# add tick mark at Y for every 0.5
scale_y_continuous(breaks = seq(-1.5, 1.5, by = 0.5)) +
labs(x = "Method", y = "Bias") +
facet_grid(sd.u0_label ~ T_total_label, labeller = label_parsed) + # Show T and N values in labels
theme_bw() +
# remove X axis labels
theme(# remove vertical grid lines
  panel.grid.major.x = element_blank(),
  # remove X tick marks
  axis.ticks.x = element_blank(),
  # increase font size for grid titles
  strip.text.x = element_text(size = 12),
  strip.text.y = element_text(size = 12),
  # increase font size for X entries
  axis.text.x = element_text(size = 12),
  axis.text.y = element_text(size = 12),
  axis.title.y = element_text(size = 13),
  axis.title.x = element_text(size = 13),
  # remove legend
  # increase legend font size
  legend.text = element_text(size = 11),
  legend.title = element_text(size = 13),
  legend.position = "bottom",
) +
# change legend title to "Estimation"
scale_color_brewer(name = "Estimation", palette = "Spectral")

Figure 4: Bias in contextual effect estimates for DGM 4 (binary predictor and outcome)

4 Appendix: Handling of Extreme Parameter Estimates in GEEs

In the simulations, we found that GEEs tended to yield extreme values rather than terminating with an error. These extreme values are likely a sign of non-convergence. In this section, we will diagnose the extent of this issue and how it relates to the design parameters.

In Table 1 we can see that extreme values only occurred in scenarios with binary outcome variables, that they occurred most frequently for the AR(1) GEE approach, followed by rare occurrences in the exchangeable GEE approach and a complete absence in the independence GEE approach. Here we can see that the g.ar12 with a continuous predictor and binary outcome had the most extreme values (4714 extreme values).

Code

# checking how many GEEs yielded extreme values
runname <- "April18_fullsimulation_combined"

# read in the final data frame
final_df <- readRDS(here(paste0("output/", runname, "/plotting_bias_df.RDS")))

# scenario 1
GEE_check_df <- final_df %>%
  # remove columns ending with "g.01", "X", "X.cent", "X.cluster.means"
  select(-ends_with("_g.01_bias"), -ends_with("_X"), -ends_with("_X.cent"), -ends_with("_X.cluster.means")) %>%
  pivot_longer(cols = ends_with("_g.10_bias"),
               names_to = "model", values_to = "beta1_bias") %>%
  mutate(
    model = str_remove(model, "_g.10_bias"),
    model = factor(model, levels = c("l1", "l2", "l3a", "l4", 
                                     "g.exchangeable1", "g.ar11", "g.independence1",
                                     "g.exchangeable2", "g.ar12", "g.independence2", 
                                     "g.exchangeable3", "g.ar13", "g.independence3", 
                                     "g.exchangeable4", "g.ar14", "g.independence4"))) %>%
  # select only the GEE models
  filter(model %in% c("g.exchangeable1", "g.ar11", "g.independence1",
                      "g.exchangeable2", "g.ar12", "g.independence2", 
                      "g.exchangeable3", "g.ar13", "g.independence3", 
                      "g.exchangeable4", "g.ar14", "g.independence4")) 


# define a threshold for "extreme" bias
threshold1 <- 1E+11  # can also be set to 5, doesn't affect the number of extreme values
threshold2 <- 20  # can also be set to 5, doesn't affect the number of extreme values

# create a table with the number of extreme scenarios per model
extreme_count_df <- GEE_check_df %>%
  group_by(model, predictor.type, outcome.type) %>%
  summarize(
    num_extreme1 = sum(abs(beta1_bias) > threshold1),
    num_extreme2 = sum(abs(beta1_bias) > threshold2),
    .groups = "drop"
  ) %>%
  arrange(desc(num_extreme1))

# create table
knitr::kable(extreme_count_df[1:10,], format = "html")

Table 1: Design Features with Greatest Number of Extreme Values in GEEs

model	predictor.type	outcome.type	num_extreme1	num_extreme2
g.ar12	continuous	binary	4714	4714
g.ar13	continuous	binary	1003	1003
g.ar14	continuous	binary	1003	1003
g.ar13	binary	binary	612	612
g.ar14	binary	binary	611	611
g.ar12	binary	binary	57	57
g.exchangeable3	continuous	binary	26	26
g.exchangeable4	continuous	binary	26	26
g.ar11	continuous	binary	6	6
g.exchangeable1	binary	binary	0	0

All counts are the same across the two thresholds, indicating that when values are extreme, they are very extreme.

Code

# check whether the counts are the same (TRUE or FALSE)
counts_equal <- extreme_count_df$num_extreme1 == extreme_count_df$num_extreme2
# check if there are exclusively TRUE values
all(counts_equal)

[1] TRUE

Another way of investigating this is by evaluating the proportion of extreme replications per scenario. Table 2 shows that there is one scenario (design_id = 299) with nearly half of the replications resulting in extreme values (according to both thresholds).

Code

# create a table with the proportion of extreme replications per scenario
extreme_prop_df <- GEE_check_df %>%
  group_by(model, design_id) %>%
  summarize(
    proportion_extreme1 = mean(abs(beta1_bias) > threshold1),
    proportion_extreme2 = mean(abs(beta1_bias) > threshold2),
    .groups = "drop"
  ) %>%
  arrange(desc(proportion_extreme1))

knitr::kable(extreme_prop_df[1:20,])

Table 2: Designs with Greatest Proportion of Extreme Values in GEEs

model	design_id	proportion_extreme1	proportion_extreme2
g.ar12	299	0.484	0.484
g.ar12	300	0.453	0.453
g.ar12	353	0.451	0.451
g.ar12	354	0.425	0.425
g.ar13	263	0.375	0.375
g.ar14	263	0.375	0.375
g.ar12	227	0.257	0.257
g.ar12	173	0.245	0.245
g.ar12	47	0.223	0.223
g.ar13	264	0.216	0.216
g.ar14	264	0.216	0.216
g.ar12	101	0.210	0.210
g.ar12	174	0.195	0.195
g.ar13	257	0.190	0.190
g.ar14	257	0.190	0.190
g.ar12	228	0.184	0.184
g.ar12	83	0.181	0.181
g.ar12	209	0.181	0.181
g.ar12	102	0.151	0.151
g.ar12	48	0.141	0.141

All proportions are the same across the two thresholds, indicating that when values are extreme, they are very extreme.

Code

# check whether the proportions are the same by checking if there is any false in the vector extreme_prop_df$proportion_extreme1 == extreme_prop_df$proportion_extreme2
proportions_equal <- extreme_prop_df$proportion_extreme1 == extreme_prop_df$proportion_extreme2
# check if there are exclusively TRUE values
all(proportions_equal)

[1] TRUE

There are 76 scenarios that have at least one extreme replication.

Code

# remove zeros from the extreme_prop_df
extreme_prop_df_nozeros <- extreme_prop_df %>%
  filter(proportion_extreme1 > 0)
# count number of unique design_ids
length(unique(extreme_prop_df_nozeros$design_id))

[1] 76

Table 3 shows the GEE scenarios for which the most extreme values occurred in a descending order. We can see here that the scenarios with the most extreme values had a large number of time points (T = 20), a continuous predictor, a binary outcome a non-zero random intercept value (sd.u0). The model with the most extreme values was the AR(1) GEE approach with the CWC method (g.ar12), followed by a tie between Mundlak’s contextual and the Hybrid methods (g.ar13 and g.ar14).

Code

# filter the first 8 columns
extreme_scenarios <- extreme_prop_df[1:8,]

# filter the GEE_check_df for the extreme scenarios
GEE_check_df_extreme <- GEE_check_df %>%
  right_join(extreme_scenarios, by = c("model", "design_id")) %>%
  select(-c(beta1_bias, replication, g.00, sdX.within, g.10, true_cluster_means, design_id, sd.u1, proportion_extreme2)) %>%
  distinct() %>%
  rename(prop_extreme = proportion_extreme1) %>%
  arrange(desc(prop_extreme))

# turn this into table
knitr::kable(GEE_check_df_extreme, format = "html")

Table 3: Scenarios with Greatest Proportion of Extreme Values in GEEs

N_total	T_total	predictor.type	outcome.type	sdX.between	g.01	sd.u0	sd.e	model	prop_extreme
100	20	continuous	binary	3	0	3	1	g.ar12	0.484
200	20	continuous	binary	3	0	3	1	g.ar12	0.453
100	20	continuous	binary	1	3	3	1	g.ar12	0.451
200	20	continuous	binary	1	3	3	1	g.ar12	0.425
100	20	continuous	binary	0	0	3	1	g.ar13	0.375
100	20	continuous	binary	0	0	3	1	g.ar14	0.375
100	20	continuous	binary	1	3	1	1	g.ar12	0.257
100	20	continuous	binary	3	0	1	1	g.ar12	0.245

5 References

Müller, K. (2020). Here: A simpler way to find your files. https://here.r-lib.org/

Pedersen, T. L. (2024). Patchwork: The composer of plots. https://patchwork.data-imaginist.com

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org

Wickham, H. (2023). Stringr: Simple, consistent wrappers for common string operations. https://stringr.tidyverse.org

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., Dunnington, D., & van den Brand, T. (2025). ggplot2: Create elegant data visualisations using the grammar of graphics. https://ggplot2.tidyverse.org

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). Dplyr: A grammar of data manipulation. https://dplyr.tidyverse.org

Wickham, H., & Henry, L. (2025). Purrr: Functional programming tools. https://purrr.tidyverse.org/

Wickham, H., Vaughan, D., & Girlich, M. (2024). Tidyr: Tidy messy data. https://tidyr.tidyverse.org

Xie, Y. (2014). Knitr: A comprehensive tool for reproducible research in R. In V. Stodden, F. Leisch, & R. D. Peng (Eds.), Implementing reproducible computational research. Chapman; Hall/CRC.

Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Chapman; Hall/CRC. https://yihui.org/knitr/

Xie, Y. (2025). Knitr: A general-purpose package for dynamic report generation in r. https://yihui.org/knitr/