tips:multiple_imputation_with_mice_and_metafor
Next revision | |||
— | tips:multiple_imputation_with_mice_and_metafor [2019/10/09 12:46] – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ===== Multiple Imputation with the mice and metafor Packages ===== | ||
+ | |||
+ | Meta-analytic data often looks like Swiss cheese -- there are lots of holes in it! For example, due to missing information, | ||
+ | |||
+ | ==== Data Preparation ==== | ||
+ | |||
+ | For the example, I will use data from the meta-analysis by Bangert-Drowns et al. (2004) on the effectiveness of school-based writing-to-learn interventions on academic achievement ('' | ||
+ | <code rsplus> | ||
+ | library(metafor) | ||
+ | dat <- dat.bangertdrowns2004 | ||
+ | </ | ||
+ | (I copy the dataset into ' | ||
+ | <code rsplus> | ||
+ | rbind(head(dat, | ||
+ | </ | ||
+ | <code output> | ||
+ | | ||
+ | 1 | ||
+ | 2 | ||
+ | 3 | ||
+ | 4 | ||
+ | 5 | ||
+ | 6 | ||
+ | 7 | ||
+ | 8 | ||
+ | 9 | ||
+ | 10 10 | ||
+ | 39 39 Ross & Faucette 1994 | ||
+ | 40 40 Sharp 1987 | ||
+ | 41 41 | ||
+ | 42 42 | ||
+ | 43 43 | ||
+ | 44 44 Weiss & Walters 1980 | ||
+ | 45 45 Wells 1986 | ||
+ | 46 46 Willey 1988 | ||
+ | 47 47 Willey 1988 | ||
+ | 48 48 | ||
+ | </ | ||
+ | |||
+ | Variable '' | ||
+ | |||
+ | For illustration purposes, the following variables will be examined as potential moderators of the treatment effect (i.e., the size of the treatment effect may vary in a systematic way as a function of one or more of these variables): | ||
+ | |||
+ | * length: treatment length (in weeks) | ||
+ | * wic: writing tasks were completed in class (0 = no; 1 = yes) | ||
+ | * feedback: feedback on writing was provided (0 = no; 1 = yes) | ||
+ | * info: writing contained informational components (0 = no; 1 = yes) | ||
+ | * pers: writing contained personal components (0 = no; 1 = yes) | ||
+ | * imag: writing contained imaginative components (0 = no; 1 = yes) | ||
+ | * meta: prompts for metacognitive reflection (0 = no; 1 = yes) | ||
+ | |||
+ | More details about the meaning of these variables can be found in Bangert-Drowns et al. (2004). For the purposes of this illustration, | ||
+ | |||
+ | To make some of the code below simpler, I will only keep the variables needed for the analyses: | ||
+ | <code rsplus> | ||
+ | dat <- dat[c(" | ||
+ | </ | ||
+ | |||
+ | ==== Complete Case Analysis ==== | ||
+ | |||
+ | Standard model fitting functions in R (including those from the metafor package) will usually apply ' | ||
+ | <code rsplus> | ||
+ | data.frame(k.NA=colSums(is.na(dat))) | ||
+ | </ | ||
+ | <code output> | ||
+ | k.NA | ||
+ | yi 0 | ||
+ | vi 0 | ||
+ | length | ||
+ | wic 2 | ||
+ | feedback | ||
+ | info 2 | ||
+ | pers 2 | ||
+ | imag 0 | ||
+ | meta 2 | ||
+ | </ | ||
+ | We can also check how many missing values there are for each study: | ||
+ | <code rsplus> | ||
+ | table(rowSums(is.na(dat))) | ||
+ | </ | ||
+ | <code output> | ||
+ | | ||
+ | 41 5 2 | ||
+ | </ | ||
+ | So, for 41 studies, the data are complete, but 5 studies have one missing value, while 2 studies have three missing values. Therefore, when fitting a meta-regression model with all moderators of interest included simultaneously, | ||
+ | |||
+ | Let's try this: | ||
+ | <code rsplus> | ||
+ | res <- rma(yi, vi, mods = ~ length + wic + feedback + info + pers + imag + meta, data=dat) | ||
+ | res | ||
+ | </ | ||
+ | Note that the '' | ||
+ | <code output> | ||
+ | Warning message: | ||
+ | In rma(yi, vi, mods = ~length + wic + feedback + info + pers + imag + : | ||
+ | Studies with NAs omitted from model fitting. | ||
+ | </ | ||
+ | The output of '' | ||
+ | <code output> | ||
+ | Mixed-Effects Model (k = 41; tau^2 estimator: REML) | ||
+ | |||
+ | tau^2 (estimated amount of residual heterogeneity): | ||
+ | tau (square root of estimated tau^2 value): | ||
+ | I^2 (residual heterogeneity / unaccounted variability): | ||
+ | H^2 (unaccounted variability / sampling variability): | ||
+ | R^2 (amount of heterogeneity accounted for): 21.01% | ||
+ | |||
+ | Test for Residual Heterogeneity: | ||
+ | QE(df = 33) = 51.4963, p-val = 0.0211 | ||
+ | |||
+ | Test of Moderators (coefficients 2:8): | ||
+ | QM(df = 7) = 11.7175, p-val = 0.1102 | ||
+ | |||
+ | Model Results: | ||
+ | |||
+ | estimate | ||
+ | intrcpt | ||
+ | length | ||
+ | wic -0.0472 | ||
+ | feedback | ||
+ | info | ||
+ | pers | ||
+ | imag 0.4106 | ||
+ | meta 0.2010 | ||
+ | |||
+ | --- | ||
+ | Signif. codes: | ||
+ | </ | ||
+ | As can be seen from the output, $k = 41$ studies were included in the analysis (i.e., the ones with complete data). So, roughly 15% (i.e., 7 out of 48) of the studies were excluded from the analysis. While this is not an extreme case, it illustrates the loss of data (and information) that can occur when conducting a complete case analysis. | ||
+ | |||
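+ | As a quick cross-check (a minimal sketch, not part of the original analysis), base R's ''complete.cases()'' can be used to count and identify the studies that survive listwise deletion. The vector ''vars'' is my own helper name, and its contents are an assumption taken from the variable names shown in the missing-value table above: | ||
+ | <code rsplus> | ||
+ | library(metafor) | ||
+ | dat <- dat.bangertdrowns2004 | ||
+ | |||
+ | # variables used in the meta-regression model (assumed from the table of missing values) | ||
+ | vars <- c("yi", "vi", "length", "wic", "feedback", "info", "pers", "imag", "meta") | ||
+ | |||
+ | # TRUE for studies without any missing value on these variables | ||
+ | cc <- complete.cases(dat[vars]) | ||
+ | |||
+ | sum(cc)    # number of studies retained under listwise deletion | ||
+ | which(!cc) # row numbers of the studies that are dropped | ||
+ | </code> | ||
+ | The count returned by ''sum(cc)'' should agree with the $k$ reported in the model output above. | ||
+ | |||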
+ | ==== Multiple Imputation ==== | ||
+ | |||
+ | One way of dealing with missing data is to make use of imputation techniques. The advantage of using [[wp> | ||
+ | |||
+ | The mice package allows us to automate this process and can be used in combination with the metafor package. First, we install and load the mice package and then evaluate some code from the metafor package that generates two helper functions we need so that mice and metafor can interact as necessary: | ||
+ | <code rsplus> | ||
+ | install.packages(" | ||
+ | library(mice) | ||
+ | eval(metafor::: | ||
+ | </ | ||
+ | |||
+ | Most of the moderators are dummy variables (coded 0 vs 1). Although it isn't typically necessary to encode such variables as " | ||
+ | <code rsplus> | ||
+ | dat$wic | ||
+ | dat$feedback <- factor(dat$feedback) | ||
+ | dat$info | ||
+ | dat$pers | ||
+ | dat$imag | ||
+ | dat$meta | ||
+ | </ | ||
+ | |||
+ | Next, we will set up the predictor matrix for the imputations. For this, we run the '' | ||
+ | <code rsplus> | ||
+ | predMatrix <- make.predictorMatrix(dat) | ||
+ | predMatrix | ||
+ | </ | ||
+ | <code output> | ||
+ | yi vi length wic feedback info pers imag meta | ||
+ | yi 0 1 1 | ||
+ | vi 1 0 1 | ||
+ | length | ||
+ | wic | ||
+ | feedback | ||
+ | info 1 1 1 | ||
+ | pers 1 1 1 | ||
+ | imag 1 1 1 | ||
+ | meta 1 1 1 | ||
+ | </ | ||
+ | A value of 1 in this matrix indicates that the corresponding column variable is used to predict the corresponding row variable (so each row can be regarded as specifying the predictors for the model used to predict each row variable). Now we will make a few changes to this matrix. First, I set the entire column corresponding to the '' | ||
+ | <code rsplus> | ||
+ | predMatrix[," | ||
+ | predMatrix[" | ||
+ | predMatrix[" | ||
+ | predMatrix | ||
+ | </ | ||
+ | <code output> | ||
+ | yi vi length wic feedback info pers imag meta | ||
+ | yi 0 0 0 | ||
+ | vi 0 0 0 | ||
+ | length | ||
+ | wic | ||
+ | feedback | ||
+ | info 1 0 1 | ||
+ | pers 1 0 1 | ||
+ | imag 1 0 1 | ||
+ | meta 1 0 1 | ||
+ | </ | ||
+ | |||
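+ | To see how such edits to a predictor matrix play out, here is a small self-contained sketch on a made-up data frame (''toy'', ''pm'', and ''preds_a'' are hypothetical names and have nothing to do with the dataset above). Reading off a row shows which variables remain as predictors for that row's variable: | ||
+ | <code rsplus> | ||
+ | library(mice) | ||
+ | |||
+ | # made-up data with missing values in 'a' and 'b' | ||
+ | toy <- data.frame(a = c(1, NA, 3, 4), b = c(2, 1, NA, 5), c = c(5, 6, 7, 8)) | ||
+ | |||
+ | pm <- make.predictorMatrix(toy) # all 1s, except for 0s on the diagonal | ||
+ | pm[, "c"] <- 0                  # never use 'c' as a predictor of any variable | ||
+ | |||
+ | # predictors that remain for variable 'a' | ||
+ | preds_a <- names(which(pm["a", ] == 1)) | ||
+ | preds_a | ||
+ | </code> | ||
+ | Setting a whole column to 0 removes that variable as a predictor everywhere, while setting a single cell to 0 only affects the model for the corresponding row variable. | ||
+ | |||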
+ | Next, I create a vector that specifies the method used to predict (and hence impute) each variable: | ||
+ | <code rsplus> | ||
+ | impMethod <- make.method(dat) | ||
+ | impMethod | ||
+ | </ | ||
+ | <code output> | ||
+ | yi | ||
+ | "" | ||
+ | |||
+ | </ | ||
+ | By default, predictive mean matching ('' | ||
+ | |||
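+ | The way ''make.method()'' picks a default depends on the type of each variable. The following sketch uses a made-up data frame (''toy'' and ''meth'' are hypothetical names) to illustrate this: numeric variables with missing values get predictive mean matching, two-level factors get logistic regression, and variables without missing values get an empty string, since nothing needs to be imputed for them: | ||
+ | <code rsplus> | ||
+ | library(mice) | ||
+ | |||
+ | # made-up data: numeric with NA, two-level factor with NA, complete numeric | ||
+ | toy <- data.frame( | ||
+ |    x = c(1.2, NA, 3.1, 4.0, 5.5, 2.2), | ||
+ |    g = factor(c("0", "1", NA, "0", "1", "1")), | ||
+ |    z = c(10, 20, 30, 40, 50, 60) | ||
+ | ) | ||
+ | |||
+ | meth <- make.method(toy) # named character vector, one entry per variable | ||
+ | meth | ||
+ | </code> | ||
+ | This is also why encoding the dummy variables as factors matters for the imputation step: left as numeric, they would be treated like any other numeric variable. | ||
+ | |||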
+ | Now we are ready to generate the multiple imputations. Often (and by default), only 5 datasets are generated, but I increase this to 20 below. I also set the seed (for the random number generator) to make the following results fully reproducible: | ||
+ | <code rsplus> | ||
+ | imp <- mice(dat, print=FALSE, | ||
+ | </ | ||
+ | |||
+ | Next, we can fit the model of interest to each of the 20 imputed datasets with: | ||
+ | <code rsplus> | ||
+ | fit <- with(imp, rma(yi, vi, mods = ~ length + wic + feedback + info + pers + imag + meta)) | ||
+ | </ | ||
+ | |||
+ | And finally, we can pool the results with: | ||
+ | <code rsplus> | ||
+ | pool <- pool(fit) | ||
+ | round(summary(pool), | ||
+ | </ | ||
+ | <code output> | ||
+ | estimate std.error statistic | ||
+ | intrcpt | ||
+ | length | ||
+ | wic1 | ||
+ | feedback1 | ||
+ | info1 -0.3133 | ||
+ | pers1 -0.3233 | ||
+ | imag1 | ||
+ | meta1 | ||
+ | </ | ||
+ | |||
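+ | To make the pooling step less of a black box, here is a minimal base R sketch of Rubin's rules for a single coefficient. The estimates and standard errors are made-up numbers (not results from the analysis above), and the degrees of freedom adjustments that ''pool()'' additionally applies are not reproduced here: | ||
+ | <code rsplus> | ||
+ | # made-up estimates and standard errors of one coefficient from m = 5 imputed datasets | ||
+ | est <- c(0.20, 0.25, 0.18, 0.22, 0.30) | ||
+ | se  <- c(0.10, 0.11, 0.09, 0.10, 0.12) | ||
+ | m   <- length(est) | ||
+ | |||
+ | qbar <- mean(est)           # pooled estimate: average of the m estimates | ||
+ | W    <- mean(se^2)          # within-imputation variance: average squared standard error | ||
+ | B    <- var(est)            # between-imputation variance of the estimates | ||
+ | Tvar <- W + (1 + 1/m) * B   # total variance of the pooled estimate | ||
+ | |||
+ | round(c(estimate = qbar, std.error = sqrt(Tvar)), 4) | ||
+ | </code> | ||
+ | The between-imputation component is what distinguishes this from simply averaging the results: the more the estimates differ across the imputed datasets, the larger the pooled standard error becomes, which reflects the uncertainty due to the missing data. | ||
+ | |||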
+ | For easier comparison, let's look at the coefficient table based on the complete case analysis obtained earlier: | ||
+ | <code rsplus> | ||
+ | round(coef(summary(res)), | ||
+ | </ | ||
+ | <code output> | ||
+ | estimate | ||
+ | intrcpt | ||
+ | length | ||
+ | wic -0.0472 0.1097 -0.4308 0.6666 -0.2622 0.1677 | ||
+ | feedback | ||
+ | info | ||
+ | pers | ||
+ | imag 0.4106 0.1847 | ||
+ | meta 0.2010 0.1742 | ||
+ | </ | ||
+ | |||
+ | Leaving aside a discussion of the usefulness of p-values, there is an interesting discrepancy in the findings. In particular, while moderator '' | ||
+ | |||
+ | For more details on multiple imputation and the mice package, I would suggest taking a look at the book by Van Buuren (2018), which you can also read online (https:// | ||
+ | |||
+ | ==== References ==== | ||
+ | |||
+ | Van Buuren, S. (2018). //Flexible imputation of missing data// (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC. | ||
tips/multiple_imputation_with_mice_and_metafor.txt · Last modified: 2022/08/03 11:35 by Wolfgang Viechtbauer