The metafor Package

This is an old revision of the document!

Bootstrapping with Meta-Analytic Models

The use of bootstrapping in the meta-analytic context has been suggested by a number of authors (e.g., Adams, Gurevitch, & Rosenberg, 1997; van den Noortgate & Onghena, 2005; Switzer, Paese, & Drasgow, 1992; Turner et al., 2000). The example below shows how to conduct parametric and non-parametric bootstrapping using the metafor and boot packages in combination. The example is based on a meta-analysis by Collins et al. (1985) examining the effectiveness of diuretics in pregnancy for preventing pre-eclampsia.

Data Preparation

The data can be loaded with:

library(metafor)
dat <- dat.collins1985b[,1:7]
dat

We only need the first 7 columns of the dataset (the remaining columns pertain to other outcomes). The contents of the dataset are:

  id                  author year pre.nti pre.nci pre.xti pre.xci
1  1       Weseley & Douglas 1962     131     136      14      14
2  2          Flowers et al. 1962     385     134      21      17
3  3                 Menzies 1964      57      48      14      24
4  4           Fallis et al. 1964      38      40       6      18
5  5         Cuadros & Tatum 1964    1011     760      12      35
6  6        Landesman et al. 1965    1370    1336     138     175
7  7            Kraus et al. 1966     506     524      15      20
8  8    Tervila & Vartiainen 1971     108     103       6       2
9  9 Campbell & MacGillivray 1975     153     102      65      40

Variables pre.nti and pre.nci indicate the number of women in the treatment and control/placebo groups, respectively, while pre.xti and pre.xci indicate the number of women in the respective groups with any form of pre-eclampsia during the pregnancy.

The corresponding log odds ratios (and corresponding sampling variances) can then be computed and added to the dataset with:

dat <- escalc(measure="OR", ai=pre.xti, n1i=pre.nti, ci=pre.xci, n2i=pre.nci, data=dat)
dat

  id                  author year pre.nti pre.nci pre.xti pre.xci      yi     vi
1  1       Weseley & Douglas 1962     131     136      14      14  0.0418 0.1596
2  2          Flowers et al. 1962     385     134      21      17 -0.9237 0.1177
3  3                 Menzies 1964      57      48      14      24 -1.1221 0.1780
4  4           Fallis et al. 1964      38      40       6      18 -1.4733 0.2989
5  5         Cuadros & Tatum 1964    1011     760      12      35 -1.3910 0.1143
6  6        Landesman et al. 1965    1370    1336     138     175 -0.2969 0.0146
7  7            Kraus et al. 1966     506     524      15      20 -0.2615 0.1207
8  8    Tervila & Vartiainen 1971     108     103       6       2  1.0888 0.6864
9  9 Campbell & MacGillivray 1975     153     102      65      40  0.1353 0.0679

Random-Effects Model

For the analyses to be shown below, we will focus on the meta-analytic random-effects model, letting $y_i$ denote the observed outcome/effect for a particular study (variable yi in the dataset above) and $\theta_i$ the corresponding true outcome/effect. We assume that $$y_i = \theta_i + e_i,$$ where $e_i \sim N(0, v_i)$ denotes sampling error, whose variance is (approximately) known (variable vi in the dataset above). The true effects are in turn assumed to be given by $$\theta_i = \mu + u_i,$$ where $u_i \sim N(0, \tau^2)$, so that $\mu$ denotes the average true outcome/effect and $\tau^2$ the variance in the true outcomes/effects. The model implies that $$y_i \sim N(\mu, \tau^2 + v_i).$$ The goal is to estimate $\mu$ and $\tau^2$ and to construct corresponding confidence intervals for these parameters.

The results from fitting a random-effects model (using REML estimation) to the data given above can be obtained with:

res <- rma(yi, vi, data=dat)
res

Random-Effects Model (k = 9; tau^2 estimator: REML)
 
tau^2 (estimated amount of total heterogeneity): 0.3008 (SE = 0.2201)
tau (square root of estimated tau^2 value):      0.5484
I^2 (total heterogeneity / total variability):   75.92%
H^2 (total variability / sampling variability):  4.15
 
Test for Heterogeneity: 
Q(df = 8) = 27.2649, p-val = 0.0006
 
Model Results:
 
estimate      se     zval    pval    ci.lb    ci.ub   
 -0.5181  0.2236  -2.3167  0.0205  -0.9564  -0.0798  *
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Therefore, the estimated average log odds ratio is equal to $\hat{\mu} = -.52$ with an approximate Wald-type 95% CI of $(-0.96, -0.08)$. The variance of the true log odds ratios is estimated to be $\hat{\tau}^2 = .30$. Confidence intervals for $\tau^2$ can be obtained in a variety of ways – see Viechtbauer (2007) for an illustration of a variety of different methods. One of them is the so-called Q-profile method, which provides exact CIs (under the assumptions of the model). Using this method, the CI bounds can be obtained with:

confint(res)

       estimate   ci.lb   ci.ub
tau^2    0.3008  0.0723  2.2027
tau      0.5484  0.2689  1.4842
I^2(%)  75.9238 43.1209 95.8494
H^2      4.1535  1.7581 24.0929

Note that CIs for the $I^2$ and $H^2$ statistics are also provided.

Parametric Bootstrapping

We will now use the boot package to obtain parametric and non-parametric bootstrap CIs for $\mu$ and $\tau^2$, so we start by loading the package with:

library(boot)

For parametric bootstrapping, we need to define two functions, one for calculating the statistic(s) of interest (and possibly the corresponding variance(s)) based on the bootstrap data, the second for actually generating the bootstrap data. In the present case, our interest is focused on the estimates of $\mu$ and $\tau^2$, so the first function could be written as:

boot.func <- function(data.boot) {
 
   res <- try(rma(yi, vi, data=data.boot), silent=TRUE)
 
   if (is.element("try-error", class(res))) {
      NA
   } else {
      c(coef(res), vcov(res), res$tau2, res$se.tau2^2)
   }
 
}

The purpose of the try() function is to catch cases where the algorithm used to obtain the REML estimate of $\tau^2$ does not converge. Otherwise, the function returns the estimate of $\mu$, the corresponding variance, the estimate of $\tau^2$, and its corresponding variance.

For a random-effects model, the data generation process is described by the last equation given above, where the two unknown parameters are replaced by their corresponding estimates (i.e., based on the fitted model). Therefore, the second function needed for the parametric bootstrapping is:

data.gen <- function(dat, mle) {
   data.frame(yi=rnorm(nrow(dat), mle$mu, sqrt(mle$tau2 + dat$vi)), vi=dat$vi)
}

Next, we can do the actual bootstrapping (based on 10,000 bootstrap samples) with (note that setting the seed allows for reproducibility of the results):

set.seed(8781328)
res.boot <- boot(dat, boot.func, R=10000, sim="parametric", ran.gen=data.gen, mle=list(mu=coef(res), tau2=res$tau2))
res.boot

PARAMETRIC BOOTSTRAP
 
Call:
boot(data = dat, statistic = boot.func, R = 10000, sim = "parametric",
    ran.gen = data.gen, mle = list(mu = coef(res), tau2 = res$tau2))
 
Bootstrap Statistics :
       original        bias    std. error
t1* -0.51810321  0.0043534226  0.22661343
t2*  0.05001318 -0.0004975425  0.02678066
t3*  0.30079577  0.0032844143  0.22289406
t4*  0.04845165  0.0136897479  0.07136212

Finally, a variety of different CIs for $\mu$ can be obtained with:

boot.ci(res.boot, type=c("norm", "basic", "stud", "perc"), index=1:2)

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 9964 bootstrap replicates
 
CALL :
boot.ci(boot.out = res.boot, type = c("norm", "basic", "stud",
    "perc"), index = 1:2)
 
Intervals : 
Level      Normal              Basic         
95%   (-0.9666, -0.0783 )   (-0.9604, -0.0720 )  
 
Level    Studentized          Percentile     
95%   (-1.0661,  0.0241 )   (-0.9642, -0.0758 )
Calculations and Intervals on Original Scale

All of the intervals except the one based on the studentized method are similar to the Wald-type CI obtained earlier. The plot below shows the bootstrap distribution of $\hat{\mu}$ (with a kernel density estimate of the distribution superimposed and the tails shaded based on the percentile CI).

Since the studentized method is based on the use of the t-distribution, it is not surprising that more comparable results to the studentized CI can be obtained by using the Knapp and Hartung method when fitting the random-effects model (which also leads to the use of the t-distribution when constructing CIs for the fixed effects of the model). In particular, this can be done with:

rma(yi, vi, data=dat, test="knha")

Model Results:
 
estimate       se     tval     pval    ci.lb    ci.ub
 -0.5181   0.2408  -2.1512   0.0637  -1.0735   0.0373

The CI obtained in this manner is more similar to the one obtained using the studentized method.

Next, parametric bootstrap CIs for $\tau^2$ can be obtained with:

boot.ci(res.boot, type=c("norm", "basic", "stud", "perc"), index=3:4)

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 9964 bootstrap replicates
 
CALL :
boot.ci(boot.out = res.boot, type = c("norm", "basic", "stud",
    "perc"), index = 3:4)
 
Intervals : 
Level      Normal              Basic         
95%   (-0.1394,  0.7344 )   (-0.2358,  0.6016 )  
 
Level    Studentized          Percentile     
95%   ( 0.0644,  2.2430 )   ( 0.0000,  0.8374 )
Calculations and Intervals on Original Scale

The various CIs are quite different from each other. Interestingly, the studentized method yields bounds that are quite similar to the ones obtained with the Q-profile method. Below is a plot of the bootstrap distribution of $\hat{\tau}^2$ (again, with the kernel density estimate and the tail regions shaded based on the percentile CI).

Non-Parametric Bootstrapping

For the non-parametric bootstrap method, we only need to define one function with two arguments, one for the original data and one for a vector of indices which define the bootstrap sample. The function again returns the statistics of interest (and the corresponding variances):

boot.func <- function(dat, indices) {
 
   res <- try(rma(yi, vi, data=dat, subset=indices), silent=TRUE)
 
   if (is.element("try-error", class(res))) {
      NA
   } else {
      c(coef(res), vcov(res), res$tau2, res$se.tau2^2)
   }
 
}

The indices that define the bootstrap sample can be directly passed to the subset argument, which will then select the appropriate rows from the dataset. Again, the try() function is used to catch the occasional case of non-convergence.

The actual bootstrapping can then be carried out with:

set.seed(8781328)
res.boot <- boot(dat, boot.func, R=10000)
res.boot

ORDINARY NONPARAMETRIC BOOTSTRAP
 
Call:
boot(data = dat, statistic = boot.func, R = 10000)
 
Bootstrap Statistics :
       original       bias    std. error
t1* -0.51810321  0.004682588  0.21214699
t2*  0.05001318 -0.004397091  0.02165423
t3*  0.30079577 -0.036919665  0.15270307
t4*  0.04845165  0.001115270  0.04832504

The various non-parametric bootstrap CIs for $\mu$ can now be obtained with:

boot.ci(res.boot, index=1:2)

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 9980 bootstrap replicates
 
CALL :
boot.ci(boot.out = res.boot, index = 1:2)
 
Intervals : 
Level      Normal              Basic             Studentized     
95%   (-0.9386, -0.1070 )   (-0.9158, -0.0932 )   (-1.3070, -0.0274 )  
 
Level     Percentile            BCa          
95%   (-0.9430, -0.1204 )   (-0.9827, -0.1530 )
Calculations and Intervals on Original Scale

Now, the so-called bias-corrected and accelerated (BCa) CI is also included. The bootstrap distribution is shown below.

Finally, the various CIs for $\tau^2$ can be obtained with:

boot.ci(res.boot, index=3:4)

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 9980 bootstrap replicates
 
CALL :
boot.ci(boot.out = res.boot, index = 3:4)
 
Intervals : 
Level      Normal              Basic             Studentized     
95%   ( 0.0384,  0.6370 )   (-0.0047,  0.6016 )   ( 0.1396,  4.8907 )  
 
Level     Percentile            BCa          
95%   ( 0.0000,  0.6063 )   ( 0.0685,  0.8122 )
Calculations and Intervals on Original Scale

Again, the bootstrap distribution is shown below.

Discussion

The code above is shown for illustrative purposes only. Whether bootstrap CIs for $\mu$ are preferable to the standard Wald-type or Knapp and Hartung adjusted CIs is unclear. Also, the results based on Viechtbauer (2007) suggest that the coverage probabilities of parametric and non-parametric bootstrap CIs for $\tau^2$ are less than adequate (however, only percentile and BCa intervals were closely examined in that paper, leaving the accuracy of the other bootstrap CIs unknown). Finally, other bootstrap strategies (e.g., the error bootstrap) are described by van den Noortgate and Onghena (2005).

References

Adams, D. C., Gurevitch, J., & Rosenberg, M. S. (1997). Resampling tests for meta-analysis of ecological data. Ecology, 78(5), 1277–1283.

Collins, R., Yusuf, S., & Peto, R. (1985). Overview of randomised trials of diuretics in pregnancy. British Medical Journal, 290(6461), 17–23.

van den Noortgate, W., & Onghena, P. (2005). Parametric and nonparametric bootstrap methods for meta-analysis. Behavior Research Methods, 37(1), 11–22.

Switzer III, F. S., Paese, P. W., & Drasgow, F. (1992). Bootstrap estimates of standard errors in validity generalization. Journal of Applied Psychology, 77(2), 123–129.

Turner, R. M., Omar, R. Z., Yang, M., Goldstein, H., & Thompson, S. G. (2000). A multilevel model framework for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine, 19(24), 3417–3432.

Viechtbauer, W. (2007). Confidence intervals for the amount of heterogeneity in meta-analysis. Statistics in Medicine, 26(1), 37–52.

Table of Contents