The metafor Package

Raudenbush (2009)

The Methods and Data

Raudenbush (2009) is an excellent chapter in The handbook of research synthesis and meta-analysis (2nd ed.) and describes in detail many of the models and methods that are implemented in the rma() function (including the meta-analytic random- and mixed-effects models). The dataset that is used for the illustration of the various models and methods is actually the same that is used in the Raudenbush and Bryk (1985) and provides the results from 19 studies examining how teachers' expectations about their pupils can influence actual IQ levels (Raudenbush, 1984). A reproduction of the analyses described in Raudenbush and Bryk (1985) can be found here.

Here, I will reproduce the results from Raudenbush (2009). The data are provided in Table 16.1 (p. 300) in the chapter and can be loaded with:

library(metafor)
dat <- dat.raudenbush1985
dat

(I copy the dataset into dat, which is a bit shorter and therefore easier to type further below). The contents of the dataset are:

   study               author year weeks setting tester    yi     vi
1      1     Rosenthal et al. 1974     2   group  aware  0.03 0.0156
2      2          Conn et al. 1968    21   group  aware  0.12 0.0216
3      3          Jose & Cody 1971    19   group  aware -0.14 0.0279
4      4   Pellegrini & Hicks 1972     0   group  aware  1.18 0.1391
5      5   Pellegrini & Hicks 1972     0   group  blind  0.26 0.1362
6      6    Evans & Rosenthal 1968     3   group  aware -0.06 0.0106
7      7       Fielder et al. 1971    17   group  blind -0.02 0.0106
8      8             Claiborn 1969    24   group  aware -0.32 0.0484
9      9               Kester 1969     0   group  aware  0.27 0.0269
10    10              Maxwell 1970     1   indiv  blind  0.80 0.0630
11    11               Carter 1970     0   group  blind  0.54 0.0912
12    12              Flowers 1966     0   group  blind  0.18 0.0497
13    13              Keshock 1970     1   indiv  blind -0.02 0.0835
14    14            Henrikson 1970     2   indiv  blind  0.23 0.0841
15    15                 Fine 1972    17   group  aware -0.18 0.0253
16    16              Grieger 1970     5   group  blind -0.06 0.0279
17    17 Rosenthal & Jacobson 1968     1   group  aware  0.30 0.0193
18    18   Fleming & Anttonen 1971     2   group  blind  0.07 0.0088
19    19             Ginsburg 1970     7   group  aware -0.07 0.0303

Note that there is a typo in Table 16.1 in the book chapter: The observed standardized mean difference for study 10 was $0.80$ and not $0.87$ (cf. Raudenbush and Bryk, 1985, Table 1).

Equal-Effects Model

The results from a model assuming homogeneous effects can now be obtained with:

res.EE <- rma(yi, vi, data=dat, digits=3, method="EE")
res.EE

Equal-Effects Model (k = 19)
 
I^2 (total heterogeneity / total variability):   49.76%
H^2 (total variability / sampling variability):  1.99
 
Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007
 
Model Results:
 
estimate     se   zval   pval   ci.lb  ci.ub
   0.060  0.036  1.655  0.098  -0.011  0.132  .
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results match exactly what is reported in Table 16.2 (p. 301). In particular, $\hat{\theta} = 0.060$, $SE[\hat{\theta}] = 0.036$, and $z = 1.66$ for the test $\mbox{H}_0{:}\; \theta = 0$. The bounds of a 95% confidence interval for $\theta$ are given by $-0.011$ and $0.132$ (p. 301). However, the true effects are found to be heterogeneous ($Q(18) = 35.83$, $p = .007$) (p. 302).

Random-Effects Model

Next, Raudenbush (2009) uses REML estimation to fit a random-effects model. Since REML estimation is the default for the rma() function, we can obtain the same results with:

res.RE <- rma(yi, vi, data=dat, digits=3)
res.RE

Random-Effects Model (k = 19; tau^2 estimator: REML)
 
tau^2 (estimated amount of total heterogeneity): 0.019 (SE = 0.016)
tau (square root of estimated tau^2 value):      0.137
I^2 (total heterogeneity / total variability):   41.86%
H^2 (total variability / sampling variability):  1.72
 
Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007
 
Model Results:
 
estimate       se     zval     pval    ci.lb    ci.ub
   0.084    0.052    1.621    0.105   -0.018    0.185
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Again, these are the same results reported in Table 16.2 (p. 301). In particular, $\hat{\tau}^2 = .019$, $\hat{\mu} = .084$, $SE[\hat{\mu}] = .052$, and $z = 1.62$ for the test $\mbox{H}_0{:}\; \mu = 0$.

Prediction Interval

Raudenbush (2009) also reports the results from a 95% prediction interval (or "plausible value interval"), which can be obtained with $\hat{\mu} \pm 1.96 \hat{\tau}$. Such an interval can be obtained with:

predict(res.RE)

  pred    se  ci.lb ci.ub  pi.lb pi.ub
 0.089 0.056 -0.020 0.199 -0.245 0.423

but note that the interval ($-0.245$ to $0.423$) is a bit wider than the one reported in the book chapter (p. 302). That is because the prediction interval is computed with $\hat{\mu} \pm 1.96 \sqrt{\hat{\tau}^2 + SE[\hat{\mu}]^2}$ in the metafor package (see here for more details).

Measures of Heterogeneity

As additional measures of heterogeneity, Raudenbush (2009) describes the "variation to signal ratio" $H$ (p. 302) and the "intra-class correlation" $\hat{\lambda}$ (p. 3030). First, note that $H$ is actually $H^2$ in the notation of Higgins and Thompson (2002) and $\hat{\lambda}$ is akin to $I^2$ (but not quite the same in the way it is computed). Moreover, note that $I^2$ and $H^2$ are computed in the metafor package based on more general definitions than typically used in practice. The equations used in the metafor package are described in detail under the Frequently Asked Questions section. Depending on the method used to estimate $\tau^2$, one will get different values for $I^2$ and $H^2$. The equations that are typically used to compute $I^2$ and $H^2$ (i.e., $I^2 = 100\% \times (Q - (k-1))/Q$ and $H^2 = Q/(k-1)$) are actually special versions of the more general definitions used in the metafor package. When using the DerSimonian-Laird estimator of $\tau^2$, then the two definitions coincide. This can be seen if we use:

res.DL <- rma(yi, vi, data=dat, digits=3, method="DL")
res.DL

Random-Effects Model (k = 19; tau^2 estimator: DL)
 
tau^2 (estimated amount of total heterogeneity): 0.026 (SE = 0.019)
tau (square root of estimated tau^2 value):      0.161
I^2 (total heterogeneity / total variability):   49.76%
H^2 (total variability / sampling variability):  1.99
 
Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007
 
Model Results:
 
estimate       se     zval     pval    ci.lb    ci.ub
   0.089    0.056    1.601    0.109   -0.020    0.199
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now, $I^2 = 100 \times (35.83 - 18)/ 35.83 = 49.76%$ and $H^2 = 35.83 / 18 = 1.99$. So, if you want metafor to use the more conventional equations for computing $I^2$ and $H^2$, use method="DL" when fitting the model with the rma() function.

Mixed-Effects Model

Next, Raudenbush (2009) use a mixed-effects model with the number of prior contact weeks as predictor/moderator. For this analysis, all week values greater than 3 are recoded to 3:

dat$weeks.c <- ifelse(dat$weeks > 3, 3, dat$weeks)

Then, the mixed-effects model can be fitted with:

res.ME <- rma(yi, vi, mods = ~ weeks.c, data=dat, digits=3)
res.ME

Mixed-Effects Model (k = 19; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     0.000 (SE = 0.007)
tau (square root of estimated tau^2 value):             0.001
I^2 (residual heterogeneity / unaccounted variability): 0.00%
H^2 (unaccounted variability / sampling variability):   1.00
R^2 (amount of heterogeneity accounted for):            100.00%
 
Test for Residual Heterogeneity:
QE(df = 17) = 16.571, p-val = 0.484
 
Test of Moderators (coefficient(s) 2):
QM(df = 1) = 19.258, p-val < .001
 
Model Results:
 
         estimate     se    zval   pval   ci.lb   ci.ub
intrcpt     0.407  0.087   4.678  <.001   0.237   0.578  ***
weeks.c    -0.157  0.036  -4.388  <.001  -0.227  -0.087  ***
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These results again match the results in Table 16.2. The residual amount of heterogeneity is now $\hat{\tau}^2 \approx 0$ and the test statistic for $\mbox{H}_0{:}\; \tau^2 = 0$ is $Q_E(df=17) = 16.57$. The estimated model is $E(d_i) = .407 - 0.157 x_i$, where $x_i$ is the number of prior contact weeks. The standard errors of the model coefficients are $SE[b_0] = .087$ and $SE[b_1] = .036$, so that the test statistics are $z_0 = 4.68$ and $z_1 = -4.39$, respectively.¹⁾ The amount of heterogeneity accounted for by the moderator included in the mixed-effects model (p. 304) is given under R^2 as part of the output above. In this example, all of the heterogeneity is accounted for by the moderator.

Empirical Bayes Estimates

The unconditional shrinkage (empirical Bayes) estimates discussed on pages 304-305 can be obtained with:

blup(res.RE)

     pred    se  pi.lb pi.ub
1   0.054 0.095 -0.132 0.241
2   0.101 0.104 -0.103 0.304
3  -0.006 0.110 -0.223 0.210
4   0.214 0.137 -0.053 0.482
5   0.105 0.136 -0.162 0.372
6  -0.008 0.084 -0.174 0.157
7   0.017 0.084 -0.148 0.183
8  -0.029 0.122 -0.269 0.210
9   0.160 0.110 -0.054 0.375
10  0.249 0.127  0.000 0.497
11  0.162 0.132 -0.097 0.421
12  0.110 0.123 -0.130 0.351
13  0.065 0.131 -0.192 0.321
14  0.110 0.131 -0.146 0.367
15 -0.029 0.108 -0.241 0.183
16  0.026 0.110 -0.191 0.242
17  0.191 0.101 -0.008 0.389
18  0.074 0.079 -0.081 0.230
19  0.025 0.112 -0.195 0.245

For example, for study 4, the empirical Bayes estimate is $\hat{\theta}_4 = 0.21$ (p. 305).²⁾ The range of the empirical Bayes estimates is:

round(range(blup(res)$pred), 2)

[1] -0.03  0.25

as reported in Table 16.2 and on page 305.

For the mixed-effects model, the conditional shrinkage estimates can be obtained with:

blup(res.ME)

     pred    se  pi.lb pi.ub
1   0.093 0.037  0.020 0.166
2  -0.065 0.046 -0.155 0.026
3  -0.065 0.046 -0.155 0.026
4   0.407 0.087  0.237 0.578
5   0.407 0.087  0.237 0.578
6  -0.065 0.046 -0.155 0.026
7  -0.065 0.046 -0.155 0.026
8  -0.065 0.046 -0.155 0.026
9   0.407 0.087  0.237 0.578
10  0.250 0.057  0.139 0.361
11  0.407 0.087  0.237 0.578
12  0.407 0.087  0.237 0.578
13  0.250 0.057  0.139 0.361
14  0.093 0.037  0.020 0.166
15 -0.065 0.046 -0.155 0.026
16 -0.065 0.046 -0.155 0.026
17  0.250 0.057  0.139 0.361
18  0.093 0.037  0.020 0.166
19 -0.065 0.046 -0.155 0.026

These values are actually the same now as the fitted values (since $\hat{\tau}^2 \approx 0$), which can be obtained with (not shown):

predict(res.ME)

Alternative $\tau^2$ Estimators

The results in Table 16.3 using the "conventional" approach can be reproduced with:

res.std <- list()
 
res.std$EE   <- rma(yi, vi, data=dat, digits=3, method="EE")
res.std$ML   <- rma(yi, vi, data=dat, digits=3, method="ML")
res.std$REML <- rma(yi, vi, data=dat, digits=3, method="REML")
res.std$DL   <- rma(yi, vi, data=dat, digits=3, method="DL")
res.std$HE   <- rma(yi, vi, data=dat, digits=3, method="HE")
 
round(t(sapply(res.std, function(x)
               c(tau2=x$tau2, mu=x$b, se=x$se, z=x$zval,
                 ci.lb=x$ci.lb, ci.ub=x$ci.ub))), 3)

      tau2    mu    se     z  ci.lb ci.ub
EE   0.000 0.060 0.036 1.655 -0.011 0.132
ML   0.013 0.078 0.047 1.637 -0.015 0.171
REML 0.019 0.084 0.052 1.621 -0.018 0.185
DL   0.026 0.089 0.056 1.601 -0.020 0.199
HE   0.080 0.114 0.079 1.443 -0.041 0.270

The results under "method of moments" in Table 16.3 correspond to those obtained when using the DerSimonian-Laird estimator (method="DL"). The estimator by Hedges (1983) is another method of moments estimator (described in the chapter on page 311) and is also included in the results above (method="HE").

Knapp & Hartung Method

The "quasi-F" approach (described on pages 312-313) is the same as the Knapp and Hartung (2003) method. These results can be obtained with:

res.knha <- list()
 
res.knha$EE   <- rma(yi, vi, data=dat, digits=3, method="EE", test="knha")
res.knha$ML   <- rma(yi, vi, data=dat, digits=3, method="ML", test="knha")
res.knha$REML <- rma(yi, vi, data=dat, digits=3, method="REML", test="knha")
res.knha$DL   <- rma(yi, vi, data=dat, digits=3, method="DL", test="knha")
res.knha$HE   <- rma(yi, vi, data=dat, digits=3, method="HE", test="knha")
 
round(t(sapply(res.knha, function(x)
               c(tau2=x$tau2, mu=x$b, se=x$se, z=x$zval,
                 ci.lb=x$ci.lb, ci.ub=x$ci.ub))), 3)

      tau2    mu    se     z  ci.lb ci.ub
EE   0.000 0.060 0.051 1.173 -0.048 0.168
ML   0.013 0.078 0.059 1.311 -0.047 0.202
REML 0.019 0.084 0.062 1.359 -0.046 0.213
DL   0.026 0.089 0.064 1.405 -0.044 0.223
HE   0.080 0.114 0.071 1.608 -0.035 0.264

Huber-White Method

The results using the Huber-White method (as described on pages 313-314) can be obtained with:

res.hw <- list()
 
res.hw$EE   <- robust(res.std$EE,   cluster=dat$study, adjust=FALSE)
res.hw$ML   <- robust(res.std$ML,   cluster=dat$study, adjust=FALSE)
res.hw$REML <- robust(res.std$REML, cluster=dat$study, adjust=FALSE)
res.hw$DL   <- robust(res.std$DL,   cluster=dat$study, adjust=FALSE)
res.hw$HE   <- robust(res.std$HE,   cluster=dat$study, adjust=FALSE)
 
round(t(sapply(res.hw, function(x)
               c(tau2=x$tau2, mu=x$b, se=x$se, t=x$tval,
                 ci.lb=x$ci.lb, ci.ub=x$ci.ub))), 3)

      tau2    mu    se     t  ci.lb ci.ub
EE   0.000 0.060 0.040 1.515 -0.023 0.144
ML   0.013 0.078 0.047 1.637 -0.022 0.178
REML 0.019 0.084 0.050 1.676 -0.021 0.189
DL   0.026 0.089 0.052 1.710 -0.020 0.199
HE   0.080 0.114 0.062 1.850 -0.015 0.244

Note that adjust=FALSE must be used here to reproduce the results given (however, in practice, the small-sample correction should be used).

Final Note

Please note that there appear to be a few typos in Table 16.3. Most of the smaller discrepancies are probably due to rounding differences, but a few values are a bit more off. For example, for the conventional full MLE approach, $\hat{\mu} = .078$ and $SE[\hat{\mu}] = .047$, so that the upper CI bound should be approximately $.078 + 1.96(.047) \approx .170$, but it is reported as $.136$ (which does not follow based on the values for $\hat{\mu}$ and $SE[\hat{\mu}]$ reported in the table).

References

Hedges, L. V. (1983). A random effects model for effect sizes. Psychological Bulletin, 93(2), 388–395.

Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558.

Knapp, G., & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22(17), 2693–2710.

Raudenbush, S. W. (1984). Magnitude of teacher expectancy effects on pupil IQ as a function of the credibility of expectancy induction: A synthesis of findings from 18 experiments. Journal of Educational Psychology, 76(1), 85–97.

Raudenbush, S. W. (2009). Analyzing effect sizes: Random effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 295–315). New York: Russell Sage Foundation.

Raudenbush, S. W., & Bryk, A. S. (1985). Empirical Bayes meta-analysis. Journal of Educational Statistics, 10(2), 75–98.

Riley, R. D., Higgins, J. P., & Deeks, J. J. (2011). Interpretation of random effects meta-analyses. British Medical Journal, 342, d549.

¹⁾

Note that there is a typo in Table 16.2. The test statistic for the slope is not -4.90, but -4.39.

²⁾

Note that equation 16.25 in Raudenbush (2009) is an approximation for the posterior variance of the empirical Bayes estimates. The metafor package uses the exact equation (see Appendix in Raudenbush & Bryk, 1985).

Table of Contents