# The metafor Package

A Meta-Analysis Package for R

### Sidebar

analyses:raudenbush2009

## Raudenbush (2009)

### The Methods and Data

Raudenbush (2009) is an excellent chapter in The handbook of research synthesis and meta-analysis (2nd ed.) and describes in detail many of the models and methods that are implemented in the rma() function (including the meta-analytic random- and mixed-effects models). The dataset that is used for the illustration of the various models and methods is actually the same that is used in the Raudenbush and Bryk (1985) and provides the results from 19 studies examining how teachers' expectations about their pupils can influence actual IQ levels (Raudenbush, 1984). A reproduction of the analyses described in Raudenbush and Bryk (1985) can be found here.

Here, I will reproduce the results from Raudenbush (2009). The data are provided in Table 16.1 (p. 300) in the chapter and can be loaded with:

library(metafor)
dat <- dat.raudenbush1985
dat

(I copy the dataset into 'dat', which is a bit shorter and therefore easier to type further below). The contents of the dataset are:

   study               author year weeks setting tester    yi     vi
1      1     Rosenthal et al. 1974     2   group  aware  0.03 0.0156
2      2          Conn et al. 1968    21   group  aware  0.12 0.0216
3      3          Jose & Cody 1971    19   group  aware -0.14 0.0279
4      4   Pellegrini & Hicks 1972     0   group  aware  1.18 0.1391
5      5   Pellegrini & Hicks 1972     0   group  blind  0.26 0.1362
6      6    Evans & Rosenthal 1968     3   group  aware -0.06 0.0106
7      7       Fielder et al. 1971    17   group  blind -0.02 0.0106
8      8             Claiborn 1969    24   group  aware -0.32 0.0484
9      9               Kester 1969     0   group  aware  0.27 0.0269
10    10              Maxwell 1970     1   indiv  blind  0.80 0.0630
11    11               Carter 1970     0   group  blind  0.54 0.0912
12    12              Flowers 1966     0   group  blind  0.18 0.0497
13    13              Keshock 1970     1   indiv  blind -0.02 0.0835
14    14            Henrikson 1970     2   indiv  blind  0.23 0.0841
15    15                 Fine 1972    17   group  aware -0.18 0.0253
16    16              Grieger 1970     5   group  blind -0.06 0.0279
17    17 Rosenthal & Jacobson 1968     1   group  aware  0.30 0.0193
18    18   Fleming & Anttonen 1971     2   group  blind  0.07 0.0088
19    19             Ginsburg 1970     7   group  aware -0.07 0.0303

Note that there is a typo in Table 16.1 in the book chapter: The observed standardized mean difference for study 10 was $0.80$ and not $0.87$ (cf. Raudenbush and Bryk, 1985, Table 1).

### Fixed-Effects Model

The results from a model assuming homogeneous effects can now be obtained with:

res.FE <- rma(yi, vi, data=dat, digits=3, method="FE")
res.FE
Fixed-Effects Model (k = 19)

Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007

Model Results:

estimate       se     zval     pval    ci.lb    ci.ub
0.060    0.036    1.655    0.098   -0.011    0.132        .

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These results match exactly what is reported in Table 16.2 (p. 301). In particular, $\hat{\theta} = 0.060$, $SE[\hat{\theta}] = 0.036$, and $z = 1.66$ for the test $H_0: \theta = 0$. The bounds of a 95% confidence interval for $\theta$ are given by $-0.011$ and $0.132$ (p. 301). However, the true effects are found to be heterogeneous ($Q(18) = 35.83$, $p = .007$) (p. 302).

Sidenote: The title of the output above indicates that these are the results from a "fixed-effects model" (and not "homogeneous effects model"). To be precise, this means that these results are also correct even if the true effects are heterogeneous, as long as the conclusions are restricted to the $k = 19$ studies included in the meta-analysis. Moreover, since these results are based (by default) on weighted estimation, the model estimate (i.e., $0.060$) is an estimate of the weighted average of the true effects in these 19 studies (with inverse variance weights). In fact, since the true effects appear to be heterogeneous, this is really the correct way of interpreting the $0.060$ value.

### Random-Effects Model

Next, Raudenbush (2009) uses REML estimation to fit a random-effects model. Since REML estimation is the default for the rma() function, we can obtain the same results with:

res.RE <- rma(yi, vi, data=dat, digits=3)
res.RE
Random-Effects Model (k = 19; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.019 (SE = 0.016)
tau (square root of estimated tau^2 value):      0.137
I^2 (total heterogeneity / total variability):   41.86%
H^2 (total variability / sampling variability):  1.72

Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007

Model Results:

estimate       se     zval     pval    ci.lb    ci.ub
0.084    0.052    1.621    0.105   -0.018    0.185

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Again, these are the same results reported in Table 16.2 (p. 301). In particular, $\hat{\tau}^2 = .019$, $\hat{\mu} = .084$, $SE[\hat{\mu}] = .052$, and $z = 1.62$ for the test $H_0: \mu = 0$.

### Credibility/Prediction Interval

Raudenbush (2009) also reports the results from a 95% credibility/prediction interval (or "plausible value interval"), which can be obtained with $\hat{\mu} \pm 1.96 \hat{\tau}$. Such an interval can be obtained with:

predict(res.RE)
  pred    se  ci.lb ci.ub  cr.lb cr.ub
0.089 0.056 -0.020 0.199 -0.245 0.423

but note that the interval ($-0.245$ to $0.423$) is a bit wider than the one reported in the book chapter (p. 302). That is because the credibility interval is computed with $\hat{\mu} \pm 1.96 \sqrt{\hat{\tau}^2 + SE[\hat{\mu}]^2}$ in the metafor package (see here for more details).

### Measures of Heterogeneity

As additional measures of heterogeneity, Raudenbush (2009) describes the "variation to signal ratio" $H$ (p. 302) and the "intra-class correlation" $\hat{\lambda}$ (p. 3030). First, note that $H$ is actually $H^2$ in the notation of Higgins and Thompson (2002) and $\hat{\lambda}$ is akin to $I^2$ (but not quite the same in the way it is computed). Moreover, note that $I^2$ and $H^2$ are computed in the metafor package based on more general definitions than typically used in practice. The equations used in the metafor package are described in detail under the Frequently Asked Questions section. Depending on the method used to estimate $\tau^2$, one will get different values for $I^2$ and $H^2$. The equations that are typically used to compute $I^2$ and $H^2$ (i.e., $I^2 = 100\% \times (Q - (k-1))/Q$ and $H^2 = Q/(k-1)$) are actually special versions of the more general definitions used in the metafor package. When using the DerSimonian-Laird estimator of $\tau^2$, then the two definitions coincide. This can be seen if we use:

res.DL <- rma(yi, vi, data=dat, digits=3, method="DL")
res.DL
Random-Effects Model (k = 19; tau^2 estimator: DL)

tau^2 (estimated amount of total heterogeneity): 0.026 (SE = 0.019)
tau (square root of estimated tau^2 value):      0.161
I^2 (total heterogeneity / total variability):   49.76%
H^2 (total variability / sampling variability):  1.99

Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007

Model Results:

estimate       se     zval     pval    ci.lb    ci.ub
0.089    0.056    1.601    0.109   -0.020    0.199

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now, $I^2 = 100 \times (35.83 - 18)/ 35.83 = 49.76%$ and $H^2 = 35.83 / 18 = 1.99$. So, if you want metafor to use the more conventional equations for computing $I^2$ and $H^2$, use method="DL" when fitting the model with the rma() function.

### Mixed-Effects Model

Next, Raudenbush (2009) use a mixed-effects model with the number of prior contact weeks as predictor/moderator. For this analysis, all week values greater than 3 are recoded to 3:

dat$weeks.c <- ifelse(dat$weeks > 3, 3, dat$weeks) Then, the mixed-effects model can be fitted with: res.ME <- rma(yi, vi, mods = ~ weeks.c, data=dat, digits=3) res.ME Mixed-Effects Model (k = 19; tau^2 estimator: REML) tau^2 (estimated amount of residual heterogeneity): 0.000 (SE = 0.007) tau (square root of estimated tau^2 value): 0.001 I^2 (residual heterogeneity / unaccounted variability): 0.00% H^2 (unaccounted variability / sampling variability): 1.00 R^2 (amount of heterogeneity accounted for): 100.00% Test for Residual Heterogeneity: QE(df = 17) = 16.571, p-val = 0.484 Test of Moderators (coefficient(s) 2): QM(df = 1) = 19.258, p-val < .001 Model Results: estimate se zval pval ci.lb ci.ub intrcpt 0.407 0.087 4.678 <.001 0.237 0.578 *** weeks.c -0.157 0.036 -4.388 <.001 -0.227 -0.087 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 These results again match the results in Table 16.2. The residual amount of heterogeneity is now$\hat{\tau}^2 \approx 0$and the test statistic for$H_0: \tau^2 = 0$is$Q_E(df=17) = 16.57$. The estimated model is$E(d_i) = .407 - 0.157 x_i$, where$x_i$is the number of prior contact weeks. The standard errors of the model coefficients are$SE[b_0] = .087$and$SE[b_1] = .036$, so that the test statistics are$z_0 = 4.68$and$z_1 = -4.39$, respectively.1) The amount of heterogeneity accounted for by the moderator included in the mixed-effects model (p. 304) is given under R^2 as part of the output above. In this example, all of the heterogeneity is accounted for by the moderator. ### Empirical Bayes Estimates The unconditional shrinkage (empirical Bayes) estimates discussed on pages 304-305 can be obtained with: blup(res.RE)  pred se pi.lb pi.ub 1 0.054 0.095 -0.132 0.241 2 0.101 0.104 -0.103 0.304 3 -0.006 0.110 -0.223 0.210 4 0.214 0.137 -0.053 0.482 5 0.105 0.136 -0.162 0.372 6 -0.008 0.084 -0.174 0.157 7 0.017 0.084 -0.148 0.183 8 -0.029 0.122 -0.269 0.210 9 0.160 0.110 -0.054 0.375 10 0.249 0.127 0.000 0.497 11 0.162 0.132 -0.097 0.421 12 0.110 0.123 -0.130 0.351 13 0.065 0.131 -0.192 0.321 14 0.110 0.131 -0.146 0.367 15 -0.029 0.108 -0.241 0.183 16 0.026 0.110 -0.191 0.242 17 0.191 0.101 -0.008 0.389 18 0.074 0.079 -0.081 0.230 19 0.025 0.112 -0.195 0.245 For example, for study 4, the empirical Bayes estimate is$\hat{\theta}_4 = 0.21$(p. 305).2) The range of the empirical Bayes estimates is: round(range(blup(res)$pred), 2)
[1] -0.03  0.25

as reported in Table 16.2 and on page 305.

For the mixed-effects model, the conditional shrinkage estimates can be obtained with:

blup(res.ME)
     pred    se  pi.lb pi.ub
1   0.093 0.037  0.020 0.166
2  -0.065 0.046 -0.155 0.026
3  -0.065 0.046 -0.155 0.026
4   0.407 0.087  0.237 0.578
5   0.407 0.087  0.237 0.578
6  -0.065 0.046 -0.155 0.026
7  -0.065 0.046 -0.155 0.026
8  -0.065 0.046 -0.155 0.026
9   0.407 0.087  0.237 0.578
10  0.250 0.057  0.139 0.361
11  0.407 0.087  0.237 0.578
12  0.407 0.087  0.237 0.578
13  0.250 0.057  0.139 0.361
14  0.093 0.037  0.020 0.166
15 -0.065 0.046 -0.155 0.026
16 -0.065 0.046 -0.155 0.026
17  0.250 0.057  0.139 0.361
18  0.093 0.037  0.020 0.166
19 -0.065 0.046 -0.155 0.026

These values are actually the same now as the fitted values (since $\hat{\tau}^2 \approx 0$), which can be obtained with (not shown):

predict(res.ME)

### Alternative $\tau^2$ Estimators

The results in Table 16.3 using the "conventional" approach can be reproduced with:

res.std <- list()

res.std$FE <- rma(yi, vi, data=dat, digits=3, method="FE") res.std$ML   <- rma(yi, vi, data=dat, digits=3, method="ML")
res.std$REML <- rma(yi, vi, data=dat, digits=3, method="REML") res.std$DL   <- rma(yi, vi, data=dat, digits=3, method="DL")
res.std$HE <- rma(yi, vi, data=dat, digits=3, method="HE") round(t(sapply(res.std, function(x) c(tau2=x$tau2, mu=x$b, se=x$se, z=x$zval, ci.lb=x$ci.lb, ci.ub=x$ci.ub))), 3)  tau2 mu se z ci.lb ci.ub FE 0.000 0.060 0.036 1.655 -0.011 0.132 ML 0.013 0.078 0.047 1.637 -0.015 0.171 REML 0.019 0.084 0.052 1.621 -0.018 0.185 DL 0.026 0.089 0.056 1.601 -0.020 0.199 HE 0.080 0.114 0.079 1.443 -0.041 0.270 The results under "method of moments" in Table 16.3 correspond to those obtained when using the DerSimonian-Laird estimator (method="DL"). The estimator by Hedges (1983) is another method of moments estimator (described in the chapter on page 311) and is also included in the results above (method="HE"). ### Knapp & Hartung Method The "quasi-F" approach (described on pages 312-313) is the same as the Knapp and Hartung (2003) method. These results can be obtained with: res.knha <- list() res.knha$FE   <- rma(yi, vi, data=dat, digits=3, method="FE", test="knha")
res.knha$ML <- rma(yi, vi, data=dat, digits=3, method="ML", test="knha") res.knha$REML <- rma(yi, vi, data=dat, digits=3, method="REML", test="knha")
res.knha$DL <- rma(yi, vi, data=dat, digits=3, method="DL", test="knha") res.knha$HE   <- rma(yi, vi, data=dat, digits=3, method="HE", test="knha")

round(t(sapply(res.knha, function(x) c(tau2=x$tau2, mu=x$b, se=x$se, z=x$zval, ci.lb=x$ci.lb, ci.ub=x$ci.ub))), 3)
      tau2    mu    se     z  ci.lb ci.ub
FE   0.000 0.060 0.051 1.173 -0.048 0.168
ML   0.013 0.078 0.059 1.311 -0.047 0.202
REML 0.019 0.084 0.062 1.359 -0.046 0.213
DL   0.026 0.089 0.064 1.405 -0.044 0.223
HE   0.080 0.114 0.071 1.608 -0.035 0.264

### Huber-White Method

The results using the Huber-White method (as described on pages 313-314) can be obtained with:

res.hw <- list()

res.hw$FE <- robust(res.std$FE,   cluster=dat$study, adjust=FALSE) res.hw$ML   <- robust(res.std$ML, cluster=dat$study, adjust=FALSE)
res.hw$REML <- robust(res.std$REML, cluster=dat$study, adjust=FALSE) res.hw$DL   <- robust(res.std$DL, cluster=dat$study, adjust=FALSE)

### References

Hedges, L. V. (1983). A random effects model for effect sizes. Psychological Bulletin, 93(2), 388–395.

Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558.

Knapp, G., & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22(17), 2693–2710.

Raudenbush, S. W. (1984). Magnitude of teacher expectancy effects on pupil IQ as a function of the credibility of expectancy induction: A synthesis of findings from 18 experiments. Journal of Educational Psychology, 76(1), 85–97.

Raudenbush, S. W. (2009). Analyzing effect sizes: Random effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 295–315). New York: Russell Sage Foundation.

Raudenbush, S. W., & Bryk, A. S. (1985). Empirical Bayes meta-analysis. Journal of Educational Statistics, 10(2), 75–98.

Riley, R. D., Higgins, J. P., & Deeks, J. J. (2011). Interpretation of random effects meta-analyses. British Medical Journal, 342, d549.

1)
Note that there is a typo in Table 16.2. The test statistic for the slope is not -4.90, but -4.39.
2)
Note that equation 16.25 in Raudenbush (2009) is an approximation for the posterior variance of the empirical Bayes estimates. The metafor package uses the exact equation (see Appendix in Raudenbush & Bryk, 1985).