Table of Contents
Raudenbush (2009)
The Methods and Data
Raudenbush (2009) is an excellent chapter in The handbook of research synthesis and meta-analysis (2nd ed.) that describes in detail many of the models and methods implemented in the rma() function (including the meta-analytic random- and mixed-effects models). The dataset used to illustrate the various models and methods is the same one used in Raudenbush and Bryk (1985) and provides the results from 19 studies examining how teachers' expectations about their pupils can influence actual IQ levels (Raudenbush, 1984). A reproduction of the analyses described in Raudenbush and Bryk (1985) can be found here.
Here, I will reproduce the results from Raudenbush (2009). The data are provided in Table 16.1 (p. 300) in the chapter and can be loaded with:
```r
library(metafor)
dat <- dat.raudenbush1985
dat
```
(I copy the dataset into dat, which is a bit shorter and therefore easier to type further below.) The contents of the dataset are:
```
   study               author year weeks setting tester    yi     vi
1      1     Rosenthal et al. 1974     2   group  aware  0.03 0.0156
2      2          Conn et al. 1968    21   group  aware  0.12 0.0216
3      3          Jose & Cody 1971    19   group  aware -0.14 0.0279
4      4   Pellegrini & Hicks 1972     0   group  aware  1.18 0.1391
5      5   Pellegrini & Hicks 1972     0   group  blind  0.26 0.1362
6      6    Evans & Rosenthal 1968     3   group  aware -0.06 0.0106
7      7       Fielder et al. 1971    17   group  blind -0.02 0.0106
8      8             Claiborn 1969    24   group  aware -0.32 0.0484
9      9               Kester 1969     0   group  aware  0.27 0.0269
10    10              Maxwell 1970     1   indiv  blind  0.80 0.0630
11    11               Carter 1970     0   group  blind  0.54 0.0912
12    12              Flowers 1966     0   group  blind  0.18 0.0497
13    13              Keshock 1970     1   indiv  blind -0.02 0.0835
14    14            Henrikson 1970     2   indiv  blind  0.23 0.0841
15    15                 Fine 1972    17   group  aware -0.18 0.0253
16    16              Grieger 1970     5   group  blind -0.06 0.0279
17    17 Rosenthal & Jacobson 1968     1   group  aware  0.30 0.0193
18    18   Fleming & Anttonen 1971     2   group  blind  0.07 0.0088
19    19             Ginsburg 1970     7   group  aware -0.07 0.0303
```
Note that there is a typo in Table 16.1 in the book chapter: The observed standardized mean difference for study 10 was $0.80$ and not $0.87$ (cf. Raudenbush and Bryk, 1985, Table 1).
Equal-Effects Model
The results from a model assuming homogeneous effects can now be obtained with:
```r
res.EE <- rma(yi, vi, data=dat, digits=3, method="EE")
res.EE
```

```
Equal-Effects Model (k = 19)

I^2 (total heterogeneity / total variability):  49.76%
H^2 (total variability / sampling variability): 1.99

Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007

Model Results:

estimate     se   zval   pval   ci.lb  ci.ub
   0.060  0.036  1.655  0.098  -0.011  0.132  .

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
These results match exactly what is reported in Table 16.2 (p. 301). In particular, $\hat{\theta} = 0.060$, $SE[\hat{\theta}] = 0.036$, and $z = 1.66$ for the test $\mbox{H}_0{:}\; \theta = 0$. The bounds of a 95% confidence interval for $\theta$ are given by $-0.011$ and $0.132$ (p. 301). However, the true effects are found to be heterogeneous ($Q(18) = 35.83$, $p = .007$) (p. 302).
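As a quick check on these numbers, the equal-effects estimate, its standard error, and the $Q$-statistic can be computed by hand, since the estimate is simply an inverse-variance weighted average of the observed effects. A minimal sketch in base R, using the yi and vi values from the dataset above:

```r
# yi and vi values from dat.raudenbush1985 (see the dataset listing above)
yi <- c( 0.03,  0.12, -0.14,  1.18,  0.26, -0.06, -0.02, -0.32,  0.27,  0.80,
         0.54,  0.18, -0.02,  0.23, -0.18, -0.06,  0.30,  0.07, -0.07)
vi <- c(0.0156, 0.0216, 0.0279, 0.1391, 0.1362, 0.0106, 0.0106, 0.0484,
        0.0269, 0.0630, 0.0912, 0.0497, 0.0835, 0.0841, 0.0253, 0.0279,
        0.0193, 0.0088, 0.0303)
wi <- 1 / vi                                # inverse-variance weights
theta.hat <- sum(wi * yi) / sum(wi)         # pooled estimate (0.060)
se.theta  <- sqrt(1 / sum(wi))              # its standard error (0.036)
Q <- sum(wi * (yi - theta.hat)^2)           # heterogeneity statistic (35.83)
round(c(estimate = theta.hat, se = se.theta, Q = Q), 3)
```

This reproduces the estimate, standard error, and $Q$-statistic from the rma() output above.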
Random-Effects Model
Next, Raudenbush (2009) uses REML estimation to fit a random-effects model. Since REML estimation is the default for the rma()
function, we can obtain the same results with:
```r
res.RE <- rma(yi, vi, data=dat, digits=3)
res.RE
```

```
Random-Effects Model (k = 19; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.019 (SE = 0.016)
tau (square root of estimated tau^2 value):      0.137
I^2 (total heterogeneity / total variability):   41.86%
H^2 (total variability / sampling variability):  1.72

Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007

Model Results:

estimate     se   zval   pval   ci.lb  ci.ub
   0.084  0.052  1.621  0.105  -0.018  0.185

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Again, these are the same results reported in Table 16.2 (p. 301). In particular, $\hat{\tau}^2 = .019$, $\hat{\mu} = .084$, $SE[\hat{\mu}] = .052$, and $z = 1.62$ for the test $\mbox{H}_0{:}\; \mu = 0$.
Prediction Interval
Raudenbush (2009) also reports a 95% prediction interval (or "plausible value interval"), which is computed in the chapter as $\hat{\mu} \pm 1.96 \hat{\tau}$. Such an interval can be obtained with:
```r
predict(res.RE)
```

```
  pred    se  ci.lb ci.ub  pi.lb pi.ub
 0.089 0.056 -0.020 0.199 -0.245 0.423
```
but note that the interval ($-0.245$ to $0.423$) is a bit wider than the one reported in the book chapter (p. 302). That is because the prediction interval is computed with $\hat{\mu} \pm 1.96 \sqrt{\hat{\tau}^2 + SE[\hat{\mu}]^2}$ in the metafor package (see here for more details).
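To make the difference between the two definitions concrete, both versions of the interval can be computed by hand from the (rounded) REML results above, $\hat{\mu} = 0.084$, $SE[\hat{\mu}] = 0.052$, and $\hat{\tau} = 0.137$ (a sketch; the bounds differ slightly from those based on unrounded values):

```r
# two prediction interval versions, using the rounded REML estimates
# from the random-effects model output above
mu  <- 0.084
se  <- 0.052
tau <- 0.137
pi.chapter <- mu + c(-1, 1) * 1.96 * tau                 # mu +/- 1.96*tau
pi.metafor <- mu + c(-1, 1) * 1.96 * sqrt(tau^2 + se^2)  # adds uncertainty in mu
round(pi.chapter, 3)   # -0.185  0.353
round(pi.metafor, 3)   # -0.203  0.371
```

The second interval is always a bit wider, since it also incorporates the uncertainty in $\hat{\mu}$ itself.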
Measures of Heterogeneity
As additional measures of heterogeneity, Raudenbush (2009) describes the "variation to signal ratio" $H$ (p. 302) and the "intra-class correlation" $\hat{\lambda}$ (p. 303). First, note that $H$ is actually $H^2$ in the notation of Higgins and Thompson (2002) and that $\hat{\lambda}$ is akin to $I^2$ (although not computed in quite the same way). Moreover, note that $I^2$ and $H^2$ are computed in the metafor package based on more general definitions than those typically used in practice; the equations used by the package are described in detail in the Frequently Asked Questions section. Depending on the method used to estimate $\tau^2$, one will therefore obtain different values for $I^2$ and $H^2$. The equations typically used to compute these statistics (i.e., $I^2 = 100\% \times (Q - (k-1))/Q$ and $H^2 = Q/(k-1)$) are special cases of the more general definitions used in the metafor package: when using the DerSimonian-Laird estimator of $\tau^2$, the two definitions coincide. This can be seen if we use:
```r
res.DL <- rma(yi, vi, data=dat, digits=3, method="DL")
res.DL
```

```
Random-Effects Model (k = 19; tau^2 estimator: DL)

tau^2 (estimated amount of total heterogeneity): 0.026 (SE = 0.019)
tau (square root of estimated tau^2 value):      0.161
I^2 (total heterogeneity / total variability):   49.76%
H^2 (total variability / sampling variability):  1.99

Test for Heterogeneity:
Q(df = 18) = 35.830, p-val = 0.007

Model Results:

estimate     se   zval   pval   ci.lb  ci.ub
   0.089  0.056  1.601  0.109  -0.020  0.199

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Now, $I^2 = 100\% \times (35.83 - 18) / 35.83 = 49.76\%$ and $H^2 = 35.83 / 18 = 1.99$. So, if you want metafor to use the more conventional equations for computing $I^2$ and $H^2$, use method="DL" when fitting the model with the rma() function.
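As a quick arithmetic check, the conventional equations can be applied directly to the $Q$-statistic from the output above ($Q = 35.83$ with $k = 19$ studies):

```r
# conventional I^2 and H^2 computed directly from the Q-statistic
# reported in the output above
Q <- 35.83
k <- 19
I2 <- 100 * (Q - (k - 1)) / Q   # ~49.76
H2 <- Q / (k - 1)               # ~1.99
round(c(I2 = I2, H2 = H2), 2)
```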
Mixed-Effects Model
Next, Raudenbush (2009) uses a mixed-effects model with the number of prior contact weeks as predictor/moderator. For this analysis, all week values greater than 3 are recoded to 3:
```r
dat$weeks.c <- ifelse(dat$weeks > 3, 3, dat$weeks)
```
Then, the mixed-effects model can be fitted with:
```r
res.ME <- rma(yi, vi, mods = ~ weeks.c, data=dat, digits=3)
res.ME
```

```
Mixed-Effects Model (k = 19; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity):     0.000 (SE = 0.007)
tau (square root of estimated tau^2 value):             0.001
I^2 (residual heterogeneity / unaccounted variability): 0.00%
H^2 (unaccounted variability / sampling variability):   1.00
R^2 (amount of heterogeneity accounted for):            100.00%

Test for Residual Heterogeneity:
QE(df = 17) = 16.571, p-val = 0.484

Test of Moderators (coefficient(s) 2):
QM(df = 1) = 19.258, p-val < .001

Model Results:

         estimate     se    zval   pval   ci.lb   ci.ub
intrcpt     0.407  0.087   4.678  <.001   0.237   0.578  ***
weeks.c    -0.157  0.036  -4.388  <.001  -0.227  -0.087  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
These results again match the results in Table 16.2. The residual amount of heterogeneity is now $\hat{\tau}^2 \approx 0$ and the test statistic for $\mbox{H}_0{:}\; \tau^2 = 0$ is $Q_E(df=17) = 16.57$. The estimated model is $E(d_i) = 0.407 - 0.157 x_i$, where $x_i$ is the number of prior contact weeks. The standard errors of the model coefficients are $SE[b_0] = .087$ and $SE[b_1] = .036$, so that the test statistics are $z_0 = 4.68$ and $z_1 = -4.39$, respectively.1) The amount of heterogeneity accounted for by the moderator included in the mixed-effects model (p. 304) is given under R^2 as part of the output above. In this example, all of the heterogeneity is accounted for by the moderator.
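Using the (rounded) coefficients from the output, the model-implied average effects at the four coded values of prior contact weeks can be computed directly (a quick sketch; due to rounding of the coefficients, the value at $x_i = 3$ differs in the third decimal from what metafor computes internally):

```r
# model-implied average effects E(d_i) = 0.407 - 0.157 * x_i for the
# coded weeks values 0 to 3, using the rounded coefficients above
b0 <-  0.407
b1 <- -0.157
fitted <- b0 + b1 * 0:3
round(fitted, 3)   # 0.407  0.250  0.093 -0.064
```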
Empirical Bayes Estimates
The unconditional shrinkage (empirical Bayes) estimates discussed on pages 304-305 can be obtained with:
```r
blup(res.RE)
```

```
     pred    se  pi.lb pi.ub
1   0.054 0.095 -0.132 0.241
2   0.101 0.104 -0.103 0.304
3  -0.006 0.110 -0.223 0.210
4   0.214 0.137 -0.053 0.482
5   0.105 0.136 -0.162 0.372
6  -0.008 0.084 -0.174 0.157
7   0.017 0.084 -0.148 0.183
8  -0.029 0.122 -0.269 0.210
9   0.160 0.110 -0.054 0.375
10  0.249 0.127  0.000 0.497
11  0.162 0.132 -0.097 0.421
12  0.110 0.123 -0.130 0.351
13  0.065 0.131 -0.192 0.321
14  0.110 0.131 -0.146 0.367
15 -0.029 0.108 -0.241 0.183
16  0.026 0.110 -0.191 0.242
17  0.191 0.101 -0.008 0.389
18  0.074 0.079 -0.081 0.230
19  0.025 0.112 -0.195 0.245
```
For example, for study 4, the empirical Bayes estimate is $\hat{\theta}_4 = 0.21$ (p. 305).2) The range of the empirical Bayes estimates is:
```r
round(range(blup(res.RE)$pred), 2)
```

```
[1] -0.03 0.25
```
as reported in Table 16.2 and on page 305.
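The shrinkage behind these estimates is easy to reproduce by hand: each empirical Bayes estimate is a precision-weighted compromise between the observed effect and the overall estimate, $\hat{\theta}_i = \hat{\lambda}_i y_i + (1 - \hat{\lambda}_i) \hat{\mu}$ with $\hat{\lambda}_i = \hat{\tau}^2 / (\hat{\tau}^2 + v_i)$. A sketch for study 4, using the rounded REML values from the outputs above (so the result differs slightly in the third decimal from the 0.214 shown above):

```r
# empirical Bayes (shrinkage) estimate for study 4, using the rounded
# REML results above (tau^2 = 0.019, mu = 0.084) and y4 = 1.18, v4 = 0.1391
tau2 <- 0.019
mu   <- 0.084
y4   <- 1.18
v4   <- 0.1391
lambda <- tau2 / (tau2 + v4)              # shrinkage weight (~0.12)
theta4 <- lambda * y4 + (1 - lambda) * mu
round(theta4, 3)   # 0.216
```

Since $v_4$ is large (the study is imprecise), the weight on the observed effect is small and the estimate is pulled strongly toward $\hat{\mu}$.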
For the mixed-effects model, the conditional shrinkage estimates can be obtained with:
```r
blup(res.ME)
```

```
     pred    se  pi.lb pi.ub
1   0.093 0.037  0.020 0.166
2  -0.065 0.046 -0.155 0.026
3  -0.065 0.046 -0.155 0.026
4   0.407 0.087  0.237 0.578
5   0.407 0.087  0.237 0.578
6  -0.065 0.046 -0.155 0.026
7  -0.065 0.046 -0.155 0.026
8  -0.065 0.046 -0.155 0.026
9   0.407 0.087  0.237 0.578
10  0.250 0.057  0.139 0.361
11  0.407 0.087  0.237 0.578
12  0.407 0.087  0.237 0.578
13  0.250 0.057  0.139 0.361
14  0.093 0.037  0.020 0.166
15 -0.065 0.046 -0.155 0.026
16 -0.065 0.046 -0.155 0.026
17  0.250 0.057  0.139 0.361
18  0.093 0.037  0.020 0.166
19 -0.065 0.046 -0.155 0.026
```
These values are actually the same now as the fitted values (since $\hat{\tau}^2 \approx 0$), which can be obtained with (not shown):
```r
predict(res.ME)
```
Alternative $\tau^2$ Estimators
The results in Table 16.3 using the "conventional" approach can be reproduced with:
```r
res.std <- list()
res.std$EE   <- rma(yi, vi, data=dat, digits=3, method="EE")
res.std$ML   <- rma(yi, vi, data=dat, digits=3, method="ML")
res.std$REML <- rma(yi, vi, data=dat, digits=3, method="REML")
res.std$DL   <- rma(yi, vi, data=dat, digits=3, method="DL")
res.std$HE   <- rma(yi, vi, data=dat, digits=3, method="HE")
round(t(sapply(res.std, function(x) c(tau2=x$tau2, mu=x$b, se=x$se,
   z=x$zval, ci.lb=x$ci.lb, ci.ub=x$ci.ub))), 3)
```

```
      tau2    mu    se     z  ci.lb ci.ub
EE   0.000 0.060 0.036 1.655 -0.011 0.132
ML   0.013 0.078 0.047 1.637 -0.015 0.171
REML 0.019 0.084 0.052 1.621 -0.018 0.185
DL   0.026 0.089 0.056 1.601 -0.020 0.199
HE   0.080 0.114 0.079 1.443 -0.041 0.270
```
The results under "method of moments" in Table 16.3 correspond to those obtained when using the DerSimonian-Laird estimator (method="DL"). The estimator by Hedges (1983) is another method of moments estimator (described in the chapter on page 311) and is also included in the results above (method="HE").
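The Hedges (1983) estimator has a particularly simple closed form: the sample variance of the observed effects minus the average sampling variance, truncated at zero. A hand computation, using the yi and vi values from the dataset above:

```r
# yi and vi values from dat.raudenbush1985 (see the dataset listing above)
yi <- c( 0.03,  0.12, -0.14,  1.18,  0.26, -0.06, -0.02, -0.32,  0.27,  0.80,
         0.54,  0.18, -0.02,  0.23, -0.18, -0.06,  0.30,  0.07, -0.07)
vi <- c(0.0156, 0.0216, 0.0279, 0.1391, 0.1362, 0.0106, 0.0106, 0.0484,
        0.0269, 0.0630, 0.0912, 0.0497, 0.0835, 0.0841, 0.0253, 0.0279,
        0.0193, 0.0088, 0.0303)
# Hedges estimator: sample variance of yi minus mean sampling variance
tau2.HE <- max(0, var(yi) - mean(vi))
round(tau2.HE, 3)   # 0.080, as in the HE row above
```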
Knapp & Hartung Method
The "quasi-F" approach (described on pages 312-313) is the same as the Knapp and Hartung (2003) method. These results can be obtained with:
```r
res.knha <- list()
res.knha$EE   <- rma(yi, vi, data=dat, digits=3, method="EE", test="knha")
res.knha$ML   <- rma(yi, vi, data=dat, digits=3, method="ML", test="knha")
res.knha$REML <- rma(yi, vi, data=dat, digits=3, method="REML", test="knha")
res.knha$DL   <- rma(yi, vi, data=dat, digits=3, method="DL", test="knha")
res.knha$HE   <- rma(yi, vi, data=dat, digits=3, method="HE", test="knha")
round(t(sapply(res.knha, function(x) c(tau2=x$tau2, mu=x$b, se=x$se,
   z=x$zval, ci.lb=x$ci.lb, ci.ub=x$ci.ub))), 3)
```

```
      tau2    mu    se     z  ci.lb ci.ub
EE   0.000 0.060 0.051 1.173 -0.048 0.168
ML   0.013 0.078 0.059 1.311 -0.047 0.202
REML 0.019 0.084 0.062 1.359 -0.046 0.213
DL   0.026 0.089 0.064 1.405 -0.044 0.223
HE   0.080 0.114 0.071 1.608 -0.035 0.264
```
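For intuition, the adjustment in the equal-effects row can be sketched by hand: with $\hat{\tau}^2 = 0$, the Knapp and Hartung method amounts to scaling the conventional variance $1/\sum w_i$ by $Q/(k-1)$ and then using a t-distribution with $k - 1$ degrees of freedom for inference. Using the rounded values from the outputs above (so the result differs slightly in the later decimals):

```r
# Knapp & Hartung standard error for the equal-effects row: the
# conventional SE is scaled by sqrt(Q/(k-1)), using rounded values
# from the outputs above
Q  <- 35.83
k  <- 19
se <- 0.036                       # conventional equal-effects SE
se.kh <- se * sqrt(Q / (k - 1))   # ~0.051, as in the EE row above
round(se.kh, 3)
```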
Huber-White Method
The results using the Huber-White method (as described on pages 313-314) can be obtained with:
```r
res.hw <- list()
res.hw$EE   <- robust(res.std$EE,   cluster=dat$study, adjust=FALSE)
res.hw$ML   <- robust(res.std$ML,   cluster=dat$study, adjust=FALSE)
res.hw$REML <- robust(res.std$REML, cluster=dat$study, adjust=FALSE)
res.hw$DL   <- robust(res.std$DL,   cluster=dat$study, adjust=FALSE)
res.hw$HE   <- robust(res.std$HE,   cluster=dat$study, adjust=FALSE)
round(t(sapply(res.hw, function(x) c(tau2=x$tau2, mu=x$b, se=x$se,
   t=x$tval, ci.lb=x$ci.lb, ci.ub=x$ci.ub))), 3)
```

```
      tau2    mu    se     t  ci.lb ci.ub
EE   0.000 0.060 0.040 1.515 -0.023 0.144
ML   0.013 0.078 0.047 1.637 -0.022 0.178
REML 0.019 0.084 0.050 1.676 -0.021 0.189
DL   0.026 0.089 0.052 1.710 -0.020 0.199
HE   0.080 0.114 0.062 1.850 -0.015 0.244
```
Note that adjust=FALSE must be used here to reproduce the results given (however, in practice, the small-sample correction should be used).
Final Note
Please note that there appear to be a few typos in Table 16.3. Most of the smaller discrepancies are probably due to rounding differences, but a few values are a bit more off. For example, for the conventional full MLE approach, $\hat{\mu} = .078$ and $SE[\hat{\mu}] = .047$, so that the upper CI bound should be approximately $.078 + 1.96(.047) \approx .170$, but it is reported as $.136$ (which does not follow based on the values for $\hat{\mu}$ and $SE[\hat{\mu}]$ reported in the table).
References
Hedges, L. V. (1983). A random effects model for effect sizes. Psychological Bulletin, 93(2), 388–395.
Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558.
Knapp, G., & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22(17), 2693–2710.
Raudenbush, S. W. (1984). Magnitude of teacher expectancy effects on pupil IQ as a function of the credibility of expectancy induction: A synthesis of findings from 18 experiments. Journal of Educational Psychology, 76(1), 85–97.
Raudenbush, S. W. (2009). Analyzing effect sizes: Random effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 295–315). New York: Russell Sage Foundation.
Raudenbush, S. W., & Bryk, A. S. (1985). Empirical Bayes meta-analysis. Journal of Educational Statistics, 10(2), 75–98.
Riley, R. D., Higgins, J. P., & Deeks, J. J. (2011). Interpretation of random effects meta-analyses. British Medical Journal, 342, d549.