Credé et al. (2010)
The Methods and Data
The meta-analysis by Credé, Roch, and Kieszczynka (2010) examined the extent to which class attendance in college is related to college grades and, more broadly, to students' grade point average (GPA). The studies included in the meta-analysis reported either the correlation between attendance and grades for a sample of students in a particular class or the correlation between attendance and GPA. Aside from the interesting subject matter, the meta-analysis is also instructive because it provides a nice illustration of the use of a multilevel meta-analytic model. The reason for using such a model will be discussed further below.
The first author (who was a fellow PhD student at the University of Illinois, Urbana-Champaign, in the early 2000s) was kind enough to share the dataset with me, and it is now included in the metafor package. Note that this dataset differs just slightly from the one that was actually used by Credé et al. (2010) in their meta-analysis, but any differences are negligible and not relevant for the purposes of this illustration.
The data can be loaded with:
library(metafor)
dat <- dat.crede2010
(I copy the dataset into dat, which is a bit shorter and therefore easier to type further below.) Let's examine the first 15 rows of the dataset:
head(dat, 15)
   studyid year       source sampleid criterion      class  ni      ri
1        1 2009 dissertation        1     grade nonscience  76  0.8860
2        2 1975      journal        1     grade nonscience 297  0.3000
3        3 2007 dissertation        1       gpa       <NA> 191  0.7200
4        4 1989      journal        1     grade nonscience 265  0.4750
5        4 1989      journal        2     grade nonscience 154  0.3340
6        5 2008      journal        1     grade nonscience 162  0.6150
7        6 1999      journal        1     grade nonscience  28  0.1450
8        6 1999      journal        2     grade nonscience  33  0.2300
9        6 1999      journal        3     grade nonscience  47  0.2700
10       6 1999      journal        4     grade nonscience  25 -0.0228
11       6 1999      journal        5     grade nonscience  48  0.4290
12       6 1999      journal        6     grade nonscience  39  0.3490
13       6 1999      journal        7     grade nonscience  41  0.2200
14       6 1999      journal        8     grade nonscience  35  0.3390
15       6 1999      journal        9     grade nonscience  46  0.4470
Variable studyid is a study identifier, year denotes the publication year of the study, and the source variable indicates whether a particular study was a published journal article, a dissertation, or was obtained from some other source. Variable sampleid is a sample (within study) identifier (some studies included multiple samples), criterion indicates whether the correlation reported in a particular row corresponds to the relationship between attendance and grades within a particular class or to overall GPA, and the class variable distinguishes science from non-science classes (missing when the criterion is GPA). Finally, variables ni and ri contain the sample sizes and the observed correlation coefficients, respectively.
For the analysis, we will use r-to-z transformed correlation coefficients and focus on the relationship between attendance and grades in particular classes. We use the escalc() function to compute the transformed coefficients and select only the rows of the dataset where the criterion variable is equal to grade.
dat <- escalc(measure="ZCOR", ri=ri, ni=ni, data=dat, subset=criterion=="grade")
head(dat, 15)
   studyid . sampleid .  ni      ri      yi     vi
1        1 .        1 .  76  0.8860  1.4030 0.0137
2        2 .        1 . 297  0.3000  0.3095 0.0034
4        4 .        1 . 265  0.4750  0.5165 0.0038
5        4 .        2 . 154  0.3340  0.3473 0.0066
6        5 .        1 . 162  0.6150  0.7169 0.0063
7        6 .        1 .  28  0.1450  0.1460 0.0400
8        6 .        2 .  33  0.2300  0.2342 0.0333
9        6 .        3 .  47  0.2700  0.2769 0.0227
10       6 .        4 .  25 -0.0228 -0.0228 0.0455
11       6 .        5 .  48  0.4290  0.4587 0.0222
12       6 .        6 .  39  0.3490  0.3643 0.0278
13       6 .        7 .  41  0.2200  0.2237 0.0263
14       6 .        8 .  35  0.3390  0.3530 0.0312
15       6 .        9 .  46  0.4470  0.4809 0.0233
17       7 .        1 . 350  0.4010  0.4248 0.0029
Variables yi and vi contain the transformed correlation coefficients and the corresponding sampling variances, respectively.
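As a side note, measure="ZCOR" applies Fisher's r-to-z transformation, $z = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right)$, with sampling variance $1/(n-3)$. As a quick optional check, we can reproduce the yi and vi values for the first row by hand:

r <- 0.886   # observed correlation in the first row
n <- 76      # corresponding sample size
round(c(yi=atanh(r), vi=1/(n-3)), digits=4)   # should give 1.4030 and 0.0137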
Descriptive Statistics
First, we will check how many studies and how many correlations are included in the dataset:
c(studies=length(unique(dat$studyid)), correlations=nrow(dat))
     studies correlations
          54           67
Hence, the dataset includes 54 studies, which provide a total of 67 (transformed) correlation coefficients. So, as noted above, some studies included multiple samples (e.g., for different classes or for multiple sections of the same class) and reported separate correlation coefficients for the different samples. We can check how often this was the case with:
table(table(dat$studyid))
 1  2  9
48  5  1
Therefore, most (i.e., 48) studies only provided a single correlation coefficient, 5 studies reported two coefficients, and one study reported the results for 9 different samples (which was study 6, as we can see in the data output further above).
We can obtain some descriptive statistics about the observed correlation coefficients with:
round(c(summary(dat$ri), SD=sd(dat$ri)), digits=2)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.      SD
 -0.02    0.30    0.39    0.40    0.49    0.89    0.17
The observed correlations range from -0.02 to 0.89, with a median of 0.39 and a mean of 0.40.
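To get a quick visual impression of the distribution of the observed correlations, one could also draw a simple histogram (an optional sketch using base R graphics):

hist(dat$ri, breaks=20, xlab="Observed Correlation", main="Distribution of the Observed Correlations")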
Standard Random-Effects Model
We start out by fitting a standard random-effects model to the data. Here, we treat the 67 coefficients in the dataset as independent (which we will later see is not justified). We can fit such a model with:
res <- rma(yi, vi, data=dat)
res
Random-Effects Model (k = 67; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0511 (SE = 0.0104)
tau (square root of estimated tau^2 value):      0.2261
I^2 (total heterogeneity / total variability):   93.83%
H^2 (total variability / sampling variability):  16.21

Test for Heterogeneity:
Q(df = 66) = 1068.7213, p-val < .0001

Model Results:

estimate      se     zval    pval   ci.lb   ci.ub
  0.4547  0.0300  15.1343  <.0001  0.3958  0.5136  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Q-test and the estimate of $\tau^2$ suggest that the observed outcomes are heterogeneous. Hence, the strength of the relationship between class attendance and grades appears to vary across studies. The estimated average transformed correlation coefficient (i.e., 0.4547) is clearly above 0, indicating that there is on average a positive relationship between these two variables. For easier interpretation, we can back-transform the results using the z-to-r transformation:
predict(res, transf=transf.ztor, digits=2)
 pred ci.lb ci.ub pi.lb pi.ub
 0.43  0.38  0.47  0.01  0.72
Hence, the estimated average correlation is equal to 0.43 (95% CI: 0.38 to 0.47), which is a fairly sizable association between these two variables.2) In fact, as the authors of the meta-analysis note, this makes class attendance "a better predictor of college grades than any other known predictor of academic performance, including scores on standardized admissions tests such as the SAT, high school GPA, study habits, and study skills." Of course, we do not know whether attendance per se leads to better grades, whether better students simply tend to attend more classes, or whether some third variable (e.g., motivation) influences both attendance and grades, but this issue is beyond the scope of this illustration (see the paper by Credé et al., 2010, for some further discussion of this point).
Finally, it is worth noting that the 95% prediction interval is quite wide, ranging from a value just above 0 to a correlation of 0.72, which reflects the substantial heterogeneity in the results. Hence, while there appears to be a sizable correlation between these two variables on average, the actual strength of the association in any particular situation/class might be quite a bit weaker or stronger.
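To make the connection with the model output explicit, the bounds of this prediction interval can be reproduced by hand (a minimal sketch of the default computation, which assumes normally distributed true outcomes): on the r-to-z scale, the bounds are the estimated average plus/minus 1.96 times the square root of the sum of the estimated heterogeneity and the squared standard error, which are then back-transformed.

pred <- coef(res)   # estimated average transformed correlation (0.4547)
transf.ztor(pred + c(-1,1) * qnorm(0.975) * sqrt(res$tau2 + res$se^2))   # approx. 0.01 and 0.72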
Hunter and Schmidt Method
When comparing the results reported in the paper (see the row labeled "All Class Grades" in Table 1) with what we obtained above, we see some slight discrepancies. This is because the analyses in the paper used a different meta-analytic approach, based on the methods described by Hunter and Schmidt (see here for some further details on this method). We can obtain essentially the same results as those given in the paper with:
tmp <- escalc(measure="COR", ri=ri, ni=ni, data=dat, vtype="AV")
res <- rma(yi, vi, weights=ni, data=tmp, method="HS")
print(res, digits=2)
Random-Effects Model (k = 67; tau^2 estimator: HS)

tau^2 (estimated amount of total heterogeneity): 0.02 (SE = 0.01)
tau (square root of estimated tau^2 value):      0.14
I^2 (total heterogeneity / total variability):   90.57%
H^2 (total variability / sampling variability):  10.61

Test for Heterogeneity:
Q(df = 66) = 741.15, p-val < .01

Model Results:

estimate    se   zval  pval  ci.lb  ci.ub
    0.44  0.04  12.45  <.01   0.37   0.51  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since we directly meta-analyze the correlation coefficients in this approach, the value given under estimate is the estimated average correlation coefficient, which matches what the authors report (i.e., 0.44). The value reported as tau in the output is the estimated standard deviation of the true correlations, which also matches what the authors report as $SD_{\rho}$ (i.e., 0.14). Finally, the authors report an 80% prediction interval (with bounds labeled as "10%CV" and "90%CV"), which we can obtain with:4)
predict(res, digits=2, level=80, pi.type="simple")
 pred   se ci.lb ci.ub pi.lb pi.ub
 0.44 0.04  0.40  0.49  0.26  0.63
There is a slight difference in the upper bound (0.63 versus 0.62), but this might just be a rounding issue.
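If I understand the pi.type="simple" computation correctly, these bounds are simply the estimated average correlation plus/minus the 90th-percentile standard normal quantile times tau (i.e., the uncertainty in the estimated average is not incorporated). We can check this against the output above:

round(coef(res) + c(-1,1) * qnorm(0.90) * sqrt(res$tau2), digits=2)   # should (approximately) reproduce pi.lb and pi.ub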
Standard RE Model with the rma.mv() Function
Let us return for the moment to the standard random-effects model we fitted earlier using the rma() function. We can also fit the same model with the rma.mv() function (see also here for a more detailed discussion on this). The syntax for the random-effects model is:
dat$id <- 1:nrow(dat)
res <- rma.mv(yi, vi, random = ~ 1 | id, data=dat)
res
Multivariate Meta-Analysis Model (k = 67; method: REML)

Variance Components:

          estim    sqrt  nlvls  fixed  factor
sigma^2  0.0511  0.2261     67     no      id

Test for Heterogeneity:
Q(df = 66) = 1068.7213, p-val < .0001

Model Results:

estimate      se     zval    pval   ci.lb   ci.ub
  0.4547  0.0300  15.1343  <.0001  0.3958  0.5136  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As shown above, we first create a variable called id that takes on a unique value for every row in the dataset and then use random = ~ 1 | id in rma.mv() to add a random effect corresponding to each row of the dataset to the model. This is in fact what happens automatically in rma(), but when using the rma.mv() function, we have to add such random effects manually. As we can see above, the results are identical to those we obtained earlier with the rma() function.
Multilevel Meta-Analysis Model
As noted earlier, some studies reported correlation coefficients for multiple samples within the same study. The dataset therefore has a multilevel structure, with samples nested within studies. A multilevel meta-analysis of these data can be used to estimate and account for the amount of heterogeneity between studies and between samples within studies. For this, we add a random effect not only at the study level, but also at the sample (within studies) level. The syntax for this is:
res <- rma.mv(yi, vi, random = ~ 1 | studyid/sampleid, data=dat)
res
Multivariate Meta-Analysis Model (k = 67; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed            factor
sigma^2.1  0.0376  0.1939     54     no           studyid
sigma^2.2  0.0159  0.1259     67     no  studyid/sampleid

Test for Heterogeneity:
Q(df = 66) = 1068.7213, p-val < .0001

Model Results:

estimate      se     zval    pval   ci.lb   ci.ub
  0.4798  0.0331  14.5167  <.0001  0.4151  0.5446  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
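For reference, the model fitted here can be written out as follows (a sketch, using $z_{ij}$ to denote the $j$th transformed correlation coefficient from the $i$th study): $$z_{ij} = \mu + u_i + w_{ij} + \varepsilon_{ij}, \quad u_i \sim N(0, \sigma^2_1), \quad w_{ij} \sim N(0, \sigma^2_2), \quad \varepsilon_{ij} \sim N(0, v_{ij}),$$ where $\mu$ denotes the average true outcome, $u_i$ and $w_{ij}$ are the study- and sample-level random effects (whose variances correspond to sigma^2.1 and sigma^2.2 in the output), and $v_{ij}$ is the known sampling variance of the estimate.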
The estimated variance components indicate that there is more heterogeneity between studies (0.0376) than between samples within studies (0.0159). Again, we can back-transform the results with:
predict(res, transf=transf.ztor, digits=2)
 pred ci.lb ci.ub pi.lb pi.ub
 0.45  0.39  0.50  0.02  0.73
Although these results are not substantially different from what we obtained earlier, this model properly accounts for the multilevel structure of the data.
Intraclass Correlation of the True Outcomes
The multilevel model above accounts for potential dependence in multiple outcomes coming from the same study. In fact, it can be shown that the model implies an intraclass correlation coefficient (ICC) of $$\rho = \frac{\sigma^2_1}{\sigma^2_1 + \sigma^2_2}$$ for the true outcomes within the same study (where $\sigma^2_1$ is the variance component corresponding to the study level and $\sigma^2_2$ is the variance component corresponding to the sample level).
In the present case, the ICC is estimated to be:
round(res$sigma2[1] / sum(res$sigma2), digits=4)
[1] 0.7033
Therefore, the underlying true outcomes (i.e., transformed correlation coefficients) are estimated to correlate quite strongly within studies. On the other hand, the standard random-effects model we fitted earlier ignores this dependence and treats all rows in the dataset as independent.
Also, it is worth noting that the sum of the two variance components can be interpreted as the total amount of heterogeneity in the true outcomes:
round(sum(res$sigma2), digits=4)
[1] 0.0535
The relevance of this value will become clear in a moment.
Multivariate Parameterization of the Model
We can fit the same model using a multivariate parameterization, which directly provides us with the ICC. For this, we use the syntax:
res <- rma.mv(yi, vi, random = ~ sampleid | studyid, data=dat)
res
Multivariate Meta-Analysis Model (k = 67; method: REML)

Variance Components:

outer factor: studyid  (nlvls = 54)
inner factor: sampleid (nlvls = 9)

        estim    sqrt  fixed
tau^2  0.0535  0.2312     no
rho    0.7033             no

Test for Heterogeneity:
Q(df = 66) = 1068.7213, p-val < .0001

Model Results:

estimate      se     zval    pval   ci.lb   ci.ub
  0.4798  0.0331  14.5167  <.0001  0.4151  0.5446  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The random = ~ sampleid | studyid argument adds correlated random effects for the different samples within studies to the model, where the variance-covariance matrix of the random effects takes on a compound symmetric structure (struct="CS" is the default). Note that the estimate given next to rho is exactly the same as the ICC value we computed earlier based on the multilevel model. Also, the estimate given next to tau^2 in the multivariate parameterization is the same as the total amount of heterogeneity computed earlier based on the multilevel model.
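We can confirm these correspondences numerically (res here is the multivariate fit from above; the two reference values were computed earlier from the variance components of the multilevel fit):

round(c(tau2=res$tau2, rho=res$rho), digits=4)   # should equal 0.0535 and 0.7033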
As long as rho is estimated to be positive, the multilevel and multivariate parameterizations are in essence identical. In fact, the log likelihoods of the two models should be identical, which we can confirm with:
res1 <- rma.mv(yi, vi, random = ~ 1 | studyid/sampleid, data=dat)
res2 <- rma.mv(yi, vi, random = ~ sampleid | studyid, data=dat)
fitstats(res1, res2)
               res1       res2
logLik:    1.304449   1.304449
deviance: -2.608898  -2.608898
AIC:       3.391102   3.391102
BIC:       9.960066   9.960066
AICc:      3.778198   3.778198
Likelihood Ratio Test to Compare Models
The analysis above raises the question whether the additional complexity of the multilevel model is warranted. To address this question, we can conduct a likelihood ratio test, comparing the fit of the multilevel model with that of the standard random-effects model:
res0 <- rma.mv(yi, vi, random = ~ 1 | id, data=dat)
res1 <- rma.mv(yi, vi, random = ~ 1 | studyid/sampleid, data=dat)
anova(res0, res1)
        df    AIC     BIC   AICc  logLik    LRT   pval        QE
Full     3 3.3911  9.9601 3.7782  1.3044               1068.7213
Reduced  2 6.5060 10.8853 6.6964 -1.2530 5.1149 0.0237 1068.7213
The multilevel model provides a significantly better fit, as indicated by the likelihood ratio test ($\chi^2 = 5.11$, $\mbox{df} = 1$, $p = 0.0237$). Also, the information criteria (i.e., the AIC, BIC, and the corrected AIC) all point towards the multilevel model as being the preferable model (for these statistics, smaller is better).
Uncorrelated Sampling Errors
The multilevel model above accounts for possible dependence in the underlying true outcomes (i.e., the true transformed correlation coefficients) within studies. That is, if the true association is quite strong for a particular section of a class, then it probably also tends to be quite strong for other sections (and vice-versa). In fact, this is what the high ICC value above suggests.
It is important to note that the model still assumes that the sampling errors of the observed outcomes are independent. This is typically a safe assumption as long as there is no overlap in the data/subjects used to compute the various estimates. In the present example, this would seem to be the case, given that multiple correlation coefficients within the same study were obtained from independent samples.
However, when multiple estimates are obtained from the same group of subjects (e.g., the correlation between attendance and grades in different classes for the same group of subjects), then this assumption would be violated. In this case, one also needs to account for the dependence in the sampling errors of the estimates, which requires additional work, but is beyond the scope of this example.
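Although a full treatment of this issue is beyond the scope here, one commonly used safeguard in such situations is to combine the multilevel model with cluster-robust inference. The sketch below uses metafor's robust() function, clustering the estimates by study; note that this does not model the dependence in the sampling errors directly (which would require constructing an approximate variance-covariance matrix of the sampling errors), but it makes the standard error and the tests for the average effect robust to such dependence:

res <- rma.mv(yi, vi, random = ~ 1 | studyid/sampleid, data=dat)
robust(res, cluster=dat$studyid)   # cluster-robust (sandwich-type) standard error, test, and CI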
A Common Mistake when Fitting the Multilevel Model
When fitting the multilevel model, a common mistake is to forget to add random effects at both levels. For example, one might mistakenly fit the model:
rma.mv(yi, vi, random = ~ 1 | studyid, data=dat)
Multivariate Meta-Analysis Model (k = 67; method: REML)

Variance Components:

          estim    sqrt  nlvls  fixed   factor
sigma^2  0.0528  0.2298     54     no  studyid

Test for Heterogeneity:
Q(df = 66) = 1068.7213, p-val < .0001

Model Results:

estimate      se     zval    pval   ci.lb   ci.ub
  0.4865  0.0332  14.6416  <.0001  0.4214  0.5517  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This model only accounts for heterogeneity between studies, so it assumes that the true (transformed) correlation coefficients for different samples within the same study are all identical (or put differently, that the ICC is equal to 1). In the present case, the ICC in the multilevel model is actually quite high, so the results are not dramatically different, but this may not be true in other applications.
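In fact, under the multivariate parameterization, this mistaken model corresponds to fixing rho to 1, which we can verify with a quick check (the two fits should be essentially identical):

res0 <- rma.mv(yi, vi, random = ~ 1 | studyid, data=dat)
res2 <- rma.mv(yi, vi, random = ~ sampleid | studyid, rho=1, data=dat)
fitstats(res0, res2)   # (essentially) identical log likelihoods and information criteria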
We could in fact again conduct a likelihood ratio test to compare these two models:
res0 <- rma.mv(yi, vi, random = ~ 1 | studyid, data=dat)
res1 <- rma.mv(yi, vi, random = ~ 1 | studyid/sampleid, data=dat)
anova(res0, res1)
        df     AIC     BIC    AICc  logLik     LRT   pval        QE
Full     3  3.3911  9.9601  3.7782  1.3044                1068.7213
Reduced  2 14.8533 19.2326 15.0438 -5.4267 13.4622 0.0002 1068.7213
These results indicate that the multilevel model with random effects at both levels fits significantly better than the simpler model that only adds a random effect at the study level.
Regardless of the result of this test, not adding random effects at both levels is typically an inadvertent mistake. Unless the estimate of the variance component at the sample level happens to be equal to zero (which is identical to simply leaving out this random effect from the model altogether), it should always be included.
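Relatedly, we can check the parenthetical remark above by fixing the sample-level variance component to zero (a quick check using the sigma2 argument of rma.mv(), where NA means that a component is estimated and a numeric value fixes it); this should give exactly the same fit as leaving out the sample-level random effect altogether:

res3 <- rma.mv(yi, vi, random = ~ 1 | studyid/sampleid, sigma2=c(NA, 0), data=dat)
res0 <- rma.mv(yi, vi, random = ~ 1 | studyid, data=dat)
fitstats(res3, res0)   # identical fits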
References
Credé, M., Roch, S. G., & Kieszczynka, U. M. (2010). Class attendance in college: A meta-analytic review of the relationship of class attendance with grades and student characteristics. Review of Educational Research, 80(2), 272–295. https://doi.org/10.3102/0034654310362998
Footnotes

2) A better estimate of the average correlation can be obtained with predict(res, transf=transf.ztor.int, targs=res$tau2, digits=2), which yields an estimate of 0.41 (95% CI: 0.36 to 0.46). Since the difference is relatively small in this case, we will ignore this technicality for the remainder of this illustration.

4) The argument pi.type="simple" requests a slightly simpler method for constructing the prediction interval, which corresponds to the approach used in the Hunter and Schmidt method.