The metafor Package

Comparing Estimates of Independent Meta-Analyses or Subgroups

Suppose we have summary estimates (e.g., estimated average effects) obtained from two independent meta-analyses or two subgroups of studies within the same meta-analysis and we want to test whether the estimates are different from each other. A Wald-type test can be used for this purpose. Alternatively, one could run a single meta-regression model including all studies and using a dichotomous moderator to distinguish the two sets. Both approaches are conceptually very similar with a subtle difference that will be illustrated below with an example.

Data Preparation

We will use the 'famous' BCG vaccine meta-analysis for this illustration. First, we compute the log risk ratios (and corresponding sampling variances) for each study and then dichotomize the alloc variable.

library(metafor)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat$alloc <- ifelse(dat$alloc == "random", "random", "other")
dat

   trial               author year tpos  tneg cpos  cneg ablat  alloc      yi     vi
1      1              Aronson 1948    4   119   11   128    44 random -0.8893 0.3256
2      2     Ferguson & Simes 1949    6   300   29   274    55 random -1.5854 0.1946
3      3      Rosenthal et al 1960    3   228   11   209    42 random -1.3481 0.4154
4      4    Hart & Sutherland 1977   62 13536  248 12619    52 random -1.4416 0.0200
5      5 Frimodt-Moller et al 1973   33  5036   47  5761    13  other -0.2175 0.0512
6      6      Stein & Aronson 1953  180  1361  372  1079    44  other -0.7861 0.0069
7      7     Vandiviere et al 1973    8  2537   10   619    19 random -1.6209 0.2230
8      8           TPT Madras 1980  505 87886  499 87892    13 random  0.0120 0.0040
9      9     Coetzee & Berjak 1968   29  7470   45  7232    27 random -0.4694 0.0564
10    10      Rosenthal et al 1961   17  1699   65  1600    42  other -1.3713 0.0730
11    11       Comstock et al 1974  186 50448  141 27197    18  other -0.3394 0.0124
12    12   Comstock & Webster 1969    5  2493    3  2338    33  other  0.4459 0.5325
13    13       Comstock et al 1976   27 16886   29 17825    33  other -0.0173 0.0714

Separate Meta-Analyses

First, we fit two separate random-effects models within each subset defined by the alloc variable:

res1 <- rma(yi, vi, data=dat, subset=alloc=="random")
res2 <- rma(yi, vi, data=dat, subset=alloc=="other")

We then combine the estimates and standard errors from each model into a data frame. We also add a variable to distinguish the two models and, for reasons to be explained in more detail below, we add the estimated amounts of heterogeneity within each subset to the data frame.

dat.comp <- data.frame(alloc    = c("random", "other"), 
                       estimate = c(coef(res1), coef(res2)), 
                       stderror = c(se(res1), se(res2)),
                       tau2     = c(res1$tau2, res2$tau2))
dfround(dat.comp, 3)

   alloc estimate stderror  tau2
1 random   -0.971    0.276 0.393
2  other   -0.481    0.217 0.212

We can now compare the two estimates (i.e., the estimated average log risk ratios) by feeding them back to the rma() function and using the variable to distinguish the two estimates as a moderator. We use a fixed-effects meta-regression model for this purpose, because the (residual) heterogeneity within each subset has already been accounted for by fitting random-effects models above.

rma(estimate, sei=stderror, mods = ~ alloc, method="FE", data=dat.comp, digits=3)

Fixed-Effects with Moderators Model (k = 2)
 
Test for Residual Heterogeneity:
QE(df = 0) = 0.000, p-val = 1.000
 
Test of Moderators (coefficient(s) 2):
QM(df = 1) = 1.946, p-val = 0.163
 
Model Results:
 
             estimate     se    zval   pval   ci.lb   ci.ub
intrcpt        -0.481  0.217  -2.218  0.027  -0.907  -0.056  *
allocrandom    -0.490  0.351  -1.395  0.163  -1.178   0.198
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

While we find that studies using random assignment obtain on average larger (i.e., more negative) effects than studies not using random assignment ($b_1 = -0.490$, $SE = 0.351$), the difference between the two estimates is not statistically significant ($z = -1.395$, $p = .163$).

The test of the difference between the two estimates is really just a Wald-type test, given by the equation $$z = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\sqrt{SE[\hat{\mu}_1]^2 + SE[\hat{\mu}_2]^2}},$$ where $\hat{\mu}_1$ and $\hat{\mu}_2$ are the two estimates and $SE[\hat{\mu}_1]$ and $SE[\hat{\mu}_2]$ the corresponding standard errors. The test statistics can therefore also be computed with:

with(dat.comp, round(c(zval = (estimate[1] - estimate[2])/sqrt(stderror[1]^2 + stderror[2]^2)), 3))

  zval
-1.395

This is the same value that we obtained above.

Meta-Regression with All Studies

Now let's take a different approach, fitting a meta-regression model with alloc as a categorical moderator based on all studies:

rma(yi, vi, mods = ~ alloc, data=dat, digits=3)

Mixed-Effects Model (k = 13; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     0.318 (SE = 0.178)
tau (square root of estimated tau^2 value):             0.564
I^2 (residual heterogeneity / unaccounted variability): 89.92%
H^2 (unaccounted variability / sampling variability):   9.92
R^2 (amount of heterogeneity accounted for):            0.00%
 
Test for Residual Heterogeneity:
QE(df = 11) = 138.511, p-val < .001
 
Test of Moderators (coefficient(s) 2):
QM(df = 1) = 1.833, p-val = 0.176
 
Model Results:
 
             estimate     se    zval   pval   ci.lb  ci.ub
intrcpt        -0.467  0.257  -1.816  0.069  -0.972  0.037  .
allocrandom    -0.490  0.362  -1.354  0.176  -1.199  0.219
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The result is very similar to what we saw earlier: The coefficient corresponding to the alloc dummy is equal to $b_1 = -0.490$ ($SE = 0.362$) and not significant ($p = .176$).

However, the results are not exactly identical. The reason for this is as follows. When we fit separate random-effects models in the two subsets, we are allowing the amount of heterogeneity within each set to be different (as shown earlier, the estimates were $\hat{\tau}^2 = 0.393$ and $\hat{\tau}^2 = 0.212$ for studies using and not using random assignment, respectively). On the other hand, the mixed-effects meta-regression model fitted above has a single variance component for the amount of residual heterogeneity, which implies that the amount of heterogeneity within each subset is assumed to be the same ($\hat{\tau}^2 = 0.318$ in this example – note that this falls somewhere between the two $\hat{\tau}^2$ values we obtained within the subsets).

Meta-Regression with All Studies but Different Amounts of (Residual) Heterogeneity

Using the rma.mv() function, we can easily fit a meta-regression model using all studies where we allow the amount of residual heterogeneity to be different in each subset:

rma.mv(yi, vi, mods = ~ alloc, random = ~ alloc | trial, 
       struct="DIAG", data=dat, digits=3)

Multivariate Meta-Analysis Model (k = 13; method: REML)
 
Variance Components:
 
outer factor: trial (nlvls = 13)
inner factor: alloc (nlvls = 2)
 
           estim   sqrt  k.lvl  fixed   level
tau^2.1    0.212  0.460      6     no   other
tau^2.2    0.393  0.627      7     no  random
 
Test for Residual Heterogeneity:
QE(df = 11) = 138.511, p-val < .001
 
Test of Moderators (coefficient(s) 2):
QM(df = 1) = 1.946, p-val = 0.163
 
Model Results:
 
             estimate     se    zval   pval   ci.lb   ci.ub
intrcpt        -0.481  0.217  -2.218  0.027  -0.907  -0.056  *
allocrandom    -0.490  0.351  -1.395  0.163  -1.178   0.198
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note that the two estimates of $\tau^2$ are now identical to the ones we obtained earlier from the separate random-effects models. Also, the coefficient, standard error, and p-value for the moderator now matches the results obtained earlier.¹⁾

A discussion/comparison of these two approaches (i.e., assuming a single $\tau^2$ value or allowing $\tau^2$ to differ across subsets) can be found in the following articles:

Rubio-Aparicio, M., Sánchez-Meca, J., López-López, J. A., Botella, J. & Marín-Martínez, F. (2017). Analysis of categorical moderators in mixed-effects meta-analysis: Consequences of using pooled versus separate estimates of the residual between-studies variances. British Journal of Mathematical and Statistical Psychology, 70(3), 439-456. https://doi.org/https://doi.org/10.1111/bmsp.12092

Rubio-Aparicio, M., López-López, J. A., Viechtbauer, W., Marín-Martínez, F., Botella, J., & Sánchez-Meca, J. (2020). Testing categorical moderators in mixed-effects meta-analysis in the presence of heteroscedasticity. Journal of Experimental Education, 88(2), 288-310. https://doi.org/10.1080/00220973.2018.1561404

We can also conduct a likelihood ratio test (LRT) to examine whether there are significant differences in the $\tau^2$ values across subsets. This can be done with:

res1 <- rma.mv(yi, vi, mods = ~ alloc, random = ~ alloc | trial, struct="DIAG", data=dat)
res0 <- rma.mv(yi, vi, mods = ~ alloc, random = ~ alloc | trial, struct="ID",   data=dat)
anova(res1, res0)

        df     AIC     BIC    AICc   logLik    LRT   pval       QE
Full     4 29.2959 30.8875 35.9626 -10.6480               138.5113
Reduced  3 27.5948 28.7885 31.0234 -10.7974 0.2989 0.5845 138.5113

So in this example, we would not reject the null hypothesis $H_0: \tau^2_1 = \tau^2_2$ ($p = .58$).

Other Types of Models

The issue discussed above also arises for other types of models (e.g., multilevel meta-analytic models). When fitting a particular model within several subgroups, then the variance components of the model are automatically allowed to differ across the subgroups. On the other hand, when fitting the same type of model to all studies combined (but including a moderator to allow the mean effect size to differ across subgroups), then the variance components are assumed to be the same within each subgroups (unless one takes extra steps as illustrated above to allow the variance components to differ across subgroups). Consequently, the results obtained with these two approaches will not be identical.

¹⁾

See also here for another example to illustrate various approaches for allowing $\tau^2$ to differ across subgroups.

Table of Contents