The metafor Package

Testing Factors and Linear Combinations of Parameters

The example below shows how to test factors and linear combinations of parameters in (mixed-effects) meta-regression models.

Testing Factors

Meta-regression models can easily handle categorical predictors (factors) via appropriate coding of factors in terms of dummy variables. We will use the dataset for the BCG vaccine meta-analysis (Colditz et al., 1994) as an illustration.

The (log) risk ratios and corresponding sampling variances for the 13 studies can be obtained with:

library(metafor)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat

   trial               author year tpos  tneg cpos  cneg ablat      alloc      yi     vi
1      1              Aronson 1948    4   119   11   128    44     random -0.8893 0.3256
2      2     Ferguson & Simes 1949    6   300   29   274    55     random -1.5854 0.1946
3      3      Rosenthal et al 1960    3   228   11   209    42     random -1.3481 0.4154
4      4    Hart & Sutherland 1977   62 13536  248 12619    52     random -1.4416 0.0200
5      5 Frimodt-Moller et al 1973   33  5036   47  5761    13  alternate -0.2175 0.0512
6      6      Stein & Aronson 1953  180  1361  372  1079    44  alternate -0.7861 0.0069
7      7     Vandiviere et al 1973    8  2537   10   619    19     random -1.6209 0.2230
8      8           TPT Madras 1980  505 87886  499 87892    13     random  0.0120 0.0040
9      9     Coetzee & Berjak 1968   29  7470   45  7232    27     random -0.4694 0.0564
10    10      Rosenthal et al 1961   17  1699   65  1600    42 systematic -1.3713 0.0730
11    11       Comstock et al 1974  186 50448  141 27197    18 systematic -0.3394 0.0124
12    12   Comstock & Webster 1969    5  2493    3  2338    33 systematic  0.4459 0.5325
13    13       Comstock et al 1976   27 16886   29 17825    33 systematic -0.0173 0.0714

For easier interpretation of the results, we will "center" the publication year and absolute latitude moderators at their minimum values (1948 and 13 degrees, respectively):

dat$year  <- dat$year  - 1948
dat$ablat <- dat$ablat - 13

Now let's fit a mixed-effects meta-regression model to these data, including publication year (year), absolute latitude (ablat), and the method of treatment allocation (alloc) as predictors/moderators.¹⁾ The standard way of handling a factor in regression models is to create dummy variables representing the various levels. One of the dummies is then left out of the model, with the corresponding level becoming the "reference" level. Instead of creating the dummy variables manually, we can let R handle the dummy coding of the allocation factor for us:

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat)
res

Mixed-Effects Model (k = 13; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     0.1796 (SE = 0.1425)
tau (square root of estimated tau^2 value):             0.4238
I^2 (residual heterogeneity / unaccounted variability): 73.09%
H^2 (unaccounted variability / sampling variability):   3.72
R^2 (amount of heterogeneity accounted for):            42.67%
 
Test for Residual Heterogeneity:
QE(df = 8) = 26.2030, p-val = 0.0010
 
Test of Moderators (coefficients 2:5):
QM(df = 4) = 9.5254, p-val = 0.0492
 
Model Results:
 
                         estimate      se     zval    pval    ci.lb   ci.ub
intrcpt                   -0.2324  0.5550  -0.4187  0.6754  -1.3201  0.8554
factor(alloc)random       -0.3421  0.4180  -0.8183  0.4132  -1.1613  0.4772
factor(alloc)systematic    0.0101  0.4467   0.0226  0.9820  -0.8654  0.8856
year                       0.0075  0.0194   0.3849  0.7003  -0.0306  0.0456
ablat                     -0.0236  0.0132  -1.7816  0.0748  -0.0495  0.0024  .
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By default, alternate was made the reference level (since the first letter "a" comes before "r" and "s"), so the corresponding dummy variable for this level has been left out.²⁾ Therefore, the model intercept (coefficients $b_0$) is the estimated (average) log risk ratio for alternating allocation in 1948 at 13 degrees absolute latitude. Coefficients $b_1$ and $b_2$ indicate how much lower/higher the estimated (average) log risk ratio is for random and systematic allocation, respectively, compared to alternating allocation. Finally, the coefficients for year and absolute latitude (i.e., $b_3$ and $b_4$) estimate how the (average) log risk ratio changes for a one-unit increase in the respective moderator.

The results under "Test of Moderators" indicate that we can (just) reject the null hypothesis $\mbox{H}_0{:}\; \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (i.e., this is a so-called omnibus test of coefficients 2 through 5 – the first being the intercept, which is typically denoted as $\beta_0$ in model equations and is not included in the test by default). In other words, it appears as if at least part of the heterogeneity in the true effects is related to some of the predictors/moderators included in the model.

Now consider the two coefficients for the allocation factor. Neither coefficient is significantly different from 0 (as indicated by the corresponding p-values), but this isn't a proper test of the factor as a whole. Instead, we want to simultaneously test whether both coefficients are simultaneously equal to 0. We can do this with:

anova(res, btt=2:3)

Test of Moderators (coefficients 2:3):
QM(df = 2) = 1.3663, p-val = 0.5050

This is a (Wald-type) chi-square test of the null hypothesis $\mbox{H}_0{:}\; \beta_1 = \beta_2 = 0$ with two degrees of freedom. It indicates that the factor as a whole is not significant.

Note that the reference level chosen for a factor does not alter the results of the omnibus test. For example, we could make random the reference level with:

res <- rma(yi, vi, mods = ~ relevel(factor(alloc), ref="random") + year + ablat, data=dat)
anova(res, btt=2:3)

The results of the test are unchanged (output not shown).

Likelihood Ratio Tests

We can also use likelihood ratio tests (LRT) to test sets of model coefficients. For this, we fit the model once with and once without the treatment allocation factor (i.e., the "full" and the "reduced" models) and then conduct a model comparison. When the two models to be compared differ in terms of their fixed effects (i.e., model coefficients) – as in the present example – then one must fit both models using maximum likelihood (ML) estimation, and not restricted maximum likelihood (REML) estimation (which is the default). So, the LRT can be conducted with:

res0 <- rma(yi, vi, mods = ~ year + ablat, data=dat, method="ML")
res1 <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat, method="ML")
anova(res0, res1)

        df     AIC     BIC    AICc  logLik    LRT   pval      QE  tau^2 R^2
Full     6 25.8412 29.2309 39.8412 -6.9206               26.2030 0.0329
Reduced  4 23.2922 25.5520 28.2922 -7.6461 1.4510 0.4841 28.3251 0.0269  0%

In the present example, the test statistic for the LRT is equal to $1.45$, which (asymptotically) follows a chi-square distribution with two degrees of freedom (i.e., the difference in the number of parameters in the full and the reduced model) under the null hypothesis (i.e., that the reduced model fits the data as well as the full model). This leads to a p-value of $.48$, which does not lead us to reject the null hypothesis and hence to the conclusion that the inclusion of the allocation factor in the model does not provide a significantly better fit.

Knapp & Hartung Adjustment

Knapp and Hartung (2003) described an adjustment to the standard Wald-type tests typically used in meta-regression models that leads to tests with better statistical properties (i.e., better control of the Type I error rate). The method leads to t-test for the individual model coefficients. The adjustment can also be applied when testing sets of model coefficients, which leads to F-tests. Applying this adjustment in the present example leads the following results:

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat, test="knha")
res

Mixed-Effects Model (k = 13; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     0.1796 (SE = 0.1425)
tau (square root of estimated tau^2 value):             0.4238
I^2 (residual heterogeneity / unaccounted variability): 73.09%
H^2 (unaccounted variability / sampling variability):   3.72
R^2 (amount of heterogeneity accounted for):            42.67%
 
Test for Residual Heterogeneity:
QE(df = 8) = 26.2030, p-val = 0.0010
 
Test of Moderators (coefficients 2:5):
F(df1 = 4, df2 = 8) = 2.2668, p-val = 0.1509
 
Model Results:
 
                         estimate      se     tval  df    pval    ci.lb   ci.ub
intrcpt                   -0.2324  0.5688  -0.4085   8  0.6936  -1.5441  1.0794
factor(alloc)random       -0.3421  0.4284  -0.7984   8  0.4477  -1.3300  0.6459
factor(alloc)systematic    0.0101  0.4579   0.0221   8  0.9829  -1.0457  1.0659
year                       0.0075  0.0199   0.3755   8  0.7170  -0.0385  0.0534
ablat                     -0.0236  0.0136  -1.7382   8  0.1204  -0.0548  0.0077
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The test of the allocation factor can again be obtained with:

anova(res, btt=2:3)

Test of Moderators (coefficients 2:3):
F(df1 = 2, df2 = 8) = 0.6503, p-val = 0.5474

We obtain $F(2,8) = 0.65, p = .55$, which yields the same conclusion as obtained earlier.

Permutation Test

Yet another possibility is to conduct permutation tests (Higgins & Thompson, 2004), which are implemented in the permutest() function. For this, the rows of the model matrix are permuted, so that any association between the predictors and the observed outcomes is purely due to chance. This process is repeated many times (ideally for all possible unique permutations), yielding the distribution of the test statistics under the null hypothesis. For the omnibus test, the p-value is simply the proportion of times that the test statistic under the permuted data is as extreme or more extreme than the actually observed one.

We can try to carry out an exact permutation test (for all possible unique permutations) with:

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat)
permutest(res, exact=TRUE)

Running 6227020800 iterations for exact permutation test.
Error in permutest.rma.uni(res, exact = T) :
  Number of iterations requested too large.

So, the exact test would require more than $6 \times 10^9$ model fits, which is just not feasible (even on a fast computer, this would easily take years to complete). So, instead, we can approximate the exact permutation-based p-values by going through a smaller number (as specified by the iter argument) of random permutations (the default is 1000). Let's go with 10000 iterations (this may still take a minutes or so to complete). We also specify that we want to obtain the omnibus test of the second and third coefficient (to test the allocation factor):

permutest(res, iter=10000, btt=2:3)

Test of Moderators (coefficients 2:3):¹
QM(df = 2) = 1.3663, p-val = 0.5418
 
Model Results:
 
                         estimate      se     zval    pval¹    ci.lb   ci.ub
intrcpt                   -0.2324  0.5550  -0.4187  0.7661   -1.3201  0.8554
factor(alloc)random       -0.3421  0.4180  -0.8183  0.4544   -1.1613  0.4772
factor(alloc)systematic    0.0101  0.4467   0.0226  0.9824   -0.8654  0.8856
year                       0.0075  0.0194   0.3849  0.7370   -0.0306  0.0456
ablat                     -0.0236  0.0132  -1.7816  0.1094   -0.0495  0.0024
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
1) p-values based on permutation testing

The p-value for the omnibus test is equal to $.54$, which is very close to the one obtained with the Knapp and Hartung adjustment.

Testing Linear Combinations

One can also test linear combinations of model coefficients. For example, one could test whether random and systematic allocation lead to significantly different effects with:

anova(res, X=c(0,1,-1,0,0))

Hypothesis:
1: factor(alloc)random - factor(alloc)systematic = 0
 
Results:
   estimate     se    zval   pval
1:  -0.3522 0.3357 -1.0489 0.2942
 
Test of Hypothesis:
QM(df = 1) = 1.1002, p-val = 0.2942

If the model was fitted with the Knapp and Hartung adjustment, then we again get t- and F-tests:

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat, test="knha")
anova(res, X=c(0,1,-1,0,0))

Hypothesis:
1: factor(alloc)random - factor(alloc)systematic = 0
 
Results:
   estimate     se    tval df   pval
1:  -0.3522 0.3441 -1.0234  8 0.3361
 
Test of Hypothesis:
F(df1 = 1, df2 = 8) = 1.0473, p-val = 0.3361

As another example, we could test whether the estimated average log risk ratio based on random allocation in 1970 at an absolute latitude of 30 degrees is significantly different from zero. Recall that these two moderators were "centered" at 1948 and 13, respectively. So, this linear hypothesis can be tested with:

anova(res, X=c(1,1,0,1970-1948,30-13))

Hypothesis:
1: intrcpt + factor(alloc)random + 22*year + 17*ablat = 0
 
Results:
   estimate     se    tval df   pval
1:  -0.8103 0.2155 -3.7607  8 0.0055 **
 
Test of Hypothesis:
F(df1 = 1, df2 = 8) = 14.1432, p-val = 0.0055

The estimated average log risk ratio for this combination of moderator values is significantly different from zero. We can also obtain the corresponding estimated average log risk ratio with:

tmp <- predict(res, newmods=c(1,0,1970-1948,30-13))
tmp

    pred     se   ci.lb   ci.ub   pi.lb  pi.ub
 -0.8103 0.2155 -1.3072 -0.3135 -1.9067 0.2860

In addition, the corresponding standard error, confidence interval, and prediction interval are given. Note that the intercept is automatically added (by default) and hence does not need to be included in the vector specified for the newmods argument for the predict() function.

All Pairwise Comparisons

Finally, when including a factor in a model, we may be interested in obtaining all pairwise comparisons (contrasts) between the factor levels. Let's start with the model without the Knapp and Hartung adjustment.

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat)
res

The table of the model coefficients then look as follows:

                         estimate      se     zval    pval    ci.lb   ci.ub
intrcpt                   -0.2324  0.5550  -0.4187  0.6754  -1.3201  0.8554
factor(alloc)random       -0.3421  0.4180  -0.8183  0.4132  -1.1613  0.4772
factor(alloc)systematic    0.0101  0.4467   0.0226  0.9820  -0.8654  0.8856
year                       0.0075  0.0194   0.3849  0.7003  -0.0306  0.0456
ablat                     -0.0236  0.0132  -1.7816  0.0748  -0.0495  0.0024  .

Note that coefficients $b_1$ and $b_2$ are already contrasts (between random and alternating allocation and between systematic and alternating allocation). The contrast between systematic and random allocation is just $b_2-b_1$. All these comparisons can be obtained in a single table with:

anova(res, X=rbind(c(0,1,0,0,0), c(0,0,1,0,0), c(0,-1,1,0,0)))

Hypotheses:
1:                            factor(alloc)random = 0
2:                        factor(alloc)systematic = 0
3: -factor(alloc)random + factor(alloc)systematic = 0
 
Results:
   estimate     se    zval   pval
1:  -0.3421 0.4180 -0.8183 0.4132
2:   0.0101 0.4467  0.0226 0.9820
3:   0.3522 0.3357  1.0489 0.2942

Note that the p-values are unadjusted, so the chances of making at least one Type I error can accumulate quickly when there are many factor levels. Using the adjust argument, one can apply various corrections for multiple testing (see below for an example).

Corresponding confidence intervals (CIs) can be obtained with:

predict(res, newmods=rbind(c(1,0,0,0), c(0,1,0,0), c(-1,1,0,0)), intercept=FALSE)

     pred     se   ci.lb  ci.ub   pi.lb  pi.ub
1 -0.3421 0.4180 -1.1613 0.4772 -1.5087 0.8246
2  0.0101 0.4467 -0.8654 0.8856 -1.1967 1.2169
3  0.3522 0.3357 -0.3059 1.0102 -0.7075 1.4118

Recall that, by default, the intercept is automatically added to the vector(s) specified via the newmods argument (when the model actually includes an intercept), which we do not want to happen here, so we set intercept=FALSE.

We can also use an alternative parameterization of the same model, leaving out the intercept altogether, and including all three dummy variables:

res <- rma(yi, vi, mods = ~ 0 + factor(alloc) + year + ablat, data=dat)
res

The table of coefficients now looks like this:

                         estimate      se     zval    pval    ci.lb   ci.ub
factor(alloc)alternate    -0.2324  0.5550  -0.4187  0.6754  -1.3201  0.8554
factor(alloc)random       -0.5744  0.6547  -0.8774  0.3803  -1.8576  0.7087
factor(alloc)systematic   -0.2223  0.6636  -0.3349  0.7377  -1.5230  1.0784
year                       0.0075  0.0194   0.3849  0.7003  -0.0306  0.0456
ablat                     -0.0236  0.0132  -1.7816  0.0748  -0.0495  0.0024  .

So, $b_1$, $b_2$, and $b_3$ are now the estimated (average) log risk ratios for alternating, random, and systematic allocation in 1948 at 13 degrees absolute latitude.

To obtain all pairwise comparisons, we could now use:

anova(res, X=rbind(c(-1,1,0,0,0), c(-1,0,1,0,0), c(0,-1,1,0,0)))

However, more convenient is to make use of the pairmat() function to specify for which coefficients we want to obtain all pairwise contrasts:

anova(res, X=pairmat(btt=1:3))

Hypotheses:
1:     -factor(alloc)alternate + factor(alloc)random = 0
2: -factor(alloc)alternate + factor(alloc)systematic = 0
3:    -factor(alloc)random + factor(alloc)systematic = 0
 
Results:
   estimate     se    zval   pval
1:  -0.3421 0.4180 -0.8183 0.4132
2:   0.0101 0.4467  0.0226 0.9820
3:   0.3522 0.3357  1.0489 0.2942

The p-values for the contrasts can be adjusted using various methods (see help(anova.rma) and help(p.adjust) for possible options). For example, Holm's method is often a good choice, as it provides strong control of the family-wise error rate, but is not as conservative as the Bonferroni correction. We can obtain the adjusted p-values with:

anova(res, X=pairmat(btt=1:3), adjust="holm")

Hypotheses:
1:     -factor(alloc)alternate + factor(alloc)random = 0
2: -factor(alloc)alternate + factor(alloc)systematic = 0
3:    -factor(alloc)random + factor(alloc)systematic = 0
 
Results:
   estimate     se    zval   pval
1:  -0.3421 0.4180 -0.8183 0.8826
2:   0.0101 0.4467  0.0226 0.9820
3:   0.3522 0.3357  1.0489 0.8826

The Knapp and Hartung adjustment can be used in this context as well:

res <- rma(yi, vi, mods = ~ 0 + factor(alloc) + year + ablat, data=dat, test="knha")
anova(res, X=pairmat(btt=1:3))

Hypotheses:
1:     -factor(alloc)alternate + factor(alloc)random = 0
2: -factor(alloc)alternate + factor(alloc)systematic = 0
3:    -factor(alloc)random + factor(alloc)systematic = 0
 
Results:
   estimate     se    tval df   pval
1:  -0.3421 0.4284 -0.7984  8 0.4477
2:   0.0101 0.4579  0.0221  8 0.9829
3:   0.3522 0.3441  1.0234  8 0.3361

Similarly, we can obtain confidence and prediction interval bounds for the contrasts with:

predict(res, newmods=pairmat(btt=1:3))

                                                  pred     se   ci.lb  ci.ub   pi.lb  pi.ub
factor(alloc)alternate-factor(alloc)random     -0.3421 0.4284 -1.3300 0.6459 -1.7317 1.0476
factor(alloc)alternate-factor(alloc)systematic  0.0101 0.4579 -1.0457 1.0659 -1.4286 1.4488
factor(alloc)random-factor(alloc)systematic     0.3522 0.3441 -0.4414 1.1457 -0.9067 1.6110

These intervals are now based on t-distributions.

References

Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., et al. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. Journal of the American Medical Association, 271(9), 698–702.

Higgins, J. P. T., & Thompson, S. G. (2004). Controlling the risk of spurious findings from meta-regression. Statistics in Medicine, 23(11), 1663–1682.

Knapp, G., & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22(17), 2693–2710.

¹⁾

With only $k=13$ studies, results from models including multiple moderators should be interpreted cautiously. The example is used here primarily for illustration purposes.

²⁾

Since alloc is a character variable, the coercion to a factor via the factor() function isn't strictly necessary (since this would have happened automatically anyway), but it's better to be explicit. In some cases, a numeric variable may have been used to represent levels of a categorical variable, in which case use of the factor() function is crucial, so that it is properly treated as a categorical and not as a quantitative variable.

Table of Contents