### Table of Contents

## Testing Factors and Linear Combinations of Parameters

The example below shows how to test factors and linear combinations of parameters in (mixed-effects) meta-regression models.

### Testing Factors

Meta-regression models can easily handle categorical predictors (factors) via appropriate coding of factors in terms of dummy variables. We will use the dataset for the BCG vaccine meta-analysis (Colditz et al., 1994) as an illustration.

The (log) risk ratios and corresponding sampling variances for the 13 studies can be obtained with:

library(metafor) dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg) dat

trial author year tpos tneg cpos cneg ablat alloc yi vi 1 1 Aronson 1948 4 119 11 128 44 random -0.8893 0.3256 2 2 Ferguson & Simes 1949 6 300 29 274 55 random -1.5854 0.1946 3 3 Rosenthal et al 1960 3 228 11 209 42 random -1.3481 0.4154 4 4 Hart & Sutherland 1977 62 13536 248 12619 52 random -1.4416 0.0200 5 5 Frimodt-Moller et al 1973 33 5036 47 5761 13 alternate -0.2175 0.0512 6 6 Stein & Aronson 1953 180 1361 372 1079 44 alternate -0.7861 0.0069 7 7 Vandiviere et al 1973 8 2537 10 619 19 random -1.6209 0.2230 8 8 TPT Madras 1980 505 87886 499 87892 13 random 0.0120 0.0040 9 9 Coetzee & Berjak 1968 29 7470 45 7232 27 random -0.4694 0.0564 10 10 Rosenthal et al 1961 17 1699 65 1600 42 systematic -1.3713 0.0730 11 11 Comstock et al 1974 186 50448 141 27197 18 systematic -0.3394 0.0124 12 12 Comstock & Webster 1969 5 2493 3 2338 33 systematic 0.4459 0.5325 13 13 Comstock et al 1976 27 16886 29 17825 33 systematic -0.0173 0.0714

For easier interpretation of the results, we will "center" the publication year and absolute latitude moderators at their minimum values (1948 and 13 degrees, respectively):

dat$year <- dat$year - 1948 dat$ablat <- dat$ablat - 13

Now let's fit a mixed-effects meta-regression model to these data, including publication year (`year`

), absolute latitude (`ablat`

), and the method of treatment allocation (`alloc`

) as predictors/moderators.^{1)} The standard way of handling a factor in regression models is to create dummy variables representing the various levels. One of the dummies is then left out of the model, with the corresponding level becoming the "reference" level. Instead of creating the dummy variables manually, we can let R handle the dummy coding of the allocation factor for us:

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat) res

Mixed-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimated amount of residual heterogeneity): 0.1796 (SE = 0.1425) tau (square root of estimated tau^2 value): 0.4238 I^2 (residual heterogeneity / unaccounted variability): 73.09% H^2 (unaccounted variability / sampling variability): 3.72 R^2 (amount of heterogeneity accounted for): 42.67% Test for Residual Heterogeneity: QE(df = 8) = 26.2030, p-val = 0.0010 Test of Moderators (coefficients 2:5): QM(df = 4) = 9.5254, p-val = 0.0492 Model Results: estimate se zval pval ci.lb ci.ub intrcpt -0.2324 0.5550 -0.4187 0.6754 -1.3201 0.8554 factor(alloc)random -0.3421 0.4180 -0.8183 0.4132 -1.1613 0.4772 factor(alloc)systematic 0.0101 0.4467 0.0226 0.9820 -0.8654 0.8856 year 0.0075 0.0194 0.3849 0.7003 -0.0306 0.0456 ablat -0.0236 0.0132 -1.7816 0.0748 -0.0495 0.0024 . --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By default, `alternate`

was made the reference level (since the first letter "a" comes before "r" and "s"), so the corresponding dummy variable for this level has been left out.^{2)} Therefore, the model intercept (coefficients $b_0$) is the estimated (average) log risk ratio for alternating allocation in 1948 at 13 degrees absolute latitude. Coefficients $b_1$ and $b_2$ indicate how much lower/higher the estimated (average) log risk ratio is for random and systematic allocation, respectively, compared to alternating allocation. Finally, the coefficients for year and absolute latitude (i.e., $b_3$ and $b_4$) estimate how the (average) log risk ratio changes for a one-unit increase in the respective moderator.

The results under "Test of Moderators" indicate that we can (just) reject the null hypothesis $\mbox{H}_0{:}\; \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (i.e., this is a so-called omnibus test of coefficients 2 through 5 – the first being the intercept, which is typically denoted as $\beta_0$ in model equations and is not included in the test by default). In other words, it appears as if at least part of the heterogeneity in the true effects is related to some of the predictors/moderators included in the model.

Now consider the two coefficients for the allocation factor. Neither coefficient is significantly different from 0 (as indicated by the corresponding p-values), but this isn't a proper test of the factor as a whole. Instead, we want to simultaneously test whether both coefficients are simultaneously equal to 0. We can do this with:

anova(res, btt=2:3)

Test of Moderators (coefficients 2:3): QM(df = 2) = 1.3663, p-val = 0.5050

This is a (Wald-type) chi-square test of the null hypothesis $\mbox{H}_0{:}\; \beta_1 = \beta_2 = 0$ with two degrees of freedom. It indicates that the factor as a whole is not significant.

Note that the reference level chosen for a factor does not alter the results of the omnibus test. For example, we could make `random`

the reference level with:

res <- rma(yi, vi, mods = ~ relevel(factor(alloc), ref="random") + year + ablat, data=dat) anova(res, btt=2:3)

The results of the test are unchanged (output not shown).

### Likelihood Ratio Tests

We can also use likelihood ratio tests (LRT) to test sets of model coefficients. For this, we fit the model once with and once without the treatment allocation factor (i.e., the "full" and the "reduced" models) and then conduct a model comparison. When the two models to be compared differ in terms of their fixed effects (i.e., model coefficients) – as in the present example – then one must fit both models using maximum likelihood (ML) estimation, and not restricted maximum likelihood (REML) estimation (which is the default). So, the LRT can be conducted with:

res0 <- rma(yi, vi, mods = ~ year + ablat, data=dat, method="ML") res1 <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat, method="ML") anova(res0, res1)

df AIC BIC AICc logLik LRT pval QE tau^2 R^2 Full 6 25.8412 29.2309 39.8412 -6.9206 26.2030 0.0329 Reduced 4 23.2922 25.5520 28.2922 -7.6461 1.4510 0.4841 28.3251 0.0269 0%

In the present example, the test statistic for the LRT is equal to $1.45$, which (asymptotically) follows a chi-square distribution with two degrees of freedom (i.e., the difference in the number of parameters in the full and the reduced model) under the null hypothesis (i.e., that the reduced model fits the data as well as the full model). This leads to a p-value of $.48$, which does not lead us to reject the null hypothesis and hence to the conclusion that the inclusion of the allocation factor in the model does not provide a significantly better fit.

### Knapp & Hartung Adjustment

Knapp and Hartung (2003) described an adjustment to the standard Wald-type tests typically used in meta-regression models that leads to tests with better statistical properties (i.e., better control of the Type I error rate). The method leads to t-test for the individual model coefficients. The adjustment can also be applied when testing sets of model coefficients, which leads to F-tests. Applying this adjustment in the present example leads the following results:

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat, test="knha") res

Mixed-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimated amount of residual heterogeneity): 0.1796 (SE = 0.1425) tau (square root of estimated tau^2 value): 0.4238 I^2 (residual heterogeneity / unaccounted variability): 73.09% H^2 (unaccounted variability / sampling variability): 3.72 R^2 (amount of heterogeneity accounted for): 42.67% Test for Residual Heterogeneity: QE(df = 8) = 26.2030, p-val = 0.0010 Test of Moderators (coefficients 2:5): F(df1 = 4, df2 = 8) = 2.2668, p-val = 0.1509 Model Results: estimate se tval df pval ci.lb ci.ub intrcpt -0.2324 0.5688 -0.4085 8 0.6936 -1.5441 1.0794 factor(alloc)random -0.3421 0.4284 -0.7984 8 0.4477 -1.3300 0.6459 factor(alloc)systematic 0.0101 0.4579 0.0221 8 0.9829 -1.0457 1.0659 year 0.0075 0.0199 0.3755 8 0.7170 -0.0385 0.0534 ablat -0.0236 0.0136 -1.7382 8 0.1204 -0.0548 0.0077 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The test of the allocation factor can again be obtained with:

anova(res, btt=2:3)

Test of Moderators (coefficients 2:3): F(df1 = 2, df2 = 8) = 0.6503, p-val = 0.5474

We obtain $F(2,8) = 0.65, p = .55$, which yields the same conclusion as obtained earlier.

### Permutation Test

Yet another possibility is to conduct permutation tests (Higgins & Thompson, 2004), which are implemented in the `permutest()`

function. For this, the rows of the model matrix are permuted, so that any association between the predictors and the observed outcomes is purely due to chance. This process is repeated many times (ideally for all possible unique permutations), yielding the distribution of the test statistics under the null hypothesis. For the omnibus test, the p-value is simply the proportion of times that the test statistic under the permuted data is as extreme or more extreme than the actually observed one.

We can try to carry out an exact permutation test (for all possible unique permutations) with:

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat) permutest(res, exact=TRUE)

Running 6227020800 iterations for exact permutation test. Error in permutest.rma.uni(res, exact = T) : Number of iterations requested too large.

So, the exact test would require more than $6 \times 10^9$ model fits, which is just not feasible (even on a fast computer, this would easily take years to complete). So, instead, we can approximate the exact permutation-based p-values by going through a smaller number (as specified by the `iter`

argument) of random permutations (the default is 1000). Let's go with 10000 iterations (this may still take a minutes or so to complete). We also specify that we want to obtain the omnibus test of the second and third coefficient (to test the allocation factor):

permutest(res, iter=10000, btt=2:3)

Test of Moderators (coefficients 2:3):¹ QM(df = 2) = 1.3663, p-val = 0.5418 Model Results: estimate se zval pval¹ ci.lb ci.ub intrcpt -0.2324 0.5550 -0.4187 0.7661 -1.3201 0.8554 factor(alloc)random -0.3421 0.4180 -0.8183 0.4544 -1.1613 0.4772 factor(alloc)systematic 0.0101 0.4467 0.0226 0.9824 -0.8654 0.8856 year 0.0075 0.0194 0.3849 0.7370 -0.0306 0.0456 ablat -0.0236 0.0132 -1.7816 0.1094 -0.0495 0.0024 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 1) p-values based on permutation testing

The p-value for the omnibus test is equal to $.54$, which is very close to the one obtained with the Knapp and Hartung adjustment.

### Testing Linear Combinations

One can also test linear combinations of model coefficients. For example, one could test whether random and systematic allocation lead to significantly different effects with:

anova(res, X=c(0,1,-1,0,0))

Hypothesis: 1: factor(alloc)random - factor(alloc)systematic = 0 Results: estimate se zval pval 1: -0.3522 0.3357 -1.0489 0.2942 Test of Hypothesis: QM(df = 1) = 1.1002, p-val = 0.2942

If the model was fitted with the Knapp and Hartung adjustment, then we again get t- and F-tests:

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat, test="knha") anova(res, X=c(0,1,-1,0,0))

Hypothesis: 1: factor(alloc)random - factor(alloc)systematic = 0 Results: estimate se tval df pval 1: -0.3522 0.3441 -1.0234 8 0.3361 Test of Hypothesis: F(df1 = 1, df2 = 8) = 1.0473, p-val = 0.3361

As another example, we could test whether the estimated average log risk ratio based on random allocation in 1970 at an absolute latitude of 30 degrees is significantly different from zero. Recall that these two moderators were "centered" at 1948 and 13, respectively. So, this linear hypothesis can be tested with:

anova(res, X=c(1,1,0,1970-1948,30-13))

Hypothesis: 1: intrcpt + factor(alloc)random + 22*year + 17*ablat = 0 Results: estimate se tval df pval 1: -0.8103 0.2155 -3.7607 8 0.0055 ** Test of Hypothesis: F(df1 = 1, df2 = 8) = 14.1432, p-val = 0.0055

The estimated average log risk ratio for this combination of moderator values is significantly different from zero. We can also obtain the corresponding estimated average log risk ratio with:

tmp <- predict(res, newmods=c(1,0,1970-1948,30-13)) tmp

pred se ci.lb ci.ub pi.lb pi.ub -0.8103 0.2155 -1.3072 -0.3135 -1.9067 0.2860

In addition, the corresponding standard error, confidence interval, and prediction interval are given. Note that the intercept is automatically added (by default) and hence does not need to be included in the vector specified for the `newmods`

argument for the `predict()`

function.

### All Pairwise Comparisons

Finally, when including a factor in a model, we may be interested in obtaining all pairwise comparisons (contrasts) between the factor levels. Let's start with the model without the Knapp and Hartung adjustment.

res <- rma(yi, vi, mods = ~ factor(alloc) + year + ablat, data=dat) res

The table of the model coefficients then look as follows:

estimate se zval pval ci.lb ci.ub intrcpt -0.2324 0.5550 -0.4187 0.6754 -1.3201 0.8554 factor(alloc)random -0.3421 0.4180 -0.8183 0.4132 -1.1613 0.4772 factor(alloc)systematic 0.0101 0.4467 0.0226 0.9820 -0.8654 0.8856 year 0.0075 0.0194 0.3849 0.7003 -0.0306 0.0456 ablat -0.0236 0.0132 -1.7816 0.0748 -0.0495 0.0024 .

Note that coefficients $b_1$ and $b_2$ are already contrasts (between random and alternating allocation and between systematic and alternating allocation). The contrast between systematic and random allocation is just $b_2-b_1$. All these comparisons can be obtained in a single table with:

anova(res, X=rbind(c(0,1,0,0,0), c(0,0,1,0,0), c(0,-1,1,0,0)))

Hypotheses: 1: factor(alloc)random = 0 2: factor(alloc)systematic = 0 3: -factor(alloc)random + factor(alloc)systematic = 0 Results: estimate se zval pval 1: -0.3421 0.4180 -0.8183 0.4132 2: 0.0101 0.4467 0.0226 0.9820 3: 0.3522 0.3357 1.0489 0.2942

Note that the p-values are unadjusted, so the chances of making at least one Type I error can accumulate quickly when there are many factor levels. Using the `adjust`

argument, one can apply various corrections for multiple testing (see below for an example).

Corresponding confidence intervals (CIs) can be obtained with:

predict(res, newmods=rbind(c(1,0,0,0), c(0,1,0,0), c(-1,1,0,0)), intercept=FALSE)

pred se ci.lb ci.ub pi.lb pi.ub 1 -0.3421 0.4180 -1.1613 0.4772 -1.5087 0.8246 2 0.0101 0.4467 -0.8654 0.8856 -1.1967 1.2169 3 0.3522 0.3357 -0.3059 1.0102 -0.7075 1.4118

Recall that, by default, the intercept is automatically added to the vector(s) specified via the `newmods`

argument (when the model actually includes an intercept), which we do not want to happen here, so we set `intercept=FALSE`

.

We can also use an alternative parameterization of the same model, leaving out the intercept altogether, and including all three dummy variables:

res <- rma(yi, vi, mods = ~ 0 + factor(alloc) + year + ablat, data=dat) res

The table of coefficients now looks like this:

estimate se zval pval ci.lb ci.ub factor(alloc)alternate -0.2324 0.5550 -0.4187 0.6754 -1.3201 0.8554 factor(alloc)random -0.5744 0.6547 -0.8774 0.3803 -1.8576 0.7087 factor(alloc)systematic -0.2223 0.6636 -0.3349 0.7377 -1.5230 1.0784 year 0.0075 0.0194 0.3849 0.7003 -0.0306 0.0456 ablat -0.0236 0.0132 -1.7816 0.0748 -0.0495 0.0024 .

So, $b_1$, $b_2$, and $b_3$ are now the estimated (average) log risk ratios for alternating, random, and systematic allocation in 1948 at 13 degrees absolute latitude.

To obtain all pairwise comparisons, we could now use:

anova(res, X=rbind(c(-1,1,0,0,0), c(-1,0,1,0,0), c(0,-1,1,0,0)))

However, more convenient is to make use of the `pairwise()`

function to specify for which coefficients we want to obtain all pairwise contrasts:

anova(res, X=pairwise(btt=1:3))

Hypotheses: 1: -factor(alloc)alternate + factor(alloc)random = 0 2: -factor(alloc)alternate + factor(alloc)systematic = 0 3: -factor(alloc)random + factor(alloc)systematic = 0 Results: estimate se zval pval 1: -0.3421 0.4180 -0.8183 0.4132 2: 0.0101 0.4467 0.0226 0.9820 3: 0.3522 0.3357 1.0489 0.2942

The p-values for the contrasts can be adjusted using various methods (see `help(anova.rma)`

and `help(p.adjust)`

for possible options). For example, Holm's method is often a good choice, as it provides strong control of the family-wise error rate, but is not as conservative as the Bonferroni correction. We can obtain the adjusted p-values with:

anova(res, X=pairwise(btt=1:3), adjust="holm")

Hypotheses: 1: -factor(alloc)alternate + factor(alloc)random = 0 2: -factor(alloc)alternate + factor(alloc)systematic = 0 3: -factor(alloc)random + factor(alloc)systematic = 0 Results: estimate se zval pval 1: -0.3421 0.4180 -0.8183 0.8826 2: 0.0101 0.4467 0.0226 0.9820 3: 0.3522 0.3357 1.0489 0.8826

The Knapp and Hartung adjustment can be used in this context as well:

res <- rma(yi, vi, mods = ~ 0 + factor(alloc) + year + ablat, data=dat, test="knha") anova(res, X=pairwise(btt=1:3))

Hypotheses: 1: -factor(alloc)alternate + factor(alloc)random = 0 2: -factor(alloc)alternate + factor(alloc)systematic = 0 3: -factor(alloc)random + factor(alloc)systematic = 0 Results: estimate se tval df pval 1: -0.3421 0.4284 -0.7984 8 0.4477 2: 0.0101 0.4579 0.0221 8 0.9829 3: 0.3522 0.3441 1.0234 8 0.3361

Similarly, we can obtain confidence and prediction interval bounds for the contrasts with:

predict(res, newmods=pairwise(btt=1:3))

pred se ci.lb ci.ub pi.lb pi.ub factor(alloc)alternate-factor(alloc)random -0.3421 0.4284 -1.3300 0.6459 -1.7317 1.0476 factor(alloc)alternate-factor(alloc)systematic 0.0101 0.4579 -1.0457 1.0659 -1.4286 1.4488 factor(alloc)random-factor(alloc)systematic 0.3522 0.3441 -0.4414 1.1457 -0.9067 1.6110

These intervals are now based on t-distributions.

### References

Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., et al. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. *Journal of the American Medical Association, 271*(9), 698–702.

Higgins, J. P. T., & Thompson, S. G. (2004). Controlling the risk of spurious findings from meta-regression. *Statistics in Medicine, 23*(11), 1663–1682.

Knapp, G., & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. *Statistics in Medicine, 22*(17), 2693–2710.

^{1)}

^{2)}

`alloc`

is a character variable, the coercion to a factor via the `factor()`

function isn't strictly necessary (since this would have happened automatically anyway), but it's better to be explicit. In some cases, a numeric variable may have been used to represent levels of a categorical variable, in which case use of the `factor()`

function is crucial, so that it is properly treated as a categorical and not as a quantitative variable.