The metafor Package

This is an old revision of the document!

Linear Regression and the Mixed-Effects Meta-Regression Model

The standard linear regression model is given by $$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + e_i,$$ where $e_i \sim N(0, \sigma^2)$. Models of this sort can be fitted with the R function lm(). The mixed-effects meta-regression model is given by $$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + u_i + e_i,$$ where $u_i \sim N(0, \tau^2)$ and $e_i \sim N(0, v_i)$, where $v_i$ are the (approximately) known sampling variances of the observed outcomes or effect size estimates (e.g., standardized mean differences, log odds ratios). Models of this sort can be fitted with the function rma() from the metafor package.

Consequently, if one were to set $v_i = 0$ for all outcomes, then the standard linear regression model and the mixed-effects meta-regression model are actually identical (with $\sigma^2$ denoting the same parameter as $\tau^2$). This equivalence can be demonstrated with an arbitrary dataset:

library(metafor)
stackloss

   Air.Flow Water.Temp Acid.Conc. stack.loss
1        80         27         89         42
2        80         27         88         37
3        75         25         90         37
4        62         24         87         28
5        62         22         87         18
6        62         23         87         18
7        62         24         93         19
8        62         24         93         20
9        58         23         87         15
10       58         18         80         14
11       58         18         89         14
12       58         17         88         13
13       58         18         82         11
14       58         19         93         12
15       50         18         89          8
16       50         18         86          7
17       50         19         72          8
18       50         19         79          8
19       50         20         80          9
20       56         20         82         15
21       70         20         91         15

See help(stackloss) for more details on this dataset. Most importantly, variable stack.loss is the dependent variable with Air.Flow, Water.Temp, and Acid.Conc serving as potentially relevant predictors.

Now let's fit the standard linear regression model to these data with both the lm() and the rma() functions with:

res.lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss)

and

stackloss$vi <- 0
res.rma <- rma(stack.loss, vi, mods =  ~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss)

Note that all sampling variances are set to 0 for rma() (the function will actually issue a warning that the dataset includes outcomes with non-positive sampling variances – which would be rather strange in the meta-analytic context – but this can be safely ignored here).

We can now compare the output from the two models with:

summary(res.lm)

Call:
lm(formula = stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., 
    data = stackloss)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-7.2377 -1.7117 -0.4551  2.3614  5.6978 
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -39.9197    11.8960  -3.356  0.00375 ** 
Air.Flow      0.7156     0.1349   5.307  5.8e-05 ***
Water.Temp    1.2953     0.3680   3.520  0.00263 ** 
Acid.Conc.   -0.1521     0.1563  -0.973  0.34405    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 3.243 on 17 degrees of freedom
Multiple R-squared:  0.9136,    Adjusted R-squared:  0.8983 
F-statistic:  59.9 on 3 and 17 DF,  p-value: 3.016e-09

and

res.rma

Mixed-Effects Model (k = 21; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     10.5194 (SE = 3.6081)
tau (square root of estimated tau^2 value):             3.2434
R^2 (amount of heterogeneity accounted for):            89.83%
 
Test of Moderators (coefficient(s) 2:4): 
QM(df = 3) = 179.7067, p-val < .0001
 
Model Results:
 
            estimate       se     zval    pval     ci.lb     ci.ub     
intrcpt     -39.9197  11.8960  -3.3557  0.0008  -63.2354  -16.6039  ***
Air.Flow      0.7156   0.1349   5.3066  <.0001    0.4513    0.9800  ***
Water.Temp    1.2953   0.3680   3.5196  0.0004    0.5740    2.0166  ***
Acid.Conc.   -0.1521   0.1563  -0.9733  0.3304   -0.4585    0.1542     
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

A few things are worth noting:

The estimated model coefficients, corresponding standard errors, and the test statistics are exactly the same. However, lm() computes the p-values based on the t-distribution, while rma() uses (by default) the standard normal distribution.

The omnibus test of the model coefficients conducted by lm() is an F-test (F-statistic: 59.9 on 3 and 17 DF, p-value: 3.016e-09), while rma() uses by default a chi-square test (QM(df = 3) = 179.7067, p-val < .0001).

The estimated Residual standard error: 3.243 is the same as the estimate of tau: 3.2434.

The (pseudo) R-squared value reported by rma() (89.83%) actually corresponds to the adjusted R-squared value provided by lm() (0.8983).

The $I^2$ and $H^2$ statistics typically reported by the rma() function are missing, since these statistics cannot be computed when the dataset includes outcomes with non-positive sampling variances. Similarly, the usually reported results from the $Q$-test for heterogeneity are omitted for the same reason.

To get full correspondence between the two models, we can use the 'Knapp & Hartung' method when fitting the model with the rma() function:

res.rma <- rma(stack.loss, vi, mods =  ~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss, test="knha")
res.rma

Mixed-Effects Model (k = 21; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     10.5194 (SE = 3.6081)
tau (square root of estimated tau^2 value):             3.2434
R^2 (amount of heterogeneity accounted for):            89.83%
 
Test of Moderators (coefficient(s) 2:4): 
F(df1 = 3, df2 = 17) = 59.9022, p-val < .0001
 
Model Results:
 
            estimate       se     tval    pval     ci.lb     ci.ub     
intrcpt     -39.9197  11.8960  -3.3557  0.0038  -65.0180  -14.8213   **
Air.Flow      0.7156   0.1349   5.3066  <.0001    0.4311    1.0002  ***
Water.Temp    1.2953   0.3680   3.5196  0.0026    0.5188    2.0717   **
Acid.Conc.   -0.1521   0.1563  -0.9733  0.3440   -0.4819    0.1776     
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now the p-values and the results from the omnibus test of the moderators also match up completely with the output from the lm() function.