## Linear Regression and the Mixed-Effects Meta-Regression Model

The standard linear regression model is given by $$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + e_i,$$ where $e_i \sim N(0, \sigma^2)$. Models of this sort can be fitted with the R function `lm()`.

The mixed-effects meta-regression model is given by $$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + u_i + e_i,$$ where $u_i \sim N(0, \tau^2)$ and $e_i \sim N(0, v_i)$, with $v_i$ denoting the (approximately) known sampling variances of the observed outcomes or effect size estimates (e.g., standardized mean differences, log odds ratios). Models of this sort can be fitted with the function `rma()` from the metafor package.
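One way to compare the two models is through the distribution of $y_i$ that each of them implies. The following sketch assumes, as is standard but not stated explicitly above, that $u_i$ and $e_i$ are independent; the symbol $\mu_i$ is shorthand introduced here for the linear predictor:

$$\begin{aligned} \mu_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}, \\ \text{linear regression:} \quad y_i &\sim N(\mu_i, \, \sigma^2), \\ \text{meta-regression:} \quad y_i &\sim N(\mu_i, \, \tau^2 + v_i). \end{aligned}$$

Under that assumption, the two models differ only in the variance term: a single constant $\sigma^2$ in the first case and $\tau^2 + v_i$ in the second.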

Consequently, if one were to set $v_i = 0$ for all outcomes, then the standard linear regression model and the mixed-effects meta-regression model are actually identical (with $\sigma^2$ denoting the same parameter as $\tau^2$). This equivalence can be demonstrated with an arbitrary dataset:

    library(metafor)
    stackloss

       Air.Flow Water.Temp Acid.Conc. stack.loss
    1        80         27         89         42
    2        80         27         88         37
    3        75         25         90         37
    4        62         24         87         28
    5        62         22         87         18
    6        62         23         87         18
    7        62         24         93         19
    8        62         24         93         20
    9        58         23         87         15
    10       58         18         80         14
    11       58         18         89         14
    12       58         17         88         13
    13       58         18         82         11
    14       58         19         93         12
    15       50         18         89          8
    16       50         18         86          7
    17       50         19         72          8
    18       50         19         79          8
    19       50         20         80          9
    20       56         20         82         15
    21       70         20         91         15

See `help(stackloss)` for more details on this dataset. Most importantly, the variable `stack.loss` is the dependent variable, with `Air.Flow`, `Water.Temp`, and `Acid.Conc.` serving as potentially relevant predictors.

Now let's fit the standard linear regression model to these data with both the `lm()` and the `rma()` functions:

    res.lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss)

and

    stackloss$vi <- 0
    res.rma <- rma(stack.loss, vi, mods = ~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss)

Note that all sampling variances are set to 0 for `rma()`. The function will issue a warning that the dataset includes outcomes with non-positive sampling variances, which would be rather strange in a meta-analytic context, but the warning can be safely ignored here.
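The agreement between the two fits can also be checked programmatically rather than by eye. The following is a minimal sketch, not part of the original analysis: the object names `dat`, `fit_lm`, and `fit_rma` are chosen here for illustration, the comparison tolerances are assumptions, and `suppressWarnings()` is used only because the warning about non-positive sampling variances is expected in this special case.

```r
library(metafor)

# Work on a copy so the built-in dataset is left untouched
# ('dat', 'fit_lm', and 'fit_rma' are names chosen for this sketch).
dat <- stackloss
dat$vi <- 0

fit_lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = dat)

# suppressWarnings() hides the expected warning about non-positive
# sampling variances; it also hides any other warning, so use with care.
fit_rma <- suppressWarnings(
  rma(stack.loss, vi, mods = ~ Air.Flow + Water.Temp + Acid.Conc., data = dat)
)

# Coefficient names differ between the two objects, so compare values only.
coef_match <- isTRUE(all.equal(unname(coef(fit_lm)), unname(coef(fit_rma)),
                               tolerance = 1e-6))

# tau^2 is estimated iteratively, so a looser (assumed) tolerance is used
# when comparing the residual standard deviation with the estimate of tau.
sigma_match <- isTRUE(all.equal(sigma(fit_lm), sqrt(fit_rma$tau2),
                                tolerance = 1e-4))

c(coefficients = coef_match, residual_sd = sigma_match)
```

Comparing values with a tolerance, instead of testing for exact equality, avoids spurious mismatches caused by floating-point arithmetic and the iterative estimation of $\tau^2$.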

We can now compare the output from the two models with:

    summary(res.lm)

    Call:
    lm(formula = stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)

    Residuals:
        Min      1Q  Median      3Q     Max
    -7.2377 -1.7117 -0.4551  2.3614  5.6978

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -39.9197    11.8960  -3.356  0.00375 **
    Air.Flow      0.7156     0.1349   5.307  5.8e-05 ***
    Water.Temp    1.2953     0.3680   3.520  0.00263 **
    Acid.Conc.   -0.1521     0.1563  -0.973  0.34405
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.243 on 17 degrees of freedom
    Multiple R-squared:  0.9136, Adjusted R-squared:  0.8983
    F-statistic:  59.9 on 3 and 17 DF,  p-value: 3.016e-09

and

    res.rma

    Mixed-Effects Model (k = 21; tau^2 estimator: REML)

    tau^2 (estimated amount of residual heterogeneity):     10.5194 (SE = 3.6081)
    tau (square root of estimated tau^2 value):             3.2434
    R^2 (amount of heterogeneity accounted for):            89.83%

    Test of Moderators (coefficient(s) 2:4):
    QM(df = 3) = 179.7067, p-val < .0001

    Model Results:

                estimate       se     zval    pval     ci.lb     ci.ub
    intrcpt     -39.9197  11.8960  -3.3557  0.0008  -63.2354  -16.6039  ***
    Air.Flow      0.7156   0.1349   5.3066  <.0001    0.4513    0.9800  ***
    Water.Temp    1.2953   0.3680   3.5196  0.0004    0.5740    2.0166  ***
    Acid.Conc.   -0.1521   0.1563  -0.9733  0.3304   -0.4585    0.1542

    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

A few things are worth noting:

- The estimated model coefficients, corresponding standard errors, and the test statistics are exactly the same. However, `lm()` computes the p-values based on the t-distribution, while `rma()` uses (by default) the standard normal distribution.

- The omnibus test of the model coefficients conducted by `lm()` is an F-test (`F-statistic: 59.9 on 3 and 17 DF, p-value: 3.016e-09`), while `rma()` uses by default a chi-square test (`QM(df = 3) = 179.7067, p-val < .0001`).

- The estimated `Residual standard error: 3.243` is the same as the estimate of `tau: 3.2434`.

- The (pseudo) R-squared value reported by `rma()` (`89.83%`) actually corresponds to the adjusted R-squared value provided by `lm()` (`0.8983`).

- The $I^2$ and $H^2$ statistics typically reported by the `rma()` function are missing, since these statistics cannot be computed when the dataset includes outcomes with non-positive sampling variances. Similarly, the usually reported results from the $Q$-test for heterogeneity are omitted for the same reason.
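The numerical relationships noted in the list above can be reproduced with a few lines of base R. The sketch below is only illustrative: it copies the rounded values from the printed output, so agreement holds only up to rounding, and the variable names are chosen here for the example.

```r
df_resid <- 17        # residual degrees of freedom reported by lm()
t_int    <- -3.3557   # test statistic for the intercept (same in both fits)

# Two-sided p-values: t-distribution (lm) vs. standard normal (rma default)
p_t    <- 2 * pt(-abs(t_int), df = df_resid)
p_norm <- 2 * pnorm(-abs(t_int))

# Omnibus tests: in this output, QM divided by the number of tested
# coefficients (3) gives the F-statistic
F_from_QM <- 179.7067 / 3

# Squaring the reported tau recovers the reported tau^2 (up to rounding)
tau2_from_tau <- 3.2434^2

# Adjusted R-squared from the multiple R-squared (n = 21, 3 predictors)
adj_r2 <- 1 - (1 - 0.9136) * (21 - 1) / (21 - 3 - 1)

round(c(p_t = p_t, p_norm = p_norm, F = F_from_QM,
        tau2 = tau2_from_tau, adj_r2 = adj_r2), 4)
```

The resulting values can be compared against the `Pr(>|t|)` and `pval` columns, the `F-statistic`, the `tau^2` estimate, and the adjusted R-squared in the two outputs.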

To get full correspondence between the two models, we can use the 'Knapp & Hartung' method when fitting the model with the `rma()` function:

    res.rma <- rma(stack.loss, vi, mods = ~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss, test="knha")
    res.rma

    Mixed-Effects Model (k = 21; tau^2 estimator: REML)

    tau^2 (estimated amount of residual heterogeneity):     10.5194 (SE = 3.6081)
    tau (square root of estimated tau^2 value):             3.2434
    R^2 (amount of heterogeneity accounted for):            89.83%

    Test of Moderators (coefficient(s) 2:4):
    F(df1 = 3, df2 = 17) = 59.9022, p-val < .0001

    Model Results:

                estimate       se     tval    pval     ci.lb     ci.ub
    intrcpt     -39.9197  11.8960  -3.3557  0.0038  -65.0180  -14.8213   **
    Air.Flow      0.7156   0.1349   5.3066  <.0001    0.4311    1.0002  ***
    Water.Temp    1.2953   0.3680   3.5196  0.0026    0.5188    2.0717   **
    Acid.Conc.   -0.1521   0.1563  -0.9733  0.3440   -0.4819    0.1776

    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now the p-values and the results from the omnibus test of the moderators also match up completely with the output from the `lm()` function.
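The switch from the standard normal to the t-distribution also explains why the confidence intervals widen in the second `rma()` output. As a rough illustration in base R, using the rounded estimate and standard error for `Air.Flow` from the printed output (so results agree only up to rounding; the variable names are chosen for this sketch):

```r
est      <- 0.7156   # estimate for Air.Flow
se       <- 0.1349   # its standard error
df_resid <- 17       # residual degrees of freedom (21 observations, 4 coefficients)

# 95% confidence interval based on the standard normal distribution
ci_z <- est + c(-1, 1) * qnorm(0.975) * se

# 95% confidence interval based on the t-distribution with 17 df
ci_t <- est + c(-1, 1) * qt(0.975, df = df_resid) * se

# Two-sided t-based p-value for Acid.Conc. (test statistic -0.9733)
p_acid_t <- 2 * pt(-abs(-0.9733), df = df_resid)

round(rbind(normal = ci_z, t = ci_t), 4)
round(p_acid_t, 4)
```

The first interval can be compared with the `ci.lb` and `ci.ub` columns of the default `rma()` output and the second with those of the `test="knha"` output; the p-value can be compared with the `Acid.Conc.` row in both the `lm()` and the `test="knha"` results.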