tips:rma_vs_lm_lme_lmer

In essence, the commonly used meta-analytic models are just special cases of the linear (mixed-effects) model with the only peculiar aspect that the variances of the error terms (i.e., the sampling variances) are known. Not surprisingly, the question therefore comes up occasionally why one cannot use the `lm()`

, `lme()`

, and `lmer()`

functions for conducting meta-analyses. After all, the functions can be used to fit linear (mixed-effects) models and both functions allow the user to specify the sampling variances via the `weights`

argument. So it seems that one should also be able to fit meta-analytic models with these functions. Below, I describe and illustrate how the models fitted via the `lm()`

, `lme()`

, and `lmer()`

functions differ from the models fitted by the `rma()`

function and why the those functions are therefore not suitable for fitting meta-analytic models.

Let's start with a fixed-effects model. As an example, consider the data from the meta-analysis by Molloy et al. (2014) on the relationship between conscientiousness and medication adherence. For each study, we can compute the r-to-z transformed correlation coefficient and corresponding sampling variance with:

library(metafor) dat <- escalc(measure="ZCOR", ri=ri, ni=ni, data=dat.molloy2014) dat[,-c(5:10)]

authors year ni ri yi vi 1 Axelsson et al. 2009 109 0.187 0.1892 0.0094 2 Axelsson et al. 2011 749 0.162 0.1634 0.0013 3 Bruce et al. 2010 55 0.340 0.3541 0.0192 4 Christensen et al. 1999 107 0.320 0.3316 0.0096 5 Christensen & Smith 1995 72 0.270 0.2769 0.0145 6 Cohen et al. 2004 65 0.000 0.0000 0.0161 7 Dobbels et al. 2005 174 0.175 0.1768 0.0058 8 Ediger et al. 2007 326 0.050 0.0500 0.0031 9 Insel et al. 2006 58 0.260 0.2661 0.0182 10 Jerant et al. 2011 771 0.010 0.0100 0.0013 11 Moran et al. 1997 56 -0.090 -0.0902 0.0189 12 O'Cleirigh et al. 2007 91 0.370 0.3884 0.0114 13 Penedo et al. 2003 116 0.000 0.0000 0.0088 14 Quine et al. 2012 537 0.150 0.1511 0.0019 15 Stilley et al. 2004 158 0.240 0.2448 0.0065 16 Wiebe & Christensen 1997 65 0.040 0.0400 0.0161

The `yi`

values are the r-to-z transformed correlations and the `vi`

values the corresponding sampling variances.

We can now fit a fixed-effects model to these data with:

res.fe <- rma(yi, vi, data=dat, method="FE") res.fe

Fixed-Effects Model (k = 16) Test for Heterogeneity: Q(df = 15) = 38.1595, p-val = 0.0009 Model Results: estimate se zval pval ci.lb ci.ub 0.1252 0.0170 7.3642 <.0001 0.0919 0.1585 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now let's try to fit the same model with the `lm()`

function by specifying the inverse of the sampling variances as weights:

res.lm <- lm(yi ~ 1, weights = 1/vi, data=dat) summary(res.lm)

Call: lm(formula = yi ~ 1, data = dat, weights = 1/vi) Weighted Residuals: Min 1Q Median 3Q Max -3.1919 -1.0719 0.6674 1.3173 2.4695 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.12518 0.02711 4.617 0.000335 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.595 on 15 degrees of freedom

Two things are of note here:

- The estimated intercept (
`0.12518`

) is exactly the same as the model coefficient obtained earlier with the`rma()`

function (the value reported by the`rma()`

function is rounded to 4 decimals, but that can be changed with`print(res.fe, digits=5)`

yielding the same value). - The standard error (of the estimated intercept) is different (and hence, the test statistic and p-value also differs).

The reason for this discrepancy is that the model fitted by the `lm()`

function assumes that the weights are not exactly known, but only up to a proportionality constant (namely $\sigma^2_e$, that is, the error or residual variance), which is estimated and then factored into the weights.

This can be demonstrated by extracting the estimated error variance from the `lm`

object, multiplying the sampling variances by that value, and re-fitting the model with the `rma()`

function:

rma(yi, vi*summary(res.lm)$sigma^2, data=dat, method="FE")

Fixed-Effects Model (k = 16) Test for Heterogeneity: Q(df = 15) = 15.0000, p-val = 0.4514 Model Results: estimate se zval pval ci.lb ci.ub 0.1252 0.0271 4.6171 <.0001 0.0720 0.1783 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now these results are exactly the same as those obtained by the `lm()`

function. Note that multiplying the sampling variances by some constant value does not affect the model estimate, but only the corresponding standard error.

Therefore, if we want to obtain the same standard error from `lm()`

as from `rma()`

, we must adjust the standard error by $\hat{\sigma}_e$ (i.e., the square-root of the estimated error variance or what is called the "residual standard error" in the output from the `lm()`

function):

coef(summary(res.fe))

estimate se zval pval ci.lb ci.ub 1 0.1251767 0.01699805 7.364176 1.782441e-13 0.09186109 0.1584922

coef(summary(res.lm))[1,2] / summary(res.lm)$sigma

[1] 0.01699805

A random-effects model can be fitted to the same data using the `rma()`

function with:

res.re <- rma(yi, vi, data=dat) res.re

Random-Effects Model (k = 16; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.0081 (SE = 0.0055) tau (square root of estimated tau^2 value): 0.0901 I^2 (total heterogeneity / total variability): 61.73% H^2 (total variability / sampling variability): 2.61 Test for Heterogeneity: Q(df = 15) = 38.1595, p-val = 0.0009 Model Results: estimate se zval pval ci.lb ci.ub 0.1499 0.0316 4.7501 <.0001 0.0881 0.2118 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now let's try to fit the same model with the `lme()`

function from the `nlme`

package. Here, the `weights`

argument is used to specify a variance function with fixed variances (note: the name of the `weights`

argument is a bit of a misnomer for the `lme()`

function, as one does not use it to specify the weights – as in the `lm()`

function – but a variance function structure). In addition, we need to add the random effects for each study via the `random`

argument.

library(nlme) dat$study <- 1:nrow(dat) res.lme <- lme(yi ~ 1, random = ~ 1 | study, weights = varFixed(~ vi), data=dat) summary(res.lme)

Linear mixed-effects model fit by REML Data: dat AIC BIC logLik -8.781569 -6.657418 7.390784 Random effects: Formula: ~1 | study (Intercept) Residual StdDev: 0.06818354 1.259548 Variance function: Structure: fixed weights Formula: ~vi Fixed effects: yi ~ 1 Value Std.Error DF t-value p-value (Intercept) 0.1415557 0.03071906 16 4.608073 3e-04 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -1.1596768 -0.6903414 0.1964221 0.7117538 1.4616798 Number of Observations: 16 Number of Groups: 16

Alternatively, we could use the `lmer()`

function from the `lme4`

package. The syntax would be:

library(lme4) res.lmer <- lmer(yi ~ 1 + (1 | study), weights = 1/vi, data=dat, control=lmerControl(check.nobs.vs.nlev="ignore", check.nobs.vs.nRE="ignore")) summary(res.lmer)

Linear mixed model fit by REML ['lmerMod'] Formula: yi ~ 1 + (1 | study) Data: dat Weights: 1/vi Control: lmerControl(check.nobs.vs.nlev = "ignore", check.nobs.vs.nRE = "ignore") REML criterion at convergence: -14.8 Scaled residuals: Min 1Q Median 3Q Max -1.1597 -0.6903 0.1964 0.7117 1.4617 Random effects: Groups Name Variance Std.Dev. study (Intercept) 0.004649 0.06818 Residual 1.586461 1.25955 Number of obs: 16, groups: study, 16 Fixed effects: Estimate Std. Error t value (Intercept) 0.14156 0.03072 4.608

Note that some of the default checks need to be switched off before `lmer()`

will go through with fitting this model.

A couple things are of note here:

- The estimated intercept differs from the model coefficient obtained earlier with the
`rma()`

function. - The standard error is also different (and hence, the test statistic and p-value also differs).
- The estimated standard deviation of the random effects also differs (
`0.0682`

for`lme()`

and`lmer()`

compared to`0.0901`

for`rma()`

).

These discrepancies are due to the exact same reason described earlier. The `lme()`

and `lmer()`

functions assume that the sampling variances are not exactly known, but again just up to a proportionality constant, namely the residual variance.

To illustrate this, we can again factor in that constant into the sampling variances and refit the model with `rma()`

:

rma(yi, vi*res.lme$sigma^2, data=dat)

Random-Effects Model (k = 16; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.0046 (SE = 0.0049) tau (square root of estimated tau^2 value): 0.0682 I^2 (total heterogeneity / total variability): 36.82% H^2 (total variability / sampling variability): 1.58 Test for Heterogeneity: Q(df = 15) = 24.0532, p-val = 0.0642 Model Results: estimate se zval pval ci.lb ci.ub 0.1416 0.0307 4.6081 <.0001 0.0813 0.2018 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now these results match exactly what was obtained with the `lme()`

function.

In terms of the underlying models, we can also easily describe the difference as follows. The underlying model fitted by the `rma()`

function is: $$y_i = \mu + u_i + e_i,$$ where $u_i \sim N(0, \tau^2)$ and $e_i \sim N(0, v_i)$, where the $v_i$ values are the known sampling variances. On the other hand, the model fitted by the `lme()`

and `lmer()`

functions is: $$y_i = \mu + u_i + e_i,$$ where $u_i \sim N(0, \tau^2)$ and $e_i \sim N(0, \sigma^2_e v_i)$, where $\sigma^2_e$ is that proportionality constant.

The examples above show models without any covariates/predictors/moderators (i.e., the models only include an intercept term). However, the exact same discrepancy will be found when including covariates into the models. In this context, Thompson and Sharp (1999) also describe the difference discussed above. They denote the model fitted by `lm()`

as a model that accounts for potential heterogeneity by a multiplicative factor (i.e., the residual variance), while the random-effects model fitted via the `rma()`

function is a model that accounts for potential heterogeneity by an additive factor. They note that the "rationale for using a multiplicative factor for variance inflation is weak" (p. 2705) and clearly recommend to use a model with an additive heterogeneity component.

The possibility of using both an additive and multiplicative component simultaneously was not discussed by Thompson and Sharp (1999). This is in fact what the model fitted with the `lme()`

function does. However, as far as I am aware of, no motivation supporting such a model has ever been given. In fact, it is my impression that those who use the `lme()`

and `lmer()`

functions for meta-analytic purposes are typically not aware of the issues discussed above.

Note that the same issues apply when fitting more complex meta-analytic models (e.g., multivariate/multilevel models). The `rma.mv()`

function in the metafor package can be used to fit such models (see the analysis examples section for some illustrations). Using the `lme()`

and `lmer()`

functions to fit such models will again lead to a mix of additive and multiplicative variance components, which is not how multivariate/multilevel meta-analytic models are typically defined.

The `nlme`

package was developed by José Pinheiro and Douglas Bates for both R and S-Plus. Interestingly, the S-Plus version has a special `control`

argument that allows the user to set the error variance to a fixed constant. By setting this to 1, one can fit the exact same model as the `rma()`

does:

res.lme <- lme(yi ~ 1, random = ~ 1 | study, weights = varFixed(~ vi), control=lmeControl(sigma = 1), data=dat) summary(res.lme)

Linear mixed-effects model fit by REML Data: dat AIC BIC logLik -10.44653 -9.03043 7.223265 Random effects: Formula: ~ 1 | study (Intercept) Residual StdDev: 0.09005302 1 Variance function: Structure: fixed weights Formula: ~ vi Fixed effects: yi ~ 1 Value Std.Error DF t-value p-value (Intercept) 0.1499171 0.03155994 16 4.750234 0.0002 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -1.222824 -0.5462851 0.09989517 0.6159687 1.305632 Number of Observations: 16 Number of Groups: 16

These are the exact same results as obtained earlier with the `rma()`

function. Unfortunately, the R version of the `nlme`

package does not provide this functionality.

**Update:** The R version of the `nlme`

package *does* allow the use of the `lmeControl(sigma = 1)`

control argument (this was added in version 3.1-123, which was released 2016-01-17). However, using this does not yield the same results as obtained above (the results are close but not the same). The results given above have also been compared to those obtained with Stata (metareg command) and SAS (proc mixed) and are definitely correct. This issue has also been raised on the R-sig-ME mailing list (see here).

The models fitted by the `rma()`

function assume that the sampling variances are known. The models fitted by the `lm()`

, `lme()`

, and `lmer()`

functions assume that the sampling variances are known only up to a proportionality constant. These are different models than typically used in meta-analyses.

Molloy, G. J., O'Carroll, R. E., & Ferguson, E. (2014). Conscientiousness and medication adherence: A meta-analysis. *Annals of Behavioral Medicine, 47*(1), 92–101.

Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-analysis: A comparison of methods. *Statistics in Medicine, 18*(20), 2693–2708.

tips/rma_vs_lm_lme_lmer.txt · Last modified: 2016/07/01 17:25 (external edit)

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International