## Specifying Inputs to the rma() Function

Unfortunately, I have seen a number of cases where users of the *metafor* package have misspecified the inputs to the `rma()`

function, which will then lead to incorrect results. To explain the problem in more detail, let's examine the arguments of the function (note: only the first three arguments are shown below):

args(rma)

function (yi, vi, sei, ...)

So we see that the first argument is called `yi`

and, as the documentation explains (see `help(rma)`

or go here), this argument is for specifying the **observed effect sizes or outcomes**. The second argument is called `vi`

and it is for specifying the corresponding **sampling variances** of the observed effect sizes or outcomes.

Say we have the following dataset with the standardized mean differences (Cohen's d values) for 6 studies (variable `dvals`

) and the corresponding sampling variances (variable `vars`

).

dat <- data.frame(dvals = c(.12, .39, .04, .52, -.16, .25), vars = c(.091, .064, .099, .072, .040, .018)) dat

dvals vars 1 0.12 0.091 2 0.39 0.064 3 0.04 0.099 4 0.52 0.072 5 -0.16 0.040 6 0.25 0.018

Then we can fit a random-effects model to these data with:

rma(dvals, vars, data=dat)

Random-Effects Model (k = 6; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.0105 (SE = 0.0372) tau (square root of estimated tau^2 value): 0.1027 I^2 (total heterogeneity / total variability): 17.10% H^2 (total variability / sampling variability): 1.21 Test for Heterogeneity: Q(df = 5) = 5.6811, p-val = 0.3385 Model Results: estimate se zval pval ci.lb ci.ub 0.1880 0.1003 1.8735 0.0610 -0.0087 0.3846 . --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

When we write `rma(dvals, vars, data=dat)`

, we have two unnamed inputs (i.e., `dvals`

and `vars`

) and one input whose argument name is specified (i.e., `data=dat`

). Unnamed inputs are matched up to their corresponding arguments by their position (see here). Therefore, the code above is really a shortcut for writing `rma(yi=dvals, vi=vars, data=dat)`

.

At times, users may have a dataset that doesn't include the sampling variances of the effect sizes, but their **standard errors**, which are simply the square-root of the sampling variances:

dat$stderrs <- sqrt(dat$var) dat$vars <- NULL dat

dvals stderrs 1 0.12 0.3016621 2 0.39 0.2529822 3 0.04 0.3146427 4 0.52 0.2683282 5 -0.16 0.2000000 6 0.25 0.1341641

If we now try to do the same as above, where we specify `dvals`

and `stderrs`

as inputs relying on positional matching, then we get:

rma(dvals, stderrs, data=dat)

Random-Effects Model (k = 6; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0 (SE = 0.1405) tau (square root of estimated tau^2 value): 0 I^2 (total heterogeneity / total variability): 0.00% H^2 (total variability / sampling variability): 1.00 Test for Heterogeneity: Q(df = 5) = 1.2910, p-val = 0.9359 Model Results: estimate se zval pval ci.lb ci.ub 0.1903 0.1938 0.9821 0.3260 -0.1895 0.5702 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results are **completely wrong**, since we have in essence used the code `rma(yi=dvals, vi=stderrs, data=dat)`

, so we told the function that variable `stderrs`

contains the sampling variances of the effect sizes, when in fact this variable provides the standard errors.

Going back to the arguments of the `rma()`

function above, we see that the third argument is called `sei`

, which is for specifying the standard errors of the effect sizes (which is only relevant when not using the `vi`

argument). So, we can get the correct results if we fit the model with:

rma(dvals, sei=stderrs, data=dat)

Random-Effects Model (k = 6; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.0105 (SE = 0.0372) tau (square root of estimated tau^2 value): 0.1027 I^2 (total heterogeneity / total variability): 17.10% H^2 (total variability / sampling variability): 1.21 Test for Heterogeneity: Q(df = 5) = 5.6811, p-val = 0.3385 Model Results: estimate se zval pval ci.lb ci.ub 0.1880 0.1003 1.8735 0.0610 -0.0087 0.3846 . --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Equivalently, we could have fitted the model with `rma(dvals, stderrs^2, data=dat)`

to obtain the same (correct) results (i.e., the second argument is for specifying the square of the standard errors).

Unfortunately, I have seen code of the form `rma(esvar, sevar)`

on various occasions (e.g., 1, 2, 3, 4), have corrected this in a number of reviews I have done for various journals (which was possible when the authors actually provided the code for their analyses together with the manuscript), but I am quite certain that there are published meta-analyses out there where the authors have made the mistake described above, in which case every result reported based on this mistake is completely wrong.

The discussion above might raise a number of questions:

**1) Why are the inputs to rma() like this?**

One could indeed argue that the `rma()`

function is a bit unusual in the way the inputs should be specified. For example, in the *meta* package, the function `metagen(TE, seTE, ...)`

uses the standard errors as the second argument and in the *rmeta* package, we see that `meta.summaries(d, se, ...)`

works similarly. Stata users may be familiar with `meta set esvar sevar`

to specify the variables for the effect sizes and their standard errors and so might also be caught off guard when trying out `rma()`

without realizing this difference.

On the other hand, if we look at meta-analysis models as special cases of linear regression and mixed-effects models, we see that corresponding functions will typically require users to specify either the sampling variances (or their inverse) as input (although some additional adjustments may be needed so that the weighting carried out by those functions corresponds exactly to the way effect sizes are typically weighted in a meta-analysis; for more details, see here).

Moreover, once we look at `rma.mv()`

โ a function to fit multivariate/multilevel meta-analysis models (see `help(rma.mv)`

or go here for details) โ then we see that using variances as the second argument actually provides consistency across functions in the metafor package. In particular, for this function, the first two arguments are `rma.mv(yi, V, ...)`

, so here we specify again the observed effect sizes or outcomes as the first argument. The second argument is for specifying the variance-covariance matrix of the sampling errors (in case the sampling errors are independent, then `V`

can just be a vector with the sampling variances). Although there is also the square-root of a matrix, it would be a really bizarre thing to ask users to specify the square-root of the variance-covariance matrix of the sampling errors as input to the `rma.mv()`

function. Hence, in both functions, the second argument is for specifying variances (or even an entire variance-covariance matrix).

**2) Can you not do something to detect this problem?**

Based on the actual values specified for the second argument to `rma()`

, it is essentially impossible to tell if the values are sampling variances or standard errors, at least not in any foolproof/reliable way. Instead, one can look at the name of the variable specified via the second argument and then check if the name contains suspicious characters (e.g., `stderr`

, `SE`

) that might suggest that the mistake described above is being committed. Such a check is now actually implemented in the *metafor* package and would trigger a warning message in the example above. However, this check can only catch some obvious cases and hopefully won't generate many false positives (which would be annoying to users; also note that if the check triggers the warning, then this will only happen once per session). In addition, I will continue to clarify this issue whenever it arises in the hopes that this will reduce or even eliminate potential misuse of the `rma()`

function.

**A final note:** As Brenton Wiernik noted on Twitter, this problem can be avoided altogether by not relying on positional matching of arguments (at least not beyond the very first argument, whose meaning/purpose is typically unambiguous). This is also something that I teach, although I must admit that whenever I use `rma()`

, I am quite guilty of writing `rma(yi, vi, ...)`

(which might be confusing in itself, since *variable* `yi`

is passed to *argument* `yi`

and *variable* `vi`

is passed to *argument* `vi`

โ I purposefully avoided this ambiguity above by using variable names that differ from the argument names). In any case, if one would follow this principle in the example above, writing `rma(dvals, vi=stderrs, data=dat)`

has at least a greater potential to signal that something is off and that `rma(dvals, sei=stderrs, data=dat)`

would have been correct.