# The metafor Package

A Meta-Analysis Package for R


## Package News

### 2023-03-19: Version 4.0-0 Released on CRAN

I am excited to announce the official (i.e., CRAN) release of version 4.0-0 of the metafor package. This will be the 30th update to the package after its initial release in 2009. Since then, the package has grown from a measly 4460 lines of code / 60 functions / 76 pages of documentation to a respectable 36879 lines of code / 330 functions / 347 pages of documentation. Aside from a few improvements related to modeling (e.g., the emmprep() function provides easier interoperability with the emmeans package and the selmodel() function gains a few additional selection models), I would say the focus of this update was on steps that occur prior to modeling, namely the calculation of the chosen effect size measure (or outcome measure as I prefer to call it) and the construction of the dataset in general.

In particular, the escalc() function now allows the user to also input appropriate test statistics and/or p-values for a number of measures where these can be directly transformed into the corresponding values of the measure. For example, the t-statistic from an independent samples t-test can be easily transformed into a standardized mean difference or the t-statistic from a standard test of a correlation coefficient can be easily transformed into the correlation coefficient or its r-to-z transformed version. Speaking of the latter, essentially all correlation-type measures can now be transformed using the r-to-z transformation, although it should be noted that this is not a proper variance-stabilizing transformation for all measures. This can still be useful though, since the r-to-z transformation also has normalizing properties and since it can help when combining different types of correlation coefficients in the same analysis (e.g., Pearson product-moment correlations and tetrachoric/biserial correlations).
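The conversions described above come down to a few lines of arithmetic. The following is a minimal base-R sketch of the underlying formulas, not the escalc() implementation. All input values are made up for illustration, the standardized mean difference shown is the uncorrected d-value (without any small-sample bias correction), and the variance $1/(n-3)$ is the usual large-sample result for a Pearson correlation.

```r
# Hypothetical inputs: t-statistic from an independent samples t-test
ti  <- 2.5
n1i <- 30
n2i <- 28

# t-statistic -> standardized mean difference (uncorrected d-value)
di <- ti * sqrt(1/n1i + 1/n2i)

# Hypothetical inputs: t-statistic from the standard test of a correlation
tr <- 3.1
ni <- 50

# t-statistic -> correlation coefficient (the test has ni - 2 degrees of freedom)
ri <- tr / sqrt(tr^2 + ni - 2)

# r-to-z transformation and its large-sample variance (Pearson correlation)
zi <- atanh(ri)
vz <- 1 / (ni - 3)

round(c(d = di, r = ri, z = zi, vz = vz), 4)
```

Back-transforming with tanh(zi) recovers the correlation, which is a quick way to check the round trip.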

Finally, there are now several functions in the package that facilitate the construction of the dataset for a meta-analysis more generally. The conv.2x2() function helps to reconstruct 2x2 tables based on various pieces of information (e.g., odds ratios, chi-square statistics), while the conv.fivenum() function provides various methods for computing (or more precisely, estimating) means and standard deviations based on five-number summary values (i.e., the minimum, first quartile, median, third quartile, and maximum) and subsets thereof. The conv.wald() function converts Wald-type tests and/or confidence intervals to effect sizes and corresponding sampling variances (e.g., to transform a reported odds ratio and its confidence interval to the corresponding log odds ratio and sampling variance). And the conv.delta() function transforms effect sizes or outcomes and their sampling variances using the delta method, which can be useful in several data preparation steps. See the documentation of these functions for further details and examples.
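To make the odds ratio example concrete, here is a minimal base-R sketch of the arithmetic behind converting a reported odds ratio and its Wald-type confidence interval into a log odds ratio and sampling variance. It is not the conv.wald() implementation; the reported values and the variable names are hypothetical, and the interval is assumed to be a symmetric 95% interval on the log scale.

```r
# Hypothetical reported result: odds ratio with a 95% confidence interval
or.est <- 1.80
ci.lb  <- 1.10
ci.ub  <- 2.95
level  <- 0.95

# Work on the log scale, where a Wald-type interval is symmetric
yi   <- log(or.est)
crit <- qnorm(1 - (1 - level) / 2)

# The interval width on the log scale equals 2 * crit * SE
sei <- (log(ci.ub) - log(ci.lb)) / (2 * crit)
vi  <- sei^2

# Sanity check: rebuild the interval from yi and sei; it should be close
# to the reported bounds (differences reflect rounding in the report)
round(exp(yi + c(-1, 1) * crit * sei), 2)
```

If the rebuilt bounds differ noticeably from the reported ones, the interval was probably not a Wald-type interval on the log scale, and the conversion should not be trusted.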

If you come across any issues/bugs, please report them here. However, for questions or discussions about these functions (or really anything related to the metafor package or meta-analysis with R in general), please use the R-sig-meta-analysis mailing list.

### 2022-10-15: Allowing $\tau^2$ to Differ Across Subgroups

A while ago, I wrote up a little discussion on comparing estimates of independent meta-analyses or subgroups here. Part of this discussion touched upon subgroup analyses and two possible approaches, one where we assume that the amount of heterogeneity ($\tau^2$) is the same within the various subgroups and one where we relax this assumption and allow $\tau^2$ to differ across groups. I have now written up a more extensive discussion of this here, where I also illustrate several different methods/models for conducting a subgroup analysis that allow the amount of heterogeneity to differ across the subgroups.

### 2022-10-07: Convergence Problems with the rma.mv() Function

I had previously written up a little discussion of convergence problems that can arise when using the rma() (i.e., rma.uni()) function (and some remedies to deal with them). I have now added an analogous discussion for the rma.mv() function here.

### 2022-09-26: Confidence Intervals for $R^2$

The (pseudo) $R^2$ statistic that is shown in the output for meta-regression models fitted with the rma() function provides an estimate of how much of the total amount of heterogeneity is accounted for by the moderator(s) included in the model (Raudenbush, 2009). However, it is important to realize that this statistic can be quite inaccurate, especially when the number of studies ($k$) is small. We may therefore want to construct a confidence interval to get a better sense of how precise the value may be. This can be done using bootstrapping, as illustrated here. I also conducted a small simulation study to examine how well bootstrapping actually works for constructing CIs for $R^2$. The results indicate that the bias-corrected and accelerated (BCa) CI actually works quite well, as long as $k$ is at least 40 and the true value of $R^2$ is not too small.
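For reference, the statistic is commonly defined by comparing the heterogeneity estimate from the random-effects model (without moderators) with the one from the mixed-effects meta-regression model. The subscripts RE and ME are notation introduced here for illustration, and the truncation at zero reflects the usual convention of not reporting negative values:

$$R^2 = \max\left(0, \; \frac{\hat{\tau}^2_{RE} - \hat{\tau}^2_{ME}}{\hat{\tau}^2_{RE}}\right)$$

Since both $\hat{\tau}^2_{RE}$ and $\hat{\tau}^2_{ME}$ are themselves estimates, the ratio inherits the imprecision of both, which is one way to see why the value can be so unstable when $k$ is small.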

### 2022-08-27: Version 3.8-1 Released on CRAN

I just sent a new version (3.8-1) of the metafor package to CRAN. This update was prompted by a small issue in the help pages (related to my use of MathJax to render nice equations in the docs), which was easy to fix. I took the opportunity to incorporate some other updates into the new version, which provide a bit of polish.

One thing I am kind of excited about is the completely overhauled vif() function for computing variance inflation factors. One of the major difficulties with VIFs is their interpretation. Is a particular value 'large'? Commonly used cutoffs like 5 or 10 are quite arbitrary. To make it easier to gauge whether a VIF value is relatively large, one can now simulate the distribution of a VIF under independence, similar to a 'parallel analysis' that is used in factor analysis to determine the number of factors. One can then examine how extreme the actually observed VIF is under this distribution. A plot method is also available to visualize this.
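The general idea can be sketched in base R. This is only an illustration under stated assumptions and not what vif() does internally: it uses the classical (unweighted) definition of a VIF, $1 / (1 - R_j^2)$, made-up predictor values, a hypothetical helper called vif_classic(), and generates the reference distribution by reshuffling each predictor column separately so that the predictors become independent of each other.

```r
set.seed(1234)

# Hypothetical predictors for k = 40 studies (x2 is correlated with x1)
k  <- 40
x1 <- rnorm(k)
x2 <- 0.7 * x1 + rnorm(k, sd = 0.7)
x3 <- rnorm(k)
X  <- cbind(x1, x2, x3)

# Classical VIF for each column: 1 / (1 - R^2) from regressing it on the others
vif_classic <- function(X) {
  sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  })
}

vif_obs <- vif_classic(X)

# Simulate VIFs under independence by reshuffling each column separately
vif_sim <- replicate(1000, vif_classic(apply(X, 2, sample)))

# Proportion of simulated VIFs that fall below the observed one, per predictor
prop_below <- rowMeans(vif_sim < vif_obs)
round(rbind(vif = vif_obs, prop_below = prop_below), 3)
```

A proportion close to 1 indicates that the observed VIF is large relative to what one would expect from chance correlations alone, which is more informative than comparing it against a fixed cutoff.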

There is now some more support for using an identity link when fitting location-scale models with the rma() function, although the default log link is typically the better choice and avoids having to use constrained optimization to fit the model.

I also added (experimental!) support for additional measures (e.g., log risk ratios and risk differences) to rma.glmm() (by using log and identity links in the generalized linear mixed-effects model), but using these measures will often lead to estimation problems. For 2x2 table data, the log odds ratio (i.e., using a logit link) is still the preferred choice.

Another nice feature when computing standardized mean differences with the escalc() function is that one can now specify d-values and t-test statistics directly. This makes it easier to assemble data for a meta-analysis with SMD values, as described here.

Aside from this, there were a few smaller improvements. The full changelog can be found here.

### 2022-08-22: Another Multilevel Meta-Analysis Example

My go-to example for illustrating the use of a multilevel meta-analysis is based on the dataset in the paper by Konstantopoulos (2011). It is a nice example, but slightly unusual, since the multilevel structure in the dataset involves a clustering variable above studies.

In practice, one is more likely to encounter cases where studies form the higher level clustering variable, with multiple effects/outcomes reported within at least some of the studies (e.g., for different samples). Such an example arises in the meta-analysis by Credé et al. (2010) on the relationship between class attendance and grades, which I have now added as a case study to the analysis examples.

### 2022-06-19: Difference Between the Omnibus Test and Individual Predictors

When a meta-regression model includes multiple predictors, one can examine the significance of each individual predictor (i.e., coefficient), but also the significance of the model as a whole. For the latter, we can conduct an omnibus test that tests the null hypothesis that all predictors are unrelated to the effect sizes. It can happen that the omnibus test leads to a different conclusion than tests of the individual coefficients. An illustration and discussion of this phenomenon is provided here.
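As a sketch of why the two can disagree, assume Wald-type tests and let $\hat{\beta}_{(2)}$ denote the vector of the $p$ estimated coefficients excluding the intercept (notation introduced here for illustration). The omnibus test statistic then takes the form

$$Q_M = \hat{\beta}_{(2)}' \left( \widehat{\text{Var}}[\hat{\beta}_{(2)}] \right)^{-1} \hat{\beta}_{(2)},$$

which is compared against a chi-square distribution with $p$ degrees of freedom under the null hypothesis. Because $Q_M$ involves the full variance-covariance matrix, including the covariances between the coefficient estimates, it can be significant when none of the individual coefficients is (e.g., with strongly correlated predictors), and the reverse can happen as well.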

### 2022-06-13: Forest Plot with Adjusted Text Position

By default, the forest() function places the annotations (giving the study-specific estimates and corresponding confidence interval bounds) to the right of the actual forest plot. See here for an example. Sometimes, we might prefer to show those annotations in a different position. This can be done with the textpos argument, which takes as input a vector of length 2 to specify the placement of the study labels and the annotations. An illustration of the use of this argument is provided here.

### 2022-05-24: Increasing Value of $\tau^2$ When Adding Moderators

In (mixed-effects) meta-regression models, it can happen that the estimate of $\tau^2$ (which denotes the amount of heterogeneity not accounted for by the model) exceeds the estimate of $\tau^2$ from the random-effects model (which denotes the total amount of heterogeneity). I have written up a little illustration to demonstrate and discuss this somewhat counterintuitive result here.

### 2022-04-27: Forest Plot with Aggregated Values

When a meta-analysis involves studies that contribute multiple effect size estimates to the analysis, the dataset can quickly become so large that drawing a forest plot of the individual estimates becomes infeasible. As an alternative visualization, we can aggregate estimates within studies and then use these aggregated values in a forest plot. I have written up a little illustration to show how this can be done.

### 2022-04-21: Version 3.4 Released on CRAN

A new version of the metafor package (version 3.4) has been released. Some highlights:

• The vcalc() function was added. With this function, one can construct or approximate the variance-covariance matrix of dependent effect sizes for a wide variety of circumstances.
• The robust() function, for cluster-robust inferences (also known as robust variance estimation), now interfaces fully with the excellent clubSandwich package, so one can make use of the improved methods therein.
• For meta-analyses involving complex dependency structures, vcalc(), rma.mv(), and robust(..., clubSandwich=TRUE) are all part of a general workflow that can handle the vast majority of dependencies in meta-analyses, as described here.
• The aggregate.escalc() method – for aggregating escalc datasets with dependent effect sizes to a higher level – has a new 'structure' (struct="CS+CAR") if there are effects at multiple time points and multiple effect sizes at these time points.
• A few more measures were added to escalc(): "MPORM" for computing marginal log odds ratios based on marginal 2x2 tables and "REH" for computing the (log transformed) relative excess heterozygosity (this is a bit more esoteric stuff).
• rma.glmm() – for meta-analytic generalized linear (mixed-effects) models – allows more flexibility in the coding of the group variable and whether the random study effects should be allowed to be correlated with the random group effects.
• Even more optimizer choices for rma.mv() (including a subspace-searching simplex algorithm and the Barzilai-Borwein gradient descent method): If you can't get the model to converge with any of the available options, all hope is lost!
• All datasets that used to be part of the metafor package have now been moved to the metadat package (which now includes even more meta-analysis datasets).
• A bunch of smaller convenience features (e.g., some as.data.frame() methods, a refit argument in anova(), more use of a data argument), a few clever tricks with a custom package environment to store settings, and free candy (not really).
• Lots of documentation updates, including a description of fixed- versus random-effects models, some recommended practices, and miscellaneous options and features.

The full changelog can be found here.

### 2022-03-20: Forest Plot with Exact Confidence Intervals

A question was recently raised on the R-sig-meta-analysis mailing list that asked about the difference between the confidence intervals shown in forest plots and those computed based on 'exact' methods (see here for the question and here for my response). Using a slightly more common example of a meta-analysis based on $2 \times 2$ table data, I have written up a little illustration to show how one can create a forest plot with exact confidence intervals.

### 2022-03-12: Over 10,000 Citations

Since I don't obsessively check my Google Scholar profile like everybody else does, it is by mere coincidence that I noticed that my JSS paper about the metafor package has now been cited more than 10,000 times (of course, like everybody else, I will ignore the Web of Science count, which isn't quite there yet ...). I greatly appreciate that people are citing the paper and hence supporting the creation and maintenance of this R package in this way. It can still be difficult to receive proper credit for software development in academia, so citing the software is one of the best ways that you can support developers in their work (aside from donating a million bucks you happen to have lying around). I think it also helps if there is a paper or book about the software, which is sometimes a bit easier to cite than the software itself (what was again the APA style for citing software?) and citation counts are more easily tracked for papers/books than citations of the software itself.

### 2022-03-06: Specifying Inputs to the rma() Function

Unfortunately, I have seen a number of cases where users of the metafor package have misspecified the inputs to the rma() function, giving the standard errors of the effect sizes as an unnamed second argument. This will lead to incorrect results. To explain the problem in more detail (and so that I can simply point people to a place where this issue is explained thoroughly), I have written up this discussion.
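The second argument of rma() is vi (the sampling variances), so standard errors passed in that position are treated as if they were variances. The consequences can be illustrated with a minimal base-R sketch that uses a simple inverse-variance weighted average instead of rma() itself (which fits a random-effects model by default). The data and the helper function ivw() are hypothetical.

```r
# Hypothetical effect size estimates and their standard errors
yi  <- c(0.30, 0.10, 0.45, 0.22)
sei <- c(0.10, 0.25, 0.15, 0.08)

# Illustrative helper: inverse-variance weighted average and its standard error
ivw <- function(yi, vi) {
  wi <- 1 / vi
  c(estimate = sum(wi * yi) / sum(wi), se = sqrt(1 / sum(wi)))
}

right <- ivw(yi, vi = sei^2)  # standard errors squared -> sampling variances
wrong <- ivw(yi, vi = sei)    # standard errors mistaken for variances

round(rbind(right, wrong), 4)

# With metafor, the analogous calls would be:
#   rma(yi, sei = sei)   # correct: standard errors passed by name
#   rma(yi, sei)         # incorrect: sei is matched to the 'vi' argument
```

In this example, both the pooled estimate and its standard error change, and since the standard errors are all below 1, the misspecified version overstates the uncertainty. With standard errors above 1, the error would go in the opposite direction.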

### 2022-01-02: More Forest Plot Examples

Happy New Year! Hope this one will be at least marginally less crazy than the previous ones ...

I was recently asked whether I would add the feature to show multiple confidence intervals for each of the studies in a forest plot (e.g., by using lines with varying thickness) to the metafor package. Turns out that one can already do this without too much difficulty using the existing tools, simply by superimposing two forest plots on top of each other. This is illustrated here.

I also wanted to see to what extent one can reproduce forest plots created by different software or using the aesthetics of certain journals. I started with the recreation of a forest plot that was obtained using RevMan, the software provided by the Cochrane Collaboration for conducting and authoring Cochrane reviews. You can find the figure and corresponding code for this here. Then I recreated a forest plot that was obtained from an article in the British Medical Journal. The resulting figure and code can be found here.

Although it takes a bit of effort to recreate these figures (especially if one wants to make them look almost identical to the originals), it shows that one can essentially recreate any forest plot using the various forest() functions from metafor and then some additional functions like text(), points(), and so on, which give you full control over how things are drawn and the information included in the figure.