The metafor Package

Convergence Problems with the rma() Function

Some routines within the rma() function are not based on closed-form solutions, but require numerical (iterative) methods. In particular, when using the ML (maximum likelihood), REML (restricted maximum likelihood), and EB (empirical Bayes) estimators for $\tau^2$, the function makes use of the Fisher scoring algorithm, which is robust to poor starting values and usually converges quickly (Harville, 1977; Jennrich & Sampson, 1976). By default, the starting value is set equal to the value of the Hedges (HE) estimator and the algorithm terminates when the change in the estimated value of $\tau^2$ is smaller than $10^{-5}$ from one iteration to the next. The maximum number of iterations is 100 by default (which should be sufficient in most cases). Information on the progress of the algorithm can be obtained by setting verbose=TRUE or with control=list(verbose=TRUE).

Example 1: Convergence with Default Settings

Let's examine how the algorithm works in one particular example. First, we load the package and then create a vector with the effect size estimates and corresponding sampling variances:

library(metafor)
yi <- c(-0.47, -1.56, 0.18, 0.88, 0.74, 0.89, -0.05, 0.52, 2.08, 0.81)
vi <- c(0.663, 0.660, 0.125, 0.068, 0.971, 0.094, 0.509, 0.887, 0.704, 0.556)

The value of the Hedges (HE) estimator for these data is:

round(rma(yi, vi, method="HE")$tau2, 5)

[1] 0.41412

Now, when we fit the model with REML estimation (the default), we can examine the progress of the algorithm with:

res <- rma(yi, vi, verbose=TRUE, digits=5)

Iteration 0     tau^2 = 0.41412
Iteration 1     tau^2 = 0.26531
Iteration 2     tau^2 = 0.22359
Iteration 3     tau^2 = 0.20858
Iteration 4     tau^2 = 0.20274
Iteration 5     tau^2 = 0.20039
Iteration 6     tau^2 = 0.19944
Iteration 7     tau^2 = 0.19905
Iteration 8     tau^2 = 0.19889
Iteration 9     tau^2 = 0.19883
Iteration 10    tau^2 = 0.19880
Iteration 11    tau^2 = 0.19879
Iteration 12    tau^2 = 0.19878
Fisher scoring algorithm converged after 12 iterations.

Sidenote: For all of the examples given on this page, I will increase the number of decimals shown to 5 digits, but this only affects the displayed output, not the accuracy of the underlying algorithms.

So, the starting value for the algorithm is the one obtained with the HE estimator. After 12 iterations, the algorithm has converged to an estimate of 0.19878, as the change in the estimate from one iteration to the next is sufficiently small at this point (i.e., less than .00001). We can also visualize the progress of the algorithm by adding (red) points corresponding to the estimates at the various iterations to a profile plot of the restricted log-likelihood as a function of $\tau^2$.

tmp <- capture.output(rma(yi, vi, verbose=TRUE, digits=5))
tmp <- tmp[grepl("Iteration", tmp)]
 
tau2s <- as.numeric(sapply(strsplit(tmp, "="), function(x) x[2]))
lls   <- sapply(tau2s, function(x) logLik(rma(yi, vi, tau2=x)))
 
profile(res, xlim=c(0,0.5))
points(tau2s, lls, pch=19, cex=1.5, col="red")

Below is an animation illustrating how the algorithm converges.

Example 2: Convergence with Increased Number of Iterations

A different starting value, threshold, and maximum number of iterations can be specified via the control argument by setting control=list(tau2.init=value, threshold=value, maxiter=value). The step length of the Fisher scoring algorithm can also be manually adjusted by a desired factor with control=list(stepadj=value) (values below 1 will reduce the step length).

These things can become useful when the Fisher scoring algorithm does not converge with the default settings. Let's consider an example.

yi <- c(1.30, 1.94, 0.70, 0.36, 1.31, 0.46, 1.24, 0.71, 0.35, 0.77)
vi <- c(0.640, 0.421, 0.992, 0.058, 0.756, 0.634, 0.79, 0.596, 0.457, 0.935)
res <- rma(yi, vi, verbose=TRUE, digits=5)

Iteration 0     tau^2 = 0.00000
Iteration 1     tau^2 = 0.17086
Iteration 2     tau^2 = 0.00854
...
Iteration 98    tau^2 = 0.08035
Iteration 99    tau^2 = 0.08354
Iteration 100   tau^2 = 0.08047
Error in rma(yi, vi, verbose = TRUE) :
  Fisher scoring algorithm did not converge. See 'help(rma)' for possible remedies.

No convergence is achieved after 100 iterations. In this case, increasing the number of iterations can help.

res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(maxiter=1000))

Iteration 0     tau^2 = 0.00000
Iteration 1     tau^2 = 0.17086
Iteration 2     tau^2 = 0.00854
...
Iteration 248   tau^2 = 0.08197
Iteration 249   tau^2 = 0.08198
Iteration 250   tau^2 = 0.08197
Fisher scoring algorithm converged after 250 iterations.

It took a rather large number of iterations, but eventually convergence was achieved. An animation illustrating the progress of the algorithm is shown below.

Interestingly, the algorithm keeps overshooting the peak, cycling back and forth, but due to the decreasing amplitude of the oscillation, the peak (i.e., the REML estimate) is eventually found. This behavior suggests that the default step length is a bit too large in this case. When we adjust the step length by a constant factor below 1 (e.g., 0.5), convergence is rapidly achieved within a few iterations.

res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(stepadj=0.5))

Iteration 0     tau^2 = 0.00000
Iteration 1     tau^2 = 0.08543
Iteration 2     tau^2 = 0.08205
Iteration 3     tau^2 = 0.08197
Iteration 4     tau^2 = 0.08197
Fisher scoring algorithm converged after 4 iterations.

That wasn't so difficult after all.

Example 3: Convergence with Adjusted Step Length

In fact, in some cases, convergence with the Fisher scoring algorithm is only obtained after decreasing the step length. Here is another example illustrating this scenario.

yi <- c(0.38, 0.58, -0.90, 0.32, -0.15, -0.29, 1.13, 0.39, 0.45, 0.11)
vi <- c(0.599, 0.431, 0.793, 0.599, 0.483, 0.478, 0.054, 0.453, 0.772, 0.216)
res <- rma(yi, vi, control=list(maxiter=10000))

Error in rma(yi, vi, control = list(maxiter = 10000)) :
  Fisher scoring algorithm did not converge. See 'help(rma)' for possible remedies.

Not even 10000 iterations are not enough to get the algorithm to converge. The animation below shows what the problem is.

Note how the algorithm keeps cycling through a series of values, repeatedly overshooting the target, and just never converges on the peak (I only went up to 500 iterations above, but the same thing continues to happen indefinitely). This again indicates that the step length is just too large here. Reducing the step length easily solves this problem.

res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(stepadj=0.5))

Iteration 0     tau^2 = 0.00000
Iteration 1     tau^2 = 0.25241
Iteration 2     tau^2 = 0.16997
Iteration 3     tau^2 = 0.16625
Iteration 4     tau^2 = 0.16651
Iteration 5     tau^2 = 0.16649
Iteration 6     tau^2 = 0.16649
Fisher scoring algorithm converged after 6 iterations.

Much better.

Using rma.mv() for Model Fitting

As explained elsewhere on this site, the rma.mv() function can also be used to fit the same models as the rma() function. The underlying algorithms for the model fitting are a bit different though and make use of other optimization functions available in R (choices include optim(), nlminb(), and a bunch of others). So, in case rma() does not converge, another solution may be to switch to the rma.mv() function to see if this one works better. Let's try this out with the data from the previous example. Note that we need to explicitly add the random effects for each study when we use the rma.mv() function. So, the following code will fit the same model as rma() did earlier.

study <- 1:length(yi)
res <- rma.mv(yi, vi, random = ~ 1 | study)
round(res$sigma2, 5)

[1] 0.16649

The REML estimate of the variance component is the same as the one obtained earlier with the Fisher scoring algorithm after applying the step length adjustment. Note that minor numerical differences are possible when using different optimization routines. We can also see this when switching to a different optimizer.

res <- rma.mv(yi, vi, random = ~ 1 | study, control=list(optimizer="optim", optmethod="Nelder-Mead"))
round(res$sigma2, 5)

[1] 0.16645

Such differences are (usually) negligible and not practically relevant.

Maximum Likelihood Estimation

The examples above all involved the use of restricted maximum likelihood (REML) estimation, which is the default for models fitted with the rma() and rma.mv() functions. Regular maximum likelihood (ML) estimation can be used with method="ML". Let's try this out with the data from the previous example.

res <- rma(yi, vi, method="ML")
round(res$tau2, 5)

[1] 0.13701

No convergence problems with the default settings (of course, the estimate of $\tau^2$ is different). So, how well the algorithm works in any particular case can also depend on which estimator of $\tau^2$ is chosen.

Empirical Bayes / Paule-Mandel Estimator

The empirical Bayes (EB) estimator (Berkey et al., 1995; Morris, 1983) can also only be obtained iteratively. For the same data as used above, the Fisher scoring algorithm again has difficulties converging with the default settings, but this is easily remedied by adjusting the step length.

res <- rma(yi, vi, method="EB", control=list(stepadj=0.5))
round(res$tau2, 5)

[1] 0.04575

The empirical Bayes estimator can actually be shown to be identical to the estimator described by Paule and Mandel (1982). However, while the EB estimator is obtained via the Fisher scoring algorithm, the Paule-Mandel (PM) estimator is obtained in a different way, making use of the uniroot() function. By default, the desired accuracy and maximum number of iterations are set as described above. The upper bound of the interval searched is set to 100 (which should be large enough for most cases). The desired accuracy (threshold), maximum number of iterations (maxiter), and upper bound (tau2.max) can be adjusted with control=list(threshold=value, maxiter=value, tau2.max=value). But let's try obtaining the estimate with the default settings.

res <- rma(yi, vi, method="PM")
round(res$tau2, 5)

[1] 0.04575

Same value, at least up to 5 decimals places. In fact, if we increase the accuracy (i.e., lower the threshold) and round even less, we still get the same values with the EB and PM estimators.

res <- rma(yi, vi, method="EB", control=list(stepadj=0.5, threshold=1e-8))
res$tau2

[1] 0.04574773

res <- rma(yi, vi, method="PM", control=list(threshold=1e-8))
round(res$tau2, 8)

[1] 0.04574773

Afterword

Computing the values of the various estimators described above requires the use of optimization techniques or root-finding algorithms. While the specific application of such numerical methods considered here is relatively simple, it is still difficult to find a method that always works regardless of the data you throw at it.

When I first started to develop the routines that now form the basis of the rma() function (those routines go back further than the metafor package and its predecessor, the mima() function – see here), the ML, REML, and EB estimators were computed at the time via relatively simple iterative schemes (e.g., National Research Council, 1992; Thompson & Sharp, 1999) that seemed to lack a solid theoretical foundation and also could exhibit cyclical non-convergence behavior (Brockwell & Gordon, 2001). So, I programmed various optimization techniques from scratch, including the Newton-Raphson technique and the Fisher scoring algorithm and found that the latter worked quite well in most cases. As part of my dissertation research, I also showed that the Fisher scoring algorithm in this specific context can be rewritten such that it corresponds to the iterative schemes described in some of those early papers.

In the end, this implies that the algorithm, despite its long history and positive representation in the literature as a generally robust and effective numerical procedure for computing ML/REML estimates of variance components (e.g., Harville, 1977; Jennrich & Sampson, 1976), also inherited the potential for the less than desirable behavior (for certain meta-analytic models) as illustrated in some of the examples above. However, in pretty much all of the cases where this issue arises, the most effective solution appears to be a reduction of the step length. Of course, I could consider making this adjustment by default, but that could lead to backwards incompatibility. And since the solution is easy to apply (and there is an easy alternative by means of the rma.mv() function, at least for ML and REML estimation), I see little need to make any changes to the underlying code.

Two things are important to note though:

First of all, non-convergence of the algorithm is still a relatively rare occurrence. In practice, this is rarely going to affect the user. On the other hand, simulation studies that make use of the metafor package – where the same model is applied thousands of times to simulated data – may require setting up 'contingency plans' when non-convergence is encountered (the try() function is your friend here).

Second, non-convergence is not a deficiency of the estimators themselves. The ML, REML, and EB/PM estimators all have a solid theoretical basis – it's just that they cannot be computed with a simple equation, so numerical methods must be used. And any specific numerical method can sometimes fail. But such problems can usually be overcome with simple adjustments to the algorithm or switching to another method. So, please keep this in mind when you read criticisms of iterative estimators (such as ML, REML, or EB/PM) related to non-convergence: The problem does not lie with the estimators themselves but with the numerical methods used to compute them.

References

Berkey, C. S., Hoaglin, D. C., Mosteller, F., & Colditz, G. A. (1995). A random-effects regression model for meta-analysis. Statistics in Medicine, 14(4), 395–411.

Brockwell, S. E., & Gordon, I. R. (2001). A comparison of statistical methods for meta-analysis. Statistics in Medicine, 20(6), 825–840.

Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72(358), 320–338.

Jennrich, R. I., & Sampson, P. F. (1976). Newton-Raphson and related algorithms for maximum likelihood variance component estimation. Technometrics, 18(1), 11–17.

Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and applications (with discussion). Journal of the American Statistical Association, 78(381), 47-65.

National Research Council (1992). Combining information: Statistical issues and opportunities. Washington, DC: National Academic Press.

Paule, R. C., & Mandel, J. (1982). Consensus values and weighting factors. Journal of Research of the National Bureau of Standards, 87(5), 377–385.

Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-analysis: A comparison of methods. Statistics in Medicine, 18(20), 2693–2708.

Table of Contents