===== Convergence Problems with the rma() Function =====

Some routines within the ''rma()'' function are not based on closed-form solutions, but require numerical (iterative) methods. In particular, when using the ML (maximum likelihood), REML (restricted maximum likelihood), and EB (empirical Bayes) estimators for $\tau^2$, the function makes use of the [[wp>Scoring_algorithm|Fisher scoring algorithm]], which is robust to poor starting values and usually converges quickly (Harville, 1977; Jennrich & Sampson, 1976). By default, the starting value is set equal to the value of the Hedges (HE) estimator and the algorithm terminates when the change in the estimated value of $\tau^2$ is smaller than $10^{-5}$ from one iteration to the next. The maximum number of iterations is 100 by default (which should be sufficient in most cases). Information on the progress of the algorithm can be obtained by setting ''verbose=TRUE'' or with ''control=list(verbose=TRUE)''.
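
To make these defaults explicit, here is a schematic call (with ''yi'' and ''vi'' standing in for a vector of observed outcomes and the corresponding sampling variances); the control values shown are simply the defaults just described, so this is equivalent to a plain ''rma(yi, vi)'' call with verbose output turned on.

<code rsplus>
# schematic call: the control values below are the defaults described above
res <- rma(yi, vi, method="REML",
           control=list(verbose=TRUE, threshold=1e-5, maxiter=100))
</code>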

==== Example 1: Convergence with Default Settings ====

Let's examine how the algorithm works in one particular example. First, we load the package and then create vectors with the effect size estimates and corresponding sampling variances:

<code rsplus>
library(metafor)
yi <- c(-0.47, -1.56, 0.18, 0.88, 0.74, 0.89, -0.05, 0.52, 2.08, 0.81)
vi <- c(0.663, 0.660, 0.125, 0.068, 0.971, 0.094, 0.509, 0.887, 0.704, 0.556)
</code>

The value of the Hedges (HE) estimator for these data is:
<code rsplus>
round(rma(yi, vi, method="HE")$tau2, 5)
</code>
<code output>
[1] 0.41412
</code>

Now, when we fit the model with REML estimation (the default), we can examine the progress of the algorithm with:
<code rsplus>
res <- rma(yi, vi, verbose=TRUE, digits=5)
</code>
<code output>
Iteration 0     tau^2 = 0.41412
Iteration 1     tau^2 = 0.26531
Iteration 2     tau^2 = 0.22359
Iteration 3     tau^2 = 0.20858
Iteration 4     tau^2 = 0.20274
Iteration 5     tau^2 = 0.20039
Iteration 6     tau^2 = 0.19944
Iteration 7     tau^2 = 0.19905
Iteration 8     tau^2 = 0.19889
Iteration 9     tau^2 = 0.19883
Iteration 10    tau^2 = 0.19880
Iteration 11    tau^2 = 0.19879
Iteration 12    tau^2 = 0.19878
Fisher scoring algorithm converged after 12 iterations.
</code>

**Sidenote**: For all of the examples given on this page, I will increase the number of decimals shown to 5 digits, but this only affects the displayed output, not the accuracy of the underlying algorithms.

So, the starting value for the algorithm is the one obtained with the HE estimator. After 12 iterations, the algorithm has converged to an estimate of 0.19878, as the change in the estimate from one iteration to the next is sufficiently small at this point (i.e., less than .00001). We can also visualize the progress of the algorithm by adding (red) points corresponding to the estimates at the various iterations to a profile plot of the restricted log-likelihood as a function of $\tau^2$.

<code rsplus>
tmp <- capture.output(rma(yi, vi, verbose=TRUE, digits=5))
tmp <- tmp[grepl("Iteration", tmp)]

tau2s <- as.numeric(sapply(strsplit(tmp, "="), function(x) x[2]))
lls   <- sapply(tau2s, function(x) logLik(rma(yi, vi, tau2=x)))

profile(res, xlim=c(0,0.5))
points(tau2s, lls, pch=19, cex=1.5, col="red")
</code>

Below is an animation illustrating how the algorithm converges.

{{ tips:convergence_problems_rma_animation1.gif?nolink }}

==== Example 2: Convergence with Increased Number of Iterations ====

A different starting value, threshold, and maximum number of iterations can be specified via the ''control'' argument by setting ''control=list(tau2.init=value, threshold=value, maxiter=value)''. The step length of the Fisher scoring algorithm can also be manually adjusted by a desired factor with ''control=list(stepadj=value)'' (values below 1 will reduce the step length).
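
For instance, a schematic call combining these settings might look as follows (the specific values are arbitrary and only meant to illustrate the syntax):

<code rsplus>
# illustration of the control settings described above (arbitrary values)
res <- rma(yi, vi, control=list(tau2.init=0.5,  # starting value for tau^2
                                threshold=1e-6, # convergence threshold
                                maxiter=500,    # maximum number of iterations
                                stepadj=0.5))   # halve the step length
</code>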

These things can become useful when the Fisher scoring algorithm does not converge with the default settings. Let's consider an example.

<code rsplus>
yi <- c(1.30, 1.94, 0.70, 0.36, 1.31, 0.46, 1.24, 0.71, 0.35, 0.77)
vi <- c(0.640, 0.421, 0.992, 0.058, 0.756, 0.634, 0.79, 0.596, 0.457, 0.935)
res <- rma(yi, vi, verbose=TRUE, digits=5)
</code>
<code output>
Iteration 0     tau^2 = 0.00000
Iteration 1     tau^2 = 0.17086
Iteration 2     tau^2 = 0.00854
...
Iteration 98    tau^2 = 0.08035
Iteration 99    tau^2 = 0.08354
Iteration 100   tau^2 = 0.08047
Error in rma(yi, vi, verbose = TRUE) :
  Fisher scoring algorithm did not converge. See 'help(rma)' for possible remedies.
</code>

No convergence is achieved after 100 iterations. In this case, increasing the number of iterations can help.

<code rsplus>
res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(maxiter=1000))
</code>
<code output>
Iteration 0     tau^2 = 0.00000
Iteration 1     tau^2 = 0.17086
Iteration 2     tau^2 = 0.00854
...
Iteration 248   tau^2 = 0.08197
Iteration 249   tau^2 = 0.08198
Iteration 250   tau^2 = 0.08197
Fisher scoring algorithm converged after 250 iterations.
</code>

It took a rather large number of iterations, but eventually convergence was achieved. An animation illustrating the progress of the algorithm is shown below.

{{ tips:convergence_problems_rma_animation2.gif?nolink }}

Interestingly, the algorithm keeps overshooting the peak, cycling back and forth, but due to the decreasing amplitude of the oscillation, the peak (i.e., the REML estimate) is eventually found. This behavior suggests that the default step length is a bit too large in this case. When we adjust the step length by a constant factor below 1 (e.g., 0.5), convergence is rapidly achieved within a few iterations.

<code rsplus>
res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(stepadj=0.5))
</code>
<code output>
Iteration 0     tau^2 = 0.00000
Iteration 1     tau^2 = 0.08543
Iteration 2     tau^2 = 0.08205
Iteration 3     tau^2 = 0.08197
Iteration 4     tau^2 = 0.08197
Fisher scoring algorithm converged after 4 iterations.
</code>

That wasn't so difficult after all.

==== Example 3: Convergence with Adjusted Step Length ====

In fact, in some cases, convergence with the Fisher scoring algorithm is only obtained after decreasing the step length. Here is another example illustrating this scenario.

<code rsplus>
yi <- c(0.38, 0.58, -0.90, 0.32, -0.15, -0.29, 1.13, 0.39, 0.45, 0.11)
vi <- c(0.599, 0.431, 0.793, 0.599, 0.483, 0.478, 0.054, 0.453, 0.772, 0.216)
res <- rma(yi, vi, control=list(maxiter=10000))
</code>
<code output>
Error in rma(yi, vi, control = list(maxiter = 10000)) :
  Fisher scoring algorithm did not converge. See 'help(rma)' for possible remedies.
</code>

Not even 10000 iterations are enough to get the algorithm to converge. The animation below shows what the problem is.

{{ tips:convergence_problems_rma_animation3.gif?nolink }}

Note how the algorithm keeps cycling through a series of values, repeatedly overshooting the target, and just never converges on the peak (I only went up to 500 iterations in the animation, but the same thing continues to happen indefinitely). This again indicates that the step length is just too large here. Reducing the step length easily solves this problem.

<code rsplus>
res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(stepadj=0.5))
</code>
<code output>
Iteration 0     tau^2 = 0.00000
Iteration 1     tau^2 = 0.25241
Iteration 2     tau^2 = 0.16997
Iteration 3     tau^2 = 0.16625
Iteration 4     tau^2 = 0.16651
Iteration 5     tau^2 = 0.16649
Iteration 6     tau^2 = 0.16649
Fisher scoring algorithm converged after 6 iterations.
</code>

Much better.

==== Using rma.mv() for Model Fitting ====

As explained [[tips:rma.uni_vs_rma.mv|elsewhere on this site]], the ''rma.mv()'' function can also be used to fit the same models as the ''rma()'' function. The underlying algorithms for the model fitting are a bit different though and make use of other optimization functions available in R (choices include ''optim()'', ''nlminb()'', and a bunch of others). So, in case ''rma()'' does not converge, another solution may be to switch to the ''rma.mv()'' function to see if this one works better. Let's try this out with the data from the previous example. Note that we need to explicitly add the random effects for each study when we use the ''rma.mv()'' function. So, the following code will fit the same model as ''rma()'' did earlier.

<code rsplus>
study <- 1:length(yi)
res <- rma.mv(yi, vi, random = ~ 1 | study)
round(res$sigma2, 5)
</code>
<code output>
[1] 0.16649
</code>

The REML estimate of the variance component is the same as the one obtained earlier with the Fisher scoring algorithm after applying the step length adjustment. Note that minor numerical differences are possible when using different optimization routines. We can also see this when switching to a different optimizer.

<code rsplus>
res <- rma.mv(yi, vi, random = ~ 1 | study, control=list(optimizer="optim", optmethod="Nelder-Mead"))
round(res$sigma2, 5)
</code>
<code output>
[1] 0.16645
</code>
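
Other optimizers mentioned above can be selected in the same way. For example, a schematic call using ''nlminb()'' (the resulting estimate should again be essentially identical; output not shown):

<code rsplus>
# schematic: using nlminb() as the optimizer for rma.mv()
res <- rma.mv(yi, vi, random = ~ 1 | study, control=list(optimizer="nlminb"))
round(res$sigma2, 5)
</code>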

==== Maximum Likelihood Estimation ====

The examples above all involved the use of restricted maximum-likelihood (REML) estimation, which is the default for models fitted with the ''rma()'' and ''rma.mv()'' functions. Regular maximum likelihood (ML) estimation can be used with ''method="ML"''. Let's try this out with the data from the previous example.

<code rsplus>
res <- rma(yi, vi, method="ML")
round(res$tau2, 5)
</code>
<code output>
[1] 0.13701
</code>

No convergence problems with the default settings (of course, the estimate of $\tau^2$ is different). So, how well the algorithm works in any particular case can also depend on which estimator of $\tau^2$ is chosen.
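
For a given dataset, one simple way to check this is to loop over several estimators and see which ones converge under the default settings. Here is a small sketch along those lines (wrapping each fit in ''tryCatch()'' so that a non-converging fit returns ''NA'' instead of stopping the loop; results not shown):

<code rsplus>
# try several tau^2 estimators and record which ones converge with the defaults
sapply(c("HE", "DL", "ML", "REML", "EB"), function(m)
   tryCatch(round(rma(yi, vi, method=m)$tau2, 5), error = function(e) NA))
</code>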

==== Empirical Bayes / Paule-Mandel Estimator ====

The empirical Bayes (EB) estimator (Berkey et al., 1995; Morris, 1983) can also only be obtained iteratively. For the same data as used above, the Fisher scoring algorithm again has difficulties converging with the default settings, but this is easily remedied by adjusting the step length.

<code rsplus>
res <- rma(yi, vi, method="EB", control=list(stepadj=0.5))
round(res$tau2, 5)
</code>
<code output>
[1] 0.04575
</code>

The empirical Bayes estimator can actually be shown to be identical to the estimator described by Paule and Mandel (1982). However, while the EB estimator is obtained via the Fisher scoring algorithm, the Paule-Mandel (PM) estimator is obtained in a different way, making use of the ''uniroot()'' function. By default, the desired accuracy and maximum number of iterations are set as described above. The upper bound of the interval searched is set to 100 (which should be large enough for most cases). The desired accuracy (''threshold''), maximum number of iterations (''maxiter''), and upper bound (''tau2.max'') can be adjusted with ''control=list(threshold=value, maxiter=value, tau2.max=value)''. But let's try obtaining the estimate with the default settings.

<code rsplus>
res <- rma(yi, vi, method="PM")
round(res$tau2, 5)
</code>
<code output>
[1] 0.04575
</code>

Same value, at least up to 5 decimal places. In fact, if we increase the accuracy (i.e., lower the threshold) and round even less, we still get the same values with the EB and PM estimators.

<code rsplus>
res <- rma(yi, vi, method="EB", control=list(stepadj=0.5, threshold=1e-8))
res$tau2
</code>
<code output>
[1] 0.04574773
</code>

<code rsplus>
res <- rma(yi, vi, method="PM", control=list(threshold=1e-8))
round(res$tau2, 8)
</code>
<code output>
[1] 0.04574773
</code>
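
If necessary, the other control settings mentioned above for the PM estimator can be adjusted in the same manner. A schematic call (the particular values are arbitrary and only illustrate the syntax):

<code rsplus>
# adjust the upper search bound, accuracy, and iteration limit for the PM estimator
res <- rma(yi, vi, method="PM", control=list(tau2.max=200, threshold=1e-8, maxiter=500))
</code>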

==== Afterword ====

Computing the values of the various estimators described above requires the use of [[wp>Mathematical_optimization|optimization techniques]] or [[wp>Root-finding_algorithm|root-finding algorithms]]. While the specific application of such numerical methods considered here is relatively simple, it is still difficult to find a method that always works regardless of the data you throw at it.

When I first started to develop the routines that now form the basis of the ''rma()'' function (those routines go back further than the metafor package and its predecessor, the ''mima()'' function -- see [[:faq#what_is_was_the_mima_function|here]]), the ML, REML, and EB estimators were computed at the time via relatively simple iterative schemes (e.g., National Research Council, 1992; Thompson & Sharp, 1999) that seemed to lack a solid theoretical foundation and also could exhibit cyclical non-convergence behavior (Brockwell & Gordon, 2001). So, I programmed various optimization techniques from scratch, including the [[wp>Newton's_method_in_optimization|Newton-Raphson technique]] and the [[wp>Scoring_algorithm|Fisher scoring algorithm]] and found that the latter worked quite well in most cases. As part of my dissertation research, I also showed that the Fisher scoring algorithm in this specific context can be rewritten such that it corresponds to the iterative schemes described in some of those early papers.

In the end, this implies that the algorithm, despite its long history and positive representation in the literature as a generally robust and effective numerical procedure for computing ML/REML estimates of variance components (e.g., Harville, 1977; Jennrich & Sampson, 1976), also inherited the potential for the less than desirable behavior (for certain meta-analytic models) as illustrated in some of the examples above. However, in pretty much all of the cases where this issue arises, the most effective solution appears to be a reduction of the step length. Of course, I could consider making this adjustment by default, but that could lead to backwards incompatibility. And since the solution is easy to apply (and there is an easy alternative by means of the ''rma.mv()'' function, at least for ML and REML estimation), I see little need to make any changes to the underlying code.

Two things are important to note though:

First of all, non-convergence of the algorithm is a relatively rare occurrence, so in practice it will seldom affect the typical user. On the other hand, simulation studies that make use of the metafor package -- where the same model is applied thousands of times to simulated data -- may require setting up 'contingency plans' for when non-convergence is encountered (the ''try()'' function is your friend here; a small sketch of such a plan is given after these two points).

Second, non-convergence is not a deficiency of the estimators themselves. The ML, REML, and EB/PM estimators all have a solid theoretical basis -- it's just that they cannot be computed with a simple equation, so numerical methods must be used. And any specific numerical method can sometimes fail. But such problems can usually be overcome with simple adjustments to the algorithm or switching to another method. So, please keep this in mind when you read criticisms of iterative estimators (such as ML, REML, or EB/PM) related to non-convergence: The problem does not lie with the estimators themselves but with the numerical methods used to compute them.
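
Regarding the first point, here is a minimal sketch of such a contingency plan (the fallback of reducing the step length and raising the iteration limit is just one option; any other adjustment discussed above could be substituted):

<code rsplus>
# contingency plan for non-convergence within a simulation loop
res <- try(rma(yi, vi), silent=TRUE)
if (inherits(res, "try-error"))
   res <- try(rma(yi, vi, control=list(stepadj=0.5, maxiter=1000)), silent=TRUE)
if (inherits(res, "try-error"))
   res <- NULL # give up and record the failure for this iteration
</code>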

==== References ====

Berkey, C. S., Hoaglin, D. C., Mosteller, F., & Colditz, G. A. (1995). A random-effects regression model for meta-analysis. //Statistics in Medicine, 14//(4), 395--411.

Brockwell, S. E., & Gordon, I. R. (2001). A comparison of statistical methods for meta-analysis. //Statistics in Medicine, 20//(6), 825--840.

Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. //Journal of the American Statistical Association, 72//(358), 320--338.

Jennrich, R. I., & Sampson, P. F. (1976). Newton-Raphson and related algorithms for maximum likelihood variance component estimation. //Technometrics, 18//(1), 11--17.

Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and applications (with discussion). //Journal of the American Statistical Association, 78//(381), 47--65.

National Research Council (1992). //Combining information: Statistical issues and opportunities//. Washington, DC: National Academy Press.

Paule, R. C., & Mandel, J. (1982). Consensus values and weighting factors. //Journal of Research of the National Bureau of Standards, 87//(5), 377--385.

Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-analysis: A comparison of methods. //Statistics in Medicine, 18//(20), 2693--2708.