===== Convergence Problems with the rma() Function =====

Some routines within the ''metafor'' package are iterative in nature. In particular, when fitting a random-effects model with the ''rma()'' function, the maximum-likelihood (ML), restricted maximum-likelihood (REML), and empirical Bayes (EB) estimates of $\tau^2$ (i.e., the amount of heterogeneity) cannot be computed with a closed-form equation and must be obtained iteratively. For this purpose, the function uses the Fisher scoring algorithm, which by default starts at the value of the Hedges (HE) estimator and stops when the change in the estimate from one iteration to the next falls below a threshold of $10^{-5}$ (with a maximum of 100 iterations).
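
In general terms, each iteration of Fisher scoring updates the current estimate by a step equal to the score divided by the Fisher information. The following display is a standard textbook description of the algorithm (not the package's exact internals):

$$\tau^2_{k+1} = \tau^2_k + \mathcal{I}(\tau^2_k)^{-1} \, S(\tau^2_k),$$

where $S(\cdot)$ denotes the first derivative (i.e., the score) of the (restricted) log-likelihood with respect to $\tau^2$ and $\mathcal{I}(\cdot)$ the corresponding Fisher information. The ''stepadj'' control option discussed further below simply multiplies this step by a constant factor.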

==== Example 1: Convergence with Default Settings ====

Let's examine how the algorithm works in one particular example. First, we load the package and then create vectors with the effect size estimates and the corresponding sampling variances:

<code rsplus>
library(metafor)
yi <- c(-0.47, -1.56, 0.18, 0.88, 0.74, 0.89, -0.05, 0.52, 2.08, 0.81)
vi <- c(0.663, 0.660, 0.125, 0.068, 0.971, 0.094, 0.509, 0.887, 0.704, 0.556)
</code>

The value of the Hedges (HE) estimator of $\tau^2$ for these data is:
<code rsplus>
round(rma(yi, vi, method="HE")$tau2, 5)
</code>
<code output>
[1] 0.41412
</code>

Now, when we fit the model with REML estimation (the default), we can examine the progress of the algorithm with:
<code rsplus>
res <- rma(yi, vi, verbose=TRUE, digits=5)
</code>
<code output>
Iteration 0 tau^2 = 0.41412
Iteration 1 tau^2 = 0.26531
Iteration 2 tau^2 = 0.22359
Iteration 3 tau^2 = 0.20858
Iteration 4 tau^2 = 0.20274
Iteration 5 tau^2 = 0.20039
Iteration 6 tau^2 = 0.19944
Iteration 7 tau^2 = 0.19905
Iteration 8 tau^2 = 0.19889
Iteration 9 tau^2 = 0.19883
Iteration 10 tau^2 = 0.19880
Iteration 11 tau^2 = 0.19879
Iteration 12 tau^2 = 0.19878
Fisher scoring algorithm converged after 12 iterations.
</code>

**Sidenote**: Iteration 0 in the verbose output corresponds to the starting value of the algorithm; no update step has been taken at that point.

So, the starting value for the algorithm is the one obtained with the HE estimator. After 12 iterations, the algorithm has converged to an estimate of 0.19878, as the change in the estimate from one iteration to the next is sufficiently small at this point (i.e., less than .00001). We can also visualize the progress of the algorithm by adding (red) points corresponding to the estimates at the various iterations to a profile plot of the restricted log-likelihood as a function of $\tau^2$.

<code rsplus>
# extract the tau^2 estimate at each iteration from the verbose output
tmp <- capture.output(rma(yi, vi, verbose=TRUE, digits=5))
tmp <- tmp[grepl("Iteration", tmp)]

tau2s <- as.numeric(sapply(strsplit(tmp, "="), function(x) x[2]))
lls <- sapply(tau2s, function(tau2) logLik(rma(yi, vi, tau2=tau2)))

# profile plot of the restricted log-likelihood (plot range chosen for
# illustration) with the iteration path added as red points
profile(res, xlim=c(0, 1))
points(tau2s, lls, pch=19, col="red")
</code>
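
For reference, the restricted log-likelihood being profiled here is, up to an additive constant, given by the following standard expression for the random-effects model (with $k$ studies, weights $w_i = 1/(v_i + \tau^2)$, and $\hat{\mu} = \sum w_i y_i / \sum w_i$):

$$\ell_R(\tau^2) = -\frac{1}{2} \sum_{i=1}^k \ln(v_i + \tau^2) - \frac{1}{2} \ln\left(\sum_{i=1}^k w_i\right) - \frac{1}{2} \sum_{i=1}^k w_i (y_i - \hat{\mu})^2.$$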

Below is an animation illustrating how the algorithm converges.

{{ tips: }}

==== Example 2: Convergence with Increased Number of Iterations ====

A different starting value, threshold, and maximum number of iterations can be specified via the ''control'' argument (i.e., via ''control=list(tau2.init=value, threshold=value, maxiter=value)''). In addition, the step length of the Fisher scoring algorithm can be adjusted by a desired factor via ''control=list(stepadj=value)'' (values below 1 shorten the steps). A sketch of the syntax is shown below.
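
The specific values in this sketch are arbitrary and only meant to illustrate the form of such a call:

<code rsplus>
# start at tau^2 = 0.5, use a stricter convergence threshold, allow up to
# 1000 iterations, and halve the step length of the Fisher scoring algorithm
res <- rma(yi, vi, control=list(tau2.init=0.5, threshold=1e-6,
                                maxiter=1000, stepadj=0.5))
</code>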

These options become useful when the Fisher scoring algorithm does not converge with the default settings. Let's consider an example.

<code rsplus>
yi <- c(1.30, 1.94, 0.70, 0.36, 1.31, 0.46, 1.24, 0.71, 0.35, 0.77)
vi <- c(0.640, 0.421, 0.992, 0.058, 0.756, 0.634, 0.790, 0.596, 0.457, 0.935)
res <- rma(yi, vi, verbose=TRUE, digits=5)
</code>
<code output>
Iteration 0 tau^2 = 0.00000
Iteration 1 tau^2 = 0.17086
Iteration 2 tau^2 = 0.00854
...
Iteration 98 tau^2 = 0.08035
Iteration 99 tau^2 = 0.08354
Iteration 100 tau^2 = 0.08047
Error in rma(yi, vi, verbose = TRUE, digits = 5) :
  Fisher scoring algorithm did not converge. See 'help(rma)' for possible remedies.
</code>

No convergence is achieved after 100 iterations (the default maximum). In this case, increasing the maximum number of iterations can help.

<code rsplus>
res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(maxiter=1000))
</code>
<code output>
Iteration 0 tau^2 = 0.00000
Iteration 1 tau^2 = 0.17086
Iteration 2 tau^2 = 0.00854
...
Iteration 248 tau^2 = 0.08197
Iteration 249 tau^2 = 0.08198
Iteration 250 tau^2 = 0.08197
Fisher scoring algorithm converged after 250 iterations.
</code>

It took a rather large number of iterations, but eventually convergence was achieved. An animation illustrating the progress of the algorithm is shown below.

{{ tips: }}

Interestingly, convergence can be achieved much more quickly in this example by decreasing the step length of the algorithm. Below, the step length is halved by setting ''stepadj=0.5'' via the ''control'' argument.

<code rsplus>
res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(stepadj=0.5))
</code>
<code output>
Iteration 0 tau^2 = 0.00000
Iteration 1 tau^2 = 0.08543
Iteration 2 tau^2 = 0.08205
Iteration 3 tau^2 = 0.08197
Iteration 4 tau^2 = 0.08197
Fisher scoring algorithm converged after 4 iterations.
</code>

That wasn't so difficult after all.

==== Example 3: Convergence with Adjusted Step Length ====

In fact, in some cases, convergence with the Fisher scoring algorithm is only obtained after decreasing the step length. Here is another example illustrating this scenario.

<code rsplus>
yi <- c(0.38, 0.58, -0.90, 0.32, -0.15, -0.29, 1.13, 0.39, 0.45, 0.11)
vi <- c(0.599, 0.431, 0.793, 0.599, 0.483, 0.478, 0.054, 0.453, 0.772, 0.216)
res <- rma(yi, vi, control=list(maxiter=10000))
</code>
<code output>
Error in rma(yi, vi, control = list(maxiter = 10000)) :
  Fisher scoring algorithm did not converge. See 'help(rma)' for possible remedies.
</code>

Not even 10000 iterations are enough to get the algorithm to converge. The animation below shows what the problem is.

{{ tips: }}

Note how the algorithm keeps cycling through a series of values, repeatedly overshooting the target, and never converges on the peak (I only went up to 500 iterations in the animation, but the same thing continues to happen indefinitely). This again indicates that the step length is too large here. Reducing the step length easily solves this problem.

<code rsplus>
res <- rma(yi, vi, verbose=TRUE, digits=5, control=list(stepadj=0.5))
</code>
<code output>
Iteration 0 tau^2 = 0.00000
Iteration 1 tau^2 = 0.25241
Iteration 2 tau^2 = 0.16997
Iteration 3 tau^2 = 0.16625
Iteration 4 tau^2 = 0.16651
Iteration 5 tau^2 = 0.16649
Iteration 6 tau^2 = 0.16649
Fisher scoring algorithm converged after 6 iterations.
</code>

Much better.

==== Using rma.mv() for Model Fitting ====

As explained [[tips: ]], the same model can also be fitted with the ''rma.mv()'' function, which does not use the Fisher scoring algorithm but a general-purpose optimizer (by default, ''nlminb()'') to obtain the REML estimate of the variance component. For this, we need a study identifier to specify the random effects via the ''random'' argument:

<code rsplus>
study <- 1:10
res <- rma.mv(yi, vi, random = ~ 1 | study)
round(res$sigma2, 5)
</code>
<code output>
[1] 0.16649
</code>

The REML estimate of the variance component is the same as the one obtained earlier with the Fisher scoring algorithm after applying the step length adjustment. Note that minor numerical differences are possible when using different optimization routines. We can also see this when switching to a different optimizer.

<code rsplus>
res <- rma.mv(yi, vi, random = ~ 1 | study, control=list(optimizer="optim"))
round(res$sigma2, 5)
</code>
<code output>
[1] 0.16645
</code>

==== Maximum Likelihood Estimation ====

The examples above all involved the use of restricted maximum-likelihood (REML) estimation, which is the default for models fitted with the ''rma()'' function. Let's see what happens when we use maximum-likelihood (ML) estimation for the same data.

<code rsplus>
res <- rma(yi, vi, method="ML")
round(res$tau2, 5)
</code>
<code output>
[1] 0.13701
</code>

No convergence problems with the default settings (of course, the estimate of $\tau^2$ is different). So, how well the algorithm works in any particular case can also depend on which estimator of $\tau^2$ is chosen.
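
For these data, one quick way to compare the three iterative estimators is sketched below (the step length is halved throughout, since with the default settings the REML estimator, as shown above, and the EB estimator, as shown in the next section, do not converge for these data):

<code rsplus>
# tau^2 estimates under ML, REML, and EB estimation
sapply(c("ML", "REML", "EB"), function(m)
   rma(yi, vi, method=m, control=list(stepadj=0.5))$tau2)
</code>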

==== Empirical Bayes / Paule-Mandel Estimator ====

The empirical Bayes (EB) estimator (Berkey et al., 1995; Morris, 1983) can also only be obtained iteratively. For the same data as used above, the Fisher scoring algorithm again has difficulties converging with the default settings, but this is easily remedied by adjusting the step length.

<code rsplus>
res <- rma(yi, vi, method="EB", control=list(stepadj=0.5))
round(res$tau2, 5)
</code>
<code output>
[1] 0.04575
</code>

The empirical Bayes estimator can actually be shown to be identical to the estimator described by Paule and Mandel (1982). However, while the EB estimator is obtained via the Fisher scoring algorithm, the Paule-Mandel (PM) estimator is obtained in a different way, namely via a root-finding approach (a sketch of this approach is given at the end of this section).

<code rsplus>
res <- rma(yi, vi, method="PM")
round(res$tau2, 5)
</code>
<code output>
[1] 0.04575
</code>

Same value, at least up to 5 decimal places. In fact, if we increase the accuracy (i.e., lower the threshold) and round even less, we still get the same values with the EB and PM estimators.

<code rsplus>
res <- rma(yi, vi, method="EB", control=list(stepadj=0.5, threshold=1e-8))
res$tau2
</code>
<code output>
[1] 0.04574773
</code>

<code rsplus>
res <- rma(yi, vi, method="PM")
round(res$tau2, 8)
</code>
<code output>
[1] 0.04574773
</code>
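
To make the root-finding approach concrete, here is a minimal sketch of how the PM estimator can be computed with base R's ''uniroot()'' function. This illustrates the general idea, not the package's actual implementation (the function name ''pm_tau2'' is made up for this example):

<code rsplus>
# Paule-Mandel: find the value of tau^2 at which the generalized Q-statistic
# equals its expected value (k - 1) under the random-effects model
pm_tau2 <- function(yi, vi, upper=100) {
   qgen <- function(tau2) {
      wi <- 1 / (vi + tau2)          # inverse-variance weights
      mu <- sum(wi * yi) / sum(wi)   # weighted average under these weights
      sum(wi * (yi - mu)^2) - (length(yi) - 1)
   }
   if (qgen(0) <= 0) return(0)       # truncate the estimate at zero
   uniroot(qgen, interval=c(0, upper))$root
}
round(pm_tau2(yi, vi), 5)            # should closely match res$tau2 above
</code>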

==== Afterword ====

Computing the values of the various estimators described above requires the use of [[wp>Iterative_method|iterative methods]], since no closed-form equations exist for them (in contrast to non-iterative estimators, such as the HE estimator used as the starting value earlier).

When I first started to develop the routines that now form the basis of the ''metafor'' package, I chose the Fisher scoring algorithm for obtaining the ML, REML, and EB estimates, given how commonly it is used for ML/REML estimation of variance components.

In the end, this implies that the algorithm, despite its long history and positive representation in the literature as a generally robust and effective numerical procedure for computing ML/REML estimates of variance components (e.g., Harville, 1977; Jennrich & Sampson, 1976), also inherited the potential for the less-than-desirable behavior (for certain meta-analytic models) illustrated in some of the examples above. However, in pretty much all of the cases where this issue arises, the most effective solution appears to be a reduction of the step length. Of course, I could consider making this adjustment by default, but that could lead to backwards incompatibility. And since the solution is easy to apply (and there is an easy alternative by means of the ''rma.mv()'' function), I have left the default behavior unchanged.

Two things are important to note though:

First of all, non-convergence of the algorithm is a relatively rare occurrence, so in practice it will rarely affect the user. On the other hand, simulation studies that make use of the metafor package -- where the same model is applied thousands of times to simulated data -- may require setting up 'fail-safe' mechanisms (e.g., wrapping the model fitting in ''tryCatch()''), so that occasional non-convergence does not bring the entire simulation to a halt. A minimal sketch of this is shown below.
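
This sketch assumes a simulation context in which ''yi'' and ''vi'' hold the data for the current replicate; the model fit is wrapped so that a convergence failure yields ''NA'' instead of an error:

<code rsplus>
# return NULL (and hence an NA estimate) when rma() fails to converge,
# instead of stopping the simulation with an error
fit  <- tryCatch(rma(yi, vi), error = function(e) NULL)
tau2 <- if (is.null(fit)) NA else fit$tau2
</code>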

Second, non-convergence is not a deficiency of the estimators themselves. The ML, REML, and EB/PM estimators all have a solid theoretical basis -- it's just that they cannot be computed with a simple equation, so numerical methods must be used, and any specific numerical method can sometimes fail. Such problems can usually be overcome with simple adjustments to the algorithm or by switching to another method. So, please keep this in mind when you read criticisms of iterative estimators (such as ML, REML, or EB/PM) related to non-convergence: such criticisms concern a particular algorithm for computing the estimator, not the estimator itself.

==== References ====

Berkey, C. S., Hoaglin, D. C., Mosteller, F., & Colditz, G. A. (1995). A random-effects regression model for meta-analysis. //Statistics in Medicine, 14//(4), 395--411.

Brockwell, S. E., & Gordon, I. R. (2001). A comparison of statistical methods for meta-analysis. //Statistics in Medicine, 20//(6), 825--840.

Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. //Journal of the American Statistical Association, 72//(358), 320--338.

Jennrich, R. I., & Sampson, P. F. (1976). Newton-Raphson and related algorithms for maximum likelihood variance component estimation. //Technometrics, 18//(1), 11--17.

Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and applications (with discussion). //Journal of the American Statistical Association, 78//(381), 47--65.

National Research Council (1992). //Combining information: Statistical issues and opportunities for research//. Washington, DC: National Academy Press.

Paule, R. C., & Mandel, J. (1982). Consensus values and weighting factors. //Journal of Research of the National Bureau of Standards, 87//(5), 377--385.

Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-analysis: A comparison of methods. //Statistics in Medicine, 18//(20), 2693--2708.