The metafor Package

A Meta-Analysis Package for R

todo

Differences

This shows you the differences between two versions of the page.

 — todo [2019/07/23 14:24] (current) Line 1: Line 1: + ===== To-Do List / Planned Features ===== + + Below is a to-do list and a description of planned features that I would like to implement in the metafor package at some point in the future. This list is dynamic, incomplete, somewhat disorganized,​ and I cannot provide any kind of timeline when I will get around to implementing some of those things. If you are interested in working on any of these topics -- or you think there are other things that really should make it into the package -- let me know and we can discuss this further. + + ==== Psychometric Meta-Analysis ==== + + At some point, I plan on adding aspects of what is commonly called the Hunter and Schmidt (H&S) method for meta-analysis. To find out more about this approach, you could take a look at the book: + + [[http://​www.amazon.com/​Methods-Meta-Analysis-Correcting-Research-Findings/​dp/​1452286892/​|Schmidt,​ F. L., & Hunter, J. E. (2014). Methods of meta-analysis:​ Correcting error and bias in research findings (3rd ed.). Sage.]] + + While the principles underlying this method can be applied to various outcome measures used in meta-analyses,​ it is most commonly used when meta-analyzing correlation coefficients and is widely used in particular areas of research (especially in [[wp>​Industrial_and_organizational_psychology|industrial and organization psychology]] and related areas). The crucial idea is that we want to correct the observed correlation coefficients for various artifacts (e.g., [[wp>​Correction_for_attenuation|attenuation due to measurement error]]), so that the aggregated results reflect the direction and strength of the relationship between the two variables of interest if they could be measured under ideal circumstances (e.g., free of any artifacts such as measurement error). + + There are various aspects to the H&S method that deviate a bit from the statistical/​theoretical framework underlying the ''​rma()''​ function (it would take some time to go into the specifics, so I might write up a more detailed description at some point in the future separately). So, this implies two approaches one can take here: + + - Write a completely different set of functions that exactly replicate the H&S method (and the corresponding software). + - Make it easy for users to specify the necessary information to correct correlations individually for various artifacts and then feed those corrected correlations into the modeling functions already available. Some tweaks/​adjustments are possible even at the moment to obtain results from a '​bare-bones meta-analysis'​ that are quite similar to those obtained with the H&S method. + + Option 1 would take a lot more work and would require separately maintaining that part of the package together with the rest. The advantage of course would be that users who really want to carry out the H&S method "by the letter"​ can switch to R. + + Option 2 is what I am currently aiming for, because it is easier to implement and fits into the existing package framework. For example, to correct a correlation coefficient for measurement error, the user would need to specify, besides the coefficient and the sample size, also the reliability estimate(s) (for one or both variables). The rest is just the correction for attenuation. This option does not preclude that, at some point I (or anybody else) would write code to implement a line-by-line reimplementation of the H&S method. But that is currently not at the top of my to-do list. + + Note that there is now a discussion of the H&S method under the tips section (see: [[tips:​hunter_schmidt_method|]]). Also, there is now the [[https://​cran.r-project.org/​package=psychmeta|psychmeta]] package, which is specifically geared towards psychometric meta-analyses. + + ==== Bayesian Meta-Analysis ==== + + I get this question quite a bit and the answer is: Yes, I definitely would like to implement methods for fitting meta-analytic models using Bayesian methods. In fact, I had developed code for this a long time ago (at the time, interfacing with [[http://​www.mrc-bsu.cam.ac.uk/​software/​bugs/​the-bugs-project-winbugs/​|WinBUGS]]),​ but it never quite made it into the package and then I got side-tracked with getting other functionality implemented first. I picked up that idea at some point, then started to develop code that would work with [[http://​mcmc-jags.sourceforge.net/​|JAGS]] (I would in fact prefer a solution that will work across all platforms), and got side-tracked again. So much to do, so little time ... + + There are a couple issues: + + - As a start, I will implement this for models that can be fitted with the ''​rma()''​ function (i.e., '​normal-normal'​ models). + - What choice of priors should be available? For the model coefficients (e.g., $\mu$ in a random-effects model), most people seem to use normal distributions (usually vague ones, with large variances). What if somebody is analyzing proportions or correlations?​ Those are bounded between [0,1] and [-1,1], respectively -- the normal distribution doesn'​t quite make sense here (leaving aside the issue whether one should meta-analyze raw proportions and raw correlations to begin with; there are obvious transformations that would make this issue moot). How about priors for $\tau^2$? There are [[http://​www.ncbi.nlm.nih.gov/​pubmed/​16015676|lots of choices]] and they can have an impact (but some are more realistic than others, as discussed by [[http://​www.ncbi.nlm.nih.gov/​pubmed/​16906552|Senn]]). So, my thinking is to have a default (with maybe a few alternative choices), but allow a sort of '​plug-in'​ interface, where users can specify 'prior functions'​. That will provide the largest flexibility,​ but will take some time to develop. + - Even with '​normal-normal'​ models, there are some subtle decisions one has to make when using a Bayesian approach. For example, suppose one is fitting a Bayesian '​normal-normal'​ model to logit-transformed proportions. The sampling variance of ''​logit(pi)''​ is ''​1/​(ni*pi*(1-pi))''​. At least, that's the large-sample approximation to ''​var(logit(pi))''​. But ''​var(logit(pi))''​ depends on ''​pi''​ itself. So, one can either compute the sampling variances and leave them as they are or, during the Gibbs sampling, update the sampling variance estimate for each study based on the (back-transformed) predicted ''​logit(pi)''​ value. This can have quite some impact on the results. + - How about other models besides the '​normal-normal'​ ones? In particular, Bayesian alternatives to the models that can be fitted with the ''​rma.glmm()''​ function. That's going to take more time. Also, do I put this into the same function or write separate functions for that? Hmmm, choices, choices. + - And then there is the ''​rma.mv()''​ function. Multilevel, multivariate,​ and especially network meta-analyses are the 'hot new things'​ these days (and in some areas not even so hot anymore) and those types of models can be fitted with this function. But ''​rma.mv()''​ is already a monster to begin with (it's quite flexible, but that flexibility frequently induces headaches when I stare at the code -- and I was the one who wrote it!) and writing a Bayesian analogue to it makes me wake up at night drenched in sweat and still shaking from the terrors that lie beneath. + + And I probably forgot some important issues ... + + ==== Extending rma.glmm() to Multilevel/​Multivariate Models ==== + + At the moment, the ''​rma.glmm()''​ function only handles relatively standard random/​mixed-effects (meta-regression) models -- well, to be honest, most meta-analyses that could benefit from the underlying functionality implemented in that function still stick to simpler approximations,​ despite the fact that there are good reasons to switch. Anyway, it would be nice if ''​rma.glmm()''​ could be extended to handle additional random effects (the way ''​rma.mv()''​ extends the functionality of the ''​rma()''​ function). While this certainly would be useful, most of the code of the ''​rma.glmm()''​ function is just a bunch of data manipulation and pre/​post-processing,​ so that other functions/​packages (most notably, ''​glm()''​ and the ''​glmer()''​ function from the excellent [[http://​cran.r-project.org/​web/​packages/​lme4/​index.html|lme4]] package) can be used for the model fitting (the exception to this is the conditional logistic model, which results when you condition on the total number of events in each 2x2 table, so that the data in the tables can be modeled with non-central hypergeometric distributions -- this I had to implement, in a rather brute-force manner, more or less from scratch). So, if you want to fit more complicated models based on the same underlying logic, it would probably just make sense to switch to ''​lme4''​ (or similar/​related packages) directly, instead of waiting for me to extend the ''​rma.glmm()''​ function (which may not happen any time soon, as I frankly do not see such a pressing need here -- but this may change in the future). + + ==== Alternative Graphing Systems ==== + + Right now, all of the figures and graphs that can be created with the package are drawn using 'base graphics'​. Of course, there are [[http://​cran.r-project.org/​web/​views/​Graphics.html|alternative graphing systems]], most notably [[http://​cran.r-project.org/​web/​packages/​lattice/​index.html|lattice]] (authored by Deepayan Sarkar) and [[http://​cran.r-project.org/​web/​packages/​ggplot2/​index.html|ggplot2]] (authored by Hadley Wickham) -- and especially the latter has become extremely popular in recent years. + + Fun fact: The grammar of graphics, which ''​ggplot2''​ is based on, was written/​developed by [[wp>​Leland_Wilkinson|Leland Wilkinson]],​ who developed the [[wp>​SYSTAT_(statistics)|SYSTAT]] statistical package (which I used for a bit during grad school) and worked for [[http://​www-01.ibm.com/​software/​analytics/​spss/​|SPSS]],​ which some may consider the complete antithesis of what R represents. And while we are at it, the Statistical Computing and Graphics sections of the American Statistical Association have an interesting [[http://​stat-graphics.org/​movies/​|video library]] showcasing some of the history of systems for dynamic data visualizations. + + At any rate, it would be nice to provide the option to create all figures and graphs in metafor using such alternative systems, especially given some of the limitations of the current approach (for a nice discussion of the underlying issue, see the article "​What'​s in a Name: The Importance of Naming grid Grobs When Drawing Plots in R" by Paul Murrell in the [[http://​journal.r-project.org/​archive/​2012-2/​|R Journal, Volume 4/2, December 2012]]. + + ==== Selection Models ==== + + I am considering adding at least some version of a selection model to the package. But then again, there are the [[http://​cran.r-project.org/​web/​packages/​metasens/​index.html|metasens]] and [[http://​cran.r-project.org/​web/​packages/​selectMeta/​index.html|selectMeta]] packages, so this isn't all that pressing. But I have some ideas based on a recent model proposed by [[http://​www.air.org/​person/​martyna-citkowicz|Martyna Citkowicz]] and [[http://​faculty.ucmerced.edu/​jvevea/​|Jack Vevea]] that I think would make for a nice addition, but I am waiting for the paper to appear in press to consider this is in more detail. + + ==== Diagnostic Meta-Analysis Models ==== + + The synthesis of diagnostic studies is of course a major application of meta-analytic methodology. Measures such as sensitivity and specificity can of course be easily meta-analyzed with the metafor package, but for things like the hierarchical summary receiver operating characteristic (HSROC) model and the binomial-normal version of the bivariate model (the normal-normal bivariate model can be handled with ''​rma.mv()''​),​ one currently has to turn to other packages (see the [[http://​cran.r-project.org/​web/​views/​MetaAnalysis.html|CRAN Task View: Meta-Analysis]] for some options). Then again, no need to duplicate existing functionality,​ so this is also not too high on my priority list. + + ==== Miscellaneous Features and Things To Do ==== + + Lots of other/​smaller things: + + * The ''​anova()''​ function should really automatically test appropriate subsets of coefficients for model objects returned by the ''​rma()'',​ ''​rma.glmm()'',​ and ''​rma.mv()''​ functions (like it does for other model fitting functions). I kind of messed up when I made some early design choices that now make implementing this functionality more tricky (if I don't want to break backwards compatibility or rewrite large chunks of code from scratch). + + * Make ''​anova()''​ work for ''​rma.glmm''​ objects. + + * The ''​rma()''​ function may be extended to fit so-called '​location-scale'​ models at some point in the future. Just implementing this isn't really sufficient though -- most people wouldn'​t know what that even means or how it can be useful, so I would need to write an entire article about this method and illustrate its use. + + * <​del>​The ''​rma.mv()''​ function should allow for one (or multiple?) //​continuous//​ ''​inner''​ variables in ''​random = ~ inner | outer''​ for modeling random slopes of moderator variables. This is especially relevant for meta-analyses examining '​dose-response'​ relationships (assuming there are studies examining the responses to more than two levels of a dose/​exposure variable).​ (DONE!) + + * Compute SEs of variance components (and covariances/​correlations) in ''​rma.mv()''​. Not that those SEs are really all that useful. <​del>​Probably more important: Make ''​confint()''​ work with ''​rma.mv''​ objects, to provide profile likelihood CIs.​ (DONE!). + + * Also, make ''​blup()''​ <​del>​and ''​ranef()''​ work with ''​rma.mv''​ objects (and ''​rma.glmm''​ objects?). + + * <​del>​Allow ''​ID''​ (for a scaled identity matrix) and ''​DIAG''​ (for a diagonal matrix) for the ''​struct''​ argument in ''​rma.mv()''?​ But setting ''​rho=0''​ with ''​struct="​CS"''​ and ''​struct="​HCS"''​ does the same thing, so is this really necessary?​ (DONE!). + + * <​del>​Maybe add a //​continuous//​ AR1 structure to ''​rma.mv()''​ (DONE!). + + * <​del>​Maybe add a ''​ranef()''​ function (with alias ''​random.effects()''​ as in the [[http://​cran.r-project.org/​web/​packages/​nlme/​index.html|nlme]] package) to just extract the predicted random effects (i.e., free of the fixed effects).​ (DONE!) + + * The ''​rma.glmm()''​ function currently uses ±1/2 coding for the random effects when ''​model="​UM.FS"''​ (i.e., an unconditional model with fixed study effects). Maybe provide an option that allows for switching to 0/1 coding? + + * <​del>​Add the difference, ratio, and odds ratio based on //paired// proportions as outcome measures to ''​escalc()''​.​ (DONE!) + + * <​del>​Add the partial and semi-partial correlation as outcome measures to ''​escalc()''​.​ (DONE!) + + * <​del>​Add measures of variability (variance, standard deviation, and the difference/​ratio thereof) as outcome measure to ''​escalc()''​. See also the recent article by [[http://​onlinelibrary.wiley.com/​doi/​10.1111/​2041-210X.12309/​abstract|Nakagawa et al. (2015)]].​ (DONE!) + + * Maybe allow optimization in ''​rma.mv()''​ (and while we are at it, also ''​rma.uni()''​) to also include the fixed effects (instead of profiling out the fixed effects). Then we can profile over the fixed effects to obtain profile likelihood CIs and LRTs for the model coefficients as well. + + * Consider making use of some functionality from the [[http://​cran.r-project.org/​web/​packages/​matrixStats/​index.html|matrixStats]] package. + + * Allow '​textual'​ specifications of contrasts/​hypotheses via the ''​L''​ argument in the ''​anova()''​ function? (as in the [[http://​cran.r-project.org/​web/​packages/​car/​index.html|car]] package). + + * Make metafor compatible with the [[https://​cran.r-project.org/​package=lsmeans|lsmeans]] and/or [[https://​cran.r-project.org/​package=effects|effects]] package(s). + + * Add more analysis examples to this website, especially for some network meta-analyses. + + * Write more tests. Really. I mean it. Lots of them. + + ==== Some Additional Thoughts/​Considerations ==== + + * When I get the chance, I would like to spend some time examining to what extent using other math libraries speeds things up, especially for ''​rma.mv()''​ and large datasets. Most of the heavy lifting in metafor in fact involves matrix algebra, so maybe some real gains could be obtained this way. But this is only relevant if you are dealing with relatively large datasets (in the meta-analytic context, anything beyond a couple hundred effect size estimates is //huge//!). Note: I added an entry to the tips section on [[tips:​speeding_up_model_fitting|Speeding Up Model Fitting]] that addresses this issue to some extent. + + * Alternatively,​ consider writing some of the slower parts directly in C++ (via [[http://​cran.r-project.org/​web/​packages/​Rcpp/​index.html|Rccp]];​ see also [[ http://​rcpp.org/​]] and [[http://​adv-r.had.co.nz/​Rcpp.html]]). + + * Various things are currently computed for ''​method="​FE"''​ as if we are fitting an equal-effects model (e.g., residuals, log likelihood, and so on), when in fact the fixed-effects model is more general. But making the appropriate changes would lead to backwards incompatibility,​ would just confuse most users, and it's not clear if all of the necessary adjustments can be easily implemented.