===== Model Selection using the glmulti and MuMIn Packages =====
Information-theoretic approaches provide methods for model selection and (multi)model inference that differ quite a bit from more traditional methods based on null hypothesis testing (e.g., Anderson, 2008).
==== Data Preparation ====
For the example, I will use data from the meta-analysis by Bangert-Drowns et al. (2004) on the effectiveness of school-based writing-to-learn interventions on academic achievement, which are included as an example dataset in the metafor package:
<code rsplus>
library(metafor)
</code>
  * length: treatment length (in weeks)
  * wic: writing in class (0 = no; 1 = yes)
  * feedback: feedback (0 = no; 1 = yes)
  * info: writing contained informational components (0 = no; 1 = yes)
  * pers: writing contained personal components (0 = no; 1 = yes)
  * imag: writing contained imaginative components (0 = no; 1 = yes)
  * meta: prompts for metacognitive reflection (0 = no; 1 = yes)

More details about the meaning of these variables can be found in Bangert-Drowns et al. (2004). For the purposes of this illustration, we will consider these seven variables as potential moderators of the treatment effect.
==== Model Selection ====

We will now examine the fit and plausibility of various models, focusing on models that contain none, one, and up to seven (i.e., all) of these moderator variables. For this, we install and load the glmulti package and define a function that (a) takes a model formula and dataset as input and (b) then fits a mixed-effects meta-regression model to the given data using maximum likelihood estimation:
<code rsplus>
install.packages("glmulti")
library(glmulti)
</code>
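The wrapper function itself is not shown in this excerpt. A minimal sketch of such a function, plus a corresponding call to ''glmulti()'', could look as follows (the names ''dat'', ''rma.glmulti'', and ''res'' and the use of ''confsetsize=128'' are illustrative assumptions, not taken from the text above; the sketch also assumes the observed outcomes are stored in ''yi'' and the sampling variances in ''vi''):

<code rsplus>
# wrapper: fit a mixed-effects meta-regression model via ML for a given formula
rma.glmulti <- function(formula, data, ...)
   rma(formula, vi, data=data, method="ML", ...)

# all-subsets search over the 7 moderators (2^7 = 128 models), ranked by AICc
res <- glmulti(yi ~ length + wic + feedback + info + pers + imag + meta,
               data=dat, level=1, fitfunction=rma.glmulti,
               crit="aicc", confsetsize=128)
</code>

Here ''level=1'' restricts the search to main effects (no interactions), and ''confsetsize=128'' keeps all 128 candidate models in the confidence set.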
The horizontal red line differentiates between models whose AICc value is less versus more than 2 units away from that of the "best" model (i.e., the model with the lowest AICc value).
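The plot referred to here can be produced with glmulti's plot method; a minimal sketch (assuming the search results are stored in ''res''):

<code rsplus>
plot(res, type="p")  # IC profile: AICc values of all models in the candidate set
</code>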
But let's take a look at the top 10 models:
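The code that generates this table is not shown in the excerpt; one way to obtain it, using glmulti's ''weightable()'' function (again assuming the results object is called ''res''), is:

<code rsplus>
top <- weightable(res)  # data frame of all models with their AICc values and weights
top[1:10,]              # the 10 models with the lowest AICc values
</code>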
We see that the "best" model comes out on top, but that several other models have AICc values within 2 units of it, so these models also deserve consideration.

So, we could now examine the "best" model in more detail.
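A sketch of how that model could be inspected: the fitted model objects are stored, ordered by the information criterion, in the ''@objects'' slot of the glmulti results object (here assumed to be called ''res''), so the top model is the first element:

<code rsplus>
summary(res@objects[[1]])  # the model with the lowest AICc
</code>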
Now we can carry out the computations for multimodel inference with:
<code rsplus>
coef(res, varweighting="Johnson")
</code>

The output is not shown, because I don't find it very intuitive. But with a bit of extra code, we can make it more interpretable:
<code rsplus>
mmi <- as.data.frame(coef(res, varweighting="Johnson"))
mmi <- data.frame(Estimate=mmi$Est, SE=sqrt(mmi$Uncond), Importance=mmi$Importance,
                  row.names=row.names(mmi))
mmi$z <- mmi$Estimate / mmi$SE
mmi$p <- 2*pnorm(abs(mmi$z), lower.tail=FALSE)
round(mmi[order(mmi$Importance, decreasing=TRUE),], 4)
</code>
<code output>
          Estimate      SE  Importance       z       p
intrcpt        ...
imag           ...
meta           ...
feedback       ...
length         ...
pers           ...
wic            ...
info       -0.0183     ...
</code>
I rounded the results to 4 digits to make them easier to interpret. Note that the table again includes the importance values. In addition, we get unconditional estimates of the model coefficients (first column). These are model-averaged parameter estimates, which are weighted averages of the model coefficients across the various models (with weights equal to the model probabilities). These values are called "unconditional" estimates, since they are not conditional on any one of the models.
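In symbols (my notation, not from the original text): if $w_i$ denotes the weight (model probability) of model $i$ and $\hat{\beta}_{j(i)}$ the estimate of the $j$th coefficient in model $i$ (set to 0 when the predictor does not appear in that model), then the model-averaged estimate is $\tilde{\beta}_j = \sum_{i=1}^{m} w_i \, \hat{\beta}_{j(i)}$, where $m$ is the number of models in the candidate set.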
==== Multimodel Predictions ====

So, the candidate set would include over $2 \times 10^8$ possible models. Fitting all of these models would not only test our patience (and would be a waste of valuable CPU cycles), it would also be a pointless exercise (even fitting the 128 models above could be critiqued by some as a mindless hunting expedition -- although if one does not get too fixated on //the// best model, but considers all the models in the set as part of a multimodel inference approach, this critique loses some of its force). So, I won't consider this any further in this example.
==== Using the MuMIn Package ====
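The MuMIn steps leading up to the output below are not shown in this excerpt. A hedged sketch of how the candidate set could be generated with MuMIn's ''dredge()'' function (the names ''full'', ''dat'', and ''res'' are illustrative assumptions):

<code rsplus>
library(MuMIn)

# full model with all 7 moderators, fitted via ML
full <- rma(yi ~ length + wic + feedback + info + pers + imag + meta,
            vi, data=dat, method="ML")

# fit all subsets of the full model, ranked by AICc
res <- dredge(full, rank="AICc")
</code>

Depending on the versions of the packages involved, additional helper methods may be needed before ''dredge()'' can handle ''rma'' model objects.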
Multimodel inference can be done with:

<code rsplus>
summary(model.avg(res))
</code>
<code output>
(full average)
            Estimate Std. Error z value Pr(>|z|)
intrcpt          ...
imag        0.351153        ...
meta        0.051201        ...
feedback         ...
length           ...
wic        -0.017004        ...
pers        0.013244        ...
info             ...
</code>
I have removed some of the output, since this is the part we are most interested in. These are the same results as in object ''mmi'' that we obtained earlier with the glmulti package.
Finally, the relative importance values for the predictors can be obtained with:

<code rsplus>
sw(res)
</code>
These are again the same values we obtained earlier.
==== Other Model Types ====

The same principle can of course be applied when fitting other types of models (e.g., multivariate/multilevel models), as long as the model fitting function used in the search is adjusted accordingly.
==== References ====

Anderson, D. R. (2008). //Model based inference in the life sciences: A primer on evidence//. New York: Springer.

Bangert-Drowns, R. L., Hurley, M. M., & Wilkinson, B. (2004). The effects of school-based writing-to-learn interventions on academic achievement: A meta-analysis. //Review of Educational Research, 74//(1), 29–58.
tips/model_selection_with_glmulti_and_mumin.txt · Last modified: 2022/10/13 06:07 by Wolfgang Viechtbauer