Table of Contents
Interpreting Coefficients in Meta-Regression Models with (Log) Risk Ratios
In this tutorial, we will examine ways of interpreting the coefficients in meta-regression models when the log risk ratio is used as the effect size measure.
Data Preparation
For this illustration, I will use (once again) the BCG vaccine dataset. To start, let's compute the log risk ratios (and corresponding sampling variances) for each study. For this illustration, we will compute the log risk ratios in such a way that they reflect the difference in the (log-transformed) risk of a tuberculosis infection in those from the control (i.e., not vaccinated) group compared to those from the treatment (i.e., vaccinated) group:1)
library(metafor) dat <- escalc(measure="RR", ai=cpos, bi=cneg, ci=tpos, di=tneg, data=dat.bcg) dat
trial author year tpos tneg cpos cneg ablat alloc yi vi 1 1 Aronson 1948 4 119 11 128 44 random 0.8893 0.3256 2 2 Ferguson & Simes 1949 6 300 29 274 55 random 1.5854 0.1946 3 3 Rosenthal et al 1960 3 228 11 209 42 random 1.3481 0.4154 4 4 Hart & Sutherland 1977 62 13536 248 12619 52 random 1.4416 0.0200 5 5 Frimodt-Moller et al 1973 33 5036 47 5761 13 alternate 0.2175 0.0512 6 6 Stein & Aronson 1953 180 1361 372 1079 44 alternate 0.7861 0.0069 7 7 Vandiviere et al 1973 8 2537 10 619 19 random 1.6209 0.2230 8 8 TPT Madras 1980 505 87886 499 87892 13 random -0.0120 0.0040 9 9 Coetzee & Berjak 1968 29 7470 45 7232 27 random 0.4694 0.0564 10 10 Rosenthal et al 1961 17 1699 65 1600 42 systematic 1.3713 0.0730 11 11 Comstock et al 1974 186 50448 141 27197 18 systematic 0.3394 0.0124 12 12 Comstock & Webster 1969 5 2493 3 2338 33 systematic -0.4459 0.5325 13 13 Comstock et al 1976 27 16886 29 17825 33 systematic 0.0173 0.0714
The yi
values are the observed log risk ratios for the 13 trials, reflecting the (usually) higher risk of an infection in those who did NOT receive the BCG vaccine compared to those who did (while the vi
values are the corresponding sampling variances).
Meta-Regression with a Categorical Moderator
Now let's fit a meta-regression model to these outcomes with the three-level alloc
factor as a moderator:
res <- rma(yi, vi, mods = ~ factor(alloc), data=dat) res
Mixed-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimated amount of residual heterogeneity): 0.3615 (SE = 0.2111) tau (square root of estimated tau^2 value): 0.6013 I^2 (residual heterogeneity / unaccounted variability): 88.77% H^2 (unaccounted variability / sampling variability): 8.91 R^2 (amount of heterogeneity accounted for): 0.00% Test for Residual Heterogeneity: QE(df = 10) = 132.3676, p-val < .0001 Test of Moderators (coefficients 2:3): QM(df = 2) = 1.7675, p-val = 0.4132 Model Results: estimate se zval pval ci.lb ci.ub intrcpt 0.5180 0.4412 1.1740 0.2404 -0.3468 1.3827 factor(alloc)random 0.4478 0.5158 0.8682 0.3853 -0.5632 1.4588 factor(alloc)systematic -0.0890 0.5600 -0.1590 0.8737 -1.1867 1.0086 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
While the factor as a whole is not significant (as we can see from the $Q_M$-test), we will ignore this for this illustration. So, how do we interpret the coefficients from this model? To start, the intercept is the estimated average log risk ratio for alternate
(alternating) allocation (since we see coefficients for random
and systematic
in the model, we know that alternate
was automatically chosen as the reference level). However, a log risk ratio is a bit difficult to interpret, so we can exponentiate this coefficient to obtain an estimate of the average risk ratio:2)
exp(coef(res)[[1]])
[1] 1.678593
To also obtain the corresponding confidence interval (CI), we can use the predict()
function (and we might as well round the results to two decimal places, because anything beyond that is probably just noise):
predict(res, newmods=c(0,0), transf=exp, digits=2)
pred ci.lb ci.ub pi.lb pi.ub 1.68 0.71 3.99 0.39 7.24
Hence, the estimated average risk ratio (of a tuberculosis infection in those NOT receiving the BCG vaccine compared to those who did) in trials using alternating allocation is 1.68 (with 95% CI: 0.71 to 3.99). Another way of expressing the same result is to say that the risk was on average 1.68 times (or 68%) greater in those who did not receive the vaccine compared to those who did.
The other two coefficients of the model reflect the difference in the (average) log risk ratio when comparing random
and systematic
allocation with alternate
allocation. Hence, if we want to know what the average risk ratio for random allocation was, we just add the second coefficient to the intercept and exponentiate the result:
exp(coef(res)[[1]] + coef(res)[[2]])
[1] 2.62682
Again, using the predict()
function also provides us with the CI:
predict(res, newmods=c(1,0), transf=exp, digits=2)
pred ci.lb ci.ub pi.lb pi.ub 2.63 1.56 4.44 0.72 9.54
Note that predict()
(by default) automatically adds the intercept term, so using newmods=c(1,0)
gives us the linear combination intercept + coefficient for random * 1 + coefficient for systematic * 0
which is then exponentiated to obtain the desired result. Hence, under this form of allocation, we estimate an average risk ratio of 2.63 (95% CI: 1.56 to 4.44) or a 2.63 times greater risk in non-vaccinated individuals.
The average risk ratio was therefore greater under random compared to alternating allocation. How much greater? $2.63 / 1.68 \approx 1.56$ times greater! This is a ratio of risk ratios (RRR), which we can also obtain directly by exponentiating the coefficient for random allocation:
exp(coef(res)[[2]])
[1] 1.564894
To get the CI for this value, we can again use the predict()
function, but now we just want to pick out the coefficient for random allocation and not add the intercept, so we use:
predict(res, newmods=c(1,0), intercept=FALSE, transf=exp, digits=2)
pred ci.lb ci.ub pi.lb pi.ub 1.56 0.57 4.30 0.33 7.39
Hence, the average risk ratio for random allocation is 1.56 times greater compared to alternating allocation (95% CI: 0.57 to 4.30).
To illustrate that this is indeed a ratio of the two risk ratios for these two subgroups of trials, let's fit a random-effects model in the two subgroups:
res.a <- rma(yi, vi, data=dat, subset=alloc=="alternate", tau2=res$tau2) res.r <- rma(yi, vi, data=dat, subset=alloc=="random", tau2=res$tau2)
We need to set $\tau^2$ to the value from the meta-regression model (otherwise, due to different $\tau^2$ values in the two subgroups, the following will not give us the same result). Now we can obtain the estimated average risk ratio via exponentiation in each subgroup and take the ratio:
exp(coef(res.a)) / exp(coef(res.r))
[1] 1.564894
This is the exact same value we obtained above. Hence, we see that we are indeed getting an estimate of the ratio of two risk ratios when we exponentiate a coefficient from the meta-regression model.
Meta-Regression with a Continuous Moderator
The same principle also applies when the model involves a continuous moderator. Let's use the absolute latitude (i.e., the distance from the equator) as such a moderator:
res <- rma(yi, vi, mods = ~ ablat, data=dat) res
Mixed-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimated amount of residual heterogeneity): 0.0764 (SE = 0.0591) tau (square root of estimated tau^2 value): 0.2763 I^2 (residual heterogeneity / unaccounted variability): 68.39% H^2 (unaccounted variability / sampling variability): 3.16 R^2 (amount of heterogeneity accounted for): 75.62% Test for Residual Heterogeneity: QE(df = 11) = 30.7331, p-val = 0.0012 Test of Moderators (coefficient 2): QM(df = 1) = 16.3571, p-val < .0001 Model Results: estimate se zval pval ci.lb ci.ub intrcpt -0.2515 0.2491 -1.0095 0.3127 -0.7397 0.2368 ablat 0.0291 0.0072 4.0444 <.0001 0.0150 0.0432 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Based on this model, we therefore estimate that the average risk ratio at the equator is:3)
predict(res, newmods=0, transf=exp, digits=2)
pred ci.lb ci.ub pi.lb pi.ub 0.78 0.48 1.27 0.38 1.61
The coefficient for ablat
reflects the change in the average log risk ratio for a one-unit (i.e., degree) increase in absolute latitude. Hence, for example at 30 degrees, the estimated average risk ratio is:
predict(res, newmods=30, transf=exp, digits=2)
pred ci.lb ci.ub pi.lb pi.ub 1.86 1.51 2.30 1.04 3.33
This risk ratio is $1.86 / 0.78 \approx 2.39$ times greater than the one at the equator. This is the same as taking the coefficient, multiplying it by 30, and then exponentiating the result:
exp(coef(res)[[2]] * 30)
[1] 2.394202
With predict()
, we can again obtain the corresponding CI:
predict(res, newmods=30, intercept=FALSE, transf=exp, digits=2)
pred ci.lb ci.ub pi.lb pi.ub 2.39 1.57 3.66 1.20 4.76
So, once again, this value actually reflects the ratio of two average risk ratios (at 30 degrees absolute latitude versus the equator).
The absolute values of the moderator variable do not actually matter here. We could just as well compare 40 with 10 degrees (or any other 30 degree difference) to obtain the same result. For example:
predict(res, newmods=10, transf=exp, digits=2) predict(res, newmods=40, transf=exp, digits=2)
yields
pred ci.lb ci.ub pi.lb pi.ub 1.04 0.72 1.50 0.54 2.00
and
pred ci.lb ci.ub pi.lb pi.ub 2.49 1.96 3.17 1.38 4.51
which again gives us a ratio of $2.49 / 1.04 \approx 2.39$.
Other Effect Size Measures
The same principle applies to other log-transformed effect size measures, such as the log odds ratio, the log incidence ratio, and the log-transformed ratio of means (also called the log response ratio), except that we would then get a ratio of odds ratios, a ratio of incidence ratios, or a ratio of the ratio of two means (yes, that sounds quite convoluted).
In fact, in regular logistic regression modeling, this phenomenon is well known. Let's say we fit a logistic regression model with two predictors and also include their interaction in the model. When we then exponentiate the coefficient for the interaction term, we get a ratio of two odds ratios. See here for a discussion of this.
ablat
moderator is 13 to 55 degrees, so this is an extrapolation, which should be treated with great caution.