Assembling Data for a Meta-Analysis of (Log) Odds Ratios
Suppose the goal of a meta-analysis is to aggregate the results from studies contrasting two groups (e.g., treatment versus control) and each study measured a dichotomous outcome of interest (e.g., treatment success versus failure). A commonly used effect size measure for quantifying the size of the group difference (i.e., the size of the treatment effect) is then the odds ratio.
As an example, consider the data reported in Colditz et al. (1994) on the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine against tuberculosis (for this illustration, we will remove some variables that are not further needed):
library(metafor)
dat.bcg <- dat.bcg[,c(2:7)]
dat.bcg

                 author year tpos  tneg cpos  cneg
1               Aronson 1948    4   119   11   128
2      Ferguson & Simes 1949    6   300   29   274
3       Rosenthal et al 1960    3   228   11   209
4     Hart & Sutherland 1977   62 13536  248 12619
5  Frimodt-Moller et al 1973   33  5036   47  5761
6       Stein & Aronson 1953  180  1361  372  1079
7      Vandiviere et al 1973    8  2537   10   619
8            TPT Madras 1980  505 87886  499 87892
9      Coetzee & Berjak 1968   29  7470   45  7232
10      Rosenthal et al 1961   17  1699   65  1600
11       Comstock et al 1974  186 50448  141 27197
12   Comstock & Webster 1969    5  2493    3  2338
13       Comstock et al 1976   27 16886   29 17825
Variables tpos and tneg indicate the number of TB positive and TB negative cases in the treated (vaccinated) group, while variables cpos and cneg indicate the number of TB positive and TB negative cases in the control (non-vaccinated) group. The data of each study can be arranged in terms of a 2×2 table of the form:
        | TB+    TB-
--------+------------
Treated | tpos   tneg
Control | cpos   cneg
With this information, we can compute the log odds ratio (and corresponding sampling variance) for each study with:
dat1 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat1

                 author year tpos  tneg cpos  cneg      yi     vi
1               Aronson 1948    4   119   11   128 -0.9387 0.3571
2      Ferguson & Simes 1949    6   300   29   274 -1.6662 0.2081
3       Rosenthal et al 1960    3   228   11   209 -1.3863 0.4334
4     Hart & Sutherland 1977   62 13536  248 12619 -1.4564 0.0203
5  Frimodt-Moller et al 1973   33  5036   47  5761 -0.2191 0.0520
6       Stein & Aronson 1953  180  1361  372  1079 -0.9581 0.0099
7      Vandiviere et al 1973    8  2537   10   619 -1.6338 0.2270
8            TPT Madras 1980  505 87886  499 87892  0.0120 0.0040
9      Coetzee & Berjak 1968   29  7470   45  7232 -0.4717 0.0570
10      Rosenthal et al 1961   17  1699   65  1600 -1.4012 0.0754
11       Comstock et al 1974  186 50448  141 27197 -0.3408 0.0125
12   Comstock & Webster 1969    5  2493    3  2338  0.4466 0.5342
13       Comstock et al 1976   27 16886   29 17825 -0.0173 0.0716
Note that the escalc() function directly computes the log-transformed odds ratios, as these are the values we need for a meta-analysis. A negative log odds ratio indicates that the odds of a TB infection were lower in the treated group compared to the control group in a particular study.
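To make the computation concrete, the following base R sketch reproduces the values for the first study (Aronson, 1948) by hand. It assumes the standard formulas for the log odds ratio, $\log(ad/(bc))$, and its large-sample sampling variance, $1/a + 1/b + 1/c + 1/d$; the variable names simply mirror the escalc() arguments and are chosen for illustration only.

```r
# 2x2 table counts for study 1 (Aronson, 1948), taken from the data above
ai <- 4    # treated, TB positive
bi <- 119  # treated, TB negative
ci <- 11   # control, TB positive
di <- 128  # control, TB negative

# log odds ratio and its large-sample sampling variance
# (assumed standard formulas; no continuity correction is needed here
# because none of the four cells is zero)
yi <- log((ai * di) / (bi * ci))
vi <- 1/ai + 1/bi + 1/ci + 1/di

round(c(yi = yi, vi = vi), 4)
```

The rounded results, -0.9387 and 0.3571, agree with the first row of the escalc() output.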
A random-effects model can then be fitted to these data with:
res1 <- rma(yi, vi, data=dat1)
res1

Random-Effects Model (k = 13; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.3378 (SE = 0.1784)
tau (square root of estimated tau^2 value):      0.5812
I^2 (total heterogeneity / total variability):   92.07%
H^2 (total variability / sampling variability):  12.61

Test for Heterogeneity:
Q(df = 12) = 163.1649, p-val < .0001

Model Results:

estimate      se     zval    pval    ci.lb    ci.ub
 -0.7452  0.1860  -4.0057  <.0001  -1.1098  -0.3806  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Therefore, the estimated average log odds ratio is equal to $\hat{\mu} = -0.75$ (with 95% CI: $-1.11$ to $-0.38$). For easier interpretation, we can back-transform the results with:
predict(res1, transf=exp, digits=2)
 pred ci.lb ci.ub pi.lb pi.ub
 0.47  0.33  0.68  0.14  1.57
The odds of a TB infection are therefore estimated to be approximately half as large on average in vaccinated groups (i.e., an odds ratio of $0.47$ with 95% CI: $0.33$ to $0.68$), or put differently, we can say that the odds of infection are on average 53% lower (i.e., $1 - 0.47 = 0.53$) in vaccinated groups. However, there is a considerable amount of heterogeneity in the findings (as indicated by the large estimate of $\tau^2$, the wide prediction interval, the large $I^2$ value, and the significant $Q$-test).
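The back-transformed values can also be checked by hand. The sketch below exponentiates the estimate and interval bounds, using the rounded values copied from the model output. For the prediction interval it assumes the usual normal-approximation form, $\hat{\mu} \pm 1.96 \sqrt{SE^2 + \hat{\tau}^2}$, which is an assumption about what predict() computes for this model rather than something stated above.

```r
# values copied (rounded) from the rma() output above
mu   <- -0.7452   # estimated average log odds ratio
se   <-  0.1860   # standard error of the estimate
tau2 <-  0.3378   # estimated amount of heterogeneity (tau^2)

crit <- qnorm(0.975)  # critical value for a 95% interval

# confidence interval for the average effect, on the odds ratio scale
ci.bounds <- exp(mu + c(-1, 1) * crit * se)

# prediction interval (assumed form): adds tau^2 to the squared standard error
pi.bounds <- exp(mu + c(-1, 1) * crit * sqrt(se^2 + tau2))

res <- round(c(pred = exp(mu),
               ci.lb = ci.bounds[1], ci.ub = ci.bounds[2],
               pi.lb = pi.bounds[1], pi.ub = pi.bounds[2]), 2)
res
```

Under these assumptions, the rounded values are 0.47, 0.33, 0.68, 0.14, and 1.57, in line with the predict() output.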
Now suppose that the 2×2 table data are not reported in all studies, but that the following dataset could be assembled based on information reported in the studies:
# start from the study-level summary (adds sei, zi, pval, ci.lb, ci.ub)
dat2 <- data.frame(summary(dat1))
# replace the estimates and CI bounds with their back-transformed (odds ratio) values
dat2[c("yi", "ci.lb", "ci.ub")] <- data.frame(summary(dat1, transf=exp))[c("yi", "ci.lb", "ci.ub")]
names(dat2)[which(names(dat2) == "yi")] <- "or"
# round to two digits, as such values would typically be reported
dat2[,c("or","ci.lb","ci.ub","pval")] <- round(dat2[,c("or","ci.lb","ci.ub","pval")], digits=2)
dat2$vi <- dat2$sei <- dat2$zi <- NULL
dat2$ntot <- with(dat2, tpos + tneg + cpos + cneg)
# studies 1 and 12: keep only the odds ratio and p-value
dat2[c(1,12),c(3:6,9:10)] <- NA
# studies 4 and 9: keep only the odds ratio and CI bounds
dat2[c(4,9), c(3:6,8)] <- NA
# remaining studies: keep only the 2x2 table counts
dat2[c(2:3,5:8,10:11,13),c(7:10)] <- NA
dat2

                 author year tpos  tneg cpos  cneg   or pval ci.lb ci.ub   ntot
1               Aronson 1948   NA    NA   NA    NA 0.39 0.12    NA    NA    262
2      Ferguson & Simes 1949    6   300   29   274   NA   NA    NA    NA    609
3       Rosenthal et al 1960    3   228   11   209   NA   NA    NA    NA    451
4     Hart & Sutherland 1977   NA    NA   NA    NA 0.23   NA  0.18  0.31  26465
5  Frimodt-Moller et al 1973   33  5036   47  5761   NA   NA    NA    NA  10877
6       Stein & Aronson 1953  180  1361  372  1079   NA   NA    NA    NA   2992
7      Vandiviere et al 1973    8  2537   10   619   NA   NA    NA    NA   3174
8            TPT Madras 1980  505 87886  499 87892   NA   NA    NA    NA 176782
9      Coetzee & Berjak 1968   NA    NA   NA    NA 0.62   NA  0.39  1.00  14776
10      Rosenthal et al 1961   17  1699   65  1600   NA   NA    NA    NA   3381
11       Comstock et al 1974  186 50448  141 27197   NA   NA    NA    NA  77972
12   Comstock & Webster 1969   NA    NA   NA    NA 1.56 0.54    NA    NA   4839
13       Comstock et al 1976   27 16886   29 17825   NA   NA    NA    NA  34767
In particular, in studies 1 and 12, the authors reported only the odds ratio and the corresponding p-value (based on a Wald-type test of whether the log odds ratio differs significantly from 0), and in studies 4 and 9, the authors reported only the odds ratio and the corresponding 95% Wald-type confidence interval bounds. Given only this information, it is possible to reconstruct the full dataset for the meta-analysis.
First, we use the escalc() function as before.
dat2 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat2)
dat2

   . tpos  tneg cpos  cneg   or pval ci.lb ci.ub   ntot      yi     vi
1  .   NA    NA   NA    NA 0.39 0.12    NA    NA    262      NA     NA
2  .    6   300   29   274   NA   NA    NA    NA    609 -1.6662 0.2081
3  .    3   228   11   209   NA   NA    NA    NA    451 -1.3863 0.4334
4  .   NA    NA   NA    NA 0.23   NA  0.18  0.31  26465      NA     NA
5  .   33  5036   47  5761   NA   NA    NA    NA  10877 -0.2191 0.0520
6  .  180  1361  372  1079   NA   NA    NA    NA   2992 -0.9581 0.0099
7  .    8  2537   10   619   NA   NA    NA    NA   3174 -1.6338 0.2270
8  .  505 87886  499 87892   NA   NA    NA    NA 176782  0.0120 0.0040
9  .   NA    NA   NA    NA 0.62   NA  0.39  1.00  14776      NA     NA
10 .   17  1699   65  1600   NA   NA    NA    NA   3381 -1.4012 0.0754
11 .  186 50448  141 27197   NA   NA    NA    NA  77972 -0.3408 0.0125
12 .   NA    NA   NA    NA 1.56 0.54    NA    NA   4839      NA     NA
13 .   27 16886   29 17825   NA   NA    NA    NA  34767 -0.0173 0.0716
As we can see above, this will calculate the log odds ratios and corresponding sampling variances based on the 2×2 table data where possible. For studies not reporting 2×2 data (studies 1, 4, 9, and 12), the values for the yi and vi variables are missing.
For the studies that directly report the odds ratios, it is trivial to convert these values to the log odds ratios. What is a bit more tricky is the computation of the corresponding sampling variances. However, the p-values from the Wald-type tests and the Wald-type confidence intervals provide sufficient information to reconstruct the sampling variances of the log odds ratios. For this, we can use the conv.wald() function as follows.
dat2 <- conv.wald(out=or, ci.lb=ci.lb, ci.ub=ci.ub, pval=pval, n=ntot, data=dat2, transf=log)
dat2

   . tpos  tneg cpos  cneg   or pval ci.lb ci.ub   ntot      yi     vi
1  .   NA    NA   NA    NA 0.39 0.12    NA    NA    262 -0.9416 0.3668
2  .    6   300   29   274   NA   NA    NA    NA    609 -1.6662 0.2081
3  .    3   228   11   209   NA   NA    NA    NA    451 -1.3863 0.4334
4  .   NA    NA   NA    NA 0.23   NA  0.18  0.31  26465 -1.4697 0.0192
5  .   33  5036   47  5761   NA   NA    NA    NA  10877 -0.2191 0.0520
6  .  180  1361  372  1079   NA   NA    NA    NA   2992 -0.9581 0.0099
7  .    8  2537   10   619   NA   NA    NA    NA   3174 -1.6338 0.2270
8  .  505 87886  499 87892   NA   NA    NA    NA 176782  0.0120 0.0040
9  .   NA    NA   NA    NA 0.62   NA  0.39  1.00  14776 -0.4780 0.0577
10 .   17  1699   65  1600   NA   NA    NA    NA   3381 -1.4012 0.0754
11 .  186 50448  141 27197   NA   NA    NA    NA  77972 -0.3408 0.0125
12 .   NA    NA   NA    NA 1.56 0.54    NA    NA   4839  0.4447 0.5266
13 .   27 16886   29 17825   NA   NA    NA    NA  34767 -0.0173 0.0716
We now have a complete dataset. Any differences compared to dat1 are purely a result of the rounding of the or, ci.lb, ci.ub, and pval variables. However, the differences are negligible.
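The arithmetic behind such a reconstruction can be sketched in base R. The example below is an illustration under stated assumptions, not a description of the internals of conv.wald(): it assumes that a 95% Wald-type interval on the log scale has width $2 \times 1.96 \times SE$ and that a two-sided Wald-type p-value corresponds to $|z| = |y_i| / SE$. The object names are hypothetical.

```r
crit <- qnorm(0.975)  # critical value matching a 95% confidence interval

# study 4: odds ratio with 95% Wald-type confidence interval bounds
yi4  <- log(0.23)
sei4 <- (log(0.31) - log(0.18)) / (2 * crit)  # CI width on the log scale / (2 * 1.96)
vi4  <- sei4^2

# study 1: odds ratio with the two-sided p-value of a Wald-type test
yi1 <- log(0.39)
zi1 <- qnorm(0.12 / 2, lower.tail = FALSE)  # |z| implied by the p-value
vi1 <- (yi1 / zi1)^2                        # since |z| = |yi| / SE

round(c(yi4 = yi4, vi4 = vi4, yi1 = yi1, vi1 = vi1), 4)
```

Under these assumptions, the values agree with rows 4 and 1 of the dataset above to four decimal places. Because the inputs were rounded to two digits, they differ slightly from the values computed from the 2×2 table data.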
Sidenote: The n argument was used above to supply the total sample sizes of the studies to the function. This has no relevance for the calculations done by conv.wald(), but some other functions may use this information (e.g., when drawing a funnel plot with the funnel() function and one adjusts the yaxis argument to one of the options that puts the sample sizes or some transformation thereof on the y-axis).
We can then fit a random-effects model to these data with:
res2 <- rma(yi, vi, data=dat2)
res2

Random-Effects Model (k = 13; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.3408 (SE = 0.1798)
tau (square root of estimated tau^2 value):      0.5838
I^2 (total heterogeneity / total variability):   92.18%
H^2 (total variability / sampling variability):  12.80

Test for Heterogeneity:
Q(df = 12) = 167.4513, p-val < .0001

Model Results:

estimate      se     zval    pval    ci.lb    ci.ub
 -0.7472  0.1867  -4.0015  <.0001  -1.1132  -0.3812  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict(res2, transf=exp, digits=2)
 pred ci.lb ci.ub pi.lb pi.ub
 0.47  0.33  0.68  0.14  1.57
These results are essentially the same as the ones we obtained earlier.
References
Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. Journal of the American Medical Association, 271(9), 698–702.