## Assembling Data for a Meta-Analysis of (Log) Odds Ratios

Suppose the goal of a meta-analysis is to aggregate the results from studies contrasting two groups (e.g., treatment versus control) and each study measured a dichotomous outcome of interest (e.g., treatment success versus failure). A commonly used effect size measure used to quantify the size of the group difference (i.e., the size of the treatment effect) is then the odds ratio.

As an example, consider the data reported in Colditz et al. (1994) on the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine against tuberculosis (for this illustration, we will remove some variables that are not further needed):

library(metafor) dat.bcg <- dat.bcg[,c(2:7)] dat.bcg

author year tpos tneg cpos cneg 1 Aronson 1948 4 119 11 128 2 Ferguson & Simes 1949 6 300 29 274 3 Rosenthal et al 1960 3 228 11 209 4 Hart & Sutherland 1977 62 13536 248 12619 5 Frimodt-Moller et al 1973 33 5036 47 5761 6 Stein & Aronson 1953 180 1361 372 1079 7 Vandiviere et al 1973 8 2537 10 619 8 TPT Madras 1980 505 87886 499 87892 9 Coetzee & Berjak 1968 29 7470 45 7232 10 Rosenthal et al 1961 17 1699 65 1600 11 Comstock et al 1974 186 50448 141 27197 12 Comstock & Webster 1969 5 2493 3 2338 13 Comstock et al 1976 27 16886 29 17825

Variables `tpos`

and `tneg`

indicate the number of TB positive and TB negative cases in the treated (vaccinated) group, while variables `cpos`

and `cneg`

indicate the number of TB positive and TB negative cases in the control (non-vaccinated) group. The data can be arranged in terms of a 2×2 table of the form:

| TB+ TB- --------+------------ Treated | tpos tneg Control | cpos cneg

With this information, we can compute the log odds ratio (and corresponding sampling variance) for each study with:

dat1 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg) dat1

author year tpos tneg cpos cneg yi vi 1 Aronson 1948 4 119 11 128 -0.9387 0.3571 2 Ferguson & Simes 1949 6 300 29 274 -1.6662 0.2081 3 Rosenthal et al 1960 3 228 11 209 -1.3863 0.4334 4 Hart & Sutherland 1977 62 13536 248 12619 -1.4564 0.0203 5 Frimodt-Moller et al 1973 33 5036 47 5761 -0.2191 0.0520 6 Stein & Aronson 1953 180 1361 372 1079 -0.9581 0.0099 7 Vandiviere et al 1973 8 2537 10 619 -1.6338 0.2270 8 TPT Madras 1980 505 87886 499 87892 0.0120 0.0040 9 Coetzee & Berjak 1968 29 7470 45 7232 -0.4717 0.0570 10 Rosenthal et al 1961 17 1699 65 1600 -1.4012 0.0754 11 Comstock et al 1974 186 50448 141 27197 -0.3408 0.0125 12 Comstock & Webster 1969 5 2493 3 2338 0.4466 0.5342 13 Comstock et al 1976 27 16886 29 17825 -0.0173 0.0716

Note that the `escalc()`

function directly computes the log-transformed odds ratios, as these are the values we need for a meta-analysis. A negative log odds ratio therefore indicates that the odds of a TB infection were lower in the treated group compared to the control group in a particular study.

A random-effects model can then be fitted to these data with:

res1 <- rma(yi, vi, data=dat1) res1

Random-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.3378 (SE = 0.1784) tau (square root of estimated tau^2 value): 0.5812 I^2 (total heterogeneity / total variability): 92.07% H^2 (total variability / sampling variability): 12.61 Test for Heterogeneity: Q(df = 12) = 163.1649, p-val < .0001 Model Results: estimate se zval pval ci.lb ci.ub -0.7452 0.1860 -4.0057 <.0001 -1.1098 -0.3806 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Therefore, the estimated average log odds ratio is equal to $\hat{\mu} = -0.75$ (with 95% CI: $-1.11$ to $-0.38$). For easier interpretation, we can back-transform the results with:

predict(res1, transf=exp, digits=2)

pred ci.lb ci.ub pi.lb pi.ub 0.47 0.33 0.68 0.14 1.57

The odds of a TB infection are therefore estimated to be approximately half as large on average in vaccinated groups (i.e., an odds ratio of $0.47$ with 95% CI: $0.33$ to $0.68$), or put differently, we can say that the odds of infection are on average 53% lower (i.e., $1 - 0.47 = 0.53$) in vaccinated groups. However, there is a considerable amount of heterogeneity in the findings (as indicated by the large estimate of $\tau^2$, the large $I^2$ value, and the significant $Q$-test).

Now suppose that the 2×2 table data are not reported in all studies, but that the following dataset could be assembled based on information reported in the studies:

dat2 <- data.frame(summary(dat1)) dat2[c("yi", "ci.lb", "ci.ub")] <- data.frame(summary(dat1, transf=exp))[c("yi", "ci.lb", "ci.ub")] names(dat2)[which(names(dat2) == "yi")] <- "or" dat2[,c("or","ci.lb","ci.ub","pval")] <- round(dat2[,c("or","ci.lb","ci.ub","pval")], 2) dat2$vi <- dat2$sei <- dat2$zi <- NULL dat2[c(1,12),c(3:6,9:10)] <- NA dat2[c(4,9), c(3:6,8)] <- NA dat2[c(2:3,5:8,10:11,13),c(7:10)] <- NA dat2

author year tpos tneg cpos cneg or pval ci.lb ci.ub 1 Aronson 1948 NA NA NA NA 0.39 0.12 NA NA 2 Ferguson & Simes 1949 6 300 29 274 NA NA NA NA 3 Rosenthal et al 1960 3 228 11 209 NA NA NA NA 4 Hart & Sutherland 1977 NA NA NA NA 0.23 NA 0.18 0.31 5 Frimodt-Moller et al 1973 33 5036 47 5761 NA NA NA NA 6 Stein & Aronson 1953 180 1361 372 1079 NA NA NA NA 7 Vandiviere et al 1973 8 2537 10 619 NA NA NA NA 8 TPT Madras 1980 505 87886 499 87892 NA NA NA NA 9 Coetzee & Berjak 1968 NA NA NA NA 0.62 NA 0.39 1.00 10 Rosenthal et al 1961 17 1699 65 1600 NA NA NA NA 11 Comstock et al 1974 186 50448 141 27197 NA NA NA NA 12 Comstock & Webster 1969 NA NA NA NA 1.56 0.54 NA NA 13 Comstock et al 1976 27 16886 29 17825 NA NA NA NA

In particular, in studies 1 and 12, authors reported only the odds ratio and the corresponding p-value (based on a Wald-type test whether the odds ratio differs significantly from 1) and in studies 4 and 9, authors reported only the odds ratio and the corresponding 95% confidence interval bounds. Given only this information, it is possible to reconstruct the full dataset for the meta-analysis (after each step below, I would recommend examining the contents of the `dat2`

object to better appreciate what the code is doing).

1) For the studies reporting 2×2 data, we can again use the `escalc()`

function to obtain the log odds ratios and corresponding sampling variances with:

dat2 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat2)

For studies not reporting 2×2 data (studies 1, 4, 9, and 12), the values for the `yi`

and `vi`

variables are missing.

2) For the studies reporting the odds ratios directly, we can easily transform these values to log odds ratios with:

dat2$yi <- replmiss(yi, log(or), data=dat2)

The `replmiss()`

function is a little helper function in the metafor package that can be used to only replace the missing values.

3) The p-values can be converted into the corresponding z-values with:

dat2$zi <- sign(dat2$yi) * qnorm(dat2$pval/2, lower.tail=FALSE)

Note that the two-sided p-values need to be turned into one-sided p-values first (i.e., by dividing by 2) before using the `qnorm()`

function. Also, the two-sided p-values do not contain information about the sign of the test statistics, but that information is available via the log odds ratios themselves.

4) The test statistics can be converted into the corresponding standard errors with:

dat2$sei <- dat2$yi / dat2$zi

5) The confidence interval bounds can be converted into the standard errors with:

dat2$sei <- replmiss(sei, (log(ci.ub) - log(ci.lb))/(2*1.96), data=dat2)

Again, only the missing values should be replaced where possible.

6) Any missing values for the `vi`

variable (the sampling variances) can now be replaced with:

dat2$vi <- replmiss(vi, sei^2, data=dat2)

7) Finally, since we won't need these variables further, we remove the test statistics and standard errors with:

dat2$zi <- dat2$sei <- NULL

We can now examine the contents of the dataset:

dat2

author year tpos tneg cpos cneg or pval ci.lb ci.ub yi vi 1 Aronson 1948 NA NA NA NA 0.39 0.12 NA NA -0.9416 0.3668 2 Ferguson & Simes 1949 6 300 29 274 NA NA NA NA -1.6662 0.2081 3 Rosenthal et al 1960 3 228 11 209 NA NA NA NA -1.3863 0.4334 4 Hart & Sutherland 1977 NA NA NA NA 0.23 NA 0.18 0.31 -1.4697 0.0192 5 Frimodt-Moller et al 1973 33 5036 47 5761 NA NA NA NA -0.2191 0.0520 6 Stein & Aronson 1953 180 1361 372 1079 NA NA NA NA -0.9581 0.0099 7 Vandiviere et al 1973 8 2537 10 619 NA NA NA NA -1.6338 0.2270 8 TPT Madras 1980 505 87886 499 87892 NA NA NA NA 0.0120 0.0040 9 Coetzee & Berjak 1968 NA NA NA NA 0.62 NA 0.39 1.00 -0.4780 0.0577 10 Rosenthal et al 1961 17 1699 65 1600 NA NA NA NA -1.4012 0.0754 11 Comstock et al 1974 186 50448 141 27197 NA NA NA NA -0.3408 0.0125 12 Comstock & Webster 1969 NA NA NA NA 1.56 0.54 NA NA 0.4447 0.5266 13 Comstock et al 1976 27 16886 29 17825 NA NA NA NA -0.0173 0.0716

Any differences compared to `dat1`

are purely a result of the rounding of the `or`

, `ci.lb`

, `ci.ub`

, and `pval`

variables. However, the differences are negligible.

We can then fit a random-effects model to these data with:

res2 <- rma(yi, vi, data=dat2) res2

Random-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.3408 (SE = 0.1798) tau (square root of estimated tau^2 value): 0.5838 I^2 (total heterogeneity / total variability): 92.18% H^2 (total variability / sampling variability): 12.80 Test for Heterogeneity: Q(df = 12) = 167.4533, p-val < .0001 Model Results: estimate se zval pval ci.lb ci.ub -0.7472 0.1867 -4.0015 <.0001 -1.1132 -0.3812 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

predict(res2, transf=exp, digits=2)

pred ci.lb ci.ub pi.lb pi.ub 0.47 0.33 0.68 0.14 1.57

The results are essentially the same as obtained earlier.

Some final notes:

- The conversion of a p-value to the corresponding test statistic as shown above assumes that the exact p-value is reported. If authors only report that the p-value fell below a certain threshold (e.g., $p < .01$ or if authors only state that the test was significant – which typically implies $p < .05$), then a common approach is to use the value of the cutoff reported (e.g., if $p < .01$ is reported, then assume $p = .01$), which is conservative (since the actual p-value was below that assumed value by some unknown amount). The conversion will therefore tend to be much less accurate.

- The conversion of the p-value shown above is only applicable if authors actually conducted a Wald-type test of the null hypothesis that the true odds ratio is equal to 1 (or equivalently, that the true log odds ratio is equal to 0). Similarly, the conversion of the confidence interval bounds to the standard error assumes that a Wald-type CI is reported.

- The same approach can be used when working with (log) risk ratios. Just change
`measure="OR"`

above to`measure="RR"`

.

### References

Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. *Journal of the American Medical Association, 271*(9), 698–702.