# The metafor Package

A Meta-Analysis Package for R

### Sidebar

tips:assembling_data_or

This is an old revision of the document!

## Assembling Data for a Meta-Analysis of (Log) Odds Ratios

Suppose the goal of a meta-analysis is to aggregate the results from studies contrasting two groups (e.g., treatment versus control) and each study measured a dichotomous outcome of interest (e.g., treatment success versus failure). A commonly used effect size measure used to quantify the size of the group difference is then the odds ratio.

As an example, consider the data reported in Colditz et al. (1994) on the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine against tuberculosis (for this illustration, we will remove some variables that are not further needed):

library(metafor)
dat.bcg <- dat.bcg[,c(2:7)]
dat.bcg
                 author year tpos  tneg cpos  cneg
1               Aronson 1948    4   119   11   128
2      Ferguson & Simes 1949    6   300   29   274
3       Rosenthal et al 1960    3   228   11   209
4     Hart & Sutherland 1977   62 13536  248 12619
5  Frimodt-Moller et al 1973   33  5036   47  5761
6       Stein & Aronson 1953  180  1361  372  1079
7      Vandiviere et al 1973    8  2537   10   619
8            TPT Madras 1980  505 87886  499 87892
9      Coetzee & Berjak 1968   29  7470   45  7232
10      Rosenthal et al 1961   17  1699   65  1600
11       Comstock et al 1974  186 50448  141 27197
12   Comstock & Webster 1969    5  2493    3  2338
13       Comstock et al 1976   27 16886   29 17825

Variables tpos and tneg indicate the number of TB positive and TB negative cases in the treated (vaccinated) group, while variables cpos and cneg indicate the number of TB positive and TB negative cases in the control (non-vaccinated) group. The data can be arranged in terms of a 2×2 table of the form:

        |  TB+   TB-
--------+------------
Treated |  tpos  tneg
Control |  cpos  cneg

With this information, we can compute the log odds ratios (and corresponding sampling variance) for each study with:

dat1 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat1
                 author year tpos  tneg cpos  cneg      yi     vi
1               Aronson 1948    4   119   11   128 -0.9387 0.3571
2      Ferguson & Simes 1949    6   300   29   274 -1.6662 0.2081
3       Rosenthal et al 1960    3   228   11   209 -1.3863 0.4334
4     Hart & Sutherland 1977   62 13536  248 12619 -1.4564 0.0203
5  Frimodt-Moller et al 1973   33  5036   47  5761 -0.2191 0.0520
6       Stein & Aronson 1953  180  1361  372  1079 -0.9581 0.0099
7      Vandiviere et al 1973    8  2537   10   619 -1.6338 0.2270
8            TPT Madras 1980  505 87886  499 87892  0.0120 0.0040
9      Coetzee & Berjak 1968   29  7470   45  7232 -0.4717 0.0570
10      Rosenthal et al 1961   17  1699   65  1600 -1.4012 0.0754
11       Comstock et al 1974  186 50448  141 27197 -0.3408 0.0125
12   Comstock & Webster 1969    5  2493    3  2338  0.4466 0.5342
13       Comstock et al 1976   27 16886   29 17825 -0.0173 0.0716

Note that the escalc() function directly computes the log-transformed odds ratios, as these are the values we need for the meta-analysis. A negative log odds ratio therefore indicates that the odds of a TB infection were lower in the treated group compared to the control group in a particular study.

A random-effects model can then be fitted to these data with:

res1 <- rma(yi, vi, data=dat1)
res1
Random-Effects Model (k = 13; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.3378 (SE = 0.1784)
tau (square root of estimated tau^2 value):      0.5812
I^2 (total heterogeneity / total variability):   92.07%
H^2 (total variability / sampling variability):  12.61

Test for Heterogeneity:
Q(df = 12) = 163.1649, p-val < .0001

Model Results:

estimate       se     zval     pval    ci.lb    ci.ub
-0.7452   0.1860  -4.0057   <.0001  -1.1098  -0.3806      ***

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Therefore, the estimated average log odds ratio is equal to $\hat{\mu} = -0.75$ (with 95% CI: $-1.11$ to $-0.38$). For easier interpretation, we can back-transform the results with:

predict(res1, transf=exp, digits=2)
 pred ci.lb ci.ub cr.lb cr.ub
0.47  0.33  0.68  0.14  1.57

The odds of a TB infection are therefore estimated to be approximately half as large on average in vaccinated groups (i.e., an odds ratio of $0.47$ with 95% CI: $0.33$ to $0.68$), or put differently, we can say that the odds of infection are on average 53% lower (i.e., $1 - 0.47 = 0.53$) in vaccinated groups. However, there is a considerable amount of heterogeneity in the findings (as indicated by the large estimate of $\tau^2$, the large $I^2$ value, and the significant $Q$-test).

Now suppose that the 2×2 table data are not reported in all studies, but that the following dataset could be assembled based on information reported in the studies:

dat2 <- data.frame(summary(dat1, transf=exp))
names(dat2)[which(names(dat2) == "yi")] <- "or"
dat2$pval <- 2 * pnorm(abs(dat2$zi), lower.tail=FALSE)
dat2[,c("or","ci.lb","ci.ub","pval")] <- round(dat2[,c("or","ci.lb","ci.ub","pval")], 2)
dat2$vi <- dat2$sei <- dat2$zi <- NULL dat2[c(1,12),c(3:6,8:9)] <- NA dat2[c(4,9), c(3:6,10)] <- NA dat2[c(2,3,5:8,10,11,13),c(7:10)] <- NA dat2  author year tpos tneg cpos cneg or ci.lb ci.ub pval 1 Aronson 1948 NA NA NA NA 0.39 NA NA 0.12 2 Ferguson & Simes 1949 6 300 29 274 NA NA NA NA 3 Rosenthal et al 1960 3 228 11 209 NA NA NA NA 4 Hart & Sutherland 1977 NA NA NA NA 0.23 0.18 0.31 NA 5 Frimodt-Moller et al 1973 33 5036 47 5761 NA NA NA NA 6 Stein & Aronson 1953 180 1361 372 1079 NA NA NA NA 7 Vandiviere et al 1973 8 2537 10 619 NA NA NA NA 8 TPT Madras 1980 505 87886 499 87892 NA NA NA NA 9 Coetzee & Berjak 1968 NA NA NA NA 0.62 0.39 1.00 NA 10 Rosenthal et al 1961 17 1699 65 1600 NA NA NA NA 11 Comstock et al 1974 186 50448 141 27197 NA NA NA NA 12 Comstock & Webster 1969 NA NA NA NA 1.56 NA NA 0.54 13 Comstock et al 1976 27 16886 29 17825 NA NA NA NA In particular, in studies 1 and 12, authors reported only the odds ratio and the corresponding p-value (based on a Wald-type test whether the odds ratio differs significantly from 1) and in studies 4 and 9, authors reported only the odds ratio and the corresponding 95% confidence interval bounds. Given only this information, it is possible to reconstruct the full dataset for the meta-analysis (after each step below, I would recommend examining the contents of the dat2 object to better appreciate what the code is doing). 1. For the studies reporting 2×2 data, we can again use the escalc() function to obtain the log odds ratios and corresponding sampling variances with: dat2 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat2) For studies not reporting 2×2 data, the values for the yi and vi variables are missing. 2. For the studies reporting the odds ratios directly, we can easily transform these values to log odds ratios with: dat2$yi <- replmiss(dat2$yi, log(dat2$or))

The replmiss() function is a little helper function in the metafor package that can be used to only replace the missing values.

3. The p-values can be converted into the corresponding z-values with:
dat2$zi <- sign(dat2$yi) * qnorm(dat2$pval/2, lower.tail=FALSE) Note that the two-sided p-values need to be turned into one-sided p-values first (i.e., by dividing by 2) before using the qnorm() function. Also, the p-values do not contain information on the sign of the test statistics, but that information is available via the log odds ratios themselves. 4. Together with the log odds ratios, the test statistics can be converted into the corresponding standard errors with: dat2$sei <- dat2$yi / dat2$zi
5. The confidence interval bounds can be converted into the standard errors with:
dat2$sei <- replmiss(dat2$sei, with(dat2, (log(ci.ub) - log(ci.lb))/(2*1.96)))

Again, only the missing values should be replaced where possible.

6. Finally, any missing values for the vi variable (the sampling variances) can now be replaced with:
dat2$vi <- replmiss(dat2$vi, dat2$sei^2) 7. Finally, since we won't need these variables any further, we can remove the test statistics and standard errors with: dat2$zi <- dat2$sei <- NULL We can now examine the contents of the dataset: dat2  author year tpos tneg cpos cneg or ci.lb ci.ub pval yi vi 1 Aronson 1948 NA NA NA NA 0.39 NA NA 0.12 -0.9416 0.3668 2 Ferguson & Simes 1949 6 300 29 274 NA NA NA NA -1.6662 0.2081 3 Rosenthal et al 1960 3 228 11 209 NA NA NA NA -1.3863 0.4334 4 Hart & Sutherland 1977 NA NA NA NA 0.23 0.18 0.31 NA -1.4697 0.0192 5 Frimodt-Moller et al 1973 33 5036 47 5761 NA NA NA NA -0.2191 0.0520 6 Stein & Aronson 1953 180 1361 372 1079 NA NA NA NA -0.9581 0.0099 7 Vandiviere et al 1973 8 2537 10 619 NA NA NA NA -1.6338 0.2270 8 TPT Madras 1980 505 87886 499 87892 NA NA NA NA 0.0120 0.0040 9 Coetzee & Berjak 1968 NA NA NA NA 0.62 0.39 1.00 NA -0.4780 0.0577 10 Rosenthal et al 1961 17 1699 65 1600 NA NA NA NA -1.4012 0.0754 11 Comstock et al 1974 186 50448 141 27197 NA NA NA NA -0.3408 0.0125 12 Comstock & Webster 1969 NA NA NA NA 1.56 NA NA 0.54 0.4447 0.5266 13 Comstock et al 1976 27 16886 29 17825 NA NA NA NA -0.0173 0.0716 Any differences compared to dat1 are purely a result of the rounding of the or, ci.lb, ci.ub, and pval variables. However, the differences are negligible. We can then fit a random-effects model to these data with: res2 <- rma(yi, vi, data=dat2) res2 Random-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.3408 (SE = 0.1798) tau (square root of estimated tau^2 value): 0.5838 I^2 (total heterogeneity / total variability): 92.18% H^2 (total variability / sampling variability): 12.80 Test for Heterogeneity: Q(df = 12) = 167.4533, p-val < .0001 Model Results: estimate se zval pval ci.lb ci.ub -0.7472 0.1867 -4.0015 <.0001 -1.1132 -0.3812 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 predict(res2, transf=exp, digits=2)  pred ci.lb ci.ub cr.lb cr.ub 0.47 0.33 0.68 0.14 1.57 The results are essentially the same as obtained earlier. Some final notes: • The conversion of a p-value to the corresponding test statistic as shown above assumes that the exact p-value is reported. If authors only report that the p-value fell below a certain threshold (e.g.,$p < .01$or if authors only state that the test was significant – which typically implies$p < .05$), then a common approach is to use the value of the cutoff reported (e.g., if$p < .01$is reported, then assume$p = .01\$), which is conservative (since the actual p-value was below that assumed value by some unknown amount). The conversion will therefore tend to be much less accurate.
• The conversion of the p-value shown above is only applicable if authors actually conducted a Wald-type test of the null hypothesis that the true odds ratio is equal to 1 (or equivalently, that the true log odds ratio is equal to 0). Similarly, the conversion of the confidence interval bounds to the standard error assumes that a Wald-type CI is reported.
• The same approach can be used when working with (log) risk ratios. Just change measure="OR" above to measure="RR".

### References

Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. Journal of the American Medical Association, 271(9), 698–702.

tips/assembling_data_or.1457790726.txt.gz · Last modified: 2016/03/12 13:52 by 127.0.0.1