The metafor Package

Assembling Data for a Meta-Analysis of Standardized Mean Differences

Suppose the goal of a meta-analysis is to aggregate the results from studies contrasting two groups (e.g., treatment versus control) and each study measured an outcome of interest using some quantitative scale. A commonly used effect size measure used to quantify the size of the group difference is then the standardized mean difference (also commonly known as Cohen's d).

As an example, consider the data reported in Normand (1999) on the length of the hospital stay of stroke patients under specialized care and under conventional/routine non-specialist care:

library(metafor)
dat.normand1999

  study             source n1i m1i sd1i n2i m2i sd2i
1     1          Edinburgh 155  55   47 156  75   64
2     2     Orpington-Mild  31  27    7  32  29    4
3     3 Orpington-Moderate  75  64   17  71 119   29
4     4   Orpington-Severe  18  66   20  18 137   48
5     5      Montreal-Home   8  14    8  13  18   11
6     6  Montreal-Transfer  57  19    7  52  18    4
7     7          Newcastle  34  52   45  33  41   34
8     8               Umea 110  21   16 183  31   27
9     9            Uppsala  60  30   27  52  23   20

Variables n1i and n2i indicate the number of patients under specialized and under routine care, respectively, variables m1i and m2i indicate the respective mean length of stay (in days) in the two groups during the acute hospital admission, and sd1i and sd2i are the corresponding standard deviations of the length of stay.

With this information, we can compute the standardized mean difference (and corresponding sampling variance) for each study with:

dat1 <- escalc(measure="SMD", m1i=m1i, sd1i=sd1i, n1i=n1i,
                              m2i=m2i, sd2i=sd2i, n2i=n2i, data=dat.normand1999)
dat1

  study             source n1i m1i sd1i n2i m2i sd2i      yi     vi
1     1          Edinburgh 155  55   47 156  75   64 -0.3552 0.0131
2     2     Orpington-Mild  31  27    7  32  29    4 -0.3479 0.0645
3     3 Orpington-Moderate  75  64   17  71 119   29 -2.3176 0.0458
4     4   Orpington-Severe  18  66   20  18 137   48 -1.8880 0.1606
5     5      Montreal-Home   8  14    8  13  18   11 -0.3840 0.2054
6     6  Montreal-Transfer  57  19    7  52  18    4  0.1721 0.0369
7     7          Newcastle  34  52   45  33  41   34  0.2721 0.0603
8     8               Umea 110  21   16 183  31   27 -0.4246 0.0149
9     9            Uppsala  60  30   27  52  23   20  0.2896 0.0363

Finally, a random-effects model can be fitted to these data with:

res1 <- rma(yi, vi, data=dat1)
res1

Random-Effects Model (k = 9; tau^2 estimator: REML)
 
tau^2 (estimated amount of total heterogeneity): 0.7908 (SE = 0.4281)
tau (square root of estimated tau^2 value):      0.8893
I^2 (total heterogeneity / total variability):   95.49%
H^2 (total variability / sampling variability):  22.20
 
Test for Heterogeneity:
Q(df = 8) = 123.7293, p-val < .0001
 
Model Results:
 
estimate      se     zval    pval    ci.lb   ci.ub
 -0.5371  0.3087  -1.7401  0.0818  -1.1421  0.0679  .
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Therefore, stroke patients receiving specialized care are estimated to spent on average about half of a standard deviation fewer days in the hospital compared to patients receiving routine care ($\hat{\mu} = -0.54$ with 95% CI: $-1.14$ to $0.07$), but there is a considerable amount of heterogeneity in the findings (as indicated by the large estimate of $\tau^2$, the large $I^2$ value, and the significant $Q$-test).

Sidenote: Since the exact same variable (number of days of the hospital stay) was measured in each study, it would also be possible (and probably better) to conduct the meta-analysis with the raw (unstandardized) mean difference as the effect size measure (or possibly with the (log transformed) ratio of means, since the variable was measured on a ratio scale). The data are used here just for illustration purposes.

Now suppose that the means and standard deviations of the length of stay variable are not reported in all studies, but that the following dataset could be assembled based on the information reported in the studies:

dat2 <- dat.normand1999
dat2$dval <- with(dat2, (m1i - m2i) / sqrt(((n1i-1)*sd1i^2 + (n2i-1)*sd2i^2)/(n1i + n2i - 2)))
dat2$tval <- with(dat2, dval / sqrt(1/n1i + 1/n2i))
dat2$pval <- 2 * with(dat2, pt(abs(tval), df = n1i + n2i - 2, lower.tail=FALSE))
dat2$sign <- with(dat2, ifelse(m1i > m2i, 1, -1))
dat2[,c("dval","tval","pval")] <- round(dat2[,c("dval","tval","pval")], 2)
dat2[c(1,7),c(4,5,7,8,10,11,12)] <- NA
dat2[c(5,8),c(4,5,7,8, 9,11,12)] <- NA
dat2[c(2,9),c(4,5,7,8, 9,10)]    <- NA
dat2[c(3,4,6),c(9:12)] <- NA
dat2

  study             source n1i m1i sd1i n2i m2i sd2i  dval  tval pval sign
1     1          Edinburgh 155  NA   NA 156  NA   NA -0.36    NA   NA   NA
2     2     Orpington-Mild  31  NA   NA  32  NA   NA    NA    NA 0.17   -1
3     3 Orpington-Moderate  75  64   17  71 119   29    NA    NA   NA   NA
4     4   Orpington-Severe  18  66   20  18 137   48    NA    NA   NA   NA
5     5      Montreal-Home   8  NA   NA  13  NA   NA    NA -0.89   NA   NA
6     6  Montreal-Transfer  57  19    7  52  18    4    NA    NA   NA   NA
7     7          Newcastle  34  NA   NA  33  NA   NA  0.28    NA   NA   NA
8     8               Umea 110  NA   NA 183  NA   NA    NA -3.53   NA   NA
9     9            Uppsala  60  NA   NA  52  NA   NA    NA    NA 0.13    1

In particular, in studies 1 and 7, the authors did not report the means and standard deviations, but they directly reported the standardized mean difference (Cohen's d value). In studies 5 and 8, the authors only reported the t-statistic from an independent samples t-test comparing the two groups, and in studies 2 and 9, the authors only reported the (two-sided) p-value corresponding to the t-test. For all studies, the group sizes are known.¹⁾ Given only this information, it is possible to reconstruct the full dataset for the meta-analysis.

For standardized mean differences as the outcome measure, the escalc() function also allows the user to directly specify d values (if they are known) via the di argument. Moreover, the t-statistics can be converted into the corresponding Cohen's d values with $$d = t \times \sqrt{1/n_1 + 1/n_2}.$$ This conversion is based on the well-known relationship between Cohen's d values and the test statistic of the independent samples t-test. If we specify argument ti with the variable containing the t-statistics, then the escalc() function will do this automatically for us.

Finally, the p-values can be back-transformed to t-statistics, which in turn can be transformed into standardized mean differences as described above. For this, one can specify the p-values via the pi argument. However, since two-sided p-values do not contain information about the sign of the t-statistics and hence the standardized mean differences, that information needs to be available as well (as encoded in the sign variable). Therefore, one can specify positive or negative values for pi and the sign of the values is taken to be the sign of the standardized mean differences.

So, we can use the following code:

dat2 <- escalc(measure="SMD", m1i=m1i, sd1i=sd1i, n1i=n1i,
                              m2i=m2i, sd2i=sd2i, n2i=n2i,
                              ti=tval, di=dval, pi=sign*pval, data=dat2)

We can now examine the contents of the dataset with:

dat2

  study n1i m1i sd1i n2i m2i sd2i    dval    tval pval sign      yi     vi
1     1 155  NA   NA 156  NA   NA -0.3600      NA   NA   NA -0.3591 0.0131
2     2  31  NA   NA  32  NA   NA -0.3499 -1.3886 0.17   -1 -0.3456 0.0645
3     3  75  64   17  71 119   29      NA      NA   NA   NA -2.3176 0.0458
4     4  18  66   20  18 137   48      NA      NA   NA   NA -1.8880 0.1606
5     5   8  NA   NA  13  NA   NA -0.3999 -0.8900   NA   NA -0.3839 0.2054
6     6  57  19    7  52  18    4      NA      NA   NA   NA  0.1721 0.0369
7     7  34  NA   NA  33  NA   NA  0.2800      NA   NA   NA  0.2768 0.0603
8     8 110  NA   NA 183  NA   NA -0.4259 -3.5300   NA   NA -0.4248 0.0149
9     9  60  NA   NA  52  NA   NA  0.2890  1.5255 0.13    1  0.2871 0.0363

Any differences in the yi and vi variables compared to dat1 are purely a result of the fact that rounded values were reported for variables dval, tval, and pval. However, as we can see, the differences are (in this example) negligible. Note that the function applied the usual bias correction to the values specified via argument dval, which is why the corresponding values under yi are slightly lower.

We can now fit a random-effects model to these data with:

res2 <- rma(yi, vi, data=dat2)
res2

Random-Effects Model (k = 9; tau^2 estimator: REML)
 
tau^2 (estimated amount of total heterogeneity): 0.7912 (SE = 0.4283)
tau (square root of estimated tau^2 value):      0.8895
I^2 (total heterogeneity / total variability):   95.50%
H^2 (total variability / sampling variability):  22.20
 
Test for Heterogeneity:
Q(df = 8) = 123.7121, p-val < .0001
 
Model Results:
 
estimate      se     zval    pval    ci.lb   ci.ub
 -0.5371  0.3087  -1.7398  0.0819  -1.1422  0.0680  .
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The results are essentially the same as obtained earlier.

Some final notes:

The conversion of a p-value to the corresponding t-value as shown above assumes that the exact p-value is reported. If authors only report that the p-value fell below a certain threshold (e.g., $p < .01$ or if authors only state that the test was significant – which typically implies $p < .05$), then a common approach is to use the value of the cutoff reported (e.g., if $p < .01$ is reported, then assume $p = .01$), which is conservative (since the actual p-value was below that assumed value by some unknown amount). The conversion will therefore tend to be less accurate.

The conversion of a p-value to the t-statistic shown above is only applicable if authors actually conducted an independent samples t-test (i.e., it is not appropriate if the p-value was based on a Mann–Whitney U test or some other non-parametric test). Also, the t-test must have been a Student's t-test (assuming equal population variances in the two groups) and not Welch's t-test (allowing for unequal variances).

When authors report only the Cohen's d value and/or the test statistic from an independent samples t-test, care must be taken to ensure that the values have the correct sign (authors may only report the absolute value or may have interchanged the groups).

When specifying Cohen's d values via argument di, then these values are assumed to be uncorrected values (and escalc() will then apply the usual bias correction to them). If the values are already corrected, one would technically first have to 'uncorrect' them, so that they are not doubly corrected. However, unless the sample sizes are very small, the bias correction is quite negligible anyway, so even a double correction would have no noteworthy impact on the results.

A useful reference for the types of transformations discussed above is Lipsey and Wilson (2001).

References

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.

Normand, S. T. (1999). Meta-analysis: Formulating, evaluating, combining, and reporting. Statistics in Medicine, 18(3), 321–359.

¹⁾

In some cases, only the total sample size of a study may be known (i.e., ni = n1i + n2i). If it is not at all possible to obtain information about the group sizes (e.g., by contacting the author(s) of the study), one possibility would be to assume that both groups are of the same size (i.e., n1i = n2i = ni/2). In the present dataset, this would not be far from the truth in most cases.