# Overview of Meta-Analysis, Part 5b (of 7): Primary Meta-Analyses (cont.)

This is the second of three posts in Part 5 of my overview of meta-analysis.  In Part 5a I described six conventional models for meta-analysis, each of which combines within-study and between-studies models.  In this second post I first comment on nested models then describe estimation and inference for two models without covariates—procedures for fitting these models to effect-size (ES) estimates and quantifying uncertainty about their focal (hyper)parameters.  In the third post, Part 5c, I’ll do the same for two models with covariates and also comment on extensions and variants of these models and procedures.

## Nested Models

As a precursor to estimation and inference, it’s useful to note certain relations among the six models I presented. To that end, below I list them in combined linear-model form. Where relevant we assume $E_i \sim N(0, \sigma_i^2)$ with known conditional variance (CV) $\sigma_i^2$, $E(U_i) = 0$, $\mathrm{Var}(U_i) = \tau^2$, and that $E_i$ and $U_i$ are independent. (For some procedures we further assume $U_i$ is normally distributed.)

• SHoFE: $Y_i = \mu + E_i$
• SHeFE: $Y_i = \mu + \eta_i + E_i$
• SRE: $Y_i = \mu + U_i + E_i$
• MHoFE: $Y_i = x_i\beta + E_i$
• MHeFE: $Y_i = x_i\beta + \eta_i + E_i$
• MRE: $Y_i = x_i\beta + U_i + E_i$

Some meta-analytic procedures involve comparing nested pairs of these models, at least implicitly.  For present purposes, let’s consider Model A nested within Model B if constraining (hyper)parameters in Model B to specific values yields (a model equivalent to) Model A.

For example, the SHoFE model is nested within all five others: We can arrive at it by constraining quantities in the SHeFE model ($\eta_i = 0$), the SRE model ($U_i = 0$ or $\tau^2 = 0$), the MHoFE model ($x_i = 1$ and $\beta = \mu$), and so on. Similarly, the SHeFE and SRE models are nested within the MHeFE and MRE models, respectively, by the constraints $x_i = 1$ and $\beta = \mu$, and the MHoFE model is nested within both the MHeFE model ($\eta_i = 0$) and the MRE model ($U_i = 0$ or $\tau^2 = 0$). (Footnote 1) Furthermore, any model that permits covariates can include different sets of covariates, and two versions of such a model with nested sets of covariates (e.g., one set is a subset of the other) are nested models.

Comparing nested models essentially involves assessing a tradeoff between adequacy and parsimony, given that more complex models tend to fit a data set better than simpler models: It’s sensible to prefer a simpler (more complex) model whose gain in parsimony (adequacy) is large relative to its loss in adequacy (parsimony). (Footnote 2) This principle can be used to assess whether a mean effect size (ES) is plausibly some specific value (e.g., $\mu = 0$, $\mu = 1/2$), whether ES parameters are plausibly homogeneous (e.g., $\eta_i = 0$, $\tau^2 = 0$), and whether one or more covariates’ associations with ES parameters are plausibly 0 or other specific values (e.g., for non-intercept elements of $\beta$). I’ll mention such comparisons occasionally when describing meta-analysis procedures.

The two heterogeneous fixed-effects models, SHeFE and MHeFE, seem to be used rarely.  For this reason and to conserve resources, I won’t discuss them further in this overview.  Interested readers might refer to the following articles about such models and associated procedures:

Bonett, D. G. (2008). Meta-analytic interval estimation for bivariate correlations. Psychological Methods, 13, 173–181. doi:10.1037/a0012868

Bonett, D. G. (2009). Meta-analytic interval estimation for standardized and unstandardized mean differences. Psychological Methods, 14, 225–238. doi:10.1037/a0016619

Bonett, D. G. (2010). Varying coefficient meta-analytic methods for alpha reliability. Psychological Methods, 15, 368–385. doi:10.1037/a0020142

Overton, R. C. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3, 354–379. doi:10.1037/1082-989X.3.3.354

## Estimation and Inference: Models without Covariates

In this section I describe common procedures for estimating and making inferences about (hyper)parameters in two of the above models without covariates: SHoFE and SRE.  Each subsection below focuses on one of these.  Common meta-analytic procedures for these models and those with covariates share several attributes, such as using weighted least-squares (WLS) estimators for fixed effects (e.g., μ or β), with weights based on CVs.

Aside about notation: I denote estimates of (hyper)parameters with a caret, or hat (e.g., $\hat{\mu}$ is an estimate of $\mu$), and denote summation over $k$ studies using $\sum_i$, where $i$ runs from 1 to $k$ (e.g., $\sum_i y_i = y_1 + y_2 + \dots + y_k$). (end of aside)

SHoFE. The main statistical tasks under this model are to estimate and make inferences about $\mu$, the common ES parameter. A widely used procedure for accomplishing these tasks is a simple WLS method that yields a point estimate of $\mu$ and this estimate’s variance. This point estimate is just a precision-weighted mean of the ES estimates; the optimal weights, which minimize the estimator’s sampling variance (equivalently, maximize its precision), are reciprocals of CVs, $1 / \sigma_i^2$, but as described in this overview’s Part 2 we often estimate these weights based on CV estimates, $w_i = 1 / v_i$. In terms of estimated weights, the WLS point estimate is

$$\hat{\mu}_F = \frac{\sum_i w_i y_i}{\sum_i w_i} ,$$

and its variance is

$$\mathrm{Var}(\hat{\mu}_F) = \frac{1}{\sum_i w_i} .$$

This point estimate and variance are typically used to make standard-normal inferences, such as a confidence interval (CI) for or test of μ.  Specifically, we could construct a 100(1 − α)% equal-tail CI for μ as

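$$\hat{\mu}_F \pm z_\alpha \, \mathrm{SE}(\hat{\mu}_F) ,$$

where $z_\alpha = -\Phi^{-1}(\alpha/2)$, with $\Phi^{-1}$ the standard-normal quantile function (e.g., for a 95% CI $z_{.05} = 1.960$), and $\mathrm{SE}(\hat{\mu}_F) = \sqrt{\mathrm{Var}(\hat{\mu}_F)}$ is $\hat{\mu}_F$’s standard error (SE). Likewise, to test the null hypothesis $H_0\colon \mu = \mu_0$, where $\mu_0$ is an a priori value, we could refer the statistic

$$z_F = \frac{\hat{\mu}_F - \mu_0}{\mathrm{SE}(\hat{\mu}_F)}$$

to a standard-normal reference distribution to obtain a p value. (Footnote 3)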
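The estimate, variance, CI, and test just described can be sketched in a few lines of Python. This is only a minimal illustration under the assumption that the estimated CVs are treated as known; the function name `fixed_effect_summary`, its arguments, and the three-study data set are hypothetical and not taken from any published analysis.

```python
from statistics import NormalDist

def fixed_effect_summary(y, v, mu0=0.0, alpha=0.05):
    """Precision-weighted (SHoFE) estimate of mu with standard-normal CI and test.

    y: effect-size estimates; v: their (estimated) conditional variances.
    """
    w = [1.0 / vi for vi in v]                      # w_i = 1 / v_i
    sum_w = sum(w)
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    se = (1.0 / sum_w) ** 0.5                       # sqrt of Var = 1 / sum(w_i)
    z_crit = -NormalDist().inv_cdf(alpha / 2)       # about 1.960 for alpha = .05
    z = (mu_hat - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p value
    return {"mu_hat": mu_hat, "se": se,
            "ci": (mu_hat - z_crit * se, mu_hat + z_crit * se),
            "z": z, "p": p}

# Hypothetical data: three studies' SMD estimates and CV estimates
result = fixed_effect_summary(y=[0.30, 0.50, 0.70], v=[0.04, 0.02, 0.04])
print(result)
```

With these made-up inputs the weights are 25, 50, and 25, so the estimate is 0.50 with an SE of 0.10; the middle study counts twice as much as either of the others because its CV is half as large.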

Because $\mathrm{Var}(\hat{\mu}_F)$ is not known when $w_i$ is an estimate, standard-normal inferences might not perform as advertised (e.g., CI coverage rate below nominal, inflated Type I error rate for tests). Other potential problems include non-normality of ES estimators, especially with small samples of subjects, and non-independence of ESs. Strategies to address these problems are beyond the present scope but could entail updating weights iteratively when $\sigma_i^2$ depends on $\theta_i$ ($= \mu$), using alternative ES estimators that are more nearly normal or whose CVs are more nearly known, or eliminating or combining dependent ESs or modeling their dependence.

Another statistical task is to decide whether the model is adequate or less appropriate than another model. This falls under the general statistical problem of model selection, which is challenging in many contexts. One or more aspects of the SHoFE model could be inappropriate for our data, but perhaps the most commonly assessed aspect is between-studies homogeneity of ES parameters (i.e., $\theta_i = \mu$ for all $i$). A popular way to assess this assumption is to test $H_0\colon \theta_i = \mu$, which is often done using the following heterogeneity statistic:

$$Q = \sum_i w_i (y_i - \hat{\mu}_F)^2 .$$

If $H_0$ is true (and other assumptions underlying the SHoFE model are satisfied), this weighted sum of squares follows a $\chi^2(k - 1)$ distribution. Essentially, this “specification” test evaluates whether our collection of ES estimates varies more than we’d expect based on their CVs; a statistically significant upper-tail test suggests there’s excess variation due to between-studies heterogeneity of ES parameters. It’s an omnibus test designed to detect any departure from homogeneity, so it’s not tailored to a specific pattern of heterogeneity (e.g., different ES parameters for 2 subsets of studies).
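As a rough sketch of the homogeneity test, the code below computes Q and its upper-tail chi-squared p value using only the Python standard library. The helper names `chi2_sf` and `q_statistic` and the toy data are my own assumptions; `chi2_sf` relies on the classical finite-series expressions for integer degrees of freedom rather than a statistics package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    half_exp = math.exp(-x / 2)
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return half_exp * total
    # Odd df: 2*[1 - Phi(sqrt(x))] + 2*phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r - 1))
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    pdf = half_exp / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * total

def q_statistic(y, v):
    """Return (Q, df): weighted sum of squared deviations from the WLS mean."""
    w = [1.0 / vi for vi in v]
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_hat) ** 2 for wi, yi in zip(w, y))
    return q, len(y) - 1

# Hypothetical data: three studies' ES estimates and CV estimates
q, df = q_statistic([0.30, 0.50, 0.70], [0.04, 0.02, 0.04])
p = chi2_sf(q, df)
print(q, df, p)
```

For these made-up inputs Q is 2.0 on 2 degrees of freedom, which gives no indication of excess variation; in practice a library routine for the chi-squared distribution would normally replace the hand-written `chi2_sf`.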

This homogeneity test is a topic of controversy. Meta-analysts often misuse it to guide or defend data-analysis choices. Its performance depends on several features of the data, such as how well our ES estimators and data-collection process conform to the SHoFE model. Rejecting homogeneity doesn’t guarantee there’s some type of heterogeneity (e.g., it might be a Type I error), provide a measure of any such heterogeneity’s real-world importance, or tell us which of countless alternative models is appropriate. Likewise, failing to reject homogeneity doesn’t rule out definitively some type of heterogeneity (e.g., it might be a Type II error) or preclude detecting a specific pattern of heterogeneity (e.g., a covariate effect). Other proposed ways to assess homogeneity, such as descriptive measures of the magnitude of heterogeneity or its influence on certain results (e.g., $H^2$, $I^2$), are beyond the present scope.

Example (Workplace Exercise): Let’s illustrate SHoFE analyses using Conn, Hafdahl, Cooper, Brown, and Lusk’s (2009) quantitative review of workplace exercise interventions, described in Part 1 of this overview. Corresponding to each of their (well, our) SRE results in Tables 2 and 3, for three types of standardized mean difference (SMD) on 11 outcome variables, they also conducted SHoFE analyses. In particular, for fitness they analyzed $k = 35$ two-group posttest SMDs after excluding one outlier. (Footnote 4) These estimates and their (estimated) CVs, which are based on shrinkage estimates of $\theta_i$ that I won’t discuss here, yield the following sums needed for SHoFE analyses:

• $\sum_i w_i = 321.7$
• $\sum_i w_i y_i = 183.4$
• $\sum_i w_i y_i^2 = 172.1$

These in turn yield the WLS point estimate of $\mu$

$$\hat{\mu}_F = 183.4 / 321.7 = 0.570$$

and its variance

$$\mathrm{Var}(\hat{\mu}_F) = 1 / 321.7 = 0.0558^2 .$$

This estimate of the common two-group posttest SMD on fitness represents a treatment mean just over ½ standard deviation (SD) above the control mean, and it’s about 10 times larger than its SE.  Using these quantities for standard-normal inferences, we obtain the 95% CI

0.570 ± 1.960(0.0558) = (0.461, 0.679) .

A two-tailed test of the nil null hypothesis $H_0\colon \mu = 0$ at $\alpha_2 = .05$ yields the test statistic

$$z_F = 0.570 / 0.0558 = 10.22 ,$$

whose p value is 0 to many decimal places.  This CI and test reflect only within-study sampling error over hypothetical meta-analyses (due to random sampling of participants), thereby supporting conditional inferences that extend only to studies like Conn et al.’s 35.  The test indicates that the common SMD is (statistically) significantly positive, and the CI suggests more specifically that we can be 95% confident—in the somewhat awkward frequentist sense—that this common SMD is between 0.46 and 0.68.

To assess homogeneity we can compute the heterogeneity statistic

$$Q(34) = 172.1 - (183.4^2 / 321.7) = 67.6 ,$$

for which p = .000529.  This indicates significant heterogeneity, which suggests these data might violate the SHoFE model’s homogeneity assumption.
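The arithmetic in this example can be re-traced from the three reported sums. The short script below is just such a check, with variable names of my own choosing; because the sums are rounded to one decimal place, its results can differ from the reported values in the last digit (for instance, Q comes out near 67.5 rather than 67.6).

```python
from statistics import NormalDist

# Sums reported in the text for the k = 35 fitness SMDs (rounded to 1 decimal)
k = 35
sum_w, sum_wy, sum_wy2 = 321.7, 183.4, 172.1

mu_f = sum_wy / sum_w                     # WLS point estimate, about 0.570
se_f = (1.0 / sum_w) ** 0.5               # standard error, about 0.0558
z_crit = -NormalDist().inv_cdf(0.025)     # about 1.960
ci = (mu_f - z_crit * se_f, mu_f + z_crit * se_f)
z_f = mu_f / se_f                         # test of H0: mu = 0
q = sum_wy2 - sum_wy ** 2 / sum_w         # computational form of Q, df = k - 1

print(f"mu_F = {mu_f:.3f}, SE = {se_f:.4f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), z = {z_f:.2f}")
print(f"Q({k - 1}) = {q:.1f}")
```

The last line uses the identity that the weighted sum of squared deviations from the weighted mean equals the weighted sum of squares minus the squared weighted sum divided by the sum of weights, which is why only three sums are needed.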

SRE. This model’s two hyperparameters, $\mu$ and $\tau^2$, are usefully viewed as the mean and variance (i.e., the between-studies variance component, BSVC) of a (hyper)distribution of ES parameters. We can estimate and make inferences about both of these. Many meta-analysts who use this model focus solely on $\mu$, but some are also interested in $\tau^2$ or other features of the ES-parameter distribution. Perhaps the most widely used meta-analytic technique for this model is a two-step procedure that entails first obtaining a weighted method-of-moments (WMoM) estimate of $\tau^2$; adding this to each study’s CV estimate yields estimated unconditional variances, whose reciprocals are weights in a WLS estimate of $\mu$. Specifically, we first use the SHoFE model’s weights ($w_i$) and heterogeneity statistic ($Q$) to estimate $\tau^2$ as

$$\hat{\tau}_S^2 = \max\{0, [Q - (k - 1)] / c_S\} ,$$

where taking the maximum avoids negative estimates, and

$$c_S = \sum_i w_i - \frac{\sum_i w_i^2}{\sum_i w_i} .$$

For insight into this BSVC estimator, consider the “balanced” case where every study’s CV estimate is $v$: Because all weights are equal (i.e., $w_i = w = 1 / v$ for all $i$), $\hat{\mu}_F$ is just the simple mean of ES estimates, $Q$ is $w$ times the unweighted sum of squared deviations from this mean, $c_S$ reduces to $w(k - 1)$, and $\hat{\tau}_S^2$ is either 0 or a positive value of $s_y^2 - v$, where $s_y^2$ is the usual unbiased variance estimate applied to the ES estimates. Rearranging this (when $\hat{\tau}_S^2 > 0$) yields

$$s_y^2 = \hat{\tau}_S^2 + v ,$$

which represents a decomposition of the ES estimates’ total variance into between-studies and within-study variances (i.e., due to sampling of studies and subjects). Even in the more general situation with unequal $v_i$, the above BSVC estimate is still essentially the excess variance in ES estimates beyond that due to within-study variance.
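The balanced-case identity is easy to confirm numerically. In the sketch below, `wmom_tau2` is a hypothetical helper that implements the truncated method-of-moments formula above, and the four ES estimates and the common CV are made-up values chosen so that the estimate is positive.

```python
from statistics import variance

def wmom_tau2(y, v):
    """Weighted method-of-moments estimate of tau^2, truncated at 0."""
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    mu_f = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - mu_f) ** 2 for wi, yi in zip(w, y))
    c_s = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max(0.0, (q - (len(y) - 1)) / c_s)

# Hypothetical balanced data: every study has the same CV estimate
y = [0.10, 0.40, 0.55, 0.90]
v_common = 0.05
tau2 = wmom_tau2(y, [v_common] * len(y))
s2_y = variance(y)                 # usual unbiased sample variance of the ES estimates

print(tau2, s2_y - v_common)       # the two values agree in the balanced case
```

With unequal CVs the two printed quantities would no longer coincide exactly, but the estimate keeps the same interpretation as variation in excess of what within-study sampling error alone would produce.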

At any rate, we next use the BSVC estimate to estimate each study’s unconditional weight as $w_{Si} = 1 / (\hat{\tau}_S^2 + v_i)$. (The somewhat clumsy notation $w_{Si}$ distinguishes this weight from its counterparts from the SHoFE, MHoFE, and MRE models.) Provided that $\hat{\tau}_S^2 > 0$, these unconditional weights ($w_{Si}$) will be smaller, reflecting lower precision, and more similar to one another than their conditional counterparts ($w_i$). Now, to estimate $\mu$ we simply apply WLS with these new weights:

$$\hat{\mu}_R = \frac{\sum_i w_{Si} y_i}{\sum_i w_{Si}} .$$

(Footnote 5)
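Putting the two steps together, a minimal sketch of the procedure might look like the following. The function name `random_effects_mean` and the four-study data set are hypothetical, and the returned variance is the conventional reciprocal of the summed unconditional weights, which treats the BSVC estimate as if it were known.

```python
def random_effects_mean(y, v):
    """Two-step SRE estimate: WMoM tau^2, then WLS mean with unconditional weights.

    Returns (mu_r, var_r, tau2).
    """
    k = len(y)
    # Step 1: SHoFE weights, Q, and the truncated method-of-moments tau^2
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    mu_f = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - mu_f) ** 2 for wi, yi in zip(w, y))
    c_s = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c_s)
    # Step 2: unconditional weights w_Si = 1 / (tau^2 + v_i) and the WLS mean
    w_s = [1.0 / (tau2 + vi) for vi in v]
    sum_ws = sum(w_s)
    mu_r = sum(wi * yi for wi, yi in zip(w_s, y)) / sum_ws
    var_r = 1.0 / sum_ws
    return mu_r, var_r, tau2

# Hypothetical data: four studies with unequal CV estimates
y = [0.10, 0.40, 0.55, 0.90]
v = [0.02, 0.05, 0.03, 0.08]
mu_r, var_r, tau2 = random_effects_mean(y, v)
print(mu_r, var_r ** 0.5, tau2)
```

When Q does not exceed k − 1 the truncation sets the BSVC estimate to 0, the two sets of weights coincide, and the function simply returns the SHoFE estimate and its variance.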

As $\hat{\tau}_S^2$ increases, $\hat{\mu}_R$ approaches the ES estimates’ unweighted mean. The mean estimator’s variance,

$$\mathrm{Var}(\hat{\mu}_R) = \frac{1}{\sum_i w_{Si}} ,$$

increases with larger $\hat{\tau}_S^2$; this is evident in the balanced case (i.e., $v_i = v$ for all $i$), where $\mathrm{Var}(\hat{\mu}_R) = (\hat{\tau}_S^2 + v) / k$. It’s conventional to use $\hat{\mu}_R$ and its variance for standard-normal inferences about $\mu$, such as a CI or test. These procedures face additional limitations besides those for their counterparts under the SHoFE model: Because $\hat{\tau}_S^2$ and hence $\mathrm{Var}(\hat{\mu}_R)$ are subject to sampling error, standard-normal techniques may perform poorly, especially with few studies (i.e., small $k$). Moreover, if the CV depends on $\theta_i$ it’s unclear what substitute for $\theta_i$ in $v_i$ would optimize estimation or inference (e.g., an estimate of $\mu$? a shrinkage estimate of $\theta_i$?). To overcome some of these limitations, other estimators for $\tau^2$ have been proposed, as have other methods of inference for $\mu$; they’re beyond this overview’s scope, but the following articles and chapter address several of them:

DerSimonian, R., & Kacker, R. (2007). Random-effects model for meta-analysis of clinical trials: An update. Contemporary Clinical Trials, 28, 105–114. doi:10.1016/j.cct.2006.04.004

Raudenbush, S. W. (2009). Analyzing effect sizes: Random-effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 295–315). New York: Russell Sage Foundation.

Sidik, K., & Jonkman, J. N. (2007). A comparison of heterogeneity variance estimators in combining results of studies. Statistics in Medicine, 26, 1964–1981. doi:10.1002/sim.2688

Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance estimators in the random-effects model. Journal of Educational & Behavioral Statistics, 30, 261–293. doi:10.3102/10769986030003261

As for inference about $\tau^2$, we’ve already met the most common procedure (and its limitations): The $Q$ test of the SHoFE model’s homogeneity assumption also tests $H_0\colon \tau^2 = 0$. (Readers familiar with random-effects ANOVA may recognize a parallel with the classical 1-way ANOVA for $k$ independent samples’ means, where the test is identical for fixed and random factors.) It’s also possible to test $H_0\colon \tau^2 = \tau_0^2$ with a non-zero a priori value for $\tau_0^2$, but this is rarely done and won’t be addressed here. Constructing a CI for $\tau^2$ is more common but still fairly rare; the following article provides computational details and may help interested readers find related work (e.g., in citing articles):

Viechtbauer, W. (2007). Confidence intervals for the amount of heterogeneity in meta-analysis. Statistics in Medicine, 26, 37–52. doi:10.1002/sim.2514

Finally, I’ll simply mention other potentially interesting features of the ES-parameter distribution without considering estimation or inference methods.  In some situations we might wish to find the proportion of ES parameters below, above, or between selected values (e.g., positive, negligibly small), which involves the cumulative distribution function (CDF).  Likewise, finding values that demarcate specific proportions of ES parameters, such as quartiles or percentiles, involves the quantile function (i.e., inverse CDF).  For instance, we might wish to express between-studies heterogeneity as an interval or more general set of values in which most ES parameters fall, such as a 95% prediction interval, credibility interval (in validity-generalization parlance), or highest density region.  These proportions and quantiles depend on the distribution’s shape, which we might estimate from our data instead of assuming normality.  If we permit non-normal ES parameters, we might also be interested in higher-order moments such as skewness or kurtosis.
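To make the CDF and quantile-function ideas concrete, here is a deliberately naive sketch. It assumes the ES parameters are normally distributed and plugs in hypothetical values for the mean and between-studies SD as if they were known, so it ignores estimation uncertainty and should not be read as a proper prediction interval or as a recommended inference method.

```python
from statistics import NormalDist

# Hypothetical plug-in values for mu and tau (not results from the text)
mu_hat, tau_hat = 0.5, 0.3
es_dist = NormalDist(mu=mu_hat, sigma=tau_hat)   # assumed ES-parameter distribution

# Proportions of ES parameters in selected regions (CDF)
prop_positive = 1 - es_dist.cdf(0.0)
prop_negligible = es_dist.cdf(0.1) - es_dist.cdf(-0.1)

# Values that demarcate selected proportions (quantile function)
lower, upper = es_dist.inv_cdf(0.025), es_dist.inv_cdf(0.975)
quartiles = [es_dist.inv_cdf(p) for p in (0.25, 0.50, 0.75)]

print(f"P(theta > 0) = {prop_positive:.3f}")
print(f"P(|theta| < 0.1) = {prop_negligible:.3f}")
print(f"central 95% range = ({lower:.3f}, {upper:.3f})")
print("quartiles:", [round(x, 3) for x in quartiles])
```

If normality were not assumed, both the proportions and the quantiles could change noticeably even with the same mean and variance, which is the reason the distribution’s shape matters here.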

Example (Workplace Exercise): Let’s illustrate a SRE analysis, again using Conn et al.’s (2009) 35 two-group posttest SMDs on fitness. We’ll need $\sum_i w_i = 321.7$ from the SHoFE analyses as well as $\sum_i w_i^2 = 4063.9$. To estimate the BSVC we first compute

$$c_S = 321.7 - (4063.9 / 321.7) = 309.1 ,$$

which in turn yields

$$\hat{\tau}_S^2 = [67.6 - (35 - 1)] / 309.1 = 0.330^2 .$$

Adding this estimate to each study’s CV estimate and computing unconditional weights ($w_{Si}$) yields the following sums:

• $\sum_i w_{Si} = 149.2$
• $\sum_i w_{Si} y_i = 86.4$

These in turn yield the point estimate of $\mu$

$$\hat{\mu}_R = 86.4 / 149.2 = 0.579$$

and its variance

$$\mathrm{Var}(\hat{\mu}_R) = 1 / 149.2 = 0.0819^2 .$$

This estimate of the mean two-group posttest SMD on fitness is only slightly larger than its SHoFE counterpart (for the common SMD).  This SRE estimate’s variance is more than twice the SHoFE estimate’s, however, reflecting the substantial BSVC.  Using these quantities for standard-normal inferences about μ, we obtain the 95% CI

0.579 ± 1.960(0.0819) = (0.419, 0.740) .

A two-tailed test of the nil null hypothesis $H_0\colon \mu = 0$ at $\alpha_2 = .05$ yields the test statistic

$$z_R = 0.579 / 0.0819 = 7.08 ,$$

whose p value is 0 to many decimal places.  This CI and test reflect both within-study and between-studies sampling error over hypothetical meta-analyses, supporting unconditional inferences that extend to a universe of studies from which our 35 were sampled.  The price paid for these broader inferences, relative to their SHoFE counterparts, is a less precise estimate of μ, as reflected in the wider CI and smaller test statistic.  The test indicates that the mean SMD is significantly positive, and the CI suggests we can be 95% confident that this mean SMD is between 0.42 and 0.74.
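As with the SHoFE example, these SRE computations can be re-traced from the reported sums. The script below is only a check with variable names of my own choosing; it takes Q = 67.6 as reported and uses rounded sums, so its results may differ from the text in the last digit.

```python
from statistics import NormalDist

# Quantities reported in the text (rounded)
k = 35
sum_w, sum_w2 = 321.7, 4063.9       # sums of w_i and w_i^2
q = 67.6                            # heterogeneity statistic from the SHoFE analysis
sum_ws, sum_wsy = 149.2, 86.4       # sums of w_Si and w_Si * y_i

c_s = sum_w - sum_w2 / sum_w                    # about 309.1
tau2 = max(0.0, (q - (k - 1)) / c_s)            # BSVC estimate
tau = tau2 ** 0.5                               # about 0.330

mu_r = sum_wsy / sum_ws                         # about 0.579
se_r = (1.0 / sum_ws) ** 0.5                    # about 0.0819
z_crit = -NormalDist().inv_cdf(0.025)
ci = (mu_r - z_crit * se_r, mu_r + z_crit * se_r)
z_r = mu_r / se_r
var_ratio = (1.0 / sum_ws) / (1.0 / sum_w)      # SRE variance relative to SHoFE

print(f"c_S = {c_s:.1f}, tau = {tau:.3f}")
print(f"mu_R = {mu_r:.3f}, SE = {se_r:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z_r:.2f}, variance ratio = {var_ratio:.2f}")
```

The variance ratio makes the cost of unconditional inference explicit: here the SRE estimate’s variance is a little more than twice its SHoFE counterpart.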

That’s all I’ll say about meta-analytic models without covariates.  Stay tuned for Part 5c, in which I’ll describe and demonstrate versions of the above procedures that handle covariates; I’ll also mention some extensions and other variants of these models and procedures.

## Footnotes

1. I doubt HeFE models are nested within RE models, but I’m unsure; this is rarely (if ever) discussed. Clearly they’d be equivalent if $U_i = \eta_i$, but this constraint isn’t expressed in terms of (hyper)parameters.

2. Comparing non-nested models is trickier but possible.

3. To relate this test to similar tests of fixed effects in more complex models, note that squaring $z_F$ yields a statistic distributed approximately as $\chi^2(1)$ (i.e., chi-squared with 1 degree of freedom) under $H_0$. We can write this as a weighted sum of squares comparing two models, one in which $\mu$ is estimated freely and another in which it’s constrained to $\mu_0$:

$$Q_{\mu F} = \sum_i w_i (\hat{\mu}_F - \mu_0)^2 = \frac{(\hat{\mu}_F - \mu_0)^2}{1 / \sum_i w_i} = \left[\frac{\hat{\mu}_F - \mu_0}{\mathrm{SE}(\hat{\mu}_F)}\right]^2 = z_F^2 .$$

4. Their analyses accounted for dependence among 4 multiple-treatment pairs and 1 multiple-treatment triplet; for simplicity I’ll instead treat the 35 SMD estimates as independent, which decreases $\mathrm{Var}(\hat{\mu}_F)$ and $Q$ somewhat.

5. Although $\hat{\mu}_R$ is a different estimator than the SHoFE model’s $\hat{\mu}_F$, and these estimate different quantities in different models, they sometimes take the same value: when $Q \le k - 1$ and, hence, $\hat{\tau}_S^2 = 0$ so that $w_{Si} = w_i$.
