# Overview of Meta-Analysis, Part 5b (of 7): Primary Meta-Analyses (cont.)

**Posted:** April 30, 2012

**Author:** A. R. Hafdahl

**Filed under:** Overview of Meta-Analysis

**Tags:** between-studies variance component, conditional variance, fixed effect, heterogeneity, interval estimation, math notation, meta-analysis, meta-regression, random effect, significance testing, standardized mean difference

This is the second of three posts in Part 5 of my overview of meta-analysis. In Part 5a I described six conventional models for meta-analysis, each of which combines within-study and between-studies models. In this second post I first comment on **nested models** then describe **estimation and inference for two models without covariates**—procedures for fitting these models to effect-size (ES) estimates and quantifying uncertainty about their focal (hyper)parameters. In the third post, Part 5c, I’ll do the same for two models with covariates and also comment on extensions and variants of these models and procedures.

## Nested Models

As a precursor to estimation and inference, it’s useful to note certain relations among the six models I presented. To that end, below I list them in combined linear-model form. Where relevant we assume *E*_{i} ~ *N*(0, σ_{i}^{2}) with known conditional variance (CV) σ_{i}^{2}, E(*U*_{i}) = 0, Var(*U*_{i}) = τ^{2}, and *E*_{i} and *U*_{i} are independent. (For some procedures we further assume *U*_{i} ~ *N*.)

- SHoFE: *Y*_{i} = μ + *E*_{i}
- SHeFE: *Y*_{i} = μ + η_{i} + *E*_{i}
- SRE: *Y*_{i} = μ + *U*_{i} + *E*_{i}
- MHoFE: *Y*_{i} = **x**_{i}**β** + *E*_{i}
- MHeFE: *Y*_{i} = **x**_{i}**β** + η_{i} + *E*_{i}
- MRE: *Y*_{i} = **x**_{i}**β** + *U*_{i} + *E*_{i}

Some meta-analytic procedures involve comparing nested pairs of these models, at least implicitly. For present purposes, let’s consider *Model A nested within Model B* if constraining (hyper)parameters in Model B to specific values yields (a model equivalent to) Model A.

For example, the SHoFE model is nested within all five others: We can arrive at it by constraining quantities in the SHeFE model (η_{i} = 0), the SRE model (*U*_{i} = 0 or τ^{2} = 0), the MHoFE model (**x**_{i} = 1 and **β** = μ), and so on. Similarly, the SHeFE and SRE models are nested within the MHeFE and MRE models, respectively, by the constraints **x**_{i} = 1 and **β** = μ, and the MHoFE model is nested within both the MHeFE model (η_{i} = 0) and the MRE model (*U*_{i} = 0 or τ^{2} = 0).^{F1} Furthermore, any model that permits covariates can include different sets of covariates, and two versions of such a model with nested sets of covariates (e.g., 1 set is a subset of the other) are nested models.
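To make the nesting idea concrete, here is a minimal simulation sketch in Python. It is only an illustration under stated assumptions: the function names, the seed handling, and the input values are mine, and I assume normal *U*_{i} (which only some procedures require). Setting τ^{2} = 0 in the SRE generator gives the SHoFE generator.

```python
import random

def simulate_sre(mu, tau2, cond_vars, seed=0):
    """Draw one ES estimate per study from Y_i = mu + U_i + E_i.

    ASSUMPTION (for illustration): U_i ~ N(0, tau2) and E_i ~ N(0, sigma_i^2),
    independent, with the conditional variances passed in as known values.
    """
    rng = random.Random(seed)
    tau = tau2 ** 0.5
    estimates = []
    for var_i in cond_vars:
        u_i = rng.gauss(0.0, tau)             # between-studies random effect
        e_i = rng.gauss(0.0, var_i ** 0.5)    # within-study sampling error
        estimates.append(mu + u_i + e_i)
    return estimates

def simulate_shofe(mu, cond_vars, seed=0):
    """SHoFE as the nested special case of SRE with tau^2 = 0."""
    return simulate_sre(mu, 0.0, cond_vars, seed=seed)

# Hypothetical inputs: mean ES 0.5, BSVC 0.1, three studies' conditional variances
y_sre = simulate_sre(mu=0.5, tau2=0.1, cond_vars=[0.02, 0.05, 0.03], seed=42)
y_shofe = simulate_shofe(mu=0.5, cond_vars=[0.02, 0.05, 0.03], seed=42)
```

The constraint is applied to a hyperparameter (τ^{2}) rather than to each *U*_{i} separately, which matches the definition of nesting given above.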

Comparing nested models essentially involves assessing a **tradeoff between adequacy and parsimony**, given that more complex models tend to fit a data set better than simpler models: It’s sensible to prefer a simpler (more complex) model whose gain in parsimony (adequacy) is large relative to its loss in adequacy (parsimony).^{F2} This principle can be used to assess whether a mean effect size (ES) is plausibly some specific value (e.g., μ = 0, μ = 1/2), whether ES parameters are plausibly homogeneous (e.g., η_{i} = 0, τ^{2} = 0), and whether one or more covariates’ associations with ES parameters are plausibly 0 or other specific values (e.g., for non-intercept elements of **β**). I’ll mention such comparisons occasionally when describing meta-analysis procedures.

The two heterogeneous fixed-effects models, SHeFE and MHeFE, seem to be used rarely. For this reason and to conserve resources, I won’t discuss them further in this overview. Interested readers might refer to the following articles about such models and associated procedures:

Bonett, D. G. (2008). Meta-analytic interval estimation for bivariate correlations. *Psychological Methods, 13,* 173–181. doi:10.1037/a0012868

Bonett, D. G. (2009). Meta-analytic interval estimation for standardized and unstandardized mean differences. *Psychological Methods, 14,* 225–238. doi:10.1037/a0016619

Bonett, D. G. (2010). Varying coefficient meta-analytic methods for alpha reliability. *Psychological Methods, 15,* 368–385. doi:10.1037/a0020142

Overton, R. C. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. *Psychological Methods, 3,* 354–379. doi:10.1037/1082-989X.3.3.354

## Estimation and Inference: Models without Covariates

In this section I describe common procedures for estimating and making inferences about (hyper)parameters in two of the above models without covariates: SHoFE and SRE. Each subsection below focuses on one of these. Common meta-analytic procedures for these models and those with covariates share several attributes, such as using weighted least-squares (WLS) estimators for fixed effects (e.g., μ or **β**), with weights based on CVs.

Aside about notation: I haven’t yet figured out how to typeset non-trivial mathematical expressions in blog posts (time to learn LaTeX!), so for now I’ll denote estimates of (hyper)parameters with a caret prefix (e.g., ^μ is an estimate of μ) and denote summation over *k* studies using ∑_{i}, where *i* runs from 1 to *k* (e.g., ∑_{i} *y*_{i} = *y*_{1} + *y*_{2} + … + *y*_{k}). (end of aside)

**SHoFE.** The main statistical tasks under this model are to **estimate and make inferences about μ**, the common ES parameter. A widely used procedure for accomplishing these tasks is a simple WLS method that yields a point estimate of μ and this estimate’s variance. This point estimate is just a precision-weighted mean of the ES estimates; the optimal weights (which minimize the estimator’s sampling variance or, equivalently, maximize its precision) are reciprocals of CVs, 1 / σ_{i}^{2}, but as described in this overview’s Part 2 we often estimate these weights based on CV estimates, *w*_{i} = 1 / *v*_{i}. In terms of estimated weights, the WLS point estimate is

^μ_{F} = ∑_{i} *w*_{i} *y*_{i} / ∑_{i} *w*_{i} ,

and its variance is

Var(^μ_{F}) = 1 / ∑_{i} *w*_{i} .

This point estimate and variance are typically used to make standard-normal inferences, such as a confidence interval (CI) for or test of μ. Specifically, we could construct a 100(1 − α)% equal-tail CI for μ as

^μ_{F} ± *z*_{α}SE(^μ_{F}) ,

where *z*_{α} = −Φ^{−1}(α/2), with Φ^{−1} denoting the standard-normal quantile function (e.g., for a 95% CI *z*_{.05} = 1.960), and SE(^μ_{F}) = √[Var(^μ_{F})] is ^μ_{F}’s standard error (SE). Likewise, to test the null hypothesis *H*_{0}: μ = μ_{0}, where μ_{0} is an a priori value, we could refer the statistic

*z*_{F} = (^μ_{F} − μ_{0}) / SE(^μ_{F})

to a standard-normal reference distribution to obtain a *p* value.^{F3}
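The WLS estimate, its SE, the equal-tail CI, and the *z* test can be sketched in a few lines of Python. This is an illustrative sketch, not code from any published analysis: the function name `shofe_summary` and the four-study data are hypothetical, and the weights are the estimated ones (*w*_{i} = 1 / *v*_{i}).

```python
from math import sqrt
from statistics import NormalDist

def shofe_summary(y, v, mu0=0.0, alpha=0.05):
    """WLS estimate of the common ES with standard-normal CI and test."""
    w = [1.0 / v_i for v_i in v]                         # estimated weights w_i = 1 / v_i
    mu_hat = sum(w_i * y_i for w_i, y_i in zip(w, y)) / sum(w)
    se = sqrt(1.0 / sum(w))                              # SE = sqrt(1 / sum of weights)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # about 1.960 when alpha = .05
    z_stat = (mu_hat - mu0) / se
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return {"mu": mu_hat, "se": se,
            "ci": (mu_hat - z_crit * se, mu_hat + z_crit * se),
            "z": z_stat, "p": p_two_tailed}

# Hypothetical ES estimates and CV estimates for k = 4 studies
result = shofe_summary(y=[0.2, 0.5, 0.8, 0.4], v=[0.04, 0.02, 0.08, 0.05])
```

Note that `inv_cdf(1 − α/2)` returns the same critical value as −Φ^{−1}(α/2), by symmetry of the standard normal distribution.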

Because Var(^μ_{F}) is not known when *w*_{i} is an estimate, *standard-normal inferences might not perform as advertised* (e.g., CI coverage rate below nominal, inflated Type I error rate for tests). Other potential problems include non-normality of ES estimators (especially with small samples of subjects) and non-independence of ESs. Strategies to address these problems are beyond the present scope but could entail updating weights iteratively when σ_{i}^{2} depends on θ_{i} (= μ), using alternative ES estimators that are more normal or whose CVs are more nearly known, or eliminating or combining dependent ESs or modeling their dependence.

Another statistical task is to **decide whether the model is adequate** or less appropriate than another model. This falls under the general statistical problem of *model selection*, which is challenging in many contexts. One or more aspects of the SHoFE model could be inappropriate for our data, but perhaps the most commonly assessed aspect is between-studies homogeneity of ES parameters (i.e., θ_{i} = μ for all *i*). A popular way to assess this assumption is to test *H*_{0}: θ_{i} = μ, which is often done using the following heterogeneity statistic:

*Q* = ∑_{i} *w*_{i}(*y*_{i} − ^μ_{F})^{2} .

If *H*_{0} is true (and other assumptions underlying the SHoFE model are satisfied), this weighted sum of squares follows a χ^{2}(*k* − 1) distribution. Essentially, this “specification” test evaluates whether our collection of ES estimates vary more than we’d expect based on their CVs; a statistically significant upper-tail test suggests there’s excess variation due to between-studies heterogeneity of ES parameters. It’s an omnibus test designed to detect *any* departure from homogeneity, so it’s not tailored to a specific pattern of heterogeneity (e.g., different ES parameters for 2 subsets of studies).
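Here is a hedged sketch of the homogeneity test using only Python’s standard library. The names `chi2_sf` and `q_test` are mine, and the small helper for the chi-squared upper tail (valid for integer degrees of freedom) is a convenience I’m assuming in place of a statistics package; in practice a library routine would usually be preferable.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-half)               # Q(1, half)
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(half))    # Q(1/2, half)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail

def q_test(y, v):
    """Heterogeneity statistic Q, its df (k - 1), and the upper-tail p value."""
    w = [1.0 / v_i for v_i in v]
    mu_f = sum(w_i * y_i for w_i, y_i in zip(w, y)) / sum(w)
    q = sum(w_i * (y_i - mu_f) ** 2 for w_i, y_i in zip(w, y))
    df = len(y) - 1
    return q, df, chi2_sf(q, df)

# Hypothetical ES estimates and CV estimates for k = 4 studies
q, df, p = q_test(y=[0.2, 0.5, 0.8, 0.4], v=[0.04, 0.02, 0.08, 0.05])
```

Only the upper tail is used because the test asks whether the estimates vary more than their CVs alone would predict.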

*This homogeneity test is a topic of controversy.* Meta-analysts often misuse it to guide or defend data-analysis choices. Its performance depends on several features of the data, such as how well our ES estimators and data-collection process conform to the SHoFE model. Rejecting homogeneity doesn’t guarantee there’s some type of heterogeneity (e.g., it might be a Type I error), provide a measure of any such heterogeneity’s real-world importance, or tell us which of countless alternative models is appropriate. Likewise, failing to reject homogeneity doesn’t rule out definitively some type of heterogeneity (e.g., it might be a Type II error) or preclude detecting a specific pattern of heterogeneity (e.g., a covariate effect). Other proposed ways to assess homogeneity, such as descriptive measures of the magnitude of heterogeneity or its influence on certain results (e.g., *H*^{2}, *I*^{2}), are beyond the present scope.

**Example: Workplace Exercise.** Let’s illustrate SHoFE analyses using Conn, Hafdahl, Cooper, Brown, and Lusk’s (2009) quantitative review of workplace exercise interventions, described in Part 1 of this overview. Corresponding to each of their (well, our) SRE results in Tables 2 and 3, for three types of standardized mean difference (SMD) on 11 outcome variables, they also conducted SHoFE analyses. In particular, for fitness they analyzed *k* = 35 two-group posttest SMDs after excluding one outlier.^{F4} These estimates and their (estimated) CVs, which are based on shrinkage estimates of θ_{i} that I won’t discuss here, yield the following sums needed for SHoFE analyses:

- ∑_{i} *w*_{i} = 321.7
- ∑_{i} *w*_{i} *y*_{i} = 183.4
- ∑_{i} *w*_{i} *y*_{i}^{2} = 172.1

These in turn yield the WLS point estimate of μ

^μ_{F} = 183.4 / 321.7 = 0.570

and its variance

Var(^μ_{F}) = 1 / 321.7 = 0.0558^{2} .

This estimate of the common two-group posttest SMD on fitness represents a treatment mean just over ½ standard deviation (SD) above the control mean, and it’s about 10 times larger than its SE. Using these quantities for standard-normal inferences, we obtain the 95% CI

0.570 ± 1.960(0.0558) = (0.461, 0.679) .

A two-tailed test of the nil null hypothesis *H*_{0}: μ = 0 at α_{2} = .05 yields the test statistic

*z*_{F} = 0.570 / 0.0558 = 10.22 ,

whose *p* value is 0 to many decimal places. This CI and test reflect only within-study sampling error over hypothetical meta-analyses (due to random sampling of participants), thereby supporting conditional inferences that extend only to studies like Conn et al.’s 35. The test indicates that the common SMD is (statistically) significantly positive, and the CI suggests more specifically that we can be 95% confident (in the somewhat awkward frequentist sense) that this common SMD is between 0.46 and 0.68.

To assess homogeneity we can compute the heterogeneity statistic

*Q*(34) = 172.1 − (183.4^{2} / 321.7) = 67.6 ,

for which *p* = .000529. This indicates significant heterogeneity, which suggests these data might violate the SHoFE model’s homogeneity assumption.
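The arithmetic in this example can be reproduced from the three reported sums alone, as in the rough Python sketch below (variable names are mine). It relies on the shortcut *Q* = ∑_{i} *w*_{i} *y*_{i}^{2} − (∑_{i} *w*_{i} *y*_{i})^{2} / ∑_{i} *w*_{i}, which is algebraically the same as the weighted sum of squared deviations. Because the reported sums are rounded to one decimal place, the recomputed *Q* comes out near 67.5 rather than exactly 67.6; I assume the published value was computed from unrounded quantities.

```python
from math import sqrt
from statistics import NormalDist

# Sums reported in the example (rounded to one decimal place)
k = 35
sum_w, sum_wy, sum_wyy = 321.7, 183.4, 172.1

mu_f = sum_wy / sum_w                    # WLS point estimate of mu
se_f = sqrt(1.0 / sum_w)                 # its standard error
z_crit = NormalDist().inv_cdf(0.975)     # about 1.960
ci_f = (mu_f - z_crit * se_f, mu_f + z_crit * se_f)
z_f = mu_f / se_f                        # test of H0: mu = 0

# Shortcut form of Q; equals the sum of w_i * (y_i - mu_f)^2
q = sum_wyy - sum_wy ** 2 / sum_w
df = k - 1
```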

**SRE.** This model’s two hyperparameters, μ and τ^{2}, are usefully viewed as the mean and variance (i.e., the between-studies variance component, BSVC) of a (hyper)distribution of ES parameters. We can estimate and make inferences about both of these. Many meta-analysts who use this model focus solely on μ, but some are also interested in τ^{2} or other features of the ES-parameter distribution. Perhaps the most widely used meta-analytic technique for this model is a two-step procedure that entails first obtaining a weighted method-of-moments (WMoM) estimate of τ^{2}; adding this to each study’s CV estimate yields *unconditional* variances, whose reciprocals are weights in a WLS estimate of μ. Specifically, we first use the SHoFE model’s weights (*w*_{i}) and heterogeneity statistic (*Q*) to estimate τ^{2} as

^τ_{S}^{2} = max{0, [*Q* − (*k* − 1)] / *c*_{S}} ,

where taking the maximum avoids negative estimates, and

*c*_{S} = ∑_{i} *w*_{i} − (∑_{i} *w*_{i}^{2} / ∑_{i} *w*_{i}) .

For insight into this BSVC estimator, consider the “balanced” case where every study’s CV estimate is *v*: Because all weights are equal (i.e., *w*_{i} = *w* = 1 / *v* for all *i*), ^μ_{F} is just the simple mean of ES estimates, *Q* is *w* times the unweighted sum of squared deviations from this mean, *c*_{S} reduces to *w*(*k* − 1), and ^τ_{S}^{2} is either 0 or the positive value of *s*_{y}^{2} − *v*, where *s*_{y}^{2} is the usual unbiased variance estimate applied to the ES estimates. Re-arranging this (when ^τ_{S}^{2} > 0) yields

*s*_{y}^{2} = ^τ_{S}^{2} + *v* ,

which represents a decomposition of the ES estimates’ total variance into between-studies and within-study variances (i.e., due to sampling of studies and subjects). Even in the more general situation with unequal *v*_{i}, the above BSVC estimate is still essentially the excess variance in ES estimates beyond that due to within-study variance.
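The WMoM estimator and its balanced-case interpretation can be checked with a short Python sketch. The function name `tau2_wmom` and the four ES estimates are hypothetical; the computation follows the formulas for *Q*, *c*_{S}, and ^τ_{S}^{2} given above.

```python
from statistics import variance

def tau2_wmom(y, v):
    """Weighted method-of-moments estimate of the BSVC, truncated at 0."""
    k = len(y)
    w = [1.0 / v_i for v_i in v]
    sum_w = sum(w)
    mu_f = sum(w_i * y_i for w_i, y_i in zip(w, y)) / sum_w
    q = sum(w_i * (y_i - mu_f) ** 2 for w_i, y_i in zip(w, y))
    c_s = sum_w - sum(w_i ** 2 for w_i in w) / sum_w
    return max(0.0, (q - (k - 1)) / c_s)

# Balanced case with hypothetical data: every study's CV estimate is v = 0.02
y_bal = [0.1, 0.5, 0.9, 0.3]
tau2_balanced = tau2_wmom(y_bal, [0.02] * 4)
excess_variance = variance(y_bal) - 0.02   # s_y^2 - v; should match tau2_balanced
```

With these made-up numbers both quantities are about 0.0967, which illustrates the decomposition of total variance into between-studies and within-study parts.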

At any rate, we next use the BSVC estimate to estimate each study’s unconditional weight as *w*_{Si} = 1 / (^τ_{S}^{2} + *v*_{i}). (The somewhat clumsy notation *w*_{Si} distinguishes this weight from its counterparts from the SHoFE, MHoFE, and MRE models.) Provided that ^τ_{S}^{2} > 0, these unconditional weights (*w*_{Si}) will be smaller (reflecting lower precision) and more similar than their conditional counterparts (*w*_{i}). Now, to estimate μ we simply apply WLS with these new weights:

^μ_{R} = ∑_{i} *w*_{Si} *y*_{i} / ∑_{i} *w*_{Si} .^{F5}

As ^τ_{S}^{2} increases, ^μ_{R} approaches the ES estimates’ unweighted mean. The mean estimator’s variance,

Var(^μ_{R}) = 1 / ∑_{i}w_{Si} ,

increases with larger ^τ_{S}^{2}; this is evident in the balanced case (i.e., *v*_{i} = *v* for all *i*), where Var(^μ_{R}) = (^τ_{S}^{2} + *v*) / *k*. It’s conventional to use ^μ_{R} and its variance for standard-normal inferences about μ, such as a CI or test. These procedures face additional limitations besides those for their counterparts under the SHoFE model: Because ^τ_{S}^{2} and hence Var(^μ_{R}) are subject to sampling error, *standard-normal techniques may perform poorly*, especially with few studies (i.e., small *k*). Moreover, if the CV depends on θ_{i} it’s unclear what substitute for θ_{i} in *v*_{i} would optimize estimation or inference (e.g., estimate of μ? shrinkage estimate of θ_{i}?). To overcome some of these limitations, other estimators for τ^{2} have been proposed, as have other methods of inference for μ; they’re beyond this overview’s scope, but the following articles and chapter address several of them:

DerSimonian, R., & Kacker, R. (2007). Random-effects model for meta-analysis of clinical trials: An update. *Contemporary Clinical Trials, 28,* 105–114. doi:10.1016/j.cct.2006.04.004

Raudenbush, S. W. (2009). Analyzing effect sizes: Random-effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), *The handbook of research synthesis and meta-analysis* (2nd ed., pp. 295–315). New York: Russell Sage Foundation.

Sidik, K., & Jonkman, J. N. (2007). A comparison of heterogeneity variance estimators in combining results of studies. *Statistics in Medicine, 26,* 1964–1981. doi:10.1002/sim.2688

Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance estimators in the random-effects model. *Journal of Educational & Behavioral Statistics, 30,* 261–293. doi:10.3102/10769986030003261
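The second step of the conventional SRE procedure described above (unconditional weights, then WLS) can be sketched as follows. This is a minimal illustration, not any of the alternative methods in the references just listed: the function name `sre_summary` is hypothetical, and the BSVC estimate is simply passed in as `tau2`, however it was obtained.

```python
from math import sqrt
from statistics import NormalDist

def sre_summary(y, v, tau2, alpha=0.05):
    """WLS estimate of the mean ES using unconditional weights 1 / (tau2 + v_i)."""
    w_s = [1.0 / (tau2 + v_i) for v_i in v]           # unconditional weights
    mu_r = sum(w_si * y_i for w_si, y_i in zip(w_s, y)) / sum(w_s)
    se_r = sqrt(1.0 / sum(w_s))                       # sqrt of Var = 1 / sum of weights
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {"mu": mu_r, "se": se_r,
            "ci": (mu_r - z_crit * se_r, mu_r + z_crit * se_r),
            "z": mu_r / se_r}                         # test statistic for H0: mu = 0

# Hypothetical data: a precise study and an imprecise one
fixed_like = sre_summary(y=[0.0, 1.0], v=[0.01, 1.0], tau2=0.0)    # same as SHoFE weights
large_bsvc = sre_summary(y=[0.0, 1.0], v=[0.01, 1.0], tau2=1e6)    # nearly equal weights
```

In this toy example the estimate moves from about 0.01 (dominated by the precise study) toward the unweighted mean of 0.5 as `tau2` grows, and a `tau2` of 0 gives the same weights as the SHoFE analysis.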

As for **inference about τ^{2}**, we’ve already met the most common procedure (and its limitations): The *Q* test of the SHoFE model’s homogeneity assumption also tests *H*_{0}: τ^{2} = 0. (Readers familiar with random-effects ANOVA may recognize a parallel with the classical 1-way ANOVA for *k* independent samples’ means, where the test is identical for fixed and random factors.) It’s also possible to test *H*_{0}: τ^{2} = τ_{0}^{2} with a non-zero a priori value for τ_{0}^{2}, but this is rarely done and won’t be addressed here. Constructing a CI for τ^{2} is more common but still fairly rare; the following article provides computational details and may help interested readers find related work (e.g., in citing articles):

Viechtbauer, W. (2007). Confidence intervals for the amount of heterogeneity in meta-analysis. *Statistics in Medicine, 26,* 37-52. doi:10.1002/sim.2514

Finally, I’ll simply mention **other potentially interesting features of the ES-parameter distribution** without considering estimation or inference methods. In some situations we might wish to find the proportion of ES parameters below, above, or between selected values (e.g., positive, negligibly small), which involves the cumulative distribution function (CDF). Likewise, finding values that demarcate specific proportions of ES parameters, such as quartiles or percentiles, involves the quantile function (i.e., inverse CDF). For instance, we might wish to express between-studies heterogeneity as an interval or more general set of values in which most ES parameters fall, such as a 95% prediction interval, credibility interval (in validity-generalization parlance), or highest density region. These proportions and quantiles depend on the distribution’s shape, which we might estimate from our data instead of assuming normality. If we permit non-normal ES parameters, we might also be interested in higher-order moments such as skewness or kurtosis.
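To show what “involves the CDF” and “involves the quantile function” mean in practice, here is a deliberately naive Python sketch. It makes two loud assumptions that the text above does not endorse as a method: ES parameters are normally distributed, and hypothetical values of the mean (0.5) and SD (0.25) are plugged in as if known, ignoring their estimation uncertainty. The resulting range is therefore not a proper prediction interval.

```python
from statistics import NormalDist

# ASSUMPTIONS: normal ES-parameter distribution; hypothetical plug-in values
mu_plug, tau_plug = 0.5, 0.25
es_dist = NormalDist(mu=mu_plug, sigma=tau_plug)

# Proportion of ES parameters above 0 (uses the CDF)
prop_positive = 1.0 - es_dist.cdf(0.0)

# Values bounding the middle 95% of ES parameters (uses the quantile function)
lower, upper = es_dist.inv_cdf(0.025), es_dist.inv_cdf(0.975)

# Quartiles of the ES-parameter distribution
quartiles = [es_dist.inv_cdf(p) for p in (0.25, 0.50, 0.75)]
```

If normality is not assumed, the same two questions still apply, but the CDF and quantile function would have to come from whatever distribution shape is estimated.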

**Example: Workplace Exercise.** Let’s illustrate a SRE analysis using again Conn et al.’s (2009) 35 two-group posttest SMDs on fitness. We’ll need ∑_{i} *w*_{i} = 321.7 from the SHoFE analyses as well as ∑_{i} *w*_{i}^{2} = 4063.9. To estimate the BSVC we first compute

*c*_{S} = 321.7 − (4063.9 / 321.7) = 309.1 ,

which in turn yields

^τ_{S}^{2} = [67.6 − (35 − 1)] / 309.1 = 0.330^{2} .

Adding this estimate to each study’s CV estimate and computing unconditional weights (*w*_{Si}) yields the following sums:

- ∑_{i} *w*_{Si} = 149.2
- ∑_{i} *w*_{Si} *y*_{i} = 86.4

These in turn yield the point estimate of μ

^μ_{R} = 86.4 / 149.2 = 0.579

and its variance

Var(^μ_{R}) = 1 / 149.2 = 0.0819^{2} .

This estimate of the mean two-group posttest SMD on fitness is only slightly larger than its SHoFE counterpart (for the common SMD). This SRE estimate’s variance is more than twice the SHoFE estimate’s, however, reflecting the substantial BSVC. Using these quantities for standard-normal inferences about μ, we obtain the 95% CI

0.579 ± 1.960(0.0819) = (0.419, 0.740) .

A two-tailed test of the nil null hypothesis *H*_{0}: μ = 0 at α_{2} = .05 yields the test statistic

*z*_{R} = 0.579 / 0.0819 = 7.08 ,

whose *p* value is 0 to many decimal places. This CI and test reflect both within-study and between-studies sampling error over hypothetical meta-analyses, supporting unconditional inferences that extend to a universe of studies from which our 35 were sampled. The price paid for these broader inferences, relative to their SHoFE counterparts, is a less precise estimate of μ, as reflected in the wider CI and smaller test statistic. The test indicates that the mean SMD is significantly positive, and the CI suggests we can be 95% confident that this mean SMD is between 0.42 and 0.74.
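As a check on this example’s arithmetic, the sketch below recomputes each quantity from the reported sums (variable names are mine). Because those sums are rounded, the recomputed values agree with the reported ones only to about two or three decimal places; for instance, the test statistic comes out near 7.07 rather than 7.08.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: BSVC estimate from the SHoFE quantities reported above
k, q = 35, 67.6
sum_w, sum_w2 = 321.7, 4063.9           # sums of conditional weights and their squares
c_s = sum_w - sum_w2 / sum_w
tau2 = max(0.0, (q - (k - 1)) / c_s)
tau = sqrt(tau2)                        # reported as 0.330

# Step 2: WLS with the reported unconditional-weight sums
sum_ws, sum_wsy = 149.2, 86.4
mu_r = sum_wsy / sum_ws
se_r = sqrt(1.0 / sum_ws)
z_crit = NormalDist().inv_cdf(0.975)    # about 1.960
ci_r = (mu_r - z_crit * se_r, mu_r + z_crit * se_r)
z_r = mu_r / se_r                       # test of H0: mu = 0

# Ratio of SRE to SHoFE variances ("more than twice")
variance_ratio = (1.0 / sum_ws) / (1.0 / sum_w)
```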

That’s all I’ll say about meta-analytic models without covariates. Stay tuned for Part 5c, in which I’ll describe and demonstrate versions of the above procedures that handle covariates; I’ll also mention some extensions and other variants of these models and procedures.

## Footnotes

**1.** I doubt HeFE models are nested within RE models, but I’m unsure; this is rarely (if ever) discussed. Clearly they’d be equivalent if *U*_{i} = η_{i}, but this constraint isn’t expressed in terms of (hyper)parameters.

**2.** Comparing non-nested models is trickier but possible.

**3.** To relate this test to similar tests of fixed effects in more complex models, note that squaring *z*_{F} yields a statistic distributed approximately as χ^{2}(1) (i.e., chi-squared with 1 degree of freedom) under *H*_{0}. We can write this as a weighted sum of squares comparing two models, one in which μ is estimated freely and another in which it’s constrained to μ_{0}:

*Q*_{μF} = ∑_{i} *w*_{i}(^μ_{F} − μ_{0})^{2} = (^μ_{F} − μ_{0})^{2} / (1 / ∑_{i} *w*_{i}) = [(^μ_{F} − μ_{0}) / SE(^μ_{F})]^{2} = *z*_{F}^{2} .

**4.** Their analyses accounted for dependence among 4 multiple-treatment pairs and 1 multiple-treatment triplet; for simplicity I’ll instead treat the 35 SMD estimates as independent, which decreases Var(^μ_{F}) and *Q* somewhat.

**5.** Although ^μ_{R} is a different estimator than the SHoFE model’s ^μ_{F}, and these estimate different quantities in different models, they sometimes take the same value: when *Q* ≤ *k* − 1 and, hence, ^τ_{S}^{2} = 0 so that *w*_{Si} = *w*_{i}.
