Overview of Meta-Analysis, Part 5c (of 7): Primary Meta-Analyses (cont.)

This is the last of three posts in Part 5 of my overview of meta-analysis.  In Part 5a I described six conventional meta-analytic models for effect-size (ES) estimates, and in Part 5b I described estimation and inference for two of those models without covariates.  In this post I’ll extend the methods of Part 5b to two models with covariates and comment on extensions and other variants of these models and procedures, to hint at the wide variety of situations that arise in meta-analysis.  In Parts 6 and 7 of the overview, I’ll address follow-up procedures and ways to report results, respectively.

Estimation and Inference: Models with Covariates

In each subsection below I describe common procedures for estimating and making inferences about (hyper)parameters in two conventional meta-analytic models with one or more study-level covariates: MHoFE and MRE.  Each of these models extends its simpler no-covariate counterpart—SHoFE or SRE—by incorporating fixed covariate effects.  These effects, contained in the coefficient vector β, are often central to meta-analyses that address ES moderators.

MHoFE.  Under this meta-regression model, estimation and inference focus mainly on β, the vector of regression coefficients.  The widely used weighted least-squares (WLS) procedures for these tasks just generalize their counterparts for the SHoFE model.  We can express these compactly by using matrices to collect the studies’ covariates, weights, and ES estimates:

  • $X = [x_1^T \; x_2^T \; \dots \; x_k^T]^T$ is a k × (q+1) matrix formed by stacking the row vectors $x_i$ vertically.
  • $W = \mathrm{diag}(w_1, w_2, \dots, w_k)$ is a k × k diagonal matrix with (estimated) weights $w_i = 1 / v_i$ on the diagonal (and 0 elsewhere).
  • $y = [y_1 \; y_2 \; \dots \; y_k]^T$ is a k-element column vector of ES estimates.

Now we can estimate β as

$$\hat{\beta}_F = (X^T W X)^{-1} X^T W y ,$$

and this estimator’s covariance matrix is

$$\mathrm{Cov}(\hat{\beta}_F) = (X^T W X)^{-1} .$$  [1]

To be clear, the (partial regression) coefficients in ^βF are not standardized, and Cov(^βF) contains each coefficient’s sampling variance on the diagonal and each pair’s sampling covariance off the diagonal.  We can use ^βF and Cov(^βF) for inference about β.  For instance, denoting individual coefficients as βj, j = 0, 1, 2, …, q, we can apply standard-normal procedures to ^βjF and SE(^βjF) = √Var(^βjF) to test H0: βj = βj0 or construct a confidence interval (CI) for βj just as we did for μ under the SHoFE model.
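The WLS formulas above translate almost directly into matrix code. Here is a minimal sketch in Python with NumPy; the function names (`fit_mhofe`, `z_inference`) and the five-study data set are hypothetical, made up only for illustration, and the conditional variances are treated as known.

```python
import numpy as np
from statistics import NormalDist


def fit_mhofe(y, v, X):
    """WLS fit of the fixed-effects meta-regression (MHoFE) model.

    y: k ES estimates; v: their k conditional variances (treated as known);
    X: k x (q+1) design matrix whose first column is all 1s.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    W = np.diag(1.0 / np.asarray(v, dtype=float))  # W = diag(w_1, ..., w_k)
    cov = np.linalg.inv(X.T @ W @ X)               # Cov(beta_hat) = (X'WX)^-1
    beta = cov @ X.T @ W @ y                       # beta_hat = (X'WX)^-1 X'Wy
    return beta, cov


def z_inference(est, se, null=0.0, level=0.95):
    """Standard-normal test statistic, two-tailed p value, and CI for one coefficient."""
    z = (est - null) / se
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) * se
    return z, p_two_tailed, (est - half_width, est + half_width)


# Hypothetical data: 5 studies and one dummy-coded covariate.
y = [0.30, 0.45, 0.20, 0.80, 0.95]
v = [0.04, 0.05, 0.02, 0.06, 0.08]
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1]]

beta_F, cov_F = fit_mhofe(y, v, X)
se_F = np.sqrt(np.diag(cov_F))               # SEs are square roots of the diagonal
z1, p1, ci1 = z_inference(beta_F[1], se_F[1])
print("beta:", beta_F, "SE:", se_F)
print("slope z:", z1, "p:", p1, "95% CI:", ci1)
```

With dummy coding like this, the intercept should equal the weighted mean of the first three studies, which is a handy sanity check on the matrix algebra.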

More generally, standard-normal inference procedures extend readily to two or more of β’s elements or linear combinations thereof, such as when we wish to test or construct a multivariate confidence region for all non-intercept coefficients, a subset of them (e.g., testing a block of covariates while partialling out another), or linear combinations of them (e.g., 1 or more contrasts among a categorical moderator’s levels).  In brief, the general procedures for m linear combinations of β’s elements use an m × (q + 1) matrix L, of which each row contains that linear combination’s q + 1 coefficients.  Denoting the vector of linear combinations as γ = Lβ, we can use L to obtain the estimate

$$\hat{\gamma}_F = L \hat{\beta}_F$$

and its covariance matrix

$$\mathrm{Cov}(\hat{\gamma}_F) = L \, \mathrm{Cov}(\hat{\beta}_F) \, L^T .$$

We can in turn use these to test γ (e.g., H0: γ = γ0) or its elements or construct confidence regions for these quantities using multivariate normal-theory procedures that involve χ² distributions.  All of these inference methods are prone to problems similar to those of their counterparts for the SHoFE model, with additional complications such as what to substitute for θi when the conditional variance (CV) depends on it (e.g., yi, xi^βF, ^μF).
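To make the linear-combination machinery concrete, here is a small sketch. The quadratic form used for the joint test is the usual Wald-type statistic, which I haven't written out above, so treat that choice as my assumption. The function name and the coefficient values are hypothetical.

```python
import numpy as np


def linear_combinations(beta, cov, L, gamma0=None):
    """Estimate gamma = L beta, its covariance matrix, and a Wald-type
    chi-square statistic (with m df) for H0: gamma = gamma0."""
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    L = np.atleast_2d(np.asarray(L, dtype=float))    # m x (q+1)
    gamma = L @ beta
    cov_gamma = L @ cov @ L.T
    if gamma0 is None:
        gamma0 = np.zeros(L.shape[0])
    d = gamma - np.asarray(gamma0, dtype=float)
    chi2 = float(d @ np.linalg.solve(cov_gamma, d))  # d' Cov(gamma)^-1 d
    return gamma, cov_gamma, chi2, L.shape[0]


# Hypothetical estimates for an intercept and one dummy-coded slope.
beta_hat = [0.30, 0.50]
cov_hat = [[0.01, -0.01],
           [-0.01, 0.03]]

# Rows of L pick out the two group values: beta0 and beta0 + beta1.
L = [[1, 0],
     [1, 1]]
gamma_hat, cov_gamma, chi2, df = linear_combinations(beta_hat, cov_hat, L)
print(gamma_hat, cov_gamma, chi2, df)
```

The statistic would be referred to the upper tail of a χ² distribution with `df` degrees of freedom; I've left the p value out so that the sketch needs nothing beyond NumPy.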

As for assessing the MHoFE model’s assumption of residual homogeneity, we can use a generalization of the SHoFE model’s Q statistic.  Namely, this assumption that each study’s ES parameter is its covariate-predicted value—so there’s no excess or residual variation of ES parameters—can be written as the null hypothesis H0: θi = xiβ.  We can test this using the residual (or error) heterogeneity statistic

$$Q_E = (y - X\hat{\beta}_F)^T W (y - X\hat{\beta}_F) = \sum_i w_i (y_i - x_i \hat{\beta}_F)^2 ,$$

which is distributed approximately as χ²[k − (q + 1)] under H0.  A significant upper-tail test indicates there’s more heterogeneity among ES parameters than expected due to the covariate(s) and CVs.  This test is subject to limitations similar to those of its SHoFE counterpart.
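As a rough illustration of the residual heterogeneity statistic, the sketch below computes QE and its degrees of freedom for a hypothetical five-study data set (the numbers and the function name are made up).

```python
import numpy as np


def residual_heterogeneity(y, v, X):
    """Return Q_E = sum_i w_i (y_i - x_i beta_hat)^2 and its df, k - (q + 1)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    w = 1.0 / np.asarray(v, dtype=float)
    XtW = X.T * w                              # same as X'W for diagonal W
    beta = np.linalg.solve(XtW @ X, XtW @ y)   # WLS estimate
    resid = y - X @ beta
    QE = float(np.sum(w * resid ** 2))
    df = len(y) - X.shape[1]
    return QE, df


# Hypothetical data: 5 studies and one dummy-coded covariate.
y = [0.30, 0.45, 0.20, 0.80, 0.95]
v = [0.04, 0.05, 0.02, 0.06, 0.08]
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1]]

QE, df = residual_heterogeneity(y, v, X)
print(QE, df)
```

QE would then be compared with the upper tail of a χ² distribution with `df` degrees of freedom.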

The form of QE suggests a weighted sum of squared deviations between predicted values from two nested models: one with k parameters (θi for each study) and another with q + 1 parameters.  Indeed, QE is a special case of a more general statistic for comparing Models A and B when A is nested within B:

$$Q_{A,B} = \sum_i w_i (x_{iB} \hat{\beta}_{FB} - x_{iA} \hat{\beta}_{FA})^2 ,$$

where xiA and xiB are Study i’s covariate values for each model and ^βFA and ^βFB are each model’s estimated coefficients.  Under the null hypothesis that Models A and B are equivalent (i.e., constraints on B to create A reflect reality), QA,B is distributed approximately as χ²(qB − qA), where qA and qB are each model’s number of independent parameters.  A significant upper-tail test indicates that Model B’s gain in adequacy more than offsets its loss in parsimony.  Furthermore, if wi is the same for both models, then we can compute this model-comparison statistic as

$$Q_{A,B} = Q_{EA} - Q_{EB} ,$$

the difference between the two models’ residual heterogeneity statistics.  For example, since the SHoFE model is nested within the MHoFE model, we can test the latter’s q non-intercept coefficients (in β) by comparing these models (with 1 and q + 1 parameters) using Q − QE, which is distributed approximately as χ²(q) when all non-intercept coefficients are 0.
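The equivalence between the two forms of the model-comparison statistic is easy to check numerically. The sketch below fits an intercept-only Model A and a one-covariate Model B to the same hypothetical data with the same weights, then computes the statistic both ways. The data and names are assumptions for illustration only.

```python
import numpy as np


def wls_fit(y, w, X):
    """Return WLS fitted values and the residual heterogeneity statistic Q_E."""
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    fitted = X @ beta
    return fitted, float(np.sum(w * (y - fitted) ** 2))


# Hypothetical data: 5 studies, same weights for both models.
y = np.array([0.30, 0.45, 0.20, 0.80, 0.95])
w = 1.0 / np.array([0.04, 0.05, 0.02, 0.06, 0.08])
X_A = np.ones((5, 1))                                  # Model A: intercept only
X_B = np.column_stack([np.ones(5), [0, 0, 0, 1, 1]])   # Model B: adds one dummy

fit_A, QE_A = wls_fit(y, w, X_A)
fit_B, QE_B = wls_fit(y, w, X_B)

Q_direct = float(np.sum(w * (fit_B - fit_A) ** 2))  # weighted squared differences
Q_diff = QE_A - QE_B                                # difference of Q_E statistics
df = X_B.shape[1] - X_A.shape[1]                    # q_B - q_A
print(Q_direct, Q_diff, df)
```

The two values agree up to floating-point error, which is what we'd expect when Model A is nested within Model B and the weights are shared.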

Finally, several issues related to these procedures for the MHoFE model deserve mention:

  • Most of the above procedures can be implemented as weighted versions of least-squares methods in popular statistical software packages, though care must be taken to ensure that inferential results are handled correctly (i.e., treating wi as known).
  • Some procedures used routinely in ordinary least-squares regression for primary-study data are less useful or harder to justify under this MHoFE model.  For instance, meta-analytic methodologists rarely discuss standardizing ^βF, and R2-type indices are complicated by the model’s known, heterogeneous CVs.
  • For certain models in which all covariates represent coded values for categorical moderators (i.e., ANOVA analogues), some of the above formulas can be expressed in terms of weighted cell means, main and interaction effects, etc.  Some authors report results for such models in terms of a decomposition of total heterogeneity into that due to one or more effects (e.g., main, interaction) and within-cell variation.
  • Comparing models usually requires estimating both models from the same data.  This is often complicated by missing data: Because adding more covariates tends to reduce the number of studies with complete data, some studies with all covariates for a simpler model might not have all covariates for a more complex model.
  • Meta-regression analyses often entail multiple inference, such as several tests or CIs based on a given model (e.g., for 2 or more elements of β or γ) or inferences based on multiple models for the same ES estimates.  In these situations, modifying procedures to avoid inflated Type I error rates or overconfidence might be advisable (e.g., adjusted tests, simultaneous CIs).

Example—Workplace Exercise: Conn et al. (2009) conducted meta-regression analyses to investigate potential moderators of two-group posttest standardized mean differences (SMDs) on four outcome variables: physical activity, fitness, lipids, and anthropometric measures.  Although they (well, we) reported results for only separate mixed-effects (i.e., MRE) analyses of several dichotomous covariates and one three-level categorical covariate, they also conducted fixed-effects (i.e., MHoFE) versions of these analyses as well as fixed- and mixed-effects analyses of various pairs and larger sets of selected dichotomies.  They considered these numerous analyses largely exploratory, aimed at generating hypotheses to be examined in future primary studies.

To illustrate an MHoFE analysis, let’s consider Conn et al.’s (2009) fixed-effects model for all k = 35 (non-outlier) fitness SMDs and the dichotomy Paid During Intervention (PDI), which indicates whether employees were paid during their time participating in the intervention. [2]  They dummy coded PDI such that xi = [1 1] if the study reported that employees were paid during the intervention (PDIy, ky = 8 SMDs) and xi = [1 0] otherwise (PDIn, kn = 27 SMDs).  Computational details aside, here are the basic results WLS yields:

  • intercept and its variance: ^β0F = 0.466, Var(^β0F) = 0.0625² .
  • slope and its variance: ^β1F = 0.512, Var(^β1F) = 0.138² .
  • intercept-slope correlation: Corr(^β0F, ^β1F) = -0.452 .

Their main interest was in the slope.  Denoting the common SMDs for PDIy and PDIn as μy and μn, respectively, we see that their dummy-coding scheme implies that β1 = μy − μn (because μn = β0 and μy = β0 + β1), so ^β1F estimates the difference between common SMDs.  Using the above quantities for standard-normal inferences about β1, we obtain the 95% CI

0.512 ± 1.960(0.138) = (0.241, 0.783) .

A two-tailed test of the nil null hypothesis H0: β1 = 0 at α2 = .05 yields the test statistic

zF = 0.512 / 0.138 = 3.70 ,

for which p2 = .000216.  (Some authors would report this as a heterogeneity statistic for the [non-intercept] model, between-groups, or regression source, QMF(1) = zF² = 3.70² = 13.7.)  This fixed-effects CI and test support conditional inferences that extend only to studies like the 35 we’ve included.  The test indicates that the common SMD is significantly higher for studies reporting that employees were paid during the intervention, and the CI suggests we can be 95% confident that this PDIy “advantage” is between 0.24 and 0.78.
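For readers who want to retrace the arithmetic, here is a short sketch that recomputes the CI, test statistic, and two-tailed p value from the rounded slope and standard error reported above. Because the inputs are rounded to three decimals, the results match the reported ones only approximately (e.g., the ratio comes out near 3.71 rather than 3.70, presumably because the reported value was computed from unrounded quantities).

```python
from statistics import NormalDist

slope, se = 0.512, 0.138   # rounded values reported above

z_crit = NormalDist().inv_cdf(0.975)            # about 1.960 for a 95% CI
ci = (slope - z_crit * se, slope + z_crit * se)

z_F = slope / se                                # test of H0: beta1 = 0
p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z_F)))
Q_M = z_F ** 2                                  # model heterogeneity statistic, 1 df

print(ci, z_F, p_two_tailed, Q_M)
```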

We could use the WLS results to estimate μy and μn, as Conn et al. (2009) did, and make inferences about these common SMDs, either separately or simultaneously as a pair.  We could also estimate and make inferences about other linear combinations of β0 and β1 (e.g., an unweighted or weighted mean of μy and μn), terms in countless more complex models that include other covariates and joint effects (e.g., interaction effects), and so on.  I’ll defer those, perhaps for later posts.

Finally, this analysis yields the residual (or within-group, or error) heterogeneity statistic QE(33) = 53.9, p = .0122, which indicates significant heterogeneity beyond PDI.  Note that

QMF + QE = 13.7 + 53.9 = 67.6 = Q ,

where Q is from the SHoFE model in Part 5b.  On a related note, we could obtain most of the MHoFE results by fitting the SHoFE model separately to each of the PDIy and PDIn subsets; in particular, we could compute QE as the sum of the two resulting Q statistics, say Qy and Qn.  Hence, we’d have the following decomposition of total heterogeneity into between-groups/model and two within-group/error sources:

Q = QMF + Qy + Qn .
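This decomposition can be verified with nothing more than weighted means. In the sketch below the two subsets are hypothetical (they are not Conn et al.'s data), and `shofe` is a made-up helper that fits the simple fixed-effects model to one set of studies.

```python
def shofe(y, v):
    """Common-effect (SHoFE) estimate, its variance, and homogeneity statistic Q."""
    w = [1.0 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    return mu, 1.0 / sum(w), q


# Hypothetical subsets playing the roles of PDIn and PDIy.
y_n, v_n = [0.30, 0.45, 0.20], [0.04, 0.05, 0.02]
y_y, v_y = [0.80, 0.95], [0.06, 0.08]

mu_n, var_n, Q_n = shofe(y_n, v_n)
mu_y, var_y, Q_y = shofe(y_y, v_y)
_, _, Q_total = shofe(y_n + y_y, v_n + v_y)

# With dummy coding the slope is the difference between subgroup estimates,
# and its variance is the sum of their variances (independent subsets).
slope = mu_y - mu_n
Q_M = slope ** 2 / (var_y + var_n)

print(Q_total, Q_M + Q_y + Q_n)
```

The two printed values should agree up to floating-point error.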

MRE.  Because this model generalizes each of the SRE and MHoFE models, we’ve already done most of the heavy lifting to understand its meta-regression procedures.  Simply put, estimation of and inference for β depend on τ², the residual between-studies variance component (BSVC) that quantifies residual heterogeneity beyond the covariate(s).  Here I describe a relatively simple two-step procedure that involves a weighted method-of-moments (WMoM) estimator for τ² and WLS methods for β.  Specifically, we first use the fixed-effects weights (wi) and MHoFE residual heterogeneity statistic (QE) to estimate τ² as

$$\hat{\tau}_M^2 = \max\{0, [Q_E - (k - q - 1)] / c_M\} ,$$

where

$$c_M = \sum_i w_i - \mathrm{tr}[(X^T W X)^{-1} X^T W^2 X] ,$$

and tr() denotes the argument matrix’s trace (i.e., sum of diagonal elements).  We next use the BSVC estimate to estimate each study’s unconditional weight as wMi = 1 / (^τM2 + vi), which we in turn use with WLS to estimate β and obtain this estimate’s covariance matrix:

$$\hat{\beta}_R = (X^T W_M X)^{-1} X^T W_M y ,$$

where WM = diag(wM1, wM2, …, wMk), and

$$\mathrm{Cov}(\hat{\beta}_R) = (X^T W_M X)^{-1} .$$

(The notation wMi and WM distinguishes these quantities from their counterparts for the SHoFE, SRE, and MHoFE models.)  As with the SRE model, it’s conventional to use ^βR and its covariance matrix for standard-normal or multivariate normal-theory inferences about β, such as confidence regions or tests like those described for the MHoFE model—including linear combinations.  These procedures face similar limitations as their counterparts under the SRE model; Raudenbush (2009) described alternative methods that overcome some of these limitations:

Raudenbush, S. W. (2009). Analyzing effect sizes: Random-effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 295-315). New York: Russell Sage Foundation.
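The two-step WMoM/WLS procedure described above can be sketched as follows. The function name and the six-study data set are hypothetical, and this is only one plain reading of the formulas rather than a vetted implementation.

```python
import numpy as np


def fit_mre_two_step(y, v, X):
    """Two-step fit of the mixed-effects meta-regression (MRE) model:
    (1) method-of-moments estimate of tau^2 from the fixed-effects Q_E,
    (2) WLS for beta using weights 1 / (tau^2 + v_i)."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    X = np.asarray(X, dtype=float)
    k, p = X.shape                        # p = q + 1 coefficients
    w = 1.0 / v                           # fixed-effects weights

    # Step 1: fixed-effects fit, Q_E, and c_M = sum(w) - tr[(X'WX)^-1 X'W^2 X].
    XtW = X.T * w
    inv_XtWX = np.linalg.inv(XtW @ X)
    beta_F = inv_XtWX @ XtW @ y
    QE = float(np.sum(w * (y - X @ beta_F) ** 2))
    c_M = float(np.sum(w) - np.trace(inv_XtWX @ ((X.T * w ** 2) @ X)))
    tau2 = max(0.0, (QE - (k - p)) / c_M)

    # Step 2: WLS with unconditional weights.
    w_M = 1.0 / (tau2 + v)
    XtWM = X.T * w_M
    cov_R = np.linalg.inv(XtWM @ X)
    beta_R = cov_R @ XtWM @ y
    return tau2, beta_R, cov_R


# Hypothetical data: 6 studies and one dummy-coded covariate.
y = [0.10, 0.60, 0.25, 0.70, 1.30, 0.95]
v = [0.04, 0.05, 0.02, 0.06, 0.08, 0.03]
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]

tau2, beta_R, cov_R = fit_mre_two_step(y, v, X)
print(tau2, beta_R, np.sqrt(np.diag(cov_R)))
```

When the design matrix holds only an intercept, this reduces to the familiar method-of-moments estimator for the no-covariate random-effects model, which makes for a convenient check.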

As you might suspect, using QE to test residual homogeneity under the MHoFE model also tests the MRE model’s residual BSVC (i.e., H0: τ2 = 0).  Although comparing nested MRE models is complicated by τ2, which differs between models with different covariates, we can make inferences about subvectors of β by using inference for linear combinations of β (e.g., with point estimate ^γR = L^βR).  Some authors suggest comparing two MRE models (or an MRE and SRE model) informally using the following proportion-of-variance measure based on BSVC estimates for Models A and B:

$$(\hat{\tau}_{MA}^2 - \hat{\tau}_{MB}^2) / \hat{\tau}_{MA}^2 .$$

This ratio may be negative, however, when Model B accounts for little additional heterogeneity relative to its additional parameters (i.e., when its BSVC estimate exceeds Model A’s).
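A tiny sketch of this proportion-of-variance measure, with hypothetical BSVC estimates, shows both the usual case and the negative case. The function name is made up for illustration.

```python
def bsvc_reduction(tau2_a, tau2_b):
    """Proportional reduction in the estimated BSVC from Model A to Model B."""
    if tau2_a <= 0:
        raise ValueError("Model A's BSVC estimate must be positive")
    return (tau2_a - tau2_b) / tau2_a


# Hypothetical BSVC estimates (variances, not standard deviations).
typical = bsvc_reduction(0.10, 0.06)    # Model B's covariates reduce the BSVC
negative = bsvc_reduction(0.10, 0.11)   # Model B's estimate exceeds Model A's
print(typical, negative)
```

Note that the inputs are variances; feeding in their square roots would give a different (and smaller) proportion.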

Example—Workplace Exercise: To illustrate an MRE analysis, let’s use the mixed-/random-effects analysis of PDI from Conn et al.’s (2009) Table 4 (with the same dummy-coding scheme as above) and compare its results to those from the SRE and MHoFE analyses. [2]  To estimate the residual BSVC we’ll need cM = 294.0 (computational details omitted), which yields

$$\hat{\tau}_M^2 = [53.9 - (35 - 1 - 1)] / 294.0 = 0.267^2 .$$
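As a quick arithmetic check of the BSVC estimate above, using only the quantities already given (QE, k, q, and cM):

```python
import math

QE, k, q, c_M = 53.9, 35, 1, 294.0   # values reported above

tau2_M = max(0.0, (QE - (k - q - 1)) / c_M)   # estimated residual BSVC (a variance)
tau_M = math.sqrt(tau2_M)                     # its square root, on the SMD scale

print(tau2_M, tau_M)
```

The variance comes out near 0.071 and its square root near 0.267, consistent with writing the estimate as a squared standard deviation.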

At this point we can roughly compare the MRE and SRE models by comparing their BSVCs:

$$(\hat{\tau}_S^2 - \hat{\tau}_M^2) / \hat{\tau}_S^2 = (0.330^2 - 0.267^2) / 0.330^2 = 0.345 ,$$

which suggests that PDI accounts for about 35% of the between-studies variance in SMDs.  Adding the estimated residual BSVC to each study’s CV estimate and computing unconditional weights (wMi) yields the following WLS results for β in the MRE model:

  • intercept and its variance: ^β0R = 0.490, Var(^β0R) = 0.0831² .
  • slope and its variance: ^β1R = 0.433, Var(^β1R) = 0.187² .
  • intercept-slope correlation: Corr(^β0R, ^β1R) = -0.445 .

Under this model ^β1R estimates the difference in mean SMD between PDIy and PDIn studies.  This difference under the MRE model (0.433) is notably smaller than its MHoFE counterpart (0.512), and the former’s variance (0.187²) is more than 80% larger than the latter’s (0.138²).  Using the MRE quantities for standard-normal inferences about β1, we obtain the 95% CI

0.433 ± 1.960(0.187) = (0.067, 0.799) .

A two-tailed test of H0: β1 = 0 at α2 = .05 yields the test statistic

zR = 0.433 / 0.187 = 2.32 ,

for which p2 = .0203.  (This matches the [non-intercept] model heterogeneity statistic Conn et al. reported, QMR(1) = zR² = 2.32² = 5.4.)  This mixed-effects CI and test support unconditional inferences that extend to a universe of studies from which our 35 were sampled.  The test indicates that the mean SMD is significantly higher for studies reporting that employees were paid during the intervention, and the CI suggests we can be 95% confident that this PDIy advantage is between 0.07 and 0.80.  This CI is wider than the MHoFE model’s, which reflects less precision due to incorporating between-studies heterogeneity into our inference.

Extensions and Other Variants

I suspect that a majority of meta-analyses conducted to date have used meta-analysis models and procedures described in Parts 5a, 5b, and 5c.  Countless other techniques exist, however, that differ in subtle or substantial ways from those I’ve presented.  In this final section I’ll comment briefly on several different approaches that either extend those I’ve described or depart from them in notable ways.

Model extensions. The models I’ve described can be viewed as special cases of more general models.  Two such extensions involve multiple ES estimates from each study or other focal unit of analysis (e.g., report).  As suggested in Part 1 of this overview, one form of multiple ESs involves a multivariate ES, which is a vector of distinct ESs.  Extending the models I’ve described to multivariate ESs essentially entails replacing within-study and between-studies variances with covariance matrices, accommodating incomplete ESs from some studies (i.e., missing elements), and structuring the design matrix to accommodate (possibly different) covariates for each element.  For instance, multivariate ESs arise in meta-analytic approaches for diagnostic test accuracy (e.g., sensitivity and specificity), mixed-treatment comparisons (e.g., direct and indirect evidence), and explanatory models (e.g., path or factor models).

Multiple ESs may also occur when a study contributes two or more estimates of essentially the same ES parameter, such as from independent samples or the same sample on different measures or occasions.  This nesting or clustering of ESs induces a more complicated structure some authors call “hierarchical dependence,” whereby a study’s ES parameters might be less (or more) heterogeneous among themselves than ES parameters from different studies.  We can accommodate this by extending the two-level models I’ve described to include an intermediate level between studies and ES estimates, which might also specify within-study covariates to account for variation among a study’s ES parameters.

Other types of extensions have also been proposed, such as models that incorporate certain types of bias (e.g., publication bias, inadequate randomization or allocation concealment), individual participant/patient data (IPD), or heterogeneity due to unobserved groups of studies (e.g., finite mixtures).  These are extensions insofar as models I’ve described can be expressed as special cases, such as when there’s no bias, no IPD, or only one group of studies.  Estimation and inference procedures for such models and those involving multiple ESs are beyond this overview’s scope.

Alternative procedures. Estimation and inference for many of the models I described and their extensions mentioned above can be handled using different procedures.  For example, methods developed for linear mixed models—also called multilevel or hierarchical linear models in some contexts—can be adapted for many meta-analytic models; this requires care in handling the latter’s special error structure (e.g., known heterogeneous conditional variances).  Along similar lines, readers familiar with connections between mixed models and structural equation models (SEMs) may not be surprised that clever adaptations of SEM software can be used for many meta-analytic models.

A more substantial departure from procedures I’ve described involves Bayesian approaches, which are becoming more popular and are especially useful for complex models.  Consider a Bayesian approach for the SHoFE model: We could express our belief about plausible values of μ as a prior distribution, and Bayesian techniques could be used to combine this prior with our ES estimates and CVs to obtain a posterior distribution for μ; from this posterior, which represents our prior belief updated by the data, we could obtain a point estimate of or inferences about μ.  We can use similar strategies for more complex models by specifying priors for all (hyper)parameters.  Bayesian methods typically require special software, which is becoming more widespread and accessible for non-statisticians.
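To give a flavor of the Bayesian approach for the SHoFE model, here is a minimal sketch under one convenient assumption that I haven't discussed above: a normal prior for μ, combined with normal ES estimates whose CVs are treated as known. In that conjugate case the posterior is also normal and has a closed form. The data, prior settings, and function name are all hypothetical.

```python
from statistics import NormalDist


def posterior_mu(y, v, prior_mean, prior_var):
    """Normal posterior for the common effect mu, given a normal prior and
    normal ES estimates with known conditional variances."""
    w = [1.0 / vi for vi in v]
    prior_precision = 1.0 / prior_var
    post_precision = prior_precision + sum(w)
    post_mean = (prior_precision * prior_mean
                 + sum(wi * yi for wi, yi in zip(w, y))) / post_precision
    return post_mean, 1.0 / post_precision


# Hypothetical ES estimates and conditional variances.
y = [0.30, 0.45, 0.20, 0.80, 0.95]
v = [0.04, 0.05, 0.02, 0.06, 0.08]

# A skeptical prior centered at 0 versus a very diffuse prior.
mean_s, var_s = posterior_mu(y, v, prior_mean=0.0, prior_var=0.05)
mean_d, var_d = posterior_mu(y, v, prior_mean=0.0, prior_var=1e12)

# 95% credible interval from the (normal) posterior under the skeptical prior.
post = NormalDist(mean_s, var_s ** 0.5)
interval = (post.inv_cdf(0.025), post.inv_cdf(0.975))
print(mean_s, interval, mean_d)
```

With the diffuse prior the posterior mean is essentially the usual weighted mean, whereas the skeptical prior pulls it toward 0. More complex models generally lack closed forms like this and call for simulation-based software.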

Special data types. Meta-analytic models and methods have been proposed for numerous special types of data that may not conform well to the conventional models I’ve described in Part 5.  Below I offer brief remarks on several of these data types.

  • Validity generalization: Methods exist to adjust ESs for various so-called artifacts, such as unreliability and range restriction.  Developed originally for correlations from studies of predictive validity in personnel selection, these methods have been extended to regression slopes, mean differences, and other types of ESs.
  • Reliability generalization: Procedures have been proposed to meta-analyze various measures of reliability (in a psychometrics context), such as test-retest correlations or internal-consistency coefficients.
  • Significance levels: Numerous techniques have been proposed to summarize p values from several studies as one combined test of the composite null hypothesis that every study’s null hypothesis is true.  Historically popular, these procedures neglect ESs and are now used mainly in special applications (e.g., microarrays).
  • Vote counts: When some studies provide for an ES only the estimate’s direction or its directional significance test’s binary result (e.g., significantly positive or not), this crude information can be used to estimate (hyper)parameters in certain meta-analytic models.
  • Categorical outcome: For ESs used with binary or other categorical outcomes (e.g., proportions, counts), models that respect these discrete variables (e.g., binomial, Poisson) may perform better than those based on normal approximations.
  • Single-subject designs: Methods have been proposed for studies that include only one or a few subjects measured on several occasions, usually under two conditions experienced in phases.
  • Longitudinal: When several subjects are measured on multiple occasions, meta-analytic methods for combining such studies typically incorporate information about dependence between repeated measurements.
  • Neuroimaging: Meta-analytic techniques for images of brain structure or function, such as fMRI maps, are complicated by the nature of the data: (relative) activation levels from many locations in a three-dimensional space.
  • Genetics and genomics: Studies of genetic linkage, genetic association, gene expression, or other phenomena involving genes present challenges for meta-analysis, such as many results from each unit (e.g., in genome-wide studies or from microarrays) and joint effects that involve sets of genes (e.g., pathways).
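As one illustration of the significance-level techniques mentioned in the list above, here is a sketch of Fisher's method for combining independent p values. I didn't name any particular technique above, so this is just one well-known example. The function name and p values are hypothetical, and every p value must be greater than 0.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X^2 = -2 * sum(ln p_i) is chi-square with 2k df when
    every study's null hypothesis is true and the p values are independent."""
    k = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    half = x2 / 2.0
    # Upper-tail probability for even df 2k has a closed form:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return x2, 2 * k, math.exp(-half) * total


# Hypothetical one-tailed p values from four independent studies.
x2, df, p_combined = fisher_combined_p([0.08, 0.12, 0.03, 0.20])
print(x2, df, p_combined)
```

As noted in the list, a combined test like this says nothing about the size of the effects.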

With that, I’ll end this third and final post of Part 5.  As with previous parts in this overview of meta-analysis, this tour of meta-analytic models and procedures emphasized key ideas but omitted several complications meta-analysts encounter with real data.  Some of these complications will be addressed in Part 6, but others are beyond this overview’s scope.  I hope to address some of the latter in future posts.


1. For the SHoFE model xi = [1] and β = μ, so the MHoFE formulas simplify markedly: Because $X^T W X = \sum_i w_i$ we have Cov(^βF) = Var(^μF), and because $X^T W y = \sum_i w_i y_i$ we have ^βF = ^μF.
2. Their moderator analyses did not account for dependence among 4 multiple-treatment pairs and 1 multiple-treatment triplet; for simplicity I’ll follow that practice here.
