Overview of Meta-Analysis, Part 1 (of 7): Effect Sizes

This post is the first in a seven-part overview of common meta-analytic tasks.  In this first part I’ll introduce a real-world substantive application of meta-analysis and address estimating effect sizes (ESs).  Subsequent parts will focus on the following topics:

  • Part 2: obtaining information about ES sampling error
  • Part 3: collecting features of ESs
  • Part 4: exploring data
  • Part 5: fitting meta-analytic models to ESs (subparts 5a, 5b, and 5c)
  • Part 6: checking for potential problems
  • Part 7: expressing results informatively

I’m writing this overview mainly to convey my conception of the “big picture” in typical meta-analyses and introduce terminology and notation I often use for key ideas and objects, such as ESs and standard meta-analytic models.  Also, while describing each task I’ll mention conceptual or procedural issues that often accompany it.  This overview, though long-ish by blog standards, will be fairly superficial and light on citations, but I hope it provides valuable context and serves as a foundation for future posts about numerous issues raised here.


Before diving into the overview, I’ll note two matters of scope.  First, in keeping with this blog’s emphasis on statistical techniques for quantitative research syntheses, this overview focuses on meta-analysis and closely related tasks that precede or follow such analyses.  Readers interested in other phases of research synthesis might consult some of the numerous pertinent books, articles, and other work.  For instance, the ‘Article Alerts’ section of Research Synthesis Methods—described in a previous post about resources—has featured more than 40 articles from 2009 or 2010 with an overview of research synthesis and nearly 20 with an overview of meta-analysis; in the ‘RSMAA_Print’ worksheet of the Excel file for the fourth installment, keywords for these articles begin with “overview.”

Second, I’ll address common tasks and issues in typical meta-analyses but neglect several tasks and accompanying issues that meta-analysts encounter less often.  For instance, in this overview I’ll emphasize meta-analyses of aggregate data (i.e., summary statistics from each study); though meta-analyzing individual participant/patient data offers distinct benefits and challenges, it’s much rarer.  In light of the wide variety of questions meta-analysts address by employing countless techniques with numerous types of data, a comprehensive overview would be unwieldy.  I hope to address some atypical tasks and issues in future posts.

Running Example: Exercise Interventions in the Workplace

To illustrate key ideas in this overview I’ll refer to the following published project from a much larger NIH-funded research synthesis on exercise interventions:

Conn, V. S., Hafdahl, A. R., Cooper, P. S., Brown, L. M., & Lusk, S. L. (2009). Meta-analysis of workplace physical activity interventions. American Journal of Preventive Medicine, 37, 330-339. doi:10.1016/j.amepre.2009.06.008

In a nutshell, Conn et al. collected and analyzed results from more than 130 studies of diverse exercise interventions provided at workplaces.  They (err … we) were interested in these interventions’ effects on various outcome variables, and they included a few types of comparisons between participants who had versus hadn’t experienced an intervention to increase their exercise.

Although this project was not typical and exemplary in all respects, most readers without specialized knowledge in the substantive area will easily understand the gist of its key variables and research questions.  Also, it’s convenient: A pre-publication version of the article is free in PubMed Central, and because I conducted most analyses the data are readily available.

Task 1: Estimate Effect Sizes

An early task in most meta-analyses is to obtain from each study one or more estimates of an ES.  In this section I’ll explain the basic idea of an ES, introduce notation for ESs from several studies, describe several distinctions among ESs, and comment on a few issues that arise when extracting ESs from study reports.  (Most of this is irrelevant to methods for combining p values, which were popular historically but are now used only rarely—mainly with certain specialized data such as from microarrays or neuroimages.)

Basic Terminology and Notation

An ES estimate is a statistic computed using data from a sample of subjects.  Most ES estimates have a corresponding ES parameter as the estimand, which is sometimes called the true, population, or infinite-sample ES.  In addition to their vital role as a basic data element in most meta-analyses, ESs are used in primary studies to plan sample size and other aspects, interpret results, and convey findings to particular audiences.

To introduce notation for ESs, let’s consider a simple meta-analytic situation where each of k independent studies contributes just one single-valued (i.e., scalar) ES estimate.  I’ll denote Study i’s ES estimate as yi, i = 1, 2, …, k.  Here’s some related notation:

  • Yi: the (random) estimator of which yi is a particular sample’s realization or instantiation; conducting Study i with different samples of subjects would yield different observed values yi of the estimator Yi
  • θi: the (fixed, unknown) ES parameter that’s Yi’s estimand and of which yi is an estimate
  • Θ: the random variable of which θi is a realization; the ES parameter might vary among studies due to one or more varying features (e.g., subject population, variables, procedures, setting)

To recap, yi is a sample’s realization of the estimator Yi and an estimate of the ES parameter θi, and for some analyses θi is viewed as a study’s realization of Θ.  Of these four things we observe only yi in an actual meta-analytic data set.  As we’ll discuss in the fifth part of this overview, meta-analytic methods often use yi together with information or assumptions about Yi (e.g., its variance or distribution) to estimate or make inferences about θi or features of Θ’s distribution.
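To make the four objects concrete, here’s a minimal simulation sketch.  Everything in it is an assumption chosen for illustration: the function name, the normal distribution for Θ (with hypothetical mean mu and SD tau), the normal sampling distribution for Yi, and the large-sample variance of an SMD with two equal-sized groups.  In a real meta-analysis only the "y" values would be observed; the "theta" values are visible here only because we generated them.

```python
import random


def simulate_meta_sample(k, mu, tau, n_per_group, seed=0):
    """Simulate k independent studies (illustrative assumptions only).

    theta_i is one study's realization of Theta ~ Normal(mu, tau^2);
    y_i is one sample's realization of Y_i ~ Normal(theta_i, v_i).
    """
    rng = random.Random(seed)
    studies = []
    for i in range(1, k + 1):
        # Study-level ES parameter: a draw from the assumed distribution of Theta
        theta_i = rng.gauss(mu, tau)
        # Assumed large-sample variance of an SMD with n_per_group in each group
        v_i = 2.0 / n_per_group + theta_i ** 2 / (4.0 * n_per_group)
        # Observed ES estimate: a draw from Y_i's assumed sampling distribution
        y_i = rng.gauss(theta_i, v_i ** 0.5)
        studies.append({"study": i, "theta": theta_i, "y": y_i, "v": v_i})
    return studies


if __name__ == "__main__":
    for s in simulate_meta_sample(k=5, mu=0.3, tau=0.2, n_per_group=50, seed=1):
        print(f"Study {s['study']}: theta = {s['theta']:.3f}, y = {s['y']:.3f}")
```

Setting tau to 0 makes every study share the same θi, which corresponds to the case where the ES parameter doesn’t vary among studies.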

Distinctions Among ESs

Many ESs quantify the direction and magnitude of a bivariate association, such as a zero-order correlation or a comparison between two groups’ or conditions’ averages (e.g., raw or standardized mean difference, ratio of [positive] means), dispersions (e.g., variance ratio), proportions (e.g., risk difference, risk ratio, odds ratio), or rates (e.g., rate ratio, hazard ratio).  Some ESs, however, pertain to only one variable, such as a mean or proportion.  Still others represent more complex relations among two or more variables, such as a contrast among univariate ESs from several distinct groups or conditions, a linear or more general combination of several simpler quantities (e.g., tetrad difference involving 4 correlations among 4 variables, ratio of two groups’ odds ratios), or a measure of partial association (e.g., partial correlation, covariate-adjusted mean difference).
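As a small illustration of how one pair of proportions yields several distinct bivariate ESs, here’s a sketch that computes a risk difference, risk ratio, and odds ratio from two groups’ event counts.  The function name, argument names, and the counts in the usage example are hypothetical, and the sketch simply rejects groups with zero events or zero non-events rather than applying any continuity correction.

```python
def two_proportion_effect_sizes(events_t, n_t, events_c, n_c):
    """Risk difference, risk ratio, and odds ratio for two independent groups."""
    for events, n in ((events_t, n_t), (events_c, n_c)):
        if not 0 < events < n:
            raise ValueError("each group needs at least one event and one non-event")
    p_t = events_t / n_t  # proportion with the event, treatment group
    p_c = events_c / n_c  # proportion with the event, control group
    odds_t = p_t / (1.0 - p_t)
    odds_c = p_c / (1.0 - p_c)
    return {
        "risk_difference": p_t - p_c,
        "risk_ratio": p_t / p_c,
        "odds_ratio": odds_t / odds_c,
    }


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 treated and 20 of 100 controls had the event
    print(two_proportion_effect_sizes(30, 100, 20, 100))
```

The three indices answer different questions about the same data, which is one reason a meta-analyst has to pick a single ES metric before combining studies.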

An aside on terminology: Some authors reserve “effect size” for a specific type of bivariate association, such as a difference in means between experimental groups.  For convenience in discussing widely applicable meta-analytic methods, however, I often use “effect size” to refer broadly to any statistical quantity combined or compared among studies (or other entities) in a meta-analysis.

Complicating Issues

In a given meta-analytic sample the ESs should be commensurable across studies, so they can be compared or combined meaningfully.  Loosely speaking, this means that the ES for every study represents the same quantity, though its value might vary among studies.  For example, it’d be silly to directly compare one study’s correlation with another’s mean difference, but it’s often sensible to compare two studies’ values of a correlation for variables X and Z.  Commensurability can be tricky to establish or assess, and it relates to the “apples and oranges” criticism of some meta-analyses; for instance, comparing two studies’ XZ correlations might be less defensible if the studies differed in how X was measured or how subjects were selected.

Although the ES notation introduced above will suffice for many topics I’ll address in this blog, we’ll adapt it for more complex situations as needed.  For now I’ll just mention two such complexities:

  • Multivariate ESs: Some meta-analyses focus on a vector (i.e., ordered list) of quantities, such as pairwise correlations among three or more variables, a diagnostic test’s sensitivity and specificity, or at least two differences or ratios between at least three groups or conditions (e.g., Control vs. Treatments A, B, and C).  I’ll sometimes use “univariate ES” and “multivariate ES” to distinguish between single- versus vector-valued ESs.  As for notation, we can use bold to denote ES vectors (e.g., yi, Yi, θi, Θ).
  • Other data structures: Some studies provide multiple ES estimates, such as from different samples of subjects or from the same sample measured on different variables or occasions.  Multiple ESs from a study are not in general independent, and meta-analytic methods for handling dependent ESs tend to be more complicated or involve additional data (e.g., correlation between variables or occasions).  Another twist is that entities whose ESs are meta-analyzed might not be studies, per se.  For example, meta-analytic methods are used to synthesize results from several related experiments published in one report, single-subject trials, and sites in a multi-site/-center study.

Broadly applicable statements about extracting or analyzing ESs are hard to come by, especially as meta-analysts tackle more complex questions using more diverse types of data.  For instance, certain meta-analytic procedures for studies with binary or other categorical outcomes use sample statistics that don’t have corresponding ES parameters (e.g., counts in binomial or Poisson models).  As another example, some ES statistics give no info about an association’s direction, such as proportion-of-variance indices (e.g., R², η², ω²) and intraclass correlations.  Nevertheless, a few commonly encountered issues warrant mention:

  • ES options: In a given meta-analysis there often are several options for an ES.  Choosing among these depends on substantive considerations (e.g., desired interpretation), statistical matters (e.g., sampling properties of estimators), and aspects of studies’ data (e.g., design and measurement issues, distribution features, reporting of results).  For example, here are some choices when the focal quantity is a difference between means: (a) whether the means are from independent or correlated samples, (b) whether to standardize the difference, (c) which standard deviation (SD) to use as the standardizer (e.g., reference sample, pooled samples, external sample[s]), and (d) whether to adjust the difference for any covariates; also, robust and resistant ES options for comparing averages exist.  As another example, ESs for comparing two proportions include a risk difference, risk ratio, odds ratio, Youden’s index, and several others, and we could instead treat the two proportions as a bivariate ES.
  • Extracting ESs: Because meta-analysts rely heavily on how authors conducted their studies and reported results, extracting ES estimates can be tricky (and vexing!).  Ideally each study reports either the desired ES estimate or exactly what’s needed to compute it.  We might, however, have to request data from authors or use what’s reported to approximate the ES—often relying on untestable assumptions.  Furthermore, due to attrition or other reasons some of a study’s reported results might be from only a subset of the initial sample.  For example, suppose we’d like two independent groups’ standardized mean difference (SMD) with a pooled-variance standardizer: We’d ideally estimate this ES from each group’s sample size, mean, and variance or equivalent results (e.g., t test, correlation, regression), but we might have to approximate the SMD using a variance from only one group or other samples (e.g., MSerror from ANOVA), results based on outcome scores that are adjusted (e.g., gain scores, ANCOVA) or dichotomized (e.g., 2 × 2 contingency table, chi-squared test), or crude information about the SMD’s sign or statistical significance (e.g., 2-tailed p > .05).
  • ES transformations: We might wish to transform an ES estimate to improve certain statistical properties related to its sampling distribution (over hypothetical random samples from the same study) or for other reasons.  One common type of transformation reduces an ES estimator’s bias; a bias-adjusted ES estimator is in the same metric as its unadjusted counterpart but has smaller expected deviation from its ES parameter, E(Yi − θi), especially in small samples.  Another type of transformation improves an ES estimator’s conformity to certain meta-analytic methods’ assumptions, such as by making it more normal (e.g., [natural] log of odds or odds ratio) or reducing its variance’s dependence on the unknown ES parameter (e.g., Fisher’s z-transformation of Pearson correlation, arcsine transformation of proportion).  Still other transformations are used to put all studies’ ES estimates in a common metric, such as when most studies report SMDs but some report correlations or proportions; this should be done cautiously, especially when some studies used different types of data (e.g., 2 dichotomies, 2 continuous variables, 1 of each).
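A few of the computations mentioned in these bullets can be sketched in code.  The snippet below is an illustration under stated assumptions, not a recipe for any particular meta-analysis: it computes an SMD with a pooled-SD standardizer from two independent groups’ summary statistics, recovers the same SMD from a pooled-variance t statistic, applies a widely used approximate small-sample bias adjustment (multiplying by 1 − 3/(4·df − 1)), and applies Fisher’s z-transformation and the natural log of an odds ratio.  All function names and the numbers in the usage example are hypothetical.

```python
import math


def smd_pooled(n1, mean1, sd1, n2, mean2, sd2):
    """SMD for two independent groups, standardized by the pooled SD."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    return (mean1 - mean2) / sd_pooled


def smd_from_t(t, n1, n2):
    """Same SMD recovered from an independent-groups, pooled-variance t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


def bias_adjusted_smd(d, df):
    """Approximate small-sample bias adjustment: d * (1 - 3 / (4*df - 1))."""
    return d * (1.0 - 3.0 / (4.0 * df - 1.0))


def fisher_z(r):
    """Fisher's z-transformation of a Pearson correlation (requires -1 < r < 1)."""
    return math.atanh(r)


def log_odds_ratio(a, b, c, d):
    """Natural log of the odds ratio from 2 x 2 cell counts (all must be > 0)."""
    return math.log((a * d) / (b * c))


if __name__ == "__main__":
    # Hypothetical summary statistics for two groups of 20 subjects each
    d = smd_pooled(20, 12.0, 4.0, 20, 10.0, 4.0)
    g = bias_adjusted_smd(d, df=38)
    print(f"SMD = {d:.3f}, bias-adjusted SMD = {g:.3f}")
    print(f"Fisher z for r = .50: {fisher_z(0.5):.3f}")
```

Note that the bias adjustment shrinks the SMD only slightly here, and it matters less as df grows; the other two transformations change the metric, so results are usually back-transformed for interpretation.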

Example—Workplace Exercise: Conn et al. (2009) reviewed effects of workplace exercise interventions on several outcome variables, including physical activity behavior, health (e.g., fitness, diabetes risk, lipids, anthropometric measures), well-being (e.g., quality of life, mood), and work-related measures (e.g., attendance, job stress, job satisfaction, healthcare utilization).  They quantified these effects using SMDs, largely because (a) measures of most focal outcomes varied among studies (e.g., physical activity behavior as kcal/week, MET-hours/week, minutes/week, or steps/day; diabetes risk as circulating insulin or fasting blood sugar), (b) participants in most studies were assigned essentially continuous scores on these measures, and (c) the focal effects could be represented as a comparison between two groups’ or conditions’ means.

Conn et al. computed more than 800 SMD estimates, adjusted for bias when feasible.  More specifically, they computed the following three types of SMD, based on selected means and SDs from treatment or control groups before or after the intervention:

  • Two-group posttest: difference in post-intervention mean between independent treatment and control groups, divided by these groups’ pooled post-intervention SD
  • Treatment pre-post: difference between treatment group’s post- and pre-intervention means (i.e., mean gain score), divided by its pre-intervention SD
  • Two-group pre-post: difference between treatment pre-post SMD and control group’s corresponding pre-post SMD, which essentially compares the groups’ mean gain scores
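Here’s a minimal sketch of these three SMD types as described in the bullets above.  It’s my own illustration for this post rather than the exact code used for the article: the function names and example numbers are hypothetical, and the sketch omits the bias adjustment that was applied when feasible.

```python
import math


def two_group_posttest_smd(n_t, mean_t_post, sd_t_post, n_c, mean_c_post, sd_c_post):
    """Post-intervention mean difference divided by the pooled post-intervention SD."""
    pooled_var = ((n_t - 1) * sd_t_post ** 2 + (n_c - 1) * sd_c_post ** 2) / (n_t + n_c - 2)
    return (mean_t_post - mean_c_post) / math.sqrt(pooled_var)


def pre_post_smd(mean_pre, mean_post, sd_pre):
    """Mean gain (post minus pre) divided by the pre-intervention SD."""
    return (mean_post - mean_pre) / sd_pre


def two_group_pre_post_smd(t_pre, t_post, t_sd_pre, c_pre, c_post, c_sd_pre):
    """Treatment group's pre-post SMD minus control group's pre-post SMD."""
    return pre_post_smd(t_pre, t_post, t_sd_pre) - pre_post_smd(c_pre, c_post, c_sd_pre)


if __name__ == "__main__":
    # Hypothetical minutes/week of physical activity
    print(two_group_posttest_smd(30, 130.0, 40.0, 30, 110.0, 40.0))
    print(pre_post_smd(100.0, 130.0, 40.0))
    print(two_group_pre_post_smd(100.0, 130.0, 40.0, 100.0, 110.0, 40.0))
```

In this made-up example the two-group posttest and two-group pre-post SMDs coincide only because the groups have identical pre-intervention means and SDs; with baseline differences the two would diverge.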

They confronted numerous choices and challenges when obtaining data for and computing ESs.  As for selecting data to use, several studies included multiple focal variables or multiple measures of a variable, reported measures on multiple post-intervention occasions, measured variables for multiple treatment or control groups, or some combination of these.  Such “multiples” from a study provide more info but at the cost of probably violating independence among ESs (e.g., multiple-endpoint or -treatment dependence, hierarchical dependence due to clustering).  Conn et al. avoided several complications due to dependence by choosing only one measure and occasion per variable, which simplifies analyses but has potential drawbacks.  They retained multiple treatment or control groups, however; I’ll mention attendant complications in subsequent posts about sampling error and analyses.

Regarding ES computations, many studies provided all required info—usually 1 or 2 sample sizes, means, and SDs—or equivalent data (e.g., mean difference, standard error).  However, some reported results that necessitated approximating SMDs under mild to strong assumptions.  For example, several studies reported both groups’ post-intervention means but only their pre-intervention SDs, which isn’t exactly what’s needed for a two-group posttest SMD.  More troubling were cases where authors had dichotomized scores and reported a success rate for each group or occasion, or where a pre-post correlation (i.e., between pre- and post-intervention scores) was needed to approximate a pretest SD (e.g., from the gain-score SD) but wasn’t reported; details of handling these issues are beyond the present scope.
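To show why the missing pre-post correlation matters, here’s a rough sketch of one way a pretest SD could be approximated from a gain-score SD.  It rests on a strong assumption that I’m introducing purely for illustration (equal pre- and post-intervention SDs), under which the gain-score variance equals 2·SD²·(1 − r), so SD = SDgain / √(2(1 − r)).  The function name and all numbers are hypothetical, and this isn’t necessarily how any such cases were handled in the article.

```python
import math


def pretest_sd_from_gain_sd(sd_gain, r_pre_post):
    """Approximate a pretest SD from a gain-score SD.

    Assumes equal pre- and post-intervention SDs, so that
    var(gain) = 2 * sd**2 * (1 - r) and sd = sd_gain / sqrt(2 * (1 - r)).
    """
    if not -1.0 < r_pre_post < 1.0:
        raise ValueError("pre-post correlation must be strictly between -1 and 1")
    return sd_gain / math.sqrt(2.0 * (1.0 - r_pre_post))


if __name__ == "__main__":
    # Sensitivity check: how the implied pretest SD (and hence the SMD) moves
    # across assumed correlations when the true value wasn't reported
    sd_gain, mean_gain = 10.0, 5.0  # hypothetical reported values
    for r in (0.3, 0.5, 0.7, 0.9):
        sd_pre = pretest_sd_from_gain_sd(sd_gain, r)
        print(f"assumed r = {r:.1f}: pretest SD = {sd_pre:.2f}, SMD = {mean_gain / sd_pre:.3f}")
```

Because the implied SD grows quickly as the assumed correlation approaches 1, trying several plausible values of r is a simple way to see how much an approximated SMD depends on the unreported correlation.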

That’s all for now about estimating ESs.  Despite only skimming the surface I’ve tried to give a sense of numerous interesting—and sometimes hair-pullingly frustrating—issues that meta-analysts encounter in this early stage.   In future posts I hope to explore some of these issues as well as those I’ll raise in the next six parts of this seven-part overview.  Stay tuned!
