# Overview of Meta-Analysis, Part 4 (of 7): Data Exploration

**Posted:** March 28, 2012

**Author:** A. R. Hafdahl

**Filed under:** Overview of Meta-Analysis

**Tags:** coding, data management, effect size, meta-analysis, missing data, moderator, outlier, reporting guidance, sample size, significance testing

This seven-part overview’s first three parts focused on collecting data used in meta-analyses: estimates of effect size (ES), sample sizes or conditional variances (CVs) to quantify ES sampling error or (im)precision, and ES features. The overview’s subsequent four parts address analyzing these data and presenting results. In this fourth part I begin by describing **preliminary analyses** that can help identify errors and issues to attend to in primary analyses. (Part 1 of this overview lists the topics for all seven parts.)

## Task 4: Explore Data

After collecting ESs and associated variables but before meta-analyzing these data to answer one’s primary research-synthesis questions, exploring the data using preliminary analyses is prudent. Two main aims for these analyses are to find **errors in the data** and to identify **data features that might require attention** during primary analyses. Although these aims overlap with adjacent tasks that entail collecting and analyzing data, I’ll discuss them as a distinct task. Preliminary analyses might also highlight **issues to address by revisiting earlier stages** of the research synthesis. In this segment I focus on the two main aims but also comment on potential implications for earlier stages.

Let’s first consider finding **errors in the data**, many of which are typically due to errors in either what’s reported (by primary-study authors) or what’s recorded (by research-synthesis personnel). A variety of errors can arise in study reports, such as problems with how the study was designed or implemented, how the methods—including data analyses—were reported, or how results were reported. Inaccurate reporting of methods or results, including errors of omission and commission, is perhaps more problematic for research synthesists than poor study design or conduct: Flawed reporting distorts our record of what transpired, which impedes modeling the data in terms of how they were produced. Likewise, even accurately reported data might be recorded inaccurately en route to the research-synthesis database, such as when an investigator or assistant selects the wrong piece of info to record or incorrectly transcribes or enters the selected info.

Proposed approaches for *preventing* errors in meta-analytic data include guidelines for conducting and reporting primary studies as well as techniques for extracting reported data to synthesize. Here I’ll focus instead on *detecting* errors that’ve crept into meta-analytic data, mainly by using statistical sleuthing strategies akin to data clean(s)ing. To keep this brief, I’ll just mention several types of reporting or recording errors we might find by comparing related info, either within a given study or among studies. How to handle any detected errors is beyond this overview’s scope.

**Comparisons within a study:** A number of errors for a given study can be caught by comparing two or more of its values that we expect to be related in some way. For instance, if a study reports both a significance test and the statistics used in that test (e.g., sample sizes, means, standard deviations [SDs], correlation), it’s often possible to use those statistics to reconstruct the test; mismatching recorded and reconstructed test results (e.g., test statistic, *p* value) might indicate a reporting or recording error. Another strategy is to compare a particular quantity across groups, conditions, or occasions. For instance, in a two-group pre-post design (e.g., each of 2 independent samples measured before and after an intervention) we might expect certain patterns between groups or occasions, such as a sample size no larger at posttest than at pretest, similar pretest means or success rates between groups, or similar SDs among all four group-occasion combinations. Similarly, to detect errors in ES features we might check for unexpected associations between variables, such as a marked difference between randomized groups in certain subject characteristics (e.g., female percentage, age mean or SD).

**Comparisons among studies:** Other types of errors may be caught by comparing certain values among studies (or other units) to find unusual cases. For instance, if a subset of studies includes the same quantity measured commensurably, such as the mean or SD on a specific variable (e.g., subject characteristic, outcome used in ES), we could inspect that quantity’s distribution for that subset. Similarly, we could compare ES estimates among studies to identify potential outliers, which verges into the territory of primary analyses. Extending these strategies to pairs or larger sets of quantities from each study, some of which might be categorical, we could identify cases with unusual *combinations* of values (e.g., high values on both of 2 negatively correlated quantities).

Note that any comparison of sample statistics among studies should probably account for differential precision (i.e., due to varying sample sizes or SEs), which essentially involves meta-analytic techniques I’ll describe in Part 5 of this overview.
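
To make the within-study strategy concrete, here is a minimal Python sketch of reconstructing a two-sample test from reported summary statistics and flagging a mismatched *p* value. The function names (`welch_t_p`, `flag_mismatch`) and the tolerance are illustrative, and a normal approximation stands in for the exact *t* reference distribution so the sketch needs only the standard library; with small samples one would use the *t* distribution instead.

```python
import math
from statistics import NormalDist

def welch_t_p(n1, m1, sd1, n2, m2, sd2):
    """Reconstruct a Welch two-sample test from reported summary statistics.

    Returns the test statistic and a two-sided p value based on a normal
    approximation (adequate for moderate-to-large samples)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (m1 - m2) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

def flag_mismatch(reported_p, n1, m1, sd1, n2, m2, sd2, tol=0.01):
    """Flag a study whose reported p value differs from the reconstructed
    one by more than `tol` -- a possible reporting or recording error."""
    _, p = welch_t_p(n1, m1, sd1, n2, m2, sd2)
    return abs(p - reported_p) > tol
```

A flagged study isn’t necessarily wrong (rounding, one-sided tests, or a different test variant can all produce small discrepancies), so a flag is a prompt to recheck the report, not a verdict.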

The second main aim of exploring data involves identifying **issues that might require special handling** during primary analyses. (Some special issues will, of course, be known before data collection or obvious without exploring the data, such as dependent ESs or subject-level aggregates as moderators.) This primarily involves aspects of the data that might interfere with planned meta-analytic procedures, such as characteristics of ES estimates, their associated CVs, or ES features that’ll be used as moderators or in other ways. Although speculating about which aspects might be troubling is difficult without specifying the focal meta-analytic techniques, here are several illustrative situations that can pose challenges for certain common techniques:

- ES estimates at or near their lower or upper bounds, such as proportions or rates near 0.0 or 1.0 or correlations near -1.0 or 1.0
- a distribution of sample sizes or precisions (i.e., inverse CVs) with one or very few values that are markedly larger than the rest
- one or more levels of a categorical moderator variable with no or very few ESs
- a continuous moderator’s distribution with marked bimodality, substantial mass at an extreme value (e.g., pile-up at 0), or very few extreme values separated markedly from the rest
- pairs or larger sets of moderators with sparsely sampled regions, such as two categorical moderators with (nearly) empty cells; this is an issue mainly when the moderators will be analyzed together
- associations between ESs and moderators that are markedly nonlinear
- cases missing ES data or ES features
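
As an illustration of the sparse-cell issue in the list above, here is a Python sketch (hypothetical function and variable names; study records represented as plain dicts) that cross-tabulates two categorical moderators and reports the cells with fewer ESs than some minimum, including empty cells:

```python
from collections import Counter
from itertools import product

def sparse_cells(studies, mod_a, mod_b, min_count=2):
    """Cross-tabulate two categorical moderators over a list of study
    records (dicts) and return cells with fewer than `min_count` ESs.

    All combinations of observed levels are checked, so empty cells
    are reported too."""
    counts = Counter((s[mod_a], s[mod_b]) for s in studies)
    levels_a = sorted({s[mod_a] for s in studies})
    levels_b = sorted({s[mod_b] for s in studies})
    return {cell: counts.get(cell, 0)
            for cell in product(levels_a, levels_b)
            if counts.get(cell, 0) < min_count}
```

Run over every pair of candidate moderators, a check like this quickly shows which joint moderator analyses the data can actually support.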

Many of these issues can be identified by the same strategies mentioned above for finding errors, including simple sorted batches of values, descriptive statistics, and plots. Indeed, many signifiers of a potential error could require special handling if not eliminated by correcting errors. As with handling detected errors, I won’t detail here how to manage potentially problematic data features that one identifies, except to encourage cautious deliberation in deciding how findings from these inspections influence later data-analysis choices and transparency in reporting these decisions; I’ll touch on this in Part 6 of this overview.

Finally, besides the two main exploratory aims of preliminary analyses, it’s worth mentioning how findings from preliminary analyses might prompt **changes to earlier research-synthesis stages**. Research syntheses are especially susceptible to this, because the amount and nature of available data are often beyond the synthesist’s control and difficult to anticipate accurately. Consequently, outcomes of early stages might conflict with plans for later stages. In particular, certain anticipated meta-analytic procedures might be inadvisable or impossible due to the number or nature of ESs extracted from retrieved studies, their sample sizes or CVs, the quality of reported info on pertinent ES features, or other outcomes of data collection or evaluation. For example, inaccurate projections about ESs and their CVs might invalidate results of a priori power analyses, and certain moderators might not be analyzable due to insufficient info on study features. Such a circumstance might prompt changes to one or more earlier stages, such as reformulating the review questions or obtaining more or different data by altering inclusion/exclusion criteria or coding strategies. If so, it’s advisable to document and report these choices and justify them accordingly in the interest of transparency and replicability.

**Example—Workplace Exercise:** Conn, Hafdahl, Cooper, Brown, and Lusk’s (2009) quantitative review of workplace exercise interventions, described in Part 1 of this overview, involved recording data for up to three types of standardized mean differences (SMDs) on several outcome variables as well as numerous features associated with these SMDs—mostly study-level variables or aggregate participant-level variables. While preparing these data for primary analyses, they (well, we) explored them to find errors and identify issues requiring special handling. Below are some of the types of info they checked for errors, depending on which groups (Treatment and Control), occasions (pre- and post-intervention), and outcome variables a given study used.

**Sample size:** They checked for variation in sample size among outcome variables for a given group at a given occasion (e.g., Treatment post-intervention) and, for each outcome variable, for a difference in sample size between groups (at a given occasion) or occasions (for a given group). Most such sample-size discrepancies were unremarkable, but a few indicated errors.

**Variance:** For each outcome variable they checked for unusually large ratios of sample variances between groups (at a given occasion) or occasions (for a given group), using significance tests to account for sample size and plausible pre-post correlation (e.g., a large ratio is more unusual with a larger sample or a larger correlation). For some outcome variables that were measured commensurably in several studies, they compared variances among studies; this was actually done during a follow-up analysis that involved reporting mean ESs in a raw/unstandardized metric.

**Days after intervention:** They recorded how many days after the intervention (DAI) each outcome variable was measured. This occasionally varied among outcome variables but never due to a verifiable error.

**SMD direction:** To resolve ambiguity about whether higher or lower scores on a given measure of an outcome variable represent improvement, they coded each SMD’s direction—whether the intervention mean (e.g., Treatment posttest) was better or worse than the comparison mean (e.g., Control posttest). When a sample contributed two or more SMDs on a given outcome variable (e.g., 2-group posttest and treatment pre-post), they checked consistency between these SMDs’ signs and coded directions; for instance, two SMDs with different signs should also have different coded directions.

**SMD:** They used fixed- and random-effects meta-analytic procedures, which I’ll describe in Part 5 of this overview, to identify unusual SMD estimates.
This mainly entailed computing certain quantities (e.g., mean SMD, between-studies heterogeneity) with each study excluded and comparing these leave-1-out results among studies, separately for each type of SMD on each outcome variable. For instance, they computed a type of externally standardized residual from the difference between an SMD estimate and the other SMD estimates’ mean SMD.

In a related study introduced in Part 2 of this overview, Conn, Hafdahl, and Mehr (2011) also checked for (and found several) inconsistencies between related statistical results, such as *p* values from tests of mean differences that didn’t match summary statistics used to reconstruct those tests. In yet another related study focused on anthropometric outcomes (currently under review), Conn, Hafdahl, and colleagues checked for unusual “doses” of supervised exercise (e.g., intensity, frequency, duration), such as atypically large minutes per session, total number of sessions, or number of sessions per week, as well as inconsistencies among these and related variables.

Conn et al. (2009) also conducted exploratory analyses to identify issues that might require special handling in primary analyses, in addition to issues identified in the above analyses (e.g., within-study heteroscedasticity might warrant a different type of SMD or other ES). For instance, they identified studies with extremely large sample sizes that might unduly influence certain analyses, and they found that certain pairs or larger sets of potential moderator variables were available for so few studies that analyzing them together wasn’t feasible (e.g., out of 66 pairs of 12 dichotomous moderators, 32 pairs had no SMD in at least 1 cell and only 17 pairs had at least 2 SMDs in all 4 cells). Similarly, Conn et al. (2011) identified troublesome distributions for several potential continuous moderator variables, such as substantially skewed dose variables and a DAI distribution that was severely positively skewed with a pile-up of values at 0 (i.e., no post-intervention lag before outcome measurement).
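
The leave-1-out strategy can be sketched in Python as follows. This is an illustrative fixed-effect version with hypothetical names (`fe_mean`, `loo_residuals`), not the procedure Conn et al. actually ran: `fe_mean` computes an inverse-variance weighted mean and its variance, and `loo_residuals` computes each study's externally standardized residual, i.e., its ES minus the mean of all *other* studies' ESs, scaled by the standard error of that difference.

```python
import math

def fe_mean(es, v):
    """Inverse-variance (fixed-effect) weighted mean of ES estimates `es`
    with conditional variances `v`, plus the mean's variance."""
    w = [1.0 / vi for vi in v]
    mean = sum(wi * yi for wi, yi in zip(w, es)) / sum(w)
    return mean, 1.0 / sum(w)

def loo_residuals(es, v):
    """Externally standardized residual for each study: its ES minus the
    fixed-effect mean of the remaining studies, divided by the SE of
    that difference."""
    out = []
    for i in range(len(es)):
        m, mv = fe_mean(es[:i] + es[i + 1:], v[:i] + v[i + 1:])
        out.append((es[i] - m) / math.sqrt(v[i] + mv))
    return out
```

Residuals far from 0 (say, beyond ±3) flag candidate outliers worth rechecking against the primary report; an analogous random-effects version would also recompute the heterogeneity estimate with each study excluded.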

That’s all for this brief treatment of exploratory analyses preceding primary meta-analytic procedures. I’ve hinted at several topics that could provide fodder for subsequent posts, such as techniques for identifying and coping with potential outliers and strategies for handling troublesome distributions of moderators. Feel free to suggest particular topics you’d like me to consider addressing.
