Sneak Preview: Degraded Effect Sizes and Tricky Conditional Variances
Posted: April 2, 2012
With my post on data exploration, my seven-part overview of meta-analysis is now more than half complete. As a diversion while I write Part 5, let’s consider two of several methodological issues I plan to discuss in this blog: degraded effect sizes (ESs) and tricky conditional variances (CVs). My main aim here is to pique your interest in future posts by offering a glimpse at ways to manage selected challenges that routine meta-analytic techniques don’t address. These “teaser” descriptions will be quite superficial. I plan to elaborate on each of these challenges, as well as many others, after laying a foundation in my seven-part overview.
In Part 1 of the overview of meta-analysis I introduced notation for ES statistics (i.e., estimators, estimates) and parameters and described several issues that arise when obtaining ES estimates from primary studies. In this section I’ll focus on “degraded” ESs, by which I roughly mean ES estimates reported only as a range (or ranges) of values in which the estimate falls. Let’s call these values the ES estimate’s matching set.
As a simple example of a degraded ES, a study might report only the ES estimate’s direction relative to a reference value (e.g., no effect, null hypothesis), in which case the matching set is all values sharing that direction (e.g., Pearson correlations between -1 and 0, mean differences between 0 and +∞, proportions between ½ and 1, odds ratios between 0 and 1). More subtle examples involve rounding, such as a Pearson correlation reported as “r = .3” when it’s between .25 and .35. Many ESs are degraded by partial reporting of results from significance tests; we can often derive an ES estimate’s matching set from reported info about an observed test statistic or p value (e.g., “significantly positive,” “p < .01,” “p > .05,” “.01 < p < .05,” “p = .4”), given sufficient other info about the data or test (e.g., sample size[s], α level, 1- vs. 2-sided alternative).
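To make the p-value case concrete, here is a minimal Python sketch of deriving a matching set for a Pearson correlation from a reported p-value range. It is only an illustration under stated assumptions: a two-sided test of a zero correlation, with the Fisher z normal approximation standing in for the exact t test. The function name `r_matching_set` and its arguments are hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def r_matching_set(n, p_low, p_high, sign=1):
    """Interval of r values consistent with p_low < p < p_high.

    Assumes a two-sided test of rho = 0 and the Fisher z approximation:
    atanh(r) * sqrt(n - 3) is treated as standard normal under the null.
    """
    std_normal = NormalDist()
    se = 1.0 / sqrt(n - 3)  # approximate standard error of atanh(r)

    def abs_r(p):
        # |r| whose two-sided p value equals p (limits handled explicitly).
        if p >= 1.0:
            return 0.0
        if p <= 0.0:
            return 1.0
        z_crit = std_normal.inv_cdf(1.0 - p / 2.0)
        return tanh(z_crit * se)

    low, high = abs_r(p_high), abs_r(p_low)  # smaller p means larger |r|
    return (low, high) if sign > 0 else (-high, -low)


# A positive correlation reported only as ".01 < p < .05" with n = 50.
print(r_matching_set(50, 0.01, 0.05, sign=1))
# A non-significant positive correlation ("p > .05") with n = 50.
print(r_matching_set(50, 0.05, 1.0, sign=1))
```

The same idea carries over to other ES types whenever the test statistic is a monotone function of the ES estimate given the reported sample sizes.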
I think we have little good evidence about how prevalent degraded ESs are or how meta-analysts handle them. I suspect it’s common to simply omit degraded ESs or “impute” an ES value but ignore the degradation (e.g., set a non-significant ES estimate to 0 and use its usual CV). These and other ad hoc methods can badly distort meta-analytic results, such as estimates or tests of the mean or variance of ES parameters, especially if ESs are degraded not at random (e.g., smaller ESs degraded more often). I suspect that meta-analysts rarely use more principled methods, such as modern vote-counting techniques, perhaps because they’re not implemented widely in software and are limited in important ways (e.g., require between-studies homogeneity, recognize only simple types of degradation).
I’ve developed two related approaches for meta-analyzing degraded ESs. Both of these use a matching set for each degraded ES, work when some ESs are degraded and others aren’t, and support various meta-analytic models. The first approach, Bayesian ES imputation (BESI), entails obtaining an imputed ES and adjusted CV for each degraded ES; we then meta-analyze these quantities just like non-degraded ESs and their CVs. The imputed ES and adjusted CV are based on a posterior distribution for the ES parameter: Given a prior distribution for the ES parameter, we find the ES and CV that yield the same posterior mean and variance as our degraded ES and its original CV. The second approach involves maximum-likelihood (ML) estimation and inference: The likelihood for each degraded ES is a probability based on the matching set, and that for each non-degraded ES is the usual density function.
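One way to read the BESI description is sketched below in Python. This is my own minimal illustration, not the implementation behind the approach, and it makes several assumptions that are not stated above: a normal prior for the ES parameter, a normal sampling distribution for the ES estimate with known CV, and a single-interval matching set [a, b]. The posterior given "the estimate lies in [a, b]" is computed on a grid, and the imputed ES and adjusted CV are the values that would give the same posterior mean and variance under the usual normal-normal update. The name `besi_impute` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def besi_impute(a, b, cv, prior_mean, prior_var, grid_size=4001):
    """Imputed ES and adjusted CV for an estimate known only to lie in [a, b]."""
    std_normal = NormalDist()
    prior = NormalDist(prior_mean, sqrt(prior_var))
    se = sqrt(cv)

    # Grid over the ES parameter, spanning 8 prior SDs on each side (assumption).
    half_width = 8.0 * sqrt(prior_var)
    step = 2.0 * half_width / (grid_size - 1)
    thetas = [prior_mean - half_width + i * step for i in range(grid_size)]

    # Unnormalized posterior: prior density times P(a <= estimate <= b | theta).
    weights = [
        prior.pdf(t)
        * (std_normal.cdf((b - t) / se) - std_normal.cdf((a - t) / se))
        for t in thetas
    ]
    total = sum(weights)
    post_mean = sum(w * t for w, t in zip(weights, thetas)) / total
    post_var = sum(w * (t - post_mean) ** 2 for w, t in zip(weights, thetas)) / total

    if post_var >= prior_var:
        raise ValueError("matching set adds no information beyond the prior")

    # Invert the normal-normal update: match posterior precision, then the mean.
    adjusted_cv = 1.0 / (1.0 / post_var - 1.0 / prior_var)
    imputed_es = adjusted_cv * (post_mean / post_var - prior_mean / prior_var)
    return imputed_es, adjusted_cv


# A "non-significant" ES with original CV .04 (SE .2), so the matching set
# is roughly (-0.392, 0.392); the prior here is an arbitrary N(0, 1).
print(besi_impute(-0.392, 0.392, cv=0.04, prior_mean=0.0, prior_var=1.0))
```

Under these assumptions the adjusted CV exceeds the original CV, which reflects the information lost to degradation; an infinite bound (e.g., `float("inf")` for a direction-only report) also works.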
The BESI and ML approaches both rely on a matching set for degraded ESs but differ in important ways. For instance, BESI requires specifying a prior but doesn’t require specifying a particular meta-analytic model to impute ESs and adjust their CVs; ML requires specifying a meta-analytic model but doesn’t require a prior.
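To illustrate the ML side of this contrast, the Python sketch below fits the simplest model I could pick, a common-effect (fixed-effect) model with normal sampling distributions and known CVs. That model choice, the function names, and the search bounds are all assumptions made for illustration. Each non-degraded ES contributes a normal density to the likelihood, and each degraded ES contributes the probability of its matching set, taken here to be a single interval. No prior is needed.

```python
from math import log, sqrt
from statistics import NormalDist


def log_likelihood(mu, exact, degraded):
    """exact: list of (estimate, cv); degraded: list of (low, high, cv)."""
    total = 0.0
    for estimate, cv in exact:
        density = NormalDist(mu, sqrt(cv)).pdf(estimate)
        total += log(max(density, 1e-300))  # floor guards against log(0)
    for low, high, cv in degraded:
        dist = NormalDist(mu, sqrt(cv))
        prob = dist.cdf(high) - dist.cdf(low)  # P(estimate in matching set)
        total += log(max(prob, 1e-300))
    return total


def ml_common_effect(exact, degraded, lower=-5.0, upper=5.0, tol=1e-8):
    """Maximize the log-likelihood over mu by golden-section search."""
    ratio = (sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if log_likelihood(c, exact, degraded) > log_likelihood(d, exact, degraded):
            b = d
        else:
            a = c
    return (a + b) / 2.0


# Two fully reported ESs plus one reported only as "positive".
exact_studies = [(0.2, 0.04), (0.4, 0.04)]
degraded_studies = [(0.0, float("inf"), 0.04)]
print(ml_common_effect(exact_studies, []))
print(ml_common_effect(exact_studies, degraded_studies))
```

With no degraded ESs this reduces to the usual inverse-variance weighted mean; the direction-only study nudges the estimate upward slightly. A random-effects version would add a between-studies variance to each CV and maximize over both parameters.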
In Part 2 of the meta-analysis overview I discussed ES sampling error, with particular attention to quantifying an ES’s sampling error or imprecision as its CV. I’ll focus here on challenges of obtaining CVs in certain situations where a formula for the CV isn’t readily available. In particular, I’ll describe a fairly general-purpose method for obtaining reasonable CVs in many of these situations.
Meta-analysts sometimes extract ES estimates in unconventional ways that don’t lend themselves to using conventional formulas for CVs. Several authors have suggested strategies for approximating certain ESs using a variety of types of reported results, but many of these suggestions lack an accompanying standard error whose square could serve as the CV. Simple examples include approximating a standardized mean difference (SMD) using results based on a dichotomized outcome variable or each group’s sample quartiles, using standard deviations from unconventional sources (e.g., external samples) for a SMD’s standardizer, and approximating a Pearson correlation using a measure of ordinal association (e.g., Spearman’s rho) or results from a 2 × 2 contingency table. More complex examples involve multivariate ESs or missing data.
In some of these situations a formula for the CV is published or can be derived using standard math-stat techniques (e.g., delta method). Depending on one’s circumstances and resources, however, these options might not be available. A broadly applicable approach in many such scenarios is to obtain the CV by simulation, using methods akin to a parametric bootstrap. Here’s the basic idea: Given an ES estimate and other basic info that’s often reported (e.g., sample size[s]), we can generate ES estimates from an approximately correct sampling distribution and use these estimates’ variance as the CV. This approach can be extended to handle multivariate ESs (i.e., for a conditional covariance matrix) and some types of missing data.
For instance, suppose we wish to estimate a SMD between two independent samples, and for each sample we have its size and three quartiles. If we know how to approximate the SMD from these quartiles, here’s a sketch of how to simulate this approximate SMD estimator’s CV: Treating the SMD estimate as the SMD parameter and making reasonable distributional assumptions about subjects’ scores, generate many pairs of independent samples with the same sizes as our two samples; then use each pair’s sample quartiles to approximate the SMD, and take these approximated SMDs’ variance as the desired CV.
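The quartile example can be sketched in Python roughly as follows. This is a minimal illustration under assumptions I've added: scores are normal with equal variances in the two groups, each mean is approximated by the average of the three quartiles, and each standard deviation by the interquartile range divided by 1.349 (the IQR of a standard normal). The function names, number of replications, and seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import quantiles, variance


def smd_from_quartiles(q_a, q_b, n_a, n_b):
    """Approximate an SMD from each group's three quartiles (normal-theory sketch)."""
    mean_a, mean_b = sum(q_a) / 3.0, sum(q_b) / 3.0
    sd_a = (q_a[2] - q_a[0]) / 1.349
    sd_b = (q_b[2] - q_b[0]) / 1.349
    pooled_sd = sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd


def simulate_cv(smd_estimate, n_a, n_b, reps=2000, seed=12345):
    """Simulated CV: variance of the quartile-based SMD over generated sample pairs."""
    rng = random.Random(seed)
    simulated = []
    for _ in range(reps):
        # Treat the SMD estimate as the parameter; unit SD in both groups.
        sample_a = [rng.gauss(smd_estimate, 1.0) for _ in range(n_a)]
        sample_b = [rng.gauss(0.0, 1.0) for _ in range(n_b)]
        simulated.append(
            smd_from_quartiles(
                quantiles(sample_a, n=4), quantiles(sample_b, n=4), n_a, n_b
            )
        )
    return variance(simulated)


# Observed quartile-based SMD of 0.5 with 40 subjects per group.
print(simulate_cv(0.5, 40, 40))
```

Because the quartile-based estimator uses less of the data than the usual SMD, the simulated CV should come out somewhat larger than the conventional large-sample formula would give for the same sample sizes.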
That’s all for now. I’ve omitted a lot of technical detail about the above approaches and cautions about using them appropriately, though many statisticians could run with the ideas I’ve sketched. When I address each of these issues in later posts, I’ll elaborate on their rationale and implementation, probably give examples, and perhaps provide some computing tools. Feel free to comment on which of these topics you’d like me to address first.