Validity in the Context of EBA

This page summarizes aspects of psychometric validity as they relate to EBA. Other pages offer more comprehensive and general discussions of validity.

Rubric for evaluating validity and utility (extending Hunsley & Mash, 2008; * indicates new construct or category)
Criterion	Adequate	Good	Excellent	*Too Excellent
Content validity	Test developers clearly defined domain and ensured representation of entire set of facets	As adequate, plus all elements (items, instructions) evaluated by judges (experts or pilot participants)	As good, plus multiple groups of judges and quantitative ratings	Not a problem; note that many measures currently do not cover all of the DSM criteria
Construct validity (e.g., predictive, concurrent, convergent)	Some independently replicated evidence of construct validity	Bulk of independently replicated evidence shows multiple aspects of construct validity	As good, plus evidence of incremental validity with respect to other clinical data	Not a problem
*Discriminative validity	Statistically significant discrimination in multiple samples; areas under the curve (AUCs) < .60 under clinically realistic conditions (i.e., not comparing treatment-seeking youth with healthy youth)	AUCs of .60 to < .75 under clinically realistic conditions	AUCs of .75 to .90 under clinically realistic conditions	AUCs > .90 should trigger careful evaluation of research design and comparison group. Such values are more likely to reflect bias than an accurate estimate of clinical performance.
*Prescriptive validity	Statistically significant accuracy at identifying a diagnosis with a well-specified matching intervention, or statistically significant moderator of treatment	As adequate, plus good kappa for diagnosis, or significant treatment moderation in more than one sample	As good, plus good kappa for diagnosis in more than one sample, or moderate effect size for treatment moderation	Not a problem with the measure or finding, per se; but high predictive validity may obviate need for other assessment components. Compare on utility.
Validity generalization	Some evidence supports use with either more than one specific demographic group or in more than one setting	Bulk of evidence supports use with either more than one specific demographic group or in multiple settings	Bulk of evidence supports use both with more than one specific demographic group and in multiple settings	Not a problem
Treatment sensitivity	Some evidence of sensitivity to change over course of treatment	Independent replications show evidence of sensitivity to change over course of treatment	As good, plus sensitive to change across different types of treatments	Not a problem
Clinical utility	After practical considerations (e.g., costs, ease of administration and scoring, duration, availability of relevant benchmark scores, patient acceptability), assessment data are likely to be clinically useful	As adequate, plus published evidence that using the assessment data confers clinical benefit (e.g., better outcome, lower attrition, greater satisfaction)	As good, plus independent replication	Not a problem
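
The AUC bands in the discriminative validity row can be applied mechanically. The following is a minimal Python sketch of that lookup, not part of the published rubric. The function name `rate_discriminative_validity`, the parameter `significant_in_multiple_samples`, and the "Not rated" label for AUCs below .60 without replicated significant discrimination are all assumptions made for illustration. The rubric also requires that the AUC come from clinically realistic conditions, which the code cannot check.

```python
def rate_discriminative_validity(auc: float,
                                 significant_in_multiple_samples: bool = True) -> str:
    """Map an AUC onto the rubric's discriminative validity bands.

    Assumes the AUC was estimated under clinically realistic conditions;
    this function has no way to verify the study design.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must be between 0 and 1")
    if auc > 0.90:
        # Rubric: values above .90 should trigger scrutiny of the
        # research design and comparison group.
        return "Too Excellent"
    if auc >= 0.75:
        return "Excellent"
    if auc >= 0.60:
        return "Good"
    if significant_in_multiple_samples:
        # Rubric: AUC < .60 with statistically significant
        # discrimination in multiple samples.
        return "Adequate"
    # Hypothetical label: the rubric does not name this case.
    return "Not rated"


if __name__ == "__main__":
    for value in (0.55, 0.68, 0.82, 0.95):
        print(value, rate_discriminative_validity(value))
```

A "Too Excellent" result is a prompt to re-examine the study design, not a verdict on the measure itself.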