Evidence based assessment/Validity


Validity in the Context of EBA

This page summarizes aspects of psychometric validity as they relate to evidence-based assessment (EBA). Other pages offer more comprehensive and general discussions of validity.

Rubric for evaluating validity and utility (extending Hunsley & Mash, 2008; * indicates new construct or category)
| Criterion | Adequate | Good | Excellent | *Too Excellent |
|---|---|---|---|---|
| Content validity | Test developers clearly defined the domain and ensured representation of the entire set of facets | As adequate, plus all elements (items, instructions) evaluated by judges (experts or pilot participants) | As good, plus multiple groups of judges and quantitative ratings | Not a problem; can point out that many measures do not currently cover all of the DSM criteria |
| Construct validity (e.g., predictive, concurrent, convergent) | Some independently replicated evidence of construct validity | Bulk of independently replicated evidence shows multiple aspects of construct validity | As good, plus evidence of incremental validity with respect to other clinical data | Not a problem |
| *Discriminative validity | Statistically significant discrimination in multiple samples; Areas Under the Curve (AUCs) < .60 under clinically realistic conditions (i.e., not comparing treatment-seeking and healthy youth) | AUCs of .60 to < .75 under clinically realistic conditions | AUCs of .75 to .90 under clinically realistic conditions | AUCs > .90 should trigger careful evaluation of the research design and comparison group. Such values are more likely to be biased than to be accurate estimates of clinical performance. |
| *Prescriptive validity | Statistically significant accuracy at identifying a diagnosis with a well-specified matching intervention, or statistically significant moderator of treatment | As adequate, with good kappa for diagnosis, or significant treatment moderation in more than one sample | As good, with good kappa for diagnosis in more than one sample, or moderate effect size for treatment moderation | Not a problem with the measure or finding, per se; but high predictive validity may obviate the need for other assessment components. Compare on utility. |
| Validity generalization | Some evidence supports use with either more than one specific demographic group or in more than one setting | Bulk of evidence supports use with either more than one specific demographic group or in multiple settings | Bulk of evidence supports use with more than one specific demographic group and in multiple settings | Not a problem |
| Treatment sensitivity | Some evidence of sensitivity to change over the course of treatment | Independent replications show evidence of sensitivity to change over the course of treatment | As good, plus sensitive to change across different types of treatments | Not a problem |
| Clinical utility | After practical considerations (e.g., costs, ease of administration and scoring, duration, availability of relevant benchmark scores, patient acceptability), assessment data are likely to be clinically useful | As adequate, plus published evidence that using the assessment data confers clinical benefit (e.g., better outcome, lower attrition, greater satisfaction) | As good, plus independent replication | Not a problem |
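
The AUC bands in the discriminative validity row can be applied mechanically once an AUC has been estimated. The following is a minimal Python sketch, not part of the published rubric: the function names `auc` and `rate_discriminative_validity` and the example scores are hypothetical, and the sketch assumes the other conditions in the row (statistically significant discrimination in multiple samples, clinically realistic comparison groups) have already been checked separately. The AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half.

```python
def auc(case_scores, control_scores):
    """Estimate the area under the ROC curve from raw scores.

    Equivalent to the Mann-Whitney statistic divided by the number of
    case/control pairs: the share of pairs in which the case outscores
    the control, counting ties as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def rate_discriminative_validity(auc_value):
    """Map an AUC onto the rubric's bands (hypothetical helper).

    Assumes the estimate comes from clinically realistic conditions and
    that discrimination was statistically significant in multiple samples.
    """
    if auc_value < 0.60:
        return "Adequate"
    if auc_value < 0.75:
        return "Good"
    if auc_value <= 0.90:
        return "Excellent"
    # Values above .90 call for scrutiny of the design and comparison group.
    return "Too excellent"


# Made-up illustrative scores, not data from any study.
cases = [3, 5, 7, 9]
controls = [1, 2, 5, 6]
estimate = auc(cases, controls)
print(estimate, rate_discriminative_validity(estimate))
```

A rating of "Too excellent" from this helper is a prompt to re-examine the study design, not a verdict on the measure itself.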