Evidence based assessment/Reliability

From Wikiversity

Reliability

This page focuses on psychometric reliability in the context of Evidence-Based Assessment. More general and comprehensive discussions of reliability are available on Wikipedia and elsewhere.


Evaluating norms and reliability

Rubric for evaluating norms and reliability of assessments (extending Hunsley & Mash, 2008; * indicates a new construct or category)
Norms
  Adequate: Mean and standard deviation for total score (and subscores if relevant) from a large, relevant clinical sample
  Good: Mean and standard deviation for total score (and subscores if relevant) from multiple large, relevant samples, at least one clinical and one nonclinical
  Excellent: Same as “good,” but must be from a representative sample (i.e., random sampling, or matching to census data)
  Too Good: Not a concern

Internal consistency (Cronbach's alpha, split-half, etc.)
  Adequate: Most evidence shows Cronbach's alpha values of .70 to .79
  Good: Most reported alphas are .80 to .89
  Excellent: Most reported alphas are ≥ .90
  Too Good: Alpha is also tied to scale length and content coverage; very high alphas may indicate that the scale is longer than needed or that it has a very narrow scope

Inter-rater reliability
  Adequate: Most evidence shows kappas of .60 to .74, or intraclass correlations (ICCs) of .70 to .79
  Good: Most reported kappas are .75 to .84, or ICCs .80 to .89
  Excellent: Most kappas are ≥ .85, or ICCs ≥ .90
  Too Good: Very high levels of agreement are often achieved by re-rating from audio recordings or transcripts

Test-retest reliability (stability)
  Adequate: Most evidence shows test-retest correlations ≥ .70 over a period of several days or weeks
  Good: Most evidence shows test-retest correlations ≥ .70 over a period of several months
  Excellent: Most evidence shows test-retest correlations ≥ .70 over a year or longer
  Too Good: The key consideration is an appropriate time interval; many constructs would not be stable for years at a time

*Repeatability
  Adequate: Bland-Altman plots (Bland & Altman, 1986) show small bias and/or weak trends; the coefficient of repeatability is tolerable compared to clinical benchmarks (Vaz, Falkmer, Passmore, Parsons, & Andreou, 2013)
  Good: Bland-Altman plots and corresponding regressions show no significant bias and no significant trends; the coefficient of repeatability is tolerable
  Excellent: Bland-Altman plots and corresponding regressions show no significant bias and no significant trends across multiple studies; the coefficient of repeatability is small enough that it is not clinically concerning
  Too Good: Not a concern
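The internal-consistency criterion is stated in terms of Cronbach's alpha, α = k/(k - 1) · (1 - Σ σ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ is the variance of item i, and σ²ₓ is the variance of the total score. Below is a minimal Python sketch. It assumes complete data (no missing responses), with respondents as rows and items as columns. The function names and the five-respondent data set are hypothetical. Note that `rubric_band_alpha` classifies a single estimate, whereas the rubric refers to what most reported evidence shows.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    scores: list of rows, one row per respondent, each holding k item scores.
    Assumes complete data (no missing values).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))                    # one tuple per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)


def rubric_band_alpha(alpha):
    """Hypothetical helper: place one alpha estimate in the rubric's bands."""
    if alpha >= 0.90:
        return "excellent"   # the rubric also warns this may signal redundancy
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "adequate"
    return "below adequate"


# Five hypothetical respondents answering three items
scores = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5], [1, 2, 2]]
alpha = cronbach_alpha(scores)
print(round(alpha, 3), rubric_band_alpha(alpha))  # prints: 0.956 excellent
```

With only three items, an alpha this high is unusual for real data. Following the rubric's "too good" note, it would prompt a check on whether the items are near-duplicates.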
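For categorical codes, the inter-rater criterion uses Cohen's kappa, κ = (pₒ - pₑ)/(1 - pₑ). Here pₒ is the observed proportion of agreement, and pₑ is the agreement expected by chance from each rater's marginal proportions. The sketch below assumes exactly two raters coding the same cases, since Cohen's kappa is defined for two raters. The function name and ratings are illustrative only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length, non-empty rating lists")
    # Observed agreement: share of cases where both raters chose the same code
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over codes
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(round(cohens_kappa(a, b), 3))  # prints 0.583
```

The two hypothetical raters agree on 8 of 10 cases. Even so, kappa is about .58, just below the .60 floor of the "adequate" band, because these marginal proportions alone would produce 52% agreement by chance.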
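The rubric does not say which intraclass correlation it means, and several forms exist. The sketch below assumes ICC(2,1) in the common Shrout and Fleiss notation: two-way random effects, absolute agreement, single rater. It is computed from two-way ANOVA mean squares. The sketch also assumes a complete subjects-by-raters table with at least two subjects and two raters. The function name and data are hypothetical.

```python
from statistics import fmean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, each holding k raters' scores.
    Assumes no missing ratings.
    """
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]          # per-subject means
    col_means = [fmean(col) for col in zip(*ratings)]    # per-rater means
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols               # residual sum of squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: rater 2 scores every subject exactly one point higher
ratings = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(round(icc_2_1(ratings), 3))  # prints 0.93
```

Rater 2 is always one point higher, so the absolute-agreement ICC(2,1) comes out at 40/43 ≈ .93 rather than 1. A consistency-type ICC such as ICC(3,1) would equal 1 for these data. The form used therefore matters when comparing a reported ICC with the rubric's bands.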