Mood Disorder Questionnaire (MDQ)

This page goes into detail about how to score and interpret the MDQ.

The Mood Disorder Questionnaire (MDQ) is a brief screening tool intended to improve detection of bipolar disorders. It generally shows good sensitivity to bipolar I disorder, but is less sensitive to the other types of bipolar disorder. It is not designed to measure current symptom severity or treatment response. It is one of the most widely translated and studied screening tools for bipolar disorders. Its brevity and simple reading level add to its popularity, as does the fact that it is free and quick to take and score.


Norms and Reliability

The MDQ was originally developed and validated in a large sample in the United States. Later studies used large clinical samples, online surveys distributed by advocacy groups, and other convenience samples. No carefully designed, stratified samples intended to be representative of a general population are available. The MDQ is therefore rated as having "adequate" normative data, based on the large number of convenience samples. Several meta-analyses have summarized the performance of MDQ scores across the range of published languages, clinical settings, and administration formats.


Reliability refers to whether the scores are reproducible. Unless otherwise specified, the reliability scores and values come from studies done with a United States population sample.
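
As a rough illustration of what "reproducible" means here, test-retest reliability is commonly summarized as the correlation between scores from two administrations of the same measure. The sketch below is not taken from any MDQ study: the function name `pearson_r` and the two score lists are hypothetical, and the scores are assumed to be symptom counts from the same respondents at two time points.

```python
from math import sqrt
from statistics import mean


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_first, mean_second = mean(first), mean(second)
    # Sum of cross-products and sums of squared deviations.
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    if ss_first == 0 or ss_second == 0:
        raise ValueError("scores must vary at both time points")
    return cross / sqrt(ss_first * ss_second)


# Hypothetical symptom counts for six respondents at two time points
# (made-up numbers for illustration only, not data from any study).
time1 = [2, 5, 9, 11, 4, 7]
time2 = [3, 4, 10, 9, 5, 8]
retest_r = pearson_r(time1, time2)
print(f"test-retest r = {retest_r:.2f}")
```

A value near 1 would indicate that respondents keep roughly the same rank order across administrations; published coefficients also depend on the retest interval and the sample.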

Applying the EBA rubric for evaluating norms and reliability to scores from the Mood Disorder Questionnaire
Criterion | Rating (adequate, good, excellent, too good*) | Explanation with references
Norms | Adequate | Multiple convenience samples and research studies, including both clinical and nonclinical samples[citation needed]
Internal consistency | Good? | Cronbach's alpha usually reported based on the symptom items (not the "episodic" or impairment items)
Inter-rater reliability | Not applicable | Designed originally as a self-report scale; parent and youth report correlate about the same as cross-informant scores correlate in general[1]
Test-retest reliability (stability) | Good | r = .73 over 15 weeks. Evaluated in initial studies,[2] and data also show high stability in clinical trials[citation needed]
Repeatability | Not published | No published studies formally checking repeatability
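
The internal consistency row refers to Cronbach's alpha computed on the symptom items. A minimal sketch of that calculation follows, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical, and the yes/no items are assumed to be coded 1/0.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores must vary across respondents")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical yes/no (1/0) answers from five respondents to four items
# (made-up data for illustration only).
answers = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(answers)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Restricting the input to the symptom items, as the table describes, is a matter of which columns are passed in; the formula itself is unchanged.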


Validity describes the evidence that an assessment tool measures what it was supposed to measure. There are many different ways of checking validity. For screening measures, diagnostic accuracy and discriminative validity are probably the most useful ways of looking at validity. Unless otherwise specified, the validity scores and values come from studies done with a United States population sample. A rubric for describing validity of assessment scores in the context of EBA is here.
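
To make "diagnostic accuracy" concrete, the sketch below computes sensitivity, specificity, and diagnostic likelihood ratios from a 2x2 table of screen results against a reference diagnosis. The function name `diagnostic_accuracy` and the counts are hypothetical and chosen only for easy arithmetic; they are not MDQ results.

```python
def diagnostic_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Summarize a 2x2 table of screen result versus reference diagnosis."""
    cases = true_pos + false_neg          # people with the diagnosis
    non_cases = true_neg + false_pos      # people without the diagnosis
    if cases == 0 or non_cases == 0:
        raise ValueError("need at least one case and one non-case")
    sensitivity = true_pos / cases        # proportion of cases screening positive
    specificity = true_neg / non_cases    # proportion of non-cases screening negative
    # Likelihood ratios; guard against division by zero at the extremes.
    lr_positive = (
        sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    )
    lr_negative = (
        (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


# Hypothetical counts (illustration only, not from any published sample).
example = diagnostic_accuracy(true_pos=70, false_neg=30, true_neg=90, false_pos=10)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity describe the screen itself, while the likelihood ratios indicate how much a positive or negative result should shift the estimated probability of a diagnosis for an individual.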

Evaluation of validity and utility for the General Behavior Inventory (table from Youngstrom et al., unpublished, extended from Hunsley & Mash, 2008; *indicates new construct or category)
Criterion | Rating (adequate, good, excellent, too good*) | Explanation with references
Content validity | Excellent | Covers both DSM diagnostic symptoms and a range of associated features[2]
Construct validity (e.g., predictive, concurrent, convergent, and discriminant validity) | Excellent | Shows convergent validity with other symptom scales, longitudinal prediction of development of mood disorders,[3][4][5] criterion validity via metabolic markers,[2][6] and associations with family history of mood disorder.[7] Factor structure is complicated;[2][8] the inclusion of “biphasic” or “mixed” mood items creates a lot of cross-loading
Discriminative validity | Excellent | Multiple studies show that GBI scores discriminate cases with unipolar and bipolar mood disorders from other clinical disorders;[2][9][10] effect sizes are among the largest of existing scales[11]
Validity generalization | Good | Used both as self-report and caregiver report; used in college student[8][12] as well as outpatient[9][13][14] and inpatient clinical samples; translated into multiple languages with good reliability
Treatment sensitivity | Good | Multiple studies show sensitivity to treatment effects comparable to using interviews by trained raters, including placebo-controlled, masked assignment trials.[15][16] Short forms appear to retain sensitivity to treatment effects while substantially reducing burden[16][17]
Clinical utility | Good | Free (public domain), strong psychometrics, extensive research base. Biggest concerns are length and reading level. Short forms have less research, but are appealing based on reduced burden and promising data

