Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians’ diagnoses

doi:10.1016/S0165-0327(02)00237-9

Journal of Affective Disorders

Volume 78, Issue 2, February 2004, Pages 131-140

https://doi.org/10.1016/S0165-0327(02)00237-9

Abstract

Background: The aim of this study was to compare the validity of the Hospital Anxiety and Depression Scale (HADS), the WHO (five) Well Being Index (WBI-5), the Patient Health Questionnaire (PHQ), and physicians’ recognition of depressive disorders, and to recommend specific cut-off points for clinical decision making.

Methods: A total of 501 outpatients completed each of the three depression screening questionnaires and received the Structured Clinical Interview for DSM-IV (SCID) as the criterion standard. In addition, treating physicians were asked to give their psychiatric diagnoses. Criterion validity and Receiver Operating Characteristics (ROC) were determined, and areas under the curves (AUCs) were compared statistically.

Results: All depression scales showed excellent internal consistency (Cronbach’s α: 0.86–0.91). For ‘major depressive disorder’, the operating characteristics of the PHQ were significantly superior to those of both the HADS and the WBI-5. For ‘any depressive disorder’, the PHQ again showed the best operating characteristics, but the overall difference did not reach statistical significance at the 5% level. Cut-off points that can be recommended for screening for ‘major depressive disorder’ had sensitivities of 98% (PHQ), 94% (WBI-5), and 85% (HADS), with corresponding specificities of 80% (PHQ), 78% (WBI-5), and 76% (HADS). In contrast, physicians’ recognition of ‘major depressive disorder’ was poor (sensitivity, 40%; specificity, 87%).

Limitations: Our sample may not be representative of medical outpatients, but sensitivity and specificity are independent of disorder prevalence.

Conclusions: All three questionnaires performed well in depression screening, but significant differences in criterion validity existed. These results may be helpful in the selection of questionnaires and cut-off points.
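The Limitations statement rests on the distinction between sensitivity and specificity, which are conditional on true disorder status, and predictive values, which also depend on how common the disorder is in the screened population. The sketch below applies Bayes' rule to the sensitivities and specificities reported in the abstract; the prevalence values are hypothetical and are not taken from the study.

```python
# Predictive values from sensitivity, specificity and prevalence via Bayes' rule.
# Prevalence values are hypothetical; only sensitivity/specificity come from the abstract.


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a screening test at a given disorder prevalence."""
    tp = sensitivity * prevalence              # true positives (fraction of all screened)
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity for 'major depressive disorder' as reported in the abstract.
REPORTED = {
    "PHQ": (0.98, 0.80),
    "WBI-5": (0.94, 0.78),
    "HADS": (0.85, 0.76),
    "Physicians": (0.40, 0.87),
}

for prevalence in (0.05, 0.15, 0.30):  # hypothetical screening settings
    for name, (sens, spec) in REPORTED.items():
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(f"prevalence {prevalence:.0%}  {name:<10} PPV={ppv:.2f}  NPV={npv:.2f}")
```

With the reported PHQ figures, for example, the PPV is roughly 0.21 at a 5% prevalence and about 0.68 at 30%, although sensitivity and specificity are unchanged. Predictive values therefore need to be recomputed for each setting, whereas sensitivity and specificity can be carried over.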

Introduction

Depressive disorders are associated with high levels of personal suffering, increased disability days, and elevated risks of cardiovascular mortality and suicide (Wells et al., 1989, Broadhead et al., 1990, Ormel et al., 1994, Simon et al., 1995, Frasure-Smith et al., 2000, Penninx et al., 2001, Posternak and Miller, 2001). Unfortunately, physicians detect only 30–50% of patients with depression in primary care (Nielsen and Williams, 1980, Perez-Stable et al., 1990, Ormel et al., 1991, Docherty, 1997, Williams et al., 1999, Hansen et al., 2001). Depression is therefore more often undetected than detected, and so remains untreated (Gelenberg, 1999). Major and minor depressive disorders respond well to psychotherapy and/or antidepressant medication (Miranda and Munoz, 1994, Coulehan et al., 1997, Schulberg et al., 1998, Whooley and Simon, 2000, Williams et al., 2000, Jarrett et al., 2001), which emphasises the need to improve recognition by clinicians. Recently, it has been shown that screening for depression can be cost-effective if screening costs are low and effective treatment is provided (Valenstein et al., 2001). Screening questionnaires that keep screening costs low are entirely self-administered and take only a few minutes for patients to complete and for physicians to review. Well-established international screening questionnaires that meet these requirements are the Hospital Anxiety and Depression Scale (HADS; Zigmond and Snaith, 1983), the WHO (five) Well Being Index (WBI-5; WHO, 1998a), and the Patient Health Questionnaire (PHQ; Spitzer et al., 1999). Clinicians and researchers need to know which of the available screening instruments can be recommended for clinical use, how valid their results are, and whether they are superior to recognition by physicians working with medical outpatients. In addition, users of screening questionnaires need to know the optimal cut-off points for detecting depressive disorders according to DSM-IV (American Psychiatric Association, 2000).
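Choosing an "optimal" cut-off requires an explicit criterion. One common choice, assumed here and not necessarily the rule the authors used, is Youden's J = sensitivity + specificity − 1, evaluated at every candidate cut-off against the criterion standard. The function names and the toy data below are hypothetical and serve only to illustrate the procedure.

```python
# Operating characteristics of a screening score at every candidate cut-off,
# with Youden's J = sensitivity + specificity - 1 as one possible selection rule.
from collections.abc import Sequence


def operating_characteristics(scores: Sequence[float], has_disorder: Sequence[bool],
                              higher_is_worse: bool = True) -> list[tuple[float, float, float, float]]:
    """Return (cut_off, sensitivity, specificity, youden_j) for each observed score.

    A subject screens positive if score >= cut_off; for scales where low scores
    indicate disorder (e.g. a well-being index), pass higher_is_worse=False so that
    score <= cut_off counts as positive.
    """
    n_pos = sum(has_disorder)
    n_neg = len(has_disorder) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need subjects both with and without the disorder")
    rows = []
    for cut in sorted(set(scores)):
        tp = fp = 0
        for score, disordered in zip(scores, has_disorder):
            positive = score >= cut if higher_is_worse else score <= cut
            if positive and disordered:
                tp += 1
            elif positive:
                fp += 1
        sensitivity = tp / n_pos
        specificity = (n_neg - fp) / n_neg
        rows.append((cut, sensitivity, specificity, sensitivity + specificity - 1))
    return rows


def best_by_youden(rows):
    """Row with the largest Youden's J (first one in ascending cut-off order on ties)."""
    return max(rows, key=lambda row: row[3])


# Hypothetical toy data, not from the study: sum scores and criterion-standard status.
scores = [1, 3, 4, 6, 7, 8, 9, 10, 12, 14]
has_disorder = [False, False, False, False, True, True, False, True, True, True]
cut, sens, spec, j = best_by_youden(operating_characteristics(scores, has_disorder))
print(f"cut-off >= {cut}: sensitivity={sens:.2f}, specificity={spec:.2f}, J={j:.2f}")
# prints: cut-off >= 7: sensitivity=1.00, specificity=0.80, J=0.80
```

Youden's J weights missed cases and false alarms equally. In screening, where a missed case may be costlier, an alternative is to pick the most specific cut-off among those that reach a minimum sensitivity. The cut-offs recommended in the abstract appear to favour high sensitivity in this way.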

The purpose of this study was to determine the comparative validity of the Hospital Anxiety and Depression Scale (HADS), the WHO (five) Well Being Index (WBI-5), the Patient Health Questionnaire (PHQ), and physicians’ recognition of depressive disorders. Specifically, this study aimed:

(1) to investigate internal consistency and intercorrelations of the three depression scales;
(2) to analyse the operating characteristics of the depression scales and physicians’ diagnoses according to an independent criterion standard for depressive disorders;
(3) to determine if any one screening instrument is superior to the others in diagnosing DSM-IV depressive disorders;
(4) to determine optimal cut-off points for discriminating between subjects with and without depressive disorders.
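Aims (2) and (3) rely on ROC analysis: each instrument's area under the curve (AUC) summarises its discrimination against the SCID, and because all instruments were given to the same patients, their AUCs are correlated. The reference list includes the 1988 Biometrics paper on comparing correlated ROC curves nonparametrically (commonly called the DeLong method). The sketch below implements the two-instrument form of that comparison. Whether the authors used exactly this pairwise version is not stated in this excerpt, and the helper names and toy data are hypothetical.

```python
# Empirical AUCs of two screening scores measured on the same subjects, and a
# two-sided test of equal AUCs using DeLong's nonparametric variance estimate.
import math
from collections.abc import Sequence


def _psi(case: float, control: float) -> float:
    """Mann-Whitney kernel: 1 if the case scores higher, 0.5 on a tie, else 0."""
    if case > control:
        return 1.0
    return 0.5 if case == control else 0.0


def _placements(scores: Sequence[float], has_disorder: Sequence[bool]):
    """Return (auc, v10, v01): AUC plus per-case and per-control structural components."""
    cases = [s for s, d in zip(scores, has_disorder) if d]
    controls = [s for s, d in zip(scores, has_disorder) if not d]
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return sum(v10) / len(v10), v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with denominator n - 1 (needs at least two values)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_compare(score_a, score_b, has_disorder):
    """Return (auc_a, auc_b, z, two_sided_p) for H0: AUC_a == AUC_b.

    Both scores must point the same way (higher = more likely disordered);
    negate a well-being score such as the WBI-5 before passing it in.
    """
    auc_a, v10_a, v01_a = _placements(score_a, has_disorder)
    auc_b, v10_b, v01_b = _placements(score_b, has_disorder)
    m, n = len(v10_a), len(v01_a)  # numbers of cases and controls
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p


# Hypothetical toy data, not from the study.
status = [False, False, False, False, True, True, True, True]
scale_a = [1, 2, 3, 5, 4, 6, 7, 8]
scale_b = [2, 5, 1, 6, 3, 4, 8, 7]
print(delong_compare(scale_a, scale_b, status))
```

In the study's setting, the two score lists would be two instruments' totals for the same outpatients, and the status list would hold the SCID diagnoses. The 1988 method also handles more than two curves through a contrast matrix, so this pairwise version is a simplification. The nested loops are O(cases × controls), which is adequate for a sample of about 500.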

Section snippets

Subjects

The study was performed in the outpatient clinics of Heidelberg Medical Hospital and 12 family practices in Heidelberg from August 2000 to July 2001. On predetermined days, patients visiting these sites were asked to participate in our study and to complete a set of questionnaires during their waiting time. With the aim of performing 500 Structured Clinical Interviews for DSM-IV (SCID; First et al., 1995, Wittchen et al., 1997) as the criterion standard for the presence of depressive disorders,

Internal consistency and intercorrelations

The internal consistency of all three depression scales was excellent: Cronbach’s α was 0.88 for the PHQ, 0.86 for the HADS, and 0.91 for the WBI-5. The substantial intercorrelations of 0.74 (HADS×PHQ), −0.73 (WBI-5×PHQ), and −0.76 (HADS×WBI-5) indicate that the three scales largely measure the same construct.
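Cronbach's α and the variance shared by two correlated scales both follow from standard formulas. The sketch below assumes raw item responses are available as a list of rows, one per respondent (a hypothetical input format). The only study values it uses are the three intercorrelations reported above. Squaring them shows that each pair of scales shares roughly 53–58% of its variance, which helps when judging how closely the three scales overlap.

```python
# Cronbach's alpha from a respondents-by-items matrix, plus the variance two scales
# share (r squared) for the intercorrelations reported above.
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

    Rows are respondents, columns are the k items of one scale.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share; the sign of r does not matter."""
    return r * r


# Correlations as reported in the text. The negative signs arise because higher
# WBI-5 scores mean better well-being, while higher HADS/PHQ scores mean more symptoms.
REPORTED_R = {"HADS x PHQ": 0.74, "WBI-5 x PHQ": -0.73, "HADS x WBI-5": -0.76}
for pair, r in REPORTED_R.items():
    print(f"{pair}: r = {r:+.2f}, shared variance = {shared_variance(r):.0%}")
# prints shared variances of 55%, 53% and 58% respectively
```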

Comparative validity for ‘major depressive disorder’

Table 1 shows the operating characteristics of the depression scales and the physicians’ diagnoses for ‘major depressive disorder’ for three potential cut-off points for each instrument, and

Discussion

The main purpose of our study was to investigate the criterion validity of three international screening instruments for depression, and to determine whether they differ significantly regarding their ability to diagnose DSM-IV depressive disorders. Previous comprehensive reviews (Meakin, 1992, Mulrow et al., 1995) have demonstrated reasonable operating characteristics for several case-finding instruments for depression, but significant differences between instruments remain elusive. To our

Acknowledgments

This study was supported by unrestricted research grants from Pfizer, Germany, and from the medical faculty of the University of Heidelberg, Germany (project 121/2000); there are no conflicts of interest. First of all, we thank our patients and their doctors, who collaborated in this study and made this work possible. We are very grateful to our students Levke Willand and Ingeborg Warnke, who played an important role in data collection. Susanne Geercken, MA, Pfizer, reviewed the German

References (57)

P. Bech et al.
The sensitivity and specificity of the Major Depression Inventory, using the Present State Examination as the index of diagnostic validity
J. Affect. Disord.
(2001)
I. Bjelland et al.
The validity of the Hospital Anxiety and Depression Scale. An updated literature review
J. Psychosom. Res.
(2002)
M.S. Hansen et al.
Mental disorders among internal medical inpatients: prevalence, detection, and treatment status
J. Psychosom. Res.
(2001)
C. Herrmann
International experiences with the Hospital Anxiety and Depression Scale—a review of validation data and clinical results
J. Psychosom. Res.
(1997)
M.A. Posternak et al.
Untreated short-term course of major depression: a meta-analysis of outcomes from studies using wait-list control groups
J. Affect. Disord.
(2001)
R.L. Spitzer et al.
Validity and utility of the PRIME-MD patient health questionnaire in assessment of 3000 obstetric-gynecologic patients: the PRIME-MD Patient Health Questionnaire Obstetrics-Gynecology Study
Am. J. Obstet. Gynecol.
(2000)
J.J. Strik et al.
Sensitivity and specificity of observer and self-report questionnaires in major and minor depression following myocardial infarction
Psychosomatics
(2001)
P. Svanborg et al.
A comparison between the Beck Depression Inventory (BDI) and the self-rating version of the Montgomery Asberg Depression Rating Scale (MADRS)
J. Affect. Disord.
(2001)
J.W. Williams et al.
Case-finding for depression in primary care: a randomized trial
Am. J. Med.
(1999)
American Psychiatric Association, 2000. Diagnostic and Statistical Manual of Mental Disorders DSM-IV-TR. 4th Edition....

P. Bech et al.
The WHO (Ten) Well-Being Index: validation in diabetes
Psychother. Psychosom.
(1996)
B. Bracken et al.
State of the art procedures for translating, validating and using psychoeducational tests in cross-cultural assessment
School Psychol. Int.
(1991)
W.E. Broadhead et al.
Depression, disability days, and days lost from work in a prospective epidemiologic survey
J. Am. Med. Assoc.
(1990)
J.L. Coulehan et al.
Treating depressed primary care patients improves their physical, mental, and social functioning
Arch. Intern. Med.
(1997)
E.R. DeLong et al.
Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach
Biometrics
(1988)
C. Diez-Quevedo et al.
Validation and utility of the Patient Health Questionnaire in diagnosing mental disorders in 1003 general hospital Spanish inpatients
Psychosom. Med.
(2001)
J.P. Docherty
Barriers to the diagnosis of depression in primary care
J. Clin. Psychiatry
(1997)
M.B. First et al.
J.L. Fleiss
Measuring nominal scale agreement among many raters
Psychol. Bull.
(1971)
N. Frasure-Smith et al.
Social support, depression, and mortality during the first year after myocardial infarction
Circulation
(2000)
A. Gelenberg
Depression is still underrecognized and undertreated
Arch. Intern. Med.
(1999)
C. Herrmann et al.
R. Heun et al.
Internal and external validity of the WHO Well-Being Scale in the elderly general population
Acta Psychiatr. Scand.
(1999)
R.B. Jarrett et al.
Preventing recurrent depression using cognitive therapy with and without a continuation phase: a randomized clinical trial
Arch. Gen. Psychiatry
(2001)
J.G. Johnson et al.
Health problems, impairment and illnesses associated with bulimia nervosa and binge eating disorder among primary care and obstetric gynaecology patients
Psychol. Med.
(2001)
H.C. Kraemer et al.
Measuring the potency of risk factors for clinical or policy significance
Psychol. Methods
(1999)
K. Kroenke
Depression screening is not enough
Ann. Intern. Med.
(2001)
K. Kroenke et al.
Similar effectiveness of paroxetine, fluoxetine, and sertraline in primary care: a randomized trial
J. Am. Med. Assoc.
(2001)

Cited by (839)

Clinical effectiveness of patient-targeted feedback following depression screening in general practice (GET.FEEDBACK.GP): an investigator-initiated, prospective, multicentre, three-arm, observer-blinded, randomised controlled trial in Germany
2024, The Lancet Psychiatry
Screening for depression in primary care alone is not sufficient to improve clinical outcomes. However, targeted feedback of the screening results to patients might result in beneficial effects. The GET.FEEDBACK.GP trial investigated whether targeted feedback of the depression screening result to patients, in addition to feedback to general practitioners (GPs), leads to greater reductions in depression severity than GP feedback alone or no feedback.
The GET.FEEDBACK.GP trial was an investigator-initiated, multicentre, three-arm, observer-blinded, randomised controlled trial. Depression screening was conducted electronically using the Patient Health Questionnaire-9 (PHQ-9) in 64 GP practices across five regions in Germany while patients were waiting to see their GP. Currently undiagnosed patients (aged ≥18 years) who screened positive for depression (PHQ-9 score ≥10), were proficient in the German language, and had a personal consultation with a GP were randomly assigned (1:1:1) into a group that received no feedback on their depression screening result, a group in which only the GP received feedback, or a group in which both GP and patient received feedback. Randomisation was stratified by treating GP and PHQ-9 depression severity. Trial staff were masked to patient enrolment and study group allocation and GPs were masked to the feedback received by the patient. Written feedback, including the screening result and information on depression, was provided to the relevant groups before the consultation. The primary outcome was PHQ-9-measured depression severity at 6 months after randomisation. An intention-to-treat analysis was conducted for patients who had at least one follow-up visit. This study is registered at ClinicalTrials.gov (NCT03988985) and is complete.
Between July 17, 2019, and Jan 31, 2022, 25 279 patients were approached for eligibility screening, 17 150 were excluded, and 8129 patients completed screening, of whom 1030 (12·7%) screened positive for depression. 344 patients were randomly assigned to receive no feedback, 344 were assigned to receive GP-targeted feedback, and 339 were assigned to receive GP-targeted plus patient-targeted feedback. 252 (73%) patients in the no feedback group, 252 (73%) in the GP-targeted feedback group, and 256 (76%) in the GP-targeted and patient-targeted feedback group were included in the analysis of the primary outcome at 6 months, which reflected a follow-up rate of 74%. Gender was reported as female by 637 (62·1%) of 1025 participants, male by 384 (37·5%), and diverse by four (0·4%). 169 (16%) of 1026 patients with available migration data had a migration background. Mean age was 39·5 years (SD 15·2). PHQ-9 scores improved for each group between baseline and 6 months by –4·15 (95% CI –4·99 to –3·30) in the no feedback group, –4·19 (–5·04 to –3·33) in the GP feedback group, and –4·91 (–5·76 to –4·07) in the GP plus patient feedback group, with no significant difference between the three groups (global p=0·13). The difference in PHQ-9 scores when comparing the GP plus patient feedback group with the no feedback group was –0·77 (–1·60 to 0·07, d=–0·16) and when comparing with the GP-only feedback group was –0·73 (–1·56 to 0·11, d=–0·15). No increase in suicidality was observed as an adverse event in either group.
Providing targeted feedback to patients and GPs after depression screening does not significantly reduce depression severity compared with GP feedback alone or no feedback. Further research is required to investigate the potential specific effectiveness of depression screening with systematic feedback for selected subgroups.
German Innovation Fund.
For the German translation of the abstract see Supplementary Materials section.
Association between commuting time and depressive symptoms in 5th Korean Working Conditions Survey
2024, Journal of Transport and Health
Commuting is an essential activity for workers; however, its potentially harmful effects on depression are yet to be determined. This study explored the possible associations between the length of commuting time and depressive symptoms in South Korea, alleged to have the longest average commuting time and the highest depressive symptoms among OECD countries.
We used the Korean Working Conditions Survey, a nationally representative cross-sectional survey of 23,415 selected wage workers aged between 20 and 59 years. Patients with a World Health Organization Five Well-Being Index total score < 13 were defined as having depressive symptoms. Associations among commuting time, depressive symptoms, and covariates such as sex, age, education, income, region, marital status, children, occupation, weekly working hours, and shift work were examined.
When compared with a short commuting time (< 30 min), a long commuting time (≥ 60 min) was associated with depressive symptoms [odds ratio = 1.16; 95% confidence interval = 1.04–1.29]. Significant associations between long commuting time and depressive symptoms were observed in males 40–49 years and females 20–29 years. Long commuting times were also significantly associated with depressive symptoms when stratified by factors such as low income (male and female), white-collar jobs (male), working 40 h per week (male), without (male) or with (female) shift work, being unmarried (male), and having no (male) or ≥ two children (female).
This study demonstrated differential associations between commuting time and depressive symptoms based on sociodemographic features such as sex, age, and income. Various socio-economic conditions influence commuters' mental health. Tailored approaches suited to these features are needed to mitigate the influence of commuting time on depressive symptoms.
Psychometric properties of the Psy-flex scale: A validation study in a community sample in Korea
2023, Journal of Contextual Behavioral Science
This study examined the psychometric properties of the recently developed six-item Psy-Flex among community samples in Korea. Using a cross-sectional design, data were collected from 1059 participants. Three bilingual experts translated the scale to ensure content validity. Factor analysis was employed to confirm the factor structure, and a polytomous item response theory model was used to examine the individual items and entire scale. The theory-based single-factor structure was confirmed using a Korean community sample, and measurement invariances were found across sex and age groups. In addition, the scale showed a moderate relationship with established measures of interest. Furthermore, item and categorical functioning were investigated using the polytomous Item Response Theory (IRT) model (i.e., the Generalized Partial Credit Model [GPCM]), identifying well-functioning (e.g., items 4, 5, and 6) and suboptimal (e.g., item 2) items. Additionally, the results suggest that the participants might not be able to meaningfully differentiate the original five-response categories. To the best of our knowledge, this is the first attempt to validate the Psy-Flex in Korea. We believe that the findings are of considerable value in facilitating our understanding of the scale and, more broadly, the construct of psychological flexibility.
12-month follow-up of intensive outpatient treatment for PTSD combining prolonged exposure therapy, EMDR and physical activity
2024, BMC Psychiatry
Evaluating the development and well-being assessment (DAWBA) in pediatric anxiety and depression
2024, Child and Adolescent Psychiatry and Mental Health
Addressing help-seeking, stigma and risk factors for suicidality in secondary schools: short-term and mid-term effects of the HEYLiFE suicide prevention program in a randomized controlled trial
2024, BMC Public Health
