Modern medicine has had a considerable impact on mortality rates for serious illness. Many chronic diseases which have previously been associated with an increased mortality now have survival rates approaching those of the background population. However, chronic diseases such as cancer, chronic pain syndromes, and chronic inflammatory conditions impose a considerable burden on families, the health care system, and society. Health related quality of life (HRQOL) is a concept that has developed from the need to estimate the impact of such chronic diseases. HRQOL measurement is a conceptual framework which attempts to predict daily function and well being based on subjective attitudes and experiences of physical, social, and emotional health. It has been evaluated predominantly from the patient's viewpoint as proxy respondents appear to underestimate the full effect of chronic illness on functional status. Measuring HRQOL in clinical research is most frequently undertaken using multi-item questionnaires to estimate daily function. Factors which affect HRQOL can be broadly classed as disease related and disease independent. The use of different assessment techniques permits comparisons between and within disorders. Generic and disease specific instruments used together enhance the ability to direct treatment for individuals and patient populations. Psychometrically sound questionnaires must be used. However, the type of instrument and research methods adopted depend on the question of interest. We have attempted to catalogue and critically assess the disease specific instruments used in the assessment of chronic gastrointestinal disease.
Chronic gastrointestinal disorders (GID) such as gastro-oesophageal reflux disease (GORD), non-ulcer dyspepsia (NUD), irritable bowel syndrome (IBS), and inflammatory bowel disease (IBD) have mortality rates similar to the general population. Hospitalisation and surgical rates for these disorders are easily predicted by disease severity while daily functioning, well being, and life satisfaction, important features of HRQOL, are better predictors of ambulatory health services used.1 Direct costs in Canada for chronic GID were $3.32 billion in 1997, fourth after cardiovascular, respiratory, and mental disorders.2 HRQOL assessment thus provides an important yardstick to assess these conditions by promoting patient involvement in management, fuller measurement of disease impact, and implementation of the most cost effective strategies.
The number of publications in gastroenterology claiming to address quality of life (QOL) and HRQOL has increased dramatically in recent decades, as shown in fig 1. However, most reports merely pander to the sensitive new age approach to chronic illness and do not truly evaluate HRQOL. We have therefore attempted to catalogue and critically evaluate the published HRQOL instruments pertaining to gastrointestinal diseases, particularly addressing their psychometric properties and clinical applications.
Health related quality of life
HEALTH RELATED QUALITY OF LIFE: A WORKING DEFINITION
HRQOL is a concept which reflects the physical, social, and emotional attitudes and behaviours of an individual as they relate to their prior and current health state.3 HRQOL assessment describes health status from the patients' perspective and serves as a powerful tool to assess and explain disease outcomes.4 For example, two patients with ulcerative colitis (UC) may well have identical disease extent, severity, and medical therapy, yet one may hold a full time job with a vigorous social and family life while the other is unemployed, depressed, and receiving a disability pension. The functional domains that comprise HRQOL are outlined in table 1. Physical symptoms for a particular GID are more likely to be disease dependent, while the psychological and social effects are disease independent and are better predicted by cognitive function, knowledge, socioeconomic status, education, personality, coping strategies, social support network, culture, beliefs, and so on.5
APPLYING HRQOL MEASUREMENT
HRQOL measurement is important to patients, clinicians, researchers, and policy makers. Potential applications include identification of the problems of individuals or populations, assessment of quality of health care delivery, enhancement of disease related knowledge, and measurement of treatment efficacy or disease outcome.6 HRQOL assessment is also a critical component of pharmacoeconomic evaluation.
The development and full psychometric testing of a new HRQOL instrument generally takes several years to complete. Excellent review articles4-7 have addressed the detailed methodological process, which we will briefly summarise.
The three main types of HRQOL instrument are global, generic, and disease specific and the benefits of each are shown in table 2.8 The global assessment measures a single attribute using a visual analogue or graded scale to summarise overall function. For example, 80% of patients have “good” HRQOL. These assessments, although easy to perform, do not identify specific areas of dysfunction.3 Generic instruments are multi-item questionnaires addressing various aspects of health and well being and have been derived in the general population, which includes both healthy subjects and people with acute or chronic illnesses. They are the most likely to detect an unexpected disease impact but may be unable to quantify clinically important dysfunction or change in function.4 For example, a generic instrument will not address abdominal pain, urgency, or fear of leaving the house, problems experienced by many IBS patients, but does emphasise mobility and grooming, which are not common IBS problems. Until recently, generic assessments have represented the predominant method of measuring HRQOL in GID. Instruments such as the sickness impact profile (SIP),9 psychological general well being (PGWB) scale,10 and short form 36 (SF-36)11 are the most commonly used and allow a direct comparison between individuals or populations with different diseases. Several, together with their psychometric properties, are listed in table 2. Disease specific instruments are designed for patients with a particular disease to identify the most relevant problems. Such instruments are generally more sensitive to patient concerns and changes in health status.4 The major disadvantages are that no specific instrument is available for many disorders and that some unanticipated problems may be easily overlooked. To optimise HRQOL assessment, many studies now use both generic and disease specific instruments.
The important steps to develop and psychometrically test a HRQOL instrument are outlined in tables 3 and 4.4 7 12 13 We will focus primarily on disease specific instruments but highlight a few important studies that have used generic instruments.
To identify all disease specific HRQOL measures used in gastrointestinal (GI) or liver disease, a thorough MEDLINE search from 1966 to September 1999 of fully published articles in English using the search terms “quality of life”, “liver disease”, and “gastrointestinal disease” was performed. Reference lists of relevant citations were also reviewed to ensure complete retrieval. Studies combining previously validated questionnaires were not considered as separate instruments.
The gastrointestinal quality of life index (GIQLI) was developed by Eypasch and colleagues to measure HRQOL in multiple GIDs.14 The questionnaire contains up to 36 items, scored on a five point Likert scale (range 0–144, higher score=better QOL), in which additional modules, specified by the particular GID, supplement a set of core questions. Construct validity was supported by demonstrating a reasonable correlation with the Spitzer quality of life index (r=0.53) and the Bradburn affect balance scale (r=0.42) in 204 German patients with a variety of GI illnesses. Patients with the most severe GID had a mean GIQLI score of 45 (14.8) compared with healthy controls who had a mean score of 125.8 (13). The GIQLI also discriminated well between patient groups when stratified by illness severity. The test-retest reliability was excellent (intraclass correlation coefficient (ICC) 0.92), as was internal consistency (Cronbach's alpha >0.90). In 194 patients who underwent laparoscopic cholecystectomy for biliary colic, a significant improvement (responsiveness) was observed from a mean score preoperatively of 87.3 (17.3) to 111.7 (14.6) six weeks postoperatively (p<0.001), although changes in specific subscores were not reported. The concept of a modular questionnaire, similar to combining disease specific and generic instruments, holds promise if it is shown to be psychometrically robust in other GIDs.
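Internal consistency figures such as the Cronbach's alpha quoted above are derived from item level responses. The following is a minimal sketch of the standard alpha formula, not the procedure used by the GIQLI authors; the function name `cronbach_alpha` and the response data are hypothetical and are included only to show the arithmetic.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Variance of each item (column) across respondents.
    item_variance_sum = sum(pvariance(column) for column in zip(*responses))
    # Variance of each respondent's summed score.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: five respondents, three items coded 0-4.
sample = [
    [4, 4, 3],
    [2, 2, 2],
    [3, 4, 3],
    [1, 0, 1],
    [0, 1, 0],
]
print(round(cronbach_alpha(sample), 2))
```

Values approaching 1 indicate that the items vary together across respondents, which is what a figure above 0.90 conveys for a multi-item scale.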
HRQOL in gastro-oesophageal reflux disease
Symptoms of GORD occur monthly in approximately 40% and daily in 7% of the adult population.15 Twenty four per cent of sufferers will consult a physician, often fearing a serious condition such as cancer.16 Specific symptoms, such as heartburn, regurgitation, or chest pain, substantially impair HRQOL and over half of patients require intermittent or continuous therapy.17 McDougall et al assessed long term HRQOL in GORD using a postal survey.11 After 10 years, 70% of 101 respondents reported persistent symptoms or the need for ongoing therapy. The mean SF-36 physical function subscore was significantly worse in GORD patients than in the general population (65.4 v 79.7; p=0.038) but was similar to that of patients with acute myocardial infarction (69.7). The mean social function was even lower for GORD than for congestive heart failure (71.3)18 and was significantly impaired compared with the general population (62.5 v 83.3; p<0.001). These results suggest that patients with GORD feel as seriously affected as do patients with important cardiovascular disease.
Harris and colleagues used decision modelling to compare three medical strategies for preventing recurrence of erosive oesophagitis.19 They determined that the degree of QOL impairment could be used to select the optimum therapy; that subjects with poor QOL could be treated more cost effectively with an initial proton pump inhibitor and those with less impaired QOL should receive a H2 receptor antagonist first. Such findings, using generic measures, can be greatly complemented by applying disease specific instruments. To date, five disease specific HRQOL instruments for GORD have been published and are shown in table 5.
The gastrointestinal symptom rating scale (GSRS) was developed by Svedlund et al in 1988 to discriminate between several GIDs.20 Items were selected primarily from IBS and peptic ulcer disease (PUD) symptoms, using clinical experience and a literature review. Initial validation was performed for a physician administered 15 item questionnaire, with items such as epigastric pain, heartburn, and eructation scored on a four point Likert scale. A subsequent self-administered version, using a seven point Likert scale, was shown to have good internal consistency, and factor analysis identified five important domains: abdominal pain syndrome, reflux syndrome, indigestion syndrome, diarrhoea syndrome, and constipation syndrome.21 In a mixed patient population, the GSRS discriminated well between patients with PUD, oesophagitis, and a normal endoscopy on all domains (p<0.01) except the constipation syndrome, with the most marked difference being noted in the reflux syndrome (p<0.00001).21 Revicki et al recently undertook further validation and responsiveness testing in 516 GORD patients before and six weeks after administration of ranitidine 150 mg twice daily.22 They observed significant correlations between subscores of the GSRS, SF-36, and PGWB index (r =−0.43 to −0.21; p<0.0001). Mean subscores in all five domains significantly discriminated between responders and non-responders (2.79 v 3.24, respectively; p<0.0001). The greatest improvement occurred in the reflux domain, with therapy producing a mean decrease in score of 1.23 in responders and 0.46 in non-responders (p<0.0001). This identified a clinically important score change of approximately 1.0 and suggested the reflux subscore as the most important for GORD.
Galmiche et al used the GSRS as an outcome in a double blind trial of omeprazole 10 mg or 20 mg daily versus cisapride 10 mg four times daily in 424 patients with mild GORD.23 The global GSRS score improved in all treatment groups while the reflux domain improved significantly in the omeprazole 20 mg group compared with the cisapride group (−1.50 v −0.98; p=0.001). In a similar trial, Havelund et al compared omeprazole 10 mg or 20 mg daily with placebo in 408 endoscopically normal GORD patients.24 After four weeks, the reflux dimension improved significantly in both omeprazole groups (p=0.003 for 10 mg, p=0.0001 for 20 mg) as well as in the omeprazole 20 mg compared with the 10 mg group (p=0.04). These data provide further evidence that the GSRS, particularly the reflux domain, can measure clinically important changes in HRQOL.
Locke et al focused on GORD related symptoms from a general bowel questionnaire25 adding the medical outcomes study (MOS) short form 20 (SF-20) to produce the gastro-oesophageal reflux questionnaire (GORQ).26 The final 76 item instrument had acceptable test-retest reliability (kappa 0.70) but the authors have not yet fully examined the validity or responsiveness, thereby limiting the current usefulness of this instrument.
A third GORD specific instrument, the gastro-oesophageal reflux disease health related quality of life scale (GORD-HRQL), was developed by Velanovich and colleagues.27 This 10 item questionnaire was drafted using clinician opinion (face validity), scored using a six point Likert scale, and administered to 72 patients with severe GORD before and after medical or surgical therapy. The GORD-HRQL score discriminated well between individuals based on their satisfaction with current symptoms (median score 26 in the unsatisfied v 5 in the satisfied group; p<0.0001). Surgical patients were more greatly improved than medical patients (median improvement 27.5 v 11, respectively; p=0.002). However, the scores correlated poorly with pretreatment 24 hour pH testing (r=0.09; p=0.7), lower oesophageal sphincter pressures (r= −0.21; p=0.24), and the SF-36 and subscores.28 Although scores correlated moderately with the endoscopic oesophagitis grade (r=0.53; p<0.001),29 further assessment is clearly needed before it can be recommended for clinical research.
A fourth disease specific instrument, the heartburn quality of life (HBQOL), was developed by Young and colleagues.30 Validation of the 15 items against the SF-36 was undertaken but raw data supporting a claim of moderate correlation were not provided. A 12 item version with six domains was later used in a randomised trial.31 Dimensional scores were significantly better than placebo in patients given ranitidine 150 mg twice daily for six weeks for heartburn pain (72.4 v 62.8; p<0.001), sleep (87.6 v 80.8; p<0.001), diet (83.7 v 76.0; p<0.001), and mental health (73.8 v 67.2; p<0.001). Unfortunately, the HBQOL was not administered before treatment, thereby precluding full responsiveness assessment. This questionnaire will require further psychometric testing.
The final GORD specific HRQOL instrument is the quality of life in reflux and dyspepsia (QOLRAD), a 25 item questionnaire, with each item scored on a seven point Likert scale, and five subscores.32 Items were generated using “focus groups” of patients with GORD or NUD and were then tested in 759 patients referred for endoscopy in five countries. Construct validity was supported by its correlation with almost all domains of the SF-36 (r=0.44–0.71), GSRS (r=0.29–0.63), and severity (r=−0.31 to −0.38) or frequency of symptoms (r= −0.27 to −0.34), as judged by clinicians. QOLRAD scores also significantly discriminated between patients who did or did not use concomitant sedatives for anxiety (mean emotional scores 3.4 v 4.2, respectively). Responsiveness of the QOLRAD has not yet been determined.
Disease specific instruments can therefore discriminate GORD from other disorders, can stratify patients by severity, and are useful as outcomes in clinical trials and decision modelling. Overall, the GSRS has been the most extensively evaluated of the GORD instruments and has favourable psychometric properties, making it more attractive currently than the other questionnaires.
Functional dyspepsia, or NUD, occurs in approximately 25% of the general population.33 Despite normal investigations, subjects experience considerable anxiety and demonstrate health care seeking behaviour.34 Patients with NUD describe abdominal pain, interruption of daily activities,35 and decreased sexual drive.36 An important barrier to dyspepsia research has been the difficulty in quantifying the severity of the subjective complaints,37 which has led to the development of several disease specific instruments, shown in table 6.
An Italian group, led by Bamfi, developed the quality of life in peptic disease questionnaire (QPD).38 Items were generated by patients with confirmed PUD, oesophagitis, or NUD. A 30 item questionnaire was then administered to several patient groups and validation by factor analysis demonstrated three domains: anxiety induced by pain, social restrictions, and symptom perception. Low to moderate correlations were observed with all domains of the SF-36 (r=0.26–0.60) (construct validity). Responsiveness to change was shown by a significant improvement in the total score (mean change 11.5; p=0.001) and dimensional scores (mean change 2.8–4.9; p=0.001) four weeks after Helicobacter pylori eradication. Cross cultural adaptation in non-Italian patients has not yet been reported.
The functional digestive disorders quality of life questionnaire (FDDQL), developed by Chassany et al to measure QOL in patients with functional dyspepsia or IBS, has been assessed in French, German, and English patients with dyspepsia.39 Seventy four items were later reduced to 43 and scored using a five point Likert scale within eight domains. The FDDQL discriminated well among patients with different degrees of handicap as assessed by the investigators. This was most marked for the mean daily activity score (80 in patients with no handicap v 36 for extreme handicap; p<0.05). Construct validity of the FDDQL was supported by significant correlations between its subscores and those of the SF-36. The correlation was strongest between the daily activity score and both the SF-36 physical role limitation and bodily pain subscores (r=0.63, p<0.0001). The FDDQL is currently being evaluated to determine its ability to detect change.
Martin et al developed the quality of life in duodenal ulcer patients (QLDUP) by combining the SF-36, PGWB index, and 13 disease specific items derived from patient and clinician interviews.40 The 54 item instrument with 15 dimensions was administered to French patients with acute duodenal ulcer (DU), a prior history of DU, or NUD, and showed good internal consistency (ICC >0.70) and test-retest reliability (Spearman's coefficient 0.73). Validity was claimed by identifying significant differences in scores between groups. However, the data to support this assertion were not provided. A subsequent trial by Rampal et al in 581 patients with a recently healed DU compared maintenance nizatidine (150 mg/day) with intermittent nizatidine therapy (300 mg/day as needed).41 Patients receiving daily maintenance therapy had significantly better HRQOL compared with the intermittent treatment group in seven of the 15 dimensions at one year follow up (p<0.05). Although these studies support the construct validity of the QLDUP, responsiveness and assessment in other languages are lacking at this time.
A short eight item questionnaire using a five point response scale, developed by Veldhuyzen van Zanten et al, was pilot tested in 10 patients with NUD and 14 with H pylori associated gastritis (HPAG).42 It was then administered to 55 patients with NUD or HPAG before and four weeks after antacid or H pylori eradication therapy, respectively. The instrument was responsive to change for both NUD (mean change −2.7; p=0.003) and HPAG (mean change −3.6; p=0.002) showing a significant improvement in scores, which correlated with the patient's self-reported global response (p<0.0001).
The QOLRAD, discussed above, has also been validated in dyspeptic patients (table 6).
Each of the six disease specific HRQOL questionnaires for dyspepsia has undergone some psychometric evaluation supporting both validity and responsiveness. However, none has been satisfactorily assessed to warrant a recommendation for preferred use.
Irritable bowel syndrome
IBS is characterised by abdominal pain, altered bowel habit, and disturbed sensory and motor function with normal bowel morphology.43 The prevalence ranges from 6.6% to 21.6% of the general population44 and results in approximately 3.5 million physician visits and 2.9 million prescriptions annually in the USA.45 Whitehead et al have shown that IBS patients have significantly poorer SF-36 scores than healthy controls (general health 62.3 v 85.6; p<0.001).46 These patients have difficulty travelling, participating in sports, and attending social gatherings. Extraintestinal symptoms, such as back pain, headache, dyspareunia, urinary symptoms, and sleep disturbance are also more frequent in IBS patients than in healthy controls.47 These symptoms result in work absenteeism, job changes, and premature termination of employment.48 The lack of objective parameters to assess health status has prompted several groups to develop disease specific measures of HRQOL for IBS, as shown in table 7.
The first, the IBSQOL, was developed at UCLA by Hahn and colleagues.49 Each of 30 items is scored on a five or six point Likert scale and summed in nine subscores. The IBSQOL discriminated well between a control group with non-IBS GI disorders and unselected patients with IBS. A later study showed that the IBSQOL could also discriminate between IBS patients with different disease severity.50 However, no data regarding the construct validity or responsiveness have been published.
The IBS-QOL, a 34 item instrument developed by Patrick et al, was reviewed by European gastroenterologists in Britain, Germany, Italy, and France during the item reduction phase to ensure cross cultural validity.51 A cross sectional survey of 169 patients with IBS demonstrated moderate construct validity with the SF-36 (r=0.30–0.44), PGWB (r=0.31–0.45), and the symptom check list (SCL90-R) (r =−0.27 to −0.46). The IBS-QOL discriminated well between patients with mild and high symptom frequency (mean score 69.7 v 55.0; p<0.0001) and symptom severity (mild v high, 72.2 v 53.8; p<0.0001). It could also discriminate between patients based on frequency of physician visits in the preceding six months (mean score 53.0 for greater or 65.6 for fewer; p < 0.05) and by the number of work days missed in the previous year (mean score 68.9 for 0 days missed v 54.6 for ⩾6 days missed; p<0.05). Eight different domains were identified by factor analysis. The IBS-QOL had excellent test-retest reliability and internal consistency. However, this study did not assess the responsiveness to change of the IBS-QOL.
A third IBS specific instrument, the irritable bowel syndrome questionnaire (IBSQ), has been developed by Wong and colleagues.52 This 26 item questionnaire is scored using a seven point Likert scale. Item selection was performed using patient and caregiver interviews, literature review, and questions generated in the development of a disease specific HRQOL index for IBD. Item reduction was undertaken in 100 patients. Validation using factor analysis defined four domains: bowel symptoms, fatigue, activity limitations, and emotional function. No data on construct validity, reliability, or responsiveness have been reported.
The FDDQL, described with dyspepsia, was also developed for patients with IBS. However, no validation data for IBS have been published to date.
These four disease specific instruments may prove to be useful for measuring outcomes in IBS. Currently, the IBS-QOL has been the most extensively validated, although none of the instruments has yet been tested for responsiveness. Further experience with these instruments in clinical trials or natural history studies will better define their psychometric properties, allowing researchers to determine the most appropriate tool for their immediate purposes.
Inflammatory bowel disease
IBD has been extensively evaluated in the HRQOL literature. Although many reports are purely observational, there is considerable evidence, using generic and disease specific instruments, that IBD patients have impaired HRQOL compared with healthy controls in physical, social, and emotional function, that HRQOL worsens with more severe disease, and that patients with Crohn's disease (CD) generally have poorer HRQOL than do patients with UC.12 53-56 Family members and physicians underestimate dysfunction compared with patients themselves.55 Nevertheless, 80% or more are able to work and maintain meaningful lives.57 58 Non-disease related features such as sex (females), age (older), smoking status (smokers with CD), socioeconomic group (poorer), type of treatment, type of instrument used, as well as the individual's life experiences and personality also predict overall function.53 The most prevalent problems experienced by IBD patients are loose or frequent stools, abdominal pain, worries about subsequent disease flares, cancer or the need for surgery, and social restriction such as not eating in restaurants or avoiding sports.54 56 Some subjects shun personal relationships and sexual dysfunction is a particular problem for patients with perianal CD or post-colectomy.53 The chronicity as well as the sometimes disparate relationship of functional status to inflammatory markers makes IBD well suited to HRQOL measurement. The questionnaires currently available are shown in table 8.
The first published and most extensively validated is the inflammatory bowel disease questionnaire (IBDQ)55 which was developed as a clinical trial outcome measurement. This 32 item questionnaire has four domains with each item scored on a seven point Likert scale (score range 32–224) with a higher score indicating better HRQOL. During the preliminary psychometric assessment patients were more likely to report social and emotional impairment, such as anger or embarrassment, when verbal or written cues were used to elicit these problems. Subsequent studies have shown that mean IBDQ scores in patients during disease exacerbation or remission (total and subscores) are comparable among different study populations (convergent validity) and that mean scores correlate well with disease severity. The IBDQ is responsive to change in disease status for both UC and CD patients and works well whether self-administered or interviewer administered.59 The sensitivity of the IBDQ in detecting change was superior to that of a generic measure (Rand physical and emotional function), indicating that it is disease specific.55 The IBDQ was fully psychometrically assessed during a clinical trial of cyclosporin in 305 patients with CD12 and discriminated significantly between patient groups based on disease activity. Construct validity was supported by a strong correlation of the IBDQ bowel function with the CDAI (r=−0.71; p<0.001). Importantly, the IBDQ worsened by a mean of 32 in patients who deteriorated, and in all the subscores, compared with 16 in those who remained stable. The IBDQ has proved to be an excellent outcome measure in numerous clinical trials.60-62 A 10 item version, the short IBDQ, has been validated to facilitate its use in clinical practice.63 Some cross cultural validation has been undertaken but has not yet been extensively published.64-66 Correct harmonisation of different translations has not been undertaken.
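The score range quoted for a summed Likert instrument follows directly from the number of items and the coding of the response scale. The sketch below shows that arithmetic; the function name `summed_score_range` is hypothetical, and the item codings (1 to 7 for the IBDQ, 0 to 4 for the GIQLI) are assumptions inferred from the ranges reported in the text rather than taken from the instruments themselves.

```python
def summed_score_range(n_items, lowest_code, highest_code):
    """Return the (minimum, maximum) possible total for a summed Likert scale."""
    if n_items < 1:
        raise ValueError("an instrument needs at least one item")
    if lowest_code > highest_code:
        raise ValueError("lowest_code must not exceed highest_code")
    return n_items * lowest_code, n_items * highest_code


# IBDQ: 32 items on a seven point scale, assumed to be coded 1-7.
print(summed_score_range(32, 1, 7))   # (32, 224)

# GIQLI: up to 36 items on a five point scale, assumed to be coded 0-4.
print(summed_score_range(36, 0, 4))   # (0, 144)
```

Both results match the ranges reported for these instruments, which is consistent with the assumed codings.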
Several groups have added or modified questions, scaling, or wording of IBDQ items without necessarily following the critical development guidelines or repeating the validation process.67-69 In general, results have been similar to those of the original questionnaire.
In 1991 Drossman and colleagues published the rating form of inflammatory bowel disease patient concerns (RFIPC)56 which was developed to discriminate IBD from other intestinal disorders and predict disease outcome. This self-administered questionnaire has four subscores and 25 items rated on a visual analogue scale from 0 to 100. The instrument was tested in 991 IBD patients with scores being significantly worse in patients with CD compared with UC, partly due to disease severity (discriminative validity).1 Also, the RFIPC demonstrated construct validity by showing significant correlations (r = 0.4) with the sickness impact profile (SIP), SCL-90, and patients' self reports. However, there has not been a clear association between disease severity and worries and concerns.70 71 Increased scores have been observed in patients who perceive a lack of disease information72 and greater emotional concerns have been noted in patients who have been referred for psychosocial counselling.73 Further work in UC patients before and after colectomy also demonstrated significant correlations between the RFIPC, SF-36 (r=−0.13 to −0.62) and SIP (r=0.43–0.53).74 Post-colectomy patients were significantly less worried than a US sample of Crohn's and colitis foundation members (median 16.6 v 38.3; p< 0.01). The RFIPC has also been validated in Swedish70 and French.71
The Cleveland clinic questionnaire was developed by Farmer and colleagues75 as a subset of 18 questions from a larger generic questionnaire measure. Items are scored on a five point Likert scale. Four domains were established with good test-retest and interrater reliability. Construct validity was demonstrated by significant correlations with the SIP (data not provided). The instrument discriminated between UC and CD (mean score 77.7 v 73.3, respectively; p=0.009) and also UC and CD patients who required surgical therapy (mean score 72.4 v 66.0, respectively). This instrument, however, has not been studied as an evaluative measure and cannot be recommended for use in clinical trials at this time.
Finally, Zbrozek and colleagues76 described a 12 item UC specific instrument, scored on a visual analogue scale, which combined seven generic questions used by Somerville and colleagues77 and five new items (face validity). The instrument was assessed in a clinical trial of mesalazine in 376 UC patients. Test-retest reliability was demonstrated by the absence of a significant change in scores in unchanged patients (p>0.05), although no ICC was calculated. Conversely, the instrument showed responsiveness in those patients who either improved or worsened (range of change −2.6 to −4.0 improved v −2.6 to 2.3 worsened; p=0.0001). Discriminative validity was demonstrated by showing a significant difference in scores when groups were stratified by physician assessed severity (p=0.0001). However, no formal construct validation was carried out. This measure may be useful in clinical trials; however, further study is needed to better delineate additional psychometric properties.
There are several disease specific HRQOL instruments for IBD in circulation that have been either partially or fully validated. At the present time, the IBDQ is the most extensively validated and can be recommended as an evaluative instrument for use in clinical trials. Questionnaires should be selected based on the intended application as well as psychometric properties. New instruments will likely be needed for other areas, such as sexual dysfunction in IBD, which may not be adequately assessed by currently available questionnaires.
Anorectal disorders, such as haemorrhoids, affect 4.4% of the population and account for 1.5 million prescriptions annually in the USA78 while 2.5 million British residents are at risk of anorectal infections.79 Pelvic floor and sphincter related dysfunction cause symptoms such as rectal pain, incontinence, and constipation. Although no disease specific measures are yet available to evaluate these disorders, several studies have assessed HRQOL using general instruments. Sailer et aladministered the GIQLI to 325 consecutive patients attending a proctology clinic and stratified subjects into nine groups based on the underlying disorder.80 Patients with anal fissure (mean score 104), constipation (mean score 94), or incontinence (mean score 93) had significantly lower GIQLI scores than healthy controls (mean score 121; p<0.0001). Those with constipation or faecal incontinence also had significantly poorer HRQOL than the other seven groups who suffered from disorders such as haemorrhoids and abscess.
Other investigators have examined adults with faecal incontinence following surgical therapy for childhood anorectal disorders. A Finnish group identified 26 subjects treated surgically for benign sacrococcygeal teratoma81 and 83 who had surgery for low anorectal malformations.82 Both groups had impaired bowel function with 27–39% of the respective cohorts reporting social problems due to impaired continence. In contrast, Moore et al found that 75% of 178 patients treated surgically for Hirschsprung's disease described excellent function whereas only 6% described persistent incontinence and resultant psychosocial problems.83 This apparent difference in social dysfunction between studies may have been due to a stricter definition of social problems in the latter study, although none of these studies adequately described the methods of QOL assessment.
Baeten et al prospectively studied the impact of anal dynamic graciloplasty in patients with faecal incontinence of various aetiologies.84 Using the Nottingham health profile, the state-trait anxiety inventory (STAI), and the Zung self-rating depression scale, significant improvements were observed in anxiety (median change in STAI score −6 in successful v +5 in unsuccessful; p=0.002) and social life (median score −2 successful v −1 unsuccessful; p=0.01) in patients deemed to have a clinical success. The same group used QOL assessment to suggest that dynamic graciloplasty was also cost effective for patients with faecal incontinence.85 The cost of dynamic graciloplasty (US$31 733) was higher than conservative therapy (US$12 180) but was associated with improved clinical success and QOL. Pescatori et al demonstrated a similar clinical response and improved anxiety scores in patients treated with transanal electrostimulation for faecal incontinence.86
One hundred and two consecutive patients with chronic constipation of diverse aetiology were assessed using the PGWB and GSRS by Glia and Lindberg.10 PGWB scores (mean score 85.5) were poorer than in a historical control group of healthy individuals (mean score 102.9). Moreover, patients with normal transit constipation (NTC) had a significantly lower mean PGWB score compared with those with slow transit constipation (STC) (NTC 82 v STC 94; p<0.05) and also scored significantly worse than STC patients on the anxiety, depression, well being, self-control, and health subscores of the PGWB (p<0.05 for all). The reason for these differences in QOL between NTC and STC patients was unclear but appeared to reflect greater symptom related concerns in the NTC group. QOL measures have also been used to evaluate treatment efficacy for chronic constipation. Ninety seven per cent of 74 patients who underwent colectomy and ileorectostomy for chronic constipation due to STC alone or with pelvic floor dysfunction were described as “satisfied” after surgery.87 However, the methods used to measure HRQOL were not clearly stated or related to other measures of bowel function.
These studies suggest that patients with severe constipation requiring surgery or those with problems of continence experience impaired QOL, particularly social and emotional dysfunction. Given that the mean age of the global population is increasing and these problems are more prevalent in elderly subjects, there is likely to be merit in developing disease specific instruments for anorectal disorders.
Malignancies of the digestive system constitute more than 20% of all newly diagnosed cancers88 and many are poorly responsive to therapy, necessitating palliative management. HRQOL, therefore, serves as an important measurement to assess the success or failure of these treatments. The European Organization for Research and Treatment of Cancer core quality of life questionnaire (EORTC QLQ-C30) was developed for this purpose.89 Problems identified by cancer patients, such as nausea, vomiting, pain, and fatigue were included. Psychometric evaluation showed good test-retest reliability (ICC 0.82–0.91).90 Later assessment in 98 patients with breast, ovarian, or colon cancer demonstrated fair to strong correlations with the functional living index-cancer (r=0.35–0.76) (construct validity).91 Although this instrument is specific for oncology patients, it is not specific for gastrointestinal malignancy. Modification and validation of the core questionnaire for specific GI malignancies will be required.
Generic QOL assessment has been performed in patients with oesophageal cancer. Retrospective92 and prospective93 observational studies using the EORTC questionnaire have noted improved QOL following oesophagectomy in patients treated for cure or palliation. Patients given epirubicin, cisplatin, and fluorouracil had better QLQ-C30 scores and survival rates than those treated with fluorouracil, doxorubicin, and methotrexate.94 Other generic QOL instruments, such as a dysphagia scale or the Karnofsky performance score, support the efficacy of both laser therapy95 and oesophageal stenting96 for palliation.
The shortcomings of applying a generic cancer instrument to evaluate QOL were illustrated by Blazeby et al in a cross sectional study of 59 oesophageal cancer patients who had been treated surgically or with palliation.97 The QLQ-C30 and a dysphagia scale both had significantly worse scores in the palliative than in the surgical group. However, the correlation between the two scores was poor (r=−0.24, p=0.18), suggesting that these instruments were measuring different features and that the QOL assessment should therefore include problems with eating and dysphagia. The same investigators therefore attempted to improve the specificity of the QLQ-C30 by adding a six domain, 24 item index that addressed the problems of dysphagia, pain, and deglutition.98 Validity and responsiveness of this new index have not yet been reported.
Many studies have used generic instruments to assess outcomes in gastric cancer patients, particularly after surgical therapy. The choice of a total gastrectomy (TG), subtotal gastrectomy (SG), or TG with a pouch reconstruction (TG+R) is somewhat controversial. Several groups have shown superior HRQOL in patients who underwent SG or TG+R compared with TG alone,99-102 while others have suggested that TG remains the treatment of choice.103 104 The most rigorous of these trials was performed by Svedlund et al who prospectively randomised 64 patients with gastric cancer eligible for curative surgery to TG (n=31), SG (n=13), or TG+R (n=20).105 QOL was assessed frequently during the five year follow up using eight different instruments, including the GSRS and SIP. Five year survival was approximately 50% in all groups. The GSRS indigestion and diarrhoea scores were significantly worse in the TG group compared with the SG and TG+R groups. However, SIP scores worsened in the SG group compared with the TG and TG+R groups throughout the follow up period. The authors concluded that QOL was important in planning surgery for gastric cancer patients and that TG+R should be considered in patients expected to survive long term who would derive the maximum QOL benefits from this procedure.
Troidl et al have drafted a questionnaire to evaluate the subjective feelings of patients with gastric cancer, such as ability to work or enjoy hobbies.106 No methods of item selection or validation were reported. Such an index might well be useful but cannot be considered without further study.
Hallbook et al recommended the application of a disease specific HRQOL index for patients with colorectal cancer, the most prevalent GI malignancy.107 They compared straight and colonic pouch anastomoses after rectal excision for cancer but failed to demonstrate a significant difference in QOL between the two procedures using the Nottingham health profile. They suggested that this generic instrument was inadequate to detect clinically relevant differences between the groups. An alternative explanation, however, is that there is little clinically important difference between the two operations.
Zaniboni et al in a randomised trial compared adjuvant 5-fluorouracil and folinic acid to surgical excision alone.108 They developed three disease specific QOL questionnaires based on a literature review and expert opinion. Eighteen common items were scored on a five point Likert scale and grouped into four domains (global quality of life, emotional well being, satisfaction with care, worry about the future). The final score was transformed to a scale from 0 to 100. The instrument had acceptable test-retest reliability (ICC 0.53–0.78) and excellent internal consistency (Cronbach's α 0.85–0.90). Patients receiving adjuvant chemotherapy had a 25% reduction in mortality and no significant difference in QOL score from the surgery only group, indicating that adjuvant chemotherapy was both effective and well tolerated. Further validation of this measure is required.
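The scoring and internal consistency steps described above can be illustrated with a minimal sketch. It assumes that the transformation to a 0 to 100 scale is a simple linear rescaling of the summed item responses, which the source does not state, and it uses the standard formula for Cronbach's α. The function names `rescale_to_100` and `cronbach_alpha` are hypothetical and are not taken from the original study.

```python
from statistics import variance


def rescale_to_100(raw_sum, n_items, low=1, high=5):
    """Linearly map a summed Likert score onto 0-100.

    Assumption: a linear transformation; the source only says the final
    score was "transformed to a scale from 0 to 100".
    """
    min_sum = n_items * low
    max_sum = n_items * high
    if not min_sum <= raw_sum <= max_sum:
        raise ValueError("raw_sum is outside the possible range")
    return 100.0 * (raw_sum - min_sum) / (max_sum - min_sum)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents (columns of the matrix).
    item_vars = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For an 18 item questionnaire scored from 1 to 5, `rescale_to_100(54, 18)` maps the midpoint of the raw range to 50. Values of α near 1 indicate that the items vary together, which is what the reported range of 0.85–0.90 reflects.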
Durand-Zaleski et al evaluated the cost effectiveness of three treatment modalities for patients with colorectal liver metastases.109 Hepatic arterial infusion had the highest median survival (486 days) compared with systemic chemotherapy (298 days) and symptom control alone (254 days), but was also the most costly (£18 243 v £6089 for systemic chemotherapy and £2136 for symptom control). When the results were re-examined by cost per quality of life year gained, hepatic arterial infusion and systemic chemotherapy were similar (£23 705 v £24 280, respectively).
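The general arithmetic behind a cost per life year comparison can be sketched as follows. This is an illustration only: the function name `incremental_cost_per_life_year` is hypothetical, the `utility` weight is an assumed parameter applied equally to both arms, and the use of median survival is a simplification. It is not the method used in the cited study and will not reproduce the published figures.

```python
def incremental_cost_per_life_year(cost_new, cost_ref, days_new, days_ref,
                                   utility=1.0, days_per_year=365.0):
    """Extra cost divided by extra (optionally quality weighted) life years.

    cost_new, cost_ref : cost of the new and the reference strategy
    days_new, days_ref : survival in days under each strategy
    utility            : hypothetical quality weight between 0 and 1,
                         applied to the survival gain (1.0 = unweighted)
    """
    gained_years = (days_new - days_ref) / days_per_year * utility
    if gained_years <= 0:
        raise ValueError("the new strategy gains no survival")
    return (cost_new - cost_ref) / gained_years


# Hepatic arterial infusion v symptom control, using the costs and
# median survival quoted in the text and no quality weighting.
hai_v_symptom_control = incremental_cost_per_life_year(18243, 2136, 486, 254)
```

Lowering `utility` below 1.0 raises the cost per year gained, which is why the quality weights chosen for each health state can change the ranking of treatments.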
Earlam et al studied 50 colorectal cancer patients with liver metastases and found a significant correlation between the baseline Rotterdam symptom checklist (RSC) score (physical and psychosocial domains) and survival (r=−0.6; p<0.04).110 They suggested that QOL could provide a better survival estimate than standard measures such as tumour size and that QOL should be considered as a surrogate end point for survival in some clinical trials.
Dominitz et al used the time trade off technique to estimate patient preferences for colorectal cancer screening111 by asking them to rank their current health state in relation to death (0) or perfect health (1.0). Sixty two unscreened patients, 24 about to undergo screening, 114 involved in a study assessing risk factors for colon cancer, and 46 with diagnosed colon cancer were interviewed. Unscreened patients were willing to give up significantly more potential survival in their current health state to avoid screening sigmoidoscopy (median 91 days of life) or colonoscopy (median 183 days) than the other patient groups (median 0 days for sigmoidoscopy and 0–7 days for colonoscopy). Unscreened subjects, who likely had limited knowledge of what screening involved, anticipated a more negative impact on HRQOL. Thus knowledge about disease has the potential to impact positively or negatively on HRQOL. A more detailed discussion of HRQOL in rectal cancer is available in a recent review.112
These studies demonstrate that HRQOL may be a surrogate end point for clinical trials in neoplastic disorders and may assist in selecting treatments based on efficacy, patient preference, and cost effectiveness.
Other gastrointestinal conditions
HRQOL has been assessed in several other chronic GID, including short bowel syndrome in patients receiving home parenteral nutrition (HPN),9 113 114 gastroparesis,115 116 achalasia,117 and chronic intestinal bleeding,118 and appears to be impaired in all of these disorders. Patients receiving HPN had significantly worse IBDQ scores (mean 5.0 v 5.6, respectively; p<0.05) and SIP scores (mean 17% v 8%, respectively (higher scores worse); p<0.001) than patients with short bowel who were not receiving HPN. HPN patients also had poorer SIP scores than patients with renal failure receiving dialysis therapy (17% v 13%, respectively), highlighting the severity of their illness.9 Richards and Irving demonstrated that the cost per quality adjusted life year (QALY) for an average HPN patient was £68 975 compared with £190 000 per QALY for providing parenteral nutrition in hospital, suggesting cost benefit potential for treating patients at home.119 Although disease specific measures are not yet available for most of these disorders, applying valid generic instruments can assist in patient management until disease specific measures are developed.
Chronic liver disease
Italian cohort and cross sectional studies suggest a high prevalence of chronic liver disease in the general population, between 17.5%120 and 26%.121 Despite the high prevalence, the development of a disease specific HRQOL measure for chronic liver disease has lagged behind developments in lumenal GIDs.7 Chronic hepatitis, cirrhosis, and cholestasis impair HRQOL due to symptoms such as fatigue, pruritus, and abdominal discomfort from ascites. Only one disease specific instrument is available for patients with chronic liver disease. Younossi et al recently developed the chronic liver disease questionnaire (CLDQ), designed to function as an outcome measure in clinical trials.122 Literature review, expert opinion, and patient focus groups were used to generate items. Item reduction was facilitated using impact (frequency and severity) scores derived from 75 patients, as well as factor analysis. The completed 29 item instrument was scored on a seven point Likert scale (possible range 29–203 from worst to best QOL). The construct validity of the CLDQ was supported by a strong correlation with patients' global rating scores (r=0.84; p=0.02). Poorer scores were noted in groups with increasingly severe Child's score (discriminant validity). The CLDQ had moderate test-retest reliability, yielding an ICC of 0.59 in patients who had reported no change in their health status after six months. A particular strength of the CLDQ is that all phases of the validation process included patients with both hepatocellular and cholestatic liver disease of varying severity (no cirrhosis to Child's C cirrhosis). This should allow the CLDQ wide application in hepatology research. The responsiveness of the CLDQ to change over time or after therapy has not yet been assessed.
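The stated scoring range follows directly from the instrument's structure: 29 items each scored from 1 to 7 give a minimum of 29 and a maximum of 203. A minimal sketch of such a total score is shown below. It assumes the total is the plain sum of item responses, as the quoted range implies; the function name `cldq_total` is hypothetical and the sketch does not cover any domain level scoring.

```python
N_ITEMS = 29      # number of items in the completed instrument
MIN_RESPONSE = 1  # lowest point on the seven point Likert scale
MAX_RESPONSE = 7  # highest point on the seven point Likert scale


def cldq_total(responses):
    """Sum 29 Likert responses into a total score (29 = worst, 203 = best).

    Assumption: the total is the unweighted sum of item responses.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(not MIN_RESPONSE <= r <= MAX_RESPONSE for r in responses):
        raise ValueError("each response must be between 1 and 7")
    return sum(responses)
```

Rejecting incomplete questionnaires, rather than summing whatever is present, avoids reporting an artificially low score for a patient who skipped items.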
Hepatitis C (HCV) infection is a common chronic illness which produces significant morbidity and mortality.123 124 A cohort study prospectively examined SF-36 scores in 72 HCV patients without cirrhosis, 30 patients with hepatitis B (HBV), and 17 healthy controls.125 All subscales of the SF-36 were significantly worse in the HCV group compared with healthy controls. Moreover, the HCV group had poorer QOL compared with HBV patients on all subscales with significantly worse scores in social functioning (mean score 65.5 v 81.7; p<0.01), physical role limitation (mean score 56.9 v 84.2; p<0.01), and energy and fatigue (mean score 48.3 v 62; p<0.01). These findings could not be attributed to differences in the degree of hepatic inflammation and suggest that the mechanism of impairment in HCV might differ from that in HBV.
Bonkovsky et al evaluated HRQOL as an outcome in a randomised trial of interferon alpha-2b (IFN) therapy in HCV patients.126 SF-36 scores improved markedly in responders compared with non-responders with statistically significant results in five of the eight subscales. Hunt et al also reported outcomes using the SF-36, hospital anxiety and depression scale, and Beck depression inventory in a cohort of 50 HCV patients treated with IFN.127 The frequency of anxiety decreased from 25% to 4% while on therapy. However, severe depression also significantly increased and then returned to baseline after cessation of IFN. SF-36 scores were similar between responders and non-responders, although responders significantly improved on the role emotional subscale. These studies support the usefulness of HRQOL to evaluate both the response to and adverse effects of IFN treatment and illustrate how HRQOL can complement standard disease outcome measures.
HRQOL assessment is particularly important for patients undergoing liver transplantation. Several groups have used generic QOL measures, such as the Nottingham health profile and the Karnofsky performance status scale. Liver transplantation improved HRQOL in all surviving patients one, two, and five years post-transplant with most patients returning to pre-transplant employment within two years.128 Similar results have been shown in cholestatic liver disease by Navasa and colleagues129 and Gross and colleagues.130
Formal HRQOL application has increased substantially in the assessment of chronic GID. However, there is wide variability in studies with respect to patient selection, disorders assessed, and methods of evaluation. HRQOL evaluation is part of the skilful clinician's armamentarium and is highly relevant to the patient. Clinicians and researchers use HRQOL to assess individual patient needs, disease outcomes, and to explain discrepancies between disease severity and functional capacity. Such information is then used to tailor management for a particular patient.
Patients must be active participants in their own management and compliance improves when their preferences are considered. The availability of robust disease specific QOL instruments is critical to permit better assessment of HRQOL in GIDs. Combining generic and disease specific instruments is recommended as it allows comparisons between diseases and within disease groups. Identification of patients who might benefit from specific treatments such as aggressive therapy, counselling, or psychosocial interventions can also be done using HRQOL assessment. Using new technologies, such as pen based electronic questionnaires,131 may simplify the administration and scoring of HRQOL instruments, making them more accessible to patients and physicians. Health policy analysts require meaningful data to make informed decisions about providing effective and efficient screening, diagnostic, and therapeutic programmes.
As with clinical trial outcomes and disease severity measurement, the development of many competing HRQOL indices is neither desirable nor helpful to current research. A more efficient approach would be to facilitate collaboration among investigators to develop new instruments where needed, or to refine and cross culturally validate existing ones. Only when this occurs will we work towards better understanding of the impact of chronic GI disorders on HRQOL.
- Abbreviations used in this paper:
- health related quality of life
- quality of life
- inflammatory bowel disease
- gastro-oesophageal reflux disease
- non-ulcer dyspepsia
- irritable bowel syndrome
- Crohn's disease activity index
- short form
- sickness impact profile
- psychological general well being
- gastrointestinal disorder
- gastrointestinal quality of life index
- intraclass correlation coefficient
- gastrointestinal symptom rating scale
- peptic ulcer disease
- gastro-oesophageal reflux questionnaire
- medical outcomes study
- heartburn quality of Life
- quality of life in peptic disease
- functional digestive disorders quality of life questionnaire
- duodenal ulcer
- quality of life in reflux and dyspepsia
- quality of life in duodenal ulcer patients
- Helicobacter pylori associated gastritis
- symptom checklist
- irritable bowel syndrome questionnaire
- rating form of inflammatory bowel disease patient concerns
- ulcerative colitis
- Crohn's disease
- inflammatory bowel disease questionnaire
- state-trait anxiety inventory
- normal transit constipation
- slow transit constipation
- EORTC QLQ-C30
- European Organisation for Research and Treatment of Cancer core quality of life questionnaire
- total gastrectomy
- subtotal gastrectomy
- total gastrectomy plus gastric reconstruction
- Rotterdam symptom checklist
- home parenteral nutrition
- quality adjusted life year
- chronic liver disease questionnaire
- hepatitis C virus
- hepatitis B virus