Quality of life assessment in gastro-oesophageal reflux disease
E J Irvine

Correspondence to: Professor E J Irvine, Department of Medicine, University of Toronto, Head, Division of Gastroenterology, St Michael’s Hospital, Room 16-052 Cardinal Carter Wing, 30 Bond St., Toronto, ON, M5B 1W8, Canada


Health related quality of life (HRQoL) is determined by both disease and non-disease related factors. In chronic illnesses, such as gastro-oesophageal reflux disease (GORD), daily function, HRQoL status, and health resource utilisation are critical outcomes. Patients suffering from GORD report many symptoms, such as heartburn or regurgitation, and health care seeking is driven by both symptom severity and the impact on HRQoL. Some individuals intuitively alter their lifestyles while others do not. Most who have bothersome symptoms desire effective medical or surgical treatment. Several descriptive studies have reported significant HRQoL impairment in GORD patients compared with the general population, similar to other chronic conditions, such as myocardial infarction. Disease severity correlates strongly with HRQoL and also contributes to work absenteeism and reduced productivity. Non-disease features, such as the presence of anxiety and comorbid conditions, also negatively impact on HRQoL. Several generic and disease specific HRQoL instruments have been applied in patients with GORD. In clinical trials, the psychological general well being index (PGWBI), SF-36, quality of life in reflux and dyspepsia (QOLRAD) scale, and gastrointestinal quality of life index (GIQLI) have assessed HRQoL as a secondary outcome in patients with GORD. The degree of validation and psychometric assessment of some questionnaires varies. Few have been sufficiently assessed to fully recommend their use. Selecting the most appropriate HRQoL instrument should be based on the research question and study design. However, psychometric robustness is critical to accurately interpret results. Combining a generic and disease specific instrument may avoid missing unexpected outcomes and ensure recognition of all clinically important changes. 
Effective GORD treatment rapidly improves HRQoL but long term outcomes are most important for this chronic health problem, and must be incorporated into economic evaluations.


Evaluating the impact of GORD has relied heavily in the past on scales assessing symptom severity, such as heartburn, regurgitation, or pain, together with the endoscopic appearance of the oesophageal mucosa. Laboratory markers such as anaemia are too insensitive for diagnosis, except in the most severe GORD cases, and pH studies, although useful in selected subjects, are generally not used to assess therapeutic response. Moreover, these assessment methods, even when used collectively, still fail to reflect the functional status of some patients. HRQoL has therefore emerged as a clinically relevant measure of disease impact and treatment response.

HRQoL can be defined as the functional effect of an illness and its therapy on an individual, as perceived by the individual himself or herself. The domains that determine HRQoL include physical and occupational function, emotional state, social interactions, and somatic sensation.1 These determinants can be further classified as disease related factors, including symptom severity, treatment efficacy, and adverse effects of treatment, or disease independent factors, such as sex or age, education and knowledge, personality and coping skills, culture, and beliefs.2 The relevant domains and items that measure HRQoL in patients with GORD are listed in table 1.

Table 1

The domains of health related quality of life in gastro-oesophageal reflux disease

Several decades ago, Engel proposed a biopsychosocial model of chronic disease that has been appropriately applied in gastrointestinal disorders such as GORD or irritable bowel syndrome.3 Briefly, the model acknowledges that an individual may have a genetic or biological predisposition to a condition, such as GORD, and that environmental or psychosocial factors can initiate or enhance alterations in gut motility and/or sensation, which interact through complex central, peripheral, or enteric nerve pathways to produce a symptom and response pattern. Individual experiences and beliefs may well alter these variables, so that two individuals with identical biological disease may differ widely in HRQoL, one functioning fully and the other completely disabled.


Evaluation of HRQoL is conventionally performed using questionnaires or surveys that can be scored quantitatively and address problems that occur as a consequence of the chronic illness, as well as features of psychosocial well being. The three types of instrument that measure HRQoL (reviewed by Guyatt and colleagues4) include global assessments (single descriptors), generic measures (multi-item profiles generated in a general population that can be applied to different diseases), and disease specific instruments (developed in and for patients affected by a common condition). Global assessments are simple to administer and provide a useful summary of function but fail to identify the important determinants of good or poor HRQoL. While generic measures, such as the SF-365 or PGWBI,6 allow comparisons of groups with different conditions or help identify unexpected HRQoL issues, it is the disease specific questionnaire that is most likely to detect small but clinically important risks or benefits.4,7,8 A considerable literature is available describing HRQoL assessment in gastrointestinal and liver disease (reviewed by Borgaonkar and Irvine9) and attributes and review criteria to select the most appropriate instrument (reviewed by the Scientific Advisory Committee of the Medical Outcomes Trust10).

Health status scales, such as HRQoL indices, can be scored by examining individual items such as heartburn, regurgitation, or pain, clusters of similar items in subscores (such as vitality or emotional function), or by summing all the items into a global or summary score.11 These different scoring methods allow identification of common problems of individuals, groups, or populations, allow comparisons at a single time point, or help measure change over time (natural history) or after therapy (clinical trials). HRQoL assessment thus provides a useful yardstick for patients attempting to improve their HRQoL, and for clinicians, researchers, or policy makers by helping identify the needs of individual patients, assessing the impact of therapy (in individuals or in clinical trials), and determining health policy. In general, improving the patient’s HRQoL is a common objective that can be effected using a variety of approaches. HRQoL assessment can also be applied in pharmacoeconomic analyses7 when comparing costs to achieve a particular outcome (such as remission or absence of symptoms), or costs per quality adjusted life year (QALY) gained. Utilities are patient generated preference “weights”, ranging from 0.0 (death) to 1.0 (full health), that are often derived in a reference population using instruments such as the time trade off12 or the health utilities index mark III (HUI).13 One QALY represents one full year in perfect health. Different utility instruments, like different HRQoL tools, may elicit different values in the same reference population but may well produce similar results when measuring changes over time or after treatment.
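As a rough illustration of the arithmetic behind these terms, QALYs can be obtained by weighting the time spent in each health state by its utility, and a cost per QALY gained by dividing the difference in cost between two strategies by the difference in QALYs. The Python sketch below makes several assumptions: the utilities, durations, and costs are invented for illustration, the function names are hypothetical, and no discounting of future years is applied.

```python
from typing import Sequence, Tuple


def qalys(states: Sequence[Tuple[float, float]]) -> float:
    """Sum utility-weighted time over (utility, years) pairs.

    Utilities run from 0.0 (death) to 1.0 (full health).
    """
    total = 0.0
    for utility, years in states:
        if not 0.0 <= utility <= 1.0:
            raise ValueError("utility must lie between 0.0 and 1.0")
        if years < 0:
            raise ValueError("years must be non-negative")
        total += utility * years
    return total


def cost_per_qaly_gained(cost_new: float, qalys_new: float,
                         cost_old: float, qalys_old: float) -> float:
    """Incremental cost divided by incremental QALYs."""
    gained = qalys_new - qalys_old
    if gained <= 0:
        raise ValueError("the new strategy must gain QALYs for this ratio")
    return (cost_new - cost_old) / gained


# Invented figures: two years at utility 0.75 versus two years at 0.875.
usual_care = qalys([(0.75, 2.0)])       # 1.5 QALYs
new_treatment = qalys([(0.875, 2.0)])   # 1.75 QALYs
ratio = cost_per_qaly_gained(1500.0, new_treatment, 1000.0, usual_care)
print(ratio)  # 2000.0 cost units per QALY gained
```

A full economic evaluation would normally also discount future costs and benefits and examine uncertainty in the utilities, neither of which is shown here.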

It is the responsibility of the research team to ensure that the HRQoL or utility instrument used for any given study clearly addresses the problems of the research question. Most importantly, it should have undergone a full psychometric assessment (with respect to validity, reliability, and responsiveness).4,7–15 “Validity” is a comparison of the new index score and a reference score (convergent validity), or a construct of what the new index is measuring, such as a prediction that patients with more severe disease will have poorer HRQoL scores (construct validity). “Reliability” is an assessment of the measurement error of scores (test–retest reliability) or the correlation among items or subscores (internal reliability). “Responsiveness” gives a signal to noise ratio and allows interpretation of what degree of change is clinically important. In selecting a questionnaire, it is important that it performs robustly in the population being sampled and for the study design planned. The results must be analysed according to the research methods and interpreted objectively, addressing any biases that may confound the results.
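Two of these properties can be illustrated numerically. The Python sketch below is not taken from any of the cited studies: it uses Cronbach's alpha as one widely used statistic for internal reliability, and expresses responsiveness as mean change in improved patients divided by the standard deviation of change in stable patients, which is one possible signal to noise formulation. Function names and data are hypothetical.

```python
from statistics import mean, stdev, variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Internal reliability; item_scores[i][j] is respondent i's score on item j."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


def responsiveness_ratio(change_improved: Sequence[float],
                         change_stable: Sequence[float]) -> float:
    """Signal (mean change when improved) over noise (SD of change when stable)."""
    noise = stdev(change_stable)
    if noise == 0:
        raise ValueError("stable group shows no variation in change scores")
    return mean(change_improved) / noise


# Invented data: three respondents, two perfectly concordant items.
alpha = cronbach_alpha([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
ratio = responsiveness_ratio([1.0, 2.0, 3.0], [-1.0, 0.0, 1.0])
print(alpha, ratio)  # 1.0 2.0
```

Real validation studies use far larger samples and usually report several complementary statistics rather than a single coefficient.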


Heartburn is the most common symptom, identified by 89% of GORD patients, and the most prevalent symptom required for eligibility in clinical trials.16 Regurgitation and heartburn are also assessed in the gastrointestinal symptom rating scale (GSRS).17 Other symptoms affecting at least 25% of subjects with GORD include belching, flatulence, acid taste, nausea, stomach gurgling, and bad breath.16 Symptoms are an important part of outcome and may be included as part of a disease specific instrument or separately as a disease severity index.

Few studies have assessed the relative importance of symptoms and other HRQoL problems in patients with GORD. Talley and colleagues,18 in assessing 984 patients entering two clinical trials with endoscopy negative reflux disease, observed that the most commonly reported areas of HRQoL dysfunction were eating and drinking (45–81%), sleep problems (39–49%), lack of vitality (41–58%), and poor emotional well being (45–55%). Most daily activities were impacted in approximately 20% of subjects, and as many as 44% avoided bending over because of heartburn.18 Somewhat different symptoms are reported by patients who have undergone laparoscopic or open surgery: flatulence (up to 40%), bloating and postprandial fullness (10–20%), and dysphagia (10%), with no apparent differences between the two types of operation.19 Patient dysfunction is also apparent in the workplace. In a prospective descriptive study of 136 Swedish workers attending their general practitioner for GORD, 30% reported decreased regular daily activities and 23% had reduced weekly work productivity, on average by 10 hours; absenteeism due to heartburn averaged 2.5 hours per week.20 These results suggest that even in community based patients, who might be expected to have mild disease, GORD takes a significant daily toll on HRQoL.

In a recent cross sectional population based study of 1149 Canadians surveyed for general health,21 subjects completed an electronic telephone administered SF-36 together with the Rome II questionnaire.22 A total of 254 (24%) fulfilled Rome II criteria for functional heartburn (heartburn without dysphagia). The respective mean physical (PCS) and mental component summary (MCS) scores, standardised to a mean of 50 and standard deviation of 10, were significantly worse (p<0.05) in subjects who did, versus those who did not, fulfil criteria for functional heartburn, with respective PCS scores of 47.1 versus 50.2 and MCS scores of 48.9 versus 51.2. These scores were significantly lower than population norms. In a long term study (10 years of follow up), McDougall and colleagues23 also demonstrated that SF-36 scores were worse than in the general population and similar to those of patients with conditions such as acute myocardial infarction (physical function) or congestive heart failure (social function), further confirming that GORD patients have poor HRQoL.
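Scores "standardised to a mean of 50 and standard deviation of 10" are norm based scores. The final linear rescaling can be sketched in Python as follows. This is an illustration only: the reference mean and standard deviation are invented, the function name is hypothetical, and the published procedure for deriving PCS and MCS from the eight SF-36 subscales involves further steps that are not shown.

```python
def norm_based_score(raw: float, reference_mean: float,
                     reference_sd: float) -> float:
    """Rescale a raw score so the reference population has mean 50 and SD 10."""
    if reference_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    z = (raw - reference_mean) / reference_sd  # distance from the norm in SD units
    return 50.0 + 10.0 * z


# Invented reference values: population mean 70, standard deviation 20.
print(norm_based_score(70.0, 70.0, 20.0))  # 50.0, exactly at the population norm
print(norm_based_score(60.0, 70.0, 20.0))  # 45.0, half an SD below the norm
```

On this scale, a group mean of 47 lies 0.3 standard deviations below the reference population mean.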

Dimenas et al, while deriving normal Swedish population values for the PGWBI, also examined the impact of symptom severity and other factors on HRQoL.24 As heartburn and regurgitation severity increased, HRQoL scores decreased. There were also significant differences between men and women (poorer PGWBI scores in women) and better scores in the oldest age group (aged 60–70 years) compared with younger patients. More recently, Farup and colleagues25 observed in a national US random telephone survey that subjects with frequent GORD (14% of those surveyed) or nocturnal GORD (10% of those surveyed) symptoms had SF-36 MCS and PCS scores that were significantly poorer than those of the US general population. In addition, mean summary scores were significantly lower in those with nocturnal symptoms compared with those with frequent, but without nocturnal, symptoms (PCS 38.9 v 41.5 (p<0.05); MCS 46.8 v 49.5 (p<0.05)). Nocturnal GORD subjects also scored significantly worse on all eight of the SF-36 subscales than the group with non-nocturnal GORD, and had more bodily pain than patients with hypertension or diabetes and a level similar to that of patients with angina or congestive heart failure. In another study assessing GORD severity, Eloubeidi and Provenzale demonstrated in 107 patients with Barrett’s oesophagus and 104 GORD patients without Barrett’s oesophagus that HRQoL was significantly predicted by heartburn frequency and severity.26 In that study, the number of associated comorbid conditions also predicted poorer HRQoL. A recent post hoc analysis of clinical trial data retrieved from the AstraZeneca database explored other non-disease related factors.27,28 Patients who reported greater anxiety before medical treatment with omeprazole or esomeprazole were less likely to respond to therapy than groups reporting little or no anxiety. These results support the hypothesis that both symptom severity and non-disease features, such as sex, age, comorbidity, and personality traits, contribute to HRQoL outcomes.

There are now at least three generic5,6,29 and six disease specific13,30–34 instruments that have been evaluated in patients with GORD (table 2). Some of these have been well evaluated while others have not. The two most extensively examined are the GSRS17 and the QOLRAD scale.33 These have been fully evaluated and demonstrated to be valid, reliable, and sensitive to change.34,35 None the less, the GSRS is primarily a symptom severity score rather than a true HRQoL instrument and should therefore be considered separately. Talley and colleagues36 undertook a full psychometric assessment of the QOLRAD using results obtained during two randomised controlled trials that examined the GSRS and QOLRAD in 984 patients. Of these subjects, 40–80% reported problems in eating/drinking, sleeping, vitality, or emotional well being. Within two weeks of treatment with esomeprazole, 20 or 40 mg daily, these problems had decreased to 7–19%, and there was an excellent correlation between symptom reduction and improved function. The minimum clinically important difference was identified to be approximately 0.5 points per item, with a large difference being approximately 1.5 points per item.
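The two per item thresholds quoted above can be applied to a patient's change in score as in the following Python sketch. The thresholds of 0.5 and 1.5 come from the text. The category labels, the function name, and the example scores are assumptions made for illustration, and the sketch classifies only the magnitude of change, leaving its direction to be read from the sign.

```python
from typing import Sequence, Tuple

MCID_PER_ITEM = 0.5    # minimum clinically important difference (from the text)
LARGE_PER_ITEM = 1.5   # large difference (from the text)


def classify_change(baseline: Sequence[float],
                    follow_up: Sequence[float]) -> Tuple[float, str]:
    """Return the mean change per item and a label for its magnitude."""
    if len(baseline) != len(follow_up) or not baseline:
        raise ValueError("need the same non-empty set of items at both visits")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    mean_change = sum(changes) / len(changes)
    size = abs(mean_change)
    if size < MCID_PER_ITEM:
        label = "below minimum important difference"
    elif size < LARGE_PER_ITEM:
        label = "clinically important"
    else:
        label = "large"
    return mean_change, label


# Invented item scores for one patient before and after treatment.
print(classify_change([4.0, 4.0], [4.25, 4.25]))  # (0.25, 'below minimum important difference')
print(classify_change([4.0, 4.0], [4.5, 4.5]))    # (0.5, 'clinically important')
print(classify_change([4.0, 4.0], [5.0, 6.0]))    # (1.5, 'large')
```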

Table 2

Health related quality of life (HRQoL) instruments used in gastro-oesophageal reflux disease (GORD)

The effects of treatment on HRQoL have been reported in several observational studies but only a few randomised trials. One systematic review37 was published in the Cochrane database in 2000 reporting medical therapy in 23 trials, of which only four had included HRQoL as a measured outcome. Three used the GSRS, a symptom scale, two also used the PGWBI, and one used the SF-36 together with a heartburn specific instrument. Significantly improved PGWBI scores and GSRS reflux scores (p<0.05) were noted after treatment of endoscopy negative reflux disease with omeprazole 20 or 10 mg daily compared with placebo. No overall difference was noted between omeprazole 20 mg once daily and cisapride 10 mg four times daily, except in the GSRS reflux score, and significant improvement was noted also with ranitidine 150 mg twice daily compared with placebo in three (physical functioning, bodily pain, and vitality) of eight subscales of the SF-36 and all scales of the heartburn specific questionnaire. Currently, a second overview is in progress and has reviewed a total of 174 short and long term GORD trials, of which only nine have assessed HRQoL (Moayyedi P, personal communication).

In a retrospective comparison of symptoms experienced over the previous 12 months by patients with severe reflux who had undergone medical or surgical management, heartburn, regurgitation, waterbrash, and HRQoL were significantly better (p<0.05) after laparoscopic fundoplication compared with medical management.38 As might be expected, medication and proton pump inhibitor use was greater in the medically treated group. However, results from a long term follow up study of subjects who had participated in a randomised trial of medical versus surgical therapy approximately 10 years before showed no significant difference in the PCS or MCS of the SF-36 between medically and surgically treated patients but did show greater bodily pain in the medical treatment group (51.7 v 64.0; p<0.05).39 The risk of death was, however, greater in the surgically treated group, and disease specific HRQoL was not assessed.

Most studies in which effective therapy has been given, whether medical or surgical, show improvement in HRQoL after treatment. It seems important to remind researchers that when assessing HRQoL, all important outcomes must be determined. Finally, studies are now often undertaken in many countries simultaneously. It is critical to ensure that proper cross cultural validation is performed when HRQoL questionnaires are translated or adapted for other languages.10,40

Most researchers are now combining generic and disease specific instruments to fully assess HRQoL. This has the advantage of not missing an unexpected finding, and also of finding the most important benefit. However, alternative instruments will be needed for broader or more specific applications, such as to assess mental health, work function, disease knowledge, coping skills, relationships, or sexual dysfunction, which are not fully assessed by currently available questionnaires. Questionnaires such as the World Health Organisation quality of life instrument (WHOQOL-100),41 which is reasonably well validated but rather lengthy, or its shorter version, the WHOQOL-BREF,42 have not been examined in the context of GORD but should be considered for evaluation. Utilities, such as the HUI13 and others, should also be explored when pharmacoeconomic analyses are planned.

In summary, GORD patients have impaired HRQoL compared with general populations, and perceive themselves to be as affected by their condition as groups with other serious chronic conditions. The level of impairment and types of problems experienced relate to symptom severity, the type and effectiveness of the treatment, and non-disease related factors such as the presence of other medical conditions, sex, or anxiety. HRQoL measurement complements symptom severity evaluation and should be part of outcome measurement for all therapeutic trials, using both disease specific and generic instruments. Full validation of assessment tools is critical. Long term, as well as short term, evaluation is important and is critical when undertaking comparative pharmacoeconomic evaluations.

