Assessment of reflux symptom severity: methodological options and their attributes
P Bytzer

Correspondence to: Dr P Bytzer, Associate Professor of Medicine, Head, Department of Medicine, Division of Gastroenterology, Glostrup University Hospital, DK-2600 Glostrup, Denmark; Peter.Bytzer@DADLNET.DK


Despite major advances in our understanding of reflux disease, the management of this disorder still presents many challenges. Reduction of heartburn is the most readily apparent objective for the patient with reflux disease. Thus the ability to measure heartburn accurately is of fundamental importance to clinical research in reflux disease. Here, the available data on the assessment of reflux symptoms—predominantly heartburn—in clinical trials of symptomatic reflux disease are examined.

  • GORD, gastro-oesophageal reflux disease
  • PPI, proton pump inhibitor
  • VAS, visual analogue scales


Heartburn is usually assessed by measuring severity and frequency using modified Likert scales, usually with four, five, or seven grades. The various grades are not always defined and they frequently differ among trials. Severity is measured as either the most severe episode of heartburn over the past day, week, or month, or by the overall average of symptoms. Heartburn frequency is usually assessed at trial entry but not always at the end. Furthermore, frequency of heartburn is seldom a part of the definition of treatment success unless it is incorporated into a description that defines absence of symptoms. The number of days with heartburn over the past week, or the number of hours with heartburn over a 24 hour period, have been used to measure frequency. Patients who report frequent symptoms also seem to suffer from more severe grades of heartburn. Clinical trials suggest that the severity and frequency of heartburn improve in parallel during medical therapy. Diverse symptom response measures have been used, many studies reporting the proportion of patients who experienced absence of reflux symptoms or the number of symptom free days as primary outcome measures. Complete absence of heartburn is a very attractive outcome measure because it is unambiguous. Validation studies are lacking and it is not clear what the most appropriate outcome is in patients with heartburn. In short term studies, a strict end point, such as “absence of heartburn for the last seven days”, appears attractive. In long term studies, the phrase, “sufficient control of heartburn”, may be a suitable outcome measure although it too requires appropriate validation.


Despite major advances in our understanding of reflux disease, the management of this disorder still presents many challenges. Reduction of heartburn is the most readily apparent objective for the patient with reflux disease. Thus the ability to measure heartburn accurately is of fundamental importance to clinical research in reflux disease. Obstacles to interpreting the patient’s subjective assessment include lack of agreed definitions of symptoms, arbitrary gradations of symptom severity and frequency, and lack of validated rating scales. An important requirement for future clinical research will be to define guidelines for the assessment of symptoms and, hence, of treatment success. This review will examine the available data on the assessment of reflux symptoms—predominantly heartburn—in clinical trials of symptomatic reflux disease.

Reflux disease may be associated with many symptoms but the major ones assessed in clinical trials are “heartburn” and “regurgitation”. Here, I will concentrate on heartburn to illustrate the problems associated with symptom assessment but it should be recognised that heartburn is probably the best characterised reflux symptom and that difficulties in assessment are even greater for other symptoms that occur less frequently, are associated less clearly with reflux disease, and are often even less well defined than heartburn.

Most importantly, there is no universally accepted definition of heartburn. A definition of heartburn as “a burning feeling rising from the stomach or lower chest towards the neck” leads to improved recognition of reflux symptoms and is predictive of a good symptomatic response to acid suppression with a proton pump inhibitor (PPI).1,2 This description of heartburn is also important because it has been the enrolment criterion for many gastro-oesophageal reflux disease (GORD) treatment studies in patients with either endoscopy negative reflux disease or erosive oesophagitis.3–11 However, despite this, many patients do not consider heartburn and retrosternal burning to be synonymous.1


The vast majority of clinical trials have graded heartburn severity using an ordinal scale (for example, a modified Likert scale) and most have used a four grade modified Likert scale, with word anchors defining heartburn severity by its impact on daily life (for example, “causing interference with normal activities”) (table 1). Only a few studies have used five or seven grade modified Likert scales, despite their methodological advantages over scales with fewer grades.12 More importantly, the time frame for symptom assessment has varied in different studies. Most questionnaires have rated heartburn severity as the overall intensity of the symptom over the previous day or week. However, other questionnaires have asked the patient to grade the severity by defining the most intense episode of heartburn during the previous day or week.13,14 There have been no studies determining whether these variations in the definition of severity result in differences in the classification of individual patients.

Table 1

Examples of definitions of heartburn severity used in clinical trials

In one study, the investigators incorporated both frequency and severity into the same scale, assuming that mild symptoms, which did not interfere with normal activities, occurred only occasionally and that severe symptoms, interfering with normal activities, were likely to be present frequently.15

Heartburn severity has also been graded using visual analogue scales (VAS) (table 1). VAS are continuous, usually 10 cm long, often with the extremes labelled by specific terms like “worst possible symptom” and “no symptom”. Their reproducibility and responsiveness in upper gastrointestinal symptoms are well established.16 When used in serial measurements, patients should see their prior responses as this may increase sensitivity and thus the power of the trial.17 Outcome measures obtained from VAS may be difficult to interpret as small but statistically significant results do not necessarily indicate clinical relevance. Furthermore, the clinical relevance of equal measures or changes in outcome as assessed by continuous scales may differ between subjects.


Most definitions assess the number of days with heartburn over the previous week or month (table 2). A few studies have rated frequency by the number of hours during the last 24 hour period with symptoms.18 The gradings used are arbitrary and have not been validated. Even so, the frequency of heartburn is obviously an important descriptor of reflux disease severity. Patients with frequent reflux symptoms (occasionally versus one to three times daily versus almost constantly present) have a significantly greater oesophageal acid exposure on 24 hour pH monitoring compared with those with less frequent symptoms.19 Assessment of heartburn frequency is often used in clinical trials, for example as part of the eligibility criteria (see tables 3, 4) but is rarely used as part of the definition of treatment success,20 unless complete absence of symptoms (for example, during the previous week) is assumed to indicate complete symptom control.

Table 2

Definitions of heartburn frequency used in clinical trials

Table 3

Summary of various symptom outcome measures in clinical trials of patients with heartburn

Table 4

Summary of assessments of heartburn in randomised treatment trials in endoscopy negative reflux disease


Traditionally, severity, frequency, and duration have all been held to be important symptom qualities. Information about the relative importance of these symptom parameters for patients with heartburn has come from clinical trials. Not surprisingly, both the severity and frequency of heartburn seem to be important characteristics. Duration of the individual symptom episodes is probably also important to patients but this aspect has received very little attention in clinical research.

Data from clinical trials in erosive and non-erosive reflux disease6,7 have established an apparent relationship between heartburn severity, graded as mild, moderate, or severe, and quality of life impairment. It should be noted however that this may be a spurious observation as the gradation of heartburn severity was based on its impact on daily living and it may thus be little more than an indirect quality of life measure. In addition, less frequent symptoms of greater severity and duration may be perceived by some patients as representative of significant disease and thus worth treating. A key issue is thus the level of symptom severity or frequency at which a significant reduction in quality of life is seen. Self reported heartburn frequency in a population based survey is an important predictor of health care seeking.21

Patients who report frequent symptoms also seem to suffer from more severe grades of heartburn. Baseline assessments from three clinical trials in non-erosive reflux disease, comparing esomeprazole with omeprazole with a total of 2642 patients, showed that patients who reported severe heartburn were more likely to have daily heartburn than those with mild heartburn (see fig 2 in Dent and colleagues22 in this supplement (page iv1–iv24)) (AstraZeneca, data on file).

Furthermore, results from these and other trials suggest that the severity and frequency of heartburn improve in parallel during medical therapy. For example, pooled data from three controlled trials comparing esomeprazole with omeprazole showed a relationship between the improvement in heartburn severity (scored on a four grade modified Likert scale) and reduction in the number of days per week with heartburn.22 Thus patients with a pronounced reduction in heartburn frequency also reported a more marked reduction in heartburn severity (fig 1). Comparable findings were reported from a clinical trial in which the symptomatic responses to omeprazole and ranitidine were evaluated in erosive reflux disease.23

Figure 1

Heartburn frequency and severity improve in parallel during medical therapy. Proportion of patients reporting a change in heartburn severity score of 3 (from severe to none) or 2 (from severe to mild or from moderate to none) according to the change in the number of days with heartburn (−7, from daily to none; −6, from daily to one day per week or from six days per week to none, etc). Results are pooled data from three controlled trials comparing esomeprazole with omeprazole (n = 2629) (AstraZeneca, data on file).

Even though these data might suggest that measures of symptom response could be restricted to either severity or frequency, we do not know if different treatment modalities, other than acid inhibitory drugs, might have a different impact on symptom patterns resulting in skewed or differential changes in these parameters. Furthermore, a randomised placebo controlled study comparing two doses of omeprazole suggested that both severity and frequency of heartburn are important independent determinants of patient satisfaction with therapy.4 New data support this, showing that most patients are willing to accept mild heartburn during treatment, but only for up to one day per week, whereas almost none is willing to accept severe or even moderate heartburn (see fig 1 in Dent and colleagues22 in this supplement (page iv1–iv24)).24


A large number of different symptom response measures have been reported in the literature. Outcome measures in non-erosive reflux disease focus almost exclusively on symptom reduction and are usually more detailed and sophisticated than in trials for erosive reflux disease, which tend to concentrate on endoscopic signs of healing. Consequently, this review has focused mainly on methodology reported in non-erosive reflux disease trials.

Outcome measures should be validated in well designed studies dedicated to that purpose before they are used in clinical trials.25 This ideal requirement has not been satisfied for the symptom outcome measures used in reflux disease and there is a remarkable lack of validation studies in the area. Thus it is not clear which outcome measure is most appropriate in GORD patients.

Only a minority of clinical trials in symptomatic reflux disease offer sufficient methodological details on the recording and definition of heartburn severity and frequency and outcome measures. Table 3 lists a number of different symptom outcome measures reported in major clinical trials in reflux disease. Table 4 lists the eligibility criteria relevant to heartburn symptoms, together with a summary of outcome measures, in trials that have examined the symptomatic response to antisecretory medication in non-erosive reflux disease.


Reflux patients often describe several different symptoms. Assessment of treatment effect for each individual symptom in clinical trials may thus lead to problems with false positive results as a result of multiple statistical testing. Furthermore, reflux patients may be disappointed if they expect reduction of all gastrointestinal symptoms when in fact the investigator focuses mainly on reduction of heartburn. In the study by Carlsson and colleagues6 which compared the effects of two doses of omeprazole in patients who had symptoms compatible with reflux disease, the primary outcome measure was “complete upper gastrointestinal symptom relief”. Belching and bloating were among the most common individual symptoms recorded at entry, and because these symptoms are probably not associated with gastric acid secretion or gastro-oesophageal reflux episodes, they would not be expected to improve on acid inhibitory drugs. Not surprisingly, this very broad definition of symptom reduction resulted in a response rate of only 35–41%, much lower than in other studies for which the primary outcome was reduction of heartburn.

As the reflux symptoms of heartburn and regurgitation are part of the definition of symptomatic reflux disease, they have been the focus of treatment trials and are the primary concern in everyday clinical practice. Epigastric pain is not considered a specific symptom of gastro-oesophageal reflux but it often improves with active treatment in reflux patients,6,26,27 as does regurgitation.7,27,28 Other upper gastrointestinal symptoms, such as belching, bloating, nausea, and vomiting, have been evaluated in some placebo controlled trials.6,8,26,27,29,30 Some studies reported improvement, independent of treatment allocation, for many of these symptoms.27,29


Ordinal scales are often used to evaluate the effect of treatment on reflux symptoms. This can be done by the use of single state scales or by transition scales. Single state scales, for example four or five grade modified Likert scales, are used to establish a patient’s state at various time points (for example, at entry and completion). Scale scores should be composed of elements that are clearly defined, mutually exclusive, and ranked in a hierarchical manner. Furthermore, scale scores should be easy to translate into a clinical context.31 To optimise responsiveness to change, at least five to seven points should be included in the scale.12 Furthermore, the scale must be able to detect improvement and deterioration equally in the patients under study. If patients are clustered at one end of the scale at entry, then the scale may be unable to detect a change occurring in one direction, for example deterioration in patients who all score maximum severity at entry.

Transition scales measure the change in symptoms directly (for example, improved, unchanged, worse) and these scales should be symmetrical in their structure. Asymmetric designs—for example, with more grades for improvement than for deterioration—could potentially bias the results.32

In reflux disease, composite or global symptom scores are usually developed by adding or multiplying daily heartburn severity by heartburn frequency.20,33 A predefined reduction in heartburn score may be used as an outcome criterion.33 Unfortunately, categorisation in such scales is often ambiguous and the categories are not necessarily exhaustive or graded in equal intervals. Differences in scores may be difficult to evaluate unless the investigators provide a clinical context for interpretation.
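The ambiguity of such composite scores can be illustrated with a minimal sketch. The coding below is an assumption made for illustration only (severity graded 0 to 3 on a four grade scale, frequency as days with heartburn in the past week); it is not taken from any of the cited trials, and the function names are hypothetical.

```python
# Hypothetical coding, for illustration only:
#   severity: 0 = none, 1 = mild, 2 = moderate, 3 = severe
#   days_per_week: number of days with heartburn in the past week (0 to 7)


def _check(severity: int, days_per_week: int) -> None:
    """Reject values outside the assumed scale ranges."""
    if not 0 <= severity <= 3:
        raise ValueError("severity must be between 0 and 3")
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")


def additive_score(severity: int, days_per_week: int) -> int:
    """Composite score formed by adding severity and frequency."""
    _check(severity, days_per_week)
    return severity + days_per_week


def multiplicative_score(severity: int, days_per_week: int) -> int:
    """Composite score formed by multiplying severity by frequency."""
    _check(severity, days_per_week)
    return severity * days_per_week


if __name__ == "__main__":
    # Two clinically different patients: severe heartburn on two days
    # versus mild heartburn on six days.
    for severity, days in [(3, 2), (1, 6)]:
        print(severity, days,
              additive_score(severity, days),
              multiplicative_score(severity, days))
```

Under this assumed coding, severe heartburn on two days and mild heartburn on six days both give a multiplicative score of 6, while their additive scores differ (5 and 7). The same numerical score can therefore describe quite different symptom patterns, which is why a clinical context is needed for interpretation.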

Many recent studies have reported the proportion of patients who obtain total absence of symptoms or the number of symptom free days as primary outcome measures (table 4). These measures are easily understood, they make clinical sense, and they are not biased by the methodological difficulties associated with measuring subtle changes over time in symptom severity, frequency, and duration. On the other hand, these measures may underestimate treatment effect for subjects whose symptoms are reduced although not completely absent.


Complete abolition of symptoms is a primary aim when treating patients with reflux disease from both methodological and clinical standpoints. From a methodological standpoint, the absence of heartburn is, intuitively, an attractive outcome measure and this is probably why it is one of the most widely used end points in reflux treatment trials. Complete absence of symptoms may not however be the primary long term aim of all patients. Indeed, complete absence of symptoms may seem to be a very ambitious and unrealistic goal, leading to a reduction in symptom severity to levels below those found in a healthy background population. In practice, many patients who are prescribed continuous treatment take their medication only when symptoms become troublesome. Thus reflux patients who were prescribed long term daily PPI therapy took their medication on only 50% of treatment days.34 In controlled trials of on-demand treatment strategies in reflux disease, patients take a PPI on average every second to every third day.35,36 Similar findings have been reported from follow up studies outside the framework of a clinical trial.37 Thus although abolition of reflux symptoms might seem to be an ideal outcome measure, many patients are prepared to accept a recurrence of reflux symptoms before they resume therapy.

In trials that have measured different levels of outcome (for example, “complete absence of heartburn”, “resolution of heartburn”, “adequate control of heartburn”), there is usually a hierarchy of treatment response rates, with the lowest response rate being reported for those in which complete absence of symptoms was the primary outcome. Interestingly, a significantly larger proportion of patients are willing to continue a treatment strategy even if it does not provide absolute symptom control.4,6–8

Some clinical trials have used “sufficient control” or “resolution of heartburn” as an end point (tables 3, 4). This has usually been defined as no more than one day with no more than mild heartburn in the preceding week. In one study, “resolution of heartburn” corresponded well with the overall assessment reported by the patients in response to the question “Does the medication give sufficient control of your heartburn?”.4 A recent study reported on the relationship between complete absence of heartburn symptoms and quality of life.38 Patients with complete absence of heartburn reported improved functioning and well being compared with patients with continuing heartburn problems. Unfortunately, the study did not report any comparisons between patients with complete absence of heartburn and patients with incomplete but acceptable control of their symptoms.
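The end point definitions discussed above can be expressed as a simple rule applied to a seven day diary. The sketch below is one possible reading, assuming a hypothetical diary in which each day carries a severity grade from 0 (none) to 3 (severe), and interpreting “resolution” as heartburn on at most one day, of no more than mild severity. The category labels and function name are illustrative and do not come from the cited trials.

```python
from typing import Sequence

MILD = 1  # assumed grade for mild heartburn on a 0 to 3 scale


def classify_week(diary: Sequence[int]) -> str:
    """Classify a seven day diary of daily heartburn severity grades."""
    if len(diary) != 7:
        raise ValueError("diary must contain exactly seven daily grades")
    if any(grade < 0 or grade > 3 for grade in diary):
        raise ValueError("grades must be between 0 and 3")

    symptomatic = [grade for grade in diary if grade > 0]
    if not symptomatic:
        # No heartburn on any of the last seven days.
        return "complete absence"
    if len(symptomatic) <= 1 and max(symptomatic) <= MILD:
        # At most one day with heartburn, and that day no more than mild.
        return "resolution"
    return "not resolved"


if __name__ == "__main__":
    print(classify_week([0, 0, 0, 0, 0, 0, 0]))
    print(classify_week([0, 1, 0, 0, 0, 0, 0]))
    print(classify_week([0, 2, 0, 0, 0, 0, 0]))
```

Every patient classified as “complete absence” would also meet the less strict “resolution” criterion, which is consistent with the hierarchy of response rates described above.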


When single state scales are used as outcome measures, baseline measurements are needed. These are often expanded to summarise overall heartburn intensity over an appropriate time prior to treatment (for example, one week or one month). Even though the use of transition scales does not require a baseline assessment, it is generally recommended as it serves to document the patient’s symptom state at entry. At a minimum, outcome should be measured at completion of the trial and this should be the primary data point. Often, intermediary data points are also obtained but repeated measurements may lead to problems with false positive results due to multiple testing. Some outcome measures summarise symptom intensity over time (for example, number of days with or without heartburn) by using diary cards. This may be an important additional measure in clinical trials comparing different interventions which may be associated with differences in the onset of heartburn reduction but which seem to be equally effective when measured at the end of the trial.
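The multiple testing problem mentioned above can be made concrete with a short calculation. The sketch below assumes, for simplicity, that the tests at each data point are independent, which repeated measurements on the same patients generally are not; it shows the standard family-wise error rate formula and the Bonferroni adjustment rather than a method used in any of the cited trials.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent tests."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the overall rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Five data points, each tested at the 5% level.
    print(round(familywise_error_rate(0.05, 5), 3))  # 0.226
    print(bonferroni_threshold(0.05, 5))
```

With five data points each tested at the 5% level, the chance of at least one false positive under these assumptions is about 23%, which illustrates why the final assessment is best designated as the single primary data point.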

Recent studies have focused on the time (in minutes or hours) to symptom reduction after single or repeated doses of antisecretory medication.39–41 This is of obvious importance for patients who take their medication on demand where prompt reduction of symptoms is important. Patients are usually instructed to record symptoms at regular intervals (minutes or hours), depending on the perceived speed of action. An electronic patient diary or an interactive voice response system can ensure more valid symptom recordings and prevent retrospective entries.40–42


As a general rule, the patient should assess symptoms and symptom outcome directly. Patients and investigators may disagree when both evaluate symptom outcome. In many studies, it is not clear whether the final assessment was done by the patients directly or indirectly by way of a physician interview. Several studies have shown that investigators tend to be more optimistic than their patients in estimating the magnitude of treatment response at the final assessment. Thus in the study by Sandmark et al, the investigators rated approximately 75% of patients as completely symptom free after four weeks of omeprazole therapy. In comparison, only approximately 55% of patients felt that their symptoms were completely gone.23


A large number of different symptom response measures have been reported in the literature. Many studies report the proportion of patients who obtain total absence of symptoms or the number of symptom free days as primary outcome measures. These measures are “hard” or more objective end points that are easily understood. Furthermore, they make clinical sense and they are not biased by methodological difficulties. On the other hand, such crude measures will underestimate treatment effects in those with incomplete but satisfactory symptom response. There is a general lack of validation studies in this area and it is not clear what the most appropriate outcome is in patients with heartburn. In short term studies (weeks to a few months), a strict end point such as “absence of heartburn for the last seven days” appears attractive since it is unambiguous and, therefore, methodologically sound. Furthermore, it will provide the patient with an “internal” standard of the best possible care with which to compare future therapies. In long term studies, a less strict end point, such as “sufficient control of heartburn”, may be more appropriate. However, a less strict end point based on a predefined criterion such as “no more than mild symptoms on no more than one day per week”, or on the patient’s decision as to treatment adequacy, introduces other problems of subjectivity with respect to the definition of “mild” and “adequacy”.

The choice of symptom outcome measure depends also on the aim of the clinical trial. A study to compare two similar therapies may be best able to discriminate if it uses a “hard” end point, such as complete abolition of symptoms, whereas a study to assess the effect of treatment on patient quality of life or satisfaction may require a more detailed assessment of the magnitude of change in a patient’s symptoms. Given that studies may have different aims, it would be preferable for the primary aim to be specified clearly and for other outcome measures also to be specified, to allow comparability between different studies.

