BACKGROUND Endoscopic oesophageal changes are diagnostically helpful and identify patients exposed to the risk of disease chronicity. However, there is a serious lack of agreement about how to describe and classify the appearance of reflux oesophagitis.
AIMS To examine the reliability of criteria that describe the circumferential extent of mucosal breaks and to evaluate the functional and clinical correlates of patients with reflux disease whose oesophagitis was graded according to the Los Angeles system.
METHODS Forty six endoscopists from different countries used a detailed worksheet to evaluate endoscopic video recordings from 22 patients with the full range of severity of reflux oesophagitis. In separate studies, Los Angeles system gradings were correlated with 24 hour oesophageal pH monitoring (178 patients), and with clinical trials of omeprazole treatment (277 patients).
RESULTS Evaluation of circumferential extent of oesophagitis by the criterion of whether mucosal breaks extended between the tops of mucosal folds gave acceptable agreement (mean κ value 0.4) among observers. This approach is used in the Los Angeles system. An alternative approach of grouping the circumferential extent of mucosal breaks as occupying 0–25%, 26–50%, 51–75%, 76–99%, or 100% of the oesophageal circumference gave unacceptably high interobserver variation (mean κ values 0–0.15) for all but the lowest category of extent (mean κ value 0.4). Severity of oesophageal acid exposure was significantly (p<0.001) related to the severity grade of oesophagitis. Pretreatment oesophagitis grades A–C were related to heartburn severity (p<0.01), outcomes of omeprazole (10 mg daily) treatment (p<0.01), and the risk for symptom relapse off therapy over six months (p<0.05).
CONCLUSIONS Results add further support to previous studies for the clinical utility of the Los Angeles system for endoscopic grading of oesophagitis.
- columnar lined mucosa
- acid reflux
The endoscopic oesophageal changes caused by reflux disease are not only helpful diagnostically, but also identify patients exposed to a significant risk of disease chronicity.1-4 Furthermore, the severity of oesophagitis gives useful guidance as to the likelihood of success of particular treatments.5
Unfortunately, there is a serious lack of agreement about how to describe and classify the appearances of reflux oesophagitis. There are shortcomings in the many different published systems. For example, many of these systems use ambiguous terminology that is variably interpreted by different endoscopists; some systems suffer from illogicalities; and many doctors grade appearances that are now known to be unreliable for the diagnosis of reflux oesophagitis, such as oedema, increased vascularity, etc.6,7 It is probably these deficiencies that lead some endoscopists to use “personal” systems. The lack of agreement about how to describe the endoscopic appearances of reflux oesophagitis arises from a deficiency of critical evaluation of approaches to endoscopic grading of oesophagitis. This hampers both accurate communication about the clinical status of individual patients and the interpretation of data from clinical trials of the treatment of the disease.8 Our international working group on the endoscopic classification of oesophagitis, which is supported by the World Organisation of Gastroenterology (OMGE), developed and proposed the Los Angeles Classification System in 1994 and reported its first evaluation of the criteria used.6 The system was so named because it was presented at the Los Angeles World Congress of Gastroenterology. The working group has continued to meet regularly in order to evaluate the proposed Los Angeles classification and as a result of these discussions has agreed that the definitive version of the classification should be as given in table 1. This version has been revised slightly from the previously published proposal,6 in response to the concern that it is sometimes difficult to determine with confidence whether a mucosal break is completely circumferential or not. The change of grade D from fully circumferential to involvement of more than 75% of the circumference has required a matching modification of the definition of grade C.
The criteria used by the Los Angeles classification focus on the description of the extent of visible mucosal breaks in the belief that this is of greatest diagnostic and prognostic value. In the previous study from this group,6 which used video recording and endoscopic photographs, we found that endoscopists were able to identify mucosal breaks confined to the tops of mucosal folds, lesions that extended around the entire oesophageal circumference, and complications of oesophagitis such as stricture and columnar lined oesophagus. However, our original assessment of the utility of criteria for the assessment of the radial extent of mucosal breaks was inconclusive because of deficiencies in the design of the sheet used to score particular endoscopic features.6 This was a major limitation, as definition of circumferential extent was judged to be a key measure of oesophagitis severity. As a result of this experience, the score sheet for assessing this variable was developed substantially and the revised version was used in the further evaluations presented in this report.
A system that classifies the endoscopic changes of reflux oesophagitis should not only be unambiguous and simple to use, but should also be shown to distinguish between clinically relevant grades of severity of reflux disease. Thus the grading should distinguish among groups of patients with differing responses to treatment and with differing severity of reflux, as assessed by other measurements. These correlates have been assessed for the Los Angeles system for the first time in the present study.
The study therefore had two major aims: to examine the reliability of criteria that describe the circumferential extent of mucosal breaks; and to evaluate the functional and clinical correlates of patients with reflux disease whose oesophagitis was graded according to the Los Angeles system.
Forty six endoscopists from different countries and continents (Europe, USA, Australia, and Japan) participated. Both trainee endoscopists, defined as those who had performed less than 500 upper gastrointestinal endoscopies, and experts who had performed more than 3000 upper gastrointestinal endoscopies were invited to take part. Of the participants, 25 were classified as experts and 15 as trainees. Information about the level of experience was unavailable for six contributors. As we were unable to reveal any significant differences in outcome between these groups, we present data retrieved when analysing the entire study group in the Results section. Each participant received a videotape which contained edited recordings of endoscopic images from 22 cases, each lasting approximately 30 seconds. The full range of severity of reflux oesophagitis was covered by the recordings. Endoscopists used a detailed worksheet (shown in full in the ), which was a development of the one used in previous studies. The sheet scored the full range of findings relevant to the evaluation of an oesophagitis classification system, according to predefined criteria. The previously agreed definition of mucosal breaks, “an area of slough or erythema with a discrete line of demarcation from the adjacent more normal looking mucosa” was used.6 The original videotapes were acquired with Olympus video endoscopes and either a Super-VHS or NT video recorder. The copies that were distributed were made directly from these original recordings. Five centres participated in producing original video material.
The most widely used coefficient of agreement in clinical studies, the κ statistic,9 was used in its original version in order to evaluate the degree of agreement among observers. The range of possible values for κ is from −1, which indicates perfect disagreement, to +1, which is reached with perfect agreement. Agreement by chance alone gives a value of 0. Though far from perfect, values as low as 0.4 are considered to indicate acceptable agreement. The κ statistic used is κ = (p0 − pC)/(1 − pC), where p0 is the observed proportion of agreement and pC is the agreement expected by chance in the relevant contingency table.
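As a minimal sketch of the calculation described above, the following Python function computes κ for two observers who each record the presence or absence of a feature in the same cases. It assumes the standard two-observer form, κ = (p0 − pC)/(1 − pC). The function name, the 0/1 coding, and the example ratings are illustrative assumptions and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Kappa for two observers rating the same cases (illustrative helper)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of cases")
    n = len(ratings_a)
    # p0: observed proportion of cases on which the two observers agree
    p0 = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # pC: agreement expected by chance, from each observer's marginal frequencies
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    pc = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if pc == 1.0:
        # Both observers used one identical category throughout; kappa is undefined
        return float("nan")
    return (p0 - pc) / (1 - pc)


# Hypothetical ratings (1 = feature present, 0 = absent) for 10 cases
observer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
observer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohen_kappa(observer_1, observer_2)
print(round(kappa, 2))  # 0.6: agreement on 8 of 10 cases, chance agreement 0.5
```

In this hypothetical example the observers agree on 80% of cases, but half of that agreement would be expected by chance, which is why κ is 0.6 and not 0.8.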
We analysed the data from the large number of observers as multiple comparisons between pairs of observers. That is, when 46 observers recorded the presence or absence of a particular feature, κ statistics were calculated for a total of n × (n − 1) comparisons, giving a theoretical number of 2070 κ values for each item. Each κ value measured the agreement between two observers in the endoscopic videotape recordings in the 22 patients; κ values are given as medians and interquartile ranges.
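The pairwise scheme can be sketched as follows. The figure of 2070 is the number of ordered pairs, 46 × 45; because κ in this form is symmetric in the two observers, these correspond to 1035 distinct pairs, each counted twice. The sketch uses a small hypothetical panel of four observers and eight cases; the observer labels, ratings, and helper names are assumptions for illustration only, and the quartile method is simply the default of Python's `statistics.quantiles`, which may differ from the method used in the study.

```python
from itertools import permutations
from statistics import median, quantiles


def cohen_kappa(a, b):
    """Two-observer kappa, (p0 - pC) / (1 - pC); illustrative helper."""
    n = len(a)
    p0 = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    pc = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p0 - pc) / (1 - pc)


def n_ordered_pairs(n_observers):
    """Number of observer-to-observer comparisons, n x (n - 1)."""
    return n_observers * (n_observers - 1)


# Hypothetical ratings (1 = feature present, 0 = absent); not study data
ratings = {
    "obs1": [1, 1, 0, 0, 1, 0, 1, 0],
    "obs2": [1, 0, 0, 0, 1, 0, 1, 1],
    "obs3": [1, 1, 0, 1, 1, 0, 0, 0],
    "obs4": [0, 1, 0, 0, 1, 1, 1, 0],
}

# One kappa per ordered pair of observers
kappas = [cohen_kappa(ratings[i], ratings[j]) for i, j in permutations(ratings, 2)]

# Summarise as median and interquartile range
q1, _, q3 = quantiles(kappas, n=4)

print(n_ordered_pairs(46))  # 2070
print(len(kappas))          # 12 ordered pairs for 4 observers
print(median(kappas), (q1, q3))
```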
FUNCTIONAL AND CLINICAL CORRELATES
Study I—Oesophageal pH monitoring
These data are a by-product of a study into the use of omeprazole as a diagnostic test, which has been reported in detail elsewhere.10 Patients were recruited into the study if they had heartburn as the main symptom for at least six months, or if their major symptom was acid regurgitation, abdominal pain, or discomfort in association with heartburn. In total, 178 patients were entered into the study from 17 centres in Denmark, Norway, and Sweden. During the last two days before enrolment into the study, the patients scored their reflux induced symptoms in a structured manner as either mild, moderate, or severe. At endoscopy, oesophagitis was graded according to the Los Angeles classification. Before entry into the therapeutic trial, ambulatory 24 hour pH monitoring was carried out with a monocrystalline antimony electrode positioned 5 cm above the oral margin of the lower oesophageal sphincter, which was located by stationary oesophageal manometry. During the pH study, the patients were asked to act as normally as possible in order to minimise any impact on reflux patterns that could result from the process of measurement.8
Study II—Clinical correlates
In the other clinical study, the major therapeutic outcomes of which have been reported elsewhere,11 patients with upper gastrointestinal discomfort were recruited from the primary care setting. The patients were screened for reflux symptoms by use of a questionnaire,12 which had been developed for the diagnosis of symptomatic reflux disease, and which focuses especially on objective recognition of heartburn, and on exacerbating or relieving factors. Patients with a questionnaire score of 4 or more, aged 18–80 years, were screened by endoscopy for enrolment into the study, provided that they had been experiencing upper gastrointestinal symptoms for at least three months with episodes occurring on at least four days during the previous week. At endoscopy, oesophagitis was graded according to the Los Angeles classification (table 1). For ethical reasons, patients found to have Los Angeles grade D oesophagitis, columnar lined oesophagus, or peptic strictures were excluded from further involvement in the study. Complete endoscopic and clinical data were available for 496 of the 538 patients randomised to treatment (see below). Of these, the 277 who had mucosal breaks were randomised to either omeprazole 20 or 10 mg daily. Patients without mucosal breaks were randomised to either omeprazole 20 or 10 mg daily or to placebo. Treatment was given for four weeks, under double blind conditions, after which it was stopped in those who had complete symptom relief. Those with persistent symptoms at four weeks were given open therapy with omeprazole 20 mg daily for another four weeks. All patients in whom symptoms were relieved at four or eight weeks entered a follow up period without treatment for up to six months, exiting the study if their symptoms relapsed according to predefined criteria before that time. There were 145 patients who had mucosal breaks at their pretreatment endoscopy who entered this phase of the study.
The relation between acid reflux variables and endoscopic grading of mucosal breaks was assessed with a logistic regression analysis in which the endoscopic grade was the dependent variable. The relation between symptom severity and endoscopic grading was evaluated by cross tabulation of the severity of heartburn and endoscopic grading and then a simple correlation analysis. A log rank test was applied to the proportion of patients in clinical remission during follow up with the pretreatment endoscopic grading as the dependent variable.13
There was no significant difference in the levels of agreement on judgements described below between experienced endoscopists and trainees, so the entire data set was pooled. The most important aspect of the endoscopic video sequence assessments was the reproducibility of evaluation of the circumferential extent of mucosal breaks as judged by the extension of mucosal breaks across the tops of two or more mucosal folds. Such mucosal breaks were assessed with a κ value of 0.4 (0.22–0.51) (fig 1).
When endoscopists were asked to evaluate the proportion of the circumference involved by mucosal breaks as 0–25%, 26–50%, 51–75%, 76–99%, or 100%, only the most limited extent (0–25%) was recorded with any level of accuracy (κ = 0.4, range 0.22–0.52). In the remaining groups, κ values equalled or were close to 0. Another aspect of assessing the circumferential extent was the assessment of whether one or more mucosal folds were involved by the mucosal breaks. The examiners again showed an acceptable agreement, scoring a κ value of 0.4 in this judgement.
The score sheets also allowed assessment of interobserver variation in the recognition of other aspects of the endoscopic oesophageal appearances already evaluated in our previous study.6 Agreement on the presence or absence of a mucosal break in a particular endoscopic video sequence had a κ value of 0.4 (0.21–0.58). The presence of multiple mucosal breaks (two or more) was assessed with a similar level of accuracy (κ value 0.4 (0.25–0.54)).
The presence of a stricture was assessed with a κ value of only 0.33 (0–0.46), but there were only two cases of stricture among the 22 cases. For the four cases of columnar metaplasia (all histologically confirmed), recognition of tongues that extended up the oesophagus had a κ value of 0.38 (0.13–0.58) and estimates of the circumferential extent of this metaplastic epithelium had a κ value of 0.42 (0.10–0.65). The investigators were specifically asked to assess the number and depth of ulcers, as opposed to mucosal breaks, with an ulcer being defined as a mucosal break greater than 3 mm deep. Reliable judgement on the presence of an ulcer was not achieved, as this assessment had a κ value of −0.01 and the corresponding κ value for the assessment of the number of ulcers only reached −0.02.
The presence and absence of “minimal endoscopic changes” as listed in table 2 was evaluated close to the oesophagogastric mucosal junction; there was no agreement on the presence or absence of these findings.
FUNCTIONAL AND CLINICAL CORRELATES
Oesophageal pH monitoring and symptom assessments were completed in 178 patients as a prelude to inclusion in the short term therapeutic trial with omeprazole (table 3). There was a significant relation between the Los Angeles classification grade and the 24 hour oesophageal acid exposure values (p<0.01).
There was also a significant correlation between pretreatment symptom severity and endoscopic grading of mucosal breaks (fig 2). The severity of heartburn correlated significantly with the pretreatment endoscopic grade in both studies (study I: r=0.31, p<0.001; study II: r=0.23, p<0.01).
After four weeks of treatment with omeprazole 10 mg daily, there was a gradation of endoscopic healing rates from 77% of grade A patients to only 20% in grade C patients (fig 3). There was a gradation of healing rates with omeprazole 20 mg, but this time only between grades B and C.
Symptomatic relapse after initial short term treatment occurred in 83% of the patients during a six month follow up period. The proportion of patients still in clinical remission during these six months showed a significant relation to the pretreatment Los Angeles classification grade of mucosal breaks (fig 4).
This report provides data that support the approaches used by the Los Angeles classification to categorise the extent of oesophagitis. Of special interest was the evaluation of the circumferential extent of the mucosal breaks. The data were generated with a newly structured worksheet and new endoscopic images that were better directed to the aim of the assessment than those used previously.6 These results indicate that endoscopists can score the joining of mucosal breaks between the tops of two mucosal folds with fair agreement, thus allowing scoring of circumferential extent with this method. Determination of radial extent without reference to mucosal folds had very poor levels of agreement. This report also gives clinical and physiological data that support the practical relevance of the Los Angeles classification, and it provides the definitive description of the finally agreed form of the classification.
The version of the Los Angeles system described in the previous report of this group6 was a proposed system still under evaluation, which has now been modified slightly from the original description (table 1). If, in the future, further data suggest that the Los Angeles classification could be improved by some evolution of the criteria, it is important that even a minor modification should be given an entirely different name, in order to avoid any ambiguity as to what criteria have been used. This is an important lesson, which has not been learnt from the Savary and Miller grading system.14-17 The first version of this grading system differs substantially from the second, yet it is rarely stated which version is being used, either in clinical practice or even in published clinical trials. If the version in use is stated, the reader must be familiar with the detail of each version.
The major aim of the assessments of the endoscopic images presented in this report was to evaluate further how reliably endoscopists could assess the circumferential extent of mucosal breaks by localising the peaks of the mucosal folds. It was believed that the findings of the assessment from our previous study6 could have been influenced by technical limitations of the assessment method. The present study gives a more positive indication that the peaks of the oesophageal mucosal folds are useful endoscopic landmarks for determination of extent. Another important message, related to the clinical usefulness of the system, was that no apparent differences emerged between experienced endoscopists and trainees. We would have liked to have obtained a stronger κ value than 0.4 to support our recommendation for use of the peaks of the mucosal folds as a primary method for determining radial extent. Determination of radial extent is, however, a necessary judgement for endoscopic classification of oesophagitis, and the peaks of the mucosal folds appear to be the best option on the basis of the data obtained.
The quality of the stored images continued to be a significant technical limitation. The stored images did not fully emulate the sensitivity of a live endoscopy done by an observer orientated to the use of the peaks of the mucosal folds as radial landmarks. Furthermore, despite our best efforts, the resolution of some of the endoscopic images was not optimal. Image quality was also impaired by the processes of image storage and copying. The stored images sent out for evaluation did not allow the observer to “explore” the oesophagus in order to clarify appearances. Such clarification should always include partial deflation of the oesophagus to recognise the position of the folds. Similar limitations with the current methodology might explain the differences in κ values between the present investigation and that performed by Bytzer et al.7 However, significant advances in the optical resolution of video endoscopes, and in the capture, storage, and reproduction of video endoscopic images in digital form are currently being introduced. We believe that these technical developments will largely overcome the problems caused by suboptimal resolution of images as currently used. Our working group plans to use these developments for preparation of educational material, but judged that it was inappropriate to delay formal introduction and evaluation of the Los Angeles classification system pending the results of a further assessment done with technically better images. It needs, however, to be emphasised that the intraobserver variability in these situations also requires further validation.
Previously, no attempt has been made to define the endoscopic appearances that should be taken as indicative of erosion or ulceration. We have described these collectively as “endoscopic mucosal breaks” for reasons given previously.6 The definition, “an area of slough or erythema with a discrete line of demarcation from the adjacent, more normal looking mucosa”, still requires interpretation by the observer, but it is at least more specific than no definition at all, which has been the case until now. Currently, some members of the working group are conducting a study which correlates endoscopic mucosal breaks with the histology of biopsies directed precisely to sample these areas. The group will also consider other measures that may aid the distinction of patches or tongues of mucosal columnar metaplasia from mucosal breaks. Methods that hold some promise for this include high resolution video endoscopy and dye spraying methods.18
Although the classification of the extent of the mucosal break was the primary aim of our group, we included a scoring of so called minimal changes as well. κ values close to or below 0 were obtained for recognition of “minimal changes”, in keeping with results from another group.7 Again we have to realise the pivotal role of technical quality of images in order to assess these subtle mucosal appearances. At present, we consider that current evidence indicates that reliable recognition of “minimal changes” is problematical and that the diagnostic significance of these changes for reflux disease is uncertain. Possibly, better image quality may make these changes more reliable in the future. Until “minimal changes” are shown to be of diagnostic value, they probably should not be recorded as present or absent, as to do so only causes confusion.
The Los Angeles classification was developed with the intention to provide a clinically relevant stratification of the severity of oesophagitis. Despite the lack of validation of other oesophagitis classification systems there is considerable data indicating that these recognise clinically relevant variations of the severity of oesophagitis. This is the case for both oesophageal pH monitoring data and response to medical treatment.19-21 The oesophageal acid exposure data included in the present study support the clinical relevance of the Los Angeles classification with the possible exception of the acid reflux values in grade C patients. However, data were available from only nine patients with this oesophagitis grade, a number that is probably too small to draw a meaningful conclusion, given the obvious variability of pH monitoring acid exposure data.
Large prospective, randomised clinical trials gave us the opportunity to evaluate further the clinical relevance of the Los Angeles classification. Previous studies, which have used different endoscopic classification systems to correlate the severity of symptoms with endoscopic grade, have given negative results.22,33 In the present study there was a significant correlation between the severity of heartburn and the Los Angeles grading. The clinical studies also found an impressive gradation of endoscopic healing relating to the endoscopic grading, when omeprazole was given in a dose of 10 mg daily for four weeks. The large numbers of patients involved and the statistically significant differences are arguments for the clinical relevance of the Los Angeles classification. On the other hand, the size of the r values (r=0.23–0.31) emphasises that factors other than endoscopic grading play a significant role in the clinical manifestation of the disease. It is worth noting that the data obtained show that the classification of patients into grade A and B seems to provide a clinically relevant subdivision. These patients represent the most numerous groups of patients suffering from reflux diseases, which have previously been grouped somewhat ambiguously into one grade by the Savary and Miller classification.14 Additional information relevant to the significance of the Los Angeles classification system was gained from following patients after a successful initial, short term, drug therapy. The symptomatic relapse curves separated patients, who before treatment had an absence of oesophageal mucosal breaks, from those with Los Angeles grades A–C.
The findings presented in this and our previous report6 represent the only rigorous assessment of an endoscopic grading system for reflux oesophagitis. The criteria used for the Los Angeles classification have been developed and agreed on by a truly international working group. It is hoped that this classification will be used widely and without ad hoc modification, with resultant improvement in communication about patient status for both clinical and research purposes.
The working group was supported by funds provided by OMGE and Astra Hässle AB. Ola Jungardt is gratefully acknowledged for his statistical advice.