Background Assessment of disease activity in UC is important for designing an optimal therapeutic strategy. No single histology score is considered optimum. The aim of this study was to compare intraobserver reproducibility and the interobserver agreement of available histological UC activity indexes.
Methods One hundred and two biopsy specimens (collected between 2003 and 2014) were scored blindly by three pathologists, who determined the Geboes, Riley, Gramlich and Gupta indexes and a global visual evaluation (GVE). Intraobserver reproducibility and interobserver agreement for each index and for the items of each index were studied by intraclass correlation coefficient for quantitative parameters and by κ values and the Krippendorff index for qualitative parameters. The relationship between indexes was studied by computing Pearson's and Spearman's correlation coefficients.
Results Geboes, Riley, Gramlich and Gupta indexes and GVE showed good intraobserver reproducibility and a good interobserver agreement. Histological items that showed the best interobserver agreement were ‘erosion/ulceration or surface epithelial integrity’ and ‘acute inflammatory cells infiltrate/neutrophils in lamina propria’. The five scores were strongly correlated.
Conclusions Correlation between indexes is strong. Intraobserver reproducibility and interobserver agreement for all indexes are very good. Histological items that showed the best interobserver agreement are ‘erosion/ulceration’ and ‘acute inflammatory cells infiltrate/neutrophils in lamina propria’.
- ULCERATIVE COLITIS
Significance of this study
What is already known on this subject?
Histological healing may be the ultimate therapeutic goal in UC.
Four histological indexes are available in UC: Geboes, Riley, Gramlich and Gupta indexes.
No single histology score is considered optimum in UC.
What are the new findings?
Correlation between histological indexes is strong.
Intraobserver reproducibility and interobserver agreement for all indexes are very good.
‘Erosion/ulceration’ and ‘acute inflammatory cells infiltrate/neutrophils in lamina propria’ are the histological items that showed the best interobserver agreement.
How might it impact on clinical practice in the foreseeable future?
These findings should be taken into account for the assessment of histological disease activity in UC in clinical practice.
UC, one of the major forms of IBD, presents as a chronic continuous colonic mucosal inflammation, starting at the rectum and extending proximally.1–3 Accumulating evidence indicates that histological healing may be associated with better clinical outcomes in UC and could represent the ultimate therapeutic goal in UC.4,5 Acute inflammatory indicators are associated with a twofold to threefold increased risk of colitis relapse during 12-month follow-up6 and basal plasmocytosis predicts UC clinical relapse in patients with complete mucosal healing.7 Assessment of disease activity and severity is important for designing an optimal therapeutic strategy and follow-up of patients with UC.8–10 Several scores for the assessment of histological disease activity have been developed.6,11–13 None of the instruments were developed using a formal validation process and their operating properties remain poorly understood.14 Usually they combine chronic and more acute changes, and epithelial as well as inflammatory features.15 Mucosal inflammation is usually graded by means of a scale composed of different features selected because they have proved sensitive in characterising the process.16,17 The reproducibility of the histological activity scores has not been studied extensively, but the limited data available show good agreement between different observers for the scores that have been evaluated.11,16 The two main scoring methods proposed for the histological assessment of UC are the Riley score and the Geboes score.6,11 The Riley score mainly looks at density and distribution of neutrophils and at mucosal defects.18 The Geboes score is a more extensive grading system, which also evaluates density of mononuclear cells and density of eosinophils.18 Simpler, more widely used grading systems exist; for Gramlich et al13 and Gupta et al, the term ‘activity’ refers to an infiltrate of neutrophils into the crypt epithelium.8,12
The aim of this study was to investigate the level of intraobserver reproducibility and interobserver agreement of available indexes and items of these indexes for the assessment of histological activity in UC. Relationship between indexes was also determined.
Materials and methods
This retrospective study included the first 30 patients who had a colonoscopy or proctosigmoidoscopy in the first quarter of 2012 with an established diagnosis of UC. All included patients were enrolled in the Nancy IBD cohort.
Information about the Nancy IBD cohort is reported to the Commission Nationale de l'Informatique et des Libertés (n°.1404720), which supervises the implementation of the act regarding data processing, data files and individual liberties that came into effect on 6 January 1978, and was amended on 6 August 2004, to protect the personal data of individuals.
A total of 102 H&E stained sections obtained in patients with established UC were examined. Biopsy specimens were scored independently by three experienced pathologists (AB, CB, CBR) with an interest in IBD. Five separate readings were performed: one first reading each by AB, CB and CBR, and a second reading by AB and by CB 1 month after their first. Pathologists scored the slides without access to clinical information. Slides were identified only by a laboratory record number and were thus anonymous; they could not be linked to the clinical chart. The slide order was not randomised for the second reading, but the 1-month interval between readings was considered sufficient to prevent recall bias, so both readings were regarded as independent. Readings were made with light microscopy. There was no formal assessment of slide quality; however, the three pathologists did not encounter any problem with the quality of the slides. For each biopsy specimen, five histological scores were calculated independently of clinical or endoscopic data (figure 1): the Geboes index, the Riley index, the Gramlich index, the Gupta index and global visual evaluation (GVE), which is a visual scale ranging from 0 (minimal activity) to 10 (maximal activity). The four indexes are described in the online supplementary text and online supplementary tables 1–5. In the Riley index, acute inflammatory infiltrate means acute inflammatory cells anywhere.6 For the Gramlich index, biopsies that had more than rare neutrophils in epithelium but not crypt abscesses were scored 1.
Statistical analysis was performed using SAS software (SAS Institute, Cary, NC 25513, USA; V.9.2). For qualitative variables, the intrareader reproducibility and the pairwise inter-reader agreement were evaluated by the pairwise percentage of agreement and the weighted Cohen’s κ coefficient.19 The overall inter-reader agreement was evaluated with Krippendorff's α coefficient20 and the average percentage of agreement. According to the terminology suggested by Landis and Koch,21 a κ value or a Krippendorff's α coefficient of <0 indicates poor agreement, 0–0.2 indicates slight agreement, 0.2–0.4 indicates fair agreement, 0.4–0.6 indicates moderate agreement, 0.6–0.8 indicates substantial agreement and 0.8–1.0 indicates almost perfect agreement. For quantitative parameters, the intrareader reproducibility and inter-reader agreement were assessed by the intraclass correlation coefficient (ICC) according to Fleiss’ method.22 A value >0.8 was considered to indicate good agreement. All coefficients (weighted Cohen’s κ coefficient, ICC and Krippendorff's α coefficient) were computed with their 95% CI.
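As an illustration of the agreement statistics described above, the following minimal Python sketch computes a weighted Cohen's κ for two readings of the same specimens and labels the result with the Landis and Koch terminology. The analysis in this study was run in SAS; the function names, the choice of linear weights and the example grades below are assumptions made for illustration only and are not the study data.

```python
def weighted_kappa(reading1, reading2, categories, weights="linear"):
    """Weighted Cohen's kappa for two readings of the same specimens.

    categories must list the ordered grades (lowest to highest).
    The weighting scheme is an assumption; the source does not state it.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    n = len(reading1)
    if k < 2 or n == 0 or n != len(reading2):
        raise ValueError("need >= 2 categories and two equal, non-empty readings")
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (grade in reading 1, grade in reading 2).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(reading1, reading2):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted disagreement actually observed and expected by chance.
    d_obs = sum(disagreement(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(disagreement(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - d_obs / d_exp


def landis_koch(value):
    """Map a kappa or alpha coefficient to the Landis and Koch label."""
    if value < 0:
        return "poor"
    for upper, label in ((0.2, "slight"), (0.4, "fair"), (0.6, "moderate"),
                         (0.8, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"


# Hypothetical grades (0-2) given to six specimens in two readings.
first = [0, 0, 1, 1, 2, 2]
second = [0, 1, 1, 1, 2, 1]
kappa = weighted_kappa(first, second, categories=[0, 1, 2])
print(round(kappa, 3), landis_koch(kappa))
```

With linear weights, a disagreement of two grades counts twice as much as a disagreement of one grade; passing weights="quadratic" penalises large disagreements more heavily.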
For the study of inter-reader agreement, the first readings of AB and CB and the reading of CBR were considered. The relationship between indexes was studied by Spearman's or Pearson's correlation coefficient using the first reading of AB.
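The two correlation coefficients named above differ only in what they are computed on: Pearson's coefficient uses the raw scores, whereas Spearman's coefficient is Pearson's coefficient applied to the ranks of the scores, with tied scores sharing their average rank. The sketch below, in plain Python, is a minimal illustration under that definition; the function names and the example scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upwards; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five specimens on an ordinal index and on a 0-10 scale.
index_score = [0, 1, 1, 3, 5]
visual_score = [0, 2, 3, 6, 9]
print(round(pearson(index_score, visual_score), 3),
      round(spearman(index_score, visual_score), 3))
```

Spearman's coefficient is the usual choice when one of the two scores is an ordinal grade with few levels, because it depends only on the ordering of the specimens.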
Results
One hundred and two biopsy specimens from 30 patients with UC were reviewed: 13 patients were male, and median age at diagnosis was 42.5 years (range 26–66 years). Median disease duration was 12.91 years (duration between initial diagnosis and first analysed biopsy in the present study). For extent of disease (Montreal classification), data were not available for 6 patients; 7 patients were E2 and 19 were E3. The 102 biopsy specimens were taken during 55 endoscopy procedures (colonoscopy or proctosigmoidoscopy). Clinical disease activity was present in 22 of the 55 procedures (40%); the remaining 33 (60%) were performed for disease surveillance. Endoscopic activity (Mayo score) was scored 0 in 21 cases out of 55 (38%), 1 in 6 cases (11%), 2 in 12 cases (22%) and 3 in 16 cases (29%).
The description of the first reading (AB1) with frequency (%) of each histological item is presented in online supplementary table 6. Results of the five readings of the 4 indexes and the GVE are summarised in online supplementary table 7.
Intraobserver reproducibility for histological activity indexes was very good for the first reader (AB), with ICC values of 0.94, 0.94 and 0.95 for the Geboes index, Riley index and GVE, respectively. For the second reader (CB), intraobserver reproducibility was very good for the Riley index and GVE (ICC 0.92 and 0.92, respectively) and good for the Geboes index (ICC 0.84). For the Gramlich and Gupta indexes, κ values were also very good for the first reader (AB), at 0.87 and 0.90, respectively, with percentages of agreement of 88.23% and 90.2%.
For the second reader (CB), the κ value was very good for the Gramlich index (0.83) and good for the Gupta index (0.78). Percentages of agreement for the Gramlich and Gupta indexes were 78.4% and 74.5%, respectively (table 1).
Intraobserver reproducibility between the items of each index
For the Geboes index, the greatest reproducibility (almost perfect) was obtained for ‘erosion/ulceration’, since the κ scores for both readers were >0.80 (κ scores 0.92 (0.84–1) and 0.85 (0.77–0.93)) (table 2). A very good or good reproducibility was obtained for ‘neutrophils in lamina propria’, ‘erosion/ulceration’ and ‘neutrophils in epithelium’. For each of those three items, at least one reader had a κ score between 0.61 and 0.80. With a κ value between 0.41 and 0.6 for at least one reader, a moderate reproducibility was obtained for ‘structural changes’, ‘eosinophils in lamina propria’ and ‘chronic inflammatory infiltrate’.
For the Riley index, reproducibility for the first reader (AB) was very good for ‘crypt abscesses’ (κ=1), ‘surface epithelial integrity’ (κ=0.93 (0.89–0.98)) and ‘acute inflammatory cell infiltrate’ (κ=0.89 (0.83–0.95)). For the second reader (CB), the reproducibility was lower than for the first reader but was still good, with all κ scores between 0.61 and 0.8 (‘crypt abscesses’ 0.67 (0.55–0.79); ‘surface epithelial integrity’ 0.73 (0.63–0.83); and ‘acute inflammatory cell infiltrate’ 0.75 (0.68–0.83)). For ‘crypt architectural irregularities’, ‘chronic inflammatory infiltrate’ and ‘mucin depletion’, reproducibility was moderate since at least one reader had a κ score between 0.41 and 0.60.
For the Gramlich index, the greatest reproducibility was obtained for ‘ulceration’ (0.89 (0.79–1) for the first reader (AB) and 0.87 (0.75–0.98) for the second (CB)). Reproducibility for ‘crypt abscesses’ was good since the κ values of both readers were ≥0.7 (0.75 (0.57–0.92) and 0.7 (0.5–0.91), respectively). Despite a good κ value for the first reader (AB) for ‘neutrophils in epithelium’ (κ 0.77 (0.65–0.90)), reproducibility of this item was moderate since the second reader had only fair agreement (κ 0.28 (0.05–0.51)).
For the Gupta index, the reproducibility for ‘ulceration’ was perfect (κ=1) for the first reader and good for the second reader (κ=0.67 (0.51–0.83)). For ‘neutrophils infiltration of <50% sampled crypts’ and ‘neutrophils infiltration of >50% sampled crypts’, reproducibility was low since at least one reader had a κ value between 0 and 0.20.
Interobserver agreement was good, with ICC values for the Geboes index, Riley index and GVE of 0.86 (0.80–0.90), 0.90 (0.85–0.93) and 0.90 (0.86–0.93), respectively. Krippendorff's α showed almost perfect agreement for the Gramlich index (0.88 (0.83–0.92)) and the Gupta index (0.81 (0.76–0.87)). Percentages of agreement for the Gramlich and Gupta indexes were 83.34% and 78.76%, respectively (table 3). Online supplementary table 8 summarises interobserver concordance scoring for all pairwise combinations of readers.
Interobserver agreement between the items of each index
For the Geboes index, the greatest strength of agreement (very good) was obtained for ‘erosion/ulceration’ (0.82 (0.77–0.88); 84.65% of agreement) and ‘neutrophils in lamina propria’ (0.82 (0.78–0.86); 66.67% of agreement) (table 4). Good strength of agreement was obtained for ‘neutrophils in epithelium’ (0.74 (0.68–0.8); 63.73% of agreement), ‘chronic inflammatory infiltrate’ (0.73 (0.68–0.78); 61.80% of agreement), ‘structural changes’ (0.65 (0.59–0.71); 54.58% of agreement) and ‘crypt destruction’ (0.63 (0.53–0.72); 72.23% of agreement). ‘Eosinophils in lamina propria’ obtained a fair strength of agreement (0.4 (0.32–0.48); 37.26% of agreement).
For the Riley index, the greatest strength of agreement (very good) was obtained for ‘acute inflammatory cell infiltrate’ (0.85 (0.82–0.88); 62.42% of agreement). Good agreement was obtained for ‘surface epithelial integrity’ (0.75 (0.69–0.81); 66.02% of agreement), ‘mucin depletion’ (0.73 (0.67–0.79); 60.79% of agreement), ‘chronic inflammatory infiltrate’ (0.67 (0.61–0.72); 48.04% of agreement), ‘crypt abscesses’ (0.67 (0.57–0.76); 79.10% of agreement) and ‘crypt architectural irregularities’ (0.64 (0.58–0.69); 40.02% of agreement).
For the Gramlich index, the greatest strength of agreement (almost perfect) was obtained for ‘ulceration’ (0.9 (0.79–0.97); 96.08% of agreement). Moderate strength of agreement was obtained for ‘crypt abscesses’ (0.56 (0.33–0.79); 87.59% of agreement). For ‘neutrophils in epithelium’, strength of agreement was low (0.26 (0.07–0.44)) despite 66.67% of agreement.
For the Gupta index, the greatest strength of agreement (substantial) was obtained for ‘ulceration’ (0.69 (0.58–0.8); 87.59% of agreement). For ‘neutrophils infiltration of <50% sampled crypts’, the agreement was fair (0.36 (0.18–0.55); 70.6% of agreement). Despite 88.89% of agreement, ‘neutrophils infiltration of >50% sampled crypts’ showed very low κ values (0.05 (−0.54–0.57)) since all the discrepancies (n=15) were in the same direction.
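A high percentage of agreement alongside a very low κ, as reported above, can be reproduced with a small worked example. The sketch below uses unweighted Cohen's κ for two readers and a binary item, which is a simplification of the coefficients used in this study. The counts are hypothetical and are not the study data. When an item is rarely scored as present and every discrepancy goes in the same direction, the agreement expected by chance is almost as high as the observed agreement, so κ falls towards 0.

```python
def agreement_and_kappa(both_present, a_only, b_only, both_absent):
    """Percentage agreement and unweighted Cohen's kappa from a 2x2 table.

    a_only: item scored present by reader A only; b_only: by reader B only.
    """
    n = both_present + a_only + b_only + both_absent
    if n == 0:
        raise ValueError("the table is empty")
    p_observed = (both_present + both_absent) / n
    a_present = (both_present + a_only) / n
    b_present = (both_present + b_only) / n
    # Chance agreement: both say present or both say absent, independently.
    p_expected = a_present * b_present + (1 - a_present) * (1 - b_present)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical rare item: reader A scores it present in 10 of 100 specimens,
# reader B never does, so all 10 discrepancies go in the same direction.
skewed = agreement_and_kappa(both_present=0, a_only=10, b_only=0, both_absent=90)

# Hypothetical common item with the same 90% agreement but balanced marginals.
balanced = agreement_and_kappa(both_present=45, a_only=5, b_only=5, both_absent=45)

print("skewed:   agreement %.2f, kappa %.2f" % skewed)
print("balanced: agreement %.2f, kappa %.2f" % balanced)
```

Both tables have 90% agreement, but κ is close to 0 for the skewed table and 0.8 for the balanced one, which is why the percentage of agreement and the chance-corrected coefficient are reported side by side.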
Correlation between indexes
Correlation was very strong between the Geboes and Riley indexes (correlation coefficient, R=0.93; p<0.0001), between the Geboes index and GVE (R=0.95; p<0.0001) and between the Riley index and GVE (R=0.96; p<0.0001) (figure 2).
Correlation coefficients also showed very strong correlation between the Gramlich and Geboes indexes (R=0.91; p<0.0001), between the Gramlich and Riley indexes (R=0.88; p<0.0001) and between the Gramlich index and GVE (R=0.90; p<0.0001).
Discussion
Assessment of disease activity and severity is important for designing an optimal therapeutic strategy and follow-up of patients with UC.8–10 According to the European Crohn's and Colitis Organisation, the pathology report in all chronic colitides should give an indication of the activity of disease.23,24 A standard system for grading histological activity does not exist25 and numerous methods of classification of histological activity have been proposed.6,11,12,26 None of these scores has been validated. Validation is a prerequisite for using one of the scores to standardise reporting and grading.27 Intraobserver reproducibility and interobserver agreement of histological features are necessary to validate an index.
The two main scoring methods proposed for the histological assessment of UC are the Riley score and the Geboes score.18 In 2000, Geboes et al11 developed a grading scale of histological activity in UC with six criteria assessing acute inflammation and chronicity of disease. This grading system, evaluated on 99 biopsy samples, showed good reproducibility and modest agreement with the endoscopic grading system, which it complemented.11 The Geboes index has been used as a secondary end point in relatively few clinical trials in UC.28 The Riley index was developed in 1991 and includes six histological features assessing acute and chronic inflammation. It was assessed on 82 biopsy samples for interobserver variability in the original publication and in a cohort of 215 Australian patients and was found to be highly reproducible.6,29 It has also been used in relatively few clinical trials.29–31 The differences between the Geboes index and the Riley index are the stepwise assessment and a more elaborate grading of crypt lesions and surface epithelial damage in the Geboes index.32 Reproducibility of the Gupta and Gramlich indexes is unknown.
Our study shows that there is a good correlation between all histological indices of UC activity. The intraobserver reproducibility and interobserver agreement of the histological features used to determine the indexes are good for two items: ‘erosion/ulceration’ and ‘neutrophils in lamina propria or acute inflammatory cell infiltrate’. ‘Erosion/ulceration’ is a histological feature found in all studied indexes. This explains the very good correlation between indexes. ‘Neutrophils in lamina propria’ in the Geboes index and ‘acute inflammatory cell infiltrate’ in the Riley index have the same meaning for a pathologist and appear to be a reliable histological feature, whereas detection of neutrophils in epithelial cells or crypts (‘neutrophils in epithelium’, ‘crypt abscess’, ‘neutrophil infiltration of <50% of sampled crypts’ and ‘neutrophil infiltration of >50% of sampled crypts’) is less reproducible. Furthermore, the ‘neutrophil infiltration of <50% of sampled crypts’ item is rarely found on biopsy specimens and thus has little discriminating power. This finding confirms previous studies showing that neutrophils can be assessed reproducibly and that interobserver agreement is good for histological features associated with neutrophils.6,11,16,17 The ‘chronic inflammatory infiltrate’ and ‘eosinophils in lamina propria’ items show moderate reproducibility and interobserver agreement; this could be explained by the fact that these cells are normally present in the lamina propria and the morphological criteria for an increase in these cells are not clearly defined. There are no values defining a mild, moderate or severe increase. The boundaries of the normal cellular lamina propria infiltrate need clarification and standardisation.11,33,34 This finding confirms the earlier study11 that showed that evaluation of the lamina propria mononuclear infiltrate remained a problem and that identification of eosinophils is another source of disagreement.
Furthermore, the distribution of increased inflammatory cells in the lamina propria is often heterogeneous in UC and varies from one biopsy to another.11 Disagreement is partly due to the presence of more than one sample and discontinuity of the lesion when samples are compared.11 Disagreement in ‘architectural changes’ was mainly due to the availability of more than one sample from the same patient and the same area. Histological features that define architectural changes are difficult to use. Contrary to Riley et al,6 who showed a good reproducibility for ‘mucin depletion’, we found a moderate reproducibility and interobserver agreement for this feature. Assessment of this feature depends on the fixation and staining parameters of the sample, which may explain the discrepancy between our findings and those initially reported by Riley et al.6
In conclusion, correlation between indexes is strong. Intraobserver reproducibility and interobserver agreement for all indexes are very good. ‘Erosion/ulceration’ and ‘acute inflammatory cells infiltrate/neutrophils in lamina propria’ showed the best reproducibility among the histological items. These findings should be taken into account for the assessment of histological disease activity in UC in clinical practice.
Contributors AB performed the research, analysis and interpretation of data, and drafting of the manuscript; CB, SD and CBR performed the research; JS contributed to analysis of data and drafting of the manuscript; LP-B was responsible for the conception and design of the study, analysis of data and drafting of the manuscript.
Competing interests None.
Ethics approval CNIL.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement We agree to share data.