Reliability among central readers in the evaluation of endoscopic findings from patients with Crohn's disease

Reena Khanna; Guangyong Zou; Geert D'Haens; Paul Rutgeerts; J W D McDonald; Marco Daperno; Brian G Feagan; William J Sandborn; Elena Dubcenco; Larry Stitt; Margaret K Vandervoort; Allan Donner; Allison Luo; Barrett G Levesque

doi:10.1136/gutjnl-2014-308973

Endoscopy

Original article

Reena Khanna¹,², Guangyong Zou¹,³, Geert D'Haens¹,⁴, Paul Rutgeerts⁵, J W D McDonald¹, Marco Daperno⁶, Brian G Feagan¹,²,³, William J Sandborn¹,⁷, Elena Dubcenco¹, Larry Stitt¹, Margaret K Vandervoort¹, Allan Donner¹, Allison Luo⁸, Barrett G Levesque¹,⁷

¹Robarts Clinical Trials, Robarts Research Institute, University of Western Ontario, London, Ontario, Canada
²Department of Medicine, University of Western Ontario, London, Ontario, Canada
³Department of Epidemiology and Biostatistics, University of Western Ontario, London, Ontario, Canada
⁴Inflammatory Bowel Disease Centre, Academic Medical Centre, Amsterdam, The Netherlands
⁵Department of Gastroenterology, University Hospital, Gasthuisberg, Leuven, Belgium
⁶Gastroenterology Division, A.O. Ordine Mauriziano, Torino, Italy
⁷Division of Gastroenterology, University of California San Diego, La Jolla, California, USA
⁸Bristol-Myers Squibb, Princeton, New Jersey, USA

Correspondence to Dr Brian G Feagan, Robarts Clinical Trials, Departments of Medicine, Epidemiology and Biostatistics, University of Western Ontario, London, Ontario, Canada N6A 5K8; brian.feagan{at}robartsinc.com

Abstract

Objective The Crohn's Disease Endoscopic Index of Severity (CDEIS) and Simple Endoscopic Score for Crohn's Disease (SES-CD) are commonly used to assess Crohn's disease (CD) activity; however, neither instrument has been fully validated. We assessed intra-rater and inter-rater reliability of these indices.

Design Video recordings of colonoscopies obtained from 50 patients with CD who participated in an induction trial of a biological therapy were triplicated and reviewed in random order by four central readers. Data were used to assess intra-rater and inter-rater reliability for CDEIS, SES-CD and a global evaluation of lesion severity (GELS). Subsequently, readers participated in a consensus process that identified common sources of disagreement.

Results Intraclass correlation coefficients (ICCs) for intra-rater reliability for CDEIS, SES-CD and GELS (95% CIs) were 0.89 (0.86 to 0.93), 0.91 (0.89 to 0.95) and 0.81 (0.77 to 0.89), respectively, with standard error of measurement (SEM) of 2.10, 2.42 and 1.15. The corresponding ICCs for inter-rater reliability were 0.71 (0.63 to 0.76), 0.83 (0.75 to 0.88) and 0.62 (0.52 to 0.70), with SEM of 3.42, 3.07 and 1.63, respectively. Correlation between CDEIS and GELS was 0.75, between SES-CD and GELS was 0.74 and between CDEIS and SES-CD was 0.92. The most common sources of disagreement were interpretation of superficial ulceration, definition of disease site at the ileocolonic anastomosis, assessment of anorectal lesions and grading severity of stenosis.

Conclusions Central reading of CDEIS and SES-CD had ‘substantial’ to ‘almost perfect’ intra-rater and inter-rater reliability; however, the responsiveness of these instruments is yet to be determined.

Trial registration number Clinicaltrials.gov NCT01466374.

INFLAMMATORY BOWEL DISEASE
ENDOSCOPY

Significance of this study

What is already known on this subject?

Two endoscopic indices, the Crohn's Disease Endoscopic Index of Severity (CDEIS) and Simple Endoscopic Score for Crohn's Disease (SES-CD), have been widely used to assess Crohn's disease (CD) activity in clinical trials.
Neither endoscopic index has been fully validated.
Validated outcome measures are necessary for the conduct of clinical trials.

What are the new findings?

Central reading of CDEIS and SES-CD had ‘substantial’ to ‘almost perfect’ intra-rater and inter-rater reliability.
Four aspects of endoscopic assessment led to the greatest disagreement among readers: interpretation of superficial ulceration, defining the location of ulcers involving two contiguous segments, differentiation between anal and rectal lesions, and grading the severity of stenosis.
A consensus process identified the sources of variance and methods to standardise these assessments.

How might it impact on clinical practice in the foreseeable future?

Validation of outcome measures is necessary for clinical trial design.
Based on these results, the responsiveness of the indices and modifications to the indices will be evaluated.
Application of standardised rules may result in more consistent reporting of endoscopic activity of CD.

Clinical trials require validated instruments to determine disease activity. The Crohn's Disease Activity Index (CDAI), a composite instrument that incorporates patient reported outcomes, items generated from the physical examination, and the haematocrit, was developed more than 30 years ago to assess both patient eligibility and treatment efficacy in clinical trials.1,2 However, the CDAI is heavily weighted towards three subjective items, namely, abdominal pain, the presence of loose bowel movements and general well-being, that are recorded by the patient using a 7-day diary. Recently, two important concerns have arisen regarding the CDAI.

First, high placebo response rates are observed when patients are included in trials exclusively on the basis of CDAI-defined disease activity.3–5 In the SONIC trial,6 18% of patients who met the minimum CDAI entry criterion of ≥220 points had no objective evidence of active disease, as defined by the presence of lesions on ileocolonoscopy or an elevated C-reactive protein (CRP). A high remission rate (30%) was observed in these patients and no effect of infliximab/azathioprine treatment was identifiable in this subgroup, despite the presence of a compelling benefit of combination therapy in the overall population. A post hoc analysis that excluded these patients resulted in a greater estimate of the treatment effect.6 Hence, exclusive use of the CDAI to determine eligibility may result in the inclusion of a substantial proportion of patients without inflammatory disease, who will not contribute to the identification of a treatment effect, should one actually exist. Although this problem might be reduced through the adjunctive use of serum or faecal markers to confirm the presence of active disease,5 false positive and false negative rates are important limitations of these tests.

A second concern relates to the use of the CDAI to assess treatment efficacy. The patient-generated items that dominate the score are subjective, and they are also non-specific. Thus, other disease processes such as IBS,7 bile salt diarrhoea and non-inflammatory strictures can influence scores. As a result, the CDAI is an imprecise measure that requires relatively large numbers of patients, typically a minimum of 50 or 60 per treatment group, to identify a therapeutic benefit. The statistical inefficiency of this instrument is an important constraint to early drug development and to identifying smaller effect sizes in later drug development.

Given these considerations, an endoscopy-based index is an attractive alternative to the CDAI: it is relatively objective,2 can be standardised and directly evaluates the inflammatory process of the underlying disease. Moreover, as was recently demonstrated for UC,8 central reading of digitally recorded endoscopic images by experts has the potential to reduce both measurement variability and bias.

Two endoscopic indices (EIs) currently exist to evaluate disease activity in CD. The Crohn's Disease Endoscopic Index of Severity (CDEIS),9,10 which was originally developed for use in a multicentre clinical trial of corticosteroid therapy, is based on scoring both the activity and extent of disease involvement within five intestinal segments (figure 1A). The inherent complexities of the CDEIS, in particular the need to estimate the percentage of surface area affected within each segment, led Daperno and colleagues to develop a modification, the Simple Endoscopic Score for Crohn's Disease (SES-CD).11 The principal difference between these instruments is that the SES-CD reduces the complexity of surface area estimation and the number of calculations (figure 1B). Preliminary validation data have been obtained for both instruments; however, intra-rater and inter-rater reliability have not been evaluated in large-scale studies, and formal assessments of other operating properties, such as reliability and responsiveness, have not been reported. Based on these considerations, we determined whether these indices are sufficiently reproducible for use as outcome measures in multicentre randomised controlled trials.

Figure 1

(A) Crohn's Disease Endoscopic Index of Severity. *For partially explored segments and for the ileum, the 10 cm linear scale represents the surface effectively explored.9 (B) Simple Endoscopic Score for Crohn's Disease.11

Materials and methods

Study design

Fifty video recordings were obtained from patients with active CD who were eligible for participation in a randomised controlled trial of induction therapy using a biologic agent (Clinicaltrials.gov NCT01466374). The study was conducted in 33 centres across the USA, Puerto Rico, South Africa and Europe. These patients had a minimum CDAI score of 220 points and a CRP concentration ≥5 mg/L or a faecal calprotectin concentration ≥250 µg/g. Although endoscopic evidence of disease activity was not a requirement for participation in the trial, this source of videos ensured that (1) a broad spectrum of endoscopic disease activity was available for evaluation and (2) that the assessments occurred in a population of patients who were representative of patients participating in clinical trials.

Central readers independently reviewed the 50 recordings, which were marked by segment (rectum, descending and sigmoid colon, transverse colon, ascending colon and terminal ileum), in the absence of clinical information. Disease severity was scored using the CDEIS, SES-CD and a global evaluation of lesion severity (GELS) measured on a 10 cm visual analogue scale where 0 represents no disease activity and 10 represents severe disease activity. The CDEIS score ranges from 0 to 44. The SES-CD score for each item ranges from 0 to 15, with the exception of stenosis, which ranges between 0 and 11. The total SES-CD score ranges between 0 and 56. Higher values indicate more severe endoscopic disease for both instruments.

Since the calculations for each EI were performed centrally, the readers were not aware of the total score when they graded the GELS.

Four gastroenterologists with extensive experience in scoring the two instruments participated as central readers (PR, JWDM, MD, GDH). The central readers were trained in the use of the central image management system that hosted the videos. Although these readers had participated in multiple clinical trials that used the EIs as outcome measures, for the purposes of this study additional standardised training materials were provided that specifically detailed scoring of the instruments.

This study was conducted from 2013 to 2014 and evaluated the intra-rater and inter-rater reliability of the CDEIS and the SES-CD using a design in which four raters performed three independent measurements for each of the 50 videos. To ensure that the three measurements were true replicates, each of the 50 videos was replicated twice to create 150 samples that were read by the four readers in random order. To minimise recall bias, the sequence was generated with the specification that copies of the same video were separated by a minimum of 20 videos. Reading occurred over a period of 7 months. Intra-rater reliability was quantified by the correlation between a pair of randomly selected measurements on the same video made by a randomly selected rater, while inter-rater reliability was quantified by the correlation between randomly selected measurements on the same video made by a pair of randomly selected raters.
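As an illustration of the sequencing constraint described above, the following Python sketch generates one admissible reading order. It is not the procedure used in the study, which is not described in detail here. It assumes that 'separated by a minimum of 20 videos' means that at least 20 other recordings lie between two copies of the same video, and it uses a simple three-block construction (each block is a random permutation of all videos) with rejection sampling. The function names, the block construction and the seed are hypothetical.

```python
import random
from collections import Counter


def reading_sequence(n_videos=50, n_copies=3, min_between=20, seed=0, max_tries=100_000):
    """Return a reading order in which copies of a video are kept apart.

    Assumption: the order is built from n_copies blocks, each a random
    permutation of all videos; a candidate block is redrawn until every video
    has at least `min_between` other videos between its copy in the previous
    block and its copy in the new block.
    """
    rng = random.Random(seed)
    blocks = [rng.sample(range(n_videos), n_videos)]
    while len(blocks) < n_copies:
        prev_pos = {video: pos for pos, video in enumerate(blocks[-1])}
        for _ in range(max_tries):
            cand = rng.sample(range(n_videos), n_videos)
            # Videos after the earlier copy in its block, plus videos before
            # the new copy in the candidate block.
            if all((n_videos - 1 - prev_pos[v]) + pos >= min_between
                   for pos, v in enumerate(cand)):
                blocks.append(cand)
                break
        else:
            raise RuntimeError("no admissible block found; relax min_between")
    return [video for block in blocks for video in block]


def videos_between(seq):
    """Smallest number of other videos lying between two copies of one video."""
    last_seen, smallest = {}, len(seq)
    for pos, video in enumerate(seq):
        if video in last_seen:
            smallest = min(smallest, pos - last_seen[video] - 1)
        last_seen[video] = pos
    return smallest


sequence = reading_sequence()
print(len(sequence), videos_between(sequence))
```

The block construction is more restrictive than the stated requirement (it also forces each video to appear once in every third of the sequence), which is one simple way to guarantee the separation; a fully unrestricted random order satisfying the same constraint would need a different sampler.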

Following completion of the initial reading process, sources of disagreement among readers were identified using a two-step procedure. First, outlying videos were identified using case-deletion diagnostics for mixed models.12 This method allows assessment of the impact of observations from one patient on reliability estimates, by temporarily deleting these observations and obtaining the estimates for the remaining n-1 patients. The new estimate was compared with the original value which included the full dataset. This process was continued until the impact of each of the 50 patients was assessed. Since intraclass correlation coefficients (ICC) are defined by variance components, videos that had the largest impact on estimates of variance components were identified as outliers in the estimation of ICCs. Second, a consensus process was used to identify the most common items leading to disagreement, after re-reading the videos selected by the case-deletion diagnostics. Each reader completed a survey to identify potential sources of disagreement. Results were discussed among the group to attain consensus regarding the sources of variance and methods to standardise these assessments. In addition, the items with the greatest source of disagreement were removed and the modified CDEIS and SES-CD were recalculated.
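The leave-one-patient-out logic of the first step can be sketched as follows. This is a simplified illustration, not the mixed-model case-deletion diagnostic used in the study: as a stand-in estimator it uses a one-way random-effects ICC, and it ranks videos by the raw change in that estimate rather than by a formal influence statistic. The ratings are hypothetical, with one video (index 5) deliberately scored inconsistently.

```python
def one_way_icc(videos):
    """One-way random-effects ICC; `videos` is a list of equal-length rating lists."""
    n, k = len(videos), len(videos[0])
    means = [sum(v) / k for v in videos]
    grand = sum(means) / n
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2 for v, m in zip(videos, means) for x in v) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def case_deletion(videos, estimator=one_way_icc):
    """Delete each video in turn and record the change in the estimate.

    Returns (video index, estimate without that video minus full estimate),
    sorted so that the most influential video comes first.
    """
    full = estimator(videos)
    changes = [(i, estimator(videos[:i] + videos[i + 1:]) - full)
               for i in range(len(videos))]
    return sorted(changes, key=lambda change: abs(change[1]), reverse=True)


# Hypothetical ratings: eight videos, four readings each.
ratings = [[3 * i, 3 * i + 1, 3 * i, 3 * i + 1] for i in range(8)]
ratings[5] = [2, 14, 5, 20]  # readers disagree strongly on this video

ranked = case_deletion(ratings)
print(ranked[0])  # index of the most influential video and its effect on the ICC
```

Videos at the top of the ranking are the candidates that would be re-read in the second, consensus step.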

Statistical analyses

Descriptive statistics were used to assess the clinical characteristics of the patients.

A distinctive feature of this study is that each reader made repeated measurements, rendering classic methods, such as those of Shrout and Fleiss,13 inapplicable. Instead, point estimates for reliability were obtained using methods suggested by Eliasziw.14 Briefly, the ICC for inter-rater reliability is defined as the covariance between two measurements made by different readers on the same patient divided by the total variance, while that for intra-rater reliability is defined as the covariance between two measurements made by the same reader on the same patient divided by the total variance. For estimation of these ICCs, a two-way random effects model with interaction between videos and readers was adopted for the CDEIS, SES-CD and GELS. The rater effects were regarded as random so that the emphasis could be directed at the reliability of the measurement process rather than on potential differences among raters. The two-way random effects model for a repeated measures design14 partitions total variability of measurements into four components: video, rater, inter-rater and intra-rater random errors. The intra-rater ICC can then be defined as the sum of variance components for video, rater and inter-rater divided by the total variance, while the inter-rater ICC can be defined as the variance component for video divided by the total variance.14,15 To avoid the normality assumption for the raw data, the associated two-sided 95% CIs for ICCs were obtained using the non-parametric percentile bootstrap method, commonly known as the cluster bootstrap method,16 with 2000 replicates sampled with replacement at the level of the video to maintain the structure of the data.
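The definitions above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the software used in the study: variance components are estimated by the balanced-design ANOVA method of moments (the study's estimation method may differ), negative estimates are truncated at zero, the percentile CI uses simple order statistics, and the data are simulated with hypothetical standard deviations rather than taken from the trial.

```python
import random
from statistics import fmean


def variance_components(y):
    """y[i][j][l] = score of video i by rater j on replicate l (balanced design)."""
    n, k, m = len(y), len(y[0]), len(y[0][0])
    cell = [[fmean(y[i][j]) for j in range(k)] for i in range(n)]
    video = [fmean(row) for row in cell]
    rater = [fmean([cell[i][j] for i in range(n)]) for j in range(k)]
    grand = fmean(video)
    ms_video = k * m * sum((v - grand) ** 2 for v in video) / (n - 1)
    ms_rater = n * m * sum((r - grand) ** 2 for r in rater) / (k - 1)
    ms_inter = m * sum((cell[i][j] - video[i] - rater[j] + grand) ** 2
                       for i in range(n) for j in range(k)) / ((n - 1) * (k - 1))
    ms_error = sum((x - cell[i][j]) ** 2 for i in range(n) for j in range(k)
                   for x in y[i][j]) / (n * k * (m - 1))
    # Method-of-moments estimates; negative estimates are truncated at zero.
    return {
        "video": max((ms_video - ms_inter) / (k * m), 0.0),
        "rater": max((ms_rater - ms_inter) / (n * m), 0.0),
        "inter": max((ms_inter - ms_error) / m, 0.0),
        "intra": ms_error,
    }


def iccs(y):
    """Return (intra-rater ICC, inter-rater ICC) as defined in the text."""
    vc = variance_components(y)
    total = sum(vc.values())
    return (vc["video"] + vc["rater"] + vc["inter"]) / total, vc["video"] / total


def cluster_bootstrap_ci(y, n_boot=2000, level=0.95, seed=0):
    """Percentile CIs, resampling whole videos with replacement."""
    rng = random.Random(seed)
    n = len(y)
    draws = [iccs([y[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]
    lo = int(n_boot * (1 - level) / 2)
    hi = n_boot - 1 - lo
    cis = []
    for column in range(2):
        values = sorted(draw[column] for draw in draws)
        cis.append((values[lo], values[hi]))
    return cis  # [(intra lower, intra upper), (inter lower, inter upper)]


def simulate(n=50, k=4, m=3, sd_video=4.0, sd_rater=1.0, sd_inter=1.5, sd_error=2.0, seed=1):
    """Hypothetical scores from the two-way random effects model (not study data)."""
    rng = random.Random(seed)
    rater_effect = [rng.gauss(0, sd_rater) for _ in range(k)]
    data = []
    for _ in range(n):
        video_effect = rng.gauss(0, sd_video)
        row = []
        for j in range(k):
            interaction = rng.gauss(0, sd_inter)
            row.append([10 + video_effect + rater_effect[j] + interaction
                        + rng.gauss(0, sd_error) for _ in range(m)])
        data.append(row)
    return data


scores = simulate()
print(iccs(scores))
print(cluster_bootstrap_ci(scores))
```

Resampling at the level of the video keeps all 12 readings of a video together, which is what preserves the correlation structure when the CIs are formed.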

The strength of reliability was interpreted according to the subjective, but well-established, benchmarks of Landis and Koch, whereby kappa values of <0.00, 0.00–0.20, 0.21–0.40, 0.41–0.60, 0.61–0.80 and above 0.80 indicate poor, slight, fair, moderate, substantial and almost perfect reliability, respectively.17

These benchmarks are more conservative than those suggested by Fleiss et al18 and Cicchetti19 who unified the benchmarks for kappa, weighted kappa and ICCs. Correlations between the EIs and the GELS were estimated using mixed models to account for repeated measurements,20 with 95% CI obtained using the cluster bootstrap percentile method.16
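For illustration, the Landis and Koch benchmarks quoted above can be written as a small lookup and applied to the point estimates reported in the Abstract. This is a minimal sketch; the function name is hypothetical, and values that fall in the gaps between the published bands (for example 0.205) are assigned to the lower band, which is an assumption.

```python
def landis_koch(value):
    """Map a reliability coefficient to the Landis and Koch label."""
    if value < 0.0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if value <= upper:
            return label
    return "almost perfect"


# (intra-rater ICC, inter-rater ICC) point estimates from the Abstract
reported = {"CDEIS": (0.89, 0.71), "SES-CD": (0.91, 0.83), "GELS": (0.81, 0.62)}
for index, (intra, inter) in reported.items():
    print(index, landis_koch(intra), landis_koch(inter))
```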

As pointed out in the previous literature,14,21,22 standard error of measurement (SEM) is another important aspect of a reliability study. For the design in the present study, inter-rater SEM may be defined as the square root of the sum of variance components for rater, inter-rater and intra-rater random error, while the intra-rater SEM is the square root of intra-rater random error.
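The SEM definitions above follow directly from the four variance components. The sketch below uses hypothetical component values (not estimates from this study) and also shows the algebraic link between each SEM and the corresponding ICC, namely that SEM squared equals the total variance multiplied by one minus the ICC.

```python
from math import isclose, sqrt


def sem_and_icc(video, rater, inter, intra):
    """Compute SEMs and ICCs from variance components (all on the score scale squared)."""
    total = video + rater + inter + intra
    return {
        "total": total,
        "sem_inter": sqrt(rater + inter + intra),
        "sem_intra": sqrt(intra),
        "icc_inter": video / total,
        "icc_intra": (video + rater + inter) / total,
    }


# Hypothetical variance components, for illustration only.
result = sem_and_icc(video=28.0, rater=2.0, inter=5.0, intra=4.0)
print(result)

# Identity implied by the definitions: SEM^2 = total variance * (1 - ICC).
assert isclose(result["sem_inter"], sqrt(result["total"] * (1 - result["icc_inter"])))
assert isclose(result["sem_intra"], sqrt(result["total"] * (1 - result["icc_intra"])))
```

Because the SEM is expressed in the units of the index, it complements the unitless ICC when judging how large a change in score exceeds measurement noise.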

The results of the consensus process to determine sources of disagreement were reported using descriptive methods. Regression of the CDEIS and SES-CD against the GELS after exclusion of these items was used to assess correlation coefficients of the modified indices.

The total number of videos required was estimated using the method suggested by Zou;23 the estimate is conservative because it does not take the triplicate readings into account. Assuming a true ICC of 0.75, the rating of 50 videos by a minimum of four central readers would yield an 83% chance of obtaining a one-sided 95% lower boundary that is greater than 0.6, the ‘substantial’ reliability criterion. A table of sample sizes for various statistical parameters is provided in online supplementary table S1.

The endoscopic videos analysed in this study were obtained during the execution of a separate clinical trial protocol. The informed consent obtained during the clinical trial complied with the International Conference on Harmonisation Good Clinical Practice guidelines and all applicable regulatory requirements. The consent of study subjects included the use of the collected data for other medical purposes. All subject information used in this study was de-identified with respect to the originating study, subject identification number and investigational site. The video assessments were based on the visual information presented in the video, and did not affect the corresponding subject's treatment or well-being.

Results

Study population

Demographic information on the patients from whom the 50 videos were obtained is summarised in table 1. The average age of the patients was 39.6 years, and 46.9% were men. The range of endoscopic activity for the videos, based on the average GELS score of the four readers, is shown in figure 2.

Table 1

Patient demographics and baseline characteristics

Figure 2

Histogram of Global Evaluation of Lesion Severity (GELS). n=49. GELS measured on a 10 cm visual analogue scale where 0 represents no disease activity and 10 represents severe disease. The average GELS score for each video was based on 12 readings (four readers, each scoring three copies of the video).

Intra-rater and inter-rater reliability

ICCs for intra-rater agreement for the CDEIS, SES-CD and GELS scores (95% CIs) were 0.89 (0.86–0.93), 0.91 (0.89–0.95) and 0.81 (0.77–0.89), respectively, with SEM of 2.10, 2.42 and 1.15. The corresponding ICCs for inter-rater agreement were 0.71 (0.63–0.76), 0.83 (0.75–0.88) and 0.62 (0.52–0.70), respectively, with SEM of 3.42, 3.07 and 1.63. The item level data are shown in tables 2 and 3. Stenosis performed the most poorly of the items, with the lowest ICCs for intra-rater and inter-rater agreement for both the CDEIS and SES-CD.

Table 2

Reliability of CDEIS and its components*

Table 3

Reliability of SES-CD and its components*

Correlation between CDEIS, SES-CD and GELS

The correlation coefficients between CDEIS and GELS, SES-CD and GELS, and CDEIS and SES-CD were 0.75 (95% CI 0.67 to 0.81), 0.74 (95% CI 0.64 to 0.81) and 0.92 (95% CI 0.88 to 0.95), respectively (table 4).

Table 4

Correlation between the indices

Disagreement and consensus on scoring CDEIS and SES-CD

Of the 50 videos, 10 were responsible for the greatest disagreement. Based on the consensus process, the most common sources of disagreement were interpretation of superficial ulceration, defining location of ulcers involving two contiguous segments, differentiating between anal and rectal lesions and grading severity of stenosis. For the CDEIS and SES-CD, assessment of stenosis contributed the most to disagreement, both by examination of the item level ICCs and by their influence on variance components as exhibited by Cook's D statistic. The correlation coefficients for the original and modified indices (excluding evaluation of stenosis) with GELS were nearly identical (table 4).

Based on a consensus process, conventions to standardise the interpretation of the items with the greatest disagreement were developed. A detailed description of the new item definitions and conventions is provided in table 5; a summary of the major recommendations follows.

In both the CDEIS and SES-CD, superficial ulcers, including aphthous ulcerations, contribute to the ulcerated surface.
Ulcerations confined to the junction of two segments are assigned to the most affected segment.
Anal lesions will contribute to the rectal score.
Narrowing was defined as a decrease in the diameter of the lumen, compared with the previous and subsequent bowel, which does not insufflate with air, and may be associated with trauma when passing the lesion. A stenosis that cannot be passed without dilation is scored as ‘stenosis’ or a ‘non-passable narrowing’ in the CDEIS and SES-CD, respectively. Stenoses spanning two segments are assigned to the distal segment, with the exception of anal stenosis that is assigned to the rectal segment. When dilation is required to complete a screening procedure, the patient will be deemed ineligible for enrolment in a clinical trial; however, if dilation is performed in a setting other than screening, segments proximal to the dilated stenosis may be scored provided that a sufficient proportion is observed.

Table 5

Consensus panel scoring results

Discussion

Theoretically, endoscopic assessment should improve the efficiency of clinical trials by ensuring that eligible patients have active inflammation and by reducing measurement variability. For these advantages to be realised, valid instruments with high intra-rater and inter-rater reliability are required. Although CDEIS and SES-CD correlate with each other as well as with changes in clinical disease severity,24 few studies have evaluated their operating properties.

In the current study, central reading of the EIs for CD, by experts, had ‘substantial’ to ‘almost perfect’ intra-rater and inter-rater reliability. These results indicate that central reading is highly reliable for the assessment of CD endoscopic disease activity. Furthermore, the substantial correlation observed between the EIs and the GELS-based global rating of disease activity supports their validity. Nevertheless, our study identified specific items in both instruments with suboptimal performance. The most common sources of disagreement were the interpretation of superficial ulceration, defining location of ulcers involving two contiguous segments, scoring anal lesions and grading stenosis. These findings underscore the need for improved definitions to increase the reliability in scoring of these items.

As a result of a consensus process, we suggest that superficial ulcers should be included in the assessment of the proportion of ulcerated surface. Despite training and the consensus process, there was continued disagreement regarding the scoring of this item in the CDEIS. As a result, consensus regarding a convention was not achieved. The lack of standardisation when scoring superficial ulcers remains a weakness of the CDEIS, as there is an inherent challenge in distinguishing between superficial and deep ulcers on a flat screen. Anastomotic lesions should be scored as part of the more affected segment, thereby avoiding inflating the score by attributing the lesion to both segments. Anal canal lesions are included in the assessment of the rectum. Although this region is technically difficult to assess due to the inherent erythema and limited visualisation from poor distension, anal disease can be an important cause of symptoms.

Stenosis performed the most poorly, both in terms of influence on the variance components and by inspection of item level ICCs. Removal of stenosis from the CDEIS and SES-CD did not change the correlations of these modified indices compared with the original scales and GELS (table 4). Furthermore, stenosis is unlikely to contribute to the evaluation of disease activity for several reasons:

Unlike ulceration, stenosis occurs relatively infrequently.
Once a stenosis is encountered, it frequently cannot be passed, and the remaining segments cannot be scored. Importantly, the SES-CD does not provide a procedure to account for the segments that are not visualised.
Stenosis performed poorly in both the CDEIS and SES-CD.
Stenosis is unlikely to be as responsive to therapy as the other endoscopic items.

Accordingly, our results suggest that stenosis might be removed from these indices to simplify scoring without influencing the correlation with GELS. However, a study that evaluates assessment with the original EIs and retests with the modified indices (without stenosis) is required prior to adopting them into routine use. In the interim, the consensus process has defined narrowing, attributed these lesions to the distal segment and established conventions for scoring in the setting of dilation. Application of these standard conventions provides a more rigorous assessment by eliminating the variability in interpretation of stenotic lesions.

This work has several limitations. First, these results are based on a single trial that evaluated one agent. However, the endoscopic videos were assessed by blinded central readers who were unaware of treatment strategy or study of origin. Second, as the current central readers were highly experienced in the use of the EIs, these results may not be generalisable to other readers. However, the objective of this study was to define the optimum operating characteristics and to identify potential sources of disagreement, when using the CDEIS and SES-CD as outcome measures in clinical trials, which mandated the use of experienced central readers. Finally, and most importantly, the responsiveness of these indices is yet to be determined by examining baseline and post-treatment videos in patients who receive a treatment of known efficacy. However, an instrument with high inter-observer reliability is likely to be more responsive if multiple readers are used. The SES-CD has shown numerically higher ICCs and a similar correlation with the GELS (0.74 vs 0.75 for the CDEIS), and is less cumbersome to calculate than the CDEIS, but further study is required before it can be considered the instrument of choice for use in clinical research.

By determining the reliability of the EIs for CD, this study has taken an important step forward in validating the CDEIS and SES-CD. However, to be clinically useful, these endoscopic instruments must be valid, responsive, reliable and feasible. These remaining properties are being evaluated in continuing studies that will ultimately define the optimal endoscopic score for use in clinical trials.

Acknowledgments

This research was based on data obtained during the conduct of a Bristol-Myers Squibb trial of induction therapy for a biological agent.

References

1. Best WR, Becktel JM, Singleton JW, et al. Development of a Crohn's disease activity index. National Cooperative Crohn's Disease Study. Gastroenterology 1976;70:439–44.
2. Sandborn WJ, Feagan BG, Hanauer SB, et al. A review of activity indices and efficacy endpoints for clinical trials of medical therapy in adults with Crohn's disease. Gastroenterology 2002;122:512–30. doi:10.1053/gast.2002.31072
3. Schreiber S, Khaliq-Kareemi M, Lawrance IC, et al. Maintenance therapy with certolizumab pegol for Crohn's disease. N Engl J Med 2007;357:239–50. doi:10.1056/NEJMoa062897
4. Hanauer SB, Feagan BG, Lichtenstein GR, et al. Maintenance infliximab for Crohn's disease: the ACCENT I randomised trial. Lancet 2002;359:1541–9. doi:10.1016/S0140-6736(02)08512-4
5. Sandborn WJ, Gasink C, Gao LL, et al. Ustekinumab induction and maintenance therapy in refractory Crohn's disease. N Engl J Med 2012;367:1519–28. doi:10.1056/NEJMoa1203572
6. Colombel JF, Sandborn WJ, Reinisch W, et al. Infliximab, azathioprine, or combination therapy for Crohn's disease. N Engl J Med 2010;362:1383–95. doi:10.1056/NEJMoa0904492
7. Lahiff C, Safaie P, Awais A, et al. The Crohn's disease activity index (CDAI) is similarly elevated in patients with Crohn's disease and in patients with irritable bowel syndrome. Aliment Pharmacol Ther 2013;37:786–94. doi:10.1111/apt.12262
8. Feagan BG, Sandborn WJ, D'Haens G, et al. The role of centralized reading of endoscopy in a randomized controlled trial of mesalamine for ulcerative colitis. Gastroenterology 2013;145:149–57.e2. doi:10.1053/j.gastro.2013.03.025
9. Mary JY, Modigliani R. Development and validation of an endoscopic index of the severity for Crohn's disease: a prospective multicentre study. Groupe d'Etudes Therapeutiques des Affections Inflammatoires du Tube Digestif (GETAID). Gut 1989;30:983–9. doi:10.1136/gut.30.7.983
10. Modigliani R, Mary JY. Reproducibility of colonoscopic findings in Crohn's disease: a prospective multicenter study of interobserver variation. Groupe d'Etudes Therapeutiques des Affections Inflammatoires du Tube Digestif (GETAID). Dig Dis Sci 1987;32:1370–9. doi:10.1007/BF01296663
11. Daperno M, D'Haens G, Van Assche G, et al. Development and validation of a new, simplified endoscopic activity score for Crohn's disease: the SES-CD. Gastrointest Endosc 2004;60:505–12. doi:10.1016/S0016-5107(04)01878-4
12. Christensen R, Pearson LM, Johnson W. Case-deletion diagnostics for mixed models. Technometrics 1992;34:38–45. doi:10.2307/1269550
13. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8. doi:10.1037/0033-2909.86.2.420
14. Eliasziw M, Young SL, Woodbury MG, et al. Statistical methodology for the concurrent assessment of interrater and intra-rater reliability: using goniometric measurements as an example. Phys Ther 1994;74:777–88.
15. Damon RA, Harvey WR. Experimental design, ANOVA, and regression. New York: Harper & Row, 1987.
16. Davison AC, Hinkley DV. Bootstrap methods and their application. Vol. 1. Cambridge University Press, 1997.
17. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74. doi:10.2307/2529310
18. Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. 3rd edn. Hoboken, New Jersey: John Wiley & Sons, Inc., 2003.
19. Cicchetti D. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994;6:284–90. doi:10.1037/1040-3590.6.4.284
20. Hamlett A, Ryan L, Serrano-Trespalacios P, et al. Mixed models for assessing correlation in the presence of replication. J Air Waste Manag Assoc 2003;53:442–50. doi:10.1080/10473289.2003.10466174
21. de Vet HC, Terwee CB, Knol DL, et al. When to use agreement versus reliability measures. J Clin Epidemiol 2006;59:1033–9. doi:10.1016/j.jclinepi.2005.10.015
22. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res 2005;19:231–40.
23. Zou GY. Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. Stat Med 2012;31:3972–81. doi:10.1002/sim.5466
24. Sipponen T, Nuutinen H, Turunen U, et al. Endoscopic evaluation of Crohn's disease activity: comparison of the CDEIS and the SES-CD. Inflamm Bowel Dis 2010;16:2131–6. doi:10.1002/ibd.21300

Supplementary materials

Supplementary Data

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Files in this Data Supplement:

Data supplement 1 - Online supplement

Footnotes

Contributors RK and BGL: study concept and design, analysis and interpretation of data, drafting the manuscript, critical revision of the manuscript. LS, AD and GYZ: study concept and design, analysis and interpretation of data, statistical analysis, critical revision of the manuscript. GDH, PR, JWDM, MD and AL: study concept and design, critical revision of the manuscript. BGF and WJS: study concept and design, analysis and interpretation of data, critical revision of the manuscript. ED: study concept and design, administrative, study supervision. MKV: administrative, study supervision.
Competing interests RK has received honoraria from Takeda Pharma and consulting fees from AbbVie. GYZ is an employee of Robarts Clinical Trials which was the research organisation that conducted this study. GDH has received consulting fees from Abbott/AbbVie, ActoGeniX NV, Amgen, AM-Pharma BV, Boehringer-Ingelheim, ChemoCentryx, Centocor/Janssen Biologics, Cosmo Technologies, Elan/Biogen, EnGene, Ferring Pharmaceuticals, Gilead Sciences, Given Imaging, GSK, Merck Research Laboratories, Merck Serono, Millennium Pharmaceuticals, Novo Nordisk, NPS Pharmaceuticals, PDL Biopharma, Pfizer, Receptos, Salix Pharmaceuticals, Schering Plough, Shire Pharmaceuticals, Sigmoid Pharma, Teva Pharmaceuticals, Tillotts Pharma AG, UCB Pharma; research grants from AbbVie, GSK, Falk, Janssen, Merck, Given Imaging; payments for lectures/speakers bureaux from AbbVie, Janssen, Merck, Takeda, UCB, Shire. PR discloses consulting fees from Centocor, Merck, UCB, Abbott, Millennium/Takeda, Genentech/Hoffman LaRoche, Neovacs, Merck/Serono, Bristol Myers Squibb, Robarts Clinical Trials, Tillotts Pharma AG, Pfizer and Falk Pharma; lecture fees from Centocor, Merck and Abbott; and research support (university) from Centocor, Merck, UCB and Abbott. JWDM is an employee of Robarts Clinical Trials which was the research organisation that conducted this study. MD: AbbVie, MSD (Merck): board member and lectures; Sofar, Ferring, Chiesi: lectures; Hospira, Takeda: board member.
BGF has received grant/research support from Millennium Pharmaceuticals, Merck, Tillotts Pharma AG, Abbott Labs, Novartis Pharmaceuticals, Centocor, Elan/Biogen, UCB Pharma, Bristol-Myers Squibb, Genentech, ActoGeniX, Wyeth Pharmaceuticals; consulting fees from Millennium Pharmaceuticals, Merck, Centocor, Elan/Biogen, Janssen-Ortho, Teva Pharmaceuticals, Bristol-Myers Squibb, Celgene, UCB Pharma, Abbott Labs, Astra Zeneca, Serono, Genentech, Tillotts Pharma AG, Unity Pharmaceuticals, Albireo Pharma, Given Imaging, Salix Pharmaceuticals, Novo Nordisk, GSK, ActoGeniX, Prometheus Therapeutics and Diagnostics, Athersys, Axcan, Gilead, Pfizer, Shire, Wyeth, Zealand Pharma, Zyngenia, GiCare Pharma, Sigmoid Pharma; Speakers Bureau for UCB, Abbott, J&J/Janssen. WJS has received consulting fees from Abbott, ActoGeniX NV, AGI Therapeutics, Alba Therapeutics Corp, Albireo, Alfa Wasserman, Amgen, AM-Pharma BV, Anaphore, Astellas, Athersys, Atlantic Healthcare, Aptalis, BioBalance Corp, Boehringer-Ingelheim, Bristol-Myers Squibb, Celgene, Celek Pharmaceuticals, Cellerix SL, Cerimon Pharmaceuticals, ChemoCentryx, CoMentis, Cosmo Technologies, Coronado Biosciences, Cytokine Pharmasciences, Eagle Pharmaceuticals, EnGene, Eli Lilly, Enteromedics, Exagen Diagnostics, Ferring Pharmaceuticals, Flexio Therapeutics, Funxional Therapeutics, Genzyme Corp, Gilead Sciences, Given Imaging, GSK, Human Genome Sciences, Ironwood Pharmaceuticals, KaloBios Pharmaceuticals, Lexicon Pharmaceuticals, Lycera Corp, Meda Pharmaceuticals, Merck Research Laboratories, Merck Serono, Millennium Pharmaceuticals, Nisshin Kyorin Pharmaceuticals, Novo Nordisk, NPS Pharmaceuticals, Optimer Pharmaceuticals, Orexigen Therapeutics, PDL Biopharma, Pfizer, Procter and Gamble, Prometheus Laboratories, ProtAb, Purgenesis Technologies, Relypsa, Roche, Salient Pharmaceuticals, Salix Pharmaceuticals, Santarus, Schering Plough, Shire Pharmaceuticals, Sigmoid Pharma, Sirtris Pharmaceuticals, SLA Pharma UK, Targacept, Teva Pharmaceuticals, Therakos, Tillotts Pharma AG, TxCell SA, UCB Pharma, Viamet Pharmaceuticals, Vascular Biogenics, Warner Chilcott UK and Wyeth; research grants from Abbott, Bristol-Myers Squibb, Genentech, GSK, Janssen, Millennium Pharmaceuticals, Novartis, Pfizer, Procter and Gamble, Shire Pharmaceuticals and UCB Pharma; payments for lectures/speakers bureaux from Abbott, Bristol-Myers Squibb and Janssen; and holds stock/stock options in Enteromedics. ED is an employee of Robarts Clinical Trials which was the research organisation that conducted this study. MKV is an employee of Robarts Clinical Trials which was the research organisation that conducted this study. LS and AD are employees of Robarts Clinical Trials which was the research organisation that conducted this study. AL is an employee of Bristol-Myers Squibb (BMS) and holds stock/stock options in BMS. BGL is a consultant for Prometheus Laboratories and Santarus, has participated on the Speakers’ Bureau for Salix, and is a consultant for Robarts Clinical Trials which was the research organisation that conducted this study.
Ethics approval Research Ethics Board, University of Western Ontario, London, Ontario, Canada.
Provenance and peer review Not commissioned; externally peer reviewed.
