Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a substantial polyp miss rate, which limits adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on the polyp detection rate and ADR.
Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This increase was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions In a population with a low baseline ADR, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects remains to be determined.
Trial registration number ChiCTR-DDD-17012221; Results.
- colorectal cancer screening
- computerised image analysis
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Significance of this study
What is already known on this subject?
Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate inversely with the risk of interval cancer. Increasing ADR by reducing adenoma miss rates has been the goal of many studies of imaging techniques and mechanical methods.
Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies.
What are the new findings?
This is the first prospective randomised controlled trial examining automatic polyp detection during colonoscopy, and it shows an increase in ADR from 20.3% to 29.1%, a relative increase of about 43%.
This effect was mainly due to a higher rate of small adenomas found.
The detection rate of hyperplastic polyps was also significantly increased.
How might it impact on clinical practice in the foreseeable future?
Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy, helping to achieve consistently high adenoma detection rates.
However, its effect on ultimate outcomes remains unclear, and further improvements such as polyp differentiation have yet to be implemented.
Colorectal cancer (CRC) is the second-leading and third-leading cause of cancer-related death in men and women, respectively.1 Colonoscopy is the gold standard for CRC screening.2 3 Screening colonoscopy has reduced the incidence and mortality of CRC through the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that each 1.0% increase in adenoma detection rate (ADR) is associated with a 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% owing to both polyp and operator characteristics.11 12
Unrecognised polyps within the visual field are an important problem.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but whether this strategy also increases the ADR remains controversial.13–15
Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way than a human assistant. Though several automatic polyp detection systems have been developed over the past decade,16 17 evidence on the ability of this technology to locate and trace polyps in clinical practice during live colonoscopy is lacking.
The aim of this study was to investigate whether a high-performance real-time automatic polyp detection system18 can increase polyp and adenoma detection rates in the real clinical setting.
Materials and methods
Real-time automatic polyp detection system
In a preliminary study from our group, the algorithm was validated and found to have a per-image sensitivity of 94.38%, a per-image specificity of 95.92% and an area under the receiver operating characteristic curve of 0.984. Using a multithreaded processing system, it processed at least 25 frames per second with a latency of 76.80±5.60 ms in real-time video analysis.18 The detection delay was hardly noticeable to endoscopists. The system monitor was mounted adjacent to and parallel with the original endoscopy monitor.
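The reported performance figures can be related with a little arithmetic. The sketch below is illustrative only and is not the authors' code: it computes per-image sensitivity and specificity from confusion-matrix counts (the counts are hypothetical) and expresses the 76.80 ms latency in frame intervals at 25 frames per second. Because the latency exceeds the 40 ms frame interval, at least two frames must be in the pipeline at once to keep pace with the video, which is consistent with a multithreaded design.

```python
import math

def per_image_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Per-image sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # polyp frames correctly flagged
    specificity = tn / (tn + fp)   # polyp-free frames correctly left unflagged
    return sensitivity, specificity

def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Number of frame intervals between frame capture and on-screen result."""
    frame_interval_ms = 1000.0 / fps   # 40 ms at 25 frames per second
    return latency_ms / frame_interval_ms

def min_frames_in_flight(latency_ms: float, fps: float) -> int:
    """Frames that must be processed concurrently to keep up with the video."""
    return math.ceil(latency_in_frames(latency_ms, fps))

# Hypothetical counts, chosen only to illustrate the formulas
sens, spec = per_image_metrics(tp=944, fn=56, tn=959, fp=41)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
print(f"latency = {latency_in_frames(76.80, 25):.2f} frame intervals")
print(f"frames in flight >= {min_frames_in_flight(76.80, 25)}")
```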
Prospective comparative study
The prospective study was designed as a randomised controlled trial to investigate the impact on PDR and ADR of an automatic polyp detection system acting as an assistant to the endoscopist. This study was conducted in the Endoscopy Center of the Sichuan Provincial People’s Hospital, China. Consecutive patients who underwent colonoscopy from September 2017 to February 2018 were eligible for enrolment (figure 2). Routine bowel preparation consisted of 4 L of polyethylene glycol, given in split doses. Colonoscopies were performed with high-definition colonoscopes (Olympus CF-Q260, CF-H260) and high-definition monitors. We excluded patients with a history of inflammatory bowel disease (IBD), CRC or colorectal surgery, and patients with a contraindication for biopsy. Patients with a prior failed colonoscopy or with high suspicion of polyposis syndromes, IBD or typical advanced CRC were also excluded. Baseline characteristics, including gender, age, indication for colonoscopy, procedure time (morning/afternoon), type of sedation and risk factors for colon polyps (diabetes mellitus, coronary artery disease, body mass index, family history of colon adenoma or cancer, use of aspirin or other nonsteroidal anti-inflammatory drugs, metformin, folic acid, calcium or hormone replacement therapy, and alcohol and tobacco use), were recorded before colonoscopy. Any complications during the procedure or recovery were also recorded by the staff assistant. Eight physicians from the division of gastroenterology participated in the study: two senior endoscopists (>20 000 colonoscopies), two midlevel endoscopists (3000 to 10 000 colonoscopies) and four junior endoscopists (100 to 500 colonoscopies). Before each procedure, the staff assistant prospectively randomised the patient to one of two groups using a digital random number generator. In the control group, a routine colonoscopy was performed.
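The allocation step can be sketched as simple 1:1 randomisation. The text states only that a digital random number generator was used before each procedure; whether allocation was unrestricted, blocked or stratified is not reported, so the unrestricted scheme, the function name `allocate` and the `seed` parameter below are assumptions for illustration.

```python
import random

def allocate(patient_ids, seed=None):
    """Assign each consecutive patient to 'control' or 'CADe' with probability 1/2.

    Simple (unrestricted) randomisation: each draw is independent, so arm sizes
    can drift apart by chance. This is an assumed scheme, not the trial protocol.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation list reproducible
    return {pid: ("CADe" if rng.random() < 0.5 else "control") for pid in patient_ids}

groups = allocate(range(1, 11), seed=2017)
for pid, arm in groups.items():
    print(pid, arm)
```

With unrestricted randomisation the two arms are equal only in expectation; block randomisation is the usual alternative when closely balanced arm sizes matter.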
In the research group (computer-aided detection (CADe) group), the real-time automatic polyp detection system was used to assist the endoscopist. The system was connected to the endoscopy generator, and the video stream was captured synchronously. The system processed each frame and displayed the location of any detected polyp with a hollow blue tracing box on an adjacent monitor, accompanied by a simultaneous sound alarm (figure 1) (see online supplementary file 1). The system was turned on during withdrawal only. The endoscopist focused mainly on the main monitor during the procedure and was prompted by the sound alarm to look at the system monitor. The endoscopist was required to check every polyp location detected by the system, without the assistance of nurses, trainees or staff assistants. In both groups, the staff assistant recorded the type of colonoscope used (CF-H260/CF-Q260), the insertion time, the withdrawal time and the Boston Bowel Preparation Scale (BBPS) score as assessed by the endoscopist. When a polyp was detected, the nurse assisted in performing a cold forceps biopsy for histology, and the staff assistant recorded its location, size and morphological features according to the Paris classification. In the CADe group, polyps missed by the system and system false alarms were also recorded. A missed polyp was defined as a polyp confirmed by the endoscopist but not detected by the system. A false alarm was defined as a lesion continuously traced by the system that the endoscopist judged not to be a polyp. The study was registered with the Chinese Clinical Trial Registry.
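The definitions of missed polyps and false alarms amount to a simple two-way classification of each lesion record. The sketch below applies them to a hypothetical record schema (`Lesion` and `tally` are illustrative names, not part of the study software); the toy records are invented for demonstration.

```python
from dataclasses import dataclass

@dataclass
class Lesion:
    """One lesion record from a CADe-group colonoscopy (hypothetical schema)."""
    confirmed_polyp: bool    # endoscopist judged the lesion to be a polyp
    traced_by_system: bool   # system continuously traced it with the tracing box

def tally(lesions, n_colonoscopies):
    """Count missed polyps and false alarms using the definitions in the text."""
    missed = sum(x.confirmed_polyp and not x.traced_by_system for x in lesions)
    false_alarms = sum(x.traced_by_system and not x.confirmed_polyp for x in lesions)
    return {
        "missed_polyps": missed,
        "false_alarms": false_alarms,
        "false_alarms_per_colonoscopy": false_alarms / n_colonoscopies,
    }

# Toy records: detected by both, missed by the system, false alarm, detected by both
records = [
    Lesion(True, True),
    Lesion(True, False),
    Lesion(False, True),
    Lesion(True, True),
]
print(tally(records, n_colonoscopies=2))
```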
Supplementary file 1
All authors had access to the study data and reviewed and approved the final manuscript.
We prospectively designed this study to have 80% power or more to detect a 10 percentage-point difference in ADR (30% vs 20%) between groups, using a two-group χ2 test with a two-sided α level of 0.05. A sample size of 702 participants was required, and the overall enrolment goal was set at 1130 to allow for potential exclusions or dropouts.
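For orientation, a common normal-approximation formula for comparing two proportions (pooled variance under the null, no continuity correction) can be evaluated as below. This is a sketch, not a reconstruction of the authors' calculation, whose software and settings are not reported. For 20% vs 30% at exactly 80% power it gives 294 per group (588 in total); the reported target of 702 lies above this, which is compatible with the stated "80% power or more", and a continuity correction or a higher power would also raise the figure.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group size for a two-sided comparison of two proportions.

    Normal approximation with pooled variance under H0, no continuity correction.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_b = z.inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

n = n_per_group(0.20, 0.30)
print(n, 2 * n)
```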
Statistical analysis was performed with R studio V.3.4.0 or higher. Baseline clinical and demographic characteristics of the CADe and control groups were compared using the χ2 test for categorical variables and the two-sample t-test for continuous variables. For the adenoma and polyp detection rates, logistic regression was used to evaluate the effect of computer-assisted colonoscopy. The response variable was the binary outcome of whether an adenoma/polyp was detected, and the covariate was a group variable indicating whether the patient belonged to the computer-assisted group. For the numbers of detected adenomas and polyps, Poisson regression was used to evaluate the effect of computer-assisted colonoscopy. A two-sided p value below 0.05 was considered statistically significant. If any baseline clinical or demographic characteristic differed significantly between the two groups, additional covariate-adjusted logistic/Poisson regression models were to be built, adding those characteristics as covariates to address possible confounding. The primary outcome was ADR. The secondary outcomes were PDR, the mean number of polyps detected per colonoscopy, the mean number of adenomas detected per colonoscopy and the rates of false positives and false negatives.
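With the group indicator as the only covariate, the logistic model reduces to a 2×2 table, and the fitted odds ratio exp(b1) equals the cross-product ratio of that table. The sketch below fits the model by Newton-Raphson on grouped counts in plain Python rather than the R environment the authors used; the function name and the counts are hypothetical. A covariate-adjusted model would add further columns to the design and enlarge the information matrix accordingly.

```python
import math

def fit_logistic_two_groups(events0, n0, events1, n1, max_iter=50, tol=1e-12):
    """Fit logit(p) = b0 + b1 * group by Newton-Raphson on grouped binomial data.

    group = 0 for the control arm and 1 for the CADe arm. Returns (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p0 = 1.0 / (1.0 + math.exp(-b0))          # fitted detection rate, control
        p1 = 1.0 / (1.0 + math.exp(-(b0 + b1)))   # fitted detection rate, CADe
        # Score (gradient of the log-likelihood) for b0 and b1
        u0 = (events0 - n0 * p0) + (events1 - n1 * p1)
        u1 = events1 - n1 * p1
        # Fisher information matrix [[i00, i01], [i01, i11]]
        w0, w1 = n0 * p0 * (1 - p0), n1 * p1 * (1 - p1)
        i00, i01, i11 = w0 + w1, w1, w1
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system I * delta = u
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical counts: 20/100 patients with an adenoma (control), 30/100 (CADe)
b0, b1 = fit_logistic_two_groups(20, 100, 30, 100)
print(math.exp(b1), (30 * 80) / (70 * 20))  # fitted OR vs cross-product ratio
```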
Patient enrolment and baseline data
A total of 1130 consecutive patients were eligible for enrolment. Of these, 72 patients (31 in the routine group, 41 in the CADe group) were excluded during colonoscopy because they met exclusion criteria (figure 2). A total of 1058 eligible patients were analysed: 536 randomised prospectively to the control group and 522 to the CADe group. Baseline characteristics are presented in table 1 (see online supplementary file 2). There were no statistically significant differences between the two groups in demographic data or adenoma detection risk factors. No complications were reported. The total withdrawal time was 6.39 min in the control group and 6.89 min in the CADe group (p<0.001). Two hundred and twenty-nine more biopsies were performed in the CADe group. When biopsy time was excluded, withdrawal times were 6.07 min and 6.18 min in the control and CADe groups, respectively (p=0.15).
Supplementary file 2
Polyp characteristics, mean number of polyps per case and PDR
A total of 767 polyps were detected, of which 422 (55.02%) were adenomas and 31 (4.04%) were sessile serrated adenomas. Overall, 269 polyps (35.07%) were found in the control group and 498 (64.93%) in the CADe group (table 2). The mean number of polyps detected per colonoscopy was 0.51 in the control group and 0.97 in the CADe group (p<0.001), a 1.89-fold increase (95% CI 1.63 to 2.192, p<0.001) (table 3). The PDR of the control and CADe groups was 0.29 and 0.45, respectively (OR=1.995, 95% CI 1.532 to 2.544, p<0.001) (table 3). Because baseline clinical and demographic variables did not differ significantly between the two groups, covariate-adjusted models were not fitted.
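With only the group indicator as covariate, the Poisson model for polyp counts has a closed-form solution: the fitted rate ratio is the ratio of mean counts per patient, and the standard error of its logarithm is sqrt(1/Y_CADe + 1/Y_control), where Y denotes the total count in each arm. The sketch below, in Python rather than the authors' R code, applies this to the totals above (269 polyps in 536 patients vs 498 in 522). It gives roughly 1.90 (95% CI about 1.64 to 2.20), close to but not identical with the reported 1.89 (1.63 to 2.192); the small gap may reflect rounding or model details that are not stated.

```python
import math

def poisson_rate_ratio(count_ctrl, n_ctrl, count_cade, n_cade, z=1.959963984540054):
    """Rate ratio (CADe vs control) with a Wald 95% CI on the log scale.

    Equivalent to exp(b1) from a Poisson GLM with a single binary group covariate.
    """
    rr = (count_cade / n_cade) / (count_ctrl / n_ctrl)
    se = math.sqrt(1 / count_cade + 1 / count_ctrl)   # SE of log(rr)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

rr, lo, hi = poisson_rate_ratio(269, 536, 498, 522)
print(f"rate ratio {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```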
Adenoma characteristics, mean number of adenomas per case and ADR
A total of 422 adenomas were detected (table 2). The mean number of adenomas detected per colonoscopy in the control and CADe group were 0.31 and 0.53, respectively (p<0.001). There was a 1.72-fold increase in the mean number of adenomas detected between the experimental and control groups (95% CI 1.419 to 2.084, p<0.001) (table 3). The ADR of the control and the CADe groups were 0.20 and 0.29, respectively (OR=1.61, 95% CI 1.213 to 2.135, p<0.001) (table 3).
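The reported ADR odds ratio can be cross-checked from a 2×2 table. The text gives ADRs but not the underlying patient counts, so the counts below (152 of 522 in the CADe group, 109 of 536 in the control group) are back-calculated from the reported 29.1% and 20.3% and should be treated as approximate. With these counts, the cross-product odds ratio and a Wald confidence interval on the log scale come out at about 1.61 (95% CI 1.213 to 2.135), in line with the values above. The function name is illustrative.

```python
import math

def odds_ratio_wald(a, n_a, b, n_b, z=1.959963984540054):
    """Odds ratio of group A vs group B with a Wald 95% CI on the log scale.

    a, b are patients with at least one adenoma; n_a, n_b are group sizes.
    """
    c, d = n_a - a, n_b - b                    # patients without an adenoma
    or_ = (a * d) / (c * b)
    se = math.sqrt(1 / a + 1 / c + 1 / b + 1 / d)   # SE of log(OR)
    return or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)

# Counts back-calculated from the reported ADRs (approximate, not stated in the text)
or_, lo, hi = odds_ratio_wald(152, 522, 109, 536)
print(f"ADR {152/522:.1%} vs {109/536:.1%}; OR {or_:.2f} (95% CI {lo:.3f} to {hi:.3f})")
```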
The number of detected polyps was significantly higher in the CADe group as compared with the control group when considering non-pedunculated polyps, polyps ranging in size from 0 cm to 1 cm and polyps in all segments of the colon. The number of detected adenomas was also significantly higher in CADe group when considering non-pedunculated polyps, polyps smaller than 0.5 cm and polyps in all segments of the colon with the exception of the caecum and the ascending colon (table 3).
Outcomes in excellent bowel preparation (BBPS ≥7)
With excellent bowel preparation, ADR in the CADe group showed a trend towards a 6% increase over the routine group. However, owing to the inadequate sample size of this subgroup analysis, the difference was not statistically significant. The other outcomes, including the mean number of detected adenomas, the mean number of detected polyps and PDR, were all significantly increased in the CADe group (table 4).
CRC is a major public health issue given its high incidence and mortality. Furthermore, a recently published study reported a marked increase in the annual percentage change in CRC incidence among young adults.19 The mortality rate and incidence of CRC in adults have decreased drastically (by 51% and 32%, respectively) over the last half century, mainly as a result of CRC screening and the removal of adenomatous polyps.20 Screening colonoscopy has also increased 5-year survival in CRC, as a consequence of early detection and removal of precancerous adenomas.20 Given the impact of colonoscopy on CRC incidence and mortality, technological advances such as full spectrum colonoscopy, retrograde viewing accessories, balloon colonoscopes and endocuff-assisted colonoscopy have been introduced to improve ADR by expanding the visual field.21–25 Methods aimed at improving colonoscopy quality, such as a minimum withdrawal time, split-dose bowel preparation and retroflexion in the right colon, have also increased ADR.26–30 Despite these advances, polyps can still be missed, and studies have reported that some polyps are missed by the endoscopist despite being within the visual field.31 32 Several hypotheses have been proposed to explain how such polyps are missed. These include differences in endoscopist skill level, differences in endoscopist tracking patterns, ‘inattentional blindness’, wherein an observer fails to process an image on the screen because of distraction, and ‘change blindness’, wherein changes are missed during interruptions in visual scanning or during eye movements.13 33–37 Distraction caused by fatigue or emotional factors may also contribute. Observation by a second party, such as a nurse or a trainee, may improve PDR.
While several studies have shown that this increases PDR, its effect on ADR remains controversial.13–15 Adding further human observers is unlikely to overcome these limitations completely.
With recent breakthroughs in artificial intelligence, especially deep learning for computer vision, computer-aided diagnosis (CADx) of polyps during colonoscopy has drawn increasing attention and has been shown to allow histological classification of colon polyps.38 39 Although optical biopsy remains a promising field, tissue biopsy remains the gold standard, and the accuracy of AI-based optical biopsy depends on how well surface microstructure reflects the histological features of a lesion. Moreover, a lesion that is missed cannot be diagnosed at all, which motivates ongoing research into computer-aided detection (CADe) in gastrointestinal (GI) endoscopy, aiming to increase adenoma detection automatically during white light endoscopy in real time.
High-fidelity, consistent automatic colon polyp detection has been an attractive research topic for the past decade, with the aim of increasing ADR. However, to our knowledge, the current technology has yet to achieve sufficient diagnostic performance to be considered for clinical application.40 41 For an automatic polyp detection system to be considered for real-world clinical application, it must have very high sensitivity and specificity, processing fast enough for real-time use and an onscreen alerting system.41 42 Inadequate specificity would produce excessive false positives, whereas inadequate sensitivity would fail to increase PDR. Moreover, for real-time detection to be useful, analysis must be fast enough that the endoscopist notices no delay. Because of these prerequisites, most current studies of automatic polyp detection are small-scale, non-clinical investigations, although with rapidly increasing interest in the field and the emergence of deep learning, dramatic advances are expected in the coming years.41
In this study, ADR, PDR and the mean numbers of polyps and adenomas per colonoscopy were significantly higher in the CADe group than in the control group; however, the increase in overall adenoma detection was mainly due to an increase in diminutive adenomas. That the additional adenomas were predominantly small supports the conventional view that, within the visual field, small polyps are more likely to be missed than larger, more prominent ones. Although diminutive adenomas confer less risk of malignancy than larger adenomas, the increase in overall ADR may eventually contribute to a decreased risk of interval CRC. Further studies should address the role of CADe in decreasing interval cancer, which is the main goal of any screening colonoscopy.9
The results also showed a major increase in the detection of diminutive hyperplastic polyps, which may lead to additional unnecessary polypectomies and an increased workload. In the future, this CADe system could be combined with a CADx system to support a ‘detect, diagnose and disregard’ strategy43 and avoid excessive workload.
High reproducibility, fidelity and uniformity of such a CADe system are advantages compared with human assistance. However, direct comparison between the automatic polyp detection system and medical staff assistance of differing experience levels is also worthy of further investigation.
The system's failure to increase adenoma detection in the caecum and ascending colon may be due to greater instability of the colonoscope in those segments, which reduces the visual field. Likewise, the absence of a significant difference in the rectum may reflect the good visualisation and stability of the colonoscope in this segment.
This study is the first prospective randomised controlled trial, with a large number of enrolled patients, to use a high-performance CADe system based on deep learning to assist endoscopists in detecting colon polyps. The results indicate that previously unrecognised polyps may be effectively addressed by an artificial intelligence system. However, polyps that remain outside the visual field are still a major issue not addressed by the current CADe system. Both unrecognised and non-visualised polyps may be addressed in the future with a combination of technologies.
This study has several limitations. First, because the endoscopist could not be blinded, the exact contribution of the system may be difficult to assess. The very act of being observed may also have affected ADR in the experimental group through ‘competitive spirit’.13 This represents a potential confounder in the CADe group, in that the endoscopist may have been more attentive in the knowledge of being observed. In this study, we subtracted the time of biopsy procedures from each corresponding withdrawal time as an indirect marker of attentiveness (table 1). The resulting withdrawal times were similar between the two groups (6.07 min vs 6.18 min, p=0.15), which may indicate similar attentiveness. False alarms of the CADe system occurred at a rate of 0.075 per colonoscopy and did not prolong withdrawal time.
In the future, double-blind studies could be designed to investigate the exact contribution of this system in the increased adenoma detection rate. Such a study may also help determine whether a polyp is detected simultaneously by both the endoscopist and the system or initially missed by the endoscopist, a question that the current study was not designed to address.
The second limitation is the lack of external validity. The baseline adenoma and polyp detection rates in this study were lower than those reported from Western countries.13 44–48 Multiple factors may contribute to this finding, including genetic, dietary, lifestyle and habitus differences between Chinese and Western populations, as well as differences in the prevalence of colon polyps/adenomas between the two populations. Therefore, the results of this study may not be generalisable to areas of the world where baseline ADR is higher. Further studies are needed to investigate the adaptability and effectiveness of this system in such areas.
Third, although false-positive rates were low, some false positives had not been anticipated by the system's designers; these were caused by detection of medication capsules, local sites of bleeding or undigested debris, and could be distracting during the procedure. This might be corrected by adding sufficient training data to the current system.
Fourth, this study did not control for the fatigue level of participating endoscopists, which could independently affect ADR. Future studies are needed to investigate the effectiveness of this CADe system at different levels of fatigue.
Fifth, because too few colonoscopies were performed by junior endoscopists, further studies are needed to establish the role and effectiveness of this CADe system at different levels of training.
Lastly, the study was conducted using Olympus colonoscopy equipment. Thus, the adaptability of the system on equipment manufactured by other companies should also be explored.
In conclusion, this study shows that a real-time CADe system based on deep learning led to significant increases in both colorectal polyp and adenoma detection rates in a region with a low baseline ADR. Given its high accuracy, fidelity and stability, this CADe system is potentially applicable in clinical practice for improved detection of colon polyps.
Contributors PW and XL contributed to study concept and design. PW, XL, LL, PL, XX, YS, DZ, GX, MT and YL contributed to acquisition of data. PW, TMB, JRGB, SB and AB contributed to analysis, interpretation of data, drafting of the manuscript and statistical analysis. All authors read and approved the final manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Disclaimer The CADe system was developed by Shanghai Wision AI Co, Ltd., and was provided free of charge for the purposes of this study. Employees of the company were not involved in the clinical trial in any way, including in study design, statistical analysis or manuscript writing.
Competing interests None declared.
Ethics approval This study was approved by the Institutional Review Board of Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital. It was registered with Chinese Clinical Trial Registry (ChiCTR) under identifier ChiCTR-DDD-17012221.
Provenance and peer review Not commissioned; externally peer reviewed.
Patient consent for publication Not required.