Background In general, academic but not community endoscopists have demonstrated adequate endoscopic differentiation accuracy to make the ‘resect and discard’ paradigm for diminutive colorectal polyps workable. Computer analysis of video could potentially eliminate the obstacle of interobserver variability in endoscopic polyp interpretation and enable widespread acceptance of ‘resect and discard’.
Study design and methods We developed an artificial intelligence (AI) model for real-time assessment of endoscopic video images of colorectal polyps. A deep convolutional neural network model was used. Only narrow band imaging video frames were used, split equally between relevant multiclasses. Unaltered videos from routine exams not specifically designed or adapted for AI classification were used to train and validate the model. The model was tested on a separate series of 125 videos of consecutively encountered diminutive polyps that were proven to be adenomas or hyperplastic polyps.
Results The AI model works with a confidence mechanism and did not generate sufficient confidence to predict the histology of 19 polyps in the test set, representing 15% of the polyps. For the remaining 106 diminutive polyps, the accuracy of the model was 94% (95% CI 86% to 97%), the sensitivity for identification of adenomas was 98% (95% CI 92% to 100%), specificity was 83% (95% CI 67% to 93%), negative predictive value 97% and positive predictive value 90%.
Conclusions An AI model trained on endoscopic video can differentiate diminutive adenomas from hyperplastic polyps with high accuracy. Additional study of this programme in a live patient clinical trial setting to address resect and discard is planned.
- colorectal adenomas
- endoscopic polypectomy
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Significance of this study
What is already known on this subject?
Using new imaging modalities such as narrow band imaging, endoscopists have studied the potential for a ‘resect and discard’ strategy for management of diminutive colorectal polyps.
Experts have good results in general but community endoscopists fall short of Preservation and Incorporation of Valuable endoscopic Innovations (PIVI) guidelines.
Artificial intelligence (AI) is rapidly growing and shows promise in performing optical biopsy.
What are the new findings?
We show, using a type of AI known as deep learning, that our model achieves 94% accuracy in differentiating diminutive adenomas from hyperplastic polyps on unaltered videos of colon polyps, among the polyps for which it reached a high confidence prediction.
Our model operates in quasi real-time on such videos, with a delay of just 50 ms per frame.
How might it impact on clinical practice in the foreseeable future?
If validated in planned clinical trials with patients during live procedures, this AI platform could accelerate the adoption of a ‘resect and discard’ strategy for diminutive colorectal polyps.
Endoscopists combine their knowledge of the spectrum of endoscopic appearances of precancerous lesions with meticulous mechanical exploration and cleaning of mucosal surfaces to maximise lesion detection during colonoscopy. An extension of detection is endoscopic prediction of lesion histology, including differentiation of precancerous lesions from non-neoplastic lesions, and prediction of deep submucosal invasion of cancer.1 2 Image analysis can guide whether lesion removal is necessary and direct an endoscopist to the best resection method.1–3
Image analysis during colonoscopy has achieved increasing acceptance as a means to accurately predict the histology of diminutive lesions,4 5 which have minimal risk of cancer,6 so that these diminutive lesions could be resected and discarded without pathological assessment or left in place without resection in the case of diminutive distal colon hyperplastic polyps.3 Discarding most diminutive lesions without pathological assessment has the potential for large cost saving with minimal risk.7 8
Unfortunately, both lesion detection during colonoscopy9–12 and image assessment of detected lesions during colonoscopy to predict histology13 14 are subject to substantial operator dependence. Thus, using virtual chromoendoscopy, experts have been able to exceed the accuracy threshold for polyp differentiation recommended to permit resect and discard,4 but performance by community based physicians has been variable and in some cases below accepted performance thresholds.13 14
Accordingly, different initiatives were developed to investigate cost-effective approaches to derive qualitative histological information from endoscopic images, also referred to as optical biopsy. From these efforts, a sound body of evidence suggests that a simple narrow band imaging (NBI)-based classification system, the NBI International Colorectal Endoscopic (NICE) classification, could enable differentiation of hyperplastic from adenomatous polyps (including diminutive polyps). The NICE classification scheme was designed to enable trained endoscopists to recognise visual cues such as colour, presence of vessels and surface patterns and to be readily applicable in routine practice without optical magnification by endoscopists without extensive experience in endoscopic imaging, chromoendoscopy or pit-pattern diagnosis.1 However, NICE is not perfect and does not, for example, address the issue of sessile serrated polyps (SSPs), which is clearly problematic in the efforts to deliver ‘true’ optical biopsy. Attempts have been made to address this problem, such as the scheme from the Dutch Workgroup on serrated polyps and polyposis (WASP).15 The WASP classification builds on the NICE classification and adds serrated lesions. However, WASP also has its limitations, and highly reliable optical biopsy remains elusive in general usage.
The National Institute for Health and Care Excellence in the UK has very recently published evidence-based recommendations in an online document (https://www.nice.org.uk/guidance/dg28). It recommends virtual chromoendoscopy using NBI, Fuji Intelligent Chromo Endoscopy or Pentax i-SCAN, instead of histopathology, to assess polyps of 5 mm or less during colonoscopy and determine whether they are adenomatous or hyperplastic, but only if four conditions are met: high-definition enabled virtual chromoendoscopy equipment is used; the endoscopist has been trained to use virtual chromoendoscopy and is accredited to use the technique under a national accreditation scheme; the endoscopy service includes systems to audit endoscopists and provide ongoing feedback on their performance; and, importantly, the assessment is made with high confidence.
A potential solution to mitigate both the variability in endoscopic detection and histology prediction is to apply computerised image analysis to deliver computer decision support solutions. Recent studies have successfully used automatic image analysis techniques to accurately predict histology based on images captured with endocytoscopy16 and magnification endoscopy17 or to improve lesion detection.18 Studies using traditional machine learning16 17 have the limitations inherent to hand-crafted feature extraction, which is guided by the desire to ‘visually capture what is seen’ and is therefore restricted to the features a human designer thought to encode. Considerable hand-engineering of imaging features is required for presentation to a support vector classifier. In addition, previous work has focused on high magnification endoscopy,16 17 which is not commonly used in clinical practice.
The European Society of Gastrointestinal Endoscopy (ESGE) published a technology review in 2016 in relation to advanced endoscopic imaging.19 In this comprehensive review, the topic of decision support tools, and computer-aided diagnosis (CAD) was covered, with questions from this paper around the role of CAD assistance in training for optical diagnosis or whether such systems would initially be a ‘second reader’ to support the endoscopist’s diagnosis. The ESGE committee went on to further state that ‘the stand alone use of such systems to completely replace clinical judgment for decision making would require a much higher diagnostic performance and additional safeguards’ but that ‘availability of CAD combined with advanced endoscopic imaging is likely to emerge in clinical practice in the next few years’.
More recently, a field of artificial intelligence known as deep learning has opened the door to more detailed image analysis and real-time application by automatically extracting relevant imaging features, departing from human perceptual biases. Deep learning20 21 is an umbrella term for a wide range of machine learning models and methods, typically based on artificial neural networks,22 23 which aim at learning multilevel representations of data useful for making predictions or classifications. In particular, the development of deep convolutional neural networks (DCNN) has transformed the field of computer vision.23 24 In contrast to even recently published work in the gastroenterology literature, the DCNN approach in our study works in almost real-time with raw, unprocessed frames from the video sequence captured from the endoscope. In this study, we used a DCNN to train a deep learning-based AI model to differentiate conventional adenomatous from hyperplastic polyps. We tested the model on unaltered videos of 125 consecutively identified diminutive polyps with proven histology.
We developed a deep learning-based AI model for real-time assessment of endoscopic video images of colorectal polyps. We used stored videos of unaltered endoscopic polyps provided by DKR to train the model. The videos used were available from a previous study and were deidentified, and hence the institutional review board waived review of this current study. We trained the model on videos containing NBI segments only of colorectal polyps captured with 190 series Olympus (Olympus Corp, Center Valley, Pennsylvania, USA) colonoscopes. All polyps were first detected in the normal ‘far focus’ mode. Then the colonoscope was moved close to the polyp and the near focus mode activated. There was no effort to video the polyp in the near focus mode for a set time interval. Once clear views had been subjectively obtained in the near focus mode, the polyp was resected and retrieved. The training video sequences comprised polyps of all size ranges including many polyps >10 mm in size, were previously de-identified and sorted by their respective pathology but were not of consecutive polyps. Videos of normal mucosa containing no polyp were also used to train the model. All recordings were made using high-definition Olympus video recorders.
The NICE classification1 was used as the foundation for training the deep learning programme in association with the endoscopic video images. A DCNN model was used in this study. A convolutional neural network (CNN) is a type of artificial neural network used in deep learning and has been applied by several groups to analysis of visual imagery. CNNs incorporate very little preprocessing in comparison with other image classification algorithms, and these networks (such as used in our study) learn the filters that were previously hand-engineered in more traditional algorithms. This independence from prior knowledge and human effort in feature design represents a significant advantage of neural network models over other types of machine learning.25 26 The DCNN model used is based on the inception network architecture24 (figure 1). Following standard procedure with DCNNs, model training was carried out with stochastic gradient descent from randomly initialised weights to minimise a frame-level cross-entropy loss function. To construct each mini batch of 128 frames during training, frames were randomly selected from the training set such that they were approximately balanced across classes and source video. For each frame, we applied a data augmentation procedure to create a richer diversity of frames by a random resizing and cropping of the frame, followed by a random flipping along either axis. Training stopped when the loss started increasing on an independent validation set.
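The mini-batch construction and frame augmentation described above can be sketched as follows. This is a minimal illustration in NumPy, not the code used in the study: the function names, the output size of 224×224, the minimum crop scale, the nearest-neighbour resizing and the independent 50% flip probability per axis are all assumptions, as is the sampling scheme (class first, then video, then frame) used to approximate balance across classes and source videos.

```python
import numpy as np


def augment_frame(frame, out_size=(224, 224), min_scale=0.6, rng=None):
    """Randomly crop, resize and flip one H x W x C video frame.

    out_size and min_scale are illustrative values, not from the study.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = frame.shape[:2]
    # Random crop covering between min_scale and 100% of each side.
    scale = rng.uniform(min_scale, 1.0)
    ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = frame[top:top + ch, left:left + cw]
    # Nearest-neighbour resize to a fixed size via index lookup.
    rows = np.arange(out_size[0]) * ch // out_size[0]
    cols = np.arange(out_size[1]) * cw // out_size[1]
    resized = crop[rows][:, cols]
    # Flip along either axis, each with probability 0.5 (assumed).
    if rng.random() < 0.5:
        resized = resized[::-1]
    if rng.random() < 0.5:
        resized = resized[:, ::-1]
    return np.ascontiguousarray(resized)


def sample_balanced_batch(index, batch_size=128, rng=None):
    """Draw (label, video, frame_id) triples, roughly balanced by class and video.

    index maps class label -> video id -> list of frame ids (hypothetical layout).
    """
    rng = np.random.default_rng() if rng is None else rng
    classes = sorted(index)
    batch = []
    for _ in range(batch_size):
        label = classes[rng.integers(len(classes))]
        videos = sorted(index[label])
        video = videos[rng.integers(len(videos))]
        frames = index[label][video]
        batch.append((label, video, frames[rng.integers(len(frames))]))
    return batch
```

Sampling the class and then the video before the frame means that long videos and over-represented classes do not dominate a batch, which is one simple way to obtain the approximate balance described in the text.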
Each frame was reviewed according to the multiclass model under consideration, by medical students and GI fellows (figure 2).
Frames (NBI only) used to train the model were split equally between relevant multiclasses. The processing time of our DCNN model required 50 ms per frame on a PC with an NVIDIA graphics processing unit.
The DCNN model allows essentially real-time analysis of endoscopic polyp videos and calculates the probability that a polyp is a conventional adenoma or a serrated class lesion (here, a hyperplastic polyp). The probability of a hyperplastic or adenomatous polyp (NICE types 1 and 2) is displayed immediately on each endoscopic video image.
To give a sense of how the system operates in real-time, the model builds a credibility score by analysing how the NICE class predictions fluctuate across successive frames (figure 3). The idea is to mimic the human perceptual system that promotes longitudinal coherence over short-lived information in order to provide clinically relevant information to the endoscopist. Accordingly, the credibility is also updated in real-time, in a form of exponential smoothing: credibility(t)=alpha * credibility(t−1) + (1−alpha) * update(t), where update(t) is an indication of whether the model’s predictions have changed between frames at t−1 and t, and alpha is a parameter between 0 and 1 that is found by searching over a validation set. If the credibility is below 50%, the model is considered to have insufficient confidence to make a prediction. Videos with such a low credibility score were excluded from all accuracy calculations in a manner equivalent to a low confidence interpretation by an endoscopist. The process for determining confidence is quantitative and reproducible.
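The credibility update above can be illustrated with a short sketch. Several details are assumptions rather than the study's implementation: update(t) is taken to be 1 when the predicted class is unchanged from the previous frame and 0 otherwise (so that stable predictions raise credibility), the credibility starts at 0, alpha is set to an arbitrary 0.9 rather than a value tuned on a validation set, and the video-level decision uses the credibility at the final frame.

```python
def credibility_trace(predictions, alpha=0.9, initial=0.0):
    """Exponentially smoothed credibility over a sequence of per-frame class predictions.

    Implements credibility(t) = alpha * credibility(t-1) + (1 - alpha) * update(t),
    with update(t) = 1 if the prediction is unchanged from frame t-1, else 0 (assumed).
    """
    credibility = initial
    previous = None
    trace = []
    for prediction in predictions:
        update = 1.0 if (previous is not None and prediction == previous) else 0.0
        credibility = alpha * credibility + (1.0 - alpha) * update
        trace.append(credibility)
        previous = prediction
    return trace


def has_sufficient_confidence(trace, threshold=0.5):
    """True if the final credibility reaches the 50% threshold described in the text."""
    return bool(trace) and trace[-1] >= threshold


# Stable predictions build credibility; flickering predictions do not.
stable = credibility_trace(["NICE 2"] * 20)
flicker = credibility_trace(["NICE 1", "NICE 2"] * 10)
print(has_sufficient_confidence(stable), has_sufficient_confidence(flicker))
```

A larger alpha makes the score slower to rise and slower to fall, which trades responsiveness for stability of the displayed confidence.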
The training, validation, and final testing sets of endoscopic videos of polyps had no overlap. All frames were in NBI only and were a mixture of normal focus and near focus. For the training set, we used 223 polyp videos (29% NICE type 1, 53% NICE type 2 and 18% of normal mucosa with no polyp), comprising 60 089 frames. For the validation set, we used 40 videos (NICE type 1, NICE type 2 and two videos of normal mucosa). The final test set included 125 consecutively identified diminutive polyps, comprising 51 hyperplastic polyps and 74 adenomas. Overfitting can be a concern in such studies. To address this, we ensured that all test images were from a completely separate dataset, never seen by the model in the training or validation phases, such that the reported results can represent the expected out-of-sample accuracy.
After training the model with the high-definition videos, the validation dataset was used to modify/fine-tune the ‘hyper-parameters’ for the AI system architecture (number of layers in the neural network, size of such layers, and so on). We then tested the model’s accuracy on a consecutive sample of diminutive (≤5 mm) colorectal polyps that were video recorded in NBI and resected by DKR for histological analysis. DKR recorded the test videos for prospective use in another trial.27 All videos were fully deidentified.
The video recordings for the test set were typically 10–20 s in length (median 16 s). All included normal and near focus imaging and at least one short frozen segment when a photograph was taken (the Olympus 190 video image freezes briefly when a photograph is taken). Each polyp was resected using cold methods (snare or forceps) and submitted to pathology separately.
Conventional adenomas were lesions that were dysplastic (high or low grade) and further characterised as tubular, tubulovillous or villous. All NICE type 1 lesions in the study were hyperplastic polyps by pathology (sessile serrated polyps (SSPs) were excluded). In the test set, polyps that were reported as normal tissue at pathology were not recut. We used the pathology report provided for routine patient care. Pathologists at Indiana University can access endoscopy reports at their discretion, and these reports include photographs of lesions in many cases. However, to our knowledge, none of the pathologists routinely access the reports and none of the pathologists are trained in endoscopic prediction of polyp pathology. Furthermore, none of the reports contained verbal descriptions of the endoscopist’s prediction of histology. The pathologists at Indiana University use widely accepted terminology to describe colorectal polyps, including the descriptors tubular, tubulovillous, and villous as well as low or high grade dysplasia for conventional adenomas. Serrated class lesions are characterised as traditional serrated adenomas, hyperplastic polyps or SSPs, without or with cytological dysplasia according to criteria recommended by a National Institute of Health consensus panel.28
In the testing dataset, a total of 158 consecutive diminutive polyps were identified, video-recorded, resected, and submitted for pathological examination by DKR. Thirty polyps were excluded from the study because the pathological report was SSP (n=3), normal tissue or lymphoid aggregate (n=25) or faecal material (n=2); one video was excluded because it was corrupted and two had frames with multiple polyps.
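As a check on the exclusions listed above, the counts can be tallied directly. The dictionary keys are descriptive labels chosen for this sketch; the numbers are those reported in the text.

```python
identified = 158

# Exclusions based on the pathology report (30 polyps in total).
pathology_exclusions = {
    "sessile serrated polyp": 3,
    "normal tissue or lymphoid aggregate": 25,
    "faecal material": 2,
}

# Exclusions based on the video itself.
video_exclusions = {
    "corrupted video": 1,
    "frames with multiple polyps": 2,
}

excluded = sum(pathology_exclusions.values()) + sum(video_exclusions.values())
evaluated = identified - excluded
print(excluded, evaluated)  # 33 125
```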
Accordingly, 125 polyp videos were evaluated using the AI model. The final pathology of the 125 lesions was 51 hyperplastic polyps and 74 adenomas. Of these, the model did not build enough confidence to predict the histology in 19 polyps, leaving 106 in which the model made a high confidence prediction. Table 1 shows the predictions of the model for these 106 diminutive polyps compared with the histologies of the polyps. Figure 4 shows screenshots of the model as it appears in real-time in the evaluation of NICE type 1 and 2 lesions (unaltered) and for which the model reached high confidence. The video shows the model as it appears during colonoscopy (online supplementary video 1). For the 106 polyps, the accuracy of the model was 94% (95% CI 86% to 97%), the sensitivity for identification of adenomas was 98% (95% CI 92% to 100%), specificity was 83% (95% CI 67% to 93%), negative predictive value was 97% and positive predictive value was 90%.
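For readers less familiar with these measures, the sketch below shows how each is derived from a 2×2 table with adenoma as the positive class. The counts in the example are purely hypothetical round numbers chosen for illustration; they are not the study's data in table 1. The low confidence fraction uses the 19 of 125 polyps reported above.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic measures from a 2x2 table (adenoma = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # adenomas correctly called adenoma
        "specificity": tn / (tn + fp),   # hyperplastic polyps correctly called hyperplastic
        "ppv": tp / (tp + fp),           # predicted adenomas that are adenomas
        "npv": tn / (tn + fn),           # predicted hyperplastic polyps that are hyperplastic
    }


# Hypothetical counts, for illustration only.
example = diagnostic_metrics(tp=90, fn=10, fp=20, tn=80)

# Fraction of test polyps without a high confidence prediction (from the text).
low_confidence_fraction = 19 / 125   # 0.152, reported as 15%
high_confidence_polyps = 125 - 19    # 106
```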
Supplementary video 1
Colonoscopy plays a pivotal role in diagnosis and prevention of colorectal cancer (CRC), which is overall the third leading cause of cancer death in the USA.29 Unfortunately, colonoscopy is technically a highly operator dependent procedure, including detection of adenomas9 10 and serrated lesions11 12 and polyp resection.30 This operator dependence leads to substantial variation between endoscopists in their effectiveness in preventing CRC with colonoscopy,31 32 which is the fundamental goal of most colonoscopies. Increasingly, clinicians are advised to make quality measurements,33 and clinical trials address educational and technical adjuncts that could improve detection.34
Although not all aspects of colonoscopy performance are currently amenable to reduction in performance variability by use of software, detection of polyps and prediction of histology are two aspects of performance that could potentially be enhanced by imaging analytics. In this study, we showed that an AI model trained in polyp differentiation could accurately identify whether consecutive diminutive polyps were conventional adenomas with an overall accuracy of 94%. For conventional adenomas verified by pathology, the sensitivity, specificity, positive predictive value and negative predictive value were 98%, 83%, 90% and 97%, respectively. Although our study used unaltered videos rather than live patients, the AI model performed as well as experts typically perform using the NICE criteria and better than many community endoscopists have performed.4 Furthermore, the computer analysis of histological prediction is available in almost real-time (delay of 50 ms per frame). If the model’s accuracy is verified in prospective clinical trials, it could revolutionise the management of diminutive colorectal polyps by essentially enabling the ‘resect and discard’ and ‘leave distal colon hyperplastic polyps in place’ paradigms to be accurately executed by both academic and community colonoscopists.
In this study, we apply deep learning to the real-time challenge for polyp differentiation into NICE types 1 and 2, using non-magnification colonoscopy, and most importantly where computer decision support is provided in real-time on unaltered endoscopic video streams. Previous studies of computer decision support for colorectal polyps have used magnifying colonoscopes17 or endocytoscopy,16 both of which are rarely available in the USA or Europe, and while acknowledging the great work of these investigators in this field, our DCNN approach is very different. Our model works with unprocessed frames and can operate in quasi real-time, with a frame processing time of 50 ms on consumer-grade hardware. Our model also works regardless of the polyp location in the frame (the operator does not need to precisely locate the polyp in the middle of the frame). The DCNN is trained end-to-end, meaning that the complete image preprocessing and classification task is solved within the same learning procedure, resulting in a much more robust model than previous work16 17 that consisted of hand-specified preprocessing followed by a trainable classifier. In the broader computer vision community, the end-to-end training of DCNNs has been, since 2013, systematically overtaking hand-engineered features and support vector classification. Ongoing work will determine if such an AI-based clinical decision support system could aid in the widespread adoption of a ‘resect and discard’ strategy.
This study has several limitations. These include collection of the videos by a single operator who is also a recognised expert colonoscopist and the use of video recordings rather than real-time assessments of polyps. However, we expect that the ability to manoeuvre and stabilise the instrument to allow stable imaging of colorectal polyps in focus will be achievable by colonoscopists with a wide range of skills. Although this is not ‘clinical’ real-time, in that we have not yet used the model in an actual patient setting, the testing dataset is raw, untouched colon polyp screening footage, as described above, and our AI model performs in almost real-time (50 ms delay per frame).
In addition, 19 of the 125 videos (15%) of consecutive diminutive polyps in the test set were excluded by the AI model, because it did not develop at least 50% confidence in the diagnosis. This low confidence determination by the model is analogous to a low confidence interpretation by an endoscopist.3 4 There was no particular trend for a certain type of histology or morphology in polyps where there was not enough confidence generated by the model. In a resect and discard paradigm, a polyp that the model could not generate >50% confidence in a diagnosis would be resected and sent to pathology. However, the videos used to train and test the model in this study were not originally recorded for the purpose of this study. In this retrospective video dataset, in some cases, images were blurred, or only very partial views were obtained, and the confidence in prediction was low. When the model is used in a true live patient scenario, an endoscopist would be able to move the colonoscope tip and change the image in an attempt to allow the model to build up its confidence. This is essentially no different than what happens in day-to-day practice right now where endoscopists spend additional (few seconds) time looking more closely at a ‘possible’ polyp, washing the lens and so on. Thus, in actual clinical practice, the fraction of polyps with low confidence ratings may be lower than observed in this study. Such a clinical study is currently being planned.
The NICE classification system has been criticised for not incorporating sessile serrated adenomas (SSAs), and the WASP schema has been suggested as an improvement as it incorporates SSAs.15 We chose traditional adenomas and hyperplastic polyps, and the NICE classification, for this study as a proof of concept. Our AI algorithm is pathology agnostic. It is important to point out that we do not specify ‘a priori’ any imaging features that may distinguish between type 1 and type 2 polyps, for example. We never code within the system any features that can help distinguish between types 1 and 2. Our model discriminates this from raw pixels and nothing more. A ‘binary’ decision could easily be a ‘three way or a four way or more’ decision, determining if a polyp is a traditional adenoma, a benign hyperplastic polyp or an SSA, a lymphoid aggregate or normal tissue, for example. The only difference in such a model would be the composition of the training dataset. We are already collecting datasets to work on this clinical question of SSAs, but in our current work, we chose two polyp classes with our novel AI approach. Furthermore, there is currently great difficulty in studying AI or any other endoscopist method to identify SSP/SSA because the pathology gold standard is subject to marked interobserver variation in differentiation from hyperplastic polyps. Despite this limitation, the AI programme described here could still be used to support a resect and discard paradigm for diminutive polyp management because experts in this field are now endorsing a three-part strategy: resect and discard diminutive adenomas anywhere in the colon, identify and leave in place NICE type 1 lesions in the rectosigmoid (which are hyperplastic in >98% of cases), and resect and submit to pathology NICE type 1 lesions proximal to the sigmoid (to allow the opportunity to identify SSAs by pathology in these lesions).
Incorporation of AI into widespread community clinical use will be challenging. Any new technology will have to have minimal impact on the workflow of the endoscopist and also not be distracting with its onscreen presence. The ‘form factor’ for incorporation of AI into clinical endoscopy will be crucial to its adoption and safe use. There will also be significant regulatory and reimbursement hurdles to overcome before artificial intelligence in endoscopy becomes a reality in clinical practice. Gaining the confidence of the physician community will be key, so that physicians can in turn gain the confidence of patients that a computer can help the doctor to make a diagnosis, or meet the even bigger challenge of placing trust in the exclusive decision making of a computer.
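The management strategy described above, combined with the 50% confidence threshold used by the model, can be written as a simple decision rule for diminutive polyps. This is a minimal sketch for illustration only: the function name, argument names and the `Action` labels are hypothetical, and the handling of low confidence predictions follows the statement in this paper that such polyps would be resected and sent to pathology.

```python
from enum import Enum


class Action(Enum):
    RESECT_AND_DISCARD = "resect and discard"
    LEAVE_IN_PLACE = "leave in place"
    RESECT_AND_SUBMIT = "resect and submit to pathology"


def triage_diminutive_polyp(nice_type, credibility, in_rectosigmoid, threshold=0.5):
    """Map a model prediction for a diminutive polyp to a management action.

    nice_type: 1 (hyperplastic appearance) or 2 (adenomatous appearance).
    credibility: model credibility in [0, 1]; below threshold means low confidence.
    in_rectosigmoid: True if the polyp lies in the rectosigmoid.
    """
    if credibility < threshold:
        # Low confidence: rely on pathology.
        return Action.RESECT_AND_SUBMIT
    if nice_type == 2:
        # High confidence adenoma anywhere in the colon.
        return Action.RESECT_AND_DISCARD
    if nice_type == 1:
        # Type 1 lesions proximal to the sigmoid go to pathology so SSAs can be identified.
        return Action.LEAVE_IN_PLACE if in_rectosigmoid else Action.RESECT_AND_SUBMIT
    raise ValueError("nice_type must be 1 or 2")
```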
For the future, a similar deep learning approach also holds substantial potential to facilitate detection by highlighting areas of possible adenomatous or serrated mucosa for close inspection by the endoscopist. In addition, the strategy followed here for training the programme to differentiate hyperplastic polyps from adenomatous could potentially be used to improve diagnostic assessment of a variety of endoscopic images, addressing clinical problems such as identification of dysplasia in Barrett’s oesophagus and detection of intestinal metaplasia and dysplasia in the gastric mucosa. Furthermore, alternative endoscopic images such as confocal laser and endocytoscopy can potentially be used to train this platform to provide automatic interpretation of clinically acquired images.
In summary, we have demonstrated that an AI model can achieve high accuracy in sorting diminutive colorectal polyps into conventional adenoma versus hyperplastic polyps when used on unaltered colon polyp video sequences. We are planning clinical trials to evaluate the potential of this imaging analytics AI technology in day-to-day practice.
Contributors Study conception and design: MFB. Drafting the manuscript: MFB and DKR. Data analysis: all authors. Development of the artificial intelligence model: NC, FS and FC. Video recording: DKR. Critical revision of the manuscript: all authors.
Funding This work was primarily supported by ‘ai4gi’, a joint venture between Satis Operations Inc and Imagia Cybernetics.
Competing interests MFB: CEO and shareholder, Satis Operations Inc, ‘ai4gi’ joint venture; research support: Boston Scientific. NC: Imagia shareholder, ‘ai4gi’ joint venture. FS: Imagia shareholder, ‘ai4gi’ joint venture. CO: Imagia shareholder, ‘ai4gi’ joint venture. FC: Imagia shareholder, ‘ai4gi’ joint venture. DKR: consultant: Olympus Corp and Boston Scientific; research support: Boston Scientific, Endochoice and EndoAid.
Ethics approval Indiana University.
Provenance and peer review Not commissioned; internally peer reviewed.