Article Text

New artificial intelligence system: first validation study versus experienced endoscopists for colorectal polyp detection
  1. Cesare Hassan1,
  2. Michael B Wallace2,
  3. Prateek Sharma3,
  4. Roberta Maselli4,
  5. Vincenzo Craviotto4,
  6. Marco Spadaccini4,
  7. Alessandro Repici4
  1. 1Endoscopy Unit, Nuovo Regina Margherita Hospital, Rome, Italy
  2. 2Endoscopy Unit, Mayo Clinic, Jacksonville, Florida, USA
  3. 3Endoscopy Unit, Veterans Affairs Medical Center and University of Kansas, Lawrence, Kansas, USA
  4. 4Endoscopy Unit, Istituto Clinico Humanitas, Rozzano, Italy
  1. Correspondence to Dr Marco Spadaccini, Humanitas University, Rozzano 20089, Italy; marco.spadaccini{at}humanitas.it

Message

To improve colorectal polyp detection, a new artificial intelligence (AI) system (GI-Genius, Medtronic) was trained and validated on a dataset of white-light colonoscopy videos from a high-quality randomised controlled trial, assessing the detection rate and the reaction time (RT) on a per-lesion basis (n=337/338; sensitivity 99.7%); false-positive frames accounted for less than 1% of frames from the whole colonoscopy. The AI system was faster than the endoscopists in 82% of cases (n=277/337; difference −1.27±3.81 s). This promising system will be tested in clinical studies.

In more detail

Despite its efficacy in colorectal cancer prevention, colonoscopy is affected by a high miss rate of neoplastic lesions and unacceptable variability in adenoma detection rate among individual endoscopists.1 2 By aiding polyp detection on colonoscopy images (figure 1), artificial intelligence (AI) can reduce performance variability. AI algorithms for object detection usually comprise a convolutional neural network trained using images annotated by experts as the ground truth. Once trained, these AI systems are able to detect and pinpoint objects, such as colorectal polyps, in real time. A new AI system (GI-Genius, Medtronic) was trained and validated using a series of videos of 2684 histologically confirmed polyps from 840 patients who underwent high-definition white-light colonoscopy as part of a previous randomised controlled study with centralised pathology.3 A total of 1.5 million images showing these polyps from different perspectives were extracted from the videos and manually annotated by expert endoscopists. For the purpose of the study, patients were randomised between a validation group and a training group. In detail, for the validation phase, 338 polyps (168/338 adenomas or sessile serrated adenomas, 49.7%) from 105 patients were used. For each of these polyps, a video clip was cut starting 5 s before polyp appearance and ending when the snare/biopsy forceps appeared. To assess sensitivity, a true positive per lesion (TPL) was defined when the target polyp, irrespective of its histology, was detected by AI in at least one frame; otherwise the polyp was counted as a false negative per lesion (FNL). To assess the impact of false positives on the procedure, a per-frame metric was defined as activation noise (AN). For AN, two experienced endoscopists in rotation reviewed all the frames in the full-length colonoscopy videos where the algorithm marked a detection, discriminating between false and true positives.
AN for a given video was defined as the ratio between the number of false-positive frames and the total number of frames in the full-length video. To quantify the aptitude of AI to detect a polyp before or after the endoscopist's first perception of the same polyp, a per-polyp metric was defined as reaction time (RT). For RT, five expert endoscopists were asked to observe each of the 338 video clips, pressing a button as soon as they detected the appearance of a polyp; sham video clips (67 videos not containing polyps) were included to prevent operator bias; baseline RT was measured before starting the experiment by showing a white box appearing suddenly on a dark background (10 measurements for each endoscopist). The earliest detection of each polyp by AI was compared against the mean RT of the five reviewers for the same polyp (corrected for baseline RT).

Overall sensitivity per lesion was 99.7% (337 TPL and 1 FNL). Frames with false-positive results were limited to less than 1% of the whole colonoscopy video, corresponding to an AN of 0.9%±0.5%. The AI system anticipated the detection of polyps relative to the average of the five reference endoscopists in 277/337 (82%) cases. In detail, the difference in RT was −1.27±3.81 s.
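The three metrics defined above (per-lesion sensitivity, AN and RT difference) reduce to simple ratios and means. The following Python outline is a minimal sketch with illustrative values and hypothetical function names; it does not reproduce the study's actual code:

```python
def sensitivity_per_lesion(detected_flags):
    """Fraction of lesions flagged by AI in at least one frame (TPL vs FNL)."""
    return sum(detected_flags) / len(detected_flags)


def activation_noise(false_positive_frames, total_frames):
    """AN: share of frames in the full-length video falsely marked by AI."""
    return false_positive_frames / total_frames


def rt_difference(ai_first_detection_s, reviewer_rts_s, baseline_rts_s):
    """AI detection time minus the mean reviewer RT corrected for baseline RT.

    A negative value means AI flagged the polyp before the (baseline-corrected)
    average human reviewer.
    """
    mean_reviewer = sum(reviewer_rts_s) / len(reviewer_rts_s)
    mean_baseline = sum(baseline_rts_s) / len(baseline_rts_s)
    return ai_first_detection_s - (mean_reviewer - mean_baseline)


# Illustrative numbers only (the 337/338 split mirrors the reported result):
print(sensitivity_per_lesion([True] * 337 + [False]))  # ≈0.997
print(activation_noise(450, 50_000))                   # 0.009, i.e. 0.9%
print(rt_difference(1.0, [2.8, 3.1, 2.5, 3.0, 2.9], [0.4] * 10))
```

Note that the AN denominator is the full-length video, not only the frames containing a polyp, which is why even a sub-1% AN can correspond to a non-trivial absolute number of false-positive frames in a ~50 000-frame colonoscopy.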

Figure 1

Diminutive adenoma of the proximal colon with artificial intelligence-aided detection shown by a green frame.

Comment

According to our data, an AI system is able to detect virtually all the lesions extracted from a high-quality randomised study performed by expert endoscopists, anticipating the diagnosis compared with the human reader in the vast majority of cases. We also showed that the rate of false-positive results is negligible, demonstrating the high precision of an AI-based algorithm in discriminating between normal mucosa, on the one hand, and adenomatous and serrated lesions, on the other.

The nearly 100% sensitivity per lesion is clinically relevant, as it indicates that AI detection occurs irrespective of the shape, size, location and histology of the lesion. On the other hand, the miss rate of a human endoscopist has been strictly associated with one or more of these factors.1 As only one in every two lesions was rated as adenomatous by centralised pathology in the original study3 (see online supplementary data), AI may result in the unnecessary removal of hyperplastic polyps. Thus, a leave-in-situ strategy, at least for distal non-adenomatous lesions, is critical for AI not to increase the false-positive rate at colonoscopy. Similarly, the <1% rate of frames with AI-related false-positive results is likely to exclude any relevant detrimental effect of AI on withdrawal time. However, real-life data are needed, as the absolute number of false positives may still be relevant, considering that, on average, each colonoscopy contains nearly 50 000 frames. One of the main causes of false positives was insufficiently distended folds, as usually occurs in the insertion phase.

The anticipation of diagnosis in >80% of cases deserves a special comment in relation to the study design. Despite the general enthusiasm, prospective series are affected by unavoidable operator-related bias, as the endoscopist cannot, in most cases, be blinded to the innovative technique. Thus, retrospective reassessment of whole colonoscopy videos, such as those in our series, has the unique advantage of eliminating all of these subjective biases, objectively exposing independent videos to the new technology. However, a possible bias that we could not eliminate is that, in each video clip, AI had a relatively long interval, from a few seconds before polyp appearance to the use of the snare or another device, in which to detect the polyp. Thus, we could not exclude in principle that AI was somewhat facilitated by the fact that the polyp is usually brought into a better and closer position relative to the endoscope before resection. However, our additional analysis of the RT, adjusted for the naturally slower human reaction to any visual finding, consistently indicates that AI was faster than a human reader in most cases, minimising any such bias in our study.

Faster detection by AI as compared with a human reader is also clinically relevant. An AI that merely confirmed a diagnosis already made by the endoscopist would be disappointing, as the endoscopist would not perceive any additional benefit of AI for polyp detection. On the other hand, the endoscopist would increasingly trust an aided diagnosis when it guides polyp detection. Given the artificial setting of our study, however, the additional benefit of AI on adenoma detection in real life must be assessed for both high and low detectors.

The clinical implications of our findings are relevant when assuming that most of the adenoma miss rate at colonoscopy, as well as variability in adenoma detection rate across endoscopists, is related to perceptual errors. There is convincing evidence that individual endoscopists routinely fail to recognise polyps actually visible on the monitor.4 5 Limitations in human visual perception and other human factors, such as fatigue, distraction and level of alertness during examination, increase such perceptual errors, and AI appears to be the best way of mitigating them. In addition, a possible learning effect of AI on the human endoscopist cannot be excluded, especially for non-polypoid and subtle lesions. On the other hand, AI cannot compensate for lesions missed because of suboptimal exploration of the colorectal mucosa. Thus, an adequate level of cleansing, a longer than 6 min withdrawal time and a good withdrawal technique remain prerequisites to maximise the performance of AI.6 We limited our study to high-definition white-light colonoscopy. However, AI systems are generally very robust and, therefore, a possible good performance in other settings cannot be excluded. Likewise, we cannot exclude that other light modalities, such as blue-based chromoendoscopy, could have a synergistic effect with AI for polyp detection.

Unlike radiology procedures, colonoscopy and endoscopy in general are real-time procedures requiring complex analysis of millions of frames without the opportunity to review them afterwards. In most cases, the clinically relevant lesions are limited to a relatively small number of frames. By performing real-time analysis, AI appears to be a necessary aid for colorectal polyp detection, and it is likely to be incorporated into our clinical practice in the near future.

References

Footnotes

  • Contributors CH, MBW, PS and AR: substantial contributions to the conception or design of the work, or the acquisition, analysis or interpretation of data; final approval of the version published. All authors: drafting the work or revising it critically for important intellectual content; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests CH, AR and MBW: consultancy for Medtronic.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; internally peer reviewed.
