Artificial intelligence and colonoscopy experience: lessons from two randomised trials
  1. Alessandro Repici1,2,
  2. Marco Spadaccini1,2,
  3. Giulio Antonelli3,4,
  4. Loredana Correale2,
  5. Roberta Maselli1,2,
  6. Piera Alessia Galtieri2,
  7. Gaia Pellegatta2,
  8. Antonio Capogreco1,2,
  9. Sebastian Manuel Milluzzo5,
  10. Gianluca Lollo6,
  11. Dhanai Di Paolo7,
  12. Matteo Badalamenti2,
  13. Elisa Ferrara2,
  14. Alessandro Fugazza2,
  15. Silvia Carrara2,
  16. Andrea Anderloni2,
  17. Emanuele Rondonotti7,
  18. Arnaldo Amato7,
  19. Andrea De Gottardi6,
  20. Cristiano Spada5,
  21. Franco Radaelli7,
  22. Victor Savevski8,
  23. Michael B Wallace9,
  24. Prateek Sharma10,11,
  25. Thomas Rösch12,
  26. Cesare Hassan3
  1. 1 Department of Biomedical Sciences, Humanitas University, Milan, Italy
  2. 2 Endoscopy Unit, Humanitas Clinical and Research Center IRCCS, Rozzano, Italy
  3. 3 Gastroenterology and Digestive Endoscopy Unit, Ospedale Nuovo Regina Margherita, Roma, Italy
  4. 4 Department of Translational and Precision Medicine, “Sapienza” University of Rome, Rome, Italy
  5. 5 Digestive Endoscopy Unit, Poliambulanza Brescia Hospital, Brescia, Lombardia, Italy
  6. 6 Department of Gastroenterology and Hepatology, Università della Svizzera Italiana, Lugano, Switzerland
  7. 7 Division of Digestive Endoscopy and Gastroenterology, Valduce Hospital, Como, Italy
  8. 8 Artificial Intelligence Research, Humanitas Clinical and Research Center IRCCS, Rozzano, Italy
  9. 9 Endoscopy unit, Mayo Clinic, Jacksonville, Florida, USA
  10. 10 University of Kansas, Kansas City, Kansas, USA
  11. 11 Endoscopy unit, University of Kansas city, Kansas city, Kansas, USA
  12. 12 Interdisciplinary Endoscopy, University Hospital Hamburg-Eppendorf, Hamburg, Germany
  1. Correspondence to Professor Alessandro Repici, Gastroenterology and Endoscopy Unit, IRCCS Humanitas Research Hospital, Rozzano, Lombardia, Italy; alessandro.repici{at}hunimed.eu

Abstract

Background and aims Artificial intelligence has been shown to increase adenoma detection rate (ADR) as the main surrogate outcome parameter of colonoscopy quality. To what extent this effect may be related to physician experience is not known. We performed a randomised trial with colonoscopists in their qualification period (AID-2) and compared these data with a previously published randomised trial in expert endoscopists (AID-1).

Methods In this prospective, randomised controlled non-inferiority trial (AID-2), 10 non-expert endoscopists (<2000 colonoscopies) performed screening/surveillance/diagnostic colonoscopies in consecutive 40–80 year-old subjects using high-definition colonoscopy with or without a real-time deep-learning computer-aided detection (CADe) (GI Genius, Medtronic). The primary outcome was ADR in both groups with histology of resected lesions as reference. In a post-hoc analysis, data from this randomised controlled trial (RCT) were compared with data from the previous AID-1 RCT involving six experienced endoscopists in an otherwise similar setting.

Results In 660 patients (62.3±10 years; men/women: 330/330) with equal distribution of study parameters, overall ADR was higher in the CADe than in the control group (53.3% vs 44.5%; relative risk (RR): 1.22; 95% CI: 1.04 to 1.40; p<0.01 for non-inferiority and p=0.02 for superiority). Similar increases were seen in adenoma numbers per colonoscopy and in small and distal lesions. No differences were observed with regards to detection of non-neoplastic lesions. When pooling these data with those from the AID-1 study, use of CADe (RR 1.29; 95% CI: 1.16 to 1.42) and colonoscopy indication, but not the level of examiner experience (RR 1.02; 95% CI: 0.89 to 1.16) were associated with ADR differences in a multivariate analysis.

Conclusions In less experienced examiners, CADe assistance during colonoscopy increased ADR and a number of related polyp parameters as compared with the control group. Experience appears to play a minor role as a determining factor for ADR.

Trial registration number NCT04260321.

  • colonoscopy
  • adenoma
  • artificial Intelligence
  • colorectal cancer
  • screening

Data availability statement

Data are available upon reasonable request.

Significance of this study

What is already known on this subject?

  • A recent computer-aided detection (CADe) system for colonoscopy based on deep learning has been shown to increase detection rate of colorectal neoplasia by expert endoscopists.

  • The influence of examiner expertise versus CADe on the adenoma detection rate (ADR) is yet to be addressed.

What are the new findings?

  • In a randomised trial (AID-2), CADe was able to increase ADR of non-expert endoscopists by 22% as compared to the control group.

  • This was also the case for the adenoma per colonoscopy rate (increase by 21%), mainly due to a higher detection of subcentimetric flat and distal neoplasia.

  • Withdrawal time and rate of non-neoplastic lesions were not influenced.

  • When pooling the two randomised studies with expert and non-expert examiners together in a post-hoc analysis, CADe, but not the level of experience, was associated with an increased ADR.

How might it impact on clinical practice in the foreseeable future?

  • Early rather than late integration of CADe for non-expert endoscopists should be considered.

Background

Adenoma detection rate (ADR) is a proxy for the competence of the endoscopist, primarily their sensitivity for detecting colorectal neoplasia, and it has been inversely related to the risk of post-colonoscopy colorectal cancer (CRC).1–5 Such competence is gradually acquired throughout the training period, and it mainly depends on the acquisition of technical and cognitive skills, namely the maximisation of the inspection of the colorectal mucosa and the accurate recognition of polypoid and non-polypoid neoplastic lesions.6–8 Disappointingly, a standardised approach to acquisition and maintenance of these skills is lacking, generating variability in performance across endoscopists.3 7 9 Data on the effect of experience are controversial: in an artificial setting, non-expert endoscopists showed a lower sensitivity for colorectal neoplasia than experts,10 11 mostly for subtle lesions <5 mm.12 In a clinical scenario, the relationship between level of experience and ADR is less clear and has rarely been addressed13: for the initial training period, fewer than 300 colonoscopies or a 6-month training period seem to be enough to achieve an acceptable ADR.14 Beyond that phase, a study comparing endoscopists entering the German colonoscopy screening programme with those leaving it found a trend towards a higher ADR among incoming endoscopists.15

Artificial intelligence (AI), here referring to computer-aided detection (CADe), has been shown to increase ADR and adenomas per colonoscopy (APC) by 30% and 46%, respectively, in a recent randomised trial (AID-1) in expert endoscopists.16 In general, AI would appear ideal to offset the suboptimal skill in lesion recognition of non-expert endoscopists.17 18 However, ADR depends on various other factors, such as the experience level of the endoscopist. We therefore performed a randomised study in non-expert endoscopists (AID-2) comparing the ADR and related parameters between the CADe and the control group. In a post-hoc analysis, we then compared data from this trial with our previous AID-1 trial with expert endoscopists in an otherwise similar setting in order to further assess the multidimensional relationship among CADe, experience and ADR.

Methods

This multicentre, parallel, randomised, controlled, non-inferiority (NI) trial was performed in five centres (Italy: 4 and Switzerland: 1) where only non-expert endoscopists were selected to perform the study colonoscopies. The study was designed as an NI trial in analogy with the previous AID-1 study.16 However, a superiority analysis was planned in case NI was met (see below). The study was reported according to the Consolidated Standards of Reporting Trials (CONSORT) for randomised controlled trials (RCTs)19 and was registered on ClinicalTrials.gov. This was a non-profit study, and no funding was received or solicited, except the loan of the equipment by Medtronic. All authors had access to the study data and reviewed and approved the final manuscript.

Definition of non-expert endoscopist and study population

For the purpose of our study, we defined non-expert endoscopists as those with a lifetime volume of less than 2000 colonoscopies who were eligible to autonomously perform screening colonoscopy and non-complex therapeutic procedures, such as polypectomy or biopsy.16 The reason for this cut-off was to include endoscopists who were not novices but who had considerably less experience than the experienced endoscopists defined in our previous trial.16 This is in agreement with guidelines ascribing full colonoscopy and polypectomy competence above 1500 examinations20; on the other hand, there are few data on the association between lifetime experience (ie, years) in colonoscopy and ADR. Detailed data on the participating endoscopists in terms of lifetime procedures, training status, years of experience and yearly procedures are provided in online supplemental table S1.

We enrolled 40–80 year-old subjects undergoing colonoscopy for colorectal neoplasia diagnosis, which could be divided into four groups: (1) primary screening colonoscopy (outside of the regional screening programme), (2) work-up following faecal immunochemical test (FIT) positivity (cut-off=20 µg Hb/g faeces) within the national screening programme, (3) post-polypectomy surveillance and (4) work-up for symptoms/signs, called diagnostic colonoscopy. Patients were excluded in case of personal history of CRC, or IBD, previous colonic resection, antithrombotic therapy precluding polyp resection and lack of informed written consent. Those with poor level of cleansing (Boston Bowel Preparation Scale (BBPS): 0) were also excluded. These inclusion/exclusion criteria were the same as in the AID-1 study.16 Before colonoscopy, subjects were randomised in a 1:1 ratio between colonoscopy with or without CADe. Each centre received a list of random numbers generated by the coordinating centre. Randomisation was stratified by gender, age and personal history of adenomas. The operator was not blinded to the study arm assigned to the patient before colonoscopy treatment.

AI (CADe) and colonoscopy

In order to assist the endoscopist in real time for polyp detection, we used the same CADe (GI Genius, Medtronic) already adopted for AID-1, whose standalone performance has been described previously.16 21 Briefly, this CADe is based on a convolutional neural network that was trained on a series of 2684 histologically confirmed polyps diagnosed in a previous high-quality randomised trial.21 22 CADe was active for both insertion and withdrawal phases of the procedure, providing as output a bounding box any time a lesion suspected to be a polyp was recognised (figure 1).

Figure 1

Computer-aided detection ability to identify and localise one or more adenomatous lesions in real-time colonoscopy. The output appears on the same screen of the endoscopy system without affecting the routine technique of the operator.

For the purpose of the study, high-definition colonoscopes (ELUXEO 700, Fujifilm, Tokyo; EXERA III, Olympus CV-190; Olympus, Tokyo) were used. Bowel preparation was evaluated and graded by the endoscopist performing the exam, using the BBPS;23 a score of 6 and above was considered adequate. Subjects with scores 0 or 1 in any one of the three segments were excluded from the primary analysis. Intubation and inspection time during withdrawal were measured using a stopwatch, pausing during diagnostic (ie, biopsy) and therapeutic interventions and more extensive washing. Endoscopists were required to comply with a minimum of 6 min of inspection on withdrawal. All polyps were classified by location, size and morphology according to the Paris classification,24 and by histology according to the Vienna classification.25 An advanced adenoma was defined as an adenoma ≥10 mm and/or with villous component >20%, and/or high-grade dysplasia.25

Outcome measures

The primary outcome was the ADR according to the intervention arm. ADR was defined as the proportion of patients with at least one histologically proven adenoma including carcinoma. Sessile serrated lesions (SSLs) were not counted in the ADR calculation, in line with the previous studies relating ADR to post-colonoscopy cancer risk.1 2 4 5 Secondary outcomes were proximal ADR, total number of polyps detected, (separate) SSL detection rate, APC, caecal intubation rate and withdrawal time. APC was defined as the mean number of adenomas detected per colonoscopy. We also defined non-neoplastic resection rate as the proportion of patients with no adenoma or SSL within any excised lesions who had undergone at least one excision with histopathological examination.
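As a minimal illustration of these per-patient definitions, the sketch below computes ADR and APC from a list of patient records. The `Patient` record, its field name and the toy counts are hypothetical and are not trial data.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record: number of histologically proven adenomas
    # (carcinoma included, SSLs excluded) found at one colonoscopy.
    adenomas: int


def adr(patients):
    """Proportion of patients with at least one adenoma."""
    return sum(p.adenomas >= 1 for p in patients) / len(patients)


def apc(patients):
    """Mean number of adenomas per colonoscopy."""
    return sum(p.adenomas for p in patients) / len(patients)


# Toy cohort of four colonoscopies (invented numbers).
toy = [Patient(0), Patient(2), Patient(1), Patient(0)]
print(adr(toy), apc(toy))
```

ADR counts each patient once however many adenomas are found, whereas APC rewards every additional adenoma, which is why the two measures can move differently.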

Sample size

The sample size was calculated based on the evaluation of primary outcome, that is, the per-patient ADR. A sample size of 322 patients per arm was required, based on the expected ADR of 35% and 37% for unassisted and CADe-assisted colonoscopy, a NI margin of 10%, power of 90% and an alpha level of 2.5% (one-sided). Such 10% NI margin would correspond to the minimally acceptable ADR cut-off of 25%.3
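The text does not state which formula or software produced the figure of 322 patients per arm. As a rough check only, the sketch below applies a standard unpooled normal-approximation formula for a non-inferiority comparison of two proportions. The function name and the choice of formula are assumptions. The result is of the same order as the reported figure but is not expected to reproduce it exactly.

```python
from math import ceil
from statistics import NormalDist


def ni_sample_size(p_control, p_test, margin, alpha_one_sided, power):
    """Per-arm sample size, unpooled normal approximation (an assumption)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    # Distance between the expected difference and the NI boundary.
    effect = (p_test - p_control) + margin
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Inputs taken from the text: ADR 35% vs 37%, margin 10%,
# one-sided alpha 2.5%, power 90%.
n_per_arm = ni_sample_size(0.35, 0.37, 0.10, 0.025, 0.90)
print(n_per_arm)

# The 10% margin applied to the expected control ADR gives the 25% floor.
print(round(0.35 - 0.10, 2))
```

Differences from the published figure may reflect a different variance estimate, a continuity correction or software-specific rounding.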

Post-hoc analysis: pooling AID-1 and AID-2 studies

To compare outcome measures according to the levels of experience of the colonoscopists, we pooled the present study (AID-2) with our previous study on expert endoscopists (AID-1, six endoscopists with >2000 examinations) in an otherwise similar setting.16 Briefly, AID-1 was a randomised, multicentre NI trial performed in three centres also participating in the AID-2 study, following the same eligibility criteria, study procedures and exclusion criteria (685 patients; mean ADR, CADe group: 40.4%; control group: 54.8%; relative risk, 1.30; 95% CI, 1.14 to 1.45).16 Additional details on characteristics of the six experienced endoscopists are provided in the Appendix (online supplemental table S1) in order to facilitate the comparison with those non-experienced of the present trial, and to assess more in general the relationship between experience and ADR. We also provided ADR according to indications and study arm in both of studies (Rev#1–2).

Statistical analysis

Data analysis included descriptive statistics computed for continuous variables including means and SDs. Percentages were used for categorical variables. Analyses were based on the intent-to-treat population using all randomised patients who underwent colonoscopy. At patient level, a two-sided 95% CI (one-sided 97.5%) for the difference in ADR between study arms was calculated. NI was met if the lower one-sided 97.5% confidence limits excluded a 10% or greater difference in favour of standard colonoscopy. If NI was demonstrated, the outcomes were assessed for superiority (one-sided p<0.025) using the Fisher’s exact test.
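The decision rule can be sketched as follows, assuming a simple Wald interval for the difference in proportions. The interval method actually used is not specified in the text, and the function name and the counts in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ni_met(x_test, n_test, x_ctrl, n_ctrl, margin=0.10, level=0.95):
    """Return (NI met?, (lower, upper)) for the ADR difference, test minus control."""
    p_t, p_c = x_test / n_test, x_ctrl / n_ctrl
    diff = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_test + p_c * (1 - p_c) / n_ctrl)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower, upper = diff - z * se, diff + z * se
    # NI holds when the lower limit rules out a deficit of `margin` or more.
    return lower > -margin, (lower, upper)


# Hypothetical counts: 120/300 with the device vs 110/300 without.
met, (lower, upper) = ni_met(120, 300, 110, 300)
print(met, round(lower, 3), round(upper, 3))
```

The lower limit of the two-sided 95% interval is the same number as the one-sided 97.5% limit, which is why the two are used interchangeably in the paragraph above.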

We performed further statistical analyses to examine the effect of CADe on diagnostic performance measures. Specifically, a multiple logistic regression model was used to estimate the adjusted relative risk (RR) and its 95% CI, with study arm, patient age, gender and indication for colonoscopy as covariates and the presence or absence of one or more adenomas as the response variable. Univariate logistic regression analyses were performed to explore differences in ADR by size, morphology and location between study arms. No adjustment for multiple testing was made, as analyses beyond the primary comparison (the overall ADR) were considered exploratory only. To compare the detection of all adenomas (per-polyp analysis) a Poisson regression model was used. Sensitivity analyses using the negative binomial model were performed to take into account overdispersion (variance larger than expected). For all regression analyses, we fitted multilevel models given the hierarchical structure of our data (ie, patients are aggregated at the endoscopist level). Each model was a two-level model in which patient characteristics were included as fixed effects and in which the endoscopist was introduced as a random effect. Multilevel models also allow for assessment of heterogeneity among endoscopists. For this reason, we also fitted two-level models without covariates to calculate the intracluster correlation coefficient (ICC), which is considered a measure of endoscopist effects on the outcome. When the ICC is negligible (<0.05), one could consider running a traditional one-level regression analysis, because the observations do not depend on the endoscopist cluster. Conversely, ICC=1 indicates that the observations vary only between clusters. We did not control for clustering within the endoscopy centre.
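How the ICC was computed is not stated. One common convention for a two-level logistic model is the latent-variable formulation, in which the patient-level variance is fixed at π²/3. The sketch below assumes that convention; the function names and the example variances are hypothetical.

```python
from math import pi


def latent_icc(random_intercept_variance):
    """ICC on the logit scale for a two-level logistic model (assumed formula)."""
    level1_variance = pi ** 2 / 3  # variance of the standard logistic distribution
    return random_intercept_variance / (random_intercept_variance + level1_variance)


def clustering_negligible(icc, threshold=0.05):
    """Apply the <0.05 rule of thumb quoted in the text."""
    return icc < threshold


# Hypothetical endoscopist-level variances on the logit scale.
for variance in (0.0, 0.1, 1.0):
    icc = latent_icc(variance)
    print(variance, round(icc, 3), clustering_negligible(icc))
```

Under this formulation the ICC approaches 1 only as the between-endoscopist variance becomes very large relative to π²/3.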

For the pooled analysis (AID-1+AID-2), adenoma detection was modelled using logistic regression including normally distributed, colonoscopist-specific random effects to account for the correlation among exams performed by the same colonoscopist. Adenoma detection rates were estimated at the median of the random effects distribution. Adjusted, colonoscopist-specific relative performance was measured by a RR with 95% CIs comparing CADe use to no CADe (ie, control arm), adjusting for patient age, patient gender, endoscopist experience and indication for colonoscopy.

All statistical analyses were performed using R software V.4.0.3 (2020-10-10). Unless reported differently, a p value of 0.05 (two-sided) was used as the threshold for statistical significance.

Results

Study population

A total of 678 subjects were considered eligible for the study between February and December 2020. After the exclusion of 18 patients (figure 2), the study cohort included 660 randomised patients (men: 50%, mean age 62±10 years). Of these, 330 were allocated to the CADe arm, and 330 to the control arm; no difference in clinical indication as well as bowel cleanliness between the two arms was found (table 1). Caecal intubation was achieved in all patients after excluding patients with inadequate bowel preparation (figure 2).

Figure 2

Study flow-chart including clinical outcomes. *Relative risk 1.22 (1.04 to 1.40). **Incidence rate ratio 1.21 (1.05 to 1.40). CADe, computer-aided detection; HD, high-definition.

Table 1

Patients’ characteristics according to the intervention arm

Per-patient analysis

In the CADe group, 176/330 patients were diagnosed with at least one adenoma or CRC at colonoscopy as compared with 147/330 patients in the control group, corresponding to an ADR of 53.3% and 44.5%, respectively (figure 3). Compared with the standard colonoscopy, CADe was associated with a difference in proportion of detected adenomas of 8.8% (95% CI: 2% to 17.9%). This means that ADR in the CADe group was non-inferior to the control group (p for NI: <0.01; online supplemental table S2). After adjusting for age, gender and indication, ADR was significantly higher in the CADe as compared with the control group (RR, 1.22; 95% CI: 1.04 to 1.40; p for superiority=0.02; table 2; online supplemental table S3).
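The unadjusted figures can be recomputed from the counts in this paragraph. The sketch below derives the two ADRs, their difference and a crude RR with a log-scale Wald interval. This is an unadjusted illustration and not the multilevel model behind the reported RR, so small differences from the published estimate are expected.

```python
from math import exp, log, sqrt

# Counts reported in the text.
cade_pos, cade_n = 176, 330
ctrl_pos, ctrl_n = 147, 330

adr_cade = cade_pos / cade_n
adr_ctrl = ctrl_pos / ctrl_n
diff = adr_cade - adr_ctrl

# Crude relative risk with a Wald interval computed on the log scale.
rr = adr_cade / adr_ctrl
se_log_rr = sqrt(1 / cade_pos - 1 / cade_n + 1 / ctrl_pos - 1 / ctrl_n)
ci = (exp(log(rr) - 1.96 * se_log_rr), exp(log(rr) + 1.96 * se_log_rr))

print(f"ADR {adr_cade:.1%} vs {adr_ctrl:.1%}, difference {diff:.1%}")
print(f"crude RR {rr:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

The crude RR of about 1.20 sits close to the adjusted estimate of 1.22, as would be expected in a randomised comparison with balanced covariates.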

Figure 3

(Per-patient) Adenoma detection rate as well as (per-patient) adenoma detection rate by adenoma features. Adenoma size category, large (≥10 mm) versus small (<10 mm); adenoma morphology, polypoid (pedunculated (0-Ip), sessile (0-Is) or mixed (0-Isp)) versus non-polypoid lesions (superficial slightly elevated (IIa), flat (IIb), superficial depressed (IIc) and excavated (III) types); and colon location, proximal colon (caecum, ascending and transverse) versus distal colon (descending, sigmoid and rectum).

Table 2

Per-patient analysis: adenoma detection rate (ie, rate of patients with at least one lesion of different histology) according to intervention arm, as well as distribution of adenomatous polyps according to morphology, size and location (for definitions, see text)

Table 2 shows the detailed results for the secondary outcomes on a per-patient basis: Significant differences (p=0.019 for superiority) were only found in the subgroups with non-advanced adenomas as well as in distal adenomas; ADR in all other subgroups as well as rate of SSL were not significantly different. Overall, 430/660 (65.2%) patients had polyp resections. Of these, 79/430 (18.4%) did not have histologically proven adenomas, SSLs or CRCs. These non-neoplastic polyp rates, representing ‘unnecessary’ polypectomies, were 12.1% and 11.8% in CADe and control group, respectively (RR: 1.03; 95% CI: 0.67 to 1.53).

Per-polyp analysis

Characteristics of detected polyps and cancers according to intervention arm are summarised in online supplemental tables S4 and S5. Table 3 summarises the results for APC overall (1.15±1.79), which was significantly higher in the CADe than in the control group (1.26±1.82 vs 1.04±1.75; incidence rate ratio, 1.21; 95% CI: 1.05 to 1.40). A statistically significant increase in APC between CADe and control group was found for non-polypoid lesions, those <10 mm and those with distal location. Sensitivity analyses using negative binomial regression gave similar results, namely that CADe had a positive effect on APC (online supplemental table S6). Finally, the association between APC and study arm remained significant after adjusting for gender and colonoscopy indication in a random effect model (online supplemental table S7). APC by study colonoscopists are shown in the online supplemental table S7.
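The reason for the negative binomial sensitivity analysis can be illustrated from the summary statistics above. Under a Poisson model the variance equals the mean, so a variance-to-mean ratio well above 1 signals overdispersion. The helper below is a hypothetical sketch using only the reported means and SDs.

```python
def dispersion_index(mean, sd):
    """Variance-to-mean ratio; about 1 under a Poisson model."""
    return sd ** 2 / mean


# Mean ± SD of adenomas per colonoscopy as reported in the text.
overall = dispersion_index(1.15, 1.79)
cade = dispersion_index(1.26, 1.82)
control = dispersion_index(1.04, 1.75)
print(round(overall, 2), round(cade, 2), round(control, 2))
```

All three ratios exceed 2, which is consistent with the decision to check the Poisson results against a model that allows extra variance.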

Table 3

Per-polyp analysis: mean number of adenomas per colonoscopy (APC) and univariate Poisson regression analysis by polyp characteristics among study participants (n=660); (for definitions, see text)

Effect of endoscopists experience and CADe on ADR by pooling AID-1 and AID-2 studies

In our post-hoc analysis, the key characteristics of the study populations enrolled in the two studies AID-1 and AID-2 are summarised in online supplemental table S9. Overall, 1345 patients (50.4% women; mean age, 61.8 years) were included in the pooled analysis. Parameters between the two studies were not significantly different except for FIT+ screening indications (30.2% AID-1 vs 7.3% AID-2) as explained above in Methods. The AID-1 and AID-2 studies reported consistent results, with non-significant interstudy variability.

The data from the two studies were pooled and analysed by using a multilevel mixed-effects model. Results from the pooled analysis including all patients enrolled in either AID-1 or AID-2 are reported in table 4. A (meta-analytical) forest plot of the study outcome estimates in both studies is depicted in online supplemental figure S2.

Table 4

Pooled analysis of AID-1 (n=685) and AID-2 (n=660) data: results from a two-level logistic regression model with adenoma detection as the outcome and patient’s characteristics (ie, age, gender, colonoscopy indication) and endoscopist’s experience as the regressors

According to this multilevel regression model, the adenoma detection rate was significantly improved with the use of CADe (RR, 1.29; 95% CI: 1.16 to 1.42; table 4), but the level of experience did not have a statistically significant effect on ADR (non-expert vs expert, RR: 1.02; 95% CI: 0.89 to 1.16). These results remained the same when endoscopist experience was treated as a continuous variable (ie, years of experience, figure 4) and for all colonoscopy indication subgroups (except for diagnostic colonoscopies, p=0.181). In addition, colonoscopy indication per se was a significant predictor of ADR, which was significantly lower in primary screening or diagnostic colonoscopies than in FIT+ colonoscopies (table 4, details in online supplemental figure S3-S5). However, a sensitivity analysis in which we excluded FIT+ patients (including n=1049 patients) showed similar results to those for the overall study population (CADe vs control, RR, 1.23; 95% CI: 1.04 to 1.46, p=0.016; non-experts vs experts, RR, 0.95; 95% CI: 0.69 to 1.33, p=0.801), suggesting that our findings were not driven by the FIT+ subgroup.

Figure 4

Pooled data analysis of the two randomised controlled trials (AID-1: 6 experienced endoscopists; AID-2: 10 inexperienced endoscopists). Pooled ADR according to experience of colonoscopy in years was reported without and with CAD. Only CAD, but not years of experience, was found associated with ADR at regression analysis. ADR, adenoma detection rate; CAD, computer-aided detection.

While variability in ADR among experts (AID-1) was not significant in either group (p=0.55), among non-experts it was significant only in the CADe group (p=0.02; see online supplemental figure S4) and not in the control group (p=0.94).

Discussion

In our randomised trial on non-expert endoscopists, the use of artificial intelligence (CADe) for polyp detection led to a 22% and 21% increase in ADR and APC, respectively. These results supported the trial's primary assumption of NI, but also showed CADe to be superior to standard colonoscopy in this setting. As with most other studies on AI as well as other imaging techniques,26 the AI benefit was mainly due to a higher detection rate of subcentimetric non-polypoid and distal colorectal neoplasia. AI did not prolong examination time through a longer inspection or by prompting endoscopists to ‘unnecessarily’ resect a higher number of non-neoplastic polyps, probably due to the ability of non-experts to recognise irrelevant distal hyperplastic polyps which do not require resection. Finally, when pooling results of this study with those of our previous RCT on expert endoscopists in a post-hoc analysis, we also showed that the level of experience did not influence ADR or the ADR improvement by CADe. This was somewhat unexpected, since the common belief is that experts may have a higher ADR than non-experts.

As with all other studies on imaging techniques and devices to improve ADR, the background assumption of our study is that a higher ADR may lead to a better patient outcome. Four studies1 2 4 5 have shown a clear correlation of ADR with interval cancers, so that examiners with a higher ADR may prevent more CRCs. Final proof of the relevance of ADR as the surrogate parameter for outcome quality, however, may only come from long-term prospective follow-up studies that also include other parameters such as resection quality and adherence to surveillance.

Our study showed that AI also improves ADR in non-expert endoscopists, as well as other outcome parameters such as APC (ie, mean number of adenomas per colonoscopy) and especially the number of small adenomas. This confirms previous randomised studies, of which only one has been from the West (namely our RCT in expert endoscopists),16 whereas all others were performed in China,27–31 some of them with very low ADR in the control group.31

One initial assumption before we started our second study on AI for polyp detection, this time in non-expert endoscopists, was that this group of examiners would have a lower baseline ADR and might benefit even more from AI than the expert group. However, in contrast to what we expected, we saw a 4.1% higher ADR already in the control group, with a similar benefit due to AI in the study group. Since the two studies were also somewhat different in the distribution of indications, we performed a post-hoc multilevel analysis to elucidate the relevance of these different parameters, namely AI, experience and indications (among others), for ADR. It turned out that both AI and indication, but not experience, were relevant factors for ADR. In an exploratory analysis, we also found a higher degree of variability in the ADR increase with CADe among inexperienced as compared with experienced endoscopists. This may be related to a more scattered and somewhat inhomogeneous process of acquisition of the multiple endoscopic skills among non-expert as compared with expert endoscopists. We are well aware that these results are based on a post-hoc analysis and their validity may therefore be limited (as opposed to a four-arm randomised trial with experienced and less experienced endoscopists being allocated to different groups), but the observation is nevertheless interesting and may serve as a motivation to perform further studies.

In the literature, the influence of experience on ADR has been insufficiently studied, and in general, more data are available on the initial training/learning period, that is, in trainees, than from a later stage with higher overall case numbers, where non-experts are compared with experts. The latter question also suffers from insufficient data on a cut-off value to discriminate both groups. In general, acquisition and maintenance of competence in lesion recognition are complex, and no structured training for such cognitive skills is administered during the training and retraining of colonoscopy.7 32 The poorer detection rate of flat neoplasia in the control arm in our present study is in line with the poorer sensitivity of non-expert endoscopists reported, although only in artificial settings, for both the lower and the upper gastrointestinal (GI) tract.10 11 17 33 Furthermore, the learning and (re)training period not only consists of lesion recognition in a given endoscopic image on the screen (which can be influenced by AI), but also skills in scope and tip manipulation to expose as much as possible of the colonic mucosa including looking behind folds, which is much less aided by AI. These technical skills have been shown to be acquired early in the training phase, with a >90% caecal intubation rate being reported after an average of 200 procedures.7 The cognitive skills may be improved by an AI system that was largely trained on small and flat lesions (nearly 50% of all the training subset-phases), providing plausibility to the increased detection of flat lesions in the AI-assisted arm.21 34

The literature on experience and ADR in real-life studies is summarised in online supplemental table S10; it is evident that cut-off values to discriminate expert from non-expert endoscopists are variable and mostly indirect evidence is available. In the German screening colonoscopy registry, incoming endoscopists tended to perform better as compared with senior endoscopists.15 Similarly, in the Italian organised screening programme, endoscopists with less than 5-year experience did not have inferior ADR as compared with those with more experience.35 On a similar line, the eligibility criteria for potential screeners in the British CRC screening programme included a minimum of 1000 procedures.36 37 Of note, the endoscopists participating in our AID-2 study had an average of less than 3-year experience in GI endoscopy with an average lifetime volume of less than 1000 procedures, while those in AID-1 had an average of 20 years and 10 000 lifetime procedures. In general, the benefit of AI in both groups was independent of the indication, and it was similar in the subgroup of FIT+ subjects in both studies. The estimates of ADR in FIT+ subjects in AID-1 and AID-2 (44% and 42%) were also in line with a previous estimate from a nationwide FIT+ database (44%).35

A possible limitation of this study on non-expert endoscopists is that we did not compare AI assistance with alternative educational interventions, such as short retraining courses that have been shown to be effective in improving ADR of less expert endoscopists. Such interventions appear complementary, and a synergistic effect may be reasonably considered. Differently from the standalone performance study,33 this effectiveness study design was not fit to assess the sensitivity/specificity of the device, as this would have severely affected the routine technique of real-life colonoscopy in such a trial. On the other hand, we provided confirmatory data on the interaction between endoscopist and AI in terms of main outcomes of colonoscopy. However, additional analyses are required to explain how the AI outputs can contribute to decision-making in clinical practice as recommended by the CONSORT-AI extension.38 Finally, the present AID-2 trial was powered for our primary outcome only, and no power calculations were done for any of our secondary outcomes.

In conclusion, non-expert endoscopists have a similar ADR increase by the use of AI as expert endoscopists. This especially concerns flat lesions. When considering the ethical and legal implications of autonomous procedures by non-expert operators, an early integration of AI in this setting would appear reasonable within a quality assurance programme for all the stakeholders involved in screening and surveillance colonoscopy.

Ethics statements

Ethics approval

This study was conducted according to the Declaration of Helsinki and approved by the Institutional Review Board of the Humanitas Research Hospital (n 2363).

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter @adegottardi

  • Contributors AR, MS and CH designed the study and drafted the manuscript. LC performed statistical analysis. AR, MS, GA, RM, PAG, AC, SMM, GL, MB, EF, AF, SC, AAn, AAm, ADG, CS, FR and CH recruited patients, performed colonoscopy procedures and/or participated in the data collection. MBW, PS, VS and TR critically revised the draft for important intellectual content. All the authors revised and approved the final manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests All authors declare equipment loan by Medtronic. AR and CH received consultancy fees from Medtronic. MBW provides consulting activity to Medtronic and Cosmo on behalf of Mayo Clinic and has equity interest in Virgo.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.