Original article
Training and competence assessment in GI endoscopy: a systematic review
  1. Vivian E Ekkelenkamp,
  2. Arjun D Koch,
  3. Robert A de Man,
  4. Ernst J Kuipers
  1. Erasmus MC—University Medical Center, Rotterdam, The Netherlands
  1. Correspondence to Vivian E Ekkelenkamp, Erasmus MC University Medical Center Rotterdam, The Netherlands; v.ekkelenkamp{at}


Introduction Training procedural skills in GI endoscopy once focused on threshold numbers. As threshold numbers poorly reflect individual competence, the focus gradually shifts towards a more individual approach. Tools to assess and document individual learning progress are being developed and incorporated in dedicated training curricula. However, there is a lack of consensus and training guidelines differ worldwide, which reflects uncertainties on optimal set-up of a training programme.

Aims The primary aim of this systematic review was to evaluate the currently available literature for the use of training and assessment methods in GI endoscopy. Second, we aimed to identify the role of simulator-based training as well as the value of continuous competence assessment in patient-based training. Third, we aimed to propose a structured training curriculum based on the presented evidence.

Methods A literature search was carried out in the available medical and educational literature databases. The results were systematically reviewed and studies were included using a predefined protocol with independent assessment by two reviewers and a final consensus round.

Results The literature search yielded 5846 studies. Ninety-four relevant studies on simulators, assessment methods, learning curves and training programmes for GI endoscopy met the inclusion criteria. Twenty-seven studies on simulator validation were included. Good validity was demonstrated for four simulators. Twenty-three studies reported on simulator training and learning curves, including 17 randomised controlled trials. Increased performance on a virtual reality (VR) simulator was shown in all studies. Improved performance in patient-based assessment was demonstrated in 14 studies. Four studies reported on the use of simulators for assessment of competence levels. Current simulators lack the discriminative power to determine competence levels in patient-based endoscopy. Eight out of 14 studies on colonoscopy, endoscopic retrograde cholangiopancreatography and endosonography reported on learning curves in patient-based endoscopy and proved the value of this approach for measuring performance. Ten studies explored the numbers needed to gain competence, but the proposed thresholds varied widely between them. Five out of nine studies describing the development and evaluation of assessment tools for GI endoscopy provided insight into the performance of endoscopists. Five out of seven studies proved that intense training programmes result in good performance.

Conclusions The use of validated VR simulators in the early training setting accelerates the learning of practical skills. Learning curves are valuable for the continuous assessment of performance and are more relevant than threshold numbers. Future research will strengthen these conclusions by evaluating simulation-based as well as patient-based training in GI endoscopy. A complete curriculum with the assessment of competence throughout training needs to be developed for all GI endoscopy procedures.



Significance of this study

What is already known on this subject?

  • The use and validation of simulators in endoscopy training is extensively studied.

  • Moreover, the classic master–apprentice model for teaching endoscopy is no longer accepted.

  • Threshold numbers for the assessment of competence in GI endoscopy are widely used, but might be outdated.

What are the new findings?

  • There is sufficient evidence for the use of validated simulators in endoscopy training curricula. However, the extent to which simulator training should be carried out is a matter of debate.

  • Continuous assessment of a trainee's performance with a validated assessment tool is valuable and provides insight into the individual and group learning curve.

  • This is definitely more thorough than defining competence based on threshold numbers alone.

How might it impact on clinical practice in the foreseeable future?

  • We would advocate a prepatient training curriculum on a validated simulator for flexible endoscopy. Moreover, the use of learning curves in patient-based training is a sound method for the assessment of competence. However, more research is needed in order to evaluate a complete endoscopy training programme: from novice to competence to excellence.


The focus on training in procedural skills in GI endoscopy is shifting from threshold numbers towards an individual approach. This illustrates the awareness that the classic master–apprentice model may not reflect all necessary aspects of training. Moreover, the old adage “see one, do one” seems no longer appropriate for educating health professionals to perform complex technical procedures, such as flexible endoscopy.1 Virtual reality (VR) simulators may be of benefit in the education of gastroenterology trainees. However, a substantial part of training still has to be patient-based. The assessment of a trainee's competence is not clearly defined and competence benchmarks for trainees are sparse. The use of threshold numbers is nowadays considered a poor surrogate marker for competence. Keeping track of one's performance by measuring skill development seems preferable. However, training guidelines differ worldwide and there is no consensus on the skills a trainee has to possess at the end of education. On top of that, for most procedural skills in flexible endoscopy, the proper assessment tools to measure these skills are lacking.

The aim of this systematic review was therefore to evaluate the available literature on different training and assessment methods in GI endoscopy. Second, we aimed to identify the role of simulator training and competence development in patient-based training, specifically for procedures that normally will be learnt during residency. Third, we aimed to propose a structured training curriculum based on the presented evidence.


Literature search strategy

A systematic literature search was carried out in July 2013 in seven different medical and educational literature databases: Embase, Medline OvidSP, Web of Science, Cochrane Central, Google Scholar, Research and Development Resource Base and Education Resources Information Center. There was no restriction regarding the time of publication or language. The search strategy for Medline OvidSP is shown in supplementary table S1.

Inclusion and exclusion criteria

All studies pertaining to training and assessment in GI endoscopy (colonoscopy, esophagogastroduodenoscopy [EGD], endoscopic retrograde cholangiopancreatography [ERCP] and endosonography [EUS]) were included in this review. The studies had to report outcome measures with respect to learning curves, assessment methods or tools and training programmes including simulators. Two reviewers independently examined all retrieved studies. When disagreement existed over studies to be included or excluded, these were discussed until consensus was reached. Reviews, systematic reviews, meta-analyses and abstracts were excluded, as well as studies on tools to improve completion of colonoscopy. However, reference lists of potentially relevant systematic reviews and meta-analyses were checked for any missed papers.

Data extraction and analysis

For each study, the methods, way of assessment and endpoints were recorded according to a predefined protocol. Two reviewers extracted all data. The quality of the studies was appraised and the reviewers assigned a level of evidence to each study using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) criteria.2 The quality of the studies was classified into four categories: very low (⊕∘∘∘); low (⊕⊕∘∘); moderate (⊕⊕⊕∘); and high (⊕⊕⊕⊕). Based on the presented evidence, we aimed to provide recommendations regarding the different subjects. The validation method and type of each simulator study were designated according to the consensus guidelines for the validation of VR simulators as described by Carter et al.3 The validation of simulators is in most cases performed by demonstrating different types of validity. Validity in itself is defined as the extent to which an assessment tool, in this case a simulator, measures what it is supposed to measure. One of the simplest forms of validity is face validity. This is demonstrated by asking a defined group of subjects to judge how realistically the simulator represents the real activity. Usually, a group of experts is questioned, which is why the term expert validity is also used. Construct validity describes the extent to which the simulator can distinguish between different levels of expertise. The most commonly used method of establishing construct validity is to show that the simulator's performance parameters distinguish beginners from more experienced endoscopists and experts. The reliability of the simulator relates to its ability to provide consistent results. The most commonly used test is test–retest reproducibility, which indicates to what extent a subject can ‘beat the test’ by repeated assessment. The most powerful evidence of validity is concurrent validity. This refers to the degree to which performance on the simulator correlates with performance in the real activity, in this case patient-based endoscopy.
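As an illustration of how concurrent validity might be quantified, the sketch below computes a Pearson correlation coefficient between simulator scores and patient-based assessment scores for the same subjects. The function name and the example scores are hypothetical and are not taken from any of the included studies, which used a variety of statistical methods.

```python
from math import sqrt


def pearson_r(simulator_scores, patient_scores):
    """Pearson correlation between paired simulator and patient-based scores.

    Illustrative helper only: the reviewed studies did not necessarily
    use this statistic.
    """
    if len(simulator_scores) != len(patient_scores) or len(simulator_scores) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(simulator_scores)
    mean_x = sum(simulator_scores) / n
    mean_y = sum(patient_scores) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(simulator_scores, patient_scores))
    sxx = sum((x - mean_x) ** 2 for x in simulator_scores)
    syy = sum((y - mean_y) ** 2 for y in patient_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores for five subjects (made-up numbers)
sim = [42.0, 55.0, 61.0, 70.0, 88.0]
pat = [3.1, 3.4, 4.0, 4.2, 4.9]
r = pearson_r(sim, pat)
```

A coefficient close to 1 would suggest that simulator performance tracks patient-based performance; a value near 0 would argue against concurrent validity.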

Since we aimed to provide a complete overview of the available literature on training and assessment in GI endoscopy, the included studies were fairly heterogeneous in clinical outcome. Therefore, it was judged that the statistical pooling of the data was not suitable.



Figure 1 shows the flowchart of the selection process of the included studies. Ninety-four studies investigating simulators, assessment methods, learning curves and training programmes for GI endoscopy were included in this review. In order to provide a systematic overview of these studies, they were divided into different categories (simulator training, learning curves, numbers needed to gain competence, assessment of performance and evaluation of [patient-based] training models). Table 1 provides an overview of the included studies per category. In a more detailed discussion, we will focus on studies of moderate to high quality.

Table 1

Overview of included studies per subject

Figure 1

Flow diagram of the studies included.

Simulator validation studies

Twenty-seven simulator validation studies were retrieved.4–29 The validation studies comprised the evaluation of the devices alone. We included 11 studies on colonoscopy, two on flexible sigmoidoscopy, three studies on basic flexible endoscopy in general, one study on EGD, six studies on ERCP, two on EUS and one study on dexterity exercises in forward viewing endoscopy. All studies are shown in online supplementary table S2. Besides the ERCP and EUS studies, all five other categories of studies focused on conventional, forward viewing flexible endoscopy with a large overlap in outcome parameters. Procedures like ERCP and EUS show profound differences compared with basic forward viewing flexible endoscopy, because of the combination with radiological or ultrasonographical imaging and also because of the completely different perception by the endoscopist in side viewing endoscopy. We have therefore analysed them separately from the larger group that we refer to as forward viewing flexible endoscopy procedures. Eight validation studies on flexible endoscopy tasks were performed using the Simbionix GI Mentor VR computer simulator.8–10 ,12 ,14 ,15 ,22 ,25 Two studies reported on face validity. The largest study included 35 experts and demonstrated good face validity for colonoscopy.15 A smaller study reported a low level of realism as judged by six experts on all modules of the simulator.14 All studies reported consistent results and good construct validity for performance metrics based on procedure times of the GI Mentor. These procedural times included time to caecal intubation, time spent with clear view and time spent with endoscope loops. Although these types of parameters, measuring a time aspect, are usually considered surrogate markers for competence, they seem to be the most consistent and therefore the most reliable parameters to distinguish between competence levels. There is a fairly large clinical heterogeneity on other outcome parameters. 
Five studies reported on the AccuTouch Immersion Medical computer simulator.4 ,19–21 ,27 Two studies reported on face validity with conflicting results. Again, as for the GI Mentor, realism was judged as valid by experts for the colonoscopy module but not for the complete set of modules on the simulator as a whole. The AccuTouch simulator seemed to have the same construct validity profile as the GI Mentor. That is, construct validity was consistently reported as good for performance measures related to procedural times in all published studies. Three validation studies reported on the Olympus Endo TS-1 VR computer simulator for colonoscopy.13 ,16 ,28 Face validity was rated as good by two studies and all three demonstrated good construct validity on all studied procedures. One study reported good construct validity of the Kyoto Kagaku Colonoscope Training Model. Face validity was not studied.24 The last study demonstrated good face, construct and concurrent validity in a bovine explant colon model.26 Performance on this model correlated well with the level of expertise of the subjects and their performance in caecal intubation in patient-based endoscopy.

Six validation studies were performed on ERCP. Two studies were feasibility studies and no formal validation was done. These two were both in mechanical models.11 ,23 Only one validation study was performed using a VR computer-based simulator. This study demonstrated both face and construct validity for the ERCP modules in the Simbionix GI Mentor II simulator.7 A similar study was done for the X-Vision ERCP Training System, a mechanical simulator, showing both face and construct validity.29 Two studies on the same mechanical ERCP training simulator were performed by the same research group.17 ,18 The ERCP Mechanical Simulator demonstrated a good construct validity and excellent face validity. In a direct comparison to an ex vivo porcine stomach model, the mechanical simulator was rated more realistic and useful. Another study compared live porcine models versus the Erlangen Endo-Trainer versus the Simbionix GI Mentor VR simulator for ERCP.30 The Erlangen model scored highest on realism and educational value. The GI Mentor scored lowest. However, it was felt that the GI Mentor was more easily incorporated in a training programme. Although the validation studies for ERCP simulation comprised a fairly heterogeneous group of simulators, the strongest evidence was provided for the mechanical simulators. For EUS, only two studies by the same author reported on feasibility to perform EUS and fine needle aspiration (FNA) in a porcine model.5 ,6 No attempt at validation has been published to date.

Simulator training and learning curve studies

Twenty-three studies reported on simulator training and learning curves.31–53 All but one of these studies focused on the diagnostic aspects of forward viewing flexible endoscopy, for example, intubation skills. Twenty studies reported on forward viewing flexible endoscopy (three EGD, three sigmoidoscopy and 14 colonoscopy), one study reported on EUS, one on ERCP and one on training haemostasis in upper GI bleeds. The studies are shown in online supplementary table S3. Eleven studies were performed using the AccuTouch Immersion Medical VR computer simulator for training.31 ,33 ,34 ,40 ,44 ,45 ,47 ,49–52 All studies on flexible sigmoidoscopy and colonoscopy had a randomised design and compared simulator-based training groups versus controls. Acquired competence was evaluated using the same simulator and in six studies also during patient-based assessment. The most consistent outcome parameters demonstrating improved performance were on procedural times, caecal intubation rates (CIRs), and times in red-out, meaning that luminal view was lost. Patient comfort scores were measured in two studies.40 ,50 One study favoured simulator training versus no simulator training prior to starting patient endoscopies, the second study showed no difference between groups.

Six studies were carried out using the Simbionix GI Mentor VR simulator for training and learning curves.32 ,34 ,36–38 ,41 Four studies were on colonoscopy tasks, two on EGD. All studies demonstrated that simulator training improved the performance of novices. There were no learning effects for experienced endoscopists. Due to the methodological heterogeneity of these studies, improved performance could not be expressed in terms of exact numbers. Performance was assessed by means of the simulator construct in three studies. Two studies used patient-based assessment for the evaluation of the simulator-based learning effect. The competence parameters that consistently improved significantly were: (I) procedure time, (II) CIR: a direct comparison of simulator-training versus controls showed a 4.5-fold increased CIR in the simulator-training group in the early learning curve,34 (III) time with clear view, (IV) time of endoscope looping, and (V) objective performance scores, as judged by expert supervisors during patient-based endoscopy assessment. Improved performance in the simulator-trained groups versus controls was observed in up to 60 patient-based assessed EGDs and 80 procedures in colonoscopy training. Only one study used the Olympus Endo TS-1 colonoscopy simulator for training.42 This multicentre, randomised study compared simulator-based training versus patient-based training. Blinded experts assessed performance during patient-based endoscopy. Both groups (simulator trained and patient trained) showed equal performance during patient-based colonoscopy. One multicentre, randomised study was performed using all kinds of simulators.53 The study showed that patient-based training with complementary simulator training was superior to patient or simulator-based training alone. One study was done on ERCP.46 This study had a multicentre, randomised design. 
It demonstrated significantly higher cannulation success rates in less time in the study group after training on the ERCP Mechanical Simulator. One study was performed evaluating the CompactEASIE simulator, a mechanical simulator with an ex vivo porcine stomach.48 Significant improvement in skills in endoscopic haemostatic therapy was demonstrated with a sufficient level of evidence. No previous formal validation of the model was carried out. Only one study was performed on the subject of learning diagnostic and therapeutic EUS.35 Only a description of improved performance on live porcine models before and after a hands-on training course was provided. No formal statistical calculation was carried out. The model had not been previously validated.

Simulator competence assessment studies

Four studies reported on the use of simulators for the assessment of competence.54–57 Two studies focussed on colonoscopy, one on sigmoidoscopy and one on both EGD and sigmoidoscopy; all evaluated the diagnostic aspects of endoscopy. The studies are summarised in online supplementary table S4. In two studies of moderate quality, performance parameters derived from the simulators did not correlate to performance scores given by blinded experts.55 ,57 It seems that current simulators lack the discriminative power to assess performance and determine competence levels in patient-based endoscopy.

Learning curves

Fifteen studies reported on learning curves for colonoscopy (n=8), ERCP (n=5) and EUS (n=2).58–72 These are shown in online supplementary table S5 (A,B,C, respectively).

For colonoscopy, four studies reached a sufficient quality level.59 ,61 ,62 ,65 These studies had a prospective design and evaluated 8–41 trainees with procedure numbers varying from 2887 to 4351. All of these studies evaluated patient-based performance, focussing on the intubation of the caecum. However, outcome measures and use of competence standards were fairly heterogeneous. The studies reported on CIR or completion rate, time to caecum or a combination of those outcomes. One group described the learning curve by means of scoring different aspects of the procedure on a newly developed assessment tool (Mayo Colonoscopy Skills Assessment Tool, MCSAT), but also described learning curves for outcomes such as CIR.65 The Rotterdam Assessment Form for Colonoscopy (RAF-C) was used by another group as an easy-to-use assessment form to document the colonoscopy learning curve but also provide a platform for repetitive assessment and feedback to improve performance.61 The number of colonoscopies that trainees needed to perform in order to achieve a CIR of >85–90% varied from 150 to 280 procedures. The two studies of the highest quality reported 275 and 280 procedures needed to achieve a 90% CIR.61 ,65 These consistent numbers probably provide the best evidence currently available regarding threshold numbers for colonoscopy training in intubating the caecum.
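To make the idea of a learning curve for caecal intubation concrete, the following sketch computes a moving-average CIR over a trainee's consecutive procedures and reports the procedure number at which that rate first reaches a target such as 90%. This is a minimal illustration under stated assumptions: the function name, the window size and the example data are hypothetical, and the included studies used their own (differing) definitions and statistical methods.

```python
from collections import deque


def procedures_to_reach(outcomes, target=0.9, window=20):
    """Return the 1-based procedure number at which the moving-average
    success rate over the last `window` procedures first reaches `target`,
    or None if it never does.

    `outcomes` is a chronological sequence of booleans
    (True = caecum intubated). The window size is an assumption made
    for illustration, not a value taken from the reviewed studies.
    """
    recent = deque(maxlen=window)  # keeps only the most recent outcomes
    for n, success in enumerate(outcomes, start=1):
        recent.append(bool(success))
        # Only evaluate once a full window of procedures is available
        if len(recent) == window and sum(recent) / window >= target:
            return n
    return None


# Hypothetical trainee: 10 failed intubations followed by 30 successes
example = [False] * 10 + [True] * 30
reached_at = procedures_to_reach(example, target=0.9, window=10)
```

Tracking the moving average rather than a single cumulative count reflects the point made above: competence is documented as sustained performance over time, not as a total number of procedures.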

From the five studies focusing on ERCP, only two had a reasonable quality level.60 ,71 These described a prospective evaluation of, respectively, 17 and 20 trainees, with the following outcome measures: subjective score regarding performance (overall and per part of the procedure) on a six-point scale where a score of 1, 2, or 3 was considered competent, and the success of selective cannulation of the common bile duct (CBD) or pancreatic duct (PD). One study concluded that an overall sufficient score was reached after 137 (probability of success=0.8) or 185 ERCPs (probability of success=0.9).60 A different group reported that an 85% selective cannulation rate was reached after 70 procedures for the PD and after >100 ERCPs for the CBD.71

The two studies on EUS described the performance per anatomic station of the procedure.70 ,72 There was a large variability in achieving overall competence, with acceptable performance after a range of 255–>400 EUS procedures.70 One study did not report on overall competence, but stated that 78 procedures were necessary for competence in duodenal examination.72

Threshold numbers needed to gain competence

Nine studies reported numbers needed to gain competence in different procedures in GI endoscopy.72–81 These studies are shown in online supplementary table S5 as well. Two studies handled both EGD and colonoscopy,73 ,80 whereas most of the studies pertained to colonoscopy alone.74 ,75 ,77–79 There was one study each on sigmoidoscopy and ERCP.76 ,81 The quality of the studies was moderate for most of them due to the designs and numbers of procedures evaluated. Only three groups performed studies (regarding EGD and colonoscopy) with a prospective design and a considerable number of trainees evaluated.73 ,74 ,80 These will be discussed in further detail. These studies evaluated patient-based performance and focussed on intubation skills, for both EGD and colonoscopy.

For EGD, competence was measured in two ways: intubation of the oesophagus and reaching a sufficient score on the Global Assessment of Gastrointestinal Endoscopic Skills (GAGES). One group demonstrated an 80% success rate of oesophageal intubation after 100 procedures, whereas another study found a plateau in the GAGES score after 50 procedures.73 ,80 Concerning colonoscopy, competence was measured through CIR and scores on the GAGES form as well. Two studies concluded that 100 colonoscopies were insufficient for reaching a >90% CIR,73 ,74 whereas the GAGES score displayed a plateau at n=75 procedures.80 All studies confirmed that the performance of trainees increased with experience. The best evidence for a threshold number to reach a steady 90% CIR was already mentioned in the previous paragraph in two high quality studies with a threshold number of 275–280 colonoscopies during training.61 ,65

Assessment and grading of performance

Nine studies described the development and evaluation of assessment tools for colonoscopy (n=6), sigmoidoscopy (n=1), both (n=1) and both colonoscopy and EGD (n=1).82–90 These are shown in online supplementary table S6. All studies of moderate to high quality focused on colonoscopy, flexible sigmoidoscopy or both, had a prospective design and reported on 18–162 participants.82 ,86–89 The British Direct Observation of Procedural Skills (DOPS) appears effective for the evaluation of competence for already registered endoscopists.82 This form addressed intubation, withdrawal and therapeutic skills. The MCSAT was more effective in discriminating different experience levels, and therefore applicable in training settings.87 The RAF-C was already mentioned in the paragraph on learning curves. This assessment form documents objective parameters such as caecal intubation, procedural times and polyp detection and combines this evaluation of performance with self-reflection and an improvement plan.61 Two studies reported on some sort of video assessment of endoscopic skills.88 ,89 The tri-split video recording assessment tool proved to be valid, but reliability was lacking.88 The other study on video assessment described the development of an assessment tool for sigmoidoscopy withdrawals in a series of five experiments.89 They concluded that the sequential assessment of five withdrawals led to the highest agreement. However, all procedures included in this video study were performed by experienced endoscopists. Some assessment tools were applicable in training situations, while others were only evaluated in a setting with experienced endoscopists. This difference therefore makes it difficult to compare the assessment tools.

Training models

Finally, seven studies reported on different kinds of training models for colonoscopy (n=4), sigmoidoscopy (n=1) and EGD (n=2).91–97 Online supplementary table S7 provides an overview of these studies. Two groups described the evaluation of the accelerated colonoscopy training course as it is carried out in the UK.96 ,97 This training model comprised theory, simulator sessions, hands-on training and live case assessment. Both studies concluded that performance in knowledge, colonoscopy performance and DOPS scores improved significantly after the training week. Thomas-Gibson et al added an evaluation at a median follow-up of 9 months. There were, however, no differences between post-training assessment and follow-up. A different training model was the ‘gastroenterological education—training endoscopy’ model.91 The model focussed on knowledge and simulator training; there were no patient-based endoscopies involved. This training model showed improvement in post-test results and simulator performance. A German group tried to identify predictors for performance in a 1-week training course by psychological and psychomotor tests.94 The training week resulted in improved performance, but only one specific (double labyrinth) test was identified as a predictor for improvement in performance.

One randomised controlled trial (RCT) evaluated the impact of systematic feedback on patient-based colonoscopy performance.92 Although only four trainees were evaluated, there was a significant improvement in CIR performance in the feedback group, while the control group showed no improvement.


Forward viewing flexible endoscopy procedures

GI endoscopic procedures are fairly complex. The sole use of the classic master–apprentice model for teaching endoscopy is nowadays less accepted. The use of simulators in the early training phase is gaining acceptance and several VR endoscopy simulators have been validated (see online supplementary table S2). The GI Mentor, AccuTouch and Endo TS-1 were shown to have good validity.4 ,8–10 ,12–16 ,19–22 ,25 ,27 ,28 These can thus be considered as realistic devices that have discriminative abilities for distinguishing dexterity and competence levels in flexible endoscopy. Since these simulators proved to have good validity, we recommend using one of these devices in early training. ⊕⊕⊕∘

Following validation, the impact of simulator training on learning curves needs to be assessed. A VR simulator that has good validity but does not improve performance after repeated exercise, especially in patient-based endoscopy, is not suitable for implementation in a training programme. Three studies provided high quality evidence for the positive effect of simulator training in novices in flexible endoscopy, measured in terms of both VR and live endoscopy.37 ,42 ,53 Two of these were well-designed randomised multicentre trials comparing the combination of simulator- and bedside-training versus bedside training alone for the colonoscopy training of novices. These studies demonstrated that simulator training is effective.37 ,42 The first RCT demonstrated significantly higher objective competency rates during the first 80 patient-based colonoscopies after 10 h of unsupervised simulator training. There was, however, no difference in the number of procedures to reach a 90% competency level. The second study demonstrated similar performance of novices during patient-based endoscopy, as judged by blinded experts, after either 16 h of supervised simulator training or 16 h patient-based training. There was no follow-up of participants to procedural competency in this study. Several studies on simulator learning curves for EGD, sigmoidoscopy and colonoscopy had a moderate level of quality.31 ,32 ,34 ,36 ,38–41 ,43–45 ,47 ,49–52 Nonetheless, some studies only measured performance during VR endoscopy, which is inferior to measuring performance during patient-based endoscopy, since the ultimate goal is improvement of patient-based performance.33 ,35 ,36 ,38 ,44 ,45 ,47 ,51 Based on this evidence, one can conclude that simulator training is complementary to patient-based learning and is useful in the early training phase in speeding up the early learning curve and reducing patient burden. To reach procedural competency in patient-based endoscopy, the same numbers of patient-based procedures seem to be necessary. We do nonetheless recommend the use of simulators in the early training phase. ⊕⊕⊕∘

The four studies that reported on the use of a simulator as a competence assessment tool showed diverging results.54–57 Therefore, we would suggest that at this point, the use of simulators as an assessment tool is not sensible. ⊕∘∘∘

Elaborating further on the learning curve, the next step is (continuous) assessment of a trainee's performance during patient-based training. The currently available recommendations and guidelines focus mainly on minimum numbers as a threshold for competence.73–80 However, outcomes and proposed minimum numbers for flexible endoscopic procedures vary widely. Nowadays there is a tendency to define more objective criteria for competence. Two large prospective single-centre studies of high quality provided evidence for the use of an assessment form as a measure of competence, respectively, the MCSAT and the RAF-C.61 ,65 The learning curves obtained in these studies were similar. Both studies focus on the description of the gradual process of acquisition of competence rather than setting a prefixed threshold number. The strength of these curves is that improved performance over time is documented rather than the assessment of an ‘incidental lucky procedure’ or a total number of procedures performed without any qualitative content. We recommend implementing one of these forms in the assessment of trainees’ performance and learning curves. On top of that, we recommend using the DOPS for assessment of ‘end-stage’ competence.82 ⊕⊕⊕⊕

Overall, some high-quality studies have been performed for each individual step in training, providing valuable information on the effect of simulator training, learning curves and assessment methods. For basic flexible endoscopy, the largest and best body of evidence for all these stages is available for colonoscopy. However, some results can probably be extrapolated to other basic GI endoscopy procedures as well, since the techniques are comparable.

Endoscopic retrograde cholangiopancreatography

ERCP is one of the most challenging procedures in GI endoscopy and has high complication rates. It takes a great deal of training and a large number of procedures to reach competence. However, little is known about the learning curve for trainees in ERCP. A number of questions remain unanswered regarding the shape of the learning curve, the number of procedures needed to gain competence, and the definition of competence itself. The six studies on learning curves in ERCP varied widely in design, in the number of trainees and procedures included, and in outcome, resulting in large heterogeneity among them. Successful cannulation in >85% of the patients was seen after 100–185 ERCPs. Due to the large clinical heterogeneity between the studies, we cannot make any recommendations regarding ERCP learning curves. ⊕∘∘∘ There would be a great benefit if part of the learning curve for endoscopists could be accomplished by training on simulator models. In reality, the number of available simulators for training in ERCP is limited. Of six devices in total, the GI Mentor is the only validated VR simulator for ERCP.7 ,30 Its face and construct validity were demonstrated in these two studies and, although it received lower scores than the ex vivo or live porcine model in a head-to-head comparison, it was considered the easiest of all ERCP simulator models to incorporate in a training curriculum.7 ,30 The live porcine model was validated only once, in comparison with the ex vivo model and the GI Mentor in the same study.30 The ex vivo simulators and purely mechanical simulators are highly comparable with each other and achieve similar results. All of these models require a real endoscope to be introduced to reach a papilla, which is either a synthetic or an ex vivo papilla located in a mechanical tube representing the duodenum or in an ex vivo duodenum. Overall, these ERCP simulator models receive the highest scores on realism. The use of one of the validated ERCP simulators before patient-based training is recommended, since these have fairly good validity.7 ,30 ⊕⊕⊕∘ We cannot provide a recommendation regarding the use of simulators to speed up the learning curve, since only one study was performed.46 No studies were found on validated competence assessment tools to objectively measure performance in ERCP. The most common performance parameter is cannulation success rate. This only partly reflects the extent, therapeutic intent and diversity of a therapeutic procedure like ERCP.


EUS is widely practised, with an increasing number of therapeutic possibilities since the first reports of transgastric drainage of pseudocysts by Grimm et al.98 This makes EUS more complex. The therapeutic procedures in particular overlap markedly with ERCP and demand a great deal of experience. There are only a few reports on simulator-based training in EUS.5 ,6 Training diagnostic and interventional EUS seems logical and feasible in a live porcine model, but no formal attempt at validation has been made. No grade of recommendation can be given based on these studies. A learning effect by repeated exercise and improvement of performance during EUS procedures in the live porcine model itself was documented in one study.35 There is a lack of scientific evidence of transfer of competence to a patient-based setting. There is an even greater scarcity of evidence on learning curves and numbers to reach competence in EUS. Two studies were performed, both of which included five trainees. The first study included only radial EUS.72 It reported no additional effect of observing large numbers of procedures; the largest benefit was achieved during hands-on training. Only one study of moderate quality has been performed.70 The learning curves differed considerably among the five trainees. These studies demonstrated that reaching proficiency requires substantially more training than the 150 procedures recommended by the American Society for Gastrointestinal Endoscopy. ⊕⊕∘∘


The clinical heterogeneity of the studies regarding forward viewing endoscopy limits the conclusions that can be drawn. This systematic review covers a broad range of studies regarding training and assessment in GI endoscopy. This broad approach automatically results in a large variety of methodology, devices used and endpoints measured, which hampers head-to-head comparison of individual studies. Another limitation is that all studies focused on specific aspects of the endoscopic procedure instead of on overall performance, which comprises both overall competence assessment from novice to experienced, certified endoscopist and expert levels for specific procedures.

The evidence in the literature on learning curves and competence measures for ERCP is highly heterogeneous. This makes it impossible to provide a level of recommendation. Also, cannulation success rates do relate to improved performance but do not entirely reflect the diversity of a complex procedure like ERCP. No solid data are currently available on other aspects of therapeutic interventions related to learning curves and benchmarks in ERCP. As of yet, no validated competence assessment tools have been developed for ERCP. This should be a prerequisite before attempts to define learning goals and benchmarks are made.

Future research

Future research, based on the presented evidence in this review, should therefore include a complete training programme. We propose a prepatient curriculum using simulator training. The transfer of simulation skills to patient-based procedures needs to be further explored. Simulation training needs to be followed by the continuous assessment of patient-based endoscopies to provide individual and group learning curves and after a period of time, (repeated) overall assessments of performance by an expert. Therefore, the development of validated assessment tools is necessary and the effect of expert assessments on daily practice needs to be measured.

With respect to ERCP, there is a rationale to start training using simulators. There is, however, no evidence yet as to what extent, or up to what performance level, simulator-based training has to be carried out. The next step would be to investigate the transfer of skills to patient-based training. These seem to be clear goals for future research. There is a need for the development of validated objective assessment tools in ERCP to document progress in training and, finally, proficiency. Benchmarks can be set by applying the same assessment tools to ERCPs performed by experts.

The evidence on training and competence assessment in EUS is extremely scarce. Although training in a live porcine model seems logical, in the current era of evidence-based medicine, validation studies should be carried out to establish the degree of realism and training potential. Current threshold numbers for training appear to be inadequate, but the available data are sparse. We seem to be far away from establishing benchmarks for competence in EUS and validated assessment tools are lacking.

General conclusions and recommendations

Based on the presented evidence, we propose the implementation of simulator training in GI endoscopic training curricula. Regarding basic flexible endoscopy (EGD, sigmoidoscopy and colonoscopy), simulator-based training has proven its value and it is justifiable to start a prepatient training course using a validated simulator. This will speed up the early learning curve and reduce patient burden. However, to reach procedural competency in patient-based endoscopy, the same number of patient-based procedures seems to be necessary. The extent to which simulator-based training should be carried out is still a matter of debate. A structured and supervised 2-day or 16 h training course seems to be of added value. Furthermore, objective outcome parameters should be measured continuously in patient-based training. This provides insight into the learning curve in a qualitative fashion and is preferable to threshold numbers. The MCSAT, RAF-C and DOPS assessment forms seem to be the best forms to document progress or proficiency levels. Regarding ERCP training, we would recommend a prepatient training curriculum using a validated simulator as well. Evidence for evaluation of learning curves and continuous assessment in ERCP is scarce. This makes competency-based training difficult. The available data support prolonged training, at least to a larger extent than the threshold numbers currently upheld in most countries. The results so far may hopefully stimulate further research. Endosonography training and competence are as yet the least investigated. A prepatient training curriculum is logical and attractive. However, the evidence is too scarce to give recommendations at this moment.



  • VEE and ADK contributed equally to this work.

  • Contributors VEE, ADK: substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; drafting the work or revising it critically for important intellectual content; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. RDM, EJK: drafting the work or revising it critically for important intellectual content; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.