Original research
Development and validation of a model to predict survival in colorectal cancer using a gradient-boosted machine
Jean-Emmanuel Bibault, Daniel T Chang, Lei Xing
Radiation Oncology, Stanford Medicine, Stanford, California, USA
Correspondence to Dr Jean-Emmanuel Bibault, Radiation Oncology, Stanford Medicine, Stanford, CA 94305, USA; jbibault{at}


Objective The success of treatment planning relies critically on our ability to predict the potential benefit of a therapy. In colorectal cancer (CRC), several nomograms are available to predict different outcomes from tumour-specific features. Our objective is to provide an accurate and explainable prediction of the risk of death within 10 years of CRC diagnosis by combining tumour features with the patient's medical and demographic information.

Design In the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, participants (n=154 900) were randomised to screening with flexible sigmoidoscopy, with a repeat screening at 3 or 5 years, or to usual care. We selected patients who were diagnosed with CRC during follow-up and used them to train a gradient-boosted model that predicts the risk of death within 10 years of CRC diagnosis. Using Shapley values, we identified the 20 most relevant features and provided an explanation for each prediction.
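As an illustration of how Shapley values attribute a single prediction to its input features, the following minimal sketch computes exact Shapley values by enumerating feature coalitions. It is not the authors' pipeline: the scoring function `toy_risk`, the feature names, the patient and the baseline are all hypothetical, and a study of this size would more plausibly rely on a tree-specific approximation than on full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features, so this
    is only practical for a handful of inputs.
    """
    names = list(instance)
    n = len(names)
    values = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: instance[f] if f in coalition else baseline[f]
                    for f in names
                }
                with_target = dict(without, **{target: instance[target]})
                total += weight * (predict(with_target) - predict(without))
        values[target] = total
    return values


def toy_risk(x):
    """Hypothetical risk score with one interaction term (NOT the published model)."""
    return 0.02 * x["age"] + 0.3 * x["stage"] + 0.1 * x["stage"] * x["smoker"]


# Hypothetical patient and reference values, for illustration only.
patient = {"age": 70, "stage": 3, "smoker": 1}
baseline = {"age": 60, "stage": 1, "smoker": 0}

phi = shapley_values(toy_risk, patient, baseline)
# Rank features by the magnitude of their contribution to this prediction.
ranked = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
print(phi, ranked)
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that makes a per-patient explanation additive. Ranking features by mean absolute contribution across many patients is one common way to arrive at a "most relevant features" list.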

Results During follow-up, 2359 patients were diagnosed with CRC. Median follow-up for mortality was 16.8 years (14.4–18.9). In total, 686 patients (29%) died from CRC during follow-up. The dataset was randomly split into a training dataset (n=1887) and a testing dataset (n=472). The area under the receiver operating characteristic curve was 0.84 (±0.04) and accuracy was 0.83 (±0.04) at a classification threshold of 0.5. The model is available online for research use.
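The two reported metrics can be made concrete with a short sketch. The counts below are taken from the Results; the labels and predicted probabilities are made up for illustration and do not come from the study. The area under the curve is computed here as the fraction of positive/negative pairs that the model ranks correctly, which is equivalent to the area under the receiver operating characteristic curve.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly when predicting the
    positive class for probabilities at or above the threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Counts reported in the Results.
n_total, n_train, n_test, n_crc_deaths = 2359, 1887, 472, 686
test_fraction = n_test / n_total          # about 0.20, i.e. an 80/20 split
death_fraction = n_crc_deaths / n_total   # about 0.29

# Made-up labels (1 = died within 10 years) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.45]

print(round(test_fraction, 2), round(death_fraction, 2))
print(roc_auc(y_true, y_prob), accuracy(y_true, y_prob, threshold=0.5))
```

Unlike accuracy, the area under the curve does not depend on the 0.5 threshold, which is why the two figures are reported separately.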

Conclusions We trained and validated a model with prospective data from a large multicentre cohort of patients. The model has high predictive performance at the individual level. It could be used to inform discussions of treatment strategies.

  • colorectal cancer

Data availability statement

Data may be obtained from a third party and are not publicly available. Data are available from the NIH.



  • Twitter @jebibault

  • Contributors J-EB conceived and designed the study, analysed the data and interpreted the results. J-EB, DTC and LX drafted the article and contributed to the writing of the final version of the article.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.