Original research
Development and validation of a model to predict survival in colorectal cancer using a gradient-boosted machine


Objective The success of treatment planning relies critically on our ability to predict the potential benefit of a therapy. In colorectal cancer (CRC), several nomograms are available to predict different outcomes based on tumour-specific features. Our objective is to provide an accurate and explainable prediction of the risk of dying within 10 years after CRC diagnosis by incorporating tumour features together with the patient's medical and demographic information.

Design In the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, participants (n=154 900) were randomised to screening with flexible sigmoidoscopy, with a repeat screening at 3 or 5 years, or to usual care. We selected patients who were diagnosed with CRC during follow-up and trained a gradient-boosted model to predict the risk of dying within 10 years after CRC diagnosis. Using Shapley values, we determined the 20 most relevant features and provided explanations for the predictions.
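To illustrate how Shapley values attribute a single prediction to individual features, the following minimal sketch computes exact Shapley values by enumerating feature coalitions. It is not the authors' pipeline: `toy_risk`, its three features and the baseline patient are hypothetical, and a real gradient-boosted model with many features would normally rely on a dedicated approximation rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x, relative to a baseline input.

    Features absent from a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical toy score (NOT the published model).
# Features: [age in decades, tumour stage, smoker flag]
def toy_risk(features):
    age, stage, smoker = features
    return 0.05 * age + 0.15 * stage + 0.10 * smoker * stage


patient = [7.0, 3.0, 1.0]    # made-up patient
baseline = [6.0, 1.0, 0.0]   # made-up reference patient

phi = shapley_values(toy_risk, patient, baseline)

# Rank features by absolute contribution, largest first
ranked = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
```

The contributions in `phi` sum to the difference between the patient's score and the baseline score, which is what makes the explanation additive. Ranking by absolute contribution for one patient, as done here, is a simplification; a global ranking such as a top-20 list is commonly obtained by averaging absolute contributions over many patients.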

Results During follow-up, 2359 patients were diagnosed with CRC. Median follow-up for mortality was 16.8 years (14.4–18.9). In total, 686 patients (29%) died from CRC during follow-up. The dataset was randomly split into a training dataset (n=1887) and a testing dataset (n=472). The area under the receiver operating characteristic curve was 0.84 (±0.04) and accuracy was 0.83 (±0.04) at a classification threshold of 0.5. The model is available online for research use.
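As a reading aid for the two reported metrics, the sketch below computes the area under the ROC curve and the accuracy at a 0.5 threshold from predicted risks. The labels and scores are made up for illustration and are not study data. The 80/20 ratio is an assumption: the abstract reports only the split sizes, which happen to be consistent with it.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases classified correctly when a predicted risk
    at or above the threshold is labelled as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Made-up example: 1 = died within 10 years, 0 = survived
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.55, 0.1, 0.4, 0.3, 0.05]

auc = roc_auc(y_true, scores)
acc = accuracy(y_true, scores)

# Assumed 80/20 split (the ratio is not stated in the abstract)
n_total = 2359
n_train = int(0.8 * n_total)
n_test = n_total - n_train
```

AUC does not depend on the threshold, whereas accuracy does: moving the cut-off away from 0.5 changes which patients are labelled high risk and therefore changes the accuracy, while the ranking of patients and the AUC stay the same.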

Conclusions We trained and validated a model on prospective data from a large multicentre cohort of patients. The model has high predictive performance at the individual level. It could be used to support discussion of treatment strategies.

  • colorectal cancer

Data availability statement

Data may be obtained from a third party and are not publicly available. Data are available from the NIH.
