Objective This study aimed to develop and validate a model to estimate the likelihood of detecting advanced colorectal neoplasia in Caucasian patients.
Design We performed a cross-sectional analysis of database records for 40-year-old to 66-year-old patients who entered a national primary colonoscopy-based screening programme for colorectal cancer in 73 centres in Poland in the year 2007. We used multivariate logistic regression to investigate the associations between clinical variables and the presence of advanced neoplasia in a randomly selected test set, and confirmed the associations in a validation set. We used model coefficients to develop a risk score for detection of advanced colorectal neoplasia.
Results Advanced colorectal neoplasia was detected in 2544 of the 35 918 included participants (7.1%). In the test set, a logistic-regression model showed that independent risk factors for advanced colorectal neoplasia were: age, sex, family history of colorectal cancer, cigarette smoking (p<0.001 for these four factors), and Body Mass Index (p=0.033). In the validation set, the model was well calibrated (ratio of expected to observed risk of advanced neoplasia: 1.00 (95% CI 0.95 to 1.06)) and had moderate discriminatory power (c-statistic 0.62). We developed a score that estimated the likelihood of detecting advanced neoplasia in the validation set, from 1.32% for patients scoring 0, to 19.12% for patients scoring 7–8.
Conclusions A score consisting of simple clinical factors, developed and internally validated in this study, successfully estimates the likelihood of detecting advanced colorectal neoplasia in asymptomatic Caucasian patients. Once externally validated, it may be useful for counselling or designing primary prevention studies.
Significance of this study
What is already known about this subject?
Colorectal cancer screening is currently recommended for the average-risk population older than 50 years of age, but adherence to this recommendation is generally insufficient.
One of the most important barriers to screening is low perceived risk of colorectal cancer among average-risk patients and primary care providers.
Advanced colorectal neoplasia is not evenly distributed throughout the ‘average-risk’ screening population.
What are the new findings?
The new score uses age, sex, family history of colorectal cancer, cigarette smoking and Body Mass Index to estimate the likelihood of detecting advanced colorectal neoplasia in asymptomatic Caucasian patients.
The score estimated the likelihood of detecting advanced colorectal neoplasia in so-called ‘average-risk’ asymptomatic Caucasian patients from 1.3% to 19.1%.
How might it impact on clinical practice in the foreseeable future?
The score may increase awareness of colorectal cancer risk and help healthcare providers encourage people to participate in existing national screening programmes.
By identifying patients at high likelihood of detecting advanced colorectal neoplasia, the score may help to target primary prevention interventions.
The strength of evidence regarding the efficacy of colorectal cancer screening in reducing the incidence of colorectal cancer and associated mortality is increasing.1 ,2 Colorectal cancer screening is currently recommended in the European Union3; however, adherence to this recommendation is insufficient.4–6 One of the most important barriers to screening is a lack of perceived risk of colorectal cancer among average-risk patients and primary care providers.7 ,8 The risk of colorectal cancer or advanced colorectal neoplasia varies with several factors, including age,9–11 sex,10–12 family history of colorectal cancer,10 ,13 smoking,14 ,15 obesity,11 ,16 diabetes mellitus,17 long-term non-steroidal anti-inflammatory drug use,15 ,18 diet15 ,19 and physical activity.15 ,16 Information about some of these factors is easy to obtain and could be used to identify patients at high-average risk of advanced colorectal neoplasia, who are likely to benefit most from screening. This high-average risk population should be the target of the most intensive interventions to improve participation, and of primary prevention studies.
We performed a cross-sectional analysis of data from a national colonoscopy screening programme to derive and validate a risk prediction model for detection of advanced colorectal neoplasia. The results of the model were used to develop a simple scoring system that estimates the likelihood of detecting advanced colorectal neoplasia in asymptomatic patients.
Study design and oversight
We performed a cross-sectional analysis of database records for 40-year-old to 66-year-old patients who entered the national colonoscopy screening programme for colorectal cancer in Poland, from January 2007 through December 2007. The database contained demographic data, colonoscopy and histopathology results, follow-up information, and the results of an epidemiological questionnaire on potential risk factors for advanced colorectal neoplasia from 73 screening centres throughout Poland.
The research proposal was reviewed by the Research Ethical Committee at the authors’ institution and was judged to be exempt from oversight. Written informed consent was obtained from all participants entering the National Colorectal Cancer Screening Program.
Patients between the ages of 50 years and 66 years (40 years and 66 years in case of positive family history of cancer of any type) were advised by their family or general practitioners to participate in the screening. Exclusion criteria were clinical suspicion of colorectal cancer; characteristics that met the criteria for Lynch syndrome, familial adenomatous polyposis, or inflammatory bowel disease; and colonoscopy within the preceding 10 years. For this study, we excluded patients who had screening-detected polyps 10 mm or larger that were not removed (hence histology was unavailable) and patients who had not fully completed the epidemiological questionnaire.
Study procedures and definitions
Patients eligible for screening were asked to complete the epidemiological questionnaire regarding the following potential risk factors for advanced colorectal neoplasia: age,9–11 sex,10–12 weight and height (to calculate Body Mass Index),11 ,16 family history of colorectal cancer in first-degree relatives,10 ,13 diabetes mellitus,17 smoking history (number of years of smoking, number of cigarettes smoked per day, and current smoking status),14 ,15 and regular aspirin use (use for at least 3 months at any dose).15 ,18 Information about physical activity, diet, use of non-steroidal anti-inflammatory drugs other than aspirin, and alcohol consumption was not collected.
Screening colonoscopy procedures have been described elsewhere.10 All screening colonoscopists and histopathologists participated in the quality assurance programme.20 Colorectal findings were categorised on the basis of the most advanced lesion identified at screening (including any additional colonoscopies required to remove all polyps, when indicated).9 ,10 Advanced neoplasia was defined as cancer or adenoma that was at least 10 mm in diameter, had high-grade dysplasia, had villous or tubulovillous histologic characteristics, or any combination thereof.9 ,10 For the purpose of the analysis, traditional serrated adenomas, sessile serrated lesions, and mixed serrated polyps were categorised as tubular adenomas. Polyps <10 mm in size that were not removed or retrieved were categorised as non-neoplastic.
The following predefined categories of variables were used to analyse the risk factors for detecting advanced neoplasia: age (40–49, 50–54, 55–59, or 60–66 years), sex, family history of colorectal cancer (none, one first-degree relative ≥60 years of age with colorectal cancer, one first-degree relative <60 years of age with colorectal cancer, or two first-degree relatives with colorectal cancer), pack-years smoked (none, <10, 10–19, or ≥20 pack-years), diabetes mellitus (yes or no), Body Mass Index (<25, 25–29, or ≥30 kg/m2), and regular aspirin use (yes or no). We performed a sensitivity analysis using age as a continuous variable and compared its discriminatory power with the model using age as a categorised variable.
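As an illustration of how the predefined categories could be derived from the questionnaire answers, the Python sketch below maps raw responses onto the analysis categories. It is a hypothetical helper, not the programme's software: the function names are invented, pack-years use the conventional definition (cigarettes per day divided by 20, multiplied by years smoked), ranges written as "25–29" or "10–19" are assumed to be half-open (for example 25 ≤ BMI < 30), and "two first-degree relatives" is assumed to mean two or more.

```python
# Hypothetical helpers mapping questionnaire answers onto the predefined categories.

def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Conventional definition: one pack-year = 20 cigarettes a day for one year."""
    return cigarettes_per_day / 20.0 * years_smoked


def pack_years_category(py: float) -> str:
    # Assumption: "10-19" means 10 <= pack-years < 20.
    if py <= 0:
        return "none"
    if py < 10:
        return "<10"
    if py < 20:
        return "10-19"
    return ">=20"


def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    # Assumption: "25-29" means 25 <= BMI < 30 kg/m2.
    if value < 25:
        return "<25"
    if value < 30:
        return "25-29"
    return ">=30"


def age_category(age: int) -> str:
    if not 40 <= age <= 66:
        raise ValueError(f"age {age} is outside the screened range 40-66")
    if age <= 49:
        return "40-49"
    if age <= 54:
        return "50-54"
    if age <= 59:
        return "55-59"
    return "60-66"


def family_history_category(relative_ages_at_diagnosis: list) -> str:
    """Ages at diagnosis of first-degree relatives with colorectal cancer."""
    if not relative_ages_at_diagnosis:
        return "none"
    if len(relative_ages_at_diagnosis) >= 2:  # assumption: two or more relatives
        return "two relatives"
    if relative_ages_at_diagnosis[0] >= 60:
        return "one relative >=60"
    return "one relative <60"


print(age_category(53), bmi_category(bmi(80, 1.70)), pack_years_category(pack_years(20, 20)))
# prints: 50-54 25-29 >=20
```

Each patient is thus reduced to one category label per predictor.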
The original dataset was randomly partitioned in a 1:1 ratio into a test set and a validation set, while controlling for the distribution of the most advanced lesions.21 A multivariate logistic regression model was used to investigate the relation between clinical variables and the presence of advanced neoplasia in the test set.22 Likelihood ratio tests were used to assess the association of each variable, and of interactions between variables, with the presence of advanced neoplasia. For statistically significant effects, the OR and 95% CI were reported for each predefined category of variables. The model was internally validated using the validation set. The Hosmer–Lemeshow test was used to check the goodness-of-fit of the models.22
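The partitioning step can be sketched as a stratified random split. The paper states only that the split controlled for the distribution of the most advanced lesions; the Python sketch below assumes this was done by shuffling and halving within each stratum of most advanced lesion, and the function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_half_split(records, stratum_key, seed=2007):
    """Randomly split records 1:1 within each stratum (e.g. most advanced lesion).

    Returns (test_set, validation_set). In an odd-sized stratum the extra
    record goes to the test set, so the two sets may differ slightly in size.
    """
    rng = random.Random(seed)  # hypothetical seed; makes the split reproducible
    strata = defaultdict(list)
    for record in records:
        strata[stratum_key(record)].append(record)

    test_set, validation_set = [], []
    for key in sorted(strata):  # deterministic stratum order
        group = list(strata[key])
        rng.shuffle(group)
        half = (len(group) + 1) // 2
        test_set.extend(group[:half])
        validation_set.extend(group[half:])
    return test_set, validation_set


# Usage with toy records: 10 "advanced" and 90 "none" lesions.
records = [{"id": i, "lesion": "advanced" if i % 10 == 0 else "none"} for i in range(100)]
test_set, validation_set = stratified_half_split(records, lambda r: r["lesion"])
print(len(test_set), len(validation_set))
```

Stratifying in this way guarantees that each set receives half of every lesion category, rather than relying on chance to balance the rarer outcomes.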
The calibration of the model was assessed using the validation set by comparing the expected and observed numbers of patients with advanced neoplasia, overall and for each category of variables.23 Homogeneous participant groups were defined by all combinations of categories of significant predictors. The expected number of patients with advanced neoplasia for each homogeneous group of study participants was calculated by summing the estimated individual absolute risk predicted by the model developed on the test set. The 95% CIs for the expected to observed ratio were calculated by using normal approximations to Poisson distributions.
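A minimal sketch of the calibration measure follows. The expected count E is the sum of the individual risks predicted by the test-set model, and O is the observed count. The paper specifies only "normal approximations to Poisson distributions"; this sketch assumes the common log-scale form, treating O as Poisson so that var(log O) ≈ 1/O and the 95% CI is (E/O)·exp(±1.96/√O). The function name is hypothetical, and the same function can be applied to the whole validation set or to any homogeneous group.

```python
import math


def expected_to_observed(predicted_risks, outcomes, z=1.96):
    """Expected/observed ratio with an approximate confidence interval.

    predicted_risks: model-predicted absolute risks (0-1) for each patient.
    outcomes: 1 if advanced neoplasia was detected, else 0.
    Assumes O ~ Poisson and a normal approximation on the log scale.
    """
    expected = sum(predicted_risks)
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events; ratio undefined")
    ratio = expected / observed
    half_width = z * math.sqrt(1.0 / observed)  # z * SE of log(O)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


# Toy example: 100 patients each predicted at 10% risk, 10 observed cases.
ratio, lower, upper = expected_to_observed([0.1] * 100, [1] * 10 + [0] * 90)
print(f"E/O = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval is built on the log scale, it is symmetric around the ratio multiplicatively rather than additively, and it narrows as the number of observed cases grows.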
The concordance statistic was used to measure models’ discrimination among patients with and without advanced neoplasia. For binary logistic regression models, the concordance statistic is equivalent to the area under the receiver-operating characteristic curve.24
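The concordance statistic can be computed from ranks, using its equivalence with the Mann–Whitney U statistic. The Python sketch below (a hypothetical helper, not the Stata routine used in the study) returns the proportion of case/non-case pairs in which the case has the higher predicted risk, counting ties as one-half.

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic via the Mann-Whitney rank formulation.

    Equals the proportion of (case, non-case) pairs in which the case has the
    higher predicted score, with ties counted as one-half; for a binary
    outcome this is the area under the ROC curve.
    """
    n = len(scores)
    if n != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_cases = sum(1 for y in outcomes if y)
    n_controls = n - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one non-case")
    rank_sum_cases = sum(r for r, y in zip(ranks, outcomes) if y)
    u = rank_sum_cases - n_cases * (n_cases + 1) / 2
    return u / (n_cases * n_controls)
```

For example, `c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75: three of the four case/non-case pairs are correctly ordered. The rank formulation avoids enumerating every pair, which matters for validation sets of this size.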
The results of the multivariate logistic regression model were used to develop a risk score for detecting advanced neoplasia in asymptomatic patients. Model-adjusted coefficients were rounded up to the nearest one-half integer and then multiplied by two to avoid decimals.11 The performance of the risk score was assessed in the validation set using the concordance statistic.
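The conversion of coefficients into integer points can be sketched as follows. The coefficient values are hypothetical placeholders (the fitted values are in table 2), and the sketch reads "rounded up to the nearest one-half integer" as rounding half-up to the nearest multiple of 0.5, which, after doubling, is the same as rounding 2β to the nearest integer. For the sex × Body Mass Index interaction, the sketch assumes the points for Body Mass Index ≥30 kg/m2 in men come from the sum of the main-effect and interaction coefficients.

```python
import math

# HYPOTHETICAL coefficients for illustration only; the fitted values are in table 2.
HYPOTHETICAL_BETAS = {
    "age 60-66": 0.90,
    "male sex": 0.60,
    "one relative <60": 0.70,
    "smoking >=20 pack-years": 0.45,
}

# Sex x BMI interaction: BMI >=30 points differ by sex (hypothetical values).
BETA_BMI30_MAIN = 0.10
BETA_BMI30_X_MALE = 0.35
HYPOTHETICAL_BETAS["BMI >=30 (women)"] = BETA_BMI30_MAIN
HYPOTHETICAL_BETAS["BMI >=30 (men)"] = BETA_BMI30_MAIN + BETA_BMI30_X_MALE


def beta_to_points(beta: float) -> int:
    """Round beta half-up to the nearest 0.5, then multiply by two.

    Equivalent to rounding 2 * beta half-up to the nearest integer.
    """
    return math.floor(2 * beta + 0.5)


def risk_score(patient_categories, points):
    """Sum the points of the patient's categories; reference categories score 0."""
    return sum(points.get(category, 0) for category in patient_categories)


POINTS = {category: beta_to_points(b) for category, b in HYPOTHETICAL_BETAS.items()}
print(POINTS)
print(risk_score(["age 60-66", "male sex"], POINTS))
```

Using `math.floor(x + 0.5)` rather than the built-in `round` avoids Python's round-half-to-even behaviour, so a coefficient exactly halfway between two half-units is always rounded up.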
A p value <0.05 was considered statistically significant. All reported p values are two-sided and not adjusted for multiple testing. The analyses were performed using Stata Statistical Software, V.10 (Stata Corporation, College Station, Texas, USA).
Of the 39 265 patients who met the eligibility criteria and underwent colonoscopy in one of the 73 screening centres between January and December 2007, 3347 (8.5%) were excluded because of incomplete questionnaires (3242 screened participants, 8.3%) or polyps measuring ≥10 mm that were not removed (105 screened participants, 0.3%).
The remaining 35 918 patients, 22 164 women (61.7%) and 13 754 men (38.3%), all Caucasian, had a mean age of 55.6±5.2 years. Of these, 6897 (19.2%) had a family history of colorectal cancer, 15 678 (43.7%) had a history of smoking, 1440 (4.0%) had diabetes mellitus, 7931 (22.1%) had a Body Mass Index ≥30 kg/m2, and 4623 (12.9%) reported regular use of aspirin. The characteristics of the study population are summarised in table 1.
Colonoscopy was completed to the caecum in 34 469 patients (96.0%). A total of 6909 patients (19.2%) had an adenoma or cancer. A total of 232 patients (0.6%) had polyps <10 mm in size that were not removed or retrieved; these were categorised as non-neoplastic abnormalities. Advanced neoplasia was detected in 2544 patients (7.1%), including 336 participants (0.9%) with adenocarcinoma (table 1). Clinically significant complications requiring medical intervention occurred in 42 patients (0.1%) and included seven cases of perforation (three of which occurred after polypectomy), 21 episodes of bleeding, nine cardiovascular events, and five other events. No deaths occurred as a result of screening colonoscopy or its complications.
Model for the detection of advanced neoplasia
The test and validation sets consisted of 17 979 and 17 939 patients, respectively. We fitted the multivariate logistic regression model in the test set to investigate the predictors of detecting advanced neoplasia. Likelihood ratio tests indicated significant associations between the risk of detecting advanced neoplasia and the following variables: age, sex, family history of colorectal cancer, cigarette smoking and Body Mass Index (table 2). They also indicated a significant interaction between sex and Body Mass Index. Diabetes and regular aspirin use were not significant (likelihood ratio test, p=0.24 and p=0.95, respectively) and were removed from the model. Table 2 shows the ORs and 95% CIs for each category of the significant variables. Goodness-of-fit tests showed no evidence of lack of fit in either the test or the validation dataset (p=0.74 and p=0.16, respectively).
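The likelihood ratio test used to drop diabetes and aspirin compares the log-likelihoods of nested models. The stdlib-only Python sketch below computes the test statistic and its χ² p value; the log-likelihood values are hypothetical, the function names are illustrative, and the closed-form χ² tail probability is valid for integer degrees of freedom only.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Compare nested models; df = number of parameters dropped from the full model."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods: full model vs model without one binary variable (1 df).
stat, p = likelihood_ratio_test(loglik_reduced=-4301.0, loglik_full=-4300.0, df=1)
print(f"LR statistic = {stat:.2f}, p = {p:.3f}")
```

A large p value, as for diabetes and aspirin use, indicates that dropping the variable does not materially worsen the fit, which is the basis for removing it from the model.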
The results of the model calibration in the validation dataset are shown in table 3. The ratio of expected to observed risk of advanced neoplasia was 1.00 (95% CI 0.95 to 1.06) overall, 1.03 (95% CI 0.97 to 1.12) in women, and 0.98 (95% CI 0.91 to 1.06) in men, indicating good calibration. The concordance statistics of the model were 0.64 for the test set and 0.62 for the validation set, indicating moderate discrimination. A sensitivity analysis in the test set, with age as a continuous variable, showed a comparable concordance statistic (0.64, 95% CI 0.63 to 0.66; χ2 p value=0.82).
The score to predict detection of advanced colorectal neoplasia
The adjusted β coefficients of the logistic regression model fitted on the test set were used to develop the risk score by estimating the likelihood of detecting advanced neoplasia for each category of the significant factors (see table 2). The scores for Body Mass Index ≥30 kg/m2 were adjusted for each sex according to the interaction coefficient. The score calculated for each person in the validation set estimated the likelihood of detecting advanced neoplasia from 1.32% for patients with a score of 0 to 19.12% for patients with scores of seven and eight (figure 1). The performance characteristics of the score in the validation set are shown in table 4. The concordance statistic for the simplified score in the validation set was 0.62 (95% CI 0.60 to 0.64); the receiver operating characteristic curve is shown in online supplementary figure S1. Online supplementary table S1 shows the ratio of expected to observed risk of advanced colorectal neoplasia in the validation set by simplified score.
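One way to obtain a likelihood per score group is to tabulate, in the validation set, the percentage of patients with advanced neoplasia at each score, pooling the sparse top scores. This is an assumption about how the reported percentages were derived (they could, for example, also be averages of model-predicted risks), and the helper name is hypothetical.

```python
from collections import defaultdict


def observed_likelihood_by_score(scores, outcomes, pool_from=7):
    """Percentage of patients with advanced neoplasia in each score group.

    Scores >= pool_from are pooled into one group, mirroring the pooling of
    scores 7 and 8 in the text.
    """
    tallies = defaultdict(lambda: [0, 0])  # group label -> [cases, patients]
    for score, has_neoplasia in zip(scores, outcomes):
        label = f">={pool_from}" if score >= pool_from else str(score)
        tallies[label][0] += int(bool(has_neoplasia))
        tallies[label][1] += 1
    return {label: 100.0 * cases / total for label, (cases, total) in tallies.items()}


# Toy data: four patients scoring 0 (one case), four scoring 7 or 8 (two cases).
print(observed_likelihood_by_score([0, 0, 0, 0, 7, 8, 8, 7], [0, 0, 0, 1, 1, 1, 0, 0]))
```

Pooling the highest scores trades resolution for stability, because few patients reach the top of the scale.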
Our previous study found that male sex, age of 50 years or more and family history of colorectal cancer were independent risk factors for detecting advanced colorectal neoplasia.10 In the ensuing discussion, it was suggested that the observed disparity in advanced neoplasia risk between men and women might merely have reflected sex-based differences in smoking patterns.25 In the present study, we used a new dataset to derive and validate a model for the detection of advanced colorectal neoplasia that included smoking status together with other potential confounders, namely age, sex, family history of colorectal cancer and Body Mass Index. We confirmed the previously identified associations and also found that smoking ≥10 pack-years and a Body Mass Index ≥30 kg/m2 were independent risk factors for detecting advanced colorectal neoplasia. These findings corroborate previously reported risk factors for advanced colorectal neoplasia;10–16 in addition, our study for the first time combined all five factors and their categories in a multivariate analysis and confirmed the results in a validation set.
The present model was well calibrated overall, as well as in men and women separately, as verified in the validation set; that is, the observed risk of advanced colorectal neoplasia closely matched the expected risk. We therefore used the model to develop a simple score for the detection of advanced colorectal neoplasia in asymptomatic patients. The score, based on age, sex, family history of colorectal cancer, smoking status and Body Mass Index, estimated the likelihood of detecting advanced colorectal neoplasia in the validation set at 1.32% for patients with 0 points and 19.12% for those with 7–8 points. An estimate of the individual risk of detecting advanced colorectal neoplasia may help asymptomatic patients and healthcare providers make informed decisions about screening.26 For example, without a model it is difficult to compare the likelihood of detecting advanced colorectal neoplasia in a 53-year-old, overweight, never-smoking woman with one first-degree relative aged 60 years or older with colorectal cancer with that in a 56-year-old man who has smoked 20 pack-years but has a healthy weight and no family history of colorectal cancer. Based on the present model, however, the respective likelihoods of detecting advanced neoplasia in these two patients are 4.65% and 12.46% (or 4.57% and 11.27%, respectively, using the simplified score). These results do not mean that the woman should be discouraged from participating in an existing screening programme aimed at the average-risk group; rather, they indicate that the man should be specifically encouraged to be screened, because his likelihood of having advanced neoplasia detected is almost twice that of the average screening population. For ease of clinical application, the model could be implemented as an online calculator of the likelihood of detecting advanced colorectal neoplasia and made available through easily accessible mobile media.
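The model-based likelihoods in the example above follow from the fitted logistic model. In generic form, with indicator variables x_j for the patient's categories and the sex × Body Mass Index interaction term, the predicted likelihood is given below; the fitted coefficients are those reported in table 2 and are not reproduced here. The final expression shows the arithmetic behind "almost twice", comparing the man's model-based likelihood with the 7.1% prevalence in the whole study population.

```latex
\operatorname{logit}(\hat p) = \hat\beta_0 + \sum_{j} \hat\beta_j x_j
  + \hat\beta_{\mathrm{int}}\, x_{\mathrm{male}}\, x_{\mathrm{BMI}\ge 30},
\qquad
\hat p = \frac{1}{1 + \exp\{-\operatorname{logit}(\hat p)\}},
\qquad
\frac{12.46\%}{7.1\%} \approx 1.75
```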
Although lack of symptoms and low perceived risk of colorectal cancer are considered major barriers to screening,7 ,8 it is unknown whether providing an estimate of individual risk could facilitate an informed decision to undergo endoscopic screening in the same way as it has for prostate cancer screening.27 In particular, it is unknown what effect a lower-than-average estimate of the likelihood of detecting advanced colorectal neoplasia would have on participation in screening.
Another potential application of a model for the detection of advanced colorectal neoplasia is to guide practical recommendations for mass screening; however, this application would require a model with high discriminatory power.28 The present model had only a moderate concordance statistic, comparable to that of previously published models for the detection of advanced colorectal neoplasia in Western populations,11 ,29 even though it included more risk factors than previous models did. Three issues may explain this observation. First, the model of Betes et al11 lacked validation, which may have led to overestimation of its discriminatory power. Second, the models of Betes et al11 and Lin et al29 were derived from populations with a broader age range, which may have increased their discriminatory power, because age is the most powerful clinical risk factor for advanced colorectal neoplasia. Third, the additional independent risk factors included in the present model were too weak to change its discriminatory power substantially.30 Models for the detection of advanced colorectal neoplasia in East Asian populations have shown variable discriminatory power.31 ,32 The model by Yeoh et al31 demonstrated discriminatory power comparable to that achieved in Western populations, whereas the model by Cai et al32 demonstrated better discrimination. The latter model did not include family history of colorectal cancer but included various dietary factors, which (in contrast to previous studies15 ,16) showed a strong association with the risk of advanced colorectal neoplasia; however, these factors are prone to recall bias. Therefore, it seems unlikely that a model based on simple, reliable clinical factors alone would ever have sufficient discriminatory power to limit the target population for screening.
This conclusion is corroborated by the results of a very recent study by Tao et al.33 Their model, which included risk factors not assessed in the present study (alcohol consumption, red meat consumption, ever regular use of non-steroidal anti-inflammatory drugs, previous colonoscopy and previous detection of polyps), also demonstrated moderate discriminatory power.33 On the other hand, indirect comparison suggests that the sensitivity and specificity of the present model for advanced colorectal neoplasia may be comparable with those of a single round of guaiac faecal occult blood testing (mostly because of the poor diagnostic performance of the guaiac faecal occult blood test for advanced adenomas).34 ,35 Although the model has considerably lower discriminatory power for advanced colorectal neoplasia than that reported for faecal immunochemical tests,34–37 combining clinical risk factors with the faecal immunochemical test result has been shown to improve discrimination.38 ,39 Therefore, the clinical factors identified in the model are likely to be combined in the future with faecal immunochemical test results and/or blood-based biomarkers to select a target population for colonoscopy.
Our study has certain notable features. Despite a large sample size, we have not identified any statistically significant association between diabetes mellitus or aspirin use and the risk of advanced colorectal neoplasia. Although diabetes mellitus is a known risk factor for colorectal cancer,17 its association with advanced colorectal neoplasia is less certain.31 The observed lack of association may also be due to the lower-than-expected prevalence of diabetes mellitus in the study cohort,40 which likely reflects recall bias or self-selection to opportunistic screening.41
The lack of a statistically significant association between aspirin use and the risk of advanced colorectal neoplasia in our study may be due to recall bias or missing data regarding the dose and regularity of aspirin use. Moreover, we did not collect data on other non-steroidal anti-inflammatory drugs, which in some studies were analysed together with aspirin.15
Advanced neoplasia, rather than cancer alone, was chosen as the outcome because it has been suggested as the most appropriate target for endoscopic screening.9–12 ,15 ,29 ,31 ,32 Although some previous risk prediction models were developed for cancer alone,23 ,42 ,43 both cancer and advanced neoplasia are surrogates for the primary endpoint of cancer screening, which is colorectal cancer mortality.44 Early detection and treatment of colorectal cancer is associated with a reduction in colorectal cancer mortality,2 but detection and removal of adenomas, especially advanced ones, is associated with an additional reduction in colorectal cancer incidence and mortality.45 ,46 Therefore, it is uncertain whether cancer alone or advanced neoplasia is the better endpoint for risk prediction models, but the latter may be particularly suited for use in endoscopic screening.
The primary endpoint of our model was advanced neoplasia located anywhere in the colorectum; therefore, the risk score is not optimised for sigmoidoscopy screening. Nevertheless, we built an additional model to investigate risk factors for detecting distal advanced neoplasia, using the sigmoid-descending colon junction as an arbitrary boundary between the distal and proximal colon. The model for the detection of distal advanced neoplasia identified the same risk factors and showed comparable discriminatory power (data not shown).
The limitations of our study require comment. First, the validation process was limited because it was performed in a dataset that was randomly selected from a population recruited in the same setting. The model's performance has not been tested outside Poland or in non-Caucasians. Nonetheless, the National Colorectal Cancer Screening Program recruited participants in 73 centres located in all administrative and geographic regions of Poland, and was open free of charge to all eligible Polish citizens, providing our study with sociodemographic diversity. Moreover, the prevalence of advanced colorectal neoplasia identified in our study was 7.1%, which is within the range of values reported in studies performed in the USA9 ,15 ,29 and Europe.5 ,10 ,11 Additionally, the adjusted ORs for detecting advanced colorectal neoplasia in various categories of risk factors in our study are similar to those reported in previously published large studies.10 ,31 ,32 ,47 ,48 Notably, in our previous study,10 performed several years earlier and in different endoscopy centres, we used the same key to categorise age, family history of colorectal cancer and gender and obtained virtually the same adjusted ORs for each category of variables.
Second, our cohort does not fully cover the recommended age range for screening (people aged 67–75 years were not included), which may limit the applicability of the results to the entire population eligible for screening.49 On the other hand, by including people at the lower end of the age range for screening (people aged 40–49 years with a family history of cancer), this model may help to identify younger people at considerable risk and encourage them to undergo screening.
Third, given the cross-sectional design of the study, our risk score is suitable only to predict the detection of advanced neoplasia at the present time, and not the future risk of developing advanced colorectal neoplasia or dying from colorectal cancer.
In summary, we derived and internally validated a model that predicts the likelihood of detecting advanced colorectal neoplasia in asymptomatic Caucasian patients based on age, sex, smoking habits, Body Mass Index, and family history of colorectal cancer. The results of the model were used to develop a simple score that estimates the likelihood of detecting advanced colorectal neoplasia. Once externally validated, the score may be useful for counselling or designing primary prevention studies.
The authors are grateful to all the endoscopists and histopathologists who participated in the National Colorectal Cancer Screening Program in Poland. The authors thank Michael Bretthauer, MD, PhD, and Øyvind Holme, MD, from the Department of Management and Health Economy, University of Oslo, and the Department of Transplantation Medicine, Gastroenterology, Oslo University Hospital at Rikshospitalet, Oslo, Norway, for their editorial review of an earlier draft of this article. The authors thank Tomasz Burzykowski, PhD, from the Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium, Mitchell Gail, MD, PhD, from the Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, and Krzysztof Przewozniak, MSc, from the Cancer Epidemiology and Prevention Department, the Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland, for their advice.
Contributors The study was designed by the first three investigators, and the first author wrote the first draft of the manuscript. All authors participated in data collection and analysis, contributed to the manuscript, approved the final version of the manuscript, agreed to submit the manuscript for publication, and vouched for the completeness and accuracy of the data.
Funding This study was supported by the Polish Ministry of Health and the Polish Foundation of Gastroenterology. Michal F Kaminski received a stipend from the Foundation for Polish Science during the study period.
Competing interests None.
Patient consent Obtained.
Ethics approval The Research Ethical Committee at the Maria Sklodowska-Curie Memorial Cancer Centre and Institute of Oncology.
Provenance and peer review Not commissioned; externally peer reviewed.