Objective Colorectal cancer (CRC) is the second leading cause of cancer-associated mortality in the USA. The faecal microbiome may provide non-invasive biomarkers of CRC and indicate transition in the adenoma–carcinoma sequence. Re-analysing raw sequence and metadata from several studies uniformly, we sought to identify a composite and generalisable microbial marker for CRC.
Design Raw 16S rRNA gene sequence data sets from nine studies were processed with two pipelines: (1) QIIME closed reference (QIIME-CR) and (2) a strain-specific method herein termed SS-UP (Strain Select, UPARSE bioinformatics pipeline). A total of 509 samples (79 colorectal adenoma, 195 CRC and 235 controls) were analysed. Differential abundance, meta-analysis random effects regression and machine learning analyses were carried out to determine the consistency and diagnostic capabilities of potential microbial biomarkers.
Results Definitive taxa, including Parvimonas micra ATCC 33270, Streptococcus anginosus and yet-to-be-cultured members of Proteobacteria, were frequently and significantly increased in stools from patients with CRC compared with controls across studies and had high discriminatory capacity in diagnostic classification. Microbiome-based CRC versus control classification produced an area under the receiver operating characteristic curve (AUROC) of 76.6% in QIIME-CR and 80.3% in SS-UP. Combining clinical and microbiome markers gave a diagnostic AUROC of 83.3% for QIIME-CR and 91.3% for SS-UP.
Conclusions Despite technological differences across studies and methods, key microbial markers emerged as important in classifying CRC cases and could be used in a universal diagnostic for the disease. The choice of bioinformatics pipeline influenced the accuracy of classification. Strain-resolved microbial markers might prove crucial in providing a microbial diagnostic for CRC.
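For readers less familiar with the AUROC figures reported in the Results, the following is a minimal sketch of how such a value can be computed from classifier scores. It uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name `auroc` and the toy scores and labels are hypothetical illustrations; they are not data or code from the studies analysed here, and the authors' own pipeline may have computed the statistic differently.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC: fraction of (case, control) pairs in which the
    case has the higher score; tied pairs contribute 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (NOT from the paper): 1 = CRC, 0 = control.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auroc(toy_scores, toy_labels)
print(f"toy AUROC = {toy_auc:.3f}")
```

In this toy example the case scored 0.35 falls below the control scored 0.6, so 8 of the 9 case-control pairs are ordered correctly. A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the percentages above are reported.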
- COLORECTAL CANCER
- COLORECTAL ADENOMAS
- INTESTINAL BACTERIA
Twitter Follow Manasi Shah @GoingByGut
Contributors MSS: Study design, data collection, sequence processing, statistical analysis and manuscript preparation. TDeS: Study design, data collection, sequence processing, statistical analysis and manuscript preparation. TW: Sequence processing and manuscript preparation. PJMcM: Statistical analysis and manuscript preparation. JLC: Sequence processing and manuscript preparation. AA: Statistical analysis and manuscript preparation. J-MY: Statistical analysis and manuscript preparation. EBH: Study design, data collection, sequence processing, statistical analysis and manuscript preparation.
Competing interests MSS worked as a consultant with Second Genome during the course of work. TDeS, TW, PJMcM and AA were employed by Second Genome during the course of the work and hold stock options.
Provenance and peer review Not commissioned; externally peer reviewed.