Original research
Integrated metagenomic and metabolomic analysis reveals distinct gut-microbiome-derived phenotypes in early-onset colorectal cancer
  1. Cheng Kong1,2,
  2. Lei Liang1,2,
  3. Guang Liu3,
  4. Lutao Du4,
  5. Yongzhi Yang1,2,
  6. Jianqiang Liu5,
  7. Debing Shi1,2,
  8. Xinxiang Li1,2,
  9. Yanlei Ma1,2
  1. 1 Department of Colorectal Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
  2. 2 Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  3. 3 Guangdong Hongyuan Pukang Medical Technology Co., Ltd, Guangdong, China
  4. 4 Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, Shandong province, China
  5. 5 Department of Endoscopy, Fudan University Shanghai Cancer Center, Shanghai, China
  1. Correspondence to Dr Yanlei Ma, Department of Colorectal Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China; yanleima{at}fudan.edu.cn

Abstract

Objective The incidence of early-onset colorectal cancer (EO-CRC) is steadily increasing. Here, we aimed to characterise the interactions between gut microbiome, metabolites and microbial enzymes in EO-CRC patients and evaluate their potential as non-invasive biomarkers for EO-CRC.

Design We performed metagenomic and metabolomic analyses, identified multiomics markers and constructed CRC classifiers for the discovery cohort with 130 late-onset CRC (LO-CRC), 114 EO-CRC subjects and age-matched healthy controls (97 LO-Control and 100 EO-Control). An independent cohort of 38 LO-CRC, 24 EO-CRC, 22 LO-Controls and 24 EO-Controls was analysed to validate the results.

Results Compared with controls, reduced alpha-diversity was apparent in both LO-CRC and EO-CRC subjects. Although common variations existed, integrative analyses identified distinct microbiome–metabolome associations in LO-CRC and EO-CRC. Fusobacterium nucleatum enrichment and short-chain fatty acid depletion, including reduced microbial GABA biosynthesis and a shift in acetate/acetaldehyde metabolism towards acetyl-CoA production, characterise LO-CRC. In comparison, multiomics signatures of EO-CRC tended to be associated with enriched Flavonifractor plautii and increased tryptophan, bile acid and choline metabolism. Notably, elevated red meat intake-related species, choline metabolites and the KEGG orthology (KO) pldB and cbh gene axis may be potential tumour stimulators in EO-CRC. The predictive model based on metagenomic, metabolomic and KO gene markers achieved a powerful classification performance for distinguishing EO-CRC from controls.

Conclusion Our large-sample multiomics data suggest that altered microbiome–metabolome interplay helps explain the pathogenesis of EO-CRC and LO-CRC. Microbiome-derived biomarkers show potential as non-invasive tools for the accurate detection and distinction of individuals with EO-CRC.

  • COLORECTAL CANCER

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • The incidence of early-onset colorectal cancer (EO-CRC) is steadily increasing.

  • Multiple studies have suggested a critical role for the gut microbiota and metabolites in the pathogenesis of CRC.

  • Characteristics of the microbiota, metabolites, microbial enzyme-involved interactions and diagnostic efficacy of these biomarkers in patients with EO-CRC have not been reported.

WHAT THIS STUDY ADDS

  • Reduced alpha-diversity and shifted stool microbiome and metabolome were apparent in both late-onset CRC (LO-CRC) and EO-CRC.

  • Fusobacterium nucleatum enrichment and short-chain fatty acid depletion characterise LO-CRC, including reduced microbial GABA biosynthesis and a shift in acetate/acetaldehyde metabolism towards acetyl-CoA production.

  • Enriched Flavonifractor plautii and elevated red meat intake-related species, choline metabolites and the KEGG orthology (KO) pldB and cbh gene axis may be potential tumour stimulators in EO-CRC.

  • The predictive model based on metagenomic, metabolomic and KO gene markers achieved a powerful classification performance for distinguishing EO-CRC from the control.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Altered microbiome–metabolome interplay helps explain the pathogenesis of EO-CRC and LO-CRC.

  • Gut microbiota-derived biomarkers show potential as non-invasive tools for the accurate detection and distinction of individuals with EO-CRC.

Introduction

The increase in early-onset cancers in various organs has become a global concern, particularly early-onset colorectal cancer (EO-CRC) diagnosed before the age of 50, whose incidence is steadily rising in many parts of the world.1–3 There are several possible reasons for this alarming finding. Younger patients are not covered by routine CRC screening owing to the traditional recommendation that CRC screening start at the age of 50. Patients with EO-CRC have also shown adverse pathological features and more advanced stages of CRC, and there remains a lack of diagnostic and therapeutic protocols dedicated to sporadic CRC in young individuals.4 Although multiple factors may be involved, including genetics and the environment, it is unclear whether there are unique molecular signatures and multiomics profiles underlying this cohort of patients with EO-CRC.

The link between CRC, the gut microbiota and its metabolites has been supported by several studies.5 6 Interestingly, the impact of gut microbiota on health seems to be important in both elderly and young people, because it may regulate aging-related changes in inflammation, innate immunity and cognitive function.7 8 Both in vivo and in silico studies have revealed significant differences in the gut microbiomes of young and old individuals.9–11 Unique gut microbial metabolites, such as specific beneficial bile acids, have been reported to accumulate in centenarians and may affect their health as potential ‘gut hormones’ and serve as biomarkers.12 A study based on the faeces of 314 young individuals of multiple ethnicities from different regions of China found that a functional core gut microbiome was present in a healthy population of young Chinese people, which included various bacteria involved in butyrate production, maintaining anti-inflammatory function, and intestinal barrier integrity in healthy people.13 However, risk factors such as unhealthy diets, obesity and sedentary lifestyles are increasing in the younger generation, leading to our hypothesis that altered gut microbiota and metabolites in young individuals may interact with their underlying genetic background to trigger early disease onset. It has been reported that long-term physiological stress changes the composition and metabolism of gut bacteria in young people, and causes increased permeability and inflammation, making it possible to diagnose and explain the pathogenesis of early-onset patients based on the characteristic microbial spectrum.14 15 In the EO-CRC population, the phenotypes and diagnostic characteristics of gut microbiota and/or their metabolites studied through large-sample multiomics are not known.
Here, we defined the gut microbiome, metabolites and KEGG orthology (KO) gene signatures for EO-CRC and late-onset CRC (LO-CRC), determined from metagenomic and metabolomic characterisation of stool samples from CRC patients and age-matched healthy volunteers. Notably, the random forest model accurately identified patients with EO-CRC and LO-CRC in two independent cohorts, demonstrating its robustness and potential utility as a diagnostic tool.

Methods

Study subjects and sample collection

A total of 549 faecal samples were collected from the Fudan University Shanghai Cancer Center, Shanghai, China (discovery cohort, n=441) and the Second Hospital of Shandong University, Shandong, China (validation cohort, n=108) from 2018 to 2021. Faecal samples were stored at −80°C until microbial and metabolic analysis. In the sporadic CRC group, all patients were newly diagnosed with CRC based on postoperative pathological examination. Tumour stage was evaluated using the tumour, node and metastasis (TNM) staging system. Stool samples were collected before surgery, and participants with a family history of CRC, irritable bowel syndrome, neoadjuvant therapy or other coexisting malignancies were excluded. The enrolled CRC patients were divided into an EO-CRC group (aged <50 years) and an LO-CRC group (≥50 years). The healthy control group comprised volunteers with no gastrointestinal tumours as confirmed by colonoscopic screening, who were then divided into the EO-Control group (aged <50 years) and the LO-Control group (≥50 years). Participants’ demographics and clinicopathological characteristics, including age, sex, tumour location and size, tumour differentiation, TNM stage, KRAS/NRAS/BRAF mutations, nerve, lymphatic and vascular invasion, and mismatch repair (MMR) status, were collected from the electronic medical record system.

Faecal DNA extraction, library construction and metagenomic sequencing

Faecal DNA was extracted using the QIAamp DNA Stool Mini Kit (Qiagen, Hilden, Germany). DNA integrity, sizes and concentrations were determined by agarose gel electrophoresis and NanoDrop spectrophotometry (NanoDrop, Germany). Sequencing libraries were constructed as previously described.16 After library quality control, high-throughput sequencing was performed using the NovaSeq6000 platform (Illumina).

Sequencing data analysis

Raw sequencing reads were processed to obtain valid reads as previously described.16 Quality-filtered reads were obtained and reassembled using IDBA-UD (V.1.1.1). The clean reads were aligned to the database (V.202003, (ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/)) using Kraken2 software (V.2.1.1) and Bracken software (V.2.5) to obtain species-level information. Based on the taxonomic profiling from the results of Kraken2, profiles at the species level were selected for further analysis. Biomarkers among groups were determined based on linear discriminant analysis (LDA) scores from LDA effect size (LEfSe) analysis. MaAsLin2 was used to evaluate the multivariable association between phenotypes (age and sex) and microbial taxa features. The KO gene profile was used to explore the differences between the case and control groups, and a p value<0.05 from a two-sided Mann-Whitney U test was considered significant. To explore the pathway modules in the gut microbial populations, KO genes from three categories were selected. Lists of KO genes related to amino acid and lipid metabolism were obtained from KEGG BRITE ‘ko00002.keg’ with the keywords ‘amino acid metabolism’ and ‘lipid metabolism’. Lists of KO genes reported in microbes were collected from KEGG BRITE ‘map01120’ with the keywords ‘microbial metabolism in diverse environments’.

Faecal metabolite extraction and liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis

The faecal metabolite extraction method was adapted from a previously published protocol.17 All chromatographic separations were performed using an ultra-performance liquid chromatography system (SCIEX, UK). A high‐resolution tandem mass spectrometer TripleTOF5600plus (SCIEX, UK) was used to detect metabolites eluted from the column, as described previously.18 Acquired MS data were preprocessed using XCMS software. Detailed methods for LC-MS raw data file processing, calculation of the exact molecular mass data (m/z), online KEGG and HMDB database annotation, and quality control have been described previously.18 Tail area-based false discovery rate (FDR) correction (q-value) was applied to strictly control for false positives. Metabolites were considered biomarkers if they fulfilled both of the following criteria: (1) q<0.05, where q is the FDR-corrected p value from a two-sided Mann-Whitney U test, and (2) variable importance for the projection >1 in partial least squares discriminant analysis (PLS-DA).
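
As an illustration only, the two-part biomarker rule (q<0.05 and variable importance for the projection >1) can be sketched in a few lines of Python. This sketch substitutes the Benjamini-Hochberg adjustment for the tail area-based FDR used in the study, so the q-values it produces will generally differ from the published ones. The function names `bh_adjust` and `select_biomarkers` and the input lists are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: this is a stand-in for the tail area-based FDR
    described in the text, not the same procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_biomarkers(names, pvalues, vip_scores, q_cut=0.05, vip_cut=1.0):
    """Keep features that pass both the q-value and the VIP threshold."""
    qvalues = bh_adjust(pvalues)
    return [
        name
        for name, q, vip in zip(names, qvalues, vip_scores)
        if q < q_cut and vip > vip_cut
    ]
```

A feature must clear both thresholds to be retained, so a metabolite with a small q-value but a VIP score at or below 1 is dropped.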

Random forest and biomarker identification

To explore the potential use of biomarkers (metagenomic and metabolomic features) in disease diagnosis and prediction, a machine learning method (random forest) was used to construct classification models based on the species-level profiles, KO genes, metabolite profiles and a combination of the three types of data. First, a feature importance score (mean decrease accuracy, which indicates how much each feature contributes to the accuracy of the model) was calculated using the random forest algorithm. Second, ten trials of 10-fold cross-validation were performed using random forest to identify optimal biomarkers, with the cut-off point selected by the mean of the minimum cross-validation error. Random forest analysis was performed using R software with the call replicate(cv.time, rfcv1(train.x, train.y, cv.fold=cv.fold, step=cv.step), simplify=F). The optimal biomarker sets were selected according to the cut-off point in the cross-validation error curve, which was taken as the point of minimum cross-validation error.
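
The cut-off selection can be illustrated with a small Python sketch. It follows one possible reading of the description: the cross-validation error curves from the repeated trials are averaged, and the number of features with the lowest averaged error is taken as the cut-off. The function name `pick_feature_count` and the input layout (one dictionary per trial, mapping number of features to error) are assumptions made for illustration, and the study's own R script may differ.

```python
from statistics import mean


def pick_feature_count(trials):
    """Choose the feature count with the lowest mean CV error.

    trials: list of dicts, one per repeated cross-validation run,
    each mapping a candidate number of features to its CV error.
    All dicts are assumed to share the same keys.
    """
    sizes = sorted(trials[0])
    mean_error = {n: mean(t[n] for t in trials) for n in sizes}
    best = min(mean_error.values())
    # If several sizes tie, prefer the smallest marker set.
    cutoff = min(n for n in sizes if mean_error[n] == best)
    return cutoff, mean_error
```

For example, with two trials giving errors of 0.30/0.20/0.22 and 0.28/0.22/0.24 for 5, 10 and 20 features, the averaged curve is lowest at 10 features, so 10 would be the cut-off.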

Statistical analyses

Unpaired Student’s t-test, Mann-Whitney U test or Dunnett’s t-test was used to evaluate the differences between quantitative data, wherever appropriate. Continuous variables were compared using a two-sided Wilcoxon rank-sum test. Pearson’s χ2 test or Fisher’s exact test was used to compare the categorical variables. Spearman’s correlation analysis was performed to analyse the correlations among taxa, metabolites and KO genes. Statistical analyses were performed using GraphPad Prism V.8.0 software (GraphPad Software, San Diego, California, USA), R V.3.6.3 (R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/) and Microsoft Excel (Microsoft, Seattle, Washington, USA).

Results

Participant information

The metagenomic sequences and LC-MS/MS metabolomes of faecal samples taken from 441 individuals comprising 130 patients with LO-CRC, 114 patients with EO-CRC, 97 LO-Control subjects and 100 EO-Control subjects from Shanghai (discovery cohort) were analysed. In this cohort, we aimed to characterise the gut microbial taxonomic and metabolic signatures in LO-CRC and EO-CRC populations and determine the association between differential taxa, metabolites and KO genes for microbial enzymes (figure 1A). Age was matched between the CRC and corresponding control groups. The LO-CRC and EO-CRC groups had similar clinical characteristics, including sex, tumour differentiation, tumour node metastasis (TNM) stage, tumour size and location, KRAS/NRAS/BRAF mutations, nerve and vascular invasion, whereas the rates of lymphatic invasion and microsatellite instability were significantly higher in patients with EO-CRC than in those with LO-CRC. The detailed demographic, clinical and biochemical profiles of the cohort are provided in online supplemental table S1.

Figure 1

Bacterial diversity of the faecal microbiota associated with LO-CRC and EO-CRC. (A) Study design and flow diagram. Faecal metagenomic sequence and faecal LC-MS/MS metabolome were analysed from 441 individuals comprising 130 patients with LO-CRC, 114 patients with EO-CRC, 97 LO-Control subjects, and 100 EO-Control subjects from Shanghai (Discovery cohort). In this cohort, we aim to characterise the gut microbial taxonomic and metabolic signatures in LO-CRC and EO-CRC populations, and to determine the association between differential taxa, metabolites and KO genes. (B) Faecal microbial alpha-diversity estimated by the breakaway index. P values are calculated by two-sided unpaired Student’s t-test. Bars represent SD. (C, D) Beta diversity calculated by PCoA of Bray-Curtis distance and PERMANOVA (C) between LO-CRC and LO-Control, and (D) between EO-CRC and EO-Control. n=441 from the Discovery cohort. 130 LO-CRC vs 97 LO-Control, two-sided, p=0.001; 114 EO-CRC vs 100 EO-Control, p=0.001. CRC, colorectal cancer; EO-CRC, early-onset CRC; LC-MS/MS, liquid chromatography-tandem mass spectrometry; LO-CRC, late-onset CRC; PCoA, principal coordinate analysis; PERMANOVA, permutational multivariate analysis of variance.

Decreased alpha-diversity and altered overall microbial composition in LO-CRC and EO-CRC populations

Compared with age-matched healthy controls (LO-Control and EO-Control), faecal alpha-diversity breakaway estimates were significantly lower in LO-CRC and EO-CRC groups (p=0.0039 and 0.0074, respectively) (figure 1B). Beta diversity was calculated by Bray-Curtis distance, and principal coordinate analysis (PCoA) was performed to display the microbiome distances between samples. The results indicated a significantly different distribution of faecal bacteria between LO-CRC and LO-Control, and between EO-CRC and EO-Control (both p=0.001, PERMANOVA, figure 1C,D). Collectively, these data suggest that LO-CRC and EO-CRC have different diversity and microbial distance metrics from their corresponding control groups, and the similar diversity of LO-CRC and EO-CRC groups is driven by their unique microbial signatures.
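
For readers unfamiliar with the beta-diversity metric, the Bray-Curtis distance between two samples is the sum of absolute abundance differences divided by the total abundance across both samples. The following is a minimal Python sketch of that definition, not the pipeline used in the study. The function names are hypothetical, and each sample is assumed to be a list of non-negative species abundances in the same species order.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Returns 0.0 for identical profiles and 1.0 when the two
    samples share no species. Two all-zero samples are treated
    as identical (an arbitrary convention for this sketch).
    """
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def bray_curtis_matrix(samples):
    """Pairwise distance matrix, the usual input to PCoA and PERMANOVA."""
    n = len(samples)
    return [
        [bray_curtis(samples[i], samples[j]) for j in range(n)]
        for i in range(n)
    ]
```

The resulting symmetric matrix is what an ordination method such as PCoA would take as input.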

Taxonomic signatures of microbiota in LO-CRC and EO-CRC groups

A broad overview of the species-level taxonomic data from all 441 subjects is shown in online supplemental figure S1A. The differentially abundant species signatures in LO-CRC and EO-CRC were assessed by LDA coupled with effect size analysis (LEfSe) algorithms between (1) LO-CRC and LO-Control groups and (2) EO-CRC and EO-Control groups. A total of 335 species between LO-CRC and LO-Control groups, and 106 species between EO-CRC and EO-Control groups were identified as differentially abundant bacterial species with LDA score >2.0 and p<0.05 (online supplemental table S2,S3). When comparing the top 30 different species in each group, decreased abundance of bacteria, including the butyric acid-producing bacteria Faecalibacterium prausnitzii, Eubacterium rectale, Eubacterium eligens, Roseburia intestinalis, Roseburia hominis, Bifidobacterium adolescentis and Anaerostipes hadrus, was observed in both LO-CRC and EO-CRC groups, compared with their age-matched controls, which was consistent with findings from previous studies with large CRC cohorts (figure 2A,B).6 Notably, Bacteroides fragilis, Bacteroides ovatus, Bacteroides thetaiotaomicron and Porphyromonas asaccharolytica were common microbial signatures in LO-CRC and EO-CRC groups, which is supported by previous studies showing that enterotoxigenic Bacteroides fragilis and Porphyromonas have the potential to induce intestinal inflammation and tumourigenesis.19 20 Furthermore, the species Flavonifractor plautii, Bacteroides vulgatus, Bacteroides cellulosilyticus, Parabacteroides sp CT06 and Odoribacter splanchnicus comprised the unique taxa signature in the EO-CRC group, while a uniquely increased abundance of the species Fusobacterium nucleatum, Bacteroides caccae, Prevotella intermedia and Enterococcus faecium was observed in the LO-CRC group (figure 2C).
Multivariate association with linear models (MaAsLin2) was then applied to control for potentially confounding factors, including age and sex, and two species in the LO-CRC signature and six species in the EO-CRC signature still reached statistically significant associations (figure 2C). Among them, F. nucleatum, the core carcinogenic bacterium of extensive concern, was still significantly enriched in the LO-CRC group.21 Moreover, Flavonifractor plautii, which was identified as one of the key bacteria associated with CRC in Indians in a recent study,22 was consistently the dominant population of the EO-CRC group in both the current data and our previous study.16 We further identified Lachnospiraceae bacterium GAM79 and Collinsella aerofaciens as key taxa signatures in the LO-Control and EO-Control groups, respectively (figure 2D). Twelve species remained significantly different after MaAsLin2 adjustment in both LO-Control and EO-Control microbial signatures (figure 2D). These findings indicate that unique gut microbiome profiles are present in LO-CRC and EO-CRC groups, which may be associated with differences in tumour clinicopathological characteristics and diagnosis.

Figure 2

Taxonomic signatures of LO-CRC and EO-CRC microbiota. (A, B) Histograms of the top 30 species from LDA coupled with effect size measurement based on metagenomic sequencing (adjusted by MaAsLin2, n=130 for LO-CRC, n=114 for EO-CRC, n=97 for LO-Control, and n=100 for EO-Control in Discovery cohort) (A) between LO-CRC and LO-Control, and (B) between EO-CRC and EO-Control. P values are calculated by Kruskal-Wallis test, logarithmic LDA score >2.0, p<0.05. ‘#’ indicates bacterial taxa with distinct relative abundances between groups detected by metagenomic sequencing after adjusting for age and sex using MaAsLin2. (C, D) Venn diagrams outline the taxa signatures associated with LO-CRC, EO-CRC, LO-Control and EO-Control, respectively, as well as the taxa consistently altered in both CRCs or both healthy controls. ‘#’ indicates bacterial taxa with distinct relative abundances between Case and Control detected by metagenomic sequencing after adjusting for age and sex using MaAsLin2. CRC, colorectal cancer; EO-CRC, early-onset CRC; LO-CRC, late-onset CRC; LDA, linear discriminant analysis.

Faecal metabolomic alterations in LO-CRC and EO-CRC

Given the interactions between gut microbiota and host-microbe cometabolism, we performed non-targeted metabolomics on stool samples to assess the overall differences in faecal metabolites between CRCs and healthy controls (LO-CRC n=130, LO-Control n=97, EO-CRC n=114, and EO-Control n=100). A broad overview of our metabolomic data is shown in online supplemental figure S1B. PLS-DA models revealed that the metabolomic composition of LO-CRC and EO-CRC groups was largely separated from their corresponding controls, which is consistent with the broad changes in faecal taxa profiles described above (online supplemental figure S2A,B). We then investigated the association of each annotated metabolite with the LO-CRC and EO-CRC groups and identified 162 differential metabolites between LO-CRC and LO-Control, and 167 between EO-CRC and EO-Control groups (figure 3A,B, online supplemental table S4,S5). The potentially ecotoxic metabolite perfluorooctanesulfonic acid accumulated consistently in LO-CRC and EO-CRC samples.23 Among the amino acid metabolites, L-phenylalanine and D-ornithine levels were significantly increased in both CRCs, whereas several specific amino acids such as glycine, L-aspartate, tryptophan and microbiota derivatives of tryptophan (indole-3-acetaldehyde) were enriched only in EO-CRC samples (figure 3C,D). Amino acid metabolites, including L-arginine, acetate and acetaldehyde, were significantly reduced only in LO-CRC samples (figure 3C). Interestingly, compared with the EO-Control, bile acids and choline metabolites were significantly more abundant in EO-CRC, with the strongest effects observed among phosphatidylcholine, 1-acyl-sn-glycero-3-phosphocholine, choline, linoleate, deoxycholic acid and cholic acid (figure 3D). The distinct roles of these organic compounds in LO-CRC and EO-CRC need to be further studied, allowing for potential correlation analysis based on metabolite–microbial interactions.

Figure 3

Faecal metabolome changes in LO-CRC and EO-CRC. (A, B) Volcano plots show metabolite changes (A) between 130 LO-CRC and 97 LO-Control, and (B) between 114 EO-CRC and 100 EO-Control, respectively. The x-axis indicates log2-transformed fold change of faecal metabolite abundances, and the y-axis denotes log10-transformed q values (p value adjusted using the tail area-based FDR). The horizontal lines represent q<0.05. (C, D) Boxplots show representative metabolites that were significantly changed in (C) LO-CRC and (D) EO-CRC, respectively. Metabolite abundances are visualised after log2 transformation. CRC, colorectal cancer; EO-CRC, early-onset CRC; FDR, false discovery rate; LO-CRC, late-onset CRC.

LO-CRC-associated and EO-CRC-associated changes in microbial genes summarised in KO genes and KEGG pathway modules

Considering the multiomics shift of the gut microbiome and metabolome in CRC, we hypothesised that metabolite differences might reflect differences in microbial enzyme gene expression. To further determine the microbial metabolic processes occurring in LO-CRC and EO-CRC, we annotated metagenome-analysed microbial genes in the KEGG orthology (KO) database (online supplemental figure S1C). In the PCoA, a significantly different distribution of KO genes was observed between LO-CRC and LO-Control, and between EO-CRC and EO-Control groups (online supplemental figure S2C,D). Next, pathway modules were constructed by modifying KEGG pathway maps that referred to amino acid metabolism (ko00002.keg), microbial metabolism in diverse environments (map01120), and lipid metabolism (ko00002.keg) (figure 4A–C). Within amino acid metabolism, many functional modules, such as urea cycle (genes OTC and argH), leucine/isoleucine biosynthesis (genes leuA, leuB, leuC, leuD, ilvB, ilvC, ilvD), histidine biosynthesis (genes hisA, hisB, hisC, hisD, hisF, hisG, hisH) and lysine biosynthesis (genes lysA, dapA, dapB and dapF), were significantly depleted in LO-CRC and EO-CRC groups relative to age-matched controls (figure 4A). Among the differentially abundant pathways, histidine degradation genes (fctD, hutI, hutU, hutH), leucine degradation genes (DLD, bkdA) and phenylalanine biosynthesis genes (AROA1, pheA2) were upregulated in all CRCs. Notably, the tryptophan biosynthesis pathway (trpB) and arginine biosynthesis pathway (argE) were significantly upregulated in EO-CRC but not in LO-CRC. Furthermore, the expression of genes involved in GABA biosynthesis (gene MAO) was significantly decreased in the LO-CRC KO signature (figure 4A). In the KO map of microbial metabolism in diverse environments, the pentose phosphate pathway (gene rpiA) was elevated in LO-CRC compared with EO-CRC (figure 4B).
Furthermore, ectoine degradation-related genes were significantly increased in EO-CRC (gene doeD, metabolises ectoine to produce aspartate) but decreased in LO-CRC (gene doeB) compared with age-matched controls. Importantly, in our metabolite results, L-aspartate, a metabolite of ectoine, accumulated in EO-CRC. Recent studies have shown that aspartate has a potential protumour effect, and aspartate starvation therapy is one of the strategies for tumour suppression, which may be a unique metabolic characteristic of EO-CRC.24 Furthermore, in the lipid pathways, genes involved in phosphatidylcholine biosynthesis (genes PCYT1 and pmtA) were elevated mainly in EO-CRC (figure 4C).

Figure 4

The LO-CRC-associated and EO-CRC-associated changes in microbial genes summarised in KO genes and KEGG pathway modules. (A–C) Gene abundances were assessed for significant elevation or depletion (p<0.05; one-sided Mann-Whitney U test) between 130 LO-CRC and 97 LO-Control, and between 114 EO-CRC and 100 EO-Control, respectively. Relative abundances of KO genes involved in (A) amino acid metabolism, (B) microbial metabolism in diverse environments and (C) lipid metabolism that showed significant differences between case and control are shown in the heat map. KO genes with a prevalence of 10% or higher (KO genes detected in more than 10% out of 441 subjects) are shown. Significant changes (elevation and depletion) are denoted as follows: *p<0.05; **p<0.01; ***p<0.005. CRC, colorectal cancer; EO-CRC, early-onset CRC; LO-CRC, late-onset CRC.

To explore more accurate evidence of unique metabolic characteristics and microbial enzyme–metabolite interactions, based on the existing metabolite reactions in the KEGG database, we associated the metabolite markers with the KO markers found in LO-CRC and EO-CRC (online supplemental table S6,S7). In the representative formulas listed for chemical reactions involving metabolites and enzyme genes, acetate and acetaldehyde as substrates were significantly decreased while the enzymes metabolising them, namely, atoD, atoA and E1.2.1.10, were enriched in LO-CRC patients (figure 5A). The orientation of these reactions indicated that patients with LO-CRC may consume and metabolise acetate and acetaldehyde to give a large amount of the intermediate product, acetyl-CoA, which could subsequently be involved in tumour metabolism (figure 5A).25 26 In other identified chemical reactions, genes responsible for the production of acetate and acetaldehyde (argE and cbiGH-cobJ) for subsequent acetyl-CoA synthesis were significantly elevated in the LO-CRC group (figure 5A). Furthermore, cholic acid (the primary bile acid), deoxycholic acid (the secondary bile acid metabolised by gut microbiota, an important tumour-stimulating factor)27 and its synthase enzyme gene cbh were significantly upregulated in EO-CRC patients compared with EO-Control. Some choline metabolites accumulated while enzyme genes for choline, phosphatidylcholine and 1-acyl-sn-glycero-3-phosphocholine metabolism, including pldB, NTE, adhE, gbsB and yiaY, were significantly downregulated in EO-CRC patients (figure 5B). Taken together, our results suggest that patients with LO-CRC and EO-CRC have unique microbial enzymatic reactions and metabolic processes.

Figure 5

The metabolisation–enzyme reaction formulas with representative KO genes and metabolite markers in LO-CRC and EO-CRC. (A, B) Representative KO genes and metabolite markers appearing in the existing (A) LO-CRC and (B) EO-CRC metabolisation–enzyme reactions are shown in the formulas listed. Each boxplot in a reaction represents a compound or a KO gene (two-sided Wilcoxon rank-sum test). Bar plots show relative metabolite concentrations or gene abundances averaged over samples within each group (n=130 for LO-CRC, n=114 for EO-CRC, n=97 for LO-Control, and n=100 for EO-Control in the discovery cohort) and are coloured according to the group. CRC, colorectal cancer; EO-CRC, early-onset CRC; LO-CRC, late-onset CRC.

Associations between the disease-linked microbiota and metabolites

Our multiomics data enabled us to identify the dynamic interactions among differential taxonomic, metabolic and KO gene signatures. To dissect interactions between the host and microbiota that might underlie features in LO-CRC and EO-CRC, we assessed the correlations between all differentially abundant taxa and metabolites, and combined the associations of representative taxa, metabolites and KO genes to draw network diagrams representing the multiomics signatures of LO-CRC and EO-CRC, respectively (figure 6A,B, online supplemental figures S3,S4). In general, we observed strong positive associations between taxa and metabolites that were both elevated in the controls, as well as negative associations between control-enriched taxa and disease-enriched metabolites (online supplemental figures S3,S4). Our results showed that the reduced level of acetate was closely related to the decreased abundance of butyrate-producing Roseburia intestinalis, Eubacterium eligens, Faecalibacterium prausnitzii, Anaerostipes hadrus and Eubacterium rectale in LO-CRC, and was negatively correlated with the increased abundance of potentially carcinogenic Bacteroides fragilis and Porphyromonas asaccharolytica and of the acetate metabolism-related KO genes atoA and atoD in LO-CRC (figure 6A and online supplemental figure S3). Notably, there was a strong negative correlation between control-enriched L-arginine and disease-enriched F. nucleatum and P. asaccharolytica in LO-CRC (figure 6A and online supplemental figure S3). Furthermore, EO-CRC-enriched tryptophan and its microbial-derived metabolite indole-3-acetaldehyde were negatively correlated with control-enriched species, including Faecalibacterium prausnitzii, Roseburia hominis, B. adolescentis, Christensenella minuta, Clostridium sporogenes and Mordavella sp. Marseille-P3756 in the present study (figure 6B and online supplemental figure S4). The bile acid metabolites deoxycholic acid and cholic acid were negatively associated with the control-enriched taxa Anaerostipes hadrus and Faecalibacterium prausnitzii, the EO-Control-specific taxon Mycoplasma gallinaceum, and the bile acid synthesis-related KO genes cbh, NTE and pldB (figure 6B and online supplemental figure S4). Moreover, the red meat-related species B. vulgatus and Parabacteroides spp CT06 were enriched in the EO-CRC samples28 and were associated with the accumulation of the choline metabolite 1-acyl-sn-glycero-3-phosphocholine and downregulation of the choline metabolism-related KO gene pldB in EO-CRC (figure 6B and online supplemental figure S4). The network also showed that Flavonifractor plautii, enriched specifically in EO-CRC, was positively correlated with the cancer-related metabolites perfluorooctanesulfonic acid, linoleate and L-phenylalanine and with the KO gene pldA (figure 6B and online supplemental figure S4). Therefore, our data suggest that alterations in microbial KO genes and metabolites were associated with changes in microbiota in CRC faeces, and the combined analysis of the three types of data may partially explain the different pathogenesis of LO-CRC and EO-CRC.
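
As an illustration of how signed taxon–metabolite edges of this kind can be derived, the following is a minimal sketch of a Spearman rank correlation written with the Python standard library only. It is not the authors' pipeline. The function names, the toy abundance values and the |rho| cut-off are assumptions made for illustration; the study itself selected edges by p<0.05, which this sketch does not compute.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def signed_edges(taxa, metabolites, min_abs_rho=0.5):
    """Return (taxon, metabolite, sign, rho) for pairs passing the cut-off.

    min_abs_rho is a hypothetical threshold standing in for a p-value filter.
    """
    edges = []
    for t_name, t_vals in taxa.items():
        for m_name, m_vals in metabolites.items():
            rho = spearman_rho(t_vals, m_vals)
            if abs(rho) >= min_abs_rho:
                sign = "positive" if rho > 0 else "negative"
                edges.append((t_name, m_name, sign, round(rho, 3)))
    return edges


# Toy per-sample abundances (invented values, five samples).
toy_taxa = {"taxon_A": [1.0, 2.0, 3.0, 4.0, 5.0]}
toy_metabolites = {
    "metabolite_X": [2.0, 4.0, 6.0, 8.0, 10.0],
    "metabolite_Y": [5.0, 4.0, 3.0, 2.0, 1.0],
}
toy_edges = signed_edges(toy_taxa, toy_metabolites)
```

In a network drawing, the "positive" and "negative" labels would map to the red and blue lines described for figure 6.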

Supplemental material

Figure 6

Integrated analysis of multiomics in LO-CRC and EO-CRC. (A, B) The networks show representative significant and suggestive associations (p<0.05, Spearman analysis) among differentially abundant taxa, metabolites and KO genes (A) between 130 LO-CRC and 97 LO-Control, and (B) between 114 EO-CRC and 100 EO-Control, respectively. Nodes are coloured according to whether the feature is increased or decreased in cases compared with controls. Lines connecting nodes indicate positive (red) or negative (blue) correlations. CRC, colorectal cancer; EO-CRC, early-onset CRC; LO-CRC, late-onset CRC.

Integrative multiomics signatures of LO-CRC and EO-CRC patients

Significant associations among the three omics layers (taxa, metabolites and KO genes) were revealed using Procrustes analysis (online supplemental table S8). To investigate the potential of gut microbial, metabolic and KO gene profiles as diagnostic markers, we built random forest classifiers to discriminate LO-CRC and EO-CRC cases from age-matched healthy controls. A 10-fold cross-validated random forest model was used in the training phase (70% of the samples in each group randomly selected from the discovery cohort) to select key discriminatory bacterial taxa, metabolites, KO genes and a combination of the three. In the testing phase, the remaining 30% of the samples from the discovery cohort were used to validate the diagnostic efficacy of the LO-CRC and EO-CRC classifiers, respectively. Moreover, 108 participants, including 38 LO-CRC, 22 LO-Control, 24 EO-CRC and 24 EO-Control from Shandong, China, served as an independent external validation cohort to verify the potential of the classifiers (figure 7A). The detailed demographic, clinical and biochemical profiles of the validation cohort are provided in online supplemental table S9. Our analysis identified a bacterial taxa signature composed of 32 species as the optimal marker set between LO-CRC and LO-Control (from the training phase) (online supplemental table S10), and the area under the curve (AUC) value of the microbial markers was 85.22% (online supplemental figure S5A). The top-ranking features of the LO-CRC classification included the oral anaerobe F. nucleatum, which has previously been identified as a marker species for CRC.29 The models based on 16 metabolites and 59 KO features also performed well, with AUCs of 84.46% and 86.11%, respectively (online supplemental figure S5A, online supplemental tables S11,S12). Notably, the integration of the three features improved classification accuracy (AUC: 87%) (online supplemental figure S5A,S5C, online supplemental table S13). In the testing phase, using bacterial taxa, metabolites or KO gene markers alone as predictors between LO-CRC and LO-Control generated AUCs of 84.53%, 83.3% and 84.97%, respectively; however, the combination of the three achieved an AUC of 92.34% (figure 7B). Furthermore, the power of random forest classifiers for distinguishing LO-CRC patients from age-matched healthy individuals was validated in an external validation cohort from Shandong, China. The bacterial taxa, metabolites or KO genes alone yielded AUCs of 78.17%, 78.47% and 81.22%, respectively, to discriminate 38 LO-CRC patients from 22 LO-Control individuals, while the combined markers increased the AUC to 82.36% (figure 7C). A summary of the AUCs with 95% CIs is listed in online supplemental table S18.
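
The sampling scheme described above (70% of each group for training, the remaining 30% for testing, and 10 folds within the training set) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function names and the seed are assumptions, the folds are drawn by simple shuffling rather than by any stratification the authors may have used, and the random forest itself is omitted. The group sizes are those reported for the LO-CRC discovery comparison.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), sampling train_fraction of each group."""
    rng = random.Random(seed)
    by_group = {}
    for idx, label in enumerate(labels):
        by_group.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_group):
        members = by_group[label][:]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))  # per-group training size
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


def k_fold_indices(indices, k=10, seed=0):
    """Shuffle the indices and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Discovery-cohort group sizes for the LO-CRC comparison reported in the text.
labels = ["LO-CRC"] * 130 + ["LO-Control"] * 97
train_idx, test_idx = stratified_split(labels, train_fraction=0.7, seed=0)
folds = k_fold_indices(train_idx, k=10, seed=0)

# In each cross-validation round, one fold is held out and the other nine
# are used to fit the model; only the index bookkeeping is shown here.
cv_rounds = []
for held_out in range(len(folds)):
    fit_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    cv_rounds.append((fit_idx, folds[held_out]))
```

Sampling within each group keeps the case-to-control ratio of the training and testing sets close to that of the full discovery cohort.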

Figure 7

Integrative multiomics markers of LO-CRC and EO-CRC by random forest models. (A) Study design and flow diagram for random forest models. To investigate the potential of gut microbial, metabolic and KO gene profiles to act as diagnostic markers, random forest classifiers were built to discriminate 130 LO-CRC and 114 EO-CRC cases from 97 LO-Control and 100 EO-Control in the discovery cohort. A 10-fold cross-validated random forest model was used in the training phase (70% of the samples in each group randomly selected from the discovery cohort) to select key discriminatory bacterial taxa, metabolites, KO genes and the combination of the three. In the testing phase, the remaining 30% of the samples from the discovery cohort were used to validate the diagnostic efficacy of the LO-CRC and EO-CRC classifiers, respectively. Moreover, 108 participants, including 38 LO-CRC, 22 LO-Control, 24 EO-CRC and 24 EO-Control from Shandong, China, served as an independent external validation phase to verify the potential of the classifiers. (B, C) The AUC values between LO-CRC and LO-Control in (B) the testing phase and (C) the independent external validation phase. (D, E) The AUC values between EO-CRC and EO-Control in (D) the testing phase and (E) the independent external validation phase. AUC, area under curve; CRC, colorectal cancer; EO-CRC, early-onset CRC; LO-CRC, late-onset CRC.

The power of faecal microbial, metabolic and KO gene markers in distinguishing EO-CRC from EO-Control was also illustrated using the random forest model. In the training phase, 49 differentially abundant species, 36 metabolites, 59 KO genes and 27 integrated markers were selected as the optimal marker sets (online supplemental figure S5D, online supplemental tables S14, S15, S16 and S17). The three-feature integrated model was more accurate than the models using only taxa, metabolites or KO genes in discriminating between the EO-CRC and EO-Control groups (AUC: 91.02% vs 88.38%, 89.84% and 86.81%, respectively) (online supplemental figure S5B). Similarly, AUCs of 91.65%, 88.6%, 88.28% and 83.95%, respectively, were obtained in the testing phase (figure 7D). Similar results were found when 24 EO-CRC and 24 EO-Control individuals were compared in an external validation cohort using only the taxa, metabolites or KO markers (AUC: 77.34%, 75.35% and 75.52%, respectively) and the three-feature integrated markers (AUC: 78.47%) (figure 7E). A summary of the AUCs with 95% CIs is listed in online supplemental table S18. Taken together, our results show that patients with LO-CRC and EO-CRC have unique faecal microbial, metabolic and KO gene markers that better distinguish them from healthy individuals. These three-feature integrated markers have potential as non-invasive tools for detecting and distinguishing between LO-CRC and EO-CRC.
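
For readers less familiar with the metric, the AUC reported throughout can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. The following minimal sketch computes it directly from that definition. The function name and the toy scores are assumptions for illustration and are unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of case-control pairs ranked correctly.

    labels: 1 for cases, 0 for controls; scores: classifier outputs.
    Tied pairs count as half. This equals the Mann-Whitney U statistic
    divided by (number of cases * number of controls).
    """
    cases = [s for lab, s in zip(labels, scores) if lab == 1]
    controls = [s for lab, s in zip(labels, scores) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: two cases and two controls with invented scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred participants; a rank-based formulation would be preferable for much larger data sets.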

Discussion

This study reported a large-scale integrated analysis of the gut microbiome, metabolome and microbial KO genes in patients with LO-CRC and EO-CRC. These uniquely altered multiomics signatures can potentially be used to distinguish LO-CRC and EO-CRC patients from their age-matched controls. Furthermore, the characteristics of taxa, metabolites and KO genes were widely correlated with each other, and integrated analysis showed unique reactions in LO-CRC and EO-CRC patients, providing new mechanistic insights into the pathogenesis of the disease (online supplemental figure S6).

Decreased alpha diversity and alterations in microbial community composition and taxa abundance were common features of gut dysbiosis in LO-CRC and EO-CRC patients. Many consistently altered differential taxa and metabolites have been identified in LO-CRC and EO-CRC patients, which may indicate a general mechanism of CRCs. Declines in butyric acid-producing bacteria, such as Faecalibacterium prausnitzii, Eubacterium rectale and Roseburia intestinalis, were detected in both disease groups, coupled with a reduction of the short-chain fatty acid (SCFA) acetate and downregulation of GABA biosynthesis genes in LO-CRC. The reduction in SCFAs, which are potentially involved in inflammation regulation and tumour suppression, has been found in stool samples analysed in previous large CRC cohort studies.30 31 Moreover, B. fragilis and P. asaccharolytica increased in both CRCs, and a negative correlation was found between these two bacteria and the LO-CRC-reduced SCFA acetate. It has been reported that Bacteroides fragilis may promote CRC progression through its toxic metabolites and cooperate with other pathogenic bacteria, such as Escherichia coli.32 In recent studies, P. asaccharolytica has been consistently identified as a key microbial species associated with CRC, which may aggravate the disturbance of fatty acid metabolism and promote inflammation and cancer development.33–35

Despite the partial overlap in the diversity of the faecal microbiome between LO-CRC and EO-CRC patients, the spatial distribution was driven by their unique taxa signatures. For instance, F. nucleatum enrichment was only observed in LO-CRC patients, whereas Flavonifractor plautii, B. vulgatus and Parabacteroides spp CT06 were unique taxa signatures in the EO-CRC group. In fact, these species have been reported to be pathogenic in different diseases. F. nucleatum is an opportunistic pathogen in many chronic oral and intestinal diseases, and the infection rate increases with age, which partly explains its differential abundance only in the LO-CRC group.28 36 F. plautii, a flavonoid-degrading bacterium, affects antigen-induced T helper 2 cell immune responses in mice.37 Flavonoids, which are abundant polyphenolic compounds in plant-based diets, are mainly composed of polyphenolic secondary metabolites with broad-spectrum pharmacological activities but have not been found to be associated with ageing. Accumulating evidence from epidemiological, preclinical and clinical studies supports the role of polyphenols in preventing cancer, cardiovascular disease, type 2 diabetes and cognitive dysfunction.38 39 Notably, it was reported that dietary flavonoid intake was negatively correlated with F. plautii abundance, which may reflect the potential deficiency of flavonoid intake or increased degradation of flavonoids in patients with EO-CRC.40 In addition, increased faecal proteolytic and elastase activity, a characteristic of intestinal inflammation, was directly correlated with accumulation of the known proteolytic species B. vulgatus, supporting the hypothesis that diet-related faecal protein metabolism and the inflammatory microenvironment are enhanced in EO-CRC.41 The opportunistic pathogens Parabacteroides spp and B. vulgatus have been reported to be positively correlated with red meat intake and negatively correlated with the consumption of fruits and vegetables.28 Thus, the incidence of EO-CRC may be more closely related to diet, and a high intake of red meat appears to be associated with overgrowth of bacteria, which may lead to a more hostile gut environment in the EO-CRC population.

Metabolomic analysis revealed that levels of the amino acid metabolites L-phenylalanine and D-ornithine, histidine degradation genes (fctD, hutl, hutU, hutH) and phenylalanine biosynthesis genes (AROA1, pheA2) were upregulated in all CRCs. These metabolite discrepancies could be explained by differences in the gut microbiota and their enzymes involved in amino acid metabolism and synthesis.6 A growing body of evidence shows that phenylalanine biosynthesis is upregulated in patients with gastrointestinal cancer, potentially serving as a useful marker for cancer patients.42–44 Despite these similarities, LO-CRC and EO-CRC have unique metabolite profiles. Among them, increased accumulation of choline, tryptophan and bile acid metabolites in patients with EO-CRC was observed. These results were also confirmed by KO gene analysis, in which phosphatidylcholine and tryptophan biosynthesis genes were upregulated only in EO-CRC patients. Choline is an essential dietary nutrient for humans and is necessary for the synthesis of the neurotransmitter acetylcholine and the membrane lipid phosphatidylcholine. A previous study has revealed a critical role for gut microbes, such as Eubacterium, Actinobacteria and Proteobacteria, in the degradation of choline to red meat intake-related trimethylamine (TMA), which is further metabolised to trimethylamine N-oxide (TMAO) in the liver.45 The proinflammatory effect of TMAO activates inflammasomes. In colon epithelial cells, TMAO triggers inflammasome activation and reactive oxygen species production in a dose-dependent and time-dependent manner, suggesting a possible role for choline-derived TMAO from red meat intake in the pathogenesis of colitis and EO-CRC.46 Recent studies have also reported the accumulation of tryptophan and its derivatives in patients with CRC, and the alteration of tryptophan metabolic pathways, such as 5-hydroxytryptamine and indole-3-carbinol, was closely associated with the occurrence of CRC.47 48 Interestingly, the EO-CRC-enriched major secondary bile acid, deoxycholic acid, may increase vagal afferent firing in the proximal colon via 5-HT release.49

Analysis of the differential metabolome and KO genes unique to LO-CRC revealed that the metabolites L-arginine, acetate and acetaldehyde, and genes involved in GABA biosynthesis (MAO) were significantly downregulated, while the pentose phosphate pathway (gene rpiA) was significantly elevated. Arginine is involved in various aspects of tumour metabolism, including the synthesis of polyamines, nitric oxide, proline, nucleotides and glutamate. However, previous studies have shown that arginine has a two-sided effect. Some studies have found that arginine promotes tumour growth,50 whereas others suggest it to be a suitable candidate for cancer treatment.51 Although the role of arginine metabolism in CRC has not yet been systematically reviewed, our results suggest that arginine may have the potential to identify or explain differences in the pathogenesis of LO-CRC and EO-CRC. In addition, the reduction in the biosynthesis of potential antitumour compounds, such as GABA, is associated with the dysbiosis in LO-CRC patients and the decrease of butyric acid-producing bacteria, such as B. adolescentis.52 The inhibitory neurotransmitter GABA can also act as a growth factor to support the growth of specific bacteria, thereby helping shape intestinal homeostasis.53 Moreover, the pentose phosphate pathway is especially critical for cancer cells because it provides not only pentose phosphates for high rates of nucleic acid synthesis but also NADPH, which is required for the synthesis of fatty acids and cell survival under stress conditions.54 Accumulating data indicate that neoplastic lesions in cancer cells modulate the flux of the pentose phosphate pathway directly or indirectly, which may be an important part of LO-CRC pathogenesis.54 Taken together, the above evidence supports our hypothesis that distinct gut microbiota and metabolic activity in LO-CRC and EO-CRC are likely to be involved in the pathogenesis of the disease.

By integrating multiomics data, our study further revealed a range of microbiome-KO gene-metabolite interactions with potential mechanistic implications. The association between faecal SCFA acetate, cancer-promoting species B. fragilis, P. asaccharolytica, butyric acid-producing bacteria Faecalibacterium prausnitzii, Eubacterium rectale, Roseburia intestinalis and acetate metabolism-related KO genes (atoA, atoD) highlights that increased acetate metabolism and acetyl-CoA synthesis are unique features of LO-CRC, in agreement with previous results found in acetyl-CoA induced CRC metastasis.55 Acetyl-CoA is a key molecule in the central carbon metabolism of microbiota and participates in various cellular processes. The regulation of acetyl-CoA metabolism in host CRC cells by metabolic engineering of microbial cell factories to produce or consume acetyl-CoA may be a potential solution for LO-CRC therapy.56

Finally, we demonstrated that core gut microbiome signatures can distinguish patients with LO-CRC and EO-CRC from age-matched healthy controls across geographically separated cohorts based on a random forest classification model. The differences in microbiome markers between LO-CRC and EO-CRC suggest that key microbial species within signatures may play key roles in the pathophysiology and diagnosis of CRC at different ages. In addition to microbiome components, dietary and microbe-derived metabolites and microbial enzyme-involved interactions (KO gene characteristics) are also important components of the intestinal tumour microenvironment and are potentially involved in tumour pathogenesis and diagnosis.6 57 As observed in previous studies, exploring only the diagnostic value of a single omics has certain limitations.58 In fact, the three closely related omics, taxa, metabolites and KO genes explored in this study have their own unique signatures in LO-CRC and EO-CRC. To further support a potential causal relationship, metabolite-derived and KO gene-derived markers achieved similar diagnostic accuracy compared with microbiome markers in both LO-CRC and EO-CRC groups, and the three integrated signatures achieved the best performance in the training, test and independent validation cohorts. These findings suggest that multiomics-integrated biomarkers may be useful in predicting the risk of LO-CRC and EO-CRC as well as in differentiating CRC onset according to age. Recent practice guidelines recommend lowering the age for starting CRC screening to 45 years, but this change will also raise socioeconomic issues such as cost-effectiveness.59 60 Therefore, personalised precision screening strategies can help improve the detection of high-risk individuals. The robust accuracy of our non-invasive detection of EO-CRC can help promote colonoscopy in younger populations with CRC-related multiomics signatures to reduce the incidence of sporadic CRC.

The innovations in our study include the use of both metagenomics and metabolomics data for microbial phylogeny and derived metabolite and functional analysis, comparative evaluation of non-invasive omics testing methods through a multicentre study design and age-matched assessment. Despite these meaningful findings, because this was a cross-sectional study, some changes in bacterial composition and metabolites may be consequences of the cancer or bystander effects, and we will make efforts to clarify the specific causal relationship through mechanistic experiments in our future work. In addition, overfitting may occur in random forest models, although we sought to limit this problem by adjusting the model parameters. Validation of the diagnostic model based on a larger sample size and basic research on specific bacteria and metabolites will be conducted in the future.

In conclusion, by leveraging multiomics data, our study is the first attempt to reveal the common states of faecal microbiome dysbiosis and metabolome dysregulation in LO-CRC and EO-CRC patients and to identify unique microbe–metabolite interactions. Multiomics-based biomarkers have a robust advantage in distinguishing EO-CRC patients from age-matched controls. Although more mechanistic studies and clinical validation are needed, our study highlights the need for further investigation of the potential associations between gut microbiota-derived omics signatures and CRC risk in young adults, which may drive the clinical transformation of microbiome-derived strategies towards precise screening and diagnosis.

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

Ethics statements

Patient consent for publication

Ethics approval

Ethical approval was obtained from the Institutional Review Board of Fudan University Shanghai Cancer Center (ID. 050432-4-1911D). Participants gave informed consent to participate in the study before taking part.

Footnotes

  • CK, LL, GL, LD and YY contributed equally.

  • Contributors YM is responsible for the overall content as the guarantor. CK, LL, GL, LD, YY and YM designed the experiments. LL, LD, YY, JL, XL, DS and YM provided the clinical samples and performed the experiments. CK, YY and GL analysed the data. CK, GL and YM wrote the manuscript. All authors edited the manuscript.

  • Funding This work was supported by grants from the National Natural Science Foundation of China (Nos. 81920108026, 81871964), the National Ten Thousand Plan Young Top Talents (for YM), the Shanghai Science and Technology Development Fund (No.19410713300), the Program of Shanghai Academic Research Leader (No. 20XD1421200), the CSCO-Roche Tumor Research Fund (No. Y-2019Roche-079) and the Fudan University Excellence 2025 Talent Cultivation Plan (for YM). The authors take this opportunity to thank all of the participating patients and healthy volunteers for supporting this study by donating the precious samples used in this research.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.