Abstract
Background The association between an imbalanced gut virus community and colorectal cancer (CRC) is well established, but the specific characteristics and genomic functions of these viruses remain poorly understood. This study aimed to investigate the lifestyle, gene function, single nucleotide polymorphisms (SNPs), structural variations (SVs), and phage-host interactions of gut viruses in CRC through the analysis of assembled viral genomes.
Methods Fecal samples were collected from CRC patients and healthy controls, and virus-like particles (VLPs) were isolated. VLP DNA (minimum fragment length of 15 kb, at least 10 μg per sample) was sequenced on PacBio platforms. The gut virus genomes were assembled with Canu and annotated with eggNOG. MetaPop and Sniffles2 were used for SNP and SV analysis, respectively.
Results After assembly and filtering, more than 23,000 high-quality gut virus genomes were obtained. The prevalence of novel viruses was higher in CRC patients than in healthy controls. Notably, the proportion of phages, particularly temperate phages, was significantly higher in CRC patients than in healthy controls (P < 0.001). KEGG pathway analysis of virus genes revealed enrichment in DNA repair and recombination, homologous recombination, and mismatch repair in CRC. Furthermore, a random forest model built on 15 genes with differential SNPs distinguished CRC patients from healthy controls with an AUC of 88.03%. Significant alterations were also observed in the interactions between gut phages and their host bacteria in CRC, particularly those involving CRC-associated bacteria. In particular, both the number and the relative abundance of Bacteroides fragilis-hosted phage genomes were significantly lower in CRC patients than in healthy controls (P < 0.05). Lastly, 15 viruses with differential abundances discriminated CRC patients from healthy controls with an AUC of 94.12% in the training cohort. These findings were validated in five global cohorts, with AUCs ranging from 71.94% to 81.72%.
Conclusions Multidimensional analysis based on PacBio sequencing revealed substantial changes in gut virus communities, lifestyle, gene function, SNP/SV, and phage-host interactions in CRC patients compared to healthy controls. Gut virus abundance and SNPs show promise as potential biomarkers for diagnosing CRC.
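For readers less familiar with the AUC values reported in the Results, the following minimal sketch shows how an AUC can be computed from classifier scores. It is an illustration only, not the authors' pipeline: the function name `auc_from_scores` and the example scores are hypothetical, and the scores stand in for the CRC probabilities that a model such as a random forest would output. An AUC of 0.88 corresponds to the 88.03% style of reporting used above.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC (hypothetical helper, for illustration).

    Returns the probability that a randomly chosen positive sample
    (label 1, e.g. CRC) receives a higher score than a randomly chosen
    negative sample (label 0, e.g. healthy control). Ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Made-up scores for three CRC samples (label 1) and three controls (label 0).
example_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auc_from_scores(example_scores, example_labels)
```

In this toy example, eight of the nine CRC-control pairs are ranked correctly, so the AUC is 8/9. A value of 0.5 would indicate no discrimination and 1.0 perfect separation.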