We read with great interest the article by Clooney et al, which highlighted regional effects on the heterogeneity of the gut microbiota among populations with inflammatory bowel disease (IBD).1 Such regional effects would largely limit microbial-based diagnosis of diseases across regions. Although machine learning methods based on microbial features have been applied to the diagnosis of diseases such as IBD2 and type 2 diabetes,3 these methods cannot mitigate regional effects and therefore do not meet the demand for microbial-based cross-regional diagnosis of diseases.
Here, we propose a machine learning framework (online supplemental figure S1, accessible at: https://github.com/HUST-NingKang-Lab/EXPERT-Disease-GGMP), which integrates a neural network with transfer learning, to reduce regional effects in microbial-based cross-regional diagnosis. Importantly, transfer learning can ‘borrow’ established knowledge about diseases from a source city to assist disease diagnosis in a target city, especially when little is known about microbiota patterns in the target city.4
To assess the framework, we obtained genus-level taxonomy profiles from the Guangdong Gut Microbiome Project.5 These samples were collected from 14 cities, and seven representative diseases were selected for assessment (figure 1A and online supplemental table S1). We randomly divided the samples of each city into a training subset and a testing subset (80%:20% by default), then assessed three models: (1) Independent disease neural network (DNN) model: the DNN model was trained ab initio on the training subset of a city and tested on the testing subset of the same city. (2) Regional DNN model: the DNN model was trained ab initio on the training subset of one city A (source city) and tested on the testing subset of another city B (target city). (3) Transfer DNN model: the DNN model was trained ab initio on the training subset of city A, transfer learning was then applied using a certain proportion (from 20% to 80%) of samples from city B, and the resulting transfer DNN model was tested on the testing subset of city B (figure 1B and online supplemental figure S1).
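The three assessment schemes described above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of our framework: a logistic regression stands in for the DNN, transfer learning is approximated by warm-starting from the source-city weights and continuing training on a fraction of target-city samples, and the two "cities" are synthetic data with a shifted feature distribution. All function names and parameters are hypothetical.

```python
import math
import random

def make_city(rng, n, shift):
    """Synthetic stand-in for one city's profiles (assumption, not real data)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = shift + (1.0 if label else -1.0)  # city-specific offset
        X.append([rng.gauss(centre, 1.0), rng.gauss(-centre, 1.0)])
        y.append(label)
    return X, y

def split(X, y, rng, test_frac=0.2):
    """Random 80%:20% split into training and testing subsets."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(round(len(X) * test_frac))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def predict_proba(w, x):
    # w holds one weight per feature followed by a bias term.
    z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, w=None, epochs=200, lr=0.1):
    """Batch gradient descent; passing w continues from existing weights."""
    d = len(X[0])
    w = [0.0] * (d + 1) if w is None else list(w)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, t in zip(X, y):
            err = predict_proba(w, x) - t
            for j in range(d):
                grad[j] += err * x[j]
            grad[d] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def accuracy(w, X, y):
    hits = sum((predict_proba(w, x) >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(X)

rng = random.Random(0)
XA, yA = make_city(rng, 200, shift=0.0)  # source city A
XB, yB = make_city(rng, 200, shift=2.0)  # target city B
XA_tr, yA_tr, XA_te, yA_te = split(XA, yA, rng)
XB_tr, yB_tr, XB_te, yB_te = split(XB, yB, rng)

# (1) Independent: train and test within city B.
w_indep = fit(XB_tr, yB_tr)
# (2) Regional: train on city A, test on city B without adaptation.
w_regional = fit(XA_tr, yA_tr)
# (3) Transfer: start from the city A weights, adapt on 50% of B's training subset.
k = int(len(XB_tr) * 0.5)
w_transfer = fit(XB_tr[:k], yB_tr[:k], w=w_regional, epochs=50)

results = {
    "independent": accuracy(w_indep, XB_te, yB_te),
    "regional": accuracy(w_regional, XB_te, yB_te),
    "transfer": accuracy(w_transfer, XB_te, yB_te),
}
print(results)
```

Varying `k` between 20% and 80% of the target training subset mirrors the proportions examined for the transfer DNN model; the accuracies printed by this toy example say nothing about the accuracies of the real models.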
We found that the regional DNN model across cities had a low average accuracy of 0.506, compared with an average accuracy of 0.743 for the independent DNN model (pWilcox=2.22×10⁻¹⁶; figure 1C and online supplemental figure S2). This suggests that regional factors largely limit cross-regional diagnosis, as also indicated in previous studies.5 However, the transfer DNN model markedly increased prediction accuracy across cities, with an average accuracy of 0.829 (pWilcox=2.22×10⁻¹⁶, compared with the independent DNN model; figure 1C and online supplemental figure S2). Intriguingly, once the proportion of target-city samples used for transfer learning exceeded 50%, the transfer DNN model achieved even higher prediction accuracy than the independent DNN model (figure 1D). Furthermore, the transfer DNN models also performed well when we applied this approach to two intercontinental cohorts (online supplemental figures S3 and S4).
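A comparison of this kind can be sketched as follows. The variant of the Wilcoxon test is not specified here, so this example assumes the two-sample rank-sum form with a normal approximation (no continuity or tie correction of the variance). The accuracy values are hypothetical placeholders, not results from this study, and `average_ranks` and `rank_sum_test` are illustrative helper names.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])                 # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2.0        # Mann-Whitney U for the first group
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p

# Hypothetical per-city-pair accuracies, for illustration only.
regional_acc = [0.50, 0.48, 0.55, 0.52, 0.47]
independent_acc = [0.74, 0.71, 0.78, 0.69, 0.80]

u_stat, p_value = rank_sum_test(regional_acc, independent_acc)
print(u_stat, p_value)
```

With samples this small an exact test would normally be preferred; the normal approximation is used only to keep the sketch free of external dependencies.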
Moreover, our machine learning framework is useful for identifying region-specific microbes, as well as microbes shared across all regions. Using the ‘leave-one-feature-out’ method, we found that certain microbes were strongly affected by region, such as Enterobacteriaceae and Clostridium, while others were less affected, such as Parabacteroides and Faecalibacterium (online supplemental table S2). We speculate that the region-specific microbes may contribute to the effectiveness of the transfer DNN model in the cross-regional diagnosis of diseases.
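The general idea of a leave-one-feature-out analysis can be sketched as below: each feature is removed in turn, the model is refitted, and the change in accuracy is recorded. This is a generic illustration under stated assumptions, not the exact procedure used in our framework: a nearest-centroid classifier replaces the DNN, the data are a hand-made toy set, and the feature names are hypothetical. Running the same scoring separately per city would be one way to compare how strongly a feature depends on region.

```python
def centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, t in zip(X, y):
        acc = sums.setdefault(t, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[t] = counts.get(t, 0) + 1
    return {t: [s / counts[t] for s in sums[t]] for t in sorted(sums)}

def predict(cents, x):
    def dist(t):
        return sum((a - b) ** 2 for a, b in zip(cents[t], x))
    return min(cents, key=dist)  # ties resolve to the smaller label

def accuracy(X_tr, y_tr, X_te, y_te):
    cents = centroids(X_tr, y_tr)
    return sum(predict(cents, x) == t for x, t in zip(X_te, y_te)) / len(X_te)

def drop_feature(X, j):
    return [x[:j] + x[j + 1:] for x in X]

def leave_one_feature_out(X_tr, y_tr, X_te, y_te, names):
    """Accuracy lost when each feature is removed (larger = more important)."""
    base = accuracy(X_tr, y_tr, X_te, y_te)
    return {
        name: base - accuracy(drop_feature(X_tr, j), y_tr,
                              drop_feature(X_te, j), y_te)
        for j, name in enumerate(names)
    }

# Toy data: only the first feature separates the two classes.
names = ["genus_a", "genus_b", "genus_c"]  # hypothetical feature names
X_train = [[0.0, 0.5, 0.25], [0.125, 0.25, 0.75],
           [1.0, 0.25, 0.5], [0.875, 0.5, 0.5]]
y_train = [0, 0, 1, 1]
X_test = [[0.125, 0.5, 0.5], [0.0, 0.25, 0.5],
          [0.875, 0.5, 0.5], [1.0, 0.25, 0.5]]
y_test = [0, 0, 1, 1]

drops = leave_one_feature_out(X_train, y_train, X_test, y_test, names)
print(drops)
```

In this toy set, removing `genus_a` leaves the two class centroids identical, so accuracy falls to chance, whereas removing either of the other features changes nothing.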
Collectively, our study demonstrates that transfer learning can enable microbial-based cross-regional diagnosis of diseases with high accuracy and robustness, by using knowledge about microbial features across regions. This study provides a new avenue for overcoming regional limitations and facilitating microbial-based cross-regional diagnosis of diseases in clinical trials using artificial intelligence techniques.
Data accession: metagenomic sequencing samples are available in the European Bioinformatics Institute (EBI) database of the European Molecular Biology Laboratory (EBI accession number PRJEB18535) at https://www.ebi.ac.uk/ena/browser/view/PRJEB18535.
Patient consent for publication
Contributors: NW and KN conceived the idea and designed the study. NW performed the experiments, and analysed and visualised the data. NW, MC and KN contributed to editing and proofreading the manuscript. All authors read and approved the final manuscript.
Funding: This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 32071465, 31871334, and 31671374), and the National Key R&D Programme of China (Grant No. 2018YFC0910502).
Competing interests: None declared.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.