Original research
Underdevelopment of the gut microbiota and bacteria species as non-invasive markers of prediction in children with autism spectrum disorder
  1. Yating Wan1,2,3,4,5,
  2. Tao Zuo1,2,3,4,
  3. Zhilu Xu1,2,3,4,5,
  4. Fen Zhang1,2,3,4,5,
  5. Hui Zhan1,2,3,4,5,
  6. Dorothy CHAN6,
  7. Ting-Fan Leung6,
  8. Yun Kit Yeoh1,5,7,
  9. Francis K L Chan1,2,3,4,5,
  10. Ruth Chan8,
  11. Siew C Ng1,2,3,4,5
  1. 1 Centre for Gut Microbiota Research, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
  2. 2 Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China
  3. 3 Li Ka Shing Institute of Health Sciences, State Key Laboratory of Digestive Disease, The Chinese University of Hong Kong, Hong Kong, China
  4. 4 Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong, China
  5. 5 Microbiota I-Center (MagIC), Hong Kong, China
  6. 6 Department of Paediatrics, The Chinese University of Hong Kong, Hong Kong, China
  7. 7 Department of Microbiology, The Chinese University of Hong Kong, Hong Kong, China
  8. 8 Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, China
  1. Correspondence to Professor Siew C Ng, Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, Hong Kong; siewchienng{at}; Dr Ruth Chan, Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, China; ruth.chansm{at}


Objective The gut microbiota has been suggested to play a role in autism spectrum disorder (ASD). We postulate that children with ASD harbour an altered developmental profile of the gut microbiota distinct from that of typically developing (TD) children. Here, we aimed to characterise compositional and functional alterations in gut microbiome in association with age in children with ASD and to identify novel faecal bacterial markers for predicting ASD.

Design We performed deep metagenomic sequencing in faecal samples of 146 Chinese children (72 ASD and 74 TD children). We compared gut microbial composition and functions between children with ASD and TD children. Candidate bacterial markers were identified and validated by metagenomic analysis. Gut microbiota development in relation to chronological age was assessed using a random forest model.

Results ASD and chronological age had the most significant and largest impacts on children’s faecal microbiome while diet showed no correlation. Children with ASD had significant alterations in faecal microbiome composition compared with TD children characterised by increased bacterial richness (p=0.021) and altered microbiome composition (p<0.05). Five bacterial species were identified to distinguish gut microbes in ASD and TD children, with areas under the receiver operating curve (AUC) of 82.6% and 76.2% in the discovery cohort and validation cohort, respectively. Multiple neurotransmitter biosynthesis related pathways in the gut microbiome were depleted in children with ASD compared with TD children (p<0.05). Developing dynamics of growth-associated gut bacteria (age-discriminatory species) seen in TD children were lost in children with ASD across the early-life age spectrum.

Conclusions Gut microbiome in Chinese children with ASD was altered in composition, ecological network and functionality compared with TD children. We identified novel bacterial markers for prediction of ASD and demonstrated persistent underdevelopment of the gut microbiota in children with ASD which lagged behind their respective age-matched peers.

  • intestinal microbiology

Data availability statement

Data are available in a public, open access repository. Raw sequence data that support the findings of this study have been deposited in NCBI under accession code PRJNA686821.

Significance of this study

What is already known on this subject?

  • Alterations in faecal bacteriome have been reported in children with autism spectrum disorder (ASD), but causality is yet to be established in humans.

  • The microbiota–gut–brain axis, the bidirectional communication pathway between gut bacteria and the central nervous system, has a profound effect on social behaviours.

  • However, data on gut microbiome development during early age in children with ASD are lacking.

What are the new findings?

  • Gut microbiome composition was not associated with diet in this cohort.

  • We identified five bacteria markers that differentiate children with ASD from typically developing (TD) children in a discovery set (AUC 82.6%) and validated the findings in an independent cohort (AUC 76.2%).

  • Microbial functions relating to neurotransmitter biosynthesis are significantly decreased in children with ASD compared with TD children.

  • We demonstrated for the first time persistent under-development of gut microbiome in children with ASD relative to age-matched and gender-matched TD children.

How might it impact on clinical practice in the foreseeable future?

  • Our study supports the potential role of non-invasive prediction of ASD based on faecal bacteria markers and age-related bacteria development profile.

  • Future therapeutics targeting reconstitution of gut microbiota in early life and increasing abundance of neurotransmitter-synthesising bacteria such as Faecalibacterium should be explored for ASD.


Autism spectrum disorder (ASD) is a group of neurodevelopmental conditions that begins in early life and is characterised by impaired social communication and interactions as well as stereotyped, repetitive behaviour.1 The prevalence of ASD in children and adolescents is 0.36% in Asia2 and 1.85% in western countries.3 Over the past two decades, the incidence of ASD in China has increased from 2.80 per 10 000 in 2000 to 63 per 10 000 in 2015.4 5 Genetic research has highlighted the importance of de novo mutations in ASD6 but no single gene has been identified that substantially increases the risk of ASD. Apart from genetic factors, the gut microbiota has been suggested to play a role in ASD. The community of microorganisms in the gastrointestinal (GI) tract is known to influence brain physiology and social behaviour via a diverse set of pathways,7 8 including immune activation, production of microbial metabolites and peptides and production of various neurotransmitters and neuromodulators.9 In early childhood, a period of behavioural and biological development, gut microbes are thought to be essential in development by assisting in energy metabolism and modulating the immune system.10 Subramanian et al previously described an assembly of gut bacteria that matured with chronological age in healthy children and a comparatively ‘immature’ microbiota profile in malnourished children.11 In addition to biological underdevelopment, the development of gut microbiota was also shown to be closely related to cognitive development.12 Age-related changes in bacterial alpha diversity have been reported in ASD, but the developmental trajectory of gut microbiota in ASD has not been reported.13 Microbiota maturity may provide a microbial measure of child development as a way of classifying health or disease states and provide new insight into disease occurrence, progression and treatment.
Given that the gut microbiome has been linked to brain function via the gut-brain axis, we hypothesise that an underdeveloped gut microbiota may be associated with ASD.

Currently, diagnosing ASD can be challenging because there is no definitive medical test and diagnosis is based on physician assessment. Faecal bacterial biomarkers that predict ASD could therefore facilitate early treatment and intervention. We identified distinct changes in the gut microbiome of children with ASD compared with typically developing (TD) children and identified five candidate bacterial species markers that may serve as non-invasive biomarkers for ASD. We further showed that development of the gut microbiome in children with ASD lagged behind that of chronological age-matched TD peers, suggesting an under-development of the gut microbiome in children with ASD compared with TD children.


ASD and age had the most significant impact on children’s gut microbiome

In total, 64 preschoolers aged 3–6 years with a diagnosis of ASD and 64 TD preschoolers matched by age (within 6 months) and gender were recruited from the community (table 1, online supplemental table S1). We first examined associations between host factors and children’s faecal microbiome composition. Among the examined host factors, chronological age, ASD and body mass index (BMI) showed the largest associations with faecal microbiome composition based on effect size (online supplemental figure S1A, figure 1A; permutational multivariate analysis of variance (PERMANOVA), false discovery rate (FDR) <0.05). Diet was not correlated with gut microbiome composition. The impacts of ASD, chronological age and BMI on the gut microbiome were independent of each other (online supplemental figure S1B, online supplemental table S2). To further explore how host factors impacted gut microbiome composition, we interrogated the correlations between individual host factors and detected bacterial species. Nineteen bacterial species were significantly correlated with ASD, age, gender, length of breast feeding (months), diet quality, delivery mode and gestational age (MaAsLin, figure 1B). Alistipes indistinctus and candidate TM7c were positively correlated with ASD (abundance significantly higher in ASD vs TD, figure 1B). Lachnospiraceae bacterium was positively correlated with 3-day diet quality. Parabacteroides merdae was decreased in children delivered via caesarean section compared with vaginal delivery, and this species was reduced in ASD irrespective of delivery mode (Mann-Whitney U test, p<0.05, online supplemental figure S1C). Altogether, our data indicate that chronological age, ASD and BMI were the main factors associated with gut microbiome variation in this cohort.
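The PERMANOVA test on Bray-Curtis dissimilarities referred to above can be illustrated with a minimal Python sketch. This is not the analysis code used in the study (the figure legend indicates the adonis function was used); it handles a single grouping factor only, and the function names and the toy abundance table are hypothetical values chosen for illustration.

```python
import random

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def pseudo_f(dist, labels):
    """Return (pseudo-F, R2) for one grouping factor from a distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, each group divided by its own size.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        pairs = sum(dist[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:])
        ss_within += pairs / len(idx)
    ss_between = ss_total - ss_within
    f_stat = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return f_stat, ss_between / ss_total

def permanova(samples, labels, n_perm=999, seed=0):
    """Permutation p value for the pseudo-F statistic (labels are shuffled)."""
    n = len(samples)
    dist = [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    f_obs, r2 = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: rows are samples, columns are species abundances.
toy_samples = [[10, 0, 1], [9, 1, 0], [8, 2, 1], [0, 10, 1], [1, 9, 0], [2, 8, 1]]
toy_labels = ["ASD", "ASD", "ASD", "TD", "TD", "TD"]
f_obs, r2, p_value = permanova(toy_samples, toy_labels)
```

Here R2 (the share of total variation attributed to the grouping factor) plays the role of the effect size. A multi-factor analysis with FDR correction, as described in the text, would require a fuller implementation.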

Table 1

Clinical information of study participants

Figure 1

Host factors impacted the gut microbiome in children. (A) The effect size of host factors on children’s gut bacteriome variation via multivariate analysis. Effect size and statistical significance were determined via PERMANOVA with the adonis function. Only significant host factors are coloured; adjusted p values were used. *P<0.05, **p<0.01. (B) Heatmap of correlations between host factors and gut bacterial species. Correlation coefficients were calculated through MaAsLin. Only statistically significant correlations (FDR <0.1) were plotted. The colour intensity of the bottom bar is proportional to the correlation coefficient, where blue indicates inverse correlations and red indicates positive correlations. ASD, autism spectrum disorder; BMI, body mass index.

Identification of faecal bacteria species as potential biomarker for ASD

Microbial richness was higher in children with ASD than in age-matched and BMI-matched TD children (t-test, p<0.05, figure 2A). At the genus level, genera such as Clostridium, Dialister and Coprobacillus were enriched in children with ASD whereas Faecalibacterium, known to produce butyrate,14 was significantly decreased (online supplemental figure S2B, online supplemental table S3, FDR <0.05). At the species level, gut microbiome composition in children with ASD was significantly distinct from that in TD children (figure 2B, PERMANOVA, p<0.05, based on the Bray-Curtis dissimilarities). Furthermore, the gut microbiome was more heterogeneous across children with ASD compared with TD children, as demonstrated by a significant increase in interindividual microbiome dissimilarity in children with ASD relative to TD children (Bray-Curtis dissimilarities, t-test, p<0.0001, online supplemental figure S2A). These species-level compositional differences were largely attributed to five bacterial species: Alistipes indistinctus, candidate division_TM7_isolate_TM7c, Streptococcus cristatus, Eubacterium limosum and Streptococcus oligofermentans (identified by Random Forest (RF) via 10-fold cross-validation, figure 2C). Using these five taxa, an RF model returned an area under the curve (AUC) value of 82.6% in distinguishing between children with ASD and TD children. To validate the biomarkers, we obtained faecal metagenomes from an independent cohort consisting of eight children with ASD and 10 TD children recruited from different community sources in Hong Kong (validation set). RF classification using the same five biomarkers showed an AUC of 76.2% in this validation cohort (figure 2D). These results indicate that compositional differences in gut microbiota between TD children and children with ASD could serve as a non-invasive screening tool for ASD.
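For readers less familiar with the AUC metric reported above, the following Python sketch computes it directly from classifier scores as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The function name and the example scores are hypothetical; the scores stand in for the predicted probability of ASD that an RF classifier might return and are not data from this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (here, ASD) and 0 for the negative class (TD).
    scores: classifier scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores for three ASD and three TD samples.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
example_auc = roc_auc(example_labels, example_scores)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives context for the 82.6% and 76.2% values reported for the discovery and validation cohorts.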

Figure 2

Alteration in gut microbiome in Chinese children with ASD. (A) Comparison of faecal bacterial genera richness between children with ASD and TD children. For boxplots, the boxes extend from the first to third quartile (25th to 75th percentiles), with the median depicted by a horizontal line. Statistical significance between the ASD and TD groups was determined by t-test, *p<0.05. (B) NMDS (non-metric multidimensional scaling) of bacterial community composition in the ASD and TD groups based on Bray-Curtis dissimilarities; statistical significance was determined by PERMANOVA, p<0.05. (C) Comparison of the relative abundance of five bacterial species between ASD and TD. The five bacterial species markers were identified by random forest and 10-fold cross-validation. (D) Random forest classifier performance for classifying ASD versus TD microbiome. Receiver operating characteristic curves depict trade-offs between RF classifier true and false positive rates as classification stringency varies. AUC values of the training set, test set and validation set are shown by the red, blue and green lines, respectively. ASD, autism spectrum disorder; RF, Random Forest; TD, typically developing.

Gut bacterium ecological network in children with ASD versus TD children

To understand potential relationships among bacteria within the gut microbiota of TD children and children with ASD, we assessed ecological interactions among the detected bacterial species by evaluating pairwise Spearman’s rank correlations of their relative abundances. Most of the correlations in both ASD and TD were positive (figure 3), indicating that the ecosystem was primarily dominated by microbial cooperation instead of competition. A stronger correlation network was observed in children with ASD in contrast to the sparse correlation network in TD children, as indicated by both the number (671 vs 368) and the coefficients of significant correlations (figure 3, FDR <0.05, |correlation coefficient|>0.5). In TD children, bacteria from the phylum Firmicutes showed the most interspecies interactions, and the genus Lactobacillus played a key and central role in bacterial interactions. As shown in figure 3, Bacteroidetes showed robust correlations and occupied a central position in the ecological network of children with ASD. The number of correlations involving Porphyromonas was high in children with ASD (number of interactions >10). Species from Porphyromonas are associated with the development of neurodegenerative diseases.15 Additionally, several Clostridium species enriched in ASD closely interacted with each other and formed a connected group in ASD. Clostridia species have been linked with ASD via production of clostridial toxins which have pathological effects in the central nervous system.16 Such changes in the gut microbiome ecological network suggest that interspecies communication or interplay was significantly altered in children with ASD.
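The network construction described above, pairwise Spearman correlations followed by a threshold on the absolute coefficient, can be sketched in Python as follows. This is a simplified illustration only: it applies the |correlation coefficient| >0.5 cut-off but omits the FDR-based significance filter used in the study, and the species names and abundance values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_edges(abundance, threshold=0.5):
    """List (species_a, species_b, rho) for pairs with |rho| above threshold."""
    names = sorted(abundance)
    edges = []
    for pos, a in enumerate(names):
        for b in names[pos + 1:]:
            rho = spearman(abundance[a], abundance[b])
            if abs(rho) > threshold:
                edges.append((a, b, rho))
    return edges

# Hypothetical relative abundances of four species across five samples.
toy_abundance = {
    "sp_A": [1, 2, 3, 4, 5],
    "sp_B": [2, 4, 5, 8, 9],
    "sp_C": [5, 4, 3, 2, 1],
    "sp_D": [3, 1, 4, 1, 5],
}
toy_edges = correlation_edges(toy_abundance)
n_positive = sum(1 for _, _, rho in toy_edges if rho > 0)
```

Counting the edges per group, and the edges per node, gives the kind of summary (number of correlations, most connected taxa) that the text reports for the ASD and TD networks.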

Figure 3

Gut bacterium-bacterium ecological network in children with ASD versus TD children. Correlations between bacteria-bacteria at the species level in ASD and TD, respectively. Correlations between taxa were calculated through Spearman’s rank correlation analysis. Statistical significance was determined for all pairwise comparisons. Only statistically significant correlations (FDR <0.05) with |correlation coefficient| >0.5 were plotted. The correlation network was visualised via Cytoscape (3.8.1). The size of node, corresponding to individual microbial species, is proportional to the number of significant inter-species correlations. The colour of node indicates the phylum to which the corresponding microbial species belong. The colour intensity of connective lines is proportional to the correlation coefficient, where blue lines indicate inverse correlations and red lines indicate positive correlations. ASD, autism spectrum disorder; TD, typically developing.

Pathways related to neurotransmitter biosynthesis were decreased in the gut microbiome of children with ASD

Gut microbiome functionality in relation to these compositional changes in ASD was assessed using HUMAnN2.17 Pathways related to neurotransmitter biosynthesis were significantly decreased in the gut microbiome of children with ASD compared with TD children (Mann-Whitney U test, figure 4A). Chorismate biosynthesis I (ARO-PWY) and PWY-6163, involved in the biosynthesis of chorismate (a precursor for tryptophan biosynthesis), were significantly decreased in the gut microbiome of children with ASD compared with TD children. Concomitantly, the COMPLETE-ARO-PWY function corresponding to the biosynthesis of aromatic amino acids (including L-tryptophan and L-tyrosine), all starting from the principal common precursor chorismate,18 was also decreased in ASD. In addition, the microbial pathway for biosynthesis of glycine (an inhibitory neurotransmitter) and the abundance of microbial genes encoding glutamate synthase, an enzyme that produces glutamate, were depleted in children with ASD (online supplemental figure S3B) compared with TD children. The species Ruminococcus sp. 5_1_39BFAA, Eubacterium rectale and Ruminococcus bromii, Faecalibacterium prausnitzii were the main contributors to the biosynthesis of L-tryptophan and glycine, respectively (figure 4B,C), and Faecalibacterium also showed lowered abundances in ASD. Notably, the contribution of Faecalibacterium prausnitzii to serine-glycine metabolism was significantly decreased in children with ASD compared with TD children (online supplemental figure S3A). Altogether, our data show that the microbiome functionalities associated with neurotransmitter synthesis were markedly reduced in children with ASD, which may have profound functional consequences on the psychiatric abnormalities in ASD.
Since neurotransmitters enable signal transmission across synapses to nerve cells where synaptic dysfunction is thought to contribute to the pathophysiology of ASD,19 these results suggest that the role of gut microorganisms in ASD is related to amino acid metabolism.

Figure 4

Functionality alterations in the gut microbiome in ASD. (A) Abundance of pathways related to neurotransmitter biosynthesis in children with ASD versus TD children. The significance was determined by Welch t-test and is indicated as *p<0.05. Species contribution to the indicated microbial functionalities, aromatic amino acid (B) and L-serine and glycine biosynthesis (C), in the gut microbiome of children with ASD and TD children, respectively. In each functional module, the biosynthesis was contributed by a mixture of species (blocks of each stacked bar) in the gut, and each stacked bar represents one subject’s metagenome. ASD, autism spectrum disorder; TD, typically developing.

Building on the gut microbiome functionality profile and clinical parameters of the study subjects, we explored the relationships between the abundance of microbiome functional modules and host factors via MaAsLin. We found that age was the only factor significantly associated with microbial functional pathways (online supplemental table S4, FDR <0.1). Of those associations, cell structure biosynthesis and purine nucleotide biosynthesis pathways decreased with age. Therefore, chronological age had an effect on the functionality of the gut microbiome in children.
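FDR thresholds are applied throughout these analyses. The source does not name the correction procedure, so the Python sketch below assumes the widely used Benjamini-Hochberg method as an illustration; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order.

    Assumption: the BH step-up procedure; the article does not state which
    FDR method was applied.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, keeping the adjusted
    # values monotone by carrying the running minimum downwards.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for four host factor/pathway associations.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [q < 0.1 for q in adj_p]
```

With a threshold of FDR <0.1, as used for the pathway associations, an association is retained when its adjusted value falls below 0.1.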

Altered gut microbiome development in children with ASD

Given the influence of host chronological age on composition and functionality of the gut microbiome, we hypothesised that the gut microbiomes of children with ASD develop differently compared with TD children. We modelled microbiota development in TD children using RF to regress relative abundances of the total community against chronological age of TD children at time of faecal sample collection. Consequently, 26 age-discriminatory bacterial species were identified (figure 5A) as proxies of typical development of children’s gut microbiota with age (figure 5B, left panel). In contrast to TD children, abundances of these taxa were substantially disrupted in children with ASD and did not show correlations with age (figure 5B, right panel). For instance, in TD children relative abundances of Eubacterium limosum and Bifidobacterium breve decreased with age, whereas Eubacterium brachy, Haemophilus parainfluenzae, Bacteroides cellulosilyticus and Lachnospiraceae bacterium 3_1_46FAA increased with age (figure 5B, left panel). These associations were absent in children with ASD (figure 5B, right panel), suggesting abnormal development of the gut microbiota during early life growth.

Figure 5

Underdevelopment of age-discriminatory taxa in ASD. (A) Twenty-six species were identified as age-discriminatory bacterial taxa via random forest regression of relative abundances of faecal bacterial species against host chronological age in TD subjects. The age-discriminatory species were ranked in descending order of their importance to the accuracy of the model. Importance was determined based on the percentage increase in mean-squared error of microbiota age prediction when the relative abundance values of each taxon were randomly permuted. The insert shows the error from five repeats of 10-fold cross-validation as a function of the number of input bacterial species (blue line). (B) Heatmap of the relative abundances of the 26 age-discriminatory bacterial taxa plotted against the chronological age spectrum (months) in TD and ASD children, respectively. (C) Underdevelopment of the gut microbiome in ASD children versus TD children. The microbiota age prediction model was established as a function of the biological age in TD children and was then employed to predict the microbiome age against chronological age in children with ASD. ASD, autism spectrum disorder; TD, typically developing.

To validate our finding, we developed a sparse microbiome-age prediction model as a function of chronological age in TD children based on the abundances of the 26 age-discriminatory species (figure 5C). In TD children, the predicted microbiota-age increased linearly with age, illustrating steady development of the gut microbiota with age. However, when employing the microbiota-age model developed in TD children to predict the microbiota-age of children with ASD, we found that the gut microbiota of children with ASD did not keep up with the chronological age of the host, as illustrated by the shallower slope compared with TD children (slope of the linear model: 0.10 vs 0.31 respectively, figure 5C). These data altogether suggest that children with ASD have impaired development of their gut microbiota during childhood growth. Because the gut microbiota co-evolves with children to develop a mutualistic and symbiotic relationship, abnormal gut microbial development in childhood may have a long-lasting effect on host health.
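The slope comparison above amounts to fitting a straight line of predicted microbiota-age against chronological age in each group. A minimal Python sketch of that least-squares fit is given below; the function name and all age values are invented for illustration and are not study data.

```python
def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

# Invented example: chronological age (months) and the microbiota-age that a
# model trained on TD children might predict for each group.
chronological_age = [36, 48, 60, 72]
predicted_age_td = [47.0, 51.0, 54.0, 58.5]
predicted_age_asd = [50.0, 51.0, 52.5, 53.5]

slope_td, _ = ols_fit(chronological_age, predicted_age_td)
slope_asd, _ = ols_fit(chronological_age, predicted_age_asd)
lagging = slope_asd < slope_td  # a shallower slope indicates slower maturation
```

A slope well below that of the TD group means that each additional month of chronological age is accompanied by a smaller gain in predicted microbiota-age, which is the sense in which the microbiota is described as underdeveloped.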


To our knowledge, this study represents the most in-depth human faecal microbiota study in children with ASD and TD children in the Chinese population. By integrating composition and functionality of the gut microbiome in association with children’s characteristics, we demonstrate abnormal development in association with age and functional distortions in the gut microbiomes of children with ASD. Notably, potential pathogens such as Clostridium and Alistipes indistinctus were enriched in children with ASD whereas Faecalibacterium was underrepresented. Clostridium has been reported to be associated with brain tissue damage and neurological disorders by producing clostridial toxins that could distort dendritic arbour complexity during childhood development.20 In addition, a higher relative abundance of Clostridium was previously reported in children with ASD in an Italian cohort,16 indicating that its enrichment could be common across biogeographies. Nevertheless, there is still large variability in the gut microbial taxa implicated in children with ASD across populations and studies.21–23 In contrast to our observations, a study in Japan has shown that the genus Faecalibacterium was more abundant in children with ASD.21 Similarly, the abundance of Alistipes was lower in children with ASD in a European study in contrast to our Chinese cohort.16 This is analogous to other studies where geography was depicted as a critical contributor to gut microbiome variations.24 25 Given that geography and population features are tightly linked with diet, environment and lifestyle, these factors are likely to contribute to gut microbiome variations in children.

Extensive research indicates that diet plays a role in shaping the gut microbiome.26 Nutrients including carbohydrates, proteins and fats serve as fuel for the gut microbiome. Children with ASD have symptoms of repetitive behaviours and restricted interests which could affect eating habits and thus contribute to nutrient deficiencies. As such, dietary differences between TD and ASD children could confound comparisons of gut microbiota composition.27 To address the possibility of dietary differences among children surveyed in this study, we assessed associations between their dietary intake and gut microbiome composition, taking into account other parameters such as age, gender and BMI. The results showed that diet did not have a statistically significant effect on gut microbiome variation (figure 1A, online supplemental figure S1A), indicating diet was unlikely to have been a confounding factor in this cohort.

Ecological networks of gut microbiota are considered critical for host health and well-being, because they indicate that beneficial symbionts and their associated functions are maintained over time.28 29 Altered bacterial networks have been described in inflammatory bowel disease and depression, both characterised by disrupted correlation networks and disturbance to key bacteria.30 31 In our study, the gut microbial community of children with ASD showed a more complex ecological network in which more interactions between bacterial species were observed. Microbial communities showing high cooperation are regarded as less stable than competitive communities.32 As shown in a previous study, gradually increasing only the proportion of cooperative interactions within communities nearly always decreases the overall return rate and the likelihood of stability.33 As for key taxa in the network in ASD, Bacteroidetes produce propionic acid and other short-chain fatty acids (SCFA). Rat models administered with propionic acid show increased restrictive/repetitive behaviours as well as impaired social behaviour.34 Zhang et al revealed that Porphyromonas gingivalis induces cognitive dysfunction, mediated by neuronal inflammation in mice.15 In our study, Porphyromonas asaccharolytica and Porphyromonas bennonis showed the most interactions and were identified as key species in the network in children with ASD. The associations between these bacteria and ASD require further exploration. In addition, Clostridium species associated with neurological disorders22 were closely connected with each other within the same genus. Interactions between unfavourable bacteria may enhance and accelerate disease progression.35 Taken together, an unstable and unfavourable ecosystem was observed in the gut microbiome of children with ASD.

Alteration in microbial abundances and interactions could influence many functional aspects of the gut microbiome including physiochemical changes, metabolite exchange and signal transduction.36 In our study, an interesting finding was that several neurotransmitter biosynthesis-related pathways were significantly decreased in children with ASD relative to TD children. Depleted tryptophan biosynthesis and glutamate synthase may have detrimental effects on host psychiatric responses and this has been implicated in depression and other psychiatric disorders.37 The above data indicate that the neurotransmission-related pathways and metabolites might be impaired or insufficient to support normal physiological and psychiatric activities in ASD. Microbial taxa such as Ruminococcus spp, Lachnospiraceae spp and Faecalibacterium spp were predominant players in functional pathways including aromatic amino acid, L-serine and glycine biosynthesis. Although these two pathways were reduced in children with ASD, the relative abundances of Ruminococcus spp and Lachnospiraceae spp were not significantly changed in the ASD cohort compared with TD. These observations suggest that the decrease in L-serine and glycine biosynthesis pathways in children with ASD was predominantly due to loss of Faecalibacterium prausnitzii rather than Ruminococcus spp or Lachnospiraceae spp.

In summary, our study shows for the first time that the gut microbiota of children with ASD is abnormally developed and lags behind that of age-matched peers. As development of microbial communities within the GI tract during childhood represents a critical window of human growth and health, shifts in the gut microbiota during early life development may have important functional roles in the pathogenesis of ASD and thus warrant extensive investigation.


Cohort description and study subjects

For the discovery set, 64 Chinese children (aged between 3 and 6 years) with a diagnosis of ASD were recruited from a regional teaching hospital (Prince of Wales Hospital, Shatin, Hong Kong) which serves a catchment area of 1.5 million people (one-seventh of the Hong Kong population), and 64 TD children matched by age (within 6 months) and gender (table 1, online supplemental table S1) were recruited from other sources in the same catchment area within the same period. In the validation cohort, a separate group of children with a diagnosis of ASD was recruited from non-governmental ASD organisations, the Prince of Wales Hospital and parents’ support groups. Controls in the validation group were recruited from community sources in Hong Kong during the same period. A structured interview was arranged for the families for data and sample collection. The study protocol was performed in compliance with the Declaration of Helsinki and approved by the Clinical Research Ethics Committee of The Chinese University of Hong Kong (CREC Ref. No.: 2014.026 and 2016.607).

Children with ASD were diagnosed by paediatricians or clinical psychologists according to the standard of the fourth or fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV or DSM-V). TD children are free of developmental delay and do not have any first-degree relatives with ASD. Parents of TD children were first asked to complete the Chinese validated version of Social Responsiveness Scale second edition (SRS-2).38 Only those screened negative with SRS-2 or those positive according to SRS-2 but subsequently ascertained by a developmental paediatrician as free of symptoms of ASD were included in this study. Children who met any of the following criteria were excluded from the study: children with diagnosis of chronic seizures, suffering from recent infection 1 month prior to data and sample collection, having disorders that affect dietary/physical activity habits, usage of antibiotics and antifungal medications as well as prebiotics and probiotics supplements 1-month prior to data and sample collection, concurrent or recent (ie, 1 month) participation in any trials or dietary intervention programmes, and having other conditions as judged by the investigators as ineligible to participate.

Questionnaires and measurements

A standardised questionnaire was used to capture data on family demographics and other aspects of children’s health, such as birth data. Children’s weight to the nearest 0.1 kg and height to the nearest 1 mm were measured.

Children’s current and usual dietary intake were assessed using a 3-day food record and a validated Chinese version of the Food Frequency Questionnaire (online supplemental table S5, online supplemental table S6).39 Parents were asked to complete both questionnaires in a face-to-face interview with assistance from trained research staff. Daily nutrient intake and food consumption were calculated using the nutrition analysis software Food Processor Nutrition Analysis and Fitness software V.8.0 (ESHA Research, Salem, USA). Dietary and nutrient data were used to generate the Chinese Children Dietary Index (CCDI).40 Data on children’s vitamin A and detailed fatty acid intake, on whether the children had breakfast or dinner with parents, and on children’s sedentary behaviours were not available in the present study, so these components were removed from the calculation of the CCDI. The maximum total score (CCDI-TS) was therefore reduced from the original 160 points to 120 points in the present study, with a higher score indicating better diet quality. A sub-component of the CCDI, the Diet Variety Score, was also used to measure children’s diet variety. Daily consumption of at least one serving from each of the food groups (grains, vegetables, fruits, dairy/beans, and meat/poultry/fish/eggs) was used to calculate diet variety. A total score of 0 to 10 was generated, with a higher score indicating higher diet variety.
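The diet variety rule described above can be illustrated with a short sketch. The text gives the five food groups and the 0 to 10 range but not the points awarded per group, so the value of 2 points per group used here is an assumption made only so that five groups sum to 10. The function and key names are hypothetical.

```python
# Food groups named in the text; the dictionary keys are hypothetical labels.
FOOD_GROUPS = (
    "grains",
    "vegetables",
    "fruits",
    "dairy_beans",
    "meat_poultry_fish_eggs",
)

# ASSUMPTION: 2 points per group, chosen so that five groups give a 0-10 range.
# The original CCDI scoring scheme may weight the groups differently.
POINTS_PER_GROUP = 2


def diet_variety_score(daily_servings):
    """Return a 0-10 variety score from a mapping of food group -> daily servings."""
    score = 0
    for group in FOOD_GROUPS:
        # A group counts only if at least one full serving is eaten per day.
        if daily_servings.get(group, 0) >= 1:
            score += POINTS_PER_GROUP
    return score
```

A child eating at least one daily serving from every group would reach the maximum of 10 under this assumed weighting.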

Supplemental material

Faecal DNA extraction

Faecal bacterial DNA was extracted using a Maxwell RSC PureFood GMO and Authentication Kit (Promega) with modifications to increase the yield of DNA. Approximately 100 mg of each stool sample was added to 800 µL TE buffer (pH 7.5), 16 µL beta-mercaptoethanol and 250 U lyticase, and digested at 37°C for 90 min. The sample was then pelleted by centrifugation at 13,000×g for 3 min. After pretreatment, the precipitate was resuspended in 800 µL CTAB buffer (from the Maxwell RSC PureFood GMO and Authentication Kit, used according to the manufacturer’s instructions) and mixed. Samples were heated at 95°C for 5 min and cooled down. Nucleic acid was released from the samples by vortexing with 0.5 mm and 0.1 mm beads at 2850 rpm for 15 min. Following that, 40 µL Proteinase K and 20 µL RNase A were added and the sample was digested at 70°C for 10 min. Finally, the supernatant was placed in a Maxwell RSC instrument for DNA extraction. The extracted faecal DNA was used for ultra-deep metagenomic sequencing on an Illumina NovaSeq 6000 (Novogene, Beijing, China).

Quality control of raw sequences and data analysis

Raw sequence reads were trimmed using Trimmomatic (V.0.36) to remove adapters and low-quality regions, and contaminating human reads were then removed using KneadData (reference database: GRCh38.p12). Paired-end reads were concatenated. The composition of bacterial communities was profiled from the trimmed reads using MetaPhlAn2.41 Bacterial functions were predicted using HUMAnN2 (ChocoPhlAn database).17

Bacteriome data analysis

The collected subject metadata included participants’ anthropometric features, dietary quality, gestational age, mode of delivery and feeding mode (online supplemental table S1). Associations between metadata factors and community composition were assessed using PERMANOVA (adonis)42 in the vegan R package. The effect of each host factor was expressed as the proportion of variance it explained (R2), and p values were generated from 999 permutations. All p values were FDR adjusted, with FDR <0.05 considered significant.
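The permutation and FDR steps can be sketched as follows. This is an illustration and not the authors' code: the text does not name the FDR procedure, so Benjamini-Hochberg is assumed here, and the function names are hypothetical. The permutation p value uses the usual (count + 1)/(permutations + 1) form, so with 999 permutations the smallest attainable p value is 0.001.

```python
def permutation_p_value(observed_stat, permuted_stats):
    """P value as the share of permuted statistics at least as large as the observed one."""
    at_least_as_large = sum(1 for s in permuted_stats if s >= observed_stat)
    # Adding 1 to numerator and denominator counts the observed statistic itself.
    return (at_least_as_large + 1) / (len(permuted_stats) + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    ASSUMPTION: the source only says "FDR adjusted"; BH is one common choice.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values never increase as the raw p values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Each host factor would contribute one raw p value to the list passed to `benjamini_hochberg`, and factors with an adjusted value below 0.05 would be reported.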

Richness (Chao1 index) was calculated using phyloseq and vegan in R. Gut microbiota composition was visualised via non-metric multidimensional scaling based on Bray-Curtis dissimilarities at the species level. Species relative abundance tables were imported into R using phyloseq for statistical analysis.
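For readers unfamiliar with the two measures, a minimal sketch of each is given below. It is written in plain Python for illustration and is not the phyloseq or vegan implementation. The Chao1 function uses the bias-corrected form and assumes integer count data, since the estimator depends on the numbers of singletons and doubletons. The function names are hypothetical.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness from integer counts per species."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # Species seen once or twice are used to estimate unseen richness.
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(sample_a, sample_b))
    denominator = sum(x + y for x, y in zip(sample_a, sample_b))
    # Two empty samples are treated as identical.
    return numerator / denominator if denominator else 0.0
```

The pairwise Bray-Curtis values between all samples form the dissimilarity matrix on which an ordination such as non-metric multidimensional scaling operates.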

LEfSe analyses were performed on the Huttenhower lab Galaxy server (http://huttenhower.sph.harvard.edu/galaxy/) by importing the relative abundance values of bacterial genera. MaAsLin2 analysis was performed on microbial functions and bacterial species to identify features associated with host factors. FDR was used in LEfSe and MaAsLin2 to adjust p values.

Co-occurrence networks of gut bacterial species within each of the two groups were calculated using Spearman’s rank correlation coefficient with the psych package in R. The networks were then constructed in Cytoscape (V.3.8.1). In each network, edges denote significant pairwise correlations between species (FDR <0.05) with a correlation coefficient >0.5 (red edges) or <−0.5 (blue edges). The size of each node is proportional to the number of interactions of the species, and node colour denotes phylum.
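The edge rule can be sketched in a few lines. The example below computes Spearman correlations in plain Python and keeps only pairs with a coefficient above 0.5 or below −0.5. It is an illustration and not the psych or Cytoscape workflow, the function names are hypothetical, and the significance filter (FDR <0.05) is assumed to be applied separately.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def build_edges(abundance, threshold=0.5):
    """Return (species_a, species_b, rho, colour) for strongly correlated pairs.

    NOTE: the FDR < 0.05 significance filter is not implemented here.
    """
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if rho > threshold:
            edges.append((a, b, rho, "red"))
        elif rho < -threshold:
            edges.append((a, b, rho, "blue"))
    return edges
```

The number of edges attached to each species in the returned list corresponds to the node size described above.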

RF was used to classify children as ASD or TD. The importance of each species to the classification model was evaluated by recursive feature elimination. In descending order of importance, species were added one by one to the RF model if their Pearson correlation with every feature already in the model was <0.7. The discovery set was randomly divided into 80% for training the model and 20% for testing its performance. Each time a new feature was added, the performance of the model was re-evaluated using 10-fold cross-validation. To avoid overfitting, this process was repeated 100 times. The models were compared as binary classifiers using the area under the receiver operating characteristic (ROC) curve (AUC). The final model was the one that achieved the best accuracy on the test set. These analyses were done using R packages randomForest v4.6–147 and pROC V.1.15.39.
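The correlation filter used when adding species to the model can be sketched as a greedy loop. The example below is illustrative only: it takes importance values as given (the source obtains them from the RF model), uses the signed Pearson correlation as stated in the text, and does not train any classifier. The function and variable names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def select_features(importance, abundance, max_corr=0.7):
    """Add species in descending importance, skipping highly correlated ones.

    importance: species -> importance value (assumed to be precomputed).
    abundance:  species -> list of relative abundances across samples.
    """
    selected = []
    for name in sorted(importance, key=importance.get, reverse=True):
        # Keep the species only if it is below the cut-off against
        # every species already in the model.
        if all(pearson(abundance[name], abundance[kept]) < max_corr for kept in selected):
            selected.append(name)
    return selected
```

In the procedure described above, the model would be refitted and re-evaluated each time this loop accepts a new species. A stricter variant could compare the absolute correlation with the cut-off so that strongly negative correlations are also excluded.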

RF regression was used to regress the relative abundances of species in the time-series profiling of the microbiota of the TD cohort against chronological age (R package ‘randomForest’, ntree=1000, default mtry). Because the RF algorithm is non-parametric, it can detect both linear and nonlinear relationships between bacterial species and chronological age, thereby identifying taxa that discriminate different periods of postnatal life in healthy children. Taxa were ranked by reported feature importance over five repeats of 10-fold cross-validation. To estimate the minimal number of top-ranking age-discriminatory taxa required for prediction, the rfcv function implemented in the ‘randomForest’ package was applied, again over five repeats of 10-fold cross-validation. A sparse model consisting of the top 26 taxa was then trained on the training set. Without any further parameter optimisation, this model was validated in other healthy children and then applied to samples from children with ASD.
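The resampling scheme, read here as five repeats of 10-fold cross-validation, can be sketched as a generator of train and test indices. This is an interpretation of the wording and not the rfcv implementation. The function name, the seed and the way samples are dealt into folds are assumptions.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (repeat, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, train, test
```

Feature importance or prediction error would be computed on each of the 50 splits and then averaged, which gives a more stable ranking of taxa than a single split.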

Data availability statement

Data are available in a public, open access repository. Raw sequence data that support the findings of this study have been deposited in NCBI under accession code PRJNA686821.

Ethics statements

Patient consent for publication

Ethics approval

This study was approved by the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committees. All subjects provided informed consent to participate in this study and agreed for publication of the research results.


We are grateful to Stephen Kwok-Wing TSUI from School of Biomedical Sciences, The Chinese University of Hong Kong for expert advice.



  • Contributors YW conducted the study, performed DNA extraction and data analysis, and drafted the manuscript. RC was responsible for the ethics application, subject recruitment, collection of all questionnaire data and clinical samples, and commented on the manuscript. TZ, ZX, FZ, HZ, YKY and FKLC provided significant intellectual contribution to the manuscript. DFC and T-FL provided support and clinical advice on subject recruitment and DNA extraction. SCN designed the study, supervised the study and revised the manuscript.

  • Funding This work was supported by InnoHK, The Government of Hong Kong, Special Administrative Region of the People’s Republic of China. The work was also funded by a grant from the Health and Medical Research Fund (HMRF, grant number:14152251), Food and Health Bureau, Hong Kong SAR Government.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.