Original article
The HLA-DQ2 genotype selects for early intestinal microbiota composition in infants at high risk of developing coeliac disease
  1. M Olivares1,
  2. A Neef1,
  3. G Castillejo2,
  4. G De Palma1,
  5. V Varea3,
  6. A Capilla4,
  7. F Palau4,
  8. E Nova5,
  9. A Marcos5,
  10. I Polanco6,
  11. C Ribes-Koninckx7,
  12. L Ortigosa8,
  13. L Izquierdo1,
  14. Y Sanz1
  1. Instituto de Agroquímica y Tecnología de Alimentos, Consejo Superior de Investigaciones Científicas (IATA-CSIC), Valencia, Spain
  2. Hospital Universitario Sant Joan de Reus, Tarragona, Spain
  3. Gastroenterología, Nutrición y Hepatología Pediátrica, Hospital Universitario Sant Joan de Deu and Unidad de Gastroenterología Pediátrica del Institut Dexeus, Barcelona, Spain
  4. Centro de Investigación Príncipe Felipe (CIPF) and IBV-CSIC Associated Unit, CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
  5. Department Metabolismo y Nutrición, ICTAN-CSIC, Madrid, Spain
  6. Servicio de Gastroenterología y Nutrición Pediátrica, Hospital Universitario La Paz, Madrid, Spain
  7. Unidad de Gastroenterología Pediátrica, Hospital Universitario La Fe, Valencia, Spain
  8. Unidad de Gastroenterología, Hepatología y Nutrición Pediátrica, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Canarias, Spain
  1. Correspondence to Dr Yolanda Sanz (IATA-CSIC), Av. Agustín Escardino, 7, Paterna 46980, Valencia, Spain; yolsanz{at}


Objective Intestinal dysbiosis has been associated with coeliac disease (CD), but whether the alterations are cause or consequence of the disease is unknown. This study investigated whether the human leukocyte antigen (HLA)-DQ2 genotype is an independent factor influencing the early gut microbiota composition of healthy infants at family risk of CD.

Design As part of a larger prospective study, a subset (n=22) of exclusively breastfed and vaginally delivered infants with either high genetic risk (HLA-DQ2 carriers) or low genetic risk (non-HLA-DQ2/8 carriers) of developing CD was selected from a cohort of healthy infants with at least one first-degree relative with CD. Infant faecal microbiota was analysed by 16S rRNA gene pyrosequencing and real-time quantitative PCR.

Results Infants with a high genetic risk had significantly higher proportions of Firmicutes and Proteobacteria and lower proportions of Actinobacteria than low-risk infants. At genus level, high-risk infants had significantly lower proportions of Bifidobacterium and unclassified Bifidobacteriaceae and higher proportions of Corynebacterium, Gemella, Clostridium sensu stricto, unclassified Clostridiaceae, unclassified Enterobacteriaceae and Raoultella. Quantitative real-time PCR also revealed lower numbers of Bifidobacterium species in infants with high genetic risk than in those with low genetic risk.

In high-risk infants negative correlations were identified between Bifidobacterium species and several genera of Proteobacteria (Escherichia/Shigella) and Firmicutes (Clostridium).

Conclusions The genotype of infants at family risk of developing CD, carrying the HLA-DQ2 haplotypes, influences the early gut microbiota composition. This finding suggests that a specific disease-biased host genotype may also select for the first gut colonisers and could contribute to determining disease risk.

  • Intestinal Bacteria
  • Bifidobacteria
  • Celiac Disease

Significance of this study

What is already known about this subject?

  • Intestinal dysbiosis is characteristic of subjects with coeliac disease (CD), both untreated and treated with a gluten-free diet.

  • The genetic risk of developing CD (HLA-DQ2/DQ8 vs non-HLA-DQ2/8 genotype) and the milk-feeding type (breastfeeding vs formula feeding) influence gut colonisation of healthy infants with a first-degree relative with CD as revealed by quantitative PCR.

  • In vitro and animal studies evaluating the protective (Bifidobacterium species) or adverse (Escherichia coli and Bacteroides fragilis) roles of isolates of the target bacterial groups suggest that these alterations in the gut microbiota could contribute to the aetiology and pathogenesis of CD.

What are the new findings?

  • This is the first study to show, by high-throughput sequencing, that the HLA-DQ2 genotype per se influences the early gut microbiota composition of infants at family risk of developing CD.

  • Infants with high genetic risk of CD (HLA-DQ2) have higher proportions of Firmicutes (Clostridium species) and Proteobacteria (Enterobacteriaceae) and lower proportions of Actinobacteria (Bifidobacterium species) than those with low genetic risk (non-HLA-DQ2/8 carriers).

  • Bifidobacterium species (Actinobacteria) negatively correlated with Escherichia/Shigella (Proteobacteria) and Clostridium species (Firmicutes), suggesting that the former genus excludes the others in the gut ecosystem of high-risk infants.

  • The inverse relationship between the high-risk CD genotype and Bifidobacterium species proportions persists despite the well-known bifidogenic effect of breastfeeding.

How might it impact on clinical practice in the foreseeable future?

  • This study contributes to unravelling the genetic factors that shape the early gut microbiota composition and may determine CD risk, which could change the way the aetiology and management of this disorder are investigated.

Introduction

Coeliac disease (CD) is a chronic intestinal inflammatory disorder caused by a deregulated immune response to the gluten proteins of wheat, barley and rye in genetically predisposed individuals. The expression of the human leukocyte antigen (HLA) class II molecules DQ2 and DQ8, encoded by the DQA1 and DQB1 genes, is strongly associated with susceptibility to CD. The HLA-DQA1*05:01 and DQB1*02:01 alleles, which form the DQ2.5 haplotype, confer high susceptibility to CD.1 Susceptibility is further increased in subjects homozygous for this haplotype (in cis) and in those carrying the DQ2.2 haplotype (HLA-DQA1*02:01 and DQB1*02:02) in trans with the DQ7.5 haplotype (HLA-DQA1*05:05 and DQB1*03:01).2 In patients with CD, gluten peptides are deamidated by tissue transglutaminase in the lamina propria and recognised by dendritic cells expressing HLA-DQ2/DQ8 molecules, which mediate the Th1 response typical of the disease, producing mainly interferon-γ.3 Most patients carry the HLA-DQ2/DQ8 genes, but these genes are also present in about 40% of the general population, and only a small percentage of them (2–5%) develop CD.4,5 This indicates that the HLA-DQ genotype is necessary but not sufficient for the development of the disease. Gluten is the main environmental trigger of CD, but its intake does not fully explain the onset and clinical expression of the disease. In recent years, other environmental factors that influence the early gut microbiota composition, such as type of delivery and milk-feeding, intestinal infections and antibiotic intake, have also been associated with the risk of developing CD.6–11

Colonisation of the newborn intestine is thought to contribute to the proper development of the host's immune function and to determine susceptibility to immune-mediated disorders in early and later life.12,13 Most studies in patients with CD report imbalances in the intestinal microbiota,14 with a few exceptions.15,16 In our own studies, the microbiota of patients with CD was characterised by decreased numbers of Bifidobacterium species and increased numbers of Bacteroides species in faeces and intestinal biopsies.17,18 Intestinal dysbiosis was not completely restored after adherence to a gluten-free diet, suggesting that some changes in the microbiota are not secondary to the inflammatory milieu of the active phase of the disease but could play a primary role in predisposition to CD. To date, our understanding of the host genotype's influence on intestinal microbiota composition is limited, and only a few studies have addressed it, in relation to chronic inflammatory bowel disorders.19–21 In the case of CD, only one study, in a cohort of infants at family risk of developing the disease, has reported associations between the HLA-DQ genotype and the composition of the intestinal microbiota, assessed by fluorescent in situ hybridisation and real-time PCR.22,23 Nevertheless, these preliminary associations could be confounded by diverse environmental variables (eg, type of milk-feeding, type of delivery, etc). Furthermore, these studies were limited by the molecular techniques used, which can detect only the small number of intestinal bacterial groups for which primers and probes are available. This limitation can now be overcome by analysing 16S rRNA gene fragments from total faecal DNA (metagenome) with next-generation sequencing techniques, which allow the intestinal ecosystem to be studied in greater depth.24

The objective of this study was to characterise the microbiota of exclusively breastfed and vaginally delivered infants at family risk of developing CD to reveal possible associations between the HLA-DQ2 genotype, the early microbiota composition and the ecological interactions between relevant taxa. The broader aim is to gain a greater understanding of the genetic and environmental factors influencing the early colonisation process of the newborn intestine, and their impact on the risk of developing CD.

Materials and methods

Subjects and sampling

A subset (n=22) of 1-month-old, exclusively breastfed and vaginally delivered infants with either high genetic risk (HLA-DQ2 genotype, including homozygous HLA-DQ2.5 or heterozygous DQ2.5/DQ2.2 and DQ2.2/DQ7.5 carriers) or low genetic risk (non-HLA-DQ2/8 genotype) of developing CD was selected from a cohort of healthy infants with at least one first-degree relative with CD participating in a larger prospective study.23 The risk of developing CD was determined by PCR-SSP DQB1 and DQA1 typing as previously described.23 The 11 infants classified as high risk (HR) had the highest probability (>20%) of developing CD and included carriers of the DQ2.5 haplotype (DQA1*05:01-DQB1*02:01) in homozygosis and of the DQ2.5/DQ2.2 or DQ2.2/DQ7.5 haplotypes in heterozygosis (see online supplementary table 1). The low-risk (LR) group included individuals with other common genotypes not associated with CD, and thus with the lowest probability (<1%) of developing CD. None of the infants included in the study received antibiotics during the sampling period.

The study was approved by the local ethics committees and written informed consent was obtained from the parents of infants included in the study.

DNA extraction

For DNA extraction, 0.2 g of faecal sample was homogenised in 15 mL of phosphate-buffered saline (PBS) (130 mM sodium chloride, 10 mM sodium phosphate, pH 7.4). The suspension was then filtered through a 100 μm nylon filter, washed twice with PBS and resuspended in tris-EDTA buffer (10 mM Tris, 1 mM EDTA, pH 8.0 (HCl)). One millilitre of the suspension was added to 2 mL of lysozyme (2.5 mg/mL) and incubated at 37°C for 1 h. After 0.67 mg of proteinase K was added, the mix was incubated at 55°C for 10 min. Then 400 μL of 10% (w/v) sodium dodecyl sulfate was added and the samples were incubated at 55°C for 1 h under gentle shaking. One aliquot of the mix was used for DNA extraction following the DNeasy Blood and Tissue kit protocol (Qiagen, Hilden, Germany).

Sequencing the 16S rDNA amplicons

The extracted metagenomic DNA was used to amplify the V5 and V6 hypervariable regions of the 16S rRNA gene by PCR with the primers 784F 5′-AGGATTAGATACCCTGGTA-3′ and 1061R 5′-CRRCACGAGCTGACGAC-3′, as previously described.25 The forward primer contained the sequence of the Titanium A adaptor 5′-CCATCTCATCCCTGCGTGTCTCCGACTCAG-3′ and the bar code sequence. For each sample, a 100 μL PCR mix was prepared containing 1× PCR buffer, 2 U of KAPA HiFi HotStart polymerase blend and deoxynucleotide triphosphates (Kapa Biosystems, Wilmington, USA), 300 nM primers (Eurogentec, Liège, Belgium) and 60 ng gDNA. Thermal cycling consisted of initial denaturation at 95°C for 5 min, followed by 25 cycles of denaturation at 98°C for 20 s, annealing at 56°C for 40 s and extension at 72°C for 20 s, with a final extension of 5 min at 72°C. Then 3 µL of PCR product was added to a new PCR mix (identical to that of the first round) for a nested PCR of 15 cycles. Amplicons were visualised on 1% agarose gels using GelGreen Nucleic Acid gel stain in 1× tris-acetate-EDTA buffer (Biotium, Hayward, USA) and were cleaned using the Wizard SV Gel and PCR Clean-up System (Promega, Madison, USA) according to the manufacturer's instructions. Amplicon DNA concentrations were determined using the Quant-iT PicoGreen dsDNA reagent and kit (Life Tech, Carlsbad, USA) following the manufacturer's instructions. Assays were carried out using 2 μL of cleaned PCR product in a total reaction volume of 200 μL in black 96-well microtitre plates. Following quantitation, cleaned amplicons were combined in equimolar ratios into a single tube. The final pool of DNA was eluted in 100 µL of nuclease-free water, purified using the Agencourt AMPure XP purification system (Agencourt Biosciences Corporation-Beckman Coulter, Beverly, USA) and then resuspended in 100 µL of 1× tris-EDTA.
The concentration of the purified pooled DNA was determined using the Quant-iT PicoGreen dsDNA reagent and kit (Life Tech, Carlsbad, USA) following the manufacturer's instructions. Pyrosequencing was carried out using primer A on a 454 Life Sciences Genome Sequencer FLX instrument (Roche, Basel, Switzerland) with titanium chemistry. 16S rDNA amplicons were sequenced by DNA Vision Agrifood SA (Liège, Belgium).

Sequence and clustering analysis

Original reads were filtered by length (>240 bp) and quality (average Phred value >25) and then screened for chimaeras using UCHIME,26 resulting in an average of 12 188 ± 3191 sequence reads per sample. Reads were classified at phylum, family and genus levels at an 80% confidence threshold using the Ribosomal Database Project multiclassifier tool. Rarefaction curves and cluster analysis were calculated with the mothur package using 1000 randomisations.27 For analysis of clostridia sequences the Living Tree Project data set available from was used and the respective reads were subjected to a BlastN search against a database of microbial 16S rRNA gene sequences (SILVA 111,
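
The length and quality filters above amount to a simple per-read test. The Python sketch below is illustrative only: it assumes reads are held as (identifier, sequence, quality string) tuples with Phred+33-encoded qualities, uses hypothetical function names, and leaves out the UCHIME chimaera check.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of an ASCII-encoded quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filters(seq: str, quality: str,
                   min_length: int = 240, min_mean_q: float = 25.0) -> bool:
    """True for reads longer than 240 bp whose average Phred value exceeds 25."""
    return len(seq) > min_length and mean_phred(quality) > min_mean_q


def filter_reads(reads):
    """Keep (read_id, sequence, quality) tuples that pass both filters.

    Chimaera screening (UCHIME in the study) would follow as a separate step.
    """
    return [read for read in reads if passes_filters(read[1], read[2])]


# Hypothetical reads: 'I' encodes Phred 40 and '+' encodes Phred 10 in Phred+33.
reads = [
    ("r1", "A" * 250, "I" * 250),  # long and high quality: kept
    ("r2", "A" * 200, "I" * 200),  # too short: dropped
    ("r3", "A" * 250, "+" * 250),  # low quality: dropped
]
kept = filter_reads(reads)
print([read_id for read_id, _, _ in kept])  # ['r1']
```

Both thresholds are strict inequalities, mirroring the ">240 bp" and ">25" wording; a read of exactly 240 bp is rejected.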

Similarity of the subjects' microbiota was evaluated by cluster analysis of Bray-Curtis distances between individuals, based on the relative abundances of the different genera. Decimal logarithms of the raw abundance data, obtained after adding 1 to each raw value, were classified according to an arithmetic progression on a scale from 0 to 7,28 where 0 corresponded to absence of a genus and 7 to the maximum raw value across all genera and individuals. Bray-Curtis distances between subjects were computed, and agglomerative nesting cluster analysis (AgNes function of the cluster package of R) was applied to these distances.28 Weighted UniFrac analysis was performed with a set of 258 dereplicated sequences with abundances of at least 100 reads26 using QIIME and the FastUniFrac tool at the URL
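
A minimal sketch of the abundance scaling and distance step follows, assuming per-infant genus counts are stored in a Python dictionary. Rounding the scaled logarithms to integer classes is our assumption about how the 0–7 arithmetic progression was applied, the function names are hypothetical, and the AgNes clustering itself (R cluster package) is not reproduced: the sketch stops at the distance matrix that such a clustering would take as input.

```python
import math


def log_classes(table, top_class=7):
    """Scale log10(count + 1) so the largest raw count in the table maps to top_class.

    table: {sample: {genus: raw_count}}. Rounding to integer classes is an
    assumption about how the 0-7 arithmetic progression was applied.
    """
    max_raw = max(c for genera in table.values() for c in genera.values())
    top = math.log10(max_raw + 1)
    if top == 0:  # every count is zero
        return {s: {g: 0 for g in genera} for s, genera in table.items()}
    return {
        s: {g: round(top_class * math.log10(c + 1) / top) for g, c in genera.items()}
        for s, genera in table.items()
    }


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {genus: value} profiles."""
    genera = set(a) | set(b)
    num = sum(abs(a.get(g, 0) - b.get(g, 0)) for g in genera)
    den = sum(a.get(g, 0) + b.get(g, 0) for g in genera)
    return num / den if den else 0.0


def distance_matrix(scaled):
    """Pairwise Bray-Curtis distances, the input for an AgNes-style clustering."""
    names = sorted(scaled)
    return [[bray_curtis(scaled[x], scaled[y]) for y in names] for x in names], names


# Hypothetical counts for two infants.
table = {
    "LR_a": {"Bifidobacterium": 9999, "Streptococcus": 9},
    "HR_b": {"Bifidobacterium": 0, "Streptococcus": 9999},
}
scaled = log_classes(table)
print(scaled["LR_a"])  # {'Bifidobacterium': 7, 'Streptococcus': 2}
print(bray_curtis(scaled["LR_a"], scaled["HR_b"]))  # 0.75
```

Working on log classes rather than raw counts damps the influence of a single dominant genus, such as Bifidobacterium in LR infants, on the distances.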

Richness and diversity indices

Microbial richness and diversity were analysed for each sample. Both parameters are based on operational taxonomic units (OTUs), which are clusters of reads defined by their pairwise distances. Richness, estimated with the Chao index, is directly related to the number of observed OTUs, whereas the Shannon diversity index depends on how sequence abundances are distributed among the observed OTUs. The Simpson index measures the probability that two reads drawn at random belong to the same OTU. Microbial richness and the Shannon and Simpson indices were calculated with the mothur package.27
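
For readers unfamiliar with these indices, the sketch below computes them from a list of per-OTU read counts using common textbook definitions: observed richness, the bias-corrected Chao1 estimator, Shannon's H with natural logarithms, and Simpson's index as the probability that two reads drawn without replacement fall in the same OTU. These definitions are assumptions; mothur's own calculators may differ in detail, and the function names are hypothetical.

```python
import math


def observed_richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    f1 = sum(1 for n in counts if n == 1)  # singletons
    f2 = sum(1 for n in counts if n == 2)  # doubletons
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over non-empty OTUs."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def simpson(counts):
    """Probability that two reads drawn without replacement share an OTU."""
    total = sum(counts)
    if total < 2:
        return 0.0
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))


otu_counts = [5, 3, 1, 1]  # hypothetical reads per OTU in one sample
print(observed_richness(otu_counts), chao1(otu_counts))  # 4 5.0
print(round(shannon(otu_counts), 4), round(simpson(otu_counts), 4))  # 1.1683 0.2889
```

Under this definition a lower Simpson value means a less dominated, more even community, which is the direction of the HR versus LR difference reported in the results.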

Correlation analyses

To study possible ecological interactions between bacterial taxa, we calculated correlations between the genera identified within each of the low and high genetic risk groups of infants. Only genera present in at least 9 of the 11 infants in each group were considered. The weighted number of reads for each infant was log-transformed, and double zeroes (infants in which both genera being compared were absent) were excluded. The significance (p value) of the correlation coefficient between the abundances of any two genera was estimated by a permutation test with 1000 simulations per coefficient.29 Only positive correlations with |1−p|<0.100 and negative correlations with p<0.100 were retained. In addition, we examined possible correlations among the genera Bifidobacterium, Corynebacterium, Gemella, Clostridium sensu stricto and Raoultella, because these genera discriminated between the two groups of infants in the previous analyses.
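
The correlation screen can be sketched as follows. Several choices here are assumptions made to match the |1−p| and p thresholds above: Pearson's r, the log10(x + 1) transform (which avoids taking the logarithm of zero), and a one-sided p defined as the share of permuted coefficients at or below the observed one, so that a small p flags a negative correlation and a small 1 − p a positive one. The function names and data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_correlation(reads_a, reads_b, n_perm=1000, seed=0):
    """Correlate two genera across infants and estimate a one-sided p value.

    Infants where both genera are absent (double zeroes) are dropped, counts are
    log10(x + 1)-transformed, and p is the share of permutations whose r is
    <= the observed r (small p: negative correlation; small 1 - p: positive).
    """
    pairs = [(a, b) for a, b in zip(reads_a, reads_b) if a > 0 or b > 0]
    x = [math.log10(a + 1) for a, _ in pairs]
    y = [math.log10(b + 1) for _, b in pairs]
    r_obs = pearson(x, y)
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = y[:]
    at_or_below = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between the two genera
        if pearson(x, shuffled) <= r_obs:
            at_or_below += 1
    return r_obs, at_or_below / n_perm


# Hypothetical weighted reads for two genera in nine infants (last one is a double zero).
genus_a = [10 ** k - 1 for k in range(1, 9)] + [0]
genus_b = list(reversed(genus_a[:8])) + [0]
r, p = permutation_correlation(genus_a, genus_b)
print(round(r, 3), p)  # strongly negative r with a small p
```

Permuting one variable while holding the other fixed keeps each genus's own distribution intact, so the test makes no normality assumption about the skewed read counts.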

Real time quantitative PCR (qPCR)

DNA was amplified using group-specific and genus-specific primers, as described previously,30,31 to quantify different bacterial groups in the intestinal microbiota. Each reaction mixture consisted of 7.5 μL of SYBR Green PCR Master Mix (Roche), 3.5 μL of DNase/RNase-free water, 0.75 μL of each specific primer (10 μM) (Isogen, Barcelona, Spain) and 2.5 μL of template DNA. PCR amplification and detection were performed using a LightCycler 480 (Roche). Gene copy numbers of each bacterial group were calculated by comparing the cycle threshold (Ct) values obtained with those of a standard curve. Standard curves were generated from serial dilutions of a known copy number of the 16S rRNA gene cloned into the pGEM-T Easy Vector System (Promega). Escherichia coli DH5α was transformed with the recombinant plasmids, and plasmid DNA was extracted by the miniprep method.32
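
Copy-number estimation from a standard curve reduces to a linear fit of Ct against log10 copy number, followed by inversion. The sketch below assumes an ordinary least-squares fit and uses hypothetical dilution data and function names. The efficiency formula E = 10^(−1/slope) − 1 is the standard one for such curves, not a value reported in this study.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ct_values) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ct_values))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate gene copies for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to E close to 1 (100%)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical tenfold dilution series of a plasmid standard.
copies = [1e3, 1e4, 1e5, 1e6, 1e7]
cts = [30.0, 26.68, 23.36, 20.04, 16.72]
slope, intercept = fit_standard_curve(copies, cts)
print(round(slope, 2), round(intercept, 2))  # -3.32 39.96
sample = copies_from_ct(25.0, slope, intercept)
print(f"{sample:.3g} copies; efficiency {amplification_efficiency(slope):.1%}")
```

Results such as those in table 5 are reported as log values, which correspond directly to (Ct − intercept) / slope.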

Data processing and statistical analyses

Among the demographic data, categorical variables (number of first-degree relatives with CD) were analysed using the χ2 test and continuous variables (size, weight and weeks of gestation at birth) using a t test, since their distribution was normal as assessed with the Shapiro-Wilk W test (SPSS software V.19.0). Microbiota composition data were not normally distributed according to the same test, so qPCR data were compared using the non-parametric Mann-Whitney U test (SPSS software V.19.0). Read numbers for each taxon obtained for each infant by pyrosequencing were weighted using the mean of the total reads for all 22 individuals. The recalculated numbers of reads assigned to each taxon for the high genetic risk (n=11) and low genetic risk (n=11) groups of infants were compared using permutation analyses (DAAG package of R software). Diversity indices were compared using the Wilcoxon test in R. Correlations between the data obtained by pyrosequencing and qPCR were analysed using the Pearson correlation coefficient (SPSS software V.19.0). In all cases, statistical significance was set at p<0.050.
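
The read weighting and the group comparison can be sketched as below. We assume that weighting means rescaling each infant's counts to the mean total read number across infants, and we use a generic two-sided permutation test on the difference in group means in place of the DAAG routine; the function names and example values are hypothetical.

```python
import random


def weight_reads(taxon_counts, total_reads):
    """Rescale each infant's taxon counts to the mean library size (assumed scheme).

    taxon_counts: {infant: {taxon: reads}}; total_reads: {infant: total reads}.
    """
    mean_total = sum(total_reads.values()) / len(total_reads)
    return {
        infant: {t: n * mean_total / total_reads[infant] for t, n in counts.items()}
        for infant, counts in taxon_counts.items()
    }


def permutation_mean_diff(group1, group2, n_perm=1000, seed=0):
    """Two-sided permutation p value for the difference in group means."""
    observed = abs(sum(group1) / len(group1) - sum(group2) / len(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign infants to groups at random
        g1, g2 = pooled[:n1], pooled[n1:]
        diff = abs(sum(g1) / n1 - sum(g2) / len(g2))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return observed, extreme / n_perm


weighted = weight_reads({"A": {"Bifidobacterium": 500}, "B": {"Bifidobacterium": 500}},
                        {"A": 1000, "B": 3000})
print(weighted)  # A rescaled to 1000.0, B to about 333.3

hr = [1, 2, 3, 4, 5]  # hypothetical weighted reads in one group
lr = [101, 102, 103, 104, 105]  # and in the other
obs, p = permutation_mean_diff(hr, lr)
print(obs, p < 0.05)  # 100.0 True
```

Rescaling to a common library size keeps infants sequenced more deeply from dominating the group means, and the permutation test avoids assuming normality, consistent with the Shapiro-Wilk result.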

Results

Characteristics of the infants included in the study

The demographic characteristics and the HLA genotype of the study infants are shown in table 1. No statistically significant differences (p>0.050) were observed between the two infant groups in size, weight and weeks of gestation at birth or in the number of first-degree relatives with CD (mother, father, brother and/or sister).

Table 1

Demographic characteristics and genotype of the infants included in the study

Microbiota composition by 16S rRNA gene sequencing

The pyrosequencing analysis detected sequences belonging to four phyla in all samples: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria (figure 1A). The intestinal microbiota of infants with low genetic risk of developing CD was characterised by a very high proportion of Actinobacteria (mean (SD) 79.6 (19.1)%), a moderate proportion of Firmicutes (14.9 (16.6)%) and low proportions of Proteobacteria and Bacteroidetes (3.3 (7.3)% and 2.2 (6.2)%, respectively). Infants at high genetic risk of developing CD showed a more heterogeneous microbiota, more evenly distributed among the phyla Actinobacteria, Firmicutes and Proteobacteria (36.1 (39.5)%, 38.2 (28.2)% and 22.6 (28.8)%, respectively), with a small proportion of Bacteroidetes (3.0 (9.9)%). High SD values revealed notable interindividual variability in microbiota composition (figure 1B). Infants at high genetic risk of developing CD showed significantly higher proportions of Firmicutes (p=0.026) and Proteobacteria (p=0.039) and a lower proportion of Actinobacteria (p=0.005) than those with low genetic risk. The HR group of infants also had higher proportions of Bacteroidetes than the LR group (3.0 (9.9)% vs 2.2 (6.2)%), but the difference was not statistically significant (p=0.954).

Figure 1

Microbiota analysis by pyrosequencing the V5 and V6 hypervariable regions of the 16S rRNA. Mean distribution of the four phyla Actinobacteria, Firmicutes, Proteobacteria and Bacteroidetes detected in the high genetic risk (HR) and low genetic risk (LR) groups of infants (A) and their corresponding individual percentages in each infant (B).

Figure 2 shows a schematic representation of the principal families characterising the microbiota in the low and high genetic risk groups of infants. In infants with low genetic risk of developing CD, most of the sequence reads (96.4%) belonged to five families: Bifidobacteriaceae, Streptococcaceae, Lachnospiraceae, Coriobacteriaceae and Enterobacteriaceae. In infants with high genetic risk of developing CD, most of the sequence reads (95.9%) belonged to ten families: Bifidobacteriaceae, Enterobacteriaceae, Streptococcaceae, Lachnospiraceae, Clostridiaceae 1, Bacteroidaceae, Erysipelotrichaceae, Lactobacillaceae, Actinomycetaceae and Enterococcaceae. In infants with low genetic risk, sequences belonging to the phyla Actinobacteria and Firmicutes were mainly represented by the family Bifidobacteriaceae and by the families Lachnospiraceae plus Streptococcaceae, respectively. In infants with high genetic risk, by contrast, sequences belonging to the phyla Actinobacteria, Proteobacteria and Firmicutes were mainly represented by Bifidobacteriaceae, Enterobacteriaceae, and Lachnospiraceae, Streptococcaceae plus Clostridiaceae, respectively. Infants with a high risk of developing CD had significantly higher abundances of Clostridiaceae 1 (p<0.001), Bacillales incertae sedis XI (p=0.009), Enterobacteriaceae (p=0.039), Corynebacteriaceae (p=0.050) and unclassified Clostridiaceae (p=0.012), and a lower abundance of Bifidobacteriaceae (p=0.010), than those with low genetic risk.

Figure 2

Percentages of the average values of the different families detected in the high and low genetic risk groups of infants. The families are represented with different shades and patterns indicating that they belong to the phyla Actinobacteria (blue), Firmicutes (green), Proteobacteria (yellow) and Bacteroidetes (red). Families shaded black in the figure had abundances below 0.1% and thus their shares are not visible.

At genus level, HR infants had significantly increased proportions of Corynebacterium (p=0.050), Gemella (p=0.009), Clostridium sensu stricto (p<0.001), Raoultella (p=0.035), unclassified Clostridiaceae (p=0.013) and unclassified Enterobacteriaceae (p=0.009) and decreased proportions of Bifidobacterium (p=0.010) and unclassified Bifidobacteriaceae (p=0.011) compared with LR infants (table 2).

Table 2

Mean, SD, median, Q25, Q75 values and ranges of the number of reads in each infant weighted using the total number of reads of all the infants

An AgNes cluster dendrogram based on Bray-Curtis distances showed that, at genus level, the microbiota of infants could be partially differentiated by HLA-DQ genotype, with infants of the same genotype group clustering together with a few exceptions (figure 3). A weighted UniFrac analysis clustered the microbiota according to the predominant bacterial genera in each sample. Consistent with the AgNes clustering, the basic differentiation was between samples that did or did not contain bifidobacteria. Six HR infants dominated either by more than 40% Enterobacteriaceae reads (Escherichia or Klebsiella, subcluster C) or by more than 50% Streptococcus reads (subcluster D), none of which contained bifidobacteria (≤0.2%), grouped with each other. A large group of samples (subcluster A), comprising seven from LR infants and three from HR infants (HR12, HR13 and HR14), all containing 72–98% Bifidobacterium species, grouped closely, whereas three samples sharing 25–40% unclassified Lachnospiraceae (subcluster B) clustered together.

Figure 3

(A) AgNes dendrogram shows the clustering of the infants based on Bray-Curtis distances calculated on the basis of the abundance data (number of 16S rRNA gene sequence reads) at genus level. (B) Weighted UniFrac clustering of 166 799 reads (representing 64.4% of all reads) based on a set of 258 sequences with total abundances between 16 577 and 100. Capital letters indicate subclusters of infants according to common dominant bacterial genera in their microbiota. (A): Bifidobacterium species >80%, (B): unclassified Lachnospiraceae 25–40%, (C): Enterobacteriaceae >40%, and (D): Streptococcus >50%. Low-risk (LR) individuals are shown in blue and high-risk (HR) in red.

Rarefaction curves for each group indicate that the total bacterial diversity was well represented since clustering curves for all individuals, except HR16, approached saturation at 90% (data not shown).

Richness and diversity analysis

Differences in richness and Simpson index between the high and low genetic risk groups of infants were of borderline statistical significance (p=0.065) (table 3). Richness (mean (SD)) was 18.1 (6.2) for the low genetic risk group and 27.0 (10.3) for the high genetic risk group. The respective Simpson indices were 0.68 (0.26) and 0.56 (0.23).

Table 3

Number of sequences obtained in samples from the low genetic risk and high genetic risk groups and percentage of reads assigned at the phylum, family and genus levels

Correlation analysis

The possible interactions between the bacterial genera detected in infants at low genetic risk are summarised in figure 4A. Nine interactions involving nine genera were selected, including only one positive correlation, between Rothia and Streptococcus (r=0.630, 1−p=0.029). The strongest negative correlations were between Bacteroides and Enterococcus (r=−0.756, p=0.014) and between Bifidobacterium and Eggerthella (r=−0.613, p=0.024). The greater diversity of the microbiota of HR infants resulted in a higher number of significant correlations (35 in total) involving 14 different bacterial genera (figure 4B). Bifidobacterium showed significant negative correlations with Veillonella (r=−0.590, p=0.010), Rothia (r=−0.800, p=0.004) and Streptococcus (r=−0.761, p=0.007), and borderline significant negative correlations with Escherichia/Shigella (r=−0.497, p=0.059) and Clostridium (r=−0.397, p=0.121). Among the bacterial genera that discriminated between the two groups of infants, Bifidobacterium showed non-significant positive correlations with Corynebacterium (r=0.103, 1−p=0.627) and Raoultella (r=0.070, 1−p=0.582) and a non-significant negative correlation with Gemella (r=−0.206, p=0.270). Clostridium species showed a statistically significant negative correlation with Lactobacillus (r=−0.573, p=0.041), but not with Corynebacterium (r=−0.130, p=0.379), Gemella (r=−0.212, p=0.268) or Raoultella (r=−0.057, p=0.438).

Figure 4

Representation of the correlations among bacterial genera that met the criteria |1−p|<0.100 (positive correlations) or p<0.100 (negative correlations) in the low (A) and high (B) genetic risk groups of infants. The colour of each genus indicates its phylum: Actinobacteria (blue), Firmicutes (green), Proteobacteria (yellow) and Bacteroidetes (red). Circle sizes are relative to the logarithm of the mean weighted number of reads (see scale). Blue or red connecting lines indicate positive and negative correlations, respectively. Line style represents the correlation coefficient: continuous line (r≥0.75 or ≤−0.75), discontinuous line (|r|=0.75–0.1) and weak line (0.1>r>−0.1).

Prevalence of Clostridium sensu stricto discriminates between the infant groups

The genus Clostridium sensu stricto was considered indicative of differences in the microbiota between low and high genetic risk groups, because its proportions were quantitatively different between both groups and it was present in nine individuals of the high genetic risk group but in none of the low genetic risk group. Therefore, the composition of this genus was analysed in more detail.

Clostridium sensu stricto was present in three HR individuals at 28.1% (HR20), 14.0% (HR21) and 13.3% (HR17) of all reads, and in five more at between 0.4% and 1.4%. The 10 173 reads classified in this genus were therefore analysed further. In total, they represented 1879 different sequence types. BlastN comparisons yielded nine Clostridium species as best hits (table 4). Three of them were dominant, accounting for 9577 reads (94.1%): Clostridium paraputrificum (41.2%), Clostridium chartatabidum (33.2%) and Clostridium perfringens (19.8%), with average sequence similarities to the reference sequences X75907, X71850 and CP000246 of 97.8, 98.3 and 99.2%, respectively. C. perfringens was present in five individuals, making it the most widely distributed species. The highest diversity of Clostridium was found in individuals HR20 and HR21, with seven and five of the nine species identified, respectively. Interestingly, two closely related species (C. paraputrificum and C. chartatabidum) coexisted in HR20, both in relatively high numbers. By contrast, the presence of C. perfringens was practically exclusive.

Table 4

Distribution of pyrosequencing reads classified as Clostridium sensu stricto

Real time quantitative PCR (qPCR)

The characterisation of the microbiota composition by qPCR (table 5) revealed that, consistent with the pyrosequencing data, infants in the high genetic risk group had lower gene copy numbers of Bifidobacterium species than the low genetic risk group (median log values (IQR) 6.86 (6.36–11.28) vs 11.00 (10.73–11.42), p=0.018). The high genetic risk group also had lower gene copy numbers of the Bacteroides fragilis group and the Blautia (Clostridium coccoides) group than the low genetic risk group (>1 log unit difference), but these differences did not reach statistical significance (p=0.336 and 0.630, respectively). The high genetic risk group also showed higher gene copy numbers of Staphylococcus species than the low genetic risk group; however, the difference was not statistically significant (p=0.142).

Table 5

Faecal microbiota of infants with high or low genetic risk of developing coeliac disease determined by quantitative PCR

Statistically significant correlations between the data obtained by qPCR and pyrosequencing were detected for the Bacteroides fragilis group (r=0.998, p<0.001), the Bifidobacterium species (r=0.469, p=0.037) and the Lactobacillus group (r=0.869, p<0.001). No statistically significant correlation between the data obtained by the two techniques was detected for the other bacterial groups analysed by qPCR.

Discussion

Emerging evidence supports the hypothesis that host genotype influences gut microbiota composition, but few studies have revealed these interactions.33 The present study is the first to demonstrate, by high-throughput sequencing of 16S rRNA genes, that the HLA-DQ2 genotype per se influences the intestinal bacterial community structure of infants at family risk of developing CD. Compared with LR infants, the intestinal microbiota of HR infants was characterised by increased proportions of Firmicutes and Proteobacteria but reduced proportions of Actinobacteria. The increase in Firmicutes was reflected by increased proportions of the genera Clostridium (Clostridium sensu stricto and unclassified Clostridiaceae) and Gemella. The higher abundance of Proteobacteria was due to increases in Raoultella and unclassified Enterobacteriaceae. The reduced proportion of Actinobacteria was due to decreases in Bifidobacterium and unclassified Bifidobacteriaceae, whereas the actinobacterial genus Corynebacterium was increased.

Reductions in Bifidobacterium species have also been detected previously in duodenal biopsies and faeces of patients with CD, even after long-term adherence to a gluten-free diet, suggesting that this bacterial group could contribute to the aetiopathogenesis of CD.17 ,18 Previous studies have investigated the influence of the HLA-DQ2/8 genotype on the intestinal microbiota composition of this cohort of infants with a family history of CD, but they used molecular techniques with more limited resolution, such as fluorescent in situ hybridisation, qPCR and denaturing gradient gel electrophoresis.22 ,23 ,34 Even with these techniques, initial results revealed that, although both the HLA-DQ2/8 genotype and the milk-feeding type (breast milk or formula) influenced the infant's gut colonisation, infants at higher genetic risk (HLA-DQ2/8 positive) had lower numbers of Bifidobacterium species and Bifidobacterium longum, and higher numbers of Staphylococcus species, irrespective of milk-feeding type.23 However, our present study, conducted in exclusively breastfed infants, did not reveal differences in Bacteroides species. This supports the hypothesis that breastfeeding attenuates the HLA-DQ-driven microbiota differences in this bacterial group, which our previous qPCR study detected only in formula-fed infants.

Our present pyrosequencing data also revealed that the HLA-DQ2 genotype influences colonisation by Clostridium species (Clostridium sensu stricto and unclassified Clostridiaceae). Clostridium sensu stricto was detected in 9 of 11 samples from HR infants but was absent from samples of LR infants. C. perfringens was the most prevalent species (five of nine cases). In a recent study that analysed the intestinal microbiota by 16S rRNA gene pyrosequencing in children with the HLA-DQB1 (DQB1*02, DQB1*03:02) genotype, β-cell autoimmunity was positively associated with C. perfringens.35 However, this association could not be attributed to the HLA genotype, because the genotype distribution was similar in children with β-cell autoimmunity (defined as positive diabetes-associated autoantibodies on at least two occasions) and in autoantibody-negative children. Similarly, Clostridium difficile (cluster XI) infection, defined by positive toxin detection, has been associated with the NOD2 genotype and with the phenotype of patients with Crohn's disease.21

In the present study, correlation analyses in HR infants also revealed negative correlations between Bifidobacterium species and Clostridium species, which might explain the absence of Clostridium species in LR infants, in whom Bifidobacterium species were clearly dominant. A previous prospective study reported that a lower ratio of Bifidobacterium to Clostridium counts in the faecal microbiota of infants was associated with atopic disease.15

Our pyrosequencing study is also the first to detect differences in the proportions of the genera Corynebacterium, Gemella and Raoultella as a function of the HLA-DQ2 genotype in the infant gut. Although the biological role of these genera is not well documented, increases in Corynebacterium and Alcaligenes have also been reported in mice with alcoholic liver disease.36 The authors proposed that these microbiota changes might contribute to the disease by increasing intestinal permeability through altered expression of tight-junction-related proteins (zonula occludens-1 and claudin-1), an effect that could be prevented by probiotic administration.36 In light of our results, we cannot rule out the possibility that specific microbiota alterations (eg, Corynebacterium species) also contribute to increased intestinal permeability in CD, which is characteristic of the active phase of the disease and is considered a potential event preceding its development.37

Another study used pyrosequencing to analyse the microbiota of a small subset of subjects (eight per group) and investigate how early (from 6 months) or late (after 12 months) introduction of gluten into the infant diet affected CD development in genetically predisposed (HLA-DQ2/8) infants.38 The authors reported that the microbiota of HLA-DQ2/8 carriers was characterised by a higher abundance of Firmicutes and a lower abundance of Bacteroidetes (from 1% to undetectable levels) than reported in another study of non-genotyped healthy infants.39 However, the samples in these two studies were analysed using different techniques (small subunit (SSU) rDNA microarray vs 454 pyrosequencing), which could explain the differences that the authors attributed to the HLA-DQ genotype.38 Differences in sample handling (storage time and temperature) and processing (eg, DNA extraction method) can also lead to different results in massive sequencing analyses; therefore, results from studies using different methodologies are not considered directly comparable.40 Furthermore, the genotype of the infants included in the Palmer study was unknown,39 so that cohort may well have included HLA-DQ2/8 carriers, who represent up to 40% of the general population.5

Another prospective study that analysed possible associations between intestinal microbiota composition and the risk of developing neonatal sepsis or systemic inflammation revealed a dominance of Proteobacteria in infants who developed these conditions,12 as observed in our high genetic risk infants. Furthermore, correlation analyses in HR infants revealed negative correlations between Bifidobacterium species and Proteobacteria (Escherichia/Shigella), which might explain the lower levels of the latter in LR infants, whose microbiota was dominated by bifidobacteria. Likewise, the phenotype of patients with IBD (Crohn's disease and UC) and the NOD2 and ATG16L1 T300A genotypes have been associated with compositional shifts in the phylum Proteobacteria in intestinal biopsies.41

In the present study, statistically significant correlations between pyrosequencing and qPCR data were obtained only for bifidobacteria, the Bacteroides fragilis group and lactobacilli. This may be partly explained by differences between the range of bacteria targeted by the qPCR primers and the taxonomic classification of the corresponding taxa by pyrosequencing, as previously reported.40

Several studies have attempted to establish relationships between gut microbiota diversity and health status, but their conclusions have so far been inconsistent. Breastfed infants are well known to have a low-diversity microbiota, dominated mainly by the genus Bifidobacterium, and breastfeeding has a positive effect on infant health.42 ,43 In the present study, infants with low genetic risk of developing CD also had slightly lower bacterial diversity than HR infants. In particular, the microbiota of HR infants in whom bifidobacteria were absent or present only in small numbers was much more diverse. Other studies have reported that decreased gut microbiota diversity is associated with diseases such as necrotising enterocolitis44 and type 1 diabetes45 in infants, and with obesity in adults.46 A prospective study of neonates also showed that those who developed sepsis or systemic inflammation had low microbial diversity from the meconium through to disease onset.12 One possible reason for these inconsistent conclusions regarding microbial diversity and disease in infants is that some previous studies used primers that are not appropriate for detecting bifidobacteria,12 ,44 ,45 the bacterial group that mainly determines gut microbiota diversity early in life. Under-representation of bifidobacteria in 16S rRNA gene pyrosequencing analyses has been described and has led to the design of alternative wobbled PCR amplification primers that improve the detection of bifidobacteria;47 these primers were used in our study.
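The link between dominance by a single genus and low diversity can be illustrated with the Shannon index, H' = -Σ p_i ln p_i, where p_i is the proportion of reads assigned to taxon i. This section does not state which diversity index the study used; the Shannon index is assumed here for illustration, `shannon_index` is a hypothetical helper, and the two communities are invented genus-level read counts, not study data.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c < 0:
            raise ValueError("counts must be non-negative")
        if c > 0:  # absent taxa contribute nothing (p * ln p -> 0 as p -> 0)
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical genus-level read counts for five genera (NOT study data).
# One genus (eg, Bifidobacterium) holds 90% of reads:
dominated = [900, 40, 30, 20, 10]
# The same five genera with reads spread more evenly:
even = [250, 200, 200, 200, 150]

print(f"dominated: H' = {shannon_index(dominated):.3f}")
print(f"even:      H' = {shannon_index(even):.3f}")
```

With the same five genera present, concentrating most reads in one genus gives a much lower H' than spreading them evenly, which is consistent with the observation that HR infants with few or no bifidobacteria had the more diverse microbiota.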

In conclusion, our high-throughput sequencing results reveal that the HLA-DQ2 genotype strongly influences the intestinal colonisation of infants at family risk of developing CD. Infants at high genetic risk of developing the disease (HLA-DQ2 carriers) show reduced abundance of Actinobacteria (Bifidobacterium species) and increased abundance of Firmicutes (Clostridium sensu stricto, unclassified Clostridiaceae and Gemella) and Proteobacteria (Raoultella and unclassified Enterobacteriaceae). Some of these bacterial groups were negatively correlated, suggesting that they may exclude one another. Follow-up of this cohort is under way to determine whether the microbiota changes dictated by the HLA-DQ genotype precede the development of CD later in life.


The authors thank Juanjo Abellán for his assistance with the R software.



  • Contributors YS conceived the study design, MO performed the microbiological experimental work, MO and AN analysed microbiological and sequence data, GDP created the database, GC, VV, EN, AM, IP, C R-K and LO recruited and followed the infants, AC and FP genotyped the infants, LI did the statistical analysis, MO, AN and YS drafted the manuscript, and all authors read and approved its final version.

  • Funding This work was supported by grants AGL2011–25169 and Consolider Fun-C-Food CSD2007-00063 from the Spanish Ministry of Economy and Competitiveness (MINECO). The scholarships to MO and GDP from CSIC are gratefully acknowledged.

  • Competing interests None.

  • Patient consent Obtained.

  • Ethics approval CSIC, Hospital Universitario Sant Joan de Reus, Hospital Universitario Sant Joan de Deu and Unidad de Gastroenterología Pediátrica del Institut Dexeus, Hospital Universitario La Paz, Hospital Universitario La Fe, Hospital Universitario Nuestra Señora de Candelaria.

  • Provenance and peer review Not commissioned; externally peer reviewed.