Chapter Nineteen - Advancing Our Understanding of the Human Microbiome Using QIIME
Introduction
Advances in DNA sequencing technologies, together with culture-independent sequencing methods and software for analyzing the massive quantities of data these technologies produce, have vastly improved our ability to characterize microbial communities in many diverse environments. The human microbiota, the collection of microbes living in or on the human body, is of considerable interest: microbial cells outnumber human cells in our bodies by a ratio of up to 10 to 1 (Savage, 1977). These microbial communities contribute to healthy human physiology (De Filippo et al., 2010, Dethlefsen and Relman, 2011, Spencer et al., 2011) and development (Dominguez-Bello et al., 2010, Koenig et al., 2011), and dysbiosis (an imbalance in these communities) is now known to be associated with disease, including obesity (Turnbaugh et al., 2009) and Crohn’s disease (Eckburg & Relman, 2007). More recently, evidence from transplants into germ-free mice suggests that some of these associations may be causal, because certain phenotypes can be transmitted by transferring the microbiota (Carvalho et al., 2012, McLean et al., 2012, Turnbaugh et al., 2009), including the transfer of human phenotypes into mice (Diaz Heijtz et al., 2011, Koren et al., 2012, Smith et al., 2013).
Illumina’s MiSeq and HiSeq instruments can sequence tens of millions and billions of DNA fragments, respectively, in a single run (Kuczynski et al., 2012). The rapidly growing data volumes of recent studies drive a need for more efficient and scalable tools for studying the human microbiome (Gonzalez & Knight, 2012). QIIME (Quantitative Insights Into Microbial Ecology) (Caporaso, Kuczynski, et al., 2010) is an open-source pipeline designed to provide self-contained microbial community analyses, from processing raw sequence data through publication-quality statistical analyses and visualizations.
QIIME integrates commonly used third-party tools and implements many diversity metrics, statistical methods, and visualization tools for analyzing microbial data. Consequently, most individual steps in a microbial community analysis can be performed in multiple ways. Here, we describe how samples are prepared for an Illumina MiSeq run, the QIIME pipeline, and our view of current best practices for analyzing microbial communities with QIIME. Other pipelines are available, including mothur (Schloss et al., 2009), the RDP tools (Olsen et al., 1991, Olsen et al., 1992), ARB (Ludwig et al., 2004), and VAMPS (Sogin, Welch, & Huse, 2009), among others. In this review, however, we focus on the MiSeq platform and QIIME, because this combination is increasingly popular for analyzing microbial communities; a detailed comparison with other pipelines and sequencing platforms is beyond the scope of the present work.
QIIME as Integrated Pipeline of Third-Party Tools
An early barrier to adoption of QIIME was that it was difficult to install, in part because of its large number of software dependencies (third-party packages that must be installed before QIIME is operational). The large number of dependencies was, however, a deliberate choice made during QIIME development. To build a pipeline for sequence analysis that encompasses the many steps of sequence collection, curation, and statistical analysis, the user must consider many existing tools that
PCR and Sequencing on Illumina MiSeq
Microbial community analysis typically begins with the extraction of DNA from primary samples (note that although most of this DNA comes from living cells in the sample, some may come from dead cells or extracellular DNA, so the extracted DNA does not perfectly represent the active community). Although methods for DNA extraction vary, several large initiatives such as the Earth Microbiome Project (Gilbert et al., 2010, Gilbert et al., 2010) and the Human Microbiome Project (HMP) (Human
QIIME Workflow for Conducting Microbial Community Analysis
The Illumina MiSeq technology can generate up to 10⁷ sequences in a single run (Kuczynski et al., 2012). QIIME takes the instrument output and generates useful information about the community represented in each sample. At a coarse-grained level, we divide this process into “upstream” and “downstream” stages (Fig. 19.1). The upstream stage includes all processing of the raw data (the sequencing output) and the generation of the key files for microbial analysis (the OTU table and the phylogenetic tree). The
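The upstream stage ends with an OTU table: a count of how many reads from each sample were assigned to each OTU. The following is a minimal, self-contained Python sketch of open-reference OTU picking (the approach recommended in the Recommendations section) followed by OTU table construction. It is not QIIME's implementation: the `identity` function is a toy position-by-position comparison standing in for real alignment-based clustering tools, and the function names, the dictionary representation, and the `New.` prefix for de novo OTUs are hypothetical. The 0.97 default mirrors the conventional 97% identity threshold.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Toy similarity: matching positions divided by the longer length.

    Real OTU pickers align sequences first; this sketch does not.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def open_reference_pick(
    reads: List[Tuple[str, str]],
    reference: Dict[str, str],
    threshold: float = 0.97,
) -> List[Tuple[str, str]]:
    """Assign each (sample_id, sequence) read to an OTU id.

    Step 1 (closed reference): assign the read to the most similar
    reference sequence if that similarity reaches the threshold.
    Step 2 (de novo): greedily cluster the reads that failed step 1.
    """
    assignments: List[Tuple[str, str]] = []
    failures: List[Tuple[str, str]] = []
    for sample_id, seq in reads:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            assignments.append((sample_id, best_id))
        else:
            failures.append((sample_id, seq))

    centroids: Dict[str, str] = {}  # new OTU id -> first sequence seen (centroid)
    for sample_id, seq in failures:
        for otu_id, centroid in centroids.items():
            if identity(seq, centroid) >= threshold:
                assignments.append((sample_id, otu_id))
                break
        else:  # no existing de novo OTU is close enough: start a new one
            otu_id = f"New.{len(centroids)}"
            centroids[otu_id] = seq
            assignments.append((sample_id, otu_id))
    return assignments


def build_otu_table(assignments: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count reads per OTU and sample: {otu_id: Counter({sample_id: count})}."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for sample_id, otu_id in assignments:
        table[otu_id][sample_id] += 1
    return dict(table)


if __name__ == "__main__":
    reference = {"ref1": "ACGTACGTAC", "ref2": "TTTTGGGGCC"}
    reads = [
        ("S1", "ACGTACGTAC"), ("S1", "TTTTGGGGCC"),
        ("S2", "ACGTACGTAC"), ("S2", "CCCCAAAATT"), ("S2", "CCCCAAAATT"),
    ]
    for otu_id, counts in build_otu_table(open_reference_pick(reads, reference)).items():
        print(otu_id, dict(counts))
```

In this toy run, the reads that match `ref1` and `ref2` are assigned to those reference OTUs, while the two unmatched reads from sample S2 fall through to the de novo step and form a single new OTU, `New.0`, with a count of 2. Keeping those reads, instead of discarding them as closed-reference picking would, is what the open-reference strategy adds.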
Testing linear gradients, including time series analysis
Recent microbiome surveys have started incorporating gradients (commonly over time) into their study design. We discuss a first, general approach for such cases using the Moving Pictures of the Human Microbiome dataset (Caporaso et al., 2011), in which two subjects were sampled daily for up to 396 days at three different body sites (sebum, saliva, and feces). Note that the mouse dataset we use as our primary example lacks a natural temporal ordering in its study design, so we cannot use
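Because the Moving Pictures samples are ordered in time, one simple exploratory question is whether a subject's community drifts steadily away from where it started. The sketch below is our own illustration, not the approach described in this chapter: for one subject and body site, it computes the Bray-Curtis dissimilarity between every sample and the earliest sample, then summarizes the trend with a Spearman rank correlation against sampling day. The function names and toy abundance vectors are hypothetical, and a formal significance test would also have to account for the temporal autocorrelation between consecutive samples.

```python
from typing import List, Sequence, Tuple


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i)."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sxx * syy) ** 0.5


def distance_from_baseline(
    samples: List[Tuple[int, List[float]]],
) -> Tuple[List[int], List[float]]:
    """For (day, abundances) pairs from one subject and body site, return the
    days and each sample's Bray-Curtis dissimilarity to the earliest sample."""
    samples = sorted(samples, key=lambda s: s[0])
    baseline = samples[0][1]
    days = [day for day, _ in samples]
    dists = [bray_curtis(baseline, counts) for _, counts in samples]
    return days, dists


if __name__ == "__main__":
    series = [(0, [10, 0, 5]), (1, [9, 1, 5]), (2, [6, 4, 5]), (3, [2, 8, 5])]
    days, dists = distance_from_baseline(series)
    print(days, [round(d, 3) for d in dists], spearman(days, dists))
```

In the toy series, the first taxon declines while the second grows, so the dissimilarity to day 0 rises monotonically and the Spearman correlation is 1.0. A correlation near zero would instead suggest a community that fluctuates around a stable state rather than following a gradient.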
Recommendations
Here, we highlight some of the main aspects to take into account when performing microbial community analysis:
- Use the open-reference OTU picking approach if your data allow it. It is typically faster than de novo OTU picking and, unlike closed-reference picking, it retains sequences that do not match the reference database, so it recovers more of the diversity in your samples.
- Quality-filter OTUs by abundance, for instance by removing singletons. See Bokulich et al. (2013) for further discussion on how to tune this quality-filtering and its effects on downstream analysis. Quality-filtering is
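Abundance-based OTU filtering operates directly on the OTU table. The sketch below assumes a hypothetical dictionary representation ({OTU id: {sample id: count}}) rather than the BIOM-format tables QIIME uses, and the function name and parameters are illustrative. It drops OTUs whose total count across all samples falls below `min_count` (the default of 2 removes singletons) and, optionally, OTUs whose share of all reads falls below `min_fraction`. Suitable thresholds should be chosen with Bokulich et al. (2013) and your own data in mind.

```python
from typing import Dict, Mapping

OtuTable = Dict[str, Dict[str, int]]  # {otu_id: {sample_id: count}}


def filter_otus_by_abundance(
    otu_table: Mapping[str, Mapping[str, int]],
    min_count: int = 2,
    min_fraction: float = 0.0,
) -> OtuTable:
    """Keep OTUs whose total count is >= min_count and whose share of all
    reads in the table is >= min_fraction."""
    grand_total = sum(sum(counts.values()) for counts in otu_table.values())
    kept: OtuTable = {}
    for otu_id, counts in otu_table.items():
        otu_total = sum(counts.values())
        if otu_total < min_count:
            continue  # e.g. a singleton when min_count == 2
        if grand_total and otu_total / grand_total < min_fraction:
            continue  # too rare relative to the whole table
        kept[otu_id] = dict(counts)
    return kept


if __name__ == "__main__":
    table = {
        "OTU1": {"S1": 50, "S2": 30},
        "OTU2": {"S1": 1},            # singleton: one read in the whole study
        "OTU3": {"S2": 1, "S3": 1},   # two reads in total, so not a singleton
    }
    print(sorted(filter_otus_by_abundance(table)))
    print(sorted(filter_otus_by_abundance(table, min_fraction=0.05)))
```

With the defaults, only `OTU2` is removed; adding `min_fraction=0.05` also removes `OTU3` (2 of 83 reads). Because filtering changes per-sample read totals, sequencing depth per sample is worth re-checking after this step.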
Conclusions
QIIME is a powerful tool for the analysis of bacterial communities, allowing researchers to carry out the necessary steps of sequence processing, from the raw data to the visualization and interpretation of the results. Two advantages make QIIME very useful: fidelity to the algorithms it uses and consistency in the analysis. Fidelity is obtained because QIIME wraps existing software, preserving the integrity of the original programs and algorithms designed, created, and tested by the
Acknowledgments
We thank William A. Walters and Jessica Metcalf for productive discussion and their useful comments about QIIME. We also acknowledge Manuel Lladser for helping collect the dataset and allowing us to use it, and the IQBio IGERT grant for funding data collection. J.A.N.M. is supported by a graduate scholarship funded jointly by the Balsells Foundation and by the University of Colorado at Boulder. S.H. is partially supported by NIH Grant R01 GM086884. This work was partially supported by the
References (107)
- Basic local alignment search tool. Journal of Molecular Biology (1990).
- Transient inability to manage proteobacteria promotes chronic gut inflammation in TLR5-deficient mice. Cell Host & Microbe (2012).
- Conservation evaluation and phylogenetic diversity. Biological Conservation (1992).
- Advancing analytical algorithms and pipelines for billions of microbial sequences. Current Opinion in Biotechnology (2012).
- Host remodeling of the gut microbiome and metabolic changes during pregnancy. Cell (2012).
- Association between composition of the human gastrointestinal microbiome and development of fatty liver with choline deficiency. Gastroenterology (2011).
- Allaire, J., Horner, J., Marti, V., & Porte, N. (2013). Markdown: Markdown rendering for R. From...
- Enterotypes of the human gut microbiome. Nature (2011).
- Microbial ecology: Fundamentals and applications (1998).
- Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing. Nature Methods (2013).
- Random forests. Machine Learning.
- PyNAST: A flexible tool for aligning sequences to a template alignment. Bioinformatics.
- QIIME allows analysis of high-throughput community sequencing data. Nature Methods.
- Moving pictures of the human microbiome. Genome Biology.
- Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME Journal.
- Computational tools for evaluating phylogenetic and hierarchical clustering trees. Journal of Computational and Graphical Statistics.
- KING (Kinemage, Next Generation): A versatile interactive molecular and scientific visualization program. Protein Science.
- The Ribosomal Database Project: Improved alignments and new tools for rRNA analysis. Nucleic Acids Research.
- Impact of diet in shaping gut microbiota revealed by a comparative study in children from Europe and rural Africa. Proceedings of the National Academy of Sciences of the United States of America.
- Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology.
- Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proceedings of the National Academy of Sciences of the United States of America.
- Normal gut microbiota modulates brain development and behavior. Proceedings of the National Academy of Sciences of the United States of America.
- Delivery mode shapes the acquisition and structure of the initial microbiota across multiple body habitats in newborns. Proceedings of the National Academy of Sciences of the United States of America.
- 16S ribosomal DNA sequence analysis of a large collection of environmental and clinical unidentifiable bacterial isolates. Journal of Clinical Microbiology.
- The role of microbes in Crohn's disease. Clinical Infectious Diseases.
- MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research.
- Search and clustering orders of magnitude faster than BLAST. Bioinformatics.
- UCHIME improves sensitivity and speed of chimera detection. Bioinformatics.
- Relaxed neighbor joining: A fast distance-based phylogenetic tree construction method. Journal of Molecular Evolution.
- The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proceedings of the National Academy of Sciences of the United States of America.
- Forensic identification using skin bacterial communities. Proceedings of the National Academy of Sciences of the United States of America.
- BARCRAWL and BARTAB: Software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing. BMC Bioinformatics.
- The Human Microbiome Project: A community resource for the healthy human microbiome. PLoS Biology.
- Meeting report: The terabase metagenomics workshop and the vision of an Earth microbiome project. Standards in Genomic Sciences.
- The Earth Microbiome Project: Meeting report of the “1st EMP meeting on sample selection and acquisition” at Argonne National Laboratory October 6th 2010. Standards in Genomic Sciences.
- SitePainter: A tool for exploring biogeographical patterns. Bioinformatics.
- Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika.
- Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research.
- Microbial community profiling for Human Microbiome Projects: Tools, techniques, and challenges. Genome Research.
- Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods.
- Bacterial diversity in two Neonatal Intensive Care Units (NICUs). PLoS One.
- A framework for human microbiome research. Nature.
- Structure, function and diversity of the healthy human microbiome. Nature.
- Partitioning diversity into independent alpha and beta components. Ecology.
- MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research.
- BLAT: The BLAST-like alignment tool. Genome Research.
- Supervised classification of human microbiota. FEMS Microbiology Reviews.
- Bayesian community-wide culture-independent microbial source tracking. Nature Methods.
- Supervised classification of microbiota mitigates mislabeling errors. ISME Journal.
- Succession of microbial consortia in the developing infant gut microbiome. Proceedings of the National Academy of Sciences of the United States of America.