Elsevier

Methods in Enzymology

Volume 531, 2013, Pages 371-444
Chapter Nineteen - Advancing Our Understanding of the Human Microbiome Using QIIME

https://doi.org/10.1016/B978-0-12-407863-5.00019-8

Abstract

High-throughput DNA sequencing technologies, coupled with advanced bioinformatics tools, have enabled rapid advances in microbial ecology and our understanding of the human microbiome. QIIME (Quantitative Insights Into Microbial Ecology) is an open-source bioinformatics software package designed for microbial community analysis based on DNA sequence data, which provides a single analysis framework for analysis of raw sequence data through publication-quality statistical analyses and interactive visualizations. In this chapter, we demonstrate the use of the QIIME pipeline to analyze microbial communities obtained from several sites on the bodies of transgenic and wild-type mice, as assessed using 16S rRNA gene sequences generated on the Illumina MiSeq platform. We present our recommended pipeline for performing microbial community analysis and provide guidelines for making critical choices in the process. We present examples of some of the types of analyses that are enabled by QIIME and discuss how other tools, such as phyloseq and R, can be applied to expand upon these analyses.

Introduction

Advances in DNA sequencing technologies, together with the availability of culture-independent sequencing methods and software for analyzing the massive quantities of data resulting from these technologies, have vastly improved our ability to characterize microbial communities in many diverse environments. The human microbiota, the collection of microbes living in or on the human body, is of considerable interest: microbial cells outnumber human cells in our bodies by a ratio of up to 10 to 1 (Savage, 1977). These microbial communities contribute to healthy human physiology (De Filippo et al., 2010, Dethlefsen and Relman, 2011, Spencer et al., 2011) and development (Dominguez-Bello et al., 2010, Koenig et al., 2011), and dysbiosis (or imbalance in these communities) is now known to be associated with disease, including obesity (Turnbaugh et al., 2009) and Crohn’s disease (Eckburg & Relman, 2007). More recently, evidence from transplants into germ-free mice suggests that some of these associations may be causal, because certain phenotypes can be transmitted by transmitting the microbiota (Carvalho et al., 2012, McLean et al., 2012, Turnbaugh et al., 2009), even including transmission of human phenotypes into mice (Diaz Heijtz et al., 2011, Koren et al., 2012, Smith et al., 2013).

Illumina’s MiSeq and HiSeq DNA sequencing instruments, respectively, sequence tens of millions, or billions, of DNA fragments in a single sequencing run (Kuczynski et al., 2012). The rapidly increasing data volumes typical of recent studies drive a need for more efficient and scalable tools to study the human microbiome (Gonzalez & Knight, 2012). QIIME (Quantitative Insights Into Microbial Ecology) (Caporaso, Kuczynski, et al., 2010) is an open-source pipeline designed to provide self-contained microbial community analyses, from interacting with raw sequence data through publication-quality statistical analyses and visualizations.

QIIME integrates commonly used third-party tools and implements many diversity metrics, statistical methods, and visualization tools for analyzing microbial data. Consequently, most individual steps in the microbial community analysis can be performed in multiple ways. Here, we describe how samples are prepared for an Illumina MiSeq run, the QIIME pipeline, and our view of the current best practices for analyzing microbial communities with QIIME. Other pipelines are available, including mothur (Schloss et al., 2009), the RDP tools (Olsen et al., 1991, Olsen et al., 1992), ARB (Ludwig et al., 2004), and VAMPS (Sogin, Welch, & Huse, 2009), among others. In this review, we focus on analysis with the MiSeq platform and QIIME because this combination is increasingly popular for analyzing microbial communities; a detailed comparison of other available pipelines and sequencing platforms is beyond the scope of the present work.

QIIME as Integrated Pipeline of Third-Party Tools

An early barrier to adoption of QIIME was that it was difficult to install, in part because of the large number of software dependencies (third-party packages that need to be installed before QIIME is operational). The large number of dependencies was, however, a deliberate choice made during QIIME development. To build a pipeline for sequence analysis that encompasses the many steps from sequence collection, curation, and statistical analysis, the user must consider many existing tools that

PCR and Sequencing on Illumina MiSeq

Microbial community analysis typically begins with the extraction of DNA from primary samples (note that although most of this DNA comes from living cells in the sample, some may come from dead cells or be extracellular DNA, so the extracted DNA does not perfectly represent the active community). Although methods for DNA extraction vary, several large initiatives such as the Earth Microbiome Project (Gilbert et al., 2010, Gilbert et al., 2010) and the Human Microbiome Project (HMP) (Human

QIIME Workflow for Conducting Microbial Community Analysis

The Illumina MiSeq technology can generate up to 10^7 sequences in a single run (Kuczynski et al., 2012). QIIME takes the instrument output and generates useful information about the community represented in each sample. At a coarse-grained level, we divide this process into “upstream” and “downstream” stages (Fig. 19.1). The upstream step includes all the processing of the raw data (sequencing output) and generating the key files (OTU table and phylogenetic tree) for microbial analysis. The
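
To make the notion of an OTU table concrete, the following minimal Python sketch tallies per-read OTU assignments into a table of counts per OTU and per sample. It is an illustration only, not QIIME's implementation: the function name `build_otu_table`, the sample names, and the OTU identifiers are all hypothetical, and a real OTU table would normally be produced by the pipeline itself.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_otu_table(assignments: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Tally (sample_id, otu_id) pairs into {otu_id: {sample_id: read count}}.

    Hypothetical helper for illustration; it is not part of QIIME.
    """
    table = defaultdict(Counter)
    for sample_id, otu_id in assignments:
        table[otu_id][sample_id] += 1
    # Convert the nested counters to plain dictionaries for easy inspection.
    return {otu_id: dict(counts) for otu_id, counts in table.items()}


# Toy input: one (sample, OTU) pair per sequence read. All names are made up.
toy_assignments = [
    ("mouse1.gut", "OTU_A"),
    ("mouse1.gut", "OTU_A"),
    ("mouse1.gut", "OTU_B"),
    ("mouse2.skin", "OTU_A"),
    ("mouse2.skin", "OTU_C"),
]

toy_otu_table = build_otu_table(toy_assignments)
for otu_id in sorted(toy_otu_table):
    print(otu_id, toy_otu_table[otu_id])
```

Each entry gives the number of reads from one sample that were assigned to one OTU; downstream diversity analyses start from counts of this kind.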

Testing linear gradients, including time series analysis

Recent microbiome surveys have started integrating gradients (commonly over time) in their study design. We will discuss a first and general approach for those cases, using the Moving Pictures of the Human Microbiome Dataset (Caporaso et al., 2011), where two subjects were sampled daily for up to 396 days in three different body sites (sebum, saliva, and feces). Note that the mouse dataset that we use as a primary example lacks a natural temporal ordering in the study design, so we cannot use

Recommendations

Here, we highlight some of the main aspects to take into account when performing microbial community analysis:

  • Use the open-reference OTU picking approach if your data allow it. It will reduce the running time and will recover all the diversity in your samples.

  • Perform an OTU quality-filtering based on abundance, by removing singletons, for instance. See Bokulich et al. (2013) for further discussion on how to tune this quality-filtering and its effects on downstream analysis. Quality-filtering is
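
The abundance-based filtering recommended above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not QIIME's own filtering code: the table layout (OTU identifier mapped to per-sample counts), the function name `filter_low_abundance_otus`, and the default threshold of 2 (which removes singletons) are all hypothetical choices made for the example.

```python
from typing import Dict

# Hypothetical toy layout: OTU identifier -> {sample identifier -> read count}.
OtuTable = Dict[str, Dict[str, int]]


def filter_low_abundance_otus(table: OtuTable, min_total_count: int = 2) -> OtuTable:
    """Keep only OTUs observed at least min_total_count times across all samples.

    With the default of 2, OTUs seen exactly once in total (singletons) are dropped.
    """
    return {
        otu_id: dict(counts)
        for otu_id, counts in table.items()
        if sum(counts.values()) >= min_total_count
    }


toy_table = {
    "OTU_1": {"gut": 120, "skin": 3},
    "OTU_2": {"gut": 1, "skin": 0},  # singleton: a single read in total
    "OTU_3": {"gut": 0, "skin": 2},
}

filtered_table = filter_low_abundance_otus(toy_table)
print(sorted(filtered_table))
```

Raising `min_total_count` makes the filter stricter. An appropriate threshold depends on the dataset; Bokulich et al. (2013) discuss how to tune it.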

Conclusions

QIIME is a powerful tool for the analysis of bacterial communities, allowing researchers to carry out the necessary steps in the processing of sequences, from the raw data to the visualization and interpretation of the results. Two advantages make QIIME very useful: fidelity to the algorithms used and consistency in the analysis. Fidelity is obtained because QIIME wraps existing software, preserving the integrity of the original programs and algorithms designed, created, and tested by the

Acknowledgments

We thank William A. Walters and Jessica Metcalf for productive discussion and their useful comments about QIIME. We also acknowledge Manuel Lladser for helping collect the dataset and allowing us to use it, and the IQBio IGERT grant for funding data collection. J.A.N.M. is supported by a graduate scholarship funded jointly by the Balsells Foundation and by the University of Colorado at Boulder. S.H. is partially supported by NIH Grant R01 GM086884. This work was partially supported by the

References (107)

  • L. Breiman

    Random forests

    Machine Learning

    (2001)
  • J.G. Caporaso et al.

    PyNAST: A flexible tool for aligning sequences to a template alignment

    Bioinformatics

    (2010)
  • J.G. Caporaso et al.

    QIIME allows analysis of high-throughput community sequencing data

    Nature Methods

    (2010)
  • J.G. Caporaso et al.

    Moving pictures of the human microbiome

    Genome Biology

    (2011)
  • J.G. Caporaso et al.

    Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms

    ISME Journal

    (2012)
  • J. Chakerian et al.

    Computational tools for evaluating phylogenetic and hierarchical clustering trees

    Journal of Computational and Graphical Statistics

    (2012)
  • V.B. Chen et al.

    KING (Kinemage, Next Generation): A versatile interactive molecular and scientific visualization program

    Protein Science

    (2009)
  • J.R. Cole et al.

    The Ribosomal Database Project: Improved alignments and new tools for rRNA analysis

    Nucleic Acids Research

    (2009)
  • C. De Filippo et al.

    Impact of diet in shaping gut microbiota revealed by a comparative study in children from Europe and rural Africa

    Proceedings of the National Academy of Sciences of the United States of America

    (2010)
  • T.Z. DeSantis et al.

    Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB

    Applied and Environmental Microbiology

    (2006)
  • L. Dethlefsen et al.

    Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation

    Proceedings of the National Academy of Sciences of the United States of America

    (2011)
  • R. Diaz Heijtz et al.

    Normal gut microbiota modulates brain development and behavior

    Proceedings of the National Academy of Sciences of the United States of America

    (2011)
  • M.G. Dominguez-Bello et al.

    Delivery mode shapes the acquisition and structure of the initial microbiota across multiple body habitats in newborns

    Proceedings of the National Academy of Sciences of the United States of America

    (2010)
  • M. Drancourt et al.

    16S ribosomal DNA sequence analysis of a large collection of environmental and clinical unidentifiable bacterial isolates

    Journal of Clinical Microbiology

    (2000)
  • P.B. Eckburg et al.

    The role of microbes in Crohn's disease

    Clinical Infectious Diseases

    (2007)
  • R.C. Edgar

    MUSCLE: Multiple sequence alignment with high accuracy and high throughput

    Nucleic Acids Research

    (2004)
  • R.C. Edgar

    Search and clustering orders of magnitude faster than BLAST

    Bioinformatics

    (2010)
  • R.C. Edgar et al.

    UCHIME improves sensitivity and speed of chimera detection

    Bioinformatics

    (2011)
  • J. Evans et al.

    Relaxed neighbor joining: A fast distance-based phylogenetic tree construction method

    Journal of Molecular Evolution

    (2006)
  • N. Fierer et al.

    The influence of sex, handedness, and washing on the diversity of hand surface bacteria

    Proceedings of the National Academy of Sciences of the United States of America

    (2008)
  • N. Fierer et al.

    Forensic identification using skin bacterial communities

    Proceedings of the National Academy of Sciences of the United States of America

    (2010)
  • D.N. Frank

    BARCRAWL and BARTAB: Software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing

    BMC Bioinformatics

    (2009)
  • D. Gevers et al.

    The Human Microbiome Project: A community resource for the healthy human microbiome

    PLoS Biology

    (2012)
  • J.A. Gilbert et al.

    Meeting report: The terabase metagenomics workshop and the vision of an Earth microbiome project

    Standards in Genomic Sciences

    (2010)
  • J.A. Gilbert et al.

    The Earth Microbiome Project: Meeting report of the “1 EMP meeting on sample selection and acquisition” at Argonne National Laboratory October 6 2010

    Standards in Genomic Sciences

    (2010)
  • A. Gonzalez et al.

    SitePainter: A tool for exploring biogeographical patterns

    Bioinformatics

    (2012)
  • J.C. Gower

    Some distance properties of latent root and vector methods used in multivariate analysis

    Biometrika

    (1966)
  • B.J. Haas et al.

    Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons

    Genome Research

    (2011)
  • M. Hamady et al.

    Microbial community profiling for Human Microbiome Projects: Tools, techniques, and challenges

    Genome Research

    (2009)
  • M. Hamady et al.

    Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex

    Nature Methods

    (2008)
  • K.M. Hewitt et al.

    Bacterial diversity in two Neonatal Intensive Care Units (NICUs)

    PLoS One

    (2013)
  • Human Microbiome Project Consortium

    A framework for human microbiome research

    Nature

    (2012)
  • Human Microbiome Project Consortium

    Structure, function and diversity of the healthy human microbiome

    Nature

    (2012)
  • L. Jost

    Partitioning diversity into independent alpha and beta components

    Ecology

    (2007)
  • K. Katoh et al.

    MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform

    Nucleic Acids Research

    (2002)
  • W.J. Kent

    BLAT—The BLAST-like alignment tool

    Genome Research

    (2002)
  • D. Knights et al.

    Supervised classification of human microbiota

    FEMS Microbiology Reviews

    (2011)
  • D. Knights et al.

    Bayesian community-wide culture-independent microbial source tracking

    Nature Methods

    (2011)
  • D. Knights et al.

    Supervised classification of microbiota mitigates mislabeling errors

    ISME Journal

    (2011)
  • J.E. Koenig et al.

    Succession of microbial consortia in the developing infant gut microbiome

    Proceedings of the National Academy of Sciences of the United States of America

    (2011)