Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets

  1. Terrence S. Furey 1,5,6
  1. Department of Genetics, Department of Biology, Lineberger Comprehensive Cancer Center, and Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA;
  2. Institute of Intelligent Systems for Automation, National Research Council, Bari IT 70126, Italy;
  3. Center for Human Genetics and Section of Medical Genetics, Department of Medicine, Duke University, Durham, North Carolina 27710, USA;
  4. Departments of Statistical Science, Computer Science, and Mathematics, Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA
  5. These authors contributed equally to this work.

    Abstract

    Single variant or single gene analyses generally account for only a small proportion of the phenotypic variation in complex traits. Alternatively, gene set or pathway association analyses are playing an increasingly important role in uncovering genetic architectures of complex traits through the identification of systematic genetic interactions. Two dominant paradigms for gene set analyses are association analyses based on SNP genotypes and those based on gene expression profiles. However, gene–disease association can manifest in many ways, such as alterations of gene expression, genotype, and copy number; thus, an integrative approach combining multiple forms of evidence can more accurately and comprehensively capture pathway associations. We have developed a single statistical framework, Gene Set Association Analysis (GSAA), that simultaneously measures genome-wide patterns of genetic variation and gene expression variation to identify sets of genes enriched for differential expression and/or trait-associated genetic markers. Simulation studies illustrate that joint analyses of genomic data increase the power to detect real associations when compared with gene set methods that use only one genomic data type. The analysis of two human diseases, glioblastoma and Crohn's disease, detected abnormalities in previously identified disease-associated pathways, such as pathways related to PI3K signaling, DNA damage response, and the activation of NFKB. In addition, GSAA predicted novel pathway associations, for example, differential genetic and expression characteristics in genes from the ABC transporter family in glioblastoma and from the HLA system in Crohn's disease. These results demonstrate that GSAA can help uncover biological pathways underlying human diseases and complex traits.

    Footnotes

    • 6 Corresponding authors.

      E-mail sayan{at}stat.duke.edu.

      E-mail tsfurey{at}email.unc.edu.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.124370.111.

    • Received April 13, 2011.
    • Accepted September 19, 2011.

    Freely available online through the Genome Research Open Access option.
