Elsevier

Journal of Autoimmunity

Volume 64, November 2015, Pages 1-12
Review article
The genetics of human autoimmune disease: A perspective on progress in the field and future directions

https://doi.org/10.1016/j.jaut.2015.08.015

Highlights

  • Hundreds of variations in HLA and non-HLA genes predispose to autoimmunity.

  • For non-coding variations, alterations in chromatin structure provide valuable clues.

  • Expression quantitative traits can help identify causal variants.

  • Pathway analyses can facilitate further understanding of autoimmune genetics.

  • Functional studies and accounting for missing heritability are critical to advances.

Abstract

Progress in defining the genetics of autoimmune disease has been dramatically enhanced by large-scale genetic studies. Over the last decade, genome-wide approaches examining hundreds or, for some diseases, thousands of cases and controls have been implemented using high-throughput genotyping and appropriate algorithms, providing a wealth of data. These studies have identified hundreds of non-HLA loci and have further defined the HLA variations that predispose to different autoimmune diseases. Efforts to identify genetic risk loci are complemented by progress in gene expression studies, including the definition of expression quantitative trait loci (eQTL), and in mapping alterations in chromatin structure, including histone marks, DNase I sensitivity, and repressed chromatin regions, as well as transcription factor binding sites. Integration of this information can partially explain why particular variations alter proclivity to autoimmune phenotypes. Although our knowledge base remains incomplete, with only partial definition of hereditary factors and possible functional connections, this progress has facilitated, and will continue to facilitate, a better understanding of critical pathways and critical changes in immunoregulation. Advances in defining and understanding functional variants could lead to both novel therapeutics and personalized medicine, in which therapeutic approaches are chosen based on particular molecular phenotypes and genomic alterations.

Introduction

Our understanding of which genes predispose to different autoimmune diseases has expanded rapidly over the last decade. This progress has been mostly due to genome-wide association studies (GWAS) and the development of various technical and analytic tools. However, despite this progress, less than half of the heritability of most autoimmune diseases can be explained, and nearly half of this identified genetic risk is due to variations within HLA. With some notable exceptions, the actual functional variants that underlie statistically significant associations are still largely unknown. In the following perspective, I will review some of the more salient advances in the field, provide examples to illustrate specific points, indicate where knowledge is sparse, and discuss the potential for future advances that I believe could further define the pathogenesis and perhaps enable application to diagnosis and therapy. More detailed aspects of the genetics of a variety of autoimmune diseases are presented by experts in the field in other sections of this special issue of the journal. A general paradigm for GWAS and sequence variant studies is shown in Fig. 1 and discussed in subsequent sections.

Epidemiological studies of most autoimmune diseases, including rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), type 1 diabetes (T1D), multiple sclerosis (MS), and primary biliary cirrhosis (PBC), show that there is strong heritability. These include studies showing increased concordance in monozygotic compared to dizygotic twins, as well as studies showing increased risk to siblings of proband cases compared to the general population. Most of these studies are not truly population based and may have biased results; nonetheless, some points are worth noting. First, some autoimmune diseases have much higher sibling relative risks (e.g. SLE [1], [2], T1D [3], celiac disease [4], and PBC [5], [6]) than others (e.g. RA [2] and MS [7]). Second, although concordance of disease is much higher in monozygotic than in dizygotic twins for many autoimmune diseases, the overall monozygotic concordance is usually substantially less than 50% [8]. This indicates that stochastic factors, including environmental variables, are a strong component, and that although genetics can be very useful in identifying important factors in etiopathogenesis, it can only partially predict phenotype. Although some specific environmental factors have been identified (e.g. smoking and rheumatoid arthritis [9], [10]), it is also possible that most of the incomplete concordance is simply due to chance or indefinable events.
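The two epidemiological measures used in this paragraph can be written as simple ratios. The sketch below assumes the usual definitions: sibling relative risk as the disease risk in siblings of probands divided by the population prevalence, and pairwise twin concordance as the fraction of affected pairs in which both twins are affected. The function names and all numbers are hypothetical and are not taken from the cited studies.

```python
def sibling_relative_risk(sibling_risk, population_prevalence):
    """Risk to siblings of affected probands divided by population prevalence."""
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be in (0, 1]")
    if not 0 <= sibling_risk <= 1:
        raise ValueError("sibling_risk must be in [0, 1]")
    return sibling_risk / population_prevalence


def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Fraction of twin pairs (with at least one affected twin) in which both are affected."""
    total = concordant_pairs + discordant_pairs
    if total <= 0:
        raise ValueError("at least one twin pair is required")
    return concordant_pairs / total


# Hypothetical inputs, for illustration only.
lambda_s = sibling_relative_risk(sibling_risk=0.02, population_prevalence=0.001)
mz = pairwise_concordance(concordant_pairs=12, discordant_pairs=38)
dz = pairwise_concordance(concordant_pairs=2, discordant_pairs=48)
print(f"sibling relative risk = {lambda_s:.1f}")
print(f"MZ concordance = {mz:.2f}, DZ concordance = {dz:.2f}")
```

With these made-up inputs the monozygotic concordance is several times the dizygotic concordance yet remains well below 50%, which is the pattern the text describes.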

The major advance in identifying genetic loci that predispose to autoimmune diseases has been GWAS. Although some loci outside the major histocompatibility complex (MHC; HLA in humans) were identified prior to GWAS using linkage or candidate gene studies, and other methodologies including admixture mapping have also enabled identification of a modicum of risk loci, the exponential increase in loci (over 200 for some autoimmune diseases) has been the direct result of GWAS. The basis of GWAS is the technology enabling efficient and accurate genotyping of single nucleotide polymorphisms (SNPs), together with large collaborative studies such as HapMap [11], [12] that defined large numbers (hundreds of thousands) of SNPs in different populations. The success of GWAS is in large part due to the practical advantage of the case/control design, namely the ability to recruit large numbers of cases and population controls as opposed to the difficulty of recruiting families: power for any association study is largely based on numbers. A critical aspect of these studies has been the ability to adequately control for differences in population substructure using statistical methodology. Most commonly this is done by logistic regression that includes as covariates the relevant principal components defined by principal component analysis or similar methods [13], [14], [15]. In some studies only continental population differences are accounted for, but for the most part type 1 errors (false positives) due to unrecognized stratification differences between case and control populations have been minimized. In fact, many studies have used publicly available control genotypes rather than specifically matched collections of controls. It is also worth noting that it may be possible to increase the power to ascertain risk variants (i.e. decrease type 2 errors, false negatives) by limiting studies to more homogeneous populations; additional considerations of population substructure are discussed in subsequent sections (see sections 3.3 Rare/uncommon variants, 4 Ancestry makes a difference). However, GWAS is largely applicable to those loci that fulfill the common disease, common variant hypothesis (minor allele frequency > 0.05), since this methodology relies on linkage disequilibrium (LD) between the marker (SNP) detected and the actual disease-causing variant(s). The commonly used genotyping platforms (Illumina and Affymetrix chip arrays) and clustering algorithms are also most accurate for minor allele frequencies (MAFs) > 0.05, although there have been some improvements in software to enable higher accuracy genotyping.
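To illustrate why an ancestry covariate matters, the following minimal sketch fits a logistic regression with and without a single ancestry term. It is a simplification under stated assumptions: a binary ancestry indicator stands in for a principal component, genotype is coded as carrier or non-carrier, and the counts are hypothetical, constructed so that the variant has no effect within either subpopulation. The small Newton-Raphson solver is written out only to keep the example dependency-free; it is not the software used in the cited studies.

```python
import math


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, n_iter=25):
    """Newton-Raphson fit on grouped data.

    rows: iterable of (covariates, n_cases, n_total).
    Returns coefficients with the intercept first.
    """
    k = len(rows[0][0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for covs, cases, total in rows:
            x = [1.0] + list(covs)
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = total * p * (1.0 - p)
            for i in range(k):
                grad[i] += x[i] * (cases - total * p)
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


# Hypothetical counts. Subpopulation A (ancestry = 1) has more cases and a
# higher carrier frequency than subpopulation B (ancestry = 0), but within
# each subpopulation carriers and non-carriers have the same disease odds.
adjusted_rows = [  # ((carrier, ancestry), n_cases, n_total)
    ((1, 1), 200, 250), ((0, 1), 200, 250),
    ((1, 0), 10, 50), ((0, 0), 90, 450),
]
crude_rows = [((1,), 210, 300), ((0,), 290, 700)]  # same data, ancestry ignored

crude_or = math.exp(fit_logistic(crude_rows)[1])
adjusted_or = math.exp(fit_logistic(adjusted_rows)[1])
print(f"crude OR = {crude_or:.2f}, adjusted OR = {adjusted_or:.2f}")
```

In this constructed data set the unadjusted odds ratio is about 3.3, a spurious association produced entirely by stratification, while the ancestry-adjusted odds ratio is 1.0.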

Another important aspect of current GWAS and other association studies is the ability to accurately impute much of the sequence variation that is not directly genotyped on current genotyping platforms. Imputation is based on sophisticated algorithms that, under certain conditions, enable accurate estimation of the sequence variants lying between genotyped markers, using reference genomes that contain the missing information [16], [17], [18], [19]. It relies on the shared loci (almost all of the SNPs on commonly used genotyping platforms are shared with the reference genomes) and the patterns of variation (http://www.1000genomes.org). With the completion of phase 3 of the 1000 Genomes sequencing study there is now sequence data for multiple populations that can be used to inform imputation (http://www.1000genomes.org/) [20]. In some cases this information suggests candidate damaging non-synonymous variants; in others it provides candidate non-coding variants that may be implicated by expression quantitative trait loci (eQTL), transcription factor binding sites, DNase sensitivity, or other features (e.g. histone marks) that have been elucidated in particular tissues or cell lines (discussed further in Section 6). Importantly, imputation methods can also be used to determine classical HLA determinants and amino acids.
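The core idea of imputation, borrowing untyped alleles from reference haplotypes that agree with the study sample at the typed markers, can be sketched in a few lines. This is a deliberately naive nearest-haplotype match, not the hidden Markov model approach used by the published imputation methods cited above; the function name, the 0/1 allele coding, and the tiny reference panel are all hypothetical.

```python
def impute_haplotype(study, reference_panel):
    """Fill untyped sites (None) in a study haplotype.

    Reference haplotypes are ranked by how many typed sites they share with
    the study haplotype; untyped sites are filled with the mean allele
    (an allele dosage between 0 and 1) across the best-matching haplotypes.
    """
    typed = [i for i, allele in enumerate(study) if allele is not None]

    def n_matches(ref):
        return sum(ref[i] == study[i] for i in typed)

    best = max(n_matches(ref) for ref in reference_panel)
    closest = [ref for ref in reference_panel if n_matches(ref) == best]

    imputed = []
    for i, allele in enumerate(study):
        if allele is not None:
            imputed.append(float(allele))  # genotyped site: keep as observed
        else:
            imputed.append(sum(ref[i] for ref in closest) / len(closest))
    return imputed


# Hypothetical reference panel: four haplotypes over five SNPs (0/1 alleles).
reference_panel = [
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
]
# Study haplotype genotyped at SNPs 0, 2 and 3 only.
study = [1, None, 0, 1, None]
imputed = impute_haplotype(study, reference_panel)
print(imputed)
```

Here the last two reference haplotypes match the study haplotype at all three typed SNPs. They disagree at the second SNP, so its imputed dosage is 0.5, and they agree at the fifth, so its imputed dosage is 0.0. A fractional dosage reflects uncertainty, which is one reason imputation is accurate only under certain conditions.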

A critical aspect of any association study is defining the phenotype of cases. Theoretically, the better the definition of phenotype, the greater the likelihood that type 2 errors (false negatives) will be minimized, since inclusion of unaffected individuals or those with a different autoimmune disease may increase heterogeneity and dilute the risk attributable to variants for a particular phenotype. However, there may be a practical trade-off: including cases that lack some information (e.g. particular autoantibodies or even age of onset) can increase the power to detect loci through greater sample size, whereas more rigid criteria reduce the number of eligible cases. For some autoimmune diseases particular features clearly increase the relative risk conferred by many of the susceptibility loci. A strong example is SLE, where cases with anti-double-stranded DNA (anti-dsDNA) antibodies show substantially higher odds ratios for several lupus susceptibility alleles than cases that are anti-dsDNA negative [21]. Similarly, some RA studies show stronger associations when the cases are limited to those with anti-cyclic citrullinated peptide antibodies (anti-CCP) [22]. However, in other studies in which cases are limited to anti-CCP positive RA, fewer susceptibility loci are associated with disease, probably mostly due to decreased power as a function of smaller sample size [23]. Similarly, if an SLE study were restricted to only those cases that manifested the same 4 of the 11 American College of Rheumatology criteria, it is likely that many of the identified loci would have been missed due to weaker signals from decreased sample sizes.
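The trade-off between phenotype purity and sample size can be made concrete with an approximate power calculation. The sketch below assumes a two-sided test comparing allele frequencies in cases and controls under a normal approximation, a genome-wide significance threshold of 5e-8, and a simple dilution model in which a fraction of the recruited cases carries control allele frequencies. The function names, the dilution model, and every number are illustrative assumptions, not values from the cited studies.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk allele frequency in true cases implied by an allelic odds ratio."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)


def gwas_power(n_cases, n_controls, control_freq, odds_ratio,
               misclassified=0.0, alpha=5e-8):
    """Approximate power of a two-sided allele frequency comparison.

    misclassified: assumed fraction of recruited cases that do not have the
    phenotype of interest and therefore carry control allele frequencies.
    """
    p1 = case_allele_freq(control_freq, odds_ratio)
    p_cases = (1 - misclassified) * p1 + misclassified * control_freq
    # Each individual contributes two alleles.
    se = math.sqrt(p_cases * (1 - p_cases) / (2 * n_cases)
                   + control_freq * (1 - control_freq) / (2 * n_controls))
    z = abs(p_cases - control_freq) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z - z_crit)


# Hypothetical scenarios: a strictly defined cohort versus a cohort three
# times larger in which 20% of cases are assumed not to share the phenotype.
strict = gwas_power(n_cases=2000, n_controls=10000, control_freq=0.2, odds_ratio=1.3)
broad = gwas_power(n_cases=6000, n_controls=10000, control_freq=0.2, odds_ratio=1.3,
                   misclassified=0.2)
print(f"strict definition: power = {strict:.2f}")
print(f"broad definition:  power = {broad:.2f}")
```

Which design has more power depends entirely on the assumed inputs. Raising `misclassified` or shrinking the gain in `n_cases` shifts the balance toward the stricter definition, which is consistent with the mixed results described for anti-CCP positive RA.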

The decision on phenotypic definition is analogous to the classic fight between the lumpers and the splitters in the definition of mammalian species [24], an analogy advanced many years ago with respect to the nosology of genetic disease [25]. However, for some autoimmune diseases the definition of “subtypes” may be crucial. In this author's collaborative studies of myasthenia gravis, it was critical to divide the disease by age of onset and by the absence of thymoma. Here, there are major differences in which HLA genes are important in susceptibility when early onset (<45 years of age) and late onset (>50 years of age) disease are compared ([26], [27] and Seldin, Gregersen, Hammarstrom, unpublished data). In addition, the highest risk non-HLA gene observed in early onset myasthenia gravis (EOMG), TNIP1 [26], has no effect in late onset myasthenia gravis (LOMG) (Seldin, Gregersen, Hammarstrom, unpublished data).

As the field progresses there are likely to be more studies that examine endophenotypes including the most prominent disease manifestations, disease severity and morbidity. In SLE there is a study suggesting specific loci that predispose to nephropathy [28]. Such studies could also include those individuals that have very poor outcomes or more severe phenotypes.

Section snippets

The major histocompatibility complex region

With rare exception, genes in the human major histocompatibility complex (MHC), HLA, are the strongest risk genes for each autoimmune disease. Several general points are notable: 1) often there is more than one association signal from the MHC region with the second and even third signals sometimes stronger than any of the individual signals from loci outside this region; 2) recent studies suggest that most of the strongest signals are from classical HLA determinants (not from single SNPs); 3)

Non-HLA susceptibility loci

Several hundred non-HLA loci have been identified that are associated with one or more autoimmune diseases. Most are of relatively high frequency, perhaps reflecting the sensitivity of the methods used to detect these loci (i.e. GWAS, as discussed above). The vast majority of these loci are located either within or close to genes that participate in the immune response. These include genes that are components of antigen processing, presentation, recognition, differentiation,

Ancestry makes a difference

There are substantial differences in the loci defined for autoimmune diseases in disparate population groups. These differences are most apparent when comparing results from one continental population to another. The largest amount of available data is from studies of European and East Asian populations [93], [94], [95], [96]. The differences are partially explained by the frequency of particular variants in these different populations. Thus, the PTPN22 R620W

Missing heritability

For most autoimmune diseases, estimates suggest that over 50% of the genetic loci contributing to heritability are not yet elucidated. Many explanations that are not mutually exclusive are possible to account for this missing heritability including: 1) very large numbers of common low risk variants that can only be defined with huge numbers of cases and controls (e.g. sample sizes >>100,000); 2) large numbers of rare variants that have not been detected due to insufficient sequencing, sample

Towards functional studies: coding variants, gene expression, and epigenetics

Relatively few non-synonymous variants have been identified in non-MHC genetic loci contributing to the susceptibility to autoimmune diseases. These include PTPN22, TNIP1, ITGAM, and SH2B3 [26], [48], [59], [102], [112]. Although Koch's postulates cannot be tested, for some variants there is compelling evidence that the variant changes a functional aspect(s) of the immune response. For PTPN22 the R620W variant has been extensively studied and clearly has a major effect on increasing T cell and

Final comments: what to make of it all

Some in the scientific community have questioned whether the myriad loci identified in GWAS have been useful. While GWAS has not solved the problem of what causes autoimmunity, it is my contention that it has had an important impact on the field and that the identification of these loci will continue to be one of the major drivers in furthering our understanding of what causes these phenotypes. Like most investigative results, these studies raise more questions. What are the real

Conflict of interest

The authors have no conflict of interest to declare.

Acknowledgments

This work was supported by R01DK091823.

References (179)

  • I.T. Harley et al.

    Genetic susceptibility to SLE: new insights from fine mapping and genome-wide association studies

    Nat. Rev. Genet.

    (2009)
  • K. Sundquist et al.

    Concordant and discordant associations between rheumatoid arthritis, systemic lupus erythematosus and ankylosing spondylitis based on all hospitalizations in Sweden between 1973 and 2004

    Rheumatol. Oxf.

    (2008)
  • K.L. Mehers et al.

    The genetic basis for type 1 diabetes

    Br. Med. Bull.

    (2008)
  • I. Korponay-Szabo et al.

    Families with multiple cases of gluten-sensitive enteropathy

    Z. fur Gastroenterol.

    (1998)
  • C. Selmi et al.

    Epidemiology and pathogenesis of primary biliary cirrhosis

    J. Clin. Gastroenterol.

    (2004)
  • N.M. Nielsen et al.

    Familial risk of multiple sclerosis: a nationwide cohort study

    Am. J. Epidemiol.

    (2005)
  • L. Padyukov et al.

    A gene-environment interaction between smoking and shared epitope genes in HLA-DR provides a high risk of seropositive rheumatoid arthritis

    Arth. Rheum.

    (2004)
  • M. Heliovaara et al.

    Smoking and risk of rheumatoid arthritis

    J. Rheumatol.

    (1993)
  • G.A. Thorisson et al.

    The International HapMap project web site

    Genome Res.

    (2005)
  • The International HapMap project

    Nature

    (2003)
  • A.L. Price et al.

    Principal components analysis corrects for stratification in genome-wide association studies

    Nat. Genet.

    (2006)
  • A.L. Price et al.

    New approaches to population stratification in genome-wide association studies

    Nat. Rev. Genet.

    (2010)
  • C. Tian et al.

    Accounting for ancestry: population substructure and genome-wide association studies

    Hum. Mol. Genet.

    (2008)
  • J. Marchini et al.

    Genotype imputation for genome-wide association studies

    Nat. Rev. Genet.

    (2010)
  • B.N. Howie et al.

    A flexible and accurate genotype imputation method for the next generation of genome-wide association studies

    PLoS Genet.

    (2009)
  • G.R. Abecasis et al.

    An integrated map of genetic variation from 1,092 human genomes

    Nature

    (2012)
  • S.A. Chung et al.

    Differential genetic associations for systemic lupus erythematosus based on anti-dsDNA autoantibody production

    PLoS Genet.

    (2011)
  • S. Viatte et al.

    Genetic markers of rheumatoid arthritis susceptibility in anti-citrullinated peptide antibody negative patients

    Ann. Rheum. Dis.

    (2012)
  • J. Cui et al.

    The influence of polygenic risk scores on heritability of anti-CCP level in RA

    Genes Immun.

    (2014)
  • G. Simpson

    The Principles of Classification and a Classification of Mammals

    Bull. AMNH

    (1945)
  • V.A. McKusick

    On lumpers and splitters, or the nosology of genetic disease

    Perspect. Biol. Med.

    (1969)
  • P.K. Gregersen et al.

    Risk for myasthenia gravis maps to a (151) Pro→Ala change in TNIP1 and to human leukocyte antigen-B*08

    Ann. Neurol.

    (2012)
  • A.H. Maniaol et al.

    Late onset myasthenia gravis is associated with HLA DRB1*15:01 in the Norwegian population

    PLoS One

    (2012)
  • S.A. Chung et al.

    Lupus nephritis susceptibility loci in women with systemic lupus erythematosus

    J. Am. Soc. Nephrol.

    (2014)
  • J.D. Rioux et al.

    Mapping of multiple susceptibility variants within the MHC region for 7 immune-mediated diseases

    Proc. Natl. Acad. Sci. U. S. A.

    (2009)
  • J.A. Noble et al.

    Genetics of type 1 diabetes

    Cold Spring Harb. Perspect. Med.

    (2012)
  • L. Schlosstein et al.

    High association of an HL-A antigen, W27, with ankylosing spondylitis

    N. Engl. J. Med.

    (1973)
  • M. de Menthon et al.

    HLA-B51/B5 and the risk of Behcet's disease: a systematic review and meta-analysis of case-control genetic association studies

    Arth. Rheum.

    (2009)
  • I. Gockel et al.

    Common variants in the HLA-DQ region confer susceptibility to idiopathic achalasia

    Nat. Genet.

    (2014)
  • E. Maggi et al.

    Th2-like CD8+ T cells showing B cell helper function and reduced cytolytic activity in human immunodeficiency virus type 1 infection

    J. Exp. Med.

    (1994)
  • E. Mohr et al.

    IFN-γ produced by CD8 T cells induces T-bet-dependent and -independent class switching in B cells in responses to alum-precipitated protein vaccine

    Proc. Natl. Acad. Sci. U. S. A.

    (2010)
  • P.K. Gregersen et al.

    The shared epitope hypothesis. An approach to understanding the molecular genetics of susceptibility to rheumatoid arthritis

    Arth. Rheum.

    (1987)
  • P. Invernizzi

    Human leukocyte antigen in primary biliary cirrhosis: an old story now reviving

    Hepatology

    (2011)
  • P. Invernizzi et al.

    Classical HLA-DRB1 and DPB1 alleles account for HLA associations with primary biliary cirrhosis

    Genes Immun.

    (2012)
  • J.P. Achkar et al.

    Amino acid position 11 of HLA-DRbeta1 is a major determinant of chromosome 6p association with ulcerative colitis

    Genes Immun.

    (2012)
  • S. Raychaudhuri et al.

    Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis

    Nat. Genet.

    (2012)
  • S.W. Qiao et al.

    The adaptive immune response in celiac disease

    Semin. Immunopathol.

    (2012)
  • R.W. Ebringer et al.

    Sequential studies in ankylosing spondylitis. Association of Klebsiella pneumoniae with active disease

    Ann. Rheum. Dis.

    (1978)
  • P.L. Schwimmbeck et al.

    Autoantibodies to HLA B27 in the sera of HLA B27 patients with ankylosing spondylitis and Reiter's syndrome. Molecular mimicry with Klebsiella pneumoniae as potential mechanism of autoimmune disease

    J. Exp. Med.

    (1987)
  • T.K. Kvien et al.

    Reactive arthritis: incidence, triggering agents and clinical presentation

    J. Rheumatol.

    (1994)