Review article
The genetics of human autoimmune disease: A perspective on progress in the field and future directions
Introduction
Our understanding of which genes predispose to different autoimmune diseases has expanded rapidly over the last decade. This progress has been mostly due to genome-wide association studies (GWAS) and the development of various technical and analytic tools. However, despite this progress, less than half of the heritability of most autoimmune diseases can be explained, and nearly half of this identified genetic risk is due to variations within HLA. The actual functional variants that underlie statistically significant associations are, with some notable exceptions, still largely unknown. In the following perspective, I will review some of the more salient advances in the field, provide examples to illustrate specific points, indicate where knowledge is sparse, and discuss the potential for future advances that I believe could further define the pathogenesis and perhaps enable application to diagnosis and therapy. More detailed aspects of the genetics of a variety of autoimmune diseases are presented by experts in the field in other sections of this special issue of the journal. A general paradigm for GWAS and sequence variant studies is shown in Fig. 1 and discussed in subsequent sections.
Epidemiological studies of most autoimmune diseases including rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), type 1 diabetes (T1D), multiple sclerosis (MS), and primary biliary cirrhosis (PBC) show that there is strong heritability. These include studies showing increased concordance in monozygotic compared to dizygotic twins as well as studies showing increased risk to siblings of proband cases compared to the general population. Most of these studies are not truly population based and may have biased results, but several points are worth noting. First, some autoimmune diseases have much higher sibling relative risks than others (e.g. SLE [1], [2], T1D [3], celiac disease [4], and PBC [5], [6] compared with RA [2] and MS [7]). Second, although concordance of disease is much higher in monozygotic compared with dizygotic twins for many autoimmune diseases, the overall monozygotic concordance of disease is usually substantially less than 50% [8]. This indicates that stochastic factors, including environmental variables, are a strong component and that, although genetics can be very useful in identifying important factors in etiopathogenesis, it can only partially predict phenotype. Although some specific environmental factors have been identified (e.g. smoking and rheumatoid arthritis [9], [10]), it is also possible that most of the incomplete concordance is simply due to chance or indefinable events.
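The two epidemiological measures used in this paragraph can be made concrete with a short calculation. The following is a minimal sketch in Python; the function names and every numerical input are hypothetical and chosen only for illustration, not taken from the cited studies. Sibling relative risk is computed as the risk to siblings of affected probands divided by the population prevalence, and twin concordance is shown in its pairwise and probandwise forms (the probandwise form assumes complete ascertainment of affected twins).

```python
def sibling_relative_risk(sibling_risk, population_prevalence):
    """Risk to siblings of affected probands divided by population prevalence."""
    return sibling_risk / population_prevalence


def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Fraction of twin pairs (with at least one affected twin) in which both are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Probability that the co-twin of an affected twin is also affected.

    Assumes complete ascertainment, so each concordant pair contributes two probands.
    """
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


# Hypothetical inputs for illustration only.
lambda_s = sibling_relative_risk(sibling_risk=0.02, population_prevalence=0.001)
mz_pairwise = pairwise_concordance(concordant_pairs=12, discordant_pairs=88)
mz_probandwise = probandwise_concordance(concordant_pairs=12, discordant_pairs=88)

print(f"sibling relative risk: {lambda_s:.1f}")
print(f"MZ pairwise concordance: {mz_pairwise:.3f}")
print(f"MZ probandwise concordance: {mz_probandwise:.3f}")
```

With these invented numbers the sibling relative risk is high even though monozygotic concordance remains well under 50%, which is the pattern described above.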
The major advance in identifying genetic loci that predispose to autoimmune diseases has been GWAS. Although some loci outside the major histocompatibility complex (MHC; HLA in humans) were identified prior to GWAS using linkage or candidate gene studies, and other methodologies including admixture mapping have also enabled identification of a modicum of risk loci, the exponential increase in loci (over 200 for some autoimmune diseases) has been the direct result of GWAS. The basis of GWAS is the technology enabling efficient and accurate genotyping of single nucleotide polymorphisms (SNPs) and large collaborative studies such as HapMap [11], [12] defining large numbers (hundreds of thousands) of SNPs in different populations. The success of GWAS is in large part due to the practical advantage of the case/control design, namely the ability to recruit large numbers of cases and population controls as opposed to the difficulty in recruiting families: power for any association study is largely based on numbers. A critical aspect of these studies has been the ability to adequately control for population substructure differences using statistical methodology. Most commonly this is done by logistic regression using relevant principal components defined by principal component analyses or similar methods [13], [14], [15]. In some studies only continental population differences are accounted for, but for the most part type 1 errors (false positives) due to unrecognized stratification differences in case and control populations have been minimized. In fact, many studies have used publicly available control genotypes rather than specific matched collections of controls.
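The effect of adjusting for population substructure in a logistic regression can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the cell counts are invented, the genotype is coded as simple carrier status (0/1) rather than allele count, and the covariate `pc1` is a stand-in for a first principal component that happens to separate two hypothetical subpopulations (coded -1 and +1). In a real study the principal components would be computed from genome-wide genotypes. The fit uses plain Newton-Raphson in pure Python rather than any particular GWAS package.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_logistic(rows, y, weights, n_iter=50, tol=1e-12):
    """Weighted logistic regression by Newton-Raphson.

    rows: covariate lists without an intercept; returns [intercept, betas...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(X, y, weights):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += wi * (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical cells: (carrier, pc1, case_status, count).
# Within each subpopulation carriers and non-carriers have the same disease
# frequency, but the subpopulations differ in both carrier frequency and prevalence.
cells = [
    (1, -1.0, 1, 80), (1, -1.0, 0, 80),
    (0, -1.0, 1, 20), (0, -1.0, 0, 20),
    (1, 1.0, 1, 4), (1, 1.0, 0, 36),
    (0, 1.0, 1, 16), (0, 1.0, 0, 144),
]
status = [c[2] for c in cells]
counts = [c[3] for c in cells]

beta_unadjusted = fit_logistic([[c[0]] for c in cells], status, counts)[1]
beta_adjusted = fit_logistic([[c[0], c[1]] for c in cells], status, counts)[1]

print(f"unadjusted OR: {math.exp(beta_unadjusted):.2f}")
print(f"OR adjusted for pc1: {math.exp(beta_adjusted):.2f}")
```

In this constructed example the unadjusted analysis reports a spurious association that arises only from stratification, while including the ancestry covariate returns an odds ratio of essentially 1.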
It is also worth noting that it may be possible to increase power (decrease type 2 errors, false negatives) to ascertain risk variants by limiting studies to more homogeneous populations; additional considerations of population substructure are discussed in subsequent sections (see sections 3.3 Rare/uncommon variants, 4 Ancestry makes a difference). However, GWAS is largely applicable to those loci that fulfill the common variant (>0.05 minor allele frequency) common disease hypothesis, since this methodology relies on linkage disequilibrium (LD) between the marker (SNP) detected and the actual disease causing variant(s). The commonly used genotyping platforms (Illumina and Affymetrix chip arrays) and clustering algorithms are also most accurate for minor allele frequencies (MAFs) > 0.05, although there have been some improvements in software to enable higher accuracy genotyping.
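Because the approach depends on LD between a genotyped marker and the causal variant, it may help to show how LD is commonly quantified. The sketch below computes the standard r² statistic from two-locus haplotype counts; the function name and the counts are hypothetical and serve only as an illustration.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic loci from counts of the four haplotypes.

    A/a are the alleles at the marker, B/b the alleles at the second locus.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    d = p_AB - p_A * p_B  # LD coefficient D
    return d * d / denom


# Hypothetical haplotype counts.
r2_strong = ld_r_squared(40, 10, 10, 40)  # alleles usually travel together
r2_none = ld_r_squared(25, 25, 25, 25)    # alleles are independent

print(f"r^2 with correlated alleles: {r2_strong:.2f}")
print(f"r^2 with independent alleles: {r2_none:.2f}")
```

A marker in weak LD with the causal variant (low r²) captures little of its signal, which is one reason rare variants are poorly tagged by arrays designed around common SNPs.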
Another important aspect of current GWAS and other association studies is the ability to accurately impute much of the sequence variation that is not directly genotyped using current genotyping platforms. Imputation is based on sophisticated algorithms that, under certain conditions, enable accurate estimation of the sequence variants between genotyped markers based on reference genomes that contain the missing information [16], [17], [18], [19]. It relies on the shared loci (almost all of the SNPs in commonly used genotyping platforms are shared with the reference genomes) and the patterns of variation (http://www.1000genomes.org). With the completion of phase 3 of the 1000 Genomes sequencing study, there are now sequence data for multiple populations that can be used to inform imputation (http://www.1000genomes.org/) [20]. In some cases this information can suggest candidate non-synonymous damaging variants, and in others it provides candidate non-coding variants that might be suggestive based on expression quantitative trait loci (eQTL), binding sites for transcription factors, DNase sensitivity or other features (e.g. histone marks) that have been elucidated in particular tissues or cell lines (discussed further in Section 6). Importantly, imputation methods can also be used to determine HLA classical determinants and amino acids.
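The underlying idea of imputation can be conveyed with a deliberately simplified toy example. This is not how the published methods cited above work in detail; they are considerably more sophisticated. The sketch only shows the principle that an untyped site is estimated from reference haplotypes that match the study sample at the typed markers. The reference panel, the function name, and the 0/1 allele coding are all hypothetical.

```python
def impute_dosage(reference, typed_positions, target, untyped_position):
    """Estimate the allele dosage at an untyped site for one haplotype.

    reference: list of fully observed haplotypes (tuples of 0/1 alleles).
    target: haplotype with None at the untyped position.
    Returns the mean allele among matching reference haplotypes, or None
    when no reference haplotype matches at the typed markers.
    """
    matches = [
        hap for hap in reference
        if all(hap[i] == target[i] for i in typed_positions)
    ]
    if not matches:
        return None
    return sum(hap[untyped_position] for hap in matches) / len(matches)


# Hypothetical reference panel: positions 0 and 2 are on the genotyping array,
# position 1 is observed only in the sequenced reference.
reference_panel = [
    (0, 0, 0),
    (0, 0, 0),
    (1, 1, 1),
    (1, 1, 1),
    (1, 0, 1),
    (0, 0, 1),
]

dosage_a = impute_dosage(reference_panel, [0, 2], (1, None, 1), 1)
dosage_b = impute_dosage(reference_panel, [0, 2], (0, None, 0), 1)
dosage_c = impute_dosage(reference_panel, [0, 2], (1, None, 0), 1)

print(dosage_a, dosage_b, dosage_c)
```

The third call returns no estimate because the flanking pattern is absent from the panel, a toy analogue of why imputation accuracy depends on how well the reference represents the study population.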
A critical aspect of any association study is defining the phenotype of cases. Theoretically, the better the definition of phenotype, the greater the likelihood that type 2 errors (false negatives) will be minimized, since inclusion of unaffected individuals or those with a different autoimmune disease may increase heterogeneity and dilute the risk of variants for a particular phenotype. However, there may be a practical trade-off: including cases that lack some information (e.g. particular autoantibodies or even age of onset) could increase the power to detect loci through greater sample size compared with applying more rigid criteria. For some autoimmune diseases particular features clearly increase the relative risk of many of the susceptibility loci. A strong example is SLE, where cases with anti-double-stranded DNA (anti-dsDNA) antibodies show substantially higher odds ratios for several lupus susceptibility alleles than those that are anti-dsDNA negative [21]. Similarly, some RA studies show stronger associations when the cases are limited to those with anti-cyclic citrullinated peptide antibodies (anti-CCP) [22]. However, in other studies when cases are limited to anti-CCP positive RA there are fewer susceptibility loci associated with disease, probably mostly due to decreased power as a function of a smaller sample size [23]. Similarly, if for example an SLE study were restricted to only those that manifested the same 4 out of 11 American College of Rheumatology criteria, it is likely that many of the identified loci would have been missed due to weaker signals from decreased sample sizes.
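The dilution of risk caused by a loosely defined case group can be shown with simple arithmetic. In the sketch below, which uses invented allele frequencies and hypothetical function names, a fraction of the nominal cases is assumed to carry the risk allele at the control frequency (for example because they are unaffected or have a different disease), and the allelic odds ratio is recomputed for the mixed group.

```python
def allelic_odds_ratio(case_freq, control_freq):
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (case_freq / (1 - case_freq)) / (control_freq / (1 - control_freq))


def diluted_case_freq(true_case_freq, control_freq, misclassified_fraction):
    """Risk-allele frequency in a case group where a fraction is misclassified.

    Misclassified individuals are assumed to carry the allele at the control frequency.
    """
    return ((1 - misclassified_fraction) * true_case_freq
            + misclassified_fraction * control_freq)


# Hypothetical frequencies for illustration only.
true_or = allelic_odds_ratio(case_freq=0.30, control_freq=0.20)
mixed_freq = diluted_case_freq(0.30, 0.20, misclassified_fraction=0.25)
observed_or = allelic_odds_ratio(case_freq=mixed_freq, control_freq=0.20)

print(f"odds ratio with well-defined cases: {true_or:.2f}")
print(f"odds ratio with 25% misclassified cases: {observed_or:.2f}")
```

The attenuated odds ratio must be weighed against the larger sample that looser criteria allow, which is the trade-off described in this paragraph.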
The decision on phenotypic definition is analogous to the classic fight between the lumpers and the splitters in the definition of mammalian species [24] and was advanced many years ago with respect to the nosology of genetic disease [25]. However, for some autoimmune diseases the definition of “subtypes” may be crucial. In collaborative work by this author on myasthenia gravis, it was critical to divide the disease by age of onset and the absence of thymomas. Here, there are major differences in which HLA genes are important in susceptibility when early onset (<45 years of age) and late onset (>50 years of age) disease are compared ([26], [27] and Seldin, Gregersen, Hammarstrom, unpublished data). In addition, the highest risk non-HLA gene, TNIP1, observed in early onset myasthenia gravis (EOMG) [26] has no effect in late onset myasthenia gravis (LOMG) (Seldin, Gregersen, Hammarstrom, unpublished data).
As the field progresses there are likely to be more studies that examine endophenotypes including the most prominent disease manifestations, disease severity and morbidity. In SLE there is a study suggesting specific loci that predispose to nephropathy [28]. Such studies could also include those individuals that have very poor outcomes or more severe phenotypes.
The major histocompatibility complex region
With rare exception, genes in the human major histocompatibility complex (MHC), HLA, are the strongest risk genes for each autoimmune disease. Several general points are notable: 1) often there is more than one association signal from the MHC region with the second and even third signals sometimes stronger than any of the individual signals from loci outside this region; 2) recent studies suggest that most of the strongest signals are from classical HLA determinants (not from single SNPs); 3)
Non-HLA susceptibility loci
Several hundred non-HLA loci have been identified that contribute to the association of one or more autoimmune diseases. Most are relatively high frequency perhaps as a reflection of the sensitivity of the methods to detect these loci (i.e. GWAS as discussed above). The vast majority of these loci are located either within or close to genes that participate in immune system response. These include genes that are components of antigen processing, presentation, recognition, differentiation,
Ancestry makes a difference
There are substantial differences in the loci defined for autoimmune diseases in disparate population groups. These differences are most apparent when comparing results from one continental population to another. The largest amounts of available data are from studies of European populations and East Asian populations [93], [94], [95], [96]. The differences are partially explained by the frequency of particular variants in these different populations. Thus, the PTPN22 R620W
Missing heritability
For most autoimmune diseases, estimates suggest that over 50% of the genetic loci contributing to heritability are not yet elucidated. Many explanations that are not mutually exclusive are possible to account for this missing heritability including: 1) very large numbers of common low risk variants that can only be defined with huge numbers of cases and controls (e.g. sample sizes >>100,000); 2) large numbers of rare variants that have not been detected due to insufficient sequencing, sample
Towards functional studies: coding variants, gene expression, and epigenetics
Relatively few non-synonymous variants have been identified in non-MHC genetic loci contributing to the susceptibility to autoimmune diseases. These include PTPN22, TNIP1, ITGAM, and SH2B3 [26], [48], [59], [102], [112]. Although Koch's postulates cannot be tested, for some variants there is compelling evidence that the variant changes a functional aspect(s) of the immune response. For PTPN22 the R620W variant has been extensively studied and clearly has a major effect on increasing T cell and
Final comments: what to make of it all
Some in the scientific community have questioned whether the myriad of loci identified in GWAS has been useful. While GWAS has not solved the problem of what causes autoimmunity, it is my contention that it has had an important impact on the field and that the identification of these loci will continue to be one of the major drivers in furthering our understanding of what causes these phenotypes. As with most investigative results, these studies raise more questions. What are the real
Conflict of interest
The authors have no conflict of interest to declare.
Acknowledgments
This work was supported by R01DK091823.
References (179)
- et al.
Familial primary biliary cirrhosis reassessed: a geographically-based population study
J. Hepatol.
(1999)
- et al.
Twin studies in autoimmune disease: genetics, gender and environment
J. Autoimmun.
(2012)
- et al.
Genotype-imputation accuracy across worldwide human populations
Am. J. Hum. Genet.
(2009)
- et al.
A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals
Am. J. Hum. Genet.
(2009)
- et al.
HLA-DQ typing in the diagnosis of celiac disease
Am. J. Gastroenterol.
(2002)
- et al.
Sequence and haplotype analysis supports HLA-C as the psoriasis susceptibility 1 gene
Am. J. Hum. Genet.
(2006)
- et al.
A missense single-nucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis
Am. J. Hum. Genet.
(2004)
- et al.
Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies
Am. J. Hum. Genet.
(2012)
- et al.
Pooled association tests for rare variants in exon-resequencing studies
Am. J. Hum. Genet.
(2010)
- et al.
PTPN22: setting thresholds for autoimmunity
Semin. Immunol.
(2006)
Genetic susceptibility to SLE: new insights from fine mapping and genome-wide association studies
Nat. Rev. Genet.
Concordant and discordant associations between rheumatoid arthritis, systemic lupus erythematosus and ankylosing spondylitis based on all hospitalizations in Sweden between 1973 and 2004
Rheumatol. Oxf.
The genetic basis for type 1 diabetes
Br. Med. Bull.
Families with multiple cases of gluten-sensitive enteropathy
Z. fur Gastroenterol.
Epidemiology and pathogenesis of primary biliary cirrhosis
J. Clin. Gastroenterol.
Familial risk of multiple sclerosis: a nationwide cohort study
Am. J. Epidemiol.
A gene-environment interaction between smoking and shared epitope genes in HLA-DR provides a high risk of seropositive rheumatoid arthritis
Arth. Rheum.
Smoking and risk of rheumatoid arthritis
J. Rheumatol.
The International HapMap project web site
Genome Res.
The International HapMap project
Nature
Principal components analysis corrects for stratification in genome-wide association studies
Nat. Genet.
New approaches to population stratification in genome-wide association studies
Nat. Rev. Genet.
Accounting for ancestry: population substructure and genome-wide association studies
Hum. Mol. Genet.
Genotype imputation for genome-wide association studies
Nat. Rev. Genet.
A flexible and accurate genotype imputation method for the next generation of genome-wide association studies
PLoS Genet.
An integrated map of genetic variation from 1,092 human genomes
Nature
Differential genetic associations for systemic lupus erythematosus based on anti-dsDNA autoantibody production
PLoS Genet.
Genetic markers of rheumatoid arthritis susceptibility in anti-citrullinated peptide antibody negative patients
Ann. Rheum. Dis.
The influence of polygenic risk scores on heritability of anti-CCP level in RA
Genes Immun.
The Principles of Classification and a Classification of Mammals
Bull. AMNH
On lumpers and splitters, or the nosology of genetic disease
Perspect. Biol. Med.
Risk for myasthenia gravis maps to a (151) Pro-->Ala change in TNIP1 and to human leukocyte antigen-B*08
Ann. Neurol.
Late onset myasthenia gravis is associated with HLA DRB1*15:01 in the Norwegian population
PLoS One
Lupus nephritis susceptibility loci in women with systemic lupus erythematosus
J. Am. Soc. Nephrol.
Mapping of multiple susceptibility variants within the MHC region for 7 immune-mediated diseases
Proc. Natl. Acad. Sci. U. S. A.
Genetics of type 1 diabetes
Cold Spring Harb. Perspect. Med.
High association of an HL-A antigen, W27, with ankylosing spondylitis
N. Engl. J. Med.
HLA-B51/B5 and the risk of Behcet's disease: a systematic review and meta-analysis of case-control genetic association studies
Arth. Rheum.
Common variants in the HLA-DQ region confer susceptibility to idiopathic achalasia
Nat. Genet.
Th2-like CD8+ T cells showing B cell helper function and reduced cytolytic activity in human immunodeficiency virus type 1 infection
J. Exp. Med.
IFN-{gamma} produced by CD8 T cells induces T-bet-dependent and -independent class switching in B cells in responses to alum-precipitated protein vaccine
Proc. Natl. Acad. Sci. U. S. A.
The shared epitope hypothesis. An approach to understanding the molecular genetics of susceptibility to rheumatoid arthritis
Arth. Rheum.
Human leukocyte antigen in primary biliary cirrhosis: an old story now reviving
Hepatology
Classical HLA-DRB1 and DPB1 alleles account for HLA associations with primary biliary cirrhosis
Genes Immun.
Amino acid position 11 of HLA-DRbeta1 is a major determinant of chromosome 6p association with ulcerative colitis
Genes Immun.
Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis
Nat. Genet.
The adaptive immune response in celiac disease
Semin. Immunopathol.
Sequential studies in ankylosing spondylitis. Association of Klebsiella pneumoniae with active disease
Ann. Rheum. Dis.
Autoantibodies to HLA B27 in the sera of HLA B27 patients with ankylosing spondylitis and Reiter's syndrome. Molecular mimicry with Klebsiella pneumoniae as potential mechanism of autoimmune disease
J. Exp. Med.
Reactive arthritis: incidence, triggering agents and clinical presentation
J. Rheumatol.