The promise and perils of interpreting genetic associations in Crohn’s disease
  1. T T Trinh1,
  2. J D Rioux2
  1. 1The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA, and University of Virginia Health System, Digestive Health Center of Excellence, Charlottesville, Virginia, USA
  2. 2The Broad Institute of MIT and Harvard, Brigham and Women’s Hospital, Department of Neurology, Harvard Medical School, Cambridge, Massachusetts, USA, and Université de Montréal, Montreal Heart Institute, Montréal, Québec, Canada
  1. Correspondence to:
    Dr John D Rioux
    5000 Rue Bélanger, Montréal, Québec, Canada;

Extended analyses of inflammatory bowel disease susceptibility loci are advisable before definitive conclusions about their causative role can be drawn

Genome wide linkage analysis has been an extremely successful method for mapping rare but highly penetrant genes in monogenic disorders.1 Its application to common diseases has achieved limited success, however, because each contributing gene individually makes only a small contribution to heritability.2 Alternatively, association based genetic studies, as described by Risch and Merikangas,3 can be powerful tools for identifying causal genes in common human diseases.1,4 To date, these methods have been used to follow up on linkage regions and to test candidate genes, and have therefore been limited by the previous linkage analyses or by the assumptions made regarding disease pathogenesis. Both of these factors can greatly diminish our ability to detect all possible causal variants contributing to complex disease traits. Over the last decade, however, technology and genetic resources have evolved dramatically. With the completion of the human genome sequence, the recent development of high throughput genotyping technologies, and knowledge of the patterns of genetic variation, it is now possible to perform genome wide association studies for common human diseases. Such genome wide approaches, although comprehensive, still face the analytical challenge of identifying true causal disease alleles. The prospect of identifying potentially significant associations must therefore be tempered by the perils of inaccurately drawn conclusions. Specifically, progress in this new frontier will rely critically on proper execution and interpretation of genetic studies that are able to: (1) distinguish true positives from false positives; (2) distinguish causal variation from that which is in linkage disequilibrium (LD) with it (that is, cosegregation or non-random association of nearby alleles within a population); and (3) explain gene-function, gene-gene, and genotype-phenotype relationships.
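As a rough illustration of the LD concept defined above, the following minimal sketch computes the standard pairwise measures D, D′, and r² for two biallelic loci. The function name and the example frequencies are hypothetical and are not taken from any of the studies discussed here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    All inputs are illustrative values chosen by the caller.
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical case: allele A is always found together with allele B.
    print(ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3))
    # Hypothetical case: the two alleles are statistically independent.
    print(ld_stats(p_ab=0.06, p_a=0.2, p_b=0.3))
```

When r² between two variants is close to 1, their association signals are nearly interchangeable, which is why goal (2) is difficult in regions of strong LD.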

The challenges presented by these three goals are nicely illustrated in the study of the genetic susceptibility of inflammatory bowel diseases (IBD). Despite the limitations of past genetic tools for complex diseases, the field of IBD genetics is nearly unique in having successfully identified multiple associated alleles using these techniques. Several genome wide searches for IBD susceptibility loci had previously identified numerous genomic regions potentially containing IBD risk factors. Subsequently, association studies using positional mapping and candidate gene approaches further identified a few genetic regions that were independently replicated in various populations, including CARD15 (refs 5–10) and IBD5 (refs 11–16). However, the initial optimism is now being balanced with prudent realism. Difficulties such as replication of associations and translation of genetic association to causal functional consequences provide reason for cautious interpretation of results from genetic analyses of IBD and other complex diseases. The articles in this issue of Gut, by Török and colleagues17 in Germany and by Noble and colleagues18 in Scotland, highlight several of the present challenges with respect to two recently described IBD susceptibility regions (see page 1416 and page 1421). In the case of DLG5 (Drosophila discs large homologue 5), it is the challenge of replicating a putative positive association with IBD, and in the case of the organic cation transporter (OCTN) cluster in IBD5, it is the challenge of distinguishing the causal allele from those with which it is in LD.

In the case of DLG5, a susceptibility locus on chromosome 10 was initially described in 1999 by Hampe and colleagues,19 emerging from a genome wide linkage screen in a European cohort. Stoll et al further narrowed this risk region by using an association mapping approach and identified two distinct haplotypes in the region surrounding the DLG5 gene that were putatively associated with IBD and Crohn’s disease (CD).20 Specifically, this group reported the association of a risk “haplotype D” (defined by a non-synonymous single nucleotide polymorphism (SNP) 113G→A resulting in the amino acid substitution R30Q) with IBD and CD in their family trios (χ² = 8.1, p = 0.004; χ² = 4.2, p = 0.04, respectively) with independent replication in their case control cohort (p = 0.0001, odds ratio (OR) = 1.6).20 They also described a protective “haplotype A”, identified by eight haplotype tagging SNPs (htSNPs), that was undertransmitted in the IBD trios and was further confirmed in an independent case control sample. Evidence for epistasis between DLG5 and CARD15 was also observed.
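The trio statistics quoted above can be related to their p values with a short calculation. The sketch below assumes a transmission disequilibrium test of the usual McNemar form with one degree of freedom; the text does not state which family based test was used, so this is an assumption, and the transmission counts in the example are hypothetical.

```python
from math import erfc, sqrt


def tdt_chi2(transmitted, untransmitted):
    """McNemar-type TDT statistic from heterozygous parents.

    transmitted:   times the allele of interest was passed to an affected child
    untransmitted: times the alternate allele was passed instead
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(chi2):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) equals erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Chi-square values as quoted in the text for IBD and CD trios.
    for chi2 in (8.1, 4.2):
        print(chi2, round(chi2_1df_pvalue(chi2), 4))

    # Hypothetical counts: 60 transmissions versus 40 non-transmissions.
    stat = tdt_chi2(60, 40)
    print(stat, round(chi2_1df_pvalue(stat), 4))
```

Because the statistic depends only on the squared difference, swapping the two counts gives the same value: overtransmission of one allele is necessarily mirrored by undertransmission of the alternative.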

The initial excitement regarding DLG5 and IBD has now been met with some frustration, with the recent publication of variable findings from three other groups. Török and colleagues17 performed a replication study for DLG5 in a German case control group independent of that studied by Stoll and colleagues20 but could not confirm any association of the two previously reported haplotypes with IBD. They also did not find evidence of locus-locus interactions with CARD15. Along a similar line, Noble and colleagues18 could not replicate the findings of Stoll et al in their Scottish population.20 They found no association of either haplotype with IBD, CD, or ulcerative colitis. Additionally, they could not confirm genetic epistasis between DLG5 and CARD15 in their population, nor did they identify specific phenotypic associations with DLG5.

In contrast, Daly and colleagues21 confirmed the association of DLG5 with IBD in two of their three European derived populations. Interestingly, they were able to replicate the association of IBD with the R30Q variant (haplotype D) in their Quebec/Italian case control cohort and in an independent Quebec/UK family based study (approximate OR 1.25) but were not able to replicate it in a UK case control group. The authors suggested that the apparently inconsistent results from their own three studies were likely due to undetected phenotypic differences between the sample collections. In contrast, their three studies consistently demonstrated a lack of association to the putatively protective “haplotype A”. They therefore attributed the original finding by Stoll et al as partially due to a statistical fluctuation and partially to a result of LD with the replicated R30Q association, as the presence of an overtransmitted allele or haplotype requires that the alternate allele or haplotype(s) have a net undertransmission (giving the appearance of a protective allele).

In general terms, the factors that are most likely to influence our ability to replicate true association findings are statistical power and consistency of phenotype across studies. The statistical power of an association study depends on sample size, on the frequency and effect size of the disease allele, and on the frequency of the disease in the population. More specifically for the DLG5 variants, the prevalence of these alleles has not been extensively studied across ethnic or geographic subgroups; this information will help clarify whether differences observed among studies are truly effects of genetic or population heterogeneity, or rather reflect an intrinsic weakness of modestly sized genetic studies, namely sampling variation. The absence of CD associated CARD15 mutations and the IBD5 risk haplotype in the Japanese population supports the notion that allele frequencies vary among populations.22,23 Alternatively, collection strategy and sampling bias must also be considered, as incomplete sampling can lead to overestimation of the frequency of some risk alleles and underestimation of others.24 The genetic effect (or penetrance) of the DLG5 variants is also not known although, as for most genes contributing to complex traits, the effects will likely be modest. Taking these factors into account, sample sizes of several thousand cases and controls will be needed to have >90% power to replicate the findings of Stoll et al with confidence (p<0.01).4,21
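The dependence of power on sample size, allele frequency, and effect size can be sketched with a normal approximation for an allelic case control comparison. This is a simplified illustration, not the calculation used in the cited studies: the control allele frequency of 0.10 is a hypothetical input, the odds ratio of 1.25 echoes the approximate replication estimate mentioned above, and the function name is invented for the example.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(p_control, odds_ratio, n_cases, n_controls, alpha=0.01):
    """Approximate power of a two sided test comparing allele frequencies.

    p_control:  risk allele frequency in controls (hypothetical input)
    odds_ratio: allelic odds ratio assumed for the risk allele
    n_cases, n_controls: numbers of individuals (two alleles each)
    alpha: two sided significance threshold
    """
    # Allele frequency in cases implied by the allelic odds ratio.
    odds_case = odds_ratio * p_control / (1.0 - p_control)
    p_case = odds_case / (1.0 + odds_case)

    # Standard error of the frequency difference; each person contributes two alleles.
    se = sqrt(
        p_case * (1.0 - p_case) / (2 * n_cases)
        + p_control * (1.0 - p_control) / (2 * n_controls)
    )

    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability of exceeding the threshold in the expected direction
    # (the negligible opposite tail is ignored in this approximation).
    return normal.cdf(abs(p_case - p_control) / se - z_crit)


if __name__ == "__main__":
    for n in (500, 1000, 3000, 5000):
        power = allelic_power(p_control=0.10, odds_ratio=1.25, n_cases=n, n_controls=n)
        print(n, round(power, 3))
```

Under these assumed inputs, power stays low for cohorts of a few hundred and approaches 90% only when cases and controls each number in the thousands, in line with the estimate given in the text.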

The current dilemma with DLG5 is not unique in the genetic elucidation of complex human diseases. An analogous situation was observed with type II diabetes, where PPARG was conclusively confirmed as a susceptibility locus only after examination of pooled data among studies, despite inconsistent findings among many small studies.25 Therefore, large collaborative efforts will be necessary to improve study power and enable adequately sized subpopulations to be studied for potential phenotypic associations with DLG5. Such large collaborative efforts will need to pay particular attention to phenotype, as this will not only be crucial for establishing the association of a genetic risk factor to disease but also for determining the relationship to clinical phenotype. Without a doubt, additional functional studies that provide convincing evidence of causality will be powerful adjuncts to any positive statistical findings. Finally, given the current status of the replication findings for DLG5, more work is required to define the exact nature of potential association to IBD.

In contrast with DLG5, the IBD5 risk haplotype has been firmly established as an IBD susceptibility region. Rioux et al first reported linkage for CD on chromosome 5q31 in a Canadian population with subsequent fine mapping of this locus to a 250 kb risk haplotype.11,12 Several groups have independently confirmed this as a CD associated risk haplotype in different European and Caucasian populations.13–16 Unfortunately, identification of the underlying causal genetic variants within this region has been a more daunting task due to the strong LD across this region.11 In addition, specific phenotypic associations have not been clearly defined. Recently, a provocative study by Peltekova and colleagues26 proposed two causal SNPs in the carnitine/OCTN cluster located within the IBD5 risk haplotype that was associated with CD. The two mutations were in strong LD and created a two allele risk haplotype (TC haplotype) that was associated with CD independent of the extended IBD5 risk haplotype in their patient population. They provided preliminary functional studies demonstrating that these two SNPs resulted in impaired OCTN transporter function of various organic cations as well as carnitine, an essential cofactor in lipid metabolism.26,27 Based on the observed association of this TC haplotype with CD independent of the IBD5 extended haplotype, and a speculative link between OCTN function and intracellular homeostasis, they suggested that these two specific variants rather than other closely linked alleles were causal variants in CD susceptibility.

The present study by Török and colleagues17 is the first published study that attempts to replicate the findings reported by Peltekova et al. Although they also found an association of the OCTN-TC haplotype with CD and an interaction with CD associated CARD15 mutations in their German case control cohort, they did not find conclusive genetic evidence that the OCTN polymorphisms were the likely causal variants. This was based on their observation that the association of the OCTN-TC haplotype with CD was not independent of another SNP (IGR2078a_1), also located within this tightly linked region, that was chosen as a proxy for the extended IBD5 risk haplotype. The study highlights the difficulty in distinguishing a genetic variation that is causal from one with which it is in LD (that is, are the OCTN genes or other unassayed SNPs in the IBD5 region the causal variants?). To firmly establish that the OCTN genes are the susceptibility genes within the IBD5 risk haplotype would require genetic evidence that OCTN-TC is not only independently associated with but also more strongly associated with IBD than the other genetic variants located within the extended haplotype on which it exists. This has not yet been convincingly demonstrated. In both of the above studies, the SNP IGR2078a_1 was chosen as a proxy for the IBD5 haplotype. Previous analysis of the IBD5 haplotype structure11 indicates that this marker is located in a haplotype block at a significant distance from the block containing the OCTN1 and OCTN2 genes. A more relevant comparison to examine whether the OCTN variants are acting independently of the IBD5 haplotype would be to test other htSNPs within the same block. In addition, the relatively small numbers of samples in the two studies preclude a definitive answer to this question of independence of the OCTN variants and the other IBD5 haplotype variants. What are the factors that will influence studies seeking to address this question? In addition to the factors that affect our ability to replicate a true finding of association, the degree of recombination between the putative causal allele(s) and surrounding variants will determine our ability to distinguish the independence of the association signals.

Given the difficulties in resolving these dilemmas, future progress with respect to OCTN and IBD5 will likely require supportive evidence from functional studies. Information providing a compelling biological explanation for how impairment of these OCTN genes leads to the clinical phenotypes in CD would further strengthen any positive associations. Although Peltekova et al provide preliminary data linking the OCTN mutations with impaired cation transport, it is unclear how these defects would translate to an increased risk for intestinal inflammation. In fact, the OCTNs are widely expressed in various human tissues (for example, brain, skeletal muscle, heart, kidney, and intestine),27,29–31 and previously reported mutations in the human and mouse OCTN2 genes are associated with systemic carnitine deficiency, a condition characterised by diseases of skeletal muscles, cardiac muscles, and liver, rather than the intestinal system.27,28 These two common variants in the OCTN genes may in the end turn out to be causative for IBD, but much additional work will be necessary to prove causality and determine the precise mechanism of action.

Another apparent inconsistency in the literature regarding the IBD5 risk haplotype is its association with specific clinical phenotypes. Several groups have reported a lack of association between the IBD5 region and specific disease sites.11,13–16 A UK group reported an association of the IBD5 haplotype with both perianal CD and ileal CD; however, on further analysis, they found that the strongest association was in fact for perianal CD with associated ileal disease.16 Recently, Newman et al32 reported an association of the OCTN-TC haplotype with ileal CD, independent of perianal disease involvement, which was further strengthened by the presence of CARD15 alleles. In this issue, Török and colleagues17 report novel phenotypic associations of the IBD5/OCTN-TC haplotype with colonic CD, and with non-fistulising and non-stricturing behaviour. This disparity in the literature reflects the inherent difficulties in genotype-phenotype correlation studies: small sample sizes of individual stratified subgroups and differences in phenotypic classification systems among studies. Future collaborative efforts to incorporate large data sets using standardised and rigorously defined phenotypic classification schemes will be critical in clarifying these conflicting observations.

Genetic epistasis of both DLG5 and IBD5 with CARD15 was also extensively studied by these various groups, albeit with contrasting findings. This is not surprising, as interpretation of epistasis can be quite challenging. As discussed by Cordell,33 the same terminology has been applied to quite different definitions, as well as to distinct statistical and biological concepts. Adding to this confusion is the dilemma of interpreting epistasis once it has been statistically identified. Statistical interaction does not necessarily imply interaction on a biological level.34 Until we can better define the correlation between statistical models and biological models, any interpretation of genetic epistasis should be made with caution.
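One common statistical reading of epistasis is departure from multiplicative odds ratios, and a small sketch makes clear how scale dependent that definition is. The example below is only one of the several definitions alluded to above; the function names and all counts are hypothetical and do not come from any of the studies discussed.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposed group relative to a reference group."""
    return (cases_exposed * controls_ref) / (controls_exposed * cases_ref)


def multiplicative_interaction(counts):
    """Ratio of the joint odds ratio to the product of the single locus odds ratios.

    counts maps (carrier_locus1, carrier_locus2) -> (cases, controls),
    with carrier status coded 0 or 1. A ratio near 1 means no interaction
    on the multiplicative scale.
    """
    ref_cases, ref_controls = counts[(0, 0)]
    or_10 = odds_ratio(*counts[(1, 0)], ref_cases, ref_controls)
    or_01 = odds_ratio(*counts[(0, 1)], ref_cases, ref_controls)
    or_11 = odds_ratio(*counts[(1, 1)], ref_cases, ref_controls)
    return or_11 / (or_10 * or_01)


if __name__ == "__main__":
    # Hypothetical counts: single locus odds ratios of 1.5 and 2.0, joint odds ratio of 3.0.
    example = {
        (0, 0): (100, 100),
        (1, 0): (150, 100),
        (0, 1): (200, 100),
        (1, 1): (300, 100),
    }
    print(multiplicative_interaction(example))
```

In this invented data set the joint effect equals the product of the separate effects, so there is no interaction on the multiplicative scale, although the same counts would show a departure from additivity. A statistical interaction term therefore says little by itself about biological interaction.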

In conclusion, the genetic revolution continues to progress with great momentum. This has been particularly evident in the field of IBD where the previously impossible task of identifying multiple causal genes has now become a reality. As we continue to generate interesting findings that have promising prospects in advancing our understanding of disease pathogenesis and ultimately, in the care of our patients, we must be aware of the perils and challenges that come with interpretation of the data. Keeping in mind these intermediate goals of finding true positive associations, identifying actual causal variants, and identifying true gene-gene and genotype-phenotype associations, we can guide our study design and data interpretation to ensure that our ultimate goal, furthering our knowledge of the genetics of complex diseases, is achieved.


JDR is supported by grants from the NIDDK and CCFA.

  • Conflict of interest: None declared.
