

Genotype-phenotype analysis of the Crohn’s disease susceptibility haplotype on chromosome 5q31
A Armuzzi (1), T Ahmad (1), K-L Ling (1), A de Silva (1), S Cullen (1), D van Heel (1), T R Orchard (1), K I Welsh (2), S E Marshall (3), D P Jewell (1)
  1. Gastroenterology Unit, University of Oxford, Gibson Laboratories, Radcliffe Infirmary, Oxford, UK
  2. Clinical Genomics, National Heart and Lung Institute, Imperial College, London, UK
  3. Department of Immunology, Wright-Fleming Institute, Imperial College, London, UK
Correspondence to: A Armuzzi, Gastroenterology Unit, Gibson Laboratories, University of Oxford, Radcliffe Infirmary, Woodstock Rd, Oxford OX2 6QX, UK


Background and aims: Recent molecular data suggest that genetic factors may underlie the disease heterogeneity observed in both ulcerative colitis (UC) and Crohn’s disease (CD). A locus on chromosome 5q has been implicated in susceptibility to CD, and recently refined by linkage disequilibrium mapping to a conserved 250 kb haplotype (5q31). No data regarding the contribution of this locus to clinical phenotype exist. In this case control study, we investigated the contribution of this haplotype to both susceptibility and phenotype of CD and UC.

Patients and methods: We studied 330 Caucasian CD and 457 UC patients recruited from a single UK centre. Association with disease susceptibility and phenotype was analysed with haplotypes reconstructed from three single nucleotide polymorphisms chosen to span this susceptibility region. Evidence for possible genetic epistasis between IBD5 and NOD2/CARD15 was sought.

Results: Linkage disequilibrium across this region was confirmed, with two haplotypes comprising 88% of all chromosomes. Susceptibility to CD, but not to UC, was associated with homozygosity for a common haplotype, H2 (pc=0.002; relative risk (RR) 2.0). Genotype-phenotype analyses demonstrated that this association was particularly strong in patients with perianal disease (pc=0.0005; RR 1.7), especially in individuals homozygous for this haplotype (pc=0.0005; RR 3.0). Importantly, no association with H2 was found in 186 patients without perianal disease. No evidence of epistasis between IBD5 and NOD2/CARD15 was demonstrated.

Conclusions: The IBD5 risk haplotype is associated with CD only. Genotype-phenotype analysis reveals that the strongest association is observed in patients with perianal CD. While the precise gene involved is unclear, these data provide further molecular evidence for a genetic basis of the clinical heterogeneity of CD.

  • genetics
  • Crohn’s disease
  • ulcerative colitis
  • IBD5
  • genotype-phenotype analysis
  • CD, Crohn’s disease
  • CARD15, caspase recruitment domain containing protein 15
  • IBD, inflammatory bowel disease
  • IL, interleukin
  • PAR, population attributable risk
  • RR, relative risk
  • SNP, single nucleotide polymorphism
  • UC, ulcerative colitis


The chronic idiopathic inflammatory bowel diseases (IBD; MIM 601458) Crohn’s disease (CD; MIM 266600) and ulcerative colitis (UC; MIM 191390) are a significant cause of morbidity in the Western world, with a combined prevalence of 200/100 000 in the UK.1 Epidemiological, clinical, and molecular studies have provided strong evidence for the role of genetics in determining susceptibility to disease. Genome wide scans have implicated a number of susceptibility loci: some influence susceptibility to IBD overall while others confer susceptibility to either CD or UC.2–10 Furthermore, epidemiological evidence suggests that genetic factors also determine the clinical heterogeneity characteristic of both CD and UC, with high concordance rates for age at presentation, location, and behaviour of disease in multiply affected families.11–14 This is supported by a growing body of molecular data. With reference to CD, six independent groups including our own have now reported an association between variants in the NOD2/caspase recruitment domain containing protein 15 (CARD15) susceptibility gene on chromosome 16 (IBD1) and ileal disease only.15–20 We have additionally shown that specific extended haplotypes across the HLA region (IBD3) are associated with colonic and perianal disease.15 These molecular data increasingly suggest that CD is not a single disorder but a limited heterogeneous group of oligogenic disorders manifesting similar but specific clinical features. It is likely that the precise phenotype that an individual manifests depends upon the specific interaction of one or more of a few genetic variants with an unidentified luminal antigen.

Genome wide scans have identified a number of other putative susceptibility loci for CD but the contribution of these areas to disease phenotype is unknown. Linkage to an area on chromosome 5q31-33 has been demonstrated in two studies for CD only,6,8 although few UC affected families were studied. Fine mapping of this locus, designated IBD5, defined an area that achieved genome wide significance (LOD score 3.9) in families characterised by early age of onset of CD.8 Further characterisation of this area by linkage disequilibrium mapping identified a single highly conserved 250 kb haplotype associated with CD.21 This susceptibility haplotype includes 11 single nucleotide polymorphic variants found exclusively on this haplotype, all of which demonstrated significance in a transmission disequilibrium test. Identifying the causal mutation on this haplotype has been hampered by both the degree of linkage disequilibrium and the density of immunoregulatory genes encoded, including the cytokine gene cluster (interleukin (IL)-4, IL-5, and IL-13).

The contribution of this locus to disease phenotype remains unknown. In this case control study, we determine the association of the IBD5 risk haplotype with specific disease phenotypes in a large British cohort of accurately defined CD and UC patients.


Patients and controls

This study was approved by the Central Oxford Research Ethics Committee (COREC 00.083) and written informed consent was obtained from all participants. A total of 330 Caucasoid patients with CD and 457 with UC were recruited from a single tertiary referral centre. Of the CD patients, 240 had previously been included in a study of IBD1 and IBD3.15 Diagnosis of IBD was determined by conventional clinical, radiological, endoscopic, and histological criteria.22 Phenotypic details were obtained by retrospective case note review for all patients between January 2001 and March 2002 by two investigators (TA, AA). Duration of follow up was defined as the interval between diagnosis and case note review. Details regarding ethnicity, family history, and smoking history were further supplemented by a patient completed postal questionnaire. Ethnicity was defined as Jewish if three or more grandparents were Jewish. Only one member from multiply affected families was included.

CD phenotype

CD phenotype was classified by age of diagnosis, location, and behaviour of disease, as previously described.15 Briefly, location of disease was classified by current or past history of disease at three sites (ileal, colonic, perianal). Perianal disease was defined by the presence of perianal abscesses, fistulae, or ulcers, but not by the presence of skin tags. Disease behaviour was classified by current or past behavioural types. The presence of stenotic disease was defined by surgical history or by small bowel enema. Fistulating disease was defined by the presence of an abnormal communication between two epithelial surfaces. This included the presence of ischiorectal, intersphincteric, and pelvirectal abscesses if they connected with an epithelial surface, either spontaneously or following surgical intervention. However, simple perianal abscesses were excluded from this definition.

UC phenotype

UC phenotype was classified by disease extent and disease severity. Disease extent was defined by the most proximal point of inflammation identified at colonoscopy or barium enema. The splenic flexure was used as a landmark to distinguish limited from extensive disease. If macroscopic and microscopic extent at the same time point were discordant, the microscopic extent was recorded. For individuals who had undergone a colectomy, disease extent was defined by microscopic extent in the resection specimen. Disease severity was judged by the need for colectomy for failed medical therapy and not by the use of colitis activity indices or the requirement for immunosuppressant drugs. Patients with indeterminate colitis or colorectal cancer complicating UC were not eligible for inclusion.

The control group of 870 individuals was randomly selected from anonymised blood samples of two independent populations collected from general practitioner health screening clinics in Bedfordshire (n=516, OXCHECK study23) and Oxfordshire (n=354). These two groups were identical with respect to age, sex, smoking history, and genotype (data not shown) and were therefore combined for all analyses. Although none of the control subjects had IBD, no information regarding their family history was available.


We studied three single nucleotide polymorphisms (SNPs) (IGR2060a_1 C/G, IGR2198a_1 G/C, and IGR3096a_1 C/T) previously shown by a Canadian group to be found uniquely on the IBD5 risk haplotype. In order to confirm haplotype structure and extent of linkage disequilibrium in our British population, these SNPs were selected by their position at both ends and the middle of the 250 kb risk haplotype. These SNPs, which are not thought to be functionally relevant, were used here as markers of the CD associated risk haplotype.

All genotyping was carried out using polymerase chain reaction-sequence specific primers.24 Primers were designed using published genomic sequences (GenBank accession No 18562169). CD patients and controls were also typed for polymorphisms in the NOD2/CARD15 gene (R702W, G908R, and 1007fsinsC) using previously described primers (table 1).15 All reactions contained control primers to verify appropriate amplification and electrophoresis. A single thermocycling programme was used for all reactions. The products were electrophoresed on 1% agarose gels with ethidium bromide, viewed under ultraviolet light, and an image recorded digitally.

Table 1

Primers used for sequence specific polymerase chain reaction

Data analysis

In the absence of parental genotype data, the three SNPs spanning the IBD5 haplotype were constructed into haplotypes using a statistical haplotype reconstruction method, implemented in the computer software PHASE.25 This software was recompiled to run under the Macintosh OS10 operating system. For the construction of haplotypes, 30 000 iterations of the PHASE algorithm were used, each comprising 100 runs through the Markov chain. Each haplotype was then analysed for association with CD and UC susceptibility and phenotype.
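To illustrate why phase has to be inferred statistically, the following minimal Python sketch enumerates the haplotype pairs compatible with an unphased genotype at the three marker SNPs. It is not the PHASE algorithm, which additionally weighs the candidate pairs using estimated population haplotype frequencies; the function names are hypothetical and chosen for illustration only.

```python
from itertools import product

# Alleles at the three marker SNPs, in the order given in the text
# (IGR2060a_1 C/G, IGR2198a_1 G/C, IGR3096a_1 C/T).
SNP_ALLELES = [("C", "G"), ("G", "C"), ("C", "T")]


def all_haplotypes():
    """Every possible three-locus haplotype: 2 x 2 x 2 = 8 in total."""
    return ["".join(h) for h in product(*SNP_ALLELES)]


def compatible_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of three 2-tuples, one per SNP, each holding the two
    alleles observed at that SNP without phase information.
    """
    pairs = set()
    # At each SNP, choose which observed allele sits on the first chromosome;
    # the other allele then sits on the second chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(g[c] for g, c in zip(genotype, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Illustrative genotypes (hypothetical individuals, not study data).
homozygous = [("C", "C"), ("G", "G"), ("C", "C")]
triple_het = [("C", "G"), ("G", "C"), ("C", "T")]

print(len(all_haplotypes()))              # 8
print(compatible_pairs(homozygous))       # [('CGC', 'CGC')]
print(len(compatible_pairs(triple_het)))  # 4
```

An individual homozygous at all three SNPs has a single possible haplotype pair, whereas an individual heterozygous at all three has four, which is the ambiguity that statistical reconstruction resolves.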

Phenotype-genotype associations were analysed by the χ2 test, using the Knowledge Seeker program (Angoss, Guildford, Surrey, UK). A Bonferroni correction factor was applied where appropriate, and when applied is indicated by pc. The population attributable risk percentage (PAR%) was calculated according to the method of Schlesselman,26 assuming that the frequency of all variants in the control population reflects that of the general population.


Results

A total of 330 patients with CD (table 2), 457 patients with UC (212 men and 245 women), and 870 controls were studied. Mean age at diagnosis for CD and UC patients was 28.1 years (range 3.0–82.2) and 33.8 years (range 0.8–80.3), respectively. Median duration of follow up for the entire cohort of patients was 14.5 years (range 0.9–59.9). There was no significant difference between cases and controls with respect to age or sex. A family history of IBD (UC or CD) was reported by 118 (35.7%) of CD patients and 102 (22.3%) of UC patients.

Table 2

Demographic and clinical characteristics of 330 Crohn’s disease patients

Disease susceptibility

We studied three common SNPs on the previously reported IBD5 CD associated haplotype. As previously described, each of these three SNPs was significantly associated with susceptibility to CD only (IGR2060a_1 C: pc=0.0003, relative risk (RR) 1.4; IGR2198a_1 G: pc=0.0003, RR 1.4; IGR3096a_1 C: pc=0.0001, RR 1.5) (table 3).

Table 3

IBD5 allele frequencies for inflammatory bowel disease (IBD) patients and controls

Haplotype analysis is potentially more informative than analysis of single polymorphisms as it facilitates surveillance of variation within a region of linkage disequilibrium.27 Thus haplotypes were constructed from these three SNPs. In the absence of parental data, the association of one SNP with another (phase) was inferred using statistical computer software.25 The probability of accurate haplotype designation was >90% for 99% of chromosomes. Eight haplotypes were identified, of which four occurred at a frequency of more than 2% in the overall population. Two haplotypes (designated H1 and H2, table 4) accounted for 87.7% of all chromosomes, similar to the previous findings in a Canadian cohort.21 Only haplotype 2 (H2) was significantly associated with CD overall (46.8% v 38.9%; p=0.0004, pc=0.003) (table 4). Conversely, H1 (comprising alleles absent on the risk haplotype) was negatively associated with CD (cases 39.8% v controls 48.8%; p=0.00007, pc=0.0006). No difference in haplotype frequencies was observed between UC patients and controls (table 4).
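The H2 comparison can be approximately re-derived from the reported figures. The sketch below is an illustration in Python, not the Knowledge Seeker analysis used in the study: chromosome counts are back-calculated from the rounded percentages (2 × 330 case and 2 × 870 control chromosomes), no continuity correction is applied, and the Bonferroni factor of 8 (the number of haplotypes observed) is an assumption, as the factor is not stated in the text.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], returned with its 1 degree of freedom p value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def bonferroni(p, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p * n_tests)


# Chromosome counts back-calculated from the rounded percentages in the text
# (an approximation, not the study's raw data).
cd_chromosomes = 2 * 330
control_chromosomes = 2 * 870
cd_h2 = round(0.468 * cd_chromosomes)            # H2 on CD chromosomes
control_h2 = round(0.389 * control_chromosomes)  # H2 on control chromosomes

stat, p = chi2_2x2(cd_h2, cd_chromosomes - cd_h2,
                   control_h2, control_chromosomes - control_h2)
pc = bonferroni(p, 8)  # assumed correction factor: eight observed haplotypes
print(f"chi2={stat:.2f}, p={p:.5f}, pc={pc:.4f}")
```

Under these assumptions the uncorrected p value comes out at roughly 0.0004 and the corrected value at between 0.003 and 0.004, in line with the figures quoted above.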

Table 4

Definition and frequency of IBD5 haplotypes in Crohn’s disease patients and controls

In order to investigate the existence of a haplotype dose effect, further analysis was performed (table 5). In CD patients overall, the risk associated with the H2 haplotype was only seen in homozygotes (homozygotes 23.0% v 14.7% (pc=0.001, RR=2.0, 95% confidence interval (CI) 1.4–2.8); heterozygotes 47.6% v 48.4% (NS)). The PAR% of the risk haplotype H2 for CD was estimated to be 20.3%.
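The reported effect sizes can be checked against the genotype percentages quoted above. The following Python sketch is one reading that happens to reproduce them, not the authors' stated computation: it assumes that the relative risk is estimated as an odds ratio against individuals carrying no copy of H2, and that the PAR% follows the widely used formula p(RR - 1)/(1 + p(RR - 1)), with p taken as the H2 carriage frequency in controls. Function names are illustrative.

```python
def odds_ratio(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed):
    """Odds ratio from exposed/unexposed counts or percentages."""
    return (case_exposed / case_unexposed) / (ctrl_exposed / ctrl_unexposed)


def par_percent(exposure_freq, relative_risk):
    """Population attributable risk percentage, p(RR - 1) / (1 + p(RR - 1)).

    Assumed form of the calculation; `exposure_freq` is a proportion (0 to 1).
    """
    excess = exposure_freq * (relative_risk - 1)
    return 100 * excess / (1 + excess)


# H2 genotype percentages quoted in the text (CD patients v controls).
cd = {"hom": 23.0, "het": 47.6}
ctrl = {"hom": 14.7, "het": 48.4}
# Individuals with no copy of H2 make up the remainder.
cd["none"] = 100 - cd["hom"] - cd["het"]
ctrl["none"] = 100 - ctrl["hom"] - ctrl["het"]

# Homozygotes compared with non-carriers.
or_hom = odds_ratio(cd["hom"], cd["none"], ctrl["hom"], ctrl["none"])

# Carriage of at least one copy of H2 compared with non-carriers.
carrier_cd = cd["hom"] + cd["het"]
carrier_ctrl = ctrl["hom"] + ctrl["het"]
or_carrier = odds_ratio(carrier_cd, cd["none"], carrier_ctrl, ctrl["none"])

par = par_percent(carrier_ctrl / 100, or_carrier)
print(round(or_hom, 2), round(or_carrier, 2), round(par, 1))
```

With these inputs the homozygote odds ratio is about 2.0 and the PAR% about 20.3%, matching the values given in the text.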

Table 5

Genotype frequencies of IBD5 risk haplotype in Crohn’s disease patients and controls

CD phenotype

Disease location

Epidemiological studies suggest that anatomical location of disease is a stable defining characteristic in an individual with CD.28 In this study, 63.3% of patients had disease at more than one site. Disease was located to the ileum, colon, and perianal regions in 77.3%, 60.9%, and 43.6% of patients, respectively (table 2).

The highest frequency of haplotype H2 (52.1%) was found in 144 patients with perianal disease. Compared with controls, haplotype H2 was associated with two overlapping phenotypic subgroups only, one defined by the presence of perianal disease (pc=0.0005, RR 1.7) and the other by ileal disease (pc=0.02, RR 1.4) (table 6). Importantly, the association between ileal disease and H2 was not seen in the absence of perianal disease (either in the ileal, no perianal group, or in the pure ileal group), suggesting that the association with ileal disease overall is due to the coexistence of perianal disease in 46 of these patients. However, when patients with perianal disease were compared with those without perianal disease, statistical significance did not withstand strict Bonferroni correction (perianal v non-perianal: p=0.01, pc=NS, RR=1.5).

Table 6

IBD5 risk haplotype frequency stratified by Crohn’s disease location

Again, a haplotype dose effect was observed (table 7). Patients homozygous for the H2 haplotype possessed a threefold increased risk of perianal disease compared with controls (H2/H2: 25.7% v 14.7%; pc=0.0005, RR 3.0 (1.8–5.0)).

Table 7

Genotype and haplotype frequencies of IBD5 risk haplotype in perianal disease

IBD5 and disease location stratified by NOD2/CARD15 status

NOD2/CARD15 mutations have recently been reported to be associated with ileal CD.15–20 To determine any evidence of genetic epistasis, the association with the IBD5 risk haplotype H2 was reanalysed following stratification by NOD2/CARD15 status (table 8). NOD2/CARD15 positive patients were defined by possession of at least one of the three common CD associated variant alleles. Stratification analyses, carried out both in the entire CD cohort and in the subgroup of patients with ileal disease, demonstrated that the frequency of haplotype H2 did not differ between NOD2/CARD15 positive and NOD2/CARD15 negative patients. This suggests that these genetic loci may act independently to determine disease at two different anatomical locations. Thus in the 107 patients with both ileal and perianal disease, association was found with both the NOD2/CARD15 variants (minimum one variant NOD2/CARD15 allele: CD 42.1% v controls 15.4%; p=2×10^−10) and the IBD5 risk haplotype (IBD5 H2 frequency: CD 53.3% v controls 38.9%; p=0.00005; pc=0.0004).

Table 8

IBD5 risk haplotype frequency, stratified by NOD2/CARD15 status

Disease behaviour

The IBD5 risk haplotype was not associated with specific disease behaviour, nor was it associated with the presence of extraintestinal manifestations. Whether perianal disease can justifiably be classified as “fistulating”29 is not yet clear and so depends on the study definition. Importantly, no association was found with the IBD5 risk haplotype H2 and fistulating disease overall (including perianal fistulae) (H2 frequency: fistulating disease (n=118) 51.3% v no fistulating disease (n=212) 44.3%). In addition, no association with the IBD5 haplotype was found in the 6.7% of patients who had fistulae at other sites (H2 frequency: fistulating disease and no perianal disease (n=22) 50.0% v no fistulating disease 44.3%). These data suggest that the IBD5 association is with perianal disease alone, rather than with fistulating disease, but the number of patients in this latter group was too low to allow firm conclusions to be drawn.

Age at diagnosis

Early age of diagnosis was associated with ileal (mean (SEM) age: ileal 26.4 (0.7) v no ileal 34.0 (1.8) years; pc=0.00001), perianal (perianal 26.0 (1.0) v no perianal 29.8 (1.0) years; pc=0.02), stenotic (stenotic 26.8 (0.7) v non-stenotic 31.5 (1.5) years; pc=0.009), and fistulating disease (25.3 (1.0) v no fistulating 29.7 (0.9) years; pc=0.01). In the overall group of CD patients, no association was found between age of diagnosis (mean age of diagnosis: H2 positive, 27.4 years v H2 negative, 29.8 years) or the presence of a family history (H2 frequency: family history 45.8% v no family history 47.4%) and either possession or homozygosity of the IBD5 haplotype H2.

UC disease phenotype

The most proximal point of microscopic inflammation defined disease extent. In this study, 242 (52.9%) had extensive disease defined as inflammation proximal to the splenic flexure. No association was found with disease extent and either possession or homozygosity of the IBD5 risk haplotype H2. A total of 124 (27.1%) patients had severe disease, defined by the need for colectomy. The IBD5 risk haplotype H2 was not associated with disease severity (colectomy 41.9% v no colectomy 38.7%). Early age of diagnosis was associated with extensive disease (mean (SEM) age: extensive disease 32.2 (0.8) v distal disease 35.6 (0.9) years; p=0.009) and need for colectomy (need for colectomy 31.0 (1.0) v no colectomy 35.0 (1.0) years; p=0.005). Consistent with the findings in CD patients, no association was found with the presence of extraintestinal manifestations. In addition, neither age of diagnosis (mean age of diagnosis: possession of H2 32.9 v absence of H2 35.3 years) nor the presence of a family history of IBD (H2 frequency: family history 38.5% v no family history 39.9%) was associated with the IBD5 risk haplotype H2.


Discussion

In CD, it is increasingly believed that not only susceptibility but also clinical phenotype are genetically determined. The IBD5 locus on chromosome 5 has been implicated in CD susceptibility by both linkage and transmission disequilibrium tests.6,8,21 However, the role of this genetic region in determining clinical characteristics is unknown. In this case control study of 330 well defined Caucasian UK CD patients, 457 UC patients, and 870 healthy controls, we sought to identify the contribution of IBD5 firstly to CD and UC susceptibility and then to phenotype, classified by age of onset, location, extent, and behaviour of disease.

The IBD5 locus has recently been fine mapped to a 250 kb haplotype.21 We genotyped three SNPs chosen not for functional relevance but to span this susceptibility region. The presence of tight linkage disequilibrium across this region was confirmed, and two common three-locus haplotypes identified, comprising 88% of all chromosomes. Association by haplotypes was carried out as this is potentially more informative, especially where the actual disease causing mutation has not been identified, as in this case.30 However, the highly conserved haplotype structure of this region, combined with the high prevalence of two common haplotypes, meant that individual SNPs were almost as informative as haplotype analysis.

Susceptibility to CD only was associated with each of the three polymorphisms studied, and with the common reconstructed IBD5 haplotype H2. Analysis by genotype revealed that this association was found exclusively in homozygous individuals, consistent with the previously reported dose effect.21 Interestingly, the relative risk associated with haplotype H2 in our CD population was less than that reported previously in the Canadian cohort21 but consistent with the risk determined by a recent UK multicentre transmission disequilibrium test of 294 CD trios.31 This apparent discrepancy between Canadian and UK cohorts is perhaps not a surprising finding when one considers that linkage to this region has been identified by this former group,8 but not in a UK cohort.3 There are a number of possible explanations for these differences, including both genetic and disease heterogeneity between these two populations. Details of ethnicity and disease phenotype of patients recruited to the Canadian study may in the future permit dissection of these differences.

In our ethnically homogeneous and rigorously phenotyped single centre cohort, we demonstrated that the frequency of haplotype H2 was particularly high in patients with perianal disease. Compared with controls, carriage of a single copy of the IBD5 risk haplotype was associated with a relative risk of perianal disease of 1.9. This increased to 3.0 in individuals homozygous for this haplotype, consistent with the findings from the overall cohort. Importantly, in the absence of perianal disease, the IBD5 risk haplotype was not associated with CD. However, in comparison with the association between NOD2/CARD15 variant alleles and ileal CD, this represents a modest association. Thus the difference in haplotype H2 frequency between patients with and without perianal disease is insufficient to withstand strict Bonferroni correction.

The ideal genotype-phenotype study categorises patients using clearly defined phenotypic variables that demonstrate good inter- and intraobserver agreement. Classification should preferably be applicable at diagnosis, and stable with time. These standards are difficult to meet in IBD studies for a number of reasons. Firstly, existing internationally agreed classification systems are insufficiently detailed to prevent ambiguous assignment of disease subgroup. Not surprisingly therefore, poor inter- and intraobserver agreement, even among experts, may result.32 This may explain why the reported prevalence of perianal involvement varies widely.33–35 In this study, strict definitions of clinical subgroups were implemented to minimise the risk of ambiguous subgroup assignment. Secondly, classifying clinical subgroups may be difficult with phenotypes, particularly CD behaviour, which are not stable with time.28 To overcome this limitation, disease subtype was not classified at diagnosis, or at a single point in time, but by retrospective review of case notes detailing disease course over a median follow up time of 14 years.

Other polymorphic genes have also been associated with clinical phenotype of CD. Most importantly, variants of the recently identified NOD2/CARD15 gene36–38 have been associated with ileal disease.15–20 In this study, stratification of patients with ileal disease by NOD2/CARD15 status revealed no evidence of epistasis between the IBD5 risk haplotype H2 and NOD2/CARD15 variants. This is not surprising as these two genetic regions may be associated with intestinal inflammation at different sites. Indeed, in the 107 individuals in whom disease affected both the perianal region and the ileum, associations with both the H2 haplotype and NOD2/CARD15 variants were identified. This suggests that location of disease is determined by possession of more than one phenotype determining allele or haplotype.

In contrast, no association was found with either overall susceptibility to UC or UC phenotype, which is consistent with the absence of linkage of UC to this chromosomal region.8

The IBD5 risk haplotype H2 spans 250 kb, encompassing a number of immunoregulatory genes. The highly conserved haplotypes across this region impede the identification of the true disease causing variant but there are a number of possible candidates. These include IL-4, IL-5, and IL-13, all of which are crucially involved in Th1-Th2 immune regulation and the inflammatory response.39 Other candidates are those involved in dendritic cell maturation, including CSF2, which encodes granulocyte macrophage-colony stimulating factor, and IL-3.40 Interestingly, this human locus is syntenic to a mouse region on chromosome 11 which has been implicated by linkage studies in the dextran sulphate model of colitis.41 Why the IBD5 locus should only be associated with perianal CD is not clear but these data raise the intriguing possibility that perianal disease results from dysregulated immune responses to site specific bacterial colonisation.

In summary, we have confirmed the association of the IBD5 risk haplotype H2 with CD only, and detailed genotype/phenotype analysis has revealed that this association is most marked in patients with perianal CD. While the precise gene involved is unclear, these data provide further evidence for the genetic basis of the clinical heterogeneity of CD.


Acknowledgements

This work was supported by a grant from the Catholic University of Rome (to AA) and the National Association of Crohn’s and Colitis (NACC, UK) (to TA). The authors thank the patients who participated in this study. The authors acknowledge the contributions of Sue Goldthorpe for organisation and collection of blood specimens and Robert Walton for allowing us to use the OXCHECK samples.

