Introduction

Crohn's disease (CD) and ulcerative colitis (UC) are the major forms of chronic idiopathic inflammatory bowel disease (IBD). They typically present in young adults and have a prevalence of at least one per thousand.1 Familial aggregation of IBD, together with greater concordance in monozygotic than in dizygotic twins,2 indicates a genetic contribution to IBD that is stronger for CD than for UC. The relative risk to siblings of affected individuals (λs) has been estimated at 36.5 for CD, 16.6 for UC and 24.7 for IBD (CD, UC and indeterminate colitis).3

Genome-wide screens in affected relative and sibling pairs provided significant evidence of linkage of CD to chromosome 16.4 This region has been confirmed as a CD susceptibility locus in numerous replication studies in different populations, including one by an international IBD consortium (as reviewed in Cavanaugh5). To date, this locus, IBD1, appears to be the most consistently replicated region of linkage in IBD.

NOD2 is located in the peak region of linkage on chromosome 16 and was found to induce NF-κB activation in monocytes.6 A frameshift mutation caused by a cytosine insertion (3020insC, also designated single-nucleotide polymorphism 13 (SNP13)) truncates the leucine-rich repeat region of NOD2 and was observed to greatly diminish responsiveness to bacterial lipopolysaccharide (LPS).6 Associations between this frameshift mutation and CD have been reported in US, UK and other European populations.6,7,8

Genome-wide screens have also provided significant evidence of linkage of IBD to chromosomes 1 (refs 7,8), 3 (ref 7), 6 (ref 9), 10 (ref 9), 12 (ref 3), 19 (ref 10) and X (ref 11), and two independent studies have reported linkage of CD to chromosome 14q11–12.12,13

In this study, we tested for linkage to the chromosome 16, 14, 12 and 1 susceptibility loci in 65 Irish multiplex families. We then analysed the contribution of NOD2 SNPs to CD in the Irish population.

Materials and methods

Family ascertainment and recruitment

Following Ethics Committee approval, IBD families were recruited nationwide. Peripheral blood samples were collected from affected individuals and unaffected family members. Informed written consent was obtained from all individuals, and affected members were asked to complete a detailed clinical questionnaire. Individuals not of Irish origin were excluded from this study; all affected individuals were Caucasian and had parents who were born in Ireland. DNA was extracted and stored at −20°C prior to genotyping.

A total of 65 multiplex families were recruited (see Table 1 for an overview of family structures), containing 143 affected individuals and 200 unaffected relatives. Sib-pair analysis used 77 affected sibling pairs (ASP) from these multiplex families. In addition, 109 CD simplex families, each consisting of an affected individual and both parents, were recruited.

Table 1 Families and genotyped affected sibling pairs

Linkage analysis

A total of 12 microsatellite markers were chosen to span previously identified regions of interest on chromosomes 1, 12, 14 and 16 (Table 2). With the exception of D1S552, a CHLC marker, the polymorphic microsatellites and initial intermarker distances were taken from the Genethon map,14 and database (ftp://ftp.genethon.fr/pub/Gmap/Nature-1995/data/). The approximate position of D1S552 relative to the Genethon markers was obtained from integrated genome maps available at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). Fluorescent dye-labelled forward primers and unlabelled reverse primers were custom-synthesised (Applied Biosystems and MWG Biotech). Following multiplex PCR, amplified products were analysed on an ABI PRISM™ 310 Genetic Analyser (Applied Biosystems), and allele sizes were determined with ABI GENESCAN software (version 2.0) (Applied Biosystems). PedCheck was used to identify and eliminate genotyping errors and to check for Mendelian misinheritance.15 Allele frequencies for each microsatellite locus were estimated from 72 unrelated, unaffected founder members of the multiplex families.
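PedCheck runs several levels of checks. The sketch below shows only the simplest one: whether a child's genotype could have arisen from one allele of each parent. It also shows founder-based allele-frequency counting. It assumes autosomal codominant markers, with each genotype stored as a pair of allele sizes in bp. The function names are hypothetical and are not part of PedCheck.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation: a genotype is a pair of allele sizes (bp).
Genotype = Tuple[int, int]


def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child can carry one maternal and one paternal allele.

    This is only the single-locus trio check; PedCheck does considerably more."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def founder_allele_frequencies(founders: Iterable[Genotype]) -> Dict[int, float]:
    """Allele frequencies estimated by simple counting in unrelated founders."""
    counts = Counter(allele for genotype in founders for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}
```

Restricting frequency estimation to unrelated founders avoids counting the same founder allele several times through its descendants.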

Table 2 Microsatellite markers and intermarker distances according to Genethon map

Pedigrees were initially analysed by disease subgroup (CD, UC or mixed) and then combined to generate an overall result for IBD, with the exception of chromosome 14. The chromosome 14 locus was originally reported as a CD susceptibility locus in two independent US studies.12,13 However, in another North American panel, suggestive evidence of linkage to chromosome 14 was observed in mixed families rather than in CD families.7 Markers on chromosome 14 were therefore genotyped and analysed in the CD and mixed families only. Multipoint nonparametric linkage (NPL) analysis was performed using the ‘Sall’ scoring function in the GENEHUNTER-PLUS16 modification of the GENEHUNTER package.17 This modification also permits calculation of a multipoint LOD (MLOD) score based on a linear allele-sharing model.16 Exclusion maps were constructed for the loci on chromosomes 1, 12 and 14 using GENEHUNTER17 with λs values of 1.2, 1.5, 2.0, 2.5, 3.0 and 5.0. Linkage was considered excluded for a given value of λs if the LOD score was less than −2.0.
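The following is a minimal sketch of how an exclusion LOD for a given λs can be computed from affected sib pairs. It assumes Risch's parameterisation of IBD sharing with no dominance variance: z0 = 1/(4λs), z1 = 1/2 and z2 = 1/2 − 1/(4λs). It also assumes that per-pair IBD posteriors, computed under the null prior, are already available. GENEHUNTER's internal implementation may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# P(sib pair shares 0, 1, 2 alleles IBD) in the absence of linkage.
NULL_SHARING = (0.25, 0.5, 0.25)


def sharing_under_lambda_s(lambda_s: float) -> Tuple[float, float, float]:
    """IBD-sharing probabilities for an affected sib pair at a locus with
    sibling relative risk lambda_s (assumes no dominance, so z1 = 0.5)."""
    if lambda_s < 1.0:
        raise ValueError("lambda_s must be >= 1")
    z0 = 0.25 / lambda_s
    z1 = 0.5
    return (z0, z1, 1.0 - z0 - z1)


def exclusion_lod(ibd_posteriors: Sequence[Sequence[float]], lambda_s: float) -> float:
    """LOD score for a locus of effect lambda_s against no linkage.

    ibd_posteriors[i][k] is the probability, given the marker data and the
    null prior NULL_SHARING, that sib pair i shares k alleles IBD."""
    z = sharing_under_lambda_s(lambda_s)
    lod = 0.0
    for post in ibd_posteriors:
        # P(markers | IBD = k) is proportional to post[k] / NULL_SHARING[k].
        ratio = sum(z[k] * post[k] / NULL_SHARING[k] for k in range(3))
        lod += math.log10(ratio)
    return lod


def is_excluded(ibd_posteriors, lambda_s: float, threshold: float = -2.0) -> bool:
    """Exclusion criterion used in the text: LOD below -2.0 at this lambda_s."""
    return exclusion_lod(ibd_posteriors, lambda_s) < threshold
```

For example, seven fully informative pairs that share no alleles IBD (posterior [1, 0, 0]) give 7·log10(0.5) ≈ −2.11 at λs = 2.0, which meets the exclusion criterion. Six such pairs do not.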

Microsatellite saturation

A further 10 microsatellite markers were chosen to saturate the region of interest on chromosome 16. The polymorphic microsatellites were taken from the Genethon map,14 and database (ftp://ftp.genethon.fr/pub/Gmap/Nature-1995/data/) and the MarshMed database (www.marshmed.org/genetics/). Marker positions were obtained from integrated genome maps available at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). Fluorescent dye-labelled forward primers and unlabelled reverse primers were custom-synthesised (Sigma Genosys) (Table 4). Multipoint analysis of the 21 CD multiplex families comprising 29 ASP was carried out with GENEHUNTER-PLUS16 using the ‘Sall’ scoring function, as detailed above.
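The ‘Sall’ score is Whittemore and Halpern's S_all statistic. The sketch below computes it for one inheritance pattern. Each affected relative is represented by the labels of the two founder alleles they inherited. The score averages, over every way of picking one allele from each affected person, the product of (number of times each founder allele was picked)!. GENEHUNTER averages this score over inheritance vectors weighted by their posterior probability, which is not shown here. The names are illustrative only.

```python
import itertools
import math
from typing import Sequence, Tuple


def s_all(affected: Sequence[Tuple[str, str]]) -> float:
    """S_all for one inheritance pattern; affected[j] holds the two
    founder-allele labels inherited by affected relative j."""
    total = 0
    for picks in itertools.product(*affected):  # 2**a allele collections
        counts = {}
        for allele in picks:
            counts[allele] = counts.get(allele, 0) + 1
        total += math.prod(math.factorial(c) for c in counts.values())
    return total / 2 ** len(affected)


# Hypothetical sib-pair patterns sharing 0, 1 or 2 founder alleles IBD.
ASP_PATTERNS = {
    0: [("m1", "p1"), ("m2", "p2")],
    1: [("m1", "p1"), ("m1", "p2")],
    2: [("m1", "p1"), ("m1", "p1")],
}
NULL_PROBS = {0: 0.25, 1: 0.5, 2: 0.25}


def normalized_asp_score(ibd: int) -> float:
    """S_all for an ASP standardised to mean 0, variance 1 under no linkage."""
    scores = {k: s_all(p) for k, p in ASP_PATTERNS.items()}
    mean = sum(NULL_PROBS[k] * scores[k] for k in scores)
    var = sum(NULL_PROBS[k] * (scores[k] - mean) ** 2 for k in scores)
    return (scores[ibd] - mean) / math.sqrt(var)
```

For a single affected sib pair, S_all takes the values 1, 1.25 and 1.5 for 0, 1 and 2 alleles shared IBD. In this case it is therefore equivalent to counting shared alleles. Its advantage appears in larger sibships, where it rewards many affected relatives sharing the same founder allele.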

Table 4 Results of PDT analysis using microsatellite markers in the chromosome 16 linkage region

Linkage disequilibrium analysis

Linkage disequilibrium analysis was subsequently carried out in CD families across the region showing the strongest evidence of linkage. In all, 13 microsatellite markers were analysed in 421 individuals from 121 simplex and 10 multiplex CD families using the pedigree disequilibrium test (PDT).18,19
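This is a simplified sketch of the PDT statistic for one allele M. It assumes the triad scores (affected child with both parents) and discordant-sib-pair scores have already been extracted for each family. A triad scores transmitted minus non-transmitted copies of M. A discordant pair scores copies of M in the affected sib minus copies in the unaffected sib. Summing the scores within a family gives larger families more weight, whereas averaging gives each family equal weight. These two versions are often labelled PDT-sum and PDT-avg. The PDT software's handling of family structure is more involved, and the function names here are hypothetical.

```python
import math
from typing import Sequence, Tuple

Genotype = Tuple[int, int]


def count_allele(genotype: Genotype, allele: int) -> int:
    return sum(1 for a in genotype if a == allele)


def triad_score(child: Genotype, mother: Genotype, father: Genotype, allele: int) -> int:
    """Transmitted minus non-transmitted copies of `allele` to an affected child.

    Homozygous parents contribute zero, so this reduces to 2c - (m + f)."""
    c = count_allele(child, allele)
    return 2 * c - count_allele(mother, allele) - count_allele(father, allele)


def dsp_score(affected: Genotype, unaffected: Genotype, allele: int) -> int:
    """Copies of `allele` in the affected sib minus copies in the unaffected sib."""
    return count_allele(affected, allele) - count_allele(unaffected, allele)


def pdt(families: Sequence[Sequence[int]], average: bool) -> Tuple[float, float]:
    """families[j] lists the triad/DSP scores of family j.

    average=False sums scores within each family (larger families weigh more);
    average=True averages them (each family weighs equally).
    Returns (T, two-sided P) from the normal approximation."""
    d = []
    for scores in families:
        if not scores:
            continue  # uninformative family
        s = sum(scores)
        d.append(s / len(scores) if average else float(s))
    denom = math.sqrt(sum(x * x for x in d))
    if denom == 0:
        return 0.0, 1.0
    t = sum(d) / denom
    return t, math.erfc(abs(t) / math.sqrt(2))
```

Because the family score D is computed within families and families are treated as independent units, the statistic remains valid when several related triads and sib pairs come from the same pedigree.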

Mutation detection and analysis

NOD2 SNPs were genotyped using a combination of the SNaPIT mutation detection method20,21 and restriction endonuclease digestion. Information on the relevant SNPs was obtained from the NCBI SNP database (http://www.ncbi.nlm.nih.gov/SNP).

In brief, primers were designed with the computer program OLIGO™ (version 4.0)22 to amplify seven of the published NOD2 SNPs (Table 3). Following amplification, six SNPs were genotyped using the SNaPIT method, a specific and versatile mutation detection technique that has previously been used in both large- and small-scale studies.23 Primers were designed so that, during amplification, the position of the first uracil incorporated into the extended primer differed depending on whether the polymorphism was present or absent. Subsequent glycosylase excision of the uracil residues, followed by cleavage at the resulting apyrimidinic sites, allowed the polymorphism to be detected as a fragment length polymorphism.20 During optimisation of this technique, a random subset of samples was genotyped using both the SNaPIT method and digestion with the MspI and HpaII restriction endonucleases, with 100% concordance in allele calls. SNP12 (using the nomenclature of Hugot et al24) was not suitable for the SNaPIT method and was instead genotyped using the restriction endonuclease HinP1I (New England Biolabs).

Table 3 SNP primer sequence and fragment size generated following SNaPIT method or restriction endonuclease digestion (SNP12)

Following digestion as outlined,20 SNaPIT- and restriction endonuclease-treated samples were analysed on an ABI PRISM™ 310 Genetic Analyser (Applied Biosystems). Individuals of known heterozygous and homozygous genotype were included as controls to ensure consistent identification of the relevant SNPs between runs.

SNP data were analysed for linkage disequilibrium using the PDT 3.12 program,18,19 which uses all potentially informative data from extended as well as individual pedigrees. A healthy control population was genotyped for the SNP8, SNP12 and SNP13 polymorphisms to determine their allele frequencies. Fisher's exact test and Student's t-test were used to compare qualitative and quantitative variables, respectively, between groups.
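The two group comparisons named above can be sketched with the standard library alone. This assumes 2×2 count tables for qualitative variables and two samples of measurements (such as age at onset) for quantitative ones. The two-sided Fisher P-value sums the hypergeometric probabilities of every table with the same margins that is no more probable than the observed one. The t-test sketch returns only the pooled-variance statistic and its degrees of freedom. Converting it to a P-value requires a t-distribution CDF, for example from a statistics library. The function names are illustrative.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small tolerance so tables tied with the observed one are not lost to rounding.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def student_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```

For instance, the table [[6, 2], [1, 4]] gives P = 132/1287 ≈ 0.103.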

Results

Nonparametric linkage analysis

Multipoint analysis of the affected sib-pairs (n=77 pairs) was performed using the ‘Sall’ algorithm of GENEHUNTER-PLUS.16 Linkage of CD to chromosome 16 was confirmed,25 with a maximum MLOD score of 2.241 at D16S3120; the UC sib-pairs showed MLOD scores of zero in this region. There was no significant evidence of linkage of IBD to the chromosome 1, 12 or 14 regions (MLOD=0.786 at D1S552, MLOD=0.605 at D12S1708, MLOD=0.107 at D14S264). Using GENEHUNTER,17 we were able to exclude a chromosome 12 IBD susceptibility locus with λs=3.0, although higher values (λs>5) were needed to exclude it in the CD, UC and mixed subgroups. Furthermore, λs=2.0 was excluded at D12S83, the marker originally reported to confer a sibling relative risk of 2.0 for IBD.3 A chromosome 14 locus with λs=3.0 was excluded in the mixed families. However, higher values (λs>5 in all cases) were required to exclude chromosome 14 in the CD families and chromosome 1 in all subgroups.

Microsatellite saturation experiment

A further 10 microsatellite markers spanning the interval from D16S3093 to D16S3034, at approximately 1.6 cM intervals, were genotyped in 21 CD multiplex families comprising 29 ASP. Results of GENEHUNTER-PLUS analysis using the ‘Sall’ algorithm are shown in Figure 1. Overall, the CD linkage curve showed three peaks, with the maximal MLOD centred on the D16S3120 region and additional peaks near the D16S409 and D16S419 markers.

Figure 1

Fine mapping of the region of interest on chromosome 16 using CD ASP families. Linkage analysis was performed using GENEHUNTER-PLUS.

Linkage disequilibrium analysis

A total of 421 individuals from 121 simplex CD families and 10 multiplex CD families were genotyped for 13 microsatellite markers. These multiplex families were selected because both parents and unaffected siblings were available for genotyping, which increased the genetic information available for analysis and provided additional internal controls.26 Linkage disequilibrium within this data set was analysed using the PDT.18,19 PDT analysis gave a significant P-value of 0.0054 at the D16S3080 marker when larger families were given more weight; this rose to P=0.017 when all families were weighted equally (Table 4). Weak associations were observed for alleles of markers D16S3044 (P=0.02), D16S3080 (allele 2: P=0.009, allele 3: P=0.008, allele 4: P=0.04) and D16S419 (P=0.05). However, after Bonferroni correction for the number of alleles tested, only a weak association between allele 3 (279 bp) of D16S3080 and the CD phenotype remained (P=0.05). Stratifying the data by age of onset (<21 years) or by disease severity did not significantly alter these results (data not shown).
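The Bonferroni correction multiplies each raw P-value by the number of tests m and caps the result at 1. Here m is the number of alleles tested, which the text does not report. Any value of m used with the function below is therefore hypothetical.

```python
from typing import List, Optional, Sequence


def bonferroni(p_values: Sequence[float], m: Optional[int] = None) -> List[float]:
    """Bonferroni-adjusted P-values: each raw P times the number of tests m
    (default: the number of P-values supplied), capped at 1."""
    m = len(p_values) if m is None else m
    return [min(1.0, p * m) for p in p_values]
```

The correction is conservative when the tests are correlated, as alleles of the same marker are. This is one reason a borderline adjusted P-value such as 0.05 is best read as a weak association.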

NOD2 SNP analysis

Seven SNPs (SNP2, 5–8, 12 and 13) were genotyped in 121 CD simplex families and 10 CD multiplex families. PDT analysis showed a positive association between the SNP13 polymorphism and CD (P=0.018 when larger families were given more weight; P=0.028 when all families were weighted equally). SNP7 gave a P-value of 0.0305 when larger families were given more weight, but a nonsignificant P-value of 0.0955 when all families were weighted equally (Table 5). The SNP13 mutation was rare: fewer than 2% of CD patients in the population studied were homozygous, and the observed rare allele frequency was 0.04 (Table 5). The mutation was more frequent in families multiply affected with CD than in sporadic cases of CD (22 vs 10%), although this difference was not significant (P=0.088).

Table 5 Results of PDT analysis of NOD2 SNPs. NOD2 SNP rare allele frequencies as determined from the nonaffected founder members of our cohort are also displayed

The rare alleles of SNP8 and SNP12 did not appear to contribute to CD risk in this cohort (nonsignificant P-values). Of the CD patients in this cohort, 22.6% carried a single copy of a NOD2 mutation and 4.8% carried two mutations; in total, 27.4% of patients with CD carried at least one NOD2 mutation. Haplotypes carrying more than one NOD2 mutation (SNP8, 12 and 13), defined as a ‘NOD2 haplotype’,28 were not observed in multiplex families but had an estimated frequency of 1.8% among the sporadic cases. No haplotype carrying all three rare alleles of SNP8, 12 and 13 was observed in this data set. Allele frequencies for SNP8, 12 and 13 in our control population (0.04, 0.01 and 0.01, respectively) were comparable to those in other European control populations.24,28
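The carrier proportions above follow from a simple tally of rare-allele copies per patient. The sketch below assumes each patient is recorded as a mapping from SNP name to rare-allele count (0, 1 or 2). It also assumes mutations are counted at SNP8, SNP12 and SNP13, which the text does not state explicitly for this tally. A homozygote and a compound heterozygote both count as carrying two mutations. The function and data layout are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed set of NOD2 mutations counted in the tally.
DEFAULT_SNPS = ("SNP8", "SNP12", "SNP13")


def nod2_carrier_summary(
    patients: List[Dict[str, int]], snps: Sequence[str] = DEFAULT_SNPS
) -> Dict[str, float]:
    """Fractions of patients carrying exactly one, two or more, and at least
    one rare NOD2 allele, summed over the listed SNPs."""
    n = len(patients)
    totals = [sum(p.get(s, 0) for s in snps) for p in patients]
    one = sum(1 for t in totals if t == 1) / n
    two_plus = sum(1 for t in totals if t >= 2) / n
    return {"one": one, "two_or_more": two_plus, "any": one + two_plus}
```

Because the categories are mutually exclusive, the "any" fraction is simply their sum. This is how 22.6% and 4.8% combine to give 27.4%.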

Stratification on NOD2 genotypes

To assess the contribution of NOD2 to the original chromosome 16 linkage in the ASP families, we stratified the cohort by removing the genotypes of all individuals carrying mutations at SNP7 or SNP13; for this analysis, all carriers of these NOD2 mutations were coded as ‘unknown’ in the linkage files. The MLOD score at D16S3120 dropped from 2.2 to 1.9, whereas the MLOD scores at the D16S409 and D16S419 loci fell less markedly.
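The following minimal sketch masks carriers in a pedigree file. It assumes the pre-makeped LINKAGE layout: pedigree, individual, father, mother, sex, affection status (0 = unknown), then allele pairs with 0 = missing. The text does not specify whether ‘unknown’ refers to affection status, genotypes or both, so this sketch masks both. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def mask_carriers(rows: List[List[str]], carriers: Set[Tuple[str, str]]) -> List[List[str]]:
    """Return a copy of pre-makeped LINKAGE rows in which every carrier
    (identified by (pedigree, individual)) has affection status set to 0
    (unknown) and all marker alleles set to 0 (missing)."""
    masked = []
    for row in rows:
        new = list(row)  # copy so the input rows are left untouched
        if (row[0], row[1]) in carriers:
            new[5] = "0"
            new[6:] = ["0"] * (len(row) - 6)
        masked.append(new)
    return masked
```

Masking rather than deleting the carriers keeps the pedigree structure intact, so their relatives can still be analysed.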

NOD2 SNPs and predominant disease distribution

NOD2 mutations have recently been found to be associated with ileal disease.28,29 Patients in this cohort who carried a rare allele of SNP8, 12 or 13 presented earlier than patients without these variants (mean age, 20.1 vs 24 years, P=0.011). Among patients with available clinical information, 92% of those carrying a copy of the SNP13 rare allele had predominantly ileal disease (P=0.02). SNP8 and SNP12 were not significantly associated with predominantly ileal disease. However, as the numbers involved are relatively small (n=16, 6 and 18 for SNP8, 12 and 13, respectively), a detailed study of the contribution of NOD2 mutations to primary disease location in additional Irish CD patients is currently in progress.

Discussion

Genetic linkage of CD to the IBD1 locus on chromosome 16 has been confirmed by many independent groups and by a large collaborative study.5 Taken together, the results of these independent studies, including this one, are highly consistent, and the recent identification of NOD2 as an IBD susceptibility gene confirms the importance of the chromosome 16 linkage results. Nonparametric linkage analysis of the CD-affected sib-pairs confirmed that a susceptibility locus for CD in the Irish population lies in the pericentromeric region of chromosome 16, with a significant MLOD value at D16S3120 and a value approaching significance at D16S409.

On chromosomes 1, 12 and 14, combined analysis generated positive but nonsignificant LOD scores (MLOD=0.786 at D1S552, MLOD=0.605 at D12S1708, MLOD=0.107 at D14S264). None of the disease subgroups showed significant evidence of linkage to these regions, although the λs values required to exclude these loci ranged from 3.0 to >5.0. To date, no previous study has suggested that any IBD locus confers a sibling risk of this magnitude. Linkage findings are often difficult to replicate, and studies of the same disease frequently give inconsistent results.30 Considerably more families appear to be required to detect a susceptibility locus in a replication analysis than in the initial genome-wide screen, particularly for loci of modest effect.31 The lack of linkage to these regions in our replication analysis could therefore reflect the relatively small sample size, and weak susceptibility genes may yet exist in these regions. Alternatively, these loci may play a less significant role in susceptibility to IBD in the cohort analysed, possibly because of disease or locus heterogeneity. Distinguishing between these explanations will require the analysis of larger numbers of families.

Microsatellite saturation of the region of most positive linkage on chromosome 16 in the ASP families revealed three peaks, with the maximal MLOD centred on the D16S3120 region and additional peaks near the D16S409 and D16S419 markers. In the initial report, the IBD1 region spanned more than 40 cM and two general peaks were observed.4 In subsequent studies, peaks in CD families have been reported at many different markers on chromosome 16 (as reviewed in Cavanaugh5). Our result is therefore consistent with other studies of this region and may provide further evidence that more than one IBD susceptibility gene resides on chromosome 16.

To refine the IBD1 localisation further, we tested the genotyped microsatellite markers for association. A weak association was observed between CD and the 279 bp allele of D16S3080 (P=0.05). This marker lies within 3 cM of the region with the strongest evidence for linkage in our CD ASP multipoint analysis, inside our strongest support interval for linkage (D16S409–D16S3120). The small difference in position between the linkage and PDT analyses could reflect the overlapping but not identical sets of families used in the two analyses, as well as the known limitations of ASP and haplotype-sharing methods in the fine mapping of complex traits.32 Taken together, the linkage and linkage disequilibrium analyses point to the same pericentromeric region of chromosome 16.

Family-based association analysis in this study provided evidence of an association between a NOD2 frameshift mutation (SNP13) and CD, and SNP13 was more frequent, though not significantly so, in families multiply affected with CD than in sporadic cases. SNP7, by contrast, was associated with CD only when more weight was given to the larger, multiplex families in the data set. This result could be due to statistical fluctuation or to underlying genetic differences between multiply affected CD families and sporadic cases. Ogura et al6 noted that the applicability of NOD2 associations to the more common, sporadic cases would warrant further study, and case–control studies have since shown that NOD2 also contributes significantly to sporadic CD.33,24,28 Nevertheless, the frequency of the SNP13 polymorphism was previously observed to be significantly higher in individuals from pure CD families than in affected individuals from mixed families or in sporadic CD cases.28 In our genotype–phenotype analysis, carriage of a NOD2 rare allele (SNP8, 12 or 13) was associated with a younger age of onset (P=0.011), and SNP13, similar to other studies,28,29 with a predominantly ileal disease distribution (P=0.02). Analysis of NOD2 polymorphisms in a further cohort of CD patients is currently in progress.

The nonsignificant excess transmission of the SNP8 and SNP12 mutations could reflect their very low population frequency, as only a small number of transmissions will be informative even in a large cohort.28 The lack of evidence of association at these SNPs may therefore be due to the relatively small sample size analysed.
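A rough calculation shows why so few transmissions are informative. The sketch assumes Hardy–Weinberg proportions and takes the parental rare-allele frequency to equal the control frequency q, which probably understates it in CD families. Under these assumptions, n parent–offspring trios contain on average 2n × 2q(1 − q) heterozygous, and hence informative, parents. The function name is hypothetical.

```python
def expected_heterozygous_parents(n_trios: int, q: float) -> float:
    """Expected number of heterozygous (informative) parents among n_trios
    trios, assuming Hardy-Weinberg proportions with rare allele frequency q."""
    return 2 * n_trios * 2 * q * (1 - q)
```

With the 121 simplex families and the control frequencies reported above (q = 0.01 for SNP12, q = 0.04 for SNP8), this gives roughly 4.8 and 18.6 informative parents, respectively.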

After stratification by NOD2 genotype, the central linkage peak dropped from an MLOD of 2.2 to 1.9, suggesting that NOD2 contributed to allele sharing in this region. The relatively unchanged flanking peaks may indicate that susceptibility genes other than NOD2 exist on chromosome 16. Hampe et al34 have reported evidence for a NOD2-independent susceptibility locus close to the D16S3068 microsatellite marker in the German population, which may lead to the discovery of additional susceptibility genes in this region of chromosome 16.

In summary, we present evidence of an association between NOD2 and CD in the Irish population. No significant evidence of linkage was observed between IBD and the chromosome 1, 12 and 14 loci in this data set. Relative to Britain and much of mainland Europe, the ethnic and genetic mix of Ireland is long established and relatively homogeneous.35,36,37 Analysis of this population may therefore offer advantages in genetic studies of complex diseases.

In terms of position and function, NOD2 is an excellent candidate gene for IBD. Other potentially disease-causing mutations in NOD2 have recently been identified by Lesage et al,27 and the roles of both these and the previously identified polymorphisms in the pathophysiology and treatment of CD are under intensive investigation. Clarifying the precise role of NOD2 in innate immune responses in the gastrointestinal tract may help to identify associated environmental factors and guide the search for specific therapies.