Original article
Genetic factors conferring an increased susceptibility to develop Crohn's disease also influence disease phenotype: results from the IBDchip European Project
  1. Isabelle Cleynen1,
  2. Juan R González2,
  3. Carolina Figueroa2,
  4. Andre Franke3,
  5. Dermot McGovern4,
  6. Martin Bortlík5,
  7. Bart J A Crusius6,
  8. Maurizio Vecchi7,
  9. Marta Artieda8,
  10. Magdalena Szczypiorska8,
  11. Johannes Bethge3,
  12. David Arteta8,
  13. Edgar Ayala2,
  14. Silvio Danese9,
  15. Ruud A van Hogezand10,
  16. Julian Panés2,
  17. Salvador Amado Peña6,
  18. Milan Lukas5,
  19. Derek P Jewell4,
  20. Stefan Schreiber3,
  21. Severine Vermeire1,
  22. Miquel Sans2,11
  1. 1Department of Clinical and Experimental Medicine, KU Leuven, Leuven, Belgium
  2. 2Department of Gastroenterology, Hospital Clínic i Provincial/IDIBAPS, Barcelona, Spain
  3. 3 Institute for Clinical Molecular Biology, Christian Albrechts University Kiel, Kiel, Germany
  4. 4 Nuffield Department of Medicine, University of Oxford, Oxford, UK
  5. 5 Gastroenterology Center, 4th Internal Department, General Faculty Hospital, Charles University, Prague, Czech Republic
  6. 6 Laboratory of Immunogenetics, Department of Medical Microbiology and Infection Control, VU University Medical Center, Amsterdam, The Netherlands
  7. 7 Department of Gastroenterology, IRCCS Policlinico San Donato, University of Milan, Milan, Italy
  8. 8 Progenika Biopharma, Derio, Spain
  9. 9 IBD Center, Department of Gastroenterology, Humanitas Clinical and Research Center, Rozzano, Italy
  10. 10 Department of Gastroenterology and Hepatology, Leiden University Medical Center, Leiden, The Netherlands
  11. 11 Department of Digestive Diseases, Centro Médico Teknon, Barcelona, Spain
  1. Correspondence to Dr Miquel Sans, Department of Digestive Diseases, Centro Médico Teknon, Barcelona, Spain; sans{at}


Objective Through genome-wide association scans and meta-analyses thereof, over 70 genetic loci (Crohn's disease (CD) single nucleotide polymorphisms (SNPs)) are significantly associated with CD. We aimed to investigate the influence of CD-SNPs and basic patient characteristics on CD clinical course, and develop statistical models to predict CD clinical course.

Design This retrospective study included 1528 patients with CD with more than 10 years of follow-up from eight European referral hospitals. CD outcomes of interest were ileal (L1), colonic (L2) and ileocolonic disease location (L3); stenosing (B2) or penetrating behaviour (B3); perianal disease; extraintestinal manifestations; and bowel resection. A complicated disease course was defined as stenosing or penetrating behaviour, perianal disease and/or bowel resection. Association between CD-SNPs or patient characteristics and specified outcomes was studied.

Results Several CD-SNPs and clinical characteristics were statistically associated with outcomes of interest. The NOD2 gene was the most important genetic factor, being an independent predictive factor for ileal location (p=2.02×10-06, OR=1.90), stenosing (p=3.16×10-06, OR=1.82) and penetrating (p=1.26×10-02, OR=1.25) CD behaviours, and need for surgery (p=2.28×10-05, OR=1.73), and as such was also the strongest factor associated with a complicated disease course (p=6.86×10-06, OR=2.96). Immunomodulator (azathioprine/6-mercaptopurine and methotrexate) use within 3 years after diagnosis led to a reduction in bowel stenoses (p=1.48×10-06, OR=0.35) and surgical rate (p=1.71×10-07, OR=0.34). Association between each outcome and genetic scores, created using significant SNPs in the univariate analysis, revealed large differences in the probability of developing fistulising disease (IL23R, LOC441108, PRDM1, NOD2; p=9.64×10-04, HR=1.43), need for surgery (IRGM, TNFSF15, C13ORF31, NOD2; p=7.12×10-03, HR=1.35), and stenosing disease (NOD2, JAK2, ATG16L1; p=3.01×10-02, HR=1.29) between patients with low and high scores.

Conclusions This large multicentre cohort study has found several genetic and clinical factors influencing the clinical course of CD. NOD2 and early immunomodulator use are the clinically most meaningful predictors for its clinical course.

  • Crohn's Disease
  • Inflammatory Bowel Disease
  • IBD – Genetics
  • Genetic Polymorphisms
  • IBD

Significance of this study

What is already known on this subject

  • Inflammatory bowel disease (IBD) has a heterogeneous presentation with respect to disease location, behaviour and severity.

  • Most patients will have a non-complicated disease at diagnosis, but the majority will evolve towards a more complicated disease course. Prediction of those high-risk patients may identify those in need of early biological therapy.

  • Through genome-wide association scans, and meta-analyses thereof, over 100 IBD risk loci have been identified. The main merit of genetic studies in IBD so far has been the translation from the identification of Crohn's disease susceptibility loci to pathways important for disease pathogenesis.

  • Several studies have shown that patients carrying NOD2 variants are at higher risk of developing stenosing and non-perianal fistulising complications, and need for surgery earlier in the disease course.

What are the new findings

  • We confirm the carriage of any NOD2 variant allele as a strong predictor of a more complicated disease course (ileal location, stenosing disease behaviour, bowel resection), while being protective for colonic disease.

  • JAK2 was also found to be a predictor of a more complicated disease behaviour (ileal involvement, and shorter time to stenosing disease behaviour).

  • Of the clinical factors studied, early (within 3 years after diagnosis) immunomodulator use is strongly protective against a stenosing disease behaviour and need for surgery, and thus protects against a complicated disease progression.

How might it impact on clinical practice in the foreseeable future?

  • Molecular genetic testing could be clinically useful in an integrated molecular and clinical diagnostic approach. In particular, the observed OR for NOD2 p.Leu1007fsX1008 homozygosity is in the order of three to fourfold, and early immunomodulator use reduces the rate of need for surgery by threefold. Also, the developed genetic risk score shows striking differences for a more complicated disease between patients with Crohn's disease with a low and a high score.

Introduction


Crohn's disease (CD) is a chronic relapsing inflammatory disease of the gut. It is a heterogeneous disorder with differences in disease location, behaviour and severity. The heterogeneity of the disease has important implications for clinical management: patients with a more severe disease course might benefit from early introduction of immunomodulators and/or biologicals, while patients with favourable disease prognosis could be spared from intense treatment and possible side effects. The cause of inflammatory bowel disease (IBD) is multifactorial (environmental and genetic) and poorly understood.1 The genetic background of CD has been extensively evaluated, which has led to significant insights into the mechanism of the disease: a disturbed surveillance of bacteria of the microflora by the intestinal mucosa (NOD2),2 ,3 dysregulation of adaptive immunity (IL23R),4 deficient autophagy (ATG16L1, IRGM),5 ,6 and/or endoplasmic reticulum stress (XBP1, ORMDL3).7 ,8 Predicting the course of the disease has been a challenge. Efforts have been made using clinical and serological markers, but the diagnostic and prognostic specificity and sensitivity of these methods are generally too low to be useful in clinical practice.9–11 Also, many of the proposed factors have not been confirmed. Given the appealing role genetic markers could play as predictors of disease evolution (stable over time, unaffected by disease flare, present before onset of disease, easy to assess), attempts have been made to link genetic variants with the classical clinical CD subphenotypes. Several studies have suggested that NOD2 variants are associated with ileal disease location, shorter time to onset of stenosing disease and the need for surgery.12–17 However, no other robust associations of studied genetic variants with clinical subphenotypes could be shown.

A meta-analysis of all genome-wide association scans (GWAS) performed in CD revealed a total of 71 regions of susceptibility.18 Before that, a first meta-analysis of three GWAS performed in CD had been published, revealing 30 regions of susceptibility and 10 nominally associated CD risk loci.7 The aim of this study was to investigate the influence of these loci on the clinical course of CD in a large multicentre cohort, and to develop a statistical model to predict CD clinical course.


Methods

Patients and study protocol

Eight European hospitals participated in this study, including a total of 1528 patients with CD of Caucasian ancestry (Leuven=365, Kiel=275, Oxford=203, Prague=188, Amsterdam=142, Leiden=141, Milan=108 and Barcelona=106). All recruiting centres are highly specialised ‘IBD units’ at tertiary referral hospitals. The ethical boards of each separate recruiting institution approved the study. All patients included in this study gave written informed consent. Diagnosis of CD was based on standard clinical, endoscopic and histological criteria. Disease location and behaviour were recorded according to the Montreal classification.9 ,19 Taking into account that CD behaviour varies over the course of the disease in a great proportion of patients,20 a minimum of 10 years of continuous follow-up since CD diagnosis was required as the main inclusion criterion in the study, preventing selection bias. Of note, median follow-up since CD diagnosis in our cohort was 18 years (range 10–62 years). Additionally, basic phenotypical and clinical characteristics including gender, smoking status, age at diagnosis, use of an anti-TNF agent within 3 years after diagnosis and use of immunomodulators (thiopurines (azathioprine/6-mercaptopurine) and methotrexate) within 3 years after diagnosis were retrieved. Smoking status was defined as any smoking habit (>1 cigarette/day) at any time.


We defined eight clinical outcomes of interest: ileal disease location (L1), colonic disease location (L2), ileocolonic disease location (L3), stenosing behaviour (B2), internal penetrating (fistulising) behaviour (B3), perianal disease, bowel resection, and extraintestinal manifestations occurring during follow-up. Bowel resection was defined as any resection of the small or small/large bowel and stricturoplasty, and excluded local surgery for perianal fistulising disease. Extraintestinal manifestations were defined to comprise joints (arthritis), skin (erythema nodosum and pyoderma) and eyes (iritis, episcleritis). In addition, complicated CD was defined as one or more of the following: stenosing or internal penetrating behaviour, perianal disease or bowel resection.
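The composite definition of complicated CD given above can be written as a simple rule. The following is a minimal sketch only: the `Patient` record and its field names are hypothetical and are not taken from the study database.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical per-patient flags recorded during follow-up.
    stenosing_b2: bool
    penetrating_b3: bool
    perianal: bool
    bowel_resection: bool


def is_complicated(p: Patient) -> bool:
    """Complicated CD: one or more of B2, B3, perianal disease or bowel resection."""
    return any([p.stenosing_b2, p.penetrating_b3, p.perianal, p.bowel_resection])
```

A patient with perianal disease alone is therefore classed as complicated, whereas a patient with purely inflammatory behaviour and no resection is not.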

DNA genotyping

Genotyping was performed using an Illumina custom GoldenGate assay. DNA samples that presented low genomic DNA concentrations were first amplified with the GenomiPhi DNA Amplification kit (GE Healthcare, Buckinghamshire, UK). BeadStudio software (Illumina Inc, San Diego, California, USA) was used to manage the genotyping data. Samples and SNPs were excluded according to the following criteria: samples with <95% call rate; single nucleotide polymorphisms (SNPs) with <95% call rate; and SNPs with poor amplification or poor genotype cloud clustering.
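As an illustration of the call-rate criteria, the sketch below filters a small genotype matrix at the 95% threshold. It is not the BeadStudio procedure: the data layout, the function names and the order of filtering (samples first, then SNPs) are assumptions made for this example, and the cluster-quality criterion is not modelled.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls (None marks a failed call)."""
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(genotypes, threshold=0.95):
    """Drop samples, then SNPs, with a call rate below the threshold.

    genotypes: dict mapping a sample ID to a list of calls, one per SNP.
    Returns (kept sample IDs, kept SNP column indices).
    """
    kept_samples = [s for s, calls in genotypes.items()
                    if call_rate(calls) >= threshold]
    if not kept_samples:
        return [], []
    n_snps = len(genotypes[kept_samples[0]])
    # SNP call rates are computed over the retained samples only (an assumption).
    kept_snps = [j for j in range(n_snps)
                 if call_rate([genotypes[s][j] for s in kept_samples]) >= threshold]
    return kept_samples, kept_snps
```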

A full list of all studied SNPs is given in table S1 in the online supplementary information. All convincingly (p<5×10-08) replicated CD risk loci, and nominally replicated CD risk loci (p<0.05) from the CD GWAS meta-analysis by Barrett et al were included in this study.7 Rs11175593, located near MUC19 and LRRK2, was replaced by rs11176553; and rs2076756, located in NOD2, was replaced by the three known CD risk variants: rs2066844 (p.Arg702Trp), rs2066845 (p.Gly908Arg) and rs2066847 (p.Leu1007fsX1008). Because of technical limitations, the NOD2 frameshift mutation p.Leu1007fsX1008 could not be genotyped using the GoldenGate platform. Instead, KASPar chemistry (KBioscience, Berlin, Germany), a competitive allele-specific PCR SNP genotyping system that uses fluorescence resonance energy transfer quencher cassette oligonucleotides, was used.

Statistical analysis

Descriptive statistics were calculated as percentages for categorical data, and means (SDs) or medians (IQRs) for continuous data. Departure from Hardy–Weinberg equilibrium for biallelic genetic markers was tested using a fast exact test, as described in Wigginton et al21 (see table S1 in online supplementary information). We tested for potential genetic stratification in our cohort by principal components analysis (PCA) of a total of 768 genotyped markers.22
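For readers unfamiliar with exact Hardy–Weinberg testing, the sketch below computes a two-sided exact p value for one biallelic marker. It enumerates every possible heterozygote count given the observed allele counts and sums the probabilities of those no more likely than the observed one. This is a plain enumeration written for illustration, not the optimised recursion of the cited method or the code used in the study, and the function name is an assumption.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact Hardy-Weinberg p value from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab      # copies of allele A
    n_b = 2 * n - n_a          # copies of allele B

    def log_prob(het):
        # Log-probability of `het` heterozygotes given n, n_a and n_b.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Possible heterozygote counts share the parity of the allele count.
    candidates = range(n_a % 2, min(n_a, n_b) + 1, 2)
    observed = exp(log_prob(n_ab))
    total = sum(p for p in (exp(log_prob(h)) for h in candidates)
                if p <= observed * (1 + 1e-9))   # tolerance for rounding
    return min(1.0, total)
```

A marker with 25, 50 and 25 individuals in the three genotype classes sits at the mode of the distribution and gives p=1, whereas a marker with 50, 0 and 50 gives a vanishingly small p value.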

Each individual phenotype–SNP association analysis was performed using a likelihood ratio test, based on logistic regression models considering both codominant and additive modes of inheritance. The best model was selected using the Akaike information criterion (AIC). We estimated the crude OR and 95% CI. These analyses were performed using the SNPassoc R package.23 For the NOD2 gene, we considered the variants separately (p.Arg702Trp, p.Gly908Arg, p.Leu1007fsX1008) and globally (no risk allele, one risk allele or at least two risk alleles for any of the three NOD2 variants). Association between basic phenotypical and clinical characteristics, and each of the clinical outcomes was tested using logistic regression models (likelihood ratio test). For the outcomes of disease behaviour, bowel resection and extraintestinal manifestations, the disease location variables were included as test variables. To avoid false-positive results due to multiple testing, we applied the Bonferroni correction method: we corrected for 48 tests (43 SNPs and 5 clinical characteristics) in the case of disease location outcomes and 51 tests (43 SNPs and 8 clinical characteristics) for the other outcomes. In addition, the robustness of individual findings was investigated by bootstrap analysis: the selection procedure was repeated on 200 bootstrap datasets generated from the original sample, and the number of times a particular association between a polymorphism and the phenotype was statistically significant (at different significance levels) was recorded.
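Two of the quantities described above can be illustrated with a short sketch: a crude OR with its 95% CI, and a Bonferroni-adjusted p value. The study obtained ORs from logistic regression models in SNPassoc; here a 2×2 carrier-by-outcome table with a Wald interval is swapped in as a simplified stand-in, and the function names and example counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude OR and Wald 95% CI from a 2x2 table.

    a: carriers with the outcome      b: carriers without the outcome
    c: non-carriers with the outcome  d: non-carriers without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts, for illustration only.
or_, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

With these definitions, an unadjusted p value of 0.001 remains below 0.05 after correction for 48 tests (0.048) but not after correction for 51 tests (0.051), which shows why the number of tests per outcome matters.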

Multivariate logistic regression models, including variables with p<0.20 in univariate analysis, were estimated to identify factors independently associated with outcome. Multivariate analyses were performed by a forward stepwise procedure, using p<0.05 and p>0.10 from the likelihood ratio test as enter and remove criteria respectively. The results from estimated models were expressed as OR (95% CI). Multivariate analyses were performed using R (stepAIC function in library MASS). Receiver operating characteristic (ROC) curves were plotted to represent a sensitivity/specificity pair corresponding to a particular decision threshold.
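The area under a ROC curve can be computed without plotting, as the probability that a randomly chosen patient with the outcome receives a higher model score than a randomly chosen patient without it. The sketch below uses this rank-based definition; it is an illustration with a hypothetical function name, not the routine used in the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve; labels are 1 (outcome) or 0 (no outcome).

    Ties between a case and a control count as half a concordant pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    concordant = sum((p > q) + 0.5 * (p == q) for p in cases for q in controls)
    return concordant / (len(cases) * len(controls))
```

An area of 0.5 corresponds to a model that discriminates no better than chance, and 1.0 to perfect discrimination.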

Time to the clinical event (years from diagnosis to the development of stenosing behaviour, fistulising behaviour, perianal disease and bowel resection) was evaluated for all genetic and clinical parameters using Kaplan–Meier estimates. Multivariate Cox regression models were fitted, including all variables with p values <0.05 in the Kaplan–Meier analysis.
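The Kaplan–Meier estimate itself is a running product over the distinct event times. A minimal sketch follows, assuming one follow-up time per patient and an indicator that is 1 when the event (for example, first bowel resection) was observed and 0 when the patient was censored. The function name and the toy data are illustrative only.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving   # events and censored patients leave the risk set
    return curve


# Toy data: years to event or censoring for five hypothetical patients.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

For the toy data the event-free probability falls to 0.8 at 2 years, 0.6 at 3 years and 0.3 at 5 years; the patient censored at 8 years contributes to the risk set but not to the drops.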

Finally, based on the significantly associated factors in the univariate analysis, we computed a genetic score for each clinical outcome by counting the total number of risk alleles across these variants. Kaplan–Meier estimations were used to analyse the association between the risk-allele score (grouped into different categories) and the probability of observing the specified phenotype. Survival curves were compared using the log-rank test.
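The risk-allele count described above can be sketched as follows. The SNP identifiers, the genotype encoding (two-letter strings) and the cutoff used to split patients into low and high groups are all assumptions for illustration; the study's actual score categories are not specified here.

```python
def genetic_risk_score(genotype, risk_alleles):
    """Total number of risk alleles carried across the selected SNPs.

    genotype: dict mapping SNP ID to a two-letter genotype, e.g. "AG".
    risk_alleles: dict mapping SNP ID to its risk allele, e.g. "G".
    """
    return sum(genotype[snp].count(allele) for snp, allele in risk_alleles.items())


def score_group(score, cutoff):
    """Assign a patient to the 'high' or 'low' group (cutoff is hypothetical)."""
    return "high" if score >= cutoff else "low"


# Hypothetical patient and SNP panel.
patient = {"snp1": "AG", "snp2": "CC", "snp3": "TT"}
panel = {"snp1": "G", "snp2": "C", "snp3": "A"}
score = genetic_risk_score(patient, panel)
```

Here the patient carries one risk allele at the first SNP, two at the second and none at the third, giving a score of 3.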


Results

Patient characteristics and cohort homogeneity

Patient characteristics are displayed in table 1. The low proportion of patients with inflammatory disease behaviour and the high proportion of patients with previous bowel resection are attributable to the fact that only patients with at least 10 years of follow-up (median follow-up 18 years) were included, and that patients were recruited at tertiary care referral hospitals. Homogeneity in patient characteristics among the eight recruiting European hospitals was demonstrated by the lack of significant differences among sites in the PCA.

Table 1

Patient characteristics (total number of patients N=1528)

Association of genetic and clinical factors with clinical outcomes (univariate)

Based on the PCA, there was no genetic population stratification that would otherwise interfere with data analysis (data not shown). The results of the univariate analysis, describing the degree of association of the genetic markers and clinical factors studied with each of the outcomes of interest, are reported in detail in tables S2–S19 in the online supplementary information.

Several SNPs and clinical factors were convincingly associated (Bonferroni p<0.05) with ileal, colonic or ileocolonic disease location; stenosing or fistulising disease behaviour; perianal disease; bowel resection or complicated CD behaviour; as displayed in figures 1 and 2. The carriage of the NOD2 gene variants, considered either globally, as the presence or absence of any of the three NOD2 variants studied, or individually, as the carriage of the p.Leu1007fsX1008 or the p.Arg702Trp variants alone, showed the strongest association with most relevant CD clinical outcomes, except for perianal disease, in the univariate analysis (figures 1 and 2). The strongest clinical characteristic associated with most relevant CD clinical outcomes was early (within 3 years after diagnosis) immunomodulator use.

Figure 1

p Values of univariate association between single nucleotide polymorphisms (SNPs) and five different outcomes (with Bonferroni corrected SNPs) are shown. p Values are computed using the likelihood ratio test and the best genetic model as specified in the Methods section. Access the article online to view this figure in colour.

Figure 2

OR of association and 95% CI for those single nucleotide polymorphisms that passed Bonferroni correction for each outcome. NOD2 overall: no risk allele, one risk allele or at least two risk alleles for any of the three NOD2 variants; 12: heterozygous, 22: homozygous for minor allele. Access the article online to view this figure in colour.

Although we acknowledge that formal validation of our results will require their replication in an independent cohort, we next undertook an internal validation by means of bootstrap techniques (see Methods for details). As shown in table S20 in the online supplementary information, most of the associations between clinical or genetic factors and CD outcomes of interest described in the univariate analysis were successfully confirmed by >75% of the bootstrap replicates (>150 out of 200), when the statistical significance was set at p values of 0.05 and 0.01 respectively. In the case of the NOD2 gene variants and immunomodulator use within the first 3 years after diagnosis, a very high degree of bootstrap replication was obtained, even for a p value of 1.00×10-04.
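The bootstrap replication rate reported here can be illustrated with a simplified sketch: patients are resampled with replacement 200 times, and the proportion of resamples in which the association reaches a given significance level is recorded. The study used likelihood ratio tests from logistic regression; a pooled two-proportion z test is swapped in below to keep the example self-contained, and the function names, seed and example data are assumptions.

```python
import random
from math import erfc, sqrt


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p value of a pooled two-proportion z test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return erfc(abs(z) / sqrt(2))


def bootstrap_replication(records, alpha=0.05, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which the association has p < alpha.

    records: list of (carrier, outcome) pairs, each coded 0 or 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(records) for _ in records]
        n_carriers = sum(c for c, _ in sample)
        n_others = len(sample) - n_carriers
        if n_carriers == 0 or n_others == 0:
            continue   # degenerate resample, counted as not significant
        x_carriers = sum(o for c, o in sample if c == 1)
        x_others = sum(o for c, o in sample if c == 0)
        if two_proportion_p(x_carriers, n_carriers, x_others, n_others) < alpha:
            hits += 1
    return hits / n_boot


# Hypothetical cohort: outcome in 80/100 carriers and 20/100 non-carriers.
strong = [(1, 1)] * 80 + [(1, 0)] * 20 + [(0, 1)] * 20 + [(0, 0)] * 80
rate = bootstrap_replication(strong)
```

An association as strong as the hypothetical one above is significant in essentially every resample, whereas a null association would be expected to reach p<0.05 in only a small fraction of them.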

Association of genetic and clinical factors with clinical outcomes (multivariate)

Disease location outcomes

The multivariate analysis confirmed NOD2, SBNO2 and no immunomodulator use within 3 years after diagnosis (table 2) as independent predictive factors for ileal disease location. The presence of any NOD2 variant allele was also the strongest genetic factor associated with colonic involvement, but in this case showing a protective effect (table 2). In addition to NOD2 variants, the multivariate analysis confirmed the independent association of other genetic factors, including ZPBP and JAK2. Smoking was negatively associated, and an older age at diagnosis increased the probability of a colonic disease location (table 2). For ileocolonic disease location, IL23R, PTGER4 gene region, JAK2, age at diagnosis, and early immunomodulator use were confirmed to be independent predictive factors (table 2). Of note, when considering any ileal disease location (L1 or L3), NOD2 overall (p=6.96×10-04, OR=1.75 (95% CI 1.28–2.44)), smoking habit (p=1.52×10-03, OR=1.85 (95% CI 1.27–2.71)), ZPBP (p=6.40×10-03, OR=0.66 (95% CI 0.49–0.89)), age at diagnosis (p=6.85×10-03, OR=0.98 (95% CI 0.96–0.99)) and JAK2 (p=1.13×10-02, OR=1.44 (95% CI 1.09–1.91)) were found to be independent predictors. Absence of NOD2 mutations (p=6.97×10-04, OR=0.57 (95% CI 0.41–0.79)), early immunomodulator use (p=2.96×10-03, OR=3.12 (95% CI 1.55–6.99)), IL23R (p=2.19×10-02, OR=2.20 (95% CI 1.17–4.57)), LOC441108 (p=2.34×10-02, OR=0.70 (95% CI 0.51–0.95)) and SBNO2 (p=3.43×10-04, OR=1.52 (95% CI 1.04–2.27)) were independently associated with any colonic disease location (L2 or L3).

Table 2

Multivariate analysis—disease location

Disease behaviour outcomes

Immunomodulator use within the first 3 years after diagnosis was the strongest independent factor associated with stenosing CD phenotype (B2) in the multivariate analysis, with a p value of 1.48×10-06, and its use being protective. Carriage of NOD2 variants (OR=2.11 for at least two risk alleles in any of the NOD2 variants), ileocolonic disease location and JAK2 were also independently associated with stenosing behaviour (table 3). In the case of penetrating disease behaviour (B3), three genetic factors (PRDM1, NOD2 variants and IL23R), male gender and ileocolonic disease location were independently associated (table 3). The strongest association with perianal disease in multivariate analysis was found with age at diagnosis (p=5.36×10-03). Other associated clinical characteristics included the use of an anti-tumour necrosis factor (TNF) agent within the first 3 years after diagnosis, ileocolonic disease location and smoking habit. The genetic factor PUS10 also had an independent effect (table 3). Note that the association of perianal disease with anti-TNF is most likely related to a prescription bias, resulting in a higher percentage of patients with perianal disease receiving anti-TNF agents.

Table 3

Multivariate analysis—disease behaviour

Other clinical outcomes

In line with the observed associations between carriage of NOD2 variants and stenosing disease behaviour, this was also the genetic factor with the strongest influence on need for bowel resection (table 4). The use of immunomodulators within the first 3 years after diagnosis, and the TNFSF15 and C13ORF31 SNPs were also found to be independent predictive factors for surgery. Smoking habit was borderline significant (p=5.29×10-02, OR=1.37 (95% CI 1.00–1.88)). Since it is unlikely that any therapy is preventive for surgeries performed around the time of diagnosis, and to further explore the effect of early aggressive therapy, we reanalysed surgical outcome excluding all patients in whom the first operation was performed within 12 months of diagnosis (n=244). Carriage of any NOD2 variant and the use of immunomodulators within 3 years after diagnosis were still the most important drivers for surgical outcome, but TNFSF15 and C13ORF31 were no longer independently associated (subgroup analysis in table 4).

Table 4

Multivariate analysis—other clinical outcomes

As expected, considering the key role we showed for the NOD2 gene variants on stenosing and penetrating behaviour and on bowel resection, NOD2 was also found to be the strongest factor independently associated with complicated CD clinical course (table 4). Four other genetic factors—LOC441108, SLC22A23, PRDM1 and TAB2/MAP3K7IP2—and immunosuppressant use within 3 years after diagnosis and ileocolonic disease location were independently associated with complicated CD (table 4). Finally, four genetic factors, PTPN22, ICOSLG, SLC22A23 and PUS10, were found to be independent predictive factors for the appearance of extraintestinal manifestations in patients with CD (table 4). It should be noted that the risk factors could be different for different types of extraintestinal manifestations (arthritis/arthropathy vs skin vs ocular manifestations).

As detailed in the Methods section, ROC curves were plotted for each of the models obtained using logistic regression. The area under the ROC curve ranged between 61.1% (penetrating behaviour) and 72.1% (complicated disease behaviour), and the models thus had only low predictive capacity (see figures S1–S9 in online supplementary information).

Association of genetic and clinical factors with time to develop CD-related complications

While CD location has been recognised as a relatively stable characteristic over time, frequent changes in CD behaviour have been described after disease diagnosis, due to development of bowel stenosis, internal fistula and perianal disease.20 Similarly, surgery is also expected to occur over time in a majority of patients with CD at some point in life.20 For that reason, we aimed at investigating whether the included genetic and clinical factors are also associated with the time to develop each of these outcomes. Detailed results of the univariate Kaplan–Meier log-rank analysis are shown in table S21 in the online supplementary information. As in the univariate logistic regression analysis, carriage of the NOD2 gene variants, or of the p.Leu1007fsX1008 variant alone, early (within 3 years after diagnosis) immunomodulator use and disease location (ileal, colonic or ileocolonic) showed the strongest associations with time to stenosing disease behaviour and need for surgery (figure 3A,B). In addition, smoking habit was significantly associated with a shorter time to need surgery. We also observed a strong association between age at diagnosis and a shorter time to a complicated disease course (including stenosing or fistulising disease behaviour, perianal disease and need for surgery). This association can probably be attributed to an association between disease duration and a complicated disease course. Indeed, the mean disease duration for individuals without complications is 16.84 years, while the mean disease duration for those with complications is 21.44 years (p=2.2×10-16).

Figure 3

Kaplan–Meier curve of developing stenosing disease behaviour (A) or needing surgery (B) for early immunomodulator use (within 3 years after diagnosis). Access the article online to view this figure in colour.

Multivariate Cox regression analysis confirmed the carriage of any NOD2 variant, and the JAK2 variant as independently associated with a shorter time to develop a bowel stenosis, while immunomodulator use within the first 3 years after diagnosis was independently associated with a longer time (table 5). In the case of fistulising disease behaviour, male gender and the carriage of a variant allele for ATG16L1 and PRDM1 were associated with an earlier appearance of internal penetrating disease; carriage of a variant allele for IL23R was associated with a later appearance (table 5). Ileocolonic disease location, the LOC441108 variant and a younger age at diagnosis all led to a significantly shorter time to onset of perianal disease. The use of an anti-TNF agent within the first 3 years after diagnosis was also significantly associated with a shorter time to onset of perianal disease, but this is most likely due to a treatment bias, with patients with perianal disease receiving an anti-TNF agent faster than patients without, and not because of a causal relationship between anti-TNF therapy and perianal disease. Any NOD2 variant, TNFSF15 and smoking behaviour were independently associated with a shorter time to surgery, whereas use of immunomodulators within 3 years after diagnosis and C13ORF31 were associated with a longer time to surgery (table 5). Of the factors independently associated with complicated disease behaviour, carriage of NOD2 variants, LOC441108, SLC22A23 and TAB2/MAP3K7IP2 were also independently associated with a shorter time to develop complicated CD (table 5).

Table 5

Multivariate Cox regression analysis

Development of a genetic risk score

Considering all the genetic factors that were significant in the univariate analysis, we next calculated a combined score for each patient (as detailed in the Methods section). Interestingly, a large difference in probability to develop a fistulising behaviour (p=9.64×10-04, figure 4A) and need for surgery (p=7.12×10-03, figure 4B) was observed between patients with a low and a high score, with a respective HR of 1.43 (95% CI 1.16–1.79) and 1.35 (95% CI 1.08–1.68) for the high versus low score outcomes. A more modest, but still statistically significant difference was observed according to this score in the probability to develop stenosing (p=3.01×10-02, HR=1.29 (95% CI 1.02–1.62), figure 4C) and complicated CD (p=1.22×10-02, HR=0.75 (95% CI 0.60–0.94), figure 4D). By contrast, there was no influence on the probability to develop perianal disease (HR=1.09 (95% CI 0.88–1.36), figure 4E). To test if these models would have additional value beyond the importance of NOD2, we recalculated the combined genetic risk score excluding NOD2 when applicable. The combined score of IL23R, LOC441108 and PRDM1 significantly contributes to penetrating disease behaviour (p=5.53×10-05, HR=1.61 (95% CI 1.28–2.04)) and LOC441108, SLC22A23 and TAB2 (MAP3K7IP2) to a complicated disease behaviour (p=3.47×10-03, HR=1.32 (95% CI 1.10–1.59)). There is no significant contribution of the combined score of JAK2 and ATG16L1 to stenosing disease behaviour (p=8.19×10-01, HR=0.98 (95% CI 0.80–1.19)), or of IRGM, TNFSF15 and C13ORF31 to need for bowel surgery (p=7.59×10-02, HR=1.19 (95% CI 0.98–1.45)). The other genes thus also confer risk to some of the clinical outcomes after being combined, and the models have additional value beyond the importance of NOD2.

Figure 4

Probability distribution function of developing each outcome depending on the genetic risk score. The genetic risk score is computed by counting the total number of risk alleles across the significantly associated factors in the univariate analysis for each outcome. Penetrating disease behaviour: IL23R, LOC441108, PRDM1, NOD2 (A). Bowel resection: IRGM, TNFSF15, C13ORF31, NOD2 (B). Stenosing disease behaviour: NOD2, JAK2, ATG16L1 (C). Complicated disease behaviour: LOC441108, SLC22A23, TAB2 (MAP3K7IP2), NOD2 (D). Perianal disease: LOC441108, LOC449915, FGFR1OP (E). HR (high vs low)=HR for the high versus low score outcomes. Access the article online to view this figure in colour.

Discussion


There is increasing evidence that early intervention with biological therapies has rapid and long-lasting benefits, including steroid sparing, and reduced number of hospitalisations and surgeries. The ability to identify patients at high risk for a disabling disease course at an early stage is therefore of great significance. Previous studies have shown that prediction models using clinical and serological markers have diagnostic and prognostic specificity and sensitivity too low to be useful in clinical practice. Several companies (23andMe, deCODEme, etc) have been promoting whole-genome sequencing of individuals to predict risk for diseases. Unless good-quality studies show that genetic variants can predict diseases or disease courses, these efforts may be premature and carry a risk of overinterpretation and/or alarming patients. In this study, we aimed at linking well established CD-associated SNPs and clinical characteristics with the clinical course of CD, and from this developing a predictive model to identify patients at high risk for complicated disease.

NOD2, located on chromosome 16q12, was the first and still is the strongest identified CD susceptibility gene. Since its discovery in 2001, many studies have investigated its association with disease subphenotypes. Associations have been shown with ileal involvement, and development of stricturing or non-perianal fistulising complications.16 ,17 ,24 ,25 A recent systematic review and meta-analysis of published literature on association of NOD2 mutations with CD disease behaviour showed an increased risk for complicated disease and need for surgery when NOD2 mutations are present.26 They also found that while the predictive power associated with a single NOD2 mutation is weak, the presence of two NOD2 mutations had 98% specificity for complicated disease. The sensitivity however remained poor. In this study of patients with CD followed for at least 10 years, we confirm these findings: presence of NOD2 variant alleles increased the risk for ileal disease, while being protective for colonic disease. NOD2 variants were also associated with increased risk for stenosing and penetrating disease behaviour, and need for surgery, and as such were also strongly associated with complicated disease course. It is clear from figure 2 that the p.Leu1007fsX1008 homozygous group (genotype ‘22’) has a particularly high rate of stenosis and bowel resection (OR=3.89 and 3.35 respectively). The meta-analysis mentioned above was limited by the fact that most studies included did not differentiate between p.Leu1007fsX1008 heterozygotes and p.Leu1007fsX1008 homozygotes. 
A large study by Seiderer et al demonstrated a strongly increased risk of ileal stenoses and of surgery in p.Leu1007fsX1008 homozygotes (n=19).27 The same group confirmed these findings in a prospective study.28 Jürgens et al found that the combined presence of fistulas and homozygosity for NOD2 mutations was a particularly strong predictor of intestinal stenosis.29 In our subgroup of 38 p.Leu1007fsX1008 homozygous patients, there was a strong association with ileal disease location (p=0.03, OR=2.01 (95% CI 1.04–3.86)), stenosing disease behaviour (p<0.001, OR=3.59 (95% CI 1.69–7.63)), need for surgery (p=0.003, OR=3.14 (95% CI 1.43–6.89)) and complicated disease behaviour (p=0.02, OR=4.87 (95% CI 1.17–20.32)) (data not shown). In clinical practice, although the prevalence of p.Leu1007fsX1008 homozygosity is less than 3%, ORs in the order of 3–4 would be meaningful.
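As a reading aid for ORs and CIs such as those quoted above, the following minimal sketch recovers an approximate log-scale standard error and two-sided p value from a reported OR and its 95% CI. It assumes the intervals are Wald-type (symmetric on the log scale), which the text does not state. The function name `wald_summary` is illustrative and is not part of the study's analysis.

```python
import math

def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover log-scale SE, z and two-sided p from an OR and its 95% CI.

    Assumes a Wald interval of the form exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail probability
    midpoint = math.sqrt(ci_low * ci_high)  # geometric mean of the CI limits
    return {"se": se, "z": z, "p": p, "midpoint": midpoint}

# ORs and 95% CIs as reported in the text for p.Leu1007fsX1008 homozygotes
reported = {
    "ileal location": (2.01, 1.04, 3.86),
    "stenosing behaviour": (3.59, 1.69, 7.63),
    "surgery": (3.14, 1.43, 6.89),
    "complicated behaviour": (4.87, 1.17, 20.32),
}

for outcome, (or_value, low, high) in reported.items():
    s = wald_summary(or_value, low, high)
    print(f"{outcome}: midpoint={s['midpoint']:.2f}, SE={s['se']:.2f}, p~{s['p']:.4f}")
```

If the Wald assumption holds, each reported OR should lie close to the geometric mean of its CI limits. Because the published limits are rounded, the recovered p values are approximate and need not match the reported ones exactly.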

A second significant finding was for JAK2, which encodes an intracellular tyrosine kinase that transduces cytokine-mediated signals via the JAK–STAT pathway. JAK2 was associated with increased risk for ileal involvement (L1+L3) and stenosing disease behaviour. One mechanism by which JAK2 contributes to CD pathogenesis could be by altering intestinal permeability. Prager et al showed that patients carrying the C risk allele of JAK2 rs10758669 significantly more often displayed increased permeability than patients without the C allele.30 Together with NOD2, JAK2 is a plausible candidate contributing to transmural inflammation and stricture formation. Some other gene–phenotype associations were also found. Of note, although only nominally significant in univariate analyses, the CD risk allele (G allele) in ATG16L1, one of the two autophagy genes that have been strongly and reproducibly associated with CD susceptibility, was protective against colonic disease location while increasing the chance of ileal disease. This is in line with the findings of Prescott et al, who demonstrated a significantly higher frequency of the G allele of ATG16L1 in patients with ileal disease with or without colonic involvement (61.7%) than in those with pure colonic disease (52.2%).31 Given the strong association of NOD2 with ileal disease location, it also fits the recent observations that NOD2 and ATG16L1 physically interact for the (auto)phagocytosis of bacterial antigens.32 ,33

An important finding in this study is the protective effect of immunomodulator use within 3 years after diagnosis on the rate of bowel resection and stenosing disease behaviour (see figure 3A,B). This is in line with earlier data published by Ramadas et al, which show that patients who started thiopurines earlier have reduced rates of surgery.34 In an inception cohort from Hungary, ‘very early’ (<18 months from diagnosis) and ‘early’ (<3 years) azathioprine use was associated with decreased risk of surgery in a propensity score analysis.35 We also included gender, age at CD diagnosis and smoking as covariates in the logistic regression models. The latter two have been reported to influence the clinical course of CD.36 ,37 In this study, male gender was independently associated with increased risk for penetrating disease behaviour, and smoking with increased risk for ileal involvement (L1 or L3) and perianal disease, and with protection against colonic disease location. Lower age at diagnosis was associated with increased risk for ileal involvement (L1 or L3), ileocolonic disease location and perianal disease. Overall, NOD2 variants, JAK2, early immunomodulator use and smoking habit were the most relevant factors influencing disease course.

With an area under the ROC curve ranging from 61% for penetrating disease behaviour to 72% for complicated disease behaviour, the logistic regression models had only poor predictive ability for specific disease phenotypes. This does not mean that the identified loci are unimportant in defining CD clinical course, but they are relatively poor predictors. There could be several reasons for this. First, the loci included in this study were mostly found through GWAS, implying that they are strong susceptibility SNPs for CD overall as opposed to being associated with any of the subphenotypes. Recent work from the international IBD genetics consortium underscores the hypothesis that CD involves disease susceptibility genes/loci on the one hand, and disease-modifying genes/loci on the other.38 Second, many of the GWAS-identified loci have not yet been fine mapped to pinpoint the causal susceptibility variants. This leads to reduced power to build predictive models, and also makes it more difficult to speculate on the possible biological meaning of the findings. Third, the CD-associated genetic loci all have modest relative risk scores,7 ,18 probably necessitating even larger sample sizes than in this study. Disease expression is furthermore dependent on a complex interaction between many genetic, environmental (smoking, diet, exposure to medication etc), microbial and clinical factors. Shifts in the composition of resident bacteria have been postulated to drive the chronic inflammation seen in CD. Whether alterations in the intestinal microbial composition are cause or consequence of disease remains to be determined.39 ,40 In either case, the shifts in microbiota composition may be important factors in disease maintenance and severity. We did include some clinical factors as covariates in this study, but owing to its retrospective nature, this was not an exhaustive list. Reliable disease phenotype prediction will therefore require multifactorial analyses in prospective study setups.
Finally, although speculative, it is possible that the currently used classification of patients with CD based on extent and location of disease is inadequate, at least in the context of genetic markers. We previously showed that, when reclassifying patients based solely on genetic markers, and subsequent correlation of the obtained subgroups with the classically used clinical subphenotypes, the genetic-based subgroups could not be explained adequately by the known clinical subphenotypes.41
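To make the area under the ROC curve quoted in this discussion more concrete, the sketch below computes it as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it (the Mann-Whitney formulation). On this scale 0.5 corresponds to chance and 1.0 to perfect discrimination. The scores and labels are hypothetical and are not data from the study, and the function name `auc` is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted risks; labels: 1 = outcome present, 0 = outcome absent.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks and outcomes, for illustration only
scores = [0.9, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 1, 0, 0, 1]
print(f"AUC = {auc(scores, labels):.2f}")
```

Read this way, an area of 61% to 72% means that the model ranks a patient with the outcome above a patient without it in roughly six to seven of every ten such pairs.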

Although, based on the results of our study, the genetic profile of an individual seems unlikely to allow the development of a diagnostic test to predict clinical course of the disease, genetic testing could still be clinically useful. The distribution of the number of risk alleles differs, with patients carrying more risk alleles having a higher prevalence of surgery, fistulising or stenosing disease behaviour (figure 4). This is in line with what was shown by Weersma et al, who found that an increase in the number of risk alleles was associated with an increased risk for CD and with a more severe disease course.42
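The relationship between risk-allele burden and outcome prevalence can be sketched as a simple tabulation. The example below assumes an unweighted count (0, 1 or 2 risk-allele copies per SNP, summed over SNPs); the text does not specify whether the count behind figure 4 was weighted. The patients and function names are hypothetical.

```python
from collections import defaultdict

def risk_allele_count(genotypes):
    """Sum risk alleles over SNPs; each genotype is coded as 0, 1 or 2 copies."""
    return sum(genotypes)

def prevalence_by_count(patients):
    """Map risk-allele count -> fraction of patients with the outcome."""
    totals = defaultdict(int)
    events = defaultdict(int)
    for genotypes, had_outcome in patients:
        k = risk_allele_count(genotypes)
        totals[k] += 1
        events[k] += int(had_outcome)
    return {k: events[k] / totals[k] for k in sorted(totals)}

# Hypothetical patients: (risk-allele copies at three SNPs, surgery yes/no)
patients = [
    ((0, 0, 1), False),
    ((0, 1, 0), False),
    ((1, 0, 0), True),
    ((1, 1, 1), False),
    ((2, 1, 0), True),
    ((2, 2, 1), True),
]

for count, prevalence in prevalence_by_count(patients).items():
    print(f"{count} risk alleles: outcome prevalence {prevalence:.2f}")
```

An unweighted count treats every locus as contributing equally. A weighted score, for example one using per-SNP log odds ratios, is a common alternative but requires effect estimates that this sketch does not assume.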

Our study has several strong points. It comprises a large multicentre cohort with homogeneity in patient characteristics and included patients with a follow-up of at least 10 years. Different statistical methods were used that independently proved the association of the clinical and genetic factors to the specified disease outcomes. We acknowledge that our study also has some limitations. As mentioned above, the included SNPs as identified through GWAS might not be the causal SNPs, possibly leading to a loss in power to build good prediction models. The proportion of heritability accounted for by the currently known susceptibility loci is about 20–25%,7 ,18 and might explain too little of the genetic part of the disease risk to be of clinical relevance. The main limitation of the study is its retrospective design. The results from this study should therefore be complemented with results from prospective studies addressing genetic, environmental and clinical factors as predictors for CD clinical course.

In conclusion, while some genes confer susceptibility to CD overall, we confirm that others probably predispose people to certain subphenotypes of CD. Except for NOD2 and early immunomodulator use, most of these loci, together with basic phenotypical characteristics, are only poor predictors of CD clinical course. Still, even without NOD2, the other genetic factors, when combined, do confer risk for specific clinical outcomes. Prospective study designs addressing clinical, genetic, microbial and environmental factors, and the complex interaction among them, are needed to unravel their contribution to specific disease expression. Once identified, these factors need to be implemented in prediction models to define subgroups of patients that would benefit from targeted therapies at an early stage.


The authors thank A A van Bodegraven (MD, PhD, Department of Gastroenterology, VU University Medical Center, Amsterdam, The Netherlands), V Ballet (Department of Gastroenterology, UZ Leuven, Leuven, Belgium), P Naccarato and G Fiorino (Department of Internal Medicine, IRCCS Istituto Clinico Humanitas, Rozzano, Milan, Italy), and L Spina (MD, University of Milan, IRCCS Policlinico San Donato, Milan, Italy) for their valuable contributions to the acquisition of patient data and material, and for their feedback.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors All authors are justifiably credited with authorship, according to authorship criteria. In detail, study concept and design: IC, JRG, MA, JBAC, MV, SD, RAV, JP, SAP, ML, DPJ, SS, SV, MSa; acquisition of data: IC, CF, AF, DM, MB, JBAC, MV, JB, SD, RAV, JP, SAP, ML, DPJ, SS, SV, MSa; analysis and interpretation of data: IC, JRG, CF, MA, DA, EA, MS, SV; drafting of the manuscript: IC, MS; critical revision for important intellectual content: JRG, JBAC, MA, JP, DJ, SV; technical and material support: MA, MSz, DA; final approval given: IC, JRG, CF, AF, DM, MB, JBAC, MV, MA, MSz, JB, DA, EA, JSD, RAV, JP, ASP, ML, DJ, SS, SV, MSa.

  • Funding This study was supported by the European Commission through the Sixth Framework Programme (FP6) (grant number LSHB-CT-2006–037319). I Cleynen is a postdoctoral fellow and S Vermeire is a clinical researcher of the Fund for Scientific Research (FWO), Flanders, Belgium.

  • Competing interests None.

  • Ethics approval The ethical boards of each separate recruiting institution approved the study.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Data are available on request from the corresponding author.
