Objective In the past two decades, approximately 1000 reports have been published regarding associations between genetic variants in candidate genes and risk of colorectal cancer (CRC). Study results are inconsistent. We aim to provide a synopsis of the current understanding of genetic factors for CRC risk through systematically evaluating results from previous studies.
Design We searched PubMed and Google Scholar to identify papers that investigated associations between genetic variants and CRC risk and published through 25 December 2012. With data from 950 papers, we conducted 910 meta-analyses for 267 genetic variants in 150 candidate genes with at least three data sources. We used Venice criteria and false-positive report probability tests to grade levels of cumulative epidemiological evidence of significant associations with CRC risk.
Results Sixty-two variants in 50 candidate genes showed a nominally significant association with CRC risk (p<0.05). Cumulative epidemiological evidence for a significant association with CRC risk was graded strong for eight variants in five genes (adenomatous polyposis coli (APC), CHEK2, DNMT3B, MLH1 and MUTYH), moderate for two variants in two genes (GSTM1 and TERT), and weak for 52 variants in 45 genes. Additionally, 40 variants in 33 genes showed convincing evidence of no association with CRC risk in meta-analyses including at least 5000 cases and 5000 controls.
Conclusions Approximately 4% of genetic variants evaluated to date in candidate-gene association studies showed moderate to strong cumulative epidemiological evidence of an association with CRC risk. These genetic variants, if confirmed, may explain approximately 5% of familial CRC risk.
- Colorectal Cancer
Significance of this study
What is already known about this subject?
Colorectal cancer (CRC) is one of the most commonly diagnosed cancers in the world.
Approximately 35% of CRC risk could be attributable to inheritable factors.
Many studies have been conducted to evaluate associations between genetic variants in candidate genes and risk of CRC over the past two decades—with inconsistent results.
What are the new findings?
This study is the largest, most comprehensive assessment of the literature to date regarding genetic association studies in CRC risk.
Of the 267 variants evaluated, 62 variants in 50 candidate genes showed a statistically significant association with CRC risk.
Eight variants in five genes showed strong cumulative evidence of association with CRC risk, and two variants in two genes showed moderate evidence.
This study provides clues for designing future studies to further investigate genetic risk factors for CRC.
How might it impact on clinical practice in the foreseeable future?
Genetic risk variants may be used to identify high-risk individuals for CRC screening and prevention.
Colorectal cancer (CRC) is the third-most common cancer, and the second leading cause of cancer deaths worldwide.1 Genetic factors play an important role in CRC development.2–6 High-penetrance germline mutations in the adenomatous polyposis coli (APC), MUTYH, SMAD4, BMPR1A, STK11, and mismatch repair genes have been identified to account for about 6% of CRC cases (table 1).6–13 Since 2007, common genetic variants in approximately 21 loci have been identified through genome-wide association studies (GWAS) (table 2).14–24 GWAS-identified variants, however, are associated with weak to moderately elevated risk of CRC, and explain approximately 8% of the familial risk of CRC.20 ,21
In addition to GWAS, approximately 1000 papers have been published over the past 25 years investigating genetic variants in candidate genes in relation to CRC risk. Because of the limitation of single nucleotide polymorphism (SNP) arrays used in GWAS, many genetic variants evaluated in candidate gene association studies have not been adequately investigated in GWAS. Results from previous candidate gene studies have been inconsistent and are difficult to interpret. Most findings from candidate gene association studies cannot be replicated. Furthermore, sample size from most previous candidate gene association studies was small, so these studies often do not have adequate power to detect a true association. Meta-analysis is a useful tool to systematically evaluate available results published to date to assess evidence for a true association. By pooling data from multiple studies, meta-analysis can increase statistical power and evaluate consistency of association, a major criterion for determining causality. Recently, an interim guideline, named Venice criteria, has been used to systematically grade the cumulative evidence of genetic associations.25 ,26 Systematic field synopses and meta-analyses have been used to evaluate the association of genetic variations in candidate genes with several diseases, including Alzheimer's disease,27 schizophrenia,28 breast cancer,29 cutaneous melanoma,30 and Parkinson's disease.31 Herein, we sought to systematically collect and comprehensively evaluate all candidate-gene association studies of CRC risk, perform meta-analyses for variants with at least three independent datasets, and provide a systematic synopsis of our current understanding of the genetic basis of CRC risk.
Search strategy and selection criteria
Literature searches were conducted through a two-stage strategy (figure 1). In Stage 1, we searched the PubMed database using key terms ‘(colorectal cancer OR colon cancer OR rectal cancer) AND association’ before 1 October 2010. This search yielded 8443 potentially relevant articles which were screened for eligibility by title, abstract, or full text as necessary—428 reports, which included 1036 potential candidate genes, then met eligibility criteria. In Stage 2, conducted 1 October 2010 through 25 December 2012, we used four supplementary approaches to query PubMed and Google Scholar: (1) monthly database queries for ‘CRC’ and the 1036 gene names identified in Stage 1, such as ‘MTHFR’; (2) monthly queries using ‘CRC OR colon cancer OR rectum cancer’; (3) searching references and related articles of all gathered papers and (4) checking previously published meta-analyses and reviews. These four searches identified 48 521 additional reports, of which 522 met our inclusion criteria, adding genetic variants in 342 additional candidate genes. In Stages 1 and 2 combined, we screened a total of 56 964 articles, identifying 945 which reported 3603 variants in 1378 independent candidate genes which met our criteria for further analysis.
Studies were eligible for inclusion in this meta-analysis if they met the following criteria: (1) data were published in a peer-reviewed journal in English; (2) the study used a case-control, cohort, or a cross-sectional design in human beings; (3) the study provided sufficient information for the genotypic or allelic distribution of individual variants for both CRC cases and controls, and (4) CRC cases were diagnosed by pathological and/or histological examination. We did not include in the meta-analyses the following two groups of variants: (1) high-penetrance germline mutations in known CRC susceptibility genes, and (2) risk variants identified and confirmed in recent GWAS (table 2). When multiple publications reported on the same or overlapping data, we used the most informative or most recent publication. Only data from original published papers were included in the present analysis. All variants, regardless of their minor allele frequency (MAF), were considered for meta-analyses when genotype counts or allelic counts were provided in the original studies.
Data extraction and management
All data were extracted by two authors (XM and BZ), and disagreement was resolved by discussion. We recorded first author, year of publication, study name, geographic location of study, ethnicity, PubMed identification number, study design, sample size, mean ages of cases and controls, sample source, genes, variants, major and minor alleles, genotype counts or allelic counts for cases and controls, and Hardy–Weinberg equilibrium (HWE) in controls. Ethnicity was classified as African descendants, Asian (East Asian descent), Caucasian (European descent), or Other (including mixed), based on ethnicity of at least 80% of the study population.32 If ethnicity was not reported, we considered ethnicity of the source population where the study was conducted.32 Finally, if a report included several sources or study populations, data were extracted separately.
Statistical analysis and evaluation of cumulative evidence
Statistical analyses were performed by STATA, V.11.0. All tests were two-sided, and p<0.05 was considered statistically significant unless otherwise stated.
Summary ORs with 95% CIs for alleles and genotypes were used to assess the strength of associations between genetic variants and CRC risk by the random-effects method.33 Genotype counts or allelic counts for cases and controls from each original study were used to estimate summary ORs. We did not use adjusted ORs to estimate summary ORs since inconsistent covariates were used for adjustment in original studies included in this meta-analysis. In the primary analyses, we evaluated common variants (MAF≥0.05) using the additive model and rare variants (MAF<0.05) using the dominant model. For some common variants, a few original studies did not provide sufficient data for analyses with the additive model, and thus the dominant or recessive model was applied in the primary analyses. For some specific variants, we used the conventional comparisons of the original studies in the primary analyses, such as GSTM1 ‘Present/Null’, NAT2 phenotype (predicted by genetic variants) and MUTYH rs36053993. We also conducted subgroup analyses by ethnicity. Dominant and recessive models were also used to assess associations between genetic variants and CRC risk, if available. Meta-analyses were performed only for variants with at least three independent datasets. Because major and minor alleles can be reversed in populations of different ethnicities, averaged MAFs across studies might be greater than 50%. When this occurred, the minor allele among Caucasian populations was used as the minor allele in all analyses. For genetic variants other than SNPs, the less prevalent variant or trait was evaluated for associated effects unless otherwise stated.
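As an illustration of how per-study allele counts can be pooled into a summary OR, the following is a minimal sketch and not the STATA procedure used in this study. It assumes the DerSimonian-Laird estimator for the between-study variance (the text above names only "the random-effects method") and an allelic comparison as a stand-in for the additive model. The function names and the allele counts are hypothetical.

```python
import math
from statistics import NormalDist


def log_or_and_var(case_minor, case_major, ctrl_minor, ctrl_major):
    """Allelic log OR and its Woolf variance from a 2x2 table of allele counts."""
    log_or = math.log((case_minor * ctrl_major) / (case_major * ctrl_minor))
    var = 1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    return log_or, var


def random_effects_pool(studies):
    """Pool studies with DerSimonian-Laird weights (an assumed estimator).

    `studies` is a list of (case_minor, case_major, ctrl_minor, ctrl_major).
    Returns the summary OR, its 95% CI, a two-sided p value and tau^2.
    """
    effects = [log_or_and_var(*s) for s in studies]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))

    nd = NormalDist()
    z975 = nd.inv_cdf(0.975)
    p_value = 2 * (1 - nd.cdf(abs(pooled / se)))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - z975 * se),
        "ci_high": math.exp(pooled + z975 * se),
        "p": p_value,
        "tau2": tau2,
    }


# Hypothetical allele counts for three independent datasets.
example = [(120, 880, 100, 900), (240, 1760, 200, 1800), (60, 440, 50, 450)]
summary = random_effects_pool(example)
```

When the studies agree closely, the estimated between-study variance is truncated to zero and the result coincides with the fixed-effect estimate.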
HWE among control groups in each study was assessed by Fisher's exact test to compare observed and expected genotype frequencies.34 We conducted power analysis to evaluate the statistical power of meta-analyses in detecting an association (ie, OR=1.15) with certain allele frequency (ie, MAF=0.10) under the additive genetic model, assuming an α of 0.05.35 We calculated the proportion of the familial risk of CRC based on the formula provided by Houlston et al.20
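The power calculation described above can be approximated as follows. This is a sketch under stated assumptions rather than the method of reference 35: it uses a normal approximation for the allelic log OR, treats each participant as contributing two independent alleles, and takes the MAF as the control allele frequency. The function name `allelic_power` is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha=0.05):
    """Approximate two-sided power to detect a per-allele OR.

    Assumes a normal approximation for the log OR estimated from
    2 * n alleles per group, with `maf` as the control frequency.
    """
    nd = NormalDist()
    p0 = maf
    # Allele frequency in cases implied by the per-allele odds ratio.
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)

    # Variance of the log OR from the expected allele counts.
    var = 1 / (2 * n_cases * p1 * (1 - p1)) + 1 / (2 * n_controls * p0 * (1 - p0))
    shift = abs(math.log(odds_ratio)) / math.sqrt(var)

    z_alpha = nd.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in either tail under the alternative.
    return nd.cdf(shift - z_alpha) + nd.cdf(-shift - z_alpha)


# Scenario stated in the text: OR=1.15, MAF=0.10, alpha=0.05,
# here with 5000 cases and 5000 controls.
power = allelic_power(5000, 5000, maf=0.10, odds_ratio=1.15)
```

For 5000 cases and 5000 controls this approximation gives a power in the mid-0.80s. Dedicated power software may differ slightly because it models genotypes rather than alleles.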
To determine heterogeneity, we performed Cochran's Q test36 and calculated the I² statistic to quantify the proportion of total variation due to heterogeneity.37 Heterogeneity was considered significant if p<0.10. Generally, I² values <25% correspond to no or little heterogeneity, values 25–50% correspond to moderate heterogeneity, and values >50% correspond to strong heterogeneity between studies. Potential small-study bias was assessed with a modified Egger test by Harbord et al.38 We also evaluated whether there were more studies with positive findings than expected, using the method described by Ioannidis and Trikalinos.39 To evaluate small-study bias and excessive significant findings, we used p<0.10 as the significance level, as recommended.38 ,39 For variants showing statistically significant association with CRC risk, sensitivity analyses were performed to determine if the association would be lost when the first published or first positive report was excluded, or when all studies that deviated from HWE in controls were excluded.
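The heterogeneity measures above can be computed from per-study log ORs and their variances. The sketch below is illustrative only: the function names and the input values are hypothetical, and the chi-square tail probability is evaluated with a closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of half-integer terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total


def heterogeneity(log_ors, variances):
    """Cochran's Q, its p value, I^2 (%) and the category used in the text."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    # I^2 is the share of total variation beyond chance, floored at zero.
    i2 = 100.0 * (q - df) / q if q > df else 0.0
    if i2 < 25:
        label = "little or none"
    elif i2 <= 50:
        label = "moderate"
    else:
        label = "strong"
    return {"q": q, "df": df, "p": chi2_sf(q, df), "i2": i2, "label": label}


# Hypothetical log ORs with equal variances for three studies.
result = heterogeneity([0.0, 0.5, 1.0], [0.01, 0.01, 0.01])
```

A Q test p value below 0.10 would be flagged as significant heterogeneity under the threshold stated above.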
For statistically significant associations identified by meta-analyses, Venice criteria were applied to assess cumulative evidence (see web appendix notes for Venice criteria). Venice criteria details are published elsewhere.25 For the amount of evidence, we did not apply this criterion for rare variants with frequency <1% since an A grade is virtually unobtainable.29 For protection from bias, we also considered GWAS results for all common SNPs (MAF≥5%). If a common variant that can be adequately tagged by GWAS chips was not identified by GWAS, that variant would be downgraded for its evidence of association with CRC risk. Cumulative epidemiological evidence of significant associations in meta-analyses was considered strong if all three grades were A, moderate if all three grades were A or B, and weak if any grade was C. We also performed false-positive report probability (FPRP) analysis to determine if a significant association can be excluded as a false-positive finding. We used the approach developed by Wacholder et al40 to calculate FPRP for the 62 significant associations. We used a prior probability of 0.05 to estimate the FPRP value for each of the 62 associations based on the p value and OR obtained from meta-analysis. FPRP<0.05, 0.05≤FPRP≤0.2, and FPRP>0.2 were considered strong, moderate and weak evidence of true association, respectively. We upgraded cumulative evidence from moderate to strong, and from weak to moderate, if evidence of true association based on the FPRP analysis was strong. We downgraded cumulative evidence from strong to moderate, and from moderate to weak, if evidence of true association was weak. For the 25 significant associations derived from subgroup analysis of different ethnicities or under the dominant or recessive model, we also assessed significance based on the Bonferroni corrected p value (5.49×10⁻⁵=0.05/910). Regardless of Venice criteria and FPRP grades, we assigned weak evidence of association credibility if the p value was >5.49×10⁻⁵.
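The grading rules in this paragraph can be written out as a short sketch. It follows the published FPRP formula, FPRP = α(1−π)/[α(1−π) + (1−β)π], where α is the observed p value, π the prior probability and 1−β the power to detect an alternative OR at that significance level. Several details are assumptions that the text does not specify: the alternative OR of 1.5, the recovery of the standard error from the 95% CI, and all function names.

```python
import math
from statistics import NormalDist

LEVELS = ["weak", "moderate", "strong"]


def fprp(odds_ratio, ci_low, ci_high, prior=0.05, alt_or=1.5):
    """False-positive report probability from a summary OR and its 95% CI.

    `alt_or` (assumed 1.5 here) is the effect size whose detection power
    is evaluated at the observed significance level.
    """
    nd = NormalDist()
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * nd.inv_cdf(0.975))
    z_obs = abs(math.log(odds_ratio)) / se
    p_value = 2 * (1 - nd.cdf(z_obs))
    # Power to reach the observed significance level if the true OR were alt_or.
    shift = abs(math.log(alt_or)) / se
    power = nd.cdf(shift - z_obs) + nd.cdf(-shift - z_obs)
    return p_value * (1 - prior) / (p_value * (1 - prior) + power * prior)


def venice_level(amount, replication, bias):
    """Strong if all grades are A, weak if any grade is C, else moderate."""
    grades = {amount, replication, bias}
    if "C" in grades:
        return "weak"
    return "strong" if grades == {"A"} else "moderate"


def fprp_level(value):
    """Thresholds from the text: <0.05 strong, 0.05-0.2 moderate, >0.2 weak."""
    if value < 0.05:
        return "strong"
    return "moderate" if value <= 0.2 else "weak"


def cumulative_evidence(venice, fprp_value):
    """Move the Venice level up one step for strong FPRP evidence, down for weak."""
    idx = LEVELS.index(venice)
    level = fprp_level(fprp_value)
    if level == "strong":
        idx = min(idx + 1, len(LEVELS) - 1)
    elif level == "weak":
        idx = max(idx - 1, 0)
    return LEVELS[idx]


# Hypothetical summary estimate: OR 1.5 (95% CI 1.2 to 1.875), Venice grades A, B, C.
grade = cumulative_evidence(venice_level("A", "B", "C"), fprp(1.5, 1.2, 1.875))
```

The sketch leaves out the exceptions described in the text, such as not applying the amount-of-evidence criterion to rare variants and the Bonferroni threshold for subgroup findings.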
A total of 945 articles reporting 3603 variants in 1378 independent genes were eligible for our analysis (figure 1). Most of these reports (n=884, 93.5%) were published since 2000. We conducted 910 meta-analyses for 267 variants (241 common and 26 rare) in 150 genes that had at least three data sources (figure 1). For the 267 main meta-analyses with the use of all available data, mean sample size was 9633 (range: 519–76 991) from a mean of seven (range: 3–68) independent studies (see web appendix table 1).
Among the main meta-analyses, 37 (13.9%) variants within 28 genes showed nominally significant association (p<0.05) with CRC risk (table 3; see web appendix table 2: references used; web appendix table 3). The 37 variants are not in linkage disequilibrium (r²<0.1). Mean pooled sample size in the 37 meta-analyses that showed significant association was 15 912 (range: 1730–51 971), drawn from an average of 11 independent studies (range: 3–56). MUTYH biallelic mutations were associated with an approximately 10-fold elevated risk of CRC. Strong associations with CRC (ORs 2.0–10.0) were detected for four rare variants (MLH1 rs121912963, OR=2.74; MLH1 rs63750447, OR=2.14; MUTYH rs34612342, OR=3.32; MUTYH rs36053993, OR=6.49). Moderate associations with CRC (ORs 1.5–2.0 or 0.50–0.67) were found for three rare variants (APC rs1801155, OR=1.96; CHEK2 rs17879961, OR=1.56; CHEK2 1100delC, OR=1.88) and two common variants (DNMT3B rs1569686, OR=0.57; MLH1 rs1800734, OR=1.51). Weaker associations with CRC risk (ORs 0.67–1.50) were observed for the remaining 27 variants, most of which are common. Four of the 37 positive variants (MLH1 rs1800734; MUTYH biallelic mutations; CHEK2 rs17879961; DNMT3B rs1569686) showed highly significant association with CRC risk at p<5×10⁻⁷; 13 showed association with CRC risk at p<0.01, and the remaining 20 had p<0.05 (table 3).
Of the 267 meta-analyses of all available data, 120 (44.9%) had little or no heterogeneity, 43 (16.1%) had moderate heterogeneity, and 104 (39.0%) had strong heterogeneity. The proportion of studies with strong heterogeneity was significantly lower for the 37 positive variants (table 3) than the remaining 230 variants (19% vs 42%, Fisher's exact p<0.01). Small-study bias was detected for 36 variants (13.5%), of which seven were positive variants. Of the 267 variants, 38 (14.2%) showed evidence of excess studies with significant findings including four positive variants. When considering all studies included in 267 meta-analyses as a whole, the number of studies with significant findings was also greater than that expected (666 vs 301, p<0.0001).
In sensitivity analyses, nine SNPs (rs7849, rs1800469, rs3025039, rs1048943, rs689466, rs1544410, rs2854746, rs1800629, G4C14/A4T14) became non-significant after exclusion of HWE-violating studies, and 13 variants (rs2854746, rs121912963, rs63750447, rs26279, rs1950902, MUTYH monoallelic mutation, NAT2 fast/slow, rs2066844, rs2066847, rs1800629, G4C14/A4T14, rs2076485, rs1544410) became non-significant after exclusion of the first positive or first published report.
We next calculated FPRP value at the prior probability, 0.05, to evaluate the probability of true association with CRC risk for the 37 positive variants from the main analyses. Associations with CRC risk had a FPRP value <0.05 for nine variants in seven genes (APC rs1801155, CHEK2 1100delC and rs17879961, DNMT3B rs1569686, GSTM1 deletion, MLH1 rs1800734, MUTYH biallelic mutations, rs36053993, TERT rs2736100), FPRP 0.05–0.2 for six variants in five genes (GSTT1 deletion, MMP1 rs1799750, MSH3 rs184967 and rs26279, PTGS1 rs5788, VDR rs11568820), and FPRP>0.2 for the remaining 22 variants (table 3).
Epidemiological credibility of significant associations was graded for the 37 positive variants identified through the main analyses (table 3 and see web appendix table 3). We first applied Venice criteria. Grades of A were given to 25, 22 and 9 meta-analyses for amount of evidence, replication of association and protection from bias, respectively. Grades of B were given to 7, 8 and 1 meta-analyses for amount of evidence, replication of association, and protection from bias, respectively. Grades of C were given to 0, 7 and 27 meta-analyses for these three criteria, respectively. Next, strong, moderate and weak evidence of true association with CRC risk was assigned to 9, 6 and 22 variants, respectively, based on FPRP. For MUTYH rs34612342, we disregarded the FPRP value (FPRP=0.533) when evaluating cumulative evidence because this mutation is pathogenic and has strong evidence to increase the risk of developing multiple adenomatous polyps and CRC.41 Altogether, eight variants in five genes (APC rs1801155, CHEK2 1100delC and rs17879961, DNMT3B rs1569686, MLH1 rs1800734, MUTYH biallelic mutations, rs34612342, rs36053993) were graded strong for evidence of association with CRC risk using combined Venice criteria and FPRP results. Two variants (GSTM1 present/null, TERT rs2736100) scored moderate for evidence of association with CRC risk. The remaining 27 variants scored C in one or more Venice criteria or were downgraded due to high FPRP. These variants were graded weak for cumulative evidence of association with CRC risk, based on combined Venice criteria and FPRP results.
Next, we performed stratified meta-analyses by ethnicity for 207 variants among Caucasians and 34 variants among Asians (see web appendix table 5) and identified eight additional variants from eight genes to be nominally associated with CRC risk (p<0.05, table 4 and see web appendix table 3). Six of them (rs16260, rs28362491, rs1800566, rs1052133, rs1801394, rs7903146) were associated with CRC risk only in Caucasians; the other two (rs20417, rs1042522) were associated with CRC risk only in Asians. We also performed meta-analyses using dominant and recessive models to evaluate associations of genetic variants with CRC risk, identifying 17 additional variants across 17 genes showing significant association, although none were statistically significant in additive model (table 5, and see web appendix table 4). Similar to the 37 positive variants identified in the main analyses, we applied Venice criteria and FPRP to evaluate these 25 variants. We also considered Bonferroni corrected p value. All were graded weak for cumulative evidence of association with CRC risk.
The vast majority of meta-analyses performed in this project (205 variants in 130 genes) did not yield any evidence of significant association. These meta-analyses included a mean of six studies (range 3–34) and 7916 participants (range 519–36 982). Table 6 shows results for 40 variants from 33 genes that showed no evidence of association with CRC risk in meta-analyses with a minimum of 5000 cases and 5000 controls.
To our knowledge, this study is the largest and most comprehensive assessment of the literature regarding candidate-gene association studies for CRC risk conducted to date. We systematically evaluated data for 3603 variants in 1378 independent candidate genes from 950 reports published in the past two decades. Several meta-analyses have been conducted to evaluate candidate-gene association studies of CRC risk for single gene or several genes. These early analyses, however, were limited to 52 variants in 34 genes (see web appendix table 6). Recently, Theodoratou et al42 evaluated genetic variants for CRC risk using data from 635 publications and conducted meta-analyses for 92 polymorphisms in 64 genes, including 18 variants identified from GWAS studies. We did not include GWAS-identified risk variants in this study since they have been robustly replicated and should be considered to have strong evidence of association. Our study not only provides an update of the variants meta-analysed previously using data from more studies and a bigger sample size, but also assessed more than 193 variants that have not been assessed in any previous meta-analyses, including the meta-analysis conducted by Theodoratou, et al.42 Of the 267 variants in 150 genes summarised by our 910 meta-analyses, 62 variants in 50 genes showed nominally significant association with CRC risk. Using Venice criteria plus FPRP results, we graded eight variants strong for cumulative epidemiological evidence of association with CRC risk (APC rs1801155, CHEK2 1100delC and rs17879961, DNMT3B rs1569686, MLH1 rs1800734, MUTYH biallelic mutations, rs34612342, rs36053993), two variants moderate for cumulative evidence of association with CRC risk (GSTM1 Present/Null, TERT rs2736100), and the remaining 52 variants weak. 
Of the eight strong variants, MUTYH rs36053993 was also rated as having ‘strong’ evidence for association in Theodoratou's study.42 For 40 variants in 33 genes, we showed no evidence of association with CRC risk in meta-analyses with large sample sizes (10 000 individuals minimum). Our study provides a comprehensive research synopsis of candidate-gene association studies of CRC risk. Results from this study will be helpful for future studies to evaluate genetic risk factors for CRC.
The APC gene, a tumour suppressor gene at chromosome 5q21, encodes a large multidomain protein of 2843 amino acids that plays a central role in the Wnt signalling pathway.43 Germline pathogenic mutations in the APC gene result in autosomal dominant inherited familial adenomatous polyposis in which more than 100 adenomatous polyps can develop.3 ,6 Our meta-analysis provides strong evidence of association for CRC risk with a heterozygous variant at codon 1307 in exon 15 of the gene (rs1801155), with a 1.96-fold increased risk of CRC in Jews (including Ashkenazi and Israeli Jews). This variant is present in 7% of Ashkenazi Jews, while population frequency is very low in Europeans and Asians (based on HapMap data).
The CHEK2 gene maps to chromosome 22q12.1 and encodes a protein kinase that is activated in response to DNA damage and is involved in cell-cycle arrest.44 Our meta-analysis revealed strong evidence of association with CRC risk for a truncating mutation at codon 381 in exon 10 (1100delC) and a missense polymorphism in exon 3 (rs17879961, Ile157Thr). The 1100delC mutation leads to kinase-deficient molecules due to protein truncation,45 while Ile157Thr results in a CHEK2 protein with deficient binding and phosphorylation of downstream substrates.46 Interestingly, in a previous meta-analysis, we found strong cumulative evidence of association for these two variants with breast-cancer risk,29 indicating the CHEK2 gene may play a role in both CRC and breast cancer.
Our meta-analyses revealed strong evidence for an association of CRC risk with three rare variants in the MUTYH gene based on data from 17 population-based studies excluding cases with MUTYH-associated polyposis. Biallelic mutations in the MUTYH gene mainly constitute either homozygotes (two copies of the same mutation) or compound heterozygotes (two different mutations) of Gly382Asp and Tyr165Cys. Tyr165Cys and Gly382Asp are located in exon 7 and exon 13 of the MUTYH gene, respectively, and have been predicted to be deleterious by SIFT47 and confirmed to be pathogenic.41 However, the monoallelic mutation, including a heterozygous genotype of 12 mutations in the MUTYH gene, showed only weak evidence for association with CRC risk in our study. Two common variants (MLH1 rs1800734, DNMT3B rs1569686) showed strong cumulative evidence of association with CRC risk. MLH1, which maps to chromosome 3p22.2, is a human homologue of the E. coli DNA mismatch repair gene mutL and is a locus frequently mutated in hereditary nonpolyposis colon cancer (HNPCC).48 Approximately 85% of genetically defined HNPCC patients have germline mutations in the MLH1 gene.49 Interestingly, meta-analysis of five studies, comprising 801 microsatellite instability high (MSI-H) cases and 10 890 controls, identified a highly significant association of rs1800734 (−93G>A) with MSI-H CRC (p=1.67×10⁻¹²). This promoter SNP showed a much stronger association with MSI-H CRC (OR=1.51) than with overall CRC (OR=1.05, p=0.013), the latter based on meta-analysis of six studies with 17 174 cases and 13 166 controls. The DNMT3B gene plays an important role in the generation of aberrant methylation in carcinogenesis.50 Although this gene was not identified as a susceptibility locus for CRC by GWAS, we still rated the SNP (rs1569686) in this gene as having strong evidence for association given the highly consistent results across studies included in our meta-analysis.
Two common variants (GSTM1 null, TERT rs2736100) scored moderate for cumulative evidence of association with CRC risk, and both of them were upgraded from ‘weak’ for having a low FPRP (<0.05). Additional investigations of these variants are needed, particularly since sample sizes of studies for both variants are relatively small. Cumulative epidemiological evidence of association with CRC was weak for the remaining 52 variants, many of which are common and were identified through ethnicity-specific meta-analyses or meta-analyses using dominant or recessive models. Well-designed studies with large samples are warranted to clarify association with CRC for these variants.
Our meta-analysis provides no evidence for association with CRC risk for 205 of the 267 variants evaluated in our study, supporting the notion that the vast majority of genetic variants evaluated in candidate gene association studies may not be truly related to CRC risk. Methodological limitations in previous candidate gene studies, such as small sample size, may explain some of the null associations. However, of the 205 non-significant variants, 40 variants in 33 genes showed no association with CRC risk in meta-analyses including a minimum of 5000 cases, 5000 controls, which provides approximately 85% power to detect an OR of 1.15 under the additive model for a variant with MAF 0.10, Type 1 error 0.05. Thus, future epidemiological studies with a similar sample size are unlikely to be helpful in assessing effects of these variants.
There are several limitations of this study. First, although we have systematically searched the literature to identify eligible studies using two stages, it is possible that some studies might have been missed. PubMed was the main database we used for our literature search. To expand our search, we also queried Google Scholar, which links multiple databases. Compared with previous meta-analyses which also used multiple databases (see web appendix table 7), we identified more studies with a bigger combined sample size for most variants included in our evaluation. Second, we did not assess gene–gene or gene–environment interactions. Additional studies specifically designed to identify these interactions are needed. Third, heterogeneity across studies, including differences in study populations, study designs and genotyping platforms, may have contributed to some of the null associations in this study. More than one-third of the meta-analyses had high heterogeneity, especially for variants with non-significant association. We attempted to address study heterogeneity through stratification analyses by ethnicity. Other sources of heterogeneity also exist and are difficult to address in this meta-analysis because of limited available data. Finally, Venice criteria use p value <0.05 as the significance level to determine association. However, we found most associations with a p value of 0.005–0.05 to have weak evidence for association with CRC in this study. Thus, a more stringent p value threshold would be helpful to evaluate evidence for a true-positive association. Additionally, Venice criteria offer the advantage of evaluating multiple sources of potential bias, some of which, such as genotyping error, phenotype misclassification and population stratification, are difficult to assess in meta-analyses.
In our meta-analyses, we identified 10 genetic variants showing strong or moderate epidemiological evidence of associations with CRC risk. If all these 10 variants are confirmed to be associated with CRC risk, they could explain approximately 5% of familial CRC risk in European populations. Nevertheless, genetic risk factors identified to date account for less than 30% of the familial risk of CRC. Some of the missing heritability could be due to methylation markers, copy number variations, structural variants, and rare variants, which conventional candidate gene association studies and GWAS are inadequate to investigate. Gene–gene and gene–environment interactions may also play a significant role in the aetiology of CRC. Additional studies, including those with large sample sizes, higher density SNP arrays and next-generation sequencing technologies, imputation using data from the 1000 Genomes Project and better defined CRC subtypes, are needed to clarify the missing heritability of CRC. Our study, the largest field synopsis conducted to date for CRC candidate gene association studies, not only summarises the current literature regarding genetic epidemiology of CRC, but also provides comprehensive data and helpful clues for designing future studies to further investigate genetic risk factors for CRC.
We thank the authors of many original studies for clarification of data and providing additional information, and Mary Jo Daly for her help with manuscript preparation. This research is supported in part by NIH grant R37 CA070867 and Ingram Professorship funds.
Contributors XM and BZ conducted literature searches, data extraction, quality assessment and analyses. XM and BZ drafted the manuscript with substantial contributions from WZ. WZ reviewed results and provided guidelines for presentation and interpretation.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.