Background Oesophageal squamous cell carcinoma (OSCC) is considered a difficult cancer to cure. The detection of environmental and genetic factors is important for prevention on an individual basis.
Objective To identify groups at high risk for OSCC by simultaneously analysing both genetic and environmental risk factors.
Methods A multistage genome-wide association study of OSCC in Japanese individuals with a total of 1071 cases and 2762 controls was performed.
Results Two associated single-nucleotide polymorphisms (SNPs), as well as smoking and alcohol consumption, were evaluated as genetic and environmental risk factors, respectively, and their interactions were also evaluated. Risk alleles of rs1229984 (ADH1B) and rs671 (ALDH2) were highly associated with OSCC (odds ratio (OR)=4.08, p=4.4×10−40 and OR=4.13, p=8.4×10−76, respectively). Also, smoking and alcohol consumption were identified as risk factors for OSCC development. By integrating both genetic and environmental risk factors, it was shown that the combination of rs1229984 and rs671 risk alleles with smoking and alcohol consumption was associated with OSCC. Compared with subjects with no more than one environmental or genetic risk factor, the OR reached 146.4 (95% CI 50.5 to 424.5) when both environmental and genetic risk factors were present. Without the genetic risks, alcohol consumption did not correlate with OSCC. In people with one or two genetic risk factors, the combination of alcohol consumption and smoking increased OSCC risk.
Conclusions Analysis of ADH1B and ALDH2 variants is valuable for secondary prevention of OSCC in high-risk patients who smoke and drink alcohol. In this study, SNP genotyping demonstrated that the ADH1B and/or ALDH2 risk alleles had an interaction with smoking and, especially, alcohol consumption. These findings, if replicated in other groups, could demonstrate new pathophysiological pathways for the development of OSCC.
- Esophageal cancer
- cancer epidemiology, cancer prevention
- gene mutation
- oesophageal cancer
Significance of this study
What is already known about this subject?
Oesophageal squamous cell carcinoma (OSCC) is associated with smoking and drinking alcohol, but the genetic risk is unknown.
What are the new findings?
This study demonstrates that single nucleotide polymorphisms of ADH1B and ALDH2 interact with alcohol consumption, especially when combined with smoking, to increase OSCC risk.
How might they impact on clinical practice in the foreseeable future?
The analysis of ADH1B and ALDH2 variants would be valuable for individualised prevention of OSCC.
Oesophageal squamous cell carcinoma (OSCC), but not adenocarcinoma, is relatively common in East Asia, including Japan.1 Oesophageal cancer is the eighth most common cancer worldwide, accounting for 462 000 new cases in 2002, and the sixth most common cause of cancer-related death (386 000 deaths). OSCC is the most common histological type worldwide,2 and is a treatment-resistant cancer that can withstand a combination of surgery, chemotherapy and radiotherapy.1 It is difficult to diagnose OSCC early because it shows few symptoms in its early stages. Furthermore, there is no effective marker for predicting the development of OSCC. Therefore, it is important to detect risk factors for primary prevention and also to identify high-risk groups for secondary prevention.
Both genetic and environmental factors are involved in the pathogenesis of OSCC. Although smoking and alcohol consumption have been demonstrated as lifestyle factors that contribute to the development of the disease,3 the DNA sequence variations that confer an additional risk of developing the disease remain largely unknown. The availability of high-resolution linkage disequilibrium (LD) maps and comprehensive sets of common single nucleotide polymorphisms (SNPs) that capture most of the common sequence variations facilitates the identification of disease-related genes with genome-wide association studies, an approach without an a priori hypothesis based on a gene function or disease pathway.
To identify OSCC-related genes, we conducted a multistage genome-wide association study in Japanese individuals, with a total of 1071 cases and 2762 controls, and identified a significant genome-wide level of association for two and six SNPs on chromosomes 4q23 and 12q24.11-13, respectively. The most functional variants in the two regions, rs1229984 (ADH1B) and rs671 (ALDH2),4 were strongly associated with OSCC. Furthermore, we analysed the association with OSCC of smoking and drinking alcohol, two of the principal environmental determinants of OSCC, both individually and jointly.5 Finally, we evaluated the combined effects of environmental and genetic risk factors.
This case–control study was designed to investigate the environmental and genetic risk factors for OSCC. The eligibility criterion was that the oesophageal disease was pathologically diagnosed as OSCC. Patients with newly diagnosed oesophageal cancer, 35–85 years of age, were identified from six hospitals (Juntendo University Hospital, National Cancer Center Hospital, Kurume University Hospital, Saitama Cancer Center, Kagoshima University Hospital and Kyushu University Hospital) from 2000 to 2008. Healthy controls without a previous cancer history were recruited from Kyushu University Hospital (and related hospitals) during the same time period. All controls were enrolled after receiving an upper gastrointestinal endoscopy test to ensure that they had no disease. All participants provided written informed consent. The study protocol was reviewed and approved by Kyushu University (Fukuoka, Japan), Juntendo University (Tokyo, Japan), National Cancer Center Hospital (Tokyo, Japan), Kurume University (Kurume, Japan), Saitama Cancer Center (Saitama, Japan) and Kagoshima University (Kagoshima, Japan). In total, 1071 patients with OSCC and 2762 controls were enrolled.
Environmental risk factors
Detailed information about demographic characteristics, lifestyle and daily diet was collected using a standardised questionnaire. Of all the known determinants of OSCC, we chose the two major ones—smoking and alcohol consumption—as environmental risk factors to investigate in detail. Information on smoking and alcohol consumption habits (eg, current smoker, ex-smoker, or non-smoker for smoking status) was collected at the time of enrolment. In addition, the Brinkman index (product of the number of cigarettes per day and years of smoking) for current smokers and years after quitting smoking or drinking (<1 year, 1–2 years, 3–9 years, or 10 years or longer) were calculated. Of the data collected from 1071 patients with OSCC and 2762 controls, the data from 742 patients with OSCC and 820 controls were analysed.
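As a minimal illustration of the Brinkman index defined above (cigarettes per day multiplied by years of smoking), the following Python sketch may help. The function name and the example subject are hypothetical and are not taken from the study.

```python
def brinkman_index(cigarettes_per_day: float, years_smoked: float) -> float:
    """Brinkman index: cigarettes smoked per day times years of smoking."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day * years_smoked


# Hypothetical subject: 20 cigarettes a day for 30 years.
example_index = brinkman_index(20, 30)
print(example_index)  # 600
```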
Genotyping, quality control and genetic association analysis
The genome-wide association study was carried out using the Affymetrix GeneChip Human Mapping 500K array (online supplementary figure 1). We genotyped 226 OSCC cases and 1118 controls using the Bayesian Robust Linear Model with Mahalanobis (BRLMM) algorithm. Samples with a genotype call rate <0.94 for either NspI or StyI GeneChip SNPs were removed from analysis (N=12). To detect duplicated samples, relatives, and DNA-contaminated samples, pairwise identity-by-descent (IBD) estimation was carried out. We detected 1, 28 and 2 pairs showing IBD (PI_HAT) proportions of 1.0, approximately 0.5 and 0.25, respectively. Based on the results, 31 samples that had lower genotype call rates in each pair were excluded from the association analysis. In addition, we removed samples that had deviated averages of PI_HAT (approximately more than 3 standard deviations (PI_HAT > 0.020, N=13, see supplementary figure 2)) because such high mean PI_HAT values might be caused by DNA contamination or low-quality genotyping. These 13 samples also had higher rates of heterozygous genotypes than the other study samples (supplementary figure 3). After the sample quality check, 1288 samples (209 OSCC and 1079 controls) were subjected to further analysis.
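The pairwise exclusion rule described above (from each related or duplicated pair, drop the sample with the lower genotype call rate) can be sketched as follows. This is an illustrative sketch only: the function name, sample identifiers and call rates are hypothetical, and the actual analysis was carried out with PLINK. The final lines simply check the sample accounting against the counts reported in the text.

```python
def samples_to_exclude(related_pairs, call_rate):
    """From each flagged pair, select the sample with the lower genotype call rate."""
    excluded = set()
    for sample_a, sample_b in related_pairs:
        # Ties are resolved by dropping the second sample (an arbitrary choice).
        if call_rate[sample_a] < call_rate[sample_b]:
            excluded.add(sample_a)
        else:
            excluded.add(sample_b)
    return excluded


# Hypothetical pairs and call rates, for illustration only.
pairs = [("s1", "s2"), ("s3", "s4")]
rates = {"s1": 0.99, "s2": 0.97, "s3": 0.95, "s4": 0.98}
print(sorted(samples_to_exclude(pairs, rates)))  # ['s2', 's3']

# Sample accounting using the counts reported in the text:
# low call rate (12), IBD pairs (31) and deviated mean PI_HAT (13).
genotyped = 226 + 1118
remaining_after_qc = genotyped - 12 - 31 - 13
print(remaining_after_qc)  # 1288
```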
SNPs were removed from analysis if they had a call rate of less than 0.95, showed a difference in call rate of more than 0.03 between OSCC and controls, displayed Hardy–Weinberg disequilibrium (p<1.0×10−4) in the control group, or had a minor allele frequency (MAF) <0.10. SNPs that were not selected in the updated GeneChip SNP5.0 (Affymetrix) were also excluded. After these exclusions, 234 830 SNPs remained in the first stage. The genomic inflation factor based on the median χ2 value was 1.024 in this genome-wide association analysis (supplementary figure 4), implying that there was no systematic increase of false positives owing to population stratification or to any other form of bias. Six SNPs on chromosome 12q24 were strongly associated with the disease, exceeding the genome-wide significance level of p=1.0×10−7 (supplementary figure 5).
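The genomic inflation factor mentioned above is conventionally the median of the observed 1-df χ2 association statistics divided by the median of the χ2 distribution with one degree of freedom (about 0.455). A minimal sketch, assuming per-SNP χ2 statistics are already available; the function name and the toy statistics are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_values):
    """Median observed chi-square statistic divided by its expected median under the null."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN


# Hypothetical per-SNP statistics; a real analysis would use all SNPs passing QC.
toy_chi2 = [0.02, 0.31, 0.4549364, 1.70, 5.20]
print(round(genomic_inflation_factor(toy_chi2), 3))  # 1.0
```

A value close to 1, such as the 1.024 reported here, suggests little inflation of the test statistics.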
In the second stage, 480 OSCC and 864 control samples were genotyped using the Illumina Golden Gate Assay for the best 1536 SNPs (allelic p<0.013). When multiple SNPs displayed strong LD with each other (r2>0.8), the most closely associated SNP was chosen to avoid redundancy during the selection of the 1536 SNPs. Samples with a genotype call rate <0.98 and SNPs with a call rate <0.98, Hardy–Weinberg disequilibrium (p<1.0×10−4) in the controls, or an MAF <0.05 were excluded from the association analysis. After quality control, 479 OSCC samples, 863 control samples and 1419 SNPs remained, and 66 SNPs had an allele test p<0.05 at this stage.
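The second-stage SNP filters (call rate, Hardy–Weinberg equilibrium in controls, and MAF) can be sketched as below. This is an assumption-laden sketch rather than the study's procedure: it uses a 1-df χ2 goodness-of-fit test for Hardy–Weinberg equilibrium (software such as PLINK may use an exact test instead), it computes the MAF from control genotypes only, and all function names and genotype counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """p value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(call_rate, control_genotypes,
                  min_call_rate=0.98, min_maf=0.05, hwe_alpha=1.0e-4):
    """Apply the three second-stage thresholds quoted in the text."""
    n_aa, n_ab, n_bb = control_genotypes
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha)


# Hypothetical genotype counts (AA, AB, BB) in controls.
print(passes_snp_qc(0.99, (250, 500, 250)))  # True: in equilibrium
print(passes_snp_qc(0.99, (500, 0, 500)))    # False: no heterozygotes
```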
Among the 26 SNPs that showed an allelic p<0.01 in the second stage, 25 could be genotyped with the TaqMan method in 365 OSCC cases and 780 controls in the third stage. The average SNP call rate of these 25 SNPs was 0.998. We identified 10 SNPs with an allelic p<0.05, and eight SNPs reached a significant genome-wide association level (p<1×10−7) in combined samples. The non-synonymous SNPs rs1229984 (ADH1B), rs671 (ALDH2) and rs16969968 (CHRNA5), as well as the synonymous SNP rs1051730 (CHRNA3), were also genotyped in all samples in the first through third stages by the TaqMan method.
To evaluate genetic and environmental factors, genotype data for the two SNPs (rs1229984 and rs671) and lifestyle data (smoking and alcohol consumption) were available for 742 OSCC cases and 820 controls. Odds ratios (ORs) and 95% confidence intervals (95% CIs) were calculated using unconditional logistic regression models, adjusted for sex, age (5-year categories) and study area (Honshu and Kyushu islands).
The environmental factors, that is, history of smoking and alcohol consumption, were re-categorised into two subclasses according to whether subjects had a previous habit of smoking or drinking; this was done to minimise the effect of disease on reported habits. To evaluate the interaction effect more simply, we chose the dominant or recessive model for both SNPs, combining the heterozygous group into either a wild homozygous or mutant homozygous group. The model was selected based on the fitness of the logistic regression. For the results, subjects with GA at rs1229984 were included in the group of AA homozygotes because the recessive model was a better fit than the dominant model. In contrast, the AG and AA genotypes of rs671 were combined because the dominant model was a better fit.
First, we estimated the environmental risk arising from smoking and alcohol consumption both individually and in combination (risk=0, 1 or 2). Similarly, for genetic risk, we estimated the OR of each factor of rs671 (AG/AA) and rs1229984 (GG) and their combined effect (risk=0, 1 or 2). Next, we repeated the same analysis for environmental risk according to the stratum of genetic risk. In the stratified analysis, we evaluated how the environmental effect was modified in the different genetic strata—that is, the existence of a gene–environment interaction. Here we used subjects with the AG/AA genotype of rs671 and/or GG genotype of rs1229984 as a genetic risk group. Finally, we calculated the risk number for the four risk factors in comparison with subjects who had no risk factors to evaluate the accumulation of risk (risk=0, 1, 2, 3 or 4) (tables 1, 2, and 3 and Figure 1).
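The risk-factor counting described above can be sketched as follows, using the risk definitions given in the text (ever-smoker, ever-drinker, AG/AA at rs671, GG at rs1229984). The function names and the genotype string encoding are hypothetical choices made for this illustration.

```python
def environmental_risk(ever_smoker: bool, ever_drinker: bool) -> int:
    """Number of environmental risk factors (0, 1 or 2)."""
    return int(ever_smoker) + int(ever_drinker)


def genetic_risk(rs671: str, rs1229984: str) -> int:
    """Number of genetic risk factors (0, 1 or 2).

    rs671 follows a dominant model (AG or AA counts as risk);
    rs1229984 follows a recessive model (only GG counts as risk).
    """
    rs671_at_risk = rs671 in {"AG", "GA", "AA"}
    rs1229984_at_risk = rs1229984 == "GG"
    return int(rs671_at_risk) + int(rs1229984_at_risk)


def total_risk(ever_smoker, ever_drinker, rs671, rs1229984) -> int:
    """Total number of risk factors (0 to 4)."""
    return (environmental_risk(ever_smoker, ever_drinker)
            + genetic_risk(rs671, rs1229984))


# Hypothetical subjects.
print(total_risk(True, True, "AG", "GG"))    # 4
print(total_risk(False, False, "GG", "AA"))  # 0
```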
p Values for the interaction are based on likelihood ratio tests that compared models with and without interaction terms. Statistical analyses were performed with SAS software version 9.1 (SAS Institute). A two-tailed p value of <0.05 was considered statistically significant.
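A likelihood ratio test of this kind compares the maximised log-likelihoods of the models with and without the interaction term. The sketch below assumes a single interaction term (one degree of freedom); the function name and the log-likelihood values are hypothetical, and the study itself used SAS.

```python
import math


def likelihood_ratio_test_1df(loglik_without, loglik_with):
    """Return the LRT statistic and its p value for one added parameter."""
    statistic = max(2.0 * (loglik_with - loglik_without), 0.0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods from two nested logistic regression models.
statistic, p_value = likelihood_ratio_test_1df(-520.00, -518.08)
print(round(statistic, 2))  # 3.84, close to the 5% critical value for 1 df
```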
Genotype data cleaning and IBD analysis were carried out using PLINK version 1.06 software.6 LD was assessed with HaploView version 4.0.7 The statistical power for the allelic association analysis in the first and second stages of this study was calculated using the PS program8 (supplementary table 1).
Figure 2 shows the study design. Table 1 shows several characteristics of the cases and controls. Cases included more men, older individuals, ever-drinkers, ever-smokers and subjects with the AG/AA genotype of rs671 and GG genotype of rs1229984 than controls. The average number of risk factors was significantly higher among cases (2.7) than among controls (1.5).
Our multistage association study identified two and six SNPs on chromosomes 4q23 and 12q24.11-13, respectively, which showed genome-wide evidence for association with OSCC (p<1.0×10−7) (table 4). The disease-associated markers of 4q23 spanned the ADH gene cluster region, including seven ADH family genes: ADH1A, ADH1B, ADH1C, ADH4, ADH5, ADH6 and ADH7 (Figure 3). We searched for functional SNPs in these genes in the SNP database and found one validated non-synonymous SNP in exon 3 of ADH1B, rs1229984, with an MAF >0.1 in the East Asian population. In addition, 12q24.12 contains the ALDH2 gene, which encodes a well-known key enzyme in alcohol metabolism (Figure 4). This gene also possesses a non-synonymous SNP in exon 12, rs671, that affects its enzymatic activity. We assessed the LD between these functional SNPs and associated SNP markers. We detected moderate LD between rs1229984 and rs1042026 as well as between rs671 and rs11066280 (r2=0.66 and 0.87, respectively) in control samples (supplementary figure 5). These observations led us to examine the association of rs1229984 and rs671 with OSCC. We found a stronger association between these SNPs and OSCC (allele test OR=1.82, p=6.2×10−28 and OR=1.78, p=1.0×10−26 for rs1229984 and rs671, respectively) than between marker SNPs and OSCC (allele test OR=1.66, p=1.8×10−16 and OR=1.68, p=2.5×10−21 for rs1042026 on 4q23 and rs11066280 on 12q24, respectively), suggesting that rs1229984 and rs671 might be susceptibility variants for OSCC (table 4). Because the other SNP markers with disease associations reside in introns (eg, rs3805322 and rs2074356 reside in the introns of ADH4 and C12orf51, respectively), we cannot exclude the possibility that they have a biological effect on genes from this region. However, other lines of evidence support a possible role for ADH1B and ALDH2 in the pathogenesis of OSCC.
The risk alleles of rs1229984 in ADH1B (G) and rs671 in ALDH2 (A) encode arginine-48 and lysine-504, respectively, which reduce enzymatic activity (table 5). The frequency of the GG genotype of rs1229984 was higher in OSCC than in controls (0.20 vs. 0.06, OR=4.08, p=4.4×10−40). Similarly, the frequency of the AA+AG genotype of rs671 was higher in cases than in controls (0.73 vs 0.43, OR = 3.54, p=5.5×10−62). These results indicate that individuals who exhibit low enzymatic activity for ADH1B and/or ALDH2 are at higher risk for OSCC.
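As a rough cross-check, a crude OR can be recomputed from the rounded genotype frequencies quoted above. This is a back-of-the-envelope sketch with a hypothetical function name. The results differ slightly from the reported ORs (4.08 and 3.54), presumably because the published frequencies are rounded to two decimals and the reported estimates come from the underlying genotype counts.

```python
def odds_ratio_from_frequencies(freq_cases: float, freq_controls: float) -> float:
    """Crude odds ratio from the risk-genotype frequency in cases and controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Frequencies taken from the text: GG at rs1229984, then AA+AG at rs671.
or_rs1229984 = odds_ratio_from_frequencies(0.20, 0.06)
or_rs671 = odds_ratio_from_frequencies(0.73, 0.43)
print(round(or_rs1229984, 2))  # 3.92
print(round(or_rs671, 2))      # 3.58
```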
Table 2 shows the ORs of OSCC associated with environmental and genetic risk factors along with the interactions within each category. Ever-drinkers who did not smoke and ever-smokers who did not drink alcohol had significantly elevated adjusted ORs of 3.5 (95% CI 2.1 to 5.8) and 2.3 (95% CI 1.2 to 4.3), respectively. A supra-multiplicative OR of 16.0 (95% CI 9.7 to 26.3), that is, statistically larger than the product of 3.5 and 2.3 (8.0), was found among individuals who were both ever-drinkers and ever-smokers. Subjects with only one genetic risk factor, either rs671 AG/AA or rs1229984 GG, had significantly higher ORs of 4.8 (95% CI 3.7 to 6.3) and 3.1 (95% CI 1.6 to 6.1), respectively, than those with neither. The OR for those with both genetic risk factors was 34.0 (95% CI 18.1 to 63.8); however, the interaction of these two genetic factors did not reach significance (p=0.079).
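The comparison of a joint OR with the product of the individual ORs can be made explicit with a simple ratio, where a value above 1 indicates a point estimate larger than expected under a multiplicative model. The sketch below uses the ORs quoted above; the function name is hypothetical, and the ratio alone says nothing about statistical significance, which in the study rests on the likelihood ratio test.

```python
def interaction_ratio(or_joint: float, or_a_only: float, or_b_only: float) -> float:
    """Joint OR divided by the product of the two single-factor ORs."""
    return or_joint / (or_a_only * or_b_only)


# Environmental factors: drinking only 3.5, smoking only 2.3, both 16.0.
environmental_ratio = interaction_ratio(16.0, 3.5, 2.3)
# Genetic factors: rs671 only 4.8, rs1229984 only 3.1, both 34.0.
genetic_ratio = interaction_ratio(34.0, 4.8, 3.1)

print(round(environmental_ratio, 2))  # 1.99
print(round(genetic_ratio, 2))        # 2.28
```

Although both ratios are of similar size, only the environmental interaction was reported as significant; the genetic interaction had p=0.079.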
We also evaluated the combined effects of environmental and genetic risk factors (table 3). In this analysis, the reference group was composed of individuals who never drank or smoked and who also had no genetic risk factors. Compared with the reference group, ever-drinkers who did not smoke and had no genetic risk factors had a non-significant OR of 1.5 (95% CI 0.7 to 3.3). Non-drinkers and non-smokers with genetic risk factors also had a non-significant OR of 1.1 (95% CI 0.5 to 2.4). All other groups, however, had significantly elevated ORs. An interaction between alcohol drinking and smoking was observed only in the stratum with genetic risk. In the stratum with no genetic risk factors, alcohol drinking was not associated with OSCC, regardless of smoking status. Smoking without alcohol drinking elevated the ORs, regardless of the rs671 and rs1229984 genotypes, similarly and significantly. In contrast, interactions between alcohol drinking and genetic risk factors were highly significant. The combined effects of alcohol drinking and genetic risk factors were larger than the products of individual effects. For example, among non-smokers, the combined OR (12.1) was significantly larger than the product of the genetic effect (1.1) and the alcohol drinking risk (1.5). The same effect was seen among smokers (62.1>1.1×5.0).
Finally, we evaluated the effect on OSCC of the number of risk factors present of the two possible environmental and two possible genetic factors (Figure 1). Compared with the no-risk-factor condition, the ORs for one, two, three and four risk factors were 1.4 (95% CI 0.7 to 2.7), 4.3 (95% CI 2.2 to 8.4), 41.0 (95% CI 20.2 to 83.3) and 357.1 (95% CI 105.4 to 1209.5), respectively. A highly significant linear trend (p<0.0001) was observed.
Individuals who smoke and drink alcohol are considered at high risk for OSCC, although most such people do not develop the disease. Indeed, in a recent study, only 41 of 100 000 such people developed OSCC.9 Therefore, it is crucial to simultaneously analyse genetic and environmental risk factors to more efficiently identify people at truly high risk for OSCC. This unbiased genome-wide association study identified two loci containing genes involved in alcohol metabolism. In addition, we found a strong genetic-environmental interaction related to the risk of OSCC. Subjects with two environmental risk factors (ever-smokers and ever-drinkers) in combination with two genetic risk factors (GG at rs1229984 (ADH1B) and AG or AA at rs671 (ALDH2)) had a much higher risk than other subjects. Specifically, compared with no risk factor, the ORs with one, two, three and four risk factors were 1.4 (95% CI 0.7 to 2.7), 4.3 (95% CI 2.2 to 8.4), 41.0 (95% CI 20.2 to 83.3), and 357.1 (95% CI 105.4 to 1209.5), respectively. Of all of the risk factors for OSCC that we had previously examined, the combination of all four factors studied had the highest risk, with an OR of 357.1.
This information on the strong genetic-environmental interaction is valuable for secondary prevention of OSCC. When we see subjects who show ADH1B (rs1229984) and ALDH2 (rs671) risk variants as well as smoking and drinking habits, we will advise them to have periodic upper gastrointestinal endoscopy. Screening of these patients could have an important role in the early detection of OSCC. Furthermore, the information gained from this study may also enable the development of a primary individualised prevention strategy for young people with these genetic variations. When subjects have these high-risk variants, advising them not to start smoking or, especially, not to drink alcohol may substantially reduce their risk of developing OSCC. We observed that drinkers who consumed alcohol daily and heavy smokers had higher ORs than their counterparts, and the ORs decreased with an increased amount of time since quitting these habits. However, all ORs among ever-drinkers/smokers were significantly higher than 1.0 (supplementary table 2). At present, we cannot unequivocally determine the preventive effect of quitting smoking or drinking alcohol. To determine whether stopping these habits reduces the risk of OSCC, prospective studies are needed.
Several limitations of this study should be mentioned. First, the statistical power of this genome-wide association study was not sufficient for allelic variants with an OR of <1.5 (supplementary table 1). Therefore, we might have missed variants with a small effect size (eg, 1.1–1.3), which are often reported for other lifestyle-related diseases. Second, we did not match the cases and controls; thus, the basic distributions of sex, place of residence and age were different between the two groups (table 1). However, although matching is efficient in data collection, it does not affect the point estimation if the factor is included in the model. Thus, the absence of matching should not have distorted the results. Third, the cost and ethical implications of personalised genetic testing need to be considered. Finally, because of the retrospective study design, several answers in the questionnaire could be altered by disease or pre-disease conditions. Thus, the high OR among former drinkers and smokers who had quit less than 1 year previously may incorrectly imply a causal relationship between these habits and OSCC risk. Therefore, the effect of quitting these habits on OSCC risk should be examined by prospective studies.
Hashibe et al identified the variation of ADH1B (rs1229984) as a risk factor for OSCC.10 They conducted their analysis on European and Latin American populations, and the result was consistent with the results of our study of Japanese patients and controls. A study of Chinese people also demonstrated that ADH1B (rs1229984) is a risk factor for OSCC.11 Recently, Cui et al reported that variations of ADH1B (rs1229984) and ALDH2 (rs671) coupled with alcohol drinking and smoking synergistically enhance oesophageal cancer risk.12 Their study indicates that these two genetic risk variants provide almost equal risk for the generation of OSCC. In our study, among individuals without a genetic risk, alcohol consumption did not increase the OR of OSCC significantly. For people without genetic risk, smoking habits were the major contributing factor for the generation of OSCC. However, in people with genetic risk, a drinking habit strikingly increased the risk of OSCC; combined with smoking, it increased the risk even further (table 3).
Smoking- and alcohol drinking-related genes such as the ADH family, the ALDH family and nicotinic acetylcholine receptors, especially as indicated in SNP analyses, have been significantly associated with a variety of cancers.10 13–15 We performed an association analysis between OSCC and SNPs in the nicotinic acetylcholine receptor subunit genes CHRNA3 and CHRNA5 at 15q25 using the same cases and controls. An association with lung cancer was reported by Hung et al14 and Thorgeirsson et al.13 However, in this study, we did not find a significant association at either rs1051730 (CHRNA3) or rs16969968 (CHRNA5), with ORs of 0.91 and 0.89, respectively (supplementary table 3). Smoking habits contributed to the development of OSCC; however, SNPs other than rs1051730 (CHRNA3) or rs16969968 (CHRNA5) might affect OSCC generation. Because the risk allele frequency of both of these SNPs was <0.02 in cases and controls in this study population, it would be difficult to show any difference between cases and controls. Variants of ECRG1 have been reported to be associated with OSCC.16 However, it is still unclear whether the genetic or epigenetic changes caused by smoking and/or alcohol drinking are directly associated with the development of OSCC in cooperation with SNPs of genes such as those of the ADH family, the ALDH family and nicotinic acetylcholine receptors.5 17 ADH1B in subjects with the rs1229984 AA or GA genotype is reported to metabolise ethanol up to 40 times more quickly than ADH1B from GG homozygotes.18 Furthermore, ALDH2 from subjects with the GG genotype of rs671 is reported to metabolise acetaldehyde more than 100 times faster than ALDH2 from AG heterozygotes.19 In addition, ADH genes exhibit a nominally significant association with smoking behaviour.20 Considering our results and these reports, higher local exposure to ethanol and acetaldehyde mediated by smoking may be strongly associated with OSCC development.
To answer these questions, it is necessary to conduct a prospective study in genetically at-risk populations with or without drinking and/or smoking habits, as recently performed for type 2 diabetes.21 22
In summary, this study disclosed a significant genetic-environmental interaction, with very large ORs, associated with the development of OSCC. Thus, convincing young people to smoke and drink less is likely to reduce the incidence of OSCC. SNP genotyping demonstrated that the ADH1B and/or ALDH2 risk alleles had an interaction with smoking and, especially, alcohol consumption. Analysis of ADH1B and ALDH2 variants would be valuable for secondary prevention of OSCC in high-risk patients who smoke and drink alcohol. Our findings, if replicated in other groups, could demonstrate new pathophysiological pathways for the development of OSCC.
We thank Ms. Judith Clayton for her critical reading of the manuscript.
Funding This work was supported in part by the following grants and foundations: Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research, grant numbers 17109013, 21229015; CREST, Japan Science and Technology Agency (JST); NEDO (New Energy and Industrial Technology Development Organization) Technological Development for Chromosome Analysis; and The Ministry of Education, Culture, Sports, Science, and Technology of Japan for Scientific Research on Priority Areas, Cancer Translational Research Project, Japan.
Competing interests None.
Ethics approval This study was conducted with the approval of Kyushu University, Juntendo University, National Cancer Center Hospital, Saitama Cancer Center, Kurume University and Kagoshima University, Japan.
Provenance and peer review Not commissioned; externally peer reviewed.