Abstract
Objective Although several genome-wide association studies (GWAS) of non-cardia gastric cancer have been published, additional association signals may be detected by combining individual studies, which would further elucidate the genetic susceptibility to non-cardia gastric cancer.
Design We conducted a meta-analysis of two published Chinese GWAS (2031 non-cardia gastric cancer cases and 4970 cancer-free controls), followed by genotyping of an additional 3564 cases and 4637 controls in two stages.
Results The overall meta-analysis revealed two new association signals. The first was a novel locus at 5q14.3, marked by rs7712641 (per-allele OR=0.84, 95% CI 0.80 to 0.88; p=1.21×10−11). This single-nucleotide polymorphism (SNP) marker maps to the intron of the long non-coding RNA, lnc-POLR3G-4 (XLOC_004464), which we observed has lower expression in non-cardia gastric tumour compared with matched normal tissue (Pwilcoxon signed-rank=7.20×10−4). We also identified a new signal at the 1q22 locus, rs80142782 (per-allele OR=0.62; 95% CI 0.56 to 0.69; p=1.71×10−19), which was independent of the previously reported SNP at the same locus, rs4072037 (per-allele OR=0.74; 95% CI 0.69 to 0.79; p=6.28×10−17). Analysis of the new SNP conditioned on the known SNP showed that the new SNP remained genome-wide significant (Pconditional=3.47×10−8). Interestingly, rs80142782 has a minor allele frequency of 0.05 in East Asians but is monomorphic in both European and African populations.
Conclusion These findings add new evidence for inherited genetic susceptibility to non-cardia gastric cancer and provide further clues to its aetiology in the Han Chinese population.
- GASTRIC CANCER
- GENETIC POLYMORPHISMS
- EPIDEMIOLOGY
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Significance of this study
What is already known on this subject?
Approximately 40% of all cases of gastric cancer worldwide occur in China, and this form of cancer remains one of the key public health issues in cancer prevention and control.
Several previous genome-wide association studies (GWAS) of gastric cancer have reported several associations for common single-nucleotide polymorphisms (SNPs).
Combining studies of moderate sample size increases statistical power, so additional signals can be detected.
What are the new findings?
Using a meta-analysis that pooled two published Chinese GWAS, followed by two-stage replication (more than 10 000 samples), we identified two novel signals associated with the risk of non-cardia gastric cancer. The first, rs7712641, maps to the intron of the long non-coding RNA lnc-POLR3G-4 (XLOC_004464). Further analysis showed that this lncRNA had significantly lower expression in non-cardia gastric tumour compared with matched normal tissue. In addition, we observed a new signal marked by rs80142782 at the 1q22 locus. It was independent of the previously reported SNP rs4072037; it is polymorphic in East Asians (minor allele frequency 0.05) but monomorphic in both European and African populations.
Introduction
Globally, gastric cancer remains the third-leading cause of cancer death in both sexes,1 with more than half of gastric cancer cases worldwide occurring in East Asia, predominantly in China. Most cases of gastric cancer are sporadic,2 and its aetiology is related to both genetic susceptibility and epidemiological risk factors3 such as age, sex, Helicobacter pylori infection,4 ,5 family history, excessive salt intake and tobacco smoking. Anatomically, gastric cancer is classified into cardia gastric cancer and non-cardia gastric cancer, which are characterised by distinct risk factors and clinical features.4 ,6–8
Recently, several genome-wide association studies (GWAS) of gastric adenocarcinoma were conducted in East Asians.9–12 Notable findings include single-nucleotide polymorphism (SNP) markers mapping to 8q24.3, for both an intronic SNP (rs2976392) and an exonic SNP (rs2294008) in the Prostate Stem Cell Antigen gene (PSCA); and two markers rs2075570 and rs2070803 near the Mucin 1 gene (MUC1) on 1q22.9 The findings on the 1q22 locus and gastric cancer risk were further replicated in several follow-up studies and additional evidence pointed to the synonymous SNP, rs4072037, as the functional variant underlying the observed association.13–18 In addition, 3q13.31 marked by rs9841504, 5p13.1 marked by rs13361707 or rs10074991 and 6p21.1 marked by rs2294693 were reported to be associated with non-cardia gastric adenocarcinoma in China,11 ,12 whereas 10q23 marked by rs2274223, a non-synonymous SNP located in PLCE1, was associated with cardia but not non-cardia gastric cancer.10 Together, these data indicate that five chromosomal regions, 1q22, 3q13.31, 5p13.1, 6p21.1 and 8q24.3, have strong evidence of harbouring one or more susceptibility alleles of non-cardia gastric cancer. Based on the experience of other cancer sites, additional loci will likely be found by interrogation of increasingly larger studies.
Materials and methods
Primary GWAS scan data
For the National Cancer Institute (NCI) GWAS, subjects were drawn from four prospective cohort studies and one large case–control study as reported in Abnet et al.10 In addition, all subjects used in replication in the original paper were subsequently genotyped using the Illumina 660W-Quad microarray; this included scanning of 725 additional gastric cancer cases and 608 additional controls. The current analysis included all 1025 non-cardia gastric cancer cases and 2697 controls from the NCI Upper Gastrointestinal Cancer GWAS.
For the Nanjing and Beijing GWAS, individuals were derived from separate case–control studies conducted in Nanjing (565 cases and 1162 controls) and Beijing (468 cases and 1123 controls), as previously reported in Shi et al,11 where individuals were genotyped using the Affymetrix Genome-Wide Human SNP Array (V.6.0).
Replication samples
The first-stage replication included 1145 cases and 2253 controls derived from Jiangsu Province. The second-stage replication included 2419 cases and 2384 controls derived from Beijing, Hubei and Shandong provinces.
Gastric cancer cases tested for expression of the lncRNA associated with the GWAS SNPs came from our UGI Cancer Genetic Studies (URL: http://dceg.cancer.gov/about/staff-directory/biographies/O-Z/taylor-philip). Genotypes for all these cases are known because they were also all participants in our previous GWAS.10
All study individuals provided informed consent and both the institutional review boards of NCI and Nanjing Medical University approved all procedures and all experiments, which were conducted in accordance with the approved guidelines.
Genotype imputation
In addition to the quality control procedures performed in the primary publications of the previous GWAS, SNPs with call rate of <95%, p value for Hardy–Weinberg Equilibrium (HWE) in controls ≤1.0×10−6 or minor allele frequency (MAF) of <1% in controls were removed before imputation. Imputation was conducted separately for the NCI (Illumina 660W) and the Nanjing+Beijing (Affymetrix 6.0) scan data, taking all populations in the 1000 Genomes Project Phase I (V.3) as the reference set and using IMPUTE2 software (V.2.2.2), which automatically finds haplotypes from the best matching population in the entire reference set to do the imputation. First, genomic coordinates for National Center for Biotechnology Information (NCBI) human genome Build 36 were converted to those of NCBI human genome Build 37 using the UCSC liftOver tool. The few loci for which coordinates could not be converted were excluded from imputation. Second, the strand of the inference data was aligned with the 1000 Genomes Project data by simple allele state comparison, or by allele frequency matching for adenine/thymine (A/T) and guanine/cytosine (G/C) SNPs. We implemented a 4 Mb sliding window to impute across the genome, resulting in 744 windows. A pre-phasing strategy with SHAPEIT software (V.2) was adopted to improve the imputation performance. The phased haplotypes from SHAPEIT were fed directly into IMPUTE2.
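The pre-imputation filters and the strand check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Snp` record, its field names and the function names are hypothetical, and the allele-frequency comparison for A/T and G/C SNPs is left to the caller.

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class Snp:
    """Hypothetical per-SNP summary record (field names are assumptions)."""
    rsid: str
    call_rate: float        # fraction of samples with a genotype call
    hwe_p_controls: float   # HWE p value computed in controls
    maf_controls: float     # minor allele frequency in controls


def passes_pre_imputation_qc(snp: Snp) -> bool:
    """Apply the three thresholds quoted in the text."""
    if snp.call_rate < 0.95:
        return False
    if snp.hwe_p_controls <= 1.0e-6:
        return False
    if snp.maf_controls < 0.01:
        return False
    return True


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """A/T and G/C SNPs read the same on both strands."""
    return COMPLEMENT[a1] == a2


def align_to_reference(study, reference):
    """Classify a study allele pair against a reference allele pair.

    Returns "same", "flip" (reverse complement needed), "ambiguous"
    (A/T or G/C SNP, which needs an allele-frequency check) or "mismatch".
    """
    if is_strand_ambiguous(*study):
        return "ambiguous"
    if set(study) == set(reference):
        return "same"
    if {COMPLEMENT[a] for a in study} == set(reference):
        return "flip"
    return "mismatch"
```

A SNP classified as "ambiguous" cannot be resolved from allele states alone, which is why the text falls back on allele frequency matching for those markers.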
SNP selection and replication genotyping
The meta-analysis included 6 223 896 SNPs based on the intersection of the three imputed datasets. Individual SNPs for the stage 2A replication were selected based on the following criteria: (1) INFO score of ≥0.5; (2) MAF in the control set ≥0.01; (3) p value for HWE in the control set >1.0×10−4 in each set; (4) Phet >1.0×10−4 and I2 <75% in the meta-analysis; (5) linkage disequilibrium (LD) pruning: only the SNP with the lowest p value was retained when the pair-wise r2 was ≥0.3 within a distance of 200 kb; (6) exclusion of previously identified loci associated with risk for non-cardia gastric cancer. After applying the above criteria, we picked the top 48 SNPs (Pmeta≤2.58×10−5). For 1q22 and 8q24, two SNPs (rs80142782; rs76845414) were retained in our LD-filtered list, so we added back the two previously reported SNPs for these regions (rs4072037; rs2294008) in order to search for potential secondary signals at these two known loci, giving an initial list of 50 SNPs. Subsequently, 13 SNPs failed either Sequenom assay design or genotyping. As a result, a total of 37 SNPs (see online supplementary table S2) were successfully genotyped using the iPLEX Sequenom MassARRAY platform (Sequenom, California, USA) in stage 2A replication (1145 cases and 2253 controls).
Five SNPs with p <0.05 in stage 2A without significant heterogeneity (Phet>1.0×10−4 and I2<75%) were advanced for TaqMan assays (Applied Biosystems) in stage 2B replication (2419 cases and 2384 controls) (see online supplementary table S3). Further information on primers and probes is available on request. For quality control purposes: (1) case and control samples were mixed on each plate; (2) genotyping was performed blind to case/control status and (3) two water controls were used in each plate as blank controls.
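Selection criteria (1) to (5) can be illustrated with a small Python sketch. It is a simplified reading of the text rather than the authors' procedure: the `Candidate` record and function names are hypothetical, the HWE check uses a single value instead of one per dataset, pairwise r2 values are assumed to be supplied by the caller, and the greedy "keep the lowest p value first" pass is one common way to implement such pruning.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical meta-analysis summary for one SNP."""
    rsid: str
    chrom: str
    pos: int                # base-pair position
    p_meta: float
    info: float             # imputation INFO score
    maf_controls: float
    hwe_p_controls: float
    p_het: float
    i2: float               # heterogeneity I2, in percent


def passes_selection_filters(c: Candidate) -> bool:
    """Criteria (1) to (4) from the text."""
    return (
        c.info >= 0.5
        and c.maf_controls >= 0.01
        and c.hwe_p_controls > 1.0e-4
        and c.p_het > 1.0e-4
        and c.i2 < 75.0
    )


def ld_prune(candidates, r2_lookup, r2_threshold=0.3, max_distance=200_000):
    """Criterion (5): greedy pruning, most significant SNP first.

    r2_lookup maps frozenset({rsid_1, rsid_2}) to pairwise r2; pairs that
    are missing from the lookup are treated as unlinked (an assumption).
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c.p_meta):
        in_ld_with_kept = False
        for k in kept:
            if k.chrom != cand.chrom or abs(k.pos - cand.pos) > max_distance:
                continue
            r2 = r2_lookup.get(frozenset((k.rsid, cand.rsid)), 0.0)
            if r2 >= r2_threshold:
                in_ld_with_kept = True
                break
        if not in_ld_with_kept:
            kept.append(cand)
    return kept
```

Note that the distance limit matters: two SNPs in LD but more than 200 kb apart are both retained under this rule.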
Quantitative reverse transcription-PCR
Total RNA was extracted from each patient's matched frozen tumour and normal surgical resection tissues using All Prep DNA/RNA/Protein kit (QIAGEN) in accordance with the manufacturer's instructions. RNA quality and quantity were determined using the RNA Nano Chip/Agilent 2100 Bioanalyzer (Agilent Technologies). Reverse transcription of RNA was done by adding 0.2–2 µg total RNA, 1 µL of oligo(dT)12–18 (500 µg/mL), 1 µL (200 units) of SuperScript II reverse transcriptase, 1 µL (2 units) of E. coli RNase and 1 µL of 10 mmol/L deoxynucleotide triphosphate (Invitrogen) in a total volume of 20 µL. All real-time PCRs (RT-PCRs) were done using an ABI 7300 Sequence Detection System. Primers and probes for the target gene and the internal control gene (GAPDH, glyceraldehyde-phosphate dehydrogenase) were designed and ordered from ABI (Assay ID: AJBJXUX; Part Number: 4441114). A singleplex reaction mix was prepared according to the manufacturer's protocol of ‘Assays-on-Demand Gene Expression Products’, including 10 µL Taqman Universal PCR Master Mix, No AmpErase UNG (2X), 1 µL of 20X Assays on-Demand Gene Expression Assay Mix (all Gene Expression assays have a 6-carboxyfluorescein (FAM) reporter dye at the 5′ end of the TaqMan minor groove binder (MGB) probe and a non-fluorescent quencher at the 3′ end of the probe) and 9 µL of cDNA (1000 ng) diluted in RNase-free water to a total volume of 20 µL. Each sample for the gene was run in triplicate and the expression level was averaged over all runs. The thermal cycling conditions included an initial denaturation step at 95°C for 10 min, followed by 40 cycles at 95°C for 15 s and 60°C for 1 min.
Statistical analysis
Association testing was performed using SNPTEST software (V.2.2), with adjustment for age, sex and study variables for NCI. Two eigenvectors (ev4 and ev8) were significantly associated with case status (p<0.05) in the baseline model (not including SNP effects) which was adjusted for age, sex, study and all top 10 eigenvectors, and therefore these two significant eigenvectors were also included to adjust for population stratification in final association models. For Nanjing and Beijing, age, sex, smoking and alcohol consumption were adjusted in baseline models. Three eigenvectors (ev1, ev4 and ev9) for Nanjing and five eigenvectors (ev1, ev3, ev7, ev9 and ev10) for Beijing were adjusted for population stratification separately, which were also significantly associated with case status (p<0.05). In each replication study, we adjusted for gender, age, smoking and alcohol consumption only.
For the meta-analysis we used the meta-module implemented in genotype library utilities (GLU) (see URLs). Strand flipping was handled by comparing alleles either with direct matching or with reverse complement matching. For A/T or G/C SNPs, strand matching was based on allele frequency checking. The fixed-effects inverse variance method was used to combine the β estimates and SEs from each GWAS scan as well as the replication stages. The p value for heterogeneity was calculated using Cochran’s Q, which is distributed as a χ2 statistic with (n−1) degrees of freedom, where n is the number of sets included in the meta-analysis. I2 was calculated as 100%×(Q−(n−1))/Q. Data analysis and management were performed with GLU or PLINK (see URLs).
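As a worked illustration of the fixed-effects inverse variance method, Cochran's Q and I2 as defined above, the following Python sketch combines per-study β estimates and SEs. It is not the GLU implementation; the function names are hypothetical, negative I2 values are truncated to 0 (a common convention that the text does not state), and the χ2 tail probability uses a closed form valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in x^i / (1*3*...*(2i+1)).
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects pooling with Cochran's Q and I2."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching beta/SE lists from at least two sets")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1                           # n - 1 degrees of freedom
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return {
        "beta": beta,
        "se": se,
        "or": math.exp(beta),
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "p": p,
        "q": q,
        "p_het": chi2_sf(q, df),
        "i2": i2,
    }
```

Here `betas` are per-allele log odds ratios, so the pooled odds ratio and its 95% CI are obtained by exponentiating the pooled estimate.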
PLINK was also used for the conditional haplotype analysis. Wilcoxon signed-rank test (R package) was applied to the tumour/normal paired quantitative RT-PCR data to assess the RNA expression level.
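For readers unfamiliar with the paired test, the sketch below shows a Wilcoxon signed-rank test using a normal approximation. It is a simplified Python illustration, not the R routine used in the study: it drops zero differences, assigns average ranks to ties, and applies no continuity or tie correction, so p values may differ slightly from those of a full implementation. The function name and return shape are assumptions.

```python
import math


def wilcoxon_signed_rank(tumour, normal):
    """Return (W_plus, z, two-sided p) for paired measurements."""
    diffs = [t - n for t, n in zip(tumour, normal) if t != n]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of ranks for pairs where tumour exceeds normal.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, z, p
```

A negative z indicates that the larger-ranked differences are mostly pairs in which the tumour value is lower than the matched normal value.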
In silico bioinformatics analysis
We used GTEx (see URLs) for expression quantitative trait loci (eQTL) information for associated SNPs (see online supplementary table S5). We also searched HaploReg (V.3)19 to explore potential functional annotations within Encyclopedia of DNA Elements (ENCODE) data for the genomic regions surrounding our lead SNPs (see online supplementary table S6).
Results
To discover additional susceptibility alleles for non-cardia gastric cancer in the Han Chinese population, we conducted a combined analysis of two previously published GWAS10 ,11 after imputing the genetic data with the 1000 Genomes Project data Phase I release (V.3).20 The combined dataset included a total of 6 223 896 SNPs for a fixed-effects meta-analysis of 2031 cases and 4970 cancer-free controls (see online supplementary table S1). Quantile-quantile and Manhattan plots based on stage 1 meta-analysis p values are shown in online supplementary figures S1 and S2, respectively. We further followed up 37 promising loci (see Methods) and genotyped them in an independent set of 1145 cases and 2253 controls in stage 2A (see online supplementary table S2). Finally, for stage 2B, we advanced five loci that were nominally significant in stage 2A to a second independent set of 2419 cases and 2384 controls (see online supplementary table S3).
Based on the overall meta-analysis including two discovery GWAS scans and two replication studies, we identified two novel risk loci for non-cardia gastric cancer. The first is rs7712641 at 5q14.3 (per-allele OR=0.84, 95% CI 0.80 to 0.88; p=1.21×10−11). No heterogeneity was observed across the two GWAS scans and two replication studies (Phet=0.56) (table 1). There are no protein-coding genes within 1 Mb of the associated SNP (chr5: 88 346 298–89 459 630; hg19) (figure 1A). However, rs7712641 is located in the intron of lnc-POLR3G-4 (XLOC_004464), a long non-coding RNA (lncRNA) which is poorly characterised. To explore the possible effect of the SNP marker on lnc-POLR3G-4, we extracted total RNA from 75 matched gastric non-cardia adenocarcinoma and adjacent normal tissue pairs and performed a quantitative RT-PCR analysis to measure its expression abundance. We found that expression differed between tumour and normal tissues (Pwilcoxon signed-rank=7.2×10−4), with the majority of pairs (50 of 75) showing lower expression in tumour compared with normal tissue (figure 2). These data provide preliminary evidence that this lncRNA could function in a manner resembling a tumour suppressor gene. However, we found no association between rs7712641 and expression of lnc-POLR3G-4 in normal tissue (p=0.99), which does not support the notion of a functional role for this SNP. Further functional studies are warranted to clarify this relationship.
As anticipated, the current study, which included samples from these previous GWAS reports,10 ,11 also replicated the association with rs4072037 (table 1; per-allele OR=0.74; 95% CI 0.69 to 0.79; p=6.28×10−17) at 1q22. However, we also identified a second strong signal in this region and by doing so established an independent, new genome-wide significant SNP rs80142782 (table 1 and figure 1B; per-allele OR=0.62; 95% CI 0.56 to 0.69; p=1.71×10−19). Based on the evidence at hand, it seems that rs80142782 is likely an independent primary signal (with rs4072037 as a secondary signal) at this locus. This evidence includes the following: (1) the two SNPs are about 323 kb apart and have moderately low pair-wise LD (r2=0.3) in 1000 Genomes Project data for Asians; (2) rs80142782 conditioned on rs4072037 remained genome-wide significant (Pconditional=3.47×10−8, see online supplementary table S5), although rs4072037 conditioned on rs80142782 did not (Pconditional=2.95×10−6, see online supplementary table S4); (3) we used the haplotype inference method implemented in the PLINK conditional haplotype test ('--chap' option). For rs4072037 and rs80142782, three of the four possible haplotypes were inferred with frequencies greater than 1%. The two models compared in the conditional haplotype likelihood ratio test are the null model: {CC} {CT, TT} and the alternative model: {CC} {CT} {TT}, where each {set} allows a unique effect. The conditional haplotype analysis demonstrated that the effect size of haplotype CT differed from that of TT among the possible haplotypes formed by these two SNPs (Plikelihood ratio=9.11×10−9) and (4) finally, it is notable that rs80142782 is Asian-specific, with a MAF of 0.05 in Asians but monomorphic in both European and African populations. Thus, rs80142782 appears to be a better association signal at 1q22 in Asian populations.
Further validation in additional studies with even larger sample sizes will be required to determine if these two SNPs are truly independent signals tagging two different causal variants.
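To make the pair-wise LD measure concrete, the Python sketch below computes r2 for two biallelic SNPs from haplotype frequencies. The frequencies in the usage note are hypothetical and chosen only for illustration; they are not the frequencies observed in this study, and the function name is an assumption.

```python
def ld_r2(f_ab, f_aB, f_Ab, f_AB):
    """r2 between two biallelic SNPs from the four haplotype frequencies.

    Lower-case letters denote the minor allele at each SNP, so f_ab is the
    frequency of the haplotype carrying both minor alleles.
    """
    total = f_ab + f_aB + f_Ab + f_AB
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = f_ab + f_aB            # minor allele frequency at SNP 1
    p_b = f_ab + f_Ab            # minor allele frequency at SNP 2
    d = f_ab * f_AB - f_aB * f_Ab
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
```

For example, with hypothetical frequencies `ld_r2(0.05, 0.10, 0.0, 0.85)`, only three of the four haplotypes are present because the rarer allele always sits on the background of the other SNP's minor allele; r2 is then modest even though the two variants are physically linked.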
The previously reported SNP marker, rs4072037 at 1q22, is a synonymous SNP in MUC1, which is a member of the mucin family that collectively forms the protective mucous barrier on epithelial surfaces. MUC1 expression was highest in stomach among all normal tissues examined by the Genotype-Tissue Expression (GTEx) project (see URLs). Evidence suggests that rs4072037 is the functional variant for this locus because it alters transcriptional regulation and determines splice variants in MUC1.15 Although MUC1 is a putative candidate gene for gastric cancer risk, it is also interesting to note that GTEx data show that rs4072037 is an eQTL for several neighbouring genes (including THBS3, GBAP1, GBA and RP11-263K19.4) in other tissues (see online supplementary table S5). rs80142782 may act on the ASH1L gene, given its close proximity. ASH1L encodes a member of the trithorax group of transcriptional activators and functions as an epigenetic regulator by histone methylation (H3K4 methyltransferase). It is frequently altered in lung cancer tumours and cell lines,21 ,22 oesophageal squamous cell carcinoma tumour tissue23 and colorectal cancer cell lines.24 It was also implicated in inflammatory autoimmune disease.25
Discussion
Our study identified a new risk locus at 5q14.3 marked by rs7712641 which lies in the intron of a lncRNA with little known prior functional characterisation. Other lncRNAs implicated in cancers include PCA3 and PCGEM1 in prostate tumour,26 and MALAT1 in tumours of the colorectum, liver, pancreas, lung, breast and prostate.27 ,28 It is remarkable that a recent comprehensive transcriptome analysis nominated a total of 7942 lineage-associated or cancer-associated lncRNA genes,29 for which further functional investigations are warranted. Overexpression or knockdown of this lncRNA may be informative in identifying target genes through analysis of differential gene expression profiles in non-cardia gastric tumour cell lines.
Our analysis also revealed an apparently stronger associated SNP (rs80142782) at 1q22 than the previously identified rs4072037. Our data indicate that the association with rs80142782 is independent of rs4072037. Both the new locus at 5q14.3 marked by rs7712641 and the new independent signal at 1q22 marked by rs80142782 could contribute to epigenome regulation. HaploReg data show that both of these SNPs (or SNPs in high LD with them) map to sites of multiple regulatory elements, including promoter histone marks, enhancer histone marks and DNase hypersensitivity (see online supplementary table S6). Further functional validation studies are warranted to understand the contribution of these susceptibility alleles to gastric carcinogenesis.
Online supplementary table S7 shows the results from our meta-analysis of two GWAS scans for previously reported variants from the literature. Notably we confirmed a prior independent GWAS report9 of an association between multiple SNPs in PSCA at 8q24.3 and risk of non-cardia gastric cancer. We also confirmed the association for rs13361707 in PRKAA1,11 but there was no additional evidence to support an association for rs9841504 in ZBTB20 with non-cardia gastric cancer in NCI data (p=0.27). Recently, Mocellin et al30 collected published data and nominated a list of 11 SNPs at eight loci with a high level of cumulative evidence for susceptibility to gastric cancer. Among these 11 SNPs, 4 (at 2q33.1, 3p24.1, 6p21.33 and 11q13.2, respectively) were identified beyond those previously established in GWAS findings, but none of these SNPs was associated (p<0.05) with gastric non-cardia cancer risk in the current meta-analysis.
In summary, by combining two pre-existing GWAS scans of non-cardia gastric cancer to increase the sample size for the discovery stage and adding over 8000 individuals for further replication, we identified two novel loci. In the future, additional studies are warranted with larger sample sizes and/or with a design that considers the heterogeneity of gastric cancer; for example, The Cancer Genome Atlas (TCGA) recently reported four molecular subtypes of gastric cancer based on multi-omics profiling analyses.31
URLs
GTEx, http://www.gtexportal.org/
GTEx MUC1 expression, http://www.gtexportal.org/home/gene/MUC1
IMPUTE2 software (V.2.2.2), http://mathgen.stats.ox.ac.uk/impute/impute_v2.html
LocusZoom, http://csg.sph.umich.edu/locuszoom/
SHAPEIT software (V.2), https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html
SNPTEST software (V.2.2), https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html
UCSC liftOver tool, http://hgdownload.cse.ucsc.edu/downloads.html
Supplementary materials
- Data supplement 1 - Online figures
- Data supplement 2 - Online tables
Footnotes
ZW, JD, NH, XM, CCA, MY, NDF and JC contributed equally to this work; SJC, YS, AMG, GJ, PRT and HS co-supervised this work.
Contributors HBS, PRT, GJ, AMG, YS, SJC, ZW, JCD, NH, XM, CCA, MY, NDF and JFC organised and designed the study; ZW, JCD, HBS, PRT, AMG and SJC wrote the first draft of the manuscript; ZW and JCD contributed to the design and execution of statistical analysis; HBS, PRT, GJ, AMG, YS, SJC, ZW, JCD, NH, XM, CCA, MY, NDF and JFC contributed to the writing of the manuscript; ZW, JCD, HBS, GJ, ZH and LB conducted and supervised the genotyping of samples; HS, LW and NH conducted the quantitative real-time PCR experiments; MY, XZ, CCC, CR, SMD, MW, TD, JBD, YTG, RZ, CG, WP, WPK, ND, LML, CY, YLQ, YJ, XOS, JPC, CW, HM, ZZ, CW, YBX, ZH, JMY, LX, WZ and DL contributed to the conduct of the epidemiological studies or contributed samples to the GWAS or follow-up genotyping.
Funding This work was supported by the Intramural Research Program of the National Cancer Institute (NCI), the National Institutes of Health, the Division of Cancer Epidemiology and Genetics and the Center for Cancer Research. This work was also supported in part by the National Basic Research Program (973) (2013CB910304), Program of National Natural Science Foundation of China (81230067), National Natural Science Foundation of China (81573228, 81521004, 81422042, 81373090); Science Foundation for Distinguished Young Scholars in Jiangsu (BK20130042); Jiangsu Natural Science Foundation (BK2012443, BK2012841); Jiangsu Province Clinical Science and Technology Projects (Clinical Research Center, BL2012008); National Program for Support of Top-notch Young Professionals, National Natural Science Foundation of China (81222038); Key Grant of Natural Science Foundation of Jiangsu Higher Education Institutions (15KJA330002); National High-Tech Research and Development Program of China (2015AA020950) and Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Competing interests None declared.
Patient consent Obtained.
Ethics approval Both the institutional review boards of NCI and Nanjing Medical University approved all procedures and all experiments, which were conducted in accordance with the approved guidelines.
Provenance and peer review Not commissioned; externally peer reviewed.