Objective To elucidate the genetic architecture of gene expression in pancreatic tissues.
Design We performed expression quantitative trait locus (eQTL) analysis in histologically normal pancreatic tissue samples (n=95) using RNA sequencing and the corresponding 1000 genomes imputed germline genotypes. Data from pancreatic tumour-derived tissue samples (n=115) from The Cancer Genome Atlas were included for comparison.
Results We identified 38 615 cis-eQTLs (in 484 genes) in histologically normal tissues and 39 713 cis-eQTL (in 237 genes) in tumour-derived tissues (false discovery rate <0.1), with the strongest effects seen near transcriptional start sites. Approximately 23% and 42% of genes with significant cis-eQTLs appeared to be specific for tumour-derived and normal-derived tissues, respectively. Significant enrichment of cis-eQTL variants was noted in non-coding regulatory regions, in particular for pancreatic tissues (1.53-fold to 3.12-fold, p≤0.0001), indicating tissue-specific functional relevance. A common pancreatic cancer risk locus on 9q34.2 (rs687289) was associated with ABO expression in histologically normal (p=5.8×10−8) and tumour-derived (p=8.3×10−5) tissues. The high linkage disequilibrium between this variant and the O blood group generating deletion variant in ABO (exon 6) suggested that nonsense-mediated decay (NMD) of the ‘O’ mRNA might explain this finding. However, knockdown of crucial NMD regulators did not influence decay of the ABO ‘O’ mRNA, indicating that a gene regulatory element influenced by pancreatic cancer risk alleles may underlie the eQTL.
Conclusions We have identified cis-eQTLs representing potential functional regulatory variants in the pancreas and generated a rich data set for further studies on gene expression and its regulation in pancreatic tissues.
- gene expression
- allele specific expression.
Statistics from Altmetric.com
Significance of this study
What is already known on this subject?
Expression quantitative trait loci (eQTLs) are common in human tissues.
There is an important lack of knowledge on the genetic regulation of gene expression in pancreatic tissues, how this regulation may be altered in pancreatic tumours and how it relates to germline loci known to influence pancreatic cancer risk.
What are the new findings?
Extensive influence was seen on gene expression by non-coding variants in pancreatic tissue samples. Approximately 23% and 42% of genes harbouring eQTLs may be specific to tumour-derived or normal-derived tissue samples, respectively, and some genes have opposite effects in normal and tumour-derived samples.
eQTLs were enriched close to transcriptional start sites and in non-coding functional elements important for pancreatic exocrine tissues, as well as highly shared with the endocrine pancreas.
A large number of eQTLs were shared between normal and tumour-derived pancreatic tissue samples; these eQTLs had stronger effect sizes and were located closer to transcriptional start sites than non-shared eQTLs.
Pancreatic cancer risk alleles on chr9q34.2 were strongly associated with ABO gene expression; this effect may be due to allele-specific effects of the risk variants on a non-coding gene regulatory element.
How might it impact on clinical practice in the foreseeable future?
Our study has no direct impact on clinical practice. However, we show that gene expression in histologically normal and tumour-derived pancreatic tissues is influenced by non-coding germline variation, including pancreatic cancer risk variants.
We have created a rich data set for future studies of gene regulation in pancreatic tissues that will be shared with the scientific research community.
Pancreatic cancer is currently the third leading cause of cancer-related deaths in the USA.1 The majority of patients present with advanced disease at time of diagnosis, resulting in a 5-year survival rate of 7%.2 In contrast to most other cancers, mortality rates for pancreatic cancer are not improving in the USA or in Europe. In the USA, it is predicted to become the second leading cause of cancer-related deaths by 2030.3 4
Most of the human genome consists of DNA that does not encode proteins. This DNA is, however, rich in functional elements that influence higher order chromatin structure and gene expression in tissue-specific and disease-specific settings. Cataloguing these elements across multiple tissues has been spearheaded by the ENCyclopedia Of DNA Elements (ENCODE) and National Institutes of Health Roadmap Epigenomics Mapping Consortia using epigenetic approaches that investigate patterns of modified histones, transcription factors (TFs), open chromatin and DNA methylation that mark genomic regions in an active or repressed conformation.5 6 An important question in human genetics is to what degree germline variation influences these functional elements and the relevance of such regulation to complex traits and diseases. Addressing this question benefits greatly from advances in genotyping and massively parallel sequencing techniques, which are increasingly being used to assess quantitative trait loci across the whole genome for gene expression, splicing, DNase hypersensitivity (DHS) and chromatin interactions.7–14
Although expression quantitative trait locus (eQTL) analyses on a genome-wide scale using RNA sequencing in lymphoblastoid cell lines were first published several years ago,15 16 the Genotype-Tissue Expression (GTEx) consortium has recently expanded this work greatly by generating publicly available data for eQTL analysis in 43 human tissues.9 A high degree of eQTL sharing has been described across tissues and individuals.9 17 18 However, to our knowledge, only one study has systematically compared eQTLs in normal and tumour-derived tissues, using 90 tissues from normal and tumour-derived colon tissues.19
To create a rich expression data set for pancreatic tissues and establish a link between germline genetic variants and gene expression in the pancreas, we undertook an eQTL study in histologically normal and tumour-derived pancreatic tissue samples. We investigated the impact of genetic variants on gene expression across the whole genome in pancreatic tissues and assessed enrichment of eQTL variants in functional elements defined by us and others.5 20 Furthermore, to begin interrogating the influence of common inherited pancreatic cancer risk variants on gene expression, we evaluated eQTLs for published pancreatic cancer risk alleles identified by genome-wide association studies (GWAS) in populations of European ancestry.21–24
Identification of eQTLs in histologically normal and tumour-derived pancreatic tissues
We sequenced the transcriptome of fresh frozen, histologically normal, pancreatic tissue samples from individuals of European ancestry. Matched DNA samples (from blood or histologically normal tissue samples) were scanned on single nucleotide polymorphism (SNP) arrays followed by imputation using the 1000 genomes reference panel (Phase 1, V.3). After quality control, 95 histologically normal samples (79 were adjacent to pancreatic tumours and 16 from non-cancerous organ donors at time of death) with high-quality mRNA-seq and genotype data (Laboratory of Translational Genomics, LTG, sample set, Supplementary Table 13) were included in the eQTL analysis (see online supplementary methods). The same analysis was performed for tumour-derived pancreatic tissue samples (pancreatic ductal adenocarcinoma, PDAC) using The Cancer Genome Atlas (TCGA) mRNA-seq and genotype data from matched blood-derived DNA samples; 115 pancreatic ductal adenocarcinoma samples (TCGA/PAAD data set, Supplementary Table 14) were included after quality control. The tumour percentage in these samples ranged from 5% to 73% (average 31%). Procedures for RNA-seq alignment, data processing and analysis are summarised in figure 1 and detailed in online supplementary figure 1.
We first assessed regulation of gene expression in cis (within +/−1 Mb from the transcriptional start site, TSS, for each tested gene) and identified 38 615 cis-eQTLs for the histologically normal tissue samples and 39 713 cis-eQTLs for the tumour-derived samples (table 1) at a false discovery rate (FDR) <0.1. This threshold corresponded to nominal p values <3.0×10−6 and <1.5×10−6 for normal and tumour-derived samples, respectively. Approximately half of the eQTLs (n=17 707) were shared between histologically normal and tumour-derived tissues at this threshold.
For each gene with significant eQTL(s), the variant with the smallest p value (pmin) was used to represent the cis-eQTL for that gene (eGene; defined as a single SNP-gene pair). This resulted in 484 and 237 cis-eQTL genes (eGenes) in histologically normal and tumour-derived tissues, respectively (table 2, online supplementary tables 1 and 2). A total of 107 eGenes were noted in both histologically normal and tumour-derived samples (FDR<0.1) and therefore considered shared eGenes (see online supplementary table 3). At nominal significance (p<0.05), 58% of eGenes (270 out of 464 eGenes tested in both sets) observed in the histologically normal samples replicated in the tumour set, and 76% of eGenes (169 out of 222 eGenes tested in both sets) seen in the tumour set replicated in the histologically normal set. One eGene each in the normal and tumour sets showed an effect in the opposite direction in the other set (at p<0.05): ALOX5 (arachidonate 5-lipoxygenase; rs11239465: βTumour=−0.71, p=6.8×10−7; βNormal=0.52, p=3.8×10−4; figure 2A) and MYADML2 (myeloid-associated differentiation marker-like 2; rs7503637 βNormal=−0.77, p=1.6×10−6; βTumour=0.55, p=0.0017). The remaining eGenes were defined as being specific for the histologically normal (n=193, 42%) or tumour-derived (n=52, 23%) tissues, respectively. Shared eGenes had similar effect sizes in the two sample sets (r2=0.90–0.95), indicating strong preservation of the direction of effect in tumours. In comparison, eGenes specific to one group were not as highly correlated (r2=0.25 for normal-specific eGenes, and r2=0.18 for tumour-specific eGenes; figure 2B). p Value distributions for eGenes observed in histologically normal samples tested in tumour tissues (figure 2C and online supplementary figure 2, left panels) and vice versa (figure 2C and online supplementary figure 2, right panels) also indicated a high degree of sharing between the two sets of tissues. We also performed the eQTL analysis in tumour samples with additional adjustments for local copy number variations and DNA methylation and noted similar results (39 713 cis-eQTLs and 271 eGenes at p<1.5×10−5; online supplementary materials and supplementary table 4).
We attempted to replicate our findings in the publicly available GTEx project eQTL data set9 generated with 149 histologically normal postmortem pancreatic tissue samples (http://www.gtexportal.org/). A higher number of eGenes were detected in this data set, likely due to differences in sample size as well as a considerably larger number of cis-eQTLs tested (LTG: 72 million; TCGA: 75 million; GTEx: 150–170 million9; see Supplementary Methods). Despite this difference, the majority of eGenes observed in the LTG and TCGA/PAAD samples replicated in GTEx. After excluding SNP-gene pairs not reported in GTEx, 82% of eGenes (323 out of 394 eGenes that were tested in both sets) observed in the histologically normal LTG samples and 79% of eGenes (146 out of 185 eGenes tested in both sets; 78.9%) observed in the tumour-derived TCGA samples replicated in GTEx at p<0.05 (see online supplementary tables 1 and 2). All shared eGenes replicated in GTEx (90 out of 107 shared eGenes were tested in GTEx). High correlation was noted between effect sizes in our set and GTEx (r2=0.90 for eGenes in the histologically normal samples and r2=0.89 in pancreatic tumour tissues). Of the 2243 overlapping eGenes in GTEx (at the GTEx defined FDR≤0.05), 51% (n=1154) replicated in the histologically normal pancreatic samples at p<0.05 (effect size correlation r2=0.91). The replication rate was lower in tumour samples, where 28% (n=608, out of 2201) GTEx eGenes replicated with marginal significance (effect size correlation r2=0.87).
Examples of eGenes that were highly specific to histologically normal pancreatic samples (LTG set and GTEx) include PPIL1 (peptidylprolyl isomerase like 1; rs6909988: P LTG=1.8×10−18, β=−1.01; P GTEx=1.0×10−42, β=−1.01; P TCGA=0.82, β=0.03), CDCA7 (cell division cycle associated 7; rs6712331: P LTG=1.9x10−11, β=1.06; P GTEx=7.3×10−26, β=1.25; P TCGA=0.71, β=0.06) and DSCC1 (DNA replication and sister chromatid cohesion 1; rs76414504: P LTG=3.5×10−9, β=−0.94; P GTEx=5.8×10−21, β=−1.02; P TCGA=0.83, β=0.04) (see online supplementary table 1). Likewise, eGenes highly specific for pancreatic tumour-derived samples included two zinc finger containing transcription factors ZNF777 (zinc finger protein 777; P TCGA=1.4×10−8, β=−1.49; P LTG=0.17, β=0.31; P GTEx=0.13, β=0.24) and ZNF746 (zinc finger protein 746; P TCGA=8.1×10−8, β=−1.46; P LTG=0.66, β=0.11; P GTEx=0.99, β=0) (see online supplementary table 2).
We next assessed pairwise sharing of cis-eQTLs identified in pancreatic tissues in our study with those identified across multiple tissue types by GTEx. The highest degree of eQTL sharing for the histologically normal LTG samples was observed for the GTEx pancreatic samples (π1=0.99) and the lowest for brain (π1=0.78). On the other hand, eQTL sharing for TCGA/PAAD samples was lower when compared with GTEx pancreas (π1=0.92) and highest for oesophagus, heart, thyroid, muscle and artery (π1=0.97–0.99), thus possibly reflecting the low tumour cellularity and increased tissue heterogeneity in the tumour samples (see online supplementary table 5). A high degree of sharing was noted for the eQTLs with a pancreatic endocrine eQTL data set generated with 118 islets,25 in particular for the histologically normal samples (LTG: π1=0.94; TCGA/PAAD: π1=0.75) (see online supplementary table 6).
Enrichment of eQTLs in putative functional regions
We next examined the distribution of cis-eQTL variants identified in this study (LTG and TCGA sample sets) with regard to specific genomic features and enrichment in regions bound by TFs, presence of modified histones or DNA sites marked by DHS using ENCODE data.5 cis-eQTLs observed in normal and tumour-derived tissues were enriched in promoters (0–3 kb upstream of TSS, ~4–5-fold), 5’untranslated regions (5’UTR; ~10–21-fold), exons (~3–6-fold) and 3’UTR (~3–4-fold), and depleted in distal intergenic regions (figure 3A). We found stronger eQTL effects closer to TSS (median −log10 p value=18.1 and median absolute effect size=0.81 for eQTLs within 50 kb from TSS) as compared with those located farther away (median −log10 p value=15.0 and median absolute effect size=0.77 for eQTLs over 50 kb from TSS) (figure 3B). In particular, shared cis-eQTLs had stronger effects (median −log10 p value 23.8 vs 15.7; median absolute effect size of 0.90 vs 0.76) and tended to be located closer to TSS (average distance 32 kb vs 110 kb) than tumour and normal tissue-specific eQTLs (figure 3B).
Based on available ChIP-seq and DNase-seq data sets from ENCODE,5 we observed significant enrichment of cis-eQTLs in DHS and genomic regions bound by modified histones that mark active chromatin, RNA polymerase II and TFs in the pancreatic cancer cell line PANC-1 (table 3A). This enrichment replicated in ChIP-seq data sets generated by our laboratory for H3K4me1 and H3K4me3 in tumour (PANC-1) and normal-derived (hTERT-HPNE) pancreatic cell lines20 with similar magnitudes (table 3B). Among three TFs with ChIP-seq data available for PANC-1 cells in ENCODE, the greatest enrichment for cis-eQTLs was observed for paired amphipathic helix protein Sin3a (SIN3A), a transcriptional regulator and epigenetic modifier most frequently associated with gene silencing.26 Likewise, we observed prominent enrichment for cis-eQTLs in regions bound by transcription factor 7-like 2 (TCF7L2), a key TF of the canonical Wnt signalling pathway.27 Enrichment of eQTLs in regions bound by the three TFs combined (SIN3A, TCF7L2 and NRSF/REST) in PANC-1 cells (1.79-fold and 1.80-fold, p<0.0001, for normal and tumour eQTLs, respectively) was more prominent than for a combined data set of 161 TFs tested in 91 cell types from multiple other tissues (1.09-fold, p=0.01, and 1.05-fold, p=0.16, for normal and tumour eQTLs, respectively). Enrichment of cis-eQTLs in DHS was assessed in the three pancreatic cell lines (PANC-1, PA-TU-8988T and HPDE6-E6E7). We observed an overall 1.45-fold enrichment (p<0.0001) for histologically normal tissue cis-eQTLs and 1.32-fold enrichment (p=0.001) for tumour tissue cis-eQTLs in the three pancreatic cell lines (range 1.53–2.10 for normal eQTLs and 1.71–2.02 for tumour eQTLs), which was higher than the enrichment in a combined DHS data set that included 125 cell types (1.06-fold, p=0.03 for normal tissue eQTLs and 1.01-fold, p=0.44 for tumour tissue eQTLs).
Enrichment of cis-eQTLs was also tested in genomic regions of accessible chromatin in endocrine pancreatic tissues.28 29 Strong enrichment was observed for promoters (1.53–1.97-fold) and transcriptionally active regions (1.53–1.56-fold) in pancreatic islets; no significant enrichment was seen in insulators or repressed chromatin (see online supplementary table 7).
eQTLs for common pancreatic cancer susceptibility loci from GWAS
To investigate if the biological mechanisms underlying common pancreatic cancer risk loci identified by GWAS relate to gene expression regulation in cis, we tested the association between published pancreatic cancer susceptibility variants21–24 and expression levels of nearby genes (see online supplementary table 8). The most significant eQTL was seen for a pancreatic cancer risk locus on chromosome 9q34.2 in the ABO (ABO blood group transferase) gene.23 The SNP that marks this locus (rs687289)21 was associated with ABO gene expression in histologically normal (p=5.8×10−8, β=0.85; figure 4A) and tumour-derived (p=8.3×10−5, β=0.51) samples (figure 4B). This association replicated in the GTEx pancreatic samples (n=149, p=3.4×10−12, β=0.89; online supplementary figures 3 and 4) as well as in most other tissues in GTEx (see online supplementary table 9). The protective allele at rs687289 (C) is highly correlated (r2=0.97 in the 1000 genomes EUR samples) with a single base pair insertion/deletion (indel) variant in exon 6 of ABO (rs8176719: −/G), which characterises the human O blood group.30 The deletion allele at rs8176719 shifts the reading frame of the ABO gene at amino acid 87 (figure 4C, filled blue circle) and introduces a stop codon ~30 amino acids downstream (figure 4C, filled red circles), leading to a non-functional protein with regard to glycosyltransferase activity. The presence of such a premature termination codon (PTC) prompted us to ask whether the non-functional O blood group (ABO ‘O’) mRNA is a substrate for the nonsense-mediated decay (NMD) mRNA degradation pathway31 as this could explain why ‘O’ variant ABO mRNA levels are lower than ‘non-O’ ABO mRNA levels. The major feature that predicts whether an mRNA is subjected to NMD is a stop codon situated >50 nucleotides upstream of the last exon–exon junction.31 32 In the ABO ‘O’ mRNA, the PTC is situated 19 nucleotides upstream of the last exon–exon junction, which means that it is not predicted to be an NMD substrate based on this criterion. However, a stop codon followed by a long 3’UTR can also trigger NMD in some cases, although this feature is not generally correlated with NMD.31 32 Mapping the 3’ end of the ABO mRNAs by 3’ rapid amplification of cDNA ends (3’ RACE) showed that the 3’UTR (2148–2858 nucleotides) is significantly longer than currently annotated by RefSeq or University of California Santa Cruz Genome Browser (UCSC) (figure 4C). The long 3’UTR could mean that both mRNAs are NMD substrates. To formally test whether NMD reduces the levels of the ABO mRNAs, we used siRNA-mediated knockdown of two crucial NMD genes, SMG6 and UPF1, in pancreatic cancer cell lines that are either homozygous (AsPC-1) or heterozygous (Capan-1) for the rs8176719 deletion allele (Supplementary Tables 15-17). The ABO ‘non-O’ mRNA is expressed at a higher level (5.3+/−1.1-fold) level than the ‘O’ mRNA in the Capan-1 cell line (figure 4E, grey bar graph), reflecting the eQTL observations (figure 4A,B). Knockdown of SMG6 and UPF1 was confirmed by western blotting (figure 4D) and concomitant significant accumulation of the positive control NMD mRNA substrate GAS5 (figure 4E) as expected. A significant ~2–2.5-fold accumulation of ABO ‘O’ mRNA levels was also observed in both cell lines on depletion of SMG6 (figure 4E, coloured bar graphs). However, since knockdown of UPF1, which acts upstream of SMG6,31 only had a modest effect on ABO ‘O’ mRNA levels, it was not clear whether the observed low levels of this mRNA were due to NMD. We therefore assessed mRNA degradation using a pulse-chase approach, which revealed no distinctive differences in the decay profiles of the ABO ‘non-O’ and ‘O’ mRNAs under control and knockdown conditions. Additionally, no major difference between the decay profiles of the two ABO mRNAs could be observed (figure 4E, line graphs). We therefore conclude that the observed differences in ABO ‘O’ and ‘non-O’ mRNA levels are likely not due to NMD or alterations in mRNA degradation rates in general.
Although we did not observe additional cis-eQTLs for pancreatic cancer risk loci identified by GWAS after corrections for multiple testing (FDR<0.1), nominally significant findings were seen for a number of risk loci. The most consistent of these was for rs16986825 on chr22q12.1, where lower XBP1 (X-box binding protein 1) expression was observed in carriers of pancreatic cancer risk-increasing alleles, both in pancreatic tumour samples (p=0.0058, β=−0.31) and in GTEx (p=0.00022, β=−0.38). This effect was not significant in histologically normal LTG samples (p=0.64, β=−0.065) (see online supplementary table 8).
Identification of allele-specific expression (ASE) loci and ASE genes
We next assessed ASE, whereby genetic loci with expressed heterozygous variants exhibit allelic ratio imbalances, as an alternate method to identify the presence of cis effects on gene regulation. We identified a total of 12 929 ASE loci across the 95 normal samples (LTG sample set) and 22 813 ASE loci across the 115 tumour samples (TCGA sample set) at FDR<0.1 (corresponding to nominal p<3.3×10−3 and p<0.01, respectively). This accounts for 3.3% and 13.6% of the heterozygous exonic loci tested (online supplementary figure 5) and a total of 6869 and 8376 genes with at least one ASE locus (ie, ASE gene) across the combined histologically normal or tumour-derived samples, respectively (online supplementary figure 6). The higher number of ASE loci observed in tumour samples is likely due to somatic copy number alterations (CNA) and clonal heterogeneity. After excluding genes disrupted by CNA in specific tumour samples, this number dropped to 7620 (FDR<0.1). The majority of eGenes observed in the histologically normal (76%) and tumour-derived (90%) samples had at least one significant ASE locus at FDR<0.1, which is 1.5-fold (p<0.001) and 1.13-fold (p=0.003) higher than expected by chance (see online supplementary tables 10 and 11).
Eight eGenes from the histologically normal sample eQTL analysis also presented with nominally significant ASE (figure 5A–D; see online supplementary table 12) including BTN3A2 (butyrophilin subfamily 3 member A2, p=5.3×10−11), FAM118A (family with sequence similarity 118 member A, p=4.4×10−10), PM20D1 (peptidase M20 domain containing 1, p=3.6×10−7) and RFWD3 (ring finger and WD repeat domain 3, p=2.6×10−4). The two pancreatic cancer GWAS risk loci that associated with gene expression by eQTL analysis also exhibited ASE in histologically normal samples with suggestive p values. Risk-increasing haplotypes at 9q34.2 and 22q12.1 were associated with more (ABO, p=0.07) or less (XBP1, p=0.052) expression in heterozygous samples, respectively (figure 5E,F), in agreement with the eQTL results described above. The directions of ASE and eQTL effects were consistent for all these loci.
Here, we describe an in-depth study of the impact of inherited genetic variants on the pancreatic transcriptome. By correlating genotypes with gene expression, we identified close to 40 000 cis-eQTLs in histologically normal and tumour-derived pancreatic tissue samples, corresponding to 484 and 237 eGenes, respectively. A high degree of eQTL sharing has been described across tissues and individuals,9 17 18 but few studies have systematically compared this in normal and tumour-derived tissues.19 In our study, 42% of eGenes were specific for the normal-derived pancreatic tissue samples and 23% for pancreatic tumours. In comparison, a recent study using 90 paired colorectal cancer and histologically normal colon tissue samples estimated that 38% and 36% of eGenes were specific for the normal and tumour-derived samples, respectively.19 While the number of pancreatic tumour-specific eGenes in our samples may be underestimated due to a relatively low tumour content (range 5%–73% in the TCGA pancreatic cancer set), these results indicate that the fraction of normal and tumour-specific eGenes may be comparable across tissues.
Genes whose genetic regulation appeared to be altered in pancreatic tumours include ALOX5, which encodes arachidonate 5-lipoxygenase, an integral enzyme in leukotriene biosynthesis. Leukotrienes are lipid mediators of inflammation that contribute to a number of diseases, including asthma, arthritis, psoriasis, cardiovascular disease and cancer.33 ALOX5 expression in pancreatic cancer cell lines induces growth, and is required for Kras-mediated pancreatic tumourigenesis in mouse models.34 35 Genetic regulation of other genes was lost in the pancreatic cancer set; these include DSCC1, a gene involved in sister chromatid cohesion, DNA replication and maintenance of genome stability,36 and CDCA7, a cell division gene important for Notch-mediated differentiation37 and MYC-dependent transformation and apoptosis.38 Whether these genes are important for pancreatic cancer needs to be formally tested; however, expression of CDCA7 and ALOX5 was increased (4.7-fold, p=6.6×10−12 and 2.3-fold, p=1.9×10−3, respectively) in pancreatic tumours in our study, supporting a possible pro-tumourigenic role in this organ.
In agreement with previous studies, the majority of significant cis-eQTLs showed strong enrichment near TSS.16 39 40 Enrichment of eQTL variants was also observed in open chromatin and genomic regions bound by modified histones and TFs, and was more pronounced in pancreatic tissues, including pancreatic islet cells, as compared with other tissue types. Although the small number of transcription factor ChIP-seq data sets available for pancreatic tissues and cell lines does not permit broad assessment of enrichment for specific transcriptional regulators, these results suggest that the eQTLs we identified are relevant for transcriptional control in the pancreas. The greatest enrichment was noted for paired amphipathic helix protein Sin3a, a scaffold protein that brings together chromatin remodelling proteins and transcriptional regulators, including histone deacetylases (HDAC1 and HDAC2), into the SIN3 corepressor complex and is important for multiple cellular roles, including proliferation, development, differentiation and oncogenesis (reviewed by Kadamb et al).26 Notably, Sin3a antagonises Myc activity, and the two proteins appear to be master transcriptional regulators that maintain a balance between cell growth and differentiation.41
A common pancreatic cancer risk variant on chr9q34.2 was strongly associated with ABO expression in normal and tumour-derived samples. Although the marker SNP for this risk locus is non-coding,21 it is strongly correlated with the O blood group generating deletion variant in the ABO gene, indicating that the eQTL could be due to nonsense-mediated mRNA decay31 32 of the ABO ‘O’ mRNA. However, we did not observe strong evidence of NMD for either the ‘O’ or ‘non-O’ ABO mRNAs. Germline alleles in the ABO gene locus influence risk of a number of traits, including cardiovascular diseases, metabolic traits, infectious diseases and cancer.42 Although the underlying mechanism(s) of these associations is not clear, some studies have suggested that this relates to the glycosyltransferase activities of the ‘A’ and ‘B’ ABO proteins (lacking in the ‘O’ protein variant) that could impact gut colonisation by pathogens,43 44 inflammation45 46 and/or blood von Willebrand factor VIII plasma levels.47 In the absence of NMD for the ABO ‘O’ mRNA, our results indicate that a gene regulatory element, whose strength depends on inherited pancreatic cancer risk variants, may influence expression of the ABO gene. The presence of an important gene regulatory element overlapping several highly significant and highly correlated ABO eQTL variants in the first intron of the gene is supported by active histone modification marks, an open chromatin structure and binding of multiple transcription factors in the ENCODE data. These findings suggest that future investigations of the molecular mechanism underlying disease risk mediated by susceptibility alleles at 9q34.2 should include a possible role for a gene regulatory mechanism on ABO gene expression and perhaps a yet undiscovered function for the ABO ‘O’ mRNA and/or protein.
A suggestive eQTL was seen for a second pancreatic cancer risk locus on chr22q12.1 for XBP1 expression. This gene encodes XBP1, which is an important effector of the unfolded protein response (UPR). The UPR is a protective response to endoplasmic reticulum (ER) stress. Its components are constitutively expressed, and can be activated in the pancreas by inflammatory conditions such as pancreatitis, obesity and diabetes.48–51 Notably, these conditions represent known epidemiological risk factors for pancreatic cancer.52 Likewise, perturbations in this mechanism of cellular recovery after ER stress may play a role in other cancers, including breast cancer, glioma and colorectal cancer.53
Taking advantage of the availability of allele-specific sequence read data, we investigated ASE to indirectly estimate the cis-regulatory effects of nearby genetic variants and identified 6869 genes with ASE in normal tissues, which complemented our eQTL findings. This analysis is complicated in tumours by unevenly distributed allelic ratios in different regions of the genome due to CNAs, somatic mutations and tumour heterogeneity, thus likely inflating the ASE estimate. Although the number of ASE genes was reduced after excluding genes with CNAs, this analysis remains imperfect in tumour-derived samples. Nonetheless, the large overlap between eGenes and ASE genes in histologically normal and tumour-derived samples indicates that tumour samples can be of considerable value in investigating the effects of germline sequence variation on gene expression.
There are several strengths and limitations to our study. An important strength is that we assessed eQTLs in two independent pancreatic tissue sample sets (LTG, histologically normal; TCGA, tumour) and replicated the majority of our findings in the publicly available GTEx Consortium data set. Tissue heterogeneity is a limitation, including a small amount of endocrine pancreatic cells (1%–2%) in the histologically normal samples and low tumour cellularity in many of the tumour samples (range 5%–73%). Likewise, as some of the histologically normal tissue samples are derived from patients with pancreatic cancer, we cannot completely exclude changes by the tumour to the transcriptome of the adjacent histologically normal tissue. Finally, a larger number of pancreatic samples would have enhanced statistical power for the eQTL analysis and accuracy for enrichment analyses.
In conclusion, we have performed a comprehensive analysis of the genetic architecture of gene expression in histologically normal and tumour-derived pancreatic tissues. We identified a large number of genetic variants that associate with gene expression in pancreatic tissues, which are enriched in gene regulatory elements that are functional in this organ. Our results may facilitate the understanding of the molecular mechanisms underlying pancreatic cancer risk alleles and provide a rich data set for future studies of the impact inherited genetic variants exert on gene expression in pancreatic tissues.
We thank the patients and donors of tissue and DNA samples that made this study possible. We are grateful to the staff at Mayo Clinic (Rochester, Minnesota), Memorial Sloan Kettering Cancer Center (New York City, New York) and Penn State College of Medicine (Hershey, Pennsylvania) for help with tissue collection and processing. We thank Bao Tran, Jyoti Shetty and other members of the NCI Center for Cancer Research (CCR) Sequencing Facility for sequencing RNA from histologically normal pancreatic tissue samples, as well as Laurie Burdett, Belynda Hicks, Amy Hutchinson and other staff at the National Cancer Institute’s Division of Epidemiology and Genetics (DECG) Cancer Genomics Research Laboratory for GWAS genotyping. This study used the high-performance computational capabilities of the Biowulf Linux cluster at NIH, Bethesda, Maryland, USA (http://biowulf.nih.gov). We would also like to convey our gratitude to Dr Anna Gloyn at the Wellcome Trust Centre for Human Genetics, the Oxford Centre for Diabetes, Endocrinology and Metabolism, and the Oxford NIHR Biomedical Research Centre at the University of Oxford, Oxford, UK, for generously providing cis-eQTL results and epigenetic chromatin states from pancreatic islets for the analyses presented in this paper. The results published here are in part based on data generated by The Cancer Genome Atlas (TCGA) managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov/. We acknowledge the clinical contributors that provided PDAC samples and the data producers of RNAseq and GWAS genotype data from TCGA Research Network. We furthermore thank the members of the TCGA PAAD AWG for providing absolute purity information for pancreatic tumour samples that were included in this analysis. The data sets used for the analyses described in this manuscript were obtained by formal permission through the TCGA Data Access Committee (DAC). We acknowledge clinical and data producers of The Genotype-Tissue Expression (GTEx) Project. This project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI\Leidos Biomedical Research subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171) and Science Care(X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute. Biorepository operations were funded through a Leidos Biomedical Research subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grant DA006227. Statistical methods development grants were made to the University of Geneva (MH090941 and MH101814), the University of Chicago (MH090951, MH090937, MH101825 and MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810) and to the University of Pennsylvania (MH101822). The data sets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap (accession number phs000424.v6.p1) on 5 March 2014. MZ and WX are currently employees of the US Food and Drug Administration (FDA). The views expressed in this article are those of the authors and not necessarily those of the FDA.
Contributors LA and MZ had full access to all data in the study and take responsibility for the integrity of the data and the accuracy of data analysis. LA: study concept and design. MZ, BZ, WX, JWH, MvdB, HP, MIM, SJC, NC, JPS, SHO, GMP and JS: contribution to study design. MZ, BZ, WX, IC, JS and LA: acquisition of data. MZ, SLA, BZ, WX, JWH, XZ, LMR, MvdB, JJ, HP, TZ, LS, AJ, CCC, BZ, WZ, THJ, MIM, NC, BMW, JPS, SHO, GMP, JS and LA: analysis and interpretation of data. MZ and LA: drafting the manuscript. SJC and KMB: critical review of the manuscript. MZ, BZ, JS and LA: statistical analysis. BAM, WRB, HO, MY, RCK, GLM, IC and LA: administrative, technical or material support. LA: funding and study supervision.
Funding This study was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health. MvdB is supported by a Novo Nordisk postdoctoral fellowship run in partnership with the University of Oxford. MIM is a Wellcome Trust Senior Investigator and is supported by Wellcome Trust awards (#098381, 090532) and NIH grants (U01DK105535). SO at MSKCC is also supported by P30CA008748, C.Thompson, PI.
Competing interests None declared.
Patient consent Consents were signed at each institution that participated in the research (Mayo Clinic and Memorial Sloan Kettering Cancer Center).
Ethics approval National Cancer Institute/NIH, Mayo Clinic, Memorial Sloan Kettering Cancer Center, Penn State University.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Data from RNA sequencing and GWAS genotyping will be deposited in dbGAP and are available from the authors upon reasonable request.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.