Original research
Integrated paired-end enhancer profiling and whole-genome sequencing reveals recurrent CCNE1 and IGF2 enhancer hijacking in primary gastric adenocarcinoma
  1. Wen Fong Ooi1,
  2. Amrita M Nargund2,
  3. Kevin Junliang Lim2,3,
  4. Shenli Zhang2,
  5. Manjie Xing1,
  6. Amit Mandoli2,
  7. Jing Quan Lim4,
  8. Shamaine Wei Ting Ho5,
  9. Yu Guo6,
  10. Xiaosai Yao1,
  11. Suling Joyce Lin1,
  12. Tannistha Nandi1,
  13. Chang Xu2,
  14. Xuewen Ong2,
  15. Minghui Lee2,
  16. Angie Lay-Keng Tan2,
  17. Yue Ning Lam1,
  18. Jing Xian Teo7,
  19. Atsushi Kaneda8,
  20. Kevin P White9,10,
  21. Weng Khong Lim2,7,
  22. Steven G Rozen2,3,7,
  23. Bin Tean Teh2,5,7,11,
  24. Shang Li2,
  25. Anders J Skanderup6,
  26. Patrick Tan1,2,5,7
  1. 1 Cancer Therapeutics and Stratified Oncology, Genome Institute of Singapore, Singapore
  2. 2 Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore
  3. 3 Centre for Computational Biology, Duke-NUS Medical School, Singapore
  4. 4 Lymphoma Genomic Translational Laboratory, National Cancer Centre Singapore, Singapore
  5. 5 Cancer Science Institute of Singapore, National University of Singapore, Singapore
  6. 6 Computational and Systems Biology, Genome Institute of Singapore, Singapore
  7. 7 SingHealth/Duke-NUS Institute of Precision Medicine, National Heart Centre, Singapore
  8. 8 Department of Molecular Oncology, Chiba University, Chiba, Japan
  9. 9 Institute for Genomics and Systems Biology, University of Chicago and Argonne National Laboratory, Chicago, Illinois, USA
  10. 10 Tempus Labs, Chicago, Illinois, USA
  11. 11 Laboratory of Cancer Epigenome, National Cancer Centre Singapore, Singapore
  1. Correspondence to Dr Patrick Tan, Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore 169857, Singapore; gmstanp{at}duke-nus.edu.sg

Abstract

Objective Genomic structural variations (SVs) causing rewiring of cis-regulatory elements remain largely unexplored in gastric cancer (GC). To identify SVs affecting enhancer elements in GC (enhancer-based SVs), we integrated epigenomic enhancer profiles revealed by paired-end H3K27ac ChIP-sequencing from primary GCs with tumour whole-genome sequencing (WGS) data (PeNChIP-seq/WGS).

Design We applied PeNChIP-seq to 11 primary GCs and matched normal tissues combined with WGS profiles of >200 GCs. Epigenome profiles were analysed alongside matched RNA-seq data to identify tumour-associated enhancer-based SVs with altered cancer transcription. Functional validation of candidate enhancer-based SVs was performed using CRISPR/Cas9 genome editing, chromosome conformation capture assays (4C-seq, Capture-C) and Hi-C analysis of primary GCs.

Results PeNChIP-seq/WGS revealed ~150 enhancer-based SVs in GC. The majority (63%) of SVs linked to target gene deregulation were associated with increased tumour expression. Enhancer-based SVs targeting CCNE1, a key driver of therapy resistance, occurred in 8% of patients, frequently juxtaposing diverse distal enhancers to CCNE1 proximal regions. CCNE1-rearranged primary GCs were associated with high CCNE1 expression, disrupted CCNE1 topologically associating domain (TAD) boundaries and novel TAD interactions. We also observed IGF2 enhancer-based SVs, previously noted in colorectal cancer, highlighting a common non-coding genetic driver alteration in gastric and colorectal malignancies.

Conclusion Integrated paired-end NanoChIP-seq and WGS of gastric tumours reveals tumour-associated regulatory SVs in regions associated with both simple and complex genomic rearrangements. Genomic rearrangements may thus exploit enhancer hijacking as a common mechanism to drive oncogene expression in GC.

  • gastric cancer

Significance of this study

What is already known on this subject?

  • Previous structural variation (SV) studies in gastric cancer (GC) have focused on gene fusions and disruptions at 3’ untranslated regions of genes. However, SVs affecting non-coding distal enhancer elements, particularly those leading to increased oncogene expression in GC, remain largely unknown.

  • High CCNE1 expression in GC is associated with both poor patient survival and resistance to anti-HER2 targeted therapies. CCNE1 overexpression has been largely attributed to copy number amplification.

  • In colorectal cancer, genomic rearrangements affecting the IGF2 oncogene have been observed.

What are the new findings?

  • Using an experimental strategy combining paired-end enhancer profiling with whole-genome sequencing, we identified GC enhancer-based SVs.

  • We observed recurrent enhancer-hijacking events at CCNE1 (8%) and IGF2 (4%).

  • Functional analysis using CRISPR/Cas9 genome editing and chromosome conformation assays reveals that hijacked enhancers are important for maintaining high CCNE1 expression.

  • Hi-C analysis of primary GCs revealed novel long-range chromatin interactions at genomic locations exhibiting CCNE1 and IGF2 rearrangements.

How might it impact on clinical practice in the foreseeable future?

  • Our results reveal that besides copy number amplification, CCNE1 expression in GC is also driven by enhancer hijacking.

  • Diagnostic platforms may need to incorporate non-coding genomic regions and epigenomic information to accurately detect enhancer hijacking events in cancer.

Introduction

Gastric cancer (GC) is a leading cause of global cancer mortality with high incidence in East Asia, Eastern Europe and Central/South America.1 Previous GC genomic studies have largely focused on protein-coding genes such as TP53, ARID1A, ERBB2 and RHOA.2 3 While important to our understanding of GC,3 these studies have been confined to protein-coding genomic regions representing less than 2% of the human genome.2–4 In contrast, recurrent structural variations (SVs) including rearrangements, translocations and inversions remain poorly described in GC.5

Initial SV studies in GC focused on identifying gene fusions6–8 and disruptions at the 3’ untranslated regions of genes including CCND1,9 FGFR2 10 and PD-L1.11 In other cancers, studies have uncovered a novel class of genetic driver alterations termed ‘enhancer hijacking’, where distal enhancers are translocated to the promoters of oncogenes such as GFI1, IGF2 and MYC.12–15 Detecting tumour-associated ‘enhancer-based SVs’ is challenging. It often requires synthesising outputs from different SV detection algorithms,16 prioritising SV breakpoints located at non-coding and often heterologous genomic regions, and discerning functionally relevant SVs from passenger SVs caused by regional genomic instability.17 Clinically, the observation that tumour-gene expression can be driven by recruitment of distal enhancer elements may require expanding current diagnostic platforms to non-coding genomic regions and integrating epigenetic information to identify functionally important SVs.

Enhancers exhibit high tissue-specificity and context-specificity18 and active enhancers are often marked by histone H3 Lys27 acetylation (H3K27ac).19 Previous studies have employed single-end H3K27ac ChIP-seq profiling for enhancer identification.19 20 Here, we performed H3K27ac ChIP-seq paired-end sequencing to identify SVs directly associated with acetylated genomic regions, an approach previously tested in B-cell lymphoma.21 By integrating SVs associated with acetylated chromatin identified by paired-end NanoChIP-seq (PeNChIP-seq) with whole-genome sequencing (WGS) analysis of a large GC cohort, we observed enhancer hijacking events in GC targeting CCNE1, IGF2 and CCND1. As high expression of CCNE1 has been associated with poor survival and resistance to ERBB2-directed therapy in GC,22 23 our results suggest that in addition to measuring CCNE1 gene amplification, diagnostic panels may also need to consider detecting CCNE1 enhancer-based SVs. Moreover, the identification of novel targets in GC such as IGF2 suggests the possibility of new therapeutic interventions.

Materials and methods

Tissue samples

‘Normal’ (ie, non-malignant) samples refer to stomach samples harvested from sites distant from the tumour and exhibiting no visible evidence of tumour or intestinal metaplasia/dysplasia on surgical assessment.

Patient and public involvement

This research was done without patient involvement. Patients were not invited to comment on the study design and were not consulted to develop patient relevant outcomes or interpret the results. Patients were not invited to contribute to the writing or editing of this document for readability or accuracy.

Paired-end NanoChIP-seq

Paired-end NanoChIP-seq was performed as previously described19 with slight modifications. Fresh-frozen tissues were dissected in liquid nitrogen to obtain ~5 mg sized pieces for ChIP, performed using H3K27ac antibodies (ab4729, Abcam). After recovery of ChIP and input DNA, whole-genome amplification was performed using the WGA4 kit (Sigma-Aldrich) and BpmI-WGA primers. About 30 ng of amplified DNA was used for each sequencing library (New England Biolabs).

SV detection and analyses

Enhancer-based SVs were detected using LUMPY24 and DELLY,27 followed by manual curation to remove SVs with features suggestive of artefacts such as repetitive or low-complexity DNA sequences and overlapping read pairs mapping to multiple distant genomic sites. We applied LUMPY (V.0.2.13).24 Each sample was genotyped using SVTyper.25 Enhancer-based SV and breakpoint lists were structured in VCF format. Files in BEDPE format were generated using the ‘vcfToBedpe’ script (https://github.com/arq5x/lumpy-sv/blob/master/scripts/vcfToBedpe), allowing detection of common or exclusive breakpoints using pairToPair or pairToBed. We excluded SVs exhibiting: (1) size (SVLEN) less than 50 bp (only applicable for SVs classified as deletions or tandem duplications), (2) neither breakpoint localised within predicted H3K27ac enriched regions (MACS2, FDR≤5%), (3) one or more breakpoints localised within ENCODE blacklisted or low complexity regions, (4) SV not supported by both discordant paired-end and split reads. After manual curation (above), a further 16 germline enhancer-based SVs were excluded. Enhancer-based SVs used for downstream analysis included: (1) SVs detected in both PeNChIP-seq and matched input using the pairToPair (–slop 20) tool, (2) SVs associated with a minimum quality score (QUAL) of 100. Identification of high-confidence SVs using WGS profiles from GC and normal samples5 was performed using LUMPY. GC-matched normal WGS profiles were analysed from in-house data, TCGA, ICGC and publicly available databases.2
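The exclusion and retention criteria listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `SV` record and all of its field names are hypothetical, the peak and blacklist overlaps are assumed to have been computed beforehand, and the two retention criteria are combined with a logical OR, which is one reading of the text.

```python
from dataclasses import dataclass

MIN_SVLEN = 50   # bp; only applied to deletions and tandem duplications
MIN_QUAL = 100   # minimum quality score (QUAL) for retention


@dataclass
class SV:
    # Hypothetical record; these field names are illustrative only and do not
    # correspond to LUMPY or SVTyper output fields.
    svtype: str             # e.g. "DEL", "DUP", "INV", "BND"
    svlen: int              # absolute SV size in bp (0 if not applicable)
    qual: float             # quality score
    bp_in_h3k27ac: tuple    # (bool, bool): breakpoint lies in an H3K27ac-enriched region
    bp_in_blacklist: tuple  # (bool, bool): breakpoint lies in a blacklisted/low-complexity region
    discordant_reads: int   # number of supporting discordant read pairs
    split_reads: int        # number of supporting split reads
    seen_in_input: bool = False  # also called in the matched input control


def is_excluded(sv: SV) -> bool:
    """Apply the four exclusion criteria described in the text."""
    if sv.svtype in ("DEL", "DUP") and sv.svlen < MIN_SVLEN:
        return True                      # (1) too small
    if not any(sv.bp_in_h3k27ac):
        return True                      # (2) neither breakpoint in acetylated chromatin
    if any(sv.bp_in_blacklist):
        return True                      # (3) breakpoint in a problematic region
    if sv.discordant_reads == 0 or sv.split_reads == 0:
        return True                      # (4) lacks one of the two evidence types
    return False


def is_retained(sv: SV) -> bool:
    """Keep SVs seen in both ChIP and input, or with QUAL >= 100 (assumed OR)."""
    if is_excluded(sv):
        return False
    return sv.seen_in_input or sv.qual >= MIN_QUAL
```

Manual curation and germline filtering, which the text also describes, are not modelled here.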

Data availability

Paired-end ChIP-seq, Hi-C, 4C-seq and Capture-C are available in Gene Expression Omnibus (GSE118392; https://www.ncbi.nlm.nih.gov/geo/). Histone ChIP-seq from MKN7 (wild type; GSE97838, GSE97837), colorectal cancer lines (GSE77737), breast cancer lines (GSE85158), in-house RNA-seq (GSE85465) and H3K27ac single-end ChIP-seq (GSE85465) were reanalysed. Normalised gene expression matrices from TCGA STAD (gdac.broadinstitute.org_STAD.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2016012800.0.0) were used. WGS profiles were processed as described in Guo et al.5

Additional methods associated with RNA-seq, Hi-C, Capture-C, 4C-seq, CapStarr-seq are provided in the online supplementary information.

Results

Detecting PeNChIP-seq regulatory structural variants

We performed paired-end H3K27ac PeNChIP-seq on a discovery set of 11 primary GCs and matched normal gastric samples (total 22 samples; online supplementary table 1). Comparison of PeNChIP-seq against previous single-end H3K27ac profiles of the same samples19 26 confirmed a high degree of correlation (0.84≤ρ≤0.91, Spearman’s rank correlation coefficient; online supplementary figure 1A, online supplementary table 2). To detect enhancer-based SVs (SVs associated with H3K27ac acetylated regions), we integrated outputs from two SV callers (LUMPY and DELLY).24 27 We prioritised SVs observed in both PeNChIP-seq and input controls, and/or PeNChIP-seq SVs with high quality sequence scores (online supplementary figure 1B). To be nominated, enhancer-based SVs were also required to be supported by both discordant paired-end reads and split reads.
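Identifying SVs "observed in both PeNChIP-seq and input controls" amounts to matching breakpoint pairs between two call sets with some positional tolerance. The sketch below illustrates the idea in plain Python. It is loosely modelled on a paired-interval comparison with 20 bp of slop and is not the bedtools implementation; the `Bedpe` type and function names are hypothetical, and strand is ignored for simplicity.

```python
from typing import NamedTuple


class Bedpe(NamedTuple):
    # Hypothetical minimal BEDPE-like record: two breakpoint intervals (half-open).
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int


def _overlaps(chrom_a, start_a, end_a, chrom_b, start_b, end_b, slop):
    # Interval A is widened by `slop` bp on each side before testing overlap with B.
    if chrom_a != chrom_b:
        return False
    return start_a - slop < end_b and start_b < end_a + slop


def same_sv(a: Bedpe, b: Bedpe, slop: int = 20) -> bool:
    """True if both breakpoints of `a` match both breakpoints of `b` (either order)."""
    direct = (_overlaps(a.chrom1, a.start1, a.end1, b.chrom1, b.start1, b.end1, slop)
              and _overlaps(a.chrom2, a.start2, a.end2, b.chrom2, b.start2, b.end2, slop))
    swapped = (_overlaps(a.chrom1, a.start1, a.end1, b.chrom2, b.start2, b.end2, slop)
               and _overlaps(a.chrom2, a.start2, a.end2, b.chrom1, b.start1, b.end1, slop))
    return direct or swapped


def shared_svs(chip_calls, input_calls, slop: int = 20):
    """Return ChIP calls that have a matching call in the input control."""
    return [a for a in chip_calls if any(same_sv(a, b, slop) for b in input_calls)]
```

The quadratic comparison in `shared_svs` is adequate for a few hundred calls per sample; larger call sets would benefit from an interval index.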

As PeNChIP-seq is a relatively new technique for SV discovery, we assessed the concordance of PeNChIP-seq detected enhancer-based SVs to conventional single-end enhancer ChIP-seq with WGS (enhancer-ChIP+WGS). To establish the enhancer-ChIP+WGS data set, we integrated single-end H3K27ac ChIP-seq and WGS SV data from 11 normal gastric samples. For each sample, we identified 72 enhancer-based SVs (average; SD=21 SVs), corresponding to 2.2% of all WGS SVs (average; SD=0.6%), comparable to other studies.28 Independently, we also identified for the same samples 310 PeNChIP-seq enhancer-based SVs (online supplementary table 3). An average of 16% of enhancer-ChIP+WGS SVs were detected using PeNChIP-seq (online supplementary figure 1C). Closer inspection highlighted two main factors explaining the 84% of SVs not detected by PeNChIP-seq. First, for many undetected SVs, we observed low PeNChIP-seq sequence coverage at breakpoints (median six reads for undetected SVs vs 20 for detected SVs, p value <2.2×10⁻¹⁶, one-sided Wilcoxon’s rank sum test; online supplementary figure 2A). Second, unlike PeNChIP-seq SVs where discordant/split aligned reads are directly associated with H3K27ac sequencing reads, enhancer-ChIP+WGS SVs comprise SVs whose breakpoints overlap with ‘genomic regions associated with histone acetylation’, where the latter regions are inferred based on initial H3K27ac reads mapping to the genome followed by subsequent in silico 3’ extension of the read cluster. This results in the inferred acetylated regions in enhancer-ChIP+WGS being larger/longer than the actual H3K27ac read cluster (typical enhancer length ~550 bp; see online supplementary methods) and a substantial number of enhancer-ChIP+WGS SVs mapping not directly but ‘nearby’ a region of H3K27ac ChIP-seq sequencing reads (online supplementary figure 2B).

Reciprocally, 61% of PeNChIP-seq enhancer-based SVs were rediscovered using conventional WGS5 (online supplementary figure 3). As an example, in sample N980417, both PeNChIP-seq and WGS identified a ~4 Kb deletion associated with acetylated chromatin at the SEC22B promoter (online supplementary figure 4). Three factors explain the 39% of SVs that were not detected by matched WGS. First, a subset of germline PeNChIP-seq SVs missed by matched WGS profiles were detected in unmatched WGS profiles from other individuals; inclusion of these unmatched WGS SVs improved the percentage of PeNChIP-SVs detected by WGS from 61% to 68% (online supplementary table 3). Second, categorisation of the undetected SVs by deletions, tandem duplications, inversions and complex variants revealed that undetected SVs are enriched in complex variants (76%) which are known to be analytically challenging (online supplementary figure 5); we found that excluding complex SVs from analysis improved the percentage of PeNChIP-seq SVs detected by WGS from 68% to 84%. Third, using orthogonal PCR and Sanger sequencing, we selected and experimentally validated five out of five PeNChIP-seq specific SVs undetected in the matched WGS profiles (100% success rate; online supplementary table 3; online supplementary figure 6) (see Discussion section).

Identification of tumour-associated enhancer-based SVs

We then applied PeNChIP-seq to the GC samples to identify tumour-associated enhancer-based SVs after removing germline SVs observed in patient-matched normal tissues, in-house matched as well as unmatched gastric WGS profiles and public germline databases (see Methods section). We identified 148 tumour-associated enhancer-based SVs (online supplementary table 4), divided into four rearrangement categories—complex variants (67%), tandem duplications (16%), deletions (16%) and inversions (1%) (figure 1A). Of these tumour-associated enhancer-based SVs, 20% (n=30) exhibited breakpoints localised at H3K27ac-enriched promoter regions (±2.5 Kb from annotated transcription start sites (TSSs) of protein-coding genes from GENCODE 19).
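The promoter annotation described above (a breakpoint within ±2.5 Kb of an annotated TSS) reduces to a simple distance test. The following is a minimal sketch; the function name, the `tss_by_chrom` mapping and the coordinate used in the example are hypothetical, and the additional requirement that the region be H3K27ac-enriched is not modelled.

```python
PROMOTER_FLANK = 2_500  # bp on either side of a transcription start site


def breakpoint_in_promoter(chrom, pos, tss_by_chrom, flank=PROMOTER_FLANK):
    """True if `pos` lies within `flank` bp of any TSS on `chrom`.

    `tss_by_chrom` maps chromosome name -> iterable of TSS positions
    (hypothetical input, e.g. derived from a gene annotation file).
    """
    return any(abs(pos - tss) <= flank for tss in tss_by_chrom.get(chrom, ()))


def percent(part, whole):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)


# Illustrative use with a made-up TSS coordinate.
example_tss = {"chr1": [1_000_000]}
print(breakpoint_in_promoter("chr1", 1_002_000, example_tss))  # inside the window
print(percent(30, 148))  # share of the 148 SVs with a promoter breakpoint
```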

Figure 1ABCD

Landscape of GC enhancer-based SVs using PeNChIP-seq and WGS. (A) Distribution of tumour-associated enhancer-based SVs by rearrangement classes (complex variants, tandem duplications, deletions and inversions). (B) Fraction of tumour-associated enhancer-based SVs with deregulated target gene expression by rearrangement classes. Yellow dotted boxes correspond to classes where the proportion of enhancer-based SVs associated with deregulated target gene expression is significant (empirical p value <0.05). Target genes were deemed deregulated if they exhibited at least fourfold difference in matched RNA-seq profiles and >1.0 FPKM difference. (C) Enhancer-based SV target genes. Enhancer-based SV-target gene pairs (dots) are plotted by their log-transformed fold change (x-axis) and log-transformed absolute difference (y-axis). Pairs exhibiting at least 32-unit FPKM differences are displayed. Target genes showing deregulated expression by at least fourfold (red: upregulation; blue: downregulation) are indicated. Boxes highlight recurrent target genes (IGF2). A full scatter plot is available in online supplementary figure 8. (D) Genome-wide tumour-associated SV frequencies. Numbers of GC samples (n=208) with SV breakpoints binned within non-overlapping 50 Kb windows. Windows associated with at least five GCs harbouring tumour-associated SVs are termed SV hotspots (black). Windows associated with tumour-associated enhancer-based SVs linked to deregulated gene expression are in red. The top recurrent SV hotspots (black) and enhancer-based SV hotspots (red) are indicated with putative target gene symbols.

Figure 1EFGH

(E) Expression levels (log-transformed RSEM) of target genes associated with enhancer-based SV hotspots identified using PeNChIP-seq. RNA-seq profiles from TCGA stomach adenocarcinoma (STAD) samples were used. The total number of GC and normal gastric samples are reported below the boxplot. (F) Expression levels (log-transformed RSEM) of target genes associated with SV hotspots identified using WGS data only. The most recurrent SV hotspots were analysed. RNA-seq profiles from TCGA stomach adenocarcinoma (STAD) samples were used. (G) Expression levels (log-transformed FPKM) of target genes associated with enhancer-based SV hotspots originally identified using PeNChIP-seq. RNA-seq profiles from in-house samples were used. Yellow shadows indicate positive associations between tumour-associated enhancer-based SVs and putative target gene expression across both data sets. (H) Expression levels (log-transformed FPKM) of target genes associated with SV hotspots identified using WGS data only. RNA-seq profiles from in-house samples were used. GC, gastric cancer; FPKM, fragments per kilobase of transcript per million mapped reads; SV, structural variation; WGS, whole-genome sequencing.

To evaluate their transcriptional impact, we linked the SVs to their predicted target genes using the GREAT tool29 (online supplementary table 5). The median distance of enhancer-based SVs to their predicted target genes was 16.6 Kb, significantly shorter than SVs from tumour WGS-only data where most SVs are likely caused by general chromosomal instability (31.6 Kb, n=986, p value=2.1×10⁻⁶, one-sided Wilcoxon’s test) (online supplementary figure 7). Quantifying target gene expression, 37%–42% of SVs in each class were associated with highly deregulated (>fourfold) target gene expression compared with normal gastric tissues (figure 1B). Most of the tumour-associated enhancer-based SVs (63%) were associated with gene activation rather than repression, consistent with H3K27ac being a marker of gene activation (figure 1C, online supplementary figure 8).
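The deregulation call used here (at least a fourfold change, with the figure 1B legend adding a >1.0 FPKM absolute difference) can be written as a small predicate. The sketch below is illustrative only: the function names are hypothetical, and the pseudocount added to avoid division by zero is an assumption, since the text does not state how zero expression values were handled.

```python
def is_deregulated(tumour_fpkm, normal_fpkm, min_fold=4.0, min_diff=1.0,
                   pseudocount=0.01):
    """True if expression differs by >= min_fold and by > min_diff FPKM.

    `pseudocount` is an assumed safeguard against zero denominators; it is
    not taken from the source.
    """
    high = max(tumour_fpkm, normal_fpkm) + pseudocount
    low = min(tumour_fpkm, normal_fpkm) + pseudocount
    fold = high / low
    diff = abs(tumour_fpkm - normal_fpkm)
    return fold >= min_fold and diff > min_diff


def direction(tumour_fpkm, normal_fpkm, **kwargs):
    """Classify a target gene as 'up', 'down' or 'unchanged' in the tumour."""
    if not is_deregulated(tumour_fpkm, normal_fpkm, **kwargs):
        return "unchanged"
    return "up" if tumour_fpkm > normal_fpkm else "down"
```

Requiring both conditions guards against calling large fold changes between two near-zero expression values, where small absolute fluctuations dominate the ratio.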

We proceeded to investigate associations between tumour-associated enhancer-based SVs, somatic copy number amplification and elevated gene expression. Forty-nine of the enhancer-based SVs (33%) were associated with elevated expression of non-amplified target genes (online supplementary figure 9), including IGF2, CCR4 and TWIST2 (>twofold compared with normal), consistent with enhancer-based SVs driving the expression of these non-amplified target genes. Additionally, 19 enhancer-driven SVs (13%) were associated with elevated expression in amplified target genes, such as CCNE1 and CCND1. This raises the possibility that for target genes exhibiting amplification and enhancer-driven SVs, enhancer-driven SVs may interact with gene dosage to collaboratively up-regulate gene expression13 (see next section).

To determine the prevalence of these tumour-associated enhancer-based SVs in a larger GC cohort, we integrated the genomic regions identified by PeNChIP-seq with WGS profiles of 208 primary GCs (online supplementary table 6).5 We also leveraged this analysis to identify rearranged genes revealed by standard WGS SV analysis (WGS-only). Applying non-overlapping 50 Kb genomic windows to determine the most frequent SVs arising from analysing WGS-only data (‘WGS-only SVs’), we identified 297 windows (~0.5%; online supplementary table 7) exhibiting somatic SVs in at least five patients (>2% of GC samples), excluding active long interspersed nuclear element-1 (LINE-1) regions30 (figure 1D, black). Consistent with SVs frequently occurring at fragile sites due to chromosomal damage from replication stress,31 five of the top ten WGS-only SV hotspots overlapped with common known fragile sites at FHIT, WWOX, MACROD2, PARK2 and IMMP2L.31 At the transcriptional level, across two independent GC cohorts (TCGA and in-house), 20% of the WGS-only SVs were associated with deregulated target gene expression in at least one cohort (online supplementary figure 10), and even after excluding regions associated with fragile sites, none of the top WGS-SV hotspots (n=5) were associated with consistent target gene deregulation (online supplementary figure 11).
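The hotspot definition above (non-overlapping 50 Kb windows with somatic SV breakpoints in at least five patients) can be illustrated with a short Python sketch. The function name and input layout are hypothetical, each patient is counted once per window regardless of how many breakpoints they contribute, and the exclusion of LINE-1 regions is not modelled.

```python
from collections import defaultdict

WINDOW = 50_000   # bp, non-overlapping bins
MIN_SAMPLES = 5   # minimum number of distinct patients per hotspot


def find_hotspots(breakpoints, window=WINDOW, min_samples=MIN_SAMPLES):
    """Return {(chrom, window_start, window_end): n_samples} for recurrent windows.

    `breakpoints` is an iterable of (sample_id, chrom, position) tuples.
    """
    samples_per_window = defaultdict(set)
    for sample_id, chrom, pos in breakpoints:
        # Integer division assigns each breakpoint to exactly one bin.
        samples_per_window[(chrom, pos // window)].add(sample_id)
    return {
        (chrom, idx * window, (idx + 1) * window): len(samples)
        for (chrom, idx), samples in samples_per_window.items()
        if len(samples) >= min_samples
    }
```

With 208 samples, the five-patient threshold corresponds to 5/208, or roughly 2.4% of the cohort, in line with the ">2% of GC samples" stated in the text.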

About 60% of the top PeNChIP-seq SVs (n=10) exhibited deregulated target gene expression in at least one cohort (online supplementary figure 10). We confirmed that these hotspots were enriched for tumour-associated enhancer-based SVs compared with germline enhancer-based SVs (ratio >1.0, empirical p value ≤6×10⁻⁴; see online supplementary results, online supplementary methods). To prioritise functionally relevant enhancer-based SVs, we selected PeNChIP-seq SVs associated with elevated target gene expression and also with one breakpoint recurrently observed in a larger GC series. We identified 68 enhancer SVs (out of 148) associated with elevated target gene expression; of these, 47 exhibited recurrent breakpoints across multiple GCs. Pathway analysis of target genes linked to the 68 enhancer-based SVs revealed that they are enriched in pathways related to regulation of cell communication, signalling, and the mitotic cell cycle (p value <1×10⁻⁴, GOrilla),32 which are plausibly related to cancer. CCNE1 and IGF2 displayed consistent trends in both cohorts (online supplementary figure 11), exhibiting gene upregulation in the presence of enhancer-based SVs (yellow highlight, figure 1E,G). CCNE1 and IGF2 were not highlighted by WGS-only SV analysis (figure 1F,H).

Recurrent CCNE1 enhancer-based SVs in GC

CCNE1 is a cell cycle regulator upregulated in several malignancies including GC, ovarian and breast cancer.33–35 CCNE1 is associated with resistance to targeted therapies9 including ERBB2-directed treatment,22 36 and CCNE1 genomic amplifications have been reported.33 However, CCNE1 as a target of enhancer-based SVs in GC has not been described. In our cohort, CCNE1-enhancer-based SVs were observed in 8% of GC patients (16 patients; online supplementary table 8), involving seven intra-chromosomal rearrangements, four inter-chromosomal translocations, three tandem duplications and two deletions. Analysing CCNE1 amplifications and enhancer-driven SVs, we found that GCs harbouring CCNE1-enhancer-based SVs but lacking CCNE1 amplification exhibited elevated CCNE1 expression, while GCs with both CCNE1-enhancer-based SVs and CCNE1 amplifications exhibited even greater CCNE1 expression (figure 1E, G, online supplementary figure 12). These results suggest that for target genes exhibiting amplification, enhancer-based SVs may interact with conventional gene dosage to collaboratively up-regulate target gene expression. Notably, in cases where CCNE1 enhancer hijacking was observed, we did not observe evidence of CCNE1 fusion genes, reducing the possibility that CCNE1 gene fusions cause the dysregulation of CCNE1 expression.

Examination of breakpoints associated with the CCNE1 SVs (online supplementary figure 13A) revealed that 63% of cases (n=10 or 4.8% of the entire study) were associated with one breakpoint consistently localised close to the CCNE1 TSS (−31 Kb to +51 Kb), with partner breakpoints at heterologous regions, including regions up to 10 Mb away on the same chromosome.15 The remaining six cases also exhibited breakpoints relatively close to the CCNE1 TSS (+52 Kb to +188 Kb), comparable to distances seen in other enhancer hijacking scenarios.14 For one CCNE1-rearranged GC (GC T2000877), the region translocated upstream to CCNE1 was enriched for H3K27ac signals relative to input controls (figure 2A). We orthogonally confirmed this CCNE1 rearrangement (figure 2B) using Sanger sequencing and PCR (figure 2C, D). This enhancer signal was not present in matched normal gastric tissues (figure 2A), indicating that it is tumour-associated. For another CCNE1-rearranged case (T990275, figure 2E), integration of the enhancer-based SV with H3K27ac and CapStarr-seq profiles (a high-throughput method of measuring functional enhancer activity37) from SNU16 GC cells confirmed that the translocating partner region (figure 2E, left) was H3K27 acetylated and exhibited functional transcriptional stimulation activity, consistent with enhancer activity.

Figure 2ABCD

Recurrent enhancer-based SVs target CCNE1 in GC. (A) Representative CCNE1 enhancer-based SV. An intra-chromosomal CCNE1 rearrangement detected by H3K27ac PeNChIP-seq and WGS is shown (T2000877). Alignments of PeNChIP-seq and WGS sequencing reads were visualised using Integrative Genomics Viewer. The red arrow indicates the direction of gene transcription. (B) Breakpoints detected at predicted enhancer regions (green) where H3K27ac enrichment is observed in the tumour (T2000877). The SV juxtaposes a tumour-associated enhancer from a region approximately 1 Mb away to a region 20 Kb upstream of the CCNE1 promoter (yellow). The symbol ‘>’ indicates the 5’–3’ strand direction. (C) Sequence validation of the CCNE1 enhancer-based SV using Sanger sequencing. (D) Orthogonal validation of the CCNE1 enhancer-based SV by PCR using primers targeting the translocated region (GC T2000877). Lane 1 shows DNA ladders; Lane 2 (filled dot) shows an amplified 715 bp fragment in the GC sample, which is much weaker in the matched normal sample (unfilled dot).

Figure 2EFG

(E) Another CCNE1 enhancer-based SV (sample T990275). Breakpoints at chromosomes 2 and 19 are indicated. This inter-chromosomal rearrangement juxtaposes an enhancer element from the KIAA1211L locus to a region approximately 21 Kb upstream of CCNE1. CapStarr-seq profiles from SNU16 GC cells confirm that the translocated enhancer region exhibits enhancer activity. CapStarr-seq probes are reflected in black at the bottom. The rearrangement is illustrated below the gene track. (F) Top four tumour-associated SVs in MKN7 cells, ranked by number of supporting reads. DNA regions associated with H3K27ac rearranged to the CCNE1 promoter are visualised. Predicted enhancers (blue) and super-enhancers (black) are indicated. Long range cis-interactions by 4C-seq with the CCNE1 promoter (bait, blue triangle) are indicated in green. Distances between promoter and SV breakpoints are indicated. The region (e1) marked with scissors (orange) was deleted using CRISPR/Cas9. (G) CCNE1 expression in wild-type (control, Cas9-coexpressing) and enhancer-deleted (e1-deleted) MKN7 cells measured in three technical replicates per biological replicate (n=5) using RT-qPCR. CCNE1 expression was also measured in two additional control experiments (three technical replicates per biological replicate, n=2 each), both also using Cas9-coexpressing MKN7 cells: a non-targeting control sgRNA, and deletion of a 100 Kb region in the same genomic neighbourhood as CCNE1 but outside the CCNE1 topologically associating domain (TAD). Transfections were performed for 9 days. P values were computed using Prism (GraphPad). GC, gastric cancer; SV, structural variation; WGS, whole-genome sequencing.

Extending our analysis beyond primary tumours, we then interrogated WGS data of GC cell lines (CLS145, MKN7, SCH, IST1 and YCC18) exhibiting high CCNE1 expression (online supplementary figure 13B). Two lines (MKN7, IST1) harboured CCNE1-enhancer-based SVs (online supplementary table 9) co-occurring with CCNE1 amplification. Analysis of MKN7 cells revealed that the four CCNE1 SVs with the highest number of supporting reads (left, figure 2F) included an inter-chromosomal event (CTX), two intra-chromosomal rearrangements (ITX) and a separate 3.8 Mb tandem duplication (DUP) (validated by Sanger sequencing, online supplementary figure 13C,D). Examining putative hijacked enhancer elements using matched H3K27ac profiles (right, figure 2F) and 4C-seq (online supplementary figure 13E), we confirmed that these hijacked regions were associated with acetylated chromatin (H3K27ac) at promoter-distal regions (blue, black rectangles, figure 2F). Using 4C-seq, we further confirmed that the putative hijacked enhancers exhibited long-range cis-interactions with the CCNE1 promoter, despite originating from distal regions (figure 2F, online supplementary figure 13E). These SVs are thus likely to juxtapose active enhancers (blue, black rectangles, figure 2F) from distal loci to CCNE1-adjacent regions, resulting in novel enhancer-promoter interactions (green, figure 2F).

To demonstrate a causal relationship between hijacked enhancers and CCNE1 expression, we selected the tandem duplication-associated enhancer event for testing, as this enhancer lies nearest to the CCNE1 promoter, is associated with cis-interactions with the CCNE1 promoter, and exhibits high H3K27ac signals. Using CRISPR/Cas9 genome editing, we deleted the hijacked enhancer region in MKN7 cells (scissor, figure 2F). After confirming CRISPR deletion efficiencies (online supplementary figure 13F), we used RT-qPCR to compare CCNE1 expression levels between enhancer-deleted MKN7 cells and control MKN7 cells co-expressing Cas9. Experiments were performed across five independent biological replicates with three technical replicates each. We observed a 20% reduction in CCNE1 expression on deletion of the hijacked enhancer (p value <0.0001, two-sided t-test). We further performed two control experiments, also using Cas9-coexpressing MKN7 cells, testing either a non-targeting control sgRNA, or deletion of a 100 Kb region in the same genomic neighbourhood as CCNE1 but outside the CCNE1 topologically associating domain (TAD); these showed no alteration of CCNE1 transcripts (figure 2G). Taken collectively, these results reveal that in primary GCs and cell lines, CCNE1 enhancer-based SVs frequently juxtapose distal active enhancers to the CCNE1 proximal region, driving CCNE1 overexpression in conjunction with CCNE1 amplification.

CCNE1 enhancer hijacking events disrupt TAD boundaries

Human genomes are partitioned into TADs, which are 3D chromosomal domains largely invariant across cell types (mean 830 Kb size).38 Within individual TADs, genomic interactions between regulatory elements are strong, whereas interactions between TADs are insulated.38 We proceeded to investigate CCNE1 enhancer-promoter interactions and their association with TAD boundaries. First, to survey regulatory interactions associated with the CCNE1 promoter in the absence of genomic rearrangements, we applied Capture-C technology to a GC cell line (SNU16) lacking CCNE1 enhancer-based SVs (yellow tracks, figure 3A; online supplementary table 10). We observed multiple interactions between the CCNE1 promoter (viewpoint, figure 3A) and putative distal enhancers (H3K27ac+/H3K4me1+/H3K4me3-/CapStarr-seq+; figure 3A), indicating that the CCNE1 promoter can exhibit long-range interactions both upstream (solid yellow line) and downstream of the CCNE1 gene (dotted yellow line, figure 3A). Similar results were obtained using 4C analysis (online supplementary figure 13E). Second, to map CCNE1 promoter-enhancer interactions across TADs, we integrated the Capture-C interactions with TADs predicted from Hi-C chromosome conformation capture sequencing using data from SNU16 GC cells (online supplementary table 11), human embryonic stem cells, and human foetal lung cells (IMR90). As predicted, the CCNE1 promoter-enhancer interactions were bounded within a single TAD across all three cell types (figure 3B). These results suggest that in GCs lacking CCNE1 enhancer-based SVs, CCNE1 is regulated via promoter-enhancer interactions that generally do not span more than 100–200 Kb and are bounded within a typical TAD (average 830 Kb).38
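Whether a set of promoter-enhancer interactions is "bounded within a single TAD" can be checked by assigning each locus to a domain and comparing the assignments. The sketch below illustrates this with hypothetical function names and made-up TAD coordinates; it assumes TADs on one chromosome are supplied as sorted, non-overlapping half-open intervals and does not reproduce the TAD-calling procedure used in the study.

```python
from bisect import bisect_right


def tad_index(tads, pos):
    """Index of the TAD containing `pos`, or None if it falls outside all TADs.

    `tads` is a sorted list of non-overlapping (start, end) half-open
    intervals on a single chromosome.
    """
    starts = [start for start, _ in tads]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and tads[i][0] <= pos < tads[i][1]:
        return i
    return None


def within_single_tad(tads, viewpoint, partners):
    """True if the viewpoint and all interacting partners share one TAD."""
    home = tad_index(tads, viewpoint)
    return home is not None and all(tad_index(tads, p) == home for p in partners)


# Made-up coordinates for illustration only (not the CCNE1 TAD calls).
example_tads = [(29_000_000, 29_900_000), (29_900_000, 30_730_000),
                (30_730_000, 31_500_000)]
print(within_single_tad(example_tads, 30_300_000, [30_250_000, 30_340_000]))
print(within_single_tad(example_tads, 30_300_000, [30_250_000, 31_300_000]))
```

The same test, applied to the two breakpoints of a rearrangement rather than to interaction partners, gives a simple way to flag SVs that join sequences from different domains.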

Figure 3ABCD

Enhancer hijacking activates CCNE1 overexpression. (A) CCNE1 enhancer-promoter interactions. CCNE1 interaction profiles from SNU16 GC cells were generated using Capture-C with the CCNE1 promoter as the viewpoint (blue cross). Interactions (with corresponding statistical significance as −log q values) are shown as yellow bars. H3K27ac, H3K4me1 and H3K4me3 profiles of SNU16 cells are shown in red, blue and green, respectively. CapStarr-seq signals, reflecting functional enhancer activity from SNU16 cells, are shown in the last track (black). The CCNE1 promoter shows interactions (yellow boxes) with both upstream (solid line) and downstream enhancer elements (about 40 Kb downstream, dotted line). (B) Predicted TADs at the CCNE1 locus using Hi-C data from SNU16 GC cells. The heat map shows the strength of interaction signals between genomic regions. TADs observed in SNU16 cells exhibited strong similarity with TADs predicted using human embryonic stem cells and IMR90 (human foetal lung cells). (C) Location of CCNE1-associated SVs detected in primary GCs and GC cell lines from whole-genome sequencing (WGS, n=18). Cases were divided into two categories: tumour-associated SVs with no TAD disruptions (n=8) or predicted TAD disruptions (n=10). Regions from different TADs (intra-chromosomal rearrangements) or from other chromosomes (inter-chromosomal rearrangements) are distinguished by colours. Filled circles indicate cases harbouring SVs close to the CCNE1 promoter (−31 Kb to +51 Kb). (D) Hi-C contact matrix generated from GC T2000877 illustrates the frequency of chromatin interactions between genomic regions chr19:30 000 000–32 000 000. Histone H3K27ac profiles from the same GC and its matched normal (N2000877) at the CCNE1 locus are displayed on top. Locations of two tumour-associated SVs (one tandem duplication, one rearrangement) are indicated. The percentages in brackets indicate the fraction of reads supporting an SV. Yellow boxes represent contact domains at the 10 Kb resolution. Black dotted circles indicate new interactions seen in T2000877 but not in control SNU16 GC cells, associated with the tandem duplication and the rearrangement.

Figure 3E

(E) Model for a predicted enhancer hijacking mechanism arising from a tandem duplication that causes upregulation of CCNE1 expression in GC T2000877. DNA harbouring predicted super-enhancers (black box) from one TAD is juxtaposed inside the CCNE1-containing TAD, enabling CCNE1 upregulation. Another hijacking event arising from an intra-chromosomal rearrangement is shown in figure 2A. GC, gastric cancer; SV, structural variation; TAD, topology associated domain.

Importantly, we then found that several of the CCNE1-enhancer-based SVs were predicted to disrupt these pre-existing TAD boundaries, which might lead to the pathological rewiring of gene-enhancer interactions.13 39 Specifically, we observed 10 cases of putative TAD disruptions in eight GC primary samples and two GC cell lines, through inter-chromosomal (n=5) or intra-chromosomal (n=3) rearrangements, one tandem duplication and one deletion (figure 3C), and of these seven out of 10 cases (filled dot, figure 3C) showed one breakpoint close to the CCNE1 promoter. As an example, we observed predicted enhancers (supported by H3K27ac and CapStarr-seq) at the chr2 KIAA1211L locus translocated close to the CCNE1 promoter, disrupting pre-existing TAD boundaries (figure 2E).
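The distinction between SVs with and without predicted TAD disruption can be sketched as a simple rule: an SV is flagged when its two breakpoints fall in different TADs, or on different chromosomes. The snippet below is an illustrative assumption of how such a rule might be coded, not the pipeline used in this study; the boundary coordinates and function names are hypothetical, and the single boundary list is assumed to describe the chromosome of the first breakpoint only.

```python
from bisect import bisect_right

def tad_index(boundaries, pos):
    """Index of the TAD containing pos, given sorted TAD boundary positions."""
    return bisect_right(boundaries, pos)

def disrupts_tad(boundaries, chrom_a, pos_a, chrom_b, pos_b):
    """True if an SV joins two chromosomes or two different TADs."""
    if chrom_a != chrom_b:
        return True  # inter-chromosomal: partner DNA comes from outside the TAD
    return tad_index(boundaries, pos_a) != tad_index(boundaries, pos_b)

# Hypothetical TAD boundaries on one chromosome (placeholders, not study data).
boundaries = [29_500_000, 30_600_000, 31_400_000]

within = disrupts_tad(boundaries, "chr19", 30_100_000, "chr19", 30_300_000)
across = disrupts_tad(boundaries, "chr19", 30_100_000, "chr19", 31_000_000)
inter = disrupts_tad(boundaries, "chr19", 30_100_000, "chr2", 99_000_000)
print(within, across, inter)
```

A binary search over sorted boundaries keeps the lookup cheap when many breakpoints are screened; a real analysis would also need per-chromosome boundary sets and some tolerance around boundary positions.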

To directly confirm TAD disruptions caused by CCNE1-enhancer-based SVs, we performed Hi-C analysis (online supplementary table 11) on a CCNE1-rearranged primary GC containing both a CCNE1 intra-chromosomal rearrangement and a CCNE1 tandem duplication (T2000877). We visualised TAD interactions in this region as a contact matrix (chr19:30–32 Mb; top, figure 3D) and, as a negative control, compared similar contact matrices from SNU16 GC cells where no CCNE1 SVs were observed (bottom, figure 3D). Besides confirming high numbers of interactions (in red) within pre-existing TADs (yellow box, Arrowhead at 10 Kb resolution), we also observed de novo interaction patterns at breakpoints associated with the cross-TAD CCNE1 rearrangement and tandem duplication (black dotted circle, figure 3D). These results demonstrate that CCNE1 enhancer-based SVs may rewire local regulatory circuits (figures 2B and 3E) by disrupting pre-existing TAD boundaries and causing new long-range chromatin interactions.

PeNChIP-seq also reveals IGF2 and CCND1 enhancer hijacking in GC

Besides CCNE1, our analysis further revealed enhancer-based SVs affecting IGF2 and CCND1 (online supplementary table 12), collectively occurring in 4.8% of GCs (IGF2: eight cases; CCND1: two cases). At IGF2, enhancer-based SVs were observed in the form of tandem duplications (figure 4A, online supplementary figure 14A, online supplementary table 12) and linked to IGF2 overexpression in both primary GCs (figure 4B) and cell lines (online supplementary figure 14B,C). Associations between IGF2 SVs and IGF2 overexpression have been observed in colorectal cancer.13 Comparison of primary GC histone profiles against matched normal gastric tissues confirmed that the hijacked enhancer is tumour-associated (online supplementary figure 14D). When compared against gastric, colorectal and breast cancer cell lines,20 we observed similar enhancer acetylation patterns in gastric and colorectal cancer but not in breast cancer (online supplementary figure 14D–F), suggesting that IGF2 enhancer-based SVs are likely lineage-specific and shared across gastro-intestinal malignancies. We also found that IGF2 enhancer-based SVs are associated with TAD disruptions, as an IGF2-rearranged primary GC (T990275) (top, figure 4C) exhibited novel chromatin interactions at the IGF2 locus between pre-existing contact domains (yellow) and peaks (black dotted circles, figure 4C) compared with SNU16 GC cells that have a wild-type IGF2 genomic architecture (figure 4C). Survival analysis revealed that patients exhibiting high IGF2 expression showed poorer overall survival than patients exhibiting low IGF2 expression (TCGA STAD, online supplementary figure 15; p value=2.5×10⁻², log-rank test). In a multivariate analysis, the association with survival remained statistically significant (Cox regression p value=2.3×10⁻²; HR=1.8, 95% CI 1.08 to 2.90) after adjusting for other risk factors, such as age, stage, patient locality and histological subtype.
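As a worked check on the Cox regression summary above, the reported hazard ratio, confidence interval and p value can be related to one another under a standard Wald (normal) approximation on the log-hazard scale. The sketch below uses only the figures quoted in the text; the function name is hypothetical, and the Wald assumption is ours rather than a stated detail of the analysis.

```python
import math

def wald_from_ci(hr_lower, hr_upper, z_crit=1.959964):
    """Recover the log-scale midpoint HR, SE, z and two-sided p from a 95% CI."""
    log_lo, log_hi = math.log(hr_lower), math.log(hr_upper)
    log_hr = (log_lo + log_hi) / 2            # CI is symmetric on the log scale
    se = (log_hi - log_lo) / (2 * z_crit)     # half-width divided by 1.96
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail probability
    return math.exp(log_hr), se, z, p

# Reported interval: 95% CI 1.08 to 2.90
hr_mid, se, z, p = wald_from_ci(1.08, 2.90)
print(f"HR from CI midpoint: {hr_mid:.2f}, z = {z:.2f}, p = {p:.3f}")
```

The midpoint HR comes out at about 1.77 and the implied two-sided p value at about 0.023, in line with the reported HR of 1.8 and p value of 2.3×10⁻².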

Figure 4

IGF2 and CCND1 enhancer-based SVs in GC. (A) Example of a tandem duplication at the IGF2 locus. The tandem duplication, occurring in GC T990275, is indicated by two breakpoints (dotted line). The duplication is predicted to juxtapose a normally downstream enhancer element (black) against the IGF2 upstream region. Similar tandem duplications have been observed in colorectal cancer.13 (B) Differential IGF2 expression across 11 GC/matched normal samples with PeNChIP-seq profiles (log-transformed fold change). Samples GC T990275 and GC T2000877, both carrying an IGF2 tandem duplication, show upregulation of IGF2 expression.

Figure 4C

(C) Chromatin contact matrix of GC T990275 using Hi-C data. The matrix illustrates the frequency of chromatin interactions between genomic regions at chr11:1 600 000–2 400 000. Yellow squares indicate predicted contact domains from IMR90, which are similar to those in matrices from control SNU16 cells (an IGF2 wild-type GC line). Black circles in GC T990275 indicate chromatin loops between two loci. Grey dotted lines highlight the location of breakpoints. The star indicates the location proximal to IGF2. The top rows highlight histone H3K27ac profiles from the GC sample (T990275) and the matched normal (N990275) at the IGF2 locus, and the location of the tumour-associated tandem duplication is indicated.

Figure 4DE

(D) Example of an intra-chromosomal rearrangement at the CCND1 locus. The rearrangement, occurring in GC T980417, is predicted to juxtapose a predicted super-enhancer element from the PDHX locus against the CCND1 upstream region. An illustration of the rearrangement is provided at the bottom. The red arrow indicates the direction of gene transcription. The symbol ‘>’ indicates the 5’–3’ strand direction. (E) Differential CCND1 expression across 11 GC/matched normal samples with PeNChIP-seq profiles (log-transformed fold change). Sample GC T980417, carrying a CCND1 rearrangement, shows upregulation of CCND1 expression. GC, gastric cancer; SV, structural variation.

Finally, we observed tumour-associated enhancer-based SVs (figure 4D) supported by discordant paired and split reads at the CCND1 locus. Similar to CCNE1, the CCND1 SVs were associated with both CCND1 amplification and marked upregulation of CCND1 relative to other GC samples lacking CCND1 rearrangements (figure 4E). Analysis of the CCND1-partner breakpoint revealed that it involved a predicted enhancer associated with the PDHX locus in GC, distinct from previous CCND1-hijacking events in lymphoma which involve IGH.21 Taken collectively, these cases demonstrate that in GC, oncogenes such as CCNE1, IGF2 and CCND1 can be activated by rewiring regulatory circuits via enhancer-hijacking.

Discussion

SVs at non-coding regions can cause enhancer-hijacking, where distal enhancer elements are juxtaposed against cancer genes. Initially identified in medulloblastoma through the activation of the GFI1 and GFI1B proto-oncogenes,12 enhancer hijacking events have two major consequences. First, tissue-specific enhancer elements are relocalised to the vicinity of oncogene promoters, thereby instigating oncogene activity. Second, genomic rearrangements may disrupt wild-type TAD boundaries, resulting in aberrant enhancer-promoter interactions between regulatory elements normally insulated from one another. In the current study, we employed PeNChIP-seq to identify enhancer hijacking events in GC. Compared with conventional enhancer-ChIP+WGS, benchmarking analysis revealed that only a subset of enhancer-based SVs were detected by PeNChIP-seq (online supplementary figure 1C). The performance of PeNChIP-seq may be improved by technical and algorithmic modifications, such as decreased DNA shearing to increase the size of library fragments, increasing the PeNChIP-seq sequencing depth, and more sophisticated methods to dissect complex SV events.

We identified tumour enhancer-based SVs associated with upregulation of CCNE1 and IGF2 expression in GC. In GC, CCNE1 has been associated with copy number amplification,33 and in liver cancer as a target of viral integration and enhancer hijacking.40 Our results suggest that for GC target genes exhibiting amplification such as CCNE1, enhancer-based SVs may interact with conventional gene dosage to collaboratively up-regulate target gene expression (online supplementary figure 12). Supporting this hypothesis, we experimentally confirmed that deletion of a CCNE1-hijacked enhancer in MKN7 cells (harbouring both CCNE1 enhancer-based SVs and CCNE1 amplification) caused decreased CCNE1 expression. Cyclin E1 (CCNE1) encodes a cyclin that stimulates Rb phosphorylation by CDK2, controlling the transition of cells from G1 to S phase. Breakpoints at this locus were detected in a significant fraction (8%) of GC patients, comparable to rates of ERBB2 amplification, the molecular target of trastuzumab.41 From a therapeutic perspective, overexpression of CCNE1 has been associated with poorer patient survival,23 and CCNE1 co-amplification has been reported as a collaborative oncogenic event conferring resistance to ERBB2-directed therapies in gastro-oesophageal malignancies.22 Cell lines harbouring CCNE1 amplification have also been associated with resistance to the CDK4/6 inhibitor abemaciclib,9 and MKN7 GC cells have been reported to be sensitive to bortezomib,42 highlighting a possible treatment strategy for CCNE1-addicted GCs. Notably, the CCNE1 genomic locus also contains URI1, which has been reported as an oncogene in ovarian and colorectal cancer.43 44 However, comparison of expression data demonstrated that CCNE1 rearrangements resulted in greater upregulation of CCNE1 than URI1 (CCNE1: median 40×, average 29×; URI1: median 7×, average 5×), motivating us to focus on CCNE1.

Besides CCNE1, our findings also identified IGF2 as another target of enhancer hijacking in GC. In gastric tumours, IGF2 enhancer hijacking employed a mechanism very similar to that previously reported in colorectal cancer,13 involving a tandem duplication causing de novo formation of a TAD interaction domain (proven by Hi-C) and mobilising a tissue-specific super-enhancer normally inaccessible to IGF2. IGF2 is known to play a role in the growth and proliferation of cells, and in our study IGF2 upregulation in GC was dramatic (>100-fold). In colorectal cancer, high IGF2 expression has been associated with poor prognosis,45 and IGF2 knockdown in paclitaxel-resistant ovarian cancer cells restored drug sensitivity.46 In addition, intron 8 of IGF2 contains the microRNA MIR483, which has also been reported to exhibit oncogenic potential.47 Co-occurrence of this enhancer hijacking event in both gastric and colorectal cancer supports non-coding enhancer hijacking as a dominant mechanism of IGF2 overexpression in gastro-intestinal malignancies.

Our study is one of the first to experimentally confirm the establishment of de novo chromatin interactions at the locations of enhancer-based SVs in primary epithelial tumours, as shown by Hi-C data for both CCNE1 and IGF2. These results are consistent with recent studies in other tumour types highlighting somatic copy number alterations and genome rearrangements as a mechanism by which local insulator domains may be disrupted, leading to the creation of de novo chromatin and regulatory interactions.13 14 Experimental verification of such interactions has also been demonstrated, in our present study through Hi-C analysis of primary GCs, and in other studies by 4C/CTCF-ChIPseq assays in colorectal cancer spheroids,6 by in situ Hi-C analysis in neuroblastoma cell lines,11 and recently in primary diffuse large B-cell lymphoma.48

The observation that tumour-gene expression can be driven by recruitment of enhancer elements may have diagnostic and therapeutic implications. First, since enhancer-based SV breakpoints usually occur within non-exonic regions, these events are unlikely to be detected using exome-based or panel-based tumour sequencing where only protein-coding exons are examined. Second, diagnostic platforms may require incorporation of epigenetic information to distinguish functionally important SVs driving tumour gene expression from bystander SVs arising from background genomic instability. Third, examining gene targets of enhancer hijacking may reveal new therapeutic targets missed by exome-based mutation screening. In the case of IGF2, recent studies have shown that IGF2-overexpressing colorectal cancers may exhibit sensitivity to IGF1R/INSR inhibitor compounds.49 Fourth, for gene targets of enhancer hijacking that are themselves considered ‘undruggable’ (eg, CCNE1), disrupting the activity of the hijacked enhancer, by either genome-editing or targeting epigenetic complexes associated with the hijacked enhancer, may represent another therapeutic opportunity.

In conclusion, our findings highlight the dynamic interplay between the cancer genome and epigenome.19 26 We note that besides CCNE1, IGF2 and CCND1, we also observed tumour-associated enhancer-based SVs causing upregulation of several other genes without previously described roles in GC, such as DLGAP1 and PKDCC, which may represent novel oncogenes. Our results thus demonstrate how non-coding rearrangements may influence tumour gene expression in GC through the rewiring of cis-regulatory elements.

Acknowledgments

We thank all members of the Singapore Gastric Cancer Consortium for their contributions and support, and the Duke-NUS Genome Biology Facility for sequencing services.

Footnotes

  • WFO and AMN contributed equally.

  • Contributors PT led the study and was involved in its conception, design, data collection and assembly, and manuscript writing. WFO, AMN and KJL were involved in study conception, design, data analysis and manuscript writing. JQL, YG, SJL, TN, JXT and WKL were involved in data analysis. AMN, SZ, MX, AM, SWTH, XY, CX, XO and YNL were involved in performing experiments. ML and AL-KT provided genomic profiling (sequencing and microarray) and data preprocessing services. AK, KPW, SR, BTT, SL, AJS provided facilities, reagents and intellectual input. WFO and AMN contributed equally to this article. All authors were involved in proof-reading and gave final approval of the manuscript.

  • Funding This work was supported by the National Research Foundation Singapore under its Translational and Clinical Research (TCR) Flagship Programme and administered by the Singapore Ministry of Health’s National Medical Research Council (TCR/009-NUHS/2013) and grant NMRC/STaR/0026/2015. Other sources of support include A*STAR A*ccelerate GAP fund (ETPL/15-R15 GAP-0021), core funding from Duke-NUS Medical School, and the Cancer Science Institute of Singapore, NUS, under the National Research Foundation Singapore and the Singapore Ministry of Education under its Research Centres of Excellence initiative.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval Primary patient samples were obtained from the SingHealth tissue repository with Institutional Review Board approval and signed patient informed consent.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available in a public, open access repository. All data relevant to the study are included in the article or uploaded as supplementary information.
