Original research
Multiregion whole-exome sequencing of intraductal papillary mucinous neoplasms reveals frequent somatic KLF4 mutations predominantly in low-grade regions
  1. Kohei Fujikura1,
  2. Waki Hosoda1,2,
  3. Matthäus Felsenstein1,3,
  4. Qianqian Song4,
  5. Johannes G Reiter5,
  6. Lily Zheng6,
  7. Violeta Beleva Guthrie7,
  8. Natalia Rincon8,
  9. Marco Dal Molin9,
  10. Jonathan Dudley9,
  11. Joshua D Cohen9,
  12. Pei Wang4,
  13. Catherine G Fischer1,
  14. Alicia M Braxton1,
  15. Michaël Noë1,
  16. Martine Jongepier1,
  17. Carlos Fernández-del Castillo10,
  18. Mari Mino-Kenudson11,
  19. C Max Schmidt12,
  20. Michele T Yip-Schneider12,
  21. Rita T Lawlor13,
  22. Roberto Salvia14,
  23. Nicholas J Roberts1,
  24. Elizabeth D Thompson1,
  25. Rachel Karchin8,
  26. Anne Marie Lennon15,
  27. Yuchen Jiao4,
  28. Laura D Wood1
  1. 1 Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
  2. 2 Department of Pathology and Molecular Diagnostics, Aichi Cancer Center, Nagoya, Japan
  3. 3 Department of Surgery, Charité Universitätsmedizin, Berlin, Germany
  4. 4 State Key Lab of Molecular Oncology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  5. 5 Canary Center for Cancer Early Detection in Department of Radiology, Stanford Cancer Institute, and Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
  6. 6 McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
  7. 7 Personal Genome Diagnostics, Baltimore, Maryland, USA
  8. 8 Institute for Computational Medicine and Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
  9. 9 Ludwig Center for Cancer Genetics and Therapeutics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
  10. 10 Department of Surgery, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
  11. 11 Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
  12. 12 Department of Surgery, Indiana University School of Medicine, Indianapolis, Indiana, USA
  13. 13 ARC-NET: Centre for Applied Research on Cancer, University and Hospital Trust of Verona, Verona, Italy
  14. 14 General and Pancreatic Surgery Department, The Pancreas Institute and Hospital Trust of Verona, Verona, Italy
  15. 15 Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
  1. Correspondence to Dr Laura D Wood, Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA; ldwood{at}; Dr Yuchen Jiao, State Key Lab of Molecular Oncology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; jiaoyuchen{at}


Objective Intraductal papillary mucinous neoplasms (IPMNs) are non-invasive precursor lesions that can progress to invasive pancreatic cancer and are classified as low-grade or high-grade based on the morphology of the neoplastic epithelium. We aimed to compare genetic alterations in low-grade and high-grade regions of the same IPMN in order to identify molecular alterations underlying neoplastic progression.

Design We performed multiregion whole exome sequencing on tissue samples from 17 IPMNs with both low-grade and high-grade dysplasia (76 IPMN regions, including 49 from low-grade dysplasia and 27 from high-grade dysplasia). We reconstructed the phylogeny for each case, and we assessed mutations in a novel driver gene in an independent cohort of 63 IPMN cyst fluid samples.

Results Our multiregion whole exome sequencing identified KLF4, a previously unreported genetic driver of IPMN tumorigenesis, with hotspot mutations in one of two codons identified in >50% of the analyzed IPMNs. Mutations in KLF4 were significantly more prevalent in low-grade regions in our sequenced cases. Phylogenetic analyses of whole exome sequencing data demonstrated diverse patterns of IPMN initiation and progression. Hotspot mutations in KLF4 were also identified in an independent cohort of IPMN cyst fluid samples, again with a significantly higher prevalence in low-grade IPMNs.

Conclusion Hotspot mutations in KLF4 occur at high prevalence in IPMNs. Unique among pancreatic driver genes, KLF4 mutations are enriched in low-grade IPMNs. These data highlight distinct molecular features of low-grade and high-grade dysplasia and suggest diverse pathways to high-grade dysplasia via the IPMN pathway.

  • pancreatic tumours
  • pancreatic pathology
  • mutations
  • pancreatic cancer
  • molecular genetics

Data availability statement

Data are available from the authors on reasonable request and approval of data sharing by institutional review boards.

Significance of this study

What is already known on this subject?

  • Intraductal papillary mucinous neoplasms (IPMNs) are the most common neoplastic cysts in the pancreas and can progress to invasive pancreatic adenocarcinoma.

  • Comprehensive sequencing of small IPMN cohorts has identified driver genes in advanced lesions, and targeted multiregion sequencing has demonstrated genetic heterogeneity in IPMNs. However, comprehensive multiregion genomic analysis of IPMNs is required to elucidate the evolutionary features of neoplastic progression.

What are the new findings?

  • Multiregion whole-exome sequencing revealed KLF4 hotspot mutations (K409 and S411) in >50% of the analysed IPMNs, and these mutations were more frequently detected in regions with low-grade dysplasia than with high-grade dysplasia.

  • KLF4 mutations can be identified in IPMN cyst fluid samples using the Safe Sequencing System, and these mutations were significantly more prevalent in cyst fluid from IPMNs with low-grade dysplasia.

  • Phylogenetic analyses demonstrated diverse patterns of tumour initiation and progression, suggesting that there is not a single universal genetic pathway into high-grade dysplasia in IPMNs.

How might it impact on clinical practice in the foreseeable future?

  • These results underscore the potential of KLF4 hotspot mutations as a novel biomarker for pancreatic cyst risk stratification, as KLF4 mutations are predominantly detected in low-grade IPMNs.

  • Both interpatient and interregion genetic heterogeneity in IPMNs should be considered during the development of molecular approaches for pancreatic cyst assessment.


Intraductal papillary mucinous neoplasms (IPMNs) are non-invasive cyst-forming pancreatic neoplasms that can progress to aggressive invasive pancreatic ductal adenocarcinoma (PDAC). IPMNs are classified based on the morphological dysplasia of their neoplastic epithelium: low-grade IPMNs have minimal atypia and low risk of malignant transformation, while high-grade IPMNs have severe atypia and are at higher risk for progression to invasive cancer.1 IPMNs are frequently diagnosed incidentally on abdominal imaging, providing an important opportunity to prevent pancreatic cancer.2 3 Guidelines for surveillance or surgical intervention in IPMN patients must balance the opportunity for cancer prevention with the morbidity and even mortality associated with overtreatment of low-risk lesions.4–6 Current decision making relies largely on clinical and radiological features, but these approaches are neither adequately sensitive nor adequately specific for high-risk IPMNs.5 6 Thus, there is a critical need to better understand the molecular alterations that drive the progression of low-risk IPMNs to those at high risk for progression to invasive carcinoma, as these represent potential biomarkers for cysts requiring clinical intervention.

In contrast to the trove of genomic data describing invasive pancreatic cancers, the genomes of relatively few IPMNs have been analysed. Previous comprehensive sequencing of small cohorts of IPMNs mostly focused on advanced lesions and revealed characteristic driver genes,7–9 while targeted analyses in larger, more diverse cohorts have confirmed the prevalence of specific driver gene mutations that correlate with grade of dysplasia or histological subtype.10 These initial studies relied on analysis of a single region from each IPMN, followed by comparison of clinical, pathological and molecular characteristics across different patients. More recently, multiregion targeted next-generation sequencing of IPMNs has revealed a surprising degree of intratumoral genetic heterogeneity, even with respect to well-characterised driver gene mutations in IPMNs, highlighting previously unappreciated genetic complexity in precancerous pancreatic lesions.11–13 However, comprehensive multiregion sequencing has not yet been performed on IPMNs. Such analyses can define the genomic alterations associated with progression in individual lesions, uncover new driver genes and define unique evolutionary patterns in precancerous lesions.

In this study, we report multiregion whole-exome sequencing of distinct low-grade and high-grade regions of 17 human IPMNs without associated invasive carcinoma. The resulting data define evolutionary trajectories in IPMN progression and highlight genetic heterogeneity throughout these lesions. In addition, we identify a novel driver of early IPMN tumourigenesis with a unique evolutionary pattern and validate patterns of mutations in an independent cohort of IPMN cyst fluid samples. Taken together, our results provide several key insights not possible through analysis of advanced cancers, highlighting the importance of direct analysis of precancerous lesions.


Clinical data

Electronic medical records were reviewed to document clinical information such as age, sex, family history, clinical presentation, imaging diagnosis and outcome. These clinical data are summarised in online supplemental table 1.

Case selection and specimen acquisition

We retrospectively reviewed surgical pancreatectomy specimens from patients diagnosed with IPMN without associated invasive carcinoma between 2007 and 2016. Diagnostic H&E-stained slides were reviewed by pancreatic pathologists (WH and LDW) to identify IPMNs with distinct components of both low-grade and high-grade dysplasia and to select 1–3 blocks per case with regions of both grades of dysplasia. Because of recent reports of polyclonal origin in IPMNs,12 we carefully selected IPMN cases in which morphological features suggested that the high-grade component arose in association with the coexisting low-grade component. We set the following histological criteria for the case selection: IPMN with adequate quantities of both low-grade and high-grade components for genomic analysis in which (1) a high-grade component was in direct contact with a low-grade component, OR (2) if the low-grade component was close to but not directly attached to the high-grade component (in most instances growing in a different cystic space), then both components were located within the same formalin-fixed paraffin-embedded (FFPE) block. Of 118 resected specimens of high-grade IPMN retrieved from the database, we found 24 high-grade IPMN cases that met the aforementioned criteria, and morphologically distinct regions of each grade were identified in 32 blocks from these cases and selected for subsequent laser capture microdissection. The histological subtype of each sequenced region was determined by consensus of four pathologists (KF, EDT, RHH and LDW).

Laser capture microdissection

Twenty to thirty 10 µm serial tissue sections from selected FFPE tissue blocks were cut onto membrane slides (Carl Zeiss MembranSlide 1.0 PEN; Carl Zeiss, Oberkochen, Germany). Deparaffinisation and staining were performed as previously described.14 Two to six morphologically distinct regions were microdissected from each case using laser capture microdissection (LMD7000, Leica, Wetzlar, Germany), resulting in DNA samples of adequate quantity and quality for whole-exome sequencing from 76 IPMN regions from 17 cases, as well as a matched normal sample from each case. We did not obtain adequate DNA from the other seven microdissected cases, which were excluded from further analyses.

Whole-exome sequencing and data analysis

Genomic DNA libraries were prepared from the 93 FFPE DNA samples (76 IPMN samples, 17 normal samples) following Illumina’s (Illumina, San Diego, California, USA) suggested protocol. Human exome capture was performed following a protocol from Agilent SureSelect Human All Exon 50 Mb Kit 5.0 (Agilent, Santa Clara, California, USA). The captured libraries were sequenced with XTEN sequencer (Illumina) with 150 bp paired-end reads. Nonsynonymous mutations were called using a well-validated pipeline based on Mutect.15 Details of the pipeline are presented in the online supplemental methods.

Phylogenetic analysis

To extract per-site features (mismatch frequency, insertion frequency and deletion frequency), BAM alignment files were converted to tab delimited format using jvarkit. We inferred phylogenies with Treeomics V.1.7.12 based on all mutations that passed the filtering described above. Each phylogeny is rooted at the subject’s normal sample and the leaves represent the distinct regions of the IPMN. Treeomics uses a Bayesian inference model to account for error-prone sequencing and varying neoplastic cell content to infer globally optimal trees using Mixed Integer Linear programming.16 Gene names along lineages indicate an acquired non-synonymous mutation in the corresponding gene; genes are listed twice in the same phylogeny only if multiple distinct somatic mutations in that gene were identified. We display mutations in previously identified PDAC driver genes (significantly mutated genes in The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) PDAC studies),17 18 as well as mutations in genes that were mutated in at least three separate IPMNs and had a nonsynonymous mutation frequency >0.5 mutations per Kb gene size in our cohort.
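The gene-display rule described above (mutated in at least three separate IPMNs and more than 0.5 nonsynonymous mutations per Kb of gene size) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name, the input layout, the toy gene names and the gene sizes are all hypothetical, and which length is used as "gene size" is an assumption left to the caller.

```python
from collections import defaultdict

def candidate_driver_genes(mutations, gene_size_kb, min_cases=3, min_per_kb=0.5):
    """Return genes mutated in >= min_cases lesions with > min_per_kb mutations per Kb.

    mutations    : iterable of (case_id, gene) pairs, one per nonsynonymous mutation
    gene_size_kb : dict mapping gene -> size in Kb (assumed to be supplied by the caller)
    """
    cases_by_gene = defaultdict(set)  # gene -> lesions carrying at least one mutation
    count_by_gene = defaultdict(int)  # gene -> total nonsynonymous mutations in the cohort
    for case_id, gene in mutations:
        cases_by_gene[gene].add(case_id)
        count_by_gene[gene] += 1

    selected = []
    for gene, cases in cases_by_gene.items():
        per_kb = count_by_gene[gene] / gene_size_kb[gene]
        if len(cases) >= min_cases and per_kb > min_per_kb:
            selected.append(gene)
    return sorted(selected)

# Hypothetical toy data: a small recurrently mutated gene, a very large gene, a rare gene.
toy_mutations = [
    ("IP_a", "GENE_SMALL"), ("IP_b", "GENE_SMALL"), ("IP_c", "GENE_SMALL"),
    ("IP_a", "GENE_HUGE"), ("IP_b", "GENE_HUGE"), ("IP_c", "GENE_HUGE"),
    ("IP_a", "GENE_RARE"),
]
toy_sizes_kb = {"GENE_SMALL": 1.5, "GENE_HUGE": 100.0, "GENE_RARE": 1.0}

selected_genes = candidate_driver_genes(toy_mutations, toy_sizes_kb)
print(selected_genes)
```

Normalising by gene size is what keeps very large genes from passing the filter on recurrence alone: in the toy data, the large gene is mutated in three lesions but falls well below the per-Kb threshold.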

Analysis of IPMN cyst fluid

Pancreatic cyst fluid was collected from resected specimens in the surgical pathology laboratory (n=55) or at the time of endoscopic ultrasound (n=8). Genomic DNA was purified from pancreatic cyst fluid as described previously.19 The hotspot loci in the KLF4 gene were analysed in IPMN cyst fluid using the Safe Sequencing System (Safe-SeqS),20 which has been described in detail previously.21 22 Details are provided in the online supplemental methods.


Overall approach

In order to dissect the molecular events in low-grade and high-grade components of precancerous pancreatic lesions, we performed multiregion whole-exome sequencing of 17 IPMNs containing regions of both low-grade and high-grade dysplasia (online supplemental table 2). We analysed a total of 76 IPMN exomes, including 49 from low-grade regions and 27 from high-grade regions—the number of exomes analysed per IPMN ranged from 2 to 6 (online supplemental table 2). In the majority of IPMNs in our cohort (14/17), the low-grade regions showed gastric differentiation, while the high-grade regions in the same IPMN were pancreatobiliary (online supplemental table 2). In addition, our cohort contained two mixed gastric-intestinal type IPMNs, as well as one that showed gastric differentiation in all regions analysed (online supplemental table 2). In each case, matched normal samples were also analysed by whole-exome sequencing to exclude germline variants and to identify somatic mutations. The average distinct coverage for our IPMN whole-exome sequencing was 170x (range 38x–367x) (online supplemental table 3).

From the multiregion whole-exome sequencing data, we identified a total of 3090 nonsynonymous somatic mutations, with a mean of 41 non-synonymous somatic mutations per analysed IPMN region, corresponding to a mutation burden of 1.11 nonsynonymous mutations per megabase (Mb) (online supplemental table 4, online supplemental figure 1). Surprisingly, the mean number of somatic mutations did not differ between low-grade and high-grade regions—we identified a mean of 41 nonsynonymous somatic mutations (range 20–103) in low-grade regions compared with a mean of 41 (range 22–76) in high-grade regions (p=0.55, two-tailed Mann-Whitney U test) (figure 1A). These correspond to a mutation burden of 1.10 nonsynonymous mutations per Mb in low-grade regions and 1.12 non-synonymous mutations per Mb in high-grade regions. A mean of 10 nonsynonymous somatic mutations were shared among all samples analysed from a given IPMN, while a mean of 26 were unique/private to low-grade regions and a mean of 24 were unique/private to high-grade regions. We then compared the neoplastic cell fraction (NCF) of mutations in low-grade and high-grade regions (figure 1B). There was no significant difference between low-grade and high-grade regions in the NCF of mutations shared in all samples of a given IPMN (p=0.51, two-tailed Mann-Whitney U test). In contrast, we identified a significant difference between low-grade and high-grade regions in the NCF of unshared mutations, with mutations in low-grade regions having a significantly lower NCF than those in high-grade regions (p<0.0001, two-tailed Mann-Whitney U test). Regions of high-grade dysplasia had a larger proportion of unshared mutations with a NCF approaching 1, suggestive of clonal mutations shared in all analysed cells (figure 1B). Still, mutation signatures were similar among all samples analysed, with an enrichment for C-to-T transitions (figure 1C). 
Copy number analyses using the whole-exome sequencing data revealed scattered alterations without striking differences between low-grade and high-grade components in most IPMNs (online supplemental figure 2, online supplemental table 5). However, a minor subset of cases showed increased copy number alterations in high-grade regions (for example, IP2, IP9, IP29), suggesting accumulation of copy number alterations during neoplastic progression in some IPMNs (online supplemental figure 2).

Figure 1

Whole-exome sequencing of multiregion low-grade (LG) and high-grade (HG) IPMN samples. (A) Comparison of the tumour mutation burden per megabase (TMB/Mb) between LG (n=49) and HG (n=27) IPMN regions. The lines and error bars indicate mean±1 SD. (B) Violin plots showing the neoplastic cell fraction (NCF) of all mutations detected in LG (n=49) and HG (n=27) regions. The NCFs were calculated separately for shared mutations among all regions (left panel) and unshared mutations (right panel). (C) Proportion of base changes observed in each IPMN sample. Samples are organised by grade of dysplasia and descending total number of alterations. IPMNs, intraductal papillary mucinous neoplasms.

Driver genes of IPMN tumourigenesis

Through multiregion whole-exome sequencing, we confirmed the high prevalence of mutations in some previously identified pancreatic driver genes in IPMN cases, including GNAS (15/17 cases; 88%), KRAS (14/17 cases; 82%), RNF43 (10/17 cases; 59%) and CDKN2A (5/17 cases; 29%) (figure 2A, online supplemental figure 3). Intriguingly, TP53 mutations were uncommon in our cohort (2/17 cases; 12%), and we identified no mutations in SMAD4 or TGFBR2. We also identified hotspot mutations in CTNNB1 (3/17 cases; 18%) and inactivating mutations in APC (3/17 cases; 18%), demonstrating WNT signalling alterations in a subset of IPMNs as has been previously reported.10 In addition, non-synonymous mutations in RBM10 were identified in 24% (4/17) of IPMNs in our study—mutations in this gene have been previously reported in both IPMNs and PDACs.13 23

Figure 2

Driver mutations in low-grade (LG) and high-grade (HG) IPMN. (A) Somatic non-synonymous mutations in the most frequently mutated genes are categorised as shared between LG and HG IPMN (grey), limited to LG (blue), or limited to HG (red). Genes with mutations in ≥3 IPMNs and >0.5 mutations per Kb gene size are included. (B) Comparison of prevalence of non-synonymous mutations between LG and HG samples. Genes showing significantly different mutation frequencies between LG and HG are indicated by asterisks (two-tailed Mann-Whitney U test). IPMNs, intraductal papillary mucinous neoplasms.

In addition to confirming the prevalence of mutations in previously characterised driver genes, our study also identified novel drivers of pancreatic tumourigenesis in these IPMNs. The most striking of these new drivers was KLF4, which encodes a member of the Kruppel family of transcription factors. We identified somatic mutations in one of two hotspot codons (amino acids K409 and S411) of KLF4 in 53% (9/17) of IPMNs in our study (figure 2A,B). Both hotspots are located in the highly conserved C2H2 zinc finger domains in KLF4 (figure 3A,B). A total of four different amino acid substitutions (K409Q, K409E, S411Y and S411F) were detected in these two hotspots, and two IPMNs in our cohort had two different KLF4 mutations (K409E and S411F in IP5; K409Q and S411Y in IP7) (2/17 cases; 12%). We analysed the four different KLF4 hotspot mutations using Cancer-specific High-throughput Annotation of Somatic Mutations plus (CHASMplus).24 All four different KLF4 hotspot mutations had high CHASMplus PDAC scores (K409Q, p=0.042; K409E, p=0.041; S411Y, p=0.038; S411F, p=0.038), suggesting that these mutations are likely to be drivers of IPMN tumourigenesis. None of these amino acid substitutions were detected in the Genome Aggregation Database of germline alterations (>250 000 alleles). Only KLF4 K409Q has been previously reported in meningiomas and in three cases in previous IPMN analyses, but our study represents the first report of four different KLF4 hotspot mutations at this high prevalence in epithelial neoplasms.8 13 25 26

Figure 3

Characterisation of recurrent hotspot mutations in KLF4. (A) Gene schematic showing mutations identified in KLF4 in our IPMN cohort (n=17 cases, 76 regions) and TCGA PDAC cohort (n=184 cases). All mutations in our IPMN cohort are located in two hotspots (K409 and S411) in the first C2H2 zinc finger domain. (B) Amino acid sequences of different species around the hotspot mutations of KLF4. Completely conserved amino acids across all species are indicated by asterisks. (C) Comparison of KLF4 mutation prevalence among ‘low-grade (LG)’, ‘intermediate-grade (IG)’ and ‘high-grade (HG)’ IPMN in our cohort and TCGA PDAC. The histological grade in this panel is based on the previous three-tiered grading system for IPMN. P values were calculated based on two-tailed Mann-Whitney U test. IPMN, intraductal papillary mucinous neoplasms; PDAC, pancreatic ductal adenocarcinoma; TCGA, The Cancer Genome Atlas.

In addition to these frequent mutations in KLF4, we report nonsynonymous mutations in FBXW7 and HUWE1, each in 18% (3/17) of the IPMNs studied; these genes encode the components of an E3 ubiquitin ligase and are previously characterised tumour suppressor genes in other tumour types (figure 2A,B, online supplemental figure 3).27 28 We also report mutations in SETBP1 in 24% (4/17) of IPMNs; mutations in this gene have been reported in several other tumour types but have not been previously highlighted in pancreatic neoplasms.29 30 Other genes with prevalent somatic mutations in our cohort include MUC16 (6/17 cases; 35%), TTN (5/17 cases; 29%), and NBPF1 (5/17 cases; 29%). However, the large size of MUC16 and TTN suggests that many (if not all) of these mutations may be passengers.

The comprehensive genomic analysis of multiple regions per IPMN, including regions of both low-grade and high-grade dysplasia, provides a unique opportunity to assess the timing of mutations in specific driver genes. In IPMNs with KRAS mutations, at least one mutation in this gene was shared in both low-grade and high-grade components (figure 2A,B). Similarly, mutations in GNAS were almost always shared between low-grade and high-grade components. Most other mutated genes had a mixture of mutational patterns, including mutations limited to the low-grade component and mutations limited to the high-grade component. Of note, TP53 mutations were consistently limited to high-grade components, as were mutations in the adherens junction protein AJAP1 (figure 2A,B).

Mutations in KLF4 had a strikingly different pattern, with mutations frequently limited to the low-grade component of the IPMNs—KLF4 hotspot mutations were limited to the low-grade component in 6 of 9 mutant cases and were shared between the low-grade and high-grade components of 3 of 9 (example in figure 4). In our cohort, the prevalence of KLF4 mutations in low-grade and high-grade components was significantly different (figure 2B; 40% vs 15%, p=0.049, two-tailed Mann-Whitney U test). There were no KLF4 mutations that were limited to the high-grade components. When the previous three-tiered histologic grading system for IPMN was applied, with low-grade IPMN divided into ‘low-grade’ and ‘intermediate-grade’ dysplasia, the prevalence of KLF4 mutations was highest in ‘low-grade’ (50%) compared with ‘intermediate-grade’ (39%) or ‘high-grade’ (15%; p=0.023, two-tailed Mann-Whitney U test vs ‘low-grade’) (figure 3C). In addition, the prevalence of hotspot KLF4 mutations in TCGA analysis of invasive PDAC (0.5%, 1/184 cases) was much lower than the prevalence in our cohort of precancerous lesions.17

Figure 4

Distribution of somatic mutations in low-grade (LG) and high-grade (HG) regions of IP5. (A) Heatmap depicting the distribution of nonsynonymous somatic mutations among five different regions of IP5 (three LG and two HG). Black boxes indicate mutations with variant allele frequency (VAF) >15%, and grey boxes indicate VAF of 5%–15%. (B) Representative images of neoplastic tissue of IP5 stained by H&E. The coloured circles indicate the microdissected regions for sequencing analysis. (C) Chow-Ruskey plot of shared and unique somatic mutations among five different regions of IP5. TMB, tumour mutation burden.

Our multiregion sequencing data also allow examination of genetic heterogeneity in IPMNs, confirming previous observations about precancerous pancreatic neoplasia.12 In IP24, two of the analysed regions shared no somatic mutations with the other regions, suggesting that the analysed tissue block contained multiple independent precancerous neoplasms. In this IPMN, the independent neoplasms demonstrably involved the same ductal space (figure 5). These observations provide further support for the hypothesis that at least a subset of IPMNs have polyclonal origin and are composed of multiple independently arising clones that share no somatic mutations.12 We also identified a distinct pattern of mutation in both RNF43 and KLF4 in which multiple mutations in the same driver gene were present in different regions of the same IPMN, suggesting convergent evolution with respect to mutations in some driver genes (figure 4). This pattern has been previously reported in RNF43 but not KLF4, highlighting unique selective pressures on mutations in these two key driver genes.12 31

Figure 5

Distribution of somatic mutations in low-grade (LG) and high-grade (HG) regions of IP24. (A) Heatmap depicting the distribution of nonsynonymous somatic mutations among four different regions of IP24 (two LG and two HG). Black boxes indicate mutations with variant allele frequency (VAF) >15%, and grey boxes indicate VAF of 5%–15%. (B) Representative images of neoplastic tissue of IP24 stained by H&E. The coloured circles indicate the microdissected regions for sequencing analysis. The bottom image shows the enlarged view of the black dotted circle in upper image. (C) Chow-Ruskey plot of shared and unique somatic mutations among four different regions of IP24. TMB, tumour mutation burden.

Evolutionary analyses of non-invasive IPMNs

In order to analyse the evolutionary relationships of IPMN samples in more detail, we reconstructed lesional phylogenies using Treeomics, a computational approach specifically designed for noisy next-generation sequencing data from tumour samples.16 Treeomics calculates the probability that a mutation is present or absent in a particular sample based on the number of mutant and reference reads, with hyperparameters for the probability calculation depending on the estimated sample purity as well as other variables. Mutations are placed on the root of the tree by Treeomics only if that mutation was present with a high probability in all samples. Multiple important insights were evident from this analysis. First, our evolutionary analysis highlighted that in multiple samples (eg, IP7, IP8, IP31), high-grade dysplasia arose without unique mutations in pancreatic driver genes, while in others (eg, IP2, IP9) high-grade-specific mutations in driver genes such as TP53 were identified (online supplemental figure 4). Second, in some samples with multiple distinct high-grade regions (eg, IP7, IP20), we demonstrate that such regions are more closely related to low-grade regions than to each other, suggesting that the transition to high-grade dysplasia can occur multiple times in the same IPMN (figure 6, online supplemental figure 4). However, this pattern is not universal, as in other IPMNs (eg, IP2) the high-grade components have a recent common ancestor (online supplemental figure 4). Our Treeomics analysis also confirmed independent genetic origin of different regions of IP24 and highlighted the parallel evolution of multiple distinct RNF43 mutations in discrete subclones in IP20 (figure 6, online supplemental figure 6).
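The idea of scoring whether a mutation is present in a sample from its mutant and reference read counts can be illustrated with a deliberately simplified two-hypothesis Bayesian calculation. This is a toy sketch and not the Treeomics model: the binomial likelihoods, the heterozygous diploid assumption (expected allele fraction of purity/2), the sequencing error rate, the prior and all function names are assumptions introduced here for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior_present(mutant_reads, total_reads, purity,
                      error_rate=0.005, prior_present=0.5):
    """Toy posterior probability that a mutation is truly present in a sample.

    Assumes a clonal heterozygous mutation in a diploid region, so the expected
    variant allele fraction is purity / 2. If the mutation is absent, mutant
    reads are assumed to arise only from sequencing error.
    """
    expected_vaf = purity / 2
    like_present = binom_pmf(mutant_reads, total_reads, expected_vaf)
    like_absent = binom_pmf(mutant_reads, total_reads, error_rate)
    numerator = like_present * prior_present
    return numerator / (numerator + like_absent * (1 - prior_present))

# Hypothetical read counts at 100x coverage in a sample with 50% neoplastic cells.
well_supported = posterior_present(mutant_reads=20, total_reads=100, purity=0.5)
unsupported = posterior_present(mutant_reads=0, total_reads=100, purity=0.5)
print(well_supported, unsupported)
```

Making the expected allele fraction depend on sample purity is the point of the sketch: the same handful of mutant reads is stronger evidence of presence in a low-purity sample than in a high-purity one.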

Figure 6

Representative IPMN phylogenies constructed using Treeomics. Treeomics generated phylogenetic trees from all non-synonymous mutations identified in each IPMN region (A, IP6; B, IP20). Potential driver gene mutations (including those identified in previous pancreatic cancer genomics studies, as well as those mutated in ≥3 IPMNs and >0.5 mutations per Kb gene size in the current study) are indicated by their gene name on the lineage in which they occur. Numbers indicate the number of non-synonymous somatic mutations occurring in each trunk or branch. Representative images of neoplastic tissue stained by H&E are presented for both cases, with coloured circles indicating the microdissected regions for sequencing analysis. HG, high grade; IPMN, intraductal papillary mucinous neoplasm; LG, low grade.

We also assessed quantitative features of the Treeomics phylogenies in detail. We observed varying lengths of the ‘trunk’ of the phylogenetic tree, ranging from 0 mutations shared in all samples to 27 (online supplemental figure 4). The mean was ~12 truncal mutations, but there was a broad range: seven IPMNs had <10 truncal mutations, while two IPMNs had >20. This highlights distinct evolutionary features in different IPMNs, underscoring that the selective forces governing neoplastic evolution may be variable between patients. Next, we compared the genetic relatedness of low-grade and high-grade regions using ‘genetic distance’, defined as the total number of non-shared somatic mutations between two samples.32 High-grade/high-grade sample comparisons had a lower mean genetic distance (36.1), when compared with low-grade/high-grade sample comparisons (44.2) and low-grade/low-grade sample comparisons (45.2). However, these results did not reach statistical significance (p=0.21 for LG/LG vs HG/HG, p=0.26 for LG/HG vs HG/HG, two-tailed Mann-Whitney U test).
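Read literally, the ‘genetic distance’ defined above is the size of the symmetric difference between the mutation sets of two samples. The following minimal Python sketch shows that calculation for every pair of regions in one lesion; the region labels, mutation identifiers and function names are hypothetical, and how mutations are called present in a region is assumed to have been decided upstream.

```python
from itertools import combinations

def genetic_distance(muts_a, muts_b):
    """Number of somatic mutations found in exactly one of the two samples."""
    return len(set(muts_a) ^ set(muts_b))

def pairwise_distances(regions):
    """Genetic distance for every unordered pair of regions in one lesion."""
    return {
        (a, b): genetic_distance(regions[a], regions[b])
        for a, b in combinations(sorted(regions), 2)
    }

# Hypothetical lesion with two low-grade (LG) regions and one high-grade (HG) region.
toy_regions = {
    "LG1": {"m1", "m2", "m3"},
    "LG2": {"m1", "m2", "m4", "m5"},
    "HG1": {"m1", "m6"},
}

toy_distances = pairwise_distances(toy_regions)
for pair, distance in toy_distances.items():
    print(pair, distance)
```

Grouping these pairwise values by the grades of the two regions (LG/LG, LG/HG, HG/HG) yields the per-category distributions that the text then compares.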

Detection of KLF4 mutations in IPMN cyst fluid

In order to determine the prevalence of KLF4 mutations in an independent cohort, we employed next-generation sequencing analysis of human IPMN cyst fluid samples using Safe-SeqS, a method designed to reduce sequencing errors and detect low-frequency mutations.20 Clinical and pathological features of the analysed cyst fluid samples are presented in online supplemental table 6. Our Safe-SeqS assay was designed to assess the previously identified hotspots in KLF4 at codons 409 and 411. The cyst fluid samples were from 63 IPMNs, including 26 low-grade and 37 high-grade IPMNs, with all diagnoses confirmed on pathological review of resected specimens. In this cohort of 63 cyst fluid samples, we identified KLF4 mutations in 19 samples (30%), including 11 samples with one KLF4 mutation, 7 samples with two distinct KLF4 mutations and 1 sample with three distinct KLF4 mutations (table 1). The prevalence of multiple KLF4 mutations in our cyst fluid samples (8 of 63, 13%) is similar to that identified in whole-exome sequencing (2 of 17, 12%). Hotspot mutations in KLF4 were identified in 12 of 26 (46%) low-grade IPMN cyst fluid samples, compared with 7 of 37 (19%) high-grade IPMN cyst fluid samples using Safe-SeqS (p=0.027, two-tailed Fisher’s exact test). These results confirm the high prevalence of KLF4 mutations in IPMNs as well as the enrichment of these in low-grade lesions. In addition, we identified significantly different KLF4 mutation prevalence based on histological subtype, with mutations in 10 of 22 (45%) gastric-type IPMNs but only 3 of 22 (14%) intestinal-type IPMNs (p=0.045, two-tailed Fisher’s exact test) (online supplemental table 7). The detectability of KLF4 mutations in cyst fluid samples highlights their potential utility in preoperative risk stratification of IPMNs.
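The two comparisons above can be re-derived from the reported counts with a small Fisher's exact test. The sketch below uses only the Python standard library and assumes the common two-tailed definition in which the p value is the sum of the probabilities of all tables that are no more likely than the observed one. The text does not state which two-tailed convention was used, so this choice and the function name are assumptions.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are mutated / not mutated. Table weights are
    kept as exact integers so that ties with the observed table are
    compared without floating-point error.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    # Unnormalised hypergeometric weight for each possible top-left cell.
    weights = {
        k: comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(lowest, highest + 1)
    }
    observed = weights[a]
    as_extreme = sum(w for w in weights.values() if w <= observed)
    return as_extreme / sum(weights.values())


# Counts taken from the text: mutated vs non-mutated cyst fluid samples.
p_grade = fisher_exact_two_tailed(12, 14, 7, 30)    # low grade vs high grade
p_subtype = fisher_exact_two_tailed(10, 12, 3, 19)  # gastric vs intestinal type
print(round(p_grade, 3), round(p_subtype, 3))
```

Under this convention the two values should come out close to the reported p=0.027 and p=0.045.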

Table 1

KLF4 mutations identified in cyst fluid samples by Safe-SeqS


Using multiregion whole-exome sequencing of IPMNs with both low-grade and high-grade components, we identified prevalent mutations in a previously unappreciated driver of pancreatic neoplasia: KLF4. Prevalent mutations at an oncogenic hotspot in KLF4 have been previously reported in meningiomas and have been implicated as a universal genetic feature of the secretory subtype of meningioma.25 26 In the pancreas, loss of heterozygosity at the KLF4 locus has been reported in PDAC, but frequent somatic mutations in this gene have not been previously reported.33 Somatic mutations at hotspot positions in KLF4 have been reported in three cases in previous whole-exome sequencing studies of IPMNs, but mutations at the prevalence in our study (with somatic mutations in >50% of analysed IPMNs) have not been previously reported.8 13 This is likely due to the enrichment of KLF4 mutations in regions of low-grade dysplasia, as previous comprehensive sequencing studies of IPMNs have focused on high-grade IPMNs or those with associated invasive carcinomas. We identified a total of 19 IPMNs previously analysed by whole-exome sequencing in the literature, the vast majority from high-grade IPMNs.8 9 11–13 KLF4 mutations were identified in three samples from two different studies, indicating a prevalence of 16%, which is similar to the prevalence in high-grade IPMNs in our study. However, because KLF4 mutations occurred in only one or two samples in each previous study, they could not be separated from the much larger number of passenger mutations in these studies.

Several studies have assessed the expression levels of KLF4 in normal and neoplastic pancreas.33–35 In the normal pancreas, KLF4 was found to be localised to the nuclei of pancreatic ductal epithelial cells.33 Intriguingly, loss of KLF4 expression has been reported in a sizeable proportion (>85%) of PDACs,33 whereas increased expression was observed in human and mouse acinar-to-ductal metaplasia (ADM) and pancreatic intraepithelial neoplasia (PanIN) lesions.34 We are aware of only one study to date describing KLF4 expression in IPMN.35 This study demonstrated that KLF4 expression is restricted to a small proportion of highly mucinous cells in both human and mouse IPMN specimens, potentially representing regions of low-grade dysplasia.35 Expanded analysis of KLF4 expression in larger IPMN cohorts and correlation with KLF4 mutation status are warranted to clarify the relationship between KLF4 mutation, expression, and role in tumourigenesis.

KLF4, also known as gut-enriched KLF, is an important member of the Krüppel-like transcription factor family with multiple putative functions.36 KLF4 was initially identified as a key regulator of cell fate decisions, such as cell proliferation, differentiation and apoptosis. The identification of KLF4 as one of the four ‘Yamanaka factors’ that can reprogram differentiated somatic cells into pluripotent stem cells (OCT3/4, SOX2, KLF4 and MYC), as well as its essential role in the maintenance of genome stability, further substantiate its role in cell fate determination.37 KLF4 has been reported to have tumour suppressive functions in several tumour types, and both experimental and clinical evidence has shown that the loss of KLF4 protein expression can cause altered cell proliferation, differentiation and precancerous changes in adult digestive organs.36 These processes are mediated by multiple oncogenic pathways, including Wnt/β-catenin, transforming growth factor-β1 and p21WAF1/Cip1 signalling.36 Recent studies have also indicated that KLF4 is a negative regulator of epithelial-to-mesenchymal transition, revealing several critical genes as direct transcriptional targets of KLF4, including CDH1, CDH2, VIM, CTNNB1, VEGF and MAPK8.38

In the pancreas, the functional role of KLF4 varies at different points in tumourigenesis, with multiple studies suggesting protumourigenic function in pancreatic tumour initiation and tumour-suppressive function in advanced PDAC.34 Studies of KLF4 in genetically engineered mouse models of pancreatic cancer have demonstrated overexpression of this gene in early pancreatic neoplasia, while experimental overexpression in pancreatic cancer cell lines led to cell cycle arrest and growth inhibition.35 36 These genetically engineered mouse models suggested that dysregulation of the KLF4 signalling pathway promotes PDAC progression and metastasis, but paradoxically, KLF4 ablation attenuates the formation of ADM and PanIN after pancreatic injury in the setting of mutant KRAS. Together with our data showing enrichment of KLF4 mutations in low-grade regions, this raises the intriguing hypothesis that KLF4 mutations are selected early in pancreatic tumourigenesis but are then selected against as lesions progress. Still, although the prevalence and pattern of KLF4 mutations in our study provide strong evidence that KLF4 is an oncogene in neoplastic pancreatic cysts, our sequencing data cannot provide mechanistic insights into the selective pressures in IPMNs. As such, we cannot further evaluate the hypothesis of temporal changes in selective forces with the current data. The functional impacts of hotspot KLF4 mutations in pancreatic tumourigenesis remain to be determined and represent a critical direction of future investigation, and further experimental data in model systems will be required to support or refute this hypothesis.

The distinct mutation patterns of driver genes suggest their role in specific stages of pancreatic tumourigenesis. Mutations in the hotspots of the initiating oncogenes KRAS and GNAS were most often shared among all samples from a given IPMN, suggesting that these mutations occur early in tumourigenesis. In contrast to KRAS and GNAS, TP53 mutations were not common, occurring in only two IPMNs and consistently limited to high-grade components. No mutations were identified in our IPMN cohort in SMAD4, despite frequent mutations in this gene in PDAC and IPMN-associated invasive carcinomas. These findings are consistent with previous studies suggesting that mutations in these genes occur very late in pancreatic tumourigenesis, and in particular that SMAD4 mutations are typically limited to invasive carcinoma.10 14 39 40

As discussed above, in contrast to these later drivers, mutations in KLF4 were uniquely enriched in regions of low-grade dysplasia, suggesting distinct selective forces on these mutations. The pattern of KLF4 mutations suggests that, due to the complexities of clonal evolution, not all driver mutations in early pancreatic tumourigenesis can be detected by studying advanced cancers, highlighting the importance of direct analysis of precursor lesions. It is also important to note that many IPMNs lacked a high-grade-specific driver gene, and copy number alterations were largely shared between matched low-grade and high-grade components, raising the possibility of a non-genetic driver of progression to high-grade dysplasia in a subset of cases. Overall, our data suggest that there is not a single universal genetic pathway to high-grade dysplasia.

In total, 76 multiregion samples from 17 IPMN cases were analysed by whole-exome sequencing in our study. In the majority of IPMNs (16/17 cases, 94%), low-grade and high-grade regions shared common somatic mutations, suggesting evolution from a common ancestor. These observations raise the fundamental question of whether high-grade IPMN arises from low-grade IPMN. Because we analysed a single resected IPMN specimen from each patient and thus observed each IPMN at a single point in time, we cannot directly observe this common ancestor of low-grade and high-grade IPMN. However, the molecular alterations predicted in the common ancestor by Treeomics are most consistent with those previously reported in low-grade IPMN.10 Thus, our data suggest that the common ancestor of the low-grade and high-grade IPMN regions we sequenced was low-grade IPMN, supporting the idea that high-grade IPMN typically arises from low-grade IPMN. Intriguingly, in addition to the shared mutations, each region we sequenced also independently accumulated a set of private mutations, even when the high-grade regions were in direct contact with low-grade regions within the same pancreatic duct (eg, IP6, IP7). These results demonstrate independent evolution of both low-grade and high-grade regions after divergence from the common ancestor, suggesting that continued selection shapes the genetic alterations in both IPMN grades. However, the higher proportion of mutations in high-grade regions with an NCF near one suggests a clonal selection event that is unique to the development of high-grade dysplasia. We also identified one notable exception (IP24; 1/17 cases, 6%), in which the low-grade and high-grade components shared no somatic mutations. In this case, low-grade and high-grade regions arose as independent clones in the same duct.

Pancreatic cysts are frequently identified incidentally on abdominal imaging, creating the unique clinical problem of surveillance that balances cancer prevention with overtreatment.41 To date, several different sets of guidelines have been released for the management of pancreatic cystic neoplasms, including those that can progress to invasive carcinoma, such as IPMN and MCN.5 6 42 In these guidelines, clinical decision making relies largely on radiographic and clinical features, augmented by biochemical and cytologic analyses of cyst content. However, currently available diagnostic tools and algorithms are still imperfect. There is a discrepancy between the preoperative and postoperative diagnosis in >30% of the pancreatic lesions, and 25% of patients who undergo surgery have pancreatic cysts without malignant potential, highlighting a need for improved approaches to preoperative diagnosis.43–45 Recent studies have identified a combination of molecular and clinical features that classified cyst type with >90% sensitivity and >90% specificity.19 46 However, separating low-grade and high-grade precancerous cysts remains a challenge even with these advanced approaches. The results of our genomic analysis of IPMN progression may improve this discrimination and thus contribute to multidisciplinary assessment and management of IPMN.

More specifically, our results provide further insights into the use of molecular alterations in cyst fluid for IPMN risk stratification. SMAD4 mutations were absent in our cohort of non-invasive IPMNs, and TP53 mutations were uncommon and limited to areas of high-grade dysplasia, suggesting that alterations in these genes are specific markers for IPMNs at high risk of malignant progression or associated invasive carcinomas.19 47 In contrast, KLF4 mutations were frequently limited to regions of low-grade dysplasia in our IPMNs analysed by multiregion sequencing of tissue samples, and mutations in this gene were significantly more prevalent in cyst fluid samples from low-grade IPMNs. These data suggest that KLF4 mutations may add to the discriminatory power of molecular cyst fluid analysis, though it is important to note that these mutations were also present in a smaller proportion of high-grade IPMNs. Thus, KLF4 mutations are not an entirely specific marker of low-grade dysplasia and will likely need to be interpreted in combination with other clinical and molecular features to accurately assess the risk of IPMN progression. Furthermore, because a single IPMN can contain both low-grade and high-grade components, the finding of a mutation suggestive of low-grade dysplasia does not rule out a higher grade component elsewhere in the IPMN. While the difference in prevalence of KLF4 mutations in low-grade and high-grade IPMNs was statistically significant, these observations suggest substantial challenges to their clinical utility, and thus, the clinical impact of our study should be interpreted with caution. Assessment of KLF4 mutations in larger cohorts will be required to clarify the value added by these mutations to existing risk stratification approaches. In addition, assessment of KLF4 mutation prevalence in other pancreatic precancerous lesions and cyst types will be required to interpret the implications of these mutations in biospecimens.

Like all cancer genomics studies, our study has limitations. We analysed a relatively small number of IPMNs. Still, this represents the largest cohort to date of IPMNs without associated cancer analysed by whole-exome sequencing, and we analysed 76 IPMN exomes in total, a large increase in sample size from previous studies. Moreover, although we performed multiregion whole-exome sequencing, we analysed only a small proportion of the neoplastic epithelium in each case by microdissecting pathologically defined regions from one to three tissue blocks. The design of such multiregion sequencing studies represents a balance between the comprehensiveness of lesional sampling vs genomic analysis. In this study, we chose to comprehensively analyse the genome of the analysed regions but not sample the whole IPMN. Of note, we employed the opposite balance in a recently published study in which we analysed all available tissue from a cohort of IPMNs by targeted next-generation sequencing of a small driver gene panel.12 The latter approach allows comprehensive assessment of genetic heterogeneity with respect to known drivers but does not allow identification of new driver genes, which is a key finding in the current study. Taken together, the two approaches provide complementary insights into pancreatic tumourigenesis via the IPMN pathway.

It is also important to note that our cohort size did not permit us to perform computational methods of driver gene assessment such as MutSigCV.48 As such, the importance of infrequently mutated genes in our study should be interpreted with caution, as the mutation prevalences have not been corrected for important confounders such as gene size, nucleotide context, and replication timing. Still, we identify KLF4 mutations at oncogenic hotspots in >50% of analysed IPMNs, and these specific mutations are predicted to be drivers based on CHASMplus analysis. Thus, methods such as MutSigCV are not required to confirm the driver gene status of KLF4. Another important caveat in our study is that we analysed exomes from FFPE tissue, which has been documented to contain more artefacts than sequencing data from fresh or frozen tissue.49 Although this is one possible explanation for the observation that we identified a similar number of mutations in IPMN samples to that previously identified in PDAC, an alternative explanation is also possible. Because we performed laser capture microdissection and analysed multiple small, morphologically discrete regions of each IPMN, it is also likely that our experimental design allowed a higher sensitivity for subclonal mutations, particularly compared with bulk sequencing of paucicellular PDACs. A final caveat is that whole-exome sequencing, as employed in our study, cannot identify all types of genomic alterations. Future studies using whole genome sequencing will be required to confidently place chromosomal rearrangements, chromothripsis, and whole genome doubling on the timeline of IPMN tumourigenesis.

In this study, we report comprehensive multiregion whole-exome sequencing of pathologically well-characterised IPMNs with both low-grade and high-grade components. This approach identified a new genetic driver of IPMN tumourigenesis and highlighted unique evolutionary processes not previously appreciated in precancerous pancreatic neoplasia. In addition, our results provide a novel biomarker that may refine risk stratification of IPMNs using cyst fluid analysis.

Data availability statement

Data are available from the authors on reasonable request and approval of data sharing by institutional review boards.

Ethics statements

Ethics approval

This study was approved by the Institutional Review Board of The Johns Hopkins Hospital. Patients or the public were not involved in the design, conduct, reporting, or dissemination plans of our research.


The authors thank Ralph Hruban for helpful discussions and pathological expertise in the histological subtyping of IPMNs for this study. The authors thank Bert Vogelstein, Janine Ptak, Natalie Silliman, Joy Schaeffer, Lisa Dobbyn and Maria Popoli for expert technical assistance.




  • KF, WH, MF and QS contributed equally.

  • Correction notice This article has been corrected since it published Online First. Figures have been replaced for clarity.

  • Contributors KF, WH, MF, QS, YJ and LDW designed the study. WH, MDM, CF-dC, MM-K, CMS, MY-S, RTL, RS, EDT, AML and LDW contributed to sample acquisition. WH, QS, JD, JC, PW acquired data. KF, MF, QS, JR, LZ, VBG, NR analysed data. KF, WH, MF, QS, JR, LZ, VBG, CGF, AB, MN, MJ, NJR, RK, YJ, LDW interpreted data. KF, LDW wrote the manuscript. All authors critically reviewed the manuscript. LDW provided study supervision.

  • Funding The authors acknowledge the following sources of funding: NIH/NCI P50 CA62924; NIH/NIDDK K08 DK107781; Sol Goldman Pancreatic Cancer Research Center; Buffone Family Gastrointestinal Cancer Research Fund; Carol S. and Robert M. Long Pancreatic Cancer Research Fund; Kaya Tuncer Career Development Award in Gastrointestinal Cancer Prevention; AGA-Bernard Lee Schwartz Foundation Research Scholar Award in Pancreatic Cancer; Sidney Kimmel Foundation for Cancer Research Kimmel Scholar Award; AACR-Incyte Corporation Career Development Award for Pancreatic Cancer Research; American Cancer Society Research Scholar Grant RSG-18-143-01-CSM; Emerson Collective Cancer Research Fund; Rolfe Pancreatic Cancer Foundation; Joseph C Monastra Foundation; The Gerald O Mann Charitable Foundation (Harriet and Allan Wulfstat, Trustees); Susan Wojcicki and Denis Troper; Lustgarten Foundation for Pancreatic Cancer Research; CAMS Innovation Fund for Medical Sciences 2016-I2M-1-001 and 2019-I2M-1-001; Virginia and D.K. Ludwig Fund for Cancer Research; Sol Goldman Sequencing Facility at Johns Hopkins; Howard Hughes Medical Institute; Associazione Italiana Ricerca Cancro (grant number: 12182).

  • Competing interests LDW receives research support from Applied Materials. VBG is an employee of Personal Genome Diagnostics. The other authors declare no conflict of interest.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.