Original research
Plasma extracellular vesicle long RNA profiling identifies a diagnostic signature for the detection of pancreatic ductal adenocarcinoma
  1. Shulin Yu1,2,
  2. Yuchen Li2,3,
  3. Zhuan Liao4,
  4. Zheng Wang5,
  5. Zhen Wang2,3,
  6. Yan Li2,3,
  7. Ling Qian1,2,
  8. Jingjing Zhao2,3,
  9. Huajie Zong2,6,
  10. Bin Kang7,
  11. Wen-Bin Zou4,
  12. Kun Chen1,2,
  13. Xianghuo He2,3,
  14. Zhiqiang Meng1,2,
  15. Zhen Chen1,2,
  16. Shenglin Huang2,3,
  17. Peng Wang1,2
  1. 1 Department of Integrative Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
  2. 2 Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  3. 3 Fudan University Shanghai Cancer Center, Key Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
  4. 4 Department of Gastroenterology, Digestive Endoscopy Center, Changhai Hospital, the Second Military Medical University, Shanghai, China
  5. 5 Department of Hepatobiliary Surgery, First Affiliated Hospital, Xi’an Jiaotong University, Xi’an, China
  6. 6 Department of General Surgery, Huashan Hospital, Fudan University, Shanghai, China
  7. 7 Fudan University Shanghai Cancer Center - Institut Merieux Laboratory, Cancer Institute, Fudan University Shanghai Cancer Center, Shanghai, China
  1. Correspondence to Dr Peng Wang, Department of Integrative Oncology, Fudan University Shanghai Cancer Center, Shanghai, China; wangp413{at}163.com; Professor Shenglin Huang, Fudan University Shanghai Cancer Center, Key Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Shanghai, China; slhuang{at}fudan.edu.cn; Dr Zhen Chen, Department of Integrative Oncology, Fudan University Shanghai Cancer Center, Shanghai, China; zchenzl{at}fudan.edu.cn

Abstract

Objective Pancreatic ductal adenocarcinoma (PDAC) is difficult to diagnose at resectable stage. Recent studies have suggested that extracellular vesicles (EVs) contain long RNAs. The aim of this study was to develop a diagnostic (d-)signature for the detection of PDAC based on EV long RNA (exLR) profiling.

Design We conducted a case-control study with 501 participants, including 284 patients with PDAC, 100 patients with chronic pancreatitis (CP) and 117 healthy subjects. The exLR profile of plasma samples was analysed by exLR sequencing. The d-signature was identified using a support vector machine algorithm and a training cohort (n=188) and was validated using an internal validation cohort (n=135) and an external validation cohort (n=178).

Results We developed a d-signature that comprised eight exLRs, including FGA, KRT19, HIST1H2BK, ITIH2, MARCH2, CLDN1, MAL2 and TIMP1, for PDAC detection. The d-signature showed high accuracy, with an area under the receiver operating characteristic curve (AUC) of 0.960, 0.950 and 0.936 in the training, internal validation and external validation cohort, respectively. The d-signature was able to identify resectable stage I/II cancer with an AUC of 0.949 in the combined three cohorts. In addition, the d-signature showed superior performance to carbohydrate antigen 19-9 in distinguishing PDAC from CP (AUC 0.931 vs 0.873, p=0.028).

Conclusion This study is the first to characterise the plasma exLR profile in PDAC and to report an exLR signature for the detection of pancreatic cancer. This signature may improve the prognosis of patients who would have otherwise missed the curative treatment window.

  • extracellular vesicle
  • long RNA
  • pancreatic ductal adenocarcinoma
  • diagnosis

Significance of this study

What is already known on this subject?

  • Pancreatic ductal adenocarcinoma (PDAC) is difficult to diagnose at resectable stage.

  • Carbohydrate antigen 19-9 (CA19-9) has a pooled sensitivity of 75.4% and a specificity of 77.6% for distinguishing pancreatic cancer from non-malignant forms.

  • Long RNA species have recently been found in human plasma extracellular vesicles (EVs).

What are the new findings?

  • This study provides the first genome-wide analysis of EV long RNAs (exLRs) in plasma from PDAC patients, suggesting the feasibility of identifying cancer biomarkers based on exLR profiling.

  • We developed an exLR-based diagnostic (d-)signature that showed high accuracy for the diagnosis of PDAC based on multicentric validation.

  • The d-signature was able to detect resectable stage I/II cancer as readily as stage III/IV PDAC tumours.

  • The d-signature could distinguish between CA19-9-negative PDAC cases and healthy controls, thus complementing the use of CA19-9 in PDAC detection. It could also distinguish PDAC from chronic pancreatitis with high accuracy.

How might it impact on clinical practice in the foreseeable future?

  • The d-signature can detect resectable PDAC with high accuracy, especially for patients with CA19-9 negativity, so that more patients, who would have otherwise missed the curative treatment window, can benefit from optimal therapy.

  • The low false-positive rate of the d-signature may prevent unnecessary pancreatic resection.

Introduction

Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related deaths worldwide.1 The lack of early-stage diagnostics has hindered the development of therapeutics that can slow down or reverse PDAC.2 3 Carbohydrate antigen 19-9 (CA19-9) is the biomarker currently used for PDAC diagnosis.4 However, CA19-9 has a pooled sensitivity of 75.4% (95% CI: 73.4% to 77.4%) and a specificity of 77.6% (95% CI: 75.4% to 79.7%) for differentiating pancreatic cancer from non-malignant conditions.5 Moreover, its specificity for distinguishing PDAC from chronic pancreatitis (CP) often does not exceed 60%,6 which has prompted a search for alternative biomarkers.

Extracellular vesicles (EVs, exosomes and microvesicles) are lipid bilayer-enclosed structures that contain various cargoes, including proteins, lipids and RNA.7 8 Recent studies have focused on the application of EV protein markers in the diagnosis of human cancers, as seen in studies where a single EV protein marker GPC1,9 or a combination of GPC1 with other protein markers, such as EGFR, EPCAM, MUC1 and WNT2,10 was developed for the diagnosis of PDAC with great accuracy, indicating that EVs can be an appealing source of diagnostic biomarkers for human cancers.

Long RNA species—mainly messenger (m)RNA, long non-coding (lnc)RNA and circular (circ)RNA—have been recently found in human plasma EVs, with clinical implications.11–13 The landscape and composition of long RNAs from plasma EVs remain elusive, mainly because of the inherent difficulties in the isolation and purification of EVs from blood and EV RNA sequencing (RNA-seq) library construction.14 It remains unknown whether EV long RNAs (exLRs) could serve as biomarkers for non-invasive diagnosis in human cancer, especially PDAC. We have recently developed an optimised strategy for exLR sequencing (exLR-seq) of human plasma. More than 10 000 exLRs could be reliably detected in each exLR-seq library derived from 1 mL of plasma.15 Plasma exLRs may reflect their tissue origins and hence, the exLR profile might distinguish patients with cancer from healthy individuals.15 We have previously conducted exLR-seq in 14 PDAC patients and 32 healthy persons.16 The preliminary evidence suggested that PDAC patients exhibit an exLR profile distinct from that of healthy persons. This result compelled us to further evaluate the value of exLRs as diagnostic biomarkers in PDAC.

In this study, we performed exLR-seq on plasma samples collected from 501 subjects including PDAC patients, chronic pancreatitis (CP) patients and healthy participants with the aim to establish an exLR-based signature for the detection of PDAC.

Patients and methods

Patient and public involvement

No patients were involved in setting the research question nor were they involved in developing plans for recruitment, design or implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the patient community.

Patients and clinical features

The study was conducted according to the standards for reporting of diagnostic accuracy (STARD) guidelines and the diagnostic power of resectable stage and all-stage PDAC samples was pre-estimated (>0.8).

Five hundred and one participants, including patients with PDAC (n=284), CP patients (n=100) and healthy controls receiving routine healthcare (n=117), were enrolled in this study. Among them, 323 participants that were recruited from Fudan University Shanghai Cancer Center (Fudan Center) between 31 January 2012 and 6 June 2017 were divided into training (n=188) and internal validation (n=135) cohorts. The 178 participants that served as the external validation cohort were recruited from Changhai Hospital of the Second Military Medical University (Shanghai, China; Changhai Center) and Xi’an Jiaotong University Affiliated Medical Center (Xi’an, China; Xi’an Center) between 11 September 2018 and 2 April 2019. Clinical features including age, gender, tumour stage (American Joint Committee on Cancer classification) and CA19-9 level are shown in table 1.

Table 1

Patients’ characteristics for the training, internal validation and external validation cohorts

Plasma sample collection

Blood samples were collected from all participants in 10 mL EDTA-coated Vacutainer tubes. Blood samples of PDAC patients were collected before surgery from patients with resectable tumours and before chemotherapy from patients with unresectable tumours. Five patients (one, two and two in the training, internal validation and external validation cohort, respectively) received neoadjuvant chemotherapy (regimens included gemcitabine/nab-paclitaxel or gemcitabine/oxaliplatin). Blood samples were collected before neoadjuvant chemotherapy. Plasma was separated by centrifugation at 800×g (~3000 rpm) for 10 min at room temperature (25°C) within 2 hours after blood collection and was then centrifuged at 16 000×g (~13 000 rpm) for 10 min at 4°C to remove debris. Plasma samples were stored at –80°C until use.

Isolation of EVs and EV RNA

For each patient, 1 mL of plasma was used, and EVs were isolated by affinity-based binding to spin columns using an exoRNeasy Serum/Plasma Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. Briefly, thawed plasma was mixed with binding buffer and added to the exoEasy membrane affinity spin column. For transmission electron microscopy (TEM), size distribution measurement and Western blotting (see online supplementary methods), the EVs were eluted with 400 µL of XE elution buffer. To reduce the eluate volume (to 50 µL) and exchange buffer with phosphate buffer saline (PBS), samples were subjected to ultrafiltration using Amicon Ultra-0.5 Centrifugal Filter 10 kDa (Merck Millipore, Germany). For EV RNA isolation, EVs were lysed on the column using QIAzol (Qiagen), and total RNA was then eluted and purified.

RNA-seq analysis

Total EV RNA isolated from 1 mL of plasma was treated with DNase I (NEB, Ipswich, Massachusetts, USA) to remove DNA. Strand-specific RNA-seq libraries were prepared using the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian (Clontech, Palo Alto, California, USA). Library quality was analysed using a Qubit fluorometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA) and Qsep100 (BiOptic, New Taipei City, Taiwan). EV RNA-seq libraries prepared from 1 mL of plasma yielded, on average, 6.24 ng/µL for healthy, 9.73 ng/µL for CP and 15.19 ng/µL for PDAC subjects (20 µL eluate volume). ExLR-seq was performed on an Illumina sequencing platform (San Diego, California, USA) in 150 bp paired-end mode. The sequence data were deposited in GEO (accession number: GSE133684).

Raw reads were filtered using FastQC and aligned to the GRCh38 human genome assembly using HISAT2.17 Gene expression levels were calculated in transcripts per kilobase million (TPM). Annotations of mRNA and lncRNA in the human genome were retrieved from GENCODE (V.25). T-distributed stochastic neighbour embedding (t-SNE) was conducted using the R package Rtsne.18 19 Differentially regulated exLRs were annotated with gene IDs and assessed for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment using DAVID (https://david.ncifcrf.gov/).
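The TPM calculation mentioned above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (their analysis was run in R): the function name, read counts and transcript lengths below are hypothetical, and the sketch assumes a single representative length per gene.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM for one sample.

    counts: dict mapping gene -> raw read count
    lengths_bp: dict mapping gene -> transcript length in base pairs
    """
    # Step 1: reads per kilobase, which corrects for transcript length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: rescale so that the values of one sample sum to one million.
    scale = sum(rpk.values()) / 1_000_000.0
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical counts and lengths, chosen only for illustration.
example_counts = {"KRT19": 300, "TIMP1": 100, "FGA": 600}
example_lengths = {"KRT19": 1500, "TIMP1": 1000, "FGA": 3000}
example_tpm = counts_to_tpm(example_counts, example_lengths)
```

Because every sample is rescaled to the same total, TPM values can be compared between samples with different sequencing depths.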

Data and statistical analyses

RNA-seq raw read counts were converted to TPM values so that all variates were on a comparable scale and were normalised across all samples. Variates with frequencies of <25% (ie, expressed in fewer than 25% of all samples) were omitted, and the remaining markers were used for subsequent statistical analyses. ExLR-seq TPM expression profiles from the Fudan Center (n=323) were randomly assigned to the training (n=188) and internal validation (n=135) cohorts. In the training cohort, the Mann-Whitney U test was used to assess differential expression of exLRs between the non-tumour and tumour groups, and the p value of each marker was adjusted by the Benjamini–Hochberg method to control the false discovery rate (FDR). ExLRs with FDR<0.05 and fold change >1.4 were retained and intersected with differential RNA-seq profiles based on The Cancer Genome Atlas (TCGA) PDAC tissue data sets (n=178) and Genotype-Tissue Expression (GTEx) normal pancreatic tissue data sets (n=171). Only the exLRs that were also upregulated in these tissue data sets (FDR<0.05, fold change >1.4) were considered valuable for PDAC classification.
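The filtering steps in this paragraph can be sketched in Python as follows. This is an illustrative sketch only: the gene names, p values and fold changes are hypothetical, the p values are assumed to come from a Mann-Whitney U test run elsewhere, and treating TPM >0 as "expressed" is an assumption that the text does not state.

```python
def passes_frequency_filter(tpm_values, min_fraction=0.25):
    """Keep a marker only if it is expressed in at least min_fraction of samples.

    Assumption: a marker counts as expressed in a sample when its TPM is > 0.
    """
    expressed = sum(1 for value in tpm_values if value > 0)
    return expressed / len(tpm_values) >= min_fraction


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values never decrease as the raw p value increases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_markers(stats, fdr_cutoff=0.05, fc_cutoff=1.4):
    """stats: dict mapping gene -> (raw p value, fold change)."""
    genes = list(stats)
    fdr = benjamini_hochberg([stats[gene][0] for gene in genes])
    return [gene for gene, q in zip(genes, fdr)
            if q < fdr_cutoff and stats[gene][1] > fc_cutoff]


# Hypothetical per-gene statistics for illustration.
example_stats = {
    "geneA": (0.01, 2.0),
    "geneB": (0.04, 1.2),   # significant, but fold change too small
    "geneC": (0.03, 1.5),
    "geneD": (0.005, 3.1),
}
example_selected = select_markers(example_stats)
```

The same thresholds (FDR<0.05, fold change >1.4) would then be applied to the tissue data sets before intersecting the two marker lists.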

Two statistical learning algorithms, least absolute shrinkage and selection operator (LASSO) and random forest based on the out-of-bag (OOB) error, were used to reduce the sparse high-dimensional data in the training cohort. For LASSO regularisation, 50% of the training cohort was randomly sampled for LASSO regression, and this was repeated 1000 times. We used fivefold cross-validation and the Akaike information criterion to estimate the expected generalisation error and then selected the optimal value of the ‘1-se’ lambda parameter to construct an adaptive general linear model for marker selection. For the random forest method, the OOB error was used as the minimising criterion to select variates, and the dropping fraction parameter was set at 0.1 between two shrinking steps.
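The ‘1-se’ rule for choosing lambda can be sketched in a few lines of Python. The sketch assumes that the cross-validated mean error and its standard error have already been computed for each candidate lambda; the numbers are hypothetical and the function is not part of any library.

```python
def lambda_one_se(lambdas, cv_mean_error, cv_standard_error):
    """Return the largest lambda whose cross-validated error lies within
    one standard error of the minimum cross-validated error."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean_error[i])
    threshold = cv_mean_error[best] + cv_standard_error[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean_error) if err <= threshold]
    return max(eligible)


# Hypothetical cross-validation results for five candidate penalties.
example_lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
example_cv_mean = [0.30, 0.25, 0.27, 0.29, 0.45]
example_cv_se = [0.03, 0.03, 0.03, 0.03, 0.04]
example_choice = lambda_one_se(example_lambdas, example_cv_mean, example_cv_se)
```

A larger lambda penalises coefficients more strongly, so the rule trades a small loss in cross-validated accuracy for a sparser model with fewer markers.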

Eight exLRs identified by the above algorithms and annotations were selected to construct a support vector machine (SVM) model for PDAC prediction. For binary (PDAC vs CP + healthy) sample classification, the SVM algorithm was executed using the ‘e1071’ package in R software. In principle, the SVM algorithm determines the location of all samples in a high-dimensional space, in which each axis represents an exLR and the expression level of a particular exLR in a sample determines its location on the axis. During the training process, the SVM algorithm draws a hyperplane that best separates the two classes based on the distance between the closest sample of each class and the hyperplane. The two sample classes are positioned on opposite sides of the hyperplane. Moreover, to assess the predictive value of the SVM algorithm in an independent data set, which is not a standard part of the SVM training process, the algorithm was trained with the training data set, all SVM parameters were fixed and samples in the internal validation and external validation cohorts were then evaluated. Internal training performance of the SVM algorithm could be improved by enabling the SVM tuning function, which determines the optimal parameters of the SVM algorithm (gamma, cost) by randomly subsampling the data set used for algorithm training (‘fivefold internal cross-validation’).
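To make the geometric description concrete, the Python sketch below evaluates the decision function of an already trained SVM. It assumes a radial basis function kernel, which the tuned gamma and cost parameters suggest but the text does not state. The support vectors, coefficients and intercept are hypothetical and use two markers instead of eight.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, intercept, gamma):
    """Signed score of sample x. Its sign gives the side of the separating
    hyperplane (in kernel feature space) on which the sample falls."""
    return sum(coef * rbf_kernel(sv, x, gamma)
               for sv, coef in zip(support_vectors, dual_coefs)) + intercept


# Hypothetical model: one PDAC-like and one control-like support vector.
example_svs = [(2.0, 2.0), (0.0, 0.0)]
example_coefs = [1.0, -1.0]      # positive coefficient = PDAC class
example_gamma = 0.5

score_pdac_like = svm_decision((2.0, 2.0), example_svs, example_coefs, 0.0, example_gamma)
score_control_like = svm_decision((0.0, 0.0), example_svs, example_coefs, 0.0, example_gamma)
score_boundary = svm_decision((1.0, 1.0), example_svs, example_coefs, 0.0, example_gamma)
```

In this toy model a sample equidistant from both support vectors lies on the decision boundary and scores zero. Training chooses the support vectors and coefficients, after which they are held fixed when the validation cohorts are scored.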

The d-signature was computed from the predictive strength of the SVM classifier output. To assess the samples’ probability of being predicted as PDAC, we used the R function ‘predict’ to evaluate prediction strength in quantitative terms on the internal validation and external validation cohorts. The prediction strength of the SVM classifier output was used to establish the exLR d-signature. The diagnostic efficacy of the d-signature was evaluated by receiver operating characteristic (ROC) curve analysis for the training, internal validation and external validation cohorts. The comparison between areas under the curve (AUCs) of different classifiers was evaluated by the bootstrap method with 2000 iterations. Youden’s index was determined to identify the optimal cut-off point for calculating exact diagnostic indices. d-Signature distribution in the different patient groups was tested by the Wilcoxon rank-sum test and Student’s t-test after using the Shapiro-Wilk test to determine data normality.
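The ROC-based evaluation described above can be illustrated with a short Python sketch. It computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, and finds the cut-off that maximises Youden's index (sensitivity + specificity − 1). The scores and labels are hypothetical, and the threshold convention (score ≥ cut-off is called positive) is an assumption that may differ from the convention used by the ‘pROC’ package.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly.
    Ties count as half. labels: 1 = case, 0 = control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.
    Samples with score >= cutoff are called positive."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sensitivity = tp / n_cases
        specificity = tn / n_controls
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cut, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical d-signature scores: four cases followed by four controls.
example_scores = [0.95, 0.90, 0.60, 0.40, 0.70, 0.20, 0.10, 0.05]
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
example_cut = youden_cutoff(example_scores, example_labels)
```

The pairwise definition of the AUC is equivalent to the Mann-Whitney U statistic divided by the number of case/control pairs.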

For missing data management, we excluded the samples without CA19-9 annotation (10 out of 100 pancreatitis and 14 out of 284 PDAC patients) from the diagnostic prediction of CA19-9. The diagnostic power of the d-signature was pre-evaluated using PASS software (V.11) before implementing the diagnostic analysis. The anticipated effect size (AUC: 0.90; upper and lower FPR: 1 and 0), sample size and statistical significance level (α: 0.05) were entered into PASS software to estimate the potential power for resectable stage (total: 0.99; training cohort: 0.83; internal validation cohort: 0.67; external validation cohort: 0.83) and all-stage PDAC samples (0.99) enrolled in this study. The resultant power indicated the potentially robust performance of the d-signature for PDAC diagnosis.

All statistical analyses were two-sided, and a p value <0.05 was considered statistically significant. The following R software packages were used in this study: ‘e1071’, ‘glmnet’, ‘varSelRF’, ‘pROC’ and ‘caret’. The statistical analyses and software/platforms, with their original sources, are listed in online supplementary table S1. The flowchart of the detailed analysis process is shown in online supplementary figure 1. The core R code was uploaded to GitHub (https://github.com/Liyuchen-1993/Plasma-exLR-for-PDAC-diagnosis).

Results

EV isolation and exLR-seq

TEM showed the presence of rounded, cup-shaped, double-membrane-bound vesicle-like structures in plasma (figure 1A), and flow cytometry revealed a heterogeneous population of spherical nanoparticles, with abundant peaks ranging from 30 to 200 nm (figure 1B). Western blot analysis revealed characteristic TSG101 and CD63 expression in these isolated vesicles but not in peripheral blood mononuclear cells (PBMCs). In contrast, Grp94 and calnexin, both intracellularly enriched proteins located on the endoplasmic reticulum and used as negative-control protein markers for EV identification, were detected in PBMCs but not in isolated vesicles, indicating that the isolated EVs consisted mostly of exosomes (figure 1C).

Figure 1

Plasma exLR-seq results. (A–B) EVs were isolated and purified from plasma using membrane affinity spin columns and were detected by transmission electron microscopy (A) and flow cytometry (B). Scale bar, 200 nm. (C) Western blots of EV markers TSG101 and CD63 in isolated vesicles. Grp94 and calnexin proteins, which should be detected in peripheral blood mononuclear cells (PBMC), but not in isolated vesicles, were used as controls. (D) Distribution of exLRs per sample among PDAC patients, CP patients and healthy subjects. (E) Three-dimensional scatter plot generated from t-distributed stochastic neighbour embedding (t-SNE) analysis of the exLR profiles, distinguishing PDAC patients from healthy individuals and CP patients. (F) Heatmap of unsupervised hierarchical clustering of the exLRs differentially expressed between PDAC patients and controls (CP + healthy). Each column represents an individual sample, and each row represents an exLR. The scale represents the expression beta values. (G) KEGG pathway enrichment analysis for the differentially expressed exLRs shown in (F), conducted using DAVID. CP, chronic pancreatitis; EV, extracellular vesicle; exLR-seq, EV long RNA sequencing; mTOR, mammalian target of rapamycin; PDAC, pancreatic ductal adenocarcinoma; TPM, transcripts per kilobase million; VEGF, vascular endothelial growth factor.

ExLR-seq was conducted using plasma samples from 117 healthy individuals, 100 CP patients and 284 PDAC patients. Approximately 15 000 annotated genes, including mRNAs, lncRNAs and pseudogenes, were reliably detected in each sample. Numbers of detected RNA species did not significantly differ between groups (figure 1D). A three-dimensional data scatterplot generated using t-SNE revealed that the exLR profiles of PDAC patients generally differed from those of healthy individuals and some CP patients (figure 1E). We identified 399 exLRs that were differentially expressed in PDAC compared with controls (CP + healthy) by Mann-Whitney U test (FDR<0.05, fold change >2). Unsupervised hierarchical clustering revealed a clear separation of PDAC and controls (figure 1F). KEGG pathway analysis revealed that differentially expressed exLRs were enriched for several cancer-related pathways, such as pancreatic cancer, pathways in cancer, mTOR signalling pathway and VEGF signalling (figure 1G). In addition, we performed Gene Set Enrichment Analysis (GSEA) of a PDAC-related gene signature, comparing the exLR profiles of PDAC and control samples. The PDAC-related gene signature was significantly enriched in PDAC samples (online supplementary figure 2). These results suggested that exLRs have potential as biomarkers for the detection of PDAC.

Establishment of an exLR d-signature for PDAC

The workflow used to identify an exLR d-signature for the diagnosis of PDAC is shown in figure 2. ExLRs (n=1502) that were upregulated in PDAC patients compared with controls were selected using a training cohort of 67 control (CP + healthy) individuals and 121 PDAC patients. To obtain cancer-relevant markers, these exLRs were integrated with RNA profiles from the TCGA PDAC tissue RNA-seq data set (n=178) and the GTEx normal pancreatic tissue data set (n=171). Only exLRs (n=398) upregulated in both tissue and EV profiles were considered (FDR<0.05, fold change >1.4). The selected exLR markers were analysed using the random forest algorithm and the LASSO method to shrink the number of variables. Finally, eight exLR markers (FGA, KRT19, HIST1H2BK, ITIH2, MARCH2, CLDN1, MAL2 and TIMP1) were selected and used to construct a PDAC classifier (table 2). Using the SVM algorithm, we established a diagnostic model and generated an exLR d-signature for PDAC. The exLR d-signature comprising the eight exLRs distinguished PDAC from controls with an AUC of 0.960 (95% CI: 0.931 to 0.988), a sensitivity of 93.39% (95% CI: 87.39% to 97.10%) and a specificity of 85.07% (95% CI: 74.26% to 92.60%) in the training cohort. The diagnostic accuracy was 90.43% (95% CI: 85.29% to 94.23%) (figure 3A and table 3). The exLR d-signature was then applied to the internal validation cohort; PDAC was detected with an AUC of 0.950 (95% CI: 0.910 to 0.991), a sensitivity of 95.59% (95% CI: 87.64% to 99.08%) and a specificity of 88.06% (95% CI: 77.82% to 94.70%). The diagnostic accuracy was 91.85% (95% CI: 85.89% to 95.86%) (figure 3B and table 3). To assess whether the d-signature had the same or similar diagnostic value in different populations, 178 samples from the Changhai and Xi’an centers were assessed as an external validation cohort. The d-signature differentiated PDAC from CP and healthy controls with an AUC of 0.936 (95% CI: 0.889 to 0.983), a sensitivity of 93.68% (95% CI: 86.76% to 97.65%), a specificity of 91.57% (95% CI: 83.39% to 96.54%) and a diagnostic accuracy of 92.69% (95% CI: 87.83% to 96.05%) (figure 3C and table 3). The d-signature had an AUC value of 0.966 (95% CI: 0.936 to 0.996) for the Changhai center and of 0.953 (95% CI: 0.897 to 1.0) for the Xi’an center (online supplementary figure 3). Unsupervised hierarchical clustering using the eight exLRs effectively distinguished PDAC from controls with high specificity and sensitivity (figure 3D–F). The d-signature does not have crossover applicability to five other types of cancer (hepatocellular carcinoma, breast cancer, colorectal cancer, gastric cancer and kidney cancer) (online supplementary figure 4).
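The diagnostic indices reported above follow directly from the confusion matrix at the chosen cut-off. As a worked illustration in Python, the sketch below uses the training cohort sizes given in the text (121 PDAC, 67 controls). The individual cell counts (113 true positives, 57 true negatives) are not reported in the text; they are inferred here as the counts consistent with the stated percentages and should be read as an assumption.

```python
def diagnostic_indices(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # proportion of cases detected
    specificity = tn / (tn + fp)          # proportion of controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Training cohort: 121 PDAC patients and 67 controls (from the text).
# Cell counts are inferred from the reported percentages, not reported directly.
train_sens, train_spec, train_acc = diagnostic_indices(tp=113, fn=8, tn=57, fp=10)
train_percentages = tuple(round(100 * value, 2)
                          for value in (train_sens, train_spec, train_acc))
```

With these inferred counts the sketch reproduces the reported training-cohort sensitivity, specificity and accuracy of 93.39%, 85.07% and 90.43%.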

Figure 2

Workflow of data generation and analysis. Diagnostic marker selection; least absolute shrinkage and selection operator (LASSO) and random forest analyses were applied to a training cohort of 121 PDAC patients and 16 CP patients, and 51 healthy subjects, leading to a final selection of eight markers. These eight markers were applied to an internal validation cohort (68 PDAC, 44 CP and 23 healthy) and an external validation cohort (95 PDAC, 40 CP and 43 healthy). CP, chronic pancreatitis; EV, extracellular vesicle; exLR-seq, EV long RNA sequencing; FDR, false discovery rate; PDAC, pancreatic ductal adenocarcinoma; TPM, transcripts per kilobase million.

Figure 3

ExLR profiling in the diagnosis of PDAC. (A–C) Receiver operating characteristic curve for performance of the exLR d-signature in the training (n=188) (A), internal validation (n=135) (B) and external validation (n=178) (C) cohorts. (D–F) Unsupervised hierarchical clustering of eight exLRs selected for use in the d-signature in the training (D), internal validation (E) and external validation (F) cohorts. AUC, area under the curve; exLR, extracellular vesicle long RNA; PDAC, pancreatic ductal adenocarcinoma.

Table 2

Characteristics of differential expression of the eight markers for PDAC detection identified in this study

Table 3

Performance of d-signature in the diagnosis of PDAC

The exLR d-signature detects resectable PDAC

The true value of a biomarker for diagnosis of PDAC lies in its ability to detect PDAC at a resectable stage. We found that PDAC exhibited a high median exLR d-signature score when compared with CP (0.936 vs 0.180; Mann-Whitney U test, p<0.001) and healthy controls (0.936 vs 0.078; Mann-Whitney U test, p<0.001, figure 4A). We found no association of the d-signature with age or sex (online supplementary figures 5 and 6). We also observed no correlation between d-signature scores and tumour stages (figure 4B), suggesting that the diagnostic performance of the d-signature was independent of tumour burden, which would make it an optimal diagnostic tool for the detection of PDAC. Therefore, we next confirmed the diagnostic performance of the d-signature in PDAC of different stages. The AUCs for patients with stage I, II and III/IV cancers compared with controls (CP+healthy) were 0.966, 0.988 and 0.941, respectively, in the training cohort, 0.978, 0.980 and 0.927 in the internal validation cohort and 0.895, 0.929 and 0.972 in the external validation cohort (table 3). The AUCs for patients with stage I, II and III/IV tumours compared with CP were 0.931, 0.966 and 0.915, respectively, in the training cohort, 0.967, 0.969 and 0.910 in the internal validation cohort and 0.909, 0.947 and 0.991 in the external validation cohort (table 3). Further, the d-signature was able to identify resectable stage I/II cancer from control (CP+healthy) with an AUC of 0.949 (95% CI: 0.920 to 0.977) in the combined three cohorts (figure 4C). These results demonstrated that the exLR d-signature could be used for high-accuracy diagnosis of PDAC.

Figure 4

ExLR d-signature for the diagnosis of PDAC. (A) ExLR d-signature in healthy subjects (n=117), CP patients (n=100) and PDAC patients (n=284). (B) ExLR d-signature in PDAC patients with stage I (n=59), II (n=77), III (n=42) and IV (n=106) cancer. (C) ROC for the performance of the exLR d-signature in PDAC with resectable stage (stage I/II) in the combined cohorts. (D) ROC for the performance of the exLR d-signature in the diagnosis of CA19-9-negative PDAC in the combined cohorts. (E) ROC for the performance of the exLR d-signature compared with that of CA19-9 or their combination in the differential diagnosis of PDAC from CP in the combined cohorts. (F) Decision curve analysis to compare the net benefit of combined exLR d-signature and CA19-9 (red line) with that of CA19-9 alone (blue line) for PDAC vs CP in the combined cohorts. AUC, area under the curve; CA19-9, carbohydrate antigen 19-9; CP, chronic pancreatitis; exLR, EV long RNA; PDAC, pancreatic ductal adenocarcinoma; ROC, receiver operating characteristic.

The exLR d-signature has improved diagnostic performance for PDAC detection

The ability to compensate for the limitations of the current biomarker CA19-9 in the detection of PDAC (including distinguishing CA19-9-negative PDAC from controls) would add value to a biomarker for the diagnosis of PDAC. The d-signature was able to distinguish CA19-9-negative PDAC from controls, with an AUC of 0.916 (95% CI: 0.858 to 0.975), a sensitivity of 90.91% (95% CI: 78.33% to 97.47%), a specificity of 88.48% (95% CI: 83.46% to 92.40%) and a diagnostic accuracy of 88.89% (95% CI: 88.43% to 92.43%) in the combined three cohorts (figure 4D). CA19-9-negative PDAC was distinguished from controls with an AUC of 0.939 (95% CI: 0.880 to 0.998) in the training and internal validation cohorts (online supplementary figure 7A) and 0.905 (95% CI: 0.807 to 1) in the external validation cohort (online supplementary figure 7B). In addition, CA19-9 alone differentiated PDAC from CP with an AUC of 0.873 (95% CI: 0.835 to 0.911), a sensitivity of 83.28% (95% CI: 78.75% to 87.90%), a specificity of 74.89% (95% CI: 65.36% to 84.00%) and a diagnostic accuracy of 81.10% (95% CI: 77.28% to 85.53%) in all cohorts combined. In contrast, the d-signature distinguished PDAC from CP with an AUC of 0.931 (95% CI: 0.897 to 0.964), a sensitivity of 94.07% (95% CI: 90.56% to 96.58%), a specificity of 86.92% (95% CI: 75.28% to 91.23%) and a diagnostic accuracy of 92.19% (95% CI: 88.14% to 95.01%) in the combined three cohorts and showed superior performance compared with CA19-9 (p=0.028, figure 4E). When we combined the d-signature with CA19-9, the AUC increased to 0.964 (95% CI: 0.943 to 0.984) (figure 4E). Decision curve analysis corroborated the superior performance of the combined exLR d-signature and CA19-9 when compared with CA19-9 alone (figure 4F). The performance of the d-signature for staged PDAC versus CP is shown in table 3. These results suggested that the d-signature alone is superior to CA19-9 and also complements CA19-9 in PDAC detection.
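Decision curve analysis compares classifiers by their net benefit across threshold probabilities. The Python sketch below uses the standard net benefit formula, true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The counts are hypothetical and are not taken from this study.

```python
def net_benefit(tp, fp, n, threshold_probability):
    """Net benefit = TP/n - (FP/n) * pt / (1 - pt), where pt is the
    threshold probability at which a patient would be treated as positive."""
    if not 0 < threshold_probability < 1:
        raise ValueError("threshold_probability must lie strictly between 0 and 1")
    weight = threshold_probability / (1 - threshold_probability)
    return tp / n - (fp / n) * weight


# Hypothetical comparison of two classifiers in 200 patients at pt = 0.5.
nb_classifier_a = net_benefit(tp=94, fp=13, n=200, threshold_probability=0.5)
nb_classifier_b = net_benefit(tp=83, fp=25, n=200, threshold_probability=0.5)
```

A decision curve plots this quantity over a range of threshold probabilities; the classifier with the higher curve offers the greater net benefit at that threshold.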

Discussion

In this study, we obtained exLR-seq expression profiles from 501 human plasma EV samples representing, to our knowledge, the largest published long RNA-seq expression profile library from human plasma EVs. In addition, we, for the first time, compared differences in exLR levels between patients with PDAC, patients with CP and healthy participants and established a diagnostic signature for PDAC.

Recent studies have suggested that EVs represent an appealing source of diagnostic biomarkers.9 10 20 21 The major breakthrough in the detection of PDAC has been the discovery of cell type-specific proteins in pancreatic cancer cell-derived exosomes,22 as shown in a study where the serum GPC1+ exosome level detected early-stage pancreatic cancer with a sensitivity and specificity of 100%.9 However, a later study suggested that GPC1 alone has a sensitivity of 82% and specificity of 52%,10 and a multiplexed protein signature is required because of tumour heterogeneity. Recent studies have suggested that EVs contain long RNAs that function as mediators of intercellular communication.7 11 12 23–26 For instance, unshielded RN7SL1 can enter breast cancer cells via exosomes and activate the pattern recognition receptor RIG-I to promote aggressive features of cancer.27 The lncRNA lncARSR can be incorporated into exosomes and transmitted to sensitive cells, disseminating sunitinib resistance in renal cell carcinoma cells.28 Certain exLRs have been shown to be differentially expressed between cancer and healthy controls. For example, human telomerase reverse transcriptase mRNA is not detected in serum-derived exosomes from healthy persons, but it has been detected, with variations, in patients with certain types of cancers.25 All these previous studies suggested that EVs contain long RNAs that may serve as non-invasive diagnostic biomarkers.

Because the landscape and composition of long RNAs in plasma EVs had not been characterised, we performed exLR profiling of plasma samples from all 501 participants using an optimised exLR-seq strategy we recently developed.15 We established a d-signature that comprised eight exLRs (FGA, KRT19, HIST1H2BK, ITIH2, MARCH2, CLDN1, MAL2 and TIMP1) for PDAC detection. We validated the exLR-seq results by RT-qPCR in the training cohorts (online supplementary table S3). We observed a high correlation between exLR-seq and RT-qPCR results for six of the eight markers, the exceptions being MAL2 and CLDN1 (online supplementary figure 8A). The discrepancy for these two markers may be explained by their low expression in EVs and by the fact that EVs contain only RNA fragments: RT-qPCR detects a single specific region and may miss such fragments, whereas RNA-seq captures the entire gene region.15 We used the RT-qPCR data to construct an SVM classifier; the new classifier showed high diagnostic performance similar to that of the original exLR-classifier (AUC: 0.960 (exLR-seq) vs 0.938 (RT-qPCR), p=0.31, online supplementary figure 8B). Therefore, considering the cost-effectiveness and high sensitivity of the classifier, prospective large-scale sample screening by RT-qPCR is viable for future studies.
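As an illustration of the cross-platform comparison described above, the sketch below computes a Pearson correlation coefficient between paired measurements of a single marker. This is an assumption made for illustration: the text does not state which correlation coefficient was used, and the variable names and values are hypothetical rather than taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical paired values for one marker in five samples, NOT study data:
# sequencing-based expression (log scale) and RT-qPCR relative expression.
seq_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
qpcr_expression = [1.2, 1.9, 3.2, 3.8, 5.1]

r = pearson_r(seq_expression, qpcr_expression)
print(f"Pearson r = {r:.3f}")
```

A rank-based coefficient such as Spearman's would be a reasonable alternative when the two platforms are not expected to be linearly related.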

To establish a plasma exLR d-signature for PDAC detection, we integrated plasma exLR-seq profiles with TCGA and GTEx data sets to identify markers relevant to cancer. Only exLRs upregulated in both tissue and EV data sets were considered. Therefore, the eight exLRs that were used in the final d-signature were all upregulated in both PDAC tissues and plasma EVs when compared with normal pancreatic tissues or plasma EVs from healthy or CP controls, respectively. TIMP1, KRT19 and MAL2 have been previously shown to be upregulated and to possess a tumour-promoting function in PDAC.29–32 MARCH2, a member of the MARCH family, is upregulated in colon cancer and its expression is correlated with advanced clinicopathological features and poorer overall survival in colon carcinoma33; however, its clinical significance in PDAC remains unknown. ITIH2 and CLDN1 were upregulated in PDAC tissue; however, their expression pattern varies among other tumour entities, suggesting that their function is context-dependent and tissue-dependent.34 35 Little is known about the role of HIST1H2BK in cancer. For certain markers identified here, a potential for non-invasive cancer diagnosis has been reported previously, such as serum TIMP1 and FGA proteins used for the diagnosis of PDAC and gastric cancer, respectively.36–38 Collectively, these results suggested that plasma EVs contain a considerable number of exLRs that are potential biomarkers for PDAC detection.

PDAC is difficult to diagnose at a resectable stage, and most patients are at a locally advanced or metastatic stage at the time of initial diagnosis.1 When the disease is detected while it is still localised and treated with curative resection with or without neoadjuvant and adjuvant therapy, the 5-year survival rate is 32%.1 This rate drops to 12% for locally advanced disease and to 3% for distant metastatic disease.1 Survival rates could be substantially improved by the identification of biomarkers that support accurate and reliable diagnosis. Here, we established an exLR signature that can distinguish PDAC patients from healthy subjects or patients with CP. Notably, tumour burden had little influence on the exLR d-signature score of patients, suggesting that the d-signature may help detect tumours that are resectable. Indeed, the d-signature distinguished resectable PDAC from controls (CP+healthy) with an AUC of 0.949, a sensitivity of 94.85% and a specificity of 88.48% in the combined three cohorts. With a low false-negative rate of 5.15%, this signature may improve the prognosis of patients who would have otherwise missed the curative treatment window. The low false-positive rate of 9.82% in the present study indicates that it may help prevent unnecessary pancreatic resection, which has a reported mortality of 3.8%.39
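The false-negative rate quoted above is the complement of the reported sensitivity for resectable PDAC. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the only input is the sensitivity of 94.85% stated in the text.

```python
def false_negative_rate(sensitivity_pct):
    """False-negative rate (%) as the complement of sensitivity (%)."""
    if not 0.0 <= sensitivity_pct <= 100.0:
        raise ValueError("sensitivity must be a percentage between 0 and 100")
    # Rounded to two decimals to match the precision reported in the text.
    return round(100.0 - sensitivity_pct, 2)


# Sensitivity for resectable PDAC versus controls, as reported in the text.
fnr = false_negative_rate(94.85)
print(f"False-negative rate: {fnr}%")
```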

Our study had the following strengths: (1) It provided genome-wide profiles of exLRs from human blood, demonstrating the feasibility of identifying cancer biomarkers based on exLR profiling. (2) It included a relatively large number of patients and controls, grouped into three cohorts, with 501 participants in total. Most importantly, we performed an external validation study using a cohort from two other medical centres to confirm the value of the exLR-based d-signature for PDAC detection. (3) The exLR d-signature overcomes certain limitations of the current biomarker, CA19-9, in the detection of PDAC; it is able to distinguish CA19-9-negative and resectable PDAC patients from healthy controls and PDAC from CP. Our findings suggested that the exLR d-signature complements CA19-9 for PDAC diagnosis.

There are limitations to our study. First, although the d-signature is non-invasive and has potential for screening, its true value would lie in detecting PDAC before the disease can be diagnosed by imaging; this would have to be assessed in longitudinal cohort studies using samples from large population-based biobanks. Therefore, further study is warranted before the d-signature can be used as a screening test for PDAC detection. Second, although we conducted a multicentric validation study using patient samples from different regions covering East and Northwest China, validation of the exLR d-signature in other ethnic populations or other countries would help establish the efficacy and stability of this PDAC diagnostic method. Third, epidemiologic studies have suggested strong associations between inflammatory stimuli such as cigarette smoking and alcohol consumption and the risk of pancreatic cancer.40 41 It would therefore be meaningful to evaluate the diagnostic accuracy of the d-signature combined with CA19-9, age, and tobacco and alcohol use. However, because of the retrospective nature of the analysis, diet history was not collected and data on tobacco and alcohol use were unavailable for the cohorts, which should also be considered a limitation of this study.

In conclusion, our study indicated the value of exLR profiling in cancer marker discovery and established a diagnostic signature that can detect PDAC with high accuracy. This signature has potential clinical value for resectable PDAC diagnosis, so that more patients, who would have otherwise missed the curative treatment window, can benefit from optimal therapy.

Acknowledgments

We thank Novel Bioinformatics Ltd, Co for the support in bioinformatics analysis with the NovelBio Cloud Analysis Platform.

References

Footnotes

  • SY, YL, ZL and ZW are joint first authors.

  • Contributors PW, SH and ZC contributed to conception and design; SY, YL, ZL, ZhengW, ZhenW, YL, LQ, W-BZ and ZM contributed to provision of study materials or patients; SY, YL, ZL, ZhenW, YL, LQ, ZJ, HZ, W-BZ, KC and ZM collected and assembled the data. PW, SH, ZC, SY, YL and BK contributed to data analysis and interpretation. PW, SH, ZC, SY and YL wrote the manuscript. All authors finally approved the manuscript.

  • Funding This study was supported by the National Natural Science Foundation of China (81622049, 81672779, 81871989); the Shanghai Science and Technology Committee Programme (19XD1420900); Shanghai Education Commission Programme (17SG04) and Shanghai Municipal Commission of Health and Family Planning (201540191).

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval This study was approved by the Ethics Committee of the Fudan University Shanghai Cancer Center, Shanghai, China (approval no: 050432-4-1212B), and written informed consent was obtained from each participant in accordance with institutional guidelines.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement The RNA-Seq data have been deposited at the Gene Expression Omnibus (GEO) under the accession number GSE133684.
