Original article
Protein and glycomic plasma markers for early detection of adenoma and colon cancer
  1. Jung-hyun Rho (1, 2)
  2. Jon J Ladd (1, 2)
  3. Christopher I Li (1)
  4. John D Potter (1, 3, 4)
  5. Yuzheng Zhang (1)
  6. David Shelley (1, 2)
  7. David Shibata (5)
  8. Domenico Coppola (6)
  9. Hiroyuki Yamada (7)
  10. Hidenori Toyoda (8)
  11. Toshifumi Tada (8)
  12. Takashi Kumada (8)
  13. Dean E Brenner (9, 10)
  14. Samir M Hanash (11)
  15. Paul D Lampe (1, 2)
  1. Translational Research Program, Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  2. Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  3. School of Public Health, University of Washington, Seattle, Washington, USA
  4. Centre for Public Health Research, Massey University, Wellington, New Zealand
  5. University of Tennessee Health Science Center, Memphis, Tennessee, USA
  6. Moffitt Cancer Center, Tampa, Florida, USA
  7. Wako Life Sciences, Inc., Mountain View, California, USA
  8. Department of Gastroenterology, Ogaki Municipal Hospital, Gifu, Japan
  9. Great Lakes New England (GLNE) Clinical Validation Center of EDRN, University of Michigan Medical Center, Ann Arbor, Michigan, USA
  10. VA Medical Center, Ann Arbor, Michigan, USA
  11. Department of Clinical Cancer Prevention, Red and Charline McCombs Institute for the Early Detection and Treatment of Cancer, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
  Correspondence to Dr Paul D Lampe, Translational Research Program, Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA; plampe{at}fredhutch.org

Footnotes

  • Contributors J-hR designed the work, conducted experiments, interpreted the data and wrote the manuscript. JJL conducted experiments, interpreted the data and wrote the manuscript. PDL conceived, designed and established the project, interpreted the data and wrote the manuscript. YZ performed statistical analyses and approved the manuscript. DS conducted experiments and approved the manuscript. SMH provided a subset of preliminary data for potential biomarkers and critically reviewed and approved the manuscript. CIL contributed the CHS colon cancer prediagnostic plasma samples and matched controls and critically reviewed and approved the manuscript. DEB collected and provided colon cancer diagnostic plasma samples from the EDRN project and approved the manuscript. HY provided the Japanese samples and critically reviewed the manuscript. TT, HT and TK supplied the Japanese samples and approved the manuscript. DS supplied the colon adenoma and cancer TMAs, helped interpret the results and approved the manuscript. DC performed the TMA staining, assigned the Allred scores, wrote the sections concerning this work and approved the manuscript. JDP provided plasma samples from the CPRU studies conducted at the University of Minnesota and edited and approved the manuscript.

  • Funding This work was funded in part by grants U01 CA152746 (PDL and SMH), U01 CA152637 (CIL and PDL) and U01 CA086400 (DEB) from the National Institutes of Health as part of the EDRN, grant P50 CA130810 (GI SPORE (DEB)), the Kutsche Family Memorial Chair in Internal Medicine (DEB) and the Geriatric Research Education and Clinical Center at the Ann Arbor VA Medical Center. Assaying of the Japanese sample cohort was funded in part by Wako Diagnostics.

  • Competing interests Fred Hutchinson Cancer Research Center has filed patent applications on the results of this study. HY is an employee of Wako Life Sciences, Inc.

  • Ethics approval Fred Hutchinson Cancer Research Center institutional review board.

  • Provenance and peer review Not commissioned; externally peer reviewed.

Significance of this study

What is already known on this subject?

  • Early detection of colon cancer by colonoscopy saves lives.

  • Colonoscopic screening of the entire average-risk population is not feasible.

  • Current assays for screening have low rates of compliance and faecal tests do not have sufficient sensitivity for adenoma detection.

What are the new findings?

  • Plasma levels of BAG family molecular chaperone regulator 4 (BAG4), interleukin-6 receptor subunit beta (IL6ST), von Willebrand factor (VWF), CD44 and epidermal growth factor receptor (EGFR) were higher in people diagnosed with colon cancer up to 3 years after the blood draw and in three subsequent sets of subjects with colon adenoma and/or cancer.

  • Plasma EGFR and CD44 have increased levels of sialyl Lewis-A and Lewis-X in people with adenoma and colon cancer.

  • The protein/glycomic panel shows relatively high sensitivity for adenoma and colon cancer.

How might it impact on clinical practice in the foreseeable future?

  • If our proposed panel maintains its performance for adenoma and cancer detection through formal validation trials that include controls with a variety of diseases, its incorporation into an autoanalyzer platform would be warranted, with the ultimate goal of replacing existing faecal tests.

Introduction

Colorectal cancer is the third leading cause of cancer-related deaths in the USA, with an estimated 134 490 new cases and 49 190 deaths in 2014.1 Early detection substantially improves survival: the 5-year survival proportion is 90% when the cancer is detected at a localised stage and can be treated by surgery, but falls to 70% and 12% with regional and distant spread, respectively. Current guidelines recommend screening from age 50 to age 75 with a faecal immunochemical test (FIT) every year, flexible sigmoidoscopy every 5 years and/or colonoscopy every 10 years.2 Cologuard, a faecal test recently approved by the Food and Drug Administration (FDA), essentially combines FIT with DNA mutation and methylation analysis to achieve somewhat higher sensitivity for colon cancer.3 Colonoscopy and sigmoidoscopy reduce both incidence and mortality.4,5 Despite this benefit, approximately 45% of the US population remains unscreened by endoscopy.6,7 Partly because of this low screening rate, only 39% of cancers are detected at a localised stage.1 With current endoscopy and physician capacity, providing colonoscopic screening to the unscreened age-eligible population could take 10 years or longer.8 In theory, reserving colonoscopy for people with a positive FIT could extend coverage to the unscreened population. However, the low sensitivity of FIT, particularly for adenoma and from a single test, together with the low and declining use of the faecal occult blood test (FOBT), at approximately 15% of the age-appropriate group,7 may limit this approach in practice.9 At this point, it is difficult to predict whether Cologuard, which is also a faecal test and costs significantly more, will achieve better acceptance.

A strategy for overcoming this low rate of screening is urgently needed, and blood-based biomarkers hold considerable promise for higher compliance in widespread screening because a blood test could be combined with routine annual blood tests. A recently approved blood test for SEPT9 gene DNA may help but has only moderate sensitivity for colon cancer and low sensitivity for advanced adenoma (AA).10 Blood levels of carcinoembryonic antigen-related cell adhesion molecule (CEA) are useful only for preoperative prognosis, recurrence prediction and detection of liver metastasis.11

Here, we report that a plasma biomarker panel consisting of five proteins and the sialyl Lewis-A and Lewis-X content of two of the markers performs well for prediction of colon adenoma and cancer. Preliminary discovery was made by high-density antibody array analyses of plasma from subjects enrolled in a large observational study of risk factors for cardiovascular disease, an ideal population in which to model early detection of cancer in the general population.12,13 Specifically, we compared plasma from people diagnosed with colon cancer up to 3 years after the blood draw with well-matched controls. Further testing of the 78 best-performing markers in diagnostic plasma samples, including adenoma, AA and cancer cases, confirmed 32 of the markers. Optimal 4-marker panels (BAG family molecular chaperone regulator 4 (BAG4), interleukin-6 receptor subunit beta (IL6ST), von Willebrand factor (VWF) and CD44 or epidermal growth factor receptor (EGFR)) calculated from the prediagnostic samples were replicated in the diagnostic sample set. The sialyl Lewis-A and Lewis-X content of CD44 and EGFR increased the panel sensitivity. A third and a fourth independent sample set confirmed both the identity and the increased levels of the five proteins, as well as the panel performance. Further, BAG4, IL6ST and CD44 were increased in tissue microarrays (TMAs) containing colon adenomas and cancers, indicating that they could be tumour derived. The final proteomic/glycomic panel performance compared very favourably with existing tests. Thus, we believe the panel should be further tested in large populations.

Methods

Customised antibody microarray

We produced high-density antibody arrays containing ∼3200 different antibodies that had been selected on the basis of our previous research, the literature and large libraries of potential cancer biomarkers (the complete antibody list is essentially identical to one we have previously published14). The antibodies (most at 0.275 mg/mL) were printed in triplicate (3600 × 3 = 10 800 spots in total, including ∼1200 control spots of various types) by covalent immobilisation on N-hydroxysuccinimide (NHS)-ester reactive 3D thin-film surface slides (Nexterion H slide, Schott AG, Jena, Germany) using a Genetix arrayer.15 Printed arrays were placed in a humidity chamber (95%) overnight and then stored at −20°C. Arrays from the same print batch showed good interarray reproducibility of technical replicates, with an average variation of 0.043 when 27 arrays were tested with replicate samples in a blinded manner.12
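The spot counts in the print layout can be cross-checked with a short arithmetic sketch. It assumes, as one reading of the description above, that the ∼1200 control spots are counted within the 10 800 total spots, so that the remaining spots hold ∼3200 antibodies printed in triplicate; the helper name is hypothetical.

```python
def array_layout(features: int, replicates: int, control_spots: int) -> tuple[int, int, int]:
    """Return (total spots, antibody spots, distinct antibodies) for a printed array.

    Hypothetical helper: assumes control spots are part of the total spot count
    and that every antibody is printed with the same number of replicates.
    """
    total_spots = features * replicates            # 3600 x 3 = 10 800
    antibody_spots = total_spots - control_spots   # spots left for antibodies
    distinct_antibodies = antibody_spots // replicates
    return total_spots, antibody_spots, distinct_antibodies


print(array_layout(features=3600, replicates=3, control_spots=1200))
```

With these inputs the sketch gives 10 800 total spots, 9600 antibody spots and 3200 antibodies, in line with the ∼3200 antibodies stated above.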

Study populations

Cardiovascular Health Study prediagnostic samples

The Cardiovascular Health Study (CHS) is a population-based, longitudinal study of coronary heart disease and stroke that recruited a total of 5888 men and 5201 women aged 65 or older in 1989–1999 and an additional 687 African-American men aged 65 or older in 1992–1993.16 Annual clinic examinations were performed for up to 10 years from the date of enrolment. Plasma samples from subjects with myocardial infarction, angina pectoris or stroke were excluded because they were reserved for studies of cardiovascular disease. A total of 126 subjects were newly diagnosed with colon cancer during the study, of whom 79 were diagnosed within 36 months after a blood draw. These 79 cases were individually matched to cancer-free controls on age, sex, body mass index (BMI) and smoking history, using the data nearest in time to the blood draw (table 1A and see online supplementary figure S1).
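The text does not state which matching algorithm was used. As an illustration only, individual 1:1 matching on these variables could be implemented as a greedy nearest-neighbour search. The `Subject` record, the binary smoking coding, the age and BMI tolerances and the distance weighting below are all hypothetical choices, not the CHS procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical record of the matching variables nearest to the blood draw."""
    subject_id: str
    age: float
    sex: str
    bmi: float
    ever_smoker: bool  # assumed binary coding of smoking history


def match_controls(cases, pool, age_tol=2.0, bmi_tol=3.0):
    """Greedy 1:1 matching without replacement; returns {case_id: control_id}.

    Exact match on sex and smoking; nearest on age and BMI within tolerances.
    """
    available = list(pool)
    pairs = {}
    for case in cases:
        eligible = [
            c for c in available
            if c.sex == case.sex
            and c.ever_smoker == case.ever_smoker
            and abs(c.age - case.age) <= age_tol
            and abs(c.bmi - case.bmi) <= bmi_tol
        ]
        if not eligible:
            continue  # no acceptable control: the case stays unmatched
        best = min(
            eligible,
            key=lambda c: abs(c.age - case.age) / age_tol + abs(c.bmi - case.bmi) / bmi_tol,
        )
        pairs[case.subject_id] = best.subject_id
        available.remove(best)  # each control is used at most once
    return pairs
```

Greedy matching depends on the order in which cases are processed; assignment-based (optimal) matching avoids this at the cost of more machinery.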

Table 1

Characteristics of human subjects and plasma samples

Early Detection Research Network diagnostic samples

This diagnostic sample population was distributed for an Early Detection Research Network (EDRN) collaborative group project and was collected by the Great Lakes and New England Clinical Validation Center. Cases comprised adenoma (30 cases: tubular morphology without advanced characteristics), AA (30 cases: >1 cm, significant high-grade dysplasia, tubulovillous, villous, sessile serrated or traditional serrated histology, or more than three adenomas of any size), early colon cancer (30 cases: stage I–II) and late colon cancer (30 cases: stage III–IV). Plasma samples from healthy controls were collected prior to surveillance (30 controls) and screening colonoscopy (30 controls) (table 1B).
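The AA definition above is a set of alternative criteria, any one of which suffices, so it can be written as a small predicate. The argument names and the lower-case histology labels are hypothetical coding choices; the size rule follows the strict ">1 cm" wording of the text.

```python
# Histologies that qualify an adenoma as advanced under the criteria above
# (label spelling is an assumed coding, not a standard vocabulary).
ADVANCED_HISTOLOGIES = {
    "tubulovillous",
    "villous",
    "sessile serrated",
    "traditional serrated",
}


def is_advanced_adenoma(size_cm, high_grade_dysplasia, histology, n_adenomas):
    """Return True if any of the AA criteria is met.

    size_cm: size of the largest adenoma in centimetres (hypothetical input).
    histology: histology label of the most advanced lesion (assumed coding).
    n_adenomas: number of adenomas of any size.
    """
    return (
        size_cm > 1.0
        or high_grade_dysplasia
        or histology in ADVANCED_HISTOLOGIES
        or n_adenomas > 3
    )
```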

Minnesota prediagnostic samples

Plasma samples were collected prior to screening colonoscopy as part of the Cancer Prevention Research Unit (CPRU) studies conducted at the University of Minnesota. Plasma samples from individuals with clean colons (n=7), villous polyps (n=7), carcinoma in situ (n=7) and invasive carcinoma (n=6) were randomly selected.

Japanese cohort samples

Serum samples were collected prior to colonoscopy (ie, PRoBE compliant: prospective collection with retrospective blinded evaluation17) at the Ogaki Municipal Hospital, Ogaki, Gifu Prefecture, Japan. Serum was collected from 168 Japanese individuals with normal lower GI tracts, 159 individuals with pathological findings defined as low risk for developing colon cancer within 5 years (including hyperplastic polyps or small tubular adenomas not meeting the AA criteria above), 59 individuals with ulcerative colitis (UC) and 514 individuals with colorectal cancer.

Multidimensional array analyses on plasma samples

Protein analysis

To detect proteins in plasma, we removed albumin and IgG using a ProtIA spin column (Sigma Chemical Co, St Louis, Missouri, USA) and labelled 200 μg of the remaining proteins from each case or control sample with NHS-Cy5 (all laboratory steps were blind to case status). A pool of plasma from healthy individuals was treated in the same way and labelled with NHS-Cy3, and 200 µg of this reference was mixed with each case or control sample and analysed as previously described.15,18 After incubation with the sample and processing, slides were scanned on a GenePix 4000B Microarray Scanner to produce Cy5 and Cy3 images. Array spots were analysed using GenePix Pro 6.0 image analysis software.

Sialyl Lewis-A and Lewis-X modified protein analysis

As previously described,19 we detected sialyl Lewis-A- or Lewis-X-carrying proteins on an array slide using plasma (10 μL) diluted 1:8 in phosphate buffered saline (PBS) containing 0.05% Tween-20 (PBST). After the slide was washed, bound sialyl Lewis-A- and Lewis-X-carrying proteins were detected simultaneously with Cy dye-labelled anti-sialyl Lewis-A and anti-sialyl Lewis-X antibodies (US Biological; 5 μg/mL) using the GenePix scanner and software as described previously.19

Statistical methods

Data from the scanned array images were imported into the Bioconductor R package Limma (V.2.4.11)20 using our published code.21 For protein levels, the fold change of the signal (red) relative to the reference (green), termed the M value, was calculated as log2(Rc/Gc), where Rc and Gc are the background-corrected red and green intensities (normexp background correction method22). For the sialyl Lewis-A and Lewis-X modified protein arrays, the R or G value was calculated as log2(Rc) or log2(Gc), that is, the expression on the log2 scale after background correction. Saturated array spots were flagged, and antibodies whose triplicate spots had a coefficient of variation >10% were removed. For the M value, experimental variation was normalised using within-array print-tip loess and between-array quantile normalisation.23 Triplicate features were summarised by their median. Statistical analyses were conducted on the M, R or G values.
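The per-antibody summarisation step (M value, replicate CV filter, median) can be sketched in plain Python. This is an illustration, not the published Limma-based code: it starts from already background-corrected intensities, applies the 10% CV filter to the red (sample) channel only, which is an assumption, and omits saturation flagging and the loess/quantile normalisation that Limma performs.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample SD divided by mean."""
    return statistics.stdev(values) / statistics.mean(values)


def summarise_feature(red, green, max_cv=0.10):
    """Median M value, log2(Rc/Gc), over one antibody's replicate spots.

    red, green: background-corrected sample (Cy5) and reference (Cy3) intensities.
    Returns None when the replicate CV exceeds max_cv (assumed: red channel only).
    """
    if coefficient_of_variation(red) > max_cv:
        return None
    m_values = [math.log2(r / g) for r, g in zip(red, green)]
    return statistics.median(m_values)


print(summarise_feature([200.0, 210.0, 190.0], [100.0, 100.0, 100.0]))  # 1.0
print(summarise_feature([100.0, 200.0, 300.0], [100.0, 100.0, 100.0]))  # None
```

The first feature has a CV of 5% and a median twofold excess over the reference (M = 1); the second has a CV of 50% and is discarded.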

Values were standardised such that the mean and SD of the cancer-free control group were set to zero and one, respectively. Values were further normalised using linear regression to remove age, sex and assay day effects for the EDRN arrays, and age, sex, BMI, smoking status and assay day effects for the CHS arrays. Markers were ranked on the basis of the p value, OR and sensitivity at 90% specificity. The OR is on a log2 scale, so a positive value indicates higher levels in neoplasia than in controls and a negative value indicates lower levels in neoplasia. To adjust for multiple hypothesis testing, a q value, the minimum false discovery rate, was calculated.24 Logistic regression was used to identify the combination of markers that best distinguished cases from controls; the combined marker performance was calculated as a predictive index.25 Specifically, the model $\operatorname{logit}(p) = \beta_0 + \beta_1 m_1 + \dots + \beta_n m_n$ was used, where $p$ is the probability of being a case, $n$ is the number of markers and $m_i$ is the value of marker $i$ after standardisation to mean 0 and SD 1. The linear combination $\text{risk} = \beta_1 m_1 + \dots + \beta_n m_n$ is the risk score that best discriminates cases from controls. Coefficient values were calculated for the CHS samples (see online supplementary table S1) and applied to both the CHS and EDRN results. Receiver operating characteristic (ROC) analysis was conducted for the CHS and EDRN sample sets using these risk scores. Because the Japanese samples were serum rather than plasma and were analysed with an independent method (Luminex vs array), their risk scores and ROC analysis used optimal or equal weighting. In practice, if a test sample's risk score exceeded the cut-off value, the sample would be classified as a potential cancer or adenoma warranting follow-up colonoscopy.
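The standardisation, risk-score and cut-off steps described above can be sketched in plain Python. This is an illustrative outline, not the authors' code: the coefficients passed to `risk_score` are placeholders (the fitted values are in online supplementary table S1), the covariate regression adjustment and the logistic-regression fit are omitted, the cut-off rule is one common way to fix 90% specificity, and the q values use the Benjamini–Hochberg procedure (equivalent to Storey's q value with the null proportion fixed at 1) as a conservative stand-in for the cited minimum false discovery rate method.

```python
import math
import statistics


def standardise(values, control_values):
    """Scale values so that the control group has mean 0 and SD 1."""
    mu = statistics.mean(control_values)
    sd = statistics.stdev(control_values)
    return [(v - mu) / sd for v in values]


def risk_score(markers, betas):
    """Linear predictor beta_1*m_1 + ... + beta_n*m_n (no intercept, as in the risk score)."""
    return sum(b * m for b, m in zip(betas, markers))


def cutoff_at_specificity(control_scores, specificity=0.90):
    """Smallest control score with at least `specificity` of controls at or below it."""
    ranked = sorted(control_scores)
    k = math.ceil(round(specificity * len(ranked), 9)) - 1
    return ranked[k]


def sensitivity_above(case_scores, cutoff):
    """Fraction of cases whose score exceeds the cut-off (flagged for colonoscopy)."""
    return sum(s > cutoff for s in case_scores) / len(case_scores)


def q_values(p_values):
    """Benjamini-Hochberg adjusted p values (Storey q values with pi0 fixed at 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

For example, with ten control scores 0 to 9 the cut-off at 90% specificity is 8, and cases scoring 8.5, 9.5, 3 and 10 give a sensitivity of 0.75.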

Western blotting

Western blotting was performed as previously described.26 Plasma proteins (30 µg) after albumin and IgG depletion were separated on a reducing 4–12% Bis–Tris gel system with 3-(N-morpholino)propanesulfonic acid sodium dodecyl sulfate running buffer (Novex-ThermoFisher, Waltham, Massachusetts, USA). Nitrocellulose membranes carrying the transferred proteins were incubated with the appropriate primary antibody (anti-BAG4, IL6ST, CD44, EGFR and VWF, all rabbit polyclonal antibodies: catalogue numbers 2108.00.02, 2048.00.02, 4078.00.02 and 3170.00.02 from SDIX (now sold by Novus, Littleton, Colorado, USA) and ab6994-100 from Abcam, Cambridge, Massachusetts, USA, respectively). Specific bands were detected with an Odyssey Imaging System (LI-COR, Lincoln, Nebraska, USA) after incubation with IRDye 800-labelled anti-rabbit IgG antibodies (LI-COR).

Immunohistochemistry and TMA

A colon cancer TMA block was constructed in the Tissue Core Facility at the Moffitt Cancer Center using a TMA Tissue Arrayer (Beecher Instruments, Estigen, Tartu, Estonia). The diagnosis of each sample was confirmed, and the area of interest was outlined by a pathologist with an interest in GI pathology before inclusion in the TMA. TMA sections (3 μm thick) were immunostained using a Ventana Discovery XT automated system (Tucson, Arizona, USA). Briefly, slides were deparaffinised on the automated system with EZ Prep solution (Ventana). The same BAG4 and IL6ST antibodies listed above and CD44 (#HPA005785, Sigma) were diluted 1:200, 1:800 and 1:1000, respectively, in Dako antibody diluent (Carpinteria, California, USA) and incubated for 60, 60 and 32 min, respectively. Heat-induced antigen retrieval was performed in Ribo CC (Ventana) for BAG4 and in Cell Conditioning 1 (Ventana) for IL6ST and CD44. Ventana OmniMap Anti-Rabbit Secondary Antibody was then applied for 16 min (BAG4), 20 min (CD44) and 8 min (IL6ST). Detection used the Ventana ChromoMap Kit, and the slides were then counterstained with haematoxylin.

Modified Luminex assays on plasma and serum samples

Plasma or serum samples were depleted of IgG and serum albumin as described for the arrays. Depleted samples were then reacted with a 20-fold molar excess of Sulfo-NHS-Biotin (ThermoFisher) at room temperature (RT) for 30 min. Free biotin was subsequently quenched with a 10-fold molar excess of ethanolamine (Sigma) on ice for 2 hours.

The same BAG4, IL6ST, VWF, CD44 and EGFR antibodies used for the array were each coupled to a distinct set of non-magnetic carboxylated (COOH) beads (Bio-Rad, Hercules, California, USA), each set uniquely labelled with two fluorescent dyes. Beads were activated with 0.2 M 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide and 0.5 M Sulfo-NHS (ThermoFisher) in 0.1 M NaH2PO4, pH 6.2, for 20 min (at RT in the dark, as were all subsequent steps). After washing with 50 μM 2-(N-morpholino)ethanesulfonic acid, the activated beads were reacted with primary antibodies for 2 hours. After washing with PBS, beads were blocked for 30 min in 1% BSA, then washed and stored in 1% BSA at 4°C. Five thousand beads of each antibody-coupled set were added to individual wells of a filter plate (Millipore) and washed with PBST. Fifty microlitres of biotinylated sample (5 μg/mL total protein) was added to each well and shaken for 1 hour at RT. Beads were then washed three times with PBST and incubated with 50 μL of a 1:1000 dilution of Streptavidin-R-Phycoerythrin conjugate (BD Biosciences, San Jose, California, USA) for 30 min. Beads were washed three times with PBST, and 125 μL of 1% BSA was added to each well. Fluorescent signal was read on a Bio-Rad Luminex 100 System, counting 50 beads per region.

Results

Discovery of colon cancer biomarkers in prediagnostic samples

Antibody arrays have been used for over 15 years to discover changes in protein levels.27 We constructed an in-house antibody array containing 3600 antibody spots printed in triplicate (10 800 spots in total) capable of binding >2100 different proteins. Approximately 1100 proteins were targeted by two or more antibodies to allow detection at different epitopes, including phosphorylation sites. The antibody coverage included most known cancer markers (eg, CEA, mucin-16 (CA-125) and prostate-specific antigen), many cytokines, extracellular portions of membrane receptors, secreted proteins and additional candidates from preliminary studies using earlier-format arrays and mass spectrometry. We have previously shown that these arrays perform with high sensitivity (picogram level),15 a low coefficient of variation14,15 (<10% for 85% of the array) and good interarray reproducibility of technical replicates.12

For discovery of potential early-detection markers, 79 case–control pairs of prediagnostic plasma samples from the CHS were analysed on the antibody arrays (see online supplementary figure S1 for the design of case–control selection from the 11 776 participants in the study, and table 1A for the demographics of the study population). We express the difference between case and control protein levels as a log2(OR), such that a positive value means the protein is higher in cases than in controls and a negative value means it is lower. For example, a value of 1 means the case average is higher than the control average by 1 SD of the controls. The volcano plot in figure 1A shows the OR and statistical significance of all measured antibodies for distinguishing cases from controls. Selection criteria of p≤0.015 and area under the curve (AUC)>0.60 yielded 78 antibodies, representing 74 unique proteins, that were higher in cases than in controls (four proteins were represented by two antibodies each; see online supplementary table S2 for the entire list and online supplementary table S3 for M value data). From a cancer biology perspective, both upregulated and downregulated markers may be important. However, we focused further biomarker confirmation only on the upregulated markers, since most, if not all, currently implemented cancer-related biomarkers are increased in cancers (see figure 2, Step 1).
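The discovery filter can be sketched as follows. AUC is estimated here with the Mann–Whitney statistic (the probability that a randomly chosen case exceeds a randomly chosen control, with ties counted as one half), which is one standard way to compute it. The p values are taken as given because the test used is not restated here, and the `results` layout is hypothetical.

```python
def auc(case_values, control_values):
    """Mann-Whitney estimate of AUC: P(case > control), counting ties as 1/2."""
    wins = 0.0
    for x in case_values:
        for y in control_values:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def select_markers(results, p_max=0.015, auc_min=0.60):
    """Keep upregulated markers that meet the discovery criteria.

    results: dict marker -> (p_value, log2_or, auc); hypothetical structure.
    """
    return sorted(
        name
        for name, (p, log2_or, a) in results.items()
        if p <= p_max and a > auc_min and log2_or > 0
    )
```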

Figure 1

Volcano scatter plots presenting proteomic expression and statistical significance of antibody array-analysed markers. Data showing case–control differences in protein levels as tested with 3600 antibodies using prediagnostic and diagnostic plasma sets. The x-axis represents the magnitude of change by log2 OR and the y-axis indicates statistical significance as −log10 of p value. A positive OR means that the marker was increased in cases with a filled circle indicating it reached statistical significance and area under the curve (AUC) above 0.60. Non-significant markers are indicated by an X and the significantly decreased marker by a –. (A) Data from prediagnostic samples consisting of 79 cases (draw <3 years prior to diagnosis) and 79 matched controls. A cut-off of p≤0.015 was applied for marker selection yielding 78 markers. (B) Performances of these 78 upregulated markers from A were tested in 120 cases (30 adenomas, 30 advanced adenomas, 30 stage I–II and 30 stage III–IV cancers) and 60 control diagnostic samples. A cut-off of p<0.05 was applied and significance and direction are indicated as in (A). Unfilled circles were significant by p values but had an AUC below 0.60.

Figure 2

A flow chart of the panel marker selection and confirmation processes reported in this study for early detection of adenoma and colon cancer. AUC, area under the curve; BAG4, BAG family molecular chaperone regulator 4; CHS, Cardiovascular Health Study; EDRN, Early Detection Research Network; EGFR, epidermal growth factor receptor; IHC, immunohistochemistry; IL6ST, interleukin-6 receptor subunit beta; PRoBE, prospective collection with retrospective blinded evaluation; sens, sensitivity; spec, specificity; SLeA, sialyl-Lewis A (CA19-9); SLeX, sialyl-Lewis X; UM, University of Minnesota; VWF, von Willebrand factor.

Confirmation of the prediagnostic markers via diagnostic plasma samples

The 78 markers found to be increased in the prediagnostic samples were re-examined by antibody array using samples supplied as part of an EDRN collaborative group project, comprising plasma from 30 adenoma, 30 AA, 30 stage I–II cancer, 30 stage III–IV cancer and 60 control individuals (see table 1B and figure 2, Step 2). Using a cut-off of p<0.05 and AUC>0.60 for an increase in all cases (adenomas and cancers) versus controls, we found a high rate of confirmation: 32 of the 78 markers (41%) passed the cut-off (figure 1B), about 16 times the number expected by chance (78 × 0.05 (cut-off) × 0.5 (increase only) ≈ 2 markers). Of the remaining 46 markers, only one showed a significant (p<0.05) decrease. Table 2 lists the 32 confirmed antibodies, which identify 31 unique proteins, with their statistical performance.
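The chance expectation quoted above can be reproduced, and extended to a binomial tail probability, with a short sketch. The tail calculation treats the 78 markers as independent tests, which they are not (many plasma proteins are correlated), and ignores the additional AUC>0.60 requirement, so the resulting probability is only a rough illustration.

```python
import math


def expected_by_chance(n_markers, alpha=0.05, direction_fraction=0.5):
    """Markers expected to pass p < alpha in the pre-specified direction under the null."""
    return n_markers * alpha * direction_fraction


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected = expected_by_chance(78)           # 1.95, ie, about 2 markers
fold = 32 / expected                        # about 16.4
tail = binomial_tail(32, 78, 0.05 * 0.5)    # assumes independent markers
print(f"expected={expected:.2f}, fold={fold:.1f}, P(X>=32)={tail:.1e}")
```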

Table 2

Performance of different proteins as early-detection plasma markers

Marker combinations and biomarker performance closer to diagnosis and at different colon sites

To find markers that complement each other's performance, we used logistic regression on the CHS prediagnostic samples to determine optimal 4-marker panels from the 32 confirmed markers (see figure 2, Step 3). A panel of BAG4, IL6ST and VWF combined with either EGFR or CD44 had an AUC of 0.81 or 0.79, with 40.9% or 42.4% sensitivity at 90% specificity, respectively (see table 3). Correlations between assay values for different pairs of panel members were relatively low (r²<0.3). The levels of each panel constituent (BAG4, IL6ST, VWF, EGFR and CD44) in the CHS and EDRN sample sets are shown as scatter plots in figure 3A, B. The performance of the 4-marker panels was then examined in the EDRN samples (figure 2, Step 4) using the coefficient values calculated from the CHS data. The panel consisting of BAG4, IL6ST, VWF and EGFR or CD44 yielded an AUC of 0.87 or 0.85, respectively, for all cases, and AUCs of up to 0.90 for cancers only (see table 3).
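Searching all 4-marker panels from 32 candidates means scoring 35 960 combinations. The sketch below shows such an exhaustive search, but swaps in a simpler technique than the one used here: each panel is scored by an equal-weight sum of standardised marker values rather than by fitted logistic-regression coefficients, and panels are ranked by sensitivity at 90% specificity. The dictionary data layout is hypothetical.

```python
import itertools
import math


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.90):
    """Sensitivity when the cut-off is set so that `specificity` of controls score at or below it."""
    ranked = sorted(control_scores)
    cutoff = ranked[math.ceil(round(specificity * len(ranked), 9)) - 1]
    return sum(s > cutoff for s in case_scores) / len(case_scores)


def best_equal_weight_panel(cases, controls, size=4):
    """Exhaustively score every `size`-marker panel with an equal-weight sum.

    cases, controls: dict marker -> standardised values, one entry per subject,
    in the same subject order for every marker (hypothetical data layout).
    Returns (panel, sensitivity) for the best panel; ties keep the first found.
    """
    best = None
    for panel in itertools.combinations(sorted(cases), size):
        case_scores = [sum(v) for v in zip(*(cases[m] for m in panel))]
        control_scores = [sum(v) for v in zip(*(controls[m] for m in panel))]
        sens = sensitivity_at_specificity(case_scores, control_scores)
        if best is None or sens > best[1]:
            best = (panel, sens)
    return best


print(math.comb(32, 4))  # number of 4-marker panels from 32 confirmed markers
```

Replacing the equal-weight sum with a per-panel logistic-regression fit would follow the approach described in the text more closely, at a higher computational cost.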

Table 3

ROC analysis of biomarker combination calculated from CHS and EDRN samples

Figure 3

Performance of prediagnostic marker panel members BAG family molecular chaperone regulator 4 (BAG4), interleukin-6 receptor subunit beta (IL6ST), von Willebrand factor (VWF), epidermal growth factor receptor (EGFR) and CD44 in prediagnostic and diagnostic sample sets. Adjusted M value plots with mean and SD indicated for BAG4, IL6ST, VWF, EGFR and CD44 show statistically significantly higher levels in the cases of (A) the Cardiovascular Health Study (CHS) and (B) Early Detection Research Network (EDRN) sample sets (*p<0.05; **p<0.01; ***p<0.001; ****p<0.0001).

To determine the performance of the markers in relation to proximity to cancer diagnosis, and hence their utility for early detection, we analysed the CHS data by time from blood draw to diagnosis (see online supplementary figure S2). Consistent with better marker performance as disease progresses from adenoma to early-stage and late-stage cancers in the EDRN set, the markers showed more statistically significant changes and better sensitivity closer to diagnosis in the CHS samples (table 3). Next, we determined whether our panel detected future cancer diagnoses similarly at different locations. When the CHS data were stratified into proximal colon, distal colon and rectal sites, our panel performed well for proximal and distal colonic cancers, but the signal for rectal cancers did not reach statistical significance, perhaps because of the smaller number of rectal cases (see online supplementary figure S3).

Western blot confirmation of BAG4, IL6ST, VWF, EGFR and CD44

First, the antibodies used for array discovery were tested to see if they yielded the appropriate bands via immunoblot. Briefly, six plasma samples (30 µg, IgG and albumin depleted) with known array M values for each marker were separated via gel electrophoresis, followed by blotting with the corresponding antibody. Predominant bands were detected at the expected molecular sizes for all antibodies (ie, 65, 103, 300, 140 and 80 kDa for BAG4, IL6ST, VWF, EGFR and CD44, respectively; see online supplementary figure S4). Importantly, for complex samples such as plasma, negative controls (no primary antibody) did not produce bands in these same regions.

The plasma samples used for western blotting confirmation (figure 2, Step 5) were collected prior to colonoscopy as part of a University of Minnesota-CPRU study. The set we examined included samples from clean colons (n=7), villous polyps (n=7), carcinoma in situ (n=7) and invasive cancer cases (n=6). After gel electrophoresis, immunoblotting and band densitometry, the mean BAG4 band intensities of villous adenoma (×5.6, p=0.0008) and invasive cancers (×2.8, p=0.0446) were significantly higher than those of the controls (figure 4). Carcinoma in situ showed a trend that was not statistically significant. Increased levels of IL6ST were confirmed in all three types: villous adenoma (×5.0, p=0.0008), carcinoma in situ (×5.2, p<0.0001) and invasive cancers (×6.5, p<0.0001). Increased VWF was confirmed in villous adenoma and invasive cancers (villous: ×1.5, p=0.0353 and invasive: ×2.1, p=0.0320). EGFR showed higher levels for carcinoma in situ and invasive cancer cases (carcinoma in situ: ×1.3, p=0.0437; invasive cancers: ×1.3, p=0.0023). CD44 was significantly increased in all three case subgroups: villous adenoma (×1.5, p=0.0399), carcinoma in situ (×1.3, p=0.0264) and invasive cancers (×1.3, p=0.0197).

Figure 4

Western blotting confirmation of increased levels of BAG family molecular chaperone regulator 4 (BAG4), interleukin-6 receptor subunit beta (IL6ST), von Willebrand factor (VWF), epidermal growth factor receptor (EGFR) and CD44. BAG4, IL6ST, VWF, EGFR and CD44 levels were examined in plasma from people with villous adenoma, carcinoma in situ or invasive cancer cases compared with controls (collected prior to colonoscopy). Albumin and IgG depleted plasma proteins (30 µg) were separated on a denaturing 4–12% Bis–Tris gel under reducing conditions. After immunoblot and primary antibody incubation, specific bands were visualised with a fluorescently labelled secondary antibody. Statistical significance: *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001.

Immunohistochemistry on colorectal cancer, adenoma and normal tissues

To determine whether the increased circulating levels of these five proteins were potentially tumour derived, we performed immunohistochemistry on TMAs (figure 2, Step 6). Colon tissue samples from 436 individuals with cancer, 263 with adenoma and 217 identified as having no neoplasm (control) were stained with antibodies to each of the five proteins and scored using the Allred system.28 BAG4, IL6ST and CD44 showed increased staining (p<0.0001) in adenomas compared with control tissues (figure 5; see online supplementary figure S5). BAG4 and CD44 staining were both significantly elevated in early-stage (I and II) cancer, and increased BAG4 staining was also confirmed in late-stage (III and IV) cancer. IL6ST staining was increased in cancers, but the difference was not statistically significant (p=0.0966). EGFR staining was higher in adenomas (p=0.0788) and in all cancers (p=0.1390) than in control individuals, but neither difference was statistically significant (not shown). VWF showed no epithelial staining, although vascular elements stained as expected.
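The Allred system adds a proportion score (0 to 5, from the fraction of positively stained cells) to an intensity score (0 none, 1 weak, 2 intermediate, 3 strong), giving a total between 0 and 8. A minimal sketch follows; how fractions falling exactly on the 1/100, 1/10, 1/3 and 2/3 boundaries are assigned varies between descriptions, so the boundary handling here is an assumption.

```python
def allred_proportion_score(fraction_positive):
    """Allred proportion score (0-5) from the fraction of positively stained cells.

    Boundary handling at exactly 1/100, 1/10, 1/3 and 2/3 is an assumption.
    """
    if fraction_positive <= 0:
        return 0
    if fraction_positive < 0.01:
        return 1
    if fraction_positive <= 0.10:
        return 2
    if fraction_positive <= 1 / 3:
        return 3
    if fraction_positive <= 2 / 3:
        return 4
    return 5


def allred_score(fraction_positive, intensity):
    """Total Allred score: proportion score plus intensity score (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return allred_proportion_score(fraction_positive) + intensity
```

For example, half of the cells staining with intermediate intensity scores 4 + 2 = 6.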

Figure 5

Colon tissue microarray (TMA) analysis of BAG family molecular chaperone regulator 4 (BAG4), interleukin-6 receptor subunit beta (IL6ST) and CD44 expression in adenoma and cancer tissues. Immunohistochemical staining of BAG4, IL6ST and CD44 was scored by Allred's method, and statistical significance was calculated between normal colon tissue and each case category. Tested tissue samples included 217 normal colon tissues, 263 adenomas and 57 stage I, 167 stage II, 128 stage III and 84 stage IV cancers. Statistical significance: *p<0.05; ***p<0.001; ****p<0.0001.

Sialyl Lewis-A and Lewis-X modification of EGFR and CD44 and their contribution to the marker panel

We assayed CHS and EDRN samples for sialyl Lewis-A and Lewis-X glycan modifications on BAG4, IL6ST, VWF, CD44 and EGFR (figure 2, Step 7). Only CD44 and EGFR showed increased sialyl Lewis-A and Lewis-X levels in cases compared with controls (online supplementary figure S6 shows the CD44 and EGFR glycan detection values). Moreover, adding glycan features improved adenoma and cancer detection in both the CHS and EDRN samples (compare figure 6A with D, left side). Adding EGFR-glycan or CD44-glycan content to the 4-marker panels increased adenoma/cancer detection for both prediagnostic (figure 6A and C, compare right with middle plots) and diagnostic cases (figure 6B and D, compare right with middle plots). The improvement from EGFR or CD44 glycosylation, stratified by adenoma and cancer subgroups for the EDRN samples, is shown in online supplementary figure S7, and AUCs and sensitivities at 90% specificity for the CHS and EDRN samples are given in table 3.
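Sensitivity at 90% specificity, the dotted line in the predictive-index plots, can be read as: set the cut-off so that 90% of control scores fall at or below it, then report the fraction of cases above it. The minimal sketch below illustrates this on hypothetical scores; the tie handling (controls equal to the cut-off count as negative) and the function name are assumptions, not the study's exact procedure.

```python
import math


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.90):
    """Return (threshold, sensitivity, achieved specificity).

    The threshold is the smallest control score such that at least
    `specificity` of controls score at or below it; cases strictly above
    the threshold are called positive.
    """
    ranked = sorted(control_scores)
    # Number of controls that must be called negative; rounding guards
    # against floating-point error in specificity * n.
    k = math.ceil(round(specificity * len(ranked), 9))
    threshold = ranked[k - 1]
    sensitivity = sum(s > threshold for s in case_scores) / len(case_scores)
    achieved = sum(s <= threshold for s in control_scores) / len(control_scores)
    return threshold, sensitivity, achieved


# Hypothetical predictive indices.
controls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
cases = [0.95, 1.2, 1.5, 0.5]

threshold, sens, spec = sensitivity_at_specificity(cases, controls)
print(threshold, sens, spec)
```

With small control groups the achievable specificities are discrete (here in steps of 10%), so the "90% specificity" operating point depends on how many controls are available.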

Figure 6

Performances of epidermal growth factor receptor (EGFR) and CD44 sialyl Lewis-A and Lewis-X glycan markers in prediagnostic and diagnostic sets of plasma. (A–D left graphs) Plots for the predictive indices of EGFR or CD44 protein and EGFR-sialyl Lewis-A and Lewis-X glycan levels or CD44-sialyl Lewis-A and Lewis-X glycan levels. (A–D middle graphs) Case predictive indices of the 4-marker panels. (A–D right graphs) Predictive indices after addition of EGFR or CD44-glycan to the 4-marker panels. The dotted line represents 90% specificity. Statistical significance: **p<0.01; ***p<0.001; ****p<0.0001. BAG4, BAG family molecular chaperone regulator 4; CHS, Cardiovascular Health Study; EDRN, Early Detection Research Network; IL6ST, interleukin-6 receptor subunit beta; VWF, von Willebrand factor.

Screening of circulating proteomic and glycomic markers in a PRoBE-compliant cohort

A modified Luminex assay was developed for high-throughput, multiplexed screening (figure 2, Step 8) of all five proteomic markers in 900 serum samples collected in Japan before colonoscopy and cancer diagnosis (figure 2, Step 9). These samples were also assayed on microarrays for sialyl Lewis-A and Lewis-X glycan features on CD44 and EGFR. All five proteins were significantly elevated (p<0.001) in sera from patients with colorectal cancer (CRC) compared with normal controls. Furthermore, all markers except BAG4 were significantly elevated (p<0.01) in CRC compared with both low-risk colon polyps and UC. BAG4 was significantly elevated (p<0.01) in CRC relative to colon polyps but did not differ significantly from UC (see online supplementary figure S8).

Given their individual performance, we combined all five proteins and the glycan modifications using optimal logistic regression (see Statistical methods section for rationale). For all cancers versus all control groups, this panel had an AUC of 0.86 and a sensitivity of 73.0% at 90% specificity (table 4). Performance was similar against each control group separately (figure 7A). When cancers were stratified by stage, sensitivity at 90% specificity increased with stage: 62.3%, 71.6%, 77.6% and 81.6% for stage I, II, III and IV cancer, respectively (figure 7B, table 4). For the 120 rectal cancers compared with 168 normal controls, sensitivity at 90% specificity was 72.5%, suggesting that the panel performs well for tumours of both the colon and the rectum.
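Combining markers by logistic regression means fitting one weight per marker (plus an intercept) by maximum likelihood and using the fitted probability as the case predictive index. The sketch below is a generic, dependency-free illustration using batch gradient descent on simulated, standardised marker levels for two hypothetical markers; it is not the authors' "optimal" procedure, whose details are given in the Statistical methods section, and all names, data and hyperparameters are assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, n_iter=1000):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predictive_index(x, w, b):
    """Fitted probability of being a case for one marker vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


rng = random.Random(7)
# Hypothetical standardised levels of two markers; cases shifted up by 1 SD.
controls = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
cases = [[rng.gauss(1, 1), rng.gauss(1, 1)] for _ in range(100)]
X = controls + cases
y = [0] * 100 + [1] * 100

w, b = fit_logistic(X, y)
print("weights:", [round(v, 2) for v in w], "intercept:", round(b, 2))
```

Standardising markers first keeps gradient descent stable and makes the fitted weights comparable across markers measured on different scales; in practice a library implementation of logistic regression would normally be used instead of this hand-written loop.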

Table 4

ROC analysis of biomarker combination in Japanese cohort

Figure 7

Performance of a 5-protein panel by receiver operating characteristic (ROC) curves in Japanese serum sample cohort. (A) ROC curve of all colorectal cancers versus different control groups. (B) ROC curve of cancers separated by stage versus all controls.

Discussion

In this study, antibody array analyses of prediagnostic colon cancer and control plasma samples yielded 78 potential colon cancer early-detection markers, 32 of which were confirmed in control, colon adenoma and cancer diagnostic samples. Using the prediagnostic sample data, we identified optimal panels of BAG4, IL6ST, VWF and either EGFR or CD44. Testing these panels on 60 colon cancer and 60 adenoma samples versus 60 controls from the EDRN diagnostic sample set showed good sensitivity and specificity. Increased levels of the five individual markers were then confirmed by western blotting in a third independent colon adenoma and cancer sample set, giving further confidence in the protein identifications and panel performance. All panel members were further confirmed in a fourth independent, PRoBE-compliant sample set. Using the multidimensional assay capability of antibody arrays,29 we found that CD44 and EGFR carry sialyl Lewis-A and Lewis-X in patients with adenoma and cancer, and that including these glycan data increased the sensitivity of the panel.

All five proteins are altered during colon carcinogenesis. In the publicly available Cancer Genome Atlas (TCGA) gene expression data for 143 colon cancer cases and 19 controls, BAG4, CD44 and VWF showed significantly elevated expression (p<0.0001, p<0.0001 and p=0.001, respectively). Single nucleotide polymorphism arrays showed DNA copy number variation for VWF and EGFR in 54.2% (p<0.001) and 76.3% (p<0.001), respectively, of the 413 available colon cancer cases compared with 462 controls. Our immunohistochemical staining of independent tissue samples confirmed elevation of three of the five markers (BAG4, IL6ST and CD44) in epithelial tumour tissue compared with normal tissue. VWF and CD44 are secreted, and IL6ST and EGFR are plasma membrane proteins with forms found in plasma, whereas BAG4 is normally expressed in the nucleus. Its presence in blood could therefore reflect apoptosis, aberrant protein export or increased or differential sequestration of BAG4 into exosomes.30

We note that four of the five panel members are involved in antiapoptotic cell survival signalling pathways. CD44, known to be overexpressed in colon cancer relative to autologous normal colon, promotes resistance to apoptosis,31 and the best-performing CD44 antibody in all four sample sets was directed against variant 3, which has been shown to be specifically upregulated in colon cancer.32 Both IL6ST and EGFR activate STAT3, which then binds to the promoter region of the BAG4 gene and increases its expression.33 A STAT3 antibody on our array showed a statistically significant increase in the EDRN sample set (OR=0.49, p=0.0008; all cases vs controls). BAG4 prevents cell death signalling.34 Our results suggest that BAG4 may be overexpressed via IL6ST-triggered and EGFR-triggered STAT3 activation. In support of this concept, we observed increased levels of IL12RB2, the heterodimeric partner of activated IL6ST, in the EDRN diagnostic samples (OR=0.55, p=0.0003; all cases vs controls), and STAT3 phosphorylation has been shown to be required for activation in colon cancer.35 EGFR overexpression is common in colon cancer, whereas in several other cancers EGFR more commonly carries mutations at phosphorylation sites.36

The biological relationships among the markers described above are also reflected in their performance for adenoma and carcinoma. To assess specificity for colon cancer, we have used our array to examine sample sets for lung (unpublished), breast37 and pancreatic12 cancers. BAG4 and IL6ST were not increased in any of these, and neither has been confidently associated with any other disease.38 The other markers have reported disease associations, but because CD44 variant 3 has shown colon cancer specificity,32 we hypothesise that these three proteins (BAG4, IL6ST and CD44 variant 3) will confer colon cancer specificity on the panel, which we plan to test in future studies using samples from people with a wide variety of GI conditions and other cancers.

Our findings identify a protein/glycomic marker panel whose performance compares well with existing blood and faecal tests for early detection of colon cancer. For comparison, we measured CEA in the EDRN samples by ELISA; at 90% specificity, CEA detected cancer with 38% sensitivity (AUC=0.66) and adenoma with 15% sensitivity (AUC=0.50). Combining our panel with CEA in the subset of the Japanese cohort for which CEA values were available modestly improved sensitivity for cases versus all controls (76.7% vs 73.0% at 90% specificity). This improvement was driven primarily by stage III and IV cancers, whose sensitivities increased by 8.2% and 12.6%, respectively. When the markers in the Japanese samples were combined with equal rather than optimal weights, the AUC decreased from 0.86 to 0.78, confirming the utility of the optimally weighted panel.
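Why weighting matters can be illustrated with a small simulation: when markers differ in informativeness, an equal-weight sum dilutes the strong marker with noise from the weak one. The sketch below computes AUC through the Mann-Whitney U statistic (average ranks for ties) for an equal-weight and a weighted linear combination of two simulated markers. The data, shifts and names are hypothetical; weights proportional to the mean shifts are the optimal linear combination only under the stated assumption of independent, unit-variance Gaussian markers, and this is not a reanalysis of the Japanese cohort.

```python
import random


def auc_mann_whitney(case_scores, control_scores):
    """AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    labelled = sorted([(s, 1) for s in case_scores] + [(s, 0) for s in control_scores])
    ranks = [0.0] * len(labelled)
    i = 0
    while i < len(labelled):
        j = i
        while j + 1 < len(labelled) and labelled[j + 1][0] == labelled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_cases, n_controls = len(case_scores), len(control_scores)
    case_rank_sum = sum(r for r, (_, label) in zip(ranks, labelled) if label == 1)
    u = case_rank_sum - n_cases * (n_cases + 1) / 2
    return u / (n_cases * n_controls)


def combine(markers, weights):
    """Linear predictive index: weighted sum of marker values."""
    return sum(w * m for w, m in zip(weights, markers))


rng = random.Random(42)
# Hypothetical simulation: marker 1 strongly informative (1.5 SD shift in
# cases), marker 2 weakly informative (0.2 SD shift); independent, unit variance.
shifts = (1.5, 0.2)
n = 1000
controls = [[rng.gauss(0, 1) for _ in shifts] for _ in range(n)]
cases = [[rng.gauss(d, 1) for d in shifts] for _ in range(n)]

equal = (1.0, 1.0)
weighted = shifts  # optimal direction under the Gaussian assumption above


def panel_auc(weights):
    return auc_mann_whitney([combine(x, weights) for x in cases],
                            [combine(x, weights) for x in controls])


equal_auc = panel_auc(equal)
weighted_auc = panel_auc(weighted)
print(f"equal weights AUC={equal_auc:.3f}, weighted AUC={weighted_auc:.3f}")
```

Under these assumptions the population AUCs are about 0.80 for equal weights and about 0.86 for shift-proportional weights, so the simulated estimates should land near those values.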

Published values for faecal occult blood testing show detection of cancers with 50% sensitivity and of larger (>1 cm) polyps with 17–46% sensitivity at 98.0% specificity.39,40 FIT detects cancer with 73.8% sensitivity, advanced precancerous lesions with 23.8% sensitivity and high-grade dysplasia with 46.2% sensitivity, all at 94.9% specificity.3 A recently FDA-approved test that combines a FIT component with DNA mutation and methylation analysis detects cancer with 51.6–92.3% sensitivity, larger (>1 cm) sessile serrated polyps with 42.4% sensitivity and 1–5 cm adenomas with 82.0% sensitivity, at optimal specificities ranging from 86.6% to 94.4%.3,41 The blood test for SEPT9 DNA has 48.2% and 11.2% sensitivity for colon cancer and adenoma, respectively, at 91.5% specificity.10

Adenomas carry different long-term risks depending on histological type, size and number. AAs and tubulovillous and villous adenomas carry a higher risk of cancer mortality than small and tubular adenomas.42 Larger (≥1 cm) polyps warrant increased surveillance and/or excision, whereas smaller adenomas do not trigger increased surveillance.43 In our study, the marker panel detected both adenomas and cancers. The panel's high sensitivity for adenoma in the EDRN set was probably because it was initially derived from the CHS prediagnostic samples, in which early stages of disease predominate. Indeed, many panels with superior performance for stage II–IV cancer could have been derived from the diagnostic data, but they would have performed poorly for earlier stages. Thus, as previous studies have indicated,44 we would argue that prediagnostic samples from subjects whose disease status is unknown at the time of blood draw are both the most appropriate samples for discovery and a good model of the performance required in a general population screen.37,45

Sialyl Lewis-A (CA19-9) is the primary biomarker used for surveillance of pancreatic and other GI cancers. Lewis antigen production is genetically controlled, and Lewis-negative individuals (10% of the Caucasian population46) do not produce sialylated Lewis antigens even when a large tumour is present.47 Although the mechanism is not fully understood, marker concentration is influenced by the patient's secretor status (FUT2 gene) and Lewis genotype (FUT3 gene).48 We found that many subjects with adenoma or cancer overexpressed sialyl Lewis-A and/or Lewis-X on EGFR and CD44, particularly in the Japanese cohort. Both EGFR and CD44 have been reported to be highly fucosylated and sialylated in cancer,49,50 consistent with increased levels of sialyl Lewis-A and Lewis-X.

In conclusion, the current study identifies a panel of colon cancer early-detection markers with high sensitivity for adenoma, AA and colon cancer. A strength of this report is that upregulation of all five panel members was confirmed in four different sample sets, three of them PRoBE-compliant, using multiple assay techniques (ie, antibody array, immunoblot and Luminex assays). However, several issues must be addressed before these results can be translated into a clinically useful test. Panel performance must be tested on samples from people with many other diseases to determine its specificity for colon adenoma and cancer, and the assays must be converted to a highly quantitative, high-throughput platform. Furthermore, the choice of operating sensitivity and specificity will need to incorporate cost–benefit analysis to ensure that the additional colonoscopies prompted by false-positive results are justified. For this analysis, the performance of the panel should be compared with FIT and other faecal tests. Colonoscopy and sigmoidoscopy have the advantage that polyps can be removed during the procedure, but they are expensive and invasive. If validated, the proteomic/glycomic test could be performed in most clinics at an annual check-up alongside other blood tests. Because these assays should be readily convertible to common ELISA-based autoanalyzer platforms, such a test would have the considerable advantages of being relatively non-invasive and inexpensive compared with Cologuard and colonoscopy.

References


Footnotes


  • Funding This work was funded in part by grants U01 CA152746 (PDL and SMH), U01 CA152637 (CIL and PDL) and U01 CA086400 (DEB) from the National Institutes of Health as part of the EDRN, grant P50 CA130810 (GI SPORE; DEB), the Kutsche Family Memorial Chair in Internal Medicine (DEB) and the Geriatric Research Education and Clinical Center at the Ann Arbor VA Medical Center. Assaying of the Japanese sample cohort was funded in part by Wako Diagnostics.

  • Competing interests Fred Hutchinson Cancer Research Center has filed patent applications on the results of this study. HY is an employee of Wako Life Sciences, Inc.

  • Ethics approval Fred Hutchinson Cancer Research Center institutional review board.

  • Provenance and peer review Not commissioned; externally peer reviewed.
