Original article
Gene expression profiling-derived immunohistochemistry signature with high prognostic value in colorectal carcinoma
  1. Wenjun Chang1,
  2. Xianhua Gao2,
  3. Yifang Han1,
  4. Yan Du1,
  5. Qizhi Liu2,
  6. Lei Wang3,
  7. Xiaojie Tan1,
  8. Qi Zhang1,
  9. Yan Liu1,
  10. Yan Zhu4,
  11. Yongwei Yu4,
  12. Xinjuan Fan3,
  13. Hongwei Zhang1,
  14. Weiping Zhou5,
  15. Jianping Wang3,
  16. Chuangang Fu2,
  17. Guangwen Cao1
  1. 1Department of Epidemiology, Second Military Medical University, Shanghai, China
  2. 2Department of Colorectal Surgery, Changhai Hospital, Second Military Medical University, Shanghai, China
  3. 3Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
  4. 4Department of Pathology, Changhai Hospital, Second Military Medical University, Shanghai, China
  5. 5Department of Surgery, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, China
  1. Correspondence to Professor Guangwen Cao, Department of Epidemiology, Second Military Medical University, 800 Xiangyin Rd., Shanghai 200433, China; gcao{at}smmu.edu.cn

Abstract

Objective Gene expression profiling provides an opportunity to develop robust prognostic markers of colorectal carcinoma (CRC). However, the markers have not been applied for clinical decision making. We aimed to develop an immunohistochemistry signature using microarray data for predicting CRC prognosis.

Design We evaluated 25 CRC gene signatures in independent microarray datasets with prognosis information and constructed a subnetwork using signatures with high concordance and repeatable prognostic values. Tumours were examined immunohistochemically for the expression of network-centric and the top overlapping molecules. Prognostic values were assessed in 682 patients from Shanghai, China (training cohort) and validated in 343 patients from Guangzhou, China (validation cohort). Median follow-up duration was 58 months. All p values are two-sided.

Results Five signatures were selected to construct a subnetwork. The expression levels of GRB2, PTPN11, ITGB1 and POSTN in cancer cells, each significantly associated with disease-free survival, were selected to construct an immunohistochemistry signature. Patients were dichotomised into high-risk and low-risk subgroups with an optimal risk score (1.55). Compared with low-risk patients, high-risk patients had shorter disease-specific survival (DSS) in the training (HR=6.62; 95% CI 3.70 to 11.85) and validation cohorts (HR=3.53; 95% CI 2.13 to 5.84) in multivariate Cox analyses. The signature better predicted DSS than did tumour-node-metastasis staging in both cohorts. In those who received postoperative chemotherapy, high-risk score predicted shorter DSS in the training (HR=6.35; 95% CI 3.55 to 11.36) and validation cohorts (HR=5.56; 95% CI 2.25 to 13.71).

Conclusions Our immunohistochemistry signature may be clinically practical for personalised prediction of CRC prognosis.

  • Colorectal Cancer

Significance of this study

What is already known on this subject?

  • Gene expression signatures predicting the prognosis of colorectal cancer (CRC) have been continually developed in the past 10 years.

  • None of the previously reported CRC signatures has been applied in clinical settings possibly because of low reproducibility.

  • Although immunohistochemistry (IHC) signatures have been tested for CRC prognosis, none of them was developed by integrating current gene signatures.

What are the new findings?

  • Of 25 gene signatures evaluated, five were verified to have higher consistency and/or better performance in three independent combined microarray cohorts.

  • The five signatures were integrated to construct a functional subnetwork, from which three network-centric molecules were identified to have prognostic value in CRC.

  • A four-molecule IHC signature derived from the above analysis had a higher prognostic value than tumour-node-metastasis staging in our training and validation cohorts.

How might it impact on clinical practice in the foreseeable future?

  • Our four-molecule IHC signature will be more clinically practicable than previously reported gene signatures due to its convenience and high reproducibility in the clinical settings.

  • Signature genes identified might be potential therapeutic targets for CRC recurrence.

Introduction

Colorectal carcinoma (CRC) is the third most commonly diagnosed cancer in men and the second in women worldwide.1 Surgery remains the mainstay of curative treatment. However, a subset of patients will develop local recurrences and metachronous metastases after resection of the primary tumour. To properly address postoperative surveillance and treatment, it is necessary to develop prognostic markers to characterise the heterogeneity of CRC. The tumour-node-metastasis (TNM) staging system, based on the anatomical extent of the disease, is a well-established prognostic system. However, its prognostic value has been recently challenged.2 There have been many ‘prognostic marker’ studies aiming to improve the prognostic prediction of the TNM system. However, most proposed biomarkers for CRC are not clinically implemented due to lack of reproducibility and/or standardisation.3 Gene expression profiling has provided an opportunity to understand the diversity of cancers and shown great promise in accurate prediction of prognosis and therapeutic response, paving the way for personalised medicine.4 In CRC, a substantial number of pioneering studies have been conducted on prognosis-related global gene expression in tumours.5–27 These studies have identified gene signatures that are prognostic, predictive or both for CRC patients. However, none of these signatures has been adopted in clinical practice, possibly because of low reproducibility. A meta-analysis indicated that most signatures showed a significant association with prognosis in their training datasets but none of the signatures performed satisfactorily when the prediction ability was assessed in independent datasets.28 The heterogeneity in cell populations of tumour mass might dilute the prognosis molecular signal. These gene signatures only slightly overlap in gene identity, which was perplexing with regard to their clinical application. 
This lower-than-expected overlap is likely due to differences in patient cohort, microarray platform and data-mining method. Therefore, it is beneficial to know if these gene signatures with little overlap can be connected together to form a molecular network with robust stratification power and high stability in various CRC cohorts. Current methods of gene expression profiling usually require frozen tissues for analysis. However, most available samples are formalin-fixed paraffin-embedded (FFPE), which has been the standard storage method for decades. Immunohistochemistry (IHC) using FFPE specimens is highly available in medical centres because of its convenience, low cost and high reproducibility. Hence, IHC using FFPE specimens could be an important step in translating the findings of current gene expression profiling. Here, we applied a systematic approach to evaluate the robustness of CRC-related signatures and construct a subnetwork by comparing and integrating currently available microarray data. Based on the network-derived and top overlapping molecules, we developed a powerful IHC signature with high prognostic value in CRC.

Methods

Selection of candidate prognostic biomarkers

We searched PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) for gene expression signatures concerning CRC prognosis, published up to 31 December 2012. We also constructed an inhouse CRC prognosis-related gene signature (see online supplementary table S1) using microarray datasets GSE28702 and GSE5206.29 ,30 CRC microarray datasets with prognosis information were retrieved from GEO (http://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress/). After excluding datasets with small sample size and duplications, we combined the datasets of the same platforms. After gene identity mapping, prognostic values of gene signatures were reassessed separately in each combined microarray dataset using a modified nearest template prediction method.31 Genes from robust signatures were mapped and imported to NetBox (http://cbio.mskcc.org/tools/netbox/) that queried the human protein–protein interaction network for interactions between effective linkers and seeds to construct a functional subnetwork. This procedure is detailed in online data supplement. The top overlapping molecule and the network-centric molecules were selected for subsequent study.

Study patients

We obtained pathologically proven FFPE specimens of 1097 stage I–III CRC patients with typical adenocarcinoma histology. Of these, 706 received curative surgery in Changhai Hospital, Second Military Medical University (Shanghai, China) between January 2001 and December 2009 and 391 received curative surgery in The Sixth Affiliated Hospital, Sun Yat-sen University (Guangzhou, China) between January 2000 and January 2006. Baseline information for each specimen donor, including age, gender, disease location, TNM staging at surgery and rule-based postoperative chemotherapy (FOLFOX regimen), was documented. TNM staging was reclassified according to the American Joint Committee on Cancer staging manual (seventh edition). Flow diagram and selection criteria of study patients for survival analysis are presented in online supplementary figure S1. We also acquired FFPE specimens of mucosa from 51 haemorrhoids patients, of adenoma from 51 polyps patients and freshly frozen tumour tissues from 96 stage I–III CRC patients who received surgery in Changhai Hospital. All participants were self-reported Han Chinese. This study was approved by the institutional review boards of Changhai Hospital and The Sixth Affiliated Hospital. Written informed consent was obtained from each patient.

Quantitative RT-PCR, RNA silencing and western blot

Relative mRNA levels of GRB2 (growth factor receptor-bound protein 2), PTPN11-encoded SHP2 (Src homology phosphotyrosine phosphatase 2), ITGB1 (integrin β1) and POSTN (periostin) in freshly frozen CRC tissues were measured using quantitative reverse transcription (RT)-PCR. The primers and PCR conditions are presented in online supplementary table S2. The frozen CRC tissues with mRNA concentration gradients of the four molecules were randomly selected for protein analysis. Relative protein levels of the four molecules in the frozen CRC tissues and human CRC cell lines were measured using western blot. Western blot with siRNA transfection was also performed to evaluate the specificity of antibodies. These assays are detailed in online data supplement.

Immunohistochemistry

Tissue microarrays (TMAs) containing the specimens from Changhai Hospital were commercially developed (Outdo Biotech, Shanghai, China) and those from The Sixth Affiliated Hospital were developed conforming to the guidelines.32 ,33 The construction of tissue microarrays is detailed in online data supplement. IHC was carried out in Pathology Core Laboratory of Changhai Hospital. Rabbit monoclonal antibodies to GRB2 (1:150, #1517-1, Epitomics, Burlingame, California, USA), mouse monoclonal antibody to ITGB1 (1:50, ab3167, Abcam, Cambridge, UK), and rabbit polyclonal antibodies to human POSTN (1:500, ab14041, Abcam) and SHP2 (1:100, #AP8471e, Abgent, San Diego, California, USA) were used for IHC according to protocols provided by the manufacturers. Intratumoural heterogeneity of these antibodies was assessed by examining eight randomly selected spots of each whole mount from the remaining blocks of specimens originally used for the development of TMA from 10 patients. The procedures of IHC and the intratumoural heterogeneity assays are presented in online data supplement. Scores were independently assessed by eight researchers including two pathologists (YZ, YY) blinded to clinical data as previously described.34 Briefly, staining intensity was graded as 0 (negative), 1 (weak), 2 (moderate) and 3 (strong); staining extent was graded as 0 (0%–4%), 1 (5%–24%), 2 (25%–49%), 3 (50%–74%) and 4 (≥75%). Values of the intensity and the extent were multiplied as an immunoreactive score. The criteria of IHC scoring for the four proteins were first determined by the two pathologists and followed by the others. There was a close agreement on staining intensity (81%) and staining extent (85%) between the two pathologists. Disagreements were resolved by consensus.
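The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' scoring software: the function names are hypothetical, and the sketch assumes that a spot with exactly 75% positive cells falls in the top extent band.

```python
def extent_grade(percent_positive: float) -> int:
    """Map the percentage of positively stained cells to the 0-4 extent grade."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive < 5:
        return 0
    if percent_positive < 25:
        return 1
    if percent_positive < 50:
        return 2
    if percent_positive < 75:
        return 3
    return 4  # assumption: exactly 75% belongs to the top band


def immunoreactive_score(intensity: int, percent_positive: float) -> int:
    """Multiply the intensity grade (0-3) by the extent grade (0-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return intensity * extent_grade(percent_positive)
```

Because the score is a product of two small integers, it can only take the values 0, 1, 2, 3, 4, 6, 8, 9 and 12.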

Follow-up and survival analysis

Follow-up exam was performed at our outpatient clinics every 3–6 months or outpatient clinics of local hospitals whenever the patients had related symptoms and/or syndromes. Median follow-up time was 58 months (IQR 30–78). At the follow-up exam, serum levels of carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA19-9) were measured and an abdominal ultrasonography was performed for all patients. For those suspected of CRC relapse, CT, MRI and/or colonoscopy were conducted to confirm the diagnosis. The final date of follow-up was 10 January 2012 for patients from Changhai Hospital (the Shanghai cohort) and 5 August 2010 for patients from The Sixth Affiliated Hospital (the Guangzhou cohort).

Patients with intact IHC data were included in survival analysis. Our primary outcome of interest was disease-free survival (DFS). DFS was defined as months from the date of receiving surgery to the date of first relapse. Patients who experienced second primary tumours of other histotypes were counted as censored in the DFS analysis. Disease-specific survival (DSS) was measured in months from the date of receiving surgery to the date that patient died of CRC. We identified each biomarker whose immunoreactive score was significantly associated with DFS, then selected an optimal combination of molecules with high prognostic values to form an IHC signature using the samples of patients from the Shanghai cohort as a training set. Prognostic value of the signature was subsequently validated in the patients from the Guangzhou cohort as an external validation set.
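As a rough illustration of how the DFS definition translates into analysable data, the sketch below turns three dates into a (months, event) pair. The function name and the average month length of 365.25/12 days are assumptions made for this example; the source does not state how months were computed.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month length


def dfs_record(surgery: date, relapse: Optional[date],
               last_follow_up: date) -> Tuple[float, bool]:
    """Return (months from surgery, event flag) for one patient.

    event is True when a relapse was observed on or before the last
    follow-up date; otherwise the patient is censored at last follow-up.
    """
    if relapse is not None and relapse <= last_follow_up:
        end, event = relapse, True
    else:
        end, event = last_follow_up, False
    if end < surgery:
        raise ValueError("end date precedes the surgery date")
    return (end - surgery).days / DAYS_PER_MONTH, event
```

The same shape of record works for DSS by passing the date of death from CRC in place of the relapse date.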

Statistical analysis

Pearson's r test was applied to evaluate the correlations of mRNA levels with protein levels of selected molecules in freshly frozen CRC tissues. Spearman's r test was used to evaluate the correlations of immunoreactive score with TNM stage. Intratumoural heterogeneity of IHC scores was assessed separately for each marker by calculating the coefficient of variation (CV). To evaluate the effects of each marker and their combinations on the prediction of DFS in the training cohort, a lasso penalised multivariate Cox proportional hazards model was employed to perform the variable selection and shrinkage, and a leave-one-out cross-validation method was used to calculate the regression coefficient of each marker and the cross-validation partial likelihood of each formula.35 Only markers with a non-zero regression coefficient were used to form a risk score formula, with the regression coefficient as a weight. A receiver operating characteristics (ROC) curve was also used to compare the validity of each formula in predicting the 5-year DFS. The formula with maximal partial likelihood and area under the ROC curve (AUC) was considered the best formula. The cut-off point of the best formula, which could partition patients into high-risk and low-risk subgroups, was optimised using X-tile software (http://medicine.yale.edu/lab/rimm/research/software.aspx).36 χ2 test, Fisher's exact test, Student's t test and Mann–Whitney U test were used to determine differences in clinicopathological variables between high-risk and low-risk subgroups in both cohorts. Kaplan–Meier analysis with log-rank test was used to estimate DFS and DSS. Multivariate Cox regression analysis was performed to determine the contribution of the IHC signature to survival, adjusting for age, gender, disease location, TNM stage, tumour differentiation grade, lymph nodes examined, serum CEA, serum CA19-9 and adjuvant chemotherapy.
All statistical analyses were two-sided and conducted using R (http://www.r-project.org/) and SPSS V.16.0.2 for Windows (SPSS, Chicago, Illinois, USA). Significance was defined as p<0.05.
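For readers unfamiliar with the Kaplan–Meier estimate mentioned above, the following is a minimal pure-Python sketch of the product-limit calculation. It is not the R or SPSS routine used in the study; the function name and the toy inputs are hypothetical, and the log-rank test is not implemented.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True for an observed event and False for a censored
    observation. Patients censored at an event time count as at risk there.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        observed = 0
        leaving = 0
        # Group all patients sharing the same time point.
        while i < len(order) and order[i][0] == t:
            if order[i][1]:
                observed += 1
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

With five toy patients followed for 5, 8, 8, 12 and 20 months, of whom the first, third and fourth had an event, the estimate steps down to 0.8, 0.6 and 0.3 at months 5, 8 and 12.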

Results

The top overlapping and network-centric molecules

We initially included 24 published signatures (see online supplementary table S3)5–27 and an inhouse signature. The 25 signatures contained a total of 1708 unique genes and none of them appeared in all signatures. The top overlapping genes in the 25 signatures were POSTN (six times), followed by CYP1B1 and SPP1 (five times), 23 genes (three times), and 155 genes (twice). We excluded five signatures without information of gene expression direction.7 ,9 ,11 ,12 ,26 The remaining 20 gene signatures were subjected to nearest template prediction analyses in each of three combined microarray datasets from Affymetrix platforms (see online supplementary tables S4 and S5). Twelve of the 20 signatures had positive prevalent prediction in at least one combined dataset, and six (Laiho's, Oh's, Jorissen's, Popovici's, Arango's and inhouse) had a high prevalent prediction (>8%) in three combined datasets (see online supplementary table S6). The concordance analysis indicated that five signatures (Jorissen's, Oh's, Popovici's, Laiho's and inhouse) correlated with each other (Cramer's V values of >0.40) in the three combined datasets (see online supplementary table S7). Laiho's, Oh's, Popovici's and inhouse signatures significantly predicted poor survival(s) in the U133Plus2 dataset, while Jorissen's signature significantly predicted poor survival(s) in U133A dataset (see online supplementary figure S2). Multivariate Cox analysis with age, gender and tumour stage as covariates showed that each of the five signatures (Jorissen's, Oh's, Popovici's, Laiho's and inhouse) significantly predicted poor survival(s) of CRC (see online supplementary table S8).
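The concordance measure used above, Cramér's V, can be computed from a contingency table of the class calls made by two signatures on the same samples. The sketch below is illustrative only: the function name and the example counts are hypothetical, and it assumes the plain chi-squared statistic without continuity correction, which the source does not specify.

```python
from math import sqrt
from typing import Sequence


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Cramér's V for an r x c contingency table of counts."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    k = min(rows, cols) - 1
    if n <= 0 or k <= 0:
        raise ValueError("table needs positive counts and at least 2 x 2 cells")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))
```

For a hypothetical 2 x 2 table in which two signatures agree on 80 of 100 samples ([[40, 10], [10, 40]]), V is 0.6, which would exceed the 0.40 threshold quoted in the text.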

The five signatures contained 508 unique genes, of which one (POSTN) overlapped four times. With these genes, we constructed a subnetwork consisting of 24 linkers and 118 seeds connected by 270 edges with the shortest path threshold of two and p<0.05. Gene function enrichment analysis showed that the subnetwork was enriched with molecules functionally related to cancer invasiveness (see online supplementary table S9). GRB2, PTPN11, ITGB1, FN1, VEGFA and CD4, with higher degrees (>12 neighbours) and betweennesses than others, were network-centric proteins (figure 1). FN1 is an extracellular matrix protein whose expression differs among normal tissues from the right colon, left colon and rectum as well as between adenoma and normal mucosa.37 VEGFA, a major proangiogenic factor, is expressed prominently in adjacent parenchyma.38 CD4 is the hallmark of T helper cells. Thus, FN1, VEGFA and CD4 are not ideal for establishing a CRC prognosis-related IHC signature with molecules predominantly expressed in cancer cells. We then selected GRB2, PTPN11, ITGB1 and POSTN for subsequent study.
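The degree criterion used to pick network-centric proteins can be illustrated with a short sketch that counts distinct neighbours in an undirected edge list. This is not NetBox's implementation; the function names are hypothetical, betweenness is not computed here, and the node labels in the usage note are placeholders rather than real interactions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct neighbours of each node in an undirected network."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def high_degree_nodes(edges: Iterable[Tuple[str, str]],
                      min_neighbours: int = 12) -> List[str]:
    """Return nodes with more than min_neighbours neighbours, sorted by name."""
    degrees = node_degrees(edges)
    return sorted(node for node, d in degrees.items() if d > min_neighbours)
```

For example, a placeholder node "HUB" linked to 13 other nodes would be returned by high_degree_nodes, whereas a node with only two neighbours would not.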

Figure 1

Structure of functional subnetwork constructed with molecules of five colorectal carcinoma prognosis-related microarray signatures. Molecules in the five robust signatures (Oh's, Jorissen's, Laiho's, Popovici's and inhouse) were used to construct a subnetwork which was connected by 24 linker (rectangle) genes and 118 seed (round circle) genes. Bigger size and red colour direction indicate higher degree and higher betweenness, respectively.

Correlation between mRNA and protein expression levels of the selected genes in CRC tissues

In the 30 frozen CRC tissues with mRNA concentration gradients of the four selected molecules, protein expression levels of these molecules significantly correlated to their corresponding mRNA levels (GRB2: r=0.631, p<0.001; ITGB1: r=0.425, p=0.019; POSTN: r=0.502, p=0.005; PTPN11: r=0.654, p<0.001). The expression patterns of the four molecules in the frozen CRC tissues were generally consistent with those in human CRC cell lines as detected using western blot. Western blot and siRNA assays supported the specificities of antibodies to the four markers in CRC cell lines. These data are shown in online supplementary figure S3.

Expression patterns of candidate molecules in colorectal specimens

Typical immunostainings are shown in figure 2. POSTN was expressed in the extracellular matrix and the nuclei of epithelial and mesenchymal cells in mucosa and adenoma tissues but was expressed strongly in the cytoplasm of cancer cells and infiltrated fibroblasts in CRC specimens. GRB2 immunoreactivity was mainly nuclear in normal epithelial cells and cytoplasmic and nuclear in cancer cells in CRC specimens. Strong ITGB1 immunoreactivity was found in the cell membrane and cytoplasm of cancer cells in CRC specimens, although positive staining of ITGB1 was also observed on the membrane of some epithelial cells in the benign samples. SHP2 immunoreactivity was mainly nuclear in normal epithelial cells and weakly expressed in epithelial cells in adenoma and CRC tissues. The CV (IQR) for the intratumoural heterogeneity in immunoreactive scores of ITGB1, POSTN, GRB2 and SHP2 in whole mounts were 15.5% (10.8%–21.5%), 15.0% (11.5%–18.8%), 21.0% (3.3%–28.3%) and 14.0% (10.3%–26.5%), respectively.
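The heterogeneity figures above rest on a per-patient coefficient of variation across spots. A minimal sketch of that calculation follows; the function names and the example scores are hypothetical, and the use of the sample standard deviation (n-1 denominator) and of the median across patients are assumptions, since the source does not give these details.

```python
from statistics import mean, median, stdev
from typing import List, Sequence


def coefficient_of_variation(scores: Sequence[float]) -> float:
    """CV in percent: sample standard deviation divided by the mean."""
    average = mean(scores)
    if average == 0:
        raise ValueError("CV is undefined when the mean score is zero")
    return 100 * stdev(scores) / average


def median_cv(per_patient_scores: List[Sequence[float]]) -> float:
    """Summarise one marker as the median CV across patients."""
    return median(coefficient_of_variation(s) for s in per_patient_scores)
```

For eight hypothetical spot scores of 8, 8, 12, 12, 8, 12, 8 and 12 from one whole mount, the CV is about 21.4%.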

Figure 2

Immunohistochemistry expression pattern of GRB2, ITGB1, SHP2 and POSTN in normal mucosa, adenoma and stages I–III colorectal carcinoma (CRC). The representative immunohistochemistry visual fields in this figure were scored as follows. Immunoreactive scores for GRB2 were 12 in epithelial nuclei, 8 in epithelial cytoplasm and 8 in parenchyma of normal mucosa; 12 in epithelial nuclei, 8 in epithelial cytoplasm and 8 in parenchyma of adenoma tissues; 4 in cancer cell nuclei, 8 in cancer cytoplasm and 6 in parenchyma of stage I CRC; 3 in cancer cell nuclei, 12 in cancer cytoplasm and 6 in parenchyma of stage II CRC; and 3 in cancer cell nuclei, 12 in cancer cytoplasm and 6 in parenchyma of stage III CRC. Immunoreactive scores for ITGB1 were 0 in epithelial cytoplasm of normal mucosa; 0 in epithelial cytoplasm of adenoma tissues; 12 in cancer cytoplasm of stage I CRC; 12 in cancer cytoplasm of stage II CRC; and 12 in cancer cytoplasm of stage III CRC. Immunoreactive scores for SHP2 were 12 in epithelial nuclei of normal mucosa; 12 in epithelial nuclei of adenoma tissues; 8 in cancer nuclei of stage I CRC; 6 in cancer cell nuclei of stage II CRC; and 0 in cancer cell nuclei of stage III CRC. Immunoreactive scores for POSTN were 4 in fibroblast cytoplasm and 3 in epithelial cytoplasm of normal mucosa; 9 in fibroblast cytoplasm and 3 in epithelial cytoplasm of adenoma tissues; 8 in fibroblast cytoplasm and 8 in epithelial cytoplasm of stage I CRC; 12 in fibroblast cytoplasm and 9 in epithelial cytoplasm of stage II CRC; and 12 in fibroblast cytoplasm and 12 in epithelial cytoplasm of stage III CRC. Different proteins might have different immunostaining patterns. The grading for staining intensity was based on individual protein. Bar, 50 μm.

Construction of an IHC signature

IHC data of all the four proteins were available for 1025 (93.4%) patients (682 in the Shanghai cohort and 343 in the Guangzhou cohort). In the Shanghai cohort, univariate Cox regression analysis indicated that TNM stage, differentiation grade, postoperative chemotherapy and immunoreactive scores (as continuous variables) of GRB2 in cancer cytoplasm (HR 1.14; 95% CI 1.08 to 1.20; p<0.0001), ITGB1 in cancer cytoplasm (HR 1.21; 95% CI 1.14 to 1.30; p<0.0001) and POSTN in cancer epithelium (HR 1.13; 95% CI 1.08 to 1.18; p<0.0001) were significantly associated with an increased risk of relapse (shorter DFS), whereas the score of SHP2 in cancer cells was significantly associated with a decreased risk of relapse (longer DFS) (HR 0.90; 95% CI 0.86 to 0.94; p<0.0001). A formula composed of tumour POSTN, tumour cytoplasm GRB2, tumour ITGB1 and tumour SHP2 had the highest cross-validated partial likelihood and the maximal AUC value among the tested formulae (see online supplementary table S10), indicating that the four-protein panel was the best one. We then derived this formula to calculate a risk score for each patient based on immunoreactive scores of the four markers in the training set, weighted by regression coefficients: [risk score formula; equation image not recovered]
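The structure of such a risk score, a weighted sum of the four immunoreactive scores compared against a cut-off, can be sketched as follows. The weights here are placeholders only: they are the natural logarithms of the univariate HRs quoted above, not the lasso regression coefficients of the published formula, so the resulting scores are not comparable with those in the study. Treating scores strictly above 1.55 as high-risk is also an assumption.

```python
from math import log
from typing import Dict

# Placeholder weights (log of the univariate HRs quoted in the text).
# These are NOT the lasso coefficients of the published formula.
PLACEHOLDER_WEIGHTS: Dict[str, float] = {
    "POSTN": log(1.13),
    "GRB2": log(1.14),
    "ITGB1": log(1.21),
    "SHP2": log(0.90),  # negative weight: higher SHP2 lowers the score
}
CUTOFF = 1.55  # cut-off reported for the published formula


def risk_score(ihc_scores: Dict[str, float],
               weights: Dict[str, float] = PLACEHOLDER_WEIGHTS) -> float:
    """Weighted sum of immunoreactive scores for the markers in weights."""
    missing = set(weights) - set(ihc_scores)
    if missing:
        raise KeyError("missing marker scores: %s" % sorted(missing))
    return sum(weights[marker] * ihc_scores[marker] for marker in weights)


def risk_group(score: float, cutoff: float = CUTOFF) -> str:
    """Assumption: scores strictly above the cut-off are high-risk."""
    return "high-risk" if score > cutoff else "low-risk"
```

With these placeholder weights, a tumour scoring 12 for POSTN, GRB2 and ITGB1 and 0 for SHP2 lands in the high-risk group, while one with low POSTN, GRB2 and ITGB1 scores and an SHP2 score of 12 lands in the low-risk group.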

Prognostic values of the signature

Using the formula, patients in the Shanghai cohort were dichotomised into high-risk and low-risk subgroups with an optimal risk score (1.55) as the cut-off. Distribution of demographic and clinical characteristics such as age, gender, disease location, tumour differentiation grade, and serum levels of CEA and CA19-9 did not vary significantly between the high-risk and the low-risk subgroups (table 1). In the multivariate Cox regression analyses, high-risk score, TNM stage (III vs I+II) and tumour cell involvement in lymph nodes were independently associated with unfavourable DFS while high-risk score and the lymph node involvement were independently associated with poor DSS (table 2). Compared with patients with low-risk scores, those with high-risk scores had shorter DFS and DSS. Importantly, high-risk score significantly predicted poor DFS and DSS for stage II patients (figure 3).

Table 1

Clinical characteristics of colorectal carcinoma patients dichotomised by immunohistochemistry signature at the cut-off in the training and validation cohorts

Table 2

Cox regression analysis of immunohistochemistry signature score and clinicopathological covariates with survivals in the Shanghai cohort

Figure 3

High-risk immunohistochemistry signature score and poor survivals of colorectal carcinoma patients in the Shanghai cohort and the Guangzhou cohort. Stage I–III patients were dichotomised into high-risk subgroup and low-risk subgroup at the cut-off point (1.55) of the signature score, respectively. Disease-free survival and disease-specific survival are presented. p Values are shown. Green line represents the high-risk subgroup. Blue line indicates the low-risk subgroup. Log-rank p values are from Kaplan–Meier analysis with log-rank test.

We then applied the same cut-off to dichotomise the study patients in the Guangzhou cohort. Clinical variables including the TNM stage, tumour differentiation grade and postoperative chemotherapy did not vary significantly between the high-risk and low-risk subgroups (table 1). In the multivariate Cox regression analyses, high-risk signature score, rather than TNM stage, was significantly associated with an unfavourable DSS (table 3). Patients with high-risk scores had shorter DFS and DSS than did those with low-risk scores. Similarly, high-risk score significantly predicted a poor DSS for stage II patients (figure 3).

Table 3

Cox regression analysis of immunohistochemistry signature score and clinicopathological covariates with disease-specific survival in the Guangzhou cohort

Taking signature risk score as a continuous variable, we also performed the multivariate Cox regression analysis, adjusting for these covariates. The same conclusions were obtained with a continuous risk score in both cohorts. The signature scores increased with increasing TNM stage in the training cohort (Spearman's r=0.181, p<0.001) but not in the validation cohort (Spearman's r=0.008, p=0.884).

The signature predicts the prognosis of the patients who received postoperative chemotherapy

We assessed prognostic value of the signature for the patients with or without postoperative chemotherapy. In the Shanghai cohort, high-risk signature score was significantly associated with shorter DFS and DSS in the patients with postoperative chemotherapy whereas this effect was not observed in those without chemotherapy. In the Guangzhou cohort, high-risk score significantly predicted an unfavourable DSS in those with and without postoperative chemotherapy. These data are presented in figure 4.

Figure 4

The association between the scores of immunohistochemistry signature and survivals of the colorectal carcinoma (CRC) patients with or without postoperative chemotherapy in both cohorts. The CRC patients from each cohort were dichotomised into high-risk subgroup and low-risk subgroup at the cut-off point (1.55) of the signature score. Disease-free survival and disease-specific survival of the patients with and without postoperative chemotherapy (FOLFOX) are presented. p Values are shown. Green line represents the high-risk subgroup. Blue line indicates the low-risk subgroup. Log-rank p values are from Kaplan-Meier analysis with log-rank test.

Discussion

In this study, we applied a systematic approach to evaluate the concordance and robustness of 25 signatures in predicting CRC postoperative prognosis. Of the 20 signatures with information on gene expression direction, 15 without sufficient concordance in the validation process were excluded. These 15 studies were originally designed to answer different prognosis-related questions, for example, risk for metastases. The five selected signatures had high prevalence prediction and were at least moderately correlated, indicating that these genes could be involved in an intrinsic network. Of the five signatures, Oh's has been proven to have robust performance in identifying CRC patients with poor prognosis in two independent cohorts.39 As signatures with sufficient concordance contained few genes in common, we constructed a functional subnetwork by integrating the five signatures. The molecules enriched in this subnetwork were functionally related to cancer invasiveness, indicating the importance of this network in CRC progression.

Besides gene transcription, other factors may influence protein expression levels. Therefore, we compared mRNA and protein levels and confirmed that mRNA levels significantly correlated with protein levels of the four molecules in CRC tissues, which bridges gene signatures and protein profiling. High mRNA levels of ITGB1 and POSTN in tumours predicted a poor prognosis of CRC patients7 ,10 ,15 ,19 ,26 while low mRNA levels of PTPN11 also predicted a poor prognosis (GSE28702 and GSE5206), which was consistent with the performance of their proteins.

We developed an IHC signature by examining expression patterns of the network-centric and the top overlapping proteins in the training set. This signature efficiently discriminated CRC patients with distinct prognosis in the validation set. CRC prognosis is stage and grade dependent. Our signature efficiently partitioned the patients into the high-risk and low-risk subgroups with balanced grade in both cohorts and TNM stage in the validation cohort (table 1) and had a better survival predictive power than TNM staging in both cohorts (tables 2 and 3). Importantly, our signature significantly predicted poor postoperative prognosis of stage II CRC patients in both cohorts (figure 3). Dichotomisation of stage II CRC patients by markers is the area of greatest clinical need for prognostic assays.

Our IHC signature exhibited stronger stratification power than did single-marker studies,3 ,34 ,40 in terms of HR or validity, possibly because the four molecules were selected via our systematic approach and their effects in predicting CRC prognosis were complementary. This signature also has advantages over reported gene signatures and other quantitative approaches such as reverse-phase protein array (RPPA).41 The stratification power is generally higher than that of previously reported signatures.5 ,9 ,10 ,14 ,15 ,18 ,20–23 ,25–27 IHC provides cell-type localisation information, whereas gene signatures and the RPPA do not. Several protein signatures composed of key proteins in some signalling pathways and differentially expressed proteins between CRC tissues and normal mucosa have been reported.42–44 However, none of them was validated in external cohorts. Our approach may be more clinically useful at least in the short-term as information technology and IHC capacities now exist in major medical centres. To our knowledge, this is the first study reporting an IHC signature derived from gene signatures for predicting CRC prognosis.

The expression pattern of signature proteins may present new insights into the mechanisms that underlie cancer metastasis or recurrence. GRB2 is critical for cell cycle progression and angiogenesis.45 Blocking GRB2 signalling inhibits CRC cell motility.46 POSTN has been found to be highly upregulated in CRC tissues and sera of CRC patients.47 A study reported that POSTN was positive in cancer-associated fibroblasts, not in colon cancer cells.48 Our data indicated that POSTN was highly expressed in the human CRC cell line HCT116 and in most CRC tissues (see online supplementary figure S3). This difference may be explained by the different antibodies used. POSTN expression can be found solely in cancer cells of colon cancer tissues and promotes metastasis by augmenting cell survival via the Akt/PKB pathway.49 POSTN is required for cancer stem cell maintenance.50 Anti-POSTN antibody activates apoptosis of CRC cells and potentiates the effects of 5-fluorouracil-based chemotherapy.51 Based on this evidence, we believe that POSTN promotes CRC aggressiveness and that aberrant expression of POSTN in CRC cells and tumour-infiltrating cells predicts a poor postoperative prognosis. The roles of GRB2 and POSTN expression in primary tumours for the prediction of CRC prognosis have not been reported. SHP2 acts as either a tumour promoter or a tumour suppressor in malignancies of different origins.52,53 SHP2 expression in CRC correlated with a good prognosis, indicating that SHP2 tends to be a tumour suppressor in CRC. Integrin β1, an important extracellular matrix-interacting network hub, is essential for oncogenesis and angiogenesis.54 However, the effect of integrin β1 expression on CRC prognosis remains unknown. These signature proteins may affect CRC recurrence through inflammation and/or metastasis from disseminated cancer cells after surgery.
Our signature predicted an unfavourable prognosis in patients who received postoperative chemotherapy in both cohorts (figure 4), indicating that some signature molecules might promote the ‘stemness’ of disseminated cancer cells and contribute to poor therapeutic outcome. Our signature could allow clinicians to identify the patients who need targeted treatments. The function of these molecules in CRC progression needs further study.

Our study has several limitations. Although we carried out this study following the REMARK guidelines55 and enrolled consecutive patients in the training cohort, some patients were lost to follow-up. This might introduce a bias. Further randomised clinical trials are needed to validate this signature. Relapse monitoring was incomplete for 70 of the 343 patients in the validation cohort, resulting in loss of DFS data. Inflammation is important in CRC metastasis. The expression of some network-centric molecules in non-cancer cells may also be of prognostic value. We presented preliminary data on intratumoural heterogeneity in the expression of the four molecules in whole mounts from 10 patients. However, the effect of intratumoural heterogeneity on the prognostic prediction of our signature remains to be systematically evaluated in large cohorts before clinical translation.

In conclusion, this study presents a powerful IHC signature by comparing and integrating currently available microarray data. Although further prospective studies are necessary to validate the robustness of this signature, our approach represents an innovation toward clinical applications of current gene expression profiling in CRC, contributing to personalised prediction of CRC prognosis. Furthermore, the roles of these proteins in CRC metastasis and targeted therapy warrant further investigation.

Acknowledgments

We are grateful to Professor Timothy C Thompson (University of Texas MD Anderson Cancer Center, Texas, USA) for critical reading of this manuscript.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors WC, XG, YH, YD, QL, LW, JW, CF and GC had full access to all the data in the study and take responsibility for the integrity of the data and accuracy of the data analysis. WC, XG, YH, YD, QL and LW contributed equally to this work and share co-first authorship; GC, CF and JW are equal corresponding authors. GC, WC, CF and JW developed the study concept and design. WC, XG, YH, YD, QL, LW, XT, QZ, YL, YZ, YY, XF, HZ, WZ, JW, CF and GC were involved in acquisition of the data. GC wrote the manuscript. WC, XG, YH, YD, QL, LW, XT, QZ, YL, HZ, JW, CF and GC proofread the manuscript for important intellectual content. WC, YH, YD and HZ performed statistical analysis. GC, CF, XG and WC obtained funding. LW, XF, HZ, JW and CF provided administrative, technical or material support. GC supervised the study.

  • Funding This work was sponsored by the National Science Fund for Distinguished Young Scholars (81025015 to Cao) and regular funds (81272561 to Fu, 81201936 to Gao and 81372671 to Chang) from the National Natural Science Foundation of China. The funding agencies had no role in the design and conduct of the study; collection, management, analysis and interpretation of the data; and preparation, review or approval of the manuscript.

  • Competing interests None.

  • Patient consent Obtained.

  • Ethics approval The study was approved by the committees for ethics review for research involving human subjects at Changhai Hospital, Second Military Medical University and The Sixth Affiliated Hospital, Sun Yat-sen University.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The gene expression data in this study can be found online at the Gene Expression Omnibus under accession numbers GSE28702 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28702) and GSE5206 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5206).
