Gut 62:423-429 doi:10.1136/gutjnl-2011-301489
  • Colon
  • Original article

Single measures of performance do not reflect overall institutional quality in colorectal cancer surgery

  • Omar Faiz (1)
  1. Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, St Mary's Hospital, London, UK
  2. Dr Foster Unit, Department of Primary Care and Public Health, Imperial College London, London, UK
  • Correspondence to Mr O Faiz, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, St Mary's Hospital, Praed Street, London W2 1NY, UK; o.faiz{at}
  • Contributors AMA, CV and OF conceived the study. AMA analysed and interpreted the data. AMA composed and drafted the manuscript and all authors critically revised the manuscript. All authors, external and internal, had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. OF is the guarantor.

  • Revised 17 January 2012
  • Accepted 19 January 2012
  • Published Online First 16 February 2012


Objective To evaluate overall performance of English colorectal cancer surgical units identified as outliers for a single quality measure—30 day inhospital mortality.

Design Data on 144 542 patients who underwent primary major colorectal cancer resection between 2000/2001 and 2007/2008 in 149 English National Health Service units were extracted from Hospital Episode Statistics (HES). Casemix adjusted funnel plots were constructed for 30 day inhospital mortality, length of stay, unplanned readmission within 28 days, reoperation, failure to rescue-surgical (FTR-S) and abdominoperineal excision (APE) rates. For institutions deemed outliers for 30 day mortality, performance was evaluated across all other domains. Outliers were units that lay on or beyond the 3 SD control limits. Performance was defined as ‘acceptable’ if a unit lay below the upper 2 SD limit.

Results 5 high mortality outlier (HMO) units and 15 low mortality outlier (LMO) units were identified. Of the five HMO units, two were substandard performance outliers (ie, above 3 SD) on another metric (both for high reoperation rates). A further two HMO institutions exceeded the second but not the third SD limits for substandard performance on other outcome metrics. One of the 15 LMO units exceeded 3 SD for substandard performance (APE rate). One LMO institution exceeded the second but not the third SD control limits for high reoperation rates. Institutional mortality correlated with FTR-S and reoperation rates (R=0.445, p<0.001 and R=0.191, p=0.020, respectively).

Conclusions Performance appraisal in colorectal surgery is complex and dependent on stakeholder perspective. Benchmarking units solely on a single performance measure is overly simplistic and potentially hazardous. A global appraisal of institutional outcome is required to contextualise performance.

Significance of this study

What is already known about this subject?

  • Variability exists in institutional 30 day mortality after colorectal cancer surgery in England.

  • Additional performance measures are derivable from routinely collected data.

  • To date there has been no comparison of how institutions that are ‘outliers’ for risk adjusted 30 day mortality perform across other related and unrelated performance domains.

What are the new findings?

  • Correlations do exist between mortality and other performance measures (namely the mortality related measures, FTR-S and reoperation).

  • Institutions that are high mortality outliers are not globally poor performers across other performance measures.

  • Low mortality outliers generally seem to perform acceptably across other performance domains.

  • Relationships between measures of performance are complex and not necessarily predictable.

How might it impact on clinical practice in the foreseeable future?

  • Appraising institutional service quality in colorectal surgery is complex. High mortality outlier status does not necessarily predict global poor performance across other domains. Understanding the context of measures such as mortality could enable meaningful quality improvement.


In the UK, comprehensive and mandatory healthcare data collection is performed routinely.1–3 Measures of surgical quality are readily available from such administrative data sources. Their utility depends on whether they are used to define quality and to influence decision making at a clinical, managerial or policy level. Several metrics relevant to colorectal surgery can be derived from routinely held data, and these could potentially be used to benchmark performance in colorectal surgery. If this process were reliable, it could inform broadly on surgeon specific and institutional colorectal surgical performance. Moreover, if existing National Health Service (NHS) data sources, which lie within the public domain, continued to be used for these purposes, transparent reporting of outcomes to the public would follow. For such a system to be fair and robust, two conditions are prerequisites. Firstly, the accuracy of the data used for benchmarking must be consistent at institutional and surgeon level. Secondly, an understanding is needed of how individual metrics interrelate to reflect high and poor surgical performance. High achievement across all measured domains almost certainly reflects a proactive and competent provider. Can one, however, meaningfully comment on provider performance from measurement of one domain alone? If so, what limits are meaningful? Alternatively, do these metrics reflect unique aspects of performance that demand individual appraisal (and remedial intervention)?

Validated metrics may be used to benchmark performance between surgical providers and to underpin quality improvement initiatives. Surgical measures that are easily obtained from routinely collected data include 30 day mortality,4 inpatient length of stay5 and readmission rate.6 Additional markers of performance that are of relevance to colorectal surgery can also be derived from such data sources, including short term re-intervention7 and abdominoperineal excision (APE) rate.8 Other metrics such as lymph node yield, R0 resection rate and quality of mesorectal excision may be obtained from clinical registries.9 The latter may also be used to evaluate service quality between providers.10 The aforementioned measures potentially form the basis for future quality improvement programmes.11

A recent publication by Morris et al used administrative data linked to cancer registry information to report the variation that currently exists in 30 day mortality rates between English NHS institutions undertaking colorectal surgery.12 Performance concerns with regard to short term survival may be justifiable in a limited number of outlying institutions. If poor perioperative mortality rates were shown to denote poor global standards of colorectal surgical practice (eg, associated high re-intervention rates, poor oncological outcomes), public reporting of single performance measures would clearly be justified. It is therefore necessary to elucidate what it means to be an institutional outlier for 30 day mortality. Lastly, an understanding of the limitations of specific outcome measures is becoming increasingly important to individual surgical practitioners in the UK because of the introduction of compulsory revalidation of doctors.13

The aim of this study was to explore, from an English national administrative database, the relationships between commonly collected and derivable metrics following major resection for colorectal cancer. Specifically, we used national data to correlate institutional 30 day mortality rate with other outcome metrics. In addition, we sought to investigate the performance of statistical ‘outliers’ for 30 day mortality across other quality domains.


Hospital Episode Statistics database and patient selection

The Hospital Episode Statistics (HES) database is an administrative dataset to which all NHS hospitals compulsorily submit patient level information. It contains data on patient demographics as well as primary and secondary diagnosis codes. Each patient entry also has dated procedure and operation fields that can be analysed. Outcome measures such as length of stay and inhospital mortality are routinely collected,14 and readmission is easily derived.

All patients who underwent a primary major colorectal procedure with a diagnosis of colorectal cancer between April 2000 and March 2008 in English NHS trusts were included. Patients were identified on the HES database using the relevant diagnostic codes from the International Classification of Diseases, 10th revision (ICD-10) and procedural codes from the Office of Population Censuses and Surveys Classification of Surgical Operations and Procedures, 4th revision (OPCS-4). A detailed methodology of this process has been described previously.15 The following resections were analysed: right and extended right hemicolectomies, transverse colectomy, left hemicolectomy, sigmoid colectomy, Hartmann's procedure, subtotal colectomy, panproctocolectomy, total colectomy, anterior resection and abdominoperineal resection (APER). The corresponding OPCS-4 procedural codes are described in appendix 1 (available online only).

The Charlson comorbidity scoring system has been adapted for use with administrative datasets.16 Comorbidities that are associated with worse outcomes are given greater scores. Secondary diagnosis fields on HES were used to create the Charlson comorbidity index, which was then reclassified into three categories: 0, 1–4 and ≥5. The Carstairs index17 is a composite socioeconomic deprivation score calculated at output area level and converted into population weighted quintiles.
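The comorbidity banding and deprivation quintiles described above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the function names, the input layout (area identifier, Carstairs score, population) and the rule of placing an output area by the population midpoint it covers are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def charlson_band(score: int) -> str:
    """Collapse a Charlson comorbidity index into the three bands 0, 1-4 and >=5."""
    if score < 0:
        raise ValueError("Charlson score cannot be negative")
    if score == 0:
        return "0"
    if score <= 4:
        return "1-4"
    return ">=5"


def population_weighted_quintiles(
    areas: List[Tuple[str, float, int]]
) -> Dict[str, int]:
    """Assign each output area (area_id, carstairs_score, population) to a
    quintile 1-5 so that each quintile holds roughly one fifth of the total
    population, not one fifth of the areas. Higher Carstairs scores mean
    greater deprivation, so quintile 5 is the most deprived (assumed ordering)."""
    total = sum(pop for _, _, pop in areas)
    quintiles: Dict[str, int] = {}
    cumulative = 0
    for area_id, _, pop in sorted(areas, key=lambda a: a[1]):
        # Hypothetical rule: place each area by the population midpoint it covers.
        midpoint = cumulative + pop / 2
        quintiles[area_id] = min(int(5 * midpoint / total) + 1, 5)
        cumulative += pop
    return quintiles
```

Weighting by population rather than by area count means that a few very populous areas can occupy a whole quintile on their own, which is the point of reporting population weighted quintiles.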

Outcome metrics

Length of stay, readmission within 28 days and mortality within 30 days

Institutional length of stay for the above procedures (taken as a basket) was described in two ways: as the mean following logarithmic transformation, and as the percentage of patients whose length of stay exceeded the population 75th centile. Twenty-eight day readmission and 30 day mortality rates were expressed as proportions (percentages) of the total caseload.
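A minimal sketch of the two length of stay summaries, assuming a flat list of (institution, length of stay in days) pairs. The text does not say how the 75th centile was interpolated or how zero-day stays were handled, so the `inclusive` quantile method and the log(LOS + 1) transform used here are assumptions.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def length_of_stay_summary(
    stays: List[Tuple[str, int]]
) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """stays: (institution, length_of_stay_days) for every patient in the basket.

    Returns the population 75th centile and, per institution, the mean of the
    log-transformed length of stay and the percentage of patients whose stay
    exceeded the population 75th centile."""
    all_los = [los for _, los in stays]
    # Population 75th centile by linear interpolation (method is an assumption).
    p75 = statistics.quantiles(all_los, n=4, method="inclusive")[2]

    by_unit: Dict[str, List[int]] = defaultdict(list)
    for unit, los in stays:
        by_unit[unit].append(los)

    summary = {}
    for unit, values in by_unit.items():
        # log1p = log(LOS + 1), which keeps zero-day stays defined (assumption).
        mean_log = statistics.fmean([math.log1p(v) for v in values])
        pct_long = 100 * sum(v > p75 for v in values) / len(values)
        summary[unit] = {"mean_log_los": mean_log, "pct_above_p75": pct_long}
    return p75, summary
```

The same per-institution pattern (count of events divided by caseload, times 100) gives the 28 day readmission and 30 day mortality percentages.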

Reoperation rates

Reoperation rates were computed from HES data using a methodology that has been described previously.18 Reoperation was defined as a return to theatre within 28 days of the index procedure for one of a selected group of interventions. The codes for reoperations include those denoting: washout of abdomen, small bowel resection, further colorectal resection, drainage of intra-abdominal abscess, division of adhesions, stoma formation or operation on a stoma, and wound complications requiring return to theatre. Reoperation rates were calculated as a proportion of the total volume of index procedures undertaken.
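The reoperation definition can be sketched as follows, assuming one index resection per patient. The category labels in `REOPERATION_CATEGORIES` are hypothetical stand-ins for the OPCS-4 code groups listed in the appendix, and counting a same-day return to theatre (day 0) as within the window is an assumption. Each patient is counted at most once, so the result is a proportion of index procedures rather than a count of operations.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical labels standing in for the OPCS-4 code groups described in the text;
# the real code lists are in the study's appendix and are not reproduced here.
REOPERATION_CATEGORIES = {
    "abdominal_washout",
    "small_bowel_resection",
    "further_colorectal_resection",
    "drainage_of_abscess",
    "division_of_adhesions",
    "stoma_procedure",
    "wound_complication",
}


def reoperation_rate(
    index_ops: Dict[int, date],
    later_ops: Iterable[Tuple[int, date, str]],
    window_days: int = 28,
) -> float:
    """Percentage of index resections followed by a qualifying return to theatre
    within window_days. later_ops must not contain the index procedure itself."""
    reoperated = set()
    for patient_id, op_date, category in later_ops:
        index_date = index_ops.get(patient_id)
        if index_date is None or category not in REOPERATION_CATEGORIES:
            continue
        if 0 <= (op_date - index_date).days <= window_days:
            reoperated.add(patient_id)  # a patient counts once, however many returns
    return 100 * len(reoperated) / len(index_ops)
```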

Abdominoperineal excision rates

Where the primary diagnosis was rectal cancer, the APE rate was calculated as the number of non-emergency APE resections performed over the total volume of other procedures performed for excision of a rectal cancer, expressed as a percentage. Other procedures included anterior resection, Hartmann's resection, excision of rectum (unspecified/other), panproctocolectomy, total colectomy, sigmoid colectomy and excision of left hemicolon. The OPCS-4 and ICD-10 codes used are listed in appendix 1 (available online only).

Failure to rescue-surgical rate

Failure to rescue-surgical (FTR-S) rate is defined as the proportion of patients who die during their index admission after being returned to theatre (for the procedures listed above). The methodology used to calculate FTR-S has been described previously.14
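A sketch of the FTR-S calculation, assuming the denominator is the set of patients returned to theatre during their index admission (the usual failure-to-rescue framing) and the numerator is those among them who died before discharge. The set-based inputs and the function name are hypothetical; the published methodology (reference 14) may differ in detail.

```python
from typing import Optional, Set


def ftr_s_rate(reoperated: Set[int], died_in_index_admission: Set[int]) -> Optional[float]:
    """Percentage of patients returned to theatre during the index admission who
    then died during that admission. Returns None when no patient was returned
    to theatre, because the rate is undefined."""
    if not reoperated:
        return None
    deaths_after_reoperation = reoperated & died_in_index_admission
    return 100 * len(deaths_after_reoperation) / len(reoperated)
```

Deaths among patients who were never reoperated (patient 9 in a call such as `ftr_s_rate({1, 2, 3, 4}, {2, 9})`) do not enter the rate, which is why FTR-S and overall 30 day mortality can diverge for the same unit.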

Statistics and funnel plots

Outcome rates were calculated per institution for correlation and comparison purposes. Correlation between linearly distributed outcome variables was investigated using Pearson's statistic. Length of stay required logarithmic transformation prior to application of linear statistical methods.
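Institution-level correlation can be sketched with a plain Pearson coefficient. Log-transforming length of stay before correlating, as described above, simply means passing `math.log` of each institutional value; the function and variable names here are illustrative, not taken from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences,
    for example one value per institution for each of two outcome metrics."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A call such as `pearson_r(mortality_pct, [math.log(v) for v in mean_los_days])`, where both hypothetical lists hold one entry per institution in the same order, mirrors the mortality versus length of stay comparison.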

Adjustment was carried out using multiple regression analyses. Models incorporated covariates including patient gender, age, Charlson comorbidity score, Carstairs deprivation index, type of resection and method of admission (elective/emergency). These were aggregated on a per institution level for each metric considered and used to create casemix adjusted funnel plots for each dependent variable. Funnel plots19 were created using the tools available at The control limits displayed are the exact Poisson control limits.
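The adjustment step can be illustrated with indirect standardisation, a common way for a casemix model to feed a funnel plot: each patient's predicted probability is summed to give an expected count per institution, which is compared with the observed count. This sketch assumes the probabilities have already been produced by a fitted model (for example, logistic regression on the covariates listed above); model fitting is not shown, and the adjusted-rate formula (observed/expected ratio multiplied by the overall crude rate) is a widely used convention rather than one confirmed by the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def indirect_standardisation(
    records: Iterable[Tuple[str, int, float]]
) -> Dict[str, Dict[str, float]]:
    """records: (institution, outcome 0/1, model-predicted probability of the outcome).

    Returns observed and expected counts per institution, the observed/expected
    ratio and a casemix adjusted rate (ratio x overall crude rate, in %).
    Assumes every institution has a positive expected count."""
    records = list(records)
    observed: Dict[str, int] = defaultdict(int)
    expected: Dict[str, float] = defaultdict(float)
    for unit, outcome, prob in records:
        observed[unit] += outcome
        expected[unit] += prob
    overall_rate = sum(outcome for _, outcome, _ in records) / len(records)
    return {
        unit: {
            "observed": observed[unit],
            "expected": expected[unit],
            "ratio": observed[unit] / expected[unit],
            "adjusted_rate_pct": 100 * overall_rate * observed[unit] / expected[unit],
        }
        for unit in observed
    }
```

The observed count and expected count per unit are exactly the inputs a Poisson-based funnel plot needs: the expected count fixes the position along the horizontal axis and the width of the control limits.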


Patient population

A total of 144 542 patients who had undergone a primary major colorectal cancer resection between April 2000 and March 2008 in 149 NHS units were analysed. Patient demographics are described in table 1.

Table 1

Demographics of the patients

Institutional performance according to casemix adjusted funnel plot description

A funnel plot of risk adjusted inhospital 30 day mortality rate for all 149 NHS institutions was charted (figure 1). The funnel plot depicts the adjusted upper and lower 2 SD and 3 SD control limits for varying caseload. Units were described as outliers if they lay on or beyond the respective upper or lower 3 SD (99.8%) control limits, and as lying within acceptable limits if they lay below the upper 2 SD (95.0%) control limit.
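The outlier rules above can be sketched with exact Poisson quantiles around each unit's expected count. This is an illustration under stated assumptions: 2 SD and 3 SD are mapped to two-sided 95% and 99.8% limits as in the text, limits are returned on the count scale (divide by the expected count for the ratio scale), and no interpolation between integer quantiles is applied, so the values may differ slightly from those produced by the funnel plot tools used in the study.

```python
import math


def poisson_ppf(q: float, lam: float) -> int:
    """Smallest integer k with P(X <= k) >= q for X ~ Poisson(lam), lam > 0."""
    k, cdf = 0, 0.0
    while True:
        # pmf evaluated on the log scale so large expected counts do not underflow
        cdf += math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        if cdf >= q or k > lam + 20 * math.sqrt(lam) + 50:  # guard against rounding
            return k
        k += 1


def funnel_limits(expected: float) -> dict:
    """Observed-count control limits around an expected count: two-sided
    95% (about 2 SD) and 99.8% (about 3 SD), without interpolation."""
    return {
        "lower_3sd": poisson_ppf(0.001, expected),
        "lower_2sd": poisson_ppf(0.025, expected),
        "upper_2sd": poisson_ppf(0.975, expected),
        "upper_3sd": poisson_ppf(0.999, expected),
    }


def classify_unit(observed: int, expected: float) -> str:
    """Apply the rules in the text: on or beyond a 3 SD limit is an outlier;
    below the upper 2 SD limit is acceptable."""
    lim = funnel_limits(expected)
    if observed >= lim["upper_3sd"]:
        return "high outlier"
    if observed <= lim["lower_3sd"]:
        return "low outlier"
    if observed >= lim["upper_2sd"]:
        return "between upper 2 SD and 3 SD"
    return "acceptable"
```

For a unit expecting 10 deaths, this gives 2 SD count limits of 4 and 17 and 3 SD limits of 2 and 21, so 21 or more deaths would mark a high mortality outlier. Comparing counts rather than ratios avoids floating point ties exactly on a limit.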

Figure 1

Funnel plot demonstrating five high mortality outlier units (A–E) on or above the 3 SD control limits for adjusted mortality and 15 low mortality outlier units below the 3 SD control limit.

The funnel plot highlighted five institutions whose mortality rates lay on, or above, the upper 3 SD control limit (ie, significantly higher than expected mortality rates at the 99.8% level). These institutions were termed high mortality outlier (HMO) units. Fifteen units were identified below or on the lower 3 SD control limit (ie, significantly lower mortality than expected at the 99.8% level). These were described as low mortality outlier (LMO) units (figure 1).

High mortality outlier units

All five HMO units lay within acceptable limits for readmissions and APE rates (figures 2 and 3) when identified on casemix adjusted funnel plots. All but one unit (institution D) lay between the control limits for length of stay (figure 4).

Figure 2

Funnel plot demonstrating positions of the mortality outlier units and how they perform when analysed on readmission rates within 28 days.

Figure 3

Funnel plot demonstrating positions of the mortality outlier units and how they perform when analysed on abdominoperineal excision rates per unit.

Figure 4

Funnel plot demonstrating positions of the mortality outlier units and how they perform when analysed on percentage of patients with lengths of stay greater than the population 75th centile.

When the HMO units A–E were charted on a casemix adjusted funnel plot for reoperation rates, unit D lay within acceptable limits whereas unit C lay below the lower 3 SD control limit (ie, lower than expected reoperation rates). In contrast, units A, B and E all demonstrated higher than expected reoperation rates (figure 5).

Figure 5

Funnel plot demonstrating positions of the mortality outlier units and how they perform when analysed on 28 day reoperation rates.

A funnel plot describing casemix adjusted institutional FTR-S rates demonstrated that units A, D and E lay within acceptable limits (figure 6). Unit C demonstrated a significantly lower FTR-S rate than expected. In contrast, unit B lay above the upper 2 SD control limit indicating that a greater number of patients at the institution failed to be rescued.

Figure 6

Funnel plot demonstrating positions of the mortality outlier units and how they perform when analysed on failure to rescue-surgical (FTR-S) rates.

Table 2 summarises the performance of the HMO units and how they performed on the other considered metrics.

Table 2

Summary of the performance of the high mortality outlier institutions (A–E) and how they performed on the other considered metrics

Low mortality outlier units

Thirteen of 15 LMO units performed within acceptable limits, with several performing better than expected when charted on casemix adjusted funnel plots for readmission rate, length of stay and FTR-S (figures 2, 4 and 6). When reoperation rate was charted, one LMO unit was observed to perform less well than expected, lying above the 2 SD control limit. The remaining 14 units performed as well as, or better than, expected (figure 5). One LMO unit lay above the upper 3 SD control limit for APER rates (figure 3).

Institutional outcome metric correlation

Correlations between institutional 30 day postoperative mortality rate and other outcome measures are described in table 3. At the institutional level, a significant correlation is observed between 30 day mortality and FTR-S rate (R=0.445, p<0.001). A weak statistical correlation is observed between institutional 30 day postoperative mortality rate and reoperation rate (R=0.191, p=0.020). Mortality did not correlate significantly with 28 day readmission rate (R=0.143, p=0.082), APE rate (R=0.119, p=0.147) or length of stay (R=0.148, p=0.072).
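As a consistency check, the reported p values can be recovered from the R values with the standard t test for a Pearson coefficient, t = R√((n − 2)/(1 − R²)) on n − 2 degrees of freedom. The sketch assumes the correlations were computed across all 149 institutions (n = 149), which the text implies but does not state. The regularised incomplete beta function is evaluated with the well-known continued-fraction method so that only the standard library is needed; the helper names are illustrative.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),           # even term
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),  # odd term
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged on the odd step
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularised incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def pearson_p_value(r: float, n: int) -> float:
    """Two-sided p value for H0: rho = 0 with n paired observations.
    With t = r*sqrt((n-2)/(1-r^2)) on df = n-2, the two-sided tail probability
    is I_{df/(df+t^2)}(df/2, 1/2), and df/(df+t^2) simplifies to 1 - r^2."""
    if n < 3:
        raise ValueError("need at least three observations")
    df = n - 2
    return betainc_reg(df / 2.0, 0.5, 1.0 - r * r)
```

With n = 149, R=0.191 gives p of about 0.02 and R=0.143 gives about 0.08, in line with the values reported above; small discrepancies elsewhere would be expected from rounding of the published R values.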

Table 3

Individual Pearson's correlation statistics and significance values per metric against 30 day mortality


This study, which used English administrative data, highlights the complexity of service quality appraisal in colorectal surgery. It also questions the reliability of reporting individual metrics as universal markers of provider performance. These findings have important implications for future surgical benchmarking and quality improvement. The study suggests that units that are HMOs for postoperative mortality are not necessarily substandard performers across a range of other outcome metrics. Although high 30 day mortality outlier status does not necessarily reflect poor overall institutional performance, low 30 day mortality outlier status does seem to convey at least ‘standard’ overall performance: when LMO units were considered across other outcome domains, only two units performed worse than expected, each on a different outcome measure.

The definition of an outlier is complex. Beyond perioperative mortality, nationally agreed benchmarks for acceptable outcome do not currently exist in the UK. The Association of Coloproctology of Great Britain and Ireland has offered a guideline that ‘surgeons should expect to achieve an operative mortality of <20% for emergency surgery and <7% for elective surgery for colorectal cancer’. In our study, the 3 SD limits and population means for reoperation, readmission and mortality were in line with those of previously published US and UK studies.18,20,21 In addition, our approach of identifying statistical outliers using the 0.05 and 0.002 (ie, 2 SD and 3 SD) control limits follows governmental guidance on this issue.22

Dangers potentially arise when ‘good’ and ‘poor’ performance labels are assigned to units on the basis of outlier status from single metric evaluation. Factors such as casemix could underlie outlier status and may not be fully accounted for in routinely collected datasets. The complexity of performance appraisal becomes apparent when figures 5 and 6 are considered together. Unit C has a significantly lower than expected rate of returning patients to theatre despite its HMO status, and it also has a lower than expected FTR-S rate. Taken in isolation, these markers (reoperation and FTR-S rates) would suggest high performance, yet the unit is an HMO. Its high mortality rate therefore does not appear to be a consequence of surgical re-intervention. One explanation may be that patients at this institution are not being returned to theatre when it is indicated. Alternatively, complex casemix may underlie this finding, with patients dying postoperatively from causes not related to surgery. Units B and E, by contrast, both return patients to theatre more often than expected (figure 5). Yet these units are distinguished in figure 6, where unit E lies within normal control limits whereas unit B lies above the upper 2 SD control limit for FTR-S. This suggests that unit B is not salvaging the patients it returns to theatre. In contrast, the high mortality outlier status of unit E does not appear to be due to failure to rescue patients after reoperation.

The low correlation between outcomes supports the view that defining quality in colorectal surgery is complex.23 This further calls into question how quality in colorectal surgery can be quantified and meaningfully benchmarked. The weak correlation between postoperative mortality and other metrics suggests that any definition of quality is potentially subjective and dependent on which aspects of quality are prioritised, which in turn depends on the viewpoints of the stakeholders concerned. Moreover, this lack of correlation requires overarching decision makers (surgical professional bodies, health policy makers, hospital managers) to decide how much importance and relevance to place on individual performance targets. Inferring high performance from such metrics is further complicated because perceptions of good outcome depend on the agenda and goals of individual stakeholders.24 For example, mortality and the formation of a permanent stoma are likely to be of greater concern to most patients than length of stay, whereas bed stay and its associated cost are likely to matter more to managers and service providers. There is therefore a need to rationalise which measures should be targeted for benchmarking and quality improvement purposes. Ideally, patient centred metrics, such as patient satisfaction scores and/or patient reported outcome measures, would be included to offer a comprehensive appraisal of quality that reflects the patient perspective. These outcomes are not, however, collected routinely following colorectal surgery in England and therefore were not available for inclusion in the current study.

In terms of groups of measures, the current study identified correlation between the mortality related measures (ie, 30 day mortality, reoperation and FTR-S): moderate between mortality and FTR-S (R=0.445) and weaker between mortality and reoperation (R=0.191). This indicates that, on some level, these measures reflect similar aspects of clinical decision making and care received, and it raises the question of whether a single composite metric could describe multiple mortality related outcomes. Importantly, however, the funnel plots demonstrate very different performance levels among the five HMO institutions when FTR-S rates are considered, suggesting considerable institutional differences in preventing death after reoperation. Despite risk adjustment, 30 day mortality may offer little information about which deaths could potentially have been avoided through re-intervention or better quality perioperative care. Appraisal of the two metrics using funnel plots identifies different apparent ‘poor performers’. Deciding which measure (or indeed both) should be used to reflect the desired goal is therefore a subjective decision. Appraisal of a colorectal cancer service is incomplete without consideration of long term oncological outcome following treatment. An interplay is known to exist between short term outcomes, such as anastomotic leak, and subsequent poor oncological outcome.25 The HES database does not contain such information, so it could not be included in the analysis. Some investigators have, however, linked HES and cancer registry data for a proportion of the English population.8 Future population based studies will need to investigate the relationship between perioperative and long term oncological outcome.

A clear understanding of the scope of each outcome metric used to reflect performance in colorectal surgery is required if these metrics are to be openly reported.26 Previous attempts to report variability in practice in colorectal surgery have met with a mixed response.8 This has mainly been due to the limited inference that is possible when measures are heavily influenced by clinical factors27 that are not fully represented in the datasets used for the analysis. It is therefore important that any audience is fully informed about the complex relationships between these metrics should they be subject to open reporting. In addition, reporting an institution's performance across multiple outcome measures might provide transparency regarding overall performance.

Appraisal of units using multiple measures (as opposed to a single metric) may improve understanding of performance. Considering reoperation and failure to rescue rates together, in light of overall mortality, is potentially highly informative. In high mortality units where reoperation rates are lower than expected and FTR-S rates are acceptable, units may not be returning patients to theatre when necessary: within these units, patients may either not be recognised as requiring reoperation or may die before reaching reoperation. Appraising unit performance on the basis of single measures can be contentious, or possible only in the context of detailed clinical information. For example, postoperative mortality measures reflect only the outcome of patients selected for surgery, and apparent high performers in this context may be denying operations to potential candidates. Furthermore, variation in reoperation rate must depend to some extent on variation in the operative casemix and complexity undertaken by surgical teams. Clear definition of high and poor performance, even for these seemingly uncontentious metrics, is therefore hazardous. A panel of metrics may, however, help contextualise performance measurement, as illustrated by units B and C, which share high mortality outlier status but have diametrically opposed FTR-S rates.

Finally, it may be argued that 30 day mortality performance measures should only be applied to patients undergoing elective surgery. Our analyses include risk adjusted models for both elective only (figure 7) and combined elective and non-elective (figure 1) scenarios. Units identified as HMO (ie, >3 SD) for elective and emergency resections combined were also outliers (at the >2 SD threshold) when elective resections were considered alone (figure 7). This implies that mode of admission is perhaps not an overwhelming determinant of performance ranking in our population. Moreover, 30 day elective mortality at all five institutions exceeded 6%, against a national adjusted mean of 3.8%. It therefore appears reasonable to consider elective and emergency patients together in this form of analysis in order to appraise the unit as a whole. Analysing elective and non-elective patients together allows assessment of how a colorectal cancer population is treated by a given institution, rather than just how patients presenting via discrete elective or emergency channels fare. To some degree this reflects a surgeon's case selection, propensity to operate on high risk patients, and the intensive care facilities available to support such patients. Institutional performance in colorectal surgery is, to some extent, reflected in minimising patient exposure to non-elective presentation and subsequent treatment. Accordingly, interventions such as bowel stenting are practised in many centres to avoid emergency operations, and many clinicians consider this practice a marker of high quality service provision. Hospitals that successfully employ such procedures consequently operate on these patients electively but on an expedited basis. Examining the elective workload in isolation may therefore bias their outcomes unfavourably, despite their arguably providing a better service than hospitals that simply undertake an emergency operation.

For this additional reason, inclusion of both elective and emergency colorectal cancer patient groups in perioperative mortality risk models appears warranted. It is conceivable that the proportion of emergency admissions could affect unadjusted outcome measures such as length of stay or reoperation rates. As far as possible, this has been adjusted for by including mode of presentation in the risk adjustment models.

Figure 7

Funnel plot demonstrating five high mortality outlier units (A–E) on or above the 3 SD control limits for adjusted mortality and 15 low mortality outlier units below the 3 SD control limit for elective resections alone.

We recognise that there are potential limitations to the present study, some of which relate to the administrative nature of HES data. Specifically, without clinical information on tumour height, APER rates are difficult to interpret, and the validity of this metric has been questioned when it is derived from HES data.27 It is included here for comparative purposes rather than to judge the appropriateness of the procedure. The same applies to reoperation rate: clinical corroboration of the need for reoperation would facilitate performance appraisal. Data reliability is central to performance appraisal and benchmarking. High overall accuracy of the HES dataset has been demonstrated in a number of reviews.28,29 Concerns still exist, however, that variation in coding accuracy between institutions renders performance benchmarking hazardous. As stated above, genuine casemix differences that are not captured in the dataset may also influence outcome. Specifically, centres that operate on more advanced disease are likely to have higher perioperative risk, and this cannot be discerned directly from this study. At some institutions, therefore, outlier status may reflect casemix complexity rather than differences in underlying performance.


The current study suggests that high institutional postoperative mortality rate following colorectal surgery does not necessarily predict how such units perform on other measures of service quality. Benchmarking institutional colorectal surgical performance is complex and not generalisable from a single measure of outcome but rather demands global service appraisal across a range of outcome measures.


  • Funding AMA is in receipt of funding from the National Institute for Health Research for research into patient safety. The National Institute for Health Research had no role in the study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the article for publication. The researchers had complete independence from AMA's funders. PA and AB are employed within the Dr Foster Unit at Imperial, which is largely funded by a research grant from Dr Foster Intelligence (an independent health service research organisation). The Dr Foster Unit at Imperial is affiliated with the Imperial Centre for Patient Safety and Service Quality at Imperial College Healthcare NHS Trust, which is funded by the National Institute for Health Research. The Department of Primary Care and Social Medicine is grateful for support from the National Institute for Health Research Biomedical Research Centre Funding Scheme.

  • Competing interests None.

  • Ethics approval The study was approved under section 251 granted by the National Information Governance Board for Health and Social Care (formerly section 60 by the Patient Information Advisory Group). The authors have had approval for using these data for research from St Mary's local ethics committee since 2002. Application under section 251 of the NHS Act 2006 was undertaken to permit access to patient level HES data.

  • Provenance and peer review Not commissioned; externally peer reviewed.

