
Diagnostic utility of alarm features for colorectal cancer: systematic review and meta-analysis
A C Ford¹, S J O Veldhuyzen van Zanten², C C Rodgers³, N J Talley⁴, N B Vakil⁵, P Moayyedi¹

1. Gastroenterology Division, McMaster University, Health Sciences Centre, Hamilton, Ontario, Canada
2. Division of Gastroenterology, University of Alberta, Edmonton, Alberta, Canada
3. Department of Medicine, Dalhousie University, Halifax, Nova Scotia, Canada
4. Department of Medicine, Mayo Clinic Jacksonville, Jacksonville, Florida, USA
5. Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA

Correspondence to: Dr Alex Ford, Gastroenterology Division, McMaster University, Health Sciences Centre, 1200 Main Street West, Hamilton, Ontario, Canada; alexf12399{at}yahoo.com

## Abstract

Objective: Colorectal cancer is the second most common cause of cancer death in Europe and North America. Alarm features are used to prioritise access to urgent investigation, but there is little information concerning their utility in the diagnosis of colorectal cancer.

Methods: A systematic review and meta-analysis of the published literature was carried out to assess the diagnostic accuracy of alarm features in predicting colorectal cancer. Primary or secondary care-based studies in unselected cohorts of adult patients with lower gastrointestinal symptoms were identified by searching MEDLINE, EMBASE and CINAHL (up to October 2007). The main outcome measures were accuracy of alarm features or statistical models in predicting the presence of colorectal cancer after investigation. Data were pooled to estimate sensitivity, specificity, and positive and negative likelihood ratios. The quality of the included studies was assessed according to predefined criteria.

Results: Of 11 169 studies identified, 205 were retrieved for evaluation. Fifteen studies were eligible for inclusion, evaluating 19 443 patients, with a pooled prevalence of colorectal carcinoma of 6% (95% CI 5% to 8%). Pooled sensitivity of alarm features was poor (5% to 64%) but specificity was >95% for dark red rectal bleeding and abdominal mass, suggesting that the presence of either rules the diagnosis of colorectal cancer in. Statistical models had a sensitivity of 90%, but poor specificity.

Conclusions: Most alarm features had poor sensitivity and specificity for the diagnosis of colorectal carcinoma, whilst statistical models performed better in terms of sensitivity. Future studies should examine the utility of dark red rectal bleeding and abdominal mass, and concentrate on maximising specificity when validating statistical models.

Colorectal carcinoma is the second most common cause of cancer death in the Western world.1,2 Traditionally, lower gastrointestinal (GI) symptoms are thought to be more common in those with colorectal carcinoma.3,4 However, these symptoms are also prevalent in healthy individuals in the community,5–13 and in a health service with a limited budget it is not economically feasible to investigate every person who reports them. In the UK, the National Institute for Health and Clinical Excellence (NICE)14,15 has published guidelines to assist general practitioners in identifying patients with a high probability of colorectal carcinoma. The presence of so-called “alarm” features, such as rectal bleeding, a change of bowel habit to looser or more frequent stools, iron deficiency anaemia, or a palpable right-sided abdominal mass or rectal mass, has been proposed as the optimal method of identifying these individuals, who are required to be seen by a specialist for further assessment and investigation within 2 weeks of referral.

These symptoms and signs have been selected because it is thought that they may predict a diagnosis of colorectal cancer. However, colorectal cancer can present with vague symptoms, or patients may be completely asymptomatic, so expecting clinical features to be highly sensitive is unrealistic. Indeed, this is the rationale for screening of healthy asymptomatic individuals for colorectal cancer in many countries. In addition, as lower GI symptoms are common, many individuals will need to be investigated to detect one case of cancer. Studies that have evaluated the accuracy of clinical features in the diagnosis of colorectal carcinoma have been conflicting, with some suggesting that no single item can be used to identify patients with colorectal cancer accurately, and others reporting that certain individual symptoms or signs, or a combination of both in statistical models, are accurate in this situation.16–19 All studies have focused on maximising sensitivity, but this may not be appropriate when triaging patients for urgent referral where the doctor wishes to prioritise those with a high likelihood of having malignancy. A better approach to this problem would be to choose clinical features that have a high specificity for the diagnosis of colorectal carcinoma which, if present, would effectively rule the disease in.20

There is currently little in the way of distillation of published evidence to support the recommendations made by current guidelines. We have therefore conducted a systematic review of the literature to identify studies that examine the accuracy of alarm features to ascertain if it is possible, using current available data, to identify subgroups of patients who are likely to have colorectal carcinoma, and therefore require rapid referral for assessment and investigation to exclude this diagnosis.

## METHODS

### Data sources and search strategy

The systematic review was performed according to the Cochrane Methods Group on Screening and Diagnostic Tests guidelines.21 Two authors performed searches of the medical literature using MEDLINE (1950 to October 2007), EMBASE (1988 to October 2007) and CINAHL (1982 to October 2007). Papers on lower GI neoplasia were identified with the terms “colon neoplasms”, “rectal neoplasms”, “colorectal neoplasms” (both as Medical Subject Heading and as free text terms), “colo$” adj5 “cancer”, “colo$” adj5 “adenocarcinoma”, “rectal” adj5 “cancer” and “rectal” adj5 “adenocarcinoma” (all free text terms). These were combined using the set operator AND with papers evaluating clinical features using the terms “medical history taking”, “anaemia”, “iron deficiency anaemia”, “diarrhoea”, “alarm”, “weight” adj5 “loss”, “altered” adj5 “bowel” and “rectal” adj5 “bleed$” (all free text terms). There were no language restrictions, and papers published in abstract form were eligible for inclusion in the review. Abstracts of the papers identified by the initial search were evaluated for appropriateness to the study question, and all potentially relevant papers were obtained and evaluated in detail. The bibliographies of identified studies were used to perform a recursive search of the literature. Authors were contacted where study data or methodology required further clarification.

### Study selection

Studies were required to report prospectively on unselected cohorts of adult patients attending for investigation of lower GI symptoms, and had to record symptoms prior to investigation. Those comparing accuracy of alarm features, a statistical model or a combination of both with the results of lower GI investigation were eligible for inclusion (Box 1). More than 90% of subjects were required to undergo lower GI investigation, defined as colonoscopy, barium enema or CT colography. Studies that used flexible sigmoidoscopy alone to evaluate patients were only eligible if patients were followed-up for at least 1 year, or there was evidence that data on all potentially missed colorectal cancers were collected at study end. Case–control studies evaluating patients with cancer and comparing them with patients without cancer were not included, as they tend to bias results in favour of the diagnostic test being studied.22 Articles were independently assessed by three researchers according to the predefined eligibility criteria, with disagreements resolved by consensus.

#### Box 1 Eligibility criteria

• Adult patients (aged >16 years) presenting with lower gastrointestinal (GI) symptoms evaluated

• Cross-sectional design (not case–control)

• Patients not specially selected*

• Lower GI symptoms recorded†

• Symptoms and diagnosis recorded prospectively

• Patients undergo lower GI investigations with diagnosis recorded‡

• Symptoms and investigative diagnosis compared

• More than 100 patients included

• More than one colorectal cancer diagnosed

*Patients could be selected by age or by primary care doctor’s referral, but not other criteria (eg, only patients without rectal bleeding).

†This includes diagnostic criteria, scores generated from symptom questionnaires and computer-aided diagnoses.

‡Colonoscopy, barium enema, CT colography, flexible sigmoidoscopy or any combination of the four.

### Data extraction and quality assessment

Data were extracted by three reviewers on to predesigned forms, and discrepancies in data extraction were resolved by consensus. The quality of included studies was assessed using criteria derived from a meta-analysis that identified factors influencing the outcome of diagnostic studies (Box 2).23 Studies were evaluated according to whether assessors were blinded, whether cases were consecutive and whether sample size was adequate.

#### Box 2 Method used for assigning quality of evidence

##### Level 1 (highest)

Independent blind comparisons of test with a valid criterion standard in a large number (⩾200) of consecutive patients.

##### Level 2

Independent blind comparisons of test with a valid criterion standard in a small number (<200) of consecutive patients. Studies that had separate researchers performing test and criterion standard but did not explicitly state that these were masked were included in this category.

##### Level 3

Independent blind comparison of test with a valid criterion standard in patients who were not enrolled consecutively.

##### Level 4

Non-independent comparison of a test with a valid criterion standard among a “convenience” sample of patients believed to have the condition in question.

##### Level 5

Non-independent comparison of a test with a standard of uncertain validity.

### Data synthesis and analysis

Diagnoses established according to individual alarm features and statistical models were analysed separately. The primary goal of the study was to describe the performance of the various alarm features and statistical models in distinguishing colorectal carcinoma from all other organic and functional lower GI diseases. The sensitivity, specificity, positive likelihood ratio (LR), negative LR and 95% CI were calculated for each alarm feature using a Microsoft Excel spreadsheet (XP professional edition; Microsoft Corp, Redmond, Washington, USA) and checked using StatsDirect version 2.4.4 (StatsDirect, Sale, Cheshire, UK). Data were pooled using a random effects model,24 and StatsDirect was used to generate forest plots of pooled sensitivities, specificities, and positive and negative LRs. Where sufficient studies reported the utility of an individual alarm feature, the sensitivity and (1 – specificity) for each study were plotted graphically, a pooled summary receiver operating characteristics (ROC) curve was constructed, and the area under the curve was calculated, using Meta-DiSc version 1.4 (Universidad Complutense, Madrid, Spain).
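As a rough illustration of the per-study calculation described above, the sketch below derives sensitivity, specificity and both likelihood ratios from a single 2×2 table. The function name `diagnostic_accuracy`, the example counts and the choice of interval formulae (a simple Wald interval for proportions and the log method for LRs) are assumptions made for illustration; the text does not state which interval methods the spreadsheet or StatsDirect applied.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn, z=1.96):
    """Accuracy measures for one alarm feature from a 2x2 table.

    tp: feature present, cancer found     fp: feature present, no cancer
    fn: feature absent, cancer found      tn: feature absent, no cancer
    All four cells must be non-zero, otherwise the log-based LR
    intervals below are undefined (no continuity correction is applied).
    """
    diseased = tp + fn
    healthy = fp + tn

    sens = tp / diseased
    spec = tn / healthy
    lr_pos = sens / (fp / healthy)        # sensitivity / (1 - specificity)
    lr_neg = (fn / diseased) / spec       # (1 - sensitivity) / specificity

    def wald_ci(p, n):
        # Assumed method: normal-approximation interval for a proportion.
        se = math.sqrt(p * (1 - p) / n)
        return (max(0.0, p - z * se), min(1.0, p + z * se))

    def lr_ci(lr, a, n1, b, n2):
        # Assumed method: interval built on the log scale, where
        # SE[ln(LR)] = sqrt(1/a - 1/n1 + 1/b - 1/n2).
        se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
        return (lr * math.exp(-z * se), lr * math.exp(z * se))

    return {
        "sensitivity": sens,
        "sensitivity_ci": wald_ci(sens, diseased),
        "specificity": spec,
        "specificity_ci": wald_ci(spec, healthy),
        "lr_pos": lr_pos,
        "lr_pos_ci": lr_ci(lr_pos, tp, diseased, fp, healthy),
        "lr_neg": lr_neg,
        "lr_neg_ci": lr_ci(lr_neg, fn, diseased, tn, healthy),
    }


# Hypothetical counts (not taken from any included study):
# 1000 patients, 60 with cancer, feature present in 30 of them.
result = diagnostic_accuracy(tp=30, fp=94, fn=30, tn=846)
for name, value in result.items():
    print(name, value)
```

A positive LR well above 1 reflects a feature that is much more common in patients with cancer than in those without, while a negative LR close to 1 indicates that the absence of the feature does little to exclude the disease.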

Heterogeneity between pooled studies for each alarm feature was assessed using the I² statistic and the χ² test. Where statistically significant heterogeneity between studies existed (I² >25% or p<0.10)25 for a particular alarm feature and a sufficient number of studies were available, potential reasons for this were explored informally via subgroup analyses according to study setting (primary care-based vs secondary care-based), number of centres (single vs multicentre studies), type of lower GI investigation used (colonoscopy vs other lower GI investigations), method of symptom data collection (questionnaire vs clinical history vs method not reported) and study sample size (<500 patients vs ⩾500 patients). These are exploratory analyses only, and the results should be interpreted with caution. We compared LRs between these subgroups using the Cochran Q statistic and, due to multiple analyses, a p value of <0.01 was considered statistically significant.
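The heterogeneity statistics and random effects pooling described above can be sketched as follows. This is a minimal illustration that assumes the DerSimonian-Laird estimator of between-study variance and pooling on the log scale; the text does not name the estimator used by StatsDirect, and the function name `pool_random_effects` and the input values are hypothetical.

```python
import math


def pool_random_effects(estimates, variances, z=1.96):
    """Pool per-study estimates (for example ln LR) with a random effects model.

    Assumes the DerSimonian-Laird method. Returns Cochran's Q, I2 (%),
    the between-study variance tau2, and the pooled estimate with a CI.
    """
    k = len(estimates)
    w = [1 / v for v in variances]                     # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # I2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, floored at 0.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau2 to each within-study variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))

    return {
        "q": q,
        "df": df,
        "i2": i2,
        "tau2": tau2,
        "pooled": pooled,
        "ci": (pooled - z * se, pooled + z * se),
    }


# Hypothetical ln(LR+) values and variances for three studies.
log_lrs = [math.log(1.2), math.log(2.5), math.log(1.6)]
variances = [0.02, 0.05, 0.03]
summary = pool_random_effects(log_lrs, variances)
print("I2 (%):", summary["i2"])
print("Pooled LR+:", math.exp(summary["pooled"]))
```

The p value for Q comes from a χ² distribution with k − 1 degrees of freedom; it is omitted here because the Python standard library has no χ² distribution function.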

## RESULTS

The search strategy identified 11 169 studies of which 205 were possibly relevant to the systematic review and retrieved (fig 1). Of these, 15 studies were eligible for inclusion, evaluating a total of 19 443 patients,16–19,26–36 with a pooled prevalence of colorectal carcinoma in all studies of 6% (95% CI 5% to 8%). The prevalence of colorectal carcinoma in individual studies varied from 3% to 14.6%. Polyp detection rates in individual studies, where reported, varied between 7% and 50%. Complete colonic imaging rates for lower GI investigations varied between 56% and 98%. None of the included studies reported on operator experience, in terms of number of previous procedures performed, or withdrawal times for colonoscopy. Only three studies reported the method of bowel preparation,28,32,35 and only one study reported the quality of bowel preparation achieved,30 which was inadequate in 16% of patients. Other study characteristics are provided in table 1.

Figure 1 Flow diagram of assessment of studies identified in the systematic review.
Table 1 Characteristics of included studies

Thirteen studies examined the accuracy of alarm features in predicting colorectal carcinoma, one study reported on the accuracy of a statistical model and one study reported on both. Symptom data were collected using a questionnaire in seven studies,17 ,18 ,26 ,27 ,29 ,31 ,34 via clinical history in three studies,16 ,32 ,33 and the method of collection was unclear in the remaining studies. One study used barium enema alone as the lower GI investigation of choice,29 and no study used CT colography alone. Several studies included patients who required endoscopic visualisation of abnormalities detected at barium enema, or who were asymptomatic but undergoing follow-up or surveillance for previous colorectal carcinoma, polyps or inflammatory bowel disease.16 ,27 ,28 ,31 ,33 ,35 ,36 These groups of patients were always excluded from our analyses, as they were not relevant to the clinical question we were addressing.

### Rectal bleeding

We identified 14 studies evaluating 19 189 patients,16–19,27–36 with a pooled prevalence of colorectal cancer of 7% (95% CI 5% to 8%). The pooled prevalence of rectal bleeding in the studies was 49% (95% CI 38% to 59%), the character of which was not specified in 12 of the studies. The pooled positive and negative LRs for rectal bleeding were poor, and there was statistically significant heterogeneity between study results (table 2). The ROC curve indicated that rectal bleeding had a limited accuracy for diagnosing lower GI malignancy, with an area under the curve of 0.60 (fig 2).

Figure 2 Receiver operating characteristics curve for accuracy of rectal bleeding in predicting colorectal carcinoma. AUC, area under the curve.
Table 2 Sensitivity, specificity, and positive and negative likelihood ratios of individual symptom items and statistical models

Two studies collected more detailed information on the character of rectal blood loss in 4440 patients,17 ,18 with a pooled prevalence of colorectal cancer of 5% (95% CI 4% to 6%). The pooled prevalence of dark blood loss per rectum was 5% (95% CI 2% to 9%), as distinct from bright red rectal bleeding. The specificity of this symptom was high, which meant that the pooled positive LR was greater than for rectal bleeding of unspecified character (table 2).

### Change in bowel habit

There were 11 studies evaluating a current change in bowel habit in 17 581 patients,17–19,27–30,32,34–36 with a pooled prevalence of colorectal cancer of 6% (95% CI 5% to 8%). Only two studies specified the duration of this symptom, and this was between 3 and 12 months.17,34 The pooled prevalence of a change in bowel habit in the studies was 32% (95% CI 22% to 43%). The pooled positive and negative LRs for a change in bowel habit were poor, with statistically significant heterogeneity between studies (table 2).

### Anaemia

We identified seven studies reporting on the presence or absence of anaemia in 4404 patients,17 ,27 ,28 ,31 ,32 ,34 ,35 with a pooled prevalence of colorectal cancer of 8% (95% CI 6% to 11%). The pooled prevalence of anaemia was 11% (95% CI 8% to 15%). Pooled positive and negative LRs were again disappointing for anaemia in predicting underlying colorectal cancer (table 2).

Four studies specifically stated that they examined the utility of iron deficiency anaemia in predicting a diagnosis of colorectal carcinoma in 1571 patients,28,31,32,34 though only one study defined this precisely,34 with a pooled prevalence of cancer in these studies of 9% (95% CI 5% to 13%). The prevalence of iron deficiency anaemia was 14% (95% CI 9% to 22%). However, the pooled LRs did not improve over those for anaemia of unspecified type (table 2).

### Weight loss

We identified five studies evaluating weight loss in 7418 patients,17,18,29,31,34 with a pooled prevalence of colorectal cancer of 6% (95% CI 4% to 8%). Overall, the pooled prevalence of weight loss in the studies was 12% (95% CI 6% to 20%). Specificity of weight loss was generally high, meaning that the pooled positive LR was almost 2, but the pooled negative LR was poor (table 2).

### Diarrhoea

There were five studies reporting on diarrhoea in 3904 patients,18 ,27 ,31 ,32 ,34 with a pooled prevalence of colorectal cancer of 9% (95% CI 5% to 13%). The pooled prevalence of diarrhoea in the studies was 20% (95% CI 6% to 38%). Both pooled positive and negative LRs were poor, with a positive LR <1, suggesting that the presence of diarrhoea was actually a negative predictor of colorectal cancer (table 2).

### Abdominal mass

Two studies reported on the presence or absence of an abdominal mass on physical examination in 2465 patients,17 ,35 with a pooled prevalence of cancer in the studies of 6% (95% CI 5% to 7%). The pooled prevalence of a palpable abdominal mass was 3% (95% CI 2% to 4%). Pooled specificity of an abdominal mass was high, but pooled sensitivity was very low, resulting in poor pooled positive and negative LRs (table 2).

### Statistical models

We identified two studies reporting on statistical models in a total of 2522 patients, with a pooled prevalence of colorectal carcinoma of 4% (95% CI 3% to 5%).18 ,26 One of the models was designed specifically to differentiate between colorectal carcinoma and other organic diseases of the colon and rectum, and was applied prospectively,18 and the other was designed to differentiate between organic and functional lower GI disease, and was applied retrospectively,26 but the reporting of data in this study allowed us to examine the model’s accuracy in predicting colorectal cancer compared with all other organic and functional lower GI disease. The models predicted a pooled prevalence of colorectal carcinoma of 44% (95% CI 36% to 51%). Pooled sensitivity was high, meaning that the pooled positive and negative LRs were better than for most of the individual symptom items (table 2).

### Subgroup analyses

There were sufficient studies available to perform subgroup analyses for rectal bleeding, change in bowel habit and anaemia of unspecified type (table 3). There were trends for a change in bowel habit to perform better as a predictor of colorectal cancer in secondary care studies compared with primary care studies, and for anaemia of unspecified type to perform better in studies that collected symptom data via a questionnaire, compared with those that did not report the method of data collection, and in multicentre studies. However, these did not achieve statistical significance, and the positive and negative LRs remained poor. The type of lower GI investigation used and study sample size appeared to have little effect on the utility of alarm features. Sensitivity was lower, and specificity higher, in studies that used colonoscopy when change in bowel habit was evaluated, but not for rectal bleeding or anaemia. The reason for this is unclear, and is probably a chance finding.

Table 3 Subgroup analyses for rectal bleeding, change in bowel habit and anaemia

## DISCUSSION

To our knowledge this is the first systematic review and meta-analysis of the accuracy of alarm features or statistical models in predicting which individuals need urgent investigation to exclude colorectal cancer. The current study suggests that symptoms and signs traditionally thought to have high sensitivity for the diagnosis of colorectal cancer, when taken individually, perform little better than chance alone in differentiating between cancer and other organic and functional lower GI disease. Statistical models, validated in only two studies, were also suboptimal. This is because these models were heavily weighted towards detecting all colorectal cancers, and therefore demonstrated high sensitivity, at the expense of specificity. This aspect of their design means that many individuals would need to be investigated to detect one case of colon cancer, and they are therefore not suitable for triaging patients for urgent referral.

However, the current data also provide additional novel, and clinically more useful, information. The presence of a palpable abdominal mass on examination and a report of dark red rectal bleeding by the patient both had a specificity of >95% for the diagnosis of colorectal carcinoma, because the vast majority of individuals without cancer did not exhibit either of these clinical features. This means that their presence in a patient effectively rules in the diagnosis of colon cancer. These features could therefore have potential value to prioritise access to urgent colonoscopy in the future. The caveat to this is that these features were reported only in secondary care-based studies, and both the spectrum of disease and the doctor making the diagnosis will be different from those encountered in primary care.
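The practical effect of a likelihood ratio on the chance of cancer in an individual patient can be worked through with Bayes' theorem in odds form. The sketch below is illustrative only: the 6% pre-test probability is the pooled prevalence reported in the Results, whereas the LR values and the function name `post_test_probability` are hypothetical and are not taken from table 2.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Uses the odds form of Bayes' theorem:
        post-test odds = pre-test odds x likelihood ratio
    """
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pre-test probability: pooled prevalence of colorectal carcinoma (6%).
prevalence = 0.06

# Hypothetical positive LRs, chosen only to show the shape of the relationship.
for lr in (1, 2, 5, 10):
    prob = post_test_probability(prevalence, lr)
    print(f"LR+ {lr}: post-test probability {prob:.3f}")
```

An LR of 1 leaves the probability unchanged, which is what "no better than chance" means for a clinical feature; the larger the positive LR, the more the presence of the feature raises the probability of cancer above the baseline prevalence.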

There may be ways to improve the utility of other individual alarm features in predicting colorectal cancer. Examples include specifying the amount of weight lost (and over what period of time); the degree of anaemia (and whether there is definite evidence that this is due to iron deficiency); and using more precise definitions of diarrhoea and change in bowel habit based on either stool frequency, form or a combination of the two. Of the seven studies that reported on the presence of anaemia in this meta-analysis, only four stated that this was iron deficiency anaemia,28 ,31 ,32 ,34 and only one of these reported the threshold used to define this.34 Excluding the three studies that did not expand on the term anaemia from the meta-analysis,17 ,27 ,35 so that only studies examining the utility of iron deficiency anaemia were pooled, had little effect on the sensitivity, specificity, or positive and negative LR of anaemia in predicting colorectal carcinoma. However, from table 2 it can be seen that the study providing a precise definition of what constituted iron deficiency anaemia gave the best sensitivity, positive LR and negative LR of these four studies.34 This suggests that definite objective evidence of iron deficiency anaemia may provide greater accuracy in predicting a diagnosis of colorectal carcinoma, rather than anaemia of unspecified type. One study also quantified the degree of weight loss,17 though the positive and negative LRs derived from these data remain disappointing.

Another approach to improve the accuracy of alarm features in predicting underlying colorectal carcinoma might be to combine them. Two of the studies included in this meta-analysis used this approach.19 ,36 However, the sensitivity and positive and negative LRs remained poor. As data reporting was inconsistent between the two studies that combined symptoms, in terms of the individual symptoms that were selected for combination, it was not possible to pool the results for the purpose of this meta-analysis in order to assess the value of this approach further. Doctors often combine other items from the clinical history, such as patient age and any relevant family history, with the patient’s symptoms as part of the diagnostic process,37 and this may improve overall accuracy. This is essentially the approach taken by statistical models, but these are currently suboptimal as they have focused on achieving a high sensitivity, by predicting that between 40% and 50% of patients presenting with lower GI symptoms will have colorectal cancer. A health service with a limited budget would be unable to cope with the demands placed upon it if all these patients were referred for urgent investigation. Future models should concentrate on maximising specificity in order to rule in a diagnosis of colorectal cancer and prioritise urgent referral more effectively.

This systematic review is limited by the quality of the studies included. The majority of the studies identified, while large in several cases, were of poor quality in terms of reporting whether either patient recruitment was consecutive or assessors were blinded. As studies did not routinely report quality assurance data such as quality of bowel preparation, operator experience, withdrawal time for colonoscopy and completion rates, the accuracy of colonoscopy or barium enema as a diagnostic test cannot be assessed as part of the current study. However, even if the quality of the examinations is suboptimal in some of these studies, this is unlikely to account for the disappointing accuracy of most alarm features.

Subgroup analyses conducted according to study setting, type of lower GI investigation used, method of symptom data collection, number of study centres and study sample size failed to have any significant impact on the LRs of individual symptom items, where sufficient studies reported them, and did not reveal any obvious cause for the observed heterogeneity between studies. The poor performance of alarm features in predicting colorectal carcinoma may have been due, in part, to the fact that the majority of included studies were based in secondary care. This could have created a referral bias, whereby primary care doctors preferentially referred only those individuals in whom it was not possible to reach a clear diagnosis, after a comprehensive history and physical examination, to the studies. However, this is unlikely to be a major explanation for the findings of the current study, as the performance of these individual symptoms was no better when only primary care-based studies were included in the meta-analysis. The prevalence of colorectal cancer varied between studies, but in the majority of cases was between 4% and 8%. Three studies reported a prevalence in excess of 10%,27,32,34 higher than would be expected in a Western population; two of these studies were conducted in Italy.27,34 The reasons for this are speculative, but may reflect a degree of selection bias. Eleven of the studies stated that recruitment had been prospective, nine that they had included consecutive patients, only one study specified that investigators were blinded to patient data,29 and in four studies clinical data were collected by the doctor performing the investigative procedure,27,31–33 though the latter would be expected to bias studies towards overestimating the accuracy of alarm features in diagnosing colorectal cancer.

The findings of the current study demonstrate that there is little evidence to support current NICE guidelines on rapid referral for suspected colorectal cancer.14 ,15 The presence of a palpable abdominal mass and dark red rectal bleeding appeared to be the most useful individual clinical features to prioritise access to urgent colonoscopy, due to their high specificity. The latter item is not utilised in current guidelines, but this is an important observation and needs further validation in future studies. Finally, models assessing a combination of features should be developed to focus on specificity rather than sensitivity. These recommendations for research may enable future referral criteria to better identify patients with high likelihood of lower GI malignancy who warrant urgent investigation.

## Footnotes

• Competing interests: None.