Faecal immunochemical test is superior to symptoms in predicting pathology in patients with suspected colorectal cancer symptoms referred on a 2WW pathway: a diagnostic accuracy study

Objective To assess whether a faecal immunochemical test (FIT) could be used to select patients with suspected colorectal cancer (CRC) symptoms for urgent investigation.
Design Multicentre, double-blinded diagnostic accuracy study in 50 National Health Service (NHS) hospitals across England between October 2017 and December 2019. Patients referred to secondary care with suspected CRC symptoms meeting NHS England criteria for urgent 2-week wait referral and triaged to investigation with colonoscopy were invited to perform a quantitative FIT. The sensitivity of FIT for CRC, and the effect of relevant variables on its diagnostic accuracy, were assessed.
Results 9822 patients were included in the final analysis. The prevalence of CRC at colonoscopy was 3.3%. The FIT positivity decreased from 37.2% to 19.0% and 7.6%, respectively, at cut-offs of 2, 10 and 150 µg haemoglobin/g faeces (µg/g). The positive predictive values of FIT for CRC at these cut-offs were 8.7% (95% CI 7.8% to 9.7%), 16.1% (95% CI 14.4% to 17.8%) and 31.1% (95% CI 27.8% to 34.6%), respectively, and the negative predictive values were 99.8% (95% CI 99.7% to 99.9%), 99.6% (95% CI 99.5% to 99.7%) and 98.9% (95% CI 98.7% to 99.1%), respectively. The sensitivity of FIT for CRC decreased at the same cut-offs from 97.0% (95% CI 94.5% to 98.5%) to 90.9% (95% CI 87.2% to 93.8%) and 70.8% (95% CI 65.6% to 75.7%), respectively, while the specificity increased from 64.9% (95% CI 63.9% to 65.8%) to 83.5% (95% CI 82.8% to 84.3%) and 94.6% (95% CI 94.1% to 95.0%), respectively. The area under the receiver operating characteristic curve was 0.93 (95% CI 0.92 to 0.95).
Conclusion FIT sensitivity is maximised to 97.0% at the lowest cut-off (2 µg/g); a negative FIT result at this cut-off can effectively rule out CRC and a positive FIT result is better than symptoms to select patients for urgent investigations.
Trial registration number ISRCTN49676259.


INTRODUCTION
Bowel symptoms are the imprecise basis of referral for urgent investigation in England to rule out cancer. 1 2 Symptoms are non-specific for colorectal cancer (CRC); 96 of 100 patients referred urgently on a 2-week wait (2WW) pathway under National Institute for Health and Care Excellence (NICE) NG12 guidelines will not have CRC. 1 Urgent referrals have increased by 90% over the last 5 years 3 ; 45% of UK endoscopy units are failing to meet CRC waiting targets. 4 The faecal immunochemical test (FIT) was recommended by NICE (DG30) 2 in 2017 to guide the referral of patients with low-risk symptoms of

Significance of this study
What is already known on this subject? ► Faecal immunochemical tests (FIT) are already recommended by the National Institute for Health and Care Excellence to guide referral of patients with low-risk bowel symptoms but have not been recommended for all symptomatic patients due to concerns over the quality and power of previous studies.
What are the new findings? ► FIT sensitivity for colorectal cancer (CRC) is maximised to 97.0% at the limit of detection of 2 µg haemoglobin (Hb)/g faeces (µg/g). ► A faecal Hb concentration (f-Hb) result less than the limit of detection in symptomatic patients indicates that their chance of not having CRC is 99.8%. ► There was no significant variation in the ability of FIT to detect CRC by patient or tumour characteristics, including age, sex, ethnicity, deprivation or iron-deficiency anaemia.
How might it impact on clinical practice in the foreseeable future? ► FIT could be used to rule out CRC in primary care for symptomatic patients meeting 2-week wait criteria, with sensitivity equivalent to colonoscopy at a cut-off of 2 µg/g. ► FIT can be used to prioritise patients for investigation, as CRC and other serious bowel disease are more likely at higher f-Hb concentrations. ► The diagnostic accuracy of FIT for CRC is superior to symptoms.
CRC, and is currently used in the National Health Service (NHS) of England. FIT detects the globin component of haemoglobin (Hb) by immunoassay and can reliably measure the faecal Hb concentration (f-Hb) to the nearest microgram of Hb per gram of faeces (µg/g). 5 Since 2010, over 25 diagnostic accuracy studies have reported data on the use of FIT in symptomatic patients utilising a range of cut-offs. [6][7][8] In 2014, a study of 787 symptomatic patients from Spain suggested that FIT is more accurate for the detection of CRC than NICE 2005 criteria (CG27), although NICE has since expanded its referral criteria to include lower risk symptoms (NG12). 9 10 More recently, two meta-analyses reported the sensitivity of FIT for CRC in symptomatic patients at a cut-off of 10 µg/g was 92.1% (95% CI 86.9% to 95.3%) 6 and 94.1% (95% CI 90.0% to 96.6%). 7 However, meta-analyses cannot account for variation in f-Hb concentrations by patient-level variables such as age, [11][12][13] sex, 11-13 deprivation 13 14 and between homogeneous ethnic populations, 15 which may lead to higher rates of undetected cancers within certain groups of patients. Consequently, a health technology assessment recommended that diagnostic cohort studies be performed to investigate variation in FIT accuracy in relevant subgroups. 6 Similarly, a systematic review concluded that there was a clear need for research on FIT as a triage test in the symptomatic primary care population. 16 The NICE guidelines and FIT (NICE FIT) study was designed to investigate whether FIT could be used to rule out CRC in symptomatic patients in primary care meeting NICE 2WW criteria, and guide referral for further investigation.

Study design
The study met Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines. 17 Ethics and study approval were granted from the UK Health Research Authority (IRAS 218404).
Patients were recruited at 50 NHS hospitals across England; sites were opened sequentially during the study.
The primary outcome measure was to identify a suitable f-Hb cut-off that would maximise sensitivity for CRC. The secondary outcome measures were to establish the diagnostic accuracy of FIT for CRC and other serious bowel disease (SBD) at different f-Hb cut-offs, and investigate the impact of other variables, such as age, sex, ethnicity and deprivation.

Patient and public involvement
Patient and public representatives were consulted through a process of in-depth interviews during the development of the study protocol. All relevant feedback was considered and incorporated into patient information sheets. Study progress and feedback were provided regularly to the Royal Marsden Partners (RM Partners) Patient Advisory Group by the senior research manager. The chief investigator regularly reported to the RM Partners Clinical Oversight Group, which included patient and public involvement representatives, throughout all phases of the study. The results will be disseminated to trial participants directly via email and the website (https://www.nicefitstudy.com/), to other healthcare professionals at scientific conferences and through press releases.

Patient selection
All patients referred from primary care with symptoms of suspected CRC meeting NICE referral criteria under the 2WW pathway and who were triaged by secondary care clinicians to investigation by colonoscopy were eligible for inclusion. Secondary care sites were opened continuously throughout the process. The total number of eligible patients at each site was not captured but was dependent on the volume of referrals received by each site, and the length of time the study was

Bowel cancer screening
open to recruitment. Data on symptoms were extracted from NICE NG12 2WW and DG30 referral criteria completed on the referral form by primary care clinicians. 1 2 Patients referred urgently on a 2WW pathway without meeting NICE criteria due to clinical concerns were classified as 'others' and included in the analysis. Since patients are often referred with multiple symptoms or signs, a hierarchy was created to match one criterion to each patient, based on clinical estimation of positive predictive values (PPV). NG12 criteria were ranked in importance as follows: abdominal mass, iron-deficiency anaemia (IDA) (patients over 60 years), rectal bleeding, change in bowel habit (over 60) and abdominal pain and weight loss. DG30 criteria were ranked in importance as follows: IDA (under 60), non-IDA, abdominal pain or weight loss, change in bowel habit (under 60).
Patients were identified by the central study team or local cancer research network (CRN) team once they had been booked for colonoscopy and contacted by post or telephone and invited to participate in the study. Patients were sent an FIT specimen collection device and asked to collect one sample of faeces prior to commencing bowel preparation for their colonoscopy. A first-class return envelope was enclosed for patients to post their sample directly to the study laboratory. Patients initially provided written consent, and after approval from the National Confidentiality Advisory Group, gave implied consent by returning an FIT sample.
Patients were not included if they did not return an FIT sample, did not have a complete colonoscopy unless due to CRC, were retriaged to another investigation (eg, flexible sigmoidoscopy or CT), or withdrew consent. Patients due to undergo colonoscopy within 3 days of identification were not invited to participate in the study, as there would not have been sufficient time to return a sample. In the original NG12 guidance, 1 NICE recommended that patients with low-risk bowel symptoms were tested with a guaiac-based faecal occult blood test (gFOBT) prior to 2WW referral. In many regions, these patients were referred on 2WW pathways without gFOBT due to concerns over its poor sensitivity for CRC, 18 and therefore were eligible for inclusion. During this study, NICE recommended that low-risk patients, as defined in DG30, 2 were triaged in primary care with FIT prior to 2WW referral. This guidance was not fully implemented during this study, but those low-risk patients who were tested with FIT in primary care prior to referral were not included.

Index test and reference standard
FIT analysis was performed at one centralised laboratory where staff were blinded to patient clinical information. One HM-JACKarc analytical system (Hitachi Chemical Diagnostics Systems, Tokyo, Japan, supplied by Alpha Labs, Eastleigh, Hants, UK) was used to analyse all samples. The analytical working range is 7-400 µg/g. The limit of detection (LoD) of the assay is 2 µg/g and the limit of quantitation is 7 µg/g. NICE recommended an f-Hb cut-off of 10 µg/g in the DG30 guidelines. 2 In accordance with previous publications on FIT, we chose the LoD and the f-Hb cut-off recommended in NICE DG30 as cut-offs to investigate sensitivity. To investigate the specificity and PPV at higher f-Hb, we also chose a higher cut-off of 150 µg/g that had previously been reported to predict high rates of significant pathology. 19 FIT specimen collection and handling, quality management and result handling were conducted and reported according to recent guidelines for studies on FIT 20 (see online supplemental appendix), using recommended analytical performance specifications. 5 FIT samples that were unsuitable for analysis (collection device over or underfilled, or unavailable for analysis for more than 14 days) or performed after the colonoscopy were not included in the study.
Colonoscopy was chosen as the reference standard since it is acknowledged to be the gold-standard investigation for  Clinical data extraction was performed initially by the local CRN team. A rigorous system of quality assurance was implemented. All colonoscopy and pathology results, as well as clinical and pathological tumour staging, were checked by the central study team, and then again by a team of senior colorectal clinicians blinded to the FIT laboratory results.

Sample size
To determine the sample size, calculations were based on a significance level of 5%, power of 80% and prevalence of CRC within the NICE 2WW symptomatic population estimated at 3.5% based on data from the RM Partners Network. To demonstrate a lowest acceptable sensitivity of FIT for CRC of 98% with CI width of 2%, a total sample size of 5379 patients was required. Given that previous studies had reported a 50% noncompletion rate, it was determined that at least 10 000 patients would need to be invited to participate in the study. The study was funded to over-recruit beyond this sample size to address the secondary endpoints and investigate the impact of other factors on FIT diagnostic accuracy. Accurate power calculations were not possible for the secondary endpoints, due to the lack of data on these covariates on the diagnostic accuracy of FIT for CRC in the symptomatic populations.
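The text does not state which formula was used for this calculation. As a hedged illustration only, the sketch below applies a standard precision-based formula for estimating sensitivity (normal approximation, then scaling the required number of CRC cases by prevalence). The function name, the choice of z = 1.96 and the reading of the 2% figure as a margin of ±2 percentage points are all assumptions; under those assumptions the result agrees with the 5379 quoted above. Power does not enter this particular formula.

```python
import math

def sample_size_for_sensitivity(sensitivity, margin, prevalence, z=1.96):
    """Total participants needed so that enough diseased cases are included
    to estimate sensitivity to within +/- margin (normal approximation).

    Hypothetical helper for illustration; not the study's stated method.
    """
    # Number of diseased (CRC) cases needed for the desired precision.
    cases_needed = (z ** 2) * sensitivity * (1 - sensitivity) / margin ** 2
    # Scale up by prevalence to get the total cohort size, rounding up once.
    return math.ceil(cases_needed / prevalence)

# Inputs taken from the text: sensitivity 98%, prevalence 3.5%.
# margin=0.02 is an assumed interpretation of the stated CI width.
n_total = sample_size_for_sensitivity(0.98, 0.02, 0.035)
print(n_total)
```

Rounding is applied only at the final step; rounding the number of cases up first would give a slightly larger total.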

Data analysis
Patients with multiple findings at colonoscopy were recategorised with one diagnosis in a hierarchy; CRC ranked highest followed by high-risk adenoma (HRA) and then inflammatory bowel disease (IBD). These were grouped together as SBD. This was followed by low-risk adenoma (LRA) which was ranked above other non-malignant diagnoses, including diverticular disease, microscopic colitis, benign perianal disease (haemorrhoids, anal fissures, anal fistulas, solitary rectal ulcers), angiodysplasia, or rare findings such as melanosis coli, parasites or lipomas. HRA was defined by the NICE FIT Steering group as any polyp with high-grade dysplasia or polyps over 10 mm in size with low grade dysplasia, and serrated lesions in the right colon. Other polyps less than 10 mm were classified as LRA.
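The diagnostic hierarchy described above can be sketched as a simple ranking lookup. This is a minimal illustration rather than the study's actual data-processing code: the labels, the `DIAGNOSIS_RANK` table and both function names are hypothetical, and treating an empty findings list as "no disease" is an assumption.

```python
# Rank order taken from the text; a lower number means higher priority.
# "other" stands for the remaining non-malignant diagnoses.
DIAGNOSIS_RANK = {"CRC": 0, "HRA": 1, "IBD": 2, "LRA": 3, "other": 4}

# Serious bowel disease (SBD) groups the three highest-ranked diagnoses.
SBD = {"CRC", "HRA", "IBD"}

def primary_diagnosis(findings):
    """Return the single highest-ranked finding for one colonoscopy."""
    if not findings:
        return "no disease"  # assumption: no findings means no disease detected
    return min(findings, key=DIAGNOSIS_RANK.__getitem__)

def is_sbd(findings):
    """True if the primary diagnosis is CRC, HRA or IBD."""
    return primary_diagnosis(findings) in SBD

print(primary_diagnosis(["LRA", "IBD", "HRA"]))
print(is_sbd(["other", "LRA"]))
```

An unrecognised label raises a `KeyError`, which is deliberate here so that coding errors surface instead of being silently ranked.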
The indices of multiple deprivation were derived from postcodes (1=most deprived and 10=least deprived). 21 Patients were classified as anaemic according to WHO criteria 22 ; blood Hb concentration less than 120 g/L for women or 130 g/L for men, based on the most recent measurement within 3 months before referral. IDA was defined using British Society of Gastroenterology guidelines as present when serum ferritin concentration was less than 15 µg/L. 23 Data were assessed for normality by the Shapiro test and Q-Q plot analysis. Mann-Whitney and Kruskal-Wallis tests were used for non-normally distributed data. Analysis of variance was used across multiple groups, with separate models for each factor; age was pooled for analysis. Categorical data were compared with χ 2 tests. Sensitivity, specificity, PPV and negative predictive value (NPV) were reported for each f-Hb cut-off, with 95% CIs. Receiver operating characteristic (ROC) curves were plotted for f-Hb. These were done using an initial threshold of 0.1 to calculate sensitivity and specificity, and then recalculated with increments of 0.1 to plot the ROC curve. In every statistical analysis, p<0.05 was considered significant. All analyses were performed using SAS V.9.4 (SAS Institute).
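For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, PPV and NPV follow from a 2×2 table at a chosen f-Hb cut-off. The study's analyses were run in SAS; this Python version is an illustration only. The function names are hypothetical, the f-Hb values and CRC labels are made-up toy data, and counting a result as positive when f-Hb is greater than or equal to the cut-off is an assumption.

```python
def confusion_counts(f_hb_values, has_crc, cutoff):
    """Count TP, FP, FN, TN, treating f-Hb >= cutoff as FIT positive (assumed)."""
    tp = fp = fn = tn = 0
    for f_hb, crc in zip(f_hb_values, has_crc):
        positive = f_hb >= cutoff
        if positive and crc:
            tp += 1
        elif positive:
            fp += 1
        elif crc:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy_measures(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Toy data for illustration only (not study data).
f_hb = [0, 3, 12, 200, 1, 8, 0, 45]
crc = [False, False, True, True, False, False, True, False]

counts = confusion_counts(f_hb, crc, cutoff=10)
print(counts)
print(accuracy_measures(*counts))
```

Repeating the calculation over a grid of cut-offs and plotting sensitivity against 1 − specificity gives the ROC curve described above.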

RESULTS
Between October 2017 and December 2019, 21 126 patients were sent recruitment packs and 13 219 (62.6%) returned FIT devices. Complete FIT and colonoscopy outcomes were available for 9822 patients, who were included in the study results.

A study flow diagram is shown in figure 1 (NICE FIT study flow diagram, adapted from STARD). Data were not uploaded by the local sites for 44 patients, who were excluded. Patient demographics are summarised in table 1. The median patient age was 65.0 years (IQR 56.0-73.0). Women returned 54.9% of kits. The most common ethnic groups were white (75.9%), other (11.2%) and Asian (6.3%). The median deprivation index score was 6.0 (IQR 4.0-9.0). Patients were referred most commonly with high-risk symptoms meeting NG12 criteria (73.2%), followed by low-risk symptoms meeting DG30 criteria (21.4%) or other symptoms warranting urgent referral (6.4%).
Tests that were older than 14 days or sampled inadequately (n=330) could not be analysed. FIT analysis was performed within 7 days of sample collection in 94.8% of specimens, and within a day of receipt by the laboratory in 94.6% of specimens.
Findings at colonoscopy are reported in table 2. Overall, the most prevalent finding at colonoscopy was that no disease was detected (31.3%). SBD (CRC, HRA or IBD) was detected in 11.9% of patients during colonoscopy. CRC was detected in 3.3% of patients.
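As a consistency check, PPV and NPV for CRC can be recomputed from prevalence, sensitivity and specificity using Bayes' theorem. The sketch below uses the 3.3% prevalence given above and the sensitivities and specificities quoted in the abstract; the function name is hypothetical. Because the inputs are rounded, the outputs should only be expected to agree with the reported predictive values to within a few tenths of a percentage point.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    return tp / (tp + fp), tn / (tn + fn)

PREVALENCE = 0.033  # CRC prevalence at colonoscopy

# cut-off (ug/g) -> (sensitivity, specificity), as quoted in the abstract
REPORTED = {2: (0.970, 0.649), 10: (0.909, 0.835), 150: (0.708, 0.946)}

for cutoff, (sens, spec) in REPORTED.items():
    ppv, npv = predictive_values(sens, spec, PREVALENCE)
    print(f"{cutoff:>3} ug/g: PPV {ppv:.1%}, NPV {npv:.1%}")
```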
The diagnostic accuracy of FIT for SBD is summarised in table 4. The sensitivity of FIT for HRA and IBD is significantly lower than for CRC at every f-Hb cut-off. The PPV for SBD increases significantly at higher f-Hb cut-offs: 24.8% at 2 µg/g, 39.6% at 10 µg/g and 64.5% at 150 µg/g.
On ROC curve analysis (figure 2), the area under the curve (AUC) for CRC was 0.93 (0.92-0.95). The cut-off that maximised Youden's index (the sum of sensitivity and specificity minus one) was 38 µg/g, but FIT sensitivity was still optimised at 2 µg/g. Patients with CRC that had f-Hb <10 µg/g were analysed in further detail (table 5). There were no significant differences between patients with CRC and f-Hb greater or less than either cut-off of 2 µg/g or 10 µg/g with regard to age, sex, deprivation or ethnicity, IDA and non-IDA or tumour characteristics.
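Youden's index is J = sensitivity + specificity − 1, and the cut-off with the largest J balances the two measures. The sketch below evaluates J only at the three cut-offs whose sensitivity and specificity are quoted in the abstract; it cannot reproduce the 38 µg/g optimum, which would require the full ROC curve. The variable and function names are hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic for one cut-off."""
    return sensitivity + specificity - 1

# cut-off (ug/g) -> (sensitivity, specificity), as quoted in the abstract
REPORTED = {2: (0.970, 0.649), 10: (0.909, 0.835), 150: (0.708, 0.946)}

j_by_cutoff = {c: youden_index(se, sp) for c, (se, sp) in REPORTED.items()}
best_cutoff = max(j_by_cutoff, key=j_by_cutoff.get)

for cutoff, j in j_by_cutoff.items():
    print(f"{cutoff:>3} ug/g: J = {j:.3f}")
print("largest J among these three cut-offs:", best_cutoff)
```

Among these three points J is largest at 10 µg/g and lower at both 2 and 150 µg/g, which is compatible with an optimum lying between 10 and 150 µg/g. Choosing 2 µg/g instead reflects a deliberate preference for sensitivity over specificity when the aim is to rule out CRC.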

DISCUSSION
This is the first powered, multicentre, double-blinded diagnostic accuracy study to demonstrate that FIT can be used to select patients with NICE 2WW symptoms for urgent investigation. FIT can be used to rule out CRC when f-Hb is undetectable or low. FIT sensitivity for CRC is significantly higher at 97% when using a lower f-Hb cut-off of the LoD (2 µg/g) compared with 10 µg/g, the cut-off recommended in NICE DG30. No significant difference was found in FIT sensitivity on subgroup analysis by age, sex, deprivation, ethnicity and tumour characteristics, suggesting FIT can be used in all symptomatic patients that meet 2WW referral criteria. Employing a higher cut-off for investigation will result in a smaller group of FIT positive patients with a higher PPV or prevalence for CRC, but at the expense of detecting fewer CRC; this strategy may be adopted when endoscopy capacity is restricted or paused as occurred at the height of the current COVID-19 pandemic. 25 26 The likelihood of cancer increases with increasing f-Hb concentrations (above 150 µg/g), and consequently, FIT could be used to rule-in cancer or prioritise patients for investigation.
The most common finding at colonoscopy in symptomatic patients in our study was the absence of disease (31.3%) in keeping with other reports on 2WW referrals 19 27 ; FIT can appropriately triage these patients off urgent pathways for investigation. Importantly, a negative FIT result can be used to reassure patients that their symptoms are unlikely to be due to CRC because of the high NPV; 99.8% and 99.6% at 2 µg/g and 10 µg/g, respectively. Patients with symptoms meeting NICE criteria and a negative FIT result at these cut-offs have less than 0.5% chance of CRC; a very low risk, but not no risk. In patients with undetectable f-Hb, 617 patients would need to undergo colonoscopy to detect 1 CRC; hence clinical acumen and safety-netting remain essential to identify patients with CRC and false negative FIT. 28 The ROC AUC of 0.93 (0.92-0.95) confirms that the diagnostic accuracy of FIT is excellent, and on its own is at least as good as risk scores such as FAST (AUC 0.91) or COLON-PREDICT (AUC 0.92) that combine FIT with other patient characteristics such as demographics, serum Hb and symptoms. 29 30 Colonoscopy currently remains the gold-standard investigation to diagnose or exclude CRC but can fail to detect CRC. The sensitivity of colonoscopy for CRC in a meta-analysis of 9223 patients in 25 studies was 94.7% (95% CI 90.4% to 97.2%), although the largest trials reported data from asymptomatic participants in screening programmes. 31 A recent study from the UK reported that the postcolonoscopy CRC rate at 3 years was 3.6%-7.4%, implying that these CRC were potentially not detected at index colonoscopy. 32 In this context, FIT sensitivity of 97.0% for CRC at a cut-off of the LoD appears to be equivalent to colonoscopy for the detection of CRC. Other SBDs such as HRA and IBD are associated with a raised f-Hb; the PPV of 64.5% for SBD at 150 µg/g is clinically significant.
However, the poor sensitivity of FIT for HRA at 65.8% and IBD at 73.1% at a cut-off of 2 µg/g, and 45.4% and 57.8%, respectively, at a cut-off of 10 µg/g, suggests that FIT does not reliably identify these conditions. FIT has already been recommended by NICE DG30 to triage patients with low-risk symptoms 2 for investigation, but at the time was not recommended for high-risk symptoms, due to a lack of robust evidence within the UK setting and because f-Hb is known to vary by age, [11][12][13] sex, [11][12][13] deprivation, 13 14 cancer stage, 33 IDA 19 27 and between homogeneous ethnic populations. 15 We investigated these known covariates and found that there was no significant difference in FIT sensitivity for CRC across all groups including cancer stage and IDA at cut-offs of 2 and 10 µg/g. Previous studies have reported some differences in median f-Hb across these variables 11-15 but this was not clinically relevant for detection of CRC at the different cut-offs investigated. We have not found a significant difference in FIT sensitivity in patients referred with IDA, as was reported in other studies. 19 27 However, missing data during referral from primary care or even prior to colonoscopy were common, particularly ferritin concentration (30%), but even Hb. Furthermore, data on luminal narrowing were not reported in 24.6% of colonoscopy reports, and while ethnic representation reflected the UK population, the numbers of some minority groups even within a study of this size remained small, and the potential for type II error exists.
Although not yet recommended by NICE, FIT is already being used by some services for high-risk symptoms. The largest two reports on the diagnostic accuracy of FIT in high-risk symptoms were from service evaluations within Nottingham 19 in England and NHS Tayside in Scotland. 27 Neither study investigated the impact of age, sex, deprivation or ethnicity on FIT diagnostic accuracy. Chapman et al 19 investigated 1106 patients in Nottingham with NICE NG12 2WW symptoms (excluding rectal bleeding) with FIT. Similar results to our study were reported, with sensitivity for CRC of 97.5%, 87.5% and 60% at cut-offs of 4 µg/g (the LoD of the FIT system used), 10 µg/g and 150 µg/g, respectively; the PPV for CRC at the same cut-offs were 12.5%, 14.6% and 35.8%, respectively. No disease was found at colonoscopy in 58.8% of patients in Nottingham. In NHS Tayside, Mowat et al 27 reported the results on 1447 symptomatic patients investigated with FIT prior to colonoscopy. At a cut-off of 10 µg/g, the sensitivity of FIT was 90.5% and PPV was 11.0%; no disease was found in 27.8% of patients. FIT sensitivity for CRC in our study at 10 µg/g was similar to the results of previous meta-analyses of 4091 symptomatic patients: 92.1% (95% CI 86.9% to 95.3%) 6 and designed to meet the highest methodological quality using STARD guidelines, our results on FIT accuracy unequivocally support the use of FIT as a basis to triage patients with 2WW symptoms for referral and investigation.
Our system of quality assurance is the first described in the symptomatic FIT literature; over 30% of errors in colonoscopy data coding were detected by clinicians, including misclassification of CRC. The missed CRC rate is unknown, since the volume and key performance indicators of individual endoscopists are unknown, although the majority of endoscopy units in this study were accredited by the Joint Advisory Group on Gastrointestinal Endoscopy. As 11% of colonoscopies were incomplete and excluded from analysis, it is possible that the true prevalence of pathology present in a 2WW population was not captured by this study but at 3.3%, CRC prevalence in this study is equivalent to CRC prevalence in the 2WW population recorded nationally. 3 34 We found no obvious pattern or cause for false negative FIT results in patients with CRC, which may require further research into patient-level (genetic or medication) variables. Sampling studies in symptomatic patients (eg, multiple samples from the one bowel motion or consecutive motions) may provide possible strategies to improve sensitivity. Sequential use of further biomarkers (including volatile organic compounds in the urine, faeces or breath) following FIT might reduce the number of false positive and false negative results. 35 36 Our diagnostic accuracy results may not be replicated in other laboratories or FIT analysers, which may not be able to detect f-Hb down to 2 µg/g; an international group is working on FIT method standardisation. 37 Finally, the optimal FIT pathway remains unclear. When FIT was used in primary care in Scotland, referrals reduced by 15.1%. 27 When FIT accompanied referral in Nottingham, 2WW referrals and 2WW CTC usage increased while there was no long-term reduction in 2WW colonoscopy usage, possibly due to referral of a wider, lower risk group of patients. 38
We would recommend incorporating FIT into the referral pathway of symptomatic patients in primary care with appropriate safety netting, to reduce unnecessary referrals for investigations and help secondary care prioritise patients with higher risk of CRC. NICE have already recommended in their DG30 guidance use of FIT in primary care as a triaging tool for low-risk symptoms before referral to secondary care; this strategy should be expanded to include all symptomatic patients. The f-Hb cut-off for onwards referral should be set at the LoD (2 µg/g) to provide sensitivity equivalent to colonoscopy, the current gold standard for investigation, and yet reduce referrals by 60%. While not the primary intention of the 2WW pathway, more cases of HRA (20.4%) and IBD (15.3%) will also be detected at the LoD than at 10 µg/g. Alternatively, the f-Hb cut-offs could be set higher to reduce referrals further to match existing colonoscopy resource and maximise the PPV for CRC.

CONCLUSION
FIT is superior to 2WW symptoms in predicting pathology in patients with suspected CRC. At a cut-off of the LoD of the FIT analytical system used, FIT detects CRC with equivalent diagnostic accuracy to colonoscopy. A higher f-Hb cut-off may be set to match capacity in resource-limited settings; this will reduce the number of positive results, onwards referral for investigation and demand for colonoscopy but at the expense of detecting fewer cancers. High f-Hb levels are associated with a high PPV for CRC and SBD and can be used to prioritise investigations.
Correction notice This article has been corrected since it published Online First. The formatting of the last column in table 5 has been corrected.
Twitter Nigel D'Souza @mrnigeldsouza and Muti Abulafi @muti192
Ethics approval The study was approved by the National Research Ethics Service Committee, London-South East (reference 16/LO/2174). Ethics and study approval were granted from the UK Health Research Authority (IRAS 218404).
Provenance and peer review Not commissioned; externally peer reviewed.

Data availability statement No data are available.
Open access This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.