Background Clinicians use fibrosis in a liver biopsy to predict clinical outcomes of chronic liver disease. The performance of non-invasive tests has been evaluated against histological assessment of fibrosis but use of clinical outcomes as the reference standard would be ideal. The enhanced liver fibrosis (ELF) test was derived and validated in a large cohort of patients and shown to have high diagnostic accuracy (area under the curve (AUC)=0.80 95% CI 0.76 to 0.85) in identification of significant fibrosis on biopsy.
Objective To evaluate ELF performance in predicting clinical outcomes by following up the original ELF cohort.
Methods Patients recruited to the ELF study at seven English centres were followed up for liver morbidity and mortality by examination of clinical data. Defaulting/discharged patients were followed up by family practitioner questionnaires. Primary outcome measure was liver-related morbidity/liver-related death.
Results 457 patients were followed up (median 7 years), with ascertainment of clinical status in 92%. There were 61 liver-related outcomes (39 deaths). Survival analysis showed that the ELF score predicts liver outcomes, with people having the highest ELF scores being significantly more likely to have clinical outcomes than those in lower-score groups. A Cox proportional hazards model showed fully adjusted HRs of 75 (ELF score 12.52–16.67), 20 (10.426–12.51) and 5 (8.34–10.425) compared with patients with ELF <8.34. A unit change in ELF is associated with a doubling of risk of liver-related outcome.
Conclusions An ELF test can predict clinical outcomes in patients with chronic liver disease and may be a useful prognostic tool in clinical practice.
- Liver disease
- non-invasive test
- clinical outcome
- ELF test
- chronic liver disease
Significance of this study
What is already known about this subject?
Liver biopsy is an imperfect reference standard.
Serum markers have been shown to be predictive of liver fibrosis in cross-sectional studies.
There are limited data on the performance of serum markers in directly predicting clinical outcomes.
The enhanced liver fibrosis (ELF) test is a panel of serum markers that has been shown to predict fibrosis on biopsy with good accuracy in external validation studies but there are no data on predicting clinical outcomes.
What are the new findings?
In a follow-up study of 457 patients with mixed aetiology chronic liver disease, an ELF test can predict liver-related clinical outcomes at least as well as a liver biopsy.
A unit change in ELF is associated with a doubling of risk of liver-related outcome.
From survival analysis the ELF test had fully adjusted HRs of 75 (ELF score 12.52–16.67), 20 (10.426–12.51) and 5 (8.34–10.425) compared with patients with ELF <8.34.
An ELF test can predict clinical outcomes in patients with chronic liver disease and may be a useful prognostic tool in clinical practice.
Morbidity and mortality in chronic liver disease (CLD) are attributable to the presence, degree and consequences of hepatic fibrosis.1–3 In the minority of cases moderate or even severe fibrosis is associated with non-specific symptoms but in most cases it is asymptomatic until the disease has progressed to cirrhosis, when portal hypertension and/or hepatocellular failure develop, presenting as bleeding oesophageal varices, ascites, encephalopathy or jaundice. This may take decades. Consequently, it is difficult to identify cases of advanced liver disease before symptoms herald decompensated disease, and so prediction of future morbidity and mortality is problematic. The severity of liver fibrosis has been used by clinicians as a surrogate for clinical outcomes in CLD. The established method for the assessment of fibrosis is examination of histological specimens of the liver obtained by needle biopsy. This invasive test is subject to several limitations, including the risk for the patient, sampling error, interpretation and diagnostic accuracy.4 5 Although biopsy is useful in the diagnosis of liver disease, increasing awareness of its deficiencies has led to growing interest in the use of non-invasive methods for assessing fibrosis, the most extensively evaluated being imaging and serum biomarkers. If such methods could accurately predict prognosis they could be used to identify those at risk and provide valuable information for optimising the management of patients with CLD. This could include longitudinal assessment of the impact of interventions such as lifestyle modifications and treatment of the underlying aetiology—for example, chronic hepatitis B and C, and when available, anti-fibrotic treatments.
Serum biomarkers offer the advantage over histological scoring systems of generating continuous quantitative variables and can be automated, allowing standardisation and high throughput. The original European Liver Fibrosis panel is an example of such a panel of biomarkers shown to be accurate in diagnosing significant fibrosis on biopsy in a large, mixed liver disease study (n=921).6 This panel incorporates hyaluronic acid, tissue inhibitor of matrix metalloproteinases-1, aminoterminal propeptide of procollagen type III (which are involved in the synthesis and degradation of extracellular matrix) and age. The panel has since been simplified by removing age while maintaining diagnostic accuracy, establishing the enhanced liver fibrosis (ELF) test,7 which has been shown to be accurate in predicting significant liver fibrosis in independent populations.7–10 This paper reports a prospective study aimed at determining the prognostic accuracy of ELF in patients with CLD.
Patients recruited to the original European Liver Fibrosis study in English centres (1998–2000) were followed up for clinical outcomes by clinical record review. All biopsy specimens were of ≥12 mm length and ≥5 portal triads, and analysed by one expert liver histopathologist blinded to the ELF score (intra-observer variability κ=0.934; SE 0.012; p<0.0001; 95% CI 0.911 to 0.957). Patients were eligible for inclusion if they were aged 18–75 years and scheduled for liver biopsy for investigation of CLD following a clinical assessment by the lead hepatologist at the recruiting centre. Patients were excluded if they had a disorder associated with extrahepatic fibrosis, had clinical symptoms of decompensated cirrhosis (Child–Pugh class C) or hepatocellular carcinoma (HCC). In addition, some patients (n=44) in the original study had received liver transplants before the study and these patients were excluded from the current prognostic study. Patients were recruited consecutively. Ethical permission was granted by Multi-Centre Research Ethics Committee South West (MREC/98/6/08).
The primary outcome measure was the incidence of the first liver-related clinical event defined as liver-related death, or any episodes of decompensated cirrhosis after recruitment, including ascites (detected by paracentesis, ultrasound, or on clinical examination), encephalopathy (defined clinically), oesophageal variceal haemorrhage (confirmed by endoscopy), liver transplantation and hepatocellular cancer (diagnosed by one or more space-occupying lesions seen by imaging methods with typical patterns of HCC or by histology). Liver-related death was defined as any mention of liver disease in part 1 of the death certificate (where the primary cause of death is recorded). The secondary outcome was all-cause mortality (available in supplementary data). All outcome data were gathered by clinical researchers with content expertise, who reviewed clinical records (paper/electronic) in the recruiting centres, and who were blinded to baseline test results. Mortality was ascertained using clinical records and tracing to national death registration and through retrieval of all death certificates. For those patients who failed to attend or were discharged from the original recruiting centre, ethical approval was gained to contact the last recorded family practitioner with a questionnaire survey to ascertain when the patient was last known to be alive and to determine if there had been any liver-related events (questionnaire available from authors on request).
The original study provided baseline data on the ELF test, demographic data such as age, gender, self-reported alcohol consumption and smoking and liver biopsy. For this study, data collected included the date of first liver-related clinical event, death, date when last known to be alive, follow-up status and information on any treatment response that might have affected prognosis (eg, response to treatment with antiviral treatment for hepatitis B and C).
Standard descriptive statistics were used to describe variables. Survival analyses were performed to derive the cumulative probability of survival using Kaplan–Meier curves, censoring at death/last known alive. Statistical significance was tested by the log rank test. A priori the full ELF score range in the cohort was divided into tertiles having equal ranges of score per tertile. This was a pragmatic process as we estimated that the number of outcomes ascertained would populate these meaningfully. The middle tertile was then subdivided into two equal ranges by score, to evaluate whether this further division of ELF score could result in more precise stratification of risk. This was done with clinical usefulness in mind. The four groups had absolute scores of 4.16–8.33 (low), 8.34–10.425 (intermediate 1), 10.426–12.51 (intermediate 2) and 12.52–16.67 (high). Biopsy fibrosis stages were divided into three groups (nil/mild=Ishak 0–1; moderate=Ishak 2–3; and severe/cirrhosis=Ishak 4–6). This was done to derive clinically meaningful groups as expert hepatopathologists advise that classifying biopsies into three groupings rather than seven is more representative of histopathological findings and thus more accurate.
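The four score groups described above can be sketched as a simple lookup. This is a minimal illustration that uses only the cut points stated in the text; the function name, the labels as strings and the range check are assumptions made for the example, not part of the study's analysis code.

```python
from bisect import bisect_right

# Lower bounds of the three upper groups, taken from the ranges in the text:
# 4.16-8.33 (low), 8.34-10.425 (intermediate 1),
# 10.426-12.51 (intermediate 2), 12.52-16.67 (high).
CUT_POINTS = [8.34, 10.426, 12.52]
LABELS = ["low", "intermediate 1", "intermediate 2", "high"]

# Observed score range in the cohort, as reported in the text.
MIN_SCORE, MAX_SCORE = 4.16, 16.67


def elf_group(score):
    """Return the ELF score group for a score within the cohort's range.

    Hypothetical helper: scores outside the reported range raise an error
    rather than being assigned to a group.
    """
    if not MIN_SCORE <= score <= MAX_SCORE:
        raise ValueError("score outside the range observed in the cohort")
    # bisect_right counts how many cut points are <= score,
    # which is exactly the index of the group.
    return LABELS[bisect_right(CUT_POINTS, score)]
```

Using lower bounds as cut points avoids leaving gaps between the printed ranges, so a score such as 8.335 is still assigned to a group.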
Adjusted HRs of the association of ELF with risks of liver-related outcomes were derived for ELF and biopsy using a Cox proportional hazards model. Adjustment was made for age, gender, primary cause of CLD, baseline alcohol consumption, smoking, treatment response and centre (as a fixed effect). The proportional hazards assumption was checked by examining the log minus log plot (SPSS version 14, SPSS Inc) and Schoenfeld residuals in STATA (StataCorp, 2005. Stata Statistical Software: Release 9). The ELF score was fitted both as the categories above and as a continuous variable. Linearity was assessed by testing if a quadratic expression was required, and by fitting ELF scores as categorical and continuous predictors with the categorical term having no significant improvement of model fit over the simple linear term. Receiver operating characteristic (ROC) curves were derived for ELF and biopsy based on event occurrence at specific follow-up intervals. Six years' follow-up was selected as it offered a balance between sufficient events and minimal loss to follow-up of patients.
Sensitivity analyses were also performed at 5 and 7-year follow-ups. The Hosmer–Lemeshow statistic was used to show the goodness of fit of models to the data, where a non-significant result is indicative of good calibration. ROC curves were compared using the Hanley–McNeil method.11 Two further sensitivity analyses were also conducted: one included those patients recruited to the original diagnostic study who had received liver transplants before the start of the study, and the other examined the effect of the aetiology of liver disease. Logistic regression at a time-specific point was used to provide an adjusted OR of unit increase in ELF predicting liver-related outcomes adjusting for factors associated with liver outcomes. These were chosen a priori from knowledge of factors associated with possible liver fibrosis development and progression. They included age, sex, alcohol consumption, treatment response and smoking. Other terms were added from the results of data analysis.
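To make the area under the ROC curve concrete, it can be read as the probability that a randomly chosen patient who had an event by the chosen time point has a higher score than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it that way. The function name and the example scores are hypothetical, and this pairwise calculation is an illustration rather than the software routine used in the study.

```python
def roc_auc(scores_with_event, scores_without_event):
    """Area under the ROC curve from pairwise comparisons.

    Counts, over every (event, no-event) pair, how often the event
    patient has the higher score; ties contribute one half.
    """
    if not scores_with_event or not scores_without_event:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for with_event in scores_with_event:
        for without_event in scores_without_event:
            if with_event > without_event:
                concordant += 1.0
            elif with_event == without_event:
                concordant += 0.5
    return concordant / (len(scores_with_event) * len(scores_without_event))


# Hypothetical scores, invented purely for illustration.
example_events = [12.6, 10.9, 9.8]
example_no_events = [8.1, 9.0, 10.1, 7.5]
example_auc = roc_auc(example_events, example_no_events)
```

In this invented example 11 of the 12 pairs are concordant, so the area is 11/12.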
ELF scores were available on 457 patients from seven hepatology centres in secondary/tertiary care hospital settings throughout England. Paired ELF scores and biopsy data, coded by a single expert hepatic pathologist using the Ishak system, were available on 420 subjects. Patient characteristics are presented in table 1.
Sixty-seven per cent of the cohort were male, 43% had hepatitis C and 95% were of White ethnic origin. The median follow-up time (from recruitment to event or ‘last known alive’) was 6.86 years (range 0–9.0; IQR 5.7–7.6). Ascertainment of clinical status and outcome was possible in 92% of patients and all outcome data are presented as a proportion of the inception cohort. The loss to follow-up rate from original recruiting centre was 46% (figure 1). Questionnaire responses from GPs were received for 81% of patients lost to follow-up or discharged by the recruiting centres with only one reporting a clinical decompensation event. Thus outcome data are missing for only 8% of the original cohort. The overall mortality rate was 14% (n=65), of which 60% (n=39) had alcoholic liver disease and 13% (five) were patients with hepatitis C. There were 61/457 (13%) liver-related outcomes (fatal or non-fatal), more than half (61%) of which were in those with alcoholic liver disease (table 2).
Crude unadjusted analyses by Kaplan–Meier plots showed that there was a graded relationship between the baseline ELF score divided into the four groups and liver outcomes (figure 2a; log rank test (Mantel–Cox) p<0.001). Histology was also predictive of liver-related outcomes but only when classed in three groups, as nil/mild (Ishak stages 0–1), moderate (Ishak stages 2–3) or severe fibrosis (Ishak stages 4–6) (figure 2b). Differences between tertiles were significant (p<0.0001). (Performance of ELF in tertiles and histology as seven stages are reported in the online supplementary data.)
ELF was better than liver biopsy at predicting liver-related outcomes. The liver-related outcomes in the highest ELF group (12.52–16.67) were 50% at 2 years while those patients staged as ‘severe fibrosis/cirrhosis’ on biopsy reached 50% liver-related outcomes at 3 years. Eighty-two per cent of patients with an ELF score in the top score group experienced a liver-related outcome during follow-up compared with 46% of those identified as having ‘severe fibrosis/cirrhosis’ on biopsy (online supplementary data).
Cox proportional hazards model
For the ELF score the fully adjusted HRs for liver outcomes showed a graded response; compared with the lowest tertile the HR was 5 (95% CI 1.4 to 16.9) in the lower half of the middle tertile, 19.8 (95% CI 5.5 to 71.0) in the upper half of the middle tertile and 76 (95% CI 17.6 to 325.4) in the highest tertile (figure 2a, table 3).
For biopsy the fully adjusted HRs for liver outcomes compared with nil/mild fibrosis were 2.4 (95% CI 0.8 to 7.1) for moderate fibrosis, and 8.3 (95% CI 3.3 to 21) for severe fibrosis/cirrhosis (figure 2b).
Sensitivity analyses conducted to evaluate the effect of alcoholic liver disease as the cause of CLD showed that the ELF test maintained its prognostic accuracy (online supplementary data). A further sensitivity analysis was performed which included patients who had a transplanted liver (n=44) and who had been excluded from the study. This analysis similarly showed that ELF maintained prognostic ability at the same level as observed in the whole cohort.
From logistic regression, the fully adjusted OR for liver outcome at 6 years using ELF as a continuous variable was 2.2 (95% CI 1.7 to 2.9). Adjusting for biopsy the OR became 1.9 (95% CI 1.4 to 2.7), indicating that ELF predicts liver-related outcomes independently of biopsy.
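As a worked illustration of what a per-unit odds ratio implies, and assuming the log-linear relationship that the linearity checks described in the methods were intended to support, the odds ratio for a larger difference in ELF is the per-unit value raised to the power of that difference. The function name below is hypothetical and the calculation is not part of the study's reported analysis.

```python
def odds_ratio_for_difference(or_per_unit, elf_difference):
    """Odds ratio implied for a given difference in ELF score.

    Assumes the log odds change linearly with ELF, so per-unit
    odds ratios multiply across units.
    """
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return or_per_unit ** elf_difference


# With the adjusted per-unit OR of 2.2 reported in the text, a two-unit
# difference corresponds to 2.2 * 2.2, roughly a 4.8-fold increase in odds.
two_unit_or = odds_ratio_for_difference(2.2, 2)
```

The same arithmetic applies to the biopsy-adjusted value of 1.9, and the result is only as reliable as the linearity assumption behind it.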
The Hosmer–Lemeshow statistic was not significant (p=0.15, suggesting good calibration) and the C statistic was 0.91 (suggesting very good discrimination of high-risk subjects from low-risk subjects). For biopsy a change of one Ishak stage was associated with a 1.7 (95% CI 1.4 to 2.1) times risk of liver-related outcome (Hosmer–Lemeshow p=0.196).
From a ROC analysis, the unadjusted area under the ROC curve for predicting liver outcomes for ELF was 0.87 (95% CI 0.81 to 0.92). When comparing ELF and biopsy directly the area under the curve of ELF was greater than that of biopsy (0.87 (95% CI 0.81 to 0.92) vs 0.81 (95% CI 0.76 to 0.89)), though this difference just failed to reach statistical significance (p=0.06). Findings were similar for 5- and 7-year follow-ups (table 4).
When a threshold ELF score of 9.49 was used (Q point on the ROC curve where sensitivity=specificity) the sensitivity of ELF for predicting a liver-related outcome within 6 years was 84% and specificity 81%, the negative predictive value in this cohort was 97% and positive predictive value was 44% with a diagnostic OR of 22. In comparison, biopsy (Ishak 4–6) could predict liver-related outcomes at 6 years with a sensitivity of 70%, specificity 77%, and diagnostic OR of 8 (table 5).
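The diagnostic ORs quoted above follow directly from the reported sensitivity and specificity: the odds of a positive test among patients with an outcome divided by the odds of a positive test among those without. A minimal sketch of that arithmetic is given below; the function name is illustrative, and the inputs are the rounded percentages from the text, so the results are approximate.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Diagnostic odds ratio from sensitivity and specificity (as fractions).

    DOR = [sens / (1 - sens)] / [(1 - spec) / spec]
    """
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_positive_with_outcome = sensitivity / (1 - sensitivity)
    odds_positive_without_outcome = (1 - specificity) / specificity
    return odds_positive_with_outcome / odds_positive_without_outcome


# ELF at the 9.49 threshold: sensitivity 84%, specificity 81%.
elf_dor = diagnostic_odds_ratio(0.84, 0.81)
# Biopsy (Ishak 4-6): sensitivity 70%, specificity 77%.
biopsy_dor = diagnostic_odds_ratio(0.70, 0.77)
```

Rounded to whole numbers these reproduce the values of 22 for ELF and 8 for biopsy given in the text.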
Comparison of ELF with other panels/clinical score
Comparisons were made with the MELD score (MELD=3.8(ln serum bilirubin (mg/dl))+11.2(ln INR)+9.6(ln serum creatinine (mg/dl))+6.43).12
The ELF test could predict clinical outcomes at 6 years with area under the curve of ELF versus MELD score 0.88 vs 0.74 (p<0.0002). Further paired analyses using ELF compared with other panels was not possible owing to unavailability of data.
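For readers who want to see the MELD calculation spelled out, the sketch below applies the formula with the rounded coefficients quoted in the text. Two points are assumptions on our part: the constant is taken as 6.43, which is the published intercept of 0.643 scaled by 10 in the same way as the three coefficients, and the bounding conventions used in transplant allocation (for example, raising laboratory values below 1.0 to 1.0) are left out because the text does not describe them. The function name is hypothetical.

```python
import math


def meld_score(bilirubin_mg_dl, inr, creatinine_mg_dl):
    """MELD score from serum bilirubin (mg/dl), INR and serum creatinine (mg/dl).

    Uses natural logarithms and the rounded coefficients quoted in the text.
    The constant 6.43 and the absence of value bounding are assumptions of
    this sketch, not details confirmed by the study.
    """
    if min(bilirubin_mg_dl, inr, creatinine_mg_dl) <= 0:
        raise ValueError("laboratory values must be positive")
    return (
        3.8 * math.log(bilirubin_mg_dl)
        + 11.2 * math.log(inr)
        + 9.6 * math.log(creatinine_mg_dl)
        + 6.43
    )
```

Because the constant shifts every patient's score by the same amount, it has no effect on the ranking of patients and therefore none on an area under the curve comparison such as the one reported here.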
The ELF test could predict liver-related outcomes at 6 years in a cohort of patients with CLD with a range of aetiologies. It was at least as accurate as liver biopsy in predicting liver-related outcomes, and the relationship was graded with no evidence of a threshold. A one-point increase in ELF in adjusted models was associated with a twofold increase in risk of clinical outcome.
This study adds to the current evidence base of serum markers predicting clinical outcomes—in particular, it has very high levels of follow-up of patients over 7 years and is the first study in a cohort of patients with mixed aetiology liver disease, which is representative of much hepatology clinical practice.
The good performance of the ELF test may be because serum markers reflect fibrosis in the whole liver rather than 1/50 000th of the organ (as does a biopsy sample) or, alternatively, because the ELF test evaluates the impact of liver fibrosis on liver function as well as the architectural damage associated with histological fibrosis and cirrhosis. When used alongside biopsy, the ELF test may offer additional prognostic value over and above that provided by the biopsy alone as shown by the HR for ELF in predicting liver-related outcomes remaining high (1.7) when adjusted for biopsy, with ELF possibly reflecting ongoing pathophysiological processes and functions that biopsy cannot capture.
Direct comparisons of ELF with other panels of markers were limited by the availability of analyte data. We were only able to compare the ELF test with MELD and found that the ELF test had a better performance. Further studies to compare ELF directly with other panels in larger populations and also to evaluate whether simple markers can improve on ELF performance in prediction of clinical outcomes are needed. Cost-effectiveness studies are urgently needed to help to determine the utility of these biomarkers to patients, clinicians and providers of healthcare.
There is scant previous documentation on the performance of serum markers in predicting clinical outcomes,13–14 with only four studies and one overview published in the past 5 years.15–19 Three of these studies were from the same research group, and evaluated the performance of the FibroTest panel (FT), in predicting clinical outcomes in hepatitis C, hepatitis B and alcoholic liver disease. These appear promising but loss to follow-up in the hepatitis C study was high at 48%.15 In hepatitis B the area under the curve for FT predicting survival without complications was higher than viral load or alanine aminotransferase, the usual clinical practice.16 For patients with alcoholic liver disease the area under the ROC curve for predicting clinical outcomes was less than in patients with hepatitis B and C, but not significantly different from biopsy.17 In all these studies there were no significant differences between the lowest and middle FT score groups in prediction of liver-related outcomes on Kaplan–Meier analysis, but significant differences between the highest FT score group and the middle/lowest score groups.
Similar Kaplan–Meier analysis with ELF test scores divided into four showed a significant difference between each of the groups with respect to liver-related outcomes, permitting a precise prediction of clinical outcomes. Histology and other panels—in particular, APRI and Forns Index, were compared with FT in the above studies and found to perform significantly less well than FT.15–17 In patients with alcoholic liver disease and hepatitis B, APRI was less predictive of survival without liver-related complications than in hepatitis C.16 17 A further study has evaluated the performance of APRI in predicting HCC and mortality outcomes in a cohort of 778 patients with chronic hepatitis C treated with antiviral agents and found that APRI performance was good but that performance was diminished for untreated patients.19 All of these studies suggest that biopsy is good at predicting clinical outcomes but that serum markers are significantly better.
A further analysis evaluating ELF panel performance in predicting all-cause mortality found the AUC at 6 years was significantly greater at 0.82 (95% CI 0.75 to 0.88) compared with biopsy 0.70 (95% CI 0.62 to 0.79) (p=0.0004) (reported in the online supplementary data). The reasons for this are unclear. It might be that ELF, which includes biomarkers involved in extracellular matrix formation and breakdown, is reflecting additional morbidity in the body that contributes to the mortality rates, or multisystem disorders such as cardiovascular disease that may affect the liver.
Strengths and limitations of the ELF follow-up study
This large study was conducted in patients representative of those referred to hepatologists for investigation of liver disease by biopsy in seven UK hepatology centres in secondary and tertiary settings, and reflected general hepatology practice. Direct use of clinical outcomes rather than surrogate measures such as liver fibrosis on biopsy makes this study clinically relevant for patients and doctors. ELF was measured in one research laboratory with high-quality assurance, and biopsy data were reported by one experienced pathologist. The findings are strengthened by achieving almost complete follow-up for mortality. The ELF score generates a numerical continuous variable that may be used to quantify the severity of fibrosis and risk of complications. In comparison, liver biopsy histological staging represents a categorical descriptive variable. Although numbers are commonly assigned to these categories, pathologists emphasise that it is inappropriate to use these scores as numerical measures of fibrosis severity.20
Outcomes were evaluated using data collected by three data collectors all of whom had a clinical background with hepatology expertise. Data were not double collected owing to time and cost pressures and there may have been some interobserver error, although attempts were made to minimise this by training, and use of a standardised data collection extraction form. Minor episodes of decompensation or bleeding not requiring hospital or primary care attendance would have been missed, as would those unrecorded in the clinical records. Presence of varices was not included as an outcome as the variation in practice of endoscopy in the centres might have led to ascertainment bias. There may be inaccuracies in liver disease assignment on death certificates, but ELF was also predictive of all-cause mortality. The cohort may not have been a true inception cohort21 as subjects were not recruited at the same stage of liver disease and there was variation in the interval between first presentation and liver biopsy. However, analysis at one centre showed that most patients were recruited within 4 months of presentation to the hepatologist. Entry to the study was at the point of biopsy, which was a clinical decision taken by the hepatology specialist. This was decided on pragmatic grounds and might have introduced a degree of spectrum bias. The study subjects conformed to strict inclusion and exclusion criteria of the specialist centres and as a result generalisability of the findings must be confirmed in unselected patients.22 23 Few patients from ethnic minorities were included in the study so further research in these populations may increase generalisability of the findings. ROC analysis for biopsy should be regarded with some caution as biopsy stages have been treated as if they were continuous variables with seven thresholds, rather than ordinal categorical variables. This is an acceptable method of directly comparing ELF and biopsy but this pragmatic approach may overestimate the diagnostic performance of biopsy.
Access to a simple non-invasive test which has an ability to predict outcome at least as well as biopsy would be a valuable addition to the management of patients with liver disease—assessing prognosis and monitoring changes in disease severity where repeated use of an invasive test with established morbidity/mortality is often unacceptable to patient and doctor. It might also be used when patient preference or circumstances favour a non-invasive method of evaluating prognosis. The ELF test can identify patients more likely to have serious clinical outcomes within 6 or more years with a high degree of accuracy. These patients may then be offered surveillance for oesophageal varices and HCC, and could be better prepared for transplantation, thus reducing morbidity and mortality.24 25 While the ELF test predicted liver-related outcomes in this study more accurately than biopsy, non-invasive tests for liver fibrosis will not replace liver biopsy in the evaluation of CLD, and it can be expected that both tests will have roles in the management of patients with CLD. The wealth of information obtained from careful examination of a liver biopsy specimen can aid in diagnosis of aetiology, assessment of inflammatory activity, duration of disease, detection of comorbidity and in staging fibrosis. However, if confirmed the findings of this study suggest that ELF could be used effectively to prognosticate in CLD.
In the future the ELF test may be used in evaluating the impact of treatment directed at the underlying causes such as viral hepatitis, and in the development of new treatments such as anti-fibrotic drugs. Research is now needed to show that changes in ELF correlate with changes in clinical outcome to fully establish its role as a surrogate. It will be important to evaluate the role of ELF in primary care for more appropriate selection and referral of people with CLD risk factors and abnormal liver function tests. This may be invaluable given the burgeoning epidemics of hazardous drinking, obesity and hepatitis C.26–29 Care needs to be taken in extrapolating these data to primary care as currently no studies have evaluated performance of ELF in this setting.
In summary, the ELF score can independently predict liver-related clinical outcomes and all-cause mortality in patients with CLD, at least as well as liver biopsy.
The authors would like to acknowledge the help given to this study by the staff at the centres participating in the prognostic study—in particular, Elsbeth Henderson, Linda Knowles, Annie Lorton, Devina Mallon, Katrina O'Donnell, Jeanne Prosser, Wendy Smeeton, Madeleine Thyssen, Susan Williams. Thanks also to the Southampton Wellcome Trust Clinical Research Facility, The UCLH/UCL Comprehensive Biomedical Research Centre and the National Institute for Health Research for support of staff and facilities in the conduct of this study.
Funding Medical Research Council UK provided the salary costs for JPs via a training fellowship. Siemens Healthcare Diagnostics provided financial support for travel and subsistence for researchers collecting data from each centre.
Competing interests Professor William Rosenberg has received research support from Bayer and financial support for lecturing from Siemens Diagnostics.
Ethics approval This study was conducted with the approval of the South West Multi-centre Research Ethics Committee. Study reference: MREC/98/6/08.
Provenance and peer review Not commissioned; externally peer reviewed.