Prediction of disease severity in patients with acute pancreatitis is an important clinical goal. An accurate predictive tool allows early identification of those patients who would require treatment in a high dependency or intensive care unit and transfer to a referral centre [1]. In addition, it allows the selection of patients who definitely need early enteral tube feeding and possibly other (yet to be determined) early treatments. The research on predictors of severity in acute pancreatitis has traditionally been a quest for an ideal predictive tool. The characteristics of such a tool are well established: it would be available on admission to hospital, be easily repeated for monitoring purposes, and be quick and reproducible [1].

The pioneering work, which investigated the relationship between 43 early measurements and “overall morbidity and mortality” in 100 patients with acute pancreatitis, was published by Ranson nearly four decades ago [2]. Since then, the scientific framework of the concept of prediction in acute pancreatitis has gone unquestioned. Moreover, the number of followers has been constantly growing, with the research on predictors of severity being arguably the most prolific area in the literature on acute pancreatitis for many years. A recent systematic review of the literature found 184 original studies that reported on 196 different predictors of severity in acute pancreatitis [3]. Strikingly, 144 of 184 (78%) studies reported a statistically significant result for at least one predictor. It is also worth noting that the search was limited to studies indexed in MEDLINE and published in English. Further, it focused only on novel (non-routine) molecular markers, which effectively means that many routine markers (urea, creatinine, lactate dehydrogenase, C-reactive protein, hematocrit, blood gases, etc.) as well as several modern computer-based predictive tools in acute pancreatitis (artificial neural network, kernel-based modelling, linear discriminant analysis [4–6]) were not counted. Collectively, these findings indicate that the literature is replete with dozens, if not hundreds, of presumably effective ways to predict the severity of acute pancreatitis, but it appears that very few have entered clinical practice. There are several legitimate reasons for this lack of penetration: the predictive tools are often complex, cumbersome, expensive, and not available commercially [7]. But the most important reason is that they are notoriously inaccurate when it comes to prediction of an individual patient’s severity.

In this issue of the journal, Dr. Hong et al. [8] report on a novel computer-based predictive tool in acute pancreatitis: classification and regression tree (CART) analysis. CART is a non-parametric technique that can select, from among a large set of variables, those that individually or in combination best predict the endpoint of interest by splitting the initial cohort sequentially into smaller subsets. The method has the potential to become a valuable tool in the field of acute pancreatitis because it can not only assess which individual variables are most accurate in predicting the severity but also define their optimal combination and order (so-called “decision tree”). The study by Dr. Hong and colleagues advocates three variables for severity prediction, namely blood urea nitrogen, pleural effusion, and serum calcium, and suggests a particular order in which to use them. It is reported that the CART model has a high discriminatory power (the area under the ROC curve is up to 0.86) and correctly predicts severity in 89% of patients with acute pancreatitis.
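
The sequential splitting at the heart of CART can be illustrated with a minimal sketch of a single split. It assumes a binary endpoint (severe versus non-severe) and uses Gini impurity as the splitting criterion, which is a common choice but is not confirmed here as the criterion used by Hong et al. The function names and the toy blood urea nitrogen values are hypothetical and are not data from the study.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (0 = non-severe, 1 = severe)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    # Start from the impurity of the unsplit cohort; NaN means "no split found".
    best = (float("nan"), gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical toy cohort (illustration only, not study data).
bun = [10, 12, 15, 18, 22, 25, 30, 35]
severe = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_split(bun, severe)
```

A full tree would apply the same search to every candidate variable, keep the variable and threshold with the lowest weighted impurity, and then repeat the procedure within each resulting subset until a stopping rule is met.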

The study has a number of strengths. The total sample size of 420 patients recruited over less than 2 years is respectable. All the variables included in the model are routinely collected as part of clinical management, usually on admission to the hospital. Commendably, there were several efforts to mitigate potential biases pertinent to the retrospective nature of the study: the study population was constrained to patients admitted within 72 h of the onset of symptoms, all the transferred patients were excluded, and there was a random assignment to the training and validation cohorts.

Although the results are intriguing, several caveats have to be mentioned. First, it may well be that the constructed CART model reflects the structure of the training cohort too closely or, in statistical terms, the model is overfitted. This is evidenced by the fact that the discriminatory power for the validation cohort was higher than that for the training cohort. The important implication is that the model may be less accurate when applied in other settings. Second, while the discriminatory power of the CART model was significantly higher than that of the APACHE II score, there was no significant difference between the CART model and the logistic regression model. In fact, the latter was slightly more accurate. Third, an astute reader may notice that the diagnostic accuracy of a single variable (serum calcium) included in the model was even higher than that of the overall CART model. Despite the foregoing remarks, CART is a potentially useful addition to the growing arsenal of computer-based predictive tools in acute pancreatitis. But the important question is whether this (or any other) severity prediction tool makes the prediction of severity more accurate in an individual patient.

The key to enhancing the accuracy of predictive tools in an individual patient with acute pancreatitis is to correctly identify what we aim to predict. In other words, what endpoints should be used for the purpose of predicting the severity? The systematic review mentioned above showed that there was a remarkable heterogeneity between the studies in this regard. The endpoints for the prediction of severity included multiple factor prognostic scores (APACHE II ≥8 and/or Ranson ≥3), death, local and/or systemic complications (as defined by the Atlanta symposium), Japanese criteria of severity, organ failure, pancreatic necrosis, infected pancreatic necrosis, length of hospitalisation, ICU admission, and need for surgery [3].

It is argued that the accurate prediction of severity requires that the endpoint for the prediction be causally associated with severity. While, undoubtedly, there are many studies in the literature that demonstrate a statistically significant association between all the entities mentioned above and the severity of acute pancreatitis, one should bear in mind that the majority of observed statistical associations are non-causal. This means that the observed association between two variables might be due to other measured or unmeasured variables affecting the results. This is known as the “third variable” problem [9]. In particular, the association between APACHE II score ≥8 and mortality is true but it is not causal. It turns out that there is a third variable, namely organ failure, that is associated with APACHE II score (PaO2, creatinine, and arterial pressure are the components of the score and the criteria for diagnosing respiratory, renal, and cardiovascular failure, respectively) and that causes death. This is one of the reasons why modern prognostic scores can, on average, correctly predict the severity in only 60–80% of patients [1]. Moreover, a recent randomised controlled trial from a well-known group with an interest in acute pancreatitis employed APACHE II score ≥8 to enroll patients with a predicted severe course of acute pancreatitis and found that actual severe acute pancreatitis (as defined by the Atlanta symposium) occurred in only 46% [10]. That is inferior to tossing a coin (and definitely more labor- and time-consuming)!
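
The “third variable” problem can be made concrete with a small worked calculation. The sketch below assumes that death depends only on organ failure and not on the score itself; every probability in it is a hypothetical value chosen for illustration, not an estimate from any study.

```python
# Hypothetical probabilities (illustration only, not estimates from any study).
P_OF = 0.2                                    # P(organ failure)
P_HIGH_GIVEN_OF = {True: 0.8, False: 0.2}     # P(score >= 8 | organ failure status)
P_DEATH_GIVEN_OF = {True: 0.3, False: 0.01}   # P(death | organ failure status)
# By construction, death does not depend on the score once organ failure is known.


def p_death_given_score(high: bool) -> float:
    """P(death | score category), marginalising over organ failure."""
    numerator = 0.0
    denominator = 0.0
    for organ_failure in (True, False):
        p_of = P_OF if organ_failure else 1 - P_OF
        p_high = P_HIGH_GIVEN_OF[organ_failure]
        p_score = p_high if high else 1 - p_high
        joint = p_of * p_score                 # P(organ failure status, score category)
        numerator += joint * P_DEATH_GIVEN_OF[organ_failure]
        denominator += joint
    return numerator / denominator


risk_high = p_death_given_score(True)    # about 0.155
risk_low = p_death_given_score(False)    # about 0.027
```

Under these assumptions, patients with a high score die several times more often than patients with a low score, yet within each organ failure stratum the risk of death is identical for both score categories. The score is associated with mortality only because it tracks organ failure.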

The “third variable” problem also takes place when the need for ICU admission and surgery are considered as the endpoints for the prediction of severity. While both of them are indeed associated with mortality, these associations are non-causal. The actual causes of death in patients with acute pancreatitis are persistent organ failure and infected pancreatic necrosis. And they are also the main reasons for ICU admission and surgery, respectively (Fig. 1).

Fig. 1 Factors associated with mortality in acute pancreatitis. Solid arrow depicts a causal association, dashed arrow a non-causal association; POF persistent organ failure, IPN infected (peri)pancreatic necrosis, ICU intensive care unit

One might believe that mortality itself is an incontestable endpoint for the prediction of severity. However, it is argued that mortality should not be used for this purpose as it is already used to define (and compare) categories of severity at the population level [11]. The use of mortality for both defining and predicting the severity is a circular argument inevitably resulting in a misclassification error. Moreover, unlike mortality, severity of acute pancreatitis is not a dichotomous event but rather a continuous spectrum that can, on sound clinical and epidemiological grounds, be classified into four categories (Table 1) [11]. At the individual level, which is certainly the most important one in routine clinical practice, patients at either end of the spectrum can die. The only difference is that the probability of death varies: it grows incrementally in patients from mild through moderate and severe to critical acute pancreatitis. And this probability is determined by the presence of local and systemic factors that are causally associated with mortality in acute pancreatitis: infected (peri)pancreatic necrosis and persistent organ failure, respectively [12]. Thus, the two should be considered the optimal endpoints for the prediction of severity in acute pancreatitis.

Table 1 The new determinants-based classification of severity of acute pancreatitis [11]

In the last four decades, there has been no lack of attempts to define what constitutes the right predictive tool but virtually no efforts to establish what constitutes the right endpoint for the prediction of severity. It is invigorating to see that the first studies focusing on clinically and epidemiologically sensible endpoints have emerged in the literature [13, 14] and the future of this area of research looks bright. Provided the cart (CART or another predictive tool) is hitched to the right horse (endpoint).