Development and validation of three machine-learning models for predicting multiple organ failure in moderately severe and severe acute pancreatitis

Background Multiple organ failure (MOF) is a serious complication of moderately severe (MSAP) and severe acute pancreatitis (SAP). This study aimed to develop and assess three machine-learning models to predict MOF. Methods Patients with MSAP and SAP who were admitted from July 2014 to June 2017 were included. Firstly, parameters with significant differences between patients with MOF and without MOF were screened out by univariate analysis. Then, support vector machine (SVM), logistic regression analysis (LRA) and artificial neural networks (ANN) models were constructed based on these factors, and five-fold cross-validation was used to train each model. Results A total of 263 patients were enrolled. Univariate analysis screened out sixteen parameters relating to blood volume, inflammation, coagulation and renal function to construct machine-learning models. The predictive efficiency of the optimal combinations of features by SVM, LRA, and ANN was almost equal (AUC = 0.840, 0.832, and 0.834, respectively) and similar to that of the Acute Physiology and Chronic Health Evaluation II score (AUC = 0.814, P > 0.05). HCT, K-time, IL-6 and creatinine were the important predictive factors common to all three models. Conclusions Three machine-learning models can be efficient prognostic tools for predicting MOF in MSAP and SAP. ANN is recommended because it needs only four common parameters. Electronic supplementary material The online version of this article (10.1186/s12876-019-1016-y) contains supplementary material, which is available to authorized users.


Background
Acute pancreatitis (AP) is a common and serious inflammatory disorder that may result in severe complications such as systemic inflammatory response syndrome (SIRS), organ failure, etc. The 2012 revised Atlanta classification stratified AP into mild acute pancreatitis (MAP), moderately severe acute pancreatitis (MSAP), and severe acute pancreatitis (SAP) based on the presence of persistent organ failure and complications [1]. If organ failure cannot be resolved within 48 h, SAP will develop. Single organ failure may persist to the late phase in AP, even affecting other organs [1]. The lung is the most commonly affected extrapancreatic organ in AP and this is frequently followed by acute kidney injury and cardiovascular system injury [2]. The main factor determining clinical outcome is the presence and duration of multiple organ failure (MOF) [1,3], and the mortality of AP complicated by MOF is higher than 20% [4]. Since no specific drug is available to prevent AP developing into MOF, which is an extremely serious complication, it is pivotal to identify patients at high risk of MOF in an early phase, so intensive care and appropriate intervention can be provided to prevent disease progression.
Several single parameters such as C-reactive protein (CRP) and complex scores, including the Acute Physiology and Chronic Health Evaluation (APACHE) II score and Ranson score, are available to assess the severity of AP. However, their practical application for predicting the risk of MOF in an early phase is limited, since the CRP value can only indicate the inflammation state 72 h after the onset of symptoms; in addition, these scoring systems are cumbersome and require that some indexes are recorded dynamically [1]. At present, no single parameter or system is capable of predicting MOF in AP accurately. Therefore, it is essential to develop and validate a prognostic tool that can reliably predict MOF in the early phase.
MOF in SAP is thought to be a consequence of many factors, including uncontrolled systemic inflammation, microcirculation disturbance, coagulation dysfunction, and so on. SAP is often accompanied by substantial changes in the coagulation system [5], and coagulation-inflammation interactions occur in SAP [6]. Therefore, we speculated that inflammatory and coagulation markers could be helpful for predicting the risk of MOF. Moreover, blood urea nitrogen (BUN) is associated with mortality in AP [7] and creatinine is shown to be a marker of pancreatic necrosis [8], so they can probably predict MOF in AP as well.
It is well known that machine-learning techniques such as support vector machine (SVM), logistic regression analysis (LRA) and artificial neural networks (ANN) provide new methods for predicting clinical outcomes and complications at an individual level, and these have already been applied to clinical studies [9]. The aim of this study was to develop a computational tool for predicting the risk of MOF in MSAP and SAP from a larger set of parameters that include blood volume, inflammatory, coagulation and renal function markers, which have been shown to be different between patients with and without MOF. Five-fold cross-validation was used to test the predictive ability of SVM, LRA and ANN, and we compared the predictive efficiency of the three models and APACHE II score.

Patients and data collection
This retrospective cohort study was performed in three affiliated hospitals (Daping Hospital, Southwest Hospital and Xinqiao Hospital) of Army Medical University, Chongqing, China. Data of MSAP and SAP patients were collected from July 1st, 2014 to June 30th, 2017. Diagnostic criteria of MSAP and SAP followed the consensus revision of the Atlanta classification [1]. Patients were excluded if they had conditions such as pregnancy, pancreatic cancer, liver cirrhosis or coagulation system disease, if their laboratory examinations were incomplete, if they were transferred after initial treatment, or if the time from onset to hospital admission exceeded 24 h. According to the guideline for the management of AP [10], all patients underwent standard medical treatment such as early aggressive hydration, antibiotics for infection, enteral nutrition, and so on. The Modified Marshall score was used to evaluate organ failure at 48 h after admission to determine whether patients had delayed MOF, including pulmonary, renal and cardiovascular system failure. The study protocol was reviewed and approved by the Ethics Committee of Army Medical University and all methods were performed in accordance with the relevant guidelines and regulations. Because this was a retrospective observational study and all subjects were anonymized, informed consent was not required.
Demographic and clinical features, including gender, age, body mass index (BMI), history of hypertension, diabetes and etiology (biliary, hypertriglyceridemia, alcoholic, etc.) were recorded. Laboratory data, a total of twenty-three parameters obtained on admission, are shown in Additional file 1. We also calculated the admission APACHE II score.

Statistics
Descriptive data were presented as median and interquartile range for variables with a skewed distribution or mean ± standard deviation for variables with a normal distribution. Categorical data were presented as proportions. The Pearson chi-square test was used to compare categorical variables and multiple rates. A t-test and the nonparametric Mann-Whitney test were used to compare normally distributed and skewed variables, respectively. Statistical analyses were performed using SPSS 23.0 software. Comparisons of the ROC curves of the three models and the APACHE II score were performed using MedCalc software.

Machine learning model
SVM, LRA and ANN were performed with Matlab 2014. The parameters entered into SVM, LRA and ANN were the variables that differed significantly (P < 0.05) in univariate analysis of the whole cohort. The output was a binary variable, with one category representing patients with MOF (1) and the other representing patients without MOF (0). All original values were used for LRA, while they were normalized to the range −1 to 1 for SVM and ANN. A feature selection process was used to incrementally choose the most representative features, increasing the relevance and reducing the redundancy of the predictors. In an attempt to prevent overfitting, given the limited training cohort available, and to maximize generalizability, we used five-fold cross-validation to train the classifiers. The whole dataset was randomly divided into five roughly equally sized, non-overlapping subsets, each called a fold. Then, four of the five folds were used as the training set, and the remaining one as the validation set. Using each of the five folds as the validation set in turn, the above process was repeated 10 times. For the SVM we used a radial basis function kernel with two parameters set (C = 1, gamma = 0.5). Finally, the receiver operating characteristic (ROC) curve and the averaged area under the curve (AUC) value of each of the three classifiers over the 10 trials were obtained to assess the classification algorithm.
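The normalization and resampling steps described above can be illustrated with a minimal Python sketch (the study itself used Matlab). The function names `scale_to_unit_range` and `repeated_kfold` are hypothetical, and the sketch assumes that "repeated 10 times" means ten independent random five-fold partitions; the source does not state whether folds were stratified by MOF status or whether scaling was fitted within each training fold.

```python
import random

def scale_to_unit_range(values):
    """Linearly map numbers onto [-1, 1] (minimum -> -1, maximum -> 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, val_idx) for repeated k-fold CV.

    Assumption: each repeat reshuffles the data and re-partitions it
    into k non-overlapping folds of roughly equal size.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # sizes differ by at most 1
        for f in range(k):
            val = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, f, train, val

# Illustrative values only (not patient data).
scaled = scale_to_unit_range([30.0, 40.0, 50.0])

# 263 is the cohort size reported in the study.
splits = list(repeated_kfold(263, k=5, repeats=10))
```

With 263 patients, each validation fold holds 52 or 53 patients and the remaining 210 or 211 form the training set, giving 50 train/validation pairs in total.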
We predicted the pseudo-probability of MOF with SVM, LRA and ANN and obtained the cutoff value from the ROC curve. Then, sensitivity (SEN), specificity (SPE), false positive rate (FPR), false negative rate (FNR), positive predictive value (PPV), negative predictive value (NPV) and accuracy were calculated for each of the three models. We compared these indexes between the models and the APACHE II score.
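As a rough illustration of how these indexes follow from a cutoff on the predicted pseudo-probability, consider the Python sketch below. The source does not state how the cutoff was chosen from the ROC curve; maximizing the Youden index (SEN + SPE − 1) is an assumption made here, and all function names and the toy data are hypothetical.

```python
def confusion_counts(y_true, scores, cutoff):
    """Count TP, FP, TN, FN when score >= cutoff is classified as MOF (1)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def _ratio(num, den):
    """Safe division: undefined ratios are returned as NaN."""
    return num / den if den else float("nan")

def metrics(tp, fp, tn, fn):
    """Derive the seven evaluation indexes from the confusion counts."""
    sen = _ratio(tp, tp + fn)
    spe = _ratio(tn, tn + fp)
    return {
        "SEN": sen, "SPE": spe,
        "FPR": 1 - spe, "FNR": 1 - sen,
        "PPV": _ratio(tp, tp + fp), "NPV": _ratio(tn, tn + fn),
        "ACC": _ratio(tp + tn, tp + fp + tn + fn),
    }

def youden_cutoff(y_true, scores):
    """Assumed rule: pick the observed score maximizing SEN + SPE - 1."""
    best_cut, best_j = None, float("-inf")
    for c in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(y_true, scores, c)
        j = _ratio(tp, tp + fn) + _ratio(tn, tn + fp) - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut

# Toy labels and scores for illustration only (not study data).
y = [0, 0, 0, 1, 1, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
cut = youden_cutoff(y, p)
result = metrics(*confusion_counts(y, p, cut))
```

Note that FPR and FNR are simply the complements of SPE and SEN, so they add no independent information to a comparison between models.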

Baseline characteristics
Two hundred and sixty-three patients suffering from MSAP and SAP were enrolled in this study. The characteristics of the included patients with and without MOF are summarized in Table 1. The characteristics of the whole cohort of patients are shown in Additional file 2. Seventy-two (27.38%) patients suffered from MOF. Consistent with previous reports [11,12], biliary tract disease (40.30%) was the most common cause of AP, and hypertriglyceridemia came second (34.98%). One hundred and forty-seven of the patients were obese (BMI ≥ 25 kg/m2) [13].
Parameters including hematocrit (HCT), coagulogram, thrombelastogram (TEG), inflammatory markers, renal function and the APACHE II score differed significantly between patients with and without MOF (P < 0.05). Unexpectedly, no statistical differences were observed in white blood cell (WBC) count or calcium ion (Ca2+) level between the two groups (P > 0.05). Likewise, no differences were observed in gender, age, history of hypertension and diabetes, etiology or BMI between the two groups (P > 0.05).

SVM prediction
Sixteen parameters that differed significantly between the two groups (P < 0.05) were used for feature selection by SVM, to find an optimal combination of features for predicting MOF in MSAP and SAP. As the number of selected features increased, the resulting combination of features changed. After feature selection, the combination of nine features, namely HCT, fibrinogen (FIB), D-dimer, reaction time (R-time), kinetic time (K-time), coagulation index (CI), CRP, interleukin-6 (IL-6) and creatinine, obtained the highest AUC value, making it the optimal combination. If other features were added to this combination, the AUC value was reduced (Table 2). The AUC values of the optimal combination, a single feature (BUN) and all features were 0.840 (95% confidence interval (CI): 0.783-0.896), 0.702 (95% CI: 0.625-0.778) and 0.816 (95% CI: 0.755-0.876), respectively (Fig. 1a).
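The source does not specify the exact feature selection algorithm, so the following Python sketch shows only one generic way an incremental search of this kind can work: greedy forward selection that stops as soon as adding a feature no longer raises the score. The function `forward_select`, the feature labels and the toy scoring function are all hypothetical; in practice the score would be something like a cross-validated AUC.

```python
def forward_select(candidates, score):
    """Greedy forward selection.

    At each step, add the candidate that gives the highest score for the
    enlarged subset; stop when no addition improves on the current best.
    `score` maps a tuple of feature names to a number.
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        trials = [(score(tuple(selected + [f])), f) for f in remaining]
        top_score, top_feat = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # adding any further feature would not help
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected, best

# Toy scoring function (made-up numbers, NOT results from the study):
# each feature contributes a fixed gain, and every feature costs a small
# penalty that stands in for the loss caused by redundancy.
toy_gain = {"f1": 0.10, "f2": 0.08, "f3": 0.12, "f4": 0.09, "f5": 0.005}

def toy_score(subset):
    return 0.5 + sum(toy_gain[f] for f in subset) - 0.01 * len(subset)

chosen, chosen_score = forward_select(toy_gain.keys(), toy_score)
```

In this toy setting the weak feature `f5` is never selected, because its gain is smaller than the per-feature penalty, which mirrors the observation that adding further features to the optimal combination lowered the AUC.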

LRA prediction
As with SVM, the sixteen features were entered into LRA. However, unlike SVM, the optimal combination of features for predicting MOF needed only six features: HCT, activated partial thromboplastin time (APTT), FIB, K-time, IL-6 and creatinine. If other features were added to these six features, the AUC was again reduced (Table 3). The AUC values of the optimal combination, a single feature (IL-6) and all features were 0.832 (95% CI: 0.773-0.890), 0.709 (95% CI: 0.642-0.775) and 0.783 (95% CI: 0.714-0.853), respectively (Fig. 1b).

ANN prediction
As for SVM and LRA, the same sixteen features were entered into ANN. The optimal combination of features for predicting MOF required only four features, namely HCT, K-time, IL-6 and creatinine. If other features were added to these four features, the AUC was reduced as well (Table 4). The AUC values of the optimal combination, a single feature (IL-6) and all features were 0.834 (95% CI: 0.777-0.890), 0.705 (95% CI: 0.639-0.772) and 0.789 (95% CI: 0.723-0.856), respectively (Fig. 1c). Thus, HCT, K-time, IL-6 and creatinine were the important predictive factors for MOF common to the three optimal combinations of features obtained by SVM, LRA and ANN.

Comparisons of three models and APACHE II score
We compared the optimal combinations of features resulting from SVM, LRA and ANN with the APACHE II score. The evaluation indexes of the three models and the APACHE II score for predicting MOF in MSAP and SAP are shown in Table 5. No significant differences were observed among the three models in SEN, FNR, PPV, NPV or AUC value (P > 0.05). The SPE, FPR and predictive accuracy of SVM were superior, but the overall predictive performance of the three models and the APACHE II score did not differ (P > 0.05) (Fig. 1d).
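For readers unfamiliar with the AUC values being compared here, the Python sketch below computes an AUC point estimate directly from its probabilistic definition: the proportion of (MOF, non-MOF) patient pairs in which the MOF patient receives the higher score, with ties counted as one half. The function name `auc` and the toy data are hypothetical, and the sketch does not reproduce the statistical comparison of ROC curves, which the authors performed in MedCalc.

```python
def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    pairs; ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels and scores for illustration only (not study data).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here three of the four positive/negative pairs are ordered correctly, so the toy AUC is 0.75. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.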

Discussion
MOF is a serious systemic complication of AP, leading to high mortality [1]. The 2012 revised Atlanta classification of AP stated that organ failure lasting for more than 48 h is the key determinant of severity [1]. Meanwhile, organ failure is a risk factor for infected pancreatic necrosis and directly increases mortality to 30% [14]. Cardiovascular system failure and respiratory system failure account for 21.1 and 12.3% of total deaths in AP, respectively, and kidney failure and disseminated intravascular coagulation account for 7.0% [15]. Although tryptophan metabolism inhibition is a novel therapeutic approach for MOF in animal models, no specific therapies are available that can protect individuals against MOF induced by AP [16]. Therefore, it is extremely important to predict the risk of MOF early. The search for a convenient and practical tool for MOF prediction is still ongoing. Machine-learning techniques have powerful information-analysis capabilities and can select the most meaningful features to construct a model; they are novel tools in medical research and are increasingly recognized by medical professionals [17]. Here, we applied three machine-learning algorithms (SVM, LRA and ANN) to AP data to develop a convenient tool for predicting the risk of MOF in the medium or late phase of pancreatitis. Clinical data on routine blood tests, coagulogram, TEG, inflammatory markers, and renal function were collected and used for the machine-learning algorithms. All three models yielded satisfactory predictive performance, and each produced an optimal combination of features as its predictive model. HCT, K-time, IL-6 and creatinine were the important predictive factors for MOF common to SVM, LRA and ANN. An elevated HCT is associated with hypovolemia, while a decreased HCT suggests hemodilution.
It was reported that HCT ≥ 44% could predict persistent organ failure [18], while our research showed that a decreased HCT is correlated with MOF. Therefore, we speculate that either a significantly elevated or a significantly decreased HCT could indicate a poor prognosis. IL-6 is an effective indicator of the degree of inflammatory response in AP [19]. Plasma IL-6 levels are markedly increased in animals with pancreatitis [20], and inhibition of IL-6 alleviates the formation of edema, inflammatory cell infiltration, and necrosis in cerulein-induced AP [21]. IL-6 is a pro-inflammatory cytokine and regulates leukocyte recruitment through the IL-6 trans-signaling-dependent STAT3 pathway in pancreatic acinar cells. It links local inflammation in the pancreas to systemic inflammation, and even to lethal extrapancreatic organ damage [22]. IL-6 levels are significantly higher in patients with acute lung injury than in MAP patients [22]. Our results show that patients with MOF had higher levels of IL-6 than those without MOF and demonstrate that IL-6 plays an important role in predicting the risk of MOF.
AP patients with organ failure were reported to have higher prothrombin time (PT) and APTT levels than those without organ failure, but PT and APTT were not able to independently predict organ failure in a multivariate analysis [23]. Here, patients with MOF had higher PT, APTT, thrombin time (TT), and D-dimer and lower FIB levels, suggesting that the coagulation dysfunction was more serious in patients with MOF. However, none of these coagulogram parameters was selected by all three predictive models for MOF in pancreatitis simultaneously.
TEG is a comprehensive examination reflecting the coagulation state. K-time, one of its parameters, like the alpha (α) angle mainly indicates the FIB level, but can also be influenced by platelet function to a small extent. Other parameters in TEG include R-time, which indicates the role of clotting factors, maximum amplitude (MA), which indicates the number and function of platelets, and CI, which indicates the overall coagulation status [24]. Prolonged R-time and K-time suggest a state of hypocoagulation in pancreatitis patients with MOF, while α angle, MA and CI would be reduced. Here, K-time was demonstrated to be vital in all three models. The reasons that K-time can predict MOF are the following: firstly, one of the consequences of local inflammation is vascular injury within the pancreas, leading to endothelial cell activation and damage, increased vascular permeability, leukocyte adhesion and migration, and activation of the coagulation system [25]. Secondly, the concentrations of some clotting factors change due to activation of the coagulation system; for example, concentrations of serum tissue factor and von Willebrand factor increase in SAP [26,27], and this could significantly predict acute lung injury [27]. This results in the hypercoagulable state and thrombotic complications, including thrombosis and gangrene, observed in some SAP patients [28]. Then, the consumption of large amounts of clotting factors leads to a hypocoagulable state. In turn, coagulation dysfunction aggravates inflammation because thrombin promotes the production and release of pro-inflammatory cytokines, particularly IL-6 [28]. Therefore, coagulation dysfunction reflects the severity of AP, and the positive feedback relationship between coagulation and inflammation is the reason that K-time could predict MOF in MSAP and SAP. The serum creatinine level helps to predict organ failure in SAP if it is higher than 110 μmol/L [29].
Our results showed that creatinine levels were higher in patients with MOF than in patients without MOF. Therefore, creatinine, an essential indicator of renal function, is also useful for predicting MOF. AP patients with hyperlipidemia had a higher mortality rate, worse prognosis and higher risk of local complications [30], because elevated levels of triglyceride and free fatty acids lead to toxic effects and are essential risk factors for pancreatic acinar cell damage [31]. Diabetes and hypertension have been reported to increase the risk of AP [32] and could be predictors of SAP [33]. Age and BMI are recognized factors for the initial assessment of the severity of AP [10]. However, the etiology of AP (such as hyperlipidemia), diabetes, hypertension, age and BMI were not capable of predicting MOF in MSAP and SAP in this study, probably because these parameters are less related to MOF than inflammatory and coagulation parameters.
Here, we made use of machine learning to predict the risk of MOF induced by pancreatitis. In one study, ANN was able to predict the incidence of portosplenomesenteric venous thrombosis in AP, with an AUC value of 0.849. However, that was a small-sample study analyzing only 11 parameters [34]. We used a two-step feature selection strategy to develop a superior prediction model. The first step eliminated a large number of unrelated variables. Then, five-fold cross-validation was used to test the predictive ability of the three models investigated, to achieve a reliable and stable predictive model.
Our approach has several advantages. Firstly, it is very convenient to obtain the predicted probability of MOF for an individual, which is an advantage over complicated scoring systems such as the APACHE II score. Secondly, compared to traditional statistical methods, SVM, LRA and ANN are better at analyzing nonlinear relationships between various biochemical markers and MOF. In addition, these three models are practical, since the parameters they use are well established in routine clinical work. We recommend the ANN model, which needs only four parameters to achieve an AUC value comparable to those of SVM, LRA and the APACHE II score. Moreover, we note that combining coagulation and inflammation parameters has great potential for predicting the risk of MOF, supporting a role for coagulation dysfunction in the pathogenesis of MOF induced by AP.

Conclusions
Three convenient and practical models based on SVM, LRA and ANN, which can predict the risk of MOF in individual AP patients, were developed and validated. HCT, K-time, IL-6 and creatinine play a significant role in these models. All of the parameters in the three models are well established in routine clinical work, so SVM, LRA and ANN could be promising tools for predicting MOF in MSAP and SAP patients in clinical practice.

Additional files
Additional file 1: Table S1. Laboratory data obtained on admission of all patients. (DOC 36 kb) Additional file 2: Table S2. Baseline characteristics in the whole cohort of patients. APACHE II score, Acute Physiology and Chronic Health Evaluation II score; BMI, body mass index; MOF, multiple organ failure.