
Clinical Data based XGBoost Algorithm for infection risk prediction of patients with decompensated cirrhosis: a 10-year (2012–2021) Multicenter Retrospective Case-control study



To identify effective predictors of infection in patients with decompensated cirrhosis (DC) using the XGBoost algorithm in a retrospective case-control study.


Clinical data were retrospectively collected from 6,648 patients with DC admitted to five tertiary hospitals. Indicators with significant differences were identified by univariate analysis and least absolute shrinkage and selection operator (LASSO) regression. A multi-tree extreme gradient boosting (XGBoost) machine learning model was then used to rank the importance of the features selected by LASSO, and an infection risk prediction model was subsequently constructed as a simple-tree XGBoost model. Finally, the simple-tree XGBoost model was compared with a traditional logistic regression (LR) model. Model performance was evaluated by the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity.


Six features, namely total bilirubin, blood sodium, albumin, prothrombin activity, white blood cell count, and neutrophil-to-lymphocyte ratio, were selected as predictors of infection in patients with DC. A simple-tree XGBoost model constructed from these features predicted infection risk accurately, with an AUROC of 0.971, sensitivity of 0.915, and specificity of 0.900 in the training set. The simple-tree XGBoost model outperformed the traditional LR model in the training set, internal validation set, and external validation set (P < 0.001).


The simple-tree XGBoost predictive model, which relies on a minimal set of routinely available clinical data, could help primary healthcare practitioners in resource-limited settings promptly identify DC patients at risk of infection.



The natural history of cirrhosis is characterized by an asymptomatic compensated phase followed by a decompensated phase, marked by the development of overt clinical signs, the most frequent of which are ascites, bleeding, encephalopathy, and jaundice [1,2,3]. Patients with decompensated cirrhosis (DC) are critically ill and have high mortality: one study showed that the annual mortality rate of patients with DC reaches 20%, much higher than the 7% observed in patients with compensated cirrhosis [4]. Patients with DC also have more complications, of which infection is the most common [5]. Patients with cirrhosis are susceptible to many types of infection, such as spontaneous bacterial peritonitis (SBP) [6, 7], urinary tract infection [8], and spontaneous bacteremia [9, 10]. Infection is also an important trigger of severe complications such as upper gastrointestinal bleeding, hepatic encephalopathy, and hepatorenal syndrome, and is one of the main causes of death in patients with advanced cirrhosis [11,12,13]. Over the past few decades, various cohort studies have evaluated SBP-related in-hospital mortality. Between December 1984 and February 1989, the Liver Unit at the University of Barcelona Hospital Clinic reported an in-hospital mortality of 38% among 185 consecutive cirrhotic patients with SBP [14]. In another 10-year cohort study (1988 to 1998), Maryland hospitals reported that 112 of 343 patients with SBP died in hospital, a mortality rate of 32.6% [15]. Thus, patients with DC complicated by infection usually have a poor prognosis. Identifying the risk factors for infection in DC and constructing a prediction model are therefore of great significance for improving prognosis and reducing mortality in DC patients.

Machine learning, a branch of artificial intelligence, has been applied to disease prediction and diagnosis [16,17,18]. Classical machine learning algorithms include support vector machines (SVM) [19], neural networks (NNs) [20], decision trees, and ensemble tree models; among ensemble tree algorithms, XGBoost is the most commonly used [21]. Logistic regression (LR) is better suited to modeling linear relationships, whereas XGBoost, multilayer perceptron (MLP), random forest (RF), naive Bayes (NB), and SVM have strong capabilities for handling nonlinear relationships [22,23,24]. In addition, XGBoost has become one of the most successful algorithms in machine learning competitions and has been widely used with good results.

Kim et al. developed 55 machine learning models (including RF, NNs, XGBoost, and generalized linear models) to predict the need for intensive care among patients with COVID-19 and found that the XGBoost model performed best. The area under the receiver operating characteristic curve (AUROC) of the XGBoost model was 0.897 in the development group and 0.885 in the validation group, and the model could effectively predict the demand for intensive care in patients with COVID-19 [25]. Huang et al. used the traditional Cox proportional hazards model and three machine learning models to construct and select the best model for predicting recurrence after resection of hepatocellular carcinoma, with the aim of early monitoring and identification of patients at high risk of recurrence. In the internal validation set, the XGBoost model achieved the best discrimination, with a C-index of 0.713, supporting the value of XGBoost for prediction [26].

Although clinicians increasingly recognize the importance of XGBoost in clinical decision-making, its value in predicting infection in patients with DC has not been reported. We therefore designed this study to develop an XGBoost model combining demographic characteristics, etiology, complications, and laboratory indicators to predict the probability of infection in patients with DC, and to compare the XGBoost model with a prediction method based on conventional LR.


Study design and patients

Clinical data for this study were obtained from five tertiary hospitals in southwest China. In this multicenter retrospective study, 6,648 of 10,689 DC patients with clinical consultation records met the quality standards for the final analysis. Patients from hospitals A-D were randomly divided at a 7:3 ratio into a training set (n = 4,353) and an internal validation set (n = 1,866). A total of 429 patients from hospital E were used for external validation. The study adhered to the principles of the Declaration of Helsinki and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines [27]. Ethics approval was obtained from the Ethics Committee of the Affiliated Banan Hospital of Chongqing Medical University (approval number: 2021-008). Individual patient consent was not required because the study used only fully de-identified data.
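As a minimal sketch of this split, the Python snippet below pools the patients from hospitals A-D, shuffles them, assigns 70% to training and 30% to internal validation, and keeps hospital E entirely separate. The study's analysis was done in R and does not report a random seed or whether the split was stratified by infection status; the seed, the function name `split_train_validation`, and the placeholder IDs are illustrative assumptions.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=2021):
    """Randomly split pooled patient IDs into training and internal validation sets.

    The seed and the unstratified design are assumptions for illustration only.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # reproducible shuffle
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hospitals A-D are pooled and split; hospital E is held out for external validation.
pooled = [f"AD-{i}" for i in range(6219)]   # 4,353 + 1,866 pooled patients (placeholder IDs)
external = [f"E-{i}" for i in range(429)]
train, internal = split_train_validation(pooled)
print(len(train), len(internal), len(external))  # 4353 1866 429
```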

Diagnostic criteria

The diagnosis of DC was confirmed by liver biopsy; clinical, biochemical, and imaging data; or past medical records, in accordance with the “EASL Clinical Practice Guidelines for the management of patients with decompensated cirrhosis” [1]. Infection was defined as (i) SBP, pneumonia, cellulitis, urinary tract infection, or spontaneous bacteremia, (ii) established by a combination of microbiological detection and clinical or laboratory signs of infection [28, 29].

Inclusion and exclusion criteria

Patients with DC admitted between July 2012 and December 2021 were eligible for inclusion. The exclusion criteria were as follows: (i) age < 18 years, (ii) cancer other than primary liver cancer, (iii) mental illness, (iv) pregnancy or lactation, and (v) variables with > 30% missing values. The detailed selection process is shown in Supplementary Fig. 1.

Data collection

On the basis of previous studies, 28 routinely tested or recorded variables were collected: age, sex, hypertension, diabetes, smoking, drinking, primary liver cancer, family history of liver disease, hepatitis B virus (HBV), hepatitis C virus (HCV), alcoholic etiology, autoimmune etiology, gastrointestinal bleeding (GIB), ascites, hepatic encephalopathy (HE), hepatic failure (HF), total protein (TP), total bilirubin (TB), hemoglobin, blood sodium (Na), blood potassium (K), albumin (ALB), prothrombin activity (PTA), blood urea nitrogen (BUN), creatinine (Cr), red blood cell (RBC) count, white blood cell (WBC) count, and neutrophil-to-lymphocyte ratio (NLR). Because many features can take different values at different time points, only the first measurement after each patient's first admission was included.
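The first-measurement rule and the definition of NLR as a ratio of counts can be sketched as follows. The record layout (`patient_id`, `test`, `value`, `time`) and the helper names are hypothetical, and the records are assumed to have already been restricted to each patient's first admission.

```python
from datetime import datetime


def first_admission_values(records):
    """Keep the earliest value of each lab test per patient.

    `records` is a hypothetical list of dicts with keys
    patient_id, test, value, and time (a datetime).
    """
    first = {}
    for rec in sorted(records, key=lambda r: r["time"]):
        key = (rec["patient_id"], rec["test"])
        first.setdefault(key, rec["value"])  # the earliest value wins
    return first


def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio from absolute counts in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


records = [
    {"patient_id": 1, "test": "neut", "value": 6.0, "time": datetime(2020, 1, 2, 8)},
    {"patient_id": 1, "test": "neut", "value": 9.0, "time": datetime(2020, 1, 3, 8)},
    {"patient_id": 1, "test": "lymph", "value": 1.5, "time": datetime(2020, 1, 2, 8)},
]
vals = first_admission_values(records)
print(nlr(vals[(1, "neut")], vals[(1, "lymph")]))  # 4.0
```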

Statistical analysis

Statistical analysis was performed using SPSS 22.0 and R software (version 4.0.2, Vienna, Austria). The Kolmogorov-Smirnov test was used to assess the normality of quantitative data, and data with P > 0.05 were considered normally distributed. Normally distributed data were presented as the mean ± standard deviation and compared with the t-test, whereas non-normally distributed data were described as the median (interquartile range [IQR]) and compared with the Mann-Whitney U test. Qualitative data were presented as n (%) and compared with the χ² test. Missing data were imputed with the R package for multivariate imputation by chained equations (mice).
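To illustrate the rank-based comparison used for non-normally distributed variables, here is a minimal Python sketch of the two-sided Mann-Whitney U test with a normal approximation. It is not the SPSS or R implementation used in the study: ties receive average ranks, but the variance is not tie-corrected and no continuity correction is applied, so the p-values are approximate.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation) for samples x and y.

    Returns (U for x, approximate p-value). Ties get average ranks, but the
    variance is not tie-corrected and no continuity correction is applied.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of N(0, 1)
    return u1, p


u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, round(p, 3))
```

With real data, a library implementation with tie correction (or an exact test for small samples) would be preferable.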

In the model construction phase, we developed LR and XGBoost models. First, variables with statistically significant differences were identified by univariate analysis. Then, least absolute shrinkage and selection operator (LASSO) regression was used to further screen potentially relevant variables. Finally, LR and XGBoost models were constructed to analyze the effect of each variable on the risk of infection in patients with DC. The XGBoost hyperparameters were set as follows: eta = 0.3, max_depth = 5, subsample = 0.5, colsample_bytree = 1, and gamma = 0.5. We defined this model as the “multi-tree XGBoost” model and obtained the ranking of feature importance from it [30]. Correlations between the features of the multi-tree XGBoost model were evaluated using Pearson correlation analysis. To further identify the features most strongly related to infection risk in the imbalanced data, we performed 100 rounds of 5-fold cross-validation in the training set. Adding a seventh feature to the XGBoost model increased the AUROC by less than 0.5% (P = 0.158, Supplementary Fig. 2). Six features were therefore selected as significant predictors, and the resulting model was defined as the “simple-tree XGBoost” model.
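The roles of eta and gamma can be made concrete with a deliberately simplified sketch of XGBoost-style boosting for a binary outcome. Each round fits a depth-1 tree (a stump) using the first and second derivatives of the log loss, scores candidate splits with the regularized gain described by Chen and Guestrin [21], shrinks the leaf weights by eta, and stops when no split gains more than gamma. This is not the `xgboost` package and not the authors' model: it omits deeper trees (the study used max_depth = 5), row subsampling (subsample = 0.5), and many other details, and the L2 penalty `lam = 1.0` mirrors the library default rather than a value reported in the Methods.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def best_stump(X, g, h, lam, gamma):
    """Return (gain, feature, threshold, w_left, w_right) for the best split, or None."""
    G, H = sum(g), sum(h)
    best = None
    for f in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][f])
        gl = hl = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            gl += g[i]
            hl += h[i]
            if X[i][f] == X[nxt][f]:
                continue  # no threshold separates equal values
            gr, hr = G - gl, H - hl
            # Regularized structure gain for splitting one leaf into two.
            gain = 0.5 * (gl ** 2 / (hl + lam) + gr ** 2 / (hr + lam)
                          - G ** 2 / (H + lam)) - gamma
            if best is None or gain > best[0]:
                thr = (X[i][f] + X[nxt][f]) / 2
                # Optimal leaf weight on each side is -G / (H + lambda).
                best = (gain, f, thr, -gl / (hl + lam), -gr / (hr + lam))
    return best


def fit_boosted_stumps(X, y, n_rounds=50, eta=0.3, lam=1.0, gamma=0.5):
    margins = [0.0] * len(X)  # log-odds; 0.0 corresponds to probability 0.5
    trees = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]  # gradient of the log loss
        h = [pi * (1.0 - pi) for pi in p]      # Hessian of the log loss
        stump = best_stump(X, g, h, lam, gamma)
        if stump is None or stump[0] <= 0:
            break  # no split gains more than gamma, so stop adding trees
        _, f, thr, wl, wr = stump
        tree = (f, thr, eta * wl, eta * wr)     # eta shrinks each leaf weight
        trees.append(tree)
        for i, row in enumerate(X):
            margins[i] += tree[2] if row[f] < thr else tree[3]
    return trees


def predict_proba(trees, row):
    margin = sum(wl if row[f] < thr else wr for f, thr, wl, wr in trees)
    return sigmoid(margin)


# Toy data: one feature; the outcome switches from 0 to 1 above x = 3.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]
trees = fit_boosted_stumps(X, y)
print(len(trees), predict_proba(trees, [1.5]), predict_proba(trees, [5.5]))
```

A larger gamma stops boosting earlier, and a smaller eta takes smaller steps per tree; both act as regularizers.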

All statistical tests were two-sided, and statistical significance was set at P < 0.05. The R packages “rms”, “ggplot2”, “glmnet”, “plotROC”, “reportROC”, “corrplot”, “caret”, “dplyr”, and “xgboost” were used in this study.


Patient characteristics

The Mann-Whitney U test revealed no significant differences in any variable with missing values before and after multiple imputation, either in the training and internal validation sets (Supplementary Table 1) or in the external validation set (Supplementary Table 2). Table 1 summarizes the clinical characteristics of patients in the training and internal validation sets; no significant differences were observed in any variable between the two sets (P > 0.05). Patients in the training set were divided into infection and non-infection groups. Univariate analysis revealed that the following variables were significantly associated with infection: sex, hypertension, diabetes, smoking, drinking, primary liver cancer, alcoholic etiology, autoimmune etiology, GIB, HE, HF, TP, TB, hemoglobin, Na, K, ALB, PTA, BUN, Cr, RBC count, WBC count, and NLR (Table 2).

Table 1 Demographic and clinical characteristics of the training and internal validation sets
Table 2 Univariate analysis of variables associated with infection

Clinical features selection in LASSO regression analysis

Next, the 22 features that differed significantly in univariate analysis were entered into the LASSO regression analysis, and 11 were significantly associated with infection: GIB, HF, TP, TB, hemoglobin, Na, ALB, PTA, BUN, WBC count, and NLR (Fig. 1).
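For reference, the LASSO-penalized logistic regression that performs this screening can be written as below. This is the standard binomial LASSO objective (as fitted, for example, by the R glmnet package listed in the Methods); the notation is ours, with y_i ∈ {0, 1} the infection status of patient i, x_i the vector of the p = 22 candidate features (typically standardized), and λ the penalty chosen by 10-fold cross-validation as in Fig. 1B. Features whose coefficients remain nonzero at the selected λ are retained.

```latex
(\hat{\beta}_0, \hat{\boldsymbol{\beta}})
  = \arg\min_{\beta_0,\, \boldsymbol{\beta}}
  \left\{ -\frac{1}{N} \sum_{i=1}^{N}
  \Big[ y_i \,(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta})
  - \log\!\big(1 + e^{\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}}\big) \Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

Larger values of λ shrink more coefficients exactly to zero, which is what produces the coefficient paths in Fig. 1A.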

Fig. 1

Feature selection by LASSO. (A) LASSO coefficient profiles (y-axis) of the 22 features; the upper x-axis shows the number of predictors and the lower x-axis shows log(λ). (B) 10-fold cross-validation for tuning parameter selection in the LASSO model.

Figure 2 shows the correlations among these 11 features. There were significant positive correlations between HF and TB (r = 0.53, P < 0.001) and between TP and ALB (r = 0.53, P < 0.001), and significant negative correlations between HF and PTA (r = -0.55, P < 0.001) and between TB and PTA (r = -0.47, P < 0.001).
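A minimal sketch of the pairwise Pearson screening behind Fig. 2: compute r for every pair of features and flag pairs whose absolute correlation meets a cutoff. The 0.5 cutoff, the helper names, and the toy values are illustrative assumptions rather than study data, and the sketch does not compute the P values reported above.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlated_pairs(features, threshold=0.5):
    """Return (name_a, name_b, r) for pairs with |r| >= a hypothetical threshold."""
    out = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            out.append((name_a, name_b, round(r, 2)))
    return out


toy = {"TB": [1, 2, 3, 4], "PTA": [8, 6, 4, 2], "Na": [2, 1, 1, 2]}  # made-up values
print(correlated_pairs(toy))  # [('TB', 'PTA', -1.0)]
```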

Fig. 2

Correlation coefficient matrix of the 11 features

Construction and evaluation of XGBoost model

These 11 features were entered into the multi-tree XGBoost model, and Fig. 3 shows their importance ranking. We then added the ranked features to the XGBoost model one at a time until the AUROC improved by less than 0.5%. Six features, TB, Na, ALB, PTA, WBC count, and NLR, were selected as significant factors, and a simple-tree XGBoost model was constructed from these six key features.
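The stopping rule can be sketched as a greedy forward pass over the importance ranking. Here `evaluate_auroc` is a hypothetical caller-supplied function (in the study, the AUROC came from repeated 5-fold cross-validation), and `min_gain = 0.005` assumes that "0.5%" means 0.005 AUROC units rather than a relative change. The AUROC values in the usage example are invented for illustration.

```python
def forward_add(ranked_features, evaluate_auroc, min_gain=0.005):
    """Add features in importance order until the AUROC gain drops below min_gain.

    evaluate_auroc: hypothetical function mapping a list of feature names
    to a (cross-validated) AUROC.
    """
    selected = [ranked_features[0]]
    best = evaluate_auroc(selected)
    for feat in ranked_features[1:]:
        candidate = selected + [feat]
        score = evaluate_auroc(candidate)
        if score - best < min_gain:
            break  # the next feature adds too little; keep the current set
        selected, best = candidate, score
    return selected, best


# Invented AUROCs indexed by the number of features in the model.
fake = {1: 0.80, 2: 0.87, 3: 0.91, 4: 0.94, 5: 0.955, 6: 0.965, 7: 0.967}
ranked = [f"f{i}" for i in range(1, 8)]  # placeholder feature names
sel, score = forward_add(ranked, lambda feats: fake[len(feats)])
print(len(sel), score)  # 6 0.965
```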

Fig. 3

Importance ranking of the 11 features in the multi-tree XGBoost model

For benchmarking, we compared the performance of the XGBoost model with that of a conventional multivariable LR model. In the training set, the simple-tree XGBoost model with the six selected features outperformed LR with all 11 features (AUROC: 0.971 vs. 0.869, P < 0.001) or with the six selected features (AUROC: 0.971 vs. 0.864, P < 0.001) (Fig. 4). Table 3 shows detailed performance metrics for the four models in the training set, and the formulas for the performance criteria are given in Supplementary Table 3. Similarly, in the internal validation set, the simple-tree XGBoost model performed better than LR with all 11 features (AUROC: 0.998 vs. 0.878, P < 0.001) or with the six selected features (AUROC: 0.998 vs. 0.875, P < 0.001) (Supplementary Fig. 3); Supplementary Table 4 shows the detailed performance metrics for the four models in the internal validation set. In the external validation set, the simple-tree XGBoost model with the six selected features outperformed the LR model with all 11 features (AUROC: 1.000 vs. 0.849, P < 0.001) (Supplementary Fig. 4); Supplementary Table 5 shows the detailed performance metrics for the four models in the external validation set. Overall, these results suggest that the simple-tree XGBoost model predicts infection in patients with DC more accurately and stably than multivariable LR. In addition, we applied the model separately to patients from each center and compared diagnostic performance: the AUROC of each center did not differ significantly from that of all centers combined (Supplementary Table 6).
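AUROC has a direct rank interpretation: it is the probability that a randomly chosen infected patient receives a higher predicted score than a randomly chosen non-infected patient, with ties counted as one half (equivalently, the Mann-Whitney U statistic divided by the number of positive-negative pairs). The pairwise sketch below favors clarity over speed; it does not reproduce the test used to compare AUROCs between models.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    Ties count as one half. O(n_pos * n_neg) pairwise version for clarity.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```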

Fig. 4

AUROCs of the models in the training set

Table 3 Detailed performance metrics for the four models in the training set


This retrospective study of DC patients hospitalized in five tertiary hospitals in southwest China showed that six characteristics, TB, Na, ALB, PTA, WBC count, and NLR, were important predictors of infection risk in patients with DC. The simple-tree XGBoost model based on these six features showed good predictive performance: in the training set, it had an AUROC of 0.971, sensitivity of 91.5%, specificity of 90.0%, positive predictive value (PPV) of 90.8%, and negative predictive value (NPV) of 90.7%.
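The four metrics reported above follow from a 2 × 2 confusion matrix. The sketch below uses the standard definitions, which we assume match the formulas in Supplementary Table 3; the 0.5 probability threshold and the example counts are hypothetical.

```python
def confusion_counts(labels, probs, threshold=0.5):
    """Count TP, FP, TN, FN, calling a case positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard definitions; each denominator is assumed to be nonzero."""
    return {
        "sensitivity": tp / (tp + fn),  # infected patients correctly flagged
        "specificity": tn / (tn + fp),  # non-infected patients correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are infected
        "npv": tn / (tn + fn),          # cleared patients who are not infected
    }


print(diagnostic_metrics(tp=90, fp=20, tn=80, fn=10))  # hypothetical counts
```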

A growing number of studies have shown that laboratory indicators can be used conveniently and effectively to build prediction models. Wang et al. built a prognostic model for patients with COVID-19 from conventional laboratory indicators; a model combining neutrophils, lymphocytes, platelets, and IL-2R performed well in predicting death, with a sensitivity of 90.74% and specificity of 94.44% at a cutoff of 0.572 [31]. In a retrospective cohort study, researchers used laboratory indicators such as hemoglobin, platelet count, white blood cell count, urea nitrogen, creatinine, glucose, sodium, potassium, and total bicarbonate to construct a multivariable LR model predicting in-hospital mortality of hospitalized patients, and observed good calibration and fit (Hosmer-Lemeshow = 13.9, P = 0.18) [32]. Similarly, the simple-tree XGBoost model constructed in this study could provide a simple screening tool for medical providers in primary health care settings to quickly identify patients at high risk of infection in a single visit.

In a study constructing a multivariable predictive model for SBP in patients with liver cirrhosis, researchers found that blood neutrophil percentage was a significant predictor of SBP [33]. However, among the five indicators ultimately included in that model, blood neutrophil percentage had the lowest importance. Interestingly, in the present study, NLR was the most important predictor of infection in DC patients, suggesting that NLR may be a more sensitive predictor of infection than blood neutrophil percentage. In addition, all six features included in the simple-tree XGBoost model have appeared in other prediction models for infection in patients with liver cirrhosis, supporting the clinical practicality of these six features for predicting infection [34,35,36,37].

PTA is a classic index of the severity of liver disease [38]. Its sensitivity and specificity vary across liver diseases in clinical evaluation, but a decrease in PTA generally indicates some degree of liver function impairment. Titó et al. found that PTA was an independent predictor of SBP in patients with liver cirrhosis; in that study, decreased PTA was a risk factor for infection, and the risk of infection increased 0.04-fold for each 1% decrease in PTA [39]. Hypoalbuminemia is also an independent risk factor for infection in DC patients: a low ALB level reflects poor liver function and nutritional status, reduced detoxification capacity, and a markedly reduced ability to resist pathogenic bacteria, all of which predispose patients to infection [40]. TB and Na have also been reported as predictors of infection [41, 42].
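If the 0.04-fold increase reported in the cited study [39] is read as an odds ratio of 1.04 per 1% decrease in PTA from a logistic model (an assumption; the text does not state the model), the effect compounds multiplicatively on the odds scale rather than adding up linearly. A short calculation makes this explicit; note that the result is a change in odds, which approximates a change in risk only when infection is uncommon.

```python
def odds_multiplier(pct_drop, or_per_point=1.04):
    """Multiplicative change in odds for a PTA drop of pct_drop percentage points.

    or_per_point=1.04 is an assumed reading of "0.04-fold per 1% decrease".
    """
    return or_per_point ** pct_drop


for drop in (1, 10, 20):
    print(drop, round(odds_multiplier(drop), 2))
# prints "1 1.04", then "10 1.48", then "20 2.19"
```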

WBC count was another key predictor in the simple-tree XGBoost model. White blood cells are an important component of the body's defense system, and WBC count is a traditional indicator for detecting viral and bacterial infections [43]. Autoimmune disease, infection, or septicemia can cause excessive consumption of granulocytes, resulting in granulocytopenia. When diagnosing infection, WBC count can help characterize a patient's inflammatory state; however, in some patients with non-bacterial infection, WBC count also fluctuates because of external influences [44, 45]. Cheng et al. found that WBC count was an important risk factor for nosocomial bacterial infection in COVID-19 patients in tertiary hospitals. Notably, compared with patients with a WBC count of 4.0 to 10.0 × 10⁹/L, patients with a WBC count > 10.0 × 10⁹/L or ≤ 4.0 × 10⁹/L had a 7.38-fold increased risk of nosocomial bacterial infection [46]. Huang also demonstrated that WBC count (threshold > 10 × 10⁹/L) and the procalcitonin-to-lactic acid ratio (threshold > 0.438) may help identify early infection in patients with diabetic ketoacidosis, and that combining the two markers may improve specificity [47].
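The reference band used by Cheng et al. can be expressed as a small classification helper, with counts in units of 10⁹/L. The boundary handling (≤ 4.0 low, > 10.0 high) follows the thresholds quoted above; the function names are hypothetical.

```python
def wbc_category(wbc):
    """Classify a WBC count (×10^9/L) against the 4.0-10.0 reference band."""
    if wbc <= 4.0:
        return "low"
    if wbc > 10.0:
        return "high"
    return "normal"


def wbc_abnormal(wbc):
    """True for counts outside the band (> 10.0 or <= 4.0), the higher-risk group."""
    return wbc_category(wbc) != "normal"


print([wbc_category(v) for v in (3.5, 4.0, 7.2, 10.0, 12.3)])
```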

NLR is a particularly interesting parameter. Liver cirrhosis is thought to involve immune insufficiency. Neutrophils reflect the body's immediate response to inflammation and protect against bacterial infection [48,49,50], whereas lymphocyte levels reflect immune and nutritional status. In patients with liver cirrhosis, the intestinal barrier is disrupted, the intestinal flora changes, and bacterial pathogen-associated molecular patterns, such as endotoxin, enter the circulation [51, 52]. In response to pathogen-associated molecular patterns and to damage-associated molecular patterns released by hepatocyte necrosis, neutrophils produce large amounts of proinflammatory and anti-inflammatory cytokines, such as IL-6, IL-8, and IL-17, which in turn promote neutrophil activation [51]. As the disease progresses, patients often develop lymphocytopenia, which may be related to increased lymphocyte apoptosis during inflammation [53]. NLR therefore reflects the overall immune status of the body. Moreover, many studies have confirmed that NLR can be used to evaluate the long-term or short-term prognosis of patients with stable or decompensated cirrhosis and cirrhosis with or without acute liver failure [48, 54,55,56].

In 2020, the annual per capita disposable income of rural households in China was approximately 17,132 yuan, about one-third of that of urban households [57]. Financial cost may be the main barrier to screening DC patients for infection risk. Because of immune dysfunction, infection poses a major risk to patients with DC and marks the beginning of the terminal phase of the disease, but known risk factors do not fully explain this relationship. It is therefore important to minimize the number of variables in diagnostic tools used in medically underserved settings. Populations with limited access to infection care may benefit from our simple-tree XGBoost model, which uses only routinely collected variables and should not incur additional costs.

A strength of this study is its use of multicenter electronic medical record data to develop an infection prediction model. However, this study has several limitations. First, because of its retrospective design, causal relationships between the risk factors and infection should be interpreted with caution. Second, some potentially important factors were not included because of substantial missing data. Third, this study should be regarded as a pilot study; studies with more features and larger samples are needed to validate and improve the overall performance of the model.


Our study suggests that a simple predictive model could add value as an automated infection screening tool for DC patients. We identified six candidate features measured at hospital admission, TB, Na, ALB, PTA, WBC count, and NLR, as critical infection risk biomarkers for DC patients. The simple-tree XGBoost model constructed from these six features can help predict infection in DC patients with > 95% precision and > 95% sensitivity.

Data Availability

The datasets used for this study are available on request to the corresponding author.


  1. Angeli P, Bernardi M, Villanueva C, Francoz C, Mookerjee RP, Trebicka J, et al. EASL Clinical Practice Guidelines for the management of patients with decompensated cirrhosis. J Hepatol. 2018;69(2):406–60.

  2. D’Amico G, Morabito A, D’Amico M, Pasta L, Malizia G, Rebora P, et al. New concepts on the clinical course and stratification of compensated and decompensated cirrhosis. Hepatol Int. 2018;12(Suppl 1):34–43.

  3. Costentin CE, Layese R, Bourcier V, Cagnot C, Marcellin P, Guyader D, et al. Compliance with Hepatocellular Carcinoma Surveillance Guidelines Associated with increased lead-time adjusted survival of patients with compensated viral cirrhosis. Gastroenterology. 2018;155(2):431–42.

  4. Fleming KM, Aithal GP, Card TR, West J. The rate of decompensation and clinical progression of disease in people with cirrhosis: a cohort study. Aliment Pharmacol Ther. 2010;32(11–12):1343–50.

  5. Van der Merwe S, Chokshi S, Bernsmeier C, Albillos A. The multifactorial mechanisms of bacterial infection in decompensated cirrhosis. J Hepatol. 2021;75(Suppl 1):S82–S100.

  6. Solà E, Solé C, Ginès P. Management of uninfected and infected ascites in cirrhosis. Liver Int. 2016;36(Suppl 1):109–15.

  7. Gallo A, Dedionigi C, Civitelli C, Panzeri A, Corradi C, Squizzato A. Optimal management of cirrhotic ascites: a review for internal medicine physicians. J Transl Intern Med. 2020;8(4):220–36.

  8. Reuken PA, Stallmach A, Bruns T. Mortality after urinary tract infections in patients with advanced cirrhosis - relevance of acute kidney injury and comorbidities. Liver Int. 2013;33(2):220–30.

  9. Marciano S, Dirchwolf M, Bermudez CS, Sobenko N, Haddad L, Ber FG, et al. Spontaneous bacteremia and spontaneous bacterial peritonitis share similar prognosis in patients with cirrhosis: a cohort study. Hepatol Int. 2018;12(2):181–90.

  10. Benz F, Mohr R, Tacke F, Roderburg C. Pulmonary complications in patients with liver cirrhosis. 2020;8(3):150–8.

  11. Fernández J, Tandon P, Mensa J, Garcia-Tsao G. Antibiotic prophylaxis in cirrhosis: good and bad. Hepatology. 2016;63(6):2019–31.

  12. Yamaguchi D, Sakata Y, Yoshida H, Furukawa NE, Tsuruoka N, Higuchi T, et al. Effectiveness of endoscopic hemostasis with soft coagulation for Non-Variceal Upper gastrointestinal bleeding over a 12-Year period. Digestion. 2017;95(4):319–26.

  13. Alabsawy E, Shalimar, Sheikh MF, Ballester MP, Acharya SK, Agarwal B, et al. Overt hepatic encephalopathy is an independent risk factor for de novo infection in cirrhotic patients with acute decompensation. Aliment Pharmacol Ther. 2022;55(6):722–32.

  14. Toledo C, Salmerón JM, Rimola A, Navasa M, Arroyo V, Llach J, et al. Spontaneous bacterial peritonitis in cirrhosis: predictive factors of infection resolution and survival in patients treated with cefotaxime. Hepatology. 1993;17(2):251–7.

  15. Thuluvath PJ, Morss S, Thompson R. Spontaneous bacterial peritonitis—in-hospital mortality, predictors of survival, and health care costs from 1988 to 1998. Am J Gastroenterol. 2001;96(4):1232–6.

  16. Saberi-Karimian M, Khorasanchi Z, Ghazizadeh H, Tayefi M, Saffar S, Ferns GA, et al. Potential value and impact of data mining and machine learning in clinical diagnostics. Crit Rev Clin Lab Sci. 2021;58(4):275–96.

  17. Jayatilake SMDAC, Ganegoda GU. Involvement of machine learning tools in Healthcare decision making. J Healthc Eng. 2021;2021:6679512.

  18. Shu S, Ren J, Song J. Clinical application of machine learning-based Artificial Intelligence in the diagnosis, prediction, and classification of Cardiovascular Diseases. Circ J. 2021;85(9):1416–25.

  19. Mangasarian OL, Wild EW. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell. 2006;28(1):69–74.

  20. Cichy RM, Kaiser D. Deep neural networks as scientific models. Trends Cogn Sci. 2019;23(4):305–17.

  21. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016. p. 785–94.

  22. Steinmeyer C, Wiese L. Sampling methods and feature selection for mortality prediction with neural networks. J Biomed Inform. 2020;111:103580.

  23. Auret L, Aldrich C. Interpretation of nonlinear relationships between process variables by use of random forests. Miner Eng. 2012;35:27–42.

  24. Bai Y, Bain M. Optimizing weighted lazy learning and Naive Bayes classification using differential evolution algorithm. J Ambient Intell Humaniz Comput. 2021(prepublish):1–20.

  25. Kim H-J, Han D, Kim J, Kim D, Ha B, Seog W, et al. An Easy-to-use machine learning model to predict the prognosis of patients with COVID-19: Retrospective Cohort Study. J Med Internet Res. 2020;22(11):e24225.

  26. Huang Y, Chen H, Zeng Y, Liu Z, Ma H, Liu J. Development and validation of a machine learning prognostic model for hepatocellular carcinoma recurrence after surgical resection. Front Oncol. 2021;10:593741.

  27. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med. 2015;13(1):1–10.

  28. Campbell KA, Trivedi HD, Chopra S. Infections in cirrhosis: a guide for the Clinician. Am J Med. 2021;134(6):727–34.

  29. Kulkarni AV, Premkumar M, Arab JP, Kumar K, Sharma M, Reddy N, et al. Early diagnosis and Prevention of Infections in cirrhosis. Semin Liver Dis. 2022;42(3):293–312.

  30. Guan X, Zhang B, Fu M, Li M, Yuan X, Zhu Y, et al. Clinical and inflammatory features based machine learning model for fatal risk prediction of hospitalized COVID-19 patients: results from a retrospective cohort study. Ann Med. 2021;53(1):257–66.

  31. Wang F, Hou H, Wang T, Luo Y, Tang G, Wu S, et al. Establishing a model for predicting the outcome of COVID-19 based on combination of laboratory tests. Travel Med Infect Dis. 2020;36:101782.

  32. Blanco N, Leekha S, Magder L, Jackson SS, Tamma PD, Lemkin D, et al. Admission laboratory values accurately predict In-hospital mortality: a Retrospective Cohort Study. J Gen Intern Med. 2020;35(3):719–23.

  33. Tu B, Zhang YN, Bi JF, Xu Z, Zhao P, Shi L, et al. Multivariate predictive model for asymptomatic spontaneous bacterial peritonitis in patients with liver cirrhosis. World J Gastroenterol. 2020;26(29):4316–26.

  34. Yang Q, Jiang XZ, Zhu YF, Lv FF. Clinical risk factors and predictive tool of bacteremia in patients with cirrhosis. J Int Med Res. 2020;48(5):300060520919220.

  35. Hu Y, Chen R, Gao H, Lin H, Wang J, Wang X, et al. Explainable machine learning model for predicting spontaneous bacterial peritonitis in cirrhotic patients with ascites. Sci Rep. 2021;11(1):21639.

  36. Huynh NC, Vo TD. Validation of a new simple scoring system to predict spontaneous bacterial peritonitis in patients with cirrhosis and ascites. BMC Gastroenterol. 2023;23(1):272.

  37. Termsinsuk P, Auesomwang C. Factors that predict recurrent spontaneous bacterial peritonitis in cirrhotic patients. Int J Clin Pract. 2020;74(3):e13457.

  38. Drolz A, Horvatits T, Roedl K, Rutter K, Staufer K, Kneidinger N, et al. Coagulation parameters and major bleeding in critically ill patients with cirrhosis. Hepatology (Baltimore MD). 2016;64(2):556–68.

  39. Titó L, Rimola A, Ginès P, Llach J, Arroyo V, Rodés J. Recurrence of spontaneous bacterial peritonitis in cirrhosis: frequency and predictive factors. Hepatology (Baltimore MD). 1988;8(1):27–31.

  40. Trebicka J. Role of albumin in the treatment of decompensated liver cirrhosis. Curr Opin Gastroenterol. 2022;38(3):200–5.

  41. Takahashi N, Nakada T-A, Walley KR, Russell JA. Significance of lactate clearance in septic shock patients with high bilirubin levels. Sci Rep. 2021;11(1):6313.

  42. Ismail MK, Daboul I, Waters B, Fleckenstein JF, Vera SR, Riely CA. Liver transplastion for hepatic sarcoidosis: long term follow-up and recurrence after liver transplantion, a single center experience. Gastroenterology. 2001;120(5):A372.

  43. Safuan SNM, Tomari MRM, Zakaria WNW. White blood cell (WBC) counting analysis in blood smear images using various color segmentation methods. Measurement. 2018;116:543–55.

  44. Honda T, Uehara T, Matsumoto G, Arai S, Sugano M. Neutrophil left shift and white blood cell count as markers of bacterial infection. Clin Chim Acta. 2016;457:46–53.

  45. Ishimine N, Honda T, Yoshizawa A, Kawasaki K, Sugano M, Kobayashi Y, et al. Combination of white blood cell count and left shift level real-timely reflects a course of bacterial infection. J Clin Lab Anal. 2013;27(5):407–11.

  46. Cheng K, He M, Shu Q, Wu M, Chen C, Xue Y. Analysis of the risk factors for nosocomial bacterial infection in patients with COVID-19 in a Tertiary Hospital. Risk Manage Healthc Policy. 2020;13:2593–9.

  47. Huang B, Yang S, Ye S. Systemic infection predictive value of procalcitonin to lactic acid ratio in diabetes ketoacidosis patients. Diabetes Metab Syndr Obes. 2022;15:2127–33.

  48. Kalra A, Wedd JP, Bambha KM, Gralla J, Golden-Mason L, Collins C, et al. Neutrophil-to-lymphocyte ratio correlates with proinflammatory neutrophils and predicts death in low model for end-stage liver disease patients with cirrhosis. Liver Transpl. 2017;23(2):155–65.

  49. Tritto G, Bechlis Z, Stadlbauer V, Davies N, Francés R, Shah N, et al. Evidence of neutrophil functional defect despite inflammation in stable cirrhosis. J Hepatol. 2011;55(3):574–81.

  50. Mookerjee RP, Stadlbauer V, Lidder S, Wright GAK, Hodges SJ, Davies NA, et al. Neutrophil dysfunction in alcoholic hepatitis superimposed on cirrhosis is reversible and predicts the outcome. Hepatology (Baltimore MD). 2007;46(3):831–40.

  51. Albillos A, Lario M, Álvarez-Mon M. Cirrhosis-associated immune dysfunction: distinctive features and clinical relevance. J Hepatol. 2014;61(6):1385–96.

  52. Kalaitzakis E. Gastrointestinal dysfunction in liver cirrhosis. World J Gastroenterol. 2014;20(40):14686–95.

  53. Viers BR, Thompson RH, Lohse CM, Cheville JC, Leibovich BC, Boorjian SA, et al. Pre-treatment neutrophil-to-lymphocyte ratio predicts tumor pathology in newly diagnosed renal tumors. World J Urol. 2016;34(12):1693–9.

  54. Cai Y-J, Dong J-J, Dong J-Z, Chen Y, Lin Z, Song M, et al. A nomogram for predicting prognostic value of inflammatory response biomarkers in decompensated cirrhotic patients without acute-on-chronic liver failure. Aliment Pharmacol Ther. 2017;45(11):1413–26.

  55. Liu H, Zhang H, Wan G, Sang Y, Chang Y, Wang X, et al. Neutrophil-lymphocyte ratio: a novel predictor for short-term prognosis in acute-on-chronic hepatitis B liver failure. J Viral Hepatitis. 2014;21(7):499–507.

  56. Zhang H, Sun Q, Mao W, Fan J, Ye B. Neutrophil-to-lymphocyte ratio predicts early mortality in patients with HBV-Related decompensated cirrhosis. Gastroenterol Res Pract. 2016;2016:4394650.

  57. National Bureau of Statistics of China. China Statistical Yearbook 2021. Beijing: China Statistical Publishing House; 2021.


We would like to thank all the participants of this project and investigators for collecting the data.


This work was supported by grants from the Natural Science Foundation of Zhejiang Province [grant number LQ21H190004].

Author information

Authors and Affiliations



JZ, JL and XW wrote the main manuscript text and JZ prepared Figs. 1, 2, 3 and 4. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Xiaoxin Wu or Zihao Guo.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Ethics Statement

The Ethics Committee of the Affiliated Banan Hospital of Chongqing Medical University approved the study (approval number: 2021-008). Because of the retrospective design, written informed consent for participation was not required; the Ethics Committee of the Affiliated Banan Hospital of Chongqing Medical University waived the requirement for informed consent. The study was conducted in accordance with national legislation and institutional requirements.

Consent for publication

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zheng, J., Li, J., Zhang, Z. et al. Clinical Data based XGBoost Algorithm for infection risk prediction of patients with decompensated cirrhosis: a 10-year (2012–2021) Multicenter Retrospective Case-control study. BMC Gastroenterol 23, 310 (2023).

  • Received:

  • Accepted:

  • Published:

  • DOI: