Validation of a new prognostic model to predict short and medium-term survival in patients with liver cirrhosis

Background MELD score and MELD score derivates are used to objectify and grade the risk of liver-related death in patients with liver cirrhosis. We recently proposed a new predictive model that combines serum creatinine levels and maximum liver function capacity (LiMAx®), namely the CreLiMAx risk score. In this validation study we have aimed to reproduce its diagnostic accuracy in patients with end-stage liver disease. Methods Liver function of 113 patients with liver cirrhosis was prospectively investigated. Primary end-point of the study was liver-related death within 12 months of follow-up. Results Alcoholic liver disease was the main cause of liver disease (n = 51; 45%). Within 12 months of follow-up 11 patients (9.7%) underwent liver transplantation and 17 (15.1%) died (13 deaths were related to liver disease, two not). Measures of diagnostic accuracy were comparable for MELD, MELD-Na and the CreLiMAx risk score as to power in predicting short and medium-term mortality risk in the overall cohort: AUROCS for liver related risk of death were for MELD [6 months 0.89 (95% CI 0.80–0.98) p < 0.001; 12 months 0.89 (95% CI 0.81–0.96) p < 0.001]; MELD-Na [6 months 0.93 (95% CI 0.85–1.00) p < 0.001 and 12 months 0.89 (95% CI 0.80–0.98) p < 0.001]; CPS 6 months 0.91 (95% CI 0.85–0.97) p < 0.01 and 12 months 0.88 (95% CI 0.80–0.96) p < 0.001] and CreLiMAx score [6 months 0.80 (95% CI 0.67–0.96) p < 0.01 and 12 months 0.79 (95% CI 0.64–0.94) p = 0.001]. In a subgroup analysis of patients with Child-Pugh Class B cirrhosis, the CreLiMAx risk score remained the only parameter significantly differing in non-survivors and survivors. Furthermore, in these patients the proposed score had a good predictive performance. Conclusion The CreLiMAx risk score appears to be a competitive and valid tool for estimating not only short- but also medium-term survival of patients with end-stage liver disease. Particularly in patients with Child-Pugh Class B cirrhosis the new score showed a good ability to identify patients not at risk of death.


Background
Mathematical scores are frequently used tools of clinical routine frequently used in the assessment of disease severity and in the determination of patient prognosis [1][2][3][4]. Historically the Child-Pugh score (CPS) was implemented as a clinical score to evaluate prognosis in liver cirrhosis [1]. Initially CPS was used as the basis for prioritizazion of organs for liver transplantation. Since 2002, the model for end-stage liver disease (MELD) has become the reference mathematical model for many liver transplant societies to facilitate and objectivize the allocation of donor organs to liver transplant candidates. Nevertheless, there is an ongoing discussion on the prognostic value of both scores [5]. In particular, the known disparities of MELD based organ allocation systems favorizing a "sickest first" policy, suggest that room remains for improvement [6].
In patients with liver cirrhosis the predictive power of quantitative liver function tests has been reported with results comparable to established biological indices in several studies [7,8]. Here, 13 C-liver function breath tests were identified as particularly accurate [9,10]. In our past studies we extensively evaluated the 13 C-methacetin LiMAx® methodology and its prognostic value in different clinical situations [11][12][13][14].
The results of our prior studies indicated that enzymatic liver function and serum creatinine are independent factors for short-term survival of patients with liver cirrhosis. In a cohort study we prospectively investigated the ability of the LiMAx®-test to predict 3-month mortality in patients with liver cirrhosis [15]. Here, the LiMAx® score and serum creatinine were independent predictors for liver-failure related death in a cox proportional-hazards model. Furthermore, the logistic regression analysis identified the LiMAx® score and serum creatinine as independent factors of mortality. On the basis of these results we have proposed a new score, combining the predictive power of both, the LiMAx® test and serum creatinine, namely the CreLiMAx risk score. The benefits of the new CreLiMAx risk score were its high diagnostic accuracy and its potential to refine the assessment of end-stage liver disease (ESLD), beyond information provided by the MELD score.
In this validation study we aimed to reproduce the Cre-LIMAx diagnostic accuracy in patients with end-stage liver disease in liver related death within 12 months of follow-up.

Study design
The present analysis is based on data derived from a prospective observational study. Patients with ESLD were recruited via gastroenterological and surgical outpatient and inpatient departments between May 2012 and July 2014. The study was approved by the Charité Institutional   Ethics Committee and registered at the German Clinical  Trials  Register  prior  study  start  (DRKS-ID:  DRKS00005308).

Study concept
At the time of study enrolment clinical examinations and routine laboratory tests were performed in all patients in order to calculate predictive models. In addition, patients underwent two additional tests on the day of enrolment (range 0-7 days) to assess the degree of liver fibrosis: an enzymatic liver function test [maximum liver function capacity (LiMAx®) test], and transient elastography (Fibro Scan®).
Study participants were followed up for 1 year from enrolment or until a study endpoint (death or liver transplantation) was reached. Causes of death were assigned by study personal to three categories: liverrelated (n = 13), not liver-related (n = 2) unclear (n = 2).

Study population
Survival data of 113 patients with ESLD were assessed. Inclusion criteria were a biopsy proven cirrhosis or clinical signs of cirrhosis [presence of at least two of the following complications: history of encephalopathy, liver associated ascites, endoscopically proven oesophageal varices, cirrhosis suggested in imaging (irregular shape of the liver, spleenomegaly)] or biochemical signs of cirrhosis [platelet count < 120/nL in the absence of a hematologic disorder, INR greater than 1.5 and albumin levels lower than 30 g/L]. Exclusion criteria for the analysis comprised previous liver surgeries including liver transplantation (LTx), the necessity for liver support therapy, parenteral nutrition, recent and past drug consumption, suspected or diagnosed hepatocellular carcinoma, acute on chronic liver failure and prior insertion of a transjugular intrahepatic portosystemic shunt. Written informed consent was obtained from all enrolled patients.
Finally, in order to address the accuracy of diagnostic tests in patients with Child-Pugh B, we performed a subgroup analysis in patients with Child-Pugh class B cirrhosis at the time of study enrolment (n = 41).

MELD [2]
and MELD-Na [3] were calculated according to previously described formulas. The recently proposed CreLiMAx risk score was calculated according to the authors' previously published work [15]. The Child-Pugh score (CPS) was computed according to a published formula [1]. Encephalopathy was classified in 4 grades according to the West-Haven Criteria [16], presence and severity of ascites was being graded according to published guidelines [17].
The individual enzymatic liver function of the patient was assessed with the LiMAx® test. The test methodology is grounded on a 2 mg/kg bodyweight-adjusted intravenous 13 C-labeled methacetin administration and subsequent liver-specific metabolisation of the substrate via the CYP 1A2 isoenzyme complex. The emerging 13 CO 2 / 12 CO 2 ratios are continuously measured by means of online breathsampling using a special isotope-selective infrared spectroscope (FLIP, Humedics GmbH, Berlin, Germany). Values > 315 μg/kg/h are considered normal.

Statistical analysis
Data were analyzed using the IBM SPSS Statistics  To account for missing values, an expectation maximization technique was used to impute missing observations. Continuous variables are shown as median and interquartile range (IQR; 25th-75th percentile). Categorical variables are shown as frequencies and percentage. The chi-squared test was applied for categorical variables and the Mann-Whitney U-test to compare quantitative variables. Correlations were calculated using the Spearman correlation coefficient. In order to assess the results of selected parameters with the best sensitivity and specificity to discriminate between patients who survived and those who either underwent LTx or died due to liver failure during the 1year follow-up, we used the receiver operating characteristic (ROC) curve analyses and calculated the area under the ROC curve (AUROC). Optimal cut-offs were determined by the highest Youden index (corresponding to the sum of sensitivity and specificity minus 1). Patients with unclear and non-liver-related death were censored for the 12-month analysis (4 patients died after the 6month follow-up). Differences between paired ROC curve area comparisons was performed using the DeLong test [18] In order to assess the multiple testing error we performed ANOVA with Bonferroni correction in our subgroup analysis. Survival curves were estimated using the Kaplan-Meier method for with patients censored at the time of transplantation or non liver-related death. Positive diagnostic likelihood ratios (LR) were used to evaluate the survival probability of patients at risk; values greater than 1 increasing the probability of disease [19]. A p-value of less than 0.05 for two sided tests was considered statistically significant.

Study population
Clinical and epidemiological characteristics of the overall cohort are depicted in Table 1. Alcoholic liver disease was the main cause of liver disease (n = 51; 45%). Within 12 months of follow-up 11 patients (9.7%) underwent LTx and 17 (15.1%) died. Among these 13 deaths were liver-related, while two were not liver-related (spontaneous intracranial bleeding and subarachnoidal bleeding after downfall). In two patients the cause of death remained unclear (Fig. 1). The time between study enrolment and death was 142 (45-235) days and 91 (57-237) days between study enrolment and LTx.

Diagnostic accuracy
The LiMAx® test showed a significant negative correlation with the MELD score (r s = − 0.65; p < 0.001) and CPS (r s = − 0.62; p < 0.001). Also, the newly proposed prognostic CreLiMAx risk score correlated significantly with MELD (r s = − 0.72; p < 0.001) and CPS (r s = − 0.55; p < 0.001, respectively). At the time of study enrolment, mathematical models as well as the LiMAx® test were significantly different between patients who survived and those who died due to liver-failure related complications within 12 months (data not shown).
We determined the best discriminative cut-off values to predict the probability of liver-related death in our cohort. This confirmatory analysis showed similar results for the estimated 3-month survival when compared to data deriving from a different cohort of ESLD-patients recently published by the authors.
The current results show that the CreLiMAx risk score addresses short-term and medium-term mortality risk with comparable diagnostic accuracy. Positive diagnostic likelihood ratios (LR) (values greater than 1 increase the probability of disease) were best for the CreLiMAx risk score, also with the best diagnostic accuracy for the evaluation of the survival probability of patients at risk.

Subgroup analysis in patients with child-Pugh class B cirrhosis
Because of the known clinical heterogeneity of patients with Child-Pugh Class B cirrhosis we examined the prognostic ability of MELD, MELD-Na and the CreLiMAx risk score in this subgroup separately. Table 3 summarizes the clinical baseline data of these 42 patients. The CreLiMAx risk score remained the only parameter that significantly differed in non-survivors and survivors within this subgroup. Figure 3 shows Boxplot diagrams for 3, 6 and 12 months, respectively.
In this subgroup analysis measures of diagnostic accuracy were strong for the CreLiMAx score. Positive and negative LR were best for the CreLiMAx score, again with best diagnostic accuracy for the evaluation of the survival probability of patients at risk. Table 4

Discussion
Results emerging from the present study confirm the Cre-LiMAx risk score as a valid estimate of 3-month survival for patients with ESLD. Moreover, the new score was confirmed as an eligible predictor of 6-and 12-month survival, with good discriminative ability to identify transplant candidates at risk of dying within this period. Initially, the LiMAx® test has been reported as a reliable method to evaluate actual liver function and to predict postoperative outcome in the perioperative work-up of patients with liver pathologies [11,20]. Furthermore, the authors were able to demonstrate its prognostic accuracy in patients with acute liver failure [12]. In addition, we were able to demonstrate that the LiMAx® test is an appropriate marker for 6-month survival prognosis in liver transplant candidates [21]. Supported by these finding we recently proposed the CreLiMAx risk score for the evaluation of short-term mortality risk in patients with chronic liver disease [15].
In our current analysis we were able to confirm the ability of the new score to estimate short term survival in an independent prospective cohort study. Compared with our initial publication, this confirmatory study comprised a similar set of patients and comparable numbers of liverrelated deaths and liver transplantations performed.
Additionally, in this study we were able to demonstrate the potential of the CreLiMAx risk score in predicting medium-term mortality. Although, measures of diagnostic accuracy for all clinically relevant models were comparable, positive likelihood ratio and diagnostic accuracy were slightly higher for the CreLiMAx risk score. Likelihood ratio is commonly seen as being less likely to change with the prevalence of the disorder and thus has an advantage over sensitivity and specificity [22]. Hence, the positive likelihood ration of 3.25 means a CreLiMAx risk score < 0.9 is 3 times as likely to be seen in a patient who survives, than in a patient who dies within 3 months. In contrast, MELD focuses on identifying the sickest patients with the highest risk of death. Therefore, both tools might be used in a synergetic manner to estimate outcome in such patients.
A recent study aimed to demonstrate the benefit of the incorporation of liver function tests into existing survival scores to improve diagnostic accuracy [23]. However, linkage to preexisting scores was criticized as being one of the major drawbacks. Several other studies have already shown that liver function tests per se are capable of estimating survival. To our knowledge, no study has yet been able to convincingly demonstrate prognostic superiority to established scores. In 2013, Giannini et al. showed the ability of a breath test to distinguish between different degrees of liver function impairment and clinical and histological findings in cirrhotic patients [24]. A large prospective study reported a similar -but not superior -diagnostic power of liver tests when compared with established tests Fig. 2 Kaplan-Meier estimates according to CreLiMAx risk score cut-off values. (patients were censored at time of transplantation or not liverrelated death). The Breslow-Wilcoxon test was used to determine differences between survival curves: p = 0.223 (CreLiMAx risk score cut-off 0.9) for 3 months, p = 0.003 (CreLiMAx risk score cut-off 0.9) for 6 months and p = 0.002 (CreLiMAx risk score cut-off 0.9) for 12 months, respectively to estimate disease related survival [25]. Stravitz et al. recently suggested that a liver specific breath test could better classify risk of cirrhotic complications and mortality than certain MELD thresholds [10]. Although the study cohort mainly contained patients with mild liver impairment, CPS was not considered as a method of reference. Over decades CPS has been widely used as the reference for assessing outcome in cirrhotic patients [1,26]. However, within early stages of compensated liver disease, scoring systems -such as CPS and MELD -provide only limited prognostic value for short-term survival [27]. Furthermore, a recent systematic review examined the differences in survival predictions of CPS versus MELD in regard to different clinical settings [5]. Especially within  the strata of early CP classes, algorithms based on knowledge of true liver function may provide additional information for a refined patient treatment. The LiMAx® test has already been proven to reflect true residual organ function otherwise not detected by routinely used methods [14]. This appears to be of great importance for the estimation of disease severity and the outcome prediction of patients with chronic liver disease. In particular, Child-Pugh Class B comprises patients with a high degree of heterogeneity with respect to residual liver function. In this stratum, patients with lower CPS but severely decreased liver specific test results might have a poor prognosis comparable to patients with higher CPS, despite the absence of clinical complications. In our study, the CreLiMAx risk score showed a good predictive performance in the Child-Pugh Class B. It was the only test that significantly predicted the discrimination between patient survival and death within the strata of the CPS. In particular, the CreLi-MAx risk score was characterized by an excellent negative predictive value and a very good positive as well as negative likehood ratio. This study is of course not the first evaluating a new score to stratify patients with cirrhosis according to their prognosis. However, to our best knowledge this is the first confirmatory study that evaluates the performance of an enzymatic-based liver function model for disease severity.
Certainly, our study has limitations. Although the number of primary endpoints (number of transplantations / number of patient deaths) were comparable in both the initial and confirmatory studies, cohorts differed in some clinical characteristics. The small difference seen in etiology, baseline demography and disease severity reflects the variety in epidemiology of patients with chronic liver disease. Secondly, diagnostic accuracy of disease severity scores differed only marginally. Our analysis strongly focuses on Child-Pugh Class B patients. We must admit that by performing this subgroup analysis the MELD might underperform in estimating patient survival when compared with the CreLiMAx risk score. Since patients' values for serum bilirubin and INR are parameters of the MELD and CPS equation, mathematical bias may be present.
In any case, the CreLiMAx risk score showed a good ability to identify patients with low risk of death. Therefore, the CreLiMAx risk score seems to be a valid tool for providing additional information for a holistic clinical decision-finding in patients facing an allocation system based on a sickest-first policy.

Conclusion
In conclusion, the study confirmed the CreLiMAx risk score to be a competitive and valid tool for estimating not only short-, but also medium-term survival in patients with end-stage liver disease. In particular in patients with Child Pugh Class B cirrhosis the new score best identified patients who were not at risk of death and might represent a complementary tool to the MELD for the assessment of the outcome of patients with chronic liver disease.