Risk indices predicting graft use, graft and patient survival in solid pancreas transplantation: a systematic review

Background: Risk indices such as the pancreas donor risk index (PDRI) and pre-procurement pancreas allocation suitability score (P-PASS) are utilised in solid pancreas transplantation; however, no review has compared all derived and validated indices in this field. We systematically reviewed all risk indices in solid pancreas transplantation to compare their predictive ability for transplant outcomes.
Methods: Medline Plus, Embase and the Cochrane Library were searched for studies deriving and externally validating risk indices in solid pancreas transplantation for the outcomes of pancreas and patient survival and donor pancreas acceptance for transplantation. Results were analysed descriptively due to limited reporting of the discrimination and calibration metrics required to assess model performance.
Results: From 25 included studies, discrimination and calibration metrics were reported in only 88% and 38% of derivation studies (n = 8) and in 25% and 25% of external validation studies (n = 12), respectively. Twenty-one risk indices were derived, with mild to moderate ability to predict risk (C-statistics 0.52–0.78). Donor age, donor body mass index (BMI) and donor gender were the most common covariates within derived risk indices. Only PDRI and P-PASS were subsequently externally validated, with variable association with post-transplant outcomes. P-PASS was not associated with pancreas graft survival.
Conclusion: Most of the risk indices derived for use in solid pancreas transplantation were not externally validated (90%). PDRI and P-PASS are the only risk indices externally validated for solid pancreas transplantation and, when validated without reclassification measures, are associated with 1-year pancreas graft survival and donor pancreas acceptance respectively. Future risk indices incorporating recipient and other covariates alongside donor risk factors may have improved predictive ability for solid pancreas transplant outcomes.

transplantation within the USA [7]. However, they are not widely used, as external validation studies have reported varying associations with their intended outcomes [8-11]. Other risk indices have been derived for use in pancreas transplantation but have not been validated widely in external cohorts [12,13].
We compared the predictive ability of all current risk indices derived for use in solid pancreas transplantation via a systematic review. This would guide future work in incorporating a risk index into the Australian and New Zealand pancreas transplant protocol, as no index is currently used to guide solid pancreas transplantation locally [5,14].

Methods
This systematic review was guided by the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) [15] and used the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) checklist to assess the completeness of individual study reporting [16]. This project was exempt from requiring local ethics board approval as only previously published data (no identifiable individual data) were the subject of review. The review protocol was registered via the International Prospective Register of Systematic Reviews (PROSPERO ID CRD42018080189) [17].

Literature search
Ovid Medline, Embase and the Cochrane Database of Systematic Reviews were searched, from inception to 30 March 2020, for studies which derived or validated risk indices used in pancreas transplantation. Grey literature searching of OpenGrey, Scopus and Web of Science was performed for the same period. Search terms included the following keywords and MeSH terms: pancreas, transplant, donor, recipient, index or indices, model or models, tool or tools, pancreas after kidney (PAK), pancreas transplant alone (PTA), simultaneous kidney-pancreas transplant (SPK), P-PASS and PDRI (search protocols are given in Additional file 1: Supplement 1).

Eligibility criteria
All observational studies which derived or validated risk indices for solid pancreas transplantation were accepted for full-text review. Islet transplant studies were excluded. A risk index or model was defined as a combination of multiple predictors which calculated individual patient risk of a future outcome [18]. Studies examining a single risk factor's association with solid pancreas transplant outcomes were excluded. Likewise, case series and studies identifying factors associated with solid pancreas transplant outcomes without deriving a risk index or validating a known risk index were excluded. Studies whose aims did not include either deriving or validating a risk index but analysed PDRI or P-PASS for association with various pancreas transplant outcomes were retained for discussion but not included in the analysis. We anticipated a limited number of relevant studies and therefore included abstracts meeting inclusion criteria where no full-text article was available.

Study outcomes
Primary outcomes were pancreas graft survival, patient survival and donor pancreas acceptance for solid-organ pancreas transplantation. Pancreatic graft failure was defined as a permanent return to insulin therapy or pancreatectomy [14] and was reported as death-censored where possible based on study reporting.

Data extraction and critical appraisal
The CHARMS tool was utilized for data extraction (as data was non-randomised) [15]. Domains within CHARMS include data source, participant description, predicted outcomes, significant predictors, sample size, data handling, model development, model performance, model evaluation and results. The TRIPOD checklist was used to assess the quality of data reporting for all included studies [16]. The Prediction Model Risk of Bias Assessment Tool (PROBAST) was used to assess risk of bias and applicability of included studies [19,20].
Model performance was assessed via study reporting of discrimination and calibration metrics [21], as described in TRIPOD, PROBAST and elsewhere [16,20,22,23]. Discrimination (measured via the C-statistic or the area under the receiver operating characteristic (AUROC) curve [22,23]) is the ability to distinguish those at higher risk of the outcome of interest from those at lower risk. For instance, a C-statistic of 1.0 indicates that an index perfectly separates subjects at higher risk from those at lower risk, whereas 0.5 represents an inability to differentiate between risk outcomes (akin to flipping a coin) [23]. Calibration measures the accuracy of an index's predicted risk compared to the actual absolute risk [22,23]. It is assessed via the Hosmer-Lemeshow test, comparison of calibration plots or observed-predicted outcome ratios [22-24]. A well-calibrated model is denoted by the lack of significant differences between observed and predicted outcomes, or a Hosmer-Lemeshow p value of > 0.05 [24,25].
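To make these two definitions concrete, the following Python sketch computes a C-statistic and a Hosmer-Lemeshow statistic for a binary outcome. It is illustrative only and is not code from any included study: the function names `c_statistic` and `hosmer_lemeshow_statistic`, the use of four risk groups and all of the toy data are assumptions. Time-to-event outcomes with censoring, such as graft survival, would usually need a survival-specific variant (for example Harrell's C) rather than this binary-outcome version.

```python
from itertools import product


def c_statistic(predicted_risk, event):
    """Concordance statistic for a binary outcome (hypothetical helper).

    Proportion of (event, non-event) pairs in which the subject with the
    event received the higher predicted risk; ties count as half.
    """
    cases = [p for p, e in zip(predicted_risk, event) if e == 1]
    controls = [p for p, e in zip(predicted_risk, event) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_case, p_control in product(cases, controls):
        if p_case > p_control:
            concordant += 1.0
        elif p_case == p_control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def hosmer_lemeshow_statistic(predicted_risk, event, groups=4):
    """Hosmer-Lemeshow chi-square statistic (hypothetical helper).

    Subjects are sorted by predicted risk and split into equal-sized groups;
    observed and expected event counts are then compared within each group.
    """
    pairs = sorted(zip(predicted_risk, event))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(e for _, e in chunk)
        expected = sum(p for p, _ in chunk)
        mean_risk = expected / n_g
        variance = n_g * mean_risk * (1.0 - mean_risk)
        if variance == 0:
            raise ValueError("a group has predicted risk of exactly 0 or 1")
        statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy discrimination data (assumed): 8 of the 9 event/non-event pairs are concordant.
risk = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
failed = [1, 1, 0, 1, 0, 0]
auc = c_statistic(risk, failed)

# Toy calibration data (assumed): 10 subjects at each of four predicted risks.
calib_risk = [0.1] * 10 + [0.2] * 10 + [0.4] * 10 + [0.5] * 10
well = [1] * 1 + [0] * 9 + [1] * 2 + [0] * 8 + [1] * 4 + [0] * 6 + [1] * 5 + [0] * 5
poor = [1] * 1 + [0] * 9 + [1] * 4 + [0] * 6 + [1] * 4 + [0] * 6 + [1] * 5 + [0] * 5
hl_well = hosmer_lemeshow_statistic(calib_risk, well)
hl_poor = hosmer_lemeshow_statistic(calib_risk, poor)
```

The Hosmer-Lemeshow statistic is conventionally compared against a chi-square distribution with (number of groups − 2) degrees of freedom to obtain the p value; that step is left to a statistics package here.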
Model predictors, effect estimates (hazard ratios), missing data and the events-per-variable (EPV) rate were extracted to assess study quality [15,26]. Risk indices derived in a cohort with an EPV < 10 are at risk of overfitting (a small number of outcome events relative to the number of model predictors) [22,23]. When indices were externally validated, we noted whether reclassification of predictors took place [27]. If a single study derived more than one risk index, the indices were considered unique models if the predictors within the models were different.
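The EPV check described above is simple arithmetic, sketched here in Python. The function names and the example counts are hypothetical and are not taken from any included study; only the threshold of 10 comes from the text.

```python
def events_per_variable(n_events, n_predictors):
    """Outcome events divided by the number of model predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def at_risk_of_overfitting(n_events, n_predictors, threshold=10):
    """Flag a derivation cohort whose EPV falls below the threshold."""
    return events_per_variable(n_events, n_predictors) < threshold


# Hypothetical cohorts: 45 graft failures with 6 predictors, 120 with 8.
small_cohort_epv = events_per_variable(45, 6)    # 7.5
large_cohort_epv = events_per_variable(120, 8)   # 15.0
small_cohort_flag = at_risk_of_overfitting(45, 6)
large_cohort_flag = at_risk_of_overfitting(120, 8)
```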
Two authors (JEHL and TC) performed title, abstract and full-text reviews independently and compared results. Where there was no consensus between both authors, a third author (JK or KRP) was involved. Data extraction and risk of bias assessment were performed by two authors (JEHL and TC) independently and results were compared. Study authors were contacted for clarification and data extraction (nine study authors contacted), particularly when only abstract-level data were available, but only one response was forthcoming.

Data analysis
All risk indices derived for use in solid pancreas transplantation were grouped by the outcomes they were derived to predict. If more than two studies derived risk indices for similar outcomes, we intended to meta-analyse their metrics of model performance. Unfortunately, insufficient metrics of discrimination and calibration were reported by studies deriving and externally validating risk indices to allow pooling of these metrics in a meta-analysis.
Results are therefore presented in two analyses. Firstly, we describe all risk indices derived to predict our primary outcomes in solid pancreas transplantation. Secondly, we describe the external validation of these risk indices. For each analysis, we report risk index performance via their discrimination and calibration metrics (where present) and assess their method of derivation, association with outcome, study quality and risk of bias. We used the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) framework to summarize the current evidence for the use of externally validated risk indices by outcome, according to the domains of Risk of bias, Inconsistency, Indirectness, Imprecision and Publication bias [28-30].
Model calibration (Hosmer-Lemeshow test or observed/predicted events ratio) was reported by three studies (38%) deriving indices (Table 1). The logistic regression model by Dorsey et al. [13] had a C-statistic of 0.78 for 3-month pancreas survival, with a Hosmer-Lemeshow p value of 0.74. In comparison, the Composite Risk Model [12] had C-statistics of 0.6 to 0.52 for the same outcome depending on the number of risk factors included. The corresponding observed/predicted ratios decreased from 0.8 to 0.2 (with increasing risk factors), with a concurrent decrease in model sensitivity. Meanwhile, Kasiske et al. derived 12 models with C-statistics ranging from 0.61 to 0.78 for 1- and 3-year pancreas and graft survival (by transplant type) and reported Hosmer-Lemeshow p values ranging from 0.24 to 0.92 [33]. The PDRI C-statistic was 0.67 for 1-year pancreas survival, with no calibration reported [3]. No discrimination metric was reported for P-PASS; however, the observed incidence of declining a donor pancreas (45.3%) corresponded to the predicted risk of declining a donor pancreas (42.8%) with a P-PASS ≥ 17 [4].
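As a worked illustration of an observed/predicted ratio, the sketch below divides the observed P-PASS decline incidence by the predicted risk quoted above. It assumes the ratio is a simple quotient of the two percentages; the function name `observed_expected_ratio` is hypothetical and the derivation study may have computed its ratio differently.

```python
def observed_expected_ratio(observed, expected):
    """Observed event rate divided by the model-predicted event rate."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


# Percentages quoted in the text for P-PASS >= 17: 45.3% observed, 42.8% predicted.
p_pass_ratio = observed_expected_ratio(45.3, 42.8)
print(round(p_pass_ratio, 2))  # 1.06
```

A ratio close to 1 indicates agreement between observed and predicted risk. A ratio below 1, such as the 0.8 to 0.2 range reported for the Composite Risk Model, means the model predicted more events than were observed.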

Studies deriving risk indices: study quality and risk of bias
Overall study quality of studies deriving risk indices was moderate (Table 1, Additional file 1: Supplement 3). Four studies were in single centre cohorts, three studies used registry data, and one study utilized a multi-centre cohort. Cohorts consisted of recipients of all solid pancreas transplant types (SPK, PAK, PTA) in seven of eight studies deriving risk indices (one study was in an SPK-only cohort [32]). Study outcomes (graft survival) were defined in seven of eight studies, with centre-based reporting of graft survival by one registry-based study [3] (Additional file 1: Supplement 3). Outcomes reported were 3-month pancreas survival (n = 3), 1-year pancreas survival (n = 3), 1-year patient survival (n = 2), 3-year pancreas and patient survival (n = 1) and donor pancreas acceptance (n = 1) (Table 1). Five of eight studies had an events-per-variable (EPV) rate of > 10, lowering the risk of overfitting (Additional file 1: Supplement 3). Missing data was present in one study [11] and was handled via complete case analysis. In three studies, it was unclear whether missing data was present [32,33,35]. Four studies had no missing data [3,4,12,13] (Additional file 1: Supplement 3).
Risk indices were modeled differently between studies. The PDRI was developed from significant donor predictors identified via multivariate Cox regression and combined into a continuous risk index with the median donor having a PDRI of 1.0 [3]. In comparison, P-PASS used pre-defined predictors identified by expert opinion to derive logistic regression models for    [13]. Risk indices from other studies were derived using the regression coefficients of significant predictors from multivariate analysis [11,12,32,33,35]. Only one study (out of eight) derived risk indices by each pancreas transplant type [33]. Only four studies (50%) reported derived risk index equations containing all final predictors with coefficients [3,11,12,32] and only four studies (50%) documented internal validation procedures (Additional file 1: Supplement 3) [3,4,12,33]. The overall PROBAST rating for studies deriving risk indices was high risk of bias and low applicability (Table 3). All studies except one [13] were at high risk of bias for the 'Analysis' domain due to limited reporting of discrimination and calibration metrics. The other PROBAST domains of 'Participants', 'Predictors' and 'Outcomes' had low risk of bias in eight (100%), four (50%) and seven (88%) studies respectively. Four studies (50%) scored poorly for 'Applicability' as they included factors that could only be measured at the transplant stage (such as cold ischaemia) or post-transplant stage (such as iliac venous drainage or use of induction therapy), despite being derived for use pre-transplant [12,32,33,35].
Only two of the 12 studies (17%) externally validating PDRI reported discrimination metrics. Blok et al. reported a C-statistic of 0.69 for the association between PDRI and pancreas survival up to 10 years when a cut-off of 1.24 was used [8]. This was similar to the C-statistic reported in the PDRI derivation study. Smigielska et al. reported an AUROC of 0.52 for PDRI as a continuous model in predicting 1-year pancreas survival but found that PDRI was not associated with the outcome [11]. Only one of 12 studies (8%) validating PDRI reported observed/expected ratios for calibration (ranging from 0.76 to 1.12 by quintile and transplant type) [41]; however, the study deriving PDRI [3] did not report calibration metrics, hence no comparison was possible.
Of the four studies utilizing PDRI by quintiles (as per its derivation), two reported an association between PDRI and pancreas survival (only in SPK transplants in both studies) [41,42]. Of the three studies utilizing PDRI as a continuous model, one study reported an association between PDRI and 1-year pancreas survival [42]. Of the five studies validating PDRI via risk groups different from those used in its derivation, only one reported an association with pancreas survival [8], suggesting that reclassification of PDRI during external validation may have affected the outcome.
For P-PASS, one of three (33%) external validation studies reported both discrimination and calibration metrics. Kopp et al. reported a C-statistic of 0.68 for P-PASS and the outcome of donor pancreas acceptance [40]. However, observed/expected ratios of 0.69 to 0.72 were reported by two studies (67%) [34,40], compared to an observed/expected ratio of 1.06 in the P-PASS derivation study [4]. All three studies validating P-PASS for donor pancreas acceptance reported an association with the outcome [34,40,45].

Studies externally validating risk indices: study quality and risk of bias
Overall study quality of the external validation studies was poor (Table 4). Of the 12 external validation studies, eight studies utilized single-centre cohorts, two studies were registry-based, and one study utilized a multi-centre cohort. Missing data was present in eight studies (62%), varying from 1.4 to 73% of the cohort [8,9,11,36,37,39,40,42]. This was handled by complete case analysis in all eight studies. Three studies had no missing data [34,38,41] and one study did not report missing data [45]. Graft failure was not defined in three studies (23%) [36,39,45]. Model predictors were reclassified in several studies. In five studies, PDRI was categorised differently from how it was originally derived [8,36-39]. In these studies, PDRI was classified either as 'high' or 'low' according to the median PDRI within the cohort, or in tertiles. Furthermore, donor race was classified differently from the PDRI derivation in one study due to differences in that country [36] and was not clearly classified in two studies [11,38] (not reported with other study variables). P-PASS was also validated while omitting serum sodium in one study due to lack of reporting in that jurisdiction [45]. Due to the limited reporting of discrimination and calibration metrics, risk of bias for the 'Analysis' domain in PROBAST was high in all but one study [40] (Table 5). However, the 'Participants' and 'Predictors' domains recorded low risk of bias for all 12 studies, while 'Outcomes' had low risk of bias in nine studies (75%). 'Applicability' was rated low for two studies (17%) [36,45] due to predictors being modified as previously described, and unclear for four studies [11,38,39,41] due to lack of information on outcome definition and predictor collection.
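How a continuous index is cut into risk groups changes which donors are labelled as higher risk, which is one reason why reclassification matters. The Python sketch below illustrates this with a median split, tertiles and quintiles. The PDRI values are invented for illustration, the function name `assign_risk_groups` is hypothetical, and the cut points use the default 'exclusive' method of `statistics.quantiles`; the included studies may have used other quantile conventions or fixed cut-offs.

```python
import statistics
from bisect import bisect_left


def assign_risk_groups(scores, n_groups):
    """Label each score 1..n_groups using cohort-specific quantile cut points."""
    cuts = statistics.quantiles(scores, n=n_groups)
    # A score equal to a cut point is placed in the lower group.
    return [bisect_left(cuts, s) + 1 for s in scores]


# Hypothetical PDRI values for a cohort of ten donors (not real data).
pdri = [0.6, 0.8, 0.9, 1.0, 1.1, 1.3, 1.5, 1.8, 2.1, 2.5]

median_split = assign_risk_groups(pdri, 2)  # 1 = 'low', 2 = 'high'
tertiles = assign_risk_groups(pdri, 3)
quintiles = assign_risk_groups(pdri, 5)

# The donor with PDRI 1.3 is 'high' by median split but falls in the
# middle tertile and the middle quintile.
donor = pdri.index(1.3)
donor_labels = (median_split[donor], tertiles[donor], quintiles[donor])
```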

GRADE assessment
All derivation and external validation studies were included in the GRADE assessment of the overall quality of evidence by outcomes. Baseline evidence quality was downgraded to 'Moderate' as all studies were retrospective and non-randomised [30]. This was further downgraded to 'Low' due to the high risk of bias as per PROBAST. For studies examining PDRI as a continuous score as well as via various risk categories, GRADE was downgraded for the 'Inconsistency' domain as varying degrees of association with the outcome were present. For studies validating PDRI using PDRI risk groups different from those of its derivation, GRADE was downgraded for the 'Indirectness' domain. Some outcomes included only one or two studies or were assessed in small cohorts with low event rates, thus GRADE was downgraded for 'Imprecision' (Table 6).
In summary, 'Low' quality evidence exists for PDRI (as quintiles per derivation) in predicting risk of 1-year pancreas survival and for P-PASS in predicting donor pancreas acceptance (Table 6). Even less evidence exists for utilizing PDRI with risk strata different from those of its derivation, and for PDRI in predicting 1-year patient survival.

Discussion
This systematic review of risk indices derived for use in solid pancreas transplantation found that despite 21 risk indices being derived, only P-PASS and PDRI were externally validated and are in use today [6,7]. PDRI (derived in USA) was validated in a UK cohort [41,42] while P-PASS (derived in the Netherlands) was validated in Spanish [34] and Australian [45] cohorts, albeit in a modified form in the latter.
PDRI discrimination for 1-year pancreas survival was poor to moderate (C-statistic/AUROC 0.52-0.69) and P-PASS discrimination was moderate for donor pancreas acceptance (C-statistic 0.68). Calibration was poorly reported in both PDRI and P-PASS derivation and external validation studies. In P-PASS external validation studies, calibration (in the form of O/E ratios) was lower than in its derivation, suggesting a degree of overestimation. A contributing factor to this from one study was that donor pancreas acceptance in that particular cohort was determined by more liberal donor pancreas acceptance cutoffs compared to the P-PASS derivation study [34].

Table 3 PROBAST assessment for studies deriving risk indices. PROBAST, Prediction model Risk of Bias ASsessment Tool; ROB, risk of bias. + indicates low ROB/low concern regarding applicability; − indicates high ROB/high concern regarding applicability; ? indicates unclear ROB/unclear concern regarding applicability. Columns: Study; ROB (Participants, Predictors, Outcome, Analysis); Applicability (Participants, Predictors, Outcome); Overall (Risk of bias, Applicability).

Within the studies included in our review, discrimination and calibration metrics were only reported in 88% and 38% of risk index derivation studies, 17% and 8% of studies externally validating PDRI and 33% and 67% of studies externally validating P-PASS respectively. This limits the applicability of such indices in cohorts external to their derivation. Limited reporting of these metrics in prediction modelling was previously described in a systematic review of clinical prediction studies in 2008, where discrimination and calibration were only reported in 27% and 12% of studies respectively [48]. This led to the introduction of TRIPOD to ensure completeness of data reporting in prediction studies [16]. A review of studies published before TRIPOD's inception found that incomplete information to guide use of prediction models was present in > 80% of derived models [49], an issue also present in our review (Additional file 1: Supplement 2).
Our review also identified 13 studies analysing the association of PDRI (n = 2) and P-PASS (n = 11) with outcomes they were not derived to predict (Additional file 1: Supplement 4). P-PASS was analysed for an association with pancreas survival in 11 studies [8, 10, 11, 37-39, 43, 44, 46, 47, 50] (no significant association in eight studies). PDRI was associated with graft survival at 3 months in one study [12], and associated with donor pancreas acceptance in another study [40]. Therefore, P-PASS in particular should not be utilised to predict pancreas survival outcomes. Our review also demonstrates that using PDRI with risk categories different from those of its derivation, or in different cohorts without proper reclassification measures, reduces its predictive ability for pancreas survival.
Elsewhere, PDRI and P-PASS have been analysed along with other independent variables for associations with post-transplant outcomes. PDRI was associated with 1-year pancreas and patient survival in a study examining the correlation of immunological matching with graft rejection and survival in pancreas transplantation [51]. P-PASS however was not associated with pancreas survival in a study analysing donor and recipient factors predicting graft survival post-pancreas transplantation [52], again correlating with our finding of P-PASS being poorly associated with pancreas graft survival as it was not derived to predict this.
A study limitation was the inclusion of abstracts, as we anticipated a limited number of studies meeting our inclusion criteria. To counter this, we contacted study authors to obtain supporting information for any abstracts meeting our inclusion criteria but received few responses. Also, the use of the C-statistic/AUROC as a means of assessing discrimination has been questioned elsewhere, as these metrics do not account for study heterogeneity and the predicted probabilities of individual variables upon the outcome [53]. Notwithstanding this, the TRIPOD assessment of discrimination includes the C-statistic/AUROC, hence we have considered it an acceptable metric for this purpose. Finally, as risk indices are used in combination with clinical judgement to make transplantation decisions, other factors may confound the results. Studies externally validating risk indices should compare their baseline cohort characteristics to those of the risk index derivation study, as well as other factors such as the immunosuppression regimen used, in order to stratify for key differences. Unfortunately, comparisons of baseline study characteristics were not made in 75% of external validation studies and immunosuppression regimens were not detailed in 67% of external validation studies (Table 4).

Table 5 PROBAST risk of bias assessment for studies externally validating PDRI/P-PASS. PROBAST, Prediction model Risk of Bias ASsessment Tool; ROB, risk of bias. + indicates low ROB/low concern regarding applicability; − indicates high ROB/high concern regarding applicability; ? indicates unclear ROB/unclear concern regarding applicability. Columns: Study; ROB (Participants, Predictors, Outcome, Analysis); Applicability (Participants, Predictors, Outcome); Overall (Risk of bias, Applicability).

Beyond the limited number of quality external validation studies, P-PASS and PDRI have other factors limiting their use in other cohorts. While able to predict which donor pancreata should be accepted, P-PASS was not derived to predict post-transplant outcomes [4], limiting its ability to meaningfully guide pancreas transplantation decisions. While PDRI is associated with pancreas graft survival, its predictive ability is best at differentiating risk between extreme PDRI values [3]; PDRI values close to the median are therefore harder to interpret. Furthermore, in some cohorts, PDRI has only been able to predict graft survival for SPK transplantation (as opposed to PAK or PTA transplants) [42].
Current solid pancreas transplant protocols identify suitable pancreas donors without established high-risk factors (i.e. cause of death from trauma, age below 40-50 years old, BMI under 30 kg/m² and cold ischaemic time (CIT) below 12 h), while donors beyond such criteria are either not accepted or allocated to islet cell transplantation [54-56]. However, such an approach may lead to an under-utilisation of donor pancreata which do not meet all the above criteria. Validated indices taking into account donor factors at time of donor offer and estimating graft or patient survival could aid in decisions for transplantation, particularly for donors who have borderline criteria by current standards. Also, incorporating recipient and other risk factors (also present at time of donor offer) [33,35] in future risk indices may further improve their predictive ability for post-transplant outcomes. Similar risk indices (with discriminatory metrics similar to those of PDRI) incorporating both donor and recipient covariates are currently being used to guide kidney transplantation decisions both locally and abroad [57,58].
Currently in Australia and New Zealand, donor race is coded differently to that of the PDRI and donor serum sodium is not routinely collected by the Australia and New Zealand Pancreas Transplant Registry. Therefore, to validate PDRI or P-PASS for use locally would require reclassification measures. Furthermore, CIT as a covariate in PDRI is not usually available at time of donor offer. Axelrod et al. acknowledge this and suggest setting the CIT to 12 h (the reference value) in such cases [3]. A similar approach would be taken with other variables such as donor ethnicity. An alternative approach is to retrospectively review local data present at time of organ offer to identify significant donor and recipient covariates associated with pancreas transplant outcomes to derive and validate a risk index which could guide local donor pancreas acceptance decisions.

Conclusions
The current data quality of studies deriving and externally validating risk indices for use in solid pancreas transplantation is inadequate. External validation was not performed for 90% of derived risk indices for solid pancreas transplantation. PDRI and P-PASS are the only risk indices currently externally validated for use in solid pancreas transplantation. PDRI was derived and validated for the outcome of 1-year pancreas survival, while P-PASS was derived and validated for donor pancreas acceptance for transplantation. Due to inadequate reporting of model performance metrics, there is currently low-quality evidence to support their use outside current externally validated cohorts, or with cut-offs different from those of their derivation. To validate either risk index for use in Australia/New Zealand would require reclassification measures due to differences in covariate coding. However, incorporating recipient and other factors which are associated with post-transplant outcomes alongside current donor covariates, such as those within PDRI, may increase the predictive ability of future risk indices to guide solid pancreas transplantation decisions.