Establishment of prognostic nomogram for elderly colorectal cancer patients: a SEER database analysis

Background This study aimed to establish nomogram models of overall survival (OS) and cancer-specific survival (CSS) in elderly colorectal cancer (ECRC) patients (age ≥ 70). Methods The clinical variables of patients diagnosed with ECRC between 2004 and 2016 were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. Univariate and multivariate analyses were performed, followed by the construction of nomograms for OS and CSS. Results A total of 44,761 cases were included in this study. Both the C-index and the calibration plots indicated good performance of the newly established nomograms. Moreover, the nomograms performed better in decision curve analysis (DCA) and had a higher area under the curve (AUC) than the American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) stage and the SEER stage. Conclusions This study established nomograms for elderly colorectal cancer patients that showed distinct clinical value compared with the AJCC TNM and SEER stages for both OS and CSS.

Cancer (AJCC) is widely used in the therapeutic and prognostic management of colorectal cancer. Given that the prognostic value of multiple other variables, including tumor size and marital status, has been increasingly recognized [13,14], a more comprehensive prognostic predictor is necessary for ECRC.
Of note, knowledge regarding the clinical prediction of ECRC is limited, with very few studies focusing on nomogram implementation. In this study, an ECRC-targeting nomogram was established for prognostic prediction based on a large sample retrieved from the Surveillance, Epidemiology, and End Results (SEER) database, in the hope of providing further prognostic insights [15].

Recruitment of patients from SEER database
The clinical variables of patients diagnosed with ECRC between 2004 and 2016 were retrieved from the SEER database, a program established by the National Cancer Institute for comprehensive national-level clinical investigation [16,17]. The reference number was 16,595-Nov2018. The inclusion criteria were: 1) colon and rectum (site recode, International Classification of Diseases for Oncology (ICD-O-3)/WHO 2009); 2) age ≥ 70; 3) complete information on TNM stage; 4) only one primary tumor; 5) surgery performed in each case. Next, all included cases were randomly divided into training and validation sets of equal sample size. In addition, X-tile software was used to determine and visualize the best cutoff points of the age and tumor size variables in this study [18].

Fig. 1 The inclusion criteria flowchart of recruited patients in SEER database

Fig. 2 The X-tile analysis of best-cutoff points of age and tumor size variables. a X-tile plot of training sets in age; b the cutoff point was highlighted using a histogram of the entire cohort; c the distinct prognosis determined by the cutoff point was shown using a Kaplan-Meier plot (low subset = blue, middle subset = gray, high subset = magenta); d X-tile plot of training sets in tumor size; e the cutoff point was highlighted using a histogram; f Kaplan-Meier plot of prognosis determined by the cutoff point (low subset = blue, middle subset = gray, high subset = magenta)
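The random division of the cohort into training and validation sets can be illustrated with a minimal sketch. The study reports using R and does not describe the splitting procedure or a random seed, so the function name `split_cohort`, the seed value, and the use of integer case IDs are assumptions made only for illustration.

```python
import random


def split_cohort(case_ids, seed=2018):
    """Shuffle case IDs and split them into two halves of (near-)equal size.

    The seed is an arbitrary, hypothetical value; it only makes the
    illustrative split reproducible.
    """
    ids = list(case_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # When the cohort size is odd, the training set takes the extra case.
    half = (len(ids) + 1) // 2
    return ids[:half], ids[half:]


# 44,761 is the cohort size reported in the study; the IDs are placeholders.
training, validation = split_cohort(range(44761))
print(len(training), len(validation))  # 22381 22380
```

With an odd cohort size, the two sets differ by one case, which matches the reported 22,381 and 22,380.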

Clinical variables extracted for analysis
Age, sex, marital status, tumor site, histological grade, SEER stage, AJCC TNM stage, distant metastasis (bone, brain, liver and lung) and tumor size were all selected for nomogram construction. Regarding clinical outcomes, overall survival (OS) and cancer-specific survival (CSS) were chosen as the primary and secondary endpoints, respectively.

Construction and validation of the nomogram
The chi-square test was used to compare all included categorical variables between the training and validation groups. Next, univariate and multivariate analyses were used to identify significant variables, which were then used to construct the nomogram models in R software 3.3.0 (R Foundation for Statistical Computing, Vienna, Austria, www.r-project.org). The validation group was then used to assess the newly established nomograms. Agreement between the nomogram predictions and the observed outcomes was assessed by the concordance index (C-index). Calibration plots were used for visual comparison between the prognosis predicted by the nomogram and the actual outcome. Sensitivity and specificity were evaluated by the receiver operating characteristic (ROC) curve and the area under the curve (AUC). Furthermore, the performance of the nomogram models was compared with that of the TNM stage and SEER stage using both ROC analysis and decision curve analysis (DCA). All analyses were performed in R software 3.3.0, with a p value < 0.05 considered statistically significant.
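To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is not the code used in the study, which reports R software. The function name, the toy data, and the convention that a higher score means a higher predicted risk are assumptions for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times:       follow-up time of each subject
    events:      1 if the event (death) was observed, 0 if censored
    risk_scores: higher score = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, not taken from the study.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, scores))  # 0.8
```

In the toy data, five pairs are comparable and four are ranked correctly, giving 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic loop is kept for clarity and would be slow on a cohort of tens of thousands of cases.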

Characterization of included cases
Following the inclusion criteria, a total of 44,761 cases were included in this study, with 22,381 randomly assigned to the training set and 22,380 to the validation set (Fig. 1). Among all patients, 44.6% were male and 55.4% female; 47.6% were unmarried and 46.8% married; 81.9% had colon cancer and 18.1% rectal cancer; 0.3% had bone metastasis, 0.1% brain metastasis, 7.0% liver metastasis and 1.8% lung metastasis. The cutoff points of age and tumor size were determined by X-tile (Fig. 2). Specifically, 40.9% were ≤ 76 years old, 44.5% between 77 and 86 years old, and 14.7% ≥ 87 years old. 29.8%    (Table 3). Thus, OS and CSS nomogram models for 1-, 3- and 5-year survival were established, respectively (Fig. 3a, b).
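As a rough illustration of how a nomogram assigns points, the sketch below rescales regression coefficients so that the variable with the widest effect spans 0 to 100 points. This is a common convention, but the study does not state its scaling procedure, so it is an assumption here. The coefficients are hypothetical placeholders, not the fitted values from this study, and the function name `nomogram_points` is introduced only for this example.

```python
def nomogram_points(coefficients):
    """Convert per-level regression coefficients into 0-100 nomogram points.

    coefficients: {variable: {level: beta}}, with the reference level at 0.
    The variable with the widest coefficient range spans the full 100 points.
    """
    ranges = {var: max(levels.values()) - min(levels.values())
              for var, levels in coefficients.items()}
    widest = max(ranges.values())
    if widest == 0:
        raise ValueError("all coefficients are zero")
    points = {}
    for var, levels in coefficients.items():
        lowest = min(levels.values())
        # The most favourable level of each variable gets 0 points.
        points[var] = {level: round(100 * (beta - lowest) / widest, 1)
                       for level, beta in levels.items()}
    return points


# Hypothetical coefficients, NOT the values estimated in this study.
example = {
    "age": {"<=76": 0.0, "77-86": 0.4, ">=87": 0.8},
    "sex": {"male": 0.0, "female": -0.1},
    "liver_metastasis": {"no": 0.0, "yes": 1.0},
}
points = nomogram_points(example)
# Total points for a hypothetical female patient aged >= 87 with liver metastasis.
total = (points["age"][">=87"] + points["sex"]["female"]
         + points["liver_metastasis"]["yes"])
print(points)
print(total)
```

The total is then read against the survival-probability axes of the nomogram. That mapping depends on the fitted baseline survival and is not reproduced here.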

Nomogram validation
The assessment was performed both internally and externally, measured by C-index and calibration plots.  (Fig. 6). Meanwhile, the nomograms for OS and CSS also showed higher predictive power than the AJCC TNM stage and SEER stage (Figs. 7, 8, Table 5).
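The quantity compared in a decision curve is the net benefit at a given threshold probability p_t: the true-positive rate minus the false-positive rate weighted by p_t / (1 - p_t). The sketch below uses a simplified binary outcome at a fixed time point. Survival-based DCA additionally has to account for censoring, which is omitted here. The function name and the toy data are assumptions for illustration and do not come from the study.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold.

    predicted_risk: predicted probability of the event for each patient
    outcomes:       1 if the event occurred, 0 otherwise (no censoring)
    threshold:      threshold probability p_t, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcomes)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcomes)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


# Toy data, not taken from the study.
risks = [0.9, 0.8, 0.3, 0.2, 0.6]
outcomes = [1, 1, 0, 0, 0]
model_nb = net_benefit(risks, outcomes, 0.5)
# "Treat all" reference strategy: everyone is flagged as high risk.
treat_all_nb = net_benefit([1.0] * len(outcomes), outcomes, 0.5)
print(model_nb, treat_all_nb)
```

Plotting the net benefit over a range of thresholds for each model, together with the "treat all" and "treat none" (net benefit 0) strategies, gives the decision curve. A model whose curve lies higher offers more clinical benefit at that threshold.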

Discussion
To date, numerous studies have investigated the role of prognostic nomograms for colorectal cancer patients using the SEER database for various objectives [19,20]. In fact, a growing number of studies have focused on therapeutics or modified classification, with very few highlighting the role of age in the prognostic assessment of colorectal cancer. Our previous study reported that a nomogram for early-onset colorectal cancer patients achieved a comparably higher C-index and better performance than conventional variables [21]. ECRC, on the other hand, has been explored in only a limited number of studies. Li et al. reported, based on 18,937 included cases, that adjuvant chemotherapy did not offer additional survival benefits to elderly patients with stage II or III [22]. Nonetheless, a general prognostic nomogram for ECRC has yet to be fully characterized. In this study, the nomograms displayed a higher C-index and convincing calibration plots for OS and CSS prediction using the SEER database. Moreover, they achieved higher values in both AUC and DCA assessments compared with the AJCC TNM and SEER stages. Of note, for OS, 12 variables (sex, age, marital status, grade, AJCC TNM, bone metastasis, brain metastasis, liver metastasis, lung metastasis and tumor size) out of 15 were selected for the construction of the nomogram. A similar pattern was observed for the CSS nomogram. It is therefore highly possible that the prognosis of ECRC is associated with more variables than that of colorectal cancer in general. Moreover, four types of distant metastasis were, for the first time, incorporated into a nomogram of ECRC in a SEER analysis.
In addition, the X-tile tool was used to determine the best cutoff values of age and tumor size in this study. X-tile was established as a powerful graphic method to illustrate potential subsets (cutoffs) through the construction of a two-dimensional projection [18]. It has been widely used in numerous investigations, including esophageal squamous cell carcinoma, bladder cancer and chondrosarcoma [23,24,25]. In this study, for the first time, subsets of the continuous variables age and tumor size were determined by the X-tile tool. The role of tumor size has been intensively studied [26]. However, the cutoff points of tumor size in colorectal cancer remain largely arbitrary. Therefore, using X-tile to classify tumor size could be both reliable and reproducible. Generally, mortality in elderly patients naturally increases with age. However, no study has fully covered or quantified the association between age and prognostic risk, particularly in patients beyond 70 years of age. In our study, age was identified as a stronger risk factor in the OS nomogram than in the CSS nomogram, with age ≥ 87 representing nearly 90 points in OS but fewer than 60 points in CSS. Interestingly, female sex was identified as a protective factor in the OS nomogram but not in the CSS nomogram. Moreover, marriage was identified as a protective factor in both the OS and CSS nomograms. Comparison of the OS and CSS nomograms thus offers insightful clues for further external clinical investigation.

Conclusion
This study established nomograms for elderly colorectal cancer patients that showed distinct clinical value compared with the AJCC TNM and SEER stages for both OS and CSS.