Prognostic value of Glypican family genes in early-stage pancreatic ductal adenocarcinoma after pancreaticoduodenectomy and possible mechanisms

Background This study explored the prognostic significance of Glypican (GPC) family genes in patients with pancreatic ductal adenocarcinoma (PDAC) after pancreaticoduodenectomy using data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). Methods A total of 112 PDAC patients from TCGA and 48 patients from GEO were included in the analysis. The relationship between overall survival and the expression of GPC family genes as well as basic clinical characteristics was analyzed using the Kaplan-Meier method with the log-rank test. Joint effects survival analysis was performed to further examine the relationship between GPC genes and prognosis. A prognosis nomogram was established based on clinical characteristics and prognosis-related genes. Prognosis-related genes were investigated by genome-wide co-expression analysis and gene set enrichment analysis (GSEA) was carried out to identify potential mechanisms of these genes affecting prognosis. Results In TCGA database, high expression of GPC2, GPC3, and GPC5 was significantly associated with favorable survival (log-rank P = 0.031, 0.021, and 0.028, respectively; adjusted P value = 0.005, 0.022, and 0.020, respectively), and joint effects analysis of these genes was effective for prognosis prediction. The prognosis nomogram was applied to predict the survival probability using the total scores calculated. Genome-wide co-expression and GSEA analysis suggested that the GPC2 may affect prognosis through sequence-specific DNA binding, protein transport, cell differentiation and oncogenic signatures (KRAS, RAF, STK33, and VEGFA). GPC3 may be related to cell adhesion, angiogenesis, inflammatory response, signaling pathways like Ras, Rap1, PI3K-Akt, chemokine, GPCR, and signatures like cyclin D1, p53, PTEN. GPC5 may be involved in transcription factor complex, TFRC1, oncogenic signatures (HOXA9 and BMI1), gene methylation, phospholipid metabolic process, glycerophospholipid metabolism, cell cycle, and EGFR pathway. Conclusion GPC2, GPC3, and GPC5 expression may serve as prognostic indicators in PDAC, and combination of these genes showed a higher efficiency for prognosis prediction.


Background
Pancreatic cancer (PC) is related to an unfavorable prognosis, and its mortality rate is close to its incidence rate [1]. The incidence of PC is predicted to rise 40% in the next 10 years in North America and Europe [2], and according to the latest statistics, PC ranks fourth among cancers directly causing death for men and women in the United States [3], moreover, by 2030, its rank may increase to second [4]. In China, the prognostic status of PC patients is also severe, and 5-year survival rate of patients with PC after age standardization is approximately 11.7% [5]. Due to the unique biological behaviors of PC, metastasis is present when patients are diagnosed and only 9.7% patients can be diagnosed at an early stage [6]. Furthermore, the 5-year survival rate is 9% for PC at all stages and 3% at advanced stages [3]. So far, surgical resection remains the best therapy for PC at the early stage [7]. Therefore, identifying reliable early molecular markers to improve prognosis of PC is important. Glypican (GPC) family genes include six members (GPC1, GPC2, GPC3, GPC4, GPC5, GPC6), and all of the GPC family are expressed in human [8]. Glypicans are attached to the cell membrane and function in biological processes such as cell and tissue growth, embryo development, and cell movement [9,10]. They are reported to be related to multiple diseases including various cancers. GPC1 is upregulated in pancreatic cancer [11], esophageal cancer [12], and prostate cancer [13]. Li et al. report that GPC1 contributes to the proliferation and motility of esophageal cancer cells through the PTEN/Akt/βcatenin pathway [14]. Increased level of GPC3 in serum could serve as a marker for hepatoblastoma [15] as well as hepatocellular carcinoma (HCC) [16,17]. GPC3 deletion mutation can help in diagnosis of Simpson-Golabi-Behmel syndrome type 1 (SGBS1), which is a serious genetic disease [18,19]. Overexpression of GPC5 may accelerate tumor progression of lymphoma [20]. In addition, GPC5 may play a role in strengthening the interaction between Patched 1 and Hedgehog signaling in rhabdomyosarcoma [21]. GPC5 may serve as a key gene affecting the cell cycle of podocytes in kidneys, finally causing nephrotic syndrome [22].
Pancreatic ductal adenocarcinoma (PDAC) accounts for more than 80% of pancreatic neoplasms [1,23]. However, there are few studies on the prognostic value of GPC family genes in early-stage PDAC after pancreaticoduodenectomy despite the poor prognosis of this tumor type.
In this study, we explored the relationship between GPC family genes expression and prognosis of PDAC patients.

Patient data
The RNA-sequencing dataset used in this study and the corresponding clinical data were acquired from The Cancer Genome Atlas (TCGA) (https ://porta l.gdc.cance r.gov/; accessed September 25, 2019), and DESeq was applied to normalize the initial material [24]. To increase reliability of data analysis, previously established inclusion and exclusion criteria were used [25]. The inclusion criteria were as follows: (i) survival information was complete; (ii) histology result was confirmed as PDAC; (iii) pathologic stage was I or II; (iv) pancreaticoduodenectomy was carried out on patients. PDAC patients with pathologic stage III or IV and those who underwent other surgical procedures were excluded from the study. According to the above criteria, 112 patients were included in the analysis. The clinical characteristics included in the analysis were age, sex, alcohol history, pathologic stage, histologic grade, radical resection, radiation therapy, targeted molecular therapy, overall survival (OS) time, and survival status. Dataset GSE62452 was downloaded from Gene Expression Omnibus (GEO) database to validate the prognostic value of survivalrelated genes (https ://www.ncbi.nlm.nih.gov/geo/query / acc.cgi?acc=GSE62 452; accessed October 5, 2020). Following the same criteria described above, we included 48 cases in this study.

Analysis using public database
The expression status of GPC family genes in different normal tissues was analyzed by the Genotype-Tissue Expression (GTEx, https ://www.gtexp ortal .org/, accessed October 9, 2019) website [26,27]. The Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia .cance r-pku.cn/, accessed October 9, 2019), an online tool containing 9736 tumors and 8587 normal samples from the TCGA and the GTEx projects, was used to show expression level of each gene in both tumor and normal tissues of PDAC [28]. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.8 (https :// david .ncifc rf.gov/, accessed November 6, 2019) [29,30] was chosen to carry out gene enrichment analysis containing Gene Ontology (GO) function analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The possible functioning pathways of the genes were also investigated by Biological Network Gene Ontology (BiNGO) in Cytoscape (version 3.7.1) [31].

Survival analysis
Two groups of patients were set up based on 50% cutoff expression value of each gene both in TCGA database and GEO database. The relationship between OS and gene expression level as well as basic clinical characteristics was analyzed using Kaplan-Meier method with the log-rank test. Log-rank P < 0.05 was considered statistically significant. Multivariate Cox proportional hazards regression analysis was used to adjust for prognosis-significant factors. Hazard ratio (HR) and 95% confidence interval (CI) were considered to estimate the relative risk. Stratified analysis was carried out based on certain clinical characteristics of the patients for survivalrelated genes to explore their significance in prognosis. To understand the relationship between GPC genes and prognosis at a deeper level, joint effects survival analysis was taken into consideration. The survival-significant clinical characteristics, clinical factors usually related to prognosis of patients with malignant tumors clinically and prognosis-related genes were included to establish a

Genome-wide co-expression analysis
Genome-wide co-expression analysis of prognosisrelated genes was performed to investigate their potential biological mechanisms based on TCGA database. A gene a b c d e f with Pearson correlation coefficient > 0.5 and P < 0.05 was considered as a co-expression gene. A co-expression network was built for each gene related to prognosis and its co-expressed genes using Cytoscape software (version 3.7.1) [32]. GO function analysis and KEGG pathway analysis of these genes were also completed using DAVID [29,30].

Statistical analysis
Survival analysis was performed using Kaplan-Meier method with log-rank test. Univariate and multivariate survival analyses were performed with Cox proportional hazards regression model to calculate crude and adjusted HRs and 95% CIs. Survival curves were plotted using GraphPad Prism v.7.0 (GraphPad Software Inc., La Jolla, CA).
The unpaired t test was used to compare gene expression levels between normal and tumor tissues. The expression relationship of each GPC gene and its coexpressed genes was quantified by Pearson's correlation coefficient. The correlation plot was constructed using Cytoscape software (version 3.7.1). All statistical analyses were performed using SPSS v.25.0 software (IBM, Chicago, IL, USA). A P value < 0.05 was considered statistically significant.

Analysis using public database
The expression status of GPC family genes in tissues derived from various normal human organs was analyzed using GTEx (Fig. 1). The expression level of GPC family genes was lower in human pancreas than in other organs. The results of GEPIA analysis showed that expression of GPC1, GPC3, GPC4, and GPC6 was significantly higher in PDAC tumor tissues than in normal tissues (P < 0.05) (Fig. 2). GO functional enrichment analysis indicated that GPC family genes were mainly involved in composition of cell membrane, organelles and anchored components of the membrane, heparan sulfate proteoglycan binding, and glycosaminoglycan metabolic process (Fig. 3, Additional file 1: Table 1). The results of BiNGO analysis (Fig. 4) confirmed those of GO analysis.

Survival analysis
The Kaplan-Meier method and log-rank test were used to investigate the association between basic clinical characteristics and OS in TCGA database. Additional file 2: Table 2 shows that histologic grade, extent of surgery, treatment with radiation and targeted molecular therapy were significant in OS. GPC family genes were divided into two groups based on expression level, and survival analysis was performed between the two groups.
The results (Fig. 5a-f ) Table 2. High expression of GPC2 was significantly associated with better OS in patients who were male, were > 60 years old, had histologic grade G1 or G2, had R1 or Rx resection or whether received radiation therapy. GPC3 expression was related to patients who were female, were > 60 years old, had histologic grade G1 or G2, or did not receive radiation or targeted molecular therapy. Moreover, GPC5 could influence prognosis of patients who were ≤ 60 years old, had histologic grade G3 or G4, had R1 or Rx resection, or did not receive radiation or targeted molecular therapy.

Joint effects analysis
Based on the prognostic significance of each GPC family gene, we combined every two genes among GPC2, GPC3, and GPC5 to investigate their significance in PDAC prognosis. The combination of GPC2 and GPC3 was associated with worse survival outcome in group 1 (MST = 278 days, adjusted P value < 0.001). The group of GPC2 and GPC5 was associated with the highest risk of death in group I (MST = 278 days, adjusted P value < 0.001) and the group combining GPC3 and GPC5 showed the poorest prognosis in group i (MST = 278 days, adjusted P value < 0.001).
We also analyzed survival associated with the three genes simultaneously. Group A showed the worst in survival status (MST = 219 days, adjusted P value = 0.018),  whereas the best survival was observed in group D (MST = 702 days, adjusted P value < 0.001). These data are shown in Table 3 and Fig. 7a-d showed the survival curves.

Prognosis nomogram
Based on the status of each clinical parameter and expression levels of GPC2, GPC3, and GPC5, a score for each variable was calculated. The total score could be calculated to predict 1-, 2-, and 3-year survival probabilities. The nomogram (Fig. 8) indicated that GPC2, GPC3, and GPC5 affected the prognosis of PDAC to different degrees.

Validation dataset to demonstrate the prognostic value of survival-related genes
To further understand the prognostic value of GPC2, GPC3, and GPC5, we acquired the GSE62452 dataset from GEO database. As shown in Additional file 3: Table 3, histologic grade was significantly associated with OS. GPC family genes were also divided into two groups by the median expression level of each gene and survival analysis between the two groups was carried out. Table 4 and Fig. 9a-f show that higher expression of GPC3 was significantly related to better survival (log-rank P = 0.038) and higher expression of GPC2 and GPC5 was also related to better survival, though not significantly (log-rank P = 0.337 and 0.090, repectively). Multivariate Cox proportional hazards regression analysis adjusted for prognosis-related clinical characteristics showed that none of these genes was significantly correlated to overall survival (all adjusted P > 0.05).

Genome-wide co-expression analysis of GPC2, GPC3 and GPC5 in PDAC
Genome-wide co-expression analysis was performed for each of these genes to investigate their related functional pathways through TCGA database. For GPC2 and its coexpressed genes, a correlation network was established as shown in Fig. 10a (Additional file 4: Table 4). GO analysis indicated that GPC2 and its co-expressed genes functioned mainly in sequence-specific DNA binding, protein transport, cell differentiation, and anterior/posterior pattern specification (Fig. 10b, Additional file 5: Table 5).
The correlation network for GPC5 and its coexpressed genes was shown in Fig. 12a and Additional file 9: Table 9. The results of GO analysis showed that these genes were associated with transcription factor complex and phospholipid metabolic process (Fig. 12b, Additional file 10: Table 10). KEGG analysis showed that these genes were involved in pancreatic secretion and glycerophospholipid metabolism (Fig. 12c, Additional file 11: Table 11).

Gene set enrichment analysis
GSEA was carried out to explore possible mechanisms of GPC family genes affecting prognosis of PDAC patients through TCGA database. The results of c6 reference indicated that low GPC2 expression was closely related to oncogenic signatures such as KRAS, RAF1, STK33, and VEGFA ( Fig. 13a-f; Additional file 12: Table 12). GSEA results of c2 enrichment showed that high GPC3 expression was associated with neuroactive ligand receptor interaction and GPCR ligand binding (Fig. 14a-c; Additional file 13: Table 13), and c6 enrichment suggested that high GPC3 expression was correlated to cyclin D1, p53, and PTEN ( Fig. 13d-f; Additional file 14: Table 14). For GPC5, c2 reference indicated that low expression of GPC5 was related to the EGFR pathway, gene methylation status, TFRC1, and the cell cycle (Fig. 15a-d; Additional file 15: Table 15) and c6 reference indicated that low GPC5 expression was related to HOXA9 and BMI1 (Fig. 15e-f; Additional file 16: Table 16).

Discussion
In this research, we studied the relationship between GPC family gene expression and prognosis of early-stage PDAC patients after pancreaticoduodenectomy both in TCGA database and GEO database. We concluded that high expression of GPC2, GPC3, and GPC5 was significantly related to favorable prognosis in TCGA database, suggesting the value of these genes as biomarkers for predicting the prognosis of PDAC patients. Moreover,  miR-149 can restrain both GPC1 expression and cell proliferation of the tumor, suggesting that GPC1 can be used as a marker for diagnosis and therapy of colorectal cancer [36]. It is reported that GPC2 could promote the proliferation of neuroblastoma cells as a result of MYCN binding to a motif of the promoter of GPC2 and gain of chromosome 7q [37]. GPC2 can also be used an effective prognostic indicator for prostate cancer and neuroblastoma [37][38][39]. GPC3 blocks the cell cycle in renal cancer cells 786-O and ACHN at G1 phase [40]. Overexpression of GPC3 reduces progression and metastasis of breast cancer cells LM3 through targeting canonical Wnt pathway [41]. The GPC5 rs2352028 variant and lower expression of this gene may contribute to increased risk of lung cancer [42,43]. Sun et al. have shown that GPC5 regulates epithelial-mesenchymal transition to reduce invasion of prostate cancer cells [44]. Its expression can serve as a prognostic indicator in a cohort of prostate cancer patients in China [45]. In this study, we demonstrated the relationship between OS and expression levels of GPC2, GPC3, and GPC5. Combined with results of GEPIA, it demonstrates their roles as tumor suppressor genes in PDAC.
To explore potential mechanisms of GPC genes affecting prognosis, we conducted GSEA and genome-wide coexpression analyses. The results showed that GPC2 was associated with sequence-specific DNA binding, protein transport, cell differentiation and oncogenic signatures (KRAS, RAF, STK33, and VEGFA). In pancreatic cancer, mutation of TP53 at codon 249 can alter the structure of p53, thus affecting its binding to a specific region of DNA and enhancing the risk of cancer [46,47]. A study showed that GDF11 regulates the biological behaviors of pancreatic cancer cells to influence their differentiation and high expression of GDF11 is associated with favorable OS in pancreatic cancer [48]. RAF1 accelerates migration and invasion of pancreatic cancer and disorders of the RAF1 pathway are related to worse prognosis in pancreatic cancer patients [49,50]. Moreover, microRNA-216a may downregulate RAF1 in pancreatic cancer and increase cell apoptosis [51]. VEGFA expression can increase as a result of the long non-coding RNA (lncRNA) 00511 in PDAC, which finally promotes tumor progression. The expression level of lnc00511 can be used as an indicator of prognosis in PDAC [52].
GPC3 is related to cell adhesion, angiogenesis, inflammatory response, signaling pathways like Ras, Rap1, PI3K-Akt, chemokine, GPCR, and signatures like cyclin D1, p53, PTEN. For pancreatic cancer patients, the degree of inflammatory response can be measured by serum lactate dehydrogenase level and it is associated with the outcome of patients [53]. Angiogenesis is dysregulated in PDAC, and it contributes to proliferation and deterioration of the tumor, making survival of patients worse [54, [56]. In PDAC associated with the KRAS mutation, decitabine therapy inhibits tumor growth [57]. ARF6 is reported to be in close relationship with the Ras pathway and its overexpression is related to unfavorable prognosis of PDAC patients [58]. PTEN plays a role in pancreatic cancer growth. The function of PTEN is regulated by HNF1A and finally affects the survival of pancreatic cancer patients [59,60]. GPC5 is involved in the transcription factor complex TFRC1, oncogenic signatures HOXA9 and BMI1, gene methylation, phospholipid metabolic process, glycerophospholipid metabolism, cell cycle, and the EGFR pathway. In pancreatic cancer, the transcription factor hif-2α can speed up metabolism and promote tumor proliferation and high level of hif-2α correlates with worse OS [61,62]. The methylation status of GRAP2, ICAM3, A2ML1, MUC1, and MUC4 can influence the expression of these genes, which is associated with survival of pancreatic cancer [63,64]. Phosphatidylserine is related to apoptosis of pancreatic cancer cells with the involvement of microparticles [65]. Stimuli such as oxidative stress can make phosphatidylserine appear outside on the pancreatic cancer cell membrane, finally leading to dysregulation of factors and cells such as VEGF and macrophages, making prognosis of patients  Fig. 11 a Correlation network for Glypican3 and its co-expression genes in The Cancer Genome Atlas database. The pink nodes are genes correlated positively and the blue nodes are genes correlated negatively. b Function enrichment analysis of Gene Ontology for Glypican3 and its co-expression genes. c Function enrichment analysis of Kyoto Encyclopedia of Genes and Genomes for Glypican3 and its co-expression genes a c b Fig. 12 a Correlation network for Glypican5 and its co-expression genes in The Cancer Genome Atlas database. The pink nodes are genes correlated positively. b Function enrichment analysis of Gene Ontology for Glypican5 and its co-expression genes. c Function enrichment analysis of Kyoto Encyclopedia of Genes and Genomes for Glypican5 and its co-expression genes The present study had several limitations. First, clinical data acquired from TCGA and GEO databases did not include all the relevant information, and there may be some factors that needed to be adjusted. Second, because the study included PDAC patients who underwent pancreaticoduodenectomy, the sample size was relatively small. Third, the results of genome-wide analysis and GSEA analysis were based on online databases to predict potential processes influencing prognosis, and further studies at molecular and genomic levels are necessary to confirm the results.
Despite these limitations, we identified GPC2, GPC3, and GPC5 as biomarkers for prognosis of PDAC patients and showed that joint effects analysis was more effective for prediction of prognosis. We also explored possible mechanisms of survival-significant genes affecting PDAC prognosis through genome-wide analysis and GSEA analysis. These results could all improve prognostic prediction for PDAC and provide information valuable for the management of PDAC patients and making better clinical decisions in this population.