Development of a gastric cancer risk calculator for questionnaire-based surveillance of Iranian dyspeptic patients

Gohari, Kimiya; Saberi, Samaneh; Esmaieli, Maryam; Tashakoripour, Mohammad; Hosseini, Mahmoud Eshagh; Nahvijou, Azin; Mohagheghi, Mohammad Ali; Kazemnejad, Anoshirvan; Mohammadi, Marjan

doi:10.1186/s12876-024-03123-z

Research
Open access
Published: 18 January 2024

Development of a gastric cancer risk calculator for questionnaire-based surveillance of Iranian dyspeptic patients

Kimiya Gohari^1,2,
Samaneh Saberi¹,
Maryam Esmaieli¹,
Mohammad Tashakoripour³,
Mahmoud Eshagh Hosseini³,
Azin Nahvijou⁴,
Mohammad Ali Mohagheghi⁴,
Anoshirvan Kazemnejad² &
…
Marjan Mohammadi¹

BMC Gastroenterology volume 24, Article number: 39 (2024) Cite this article

822 Accesses
Metrics details

Abstract

Background

Gastric cancer (GC) is considered a silent killer, taking more than three quarters of a million lives annually. Therefore, prior to further costly and invasive diagnostic approaches, an initial GC risk screening is desperately in demand.

Methods

In order to develop a simple risk scoring system, the demographic and lifestyle indices from 858 GC and 1132 non-ulcer dyspeptic (NUD) patients were analysed. We applied a multivariate logistic regression approach to identify the association between our target predictors and GC versus NUD. The model performance in classification was assessed by receiver operating characteristic (ROC) analysis. Our questionnaire covering 64 predictors, included known risk factors, such as demographic features, dietary habits, self-reported medical status, narcotics use, and SES indicators.

Results

Our model segregated GC from NUD patients with the sensitivity, specificity, and accuracy rates of 85.89, 63.9, and 73.03%, respectively, which was confirmed in the development dataset (AUC equal to 86.37%, P < 0.0001). Predictors which contributed most to our GC risk calculator, based on risk scores (RS) and shared percentages (SP), included: 1) older age group [> 70 (RS:+ 241, SP:7.23), 60–70 (RS:+ 221, SP:6.60), 50–60 (RS:+ 134, SP:4.02), 2) history of gastrointestinal cancers (RS:+ 173, SP:5.19), 3) male gender (RS:+ 119, SP:3.55), 4) non-Fars ethnicity (RS:+ 89, SP:2.66), 5) illiteracy of both parents (RS:+ 78, SP:2.38), 6) rural residence (RS:+ 77, SP:2.3), and modifiable dietary behaviors (RS:+ 32 to + 53, SP:0.96 to 1.58).

Conclusion

Our developed risk calculator provides a primary screening step, prior to the subsequent costly and invasive measures. Furthermore, public awareness regarding modifiable risk predictors may encourage and promote lifestyle adjustments and healthy behaviours.

Peer Review reports

Introduction

According to the International Agency for Research on Cancer (IARC), gastric cancer (GC) is responsible for more than 769,000 global deaths, equating to one in every 13 deaths, for the year 2020 [1]. GC is more prevalent amongst the male population, such that ~ 49 in 100,000 males suffer from this disease, which is more than twice its prevalence in females (~ 21 in 100,000) [2]. Stomach cancer mainly involves older people, with the average age of diagnosis being 68 and more than half of people diagnosed are 65 or older [3].

GC is a multistep and multifactorial process involving genetic and environmental factors [4]. Besides age and gender as known risk factors, there is much evidence that unhealthy diets [5, 6], alcohol abuse [7], smoking [8,9,10] and other factors such as genetics, environmental and behavioural factors [11,12,13,14] enhance the risk of GC development.

Considering the incidence rates of GC in most countries are expected to decrease through 2030, reductions in smoking, prevalence of Helicobacter pylori infection and diet improvement will be the likely contributing factors [15].

It is most desired to estimate the primary disease risk using general information, consuming the least time and resources [16]. This can be made possible by combining clinical knowledge with applied data science [17]. The ultimate product should be an optimal tool, readily performed by anyone without expert knowledge, using their personal information. Ideally, the application of such tools will help increase awareness and ultimately reduce the burden of disease on the community and the health care system [18].

Classification methods are usually used to develop a risk score and identify high-risk individuals in a population [19,20,21,22]. In this study, we have used multivariate analysis to identify individual predictors differentiating GC from NUD patients. For this purpose, we carried out a logistic regression approach, using 64 predictors of known risk factors including demographic features, dietary habits, self-reported medical status, narcotics use, and SES indicators. Developing a time and cost-effective algorithm that uses the personal medical history and lifestyle habits to screen subjects for GC risk, can provide a tool for filtering dyspeptic patients prior to the more invasive screening approaches.

Materials and methods

Study setting

This hospital-based observational study was conducted on a group of Iranian gastric cancer (GC) patients (n = 858), who were consecutively (July 2003 to Jan 2020) referred to the National cancer Institute of Iran (NCII). Our GC cases were diagnosed with histologically confirmed gastric adenocarcinoma. The non-ulcer dyspeptic (NUD) patients (n = 1132) were those who had referred for upper gastroscopy, but lacked GC. NUD patients were admitted at the endoscopy unit of Amiralam Hospital. Both centers shared similar SES profiles. The anatomic location (subsite) of the tumor was classified as cardia (defined as cardioesophageal junction, oesophagogastric junction and gastroesophageal junction) or non-cardia (all other locations in the stomach) [23]. Histopathologic studies identified the subtype of the gastric tumors, as intestinal or diffuse [24].

Trained technicians interviewed each participant at the time of recruitment, using a structured questionnaire. This questionnaire elicited 64 predictors, including demographic features, dietary habits, self-reported medical status, narcotics use, and SES indicators. Primarily, each of the questionnaire predictors, with multiple levels (with the exception of age) was turned into binary groups (S-Table-1).

In order to use the properties of data while assuming the power of 90% for testing the significance of the odds ratio in the logistic regression model, a minimum of 936 observations were required. Therefore, our data, including 1990 observations, had sufficient power for the risk score development.

Statistical analysis

Imputation of the missing data

We used multivariate imputation by chained equations (MICE) [25], to deal with missing data in more than one variable. In this method, two general approaches for imputing multivariate data have been applied: joint modeling (JM) [computational strategies for multivariate linear mixed-effects models with missing values [26], multilevel models with multivariate mixed response types] and full conditional specification (FCS) [multivariate imputation by chained equations-dependency networks for inference, collaborative filtering, and data visualisation [27].

To validate our imputation method, we conducted the following steps. At first, using the bootstrap method [28], based on the distribution of data, we have generated ten copies of our dataset. The multivariate imputation method imputed the missing values in each copy, and five new complete datasets were generated for all the copies. In this manner, we achieved 50 complete datasets. The distribution of all variables in the original dataset and these 50 imputed versions were compared. The variables were kept in the dataset, if the deviation in the mean (S-Fig. 1) and standard deviation (S-Fig. 2) did not exceed 0.05. Next, we randomly converted 10% of the observed values for each variable into missing and imputed them again. This process was repeated 1000 times and all the variables’ biases were calculated (S-Fig. 3). The cut-off value for the bias variation was set at 2%. We aimed to maintain the imputation bias under this cut-off value.

Model development

We used Chi-square test to measure associations between predictors and outcomes (Table 1). Statistical significance was determined using 2-sided P-values, with values < 0.05 considered as statistically significant. We have presented a univariate analysis of all predictors and assessed the association between each of them with GC vs. NUD, without taking into consideration the other predictors. In this step, we emphasized on the distribution of each (Table 1). In the next step we performed multivariate logistic regression analysis on 70% of randomly selected observations, and determined the associations with each predictor, while adjusting for all others (Table 2) [29]. The probability of having GC vs. NUD, based on the logistic model, was calculated [30].

Table 1 Distribution of predictors amongst GC versus NUD patients

Full size table

Table 2 The results of the multivariate logistic regression model to explore the GC-prone versus NUD-prone predictors

Full size table

The probability of being GC versus NUD was computed using logistic regression:

$$P(GC)=\frac{1}{1+{e}^{-\left({\beta}_0+{\beta}_1{X}_1+{\beta}_2{X}_2+\dots +{\beta}_k{X}_k\right)}}$$

Where β₀ is the intercept term and β₁, β₂, …, β_k are the coefficients associated with the input features X₁, X₂, …, X_k.

This study divided patients into two risk groups based on an assigned cut-off point, derived from fixing the sensitivity rate at a minimum of 90%, while maximizing the specificity rate. Accordingly, the best threshold for the risk score was identified. We defined the shared percentage for every predictor in our risk calculator, as the contribution of each variable in predicting GC vs. NUD, as clinical outcomes. This measure is the proportion of the standardized regression coefficient (point estimates) for each predictor relative to their total sum (Table 2 and Fig. 1). The final risk score for each predictor was calculated by the multiplication of their pertinent point estimate by 100.

Model validation

We used the train-test split method [31] for determining the performance criteria (AUC, sensitivity, specificity, precision, false-positive, false-negative, and accuracy rates) of our logistic model (Fig. 2), as well as to assess its internal validity.

These performance criteria were calculated as follows:

The accuracy rate, which measures the overall correctness of the classification, was calculated as:

$$Accuracy=\frac{TP+ TN}{TP+ TN+ FP+ FN}$$

The sensitivity rate (true positive rate or recall), which measures the proportion of actual positive instances that were correctly identified, was calculated as:

$$Sensitivity=\frac{TP}{TP+ FN}$$

The specificity rate (true negative rate), which measures the proportion of actual negative instances that were correctly identified, was calculated as:

$$Specificity=\frac{TN}{TN+ FP}$$

Where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives and FN is the number of false negatives.

To do this, the data were randomly divided into the development (70%) and validation (30%) subsets. The performance criteria of our GC risk calculator were determined by examining calibration and discrimination measures. Calibration refers to how closely the predicted probability of having GC agrees with the observed GC status and is assessed by the Hosmer-Lemeshow test [32]. The discrimination rate expresses the ability of the model to differentiate between individuals with GC versus NUD. This was evaluated by calculating the area under the ROC curve (AUC) [33]. An AUC value of 50 and 100 was considered as having no versus perfect discrimination, respectively. Risk thresholds that gave a combination of more than 85% sensitivity rates and maximum specificity rates were derived from the list provided by the ROC curve analysis. All statistical analysis and data visualizations were done in R statistical software environment.

Results

Descriptive information

Our observational study included 858 GC [development = 610 and validation = 248] and 1132 NUD [development = 783 and validation = 349] patients, who were entered into this hospital-based study.

All of the 64 questionnaire predictors, with multiple levels, were converted into binary categories, as presented in S-Table 1. We have also presented the distribution of each of these predictors amongst GC versus NUD patients, without any adjustments for other predictors in Table 1. The results of the Chi-square test showed that the distribution of most (47/64) of the predictors were different between GC and NUD patients (Table 1). The association between predictors and GC vs. NUD affected the model, and although most variables were independently associated with GC (Table 1), when adjusted for all other variables, few associations, remained statistically significant (Table 2).

The data obtained from the 64 predictors from our 1990 (GC + NUD) cases included varying degrees of missingness. To remedy this, we used the MICE method to impute the missing values. But first it was critical to validate our imputation technique and ascertain a consistent distribution for each predictor thereafter. Having done so, in the first approach amongst the 50 regenerated samples, the mean (S-Fig. 1) and standard deviation (S-Figure-2) differences, between our actual and imputed data did not exceed 0.05 and were thus acceptable. In the second approach, for all 1000 bootstrap-generated samples, the bias was determined as under 0.02 (S-Fig. 3).

Model development

Logistic regression specified the strength of association between each of our 64 predictors and the clinical outcome (GC or NUD). The strengths of association for each of the predictors (if any), while adjusting for all other predictors, are presented via risk scores, shared precents, and odds ratios (Table 2). Taking into consideration all of the 64 predictors in our model, a risk calculator was created, scoring for GC or NUD (Fig. 1). The obtained risk score ranged from − 1261 to + 2077 (total range of 3338), moving from NUD towards GC. Of this range, the risk score of − 451, equivalent to a shared percentage of 13.49, was assigned to subjects at reference level. The remaining 86.51 percentage of the risk score was contributed by our 64 predictors, with varying shares. Aiming for a minimum sensitivity rate of 90%, the risk score of − 91, coinciding with the probability value of 0.29 (ranging from 0 to 1.0, Fig. 2) was identified as the cut-off point. Keeping in mind that each of the addressed 64 predictors contributed to the final risk score, those which were statistically significant are described below.

GC-prone predictors

The predictors which acted towards the development of GC are considered as GC-prone. Amongst the demographic category, older age holds the first place, creating risk scores of + 134 to + 221 to + 241, for subjects aged > 50 – 60, > 60 – 70 and > 70, in reference to those aged ≤ 50 years, respectively. These values were sequentially equivalent to 4.02, 6.60, and 7.23 shared percentage (SP) of the total range. Next in line, was being of male gender and non-Fars/mixed ethnicity, with risk scores of + 119 (SP = 3.55%) and + 89 (SP = 2.66%), respectively. Amongst the SES factors, the illiteracy of both parents and residence in a rural area contributed + 77 and + 78 risk scores, respectively, which contributed 2.35 and 2.3 shared percentages to the score. In regards to the medical status of the subjects, having a personal and family history of GI cancers provided a GC risk score of + 173 (SP =5.19) and + 57 (SP =1.72), respectively. Modifiable lifestyle behaviors, such as diet and use of narcotics took the subsequent positions. Amongst dietary habits, drinking hot tea [+ 53 (SP =1.58)], consumption of medium-to-high amounts of cheese [+ 47 (SP = 1.42)], use of table salt [+ 46 (SP = 1.39)], late dinnertime [+ 34 (SP = 1.03)], and consumption of medium-to-high amounts of eggs [+ 32 (SP = 0.96)] were amongst the dietary GC-prone predictors (Table 2).

Association with the subtype and subsite of GC

Some of the above-mentioned GC-prone predictors were also associated with the subsite and/or histologic subtype of the tumor. Amongst these, age was closely associated with the intestinal histologic subtype of GC (P = 0.003). History of GI cancer was associated with the cardia anatomic location (P = 0.02) and intestinal histologic subtype (P = 0.032) of the tumor. The predictors of drinking hot tea (P = 0.004) and consumption of table salt (P = 0. 04) were associated with the cardia subset of the GC tumors.

NUD-prone predictors

However, there were some predictors that acted towards the development of NUD; in other words, they were NUD-prone. These predictors belonged to the two categories of SES and diet. The predictors of the former category included: rented residential place (during childhood) [− 62 (SP = 1.85)], rural-birth place [− 57 (SP = 1.69)], and lacking refrigerator (during childhood) [− 46 (SP = 1.39)]. Predictors of the dietary habits included: taking vitamins [− 60 (SP = 1.81)], consumption of smoked rice [− 57 (SP = 1.72)], medium-to-high consumption of yoghurt [− 39 (SP = 1.16)], and consumption of tuna fish [− 31 (SP = 0.94)] (Table 2).

Model validation

To validate the results of the above-described development model, a validation approach was taken, assessing 597 individuals (248 GC and 349 NUD; GC: NUD ratio, 1:1.41), on which the ROC analysis was performed (Fig. 3). Using our 64 predictors, we were able to differentiate GC from NUD, with an AUC of 86.37% and the sensitivity, specificity, and accuracy rates of 85.89, 63.9 and 73.03%, respectively. According to this model, the rates of false positives and false negatives were 36.1, and 14.11%, respectively (Fig. 3).

We have also evaluated the calibration of our model by the Hosmer-Lemeshow method. Having done so, a P value of 0.4761 (well above 0.05) was obtained. Thus, the fitness of our model was confirmed. Our risk calculator can, thus, calculate the risk of GC versus NUD, based on the probability our 64 predictors, proportional to the achieved risk score (Fig. 3).

Discussion

Gastric cancer being a silent killer, usually catches patients and their health service providers, off-guard. Being able to assign a relative risk to subjects, based on their demographic characteristics and life style behaviours, will provide an upper hand in focusing on the at-risk subjects, with subsequent stepwise clinical testing and follow-ups. The goal of this study was to develop an approach to accomplish the primary screening step based on our target predictors.

In 2023, a multicentre population-based study, carried out on over 416 thousand subjects (aged 40 – 75 years) in China, a GC risk calculator was developed, which highlighted 11 demographic and life style variables that place individuals at risk of GC [34]. Although our study was hospital-based and has screened Iranian dyspeptic patients, the common variables between these two studies still identify age, gender (male), education (illiteracy of parents), salt intake and personal and family history of cancer as definite risk factors. In another population-based screening study on subjects (aged 40–74 years), with no history of cancer in Korea, six risk factors were identified [35], of which salt intake was a shared prominent risk factor with our hospital-based screening study.

In 2019, a population-based study was conducted in China to assess the general knowledge about GC risk factors and symptoms. The analysis was performed on 1200 adults, over the age of 18 with an average age of 40, which showed that the mean score for GC knowledge was 8.85 out of 22. Of the 1200 participants, 564 (47.0%) had insufficient understanding of GC risk factors and warning symptoms. Overall, about 84% of people believed that screening helped diagnose GC. However, only 15.2% of people were screened for GC. There were various reasons for avoiding screening, including being asymptomatic, fear of diagnostic screening and its outcomes, male gender, living in rural areas, lower educational levels, etc. [36]. Hence, lack of routine screening and the absence of specific symptoms for this fatal disease, leaves most subjects undiagnosed until the terminal stages, which accounts for GC being known as a silent killer [37, 38].

Several methodological studies on GC have been conducted over the years [13, 39,40,41,42,43]. A concerted strategy for the joint analysis of these investigations may allow new insights into the etiology of GC. Therefore, the ‘Stomach cancer Pooling (StoP) Project’ was set up in 2012 to join together several investigators and create a consortium of epidemiological investigations on risk factors for GC. The SToP’s final aim was to examine the role of several lifestyles and genetic determinants in the etiology of GC, through pooled analyses of individual-level data [44].

In our study we intended to investigate the effects of any potential risk factors, even if they were not statistically significant, so to create an all-inclusive risk calculator.

The GC-prone factors identified via our model, are also supported by previous studies, include older age [34, 45,46,47], male gender [34, 48], and non-Fars/mixed ethnicity [49,50,51], illiteracy of both parents [52,53,54,55], family history of GI cancers [56,57,58,59], drinking hot tea [60], late dinnertime [61, 62], consumption of table salt [63,64,65], and medium to high amounts of cheese and eggs [66,67,68,69]. Having used a logistic regression model, we have developed a gastric cancer risk calculator, with the sensitivity, specificity, and accuracy rates of 85.89, 63.9, and 73.03%, which can be used by individuals or their healthcare workers, for primary screening of dyspeptic patients.

In 2007, Driver et al. [21] developed a simple scoring system that identifies men at increased risk of colorectal cancer, based on age and modifiable behaviours, such as alcohol intake, smoking status, and body mass index. They ran a logistic regression model as well as a proportional hazards model, to better simulate a screening decision, based on the information obtained. The discrimination power of the final model was about 70% (AUC = 69.5%) [21]. In comparison, our risk score had the discrimination power (AUC) of 86.37% during internal validation. Keeping in mind that this is the primary screening step, followed by simple and complex clinical testing, the limited detection rates, we have herein obtained for a primary questionnaire-based surveillance, are acceptable.

The strengths of our study include its sensible sample size and inclusion of a wide variety of target demographic and lifestyle behaviours. However, we have used a case-case setting in order to be able to add other clinical data, on the next rounds of clinical and paraclinical screening. Having compared GC patients with non-GC (non-ulcer dyspeptic, NUD) patients, the scale bar of our risk score moves towards the direction of GC (GC-prone) or NUD (NUD-prone) and is, at best, suitable for screening dyspeptic patients, rather than the general population. Thus, our risk calculator, must be adjusted, by applying the model in a case-control (GC versus healthy population) setting. It must also be kept in mind that some of the highlighted risk indicators may actually be proxies for other unaddressed predictors. Furthermore, the fact that we had to turn our multinomial levels (answers), into binomial, may have oversimplified our model. Another point of concern is the external validation of this model on other sample cohorts with diverse environmental, cultural, and social characteristics.

Nevertheless, applying such an inexpensive GC risk calculator, using questionnaire-based information, can provide the first step in screening Iranian at risk patients, to be followed by more complex laboratory and clinical screenings. Furthermore, providing information about individualized GC risk status, can lead to attempts at correction of the modifiable risk behaviours. Future studies include, validation of this model in case-control settings, in different geographic locations.

Availability of data and materials

The datasets used and/or analysed during the current study available from the corresponding author on reasonable request.

Abbreviations

GC:: Gastric cancer
NUD:: Non-ulcer dyspeptic
ROC:: Receiver operating characteristic
MOHME:: Ministry of Health and Medical Education
StoP:: Stomach cancer Pooling
SP:: Shared percentage
FCS:: Full conditional specification
JM:: Joint modeling
MICE:: Multivariate imputation by chained equations
NCII:: National cancer institute of Iran
IARC:: International Agency for Research on Cancer
RS:: Risk scores

References

Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
Article PubMed Google Scholar
Alvarez EM, Force LM, Xu R, Compton K, Lu D, Henrikson HJ, Kocarnik JM, Harvey JD, Pennini A, Dean FE. The global burden of adolescent and young adult cancer in 2019: a systematic analysis for the global burden of disease study 2019. Lancet Oncol. 2022;23(1):27–52.
Article Google Scholar
Thrift AP, Nguyen TH. Gastric Cancer epidemiology. Gastrointest Endosc Clin N Am. 2021;31(3):425–39.
Article PubMed Google Scholar
Bertuccio P, Chatenoud L, Levi F, Praud D, Ferlay J, Negri E, Malvezzi M, La Vecchia C. Recent patterns in gastric cancer: a global overview. Int J Cancer. 2009;125(3):666–73.
Article CAS PubMed Google Scholar
Bertuccio P, Rosato V, Andreano A, Ferraroni M, Decarli A, Edefonti V, La Vecchia C. Dietary patterns and gastric cancer risk: a systematic review and meta-analysis. Ann Oncol. 2013;24(6):1450–8.
Article CAS PubMed Google Scholar
Du S, Li Y, Su Z, Shi X, Johnson NL, Li P, Zhang Y, Zhang Q, Wen L, Li K, et al. Index-based dietary patterns in relation to gastric cancer risk: a systematic review and meta-analysis. Br J Nutr. 2020;123(9):964–74.
Article CAS PubMed Google Scholar
de Menezes RF, Bergmann A, Thuler LC. Alcohol consumption and risk of cancer: a systematic literature review. Asian Pac J Cancer Prev. 2013;14(9):4965–72.
Article PubMed Google Scholar
Sasco AJ, Secretan MB, Straif K. Tobacco smoking and cancer: a brief review of recent epidemiological evidence. Lung Cancer. 2004;45(Suppl 2):S3–9.
Article PubMed Google Scholar
Ladeiras-Lopes R, Pereira AK, Nogueira A, Pinheiro-Torres T, Pinto I, Santos-Pereira R, Lunet N. Smoking and gastric cancer: systematic review and meta-analysis of cohort studies. Cancer Causes Control. 2008;19(7):689–701.
Article PubMed Google Scholar
Dong J, Thrift AP. Alcohol, smoking and risk of oesophago-gastric cancer. Best Pract Res Clin Gastroenterol. 2017;31(5):509–17.
Article PubMed Google Scholar
Cavatorta O, Scida S, Miraglia C, Barchi A, Nouvenne A, Leandro G, Meschi T, De' Angelis GL, Di Mario F. Epidemiology of gastric cancer and risk factors. Acta Biomed. 2018;89(8-s):82–7.
PubMed Google Scholar
Machlowska J, Baj J, Sitarz M, Maciejewski R, Sitarz R. Gastric Cancer: epidemiology, risk factors, classification, genomic characteristics and treatment strategies. Int J Mol Sci. 2020;21(11):4012.
Article CAS PubMed PubMed Central Google Scholar
Guggenheim DE, Shah MA. Gastric cancer epidemiology and risk factors. J Surg Oncol. 2013;107(3):230–6.
Article PubMed Google Scholar
Yusefi AR, Bagheri Lankarani K, Bastani P, Radinmanesh M, Kavosi Z. Risk factors for gastric Cancer: a systematic review. Asian Pac J Cancer Prev. 2018;19(3):591–603.
CAS PubMed PubMed Central Google Scholar
Lin Y, Zheng Y, Wang HL, Wu J. Global patterns and trends in gastric Cancer incidence rates (1988-2012) and predictions to 2030. Gastroenterology. 2021;161(1):116–127.e118.
Article PubMed Google Scholar
National Research Council Committee on Risk P. Communication. In: Improving Risk Communication. Washington (DC): National Academies Press (US) Copyright © 1989 by the National Academy of Sciences; 1989.
Google Scholar
In: Fundamentals of Clinical Data Science. edn. Edited by Kubben P, Dumontier M, Dekker A. Cham (CH): Springer Copyright 2019, The Editor(s) (if applicable) and The Author(s). This book is an open access publication. 2019. ISBN 978-3-319-99712-4, ISBN 978-3-319-99713-1 (eBook).
Mishra GA, Dhivar HD, Gupta SD, Kulkarni SV, Shastri SS. A population-based screening program for early detection of common cancers among women in India - methodology and interim results. Indian J Cancer. 2015;52(1):139–45.
Article CAS PubMed Google Scholar
Li X, Bian D, Yu J, Li M, Zhao D. Using machine learning models to improve stroke risk level classification methods of China national stroke screening. BMC Med Inform Decis Mak. 2019;19(1):261.
Article PubMed PubMed Central Google Scholar
Heikes KE, Eddy DM, Arondekar B, Schlessinger L. Diabetes risk calculator: a simple tool for detecting undiagnosed diabetes and pre-diabetes. Diabetes Care. 2008;31(5):1040–5.
Article PubMed Google Scholar
Driver JA, Gaziano JM, Gelber RP, Lee IM, Buring JE, Kurth T. Development of a risk score for colorectal cancer in men. Am J Med. 2007;120(3):257–63.
Article PubMed Google Scholar
Lloyd-Jones DM, Wilson PW, Larson MG, Beiser A, Leip EP, D'Agostino RB, Levy D. Framingham risk score and prediction of lifetime risk for coronary heart disease. Am J Cardiol. 2004;94(1):20–4.
Article PubMed Google Scholar
Wittekind C, Compton CC, Greene FL, Sobin LH. TNM residual tumor classification revisited. Cancer. 2002;94(9):2511–6.
Article PubMed Google Scholar
Lauren P. The two histological main types of gastric carcinoma: diffuse and so-called intestinal-type carcinoma. An attempt at a histo-clinical classification. Acta Pathol Microbiol Scand. 1965;64:31–49.
Article CAS PubMed Google Scholar
Zhang Z. Multiple imputation with multivariate imputation by chained equation (MICE) package. Ann Transl Med. 2016;4(2):30.
PubMed PubMed Central Google Scholar
Yucel RM. Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response. Philos Trans A Math Phys Eng Sci. 1874;2008(366):2389–403.
Google Scholar
van Buuren S. Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res. 2007;16(3):219–42.
Article PubMed Google Scholar
Wahl S, Boulesteix AL, Zierer A, Thorand B, van de Wiel MA. Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation. BMC Med Res Methodol. 2016;16(1):144.
Article PubMed PubMed Central Google Scholar
Kleinbaum DG, Dietz K, Gail M, Klein M, Klein M. Logistic regression. Springer; 2002.
Google Scholar
Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med. 2013;4(2):627–35.
PubMed PubMed Central Google Scholar
Refaeilzadeh P, Tang L, Liu H, Liu L. Encyclopedia of database systems. In: Cross-validation. Springer; 2009. p. 532–8.
Google Scholar
Blaha MJ. The critical importance of risk score calibration: time for transformative approach to risk score validation? J Am Coll Cardiol. 2016;67(18):2131–4.
Article PubMed Google Scholar
Landwehr JM, Pregibon D, Shoemaker AC. Graphical methods for assessing logistic regression models. J Am Stat Assoc. 1984;79(385):61–71.
Article Google Scholar
Zhu X, Lv J, Zhu M, Yan C, Deng B, Yu C, Guo Y, Ni J, She Q, Wang T, et al. Development, validation, and evaluation of a risk assessment tool for personalized screening of gastric cancer in Chinese populations. BMC Med. 2023;21(1):159.
Article PubMed PubMed Central Google Scholar
Trinh TTK, Lee K, Oh JK, Suh M, Jun JK, Choi KS. Cluster of lifestyle risk factors for stomach cancer and screening behaviors among Korean adults. Sci Rep. 2023;13(1):17503.
Article CAS PubMed PubMed Central Google Scholar
Liu Q, Zeng X, Wang W, Huang RL, Huang YJ, Liu S, Huang YH, Wang YX, Fang QH, He G, et al. Awareness of risk factors and warning symptoms and attitude towards gastric cancer screening among the general public in China: a cross-sectional study. BMJ Open. 2019;9(7):e029638.
Article PubMed PubMed Central Google Scholar
Necula L, Matei L, Dragu D, Neagu AI, Mambet C, Nedeianu S, Bleotu C, Diaconu CC, Chivu-Economescu M. Recent advances in gastric cancer early diagnosis. World J Gastroenterol. 2019;25(17):2029–44.
Article CAS PubMed PubMed Central Google Scholar
Pasechnikov V, Chukov S, Fedorov E, Kikuste I, Leja M. Gastric cancer: prevention, screening and early diagnosis. World J Gastroenterol. 2014;20(38):13842–62.
Article PubMed PubMed Central Google Scholar
Karimi P, Islami F, Anandasabapathy S, Freedman ND, Kamangar F. Gastric cancer: descriptive epidemiology, risk factors, screening, and prevention. Cancer Epidemiol Biomark Prev. 2014;23(5):700–13.
Article Google Scholar
Correa P. Gastric cancer: overview. Gastroenterol Clin N Am. 2013;42(2):211–7.
Article Google Scholar
Ang TL, Fock KM. Clinical epidemiology of gastric cancer. Singap Med J. 2014;55(12):621–8.
Article Google Scholar
Smyth EC, Nilsson M, Grabsch HI, van Grieken NC, Lordick F. Gastric cancer. Lancet. 2020;396(10251):635–48.
Article CAS PubMed Google Scholar
Venerito M, Link A, Rokkas T, Malfertheiner P. Gastric cancer - clinical and epidemiological aspects. Helicobacter. 2016;21(Suppl 1):39–44.
Article CAS PubMed Google Scholar
Pelucchi C, Lunet N, Boccia S, Zhang ZF, Praud D, Boffetta P, Levi F, Matsuo K, Ito H, Hu J, et al. The stomach cancer pooling (StoP) project: study design and presentation. Eur J Cancer Prev. 2015;24(1):16–23.
Article PubMed Google Scholar
Choi Y, Kim N, Kim KW, Jo HH, Park J, Yoon H, Shin CM, Park YS, Lee DH. Gastric Cancer in older patients: a retrospective study and literature review. Ann Geriatr Med Res. 2022;26(1):33–41.
Article PubMed PubMed Central Google Scholar
Lee JG, Kim SA, Eun CS, Han DS, Kim YS, Choi BY, Song KS, Kim HJ, Park CH. Impact of age on stage-specific mortality in patients with gastric cancer: a long-term prospective cohort study. PLoS One. 2019;14(8):e0220660.
Article CAS PubMed PubMed Central Google Scholar
Chung HW, Noh SH, Lim JB. Analysis of demographic characteristics in 3242 young age gastric cancer patients in Korea. World J Gastroenterol. 2010;16(2):256–63.
Article PubMed PubMed Central Google Scholar
Choi Y, Kim N, Kim KW, Jo HH, Park J, Yoon H, Shin CM, Park YS, Lee DH, Oh HJ, et al. Sex-based differences in histology, staging, and prognosis among 2983 gastric cancer surgery patients. World J Gastroenterol. 2022;28(9):933–47.
Article CAS PubMed PubMed Central Google Scholar
Mohebbi M, Mahmoodi M, Wolfe R, Nourijelyani K, Mohammad K, Zeraati H, Fotouhi A. Geographical spread of gastrointestinal tract cancer incidence in the Caspian Sea region of Iran: spatial analysis of cancer registry data. BMC Cancer. 2008;8:137.
Article PubMed PubMed Central Google Scholar
Malekzadeh R, Derakhshan MH, Malekzadeh Z. Gastric cancer in Iran: epidemiology and risk factors. Arch Iran Med. 2009;12(6):576–83.
PubMed Google Scholar
Kalan Farmanfarma K, Mahdavifar N, Hassanipour S, Salehiniya H. Epidemiologic study of gastric Cancer in Iran: a systematic review. Clin Exp Gastroenterol. 2020;13:511–42.
Article PubMed PubMed Central Google Scholar
Tam YH, Yeung CK, Lee KH, Sihoe JD, Chan KW, Cheung ST, Mou JW. A population-based study of Helicobacter pylori infection in Chinese children resident in Hong Kong: prevalence and potential risk factors. Helicobacter. 2008;13(3):219–24.
Article PubMed Google Scholar
Nouraie M, Latifi-Navid S, Rezvan H, Radmard AR, Maghsudlu M, Zaer-Rezaii H, Amini S, Siavoshi F, Malekzadeh R. Childhood hygienic practice and family education status determine the prevalence of Helicobacter pylori infection in Iran. Helicobacter. 2009;14(1):40–6.
Article PubMed Google Scholar
Lagergren J, Andersson G, Talbäck M, Drefahl S, Bihagen E, Härkönen J, Feychting M, Ljung R. Marital status, education, and income in relation to the risk of esophageal and gastric cancer by histological type and site. Cancer. 2016;122(2):207–12.
Article PubMed Google Scholar
Stephens MR, Blackshaw GR, Lewis WG, Edwards P, Barry JD, Hopper NA, Allison MC. Influence of socio-economic deprivation on outcomes for patients diagnosed with gastric cancer. Scand J Gastroenterol. 2005;40(11):1351–7.
Article PubMed Google Scholar
Slavin TP, Weitzel JN, Neuhausen SL, Schrader KA, Oliveira C, Karam R. Genetics of gastric cancer: what do we know about the genetic risks? Transl Gastroenterol Hepatol. 2019;4:55.
Article PubMed PubMed Central Google Scholar
Ren JS, Freedman ND, Kamangar F, Dawsey SM, Hollenbeck AR, Schatzkin A, Abnet CC. Tea, coffee, carbonated soft drinks and upper gastrointestinal tract cancer risk in a large United States prospective cohort study. Eur J Cancer. 2010;46(10):1873–81.
Article CAS PubMed PubMed Central Google Scholar
Sun WY, Yang H, Wang XK, Fan JH, Qiao YL, Taylor PR. The association between family history of upper gastrointestinal Cancer and the risk of death from upper gastrointestinal Cancer-based on Linxian dysplasia nutrition intervention trial (NIT) cohort. Front Oncol. 2022;12:897534.
Article PubMed PubMed Central Google Scholar
Thrift AP, Jove AG, Liu Y, Tan MC, El-Serag HB. Associations of duration, intensity, and quantity of smoking with risk of gastric intestinal metaplasia. J Clin Gastroenterol. 2022;56(1):e71–6.
Article CAS PubMed PubMed Central Google Scholar
Mao XQ, Jia XF, Zhou G, Li L, Niu H, Li FL, Liu HY, Zheng R, Xu N. Green tea drinking habits and gastric cancer in Southwest China. Asian Pac J Cancer Prev. 2011;12(9):2179–82.
PubMed Google Scholar
Xu L, Zhang X, Lu J, Dai JX, Lin RQ, Tian FX, Liang B, Guo YN, Luo HY, Li N, et al. The effects of dinner-to-bed time and post-dinner walk on gastric Cancer across different age groups: a multicenter case-control study in Southeast China. Medicine (Baltimore). 2016;95(16):e3397.
Article PubMed Google Scholar
Huang L, Chen L, Gui ZX, Liu S, Wei ZJ, Xu AM. Preventable lifestyle and eating habits associated with gastric adenocarcinoma: a case-control study. J Cancer. 2020;11(5):1231–9.
Article CAS PubMed PubMed Central Google Scholar
Wu B, Yang D, Yang S, Zhang G. Dietary salt intake and gastric Cancer risk: a systematic review and Meta-analysis. Front Nutr. 2021;8:801228.
Article PubMed PubMed Central Google Scholar
Thapa S, Fischbach LA, Delongchamp R, Faramawi MF, Orloff M. Association between dietary salt intake and progression in the gastric precancerous process. Cancers (Basel). 2019;11(4):467.
Article CAS PubMed PubMed Central Google Scholar
Ge S, Feng X, Shen L, Wei Z, Zhu Q, Sun J. Association between habitual dietary salt intake and risk of gastric Cancer: a systematic review of observational studies. Gastroenterol Res Pract. 2012;2012:808120.
Article PubMed PubMed Central Google Scholar
Shen JG, Jin LD, Dong MJ, Wang LB, Zhao WH, Shen J. Low level of serum high-density lipoprotein cholesterol in gastric cancer correlates with cancer progression but not survival. Transl Cancer Res. 2020;9(10):6206–13.
Article CAS PubMed PubMed Central Google Scholar
Miao P, Guan L. Association of Dietary Cholesterol Intake with Risk of gastric Cancer: a systematic review and Meta-analysis of observational studies. Front Nutr. 2021;8:722450.
Article PubMed PubMed Central Google Scholar
Shin HJ, Roh CK, Son SY, Hoon H, Han SU. Prognostic value of hypocholesterolemia in patients with gastric cancer. Asian J Surg. 2021;44(1):72–9.
Article PubMed Google Scholar
Pih GY, Gong EJ, Choi JY, Kim MJ, Ahn JY, Choe J, Bae SE, Chang HS, Na HK, Lee JH, et al. Associations of serum lipid level with gastric Cancer risk, pathology, and prognosis. Cancer Res Treat. 2021;53(2):445–56.
Article CAS PubMed Google Scholar

Download references

Acknowledgments

The author would like to sincerely thank Professor Emeritis Olof Nyren, (Karolinska Institutet, Stockholm, Sweden) for his expert and generous guidance in formulating the interview questionnaire. We would also like to thank every interviewer who have generously assisted in obtaining the data on our patient population over the years.

Funding

This study was supported by a technical assistance grant (IRN-072), which was co-funded by the Islamic Development Bank, Saudi Arabia and Pasteur Institute of Iran.

Author information

Authors and Affiliations

HPGC Research Group, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
Kimiya Gohari, Samaneh Saberi, Maryam Esmaieli & Marjan Mohammadi
Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
Kimiya Gohari & Anoshirvan Kazemnejad
Gastroenterology Department, Amiralam Hospital, Tehran University of Medical Sciences, Tehran, Iran
Mohammad Tashakoripour & Mahmoud Eshagh Hosseini
Cancer Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
Azin Nahvijou & Mohammad Ali Mohagheghi

Authors

Kimiya Gohari
View author publications
You can also search for this author in PubMed Google Scholar
Samaneh Saberi
View author publications
You can also search for this author in PubMed Google Scholar
Maryam Esmaieli
View author publications
You can also search for this author in PubMed Google Scholar
Mohammad Tashakoripour
View author publications
You can also search for this author in PubMed Google Scholar
Mahmoud Eshagh Hosseini
View author publications
You can also search for this author in PubMed Google Scholar
Azin Nahvijou
View author publications
You can also search for this author in PubMed Google Scholar
Mohammad Ali Mohagheghi
View author publications
You can also search for this author in PubMed Google Scholar
Anoshirvan Kazemnejad
View author publications
You can also search for this author in PubMed Google Scholar
Marjan Mohammadi
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

KG: Modelling, methodology, computational analysis, drafting the manuscript, and assistance in conceiving the concept and study design; SS: Data collection, assistance in conceiving the concept, study design and drafting the manuscript; ME: Data collection, database entry and management; MT: Data collection, MEH: Clinical data collection on endoscopy patients; AN: Data collection; AK: Methodology; MAM: Clinical data collection on gastric cancer patients; MM: Conceptualization, study design, supervision, drafting, revising and finalizing the manuscript.

Corresponding authors

Correspondence to Anoshirvan Kazemnejad or Marjan Mohammadi.

Ethics declarations

Ethics approval and consent to participate

All methods were carried out following relevant guidelines and regulations. Data collections were carried out following obtaining the participants’ written informed consent, according to the protocols approved by the National Committee on Ethical Issues in Medical Research, Ministry of Health and Medical Education (MOHME) of Iran; Ref No. 315.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Marjan Mohammadi is the main corresponding author and Anoshirvan Kazemnejad is the co-corresponding author.

Supplementary Information

Additional file 1.

Additional file 2.

Additional file 3.

Additional file 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Gohari, K., Saberi, S., Esmaieli, M. et al. Development of a gastric cancer risk calculator for questionnaire-based surveillance of Iranian dyspeptic patients. BMC Gastroenterol 24, 39 (2024). https://doi.org/10.1186/s12876-024-03123-z

Download citation

Received: 22 November 2023
Accepted: 02 January 2024
Published: 18 January 2024
DOI: https://doi.org/10.1186/s12876-024-03123-z

Development of a gastric cancer risk calculator for questionnaire-based surveillance of Iranian dyspeptic patients

Abstract

Background

Methods

Results

Conclusion

Introduction

Materials and methods

Study setting

Statistical analysis

Imputation of the missing data

Model development

Model validation

Results

Descriptive information

Model development

GC-prone predictors

Association with the subtype and subsite of GC

NUD-prone predictors

Model validation

Discussion

Availability of data and materials

Abbreviations

References

Acknowledgments

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Additional file 1.

Additional file 2.

Additional file 3.

Additional file 4.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Gastroenterology

Contact us