Skip to main content

Development of a gastric cancer risk calculator for questionnaire-based surveillance of Iranian dyspeptic patients



Gastric cancer (GC) is considered a silent killer, taking more than three quarters of a million lives annually. Therefore, prior to further costly and invasive diagnostic approaches, an initial GC risk screening is desperately in demand.


In order to develop a simple risk scoring system, the demographic and lifestyle indices from 858 GC and 1132 non-ulcer dyspeptic (NUD) patients were analysed. We applied a multivariate logistic regression approach to identify the association between our target predictors and GC versus NUD. The model performance in classification was assessed by receiver operating characteristic (ROC) analysis. Our questionnaire covering 64 predictors, included known risk factors, such as demographic features, dietary habits, self-reported medical status, narcotics use, and SES indicators.


Our model segregated GC from NUD patients with the sensitivity, specificity, and accuracy rates of 85.89, 63.9, and 73.03%, respectively, which was confirmed in the development dataset (AUC equal to 86.37%, P < 0.0001). Predictors which contributed most to our GC risk calculator, based on risk scores (RS) and shared percentages (SP), included: 1) older age group [> 70 (RS:+ 241, SP:7.23), 60–70 (RS:+ 221, SP:6.60), 50–60 (RS:+ 134, SP:4.02), 2) history of gastrointestinal cancers (RS:+ 173, SP:5.19), 3) male gender (RS:+ 119, SP:3.55), 4) non-Fars ethnicity (RS:+ 89, SP:2.66), 5) illiteracy of both parents (RS:+ 78, SP:2.38), 6) rural residence (RS:+ 77, SP:2.3), and modifiable dietary behaviors (RS:+ 32 to + 53, SP:0.96 to 1.58).


Our developed risk calculator provides a primary screening step, prior to the subsequent costly and invasive measures. Furthermore, public awareness regarding modifiable risk predictors may encourage and promote lifestyle adjustments and healthy behaviours.

Peer Review reports


According to the International Agency for Research on Cancer (IARC), gastric cancer (GC) is responsible for more than 769,000 global deaths, equating to one in every 13 deaths, for the year 2020 [1]. GC is more prevalent amongst the male population, such that ~ 49 in 100,000 males suffer from this disease, which is more than twice its prevalence in females (~ 21 in 100,000) [2]. Stomach cancer mainly involves older people, with the average age of diagnosis being 68 and more than half of people diagnosed are 65 or older [3].

GC is a multistep and multifactorial process involving genetic and environmental factors [4]. Besides age and gender as known risk factors, there is much evidence that unhealthy diets [5, 6], alcohol abuse [7], smoking [8,9,10] and other factors such as genetics, environmental and behavioural factors [11,12,13,14] enhance the risk of GC development.

Considering the incidence rates of GC in most countries are expected to decrease through 2030, reductions in smoking, prevalence of Helicobacter pylori infection and diet improvement will be the likely contributing factors [15].

It is most desired to estimate the primary disease risk using general information, consuming the least time and resources [16]. This can be made possible by combining clinical knowledge with applied data science [17]. The ultimate product should be an optimal tool, readily performed by anyone without expert knowledge, using their personal information. Ideally, the application of such tools will help increase awareness and ultimately reduce the burden of disease on the community and the health care system [18].

Classification methods are usually used to develop a risk score and identify high-risk individuals in a population [19,20,21,22]. In this study, we have used multivariate analysis to identify individual predictors differentiating GC from NUD patients. For this purpose, we carried out a logistic regression approach, using 64 predictors of known risk factors including demographic features, dietary habits, self-reported medical status, narcotics use, and SES indicators. Developing a time and cost-effective algorithm that uses the personal medical history and lifestyle habits to screen subjects for GC risk, can provide a tool for filtering dyspeptic patients prior to the more invasive screening approaches.

Materials and methods

Study setting

This hospital-based observational study was conducted on a group of Iranian gastric cancer (GC) patients (n = 858), who were consecutively (July 2003 to Jan 2020) referred to the National cancer Institute of Iran (NCII). Our GC cases were diagnosed with histologically confirmed gastric adenocarcinoma. The non-ulcer dyspeptic (NUD) patients (n = 1132) were those who had referred for upper gastroscopy, but lacked GC. NUD patients were admitted at the endoscopy unit of Amiralam Hospital. Both centers shared similar SES profiles. The anatomic location (subsite) of the tumor was classified as cardia (defined as cardioesophageal junction, oesophagogastric junction and gastroesophageal junction) or non-cardia (all other locations in the stomach) [23]. Histopathologic studies identified the subtype of the gastric tumors, as intestinal or diffuse [24].

Trained technicians interviewed each participant at the time of recruitment, using a structured questionnaire. This questionnaire elicited 64 predictors, including demographic features, dietary habits, self-reported medical status, narcotics use, and SES indicators. Primarily, each of the questionnaire predictors, with multiple levels (with the exception of age) was turned into binary groups (S-Table-1).

In order to use the properties of data while assuming the power of 90% for testing the significance of the odds ratio in the logistic regression model, a minimum of 936 observations were required. Therefore, our data, including 1990 observations, had sufficient power for the risk score development.

Statistical analysis

Imputation of the missing data

We used multivariate imputation by chained equations (MICE) [25], to deal with missing data in more than one variable. In this method, two general approaches for imputing multivariate data have been applied: joint modeling (JM) [computational strategies for multivariate linear mixed-effects models with missing values [26], multilevel models with multivariate mixed response types] and full conditional specification (FCS) [multivariate imputation by chained equations-dependency networks for inference, collaborative filtering, and data visualisation [27].

To validate our imputation method, we conducted the following steps. At first, using the bootstrap method [28], based on the distribution of data, we have generated ten copies of our dataset. The multivariate imputation method imputed the missing values in each copy, and five new complete datasets were generated for all the copies. In this manner, we achieved 50 complete datasets. The distribution of all variables in the original dataset and these 50 imputed versions were compared. The variables were kept in the dataset, if the deviation in the mean (S-Fig. 1) and standard deviation (S-Fig. 2) did not exceed 0.05. Next, we randomly converted 10% of the observed values for each variable into missing and imputed them again. This process was repeated 1000 times and all the variables’ biases were calculated (S-Fig. 3). The cut-off value for the bias variation was set at 2%. We aimed to maintain the imputation bias under this cut-off value.

Model development

We used Chi-square test to measure associations between predictors and outcomes (Table 1). Statistical significance was determined using 2-sided P-values, with values < 0.05 considered as statistically significant. We have presented a univariate analysis of all predictors and assessed the association between each of them with GC vs. NUD, without taking into consideration the other predictors. In this step, we emphasized on the distribution of each (Table 1). In the next step we performed multivariate logistic regression analysis on 70% of randomly selected observations, and determined the associations with each predictor, while adjusting for all others (Table 2) [29]. The probability of having GC vs. NUD, based on the logistic model, was calculated [30].

Table 1 Distribution of predictors amongst GC versus NUD patients
Table 2 The results of the multivariate logistic regression model to explore the GC-prone versus NUD-prone predictors

The probability of being GC versus NUD was computed using logistic regression:

$$P(GC)=\frac{1}{1+{e}^{-\left({\beta}_0+{\beta}_1{X}_1+{\beta}_2{X}_2+\dots +{\beta}_k{X}_k\right)}}$$

Where β0 is the intercept term and β1, β2, …, βk are the coefficients associated with the input features X1, X2, …, Xk.

This study divided patients into two risk groups based on an assigned cut-off point, derived from fixing the sensitivity rate at a minimum of 90%, while maximizing the specificity rate. Accordingly, the best threshold for the risk score was identified. We defined the shared percentage for every predictor in our risk calculator, as the contribution of each variable in predicting GC vs. NUD, as clinical outcomes. This measure is the proportion of the standardized regression coefficient (point estimates) for each predictor relative to their total sum (Table 2 and Fig. 1). The final risk score for each predictor was calculated by the multiplication of their pertinent point estimate by 100.

Fig. 1
figure 1

Risk score system for segregation of GC from NUD based on our logistic regression model

Model validation

We used the train-test split method [31] for determining the performance criteria (AUC, sensitivity, specificity, precision, false-positive, false-negative, and accuracy rates) of our logistic model (Fig. 2), as well as to assess its internal validity.

Fig. 2
figure 2

The probability of GC versus NUD based on risk scores

These performance criteria were calculated as follows:

The accuracy rate, which measures the overall correctness of the classification, was calculated as:

$$Accuracy=\frac{TP+ TN}{TP+ TN+ FP+ FN}$$

The sensitivity rate (true positive rate or recall), which measures the proportion of actual positive instances that were correctly identified, was calculated as:

$$Sensitivity=\frac{TP}{TP+ FN}$$

The specificity rate (true negative rate), which measures the proportion of actual negative instances that were correctly identified, was calculated as:

$$Specificity=\frac{TN}{TN+ FP}$$

Where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives and FN is the number of false negatives.

To do this, the data were randomly divided into the development (70%) and validation (30%) subsets. The performance criteria of our GC risk calculator were determined by examining calibration and discrimination measures. Calibration refers to how closely the predicted probability of having GC agrees with the observed GC status and is assessed by the Hosmer-Lemeshow test [32]. The discrimination rate expresses the ability of the model to differentiate between individuals with GC versus NUD. This was evaluated by calculating the area under the ROC curve (AUC) [33]. An AUC value of 50 and 100 was considered as having no versus perfect discrimination, respectively. Risk thresholds that gave a combination of more than 85% sensitivity rates and maximum specificity rates were derived from the list provided by the ROC curve analysis. All statistical analysis and data visualizations were done in R statistical software environment.


Descriptive information

Our observational study included 858 GC [development = 610 and validation = 248] and 1132 NUD [development = 783 and validation = 349] patients, who were entered into this hospital-based study.

All of the 64 questionnaire predictors, with multiple levels, were converted into binary categories, as presented in S-Table 1. We have also presented the distribution of each of these predictors amongst GC versus NUD patients, without any adjustments for other predictors in Table 1. The results of the Chi-square test showed that the distribution of most (47/64) of the predictors were different between GC and NUD patients (Table 1). The association between predictors and GC vs. NUD affected the model, and although most variables were independently associated with GC (Table 1), when adjusted for all other variables, few associations, remained statistically significant (Table 2).

The data obtained from the 64 predictors from our 1990 (GC + NUD) cases included varying degrees of missingness. To remedy this, we used the MICE method to impute the missing values. But first it was critical to validate our imputation technique and ascertain a consistent distribution for each predictor thereafter. Having done so, in the first approach amongst the 50 regenerated samples, the mean (S-Fig. 1) and standard deviation (S-Figure-2) differences, between our actual and imputed data did not exceed 0.05 and were thus acceptable. In the second approach, for all 1000 bootstrap-generated samples, the bias was determined as under 0.02 (S-Fig. 3).

Model development

Logistic regression specified the strength of association between each of our 64 predictors and the clinical outcome (GC or NUD). The strengths of association for each of the predictors (if any), while adjusting for all other predictors, are presented via risk scores, shared precents, and odds ratios (Table 2). Taking into consideration all of the 64 predictors in our model, a risk calculator was created, scoring for GC or NUD (Fig. 1). The obtained risk score ranged from − 1261 to + 2077 (total range of 3338), moving from NUD towards GC. Of this range, the risk score of − 451, equivalent to a shared percentage of 13.49, was assigned to subjects at reference level. The remaining 86.51 percentage of the risk score was contributed by our 64 predictors, with varying shares. Aiming for a minimum sensitivity rate of 90%, the risk score of − 91, coinciding with the probability value of 0.29 (ranging from 0 to 1.0, Fig. 2) was identified as the cut-off point. Keeping in mind that each of the addressed 64 predictors contributed to the final risk score, those which were statistically significant are described below.

GC-prone predictors

The predictors which acted towards the development of GC are considered as GC-prone. Amongst the demographic category, older age holds the first place, creating risk scores of + 134 to + 221 to + 241, for subjects aged > 50 – 60, > 60 – 70 and > 70, in reference to those aged ≤ 50 years, respectively. These values were sequentially equivalent to 4.02, 6.60, and 7.23 shared percentage (SP) of the total range. Next in line, was being of male gender and non-Fars/mixed ethnicity, with risk scores of + 119 (SP = 3.55%) and + 89 (SP = 2.66%), respectively. Amongst the SES factors, the illiteracy of both parents and residence in a rural area contributed + 77 and + 78 risk scores, respectively, which contributed 2.35 and 2.3 shared percentages to the score. In regards to the medical status of the subjects, having a personal and family history of GI cancers provided a GC risk score of + 173 (SP =5.19) and + 57 (SP =1.72), respectively. Modifiable lifestyle behaviors, such as diet and use of narcotics took the subsequent positions. Amongst dietary habits, drinking hot tea [+ 53 (SP =1.58)], consumption of medium-to-high amounts of cheese [+ 47 (SP = 1.42)], use of table salt [+ 46 (SP = 1.39)], late dinnertime [+ 34 (SP = 1.03)], and consumption of medium-to-high amounts of eggs [+ 32 (SP = 0.96)] were amongst the dietary GC-prone predictors (Table 2).

Association with the subtype and subsite of GC

Some of the above-mentioned GC-prone predictors were also associated with the subsite and/or histologic subtype of the tumor. Amongst these, age was closely associated with the intestinal histologic subtype of GC (P = 0.003). History of GI cancer was associated with the cardia anatomic location (P = 0.02) and intestinal histologic subtype (P = 0.032) of the tumor. The predictors of drinking hot tea (P = 0.004) and consumption of table salt (P = 0. 04) were associated with the cardia subset of the GC tumors.

NUD-prone predictors

However, there were some predictors that acted towards the development of NUD; in other words, they were NUD-prone. These predictors belonged to the two categories of SES and diet. The predictors of the former category included: rented residential place (during childhood) [− 62 (SP = 1.85)], rural-birth place [− 57 (SP = 1.69)], and lacking refrigerator (during childhood) [− 46 (SP = 1.39)]. Predictors of the dietary habits included: taking vitamins [− 60 (SP = 1.81)], consumption of smoked rice [− 57 (SP = 1.72)], medium-to-high consumption of yoghurt [− 39 (SP = 1.16)], and consumption of tuna fish [− 31 (SP = 0.94)] (Table 2).

Model validation

To validate the results of the above-described development model, a validation approach was taken, assessing 597 individuals (248 GC and 349 NUD; GC: NUD ratio, 1:1.41), on which the ROC analysis was performed (Fig. 3). Using our 64 predictors, we were able to differentiate GC from NUD, with an AUC of 86.37% and the sensitivity, specificity, and accuracy rates of 85.89, 63.9 and 73.03%, respectively. According to this model, the rates of false positives and false negatives were 36.1, and 14.11%, respectively (Fig. 3).

Fig. 3
figure 3

ROC curve analysis of our model differentiating GC from NUD

We have also evaluated the calibration of our model by the Hosmer-Lemeshow method. Having done so, a P value of 0.4761 (well above 0.05) was obtained. Thus, the fitness of our model was confirmed. Our risk calculator can, thus, calculate the risk of GC versus NUD, based on the probability our 64 predictors, proportional to the achieved risk score (Fig. 3).


Gastric cancer being a silent killer, usually catches patients and their health service providers, off-guard. Being able to assign a relative risk to subjects, based on their demographic characteristics and life style behaviours, will provide an upper hand in focusing on the at-risk subjects, with subsequent stepwise clinical testing and follow-ups. The goal of this study was to develop an approach to accomplish the primary screening step based on our target predictors.

In 2023, a multicentre population-based study, carried out on over 416 thousand subjects (aged 40 – 75 years) in China, a GC risk calculator was developed, which highlighted 11 demographic and life style variables that place individuals at risk of GC [34]. Although our study was hospital-based and has screened Iranian dyspeptic patients, the common variables between these two studies still identify age, gender (male), education (illiteracy of parents), salt intake and personal and family history of cancer as definite risk factors. In another population-based screening study on subjects (aged 40–74 years), with no history of cancer in Korea, six risk factors were identified [35], of which salt intake was a shared prominent risk factor with our hospital-based screening study.

In 2019, a population-based study was conducted in China to assess the general knowledge about GC risk factors and symptoms. The analysis was performed on 1200 adults, over the age of 18 with an average age of 40, which showed that the mean score for GC knowledge was 8.85 out of 22. Of the 1200 participants, 564 (47.0%) had insufficient understanding of GC risk factors and warning symptoms. Overall, about 84% of people believed that screening helped diagnose GC. However, only 15.2% of people were screened for GC. There were various reasons for avoiding screening, including being asymptomatic, fear of diagnostic screening and its outcomes, male gender, living in rural areas, lower educational levels, etc. [36]. Hence, lack of routine screening and the absence of specific symptoms for this fatal disease, leaves most subjects undiagnosed until the terminal stages, which accounts for GC being known as a silent killer [37, 38].

Several methodological studies on GC have been conducted over the years [13, 39,40,41,42,43]. A concerted strategy for the joint analysis of these investigations may allow new insights into the etiology of GC. Therefore, the ‘Stomach cancer Pooling (StoP) Project’ was set up in 2012 to join together several investigators and create a consortium of epidemiological investigations on risk factors for GC. The SToP’s final aim was to examine the role of several lifestyles and genetic determinants in the etiology of GC, through pooled analyses of individual-level data [44].

In our study we intended to investigate the effects of any potential risk factors, even if they were not statistically significant, so to create an all-inclusive risk calculator.

The GC-prone factors identified via our model, are also supported by previous studies, include older age [34, 45,46,47], male gender [34, 48], and non-Fars/mixed ethnicity [49,50,51], illiteracy of both parents [52,53,54,55], family history of GI cancers [56,57,58,59], drinking hot tea [60], late dinnertime [61, 62], consumption of table salt [63,64,65], and medium to high amounts of cheese and eggs [66,67,68,69]. Having used a logistic regression model, we have developed a gastric cancer risk calculator, with the sensitivity, specificity, and accuracy rates of 85.89, 63.9, and 73.03%, which can be used by individuals or their healthcare workers, for primary screening of dyspeptic patients.

In 2007, Driver et al. [21] developed a simple scoring system that identifies men at increased risk of colorectal cancer, based on age and modifiable behaviours, such as alcohol intake, smoking status, and body mass index. They ran a logistic regression model as well as a proportional hazards model, to better simulate a screening decision, based on the information obtained. The discrimination power of the final model was about 70% (AUC = 69.5%) [21]. In comparison, our risk score had the discrimination power (AUC) of 86.37% during internal validation. Keeping in mind that this is the primary screening step, followed by simple and complex clinical testing, the limited detection rates, we have herein obtained for a primary questionnaire-based surveillance, are acceptable.

The strengths of our study include its sensible sample size and inclusion of a wide variety of target demographic and lifestyle behaviours. However, we have used a case-case setting in order to be able to add other clinical data, on the next rounds of clinical and paraclinical screening. Having compared GC patients with non-GC (non-ulcer dyspeptic, NUD) patients, the scale bar of our risk score moves towards the direction of GC (GC-prone) or NUD (NUD-prone) and is, at best, suitable for screening dyspeptic patients, rather than the general population. Thus, our risk calculator, must be adjusted, by applying the model in a case-control (GC versus healthy population) setting. It must also be kept in mind that some of the highlighted risk indicators may actually be proxies for other unaddressed predictors. Furthermore, the fact that we had to turn our multinomial levels (answers), into binomial, may have oversimplified our model. Another point of concern is the external validation of this model on other sample cohorts with diverse environmental, cultural, and social characteristics.

Nevertheless, applying such an inexpensive GC risk calculator, using questionnaire-based information, can provide the first step in screening Iranian at risk patients, to be followed by more complex laboratory and clinical screenings. Furthermore, providing information about individualized GC risk status, can lead to attempts at correction of the modifiable risk behaviours. Future studies include, validation of this model in case-control settings, in different geographic locations.

Availability of data and materials

The datasets used and/or analysed during the current study available from the corresponding author on reasonable request.



Gastric cancer


Non-ulcer dyspeptic


Receiver operating characteristic


Ministry of Health and Medical Education


Stomach cancer Pooling


Shared percentage


Full conditional specification


Joint modeling


Multivariate imputation by chained equations


National cancer institute of Iran


International Agency for Research on Cancer


Risk scores


  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.

    Article  PubMed  Google Scholar 

  2. Alvarez EM, Force LM, Xu R, Compton K, Lu D, Henrikson HJ, Kocarnik JM, Harvey JD, Pennini A, Dean FE. The global burden of adolescent and young adult cancer in 2019: a systematic analysis for the global burden of disease study 2019. Lancet Oncol. 2022;23(1):27–52.

    Article  Google Scholar 

  3. Thrift AP, Nguyen TH. Gastric Cancer epidemiology. Gastrointest Endosc Clin N Am. 2021;31(3):425–39.

    Article  PubMed  Google Scholar 

  4. Bertuccio P, Chatenoud L, Levi F, Praud D, Ferlay J, Negri E, Malvezzi M, La Vecchia C. Recent patterns in gastric cancer: a global overview. Int J Cancer. 2009;125(3):666–73.

    Article  CAS  PubMed  Google Scholar 

  5. Bertuccio P, Rosato V, Andreano A, Ferraroni M, Decarli A, Edefonti V, La Vecchia C. Dietary patterns and gastric cancer risk: a systematic review and meta-analysis. Ann Oncol. 2013;24(6):1450–8.

    Article  CAS  PubMed  Google Scholar 

  6. Du S, Li Y, Su Z, Shi X, Johnson NL, Li P, Zhang Y, Zhang Q, Wen L, Li K, et al. Index-based dietary patterns in relation to gastric cancer risk: a systematic review and meta-analysis. Br J Nutr. 2020;123(9):964–74.

    Article  CAS  PubMed  Google Scholar 

  7. de Menezes RF, Bergmann A, Thuler LC. Alcohol consumption and risk of cancer: a systematic literature review. Asian Pac J Cancer Prev. 2013;14(9):4965–72.

    Article  PubMed  Google Scholar 

  8. Sasco AJ, Secretan MB, Straif K. Tobacco smoking and cancer: a brief review of recent epidemiological evidence. Lung Cancer. 2004;45(Suppl 2):S3–9.

    Article  PubMed  Google Scholar 

  9. Ladeiras-Lopes R, Pereira AK, Nogueira A, Pinheiro-Torres T, Pinto I, Santos-Pereira R, Lunet N. Smoking and gastric cancer: systematic review and meta-analysis of cohort studies. Cancer Causes Control. 2008;19(7):689–701.

    Article  PubMed  Google Scholar 

  10. Dong J, Thrift AP. Alcohol, smoking and risk of oesophago-gastric cancer. Best Pract Res Clin Gastroenterol. 2017;31(5):509–17.

    Article  PubMed  Google Scholar 

  11. Cavatorta O, Scida S, Miraglia C, Barchi A, Nouvenne A, Leandro G, Meschi T, De' Angelis GL, Di Mario F. Epidemiology of gastric cancer and risk factors. Acta Biomed. 2018;89(8-s):82–7.

    PubMed  Google Scholar 

  12. Machlowska J, Baj J, Sitarz M, Maciejewski R, Sitarz R. Gastric Cancer: epidemiology, risk factors, classification, genomic characteristics and treatment strategies. Int J Mol Sci. 2020;21(11):4012.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Guggenheim DE, Shah MA. Gastric cancer epidemiology and risk factors. J Surg Oncol. 2013;107(3):230–6.

    Article  PubMed  Google Scholar 

  14. Yusefi AR, Bagheri Lankarani K, Bastani P, Radinmanesh M, Kavosi Z. Risk factors for gastric Cancer: a systematic review. Asian Pac J Cancer Prev. 2018;19(3):591–603.

    CAS  PubMed  PubMed Central  Google Scholar 

  15. Lin Y, Zheng Y, Wang HL, Wu J. Global patterns and trends in gastric Cancer incidence rates (1988-2012) and predictions to 2030. Gastroenterology. 2021;161(1):116–127.e118.

    Article  PubMed  Google Scholar 

  16. National Research Council Committee on Risk P. Communication. In: Improving Risk Communication. Washington (DC): National Academies Press (US) Copyright © 1989 by the National Academy of Sciences; 1989.

    Google Scholar 

  17. In: Fundamentals of Clinical Data Science. edn. Edited by Kubben P, Dumontier M, Dekker A. Cham (CH): Springer Copyright 2019, The Editor(s) (if applicable) and The Author(s). This book is an open access publication. 2019. ISBN 978-3-319-99712-4, ISBN 978-3-319-99713-1 (eBook).

  18. Mishra GA, Dhivar HD, Gupta SD, Kulkarni SV, Shastri SS. A population-based screening program for early detection of common cancers among women in India - methodology and interim results. Indian J Cancer. 2015;52(1):139–45.

    Article  CAS  PubMed  Google Scholar 

  19. Li X, Bian D, Yu J, Li M, Zhao D. Using machine learning models to improve stroke risk level classification methods of China national stroke screening. BMC Med Inform Decis Mak. 2019;19(1):261.

    Article  PubMed  PubMed Central  Google Scholar 

  20. Heikes KE, Eddy DM, Arondekar B, Schlessinger L. Diabetes risk calculator: a simple tool for detecting undiagnosed diabetes and pre-diabetes. Diabetes Care. 2008;31(5):1040–5.

    Article  PubMed  Google Scholar 

  21. Driver JA, Gaziano JM, Gelber RP, Lee IM, Buring JE, Kurth T. Development of a risk score for colorectal cancer in men. Am J Med. 2007;120(3):257–63.

    Article  PubMed  Google Scholar 

  22. Lloyd-Jones DM, Wilson PW, Larson MG, Beiser A, Leip EP, D'Agostino RB, Levy D. Framingham risk score and prediction of lifetime risk for coronary heart disease. Am J Cardiol. 2004;94(1):20–4.

    Article  PubMed  Google Scholar 

  23. Wittekind C, Compton CC, Greene FL, Sobin LH. TNM residual tumor classification revisited. Cancer. 2002;94(9):2511–6.

    Article  PubMed  Google Scholar 

  24. Lauren P. The two histological main types of gastric carcinoma: diffuse and so-called intestinal-type carcinoma. An attempt at a histo-clinical classification. Acta Pathol Microbiol Scand. 1965;64:31–49.

    Article  CAS  PubMed  Google Scholar 

  25. Zhang Z. Multiple imputation with multivariate imputation by chained equation (MICE) package. Ann Transl Med. 2016;4(2):30.

    PubMed  PubMed Central  Google Scholar 

  26. Yucel RM. Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response. Philos Trans A Math Phys Eng Sci. 1874;2008(366):2389–403.

    Google Scholar 

  27. van Buuren S. Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res. 2007;16(3):219–42.

    Article  PubMed  Google Scholar 

  28. Wahl S, Boulesteix AL, Zierer A, Thorand B, van de Wiel MA. Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation. BMC Med Res Methodol. 2016;16(1):144.

    Article  PubMed  PubMed Central  Google Scholar 

  29. Kleinbaum DG, Dietz K, Gail M, Klein M, Klein M. Logistic regression. Springer; 2002.

    Google Scholar 

  30. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med. 2013;4(2):627–35.

    PubMed  PubMed Central  Google Scholar 

  31. Refaeilzadeh P, Tang L, Liu H, Liu L. Encyclopedia of database systems. In: Cross-validation. Springer; 2009. p. 532–8.

    Google Scholar 

  32. Blaha MJ. The critical importance of risk score calibration: time for transformative approach to risk score validation? J Am Coll Cardiol. 2016;67(18):2131–4.

    Article  PubMed  Google Scholar 

  33. Landwehr JM, Pregibon D, Shoemaker AC. Graphical methods for assessing logistic regression models. J Am Stat Assoc. 1984;79(385):61–71.

    Article  Google Scholar 

  34. Zhu X, Lv J, Zhu M, Yan C, Deng B, Yu C, Guo Y, Ni J, She Q, Wang T, et al. Development, validation, and evaluation of a risk assessment tool for personalized screening of gastric cancer in Chinese populations. BMC Med. 2023;21(1):159.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Trinh TTK, Lee K, Oh JK, Suh M, Jun JK, Choi KS. Cluster of lifestyle risk factors for stomach cancer and screening behaviors among Korean adults. Sci Rep. 2023;13(1):17503.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Liu Q, Zeng X, Wang W, Huang RL, Huang YJ, Liu S, Huang YH, Wang YX, Fang QH, He G, et al. Awareness of risk factors and warning symptoms and attitude towards gastric cancer screening among the general public in China: a cross-sectional study. BMJ Open. 2019;9(7):e029638.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Necula L, Matei L, Dragu D, Neagu AI, Mambet C, Nedeianu S, Bleotu C, Diaconu CC, Chivu-Economescu M. Recent advances in gastric cancer early diagnosis. World J Gastroenterol. 2019;25(17):2029–44.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Pasechnikov V, Chukov S, Fedorov E, Kikuste I, Leja M. Gastric cancer: prevention, screening and early diagnosis. World J Gastroenterol. 2014;20(38):13842–62.

    Article  PubMed  PubMed Central  Google Scholar 

  39. Karimi P, Islami F, Anandasabapathy S, Freedman ND, Kamangar F. Gastric cancer: descriptive epidemiology, risk factors, screening, and prevention. Cancer Epidemiol Biomark Prev. 2014;23(5):700–13.

    Article  Google Scholar 

  40. Correa P. Gastric cancer: overview. Gastroenterol Clin N Am. 2013;42(2):211–7.

    Article  Google Scholar 

  41. Ang TL, Fock KM. Clinical epidemiology of gastric cancer. Singap Med J. 2014;55(12):621–8.

    Article  Google Scholar 

  42. Smyth EC, Nilsson M, Grabsch HI, van Grieken NC, Lordick F. Gastric cancer. Lancet. 2020;396(10251):635–48.

    Article  CAS  PubMed  Google Scholar 

  43. Venerito M, Link A, Rokkas T, Malfertheiner P. Gastric cancer - clinical and epidemiological aspects. Helicobacter. 2016;21(Suppl 1):39–44.

    Article  CAS  PubMed  Google Scholar 

  44. Pelucchi C, Lunet N, Boccia S, Zhang ZF, Praud D, Boffetta P, Levi F, Matsuo K, Ito H, Hu J, et al. The stomach cancer pooling (StoP) project: study design and presentation. Eur J Cancer Prev. 2015;24(1):16–23.

    Article  PubMed  Google Scholar 

  45. Choi Y, Kim N, Kim KW, Jo HH, Park J, Yoon H, Shin CM, Park YS, Lee DH. Gastric Cancer in older patients: a retrospective study and literature review. Ann Geriatr Med Res. 2022;26(1):33–41.

    Article  PubMed  PubMed Central  Google Scholar 

  46. Lee JG, Kim SA, Eun CS, Han DS, Kim YS, Choi BY, Song KS, Kim HJ, Park CH. Impact of age on stage-specific mortality in patients with gastric cancer: a long-term prospective cohort study. PLoS One. 2019;14(8):e0220660.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Chung HW, Noh SH, Lim JB. Analysis of demographic characteristics in 3242 young age gastric cancer patients in Korea. World J Gastroenterol. 2010;16(2):256–63.

    Article  PubMed  PubMed Central  Google Scholar 

  48. Choi Y, Kim N, Kim KW, Jo HH, Park J, Yoon H, Shin CM, Park YS, Lee DH, Oh HJ, et al. Sex-based differences in histology, staging, and prognosis among 2983 gastric cancer surgery patients. World J Gastroenterol. 2022;28(9):933–47.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Mohebbi M, Mahmoodi M, Wolfe R, Nourijelyani K, Mohammad K, Zeraati H, Fotouhi A. Geographical spread of gastrointestinal tract cancer incidence in the Caspian Sea region of Iran: spatial analysis of cancer registry data. BMC Cancer. 2008;8:137.

    Article  PubMed  PubMed Central  Google Scholar 

  50. Malekzadeh R, Derakhshan MH, Malekzadeh Z. Gastric cancer in Iran: epidemiology and risk factors. Arch Iran Med. 2009;12(6):576–83.

    PubMed  Google Scholar 

  51. Kalan Farmanfarma K, Mahdavifar N, Hassanipour S, Salehiniya H. Epidemiologic study of gastric Cancer in Iran: a systematic review. Clin Exp Gastroenterol. 2020;13:511–42.

    Article  PubMed  PubMed Central  Google Scholar 

  52. Tam YH, Yeung CK, Lee KH, Sihoe JD, Chan KW, Cheung ST, Mou JW. A population-based study of Helicobacter pylori infection in Chinese children resident in Hong Kong: prevalence and potential risk factors. Helicobacter. 2008;13(3):219–24.

    Article  PubMed  Google Scholar 

  53. Nouraie M, Latifi-Navid S, Rezvan H, Radmard AR, Maghsudlu M, Zaer-Rezaii H, Amini S, Siavoshi F, Malekzadeh R. Childhood hygienic practice and family education status determine the prevalence of Helicobacter pylori infection in Iran. Helicobacter. 2009;14(1):40–6.

    Article  PubMed  Google Scholar 

  54. Lagergren J, Andersson G, Talbäck M, Drefahl S, Bihagen E, Härkönen J, Feychting M, Ljung R. Marital status, education, and income in relation to the risk of esophageal and gastric cancer by histological type and site. Cancer. 2016;122(2):207–12.

    Article  PubMed  Google Scholar 

  55. Stephens MR, Blackshaw GR, Lewis WG, Edwards P, Barry JD, Hopper NA, Allison MC. Influence of socio-economic deprivation on outcomes for patients diagnosed with gastric cancer. Scand J Gastroenterol. 2005;40(11):1351–7.

    Article  PubMed  Google Scholar 

  56. Slavin TP, Weitzel JN, Neuhausen SL, Schrader KA, Oliveira C, Karam R. Genetics of gastric cancer: what do we know about the genetic risks? Transl Gastroenterol Hepatol. 2019;4:55.

    Article  PubMed  PubMed Central  Google Scholar 

  57. Ren JS, Freedman ND, Kamangar F, Dawsey SM, Hollenbeck AR, Schatzkin A, Abnet CC. Tea, coffee, carbonated soft drinks and upper gastrointestinal tract cancer risk in a large United States prospective cohort study. Eur J Cancer. 2010;46(10):1873–81.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Sun WY, Yang H, Wang XK, Fan JH, Qiao YL, Taylor PR. The association between family history of upper gastrointestinal Cancer and the risk of death from upper gastrointestinal Cancer-based on Linxian dysplasia nutrition intervention trial (NIT) cohort. Front Oncol. 2022;12:897534.

    Article  PubMed  PubMed Central  Google Scholar 

  59. Thrift AP, Jove AG, Liu Y, Tan MC, El-Serag HB. Associations of duration, intensity, and quantity of smoking with risk of gastric intestinal metaplasia. J Clin Gastroenterol. 2022;56(1):e71–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Mao XQ, Jia XF, Zhou G, Li L, Niu H, Li FL, Liu HY, Zheng R, Xu N. Green tea drinking habits and gastric cancer in Southwest China. Asian Pac J Cancer Prev. 2011;12(9):2179–82.

    PubMed  Google Scholar 

  61. Xu L, Zhang X, Lu J, Dai JX, Lin RQ, Tian FX, Liang B, Guo YN, Luo HY, Li N, et al. The effects of dinner-to-bed time and post-dinner walk on gastric Cancer across different age groups: a multicenter case-control study in Southeast China. Medicine (Baltimore). 2016;95(16):e3397.

    Article  PubMed  Google Scholar 

  62. Huang L, Chen L, Gui ZX, Liu S, Wei ZJ, Xu AM. Preventable lifestyle and eating habits associated with gastric adenocarcinoma: a case-control study. J Cancer. 2020;11(5):1231–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Wu B, Yang D, Yang S, Zhang G. Dietary salt intake and gastric Cancer risk: a systematic review and Meta-analysis. Front Nutr. 2021;8:801228.

    Article  PubMed  PubMed Central  Google Scholar 

  64. Thapa S, Fischbach LA, Delongchamp R, Faramawi MF, Orloff M. Association between dietary salt intake and progression in the gastric precancerous process. Cancers (Basel). 2019;11(4):467.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Ge S, Feng X, Shen L, Wei Z, Zhu Q, Sun J. Association between habitual dietary salt intake and risk of gastric Cancer: a systematic review of observational studies. Gastroenterol Res Pract. 2012;2012:808120.

    Article  PubMed  PubMed Central  Google Scholar 

  66. Shen JG, Jin LD, Dong MJ, Wang LB, Zhao WH, Shen J. Low level of serum high-density lipoprotein cholesterol in gastric cancer correlates with cancer progression but not survival. Transl Cancer Res. 2020;9(10):6206–13.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Miao P, Guan L. Association of Dietary Cholesterol Intake with Risk of gastric Cancer: a systematic review and Meta-analysis of observational studies. Front Nutr. 2021;8:722450.

    Article  PubMed  PubMed Central  Google Scholar 

  68. Shin HJ, Roh CK, Son SY, Hoon H, Han SU. Prognostic value of hypocholesterolemia in patients with gastric cancer. Asian J Surg. 2021;44(1):72–9.

    Article  PubMed  Google Scholar 

  69. Pih GY, Gong EJ, Choi JY, Kim MJ, Ahn JY, Choe J, Bae SE, Chang HS, Na HK, Lee JH, et al. Associations of serum lipid level with gastric Cancer risk, pathology, and prognosis. Cancer Res Treat. 2021;53(2):445–56.

    Article  CAS  PubMed  Google Scholar 

Download references


The author would like to sincerely thank Professor Emeritis Olof Nyren, (Karolinska Institutet, Stockholm, Sweden) for his expert and generous guidance in formulating the interview questionnaire. We would also like to thank every interviewer who have generously assisted in obtaining the data on our patient population over the years.


This study was supported by a technical assistance grant (IRN-072), which was co-funded by the Islamic Development Bank, Saudi Arabia and Pasteur Institute of Iran.

Author information

Authors and Affiliations



KG: Modelling, methodology, computational analysis, drafting the manuscript, and assistance in conceiving the concept and study design; SS: Data collection, assistance in conceiving the concept, study design and drafting the manuscript; ME: Data collection, database entry and management; MT: Data collection, MEH: Clinical data collection on endoscopy patients; AN: Data collection; AK: Methodology; MAM: Clinical data collection on gastric cancer patients; MM: Conceptualization, study design, supervision, drafting, revising and finalizing the manuscript.

Corresponding authors

Correspondence to Anoshirvan Kazemnejad or Marjan Mohammadi.

Ethics declarations

Ethics approval and consent to participate

All methods were carried out following relevant guidelines and regulations. Data collections were carried out following obtaining the participants’ written informed consent, according to the protocols approved by the National Committee on Ethical Issues in Medical Research, Ministry of Health and Medical Education (MOHME) of Iran; Ref No. 315.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Marjan Mohammadi is the main corresponding author and Anoshirvan Kazemnejad is the co-corresponding author.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gohari, K., Saberi, S., Esmaieli, M. et al. Development of a gastric cancer risk calculator for questionnaire-based surveillance of Iranian dyspeptic patients. BMC Gastroenterol 24, 39 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: