Deep learning as a novel method for endoscopic diagnosis of chronic atrophic gastritis: a prospective nested case–control study

Zhao, Quchuan; Jia, Qing; Chi, Tianyu

doi:10.1186/s12876-022-02427-2

Research
Open access
Published: 25 July 2022

BMC Gastroenterology volume 22, Article number: 352 (2022)

Abstract

Background and aims

Chronic atrophic gastritis (CAG) is a precancerous disease that often leads to the development of gastric cancer (GC) and is positively correlated with GC morbidity. However, the sensitivity of the endoscopic diagnosis of CAG is only 42%. Therefore, we developed a real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net deep learning (DL) and conducted a prospective nested case–control study to evaluate the diagnostic evaluation indices of the model and its consistency with pathological diagnosis.

Methods

Our cohort consisted of 1539 patients undergoing gastroscopy from December 1, 2020, to July 1, 2021. Based on pathological diagnosis, patients in the cohort were divided into the CAG group or the chronic nonatrophic gastritis (CNAG) group, and we assessed the diagnostic evaluation indices of this model and its consistency with pathological diagnosis after propensity score matching (PSM) to minimize selection bias in the study.

Results

After matching, the diagnostic evaluation indices and consistency evaluation of the model were better than those of endoscopists [sensitivity (84.02% vs. 62.72%), specificity (97.04% vs. 81.95%), positive predictive value (96.60% vs. 77.66%), negative predictive value (85.86% vs. 68.73%), accuracy rate (90.53% vs. 72.34%), Youden index (81.06% vs. 44.67%), odds product (172.5 vs. 7.64), positive likelihood ratio (28.39 vs. 3.47), negative likelihood ratio (0.16 vs. 0.45), AUC (95% CI) [0.909 (0.884–0.934) vs. 0.740 (0.702–0.778)] and Kappa (0.852 vs. 0.558)].

Conclusions

Our prospective nested case–control study proved that the diagnostic evaluation indices and consistency evaluation of the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL were superior to those of endoscopists.

Trial registration ChiCTR2100044458, 18/03/2020.

Introduction

China has a high gastric cancer (GC) morbidity. Every year, there are approximately 400,000 new cases of GC and approximately 350,000 deaths. These new cases and deaths account for 40% of the world's GC cases. The prognosis of GC is closely related to the timing of diagnosis and treatment. Early detection, early diagnosis and early treatment of cancer have always been the main strategies to reduce mortality and improve survival [1]. Chronic atrophic gastritis (CAG) is a precancerous disease of GC that positively correlates with GC morbidity [2, 3]. Early diagnosis and treatment of CAG is an efficient and feasible way to improve the current state of GC diagnosis and treatment. However, if pathological diagnosis is taken as the "gold standard", the sensitivity of the endoscopic diagnosis of CAG is only 42% [4]. Therefore, determining how to improve the diagnostic rate of CAG and the coincidence rate between endoscopy and pathological diagnosis has been a focus of clinical attention.

In recent years, artificial intelligence (AI) has made breakthroughs in the field of image recognition. In particular, the emergence of deep learning (DL) has eliminated the need for the artificial extraction of data features, which is inefficient and incomplete [5,6,7]. Although the DL technique combined with digestive endoscopy has become one of the hot topics in the field of digestive research [8,9,10], research on the application of DL in the identification of CAG remains less common; most of the research on training and validation of the model has used static images from retrospective data, and less of the research has been on the identification of real-time video monitoring [11,12,13].

Therefore, we developed a real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL. Our team previously proved through a prospective cohort study that the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL can improve the endoscopic diagnosis rate of CAG compared with that of endoscopists [14]. To further verify the performance of the model, we enrolled additional patients into the cohort and conducted a prospective nested case–control study to evaluate the diagnostic evaluation indices of the model and its consistency with pathological diagnosis, using pathological diagnosis as the gold standard.

Methods

Sample size calculation

PASS 15 (NCSS, LLC, Kaysville, Utah) was used to calculate the sample size. We planned to use patients in the cohort to conduct a prospective nested case–control study to verify the sensitivity, specificity and other diagnostic evaluation indices of the DL model for CAG. The operational process was as follows: Proportions → One Proportion → Confidence Interval → Confidence Interval for One Proportion. According to the guidelines, with pathological diagnosis as the "gold standard", the sensitivity and specificity of endoscopic diagnosis of atrophy are only 42% and 91%, respectively [4]. We assumed that the DL model could improve the sensitivity by 50%, and we set α = 0.05 and the confidence interval = 10%. Based on the estimation of the minimum sample size required, with equal sample sizes in the two groups, 93 samples were required for each of the CAG group and the CNAG group.
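The sample size above was obtained with PASS 15. The sketch below is not that procedure. It is a minimal normal-approximation (Wald) calculation under one possible reading of the stated assumptions, namely that a 50% improvement is relative (0.42 × 1.5 = 0.63) and that the 10% figure is the half-width of the interval. Both readings, and the function name `wald_sample_size`, are assumptions made for illustration. Under them the formula n = z²·p(1 − p)/d² gives a figure close to, but not identical with, the reported 93 per group; exact interval methods may require somewhat more subjects than this approximation.

```python
import math
from statistics import NormalDist


def wald_sample_size(p, half_width, alpha=0.05):
    """Smallest n for which a two-sided Wald interval for a proportion p
    has the requested half-width (normal approximation only)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return math.ceil(z * z * p * (1 - p) / half_width ** 2)


baseline_sensitivity = 0.42                      # from the guidelines cited in the text
assumed_sensitivity = baseline_sensitivity * 1.5  # ASSUMPTION: relative 50% improvement
n_per_group = wald_sample_size(assumed_sensitivity, half_width=0.10)  # ASSUMPTION: half-width
print(n_per_group)
```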

Study design and participants

We performed a prospective nested case–control study. Our cohort consisted of 1539 patients who were at least 18 years old, volunteered to participate in this study and underwent gastroscopy in the digestive endoscopy center of our hospital from December 1, 2020 to July 1, 2021. This study protocol (XWKD-2020086) was approved by the ethics committee of Xuanwu Hospital of Capital Medical University. Written informed consent was obtained from all participants in the study.

The endoscopist diagnosis procedure: In accordance with the guidelines [4], the endoscopist routinely took 3 biopsies from the gastric antrum, gastric angle and gastric body for each patient during the process of gastroscopy operation; additionally, another biopsy was taken from the suspected atrophy site. Olympus GIF-HQ290 was used to perform gastroscopy for patients, and Boston Scientific Radial Jaw 4 biopsy forceps were used to take biopsies.

The DL model diagnosis procedure: In synchrony with the endoscopist's observations, the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL marked the suspected atrophy sites during real-time video monitoring of the same patient, after which the assistant informed the endoscopist to biopsy the suspected atrophy sites labeled by the DL model. If a suspected atrophy site labeled by the DL model overlapped with a suspected atrophy site observed by the endoscopist, no additional biopsy was needed.

Based on the pathological results of the biopsy tissue, the patients in the cohort were divided into either the CAG group or the CNAG group, and the diagnostic evaluation indices of this model for the endoscopic diagnosis of CAG and its consistency with pathological diagnosis were evaluated.

The exclusion criteria were as follows: (1) Patients who could not tolerate gastroscopy and did not complete the procedure; (2) Patients who were found during gastroscopy to have lesions other than chronic gastritis, such as peptic ulcers or gastrointestinal malignancies; (3) Patients with contraindications to biopsy, such as taking anticoagulant or anti-platelet drugs; and (4) Patients who requested withdrawal from the study during gastroscopy.

Diagnosis of chronic atrophic gastritis

All procedures were carried out by experienced endoscopists who had performed more than 10,000 gastroscopy procedures and held the title of associate chief physician or above. According to the guidelines [4], atrophic gastritis is diagnosed when a pathological biopsy of chronic gastritis shows atrophy of the inherent glands, regardless of the number of affected areas or the degree of atrophy in the biopsy specimen. Biopsy histopathology is very important for the diagnosis of CAG, and biopsy should be performed according to the pathological conditions and needs. For clinical diagnosis, it is recommended to take 3 pieces of tissue for biopsy from the gastric antrum, the gastric angle and the greater curvature of the middle part of the gastric body. Another biopsy was taken from suspected lesions. Specimens should be large enough to reach the muscularis mucosae [15]. The severity of atrophy was estimated by the reduction in the number of inherent glands in the stomach (mild: reduced by no more than 1/3 of the original glands; moderate: reduced by between 1/3 and 2/3 of the original glands; severe: reduced by more than 2/3 of the original glands, with only a few remaining glands or even complete disappearance). The severity of CAG can be divided into mild, moderate and severe according to the pathological conditions, or into C type and O type according to the range of lesions [16]. The best noninvasive method to assess Helicobacter pylori (HP) infection is the urea breath test (C13), with a delta over baseline (DOB) ≥ 4 considered positive.

The real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net deep learning

With the rapid development of DL technology, the application of DL in medical imaging has attracted extensive research attention, and how to automatically identify and segment lesions in medical images is one of the most closely studied problems. To address this problem, the U-Net network model was proposed [17, 18]. It is based on a fully convolutional network (FCN) and consists of an encoder, a bottleneck module and a decoder. Because its U-shaped structure combines context information, trains quickly and requires relatively little data, it can meet the demands of medical image segmentation [19]. Classical DL models for image recognition require a large amount of training data, which is difficult to obtain for medical images; U-Net compensates for this limitation. The main idea of U-Net is to append to the contracting network a similar network in which the pooling operators are replaced by upsampling operators, so that these layers increase the output resolution [20]. For localization, high-resolution features from the contracting path are combined with the upsampled output, and the subsequent convolutional layers can then learn to assemble a more accurate output based on this information. Since being proposed, the U-Net network has been widely used in medical image segmentation and has become the baseline model for most semantic segmentation tasks on medical images [21, 22]. This study builds a DL-based endoscopic diagnosis model for CAG by applying U-Net.

The INPAINT_TELEA algorithm is used to process watermarks in certain areas of the gastroscopic image, such as age, gender, time and system. The objective is to remove sensitive information related to patients and avoid white watermark interference in atrophic gastritis image recognition.

The performance of the model was measured with the Dice similarity coefficient and intersection over union (IoU), which are commonly used evaluation indices for medical image segmentation. Dice and IoU both measure the similarity between two sets; in image segmentation, they measure the similarity between the network's segmentation result and the standard mask.
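As an illustration of the two indices, the sketch below computes Dice and IoU for a pair of binary masks using their standard definitions, Dice = 2|A ∩ B| / (|A| + |B|) and IoU = |A ∩ B| / |A ∪ B|. The function name `dice_and_iou`, the toy masks and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the study.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two equally sized binary masks,
    given as nested lists of 0/1 values."""
    intersection = pred_area = target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p & t
            pred_area += p
            target_area += t
    union = pred_area + target_area - intersection
    if union == 0:
        # ASSUMPTION: two empty masks are treated as a perfect match.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 2 x 3 masks (hypothetical): prediction vs. labeled lesion mask.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
labeled_mask = [[1, 0, 0],
                [0, 1, 1]]
dice, iou = dice_and_iou(predicted_mask, labeled_mask)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

The two indices are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank segmentations identically; Dice is never smaller than IoU for the same pair of masks.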

Cross-entropy is used as the loss function with the Adam optimizer; the initial learning rate is 0.01 and the attenuation rate is 0.00003. The training hardware platform is a single-card server. The CPU is Intel Xeon (Cascade Lake) Platinum 8269 2.5 GHz, and the GPU is NVIDIA A100.

Our model analyzes images automatically and in real time during gastroscopy. It extracts and stores clear images of all examined sites and atrophic lesions from the full video and arranges them according to the operation sequence of the international standard [23]. Each atrophic lesion in each patient was automatically labeled, and the severity of atrophy was assessed. After the examination, the diagnosis of CAG at the patient level was made according to the guidelines [4].

Deep learning model training and testing

In this study, a U-Net network was used to build a real-time video monitoring endoscopic diagnosis model for CAG based on DL.

This was done in three steps. The first was the preparation of the dataset: based on the pathological diagnosis, 5290 high-quality endoscopic images of 1711 patients who underwent gastroscopy in our hospital from August 1, 2019 to August 1, 2020 were labeled by two gastroenterologists, each of whom had performed more than 10,000 gastroscopy procedures and held the title of associate chief physician or above. A total of 4175 images of CAG were labeled, including 2389 images of mild atrophic gastritis, 977 images of moderate atrophic gastritis and 809 images of severe atrophic gastritis. In addition, 1115 images of CNAG were labeled. Then, stratified by the severity of atrophy, 70% of the images were randomly assigned to the training set and 30% to the test set. The model was tuned by fivefold cross-validation with the 3703 gastroscopy images of the training set.
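A stratified 70/30 split of this kind can be sketched as follows. The class counts are those reported above; the function name `stratified_split`, the random seed and the per-stratum rounding rule are assumptions for illustration, so the resulting set sizes land close to the reported 3703 and 1587 images but may differ by an image or two.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split item indices into train/test so that each label keeps
    roughly the same proportion in both sets."""
    rng = random.Random(seed)  # ASSUMPTION: arbitrary fixed seed
    indices_by_label = {}
    for index, label in enumerate(labels):
        indices_by_label.setdefault(label, []).append(index)

    train, test = [], []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_frac)  # ASSUMPTION: per-stratum rounding
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Image counts per class as reported in the text.
class_counts = {"mild": 2389, "moderate": 977, "severe": 809, "CNAG": 1115}
labels = [name for name, count in class_counts.items() for _ in range(count)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```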

The second step was the definition of the model structure. The left half was the encoder: two 3 × 3 convolution layers (ReLU) followed by a 2 × 2 max-pooling layer formed one subsampling module, and four subsampling modules connected in series formed the encoder. The encoder was then connected to the decoder in the right half, which repeated the following unit: a deconvolution layer, feature concatenation (concat) and two 3 × 3 convolution layers (ReLU).
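To make the encoder–decoder layout concrete, the sketch below traces feature-map shapes through four subsampling modules, a bottleneck and four decoder units. It is a shape calculation only, not the study's implementation. The input size (256), the channel widths (64 doubling at each level), size-preserving padding in the 3 × 3 convolutions and the final 1 × 1 output convolution are all assumptions borrowed from common U-Net configurations; the text does not state them.

```python
def trace_unet(size=256, base_channels=64, depth=4, in_channels=3, out_channels=1):
    """Return a list of (stage, channels, spatial_size) for a U-Net-style network.
    ASSUMPTIONS: convolutions preserve spatial size; channels double per level."""
    shapes = [("input", in_channels, size)]
    skips = []
    spatial = size

    # Encoder: (two 3x3 convs) -> 2x2 max pooling, repeated `depth` times.
    for level in range(depth):
        channels = base_channels * 2 ** level
        shapes.append((f"enc{level + 1} conv x2", channels, spatial))
        skips.append((channels, spatial))  # kept for the skip connection
        spatial //= 2
        shapes.append((f"enc{level + 1} pool", channels, spatial))

    channels = base_channels * 2 ** depth
    shapes.append(("bottleneck conv x2", channels, spatial))

    # Decoder: deconvolution -> concat with encoder features -> two 3x3 convs.
    for level in reversed(range(depth)):
        skip_channels, skip_spatial = skips[level]
        spatial *= 2
        up_channels = channels // 2
        assert spatial == skip_spatial  # sizes must agree before concatenation
        shapes.append((f"dec{level + 1} deconv", up_channels, spatial))
        shapes.append((f"dec{level + 1} concat", up_channels + skip_channels, spatial))
        channels = skip_channels
        shapes.append((f"dec{level + 1} conv x2", channels, spatial))

    shapes.append(("output 1x1 conv", out_channels, spatial))  # ASSUMPTION
    return shapes


unet_shapes = trace_unet()
for stage, channels, spatial in unet_shapes:
    print(f"{stage:20s} {channels:5d} x {spatial} x {spatial}")
```

The trace shows why the concatenation step matters: each decoder unit receives the upsampled features plus the same-resolution encoder features, which is how high-resolution detail from the contracting path reaches the output.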

The third step was the training of the model. The training process involved taking a training image as the input of the U-Net model and obtaining the output after the model processing. The output was compared with the labeled gastritis lesions, and the current loss value was calculated according to the loss function. The loss was propagated backward through the network, the gradient of the parameters was calculated at each layer, and the parameters were updated according to the gradient. The loss function here is the BCEWithLogitsLoss function, and the parameters were updated with the adaptive optimization algorithm RMSProp. The whole dataset was divided into several batches, and the above process was repeated for each batch to update the model parameters until convergence. After all batches of training were completed, the new model parameters fit the characteristics of the training data well and were suitable for the diagnostic task of CAG.
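The loss and update rule named above can be illustrated on a single scalar parameter. The sketch below writes out binary cross-entropy on logits in its numerically stable form, max(x, 0) − x·y + log(1 + e^(−|x|)), and one RMSProp step, v ← αv + (1 − α)g², w ← w − lr·g / (√v + ε). It is a plain-Python illustration of the formulas, not the study's training code; the toy one-weight "model" and the values of `lr`, `alpha` and `eps` are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_with_logits(logit, target):
    """Binary cross-entropy computed directly from a logit (stable form)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def rmsprop_step(weight, grad, sq_avg, lr=0.01, alpha=0.99, eps=1e-8):
    """One RMSProp update; hyperparameter values here are ASSUMPTIONS."""
    sq_avg = alpha * sq_avg + (1 - alpha) * grad * grad
    weight = weight - lr * grad / (math.sqrt(sq_avg) + eps)
    return weight, sq_avg


# Toy problem (hypothetical): one weight, one pixel with feature x and label y.
weight, sq_avg = 0.0, 0.0
x, y = 1.0, 1.0
losses = []
for _ in range(50):
    logit = weight * x
    losses.append(bce_with_logits(logit, y))
    grad = (sigmoid(logit) - y) * x  # d(loss)/d(weight)
    weight, sq_avg = rmsprop_step(weight, grad, sq_avg)

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.4f}")
```

Working on logits rather than on probabilities avoids taking the logarithm of a value that has underflowed to zero, which is the usual reason for fusing the sigmoid into the loss.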

After the model training was completed, we tested the model using 1587 endoscopic images. The sensitivity, specificity and accuracy of the model for the endoscopic diagnosis of CAG were 92.73%, 92.24% and 92.63%, respectively.

Outcomes

Our primary outcome was to conduct a nested case–control study and to use the pathological diagnosis as the gold standard to study the sensitivity, specificity, accuracy and other diagnostic evaluation indices of the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL, evaluate its consistency with pathological diagnosis, and draw its receiver operating characteristic (ROC) curve.

Our secondary outcome was to use the pathological diagnosis as the gold standard to conduct subgroup analysis to evaluate the sensitivity, specificity, accuracy and other diagnostic evaluation indices of the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL in the diagnosis of mild, moderate and severe CAG.

Statistical analysis

We assessed the diagnostic evaluation indices of the DL model after propensity score matching (PSM) to minimize the selection bias in this real-world study (RWS).

Given the differences in the baseline characteristics between eligible participants in the two groups (Table 1), PSM was used to identify a cohort of patients with similar baseline characteristics. The propensity score is the conditional probability of belonging to a particular group (CAG vs. CNAG) given a set of measured baseline covariates [24]. The propensity score was estimated with the use of a nonparsimonious multivariate logistic regression model, with CAG as the dependent variable and all the baseline characteristics outlined in Table 1 as covariates. Matching was performed with the use of a 1:1 matching protocol without replacement (nearest-neighbor matching algorithm), with a caliper width equal to 0.2 of the standard deviation of the logit of the propensity score [25]. Standardized differences were estimated for all the baseline covariates before and after matching to assess prematch imbalance and postmatch balance. Standardized differences of less than 0.1 for a given covariate indicate a relatively small imbalance [26].

Table 1 Baseline characteristics before and after propensity score matching

Continuous variables are expressed as the mean and standard deviation (SD) or median and interquartile range (IQR) for skewed data, and categorical variables are expressed as frequencies (%). Continuous variables were compared using the t-test if normally distributed and the Mann–Whitney U test if not. Categorical variables were compared using the chi-square test or Fisher’s exact test. Using the data for the propensity-matched patients, ROC curves were constructed to assess sensitivity, specificity and respective areas under the curves (AUCs) with 95% CIs.

A two-tailed P value < 0.05 was considered statistically significant. All of the analyses were conducted using SPSS software, version 23.0 (IBM Corp., Armonk, NY, USA).

Sensitivity analysis

To test the robustness of the main results, several additional analyses were conducted. First, using the data for all the patients before matching, we assessed the diagnostic evaluation indices of the DL model. Second, subgroup analysis with the data before matching was also conducted by stratifying CAG patients into mild, moderate or severe groups.

Results

Study population

Figure 1 shows the study flowchart. A total of 1539 patients who underwent gastroscopy in the digestive endoscopy center of our hospital were enrolled in the study. A total of 408 patients were excluded for the following reasons: patients who were unable to tolerate gastroscopy and did not complete the procedure (n = 26, 1.7%); patients with peptic ulcer disease found during gastroscopy (n = 107, 7.0%); patients with gastrointestinal malignancies found during gastroscopy (n = 18, 1.2%); patients with gastric polyps found during gastroscopy (n = 34, 2.2%); patients with contraindications to biopsy, such as taking anticoagulant or antiplatelet drugs (n = 194, 12.6%); and patients who requested withdrawal from the study during gastroscopy (n = 29, 1.9%).

A total of 1131 patients constituted the study cohort, including 338 (29.9%) patients in the CAG group and 793 (70.1%) patients in the CNAG group. Before PSM, there were differences between the two groups in several of the baseline variables (Table 1). With the use of PSM, 338 CAG patients were matched with 338 CNAG patients. After matching, the standardized differences were less than 0.1 for all variables, indicating only small differences between the two groups (Table 1).

Primary outcomes

We conducted a nested case–control study with the present cohort. After matching and taking pathological diagnosis as the gold standard, the diagnostic evaluation indices and consistency evaluation of the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL were better than those of endoscopists (Table 2, Fig. 2).

Table 2 Diagnostic evaluation indices and the evaluation of consistency with pathological diagnosis in the deep learning group and endoscopist group before and after propensity score matching

Secondary outcomes

With pathological diagnosis as the gold standard, subgroup analysis was conducted. After matching, the sensitivity, specificity, accuracy and other diagnostic evaluation indices of the real-time video monitoring diagnosis model for CAG based on U-Net DL were better than those of endoscopists in the diagnosis of mild, moderate and severe CAG (Table 3).

Table 3 Diagnostic evaluation indices in the deep learning group and endoscopist group after propensity score matching in subgroups for the severity of CAG

Sensitivity analysis

We performed statistical analysis on all patients before matching and obtained similar results (Table 2).

Subgroup analysis with the data before matching was also conducted by stratifying CAG patients into the mild, moderate or severe groups. With pathological diagnosis as the gold standard, in all subgroup analyses before matching, the sensitivity, specificity, accuracy and other diagnostic evaluation indices of the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL were better than those of endoscopists in the diagnosis of mild, moderate and severe CAG (Table 4).

Table 4 Diagnostic evaluation indices in the deep learning group and endoscopist group before propensity score matching in subgroups for the severity of CAG

Discussion

The morbidity and mortality of GC in China rank first in the world. Reducing the morbidity and mortality of GC in China is one of the major public health problems that urgently need to be solved [1]. A prospective study of 1592 patients with CAG by Chinese scholars revealed the progression of CAG. Among them, 23 patients (1.44%) had GC due to CAG, and 349 patients (21.92%) had atypical hyperplasia. As age increased, atrophy and intestinal metaplasia deteriorated in more than 35% of patients [27]. Early detection and diagnosis of CAG can prevent the formation of GC to a certain extent, but the difficulty of diagnosis and the rate of missed diagnoses have brought great challenges to endoscopists [28]. According to the "Consensus of Chronic Gastritis in China", the endoscopic manifestations of CAG are red and white mucosa, mainly white mucosa, folds that flatten or even disappear, and exposure of some mucosal vessels. These features may be accompanied by mucosal granules or nodules [4]. However, in clinical practice, identifying mucosal atrophy is mainly based on the subjective impression of endoscopists and depends on their understanding of the guidelines, previous operating experience, and the standard training level conducted by the hospital and other factors. Therefore, the diagnosis of CAG solely dependent on endoscopists is uncertain and varies greatly [29]. Studies have shown that the proportion of endoscopic diagnoses for CAG varies greatly in different regions and in different hospitals in the same region, fluctuating from 17.7 to 39.8%, and the sensitivity of the endoscopic diagnosis of CAG is only 42% [28]. Endoscopic atrophy classification exhibited a significant correlation between histological atrophy and intestinal metaplasia, and represents a noninvasive classification method [16, 30]. 
Endoscopic grading can predict histological atrophy with few false negatives, indicating that precancerous conditions can be identified during screening endoscopy [31, 32]. Therefore, it is particularly important to improve the endoscopic diagnosis rate of CAG. Determining how to achieve consistent and accurate early detection and diagnosis of CAG by every endoscopist has always been a difficult problem that clinical guidelines have been trying but have been unable to solve.

DL is an improvement of artificial neural networks, which are composed of more layers of neural networks, allowing the higher layer to contain more abstract information for data prediction. To date, DL has become the leading machine learning tool in the field of computer vision [5, 19, 33]. A typical convolutional neural network (CNN) model used for image processing in DL consists of a series of convolutional networks, including a series of convolutional layers, pooling layers and fully connected layers. Like low-level visual processing in the human brain, convolutional network detection extracts image features, such as lines or circles that might represent straight edges (such as organ detection) or circles (colon polyp detection), followed by higher-order features, such as local and global shape or texture feature extraction [6, 34]. CNN needs to acquire a large amount of training data, while medical images have difficulty obtaining such large-scale data [11]. Therefore, a kind of network model, namely U-Net, which is especially suitable for biomedical image processing tasks, is emerging at the right moment. The main idea of U-Net is to supplement a network similar to the previous one after the contraction network, in which the pooling operator is replaced by the upsampling operator. Therefore, these layers increase the resolution of the output. For localization, the high-resolution features from the contraction path are combined with the upsampled output. The continuous convolutional layer can then learn to assemble a more accurate output based on this information [35,36,37]. Since being proposed, the U-Net network has been widely used in medical image segmentation. U-Net was first published in MICCAI in 2015 and then became the baseline model for most of the semantic segmentation tasks of medical images [38]. It also inspired a large number of researchers to think about U-shaped semantic segmentation networks. 
In the field of natural image understanding, an increasing number of state-of-the-art (SOTA) semantic segmentation and object detection models have adopted U-shaped structures [39,40,41].

The application of DL combined with digestive endoscopy has become a research hotspot, especially for the diagnosis of upper digestive tract diseases [42]. At present, the main research directions focus on DL-assisted detection of Barrett's esophagus, esophageal cancer, GC and Helicobacter pylori infection, and on DL-assisted identification of anatomical sites, especially for early cancer [43]. Some scholars have applied AI to the study of traditional endoscopy. After training on upper digestive tract endoscopic images, the sensitivity for malignant lesions was as high as 98%, and the negative predictive value was 95%, but the positive predictive value was only 40%. It is possible to improve the positive predictive value by increasing the number of learning samples [8]. By using a large number of traditional endoscopic images and the convolutional neural network in the DL algorithm, some scholars have established a computer-aided diagnosis (CAD) system capable of automatic detection of early GC. It can identify lesions quickly and has a sensitivity of 92%, indicating that the CAD system with this algorithm as the core has strong clinical diagnostic ability [44]. While many scholars focus on early cancer of the upper digestive tract, our study focuses on chronic atrophic gastritis, a lesion that precedes early gastric cancer, so as to move the point of intervention earlier and more effectively reduce the occurrence of GC. Studies have shown that the accuracy, sensitivity and specificity of the convolutional neural network model for the diagnosis of atrophic gastritis are 0.942, 0.945 and 0.940, respectively, which are all higher than those of ordinary endoscopic experts, while the detection rates of mild, moderate and severe atrophic gastritis are 93%, 95% and 99%, respectively [12, 29]. 
However, the data used for the training and validation of the model in the above studies were all retrospective endoscopic static images, and the data were artificially preliminarily screened, thus lacking prospective research results. At present, prospective studies mainly focus on the recognition of static images, while the recognition of real-time surveillance video is limited. Our study extends the above studies well, develops a U-Net DL model for the diagnosis of CAG that can be applied in real-time video monitoring of gastroscopy, and conducts a prospective nested case–control study using PSM.

In our study, pathological diagnosis was taken as the gold standard, and the diagnostic evaluation indices and consistency evaluation of the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL were better than those of endoscopists. The sensitivity (84.02% vs. 62.72%) and specificity (97.04% vs. 81.95%) showed that the model had a good ability to detect CAG and identify CNAG. The positive predictive value (96.60% vs. 77.66%) and negative predictive value (85.86% vs. 68.73%) showed that patients with a positive diagnosis by the model were more likely to have CAG, and patients with a negative diagnosis were more likely to have CNAG. The accuracy rate (90.53% vs. 72.34%) showed that the overall ability to diagnose CAG and CNAG was good. The Youden index (81.06% vs. 44.67%) showed that the model had greater validity. The odds product (172.5 vs. 7.64) showed that the diagnostic value of this model was high. The positive likelihood ratio (28.39 vs. 3.47) and negative likelihood ratio (0.16 vs. 0.45) indicated that the model had a good ability to detect CAG and identify CNAG when excluding the influence of prevalence. The AUC (95% CI) was 0.909 (0.884–0.934) vs. 0.740 (0.702–0.778), and the Kappa was 0.852 vs. 0.558. The AUC of this model was > 0.9, indicating a high diagnostic accuracy. The Kappa of this model was > 0.8, indicating good consistency with pathological diagnosis. Accurate diagnosis of CAG has always been difficult in gastroscopy, and the reported sensitivity of endoscopic diagnosis is only 42%. The basic reason for the low sensitivity and accuracy of endoscopists in the diagnosis of CAG is that the diagnosis is mainly made through the subjective observation of gastric mucosal morphological characteristics under gastroscopy and lacks quantitative indicators. At the same time, only the morphological description of CAG is given in the guidelines, and there is no quantitative standard. 
Therefore, the subjective judgment of endoscopists is likely to lead to misdiagnosis and missed diagnoses, and even the same doctor may draw different conclusions when observing the same case at different times. Our model well makes up for the above deficiencies. As a physician's assistant, it can objectively, stably and efficiently diagnose CAG.
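The indices discussed above all derive from a 2 × 2 table of diagnosis against pathology, as the sketch below shows. The cell counts are not given in the text; they are back-calculated here from the reported percentages under the assumption of 338 patients per matched group, so they are illustrative only. With these counts the computed indices agree with the reported values to within rounding (for example, the Youden index and positive likelihood ratio can differ in the last digit depending on whether rounded or unrounded sensitivity and specificity are used).

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Standard diagnostic evaluation indices from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "youden": sensitivity + specificity - 1,
        "odds_product": (tp * tn) / (fp * fn),  # diagnostic odds ratio
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# ASSUMPTION: counts reconstructed from the reported percentages
# with 338 CAG and 338 CNAG patients after matching.
dl_model = diagnostic_indices(tp=284, fp=10, fn=54, tn=328)
endoscopists = diagnostic_indices(tp=212, fp=61, fn=126, tn=277)

for name in dl_model:
    print(f"{name:13s} DL = {dl_model[name]:8.4f}   endoscopists = {endoscopists[name]:8.4f}")
```

Because sensitivity, specificity and the likelihood ratios are computed within the CAG and CNAG groups separately, they do not depend on prevalence, whereas the predictive values do; in a 1:1 matched sample the predictive values reflect an artificial 50% prevalence.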

We also conducted a subgroup analysis. After matching, the sensitivity, specificity, accuracy and other diagnostic evaluation indices of the U-Net DL-based endoscopic diagnosis model were better than those of endoscopists in the diagnosis of mild, moderate and severe CAG. The sensitivity (72.12% vs. 39.42%, 85.71% vs. 62.59%, 95.40% vs. 90.80%, respectively) and the specificity (97.04% vs. 81.95% in all three subgroups) showed that the model's ability to detect mild, moderate and severe CAG was superior to that of endoscopists. The positive predictive values (88.34% vs. 40.20%, 92.65% vs. 60.13%, 89.25% vs. 56.43%, respectively) and the negative predictive values (91.88% vs. 81.47%, 93.98% vs. 83.43%, 98.80% vs. 97.19%, respectively) indicated that a diagnosis of mild, moderate or severe CAG made by the model was more likely to be confirmed by pathology than one made by endoscopists, and that the same held for a diagnosis of CNAG. The accuracy rate (91.18% vs. 71.95%, 93.61% vs. 76.08%, 96.71% vs. 83.76%, respectively) showed that the model's ability to diagnose mild, moderate and severe CAG and CNAG was better than that of endoscopists. The Youden index (69.16% vs. 21.37%, 82.75% vs. 44.54%, 92.44% vs. 72.75%, respectively) showed that the model's diagnoses of mild, moderate and severe CAG were more authentic than those of endoscopists. The odds product (84.83 vs. 2.96, 196.8 vs. 7.6, 680.6 vs. 44.84, respectively) showed that the model had greater diagnostic value than endoscopists for mild, moderate and severe CAG. The positive likelihood ratio (24.36 vs. 2.18, 28.96 vs. 3.47, 32.23 vs. 5.03, respectively) and the negative likelihood ratio (0.29 vs. 0.74, 0.15 vs. 0.46, 0.05 vs. 0.11, respectively) showed that, when the influence of prevalence was excluded, the ability of the model to detect mild, moderate and severe CAG and to identify CNAG was better than that of endoscopists.

Consensus guidelines point out that moderate to severe CAG carries a certain risk of cancer, and the Operative Link on Gastritis Assessment (OLGA) system, which is based on the severity of CAG, is an important tool for the clinical assessment of GC risk. Studies have shown that the gastritis stage remains unchanged in the vast majority of OLGA 0-II patients, whereas cancer occurs in OLGA III and IV patients. A high-risk OLGA stage of III or IV is closely related to a high risk of GC, but the consistency rate between endoscopists' judgment and pathological diagnosis is relatively low [45]. Our model addresses these problems and can assist endoscopists in accurately judging the severity of CAG, helping to avoid missed diagnoses in high-risk populations and to prevent the occurrence of GC.
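The prevalence-independent indices reported in the subgroup analysis can be recomputed from sensitivity and specificity alone. The following is a minimal sketch, not the authors' analysis code: it uses the standard textbook definitions, assumes that the "odds product" is the diagnostic odds ratio (LR+ divided by LR−), and takes the mild-CAG sensitivity and specificity from the text as inputs. The function and class names are illustrative, and the prevalence values passed to `predictive_values` are hypothetical, chosen only to show that predictive values shift with prevalence while likelihood ratios do not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticIndices:
    youden: float        # sensitivity + specificity - 1
    lr_positive: float   # sensitivity / (1 - specificity)
    lr_negative: float   # (1 - sensitivity) / specificity
    odds_product: float  # LR+ / LR- (assumed to be the diagnostic odds ratio)


def indices_from_rates(sensitivity: float, specificity: float) -> DiagnosticIndices:
    """Derive prevalence-independent indices from rates given as fractions."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return DiagnosticIndices(
        youden=sensitivity + specificity - 1.0,
        lr_positive=lr_pos,
        lr_negative=lr_neg,
        odds_product=lr_pos / lr_neg,
    )


def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) via Bayes' rule; both depend on prevalence."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Mild-CAG subgroup: sensitivity and specificity as reported in the text.
model = indices_from_rates(0.7212, 0.9704)
endoscopists = indices_from_rates(0.3942, 0.8195)

for name, idx in (("model", model), ("endoscopists", endoscopists)):
    print(
        f"{name}: Youden={idx.youden:.2%}, LR+={idx.lr_positive:.2f}, "
        f"LR-={idx.lr_negative:.2f}, odds product={idx.odds_product:.2f}"
    )

# Hypothetical prevalences (assumptions, not study data).
for prevalence in (0.10, 0.50):
    ppv, npv = predictive_values(0.7212, 0.9704, prevalence)
    print(f"prevalence={prevalence:.0%}: PPV={ppv:.2%}, NPV={npv:.2%}")
```

Because the inputs are the rounded percentages from the text, the recomputed odds product can differ from the reported value in the second decimal place; the Youden index and likelihood ratios agree with the reported figures after rounding.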

Our study had some limitations. First, as this was an exploratory study, we conducted a nested case–control study with a cohort from a single center. The enrolled cases were all from our region, which may have introduced selection bias. In the near future, we will include cases from different regions in a multi-center study to make our results more representative. Second, to avoid risk to patients and improve the accuracy of the model, our exclusion criteria were relatively strict: we excluded patients in whom gastroscopy revealed lesions other than chronic gastritis, such as peptic ulcers and gastrointestinal malignant tumors. Therefore, there was a certain bias in the patients enrolled in the cohort. Building on the experience of the present study, subsequent studies will expand the cohort to a wider range of patients from multiple centers and will enroll patients whose chronic gastritis is complicated by other lesions, so as to verify the effectiveness of our model more rigorously. Third, serological tests were not included in our analysis. The combination of the pepsinogen I to pepsinogen II ratio (PGR), HP antibody and gastrin 17 has been shown to screen for gastric mucosal atrophy and is referred to as a "serological biopsy" [46, 47]. Combining non-invasive serological screening with endoscopy can improve the effectiveness of gastric cancer screening [48]. Our follow-up study will combine our model with "serological biopsy" results for statistical analysis to make our model more reliable.

Conclusion

In conclusion, our prospective nested case–control study shows that, with pathological diagnosis as the gold standard, the diagnostic evaluation indices and consistency evaluation of the real-time video monitoring model for endoscopic diagnosis of CAG based on U-Net DL were superior to those of endoscopists, and that the model can better assist endoscopists in the real-time endoscopic diagnosis of CAG.

Availability of data and materials

The dataset generated and analyzed during the study is stored in a secure localized database but is available from the corresponding author in an anonymous format on reasonable request.

Abbreviations

CAG: Chronic atrophic gastritis
GC: Gastric cancer
DL: Deep learning
CNAG: Chronic nonatrophic gastritis
PSM: Propensity score matching
AI: Artificial intelligence
ROC: Receiver operating characteristic
CAD: Computer-aided diagnosis

References

Zhuan L, Tao S, Hao W, Fan Y, Wenbing Z. Consensus on early gastric cancer screening and endoscopic diagnosis and treatment in China (2014, Changsha). Chin J Digest. 2014;31:361–77. https://doi.org/10.3760/cma.j.issn.0254-1432.2014.07.001.
Uemura N, Okamoto S, Yamamoto S, Matsumura N, Yamaguchi S, Yamakido M, et al. Helicobacter pylori infection and the development of gastric cancer. N Engl J Med. 2001;345:784–9. https://doi.org/10.1056/NEJMoa001999.
Masuyama H, Yoshitake N, Sasai T, Nakamura T, Masuyama A, Zuiki T, et al. Relationship between the degree of endoscopic atrophy of the gastric mucosa and carcinogenic risk. Digestion. 2015;91:30–6. https://doi.org/10.1159/000368807.
Jingyuan F, Du Y, Wenzhong L, Jianlin R, Yanqing L, Xiaoyu C, et al. Consensus on chronic gastritis in China (2017, Shanghai). Chin J Digest. 2017;37:721–38. https://doi.org/10.3760/cma.j.issn.0254-1432.2017.11.001.
Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal. 2018;40:834–48. https://doi.org/10.1109/TPAMI.2017.2699184.
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88. https://doi.org/10.1016/j.media.2017.07.005.
Kuwahara T, Hara K, Mizuno N, Haba S, Okuno N, Kuraishi Y, et al. Artificial intelligence using deep learning analysis of endoscopic ultrasonography images for the differential diagnosis of pancreatic masses. Endoscopy. 2022. https://doi.org/10.1055/a-1873-7920.
Horie Y, Yoshio T, Aoyama K, Yoshimizu S, Horiuchi Y, Ishiyama A, et al. Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks. Gastrointest Endosc. 2019;89:25–32. https://doi.org/10.1016/j.gie.2018.07.037.
Li L, Chen Y, Shen Z, Zhang X, Sang J, Ding Y, et al. Convolutional neural network for the diagnosis of early gastric cancer based on magnifying narrow band imaging. Gastric Cancer Off J Int Gastric Cancer Assoc Jpn Gastric Cancer Assoc. 2020;23:126–32. https://doi.org/10.1007/s10120-019-00992-2.
Higuchi N, Hiraga H, Sasaki Y, Hiraga N, Igarashi S, Hasui K, et al. Automated evaluation of colon capsule endoscopic severity of ulcerative colitis using ResNet50. PLoS ONE. 2022;17:e269728. https://doi.org/10.1371/journal.pone.0269728.
Mori Y, Kudo SE, Mohmed HEN, Misawa M, Ogata N, Itoh H, et al. Artificial intelligence and upper gastrointestinal endoscopy: current status and future perspective. Digest Endosc. 2019;31:378–88. https://doi.org/10.1111/den.13317.
Guimarães P, Keller A, Fehlmann T, Lammert F, Casper M. Deep-learning based detection of gastric precancerous conditions. Gut. 2019;69:4–6. https://doi.org/10.1136/gutjnl-2019-319347.
Zhang X, Hu W, Chen F, Liu J, Yang Y, Wang L, et al. Gastric precancerous diseases classification using CNN with a concise model. PLoS ONE. 2017;12:e185508. https://doi.org/10.1371/journal.pone.0185508.
Zhao Q, Chi T. Deep learning model can improve the diagnosis rate of endoscopic chronic atrophic gastritis: a prospective cohort study. BMC Gastroenterol. 2022;22:133. https://doi.org/10.1186/s12876-022-02212-1.
Tytgat GNJ. The Sydney System: endoscopic division. Endoscopic appearances in gastritis/duodenitis. J Gastroenterol Hepatol. 1991;6:223–34. https://doi.org/10.1111/j.1440-1746.1991.tb01469.x.
Kimura K, Takemoto T. An endoscopic recognition of the atrophic border and its significance in chronic gastritis. Endoscopy. 1969;1:87–97. https://doi.org/10.1055/s-0028-1098086.
Cao Y, Vassantachart A, Ye JC, Yu C, Ruan D, Sheng K, et al. Automatic detection and segmentation of multiple brain metastases on magnetic resonance image using asymmetric UNet architecture. Phys Med Biol. 2021;66:15003. https://doi.org/10.1088/1361-6560/abca53.
Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. UNet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans Med Imaging. 2020;39:1856–67. https://doi.org/10.1109/TMI.2019.2959609.
Tang P, Liang Q, Yan X, Xiang S, Sun W, Zhang D, et al. Efficient skin lesion segmentation using separable-Unet with stochastic weight averaging. Comput Meth Prog Biomed. 2019;178:289–301. https://doi.org/10.1016/j.cmpb.2019.07.005.
Jin Q, Meng Z, Sun C, Cui H, Su R. RA-UNet: a hybrid deep attention-aware network to extract liver and tumor in CT scans. Front Bioeng Biotechnol. 2020;8:1471. https://doi.org/10.3389/fbioe.2020.605132.
Li J, Lin X, Che H, Li H, Qian X. Pancreas segmentation with probabilistic map guided bi-directional recurrent UNet. Phys Med Biol. 2021;66:115010. https://doi.org/10.1088/1361-6560/abfce3.
Zhao B, Zhang X, Li Z, Hu X. A multi-scale strategy for deep semantic segmentation with convolutional neural networks. Neurocomputing. 2019;365:273–84. https://doi.org/10.1016/j.neucom.2019.07.078.
Hirasawa T, Uchita K, Yano T. How many pictures are demanded for screening gastroscopy? Digest Endosc. 2016;28:33–4. https://doi.org/10.1111/den.12616.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55. https://doi.org/10.1093/biomet/70.1.41.
Zhao Q, Chi T. Biopsy in emergency gastroscopy does not increase the risk of rebleeding in patients with Forrest I acute nonvariceal upper gastrointestinal bleeding combined with suspected malignant gastric ulcer: a multicenter retrospective cohort study. BMC Gastroenterol. 2021;21:250. https://doi.org/10.1186/s12876-021-01836-z.
Rubin DB. Using propensity scores to help design observational studies: application to the tobacco litigation. Health Serv Outcomes Res Methodol. 2006;2:169–88.
Chooi EY, Chen HM, Miao Q, Weng YR, Chen XY, Ge ZZ, et al. Chronic atrophic gastritis is a progressive disease: analysis of medical reports from Shanghai (1985–2009). Singapore Med J. 2012;53:318–24.
Junxiang L, Yan C, Bin L, Yangang W. Consensus on the diagnosis and treatment of chronic atrophic gastritis with integrated traditional chinese and western medicine (2017). Chin J Integr Tradit Western Med Digest. 2018;26:121–31. https://doi.org/10.3969/j.issn.1671-038X.2018.02.03.
Zhang Y, Li F, Yuan F, Zhang K, Huo L, Dong Z, et al. Diagnosing chronic atrophic gastritis by gastroscopy using artificial intelligence. Digest Liver Dis. 2020;52:566–72. https://doi.org/10.1016/j.dld.2019.12.146.
Kodama M, Okimoto T, Ogawa R, Mizukami K, Murakami K. Endoscopic atrophic classification before and after H. pylori eradication is closely associated with histological atrophy and intestinal metaplasia. Endosc Int Open. 2015;03:E311–7. https://doi.org/10.1055/s-0034-1392090.
Kono S. Can endoscopic atrophy predict histological atrophy? Historical study in United Kingdom and Japan. World J Gastroenterol. 2015;21:13113. https://doi.org/10.3748/wjg.v21.i46.13113.
Sugano K, Tack J, Kuipers EJ, Graham DY, El-Omar EM, Miura S, et al. Kyoto global consensus report on Helicobacter pylori gastritis. Gut. 2015;64:1353–67. https://doi.org/10.1136/gutjnl-2015-309252.
Fitting D, Krenzer A, Troya J, Banck M, Sudarevic B, Brand M, et al. A video based benchmark data set (ENDOTEST) to evaluate computer-aided polyp detection systems. Scand J Gastroenterol. 2022. https://doi.org/10.1080/00365521.2022.2085059.
Glissen Brown JR, Mansour NM, Wang P, Chuchuca MA, Minchenberg SB, Chandnani M, et al. Deep learning computer-aided polyp detection reduces adenoma miss rate: a United States Multi-center Randomized Tandem Colonoscopy Study (CADeT-CS Trial). Clin Gastroenterol Hepatol. 2022;20:1499–507. https://doi.org/10.1016/j.cgh.2021.09.009.
Li D, Chen C, Li J, Wang L. Dense gate network for biomedical image segmentation. Int J Comput Assist Radiol. 2020;15:1247–55. https://doi.org/10.1007/s11548-020-02138-7.
Chen Y, Wang K, Liao X, Qian Y, Wang Q, Yuan Z, et al. Channel-Unet: a spatial channel-wise convolutional neural network for liver and tumors segmentation. Front Genet. 2019;10:1110. https://doi.org/10.3389/fgene.2019.01110.
Thomas E, Pawan SJ, Kumar S, Horo A, Niyas S, Vinayagamani S, et al. Multi-Res-Attention UNet: a CNN model for the segmentation of focal cortical dysplasia lesions from magnetic resonance images. IEEE J Biomed Health. 2021;25:1724–34. https://doi.org/10.1109/JBHI.2020.3024188.
Zhang Y, Wu J, Liu Y, Chen Y, Wu EX, Tang X. MI-UNet: multi-inputs UNet incorporating brain parcellation for stroke lesion segmentation from T1-weighted magnetic resonance images. IEEE J Biomed Health. 2021;25:526–35. https://doi.org/10.1109/JBHI.2020.2996783.
Ibtehaz N, Rahman MS. MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020;121:74–87. https://doi.org/10.1016/j.neunet.2019.08.025.
Moustafa MS, Mohamed SA, Ahmed S, Nasr AH. Hyperspectral change detection based on modification of UNet neural networks. J Appl Remote Sens. 2021;15:028505. https://doi.org/10.1117/IIRS.15.028505.
Zhu C, Mei K, Peng T, Luo Y, Liu J, Wang Y, et al. Multi-level colonoscopy malignant tissue detection with adversarial CAC-UNet. Neurocomputing. 2021;438:165–83. https://doi.org/10.1016/j.neucom.2020.04.154.
Suqin L, Shan H, Yiyun C, Xiaoyu Z, Xia L, Honggang Y. Advances in the application of artificial intelligence in digestive endoscopy. Chin J Digest Endosc. 2020;37:146–9. https://doi.org/10.3760/cma.j.issn.1007-5232.2020.02.018.
Shengbing Z, Wei Q, Yu B, Zhaoshen L. Advances in the application of artificial intelligence in the diagnosis and treatment of digestive endoscopy. Chin J Digest Endosc. 2019;66:293–6. https://doi.org/10.3760/cma.j.issn.1007-5232.2019.04.017.
Hirasawa T, Aoyama K, Tanimoto T, Ishihara S, Shichijo S, Ozawa T, et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer. 2018;21:653–60. https://doi.org/10.1007/s10120-018-0793-2.
Yue H, Shan L, Bin L. The significance of OLGA and OLGIM staging systems in the risk assessment of gastric cancer: a systematic review and meta-analysis. Gastric Cancer. 2018;21:579–87. https://doi.org/10.1007/s10120-018-0812-3.
Agréus L, Kuipers EJ, Kupcinskas L, Malfertheiner P, Di Mario F, Leja M, et al. Rationale in diagnosis and screening of atrophic gastritis with stomach-specific plasma biomarkers. Scand J Gastroenterol. 2012;47:136–47. https://doi.org/10.3109/00365521.2011.645501.
Michigami Y, Watari J, Ito C, Nakai K, Yamasaki T, Kondo T, et al. Long-term effects of H. pylori eradication on epigenetic alterations related to gastric carcinogenesis. Sci Rep UK. 2018;8:14369. https://doi.org/10.1038/s41598-018-32717-3.
Tu H, Sun L, Dong X, Gong Y, Xu Q, Jing J, et al. A serological biopsy using five stomach-specific circulating biomarkers for gastric cancer risk assessment: a multi-phase study. Am J Gastroenterol. 2017;112:704–15. https://doi.org/10.1038/ajg.2017.55.

Acknowledgements

Not applicable.

Funding

This work has not been supported by any financial grant.

Author information

Authors and Affiliations

Department of Gastroenterology, Xuanwu Hospital of Capital Medical University, 45 Chang-chun Street, Beijing, 100053, China
Quchuan Zhao & Tianyu Chi
Department of Anesthesiology, Guang’anmen Hospital China Academy of Chinese Medical Sciences, 5 North Court Street, Beijing, 100053, China
Qing Jia

Contributions

QZ and TC marked the gastroscopic images and participated in the model construction. QZ, QJ and TC conducted training and testing of the model. QZ, QJ and TC wrote the main manuscript text and prepared the tables and figures. All authors had access to the study data and read and approved the final manuscript.

Corresponding authors

Correspondence to Qing Jia or Tianyu Chi.

Ethics declarations

Ethics approval and consent to participate

The study was conducted in accordance with the ethical guidelines of the 1975 Helsinki Declaration. The study protocol was approved by the ethics committee of Xuanwu Hospital of Capital Medical University. Written informed consent was obtained from all participants in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zhao, Q., Jia, Q. & Chi, T. Deep learning as a novel method for endoscopic diagnosis of chronic atrophic gastritis: a prospective nested case–control study. BMC Gastroenterol 22, 352 (2022). https://doi.org/10.1186/s12876-022-02427-2

Received: 15 May 2022
Accepted: 15 July 2022
Published: 25 July 2022
DOI: https://doi.org/10.1186/s12876-022-02427-2
