Comparisons with similar studies
Since no previous studies were designed to investigate the diagnostic performance of the Reference Standard relative to the Rome IV criteria, we cannot compare the findings of this diagnostic study with those in other publications. That said, the Reference Standard has been evaluated against the Rome II criteria and the Rome III criteria for FD. When compared against the Reference Standard, the Rome II criteria had a sensitivity of 71.4% (95% CI 68.4%–74.2%), a specificity of 55.6% (95% CI 51.5%–59.7%), a positive LR of 1.61 (95% CI 1.45–1.78), and a negative LR of 0.51 (95% CI 0.45–0.58). The Rome III criteria had a sensitivity of 60.7% (95% CI 57.5%–63.9%), a specificity of 68.7% (95% CI 64.6%–72.6%), a positive LR of 1.94 (95% CI 1.69–2.22), and a negative LR of 0.57 (95% CI 0.52–0.63).
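The likelihood ratios quoted above follow directly from the reported sensitivity and specificity values. The sketch below is illustrative only: it applies the standard definitions (positive LR = sensitivity / (1 − specificity); negative LR = (1 − sensitivity) / specificity) to the point estimates, and the function name is our own. It does not reproduce the confidence intervals.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity.

    Both inputs are proportions between 0 and 1.
    """
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


# Point estimates quoted in the text for Rome II and Rome III.
rome_ii = likelihood_ratios(sensitivity=0.714, specificity=0.556)
rome_iii = likelihood_ratios(sensitivity=0.607, specificity=0.687)

print(f"Rome II:  LR+ {rome_ii[0]:.2f}, LR- {rome_ii[1]:.2f}")    # LR+ 1.61, LR- 0.51
print(f"Rome III: LR+ {rome_iii[0]:.2f}, LR- {rome_iii[1]:.2f}")  # LR+ 1.94, LR- 0.57
```

The recomputed values agree with the published point estimates to two decimal places.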
Implication for practice
This cross-sectional study illustrated that the Reference Standard may be sufficient for ruling out Rome IV-defined FD, PDS, or EPS, given relatively high sensitivity values ranging from 78.6% to 95.5%. The corresponding negative LRs of around 0.20 implied that negative test results of the Reference Standard may moderately decrease patients’ post-test probabilities of having an FD, PDS, or EPS diagnosis. However, because of its mediocre specificity values ranging from 37.6% to 59.6%, the Reference Standard may not be useful for ruling in Rome IV-defined FD, PDS, or EPS. The positive LRs of less than 2.0 also revealed that positive test results of the Reference Standard may only slightly increase patients’ post-test probabilities of having an FD, PDS, or EPS diagnosis. The AUC values of around 0.70 indicated that the Reference Standard had moderate accuracy in distinguishing between patients with and without Rome IV-defined FD, PDS, or EPS.
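To make the effect of these likelihood ratios concrete, the sketch below converts a pre-test probability into a post-test probability through the odds form of Bayes' theorem. It is a worked illustration under stated assumptions, not a result of this study: the 30% pre-test probability borrows the Asian FD prevalence used in our sample size calculation, and the LR values of 0.20 and 2.0 are rounded figures (2.0 being the upper bound of the observed positive LRs). The function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Uses the odds form of Bayes' theorem:
    post-test odds = pre-test odds * likelihood ratio.
    """
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumption: 30% pre-test probability (Asian FD prevalence, illustrative only).
ASSUMED_PRE_TEST = 0.30

after_negative = post_test_probability(ASSUMED_PRE_TEST, 0.20)  # rounded negative LR
after_positive = post_test_probability(ASSUMED_PRE_TEST, 2.0)   # upper bound of positive LR

print(f"After a negative result: {after_negative:.1%}")  # 7.9%
print(f"After a positive result: {after_positive:.1%}")  # 46.2%
```

Under these assumptions, a negative result lowers the probability from 30% to about 8%, whereas a positive result raises it to at most about 46%, which is consistent with the test being more informative for ruling out than for ruling in.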
With its satisfactory performance in ruling out FD, PDS, or EPS, the Reference Standard may reduce unnecessary initiation of FD treatments. Reducing over-treatment may in turn reduce treatment-associated adverse events. The use of prokinetics, the recommended first-line therapy for FD in the Asian guideline, is associated with adverse events including dystonia, parkinsonism-type movements, tardive dyskinesia, and even life-threatening arrhythmia [9, 28]. Proton pump inhibitors, the first-line therapy for FD in the North American guideline, may increase the risk of hip fracture, community-acquired pneumonia, and Clostridium difficile infection. Tricyclic antidepressants, a second-line therapy, may cause constipation, dry mouth, urinary retention, and somnolence [9, 30].
Ruling out FD effectively is expected to relieve the financial burden on FD patients and healthcare systems by minimising the chance of initiating unnecessary treatments. A study in 2013 estimated that each FD patient in the United States had to pay, on average, USD 805 per year for regular consultations and treatment. This estimate did not include the indirect costs incurred by absence from work and loss of productivity. A retrospective study in Malaysia also showed that FD is associated with the highest healthcare burden among functional GI disorders in secondary care.
In routine practice where consultation time is limited, the Reference Standard may be used as an initial screening tool for FD and FD symptom subtypes prior to confirmation by the Rome IV criteria. If service arrangements allow, the Rome IV criteria should be used to confirm positive results from the Reference Standard and so avoid potential false-positive cases, given the mediocre specificity of the Reference Standard. Moreover, the Rome IV criteria may be adopted to confirm an FD diagnosis among patients who present with persistent dyspepsia symptoms but nevertheless test negative on Reference Standard screening, facilitating appropriate initiation of treatment. Besides the questionnaire-based criteria, physicians should consider additional patient information [5, 33], such as symptom duration and co-morbidity, when making diagnoses for dyspeptic patients, since patients who do not fully meet the Rome IV criteria may still be offered essential treatments for reducing symptoms and improving quality of life [5, 8]. Additional diagnostic workup may also supplement OGD and H. pylori tests for differential diagnosis. For example, in areas with a high prevalence of hepatocellular carcinoma, such as Southern China, upper abdominal ultrasound may be valuable for differentiating epigastric pain caused by malignancy from that caused by FD.
Implication for research
Although the Rome diagnostic criteria were recognised in the Asian and North American FD guidelines for clinical research [8, 9], they were considered to have limited relevance to routine practice. To prepare for future updates, further research can investigate the feasibility of developing a concise edition of the Rome FD diagnostic criteria based on the Reference Standard, so as to facilitate FD diagnosis in outpatient clinics where consultation time is limited.
Furthermore, to make FD diagnosis more objective, the potential value of adding duodenal eosinophilia as a diagnostic marker should be investigated, since the phenomenon is closely associated with early satiety and PDS. Gastric emptying is associated with the pathophysiological mechanism of FD [1, 28], so accelerated gastric emptying, delayed emptying, and fasting gastric volume may also be evaluated as potential motility markers for FD diagnosis [28, 35]. Alteration of the GI microbiota may be considered as another biomarker [28, 36], given its relationship with the occurrence of functional GI disorders.
Lastly, further diagnostic research may be conducted to investigate whether the Reference Standard is able to differentiate organic dyspepsia from FD. The evidence produced may contribute to reducing unnecessary tests and examinations in routine practice.
Strengths and limitations
To the best of our knowledge, this study is the first to translate the Rome IV criteria for FD from English to Cantonese using the standardised forward–backward translation method. We held cognitive debriefing sessions with FD patients to test the clarity, adequacy of cultural adaptation, language usage, and acceptability of the draft translation. Each step in the translation process was monitored and approved by the official Rome Foundation. This study is also the first to compare the Reference Standard against the Rome criteria in terms of diagnostic performance and the first to be conducted in the China Greater Bay Area. Most importantly, this cross-sectional diagnostic study has a low risk of bias and low concerns over applicability across the four domains of QUADAS-2: (i) patient selection; (ii) index test; (iii) reference standard; and (iv) flow and timing.
This study had certain limitations. First, given that only participants in Hong Kong were recruited for cognitive debriefing, local adaptations may be required before adopting the Cantonese R4DQ-FD in other Cantonese-speaking populations in China or overseas. Second, misclassification of organic upper GI diseases might exist, since OGD and H. pylori tests performed within the previous five years were accepted for eligibility screening instead of referring the patient for concurrent diagnostic workup. Third, results from abdominal ultrasound were not included in the eligibility criteria, because the clinical value of the procedure in evaluating organic dyspepsia is limited. Fourth, we used the prevalence of FD in Asia (30%) for the sample size calculation because no information regarding the prevalence of FD in the China Greater Bay Area is available. This diagnostic study would have required a larger sample size if the prevalence of FD in the China Greater Bay Area were, in fact, lower than in Asia. Also, due to the lack of an objective gold standard definition of FD, the prevalence of FD used in this study may not reflect the true prevalence of FD in the continent. Fifth, the diagnostic performance indicators of a diagnostic test may vary between subgroups of participants with different demographic or clinical characteristics. These characteristics may include, but are not limited to, age, gender, and symptom severity and frequency. However, due to the relatively small sample size, we did not conduct logistic regression analyses to explore the relationships between the diagnostic performance indicators of the Reference Standard and participant subgroups.
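The dependence of the required sample size on prevalence, noted in the fourth limitation, can be illustrated with a commonly used formula for diagnostic accuracy studies (Buderer's formula for sensitivity). This is a minimal sketch under loud assumptions: the formula is not necessarily the one used in this study, and the expected sensitivity (80%), the precision (±5%), and the alternative prevalence (20%) are hypothetical values chosen only to show the direction of the effect.

```python
import math


def sample_size_for_sensitivity(sensitivity, precision, prevalence, z=1.96):
    """Total participants needed to estimate sensitivity with a given precision.

    Buderer's formula: the number of diseased participants required is
    z^2 * Se * (1 - Se) / d^2, which is then divided by the prevalence
    to obtain the total number of participants to recruit.
    """
    cases_needed = (z ** 2) * sensitivity * (1.0 - sensitivity) / precision ** 2
    return math.ceil(cases_needed / prevalence)


# Hypothetical inputs: expected sensitivity 80%, precision +/-5%.
n_at_30 = sample_size_for_sensitivity(0.80, 0.05, prevalence=0.30)  # Asian prevalence
n_at_20 = sample_size_for_sensitivity(0.80, 0.05, prevalence=0.20)  # hypothetical lower prevalence

print(n_at_30, n_at_20)
```

With the other inputs held constant, the required total is inversely proportional to prevalence, so a lower true prevalence in the China Greater Bay Area would translate into a proportionally larger sample.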