
Validation of a survey methodology for gastroesophageal reflux disease in China



Gastroesophageal reflux disease (GERD) causes a wide range of clinical symptoms and potentially serious complications, but epidemiological data about GERD in China are limited. The aim of this pilot study was to develop and validate a methodology for the epidemiological study of GERD in China.


Regionally stratified, randomized samples of Shanghai residents (n = 919) completed Mandarin translations of the Reflux Disease Questionnaire (RDQ), GERD Impact Scale, Quality of Life in Reflux and Dyspepsia (QOLRAD) questionnaire and 36-item Short Form Health Survey (SF-36). Reliability and construct validity were tested by appropriate statistical analyses.


The response rate was 86%. The test-retest reliability coefficients for the RDQ, GERD Impact Scale, QOLRAD and SF-36 were 0.80, 0.71, 0.93 and 0.96, respectively, and Cronbach's alpha coefficients were 0.86, 0.80, 0.98 and 0.90, respectively. Dimension scores were highly correlated with the total scores for the QOLRAD and SF-36, and factor analysis showed credible construct validity for the RDQ, GERD Impact Scale and SF-36. The RDQ GERD score was significantly negatively correlated with the QOLRAD dimensions of food and drink problems and social functioning, and with all dimensions of the SF-36. All eight of the SF-36 dimensions were significantly correlated with the QOLRAD total score.


This study developed and tested a successful survey methodology for the investigation of GERD in China. The questionnaires used demonstrated credible reliability and construct validity, supporting their use in larger epidemiological surveys of GERD in China.



Gastroesophageal reflux disease (GERD) is a common disorder caused by backflow of stomach contents into the esophagus. As it can cause a wide range of clinical symptoms and potentially serious complications, the epidemiology of GERD has been a subject of much interest in recent years. GERD is frequently diagnosed on the basis of symptoms alone, with the criterion for diagnosis in clinical practice being when reflux symptoms become troublesome to the patient [1]. However, for epidemiological studies, a simple symptom threshold is required to identify those who have GERD. In many studies, this threshold is defined as at least weekly reflux symptoms [2]. GERD is common in the West, with a prevalence of about 10–20%, but the prevalence in Asia is generally lower at approximately 5% [2]. The prevalence of GERD is, however, thought to be increasing [3], with trends in Asia attracting particular interest [4]. There have been few high quality, population-based epidemiological surveys of GERD in Asia, particularly in China [5]. A number of methodological challenges associated with studying the epidemiology of GERD in this region may have contributed to this paucity.

To identify reflux symptoms accurately, validated patient-completed questionnaires are needed, as clinicians tend to underestimate the presence and severity of reflux symptoms reported by patients [6]. In particular, validated symptom descriptors (e.g. 'burning behind the breastbone') are necessary because terms such as 'heartburn' are known to be poorly understood by patients [7]; this is of particular relevance to Chinese populations, because there is no word for 'heartburn' in Mandarin Chinese beyond specialist medical circles, and a survey in the USA revealed that only 13.2% of East Asian patients understood the term [7].

Within the Chinese population, language and cultural differences can lead to different communities perceiving and expressing their symptoms differently. In China, Mandarin is the official language, but about half the population does not speak it, particularly those living in rural areas and older people [8]. There are thousands of local dialects, many of which are mutually unintelligible when spoken. All use the same writing system, and overall literacy rates in China are high, but literacy among older people, women and those living in rural areas is relatively low; in the 2003 census, over 9.6% of women and 2.1% of men were illiterate or semi-literate [9].

Population surveys can be difficult to implement in China. Telephone surveys may introduce population bias in favour of the more wealthy urban Chinese population who are more likely to have telephones. The utility of postal surveys is limited by the ability of the respondent to understand the terms used [10] which, for questionnaires developed in the West, may be further compounded by cultural conceptual differences. Response rates to telephone or postal questionnaires may be low, potentially introducing responder bias [11, 12]. For these reasons, previous population surveys of GERD in China have administered questionnaires using a face-to-face interview technique, in which subjects completed the questionnaire while being assisted by trained interviewers [10, 13, 14]. This technique has achieved high response rates and has enabled terms and definitions to be clarified appropriately for individual respondents.

In order to investigate the prevalence and impact of GERD in China and facilitate comparisons with other countries, linguistic and psychometric validation of internationally recognized disease-specific and generic patient-reported outcomes instruments is required. The aim of this pilot study was to develop and validate a methodology for the epidemiological study of GERD in China. The feasibility, validity and reliability of several well-designed questionnaires were tested in a Chinese environment using randomized, stratified, multi-stage cluster sampling, a statistical sampling technique adopted by the World Health Organization (WHO) [15] that is particularly well suited to the residential and social administration system in China.



Shanghai, on the east coast of China, is China's largest city. It is divided into 18 districts and one county, each of which is classified as urban, suburban, or rural (Figure 1). Each district includes numerous blocks, which include multiple residential areas, and the county covers several towns that govern a number of villages. Broadly speaking, people who live in an urban area have a city lifestyle, while people who live in a rural region lead a farming or country peasant way of life. The suburban lifestyle is intermediate between these two.

Figure 1

The survey sites in Shanghai.


A randomized, stratified, multi-stage cluster sampling methodology was used to select a representative sample of the general population in Shanghai. Huangpu was randomly selected from the nine urban districts, Pudong from the four suburban districts, and Songjiang from the five rural districts and one county of Shanghai. Blocks were randomly selected from districts and residential areas from blocks so that, finally, four residential areas in the urban district, three in the suburban district and two in the rural district were randomly selected (see Figures 1 and 2). The Residential Committee of each residential area supplied detailed household rosters of all adults, and subjects for this study were randomly sampled from these lists.
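The staged selection described above can be illustrated with a minimal sketch. The sampling frame here is entirely hypothetical: the district, block and area names are placeholders, the number of blocks and areas per district is invented, and drawing exactly one residential area per selected block is an assumption made for simplicity. Only the number of districts per stratum and the number of residential areas drawn per stratum come from the text. The final stage, sampling individuals from household rosters, is not shown.

```python
import random

# Residential areas drawn per stratum, as reported in the text.
AREAS_PER_STRATUM = {"urban": 4, "suburban": 3, "rural": 2}
# Units per stratum (the rural stratum counts five districts plus one county).
DISTRICTS_PER_STRATUM = {"urban": 9, "suburban": 4, "rural": 6}


def build_placeholder_frame(n_districts, n_blocks=6, n_areas=5):
    """Hypothetical frame: district -> block -> list of residential areas."""
    return {
        f"district_{d}": {
            f"block_{d}_{b}": [f"area_{d}_{b}_{a}" for a in range(n_areas)]
            for b in range(n_blocks)
        }
        for d in range(n_districts)
    }


def select_residential_areas(frame, n_areas, rng):
    """Draw one district, then blocks within it, then one area per block."""
    district = rng.choice(sorted(frame))
    blocks = rng.sample(sorted(frame[district]), n_areas)
    areas = [rng.choice(frame[district][block]) for block in blocks]
    return district, areas


def draw_sample(seed=0):
    """Run the three selection stages independently within each stratum."""
    rng = random.Random(seed)
    selection = {}
    for stratum, n_districts in DISTRICTS_PER_STRATUM.items():
        frame = build_placeholder_frame(n_districts)
        selection[stratum] = select_residential_areas(
            frame, AREAS_PER_STRATUM[stratum], rng
        )
    return selection
```

Fixing the seed makes the draw reproducible, which may be useful when a selection needs to be audited later.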

Figure 2

Stratified, multi-stage randomized cluster sampling of urban, suburban and rural districts in Shanghai.

Pudong District consists of 26 towns and blocks, and is the biggest district in Shanghai. The residents in this district are widely dispersed, and individual-level information could not be obtained for every resident. However, because household-level information was available for all families in Pudong, families were randomly sampled from the selected residential areas and the family member with a birthday closest to the investigation date was selected.

According to the statistical formula $n = t^{2}pq/d^{2}$ (where n, t, p, q and d are sample size, t value, positive rate, negative rate and acceptable error, respectively), assuming a GERD prevalence of 10%, and setting significance at P = 0.05 and acceptable error at 2%, the calculated sample size was 864 [16]. According to the 1 in 10 000 sampling proportion principle and the population size of Shanghai, the target sample size was 1300 respondents. Combining these two figures, a target sample size of 1000 valid respondents was deemed appropriate. Allowing for a 20% non-response rate, the final intended sample size was set at 1200, including 400 subjects from each district.
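As a check on the arithmetic, the sample-size calculation can be reproduced in a few lines. This is a sketch under one assumption: the text does not state the t value, so the conventional two-sided 5% critical value of 1.96 is used here, which does reproduce the reported figure of 864. The function name is illustrative.

```python
def sample_size_for_proportion(p, d, t=1.96):
    """n = t^2 * p * q / d^2, with q = 1 - p.

    t = 1.96 is an assumed critical value; the text only says "t value".
    """
    q = 1.0 - p
    return (t ** 2) * p * q / (d ** 2)


# Prevalence of 10% and acceptable error of 2%, as stated in the text.
n = sample_size_for_proportion(p=0.10, d=0.02)
print(round(n))  # 864

# Inflating the 1000-respondent target by the 20% non-response allowance.
intended = 1000 * (1 + 0.20)
print(round(intended))  # 1200
```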

Residents under 18 years of age, or residents who were illiterate, had severe visual, hearing or learning disabilities, or major psychiatric illness, were excluded from the survey. Respondents who were not at home after three attempts to administer the questionnaire were considered to be missing.

Administration of questionnaires

Local residential committee staff informed residents of the survey and secured their support and understanding. The informed consent of respondents was obtained, and each respondent was free to discontinue participation in the study at any time. The study was approved by the Second Military Medical University Ethics Committee.

During the fieldwork period from November 2005 to January 2006, respondents completed questionnaires in their own homes or in local residential committee offices. Questionnaires were self-administered, with trained and supervised facilitators on hand to explain any questions that were unclear. The facilitators were social workers at the site. They were trained by supervisors (professionals and graduate students from the Department of Health Statistics, DoHS), who had themselves received training from an epidemiology survey expert from the DoHS and a gastrointestinal specialist from Shanghai Hospital. Quality auditing was performed to ensure all questionnaires were completed properly. A valid questionnaire was one that had been audited and signed by a supervisor.


Each respondent completed five questionnaires in Mandarin (see additional file 1: GERD questionnaire in English and Mandarin Chinese): a general information questionnaire and translations of four concise, well-validated, internationally recognized and frequently cited disease-specific and generic health questionnaires, chosen to facilitate comparison with other studies and minimize the length of the overall survey: the Reflux Disease Questionnaire (RDQ), the GERD Impact Scale, the Quality of Life in Reflux and Dyspepsia (QOLRAD) questionnaire, and the 36-Item Short-Form Health Survey (SF-36). The general information questionnaire collected information on age, gender, education, income and other general demographic variables.

The RDQ is a 12-item self-report questionnaire measuring the frequency and severity of upper gastrointestinal symptoms (heartburn, regurgitation and epigastric pain) over the previous week. Symptom frequency and severity are scored on a 6-point Likert scale (0–5, where 5 is the most severe/frequent). A GERD dimension score can be obtained by combining the heartburn and regurgitation scores [17]. Subjects reporting heartburn and/or regurgitation of any frequency during the 1-week recall period of the questionnaire were defined as having GERD. The RDQ was validated for use in clinical trials in two large studies [18, 19], and was also recently validated for use as a diagnostic tool in the DIAMOND study (Diagnostic Tool for the Management of Patients with Reflux Disease) [20]. A Chinese version of the RDQ was tested in 10 hospitals in mainland China, and was found to identify accurately the presence of symptoms suggestive of GERD experienced over the previous month [21].

The GERD Impact Scale questionnaire is an eight-item self-report questionnaire designed to aid patient-physician communication in primary care. It assesses the frequency of gastroesophageal reflux symptoms over the past 2 weeks and their impact on everyday activities such as sleep, work, meals and social occasions, and the use of additional medication (other than that prescribed). Four response options for frequency are provided (1–4) where 1 is 'all of the time' and 4 is 'none of the time'. This newly developed tool has demonstrated good psychometric properties [22].

The GERD-specific version of the QOLRAD questionnaire is a 25-item disease-specific quality-of-life instrument measuring the impact of upper gastrointestinal symptoms over the previous week on five dimensions: emotional well-being, sleep, vitality, eating/drinking, and physical/social functioning [23]. The frequencies of effects are reported using a 7-point Likert scale, with low scores indicating frequent impairment. Its reliability and validity have been extensively documented in studies of patients with upper gastrointestinal symptoms [23–25].

The SF-36 is a generic questionnaire assessing health status and well-being over the past 4 weeks. It contains 36 items clustered in eight dimensions: physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, and mental health, plus one item assessing change in health status over the previous year [26]. Item scores for each dimension are coded, summed and transformed to a scale from 0 (worst possible health state) to 100 (best possible health state). Its reliability and validity are widely documented across a range of language versions [27, 28].
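The transformation of a summed dimension score to the 0–100 scale can be sketched as a linear rescaling between the lowest and highest possible raw scores. This is the usual form of such a transformation and is assumed here; the text does not give the formula, and the function name is illustrative. The example assumes a dimension of ten items each coded 1–3, giving a raw range of 10–30.

```python
def transform_to_0_100(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a summed raw score to 0 (worst) .. 100 (best)."""
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw score outside the possible range")
    score_range = highest_possible - lowest_possible
    return (raw_score - lowest_possible) / score_range * 100.0


# Assumed example: ten items coded 1-3, so raw scores run from 10 to 30.
print(transform_to_0_100(20, 10, 30))  # 50.0
```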

Translation and cognitive debriefing

Apart from the SF-36, where validated Mandarin translations already exist [29], questionnaires were translated and tested in the Department of Medicine, Faculty of Medicine, at the University of Hong Kong. Literal translation of Hong Kong Chinese into mainland Chinese (Mandarin) was undertaken by investigators and a panel of mainland gastroenterologists so that questionnaires were more interpretable by people from mainland China. This process was followed by cognitive debriefing, where five literate volunteers from mainland China who had a diagnosis of GERD (heartburn and/or acid regurgitation over the past year) completed the translated questionnaires and were interviewed to assess their understanding and interpretation. The overall relevance and clarity of the questionnaire were assessed using defined responses (very low; low; moderate; high; very high) and subjects were asked to specify any items that they regarded as irrelevant or unclear. Subjects considered the questions to be relevant and clear (grading: moderate to very high). No additional revisions were required.

Statistical analysis

Data management

Questionnaire responses were coded and double-entered by two independent professional data-entry staff from the DoHS. EpiData software [30] was used to check for consistency between the two sets of data entries to ensure data quality. For the RDQ, QOLRAD, and SF-36, where at least 50% of items in a dimension were completed, the mean value of the completed items was used to impute the missing values. Where more than 50% of items were missing, the dimension score was excluded from the analysis [31–33]. For the GERD Impact Scale, if an item score was missing, imputation was not performed and the score was excluded from the analysis.
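The half-scale imputation rule for the RDQ, QOLRAD and SF-36 can be expressed as a short function. This is a minimal sketch of the rule as worded above, not the authors' code; the function name and the use of `None` to mark a missing item are assumptions.

```python
def impute_dimension_items(items):
    """Apply the 50% rule to one respondent's items for one dimension.

    `items` holds numeric scores, with None marking a missing answer.
    Returns the completed item list, or None if the dimension is excluded.
    """
    completed = [score for score in items if score is not None]
    # Exclude when more than half of the items are missing.
    if not items or len(completed) * 2 < len(items):
        return None
    mean_value = sum(completed) / len(completed)
    return [mean_value if score is None else score for score in items]


print(impute_dimension_items([3, None, 5, 4]))        # [3, 4.0, 5, 4]
print(impute_dimension_items([1, None, None, None]))  # None
```

A dimension with exactly half of its items completed is imputed rather than excluded, following the "at least 50%" wording.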

SAS 9.1.3 (SAS, Shanghai, China) and SPSS 10.0 software (SPSS Inc., Shanghai, China) were used to complete data analyses. All hypothesis tests were two-sided, with alpha set at 0.05; a P-value of 0.05 or less was considered to indicate statistical significance. Different groups of subjects were compared by ANOVA for normally distributed continuous data, Fisher's exact test for categorical variables and the Cochran-Mantel-Haenszel test for ranked variables.


Internal consistency was evaluated using Cronbach's alpha coefficient to determine the extent to which items within each questionnaire were interrelated [34]. Cronbach's alpha coefficients for each questionnaire were calculated by correlating all individual item scores with dimension scores and/or the overall score. An alpha coefficient above 0.70 suggests good internal consistency and reliability.
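For illustration, Cronbach's alpha can be computed from the item variances and the variance of the summed score. The sketch below uses the standard textbook formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total); it is an assumption that this matches what the statistical packages used in the study computed. Input is one list of item scores per respondent, with no missing values.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete item responses.

    `rows` is a list of respondents, each a list of k item scores.
    Raises ZeroDivisionError if the summed scores have zero variance.
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

With items that move together perfectly, for example `cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])`, the coefficient reaches its maximum of 1.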

Test-retest reliability is a measure of the stability of the instrument under different conditions with the same respondent; in this study, it was assessed by retesting 10% of respondents (n = 40 from each region) 2–7 days after the baseline test. Cohen's kappa coefficient and the intraclass correlation coefficient (ICC) were used to analyze the test-retest reliability of the survey instruments. Cohen's kappa coefficient was used in the analysis of categorical and ranked measurements, while ICC was used to analyze quantitative measurements. A test-retest coefficient above 0.70 was considered acceptable [35].
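As an illustration of the agreement statistic used for categorical measurements, the following sketch computes unweighted Cohen's kappa between baseline and retest answers. Whether the study used a weighted variant for ranked items is not stated, so the unweighted form is an assumption, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa between two paired sets of ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be paired and non-empty")
    n = len(first)
    # Observed proportion of exact agreement between test and retest.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal category frequencies.
    first_counts, second_counts = Counter(first), Counter(second)
    expected = sum(
        first_counts[c] * second_counts[c] for c in first_counts
    ) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (observed - expected) / (1 - expected)


baseline = [0, 0, 1, 1]
retest = [0, 1, 1, 1]
print(cohens_kappa(baseline, retest))  # 0.5
```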

Construct validity

Construct validity evaluates whether an instrument actually measures the construct that it is theoretically intended to measure; correlation and factor analysis were used to evaluate construct validity in this study. Factor analysis using principal component analysis and quartimax rotation explored whether the factor structure of each questionnaire was supported. Factor loadings larger than 0.50 within one dimension were considered to support the factor construct provided the factor loadings were low across the other dimensions, with cumulative rates used to show the contributions of combinations of principal components [36]. Correlation analysis tested the construct validity of questionnaires containing multiple dimensions (i.e. RDQ, QOLRAD and SF-36). The analysis measured the strength of association between dimension scores and the total score for QOLRAD and SF-36 questionnaires, and between item scores and dimension scores for the RDQ. A strong correlation coefficient was considered to be over 0.6, a moderate correlation, 0.3–0.6, and a weak correlation below 0.3 [37].
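The correlation thresholds can be applied mechanically once a rank correlation has been computed. The sketch below implements Spearman's coefficient as a Pearson correlation on average ranks and then labels its strength. Applying the thresholds to the absolute value, so that negative correlations are graded by magnitude, is an assumption; the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def classify_strength(r):
    """Label a coefficient using the thresholds given in the text."""
    size = abs(r)  # assumption: grade negative correlations by magnitude
    if size > 0.6:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"
```

For example, `classify_strength(0.77)` returns `"strong"` and `classify_strength(-0.28)` returns `"weak"`.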

Convergent validity analyzes whether the postulated dimension of an instrument correlates appreciably with all other dimensions from other instruments that should theoretically be related to it. Convergent validity was investigated in this study by correlating the GERD dimension from the RDQ with SF-36 and QOLRAD dimensions, and SF-36 dimensions with QOLRAD total score. A decrease in health-related quality of life was expected for respondents with GERD symptoms.


Response rate

Of the 1200 randomly pre-selected subjects, 1034 agreed to be interviewed (a response rate of 86%). In the Pudong District, a total of 112 respondents' questionnaires were withdrawn from the statistical analysis due to one facilitator's failure to adhere to the study protocol. A further three questionnaires from the Huangpu District were excluded due to incompleteness. Therefore, a total of 919 questionnaires (359 from the urban region, 224 from the suburban region, and 336 from the rural region) were included in the analysis after quality auditing. The mean response rates for items in each questionnaire are provided in Table 1.

Table 1 Mean item response rates by questionnaire and by region.

Of 120 subjects randomly selected for retest, 113 agreed to be re-interviewed (a 94% response rate). Fourteen questionnaires were rejected because they were not completed in line with the study protocol, leaving 99 questionnaires for inclusion in the retest analysis.


The respondents' average age was 47 years (ranging from 18 to 77 years); 55% were female and the majority of respondents (85%) were married. Most respondents did not smoke (74%) or drink alcohol (83%). The average BMI was 22.6 kg/m², with a range of 14.4–36.5 kg/m². Level and years of education, current job type and income level all varied significantly between the three regions (p < 0.0001). Education levels and family income were greatest for the urban region and lowest for the rural region (Table 2), reflecting the socioeconomic divide that exists between urban and rural China. Forty percent of urban respondents were professionals or technicians, while 73% of rural respondents and 44% of suburban respondents were agricultural or fishery workers.

Table 2 Demographics and baseline characteristics of respondents by region.


In the test-retest analysis, Cohen's kappa coefficients ranged from 0.66 to 1.00 for RDQ dimensions, 0.49 to 1.00 for GERD Impact Scale items, and 0.79 to 1.00 for QOLRAD dimensions. The test-retest ICC ranged from 0.69 to 0.97 for seven dimensions of the SF-36 questionnaire, while one (role-emotional) was close to zero (0.01). Internal consistency (indicated by Cronbach's alpha coefficient) ranged from 0.65 to 0.97 for QOLRAD dimensions. For SF-36, seven dimensions ranged from 0.69 to 0.95, while one (social functioning) was 0.31. The test-retest reliability coefficient and total Cronbach's alpha coefficient for each questionnaire are shown in Table 3. All coefficients were ≥ 0.7, demonstrating good reliability and internal consistency for each questionnaire.

Table 3 Reliability of questionnaires.

Construct validity

Each dimension score was highly correlated with the total score for both QOLRAD and SF-36 (p < 0.001), indicating good construct validity. For QOLRAD, Spearman correlation coefficients ranged from 0.77 for physical/social functioning to 0.91 for food and drink problems and for vitality, among respondents reporting symptoms of heartburn and/or regurgitation via the RDQ. For SF-36, Spearman correlation coefficients ranged from 0.53 for social functioning to 0.77 for general health, for the study population as a whole. The RDQ also demonstrated good construct validity (Table 4), with each dimension correlating most strongly with the individual items comprising it (Spearman correlation coefficients 0.62–0.94). Regurgitation items correlated strongly with the GERD dimension as expected, but the weaker correlation with heartburn items may have been due to the low prevalence of heartburn in the Shanghai population.

Table 4 Spearman correlation coefficient between RDQ item score and RDQ dimension score.

Factor analysis was used to explore whether the predicted factor structure of the questionnaire was supported. Credible construct validity was demonstrated for the RDQ, GERD Impact Scale and SF-36 questionnaires. All RDQ items correlated as expected in the factor analysis apart from the frequency and severity of 'pain behind breastbone', which correlated more strongly with the epigastric pain dimension than the heartburn dimension (Table 5). The cumulative rate of the three factors was 72.1%. All GERD Impact Scale items correlated with factors as expected (Table 6). The cumulative rate of the four factors was 78.0%.

Table 5 Factor analysis matrix of RDQ items in three factors (heartburn, regurgitation and epigastric pain).
Table 6 Factor analysis matrix of GERD Impact Scale items in four factors.

For SF-36, the cumulative rate of the eight factors plus health transition item was 71.3%. Most items correlated with factors as expected (see Table 7), with particularly high correlations seen for role-physical and bodily pain dimensions. The physical functioning (PF) items were distributed into two dimensions; PFa included moderate to vigorous activities such as lifting or carrying groceries, climbing several flights of stairs and walking more than one mile, whereas PFb included less strenuous activities such as climbing one flight of stairs, bending, kneeling, walking one or several blocks, and bathing or dressing oneself. The social function dimension was unclear, distributing to mental health and role-emotional dimensions. In addition, two items from the vitality dimension, two from the mental health dimension and one from the physical functioning dimension were distributed into the general health dimension. The three role-emotional items showed a tendency towards distribution into the role-physical dimension, although the correlation coefficients were lower than those for distribution into the expected role-emotional dimension.

Table 7 Factor analysis matrix of SF-36 items in nine factors.

The factor analysis showed that the construct validity of QOLRAD was not as good as expected, as items were not distributed to the appropriate dimensions (Table 8).

Table 8 Factor analysis matrix of QOLRAD items in five factors.

Convergent validity

The RDQ GERD score was negatively correlated with all QOLRAD dimensions; correlations were statistically significant for the QOLRAD dimensions of food and drink problems (p = 0.037, correlation coefficient -0.28) and social functioning (p = 0.003, correlation coefficient -0.39). The RDQ GERD score was also significantly negatively correlated with all dimensions of SF-36 (p ≤ 0.001). SF-36 correlation coefficients ranged from -0.11 (social functioning) to -0.34 (bodily pain). Correlations were negative because health-related quality of life decreases as symptoms and their impact increase.

The RDQ GERD score correlated most strongly with bodily pain (the SF-36 dimension most impaired by GERD in previous studies), reflecting the fact that GERD is primarily a painful disease. All eight SF-36 dimensions were significantly correlated with the QOLRAD total score (p ≤ 0.001, correlation coefficients ranged from 0.16–0.29), supporting the construct validity of QOLRAD and SF-36.


This pilot study used several well-designed questionnaires, administered together, with the aim of developing and validating a methodology for the epidemiological study of GERD in China. Using a randomized, stratified, multi-stage cluster sampling technique, we validated Chinese translations of the SF-36, QOLRAD questionnaire, GERD Impact Scale and RDQ. In this study, the translated and adapted questionnaires demonstrated reproducibility and internal consistency within the methodology adopted, although responsiveness was not assessed. Each questionnaire had a test-retest reliability coefficient larger than 0.7 and a high Cronbach's alpha coefficient (≥ 0.8), suggesting good reliability. The construct validity of questionnaires was also credible in this survey, although the QOLRAD did not perform well in the factor analysis. This was likely to be due to linguistic and cultural translation problems: facilitators considered that some items were difficult to explain to respondents, particularly for those with a low level of education.

The sampling and administration techniques contributed substantially to the success of this study. By gaining the support of local residential communities, a high response rate of 86% was achieved, which is likely to have limited responder bias. The provision of assistance from trained facilitators helped avoid potential cultural and linguistic confusion, providing a relatively precise interpretation of the items in the questionnaire, and is recommended for future epidemiological studies using this survey instrument in order to ensure accuracy.

Chinese translations of the SF-36 have previously undergone psychometric validation among Chinese-speaking peoples in mainland China, the USA, Hong Kong and Taiwan [29, 38–41]. These studies demonstrated satisfactory psychometric characteristics for SF-36 in these groups, while highlighting a level of cultural variation between Western and Chinese versions and between the different Chinese cultures. There is a tendency, also reflected in the current study, for the social functioning dimension to perform less well in China [29]; Li and colleagues have commented that this points to the Confucian ideology of collectivism in China, where it is socially unacceptable for Chinese to use 'sickness' as an excuse to avoid working or socializing [29].

In several previous studies vitality was more strongly associated with mental health than physical health [29, 38–40], which may relate to traditional Chinese medicine, where fatigue associated with depression is conceptualized as a deficiency of vital energy or 'qi'. Although this was not the case in the current study, two items in the vitality dimension were more strongly distributed to general health. These issues illustrate the importance of examining the psychometric validity of instruments in different ethnic groups with cultural differences in language, values and perceptions of health.

This study has several limitations. Some subjects found the combined questionnaire too long and repetitive: a general information questionnaire, the RDQ, GERD Impact Scale, QOLRAD and SF-36 combined to make a total of 137 items and, on average, the questionnaire took about 20 minutes to complete. Responsiveness to change and known-groups validity were not assessed. Where construct validity was assessed, the different recall periods for individual questionnaires may have weakened convergent correlation results, while the short retest period may distort the reliability analysis where respondents remember their previous responses. The methodology was unable to sample migrant workers, who make up a significant portion of the Shanghai population, as they remain officially registered in their place of origin.


The experience gained in this pilot study will inform a planned larger study of the epidemiology of GERD across mainland China, which will establish the wider prevalence of GERD symptoms in China using representative study populations and a standardized, well-validated methodology. The survey questionnaire will be reduced in length and simplified, and symptoms will be assessed using the RDQ with a longer recall period (4 weeks). The QOLRAD questionnaire will be removed from the survey, due to its relatively poor performance in the factor analysis. Ideally, responsiveness to change and known-groups validity should be studied to investigate further the validity of the survey instruments. Health-related quality of life will be evaluated using the SF-36, and sleep disturbance will be investigated using the Epworth Sleepiness Scale (ESS). Endoscopic examination of randomly sampled subjects would also be informative, to allow comparison with recent studies conducted in the West [42, 43].

In summary, this study developed and tested a successful survey methodology for the epidemiological study of GERD in China. The questionnaires used demonstrated credible reliability and construct validity, supporting their use in larger epidemiological surveys of GERD in China, and allowing the results of this study to be extrapolated to the general population of East China.


  1. Vakil N, van Zanten SV, Kahrilas P, Dent J, Jones R: The Montreal definition and classification of gastroesophageal reflux disease: a global evidence-based consensus. Am J Gastroenterol. 2006, 101: 1900-20; quiz 1943. 10.1111/j.1572-0241.2006.00630.x.

  2. Dent J, El-Serag HB, Wallander MA, Johansson S: Epidemiology of gastro-oesophageal reflux disease: a systematic review. Gut. 2005, 54: 710-717. 10.1136/gut.2004.051821.

  3. El-Serag HB: Time trends of gastroesophageal reflux disease: a systematic review. Clin Gastroenterol Hepatol. 2007, 5: 17-26. 10.1016/j.cgh.2006.09.016.

  4. Wong WM, Lim P, Wong BC: Clinical practice pattern of gastroenterologists, primary care physicians, and otolaryngologists for the management of GERD in the Asia-Pacific region: the FAST survey. J Gastroenterol Hepatol. 2004, 19 Suppl 3: S54-60. 10.1111/j.1440-1746.2004.03590.x.

  5. Wong BC, Kinoshita Y: Systematic review on epidemiology of gastroesophageal reflux disease in Asia. Clin Gastroenterol Hepatol. 2006, 4: 398-407. 10.1016/j.cgh.2005.10.011.

  6. McColl E, Junghard O, Wiklund I, Revicki DA: Assessing symptoms in gastroesophageal reflux disease: how well do clinicians' assessments agree with those of their patients?. Am J Gastroenterol. 2005, 100: 11-18. 10.1111/j.1572-0241.2005.40945.x.

  7. Spechler SJ, Jain SK, Tendler DA, Parker RA: Racial differences in the frequency of symptoms and complications of gastro-oesophageal reflux disease. Aliment Pharmacol Ther. 2002, 16: 1795-1800. 10.1046/j.1365-2036.2002.01351.x.

  8. Ministry of Education: Language survey of 500,000 Chinese. 2007, Beijing, Xinhua

  9. Wu J: Status of the population of China, 2002: Analysis of market and population (Chinese). 2003, 9: 45-48.

  10. Wang JH, Luo JY, Dong L, Gong J, Tong M: Epidemiology of gastroesophageal reflux disease: a general population-based study in Xi'an of Northwest China. World J Gastroenterol. 2004, 10: 1647-1651.

  11. Wong WM, Lai KC, Lam KF, Hui WM, Hu WH, Lam CL, Xia HH, Huang JQ, Chan CK, Lam SK, Wong BC: Prevalence, clinical spectrum and health care utilization of gastro-oesophageal reflux disease in a Chinese population: a population-based study. Aliment Pharmacol Ther. 2003, 18: 595-604. 10.1046/j.1365-2036.2003.01737.x.

  12. Parker C, Dewey M: Assessing research outcomes by postal questionnaire with telephone follow-up. TOTAL Study Group. Trial of Occupational Therapy and Leisure. Int J Epidemiol. 2000, 29: 1065-1069. 10.1093/ije/29.6.1065.

  13. Pan G, Xu G, Ke MY, Han S, Guo H, Li Z, Fang X, Zou D, Lu SR, Liu J: Epidemiological study of symptomatic gastroesophageal reflux disease in China: Beijing and Shanghai. Chin J Dig Dis. 2000, 1: 2-8. 10.1046/j.1443-9573.2000.00001.x.

  14. Chen M, Xiong L, Chen H, Xu A, He L, Hu P: Prevalence, risk factors and impact of gastroesophageal reflux disease symptoms: a population-based study in South China. Scand J Gastroenterol. 2005, 40: 759-767. 10.1080/00365520510015610.

  15. World Health Organisation: The World Health Survey (WHS). Sampling guidelines for participating countries. []

  16. Campbell MJ, Julious SA, Altman DG: Estimating sample sizes for binary, ordered categorical, and continuous outcomes in two group comparisons. BMJ. 1995, 311: 1145-1148.

  17. Shaw MJ, Talley NJ, Beebe TJ, Rockwood T, Carlsson R, Adlis S, Fendrick AM, Jones R, Dent J, Bytzer P: Initial validation of a diagnostic questionnaire for gastroesophageal reflux disease. Am J Gastroenterol. 2001, 96: 52-57. 10.1111/j.1572-0241.2001.03451.x.

  18. Veldhuyzen van Zanten SV, Armstrong D, Barkun A, Junghard O, White RJ, Wiklund IK: Symptom overlap in patients with upper gastrointestinal complaints in the Canadian confirmatory acid suppression test (CAST) study: further psychometric validation of the reflux disease questionnaire. Aliment Pharmacol Ther. 2007, 25: 1087-1097.

  19. Johnsson F, Hatlebakk J, Klintenberg AC, Roman J: The symptom relieving effect of esomeprazole 40 mg daily in patients with heartburn. Scand J Gastroenterol. 2003, 38 (4): 347-353. 10.1080/00365520310002157.

  20. Dent J, Vakil N, Jones R, Reimitz PE, Schöning U, Halling K, Junghard O, Lind T: Validation of the reflux disease questionnaire for the diagnosis of gastroesophageal reflux disease in primary care. Gut. 2007, 56: A75-

  21. Chinese Gastroesophageal Reflux Disease Study Group: Value of reflux diagnostic questionnaire in the diagnosis of gastroesophageal reflux disease. Chin J Dig Dis. 2004, 5: 51-55. 10.1111/j.1443-9573.2004.00155.x.

  22. Jones R, Coyne K, Wiklund I: The Gastroesophageal Reflux Disease Impact Scale - a patient management tool for primary care. Aliment Pharmacol Ther. 2007, 25: 1451-1459.

  23. Wiklund IK, Junghard O, Grace E, Talley NJ, Kamm M, Veldhuyzen van Zanten S, Pare P, Chiba N, Leddin DS, Bigard MA, Colin R, Schoenfeld P: Quality of Life in Reflux and Dyspepsia patients. Psychometric documentation of a new disease-specific questionnaire (QOLRAD). Eur J Surg Suppl. 1998, 41-49. 10.1080/11024159850191238.

  24. Kulich KR, Wiklund I, Junghard O: Factor structure of the Quality of Life in Reflux and Dyspepsia (QOLRAD) questionnaire evaluated in patients with heartburn predominant reflux disease. Qual Life Res. 2003, 12: 699-708. 10.1023/A:1025192100450.

  25. Talley NJ, Fullerton S, Junghard O, Wiklund I: Quality of life in patients with endoscopy-negative heartburn: reliability and sensitivity of disease-specific instruments. Am J Gastroenterol. 2001, 96: 1998-2004. 10.1111/j.1572-0241.2001.03932.x.

  26. Ware JE, Sherbourne CD: The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992, 30: 473-483. 10.1097/00005650-199206000-00002.

  27. Ware JE, Gandek B, Kosinski M, Aaronson NK, Apolone G, Brazier J, Bullinger M, Kaasa S, Leplege A, Prieto L, Sullivan M, Thunedborg K: The equivalence of SF-36 summary health scores estimated using standard and country-specific algorithms in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol. 1998, 51: 1167-1170. 10.1016/S0895-4356(98)00108-5.

  28. Ware JE, Kosinski M, Gandek B, Aaronson NK, Apolone G, Bech P, Brazier J, Bullinger M, Kaasa S, Leplege A, Prieto L, Sullivan M: The factor structure of the SF-36 Health Survey in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol. 1998, 51: 1159-1165. 10.1016/S0895-4356(98)00107-3.

  29. Li L, Wang HM, Shen Y: Chinese SF-36 Health Survey: translation, cultural adaptation, validation, and normalisation. J Epidemiol Community Health. 2003, 57: 259-263. 10.1136/jech.57.4.259.

  30. Lauritsen JM: EpiData data entry, data management and basic statistical analysis system. [Http://]

  31. Ware JE, Snow KK, Kosinski M, Gandek B: SF-36 health survey manual and interpretation guide. 1993, Boston, New England Medical Center, The Health Institute

  32. McHorney CA, Ware JE, Lu JF, Sherbourne CD: The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care. 1994, 32: 40-66. 10.1097/00005650-199401000-00004.

  33. Gandek B, Ware JE, Aaronson NK, Alonso J, Apolone G, Bjorner J, Brazier J, Bullinger M, Fukuhara S, Kaasa S, Leplege A, Sullivan M: Tests of data quality, scaling assumptions, and reliability of the SF-36 in eleven countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol. 1998, 51: 1149-1158. 10.1016/S0895-4356(98)00106-1.

  34. Cronbach LJ: Coefficient alpha and the internal structure of tests. Psychometrika. 1951, 16: 297-334. 10.1007/BF02310555.

  35. Fitzpatrick R, Davey C, Buxton MJ, Jones DR: Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess. 1998, 2: i-iv, 1-74.

  36. Fang J: Medical statistics and computer experiments. 2005, Singapore, Stallion Press

  37. Hinkle DE, Jurs SG, Wiersma W: Applied statistics for the behavioural sciences. 1988, Boston, Houghton Mifflin, 2nd

  38. Chang DF, Chun CA, Takeuchi DT, Shen H: SF-36 health survey: tests of data quality, scaling assumptions, and reliability in a community sample of Chinese Americans. Med Care. 2000, 38: 542-548. 10.1097/00005650-200005000-00010.

  39. Ren XS, Amick B, Zhou L, Gandek B: Translation and psychometric evaluation of a Chinese version of the SF-36 Health Survey in the United States. J Clin Epidemiol. 1998, 51: 1129-1138. 10.1016/S0895-4356(98)00104-8.

  40. Tseng HM, Lu JF, Gandek B: Cultural Issues in Using the SF-36 Health Survey in Asia: Results from Taiwan. Health Qual Life Outcomes. 2003, 1: 72. 10.1186/1477-7525-1-72.

  41. Lam CL, Gandek B, Ren XS, Chan MS: Tests of scaling assumptions and construct validity of the Chinese (HK) version of the SF-36 Health Survey. J Clin Epidemiol. 1998, 51: 1139-1147. 10.1016/S0895-4356(98)00105-X.

  42. Ronkainen J, Aro P, Storskrubb T, Johansson SE, Lind T, Bolling-Sternevald E, Graffner H, Vieth M, Stolte M, Engstrand L, Talley NJ, Agreus L: High prevalence of gastroesophageal reflux symptoms and esophagitis with or without symptoms in the general adult Swedish population: a Kalixanda study report. Scand J Gastroenterol. 2005, 40: 275-285. 10.1080/00365520510011579.

  43. Zagari RM, Fuccio L, Wallander MA, Johansson S, Fiocca R, Casanova S, Farahmand BY, Winchester CC, Roda E, Bazzoli F: Gastro-oesophageal reflux symptoms, oesophagitis and Barrett's oesophagus in the general population: Loiano-Monghidoro study. Gut. 2008




This study was supported by AstraZeneca R&D, Mölndal, Sweden. We thank the participating general practitioners for their collaboration, and the Centers of Disease Control and Prevention of Huangpu District, Songjiang District and Pudong District in Shanghai for their assistance with the field work. We thank Dr Chris Winchester and Dr Claire Mulligan, from Oxford PharmaGenesis, who provided medical writing support funded by AstraZeneca. We also thank Dr Benjamin Wong and Dr Xiaohua Jin for translating the questionnaires used in this study.

Author information



Corresponding author

Correspondence to Jia He.

Additional information

Competing interests

This study was supported by AstraZeneca R&D, Mölndal, Sweden. Writing support was provided by Chris Winchester and Claire Mulligan of Oxford PharmaGenesis and funded by AstraZeneca R&D, Mölndal, Sweden. Jia He has served as the director of the Department of Health Statistics, Second Military Medical University and WHO/TDR Clinical Data Management Center, Shanghai, China, and also served as a director of the Chinese Biomedicine Statistics Institute. Jia He has received research funding from the National Natural Science Foundation of China, WHO and Shanghai Natural Science Foundation. Saga Johansson is an employee of AstraZeneca R&D, Mölndal, Sweden, and Mari-Ann Wallander was an employee of AstraZeneca R&D, Mölndal, Sweden at the time of the study.

Authors' contributions

YC and XY participated in the acquisition of data, analysis and interpretation of data, and drafting the article. XQM and RW participated in the analysis and interpretation of data, and drafting and critically revising the article. SJ and MAW participated in the conception and design of the study, and critically revising the article. JH made substantial contributions to the conception and design of the study, supervised all aspects of its implementation, and critically revised the article. All authors read and approved the final manuscript.

Yang Cao and Xiaoyan Yan contributed equally to this work.



About this article

Cite this article

Cao, Y., Yan, X., Ma, X. et al. Validation of a survey methodology for gastroesophageal reflux disease in China. BMC Gastroenterol 8, 37 (2008).



  • Reflux Symptom
  • Dimension Score
  • Reflux Disease Questionnaire
  • Health Transition Item
  • General Information Questionnaire