Risk factors for oesophageal squamous dysplasia in adult inhabitants of a high risk region of China
  1. W-Q Wei1,
  2. C C Abnet2,
  3. N Lu1,
  4. M J Roth2,
  5. G-Q Wang1,
  6. B A Dye3,
  7. Z-W Dong1,
  8. P R Taylor2,
  9. P Albert4,
  10. Y-L Qiao1,
  11. S M Dawsey2
  1. 1Department of Cancer Epidemiology, Cancer Institute, Chinese Academy of Medical Sciences, Beijing, People’s Republic of China
  2. 2Cancer Prevention Studies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland, USA
  3. 3Centers for Disease Control and Prevention, National Center for Health Statistics, Hyattsville, Maryland, USA
  4. 4Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, Maryland, USA
  1. Correspondence to:
    Dr C Abnet
    Cancer Prevention Studies Branch, 6116 Executive Blvd, Rm 705, Bethesda, MD 20892-8314, USA; abnetc{at}


Background: Oesophageal squamous cell carcinoma (OSCC) is a common cancer worldwide and has a very high mortality rate. Squamous dysplasia is the precursor lesion for OSCC and it can be seen during routine endoscopy with Lugol’s iodine staining. We aimed to examine the risk factors for squamous dysplasia and determine if a risk model could be constructed which would be useful in selecting apparently healthy subjects for endoscopic screening in a high risk population in Linzhou, People’s Republic of China.

Subjects and methods: In this cross sectional study, 724 adult volunteers aged 40–65 years were enrolled. All subjects completed a questionnaire regarding potential environmental exposures, received physical and dental examinations, and underwent upper endoscopy with Lugol’s iodine staining and biopsy. Subjects were categorised as having or not having histologically proven squamous dysplasia/early cancer. Risk factors for dysplasia were examined using univariate and multivariate logistic regression. The utility of the final multivariate model as a screening tool was assessed using a receiver operating characteristics curve.

Results: We found that 230 of 720 subjects (32%) with complete data had prevalent squamous dysplasia. In the final multivariate model, more household members (odds ratio (OR) 1.12/member (95% confidence interval (CI) 0.99–1.25)), a family history of cancer (OR 1.57 (95% CI 1.13–2.18)), higher systolic blood pressure (OR 1.11/10 mm Hg (95% CI 1.03–1.19)), heating the home without a chimney (OR 2.22 (95% CI 1.27–3.86)), and having lost more but not all of your teeth (OR 1.91 for 12–31 teeth lost (95% CI 1.17–3.15)) were associated with higher odds of having dysplasia. Higher household income (OR 0.96/100 RMB (95% CI 0.91–1.00)) was associated with lower odds of having dysplasia. Although we found several statistically significant associations, the final model had little ability to accurately predict dysplasia status, with maximum simultaneous sensitivity and specificity values of 57% and 54%, respectively.

Conclusions: We found that risk factors for dysplasia were similar to those previously identified as risk factors for OSCC in this population. The final model did a poor job of identifying subjects who had squamous dysplasia. Other methods will need to be developed to triage individuals to endoscopy in this high risk population.

  • OSCC, oesophageal squamous cell carcinoma
  • NHANES, National Health and Nutrition Examination Survey
  • ROC, receiver operating characteristic
  • PAH, polycyclic aromatic hydrocarbons
  • OR, odds ratio
  • oesophageal cancer
  • dysplasia
  • tooth loss
  • cancer screening
  • China

Oesophageal cancer is the eighth most common incident cancer worldwide and the sixth most common cause of cancer death.1 The distensibility of the oesophagus allows tumours to grow and spread before inducing dysphagia, the typical first symptom in oesophageal cancer patients. Therefore, most patients present too late for therapy with curative intent. To reduce the burden of oesophageal cancer it will be necessary to develop non-invasive, patient acceptable screening tests that can be used in high risk populations and subpopulations.

Linzhou (formerly Linxian) China has some of the highest rates of oesophageal squamous cell carcinoma (OSCC) and clinically similar gastric cardia adenocarcinoma in the world. Age standardised incidence rates for both sexes exceed 100/100 000/year,2 making Linzhou an ideal place to study early detection methods for OSCC. Previous studies in this population have demonstrated that squamous dysplasia is the precursor lesion for OSCC3,4 and that endoscopy with Lugol’s iodine staining effectively identifies subjects with dysplasia.5 Endoscopic screening of the entire at risk population would be a massive undertaking and a method of triage would facilitate screening by selecting those subjects at highest risk for dysplasia. Models that predict an individual’s risk of having/developing a disease based on simple personal characteristics, information available in a brief interview, and physical examination have been developed for several conditions, including breast cancer6 and heart disease.7

We conducted a population based screening study in adult volunteers from Linzhou who underwent a battery of tests and provided biological samples for the testing of novel screening methods. The current state of their oesophageal health was determined using endoscopy with Lugol’s iodine staining and biopsy in all participants. Here we report risk factors for oesophageal squamous dysplasia and our attempt to develop a risk model for dysplasia/asymptomatic OSCC based on easily gathered personal characteristics.



A screening study comprising 724 adults, the Cytology Sampling Study 2, was conducted in Yaocun commune, Linzhou, Henan Province, People’s Republic of China, in the spring of 2002. This study was conducted under the auspices of the Institutional Review Boards of the Cancer Institute, Chinese Academy of Medical Sciences and the US National Cancer Institute, and all subjects provided written informed consent. Subjects were volunteers from three villages, aged 40–65 years, who were apparently healthy and had no contraindications for endoscopy. Initially, one study investigator visited the leader and health care worker in each of the villages to notify them of the impending study and to arrange the date the study would take place. All eligible subjects in each village were invited to participate and 41%, 25%, and 14% of eligible subjects were enrolled from villages 1, 2, and 3, respectively.

Questionnaire, physical examination, and ethanol flushing response

All subjects completed a structured questionnaire that included personal characteristics (place of birth, education, health history, etc), habits (cigarette and pipe smoking, alcohol consumption, etc), and living conditions (people in household, household income, heating and cooking stoves and fuels, etc). The questionnaire was based on information previously found or suspected to be associated with OSCC in the population and others. Subjects also received a brief physical examination which included a single blood pressure measurement using an automated sphygmomanometer. The numbers of subjects in the household and monthly household income were collected and a monthly income per capita variable was calculated as a ratio of the two. Subjects were also phenotyped for the ethanol induced flushing response (an indication of elevated acetaldehyde concentrations and associated with a polymorphism in the ALDH2 gene) using a modification of a previously published method.8 Briefly, two small adhesive bandages were opened and the pad was wetted with either water or 100% ethanol. The bandages were placed on the forearm for 10 minutes and then removed. Subjects were considered positive when water caused no change while ethanol produced a small red welt.

Dental examinations

All subjects received a comprehensive dental examination using protocols derived from the current National Health and Nutrition Examination Survey (NHANES 1999–2004). Details of the current methods used on NHANES for assessing oral health can be found at: Four dental examiners, two Chinese dentists and two US Public Health Service dentists, completed all of the oral health examinations. Three dental examiners were trained by the reference examiner (BD) who is also the trainer and reference examiner for the current NHANES. The tooth count assessment involved examining the maxillary and mandibular arches to identify the presence or absence of permanent teeth. All teeth, including third molars, were assessed. Missing teeth were identified as not present regardless of reason. Permanent retained dental roots were identified separately. Inter-rater reliability for dentate status was considered to be excellent, with Kappa statistics >0.90 and per cent agreements >94.0%.

Endoscopy, biopsy, and histology

Endoscopy with Lugol’s iodine staining and biopsy were performed as previously described.9 During endoscopy, the entire oesophagus and stomach were visually examined, and one or more 2.8 mm biopsies were taken from all grossly abnormal appearing lesions. The entire oesophagus was sprayed with Lugol’s iodine solution and unstained areas were biopsied. If no focal lesions or unstained lesions were found, a standard site (25 cm from the incisors at the 6 o’clock position) in the mid oesophagus was sampled. Biopsies were fixed in 95% ethanol, embedded in paraffin, cut into 5 μm sections, and stained with haematoxylin and eosin. Biopsy slides were read independently by two pathologists (NL, SMD), without knowledge of the patient’s history or visual endoscopic findings. Histological criteria were based on previous descriptions.10 All 724 subjects had at least one technically sufficient biopsy.

Statistical analysis

A number of other questionnaire items were collected but are not presented. Data on consumption of particular dietary items were excluded because there was very little variation in responses. Other data were dropped because there was substantial overlap in both the topic of the question and in the response with the presented results. For example, a subset of subjects who responded that they heated without a chimney also reported that the type of fuel they used to heat their home was smoky. In this case the latter variable was dropped because subjects responding in the affirmative were similar to the former and because smokiness is subjective while the presence of a chimney is objective.

All results were tabulated by dysplasia status. For this analysis all subjects were categorised as either no dysplasia (normal, basal cell hyperplasia, or oesophagitis) or dysplasia (any grade of dysplasia or early cancer). Smoking was almost exclusively restricted to males, and because it was generally light it was dichotomised as “ever regularly for ⩾ 6 months” versus less than that. Similarly, drinking was dichotomised as any ethanol consumption in the previous 12 months versus none, because of minimal consumption. Blood pressure was scaled to 10 mm Hg. Family history of cancer was considered positive if the subject reported any cancer in first degree relatives but almost all cases were oesophageal or gastric tumours (95% of all positive responses). Tooth loss was divided into empirical quintiles, with edentulism set as the fifth quintile a priori. Household income was scaled to 100 Renminbi (RMB). Heating systems were divided into those that produced smoky homes (heating without chimney) and those that did not (heating with chimney or no heating stove).
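The coding rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names are invented here, and the tooth loss cut points 3 and 6 are hypothetical placeholders (only the 12–31 teeth lost category is reported in the abstract).

```python
from bisect import bisect_right

TOTAL_TEETH = 32  # all permanent teeth, including third molars


def per_unit(value, unit):
    """Rescale a continuous variable, e.g. mm Hg per 10 or RMB per 100."""
    return value / unit


def smoky_heating(has_heating_stove, has_chimney):
    """True only for heating without a chimney; no stove counts as not smoky."""
    return has_heating_stove and not has_chimney


def tooth_loss_category(teeth_lost, cut_points):
    """Return a category from 1 to 5.

    Edentulous subjects are assigned to category 5 a priori; dentate
    subjects fall into categories 1-4 according to three ascending
    cut points (each cut point is the lowest value of the next category).
    """
    if teeth_lost >= TOTAL_TEETH:
        return 5
    return bisect_right(cut_points, teeth_lost) + 1


# Hypothetical cut points: 0-2, 3-5, 6-11, and 12-31 teeth lost.
example_cut_points = (3, 6, 12)
systolic_scaled = per_unit(135, 10)                     # 13.5 units of 10 mm Hg
category = tooth_loss_category(14, example_cut_points)  # falls in 12-31 lost
```

Setting edentulism aside before applying the cut points mirrors the a priori choice described in the text, so the empirical cut points only ever divide the dentate subjects.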

Univariate associations were examined using logistic regression, and p values were derived from likelihood ratio tests for the addition of the variable to a null model. Four of 724 subjects were excluded because they were missing data on one or more of the variables in the multivariate model. The multivariate model was built using the following system. Variables were selected for initial inclusion because of previously reported association with OSCC and/or a p value <0.25 in the univariate models. The preliminary model was fit and variables were removed one at a time if they showed a p value >0.25 and changed the betas for variables that remained in the model by <10%. After stepwise removal of variables, we checked the linearity of continuous variables by adding quadratics and found no evidence of non-linearity. The final model was stratified by the three villages of residence and a summary odds ratio (OR) calculated, but this did not have a major impact on the final ORs. We examined a number of potential interactions but found that none was statistically significant. We tested the model for goodness of fit using the Pearson (p = 0.37) and Hosmer-Lemeshow (p = 0.18) tests and concluded that the model fitted well. Diagnostic plots suggested no strongly influential points. p values for each term in the final model were calculated using likelihood ratio tests constructed by comparing the likelihoods with and without the term of interest in the model. Individual risks were calculated by back transforming the linear combination of estimated regression coefficients and individual specific covariates using an inverse logit transformation. A receiver operating characteristic (ROC) curve was drawn with data from a classification table using probability cut offs from 0 to 1, with 0.1 intervals.
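The last two steps described above, back transforming a linear predictor into an individual risk and building a classification table at cut offs from 0 to 1 in steps of 0.1, can be sketched as follows. This is a minimal illustration in Python rather than the analysis code used in the study; the function names, the dictionary layout for coefficients, and the rule that a risk equal to the cut off counts as positive are all assumptions made for the example.

```python
import math


def inverse_logit(linear_predictor):
    """Map a value on the log odds scale to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def predicted_risk(intercept, coefficients, covariates):
    """Risk for one subject from fitted coefficients and that subject's data.

    `coefficients` and `covariates` are dicts keyed by variable name
    (an assumed layout, chosen only for readability).
    """
    linear = intercept + sum(
        beta * covariates[name] for name, beta in coefficients.items()
    )
    return inverse_logit(linear)


def classification_table(risks, has_dysplasia, n_steps=10):
    """Sensitivity and specificity at cut offs 0, 0.1, ..., 1.

    A subject is called positive when the modelled risk is at or above
    the cut off. Both outcome groups must contain at least one subject.
    """
    rows = []
    for i in range(n_steps + 1):
        cutoff = i / n_steps
        tp = fp = tn = fn = 0
        for risk, diseased in zip(risks, has_dysplasia):
            positive = risk >= cutoff
            if diseased and positive:
                tp += 1
            elif diseased:
                fn += 1
            elif positive:
                fp += 1
            else:
                tn += 1
        rows.append((cutoff, tp / (tp + fn), tn / (tn + fp)))
    return rows
```

Plotting sensitivity against 1 − specificity for the rows returned by `classification_table` gives the ROC curve; with only eleven cut offs the curve is a coarse polyline, which matches the description in the text.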


Table 1 presents the number of subjects recruited and the univariate associations between a large number of personal characteristics, habits, and living conditions, and the presence of histologically proven dysplasia.

Table 1

 Analytic cohort for the Cytology Sampling Study 2 and univariate differences by dysplasia status

ORs and 95% confidence interval (CI) from a multivariate model for the odds of having prevalent dysplasia are presented in table 2. Age, sex, smoking, and the flushing response were not significantly associated with the odds of having dysplasia but they acted as confounders for other variables retained in the model. More household members, a family history of cancer, higher systolic blood pressure, heating your home without a chimney, and having lost more but not all of your teeth were associated with higher odds of having dysplasia. Higher household income was associated with lower odds of having dysplasia.
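As a worked illustration of how a scaled odds ratio is read, take the systolic blood pressure estimate of 1.11 per 10 mm Hg reported in the abstract. Assuming the log odds are linear in blood pressure, as the fitted model specifies, the OR for a difference of Δ mm Hg is multiplicative in Δ. The symbols β (the per mm Hg coefficient) and Δ are notation introduced here for the example, not taken from the paper.

```latex
\mathrm{OR}(\Delta) = e^{\beta \Delta}, \qquad
\mathrm{OR}(10\ \text{mm Hg}) = e^{10\beta} = 1.11
\;\Longrightarrow\;
\mathrm{OR}(20\ \text{mm Hg}) = e^{20\beta} = 1.11^{2} \approx 1.23
```

The same reading applies to household income, where the reported OR is per 100 RMB.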

Table 2

 Odds ratios (OR) and 95% confidence interval (CI) for prevalent squamous dysplasia in the Cytology Sampling Study 2 cohort calculated in the multivariate model*

The final model, including the variables presented in table 2, was used to estimate each subject’s risk of having dysplasia based on their individual data. The distribution of the calculated risk of dysplasia is presented in fig 1. The mode was 20–30% risk of having dysplasia and only a small proportion of the cohort was at an estimated risk of greater than 50%. Using different risk probability cut offs for a positive result, we created an ROC curve for our final multivariate model (fig 2). The area under the curve was very low (58%) and this figure demonstrates the poor prognostic value of our model. Because of the low performance of our model, we did not use a more formal process consisting of model building and model testing data subsets. A formal leave one out cross validation or training and test set paradigm would result in a lower estimate of the area under the curve.
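The text does not state how the area under the curve was computed. One common approach, shown here purely as an assumption, is the trapezoid rule applied to the (1 − specificity, sensitivity) points of the ROC curve. The function name and the example points are illustrative and are not the study data.

```python
def area_under_roc(points):
    """Trapezoid rule area under an ROC curve.

    `points` is an iterable of (false_positive_rate, sensitivity) pairs
    and should include the corners (0, 0) and (1, 1).
    """
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# The 45 degree reference line (no information) has an area of 0.5.
no_information = area_under_roc([(0.0, 0.0), (1.0, 1.0)])
```

On this scale a value of 0.5 corresponds to the broken reference line in fig 2 and 1.0 to a perfect classifier, which is why an area of 58% indicates little discriminating ability.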

Figure 1

 Distribution of the modelled probability of squamous dysplasia in the entire Cytology Sampling Study 2 cohort. This histogram presents the distribution of the modelled probability of a subject having prevalent squamous dysplasia. The model included variables for age, sex, smoking status, ethanol patch test flushing response, number of persons in the subject’s household, household income, family history of cancer, systolic blood pressure, heating stove type, and quintile of tooth loss.

Figure 2

 Receiver operating characteristic (ROC) curve for prediction of current dysplasia using the multivariate model. This ROC curve presents the utility of the multivariate model at different cut offs based on a classification table using probability cut offs from 0 to 1, with 0.1 intervals (see methods). The solid line and symbols are the model performance and the broken line is the 45° reference line for no information.


In the year 2000, oesophageal cancer was estimated to cause 338 000 deaths worldwide, ranking just below breast cancer, which caused an estimated 373 000 deaths.1 Eighty per cent of these deaths were in developing countries. Approximately 50% of all cases occur in the People’s Republic of China, and within China they are concentrated in a high risk area in north-central China in and around the Taihang mountain range which includes about 100 million people. Endoscopy with Lugol’s iodine staining has high sensitivity and specificity for high grade squamous dysplasia and undiagnosed OSCC but endoscopic screening of the entire at risk population is impractical. Therefore, we are searching for a way to triage people to select those who are most likely to have preneoplastic lesions or asymptomatic cancer. A simple way that has proved useful for breast cancer,6 heart disease,7 and other diseases has been to develop a statistical model based on personal characteristics and other easily gathered information. We conducted a screening study in a high risk community in China to develop data and samples useful for testing the sensitivity and specificity of different triage methods. We used the data we collected to examine risk factors for dysplasia and to attempt to build a simple risk model.

Our multivariate model contained 10 items, most of which have previously been associated with OSCC in this population. Decreasing number of persons in the household and increasing household income may both be indicators of higher socioeconomic status. Family history of cancer, mostly upper gastrointestinal cancer in this population, has been a consistent risk factor in many studies.11–13 A mechanism for the apparent association between systolic blood pressure and dysplasia is unclear and elevated blood pressure has not previously been associated with OSCC, but hypertension is widespread and stroke is the leading cause of death in this region.

We previously hypothesised that polycyclic aromatic hydrocarbons (PAH) may play a role in the high rate of OSCC in this population but we have not previously tested the relevance of how subjects in Linzhou heat their homes. The association between increased risk of squamous dysplasia and the use of heating stoves without chimneys is consistent with our PAH hypothesis. In Linzhou, most homes are heated with coal that is often burned in unvented stoves, which contaminates the air and can contaminate food. Uncooked food was found to be contaminated with PAHs.14 Also, oesophagectomy specimens have shown signs of PAH contamination15 and residents of Linzhou have been shown to have high concentrations of 1-hydroxy pyrene in their urine, a metabolite of benzo-[a]-pyrene.16

We previously reported that tooth loss was a risk factor for OSCC and gastric cancer in this population.17 In that analysis we did not separate out edentulous subjects as a separate category. Our primary hypothesis for a mechanistic explanation of this association is that poor oral health is associated with a bacterial environment that has a greater propensity to reduce nitrate to nitrite and thereby increase exposure to carcinogenic nitrosamines formed endogenously.18 Edentulism has been hypothesised to nullify this increased risk by altering the oral microbial ecology and the conditions that support the overgrowth of some putative bacteria.19 Our current results bolster this hypothesis.

Six of the 10 variables in our multivariate model had p values near or below the 0.05 level but none of the ORs was large. Established cancer screening methods have high ORs.20,21 For example, the OR for breast cancer with a positive mammogram is approximately 300. The ROC curve for our model highlights the futility of using the current risk model in our population to triage subjects to endoscopy. Two factors may explain the poor performance of our model. Firstly, although we used a detailed questionnaire, physical examination, dental examination, and a phenotyping test to examine our subjects, we may have overlooked important factors associated with squamous dysplasia/OSCC. In other studies we have identified low serum selenium,22 low serum vitamin E,23 and low tissue zinc concentrations24 as significant risk factors for OSCC. The current study used only questionnaire data to assess diet. Secondly, diet was quite uniform between subjects and this highlights another potential reason why we failed to develop a powerful risk model. Our population was quite similar in their habits and living conditions, and this uniformity reduced our ability to identify important risk factors. Another limitation of our study was that it was cross sectional, and this study design has less power to identify important associations than a prospective study.

In conclusion, we completed a large population based screening study in which we collected extensive questionnaire data and biological samples, and in which all subjects received the gold standard test (endoscopy with Lugol’s iodine staining and biopsy) for oesophageal squamous dysplasia. We were able to examine and find numerous risk factors for this OSCC precursor lesion. Our initial effort to develop a method for triaging subjects to endoscopy using only questionnaire/physical examination data was unsuccessful. We are hopeful that our oesophageal cytology test or molecular tests employing serum or oesophageal cells will lead us to a clinically useful primary examination.


  • Conflict of interest: None declared.
