Background and aims: To investigate the temporal relationship between sequence variation in the enhancer II (EnhII), basal core promoter (BCP), and precore regions of hepatitis B virus (HBV) and the risk of hepatocellular carcinoma (HCC), we conducted a nested case–control study within a cohort of 4841 male HBV carriers who were recruited during the period 1988–1992.
Methods: The HBV DNA sequence was determined in baseline blood samples taken from 132 incident cases and 204 controls. Base exchanges during follow-up in 71 cases were compared with 81 controls with samples taken during a similar length of follow-up.
Results: Nine single nucleotide polymorphisms in the EnhII/BCP regions (six of which were genotype C HBV related) were associated with subsequent risk of HCC. The strength of these associations decreased as the lag time between baseline measurement and diagnosis increased over 3 years. However, an increased disease risk in subjects with BCP double variants (mostly T1762/A1764) or genotype C HBV-related variants was evident 9 years or more before diagnosis. The BCP double variants (odds ratio, 1.92 (95% confidence interval, 1.14 to 3.25)) were statistically significantly associated with HCC risk even after adjusting for alanine aminotransferase levels, antibodies against HBV e antigen, HBV genotype, HBV viral load, and other sequence variants. Longitudinal analysis indicated that the increased HCC risks for at-risk sequence variants were attributable to the persistence of these variants.
Conclusions: HCC risk is associated with sequence variation in the EnhII/BCP regions of HBV, and persistence of at-risk sequence variants is critical for HCC development.
- Hepatocellular carcinoma
- hepatitis B virus
- enhancer II
- basal core promoter
Viral sequence variation frequently plays an essential role in viral replication and pathogenesis. To date, eight major genotypes (A to H) of hepatitis B virus (HBV) have been characterised based on a divergence over the entire nucleotide sequence of greater than 8%.1 In Asia, genotypes B and C are predominant.1–5 Genotype C has been associated with an increased risk of hepatocellular carcinoma (HCC) compared with genotype B.3 5 However, HBV varies genetically not only between but also within genotypes. Therefore, HBV strains of the same genotype may differ in the capacity to induce HCC.
The genome of HBV contains four overlapping open-reading frames (pre-S/S, precore/core, polymerase, and X-gene).1 The precore region encodes the HBV e antigen (HBeAg), which has been used clinically as an indicator of active viral replication and proposed to have an immunoregulatory role in natural infection.6–8 The enhancer II (EnhII) and basal core promoter (BCP) reside in the overlapping X gene. The HBV X protein is capable of transactivating HBV promoters and a variety of cellular functions.9 10 The EnhII/BCP themselves also play an important role in the viral life cycle by regulating the formation of the 3.5-kb pregenomic RNA, which is then translated to produce the viral core and polymerase proteins and HBeAg.1 Several nucleotide alterations in the BCP/EnhII/precore regions have been shown to affect viral function in vitro,11–16 but their effects on the development of HCC have been poorly investigated, and the results are controversial.3 4 17–19
Most studies of HBV BCP/EnhII/precore nucleotide variation and HCC have been conducted with the use of samples taken after the cancer has been diagnosed.4 17–19 Such studies provide no information on the temporal relation between nucleotide variations and HCC because mutations may occur in the course of infection due to the low fidelity of the reverse transcriptase used for HBV replication.1 In addition, most previous studies only focused on certain specific sites, which overlooked other nucleotide variations that also may be associated with HCC and did not analyse co-segregating nucleotide variations that may have an important role in the pathogenesis of HCC.3 4 17–19 This nested case–control study was conducted within a large cohort of male HBV carriers. We examined sequence variation in the EnhII/BCP/precore region of HBV using blood samples taken up to 14 years before diagnosis. Data on other viral factors, including genotype, viral load, HBeAg and antibodies against HBeAg (anti-HBe), were also included to analyse their associations with sequence variations and adjust for their effects.
SUBJECTS AND METHODS
Details of the cohort study have been described previously.5 The research ethics committee at the College of Public Health, National Taiwan University approved the study protocol. In brief, the study cohort consisted of 4841 male HBV surface antigen (HBsAg) carriers, aged 30 years or older, who had no history of HCC and who attended a specific clinic for asymptomatic HBsAg carriers at the Liver Research Unit of Chang-Gung Memorial Hospital or the Government Employee Central Clinics for regular health examinations between August 1988 and June 1992. At recruitment, questionnaire data, including demographic characteristics, lifetime habits of alcohol and tobacco use, and personal and family history of major chronic diseases, and blood samples were collected by trained research assistants.
Follow-up of study participants was achieved through various channels, including ultrasonography measurements and conventional liver function tests every 6–12 months, a personal telephone interview, abstraction of medical records, and a data linkage to the national death certification and cancer registry systems.
Cases and controls
This study was based on the 154 cases and 316 controls with data on HBV genotype and DNA levels in our previous nested case–control study.5 The 154 cases were confirmed during the follow-up period from August 1988 to December 2002. Their diagnoses were based on either a histological finding or elevated serum α-fetoprotein (⩾400 ng/ml) combined with at least one positive image on angiography, sonography and/or computed tomography. For each case, we randomly selected up to three controls from the cohort of HBsAg carriers who were alive and had not been diagnosed with HCC throughout the follow-up period. Controls were matched for age at recruitment (within 5 years) and date of blood collection (within 6 months). No study subjects who were known to have a history of antiviral therapy were included in the analysis.
Subjects were excluded if they had no adequate baseline blood sample for extraction of HBV DNA (seven cases and 19 controls), or if sequence data were unavailable because of poor sequence quality (one case and two controls), polymerase chain reaction (PCR) failure resulting from a low quantity of DNA (below about 10 000 copies/ml, the detection limit of our nested PCR assay) (13 cases and 88 controls), or other undetermined reasons (one case and three controls). Consequently, a total of 132 cases and 204 controls were studied. The mean age (±standard deviation) at HCC diagnosis for the 132 cases included in the study was 56.2±8.7 years.
To analyse the relationship between persistence of an HBV variant and HCC, follow-up samples were also retrieved for sequence analysis. Of the 132 cases with sequence data at baseline, 99 (75%) had follow-up samples adequate for analysis. When a case patient had multiple follow-up samples, the sample collected at the last follow-up examination was retrieved. For controls, subsequent samples taken within 2 years of the dates of the follow-up samples chosen for the corresponding case patients were retrieved. A total of 118 controls with baseline sequence data who were matched to the 99 cases had such samples. Twenty-eight cases (28.3%) and 37 controls (31.4%) were excluded because of poor sequence data or PCR failure in the follow-up samples. The 71 cases and 81 controls included in the longitudinal analysis were similar to the overall study subjects with respect to age and distributions of baseline HBV-related factors (data not shown).
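The subject flow described above can be checked with a short tally. The following is only a sketch of the arithmetic, using the counts stated in the text; the dictionary keys and the function name `remaining` are illustrative labels, not terms from the study.

```python
# Baseline exclusions as reported in the text: reason -> (cases, controls).
BASELINE_EXCLUSIONS = {
    "no adequate baseline sample": (7, 19),
    "poor sequence data": (1, 2),
    "PCR failure (low DNA quantity)": (13, 88),
    "other reasons": (1, 3),
}


def remaining(start, exclusions):
    """Subtract each (cases, controls) exclusion pair from the starting counts."""
    cases, controls = start
    for lost_cases, lost_controls in exclusions:
        cases -= lost_cases
        controls -= lost_controls
    return cases, controls


# 154 cases and 316 controls entered the baseline sequence analysis.
baseline = remaining((154, 316), BASELINE_EXCLUSIONS.values())

# 99 cases and 118 matched controls had follow-up samples; 28 and 37 were
# then lost to poor sequence data or PCR failure.
follow_up = remaining((99, 118), [(28, 37)])

print(baseline)   # baseline analysis set (cases, controls)
print(follow_up)  # longitudinal analysis set (cases, controls)
```

The two results should agree with the 132 cases and 204 controls analysed at baseline and the 71 cases and 81 controls in the longitudinal analysis.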
HBsAg, HBeAg, anti-HBe, plasma HBV DNA levels, and HBV genotypes were tested as described previously.5 HBV DNA was extracted from 200 μl of plasma samples using the QIAamp DNA blood mini kit (Qiagen, Chatsworth, CA). Nested PCR was used to amplify the fragment of nucleotides 1685 to 1900 (including EnhII (nucleotides 1685–1773), BCP (nucleotides 1742–1849), and precore (nucleotides 1814–1900)) of the HBV genome. First-round PCR was performed on 6 μl of DNA extract with the outer primers 1421F (5′-TTGTYTACGTCCCGTCGGCG-3′) and 2072R (5′-CCTGAGTGCTGTATGGTGAGG-3′) in a 30-μl reaction mixture containing 50 mmol/l KCl, 20 mmol/l Tris-HCl, 2 mmol/l MgCl2, 5 mmol/l of each of the four deoxynucleotide triphosphates, 5 U of Taq Full Hot Start DNA polymerase (Becton, Dickinson and Company Biosciences, San Jose, CA) and 5 pmol of each of the primers. PCR conditions consisted of 4 min at 94°C, 34 cycles of 40 s at 94°C, 30 s at 55°C, and 1.5 min at 72°C, followed by a long extension of 10 min at 72°C. The second-round PCR was performed in a 50-μl reaction mixture containing 2 μl of the first-round PCR products. Amplification was performed using the primers 1569F (5′-CTGCCGGACCGTGTGCACTTC-3′) and 1992R (5′-GGCGGTGTCGAGGAGATCT-3′) or 1943R (5′-AGAAGGCAAAAAAGAGAGTAACTC-3′) with the same cycle profile as the first-round PCR. Six microlitres of the second-round PCR product was electrophoresed on a 2% agarose gel and visualised under ultraviolet light after ethidium bromide staining. If second-round PCR yielded only a weak band or no visible band on the ethidium bromide-stained agarose gel, we performed third-round PCR with primers 1590F (5′-CACGTCGCATGGAGACCAC-3′) and 1992R or 1943R. Best quality water was included as a negative control in each round of PCR. Amplified PCR products were purified and sequenced using the ABI PRISM BigDye sequencing kits (Applied Biosystems, Foster City, CA) and an ABI 3130xl Genetic Analyzer (Applied Biosystems). Each amplicon was analysed with inner primers in both sense and antisense directions. Nucleotide sequences and deduced amino acid sequences were aligned and compared using MEGA software (version 3.0).20
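Because the three regulatory regions in the amplified fragment overlap, a given nucleotide position can belong to more than one of them. The sketch below uses only the coordinates quoted above; the names `REGIONS` and `regions_at` are illustrative and assume the boundaries are inclusive.

```python
# Region boundaries (inclusive) as quoted in the text, in HBV genome coordinates.
REGIONS = {
    "EnhII": (1685, 1773),
    "BCP": (1742, 1849),
    "precore": (1814, 1900),
}


def regions_at(position):
    """Return the names of all regions that contain the given nucleotide position."""
    return [
        name
        for name, (start, end) in REGIONS.items()
        if start <= position <= end
    ]


for pos in (1703, 1762, 1846, 1896):
    print(pos, regions_at(pos))
```

Under these assumptions, positions 1762 and 1764 lie in both EnhII and the BCP, whereas position 1896 lies in the precore region only.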
Unconditional logistic regression models were used to estimate the odds ratios (ORs) of HCC associated with HBV-related factors and the corresponding 95% confidence intervals (CIs). Findings were adjusted for matching factors (for example, age at recruitment and the time when blood was drawn) and other potential confounders as appropriate. Factor analysis was performed using principal component analysis to identify co-segregating nucleotide variations and evaluate their associations with other known viral factors. In this analysis, variation at each nucleotide position was treated as a binary variable. We used an orthogonal transformation (varimax rotation in SAS software) to rotate the principal components in order to achieve clearer interpretability. Principal components with eigenvalues greater than 1 were retained. Only variables that shared at least 16% of their variance with a principal component (corresponding to a factor loading of 0.4 in absolute value) were used for interpretation, as has been suggested for factor analyses.21 22 We then used stepwise regression methods to seek the set of HBV-related factors that were independent predictors of risk. The backward and forward selection methods yielded the same set of independent risk factors. All analyses were conducted using SAS version 8.2 (SAS Institute, Cary, NC, USA). All reported p values are two-sided.
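The two retention rules used in the factor analysis can be written out explicitly. This is a minimal sketch, not the SAS procedure used in the study: the eigenvalues and loadings below are hypothetical numbers chosen for illustration, and the function names are our own. It shows that a loading of 0.4 in absolute value corresponds to 0.4² = 0.16, that is, 16% shared variance.

```python
def shared_variance(loading):
    """Proportion of a variable's variance shared with a component (loading squared)."""
    return loading ** 2


def retained_components(eigenvalues):
    """1-based indices of components with eigenvalue greater than 1."""
    return [i for i, ev in enumerate(eigenvalues, start=1) if ev > 1.0]


def interpretable(loadings, threshold=0.4):
    """Keep variables whose absolute loading on a component meets the threshold."""
    return {name: value for name, value in loadings.items() if abs(value) >= threshold}


# Hypothetical eigenvalues and loadings, for illustration only.
eigenvalues = [3.9, 1.8, 1.5, 0.9, 0.6]
loadings_on_component_1 = {"var_a": 0.85, "var_b": -0.42, "var_c": 0.12}

print(retained_components(eigenvalues))
print(interpretable(loadings_on_component_1))
print(round(shared_variance(0.4), 2))
```

Negative loadings are kept when their magnitude meets the threshold, since the rule is stated in absolute value.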
RESULTS
At recruitment, there was no significant difference in age between cases and controls (p = 0.9434, Pearson χ2). Alanine aminotransferase (ALT) activity, plasma HBV DNA levels, HBV genotype, and status of HBeAg/anti-HBe showed significant associations with HCC (table 1).
Single-nucleotide polymorphisms (SNPs) that had variant types with frequencies of greater than 5% were observed at positions 1703, 1719, 1726, 1727, 1730, 1752, 1753, 1754, 1762, 1764, 1766, 1773, 1799, 1800, 1803, 1846, 1858, 1862, 1874, 1896 and 1899. Nine SNPs in the EnhII/BCP region were significantly associated with HCC, showing adjusted ORs from 2.2 to 4.7. We also examined the relationship between sequence variation in the precore region and HCC on the basis of deduced amino acid sequences. HBV genomes from 69 (52.3%) cases and 121 (59.3%) controls contained a stop codon mutation. In addition, eight (6.0%) cases and 17 (8.3%) controls had a start codon mutation. There were no associations between precore stop/start codon mutation and HCC (table 2). Because SNPs at positions 1762 and 1764 were apparently linked, they were transformed into a binary variable designated as BCP double variants, which took the value of zero if both A1762 and G1764 (the most prevalent nucleotide types in controls) were present and the value of one otherwise.
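The coding of the BCP double variants, and the kind of comparison behind the precore result, can be illustrated with a short sketch. The first function follows the definition given above. The second computes a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval from the stop codon counts quoted in the text; this is an illustrative simplification, since the ORs reported in the tables come from adjusted logistic regression models, and the function names are our own.

```python
import math


def bcp_double_variant(nt1762, nt1764):
    """Return 0 if both A1762 and G1764 are present, and 1 otherwise."""
    return 0 if (nt1762.upper() == "A" and nt1764.upper() == "G") else 1


def crude_odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls, z=1.96):
    """Crude odds ratio and Woolf (log-based) confidence interval from a 2x2 table."""
    a = exposed_cases
    b = total_cases - exposed_cases
    c = exposed_controls
    d = total_controls - exposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


print(bcp_double_variant("A", "G"))  # wild-type pair
print(bcp_double_variant("T", "A"))  # T1762/A1764

# Precore stop codon mutation: 69 of 132 cases and 121 of 204 controls.
odds_ratio, lower, upper = crude_odds_ratio(69, 132, 121, 204)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

With these counts the crude interval spans 1, which is in line with the absence of an association reported above, although the crude figure is not expected to match the adjusted estimates in table 2.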
Analysis of intercorrelations between HBV-related factors
Factor analysis, a multivariate correlation technique, was used to investigate the clustering of HBV-related variables that were thought to be risk factors for HCC. Analysis of 11 variables, including genotype, anti-HBe, and nine sequence variation variables, yielded three factors that together explained 65.2% of the variance in the data. Genotype C and SNPs at positions 1703, 1719, 1726, 1727, 1730 and 1799 were highly correlated with one factor, and thus the six SNPs can be interpreted as genotype C-related SNPs. BCP double variants and the SNP at position 1753 were highly correlated with the second factor. Anti-HBe positivity and the precore start/stop codon mutation were associated with the third factor (table 3). Inclusion of viral load or substitution of HBeAg for anti-HBe in the analysis resulted in similar findings.
To identify which viral factors were independent predictors of the risk of HCC, we next conducted a stepwise logistic regression analysis, with matching factors, nine sequence variation variables listed in table 3, ALT levels, HBV genotype, anti-HBe, and viral load in the model. This analysis resulted in four independent predictors of risk of HCC: genotype, BCP double variants, anti-HBe and ALT levels (table 4).
Risk and lag time
The strength of the associations between sequence variants and HCC generally decreased as the lag time (defined as the time between baseline measurement and diagnosis) increased over 3 years. For carriers of genotype C HBV or carriers of HBV strains harbouring the BCP double variants, however, a substantial excess risk could already be detected 9 or more years before the diagnosis of HCC. As with genotype C, the ORs of HCC associated with the presence of any genotype C-related variant, or with each additional genotype C-related variant, also decreased as the lag time increased over 3 years. A significant inverse association between the precore start/stop codon mutation and HCC was seen only within 3 years before diagnosis (table 5).
Persistence of sequence variant and risk
The median time interval between the dates of the baseline samples and the dates of the follow-up samples was 3.7 years (range, 0.5 to 11.8 years) for cases and 3.0 years (range, 0.5 to 11.0 years) for controls (p = 0.0536, Wilcoxon rank sum test). For all at-risk sequence variants identified to be associated with HCC except the precore start/stop codon mutation, detection of a high-risk variant at both time points was significantly associated with an increased risk of HCC after adjusting for age at recruitment and the interval between the dates the baseline sample and the follow-up sample were collected, while no association with HCC was observed for detection at a single time point (table 6).
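The persistence comparison rests on classifying each subject by whether a variant was detected at baseline, at follow-up, or at both. A minimal sketch of that classification follows; the category labels and the example subjects are hypothetical and do not reproduce the groupings or data of table 6.

```python
from collections import Counter


def persistence_category(at_baseline, at_follow_up):
    """Classify detection of a variant across the baseline and follow-up samples."""
    if at_baseline and at_follow_up:
        return "both time points"
    if at_baseline or at_follow_up:
        return "single time point"
    return "neither time point"


# Hypothetical subjects: (variant detected at baseline, variant detected at follow-up).
subjects = [
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (True, True),
]

tally = Counter(persistence_category(b, f) for b, f in subjects)
print(tally)
```

In this sketch a variant that appears only at follow-up and one that is lost after baseline both fall into the single time point group; whether the study separated these two patterns is not stated in the text.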
DISCUSSION
Several studies have associated the presence of T1762/A1764 in the BCP of HBV with more severe liver diseases,4 17–19 23 but this association has not always been confirmed in other studies.3 24–26 Only a few previous studies have focused on HCC, and most of these were hampered by small sample sizes.3 4 17–19 In addition, most previous investigations of HCC have been cross-sectional or case–control studies conducted at the time of diagnosis of the patients.4 17–19 These investigations provided no information on the temporal order of sequence variability and the development of HCC.
In this case–control study nested within a cohort study of HBV carriers, a significantly positive association between the BCP double variants (mostly T1762/A1764) and HCC was observed even after adjusting for ALT levels and other viral factors, including genotype, viral load, HBeAg/anti-HBe status, and other sequence variants. The magnitude of the ORs of HCC associated with the presence of the BCP double variants is generally 2- to 3-fold, which is much lower than the 10-fold OR reported by another Taiwanese study based on a series of clinical patients.18 In fact, we saw a 16-fold OR associated with the presence of the BCP double variants only within 3 years before the diagnosis of HCC.
It is possible that the BCP double variants may be more readily detectable at times closer to the onset of HCC. However, the association between the presence of the BCP double variants and HCC could already be detected up to 9 years before the diagnosis of HCC. Base exchanges at positions 1762 and/or 1764 during follow-up appeared in about 20% of the subjects. We also found that detection of the BCP double variants in both the baseline sample and a subsequent follow-up sample was associated with substantially higher risk than detection of this variant at a single time point. Thus, the increased HCC risk for carriers of HBV strains harbouring the BCP double variants is most likely a result of persistence of such variants.
In addition to the BCP double variants, seven SNPs were observed to be associated with HCC risk. These nucleotide variations may be independently causal or, since some of them are highly correlated, may predict disease because they represent a class of sequence variations that reflect a specific group of HBV. Although many nucleotide variations in the EnhII/BCP/precore regions of HBV have been identified by studies elsewhere,17–19 23 26 27 clustering analysis has never been used to examine their interrelationships. Knowledge of how these nucleotide variations cluster and how these clusters relate to known viral factors could not only help clarify the role of these nucleotide variations in the pathogenesis of HCC but also help researchers interpret and develop multivariate models.
Using a factor analysis involving 11 viral factors, including anti-HBe/HBeAg, HBV genotype, and nine sequence variation variables, we found that the precore start/stop codon mutation clustered with anti-HBe rather than with other variables. The G-to-A change at position 1896 is a hot-spot mutation in the precore region, which creates a premature stop codon and has been associated with HBeAg levels.15 However, results of studies of its relationship with liver disease have been inconsistent. Some studies proposed that the precore stop codon mutation can be associated with fulminant hepatitis,28 29 one observed that this mutation tended to be associated with less hepatic inflammation,23 while a few did not find a notable association with liver disease.17 18 24 26
In this study, we also found some subjects with precore start codon mutation. The precore start/stop codon mutation was preferentially found in controls compared with cases, but a reduced OR of HCC associated with the presence of such precore mutations could be detected only within 3 years before the diagnosis of HCC. Consistent with this finding, persistence of the precore start/stop codon mutations was not significantly associated with HCC. From a stepwise regression analysis with all the viral factors examined in this study, anti-HBe instead of the precore start/stop codon mutation was identified as an independent predictor of HCC risk. Thus, the effect of such mutations on HCC development may be due to their role in HBeAg seroconversion.
The mechanisms underlying the interrelation between the appearance of the position 1753 variants and the BCP double variants are unclear. However, our observation of an increased risk for HCC associated with infection by HBV strains harbouring variants at positions 1753 or 1762/1764 highlights the importance of the combination of variations in the nucleotide region 1751 to 1778 in the development of HCC. This region contains the binding site of multiple liver-enriched transcription factors, including hepatocyte nuclear factor 4, which has been shown to up-regulate transcription from the BCP.30 Although it is unknown how the actions of these factors are modulated during HBV-related hepatocarcinogenesis, variation in the nucleotide region 1753–1766 has also been associated with responsiveness to interferon treatment.27
As expected, we found that most of the SNPs associated with HCC were genotype C related. These signature SNPs for genotype C appear to cluster in the EnhII/BCP regions, which are important for the regulation of HBV gene expression and replication.1 Subjects harbouring any of the genotype C-related variants were at increased risk for HCC, and our longitudinal data indicate that the increased risk could be largely attributable to persistence of these variants. When the genotype C-related nucleotide variants were analysed in a cumulative manner, the HCC risk increased with increasing number of these variants.
So far, little is known about the functional significance of most of the nucleotide substitutions identified to be associated with HCC in this study. Although the BCP double variants were shown to enhance the viral replication efficiency in several transfection experiments with HBV DNA, these data could not rule out the influence of additional nucleotide alterations throughout the HBV genome.11 13 14 15 31
The strengths of this study include the use of the nested case–control study design, which preserves the validity of the cohort study. We used asymptomatic HBsAg carriers not receiving antiviral therapy who were identified through routine physical examination rather than clinical patients, and thus the data are important in understanding the role of viral sequence variation in the natural history of HBV. Our repeated sequence analysis provides data on the long-term stability of viral sequence and helps clarify the temporal relationship between a sequence variation and the occurrence of HCC. Finally, we had detailed information on other viral factors, which enabled us to analyse their relationships with viral sequence variation and to adjust for their effects.
The fact that the sensitivity of our nested PCR assay is approximately 10 000 copies/ml might limit the generalisation of our study results. However, this problem is probably not important, since two prospective studies have demonstrated that a circulating HBV DNA level lower than around 10 000 copies/ml poses no increased risk for HCC.5 32 In addition, the patterns of ORs for HCC associated with HBV DNA levels and genotype in the present analysis were similar to those reported in our previous study,5 which included all of the study subjects. It therefore seems reasonable to speculate that selection bias due to the exclusion of subjects is unlikely to have affected our results substantially.
In conclusion, we found that several SNPs in the EnhII/BCP regions of the HBV genome were associated with the development of HCC. These associations could be detected for nine years or more before the diagnosis of HCC, and persistence of at-risk variants is crucial for hepatocarcinogenesis. Genotype C accounted for most of the SNPs associated with HCC. After taking into account the complex interrelation between HBV-related factors, including ALT levels, status of HBeAg/anti-HBe, genotype, viral load, and sequence variations, we have identified anti-HBe status, ALT levels, genotype and BCP double variants as possible independent predictors for HCC.
Funding: This study was financially supported by grants NSC 93-2320-B-002-013 and NSC 94-3112-B-002-017 from the National Science Council, Taiwan.
Competing interests: None.