Conservation of the cag pathogenicity island is associated with vacA alleles and gastroduodenal disease in South African Helicobacter pylori isolates


BACKGROUND The development of clinically significant disease in South Africa is associated with the vacuolating cytotoxin gene (vacA) s1 genotype but not with the presence of the cytotoxin associated gene cagA. cagA occurs in >95% of South African isolates and is a variable marker for the entire cag pathogenicity island (PAI).

AIM To characterise the cagPAI in South African isolates and to investigate if structural variants of this multigene locus were associated with variations in vacA status and clinical outcome.

PATIENTS AND METHODS We studied 109 Helicobacter pylori strains (36 from patients with peptic ulceration, 26 with gastric adenocarcinoma, and 47 with no pathology other than gastritis) for differences in selected genes of the cagPAI and alleles of vacA by polymerase chain reaction.

RESULTS All strains were cagA+. Sixty five (60%) strains had an intact contiguous cagPAI; 78% of peptic ulcer isolates, 73% of gastric adenocarcinoma isolates, but only 40% of gastritis alone isolates (p<0.01). The entire cagII region was undetectable in 23% of gastritis alone isolates but in only 8% of peptic ulceration isolates (p<0.05). The vacA signal sequence and mid region demonstrated a strong relationship between the virulence associated vacA s1 (p<0.005) and vacA m1 (p=0.05) alleles and an intact cagPAI.

CONCLUSION Although a complete cagPAI was a feature of most infected individuals, deletions in the 5′ region of this genetic locus were associated with gastritis alone and with the non-cytotoxic s2/m2 vacA genotype.

  • cagA
  • gastric cancer
  • Helicobacter pylori
  • pathogenicity
  • peptic ulcer
  • vacuolating cytotoxin

Helicobacter pylori is the cause of chronic gastritis and is involved in the pathogenesis of peptic ulceration, primary gastric lymphoma, and gastric adenocarcinoma.1-3 The development of clinically significant disease appears however to depend on a number of factors, including the virulence of the infecting strain, susceptibility of the host, and environmental cofactors.4

The majority of vacuolating cytotoxin (VacA) producing isolates5 6 possess the gene cagA, a marker for the cag pathogenicity island (cagPAI), and production of the CagA protein is associated with clinically significant gastroduodenal disease.7-10 cag+ strains encode a type IV bacterial secretion system which mediates translocation of CagA into host cells where it is phosphorylated and activates a number of intracellular pathways to induce cytoskeletal changes.11-14 Disruption of many cagPAI genes in cag+ strains abolishes these effects.

The presence of the cagPAI appears to be associated with duodenal ulceration in European and Asian populations15 16 but patients with peptic ulcer disease sometimes harbour H pylori strains with partial or complete deletions of the cagPAI15 and a significant number (53–82%) of isolates from patients without ulcers have intact cagPAIs.15-17 Little is known about the structure of the cagPAI in patients with gastric adenocarcinoma or its structure in African populations.

South African H pylori isolates from Cape Town are characterised by the universal presence of cagA 18 but there is no information on the structure of the cagPAI. However, South African isolates have differences in vacA, the gene encoding the vacuolating cytotoxin.18 19 In the current study, we investigated the presence and structural organisation of key regions of the cagPAI. The regions targeted and tested using specific polymerase chain reaction (PCR) assays are the functionally important cagA, cagE, and cagM genes in the cagI region,20 21 and three cagII marker genes, cagT and cag6–7 21 (fig 1). Strains which were PCR negative for these cagII marker genes were analysed for the entire cag5–10 region (which includes the virD4 homologue, cag10) as well as for the downstream gene cag13. In addition, whether the cagI and cagII portions of the PAI were contiguous was tested by PCR amplifying their putative junction using primers derived from each side. We hypothesised that H pylori isolates from different clinical diseases would exhibit variability in their cagPAI and that there would be an association between the type of cagPAI and type of vacA allele.

Figure 1

Schematic diagram of the cag pathogenicity island with the putative junction site between cagQ∼S where IS605 may be located (vertical arrow, x) noted in NCTC11638. Genes and positions are from GenBank accession number AC000108 (cagII) and U60176 (cagI). Polymerase chain reaction amplimers analysed in this study are denoted by bold horizontal lines beneath the genes.

Materials and methods


A total of 109 H pylori isolates were examined from 86 patients. Demographic data and endoscopic and histological diagnoses were recorded for all patients (table 1). Sixty seven (78%) patients had a single strain isolated. Nineteen (22%) patients had two or three non-identical isolates, as assessed by different random amplified polymorphic DNA (RAPD) PCR fingerprint patterns. Fifty six of the strains had their vacA types determined in an earlier study.18 Reference strains 26695, isolated from a patient in the UK with gastritis,22 and J99, isolated from a US patient with duodenal ulcer disease,23 were used as positive controls.

Table 1

Clinical profiles


Oligonucleotide primers used for PCR are listed in table 2. Two sets of primers were used for each gene examined. PCR amplification was performed as previously described.18 24 Following initial denaturation at 94°C for one minute, each reaction consisted of 35 cycles of denaturation at 94°C for one minute, annealing and extension at an appropriate temperature for 2–4 minutes, and final extension at 72°C for 10 minutes. Annealing temperatures were set at 50°C for cag2/4 and picBF/R; 53°C for cag5/6, cag7/12, cag8/9, cag10/11, cag13/14, cag13F/R, cagTF/R, cagTF2/QR, LECF/cag10R, and cagEF/R; and 55°C for F1/B1, cagMF/R, and cag67F/R.15 16 25 26 Each PCR mixture (20 μl) was subjected to gel electrophoresis on 1% agarose gels and either a 100 bp or 0.12–23.1 kbp DNA ladder (Roche Diagnostics, Johannesburg, South Africa) was used as a size marker. Long distance PCR was performed with the Expand PCR System (Roche Diagnostics), as recommended by the supplier.27
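The cycling conditions and annealing temperatures described above can be written down as a small lookup, which makes it easy to check which temperature applies to a given primer pair. This is only an illustrative sketch: the primer pair names and temperatures are transcribed from the text, but the names `ANNEALING_TEMP_C`, `PRIMER_TO_TEMP`, and `cycling_program`, and the tuple layout of each step, are assumptions made for this example and are not part of the original protocol.

```python
# Annealing temperatures (deg C) per primer pair, transcribed from the methods text.
ANNEALING_TEMP_C = {
    50: ["cag2/4", "picBF/R"],
    53: ["cag5/6", "cag7/12", "cag8/9", "cag10/11", "cag13/14", "cag13F/R",
         "cagTF/R", "cagTF2/QR", "LECF/cag10R", "cagEF/R"],
    55: ["F1/B1", "cagMF/R", "cag67F/R"],
}

# Inverted view: primer pair -> annealing temperature.
PRIMER_TO_TEMP = {
    pair: temp for temp, pairs in ANNEALING_TEMP_C.items() for pair in pairs
}


def cycling_program(primer_pair, anneal_extend_min=2.0):
    """Return the PCR program as (step, deg C, minutes, cycles) tuples.

    The tuple layout is hypothetical. The text gives 2-4 minutes for the
    combined annealing and extension step, so that range is enforced here.
    """
    if not 2.0 <= anneal_extend_min <= 4.0:
        raise ValueError("annealing/extension time must be 2-4 minutes")
    temp = PRIMER_TO_TEMP[primer_pair]  # KeyError for an unknown primer pair
    return [
        ("initial denaturation", 94, 1.0, 1),
        ("denaturation", 94, 1.0, 35),
        ("annealing and extension", temp, float(anneal_extend_min), 35),
        ("final extension", 72, 10.0, 1),
    ]


for step in cycling_program("cagMF/R", anneal_extend_min=3):
    print(step)
```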

Table 2

Primers used to identify cagPAI genes


The 3′ region of cagA, which adequately discriminates between “Asian” and “non-Asian” strains,28 was sequenced in 12 South African isolates (two Black, seven Cape coloured, three White). PCR products (primer set cag2/4)25 were gel extracted (Qiaex II gel extraction kit; Qiagen, Cape Town, South Africa) and sequenced on an ABI Prism 377 automated sequencer (ABI, Foster City, California, USA) using the ABI Prism BigDye terminator cycle sequencing reagent kit with AmpliTaq DNA polymerase FS (PE Biosystems, Johannesburg, South Africa), as described previously.29 PCR and direct sequencing were performed at least twice to determine DNA sequences for each strain.


DNA and protein sequences were analysed using the National Center for Biotechnology Information (NCBI) server (USA), and Internet based searches were performed at NCBI, the Institute for Genome Research (TIGR, Maryland, USA) and Astra-Zeneca (Boston, USA).


Data (n=109) were analysed using the χ² test or Fisher's exact test, as appropriate. Probability levels <0.05 were considered statistically significant.



All samples from patients with clinically significant disease (peptic ulcer disease, gastric adenocarcinoma) were obtained as part of the clinical protocols. Use of non-steroidal anti-inflammatory drugs was an exclusion criterion in all contributing studies. All 30 patients with peptic ulcer disease had active duodenal ulceration at the time of endoscopy. Of the 19 patients with gastric adenocarcinoma, 10 had carcinoma in the antrum, two in the antrum and body, and two in the body alone. In five cases the primary site could not be determined. Tumours were classed histologically as mixed in two cases, poorly differentiated in three, and unknown in six, while four were intestinal and four were diffuse. Patients with no clinically significant upper gastrointestinal disease (gastritis alone) were drawn from two populations. Twenty six (70%) were asymptomatic volunteers receiving no prescribed medication for dyspepsia and 11 were derived from a group of non-ulcer dyspeptics, known not to be receiving potent acid suppressive therapy. Of the 37 patients with gastritis alone, 29 had histological biopsies from the antrum and corpus, allowing for classification of their pattern of gastritis. Of these, 19 (66%) had pangastritis and four (14%) had antral predominant gastritis. Intestinal metaplasia was present in five patients. None of these patients had significant gastroduodenal pathology.


Analysis of cagA in 12 South African isolates demonstrated no significantly greater differences in homology between isolates from different ethnic groups (83–94% identity) than between isolates from the same ethnic group. cagA in all strains demonstrated significantly closer homology with cagA from the European strains (HP0547 and JHP495; 73–94% identity) than with cagA from the Japanese strains JK25 (GenBank accession number AF043487; 52–57% identity), JK253 (GenBank accession number AF043489; 33–44%), and JK271 (GenBank accession number AF043488; 38–43%). These results confirm that South African cagA in this study was “non-Asian” in origin and strongly suggest that these cag+ strains share more sequence identity with European15 17 than with Japanese strains.16


H pylori isolates were analysed with two different primer sets for the cagA (HP547), cagE (HP544), and cagM (HP537) genes of the cagI region. Agreement between primer sets for cagA was 97%, cagE 100%, and cagM 98%. Isolates were considered positive for a gene if at least one primer set gave successful PCR amplification. The reference strains 26695 and J99 were positive for each primer combination used. Overall, 88 (81%) of 109 strains had all the cagI marker genes. All 109 of the strains were cagA+ using primers F1/B1 and cag2/4 (fig 2). The gene cagE was present in significantly more strains isolated from patients with peptic ulceration (34 of 36 (94%); p<0.009) and gastric adenocarcinoma (24 of 26 (92%); p<0.04) than strains isolated from patients with gastritis alone (34 of 47 (72%)). The gene cagM was present in 85% of isolates from patients with gastritis alone, 94% of peptic ulcer disease isolates, and 96% of gastric adenocarcinoma isolates (NS). There were no statistically significant differences in the presence of cagE or cagM either between ethnic groups or between gastritis alone patients with or without intestinal metaplasia (data not shown).
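The calling rule and the agreement figure used in this paragraph can be illustrated with a short sketch. It assumes that "agreement" means the percentage of isolates for which both primer sets gave the same result, which the text does not state explicitly; the function names and the example data are hypothetical and do not come from the study.

```python
def gene_call(set1_positive, set2_positive):
    """Call an isolate positive for a gene if either primer set amplified."""
    return bool(set1_positive or set2_positive)


def percent_agreement(results):
    """Percentage of isolates for which the two primer sets gave the same result.

    `results` is a list of (set1_positive, set2_positive) pairs, one per isolate.
    """
    if not results:
        raise ValueError("no isolates supplied")
    concordant = sum(1 for s1, s2 in results if bool(s1) == bool(s2))
    return 100.0 * concordant / len(results)


# Hypothetical results for 10 isolates (not study data): 8 positive with both
# sets, 1 negative with both, 1 positive with the first set only.
example = [(True, True)] * 8 + [(False, False)] + [(True, False)]

calls = [gene_call(s1, s2) for s1, s2 in example]
print(sum(calls), "of", len(calls), "isolates called positive")
print("agreement between primer sets: %.0f%%" % percent_agreement(example))
```

Under this rule a discordant isolate counts as positive, so the number of positive calls can exceed the number of isolates positive with any single primer set.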

Figure 2

Schematic diagram of cagI with the polymerase chain reaction amplimers of the three genes analysed (top). Genes and positions are from GenBank accession number U60176. Prevalence of target genes in the different disease groups is shown below. *p<0.009, †p<0.05 versus gastritis alone; Fisher's test.

These results demonstrate that functionally important elements of the cagI region (cagA–cagM) were present in the majority of strains, irrespective of disease status. However, cagE (which has previously been shown to be essential for CagA translocation and phosphorylation14) was absent in 25% of isolates (from all ethnic groups) from patients with no clinically significant disease.
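As an illustration of the kind of comparison reported for cagE, the sketch below applies Fisher's exact test to the 2×2 table built from the counts given in the text (34 of 36 peptic ulcer isolates positive versus 34 of 47 gastritis alone isolates). It is a standard-library implementation written for this example, not the software used in the study, and the article does not say whether its p values were one sided or two sided, so both are computed. The function name `fisher_exact` and its return layout are assumptions.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (one_sided_p, two_sided_p). The one-sided value tests whether
    row 1 is enriched for column 1; the two-sided value sums all tables
    that are no more probable than the observed one.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x column-1 items falling in row 1.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(x) for x in range(a, hi + 1))
    two_sided = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for floating point ties
    )
    return one_sided, two_sided


# cagE: peptic ulcer 34 positive / 2 negative; gastritis alone 34 positive / 13 negative.
p_one, p_two = fisher_exact(34, 2, 34, 13)
print("one-sided p = %.4f, two-sided p = %.4f" % (p_one, p_two))
```

Both values fall below the 0.05 threshold stated in the methods, which is in line with the significant difference reported for cagE.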


H pylori isolates were analysed with two different primer sets for cagT (HP532), which is present at the 3′ end of the cagII region, and with one primer set for cag6–7 (HP520–521) present at the 5′ end of this region. Agreement between primer sets for cagT was 87%. Overall, 66 (61%) of 109 strains were both cagT+ and cag6–7+. Thirty five (97%) of 36 isolates from patients with peptic ulceration were cagT+ compared with 24 (92%) of 26 isolates from patients with gastric adenocarcinoma and 30 (64%) of 47 isolates from patients with gastritis alone (p<0.0002 v peptic ulcer disease; p<0.007 v gastric adenocarcinoma) (fig 3). Significantly more isolates from peptic ulcer patients (29 of 36 (81%)) and gastric adenocarcinoma patients (19 of 26 (73%)) were cag6–7+ compared with isolates from patients with gastritis alone (21 of 47 (45%); p<0.0009 v peptic ulcer disease; p<0.02 v gastric adenocarcinoma). Both reference strains produced the expected size cagT and cag6–7 amplicons.

Figure 3

Schematic diagram of cagII with the polymerase chain reaction amplimers of the three genes analysed (top). Genes and positions are from GenBank accession number AC000108. Prevalence of target genes in the different disease groups is shown below. *p<0.0002, **p<0.0009, †p<0.007, ††p<0.05 versus gastritis alone; Fisher's test.

To exclude sequence heterogeneity at primer annealing sites in the 5′ region of the cagPAI as a reason for the negative PCR results, 40 isolates which were PCR negative for cag6–7 were also examined for cag5–10 (HP519–524). This region includes cag10, a virD4 homologue and putative toxin, and should result in a PCR product of 3370 bp (predicted from GenBank accession number AC000108). Fourteen (35%) of 40 isolates gave no PCR product, and these are further examined below. The other 26 (65%) of cag6–7− isolates gave PCR amplimers ranging in size from 880 to 2902 bp compared with the expected 3370 bp obtained in all 10 cag6–7+ isolates (and in the two reference strains) tested, confirming deletions in the cagII region. To define if the deleted cagII region extended beyond cag10, we next tested whether an additional cagII marker gene, cag13 (HP527), which is between cag10 and cagT, was present in these 26 isolates. Twenty five (96%) of these 26 isolates were cag13+ (as were all 10 cag6–7+ isolates tested as positive controls, and the two reference strains). Long PCR using primers to examine the entire cagII region (cag5-cagT) was performed on the one isolate negative for cag13. This should result in a PCR product of 18 402 bp (predicted from GenBank accession number AC000108). A product of 8396 bp was amplified. These results suggest that 26 isolates, which were all PCR negative (using two different primer sets) for cagT, contain a partial cagII.

Thereafter we investigated 14 isolates which appeared to have lost the 5′ region of cagII (PCR negative for both cag6–7 and cag5–10) using long PCR to examine the entire cagII region (cag5-cagT). PCR amplicons of 280–3105 bp (compared with a product of 18 402 bp predicted from GenBank accession number AC000108) were generated in 10 of the 14 isolates. These results suggest that these isolates had lost the majority of the cagII region. Analysis of the remaining four isolates which gave no PCR amplimers with these primers demonstrated that one isolate had lost all the genes tested to the left of cagM; one isolate, all genes tested to the left of cagE; and two isolates only contained cagA.
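The reasoning in the two preceding paragraphs rests on comparing observed amplicon sizes with the sizes predicted from the reference sequence. A minimal sketch of that arithmetic follows. It assumes that a shorter than expected product reflects a simple deletion with no inserted DNA, which is a simplification, since an insertion sequence could occupy part of the gap. The expected sizes come from the text; the dictionary and function names are hypothetical.

```python
# Expected amplicon sizes (bp) predicted from GenBank accession AC000108, as quoted in the text.
EXPECTED_BP = {
    "cag5-10": 3370,
    "cag5-cagT": 18402,
}


def apparent_deletion_bp(region, observed_bp):
    """Apparent deletion size, assuming the shortfall is a pure deletion.

    This ignores any inserted DNA, so it is a lower bound on what was lost
    if an insertion accompanies the deletion.
    """
    expected = EXPECTED_BP[region]
    if not 0 < observed_bp <= expected:
        raise ValueError("observed product must be between 1 bp and the expected size")
    return expected - observed_bp


# Amplicon size ranges reported in the text.
print(apparent_deletion_bp("cag5-10", 2902), apparent_deletion_bp("cag5-10", 880))
print(apparent_deletion_bp("cag5-cagT", 8396))
print(apparent_deletion_bp("cag5-cagT", 3105), apparent_deletion_bp("cag5-cagT", 280))
```

On these assumptions the 880–2902 bp products correspond to apparent losses of roughly 0.5–2.5 kb within cag5–10, and the 280–3105 bp long PCR products to losses of about 15–18 kb of the 18.4 kb cagII region.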

Finally, we related these results to clinical status. This showed that a complete intact cagII region (cag6-cagT) was present in significantly more isolates from patients with peptic ulcer disease (81%) than in patients with gastritis alone (38%; p=0.0001). There were no differences in the distribution of this marker in gastritis alone patients with dyspepsia (36%) or without symptoms (46%). Interestingly, however, the five gastritis alone patients with intestinal metaplasia also had an intact cagPAI. Seventy three per cent of isolates from patients with gastric adenocarcinoma had a complete intact cagII (p=0.004 v gastritis alone). Some elements of the cagII region (cag10-cagT) were present in all gastric adenocarcinoma isolates whereas 11 isolates from patients with gastritis alone lacked the entire cagII region (p<0.006 v gastritis alone). Further analysis demonstrated no significant relationship between different ethnic populations and the presence of an intact cagII region (χ²=8.08, p=0.09).


The primer combinations of cag7/12 and cagTF/QR test for the presence of the cagQ∼S/T genes (HP535–532) and also for joined cagI and cagII regions. Agreement between primer sets for cagQ∼S/T was 78%. This is lower than for cagI but reflects the fact that these primer combinations identify different 5′ genes. PCR amplification using these primer sets resulted in the expected amplicon sizes in both 26695 and J99. Seventy five (85%) isolates with all the marker genes for the cagI region were cagQ∼S/T+. Significantly more cagI isolates from patients with peptic ulceration (31 of 34 (91%); p<0.02) and gastric adenocarcinoma (23 of 23 (100%); p<0.002) were cagQ∼S/T+ compared with isolates from patients with gastritis alone (21 of 31 (68%)). Sixty five (98%) of 66 isolates with all the marker genes for the cagII region were cagQ∼S/T+. Overall, 65 (60%) isolates had all markers for cagI, cagII, and a contiguous pathogenicity island; 77% of isolates from patients with peptic ulcer disease (p<0.001) and 73% of isolates from patients with gastric adenocarcinoma (p<0.01) compared with only 18 (40%) isolates from patients with gastritis alone.

Twenty eight isolates did not have an amplifiable cagQ∼S/T product. One of the isolates was also cagT, cag13, and cag6–7 PCR positive. This suggests that this isolate (from a patient with peptic ulcer disease) had a complete but spatially separated pathogenicity island. Sixteen of the remaining 27 cagQ∼S/T− isolates (59%) had deletions in the cagI region while 26 (96%) had deletions in the cagII region. These results suggest that the presence of a cagQ∼S/T amplimer may be an alternate marker for the cagII region.
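The way marker results are combined into a cagPAI status in this section can be summarised as a small decision rule. The sketch below is an interpretation of the text, not the authors' scoring procedure: it treats cagA, cagE, and cagM as the cagI markers, cagT and cag6–7 as the cagII markers, and the cagQ∼S/T amplimer as evidence that the two regions are joined. The category labels and the function name `classify_cagpai` are hypothetical.

```python
CAGI_MARKERS = ("cagA", "cagE", "cagM")
CAGII_MARKERS = ("cagT", "cag6-7")
JUNCTION_MARKER = "cagQ~S/T"


def classify_cagpai(markers):
    """Classify an isolate from a dict of marker name -> PCR positive (bool).

    Markers missing from the dict are treated as PCR negative.
    """
    cag1_complete = all(markers.get(g, False) for g in CAGI_MARKERS)
    cag2_complete = all(markers.get(g, False) for g in CAGII_MARKERS)
    joined = markers.get(JUNCTION_MARKER, False)

    if cag1_complete and cag2_complete and joined:
        return "intact and contiguous"
    if cag1_complete and cag2_complete:
        return "complete but not contiguous"
    return "partial deletion"


# Hypothetical isolates (not study data).
all_positive = {g: True for g in CAGI_MARKERS + CAGII_MARKERS + (JUNCTION_MARKER,)}
separated = dict(all_positive, **{JUNCTION_MARKER: False})
caga_only = {"cagA": True}

for name, isolate in [("A", all_positive), ("B", separated), ("C", caga_only)]:
    print(name, classify_cagpai(isolate))
```

Isolate B mirrors the single peptic ulcer isolate described above, which carried all marker genes but gave no junction amplimer.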


While 26695 had an intact IS605 sequence (tnpA+tnpB) detected by PCR and J99 only exhibited tnpA, only three (3%) South African isolates were PCR positive for IS605. All three had all the marker genes for cagPAI and yielded PCR products with the cag7/12 and cagTF/QR primer sets, suggesting that the complete insertion sequence was elsewhere in the genome. Analysis of the cagPAI deletion end points in the 28 isolates which did not have an amplifiable cagQ∼S/T gene product demonstrated that IS605 (tnpA) was present in six (27%), suggesting that this insertion sequence may play a role in cagPAI disruption in a proportion of isolates.


As previously demonstrated,18 19 vacA s1 occurred significantly more often in isolates from patients with peptic ulceration (p<10⁻⁵) or gastric adenocarcinoma (p<2×10⁻⁴) while vacA s2 invariably occurred in patients with gastritis alone. In addition, the vacA m1 mid region type was present more often in patients with clinically significant disease (p<0.01).

When analysing the data by vacA status (table 3), there was a significant difference in the distribution of vacA allele types between the different cagPAI groups (χ²=52.76; p<10⁻⁵). A strong association was noted between vacA s1m1 and strains containing the complete cagPAI. Fifty two (78%) of 66 vacA s1m1 strains had all the genetic markers (cag6-cagA) compared with two of 18 (12%; p<10⁻⁵) vacA s2m2 strains. Sixteen of 18 vacA s2m2 strains (88%) had partial deletions of the cagPAI (p<0.0001 v vacA s1m1; p<0.005 v vacA s1m2).

Table 3

Relationship between cagPAI gene markers and vacA alleles

There was also a strong association between specific vacA alleles and absence of contiguous cagPAI. Significantly more vacA s2 isolates (13 of 20 (65%)) were cagQ∼S/T PCR negative compared with 14 of 83 (17%; p<0.0001) vacA s1 strains, suggesting that the vacA s2 allele may be associated with loss of this region.
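The comparison of cagQ∼S/T negative isolates between vacA s2 and vacA s1 strains can be reproduced approximately with a Pearson χ² test on the 2×2 table of counts given above. The sketch below uses only the standard library and applies no continuity correction; the article does not state which test or correction produced its p value, so this is an illustration of the method rather than a reconstruction of the original analysis. The function name `chi_square_2x2` is an assumption.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (chi2, p). With one degree of freedom the upper tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    r1, r2 = a + b, c + d
    c1, c2 = a + c, b + d
    if 0 in (r1, r2, c1, c2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# vacA s2: 13 cagQ~S/T negative, 7 positive; vacA s1: 14 negative, 69 positive.
chi2, p = chi_square_2x2(13, 7, 14, 69)
print("chi2 = %.2f, p = %.2g" % (chi2, p))
```

The smallest expected cell count in this table is about 5, so the χ² approximation is borderline and Fisher's exact test would be a reasonable alternative, consistent with the "as appropriate" wording in the methods.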


Our results indicate that the cagPAI, using a subset of previously defined functionally important marker genes, appears to be complete and contiguous (cag6-cagA) in 60% of South African H pylori isolates. However, more than one third of isolates had non-amplifiable gene regions in the island, despite possessing cagA. The caveat of over interpretation of PCR results is recognised, as this methodology can fail for a number of reasons, including unexpected sequence divergence at primer annealing sites. Such heterogeneity could conceivably explain our findings although our interpretation of deletion of cag regions is much more likely for several reasons. Firstly, we took care to use two different established primer sets per gene site to confirm negative results. Secondly, these primer combinations have previously been used to identify cag genes in European populations,15 17 and South African H pylori cag+ isolates share more sequence homology with European than “Asian” strains (this study and Achtman and colleagues30). Thirdly, “missing” gene markers were almost invariably adjacent, which could readily be explained by missing regions of cag but not by differences in primer annealing.

The insertion sequence IS605 replaced the deleted regions of the cagQT in 25% of strains with deletions. This supports the hypothesis that this DNA element may generate rearrangements and deletions in a small proportion of South African strains to result in subpopulations of organisms with differences in virulence.20 21 How deletions occur in other strains is unclear. Irrespective of the mode of genetic deletion, strains that carry the cagA gene but have internal deletions in the cagPAI should probably be classified as functionally cag− rather than cag+. This is evidenced by the fact that such strains are often not associated with disease (current study) and that many different artificial mutants in cag genes reduce the ability of cag+ strains to induce IL-8 secretion from epithelial cells.20

While the majority of isolates (∼80%) from patients with peptic ulceration had all the marker genes for a complete cagPAI, less than half (∼40%) of isolates from patients with gastritis alone had this pathogenic fingerprint. All (100%) tested strains, irrespective of the organisation of the PAI, were cagA+. Data on the cagPAI from the rest of Africa are scarce but CagA appears to be commonly expressed.31 Our data suggest dissociation between the presence of cagA and other genes in the cagPAI and further that analysis of the PAI may be a prerequisite for investigation of relationships with gastroduodenal disease processes. The presence of an intact cagPAI in the majority of South African peptic ulcer disease isolates supports a role for the genes in this island in the pathogenesis of this disease.

Analysis of the distribution of the virulence associated vacA alleles demonstrated that subtype s1/m1 was strongly associated with a complete island while subtype s2/m2 was associated with deletions in cagII. These findings suggest a functional association between an active s1/m1 vacuolating cytotoxin and an intact cag pathogenicity island. This is supported by the observation of a significantly negative relationship between the virulence associated vacA s1 allele and deletions in cagII.

The association between genes in the cagPAI other than cagA and gastric adenocarcinoma has not previously been reported. Our results demonstrate significant differences between isolates from patients with gastritis alone and those with this disease. Specifically, an intact frequently contiguous PAI was found more often in patients with cancer than in patients without clinically significant disease. The high prevalence of this marker made subanalysis of the gastric cancer types inappropriate. Interestingly, an intact cagPAI was found more often in gastritis alone patients who had intestinal metaplasia. Intestinal metaplasia is known to be a risk factor for gastric adenocarcinoma.

The finding of a conserved pathogenicity element (cagPAI) shared by isolates from patients with gastric adenocarcinoma and peptic ulcer disease is of interest given the inverse relationship between these two diseases. That strains with an intact cagPAI are present in the older gastric cancer group and the younger peptic ulcer group suggests that the presence of an intact PAI in infecting strains is not due to an age cohort effect. It also suggests that an intact PAI, while being a marker for more severe mucosal damage, may not be a specific marker for either of these diseases. This is entirely compatible with the current understanding of the multifactorial nature of their pathogenesis where the distribution of gastric mucosal damage may be determined by factors such as host genetics and the environment.4 An alternative or additional hypothesis however is that other bacterial elements may be important. For example, it is possible that while an intact PAI (type IV secretion system) is necessary to deliver CagA into epithelial cells it is the structure of CagA which determines which intracellular pathway (secretory or proliferative) a cell undergoes.32

The presence of a contiguous cagPAI in 60% of patients suggests that it is conserved in most South Africans as it is in most European and almost all Asian populations.15-17 The importance of genes in this island (particularly in cagII) to the pathogenesis of gastroduodenal disease in South Africa is however suggested by the prevalence of deletions in the 5′ region in patients with gastritis alone. In addition, the strong relationship between the virulent vacA type s1/m1 and the entire PAI would seem to support the importance of both of these elements to disease pathogenesis although it appears that the relationship with clinically significant disease is stronger for vacA alleles than for an intact cagPAI. The finding of similar pathogenic elements from strains isolated from patients with either peptic ulcer disease or gastric adenocarcinoma however indicates that further work is required to differentiate the relationship between specific genes, host factors, and disease processes.


This study was partially funded by a David and Freda Becker Trust Award (JAL) and an Abbott-SAGES Award (MK). MK is a recipient of the Claude Harris Leon Foundation Fellowship.



  • Abbreviations used in this paper:
    PAI, pathogenicity island
    PCR, polymerase chain reaction
