John M. Archibald, John M. Logsdon Jr., W. Ford Doolittle, Origin and Evolution of Eukaryotic Chaperonins: Phylogenetic Evidence for Ancient Duplications in CCT Genes, Molecular Biology and Evolution, Volume 17, Issue 10, October 2000, Pages 1456–1466, https://doi.org/10.1093/oxfordjournals.molbev.a026246
Abstract
Chaperonins are oligomeric protein-folding complexes which are divided into two distantly related structural classes. Group I chaperonins (called GroEL/cpn60/hsp60) are found in bacteria and eukaryotic organelles, while group II chaperonins are present in archaea and the cytoplasm of eukaryotes (called CCT/TriC). While archaea possess one to three chaperonin subunit–encoding genes, eight distinct CCT gene families (paralogs) have been characterized in eukaryotes. We are interested in determining when during eukaryotic evolution the multiple gene duplications producing the CCT subunits occurred. We describe the sequence and phylogenetic analysis of five CCT genes from Trichomonas vaginalis and seven from Giardia lamblia, representatives of amitochondriate protist lineages thought to have diverged early from other eukaryotes. Our data show that the gene duplications producing the eight CCT paralogs took place prior to the organismal divergence of Trichomonas and Giardia from other eukaryotes. Thus, these divergent protists likely possess completely hetero-oligomeric CCT complexes like those in yeast and mammalian cells. No close phylogenetic relationship between the archaeal chaperonins and specific CCT subunits was observed, suggesting that none of the CCT gene duplications predate the divergence of archaea and eukaryotes. The duplications producing the CCTδ and CCTϵ subunits, as well as CCTα, CCTβ, and CCTη, are the most recent in the CCT gene family. Our analyses show significant differences in the rates of evolution of archaeal chaperonins compared with the eukaryotic CCTs, as well as among the different CCT subunits themselves. We discuss these results in light of current views on the origin, evolution, and function of CCT complexes.
Introduction
Chaperonin-mediated protein folding is a universal cellular process (reviewed in Bukau and Horwich 1998). Chaperonins are multisubunit double-ring complexes that harbor nascent or denatured polypeptides in their central chamber and facilitate protein folding through the hydrolysis of ATP (Ranson, White, and Saibil 1998; Sigler et al. 1998). Eukaryotic cells possess two distantly related (but clearly homologous) chaperonin classes with different evolutionary histories. Bacterial-type (group I) chaperonins, called cpn60 or hsp60, reside in eukaryotic organelles, while archaeal-type chaperonins (group II; called CCT [chaperonin-containing TCP-1] or TriC [TCP-1 ring complex]) are present in the eukaryotic cytosol (Trent et al. 1991; Frydman et al. 1992; Willison and Kubota 1994; Kubota, Hynes, and Willison 1995a).
Crystal structure comparisons of group I and group II chaperonins reveal remarkable structural conservation (Ditzel et al. 1998). There are, however, significant differences between the two chaperonin types. While group I chaperonins utilize the co-chaperonin GroES/cpn10 in the protein-folding process, no such homolog functions in the group II chaperonin complex. Instead, an extended “apical domain,” present in the group II chaperonins but absent in the group I chaperonins, is thought to cap the central cavity in a manner analogous to GroES/cpn10 (Klumpp, Baumeister, and Essen 1997; Horwich and Saibil 1998; Llorca et al. 1999b). Recent experiments suggest that novel co-chaperonins, unrelated to GroES/cpn10, interact with CCT to assist protein folding (Gebauer, Melki, and Gehring 1998; Geissler, Siegers, and Schiebel 1998; Vainberg et al. 1998; Siegers et al. 1999). Group I and group II chaperonins also differ in the number of subunits present in each chaperonin ring. Escherichia coli GroEL, the archetypal bacterial chaperonin, has a double-ring structure with seven subunits per ring (Braig et al. 1994), while archaeal and eukaryotic cytosolic chaperonin complexes are composed of eight- or nine-membered rings (reviewed in Willison and Horwich 1996; Klumpp and Baumeister 1998; Gutsche, Essen, and Baumeister 1999).
The most unusual feature of the group II chaperonins is their hetero-oligomeric composition. Unlike the homo-oligomer GroEL, archaeal chaperonins are often composed of several different (but homologous) subunits. We concluded previously that in the chaperonin complexes of archaea, hetero-oligomerism likely evolved multiple times independently (Archibald, Logsdon, and Doolittle 1999). In archaeal genomes, duplicate chaperonin genes (paralogs) are often more similar to each other than to those in other archaea, suggesting recent (lineage-specific) duplication. Compared with archaeal chaperonins, the eukaryotic CCT is even more hetero-oligomeric. This was first suggested biochemically (Frydman et al. 1992; Lewis et al. 1992), and subsequent sequence comparisons of CCT genes in mouse confirmed the existence of eight distinct subunit species (α, β, γ, δ, ϵ, η, θ, and ζ), each thought to occupy a unique position in the eight-membered CCT rings (Kubota et al. 1994; Kubota, Hynes, and Willison 1995a; Liou and Willison 1997). The divergent nature of these genes, as well as the discovery of clear yeast orthologs to each of the mouse subunits (Kim, Willison, and Horwich 1994; Kubota et al. 1994; Stoldt et al. 1996), suggests an ancient paralogy within eukaryotes.
Trichomonas vaginalis and Giardia lamblia, members of the parabasalids and the diplomonads, respectively, are two parasitic unicellular eukaryotes. Originally, based on their lack of mitochondria and ultrastructural simplicity, parabasalids and diplomonads were suggested to represent early-diverging eukaryotic lineages (for recent review see Roger 1999). These lineages (and several others) were called “Archezoa” by Cavalier-Smith (1987) and were proposed to have diverged from other eukaryotes prior to the bacterial endosymbiosis that gave rise to mitochondria (i.e., they were primitively amitochondriate). Consistent with this idea, phylogenies of small-subunit ribosomal RNA (SSUrRNA) and several proteins placed Trichomonas and Giardia among the deepest branches on the eukaryotic tree (Sogin et al. 1989; Leipe et al. 1993; Hashimoto et al. 1994; Stiller, Duffield, and Hall 1998). However, the discovery of group I (i.e., bacterial/mitochondrial) chaperonin genes in the nuclear genomes of these and other amitochondriate eukaryotes (Clark and Roger 1995; Roger, Clark, and Doolittle 1996; Roger et al. 1998) suggests that these organisms once possessed mitochondria (or their progenitors) and that the absence of mitochondria is a derived feature. Furthermore, confidence in the deepest branches of phylogenetic trees has recently been shaken. Like parabasalids and diplomonads, the microsporidia (other amitochondriate members of Cavalier-Smith's [1987] Archezoa) also branched deeply in SSUrRNA and elongation factor trees (Leipe et al. 1993; Kamaishi et al. 1996). However, other data (Keeling and Doolittle 1996; Germot, Philippe, and Le Guyader 1997; Hirt et al. 1997, 1999) have shown that microsporidia are, in fact, relatives of fungi, and that their deep placement in phylogenetic trees was an artifact of a fast rate of sequence evolution. Hirt et al. (1999) further suggested that support for the placement of Trichomonas and Giardia among the deepest eukaryotic groups might also be suspect, although no alternative placement for these organisms with respect to other eukaryotes is evident.
Is the presence of eight CCT subunits a universal feature of eukaryotic cells? An early eukaryotic lineage that diverged from other eukaryotes prior to multiple CCT gene duplications might be expected to possess a smaller and/or different complement of CCT genes. The taxonomic diversity needed to address this question is, however, currently lacking. To fill this gap, we sought to (1) increase the phylogenetic diversity of known CCT genes, (2) perform phylogenetic analyses of archaeal and eukaryotic chaperonins to determine when during eukaryotic evolution (and in what order) the gene duplications that gave rise to the CCT subunits occurred, and (3) address specific hypotheses regarding the origin and evolution of CCT from an archaeal-like homo- or moderately hetero-oligomeric chaperonin complex ancestor. Previous comparative sequence analyses (Kubota et al. 1994) have indicated that a completely hetero-oligomeric CCT was present in the common ancestor of animals and fungi. Our results push back the origin of the CCT gene duplications to the common ancestor of animals, fungi, plants, parabasalids, and diplomonads, and likely to the common ancestor of all extant eukaryotes. While the exact position of the archaeal root on the eukaryotic CCT tree is ambiguous, no close phylogenetic relationship between the archaeal chaperonins and specific eukaryotic CCT subunits was observed, suggesting that the eukaryotic CCT complex became hetero-oligomeric independently of the archaeal chaperonins. The gene duplications producing the CCTδ and CCTϵ subunits, as well as those in the CCTα/CCTβ/CCTη clade, represent the most recent duplications of the CCT gene family.
Materials and Methods
Trichomonas and Giardia Genomic DNAs
Genomic DNA from T. vaginalis strain NIH-C1 (ATCC#30001) was a gift from M. Müller (Rockefeller University, New York). Genomic DNA was isolated as described previously (Roger, Clark, and Doolittle 1996) from G. lamblia cells (strain WB; ATCC#30957) provided by A. Roger and D. Edgell.
Cloning and Sequencing of Trichomonas and Giardia CCT Genes
Degenerate PCR primers were designed based on an alignment of published archaeal and eukaryotic chaperonin protein sequences (forward primers: CCT-1-for [5′-TACGGTGAYGGNACNAC-3′], CCT-5-for [5′-GAAATCGGNGAYGGNAC-3′], CCT-9-for [5′-CCAGTCGGTCTNGAYAARATG-3′]; reverse primers: CCT-3-rev [5′-TGGAGCTCCNSCNCCNG-3′], CCT-4-rev [5′-CTCTACAGCNCCNSCNCC-3′], CCT-7-rev [5′-ACGATGCACATNGHRTCRTG-3′]). PCR amplifications were carried out under standard conditions (Gibco BRL Taq polymerase, buffer, and dNTPs; Ericomp and MJ Research Inc. PTC-100 thermal cyclers), with 40–45 cycles of 92°C for 30 s, 50°C for 30 s, and 72°C for 30–60 s. PCR products of the expected size were isolated (BIORAD, Prep-a-gene) and cloned (TA cloning kit, Invitrogen), or they were cloned directly from low-melt agarose (TA-TOPO cloning kit, Invitrogen). PCR products were sequenced manually (T7 sequencing kit, Pharmacia). Multiple independent PCR and genomic library clones were sequenced using LiCor and ABI automated sequencers.
The Trichomonas chaperonin genes presented in this study were obtained using the following primer combinations: Trichomonas Ccta: CCT-1-for/CCT-4-rev, CCT-5-for/CCT-4-rev, CCT-9-for/CCT-7-rev; Trichomonas Cctd: p80-4B (5′-CTGCCATTYGTGGCNATG-3′)/P80-5 (5′-AGCGATGAACTTNARDAT-3′); Trichomonas Cctg: CCT-1-for/CCT-4-rev; Trichomonas Cctz: CCT-5-for/CCT-3-rev, CCT-5-for/CCT-4-rev, CCT-5-for/CCT-7-rev. A Trichomonas cDNA library clone encoding a protein with significant similarity to CCTη was a gift from R. Hirt and M. Embley.
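Each degenerate primer listed above is a pool of oligonucleotides written with IUPAC ambiguity codes (Y = C/T, R = A/G, S = C/G, H = A/C/T, D = A/G/T, N = any base). The degeneracy of a pool, that is, the number of distinct sequences it contains, is the product of the number of bases allowed at each position. The sketch below computes this and is illustrative only, not part of the original protocol. The names `IUPAC`, `degeneracy`, `expand`, and `PRIMERS` are hypothetical.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence in a degenerate primer pool."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]

# Degenerate primers as given in the text (5'->3').
PRIMERS = {
    "CCT-1-for": "TACGGTGAYGGNACNAC",
    "CCT-5-for": "GAAATCGGNGAYGGNAC",
    "CCT-9-for": "CCAGTCGGTCTNGAYAARATG",
    "CCT-3-rev": "TGGAGCTCCNSCNCCNG",
    "CCT-4-rev": "CTCTACAGCNCCNSCNCC",
    "CCT-7-rev": "ACGATGCACATNGHRTCRTG",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:10s} {seq:22s} {degeneracy(seq):4d}-fold")
```

For these six primers the pools range from 16-fold (CCT-9-for) to 128-fold (CCT-3-rev and CCT-4-rev). The 128-fold pools are the two reverse primers with three N positions.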
For Giardia, a portion of the Cctd gene was obtained with degenerate PCR primers (CCT-1-for/CCT-4-rev; see above). A recent sequence survey of the G. lamblia genome (Smith et al. 1998) revealed the presence of coding regions with similarity to several additional CCT subunits. Exact-match PCR primers were designed based on preliminary genome sequence data from Giardia and, in combination with degenerate primers (above), were used to amplify multiple CCT genes (Giardia Ccta: GL.alpha.for.1 [5′-GTAGACATGCTTGTCTGCAG-3′]/GL.alpha.rev.1 [5′-GTCGTGTATGCTCTAGTAGC-3′]; Giardia Cctb: GL.beta.for.1 [5′-CCATAGCTGAGTTATAGATG-3′]/GL.beta.rev.2 [5′-TAATCTTGTCAGAGTCCATG-3′], CCT-1-for/GL.beta.rev.3 [5′-AGGTGCACAGCTTATTATGC-3′]; Giardia Cctg: CCT-5-for/GL.gamma.rev.1 [5′-TCCGCAGAACCATACGCCAG-3′]; Giardia Ccte: GL.eps.for.1 [5′-ATGATTAGTATCTCTCAGTG-3′]/GL.eps.rev.1 [5′-GCTGAACGATCGTTGTCATG-3′]; Giardia Cctq: GL.theta.for.1 [5′-TTCTTCCATGATGAAGGTCG-3′]/GL.theta.rev.1 [5′-GACCACGTACTGCTCTAGAC-3′]; Giardia Cctz: GL.zeta.for.1 [5′-AGAATTTCATGTCTGCTATC-3′]/GL.zeta.rev.1 [5′-TGCTCAGAACGTGGTATCTG-3′]). Preliminary sequence data from the Giardia Genome Project were obtained from the Josephine Bay Paul Center Web site at the Marine Biological Laboratory (www.bpc.mbl.edu). Sequencing was supported by the National Institute of Allergy and Infectious Diseases using equipment from LI-COR Biotechnology. The sequences presented in this study have been submitted to GenBank under the accession numbers AF226714–AF226726.
Trichomonas vaginalis Genomic Library Screening
PCR products of Trichomonas CCT genes were isolated (BIORAD, Prep-a-gene), labeled with α32P (Prime-It II random primer labeling kit, Stratagene), and used as probes to screen a T. vaginalis genomic library (Lambda ZapExpress, Stratagene; constructed previously by N. Fast and J. Logsdon). Genomic library clones containing full-length Ccta, Cctd, and Cctz genes, as well as a 5′-truncated clone containing Cctg, were obtained.
Phylogeny
Based on an alignment of group II chaperonin protein sequences constructed previously (Archibald, Logsdon, and Doolittle 1999), a larger alignment containing diverse bacterial/organellar, archaeal, and eukaryotic cytosolic chaperonin sequences (i.e., group I and group II chaperonins) was constructed and adjusted manually, taking into consideration published alignments (Kim, Willison, and Horwich 1994; Kubota et al. 1994; Waldmann et al. 1995) and crystal structures (Ditzel et al. 1998). Amino acid sequences inferred from the Trichomonas and Giardia CCT genes were added manually based on globally conserved regions. From this master alignment, smaller alignments containing subsets of sequences (e.g., bacteria + archaea + eukaryotes, archaea + eukaryotes, eukaryotes only) were constructed and used for phylogenetic analysis. For archaea + eukaryotes, the alignment consisted of 355 unambiguously aligned amino acid positions and included 10 archaeal sequences (six euryarchaeotes, four crenarchaeotes) and five sequences from each of the eight eukaryotic CCT paralogs (partial sequences presented here [CCTγ from Trichomonas and Giardia, Giardia CCTδ] were excluded in order to maximize the number of sites). For bacteria + archaea + eukaryotes, the alignment included 15 diverse bacterial sequences, 10 representative archaeal sequences, and the same eukaryotic sequences as above. This alignment contained 227 amino acid sites and corresponded primarily to the universally conserved ATP-binding/ATPase domains.
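The following sketch illustrates how smaller alignments are derived from the master alignment, and why partial sequences were excluded. It assumes, as a simplification of our own, that a usable site is any column with no gap or unknown residue among the retained taxa. The authors chose unambiguously aligned positions by manual inspection, which this code does not reproduce. The helper `subalignment` and the toy sequences are hypothetical.

```python
GAP_OR_UNKNOWN = set("-?X")

def subalignment(master: dict[str, str], taxa: list[str]) -> dict[str, str]:
    """Extract `taxa` from the master alignment, keeping only the columns in
    which none of the retained sequences has a gap or unknown residue."""
    rows = [master[t] for t in taxa]
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(length)
            if all(r[i] not in GAP_OR_UNKNOWN for r in rows)]
    return {t: "".join(master[t][i] for i in keep) for t in taxa}

# Toy master alignment with made-up sequences; "partial" mimics a 5'-truncated gene.
TOY_MASTER = {
    "taxon_1": "MKLVAGEIR",
    "taxon_2": "MKIVSGELR",
    "partial": "----AGDLR",
}

if __name__ == "__main__":
    print(subalignment(TOY_MASTER, ["taxon_1", "taxon_2"]))  # 9 columns kept
    print(subalignment(TOY_MASTER, list(TOY_MASTER)))        # 5 columns kept
```

Adding the partial sequence cuts the usable columns from nine to five. This is the trade-off behind leaving partial sequences out of the archaea + eukaryotes alignment.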
Alignments of individual CCT subunits were used to estimate site-by-site evolutionary rates and the proportion of invariant sites (see below). They contained the following taxa and numbers of sites: CCTα, 338 sites (Homo, Xenopus, Arabidopsis, Drosophila, Dictyostelium, Schistosoma, Caenorhabditis, Tetrahymena, Saccharomyces, Trichomonas, and Giardia); CCTβ, 358 sites (Homo, Saccharomyces, Schizosaccharomyces, Caenorhabditis, Plasmodium, and Giardia); CCTδ, 361 sites (Homo, Caenorhabditis, Fugu, Saccharomyces, Schizosaccharomyces, Glycine, and Trichomonas); CCTϵ, 360 sites (Homo, Caenorhabditis, Drosophila, Plasmodium, Saccharomyces, Avena, Cucumis, Arabidopsis, and Giardia); CCTη, 360 sites (Homo, Caenorhabditis, Saccharomyces, Schizosaccharomyces, Plasmodium, Tetrahymena, and Trichomonas); CCTγ, 287 and 353 sites (Homo, Caenorhabditis, Xenopus, Drosophila, Arabidopsis, Tetrahymena, Oxytricha, Saccharomyces, Schizosaccharomyces, and Leishmania; with and without Trichomonas and Giardia, respectively); CCTθ, 359 sites (Homo, Caenorhabditis, Candida, Saccharomyces, Schizosaccharomyces, Tetrahymena, and Giardia); CCTζ, 360 sites (Homo [zeta1, zeta2], Caenorhabditis, Drosophila, Saccharomyces, Schizosaccharomyces, Trichomonas, and Giardia). Where missing data precluded site rate calculations (e.g., partial Trichomonas and/or Giardia sequences), amino acid positions were considered slowly evolving if the residue at that position was conserved in all taxa of a particular subunit alignment. Conservative amino acid substitutions were also taken into consideration. All alignments are available from J.M.A. (email: jmarchib@is2.dal.ca).
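In its simplest form, the proportion of constant positions in a subunit alignment is the fraction of columns in which every taxon carries the same residue. This is also the fallback criterion for slowly evolving positions described above. The sketch below computes it under stated assumptions: columns containing gaps are skipped, and conservative substitutions are not modeled. A raw count of this kind need not equal a maximum-likelihood estimate of the proportion of invariable sites. The function name is hypothetical.

```python
def constant_site_fraction(alignment: list[str]) -> float:
    """Fraction of gap-free columns in which all sequences share one residue."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must have equal length")
    # zip(*alignment) yields one tuple per column; drop columns with any gap.
    columns = [col for col in zip(*alignment) if "-" not in col]
    if not columns:
        return 0.0
    constant = sum(1 for col in columns if len(set(col)) == 1)
    return constant / len(columns)

if __name__ == "__main__":
    toy = ["MKLVAG", "MKIVAG", "MRLVSG"]  # made-up sequences
    print(f"{100 * constant_site_fraction(toy):.1f}% constant")
```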
Phylogenetic trees were inferred using maximum parsimony (MP), distance-based, and maximum-likelihood (ML) methods of tree reconstruction with the following programs: MP in PAUP*, version 4.0 (Swofford 1998); distance, PROTDIST (PAM matrices), NEIGHBOR, and FITCH in PHYLIP, version 3.57 (Felsenstein 1993); ML, protML using the JTT-F model in MOLPHY (Adachi and Hasegawa 1996); and quartet puzzling using PUZZLE, versions 4.0 and 4.02 (Strimmer and von Haeseler 1997). Statistical support for MP and distance-based trees was obtained by bootstrapping with either 100 or 1,000 resampling replicates. Quartet puzzling support values (from PUZZLE) or RELL values (resampling estimated log likelihoods; obtained by quick-add ML searches of 100 or 1,000 trees in protML [Adachi and Hasegawa 1996]) were used as measures of support for ML trees. Support values for ML-distance analyses were obtained by bootstrapping (500 replicates) with PUZZLEBOOT, version 1.02 (A. Roger and M. Holder; http://members.tripod.de/korbi/puzzle/). PUZZLE was used to calculate ML distance matrices (using an eight-category discrete approximation to the Γ distribution plus one invariable rate category), to determine the proportion of constant amino acid positions in alignments, and to statistically assess the significance of different tree topologies using the Kishino-Hasegawa test (Kishino and Hasegawa 1989). To estimate site-by-site evolutionary rates, discrete Γ distributions approximated with eight variable-site rate categories were calculated over neighbor-joining or Fitch-Margoliash trees using the JTT-F model of amino acid substitution in PUZZLE.
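Bootstrap support works in three steps. Alignment columns are resampled with replacement to build pseudo-replicate data sets. A tree is inferred from each replicate. Finally, the analyst records how often a given clade recurs. The sketch below covers only the resampling and tallying steps; tree inference for each replicate, done in this study with PAUP* and PHYLIP, is outside its scope. The names `bootstrap_replicates` and `clade_support` are hypothetical.

```python
import random
from typing import Iterator

def bootstrap_replicates(alignment: dict[str, str], n_reps: int,
                         seed: int = 1) -> Iterator[dict[str, str]]:
    """Yield pseudo-replicate alignments built by resampling alignment
    columns with replacement (nonparametric bootstrap)."""
    rng = random.Random(seed)  # fixed seed so replicates are reproducible
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    for _ in range(n_reps):
        # Draw `length` column indices with replacement; the same column may repeat.
        cols = [rng.randrange(length) for _ in range(length)]
        yield {t: "".join(alignment[t][i] for i in cols) for t in taxa}

def clade_support(replicate_clades: list[set[frozenset[str]]],
                  clade: frozenset[str]) -> float:
    """Fraction of replicate trees (each given as its set of clades) containing `clade`."""
    return sum(clade in clades for clades in replicate_clades) / len(replicate_clades)

# Toy alignment with made-up sequences.
TOY = {"t1": "MKLVAG", "t2": "MKIVAG", "t3": "MRLVSG"}

if __name__ == "__main__":
    for rep in bootstrap_replicates(TOY, 3):
        print(rep)
```

A clade found in 87 of 100 replicate trees would receive 87% bootstrap support.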
Results
Identification of CCT Genes in Putatively Ancient Eukaryotic Lineages
We isolated and sequenced partial or complete coding sequences for multiple Cct genes from the parabasalid T. vaginalis (Ccta, Cctg, Cctd, Ccth, and Cctz) and the diplomonad G. lamblia (Ccta, Cctb, Cctg, Cctd, Ccte, Cctq, and Cctz). For Trichomonas, genomic library clones encoding two different Ccta and two different Cctd genes were obtained. While both Cctd genes code for identical proteins (but have many synonymous substitutions), the two Ccta genes encode slightly different proteins. The full-length CCTα-1 from Trichomonas has an unusually short and divergent 3′ end, while the CCTα-2 clone is 5′-truncated and possesses a carboxyl terminus typical of other CCTs. Southern hybridization of a Ccta PCR product to Trichomonas genomic DNA produced multiple hybridizing bands, confirming the presence of at least two genomic copies of this gene (data not shown). No spliceosomal introns were found in any of the genes presented here, consistent with their complete absence from the protein-coding genes described in these organisms thus far. To further increase the taxonomic sampling of CCT sequences for comparative study and phylogenetic analysis, we searched the public sequence databases by BLAST (Altschul et al. 1997) using the Trichomonas and Giardia CCT sequences as queries. We obtained complete sets of CCT protein sequences (eight or nine) for humans, mice (Kubota et al. 1994; Kubota, Hynes, and Willison 1995b), yeast, and Caenorhabditis elegans, as well as single or multiple CCT sequences for Plasmodium falciparum, Leishmania major, and a variety of animals, plants, fungi, and ciliates. Several of the C. elegans sequences (CCTγ, CCTη, and CCTθ; obtained from the Sanger Centre [http://www.sanger.ac.uk/]) contained unique insertions and/or deletions that were most likely the result of incorrect intron/exon boundary predictions; in each case, we identified alternative splice sites that removed the apparent insertions or added the missing exons. Our data set of group II chaperonins included 35 archaeal sequences and 85 eukaryotic CCTs.
The evolutionary relationship between the Trichomonas and Giardia sequences and other eukaryotic and archaeal chaperonins was examined by constructing phylogenetic trees. Figure 1 shows an unrooted neighbor-joining tree inferred from a 260-position amino acid alignment of 11 representative archaeal sequences and all 85 eukaryotic CCTs. Most notably, the Trichomonas and Giardia sequences form robust clades with each of the eight different CCT paralogs (100% support with all phylogenetic methods; data not shown). This indicates that (1) the gene duplications producing the paralogs predate the divergence of Trichomonas and Giardia from other eukaryotes, and (2) multiple CCT paralogs have been retained over a large timescale of eukaryotic evolution. It is likely that both Trichomonas and Giardia possess all eight CCT paralogs. Indeed, a portion of Ccth, the one CCT subunit gene not isolated from Giardia here, has recently been sequenced by the Giardia Genome Sequencing Project (Smith et al. 1998). Figure 1 also shows that the branch leading to the archaeal chaperonins is remarkably short compared with the branches leading to the different CCT subunits. Branch lengths within the various CCT clades also appear to vary. To assess the significance of this variation, we calculated the percentages of amino acid identity shared between the mouse CCTs and the Caenorhabditis, Saccharomyces, Giardia, and Trichomonas sequences, as well as the proportion of constant amino acid residues in each individual CCT subunit alignment (see Materials and Methods). The results (fig. 2) suggest differences in the degree of conservation of the individual CCT subunits. CCTθ (and to a lesser extent CCTγ) appears to be the least conserved subunit, showing the lowest percentage of identity in all within-ortholog comparisons.
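Percent identity between two aligned sequences depends on how gapped columns are treated, and the text does not state the convention used. The sketch below assumes that identity is computed over columns in which neither sequence has a gap. This is one common choice, not necessarily the authors'. The function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences over columns where
    neither sequence has a gap (assumed convention)."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)

if __name__ == "__main__":
    # Made-up fragments: 5 identical residues out of 6 gap-free columns.
    print(f"{percent_identity('MKLV-AG', 'MKIVSAG'):.1f}%")
```

Counting gapped columns as mismatches would lower the value for the same pair. For this reason, identities are only comparable when computed with the same convention.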
In the CCTθ alignment, only 14.5% of the amino acid residues were constant (falling to 9.5% when the divergent Plasmodium CCTθ sequence was included), compared with 21.3%–37.7% constant residues in the other CCT subunit alignments. To statistically assess differences in the substitution rates of the different CCT paralogs, we performed a molecular-clock likelihood ratio test with n − 2 degrees of freedom (n being the number of taxa) in PUZZLE (Strimmer and von Haeseler 1997). The test was run on an ML-distance tree of the eight eukaryotic CCTs (40 representative taxa and 355 sites; see Materials and Methods). A molecular clock for the CCT paralogs was strongly rejected (P < 0.01; data not shown).
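The clock test compares the log-likelihood of a tree with unconstrained branch lengths against that of a clock-constrained tree. For n taxa, the unrooted tree has 2n − 3 free branch lengths and the clock tree has n − 1 node heights, which gives n − 2 degrees of freedom. The sketch below shows the arithmetic, assuming both log-likelihoods have already been obtained from a program such as PUZZLE. The chi-square tail probability comes from the series expansion of the regularized incomplete gamma function, so only the standard library is needed. Function names are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square(df), via the series for the regularized
    lower incomplete gamma function P(df/2, x/2). Adequate for moderate
    statistics; very small p-values lose relative precision."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def clock_lrt(lnl_free: float, lnl_clock: float, n_taxa: int) -> tuple[float, int, float]:
    """Likelihood ratio test of a molecular clock: returns (2 * delta lnL, df, p)."""
    stat = 2.0 * (lnl_free - lnl_clock)
    df = n_taxa - 2
    return stat, df, chi2_sf(stat, df)
```

For a 40-taxon tree, the statistic would be compared against a χ² distribution with 38 degrees of freedom. A clock is rejected at the 1% level when the returned p-value is below 0.01.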
An alignment of the inferred Trichomonas and Giardia protein sequences with mouse CCTs is shown in figure 3. All of the sequences possess putative ATP-binding/ATP-hydrolysis sequence motifs similar to those described for other chaperonins (Kubota et al. 1994) and share significant amino acid identity (41%–58.5%) with mouse CCT homologs. The most striking feature of the alignment is the presence of multiple insertions in the Giardia CCT sequences that are not found in any CCTs characterized thus far. These insertions generally map to regions of variable length. However, the Giardia CCTθ and CCTϵ sequences possess unique insertions (approximately 16 and 9 amino acids, respectively; see fig. 3) in a highly conserved region. This region corresponds to a domain present in the bacterial/organellar chaperonins (positions 339–374 of the Escherichia coli GroEL sequence [Ditzel et al. 1998]) but absent from eukaryotic CCTs and archaeal chaperonins. The significance of these insertions (which presumably occurred independently) for chaperonin subunit structure and function is not known.
It has been noted that the multiple CCT subunits are quite divergent from one another, particularly in their polypeptide-binding domains (Kim, Willison, and Horwich 1994). To examine the pattern and degree of conservation in the different CCT subunits more closely, we estimated the rate of evolution at amino acid sites across individual subunit alignments that contained maximal taxonomic diversity (see Materials and Methods). When these site rates were mapped onto an alignment containing all of the CCT subunits (paralogs), three general categories of amino acid sites were apparent:

(1) conserved (slowly evolving) and identical amino acid residues present in multiple subunits (e.g., the ATPase domains);
(2) conserved but different amino acid residues present in different subunits; and
(3) poorly conserved/fast-evolving residues (i.e., little or no evolutionary constraint) present in one or multiple subunits.

The results are presented in figure 3. Most notably, and consistent with a previous report (Kim, Willison, and Horwich 1994), much of the divergence between the different CCT subunits corresponds to their apical domains, the region involved in the binding of substrate. However, we also detected differences in the degree of conservation and amino acid sequence of the putative ATP-binding domains in the different subunits. We also found highly conserved “paralog-specific” motifs in the equatorial and intermediate domains (fig. 3).
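The three categories can be expressed as a simple rule applied column by column to the all-paralog alignment. The sketch below is a hypothetical simplification of that mapping, not the authors' exact criteria. Within each subunit, a column is called slowly evolving when all orthologs share the same residue, which is the fallback rule from Materials and Methods. The column is then classed by two questions: is it slow in every subunit, and do the subunit consensus residues agree? All names are illustrative.

```python
from collections import Counter
from enum import Enum

class SiteClass(Enum):
    CONSERVED_SHARED = 1     # slow in every subunit, same residue in all subunits
    CONSERVED_DIFFERENT = 2  # slow in every subunit, but subunits differ in residue
    VARIABLE = 3             # fast-evolving in at least one subunit

def subunit_state(orthologs: list[str]) -> tuple[str, bool]:
    """Consensus residue of one column within a single subunit alignment, and
    whether that column counts as slowly evolving (residue shared by all taxa)."""
    residue, count = Counter(orthologs).most_common(1)[0]
    return residue, count == len(orthologs)

def classify_column(states: dict[str, tuple[str, bool]]) -> SiteClass:
    """Classify one column of the all-paralog alignment from per-subunit states."""
    if not all(slow for _, slow in states.values()):
        return SiteClass.VARIABLE
    residues = {residue for residue, _ in states.values()}
    if len(residues) == 1:
        return SiteClass.CONSERVED_SHARED
    return SiteClass.CONSERVED_DIFFERENT

if __name__ == "__main__":
    column = {  # made-up residues from three orthologs per subunit
        "alpha": subunit_state(["G", "G", "G"]),
        "beta": subunit_state(["G", "G", "G"]),
        "gamma": subunit_state(["A", "A", "A"]),
    }
    print(classify_column(column))  # conserved within subunits, different between them
```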
Chaperonin Phylogeny
To more rigorously address the evolutionary relationship of the CCT paralogs with the archaeal chaperonins, and to determine the position of the bacterial (i.e., group I) root of the group II chaperonin tree, we performed phylogenetic analyses using alignments that contained reduced numbers of taxa and maximal phylogenetic diversity (see Materials and Methods). Surprisingly, when the bacterial chaperonin sequences were included as an outgroup (65 taxa, 227-position alignment; see Materials and Methods), parsimony, distance-based, and protML analyses produced trees in which the eukaryotic CCTζ clade (not archaea) was the deepest branch of the group II chaperonins (data not shown). ML-distance trees (neighbor-joining and Fitch-Margoliash; as above) placed the euryarchaeotes as the deepest branch, but as a paraphyletic group separated from the crenarchaeotes by the CCTζ clade of eukaryotes. A similar result was obtained in protML analyses using an alignment from which the 24 fastest-evolving sites had been removed, leaving 203 sites. The deepest branches in these phylogenies were not well supported, however, suggesting that CCTζ (the longest branch of the CCTs; see fig. 1) might be attracted to the long branch of the bacterial outgroup. Clearly, the small number of alignable amino acid positions between the group I and the group II chaperonins (approximately 200 sites, corresponding primarily to the ATP-binding/hydrolysis motifs) provides little phylogenetic signal with which to address the evolutionary history of the archaeal/eukaryotic chaperonin tree. We therefore focused on the group II chaperonin data set and attempted to determine the placement of the archaeal chaperonin root on the eukaryotic CCT tree and the branching order of the various CCT paralogs.
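Removing the fastest-evolving sites is a common way to test whether a deep placement reflects long-branch attraction. The text does not state how the 24 removed sites were selected. The sketch below therefore assumes, hypothetically, that each column has already been assigned a discrete-Γ rate category (for example by PUZZLE, with 1 as the slowest). It drops every column at or above a chosen cutoff category. The function name is illustrative.

```python
def drop_fast_sites(alignment: dict[str, str], rate_category: list[int],
                    cutoff: int) -> dict[str, str]:
    """Return a copy of `alignment` without the columns whose rate category
    is >= cutoff. rate_category[i] is the category estimated for column i."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1 or lengths.pop() != len(rate_category):
        raise ValueError("need equal-length sequences and one category per column")
    keep = [i for i, cat in enumerate(rate_category) if cat < cutoff]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}

if __name__ == "__main__":
    toy = {"a": "ABCDE", "b": "ABCDF"}                    # made-up sequences
    print(drop_fast_sites(toy, [1, 8, 2, 8, 3], cutoff=8))  # removes columns 2 and 4
```

Setting `cutoff` to the highest category removes only the fastest class. Lower cutoffs strip progressively more of the rate-variable sites.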
Figure 4A shows an ML tree of representative archaeal chaperonins and eukaryotic CCTs (50 taxa, 355 sites). As in figure 1, strong support for the monophyly of each individual CCT subunit clade is recovered. Furthermore, the CCTδ and CCTϵ paralogs form a well-supported clade with ML, distance, and parsimony methods (data not shown), as do CCTα, CCTη, and CCTβ (although more weakly). For archaea, the clustering of the euryarchaeal sequences is well supported, while the monophyly of the α and β paralogs of crenarchaeotes is not. The ML tree shows the crenarchaeal β subunit sequences branching with the euryarchaeotes, suggesting that the α/β paralogy in crenarchaeotes may predate their divergence from euryarchaeotes (this topology was observed with some but not all phylogenetic methods; data not shown). Interestingly, most of the deepest branches of the group II chaperonin tree were poorly resolved, even when the maximum number of alignable amino acid positions was used. The systematic exclusion of individual CCT paralogs from the analyses had little effect on the support for the relationships among the CCT subunits (data not shown). This held most notably for CCTζ (the longest branch) and CCTθ (poorly conserved), suggesting that no particular subset of the data was the cause of the unstructured trees. We therefore performed Kishino-Hasegawa tests (Kishino and Hasegawa 1989) in PUZZLE to assess the significance of alternative topologies to the ML tree, taking into account among-site rate heterogeneity. In these analyses, the optimal topology was slightly different from the protML tree in figure 4A (which was the second-best tree; 0.64 SE difference). It placed the archaeal root between the CCTθ/CCTδ/CCTϵ and the CCTβ/CCTη/CCTα/CCTγ/CCTζ clades (fig. 4B). Several other rootings were not significantly worse at the 5% level (e.g., the archaea as a sister group to CCTγ, CCTθ, or CCTζ), but were between 1.2 and 1.8 SEs worse than the best tree. Notably, placements of the archaeal root within the CCTδ/CCTϵ and CCTα/CCTβ/CCTη clades gave significantly worse topologies, confirming the results of figure 4A and suggesting that these paralogies are the most recent in the evolution of CCT.
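The Kishino-Hasegawa test compares two topologies on the same alignment using per-site log-likelihood differences. The total difference δ is divided by its standard error, which is estimated from the variance of the site-wise differences. Differences beyond roughly 1.96 SE are conventionally taken as significant at the two-sided 5% level, so rootings 1.2–1.8 SEs worse than the best tree are not rejected. The sketch below assumes that per-site log-likelihoods, computed under whatever rate model is used, are already available. The function name is hypothetical.

```python
import math

def kishino_hasegawa(site_lnl_a: list[float],
                     site_lnl_b: list[float]) -> tuple[float, float, float]:
    """Compare two trees from per-site log-likelihoods on the same alignment.
    Returns (delta lnL, SE of delta, delta in SE units); positive delta
    means tree A has the higher likelihood."""
    n = len(site_lnl_a)
    if n != len(site_lnl_b) or n < 2:
        raise ValueError("need matching per-site log-likelihoods for >= 2 sites")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference: n times the sample variance of the d_i.
    var_sites = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(n * var_sites)
    if se == 0.0:
        return delta, 0.0, (0.0 if delta == 0.0 else math.copysign(math.inf, delta))
    return delta, se, delta / se

if __name__ == "__main__":
    # Made-up per-site values for four sites.
    delta, se, z = kishino_hasegawa([-1.0, -2.0, -3.0, -4.0], [-1.5, -2.5, -3.0, -4.0])
    print(f"delta = {delta:.2f}, SE = {se:.3f}, {z:.2f} SE; significant: {abs(z) > 1.96}")
```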
Discussion
Gene duplications and gene losses make the reconstruction of ancient molecular events difficult. Group II chaperonins are a striking example: lineage-specific gene duplication and gene loss have occurred in archaea, and a remarkably ancient and complex paralogy exists in eukaryotes. Our data bear on several aspects of the origin and evolution of the completely hetero-oligomeric CCT in eukaryotes and on the origin of the eukaryotic cell itself.
The CCT genes presented here from Trichomonas and Giardia, two of the most divergent eukaryotes presently known, show strong affinity for each of the eight CCT subunit families found in “higher” eukaryotes. The gene duplications producing the different subunits clearly occurred very early in the evolution of the eukaryotic cell, and it is unlikely that the loss of any one of the CCT paralogs could, at this stage, be tolerated. The essential nature of at least six (and likely all) of the eight CCT genes in yeast (Stoldt et al. 1996; Lin et al. 1997) and their seemingly universal distribution in the diverse eukaryotic lineages examined here speak to that constraint. It has been suggested (Willison and Horwich 1996) that CCT evolved from an eightfold symmetric chaperonin complex like that in the crenarchaeote Pyrodictium occultum (Phipps et al. 1991, 1993). This suggestion is based on the near-universal distribution of eight-membered ring structures among group II chaperonins (with Sulfolobus being the only exception; Marco et al. 1994). The α and β subunits of crenarchaeotes (the deepest paralogy in archaea; Archibald, Logsdon, and Doolittle 1999) do not, however, branch preferentially with particular subsets of CCT paralogs, as would be expected if the paralogy predated the divergence of crenarchaeotes and eukaryotes. In no sense are particular archaeal chaperonin paralogs more closely related to some CCT paralogs than to others. While the ancestral chaperonin complex in eukaryotes was likely composed of eight-membered rings, it appears that CCT became hetero-oligomeric independently of the chaperonin complexes in archaea.
We attempted to resolve the relative branching order of the CCT paralogs and thus determine the order in which CCT “acquired” so many different subunits. Unlike other paralogous “eukaryote-specific” gene families, such as actins and tubulins, which have very distantly related prokaryotic homologs, the eukaryotic CCTs have relatively close archaeal homologs to serve as an outgroup. The exact placement of the archaeal root on the CCT tree remains unclear, but, significantly, our data reject the placement of the root within the CCTδ/CCTϵ and the CCTα/CCTβ/CCTη clades. It is thus likely that CCT underwent intermediate stages of hetero-oligomerism, perhaps similar to the degree observed in present-day archaeal chaperonin complexes, and that the CCTδ, CCTϵ, CCTα, CCTβ, and CCTη subunits represent more recent divergences in eukaryotic chaperonin evolution.
Kubota et al. (1994) suggested that all CCT subunits should be present in all eukaryotes, and estimated a divergence time of two billion years for the different CCT paralogs on the assumption that the amino acid substitution rate of each CCT subunit family has been constant. The data presented here are consistent with the former prediction but indicate that the eight CCT paralogs have clearly not diverged in a clocklike manner. We observed striking differences in the degree of conservation of the individual CCT subunits, as well as paralog-specific, highly conserved sequence motifs (fig. 3 ). CCTθ appears to be the least conserved subunit and may be subject to reduced or different functional constraints. The results of recent biochemical studies (Liou and Willison 1997 ; Liou, McCormack, and Willison 1998 ) support this notion: compared with the other CCTs, CCTθ showed unique subunit-subunit binding properties in vitro, and CCTθ mRNA was present at a much lower level than the mRNAs of the other CCT genes (Liou and Willison 1997 ). From this perspective, and in light of our phylogenetic analyses, amino acid identity comparisons that suggest the eight CCT subunits are approximately equally related to each other (Kubota et al. 1994 ; Kubota, Hynes, and Willison 1995a ) are misleading.
Why are there so many CCT paralogs? It has been suggested that the multiple gene duplications in the CCT gene family were concurrent with (and facilitated) the evolution of the eukaryotic cytoskeleton (Willison and Kubota 1994 ; Willison and Horwich 1996 ). Unlike GroEL, which appears to service a broad range of substrates in the bacterial cytoplasm (Houry et al. 1999 ), CCT is thought to be more “specialized.” Actins and tubulins, the major cytoskeletal proteins of eukaryotic cells, appear to be the predominant substrates of CCT (Willison and Kubota 1994 ; Kubota, Hynes, and Willison 1995a ), although others have been, and continue to be, identified (Farr et al. 1997 ; Melki et al. 1997 ; Won et al. 1998 ; Feldman et al. 1999 ). Llorca et al. (1999a) have recently provided strong evidence for interactions between α-actin and the apical domains of specific CCT subunits (δ-ϵ or δ-β) within the central chamber of CCT (curiously, our data suggest CCTδ and CCTϵ to be among the most recent CCT duplicates). Such observations suggest coevolution of CCT and its substrates. We recently presented a more neutral model for the evolution of duplicate subunits in archaeal (and, by extension, early eukaryotic) chaperonins (Archibald, Logsdon, and Doolittle 1999 ), where coevolution between duplicate subunits could also lead to obligatory hetero-oligomerism. In archaea, hetero-oligomeric chaperonin complexes appear to have evolved multiple times independently (recurrent paralogy), a pattern that is inconsistent with a model of coevolution of chaperonin and substrate.
Our analyses of the positions of constant or variable amino acid sites for each CCT subunit family (see Results) revealed that many of the CCT subunits possess "signatures" that are invariant, or nearly so, with respect to the other CCT families (fig. 3 ). Kim, Willison, and Horwich (1994) , using an alignment that contained primarily mammalian and yeast CCT homologs, noted that conserved subunit-specific signatures often corresponded to regions of the protein involved in substrate binding. The method used here to identify differences in CCT subunit sequence evolution gave results consistent with this observation but also identified highly conserved subunit-specific motifs that, based on the archaeal thermosome and GroEL crystal structures (Ditzel et al. 1998 ), correspond to regions of intra- and inter-subunit contact. We also observed differences in the degree of conservation of the ATP-binding/hydrolysis motifs in CCTθ, as well as in CCTγ and CCTζ (fig. 3 ). Finally, genetic studies (Lin et al. 1997 ; Lin and Sherman 1997 ) have shown CCTζ (CCT-6 in yeast) to be sensitive to mutations not only in its apical domain but also in subunit-subunit contact regions. Curiously, CCTζ was remarkably tolerant of mutations in the ATP-binding/hydrolysis motifs; it is not clear why these motifs should be so highly conserved across all eukaryotic species examined so far.
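A subunit-specific "signature" can be made concrete with a small sketch. The code flags alignment columns that are invariant within one CCT family and whose residue never appears at that column in any other family. This is not the procedure used in the paper, which is described in Results. It applies a strict invariance criterion rather than "invariant, or nearly so" and ignores gapped columns. The family labels and sequences in the example are hypothetical toy data, not real CCT sequences.

```python
from typing import Dict, List


def invariant_columns(seqs: List[str]) -> Dict[int, str]:
    """Return {column: residue} for columns with a single, ungapped residue."""
    out = {}
    for i in range(len(seqs[0])):
        residues = {s[i] for s in seqs}
        if len(residues) == 1 and "-" not in residues:
            out[i] = residues.pop()
    return out


def family_signatures(alignment: Dict[str, List[str]]) -> Dict[str, Dict[int, str]]:
    """For each family, keep invariant columns whose residue is absent
    from that column in every sequence belonging to the other families."""
    sigs = {}
    for fam, seqs in alignment.items():
        # All aligned sequences from the other families.
        others = [s for f, ss in alignment.items() if f != fam for s in ss]
        sigs[fam] = {
            col: res
            for col, res in invariant_columns(seqs).items()
            if all(o[col] != res for o in others)
        }
    return sigs


# Hypothetical toy alignment (equal-length, already aligned sequences).
alignment = {
    "alpha": ["MKGDE", "MKGNE"],
    "beta": ["MRGDQ", "MRGDQ"],
    "theta": ["LKGDA", "IKGEA"],
}
sigs = family_signatures(alignment)
print(sigs)
```

A real analysis would relax the invariance threshold (for example, requiring at least 90% identity within a family) and would work from the structural context of each column. The strict version is kept here only to make the definition explicit.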
In this sense, it seems appropriate to view a particular subunit's function in terms of its contribution to the proper formation of the hetero-oligomeric CCT particle as well as to the binding of substrate(s). We argue that a fairly rigid and ordered arrangement of subunits in the hetero-oligomeric CCT would have had to precede (or be concurrent with) the evolution of subunit-specific roles for interactions with substrates and that a pattern of “recurrent paralogy” should be a necessary intermediate in the evolution of complete hetero-oligomerism. We base this argument on the fact that the specific functional roles of individual CCT subunits in protein folding described so far are context-dependent (i.e., they demand an ordered arrangement of subunits; Llorca et al. 1999a ). Interestingly, the evolutionary pattern of the group II chaperonins bears a strong resemblance to that of the proteasome, a barrel-shaped proteolytic complex found in archaea, in the eukaryotic cytosol, and in some bacteria. Archaea possess single α and β subunits (Baumeister et al. 1998 ), while eukaryotes possess seven α and seven β paralogs (subunits), one for each position in the seven-membered α and β rings (Groll et al. 1997 ). In both chaperonins and proteasomes, the evolution of single hetero-oligomeric particles, instead of multiple distinct homo-oligomeric ones, suggests that coevolution between duplicate subunits has been a significant factor in shaping their architectures.
It is clear that gene duplication and gene loss have been, and still are, prominent forces in archaeal and eukaryotic chaperonin evolution. A "recent" CCT gene duplication in mammals (CCTζ-1, CCTζ-2; Kubota et al. 1997 ), a probable Sulfolobus-specific paralogy in crenarchaeotes (Archibald, Logsdon, and Doolittle 1999 ), and the presence of multiple copies of CCT paralogs in Trichomonas (and, undoubtedly, many other eukaryotes) indicate that chaperonin gene duplication is an ongoing process. We recently presented phylogenetic evidence for "recent" gene loss in the euryarchaeal Pyrococcus species (Archibald, Logsdon, and Doolittle 1999 ). An even more striking case can be inferred for yeast CCTs. Genome sequence analyses in Saccharomyces suggest that a whole-genome duplication may have occurred after its divergence from Kluyveromyces (Wolfe and Shields 1997 ). Because CCT was already completely hetero-oligomeric at that time (i.e., yeast had at least eight CCT genes, so a genome doubling would have yielded at least sixteen), the presence of exactly eight CCT genes in the present-day yeast genome (Stoldt et al. 1996 ) indicates that multiple CCT duplicates have been lost.
The evolutionary forces influencing the retention of duplicate chaperonin genes are less obvious. What is becoming clear is that many of the complex paralogies unique to eukaryotic genomes (e.g., α- and δ-DNA polymerases [Edgell, Malik, and Doolittle 1998] , α- and β-tubulins [Keeling and Doolittle 1996] , and RNA polymerases I, II, and III [Stiller, Duffield, and Hall 1998] ) were present early in eukaryotic evolution. The CCT gene family examined here is the most extreme example thus far. We suggest that a tendency toward highly paralogous gene families (and more “complex” macromolecular machinery) in eukaryotes compared with prokaryotes may reflect fundamental differences in the ways in which prokaryotic and eukaryotic genomes evolve. Larger genomes with multiple linear chromosomes should reduce the probability of gene conversion between recent (unlinked) duplicates and, in general, more easily accommodate duplicate genes (offsetting the effects of random gene loss). Furthermore, chromosomal or whole-genome duplications provide a ready mechanism for doubling the number of paralogs present in a genome. Inherent differences in the mechanisms and frequency of gene/chromosome/genome duplication and gene conversion/loss could influence the retention of duplicate genes as much as the positive selection for new paralog-specific functions.
Geoffrey McFadden, Reviewing Editor
Present address: Department of Biology, Emory University.
Keywords: chaperonins, parabasalids, diplomonads, gene duplication, eukaryotic evolution, phylogeny
Address for correspondence and reprints: John M. Archibald, Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4H7. E-mail: jmarchib@is2.dal.ca.
We thank M. Müller for Trichomonas genomic DNA, A. Roger and D. Edgell for Giardia cells, M. Embley and R. Hirt for a Trichomonas cDNA clone encoding CCTη, N. Fast for a Trichomonas genomic library, A. Stoltzfus for mol2con.pl, and A. Roger and members of the Doolittle lab for helpful discussion and critical review of the manuscript. M. Leroux is also thanked for helpful discussions on chaperonin evolution. Preliminary sequence data from the Giardia lamblia Genome Project were obtained from the Josephine Bay Paul Center Web site at the Marine Biological Laboratory (www.bpc.mbl.edu). Sequencing was supported by the National Institute of Allergy and Infectious Diseases using equipment from LI-COR Biotechnology. This work was supported by a grant awarded to W.F.D. by the Medical Research Council (MRC) of Canada. J.M.A. was supported by an MRC studentship awarded to W.F.D., and by an MRC Doctoral Research Award. J.M.L. was supported by postdoctoral fellowships from MRC and NIH.
literature cited
Adachi, J., and M. Hasegawa.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman.
Archibald, J. M., J. M. Logsdon Jr., and W. F. Doolittle.
Baumeister, W., J. Walz, F. Zuhl, and E. Seemuller.
Braig, K., Z. Otwinowski, R. Hegde, D. C. Boisvert, A. Joachimiak, A. L. Horwich, and P. B. Sigler.
Clark, C. G., and A. J. Roger.
Ditzel, L., J. Löwe, D. Stock, K. O. Stetter, H. Huber, R. Huber, and S. Steinbacher.
Edgell, D. R., S. B. Malik, and W. F. Doolittle.
Farr, G. W., E. C. Scharl, R. J. Schumacher, S. Sondek, and A. L. Horwich.
Feldman, D. E., V. Thulasiraman, R. G. Ferreyra, and J. Frydman.
Felsenstein, J.
Frydman, J., E. Nimmesgern, H. Erdjument-Bromage, J. S. Wall, P. Tempst, and F. U. Hartl.
Gebauer, M., R. Melki, and U. Gehring.
Geissler, S., K. Siegers, and E. Schiebel.
Germot, A., H. Philippe, and H. Le Guyader.
Groll, M., L. Ditzel, J. Löwe, D. Stock, M. Bochtler, H. D. Bartunik, and R. Huber.
Gutsche, I., L. O. Essen, and W. Baumeister.
Hashimoto, T., Y. Nakamura, F. Nakamura, T. Shirakura, J. Adachi, N. Goto, K. Okamoto, and M. Hasegawa.
Hirt, R. P., B. Healy, C. R. Vossbrinck, E. U. Canning, and T. M. Embley.
Hirt, R. P., J. M. Logsdon Jr., B. Healy, M. W. Dorey, W. F. Doolittle, and T. M. Embley.
Horwich, A. L., and H. R. Saibil.
Houry, W. A., D. Frishman, C. Eckerskorn, F. Lottspeich, and F. U. Hartl.
Kamaishi, T., T. Hashimoto, Y. Nakamura, F. Nakamura, S. Murata, N. Okada, K. Okamoto, M. Shimizu, and M. Hasegawa.
Keeling, P. J., and W. F. Doolittle.
Kim, S., K. R. Willison, and A. L. Horwich.
Kishino, H., and M. Hasegawa.
Klumpp, M., and W. Baumeister.
Klumpp, M., W. Baumeister, and L. O. Essen.
Kubota, H., G. Hynes, A. Carne, A. Ashworth, and K. Willison.
Kubota, H., G. M. Hynes, S. M. Kerr, and K. R. Willison.
Kubota, H., G. Hynes, and K. Willison. 1995a. The chaperonin containing t-complex polypeptide 1 (TCP-1). Multisubunit machinery assisting in protein folding and assembly in the eukaryotic cytosol. Eur. J. Biochem. 230:3–16
———. 1995b. The eighth Cct gene, Cctq, encoding the theta subunit of the cytosolic chaperonin containing TCP-1. Gene 154:231–236
Leipe, D. D., J. H. Gunderson, T. A. Nerad, and M. L. Sogin.
Lewis, V. A., G. M. Hynes, D. Zheng, H. Saibil, and K. Willison.
Lin, P., T. S. Cardillo, L. M. Richard, G. B. Segel, and F. Sherman.
Lin, P., and F. Sherman.
Liou, A. K., E. A. McCormack, and K. R. Willison.
Liou, A. K., and K. R. Willison.
Llorca, O., E. A. McCormack, G. Hynes, J. Grantham, J. Cordell, J. L. Carrascosa, K. R. Willison, J. J. Fernandez, and J. M. Valpuesta. 1999a. Eukaryotic type II chaperonin CCT interacts with actin through specific subunits. Nature 402:693–696
Llorca, O., M. G. Smyth, J. L. Carrascosa, K. R. Willison, M. Radermacher, S. Steinbacher, and J. M. Valpuesta. 1999b. 3D reconstruction of the ATP-bound form of CCT reveals the asymmetric folding conformation of a type II chaperonin. Nat. Struct. Biol. 6:639–642
Marco, S., D. Urena, J. L. Carrascosa, T. Waldmann, J. Peters, R. Hegerl, G. Pfeifer, H. Sack-Kongehl, and W. Baumeister.
Melki, R., G. Batelier, S. Soulie, and R. C. Williams Jr.
Phipps, B. M., A. Hoffmann, K. O. Stetter, and W. Baumeister.
Phipps, B. M., D. Typke, R. Hegerl, S. Volker, A. Hoffmann, K. O. Stetter, and W. Baumeister.
Roger, A. J.
Roger, A. J., C. G. Clark, and W. F. Doolittle.
Roger, A. J., S. G. Svard, J. Tovar, C. G. Clark, M. W. Smith, F. D. Gillin, and M. L. Sogin.
Siegers, K., T. Waldmann, M. R. Leroux, K. Grein, A. Shevchenko, E. Schiebel, and F. U. Hartl.
Sigler, P. B., Z. Xu, H. S. Rye, S. G. Burston, W. A. Fenton, and A. L. Horwich.
Smith, M. W., S. B. Aley, M. Sogin, F. D. Gillin, and G. A. Evans.
Sogin, M. L., J. H. Gunderson, H. J. Elwood, R. A. Alonso, and D. A. Peattie.
Stiller, J. W., E. C. Duffield, and B. D. Hall.
Stoldt, V., F. Rademacher, V. Kehren, J. F. Ernst, D. A. Pearce, and F. Sherman.
Strimmer, K., and A. von Haeseler.
Swofford, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sinauer, Sunderland, Mass
Trent, J. D., E. Nimmesgern, J. S. Wall, F. U. Hartl, and A. L. Horwich.
Vainberg, I. E., S. A. Lewis, H. Rommelaere, C. Ampe, J. Vandekerckhove, H. L. Klein, and N. J. Cowan.
Waldmann, T., A. Lupas, J. Kellermann, J. Peters, and W. Baumeister.
Willison, K. R., and A. L. Horwich.
Willison, K. R., and H. Kubota.
Wolfe, K. H., and D. C. Shields.