Adaptive Evolution of the Histone Fold Domain in Centromeric Histones
     Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Wash.

    E-mail: steveh@fhcrc.org.

    Abstract

    Centromeric DNA, being highly repetitive, has been refractory to molecular analysis. However, centromeric structural proteins are encoded by single-copy genes, and these can be analyzed by using standard phylogenetic tools. The centromere-specific histone, CenH3, replaces histone H3 in centromeric nucleosomes, and is required for the proper distribution of chromosomes during cell division. Whereas histone H3s are nearly identical between species, CenH3s are divergent, with an N-terminal tail that is highly variable in length and sequence. Both the N-terminal tail and histone fold domain (HFD) are subject to adaptive evolution in Drosophila. Similarly, comparisons between Arabidopsis thaliana and Arabidopsis arenosa detected adaptive evolution, but only in the N-terminal tail. We have extended our evolutionary analyses of CenH3s to other members of the Brassicaceae, which allowed the detection of positive selection in both the N-terminal tail and in the HFD. We find that adaptively evolving sites in the HFD can potentially interact with DNA, including sites in the loop 1 region of the HFD that are required for centromeric targeting in Drosophila. Other adaptively evolving sites in the HFD can be localized on the structure of the nucleosome core particle, revealing an extended surface in addition to loop 1 in which conformational changes might alter histone–DNA contacts or water bridges. The identification of adaptively evolving sites provides a structural basis for the interaction between centromeric DNA and the protein that is thought to underlie the evolution of centromeres and the accumulation of pericentric heterochromatin.

    Key Words: centromere, histone, adaptive evolution, Arabidopsis, Brassicaceae

    Introduction

    Every eukaryotic chromosome has a centromere, the site of attachment for spindle microtubules that is required for faithful segregation of chromosomes at mitosis and meiosis. Although centromeres are remarkably conserved in position over evolutionary time, the DNA sequences comprising centromeres in plants and animals are rapidly evolving. Typically, centromeric sequences consist of tandemly repetitive satellite sequences, like the megabase-sized blocks of 171-bp alpha satellite repeats that span human centromeres and the blocks of 178-bp satellite repeats that span Arabidopsis centromeres. However, in some cases, there are no recognizable sequence motifs at centromeres. Many human neocentromeres that arise in ordinary chromosome arms lack alpha satellite repeats (Lo et al. 2001), and the 750-kb rice centromere 8 is nearly devoid of satellite repeats (Nagaki et al. 2004). The failure to identify commonalities between centromeric sequences has suggested that centromere identity and inheritance depend on unique features of centromeric chromatin.

    A consistent feature of centromeric chromatin is the presence of centromere-specific H3 histones (CenH3s) that replace H3 in the nucleosome. Even centromeres that lack satellite sequences, such as human neocentromeres and rice centromere 8, are packaged in CenH3-containing nucleosomes. This suggests that centromere identity is determined, at least in part, by CenH3s. In support of this view, CenH3s are absolutely required for the assembly of kinetochore components at mitosis (Howman et al. 2000; Moore and Roth 2001). Furthermore, the mammalian CenH3 survives the replacement of almost all histones with protamines during sperm maturation (Palmer, O'Day, and Margolis 1990), as if CenH3s help to maintain centromere identity at the same chromosomal position over long periods of evolutionary time.

    CenH3s are encoded by single-copy genes that are evolving rapidly, in striking contrast to the repeated genes encoding nearly invariant H3 histones that have been maintained by extraordinary purifying selection throughout eukaryotic evolution. Rapid evolution is especially evident in the CenH3 N-terminal tail, which is unalignable between CenH3s from distant taxa. Furthermore, CenH3s from closely related species of Drosophila are subject to episodes of adaptive evolution, detected as an excess of replacement changes compared to synonymous changes (Malik and Henikoff 2001). Adaptive evolution of Drosophila CenH3 has been hypothesized to be an adjustment to the rapidly evolving centromeric DNA, which presumably makes numerous contacts with CenH3 in centromeric nucleosomes. In support of this notion, the adaptively evolving N-terminal tails of Drosophila CenH3s sometimes contain minor groove DNA-binding motifs (Malik, Vermaak, and Henikoff 2002).

    In addition to the N-terminal tail, adaptive evolution has been detected in the loop 1 region of the histone fold domain (HFD) of Drosophila CenH3 (Malik and Henikoff 2001). Loop 1 of H3 is known to make multiple contacts with DNA, and its longer length in all known CenH3s suggests that there are additional contacts at centromeres. The small subdomain formed by H3 loop 1 and H4 loop 2 should make contact with DNA when nucleosomes begin to assemble (Luger et al. 1997), so this region is an attractive one for providing CenH3-DNA specificity within the HFD. Evidence that this is indeed the case came from the demonstration that loop 1 of Drosophila CenH3 is both necessary and sufficient for its localization to centromeres (Vermaak, Hayden, and Henikoff 2002). Furthermore, particular residues within loop 1 were found to be necessary for localization of CenH3 to Drosophila melanogaster centromeres. Therefore, loop 1 of CenH3 appears to be key to understanding the molecular basis of centromere identity and inheritance.

    The paradigm of adaptation to rapidly evolving centromeres is common to both animal and plant CenH3s, because adaptive evolution has also been detected for the N-terminal tail of the Arabidopsis CenH3, HTR12 (Talbert et al. 2002). However, this study did not detect adaptive evolution of loop 1, presumably owing to the strong purifying selection on the remainder of the HFD that would have obscured any signal of adaptation confined to the small region of loop 1. To overcome this limitation, we have extended the analysis of HTR12 by isolating homologs from several relatives within the Brassicaceae family. This has allowed us to apply both pairwise and multiple alignment–based methods for detecting and mapping positive selection. Indeed, residues within loop 1 and other residues of the HFD have evidently been subject to strong positive selection. Our results implicate early steps in assembly of CenH3-containing nucleosomes in specifying centromeres of both plants and animals.

    Materials and Methods

    Seeds

    Seeds were obtained from Luca Comai (University of Washington) for Olimarabidopsis pumila (Arabidopsis Biological Resource Center [ABRC] stock CS3701), from Tom Mitchell-Olds (Max Planck Institute of Chemical Ecology) for Arabis drummondii, from Charles Langley (University of California) for Arabidopsis lyrata (North Carolina), and from the Sendai Arabidopsis Seed Stock Center for Crucihimalaya himalaica, Capsella bursa-pastoris, Arabis hirsuta, and Cardamine flexuosa (JO18, JO22, JO23, and JO27 respectively).

    Cloning and Sequencing

    DNA was isolated from leaf tissue, essentially as described by Comai et al. (2000). High Fidelity Platinum Taq polymerase (Invitrogen, Carlsbad, Calif.) was used to amplify from genomic sequences, except for CODEHOP amplifications, where Amplitaq (Applied Biosystems, Foster City, Calif.) was used. Polymerase chain reaction (PCR) products were cloned using the pCR2.1-TOPO TA kit (Invitrogen). PCR primers were designed on the basis of Arabidopsis thaliana genome sequence and the Arabidopsis arenosa HTR12 sequence (Talbert et al. 2002). We used PCR primers to a putative upstream gene and HTR12 exon 8 (5'-TGAAAGATTGGCTTCTCAGGA-3' and 5'-TGCATGGATAGCACAGAGCA-3'), as well as primers to HTR12 exon 6 and a putative downstream gene (5'-TTCTTATTCCAGCTCCTAGC-3' and 5'-AGGCAACAATGGTTGGATTG-3') to amplify the HTR12 gene as two overlapping products. Not all HTR12 genes could be amplified with these specific primers, so a partially degenerate CODEHOP primer (Rose et al. 1998) was used in combination with a specific primer to amplify the HFD (5'-ACTGTTGCTCTGAGAGAAATTAGACAYTWYCARAA-3' and 5'-CCATGGTCTGCCTTTTCCTC-3'). Species-specific primers were designed for the Capsella bursa-pastoris HTR12 (5'-TCTGCAAACATTTTCCTCCA-3' and 5'-CCATGGTCTGCCTTTTCCTC-3').
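
    As a rough illustration of what "partially degenerate" means for the CODEHOP primer above, the sketch below expands the IUPAC ambiguity codes in the primer sequence and counts the distinct oligonucleotides in the pool. The function names (degeneracy, expand) are hypothetical helpers introduced here, not part of the CODEHOP software.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# CODEHOP primer sequence as given in the text (5' to 3').
CODEHOP_PRIMER = "ACTGTTGCTCTGAGAGAAATTAGACAYTWYCARAA"

pool = expand(CODEHOP_PRIMER)
print(len(CODEHOP_PRIMER), degeneracy(CODEHOP_PRIMER), len(pool))  # 35 16 16
```

    The four ambiguous positions (Y, W, Y, R) are each twofold, so the pool contains 16 sequences, all confined to the 3' end of the 35-nucleotide primer.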

    The genomic DNA samples were also used to amplify Chalcone synthase (Chs). Primers to Chs were designed to a well-conserved portion of the promoter and the end of the last exon (5'-CCGTCCATCAAACCTACCAC-3' and 5'-TAGAGAGGAACGCTGTGCAA-3'). Although multiple alleles were recovered by amplification of HTR12 in some cases, presumably owing to tetraploidy, only a single Chs allele was recovered from each species. The close relationships between our Chs amplicons and Chs sequences present in GenBank confirm the identification of the plants used in our study.

    To obtain the full-length Olimarabidopsis pumila and C. flexuosa HTR12, genomic libraries were constructed using the Lambda FIX II/XhoI Partial Fill-In Vector Kit (Stratagene, La Jolla, Calif.) and screened for HTR12. Hybridization probes consisted of the O. pumila CODEHOP PCR product and a 178-bp fragment amplified from C. flexuosa genomic DNA (5'-GCTCTGTGCTATCCATGCAA-3' and 5'-GCGTGCAAGCTCAAAGTCT-3'), corresponding to a well-conserved portion of the Brassicaceae HFD (exon 8 to exon 9). Lambda DNA containing a positive O. pumila insert was digested with HindIII, and an approximately 6-kb fragment containing HTR12 was subcloned into the HindIII site of the pCR2.1-TOPO vector (Invitrogen). For C. flexuosa, a positive clone was used for amplification of the HTR12 HFD.

    For all cloning methods, both strands of at least three clones for each allele were sequenced by using ABI Big Dye sequencing. Exon–intron structure was determined by alignment with A. thaliana and A. arenosa HTR12 cDNAs as well as splice site prediction by the NetGene2 server (Hebsgaard et al. 1996). GenBank accession numbers are AY612780–AY612796, AY623911, and AY623912.

    Phylogenetic Analyses

    Alignments of coding and amino acid sequence were performed with Clustal (Chenna et al. 2003), and unrooted phylogenetic trees were generated using the Neighbor-Joining method as applied by PAUP using default parameters (Saitou and Nei 1987; Swofford 2000). After removing gaps, the HTR12 HFD alignment consisted of 222 nucleotides, and the Chs alignment consisted of 1,119 nucleotides (supplementary material). For comparison purposes, both trees were rooted at the longest branch with the highest bootstrap support in the HTR12 phylogeny. Maximum parsimony generated trees with the same overall topology as the Neighbor-Joining trees.
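
    The gap-removal step can be sketched as follows, assuming that gaps are dropped codon by codon so that the reading frame is preserved. The function name and the toy alignment are hypothetical and are not HTR12 or Chs data.

```python
def strip_gapped_codons(aligned: dict[str, str]) -> dict[str, str]:
    """Drop every codon column in which any sequence contains a gap ('-')."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    (length,) = lengths
    if length % 3:
        raise ValueError("alignment length must be a multiple of 3")
    # Keep the start index of each codon column that is gap-free in all rows.
    keep = [
        i for i in range(0, length, 3)
        if all("-" not in seq[i:i + 3] for seq in aligned.values())
    ]
    return {
        name: "".join(seq[i:i + 3] for i in keep)
        for name, seq in aligned.items()
    }


# Hypothetical toy codon alignment for illustration only.
toy = {
    "sp1": "ATGGCT---AAA",
    "sp2": "ATGGCAGGTAAA",
    "sp3": "ATG---GGTAAG",
}
clean = strip_gapped_codons(toy)
print(clean)
```

    Under this convention, a gap-free alignment of 222 nucleotides corresponds to 74 complete codons.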

    K-estimator was used to calculate the rate of synonymous (Ks) and nonsynonymous (Ka) nucleotide substitutions in pairwise comparisons of HTR12 (Comeron 1999). Gaps were removed from the N-terminal tail coding sequence as indicated by amino acid alignments, leaving 74 to 80 codons for K-estimator analysis. The codeml program of PAML version 3.13 was used to estimate dN/dS (Ka/Ks) ratios and identify adaptively evolving sites (Yang 1997). Several site-specific models were tested: M0 (one ratio), M1 (neutral), M2 (selection), M3 (discrete), M7 (β), and M8 (β and ω). Log likelihoods of models (M1 vs. M2; M0 vs. M3; M7 vs. M8) were compared using likelihood ratio tests. Codeml applies Bayes rule to assign posterior probabilities of adaptive evolution to individual sites in the alignment. Posterior probabilities of the site classes for each site were compared among the models allowing selection. We obtained similar results when estimating parameters using a simple 3 × 4 nucleotide frequency table and using a full 61-codon table, with stronger significance using the 61-codon table.
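
    The Bayes rule step can be illustrated with a minimal sketch: given the estimated proportion of each site class and the likelihood of the data at one site under each class, the posterior probability of a class is its proportion times its likelihood, normalized over all classes. The function name and all numbers below are hypothetical; they are not codeml output.

```python
def site_class_posteriors(proportions, likelihoods):
    """Posterior probability of each site class for a single alignment site.

    proportions: estimated fraction of sites in each class (sums to 1).
    likelihoods: likelihood of the site's data under each class.
    """
    if len(proportions) != len(likelihoods):
        raise ValueError("one likelihood is needed per site class")
    joint = [p * lik for p, lik in zip(proportions, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical three-class example: conserved, neutral, positively selected.
proportions = [0.70, 0.20, 0.10]
likelihoods = [1e-6, 5e-6, 4e-5]

posteriors = site_class_posteriors(proportions, likelihoods)
print([round(p, 3) for p in posteriors])
```

    In this made-up case the site is assigned mostly to the third class even though that class holds only a tenth of all sites, because the data at the site are far more likely under it.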

    Results

    Several approaches were taken to clone HTR12 homologs. Alleles from A. lyrata, C. himalaica, and C. bursa-pastoris were amplified with two sets of primers so that two overlapping clones encompassing the entire locus were generated. This method was not successful on all DNA samples, so a consensus–degenerate hybrid primer (CODEHOP; Rose et al. 1998) was designed to the most 5' region of the HFD for use with a nondegenerate primer to the most 3' region of the HFD. This primer combination successfully amplified the HTR12 HFD from all DNA samples described herein. To isolate the full coding region of HTR12 from O. pumila for phylogenetic analysis, we also constructed and screened a genomic library from Sau3AI partially digested O. pumila DNA.

    Because N-terminal tail sequences were not obtained for all genera, we divided the data into two sets. One dataset contained complete HTR12 coding sequences from A. thaliana, A. lyrata, A. arenosa, O. pumila, C. himalaica, and C. bursa-pastoris. A second dataset consisted of the HFD coding sequences from A. thaliana and A. arenosa (Talbert et al. 2002), O. pumila, C. himalaica, C. bursa-pastoris, C. flexuosa, A. drummondii, and A. hirsuta. A phylogram of HFD coding sequences reveals a topology generally similar to that of Chalcone synthase (fig. 1), with the major difference being the placement of the clade comprising C. himalaica and A. drummondii.

    FIG. 1. Neighbor-Joining trees with bootstrap support indicated along the branches. (A) HTR12 HFD coding sequences. Numbers represent different alleles. (B) Chs coding sequences.

    Pairwise comparisons of synonymous and nonsynonymous substitutions were performed separately on the two datasets using K-estimator (Comeron 1999). K-estimator reports the rate of synonymous (Ks) and nonsynonymous (Ka) nucleotide substitutions in pairwise comparisons, allowing the calculation of the ratio ω = Ka/Ks, where ω < 1 indicates purifying selection and ω > 1 indicates positive selection. K-estimator provides confidence limits around Ka/Ks by repeatedly simulating evolution with random substitutions. For the N-terminal tail, four of 15 comparisons displayed significant positive selection (table 1), including the previously reported comparison between A. thaliana and A. arenosa (Talbert et al. 2002). The inability to rule out neutrality in 10 of the 15 pairwise comparisons could be a result of true neutrality or regions under purifying selection that prevent detection of small domains subject to positive selection. In support of this latter possibility, we note that one comparison displayed significant purifying selection. Furthermore, the Drosophila CenH3 contains three regions of strong sequence conservation (Malik, Vermaak, and Henikoff 2002), perhaps corresponding in function to the essential N-terminal tail domain identified in the budding yeast CenH3, Cse4p (Chen et al. 2000). Although it is not possible to align the N-terminal regions between the three kingdoms, the presence of scattered blocks of sequence conservation in the Brassicaceae alignment is suggestive of a similar alternation of conserved and adaptively evolving regions (fig. 2).
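
    The distinction between synonymous and replacement changes can be illustrated with a deliberately simplified sketch that classifies each differing codon by whether the encoded amino acid changes. This is not the K-estimator procedure: Ka and Ks are per-site rates that also account for the number of synonymous and nonsynonymous sites and for multiple substitutions, which raw counts ignore. The function name and the toy sequences are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (synonymous, replacement) counts of differing codons.

    Both sequences must be gap-free, in frame, and of equal length.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    synonymous = replacement = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1
        else:
            replacement += 1
    return synonymous, replacement


# Hypothetical toy sequences: GCT/GCA (Ala), AAA/AGA (Lys to Arg), CTG/CTA (Leu).
syn, rep = count_codon_differences("ATGGCTAAACTG", "ATGGCAAGACTA")
print(syn, rep)  # 2 1
```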

    Table 1 Pairwise estimations of Ka/Ks ratios and ω for the N-terminal tail of HTR12.

    FIG. 2. HTR12 N-terminal tail alignment used to generate pairwise comparisons of coding sequence. Identical residues are shaded.

    Unlike the N-terminal tail, which exits from the nucleosome, the HFD is tightly constrained within the histone octamer core that is wrapped by nearly two turns of DNA (Luger et al. 1997). This constraint means that all helices and most loops are unambiguously alignable for all H3 family members in eukaryotes, so we can confidently assign CenH3 positions to the H3 structure. We focused on loop 1, because of the evidence for positive selection in the Drosophila CenH3 loop 1 and for its centromere targeting activity. We delimited HTR12 loop 1 as the sequence encoding 15 residues that align to the region necessary and sufficient for targeting in Drosophila (boxed sequence, fig. 3). Of the 28 Brassicaceae comparisons for which pairwise estimations of synonymous and nonsynonymous substitutions could be made, five displayed significant positive selection in the loop 1 region (table 2). The inability to detect positive selection for the others could be due to structural constraints causing purifying selection, and, indeed, four of the comparisons displayed significant negative selection. Furthermore, six of the 15 residues assigned to loop 1 are invariant among the eight species. With only nine potentially adaptive positions, it seems remarkable that significant positive selection could be detected by K-estimator analysis. Detection of adaptive evolution in loop 1 of Drosophila CenH3 required acquisition of population data to identify fixed changes between species (Malik and Henikoff 2001), so the significant excess of replacement changes that we detect in some Brassicaceae comparisons suggests that adaptive evolution of HTR12 loop 1 is especially strong.
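
    The numbers of pairwise comparisons reported for the two datasets (15 for the N-terminal tail and 28 for loop 1) are what one would expect from all unordered species pairs, assuming one sequence per species. The short check below uses the species lists given for the two datasets; the variable names are introduced here for illustration.

```python
from itertools import combinations
from math import comb

# Species in the complete coding sequence dataset (N-terminal tail analysis).
full_length_species = [
    "A. thaliana", "A. lyrata", "A. arenosa",
    "O. pumila", "C. himalaica", "C. bursa-pastoris",
]

# Species in the HFD dataset (loop 1 analysis).
hfd_species = [
    "A. thaliana", "A. arenosa", "O. pumila", "C. himalaica",
    "C. bursa-pastoris", "C. flexuosa", "A. drummondii", "A. hirsuta",
]

tail_pairs = list(combinations(full_length_species, 2))
hfd_pairs = list(combinations(hfd_species, 2))

# n species give n * (n - 1) / 2 unordered pairs.
print(len(tail_pairs), len(hfd_pairs))  # 15 28
```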

    FIG. 3. Correspondence of posterior probabilities of positively selected residues with HFD alignment. Histogram of posterior probabilities > 0.5 is shown using a log scale. Numbering begins at position 15 of the HTR12 HFD. The boxed area corresponds to the loop 1 region necessary for targeting in Drosophila, and was used to delimit the pairwise comparisons of the loop 1 coding sequence in Brassicaceae. The Xenopus H3 used in the 1.9-Å nucleosome structure (Davey et al. 2002) is shown for comparison. A dot indicates the invariant arginine of loop 1. The domains of H3 are depicted linearly beneath the alignment. The A. hirsuta allele 1 was not included in the alignment because of a gap at position 28.

    Table 2 Pairwise Ka/Ks estimations and ω for loop 1 of HTR12.

    To confirm that loop 1 has been evolving adaptively in Brassicaceae, we applied the codeml program of PAML (Yang 1997) to alignments of the HFD. PAML calculates the likelihood of models for neutral and adaptive evolution based on a tree and assigns ω values to sites in the amino acid alignment. We compared the null model with two site classes (ω = 0 or 1; model 1) to a selection model that also included a class of sites with ω estimated from the data (model 2). The sites in the estimated class were found to have ω = 3.99 with high significance for the fit of model 2 (χ² = 10.83; P = 0.0044). Higher significance was found for the discrete model that estimated all three site classes from the data (model 3) and identified specific positively selected sites (ω > 1). Similar results were obtained by using either a DNA- or protein-based tree. We also compared the probability of a 10-site class model (selection model 8 versus null model 7) and found significant adaptive evolution for each of the iterations, but in no case were 10-site class models as probable as three-site class models.
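
    The reported significance for model 2 over model 1 can be checked with a short sketch of the likelihood ratio test. It assumes that the reported statistic of 10.83 is twice the log-likelihood difference and that it is compared with a chi-square distribution with 2 degrees of freedom (model 2 adds two free parameters to model 1); neither assumption is stated explicitly in the text, although the reported P value is consistent with both. The function names are hypothetical.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio test statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form for positive even df.

    For df = 2m, P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form covers positive even df only")
    if x < 0:
        raise ValueError("the test statistic must be non-negative")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Statistic reported in the text; df = 2 is an assumption (see lead-in).
p_value = chi2_sf_even_df(10.83, 2)
print(f"{p_value:.5f}")
```

    With these assumptions the P value is exp(-10.83/2), which agrees with the reported P = 0.0044 to the precision given.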

    PAML identified 12 sites with ω > 1. Nine were found to have posterior probabilities P > 0.99 (fig. 3). Six of the 12 sites are in loop 1, and five are scattered in the preceding helix 1 and loop 0. Because HTR12 loop 1 regions have two extra residues compared to H3, it is not possible to align loop 1 with the H3 structure. Nevertheless, the invariant arginine that holds loop 1 into the DNA minor groove provides a probable anchor for the region (fig. 4), with the position immediately upstream adaptively evolving. This position is one end of a stretch of nine residues, including the six that are adaptively evolving (P > 0.99) (fig. 3).

    FIG. 4. Location of the alignable adaptively evolving HTR12 positions (gray space-fill) on the H3-containing nucleosome structure. Only one H3–H4 dimer (white tube-worm) is shown for clarity. The position of the loop 1 invariant arginine is in black. Views down the DNA helix and perpendicular to the DNA helix are shown, with space-fill depiction of adaptively evolving HTR12 positions (posterior probability > 0.95).

    The loop 0 and helix 1 positions that show evidence of adaptive evolution can be confidently aligned with the H3 structure, because there are typically no gaps upstream of loop 1 in the HFD in H3s and CenH3s. Positions 15 and 17 are in contact with each other on the surface of the H3–H4 dimeric unit, close to the DNA gyres (fig. 4). These two residues are in proximity to position 18, which has a low posterior probability. Residue 24 is a proline in H3 that touches the phosphate backbone of DNA, and it is alanine, serine, threonine, or proline in Brassicaceae CenH3s. Position 28 of the HFD faces toward the DNA gyres, but is partially buried within the protein.

    Taken together, adaptively evolving HFD residues appear to be part of the H3 surface that contacts the DNA gyre over a region that includes a full turn of the DNA double helix. Because the N-terminal tail also contacts the minor groove, it appears that adaptively evolving residues are close to DNA along most of the H3-DNA contact region on either side of the dyad axis of the nucleosome.

    Discussion

    The highly repetitive satellite DNA sequences found at centromeres have made them almost completely intractable to modern tools of molecular biology. However, in a few exceptional cases, the relative lack of satellite DNA has made it possible to sequence through centromeres. In these cases, chromatin immunoprecipitation using anti-CenH3 antibodies, which bind to centromeres or neocentromeres in situ, has delimited centromeres to several hundred kilobase regions of DNA (Lo et al. 2001; Saffery et al. 2003; Nagaki et al. 2004). These are found to be rather ordinary genomic regions that even include active genes. Therefore, the location of CenH3 reliably identifies the centromere, whereas DNA sequence does not. This has raised the question of how CenH3-containing nucleosomes are assembled at the same regions of chromosomes throughout development and from one generation to the next, over millions of years of evolution.

    A potential clue for the basis for centromere recognition came from the discovery that the Drosophila CenH3, Cid, has been rapidly and adaptively evolving. Adaptive evolution was detected for the N-terminal tail and loop 1 of Cid, regions that can potentially contact the DNA. This finding, taken together with the demonstration that Cid loop 1 is necessary and sufficient for localization to D. melanogaster centromeres, suggested that contacts between Cid and DNA provide recognition preference. Our results support this hypothesis by showing that other HFD residues are also evolving adaptively. In addition, we detected individual residues in the HFD that are responsible for the adaptive signal. No similar conclusion could be reached for Drosophila Cid, because the most variable residues are sometimes missing in more distant Drosophila lineages, thus reducing the adaptive signal to below statistical significance (data not shown). In the case of Brassicaceae, only a single residue in the HFD was missing in one allele of A. hirsuta, allowing all species examined to contribute to the PAML analysis.

    The fact that many of the residues that we identified could be precisely located on the high resolution structure of H3 provides a structural interpretation of adaptive evolution. We identified six positions in addition to loop 1 that have evolved under positive selection in Brassicaceae. Whereas loop 1 is structurally somewhat undefined because it is two amino acids longer in Brassicaceae CenH3s than in H3, each of the other positions could be located precisely in the high resolution structure of the nucleosome. These positions extended the region that is known to be adaptively evolving from the first minor groove contact point within loop 1 along the surface of the H3–H4 dimer toward the dyad axis of the nucleosome (fig. 4). Structurally, this surface is in the vicinity of DNA over most of the central portion of the DNA within the nucleosome.

    It is likely that the adaptively evolving surface of the CenH3 nucleosome is encountered during the first step of nucleosome assembly. DNA wraps once around an (H3–H4)2 tetramer in solution, and archaeal nucleosomes also consist of a single wrap of DNA around a histone tetramer that is the structural and functional ancestor of the eukaryotic (H3–H4)2 (Malik and Henikoff 2003). In a second step, the DNA wraps around a second time, with addition of the H2A–H2B dimers. We expect that as a final step, the CenH3 N-terminal tail, exiting from between the gyres on either side of the dyad axis, interacts with linker DNA and stabilizes the assembled nucleosome. In this way, DNA interactions with adaptively evolving residues in the HFD and the base of the N-terminal tail would provide specificity during the first stage of nucleosome assembly, and the adaptively evolving residues farther out on the tail would provide stabilization at the end of the process.

    Although the adaptively evolving positions in the HFD are all close to the DNA gyre, most of the precisely located sites are not in direct contact, based on the H3 structure. It seems likely that these residues have their effect by altering the distribution of the structured water molecules, of which 121 have been placed on the 1.9-Å structure between histone residues and the DNA gyres (Davey et al. 2002). Water bridges are crucial for the DNA–protein interactions within the nucleosome and in some cases are known to confer sequence specificity (Schwabe 1997; Janin 1999). In this way, even position 28, which is buried, might have an effect by causing a bulge or depression that would restructure the water bridges responsible for protein–DNA interactions. Whereas water bridges are structured in H3 to minimize sequence specificity, in CenH3, they might provide a flexible means of influencing the rate or stability of nucleosome assembly. Even subtle changes on or near the surface of a CenH3 might lead to a better fit with a particular centromere sequence or nucleotide composition.

    Recurrent adaptive evolution implies an arms race, and the evidence that we have presented, together with earlier studies, argues that the conflict is between centromeric DNA and CenH3s. Such a conflict is difficult to envision for mitosis, where loss of a chromosome owing to centromeric incompatibility would kill the cell, or for male meiosis, where this would lead to sterility. However, female meiosis is inherently a destructive process, where only one of the four products survives to be included in the egg nucleus. We have proposed that centromeres compete at female meiosis by reorienting at meiosis I (Henikoff, Ahmad, and Malik 2001). When this meiotic drive process leads to deleterious effects, such as during male meiosis, there will be strong selection for suppressors. Evidence for the existence of meiotic drive comes from the observation that human Robertsonian translocations, which consist of acrocentric chromosomes fused at their centromeres, are preferentially inherited in female meiosis and cause partial male sterility but cause no somatic defects (Pardo-Manuel de Villena and Sapienza 2001; Daniel 2002). Deleterious effects of centromere meiotic drive will select for suppressors, and the many contacts that CenH3 makes with DNA provide opportunities for it to mutate in such a way that it will suppress the drive, thus becoming beneficial in the presence of a driving centromere. The HTR12 residues that we have identified as subject to adaptive evolution fit the suppression scenario, in that they can be envisioned as causing alterations in DNA-binding preference.

    Our results also imply that there is DNA–protein specificity, which leads to the expectation that expansions of centromeric tandem repeats result from adaptation for binding more CenH3. The preponderance of nucleosome-sized repeat units in both plants and animals might be a consequence of the expansion process followed by drive and suppression. Whereas the direct study of these megabase-sized repeats has been technically challenging, we find that the evolutionary analysis of one small protein provides extraordinarily rich structural information that can be used to better understand the origin and evolution of centromeric repeats.

    Acknowledgements

    We thank Harmit Malik for valuable advice; Luca Comai, Paul Talbert, and Danielle Vermaak for helpful discussions; Terri Bryson for technical support; and Jorja Henikoff for assistance with PAML analyses. We also thank Tom Mitchell-Olds, Charles Langley, Luca Comai, and the Sendai Arabidopsis Seed Stock Center for seeds. This work was supported by the Howard Hughes Medical Institute.

    Literature Cited

    Chen, Y., R. E. Baker, K. C. Keith, K. Harris, S. Stoler, and M. Fitzgerald-Hayes. 2000. The N terminus of the centromere H3-like protein Cse4p performs an essential function distinct from that of the histone fold domain. Mol. Cell. Biol. 20:7037-7048.

    Chenna, R., H. Sugawara, T. Koike, R. Lopez, T. J. Gibson, D. G. Higgins, and J. D. Thompson. 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31:3497-3500.

    Comai, L., A. P. Tyagi, K. Winter, R. Holmes-Davis, S. H. Reynolds, Y. Stevens, and B. Byers. 2000. Phenotypic instability and rapid gene silencing in newly formed Arabidopsis allotetraploids. Plant Cell 12:1551-1568.

    Comeron, J. M. 1999. K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15:763-764.

    Daniel, A. 2002. Distortion of female meiotic segregation and reduced male fertility in human Robertsonian translocations: consistent with the centromere model of co-evolving centromere DNA/centromeric histone (CENP-A). Am. J. Med. Genet. 111:450-452.

    Davey, C. A., D. F. Sargent, K. Luger, A. W. Maeder, and T. J. Richmond. 2002. Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J. Mol. Biol. 319:1097-1113.

    Hebsgaard, S. M., P. G. Korning, N. Tolstrup, J. Engelbrecht, P. Rouze, and S. Brunak. 1996. Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res. 24:3439-3452.

    Henikoff, S., K. Ahmad, and H. S. Malik. 2001. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293:1098-1102.

    Howman, E. V., K. J. Fowler, A. J. Newson, S. Redward, A. C. MacDonald, P. Kalitsis, and K. H. A. Choo. 2000. Early disruption of centromeric chromatin organization in centromere protein A (Cenpa) null mice. Proc. Natl. Acad. Sci. U S A. 97:1148-1153.

    Janin, J. 1999. Wet and dry interfaces: the role of solvent in protein-protein and protein-DNA recognition. Structure Fold Des. 7:R277-R279.

    Lo, A. W., J. M. Craig, R. Saffery, P. Kalitsis, D. V. Irvine, E. Earle, D. J. Magliano, and K. H. Choo. 2001. A 330 kb CENP-A binding domain and altered replication timing at a human neocentromere. EMBO J. 20:2087-2096.

    Luger, K., A. W. Mader, R. K. Richmond, D. F. Sargent, and T. J. Richmond. 1997. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389:251-260.

    Malik, H. S., and S. Henikoff. 2001. Adaptive evolution of Cid, a centromere-specific histone in Drosophila. Genetics 157:1293-1298.

    Malik, H. S., and S. Henikoff. 2003. Phylogenomics of the nucleosome. Nat. Struct. Biol. 10:882-891.

    Malik, H. S., D. Vermaak, and S. Henikoff. 2002. Recurrent evolution of DNA-binding motifs in the Drosophila centromeric histone. Proc. Natl. Acad. Sci. U S A. 99:1449-1454.

    Moore, L. L., and M. B. Roth. 2001. HCP-4, a CENP-C-like protein in Caenorhabditis elegans, is required for resolution of sister centromeres. J. Cell. Biol. 153:1199-1208.

    Nagaki, K., Z. Cheng, S. Ouyang, P. B. Talbert, M. Kim, K. M. Jones, S. Henikoff, C. R. Buell, and J. Jiang. 2004. Sequencing of a rice centromere uncovers active genes. Nat. Genet. 36:138-145.

    Palmer, D. K., K. O'Day, and R. L. Margolis. 1990. The centromere specific histone CENP-A is selectively retained in discrete foci in mammalian sperm nuclei. Chromosoma 100:32-36.

    Pardo-Manuel de Villena, F., and C. Sapienza. 2001. Transmission ratio distortion in offspring of heterozygous female carriers of Robertsonian translocations. Hum. Genet. 108:31-36.

    Rose, T. M., E. R. Schultz, J. G. Henikoff, S. Pietrokovski, C. M. McCallum, and S. Henikoff. 1998. Consensus degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res. 26:1628-1635.

    Saffery, R., H. Sumer, S. Hassan, L. H. Wong, J. M. Craig, K. Todokoro, M. Anderson, A. Stafford, and K. H. Choo. 2003. Transcription within a functional human centromere. Mol. Cell 12:509-516.

    Saitou, N., and M. Nei. 1987. The Neighbor-Joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

    Schwabe, J. W. 1997. The role of water in protein-DNA interactions. Curr. Opin. Struct. Biol. 7:126-134.

    Swofford, D. L. 2000. PAUP: Phylogenetic analysis using parsimony (and other methods). Sinauer, Sunderland, Mass.

    Talbert, P. B., R. Masuelli, A. P. Tyagi, L. Comai, and S. Henikoff. 2002. Centromeric localization and adaptive evolution of an Arabidopsis histone H3 variant. Plant Cell 14:1053-1066.

    Vermaak, D., H. S. Hayden, and S. Henikoff. 2002. A centromere targeting element within the histone fold domain of Cid. Mol. Cell. Biol. 22:7553-7561.

    Yang, Z. 1997. PAML, a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.(Jennifer L. Cooper and St)