《分子生物学进展》, 2005, Issue 4
Structure, Divergence, and Distribution of the CRR Centromeric Retrotransposon Family in Rice
    Department of Horticulture, University of Wisconsin-Madison; Research Institute for Bioresources, Okayama University, Kurashiki, Japan; Institute of Plant Molecular Biology, Ceske Budejovice, Czech Republic; Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China; and The Institute for Genomic Research, Rockville, Maryland

    Correspondence: E-mail: jjiang1@wisc.edu.

    Abstract

    The centromeric retrotransposon (CR) family in the grass species is one of few Ty3-gypsy groups of retroelements that preferentially transpose into highly specialized chromosomal domains. It has been demonstrated in both rice and maize that CRR (CR of rice) and CRM (CR of maize) elements are intermingled with centromeric satellite DNA and are highly concentrated within cytologically defined centromeres. We collected all of the CRR elements from rice chromosomes 1, 4, 8, and 10 that have been sequenced to high quality. Phylogenetic analysis revealed that the CRR elements are structurally diverged into four subfamilies, including two autonomous subfamilies (CRR1 and CRR2) and two nonautonomous subfamilies (noaCRR1 and noaCRR2). The CRR1/CRR2 elements contain all characteristic protein domains required for retrotransposition. In contrast, the noaCRR elements have different structures, containing only a gag or gag-pro domain or no open reading frames. The CRR and noaCRR elements share substantial sequence similarity in regions required for DNA replication and for recognition by integrase during retrotransposition. These data, coupled with the presence of young noaCRR elements in the rice genome and similar chromosomal distribution patterns between noaCRR1 and CRR1/CRR2 elements, suggest that the noaCRR elements were likely mobilized through the retrotransposition machinery from the autonomous CRR elements. Mechanisms of the targeting specificity of the CRR elements, as well as their role in centromere function, are discussed.

    Key Words: bacterial artificial chromosomes, centromeric retrotransposon, long terminal repeats, rice

    Introduction

    Retrotransposons are mobile genetic elements that transpose through reverse transcription of an RNA intermediate. Retrotransposons include two different classes, depending on the presence of the long terminal repeats (LTRs). LTR retrotransposons are further subclassified into the Ty1-copia and Ty3-gypsy groups based on the order of the coding regions within their pol genes (Xiong and Eickbush 1990). LTR retrotransposons account for a significant portion of most plant genomes and play an important role in genome divergence and evolution (Kumar and Bennetzen 1999; Feschotte, Jiang, and Wessler 2003). For example, LTR retrotransposons represent more than 50% of the maize genome, with a majority having transposed within the past 2 to 6 Myr (SanMiguel et al. 1996, 1998). Accumulation of LTR retrotransposons in the intergenic regions played a major role in the divergence between the maize and other cereal genomes (Chen et al. 1997).

    Retrotransposons demonstrate different distribution patterns in plant genomes. Some retrotransposons are dispersed throughout plant genomes (Heslop-Harrison et al. 1997; Mroczek and Dawe 2003), whereas others are highly enriched in distinct chromosomal domains (Jiang et al. 2002, Jiang et al. 2003; Mroczek and Dawe 2003). These distribution patterns are likely caused by different targeting specificities of the retrotransposons. Recent studies in Saccharomyces cerevisiae have shed light on the mechanisms of retrotransposon insertion specificity (Sandmeyer 2003; Bushman 2004). For example, the Ty5 retrotransposon in S. cerevisiae inserts preferentially into the heterochromatic regions. This targeting specificity is determined by interactions between the targeting domain at the C-terminus of the Ty5 integrase (IN) and the heterochromatin protein Sir4p (Zhu et al. 2003). Thus, the targeted integration of Ty5 is controlled by protein-protein interactions.

    One of the most interesting retrotransposon families in plants is the centromeric retrotransposon (CR) in the grass species. CR belongs to the Ty3-gypsy group and is highly specific to the centromeric regions of grass chromosomes (Jiang et al. 2003). The CR elements are found in both monocot and dicot species and represent a distinct clade in the Metaviridae family (Gorinsek, Gubensek, and Kordis 2004). Two repetitive DNA sequences specific to grass centromeres were isolated from sorghum (Jiang et al. 1996) and Brachypodium sylvaticum (Aragon-Alcaide et al. 1996), and these sequences were later found to be derived from different parts of the CR elements (Miller et al. 1998; Presting et al. 1998; Langdon et al. 2000). The rice and maize CR subfamilies were named as CRR (CR of rice) and CRM (CR of maize), respectively (Cheng et al. 2002; Zhong et al. 2002). Both CRR and CRM elements are highly intermingled with centromere-specific satellite repeats (Cheng et al. 2002; Jin et al. 2004). Chromatin immunoprecipitation (ChIP) analysis demonstrated that CRR and CRM elements are enriched in centromeric chromatin containing the centromere-specific histone H3 variant (CenH3), suggesting that the CRR and CRM elements may play a role in centromere function (Zhong et al. 2002; Nagaki et al. 2004).

    Rice chromosomes 1, 4, 8, and 10, including the centromeres of chromosomes 4 and 8, have been sequenced to high quality (Feng et al. 2002; Sasaki et al. 2002; Yu et al. 2003; Nagaki et al. 2004; Wu et al. 2004) (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml). We identified all of the CRR elements and their solo LTRs, which include only the LTR sequence and may be derived from illegitimate recombination within and between the CRR elements (Devos, Brown, and Bennetzen 2002), in these four chromosomes and further analyzed their structure, distribution, and divergence. Phylogenetic analysis revealed that the CRR family consists of four structurally diverged subfamilies, including two autonomous and two nonautonomous subfamilies. The autonomous and nonautonomous CRR elements show similar chromosomal distribution patterns and share substantial sequence similarities within regions required for DNA replication and integrase recognition. These results have provided new insights about the evolution and mechanism of centromeric targeting specificity of the CR retrotransposon family in grasses.

    Materials and Methods

    Sequence Analyses

    Sequenced rice bacterial artificial chromosomes (BACs) and P1 artificial chromosome clones (PACs) that are derived from chromosomes 1, 4, 8, and 10 and contain CRR-related sequence were identified by Blast search (http://www.ncbi.nlm.nih.gov/Blast/) using CRR sequences in AC092749 and AC022352 and in BAC 17P22 (Cheng et al. 2002; Nagaki et al. 2003) as queries. The BAC/PAC sequences were aligned using the MegAlign software (DNASTAR, Madison, Wis.) along with the known CRR sequences to extract the CRR elements. The extracted CRR sequences were deposited in GenBank (accession numbers AY827956 to AY828189). CRR sequences were analyzed using the Staden Package software (Staden 1996) and tools implemented at the Biology Workbench server (http://workbench.sdsc.edu/). The search for conserved protein domains was carried out with RPS-Blast (Marchler-Bauer et al. 2003) (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). Phylogeny of LTRs from the CRR elements was analyzed by the neighbor-joining method with ClustalX software (Saitou and Nei 1987; Thompson et al. 1997). CRR elements from different subfamilies were compared with each other using the MegAlign software. The ages of the CRR elements were estimated by sequence comparison between the two LTRs from individual CRR elements (Nagaki et al. 2003).

    Fluorescence in situ Hybridization

    Oryza sativa ssp. japonica cv. Nipponbare was used for cytological analyses. The fluorescence in situ hybridization (FISH) procedures on meiotic pachytene chromosomes have been described previously (Cheng et al. 2001). Primers specific to the LTRs of the four CRR subfamilies were designed: CRR1-a (AACCAGATCGCAAGCAACACTA), CRR1-b (TACATCCAAACAAAACCCAAAG), CRR2-a (CACTCGTGTTTTACTCAGGAA), CRR2-b (CAGGCAGACGGGCGGTTTAGC), noaCRR1-a (GCCACCTGCTACACTGCTGACT), noaCRR1-b (CCGACTACAACCATACGAGACG), noaCRR2-a (TCATAACTTCACACGCTCCAAT), and noaCRR2-b (TGCAATCGCTACACCACAAACG). DNA fragments corresponding to the LTRs of each subfamily were amplified from Nipponbare genomic DNA and labeled as FISH probes. Polymerase chain reaction (PCR) conditions were 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 1 min. Plasmid pRCS2 (Dong et al. 1998) was used as a probe to detect the rice centromere-specific satellite CentO.

    ChIP-PCR

    ChIP-PCR was conducted as described previously (Nagaki et al. 2004) using 1-week-old etiolated rice seedlings and purified anti-CenH3 antibody. Preimmune blood was used as a mock control in the ChIP experiments. DNA from the antibody-bound fraction and from the mock experiments was used as the template in PCR. PCR primers specific to each of the four CRR subfamilies were designed (table 1). Two sets of primers were designed from the 18S-25S ribosomal RNA genes (rDNA) (table 1) and were used as a negative control for ChIP-PCR. PCR conditions were 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 1 min. The PCR products were electrophoresed and blotted onto Hybond-N+ membranes (Amersham Biosciences, Piscataway, NJ). The same PCR products were used as probes for Southern hybridizations. The membranes were hybridized at 65°C overnight and then washed sequentially with 2× SSC with 0.1% SDS, 0.5× SSC with 0.1% SDS, and 0.1× SSC with 0.1% SDS. The signals were detected by phosphorimaging. Relative enrichment (RE) was calculated by comparing antibody-associated PCR product ratios with product ratios from mock experiments using the following formula: $\mathrm{RE} = (X/\mathrm{rDNA2})_{\mathrm{antibody}} / (X/\mathrm{rDNA2})_{\mathrm{mock}}$, where $X$ is the signal of the LTR product or of the rDNA1 control product. The probability (P) that the mock fractions and antibody fractions belong to the same group was analyzed by t-test.
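
    The RE calculation can be illustrated with a minimal Python sketch. It assumes that the formula normalizes the signal of the target product (an LTR or the rDNA1 control) to the rDNA2 signal within each fraction before the antibody and mock fractions are compared. The function names and the signal values are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def relative_enrichment(target_ab, ref_ab, target_mock, ref_mock):
    """RE = (target / rDNA2) in the antibody fraction divided by
    (target / rDNA2) in the mock fraction."""
    if min(target_ab, ref_ab, target_mock, ref_mock) < 0:
        raise ValueError("signal intensities must not be negative")
    if ref_ab == 0 or target_mock == 0 or ref_mock == 0:
        raise ValueError("reference and mock signals must be non-zero")
    return (target_ab / ref_ab) / (target_mock / ref_mock)


def mean_and_se(values):
    """Mean and standard error of the mean across replicate experiments."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical phosphorimager signals (arbitrary units), one tuple per replicate:
# (LTR antibody, rDNA2 antibody, LTR mock, rDNA2 mock)
replicates = [
    (400.0, 100.0, 100.0, 100.0),
    (300.0, 100.0, 100.0, 100.0),
    (500.0, 100.0, 100.0, 100.0),
]

re_values = [relative_enrichment(*r) for r in replicates]
re_mean, re_se = mean_and_se(re_values)
print(f"RE = {re_mean:.2f} +/- {re_se:.2f} (n = {len(re_values)})")
```

    An RE near 1 means that the target is no more abundant in the immunoprecipitated fraction than the rDNA reference, whereas values well above 1 indicate enrichment in CenH3-associated chromatin.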

    Table 1 Primers Used for ChIP-PCR Analysis

    Results

    Divergence of the CRR Elements

    The CR elements can be grouped into "autonomous" and "nonautonomous" subfamilies (Langdon et al. 2000). The autonomous CR elements are full-size elements. The nonautonomous CR elements carry an internal deletion that removes all enzymatic functions, leaving only the LTRs, a 5' untranslated region (UTR), and a fragment of the gag structural gene (Langdon et al. 2000) (fig. 1A). Phylogenetic studies revealed that the full-size CR elements in maize can be grouped into two distinct subfamilies (Nagaki et al. 2003). The LTRs and 5' UTRs of the two subfamilies are more diverged than the pol and gag regions (Nagaki et al. 2003).

    FIG. 1.— (A) The structures of autonomous (CRR1 and CRR2) and nonautonomous (noaCRR1 and noaCRR2) CRR elements. (B) Conserved DNA motifs among CRR elements from all four CRR subfamilies.

    We searched all of the sequenced rice BACs/PACs in GenBank using three published CRR sequences (Cheng et al. 2002; Nagaki et al. 2003) as queries. We were able to identify 72, 69, 60, and 53 putative CRR-containing BACs/PACs from chromosomes 1, 4, 8, and 10, respectively. These four chromosomes, including the centromeres of chromosomes 4 and 8, have been sequenced to high quality (Feng et al. 2002; Sasaki et al. 2002; Yu et al. 2003; Nagaki et al. 2004; Wu et al. 2004; Zhang et al. 2004) (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml). Putative CRR elements were individually analyzed by comparing with the previously reported CRR sequences using MegAlign software. A total of 32 autonomous elements, 86 nonautonomous elements, and 116 solo LTRs were identified from this analysis (accession numbers AY827956 to AY828189).

    Intact LTRs from the CRR elements were analyzed using ClustalX together with the previously reported CRR sequences and CR elements from maize, including CRM1 and CRM2 (Nagaki et al. 2003) and CentA (a nonautonomous CRM element) (Ananiev, Phillips, and Rines 1998). The CRR elements were clustered into four groups, with the branches having bootstrap values of more than 97 per 100 tests (fig. 2). Two of the four clusters include only LTRs from the autonomous CRR elements, and the other two clusters include only LTRs from the nonautonomous CRR elements (fig. 2). We named the two autonomous clusters CRR1 and CRR2, respectively, because of their sequence similarities to CRM1 and CRM2 in maize (Nagaki et al. 2003) (fig. 2). The two nonautonomous clusters were arbitrarily named noaCRR1 and noaCRR2, respectively.

    FIG. 2.— Phylogenetic analysis of the CR elements from rice and maize. Phylogenetic trees are constructed from the LTRs of the CR elements. Bootstrap values in 100 tests are indicated on the branches.

    Structure of the CRR Elements from Different Subfamilies

    Despite the sequence divergence among the four subfamilies, all CRR elements have a conserved primer-binding site (PBS) complementary to 12 bp at the 3' end of tRNAini from wheat (Sprinzl et al. 1999), which is located 2 bp downstream of the 3' end of the 5' LTR, as well as a polypurine tract (PPT) located immediately upstream of the 3' LTR (fig. 1B). The termini of the LTRs contain the inverted repeat motif TGATG/CATCA, which is strongly conserved among all the CRR elements. An additional common feature of the CRR elements is an A-rich stretch within the 5' UTR, although its length varies among the four CRR subfamilies.

    A search for conserved domains within the polyprotein using RPS-Blast (Marchler-Bauer et al. 2003) identified GAG, zinc-finger, protease (PRO), reverse-transcriptase (RT), and integrase (IN) domains (table 2). The ribonuclease H (RH) domain was identified based on the presence of the typical DEDD motif (Malik and Eickbush 2001). The CRR1/CRR2 elements contain all characteristic protein domains required for retrotransposition. The noaCRR1 elements contain only a partial GAG domain lacking at least part of the nucleocapsid domain defined by the zinc finger. The noaCRR2 elements show heterogeneous structures. Among the six noaCRR2 elements analyzed, two show a structure similar to that of the noaCRR1 elements and contain a partial GAG domain, and two have a complete GAG domain together with the PRO domain. The remaining two elements contain no coding regions (fig. 1A). Besides these domains, putative proteins in the nonautonomous elements contain relatively large downstream regions that have no similarity to the polyprotein of the autonomous elements. Similar to previously described CR elements (Langdon et al. 2000), the putative coding regions of the newly identified CRR elements from all four groups extend into the 3' LTR.

    Table 2 Putative Proteins Encoded by CRR Elements

    Sequence comparisons among representative elements from the four subfamilies showed that the CRR1 and CRR2 elements have a conserved pol region; yet the LTRs, 5' UTR, and gag regions were diverged between the two subfamilies (fig. 3). The noaCRR1 and noaCRR2 elements have partial homology with the autonomous elements within LTRs, 5' UTR, and gag regions (fig. 3). The LTR regions of noaCRR1 show the highest sequence similarity with the LTR regions of the autonomous CRR elements. In contrast, the gag-pro coding region of noaCRR2 has the highest sequence similarity with the corresponding region of the autonomous elements (fig. 3).

    FIG. 3.— Dotplot analysis of the four CRR subfamilies using 60% match and a window size of 30. The sequence similarity of representative elements from the four subfamilies were compared with each other. The specific domains of each element are drawn on the top or left sides of the plots.
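
    The dotplot parameters in the caption (a window size of 30 and a 60% match threshold) can be illustrated with a minimal Python sketch of a windowed dotplot. The exact algorithm used by the MegAlign software is not described here, so this is a generic illustration under that assumption; the function name and the toy sequences are hypothetical.

```python
def dotplot(seq_a, seq_b, window=30, min_identity=0.60):
    """Return the set of (i, j) window start positions at which the window
    of seq_a starting at i and the window of seq_b starting at j are
    identical at a fraction of at least `min_identity` of their sites."""
    a, b = seq_a.upper(), seq_b.upper()
    hits = set()
    for i in range(len(a) - window + 1):
        win_a = a[i:i + window]
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(win_a, b[j:j + window]))
            if matches / window >= min_identity:
                hits.add((i, j))
    return hits


# Toy example with a short window so the result is easy to inspect.
toy = "ACGTACGTAC"
toy_hits = dotplot(toy, toy, window=4, min_identity=1.0)
print(sorted(toy_hits))
```

    Plotting the returned coordinates gives diagonal lines wherever the two elements share similar regions, so a conserved pol region appears as a continuous diagonal, whereas diverged LTRs leave gaps. This brute-force version compares every pair of windows and is only practical for sequences of a few kilobases.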

    Distribution of the CRR Elements in Rice Chromosomes 1, 4, 8, and 10

    We plotted the CRR sequences on the genetic maps of rice chromosomes 1, 4, 8, and 10 (fig. 4). Each linkage map is divided by 10-cM units, with one of the units spanning the genetically mapped centromeres (Harushima et al. 1998). The majority of the full-size CRR1, CRR2, and noaCRR1 elements, along with their solo LTRs, are located in the centromeric regions (fig. 4). In contrast, we did not find noaCRR2 elements in two of the four centromeres analyzed.

    FIG. 4.— Distribution of the CRR elements on the genetic maps of rice chromosomes 1, 4, 8, and 10. The CRR elements are plotted on the genetic map using 10 cM per unit. The genetically mapped centromeric regions (Harushima et al. 1998) are located within one of the 10-cM units that is shown in a shadowed box. Open bars represent the numbers of the CRR elements located within the genetically mapped centromeric regions.

    DNA fragments amplified by PCR with primers specific to the LTRs of each of the four subfamilies were used as probes for FISH analysis on rice pachytene chromosomes. In general, the FISH signals derived from the LTR probes of CRR1, CRR2, and noaCRR1 have a similar pattern (fig. 5). However, the signals from the CRR1 and CRR2 probes are more concentrated in the centromeric and/or pericentromeric regions than are those from the noaCRR1 probe. Unambiguous signals outside of the pericentromeric regions were more frequently observed with the noaCRR1 probe than with the CRR1 and CRR2 probes (fig. 5). These results are consistent with the distribution patterns generated from sequence plotting (fig. 4). The noaCRR2 probe generated only weak signals that were inconclusive regarding centromere specificity (data not shown).

    FIG. 5.— FISH mapping on rice pachytene chromosomes using LTR sequences derived from CRR2 (A–C) and noaCRR1 (D–F). (A) and (D): FISH signals derived from CentO probe pRCS2; (B) and (E): FISH signals derived from the LTR sequences of CRR2 and noaCRR1, respectively; (C) and (F): Merged images. Arrowheads in (E) and (F) point to some of the noncentromeric FISH signals derived from the noaCRR1 LTR probe. Chromosomes are stained by 4',6-diamidino-2-phenylindole in blue. Bars represent 10 μm.

    Plant centromeres contain long tracts of satellite repeats, which hinder cloning and sequencing efforts (Henikoff 2002). Thus far, only the centromeres of rice chromosomes 4 and 8 have been sequenced, owing to the limited amount of the centromeric satellite repeat in these two chromosomes (Cheng et al. 2002; Nagaki et al. 2004; Wu et al. 2004; Zhang et al. 2004). Quantitative FISH analysis estimated that the centromeres of rice chromosomes 1 and 10 contain approximately 1,400 and 500 kb of the CentO repeat, respectively (Cheng et al. 2002). However, only 70 kb and 2 kb of CentO sequences are reported in the most recent sequences of chromosomes 1 and 10 (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml). Thus, the collections of CRR elements from chromosomes 1 and 10 are most likely incomplete, as the CRR elements frequently insert into the CentO arrays (Cheng et al. 2002). This is reflected by the fact that chromosomes 1 and 10 have fewer CRR elements than do chromosomes 4 and 8 in our collection (fig. 4).

    Age of the CRR Elements

    LTR nucleotide identity was used to estimate the ages of the CRR elements using a reported nucleotide substitution rate of 6.5 × 10⁻⁹ substitutions per site per year (Gaut et al. 1996). Average age, standard deviation, youngest age, and oldest age of the CRR elements from each of the four subfamilies are listed in table 3. The average ages of the two autonomous subfamilies, CRR1 and CRR2, are approximately 0.44 and 0.87 Myr, respectively. In contrast, the average ages of the noaCRR1 and noaCRR2 elements are 2.13 and 3.91 Myr, respectively, significantly older than those of the autonomous elements. We did not find a correlation between the age and the chromosomal locations of the CRR elements (data not shown).
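
    The age estimate can be illustrated with a minimal Python sketch. The two LTRs of an element are identical at the time of insertion, so their divergence K accumulates along two copies and the insertion time is commonly approximated as T = K / (2r). The text does not state the exact relation or distance correction that was used, so both this relation and the Jukes-Cantor correction below are assumptions; the function names and the example sequences are hypothetical.

```python
import math

# Substitution rate quoted in the text (Gaut et al. 1996), per site per year.
SUBSTITUTION_RATE = 6.5e-9


def ltr_divergence(ltr5, ltr3):
    """Proportion of mismatched sites between two pre-aligned LTRs,
    ignoring alignment columns that contain a gap ('-')."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(ltr5.upper(), ltr3.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance K for an observed mismatch proportion p
    (assumed correction, not stated in the text)."""
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor model")
    return -0.75 * math.log1p(-4.0 * p / 3.0)


def insertion_age_years(ltr5, ltr3, rate=SUBSTITUTION_RATE):
    """Estimated insertion time in years, T = K / (2 * rate)."""
    return jukes_cantor(ltr_divergence(ltr5, ltr3)) / (2.0 * rate)


# Hypothetical 100-bp LTR pair differing at a single site (99% identity).
example_5 = "A" * 100
example_3 = "A" * 99 + "G"
example_age = insertion_age_years(example_5, example_3)
print(f"estimated age: {example_age / 1e6:.2f} Myr")
```

    Under these assumptions, an average age of about 0.44 Myr corresponds to LTR pairs that differ at roughly 0.6% of their sites, which illustrates how little divergence separates the two LTRs of the youngest elements.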

    Table 3 The Age of CRR Elements

    Association of CRR Elements with CenH3-Associated Chromatin

    Rice chromosome 8, including the centromere, has been sequenced (Nagaki et al. 2004; Wu et al. 2004) (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml). Chromosome 8 includes a total of 61 CRR1, CRR2, and noaCRR1 elements and solo LTRs derived from these three subfamilies. Among these CRR elements, 45 are located in the genetically defined centromere (fig. 4), and 21 are located within the approximately 750-kb CenH3-associated chromatin domain (Nagaki et al. 2004). The only noaCRR2 element associated with chromosome 8 is not located in the centromere (fig. 4).

    To test whether all four CRR subfamilies are associated with CenH3-associated chromatin, we conducted ChIP-PCR analysis using the rice anti-CenH3 antibody (Nagaki et al. 2004) and PCR primers designed from the LTR regions (table 1). The rDNA, which is located in the subtelomeric regions of rice chromosomes (Fukui, Ohmido, and Khush 1994), was used as a control in the ChIP-PCR analysis. A set of PCR primers was also designed from the LTR of RIRE3, which is one of the most dominant Ty3-gypsy retrotransposons in the rice genome and was found in the centromeric regions of rice chromosomes 5 (Nonomura and Kurata 2001) and 8 (Nagaki et al. 2004).

    The relative enrichment (RE) of the CRR1-LTR was 3.8 (standard error [SE] = ±0.6, n = 5) on average, which is significantly higher (P < 0.003) than the RE of the rDNA control (0.9, SE = ±0.0, n = 5) (fig. 6). REs of CRR2-LTR and noaCRR1-LTR were 3.9 (SE = ±0.4, n = 5) and 7.3 (SE = ±0.7, n = 5), respectively, both significantly increased in the immunoprecipitated fraction (CRR2-LTR: P < 0.0002; noaCRR1-LTR: P < 0.00003). The RIRE3-LTR was slightly increased (RE = 1.4, SE = ±0.1, n = 5) compared with the rDNA control (P < 0.004). However, the noaCRR2-LTR was not significantly increased (RE = 0.9, SE = ±0.1, n = 5, P < 0.31) (fig. 6). These results confirmed that at least a portion of the CRR1, CRR2, and noaCRR1 elements are located within CenH3-associated chromatin. However, we failed to demonstrate the potential association of the noaCRR2 elements with CenH3-associated chromatin.

    FIG. 6.— ChIP-PCR analysis using the rice anti-CenH3 antibody. The relative enrichment (RE) of different CRR elements as well as RIRE3 elements in the CenH3-associated chromatin is compared with the RE of the rDNA control. Mean (n = 5) RE levels are shown as histogram bars with standard error. P-values calculated based on Student's t-test are shown as percentages.

    Discussion

    Evolution of the CRR Elements

    Several types of nonautonomous LTR retrotransposons have been reported recently, including LARDs (large retrotransposon derivatives) and TRIMs (terminal-repeat retrotransposons in miniature) (Witte et al. 2001; Jiang et al. 2002; Havecker, Gao, and Voytas 2004; Kalendar et al. 2004). Both LARDs and TRIMs lack open reading frames between the two LTRs but retain a primer-binding site and a polypurine tract (Havecker, Gao, and Voytas 2004). Therefore, mobilization of LARDs and TRIMs relies on the retrotransposition machinery provided by the autonomous elements. Nevertheless, for most of the nonautonomous elements, it is not clear how they are mobilized and what machinery is utilized for their retrotransposition. Recently, a relationship between a nonautonomous retrotransposon (Dasheng) and an autonomous element (RIRE2) has been demonstrated in rice (Jiang, Jordan, and Wessler 2002). Dasheng and RIRE2 share significant sequence similarity within LTRs, PBS, and PPT and have similar chromosomal distribution patterns. In addition, the presence of chimeric RIRE2-Dasheng elements suggests possible copackaging of RNAs from both elements in the same viruslike particle (VLP) (Jiang, Jordan, and Wessler 2002).

    The discovery of autonomous and nonautonomous CRR elements provides another example of the evolution of nonautonomous retrotransposon families. The full-size CRR elements encode a polyprotein with all characteristic domains. Although some of the CRR1/CRR2 elements seem to be slightly mutated within the coding region, others seem to have the region intact, implying that they could still be capable of autonomous transposition, which is in agreement with the young age estimated for these elements (table 3). The noaCRR elements have different structures (fig. 1A). Most noaCRR elements contain the gag or a gag-pro gene, unlike LARDs and TRIMs, which contain no open reading frames. The CRR and noaCRR elements share substantial sequence similarity in the LTRs and have fully conserved PBS and PPT regions (fig. 1B). There is also a strongly conserved heptanucleotide inverted repeat at the termini of LTRs (fig. 1B). The inverted repeats differ in sequence and length among different retroelements and are important for recognition by integrase (Hindmarsh and Leis 1999). Thus, the conservation of these sites suggests that autonomous and nonautonomous elements use the same or very similar enzyme machinery. The presence of young noaCRR elements in the rice genome (table 3), coupled with the similar chromosomal distribution of noaCRR1 and CRR1/CRR2 elements, further suggests that the noaCRR elements are likely mobilized through the retrotransposition machinery from CRR elements, a scenario similar to that of Dasheng and RIRE2.

    It is interesting to note that most noaCRR elements contain the gag or gag-pro genes, a feature different from LARDs and TRIMs. As reverse transcription takes place only in the VLPs, nonautonomous elements must have a mechanism that allows their RNA to be packaged during the assembly of VLPs. This can be achieved by the presence of encapsidation signals that should be conserved among autonomous and nonautonomous elements. Several candidate regions could be identified within the LTR, 5' UTR, and gag (data not shown). However, it remains unclear which of them, if any, serves as an actual encapsidation signal. An alternative scenario can be envisioned for some of the noaCRR2 elements. These elements encode a protein with the GAG and PRO domains, which alone should be capable of RNA packaging, as was demonstrated in the case of retroviruses (Swanstrom and Wills 1997). The remaining enzymes supplied by autonomous elements could be assembled into VLPs by virtue of protein-protein interactions between the GAG-PRO and GAG-PRO-POL polyproteins. Such interactions are well documented in retroviruses and are very important in the process of virion assembly (Swanstrom and Wills 1997; Freed 1998). This scenario cannot be readily applied to noaCRR1 elements, as they lack a portion of the nucleocapsid domain responsible for RNA binding. However, even the noaCRR1 proteins are likely to play some role during the assembly process, as the corresponding coding region appears to have evolved under selective constraints. Alternatively, the discovery of several noaCRR2 elements lacking the whole coding region suggests that all necessary enzymes could be supplied in trans, as with the Dasheng and RIRE2 elements (Jiang, Jordan, and Wessler 2002).

    Targeting Specificity of the CRR Elements

    The CRR elements are highly concentrated in the centromeric and pericentromeric regions (fig. 5). In the centromere of rice chromosome 8, the CRR elements are highly enriched within the chromatin domain containing CenH3 (Nagaki et al. 2004). In maize, CRM elements are highly intermingled with a centromeric satellite repeat CentC, suggesting that CRM transposed preferentially into CentC satellite arrays or into other CRM elements. Maize CenH3 is almost exclusively associated with intermingled CRM/CentC sequences (Jin et al. 2004). These results suggest that CR elements in both rice and maize transposed preferentially into CenH3-associated chromatin domains.

    In yeast, the Ty3 element integrates only in DNA encoding the 5' end of genes transcribed by RNA polymerase III. The mechanism of Ty3 integration appears to involve the interaction between the integration complex and the TFIIIB component of the PolIII transcription apparatus (Kirchner, Connolly, and Sandmeyer 1995). The targeting of the Ty5 element into the heterochromatin domains is determined by interactions between the targeting domain of the integrase and the heterochromatin protein Sir4p (Zhu et al. 2003). The preferential integration of CRR elements within and near the CenH3-associated DNA domain suggests that the targeting mechanism of CRR elements may involve an interaction with centromeric proteins. CenH3, a histone H3 variant, would be a good candidate because it is a constitutive component of the centromeric chromatin. The Tf1 element of Schizosaccharomyces pombe preferentially inserts in intergenic regions (Behrens, Hayles, and Nurse 2000; Singleton and Levin 2002). It has recently been proposed that Tf1 integration may be controlled by an interaction of the chromodomain located at the C terminus of the integrase with histone H3 methylated at lysine 4 (Sandmeyer 2003). The N terminus of CenH3 is significantly diverged from the N terminus of histone H3 (Henikoff, Ahmad, and Malik 2001), which would provide the specificity for recognition by the CRR elements. Gorinsek, Gubensek, and Kordis (2004) recently reported that the integrase sequences of the CR family differ clearly from those of other plant LTR retrotransposons. They differ in the otherwise conserved sequence motifs in the C-terminal region of the integrase, such as in the HPVFHS motif and in two motifs of the chromodomain. It will be of great interest to test whether the chromodomain of the CRR integrase interacts with CenH3 in rice.

    Interestingly, the nonautonomous CRR elements are less specific to the centromeric regions than are the autonomous CRR elements (figs. 4 and 5). Furthermore, the LTRs of the noaCRR1 elements share more sequence similarity with the LTRs of autonomous elements than do the LTRs of the noaCRR2 elements (fig. 3). In parallel, noaCRR1 elements appear to target the centromeres more frequently than the noaCRR2 elements (fig. 4), especially considering the fact that we were not able to reveal an association of the noaCRR2 elements with CenH3-associated chromatin (fig. 6). These results support the hypothesis that the LTR sequences may play a role in the centromere specificity of the CR family (Nagaki et al. 2003). Recognition of centromeric chromatin during noaCRR1 retrotransposition may be error prone, resulting in a lower centromeric specificity of the noaCRR1 elements compared with the CRR1/CRR2 elements. Alternatively, but less likely, noaCRR1 elements may transpose using the retrotransposition machinery from other retrotransposon families, which would result in the loss of the centromeric specificity.

    CRR Elements and Grass Centromere Function

    It has been well documented that retrotransposition within or near genes can generate mutations or alter gene expression (Kumar and Bennetzen 1999; Hirochika 2001). However, few retrotransposons have been associated with specific structural and/or functional roles. One example is the telomeres of Drosophila chromosomes, which consist of long tandem arrays of two non-LTR retrotransposons, HeT-A and TART. These telomeric retrotransposons have a functional role in preventing the shortening of the chromosome ends (Pardue and DeBaryshe 2003). A role for the CR elements in centromere function was proposed mostly because of their centromere specificity (Miller et al. 1998; Presting et al. 1998). In maize, the core of the centromeres consists primarily of intermingled CRM/CentC sequences (Jin et al. 2004). Maize CenH3 is associated exclusively with such intermingled CentC/CRM sequences (Zhong et al. 2002; Jin et al. 2004). Association of CR elements with CenH3 has also been demonstrated in rice (Nagaki et al. 2004) (fig. 6). These recent results strongly suggest a structural and/or functional role of the CR elements in grass centromere function.

    Jiang et al. (2003) recently proposed that deposition of CenH3 in centromeres may be a transcription-mediated event. Incorporation of CenH3 into centromeric chromatin is independent of DNA replication (Shelby, Monier, and Sullivan 2000; Ahmad and Henikoff 2001; Sullivan and Karpen 2001). DNA transcription can displace histone molecules, which may provide an opportunity for CenH3 deposition/replacement (Jiang et al. 2003). DNA transcription in CenH3-associated chromatin has been reported in a human neocentromere (Saffery et al. 2003) and in the centromere of rice chromosome 8 (Nagaki et al. 2004). Nakano et al. (2003) recently showed that centromeric function can be activated at ectopically integrated alpha satellite sites on human chromosomes by treatment with histone deacetylase inhibitors, which also increases the acetylation level of histone H3 and the transcription level of a marker gene within the ectopic centromeres. This result supports the hypothesized relationship between DNA transcription and centromere assembly.

    LTRs usually diverge faster than the other parts of retrotransposons. Even closely related retrotransposon families often have LTRs with no detectable sequence similarity. In contrast, the CR elements from different grass species share substantial homology in their LTR sequences. Highly conserved DNA motifs were found in the LTRs of both autonomous and nonautonomous CR elements from rice, maize, and barley (Nagaki et al. 2003), species that diverged more than 55 Myr ago (Kellogg 2001). The conservation of the LTRs of CR elements from distantly related grass species suggests selective pressure at the nucleotide level. Because the transcriptional regulatory sequences reside in the LTRs, selection on the LTRs of CR elements has probably acted on their capacity to initiate transcription. Transcription of CR elements and/or the flanking centromeric satellite may be an important component of centromeric chromatin assembly in the grass species (Jiang et al. 2003).

    Acknowledgements

    We thank Drs. Dan Voytas and Ning Jiang for their valuable comments on the manuscript. This research was supported by grants DE-FG02-01ER15266 and DE-FG02-01ER15265 from the U.S. Department of Energy to J.J. and C.R.B., respectively. Z.C. is supported by grant 2002AA225011 from the Chinese State High-Tech Program and grants 30100099 and 30325008 from the National Natural Science Foundation of China.

    References

    Ahmad, K., and S. Henikoff. 2001. Centromeres are specialized replication domains in heterochromatin. J. Cell Biol. 153:101–110.

    Ananiev, E. V., R. L. Phillips, and H. W. Rines. 1998. Chromosome-specific molecular organization of maize (Zea mays L.) centromeric regions. Proc. Natl. Acad. Sci. USA 95:13073–13078.

    Aragon-Alcaide, L., T. Miller, T. Schwarzacher, S. Reader, and G. Moore. 1996. A cereal centromeric sequence. Chromosoma 105:261–268.

    Behrens, R., J. Hayles, and P. Nurse. 2000. Fission yeast retrotransposon Tf1 integration is targeted to 5' ends of open reading frames. Nucleic Acids Res. 28:4709–4716.

    Bushman, F. D. 2004. Targeting survival: integration site selection by retroviruses and LTR-retrotransposons. Cell 115:135–138.

    Chen, M., P. SanMiguel, A. C. de Oliveira, S. S. Woo, H. Zhang, R. A. Wing, and J. L. Bennetzen. 1997. Microcolinearity in sh2-homologous regions of the maize, rice, and sorghum genomes. Proc. Natl. Acad. Sci. USA 94:3431–3435.

    Cheng, Z. K., F. Dong, T. Langdon, S. Ouyang, C. R. Buell, M. H. Gu, F. R. Blattner, and J. Jiang. 2002. Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14:1691–1704.

    Cheng, Z., R. M. Stupar, M. Gu, and J. Jiang. 2001. A tandemly repeated DNA sequence is associated with both knob-like heterochromatin and a highly decondensed structure in the meiotic pachytene chromosomes of rice. Chromosoma 110:24–31.

    Devos, K. M., J. K. M. Brown, and J. L. Bennetzen. 2002. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 12:1075–1079.

    Dong, F., J. T. Miller, S. A. Jackson, G. L. Wang, P. C. Ronald, and J. Jiang. 1998. Rice (Oryza sativa) centromeric regions consist of complex DNA. Proc. Natl. Acad. Sci. USA 95:8135–8140.

    Feng, Q., Y. J. Zhang, P. Hao, et al. (74 co-authors). 2002. Sequence and analysis of rice chromosome 4. Nature 420:316–320.

    Feschotte, C., N. Jiang, and S. R. Wessler. 2003. Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet. 3:329–341.

    Freed, E. O. 1998. HIV-1 gag proteins: diverse functions in the virus life cycle. Virology 251:1–15.

    Fukui, K., N. Ohmido, and G. S. Khush. 1994. Variability in rDNA loci in the genus Oryza detected through fluorescence in-situ hybridization. Theor. Appl. Genet. 87:893–899.

    Gaut, B. S., B. R. Morton, B. C. McCaig, and M. T. Clegg. 1996. Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93:10274–10279.

    Gorinsek, B., F. Gubensek, and D. Kordis. 2004. Evolutionary genomics of chromoviruses in eukaryotes. Mol. Biol. Evol. 21:781–798.

    Harushima, Y., M. Yano, A. Shomura et al. (17 co-authors). 1998. A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148:479–494.

    Havecker, E. R., X. Gao, and D. F. Voytas. 2004. The diversity of LTR retrotransposons. Genome Biol. 5:225.

    Henikoff, S. 2002. Near the edge of a chromosome's ‘black hole’. Trends Genet. 18:165–167.

    Henikoff, S., K. Ahmad, and H. S. Malik. 2001. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293:1098–1102.

    Heslop-Harrison, J. S., A. Brandes, S. Takeda et al. (14 co-authors). 1997. The chromosomal distributions of Ty1-copia group retrotransposable elements in higher plants and their implications for genome evolution. Genetica 100:197–204.

    Hindmarsh, P., and J. Leis. 1999. Retroviral DNA integration. Microbiol. Mol. Biol. Rev. 63:836–843.

    Hirochika, H. 2001. Contribution of the Tos17 retrotransposon to rice functional genomics. Curr. Opin. Plant Biol. 4:118–122.

    Jiang, J., J. B. Birchler, W. A. Parrott, and R. K. Dawe. 2003. A molecular view of plant centromeres. Trends Plant Sci. 8:570–575.

    Jiang, J., S. Nasuda, F. Dong, C. W. Scherrer, S. Woo, R. A. Wing, B. S. Gill, and D. C. Ward. 1996. A conserved repetitive DNA element located in the centromeres of cereal chromosomes. Proc. Natl. Acad. Sci. USA 93:14210–14213.

    Jiang, N., Z. Bao, S. Temnykh, Z. Cheng, J. Jiang, R. A. Wing, S. R. McCouch, and S. R. Wessler. 2002. Dasheng: a recently amplified non-autonomous LTR element that is a major component of pericentromeric regions in rice. Genetics 161:1293–1305.

    Jiang, N., I. K. Jordan, and S. R. Wessler. 2002. Dasheng and RIRE2. A nonautonomous long terminal repeat element and its putative autonomous partner in the rice genome. Plant Physiol. 130:1697–1705.

    Jin, W. W., J. R. Melo, K. Nagaki, P. B. Talbert, S. Henikoff, R. K. Dawe, and J. Jiang. 2004. Maize centromeres: organization and functional adaptation in the genetic background of oat. Plant Cell 16:571–581.

    Kalendar, R., C. M. Vicient, O. Peleg, K. Anamthawat-Jonsson, A. Bolshoy, and A. H. Schulman. 2004. Large retrotransposon derivatives: abundant, conserved but nonautonomous retroelements of barley and related genomes. Genetics 166:1437–1450.

    Kellogg, E. A. 2001. Evolutionary history of the grasses. Plant Physiol. 125:1198–1205.

    Kirchner, J., C. M. Connolly, and S. B. Sandmeyer. 1995. Requirement of RNA-polymerase III transcription factors for in vitro position-specific integration of a retrovirus-like element. Science 267:1488–1491.

    Kumar, A., and J. L. Bennetzen. 1999. Plant retrotransposons. Annu. Rev. Genet. 33:479–532.

    Langdon, T., C. Seago, M. Mende, M. Leggett, H. Thomas, J. W. Forster, H. Thomas, R. N. Jones, and G. Jenkins. 2000. Retrotransposon evolution in diverse plant genomes. Genetics 156:313–325.

    Malik, H. S., and T. H. Eickbush. 2001. Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res. 11:1187–1197.

    Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott et al. (27 co-authors). 2003. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31:383–387.

    Miller, J. T., F. Dong, S. A. Jackson, J. Song, and J. Jiang. 1998. Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150:1615–1623.

    Mroczek, R. J., and R. K. Dawe. 2003. Distribution of retroelements in centromeres and neocentromeres of maize. Genetics 165:809–819.

    Nagaki, K., Z. K. Cheng, S. Ouyang, P. B. Talbert, M. Kim, K. M. Jones, S. Henikoff, C. R. Buell, and J. Jiang. 2004. Sequencing of a rice centromere uncovers active genes. Nature Genet. 36:138–145.

    Nagaki, K., J. Song, R. M. Stupar et al. (12 co-authors). 2003. Molecular and cytological analyses of large tracts of centromeric DNA reveal the structure and evolutionary dynamics of maize centromeres. Genetics 163:759–770.

    Nakano, M., Y. Okamoto, J. I. Ohzeki, and H. Masumoto. 2003. Epigenetic assembly of centromeric chromatin at ectopic α-satellite sites on human chromosomes. J. Cell Sci. 116:4021–4034.

    Nonomura, K., and N. Kurata. 2001. The centromere composition of multiple repetitive sequences on rice chromosome 5. Chromosoma 110:284–291.

    Pardue, M. L., and P. G. DeBaryshe. 2003. Retrotransposons provide an evolutionarily robust non-telomerase mechanism to maintain telomeres. Annu. Rev. Genet. 37:485–511.

    Presting, G. G., L. Malysheva, J. Fuchs, and I. Schubert. 1998. A Ty3/gypsy retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J. 16:721–728.

    Saffery, R., H. Sumer, S. Hassan, L. H. Wong, J. M. Craig, K. Todokoro, M. Anderson, A. Stafford, and K. H. A. Choo. 2003. Transcription within a functional human centromere. Mol. Cell 12:509–516.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    Sandmeyer, S. 2003. Integration by design. Proc. Natl. Acad. Sci. USA 100:5586–5588.

    SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L. Bennetzen. 1998. The paleontology of intergene retrotransposons of maize. Nature Genet. 20:43–45.

    SanMiguel, P., A. Tikhonov, Y. K. Jin et al. (11 co-authors). 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274:765–768.

    Sasaki, T., T. Matsumoto, K. Yamamoto et al. (79 co-authors). 2002. The genome sequence and structure of rice chromosome 1. Nature 420:312–316.

    Shelby, R. D., K. Monier, and K. F. Sullivan. 2000. Chromatin assembly at kinetochores is uncoupled from DNA replication. J. Cell Biol. 151:1113–1118.

    Singleton, T. L., and H. L. Levin. 2002. A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion. Eukaryot. Cell 1:44–55.

    Sprinzl, M., K. S. Vassilenko, J. Emmerich, and F. Bauer. 1999. Compilation of tRNA sequences and sequences of tRNA genes. http://www.uni-bayreuth.de/departments/biochemie/trna/.

    Staden, R. 1996. The Staden sequence analysis package. Mol. Biotechnol. 5:233–241.

    Sullivan, B., and G. Karpen. 2001. Centromere identity in Drosophila is not determined in vivo by replication timing. J. Cell Biol. 154:683–690.

    Swanstrom, R., and J. W. Wills. 1997. Synthesis, assembly, and processing of viral proteins. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Witte, C. P., Q. H. Le, T. E. Bureau, and A. Kumar. 2001. Terminal-repeat retrotransposons in miniature (TRIM) are involved in restructuring plant genomes. Proc. Natl. Acad. Sci. USA 98:13778–13783.

    Wu, J. Z., H. Yamagata, M. Hayashi-Tsugane et al. (21 co-authors). 2004. Composition and structure of the centromeric region of rice chromosome 8. Plant Cell 16:967–976.

    Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.

    Yu, Y. S., T. Rambo, J. Currie et al. (104 co-authors). 2003. In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300:1566–1569.

    Zhang, Y., Y. C. Huang, L. Zhang et al. (12 co-authors). 2004. Structural features of the rice chromosome 4 centromere. Nucleic Acids Res. 32:2023–2030.

    Zhong, C. X., J. B. Marshall, C. Topp, R. Mroczek, A. Kato, K. Nagaki, J. A. Birchler, J. M. Jiang, and R. K. Dawe. 2002. Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14:2825–2836.

    Zhu, Y., J. Dai, P. G. Fuerst, and D. F. Voytas. 2003. Controlling integration specificity of a yeast retrotransposon. Proc. Natl. Acad. Sci. USA 100:5891–5895.

    (Kiyotaka Nagaki*, Pavel )