Long-Term Inheritance of the 28S rDNA-Specific Retrotransposon R2
     Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan

    E-mail: haruh@k.u-tokyo.ac.jp.

    Abstract

    R2 is a non–long-terminal-repeat (LTR) retrotransposon that inserts specifically into 28S rDNA. R2 has been identified in many species of arthropods and three species of chordates. R2 may be even more widely distributed in animals, and its origin may be traceable to early animal evolution. In this study, we identified R2 elements in medaka fish, White Cloud Mountain minnow, Reeves' turtle, hagfish, sea lilies, and some arthropod species, using degenerate polymerase chain reaction methods. We also identified two R2 elements from the public genomic sequence database of the bloodfluke Schistosoma mansoni. One of the two bloodfluke R2 elements has two zinc-finger motifs at the N-terminus; this differs from other known R2 elements, which have one or three zinc-finger motifs. Phylogenetic analysis revealed that the whole phylogeny of R2 can be divided into 11 parts (subclades), within each of which the local R2 phylogeny is consistent with the corresponding host phylogeny. Divergence-versus-age analysis provided no reliable evidence for the horizontal transfer of R2 and instead supports the proposition that R2 has been vertically transferred since before the divergence of the deuterostomes and protostomes. The seeming inconsistency between the R2 phylogeny and the phylogeny of their hosts is due to the existence of paralogous lineages. The number of N-terminal zinc-finger motifs is consistent with the deep phylogeny of R2 and indicates that the common ancestor of R2 had three zinc-finger motifs at the N-terminus. This study revealed the long-term vertical inheritance and the ancient origin of sequence specificity of R2, both of which seem applicable to some other non-LTR retrotransposons.

    Key Words: non-LTR retrotransposons • R2 • sequence specificity • evolution • ribosomal RNA • vertical transmission

    Introduction

    Non–long-terminal-repeat (LTR) retrotransposons are mobile genetic elements that occupy large regions of the genomes of a wide range of eukaryotes. In the human genome, a single non-LTR retrotransposon family, LINE-1 (L1), alone occupies one-sixth of the genome (Lander et al. 2001). The integration of L1 is implicated in genetic diseases and cancers (Miki 1998) and causes genome reconstruction and gene evolution (Courseaux and Nahon 2001). In contrast to the genome-wide distribution of L1, some non-LTR retrotransposons occupy only a few loci in the host genome. Most of these site-specific non-LTR retrotransposons recognize targets at the nucleotide sequence level (Xiong and Eickbush 1988; Feng, Schumann, and Boeke 1998; Christensen, Pont-Kingdon, and Carroll 2000; Anzai, Takahashi, and Fujiwara 2001). Sequence-specific non-LTR retrotransposons target repetitive sequences such as ribosomal RNA (rRNA) genes, microsatellites, or telomeric repeats (Kojima and Fujiwara 2003, 2004). Sequence specificity is considered a survival strategy of non-LTR retrotransposons: by targeting repeats, they avoid destroying their hosts through insertion into essential genes.

    R2 is one of the most investigated sequence-specific non-LTR retrotransposons and transposes exclusively into 28S rDNA. R2 was first identified as an insertion sequence in the 28S rRNA genes of the fruit fly Drosophila melanogaster (Roiha et al. 1981) and the domestic silkworm Bombyx mori (Fujiwara et al. 1984) and was later characterized as a non-LTR retrotransposon (Burke, Calalang, and Eickbush 1987; Jakubczak, Xiong, and Eickbush 1990). R2 has been identified in many arthropod species, including the earwig Forficula auricularia, the collembola Anurida maritima, and the Atlantic horseshoe crab Limulus polyphemus (Burke et al. 1998). R2 is considered to have been transmitted vertically since the early evolution of the arthropods. Recently, however, we identified R2 in the tunicates Ciona intestinalis and Ciona savignyi and in the zebrafish Danio rerio (Kojima and Fujiwara 2004). In contrast, R2 has not been identified in three species, the human, mouse, and pufferfish Fugu rubripes, the genomes of which have been sequenced (Lander et al. 2001; Aparicio et al. 2002; Waterston et al. 2002). This indicates that the extinction of R2 has occurred several times in vertebrate evolution.

    The extinction of R2 in certain organisms is evident in the arthropods (Jakubczak, Burke, and Eickbush 1991). Several R2-less insect species, such as the European house cricket Acheta domesticus, have no insertion in rDNA (Jakubczak, Burke, and Eickbush 1991). The malaria mosquito Anopheles gambiae has lost the R2 elements and instead has another three sequence-specific non-LTR retrotransposon families in its rDNA (Kojima and Fujiwara 2003). Conversely, some insects have multiple lineages (elements diversified in ancient times from a single element and vertically transmitted) of R2. For example, the Japanese beetle Popillia japonica contains at least five lineages of R2 (Burke et al. 1993). R2 is subject to significant changes in copy number even within a single species (Jakubczak et al. 1992), which are due to its rapid turnover, with high rates of retrotransposition and elimination (Perez-Gonzalez and Eickbush 2001, 2002). Extinction and diversification of R2 could be two opposite results of this rapid turnover. Simultaneously, extinction and diversification make it difficult to trace the evolution of R2.

    Here, we investigated the evolutionary diversification of the R2 family non-LTR retrotransposons in bilaterian animals. We characterized R2 elements from a wide variety of deuterostomes, including reptiles, teleosts, hagfish, and sea lilies. We also characterized R2 elements from the genomic sequence database of the bloodfluke Schistosoma mansoni. Based on the number of N-terminal zinc-finger motifs and phylogenetic analyses, R2 can be classified into four "clades." Divergence-versus-age analysis supports the long-term vertical inheritance and the existence of paralogous lineages of R2, which have been maintained for more than 850 Myr.

    Materials and Methods

    Degenerate Polymerase Chain Reaction Cloning and Sequencing

    The genomic DNA used for cloning is listed in table 1. To amplify partial sequences of novel representatives of R2 family retrotransposons effectively, we designed four degenerate primers, based on known R2 elements. Primers used for polymerase chain reaction (PCR) were R2IF1, 5'-AAGCARGGNGAYCCNCTNTC-3'; R2IF2, 5'-GCYYTRGCGTTYGCNGAYGA-3'; R2IIF1, 5'-GTNAARCARGGNGAYCCNCT-3'; and R2IIF2, 5'-CTNGCNTTYGCNGAYGAYYT-3'. These degenerate primers were used with a 28S rDNA-specific primer 28S-R-B, 5'-ATCCATTCATGCGCGTCACT-3', which was designed just downstream from the R2 insertion site.
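    The degeneracy of each primer listed above, that is, the number of distinct oligonucleotides a degenerate sequence represents, can be computed directly from its IUPAC codes. The following is a minimal sketch rather than part of the original protocol; the names IUPAC_CHOICES, PRIMERS, and degeneracy are introduced here for illustration only, and the primer sequences are copied from the text.

```python
# Number of bases that each IUPAC nucleotide code can stand for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

# Primer sequences (5' to 3') as given in the text.
PRIMERS = {
    "R2IF1": "AAGCARGGNGAYCCNCTNTC",
    "R2IF2": "GCYYTRGCGTTYGCNGAYGA",
    "R2IIF1": "GTNAARCARGGNGAYCCNCT",
    "R2IIF2": "CTNGCNTTYGCNGAYGAYYT",
    "28S-R-B": "ATCCATTCATGCGCGTCACT",
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer encodes."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_CHOICES[base]
    return total


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: length {len(seq)}, degeneracy {degeneracy(seq)}")
```

    Under this count, the four degenerate primers represent 256, 128, 512, and 1,024 sequences, respectively, whereas the 28S rDNA-specific primer is a single sequence.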

    Table 1 List of Genomic DNA Used in This Study

    PCR was performed for 35–40 cycles (96°C for 20 s, 50°C or 55°C for 20 s, and 72°C for 4 min). Amplified PCR products were cloned into pGEM-T Easy vector (Promega, Madison, Wisc.) and sequenced with ABI PRISM 310 Genetic analyzer (PE Applied Biosystems) or ABI PRISM 3100 Genetic analyzer (PE Applied Biosystems) using BigDye cycle sequencing kit (PE Applied Biosystems, Foster City, Calif.). Primer sequences used for sequencing are available on request. Nucleotide sequences of the R2 elements identified in this study are deposited under accession numbers AB201408–AB201417.

    Database Analysis

    Computer-based nucleotide and protein searches were performed using different Blast search programs (Altschul et al. 1997) at National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/BLAST), Human Genome Sequencing Center at Baylor College of Medicine (http://www.hgsc.bcm.tmc.edu/projects/honeybee/), U.S. Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/index.html), Japan National BioResource Project (http://www.nbrp.jp/index.jsp), and The Institute for Genomic Research (TIGR) (http://tigrblast.tigr.org/er-blast/index.cgi?project=sma1). Protein sequences of non-LTR retrotransposons as previously described (Kojima and Fujiwara 2004) were used as queries for database searches. We constructed representative retrotransposon sequences from several sequences derived from different genomic positions. Sequences more than 90% identical to each other were connected in order to include longer open reading frames (ORFs). The reconstructed sequences of the R2 elements are available from the authors' Web site (http://www.biol.s.u-tokyo.ac.jp/users/animal/kojima/sequence.html).
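    The 90% identity criterion used when connecting sequences can be sketched as follows. This is an illustration only: it assumes the two sequences are already aligned over the same length, the function names are hypothetical, and the toy sequences are not taken from any database. The actual reconstruction procedure is not described in further detail here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def should_connect(a: str, b: str, threshold: float = 90.0) -> bool:
    """True when identity is strictly greater than the threshold."""
    return percent_identity(a, b) > threshold


if __name__ == "__main__":
    # Toy 20-nt sequences (hypothetical, for illustration only).
    ref = "ACGTACGTACGTACGTACGT"
    one_mismatch = "ACGTACGTACGTACGTACGA"
    two_mismatches = "TCGTACGTACGTACGTACGA"
    print(percent_identity(ref, one_mismatch), should_connect(ref, one_mismatch))
    print(percent_identity(ref, two_mismatches), should_connect(ref, two_mismatches))
```

    The comparison is strict because the text specifies "more than 90%"; a pair at exactly 90% identity would not be connected under this reading.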

    Sequence Alignment and Phylogenetic Analysis

    Amino acid sequences of elements were aligned using ClustalX (Thompson et al. 1997). Bayesian phylogenetic trees were constructed using MrBayes 3 (Ronquist and Huelsenbeck 2003). Markov chain Monte Carlo chain length was 200,000 generations with trees sampled every 10 generations; the first 1,000 trees were discarded as burn-in. Neighbor-joining (NJ) trees were constructed using ClustalX. Nonparametric bootstrap analyses were performed with 1,000 replicates. Amino acid distances used in divergence-versus-age analysis were calculated from sequences of the C-terminal half (about 175 residues) of the reverse transcriptase (RT) domain using the MEGA2 program (Kumar et al. 2001).
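    The sampling settings stated above imply the following tree counts. This is simple arithmetic, sketched with hypothetical names; it assumes one tree per sampling interval and ignores the starting tree, which some programs also record.

```python
GENERATIONS = 200_000   # chain length stated in the text
SAMPLE_EVERY = 10       # sampling interval stated in the text
BURN_IN_TREES = 1_000   # trees discarded as burn-in, as stated in the text


def sampled_trees(generations: int, sample_every: int) -> int:
    """Trees sampled, assuming one tree per sampling interval."""
    return generations // sample_every


def retained_trees(generations: int, sample_every: int, burn_in: int) -> int:
    """Trees left for summarizing posterior probabilities after burn-in."""
    return sampled_trees(generations, sample_every) - burn_in


if __name__ == "__main__":
    total = sampled_trees(GENERATIONS, SAMPLE_EVERY)
    kept = retained_trees(GENERATIONS, SAMPLE_EVERY, BURN_IN_TREES)
    print(f"sampled: {total}, retained: {kept}, "
          f"burn-in fraction: {BURN_IN_TREES / total:.2%}")
```

    Under these assumptions, 20,000 trees are sampled, and the burn-in removes the first 5% of them.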

    Results and Discussion

    Identification of R2 Elements from Various Bilaterian Animals

    We used PCR to identify new R2 elements, according to a previous report (Burke et al. 1998). One degenerate primer was designed to hybridize to the conserved RT motif B' or motif C, and a second primer was specific for the 28S rDNA sequence just downstream from the R2 insertion site. We used the genomic DNA of 15 chordates, 5 echinoderms, and 3 arthropods as templates (table 1). From these, we identified six R2 elements from four chordates, one R2 element from one echinoderm, and three R2 elements from three arthropod species. In the remaining 15 of the 23 species, we could not identify R2 elements. One possibility is that R2 does not exist in these species. However, we cannot exclude the possibility that our method was inadequate to identify R2 elements in these species, for example, because of sequence mismatches between R2 and the degenerate primers. Thus, we cannot conclude that R2 is absent from these species.

    We also searched for novel R2 elements in public genomic databases (see Materials and Methods), using different Blast methods (Altschul et al. 1997). We searched the genomic DNA databases of the human Homo sapiens, the mouse Mus musculus, the rat Rattus norvegicus, the dog Canis familiaris, the chicken Gallus gallus, the western clawed frog Xenopus tropicalis, the medaka fish Oryzias latipes, the honeybee Apis mellifera, the yellow fever mosquito Aedes aegypti, and the bloodfluke S. mansoni. We identified novel R2 elements from the honeybee and bloodfluke. Because there was no complete retrotransposon sequence in the draft genome databases, we reconstructed representative retrotransposon sequences by combining several sequences derived from different genomic positions. We reconstructed one R2 element, designated R2Amel, in the honeybee genome, and two R2 elements, designated R2Sm-A and R2Sm-B, in the bloodfluke genome. We characterized the complete sequences of R2Amel and R2Sm-A but could not characterize the 5' and 3' junction sequences of R2Sm-B due to limited information about the genomic sequence.

    We investigated whether these novel R2 elements inserted into the target site of 28S rDNA. Unlike other non-LTR retrotransposons, most R2 elements have neither 3' poly(A) tails nor 3'-terminal repeats, so the junction was determined by sequence comparisons of inserted and noninserted rDNA units. All known R2 elements have identical insertion sites, 5'-AAGG TAGC-3' (see fig. 1B). Whereas the 3' junction is highly conserved, the 5' junction is subject to deletion, duplication, and nontemplate nucleotide insertion. Therefore, we compared the 3' termini of R2 elements (fig. 1A). All 28S rDNA sequences flanking the newly identified R2 elements were identical to those of known R2 elements. This indicates that the novel R2 elements identified in this study are truly 28S rDNA-specific non-LTR retrotransposons. The crayfish R2 element (R2Pc) has a 3' poly(A) tail. 3' Poly(A) tails were also observed in the R2 elements of the Drosophila genus and in earwig R2, isopod R2, and mealworm R2 (Burke et al. 1999).
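    The position of the conserved insertion site within an uninserted rDNA unit can be located with a simple string search. The sketch below assumes that the site is the exact sequence 5'-AAGG TAGC-3', with the element inserted between AAGG and TAGC as stated above; the function name and the toy flanking sequence are hypothetical.

```python
# Conserved sequences immediately upstream and downstream of the R2 insertion site.
TARGET_LEFT = "AAGG"
TARGET_RIGHT = "TAGC"


def find_insertion_point(rdna: str) -> int:
    """Return the 0-based index at which R2 would insert, or -1 if absent.

    The index marks the boundary between AAGG and TAGC, so rdna[:i] ends
    with AAGG and rdna[i:] starts with TAGC.
    """
    position = rdna.upper().find(TARGET_LEFT + TARGET_RIGHT)
    if position < 0:
        return -1
    return position + len(TARGET_LEFT)


if __name__ == "__main__":
    # Toy sequence (hypothetical), not a real 28S rDNA fragment.
    toy_rdna = "CCCAAGGTAGCGGG"
    i = find_insertion_point(toy_rdna)
    print(i, toy_rdna[:i], toy_rdna[i:])
```

    An exact match is a simplification; comparing inserted and noninserted units, as described above, remains necessary when the 5' junction carries deletions or duplications.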

    FIG. 1.— Comparison of the insertion sites of R2 elements. (A) Sequence of the 3'-terminus of R2 and the 28S rRNA gene flanking to R2. (B) Conservation of the sequence near the insertion site of R2. Organisms containing R2 are indicated by asterisks. The insertion site of R2 is between –1 and +1. Compared organisms are as follows (accession numbers are shown in parentheses): silkworm, Bombyx mori (AY038991); mosquito, Anopheles gambiae (AAAB01000770); fruit fly, Drosophila melanogaster (M21017); horseshoe crab, Limulus polyphemus (AF212167); bloodfluke, Schistosoma mansoni (AY197345); sea urchin, Strongylocentrotus purpuratus (AF212171); sea squirt, Ciona intestinalis (AF212177); zebrafish, Danio rerio (BX537263); pufferfish, Tetraodon nigroviridis (AL322435); frog, Xenopus borealis (X59733); human, Homo sapiens (U13369); fission yeast, Schizosaccharomyces pombe (NC_003421); and Arabidopsis, Arabidopsis thaliana (NC_003074).

    In our previous study, we could not find R2 elements in any mammals, including the human and mouse (Kojima and Fujiwara 2004). One possibility is that substitutions in the R2 target sequence drove R2 to extinction. We compared the 28S rDNA sequences around the R2 insertion sites of various animals, including two organisms in which R2 seems to be extinct, the human H. sapiens and the malaria mosquito A. gambiae (fig. 1B). The 100-bp sequences around the R2 insertion sites of the silkworm and mosquito are completely identical. The sequences of the silkworm and the fruit fly are different at two nucleotides, but these differences are not important for recognition by the R2 ORF protein because R2Bm (silkworm) can transpose into the fruit fly 28S rDNA (Eickbush, Luan, and Eickbush 2000). The 100-bp sequences around the R2 insertion sites are the same among vertebrates, including species that do not contain R2. These facts show that substitutions in the 28S rDNA sequence are not the cause of the extinction of R2.

    The 28S rDNA sequence near the R2 insertion site is highly conserved, not only among animals but also among fungi and plants (fig. 1B). There are one or two substitutions in the 100-bp region of these organisms. The insertion site of R2 is at the root of helix E27, which occurs within one highly conserved region of the 28S rRNA gene (Ben Ali et al. 1999). It is possible that other animals and even fungi and plants have R2 elements in their 28S rDNA.

    R2 Phylogeny

    Previous reports have demonstrated that the R2 phylogeny as a whole is apparently not consistent with the corresponding host phylogeny (Burke et al. 1998; Kojima and Fujiwara 2004). We analyzed the phylogeny of the newly identified R2 elements together with previously reported elements. We used two methods to infer the phylogeny. Both the Bayesian phylogenetic inference tree and the NJ tree showed that the R2 phylogeny appears to be quite inconsistent with the phylogeny of their hosts (fig. 2). For instance, R2Eb (hagfish) is more closely related to R2Pc (crayfish) than to other chordate R2 elements (fig. 2, D3). R2Lp (horseshoe crab) is more closely related to vertebrate R2 elements than to other arthropod elements (fig. 2, A3).

    FIG. 2.— R2 phylogeny. The phylogeny was rooted using SLACS (Accession number: CAA34931) and GilM (AAL47180). (A) The phylogenetic tree inferred by Bayesian method. The number next to each node indicates a value as a percentage of posterior probability. (B) The phylogenetic tree by NJ method. The number next to each node indicates a bootstrap value as a percentage of 1,000 replicates.

    However, there is partial consistency between the phylogeny of the R2 elements and the phylogeny of their hosts. For example, R2Ta (White Cloud Mountain minnow) is most similar to R2Dr (zebrafish) and next most similar to R2Ol-A (medaka fish) (fig. 2, A3). These teleost R2 elements constitute the sister branch of the Reeves' turtle R2 elements (R2Cr-A, R2Cr-B1, and R2Cr-B2) (fig. 2, A3). The phylogeny of these R2 elements is consistent with the phylogeny of their hosts. The whole R2 phylogenetic tree can be divided into 11 parts (groups of elements) in which the local phylogenies of the R2 elements are consistent with those of their hosts (fig. 2, A1–D5). The local phylogenies of these 11 groups are supported by both high posterior probabilities on the Bayesian trees and high bootstrap values on the NJ trees. We designate these groups "subclades."

    The R2-A1, R2-A3, R2-D3, and R2-D4 subclades each include two animal phyla. The R2-A1 subclade is composed of R2Tl (tadpole shrimp; phylum Arthropoda) and the R2 elements of two sea squirt species, C. intestinalis and C. savignyi (phylum Chordata). The R2-A3 subclade includes six vertebrate R2 elements from four species (zebrafish, White Cloud Mountain minnow, medaka fish, and Reeves' turtle; Chordata) and one horseshoe crab (Arthropoda) R2 element. The R2-D3 subclade includes the R2 elements R2Pc (crayfish; Arthropoda) and R2Eb (hagfish; Chordata). The R2-D4 subclade is composed of three sea squirt (Chordata) R2 elements and one sea lily (phylum Echinodermata) R2 element. The R2-C1 subclade includes two bloodfluke (phylum Platyhelminthes) R2 elements. The other subclades are composed of only arthropod R2 elements.

    Long-Term Vertical Transmission of R2

    The seeming inconsistency between the R2 phylogeny and the phylogeny of their hosts can be explained in two ways. One is the horizontal transfer of R2 between species and the other is the presence of paralogous R2 lineages that have been copropagated in host lineages. The frequent extinction of retrotransposons makes it difficult to investigate which of these possible scenarios gave rise to the seeming inconsistency observed. We performed divergence-versus-age analysis, described by Malik, Burke, and Eickbush (1999), using the C-terminal halves of the RT domains (fig. 3). In the divergence-versus-age analysis, the x axis represents estimates of host divergence and the y axis represents amino acid divergence. Estimates of host divergence times are based on Feng, Cho, and Doolittle (1997) and Kumazawa, Yamaguchi, and Nishida (1999) (fig. 3A). Although Malik, Burke, and Eickbush used whole RT domains to estimate the divergence rate, we used only the C-terminal halves of the RT domains because the full-length RT domains of many R2 elements have not yet been sequenced. As an indicator of standard divergence rates, we used two subclades, R2-A3 (fig. 3B, open squares) and R2-D5 (fig. 3B, closed squares), to exclude paralogous comparisons. Paralogous comparisons inflate the divergence time and then mislead us into considering that orthologous comparisons are "less divergent than expected." Comparisons that are "less divergent than expected" are inferred to reflect horizontal transfer. Comparing the most closely related lineages for each pair of species should provide the best estimates of orthologous comparisons. The R2-D5 subclade includes most of the R2 elements used for the previous divergence-versus-age analysis (Malik, Burke, and Eickbush 1999).

    FIG. 3.— Divergence-versus-age analysis. (A) Phylogeny and divergence time of the host species of R2. Species divergence times based on the previous papers (Feng, Cho, and Doolittle 1997; Kumazawa, Yamaguchi, and Nishida 1999; Malik, Burke, and Eickbush 1999) are plotted at nodes. (B) Plots of the divergence-versus-age estimates of R2. Amino acid divergences were calculated from sequences of the C-terminal half of the RT domain. For each host divergence time estimate, the elements used are as follows. Plots of orthologous comparison in the R2-D5 subclade (closed squares): at 1 MYA, R2Dmel versus R2Dsim; at 6 MYA, (R2Dmel, R2Dsim) versus R2Dyak; at 25 MYA, (R2Dmel, R2Dsim, R2Dyak) versus R2Damb; at 39 MYA, (R2Dmel, R2Dsim, R2Dyak, R2Damb) versus R2Dmer; at 250 MYA, (R2Dmel, R2Dsim, R2Dyak, R2Damb, R2Dmer) versus R2Fa. Plots of orthologous comparisons in the R2-A3 subclade (open squares): at 300 MYA, (R2Dr, R2Ta) versus R2Ol-A; at 405 MYA, (R2Dr, R2Ta, R2Ol-A) versus (R2Cr-A, R2Cr-B1, R2Cr-B2); at 850 MYA, (R2Dr, R2Ta, R2Ol-A, R2Cr-A, R2Cr-B1, R2Cr-B2) versus R2Lp. Plots of paralogous comparisons (open triangles): at 250 MYA, (R2Bm, R2Sc, R2Hc, R2Tm-A, R2Tm-B, R2Pj-A, R2Pj-B, R2Pj-C, R2Nv-A, R2Nv-B, R2Amel) versus R2Fa; at 850 MYA, R2Ci-A versus R2Tl, and R2Dmel versus R2Dr. Plots of comparisons of closely related R2 elements whose hosts are phylogenetically distant (asterisks): at 590 MYA, R2Ci-A versus R2Mr; at 850 MYA, R2Ci-D versus R2Tl, R2Amel versus R2Dr, and R2Pc versus R2Eb.

    We plotted all possible comparisons in both subclades (fig. 3B). The two subclades showed similar patterns of increased divergence with time. Comparisons within the genus Drosophila (fig. 3B, closed squares at less than 100 MYA) illustrated a nearly linear relationship, but the trend becomes looser as the divergence time increases. The amino acid distances between R2Fa (earwig) and the six Drosophila R2 elements, which are thought to have diverged 250 MYA, vary from 0.56 to 0.60 (fig. 3B, closed squares at 250 MYA). Because the divergence times are basically the same for these comparisons, the differences in amino acid distances indicate deviations from the expected value. The amino acid distances between the three turtle R2 elements (R2Cr-A, R2Cr-B1, and R2Cr-B2) and the teleost R2 elements (R2Dr, R2Ta, and R2Ol-A), which are thought to have diverged 405 MYA (Feng, Cho, and Doolittle 1997), vary from 0.49 to 0.61 (fig. 3B, open squares at 405 MYA). The amino acid distances between six vertebrate R2 elements (R2Dr, R2Ta, R2Ol-A, R2Cr-A, R2Cr-B1, and R2Cr-B2) and R2Lp (horseshoe crab), which are thought to have diverged 850 MYA (Feng, Cho, and Doolittle 1997), vary from 0.60 to 0.71 (fig. 3B, open squares at 850 MYA). With increasing age, the amino acid distances tend to become greater, but the deviation also becomes greater. This is because multiple substitutions at the same site, which become more frequent with increasing age, cannot be fully accommodated in the calculation. The divergence gradually increases up to 1,000 MYA, but a clear correlation cannot be observed at old ages because of the deviation.
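    The effect of multiple substitutions at the same site can be illustrated with a minimal sketch. It assumes an uncorrected proportion of differing sites (p-distance) and the standard Poisson correction, d = -ln(1 - p). The text does not state which distance model was selected in MEGA2, so the model, the function names, and the toy sequences below are assumptions for illustration only.

```python
import math


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns in which either sequence has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance; grows without bound as p approaches 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Toy aligned peptides (hypothetical): 5 comparable columns, 1 difference.
    p = p_distance("MKV-LA", "MKIALA")
    print(f"p = {p:.2f}, Poisson-corrected d = {poisson_distance(p):.3f}")
    # The correction widens as observed differences accumulate.
    for observed in (0.1, 0.3, 0.5, 0.7):
        print(f"p = {observed:.1f} -> d = {poisson_distance(observed):.3f}")
```

    Because the observed proportion of differences cannot exceed 1 whereas the number of substitutions per site can, distances between old lineages are compressed, and small differences in p correspond to large uncertainty in the underlying divergence.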

    Plots of orthologous comparisons are expected to lie near the plots of the R2-A3 and R2-D5 subclades. Points below the orthologous comparisons indicate horizontal transfer, where divergence time is more recent than the time of host branching. Points above the orthologous comparisons indicate paralogous comparisons, where divergence is earlier than the time of host branching. Because the R2-D5 subclade includes R2 elements of holometabolous insects (R2Dmel, R2Dsim, R2Dmau, R2Dyak, R2Damb, R2Dmer; fruit flies) and hemimetabolous insects (R2Fa; earwig), which are orthologous, all other R2 elements of holometabolous insects (R2Bm in the silkworm; R2Sc in the fungus gnat; R2Hc in the ladybird beetle; R2Tm-A and R2Tm-B in the mealworm; R2Pj-A, R2Pj-B, and R2Pj-C in the Japanese beetle; R2Nv-A and R2Nv-B in the jewel wasp; and R2Amel in the honeybee) are paralogous to R2Fa (fig. 2). We plotted these paralogous comparisons (fig. 3B, open triangles at 250 MYA). We plotted two other comparisons as paralogous comparisons: R2Dr versus R2Dmel and R2Tl versus R2Ci-A. Because R2Dr and R2Lp (horseshoe crab) are orthologous (fig. 2, A3), R2Dr and R2Dmel are paralogous (fig. 2, A3 and D5). R2Tl (tadpole shrimp) is more closely related to R2Ci-D (sea squirt) than to R2Ci-A (sea squirt) (fig. 2, A1 and D4), so R2Tl and R2Ci-A are paralogous. The points representing paralogous comparisons are above the curve of the orthologous comparisons at 250 MYA but are not clearly above at 850 MYA (fig. 3B). Therefore, this divergence-versus-age analysis cannot distinguish ancient horizontal transfer from vertical transfer.
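    The interpretation rule described above can be written as a simple classification. In this sketch, the orthologous reference ranges are the amino acid distance ranges reported in the text for 250, 405, and 850 MYA; treating those ranges as hard cutoffs is an assumption made here for illustration, and the distances passed to the function in the example are hypothetical values, not measurements from this study.

```python
# Ranges of amino acid distances among orthologous comparisons, as reported
# in the text for each host divergence time (MYA).
ORTHOLOG_RANGES = {
    250: (0.56, 0.60),
    405: (0.49, 0.61),
    850: (0.60, 0.71),
}


def classify(age_mya: int, distance: float) -> str:
    """Place a comparison relative to the orthologous range at a given age."""
    low, high = ORTHOLOG_RANGES[age_mya]
    if distance < low:
        return "below: candidate horizontal transfer"
    if distance > high:
        return "above: candidate paralogous comparison"
    return "within orthologous range"


if __name__ == "__main__":
    # Hypothetical distances chosen only to exercise each branch.
    for age, dist in [(250, 0.75), (405, 0.30), (850, 0.65)]:
        print(age, dist, classify(age, dist))
```

    A point falling within the range at 850 MYA is uninformative under this rule, which mirrors the limitation noted above: at old ages, the spread of the orthologous comparisons is wide enough that paralogous comparisons do not clearly separate from them.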

    Two R2 elements that are phylogenetically close but whose host species are distantly related are candidates for horizontal transfer. We tested four comparisons of related R2 elements whose host species are phylogenetically distant: R2Ci-D in the sea squirt versus R2Tl in the tadpole shrimp (fig. 2, A1), R2Amel in the honeybee versus R2Dr in the zebrafish (fig. 2, A2 and A3), R2Pc in the crayfish versus R2Eb in the hagfish (fig. 2, D3), and R2Ci-A in the sea squirt versus R2Mr in the sea lily (fig. 2, D4). However, no points fell markedly below the orthologous comparisons (fig. 3B, asterisks). Thus, there is no reliable evidence for the horizontal transfer of R2. Based on the above data, we concluded that R2 has mainly been transferred vertically and that the 11 subclades are paralogous lineages of R2.

    All subclades except R2-B1 and R2-B2 date back more than 850 MYA, because subclades that include elements from both deuterostomes and protostomes, and their sister subclades, must have been vertically transferred since before the host lineages diverged. The R2-B1 and R2-B2 subclades include only arthropod R2 elements and are not sister branches of subclades containing R2 elements from two or more phyla, and thus we could not confirm the time of branching between the R2-B1 and R2-B2 subclades. However, the common ancestor of the R2-B1 and R2-B2 subclades may also date back more than 850 MYA.

    Correlation Between Structure and Phylogeny

    The 11 subclades can be combined into larger groups. In both phylogenetic trees (fig. 2), the R2-A2 and R2-A3 subclades, the R2-B1 and R2-B2 subclades, and the R2-D3 and R2-D4 subclades are monophyletic with strong statistical support. Furthermore, the R2-A1, R2-A2, and R2-A3 subclades can be combined into a monophyletic group. The R2-D1, R2-D2, R2-D3, R2-D4, and R2-D5 subclades can also be combined into a monophyletic group.

    It is known that the R2 deep phylogeny corresponds to the number of N-terminal zinc-finger motifs (Burke et al. 1998; Kojima and Fujiwara 2004). The R2 elements that have three zinc-finger motifs at their N-termini are R2Ci-D (sea squirt C. intestinalis), R2Cs-D (sea squirt C. savignyi), R2Nv-B (jewel wasp), R2Dr (zebrafish), and R2Lp (horseshoe crab), all of which belong to the monophyletic group containing the R2-A1, R2-A2, and R2-A3 subclades. The other known R2 elements, such as R2Bm (silkworm), R2Dmel (fruit fly D. melanogaster), and R2Ci-A (sea squirt C. intestinalis), have only one zinc-finger motif in their N-terminal regions. In this study, we characterized the N-terminal motifs of R2Amel (honeybee) and R2Sm-A (bloodfluke). R2Amel in the R2-A2 subclade has three zinc-finger motifs (fig. 4), which is similar to the most closely related R2 element R2Nv-B (jewel wasp) (fig. 2). Interestingly, R2Sm-A in the R2-C1 subclade has two zinc-finger motifs (fig. 4). The first zinc-finger motif is the CCHH-type and is related to the first zinc-finger motif observed in R2 elements which have three zinc-finger motifs, judging from its position and sequence conservation. The second zinc-finger is also the CCHH-type and is related to the third zinc-finger motif in the R2-A1, A2, and A3 subclades and to the only zinc-finger in other R2 elements (fig. 4). Because the spacing between the first and second zinc-finger motifs is only 11 residues (fig. 4, shown in parentheses), there is no room for inclusion of the middle zinc-finger motif. The R2 phylogeny, structures, and distributions are summarized in figure 5.

    FIG. 4.— N-terminal zinc-finger motifs of R2. Conserved residues are shaded. The spacing between zinc-finger motifs is shown in parentheses.

    FIG. 5.— Summary of the phylogeny, domain structure, and distributions of R2. The number next to each node in the phylogeny indicates a value as a percentage of posterior probability (above) and bootstrap value (below in parentheses). Black circles at nodes indicate that the R2 branching occurred before the branching between the deuterostomes and protostomes. In domain structure column, abbreviations are as follows: CCHH, CCHH-type zinc-finger motif; c-myb, c-myb–like DNA-binding motif; CCHC, CCHC-type zinc-finger motif; and RLE, restriction enzyme–like endonuclease. In distribution column, abbreviations are as follows: A, Arthropoda; C, Chordata; P, Platyhelminthes; and E, Echinodermata.

    We propose that the R2 "superclade" should replace the R2 "clade" and suggest that the R2 superclade can be divided into four new clades, R2-A, R2-B, R2-C, and R2-D (figs. 2 and 5). The term "clade" was proposed by Malik, Burke, and Eickbush (1999) to represent non-LTR retrotransposons that (1) share the same structural features, (2) are grouped together with ample phylogenetic support, and (3) date back to the Precambrian era. As described above, all clades and most of the subclades satisfy all three requirements. For the purpose of uniformity, we propose 4 clades that are phylogenetically and structurally supported and 11 subclades that may be promoted to the status of clades in the future.

    The R2-A clade branched first and the R2-C and R2-D clades branched last, with high statistical support on both the Bayesian tree and the NJ tree (figs. 2 and 5). Because the R2-A, R2-C, and R2-D clades have three, two, and one N-terminal zinc-finger motif(s), respectively, the common ancestor of all R2 elements could have had three N-terminal zinc-finger motifs. It is reasonable to infer that the R2-B clade retrotransposons have three or two zinc-finger motifs at the N-terminus. Either before the branching of the R2-C clade or before the branching of the R2-B clade, R2 lost the second zinc-finger motif, which is a CCHC-type motif. The first zinc-finger motif was lost in the common ancestor of the R2-D clade.

    Comparison of Evolutionary History Between R2 and Other Non-LTR Elements

    The copy number of the sequence-specific R2 element is restricted by that of the target 28S rDNA. The number of lineages of R2 is, therefore, more tightly restricted than that of nonsequence-specific retrotransposons such as L1 or CR1. Horizontal transfer may explain the existence of multiple lineages of R2 in various organisms, but the divergence-versus-age analysis in this study showed no direct evidence for it. These results agree with the contention by Malik, Burke, and Eickbush (1999) that non-LTR retrotransposons are seldom horizontally transferred. Nevertheless, we need to consider the possibility, proposed by Kapitonov and Jurka (2003), that domains specific to some retrotransposons play a role in horizontal transfer. They proposed that putative esterase domains identified in the ORF1 protein of CR1-like retrotransposons are involved in penetration of the cell membrane.

    Our results also show that most R2 subclades date back more than 850 MYA. Malik, Burke, and Eickbush (1999) originally proposed the classification "clade" to represent a group of retrotransposons that date back to the Precambrian era (more than 540 MYA). Considering the age of the subclades, the R2 superclade is more ancient than has been suggested so far. These results imply that the origins of other clades of non-LTR retrotransposons are also more ancient than previously estimated. Because the original clades date back to far before the end of the Precambrian era, many subgroups (or subclades) within the original clades may themselves date back to the Precambrian era. For example, Lovšin, Gubenšek, and Kordiš (2001) reported that the original CR1 clade should be divided into three new clades, CR1, L2, and Rex1. The ancient origin of some clades of non-LTR retrotransposons reasonably explains why they are observed in a wide variety of organisms, without implying the involvement of horizontal transfer. The RTE clade, which includes elements from animals, plants, and brown algae, would date back to before the diversification of major eukaryotic groups, as previously described by Župunski, Gubenšek, and Kordiš (2001). The R2 superclade would also date back to the early evolution of eukaryotes. Although R2 has been identified from only four animal phyla at present, it is possible that R2 is more widely distributed in animals and other eukaryotes.

    Most early-branching non-LTR retrotransposons have sequence specificity (Malik, Burke, and Eickbush 1999). The origin of sequence specificity is a major question in the evolution of non-LTR retrotransposons. Was the original non-LTR retrotransposon sequence specific, or was sequence specificity acquired independently in many clades encoding restriction-like endonucleases? Sequence-specific retrotransposons other than R2 are found in only a few organisms. R2 is therefore well suited for investigating the evolution of sequence specificity in non-LTR retrotransposons. In this study, we traced the sequence specificity of R2 back more than 850 Myr. Further studies on the distribution of R2 may yield clues to the origin of sequence specificity and the original properties of non-LTR retrotransposons.

    Acknowledgements

    We thank Taku Hibino, Atsuo Nishino, Noriko Fujikawa, Souichirou Kubota, and Hiroko Fujiwara for providing genomic DNA samples. This work was supported by grants from the Ministry of Education, Science, and Culture of Japan (MESCJ) and by a Grant-in-Aid from the Research for the Future Program and Research Fellowships for Young Scientists of the Japan Society for the Promotion of Science (JSPS).

    References

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Anzai, T., H. Takahashi, and H. Fujiwara. 2001. Sequence-specific recognition and cleavage of telomeric repeat (TTAGG)n by endonuclease of non-long terminal repeat retrotransposon TRAS1. Mol. Cell. Biol. 21:100–108.

    Aparicio, S., J. Chapman, E. Stupka et al. (41 co-authors). 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301–1310.

    Ben Ali, A., J. Wuyts, R. De Wachter, A. Meyer, and Y. Van de Peer. 1999. Construction of a variability map for eukaryotic large subunit ribosomal RNA. Nucleic Acids Res. 27:2825–2831.

    Burke, W. D., C. C. Calalang, and T. H. Eickbush. 1987. The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme. Mol. Cell. Biol. 7:2221–2230.

    Burke, W. D., D. G. Eickbush, Y. Xiong, J. Jakubczak, and T. H. Eickbush. 1993. Sequence relationship of retrotransposable elements R1 and R2 within and between divergent insect species. Mol. Biol. Evol. 10:163–185.

    Burke, W. D., H. S. Malik, J. P. Jones, and T. H. Eickbush. 1999. The domain structure and retrotransposition mechanism of R2 elements are conserved throughout arthropods. Mol. Biol. Evol. 16:502–511.

    Burke, W. D., H. S. Malik, W. C. Lathe III, and T. H. Eickbush. 1998. Are retrotransposons long-term hitchhikers? Nature 392:141–142.

    Christensen, S., G. Pont-Kingdon, and D. Carroll. 2000. Target specificity of the endonuclease from the Xenopus laevis non-long terminal repeat retrotransposon, Tx1L. Mol. Cell. Biol. 20:1219–1226.

    Courseaux, A., and J. L. Nahon. 2001. Birth of two chimeric genes in the Hominidae lineage. Science 291:1293–1297.

    Eickbush, D. G., D. D. Luan, and T. H. Eickbush. 2000. Integration of Bombyx mori R2 sequences into the 28S ribosomal RNA genes of Drosophila melanogaster. Mol. Cell. Biol. 20:213–223.

    Feng, D. F., G. Cho, and R. F. Doolittle. 1997. Determining divergence times with a protein clock: update and reevaluation. Proc. Natl. Acad. Sci. USA 94:13028–13033.

    Feng, Q., G. Schumann, and J. D. Boeke. 1998. Retrotransposon R1Bm endonuclease cleaves the target sequence. Proc. Natl. Acad. Sci. USA 95:2083–2088.

    Fujiwara, H., T. Ogura, N. Takada, N. Miyajima, H. Ishikawa, and H. Maekawa. 1984. Introns and their flanking sequences of Bombyx mori rDNA. Nucleic Acids Res. 12:6861–6869.

    Jakubczak, J. L., W. D. Burke, and T. H. Eickbush. 1991. Retrotransposable elements R1 and R2 interrupt the rRNA genes of most insects. Proc. Natl. Acad. Sci. USA 88:3295–3299.

    Jakubczak, J. L., Y. Xiong, and T. H. Eickbush. 1990. Type I (R1) and type II (R2) ribosomal DNA insertions of Drosophila melanogaster are retrotransposable elements closely related to those of Bombyx mori. J. Mol. Biol. 212:37–52.

    Jakubczak, J. L., M. K. Zenni, R. C. Woodruff, and T. H. Eickbush. 1992. Turnover of R1 (type I) and R2 (type II) retrotransposable elements in the ribosomal DNA of Drosophila melanogaster. Genetics 131:129–142.

    Kapitonov, V. V., and J. Jurka. 2003. The esterase and PHD domains in CR1-like non-LTR retrotransposons. Mol. Biol. Evol. 20:38–46.

    Kojima, K. K., and H. Fujiwara. 2003. Evolution of target specificity in R1 clade non-LTR retrotransposons. Mol. Biol. Evol. 20:351–361.

    ———. 2004. Cross-genome screening of novel sequence-specific non-LTR retrotransposons: various multicopy RNA genes and microsatellites are selected as targets. Mol. Biol. Evol. 21:207–217.

    Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244–1245.

    Kumazawa, Y., M. Yamaguchi, and M. Nishida. 1999. Mitochondrial molecular clocks and the origin of euteleostean biodiversity: familial radiation of perciforms may have predated the Cretaceous/Tertiary boundary. Pp. 35–52 in M. Kato, ed. The biology of biodiversity. Springer-Verlag, Tokyo, Japan.

    Lander, E. S., L. M. Linton, B. Birren et al. (100 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.

    Lovšin, N., F. Gubenšek, and D. Kordiš. 2001. Evolutionary dynamics in a novel L2 clade of non-LTR retrotransposons in Deuterostomia. Mol. Biol. Evol. 18:2213–2224.

    Malik, H. S., W. D. Burke, and T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793–805.

    Miki, Y. 1998. Retrotransposal integration of mobile genetic elements in human diseases. J. Hum. Genet. 43:77–84.

    Perez-Gonzalez, C. E., and T. H. Eickbush. 2001. Dynamics of R1 and R2 elements in the rDNA locus of Drosophila simulans. Genetics 158:1557–1567.

    ———. 2002. Rates of R1 and R2 retrotransposition and elimination from the rDNA locus of Drosophila melanogaster. Genetics 162:799–811.

    Roiha, H., J. R. Miller, L. C. Woods, and D. M. Glover. 1981. Arrangements and rearrangements of sequences flanking the two types of rDNA insertion in D. melanogaster. Nature 290:749–753.

    Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (222 co-authors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.

    Xiong, Y. E., and T. H. Eickbush. 1988. Functional expression of a sequence-specific endonuclease encoded by the retrotransposon R2Bm. Cell 55:235–246.

    Župunski, V., F. Gubenšek, and D. Kordiš. 2001. Evolutionary dynamics and evolutionary history in the RTE clade of non-LTR retrotransposons. Mol. Biol. Evol. 18:1849–1863.

    (Kenji K. Kojima and Haru)