Source: 《分子生物学进展》, 2005, issue 3
Recurrent Recruitment of the THAP DNA-Binding Domain and Molecular Domestication of the P-Transposable Element
     Laboratoire Dynamique du Génome et Evolution, Institut Jacques Monod, Universités Paris 6 et 7, Paris, France

    Correspondence: E-mail: hq@ccr.jussieu.fr.

    Abstract

    The recently described THAP domain is a DNA-binding domain (DBD) that is widely conserved in humans and other animals. It resembles the DBD of the P element transposase of D. melanogaster. We show here that the Drosophila P neogenes derived from P-transposable elements conserve the THAP domain. Moreover, secondary rearrangements by exon shuffling indicate the recurrent recruitment of this domain by the host genome. As P sequences and THAP genes are found together in many animal genomes, we discuss the possibility that the THAP proteins have acquired their domain as a result of recurrent molecular domestication of P-transposable elements.

    Key Words: transposable elements; molecular evolution; bioinformatic analyses

    Introduction

    A novel evolutionarily conserved protein domain, the THAP (thanatos-associated protein) domain, was described by Roussigne et al. (2003b). It defines a new family of cellular factors, the THAP proteins. This domain is well conserved not only in vertebrates (human, mouse, rat, pig, cow, chicken, Xenopus, and zebrafish) but also in worms and Drosophila. Moreover, these authors highlighted the similarities between the THAP domain and the DNA-binding domain (DBD) of the P element transposase of Drosophila melanogaster. In this article, we show that the THAP domain is also present in the proteins encoded by the Drosophilidae P neogenes derived from P-transposable elements (Paricio et al. 1991; Miller et al. 1995; Nouaud and Anxolabéhère 1997; Nouaud et al. 1999; Nouaud, Quesneville, and Anxolabéhère 2003). These P neogenes represent the transition of a former genomic parasite to a stationary gene beneficial to the host, a process designated molecular domestication by Miller et al. (1992). They have been described as recurrent molecular domestications of P elements. Moreover, in some species, the neogene has undergone secondary rearrangements by exon shuffling involving another P transposon, resulting in duplication of the THAP domain. Taken together, these data point to repeated recruitment by the host of the DNA-binding function of the P transposon.

    In addition, the THAP domain is found in many proteins belonging to numerous taxa, including vertebrates. One of the human THAP proteins, THAP9, exhibits a significant degree of similarity with the P element transposase (Roussigne et al. 2003b). By describing the structure of the THAP9 genomic sequence and comparing it with that of the Phsa element sequence previously described by Hagemann and Pinsker (2001), we show that they are not paralogs but correspond to the same locus. Consequently, we show that Phsa is the THAP9 gene incompletely described at its 5' end and that the THAP9 coding sequence overlaps the full-length coding sequence of the P element transposase. In this work, we use the observation of Roussigne et al. (2003b) and the well-documented Drosophila P neogene story to analyze the evolutionary and functional relationships between the THAP domain found in multiple proteins belonging to numerous taxa and the DBD of the P element transposase. We show how the origin of the THAP proteins can be understood as a domestication process similar to that of the P neogenes in Drosophila.

    Materials and Methods

    THAP9 Survey

    Searches for THAP9 homologs were performed with the TBlastN procedure (Altschul et al. 1990) applied to whole-genome, cDNA, or EST databanks on the following Web sites: http://www.ncbi.nlm.nih.gov (human, Macaca mulatta, Sus scrofa, Bos taurus); http://www.ensembl.org (Pan troglodytes, Gallus gallus); http://vega.sanger.ac.uk/Danio_rerio/blastview (Danio rerio); http://genome.jgi-psf.org/ciona4/ciona4.home.html (Ciona intestinalis, Ciona savignyi); and http://flybase.bio.indiana.edu/blast/ (Drosophila melanogaster). All TBlastN searches were performed before April 2004.

    Alignments

    Global alignments were obtained with the ALIGN program from the Fasta package (Pearson and Lipman 1988) using the default parameters. All amino acid multiple alignments were performed with ClustalW (Higgins, Thompson, and Gibson 1996) and then optimized by hand to remove nonsignificant gaps using the GeneDoc software (Nicholas and Nicholas 1997).
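
    To make explicit what a percentage of identity over a global alignment measures, the following is a minimal sketch of a Needleman-Wunsch global alignment followed by an identity count. It is not the ALIGN program: the match, mismatch, and linear gap scores are illustrative assumptions, whereas ALIGN relies on a substitution matrix and its own gap penalties, and the function names are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    The scoring values are illustrative assumptions, not ALIGN's defaults.
    Returns the two aligned strings, padded with '-' where gaps occur.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


# Toy peptides, not real sequences.
aligned = global_align("ACDEF", "ACEF")
print(aligned, percent_identity(*aligned))
```

    Dividing by the full alignment length, gaps included, is one convention among several; dividing by the shorter sequence length would give a different figure for the same alignment.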

    Results and Discussion

    Independent Repeated Recruitment of the DNA-binding THAP Domain from P-Transposable Elements in the Drosophila Genus

    Roussigne et al. (2003b) show a multiple alignment of THAP domains with that of the D. melanogaster P element transposase. To confirm its presence in other distant P element proteins, we added six distant transposases and 14 P neogene proteins derived from four independent molecular domestication events of P elements (Paricio et al. 1991; Nouaud and Anxolabéhère 1997; Nouaud, Quesneville, and Anxolabéhère 2003). The levels of similarity between the P element transposase of D. melanogaster and these P proteins ranged between 52.2% and 82.8%. Although these proteins are related, it is not obvious that they have all conserved the same DNA-binding domain, the THAP domain. Moreover, the P neogenes have lost their transposase activity and have probably acquired new functions. Because they have not retained all the features of a mobile P-transposable element, one may ask whether they have conserved a functional THAP domain.

    A PSI-Blast procedure (Altschul et al. 1997) was used to test for the presence of a THAP domain in all the known P element transposases and P neogene proteins. The procedure was initialized with an alignment obtained by ClustalW from the well-characterized human THAP domains (THAP0, AF081567; THAP1, BC021721; THAP2, BC008358; THAP3, BC022081; THAP4, AF258556; THAP5, XP_095114; THAP6, BC022989; THAP7, BC004346; THAP8, AK057453; THAP9, AK091412; THAP10, AL360202; and THAP11, AAH12182) and those of D. melanogaster (CG10431, AAF53730; CG6689, AAF54607; LD47616, AAK93448; CG13894, AAF47408; and DIP2, AAF31699), excluding the P THAP domains (figure 1A and B in Supplementary Material online). This alignment was run as a query against all 37 THAP domain proteins, including those described by Roussigne et al. (2003b) and those from six P element transposases and 14 P neogene proteins.

    At the second iteration, all the P element transposases and P neogene proteins attained significant E-values ranging from 5 × 10⁻¹⁴ to 2 × 10⁻⁸. We conclude from these data that the first 75 to 77 residues of all the P element transposases and P neogene proteins that we have analyzed include a true THAP domain.

    The NH2-terminal regions of THAP proteins (12 from human and five from Drosophila) were aligned with the NH2-terminal regions of the seven Drosophila P element transposases and with the 14 P neogene proteins (figure 1 in Supplementary Material online). The P element transposase and P neogene protein NH2-terminal regions present all the characteristics of the THAP domain as defined by Roussigne et al. (2003b): (1) they are located at the N terminus of the proteins, (2) they are about 90 residues in size, (3) they have a C2CH signature (consensus Cys-Xaa(2–4)-Cys-Xaa(35–50)-Cys-His), (4) they have additional key residues that are strictly conserved in all the THAP domains (proline, tryptophan, and phenylalanine, 9 to 10 and 20 to 22 residues from each other, respectively), and (5) they have a C-terminal AVP box and several other conserved amino acid positions with distinct physicochemical properties.
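
    As an illustration of criterion (3), the C2CH signature can be written as a regular expression and searched for in the NH2-terminal part of a protein. This is a hedged sketch, not the procedure used in this study (which relied on PSI-Blast and multiple alignments). The function names, the 90-residue window, and the `cys_his_gap` parameter are hypothetical; the default spacing follows the consensus as printed above.

```python
import re


def c2ch_pattern(cys_his_gap=(0, 0)):
    """Build a regex for the C2CH consensus quoted in the text.

    cys_his_gap is a hypothetical knob (not from the article): the allowed
    number of residues between the last Cys and the His. The default (0, 0)
    follows the consensus exactly as printed above.
    """
    low, high = cys_his_gap
    return re.compile(r"C.{2,4}C.{35,50}C.{%d,%d}H" % (low, high))


def find_c2ch(protein, window=90, cys_his_gap=(0, 0)):
    """Return (start, end) of the first C2CH match in the N-terminal window.

    The window of 90 residues mirrors the approximate domain size given in
    the text; it is an assumption of this sketch. Returns None if no match.
    """
    match = c2ch_pattern(cys_his_gap).search(protein[:window])
    return (match.start(), match.end()) if match else None


# Synthetic toy sequence (not a real THAP domain): spacers of 3 and 40 residues.
toy = "MA" + "C" + "AAA" + "C" + "G" * 40 + "CH" + "K" * 20
print(find_c2ch(toy))
```

    A regular expression only checks the spacing of the four zinc-coordinating residues; it says nothing about the conserved Pro, Trp, and Phe positions or the AVP box, so a match is a necessary rather than a sufficient indication of a THAP domain.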

    The presence of a THAP domain in functionally identified cellular proteins, such as THAP0, THAP1 (Deiss et al. 1995; Roussigne et al. 2003a), and DIP-2 (Bhaskar et al. 2000), allows us to hypothesize that the recurrent P element domestications may owe their success to this functional property. This hypothesis is strongly supported by the exon shuffling events that have been specifically limited to the exon corresponding to the THAP domain of the montium neoproteins (Nouaud, Quesneville, and Anxolabéhère 2003). Figure 1 shows the structures of the domesticated P elements. The autonomous mobile P elements can encode an 87-kDa transposase and a 66-kDa repressor protein (fig. 1A; for a review, see Rio [2002]), whereas the P neogenes have only conserved the capacity to encode a 66-kDa repressor-like protein specified by the first three exons of the transposase (fig. 1B). The P neogenes result from two independent P element molecular domestication events. The first event occurred more than 20 MYA in an ancestor species of the Drosophila montium subgroup of species; these P neogenes have been found in 18 species (Nouaud and Anxolabéhère 1997; Nouaud et al. 1999). The other domestication event arose less than 5 MYA in the ancestor of a triad of Drosophila species belonging to the subobscura subgroup (Paricio et al. 1991; Miller et al. 1995). In both cases, the P element–related neogenes have lost the terminal inverted repeats (TIRs) (or conserved a skeleton of one of them) and lack the exon 3 specific to the transposase. They are present as a single copy at the same genomic location in the species of the montium subgroup and are tandemly repeated in a cluster of 10 to 50 copies in the subobscura subgroup of species.

    FIG. 1. The recurrence of P domestications is associated with the THAP domain. (A) The canonical P-transposable element of D. melanogaster. The transposase and repressor proteins result from the germline-specific alternative splicing of the intron between exons 2 and 3. (B) The Drosophila P neogenes. Alternative splicing results in the production of three proteins: the RL1 protein (exon 0 + exon 1 + exon 2), the RL2 protein (exon 0' + exon 1 + exon 2), and a small protein (exon 0) (Nouaud, Quesneville, and Anxolabéhère 2003). The first exon (exon –1) is noncoding (Nouaud et al. 1999). Each texture corresponds to a distinct P subfamily: M-type (gray boxes), T-type (open boxes), K-type (striped boxes), G-type (angled bar boxes), A-type (dotted boxes). (Accession numbers: D. melanogaster, X06779; D. tsacasi, AF016036H; D. bocqueti, AF169142; D. burlai, AY116626; D. vulcana, AY116625; D. guanche, L32023; D. subobscura, X60436; and D. madeirensis, X79804.)

    These two types of P neogenes derive from the same ancestral P element family as the result of two independent transposition events at distinct genomic locations. Both insertions have undergone structural modifications leading to their immobilization and changing their cis-regulatory sequences.

    In the case of the montium P neogene, this genomic modification was accompanied by the formation of a new untranslated exon (exon –1). The two types of P neogenes have recruited flanking genomic sequences as new regulatory regions that may result in different expression patterns, leading to distinct novel functions of the proteins. Remarkably, in both cases, the neogenes have subsequently undergone secondary duplications. (1) Two independent exon-shuffling events have taken place within the montium P neogenes. They result from the "capture" of an additional exon 0 (known as exon 0') (fig. 1B) derived from a distant P element family, the K-boc family (Nouaud, Quesneville, and Anxolabéhère 2003). In the clade grouping D. bocqueti and D. burlai, the P neogene has retained this new exon downstream of exon 0. In the clade grouping D. vulcana and D. malagassya, exon 0' is located upstream of exon 0. The montium P neogenes that have an additional exon encode two putative proteins, depending on alternative splicing; they share the same COOH-terminal region but show wide divergence in their NH2-terminal part. (2) In the subobscura subgroup, the P neogene modifications began with a primary duplication giving rise to two neogenes, the G-type and the A-type, followed by specific amplification of the A-type, which resulted in a cluster of 10 to 50 copies (fig. 1B). Consequently, the P neogenes encode at least four different THAP proteins: two in the montium subgroup, referred to hereafter as the RL1 and RL2 THAP proteins, and two in the subobscura subgroup, referred to as the G and A THAP proteins.

    It should be noted that the P neogene proteins that coexist within the same genome contain THAP domains that originated from different P-transposable element subfamilies. Indeed, the THAP regions of exons 0 and 0' are widely divergent in D. bocqueti (54.2% similarity) and in D. vulcana (53.8% similarity). The divergence is less pronounced between the exons 0 of the G-type and A-type neogenes in the subobscura subgroup, with 83.8% and 85.0% similarity in D. subobscura and D. guanche, respectively. This finding suggests that each P neogene protein could have specific genomic binding sites, which implies that the THAP regions are able to diversify their targets and, consequently, modify the function of the resulting domesticated proteins.

    P Neogene in the Human Genome

    A P homologous protein has already been reported in the human genome (Phsa, cDNA accession number AK026973) by Hagemann and Pinsker (2001). Is it the same sequence as THAP9 or a paralog? A BlastN search of the entire human genomic sequence with the cDNA sequence corresponding to the THAP9 protein (accession number AK091412) reveals only one region with significant matches, at 100% identity, corresponding to the genomic region of the Phsa described by Hagemann and Pinsker (2001). The Phsa genomic sequence is indeed part of the THAP9 gene. Figure 2 shows the genomic structure of the THAP9 gene compared with that of the D. melanogaster P element.

    FIG. 2. Comparison of the coding regions in the canonical P-transposable element and in the human and Pan troglodytes THAP9 genes. The nucleotide lengths of the exons (bold) and introns (italic) are indicated. The nucleotide length of the human THAP9 gene is calculated from the start codon to the stop codon positions on the genomic sequence. Hatched boxes represent the 3' UTR of the human and P. troglodytes THAP9 genes. (Accession numbers: human THAP9, AK091412; THAP9 Pan troglodytes, Scaffold_37596.)

    The THAP9 genomic sequence encodes a 903-residue protein that matches the P element transposases throughout and exhibits 21% identity (global alignment obtained by the ALIGN program) with that of D. melanogaster. The absence of any P sequence other than that of the THAP9 gene in the human genome and the absence of inverted repeats, even in skeletal form, at the 5' and 3' extremities of the THAP9 gene suggest that this gene emerged from a domestication event of a P-transposable element, as in the montium subgroup. Alternatively, a bona fide gene encoding a protein with THAP-like and endonuclease domains could be the genomic ancestor of both the THAP9 gene and the P-transposable elements.

    A THAP9 orthologous gene is detected in the syntenic region of the Pan troglodytes genome (99.2% identity at the nucleotide level [fig. 2]): it is located between two genes that are orthologous to those flanking the human THAP9 gene. However, no THAP9 orthologous gene has been detected inside this syntenic region in the genomes of either Mus musculus or Rattus norvegicus (chromosomal localizations at 4q21.3 in human, at 5E4 in mouse, and at 14p22 in rat). This observation suggests that the THAP9 gene originated from a domestication event of a copy of a P element that took place in the human lineage after it had diverged from the rodent lineage.

    From Where Did the THAP9 Protein Come?

    TBlastN searches (GenBank-nr, cDNA, and EST databases) using the THAP9 protein as a query detect numerous sequences that, among insects, are restricted to the dipteran order and that are widely distributed in chordate genomes. Some of them match only the NH2-terminal region, corresponding to the THAP domain, whereas others (fig. 3) present significant levels of similarity with a substantial part of the protein. Some of the species with a THAP9 homologous sequence in their genome also bear paralogous THAP9 sequences. In Drosophilidae and Anopheles, they clearly correspond to P-transposable elements. In the genome of Danio rerio, six different THAP9-like sequences are found, and in the protochordate genomes of Ciona savignyi and Ciona intestinalis, multiple homologous THAP9 sequences can also be identified. Note that Ciona represents a very basal chordate lineage. These THAP9-like sequences could correspond to ancient families of P-transposable elements, even though no TIRs can be detected in the sequences flanking their coding region. For the other species presenting THAP9 homologous sequences (Macaca mulatta, Sus scrofa, Bos taurus, and Gallus gallus), only cDNA or EST databases are available, so no fossil homologous sequences can be detected. However, assuming that the THAP9 protein is derived from a P element transposase, the close similarity between the human THAP9 protein and that of each of these species allows us to suppose that these genomes might also have contained P-like transposable-element sequences in the course of their evolution.

    FIG. 3. The THAP9 homologous gene family. The symbol ($) indicates partial or complete coding regions homologous to the THAP9 gene detected in whole genomes or cDNA databanks by a TBlastN search using the human THAP9 protein (accession number NP_078948) as a query. The THAP domain is shown in gray. The lengths are proportional to the overlap between each sequence and the THAP9 protein. The symbol () indicates the percentages of identity calculated by comparing each peptide sequence with the human THAP9 protein. The symbol (*) indicates a protein deduced from the overlapping of four cDNAs.

    The homologous THAP9 proteins are patchily distributed across the animal kingdom: they are found in species belonging to genera as distant as Anopheles, Drosophila, Ciona, Danio, and Homo, but were not detected in the genomes of Mus musculus and Rattus norvegicus. This pattern suggests that recurrent domestications of the P element transposase have occurred, as they have in the Drosophila lineages. However, the high level of identity of the THAP9 protein in Homo sapiens, Pan troglodytes, Macaca mulatta, Sus scrofa, and Bos taurus suggests that these proteins are orthologous and thus derive from a domestication event that arose in a common ancestor. Nevertheless, the hypothesis that the THAP9 domestication actually occurred earlier and that the gene has been lost in the lineage leading to rodents cannot be discarded.

    The phylogenetic discontinuities of the THAP9-related sequence distribution could also be explained by horizontal transfers. These sequences have been detected only in insects and in chordate lineages. In the other phyla of the animal kingdom, the presence of THAP9-related sequences has not been established, but this failure may be caused by high sequence divergence among these distant lineages. (Note that we have not detected any P homologous sequence either in the complete sequence of C. elegans or in Apis mellifera.) In insects, the occurrence of THAP9-related sequences corresponds only to P transposons and their domesticated derivatives. In fact, in dipteran lineages the distribution of these sequences may result from both horizontal transfer and vertical transmission. In chordate lineages, the THAP9-related sequences are found as stable genomic components (no terminal inverted repeats detected) and may represent vertical descendants of an ancestral chordate gene. Their absence in rodents can be explained by gene loss in that order. Taking into consideration the similarity between insect P elements and chordate THAP9 sequences, it can be postulated that a single horizontal transfer between insects and chordates may have occurred. A transfer from insects to chordates would have taken place in the very distant past, before the split separating tunicates and vertebrates. A transfer in the opposite direction, from chordates to insects, could have happened more recently. However, in this case, the sequence can only have become a transposon after entering the insect lineage, because no traces of mobile THAP9 sequences have been detected in chordates so far.

    From Where Did the THAP Domain Come?

    Are the THAP domains present in numerous chordate proteins derived from a P element? In humans, the COOH-terminal regions of the THAP proteins from THAP0 to THAP11 are not similar (data not shown), whereas the THAP domains of their NH2-terminal regions do appear to be related. We may therefore wonder whether recurrent exon shuffling events from the P element have also occurred several times during evolution outside the Drosophilidae lineage, as has been described in the montium subgroup of species (Nouaud, Quesneville, and Anxolabéhère 2003). The THAP protein family could have emerged as a result of exon shuffling by transposition of the first exon of P-related elements into the 5' region of pre-THAP genes. In other words, the THAP protein family could have acquired the DBD from molecular domestication of different P-transposable element families, several times, not only in Drosophilidae but also during chordate evolution. An alternative hypothesis would involve a THAP gene as a donor of the THAP domain by exon shuffling. Interestingly, our TBlastN searches have not identified any THAP domain in plants. Are plant domains too distant to be recognized by comparison with the animal THAP domain, or are they really absent? Our hypothesis that the THAP proteins may have emerged as a result of exon shuffling from P-related sequences could explain the absence of THAP domains in plants, as no P element has yet been found in this kingdom.

    Even though this hypothesis is appealing and appears to be supported by several observations, we cannot rule out the alternative hypothesis that the THAP domains have arisen several times independently. However, evidence in favor of a phylogenetic relationship between the THAP domains and the first exon of the P sequences is provided by the high identity values calculated from the multiple alignment of the THAP domain regions (Supplementary Material online). Under a convergence scenario, we would expect more equivalent residues (deduced from a similarity matrix) than truly identical ones, as convergence would tend to produce residues with similar biochemical properties rather than the same residues. We observe 43 identical and 15 similar residues between THAP9 and THAP7, and 39 identical and 22 similar residues between THAP9 and THAP6. This argues in favor of a common ancestor of the THAP9, THAP6, and THAP7 DBDs.
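
    The distinction between identical and similar residues can be made concrete by classifying the columns of a pairwise alignment. The sketch below is only an illustration: the similarity classes are a hypothetical grouping chosen for the example (the similarity matrix actually used is not specified in this article), and the function name is likewise an assumption.

```python
# Hypothetical similarity classes, chosen for illustration only; the article
# does not state which similarity matrix was used.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def classify_columns(aln_a, aln_b, groups=SIMILAR_GROUPS):
    """Count identical and similar (non-identical, same class) columns.

    aln_a and aln_b are two rows of an alignment of equal length, with '-'
    marking gaps. Columns containing a gap are ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    group_of = {res: idx for idx, grp in enumerate(groups) for res in grp}
    identical = similar = 0
    for x, y in zip(aln_a, aln_b):
        if "-" in (x, y):
            continue
        if x == y:
            identical += 1
        elif x in group_of and group_of[x] == group_of.get(y):
            similar += 1
    return identical, similar


# Toy alignment rows, not real THAP sequences.
print(classify_columns("ACDKL-F", "ACEKIGW"))
```

    With counts such as those reported above (43 identical versus 15 similar, 39 identical versus 22 similar), the identical class clearly dominates, which is the pattern the argument against convergence relies on.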

    Acknowledgements

    This work was supported by the Centre National de la Recherche Scientifique (CNRS) and the Universities P. and M. Curie and D. Diderot (Institut Jacques Monod, UMR 7592, Dynamique du Génome et Evolution).

    References

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Bhaskar, V., S. A. Valentine, and A. J. Courey. 2000. A functional interaction between dorsal and components of the Smt3 conjugation machinery. J. Biol. Chem. 275:4033–4040.

    Deiss, L. P., E. Feinstein, H. Berissi, O. Cohen, and A. Kimchi. 1995. Identification of a novel serine/threonine kinase and a novel 15-kDa protein as potential mediators of the gamma interferon-induced cell death. Genes Dev. 9:15–30.

    Hagemann, S., and W. Pinsker. 2001. Drosophila P transposons in the human genome? Mol. Biol. Evol. 18:1979–1982.

    Higgins, D. G., J. D. Thompson, and T. J. Gibson. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:383–402.

    Miller, W. J., S. Hagemann, E. Reiter, and W. Pinsker. 1992. P element homologous sequences are tandemly repeated in the genome of D. guanche. Proc. Natl. Acad. Sci. USA 89:4018–4022.

    Miller, W. J., N. Paricio, S. Hagemann, M. J. Martinez-Sebastian, and W. Pinsker. 1995. Structure and expression of clustered P element homologues in D. subobscura and D. guanche. Gene 156:167–174.

    Nicholas, K. B., and H. B. Nicholas Jr. 1997. GeneDoc: a tool for editing and annotating multiple sequence alignments. Distributed by the authors. www.psc.edu/biomed/genedoc.

    Nouaud, D., and D. Anxolabéhère. 1997. P element domestication: a stationary truncated P element may encode a 66kDa repressor like protein in Drosophila montium species subgroup. Mol. Biol. Evol. 14:1132–1144.

    Nouaud, D., B. Boeda, L. Levy, and D. Anxolabéhère. 1999. A P element has induced intron formation in Drosophila. Mol. Biol. Evol. 16:1503–1510.

    Nouaud, D., H. Quesneville, and D. Anxolabéhère. 2003. Recurrent exon shuffling between distant P element families. Mol. Biol. Evol. 20:190–199.

    Paricio, N. M., M. Perez-Alonso, M. J. Martinez-Sebastian, and R. De Frutos. 1991. P sequences of Drosophila subobscura lack exon 3 and may encode a 66 kDa repressor like protein. Nucleic Acids Res. 19:6713–6718.

    Pearson, W. R., and D. J. Lipman. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444–2448.

    Rio, D. 2002. P transposable elements in Drosophila melanogaster. Pp. 484–518 in N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II. ASM Press, Washington, DC.

    Roussigne, M., C. Cayrol, T. Clouaire, F. Amalric, and J. P. Girard. 2003a. THAP1 is a nuclear proapoptotic factor that links prostate-apoptosis-response-4 (Par-4) to PML nuclear bodies. Oncogene 22:2432–2442.

    Roussigne, M., S. Kossida, A. C. Lavigne, T. Clouaire, V. Ecochard, A. Glories, F. Amalric, and J. P. Girard. 2003b. The THAP domain: a novel protein motif with similarity to the DNA-binding domain of P element transposase. Trends Biochem. Sci. 28:66–69.

    (H. Quesneville, D. Nouaud)