Pre-messenger RNA Processing Factors in the Drosophila Genome
http://www.100md.com (Journal of Cell Biology)
     a Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, College Park, Maryland 20742-5815

    b Department of Genetics, Case Western Reserve University, Cleveland, OH 44106-4955

    Correspondence to: Stephen M. Mount, Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, College Park, MD 20742-5815. Tel:(301) 405-6934 Fax:301-314-9081 E-mail:sm193@umail.umd.edu.

    In eukaryotes, messenger RNAs are generated by a process that includes coordinated splicing and 3' end formation. Factors essential for the splicing of mRNA precursors (pre-mRNA) in eukaryotes have been identified primarily through the study of nuclear extracts derived from mammalian cells and Saccharomyces cerevisiae genetics. Here, we identify homologues of most known pre-mRNA processing factors in the recently completed sequence of the Drosophila genome. The set of proteins required for RNA processing shows remarkably little variation among eukaryotic species, and individual proteins are highly conserved. In general, proteins involved in the mechanics of RNA processing are even more conserved than proteins involved in the interpretation of RNA processing signals. The genome does not appear to contain a gene for the U11 RNA, or for a protein unique to the U11 snRNP, which raises the possibility that the U12-dependent spliceosome functions without U11 in Drosophila.

    Introduction

    Most RNA processing factors have been identified either in nuclear splicing extracts derived from mammalian cells or in Saccharomyces cerevisiae (Burge et al. 1999; Kambach et al. 1999; Minvielle-Sebastia and Keller 1999). However, Drosophila is extensively used for genetic investigations of complex and regulated splicing. In this review, we survey the recently completed Drosophila sequence (Adams et al. 2000) for sequences related to factors identified in these other systems. In many cases, functional data for the Drosophila protein are not available, and our assignments are based on the best match among genomes. We have not included genes that have been identified in Drosophila for which there is evidence of a role in splicing (see, for example, the list presented in Burnette et al. 1999). This analysis yields a list of 27 genes that encode small nuclear RNAs (snRNAs; see Table 1) and a list of 99 genes that encode proteins involved in RNA processing (see Table 2). Our survey confirms that the components of the RNA processing machinery are highly conserved. Very few factors identified in other species are absent from the Drosophila genome. In general, the Drosophila proteins are more closely related to their vertebrate counterparts than to the Saccharomyces cerevisiae proteins.

    Table 1. Genes for Drosophila snRNAs Involved in Splicing

    Table 2. Genes for Drosophila Proteins Involved in RNA Processing

    Methods

    Protein sequences of known yeast and human splicing factors were used to query the annotated set of predicted Drosophila proteins using BLASTP, and the nucleotide sequence of the genome using TBLASTN, on the NCBI server (Altschul et al. 1997; http://www.ncbi.nlm.nih.gov/). All identified Drosophila genes were then used to query the nonredundant database to establish the optimal yeast and human matches. Alignments were generated using the BLAST 2 Sequences option (Tatusova and Madden 1999) or LALIGN (Huang and Miller 1991). Cytological positions were taken from GadFly (http://hedgehog.lbl.gov:8000) or FlyBase (http://flybase.bio.indiana.edu/), or deduced from the positions of flanking genes.
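
    The two-direction search described above (known factors against Drosophila, then Drosophila candidates back against other genomes) can be illustrated with a reciprocal best-match check. The following is only a minimal sketch under stated assumptions: the function names, the sequence identifiers, and the expectation values are hypothetical, and the text does not say that a strict reciprocal criterion was applied.

```python
def best_hits(evalues):
    """Map each query to its lowest-E-value subject.

    `evalues` is a dict {(query, subject): expectation_value}.
    """
    best = {}
    for (query, subject), evalue in evalues.items():
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (query, subject) pairs that are each other's best match."""
    fwd = best_hits(forward)   # e.g. yeast protein -> best fly protein
    rev = best_hits(reverse)   # e.g. fly protein -> best yeast protein
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical toy data: identifiers and E-values are made up for illustration.
forward = {
    ("yeast_A", "fly_1"): 1e-50,
    ("yeast_A", "fly_2"): 1e-10,
    ("yeast_B", "fly_2"): 1e-5,
}
reverse = {
    ("fly_1", "yeast_A"): 1e-48,
    ("fly_2", "yeast_A"): 1e-12,
    ("fly_2", "yeast_B"): 1e-4,
}
pairs = reciprocal_best_hits(forward, reverse)
```

    In this toy case only yeast_A and fly_1 pair up; yeast_B's best fly match prefers yeast_A in the reverse search, which is the kind of situation in which no clear counterpart would be reported.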

    To identify snRNA genes, the Drosophila genome was queried using modified blastn parameters (parameter set A: -r 10 -q -11 -W 8 -G 100 -E 50; B: -r 10 -q -11 -W 7 -G 5 -E 20; C: -r 10 -q -11 -W 7 -G 15 -E 4; D: -r 7 -q -14 -W 7 -G 7 -E 3; and E: -r 4 -q -5 -W 8 -G 10 -E 2).
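
    For readers who want to reuse these settings, the five parameter sets can be kept as data and expanded into command lines. This is a sketch only. It assumes the flags are read as in the legacy NCBI blastall program (-r match reward, -q mismatch penalty, -W word size, -G gap opening cost, -E gap extension cost); the text does not state how the searches were actually launched, and the query and database names below are hypothetical placeholders.

```python
# Parameter sets A-E, copied from the Methods text.
PARAMETER_SETS = {
    "A": {"-r": 10, "-q": -11, "-W": 8, "-G": 100, "-E": 50},
    "B": {"-r": 10, "-q": -11, "-W": 7, "-G": 5, "-E": 20},
    "C": {"-r": 10, "-q": -11, "-W": 7, "-G": 15, "-E": 4},
    "D": {"-r": 7, "-q": -14, "-W": 7, "-G": 7, "-E": 3},
    "E": {"-r": 4, "-q": -5, "-W": 8, "-G": 10, "-E": 2},
}


def build_command(set_name, query="snRNA_query.fa", database="dmel_genome"):
    """Build an argument list for one parameter set.

    The executable name and the -p/-i/-d options assume legacy blastall;
    `query` and `database` are hypothetical file and database names.
    """
    args = ["blastall", "-p", "blastn", "-i", query, "-d", database]
    for flag, value in PARAMETER_SETS[set_name].items():
        args += [flag, str(value)]
    return args


# One command line per parameter set, in the order A to E.
commands = {name: " ".join(build_command(name)) for name in PARAMETER_SETS}
```

    Keeping the settings in one table makes it easy to rerun a query under each set in turn and to record which set produced a given hit.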

    A curated database containing these results will be available at http://www.wam.umd.edu/~smount/DmRNA factors/table.html.

    Results and Discussion

    Major and minor snRNP components

    U snRNAs.

    Two types of spliceosomes have been previously described (Burge et al. 1999). The more common U2-type spliceosome is responsible for splicing the majority of introns, and the U12-type spliceosome is responsible for splicing a minor class of rare introns (perhaps 0.1% in both humans and flies). The Drosophila genome contains multiple copies of each of the five U snRNAs found in the major class spliceosome. We found five genes for U1, six genes for U2, three genes for U4, seven genes for U5, and three genes for U6 (Table 1). With the exception of U4-25F and the U5 genes (which were previously known only by in situ hybridization), these genes had been described previously (Alonso et al. 1984; Das et al. 1987; Saba et al. 1986; Saluz et al. 1988; Lo and Mount 1990). The variant U4-25F has only 69% identity with the major form of fly U4 (Saba et al. 1986), and 68% with human U4. Although the possibility that these new snRNA genes are pseudogenes cannot be ruled out, they appear likely to be functional because of their highly conserved promoters. In the case of U4-25F, some of the variation includes compensatory changes that allow formation of conserved stem loop structures. There are four clusters of snRNA genes, including one at 38AB with two U2, one U4, and two U5 genes within 6 kb.
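
    As a quick consistency check, the gene counts given here can be tallied against the total of 27 snRNA genes stated in the Introduction. The sketch below uses only numbers from the text (the major class counts above, plus the single U12, U4atac, and U6atac genes reported for the minor spliceosome); the variable names are illustrative.

```python
# Gene counts for the major (U2-type) spliceosome snRNAs, from the text.
major_class = {"U1": 5, "U2": 6, "U4": 3, "U5": 7, "U6": 3}

# Gene counts for the minor (U12-type) spliceosome snRNAs; no U11 gene was found.
minor_class = {"U12": 1, "U4atac": 1, "U6atac": 1}

major_total = sum(major_class.values())
snrna_total = major_total + sum(minor_class.values())

# The Introduction reports 27 snRNA genes in total.
assert snrna_total == 27
```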

    The Drosophila genome also contains introns that resemble the minor class (or U12) introns first identified in mammals (Adams et al. 2000). These are recognized by the U12-type spliceosome, which includes the U11, U12, U4atac, and U6atac snRNAs in place of U1, U2, U4, and U6 (Hall and Padgett 1994; Tarn and Steitz 1996). Identifying snRNAs for the U12-type spliceosome in the genome required modification of the standard BLASTN parameters (see Methods). We found one gene each for the U12, U6atac, and U4atac snRNAs. These are almost certainly authentic genes, as critical sequences are conserved. In addition, the highly conserved snRNA promoter is present in each case, including a 9/10 or perfect match to the PSE consensus TAATTCCCAA, which lies ~52 nucleotides upstream of the transcription start (Jensen et al. 1998; Lo and Mount 1990). In contrast, no gene for the U11 snRNA was found. Consistent with the absence of a U11 snRNA, we also failed to find the U11 35-kD–specific protein (accession No. NP_008951; Will et al. 1999). In fact, the U11 snRNP, which functions in 5' splice site recognition, may not be required for splicing. The highly conserved minor class 5' splice sites could be recognized by an unknown protein that acts during the early steps of splicing, by the U6atac snRNA alone, or by both. This mechanism would be analogous to a situation seen in vitro, where certain vertebrate introns can be processed in the absence of U1 snRNP if the 5' splice sites can be recognized by U6 snRNA (Crispino et al. 1994; Tarn and Steitz 1994).
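
    The promoter criterion mentioned above (a 9/10 or perfect match to TAATTCCCAA) amounts to scanning upstream sequence for the consensus while tolerating at most one mismatch. A minimal sketch follows; the function name and the example sequences are hypothetical, and the sketch does not check the ~52-nucleotide spacing or any other promoter feature.

```python
PSE_CONSENSUS = "TAATTCCCAA"  # consensus quoted in the text


def find_pse_matches(sequence, consensus=PSE_CONSENSUS, max_mismatches=1):
    """Return (start, window, mismatches) for every window of the sequence
    that differs from the consensus at no more than `max_mismatches` positions.
    """
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical upstream fragments, made up for illustration.
perfect = find_pse_matches("GGCTAATTCCCAAGT")   # contains the exact consensus
nine_of_ten = find_pse_matches("TAATTCCGAA")    # one mismatch
too_divergent = find_pse_matches("TAATTGCGAA")  # two mismatches
```

    Reporting the start position of each hit makes it straightforward to add a distance filter relative to a known or predicted transcription start if one is available.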

    snRNP Proteins.

    Each snRNP contains a set of Sm core proteins shared with the other snRNPs and a set of proteins that are specific to that snRNP. All 15 known proteins of the Sm family were identified and are highly conserved. These include seven Sm proteins that bind to the U1, U2, U4, and U5 snRNPs (B, D1, D2, D3, E, F, and G); the seven related LSm proteins found in the U6 snRNP (LSM2-LSM8); and the CaSm/LSM1p protein (Bouveret et al. 2000; Tharun et al. 2000). These and subsequent matches are shown in Table 2. The table reports, for each Drosophila gene, the GenBank accession number and expectation value (the expected number of matches this good or better; Altschul et al. 1997) for the best human and yeast (Saccharomyces cerevisiae) match.

    In addition to the Sm proteins, each snRNP also contains a set of snRNP-specific proteins. As expected, orthologues of the proteins contained in both the vertebrate and Saccharomyces cerevisiae snRNPs are easily identified in Drosophila on the basis of their extensive sequence homology. The one exception is that a single Drosophila protein, encoded by the sans-fille (snf) gene, corresponds to both the U1 snRNP protein U1A and the U2 snRNP protein U2B'' (Polycarpou-Schwarz et al. 1996; Stitzinger et al. 1999); no additional homologues were found.

    Interestingly, the Saccharomyces cerevisiae U1 snRNP is more complex than the vertebrate U1 snRNP, with seven additional protein components that are not found in the purified vertebrate U1 snRNP (Gottschalk et al. 1998; Rigaut et al. 1999). Only two of these proteins, Luc7 and Prp40, have easily identifiable Drosophila orthologues. The Drosophila orthologue of Luc7, CG7564, is 33% identical to the entire Luc7 protein (Fortes et al. 1999), and a second Drosophila Luc7-related protein is 21% identical to the yeast protein (Fortes et al. 1999). We identified a single Prp40-like gene in the Drosophila database. CG3542 shares 23% identity with the entire yeast Prp40 protein and 41% sequence identity over its entire length of 757 amino acids with the human protein FBP11. FBP11 was initially identified because it also contains a tyrosine-rich WW domain and, like Prp40, it interacts with the splicing factor SF1 (Bedford et al. 1997). These observations suggest that the function of these proteins in forming bridges between 5' splice sites and the branchpoint may be conserved (Abovich and Rosbash 1997). It is likely that these Drosophila proteins, like their human homologues, would not be found in purified U1 snRNPs but nevertheless share a function with their yeast counterparts. A second human Prp40-like protein, FBP21, has been described in the literature (Bedford et al. 1998). FBP21 is more closely related to the Drosophila CG4291, with 28% identity over the entire length of the 338–amino acid protein. Because similarity between FBP11/CG3542 and FBP21/CG4291 is limited to the WW repeats, FBP21/CG4291 is unlikely to be related to Prp40. Consistent with this idea, human FBP21 has been found to stably associate with the U2 snRNPs and, therefore, may function at a later stage of spliceosome assembly than does Prp40 (Bedford et al. 1998).
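
    Percent identity figures such as those quoted above are derived from a pairwise alignment by counting identical columns. The sketch below shows one common way to compute the value from two already-aligned strings. It assumes that gaps are written as "-" and that the denominator is the number of aligned columns; alignment tools differ on the denominator, and the text does not state which convention was used. The example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            identical += 1
    return 100.0 * identical / columns if columns else 0.0


# Hypothetical six-column alignment: four identical columns out of six.
example = percent_identity("MKT-LV", "MRTALV")
```

    Because the result depends on the region aligned, the text is careful to state whether identity is over the entire protein or over a limited region such as the WW repeats.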

    Searches with the yeast U1 snRNP proteins Prp39 and Prp42 identified only a single homologous sequence. Prp39 and Prp42 belong to a family of TPR repeat proteins (McLean and Rymond 1998) and share 25% sequence identity with each other over a ~270–amino acid region that includes several copies of the TPR repeat motif. We identified a single Drosophila protein, encoded by the CG1646 gene, that shares 25% sequence identity with Prp39 and Prp42 over the same ~270 amino acids, and is the best match between the Saccharomyces cerevisiae and Drosophila genomes. The Drosophila crooked neck protein (Crn) is another TPR repeat protein; its yeast homologue has been shown to act later in spliceosome assembly (Chung et al. 1999).

    Surprisingly, there are three Saccharomyces cerevisiae U1 snRNP proteins that have no clear counterparts in the Drosophila database. We found no Drosophila proteins whose best match in the S. cerevisiae genome is Snu71, Snu56, or Nam8. Recent work on Snu56 and Nam8 suggests that these proteins contact the pre-mRNA directly and may anchor the U1 snRNP onto the substrate (Puig et al. 1999; Zhang and Rosbash 1999), a function that could be dispensable in metazoans because it could be provided by the SR proteins. Alternatively, a similar function may be provided by proteins that do not appear to be orthologues, such as Drosophila rox8 in the case of Nam8 (Drosophila rox8 matches three other yeast proteins better than Nam8). Proteins in the U2 snRNP, U5 snRNP, and U4/U6.U5 tri-snRNP are generally highly conserved, and no significant differences between Drosophila and other species were revealed by our analysis (Table 2).

    Proteins Required for Splice Site Selection

    SR proteins are splicing factors that contain either one or two characteristic RNA-binding domains and an RS domain. These proteins are among the earliest acting proteins in spliceosome assembly (Zahler et al. 1992; Graveley et al. 1999; Tacke and Manley 1999). There are 11 well-characterized mammalian SR proteins: 9G8, SRp20, ASF/SF2, SC35, SRp30c, p54, SRp40, SRp55, SRp75 (for review see Mount 1997), NSSR1, and NSSR2 (Komatsu et al. 1999). Individual SR proteins differ with respect to the sequence specificity of their RNA-binding domains, and with respect to their ability to recognize and activate different exonic splicing enhancer sequences. We have identified seven SR protein genes in the Drosophila genome. These include the previously described B52, RBP1, SRp54 (Kennedy et al. 1998), and X16/9G8 (Vorbruggen et al. 2000) genes, as well as Drosophila orthologues of ASF/SF2 and SC35. In addition, we have identified a novel gene, CG1987, that is 95% identical to RBP1.

    Phosphorylation of SR proteins is thought to play an important role in controlling spliceosome assembly (Stojdl and Bell 1999; Yeakley et al. 1999). Both SRPK and LAMMER (or CLK) kinases phosphorylate SR proteins. We have identified three kinases of the SRPK type (CG8174, CG9085, and CG8565) and only one LAMMER kinase, the previously described Doa kinase (Du et al. 1998).

    A variety of proteins bind to pre-mRNA (also known as hnRNA), and many of these proteins, defined as hnRNP proteins, have been shown to influence splicing, typically by inhibition of splicing events near their binding sites (Chen et al. 1999). A number of Drosophila hnRNP proteins have been described (e.g., Matunis et al. 1992), and some nuclear RNA-binding proteins without clear homologues in mammalian species have unambiguous roles in the regulation of splicing (e.g., SXL). However, because it is impossible to determine from sequence alone whether a given RNA-binding protein is likely to function in splicing, or even to reside in the nucleus, we have not undertaken an analysis of these proteins. These proteins are discussed in the accompanying article by Lasko (2000).

    Genome Contents: Parallels and Differences

    The results of our search for RNA processing factors known from studies in mammalian extracts and Saccharomyces cerevisiae genetics indicate that very few RNA processing factors are absent from the Drosophila genome. Indeed, our survey reveals remarkably little variation in this list among yeast, flies, and mammals. As expected, Drosophila proteins are more closely related to their vertebrate counterparts than to the Saccharomyces cerevisiae proteins.

    The extensive conservation of the components of the spliceosome between vertebrates and Drosophila supports the suggestion that the primary mode of regulating splicing takes place at the level of spliceosome assembly (Lopez 1998; Staley and Guthrie 1998). Some of these factors, such as SR proteins, which regulate assembly of the spliceosome on many different RNAs, are well conserved and are easily identifiable. Missing from these tables are the factors that regulate the splicing of specific RNAs. These factors are less likely to be well conserved and, indeed, some may prove to be organism specific (e.g., SXL and TRA). Even the variation we observe among proteins and RNAs with clearly established roles in splicing, per se, is weighted towards the early events in splice site selection. For example, the Drosophila genome is missing a set of U1 snRNP proteins that are found in the yeast U1 snRNP but not in the vertebrate U1 snRNP, and the genome does not appear to contain a gene for the U11 RNA, or for the single known protein unique to the U11 snRNP, suggesting that the U12-dependent spliceosome functions without U11 in Drosophila. Here again, variation is observed in components that function in splice site selection.

    References

    Abovich, N., Rosbash, M. 1997. Cross-intron bridging interactions in the yeast commitment complex are conserved in mammals. Cell 89:403-412.

    Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., Amanatides, P.G., Venter, J.C. 2000. The genomic sequence of Drosophila melanogaster. Science. 287:2185-2195.

    Alonso, A., Beck, E., Jorcano, J.L., Hovemann, B. 1984. Divergence of U2 snRNA sequences in the genome of D. melanogaster. Nucleic Acids Res 12:9543-9550.

    Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.

    Bedford, M.T., Chan, D.C., Leder, P. 1997. FBP WW domains and the Abl SH3 domain bind to a specific class of proline-rich ligands. EMBO (Eur. Mol. Biol. Organ.) J. 16:2376-2383.

    Bedford, M.T., Reed, R., Leder, P. 1998. WW domain-mediated interactions reveal a spliceosome-associated protein that binds a third class of proline-rich motif: The proline glycine and methionine-rich motif. Proc. Natl. Acad. Sci. USA 95:10602-10607.

    Bouveret, E., Rigaut, G., Shevchenko, A., Wilm, M., Seraphin, B. 2000. A Sm-like protein complex that participates in mRNA degradation. EMBO (Eur. Mol. Biol. Organ.) J. 19:1661-1671.

    Burge, C.B., Tuschl, T., Sharp, P.A. 1999. Splicing of precursors to mRNAs by the spliceosome. In Gesteland R.F., Atkins J.F., eds. The RNA World. Cold Spring Harbor, NY, Cold Spring Harbor Laboratory Press, 525-560.

    Burnette, J., Hatton, A.R., Lopez, A.J. 1999. Trans-acting factors required for inclusion of regulated exons in the ultrabithorax mRNAs of Drosophila melanogaster. Genetics 151:1517-1529.

    Chen, C.D., Kobayashi, R., Helfman, D.M. 1999. Binding of hnRNP H to an exonic splicing silencer is involved in the regulation of alternative splicing of the rat beta-tropomyosin gene. Genes Dev. 13:593-606.

    Chung, S., McLean, M.R., Rymond, B.C. 1999. Yeast ortholog of the Drosophila crooked neck protein promotes spliceosome assembly through stable U4/U6.U5 snRNP addition. RNA 5:1042-1054.

    Crispino, J., Blencowe, B.J., Sharp, P.A. 1994. Complementation by SR proteins of pre-mRNA splicing reactions depleted of U1 snRNP. Science. 265:1866-1869.

    Das, B., Henning, D., Reddy, R. 1987. Structure, organization, and transcription of Drosophila U6 small nuclear RNA genes. J. Biol. Chem. 262:1187-1193.

    Du, C., McGuffin, M.E., Dauwalder, B., Rabinow, L., Mattox, W. 1998. Protein phosphorylation plays an essential role in the regulation of alternative splicing and sex determination in Drosophila. Mol. Cell. 2:741-750.

    Fortes, P., Bilbao-Cortes, D., Fornerod, M., Rigaut, G., Raymond, W., Seraphin, B., Mattaj, I.W. 1999. Luc7p, a novel yeast U1 snRNP protein with a role in 5' splice site recognition. Genes Dev. 13:2425-2438.

    Gottschalk, A., Tang, J., Puig, O., Salgado, J., Neubauer, G., Colot, H.V., Mann, M., Seraphin, B., Rosbash, M., Luhrmann, R., Fabrizio, P. 1998. A comprehensive biochemical and genetic analysis of the yeast U1 snRNP reveals five novel proteins. RNA 4:374-393.

    Graveley, B.R., Hertel, K.J., Maniatis, T. 1999. SR proteins are locators of the RNA splicing machinery. Curr. Biol 9:R6-R7.

    Hall, S.L., Padgett, R.A. 1994. Conserved sequences in a class of rare eukaryotic nuclear introns with non-consensus splice sites. J. Mol. Biol. 239:357-365.

    Huang, X., Miller, W. 1991. A time-efficient, linear-space local similarity algorithm. Adv. Appl. Math. 12:373-381.

    Jensen, R.C., Wang, Y., Hardin, S.B., Stumph, W.E. 1998. The proximal sequence element (PSE) plays a major role in establishing the RNA polymerase specificity of Drosophila U-snRNA genes. Nucleic Acids Res 26:616-622.

    Kambach, C., Walke, S., Nagai, K. 1999. Structure and assembly of the spliceosomal small nuclear ribonucleoprotein particles. Curr. Opin. Struct. Biol. 9:222-230.

    Kennedy, C.F., Kramer, A., Berget, S.M. 1998. A role for SRp54 during intron bridging of small introns with pyrimidine tracts upstream of the branch point. Mol. Cell. Biol 18:5425-5434.

    Komatsu, M., Kominami, E., Arahata, K., Tsukahara, T. 1999. Cloning and characterization of two neural-salient serine/arginine-rich (NSSR) proteins involved in the regulation of alternative splicing in neurones. Genes Cells. 4:593-606.

    Lasko, P. 2000. The Drosophila genome: translation factors and RNA binding proteins J. Cell Biol. 150:F51-F56.

    Lo, P.C.H., Mount, S.M. 1990. Drosophila melanogaster genes for U1 snRNA variants and their expression during development. Nucleic Acids Res. 18:6971-6979.

    Lopez, A.J. 1998. Alternative Splicing of Pre-mRNA: Developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 32:279-305.

    Matunis, E.L., Matunis, M.J., Dreyfuss, G. 1992. Characterization of the major hnRNP proteins from Drosophila melanogaster. J. Cell Biol. 116:257-269.

    McLean, M.R., Rymond, B.C. 1998. Yeast pre-mRNA splicing requires a pair of U1 snRNP-associated tetratricopeptide repeat proteins. Mol. Cell. Biol. 18:353-360.

    Minvielle-Sebastia, L., Keller, W. 1999. mRNA polyadenylation and its coupling to other RNA processing reactions and to transcription. Curr. Opin. Cell Biol. 11:352-357.

    Mount, S.M. 1997. Genetic depletion reveals an essential role for an SR protein splicing factor in vertebrate cells. Bioessays. 19:189-192.

    Polycarpou-Schwarz, M., Gunderson, S.I., Kandels-Lewis, S., Seraphin, B., Mattaj, I.W. 1996. Drosophila SNF/D25 combines the functions of the two snRNP proteins U1A and U2B'' that are encoded separately in human, potato and yeast. RNA. 2:11-23.

    Puig, O., Gottschalk, A., Fabrizio, P., Seraphin, B. 1999. Interaction of the U1 snRNP with nonconserved intronic sequences affects 5' splice site selection. Genes Dev. 13:569-580.

    Rigaut, G., Shevchenko, A., Rutz, B., Wilm, M., Mann, M., Seraphin, B. 1999. A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol. 17:1030-1032.

    Saba, J.A., Busch, H., Wright, D., Reddy, R. 1986. Isolation and characterization of two putative full-length Drosophila U4 small nuclear RNA genes. J. Biol. Chem. 261:8750-8753.

    Saluz, H., Dudler, R., Schmidt, T., Kubli, E. 1988. The localization and estimated copy number of Drosophila melanogaster U1, U4, U5 and U6 snRNA genes. Nucleic Acids Res. 16:3582.

    Staley, J.P., Guthrie, C. 1998. Mechanical devices of the spliceosome: motors, clocks, springs, and things. Cell. 92:315-326.

    Stitzinger, S.M., Conrad, T.R., Zachlin, A.M., Salz, H.K. 1999. Functional analysis of SNF, the Drosophila U1A/U2B'' homolog: identification of dispensable and indispensable motifs for both snRNP assembly and function in vivo. RNA. 5:1440-1450.

    Stojdl, D.F., Bell, J.C. 1999. SR protein kinases: the splice of life. Biochem. Cell Biol. 77:293-298.

    Tacke, R., Manley, J.L. 1999. Determinants of SR protein specificity. Curr. Opin. Cell Biol. 11:358-362.

    Tarn, W.Y., Steitz, J.A. 1994. SR proteins can compensate for the loss of U1 snRNP functions in vitro. Genes Dev. 8:2704-2717.

    Tarn, W.Y., Steitz, J.A. 1996. A novel spliceosome containing U11, U12 and U5 snRNPs excises a minor class (AT-AC) intron in vitro. Cell. 84:801-811.

    Tatusova, T.A., Madden, T.L. 1999. BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS (Fed. Eur. Microbiol. Soc.) Microbiol. Lett. 174:247-250.

    Tharun, S., He, W., Mayes, A.E., Lennertz, P., Beggs, J.D., Parker, R. 2000. Yeast Sm-like proteins function in mRNA decapping and decay. Nature. 404:515-518.

    Vorbruggen, G., Onel, S., Jackle, H. 2000. Localized expression of the Drosophila gene Dxl6, a novel member of the serine/arginine rich (SR) family of splicing factors. Mech. Dev. 90:309-312.

    Will, C.L., Schneider, C., Reed, R., Luhrmann, R. 1999. Identification of both shared and distinct proteins in the major and minor spliceosomes. Science. 284:2003-2005.

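    Yeakley, J.M., Tronchere, H., Olesen, J., Dyck, J.A., Wang, H.Y., Fu, X.D. 1999. Phosphorylation regulates in vivo interaction and molecular targeting of serine/arginine-rich pre-mRNA splicing factors. J. Cell Biol. 145:447-455.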

    Zahler, A.M., Lane, W.S., Stolk, J.A., Roth, M.B. 1992. SR proteins: a conserved family of pre-mRNA splicing factors. Genes Dev. 6:837-847.

    Zhang, D., Rosbash, M. 1999. Identification of eight proteins that cross-link to pre-mRNA in the yeast commitment complex. Genes Dev. 13:581-592.

    (Stephen M. Mount (a) and Helen K. Salz (b))