The Silencing of Pseudogenes
     Evolutionary Genomics Group, Division of Microbiology, Miguel Hernandez University, Alicante, Spain

    E-mail: alex.mira@umh.es.

    Abstract

    Pseudogenes are nonfunctional DNA sequences that can accumulate in the genomes of some bacterial species, especially those undergoing niche change, host specialization, or weak selection. They may persist for long evolutionary periods, raising the question of how the genome prevents expression of these degenerated or disrupted genes, which would presumably give rise to malfunctioning proteins. We have investigated ribosomal binding strength at Shine-Dalgarno sequences and the prevalence of σ70 promoter regions in pseudogenes across bacteria. We report that the RNA polymerase–binding sites and, more strongly, the ribosome-binding regions of pseudogenes are highly degraded, suggesting that transcription and translation are impaired in nonfunctional open reading frames. This would reduce the metabolic investment in faulty proteins: although pseudogenes can persist for long periods, they would be effectively silenced. It is unclear whether mutation accumulation in regulatory regions is neutral or whether it is accelerated by selection.

    Key Words: pseudogene, Shine-Dalgarno, mutation accumulation, promoter, spacer, gene expression

    When a bacterial gene undergoes a period of frequent mutations, stop codons can be introduced in the sequence. In addition, insertions or deletions can cause a shift in the reading frame or the removal of a vital section of the gene, giving rise to a pseudogene (J. O. Andersson and S. G. Andersson 1999). Until recently, these functionless open reading frames (ORFs) were found relatively rarely, and they were frequently linked to species undergoing episodes of genetic drift or low selection strength. The exponential increase of genomic data has since shown that pseudogenes are more common than previously thought and that they can represent a significant fraction of the genome. The case of the obligate intracellular species Mycobacterium leprae is dramatic: its genome contains over 1,100 recognizable pseudogenes (Cole et al. 2001). Other examples of bacteria with hundreds of pseudogenes are species that have reduced their host ranges, like Shigella flexneri or Salmonella typhi (Parkhill et al. 2001; Wei et al. 2003). Pseudogenes can also be present in high numbers in free-living species such as Escherichia coli, where numerous pseudogenes had previously gone undetected (Lerat and Ochman 2004).

    Not only are pseudogenes pervasive in bacterial genomes but they can also last for long periods (Mira, Ochman, and Moran 2001; van Ham et al. 2003). For example, the average half-life of pseudogenes in Buchnera has been estimated at 23.9 Myr (Gomez-Valero, Latorre, and Silva 2004), and a few strains of this intracellular symbiont share common deletions in a limited number of pseudogenes, suggesting that some of these nonfunctional ORFs may have lasted in the genome since their divergence, over 50 MYA (Mira, Ochman, and Moran 2001). Shigella shared an ancestor with the K12 strain of E. coli 35,000–270,000 years ago (Pupo, Lan, and Reeves 2000), and 14 common pseudogenes still remain in both species (Lerat and Ochman 2004). In M. leprae the average length of pseudogenes is still 82.3% that of functional genes. Thus, many of these truncated ORFs might in principle be transcribed and/or translated. If this were the case, a great deal of metabolic resources would be spent expressing degenerated ORFs that no longer code for proper functions. It is also likely that these malfunctioning proteins could interfere with other pathways, for example, by losing specificity and competing for other substrates. It is therefore important that mechanisms are deployed to prevent this potential danger and unnecessary waste of resources.

    One of these mechanisms would be the elimination of intergenic spacers containing regulatory regions or the accumulation of mutations in ribosome-binding sites and transcription promoters. We studied the prevalence of these sequences in different bacteria with numerous pseudogenes. The Shine-Dalgarno (SD) region is a sequence complementary to a highly conserved region at the 3' end of the 16S rRNA (Shine and Dalgarno 1974). It is involved in the binding of the mRNA to the ribosome, and changes in the SD sequence severely restrict translation (Dunn, Buzash-Pollert, and Studier 1978). We have estimated the binding strength (free-energy values based on nucleotide pairing rules, see Methods) of the region preceding annotated pseudogenes with the equivalent 3' section of the 16S rRNA from each bacterium. When compared to the functional genes in the same species, pseudogenes appear to have largely lost the SD region (fig. 1). Functional homologs of the pseudogenes in related species had conserved SD regions, indicating that the degradation of these ribosome-binding sites took place after the genes turned into pseudogenes. In M. leprae, where pseudogenes are only slightly shorter than functional genes (average length 804 vs. 977 bp), the ribosomal binding strength is dramatically diminished. The same pattern appeared in virtually all fully sequenced bacteria examined (some examples are shown in fig. 1a–e, and statistics are included in the supplementary information table 1, Supplementary Material online). A few exceptions appeared in the species Yersinia pestis, S. typhi, and S. flexneri (the latter is shown in fig. 1f). It is interesting to note that these three species have only recently specialized in human hosts, after which they have probably undergone their expansion in pseudogene number (Parkhill et al. 2001). This process is thought to have happened with the human population expansion that occurred in Neolithic times, no more than 10,000 years ago. The case of the causative agent of plague may be particularly recent, as Y. pestis could have emerged as recently as 1,500–20,000 years ago (Achtman et al. 1999). Thus, the few cases in which the SD sequences of pseudogenes displayed close to normal binding strengths occurred in species that have undergone pseudogenization in recent evolutionary time.

    FIG. 1.— Average free energy (more negative values indicate stronger ribosome binding) versus position (zero indicates gene start) for all functional genes (dashed lines) and pseudogenes (solid lines) for the species (a) Mycobacterium leprae, (b) Lactobacillus plantarum, (c) Staphylococcus aureus MSSA476, (d) Photorhabdus luminescens, (e) Salmonella typhimurium LT2, and (f) Shigella flexneri 2457T. Dotted blue lines indicate binding strength in functional homologs of pseudogenes in the related species (a) Mycobacterium tuberculosis H37Rv, (c) S. aureus Mu50, (d) Erwinia carotovora, (e) Salmonella paratyphi, (f) Escherichia coli K12. Only one functional homolog of L. plantarum pseudogenes was found in Lactobacillus johnsonii and its free-energy value was therefore not included in (b). Sample sizes for pseudogenes, functional genes within the same species, and functional orthologs in related species were, respectively (a) 1,115, 1,605, 99; (b) 42, 3,009, 1; (c) 54, 2,579, 28; (d) 300, 4,683, 9; (e) 39, 4,425, 19; and (f) 254, 4,180, 340.

    The length patterns of intergenic spacers correlate with their flanking genes' potential for transcription initiation and termination (Rogozin et al. 2002). Thus, we compared the length of spacers preceding pseudogenes with the length of equivalent spacers in functional homologs of closely related strains or species (we assumed annotated ORFs that had neither frameshifts nor within-frame stop codons to be functional). Two representative cases are plotted in figure 2. Many homologous spacers had a similar length, falling along the 1:1 line. However, a large number were shorter in pseudogenes, including many spacers that were totally eroded. The shortening of spacers was significant in the pseudogenes of Burkholderia, Staphylococcus, Bordetella, and Escherichia (t-test P-values were 0.0005, 0.043, 0.042, and 0.02, respectively) but not universal. In Salmonella, Bartonella, and Streptomyces, the spacers before pseudogenes were not statistically different in length from those of functional homologs (see supplementary information for data on all studied bacteria). In a few cases, like M. leprae or S. flexneri, some of the spacers preceding pseudogenes were actually longer than their corresponding regions in functional homologous genes from Mycobacterium tuberculosis and E. coli, respectively. However, closer examination of those cases revealed that the greater length is due to abundant mutations in contiguous genes, which make those genes difficult to recognize and lengthen the annotated spacer region. Thus, the original intergenic spacers of pseudogenes are substantially shortened in many bacteria when compared to their functional homologs in sister species, suggesting that the regulatory regions tend to be eliminated, even when the pseudogenes are maintained. In addition, some IS elements were found to be inserted at intergenic spacers of pseudogenes in S. flexneri, Bordetella pertussis, and Burkholderia mallei, where they could contribute to promoter inactivation. An effort was also made to investigate the preservation of promoter sequences themselves.

    FIG. 2.— Intergenic spacer length for pseudogenes (Lp) in the species (a) Staphylococcus aureus MSSA476 and (b) Burkholderia mallei 23344 compared to the spacer length in their functional homologs (Lfh) in S. aureus Mu50 and Burkholderia pseudomallei K96243, respectively. Diagonal line indicates a 1:1 ratio (equal length). N = 35 and 147, respectively.

    Calculating a gene's transcription potential in silico is, however, complicated (Stormo 2000). Promoter sequences vary considerably between and within species, making it difficult to estimate their binding strength. Still, a consensus sequence for the –10 and –35 regions of σ70 promoters is well defined in E. coli (Wosten 1998), which made it possible to calculate a score indicative of promoter conservation. We estimated scores for the best σ70 promoter candidates found upstream of each pseudogene in E. coli CFT073 (the sequenced strain with the highest number of pseudogenes) and of its corresponding functional equivalent in the strains K12 and O157 (42 and 38 putative functional homologs were found, respectively). This data set is biased toward recently formed pseudogenes (i.e., those recognizable by sequence similarity). As shown in figure 3, some promoters have similar scores in either strain, but most pseudogenes in the CFT073 strain show divergent promoters when compared with the homologous functional regions in the other two E. coli strains (P < 0.01, t-test). The analysis was repeated in the pseudogenes of Salmonella spp., but no significant result was found (t = –1.94, P = 0.067). Thus, promoter degeneration may not be pervasive across bacteria, although the latter result could be due to different promoter sequences or to too short a time since pseudogenization. When promoter-finding algorithms are improved, a better evaluation of the degeneration of regulatory regions will be possible on a wider range of bacteria. Experimental studies of gene expression in pseudogenes will also be helpful.

    FIG. 3.— Scores indicative of putative σ70 promoter conservation in the spacers of Escherichia coli CFT073 pseudogenes compared to the equivalent functional genes in the strains K12 (plus signs) and O157:H7 (crosses). Promoters with scores <0.3 were considered absent or too weak to be operational.

    The transcriptional and translational silencing of pseudogenes would add to the tagging system mediated by transfer-messenger RNA (tmRNA). This dual-nature RNA tags proteins translated from "broken mRNAs," i.e., mRNA sequences without stop codons, which are then degraded by proteases (Withey and Friedman 2003). This mechanism is present in all examined bacteria (Knudsen et al. 2001) and is probably acting on pseudogenes, but only if their mRNAs lack stop codons. Because the number of stop codons in pseudogenes can be high (Babu 2003), this mechanism is probably insufficient. Moreover, bacteria must translate the proteins before they are tagged and degraded, wasting resources in translating unnecessary peptides. It has been estimated that translational machinery in bacteria can account for up to 80% of the cell's energy and 50% of its dry weight (Maaloe 1979). It is therefore not surprising that the degeneration of SD sequences is so pronounced. Although less dramatic, removal and divergence of promoters also appear to occur. It is likely that mutations in intergenic spacers take place at the same rate as in the pseudogenes themselves. However, another possibility is that mutations are selected preferentially in regulatory regions. This could obviate the need for deletions in pseudogenes and extend their average half-life.

    The latter possibility can be tested by contrasting deletion rates between pseudogenes and their spacers. Thus, we compared the length of pseudogenes and their spacers in relation to the length of their functional homologs in related species (see supplementary fig. 2, Supplementary Material online). For some species, like B. mallei, Bartonella quintana, or Staphylococcus aureus, deletions seem to have happened preferentially on the spacers. However, in other bacteria such as B. pertussis, there are multiple cases where pseudogene length (relative to their functional homologs) is significantly shorter than their corresponding spacers, and other species show no trend. The results are also difficult to interpret because remnants of IS elements' insertions and eroded genes extend the length of some spacers and pseudogenes, making them longer than their functional counterparts. Furthermore, even if all cases in which ratios are larger than one (implying insertions of IS elements or misannotation of pseudogenes as part of intergenic spacers) are removed, the data do not show a uniform pattern across species. Thus, it is unclear whether selection is accelerating the degeneration of pseudogenes' regulatory regions.
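
    The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tuple layout, the function names, and the example numbers are hypothetical and are not taken from the supplementary data. Each record holds the pseudogene length, the functional homolog length, and the lengths of their two upstream spacers; ratios larger than one are optionally dropped, as in the text.

```python
def relative_lengths(records, drop_above_one=True):
    """Return (gene_ratio, spacer_ratio) pairs relative to functional homologs.

    records: iterable of (pseudogene_len, homolog_len,
                          pseudogene_spacer_len, homolog_spacer_len).
    This layout is a hypothetical convenience, not the authors' format.
    """
    ratios = []
    for gene_len, homolog_len, spacer_len, homolog_spacer_len in records:
        # A ratio is undefined when the homolog has no length to compare to.
        if homolog_len <= 0 or homolog_spacer_len <= 0:
            continue
        gene_ratio = gene_len / homolog_len
        spacer_ratio = spacer_len / homolog_spacer_len
        # Ratios above one suggest IS insertions or misannotation.
        if drop_above_one and (gene_ratio > 1 or spacer_ratio > 1):
            continue
        ratios.append((gene_ratio, spacer_ratio))
    return ratios


def summarize(ratios):
    """Count cases where the spacer, or the gene, lost proportionally more."""
    return {
        "spacer_more_reduced": sum(1 for g, s in ratios if s < g),
        "gene_more_reduced": sum(1 for g, s in ratios if g < s),
        "n": len(ratios),
    }


# Made-up example values, for illustration only.
example = [(800, 1000, 50, 100), (500, 1000, 90, 100),
           (900, 1000, 300, 100), (100, 0, 10, 10)]
print(summarize(relative_lengths(example)))
```

    A preference for deletions in spacers would show up as a clear excess of "spacer_more_reduced" cases within a species; a formal test would still be needed to judge significance.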

    Methods

    We used the software developed by Osada, Saito, and Tomita (1999) to estimate ribosome-binding strength for all genes and pseudogenes of fully sequenced genomes. It calculates, based on base pair formation rules (Turner et al. 1987), free-energy values for the binding of the 3' end of the 16S rRNA with the region preceding each ORF at different positions. Lower free-energy values indicate higher binding strength.
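
    To make the scanning idea concrete, the sketch below slides the 3' tail of the 16S rRNA along the region preceding an ORF and scores each position. It is not the software of Osada, Saito, and Tomita (1999) and does not use the Turner et al. (1987) nearest-neighbor parameters: the per-pair scores are illustrative placeholders, and the rRNA tail shown is an assumed E. coli-like sequence that would have to be replaced for each genome. As in the text, lower values indicate stronger binding.

```python
# Assumed E. coli-like 3' tail of the 16S rRNA, written 5'->3'.
ANTI_SD = "ACCUCCUUA"

# Illustrative per-pair scores (placeholders, NOT Turner free energies).
PAIR_SCORE = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def window_score(mrna_window, anti_sd=ANTI_SD):
    """Score antiparallel pairing of an mRNA window with the rRNA tail."""
    rrna_reversed = anti_sd[::-1]  # pair mRNA 5'->3' with rRNA 3'->5'
    return sum(PAIR_SCORE.get((m, r), 0.0)
               for m, r in zip(mrna_window, rrna_reversed))


def scan_upstream(upstream_dna, anti_sd=ANTI_SD):
    """Return (offset, score) per window; offset is negative (before start)."""
    mrna = upstream_dna.upper().replace("T", "U")
    width = len(anti_sd)
    results = []
    for i in range(len(mrna) - width + 1):
        offset = i - len(mrna)  # window start relative to the start codon
        results.append((offset, window_score(mrna[i:i + width], anti_sd)))
    return results


def strongest_site(upstream_dna, anti_sd=ANTI_SD):
    """Return the (offset, score) pair with the lowest (strongest) score."""
    return min(scan_upstream(upstream_dna, anti_sd), key=lambda pair: pair[1])


# A perfect SD-like site versus a fully degraded upstream region.
print(strongest_site("CCCCTAAGGAGGTCCCCCC"))
print(strongest_site("C" * 19))
```

    Averaging the per-offset scores over all genes, and separately over all pseudogenes, gives profiles of the kind plotted in figure 1.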

    We studied promoter preservation solely for E. coli strains, where the consensus sequence for transcription promoters is best known. Annotated pseudogenes in E. coli CFT073 (n = 94) were BlastN searched (Altschul et al. 1997) against the strains K12 and O157:H7 in order to find functional homologs for comparison (E value < 10^-5). The DNA sequences of the corresponding spacers preceding these ORFs were extracted, and the "DNA Master" (http://cobamide2.bio.pitt.edu/computer.htm), developed by J. G. Lawrence, was used to calculate a score indicative of σ70 promoter strength. The score is obtained by calculating the geometric mean of DNA similarity at the –35 and –10 regions to the E. coli consensus sequence TTGACA-TATAAT, separated by a 15- to 19-bp segment. Deviations from the optimal 17-bp segment are weighted at 10% of the score given by the two binding regions. Promoter scores were very similar for functional homologs of different E. coli strains (see additional fig. 3 in supplementary file, Supplementary Material online), indicating that the promoter consensus sequence is similar across strains and that the same genes have similar scores.
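
    The scoring scheme can be illustrated with a short sketch. This is not the DNA Master implementation, whose exact formula is not given here. It assumes that similarity is the fraction of matching bases in each hexamer and that each base pair of deviation from the 17-bp spacing removes 10% of the geometric-mean score. Both choices, and all function names, are assumptions made for illustration.

```python
from math import sqrt

MINUS_35 = "TTGACA"   # consensus -35 hexamer given in the text
MINUS_10 = "TATAAT"   # consensus -10 hexamer given in the text
OPTIMAL_GAP = 17      # optimal spacing between the two hexamers (bp)


def similarity(hexamer, consensus):
    """Fraction of positions matching the consensus (assumed measure)."""
    return sum(a == b for a, b in zip(hexamer, consensus)) / len(consensus)


def promoter_score(seq, start, gap):
    """Geometric mean of hexamer similarities, reduced for non-optimal gaps."""
    box35 = seq[start:start + 6]
    box10 = seq[start + 6 + gap:start + 12 + gap]
    base = sqrt(similarity(box35, MINUS_35) * similarity(box10, MINUS_10))
    # Assumed penalty: 10% of the base score per bp away from 17.
    return base * (1.0 - 0.10 * abs(gap - OPTIMAL_GAP))


def best_promoter(spacer):
    """Return (score, start, gap) of the best candidate in a spacer."""
    seq = spacer.upper()
    best = (0.0, None, None)
    for gap in range(15, 20):  # 15- to 19-bp spacing
        for start in range(len(seq) - (12 + gap) + 1):
            score = promoter_score(seq, start, gap)
            if score > best[0]:
                best = (score, start, gap)
    return best


# A spacer carrying a perfect consensus promoter with 17-bp spacing.
print(best_promoter("GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"))
```

    Under this sketch a perfect consensus promoter scores 1.0, and a spacer too short to hold both hexamers returns a score of 0.0 with no position.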

    Spacer length of both pseudogenes and their corresponding functional homologs was measured as the number of nucleotides between their annotated start codons and the end of the preceding genes.
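
    As a minimal sketch of this measurement, the code below assumes genes on the forward strand with 1-based inclusive coordinates (a simplification; real annotations also contain reverse-strand genes). The function names and the example coordinates are hypothetical. Overlapping genes are given a spacer length of zero, which corresponds to a totally eroded spacer.

```python
def upstream_spacer_length(prev_gene_end, start_codon_pos):
    """Nucleotides between the preceding gene's end and the start codon.

    Coordinates are assumed 1-based and inclusive on the forward strand.
    """
    return max(0, start_codon_pos - prev_gene_end - 1)


def spacers_for_genes(genes):
    """Map each gene name to the length of the spacer that precedes it.

    genes: list of (name, start, end) tuples; the first gene has no
    preceding gene in the list and is therefore omitted from the result.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    spacers = {}
    for previous, current in zip(ordered, ordered[1:]):
        spacers[current[0]] = upstream_spacer_length(previous[2], current[1])
    return spacers


# Hypothetical coordinates: gene "c" overlaps gene "b", so its spacer is 0.
print(spacers_for_genes([("a", 1, 300), ("b", 401, 900), ("c", 895, 1500)]))
```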

    Supplementary Material

    Supplementary table 1 and supplementary figures 2 and 3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

    Acknowledgements

    We thank Y. Osada and J. G. Lawrence for kindly providing the requested software. A.M. is recipient of a ‘Ramón y Cajal’ research contract from MCyT. Support from European Union project GEMINI (QLK3-CT-2002-02056) is also acknowledged. We are grateful to P. L. Valdés for help with statistical analysis and three anonymous referees for constructive comments that improved the manuscript.

    References

    Achtman, M., K. Zurth, G. Morelli, G. Torrea, A. Guiyoule, and E. Carniel. 1999. Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc. Natl. Acad. Sci. USA. 96:14043–14048.

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Andersson, J. O., and S. G. Andersson. 1999. Insights into the evolutionary process of genome degradation. Curr. Opin. Genet. Dev. 9:664–671.

    Babu, M. M. 2003. Did the loss of sigma factors initiate pseudogene accumulation in M. leprae? Trends Microbiol. 11:59–61.

    Cole, S. T., K. Eiglmeier, J. Parkhill et al. (41 co-authors). 2001. Massive gene decay in the leprosy bacillus. Nature 409:1007–1011.

    Dunn, J. J., E. Buzash-Pollert, and F. W. Studier. 1978. Mutations of bacteriophage T7 that affect initiation of synthesis of the gene 0.3 protein. Proc. Natl. Acad. Sci. USA 75:2741–2745.

    Gomez-Valero, L., A. Latorre, and F. J. Silva. 2004. The evolutionary fate of nonfunctional DNA in the bacterial endosymbiont Buchnera aphidicola. Mol. Biol. Evol. 21:2172–2181.

    Knudsen, B., J. Wower, C. Zwieb, and J. Gorodkin. 2001. tmRDB (tmRNA database). Nucleic Acids Res. 29:171–172.

    Lerat, E., and H. Ochman. 2004. Psi-Phi: exploring the outer limits of bacterial pseudogenes. Genome Res. 14:2273–2278.

    Maaloe, O. 1979. Regulation of the protein-synthesizing machinery in ribosomes, tRNA, factors and so on. Pp. 487–542 in R. F. Goldberger, ed. Biological regulation and development, Vol. 1. Plenum Press, New York.

    Mira, A., H. Ochman, and N. A. Moran. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17:589–596.

    Osada, Y., R. Saito, and M. Tomita. 1999. Analysis of base-pairing potentials between 16S rRNA and 5' UTR for translation initiation in various prokaryotes. Bioinformatics 15:578–581.

    Parkhill, J., G. Dougan, K. D. James et al. (38 co-authors). 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413:848–852.

    Pupo, G. M., R. Lan, and P. R. Reeves. 2000. Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. Proc. Natl. Acad. Sci. USA 97:10567–10572.

    Rogozin, I. B., K. S. Makarova, D. A. Natale, A. N. Spiridonov, R. L. Tatusov, Y. I. Wolf, J. Yin, and E. V. Koonin. 2002. Congruent evolution of different classes of non-coding DNA in prokaryotic genomes. Nucleic Acids Res. 30:4264–4271.

    Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71:1342–1346.

    Stormo, G. D. 2000. DNA binding sites: representation and discovery. Bioinformatics 16:16–23.

    Turner, D. H., N. Sugimoto, J. A. Jaeger, C. E. Longfellow, S. M. Freier, and R. Kierzek. 1987. Improved parameters for prediction of RNA structure. Cold Spring Harb. Symp. Quant. Biol. 52:123–133.

    van Ham, R. C., J. Kamerbeek, C. Palacios et al. (16 co-authors). 2003. Reductive genome evolution in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 100:581–586.

    Wei, J., M. B. Goldberg, V. Burland et al. (14 co-authors). 2003. Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect. Immun. 71:2775–2786.

    Withey, J. H., and D. I. Friedman. 2003. A salvage pathway for protein structures: tmRNA and trans-translation. Annu. Rev. Microbiol. 57:101–123.

    Wosten, M. M. 1998. Eubacterial sigma-factors. FEMS Microbiol. Rev. 22:127–150.

    (Alex Mira and Ravindra Pu)