当前位置: 首页 > 期刊 > 《遗传学和分子生物学》 > 2005年第4期 > 正文
编号:11340969
Patterns of molecular evolution in pathogenesis-related proteins
http://www.100md.com 《遗传学和分子生物学》
     IUniversidade Federal do Rio Grande do Sul, Instituto de Biociências, Departamento de Genetica, Porto Alegre, RS, Brazil

    IIHeinrich Heine Universitt, Bioinformatik Institut, Dvsseldorf, Germany

    IIIPontificia Universidade Catolica do Rio Grande do Sul, Faculdade de Biociências, Centro de Biologia Genmica e Molecular, Porto Alegre, RS, Brazil

    ABSTRACT

    The genes encoding 13 classes of pathogenesis-related (PR) proteins were examined for positive selection using maximum-likelihood (ML) models of codon substitution. The study involved 194 sequences from 54 species belonging to 37 genera. Although the sizes of the sequences examined varied from 237 bp for PR12 to 1,110 bp for PR7, most classes (9 out of 13) contained sequences made up of more than 400 nucleotides. Signs of positive selection were obtained for sites in PR proteins 4, 6, 8, 9 and 15 using an ML-based Bayesian method and likelihood ratio tests. These results confirm the importance of positive selection in proteins related to defense mechanisms already observed in a wide array of organisms.

    Key words: pathogenesis-related proteins, PRs, molecular variability, positive selection, maximum-likelihood methods.

    Introduction

    Pathogenesis-related (PR) proteins are coded by host plants as a response to pathological or related situations, and normally accumulate not only locally in the place of infection, but are formed systemically following infection by bacteria, fungi or viruses, or after induction by abiotic stress factors. Such proteins have a wide array of functions in that they can be hydrolases, transcription factors, protease inhibitors, enzymes associated with various metabolic pathways and allergenic products. The functional motifs of PR proteins are related to a number of eukaryotic intra and intercellular proteins involved in very distinct functions, e.g. sperm-maturation glycoproteins in rodents, store proteins in seeds or flower development and differentiation. It is possible, therefore, that the defensive functions of PR proteins evolved after their emergence as gene families (see review in van Loon and van Strien, 1999).

    It is clear that PR proteins are important not only to the plants themselves but also to any attempt to improve plants through artificial selection, and it is, therefore, of interest to establish how such proteins evolved naturally. The neutralist theory of molecular evolution (Kimura, 1983) states that the majority of nucleotide and amino acid substitutions have no adaptive consequences, although tests developed to verify this assumption (Wayne and Simonsen, 1998) in some cases reject a strictly neutral model but are unable to distinguish between different forms of natural selection. A powerful method to detect positive selection at the molecular level is the comparison of synonymous (dS) and non-synonymous (dN) substitution rates in genes which code for proteins using the expression w = dN/dS. If amino acid changes are advantageous w will be greater than one, while if they are deleterious w will be lower than one, with neutral mutations yielding w = 1. However, this simple ratio does not account for variable rates of selection between sites. Appropriate codon-based models were developed by Nielsen and Yang (1998) and Yang et al. (2000) who developed a series of 14 different models (M0-M13) to thoroughly investigate the variability of w ratios between sites, each model having a different assumption about the nature of the distribution that could be found. However, some of those models are hard to use and only models 0 (one-ratio), 1 (neutral), 2 (selection), 3 (discrete), 7 (beta) and 8 (beta & w) are recommended by Yang et al. (2000).

    The subject of the present paper is a survey of the primary structure of representatives of 13 of the 15 PR protein families, searching for evidences of positive selection. It will be seen that the search resulted in the identification of several sites in which such a process is probably occurring.

    Material and Methods

    Data retrieval

    The databanks used as sources for the protein and DNA sequences were the SWISS-PROT and TrEMBL (Bairoch and Apweiler, 2000) developed by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute respectively. In a few cases the GenBank (National Center for Biotechnological Information, USA) was also utilized. These databanks and several other tools can be found in the Expert Protein Analysis System (ExPASy) site at http://www.expasy.org/.

    The sequence search was based on the PR protein classification performed by van Loon and van Strien (1999), the descriptions and authors referenced for each PR protein in that article being used as key words for the searches.

    Several searching tools were utilized with the objective of covering all the sequences registered up to now. The tool which was found to be the most efficient for the recognition and identification of the available sequences was the Sequence Retrieval System (SRS) of Etzold and Argos (1993) available on the SWISS-PROT page. This procedure was used to obtain most of the sequences and was especially useful for imperfectly described sequences because it permits several searching procedures, i.e. by bibliographic reference, organism, key word, gene name or access code. The PSI BLAST Network Service (Altschul et al., 1997) was also employed and shown to be very comprehensive, avoiding the retrieval of sequences that would not be useful in the analysis. This tool has a direct link with the SWISS-PROT access (http://www.expasy.org/sprot).

    Methodology

    The DNA sequences were aligned with the multiple sequence alignment CLUSTAL X program version 1.8 (Thompson et al., 1997; Jeanmougin et al., 1998) with manual corrections based on the codon pattern obtained using the DNATagger program, a color program for DNA coding alignments (Monteiro de Basso and N.M. Scherer, unpublished). Considerable care was taken to guarantee a proper analysis of the data, and archives of the alignments are available on request.

    The phylogenies were estimated for each family by maximum likelihood and neighbor-joining methods using the p distance option and the HKY substitution model (Hasegawa et al., 1985) implemented in the Tree-Puzzle program (Schmidt et al., 2002). The resultant tree topologies were used to calculate the branch lengths using the M0 model through the CODEML program of the PAML packet (Yang, 1997). A relative measure of sequence divergence was calculated using the average number of nucleotide changes per codon per branch, that is S/(2T-3), where 2T-3 is the number of branches of an unrooted tree of T taxa and S is the number of nucleotide substitutions per codon along the tree.

    Afterwards, analyses using the maximum-likelihood models recommended by Yang et al (2000) were implemented in the PAML program. All models were run using the F3x4 option in the PAML program, where expected codon frequencies were based upon nucleotide frequencies occurring at the three codon positions. The one-ratio model (M0) assumes one w ratio for all sites. The neutral model (M1) presupposes a proportion p0 of conserved sites with w0 = 0 and p1 = 1 - p0 of neutral sites with w1 = 1, as would occur if almost all non-synonymous substitutions were either deleterious or neutral. The positive selection model (M2) adds an additional class of sites with frequency p2 = 1-p0-p1 and w2 is estimated from the data. In the discrete model (M3), the probabilities (p0, p1 and p2) of each site which was submitted to purifying selection, neutral selection and positive selection, respectively, and their corresponding w ratios (w0, w1, w2) are inferred from the data. The beta model (M7) is a null test for positive selection, assuming a beta distribution with w between 0 and 1. Finally, the beta & w (M8) model adds one extra class with the same ratio w1. Sites which yielded posterior probabilities higher than 95% were considered significantly affected by selection.

    The likelihood ratio test (LRT) was used to verify whether the difference ratio w was significantly different from 1 for each pairwise comparison: M1 vs. M2, M0 vs. M3, M7 vs. M8 and M8a (beta & w = 1) vs. M8 (beta & w > 1). The LRT performs a comparison of the likelihood scores of the two models, with the constraint of w = 1 and without such constraint: LR = 2 (ln1 - ln2). This LRT statistic approximately follows a chi-square distribution and the number of degrees of freedom is equal to the number of additional parameters in the more complex model. The question of sequence divergence and the accuracy and power of the likelihood ratio test in detecting positive selection has been examined by Anisimova et al. (2001, 2002) and we have followed their recommendations in the choice of the tests.

    Results

    The main characteristics of 13 of the 14 recognized PR protein families, plus one that may be soon included (L.C. van Loon, personal communication) are given in Table 1. As mentioned in the introduction, these proteins have a wide array of forms with different properties but they can be considered together due to their function in plant defense.

    The PR proteins studied, whose accession numbers in the data banks are displayed in Table 2, have their main characteristics listed in Table 3. They varied markedly in sizes, from only 237 bp (PR12) up to 1,110 bp (PR2), the latter being 4.7 times larger than the first. The interval of values for the transition/transversion rates was from 1.45 (PR10) to 3.42 (PR15), while the relative sequence divergencies ranged between 0.027 (PR8) and 0.423 (PR3).

    We calculated four LRTs, which compared M0 with M3, M1 with M2, M7 with M8, and M8 with M8a, the latter being a procedure available in the PAML program, in which a null M8a, with w = 1 fixed is compared with the alternative M8 with the constraint that w should be equal or greater than one. The significant results of these LRTs are shown in Table 5.

    The LRT results (Table 5) indicate that all comparisons for PR proteins 6, 8 and 9 were statistically significant and for PR15 in three comparisons. For PR1, PR4, PR10 and PR13 significance was achieved in the M1 vs. M2 and M0 vs. M3 comparisons only.

    Let us briefly summarize the cases in which values of w significantly higher than one were obtained in at least two models. The highest values of w ~ 6 were obtained for PR6. Seventeen sites yielded posterior Bayesian probabilities for positive selection above 95% under the M3 model and two of them (25 and 84) gave confirmatory indications with model M2. Values of w of about 3 in 5% of the sites were obtained for PR8 and sites 128, 131 and 248 yielded significant values under models M2, M3 and M8. A lower w number (~2.5 in 2% of the sites) was obtained for PR9. Two sites in this protein (45 and 161) showed indications of positive selection in M2 and M8. For PR4 we found that w ~ 2 in 2% of the sites and site 99 showed significant numbers in the M2, M3 and M8 models. A value of w ~ 4 was found for PR15 but only for site 48 with models M3 and M8.

    It is important to observe that the M0 vs. M3 comparison is rather a test of variable selective pressure between sites than a test of positive selection (Anisimova et al. 2001,Wong et al. 2004). However, in all cases in which this comparison showed significant values, additional indications of positive selection were found in relation to other models.

    Discussion

    The evidence for positive selection at the molecular level is now overwhelming. Wolfe and Li (2003) list 16 genes or proteins related to defensive systems or immunity, 18 related to evading defensive systems or immunity, 12 related to male reproduction, five related to female reproduction and 22 which they placed in a miscellaneous category in which such type of selection was verified. To these we could add the results of Swanson and Aquadro (2002) on members of the antifreeze protein multigene family, of Rodriguez-Trelles et al. (2003) on the xanthine dehydrogenase gene and of Clark et al. (2003) on human-chimpanzee-mouse orthologous trios. In relation specifically to innate immunity in plants and animals, Nvrnberger and Brunner (2002) identified a series of parallels between the recognition of general elicitors and pathogen-associated molecular patterns.

    The approach developed by Z. Yang and collaborators has been especially useful for these investigations. A general evaluation of the methods involved in the comparison between synonymous and non-synonymous substitution rates in protein-coding DNA sequences was made by Yang (2001). Details on the accuracy and power of Bayesan approaches to this problem were presented by Anisimova et al. (2001, 2002) and Suzuki and Nei (2002, 2004), while Yang and Nielsen (2002) and Yang and Swanson (2002) explored additional questions regarding codon-substitution models.

    The physiological significance of the results presented in the present paper needs further investigation. Restricting our attention to the PR proteins in which clear indications of positive selection were found, we verify that PR proteins 4 and 8 are chitinases (although of different types: PR4 = types I, II; PR8 = type III), PR6 is a proteinase inhibitor, PR9 a peroxidase and PR15 an oxalate oxidase. Bishop et al. (2000) extensively discussed the molecular structures and patterns of amino acid replacements in chitinases I and III, concluding that they are basically different, and that the unusual pattern of adaptive replacements in the active site cleft of chitinase I may be due to an arms race between the plant and inhibitors developed by the pathogenic species. We have identified a higher number of sites with indications of positive selection (w ~ 3 in 5% of the sites) in PR8 (a chitinase III type protein) than in PR4 (w ~ 2 in 2% of the sites) which includes both types I and II chitinases. Additionally, sites 128, 131 and 145 are close to one of the catalytic sites while site 279 is near an active site residue. In contrast, no indication of positive selection was found for PR3, also a chitinase.

    PR6 is a proteinase inhibitor which directly acts against insects and other herbivores, micro-organisms and nematodes, positive selection at certain of its sites is therefore not surprising.

    The PR9 peroxidases probably function in strengthening plant cell walls by catalyzing lignin deposition in reaction to microbial attack. Its sites 45 and 191 are located respectively between helix 2 and beta sheets, and in helix 13, near the active site of the peroxidase chain (Pfam Protein Families Database; Bateman et al., 2002) and should be functionally important.

    Oxalate oxidases influence different stages of the plants metabolism. Site 48 of PR15 may be significant in this regard, but structural data are needed to verify this possibility.

    The presence of positive selection is of course determined by the role that a given protein has in the biology of a determined organism. Ongoing unpublished results of our group have detected the absence of positive selection in maturases, oleosins and auxins of several plant species and the presence of positive selection in glycoproteins belonging to four other plant species.

    Acknowledgments

    We thank Arndt von Haeseler for thoughtful comments on the manuscript. This work was supported by the Programa de Apoio a Nucleos de Excelência (PRONEX), Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq), Financiadora de Estudos e Projetos (FINEP), Coordenao de Aperfeioamento de Pessoal de Nivel Superior (CAPES), Fundao de Amparo a Pesquisa do Estado do Rio Grande do Sul (FAPERGS) and Pro-Reitoria de Pesquisa da Universidade Federal do Rio Grande do Sul (PROPESQ-UFRGS).

    References

    Altschul SF, Madden TL, Schffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucl Ac Res 25:3389-3402.

    Anisimova M, Bielawski JP and Yang Z (2001) Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol 18:1585-1592.

    Anisimova M, Bielawski JP and Yang Z (2002) Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol Biol Evol 19:950-958.

    Bairoch A and Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucl Ac Res 28:45-48.

    Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M and Sonnhammer ELC (2002) The Pfam Protein Families Database. Nucl Ac Res 30:276-280.

    Bishop JG, Dean AM and Mitchell-Olds T (2000) Rapid evolution in plant chitinases: Molecular targets of selection in plant-pathogen coevolution. Proc Natl Acad Sci USA 97:5322-5327.

    Clark AG, Glanowski S, Nielsen R, Thomas PD, Kejariwal A, Todd MA, Tanenbaum DM, Civello D, Lu F, Murphy B, Ferreira S, Wang G, Zheng X, White TJ, Sninsky JJ, Adams MD and Cargill M (2003) Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960-1963.

    Etzold T and Argos P (1993) SRS - An indexing and retrieval tool for flat file data libraries. Comp Appl Biosc 9:49-57.

    Hasegawa M, Kishino H and Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160-174.

    Jeanmougin F, Thompson JD, Gouy M, Higgins DG and Gibson TJ (1998) Multiple sequence alignment with Clustal X. Trends Biochem Sci 23:403-405.

    Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

    Nielsen R and Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.

    Nvrnberger T and Brunner F (2002) Innate immunity in plants and animals: Emerging parallels between the recognition of general elicitors and pathogen-associated molecular patterns. Curr Op Plant Biol 5:318-324.

    Rodriguez-Trelles F, Tarrio R and Ayala FJ (2003) Convergent neo-functionalization by positive Darwinian selection after ancient recurrent duplications of the xanthine dehydrogenase gene. Proc Nat Acad Sci 100:13413-13417.

    Schmidt HA, Strimmer K, Vingron M and von Haeseler A (2002) TREE-PUZZLE: Maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504.

    Suzuki Y and Nei M (2002) False-positive selection identified by ML-based methods: Examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol Biol.Evol 21:914-921.

    Suzuki Y and Nei M (2004) Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites. Mol Biol Evol 19:1865-1869.

    Swanson WJ and Aquadro CF (2002) Positive Darwinian selection promotes heterogeneity among members of the antifreeze protein multigene family. J Mol Evol 54:403-410.

    Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F and Higgins DG (1997) The Clustal X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl Ac Res 24:4876-4882.

    Van Loon LC and van Strien EA (1999) The families of pathogenesis-related proteins, their activities, and comparative analysis of PR-1 type proteins. Physiol Mol Plant Pathol 55:85-97.

    Wayne ML and Simonsen KL (1998) Statistical tests of neutrality in the age of weak selection. Tr Ecol Evol 13:236-240.

    Wolfe KH and Li W-H (2003) Molecular evolution meets the genomics revolution. Nature Genet 33(suppl.):255-265.

    Wong WSW, Yang Z, Goldman N and Nielsen R (2004) Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168:1041-1051.

    Yang Z (1997) Phylogenetic Analysis by Maximum Likelihood (PAML), version 1.4. Department of Integrative Biology, University of California, Berkeley.

    Yang Z (2001) Adaptive molecular evolution. In: Bishop M and Cannings C (eds) Handbook of Statistical Genetics. Wiley, New York, pp 327-350.

    Yang Z and Nielsen R (2002) Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol 19:908-917.

    Yang Z and Swanson WJ (2002) Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol Biol Evol 19:49-57.

    Yang Z, Nielsen R, Goldman N and Pedersen AMK (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.(Nicole M. Scherer ; Claud)