A Universal Evolutionary Index for Amino Acid Changes
     * Department of Ecology and Evolution, University of Chicago

    Division of Molecular Biology and Biochemistry, University of Missouri-Kansas City

    E-mail: ciwu@uchicago.edu.

    Abstract

    Different nonsynonymous changes may be under different selective pressure during evolution. Of the 190 possible interchanges among the 20 amino acids, only 75 can be attained by a single-base substitution. An evolutionary index (EI) can be empirically computed for each of the 75 elementary changes as the likelihood of substitutions, relative to that of synonymous changes. We used 280, 1,306, 2,488, and 309 orthologous genes from primates (human versus Old World monkey), rodents (mouse versus rat), yeast (S. cerevisiae versus S. paradoxus), and Drosophila (D. melanogaster versus D. simulans), respectively, to estimate the EIs. In each data set, EI varies more than 10-fold, and the correlation coefficients of EIs from the pairwise comparisons are high (e.g., r = 0.91 between rodent and yeast). The high correlations suggest that the amino acid properties are strong determinants of protein evolution, irrespective of the identities of the proteins or the taxa of interest. However, these properties are not well captured in conventional measures of amino acid exchangeability. We, therefore, propose a universal index of exchange (U): for any large data set, its EI can be expressed as U*R, where R is the average Ka/Ks for that data set. The codon-based, empirically determined EI (i.e., U*R) makes much better predictions on protein evolution than do previous methods.

    Key Words: amino acid substitution; genetic code; protein evolution; amino acid change

    Introduction

    Many previous studies have shown that amino acid pairs are highly disparate in their evolutionary exchangeability (Grantham 1974; Dayhoff, Schwartz, and Orcutt 1978; Graur 1985; Li, Wu, and Luo 1985; Yang, Nielsen, and Hasegawa 1998; Wyckoff, Wang, and Wu 2000). What are the rules governing amino acid substitutions? Amino acid changes have been categorized by distances based on their physicochemical properties (Doolittle 1979; Grantham 1974; Miyata, Miyazawa, and Yasunaga 1979). The greater the distance, the less similar the amino acids are and the less exchangeable they become during evolution (Li, Wu, and Luo 1985; Yang, Nielsen, and Hasegawa 1998; Wyckoff, Wang, and Wu 2000). These measures of amino acid similarity implicitly assume that universal rules govern amino acid substitutions during evolution. If such rules exist, we should find strong correlations among different sets of genes (e.g., rapidly evolving versus slowly evolving genes) and among divergent taxa. The main objective of this study is to measure these correlations. Only if these correlations are high does it make sense to define the rules that govern amino acid exchangeability.

    Of the 190 possible interchanges among the 20 amino acids, only 75 can be achieved by single-base substitutions, which are referred to as elementary amino acid changes. The remaining 115 can occur only by two-base or three-base substitutions. They are composites of two or three elementary changes and should be treated separately. By considering only the elementary changes, we can formulate the neutral evolutionary expectation. The evolutionary index (EI) for each elementary amino acid change is the likelihood of its fixation relative to that of the synonymous changes. EI is, therefore, the equivalent of the Ka/Ks ratio for each elementary change, or the equivalent of the ωij of Yang, Nielsen, and Hasegawa (1998). We should note that, when the amino acid changes are classified into so many bins, the analysis necessarily has to be based on a large collection of genes to achieve adequate resolution.
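
    The count of elementary changes can be checked directly against the genetic code. The following is a minimal sketch, assuming the standard nuclear genetic code; the names CODON_TABLE and elementary_changes are illustrative and do not come from the original analysis.

```python
from itertools import product

BASES = "TCAG"
# Standard nuclear genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def elementary_changes(table=CODON_TABLE):
    """Return the unordered amino acid pairs linked by a single-base substitution."""
    pairs = set()
    for codon, aa in table.items():
        if aa == "*":
            continue  # stop codons are excluded
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                other = table[neighbor]
                if other != "*" and other != aa:
                    pairs.add(frozenset((aa, other)))
    return pairs


pairs = elementary_changes()
n_amino_acids = len(set(AMINO_ACIDS) - {"*"})
n_possible = n_amino_acids * (n_amino_acids - 1) // 2
print(len(pairs), "elementary changes out of", n_possible, "possible interchanges")
```

    Pairs are stored as unordered sets because, without ancestral sequences, the direction of a change cannot be assigned.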

    We measure EIs among four sets of genes from four pairs of species (human versus the macaque monkey, mouse versus rat, S. cerevisiae versus S. paradoxus, and D. melanogaster versus D. simulans).

    Previous empirical approaches have also sought to measure amino acid exchangeability during evolution for each type of amino acid change (e.g., PAM, BLOSUM) (Dayhoff, Schwartz, and Orcutt 1978; Henikoff and Henikoff 1993). Such matrices were used to identify distant homologs (Altschul and Lipman 1990) or to detect selection on amino acid substitutions (Li, Wu, and Luo 1985; Yang, Nielsen, and Hasegawa 1998; Wyckoff, Wang, and Wu 2000). These approaches are based on amino acids, not codons. In the PAM matrix, each entry is related to the observed number of exchanges between amino acid i and amino acid j divided by the expected number of exchanges, which is the product of their respective frequencies in the data set. The EI proposed here formulates the expectation according to the genetic code (see Materials and Methods) and is hence a very different measure from PAM (see also Results).

    The analysis shows that the EI values differ by at least 10-fold between conservative and radical changes, but their relative ranking appears stable across gene sets among eukaryotic organisms. As a result, it is possible to predict amino acid exchangeabilities during evolution accurately. EIij is also an important parameter in most codon substitution models (Goldman and Yang 1994; Muse and Gaut 1994). These models assume either uniform EIij values or a fixed scale among them (such as the one defined by Grantham's [1974] matrix). Our results will have some relevance to the construction of such models, but their applications to modeling are beyond the scope of this study.

    Materials and Methods

    DNA Sequences

    Primate, rodent, and Drosophila sequences were gathered from GenBank. The yeast orthologous sequences were downloaded from the MIT yeast genomics group (http://www-genome.wi.mit.edu/personal/manoli/yeasts/) (Kellis et al. 2003). For primates, 280 gene sequences were determined to be orthologs between human and Old World monkey (OWM [Macaca spp. or Papio spp.]). For rodents, yeast, and Drosophila, the numbers of orthologous sequences used were 1,306, 2,488, and 309, respectively. Coding region sequences were trimmed from GenBank records, and ortholog pairs were aligned using the DNASTAR Megalign program. The protein-coding frame was taken into account. Orthology was assigned by a method similar to that described previously (Makalowski and Boguski 1998), and literature and additional annotation in GenBank were used to assign orthology for pairs of genes. Ortholog pairs were additionally examined by use of NewDiverge (Genetics Computer Group 1999) to obtain information on evolutionary distance (Li 1993). The summary of the data sets is shown in table 1.

    Table 1 Summary of the Data Sets.

    Definition of EIs

    For each elementary amino acid change, we calculated the expected number of changes between them, relative to the number of silent changes (see below). From the set of sequences, we recorded the observed changes between such pairs of amino acids. A small fraction of amino acid changes differ by more than 1 bp and were decomposed into two or three elementary changes. EI is defined as the observed/expected amino acid changes.

    Expected Amino Acid Changes

    All possible one-step changes for all 61 codons in the mammalian nuclear genetic code, except the three stop codons, were determined. In one step, a codon can change in nine different ways and generally can have from zero to three synonymous changes and from six to nine nonsynonymous changes, except in cases of "sixfold" degeneracy such as leucine or arginine. The set of all possible one-step changes from a given codon is the basis of the expectation calculation. We shall use TTT (F) as an example where (F) is the one letter amino acid code.

    The codon change pattern is as follows:

    TTT (F) → CTT (L), ATT (I), GTT (V)

    TTT (F) → TCT (S), TAT (Y), TGT (C)

    TTT (F) → TTC (F), TTA (L), TTG (L)

    We then sum over the codon changes and convert them into the amino acid change patterns below:

    F → F : i

    F → L : i + 2v

    F → S : i

    F → I, V, Y, C : v

    where i is the transition rate and v is the transversion rate. By examining the fourfold degenerate sites in the primate sequences, we found the ratio of transitions to transversions to be 2.2 (table 1) (i.e., i/v = 4.4, because each site has one possible transition and two possible transversions). This ratio is used in the calculation of primate sequence evolution, and an identical approach is used for sequences of other pairs.
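
    The TTT example can be reproduced by enumerating the nine one-step neighbors of a codon and tallying transitions and transversions per target amino acid. This is a minimal sketch, assuming the standard genetic code; the function names one_step_pattern and expected_weights are illustrative only.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def one_step_pattern(codon):
    """Map each target amino acid to (n_transitions, n_transversions) from `codon`."""
    pattern = defaultdict(lambda: [0, 0])
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbor = codon[:pos] + base + codon[pos + 1:]
            target = CODON_TABLE[neighbor]
            if target == "*":
                continue  # changes to stop codons are not counted
            kind = 0 if (codon[pos], base) in TRANSITIONS else 1
            pattern[target][kind] += 1
    return {aa: tuple(counts) for aa, counts in pattern.items()}


def expected_weights(codon, i, v):
    """Relative expected change per target amino acid: n_ts * i + n_tv * v."""
    return {
        aa: n_ts * i + n_tv * v
        for aa, (n_ts, n_tv) in one_step_pattern(codon).items()
    }


pattern = one_step_pattern("TTT")
weights = expected_weights("TTT", i=4.4, v=1.0)  # i/v = 4.4 as in the primate data
for aa in sorted(pattern):
    print("F ->", aa, pattern[aa], round(weights[aa], 2))
```

    The entry for L is one transition plus two transversions, which corresponds to i + 2v in the pattern above; the entry for F itself is the synonymous change.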

    The description above is the expected synonymous and amino acid changes based on a single codon, TTT. A similar description can be made for all 61 nontermination codons. For the set of genes, the calculation of the expectation is then weighted by the number of codons that appear in the sequences of both species. These weighted changes are put into a 64 x 64 codon change matrix (effectively 61 x 61, because stop codons are excluded). The codon change pattern for TTT has been given above. Our approach, therefore, takes into account the amino acid composition and synonymous codon usage. The total number of synonymous changes across all codons in the expectation matrix is set equal to the number of observed synonymous changes, and the number of each type of nonsynonymous change is scaled accordingly. The 64x64 codon matrix was then converted into a 20x20 amino acid matrix by summing over all codons that code for the same amino acid. Again, this conversion has been shown above for TTT.
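
    The weighting and scaling steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: codon_counts and observed_synonymous are hypothetical inputs, the standard genetic code is assumed, and the two directions of each change are summed into one unordered pair as one simple way to obtain a symmetric matrix.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def expected_changes(codon_counts, i, v, observed_synonymous):
    """Expected changes per unordered amino acid pair, scaled to synonymous counts.

    codon_counts: hypothetical dict of codon -> number of occurrences.
    i, v: relative transition and transversion rates.
    observed_synonymous: observed number of synonymous changes in the data set.
    """
    raw = defaultdict(float)
    raw_synonymous = 0.0
    for codon, count in codon_counts.items():
        source = CODON_TABLE[codon]
        if source == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                target = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                if target == "*":
                    continue
                rate = i if (codon[pos], base) in TRANSITIONS else v
                if target == source:
                    raw_synonymous += count * rate
                else:
                    raw[frozenset((source, target))] += count * rate
    # Scale so that the expected synonymous total equals the observed total.
    scale = observed_synonymous / raw_synonymous
    return {pair: value * scale for pair, value in raw.items()}


# Toy input: ten TTT codons and 44 observed synonymous changes (made-up numbers).
toy = expected_changes({"TTT": 10}, i=4.4, v=1.0, observed_synonymous=44.0)
print(round(toy[frozenset(("F", "L"))], 2))
```

    Because the expectation is built from codon counts, it reflects both amino acid composition and synonymous codon usage, as the text notes.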

    The expectation amino acid matrices are "folded" so that they are symmetric because, without ancestry data, there is no directionality among the amino acid changes. The average of the expected number for each elementary change is 166.6, 2342.6, 8114.9, and 184.1 respectively, for primates, rodents, yeast, and Drosophila. Thus, even with 75 classes, there is sufficient resolution for each class.

    Observed Amino Acid Changes

    The numbers of substitutions, synonymous or nonsynonymous, are put into a 64x64 observed matrix. Two-step changes are counted as two one-step changes. Because there are two such pathways (e.g., TTT–TCT–TCC versus TTT–TTC–TCC), they are weighted according to the observed patterns in the 1-bp changes as done before (Li 1993). Three-step changes are ignored, but there are few of them in the data set. The 64x64 codon matrix is again converted into a 20x20 amino acid observation matrix.

    Calculation of EIs

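    If we ignore multiple substitutions at the same nucleotide site, which are less of a problem for closely related species, the uncorrected EI (designated with an asterisk [*]) for each elementary amino acid change is the ratio of the observed to the expected number of changes defined above: EI* = observed changes / expected changes.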

    To make the correction for multiple substitutions, we shall follow the method of Li (1993). Let A' be the actual observed number of transitions per base pair, and let B' be the actual observed number of transversions per base pair. EI* corresponds to the Ka(uncorrected)/Ks(uncorrected) where

    Subscripts 0, 2, and 4 denote nondegenerate, twofold, and fourfold degenerate sites, respectively, and ni denotes the number of sites for each category. The corrected Ka/Ks is the standard usage, as defined in Li (1993), and will not be repeated here. (Note that each EI value is equivalent to Ka/Ks for that elementary amino acid change.) For each of the four data sets, we first obtained a correction factor, c, which is the ratio of uncorrected Ka/Ks to corrected Ka/Ks for the whole data set. The uncorrected EI* values are divided by a factor of c to yield the corrected EI values, as shown in table 2. To check the validity of this correction procedure, we carried out computer simulations as described below.
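
    The correction step amounts to a single rescaling. A minimal sketch follows; the function name and all numerical values are hypothetical and serve only to show the arithmetic.

```python
def correct_ei(uncorrected_ei, kaks_uncorrected, kaks_corrected):
    """Divide each uncorrected EI* by c = (uncorrected Ka/Ks) / (corrected Ka/Ks)."""
    c = kaks_uncorrected / kaks_corrected
    return {change: value / c for change, value in uncorrected_ei.items()}


# Hypothetical numbers: whole-data-set Ka/Ks of 0.18 before and 0.20 after correction.
corrected = correct_ei({"F-L": 0.45, "F-C": 0.09}, 0.18, 0.20)
print({change: round(value, 3) for change, value in corrected.items()})
```

    Because every EI* is divided by the same factor, the correction changes the scale of the index but not the ranking of the elementary changes.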

    Table 2 The Observed EI Values and the Universal Index.

    Computer Simulations

    We compare the input EI values, which were used to simulate sequence evolution, and the estimated EI values based on the simulated DNA sequences. The objective is to ensure the accuracy of our estimation algorithms (including the component that corrects for multiple hits). To simplify the simulations, we concatenate all sequences and treat them as a single gene. This is equivalent to weighting each gene by its length as we did in the estimation. The concatenated sequence was randomly hit with mutations that are either transitions or transversions. The ratio was empirically determined from changes in the fourfold degeneracy sites. The probability of fixation for synonymous changes is 1, and the probability of fixation for the nonsynonymous changes is predetermined. For the latter, we used the EI values obtained from the sequences, as shown in table 2. Each run of simulation is continued until the cumulative number of synonymous substitutions is equal to the observed number. Simulation runs were carried out 1,000 times. The mean and 95% confidence interval for each EI index value were obtained from the runs, as shown in figure 1.
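
    The simulation scheme can be sketched in a few lines. This is a simplified illustration, not the authors' program: it assumes the standard genetic code, treats each input EI as a fixation probability, never fixes changes to stop codons or changes absent from the EI table, and uses a toy sequence and toy EI values. The starting sequence must contain codons with synonymous neighbors, otherwise the loop cannot terminate.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
TRANSITION_PARTNER = {"T": "C", "C": "T", "A": "G", "G": "A"}


def simulate(codons, ei, ts_tv_ratio, target_synonymous, rng):
    """Mutate a copy of `codons` until `target_synonymous` synonymous changes are fixed.

    ei maps frozenset({aa1, aa2}) to a fixation probability (hypothetical input).
    ts_tv_ratio is the ratio of transitions to transversions among mutations.
    """
    seq = list(codons)
    n_syn = n_nonsyn = 0
    p_transition = ts_tv_ratio / (1.0 + ts_tv_ratio)
    while n_syn < target_synonymous:
        idx = rng.randrange(len(seq))
        pos = rng.randrange(3)
        codon = seq[idx]
        old = codon[pos]
        if rng.random() < p_transition:
            new = TRANSITION_PARTNER[old]
        else:
            new = rng.choice(
                [b for b in BASES if b not in (old, TRANSITION_PARTNER[old])]
            )
        mutant = codon[:pos] + new + codon[pos + 1:]
        source, target = CODON_TABLE[codon], CODON_TABLE[mutant]
        if target == "*":
            continue  # assumption: nonsense mutations are never fixed
        if target == source:
            seq[idx] = mutant  # synonymous changes are fixed with probability 1
            n_syn += 1
        elif rng.random() < ei.get(frozenset((source, target)), 0.0):
            seq[idx] = mutant
            n_nonsyn += 1
    return seq, n_syn, n_nonsyn


rng = random.Random(0)
start = ["TTT", "CTG", "GCT", "AAA"] * 25
toy_ei = {frozenset(("F", "L")): 0.5}
final, n_syn, n_nonsyn = simulate(start, toy_ei, 2.2, 20, rng)
print(n_syn, n_nonsyn)
```

    Repeating such runs and re-estimating the EIs from each simulated sequence gives the mean and confidence interval against which the input values are compared.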

    FIG. 1. Estimation of primate EIs from simulated sequences. Predetermined EI values obtained from table 2 were used to simulate the likelihood of amino acid fixation on DNA sequences (see Materials and Methods). The resultant DNA sequences were then used to re-estimate the EIs. These re-estimated EIs are compared with the predetermined input values, as shown. The 95% confidence intervals from the simulation are also given. The x-axis is the ranking of amino acid changes based on the observed primate EIs

    Effect of Elevated Mutation Rates at CpG Sites

    The high rate of mutation at CpG sites for primate and rodent sequences may complicate the calculation of the expected values. However, even when we used the high value reported recently (Hellmann et al. 2003), we found the correlations almost unchanged (data not shown). The reasons for this robustness are (1) CpG sites account for less than 3% of the data sets, (2) in the calculation between species pairs such as human and macaque or mouse and rat, the changes in non-CpG sites predominate, and (3) the changes most affected are Arg↔His, Arg↔Gln, and Met↔Thr, and these changes are affected to a very similar extent in both comparisons. EIs, therefore, take into account variations in codon (and, hence, amino acid) composition, transition versus transversion bias, nucleotide composition, and multiple substitutions.

    Calculation of the Universal Index, U

    Given the high correlations of EIs from different sets of genes of different taxa, we now propose a universal index U. For any data set (or any gene), the predicted EI will be U*R, where R is the weighted average Ka/Ks for that data set. U is scaled such that its weighted average is 1. We used rodent and yeast EIs to obtain U, as they are based on the largest numbers of nonsynonymous substitutions.

    In the plot of the observed EIs for yeast versus rodent, the fitted values (ÊIi(r), ÊIi(y)) for a particular kind (i) of elementary amino acid change should be (Ui*R(r), Ui*R(y)), where R(r) and R(y) are the weighted average Ka/Ks for rodent and yeast, respectively. All 75 fitted points should lie on the fitted line, which passes through the origin with a slope equal to R(y)/R(r). The position of the observed point is (EIi(r), EIi(y)). For each kind of elementary amino acid change, we make the line connecting the observed and fitted values perpendicular to the fitted line. By doing so, we minimize the total residual sum of squares for the yeast and rodent EIs, that is, we minimize Σi [êi(r)² + êi(y)²], where êi(r) = EIi(r) – ÊIi(r) and êi(y) = EIi(y) – ÊIi(y). Now,

    from which we can derive

    for type i elementary amino acid change.
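
    The perpendicularity condition described above can be written out explicitly. The following is a reconstruction from the stated geometry (a fitted line through the origin with direction (R(r), R(y)) and a residual perpendicular to it), offered as a worked sketch rather than as a quotation of the original displayed equations.

```latex
% Residual vector of change i must be orthogonal to the line direction (R^{(r)}, R^{(y)}):
\[
R^{(r)}\bigl(EI_i^{(r)} - U_i R^{(r)}\bigr)
+ R^{(y)}\bigl(EI_i^{(y)} - U_i R^{(y)}\bigr) = 0 ,
\]
% Solving for U_i gives the projection of the observed point onto the fitted line:
\[
U_i = \frac{R^{(r)}\,EI_i^{(r)} + R^{(y)}\,EI_i^{(y)}}
           {\bigl(R^{(r)}\bigr)^2 + \bigl(R^{(y)}\bigr)^2} .
\]
```

    If the observed point already lies on the line, so that EIi(r) = Ui*R(r) and EIi(y) = Ui*R(y), the expression returns Ui unchanged, as expected.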

    Calculation of PAM-4

    The identity percentage for the primate sequences used in this study is 95.8% at the amino acid level. We, therefore, used the PAM-4 substitution matrix with each entry Sij. PAM-4 is derived from PAM-1 by assuming the Markovian transition model (Dayhoff, Schwartz, and Orcutt 1978). Each Sij in PAM-4 is 10 times the log odds ratio of two probabilities: Pr(observed i→j mutation rate) and Pr(mutation rate expected from amino acid frequencies). We denote the ratio of the two probabilities as P(ij), which can be an empirical measure of the relative exchangeability for that type of amino acid change.
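
    Recovering P(ij) from a matrix entry is a one-line inversion of the log odds score. The sketch below assumes a base-10 logarithm, which the text does not state explicitly; the function name is illustrative.

```python
def pam_score_to_ratio(s_ij):
    """Invert S_ij = 10 * log10(P_ij); the base-10 logarithm is an assumption here."""
    return 10 ** (s_ij / 10)


# A score of 0 means observed and expected rates are equal; negative scores mean
# the exchange is seen less often than expected from amino acid frequencies.
for score in (-10, 0, 10):
    print(score, pam_score_to_ratio(score))
```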

    Results

    Calculation of EIs

    The EI values for the 75 elementary amino acid changes are given in table 2. To corroborate the accuracy of the estimation, we conducted computer simulations in which the EI values of, say, primates were used to simulate sequence evolution (see Materials and Methods). The resultant sequences were then subjected to the same estimation procedure. In figure 1, the input EI values and the estimated EI values from simulations are shown, together with the 95% interval. The estimates are highly accurate. We should also emphasize that, in the estimation of EIs, we pool a large number of sites from various regions of many proteins. Each EI represents the mean intensity of selection against each elementary change across all regions of all proteins. In other words, EI is a genomic property and may not be an accurate predictor of changes at individual sites.

    For each data set, our estimated EIs differ by at least 10-fold between the most radical and the most conservative changes, as shown in table 2. The separate estimation of nonsynonymous and synonymous substitutions (Li, Wu, and Luo 1985; Nei and Gojobori 1986) has been one of the most widely used practices in molecular evolutionary studies. It has led to conclusions of positive selection in many recent studies (Wyckoff, Wang, and Wu 2000; Yang, Nielsen, and Hasegawa 1998; Zhang 2000). Table 2 suggests that amino acid changes are highly disparate in their evolutionary dynamics. Therefore, when all nonsynonymous changes are lumped into one class, we might in fact lose much evolutionary information.

    The Correlations Among EIs from Different Taxa

    Attempts to classify amino acid changes according to their evolutionary exchangeability have been briefly noted (Li, Wu, and Luo 1985; Nei and Gojobori 1986; Henikoff and Henikoff 1993). All these classifications implicitly assume that there exist universal rules governing amino acid exchangeability. If amino acid 1 and amino acid 2 are more exchangeable than amino acid 3 and amino acid 4 in some genes, is the former pair also more exchangeable than the latter in most other genes? Unless such consistency can be demonstrated across genes and across taxa, it would be futile to attempt to formulate such rules. By the pairwise comparisons of EI values (table 3), we found that the correlations are generally high, around 0.8. The degree of correlation depends strongly on the size of the data set. The highest correlation (r = 0.91) exists between the rodent and yeast data sets, which have the largest numbers of changes (table 1). The lowest (r = 0.788) lies between the primate and Drosophila data sets with relatively small numbers.
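
    The pairwise comparison is an ordinary Pearson correlation over the elementary changes shared by two data sets. A minimal sketch follows; the two short vectors are made-up values, not entries from table 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical EI values for the same five elementary changes in two taxa.
ei_taxon_a = [0.05, 0.10, 0.20, 0.35, 0.60]
ei_taxon_b = [0.02, 0.06, 0.09, 0.20, 0.28]
r = pearson_r(ei_taxon_a, ei_taxon_b)
print(round(r, 3))
```

    In practice each vector would hold the 75 EI values of one data set, listed in the same order of elementary changes.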

    Table 3 Correlation Between the Observed EI Values Derived from Primate, Rodent, Drosophila, and Yeast.

    EI also correlates well among genes with disparate evolutionary rates within taxa. We divided the primate and rodent genes into rapidly evolving and slowly evolving sets according to their Ka/Ks values. The two sets have approximately equal total numbers of nonsynonymous changes. The correlation coefficient between the EI values of rapidly evolving and slowly evolving genes is 0.8 (table 3). If an amino acid change is conservative among slowly evolving genes, it is also conservative in more rapidly evolving genes. Differences in constraint do not change the rank order of amino acid changes, but instead "slide" the scale up or down.

    The Universal EI (U)

    The high correlations suggest that the relative ranking of EIs is consistent across genes and taxa, as long as the total number of nonsynonymous substitutions in the data set is large (> 4,000). Therefore, a universal ranking, Ui, may exist, which is described by a set of 75 measures linearly correlated with the EIs of table 2. In Materials and Methods, we estimate Ui values as follows:

    U is shown in the last column of table 2. We propose that the predicted EIs for any large data set can be expressed as U*R, where R is the weighted average Ka/Ks for that data set (see Materials and Methods). In other words, EI = U × Ka/Ks.
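
    The relation EI = U × Ka/Ks can be illustrated with a short sketch. It assumes that U is obtained by projecting each observed (rodent, yeast) point perpendicularly onto the line through the origin with slope R(y)/R(r), as described in Materials and Methods; the function names and all numbers are hypothetical.

```python
def universal_index(ei_r, ei_y, r_r, r_y):
    """Project each observed point (EI_r, EI_y) onto the line with direction (r_r, r_y)."""
    denom = r_r ** 2 + r_y ** 2
    return {
        change: (r_r * ei_r[change] + r_y * ei_y[change]) / denom
        for change in ei_r
    }


def predicted_ei(u, r):
    """Predicted EI for a data set whose weighted average Ka/Ks is r."""
    return {change: value * r for change, value in u.items()}


# Made-up observed EIs for two elementary changes in two data sets.
ei_rodent = {"F-L": 0.40, "F-C": 0.05}
ei_yeast = {"F-L": 0.20, "F-C": 0.03}
u = universal_index(ei_rodent, ei_yeast, r_r=0.2, r_y=0.1)
print({change: round(value, 3) for change, value in u.items()})
print({change: round(value, 3) for change, value in predicted_ei(u, 0.15).items()})
```

    The second call shows how the same U values would be rescaled for a third data set with a different average Ka/Ks.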

    In figure 2, we compare the observed EIs with the predicted EIs (= U*R) for each of the data sets from rodents, yeast, primate, and Drosophila, respectively. The correlations are greater than 0.84 and, when the regression lines are fitted through the origin, the slopes are approximately 1.
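
    A regression line forced through the origin has a closed-form slope, the sum of x*y divided by the sum of x squared. The sketch below shows that calculation on made-up predicted and observed values; it is an illustration of the fitting step, not a reproduction of figure 2.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of y = b * x with no intercept: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical predicted (U*R) and observed EI values for four elementary changes.
predicted = [0.05, 0.10, 0.20, 0.40]
observed = [0.06, 0.09, 0.21, 0.39]
slope = slope_through_origin(predicted, observed)
print(round(slope, 3))
```

    A slope close to 1 indicates that U*R reproduces the scale of the observed EIs as well as their ranking.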

    FIG. 2. Correlations between the observed EIs and predicted EIs for the four data sets. The predicted values on the x-axis are based on the universal EIs (U*R; see text)

    Comparing U with the Conventional Measures

    We now compare the observed EIs from these data sets with two conventional measures. The first one is Grantham's distance (Grantham 1974), which considers the volume, weight, polarity, and carbon-composition of amino acids to determine their exchangeability. The correlation between Grantham's distance and EI is –0.668, –0.653, –0.617, and –0.559 for the primate, rodent, yeast, and Drosophila data, respectively. Figure 3a shows the correlation with the primate EIs. Another widely used measure is the PAM matrix (Dayhoff, Schwartz, and Orcutt 1978), which calculates the amino acid exchangeability between closely related species. EI is modeled after PAM, but the latter is amino acid–based, whereas the former is codon-based. The PAM matrix is divergence-dependent and, given that the average divergence for the primate sequences is 4.2%, we used PAM-4 (see Materials and Methods). Figure 3b shows a positive correlation between the observed EIs in primates and the exchangeability index derived from PAM-4 (see Materials and Methods). The correlation coefficient is 0.670, in contrast with the result of figure 2c (r = 0.841) based on the U index of table 2.

    FIG. 3. (a). The correlation between the observed primate EIs of table 2 and the Grantham distance. (b) The correlation between the observed primate EIs and PAM where PAM is Pr(observed amino acid mutation rate)/Pr(mutation rate expected from amino acid frequencies) derived from PAM-4 (see Materials and Methods). These two panels, when compared with figure 2c, reveal the enhanced predictive power of U over the previous measures

    In summary, although Grantham's distance (Grantham 1974) and other measures of amino acid properties (Doolittle 1979; Miyata, Miyazawa, and Yasunaga 1979) capture some aspects of protein evolution, important aspects of amino acid substitutions elude these measurements, as noted earlier (Yang, Nielsen, and Hasegawa 1998). Although PAM is empirically based, it is determined from amino acid frequencies, without regard to the underlying genetic code, and does not generate accurate predictions. EIs predicted from the universal index (U) appear to work better than the conventional measures (figure 2c versus figure 3).

    Discussion

    The main finding of this study is that there is much consistency in the relative ranking of the amino acid evolutionary index among divergent taxa and among genes of diverse functions evolving at different rates. The high correlation indicates that the regression of EIs among distant taxa, such as yeast versus rodents, may be viewed as the universal EIs (the U column of table 2). The predicted EIs of any data set are simply U multiplied by the average Ka/Ks value. The U index predicts the observed EIs much better than the conventional methods (figure 2 versus figure 3; R² > 0.71 versus R² < 0.5).

    An application of the EI results may be in the detection of selection. There are several approaches to detecting the influence of positive selection (McDonald and Kreitman 1991; Wyckoff, Wang, and Wu 2000; Fay, Wyckoff, and Wu 2001, 2002; Yang and Swanson 2002; Bamshad and Wooding 2003). The most straightforward one is to identify genes with Ka/Ks > 1 (e.g., Hughes and Nei 1988), but the method is overly restrictive, given that most amino acid sites are functionally conserved and only some are critical for molecular adaptation (Golding and Dean 1998). A powerful means is to examine multiple orthologous sequences and estimate the effect of positive selection by identifying those sites with Ka/Ks > 1 (Fitch et al. 1997; Nielsen and Yang 1998; Suzuki and Gojobori 1999; Yang et al. 2000).

    On the other hand, there will be an influx of sequences from two closely related species. In that case, it would be most effective to partition the changes by their amino acid properties. It is sometimes, but not always, possible to detect selection in a subset of classes, especially the most conservative class (Wyckoff, Wang, and Wu 2000). Between the amino acids with high EIs, the Ka/Ks values may exceed those in the bottom of table 2 by more than 10-fold. Previous attempts at grouping amino acids into classes often failed to resolve the Ka/Ks values into several distinct estimates. This failure may be the result of the poor correlation between the amino acid classification and their evolutionary exchangeability. In this respect, the U index should provide a better evolutionary basis for classifying amino acids and may lead to a better way of estimating the levels of coding-region divergence and, hence, the influence of positive selection. EI can also be developed further along with other measures currently used to assess the functional effect of amino acid changes in proteins (Yang and Nielsen 1998; Bustamante, Townsend, and Hartl 2000; Wyckoff, Wang, and Wu 2000; Zhang 2000; Chasman and Adams 2001). Notably, our analysis does not take into account the contextual information such as the protein structure, the expression level, and so on. Contextual information from other studies may eventually be used in conjunction with noncontextual information, such as EI, to gain useful insight into changes in protein function during evolution.

    Finally, the approach of focusing on the elementary amino acid changes can be applied to the polymorphism and disease patterns. By contrasting the EIs of divergence and of polymorphism (Yang and Nielsen 1998; Bustamante, Townsend, and Hartl 2000; Wyckoff, Wang, and Wu 2000; Zhang 2000; Chasman and Adams 2001), we may have a glimpse of the operation of natural selection at the amino acid level. Similarly, contrasting EI with disease causation by amino acid changes, we may understand the effect of disease on Darwinian fitness better.

    Acknowledgements

    The authors wish to thank Hurng-Yi Wang, Justin Fay, Marty Kreitman, Wen-Hsiung Li, Manyuan Long, Tony Dean, Steve Dorus, and Christine Malcom for comments and discussions. The work is supported by NIH grants to C.I.W.

    Literature Cited

    Altschul, S. F., and D. J. Lipman. 1990. Protein database searches for multiple alignments. Proc. Natl. Acad. Sci. USA 87:5509-5513.

    Bamshad, M., and S. P. Wooding. 2003. Signatures of natural selection in the human genome. Nat. Rev. Genet. 4:99-111.

    Bustamante, C. D., J. P. Townsend, and D. L. Hartl. 2000. Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica. Mol. Biol. Evol. 17:301-308.

    Chasman, D., and R. M. Adams. 2001. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J. Mol. Biol. 307:683-706.

    Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. Atlas of protein sequence and structure. National Biomedical Research Foundation, Washington, D.C.

    Doolittle, R. F. 1979. Protein evolution. Pp. 1–118 in H. Neurath and R. L. Hill, eds. The proteins. Academic Press, New York.

    Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2001. Positive and negative selection on the human genome. Genetics 158:1227-1234.

    Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2002. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024-1026.

    Fitch, W. M., R. M. Bush, C. A. Bender, and N. G. Cox. 1997. Long term trends in the evolution of H(3) HA1 human influenza type A. Proc. Natl. Acad. Sci. USA 94:7712-7718.

    Genetics Computer Group. 1999. The Wisconsin Package. Version 10.0. Madison, Wis.

    Golding, G. B., and A. M. Dean. 1998. The structural basis of molecular adaptation. Mol. Biol. Evol. 15:355-369.

    Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

    Grantham, R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862-864.

    Graur, D. 1985. Amino acid composition and the evolutionary rates of protein-coding genes. J. Mol. Evol. 22:53-62.

    Hellmann, I., S. Zollner, W. Enard, I. Ebersberger, B. Nickel, and S. Paabo. 2003. Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 13:831-837.

    Henikoff, S., and J. G. Henikoff. 1993. Performance evaluation of amino acid substitution matrices. Proteins 17:49-61.

    Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.

    Kellis, M., N. Patterson, M. Endrizzi, B. Birren, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241-254.

    Li, W. H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.

    Li, W. H., C. I. Wu, and C. C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.

    Makalowski, W., and M. S. Boguski. 1998. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl. Acad. Sci. USA 95:9407-9412.

    McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.

    Miyata, T., S. Miyazawa, and T. Yasunaga. 1979. Two types of amino acid substitutions in protein evolution. J. Mol. Evol. 12:219-236.

    Muse, S. V., and B. S. Gaut. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11:715-724.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

    Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.

    Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:1315-1328.

    Wyckoff, G. J., W. Wang, and C. I. Wu. 2000. Rapid evolution of male reproductive genes in the descent of man. Nature 403:304-309.

    Yang, Z., and R. Nielsen. 1998. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J. Mol. Evol. 46:409-418.

    Yang, Z., R. Nielsen, N. Goldman, and A. M. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.

    Yang, Z., R. Nielsen, and M. Hasegawa. 1998. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol. Biol. Evol. 15:1600-1611.

    Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol. Biol. Evol. 19:49-57.

    Zhang, J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50:56-68.

    (Hua Tang*, Gerald J. Wyck)