A Low Rate of Simultaneous Double-Nucleotide Mutations in Primates
    《分子生物学进展》 (Molecular Biology and Evolution), 2003, Issue 1
    Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden

    Abstract

    The occurrence of double-nucleotide (doublet) mutations is contrary to the normal assumption that point mutations affect single nucleotides. Here we develop a new method for estimating the doublet mutation rate and apply it to more than a megabase of human-chimpanzee-baboon genomic DNA alignments and more than a million human single-nucleotide polymorphisms. The new method accounts for the effect of regional variation in evolutionary rates, which may be a confounding factor in previous estimates of the doublet mutation rate. Furthermore we determine sequence context effects by using sequence comparisons over a variety of lineage lengths. This approach yields a new estimate of the doublet mutation rate of 0.3% of the singleton rate, indicating that doublet mutations are far rarer than previously thought. Our results suggest that doublet mutations are unlikely to have caused the correlation between synonymous and nonsynonymous substitution rates in mammals, and also show that regional variation and sequence context effects play an important role in primate DNA sequence evolution.

    Key Words: doublet mutations • tandem substitutions • regional variation • sequence context effects • synonymous-nonsynonymous correlation

    Introduction

    Although point mutations are usually assumed to involve single nucleotides, both direct observations of mutational events in the laboratory and comparative analyses of DNA sequences have suggested that simultaneous adjacent double-nucleotide mutation events (hereafter termed doublet mutations) do occur in nature. Recently, Averof et al. (2000) used sequence comparisons of coding and noncoding DNA from diverse organisms to reveal a high frequency of doublet mutations, about 2% of the single-nucleotide mutation rate. We have attempted to develop an improved estimate of the doublet mutation rate, focusing on mutation processes in primate noncoding DNA sequences. Noncoding sequences are expected to provide a more reliable indication of underlying mutation processes than coding sequences, because they are less likely to be affected by selection.

    The importance of quantifying the doublet mutation rate becomes evident if we consider the consequences of incorrectly assuming that all point mutations occur at single nucleotides. Fundamentally, the pattern of neutral evolution is determined by the pattern of mutation, so unless we understand mutational processes we may falsely reject neutrality. For example, the classic test of the neutral theory using the index of dispersion of substitutions (see ) will be biased by doublet mutations. The neutral prediction, that the index of dispersion is one, is based on the Poisson model assumptions that mutations are random, independent, and single. Thus doublet mutations will generate overdispersion of the molecular clock: if all mutations are doublets then, relative to singleton mutations, the mean number of substitutions will be doubled but the variance in the number of substitutions will be quadrupled; hence the index of dispersion will be two rather than one. More generally, the explicit models of molecular evolution required for substitution rate estimation and phylogenetic inference usually involve single nucleotide changes (e.g., ). If doublet mutations are common then such models, and the results they generate, may be biased.
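
    The factor-of-two argument can be checked numerically. The sketch below is an illustration under assumed settings, not an analysis from this study: the number of mutation events per lineage is Poisson with a hypothetical mean of 5, 20,000 lineages are simulated, and each event changes either one site (singletons) or two sites (doublets). The helper names are hypothetical.

```python
import random
import statistics


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Poisson variate: count unit-rate exponential arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def index_of_dispersion(sites_per_event: int, lam: float = 5.0,
                        lineages: int = 20_000, seed: int = 1) -> float:
    """Variance/mean of substitution counts when every mutation event
    changes `sites_per_event` adjacent sites."""
    rng = random.Random(seed)
    counts = [sites_per_event * poisson_draw(rng, lam) for _ in range(lineages)]
    return statistics.variance(counts) / statistics.mean(counts)


r_singleton = index_of_dispersion(1)  # close to 1, as for a Poisson process
r_doublet = index_of_dispersion(2)    # close to 2: mean doubles, variance quadruples
print(f"singletons: R = {r_singleton:.2f}; doublets: R = {r_doublet:.2f}")
```

    Because both runs reuse the same seed, the doublet counts are exactly twice the singleton counts, so the doublet index is exactly twice the singleton index, which isolates the effect described in the text.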

    Finally, estimating the level of doublet mutations should help resolve the debate concerning the causes of the correlation between synonymous and nonsynonymous substitution rates in mammals. Doublets provide a mutational (i.e., neutral) explanation for why nonsynonymous changes in protein-coding genes (those changes in DNA sequence which affect the amino acid sequence) covary with synonymous changes (those changes in DNA sequence which do not affect the amino acid sequence owing to the degeneracy of the genetic code). For example, changes at the second codon position are always nonsynonymous while changes at the third codon position are mostly synonymous: thus a doublet mutation can simultaneously generate both a synonymous and a nonsynonymous mutation. Note, however, that there is also a methodological debate over whether there really is a synonymous-nonsynonymous correlation in mammals .
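
    The statement about codon positions can be checked against the standard genetic code. This is a sketch assuming the standard nuclear code and counting only single-base changes between sense codons (changes to or from stop codons are skipped); the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons TTT, TTC, TTA, ..., GGG
# (TCAG order at each position); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_fraction(position: int) -> float:
    """Fraction of single-base sense-to-sense changes at a codon position
    (0, 1 or 2 for first, second, third) that leave the amino acid unchanged."""
    synonymous = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            if CODE[mutant] == "*":
                continue
            total += 1
            synonymous += CODE[mutant] == aa
    return synonymous / total


for pos in range(3):
    print(f"codon position {pos + 1}: {synonymous_fraction(pos):.2f} synonymous")
```

    The second-position fraction comes out as zero and the third-position fraction is well above one half, so a doublet spanning positions 2 and 3 will typically produce one nonsynonymous and one synonymous change.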

    Materials and Methods

    Substitution Data

    Primate genomic alignments were built in a number of stages. Chimpanzee and baboon sequence data orthologous to regions of human chromosome 7, generated by the NIH Intramural Sequencing Center (NISC) Comparative Sequencing Initiative (see) , were retrieved using National Center for Biotechnology Information (NCBI) Entrez . Those sequences reported as "working draft sequence" were broken up into their constituent unordered pieces. Regions of the human genome orthologous to the chimpanzee and baboon sequences were identified by BLAST searches against the human genome. These BLAST searches provided positional information for the removal of overlapping sequences. Human-chimpanzee-baboon alignments were generated using the default values of ClustalW as implemented at the Human Genome Mapping Project (HGMP) . The resulting alignments were checked to remove poorly aligned regions, thereby ensuring high-quality genomic alignments, insensitive to changes in alignment parameters (see). In all, 43 human-chimpanzee-baboon alignments with a total ungapped and unmasked length of 1.7 Mb were obtained. Details of the sequences in our alignments are available on request.

    The positions of alignments in human contigs were determined by BLAST searches against the human genome, and comparison to the contig annotation files available at NCBI allowed the masking of coding regions within alignments. Repetitive sequence elements were masked with RepeatMasker (A.F.A. Smit and P. Green, unpublished). Repetitive elements were masked because they are known to evolve at higher rates than nonrepetitive sequence , and so may generate strong regional variation and sequence context effects. Microsatellites were masked using the program Sputnik (Abajian, unpublished, ) because microsatellites cause alignment problems and show unusual substitution patterns.

    Lineage-specific substitutions were inferred using parsimony (e.g., if the human-chimp-baboon sequences are A-C-C, then a C to A change is inferred on the human lineage). For both pairwise and lineage-specific substitutions, we ignored singletons that may be due to the hypermutability of methylated CG dinucleotides (both CG to TG and CG to CA). To nullify the effect of CG mutation on doublet estimates, all potential CG-mediated tandems (CH to TG and DG to CA, where H is A/C/T and D is A/G/T) were removed, as were the corresponding near-neighbors. Tandems are pairs of adjacent differences, whereas near-neighbors are pairs of differences separated by one nucleotide. The corresponding near-neighbors had to be removed because not all putative CG tandems are generated by CG hypermutation; removing only the tandems would therefore bias the tandem count downward relative to the near-neighbor count.
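
    The parsimony rule and the singleton CpG filter can be sketched as follows. This is a minimal illustration assuming one ungapped aligned column per call; the function names are hypothetical, and the removal of CG-mediated tandems and their matching near-neighbors is not included.

```python
from typing import Optional, Tuple

Change = Tuple[str, Optional[str], Optional[str]]


def lineage_change(human: str, chimp: str, baboon: str) -> Optional[Change]:
    """Place a human-chimpanzee difference on one lineage by parsimony,
    using baboon as the outgroup. Returns (lineage, ancestral, derived),
    or None when human and chimpanzee agree."""
    if human == chimp:
        return None
    if chimp == baboon:                  # e.g. A-C-C: C -> A on the human lineage
        return ("human", chimp, human)
    if human == baboon:
        return ("chimp", human, chimp)
    return ("uninformative", None, None)  # three different bases


def is_cpg_singleton(ancestral: str, derived: str) -> bool:
    """True for dinucleotide changes attributable to CG hypermutation."""
    return ancestral == "CG" and derived in ("TG", "CA")


print(lineage_change("A", "C", "C"))  # ('human', 'C', 'A')
print(is_cpg_singleton("CG", "TG"))   # True
```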

    Because there are several alternative strategies for counting different types of substitutions, we present our algorithms in some detail here. Slight alterations to the algorithms made little difference to estimates of the doublet mutation rate (results not shown). All alignments were analyzed separately after masking for genes and repeats. The following classes of sites were defined for both the human-chimp and human-baboon pairwise comparisons and for the human and chimp lineages: substitutions (s), conserved sites (c), and masked sites (m). In addition, some sites in the human and chimp lineages were undefined (u) (i.e., parsimony uninformative). Potential CpG sites were then masked (reclassified from s to m) as described above (note that CpG masking differs between the pairwise comparisons and the lineages, because in the latter case the direction of substitution is known). The different types of substitutions were then counted. The total number of substitutions was found by counting all the "s"-sites; tandems were found by counting all adjacent pairs of "s"-sites (including overlapping pairs, e.g., "sss" contains two tandems), and near-neighbors were identified by counting all pairs of "s"-sites separated by one site (which could belong to any class; again overlapping pairs were allowed, e.g., "scsms" contains two near-neighbors). Finally, the effective length of the alignment, required for calculating the measure of the doublet mutation rate, was found by counting all the "s"-sites and "c"-sites.
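
    The counting rules translate directly into code. The sketch below assumes each alignment has already been reduced to a string of per-site classes ("s", "c", "m", "u") after gene, repeat, and CpG masking; the function name is hypothetical.

```python
def count_site_classes(classes: str) -> dict:
    """Count substitutions, tandems, near-neighbors and effective length
    from per-site classes: s = substitution, c = conserved, m = masked,
    u = undefined."""
    s_positions = [i for i, cls in enumerate(classes) if cls == "s"]
    s_set = set(s_positions)
    return {
        "substitutions": len(s_positions),
        # adjacent 's' pairs; overlapping pairs count, so "sss" gives 2
        "tandems": sum(1 for i in s_positions if i + 1 in s_set),
        # 's' pairs one site apart; the middle site may be of any class
        "near_neighbors": sum(1 for i in s_positions if i + 2 in s_set),
        # only substitution and conserved sites contribute
        "effective_length": classes.count("s") + classes.count("c"),
    }


print(count_site_classes("sss"))
print(count_site_classes("scsms"))
```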

    Polymorphism Data

    We downloaded the tenth release of The SNP Consortium (TSC) database, consisting of 1,255,326 mapped single-nucleotide polymorphisms (SNP's) from . Tandems and near-neighbors could be identified because the polymorphism data files gave the base position of each SNP along the chromosomes as well as its position within human contigs. All SNP's identified as coding by BLAST searches of the surrounding sequence against the June 2001 version of the human mRNA RefSeq database were removed. A previous study found no evidence that polymorphism tandems occur preferentially in repetitive sequences, and so the polymorphism data were not masked for repeats. Potential CG-mediated singletons, tandems, and near-neighbors were removed as for substitutions. For our polymorphism data, in which both the direction of mutation and the linkage patterns were unknown, the complementary removal of near-neighbors is particularly important, because many of the putative CG tandems (identified as Y(R/K/S) and (M/S/Y)R, where Y is C/T, R is A/G, K is G/T, S is C/G, and M is A/C) will not have been caused by CG hypermutation. Runs of three or more adjacent polymorphisms were also masked, because they may result from sequencing errors; this procedure made no qualitative difference to our results (data not shown). Tandem polymorphisms were counted as all pairs of polymorphisms one base pair apart; near-neighbor polymorphisms were counted as all pairs of polymorphisms two base pairs apart.
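
    A sketch of the polymorphism counts, assuming the input is a list of mapped SNP base positions on one chromosome. The CG-related filters described above are omitted, and the function names are hypothetical.

```python
def mask_adjacent_runs(positions, min_run=3):
    """Drop every SNP that belongs to a run of `min_run` or more
    consecutive base positions (possible sequencing errors)."""
    kept, run = [], []
    for p in sorted(set(positions)):
        if run and p == run[-1] + 1:
            run.append(p)
            continue
        if len(run) < min_run:
            kept.extend(run)
        run = [p]
    if len(run) < min_run:
        kept.extend(run)
    return kept


def count_snp_pairs(positions):
    """Return (tandems, near_neighbors): SNP pairs 1 bp and 2 bp apart."""
    kept = mask_adjacent_runs(positions)
    present = set(kept)
    tandems = sum(1 for p in kept if p + 1 in present)
    near_neighbors = sum(1 for p in kept if p + 2 in present)
    return tandems, near_neighbors


snps = [100, 101, 200, 202, 300, 301, 302, 400]
print(count_snp_pairs(snps))  # (1, 1): the run 300-302 is masked
```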

    Results

    Estimating the Doublet Mutation Rate

    Averof et al. (2000) estimated the relative rate of doublet mutations, here termed Doe, by comparing the observed number of tandem substitutions (To) with the expected number of tandem substitutions (Te), calculating the expected number of tandem substitutions by assuming that a substitution was equally likely at every site:

    $$D_{oe} = \frac{T_o - T_e}{S_o - 2\,(T_o - T_e)} \qquad (1)$$

    So is the observed number of substitutions, and so So minus twice the inferred number of doublets gives the number of singletons. The terms doublet and tandem refer to adjacent bases, but doublets refer to the (inferred) mutation process, whereas tandems refer to differences between compared sequences. Thus we may observe many tandem differences, but we cannot infer doublet mutations unless we can show that the observed number of tandem differences exceeds the expected number.

    The assumption of no rate variation between sites does not seem generally applicable , however, and so it is worth considering the effect of rate variation between sites. We define two types of rate variation: sequence context and regional variation. By sequence context we refer to the dependence of mutation rates on the identities of nearby bases , and by regional variation we refer to all larger scale variation in mutation rates.

    Regional variation is suggested by KS variation in mammals, between genes throughout the genome as well as within genes , and by variation in noncoding substitution rates in primates . If there is regional variation in mutation rates, then it is easy to see that the expected number of tandems will be underestimated. For instance, consider 10 kb compared between two samples with a mean distance of 10%; if there is no variation in substitution rates, we expect 100 tandems (length of sequence times the square of the distance). Now imagine that there is rate variation with 9 kb at 8% and 1 kb at 28%, in which case although the mean distance is the same the expected number of tandems rises to 136. So, unless the regional variation is accounted for, 36 doublet mutations will be falsely inferred.
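
    The worked example can be reproduced in a few lines. The sketch assumes that within each region every site differs independently with that region's distance d, so a region of length n contributes about n·d² adjacent pairs; boundary effects between regions are ignored. Under the same assumption each region also contributes about n·d² pairs two bases apart. The function names are hypothetical.

```python
def expected_adjacent_pairs(regions):
    """Expected tandems (or near-neighbors) when sites differ independently
    at their region's distance d: sum over regions of length * d**2."""
    return sum(length * d ** 2 for length, d in regions)


def mean_distance(regions):
    """Length-weighted mean distance over all regions."""
    total_length = sum(length for length, _ in regions)
    return sum(length * d for length, d in regions) / total_length


uniform = [(10_000, 0.10)]
regional = [(9_000, 0.08), (1_000, 0.28)]

print(mean_distance(regional))            # about 0.10, same as the uniform case
print(expected_adjacent_pairs(uniform))   # about 100
print(expected_adjacent_pairs(regional))  # about 136
```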

    Sequence context occurs on a much finer scale than regional variation, but it can generate tandems in a similar way. For example, it has been shown that, with an adjacent 5' G, the mutagenic guanine product dG-AF induces mutations at a much higher frequency with an adjacent 3' C than with an adjacent 3' G . Thus if a GGG trinucleotide undergoes a primary mutation to GGC, then the chance of a secondary mutation affecting the middle nucleotide is greatly increased. Thus sequence context can make the generation of a tandem more likely than expected if mutations were independent.

    Here we present a simple method to address the problem of regional variation. Instead of comparing the observed number of tandems to the number expected under a given model, we compare the observed number of tandems (To), pairs of differences one base apart, with the observed number of what we refer to as near-neighbors (NNo), pairs of differences two bases apart, as in AGGTT compared to ATGAT (differences at the second and fourth positions). Because regional variation generates just as many additional near-neighbors as additional tandems, the difference To − NNo gives the number of doublet mutations inferred from the data. The doublet mutation rate, Dtn, is then quantified relative to the number of singleton mutations, as for equation 1:

    $$D_{tn} = \frac{T_o - NN_o}{S_o - 2\,(T_o - NN_o)} \qquad (2)$$

    The underlying assumption of our new method is that there are only two sorts of mutations, singletons and doublets; in other words there are no simultaneous near-neighbor mutations.
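
    Both estimators share one structure: inferred doublets (tandems minus a baseline) divided by inferred singletons (So minus twice the doublets). The sketch below assumes Te is computed as L·(So/L)², following the "length of sequence times the square of the distance" rule used in the regional-variation example. The counts in the usage lines are invented for illustration and are not data from this study; the function names are hypothetical.

```python
def expected_tandems(substitutions: int, effective_length: int) -> float:
    """Te under equal substitution probability at every site: L * (So / L) ** 2."""
    return substitutions ** 2 / effective_length


def doublet_rate(substitutions: int, tandems: float, baseline: float) -> float:
    """Doublets (tandems minus baseline) relative to singletons
    (So minus twice the doublets). baseline = Te gives Doe;
    baseline = NNo gives Dtn."""
    doublets = tandems - baseline
    return doublets / (substitutions - 2 * doublets)


# Hypothetical counts for one alignment (illustration only).
s_o, length, t_o, nn_o = 10_000, 1_000_000, 140, 110
t_e = expected_tandems(s_o, length)  # 100.0
d_oe = doublet_rate(s_o, t_o, t_e)   # 40 / 9920
d_tn = doublet_rate(s_o, t_o, nn_o)  # 30 / 9940
print(f"Doe = {d_oe:.2%}, Dtn = {d_tn:.2%}")
```

    Because expected tandems are calculated separately for each alignment in this study, a fuller version would presumably compute Te per alignment and sum the results before applying the ratio; that aggregation step is an assumption here.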

    The formula for Dtn also corrects for sequence context effects to some extent, because sequence context effects generate near-neighbors as well as tandems. The correction will certainly not be perfect, however, because such effects appear to be much stronger at a distance of one base than at two bases . The strongest known case of sequence context, the hypermutability of CG dinucleotides, can be explicitly accounted for by ignoring potentially affected differences (see Materials and Methods). Such an approach is not possible if weak sequence context effects are common, and so our approach for dealing with them is to consider sequence differences along lineages of different lengths. Short lineages accumulate few primary mutations that could promote secondary mutations, so sequence context effects should be weak on them.
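
    Why lineage length matters can be illustrated with a toy model; it is an assumption for illustration, not a model fitted in this study. Primary mutations hit each site independently, and each primary mutation raises the mutation probability of its two immediate neighbors for the rest of the lineage. The toy contains no true doublet mutations, so any positive Dtn it produces is spurious, and the excess should grow with lineage length because secondary mutations need both a primary mutation and time. The parameters mu and k and all function names are hypothetical.

```python
import random


def simulate_lineage(length, time, mu=0.01, k=0.02, seed=0):
    """Toy context model: each site suffers a primary mutation with
    probability mu * time, at a uniform time s; each neighbor of a
    primary site then mutates with probability k * (time - s)."""
    rng = random.Random(seed)
    mutated = set()
    for i in range(length):
        if rng.random() < mu * time:
            mutated.add(i)
            s = rng.uniform(0.0, time)
            for j in (i - 1, i + 1):
                if 0 <= j < length and rng.random() < k * (time - s):
                    mutated.add(j)
    return mutated


def apparent_dtn(mutated):
    """Dtn computed from the positions that differ at the end of the lineage."""
    s_o = len(mutated)
    t_o = sum(1 for i in mutated if i + 1 in mutated)
    nn_o = sum(1 for i in mutated if i + 2 in mutated)
    doublets = t_o - nn_o
    return doublets / (s_o - 2 * doublets)


short_lineage = apparent_dtn(simulate_lineage(200_000, time=1.0))
long_lineage = apparent_dtn(simulate_lineage(200_000, time=5.0))
print(f"short lineage: {short_lineage:.3f}; long lineage: {long_lineage:.3f}")
```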

    Simulations.

    We performed simulations to confirm that both the Doe and Dtn methods are robust to two notable features of DNA sequence evolution: (i) variation in the rates at which different types of DNA mutations occur and (ii) rate variation between sites (assuming that such rate variation is random with respect to genomic position, i.e., that there is no systematic regional variation). Intuitive reasoning suggests that neither factor should affect the methods, because neither affects the relative positions at which mutations appear. We simulated DNA sequence evolution using the evolver program in the PAML package . In all cases, we generated three sequences of 1 million base pairs each according to a tree with distances similar to those in the human-chimpanzee-baboon tree, ((human:0.005, chimpanzee:0.005):0.04, baboon:0.045), and we analyzed the simulated sequences using the same methods employed for the human-chimpanzee-baboon alignments (see Materials and Methods; for simplicity we did not apply CpG masking to the simulated data). Four sets of simulated data were generated using different DNA substitution models: (1) the JC69 model , (2) the HKY85 model with equal base frequencies and a transition/transversion ratio of 5, (3) as in (2), but with unequal base frequencies C = G = 0.4 and A = T = 0.1, and (4) as in (3), but with rate variation between sites following a discretized gamma distribution with shape parameter 0.5 and eight rate categories.
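
    The intuition that site-to-site rate variation alone does not create tandems can be checked with a much simpler stand-in for the PAML simulations; this sketch is not a reimplementation of evolver. It assumes a single pairwise JC69 comparison over a total distance of 0.09 (the human-to-baboon path in the tree given above) and continuous rather than discretized gamma rates (shape 0.5, mean 1), drawn independently for each site. The function name is hypothetical.

```python
import math
import random


def simulate_pairwise(length=400_000, distance=0.09, shape=0.5, seed=3):
    """Two sequences separated by `distance` substitutions per site under
    JC69, with an independent gamma rate (mean 1) at every site.
    Returns (So, To, NNo, Te)."""
    rng = random.Random(seed)
    differs = []
    for _ in range(length):
        rate = rng.gammavariate(shape, 1.0 / shape)  # mean = shape * scale = 1
        # JC69 probability that the two sequences differ at this site
        p_diff = 0.75 * (1.0 - math.exp(-4.0 * distance * rate / 3.0))
        differs.append(rng.random() < p_diff)
    s_o = sum(differs)
    t_o = sum(1 for i in range(length - 1) if differs[i] and differs[i + 1])
    nn_o = sum(1 for i in range(length - 2) if differs[i] and differs[i + 2])
    t_e = s_o ** 2 / length
    return s_o, t_o, nn_o, t_e


s_o, t_o, nn_o, t_e = simulate_pairwise()
print(f"So = {s_o}, To = {t_o}, NNo = {nn_o}, Te = {t_e:.0f}")
```

    Because the rates are independent of position, differences at neighboring sites stay independent, so To, NNo, and Te should agree up to sampling noise.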

    The results of the simulations are shown in Table 1, in which each simulated data set is analyzed in three comparisons of different lineage lengths corresponding to those performed on the human-chimpanzee-baboon data described below. In none of the 12 sets of analyses is there a significant difference between the observed and expected numbers of tandems (To and Te) or between the observed numbers of tandems and near-neighbors (To and NNo). These results suggest that neither the Doe nor the Dtn method is likely to generate biases through a failure to provide an explicit model of DNA evolution. The simulated "human-baboon" comparisons also suggest that the use of parsimony, or more specifically the failure to account for multiple substitutions at the same site, does not bias the Doe or Dtn methods. As one moves from simulation (1) to simulation (4), the failure to account for multiple hits leads to increasing underestimation of substitution rates: thus So, To, and NNo all decrease. However, this bias does not seem to affect the difference between To and Te or that between To and NNo. Therefore, although both the Doe and Dtn methods are somewhat ad hoc, the simulations indicate that they are not biased by multiple hits, nonregional rate variation between sites, or variation in the rates of different types of DNA mutations. Thus, although it may ultimately be desirable to address the issue of doublet mutations through explicit DNA models combined with likelihood or Bayesian approaches, the development of such methods does not appear to be a pressing concern.

    Table 1. Numbers of Different Types of Substitutions in Simulated Sequence Data (table not reproduced)

    Substitutions

    Nucleotide substitutions were identified in the noncoding, nonrepetitive regions of a large set of human-chimpanzee-baboon alignments (see Materials and Methods). With the three-species alignments, several different comparisons were possible, each considering substitutions along lineages of different lengths. The human-baboon pairwise comparison represents the longest lineage, with a total of 50 million years of evolution (twice the time since the common ancestor, 25 million years ago). The human-chimpanzee pairwise comparison represents about 10 million years of evolution, given species divergence 5 million years ago . Finally, the human and chimpanzee lineage-specific substitutions each represent roughly 5 million years of evolution (i.e., the time since species divergence), and they can be combined because substitutions in the two lineages are independent when parsimony is used to infer substitutions.

    For all three substitution comparisons, both Doe and Dtn were calculated, and Table 2 shows that in each case Dtn is much lower than Doe (human-baboon pairwise: Dtn = 0.95%, Doe = 1.74%; human-chimpanzee pairwise: Dtn = 0.53%, Doe = 1.14%; human and chimpanzee lineage-specific: Dtn = 0.35%, Doe = 0.80%). Because Doe assumes no rate heterogeneity, the difference between Dtn and Doe mainly reflects the effect of regional variation within the 43 alignments, which range in length from 12 kb to 107 kb (but not variation between alignments, because expected numbers of tandems were calculated separately for each alignment). The magnitude of the difference shows the necessity of accounting for regional variation in substitution rates.

    Table 2. Numbers of Different Types of Substitutions and Polymorphisms in Primate Sequence Data (table not reproduced)

    It is also clear from Table 2 that both estimates of the doublet mutation rate decrease as lineage length decreases, with a greater than twofold difference in both Dtn and Doe between the human-baboon pairwise substitutions and the human and chimpanzee lineage-specific substitutions. This effect of lineage length is consistent with sequence context effects, because longer lineages provide more scope for secondary, nonindependent mutations. Lineage length effects are also apparent in noncoding primate data reported by , because most of their doublet mutations were inferred to be in the longest lineage (see their table 2 and figure 3).

    Polymorphisms

    Given the effect of lineage length on estimates of the doublet mutation rate, the obvious next step after considering substitutions among primates is to turn to human polymorphisms. Not only is the human genome the subject of intensive efforts to determine polymorphisms , but humans also have a lower effective population size than other primates, and so offer the shortest possible lineage length among the primates, and thus the weakest sequence context effects. Because the average age of neutral polymorphisms, measured in generations, is four times the effective population size , a human effective population size of 10,000 and a generation time of 25 years indicate that human polymorphisms have an average age of roughly 1 million years (possibly less if ancestral generation times and effective population sizes were lower). Thus the lineage length of human polymorphisms is roughly one-fifth that of the human and chimpanzee lineage-specific substitutions (see ).
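
    The age calculation is simple arithmetic; the sketch below only restates the values given in the text (Ne = 10,000, 25-year generations, roughly 5 million years for each lineage-specific branch). The function name is hypothetical.

```python
def mean_polymorphism_age_years(effective_size: int, generation_years: int) -> int:
    """Mean age of neutral polymorphisms: 4 * Ne generations, in years."""
    return 4 * effective_size * generation_years


age = mean_polymorphism_age_years(10_000, 25)  # 1,000,000 years
lineage_specific_years = 5_000_000             # human or chimpanzee branch
print(age, lineage_specific_years / age)       # 1000000 5.0
```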

    An excess of tandem SNP's relative to the number expected assuming no heterogeneity in polymorphism levels has previously been observed on human chromosome 22 (Dawson et al. 2001). Applying the Doe method to the singleton and doublet data given by Dawson et al. generates a doublet mutation rate of 0.70%. This value, however, is likely to be a serious overestimate, because CG hypermutation-mediated tandems (see Materials and Methods) were not removed from their analysis (indeed, Dawson et al. found TG/CA tandems to be the most common), and because there is extreme regional variation in levels of polymorphism (see Dawson et al., Figure 1).

    The estimate of Dtn from the TSC SNP data, 0.27%, appears reasonable, although it should be viewed with caution because the sampling of polymorphisms across the human genome is clearly patchy. The doublet mutation rate would be underestimated only if the sampling process were biased toward detecting polymorphisms at single nonadjacent sites, which seems unlikely. Just as with the substitution comparisons, Dtn is much less than Doe. The large difference between the two estimates, and the 25-fold excess of observed over expected tandems, probably reflect the fact that both polymorphism levels (see Figure 2b and supplementary material of ) and sampling intensity vary greatly across human chromosomes.

    Discussion

    Comparing the Dtn estimates for the polymorphism and substitution data should allow sequence context effects to be assessed. When lineages are very short, sequence context effects should be minimal, but it is not clear how to quantify this except by comparing different lineage lengths. If one considers lineages of progressively shorter length and the Dtn estimates approach an asymptotic minimum, sequence context effects may be inferred to be minimal at the shortest lineage lengths (note, however, that very short lineages contain relatively little information and thus yield imprecise estimates of the doublet mutation rate). The fact that the Dtn estimate for human polymorphisms (0.27%) is only slightly less than that for the human and chimpanzee lineage-specific substitutions (0.35%) suggests that both estimates are inflated only marginally by sequence context effects. Our best estimate of the doublet mutation rate relative to the singleton mutation rate is therefore 0.3%, the average of the polymorphism and lineage-specific substitution values. This value is over six times lower than the value of 2% obtained by Averof et al. (2000).

    It is worth considering the various possible types of doublet mutations to see whether the results of mutagenesis studies match those of sequence comparisons. For example, it is interesting to ask whether CC to TT mutations, which have been found in the p53 gene as a result of UV exposure (Nakazawa et al. 1994), are revealed by our sequence comparisons. The most appropriate data for this purpose are the human and chimpanzee lineage-specific substitutions, because using the baboon as an outgroup allows the direction of substitution to be inferred.

    Because mutations on either strand must be considered, we compare tandems and near-neighbors for GG to AA changes as well as CC to TT changes (GNG to ANA and CNC to TNT for near-neighbors). There are six tandems and two near-neighbors in our sequence comparisons; this excess of tandems is consistent with mutagenesis studies, although the difference is not significant because the numbers are small. The limited data in the present study preclude a full analysis of all possible types of doublets, so additional SNP data should help such studies considerably. Linkage information will allow a more discriminating test of doublet mutations: apart from rare recombination events, doublet SNP's should be in perfect linkage disequilibrium. Outgroup sequences, such as those from chimpanzee, will allow the direction of mutation to be determined in polymorphism studies, and thus permit more accurate analysis of mutation processes.
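
    The text does not state which test was used for the six tandems versus two near-neighbors. One simple choice, assumed here, is a binomial (sign) test: without doublets, tandems and near-neighbors are expected in equal numbers, so conditional on eight events each is equally likely to be a tandem. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


tandems, near_neighbors = 6, 2
n = tandems + near_neighbors
one_sided = binomial_upper_tail(tandems, n)  # 37/256, about 0.14
two_sided = min(1.0, 2 * one_sided)          # about 0.29
print(f"one-sided P = {one_sided:.3f}, two-sided P = {two_sided:.3f}")
```

    Either way the result is far from conventional significance, in line with the statement that the excess is not significant.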

    What is the significance of our finding of a doublet mutation rate of 0.3%, as opposed to the estimate of 2% obtained by Averof et al. (2000)? Does this discrepancy result from the differences between the Doe and Dtn methods, or from differences between the data sets? We can discount the latter possibility in two ways. First, we reanalyzed the data set of Averof et al. (2000). Their alignment of primate pseudo-eta-globin sequences, masked for CpG mutations, was kindly provided by M. Averof. We used the program baseml in the PAML package () to reconstruct the ancestral sequences, and we analyzed the changes along the lineage leading to rhesus monkey (the longest lineage, with the most data and the strongest signal of doublet mutations). Using this method, which differs slightly from that employed by Averof et al. (2000), we identified 232 substitutions, 17 tandems, and 14 near-neighbors and calculated the expected number of tandems as 8.29. The Doe method thus yields a doublet mutation rate of 3.8%, whereas the Dtn method gives 1.3%, showing that the two methods give different results in both data sets. Second, we can partition our human and chimpanzee lineage-specific results according to alignment-specific substitution rates. This partitioning tests the idea that our human-chimpanzee-baboon data set may contain selectively constrained regions in which doublet mutations are particularly constrained (unlike the pseudogene analyzed by Averof et al. 2000, which is almost certainly free of constraint). If so, we would predict the doublet mutation rate to be lower in regions with low substitution rates. This prediction does not hold (Dtn is 0.58% in low substitution rate alignments and 0.18% in high substitution rate alignments, varying as expected given that the doublet mutation rate is measured relative to the singleton mutation rate).

    Given that we can be confident that the Doe and Dtn methods yield different results when applied to real sequence data, an important conclusion of this study is the strength of regional variation and sequence context, effects that we consider responsible for most of the excess of observed tandems relative to the expectation with no rate heterogeneity. The results of our study of primate genomic sequences are thus consistent with regional variation in synonymous and noncoding substitution rates across mammalian nuclear genomes , sequence context effects in chloroplasts (Morton et al. 1997), sequence context effects in mitochondria, and the utility of the auto-discrete-gamma model for describing sequence evolution in primate mitochondria .

    The most direct implication of a low doublet mutation rate is that doublet mutations matter less than previously thought. Such a low rate suggests that doublet mutations cannot fully explain the strong correlation between synonymous and nonsynonymous substitution rates, KS and KA, observed in mammals . In the comparison presented by , of 363 genes in mouse and rat, there were on average 28 nonsynonymous and 56 synonymous differences per gene after correction for multiple hits. A doublet mutation rate of 0.3% means that the expected number of doublets per gene is only 0.25: thus over three quarters of genes will not be affected by doublet mutation. The effect of doublets on the KA–KS correlation can be investigated by simulating substitutions according to Poisson distributions based on the data of , assuming that all genes have independent synonymous and nonsynonymous singleton substitution rates based on the mean numbers of substitutions and numbers of sites (hence the expected KA–KS correlation coefficient in the absence of doublet mutations is zero). If every doublet mutation is assumed to generate one synonymous and one nonsynonymous change (a conservative assumption for our purposes), the expected increase in the KA–KS correlation coefficient arising from 0.25 doublets per gene is just 0.005. Applying the doublet mutation rate of 2% gives 1.7 doublets per gene on average, which increases the expected KA–KS correlation coefficient by 0.042. Thus it seems highly unlikely that doublet mutations are solely responsible for the observed increase in the KA–KS correlation coefficient above the neutral expectation (), given that this increase has been quantified as 0.141 in the mouse-rat study of .
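
    A quick analytic cross-check of these figures, not a reproduction of the simulation described above: treat synonymous and nonsynonymous singleton counts per gene as independent Poisson variables with means 56 and 28, and add a shared Poisson number of doublets D, each contributing one change of each kind. The induced correlation is then Var(D)/sqrt(Var X · Var Y). The function name and this shortcut are assumptions.

```python
import math


def added_correlation(doublets_per_gene: float,
                      synonymous: float = 56.0,
                      nonsynonymous: float = 28.0) -> float:
    """Correlation induced between synonymous (S + D) and nonsynonymous
    (N + D) counts when S, N and D are independent Poisson variables and
    every doublet adds one change of each kind."""
    lam = doublets_per_gene
    return lam / math.sqrt((synonymous + lam) * (nonsynonymous + lam))


singletons_per_gene = 56 + 28
for rate in (0.003, 0.02):
    lam = rate * singletons_per_gene  # expected doublets per gene
    untouched = math.exp(-lam)        # P(no doublet in a gene)
    print(f"rate {rate:.1%}: {lam:.2f} doublets/gene, "
          f"{untouched:.0%} of genes untouched, "
          f"added correlation {added_correlation(lam):.3f}")
```

    The shortcut gives roughly 0.006 and 0.041, the same order as the simulated increases of 0.005 and 0.042; exact agreement is not expected, since it omits details of the original simulation. It also reproduces the "over three quarters of genes unaffected" figure for the 0.3% rate.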

    Acknowledgements

    H.E. is a Royal Swedish Academy of Sciences Research Fellow supported by a grant from the Knut and Alice Wallenberg foundation. This study was supported by the Swedish Research Council. Thanks to M. Averof for providing alignments.

    Literature Citedr;y-?|, http://www.100md.com

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.r;y-?|, http://www.100md.com

    Averof, M., A. Rokas, K. H. Wolfe, and P. M. Sharp. 2000. Evidence for a high frequency of simultaneous double-nucleotide substitutions. Science 287:1283-1286.r;y-?|, http://www.100md.com

    Bielawski, J. P., K. A. Dunn, and Z. H. Yang. 2000. Rates of nucleotide substitution and mammalian nuclear gene evolution: Approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299-1308.r;y-?|, http://www.100md.com

    Chen, F. C., and W. H. Li. 2001. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet 68:444-456.

    Dawson, E., Y. Chen, S. Hunt, et al 2001. A SNP resource for human chromosome 22: extracting dense clusters of SNPs from the genomic sequence. Genome Res 11:170-178.+/0sm, 百拇医药

    Gillespie, J. H. 1991. The causes of molecular evolution. Oxford University Press, Oxford.+/0sm, 百拇医药

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol 22:160-174.+/0sm, 百拇医药

    Howell, N., and C. B. Smejkal. 2000. Persistent heteroplasmy of a mutation in the human mtDNA control region: hypermutation as an apparent consequence of simple-repeat expansion/contraction. Am. J. Hum. Genet 66:1589-1598.+/0sm, 百拇医药

    Jorde, L. B., W. S. Watkins, and M. J. Bamshad. 2001. Population genomics: A bridge from evolutionary history to genetic medicine. Hum. Mol. Genet. 10:2199-2207.

    Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21-123 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.

    Kaessmann, H., V. Wiebe, G. Weiss, and S. Paabo. 2001. Great ape DNA sequences reveal a reduced diversity and an expansion in humans. Nat. Genet. 27:155-156.

    Keightley, P. D., and A. Eyre-Walker. 2000. Deleterious mutations and the evolution of sex. Science 290:331-333.

    Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

    Krawczak, M., E. V. Ball, and D. N. Cooper. 1998. Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am. J. Hum. Genet. 63:474-488.

    Lercher, M. J., E. J. B. Williams, and L. D. Hurst. 2001. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of the male mutation bias. Mol. Biol. Evol. 18:2032-2039.

    Li, W. H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Matassi, G., P. M. Sharp, and C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786-791.

    Morton, B. R., V. M. Oberholzer, and M. T. Clegg. 1997. The influence of specific neighboring bases on substitution bias in noncoding regions of the plant chloroplast genome. J. Mol. Evol. 45:227-231.

    Nakazawa, H., D. English, P. L. Randell, K. Nakazawa, N. Martel, B. K. Armstrong, and H. Yamasaki. 1994. UV and skin-cancer-specific P53 gene mutation in normal skin as a biologically relevant exposure measurement. Proc. Natl Acad. Sci. USA 91:360-364.

    Ohta, T. 1995. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56-63.

    Pruitt, K. D., K. S. Katz, H. Sicotte, and D. R. Maglott. 2000. Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet. 16:44-47.

    Purvis, A. 1995. A composite estimate of primate phylogeny. Phil. Trans. R. Soc. Lond. B 348:405-421.

    Shibutani, S., N. Suzuki, X. Z. Tan, F. Johnson, and A. P. Grollman. 2001. Influence of flanking sequence context on the mutagenicity of acetylaminofluorene-derived DNA adducts in mammalian cells. Biochemistry 40:3717-3722.

    Smith, N. G. C., and L. D. Hurst. 1999. The effect of tandem substitutions on the correlation between synonymous and nonsynonymous rates in rodents. Genetics 153:1395-1402.

    Smith, N. G. C., M. T. Webster, and H. Ellegren. 2002. Deterministic mutation rate variation in the human genome. Genome Res. 12:1350-1356.

    Templeton, A. R., A. G. Clark, K. M. Weiss, D. A. Nickerson, E. Boerwinkle, and C. F. Sing. 2000. Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am. J. Hum. Genet. 66:69-83.

    The International SNP Map Working Group. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-933.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

    Tsunoyama, K., M. I. Bellgard, and T. Gojobori. 2001. Intragenic variation of synonymous substitution rates is caused by nonrandom mutations at methylated CpG. J. Mol. Evol. 53:456-464.

    Wolfe, K. H., and P. M. Sharp. 1993. Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441-456.

    Wolfe, K. H., P. M. Sharp, and W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

    Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. J. Mol. Evol. 39:306-314.

    Yang, Z. 1995. A space-time process model for the evolution of DNA sequences. Genetics 139:993-1005.

    Yang, Z. 1996. The among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol. 11:367-372.

    Yang, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.

    Zavolan, M., and T. B. Kepler. 2001. Statistical inference of sequence-dependent mutation rates. Curr. Opin. Genet. Dev. 11:612-615.

    (Nick G. C. Smith, Matthew T. Webster, and Hans Ellegren)