Polymorphism and Divergence for Island-Model Species

http://www.100md.com Genetics, 2003, No. 1

    ^a Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138

    ABSTRACT

    Estimates of the scaled selection coefficient, γ of Sawyer and Hartl, are shown to be remarkably robust to population subdivision. Estimates of mutation parameters and divergence times, in contrast, are very sensitive to subdivision. These results follow from an analysis of natural selection and genetic drift in the island model of subdivision in the limit of a very large number of subpopulations, or demes. In particular, a diffusion process is shown to hold for the average allele frequency among demes in which the level of subdivision sets the timescale of drift and selection and determines the dynamic equilibrium of allele frequencies among demes. This provides a framework for inference about mutation, selection, divergence, and migration when data are available from a number of unlinked nucleotide sites. The effects of subdivision on parameter estimates depend on the distribution of samples among demes. If samples are taken singly from different demes, the only effect of subdivision is in the rescaling of mutation and divergence-time parameters. If multiple samples are taken from one or more demes, high levels of within-deme relatedness lead to low levels of intraspecies polymorphism and increase the number of fixed differences between samples from two species. If subdivision is ignored, mutation parameters are underestimated and the species divergence time is overestimated, sometimes quite drastically. Estimates of the strength of selection are much less strongly affected and always in a conservative direction.

    ONE of the primary goals of population genetics has been to measure and to understand the role of natural selection in shaping variation within and between species. Now that molecular technologies allow genetic variation to be assayed with relative ease, this goal seems within reach. A number of different approaches to studying selection have been proposed (HUDSON and KAPLAN 1988; NEUHAUSER and KRONE 1997; YANG 1998; DONNELLY et al. 2001; SLATKIN and BERTORELLE 2001), and a multitude of neutrality tests, reviewed by NIELSEN 2001, can be applied if appropriate genetic data are gathered. This work considers SAWYER and HARTL's (1992) method, which belongs to a class of methods that use overall levels of polymorphism and divergence at two or more categories of sites in samples of DNA sequences from a pair of species. HUDSON et al. 1987 were the first to propose such a method, in which the categories were different loci, followed by MCDONALD and KREITMAN 1991, who categorized sites within a locus as being either synonymous or nonsynonymous with respect to changes in the amino acid sequence of the protein product. Both methods assumed no intralocus recombination and allowed the hypothesis of strict selective neutrality to be tested. Shortly afterward, by assuming KIMURA's (1969) infinite-sites mutation model, i.e., with free recombination between sites, SAWYER and HARTL 1992 showed that McDonald-Kreitman test data could be used not only to test neutrality but also to estimate selection, mutation, and divergence-time parameters.

    NIELSEN 2001 pointed out that McDonald-Kreitman and related tests, in which sites can be classified a priori, provide a very powerful framework for inferences about natural selection, in contrast to tests like TAJIMA's (1989) and FU and LI's (1993), which measure deviations from the highly variable process of neutral coalescence. It is likely that McDonald-Kreitman and related methods will become the mainstay of genomic analyses of the role of selection. In two recent works, modified McDonald-Kreitman tests were applied to genomic data from Drosophila, suggesting that 45% of the amino acid differences between Drosophila simulans and D. yakuba resulted from positive selection (SMITH and EYRE-WALKER 2002) and that positive selection at a relatively small number of genes is responsible for the divergence of D. simulans and D. melanogaster (FAY et al. 2002). BUSTAMANTE et al. 2002 used a modified Sawyer-Hartl method to show that Arabidopsis species have experienced a higher proportion of deleterious amino acid substitutions than Drosophila species, in which positive selection is common, and attributed the difference to high levels of inbreeding in Arabidopsis.

    An obvious shortcoming of these methods is that they assume the species under study are panmictic, i.e., not geographically or otherwise subdivided. It is well known that this assumption is incorrect for many species (SLATKIN 1985). When there is no intralocus recombination, MCDONALD and KREITMAN 1991 point out that shared genealogical history should control for the effects of demography when sites can be categorized a priori. It is less clear that this should be the case when collections of unlinked sites are used to estimate selection, mutation, and divergence-time parameters as in SAWYER and HARTL's (1992) method. It is possible that the effects of subdivision on the numbers of polymorphisms and fixed differences at synonymous and nonsynonymous sites could lead to errors in inferences. Therefore, the goal of this work is to extend the Poisson random field (PRF) theory of polymorphism and divergence developed by SAWYER and HARTL 1992 to include subdivided species. To do this, it is first shown that in the limit of a large number of subpopulations or demes allele-frequency dynamics at a single locus in a population with island-model migration (WRIGHT 1931; MORAN 1959; MARUYAMA 1970; LATTER 1973) are governed by a diffusion process that has the same form as the usual Wright-Fisher diffusion, e.g., see EWENS 1979, but with a timescale different from that of the panmictic case. Then, the assumption of free recombination between sites allows the PRF model to be used to predict the patterns of variation in samples from a pair of island-model species.

    The diffusion result is obtained using Theorem 3.3 in ETHIER and NAGYLAKI 1980 and relies upon the fact that the process of migration and drift within subpopulations occurs on a much faster timescale than changes in allele frequency by drift and selection in the total population. The result thus depends on a stochastic equilibrium of allele frequencies within demes with respect to migration and drift, which is also described. This follows some recent work (CHERRY and WAKELEY 2003) in which simulations supported the existence of such a diffusion under the additional assumption that demes are very large and migration rates correspondingly small. The present analysis shows that this additional assumption is unnecessary. The assumption of infinite deme sizes and infinitesimal migration rates was also made in the recent coalescent work on neutral large-number-of-demes models (WAKELEY 1998, WAKELEY 2001), and it is made below in The expected number of neutral segregating sites, when the forward and backward results are compared. Otherwise, here it is assumed that the demes are finite in size and the migration rates are unconstrained.

    This work makes a connection between the PRF theory and work on the robustness of the coalescent process to population structure (NORDBORG 1997; MOHLE 1998), in particular for the case of geographic structure (WAKELEY 1998, WAKELEY 2001). The two are related by showing that the effective size of the ancestral, coalescent process is the same as that of the forward-time diffusion of allele frequencies and that the forward- and backward-derived predictions for the expected number of segregating sites in a sample are the same under neutrality. We expect such connections between forward and backward approaches to exist, a fact that is well established in the case of panmictic populations (EWENS 1990; MOHLE 2001). Like the forward (NAGYLAKI 1980) and backward (NOTOHARA 1993) strong-migration limits, these results and those of WAKELEY 1998, WAKELEY 2001 for the coalescent process are based on a "separation of timescales." In this case, the fast processes are migration and drift within demes and the slow process is drift and possibly selection in the total population, which is mediated by migration. The effective size of the population is rescaled and patterns of genetic variation depend on how a sample is distributed among demes. In contrast, under the usual strong-migration limit, the only effect of structure is to rescale the effective size of the population (NAGYLAKI 1980; NOTOHARA 1993; NORDBORG 1997; CHARLESWORTH 2001).

    The main result presented here, besides the existence of the diffusion (9) below, is that, if mutations are introduced at a constant rate per generation and sites segregate independently of one another, the PRF results of SAWYER and HARTL 1992 can be applied, but with a correction that depends on how samples are taken among demes. If each sample is taken from a different deme, then SAWYER and HARTL's (1992) results apply directly, but with slightly different mutation and divergence-time parameters. If some or all of the samples come from the same deme, the PRF results must be corrected for the effect of drift and migration within demes. Failure to recognize this can cause serious errors in the estimation of mutation rates and divergence times, but not, surprisingly, of selection coefficients.

    THEORY

    A population or species is assumed to be subdivided into D demes of equal size N. The organisms are assumed to be haploid, but the results will hold for diploid organisms if N is replaced with 2N, if selection is additive, and if migration is gametic. The island model of migration (WRIGHT 1931; MORAN 1959) is assumed: a fraction m of each deme is replaced by migrants every generation and all migrants are randomly sampled from a migrant pool to which all demes contribute equally. In each generation, migration occurs first, followed by selection, and then resampling (drift) within demes according to the Wright-Fisher model (FISHER 1930; WRIGHT 1931). In the next two sections, two alleles are assumed to be segregating at a single locus, and Many independently segregating loci considers their introduction by mutation. The wild-type or nonmutant allele has relative fitness equal to 1, and the mutant allele has fitness 1 + s_D, where s_D ≥ -1. The next section establishes the diffusion approximation for the frequency of the mutant allele as D → ∞, but Ds_D remains finite. The migration rate can vary between 0 and 1 (0 < m ≤ 1) and N is assumed to be finite. This is in contrast to the usual assumption that Nm is finite as N goes to infinity.

    Considering the number of mutants within each deme, it is apparent that there are exactly N + 1 kinds of demes. Each deme that begins a generation with i copies of the mutant will have the mutant frequency given by (1) after migration and selection, where x is the frequency of the mutant in the total population. The next generation within the deme will be produced by randomly sampling N haploid individuals from this distribution. Thus, a deme that contains i copies of the mutant now has the probability P_ij, given by (2), of having j copies at the start of the following generation. Because lim_{D→∞} s_D = 0, it is often necessary to consider only one part of P_ij, as in (3).

    The notation o(s_D) used in (3) and below means that lim_{D→∞} o(s_D)/s_D = 0. Thus P_ij = P*_ij + o(1). The process of drift, described by (2), happens independently within each deme.
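    The display equations for this step are not reproduced above, so the following is only a minimal sketch of the one-generation transition within a deme, assuming exactly the order of events stated in the text: migration from the common pool, then genic selection with fitnesses 1 and 1 + s, then binomial (Wright-Fisher) sampling of N haploid individuals. The function names are illustrative and do not come from the article.

```python
from math import comb


def freq_after_migration_and_selection(i, N, m, x, s):
    """Mutant frequency in a deme that began the generation with i copies."""
    p = (1.0 - m) * i / N + m * x         # a fraction m is replaced from the migrant pool
    return (1.0 + s) * p / (1.0 + s * p)  # mutants have relative fitness 1 + s


def transition_prob(i, j, N, m, x, s=0.0):
    """Probability that a deme with i mutants has j mutants next generation."""
    q = freq_after_migration_and_selection(i, N, m, x, s)
    return comb(N, j) * q**j * (1.0 - q) ** (N - j)  # binomial resampling of N individuals


def transition_matrix(N, m, x, s=0.0):
    """All (N + 1) x (N + 1) transition probabilities for a given total frequency x."""
    return [[transition_prob(i, j, N, m, x, s) for j in range(N + 1)]
            for i in range(N + 1)]


if __name__ == "__main__":
    P = transition_matrix(N=10, m=0.1, x=0.75)
    print([round(sum(row), 12) for row in P])  # every row is a probability distribution
```

    Setting s = 0 in this sketch gives the neutral part of the transition probabilities, which is what the text denotes P*_ij.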

    Limiting allele frequency dynamics at a single locus:

    Let Z^D_i(t) record the fraction of demes that contain i copies of the mutant and z_i(t) be a particular realization of this random variable. Thus, Z^D(t) is a Markov chain whose state space consists of all possible configurations of the D demes among the N + 1 mutant-count classes. Appendix A proves a diffusion result for Z^D(t) as D goes to infinity and Ds_D remains finite. Briefly, this is done by using Theorem 3.3 of ETHIER and NAGYLAKI 1980; see also NAGYLAKI 1980. Define X^D(t) = Σ^N_{i=0} iZ^D_i(t)/N. The random variable X^D(t) records the frequency of the mutant in the total population or the average frequency of the mutant among demes (x above). Next, let Y^D_i(t) = Z^D_i(t) - ν_i(t) be the deviation of Z^D_i(t) from the equilibrium prediction ν_i(t). For a given P*_ij(t), this equilibrium satisfies (4), ν_j(t) = Σ^N_{i=0} ν_i(t)P*_ij(t). It exists because the matrix P* = ⟨P*_ij⟩ is ergodic and has a finite number of states. We can set Σ^N_{i=0} ν_i(t) = 1, and ν_i(t) becomes the equilibrium prediction for Z^D_i(t).
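    As an illustration of this equilibrium, the sketch below finds the stationary distribution of the neutral within-deme chain by repeated multiplication, holding x fixed. It assumes that P*_ij is the binomial sampling probability with success probability (1 - m)i/N + mx, which is what the model description implies but is not displayed above. The closed-form variance in `equilibrium_variance` is derived here from the binomial moments and is not quoted from the article.

```python
from math import comb


def neutral_transition_matrix(N, m, x):
    """Rows i -> j of the neutral within-deme chain (assumed form of P*)."""
    rows = []
    for i in range(N + 1):
        p = (1.0 - m) * i / N + m * x  # deme frequency after migration
        rows.append([comb(N, j) * p**j * (1.0 - p) ** (N - j) for j in range(N + 1)])
    return rows


def equilibrium(N, m, x, tol=1e-14, max_iter=1_000_000):
    """Stationary distribution nu satisfying nu_j = sum_i nu_i P*_ij."""
    P = neutral_transition_matrix(N, m, x)
    nu = [1.0 / (N + 1)] * (N + 1)  # arbitrary starting distribution
    for _ in range(max_iter):
        new = [sum(nu[i] * P[i][j] for i in range(N + 1)) for j in range(N + 1)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    return nu


def equilibrium_variance(N, m, x):
    """Variance of the mutant count among demes implied by the recursion."""
    return N * x * (1.0 - x) / (1.0 - (1.0 - m) ** 2 * (1.0 - 1.0 / N))


if __name__ == "__main__":
    nu = equilibrium(N=10, m=0.1, x=0.75)
    mean = sum(j * w for j, w in enumerate(nu))
    print(mean / 10)  # the average deme frequency equals x at equilibrium
```

    The check that the mean deme frequency returns x reflects the fact that migration and neutral drift within demes do not change the total frequency in expectation.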

    The nature of the diffusion approximation (9) below is that the migration and drift within demes equilibrate quickly in comparison to the rate of drift and selection in the total population. The results show that, to a sufficient order of approximation, demes can be considered to always be at a stochastic equilibrium ν_j (0 ≤ j ≤ N) with respect to migration and drift for a given x. The fraction of demes that have j copies of the mutant converges on ν_j = Σ^N_{i=0} ν_iP*_ij, where P*_ij is given by (3) with x constant. The distribution ν is very well approximated by the hypergeometric distribution (5), which is a special case of the multivariate Pólya distribution; see Equation 40.13 in JOHNSON et al. 1997. Equation 5 is also identical to the two-allele case of the compound multinomial Dirichlet distribution, which RANNALA 1996 proved to hold for the frequencies of multiple alleles within a deme in the infinite-island or continent-island model, i.e., where allele frequencies among migrants are assumed to be constant. RANNALA 1996 did not assume Wright-Fisher reproduction, but rather that a birth-death-immigration process occurred within demes. Thus, RANNALA's (1996) model is similar to the Moran model, in which such distributions are known to arise: see pages 131–133 in MORAN 1962. ROTHMAN et al. 1974 argued for the use of the compound multinomial Dirichlet distribution in the case of Wright-Fisher reproduction within demes.

    The form of Equation 5 was obtained by selecting parameters of a hypergeometric distribution that gave the same mean and variance of allele counts among demes as Equation 4, namely (6) and (7), which were obtained using (4) together with the moments of the binomial distribution (P*). Equation 5 is the exact solution to (4) when N ≤ 2. In addition, as required by (4): when m approaches 1, ν_i becomes a binomial distribution with parameters N and x; and as m approaches 0, we have ν_0 = 1 - x, ν_N = x, and ν_j = 0 for 1 ≤ j ≤ N - 1. Finally, if x_i = j/N is the frequency in some deme i, then as N grows but 2Nm = M remains constant, (5) converges on the well-known β-distribution result (8), which WRIGHT 1931 obtained under the assumption that x was constant among migrants. To derive (8) from (5), it is necessary to use the limit result 6.1.46 in ABRAMOWITZ and STEGUN 1965 for ratios of gamma functions and to let dx_i = 1/N. Figure 1 plots the distribution (5) when N = 10 and x = 0.75 over the full range of migration rates. With these parameter values, the absolute error of using (5) to approximate the solution of (4) is never greater than ~0.007 and the relative error is never greater than ~5%.
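    The moment-matching idea can be tried numerically. The sketch below is not the article's Equation 5, which is not reproduced above. It assumes instead that the approximating family is the beta-binomial (the two-allele compound multinomial Dirichlet), with its mean set to Nx and its intraclass correlation chosen to match the variance of the exact equilibrium. The helper names and the expression in `matched_correlation` are assumptions of this sketch, derived from the binomial moments of the neutral chain.

```python
from math import comb, exp, lgamma


def matched_correlation(N, m):
    """Intraclass correlation that reproduces the exact equilibrium variance."""
    a = (1.0 - m) ** 2
    return a / (N - (N - 1) * a)


def beta_binomial(N, x, rho):
    """Beta-binomial probabilities with mean N*x and intraclass correlation rho."""
    if rho == 0.0:  # no excess variance among demes: plain binomial
        return [comb(N, j) * x**j * (1.0 - x) ** (N - j) for j in range(N + 1)]
    a = x * (1.0 - rho) / rho
    b = (1.0 - x) * (1.0 - rho) / rho

    def log_beta(u, v):
        return lgamma(u) + lgamma(v) - lgamma(u + v)

    return [comb(N, j) * exp(log_beta(j + a, N - j + b) - log_beta(a, b))
            for j in range(N + 1)]


def exact_equilibrium(N, m, x, tol=1e-14, max_iter=1_000_000):
    """Stationary distribution of the neutral within-deme chain, by iteration."""
    rows = []
    for i in range(N + 1):
        p = (1.0 - m) * i / N + m * x
        rows.append([comb(N, j) * p**j * (1.0 - p) ** (N - j) for j in range(N + 1)])
    nu = [1.0 / (N + 1)] * (N + 1)
    for _ in range(max_iter):
        new = [sum(nu[i] * rows[i][j] for i in range(N + 1)) for j in range(N + 1)]
        if max(abs(u - v) for u, v in zip(new, nu)) < tol:
            return new
        nu = new
    return nu


def max_abs_error(N, m, x):
    """Largest absolute difference between the approximation and the exact result."""
    approx = beta_binomial(N, x, matched_correlation(N, m))
    exact = exact_equilibrium(N, m, x)
    return max(abs(u - v) for u, v in zip(approx, exact))


if __name__ == "__main__":
    for m in (0.01, 0.1, 0.5, 1.0):
        print(m, max_abs_error(10, m, 0.75))
```

    With N = 2 a distribution on three states is fixed by its mean and variance, so under these assumptions the approximation coincides with the exact equilibrium, in line with the statement that the approximation is exact for small N.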

    fig. omitted

    Figure 1. The approximation (5) for the distribution of mutant allele counts among demes assuming that N = 10 and x = 0.75, shown as a function of the per-generation migration rate m.

    Appendix A shows that, in the limit as D goes to infinity, the change in x by drift and selection is so much slower than that by migration and drift within demes that the collection of demes is always at the equilibrium ν_i, which depends on N and m, and of course x. By Theorem 3.3 of ETHIER and NAGYLAKI 1980, as D goes to infinity the above system reduces to a diffusion x(·) with generator (9), in which γ = N lim_{D→∞} Ds_D. Time is measured in units of ND/(1 - F) generations, where F is the fixation coefficient, in this case given by A13 in Appendix A. Thus, the diffusion of x is identical to the usual Wright-Fisher diffusion with genic selection, with the exception that it occurs on a timescale longer than that of the panmictic case by the factor 1/(1 - F). Thus, all the well-known predictions of that model apply; e.g., see chapter 5 of EWENS 1979.
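    To give a feel for the size of the timescale factor 1/(1 - F), the sketch below evaluates it for a few migration rates. The expression used for F is an assumption: A13 is not reproduced in this text, so F is taken here to be the correlation of allele counts within demes implied by binomial sampling after migration, which reduces to the form F = 1/(M + 1) with M = 2Nm when N is large and m is small.

```python
def fixation_index(N, m):
    """Assumed finite-N form of F for the island model (requires 0 < m <= 1)."""
    a = (1.0 - m) ** 2
    return a / (N - (N - 1) * a)


def diffusion_time_unit(N, D, m):
    """Generations per unit of diffusion time, N*D/(1 - F)."""
    return N * D / (1.0 - fixation_index(N, m))


if __name__ == "__main__":
    N, D = 100, 1000
    for m in (1.0, 0.1, 0.01, 0.001):
        F = fixation_index(N, m)
        large_n_limit = 1.0 / (2 * N * m + 1.0)  # F = 1/(M + 1) with M = 2Nm
        print(m, F, large_n_limit, diffusion_time_unit(N, D, m))
```

    At m = 1 the factor is 1 and the panmictic timescale ND is recovered; as m decreases, F approaches 1 and the timescale lengthens without bound.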

    CHERRY and WAKELEY 2003 assumed (8) to hold and showed that simulations agreed well with the predictions of the implied diffusion process, such as the time to fixation or loss of the mutant type. Without giving a proof, we can guess that this diffusion should be given by the results of the section above and Appendix A if N_D → ∞ when D → ∞ and lim_{D→∞} 2N_Dm_D = M, so that F = 1/(M + 1), but with lim_{D→∞} N_D/D = 0 (ETHIER and NAGYLAKI 1980). CHERRY and WAKELEY 2003 also showed that the distribution of allele frequencies among demes in simulations with N = 100 and m = 0.01 (and D = 1000 and s_D = 0.001) conformed well to the predictions of Equation 8 in a particular generation when x was equal to 0.611. Further support for the existence of this large-D, large-N diffusion is given in The expected number of neutral segregating sites by comparing its predictions under neutrality to those of the corresponding coalescent model (WAKELEY 1998). Otherwise, N is assumed here to be finite.

    Many independently segregating loci:

    If we posit an infinite number of loci, i.e., nucleotide sites, which can sustain mutations and which each evolve according to the diffusion of the previous section independently, then the PRF results of SAWYER and HARTL 1992 hold for x. Because of the way time is measured in the diffusion, the appropriate mutation parameters are also scaled, as given by (10). The subscripts in Equation 10 refer to "amino acid replacement" and "synonymous" following BUSTAMANTE et al. 2002, and u_a and u_s are the per-generation rates. Thus, one effect of restricted migration is to increase the apparent mutation rates over the panmictic case since 0 ≤ F ≤ 1. The other effect, of course, is to distribute variation among demes as described in the previous section. In addition, the parameter t_div in SAWYER and HARTL 1992 must here be measured in units of ND/(1 - F) generations. With these modifications, Equation 13 and Equation 14 in SAWYER and HARTL 1992 apply here to x.

    Rewriting SAWYER and HARTL's (1992) Equations 13 and 14 in terms of the present notation gives (11)-(14) for the expected numbers of fixed and polymorphic, synonymous, and replacement differences in two species. When a sample is taken from the two species, as in SAWYER and HARTL 1992, we need to consider the chance that a polymorphic site appears fixed in a sample from the species. Here, in contrast to the panmictic case, the distribution of the sample among demes becomes important.

    Assume that we have taken a random sample of n sequences from d different demes in one of the species, such that n₁, n₂, ..., n_d are the sample sizes from each deme. We can write in general that the expected number of sites that show i₁, i₂, ..., i_d copies of the mutant base in the sample (0 ≤ i_k ≤ n_k) is given by (15), where j = a, s. The probability h(i_k|x, n_k), that i_k copies of the mutant base are in the sample of n_k items from the kth sampled deme, is an average over the within-deme distribution of allele frequencies, given by (16).

    If N is large and m correspondingly small, we may wish to use the large-deme approximation (17). That is, when N is large we can approximate the hypergeometric probability that the sample contains i_k copies of the mutant allele (present in j copies in the deme) with a binomial distribution and the allele count distribution ν_j with WRIGHT's (1931) continuous β-distribution of allele frequencies, g(x_k|x).
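    The averaging described here can be written out directly for finite N. The sketch below assumes that the sample of n_k sequences is drawn without replacement from a deme of N individuals, so that the count of mutants in the sample is hypergeometric given the deme count j, and that demes are distributed according to some count distribution ν supplied by the caller. The function name and the example distribution are hypothetical.

```python
from math import comb


def sample_prob(i, n, nu):
    """Probability that a sample of n from one deme contains i mutant copies.

    nu[j] is the fraction of demes holding j mutant copies, j = 0..N.
    """
    N = len(nu) - 1
    total = 0.0
    for j, weight in enumerate(nu):
        # hypergeometric draw of n individuals from j mutants and N - j nonmutants;
        # math.comb returns 0 when the requested count exceeds what is available
        total += weight * comb(j, i) * comb(N - j, n - i) / comb(N, n)
    return total


if __name__ == "__main__":
    nu = [0.1, 0.2, 0.3, 0.4]  # hypothetical distribution for N = 3
    print([sample_prob(i, 2, nu) for i in range(3)])
```

    For a sample of size 1 the result is the mean deme frequency, which is why samples taken singly from different demes behave as if drawn from a panmictic population with mutant frequency x.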

    Because we have assumed an infinite number of independently segregating sites with collective mutation rates given by (10), the PRF model (SAWYER and HARTL 1992) shows that S_j(i₁, ..., i_d) is Poisson distributed with expected value equal to (15). The numbers of sites segregating at various frequencies within each deme contain information about migration rates, and the numbers of sites segregating at various frequencies in the total population contain information about the selection coefficient. Note that (15) can also be used to compute the expected number of apparent fixed differences, i.e., polymorphisms where the entire sample has the mutant base, as required in SAWYER and HARTL's (1992) analysis. This provides a framework for estimating selection coefficients (and migration rates) in the context of a subdivided population. As illustrated in RESULTS, we use Equations 11-14 in conjunction with Equation 15 to obtain predictions about the numbers of fixed-synonymous, fixed-replacement, polymorphic-synonymous, and polymorphic-replacement sites in a sample from two species. Further, Equation 15 gives the joint frequencies among demes of segregating polymorphisms. In the panmictic case, HARTL et al. 1994, AKASHI 1999, and BUSTAMANTE et al. 2001 showed that allele frequencies at polymorphic sites contain substantial information about selection.

    RESULTS

    The first result to note is that if each sample is taken from a different deme, the methods of SAWYER and HARTL 1992 can be applied without modification. It is necessary only to realize that the inferred mutation parameters and the divergence time are scaled in terms of ND/(1 - F) generations instead of the usual ND generations. This result follows from the fact that each sample drawn in this way has probability x of showing the mutant base. That is, h(1|x, 1) = x and h(0|x, 1) = 1 - x, and similarly for h*(i_k|x, n_k). Summing Equation 15, for each species, over all i₁, i₂, ..., i_d such that 0 < Σ^d_{k=1} i_k < Σ^d_{k=1} n_k gives SAWYER and HARTL's (1992) Equations 15 and 19 but with the scaled mutation rates that apply here: θ_s and θ_a. Similarly, SAWYER and HARTL's (1992) Equations 17 and 18 are derived by considering the chance that i_k = 1 for all k. In sum, inferences about selection coefficients, mutation rates, and divergence times are entirely robust to (island-model) population subdivision when each sample is taken from a different deme.

    Inferences from single-deme samples:

    At the opposite extreme, consider the case in which all samples are drawn from the same deme within each species. Note that we assume, as in SAWYER and HARTL 1992, that the two species are identical (here in terms of N, m, and γ). Let n₁ and n₂ denote the sample sizes from the two species. For this sample, the expected numbers of fixed-synonymous (K_s), fixed-replacement (K_a), polymorphic-synonymous (S_s), and polymorphic-replacement (S_a) sites are given by (18)-(21), in which H(x, n) = 1 - h(n|x, n) - h(0|x, n). The results from Limiting allele frequency dynamics at a single locus are used to compute h(n|x, n) and h(0|x, n); namely, h(n|x, n) is given by (22).

    This same equation can be used to compute h(0|x, n) = h(n|1 - x, n).
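    For large demes, the probability that a single-deme sample appears polymorphic can be sketched in closed form. The code below is an illustration under an explicit assumption: deme frequencies follow a β-distribution with parameters Mx and M(1 - x), the standard form of Wright's result with M = 2Nm, since the article's own expressions for this quantity are not reproduced above. Under that assumption the chance that all n sampled copies are mutant is the nth moment of the β-distribution, and the symmetry h(0|x, n) = h(n|1 - x, n) supplies the other term.

```python
def prob_all_mutant(x, n, M):
    """n-th moment of a Beta(M*x, M*(1 - x)) deme frequency (assumed form)."""
    prob = 1.0
    for k in range(n):
        prob *= (M * x + k) / (M + k)
    return prob


def prob_polymorphic(x, n, M):
    """H(x, n) = 1 - h(n|x, n) - h(0|x, n) under the large-deme approximation."""
    return 1.0 - prob_all_mutant(x, n, M) - prob_all_mutant(1.0 - x, n, M)


if __name__ == "__main__":
    for M in (0.02, 0.2, 2.0, 20.0, 200.0):
        print(M, prob_polymorphic(0.5, 10, M))
```

    As M grows the result approaches the panmictic value 1 - x^n - (1 - x)^n, and as M shrinks it approaches 0, which is the sense in which closely related single-deme samples hide polymorphism and inflate apparent fixations.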

    Figure 2 plots the expected values of K_s, K_a, S_s, and S_a as functions of the migration rate when n₁ = n₂ = 10 and N = 100 and for three different values of γ: -2, 0, and 2. The results are as expected for single-deme samples. When m = 1, they are the same as in a panmictic population. As m decreases, samples from single demes tend to be closely related, so the numbers of polymorphisms will decrease and the numbers of (apparent) fixation events will increase. This is true regardless of whether γ is positive, zero, or negative, although the relative magnitudes of the four quantities depend strongly on γ. The curves for E(K_s) and E(S_s) are, of course, identical for all values of γ. The results that would be obtained by assuming lim_{N→∞} 2Nm = M and using Equations 8 and 17 would be similar to what is shown in Figure 2 if M were varied from 0.02 to 200.

    fig. omitted

    Figure 2. The dependence on migration rate (m) of the expected values of K_s, K_a, S_s, and S_a computed using Equations 18-21, assuming n₁ = n₂ = 10 and N = 100. In addition, θ_s = 10, θ_a = 5, and t_div = 7. (a) γ = 2; (b) γ = 0; (c) γ = -2.

    To understand the effects of (island-model) population subdivision for the extreme case of single-deme samples, we can use the "data" of Figure 2 to fit the parameters of SAWYER and HARTL's (1992) panmictic model. Figure 3 shows that estimates of γ are remarkably robust to subdivision, even in this case, where the effects of subdivision should be strongest. Again, if samples were taken singly from different demes, there would be no error in using the panmictic model. For single-deme samples there is some error when the migration rate is low, but even in the extreme case of m = 10^-4 (2Nm = 0.02) the estimates are off only by ~25%. However, the level of error will be greater for larger samples (see DISCUSSION) and when the absolute value of γ is larger. An additional effect is that the error in estimating γ is conservative in that the bias is toward neutrality regardless of whether γ is positive or negative. Figure 4 shows the effect on the other parameters: t_div, θ_s, and θ_a. As should be expected from Figure 2, mutation rates are underestimated and the divergence time is overestimated when the migration rate is small. The error in estimating these other parameters is much more extreme than that for γ. In addition, there is a small effect of γ on estimates of θ_a.

    fig. omitted

    Figure 3. The dependence on migration rate (m) of the estimated values of γ using the values of K_s, K_a, S_s, and S_a plotted in Figure 2 and assuming SAWYER and HARTL's (1992) panmictic PRF model. At the right (m = 1) the population is in fact panmictic, and γ is estimated accurately in all three cases.

    fig. omitted

    Figure 4. The dependence on migration rate (m) of the estimated values of θ_s, θ_a, and t_div using the values of K_s, K_a, S_s, and S_a plotted in Figure 2 and assuming SAWYER and HARTL's (1992) panmictic PRF model. Estimates of θ_s and t_div depend only on neutral variation, but estimates of θ_a show some effect of selection. The three curves are, from the top, γ = 2, γ = 0, and γ = -2.

    The expected number of neutral segregating sites:

    Under neutrality, the results presented here agree with those found using a coalescent approach in WAKELEY 1998, and later in WAKELEY 1999, WAKELEY 2001, which were derived under the assumption that lim_{N→∞} 2Nm = M. We make the same assumption here and further assume that this occurs in such a way that the diffusion result still holds (see Limiting allele frequency dynamics at a single locus). Then we can use g(x_k|x) and h*(i_k|x, n_k) in expression (15) to show that the expected number of synonymous segregating sites is equal to θ_s Σ^{n-1}_{i=1} 1/i when all n samples are taken from separate demes. This was found in WAKELEY 1998 to hold for samples from a neutral genetic locus in the large-D island model, under the assumption of no intralocus recombination. We expect this agreement under the infinite-sites model of mutation, because the marginal distribution of genealogies at a single site must be the same as that of an entire nonrecombining locus under neutrality. It is important to note that the variances and other moments of the numbers of segregating sites do depend on the recombination rate.
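    The harmonic-sum result can be checked numerically from the forward side. The sketch below assumes that, under neutrality, the PRF intensity of sites at total-population frequency x is θ/x, and that each of the n sequences, taken from a separate deme, shows the mutant base independently with probability x. Integrating the chance that the sample is polymorphic against that intensity should then reproduce θ times the sum of 1/i for i from 1 to n - 1. The intensity is an assumption of this sketch, since the article's expressions for the expected counts are not reproduced in this text.

```python
def expected_segregating_sites(theta, n):
    """Closed form: theta * sum_{i=1}^{n-1} 1/i."""
    return theta * sum(1.0 / i for i in range(1, n))


def expected_segregating_sites_by_integration(theta, n, steps=20000):
    """Midpoint-rule integral of (theta/x) * [1 - x^n - (1 - x)^n] over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += (1.0 - x**n - (1.0 - x) ** n) / x
    return theta * total * h


if __name__ == "__main__":
    print(expected_segregating_sites(10.0, 10))
    print(expected_segregating_sites_by_integration(10.0, 10))
```

    The two functions should agree closely, because the integrand is a polynomial in x once the factor 1/x is cancelled.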

    Consider the number of segregating sites in a sample of n sequences, all from the same deme. From the coalescent approach we have (23) (WAKELEY 1998), in which S₁(i, j) are Stirling numbers of the first kind (ABRAMOWITZ and STEGUN 1964) and M_(n) = M(M + 1) ... (M + n - 1). Here, Equation 15 becomes (24), and this is shown in Appendix B to be equivalent to (23).

    DISCUSSION

    The results presented above can be understood in terms of a sample-size effect of subdivision, one that depends on how the sample is distributed among demes. In the limit of a large number of demes, the history of a sample under neutrality has two distinct phases: the scattering phase and the collecting phase described in WAKELEY 1999. Although in this analysis incorporating selection was not phrased in these terms, it is clear from Figure 2 that the same effect is at work, namely, that a scattering phase, which is a stochastic sample size adjustment that begins with a sample of size n and ends with n' lineages each in a separate deme, where 1 ≤ n' ≤ n (WAKELEY 1999), induces a downward sample-size adjustment to single-deme samples. In the case of large N and correspondingly small m, the scattering phase for a sample from a single deme is given by P[n'|n] = |S₁(n, n')|M^{n'}/M_(n), which appears in Equation 23. Figure 5 shows how the expected values of K_s, K_a, S_s, and S_a depend on n₁ = n₂ under panmixia with γ = 2. Thus, the values on the right-hand side of Figure 5 are identical to those on the right-hand side of Figure 2A. Although the scales of the horizontal axes are not the same, the effect of smaller migration rate is qualitatively similar to that of smaller sample size. The reason that the values on the left-hand sides of the two panels are different is that the average value of n' at the left in Figure 2A is equal to 1.06, which is considerably smaller than the practical lower limit of 2 in Figure 5. Instead, the values on the left-hand side of Figure 5 can be compared to those in Figure 2A for log₁₀(m) = -2.67, or m = 0.00215, which (with N = 100) gives E[n'] ≅ 2.
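    The scattering-phase distribution quoted above is simple to evaluate, and doing so reproduces the two average values of n' mentioned in the text. The sketch below builds the unsigned Stirling numbers of the first kind by their standard recursion and then forms P[n'|n] = |S₁(n, n')|M^{n'}/M_(n) with M = 2Nm. The function names are illustrative.

```python
def unsigned_stirling_first(n):
    """Row n of the unsigned Stirling numbers of the first kind, indexed by k."""
    row = [1]  # n = 0
    for size in range(1, n + 1):
        new = [0] * (size + 1)
        for k in range(1, size + 1):
            prev_k = row[k] if k < size else 0  # c(size - 1, k), zero when k = size
            new[k] = (size - 1) * prev_k + row[k - 1]
        row = new
    return row


def scattering_distribution(n, M):
    """P[n'|n] for n' = 1..n, for a sample of n taken from a single deme."""
    c = unsigned_stirling_first(n)
    rising = 1.0
    for k in range(n):
        rising *= M + k  # M_(n) = M (M + 1) ... (M + n - 1)
    return {k: c[k] * M**k / rising for k in range(1, n + 1)}


def expected_lineages(n, M):
    """E[n'], the mean number of lineages left after the scattering phase."""
    return sum(k * p for k, p in scattering_distribution(n, M).items())


if __name__ == "__main__":
    N, n = 100, 10
    for m in (1e-4, 0.00215):
        print(m, expected_lineages(n, 2 * N * m))
```

    With N = 100 and n = 10, the migration rates m = 10^-4 and m = 0.00215 correspond to M = 0.02 and M = 0.43, for which the mean number of remaining lineages is close to 1.06 and 2, respectively, matching the values given in the text.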

    fig. omitted

    Figure 5. An illustration that the overestimation of fixation events and underestimation of polymorphism levels result from a sample-size effect. Except for n₁ and n₂, parameters are the same as in Figure 2C, and the curves plot Equations 18-21 as a function of sample size.

    This work shows that inferences about natural selection made from DNA polymorphism and divergence data are robust to population subdivision (Figure 3) as long as the migration rate is not too low. This is remarkable in view of the strong effects subdivision has on numbers of polymorphisms, shown in Figure 2, but is understandable in terms of the effect of subdivision on θ_s, θ_a, and t_div. Except for the weak dependence of θ_a estimates on γ (Figure 4), subdivision and migration act equally on selected and neutral variation. In both cases, fixation events are overestimated and polymorphisms underestimated when the migration rate is small. This causes mutation rates to be substantially underestimated and divergence times grossly overestimated if subdivision is ignored, but these effects compensate one another and allow relatively accurate estimates of selection even if subdivision is ignored. Often γ will be the focus of study, but if θ_s, θ_a, and t_div are also of interest, it would be useful to have a framework for simultaneous inferences about migration rates, selection coefficients, and these other parameters. The theory presented above is a first step toward this goal.

    It is important to note that inferences about natural selection made from allele frequencies at polymorphic sites will be robust to subdivision only in the case of samples taken singly from different demes. Otherwise, the distribution of samples among demes will cause some frequency classes to be overrepresented, resulting in biased inferences. Even when all the samples are taken from the same deme, restricted migration can mimic the effect of positive γ on allele frequencies (WAKELEY and ALIACAR 2001). While allele frequencies at polymorphic sites provide an additional source of information about natural selection (HARTL et al. 1994), this illustrates that they are also greatly influenced by nonselective demographic factors; see also NIELSEN 2001. In addition, allele frequency patterns are quite sensitive to levels of recombination (BUSTAMANTE et al. 2001). Thus, it is especially important to account for subdivision when making inferences from allele-frequency data.

    ACKNOWLEDGMENTS

    I thank Dan Hartl, Thomas Nagylaki, Stanley Sawyer, and Clifford Taubes for helpful discussions of the work. I am also grateful to Sabin Lessard for seeing that deme sizes need not be large for the large-number-of-demes coalescent to hold. This work was supported by grants DEB-9815367 and DEB-0133760 from the National Science Foundation.

    Manuscript received August 6, 2002; Accepted for publication October 14, 2002.

    APPENDIX A

    Again, Z^D_i(t) is the fraction of demes that contain i copies of the mutant. Let Z^D_ij(t) record the fraction of demes that contain i copies of the mutant and are descended from a deme that contained j copies of the mutant in the previous generation. Of course, Z^D_i(t) = {sum}^N_j=0 Z^D_ij(t), and we can study the behavior of Z^D_i(t) by considering the simpler behavior of Z^D_ij(t). In particular, conditional on the state of the system z(t) at time t,

    and Z^D_ij(t + 1) and Z^D_kl(t + 1) are independent for all i and k, and j ≠ l. Thus, we have conditional moments

    All the higher central moments of the Z^D_i(t + 1) are o(1/D).
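    As an informal illustration of the bookkeeping above, the following Python sketch simulates one generation in a finite collection of D demes and tabulates the fractions Z_i of demes carrying i copies of the mutant. It is not the derivation in the text. It assumes a neutral haploid island model in which a deme with j mutant copies is resampled binomially at frequency q_j = (1 - m)j/N + mx, where x is the mean mutant frequency among demes; this form of q_j, the function names, and the parameter values are assumptions made only for illustration.

```python
import random


def step(counts, N, m, rng):
    """Advance every deme by one generation (assumed neutral haploid island model).

    counts[d] is the number of mutant copies in deme d, each deme has N individuals,
    and m is the migration rate. The resampling frequency q below is an assumed form.
    """
    D = len(counts)
    x = sum(counts) / (N * D)  # mean mutant frequency among demes
    new_counts = []
    for j in counts:
        q = (1.0 - m) * j / N + m * x  # assumed post-migration mutant frequency
        # Binomial(N, q) draw built from N Bernoulli trials (standard library only).
        new_counts.append(sum(rng.random() < q for _ in range(N)))
    return new_counts


def deme_fractions(counts, N):
    """Return [Z_0, ..., Z_N], the fraction of demes with exactly i mutant copies."""
    D = len(counts)
    z = [0.0] * (N + 1)
    for c in counts:
        z[c] += 1.0 / D
    return z


def mean_frequency(z, N):
    """Return x = sum_i i * z_i / N, the average mutant frequency among demes."""
    return sum(i * zi for i, zi in enumerate(z)) / N


if __name__ == "__main__":
    rng = random.Random(1)
    N, m = 10, 0.1
    counts = [3] * 50 + [0] * 50  # hypothetical starting state with D = 100 demes
    counts = step(counts, N, m, rng)
    z = deme_fractions(counts, N)
    print(z, mean_frequency(z, N))
```

    Repeating the step for increasing D (with the other parameters held fixed) gives a rough empirical feel for how fluctuations in the Z_i shrink as the number of demes grows, which is the regime the moment statements above describe.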

    Now let X^D(t) = {sum}^N_i=0 iZ^D_i(t)/N, and Y^D_i(t) = Z^D_i(t) - {nu}_i(t). The diffusion result follows from these results (derived below) for changes over one generation:

    in which t has been suppressed, x = {sum}^N_i=0 iz_i/N, and y_i = z_i - {nu}_i, and

    (A12)

    The fixation index is given by

    It is clear from A12 that c(x, 0) = 0 for all x {isin} (0,1). If, in addition, the zero solution of the difference equation

    (A14)

    is globally asymptotically stable, then the diffusion (9) holds (ETHIER and NAGYLAKI 1980). Note that y = 0 is equivalent to z_i = {nu}_i and that A14 is equivalent to Y(k + 1, x, y) = Y(k, x, y)P*. Proof of A14 follows from the ergodicity of the stochastic matrix P*, i.e., that lim_{k -> {infty}} P^*(k)_ij = {nu}_j, along the same lines as the proof in NAGYLAKI 1980 (pp. 111–112).
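    The ergodicity property used here can be checked numerically on a small example. The Python sketch below builds a hypothetical within-deme transition matrix with the mean frequency x held fixed, raises it to a high power, and confirms that every row approaches the same stationary vector. The binomial form of the matrix entries, with resampling frequency (1 - m)j/N + mx, and the values N = 4, m = 0.2, x = 0.3 are assumptions for illustration and are not taken from the text.

```python
from math import comb


def transition_matrix(N, m, x):
    """Assumed (N+1) x (N+1) stochastic matrix with entries P[j][i].

    Row j gives the Binomial(N, q_j) distribution of the next mutant count in a
    deme that currently holds j copies, where q_j = (1 - m) * j / N + m * x.
    """
    P = []
    for j in range(N + 1):
        q = (1.0 - m) * j / N + m * x
        P.append([comb(N, i) * q**i * (1.0 - q) ** (N - i) for i in range(N + 1)])
    return P


def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(P, k):
    """Return the k-step transition matrix P^k by repeated multiplication."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result


if __name__ == "__main__":
    N, m, x = 4, 0.2, 0.3  # hypothetical parameter values
    Pk = mat_power(transition_matrix(N, m, x), 200)
    for row in Pk:
        print(row)  # rows should agree to within rounding error
    nu = Pk[0]
    print(sum(i * nu[i] for i in range(N + 1)) / N)  # mean of the limiting distribution
```

    Under this assumed matrix the mean of the limiting distribution equals x, because the expected frequency after one step is (1 - m) times the current expected frequency plus mx.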

    The derivation of A9 follows from A4. For A5 we have

    Putting in q_j from 1 and simplifying give

    (A18)

    which gives (A5) if we put z_i = y_i + {nu}_i on the right and simplify using 7.

    For A6 we have

    Again, putting in q_j and simplifying, this becomes A6.

    For A7 we have

    As in (A20) above, the second sum on the right in (A26) is equal to E[X^D(1) - x|z], which, from (A5), is o(1). Expanding and considering the third and fourth central moments of Z^D_i(1) gives the result (A7).

    In the derivations of (A8) and (A9) below I assume that the exact solution of (4) is sufficiently close to (5) that the latter can be used in place of the exact solution. More precisely, I assume that

    where the coefficients r_k depend on N, i, m, and x. This is certainly true for 5, and because (5) and the exact solution of (4) are nearly identical in form (see 1 and associated text), it should also be true of the exact solution although the coefficients r_k will be different.

    For A8 we have

    Using (A27), the second term on the right in A29 becomes

    Then by the same argument that gave (A7), using (A1), it can be shown that these higher moments are also o(1/D). Because of this, and putting in {nu}_i = {sum}^N_j=0 {nu}_j P^*_ji, A29 becomes

    which is equal to (A8).

    For A9 we have

    using (3.12) in ETHIER and NAGYLAKI 1980. From (A3), we have Var[Z^D_i(1)|z] = o(1). From A27 we can see that, like (A30), the second term in (A34) ultimately depends on the moments of Z^D_i and so is also o(1). Therefore, Var[Y^D_i(1)|z] = o(1) as required in A9.

    This completes the derivation of (A5–A9), showing that Theorem 3.3 in ETHIER and NAGYLAKI 1980 can be applied and that the diffusion x(·) with generator (9) in the text holds as D goes to infinity.

    APPENDIX B

    Beginning with 24, and then putting in g(x|x) and {phi}_s(x), we have

    Using the identity

    we obtain

    which is the same as 23 since the first term (n' = 1) is equal to zero.

    LITERATURE CITED

    ABRAMOWITZ, M., and I. A. STEGUN, 1965 Handbook of Mathematical Functions. Dover, New York.

    AKASHI, H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221-238.

    BUSTAMANTE, C. D., J. WAKELEY, S. SAWYER, and D. L. HARTL, 2001 Directional selection and the site-frequency spectrum. Genetics 159:1779-1788.

    BUSTAMANTE, C. D., R. NIELSEN, S. A. SAWYER, K. M. OLSEN, and M. D. PURUGGANAN et al., 2002 The cost of inbreeding in Arabidopsis. Nature 416:531-534.

    CHARLESWORTH, B., 2001 Effect of life history and mode of inheritance on neutral genetic variation. Genet. Res. 77:153-166.

    CHERRY, J. L. and J. WAKELEY, 2003 A diffusion approximation for selection and drift in a subdivided population. Genetics 163:421-428.

    DONNELLY, P., M. NORDBORG, and P. JOYCE, 2001 Likelihoods and simulation methods for a class of nonneutral population genetic models. Genetics 159:853-867.

    ETHIER, S. N. and T. NAGYLAKI, 1980 Diffusion approximations of Markov chains with two timescales and applications to population genetics. Adv. Appl. Prob. 12:14-49.

    EWENS, W. J., 1979 Mathematical Population Genetics. Springer-Verlag, Berlin.

    EWENS, W. J., 1990 Population genetics theory: the past and the future, pp. 177-227 in Mathematical and Statistical Developments of Evolutionary Theory, edited by S. LESSARD. Kluwer Academic Publishers, Amsterdam.

    FAY, J. C., G. J. WYCKOFF, and C.-I. WU, 2002 Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024-1026.

    FISHER, R. A., 1930 The Genetical Theory of Natural Selection. Clarendon Press, Oxford.

    FU, Y.-X. and W.-H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133:693-709.

    HARTL, D. L., E. N. MORIYAMA, and S. A. SAWYER, 1994 Selection intensity for codon bias. Genetics 138:227-234.

    HUDSON, R. R. and N. L. KAPLAN, 1988 The coalescent process in models with selection and recombination. Genetics 120:831-840.

    HUDSON, R. R., M. KREITMAN, and M. AGUADE, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.

    JOHNSON, N. L., S. KOTZ, and N. BALAKRISHNAN, 1997 Discrete Multivariate Distributions. Wiley, New York.

    KIMURA, M., 1969 The number of heterozygous nucleotide sites maintained in a finite population due to the steady flux of mutations. Genetics 61:893-903.

    LATTER, B. D. H., 1973 The island model of population differentiation: a general solution. Genetics 73:147-157.

    MARUYAMA, T., 1970 Effective number of alleles in a subdivided population. Theor. Popul. Biol. 1:273-306.

    MCDONALD, J. H. and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.

    MÖHLE, M., 1998 Robustness results for the coalescent. J. Appl. Prob. 35:438-447.

    MÖHLE, M., 2001 Forward and backward diffusion approximations for haploid exchangeable population models. Stoch. Proc. Appl. 95:133-149.

    MORAN, P. A. P., 1959 The theory of some genetical effects of population subdivision. Austr. J. Biol. Sci. 12:109-116.

    MORAN, P. A. P., 1962 Statistical Processes of Evolutionary Theory. Clarendon Press, Oxford.

    NAGYLAKI, T., 1980 The strong-migration limit in geographically structured populations. J. Math. Biol. 9:101-114.

    NEUHAUSER, C. and S. M. KRONE, 1997 The genealogy of samples in models with selection. Genetics 145:519-534.

    NIELSEN, R., 2001 Statistical tests of neutrality in the age of genomics. Heredity 86:641-647.

    NORDBORG, M., 1997 Structured coalescent processes on different time scales. Genetics 146:1501-1514.

    NOTOHARA, M., 1993 The strong migration limit for the genealogical process in geographically structured populations. J. Math. Biol. 31:115-122.

    RANNALA, B., 1996 The sampling theory of neutral alleles in an island population of fluctuating size. Theor. Popul. Biol. 50:91-104.

    ROTHMAN, E. D., C. F. SING, and A. R. TEMPLETON, 1974 A model for the analysis of population structure. Genetics 78:934-960.

    SAWYER, S. A. and D. L. HARTL, 1992 Population genetics of polymorphism and divergence. Genetics 132:1161-1176.

    SLATKIN, M., 1985 Gene flow in natural populations. Annu. Rev. Ecol. Syst. 16:393-430.

    SLATKIN, M. and G. BERTORELLE, 2001 The use of intraallelic variability for testing neutrality and estimating population growth rate. Genetics 158:865-874.

    SMITH, N. G. and A. EYRE-WALKER, 2002 Adaptive protein evolution in Drosophila. Nature 415:1022-1024.

    TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.

    WAKELEY, J., 1998 Segregating sites in Wright's island model. Theor. Popul. Biol. 53:166-175.

    WAKELEY, J., 1999 Non-equilibrium migration in human history. Genetics 153:1863-1871.

    WAKELEY, J., 2001 The coalescent in an island model of population subdivision with variation among demes. Theor. Popul. Biol. 59:133-144.

    WAKELEY, J. and N. ALIACAR, 2001 Gene genealogies in a metapopulation. Genetics 159:893-905 (corrigendum: 160:1263-1264).

    WRIGHT, S., 1931 Evolution in Mendelian populations. Genetics 16:97-159.

    YANG, Z., 1998 Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.

    (John Wakeley)
