当前位置: 首页 > 医学版 > 期刊论文 > 基础医学 > 分子生物学进展 > 2005年 > 第8期 > 正文
编号:11258342
Mathematical Modeling of Evolution of Horizontally Transferred Genes
     National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

    E-mail: koonin@ncbi.nlm.nih.gov.

    Abstract

    We describe a stochastic birth-and-death model of evolution of horizontally transferred genes in microbial populations. The model is a generalization of the stochastic model described by Berg and Kurland and includes five parameters: the rate of mutational inactivation, selection coefficient, invasion rate (i.e., rate of arrival of a novel sequence from outside of the recipient population), within-population horizontal transmission ("infection") rate, and population size. The model of Berg and Kurland included four parameters, namely, mutational inactivation, selection coefficient, population size, and "infection." However, the effect of "infection" was disregarded in the interpretation of the results, and the overall conclusion was that horizontally acquired sequences can be fixed in a population only when they confer a substantial selective advantage onto the recipient and therefore are subject to strong positive selection. Analysis of the present model in different domains of parameter values shows that, as long as the rate of within-population horizontal transmission is comparable to the mutational inactivation rate and there is even a low rate of invasion, horizontally acquired sequences can be fixed in the population or at least persist for a long time in a substantial fraction of individuals in the population even when they are neutral or slightly deleterious. The available biological data strongly suggest that intense within-population and even between-populations gene flows are realistic for at least some prokaryotic species and environments. Therefore, our modeling results are compatible with the notion of a pivotal role of horizontal gene transfer in the evolution of prokaryotes.

    Key Words: horizontal gene transfer ? mathematical modeling ? evolution of prokaryotes

    Introduction

    Sequencing of multiple, complete genomes of diverse life forms and the ensuing advent of comparative genomics have dramatically changed the prevailing picture of evolution, at least for the prokaryotic world. It became clear that the evolutionary process is much more flexible and dynamic than previously imagined. In addition to the vertical inheritance of genes along a tree-like evolutionary trajectory, lineage-specific gene loss and horizontal (lateral) gene transfer (HGT) have emerged as major evolutionary forces, leading to the ideas of "uprooting the tree of life" and the concept of "horizontal genomics" (Pennisi 1998; Doolittle 1999a, 1999b, 2000; Pennisi 1999; Koonin, Aravind, and Kondrashov 2000; Koonin, Makarova, and Aravind 2001). Under a somewhat extreme view of the prevalence of HGT in the evolution of prokaryotes, even coherent tree topologies observed in multigene analyses might be due to gradients of HGT propensity permeating the prokaryotic world: stable clusters in such trees are thought to comprise groups of microbes which exchange genes frequently (Gogarten, Doolittle, and Lawrence 2002). However, the extent of HGT in prokaryotic evolution remains a matter of contention (Kurland 2000; Kurland, Canback, and Berg 2003). Like other events that happened in the evolutionary past, each individual case of HGT is hard to prove beyond reasonable doubt. Relatively recent cases of probable HGT are usually demonstrated through anomalous nucleotide composition or codon usage of the genes thought to have been transferred (Tsirigos and Rigoutsos 2005). These methods are not applicable to putative ancient transfers which are detected on the basis of unexpected phyletic patterns of genes (i.e., patterns of presence-absence in genomes from different taxa) and/or discrepancies in the topologies of phylogenetic trees (Ochman, Lawrence, and Groisman 2000; Koonin, Makarova, and Aravind 2001; Ragan 2001; Lawrence and Hendrickson 2003). In many studies, explicit phylogenetic analysis is complemented or replaced by simpler analyses of sequence similarity rankings, usually, based on Blast search results (Koonin et al. 1997; Aravind et al. 1998). None of these approaches is free of substantial caveats (Kurland, Canback, and Berg 2003). Any phyletic pattern, however unusual, in principle, can be explained solely through multiple, lineage-specific gene losses (Koonin 2003). Similarly, phylogenetic tree topologies and sequence similarity rankings are strongly affected both by gene loss and by unequal rates of evolution in different lineages (Koski and Golding 2001). As a result, it has been posited that the notion that HGT dominates prokaryotic evolution is a misconception stemming from noncritical data analysis which suggests vastly exaggerated rates of HGT (Kurland 2000; Kurland, Canback, and Berg 2003).

    As noticed in a recent review by Lawrence and Hendrickson, the study of HGT remains a research field in its adolescence (Lawrence and Hendrickson 2003). This does not seem surprising given that the appreciation of the potential major significance of HGT as an evolutionary factor dates only from the last 3–4 years of the 20th century when systematic comparison of multiple sequenced genomes became possible (Koonin et al. 1997; Doolittle 1999b; Koonin, Aravind, and Kondrashov 2000). The lack of certainty regarding the true extent of HGT is one of the crucial aspects of this immaturity of "horizontal genomics." The other distinct but not unrelated aspect is the need to integrate HGT into the framework of the existing evolutionary theory which, in its current form, is based solely on the notion of vertical inheritance of genetic characters (Kimura 1983). Specifically, it is necessary to identify the neutral and/or selective evolutionary factors that affect the fate of a horizontally transferred gene leading to its fixation in or elimination from a recipient population. It seems likely that a robust evolutionary theory of HGT will provide feedback for assessing evidence of individual HGT cases and ultimately for more reliable estimates of the role of this phenomenon in evolution. The first evolutionary-theoretical analysis of HGT in microbes has been reported by Berg and Kurland (2002). They concluded that, at least in populations of large effective size that are typical of most prokaryotes, horizontally acquired genes can persist and get fixed only when they provide a strong selective advantage to the recipient. It seems that cases when horizontally acquired genes are strongly beneficial to the recipient should be exceedingly rare. Indeed, when HGT occurs, a gene moves from one cellular background to another, especially when the donor and acceptor are taxonomically and biologically distant. Because cellular components are fine-tuned by natural selection for coordinate functioning, chances that an alien protein confers a selective advantage onto the recipient will be low. This would apply to different functional systems of the cell to a different degree, with those that are based on large, multiprotein complexes being affected most strongly as captured in the complexity hypothesis of Lake and coworkers (Jain, Rivera, and Lake 1999). Nevertheless, the biologically reasonable expectation seems to be that cases of unequivocally beneficial HGT should be (extremely) rare. Exceptions are known, e.g., situations when a horizontally acquired gene makes the recipient resistant to an antibiotic(s), allows it to occupy a new nutritional niche or makes it a pathogen (Brown, Zhang, and Hodgson 1998; Mazel and Davies 1999; Rowe-Magnus, Davies, and Mazel 2002). Nevertheless, combination of this biological reasoning and the theoretical conclusions of Berg and Kurland fuels the contention that the extent of HGT in prokaryotes might have been seriously overestimated by the proponents of "horizontal genomics" and that HGT is, perhaps, an important but not a decisive factor of prokaryotic evolution (Kurland, Canback, and Berg 2003).

    Here we develop more general theoretical models of HGT between microbial populations and identify the conditions under which fixation of neutral or even slightly deleterious horizontally transferred genes is possible.

    Results and Discussion

    Deterministic Model

    We consider a haploid asexual population with overlapping generations (continuous time). In this section, we assume that the effect of stochastic evolutionary factors, such as genetic drift, is negligible. It is assumed that the population under consideration consists of two types of individuals: those that carry a particular novel sequence and those that do not. This means that the state of a population with regard to the novel sequence in question at a given moment t is completely determined by two numbers n1(t) and n2(t) (the numbers, respectively, of the first- and second-type individuals at the moment t) or just one number p(t) = n1(t)/(n1(t) + n2(t)), where p(t) is the fraction of the first-type individuals in the population.

    The assumptions of the simplest model are as follows. A novel sequence can be transmitted vertically through cell division or acquired via HGT. The novel sequence is also subject to mutational inactivation (point mutations, insertions, and deletions) with a constant rate u. We assume that Malthusian parameters (intrinsic growth rates) are m1 and m2 (Nagylaki 1992). In addition, we assume that there is invasion of the first-type individuals with the rate N where N is the total size of the population, N = n1 + n2. Specifically, with regard to HGT, invasion means that acquisition of a particular alien gene is not a unique event, with a negligibly small probability of repeated occurrence, but rather that there is continuous influx of the alien gene, even if occurring at a low rate.

    Taking into account only processes of mutation, selection, and invasion, the textbook system (Nagylaki 1992) is obtained for the number of individuals of different types,

    or for frequency p,

    where q = 1 – p and is the mean continuous fitness, This is the simplest mathematical model describing changes in the genetic content of a population. Note that, in this model, invasion is a special case of recurrent mutation with the rate .

    In terms of frequencies of different types of individuals in the population, the assumption that invasion occurs with the rate N is equivalent to replacing the second-type individuals with first-type individuals (those that carry the novel sequence) with the rate . To be more precise, if we consider the following system of differential equations

    the equation for the frequency p of first-type individuals remains unchanged.

    There are three principal mechanisms for transfer of genes between microbes: (1) transduction, bacteriophage-mediated gene transfer, (2) transformation, transfer of DNA (e.g., released from dead microbes) from the environment to a recipient cell, and (3) conjugation, direct transfer of DNA from one cell to another mediated by a plasmid (Bushman 2001). Each of these processes can mediate HGT both within a microbial population (we will call this "infection" for short) and between populations (invasion). Generally, it is expected that infection rates are substantially higher than invasion rates.

    We can define the infection rate as the probability per unit of time for an individual that does not have the novel sequence to acquire it (to become "infected"). As an analog of the law of mass action in chemical kinetics, we assume that the infection rate is proportional to n1, with the proportionality constant . The intensity of contact between infected and uninfected organisms is proportional to n1n2/N = n1n2/(n1 + n2). This directly applies to conjugation, which involves physical contact between two cells, but generally holds true for transduction and transformation as well inasmuch as the release of the novel sequence within a transducing agent or transforming DNA is proportional to the number of cells which harbor that sequence. Combining all the above assumptions, we obtain the following deterministic HGT-selection-mutation-invasion model

    (1)

    or

    (2)

    for the frequency of the infected individuals.

    Assuming m1 – m2 = s, where s is the selection coefficient of the infected individuals, we obtain a logistic-type equation with immigration

    (3)

    that can be easily analyzed (e.g., Matis and Kiffe 1999). Note that the rate is the rate at which uninfected individuals become infected which, as outlined above may involve different biological mechanisms. If all the parameters have nonzero values and 0 p 1, equation (3) has only one equilibrium solution satisfying the condition 0 < p* < 1 and this equilibrium is asymptotically stable. Thus, under this model, a gene acquired via HGT will persist in the population as long as there is continuous (even if low-rate) invasion, i.e., influx of the novel sequence. However, even if = 0, i.e., there is no invasion and s < 0, i.e., the novel sequence is deleterious, but > –s, there is a stable interior equilibrium that corresponds to the HGT-selection-mutation balance. This describes another scenario (distinct from the scenario with a selectively advantageous gene acquired) for the persistence of an acquired gene even without continuous influx: the novel gene will persist if there is a sufficiently effective within-population mode of transmission (infection).

    Equations (1–3) are classical models of population genetics of asexual organisms. The inclusion of the processes of infection and invasion and application of the law of mass action allows us to incorporate HGT within the bounds of known mathematical models.

    Stochastic Model

    The simple, deterministic model described in the previous section includes only systematic factors of evolution whose rates are assumed to be constant (Wright 1949). To model genetic drift in the population, we formulate a stochastic counterpart of the model (1–3). To make the model mathematically manageable, we assume that the total size of the population is constant and equals N. We use a Markov birth and death process {X(t), t 0} with the finite state space . The number, n, of individuals that carry a particular novel sequence (a gene acquired via HGT) determines the state of the population. Transitions in the process are only allowed to neighbor states. The rate of transition from state n to state n + 1 (the "birth rate" of infected individuals) is denoted n, and the rate of transition from the state n to the state n – 1 is denoted by μn (the "death rate" of infected individuals, i.e., the rate at which the novel sequence is lost, which includes actual cell death among other processes).

    The Kolmogorov forward equations for the state probabilities can be written as

    In order to make sense of this equation, we put formally The state probabilities depend on the initial distribution pn(0).

    This scheme corresponds to a Moran model (Moran 1958) which, in the case of a haploid population with overlapping generations, is more realistic for microbial populations than the more widely used Wright-Fisher model; in particular, a Moran model was adopted in the work of Berg and Kurland (2002).

    Statement of the Model

    In order to analyze the model, we need to identify all the rates of the birth and death process. Let the individuals without the novel sequence divide with rate ; we will define the time unit in the model such that = 1. Then, the individuals with the novel sequence divide with the rate (1 + s), where s is the selection coefficient. When one of the individuals divides, we remove a random individual from the population to keep its size constant. As in the deterministic model (1–3), we take into account processes of inactivating mutation with the constant rate u, invasion with the constant rate N, and within-population transmission of the novel sequence (infection) with the rate .

    The system can change its state from n to n + 1 if an infected cell divides and there is no inactivating mutation or if a new infected cell immigrates. In this case, the total number of cells is N + 1, and we need to remove one of the N – n cells (the cells that do not carry the novel sequence); thus, the probability of this event is (N – n)/(N + 1). In addition, any cell that does not carry the novel sequence can become infected with the rate .

    The jump from state n to state n – 1 occurs when a cell without the novel sequence divides or a cell with the novel sequence divides and an inactivating mutation afflicts one of the daughter cells. In this case, we need to remove an infected cell from the population, an event with the probability n/(N + 1). Accordingly, for the birth and death rates of infected cells, we obtain

    (4)

    According to the rate equations (4), we deal with a birth and death process with a finite state space and reflecting boundaries (i.e., ). As in the deterministic model (1–3), we can consider the rate as the rate with which uninfected individuals become infected because we can rewrite the birth rate in the following form:

    Berg and Kurland (2002) considered a Markov process with rates (4) and = 0 in which there is no invasion and, accordingly, the state 0 is absorbing. Under this model, regardless of the initial distribution of the novel sequence, it will be ultimately lost in the population. In practice, the decision on whether or not to include invasion in the model rests on the comparison between the mean time to extinction from a particular initial distribution and the mean time before the appearance of a new individual carrying the novel sequence. As discussed in the last section, under conditions favoring HGT, e.g., when the donor and recipient organisms coexist in close proximity, the rate of invasion is likely to be nonnegligible compared to the rate of extinction of a novel sequence. Therefore, models with invasion are, generally, more realistic than those without it. From a formal viewpoint, inclusion of the possibility of invasion changes the qualitative behavior of the stochastic model in a way that simplifies the analysis because the stochastic model with nonzero invasion has a stable stationary distribution (a direct analog of the stable equilibrium in the deterministic model).

    Stationary Distribution

    The birth and death process with a finite state space and reflecting boundaries has the unique, stable stationary distribution p* that can be easily obtained noting that the following relation must be satisfied at equilibrium:

    Rearranging and iterating gives

    (5)

    The formula (5) is easy to evaluate numerically. If we assume that there are limits then the following theorem holds.

    Theorem 1. Suppose the parameters of the birth and death process with rates (4) are such that the following limits exist: Then, if and the stationary distribution (5) asymptotically tends to the distribution with the density

    (6)

    where and C is a constant chosen such that

    The proof of this theorem is given in the Appendix.

    The stationary distribution (6) is a complete analog of the stationary distribution of the diffusion approximation of the Wright-Fisher selection-mutation model (e.g., Goel and Richter-Dyn 1974). The only difference is the presence of one additional parameter—the infection rate —which, as in the deterministic case, is added to the selection coefficient s.

    All the qualitatively different forms of the stationary distribution can be classified using the approximation (6). This classification only depends on the products

    If r, q < 1 (i.e., N, uN < 1), which corresponds to a situation with low rates of inactivating mutations and invasion, the most probable states are near the boundaries. If h > 0 (i.e., s + > 0), more probability is concentrated near x = 1 (fig. 1a), otherwise (if h < 0), more probability is concentrated near x = 0. The distribution has a single minimum. In substantive terms, if the combined values of the selective advantage and infection rate favor the survival of the novel sequence, it tends to sweep the population; otherwise, it is likely to go extinct.

    FIG. 1.— Possible qualitatively different stationary distributions of the HGT-mutation-selection-invasion model with rates (4). In each panel, the approximation (6) and numerically evaluated exact stationary distribution are depicted. The stationary distribution is normalized such that the areas under the curves equal one. The parameters are (a) h = 0.06, r = 0.98, q = 0.97; (b) h = 0.6, r = 0.8, q = 4; (c) h = 6, r = 0.8, q = 4; (d) h = 0.06, r = 2, q = 0.8; (e) h = –3.8, r = 2, q = 0.8; (f) h = 0.06, r = 6, q = 1.8.

    If r < 1, q > 1 (low invasion rate, high rate of inactivating mutations), then the most probable state is near x = 0, i.e., the novel sequence tends to perish (fig. 1b). However, for a particular range of values s + > 0, the distribution can have a local maximum (fig. 1c).

    In the case r > 1, q < 1 (high invasion rate, low rate of inactivating mutations), the reverse is observed, with the most probable state being fixation of the novel sequence; however, the distribution can have a local maximum if s + < 0 (fig. 1d and e).

    Finally, if r > 1, q > 1 (high rates of invasion and inactivating mutations), the distribution is unimodal and this is the only case when the most probable state is close to the deterministic steady-state value for equation (3) (fig. 1f).

    To summarize, the stationary distribution under the stochastic model can have at most two extremes. The most probable states are determined by the joint effect of all five parameters of the model. The approximation (6) closely mimics exact distributions produced by numerical simulation, at least for the considered ranges of parameter values (fig. 1).

    Approximation (6) can help decide which parameters make substantial contributions to the evolution of the population in each particular situation. Let us consider the quantity —the average population penetration of the novel sequence (i.e., the average fraction of type 1 individuals). This is, simply, the expected value of a random variable with the density function (6). In figure 2a, the level lines of the function are shown for a fixed q = uN. In figure 2b, the implicit function is plotted for different values of q. These graphs show that if the rate of invasion N is substantially lower than the rate of inactivating mutation uN, significant penetration (on average) can be reached only with high positive values of (s + )N.

    FIG. 2.— (a) Contour plot of the average population penetration with the fixed value uN = 0.5. (b) Level lines for 50% population penetration of the novel sequence for different values of uN.

    Probability of Fixation

    We can formally let to make states 0 and N absorbing. In this situation, the fate of a unique individual carrying the novel sequence can be examined. Through genetic drift, this novel sequence can be lost (the state 0 is reached) or can penetrate all the population (the state N is reached). In the latter case, we speak of fixation of the novel sequence in the population although it has to be kept in mind that, in the case of reflecting boundaries, the system does not stay in state N. Letting is somewhat artificial but the reasoning behind examining this situation is as follows. Firstly, we can assume that, once the novel sequence is acquired by all the individuals in a population, it becomes essential and cannot be lost (hence ). When the rate of invasion is small, the typical fate of a novel sequence appearing in the population is extinction. Under typical conditions (low invasion and inactivation mutation rates), most of the time, the population waits for a "lucky" sequence to be fixed; accordingly, letting 0 = 0 is not unrealistic. The time of fixation conditioned such that the fixation does occur is usually much lower than the mean time to reach state N with a reflecting boundary at n = 0.

    The probability that the system ends up in the state N (the probability of fixation) if initially there is only one individual carrying this novel sequence is (e.g., Goel and Richter-Dyn 1974)

    Using approximations given in the proof of Theorem 1 (see Appendix), it can be shown that the following integral approximation can be used to evaluate Pfix:

    (7)

    First, let = 0. This case was examined by Berg and Kurland (2002) who showed that, if s + < u, then the probability of fixation is virtually zero for large N (more precisely, this, of course, implies the effective population size Ne; hereinafter, we use N for simplicity). If , there is a plateau where the probability of fixation does not depend on N, and for large N, there is a sharp drop in this probability. Finally, if , the probability of fixation has a limit that does not depend on N. These three distinct behaviors are illustrated in figure 3a.

    FIG. 3.— Probability of fixation of the novel sequence as a function of the population size N. (a) The case of no invasion, = 0. The dashed line is the probability of fixation of a neutral sequence in the presence of no evolutionary forces except for genetic drift, Pfix = 1/N. The inactivating mutation rate u = 10–7. The values of s + for the solid curves from bottom to top are 0, 10–8, 5 x 10–7, 8 x 10–7, 5 x 10–6, 10–5. (b–d) The case of invasion with different parameters. (b) Parameters are u = 10–7, s + = 10–8. The values of for the curves from bottom to top are 0, 10–8, 2 x 10–8, 5 x 10–8. (c) Parameters are u = 10–7, s + = 8 x 10–7. The values of for the curves from bottom to top are 0, 10–9, 10–8, 2 x 10–8. (d) Parameters are u = 10–7, s + = 5 x 10–6. The values of for the curves from bottom to top are 0, 10–10, 10–9, 5 x 10–9.

    All the curves in figure 3a were obtained using the integral approximation (7). This integral approximation holds well in most ranges of parameters except in the region all analyses described in this work were well within the applicability range of the approximation. If qualitative changes in the behavior of Pfix are observed. Obviously, Pfix is expected to increase. However, if < u, i.e., the invasion rate is lower than the inactivation rate, and the limiting behavior of Pfix is the same as in the case without invasion, namely, when (fig. 3b and c). There is, however, a range of N values in which Pfix is greater than the same probability with = 0. In contrast, in the case of inclusion of even low-rate invasion results in a notable change of the limiting behavior of Pfix: the plot of Pfix versus the population size N has a minimum at the point N = Nmin, and for N > Nmin, Pfix increases with the increase of N (fig. 3d).

    The probability of fixation in the neutral case (s = 0) without immigration is primarily determined by the ratio of the rate of inactivating mutations, which oppose fixation, and the rate of horizontal transmission of the novel sequence within the recipient population. If the transmission (infection) rate is high enough, neutral fixation of the novel sequence is an event with a nonzero probability. The inequality is not very restrictive because it demands that the transmission rate is approximately 10–20 times greater than the inactivation rate. Moreover, even if the novel sequence is slightly deleterious, it can be fixed not only due to random drift in a finite population but also due to the possibility of infection (fig. 4). The fixation of slightly deleterious alleles in a finite population leading to a decline in the mean fitness of the population is known as Muller's ratchet (Muller 1964; Felsenstein 1974). The relatively high rate of the within-population transmission (infection) offers an alternative scenario for reducing the mean population fitness during evolution.

    FIG. 4.— Probability of fixation of the novel sequence as a function of the selection coefficient s. Parameters are N = 108, u = 5 x 10–9, = 5 x 10–7 (the upper curve), = 5 x 10–8 (the lower curve). The dashed horizontal line is the probability of fixation of a neutral sequence Pfix = 1/N when there are no evolutionary forces except for the random genetic drift.

    Quasistationary Distributions

    If there is no invasion in the model, the novel sequence is doomed to extinct. More precisely, let = 0. It is readily shown that the process {X(t)} has a degenerate stationary distribution The distribution of X(t) approaches the stationary distribution as time t approaches infinity. Thus, ultimate absorption (extinction of the novel sequence) is certain. To evaluate the mean time to extinction if initially only one individual carries the novel sequence, we can use the following formula (e.g., Goel and Richter-Dyn 1974):

    Using approximations given in the proof of Theorem 1 (see Appendix), an integral approximation for this quantity can be obtained (see also Berg and Kurland 2002):

    If s + < u, then is virtually the same as for the neutral expectation (when s + = 0). In contrast, if s + > u, i.e., selection combined with horizontal gene transmission dominates over inactivation, then the time to absorption (extinction) goes through a minimum and then increases sharply with increasing N (fig. 5; this figure mimics fig. 2B of Berg and Kurland [2002]). It is therefore of interest to analyze the distribution of X(t) prior to absorption. This is done using the concept of quasistationarity.

    FIG. 5.— Mean time to extinction as a function of the population size N. The inactivation rate u = 10–7. The values of s + for the curves from bottom to top are –10–8, 10–8, 10–7, 5 x 10–7, 10–6.

    There are many biological and ecological systems that eventually go extinct yet appear to be stable over any reasonable timescale. The notion of quasistationary distribution has proved to be a powerful tool for modeling the behavior of such systems (Pollet 1996). In particular, it allows one to predict the possible distribution of X(t) on its way to extinction.

    The state space of the birth and death process under consideration can be partitioned into two subsets, one containing the absorbing state 0 and the other comprised of the transient states Before absorption, the process assumes values in the set of transient states. If the process is conditioned on the event that absorption has not taken place at time t, then the conditional state probabilities qn(t) can be determined from the state probabilities pn(t) through the following relation:

    By differentiating this relation and using the Kolmogorov forward equations for pn(t), we can obtain a system of differential equation for qn(t). The quasistationary distribution q* is the stationary solution of this system of equations. The probabilities can be shown to satisfy the following system of difference equations:

    In the general case, these equations cannot be solved explicitly. They can, however, be used to derive the relations to which iteration methods for determining the quasistationary distribution can be applied (N?sell 2001) (the algorithm for evaluating the quasistationary distribution is presented in the Appendix). The simplest way to obtain an approximation of the quasistationary distribution is to restrict consideration to transient states (the state space is made strictly positive). By excluding zero from the state space, one can establish a related process without an absorbing state. This method has been applied in several mathematical models (Kendall 1949; Pielou 1969) and is valid when the time to extinction is reasonably large (N?sell 2001).

    We consider an auxiliary process {X0(t)} which can be described as the original process with the origin removed. Formally, we put μ1 = 0, while all other transition rates are equal to the corresponding rates for the original process. The stationary distribution for the process {X0(t)} is easy to determine. A good approximation for this stationary distribution is given by (6) with r = 0 and the normalization constant determined by This means that we can use equation (6) as an approximation for the quasistationary distribution. The main question is how close this approximation is to the actual quasistationary distribution.

    Figure 6 shows that the simple approximation (6) with = 0 is good (except for the area near 0) if s + >> u, i.e., this approximation holds when the mean time to extinction is sufficiently long (compare to fig. 5). However, even with lower values of s + , i.e., s + u(1 – ln u), the approximation (6) is close to the observed quasistationary distribution almost in the whole range of x except for the area near 0 (data not shown).

    FIG. 6.— Quasistationary distributions of the HGT-selection-mutation model. The thin solid and dotted curves are, respectively, the exact and approximate distributions of the auxiliary process with μ1 = 0, and the thick solid curves are quasistationary distributions obtained with iteration methods. The parameters are (s + )N = 8, uN = 9.5 (a), uN = 1.17 (b), and uN = 0.37 (c).

    Thus, even if invasion is not included in the model, the most probable states of the process can be near 1 depending on the values of the other parameters, i.e., for a certain part of the parameter space, the novel sequence, on its way to extinction, penetrates almost the entire population with a high probability.

    General Discussion and Conclusions

    The present model is a generalization of the stochastic model described by Berg and Kurland (2002). The model includes five parameters: inactivating mutation rate, selection coefficient, invasion rate, within-population horizontal transmission (infection) rate, and population size. Berg and Kurland come to the conclusion that horizontally acquired sequences can be fixed in a population only when these sequences confer a substantial selective advantage onto the recipient and therefore are subject to strong positive selection. However, although they formally consider the process of infection when formulating the model, its effect is excluded from their interpretation. Therefore, their conclusions are based on the model with only two processes: genetic drift and mutational inactivation. In this setting, it becomes self-evident that, in typical, large microbial populations, horizontally transferred sequences can survive for any appreciable duration of time only when they are strongly beneficial. Because such situations are reasonably expected to be rare (the exceptions, such as acquisition of antibiotic resistance or ability to utilize new nutrients, notwithstanding), the results of Berg and Kurland's modeling imply that HGT played less of a role in the evolution of prokaryotes than given in "horizontal genomics" concepts (Kurland, Canback, and Berg 2003).

    In the present model, two additional processes are included and, in a certain domain of parameter values, significantly contribute to the outcome. Quasiformally, the logic can be as follows.

    If the rate of within-population horizontal transmission of the novel sequence (infection) is comparable to the inactivating mutation rate, then the mean time to extinction of this novel sequence is quite long and, importantly, increases with the increase of the population size N (fig. 5). In this case, the approximation (6) for quasistationary distributions is applicable. The analysis under this approximation shows that the novel sequence can penetrate a significant part of the recipient population (the most probable states are near 1 in fig. 6).

    If the mean time to extinction is long, then the appearance of this particular novel sequence in the population may not be a unique event. Accordingly, invasion has to be taken into account, and the results obtained for the true stationary distribution are valid.

    If invasion is included in the model, then, within a reasonable time span, the novel sequence can penetrate a significant part of population (fig. 2) and, eventually, the horizontally transferred gene may be fixed. It is interesting to note that the mean time required for significant penetration is dramatically less than the mean time of fixation (fig. 7) which could have substantial consequences for the fate of the population.

    FIG. 7.— The mean times required to reach different levels of penetration of a novel sequence in a population. The y axis shows the ratio of the mean time to the given level of penetration E{Tk} to the mean time to fixation E{TN}. Parameter values: s + = 0.02, u = 0.0015, = 0.001, N = 3500.

    When a sequence persists in a population for a long time and, especially, when it gets fixed, there is a chance that the acquired gene becomes beneficial or even indispensable (essential).

    The present analysis shows that taking into account the processes of within-population transmission (infection) and invasion leads to conclusions that are dramatically different from those of Berg and Kurland: if the rates of these processes are nonnegligible, horizontally transferred sequences do get fixed or at least persist in a significant part of the recipient population for a long time, even if they are neutral or slightly deleterious.

    The pressing question, then, is: just how likely is it that these additional processes occur at significant rates? Unfortunately, quantitative estimates are lacking which precludes us from supplementing the mathematical analysis of the model with empirical estimates as Berg and Kurland have done with regard to the inactivation rate (Berg and Kurland 2002). Qualitatively, however, biological data suggest that both processes can be mediated by several mechanisms such that their rates vary within extremely broad ranges and the gene flow could be intense under favorable conditions. Bacteria are known to exchange genes via conjugative plasmids and integrative and conjugative elements (Osborn and Boltner 2002; Grohmann, Muth, and Espinosa 2003; Bennett 2004; Burrus and Waldor 2004). Furthermore, it has been extensively documented that many bacteria and archaea possess the molecular machinery for nonspecific DNA uptake and are highly competent for transformation which is, indeed, considered to be a major mechanism of gene transfer between microbes (Lorenz and Wackernagel 1994; Dubnau 1999; Redfield 2001; Claverys and Martin 2003; Chen and Dubnau 2004). All these processes are most intense within a microbial population, contributing to a potentially high rate of "infection" in our model. However, they are by no means limited to the same prokaryotic species and have been shown to occur even between phylogenetically distant prokaryotes (Davison 1999; Paul 1999). Such interspecies gene transfer which, again, may be intense under conditions of physical proximity between diverse prokaryotes, e.g., in microbial mats, can be a major contribution to the process of invasion in our model. That the amount of apparent HGT critically depends on ecological and physical closeness of the purported donors and recipients has been born out by comparative genomics. In particular, hyperthermophilic bacteria carry a disproportionate number of genes thought to have been horizontally acquired from archaea (Aravind et al. 1998; Nelson et al. 1999; Worning et al. 2000; Nesbo et al. 2001). Conversely, mesophilic archaea, such as Halobacterium and, especially, Methanosarcina, contain numerous genes of apparent bacterial origin, many more than hyperthermophilic archaeal species (Koonin, Makarova, and Aravind 2001; Pennisi 2001; Deppenmeier et al. 2002; Koonin et al. 2002; Koonin 2003). Numerous studies in microbial ecology reveal a remarkable diversity of microbial communities (Kassen and Rainey 2004). Microbial communities show both dynamic behavior, which is linked to niche specialization (Kerr et al. 2002), and considerable temporal stability (Fernandez et al. 1999, 2000; Hashsham et al. 2000). Generally, these communities provide fertile ground for HGT, making the models with nonzero invasion relevant.

    The modeling results presented here strongly suggest that the main precept of "horizontal genomics," the crucial role of HGT in prokaryotic evolution, does not depend on the unrealistic assumption that all horizontally transferred genes that are fixed in microbial populations confer a strong selective advantage onto the recipient. We believe that this theoretical support for a major evolutionary impact of HGT is particularly important given how hard it is to obtain a rigorous proof for most HGT events. To strengthen the argument even further, it will be necessary to develop quantitative estimates for gene fluxes within and between prokaryotic populations which figure as "infection" and "invasion" in our model.

    Appendix

    Proof of Theorem 1

    First note that, if n/N x when N , then

    (a)

    To handle the product in (5), we note that

    where we used the fact that for small x and y. Taking the logarithm of (5), we obtain

    Here we used approximation for small x.

    Taking k = N/2 as a reference point and noting that as we obtain

    Using (a) we obtain the desired result. This completes the proof.

    Algorithm for Calculating a Quasistationary Distribution

    Here we follow N?sell (2001). We examine a birth-death process {X(t), t 0} with the finite state space {0, 1,..., N} where the origin is an absorbing state. The birth rate is denoted n and the death rate is denoted μn. The Kolmogorov forward equations for the state probabilities pn(t) = Pr{X(t) = n} can be written as

    To interpret this equation, we formally put The state probabilities depend on the initial distribution pn(0).

    Let us introduce two sequences n and n as follows:

    The state space of the birth and death process under consideration can be partitioned into the union of two subsets, one containing the absorbing state 0, and the other equal to the set of transient states Before absorption, the process assumes values in the set of transient states. If the process is conditioned on the event that absorption has not taken place at time t, then the conditional state probabilities qn(t) can be determined from the state probabilities pn(t) through the following relation:

    By differentiating this relation and using the Kolmogorov forward equations for pn(t), we can obtain a system of differential equation for qn(t). The quasistationary distribution q* is the stationary solution of this system of equations. The probabilities can be shown to satisfy the following system of difference equations:

    It can be shown that

    This is not an explicit solution because q1 can only be determined when all qn are known. However, this relation can be used in an iterative algorithm in order to obtain a quasistationary distribution. This algorithm starts with an arbitrary initial quasistationary distribution, employs this distribution as an input in the numerators of the terms that are summed over k, and solves the equation under the requirement that The iteration can be formally described as follows:

    where the superscript i denotes the iteration number. The process is repeated until the results of successive iterations are sufficiently close.

    Acknowledgements

    We thank Yuri Wolf for numerous helpful discussions and useful suggestions and Alex Kondrashov for critical reading of the manuscript.

    References

    Aravind, L., R. L. Tatusov, Y. I. Wolf, D. R. Walker, and E. V. Koonin. 1998. Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet. 14:442–444.

    Bennett, P. M. 2004. Genome plasticity: insertion sequence elements, transposons and integrons, and DNA rearrangement. Methods Mol. Biol. 266:71–113.

    Berg, O. G., and C. G. Kurland. 2002. Evolution of microbial genomes: sequence acquisition and loss. Mol. Biol. Evol. 19:2265–2276.

    Brown, J. R., J. Zhang, and J. E. Hodgson. 1998. A bacterial antibiotic resistance gene with eukaryotic origins. Curr. Biol. 8:R365–R367.

    Burrus, V., and M. K. Waldor. 2004. Shaping bacterial genomes with integrative and conjugative elements. Res. Microbiol. 155:376–386.

    Bushman, F. 2001. Lateral DNA transfer: mechanisms and consequences. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

    Chen, I., and D. Dubnau. 2004. DNA uptake during bacterial transformation. Nat. Rev. Microbiol. 2:241–249.

    Claverys, J. P., and B. Martin. 2003. Bacterial "competence" genes: signatures of active transformation, or only remnants? Trends Microbiol. 11:161–165.

    Davison, J. 1999. Genetic exchange between bacteria in the environment. Plasmid 42:73–91.

    Deppenmeier, U., A. Johann, T. Hartsch et al. (22 co-authors). 2002. The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea. J. Mol. Microbiol. Biotechnol. 4:453–461.

    Doolittle, W. F. 1999a. Phylogenetic classification and the universal tree. Science 284:2124–2129.

    ———. 1999b. Lateral genomics. Trends Cell Biol. 9:M5–M8.

    ———. 2000. Uprooting the tree of life. Sci. Am. 282:90–95.

    Dubnau, D. 1999. DNA uptake in bacteria. Annu. Rev. Microbiol. 53:217–244.

    Felsenstein, J. 1974. The evolutionary advantage of recombination. Genetics 78:737–756.

    Fernandez, A., S. Huang, S. Seston, J. Xing, R. Hickey, C. Criddle, and J. Tiedje. 1999. How stable is stable? Function versus community composition. Appl. Environ. Microbiol. 65:3697–3704.

    Fernandez, A. S., S. A. Hashsham, S. L. Dollhopf, L. Raskin, O. Glagoleva, F. B. Dazzo, R. F. Hickey, C. S. Criddle, and J. M. Tiedje. 2000. Flexible community structure correlates with stable community function in methanogenic bioreactor communities perturbed by glucose. Appl. Environ. Microbiol. 66:4058–4067.

    Goel, N. S., and N. Richter-Dyn. 1974. Stochastic models in biology. Academic Press, New York.

    Gogarten, J. P., W. F. Doolittle, and J. G. Lawrence. 2002. Prokaryotic evolution in light of gene transfer. Mol. Biol. Evol. 19:2226–2238.

    Grohmann, E., G. Muth, and M. Espinosa. 2003. Conjugative plasmid transfer in gram-positive bacteria. Microbiol. Mol. Biol. Rev. 67:277–301 .

    Hashsham, S. A., A. S. Fernandez, S. L. Dollhopf, F. B. Dazzo, R. F. Hickey, J. M. Tiedje, and C. S. Criddle. 2000. Parallel processing of substrate correlates with greater functional stability in methanogenic bioreactor communities perturbed by glucose. Appl. Environ. Microbiol. 66:4050–4057.

    Jain, R., M. C. Rivera, and J. A. Lake. 1999. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96:3801–3806.

    Kassen, R., and P. B. Rainey. 2004. The ecology and genetics of microbial diversity. Annu. Rev. Microbiol. 58:207–231.

    Kendall, D. G. 1949. Stochastic processes and population growth. J. Roy. Statist. Soc. Ser. B 1:230–264.

    Kerr, B., M. A. Riley, M. W. Feldman, and B. J. Bohannan. 2002. Local dispersal promotes biodiversity in a real-life game of rock-paper-scissors. Nature 418:171–174.

    Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

    Koonin, E. V. 2003. Horizontal gene transfer: the path to maturity. Mol. Microbiol. 50:725–727.

    Koonin, E. V., L. Aravind, and A. S. Kondrashov. 2000. The impact of comparative genomics on our understanding of evolution. Cell 101:573–576.

    Koonin, E. V., K. S. Makarova, and L. Aravind. 2001. Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55:709–742.

    Koonin, E. V., K. S. Makarova, Y. I. Wolf, and L. Aravind. 2002. Horizontal gene transfer and its role in the evolution of prokaryotes. Pp. 277–304 in M. Syvanen and C. I. Kado, eds. Horizontal gene transfer. Academic Press, San Diego.

    Koonin, E. V., A. R. Mushegian, M. Y. Galperin, and D. R. Walker. 1997. Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol. Microbiol. 25:619–637.

    Koski, L. B., and G. B. Golding. 2001. The closest BLAST hit is often not the nearest neighbor. J. Mol. Evol. 52:540–542.

    Kurland, C. G. 2000. Something for everyone. Horizontal gene transfer in evolution. EMBO Rep. 1:92–95.

    Kurland, C. G., B. Canback, and O. G. Berg. 2003. Horizontal gene transfer: a critical view. Proc. Natl. Acad. Sci. USA 100:9658–9662.

    Lawrence, J. G., and H. Hendrickson. 2003. Lateral gene transfer: when will adolescence end? Mol. Microbiol. 50:739–749.

    Lorenz, M. G., and W. Wackernagel. 1994. Bacterial gene transfer by natural genetic transformation in the environment. Microbiol. Rev. 58:563–602.

    Matis, J. H., and T. R. Kiffe. 1999. Effects of immigration on some stochastic logistic models: a cumulant truncation analysis. Theor. Popul. Biol. 56:139–161.

    Mazel, D., and J. Davies. 1999. Antibiotic resistance in microbes. Cell. Mol. Life Sci. 56:742–754.

    Moran, P. A. 1958. Random processes in genetics. Proc. Philos. Soc. Math. Phys. Sci. 54:60–71.

    Muller, H. J. 1964. The relation of recombination to mutational advance. Mutat. Res. 106:2–9.

    Nagylaki, T. 1992. Introduction to theoretical population genetics. Springer-Verlag, New York.

    N?sell, I. 2001. Extinction and quasi-stationarity in the Verhulst logistic model. J. Theor. Biol. 211:11–27.

    Nelson, K. E., R. A. Clayton, S. R. Gill et al. (29 co-authors). 1999. Evidence for lateral gene transfer between archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399:323–329.

    Nesbo, C. L., S. L'Haridon, K. O. Stetter, and W. F. Doolittle. 2001. Phylogenetic analyses of two "archaeal" genes in Thermotoga maritima reveal multiple transfers between archaea and bacteria. Mol. Biol. Evol. 18:362–375.

    Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299–304.

    Osborn, A. M., and D. Boltner. 2002. When phage, plasmids, and transposons collide: genomic islands, and conjugative- and mobilizable-transposons as a mosaic continuum. Plasmid 48:202–212.

    Paul, J. H. 1999. Microbial gene transfer: an ecological perspective. J. Mol. Microbiol. Biotechnol. 1:45–50.

    Pennisi, E. 1998. Genome data shake tree of life. Science 280:672–674 .

    ———. 1999. Is it time to uproot the tree of life? Science 284:1305–1307 .

    ———. 2001. Microbial genomes. Sequences reveal borrowed genes. Science 294:1634–1635.

    Pielou, E. C. 1969. An introduction to mathematical ecology. Wiley-Interscience, New York.

    Pollet, P. K. 1996. Modeling the long-time behavior of evanescent ecological systems. Ecol. Model. 86:135–139.

    Ragan, M. A. 2001. On surrogate methods for detecting lateral gene transfer. FEMS Microbiol. Lett. 201:187–191.

    Redfield, R. J. 2001. Do bacteria have sex? Nat. Rev. Genet. 2:634–639.

    Rowe-Magnus, A. D., J. Davies, and D. Mazel. 2002. Impact of integrons and transposons on the evolution of resistance and virulence. Curr. Top. Microbiol. Immunol. 264:167–188.

    Tsirigos, A., and I. Rigoutsos. 2005. A new computational method for the detection of horizontal gene transfer events. Nucleic Acids Res. 33:922–933.

    Worning, P., L. J. Jensen, K. E. Nelson, S. Brunak, and D. W. Ussery. 2000. Structural analysis of DNA sequence: evidence for lateral gene transfer in thermotoga maritima. Nucleic Acids Res. 28:706–709.

    Wright, S. 1949. Adaptation and selection. Genetics, paleontology and evolution. Princeton University Press, Princeton, N.J.(Artem S. Novozhilov, Geor)