当前位置: 首页 > 医学版 > 期刊论文 > 基础医学 > 分子生物学进展 > 2005年 > 第9期 > 正文
编号:11258362
Population Genetics of Neisseria gonorrhoeae in a High-Prevalence Community Using a Hypervariable Outer Membrane porB and 13 Slowly Evolving
     * Department of Integrative Biology, Brigham Young University; Department of Pediatrics, Johns Hopkins University School of Medicine; Division of Infectious Diseases, Johns Hopkins University School of Medicine; and Department of Microbiology and Molecular Biology, Brigham Young University

    E-mail: keith_crandall@byu.edu.

    Abstract

    Baltimore, Md., is an urban community with a high prevalence of Neisseria gonorrhoeae. Due to partially protective immune responses, introduction of new strains from other host populations, and exposure of N. gonorrhoeae to antibiotics, the phenotypic and genotypic characteristics of the circulating strains can fluctuate over time. Understanding the overall genetic diversity and population structure of N. gonorrhoeae is essential for informing public health interventions to eliminate this pathogen. We studied gonococci population genetics in Baltimore by analyzing a hypervariable and strongly selected outer membrane porB gene and 13 slowly evolving and presumably neutral housekeeping genes (abcZ, adk, aroE, fumC, gdh, glnA, gnd, pdhC, pgm, pilA, ppk, pyrD, and serC) in 204 isolates collected in 1991, 1996, and 2001 from male and female patients of two public sexually transmitted diseases clinics. Genetic diversity (), recombination (C), growth (g), population structure, and adaptive selection under codon-substitution and amino acid property models were estimated and compared between these two gene classes. Estimates of the FST fixation index and the 2 test of sequence absolute frequencies revealed significant temporal substructuring for both gene types. Baltimore's N. gonorrhoeae populations have increased since 1991 as indicated by consistent positive values of g. Female patients showed similar or lower levels of and C than male patients. Within the MLST housekeeping genes, levels of and C ranged from 0.001–0.013 and 0.000–0.018, respectively. Overall recombination seems to be the dominant force driving evolution in these populations. All loci showed amino acid sites and physicochemical properties under adaptive (or positive-destabilizing) selection, rejecting the generally assumed hypothesis of stabilizing selection for these MLST genes. Within the porB gene, protein I B showed higher and C values than protein I A. Directional positive selection possibly mediated by the immune system operates to a significant extent in the protein I sequences, as indicated by the distribution of the positively selected sites in the surface-exposed loops. Thirteen amino acid physicochemical properties seem to drive protein evolution of the PI porins in N. gonorrhoeae.

    Key Words: Baltimore ? housekeeping genes ? MLST ? Neisseria gonorrhoeae ? population genetics ? porB

    Introduction

    Gonorrhea is a common bacterial infection that is transmitted primarily by sexual contact or perinatally (Handsfield and Sparling 1995). It affects the mucous membranes of the lower genital tract and less frequently those of the rectum, oropharynx, and conjunctiva. Ascending infection in women leads to the major complication of gonorrhea, pelvic inflammatory disease, a common cause of female infertility. Less common complications include bacteremic infections, neonatal conjunctivitis, and epididymitis. The causative organism of gonorrhea, Neisseria gonorrhoeae, is a fastidious gram-negative diplococcus, which closely resembles the related human pathogen, Neisseria meningitidis, as well as several commensal species. Humans are the only host of the gonococcus, and spread of the organism occurs only through direct person-to-person contact. Gonorrhea remains a major sexually transmitted infection worldwide (Gerbase et al. 1998). The Centers for Disease Control and Prevention (CDC) estimates that more than 700,000 persons in the United States get new gonorrheal infections each year with only half such cases being reported. In 2002, 351,852 cases of gonorrhea were reported to the CDC (Centers for Disease Control, National Center for HIV, STD and TB Prevention, Division of Sexually Transmitted Diseases: www.cdc.gov/std/Gonorrhea/STDFact-gonorrhea.htm). In the period from 1975 to 1997, the national gonorrhea infection rate declined, following the implementation of the national gonorrhea control program in the mid-1970s. After a small increase in 1998, the gonorrhea infection rate has decreased slightly since 1999. In 2002, the rate of reported gonorrheal infections was 125 per 100,000 persons (CDC 2002).

    Maynard Smith (1995) pointed out the need for population genetic insights when contemplating the evolutionary fate of infectious diseases. Population genetics is important in understanding the evolutionary history, epidemiology, and population dynamics of pathogens; the potential for and mode of the evolution of antibiotic resistance; and ultimately for public health control strategies. There are at least two levels at which population genetic studies on N. gonorrhoeae have been undertaken. The first is at a global level and considers the dynamics of the gonococci in the whole population of infected individuals (e.g., Posada et al. 2000). The second level, at a smaller scale, focuses on N. gonorrhoeae populations within an infected local community (e.g., Martin et al. 2004). The latter level is the focus of this study. We studied the evolution of N. gonorrhoeae in the well-characterized, high-prevalence urban community of Baltimore, Md. (Becker et al. 1998). Baltimore has the highest incidence rate of gonorrhea in the United States at nine times the national average (Jet 1998). Between January 1, 2001, and December 31, 2002, 9,910 gonorrhea episodes were reported to the STD Surveillance Unit of the Baltimore City Health Department (Bernstein et al. 2004). Of these, 1,512 episodes represented repeated infections. The number of unique individuals infected with gonorrhea during this 2-year period was 9,097, of which 699 were gonorrhea repeaters with two to five gonorrhea episodes.

    The phenotypic and genotypic characteristics of circulating N. gonorrhoeae strains can fluctuate over time from a local as well as a global perspective (Unemo et al. 2002, 2003). Thus, one could expect that epidemiological and demographic factors, such as partially protective immune responses, exposure to antibiotics, and introduction of new strains from other host populations, could cause the overall genetic diversity and population structure of N. gonorrhoeae in Baltimore to change over time. In a small study of repeat gonococcal infection, Hobbs et al. (1999) observed that all the patients reinfected with gonococci expressing identical porins were men, whereas all of the women were reinfected with gonococci expressing different porins. The long period of infection in females may result in greater stimulation of the immune system, allowing development of antigonococcal immunity. If this is the case, then women infected with gonococci may have populations of N. gonorrhoeae with higher levels of genetic diversity than male patients. To address both issues, we analyzed Baltimore gonorrhea populations within a time window spanning 10 years (1991–2001) and using infected male and female patients. The following specific questions were pursued: (1) are genetic diversity and levels of selection in N. gonorrhoeae changing over time? (2) is the population structured temporally, spatially, or based on the gender of the host? (3) is the population increasing, decreasing, or remaining constant? (4) is genetic diversity of N. gonorrhoeae higher in women than men?

    To answer these questions we used two different classes of molecular markers: porB as a marker, presumably under positive selection as it interacts with the host immune system, and housekeeping genes, which are presumably neutrally evolving genetic markers. porB encodes for protein I (PI), which is the major outer membrane protein of Neisseria. It functions as an anion-selective porin allowing the passage of small molecules through the outer membrane. The general structure of these porins consists of nine internal conserved regions separated by eight surface-exposed regions that are highly variable in both amino acid sequence and length (Carbonetti and Sparling 1987; Carbonetti et al. 1988; Ley et al. 1991; Feavers et al. 1992; Mee et al. 1993; Derrick et al. 1999). PI is constitutively expressed at high levels in all gonococci, is surfaced exposed, and elicits a strong immune reaction during infection (Ison 1988). This indicates that PI may play a crucial role in gonococcal interaction with host cells and in general with the immune system (Butt, Lambden, and Heckels 1990; Ley and Poolman 1992; Smith, Maynard Smith, and Spratt 1995; Urwin et al. 2002; Massari et al. 2003), affecting the transmission probability and the length of the infectious state and consequently influencing the growth rate. It is not surprising then that PI is considered as a potential vaccine target (Heckels, Virji, and Tinsley 1990; Elkins et al. 1992; Ley and Poolman 1992; Ley, Biezen, and Poolman 1995; Meitzner and Cohen 1997; Olesky, Hobbs, and Nicholas 2002). Therefore, it is important to have an understanding of the diversity of these antigens in gonococcal populations, the mechanisms by which they change, and the rate at which they change. In addition, porB is a single-copy gene and is thought to be stable within a given strain (Zak et al. 1984). Several studies in N. gonorrhoeae indicate that this gene is highly polymorphic and under strong diversifying selection possibly driven by the host immune system (Smith, Maynard Smith, and Spratt 1995; Fudyk et al. 1999; Posada et al. 2000; Unemo et al. 2003; Viscidi and Demma 2003).

    Although high levels of discrimination can be achieved by examining an individual locus that is highly variable within a bacterial population such as porB (Unemo et al. 2003), often better results are achieved by combining them with more conserved loci such as those encoding for core metabolic (housekeeping) proteins (e.g., Feavers et al. 1999; Robinson and Enright 2003). Recently, Viscidi and Demma (2003) have developed a multilocus sequence typing (MLST) scheme for N. gonorrhoeae based on 18 housekeeping genes. MLST housekeeping genes evolve very slowly and the accumulated mutations are largely assumed to be selectively neutral or deleterious (e.g., Li 1997; Feil, Enright, and Spratt 2000; Dingle et al. 2001; Feil et al. 2003; Meats et al. 2003; Urwin and Maiden 2003), although see Pérez-Losada et al. (2005) for a different opinion. Although only a small number of alleles can be identified within the population by using the variation at a single locus, high levels of resolution can be achieved by analyzing many loci. Here, we used 13 MLST housekeeping loci to complement and enhance porB sequencing to characterize genetic variation of N. gonorrhoeae populations. This allowed us to contrast the population dynamics of neutral genetic markers with that of a strongly selected gene such as the porB locus. Ideally, "data from both classes of genes should be used in concert to provide high-resolution localized epidemiology within the context of a comprehensive population framework" (Cooper and Feil 2004).

    The characterization of the molecular evolution of porB and MLST genes in a single high-prevalence community over a decade can offer insights into the evolutionary processes acting in N. gonorrhoeae and into their interconnections with epidemiology. To this end, we estimated several distinct population parameters such as genetic diversity, recombination, growth, population structure, and selection pressure for each gene class independently within a multivariable framework. We then related these parameters to the population dynamics of N. gonorrhoeae to better understand the role of each.

    Materials and Methods

    Isolate Collection, Partitions, and Genes

    Two hundred and four isolates of N. gonorrhoeae were obtained from male and female clients attending the Druid and Eastern Baltimore City Sexually Transmitted Diseases Clinics in 1991, 1996, and 2001 (see table 1). The strains were obtained from the clientele of both clinics, which are located on opposite sides of the city but serve similar populations. In preliminary analyses no significant genetic differences were observed between strains from the two different clinics, therefore, the strains were combined for subsequent analyses. The 204 isolates were analyzed together and partitioned by year and gender of the patient (table 1). All the analyses were done separately for each gene type (i.e., porB and housekeeping genes). We sequenced partial regions of 13 of the core housekeeping genes in Viscidi and Demma (2003), including abcZ, adk, aroE, fumC, gdh, glnA, gnd, pdhC, pgm, pilA, ppk, pyrD, and serC, and the hypervariable porB locus (table 2). We chose MLST genes based on our previous empirical survey of polymorphisms in N. gonorrhoeae housekeeping genes with the intent to include markers distributed throughout the N. gonorrhoeae genome. We also included the seven gene fragments used for N. meningitides MLST typing (abcZ, adk, aroE, fumC, gdh, pdhC, and pgm) to allow comparison with other data sets. Allelic profiles for all the loci and epidemiological information (collection date, clinic, and gender of the subject) for each isolate can be provided upon request. Alleles of the porB locus in N. gonorrhoeae have been assigned to two homology groups based on close sequence and immunological relationships and are designated as either PIA or PIB (Carbonetti et al. 1988). Alleles within each group are more similar to one another than they are to members of the other group (i.e., PIA and PIB form distinct monophyletic groups). Individual N. gonorrhoeae strains express either a PIA or a PIB allele (Smith, Maynard Smith, and Spratt 1995; Derrick et al. 1999), but PIB alleles always show a higher frequency than PIA alleles for a given population (Feavers and Maiden 1998). Our PIA and PIB data sets consisted of 36 and 168 isolates, respectively, and both were analyzed separately. Because of the small sample size of the PIA data set, the analysis of independent partitions was not performed. DNA extraction, polymerase chain reaction–based amplification, and sequencing conditions are provided in detail in Viscidi and Demma (2003) for the housekeeping genes and in Posada et al. (2000) for porB.

    Table 1 Estimates of Na, Ps, , , r, C, and g for 13 MLST and porB (PIB and PIA) Genes

    Table 2 Estimates of Na, Ps, , , r, C, nM3, nTS, and AA Properties for 13 MLST and porB (PIB and PIA) Genes

    Genetic Analysis

    Sequences were translated into amino acids using the universal reading frame in MacClade 4.05 (D. R. Maddison and W. P. Maddison 2000) and then aligned in ClustalX (Thompson et al. 1997) using the default options. Alignment was trivial for the 13 housekeeping genes and the PIA sequences because they had no indels. Gaps were, however, introduced by Clustal within the PIB sequences in order to keep their positional homology. PIB alignment was then slightly refined by eye. Housekeeping genes were analyzed independently and combined (unlinked gene regions). Additionally, for some phylogenetic-based analyses and to test for genetic differentiation of populations, the 13 housekeeping loci were concatenated (linked gene regions) into a single data set to increase phylogenetic and statistical signal.

    Models of nucleotide and codon substitution were assessed using the maximum likelihood approach described by Huelsenbeck and Crandall (1997) and implemented in Modeltest (Posada and Crandall 1998). Likelihood scores for each model were estimated in PAUP* 4.0b10 (Swofford 2003) and then compared through a series of hierarchical likelihood ratio (LRT) tests to determine the best-fit model. When two models are nested, twice the log-likelihood difference was compared with a 2 distribution where the degrees of freedom equal the difference in the number of free parameters between the two models. Recent simulation studies have shown that this approach performs well at recovering the true underlying model of evolution (Posada 2001; Posada and Crandall 2001), although see Posada and Buckley (2004) for a recent evaluation.

    As a visual test for population structure, evolutionary relationships among the housekeeping (combined and/or concatenated), PIA, and PIB alleles were initially explored under neighbor-joining (NJ; Saitou and Nei 1987), maximum likelihood (Felsenstein 1981), and Bayesian coupled with Markov chain Monte Carlo (Ronquist and Huelsenbeck 2003) procedures using single and mixed models. All these phylogenetic approaches assume that no recombination has occurred. Similar results were found under all methods, hence isolate phylogenies were estimated using the NJ method as implemented in PAUP*4.0b10 (Swofford 2003) under the best-fit model of nucleotide substitution for PIA, PIB, and the concatenated housekeeping genes. Confidence in the tree relationships was assessed using 1,000 replicates of the bootstrap procedure (Felsenstein 1985).

    Nucleotide sequence variation (Na = number of alleles; Ps = number of polymorphic sites; = nucleotide diversity per site) was estimated for each gene region separately. Genetic diversity (), recombination (r), and growth (or decline) rates (g) were estimated simultaneously for each region separately under the best-fit model of nucleotide substitution using the maximum likelihood coalescent approach implemented in LAMARC ver1.2.2 (Kuhner et al. 2004). Overall estimates were also calculated for the 13 MLST housekeeping genes combined. Genetic diversity can be mathematically defined as = 2Neμ, where Ne is the effective population size and μ is the neutral mutation rate per site per generation. Starting values of were estimated using the algorithm of Watterson (1975). The recombination (r) parameter is defined as the recombination rate per site per generation (c) divided by μ. Thus, r can be interpreted as the ratio of the per-site chance of recombination to the per-site chance of mutation (c/μ). An estimate of the diversity generated per recombination (C = 2Nec) can be obtained by multiplying r and . Growth rate (g) shows the relation between , which is now the estimate of modern-day population size, and population size in the past through the equation t = nowe–gt, where t is a time in the past. Positive values of g indicate the population has been growing, negative values indicate that it has been shrinking, and a zero value indicates that it has remained constant. Twenty initial chains 1,000 steps long and two final chains 10,000 steps long were performed during the parameter search. All chains were sampled each 20th generation, and 5,000 and 1,000 genealogies were discarded as burn-in for the initial and final chains, respectively. Search performance was diagnosed by monitoring the estimated parameters, log likelihoods, acceptance rates, and genealogies discarded. Because the variances for the estimates of , r, and g are ill defined, a statistical test to compare the resulting values is not straightforward. Nevertheless, information about their possible error can be deduced by using the "percentile profiling" option in LAMARC. Percentile profiling gives information about the shape of the likelihood curve, including both how accurate and how tightly correlated estimates for the different parameters are (see LAMARC's manual for details). It generates approximate confidence intervals around the estimate of each parameter, but those are only asymptotically correct, thus we cannot interpret these values as absolute confidence intervals (Kuhner et al. 2004). Percentile profiles are computationally very expensive to calculate, hence they only could be estimated for the porB data sets.

    As in a previous N. gonorrhoeae population study by our group (Posada et al. 2000), we also used a categorical and a quantitative approach to test for genetic differentiation of populations in the 13 concatenated housekeeping genes and the PI homology groups of the porB gene. The categorical analysis consisted of a 2 test of sequence absolute frequencies at each isolate partition (Hudson, Boos, and Kaplan 1992). Although this test was originally proposed to detect geographical structure, Achaz et al. (2004) demonstrated the high robustness of this procedure for testing temporal population structure. Simulations by these authors showed that the test of Hudson, Boos, and Kaplan (1992) is robust to variation in samples sizes, asymmetry of sample sizes, and mutation rates, and to the presence or absence of recombination and that the power to detect temporal structure is high. Due to bias with low expected values, the P value was obtained by simulating the null distribution of no subdivision (10,000 permutations) using the algorithm of Roff and Bentzen (1989) implemented in the program CHIPERM (David Posada, available upon request). The quantitative analyses consisted of a molecular analysis of variance (Excoffier, Smouse, and Quattro 1992) performed using ARLEQUIN ver2.000 (Schneider et al. 1997). The FST fixation index was also estimated using ARLEQUIN.

    The effect of natural selection at the molecular level is usually studied by comparing the fixation rates of nonsynonymous (amino acid–altering) and synonymous (silent) substitutions within a maximum likelihood phylogenetic framework (Yang et al. 2000). A measure that has featured prominently in such studies is the nonsynonymous/synonymous substitution rate ratio ( = dN/dS) or acceptance rate (Miyata and Yasunaga 1980). measures the selective pressure at the protein level, with = 1 suggesting neutral evolution, < 1 purifying selection, and > 1 suggesting diversifying positive selection. We inferred the extent of selection in the concatenated housekeeping and porB genes by estimating per gene and per site and the proportion of sites (p) with > 1 using the codon-based nested models M0 (one ratio), M1 (neutral), M2 (selection), M3 (discrete), M7 (beta), and M8 (beta & ) of Goldman and Yang (1994) and Yang et al. (2000). Model likelihood scores were compared using a LRT as described above. M2 (three parameters) and M3 (five parameters) are more general than model M0 or M1 (one parameter each) and can be compared with M0 and M1. Similarly, M7 (two parameters) is a special case of model M8 (four parameters) and can be compared the same way. When is greater than 1 in M2, M3, or M8, positively selected sites are inferred from the data. We also applied the empirical Bayesian approach implemented by Nielsen and Yang (1998) to identify the potential sites under diversifying selection as indicated by a posterior probability (pP) > 0.95. These sites were then mapped onto the different porin domains (external loops and internal regions). All the previous analyses were carried out in PAML 3.14b3 (Yang 1997) and were performed under initial values > and < 1, as recommended by the author. Here, we report the estimates obtained under the best likelihood scores.

    McClellan et al. (2004) have recently shown using conservative cytochrome b sequences that dN/dS ratios are less sensitive to detecting single adaptive amino acid changes than methods that evaluate positive selection in terms of the amino acid properties that compromise proteins. Hence, we also estimated adaptive selection in terms of 31 quantitative biochemical properties using the model of McClellan and McCracken (2001) as implemented in TreeSAAP ver3.2 (Woolley et al. 2003). This allowed us to compare both approaches under an ample range of conserved and strongly selected genes. Based on a phylogenetic tree, the model of McClellan and McCracken establishes first a chronology of observable molecular evolutionary events. The frequency of these events is then analyzed in order to identify (1) amino acid properties that may have radically changed more often than expected by chance (presumably due to selection promoting the occurrence of radical amino acid replacements) and (2) amino acid sites associated with selection, thus establishing a correlation between the sites of positive selection and the structure and function of the protein. We followed the general procedure outlined in McClellan et al. (2004) in combination with a new analytical algorithm implemented in TreeSAAP ver3.2 and briefly described below. In this study we are particularly interested in detecting molecular adaptation, selection that results in radical structural, or functional shifts in local regions of the protein. To this end, the range of possible changes in an amino acid property was divided into eight magnitude categories, with numbers 6, 7, and 8 denoting radical changes. An amino acid property is said to be affected by adaptive selection (referred to as positive-destabilizing selection) when the frequency of changes in magnitude categories 6 to 8 significantly exceed the frequency(s) expected by chance, as indicated by z-scores > 2.326 (P < 0.01). Gene regions implicated as being influenced by positive-destabilizing selection were subsequently identified for each selected property, using a sliding window procedure. A window size of 20 codons (approximately the size of the PI domains) was chosen, and its statistical significance was determined using a Bonferroni correction ( = 0.001; 99.9% confidence). Finally, particular amino acid residue sites potentially affected by positive-destabilizing selection (z-scores > 1.645; P < 0.05) were mapped onto gene regions (P < 0.001).

    Results

    A total of 2,856 DNA sequences were generated. Unique sequences were deposited in GenBank under the accession numbers AY860982–AY861137 and novel abcZ, adk, aroE, fumC, gdh, pdhC, and pgm sequences were reported to the MLST database (http://www.mlst.net). The best-fit models of DNA substitution were TrN + I + (Tamura and Nei 1993) for the 13 concatenated housekeeping genes (6,636 sites): base frequencies () A = 0.242, C = 0.293, G = 0.267, and T = 0.198, substitution rates (r) rCT = 5.386, rCG = 6.28, rAT = 1.0, rAG = 1.0, and rAC = 11.36, invariable sites (I) = 0.98, shape parameter of the gamma distribution () = 0.569); K80 + (Kimura 1980) for PIA (477 sites): transition/transversion ratio (ti/tv) = 1.98, = 0.005; and K80 + I + for PIB (561 sites): ti/tv = 2.06, I = 0.83, = 1.809. Using these three models we built the NJ trees shown in figures 1 and 2. All trees showed single clades of isolates from the same time period or gender, but overall, no well-defined phylogenetic structure based on year of collection or gender was detected. Similar results were obtained using the network phylogenetic approach proposed by Templeton, Crandall, and Sing (1992) that accounts for recombination (data not shown). We also constructed NJ trees using isolates collected in 1991, 1996, and 2001 from male hosts and isolates collected in 1996 from male and female hosts, but no clear temporal- or gender-based phylogenetic structuring was observed (data not shown). Finally, for the MLST genes, we performed a new NJ analysis using the 10 genes showing the lowest recombination rates (C < 0.01; see table 2), but again no clear structuring was noticeable.

    FIG. 1.— Neighbor-joining tree of 13 concatenated MLST housekeeping genes using a TrN + I + model and midpoint rooting. Bootstrap proportions (if >50%) are shown over the branches and are based on 1,000 bootstrap replicates. Branch lengths are shown proportional to the amount of change along the branches.

    FIG. 2.— Neighbor-joining trees of the PIB (A) and PIA (B) sequences using the K80 + I + and K80 + models, respectively, and midpoint rooting. Bootstrap proportions (if >50%) are shown over the branches and are based on 1,000 bootstrap replicates. Branch lengths are shown proportional to the amount of change along the branches.

    Polymorphism (Na and Ps), nucleotide diversity (), genetic diversity (), recombination (r and C), and growth (g) estimates are shown for the housekeeping genes combined and the PIA and PIB homology groups of the porB gene in table 1 and individually for each MLST housekeeping gene in table 2. In table 1, sequences are also partitioned by date of collection and gender of subject, with the exception of the PIA data set due to its small sample size. Overall, the housekeeping sequences showed lower genetic diversity and recombination rates than the PI sequences. Some MLST genes presented very low levels of polymorphism (table 2), which probably gives individually fragile parameter estimates. However, overall these genes add useful information to the final estimates, making them less biased than estimates just based on the most polymorphic genes (upward bias). Overall, recombination seems to be the dominant force driving MLST evolution, but results varied among genes. and C estimates between year and gender partitions were similar, and g showed extremely high positive values (i.e., population is increasing) overall and for each time point.

    Within the PI sequences, PIB presented greater polymorphism and genetic diversity due to mutation () and recombination (C) than PIA, although PIA also included fewer sequences. Nevertheless, if data sets of similar sizes are compared (e.g., 1991M-PIB or 1996F-PIB vs. PIA), PIA still showed significantly lower values for and C, as indicated by the LAMARC's approximate 5% and 95% confidence intervals. C values were similar between PIB year partitions (0.013–0.014); but 1996M showed greater levels of (0.072) than 1991M (0.039) and 2001M (0.053), and so lower r (ratio of the per-site chance of recombination to the per-site chance of mutation, c/μ). LAMARC's approximate 5% and 95% confidence intervals around the estimates of for 1996M and 1991M did not overlap, hence suggesting that the previous estimates are significantly different. To correct for the greater size of the 1996M sample, we split the 62 isolates by clinic (31 each) and reestimated . Isolates from Druid and Eastern clinics showed values of 0.088 and 0.070, respectively, and are still higher than those observed for 1991M and 2001M. Gonorrhea sequences from female patient isolates (1996F) showed clearly lower values of C and (though not significant based on LAMARC's approximate confidence intervals) than sequences from male patient isolates, either considered as a whole or split between clinics. Hence, both the porB and housekeeping genes do not support the prediction that women infected with gonococci may have populations of N. gonorrhoeae with higher levels of genetic diversity than male patients. Growth levels (g) were positive for all PIB samples overall and for each time point. The 1996M sample presented higher g values than 1991M and 2001M. The g values were negative (i.e., population is decreasing) for the PIA sequences. This result may reflect a decrease in the expression of PIA alleles in the Baltimore gonococci population, as also indicated by the low number of isolates presenting this homology PI group (36 isolates) compared to those presenting the PIB group (168 isolates). Nevertheless, g estimation is very sensitive to small sample size, and hence this result could be spurious. For the same reason, confidence intervals are especially wide for the g parameter because there is always less information available in the data for estimating growth than for estimating any other parameter.

    The categorical analysis (Hudson, Boos, and Kaplan 1992) for detecting genetic subdivision was significant (P < 0.05) for all the year comparisons and gene data sets (table 3). Similar results were observed for the FST fixation index for the MLST and PIB sequences, although some of the pairwise comparisons for PIB were not significant. This indicates that the distribution of haplotypes among partitions is not random. All the gender comparisons for both tests were nonsignificant, which suggests no genetic differentiation between male and female populations. The analysis of molecular variance showed that most of the variation (90%–100%) for the MLST and PI genes is located within partitions.

    Table 3 FST and 2 Estimates for 13 MLST and porB (PIB and PIA) Genes

    Likelihood values and parameter estimates for different codon-substitution models of selection are shown on table 4. The MLST genes are moderately conserved, with an average ratio between 0.25 and 0.29 among all but the worst-fitting models. This indicates that a nonsynonymous mutation has 25%–29% as much chance as a synonymous mutation of being fixed in the population. Hence, on average, purifying selection dominates the evolution of these genes. The one-ratio model (M0) did not reveal the existence of sites under diversifying selection ( < 0.25), but M0 was clearly rejected when compared with any of the more-general models that allow for variable ratios among sites. Model M1 (neutral) also fits the data better than model M0. All models that allow for positively selected sites do suggest existence of such sites in the MLST genes. In fact, all of them indicate the presence of a small proportion of sites (0.45%–1.55%) under strong ( = 5.3) or very strong ( > 27) positive selection. This is also reflected by the parameter estimates from the beta model (p = 0.005; q = 0.05), which suggests that the distribution of over sites is L shaped. LRTs comparing M2 and M3 with M1 or M0, and M7–M8 were all very highly significant (P < 0.001). Model M3 fit the data the best. In sum, all the models provide consistent and significant evidence for the existence of positive selected sites in the MLST housekeeping genes. The Bayesian approach identified 34 positively selected sites (pP > 0.95) under model M3 and the same 15 sites under models M2 and M8, all included in the 34 found by model M3. Hence, although these three models are constructed very differently, they produced congruent results concerning the identification of positively selected sites. Model M3 appears to be more powerful than the other models at detecting positively selected sites, as previously reported (e.g., Yang et al. 2000). Although sites with > 1 were identified in all MLST genes but fumC and pdhC, interestingly the distribution of those sites was not uniform (see table 2). Some differences in selective pressure () and proportion of sites with > 1 (p) were observed between year and gender partitions (e.g., 1991M2M3 = 99.7, 1991Mp2M3 = 0.1 vs. 1996M2M3 = 19.5, 1996Mp2M3 = 1.4). These differences may not be as large as they seem because it is very difficult for the LRT to distinguish a small proportion of very strongly selected sites from a large proportion of weakly (or strongly) selected sites (Yang et al. 2000). The number of positively selected sites (n) detected for the 1996M partition were 1.5–1.9 and 1.1–1.6 times higher than those detected for the 1991M and 2001M partitions, respectively, and 1.1–1.4 times higher than those detected for the 1996F partition.

    Table 4 –lnL, , p, n for 13 Concatenated MLST and porB (PIB and PIA) Genes

    TreeSAAP identified 39 codons under positive-destabilizing selection at the P = 0.001 significance level. This included all the housekeeping genes but adk. Associated with these codons, 18 physiochemical properties showed radical changes (see table 2). PAML and TreeSAAP agreed reasonably well at detecting positively selected sites; all sites identified by the Bayes approach except for 6 were also identified by TreeSAAP as potential sites under positive selection. Some of the significant sites detected by TreeSAAP were also detected by PAML, but they were nonsignificant and vice versa. The distribution of those sites per housekeeping locus was not uniform (table 2). In general, TreeSAAP identified more sites per gene under adaptive selection than PAML, which, assuming the inference is correct, confirms its greater sensitivity (see McClellan et al. 2004).

    Within the PI sequences, the ratio averaged over all sites ranged from 3.64 (model M0) to 5.51 for the PIA sequences and from 1.12 (model M0) to 1.94 for the PIB sequences among models, except for models M1 and M7. The last two models gave lower and less reliable estimates as they do not account for positively selected sites. The strictly neutral model (M1) fit the data much better than the model of one for all sites, except for the PIA sequences (table 3). But all the models that allow for > 1 fit the data equally well and significantly (P < 0.001) better than models that do not allow for it. Models M2, M3, and M8 detected a substantial proportion of positively selected sites (12.54%–14.54% for PIB and 24.13% for PIA). These sites seem to be under strong ( = 5.5 for PIB and = 7.4 for PIA) or very strong ( > 12.5 for PIB and > 22.8 for PIA) positive selection. The beta distribution (M7) has a U shape (p 0.01; q 0.02) for both PI groups, possibly because the model is forced to attempt to account for multiple sites with > 1. Twenty-four positively selected sites were located under model M3 for the PIB sequences and the same 20 sites under models M2 and M8, all included in the 24 found by model M3. The same 29 positively selected sites were found under the three previous models for PIA. Hence, although both homology groups seem to be under strong diversifying selection, the PIA group appears to be under greater selection intensity, as indicated by Smith, Maynard Smith, and Spratt (1995), Posada et al. (2000), and Unemo et al. (2003). Nevertheless, our results must be interpreted with caution considering the low Na value of the PIA data set (see Anisimova, Bielawski, and Yang 2001). p, , and n parameters showed similar values among PIB year partitions, but female isolates showed consistently lower values of n for all models.

    TreeSAAP identified 27 codons in the PIB sequences at the P = 0.001 significance level under positive-destabilizing selection. All sites identified by the Bayes approach in PAML except for four were also identified by TreeSAAP as potential sites under positive selection. As before, some of the significant sites detected by TreeSAAP were also detected by PAML, but they were nonsignificant and vice versa. Associated with these codon changes, 12 physiochemical properties showed radical changes (table 5). TreeSAAP only reported 10 sites with > 1 in PAML, affecting three physicochemical properties in the PIA sequences (table 2). This may reflect a difference in performance between both approaches under small trees: PAML is probably overestimating n (Anisimova, Bielawski, and Yang 2002), while TreeSAAP is more conservative. We find this result interesting from a methodological point of view, but it warns us to interpret PIA estimates of adaptive selection cautiously. The distributions of positively selected sites (table 5) detected by TreeSAAP and model M3 in PAML in the PIB data set were different among external (loops) and internal (trans) regions, accumulating more sites in the exposed regions (nM3 = 20; nTS = 19) than in the internal regions (nM3 = 4; nTS = 8). A similar pattern was observed for PIA (19 external and 10 internal) under M3 in PAML.

    Table 5 Distribution of AA Sites and Physicochemical Properties Under Positive Selection in PIA and PIB

    It has been shown that amino acid mutations occurring at residues 120 and 121 (loop 3) of the N. gonorrhoeae PI porins (22 and 23 in our shorter sequences) confer resistance to penicillin and tetracycline (Olesky, Hobbs, and Nicholas 2002). Alteration of residues 120 and 121 of PIB to aspartic acids (Asp) or a single amino acid mutation of residue 120 to lysine confers full intermediate-level antibiotic resistance. A single Asp mutation, instead, at either position 120 or 121 confers only partial resistance to the antibiotics. In contrast, 120Asp and 121Asp mutations in PIA confer only a small increase in resistance to penicillin and no resistance to tetracycline, and a 121Lys mutation increases both antibiotic resistance, but at a lower level than in PIB. Our analyses indicate that both amino acid residues in PIB and residue 120 in PIA (Bayesian approach) are under strong level of adaptive selection. Asp and Lys residues were found at positions 120 and/or 121 in the PIB porins of 68 (40.5%) isolates (table 6), which indicates antibiotic resistance. However, only Asp and glycine residues were found at positions 120 (75.0% isolates) and 121 (100% isolates), respectively, in the PIA porins (table 6), which indicates only modest resistance to penicillin and no resistance to tetracycline. For the PIB porins, the frequency of Asp and Lys residues at positions 120 and/or 121 did not differ significantly among time partitions based on a 2 test (P = 0.54; 10,000 permutations). However, a two-tailed Fisher's exact test showed significant differences (P = 0.022) between 1996M (37.1%) and 1996F (64.3%).

    Table 6 Identification of PIB and PIA Amino Acid Mutations at Residues 120 and/or 121

    Discussion

    Neisseria gonorrhoeae Population Genetics

    Neisseria gonorrhoeae has been proposed to have a nonclonal population structure, being effectively panmictic (Maynard Smith et al. 1993; O'Rourke and Stevens 1993; Spratt et al. 1995; Spratt and Maiden 1999; Feil et al. 2001). This is due mainly to recombination, presumably mediated by genetic transformation (O'Rourke et al. 1995), which has been shown to be extensive in Neisseria for both housekeeping (Viscidi and Demma 2003; Pérez-Losada et al. 2005) and porB (Feavers et al. 1992; Vázquez et al. 1993; Spratt et al. 1995; Hobbs et al. 1999; Posada et al. 2000) genes. Recombination in bacterial sequences makes them particularly difficult to study from a phylogenetic perspective because it invalidates the usual interpretation that the tree branches represent ancestral lineages. The MLST and PI sequences showed shallow phylogenetic structuring within the data (figs. 1 and 2), but no basal temporal- or gender-based clustering of isolates was observed. Because we have reduced the possibility of error due to inappropriate models of evolution, this lack of deep phylogenetic structure is likely to be a consequence of extensive recombination in these genes and, for the housekeeping genes, their low polymorphism. In fact, it is possible that the very shape of the trees imply recombination because under complete linkage the expected tree should have long internal branches and short external ones (Achaz et al. 2004). Even network approaches such as the Templeton, Crandall, and Sing (1992) method, which accounts for moderate levels of recombination among haplotypes and low levels of variation (Templeton et al. 2000), did not generate "clear" evolutionary hypotheses. All cladograms for single or concatenated genes produced several loops (presumably due to recombination) and no significant partitioning of haplotypes by time or gender group.

    The results of the FST fixation index and the categorical tests (table 3) for both the MLST and PIB sequences, however, showed strong and significant population differences among years, no matter the gender of the subject. Significant differences between PIA isolates were also observed under the categorical analysis. All pairwise comparisons, but the FST1991M–2001M pair for PIB, resulted in significant differences. Moreover, isolates collected in 1996 also showed significantly higher levels of genetic diversity () than those collected in 1991 and 2001 for PIB (table 1), larger growth rates, and, for the MLST genes, higher numbers of potential sites under > 1 for all models (table 4). Therefore, as compared to the phylogenetic analysis, which failed to show clear structuring, our quantitative and qualitative approaches suggest that there is temporal subdivision among N. gonorrhoeae samples from 1991 to 2001, with most of the variation located within year partitions. The latter evidence indicates that local population structuring can be discerned even within a globally panmictic species such as Neisseria. Similar observations have been reported before for this taxon (Gutjahr et al. 1997; Posada et al. 2000; Martin et al. 2004) and other bacterial pathogens with higher levels of mutation and recombination such as Helicobacter pylori (Achtman et al. 1999; Falush et al. 2003).

    Positive growth levels for MLST and PIB sequences estimated at three different time points and overall indicate that the N. gonorrhoeae population from Baltimore has been growing from 1991 to 2001, although at different rates. Despite the difference in growth rates between both gene types (table 1) modern-day and past estimates of population size based on both markers may not be as big as the g values suggest, considering the different mutation rates of both gene types. The growth of the gonococcal population, temporal structuring, and population differences among years may reflect a number of epidemiological changes that have occurred in Baltimore between 1990 and 2000. First, the crack-cocaine epidemic began in the early 1990s. The epidemic resulted in substantial migration of individuals from the suburbs into Baltimore, mostly revolving around the marketing of crack cocaine. Second, a major shift in populations with endemic gonococcal infection occurred due to the implosion of obsolete high-rise housing projects from 1993 to 1995. The loss of this housing led to the dispersion of formerly very dense and for the most part isolated social networks. Third, although the population of Baltimore decreased in the 1990s, the N. gonorrhoeae case morbidity did not decrease until late in the decade. This is attributable to the fact that many persons who formerly lived in Baltimore city, and moved to the suburbs, still sought care in the sexually transmitted disease (STD) clinics. Finally, there is anecdotal evidence for increased travel along the eastern seaboard, with the development of cheap airfares and expansion of public transportation. This travel was especially associated with drug trafficking. Collectively, these changes allowed for the dissemination of strains within a larger community and afforded the opportunity for additional social mixing. Another issue to consider in the positive growth rates and structuring of N. gonorrhoeae during this time period would be the opening of drug niches through secular trends in antibiotic usage. For example, at the beginning of the decade, penicillinase-producing strains of N. gonorrhoeae (PPNG) and strains carrying other plasmid-mediated determinants were very common. This led to the increased use of quinolones and cephalosporins. As PPNG and other plasmid-mediated resistance strains that had previously dominated the population disappeared, the gonococcal population may have diversified. Quinolone antibiotics were first introduced into clinical practice in Baltimore around 1995, and as of 2001 no resistant strains have been reported. However, more recently a few quinolone-resistant isolates have been detected. If these strains spread within the Baltimore community, we might expect to see a selective sweep and a reduction in population growth in the future.

    Based on the results of Hobbs et al. (1999), it could be postulated that women infected with gonococci might have populations of N. gonorrhoeae with higher levels of genetic diversity than male patients. The reasoning is that females have a longer asymptomatic period after infection relative to men, resulting in a longer period of infection. This longer period of infection may result in greater stimulation of the immune system, allowing the development of antigonococcal immunity, thereby driving the evolution of more diverse strains in females. This can lead to a prediction of increased genetic diversity for two reasons: (1) increased diversity through selection from the immune system and (2) increased diversity due to larger inbreeding effective population sizes. Our results do not support this hypothesis. All female isolates showed similar or lower levels of and C than male isolates, levels of selection ( and p) between both samples were similar, population differences were nonsignificant, and female isolates showed lower numbers of sites under > 1. These results suggest that this increased selection period with apparently similar selection pressure actually tends to reduce the overall genetic variation (as lower fitness variants are eliminated from the population), thus reducing genetic variation and effective population size.

    Molecular Evolution of the MLST Housekeeping Genes

    In general, all housekeeping genes presented low levels of polymorphism as reflected by the Na, Ps, and estimates. and C values ranged from 0.001 (e.g., abcZ) to 0.013 (pilA) and 0.000 (e.g., adk) to 0.018 (gnd), respectively (table 2), indicating differing levels of impact of mutation relative to recombination contributing to genetic diversity (c/μ ratios). Based on these estimates of sequence variation, the preferable loci for N. gonorrhoeae MLST would be fumC, gdh, glnA, gnd, pilA, pyrD, and serC. This arrangement differs in two genes from the set proposed by Viscidi and Demma (2003), although their choice was based on a sample of only 32 gonococcal isolates. A sample of four–eight core housekeeping loci is usually sufficient for the subdivision of most bacterial species into clonal complexes (Cooper and Feil 2004). Most variation within genes that encode essential metabolic enzymes, such as the MLST housekeeping genes, is considered neutral or deleterious due to functional constraints (e.g., Li 1997; Feil, Enright, and Spratt 2000; Dingle et al. 2001; Feil et al. 2003; Meats et al. 2003; Urwin and Maiden 2003). Adaptive evolution, if present, must be punctual. Analyses of selection pressure in 13 core housekeeping genes for 204 isolates of N. gonorrhoeae using a homogeneous model (M0; average over all codon sites) supports the view that purifying selection is the dominant force driving their evolution. However, the criterion of average > 1 is a very stringent one for detecting adaptive selection (Crandall et al. 1999). When the Neisseria MLST housekeeping genes are screened under models that account for heterogeneity among sites and test for radical changes in physicochemical properties, all loci reveal sites (or gene regions) under strong or very strong levels of significant adaptive (or positive-destabilizing) selection. This expands previous results by Pérez-Losada et al. (2005), who detected seven housekeeping genes under adaptive selection in a smaller gonococcal population (32 isolates) from the same area.

    Using the TreeSAAP approach we were able to determine that several physicochemical (table 2) properties seem to drive protein evolution of the MLST loci through local directional shifts in biochemical function, structure, or both. All of the above evidence, therefore, rejects the general view that molecular evolution of housekeeping genes is selectively neutral or deleterious and confirms conclusions by Viscidi and Demma (2003) and Pérez-Losada et al. (2005). As stated by the latter, studies reporting lack of diversifying selection in MLST housekeeping genes based on the average test for N. gonorrhoeae and similarly for other pathogens must be cautiously interpreted. Moreover, one should be aware of their lack of neutrality when used for population or molecular evolutionary studies. Nevertheless, we do not think that our findings invalidate the use of these MLST housekeeping genes for typing purposes. To the contrary, their use provides a rich and diversified source of information to infer population dynamics of bacterial populations. We agree with Cooper and Feil (2004) that "the exclusion of genes that do not conform to classical housekeeping criteria is an ill-afforded luxury."

    Molecular Evolution of porB

    PIA showed lower levels of polymorphism than PIB (table 1). Moreover, genetic diversity and recombination rates seem to be significantly greater in PIB than PIA, as reported before by Posada et al. (2000) and Unemo et al. (2003). However, the ratio of the per-site chance of recombination to the per-site chance of mutation (c/μ) indicates that both forces play a similar role in the evolution of these two homology groups, which disagrees with previous observations (Posada et al. 2000). Directional positive selection appears to be operating to a significant extent in the PI sequences (table 4). This is not a surprise given the intensive selection pressures put on these bacteria from the immune system and antibiotic treatment. The selective distribution of replacements and physicochemical properties under positive-destabilizing selection in the surface-exposed loops supports this idea (table 5). Olesky, Hobbs, and Nicholas (2002) showed that the amino acid replacements occurring at residues 120 and 121 of the N. gonorrhoeae PI porins confer resistance to penicillin and tetracycline. More than 40% of the PIB isolates presented mutations at these residues that conferred resistance to penicillin and tetracycline. A similar average percentage (47.1%) was observed by Olesky, Hobbs, and Nicholas (2002) in other natural populations. In our data set these mutations were observed significantly more often in isolates from women compared to those from men. There are two reasons why antibiotic resistance mutations might preferentially occur in isolates from women. First, women are more susceptible than men to urinary tract infections (UTIs) and therefore become exposed to multiple antibiotics through treatment of UTIs. Second, because of social classification in entitlement programs such as Medicaid, women are more likely to be covered for medical care through public assistance programs. Therefore women have greater access to medical care than men and are more likely to be treated with antibiotics for a wide range of illnesses. In addition, drug compliance may be poor in the population served by the STD clinics, and suboptimal doses of antibiotics increase the chances of selection for resistance. Further study of gender differences in antibiotic resistance–associated mutations in porB and other genes should be undertaken because the current national screening program for antibiotic-resistant gonococci is based on surveys that only include men.

    Based on the mutations observed at residues 120 (table 6) only modest resistance to penicillin and no resistance to tetracycline would be predicted for 75.0% of the PIA isolates. This confirms previous observations by Olesky, Hobbs, and Nicholas (2002). It is intriguing that a Lys mutation has not developed naturally in PIA proteins. Reasons for this may include biological differences between PIA and PIB proteins contributing to differing selection pressures (as reported here), including localization and serum-resistance properties (Olesky, Hobbs, and Nicholas 2002).

    A total of 12 different properties are potentially driving the evolution of the outer membrane PIB protein in N. gonorrhoeae, 6 associated with transmembrane regions and 12 with loop regions (table 5). Among them, molecular weight (Mw), partial specific volume (V0), and short- and medium-range nonbonded energy (Esm) affected four of the five transmembrane regions studied, and the average number of surrounding residues (Ns), ?-structure tendencies (P?), molecular volume (Mv), and surrounding hydrophobicity (Hp) affected three of the four external regions. Acceptance rate, proportion of sites with a > 1, and number of positively selected sites were greater for PIA than PIB under all of the likelihood models (table 4), which indicates that selection intensity seems to be higher for the PIA homology group. Although PIA estimates are surely inflated due to the small sample size, this result supports the earlier conclusions of Smith, Maynard Smith, and Spratt (1995), Posada et al. (2000), and Unemo et al. (2003), who related these selective sweeps between PIA and PIB to their epidemiological differences.

    Adaptive Selection or Recombination? A Methodological Consideration

    Maximum likelihood inference under codon-substitution or amino acid property–based models such as those implemented in PAML and TreeSAAP, respectively, relies on the phylogenetic relationships among the sequences and does not account for the presence of recombination. Some research has been done on how these two factors affect the LRT and Bayesian methods (see below), but no study has been published on how they may affect the performance of amino acid property–based models. In the case of recombination, intuitively one could expect that because TreeSAAP infers selection at the phenotypic level, its accuracy is independent of the force generating molecular change (mutation or recombination), and what really matters is if that physiochemical change is fixed or not (D. A. McClellan, personal communication). However, inference of radical changes is based on a well-corroborated estimation of phylogeny, which can be severely affected by recombination (Posada and Crandall 2002). Thus, an answer to this question cannot be provided at this point; clearly this issue needs further study.

    In the case of PAML, preliminary results reported by Yang et al. (2000) and Pérez-Losada et al. (2005) indicated that the LRTs and the inference of sites under positive selection do not seem to be sensitive to the assumed tree topology (a NJ tree in our analyses), even if a star tree is used. Hence, presumably, our results are not biased by whichever phylogenetic process (clonal, epidemic, or panmictic) drives the population structure of N. gonorrhoeae. High levels of recombination (C 0.01), however, seem to affect dramatically the accuracy of the LRT test and often mistakes recombination as evidence of positive selection (simulations by Anisimova, Nielsen, and Yang [2003] and Shriner et al. [2003]; although see Urwin et al. [2002] for a different opinion). Identification of sites under positive selection using the Bayesian approach appears to be less affected (Anisimova, Nielsen, and Yang 2003), though simulations by Shriner et al. (2003) indicate differently. Although controversial, the previous simulations thus raise some concern about the likelihood method used here for detecting positive selection under high levels of recombination, like in the N. gonorrhoeae genes and especially for porB. Apart from the fact that simulated and real data may behave differently, results obtained from both simulation studies are contradictory in their conclusions. The more extensive simulations by Anisimova, Nielsen, and Yang (2003) showed that high recombination rates ( = 0.01) affect dramatically the LRTs of M0–M3 and M1–M2, but much less the LRT of M7–M8 (positive selection was falsely detected in only 20% of replicates at = 5%). Recombination ( = 0.01) also inflated the estimates of under model M0, but the effect was minor. Under the same conditions, the Bayesian method predicted incorrectly 25% of the sites for M3, 9% for M8, and 5% for M2. However, when data were simulated at high levels ( = 6) of positive selection (although lower than those observed in our study) Bayesian site prediction becomes more accurate and powerful (concrete values are not reported). Finally, the authors also indicated that if the population is expanding (as it seems to be the case here based on our growth estimates), recombination would have less effect than that found in their study. Therefore, while possible accuracy problems due to the small size of the PIA tree (see Anisimova, Bielawski, and Yang 2001, 2002) may exist, we think that recombination is probably inflating our PAML estimates of positive selection for the PIB and MLST genes; although the observed values are so high, it is hard to believe that the LRTs are completely misleading in their conclusions, mainly for M7–M8 comparisons. Moreover, it has been shown that LRTs are conservative (Yang et al. 2000; Anisimova, Bielawski, and Yang 2001; Anisimova, Nielsen, and Yang 2003; Shriner et al. 2003), so genes inferred by the test to undergo positive selection are most likely to be true cases of adaptation rather than an artifact of the method, as proven in most of the published studies (e.g., Bishop, Dean, and Mitchell-Olds 2000; Yang, Swanson, and Vacquier 2000; Peek et al. 2001; Yang and Swanson 2002). On the Bayesian prediction of sites under > 1, we think that, at least for the housekeeping and PIB data sets, the PAML output is accurate. Most of the same sites were verified by TreeSAAP using a completely different approach, which also suggests that dN/dS ratios are a sensitive approach to detecting adaptive selection in genes under moderate or strong levels of diversifying selection. Moreover, the distribution of positively selected sites on the PIB topology is biologically meaningful, in spite of this gene showing the highest levels of recombination. In conclusion, we think that in the Neisseria genome one could expect convincing evidence for both adaptive selection and recombination as our analyses suggest. Nevertheless, because both parameters cannot be calculated simultaneously, the provided estimates are probably inaccurate under the current methods. Clearly newer methods that incorporate recombination into a coalescent codon-based model are needed (Anisimova, Nielsen, and Yang 2003).

    Acknowledgements

    We thank M. K. Kuhner, Z. Yang, and D. A. McClellan or their valuable advice on the use of LAMARC, PAML, and TreeSAAP, respectively. We gratefully acknowledge support from the National Institutes of Health grants R01 AI50217 (R.P.V., K.A.C.) and GM66276 (K.A.C.) and from Brigham Young University.

    References

    Achaz, G., S. Palmer, M. Kearney, F. Maldarelli, J. W. Mellors, J. M. Coffin, J. Wakeley. 2004. A robust measure of HIV-1 population turnover within chronically infected individuals. Mol. Biol. Evol. 21:1902–1912.

    Achtman, M., T. Azuma, D. E. Berg, Y. Ito, G. Morelli, Z. J. Pan, S. Suerbaum, S. A. Thompson, A. v. d. Ende, and L. J. v. Doorn. 1999. Recombination and clonal groupings within Helicobacter pylori from different geographical regions. Mol. Microbiol. 32:459–470.

    Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. Accuracy and power of the likelihood ratio test to detect adaptive molecular evolution. Mol. Biol. Evol. 18:1585–1592.

    ———. 2002. Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol. Biol. Evol. 19:950–958.

    Anisimova, M., R. Nielsen, and Z. Yang. 2003. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164:1229–1236.

    Becker, K. M., G. E. Glass, W. Brathwaite, and J. M. Zenilman. 1998. Geographic epidemiology of gonorrhea in Baltimore, Maryland, using a geographic information system. Am. J. Epidemiol. 147:709–716.

    Bernstein, K. T., F. C. Curriero, J. M. Jennings, G. Olthoff, E. J. Erbelding, J. Zenilman. 2004. Defining core gonorrhea transmission utilizing spatial data. Am. J. Epidemiol. 160:51–58.

    Bishop, J. G., A. M. Dean, and T. Mitchell-Olds. 2000. Rapid evolution in plant chitinases: molecular targets of selection in plant-pathogen coevolution. Proc. Natl. Acad. Sci. USA 97:5322–5327.

    Butt, N. J., P. R. Lambden, and J. E. Heckels. 1990. The nucleotide sequence of the por gene from Neisseria gonorrhoeae strain P9 encoding outer membrane protein PIB. Nucleic Acids Res. 18:4258.

    Carbonetti, N. H., V. I. Simnad, H. S. Seifert, M. So, and P. F. Sparling. 1988. Genetics protein I of Neisseria gonnorrhoeae: construction of hybrid porins. Genetics 85:6841–6845.

    Carbonetti, N. H., and P. F. Sparling. 1987. Molecular cloning and characterization of the structural gene for protein I, the major outer membrane protein of Neisseria gonorrhoeae. Genetics 84:9084–9088.

    Centers for Disease Control and Prevention. 2002. Sexually transmitted disease surveillance 2002 supplement: gonococcal isolate surveillance project (GISP) annual report—2002. Atlanta, Georgia: U.S. Department of Health and Human Services, October 2003.

    Cooper, J. E., and E. Feil. 2004. Multilocus sequence typing—what is resolved? Trends Microbiol. 12:373–377.

    Crandall, K. A., C. R. Kelsey, H. Imamichi, H. C. Lane, and N. P. Salzman. 1999. Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection. Mol. Biol. Evol. 16:372–382.

    Derrick, J. P., R. Urwin, J. Suker, I. M. Feavers, and M. C. J. Maiden. 1999. Structural and evolutionary inference from molecular variation in Neisseria porins. Infect. Immun. 67:2406–2413.

    Dingle, K. E., F. M. Colles, D. R. A. Wareing, R. Ure, A. J. Fox, F. E. Bolton, H. J. Bootsma, R. J. L. Willems, R. Urwin, and M. C. J. Maiden. 2001. Multilocus sequence typing for Campylobacter jejuni. J. Clin. Microbiol. 39:14–23.

    Elkins, C., N. H. Carbonetti, V. A. Varela, D. Stirewalt, D. G. Klapper, and P. F. Sparling. 1992. Antibodies to N-terminal peptides of gonococcal porin are bactericidal when gonococcal lipopolysaccharide is not sialylated. Mol. Microbiol. 6:2617–2628.

    Excoffier, L., P. E. Smouse, and J. M. Quattro. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479–491.

    Falush, D., T. Wirth, B. Linz et al. (18 co-authors). 2003. Traces of human migrations in Helicobacter pylori populations. Science 299:1582–1585.

    Feavers, I. M., S. J. Gray, R. Urwin, J. E. Russell, J. A. Bygraves, E. B. Kaczmarski, and M. C. J. Maiden. 1999. Multilocus sequence typing and antigen gene sequencing in the investigation of a meningococcal disease outbreak. J. Clin. Microbiol. 37:3883–3887.

    Feavers, I. M., A. B. Heath, J. A. Bygraves, and M. C. J. Maiden. 1992. Role of horizontal genetic exchange in the antigenic variation of the class 1 outer membrane protein in Neisseria meningitidis. Mol. Microbiol. 6:489–495.

    Feavers, I. M., and M. C. J. Maiden. 1998. A gonococcal porA pseudogene: implications for understanding the evolution and pathogenicity of Neisseria gonorrhoeae. Mol. Microbiol. 30:647–656.

    Feil, E. J., J. E. Cooper, H. Grundmann et al. (12 co-authors). 2003. How clonal is Staphylococcus aureus? J. Bacteriol. 185:3307–3316.

    Feil, E. J., M. C. Enright, and B. G. Spratt. 2000. Estimating the relative contribution of mutation and recombination to clonal diversification: a comparison between Neisseria meningitidis and Streptococcus pneumoniae. Res. Microbiol. 151:465–469.

    Feil, E. J., E. C. Holmes, D. E. Bessen et al. (12 co-authors). 2001. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc. Natl. Acad. Sci. USA 98:182–187.

    Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.

    ———. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.

    Fudyk, T. C., I. W. Maclean, N. J. Simonsen, E. N. Njagi, J. Kimani, R. C. Brunham, and F. A. Plummer. 1999. Genetic diversity and mosaicism at the por locus of Neisseria gonorrhoeae. J. Bacteriol. 181:5591–5599.

    Gerbase, A. C., J. T. Rowley, D. H. Heymann, S. F. Berkley, and P. Piot. 1998. Global prevalence and incidence estimates of selected curable STDs. Sex. Transm. Infect. 74(Suppl. 1):S12–S16.

    Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736.

    Gutjahr, T. S., M. O'Rourke, C. A. Ison, and B. G. Spratt. 1997. Arginine-, hypoxanthine-, uracil-requiring isolates of Neisseria gonorrhoeae are a clonal lineage with a non-clonal population. Microbiology 143:633–640.

    Handsfield, H., and P. Sparling. 1995. Neisseria gonorrhoeae. Pp. 1909–1926 in G. Mandell, J. Bennett, and R. Dolin, eds. Principles and practice of infectious diseases. Churchill Livingstone, Philadelphia, Pa.

    Heckels, J. E., M. Virji, and C. R. Tinsley. 1990. Vaccination against gonorrhoeae: the potential protective effect of immunization with a synthetic peptide containing a conserved epitope of gonococcal outer membrane protein IB. Vaccine 8:225–230.

    Hobbs, M. M., T. M. Alcorn, R. H. Davis, W. Fischer, J. C. Thomas, I. Martin, C. Ison, P. F. Sparling, and M. S. Cohen. 1999. Molecular typing of Neisseria gonorrhoeae causing repeated infections: evolution of porin during passage within a community. J. Infect. Dis. 179:371–381.

    Hudson, R. R., D. D. Boos, and N. L. Kaplan. 1992. A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9:138–151.

    Huelsenbeck, J. P., and K. A. Crandall. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst. 28:437–466.

    Ison, C. A. 1988. Immunology of gonorrhoea. Pp. 95–116 in D. J. M. Wright, ed. Immunology of sexually transmitted diseases. Kluwer Academic, London.

    Jet. 1998. Baltimore has highest rate of syphilis and gonorrhea, CDC research shows; Mayor Schmoke responds to report—Centers for Disease Control and Prevention—brief article. Jet 28 (http://www.findarticles.com/p/articles/mi_m1355/is_5_95/ai_54724797).

    Kimura, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.

    Kuhner, M. K., J. Yamato, P. Beerli, L. P. Smith, E. Rynes, E. Walkup, C. Li, J. Sloan, P. Colacurcio, and J. Felsenstein. 2004. LAMARCver 1.2.1. University of Washington, Seattle (http://evolution.gs.washington.edu/lamarc.html).

    Ley, v. d. P., J. v. d. Biezen, and J. T. Poolman. 1995. Construction of Neisseria meningitidis strains carrying multiple chromosomal copies of the porA gene for use in the production of a multivalent outer membrane vesicle vaccine. Vaccine 13:401–407.

    Ley, v. d. P., J. E. Heckels, M. Virji, P. Hoogerhout, and J. T. Poolman. 1991. Topology of outer membrane porins in pathogenic Neisseria spp. Infect. Immun. 59:2963–2971.

    Ley, v. d. P., and J. T. Poolman. 1992. Construction of a multivalent meningococcal vaccine strain based on the class 1 outer membrane protein. Infect. Immun. 60:3156–3161.

    Li, W.-H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Maddison, D. R, and W. P. Maddison. 2000. MacClade 4: analysis of phylogeny and character evolution. Sinauer Associates, Sunderland, Mass.

    Martin, I. M., C. A. Ison, D. M. Aanensen, K. A. Fenton, and B. G. Spratt. 2004. Rapid sequence-based identification of gonococcal transmission clusters in a large metropolitan area. J. Infect. Dis. 15:1497–1505.

    Massari, P., S. Ram, H. Macleod, and L. M. Wetzler. 2003. The role of porins in neisserial pathogenesis and immunity. Trends Microbiol. 11:87–93.

    Maynard Smith, J. 1995. Do bacteria have population genetics? Pp. 1–12 in S. Baumberg, J. P. W. Young, J. R. Saunders, and E. M. H. Wellington, eds. Population genetics of bacteria. Society for General Microbiology, Symposium 52. Cambridge University Press, London.

    Maynard Smith, J., N. H. Smith, M. O'Rourke, and B. G. Spratt. 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90:4384–4388.

    McClellan, D. A., and K. G. McCracken. 2001. Estimating the influence of selection on the variable amino acid sites of the cytochrome B protein functional domain. Mol. Biol. Evol. 18:917–925.

    McClellan, D. A., E. J. Palfreyman, M. J. Smith, J. L. Moss, R. G. Christensen, and J. K. Sailsbery. 2004. Physicochemical evolution and molecular adaptation of the cetacean and artiodactyl cytochrome b proteins. Mol. Biol. Evol. 22:437–455.

    Meats, E., E. J. Feil, S. Stringer, A. J. Cody, R. Goldstein, J. C. Kroll, T. Popovic, and B. G. Spratt. 2003. Characterization of encapsulated and noncapsulated Haemophilus influenzae and determination of phylogenetic relationships by multilocus sequence typing. J. Clin. Microbiol. 41:1623–1636.

    Mee, B. J., H. Thomas, S. J. Cooke, P. R. Lambden, and J. E. Heckels. 1993. Structural comparison and epitope analysis of outer-membrane protein PIA from strains of Neisseria gonorrhoeae with differing serovar specificities. J. Gen. Microbiol. 139:2613–2620.

    Meitzner, T. A., and M. S. Cohen. 1997. Vaccines against gonococcal infection. Pp. 817–842 in M. M. Levine, G. C. Woodrow, J. B. Kaper, and G. S. Cobon, eds. New generation vaccines. Marcel Dekker, New York.

    Miyata, T., and T. Yasunaga. 1980. Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitution from homologous nucleotide sequences and its application. J. Mol. Evol. 16:23–36.

    Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to de HIV-1 envelope gene. Genetics 148:929–936.

    Olesky, M., M. Hobbs, R. A. Nicholas. 2002. Identification and analysis of amino acid mutations in porin IB that mediate intermediate-level resistance to penicillin and tetracycline in Neisseria gonorrhoeae. Antimicrob. Agents Chemother. 46:2811–2820.

    O'Rourke, M., C. A. Ison, A. M. Renton, and B. G. Spratt. 1995. Opa-typing: a high-resolution tool for studying the epidemiology of gonorrhoeae. Mol. Microbiol. 17:865–875.

    O'Rourke, M., and E. Stevens. 1993. Genetic structure of Neisseria gonorrhoeae populations: a non-clonal pathogen. J. Gen. Microbiol. 139:2603–2611.

    Peek, A. S., V. Souza, L. E. Eguiarte, and B. S. Gaut. 2001. The interaction of protein structure, selection, and recombination on the evolution of the type 1 fimbrial major submit (fimA) from Escheriachia coli. J. Mol. Evol. 52:193–204.

    Pérez-Losada, M., E. B. Browne, A. Madsen, T. Wirth, R. P. Viscidi, and K. A. Crandall. 2005. Population genetics of microbial pathogens estimated from multilocus sequence typing (MLST) data. Infect. Genet. Evol. (in press).

    Posada, D. 2001. The effect of branch length variation on the selection of models of molecular evolution. J. Mol. Evol. 52:434–444.

    Posada, D., and T. R. Buckley. 2004. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst. Biol. 53:793–808.

    Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817–818.

    ———. 2001. A comparison of different strategies for selecting models of DNA substitution. Syst. Biol. 50:580–601.

    ———. 2002. The effect of recombination on the accuracy of phylogeny estimation. J. Mol. Evol. 54:396–402.

    Posada, D., Crandall, K. A., Nguyen, M., Demma, J. C., and R. P. Viscidi. 2000. Population genetics of the porB gene of Neisseria gonorrhoeae: different dynamics in different homology groups. Mol. Biol. Evol. 17:423–436.

    Robinson, D. A., and M. C. Enright. 2003. Evolutionary models of the emergence of methicillin-resistant Staphylococcus aureus. Antimicrob. Agents. Chemother. 47:3926–3934.

    Roff, D. A., and P. Bentzen. 1989. The statistical analysis of mitochondrial DNA polymorphisms: chi-square and the problem of small samples. Mol. Biol. Evol. 6:539–545.

    Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    Schneider, S., J.-M. Kueffer, D. Roessli, and L. Excofier. 1997. Arlequin: a software for population genetic data analysis. Genetics and Biometry Lab, Department of Anthropology, University of Geneva, Switzerland.

    Shriner, D., D. C. Nickle, M. A. Jensen, and J. I. Mullins. 2003. Potential impact of recombination on sitewise approaches for detecting positive natural selection. Genet. Res. Comb. 81:115–121.

    Smith, N. H., J. M. Maynard Smith, and B. G. Spratt. 1995. Sequence evolution of the porB gene of Neisseria gonorrhoeae and Neisseria meningitidis: evidence of positive Darwinian selection. Mol. Biol. Evol. 12:363–370.

    Spratt, B. G., and M. C. J. Maiden. 1999. Bacterial population genetics, evolution and epidemiology. Phil. Trans. R Soc. Lond. B 354:701–710.

    Spratt, B. G., N. H. Smith, J. Zhou, M. O'Rourke, and E. Feil. 1995. The population genetics of the pathogenic Neisseria. Pp. 143–160 in S. Baumberg, J. P. W. Young, E. M. H. Wellington, and J. R. Saunders, eds. Population genetics of bacteria. Press Syndicate of the University of Cambridge, Cambridge.

    Swofford, D. L. 2003. PAUP*: phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Mass.

    Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512–526.

    Templeton, A. R., A. G. Clark, K. M. Weiss, D. A. Nickerson, E. Boerwinkle, and C. F. Sing. 2000. Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am. J. Hum. Genet. 66:69–83.

    Templeton, A. R., K. A. Crandall, and C. F. Sing. 1992. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132:619–633.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24:4876–4882.

    Unemo, M., P. Olcén, J. Albert, and H. Fredlund. 2003. Comparison of serologic and genetic porB-based typing of Neisseria gonorrhoeae: consequences for future characterization. J. Clin. Microbiol. 41:4141–4147.

    Unemo, M., P. Olcén, T. Berglund, J. Albert, and H. Fredlund. 2002. Molecular epidemiology of Neisseria gonorrhoeae: sequence analysis of the porB gene confirms presence of two circulating strains. J. Clin. Microbiol. 40:3741–3749.

    Urwin, R., E. C. Holmes, A. J. Fox, J. P. Derrick, and M. C. J. Maiden. 2002. Phylogenetic evidence for frequent positive selection and recombination in the meningococcal surface antigen porB. Mol. Biol. Evol. 19:1686–1694.

    Urwin, R., and M. C. J. Maiden. 2003. Multi-locus sequence typing: a tool for global epidemiology. Trends Microbiol. 11:479–487.

    Vázquez, J. A., L. De la Fuente, S. Berron, M. O'Rourke, N. H. Smith, J. Zhou, and B. G. Spratt. 1993. Ecological separation and genetic isolation of Neisseria gonorrhoeae and Neisseria meningitidis. Curr. Biol. 3:567–572.

    Viscidi, R. P., and J. C. Demma. 2003. Genetic diversity of Neisseria gonorrhoeae housekeeping genes. J. Clin. Microbiol. 41:197–204.

    Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256–276.

    Woolley, S., J. Johnson, M. J. Smith, K. A. Crandall, and D. A. McClellan. 2003. TreeSAAP: selection on amino acid properties using phylogenetic trees. Bioinformatics 19:671–672.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556 (http://abascus.gene.ucl.ac.uk/software/paml.html).

    Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449.

    Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among sites classes. Mol. Biol. Evol. 19:49–57.

    Yang, Z., W. J. Swanson, and V. D. Vacquier. 2000. Maximum likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites. Mol. Biol. Evol. 17:1446–1455.

    Zak, K., J. L. Diaz, D. Jackson, and J. E. Heckels. 1984. Antigenic variation during infection with Neisseria gonorrhoeae: detection of antibodies to surface proteins in sera of patients with gonorrhea. J. Infect. Dis. 149:166–174.(Marcos Pérez-Losada*, Rap)