Single-Nucleotide Repeat Analysis for Subtyping Bacillus anthracis Isolates
Journal of Clinical Microbiology, 2006, No. 3
Chemical and Biological Defence Section, Defence R&D Canada—Suffield, Medicine Hat, AB
Department of Computer Science, University of Saskatchewan, Saskatoon, SK
Public Health Agency of Canada, Winnipeg, MB, Canada
ABSTRACT
Single-nucleotide repeats (SNRs) are variable-number tandem repeats that display very high mutation rates. In an outbreak situation, the use of a marker system that exploits regions with very high mutation rates, such as SNRs, allows the differentiation of isolates with extremely low levels of genetic diversity. This report describes the identification and analysis of SNR loci of Bacillus anthracis. SNR loci were selected in silico, and the loci with the highest diversity were used to design and test locus-specific primers against a number of B. anthracis strains with the same multilocus variable-number tandem repeat analysis (MLVA) genotype. SNR markers that allowed strains with the same MLVA genotype to be differentiated from each other were identified. The resulting SNR marker system can be used as a molecular epidemiological tool in a natural outbreak or bioterrorism event, offering the best chance of distinguishing very closely related isolates.
INTRODUCTION
Bacillus anthracis, the causative agent of anthrax, is a spore-forming bacterium endemic in soils throughout much of the world. Herbivores, the natural hosts, become infected through contact with spore-containing soil, and humans are usually infected through exposure to infected animals or their products. Virulent strains of B. anthracis carry two virulence plasmids, pXO1 and pXO2, which contain genes for toxin production and capsule synthesis, respectively, although chromosomally encoded factors are also important for the full virulence of B. anthracis (10).
B. anthracis belongs to the B. cereus group and is most closely related to B. cereus and B. thuringiensis. Multilocus enzyme electrophoresis and fluorescent amplified fragment length polymorphism (AFLP) analysis of the B. cereus group revealed a high degree of genetic variability but failed to identify distinct groups (5, 19). Whereas B. cereus and B. thuringiensis isolates are broadly interspersed across all branches of the AFLP phylogenetic tree, B. anthracis shows very low genetic diversity and clusters on a subbranch that is distinct from the branches where other members of the B. cereus group cluster (6).
Recent bioterrorism events have increased interest in B. anthracis, especially in its identification, detection, and molecular subtyping. B. anthracis is considered to be evolutionarily "young," lacking character homoplasy and containing few single-nucleotide polymorphisms (SNPs) (15). This lack of homoplasy may be due to its life history, which includes long periods as dormant endospores. B. anthracis is among the most monomorphic pathogenic bacteria described, and molecular typing techniques commonly used to differentiate strains of other species, including AFLP (6, 7), multilocus sequence typing (14), and pulsed-field gel electrophoresis (4), generally fail to discriminate between B. anthracis strains.
Several molecular typing methods, including SNP analysis and multilocus variable-number tandem repeat analysis (MLVA), have been more successful in discriminating between B. anthracis strains and have allowed the exploration of its phylogenetics. SNPs are rare in B. anthracis, but molecular typing by the use of these polymorphisms is possible due to the availability of multiple whole-genome sequences. SNP phylogenetic markers are evolutionarily stable, with mutation rates of approximately 10⁻¹⁰ changes per nucleotide per generation (21). A set of canonical SNPs that distinguish the major clades of B. anthracis has been developed (9).
An MLVA method that exploits copy number differences in tandem repeat sequences at six chromosomal loci and one locus on each of the two plasmids has been developed (8). MLVA loci have higher mutation rates and a much larger number of allelic states than SNPs.
B. anthracis isolates obtained during a natural outbreak or a bioterrorism event would have an extremely low level of genetic diversity, and canonical SNP analysis and MLVA may fail to distinguish such isolates or closely related strains. To identify polymorphisms in populations with such low genetic diversity, one could examine "hot spots," regions of the genome with very high mutation rates. Single-nucleotide repeats (SNRs), also referred to as mononucleotide repeats, are a type of variable-number tandem repeat (VNTR) that display very high mutation rates (as high as 6.0 × 10⁻⁴ mutations per generation) (9). Unlike VNTR loci with complex repeat structures, SNRs are stretches of a single nucleotide whose length may vary between bacterial isolates as a result of slipped-strand mispairing (12). SNRs are more likely than other types of simple sequence repeats (SSRs) to undergo strand separation and base-pair slippage, which increases the chance of slipped-strand mispairing and thus of a mutation at the SNR locus (1, 3). SSR analysis of Escherichia coli revealed that 93% of all mononucleotide repeats were A or T (3). Because A-T base pairs have a lower melting temperature than G-C pairs, A-T stretches make the DNA helix less stable, which theoretically increases the possibility of slipped-strand mispairing and may explain the A-T bias of SNRs (13, 20). SNRs have been identified in a number of bacterial species and have been used for multilocus sequence typing (2, 3, 16, 20). SNR markers have been suggested as part of a hierarchical typing scheme for B. anthracis, but their actual use, including target or primer sequences, has not been described (9). This paper describes the discovery and analysis of SNR loci of B. anthracis, the comparison of these loci among B. anthracis strains with sequenced genomes, and the use of the most polymorphic loci to differentiate isolates that are indistinguishable by MLVA.
MATERIALS AND METHODS
Bacterial strains and DNA isolation. B. anthracis strains were obtained from DRDC Suffield and the National Microbiology Laboratory, Public Health Agency of Canada (Table 1). The strains were grown overnight on sheep blood agar plates at 37°C in 5% CO2. Genomic DNA was isolated by using the MasterPure DNA & RNA purification kit (Epicentre Biotechnologies, Madison, WI), Phase Lock Gel (Eppendorf, Westbury, NY), the GNOME DNA isolation kit (QBiogene, Irvine, CA), or comparable methods.
MLVA. MLVA of B. anthracis strain DNA was performed for the loci described by Keim et al. (8). The PCR mixtures contained 1× AmpliTaq Gold PCR buffer and 0.5 U of AmpliTaq Gold DNA polymerase (Applied Biosystems Inc., Foster City, CA), 2 mM MgCl2 (for amplification of the vrrA, vrrB1, vrrC1, and vrrC2 loci) or 4 mM MgCl2 (for amplification of the vrrB2, CG3, pXO1-aat, and pXO2-at loci), deoxynucleoside triphosphates (dNTPs; 0.2 mM each), and forward and reverse primers (0.2 μM each). Approximately 2 ng of template DNA was used per 50-μl reaction mixture. A phosphoramidite fluorescent dye (6-carboxyfluorescein [FAM] or hexachlorofluorescein [HEX]) covalently linked to the forward primer allowed direct analysis of the amplicons. If the amplicons were to be sequenced, unlabeled forward and reverse primers were used. The thermocycling conditions were 95°C for 5 min; 35 cycles of 94°C for 30 s, 60°C for 30 s, and 65°C for 30 s; and finally, 65°C for 7 min. HiDi formamide (8 μl) (Applied Biosystems Inc.) and 1 μl of the diluted PCR products were combined with 1 μl of size standard Rhodamine-X Mapmaker 70 to 400 bp and CST ROX 420-800 (BioVentures Inc., Murfreesboro, TN). These products were analyzed on an ABI 3100 genetic analyzer and were sized by using GeneMapper (Applied Biosystems Inc.).
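The reaction recipe above can be converted into pipetting volumes with C1V1 = C2V2. The following is a minimal sketch only: the stock concentrations (10× buffer, 25 mM MgCl2, 10 mM of each dNTP, 10 μM primers, polymerase at 5 U/μl, template at 1 ng/μl) are assumptions rather than values given in the paper, and the function names are hypothetical.

```python
from typing import Dict


def component_volume(stock: float, final: float, total_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1; stock and final must share units."""
    if final > stock:
        raise ValueError("final concentration cannot exceed the stock")
    return final * total_ul / stock


def mlva_reaction(
    total_ul: float = 50.0,
    mgcl2_mm: float = 2.0,              # 2 mM or 4 mM, depending on the locus
    primer_um: float = 0.2,             # final concentration of each primer
    template_ng: float = 2.0,
    template_ng_per_ul: float = 1.0,    # assumed template stock
    polymerase_u: float = 0.5,
    polymerase_u_per_ul: float = 5.0,   # assumed enzyme stock
) -> Dict[str, float]:
    """Per-reaction volumes (ul) for one MLVA PCR under the assumed stocks."""
    volumes = {
        "10x buffer": component_volume(10.0, 1.0, total_ul),
        "25 mM MgCl2": component_volume(25.0, mgcl2_mm, total_ul),
        "10 mM dNTPs (each)": component_volume(10.0, 0.2, total_ul),
        "10 uM forward primer": component_volume(10.0, primer_um, total_ul),
        "10 uM reverse primer": component_volume(10.0, primer_um, total_ul),
        "polymerase": polymerase_u / polymerase_u_per_ul,
        "template": template_ng / template_ng_per_ul,
    }
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    volumes["water"] = water  # make up to the final reaction volume
    return volumes


if __name__ == "__main__":
    # Loci amplified with 4 mM MgCl2 in the recipe above.
    for name, vol in mlva_reaction(mgcl2_mm=4.0).items():
        print(f"{name:>22}: {vol:5.2f} ul")
```

For a master mix, multiply each value by the number of reactions plus a small excess, and add template separately to each tube.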
Some of the MLVA amplicons were sequenced to establish the size of the amplicon and the VNTR. These MLVA PCR products were purified by using Montage PCR96 plates (Millipore, Nepean, Ontario, Canada) and were sequenced by using the MLVA primers as sequencing primers. All sequencing reactions were carried out in 20-μl reaction mixtures with Big Dye 3.1 Terminator chemistry (Applied Biosystems Inc.), and the sequences were analyzed on an ABI 3100 automated sequencer (Applied Biosystems Inc.). Contig assembly of the B. anthracis MLVA locus sequences was performed with Sequencher (Gene Codes Corp., Ann Arbor, MI).
Bioinformatics and primer design. The B. anthracis sequences used in the in silico SNR analysis are summarized in Table 2. The FASTA-format sequences were processed with a Perl script to identify all A or T mononucleotide repeats longer than 6 nucleotides (putative polymorphisms); the position of each repeat was then used to create a primer3 input file. For each putative polymorphism, the primer3 input file contained the repeat plus 250 bases of flanking sequence on each side, with the repeat specified as the target for primer design (18). Repeats with less than 250 bases of flanking sequence on either side were not considered in this analysis. Primer3 was run with the default options, producing five primer pairs (primer pairs 0 to 4) for each target. The primer3 output was processed with a Perl script to extract the amplicon sequences and write them to a FASTA file, which was then used as input for tgicl (http://www.tigr.org/tdb/tgi/publications/TGICL.pdf) to cluster similar amplicons. The tgicl software generated a cluster file that assigns each sequence to a unique cluster without assembling the sequences. Any cluster that did not contain at least one repeat longer than 9 bases was then removed from the analysis; this threshold, the equivalent of three codons, yielded a manageable data set. Nei's marker diversity index was calculated for each cluster as D = 1 − Σpᵢ², where pᵢ is the frequency of allele i, using the predicted amplicon sizes for the first primer pair (primer pair 0) as alleles.
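The Perl scripts used for repeat finding and primer3 input generation are not given. The Python sketch below shows one way to reproduce those two steps, reading "A-T polymorphisms" as runs of a single A or T nucleotide. It assumes primer3 2.x Boulder-IO tags (`SEQUENCE_ID`, `SEQUENCE_TEMPLATE`, `SEQUENCE_TARGET` as start,length with the default 0-based `PRIMER_FIRST_BASE_INDEX`); the function names are hypothetical, and the sketch only writes the input file, it does not run primer3 or tgicl.

```python
import re
from typing import Iterator, List, Tuple

MIN_RUN = 7   # "lengths of >6 nucleotides"
FLANK = 250   # flanking bases required on each side of the repeat

# A run of a single nucleotide: all A or all T.
HOMOPOLYMER = re.compile(r"A{%d,}|T{%d,}" % (MIN_RUN, MIN_RUN))


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, upper-case sequence) pairs from FASTA text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def find_repeats(seq: str) -> List[Tuple[int, int]]:
    """Return (start, length) of A/T runs that have enough flanking sequence."""
    hits = []
    for m in HOMOPOLYMER.finditer(seq):
        start, end = m.start(), m.end()
        # Skip repeats with less than 250 bases of flank on either side.
        if start >= FLANK and len(seq) - end >= FLANK:
            hits.append((start, end - start))
    return hits


def primer3_record(name: str, seq: str, start: int, length: int) -> str:
    """One Boulder-IO record: 250 nt + repeat + 250 nt, repeat as the target."""
    template = seq[start - FLANK : start + length + FLANK]
    return (
        f"SEQUENCE_ID={name}_{start}\n"
        f"SEQUENCE_TEMPLATE={template}\n"
        f"SEQUENCE_TARGET={FLANK},{length}\n"  # 0-based start,length
        "=\n"                                  # end-of-record marker
    )


def build_primer3_input(fasta_text: str) -> str:
    """Concatenate records for every qualifying repeat in every sequence."""
    return "".join(
        primer3_record(name, seq, start, length)
        for name, seq in read_fasta(fasta_text)
        for start, length in find_repeats(seq)
    )
```

Because the template always starts 250 bases before the repeat, the target offset is the constant `FLANK`, which keeps every record directly comparable.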
Single-nucleotide repeat analysis. Primers were synthesized for chromosomal clusters with at least six members and plasmid clusters with at least four members that had a diversity index of >0.41; a phosphoramidite fluorescent dye (FAM or HEX) was covalently linked to the forward primer (Table 3). For sequencing, the primers carried 5' M13 tails (forward primers) or T7 tails (reverse primers), and the M13 and T7 primers were then used as sequencing primers. The PCR mixtures contained 1× AmpliTaq Gold PCR buffer and 0.5 U of AmpliTaq Gold DNA polymerase (Applied Biosystems Inc.), 2 mM MgCl2, dNTPs (0.2 mM each), and forward and reverse primers (0.4 μM each). Approximately 2 ng of template DNA was used per 25-μl reaction mixture. The thermocycling conditions were 95°C for 5 min; 35 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 30 s; and finally, 72°C for 7 min. Phusion high-fidelity DNA polymerase (New England Biolabs, Inc., Pickering, Ontario, Canada) was also used to reduce split peaks. These PCR mixtures contained 1× Phusion HF buffer (with MgCl2), 1 U of Phusion DNA polymerase (New England Biolabs, Inc.), dNTPs (0.2 mM each), and forward and reverse primers (0.4 μM each), with approximately 2 ng of template DNA per 25-μl reaction mixture. The thermocycling conditions were 98°C for 30 s; 35 cycles of 98°C for 10 s, 60°C for 30 s, and 72°C for 30 s; and finally, 72°C for 7 min. PCR products labeled with fluorescent dyes were diluted 1/80. HiDi formamide (8 μl) and 1 μl of the diluted PCR products were combined with 1 μl of size standard Rhodamine-X Mapmaker 70 to 400 bp and CST ROX 420-800 (BioVentures Inc.). These products were analyzed on an ABI 3100 genetic analyzer and sized by using GeneMapper (Applied Biosystems Inc.). When required, the amplicons were sequenced as described above to confirm the polymorphisms.
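Nei's diversity index, D = 1 − Σpᵢ², and the primer-selection thresholds above (at least six members for chromosomal clusters, at least four for plasmid clusters, D > 0.41) can be expressed as two short functions. This sketch treats each predicted primer-pair-0 amplicon size in a cluster as one allele observation and uses the plain 1 − Σpᵢ² form rather than a sample-size-corrected estimator; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def nei_diversity(alleles: Sequence[int]) -> float:
    """D = 1 - sum(p_i ** 2), p_i = frequency of allele i among observations."""
    if not alleles:
        raise ValueError("at least one allele observation is required")
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def select_cluster(alleles: Sequence[int], on_plasmid: bool,
                   min_d: float = 0.41) -> bool:
    """Apply the member-count and diversity thresholds quoted in the text."""
    min_members = 4 if on_plasmid else 6
    return len(alleles) >= min_members and nei_diversity(alleles) > min_d


if __name__ == "__main__":
    # Hypothetical predicted amplicon sizes (bp) for one chromosomal cluster.
    sizes = [212, 212, 213, 213, 214, 215]
    print(round(nei_diversity(sizes), 3), select_cluster(sizes, on_plasmid=False))
```

A monomorphic cluster gives D = 0, and D approaches 1 as alleles become more numerous and evenly represented, which is why it serves as a proxy for how informative a locus is likely to be.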
RESULTS
In silico analysis of the SNR loci of B. anthracis revealed 74 regions within the B. anthracis genome that had A or T mononucleotide repeats longer than 9 bp in at least one strain used in the analysis. Sixty of these regions appeared to be polymorphic. Initial PCR screening of 39 candidate loci from the in silico analysis, with D values ranging from 0 to 0.75, was performed with seven B. anthracis strains of various MLVA types. The level of polymorphism detected by sequencing and fragment analysis of the SNR loci correlated well with the D values from the in silico analysis: the higher the D value, the greater the number of alleles present at a given SNR locus. Most of the SNR loci of interest were chromosomal; the exceptions were four loci on the pXO2 plasmid and one locus on the pXO1 plasmid. Most of the repeats do not appear to lie in coding regions of identified proteins, although several were found in regions coding for hypothetical proteins. Their location outside coding regions is notable, because length changes there do not disrupt a reading frame, so there is less selective pressure to maintain repeat length.
From this preliminary screen of 39 candidate loci, 29 loci exhibiting polymorphisms (Table 4) were screened against a larger panel of B. anthracis strains (Table 1). Seven SNR primer sets (for loci CL50, CL47, CL32, CL63, CL55, CL77, and CL35) amplified the SNR loci inconsistently or poorly and were not used further in this study. The remaining 22 SNR primer sets reproducibly amplified the SNR loci and were used for the subsequent analysis of B. anthracis strains of established MLVA types (Table 5). Amplicon sizes determined by fragment size analysis and/or direct sequencing, and the differences in amplicon length between strains, generally agreed with the in silico predictions (Table 6). Observed amplicon lengths differed from expected lengths by ±1 to ±3 bp; this offset was consistent among amplicons produced with the same primer pair and is likely due to the 5' modification of the primer. Many SNR amplicons produced multiple split peaks when analyzed on the ABI 3100 genetic analyzer, possibly due in part to incomplete 3' adenylation of amplicons by Taq polymerase. Phusion polymerase, which produces blunt-ended products and thus eliminates variation due to incomplete 3' adenylation, was used with some success to reduce split peaks. Analysis on the ABI 3100 genetic analyzer usually produced a cluster of two or four peaks within 3 bp of each other; the first and fourth peaks, when present, had substantially less fluorescence than the second and third peaks. The peak with the highest fluorescence in each cluster was used to size the amplicon, and the polymorphic loci were sequenced directly to confirm the polymorphisms. 
Sequence analysis allowed the nucleotide sequences of the loci to be compared; however, some loci (CL10, CL33, and CL12) were difficult to sequence directly, possibly because of the length of the repeat region. For these three loci, the in silico data and successful direct sequencing of several strains allowed the size of the SNR to be established to within ±2 bp. At the CL1 locus, the amplicon contains multiple SNR regions whose individual polymorphisms mask one another, so fragment analysis was not informative; sequence analysis, however, distinguished the alleles. Many of the SNR loci showed limited polymorphism among the B. anthracis strains screened (Table 6).
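Sizing a split-peak cluster by its most fluorescent peak, as described above, reduces to a simple grouping rule. The sketch assumes peaks have already been sized against the internal standard and are available as (size, height) pairs. The 3-bp window comes from the text; the names and the choice to measure the window from the first peak of each group are assumptions.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    size_bp: float  # size assigned against the internal size standard
    height: float   # peak fluorescence (arbitrary units)


def call_alleles(peaks: List[Peak], window_bp: float = 3.0) -> List[float]:
    """Group peaks lying within `window_bp` of a group's first peak and
    report the size of the most fluorescent peak in each group."""
    clusters: List[List[Peak]] = []
    for peak in sorted(peaks):  # NamedTuples sort by size_bp first
        if clusters and peak.size_bp - clusters[-1][0].size_bp <= window_bp:
            clusters[-1].append(peak)
        else:
            clusters.append([peak])
    return [max(cluster, key=lambda p: p.height).size_bp for cluster in clusters]
```

With the typical four-peak pattern (weak first and fourth peaks), the call lands on the second or third peak, whichever is brighter.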
The B. anthracis strains that were indistinguishable from each other by standard MLVA (Table 5) were often distinguishable at one or several SNR loci (Table 6). Several SNR loci had unique alleles for isolates of a specific MLVA genotype, although this may reflect the small sample sizes for these MLVA genotypes.
DISCUSSION
In an outbreak situation, the use of a marker system that exploits regions with very high mutation rates, such as SNRs, allows the differentiation of isolates with extremely low levels of genetic diversity. This paper describes the selection and the in silico analysis of SNR loci. Those loci with the highest diversity indices were selected and used to analyze a number of B. anthracis strains that had the same MLVA genotype. These SNR markers allowed most strains with the same MLVA type to be differentiated from each other.
Molecular typing of B. anthracis has been made possible by the exploitation of VNTRs in MLVA. VNTR mutation rates are low enough that allele sizes are expected to change only about once in 100,000 generations (8). MLVA mutation rates are locus dependent but have been reported to be between 10⁻⁵ and 10⁻⁴ per generation in B. anthracis and greater than 10⁻³ mutations per generation in other bacterial species (9). Using MLVA markers beyond the eight used in this study might differentiate the strains examined here. However, some SNR markers have higher mutation rates, and perhaps higher diversity index values, than most MLVA markers and therefore offer the best chance of discriminating between isolates with low levels of genetic diversity. Because their high mutation rates make homoplasy likely, these markers may not be appropriate for establishing phylogenetic relationships among diverse isolates (9). These SNR markers are best used as a molecular epidemiological tool for examining very closely related isolates that are indistinguishable by MLVA, allowing closely related strains to be distinguished more accurately at the terminal branches of the phylogenetic tree.
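The rates quoted above translate directly into expected numbers of changes. This back-of-the-envelope sketch assumes a constant per-generation rate at a single locus in a single lineage, with independent generations; it is not a population-genetic model. At about 10⁻⁵ per generation, roughly one change is expected in 100,000 generations, consistent with (8). At the SNR rate of up to 6.0 × 10⁻⁴ cited in the Introduction, about 60 are expected. This is why SNRs can resolve recent divergence but are prone to homoplasy over longer timescales.

```python
import math


def expected_changes(rate: float, generations: int) -> float:
    """Expected number of allele-size changes: rate x generations."""
    return rate * generations


def prob_any_change(rate: float, generations: int) -> float:
    """P(at least one change) = 1 - (1 - rate) ** generations, computed via
    log1p/expm1 for numerical stability with very small rates."""
    return -math.expm1(generations * math.log1p(-rate))


if __name__ == "__main__":
    n = 100_000
    for label, rate in [("MLVA-like locus", 1e-5), ("fast SNR locus", 6.0e-4)]:
        print(f"{label}: {expected_changes(rate, n):.1f} expected changes, "
              f"P(any change) = {prob_any_change(rate, n):.3f}")
```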
Alternative molecular typing methods that could provide isolate discrimination include whole-genome sequencing and microarray-based resequencing. Whole-genome sequencing reveals polymorphisms for typing purposes by allowing entire genomes of isolates of interest to be compared; however, this technique remains cost prohibitive and is not feasible for large sample sizes (17). Microarray-based resequencing has been carried out with 56 strains of B. anthracis (22). Resequencing allows large areas of the genome to be surveyed for strain-specific variation; however, because only a portion of the genome is examined, the selection of the regions represented on the chip is crucial. Although this technique is well suited to large sample sizes, it is cost prohibitive, and its discriminatory power depends on which portions of the genome are represented on the array.
As expected, there is a positive correlation between diversity (D) and repeat length, since longer mononucleotide SSRs are more likely to undergo slipped-strand mispairing, resulting in greater variability in repeat length (1). A highly significant correlation between total repeat length and the number of alleles has been described for B. anthracis tandem repeats with larger repeat units (11). In our study, some plasmid SNRs were among the most polymorphic markers evaluated. Unlike chromosomal loci, plasmid loci are present in multiple copies per cell, so it may be possible to detect transient states of SNRs.
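The text does not state which statistic was used to assess the correlation between D and repeat length. One reasonable choice, assumed here, is Spearman's rank correlation, which only requires D to rise monotonically with length rather than linearly. A self-contained sketch (average ranks for ties, then the Pearson correlation of the ranks), with hypothetical function names:

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `x` would hold the longest repeat length per cluster and `y` the corresponding D values; a rho near +1 indicates that longer repeats are consistently more diverse.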
The number of SNR loci that differed between strains with identical MLVA genotypes varied. Two loci differed among strains 9609, 9614, and 93-189C; two loci differed between Vollum and Vollum 1B; and seven loci differed between 17T5 and SK31, although these two strains also differed at the pXO1 MLVA locus. When nine other B. anthracis strains with the same MLVA genotype were compared (strains 9604, 9807, 9911, 03-0139, 03-0191, 9937, 9946, 94-188C, and 200077), 22 different alleles were identified across seven loci. Seven of the nine strains were distinguished from each other by using four SNR loci (CL10, CL12, CL33, and CL76). While our study demonstrates that SNR analysis allows strains with the same MLVA genotype to be further distinguished, two sets of isolates were not readily distinguished from each other. One set comprised strains 9609 and 9614, which were from the same outbreak. The other set of SNR-identical isolates comprised 03-0191 and 03-0139, which were both isolated from bovines in Manitoba, Canada, in 2003; the circumstances of their isolation are unclear (they may have come from the same outbreak or even the same animal). Interestingly, strains 9937 and 9946, although isolated in the same year at the same location (Alhambra, Alberta, Canada), had distinct SNR genotypes. Using the most polymorphic SNR markers (CL10, CL12, CL33, CL76, and CL60) may be a prudent first step in attempting to distinguish isolates with identical MLVA genotypes, but any one of the SNR markers (Table 6) may prove to be discriminatory between isolates. The technique is laborious and may not be suited to high-throughput automation; it requires specialized molecular typing equipment but can easily be adopted by laboratories that perform MLVA. Because it allows isolates of B. anthracis to be distinguished when other typing methods fail to discriminate them, it may be the only technique that offers the required discriminatory power in epidemiological studies or forensic investigations.
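Comparisons like those above (which loci differ between two isolates, which isolates remain identical, and which small marker set separates the most isolates) can be automated once allele sizes are tabulated. The sketch is illustrative only. The profile layout, the function names, and the greedy panel-selection heuristic are assumptions, not the procedure used in the study, and the greedy choice is not guaranteed to find the smallest discriminating panel.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Profile = Dict[str, int]  # locus name -> allele (amplicon size in bp)


def differing_loci(a: Profile, b: Profile) -> List[str]:
    """Loci typed in both isolates whose alleles differ."""
    return sorted(locus for locus in set(a) & set(b) if a[locus] != b[locus])


def identical_groups(profiles: Dict[str, Profile],
                     loci: List[str]) -> List[List[str]]:
    """Groups of two or more isolates sharing alleles at every listed locus."""
    groups: Dict[Tuple[int, ...], List[str]] = {}
    for isolate, profile in profiles.items():
        key = tuple(profile[locus] for locus in loci)
        groups.setdefault(key, []).append(isolate)
    return [sorted(members) for members in groups.values() if len(members) > 1]


def separated(profiles: Dict[str, Profile],
              pairs: Set[Tuple[str, str]], locus: str) -> int:
    """Number of isolate pairs whose alleles differ at `locus`."""
    return sum(profiles[a][locus] != profiles[b][locus] for a, b in pairs)


def greedy_panel(profiles: Dict[str, Profile], loci: List[str]) -> List[str]:
    """Heuristic: repeatedly add the locus that separates the most pairs not
    yet separated; stop when no remaining locus helps."""
    unresolved = set(combinations(sorted(profiles), 2))
    remaining, panel = list(loci), []
    while unresolved and remaining:
        gains = {locus: separated(profiles, unresolved, locus)
                 for locus in remaining}
        best = max(remaining, key=gains.__getitem__)
        if gains[best] == 0:
            break
        panel.append(best)
        remaining.remove(best)
        unresolved = {(a, b) for a, b in unresolved
                      if profiles[a][best] == profiles[b][best]}
    return panel
```

With hypothetical profiles in which two isolates share every allele, `identical_groups` returns that pair, mirroring the SNR-identical sets described above.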
Screening of the SNR markers (Table 3) against a more genetically and geographically diverse group of B. anthracis isolates to determine the full breadth of the SNR polymorphisms would be an important next step. Also, testing of these markers against a larger group of isolates with the same MLVA genotype is crucial in order to determine the utility of these markers for the differentiation of B. anthracis isolates and for epidemiological analysis of B. anthracis outbreaks.
ACKNOWLEDGMENTS
This work was supported by the Chemical, Biological, Radiological, Nuclear Research and Technology Initiative (CRTI project 02-0069RD).
The excellent technical assistance of D. Johnstone, T. MacMillan, and M. Russell is noted with appreciation. Thanks go to Barry Ford and John Cherwonogrodzky for reviewing this work.
REFERENCES
Coenye, T., and P. Vandamme. 2003. Simple sequence repeats and compositional bias in the bipartite Ralstonia solanacearum GMI1000 genome. BMC Genomics 4:10. http://www.biomedcentral.com/1471-2164/4/10.
Diamant, E., Y. Palti, R. Gur-Arie, H. Cohen, E. M. Hallerman, and Y. Kashi. 2004. Phylogeny and strain typing of Escherichia coli, inferred from mononucleotide repeat loci. Appl. Environ. Microbiol. 70:2464-2473.
Gur-Arie, R., C. J. Cohen, Y. Eitan, L. Shelef, E. M. Hallerman, and Y. Kashi. 2000. Simple sequence repeats in Escherichia coli: abundance, distribution, composition and polymorphism. Genome Res. 10:62-71.
Harrell, L. J., G. L. Andersen, and K. H. Wilson. 1995. Genetic variability of Bacillus anthracis and related species. J. Clin. Microbiol. 33:1847-1850.
Helgason, E., D. A. Caugant, M. M. Lecadet, Y. H. Chen, J. Mahillon, A. Løvgren, I. Hegna, K. Kvaløy, and A.-B. Kolstø. 1998. Genetic diversity of Bacillus cereus/B. thuringiensis isolates from natural sources. Curr. Microbiol. 37:80-87.
Hill, K. K., L. O. Ticknor, R. T. Okinaka, M. Asay, H. Blair, K. A. Bliss, M. Laker, P. E. Pardington, A. P. Richardson, M. Tonks, D. J. Beecher, J. D. Kemp, A.-B. Kolstø, A. C. Lee Wong, P. Keim, and P. J. Jackson. 2004. Fluorescent amplified fragment length polymorphism analysis of Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis isolates. Appl. Environ. Microbiol. 70:1068-1080.
Keim, P., A. Kalif, J. M. Schupp, K. K. Hill, S. E. Travis, K. Richmond, D. M. Adair, M. E. Hugh-Jones, C. R. Kuske, and P. Jackson. 1997. Molecular evolution and diversity in Bacillus anthracis as detected by amplified fragment length polymorphism markers. J. Bacteriol. 179:818-824.
Keim, P., L. B. Price, A. M. Klevytska, K. L. Smith, J. M. Schupp, R. Okinaka, P. J. Jackson, and M. E. Hugh-Jones. 2000. Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J. Bacteriol. 182:2928-2936.
Keim, P., M. N. Van Ert, T. Pearson, A. J. Vogler, L. Y. Huynh, and D. M. Wagner. 2004. Anthrax molecular epidemiology and forensics: using the appropriate marker for different evolutionary scales. Infect. Genet. Evol. 4:205-213.
Koehler, T. M. 2002. Bacillus anthracis genetics and virulence gene regulation. Curr. Top. Microbiol. Immunol. 271:143-164.
Le Fleche, P., Y. Hauck, L. Onteniente, A. Prieur, F. Denoeud, V. Ramisse, P. Sylvestre, G. Benson, F. Ramisse, and G. Vergnaud. 2001. A tandem repeats database for bacterial genomes: application to the genotyping of Yersinia pestis and Bacillus anthracis. BMC Microbiol. 1:2. http://www.biomedcentral.com/1471-2180/1/2.
Levinson, G., and G. A. Gutman. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203-221.
Moxon, E. R., and P. B. Rainey. 1995. Pathogenic bacteria: the wisdom of their genes, p. 255-268. In B. A. M. Van der Zeijst, L. Van Alphen, W. P. M. Hoekstra, and J. D. A. van Embden (ed.), Ecology of pathogenic bacteria. Royal Dutch Academy of Sciences, second series, no. 96. Royal Dutch Academy of Sciences, Amsterdam, The Netherlands.
Okinaka, R., K. Cloud, O. Hampton, A. Hoffmaster, K. Hill, P. Keim, T. M. Koehler, G. Lamke, S. Kumano, J. Mahillon, D. Manter, Y. Martinez, D. Ricke, R. Svensson, and P. J. Jackson. 1999. Sequence and organization of pXO1, the large Bacillus anthracis plasmid harboring the anthrax toxin genes. J. Bacteriol. 181:6509-6515.
Pearson, T., J. D. Busch, J. Ravel, T. D. Read, S. D. Rhoton, J. M. U'Ren, T. S. Simonson, S. M. Kachur, R. R. Leadem, M. L. Cardon, M. N. Van Ert, L. Y. Huynh, C. M. Fraser, and P. Keim. 2004. Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing. Proc. Natl. Acad. Sci. USA 101:13536-13541.
Read, T. D., S. L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. D. Busch, K. L. Smith, J. M. Schupp, D. Solomon, P. Keim, and C. M. Fraser. 2002. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science 296:2028-2033.
Read, T. D., S. N. Peterson, N. Tourasse, L. W. Baillie, I. T. Paulsen, K. E. Nelson, H. Tettelin, D. E. Fouts, J. A. Eisen, S. R. Gill, E. K. Holtzapple, O. A. Okstad, E. Helgason, J. Rilstone, M. Wu, J. F. Kolonay, M. J. Beanan, R. J. Dodson, L. M. Brinkac, M. Gwinn, R. T. Deboy, R. Madupu, S. C. Daugherty, A. S. Durkin, D. H. Haft, W. C. Nelson, J. D. Peterson, M. Pop, H. M. Khouri, D. Radune, J. L. Benton, Y. Mahamoud, L. Jiang, I. R. Hance, J. F. Weidman, K. J. Berry, R. D. Plaut, A. M. Wolf, K. L. Watkins, W. C. Nierman, A. Hazen, R. Cline, C. Redmond, J. E. Thwaite, O. White, S. L. Salzberg, B. Thomason, A. M. Friedlander, T. M. Koehler, P. C. Hanna, A. B. Kolstø, and C. M. Fraser. 2003. The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423:81-86.
Rozen, S., and H. J. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers, p. 365-386. In S. Krawetz and S. Misener (ed.), Bioinformatics methods and protocols: methods in molecular biology. Humana Press, Totowa, N.J.
Ticknor, L. O., A.-B. Kolstø, K. K. Hill, P. Keim, M. T. Laker, M. Tonks, and P. J. Jackson. 2001. Fluorescent amplified fragment length polymorphism analysis of Norwegian Bacillus cereus and Bacillus thuringiensis soil isolates. Appl. Environ. Microbiol. 67:4863-4873.
van Belkum, A., S. Scherer, L. van Alphen, and H. Verbrugh. 1998. Short-sequence DNA repeats in prokaryotic genomes. Microbiol. Mol. Biol. Rev. 62:275-293.
Vogler, A. J., J. D. Busch, S. Percy-Fine, C. Tipton-Hunton, K. L. Smith, and P. Keim. 2002. Molecular analysis of rifampin resistance in Bacillus anthracis and Bacillus cereus. Antimicrob. Agents Chemother. 46:511-513.
Zwick, M. E., F. Mcafee, D. J. Cutler, T. D. Read, J. Ravel, G. R. Bowman, D. R. Galloway, and A. Mateczun. 2004. Microarray-based resequencing of multiple Bacillus anthracis isolates. Genome Biol. 6:R10.
(Chad W. Stratilo, Christo)
Department of Computer Science, University of Saskatchewan, Saskatoon, SK
Public Health Agency of Canada, Winnipeg, MB, Canada
ABSTRACT
Single-nucleotide repeats (SNRs) are variable-number tandem repeats that display very high mutation rates. In an outbreak situation, the use of a marker system that exploits regions with very high mutation rates, such as SNRs, allows the differentiation of isolates with extremely low levels of genetic diversity. This report describes the identification and analysis of SNR loci of Bacillus anthracis. SNR loci were selected in silico, and the loci with the highest diversity were used to design and test locus-specific primers against a number of B. anthracis strains with the same multilocus variable-number tandem repeat analysis (MLVA) genotype. SNR markers that allowed strains with the same MLVA genotype to be differentiated from each other were identified. The resulting SNR marker system can be used as a molecular epidemiological tool in a natural outbreak or bioterrorism event, offering the best chance of distinguishing very closely related isolates.
INTRODUCTION
Bacillus anthracis, the causative agent of anthrax, is a spore-forming bacterium endemic in soils throughout much of the world. Herbivores are the natural hosts, which become infected by contact with spore-containing soil. Humans are usually infected by exposure to infected animals or their products. Virulent strains of B. anthracis contain two virulence plasmids, pXO1 and pXO2. These plasmids contain genes that confer toxin production and capsule synthesis activities, respectively, although there are chromosomally encoded factors that are important for the full virulence of B. anthracis (10).
B. anthracis belongs to the B. cereus group and is most closely related to B. cereus and B. thuringiensis. Multilocus enzyme electrophoresis and fluorescent amplified fragment length polymorphism (AFLP) analysis of the B. cereus group revealed a high degree of genetic variability but failed to identify distinct groups (5, 19). Although B. cereus and B. thuringiensis are broadly interspersed across all branches of the AFLP phylogenetic tree, B. anthracis shows very low genetic diversity and clusters to a subbranch of the phylogenetic tree that is distinct from branches where other members of the B. cereus group cluster (6).
Due to recent bioterrorism events, there has been an increased interest in B. anthracis, especially in its identification, detection, and molecular subtyping. B. anthracis is considered to be evolutionarily "young," lacking character homoplasy and containing few single-nucleotide polymorphisms (SNPs) (15). This lack of homoplasy may be due to its life history, which includes long periods of time as dormant endospores. B. anthracis is among the most monomorphic pathogenic bacteria described. Molecular typing techniques commonly used to differentiate between strains of other species generally fail to discriminate between B. anthracis strains, including AFLP (6, 7), multilocus sequence typing (14), and pulsed-field gel electrophoresis (4).
Several molecular typing methods, including SNP analysis and multilocus variable-number tandem repeat analysis (MLVA), have been more successful in discriminating between B. anthracis strains and have allowed the exploration of its phylogenetics. SNPs are rare in B. anthracis, but molecular typing by the use of these polymorphisms is possible due to the availability of multiple whole-genome sequences. SNP phylogenetic markers are evolutionarily stable, with mutation rates of approximately 10–10 changes per nucleotide per generation (21). A set of canonical SNPs that distinguish the major clades of B. anthracis has been developed (9).
An MLVA method that exploits differences in repeat copy number at six chromosomal loci and at one locus on each of the two plasmids has been developed (8). MLVA loci have higher mutation rates and far more allelic states than SNPs.
B. anthracis isolates obtained during a natural outbreak or a bioterrorism event would have an extremely low level of genetic diversity, and canonical SNP analysis and MLVA may fail to distinguish such isolates or closely related strains. To identify polymorphisms in populations with extremely low genetic diversity, one can examine "hot spots," regions of the genome with very high mutation rates. Single-nucleotide repeats (SNRs), also referred to as mononucleotide repeats, are a type of variable-number tandem repeat (VNTR) that display very high mutation rates (as high as $6.0 \times 10^{-4}$ mutations per generation) (9). Unlike some VNTR loci, which have complex repeat structures, SNRs are runs of a single nucleotide whose length may vary between bacterial isolates as a result of slip-strand mispairing (12). SNRs are more likely than other types of simple sequence repeats (SSRs) to undergo strand separation and base pair slippage, which increases the chance of slip-strand mispairing and thus of mutation at the SNR locus (1, 3). SSR analysis of Escherichia coli revealed that 93% of all mononucleotide repeats were A or T (3). The lower melting temperature of A-T-rich sequences destabilizes the DNA helix, which in theory increases the likelihood of slip-strand mispairing and may explain the A-T bias of SNRs (13, 20). SNRs have been identified in a number of bacterial species and have been used for multilocus sequence typing (2, 3, 16, 20). SNR markers have been suggested as part of a hierarchical typing scheme for B. anthracis, but their actual use, including target or primer sequences, has not been described (9). This paper describes the discovery and analysis of SNR loci of B. anthracis, the comparison of these loci among B. anthracis strains with sequenced genomes, and the use of the most polymorphic loci to differentiate isolates that are indistinguishable by MLVA.
MATERIALS AND METHODS
Bacterial strains and DNA isolation. B. anthracis strains were obtained from DRDC Suffield and the National Microbiology Laboratory, Public Health Agency of Canada (Table 1). The strains were grown overnight on sheep blood agar plates at 37°C in 5% CO2. DNA was isolated with the MasterPure DNA & RNA purification kit (Epicentre Biotechnologies, Madison, WI), Phase Lock Gels (Eppendorf, Westbury, NY), the GNOME DNA isolation kit (QBiogene, Irvine, CA), or a comparable method.
MLVA. MLVA of B. anthracis strain DNA was performed for the loci described by Keim et al. (8). The PCR mixtures contained 1x AmpliTaq Gold PCR buffer and 0.5 U of AmpliTaq Gold DNA polymerase (Applied Biosystems Inc., Foster City, CA), 2 mM MgCl2 (for amplification of the vrrA, vrrB1, vrrC1, and vrrC2 loci) or 4 mM MgCl2 (for amplification of the vrrB2, CG3, pXO1-aat, and pXO2-at loci), deoxynucleoside triphosphates (dNTPs; 0.2 mM each), and forward and reverse primers (0.2 μM each). Approximately 2 ng of template DNA was used per 50-μl reaction mixture. A phosphoramidite fluorescent dye (6-carboxyfluorescein [FAM] or hexachlorofluorescein [HEX]), covalently linked to the forward primer, allowed direct analysis of the amplicons. If the amplicons were to be sequenced, unlabeled forward and reverse primers were used. The thermocycling conditions were 95°C for 5 min; 35 cycles of 94°C for 30 s, 60°C for 30 s, and 65°C for 30 s; and finally, 65°C for 7 min. HiDi formamide (8 μl) (Applied Biosystems Inc.) and 1 μl of the diluted PCR products were combined with 1 μl of size standard Rhodamine-X Mapmaker 70 to 400 bp and CST ROX 420-800 (BioVentures Inc., Murfreesboro, TN). These products were analyzed on an ABI 3100 genetic analyzer and were sized with GeneMapper (Applied Biosystems Inc.).
Some of the MLVA amplicons were sequenced to establish the size of the amplicon and the VNTR. These MLVA PCR products were purified by using Montage PCR96 plates (Millipore, Nepean, Ontario, Canada) and were sequenced by using the MLVA primers as sequencing primers. All sequencing reactions were carried out in 20-μl reaction mixtures with Big Dye 3.1 Terminator chemistry (Applied Biosystems Inc.), and the sequences were analyzed on an ABI 3100 automated sequencer (Applied Biosystems Inc.). Contig assembly of the B. anthracis MLVA locus sequences was performed with Sequencher (Gene Codes Corp., Ann Arbor, MI).
Bioinformatics and primer design. The B. anthracis sequences used in the in silico SNR analysis are summarized in Table 2. The FASTA-format sequences were processed with a Perl script to identify all A or T mononucleotide repeats longer than 6 nucleotides; the position of each putative polymorphism was then used to create a Primer3 input file. For each putative polymorphism, the Primer3 input file contained 250 bases of flanking sequence on each side, and the repeat was specified as the target for primer design (18). Putative polymorphisms with less than 250 bases of flanking sequence on either side were not considered in this analysis. Primer3 was run with the default options, producing five primer pairs (primer pairs 0 to 4) for each amplicon. The Primer3 output was processed with a Perl script to extract the amplicon sequences and write them to a FASTA file, which was then used as input for tgicl (http://www.tigr.org/tdb/tgi/publications/TGICL.pdf) to cluster similar amplicons. The tgicl software generated a cluster file that assigns each sequence to a unique cluster without assembling the sequences. Any cluster that did not contain at least one repeat longer than 9 bases was removed from the analysis; this threshold, equivalent in length to three codons, kept the data set manageable. Nei's marker diversity index was calculated for each cluster as $D = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele, using the amplicon sizes predicted for the first primer pair (primer pair 0).
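The Perl scripts used for the in silico scan are not given in the text. A minimal Python sketch of the first step might look like the following; it finds A or T runs longer than 6 nt and writes one Primer3 record per run with 250 flanking bases on each side, skipping runs that lack full flanks. It assumes Primer3 2.x Boulder-IO tag names (SEQUENCE_ID, SEQUENCE_TEMPLATE, SEQUENCE_TARGET) and Primer3's default 0-based indexing; the function names and the record ID scheme are hypothetical.

```python
import re

# Illustrative re-implementation, not the authors' Perl script.
MIN_RUN = 7   # "lengths of >6 nucleotides"
FLANK = 250   # flanking bases required on each side of the repeat

# Matches a homopolymer of A or of T that is at least MIN_RUN long (greedy).
RUN_RE = re.compile(r"A{%d,}|T{%d,}" % (MIN_RUN, MIN_RUN))


def find_at_runs(seq):
    """Return (start, length) for each A or T run of >= MIN_RUN nt (0-based start)."""
    seq = seq.upper()
    return [(m.start(), m.end() - m.start()) for m in RUN_RE.finditer(seq)]


def primer3_records(seq_id, seq):
    """Build one Boulder-IO record per run that has FLANK bases on both sides."""
    seq = seq.upper()
    records = []
    for start, length in find_at_runs(seq):
        # Runs without a full 250-base flank on either side are skipped.
        if start < FLANK or start + length + FLANK > len(seq):
            continue
        template = seq[start - FLANK : start + length + FLANK]
        records.append(
            f"SEQUENCE_ID={seq_id}_{start}\n"
            f"SEQUENCE_TEMPLATE={template}\n"
            # Target = the repeat itself, at offset FLANK within the template.
            f"SEQUENCE_TARGET={FLANK},{length}\n"
            "=\n"  # Boulder-IO record terminator
        )
    return records
```

Concatenating the records gives a file that could be passed to primer3_core; amplicon extraction and tgicl clustering are not shown.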
Single-nucleotide repeat analysis. For clusters with a diversity index of >0.41 and at least six members (chromosomal clusters) or four members (plasmid clusters), primers were synthesized with a phosphoramidite fluorescent dye (FAM or HEX) covalently linked to the forward primer (Table 3). The PCR mixtures contained 1x AmpliTaq Gold PCR buffer, 0.5 U of AmpliTaq Gold DNA polymerase (Applied Biosystems Inc.), 2 mM MgCl2, dNTPs (0.2 mM each), and forward and reverse primers (0.4 μM each). Approximately 2 ng of template DNA was used per 25-μl reaction mixture. The thermocycling conditions were 95°C for 5 min; 35 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 30 s; and finally, 72°C for 7 min. Phusion high-fidelity DNA polymerase (New England Biolabs, Inc., Pickering, Ontario, Canada) was also used to reduce split peaks. The Phusion PCR mixtures contained 1x Phusion HF buffer (with MgCl2), 1 U of Phusion DNA polymerase, dNTPs (0.2 mM each), and forward and reverse primers (0.4 μM each), with approximately 2 ng of template DNA per 25-μl reaction mixture. The thermocycling conditions were 98°C for 30 s; 35 cycles of 98°C for 10 s, 60°C for 30 s, and 72°C for 30 s; and finally, 72°C for 7 min. PCR products labeled with phosphoramidite fluorescent dyes were diluted 1/80. HiDi formamide (8 μl) and 1 μl of the diluted PCR products were combined with 1 μl of size standard Rhodamine-X Mapmaker 70 to 400 bp and CST ROX 420-800 (BioVentures Inc.). These products were analyzed on an ABI 3100 genetic analyzer and sized with GeneMapper (Applied Biosystems Inc.). When required, the amplicons were sequenced as described above to confirm the polymorphisms. For sequencing, the locus-specific PCR primers carried 5' tails (M13 on the forward primers and T7 on the reverse primers), and the PCR products were sequenced with the M13 and T7 primers.
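Taken together with the earlier requirement that each cluster contain at least one repeat longer than 9 bases, the criteria for primer synthesis can be expressed as a simple filter. In this minimal Python sketch the Cluster record and its field names are hypothetical, and D is assumed to have been computed beforehand from predicted amplicon sizes.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    replicon: str     # "chromosome", "pXO1", or "pXO2"
    members: int      # number of genome sequences in the tgicl cluster
    max_run: int      # longest A or T run (nt) among the members
    diversity: float  # Nei's D from predicted amplicon sizes


def select_for_primer_synthesis(clusters):
    """Return names of clusters meeting the thresholds stated in the text."""
    chosen = []
    for c in clusters:
        # Chromosomal clusters need >= 6 members; plasmid clusters need >= 4.
        min_members = 6 if c.replicon == "chromosome" else 4
        if c.max_run > 9 and c.members >= min_members and c.diversity > 0.41:
            chosen.append(c.name)
    return chosen
```

The thresholds are strict where the text says ">" (repeat length, D) and inclusive where it says "at least" (member counts).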
RESULTS
In silico analysis revealed 74 regions within the B. anthracis genome that contained A or T mononucleotide repeats longer than 9 bp in at least one of the strains analyzed. Sixty of these appeared to be polymorphic and had at least one cluster member with a repeat longer than 9 bp. Initial PCR screening of 39 candidate loci from the in silico analysis, with D values ranging from 0 to 0.75, was performed with seven B. anthracis strains of various MLVA types. The level of polymorphism detected by sequencing and fragment analysis of the SNR loci correlated well with the D values from the in silico analysis: the higher the D value, the greater the number of alleles present at a given SNR locus. Most of the SNR loci of interest were chromosomal; the exceptions were four loci on the pXO2 plasmid and one locus on the pXO1 plasmid. Most of the repeats do not appear to lie in coding regions of identified proteins, although several were found in regions coding for hypothetical proteins. Their location outside coding regions is notable, because length changes there cannot disrupt a reading frame, which presumably reduces the selective pressure to maintain repeat length.
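Nei's index can be computed directly from a list of predicted or observed amplicon sizes, one per strain. This minimal Python sketch (function name hypothetical) uses the uncorrected form given in Materials and Methods rather than the small-sample n/(n − 1) correction some authors apply.

```python
from collections import Counter


def nei_diversity(alleles):
    """Nei's diversity D = 1 - sum(p_i^2) over the allele frequencies p_i.

    `alleles` holds one allele call (e.g. an amplicon size in bp) per strain.
    """
    n = len(alleles)
    if n == 0:
        raise ValueError("need at least one allele call")
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

For example, four strains sharing one amplicon size give D = 0, while four strains with four different sizes give D = 0.75; more alleles at more even frequencies push D upward, consistent with the correlation described above.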
From this preliminary screen of 39 candidate loci, 29 loci exhibiting polymorphisms (Table 4) were screened against a larger panel of B. anthracis strains (Table 1). Seven SNR primer sets (for loci CL50, CL47, CL32, CL63, CL55, CL77, and CL35) amplified the SNR loci inconsistently or poorly and were not used further in this study. The remaining 22 SNR primer sets reproducibly amplified the SNR loci and were used for the subsequent analysis of B. anthracis strains of established MLVA types (Table 5). The amplicon sizes determined by fragment size analysis and/or direct sequencing, and the differences in amplicon length between strains, generally agreed with the in silico predictions (Table 6), although observed amplicon lengths differed from expected lengths by ±1 to ±3 bp. This small offset was consistent among amplicons produced with the same primer pair and is likely due to the 5' modification of the primer. Many SNR amplicons produced multiple split peaks when analyzed on the ABI 3100 genetic analyzer, possibly due in part to incomplete 3' adenylation of amplicons by Taq polymerase. Phusion polymerase was used with some success to reduce the occurrence of split peaks; because it produces blunt-ended products, it eliminates variation in PCR products due to incomplete 3' adenylation. Analysis on the ABI 3100 genetic analyzer usually produced a cluster of two or four peaks within 3 bp of each other. The first and fourth peaks, when present, had substantially less fluorescence than the second and third peaks, and the peak with the highest fluorescence in the cluster was used to size the amplicons. The polymorphic loci were sequenced directly to confirm the polymorphisms.
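The sizing rule described above (use the highest-fluorescence peak within a cluster of peaks spanning about 3 bp) can be sketched as follows. This is an illustrative Python sketch, not the GeneMapper procedure; the input format, a list of (size in bp, peak height) pairs, and the grouping rule (a new cluster starts when a peak lies more than 3 bp from the first peak of the current cluster) are assumptions.

```python
def call_allele_sizes(peaks, window=3.0):
    """Return one size per peak cluster: the size of its tallest peak.

    `peaks` is a list of (size_bp, height) tuples from one capillary run.
    """
    calls = []
    group = []
    for size, height in sorted(peaks):  # process peaks in order of size
        # Close the current cluster once a peak falls outside the window.
        if group and size - group[0][0] > window:
            calls.append(max(group, key=lambda p: p[1])[0])
            group = []
        group.append((size, height))
    if group:
        calls.append(max(group, key=lambda p: p[1])[0])
    return calls


# Hypothetical trace: a four-peak split cluster and a two-peak cluster.
trace = [(250.1, 200), (251.0, 1500), (252.1, 1400), (253.0, 150),
         (262.0, 900), (263.0, 1200)]
print(call_allele_sizes(trace))
```

The example trace is invented for illustration. Anchoring the window on the first peak keeps a cluster from drifting across closely spaced alleles, at the cost of splitting clusters wider than 3 bp.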
Sequence analysis allowed the nucleotide sequences of the loci to be compared; however, some loci (CL10, CL33, and CL12) were difficult to sequence directly, possibly because of the length of the repeat region. For these three loci, the in silico data together with successful direct sequencing of several strains allowed the SNR length to be established to within ±2 bp. At the CL1 locus, whose amplicon contains multiple SNR regions, fragment analysis was not informative because the polymorphisms at the individual SNR regions masked one another; sequence analysis, however, distinguished the alleles. Many of the SNR loci showed limited polymorphism among the B. anthracis strains screened (Table 6).
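To see why fragment analysis alone can be uninformative at a locus such as CL1, consider two hypothetical alleles carrying two SNR regions: a one-base expansion in one run and a one-base contraction in the other leave the amplicon length unchanged, whereas the sequence reveals the difference. The sequences and function names below are invented for illustration and are not CL1 data.

```python
import re


def homopolymer_runs(seq, min_run=7):
    """Lengths, in order, of the A or T runs of at least `min_run` nt."""
    return [len(m.group()) for m in re.finditer(r"A+|T+", seq.upper())
            if len(m.group()) >= min_run]


def compare_alleles(a, b):
    """What fragment sizing sees (length) versus what sequencing sees (runs)."""
    return {
        "same_length": len(a) == len(b),
        "same_runs": homopolymer_runs(a) == homopolymer_runs(b),
    }


# Hypothetical alleles: A10 + T8 versus A9 + T9, both 26 nt in total.
allele_1 = "GC" + "A" * 10 + "GCGC" + "T" * 8 + "GC"
allele_2 = "GC" + "A" * 9 + "GCGC" + "T" * 9 + "GC"
print(compare_alleles(allele_1, allele_2))
```

Fragment analysis would report the two alleles as identical, while their run lengths ([10, 8] versus [9, 9]) differ.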
The B. anthracis strains that were indistinguishable from each other by standard MLVA analysis (Table 5) were often distinctive at one or several SNR loci (Table 6). Several SNR loci had unique alleles for isolates of a specific MLVA genotype, although this may be due to the small sample sizes for these MLVA genotypes.
DISCUSSION
In an outbreak situation, the use of a marker system that exploits regions with very high mutation rates, such as SNRs, allows the differentiation of isolates with extremely low levels of genetic diversity. This paper describes the selection and the in silico analysis of SNR loci. Those loci with the highest diversity indices were selected and used to analyze a number of B. anthracis strains that had the same MLVA genotype. These SNR markers allowed most strains with the same MLVA type to be differentiated from each other.
Molecular typing of B. anthracis has been possible because MLVA exploits VNTRs. VNTR mutation rates are low enough that allele sizes are largely maintained, with only about one change in allele size over 100,000 generations (8). MLVA mutation rates are locus dependent but have been reported to be between $10^{-5}$ and $10^{-4}$ per generation in B. anthracis and greater than $10^{-3}$ mutations per generation in other bacterial species (9). Additional MLVA markers beyond the eight used in this study might differentiate between the strains examined here. However, some SNR markers have higher mutation rates, and perhaps higher diversity indices, than most MLVA markers and therefore offer the best chance of discriminating between isolates with low levels of genetic diversity. Because their high mutation rates make homoplasy likely, these markers may not be appropriate for establishing phylogenetic relationships among diverse isolates (9). They are best used as a molecular epidemiological tool for examining very closely related isolates that are indistinguishable by MLVA, allowing closely related strains to be distinguished more accurately at the terminal branches of the phylogenetic tree.
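The difference in discriminatory power can be made concrete with a back-of-the-envelope calculation. Assuming a constant per-generation mutation rate μ and ignoring reversions, the expected number of allele-length changes at one locus over g generations is E = μg. Using the VNTR rate quoted above and the upper SNR rate cited in the Introduction (9):

```latex
\begin{aligned}
E &= \mu\, g \\
\text{VNTR: } \mu = 10^{-5},\; g = 10^{5} &\;\Rightarrow\; E = 1 \\
\text{SNR: } \mu = 6.0 \times 10^{-4},\; g = 10^{5} &\;\Rightarrow\; E = 60
\end{aligned}
```

Many of those SNR changes could revert or recur independently in different lineages, which is the homoplasy that limits these markers for deep phylogeny while favoring them for very recent divergence.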
Alternative molecular typing methods that could discriminate between isolates include whole-genome sequencing and microarray-based resequencing. Whole-genome sequencing reveals polymorphisms for typing by allowing the entire genomes of isolates of interest to be compared; however, it remains cost prohibitive and is not feasible for large sample sizes (17). Microarray-based resequencing has been applied to 56 strains of B. anthracis (22). Resequencing allows large areas of the genome to be surveyed for strain-specific variations; however, proper selection of the regions represented on the chip is crucial, since only a portion of the genome is examined. Although this technique is well suited to large sample sizes, it is cost prohibitive and its discriminatory power depends on which portions of the genome are represented on the chip.
As expected, there is a positive correlation between diversity (D) and repeat length, since longer mononucleotide SSRs are more likely to undergo slip-strand mispairing, resulting in greater variability in repeat length (1). A highly significant correlation between total repeat length and the number of alleles has been described for B. anthracis tandem repeats with larger repeat units (11). In our study, some plasmid SNRs were among the most polymorphic markers evaluated. Unlike chromosomal loci, plasmid loci are present in multiple copies, so transient states of SNRs may be detectable.
The number of SNR loci that differed between strains with identical MLVA genotypes varied. Two loci differed among strains 9609, 9614, and 93-189C; two loci differed between Vollum and Vollum 1B; and seven loci differed between 17T5 and SK31, although the pXO1 MLVA locus was also polymorphic between these two strains. When nine other B. anthracis strains with the same MLVA genotype were compared (strains 9604, 9807, 9911, 03-0139, 03-0191, 9937, 9946, 94-188C, and 200077), 22 different alleles were identified across seven loci. Seven of the nine strains were distinguished from each other with four SNR loci (the CL10, CL12, CL33, and CL76 loci). Although our study demonstrates that SNR analysis allows strains with the same MLVA genotype to be further distinguished from each other, two sets of isolates were not readily distinguished. One set comprised strains 9609 and 9614, which were from the same outbreak. The other set comprised strains 03-0191 and 03-0139, which were both isolated from bovines in Manitoba, Canada, in 2003; the circumstances of their isolation are unclear (they may have been from the same outbreak or even the same animal). By contrast, strains 9937 and 9946, although isolated in the same year at the same location (Alhambra, Alberta, Canada), had distinct SNR genotypes. Although the most polymorphic SNR markers (the CL10, CL12, CL33, CL76, and CL60 loci) may be a prudent first choice when attempting to distinguish isolates with identical MLVA genotypes, any one of the SNR markers (Table 6) may prove discriminatory. This technique is laborious and may not be suited to high-throughput automation; it requires specialized molecular typing equipment but can easily be adopted by laboratories that perform MLVA. It allows isolates of B. anthracis to be distinguished from each other when other typing methods fail to discriminate them; therefore, in epidemiological studies or forensic investigations, it may be the only technique that offers the required discriminatory power.
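The comparisons in this paragraph (counting the loci at which two isolates differ, and finding isolates that remain indistinguishable) can be sketched in Python as below. The profile format, a mapping from locus name to allele size for each isolate, and all function names are hypothetical; the example isolates and allele values are invented placeholders, not data from Table 6. The sketch assumes every isolate was typed at the same loci.

```python
from itertools import combinations


def locus_differences(profile_a, profile_b):
    """Loci typed in both isolates at which the allele calls differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sorted(locus for locus in shared if profile_a[locus] != profile_b[locus])


def pairwise_difference_counts(profiles):
    """Number of differing loci for every pair of isolates."""
    return {
        (a, b): len(locus_differences(profiles[a], profiles[b]))
        for a, b in combinations(sorted(profiles), 2)
    }


def indistinguishable_sets(profiles):
    """Groups of two or more isolates with identical calls at every locus."""
    groups = {}
    for isolate, profile in profiles.items():
        key = tuple(sorted(profile.items()))  # the full SNR genotype
        groups.setdefault(key, []).append(isolate)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical profiles for three isolates sharing one MLVA genotype.
profiles = {
    "iso1": {"L1": 250, "L2": 301},
    "iso2": {"L1": 250, "L2": 301},
    "iso3": {"L1": 251, "L2": 301},
}
print(pairwise_difference_counts(profiles))
print(indistinguishable_sets(profiles))
```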
Screening of the SNR markers (Table 3) against a more genetically and geographically diverse group of B. anthracis isolates to determine the full breadth of the SNR polymorphisms would be an important next step. Also, testing of these markers against a larger group of isolates with the same MLVA genotype is crucial in order to determine the utility of these markers for the differentiation of B. anthracis isolates and for epidemiological analysis of B. anthracis outbreaks.
ACKNOWLEDGMENTS
This work was supported by the Chemical, Biological, Radiological, Nuclear Research and Technology Initiative (CRTI project 02-0069RD).
The excellent technical assistance of D. Johnstone, T. MacMillan, and M. Russell is noted with appreciation. Thanks go to Barry Ford and John Cherwonogrodzky for reviewing this work.
REFERENCES
Coenye, T., and P. Vandamme. 2003. Simple sequence repeats and compositional bias in the bipartite Ralstonia solanacearum GMI1000 genome. BMC Genomics 4:10. http://www.biomedcentral.com/1471-2164/4/10.
Diamant, E., Y. Palti, R. Gur-Arie, H. Cohen, E. M. Hallerman, and Y. Kashi. 2004. Phylogeny and strain typing of Escherichia coli, inferred from mononucleotide repeat loci. Appl. Environ. Microbiol. 70:2464-2473.
Gur-Arie, R., C. J. Cohen, Y. Eitan, L. Shelef, E. M. Hallerman, and Y. Kashi. 2000. Simple sequence repeats in Escherichia coli: abundance, distribution, composition and polymorphism. Genome Res. 10:62-71.
Harrell, L. J., G. L. Andersen, and K. H. Wilson. 1995. Genetic variability of Bacillus anthracis and related species. J. Clin. Microbiol. 33:1847-1850.
Helgason, E., D. A. Caugant, M. M. Lecadet, Y. H. Chen, J. Mahillon, A. Løvgren, I. Hegna, K. Kvaløy, and A.-B. Kolstø. 1998. Genetic diversity of Bacillus cereus/B. thuringiensis isolates from natural sources. Curr. Microbiol. 37:80-87.
Hill, K. K., L. O. Ticknor, R. T. Okinaka, M. Asay, H. Blair, K. A. Bliss, M. Laker, P. E. Pardington, A. P. Richardson, M. Tonks, D. J. Beecher, J. D. Kemp, A.-B. Kolstø, A. C. Lee Wong, P. Keim, and P. J. Jackson. 2004. Fluorescent amplified fragment length polymorphism analysis of Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis isolates. Appl. Environ. Microbiol. 70:1068-1080.
Keim, P., A. Kalif, J. M. Schupp, K. K. Hill, S. E. Travis, K. Richmond, D. M. Adair, M. E. Hugh-Jones, C. R. Kuske, and P. Jackson. 1997. Molecular evolution and diversity in Bacillus anthracis as detected by amplified fragment length polymorphism markers. J. Bacteriol. 179:818-824.
Keim, P., L. B. Price, A. M. Klevytska, K. L. Smith, J. M. Schupp, R. Okinaka, P. J. Jackson, and M. E. Hugh-Jones. 2000. Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J. Bacteriol. 182:2928-2936.
Keim, P., M. N. Van Ert, T. Pearson, A. J. Vogler, L. Y. Huynh, and D. M. Wagner. 2004. Anthrax molecular epidemiology and forensics: using the appropriate marker for different evolutionary scales. Infect. Genet. Evol. 4:205-213.
Koehler, T. M. 2002. Bacillus anthracis genetics and virulence gene regulation. Curr. Top. Microbiol. Immunol. 271:143-164.
Le Fleche, P., Y. Hauck, L. Onteniente, A. Prieur, F. Denoeud, V. Ramisse, P. Sylvestre, G. Benson, F. Ramisse, and G. Vergnaud. 2001. A tandem repeats database for bacterial genomes: application to the genotyping of Yersinia pestis and Bacillus anthracis. BMC Microbiol. 1:2. http://www.biomedcentral.com/1471-2180/1/2.
Levinson, G., and G. A. Gutman. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203-221.
Moxon, E. R., and P. B. Rainey. 1995. Pathogenic bacteria: the wisdom of their genes, p. 255-268. In B. A. M. Van der Zeijst, L. Van Alphen, W. P. M. Hoekstra, and J. D. A. van Embden (ed.), Ecology of pathogenic bacteria. Royal Dutch Academy of Sciences, second series, no. 96. Royal Dutch Academy of Sciences, Amsterdam, The Netherlands.
Okinaka, R., K. Cloud, O. Hampton, A. Hoffmaster, K. Hill, P. Keim, T. M. Koehler, G. Lamke, S. Kumano, J. Mahillon, D. Manter, Y. Martinez, D. Ricke, R. Svensson, and P. J. Jackson. 1999. Sequence and organization of pXO1, the large Bacillus anthracis plasmid harboring the anthrax toxin genes. J. Bacteriol. 181:6509-6515.
Pearson, T., J. D. Busch, J. Ravel, T. D. Read, S. D. Rhoton, J. M. U'Ren, T. S. Simonson, S. M. Kachur, R. R. Leadem, M. L. Cardon, M. N. Van Ert, L. Y. Huynh, C. M. Fraser, and P. Keim. 2004. Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing. Proc. Natl. Acad. Sci. USA 101:13536-13541.
Read, T. D., S. L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. D. Busch, K. L. Smith, J. M. Schupp, D. Solomon, P. Keim, and C. M. Fraser. 2002. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science 296:2028-2033.
Read, T. D., S. N. Peterson, N. Tourasse, L. W. Baillie, I. T. Paulsen, K. E. Nelson, H. Tettelin, D. E. Fouts, J. A. Eisen, S. R. Gill, E. K. Holtzapple, O. A. Okstad, E. Helgason, J. Rilstone, M. Wu, J. F. Kolonay, M. J. Beanan, R. J. Dodson, L. M. Brinkac, M. Gwinn, R. T. Deboy, R. Madupu, S. C. Daugherty, A. S. Durkin, D. H. Haft, W. C. Nelson, J. D. Peterson, M. Pop, H. M. Khouri, D. Radune, J. L. Benton, Y. Mahamoud, L. Jiang, I. R. Hance, J. F. Weidman, K. J. Berry, R. D. Plaut, A. M. Wolf, K. L. Watkins, W. C. Nierman, A. Hazen, R. Cline, C. Redmond, J. E. Thwaite, O. White, S. L. Salzberg, B. Thomason, A. M. Friedlander, T. M. Koehler, P. C. Hanna, A. B. Kolsto, and C. M. Fraser. 2003. The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423:81-86.
Rozen, S., and H. J. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers, p. 365-386. In S. Krawetz and S. Misener (ed.), Bioinformatics methods and protocols: methods in molecular biology. Humana Press, Totowa, N.J.
Ticknor, L. O., A.-B. Kolstø, K. K. Hill, P. Keim, M. T. Laker, M. Tonks, and P. J. Jackson. 2001. Fluorescent amplified fragment length polymorphism analysis of Norwegian Bacillus cereus and Bacillus thuringiensis soil isolates. Appl. Environ. Microbiol. 67:4863-4873.
van Belkum, A., S. Scherer, L. van Alphen, and H. Verbrugh. 1998. Short-sequence DNA repeats in prokaryotic genomes. Microbiol. Mol. Biol. Rev. 62:275-293.
Vogler, A. J., J. D. Busch, S. Percy-Fine, C. Tipton-Hunton, K. L. Smith, and P. Keim. 2002. Molecular analysis of rifampin resistance in Bacillus anthracis and Bacillus cereus. Antimicrob. Agents Chemother. 46:511-513.
Zwick, M. E., F. Mcafee, D. J. Cutler, T. D. Read, J. Ravel, G. R. Bowman, D. R. Galloway, and A. Mateczun. 2004. Microarray-based resequencing of multiple Bacillus anthracis isolates. Genome Biol. 6:R10.
(Chad W. Stratilo, Christo)