Nucleic Acids Research, 2004, Issue 14
Transposon Express, a software application to report the identity of i
     Department of Bioscience, University of Strathclyde, Glasgow, Scotland, UK, 1 Institute of Medical Genetics, University of Wales College of Medicine, Cardiff, Wales, UK, 2 John Innes Centre, Norwich, England, UK and 3 School of Biological Sciences, University of Wales Swansea, Swansea, Wales, UK

    * To whom correspondence should be addressed. Tel: +44 0 141 1792 295667; Fax: +44 0 141 1792 295447; Email: p.j.dyson@swansea.ac.uk

    DDBJ/EMBL/GenBank accession no. AJ566337

    ABSTRACT

    Comprehensive mutant libraries can be readily constructed by transposon mutagenesis. To systematically mutagenise the genome of the Gram-positive bacterium Streptomyces coelicolor A3(2), we have employed high-throughput shuttle transposon mutagenesis of a cosmid library prepared in Escherichia coli. The location of transposon insertions is determined using automated procedures for cosmid isolation and DNA sequencing. However, a major bottleneck was the subsequent analysis of DNA sequence files. To overcome this limitation, a software application, Transposon Express, was written to allow the rapid location of transposon insertions in a sequenced genome (available at http://www.swan.ac.uk/genetics/dyson/InstallTE). Transposon Express determines the identity both of a disrupted open reading frame (ORF), and the short target site duplication created by transposition. Transposon Express also reports the orientation of the transposon and can therefore predict transcriptional coupling between an upstream promoter and a promoter-less reporter gene carried by the transposon. Analysis of a large dataset of independent insertions created using a Tn5-based transposon revealed an insertional preference for GC-rich streptomycete DNA compared to E.coli vector DNA. In addition to demonstrating the value of Transposon Express as a generic tool supporting genome-wide transposon mutagenesis programs, these data provide insight into target site selection by Tn5.

    INTRODUCTION

    A critical resource for the genome-wide analysis of gene function in fully sequenced organisms is the availability of comprehensive mutant libraries. Transposon mutagenesis remains one of the most efficient strategies for creating these libraries. For example, to investigate gene function in the Gram-positive bacterium Streptomyces coelicolor A3(2), we have developed a procedure of systematic in vitro shuttle mutagenesis (1). The procedure employs a minitransposon, Tn5062, derived from the ‘cut and paste’ transposon Tn5. This is used for in vitro transposon mutagenesis of the ordered S.coelicolor cosmid library (2). The site of an individual transposon insertion is determined by sequencing flanking regions in mutated cosmid DNA that is recovered after amplification in Escherichia coli. The effect of disruption of a predicted open reading frame (ORF) can then be determined by conjugal intergeneric transfer of a mutated cosmid to S.coelicolor and selecting for allelic replacement (1). This procedure was designed to offer versatility and high-throughput. The latter has been achieved by automating cosmid extraction and DNA sequencing using a 96-well format. Unfortunately, the analysis and interpretation of the resulting sequence data proved much slower and more labour intensive than its generation. For this reason we have developed Transposon Express (TrEx), a software application intended to automate bioinformatics procedures of insertion site location, determination of transposon orientation and identification of any disrupted genes. This generic tool is designed to enhance programs of transposon mutagenesis of any sequenced genome.

    In addition to using TrEx as an integral component of the ongoing systematic mutagenesis program of S.coelicolor, we have used it to analyse the large number of 9 bp target site duplications created by Tn5 transposition in our extensive insert database. Tn5 transposition is one of the best characterized of all transposition systems at the molecular level (3): the three-dimensional structure of the Tn5 synaptic complex and the mechanism by which it is assembled and catalyses strand transfer are all well understood (4–6). Transposition into an E.coli plasmid DNA with a G+C content of 50% occurs at a symmetrical 9 bp target consensus sequence (7). Here, we compare transposition into DNA of medium and high (72.12%) G+C content, demonstrating that the latter is a preferred target substrate for Tn5 in vitro transposition.

    MATERIALS AND METHODS

    In vitro transposition and DNA sequencing

    Thirty non-overlapping cosmids from the S.coelicolor A3(2) cosmid library, based on Supercos-1, were obtained from Helen Kieser (John Innes Centre, Norwich, UK) as cultures of E.coli Sure (Stratagene). Cosmid DNA (SCF55, SCF91, SCF43, 2SCG38, 2SCI34, SCI7, SC3A3, SCC77, SCC88, SCE20, SCE59, SCH66, SCH44, SCH22A, SCH69, SCD66, SCD16A, 2SCK8, SC9E12, SC2A11, SC6A9, SC3C3, SC7C7, SC2E9, SC7B7, SC9B5, SC4B10, SC4G10, SC5C11 and SC10F4) was isolated, subjected to in vitro transposition with Tn5062 and EZ::TN Transposase (Epicentre) (1) and used to transform E.coli JM109 (8). For each cosmid, 96 mutated derivatives were subsequently isolated with Promega Wizard SV96 plasmid isolation kits using a Qiagen Biorobot 3000 according to the manufacturer's protocols and instructions. The target site for transposition was determined by sequencing the cosmid DNA flanking the transposon insertion using the Tn5062 specific primer EZR1 (5'-ATGCGCTCCATCAAGAAGAG), homologous to an internal sequence 140 bp from one end (Figure 1). For a representative sample of insertions, the sequence of the other flanking region was also determined using a second sequencing primer EZL2 (5'-TCCAGCTCGACCAGGATG).

    Figure 1. Organization of Tn5062. The transposon consists of two inverted repeats (ME, mosaic ends) flanking an internal region containing the following elements (not drawn to scale): (i) a sequence consisting of translational stop codons in all three reading frames (stop), (ii) a consensus streptomycete ribosome binding site (RBS), (iii) a promoter-less copy of the green fluorescent protein gene (egfp), (iv) an apramycin-resistance gene, itself flanked by two T4 transcriptional terminators (T4) and (v) the RK2 origin of transfer (oriT). Also indicated are the positions of two sequencing primers, EZL2 and EZR1, used to precisely determine the location of each insertion. The sequence of Tn5062 is deposited at EMBL (accession number AJ566337).

    Manual analysis of Tn5062 insertion sites

    Sequences flanking Tn5062 insertions were identified visually by analysis of raw sequence files using DNAMAN (Lynnon Biosoft, Quebec, Canada) to allow in silico removal of transposon sequence and identification of each 9 bp target site. Following this, the location and orientation of each insertion were established by comparison of the flanking sequence with the S.coelicolor genome sequence. If the flanking sequence did not match the S.coelicolor genome sequence, it was compared with the Supercos vector sequence. Information on the sizes and coding sequences present on each cosmid was obtained from the S.coelicolor genome database (http://streptomyces.org.uk/S.coelicolor).

    Transposon Express

    The TrEx application was constructed using the Inprise/Borland integrated development environment (IDE) Delphi version 6. This environment uses Object Pascal as the programming language and deploys applications on the Microsoft Windows® operating system. The application uses a standard Windows graphical user interface (GUI), consisting of menus, dialog boxes and other common Windows controls. The application connects to two components, the BLAST executable (downloadable from http://www.ncbi.nlm.nih.gov/Ftp/) and a MySQL database (latest version and information at http://www.mysql.com). Communication with the BLAST program is achieved through the command line, with output written in XML format, which the TrEx application then parses to extract the relevant sequence matching details. Connection to the MySQL database is through the Windows Open Database Connectivity (ODBC) protocol. The program is designed and built for the Windows 2000, NT4 and XP platforms, and has been thoroughly tested on each of these operating systems. Included with the package are utilities for formatting genome files to comply with the application, the MySQL installation bundle and ODBC drivers. The software bundle and application are available for evaluation and download at http://www.swan.ac.uk/genetics/dyson/InstallTE and are distributed to non-commercial users under an open source license. To transfer data to the S.coelicolor database, ScoDB (http://streptomyces.org.uk/S.coelicolor/), they were first exported from TrEx as comma delimited files (*.csv) to allow their import into Microsoft Excel. A Perl script reads the Excel spreadsheets into a PostgreSQL database. A generic Perl script (GD) then generates a graphical representation for each insertion.
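
    As an illustration of the XML parsing step described above, the following is a minimal sketch in Python (not the Object Pascal code used by TrEx). It assumes BLAST output in the NCBI XML layout; the embedded sample document is hand-written and heavily abridged, and the read name, subject name, coordinates and E-values in it are invented. The function name parse_hsps and its max_evalue parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, abridged sample in the style of NCBI BLAST XML output.
# All names and numbers below are invented for illustration only.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>read_A01</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>example_cosmid</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>2e-50</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>120</Hsp_query-to>
              <Hsp_hit-from>5000</Hsp_hit-from>
              <Hsp_hit-to>5119</Hsp_hit-to>
            </Hsp>
            <Hsp>
              <Hsp_evalue>0.5</Hsp_evalue>
              <Hsp_query-from>30</Hsp_query-from>
              <Hsp_query-to>45</Hsp_query-to>
              <Hsp_hit-from>910</Hsp_hit-from>
              <Hsp_hit-to>925</Hsp_hit-to>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def parse_hsps(xml_text, max_evalue=1e-4):
    """Return one dict per alignment whose E-value is below max_evalue."""
    root = ET.fromstring(xml_text)
    results = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            subject = hit.findtext("Hit_def")
            for hsp in hit.iter("Hsp"):
                evalue = float(hsp.findtext("Hsp_evalue"))
                if evalue >= max_evalue:
                    continue  # weak match: not reported
                results.append({
                    "query": query,
                    "subject": subject,
                    "evalue": evalue,
                    "query_from": int(hsp.findtext("Hsp_query-from")),
                    "query_to": int(hsp.findtext("Hsp_query-to")),
                    "hit_from": int(hsp.findtext("Hsp_hit-from")),
                    "hit_to": int(hsp.findtext("Hsp_hit-to")),
                })
    return results


hsps = parse_hsps(SAMPLE_XML)
```

    In this sample only the first alignment passes the default cut-off; raising max_evalue makes the filter less stringent.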

    RESULTS

    Transposition of Tn5062 in S.coelicolor cosmid DNA

    Thirty non-overlapping cosmids from the S.coelicolor cosmid library (2) were subjected to transposon mutagenesis with Tn5062. On average, 7.93 × 10^3 ampicillin-, kanamycin- and apramycin-resistant E.coli transformants were obtained from each transposition reaction at an average transposition frequency of 1 in 99 (i.e. for every 99 cosmid molecules, one also contained Tn5062). For cosmids SC7C7 and SCH69, transposition was confirmed by sequencing 96 mutated cosmids derived from each reaction, using both sequencing primers EZL2 and EZR1 (Figure 1). Manual inspection of sequence files revealed the presence of a 9 bp direct repeat sequence immediately flanking Tn5062 insertions in all cases. Moreover, manual BLASTn comparisons with the S.coelicolor genome sequence and, when this failed, with sequence of the Supercos vector, revealed no associated rearrangements in these mutated cosmids. The locations of insertions in the remaining 28 cosmids were determined by sequencing with only the EZR1 primer. Subsequent analysis of these sequence files was performed with TrEx.

    Overview of Transposon Express

    TrEx was designed to analyse sequence data, in batches of 96 reads, generated from independent transposon insertions in a genome of known sequence, and obtained using a transposon-specific sequencing primer. The application has been specifically enhanced for the procedure of in vitro shuttle mutagenesis of a gene library in E.coli, but can be employed for any type of transposon mutagenesis. The 96 sequence reads are imported into TrEx, linked to the genome sequence of the target organism in silico (Figure 2) and, where applicable, to the appropriate cosmid sequence. The cosmid is selected from a user-defined list containing its name and genomic coordinates. Finally, the user is required to select the transposon used to carry out the mutagenesis (in this case Tn5062). TrEx uses BLAST to locate the transposon sequence within each of the 96 raw sequence files. If the application is unable to locate the transposon within a raw sequence file, then this information is flagged via a label and icon in the program (Figure 3). Files that are either empty, or where a transposon site cannot be derived, are excluded from further analysis. If TrEx finds an ambiguity in the match between the transposon sequence and the sequence file, the latter is flagged by an icon (Figure 3). The user then has the option to manually inspect and reassign the putative junction sequence. TrEx then deletes the transposon sequence to create a ‘cleaned’ source file (Figure 3). TrEx calls BLAST to query this edited source file against the genome or cosmid sequence. Consequently, TrEx can map the sequence in the genome. If no match is found, the edited source file is then mapped on the reverse complementary genomic sequence. If a match is detected in the former case it is designated as a plus strand insertion and for the latter case, a minus strand insertion. Edited source files that do not match with the genome sequence in either orientation are flagged with a label and icon (Figure 3). 
These result from transposon insertion in the Supercos vector backbone or from multiple insertions in the same cosmid molecule, resulting in an unreadable sequence outside the Tn5062 sequence itself. These flagged files are omitted from any further analysis. The position of the insertion is then compared to the coordinates of the genes in the gene list and the identity of the disrupted gene (if any) is determined. As Tn5062 contains a promoter-less egfp reporter gene, TrEx also determines whether there is a likelihood of the promoter of the disrupted gene driving the transcription of the reporter gene. This is done by comparing the orientation of the insertion to the direction of transcription of the disrupted gene. Moreover, TrEx finds the first few nucleotides (0–20 nt, as determined by the user for the given transposon employed for mutagenesis) of the cleaned source file that correspond to one copy of the directly repeated sequence generated by the transposition reaction, the first 9 bp for Tn5062, and reports this as the transposition target site (Figure 4). Before the generation of a report file of the 96 insertions, the user is able to visually scan the output for warning flags. As well as specifically flagging poor data (usually a consequence of poor quality sequencing template), TrEx also provides a score for each sequence match (Figure 3). By default, only alignments below a threshold E-value of 1 × 10^-4 are reported, although the user can modify this parameter to make it more or less stringent. If a low score is reported, a visual inspection of the data is recommended. These measures ultimately provide greater reliability in the output. TrEx saves information to a database, and organizes the experimental data by genome and date, thus providing a logical storage pattern for voluminous datasets.
In addition to viewing output files in TrEx (Summary Report; Figure 4), the application allows the export of files as comma delimited files (*.csv) to allow their import into Microsoft Excel and other spreadsheet programs. TrEx vastly accelerates the analysis of raw sequence files: a typical batch of 96 sequences is processed in 10–15 min, compared to at least 4 h of manual processing time.
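
    The mapping logic outlined in this section can be sketched as follows. This is a simplified illustration, not the TrEx implementation: exact string matching stands in for the BLAST searches, and TN_END, REFERENCE and the function names are invented for the example. The flag texts mirror those shown in Figure 3.

```python
# Stand-in for the transposon end as it appears in a read (invented sequence).
TN_END = "GGATCCTTAATTAAGG"
# 9 bp for Tn5062; in TrEx this length is user-defined (0-20 nt).
TARGET_SITE_LENGTH = 9

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def clean_read(read, tn_end=TN_END):
    """Delete the transposon sequence; return the flank, or None if absent."""
    idx = read.find(tn_end)
    if idx == -1:
        return None
    return read[idx + len(tn_end):]


def map_flank(flank, reference):
    """Try the reference as given, then its reverse complement.

    Returns (0-based plus-strand position of the first flank base, strand)
    or None when neither orientation matches.
    """
    pos = reference.find(flank)
    if pos != -1:
        return pos, "+"
    pos = reverse_complement(reference).find(flank)
    if pos != -1:
        return len(reference) - 1 - pos, "-"
    return None


def analyse_read(read, reference):
    flank = clean_read(read)
    if flank is None:
        return {"flag": "No hits detected by Blast"}
    hit = map_flank(flank, reference) if flank else None
    if hit is None:
        return {"flag": "No hits detected in Cosmid"}
    position, strand = hit
    return {"flag": None, "position": position, "strand": strand,
            "target_site": flank[:TARGET_SITE_LENGTH]}


# Toy 30 nt reference and two synthetic reads (all sequences invented).
REFERENCE = "ATGACCGTTGCCCAGGGCTTACGATCGGAT"
plus_read = "ACGT" + TN_END + REFERENCE[10:25]
minus_read = "ACGT" + TN_END + reverse_complement(REFERENCE)[5:20]
plus_result = analyse_read(plus_read, REFERENCE)
minus_result = analyse_read(minus_read, REFERENCE)
```

    A read that ends at the transposon boundary yields an empty flank and is flagged in the same way as a flank that matches neither strand, which corresponds to the two causes given above for unmatched edited source files.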

    Figure 2. Transposon Express design flowchart.

    Figure 3. Main window of Transposon Express. TrEx locates the transposon sequence, blue sequence in the small window (A), in each sequence file. If the program is unable to detect the transposon sequence, the file is flagged as ‘No hits detected by Blast’ in the main window (B), column 5. If, due to poor data quality, there is ambiguity over assignment of the junction between transposon and genome sequence, the file is flagged by an icon in TrEx and the user can intervene to assign the most probable junction after inspection of the sequence (A). Once checked, unchanged junctions and reassigned junctions are each flagged with their own distinct icon. An edited ‘cleaned’ source file is then compared with the genome sequence. If TrEx finds no match between the edited file and the genome sequence, the file is flagged ‘No hits detected in Cosmid’ in window (B), column 5. The score for each sequence match is reported in column 6 of the main window (B).

    Figure 4. Transposon Express Summary Report. For each sequence file, information on the characteristics of the transposon insertion is provided. The first line includes the coordinates of the transposon insertion in the cosmid and genome, together with the orientation of the transposon with respect to the coding strand of the gene. TrEx can thereby predict transcriptional coupling between the disrupted gene and egfp encoded by the transposon, reported as ‘1’ (insertions in the ‘wrong’ orientation are reported as ‘0’). The final column in the first line reports the target site duplicated as part of the transposition reaction. The second line identifies the disrupted gene, including its coordinates in the genome.
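
    A report of this kind could be assembled along the following lines. The sketch is hypothetical: the gene names, coordinates, column headings and function names are invented, and it assumes that an insertion labelled with the same strand as the disrupted gene places egfp in the direction of transcription. Insertions outside any gene are reported here with a coupling value of 0, which is also an assumption.

```python
import csv
import io
from typing import NamedTuple, Optional, Sequence


class Gene(NamedTuple):
    name: str
    start: int   # inclusive genome coordinate
    end: int     # inclusive genome coordinate
    strand: str  # "+" or "-": direction of transcription


# Invented gene list for illustration.
GENES = [Gene("geneA", 100, 400, "+"), Gene("geneB", 450, 900, "-")]


def disrupted_gene(position: int, genes: Sequence[Gene]) -> Optional[Gene]:
    """Return the gene whose coordinates contain the insertion, if any."""
    for gene in genes:
        if gene.start <= position <= gene.end:
            return gene
    return None


def coupling_flag(insert_strand: str, gene: Optional[Gene]) -> int:
    """1 if the insertion shares the gene's orientation, otherwise 0."""
    return 1 if gene is not None and insert_strand == gene.strand else 0


def summary_csv(insertions, genes) -> str:
    """Write one comma delimited row per insertion and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["read", "position", "strand", "coupled",
                     "target_site", "gene", "gene_start", "gene_end"])
    for read_name, position, strand, target_site in insertions:
        gene = disrupted_gene(position, genes)
        writer.writerow([
            read_name, position, strand, coupling_flag(strand, gene),
            target_site,
            gene.name if gene else "",
            gene.start if gene else "",
            gene.end if gene else "",
        ])
    return buffer.getvalue()


# Invented insertions: (read name, position, strand, 9 bp target site).
INSERTIONS = [
    ("A01", 150, "+", "GCCCAGGGC"),
    ("A02", 500, "+", "GTCCTGAGC"),
    ("A03", 420, "-", "CCCCGGGGG"),
]
report = summary_csv(INSERTIONS, GENES)
```

    The resulting text can be saved with a .csv extension and opened in a spreadsheet program.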

    Location of transposon insertions

    TrEx was used to analyse 2880 sequences from 30 Tn5062 transposition reactions and could identify the location of 2222 sequences within the S.coelicolor DNA contained within those 30 cosmids. The 658 sequences that could not be localized by TrEx were flagged as ‘no hits detected by BLAST’ or ‘no hits detected on cosmid’ by the software and were visually analysed. The former group consisted of sequences where no clear sequence was obtained at all, presumably due to poor quality sequencing template; TrEx could not detect Tn5062 sequence and the data were discarded. The latter group contained insertions in the Supercos vector backbone or sequences that could only be read to the end of the Tn5062 sequence (presumably, these were due to multiple insertions in the same cosmid molecule). Seven insertions were also found where the DNA flanking the transposon matched with Tn5062 (presumably these were due to either an intergenic or intragenic transposition of Tn5062, followed by a second transposition of the resulting hybrid transposon into cosmid DNA). Within this ‘no hits detected on cosmid’ group of sequences, 184 could be visually located in the Supercos vector backbone. To evaluate the accuracy of the insertion site identification by TrEx, the results of manual analysis of insertion sites in cosmids SC7C7 and SCH69 were compared to those reported by the new application. TrEx reported the same insertion sites in all cases. The data on the 2222 insertions within S.coelicolor DNA have been transferred to the S.coelicolor database, ScoDB, where each insertion is graphically represented (Figure 5).

    Figure 5. Graphical representation of the output of TrEx in the S.coelicolor database, ScoDB. Each insertion is represented by a solid triangle. The orientation of each predicted gene is designated by the direction of each red arrow and its placement either above or below the linear representation of the genome (black line). The orientation of each transposon insertion is also represented by the location, either above or below the linear genome, and direction of the solid triangle. The user can click on an individual transposon symbol to access further information on the particular insertion (lower window): in this case an insertion in the 3' end of orf SCO3846, providing information on its exact location, predicted expression of egfp, and the target site duplication.

    Preference of Tn5062 for streptomycete DNA

    Features of the data collected from the 2333 unique Tn5062 insertions are shown in Table 1. The S.coelicolor genome as a whole has a coding density of 88.90% (10). Of the Tn5062 insertions, 88.88% occurred in coding sequences, resulting in disruption of 89.6% of the ORFs carried on the mutagenised cosmids, and indicating that Tn5 in vitro transposition has no preference for intragenic or intergenic regions. Of the insertions, 92.88% (2167) occurred in DNA derived from S.coelicolor and 7.17% (166) in Supercos vector DNA. However, the amount of Supercos DNA available as transposition target was 4294 bp. This was calculated by excluding regions of Supercos that consist of the origin of replication and the ampicillin and kanamycin resistance genes. Indeed, all the recovered insertions in Supercos DNA were in parts of the vector that excluded these regions. With respect to the size of each Streptomyces DNA insert in the 30 cosmids (average size = 38949 bp), the vector DNA available as a transposition target made up 9.93% of the entire pool of target DNA. If both Supercos DNA and DNA derived from S.coelicolor were equally susceptible to transposition, then insertions in vector DNA should have comprised 9.93% of the total rather than the 7.17% observed. By comparing the percentage of insertions observed in DNA derived from S.coelicolor (observed value) to the percentage of DNA derived from S.coelicolor in each cosmid (predicted value), a 2*log-likelihood ratio analysis (11) of the data revealed that significantly more (p 3.68 × 10^-22) insertions were observed in S.coelicolor DNA than would be expected if Tn5062 transposed into target DNA solely on the basis of the relative abundance of S.coelicolor DNA. An analogous comparison of the observed and expected percentages in Supercos vector DNA revealed a similar degree of significance (p 2.81 × 10^-9), indicating a significantly lower number of insertions than expected in this DNA.
This preference was also reflected in the mean distance between insertions in DNA derived from the two different pools (Table 1), with insertions occurring with almost twice the frequency in the GC-rich DNA compared to the vector.
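
    The expected share of vector insertions follows directly from the figures quoted above, and the same figures can be used to show the form of a 2*log-likelihood ratio (G) statistic. The sketch below is an illustration only: it pools all insertions into a single two-category comparison, whereas the analysis described above compares observed and predicted values for each cosmid, so the pooled p value is not expected to reproduce the reported ones. The use of a chi-square distribution with one degree of freedom is an assumption of this sketch.

```python
import math

VECTOR_TARGET_BP = 4294      # Supercos DNA available as target
MEAN_INSERT_BP = 38949       # average Streptomyces insert size
OBSERVED = {"streptomycete": 2167, "vector": 166}

# Fraction of the target pool contributed by the vector.
vector_fraction = VECTOR_TARGET_BP / (VECTOR_TARGET_BP + MEAN_INSERT_BP)


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)) over all categories with O > 0."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


total = sum(OBSERVED.values())
expected = [total * (1.0 - vector_fraction), total * vector_fraction]
g = g_statistic([OBSERVED["streptomycete"], OBSERVED["vector"]], expected)

# Survival function of the chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(g / 2.0))

print(f"vector share of target DNA: {100 * vector_fraction:.2f}%")
print(f"pooled G statistic: {g:.2f}")
```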

    Table 1. Analysis of Tn5062 insertions in representatives of the S.coelicolor cosmid library

    A consensus target site for Tn5 in GC-rich DNA

    The 9 bp direct repeats duplicated on transposition from each insertion were separated into two insert collections based on the location of that insertion: either as insertions in Supercos vector DNA (vector insert collection, 166 target sites) or S.coelicolor-derived DNA (streptomycete insert collection, 2167 target sites). This allowed target site consensus sequences to be compiled by calculating the base composition at each of the nine positions that form the short direct repeat in both the vector insert collection (Figure 6A) and the streptomycete insert collection (Figure 6B). The consensus sequence derived from insertions in vector DNA was 5'-GYYYWRKRC, whereas that for insertions in S.coelicolor DNA was 5'-(G,c)CCCNGGG(C,g).

    Figure 6. Target site base composition in vector-derived DNA (A) and S.coelicolor-derived DNA (B). Target site consensus sequences were compiled by calculating the percentage base composition at each of the 9 positions that form the short direct repeat duplicated after transposition in 166 Supercos vector insertions (A) and 2167 S.coelicolor insertions (B). A threshold value of 28% was used to derive the consensus sequences in each case.
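
    The compilation of a consensus from per-position base composition can be sketched as below. The exact rule used to apply the 28% threshold is not spelled out in the text, so this example assumes that every base reaching the threshold at a position is kept and that the surviving set is written as a standard IUPAC ambiguity code (N when no base qualifies). The five target sites are invented, and the mixed upper and lower case notation used for the streptomycete consensus is not reproduced.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("AT"): "W", frozenset("CG"): "S",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
}


def base_composition(sites):
    """Fraction of each base at each position of equal-length sites."""
    length = len(sites[0])
    composition = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        composition.append({base: n / len(sites) for base, n in counts.items()})
    return composition


def consensus(sites, threshold=0.28):
    """Keep bases at or above the threshold; encode them as IUPAC codes."""
    symbols = []
    for fractions in base_composition(sites):
        kept = frozenset(b for b, f in fractions.items() if f >= threshold)
        symbols.append(IUPAC.get(kept, "N"))
    return "".join(symbols)


# Invented 9 bp target sites for illustration.
TOY_SITES = ["GCCCAGGGC", "GCCCTGGGC", "GTCCAGAGC", "GTCCTGAGC", "CCCCAGGGC"]
toy_consensus = consensus(TOY_SITES)
```

    In the toy data, C at position 1 occurs in one of five sites (20%) and is therefore dropped, while positions 2, 5 and 7 each retain two bases.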

    DISCUSSION

    We have developed a procedure for systematic mutagenesis of the S.coelicolor genome, employing in vitro shuttle transposon mutagenesis of an ordered cosmid library, using the Tn5-based minitransposon Tn5062 (1). Mutagenesis based on a cosmid-by-cosmid approach avoids a complication inherent in random in vivo transposon mutagenesis programs: eventually a point is reached where insertions in already mutagenised genes continue to be isolated at the expense of novel insertions. Employing shuttle mutagenesis also creates the possibility of moving mutations into different genetic backgrounds and, with the use of appropriate selective markers, the construction of strains carrying defined mutations in two or more genes. The automation of much of this mutagenesis program in a 96-well format meant that the manual examination of DNA sequence files to locate insertion sites became the rate-limiting step. It was to remove this bottleneck in our mutagenesis pipeline that TrEx was developed. However, TrEx was designed to provide a simple and intuitive interface to high-throughput primary analysis of mutagenesis data for any organism, not just S.coelicolor. The program can be used to locate transposon insertions generated in any sequenced organism, provided that the genome sequence information and corresponding gene list are in the correct format for import. TrEx can be employed with any transposable element, as long as both its sequence and the length of the target site generated by that element are known. The reliability of the program in identifying insertion sites in dispersed repetitive DNA is clearly enhanced if a strategy of shuttle mutagenesis on a cosmid-by-cosmid basis is employed. Information on how to deploy TrEx for analysis of insertions in other genomes is available at http://www.swan.ac.uk/genetics/dyson/InstallTE.

    As well as using TrEx as an integral component of the systematic mutagenesis of S.coelicolor, we have used it to report on target site selection by the Tn5 derivative, Tn5062. Goryshin et al. (7) proposed a Tn5 consensus target site of 5'-GNTYWRANC. This sequence was compiled from analysis of 198 independent insertions generated by in vitro transposition and 156 independent insertions obtained after in vivo transposition; the majority of these insertions were within two antibiotic resistance genes with a G+C content of 43 and 54%, respectively. Shevchenko et al. (12) modified this consensus sequence to 5'-GYYYWRRRC based on analysis of a far greater number (24 493) of in vitro Tn5 insertions in a mouse cDNA library with an estimated G+C content of 50%. As would be expected for DNA of a similar G+C content (51%), the consensus target for insertions in Supercos DNA was in broad agreement with these data (5'-GYYYWRKRC). In DNA derived from S.coelicolor (%G+C of 72.1), a consensus of 5'-(G,c)CCCNGGG(C,g) reflects the skewed G+C content of this DNA. The higher abundance of insertions in streptomycete DNA can be explained by there being many more suitable target sites for Tn5 transposition in this DNA. By comparison of all target sequence data, it can be seen that the most important requirements are G and C at positions 1 and 9, respectively. Potential sites with G and C appropriately spaced are obviously more prevalent in GC-rich DNA. In addition, all consensus sequences show a high degree of symmetry, supporting the model that dimers of Tn5 transposase recognize the target site as two 4 bp half sites (7). Not only is the streptomycete consensus symmetrical but it also features an increasing relative abundance of C from positions 2 to 4 and G from positions 8 to 6 (Figure 6B). Indeed, the abundance of C and G at positions 4 and 6, respectively, is equivalent to their prevalence at positions 9 and 1.
We interpret this as evidence supporting the microfilament model proposed by Goryshin et al. (7). This model proposes that although a Tn5 synaptic complex can find a target solely through its direct binding to a consensus-like sequence, target binding is facilitated by cooperative interactions with transposase dimers bound at adjacent overlapping sites. Experimentally, this was supported by the observation that a preferred 9 bp target site in non-GC-rich DNA is one that is embedded within a cluster of overlapping similar sequences, each target having 5 bp periodicity (11). The abundance of C at position 4 in the streptomycete consensus coincides with the requirement for this base at position 9 of a left-hand overlapping sequence. Likewise, the G at position 6 corresponds to position 1 of an overlapping right-hand target sequence. The influence of this microfilament model is entirely consistent with the observed skew in the distribution of insertions that favours GC-rich DNA, and may be accentuated by the conditions employed for in vitro transposition that requires a large number of transposase molecules per reaction. This may have practical implications for the distribution of insertions generated by in vitro Tn5 mutagenesis of eukaryotic DNA containing non-expressed CpG islands. Indeed, an unexplained non-random distribution of insertions has been reported in mouse genomic DNA that is not evident in mouse cDNA (12).
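
    The positional correspondence invoked above (position 4 with position 9 of a left-hand site, position 6 with position 1 of a right-hand site) follows from a 5 bp shift between overlapping 9 bp sites. The short sketch below simply works through that arithmetic; the function name and parameters are hypothetical.

```python
SITE_LENGTH = 9  # length of the target site duplication
PERIOD = 5       # shift between overlapping target sites, in bp


def shared_positions(offset, site_length=SITE_LENGTH):
    """Map 1-based positions of a central site onto a neighbouring site.

    offset is the start of the neighbour relative to the central site
    (negative = neighbour lies to the left). Only overlapping positions
    are returned.
    """
    pairs = {}
    for position in range(1, site_length + 1):
        neighbour_position = position - offset
        if 1 <= neighbour_position <= site_length:
            pairs[position] = neighbour_position
    return pairs


left_overlap = shared_positions(-PERIOD)   # neighbour starts 5 bp upstream
right_overlap = shared_positions(PERIOD)   # neighbour starts 5 bp downstream
```

    With a 5 bp shift, four positions overlap on each side: positions 1 to 4 of the central site coincide with positions 6 to 9 of the left-hand site, and positions 6 to 9 coincide with positions 1 to 4 of the right-hand site.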

    In summary, TrEx is an important bioinformatics tool to rapidly locate and analyse transposon insertions in sequenced genomes. TrEx is currently being used to support a program of systematic mutagenesis of the S.coelicolor genome with the output being used to regularly update ScoDB, a central database for the S.coelicolor genome. Deploying TrEx to analyse 2167 Tn5 insertions in streptomycete DNA and 166 insertions in vector DNA has revealed a target-site preference of Tn5 in vitro transposition for GC-rich DNA.

    ACKNOWLEDGEMENTS

    We thank Helen Kieser (JIC, Norwich, UK) for provision of cosmids and Paul Lewis (University of Wales, College of Medicine, UK) for programming advice. This work was supported by the BBSRC (grant numbers 58/IGF12431 and 58/EGH18395).

    REFERENCES

    1. Bishop,A., Fielding,S., Dyson,P.J. and Herron,P.R. (2004) Systematic insertional mutagenesis of a streptomycete genome: a link between osmoadaptation and antibiotic production. Genome Res., 14, 893–900.

    2. Redenbach,M., Kieser,H.M., Denapaite,D., Eichner,A., Cullum,J., Kinashi,H. and Hopwood,D.A. (1996) A set of ordered cosmids and a detailed genetic and physical map for the 8 Mb Streptomyces coelicolor A3(2) genome. Mol. Microbiol., 21, 77–96.

    3. Reznikoff,W.S., Bhasin,A., Davies,D.R., Goryshin,I.Y., Mahnke,L.A., Naumann,T., Rayment,I., Steiniger-White,M. and Twining,S.S. (1999) Tn5: a molecular window on transposition. Biochem. Biophys. Res. Commun., 266, 729–734.

    4. Davies,D.R., Goryshin,I.Y., Reznikoff,W.S. and Rayment,I. (2000) Three-dimensional structure of the Tn5 synaptic complex transposition intermediate. Science, 289, 77–85.

    5. Lovell,S., Goryshin,I.Y., Reznikoff,W.S. and Rayment,I. (2002) Two-metal active site binding of a Tn5 transposase synaptic complex. Nature Struct. Biol., 9, 278–281.

    6. Naumann,T.A. and Reznikoff,W.S. (2000) Trans catalysis in Tn5 transposition. Proc. Natl Acad. Sci. USA, 97, 8944–8949.

    7. Goryshin,I.Y., Miller,J.A., Kil,Y.V., Lanzov,V.A. and Reznikoff,W.S. (1998) Tn5/IS50 target recognition. Proc. Natl Acad. Sci. USA, 95, 10716–10721.

    8. Yanisch-Perron,C., Vieira,J. and Messing,J. (1985) Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene, 33, 103–119.

    9. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic Local Alignment Search Tool. J. Mol. Biol., 215, 403–410.

    10. Bentley,S.D., Chater,K.F., Cerdeno-Tarraga,A.M., Challis,G.L., Thomson,N.R., James,K.D., Harris,D.E., Quail,M.A., Kieser,H., Harper,D. et al. (2002) Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature, 417, 141–147.

    11. Hu,W.Y., Thompson,W., Lawrence,C.E. and Derbyshire,K.M. (2001) Anatomy of a preferred target site for the bacterial insertion sequence IS903. J. Mol. Biol., 306, 403–416.

    12. Shevchenko,Y., Bouffard,G.G., Butterfield,Y.S., Blakesley,R.W., Hartley,J.L., Young,A.C., Marra,M.A., Jones,S.J., Touchman,J.W. and Green,E.D. (2002) Systematic sequencing of cDNA clones using the transposon Tn5. Nucleic Acids Res., 30, 2469–2477.

    (Paul R. Herron, Gareth Hughes1, Govind C)