Nucleic Acids Research, 2006, issue We
iCR: a web tool to identify conserved targets of a regulatory protein
     Computational and Functional Genomics Group, Sun Centre of Excellence in Medical Bioinformatics, Centre for DNA Fingerprinting and Diagnostics EMBnet India Node, Hyderabad 500076, India

    *To whom correspondence should be addressed. Tel: +91 40 27171442; Fax: +91 40 27171442; Email: akash@cdfd.org.in

    ABSTRACT

    Gene regulatory circuits are often shared between closely related organisms. Our web tool iCR (identify Conserved target of a Regulon) makes use of this fact to identify conserved targets of a regulatory protein. iCR is a refined extension of our previous tool PredictRegulon, which predicts, genome wide, the potential binding sites and target operons of a regulatory protein in a single user-selected genome. Like PredictRegulon, iCR accepts known binding sites of a regulatory protein as an ungapped multiple sequence alignment and reports the potential binding sites. The important differences are that the user can select more than one genome at a time and that the output reports the genes that are common to two or more species. To achieve this, iCR makes use of Cluster of Orthologous Group (COG) indices for the genes. The tool analyses the upstream regions of all user-selected prokaryotic genomes and groups the output by conserved target orthologs. iCR also reports the functional class codes, based on the COG classification, for the proteins encoded by the downstream genes, which helps the user understand the nature of the co-regulated genes on the result page itself. iCR is freely accessible at http://www.cdfd.org.in/icr/.

    INTRODUCTION

    Over the last one and a half decades, genomes of microorganisms have been sequenced at a highly accelerated pace. However, extracting useful information from such a large pool of genome data has become a major challenge of the post-genomic era. One approach to this issue is to organize the large and complex genome into ordered and manageable subsystems that can be tackled systematically. An important example of this approach is to study cellular processes and the associated gene expression in terms of gene regulatory circuits. Each of these circuits contains a regulator and a list of its target sites (motifs) located upstream of the subset of genes that are being regulated (1–3). Such an approach will enable us to understand how the constituent genes of a genome come together to execute the metabolic and physiological processes of a cell in response to a given regulator.

    A large number of experimental and computational approaches are being used to understand how these genes come together to perform a physiological function. The experimental approaches typically include microarray analysis of the transcriptome (4,5). Once the experimental data have been gathered, computational approaches are applied to search for common regulatory motifs and promoters present upstream of the up- and down-regulated genes and proteins (6). Computational tools such as PHYLONET (7), BioProspector (8,9), CompareProspector (9,10), MDscan (9,11), MotifRegressor (12), BioOptimizer (13) and PhyME (14) are available for this purpose, but most of them are either designed for eukaryotes or written to analyze experimental data, such as microarray data, in terms of gene regulation.

    An alternative approach is to first select the regulator associated with a cellular process and then use a computational approach to identify the potential targets of that regulatory protein, which can subsequently be followed up by experiments to validate the computationally identified targets. As a first step in this direction, we previously proposed a tool called PredictRegulon, which finds the targets of a regulatory protein in a genome based on a limited set of known binding motif data (15). We have successfully used this tool to identify and validate the DtxR and IdeR targets in corynebacteria and mycobacteria, respectively (16,17). However, an important limitation of PredictRegulon is that it searches one genome at a time.

    Carrying out a simultaneous search in multiple genomes offers many advantages, the most important of which is the ability to reveal regulatory targets that are conserved across multiple related genomes. This should increase the confidence of experimental biologists in taking up experimental validation. Further, we felt that grouping the targets by the class of genes being regulated would give an overall picture of the impact of the regulator on the physiology of the organism.

    We describe here iCR (identify Conserved target of a Regulon), a web server tool for the identification of conserved, high-priority targets of a regulatory protein from heterologous sequence data of prokaryotes (which include the regulatory sequences of genes and of their orthologs in other species). The user can easily distinguish biologically important motifs from background noise on the basis of their cross-species conservation.

    PROGRAM DESCRIPTION

    iCR is a CGI-based web application written in Perl and C. It uses a Shannon relative entropy based profile search method, similar to the one used in the PredictRegulon tool. The application can use the available experimental data on the binding sites of a transcription regulatory protein (18–20) to identify the regulons of a given regulator in the genomes of various phylogenetically related bacterial species.
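
    As general background, the relative entropy of column i of an ungapped alignment is commonly written as shown below. The symbols are introduced here for illustration only: f_{i,b} is the observed frequency of base b in column i of the known binding sites and p_b is the background frequency of that base. The exact scoring function used by iCR is the one described for PredictRegulon (15) and may differ in detail from this textbook form.

```latex
I_i = \sum_{b \in \{A,C,G,T\}} f_{i,b} \, \log_2 \frac{f_{i,b}}{p_b}
```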

    iCR is composed of three parts (Figure 1): (i) a front-end web interface for submitting the block-aligned known binding motifs and for selecting the species of choice; (ii) a search engine for scanning the upstream sequences; and (iii) a classification and reporting system that renders the textual output produced by iCR into meaningful groups. Each of these components is discussed in detail in the help pages linked from the iCR home page. A brief description is given here.

    Figure 1 Architecture of iCR. iCR is a CGI application that collects input from the user through HTML forms (A). B represents a Perl script that gathers the input from A and launches the search engine (C), which looks up the genome sequences and their annotations (D) and returns the potential targets as output; this output is further classified by COG, functional class or genome. The classified output is returned as HTML (F).

    Input submission

    iCR provides a web-based form for input submission. The input form consists of two HTML pages. The first page accepts the sample motifs and the parameters defining the upstream region. On this page, the known motifs can be copied either from the sample input form or from any authentic source and pasted into the web form in a block-aligned fashion. The second page lists the genomes in a taxonomically meaningful order, which makes it convenient to select multiple related species at a time. Finally, the user specifies the basis on which the predicted motifs are to be grouped or classified. The default and preferred option is Cluster of Orthologous Group (COG).
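
    The requirement that motifs be pasted in a block-aligned (ungapped, equal-length) fashion can be illustrated with a minimal Python sketch. This is not the code behind the iCR form, which is written in Perl and C; the function name parse_block_alignment and the example sequences are hypothetical and chosen only to show what a valid input block looks like.

```python
def parse_block_alignment(text):
    """Return the aligned sites as upper-case strings, or raise ValueError.

    Assumption: one site per line, DNA alphabet only, no gap characters.
    """
    sites = [line.strip().upper() for line in text.splitlines() if line.strip()]
    if not sites:
        raise ValueError("no binding sites supplied")
    width = len(sites[0])
    for site in sites:
        if len(site) != width:
            raise ValueError("sites differ in length; the block must be ungapped")
        if set(site) - set("ACGT"):
            raise ValueError(f"unexpected character in site {site!r}")
    return sites


# Made-up sequences for illustration, not real binding sites.
example = """
ACGTAC
ACGTTC
ACGAAC
"""
sites = parse_block_alignment(example)
print(len(sites), len(sites[0]))
```

    A block in which one line is shorter than the others, or contains a gap character, is rejected before any search is attempted.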

    Search engine

    The parameters accepted from the input forms are passed to a search engine, which uses the Shannon relative entropy based profile scan method to scan the upstream sequences for regulatory motifs. This method is described in our previous paper on PredictRegulon (15). In iCR, however, the analysis is carried out on multiple user-selected genomes and the results are compiled together. Since complete COG data were not available for many of the genes in the various genomes, we updated these data by running COGNITOR (21,22). Each COG selected represents the best hits to proteins from at least three lineages.
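
    The general idea of a profile scan can be sketched in Python as follows. This is a generic log-odds profile, whose per-base weights are the log2(f/p) terms that appear in the relative entropy of a column; it is not the PredictRegulon implementation (15), whose scoring may differ. The uniform background, the pseudocount of 1, the function names and the toy sequences are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_profile(sites, background=None, pseudocount=1.0):
    """Build per-column log2(frequency / background) weights from aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    profile = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + pseudocount * len(BASES)
        weights = {}
        for b in BASES:
            freq = (column.count(b) + pseudocount) / total
            weights[b] = math.log2(freq / background[b])
        profile.append(weights)
    return profile


def scan(sequence, profile):
    """Score every window of the sequence; return (start, window, score) tuples."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) - set(BASES):
            continue  # skip windows containing ambiguous characters
        score = sum(profile[i][base] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


# Toy data, not real binding sites or a real upstream region.
sites = ["ACGTAC", "ACGTTC", "ACGAAC"]
profile = build_profile(sites)
upstream = "TTTACGTACGGG"
best = max(scan(upstream, profile), key=lambda hit: hit[2])
print(best[0], best[1])  # prints: 3 ACGTAC
```

    In a multi-genome setting the same profile would be applied to the upstream regions of every selected genome and the hits pooled before classification.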

    The search results are classified and grouped according to one of three options: orthology, functional class code or genome. Classification by orthology (the default option) lists all the orthologous targets of a regulator together, emphasizing that these are conserved targets of a given regulon.
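
    Grouping by orthology amounts to collecting the hits that share a COG identifier and keeping the groups that are seen in two or more genomes. The Python sketch below illustrates this under stated assumptions: the record layout, the function name conserved_targets, the genome labels and the scores are hypothetical, and only the COG identifiers are taken from the example later in the text.

```python
from collections import defaultdict


def conserved_targets(hits, min_genomes=2):
    """Group hits by COG and keep the COGs found in at least min_genomes genomes."""
    by_cog = defaultdict(list)
    for hit in hits:
        by_cog[hit["cog"]].append(hit)
    conserved = {}
    for cog, members in by_cog.items():
        genomes = {member["genome"] for member in members}
        if len(genomes) >= min_genomes:
            conserved[cog] = sorted(members, key=lambda member: member["genome"])
    return conserved


# Hypothetical hits: genome labels and scores are made up for illustration.
hits = [
    {"cog": "COG1974", "genome": "genome_A", "gene": "lexA", "score": 6.1},
    {"cog": "COG1974", "genome": "genome_B", "gene": "lexA", "score": 5.8},
    {"cog": "COG0468", "genome": "genome_A", "gene": "recA", "score": 5.5},
]
result = conserved_targets(hits)
print(sorted(result))  # prints: ['COG1974']
```

    Grouping by functional class code or by genome would follow the same pattern with a different key.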

    Output

    All the predicted and classified target motifs are presented as an HTML table. The table has the following columns: COG name, functional class code, genome, motif score, motif, gene ID as given in NCBI's ptt table, ORF number and gene product. The program predicts a number of motifs; a blue background marks the high-scoring motifs above the cut-off value, and a yellow background marks motifs that exactly match a known binding site.
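
    The colour coding of the result table can be summarized with a small Python sketch. The function name highlight is hypothetical, the cut-off is treated as a given parameter because the text does not state how it is derived, and the precedence of yellow over blue for a motif that satisfies both conditions is an assumption.

```python
def highlight(motif, score, known_sites, cutoff):
    """Return the background colour for one predicted motif.

    Assumption: an exact match to a known site takes precedence over the
    score-based colour.
    """
    if motif in known_sites:
        return "yellow"
    if score >= cutoff:
        return "blue"
    return "none"


# Made-up motifs and scores for illustration.
known = {"ACGTAC"}
print(highlight("ACGTAC", 6.3, known, cutoff=5.0))
print(highlight("ACGTTC", 5.7, known, cutoff=5.0))
print(highlight("TTTTTT", -1.2, known, cutoff=5.0))
```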

    Example usage

    To demonstrate a typical application of iCR's regulon assignments, we used known LexA-binding sites from Bacillus subtilis as a query set. These sites were collected from PRODORIC (19). We then selected different species belonging to the Firmicutes (Bacillales, Lactobacillales, Clostridia and Mollicutes) for a simultaneous search. In the result classified by COG, DNA motifs upstream of lexA (COG1974), recA (COG0468), uvrB (COG0556), dinP (COG0389), rpsE (COG0098), rpsN (COG0098), rggD (COG0457) and others were picked up in many species together, and these genes therefore qualify as conserved targets of the LexA regulon (Table 1). LexA is known to autoregulate (23). The recA gene has been shown experimentally to be part of the LexA regulon in Escherichia coli as well as in B.subtilis (23,24). Homologs of dinP have also been shown to be regulated by the LexA protein in Bdellovibrio bacteriovorus (25). The LexA protein has been reported to interact with the regulatory region of uvrB in B.subtilis (19). Together, these observations indicate that the program can successfully identify significant, high-priority targets of a given regulator. The result also highlights many motifs upstream of hypothetical genes/ORFs. Experimental confirmation that these motifs interact with LexA, followed by a functional assay based on the known processes associated with the regulator, could shed more light on the function of these hypothetical genes.

    Table 1 Output of iCR showing the conserved targets of the LexA regulon in Firmicutes

    To test the sensitivity of the iCR predictions, we deleted two important known binding motifs of the LexA protein (those upstream of dinB and uvrB in B.subtilis) from the input form and selected two species of Bacillales, B.subtilis and Bacillus halodurans. Both motifs were picked up on the result page with a blue background, which supports the reliability of the predictions.
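
    A leave-out check of this kind can be sketched in Python as follows. The sketch scores held-out sequences directly instead of scanning a genome, uses a generic log-odds profile with a uniform background, and takes the lowest score among the training sites as the cut-off; all three choices, as well as the function names and the toy sequences, are assumptions for illustration and are not taken from iCR.

```python
import math

BASES = "ACGT"


def build_profile(sites, pseudocount=1.0):
    """Per-column log2-odds weights against an assumed uniform background."""
    profile = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + pseudocount * len(BASES)
        profile.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
            for b in BASES
        })
    return profile


def score(site, profile):
    """Sum the column weights for one candidate site."""
    return sum(profile[i][base] for i, base in enumerate(site))


def leave_out_check(training, held_out):
    """Report whether each held-out sequence scores at or above the cut-off.

    Assumption: the cut-off is the lowest score among the training sites.
    """
    profile = build_profile(training)
    cutoff = min(score(site, profile) for site in training)
    return {site: score(site, profile) >= cutoff for site in held_out}


# Toy sequences: "ACGAAC" plays the role of a deleted known site,
# "TTTTTT" is an unrelated negative control.
training = ["ACGTAC", "ACGTTC", "TCGAAC"]
held_out = ["ACGAAC", "TTTTTT"]
print(leave_out_check(training, held_out))  # {'ACGAAC': True, 'TTTTTT': False}
```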

    iCR results can serve as a useful starting point for molecular and cellular biologists designing experiments to examine the in vitro and in vivo effects of a regulatory protein in different systems.

    CONCLUSION

    To summarize, iCR is a web server that permits high-throughput, detailed and fully automated prediction of the potential binding targets of a regulatory protein in user-selected prokaryotic species. iCR covers 115 prokaryotic species, arranged phylogenetically on the web interface. The first column on the result page, COG, is hyperlinked to NCBI and is fully navigable, giving users easy access to more detailed related information. The genome column shows the genome ID, which is hyperlinked to an HTML page listing the genome names that correspond to the different IDs. For the user's convenience, the functional class code column is also linked to a page describing all the codes. iCR's strengths are its free web accessibility, the option to select multiple species at a time, the sorting of results by COG and class, and its interactive graphical interface.

    ACKNOWLEDGEMENTS

    Research in A.R.'s laboratory is supported by grants from the Council of Scientific and Industrial Research (CSIR) NMITLI, the Department of Biotechnology and the Department of Science and Technology, Govt. of India. S.R. and J.S. are supported by a CSIR NMITLI grant. V.V. is supported by a UGC Research Fellowship and Y.S. is supported by a CSIR Research Fellowship. Funding to pay the Open Access publication charges for this article was provided by the Centre for DNA Fingerprinting and Diagnostics, Hyderabad.

    REFERENCES

    1. Xing, B. and van der Laan, M.J. (2005) A causal inference approach for constructing transcriptional regulatory networks. Bioinformatics, 21, 4007–4013.

    2. Hershberg, R., Yeger-Lotem, E., Margalit, H. (2005) Chromosomal organization is shaped by the transcription regulatory network. Trends Genet., 21, 138–142.

    3. Balazsi, G., Barabasi, A.L., Oltvai, Z.N. (2005) Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli. Proc. Natl Acad. Sci. USA, 102, 7841–7846.

    4. Ren, J. and Prescott, J.F. (2003) Analysis of virulence plasmid gene expression of intra-macrophage and in vitro grown Rhodococcus equi ATCC 33701. Vet. Microbiol., 94, 167–182.

    5. Rodriguez, G.M., Voskuil, M.I., Gold, B., Schoolnik, G.K., Smith, I. (2002) ideR, an essential gene in Mycobacterium tuberculosis: role of IdeR in iron-dependent gene expression, iron metabolism, and oxidative stress response. Infect. Immun., 70, 3371–3381.

    6. Lin, L.H., Lee, H.C., Li, W.H., Chen, B.S. (2005) Dynamic modeling of cis-regulatory circuits and gene expression prediction via cross-gene identification. BMC Bioinformatics, 6, 258.

    7. Wang, T. and Stormo, G.D. (2005) Identifying the conserved network of cis-regulatory sites of a eukaryotic genome. Proc. Natl Acad. Sci. USA, 102, 17400–17405.

    8. Liu, X., Brutlag, D.L., Liu, J.S. (2001) BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput., 127–138.

    9. Liu, Y., Wei, L., Batzoglou, S., Brutlag, D.L., Liu, J.S., Liu, X.S. (2004) A suite of web-based programs to search for transcriptional regulatory motifs. Nucleic Acids Res., 32, W204–W207.

    10. Liu, Y., Liu, X.S., Wei, L., Altman, R.B., Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res., 14, 451–458.

    11. Liu, X.S., Brutlag, D.L., Liu, J.S. (2002) An algorithm for finding protein–DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat. Biotechnol., 20, 835–839.

    12. Conlon, E.M., Liu, X.S., Lieb, J.D., Liu, J.S. (2003) Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl Acad. Sci. USA, 100, 3339–3344.

    13. Jensen, S.T. and Liu, J.S. (2004) BioOptimizer: a Bayesian scoring function approach to motif discovery. Bioinformatics, 20, 1557–1564.

    14. Sinha, S., Blanchette, M., Tompa, M. (2004) PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinformatics, 5, 170.

    15. Yellaboina, S., Seshadri, J., Kumar, M.S., Ranjan, A. (2004) PredictRegulon: a web server for the prediction of the regulatory protein binding sites and operons in prokaryote genomes. Nucleic Acids Res., 32, W318–W320.

    16. Yellaboina, S., Ranjan, S., Chakhaiyar, P., Hasnain, S.E., Ranjan, A. (2004) Prediction of DtxR regulon: identification of binding sites and operons controlled by Diphtheria toxin repressor in Corynebacterium diphtheriae. BMC Microbiol., 4, 38.

    17. Prakash, P., Yellaboina, S., Ranjan, A., Hasnain, S.E. (2005) Computational prediction and experimental verification of novel IdeR binding sites in the upstream sequences of Mycobacterium tuberculosis open reading frames. Bioinformatics, 21, 2161–2166.

    18. Salgado, H., Santos-Zavaleta, A., Gama-Castro, S., Millan-Zarate, D., Diaz-Peredo, E., Sanchez-Solano, F., Perez-Rueda, E., Bonavides-Martinez, C., Collado-Vides, J. (2001) RegulonDB (Version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res., 29, 72–74.

    19. Munch, R., Hiller, K., Barg, H., Heldt, D., Linz, S., Wingender, E., Jahn, D. (2003) PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res., 31, 266–269.

    20. Ishii, T., Yoshida, K., Terai, G., Fujita, Y., Nakai, K. (2001) DBTBS: a database of Bacillus subtilis promoters and transcription factors. Nucleic Acids Res., 29, 278–280.

    21. Tatusov, R.L., Galperin, M.Y., Natale, D.A., Koonin, E.V. (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res., 28, 33–36.

    22. Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D., Koonin, E.V. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res., 29, 22–28.

    23. Little, J.W., Mount, D.W., Yanisch-Perron, C.R. (1981) Purified lexA protein is a repressor of the recA and lexA genes. Proc. Natl Acad. Sci. USA, 78, 4199–4203.

    24. Groban, E.S., Johnson, M.B., Banky, P., Burnett, P.G., Calderon, G.L., Dwyer, E.C., Fuller, S.N., Gebre, B., King, L.M., Sheren, I.N., et al. (2005) Binding of the Bacillus subtilis LexA protein to the SOS operator. Nucleic Acids Res., 33, 6287–6295.

    25. Campoy, S., Salvador, N., Cortes, P., Erill, I., Barbe, J. (2005) Expression of canonical SOS genes is not under LexA repression in Bdellovibrio bacteriovorus. J. Bacteriol., 187, 5367–5375.

    (Sarita Ranjan, Jayshree Seshadri, Vaibha)