Nucleic Acids Research, 2004, issue Da
AANT: the Amino Acid–Nucleotide Interaction Database
     1 Institute for Cellular and Molecular Biology, 2 Department of Chemistry and Biochemistry and 3 Department of Computer Sciences, University of Texas at Austin, Austin, TX 78712-0159, USA

    *To whom correspondence should be addressed. Tel: +1 512 232 3424; Fax: +1 512 471 7014; Email: andy.ellington@mail.utexas.edu

    Present address:

    Michael M. Hoffman, EMBL—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

    ABSTRACT

    We have created an Amino Acid–Nucleotide Interaction Database (AANT; http://aant.icmb.utexas.edu/) that categorizes all amino acid–nucleotide interactions from experimentally determined protein–nucleic acid structures, and provides users with a graphic interface for visualizing these interactions in aggregate. AANT accomplishes this by extracting individual amino acid–nucleotide interactions from structures in the Protein Data Bank, combining and superimposing these interactions into multiple structure files (e.g. 20 amino acids × 5 nucleotides) and grouping structurally similar interactions into more readily identifiable clusters. Using the Chime web browser plug-in, users can view 3D representations of the superimpositions and clusters. The unique collection and representation of data on amino acid–nucleotide interactions facilitates understanding the specificity of protein–nucleic acid interactions at a more fundamental level, and allows comparison of otherwise extremely disparate sets of structures. Moreover, by modularly representing the fundamental interactions that govern binding specificity it may prove possible to better engineer nucleic acid binding proteins.

    INTRODUCTION

    Protein–nucleic acid interactions have been variously described by focusing on either the protein’s primary sequence (such as arginine-rich motifs) (1), the protein’s tertiary structure (e.g. helix–turn–helix motifs) (2,3), the nature of the nucleic acids that are bound (double-stranded versus single-stranded nucleic acids) (4) or the conformational changes that occur during complex formation (‘induced-fit’ versus ‘lock-and-key’ models) (5). However, the data that underlie all of these approaches are the discrete interactions between amino acids and nucleotides. While such amino acid–nucleotide interactions are structurally diverse, it is nonetheless clear that in at least some instances they fall into recognizable classes, such as the ‘arginine fork’ motif in which arginine forms a pseudo-Hoogsteen pairing with guanosine (6).

    The utility of examining amino acid–nucleotide interactions has also been noted by other researchers, who have previously established databases such as the Protein–Nucleic Acid Interaction server (http://www.biochem.ucl.ac.uk/bsm/DNA/server/) (7,8) and the Protein–Side Chain Interactions database (http://www.biochem.ucl.ac.uk/bsm/sidechains/) (9) created by the Thornton group at University College London. However, these databases were created in order to study non-overlapping structures, and hence contain a relatively small number of structures. In addition, they do not contain interactions involving the peptide backbone, sugar or phosphate backbone. Finally, these databases have focused exclusively on DNA structures, and contain no information regarding protein–RNA interactions. While Treger and Westhof (10) have previously published an excellent analysis of a similarly limited number of protein–RNA interactions, there is no available database associated with this analysis.

    We have therefore developed a comprehensive amino acid–nucleotide interaction database (AANT; http://aant.icmb.utexas.edu/) that deconstructs the structures of all known protein–nucleic acid interactions into sets of amino acid–nucleotide interactions. This database should prove extremely useful in determining the extent and breadth of amino acid–nucleotide interactions, in intelligently categorizing such interactions, and eventually in using preferred appositions in the design of altered or novel protein–nucleic acid interactions.

    METHODS

    Deriving interaction super-models from experimental structures

    The software behind AANT consists of a series of Perl and Python scripts, which automatically update AANT once per week. AANT searches the Protein Data Bank (11) for experimental structures that contain both a protein and either a DNA or RNA molecule, and downloads any structures newer than the latest PDB CD-ROM set (Release 104, April 2003). In the case of structures generated with NMR data, the software only considers the first alternative structure. AANT then uses the program HBPLUS (12) to predict hydrogen bonds between single nucleotide residues and single amino acid residues. The bonded structures are broken into scores of individual interactions between the base, the sugar or the phosphate of a nucleotide residue and the side chain or peptide backbone of an amino acid residue.

    AANT creates sub-models for each individual amino acid–nucleotide interaction containing only the residues involved in the interaction. If an amino acid residue or nucleotide residue participates in multiple interactions, it will be duplicated in multiple sub-models. AANT transforms each sub-model into a new coordinate system, centering on a fixed atom in the nucleotide moiety involved (base, sugar or phosphate), while keeping the internal geometry of the interaction intact (13). AANT then assigns each sub-model to an interaction class, defined by the attributes of the interaction involved: the nucleotide (five possibilities), the nucleotide moiety (base, sugar or phosphate; three possibilities), the amino acid (20 possibilities) and the amino acid moiety (side chain or peptide backbone; two possibilities). While there are therefore 600 (= 5 × 3 × 20 × 2) potential interaction classes, only 423 of these classes have actually been observed in experimentally determined structures as of this writing. After creating and classifying all the sub-models, AANT creates a super-model for each interaction class that is a superposition of each sub-model. For a given interaction class, the super-model contains closely overlapping nucleotides surrounded by a more dispersed constellation of amino acids, approaching the nucleotides from all of the directions and with all of the geometries observed in natural structures.
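
    The classification step described above can be illustrated with a short Python sketch. It enumerates the 600 potential interaction classes and groups sub-models by class. The dictionary keys and the function name group_by_class are hypothetical choices made for this illustration and are not taken from the AANT scripts.

```python
from collections import defaultdict
from itertools import product

# The four attributes that define an interaction class in the text.
NUCLEOTIDES = ("A", "C", "G", "T", "U")
NUCLEOTIDE_MOIETIES = ("base", "sugar", "phosphate")
AMINO_ACIDS = (
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
)
AMINO_ACID_MOIETIES = ("side chain", "backbone")

# Every potential class: 5 x 3 x 20 x 2 combinations.
ALL_CLASSES = list(
    product(NUCLEOTIDES, NUCLEOTIDE_MOIETIES, AMINO_ACIDS, AMINO_ACID_MOIETIES)
)


def group_by_class(sub_models):
    """Group sub-models (dicts with hypothetical keys) by interaction class."""
    groups = defaultdict(list)
    for sm in sub_models:
        key = (
            sm["nucleotide"],
            sm["nucleotide_moiety"],
            sm["amino_acid"],
            sm["amino_acid_moiety"],
        )
        groups[key].append(sm)
    return dict(groups)
```

    Each group produced this way would correspond to one super-model; classes that never occur simply have no entry, which is how fewer than 600 classes can be observed.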

    As an example, consider an experimental structure of a protein–DNA complex. AANT predicts dozens of hydrogen bonds between the protein and DNA, including a hydrogen bond between a glutamine side chain and an adenine nucleobase. For the purpose of analyzing this interaction, AANT segregates these two residues from the rest of the structure into a sub-model and for the moment ignores the other amino acid and nucleotide residues. While the adenosine residue in question might be located anywhere within the original structure, AANT defines a new coordinate system for the sub-model so that nitrogen 7 is located at (0, 0, 0), carbon 6 is located at (0, 0, z) and nitrogen 3 is located at (x, 0, z'), where x, z and z' are the three quantities that will conserve the distances and angles between the atoms from the original structure. AANT transforms all atoms in the adenosine and glutamine residues to this new coordinate system, conserving the original distances and angles throughout. AANT then creates a super-model, superimposing this transformed sub-model with the transformed sub-models of all other interactions between glutamine side chains and adenine nucleobases, including sub-models derived from other experimental structures. Since all of these sub-models have undergone the same transformation, the nitrogen 7 atoms of the adenosines are superimposed exactly on top of each other, the rest of the adenosine atoms will be reasonably close to their analogous atom from another residue, and amino acid residues will be scattered around the periphery of the superimposed adenosines, in accord with their original distances and angles from their cognate adenosines. Sugar and phosphate moieties of the adenosine nucleotide residues will also be scattered to a certain degree, depending on their relative conformations with respect to their nucleobases and nitrogen 7 atoms. The output from AANT for this particular example is displayed in Figure 1.
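
    The coordinate transformation in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the AANT implementation: the function name make_frame and the sample coordinates are hypothetical, and the three reference atoms are assumed not to be collinear. The frame places the first atom at the origin, the second on the z axis and the third in the xz plane, which reproduces the (0, 0, 0), (0, 0, z) and (x, 0, z') layout described above while preserving all distances and angles.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(ai / norm for ai in a)


def make_frame(origin_atom, z_axis_atom, xz_plane_atom):
    """Return a function mapping original coordinates into the new frame.

    origin_atom   -> (0, 0, 0)    e.g. N7 of adenine
    z_axis_atom   -> (0, 0, z)    e.g. C6
    xz_plane_atom -> (x, 0, z')   e.g. N3
    The three atoms must not be collinear.
    """
    z_axis = _unit(_sub(z_axis_atom, origin_atom))
    v = _sub(xz_plane_atom, origin_atom)
    along_z = _dot(v, z_axis)
    # Remove the z component of v; what remains defines the x axis.
    x_axis = _unit(tuple(vi - along_z * zi for vi, zi in zip(v, z_axis)))
    y_axis = _cross(z_axis, x_axis)  # completes a right-handed frame

    def transform(point):
        d = _sub(point, origin_atom)
        return (_dot(d, x_axis), _dot(d, y_axis), _dot(d, z_axis))

    return transform


# Hypothetical coordinates, for illustration only.
n7, c6, n3 = (12.1, 4.3, -7.8), (13.0, 5.9, -6.5), (15.2, 3.1, -5.0)
to_frame = make_frame(n7, c6, n3)
```

    Applying to_frame to every atom of the nucleotide and amino acid residues in a sub-model is a rigid-body change of basis, so sub-models transformed in this way can be superimposed directly.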

    Figure 1. Screenshot of AANT. Some 137 superimposed interactions between glutamine side chains and adenine nucleobases are shown. The prevalence of different kinds of interactions is readily apparent. AANT has grouped these interactions into eight clusters. Users may highlight any cluster or hide unwanted clusters. The amino acid residues from Family 3 have been highlighted in green. The superimposed amino acid and nucleotide residues from other families are shown using the Corey–Pauling–Koltun (CPK) color scheme. Users may also rotate the superimposed model in three dimensions, display the partners using space-filling models and save the structure locally.

    Classifying interactions into clusters

    After AANT has created super-models containing all known amino acid–nucleotide interactions, it uses an algorithm to group the amino acid residues within a super-model into clusters based on the ‘simple cluster-seeking algorithm’ described by Tou and Gonzalez (14). For clustering purposes, AANT defines a distance score between two amino acid residues as the number of non-hydrogen atoms in one residue multiplied by the square of the root-mean-square distance between the two residues. AANT begins the clustering process by assigning an initial interaction to a cluster, and then assigns to that cluster all other interactions that fall within a given distance score (initially set at 50 Å²) of the initial interaction or of any interaction that has been added to the initial cluster. AANT repeats this process until it cannot assign any more members to this first cluster, and then begins a second cluster, iterating until it has assigned all interactions to a cluster. If the clustering algorithm produces more than 13 clusters, AANT increases the distance score threshold slightly and iterates the process until 13 or fewer clusters result. (We limit the number of clusters to 13 because two PDB ‘chains’ are assigned to each cluster, and PDB files typically do not contain more than 26 chains.)

    For example, consider a series of superimposed glutamine–adenosine interactions found in the same super-model. If the first glutamine residue has a distance score of 60 Å² from the second, but both have a distance score of just 30 Å² from the third residue, they will all be assigned to the same cluster. If all three of these residues have distance scores that are >50 Å² away from all other glutamine residues in the super-model, then AANT will assign the other residues into new clusters. If this process yields more than 13 clusters, then AANT will discard the clustering assignments and begin the process anew using a distance score threshold of 54 Å².
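
    The clustering procedure can be sketched in Python as shown here. This is an illustration under several assumptions, not the AANT code: each residue is represented as a list of (x, y, z) coordinates for its non-hydrogen atoms in a consistent order, the root-mean-square distance is taken over those matched atoms, and the threshold is raised in steps of 4 Å² (inferred from the 50 to 54 Å² example, since the text says only that it increases ‘slightly’). All function and parameter names are hypothetical.

```python
def distance_score(res_a, res_b):
    """Number of heavy atoms multiplied by the squared RMS distance."""
    n_atoms = len(res_a)
    mean_sq = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(res_a, res_b)
        for a, b in zip(atom_a, atom_b)
    ) / n_atoms
    return n_atoms * mean_sq


def cluster(residues, threshold):
    """Grow each cluster from a seed, adding anything within the threshold
    of the seed or of any residue already added to the cluster."""
    unassigned = list(range(len(residues)))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [
                i for i in unassigned
                if distance_score(residues[current], residues[i]) <= threshold
            ]
            for i in near:
                unassigned.remove(i)
            members.extend(near)
            frontier.extend(near)
        clusters.append(sorted(members))
    return clusters


def cluster_with_cap(residues, threshold=50.0, step=4.0, max_clusters=13):
    """Re-cluster with a larger threshold until the cap is satisfied."""
    while True:
        clusters = cluster(residues, threshold)
        if len(clusters) <= max_clusters:
            return clusters, threshold
        threshold += step  # assumed step size; discard and start anew
```

    Because membership spreads through any residue already in the cluster, two residues that are more than the threshold apart can still share a cluster when a third residue lies close to both, as in the glutamine example above.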

    DISCUSSION

    As of this writing, the PDB contained 930 solved structures of complexes between proteins and nucleic acid molecules. The AANT database extracts all interactions between single bases and single amino acid residues from these structures. For each of these classes of interactions, the AANT software generates a 3D superimposition and clusters structurally similar interactions into families based on spatial similarity. For example, AANT classifies the 194 known interactions between an adenine base and an arginine side chain into 12 families. Using the free Chime chemical display browser plug-in (http://www.mdlchime.com/), the 3D superimposition and renderings of families can be visualized and manipulated in real time. Researchers may also download structural models for further analysis using any of the publicly available tools for manipulating PDB structures, such as RasMol (15) and DeepView (16). However, by using the Chime plug-in it should be possible for users to quickly segue between different structural representations and to thereby generate their own structural hypotheses. In addition to the 3D renderings, AANT also provides a novel, simplified 2D schematic that shows all of the predicted interactions between a given protein and a complexed nucleic acid (Fig. 2).

    Figure 2. Example of a 2D rendering of protein–nucleic acid interactions. The Zif268 zinc-finger peptide (blue) and one strand (red) from a duplex 11-mer oligonucleotide that binds to the peptide (PDB ID: 1aay) are laid out as lines. Moving from left to right, the cross lines join amino acids from N-terminus to C-terminus with nucleotides from 5' to 3'. AANT generates a similar diagram for each experimental structure of a protein–nucleic acid complex.

    While there is no code per se for amino acid–nucleotide interactions, some combinations of amino acids and nucleotides are clearly preferred in the context of protein–nucleic acid interactions. There have been several previous attempts (9,10,17–20) to count amino acid–nucleotide interactions and to use these data to draw conclusions about the specificity of protein–nucleic acid interactions. However, these previous attempts either did not cluster interactions into structural subsets, or involved only a relatively limited number of structures.

    For example, the Nucleic Acid Interaction Library (NAIL) (20) contains theoretical predictions (as opposed to experimental models) of several different kinds of interactions, and overlaps with AANT only in terms of nucleobase–amino acid side chain interactions. When experimental structures were used to evaluate NAIL’s completeness, the authors observed no classes of interactions that were not already within NAIL. This contrasts strongly with our own observations, as AANT contains more classes of experimentally observed amino acid–nucleobase interactions than are included in NAIL. We conjecture that the AANT software observed more types of nucleobase–side chain interactions primarily because it uses different hydrogen-bond prediction software.

    In addition, Thornton and her co-workers have previously generated the Protein–Side Chain Interactions database and have analyzed the apposition of amino acids and nucleotides from 129 non-homologous protein–DNA complexes (9). Similarly, Treger and Westhof (10) analyzed the apposition of amino acids and nucleotides from 45 non-homologous protein–RNA complexes. In contrast, AANT has, as of this writing, almost an order of magnitude more structures, and is continually updated. As an example of the difference in coverage, the Protein–Side Chain Interaction database reports 56 hydrogen-bond interactions with arginine; AANT reports and can classify 100 times that many interactions (>3300 for Arg–DNA, >2700 for Arg–RNA).

    Overall, the larger AANT database largely validates the findings of the smaller statistical samplings, but also provides additional data for analysis and discussion. For example, in a study of 45 non-homologous protein–RNA interactions, Treger and Westhof (10, their table 7) found that 20% of hydrogen-bonded or ionic interactions were between amino acids and ribose, 43% were with phosphate and 38% were with bases. A similar study by Jones and co-workers (8, their table 5) encompassed only 32 non-homologous protein–RNA interactions and found that the proportions were ribose 16%, phosphate 34%, bases 50%. In the AANT database the proportions are ribose 23%, phosphate 51% and bases 26% (Table 1). The numerical discrepancies between these studies may be due to the sample size or to the ways in which hydrogen-bonded contacts were found or counted, but it seems that phosphate contacts are greatly preferred relative to contacts with ribose, typically by a 2:1 margin. This skewing is further exacerbated in protein–DNA interactions, where only 2% of the contacts are with deoxyribose. Luscombe and co-workers previously made this observation (9, their table 2) based on a much smaller data set.

    Table 1. Interactions between substructures

    The data in Table 1 also show that while researchers frequently focus on interactions between amino acid side chains and nucleobases (13,20,21), these only make up 19.8% of the total, predicted hydrogen-bond interactions found in the database. Individual data for DNA (21.7%) and RNA (17.2%) are found in Table 1. These statistics reinforce the conclusion that the peptide backbone and the nucleotide sugar and phosphate contribute significantly to the affinity and specificity of protein–nucleic acid interactions.

    Statistical breakdowns of the identities of amino acid–nucleotide interactions are presented in Tables 2, 3, 4 and 5 and graphically in Figure 3. The χ² values (not including Yates’ correction, because of the generally large sample sizes) were calculated for individual amino acid–nucleotide interactions. Each χ² value corresponds to a 2 × 2 contingency table representing the association of a given amino acid with a given nucleotide, e.g. the four cells in the 2 × 2 table associated with alanine–deoxyadenosine interactions would be the value actually shown in Table 4 (alanine–deoxyadenosine, 115), alanine–non-deoxyadenosine (294), non-alanine–deoxyadenosine (3216) and non-alanine–non-deoxyadenosine (10 331). Hence, all χ² values represent probabilities with one degree of freedom. This analysis is similar to the one performed by Treger and Westhof (10). In general, the positively charged amino acids lysine and arginine mediate the largest number of contacts in protein–nucleic acid interactions, while cysteine and non-polar amino acids mediate the fewest. Conversely, interactions with guanine are overrepresented within almost all of the amino acid classes. These conclusions are again roughly similar to those that have been drawn in previous studies (9,10), but now more detailed analyses of the preferences of amino acids for nucleotides can be carried out on a much larger data set. For example, Treger and Westhof (10) have suggested that in protein–RNA interactions asparagine prefers to hydrogen-bond with uridine (p < 0.0001, one degree of freedom), and that serine prefers adenosine (p < 0.0001), findings that are confirmed in the larger database. Similarly, Luscombe and co-workers (9) found that in protein–DNA interactions arginine and lysine prefer to hydrogen-bond with guanosine. We strongly confirm the preference of arginine for guanosine, but our larger data set finds no statistical support for a preference of lysine for guanosine (p < 0.27); if anything, lysine may prefer thymidine (p < 0.002). In making these comparisons, one caveat is that AANT quantifies interactions between protein backbone residues and nucleotides, while other studies typically do not. The inclusion of protein backbone residues and the larger sample size in AANT allow other hydrogen-bonding preferences not previously noted to become apparent. For example, in protein–RNA complexes we now note the preference of both threonine and tyrosine for adenosine (p < 0.0001), and the preference of threonine for thymidine (p < 0.0001). In protein–DNA complexes glutamine is revealed to prefer adenosine (p < 0.0001), while acidic amino acids prefer cytidine (p < 0.0001). Also, while these ‘positive’ contributions of amino acid–nucleotide interactions to protein–nucleic acid specificity are interesting, the ‘negative’ contributions, in which particular amino acids may avoid particular nucleotides (and vice versa), may also ultimately be important for determining specificity. The comprehensive access to amino acid–nucleotide interactions that is provided by AANT should greatly facilitate researchers attempting to understand, statistically or otherwise, the specificity of protein–nucleic acid interactions.
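
    The 2 × 2 test described in this section can be reproduced with a few lines of Python. The sketch below uses the standard Pearson χ² formula without Yates’ correction and the four alanine–deoxyadenosine cell counts quoted above. The function names are hypothetical, and the p-value uses the identity that holds for one degree of freedom, p = erfc(√(χ²/2)), so no statistics library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no Yates' correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_one_dof(chi2):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Cell counts quoted in the text for alanine and deoxyadenosine:
# Ala-dA, Ala-non-dA, non-Ala-dA, non-Ala-non-dA.
chi2 = chi_square_2x2(115, 294, 3216, 10331)
p = p_value_one_dof(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

    The same two functions apply to any amino acid–nucleotide pair once its four cell counts have been tabulated. The χ² value alone does not indicate whether the pair is enriched or depleted; that follows from comparing the observed count with the count expected from the row and column totals.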

    Table 2. Interactions between amino acids and DNA nucleotides (% in parentheses)

    Table 3. Interactions between amino acids and RNA nucleotides (% in parentheses)

    Table 4. Statistical data for protein–DNA interactions in AANT

    Table 5. Statistical data for protein–RNA interactions in AANT

    Figure 3. An ‘AANTarctica’ representation of the number of interactions between particular amino acids and particular nucleotides. Each interaction is defined by a marker at the intersection of one of 20 gray radial lines representing amino acids and one of five colored curves representing nucleotides. The distance of the marker from the center of the graph varies with the common logarithm of the number of interactions it represents. Nucleotides are depicted by the following markers: purines: A, solid dark blue diamond; G, solid dark maroon square; pyrimidines: C, hollow bright magenta square; T, hollow bright cyan diamond; U, hollow bright green triangle. Amino acids are labeled and organized into categories by label font: non-polar, medium italic; polar, bold italic; positively charged, medium roman; negatively charged, bold roman.

    Even a basal analysis of the database provides an example of how the aggregated data can yield insights into individual protein–nucleic acid interactions. Proline is not normally known to be involved in nucleic acid interactions; indeed, prolines are the least utilized of any amino acid in the current 930 structures in AANT (Table 1). There has been one reported example of an RNA-binding motif that contains proline residues; the herpesvirus Us11 protein contains roughly 27 Arg-X-Pro repeats (22). However, in the context of this motif the proline residues help position the arginine side chains all along one face of a poly-L-proline II helix, and it is the ‘arginine face’ of this helix that actually makes contact with the nucleic acid (23,24).

    The infrequent use of proline as an amino acid for contacting nucleic acids can also be readily observed in the ‘AANTarctica’ graph (Fig. 3), which also reveals interesting statistical anomalies. For instance, there are numerous proline–uridine interactions (29 occurrences) relative to proline–thymidine interactions (two occurrences, despite the fact that there are many more protein–DNA complexes in the PDB). The discrepancy between recognition of uridine and thymidine is also apparent for aspartate (94 occurrences in protein–RNA complexes, but only nine in protein–DNA complexes). The larger data set available through AANT makes these discrepancies more evident than in previous analyses. Luscombe and co-workers (9, their table 2) noted a single hydrogen-bond interaction between proline and DNA; AANT has captured eight such interactions thus far, while Treger and Westhof (10, their table 9) found 10 hydrogen-bond interactions between proline and RNA, compared with the 92 currently found in AANT. In addition, the breakdown of interaction data available via AANT makes apparent an interesting trend that was not noted in either of these two earlier studies with smaller data sets: proline contacts the nucleobase most often (75%) in protein–DNA interactions, but instead contacts the ribose most often (75%) in protein–RNA interactions.

    Upon closer examination (an examination that was facilitated by the ability to extract lists of protein–nucleic acids from the 3D renderings of amino acid–nucleotide interactions), the proline–uridine interactions appear to occur within one major protein type, that of ribosomal protein subunits. All of the current proline–uridine interactions in AANT involve either the S12 or S17 protein in the 30S ribosomal complex. Other ribosomal proteins also appeared to utilize proline to interact with rRNA. Many adenine–proline interactions can be found in ribosomal proteins, including L11, S2, S8, S10 and S11. Extending the examination to interactions with cytidine and guanosine again yields primarily contacts between ribosomal proteins and ribosomal RNA. The fact that the preponderance of proline contacts occurs in ribosomal proteins may merely be a function of the skewing of the database towards such proteins, or there may be a more interesting functional explanation. The proline backbone appears to interact closely with the ribose sugars of nucleotides. It is possible that these types of interactions are part of a strategy to promote the close approach of a protein over a large surface area, allowing the peptide backbone to extend (via proline ‘kinks’) along the surface of the ribosome.

    In support of this notion, the other major sources of proline contacts, besides the ribosome, are other structures in which a protein and an RNA molecule contact one another over a large surface area, such as tRNA synthetases. Adenosine–proline contacts were seen in glutamyl-, aspartyl- and methionyl-tRNA synthetases, cytidine interactions were observed in aspartyl- and valyl-tRNA synthetases, and guanosine contacts in glutaminyl-tRNA synthetase. While these hypotheses are unproven at the moment, they provide an example of the type of insights that may be possible by using AANT as a tool for structure analysis.

    AANT represents a unique tool that will allow structural biologists who study nucleic acid binding proteins or biochemists who study protein–nucleic acid interactions to better understand their particular molecule or interaction in the context of all possible protein–nucleic acid interactions. For example, while researchers commonly compare proteins that fall within a given structural class (e.g. helix–turn–helix proteins) it is difficult to find commonalities that do not rely upon a given structure (e.g. arginine-rich motifs) or that may extend across classes (25). Since AANT visually represents the geometries of individual amino acid–nucleotide interactions, common or new types of protein–nucleic acid interfaces can be quickly evaluated merely by scanning. Nascent structural hypotheses can then be validated or rejected by generating summaries of the proteins and nucleic acids involved, or by tabulating the relative frequencies of different types of interactions.

    One of the major reasons for creating AANT was to better enable a variety of protein and nucleic acid engineering endeavors. First, in choosing which proteins or nucleic acids to engineer, the linear representations of protein–nucleic acid interactions that AANT can generate (Fig. 2) will assist researchers in identifying interfaces that are concentrated within short sequence stretches, and thus that can potentially be most easily engineered by cloning mutant oligonucleotides or PCR products. Second, the modular dissection of protein–nucleic acid complexes into amino acid–nucleotide complexes should facilitate the design of novel protein–nucleic acid interfaces. Previously, modular dissections of nucleic acid structures provided a database of examples of nucleotide–nucleotide interactions that allowed new nucleic acid structures to be accurately modeled de novo (26). A similar approach could presumably be taken using the amino acid–nucleotide interaction data provided by AANT. It is possible that the experimentally determined interactions could be modularly introduced into extant protein–nucleic acid scaffolds, replacing sterically similar interactions and altering specificities.

    FUTURE DIRECTIONS

    We are in the process of introducing several improvements to AANT. We intend to include predicted interactions other than hydrogen bonds, such as stacking interactions and van der Waals interactions, and will include these as separate interaction classes. We also plan to give users more options for choosing whether to display and consider interactions from redundant or alternative structures of the same protein–nucleic acid interaction. This should allow a user to more conveniently determine whether a statistic or correlation is due to skewing of the entries in the PDB. Finally, we propose to improve the utility of the database by adding a search algorithm that will identify sets of proteins that utilize similar amino acid geometries for interacting with their nucleic acid ligands.

    ACKNOWLEDGEMENTS

    This work was supported by Office of Naval Research Grant number N00014-99-1-0861.

    REFERENCES

    1. Weiss,M.A. and Narayana,N. (1998) RNA recognition by arginine-rich peptide motifs. Biopolymers, 48, 167–180.

    2. Reedstrom,R.J., Brown,M.P., Grillo,A., Roen,D. and Royer,C.A. (1997) Affinity and specificity of trp repressor–DNA interactions studied with fluorescent oligonucleotides. J. Mol. Biol., 273, 572–585.

    3. Otwinowski,Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L., Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B. (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.

    4. Draper,D.E. (1999) Themes in RNA–protein recognition. J. Mol. Biol., 293, 255–270.

    5. Leulliot,N. and Varani,G. (2001) Current topics in RNA–protein recognition: control of specificity and biological function through induced fit and conformational capture. Biochemistry, 40, 7947–7956.

    6. Tao,J. and Frankel,A.D. (1992) Specific binding of arginine to TAR RNA. Proc. Natl Acad. Sci. USA, 89, 2723–2736.

    7. Jones,S., van Heyningen,P., Berman,H.M. and Thornton,J.M. (1999) Protein–DNA interactions: a structural analysis. J. Mol. Biol., 287, 877–896.

    8. Jones,S., Daley,D.T., Luscombe,N.M., Berman,H.M. and Thornton,J.M. (2001) Protein–RNA interactions: a structural analysis. Nucleic Acids Res., 29, 943–954.

    9. Luscombe,N.M., Laskowski,R.A. and Thornton,J.M. (2001) Amino acid–base interactions: a three-dimensional analysis of protein–DNA interactions at an atomic level. Nucleic Acids Res., 29, 2860–2874.

    10. Treger,M. and Westhof,E. (2001) Statistical analysis of atomic contacts at RNA–protein interfaces. J. Mol. Recognit., 14, 199–214.

    11. Westbrook,J., Feng,Z., Chen,L., Yang,H. and Berman,H.M. (2003) The Protein Data Bank and structural genomics. Nucleic Acids Res., 31, 489–491.

    12. McDonald,I.K. and Thornton,J.M. (1994) Satisfying hydrogen-bonding potential in proteins. J. Mol. Biol., 238, 777–793.

    13. Pabo,C.O. and Nekludova,L. (2000) Geometric analysis and comparison of protein–DNA interfaces: why is there no simple code for recognition? J. Mol. Biol., 301, 597–624.

    14. Tou,J.T. and Gonzalez,R.C. (1977) Pattern Recognition Principles. 2nd edn. Addison-Wesley, Reading, MA.

    15. Sayle,R.A. and Milner-White,E.J. (1995) RASMOL: biomolecular graphics for all. Trends Biochem. Sci., 20, 374.

    16. Schwede,T., Kopp,J., Guex,N. and Peitsch,M.C. (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res., 31, 3381–3385.

    17. Suzuki,M. (1994) A framework for the DNA–protein recognition code of the probe helix in transcription factors: the chemical and stereochemical rules. Structure, 2, 317–326.

    18. Mandel-Gutfreund,Y., Schueler,O. and Margalit,H. (1995) Comprehensive analysis of hydrogen-bonds in regulatory protein–DNA complexes: in search of common principles. J. Mol. Biol., 253, 370–382.

    19. Kono,H. and Sarai,A. (1999) Structure-based prediction of DNA target sites by regulatory proteins. Proteins, 35, 114–131.

    20. Cheng,A.C., Chen,W.W., Fuhrmann,C.N. and Frankel,A.D. (2003) Recognition of nucleic acid bases and base-pairs by hydrogen-bonding to amino acid side-chains. J. Mol. Biol., 327, 781–796.

    21. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl Acad. Sci. USA, 73, 804–808.

    22. Khoo,D., Perez,C. and Mohr,I. (2002) Characterization of RNA determinants recognized by the arginine- and proline-rich region of Us11, a herpes simplex virus type 1-encoded double-stranded RNA binding protein that prevents PKR activation. J. Virol., 76, 11971–11981.

    23. Roller,R.J., Monk,L.L., Stuart,D. and Roizman,B. (1996) Structure and function in the herpes simplex virus 1 RNA-binding protein Us11: mapping of the domain required for ribosomal and nucleolar association and RNA binding in vitro. J. Virol., 70, 2842–2851.

    24. Gresh,N. (1996) Can a polyproline II helical motif be used in the context of sequence-selective major groove recognition of B-DNA? A molecular modelling investigation. J. Biomol. Struct. Dyn., 14, 255–273.

    25. Pabo,C.O. and Sauer,R.T. (1992) Transcription factors: structural families and principles of DNA recognition. Annu. Rev. Biochem., 61, 1053–1095.

    26. Leclerc,F., Cedergren,R. and Ellington,A.D. (1994) A three-dimensional model of the Rev-binding element of HIV-1 derived from analyses of aptamers. Nature Struct. Biol., 1, 293–300.

    (Michael M. Hoffman1,2, Maksim A. Khrapov)