Dimerization specificity of all 67 B-ZIP motifs in Arabidopsis thaliana
Nucleic Acids Research
Department of Biological Sciences, Purdue University, West Lafayette, IN 47907-1392, USA, 1 Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA and 2 Department of Molecular Plant Physiology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
* To whom correspondence should be addressed. Tel: +1 301 496 8753; Fax: +1 301 496 8419; Email: vinsonc@dc37a.nci.nih.gov
Present addresses: Christopher D. Deppmann, Department of Neuroscience, Johns Hopkins Medical School, Baltimore, MD, USA
Sjef Smeekens, Yacht Biochemistry and Life Sciences, Yacht Technology, Utrecht, The Netherlands
ABSTRACT
Basic region-leucine zipper (B-ZIP) proteins are a class of dimeric sequence-specific DNA-binding proteins unique to eukaryotes. We have identified 67 B-ZIP proteins in the Arabidopsis thaliana genome. No A.thaliana B-ZIP domains are homologous with any Homo sapiens B-ZIP domains. Here, we predict the dimerization specificity properties of the 67 B-ZIP proteins in the A.thaliana genome based on three structural properties of the dimeric α-helical leucine zipper coiled coil structure: (i) length of the leucine zipper, (ii) placement of asparagine or a charged amino acid in the hydrophobic interface and (iii) presence of interhelical electrostatic interactions. Many A.thaliana B-ZIP leucine zippers are predicted to be eight or more heptads in length, in contrast to the four or five heptads typically found in H.sapiens, a prediction verified experimentally by circular dichroism analysis of the related maize B-ZIP protein Opaque. Asparagine in the a position of the coiled coil is typically observed in the second heptad in H.sapiens. In A.thaliana, asparagine is abundant in the a position of both the second and fifth heptads. The particular placement of asparagine in the a position helps define 14 families of homodimerizing B-ZIP proteins in A.thaliana, in contrast to the six families found in H.sapiens. The repulsive interhelical electrostatic interactions that are used to specify heterodimerizing B-ZIP proteins in H.sapiens are largely absent in A.thaliana. Instead, we predict that plant leucine zippers rely on charged amino acids in the a position to drive heterodimerization. It appears that A.thaliana defines many families of homodimerizing B-ZIP proteins by having long leucine zippers with asparagine judiciously placed in the a position of different heptads.
INTRODUCTION
Basic region-leucine zipper (B-ZIP) proteins are dimeric transcription factors that bind to DNA in a sequence-specific manner (1,2). This class of protein is found exclusively in eukaryotes. Previously, we predicted the genome-wide dimerization properties for all 53 Homo sapiens (3) and 27 Drosophila melanogaster (4) B-ZIP motifs. A recent experimental analysis of the leucine zipper dimerization specificity of most H.sapiens B-ZIP proteins (5) has verified our previous predictions. This gives us confidence that we can use the same structural rules to predict the dimerization specificity of B-ZIP proteins from other organisms. Recently, the Arabidopsis thaliana genome was sequenced and the B-ZIP proteins within this genome were annotated (6,7). These proteins are important for pathogen defense, light-induced signaling, seed maturation and flower development (7). Many of the features of the leucine zipper that are critical for specifying the dimerization properties of H.sapiens B-ZIP proteins are conserved in A.thaliana B-ZIP proteins. Thus, we can use our understanding of mammalian leucine zipper dimerization specificity to predict the dimerization properties of A.thaliana B-ZIP proteins.
When bound to DNA, each B-ZIP monomer is a long α-helix (Figure 1). The N-terminal half binds in the major groove of double-stranded DNA in a sequence-specific manner. The C-terminal half is amphipathic and can dimerize to form a parallel coiled coil, termed a leucine zipper (8). B-ZIP proteins form homodimers and/or heterodimers depending on the amino acid sequence of the leucine zipper (9). Extensive mutagenic and biophysical studies have characterized the most common interhelical amino acid interactions observed in human B-ZIP proteins by quantifying the contribution of individual amino acids to both dimerization stability and specificity (10–13). This knowledge has allowed us to rationalize what is known about the dimerization properties of B-ZIP domains and to predict how uncharacterized B-ZIP proteins will dimerize with each other.
Figure 1. X-ray structure of the GCN4 B-ZIP motif bound to double-stranded DNA (1). The N-terminus of the protein, the basic region and the leucine zipper are labeled. The first three heptads of the leucine zipper are delineated.
Each monomer in the leucine zipper has a structural repeat of two α-helical turns (seven amino acids) (supplementary Figure 1). This repeat is termed a heptad, with each of the seven positions assigned the letter designation a, b, c, d, e, f or g. The a, d, e and g positions are near the leucine zipper interface and dictate dimerization specificity (3). Amino acids in the a and d positions lie on the same side of the α-helix and are typically hydrophobic. These hydrophobic amino acids interact interhelically with hydrophobic amino acids in the same a and d positions of the second α-helix of the leucine zipper to stabilize the structural dimer (14–16). Typically, the d position contains leucine, which stabilizes the dimer better than other amino acids (16).
Dimerization specificity is regulated by amino acids in the a, e and g positions. In animal B-ZIP proteins, the second heptad a position typically contains asparagine, which limits leucine zipper oligomerization to dimers (17,18). As depicted in supplementary Figure 1, asparagine prefers an interhelical interaction with an asparagine in the a position of a partner protein and does not interact efficiently with an aliphatic amino acid (12,19). Charged amino acids in the a position inhibit homodimer formation (12,20), while lysine residues in this position permit heterodimer formation with a wide range of amino acids (12).
The g and e positions often contain charged amino acids that form attractive electrostatic interhelical interactions (3,21–23). These interactions are denoted ge', where the prime (') indicates a residue on the second α-helix of the dimeric leucine zipper. ge' interactions between oppositely charged amino acids (e.g. ER or KE) are attractive and promote dimerization specificity, while ge' interactions between similarly charged amino acids (e.g. EE or RR) are repulsive and inhibit homodimerization. For the mammalian B-ZIP protein FOS, repulsive EE interactions involving glutamate residues discourage homodimer formation, while attractive EK interactions between the glutamate of FOS and the lysine of JUN promote heterodimer formation (9).
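The attractive/repulsive rule above can be sketched as a small classifier. This is a minimal illustration, not the authors' software: it assumes heptads written in the gabcdef register used in Figure 2 (g is the first residue, e the sixth), treats D and E as acidic and K and R as basic (histidine is treated as uncharged), and scores any same-charge pair as repulsive and any opposite-charge pair as attractive, which generalizes the specific pairs listed in the Figure 2 color key. The function names are hypothetical.

```python
# Hypothetical helpers: classify the g/e' pair of each heptad written in the
# gabcdef register (g = index 0, e = index 5), as in Figure 2.
ACIDIC = {"D", "E"}
BASIC = {"K", "R"}
CHARGED = ACIDIC | BASIC


def classify_ge_pair(g: str, e: str) -> str:
    """Return 'attractive', 'repulsive' or 'none' for one g-e' residue pair."""
    if g in CHARGED and e in CHARGED:
        # Both residues charged: same sign repels, opposite signs attract.
        same_charge = (g in ACIDIC) == (e in ACIDIC)
        return "repulsive" if same_charge else "attractive"
    return "none"  # at least one residue is uncharged


def ge_pairs(heptads: list[str]) -> list[str]:
    """Classify the g/e' pair of every complete seven-residue heptad."""
    return [classify_ge_pair(h[0], h[5]) for h in heptads if len(h) == 7]


if __name__ == "__main__":
    print(classify_ge_pair("E", "K"))  # attractive
    print(classify_ge_pair("E", "E"))  # repulsive
```

Applied to a whole zipper, `ge_pairs` gives the per-heptad pattern of attractive and repulsive pairs that Figure 2 encodes with colors.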
In this study, we identify 67 B-ZIP motifs in the recently sequenced A.thaliana genome (6). None of these proteins is homologous to a human B-ZIP protein, but they use similar amino acids to regulate dimerization specificity. We have annotated these B-ZIP protein sequences and have predicted their dimerization properties. Our analysis reveals that many A.thaliana B-ZIP proteins have longer leucine zippers than observed in H.sapiens. In addition, unlike human B-ZIP proteins, dimerization of most of these zippers is not specified by attractive and repulsive ge' interactions, but by the unique placement of asparagines in heptad a positions throughout the leucine zipper. These structural features generate 14 homodimerizing families of A.thaliana B-ZIP proteins, in contrast to the six homodimerizing families observed in H.sapiens.
METHODS
Pattern matching
The pattern matching program at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl was used to identify A.thaliana proteins that contain putative B-ZIP motifs. Two types of regular expressions were used to query this database. The ‘Basic Region’ expression (XXX X) was modified from (24). The ‘B-ZIP’ expressions used were variations of (RXXXXXXXXXXXXXXXXXX) previously used to identify B-ZIP proteins in D.melanogaster (4).
Multiple alignment and phylogenetic tree analysis
Multiple alignments were performed using ClustalW with default options. Phylogenetic analysis was performed using TreeView (version 1.6.6) (26).
Proteins
Heterodimerizing proteins containing mutations in the a position were generated as described previously (12). Thermodynamic parameters were determined from denaturation curves, assuming a two-state equilibrium dissociation of α-helical dimers into unfolded monomers and using a ΔCp of 2.04 kcal/mol/°C, as described previously (12). Coupling energy values (ΔΔG) are reported at 37°C.
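The two-state analysis described above can be illustrated with a minimal sketch. It assumes dimer dissociation N2 ⇌ 2U with K = [U]²/[N2], a temperature-independent ΔCp (the 2.04 kcal/mol/°C quoted above; a heat capacity per °C equals one per K), and a melting temperature T_m defined as the midpoint of the transition at total monomer concentration P_t. This is not the authors' fitting code; the function names and the ΔH_m and T_m values in the example are hypothetical.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def delta_g_unfold(t_k, dh_m, t_m, dcp, p_total):
    """Standard free energy (kcal/mol, 1 M standard state) of N2 <-> 2U.

    t_k: temperature (K); dh_m: unfolding enthalpy at t_m (kcal/mol);
    t_m: midpoint temperature (K) measured at total monomer concentration
    p_total (M); dcp: heat-capacity change (kcal/mol/K).
    """
    return (dh_m * (1.0 - t_k / t_m)
            + dcp * ((t_k - t_m) - t_k * math.log(t_k / t_m))
            - R * t_k * math.log(p_total))


def fraction_unfolded(t_k, dh_m, t_m, dcp, p_total):
    """Fraction of monomers unfolded at t_k for total monomer p_total (M).

    With K = [U]^2/[N2] and p_total = 2[N2] + [U], the unfolded fraction f
    satisfies 2*p_total*f^2 + K*f - K = 0; the positive root is written in a
    form that avoids cancellation when K is large.
    """
    k = math.exp(-delta_g_unfold(t_k, dh_m, t_m, dcp, p_total) / (R * t_k))
    return 2.0 * k / (k + math.sqrt(k * k + 8.0 * p_total * k))


# Hypothetical example: dH_m = 80 kcal/mol and T_m = 57 C at 4 uM monomer,
# with the dCp quoted in the text; stability extrapolated to 37 C.
dg_37 = delta_g_unfold(310.15, 80.0, 330.15, 2.04, 4e-6)
```

In a fit, ΔH_m and T_m would be adjusted until `fraction_unfolded` reproduces the measured denaturation curve. The −RT ln P_t term makes the dimer's T_m concentration dependent, which is why P_t must be recorded alongside T_m.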
The amino acid sequence of the Opaque B-ZIP domain construct is as follows. The first 13 amino acids are from phi10 and are followed by the basic region and the leucine zipper, which is divided into heptads (gabcdef). ASMTGGQQMGRDP-EILGFKMPTEERVRKRKESNRESARRSRYRKAA HLKELED QVAQLKA ENSCLLR RIAALNQ KYNDANV DNRVLRA DMETLRA KVKMGED SLKRVIE MSSSVPSS. In this sequence, the asparagines in the a position of the second and fifth heptads are the N residues of ENSCLLR and DNRVLRA. Circular dichroism (CD) studies were performed in buffer containing 12.5 mM KPO4 (pH 7.4), 150 mM KCl and 0.25 mM DTT using a Jasco J-720 spectropolarimeter as described previously (12).
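To make the heptad register concrete, the sketch below splits the Opaque sequence given above into its gabcdef heptads and lists the residue at each a position. It relies only on the spacing of the sequence as printed, and the helper names are hypothetical. With zero-based indexing, the asparagines fall in heptads 2 and 5; this appears to match the "second and fifth heptads" of the text if the first zipper heptad is numbered 0, as the "heptad 0" of Supplementary Figure 4 suggests.

```python
# Opaque construct exactly as listed above: phi10 tag, hyphen, basic region,
# then the leucine zipper written as space-separated gabcdef heptads.
OPAQUE = ("ASMTGGQQMGRDP-EILGFKMPTEERVRKRKESNRESARRSRYRKAA HLKELED QVAQLKA "
          "ENSCLLR RIAALNQ KYNDANV DNRVLRA DMETLRA KVKMGED SLKRVIE MSSSVPSS")

REGISTER = "gabcdef"  # order of positions within each written heptad


def zipper_heptads(annotated: str) -> list[str]:
    """Return the complete seven-residue heptads that follow the basic region."""
    blocks = annotated.split()[1:]             # skip tag + basic region
    return [b for b in blocks if len(b) == 7]  # skip non-heptad blocks (8-residue tail)


def residues_at(heptads: list[str], position: str) -> list[str]:
    """Residue found at one register position (e.g. 'a') of every heptad."""
    idx = REGISTER.index(position)
    return [h[idx] for h in heptads]


heptads = zipper_heptads(OPAQUE)
a_positions = residues_at(heptads, "a")
asn_heptads = [i for i, aa in enumerate(a_positions) if aa == "N"]
print(a_positions)  # ['L', 'V', 'N', 'I', 'Y', 'N', 'M', 'V', 'L']
print(asn_heptads)  # [2, 5]
```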
RESULTS
Identification and phylogenetic analysis of Arabidopsis thaliana B-ZIP domains
Previous studies by two groups used automated motif identification software to identify 75 (7) and 81 (27) B-ZIP factors in the A.thaliana genome. We used the web-based TAIR genehunter 2.0 tool at http://www.arabidopsis.org/cgi-bin/geneform/geneform.pl to search several different databases and assembled a list of proteins identified as B-ZIP factors. Each of these proteins was inspected visually and included in our analysis if it appeared to be a bona fide B-ZIP factor based on two criteria: (i) a well-defined basic region including a near-invariant asparagine critical for sequence-specific DNA binding and (ii) a canonical leucine zipper region with obvious amphipathic qualities. This analysis yielded 67 B-ZIP proteins.
We also performed a search of the A.thaliana genome using a web-based pattern search tool at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl to search known and predicted proteins for amino acid patterns representing the entire B-ZIP domain (XXXNXXRXXXXXXXXXXXXXXXXXX) or just the basic region (XXXX). These patterns have been used successfully to identify B-ZIP factors in the genomes of Saccharomyces cerevisiae (24), D.melanogaster (4) and H.sapiens (3). This strategy identified the same 67 A.thaliana B-ZIP proteins.
Supplementary Table 1 lists the accession numbers, synonyms, protein length and position of the B-ZIP motif in the whole protein for the 67 B-ZIP factors identified in this genome-wide search. Eight proteins (AtbZIP14, AtbZIP27, AtbZIP32, AtbZIP34, AtbZIP61, AtbZIP72, AtbZIP73 and AtbZIP75) were reported earlier as B-ZIP proteins (7) but are not represented in this study. These eight factors are leucine rich, but did not meet our B-ZIP selection criteria. For example, AtbZIP14 and AtbZIP27 each have perfect basic regions, but no leucine zipper. The sequences of AtbZIP32, AtbZIP73 and AtbZIP75 were analyzed and yielded no recognizable B-ZIP domain. Riechmann et al. (27) did not provide accession numbers for the 81 B-ZIP factors identified in their study. This complicated our ability to analyze the sequences from their study that are not included in our list.
The amino acid relatedness of the 67 A.thaliana and 53 H.sapiens B-ZIP motifs was determined using ClustalW and the information is presented as a cladogram (Supplementary Figure 2). No interspecies clustering is observed, indicating that there are no homologous A.thaliana and H.sapiens B-ZIP domains. This is in contrast to the near total overlap between D.melanogaster and H.sapiens B-ZIP proteins (4).
Analysis of the 67 A.thaliana B-ZIP leucine zippers
Alignment and amino acid content
Figure 2 presents the amino acid sequences of the 67 A.thaliana B-ZIP motifs, aligned using a conserved asparagine residue in the basic region of all proteins (data not shown). This stringent criterion excluded from our analysis rare B-ZIP proteins, such as the mammalian GADD153 protein, that lack this conserved asparagine in the basic region (28). The N-terminal boundary of the leucine zipper is identical for all B-ZIP proteins because it is defined by the presence of the basic region. To identify the C-terminal boundary, three criteria were used: (i) the presence of a proline or a pair of glycines, either of which disrupts the α-helical structure of a leucine zipper, (ii) the presence of a leucine in the d position and (iii) the presence of charged amino acids in the g and e positions. Amino acids within the zippers that are important for dimerization specificity are color-coded using the key developed for the analysis of the D.melanogaster (4) and H.sapiens (3) B-ZIP proteins. The proteins are listed beginning with the zippers that are predicted to form only homodimeric interhelical interactions (families A–N), followed by the zippers that are predicted to form both homodimeric and heterodimeric interactions (families O–S). The TGA family of B-ZIP proteins (family T in Figure 2) contains short leucine zippers with charged amino acids in all a positions.
Figure 2. Amino acid sequence of the 67 A.thaliana B-ZIP domains. Proteins with similar predicted dimerization properties are arranged into families, A–T, shown in bold in the first column. The second column gives the family names from Jakoby et al. (7). The solid line separates homodimerizing proteins (A–N) from proteins that have complex dimerization patterns (O–T). The leucine zipper region is divided into heptads (gabcdef) to help visualize the ge' pairs. Amino acids predicted to regulate dimerization specificity are color-coded. If the g and following e positions contain charged amino acids, the heptad is colored from g to the following e. We use four colors to represent ge' pairs: green for attractive basic-acidic pairs (RE and KE), orange for attractive acidic-basic pairs (ER, EK, DR and DK), red for repulsive acidic pairs (EE and ED) and blue for repulsive basic pairs (KK and RK). If only one of the two amino acids in the ge' pair is charged, that residue is colored blue if basic and red if acidic. If the a or d position is polar, it is colored black, and if it is charged, it is colored purple. Prolines and glycines are colored red to indicate a potential break in the α-helical structure. The predicted C-terminal boundary of each B-ZIP leucine zipper is denoted by an asterisk; this boundary is used to calculate the frequencies of amino acids in the different positions of the leucine zipper. The C-terminal boundary is defined by the presence of hydrophobic amino acids in the a and d positions, charged amino acids in the e and g positions, the absence of proline or pairs of glycines anywhere in the structure, and the absence of charged amino acids in the a and d positions. In the majority of cases, the decision was straightforward. In several instances, however, it is more ambiguous.
For example, we indicate that the DPBF3 and DPBF4 leucine zippers stop after the second heptad because the third heptad has a K and an E in the a and d positions, respectively, which should prevent leucine zipper formation. For the same reason, the At5g44080 and GBF4 zippers stop after the third heptad, as they have an R and a K in the a and d positions, respectively, of the fourth heptad. The At1g06070 zipper extends to the ninth heptad even though it has two glycines in the fourth heptad. We would normally expect these glycines to terminate the leucine zipper structure, but because the sequence appears very canonical from the fifth to the ninth heptads, we propose that the zipper continues through the fourth heptad. Similar reasoning was used to define the C-terminal boundaries of all the leucine zippers. These boundaries are intended to be approximate rather than definitive. The natural C-terminus is denoted with a circumflex accent (^). The protein sequences of the tenth heptads of posF21, an N family member, and At2g13150, a P family member, which are predicted not to form a coiled coil, are LTGQVAP and VLISNEK, respectively. A dot in front of a protein name indicates that it has been shown experimentally to form a homodimer.
Figure 3 presents the kinds of amino acids found in the a, d, e and g positions, the coiled-coil positions known to regulate dimerization stability and specificity, using the definition of the C-terminus shown in Figure 2. Overall, A.thaliana B-ZIP proteins contain fewer charged amino acids in the g and e positions than H.sapiens zippers. These charged amino acids produce attractive and repulsive ge' pairs that regulate dimerization specificity. For the a position, >50% of the residues in H.sapiens are aliphatic (I, V, L or M) and only 16% are asparagine. In contrast, in A.thaliana a positions, aliphatic amino acids are less abundant while asparagine is more abundant. In the d position, the uniquely stabilizing leucine is more abundant in H.sapiens (66%) than in A.thaliana (56%), while other aliphatic amino acids (I, V and M) are more frequent in A.thaliana (19%) than in H.sapiens (9%). While there are significant differences in the relative frequencies of amino acids appearing in the a, d, e and g positions of the heptads from plants and humans, the obvious conservation in amino acid usage provides assurance that the structural rules that we have applied to analyzing H.sapiens B-ZIP dimerization specificity (3) can be used to predict the dimerization of A.thaliana B-ZIP proteins.
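Pooled position frequencies of the kind shown in Figure 3 can be tallied as follows. The four residue classes (aliphatic I, V, L and M; asparagine; charged D, E, K and R; everything else) are an assumption chosen to mirror the comparisons made in this paragraph, whereas Figure 3 itself reports individual amino acids. Each zipper is assumed to be a list of gabcdef heptads already truncated at its C-terminal boundary, and the function names are hypothetical.

```python
from collections import Counter

REGISTER = "gabcdef"
ALIPHATIC = set("IVLM")
CHARGED = set("DEKR")


def residue_class(aa: str) -> str:
    """Coarse residue class used for this illustration only."""
    if aa in ALIPHATIC:
        return "aliphatic"
    if aa == "N":
        return "asparagine"
    if aa in CHARGED:
        return "charged"
    return "other"


def class_frequencies(zippers: list[list[str]], position: str) -> dict[str, float]:
    """Fraction of each residue class at one register position, pooled over
    every heptad of every zipper."""
    idx = REGISTER.index(position)
    counts = Counter(residue_class(h[idx]) for z in zippers for h in z)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}
```

Running `class_frequencies` separately on the A.thaliana and H.sapiens sets for positions a, d, e and g gives the side-by-side comparison the paragraph describes.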
Figure 3. Pie chart presenting the frequency of amino acids in all the g, e, a and d positions of the leucine zipper for both H.sapiens and A.thaliana B-ZIP proteins.
Leucine zipper length
The leucine zipper lengths presented in Figure 2 are subjective decisions based on the expectation that each heptad will be α-helical. Several discrete criteria suggest that A.thaliana B-ZIP leucine zippers can be longer than those of H.sapiens. Supplementary Figure 3 categorizes the B-ZIP proteins from A.thaliana and H.sapiens by heptad number, using either the natural C-terminus or the presence of a proline or two glycines as the terminator of the α-helix. Using these criteria, we observe that over half of H.sapiens leucine zippers terminate after four or six heptads and none is longer than nine heptads. In contrast, A.thaliana leucine zippers are more variable. Of the A.thaliana B-ZIP proteins, 10% have short zippers (3 heptads) and >30% have no α-helix breakers for 10 or more heptads. This does not mean that these zippers are necessarily that long, only that their structural limits are less well defined than those of H.sapiens B-ZIP proteins.
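The helix-breaker criterion used for Supplementary Figure 3 can be sketched as a simple scan. This is an illustration under stated assumptions, not the authors' procedure: "two glycines" is read here as an adjacent Gly-Gly dipeptide (two separated glycines within one heptad, as mentioned for At1g06070, would not be caught), every heptad is assumed to be a complete seven-residue block, and the function name is hypothetical.

```python
def heptads_before_breaker(heptads: list[str]) -> int:
    """Number of complete heptads preceding the first helix breaker.

    A breaker is any proline or an adjacent Gly-Gly pair. Searching the
    joined sequence also catches a GG pair that straddles two heptads.
    With no breaker, the zipper runs to its natural C-terminus, so every
    heptad is counted.
    """
    seq = "".join(heptads)
    breaker_sites = [i for i, aa in enumerate(seq) if aa == "P"]
    gg = seq.find("GG")
    if gg != -1:
        breaker_sites.append(gg)
    if not breaker_sites:
        return len(heptads)
    # Integer division by 7 converts the residue index into a heptad index.
    return min(breaker_sites) // 7
```

Because this count stops only at prolines, GG pairs or the end of the sequence, it gives an upper bound on zipper length, consistent with the caveat that such counts do not mean the zippers are actually that long.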
The frequencies of stabilizing leucine and of other hydrophobic amino acids in the d positions of A.thaliana and H.sapiens leucine zippers (16,29) were compared (Supplementary Figure 4). In H.sapiens, leucine is found in 100% of the second heptad d positions and in >80% of the d positions in heptads 0 to 3. In subsequent heptads, the frequency of leucine in the d positions drops dramatically and it is absent by heptad 7. In A.thaliana, the distribution of leucine in the d positions is biphasic. Leucine is found in 90% of the d positions in heptads 0 to 2, falls dramatically to 30% in heptad 3, and rises again to >50% in heptad 5. Leucine remains frequent in heptads 6 to 8, providing additional support for the suggestion that some A.thaliana leucine zippers are longer than those in H.sapiens.
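Per-heptad profiles such as the one in Supplementary Figure 4 can be computed along the following lines. As an assumption of this sketch, each heptad's frequency is normalized to the zippers that extend at least that far (the article does not state its denominator), and zippers are given as lists of gabcdef heptads; the function name is hypothetical. Applied to position 'a' and residue 'N', the same function yields an asparagine profile like that of Figure 4.

```python
REGISTER = "gabcdef"


def per_heptad_frequency(zippers: list[list[str]], position: str,
                         residue: str) -> list[float]:
    """For each heptad index i, the fraction of zippers that reach heptad i
    and carry `residue` at `position` in that heptad."""
    idx = REGISTER.index(position)
    max_len = max((len(z) for z in zippers), default=0)
    profile = []
    for i in range(max_len):
        # At least the longest zipper reaches heptad i, so `present` is non-empty.
        present = [z[i] for z in zippers if len(z) > i]
        hits = sum(1 for h in present if h[idx] == residue)
        profile.append(hits / len(present))
    return profile
```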
Figure 4. Histogram of the frequency of asparagine in the a positions of the leucine zippers for all H.sapiens and A.thaliana B-ZIP proteins.
Defining features of the a position
Figure 4 presents the frequency of asparagine in the a positions of both H.sapiens and A.thaliana B-ZIP proteins. In both species, asparagine is abundant in the a position of heptad 2. In H.sapiens, asparagine is rarely observed in other heptads, while in A.thaliana, asparagine is also common in heptad 5. Asparagine produces stable N–N interactions at the a–a' position but does not interact favorably with other a position amino acids (12). Thus, the particular placement of asparagines throughout the leucine zippers of A.thaliana can be used to define families of homodimerizing B-ZIP proteins.
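The first family-defining criterion, the set of heptads whose a position holds asparagine, can be sketched as a grouping key. This captures only that one criterion: as described later for families F, G, H and K, proteins sharing the same asparagine placement were further split by their ge' pairs, which this sketch ignores. The function names, and the protein names in any usage, are hypothetical.

```python
from collections import defaultdict

REGISTER = "gabcdef"


def asn_signature(heptads: list[str]) -> tuple[int, ...]:
    """Indices of the heptads whose a position is asparagine."""
    a = REGISTER.index("a")
    return tuple(i for i, h in enumerate(heptads) if h[a] == "N")


def group_by_signature(zippers: dict[str, list[str]]) -> dict[tuple[int, ...], list[str]]:
    """Group protein names by identical a-position asparagine placement."""
    groups = defaultdict(list)
    for name, heptads in zippers.items():
        groups[asn_signature(heptads)].append(name)
    return dict(groups)
```

Since an N–N pair at a–a' is favorable and N opposite an aliphatic residue is not, proteins with different signatures would be expected to pair poorly, which is the rationale for using the signature as a homodimerization key.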
The a positions in A.thaliana B-ZIP proteins also contain other polar amino acids, including serine and threonine. To experimentally evaluate the contribution of serine to dimerization specificity, we used a well-established, heterodimerizing leucine zipper system (4,12). Briefly, this system contains one monomer (EE34) in which homodimer formation is inhibited by repulsive, acidic ge' pairs in heptads 3 and 4 and a second monomer, termed A-RR34, in which homodimer formation is blocked by the presence of repulsive, basic ge' interactions in heptads 3 and 4. Heterodimer formation between A-RR34 and EE34 is favored and is stabilized further by the presence of an N-terminal, amphipathic extension in A-RR34 which lengthens the dimerization interface by interacting with the N-terminal, basic region of EE34. The third a position in both EE34 and A-RR34 was changed to serine and the energetic contribution of serine to homotypic and heterotypic a–a' interactions was examined. Circular dichroism spectroscopy was used to monitor thermal denaturation (Supplementary Table 2) and the coupling energy of each complex (relative to alanine) was calculated (Supplementary Table 3). Serine in the a position does not favor an interaction with itself, an aliphatic isoleucine, or a polar asparagine. Serine does, however, interact with lysine and in this way can exert a positive influence on heterodimer formation.
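The coupling energies in Supplementary Table 3 come from comparing the stabilities of dimers that differ only at the a and a' positions. The sketch below assumes the textbook double-mutant cycle with alanine as the reference residue; the exact expression used is given in reference (12), and the function name is hypothetical. It takes stabilities as inputs and contains no data from this study.

```python
def coupling_energy(dg_xy: float, dg_xa: float, dg_ay: float, dg_aa: float) -> float:
    """Double-mutant-cycle coupling energy (kcal/mol) of an a-a' pair X-Y.

    Each argument is the stability (free energy of dimer dissociation, so a
    larger value means a more stable dimer) of the heterodimer carrying the
    indicated residues at the two a positions, with alanine (A) as reference:
        ddG_int = dG(X-Y) - dG(X-A) - dG(A-Y) + dG(A-A)
    """
    return dg_xy - dg_xa - dg_ay + dg_aa
```

With this sign convention, a positive result means the X–Y pair stabilizes the dimer more than expected from the two alanine-substituted controls, as reported here for serine opposite lysine.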
Like H.sapiens, A.thaliana B-ZIP proteins also contain charged amino acids in the a position that are destabilizing relative to alanine (12,20). The energetics of these charged amino acids interacting with other amino acids in the a position has not been examined extensively. The limited data available indicate that lysine prefers to interact with amino acids other than itself (Supplementary Table 3) (12). We suspect that these charged amino acids will drive heterodimer formation in a similar manner to lysine.
Features of the g and e positions
The observed frequency of attractive and repulsive ge' pairs in each heptad of H.sapiens and A.thaliana leucine zippers is presented in Figure 5. In H.sapiens, >50% of the first four heptads contain either attractive or repulsive ge' pairs, whereas such pairs are virtually absent in subsequent heptads. This is consistent with the termination of the H.sapiens leucine zippers (Supplementary Figure 3). The first heptad shows the highest number of repulsive ge' pairs, while attractive ge' pairs predominate in the more C-terminal heptads.
Figure 5. Histogram of the frequency of attractive or repulsive ge' pairs per heptad for both H.sapiens and A.thaliana B-ZIP proteins.
Arabidopsis thaliana leucine zippers, in contrast, contain about half the number of ge' pairs in the first four heptads, with the frequency increasing to >30% in heptad 5. This pattern is consistent with the longer average length of the A.thaliana leucine zippers. In A.thaliana, there are few repulsive ge' pairs and no protein has multiple repulsive ge' pairs (as found in some FOS proteins), implying that this feature is not a major determinant of heterodimerization of plant B-ZIP proteins.
Biophysical characterization of Opaque, a plant B-ZIP protein
To test the prediction that some A.thaliana B-ZIP proteins have long leucine zippers, we biophysically characterized Opaque, the maize homologue of the A.thaliana G family B-ZIP proteins. Opaque is predicted to be eight heptads long and has asparagine in the a position of the second and fifth heptads. Figure 6 presents the circular dichroism spectra and thermal denaturation of Opaque and CREB, a human B-ZIP protein that is only four heptads long. Both the Opaque and CREB B-ZIP domains are truncated at the same position, just N-terminal of the basic region. Circular dichroism spectra indicate that both Opaque and CREB have minima at 208 and 222 nm, indicative of a protein that is primarily α-helical. However, at the same concentration, Opaque has more ellipticity than CREB, indicating that Opaque contains more α-helical structure. The truncated Opaque protein is 74.5% α-helical, corresponding to 12 heptads of α-helix. In contrast, CREB is only 56.5% α-helical, corresponding to seven heptads of α-helix. Because the CREB leucine zipper is only four heptads long, we assume that the additional structure is the leucine zipper propagating into the basic region, as has been observed for the vertebrate VBP protein (30). The important result is that Opaque has five more heptads of α-helix than CREB, indicative of a longer leucine zipper. The thermal denaturations indicate that the leucine zipper regions melt cooperatively and have simple two-state denaturation profiles.
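The conversion from percent helix to heptads in this paragraph is simple arithmetic, sketched below. We assume that the percentage refers to the full Opaque construct listed in the Methods (all 117 letters of that sequence, excluding the hyphen and spaces); the function name is hypothetical. The result, about 12.45 heptads, is consistent with the 12 heptads reported. CREB is not included because its construct sequence is not given here.

```python
# Opaque construct from the Methods section (phi10 tag + basic region + zipper).
OPAQUE = ("ASMTGGQQMGRDP-EILGFKMPTEERVRKRKESNRESARRSRYRKAA HLKELED QVAQLKA "
          "ENSCLLR RIAALNQ KYNDANV DNRVLRA DMETLRA KVKMGED SLKRVIE MSSSVPSS")


def helical_heptads(sequence: str, fraction_helix: float) -> float:
    """Convert a CD-derived helical fraction into heptads of alpha-helix."""
    n_residues = sum(1 for ch in sequence if ch.isalpha())  # ignore '-' and spaces
    return fraction_helix * n_residues / 7


print(round(helical_heptads(OPAQUE, 0.745), 2))  # 12.45
```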
Figure 6. (A) Circular dichroism spectra from 200–260 nm of CREB and Opaque, 4 μM at 6°C. The asterisks indicate the minima at 222 nm at which the thermal denaturations of these two proteins were monitored. (B) Thermal denaturation curves of CREB and Opaque at 4 μM concentration monitored at 222 nm.
The dimerization network of A.thaliana B-ZIP proteins
On the basis of the structural features described above, we have organized the A.thaliana B-ZIP proteins into families designated A through T (Figure 2). The families have been organized further into three groups: (i) proteins whose leucine zippers contain attractive interhelical interactions and that we predict should form homodimers, (ii) proteins whose zippers contain both attractive and repulsive interhelical interactions that we predict should display more complex dimerization properties and (iii) proteins with zippers containing only repulsive interhelical interactions that are predicted to form heterodimers. Table 1 summarizes the structural rationale used in organizing these groups. Forty-seven B-ZIP proteins make up the first group and are subdivided into 14 homodimerizing families (A–N) based on the placement of asparagine in the a positions of each heptad. Ten B-ZIP proteins comprise the five families of the second group (O–S) and possess interhelical ge' interactions favoring both homo- and heterodimer formation. We are less confident in predicting the dimerization properties of these proteins. The final group of proteins comprises a single family (T).
Table 1. Arabidopsis thaliana B-ZIP proteins were placed into families with similar predicted dimerization specificity
Experimental data documenting interactions between several A.thaliana B-ZIP proteins are available, allowing us to evaluate some of our predictions. The data are typically from gel shift experiments indicating that two B-ZIP proteins can dimerize and bind to DNA. These results are consistent with our predictions, as noted in Table 1. Cases where interactions observed experimentally were not predicted are discussed below. GBF4 (family B) has been shown to dimerize with proteins in family F (31). Although the short leucine zippers of family B proteins could be considered a deterrent to efficient homodimer formation, there are no obvious structural details to suggest that GBF4 would interact preferentially with GBF1 or GBF2 in family F. Taken together, these observations indicate that GBF4 may, in fact, dimerize with a large number of other B-ZIP proteins. Family C proteins have been shown to form intra-family dimers (32–35), and ABI5 can dimerize with proteins from family B, albeit at a reduced efficiency compared to intra-family dimerization (33). Once again, this reflects the possibility that family B zippers are promiscuous in their interactions.
The placement of asparagines exclusively in the second and fifth a positions is a feature of four families of A.thaliana B-ZIP proteins (F, G, H and K). Segregation of these proteins into four separate families was directed by the presence of attractive or repulsive ge' pairs. There is a report that family F proteins form dimers with family G proteins (36). This is surprising because an F/G dimer contains repulsive ge' pairs in both the first and fourth heptads. Family G orthologs also have been shown to form dimers with family H proteins (37,38). In this case, there are fewer attractive ge' pairs in the heterodimer compared to either homodimer, indicating that the heterodimer represents a less favored interaction. In this regard, it is important to note that the latter experiments were performed in the presence of DNA that may serve as a stabilizing influence on sub-optimal B-ZIP dimers.
DISCUSSION
We have identified 67 B-ZIP motifs in the A.thaliana genome and have predicted their dimerization partners. None of these B-ZIP motifs are homologous to any human B-ZIP motifs, but the kinds of amino acids observed in the a, d, e and g positions of the leucine zipper are similar to those observed in H.sapiens B-ZIP leucine zippers (Figure 3). We have used what is known about the contribution of these amino acids to dimerization specificity to predict A.thaliana B-ZIP dimerization partners. A.thaliana B-ZIP leucine zippers can be eight or more heptads long in contrast to the four to six heptads typically found in H.sapiens. Asparagine is observed throughout the a positions of these long leucine zippers and we used the particular placement of these asparagine residues to define 14 families of homodimerizing B-ZIP proteins (Figure 2).
In this manuscript, we have extended the biophysical examination of how different amino acids in the a position contribute to dimerization specificity. Quantitation of heterotypic a–a' interactions has previously been described only for L, I, V, A, N and K (4,12). In this study, we present data showing how serine (S) in the a position interacts with I, N, K and S at the a' position. These data indicate that serine contributes less to dimerization specificity than an aliphatic amino acid, a polar asparagine or a charged lysine residue. According to our data, serine prefers to be in a heterotypic interaction with lysine. Serine is found in the a position of the A.thaliana GBF2, GBF3 and At2g21230 proteins, where it may encourage intra-group dimer formation rather than strict intra-family homodimerization. Serine also occurs in the a position of At1g59530. However, we predict that this protein will form heterodimers due to repulsive ge' pairs.
A comparison between A.thaliana and H.sapiens B-ZIP proteins has identified clear differences. Three criteria suggest that the leucine zippers of B-ZIP domains in A.thaliana can be longer than in H.sapiens: (i) the presence of the α-helix breakers proline or a pair of glycines, (ii) the presence of leucines in the d position and (iii) the presence of charged amino acids in the g and e positions. Although many A.thaliana leucine zippers are eight or more heptads in length, a comparison of the thermal denaturation of Opaque and CREB suggests that they have stability similar to that of H.sapiens leucine zippers. The shorter H.sapiens zippers have both a high frequency of stabilizing leucine residues in the d positions (16) and many attractive ge' pairs. As a result, these zippers are optimally stable. Arabidopsis thaliana leucine zippers, on the other hand, have a lower frequency of these stabilizing elements and have longer leucine zippers but similar stability.
Homo sapiens and A.thaliana regulate the dimerization specificity of their leucine zipper proteins by different mechanisms. The six homodimerizing families of H.sapiens B-ZIP factors contain asparagine in the second heptad a position. However, only two of these six families contain additional a position asparagines to dictate dimerization specificity, and the remaining four use variations of interhelical ge' interactions to drive homodimer formation (3). Eleven families of homodimerizing A.thaliana B-ZIP proteins contain asparagine in the second heptad a position but, in contrast to H.sapiens, these proteins also contain asparagines in the a positions of the fourth, fifth, sixth and eighth heptads, with asparagine in the fifth heptad being nearly as abundant as in the second heptad. In the case of heterodimer formation, H.sapiens B-ZIP factors use repulsive electrostatic interhelical ge' interactions, whereas A.thaliana B-ZIP factors contain few electrostatic repulsive ge' interactions. Instead, A.thaliana B-ZIP factors contain charged amino acids in the a positions as is observed in the MAF family of H.sapiens B-ZIP proteins. While we suggest these proteins will heterodimerize, we are unable to predict the dimerization partners.
The use of the same amino acids to regulate dimerization specificity in both H.sapiens and A.thaliana coiled coils suggests that these amino acids are uniquely suited for this task. The coincidence between longer leucine zippers and more predicted homodimerizing B-ZIP families suggests that longer leucine zippers are needed to generate more homodimerizing families.
The leucine zipper has been a favored motif for biophysical characterization because its denaturation is assumed to be two-state, which allows calculation of thermodynamic parameters that describe the stability of the protein complex. All of the leucine zippers that have been biophysically characterized contain an asparagine in the second heptad a position and are approximately four to five heptads long. Their folding has been described by a two-state equilibrium. More recent work has shown that this is an oversimplification and that intermediates are observed during the folding of these proteins (40–42). Longer coiled coils such as tropomyosin display a complex denaturation profile (43). We wanted to examine whether the longer leucine zippers predicted for some A.thaliana B-ZIP proteins with multiple a position asparagines behave as longer coiled coils. To do so, we studied the maize B-ZIP protein Opaque, which is related to the A.thaliana G family B-ZIP proteins. Opaque melts cooperatively and has a simple two-state denaturation profile (Figure 6). At the same concentration (4 μM), Opaque shows greater ellipticity than CREB, indicative of a longer coiled coil dimeric structure.
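The two-state model assumed in these melts can be written down directly. The sketch below is a minimal illustration under several assumptions:

- A dimer D dissociates into two unfolded monomers M.
- The CD signal varies linearly between folded and unfolded baselines.
- Pt is taken as the total monomer concentration.

The helper names and the ellipticity values are ours, not data from this study.

```python
def fraction_folded(theta, theta_folded, theta_unfolded):
    """Fraction of chains in the folded dimer, from a CD signal (e.g. the
    ellipticity at 222 nm), assuming linear mixing of the two baselines."""
    return (theta - theta_unfolded) / (theta_folded - theta_unfolded)


def dissociation_constant(f, pt_molar):
    """Apparent K = [M]^2 / [D] for D <=> 2M.

    pt_molar is the total monomer concentration, Pt = [M] + 2[D], and
    f = 2[D] / Pt is the fraction of chains in the dimer.
    """
    monomer = pt_molar * (1.0 - f)
    dimer = pt_molar * f / 2.0
    return monomer ** 2 / dimer


# Hypothetical ellipticities (deg cm^2 dmol^-1) for a sample and its baselines
f = fraction_folded(theta=-20000.0, theta_folded=-30000.0, theta_unfolded=-5000.0)
print(f"fraction folded: {f:.2f}")

# At the melting midpoint (f = 0.5) the apparent K equals Pt (here 4 uM).
print(f"K at midpoint: {dissociation_constant(0.5, 4e-6):.1e} M")
```

Because the apparent dissociation constant at the midpoint equals Pt, the melting temperature of a two-state dimer depends on concentration. This is why Opaque and CREB are compared at the same concentration.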
Several A.thaliana B-ZIP motifs would be interesting to examine in biophysical detail, including those that have short leucine zippers (families A, B and C) and those with longer bipartite zippers with a putative homodimerizing and heterodimerizing part (families Q and R). A minimum of three heptads, corresponding to six helical turns, is required for a peptide to adopt a two-stranded α-helical coiled coil conformation (39). Therefore, B-ZIP proteins in families A, B and C may need DNA binding to stabilize the structure (31). The Q and R families have several heptads of putatively homodimerizing leucine zipper structure followed by heptads with charged amino acids in the a position. This is reminiscent of the Myc leucine zipper. These proteins may form homodimers with short coiled coils and heterodimers with longer coiled coils.
Perhaps the most intriguing A.thaliana B-ZIP proteins are the TGA proteins (family T). These proteins possess short leucine zippers with charged amino acids in the a positions of the first, second and third heptads that are expected to destabilize the leucine zipper structure. Despite these destabilizing characteristics, several of these factors have been reported to form both homodimers and heterodimers (44). Interestingly, deletion analysis of the TGA proteins indicates that their DNA-binding activity requires the B-ZIP domain and a ‘dimerization stability region’ located more than 100 amino acids C-terminal to the zipper (45). The structural basis of this intramolecular interaction is unknown, but the observation is consistent with the suggestion that the leucine zippers of the TGA proteins are unstable. Several of the A.thaliana B-ZIP proteins predicted to form heterodimers (families O–Q) are similar in many ways to the TGA proteins. These proteins possess charged amino acids in the a positions of the fourth, fifth and/or sixth heptads and thus rely on the sequences of the first three heptads for stable zipper formation. Clearly, this type of heptad arrangement is non-canonical, and it would be of interest to determine whether these proteins denature in a sequential manner.
A notable difference between H.sapiens and A.thaliana leucine zippers is a histidine which is present in the d position of the fifth heptad in one-third of all human B-ZIP factors, including all AP-1 family members. The charge of this histidine is sensitive to intracellular changes in pH and can influence the stability and specificity of B-ZIP dimers. The absence of a similarly placed histidine residue in plants suggests that such a signaling system is absent in plants.
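The pH sensitivity described here follows from the Henderson–Hasselbalch relation. The sketch below is illustrative only. It assumes a side-chain pKa of 6.0, a typical value for free histidine; the pKa of a histidine at a d position of a particular zipper is not reported here and may differ.

```python
def fraction_protonated(ph, pka=6.0):
    """Fraction of histidine side chains in the charged (protonated) form at a
    given pH, from Henderson-Hasselbalch; pka=6.0 is an assumed value."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for ph in (6.0, 6.5, 7.0, 7.5):
    print(f"pH {ph}: {fraction_protonated(ph):.2f} of histidines protonated")
```

Near the pKa, modest pH changes shift the balance between charged and neutral histidine. That shift is the basis for the pH sensitivity discussed above.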
In summary, this study examines two issues: (i) a genome-wide comparison of A.thaliana and H.sapiens B-ZIP factors and (ii) a prediction of A.thaliana B-ZIP protein dimerization specificity. Our analysis shows that all B-ZIP proteins use the same amino acids to regulate dimerization specificity, but that A.thaliana and H.sapiens genomes exploit the properties of these residues in different ways. A.thaliana B-ZIP proteins have long leucine zippers that are used to define multiple families of homodimerizing B-ZIP proteins.
SUPPLEMENTARY MATERIAL
ACKNOWLEDGEMENTS
A portion of this work was supported by PHS CA-78264 awarded to E.J.T. C.D.D. was a pre-doctoral trainee of PHS T32 CA-09644-09.
REFERENCES
Ellenberger,T.E., Brandl,C.J., Struhl,K. and Harrison,S.C. (1992) The GCN4 basic region leucine zipper binds DNA as a dimer of uninterrupted alpha helices: crystal structure of the protein-DNA complex. Cell, 71, 1223–1237.
Vinson,C.R., Sigler,P.B. and McKnight,S.L. (1989) Scissors-grip model for DNA recognition by a family of leucine zipper proteins. Science, 246, 911–916.
Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.
Fassler,J., Landsman,D., Acharya,A., Moll,J.R., Bonovich,M. and Vinson,C. (2002) B-ZIP proteins encoded by the Drosophila genome: evaluation of potential dimerization partners. Genome Res., 12, 1190–1200.
Newman,J.R. and Keating,A.E. (2003) Comprehensive identification of human bZIP interactions with coiled-coil arrays. Science, 300, 2097–2101.
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.
Jakoby,M., Weisshaar,B., Droge-Laser,W., Vicente-Carbajosa,J., Tiedemann,J., Kroj,T. and Parcy,F. (2002) bZIP transcription factors in Arabidopsis. Trends Plant Sci., 7, 106–111.
Landschulz,W.H., Johnson,P.F. and McKnight,S.L. (1988) The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science, 240, 1759–1764.
O'Shea,E.K., Rutkowski,R. and Kim,P.S. (1992) Mechanism of specificity in the Fos–Jun oncoprotein heterodimer. Cell, 68, 699–708.
Baxevanis,A.D. and Vinson,C.R. (1993) Interactions of coiled coils in transcription factors: where is the specificity? Curr. Opin. Genet. Dev., 3, 278–285.
Zhou,J. and Goldsbrough,P.B. (1994) Functional homologs of fungal metallothionein genes from Arabidopsis. Plant Cell, 6, 875–884.
Acharya,A., Ruvinov,S.B., Gal,J., Moll,J.R. and Vinson,C. (2002) A heterodimerizing leucine zipper coiled coil system for examining the specificity of a position interactions: amino acids I, V, L, N, A, and K. Biochemistry, 41, 14122–14131.
Krylov,D., Barchi,J. and Vinson,C. (1998) Inter-helical interactions in the leucine zipper coiled coil dimer: pH and salt dependence of coupling energy between charged amino acids. J. Mol. Biol., 279, 959–972.
Thompson,K.S., Vinson,C.R. and Freire,E. (1993) Thermodynamic characterization of the structural stability of the coiled-coil region of the bZIP transcription factor GCN4. Biochemistry, 32, 5491–5496.
Landschulz,W.H., Johnson,P.F. and McKnight,S.L. (1989) The DNA binding domain of the rat liver nuclear protein C/EBP is bipartite. Science, 243, 1681–1688.
Moitra,J., Szilak,L., Krylov,D. and Vinson,C. (1997) Leucine is the most stabilizing aliphatic amino acid in the d position of a dimeric leucine zipper coiled coil. Biochemistry, 36, 12567–12573.
Harbury,P.B., Zhang,T., Kim,P.S. and Alber,T. (1993) A switch between two-, three-, and four-stranded coiled coils in GCN4 leucine zipper mutants. Science, 262, 1401–1407.
Gonzalez,L.,Jr, Woolfson,D.N. and Alber,T. (1996) Buried polar residues and structural specificity in the GCN4 leucine zipper. Nat. Struct. Biol., 3, 1011–1018.
Zeng,X., Herndon,A.M. and Hu,J.C. (1997) Buried asparagines determine the dimerization specificities of leucine zipper mutants. Proc. Natl Acad. Sci. USA, 94, 3673–3678.
Wagschal,K., Tripet,B., Lavigne,P., Mant,C. and Hodges,R.S. (1999) The role of position a in determining the stability and oligomerization state of alpha-helical coiled coils: 20 amino acid stability coefficients in the hydrophobic core of proteins. Protein Sci., 8, 2312–2329.
Alber,T. (1992) Structure of the leucine zipper. Curr. Opin. Genet. Dev., 2, 205–210.
Vinson,C.R., Hai,T. and Boyd,S.M. (1993) Dimerization specificity of the leucine zipper-containing bZIP motif on DNA binding: prediction and rational design. Genes Dev., 7, 1047–1058.
Cohen,D.R. and Curran,T. (1990) Analysis of dimerization and DNA binding functions in Fos and Jun by domain-swapping: involvement of residues outside the leucine zipper/basic region. Oncogene, 5, 929–939.
Fernandes,L., Rodrigues-Pousada,C. and Struhl,K. (1997) Yap, a novel family of eight bZIP proteins in Saccharomyces cerevisiae with distinct biological functions. Mol. Cell. Biol., 17, 6982–6993.
Higgins,D.G., Thompson,J.D. and Gibson,T.J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol., 266, 383–402.
Page,R.D. (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci., 12, 357–358.
Riechmann,J.L., Heard,J., Martin,G., Reuber,L., Jiang,C., Keddie,J., Adam,L., Pineda,O., Ratcliffe,O.J., Samaha,R.R. et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105–2110.
Ron,D. and Habener,J.F. (1992) CHOP, a novel developmentally regulated nuclear protein that dimerizes with transcription factors C/EBP and LAP and functions as a dominant-negative inhibitor of gene transcription. Genes Dev., 6, 439–453.
Tripet,B., Wagschal,K., Lavigne,P., Mant,C.T. and Hodges,R.S. (2000) Effects of side-chain characteristics on stability and oligomerization state of a de novo-designed model coiled-coil: 20 amino acid substitutions in position ‘d’. J. Mol. Biol., 300, 377–402.
Moll,J.R., Olive,M. and Vinson,C. (2000) Attractive interhelical electrostatic interactions in the proline- and acidic-rich region (PAR) leucine zipper subfamily preclude heterodimerization with other basic leucine zipper subfamilies. J. Biol. Chem., 275, 34826–34832.
Menkens,A.E. and Cashmore,A.R. (1994) Isolation and characterization of a fourth Arabidopsis thaliana G-box-binding factor, which has similarities to Fos oncoprotein. Proc. Natl Acad. Sci. USA, 91, 2522–2526.
Nakamura,S., Lynch,T.J. and Finkelstein,R.R. (2001) Physical interactions between ABA response loci of Arabidopsis. Plant J., 26, 627–635.
Kim,S.Y., Ma,J., Perret,P., Li,Z. and Thomas,T.L. (2002) Arabidopsis ABI5 subfamily members have distinct DNA-binding and transcriptional activities. Plant Physiol., 130, 688–697.
Choi,H., Hong,J., Ha,J., Kang,J. and Kim,S.Y. (2000) ABFs, a family of ABA-responsive element binding factors. J. Biol. Chem., 275, 1723–1730.
Bensmihen,S., Rippa,S., Lambert,G., Jublot,D., Pautot,V., Granier,F., Giraudat,J. and Parcy,F. (2002) The homologous ABI5 and EEL transcription factors function antagonistically to fine-tune gene expression during late embryogenesis. Plant Cell, 14, 1391–1403.
Armstrong,G.A., Weisshaar,B. and Hahlbrock,K. (1992) Homodimeric and heterodimeric leucine zipper proteins and nuclear factors from parsley recognize diverse promoter elements with ACGT cores. Plant Cell, 4, 525–537.
Rugner,A., Frohnmeyer,H., Nake,C., Wellmer,F., Kircher,S., Schafer,E. and Harter,K. (2001) Isolation and characterization of four novel parsley proteins that interact with the transcriptional regulators CPRF1 and CPRF2. Mol. Genet. Genomics, 265, 964–976.
Strathmann,A., Kuhlmann,M., Heinekamp,T. and Droge-Laser,W. (2001) BZI-1 specifically heterodimerises with the tobacco bZIP transcription factors BZI-2, BZI-3/TBZF and BZI-4, and is functionally involved in flower development. Plant J., 28, 397–408.
Su,J.Y., Hodges,R.S. and Kay,C.M. (1994) Effect of chain length on the formation and stability of synthetic alpha-helical coiled coils. Biochemistry, 33, 15501–15510.
Wendt,H., Durr,E., Thomas,R.M., Przybylski,M. and Bosshard,H.R. (1995) Characterization of leucine zipper complexes by electrospray ionization mass spectrometry. Protein Sci., 4, 1563–1570.
Zitzewitz,J.A., Ibarra-Molero,B., Fishel,D.R., Terry,K.L. and Matthews,C.R. (2000) Preformed secondary structure drives the association reaction of GCN4-p1, a model coiled-coil system. J. Mol. Biol., 296, 1105–1116.
Dragon,S., Offenhauser,N. and Baumann,R. (2002) cAMP and in vivo hypoxia induce tob, ifr1, and fos expression in erythroid cells of the chick embryo. Am. J. Physiol. Regul. Integr. Comp. Physiol., 282, R1219–1226.
Ozeki,S., Kato,T., Holtzer,M.E. and Holtzer,A. (1991) The kinetics of chain exchange in two-chain coiled coils: alpha alpha- and beta beta-tropomyosin. Biopolymers, 31, 957–966.
Niggeweg,R., Thurow,C., Weigel,R., Pfitzner,U. and Gatz,C. (2000) Tobacco TGA factors differ with respect to interaction with NPR1, activation potential and DNA-binding properties. Plant Mol. Biol., 42, 775–788.
Katagiri,F., Seipel,K. and Chua,N.H. (1992) Identification of a novel dimer stabilization region in a plant bZIP transcription activator. Mol. Cell. Biol., 12, 4809–4816.
Pysh,L.D., Aukerman,M.J. and Schmidt,R.J. (1993) OHP1: a maize basic domain/leucine zipper protein that interacts with opaque2. Plant Cell, 5, 227–236.
Schindler,U., Beckmann,H. and Cashmore,A.R. (1992) TGA1 and G-box binding factors: two distinct classes of Arabidopsis leucine zipper proteins compete for the G-box-like element TGACGTGG. Plant Cell, 4, 1309–1319.
(Christopher D. Deppmann, Asha Acharya1, )
INTRODUCTION
Basic region-leucine zipper (B-ZIP) proteins are dimeric transcription factors that bind to DNA in a sequence specific manner (1,2). This class of protein is found exclusively in eukaryotes. Previously, we predicted the genome-wide dimerization properties for all 53 Homo sapiens (3) and 27 Drosophila melanogaster (4) B-ZIP motifs. A recent experimental analysis of the leucine zipper dimerization specificity of most H.sapiens B-ZIP proteins (5) has verified our previous predictions. This gives us confidence that we can use the same structural rules to predict the dimerization specificity of B-ZIP proteins from other organisms. Recently, the Arabidopsis thaliana genome was sequenced and the B-ZIP proteins within this genome were annotated (6,7). These proteins are important for pathogen defense, light-induced signaling, seed maturation and flower development (7). Many of the features of the leucine zipper that are critical for specifying the dimerization properties of H.sapiens B-ZIP proteins are conserved in A.thaliana B-ZIP proteins. Thus, we can use our understanding of mammalian leucine zipper dimerization specificity to predict the dimerization properties of A.thaliana B-ZIP proteins.
When bound to DNA, each B-ZIP monomer is a long α-helix (Figure 1). The N-terminal half binds in the major groove of double-stranded DNA in a sequence-specific manner. The C-terminal half is amphipathic and can dimerize to form a parallel coiled coil, termed a leucine zipper (8). B-ZIP proteins form homodimers and/or heterodimers depending on the amino acid sequence of the leucine zipper (9). Extensive mutagenic and biophysical studies have characterized the most common interhelical amino acid interactions observed in human B-ZIP proteins by quantifying the contribution of individual amino acids to both dimerization stability and specificity (10–13). This knowledge has allowed us to rationalize what is known about the dimerization properties of B-ZIP domains and to predict how uncharacterized B-ZIP proteins will dimerize with each other.
Figure 1. X-ray structure of the GCN4 B-ZIP motif bound to double-stranded DNA (1). The N-terminus of the protein, the basic region and the leucine zipper are labeled. The first three heptads of the leucine zipper are delineated.
Each monomer in the leucine zipper has a structural repeat of two α-helical turns (seven amino acids) (supplementary Figure 1). This repeat is termed a heptad, with each of the seven positions assigned the letter designation a, b, c, d, e, f and g. The a, d, e and g positions are near the leucine zipper interface and dictate dimerization specificity (3). Amino acids in the a and d positions lie on the same side of the α-helix and are typically hydrophobic. These hydrophobic amino acids interact interhelically with hydrophobic amino acids in the same a and d positions of the second α-helix of the leucine zipper to stabilize the structural dimer (14–16). Typically, the d position contains leucine, which stabilizes the dimer better than other amino acids (16).
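The heptad bookkeeping above can be sketched in a few lines of Python. The helper names are ours. The example input is the first two heptads of the Opaque zipper listed under Proteins in Methods, written in the gabcdef register used throughout this paper, so its first residue is a g position.

```python
HEPTAD = "abcdefg"


def heptad_positions(seq, first="a"):
    """Pair each residue with its heptad letter, given the letter of residue 0."""
    start = HEPTAD.index(first)
    return [(res, HEPTAD[(start + i) % 7]) for i, res in enumerate(seq)]


def interface_residues(seq, first="a", positions="ad"):
    """Residues at the requested heptad positions; a and d form the hydrophobic
    core, and passing positions='gade' returns the four specificity positions."""
    return [(res, pos) for res, pos in heptad_positions(seq, first) if pos in positions]


# First two Opaque heptads, written gabcdef, so residue 0 is a g position
print(interface_residues("HLKELEDQVAQLKA", first="g"))
# -> [('L', 'a'), ('L', 'd'), ('V', 'a'), ('L', 'd')]
```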
Dimerization specificity is regulated by amino acids in the a, e and g positions. In animal B-ZIP proteins, the second heptad a position typically contains asparagine which limits leucine zipper oligomerization to dimers (17,18). As depicted in supplementary Figure 1, asparagine prefers an interhelical interaction with an asparagine in the a position of a partner protein and will not interact efficiently with an aliphatic amino acid (12,19). Charged amino acids in the a position inhibit homodimer formation (12,20), while lysine residues in this position permit heterodimer formation with a wide range of amino acids (12).
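As a rough sketch, the qualitative a–a' rules in this paragraph can be encoded as a lookup. This is not a quantitative model. The function name and labels are hypothetical, and any pair the text does not address is returned as 'not specified' rather than guessed.

```python
ALIPHATIC = set("IVLM")
CHARGED = set("DEKR")


def a_pair_rule(x, y):
    """Qualitative outcome of an a-a' pair, encoding only the rules stated in
    the text. Rules are checked in order, so a K-K pair counts as charged."""
    pair = {x, y}
    if x == y == "N":
        return "favorable"      # N prefers an N partner at a'
    if x == y and x in CHARGED:
        return "disfavored"     # charged a residues inhibit homodimers
    if "K" in pair:
        return "permissive"     # K pairs with a wide range of residues
    if "N" in pair and (pair - {"N"}) <= ALIPHATIC:
        return "disfavored"     # N does not pair well with aliphatics
    if pair <= ALIPHATIC:
        return "favorable"      # hydrophobic core packing
    return "not specified"


for x, y in [("N", "N"), ("N", "L"), ("E", "E"), ("K", "I"), ("S", "T")]:
    print(x, y, a_pair_rule(x, y))
```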
The g and e positions often contain charged amino acids that form attractive electrostatic interhelical interactions (3,21–23). These interactions are denoted ge', where the prime (') indicates a residue on the second α-helix of the dimeric leucine zipper. ge' interactions between oppositely charged amino acids (e.g. ER or KE) are attractive and promote dimerization specificity, while ge' interactions between similarly charged amino acids (e.g. EE or RR) are repulsive and inhibit homodimerization. For the mammalian B-ZIP protein FOS, repulsive EE interactions involving glutamate residues discourage homodimer formation, while attractive EK interactions between the glutamate of FOS and the lysine of JUN promote heterodimer formation (9).
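The ge' bookkeeping can be sketched as follows. The sketch assumes the gabcdef heptad blocks used in Figure 2, so that within one block the g residue (index 0) faces the e' residue (index 5) of the partner helix. Following the Figure 2 color key, only D/E are treated as acidic and K/R as basic; histidine and all other residues are treated as neutral. The example applies the rule to a homodimer of the Opaque zipper given in Methods. The helper names are hypothetical.

```python
ACIDIC = set("DE")
BASIC = set("KR")


def charge(res):
    """Crude side-chain charge: -1 for D/E, +1 for K/R, 0 otherwise (H neutral)."""
    return -1 if res in ACIDIC else 1 if res in BASIC else 0


def classify_ge(g, e):
    """Label a g-e' pair by the sign of the product of the two charges."""
    product = charge(g) * charge(e)
    if product < 0:
        return "attractive"
    if product > 0:
        return "repulsive"
    return "neutral"


def homodimer_ge_pairs(heptads):
    """In a homodimer written as gabcdef blocks, block[0] (g) faces block[5] (e')
    of the identical partner helix."""
    return [(h[0] + h[5], classify_ge(h[0], h[5])) for h in heptads]


OPAQUE_HEPTADS = ["HLKELED", "QVAQLKA", "ENSCLLR", "RIAALNQ", "KYNDANV",
                  "DNRVLRA", "DMETLRA", "KVKMGED", "SLKRVIE"]
pairs = homodimer_ge_pairs(OPAQUE_HEPTADS)
attractive = [p for p, label in pairs if label == "attractive"]
repulsive = [p for p, label in pairs if label == "repulsive"]
print("attractive:", attractive, "repulsive:", repulsive)
```

For Opaque this finds three attractive pairs (DR, DR and KE) and no repulsive ones.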
In this study, we identify 67 B-ZIP motifs in the recently sequenced A.thaliana genome (6). None of these proteins are homologous to human B-ZIP proteins, but they use similar amino acids to regulate dimerization specificity. We have annotated these B-ZIP protein sequences and have predicted their dimerization properties. Our analysis reveals that many A.thaliana B-ZIP proteins have longer leucine zippers than observed in H.sapiens. In addition, unlike human B-ZIP proteins, dimerization of most of these zippers is not specified by attractive and repulsive ge' interactions, but by the unique placement of asparagines in heptad a positions throughout the leucine zipper. These structural features generate 14 homodimerizing families of A.thaliana B-ZIP proteins, in contrast to the six homodimerizing families observed in H.sapiens.
METHODS
Pattern matching
The pattern matching program at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl was used to identify A.thaliana proteins that contain putative B-ZIP motifs. Two types of regular expressions were used to query this database. The ‘Basic Region’ expression (XXX X) was modified from (24). The ‘B-ZIP’ expressions used were variations of (RXXXXXXXXXXXXXXXXXX) previously used to identify B-ZIP proteins in D.melanogaster (4).
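The regular expressions quoted above are not fully legible in this copy, so the sketch below does not attempt to reconstruct them. As a hypothetical illustration only, it scans a sequence for four leucines spaced seven residues apart (consecutive d positions), the kind of heptad periodicity such expressions encode. It uses Python's standard `re` module, and the example input is the Opaque zipper given under Proteins.

```python
import re

# Hypothetical pattern (not the expression used in this study): four leucines
# at consecutive d positions, i.e. separated by six arbitrary residues.
LEUCINE_HEPTADS = re.compile(r"L.{6}L.{6}L.{6}L")


def zipper_like_starts(seq):
    """Start indices of non-overlapping leucine-heptad matches in seq."""
    return [m.start() for m in LEUCINE_HEPTADS.finditer(seq)]


opaque_zipper = ("HLKELED" "QVAQLKA" "ENSCLLR" "RIAALNQ" "KYNDANV"
                 "DNRVLRA" "DMETLRA" "KVKMGED" "SLKRVIE")
print(zipper_like_starts(opaque_zipper))  # -> [4]
```

A real search would also need a basic-region pattern around the conserved asparagine, which this sketch omits.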
Multiple alignment and phylogenetic tree analysis
Multiple alignments were performed with ClustalW (25) using the default options. Phylogenetic analysis was performed using TreeView (version 1.6.6) (26).
Proteins
Heterodimerizing proteins containing mutations in the a position were generated as described previously (12). Thermodynamic parameters were determined from denaturation curves, assuming a two-state equilibrium dissociation of α-helical dimers into unfolded monomers and using a ΔCp of 2.04 kcal/mol/°C, as described previously (12). Coupling energy values (ΔΔG) are reported at 37°C.
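The extrapolation to 37°C can be illustrated with the standard Gibbs–Helmholtz form for a two-state dimer. This is a sketch under stated assumptions:

- ΔG is the free energy of dissociation D → 2M.
- Pt is the total monomer concentration, so that the dissociation constant equals Pt at Tm.
- The Tm and ΔHm values in the example are hypothetical; only the ΔCp of 2.04 kcal/mol/°C comes from the text.

The text does not state whether (12) uses exactly this parameterization.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def dg_dissociation(t_kelvin, tm, dhm, dcp, pt_molar):
    """Two-state free energy (kcal/mol) of dimer dissociation D -> 2M at t_kelvin.

    dG(T) = dHm*(1 - T/Tm) + dCp*(T - Tm - T*ln(T/Tm)) - R*T*ln(Pt)

    tm: melting temperature (K); dhm: enthalpy at Tm (kcal/mol);
    dcp: heat-capacity change (kcal mol^-1 K^-1, same magnitude per degree C);
    pt_molar: total monomer concentration (M), chosen so that K(Tm) = Pt.
    """
    t = t_kelvin
    return (dhm * (1.0 - t / tm)
            + dcp * (t - tm - t * math.log(t / tm))
            - R_KCAL * t * math.log(pt_molar))


# Hypothetical fit (Tm = 330 K, dHm = 60 kcal/mol) at 4 uM, extrapolated to 37 C
dg37 = dg_dissociation(310.15, tm=330.0, dhm=60.0, dcp=2.04, pt_molar=4e-6)
print(f"dG(37 C) = {dg37:.1f} kcal/mol")
```

At T = Tm the enthalpy and heat-capacity terms vanish, leaving −RTm ln Pt. This reflects the concentration dependence of a bimolecular unfolding reaction.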
The Opaque B-ZIP domain amino acid sequence is as follows. The first 13 amino acids are from phi10, followed by the basic region and the leucine zipper, which is divided into heptads (gabcdef). Asparagines occupy the a positions of the second (ENSCLLR) and fifth (DNRVLRA) heptads. ASMTGGQQMGRDP-EILGFKMPTEERVRKRKESNRESARRSRYRKAA HLKELED QVAQLKA ENSCLLR RIAALNQ KYNDANV DNRVLRA DMETLRA KVKMGED SLKRVIE MSSSVPSS. Circular dichroism (CD) studies were performed in buffer containing 12.5 mM KPO4 (pH 7.4), 150 mM KCl and 0.25 mM DTT using a Jasco J-720 spectropolarimeter as described previously (12).
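A short script can confirm the register of the Opaque zipper. This sketch assumes the heptads are numbered from 0 starting at HLKELED. That is the numbering under which ENSCLLR and DNRVLRA are the second and fifth heptads, as stated above. The helper names are ours.

```python
OPAQUE_HEPTADS = ["HLKELED", "QVAQLKA", "ENSCLLR", "RIAALNQ", "KYNDANV",
                  "DNRVLRA", "DMETLRA", "KVKMGED", "SLKRVIE"]

A_INDEX, D_INDEX = 1, 4  # positions of a and d within a gabcdef block


def a_d_table(heptads):
    """(heptad number, a residue, d residue) for each heptad, numbered from 0."""
    return [(k, h[A_INDEX], h[D_INDEX]) for k, h in enumerate(heptads)]


def a_position_asparagines(heptads):
    """Heptad numbers whose a position is asparagine."""
    return [k for k, h in enumerate(heptads) if h[A_INDEX] == "N"]


assert all(len(h) == 7 for h in OPAQUE_HEPTADS)  # every block is a full heptad
for row in a_d_table(OPAQUE_HEPTADS):
    print(row)
print("a-position N in heptads:", a_position_asparagines(OPAQUE_HEPTADS))  # [2, 5]
```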
RESULTS
Identification and phylogenetic analysis of Arabidopsis thaliana B-ZIP domains
Previous studies by two groups used automated motif identification software to identify 81 (27) and 75 (7) B-ZIP factors in the A.thaliana genome. We used the web-based TAIR genehunter 2.0 tool at http://www.arabidopsis.org/cgi-bin/geneform/geneform.pl to search several different databases and assembled a list of proteins identified as B-ZIP factors. Each of these proteins was inspected visually and included in our analysis if it appeared to be a bona fide B-ZIP factor based on two criteria: (i) a well-defined basic region including a near-invariant asparagine critical for sequence-specific DNA binding and (ii) a canonical leucine zipper region with obvious amphipathic qualities. This analysis yielded 67 B-ZIP proteins.
We also performed a search of the A.thaliana genome using a web-based pattern search tool at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl to search known and predicted proteins for amino acid patterns representing the entire B-ZIP domain (XXXNXXRXXXXXXXXXXXXXXXXXX) or just the basic region (XXXX). These patterns have been used successfully to identify B-ZIP factors in the genomes of Saccharomyces cerevisiae (24), D.melanogaster (4) and H.sapiens (3). This strategy identified the same 67 A.thaliana B-ZIP proteins.
Supplementary Table 1 lists the accession numbers, synonyms, protein length and position of the B-ZIP motif in the whole protein for the 67 B-ZIP factors identified in this genome-wide search. Eight proteins (AtbZIP14, AtbZIP27, AtbZIP32, AtbZIP34, AtbZIP61, AtbZIP72, AtbZIP73 and AtbZIP75) were reported earlier as B-ZIP proteins (7) but are not represented in this study. These eight factors are leucine rich, but did not meet our B-ZIP selection criteria. For example, AtbZIP14 and AtbZIP27 each have perfect basic regions, but no leucine zipper. The sequences of AtbZIP32, AtbZIP73 and AtbZIP75 were analyzed and yielded no recognizable B-ZIP domain. Riechmann et al. (27) did not provide accession numbers for the 81 B-ZIP factors identified in their study. This complicated our ability to analyze the sequences from their study that are not included in our list.
The amino acid relatedness of the 67 A.thaliana and 53 H.sapiens B-ZIP motifs was determined using ClustalW and the information is presented as a cladogram (Supplementary Figure 2). No interspecies clustering is observed, indicating that there are no homologous A.thaliana and H.sapiens B-ZIP domains. This is in contrast to the near total overlap between D.melanogaster and H.sapiens B-ZIP proteins (4).
Analysis of the 67 A.thaliana B-ZIP leucine zippers
Alignment and amino acid content
Figure 2 presents the amino acid sequences of the 67 A.thaliana B-ZIP motifs, aligned using a conserved asparagine residue in the basic region of all proteins (data not shown). This stringent criterion excluded from our analysis rare B-ZIP proteins, such as the mammalian GADD153 protein, that lack this conserved asparagine in the basic region (28). The N-terminal boundary of the leucine zipper is identical for all B-ZIP proteins because it is defined by the presence of the basic region. To identify the C-terminal boundary, three criteria were used: (i) the presence of a proline or a pair of glycines, either of which disrupts the α-helical structure of a leucine zipper, (ii) the presence of a leucine in the d position and (iii) the presence of charged amino acids in the g and e positions. Amino acids within the zippers that are important for dimerization specificity are color-coded using the key developed for the analysis of the D.melanogaster (4) and H.sapiens (3) B-ZIP proteins. The proteins are listed beginning with the zippers that are predicted to form only homodimeric interhelical interactions (families A–N), followed by the zippers that are predicted to form both homodimeric and heterodimeric interactions (families O–S). The TGA family of B-ZIP proteins (family T in Figure 2) contains short leucine zippers with charged amino acids in all a positions.
Figure 2. Amino acid sequence of the 67 A.thaliana B-ZIP domains. Proteins with similar predicted dimerization properties are arranged into families A–T, shown in bold in the first column. The second column gives the family names from Jakoby et al. (7). The solid line separates homodimerizing proteins (A–N) from proteins that have a complex dimerization pattern (P–T). The leucine zipper region is divided into heptads (gabcdef) to help visualize the ge' pairs. Amino acids predicted to regulate dimerization specificity are color-coded. If the g and following e positions contain charged amino acids, the heptad is colored from g to the following e. Four colors represent ge' pairs: green for the attractive basic-acidic pairs (RE and KE), orange for the attractive acidic-basic pairs (ER, EK, DR and DK), red for repulsive acidic pairs (EE and ED) and blue for repulsive basic pairs (KK and RK). If only one of the two amino acids in the ge' pair is charged, that residue is colored blue if basic and red if acidic. If the a or d position is polar, it is colored black, and if either is charged, it is colored purple. Prolines and glycines are colored red to indicate a potential break in the α-helical structure. The predicted C-terminal boundary of each B-ZIP leucine zipper is denoted by an asterisk; this boundary is used to define the frequencies of amino acids in different positions of the leucine zipper. The C-terminal boundary is defined by the presence of hydrophobic amino acids in the a and d positions, charged amino acids in the e and g positions, the absence of proline or pairs of glycines anywhere in the structure, and the absence of charged amino acids in the a and d positions. In the majority of cases, the decision was straightforward. However, in several instances, it was more ambiguous.
For example, we indicate that the DPBF3 and DPBF4 leucine zippers stop after the second heptad because the third heptad has a K and an E in the a and d positions, respectively, which should prevent leucine zipper formation. For the same reason, the At5g44080 and GBF4 zippers stop after the third heptad, as they have an R and a K in the a and d positions, respectively, of the fourth heptad. The At1g06070 zipper extends to the ninth heptad even though it has two glycines in the fourth heptad. We would normally expect this to terminate a leucine zipper structure. However, the sequence appears very canonical from the fifth to the ninth heptads, so we propose that the zipper continues through the fourth heptad. This type of reasoning was used to define the C-terminal boundaries of all the leucine zippers. We do not intend these definitions of the C-terminus to be definitive, only approximate. The natural C-terminus is denoted with a circumflex accent (^). Two tenth heptads are predicted not to form a coiled coil: LTGQVAP in posF21, an N family member, and VLISNEK in At2g13150, a P family member. A dot in front of a protein name indicates that it has been experimentally shown to form a homodimer.
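The mechanical part of these boundary rules can be sketched as a first pass. The judgment calls described above, such as extending At1g06070 past its glycine pair, are not captured. The function is hypothetical and assumes complete gabcdef heptads. It stops before the first heptad that has any of the following:

- a charged (D, E, K or R) a or d residue;
- a proline;
- a glycine pair, including one that spans two heptads.

The test heptads other than Opaque are invented to mimic the DPBF3 case.

```python
CHARGED = set("DEKR")


def zipper_heptad_count(heptads):
    """First-pass zipper length: the number of gabcdef heptads kept before the
    first one with a charged a (index 1) or d (index 4) residue, a proline,
    or a glycine pair."""
    seq = "".join(heptads)
    for k, h in enumerate(heptads):
        # Include the last residue of the previous heptad so that a glycine
        # pair spanning a heptad boundary is also caught.
        window = seq[max(0, 7 * k - 1): 7 * k + 7]
        if h[1] in CHARGED or h[4] in CHARGED or "P" in h or "GG" in window:
            return k
    return len(heptads)


OPAQUE = ["HLKELED", "QVAQLKA", "ENSCLLR", "RIAALNQ", "KYNDANV",
          "DNRVLRA", "DMETLRA", "KVKMGED", "SLKRVIE"]
DPBF3_LIKE = ["SLEELKA", "QIEALKE", "EKRAEQL"]  # hypothetical: K at a, E at d in heptad 3
print(zipper_heptad_count(OPAQUE), zipper_heptad_count(DPBF3_LIKE))  # 9 2
```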
Figure 3 presents the kinds of amino acids found in the a, d, e and g positions, the coiled coil positions known to regulate dimerization stability and specificity. It uses the definition of the C-terminus shown in Figure 2. Overall, A.thaliana B-ZIP proteins contain fewer charged amino acids in the g and e positions than H.sapiens zippers. These charged amino acids produce the attractive and repulsive ge' pairs that regulate dimerization specificity. For the a position, >50% of the residues in H.sapiens are aliphatic (I, V, L or M) and only 16% are asparagine. In contrast, in A.thaliana a positions, aliphatic amino acids are less abundant while asparagine is more abundant. In the d position, the uniquely stabilizing leucine is more abundant in H.sapiens (66%) than in A.thaliana (56%), while other aliphatic amino acids (I, V and M) are more frequent in A.thaliana (19%) than in H.sapiens (9%). There are significant differences in the relative frequencies of amino acids in the a, d, e and g positions of plant and human heptads. Nevertheless, the obvious conservation in amino acid usage provides assurance that the structural rules we applied to H.sapiens B-ZIP dimerization specificity (3) can be used to predict the dimerization of A.thaliana B-ZIP proteins.
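The position-specific tallies behind pie charts like those in Figure 3 can be sketched as below. The sketch assumes each zipper is already truncated at its Figure 2 boundary and split into gabcdef heptads. The example uses only the Opaque zipper from Methods, so the numbers illustrate the calculation, not the genome-wide frequencies quoted here. The helper names are ours.

```python
from collections import Counter

POSITION_INDEX = {"g": 0, "a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}


def position_frequencies(zippers, positions="gead"):
    """Relative amino acid frequencies at selected heptad positions, pooled
    over a list of zippers (each a list of gabcdef heptads)."""
    counts = {p: Counter() for p in positions}
    for heptads in zippers:
        for h in heptads:
            for p in positions:
                counts[p][h[POSITION_INDEX[p]]] += 1
    return {p: {aa: n / sum(c.values()) for aa, n in c.items()}
            for p, c in counts.items()}


def class_fraction(freqs, residues):
    """Summed frequency of a residue class, e.g. aliphatic 'IVLM'."""
    return sum(freqs.get(aa, 0.0) for aa in residues)


OPAQUE = ["HLKELED", "QVAQLKA", "ENSCLLR", "RIAALNQ", "KYNDANV",
          "DNRVLRA", "DMETLRA", "KVKMGED", "SLKRVIE"]
freqs = position_frequencies([OPAQUE])
print(f"d leucine: {freqs['d']['L']:.2f}, a aliphatic: {class_fraction(freqs['a'], 'IVLM'):.2f}, "
      f"a asparagine: {freqs['a']['N']:.2f}")
```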
Figure 3. Pie chart presenting the frequency of amino acids in all the g, e, a and d positions of the leucine zipper for both H.sapiens and A.thaliana B-ZIP proteins.
Leucine zipper length
Leucine zipper length presented in Figure 2 is a subjective decision based on the expectation that a heptad will be α-helical. Several discrete criteria suggest that A.thaliana B-ZIP leucine zippers can be longer than those of H.sapiens. Supplementary Figure 3 categorizes the B-ZIP proteins from A.thaliana and H.sapiens by heptad number, using the natural C-terminus or the presence of a proline or two glycines as the terminator of the α-helix. Using these criteria, we observe that over half of H.sapiens leucine zippers terminate after four or six heptads and none are greater than nine heptads in length. In contrast, A.thaliana leucine zippers are more variable. Of the A.thaliana B-ZIP proteins, 10% have short zippers (3 heptads) and >30% have no α-helix breakers for 10 or more heptads. This does not suggest that the zippers are that long, only that their structural limits are less well defined than those of H.sapiens B-ZIP proteins.
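The heptad counts described here rest on a simple rule: count complete heptads from the zipper start until the first proline or glycine pair, or until the natural C-terminus. A sketch of that rule follows. The function name is hypothetical. The example uses the Opaque zipper plus the C-terminal residues of the construct (MSSSVPSS) given in Methods.

```python
import re

BREAKER = re.compile(r"P|GG")  # proline, or two consecutive glycines


def heptads_before_breaker(zipper_and_tail):
    """Complete heptads from the first zipper residue up to the first helix
    breaker; if there is none, up to the natural C-terminus."""
    m = BREAKER.search(zipper_and_tail)
    end = m.start() if m else len(zipper_and_tail)
    return end // 7


opaque = ("HLKELED" "QVAQLKA" "ENSCLLR" "RIAALNQ" "KYNDANV"
          "DNRVLRA" "DMETLRA" "KVKMGED" "SLKRVIE" "MSSSVPSS")
print(heptads_before_breaker(opaque))  # -> 9
```

As the text notes, a count from this rule marks where the helix could end at the latest, not a measured zipper length.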
The frequency of stabilizing leucine or other hydrophobic amino acids (16,29) in the d positions of A.thaliana and H.sapiens leucine zippers was compared (Supplementary Figure 4). In H.sapiens, leucine is found in 100% of the second heptad d positions and in >80% of the d positions in heptads 0 to 3. In subsequent heptads, the frequency of leucine in the d positions drops dramatically, and leucine is absent by heptad 7. In A.thaliana, the distribution of leucine in the d positions is biphasic: leucine is found in 90% of the d positions in heptads 0 to 2, falls dramatically to 30% in heptad 3 and rises again to >50% in heptad 5. Leucine remains frequent in heptads 6 to 8, providing additional support for the suggestion that some A.thaliana leucine zippers are longer than those of H.sapiens.
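The per-heptad profile in Supplementary Figure 4 is a frequency computed separately for each heptad index. A minimal sketch follows, assuming zippers are lists of 7-letter heptads in the abcdefg register. Because the text does not say whether each heptad's denominator counts all proteins or only those whose zipper reaches that heptad, both options are offered; the function name and zippers are hypothetical.

```python
def per_heptad_frequency(zippers, position, residue_class, count_all=False):
    """Per-heptad fraction of zippers carrying `residue_class` at `position`.

    Each zipper is a list of 7-letter heptads in the abcdefg register.
    By default the denominator for heptad h counts only zippers that extend
    to heptad h; with count_all=True every zipper is counted (an assumption,
    since the normalization behind Supplementary Figure 4 is not stated).
    """
    idx = "abcdefg".index(position)
    n_heptads = max(len(z) for z in zippers)
    freqs = []
    for h in range(n_heptads):
        present = [z[h][idx] for z in zippers if len(z) > h]
        hits = sum(aa in residue_class for aa in present)
        freqs.append(hits / (len(zippers) if count_all else len(present)))
    return freqs


# Hypothetical zippers; heptads are numbered from 0 as in the text.
z1 = ["AAALAAA", "AAALAAA", "AAAVAAA"]
z2 = ["AAALAAA", "AAAIAAA"]
print(per_heptad_frequency([z1, z2], "d", {"L"}))                  # [1.0, 0.5, 0.0]
print(per_heptad_frequency([z1, z2], "d", {"I", "V", "M"}, True))  # [0.0, 0.5, 0.5]
```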
Figure 4. Histogram of the frequency of asparagine in the a positions of the leucine zippers for all H.sapiens and A.thaliana B-ZIP proteins.
Defining features of the a position
Figure 4 presents the frequency of asparagine in the a positions of both H.sapiens and A.thaliana B-ZIP proteins. In both species, asparagine is abundant in the a position of heptad 2. In H.sapiens, asparagine is rarely observed in other heptads, whereas in A.thaliana asparagine is also common in heptad 5. Asparagine produces stable N–N interactions at the a–a' position but does not interact favorably with other a-position amino acids (12). Zippers with different asparagine placements should therefore pair poorly with one another, so the distinct placement of asparagines along the A.thaliana leucine zippers can be used to define families of homodimerizing B-ZIP proteins.
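The family-defining idea reduces to grouping proteins by the set of heptads whose a position holds asparagine. Here is a minimal sketch under stated assumptions: heptads are numbered from 1 (so "heptad 2" is the second heptad), only asparagine placement is considered, and the protein names and zippers are hypothetical.

```python
from collections import defaultdict


def asparagine_pattern(zipper):
    """Heptad ordinals (1-based, assumed) whose a position is asparagine."""
    return tuple(i for i, heptad in enumerate(zipper, start=1) if heptad[0] == "N")


def group_by_asparagine(zippers):
    """Group proteins that share the same a-position asparagine placement.

    Because N pairs favorably only with N across the a-a' interface, proteins
    in the same group are candidate homodimerizing partners; proteins in
    different groups would typically place N opposite a non-N residue.
    """
    groups = defaultdict(list)
    for name, zipper in zippers.items():
        groups[asparagine_pattern(zipper)].append(name)
    return dict(groups)


hept, nhept = "LAALAEK", "NAALAEK"   # hypothetical heptads without / with a-position N
toy = {
    "p1": [hept, nhept, hept, hept, nhept],
    "p2": [hept, nhept, hept, hept, nhept, hept],
    "p3": [hept, nhept, hept],
}
print(group_by_asparagine(toy))  # {(2, 5): ['p1', 'p2'], (2,): ['p3']}
```

The actual family assignments in Figure 2 also weigh the ge' interactions, so this grouping is only the asparagine component of that decision.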
The a positions in A.thaliana B-ZIP proteins also contain other polar amino acids, including serine and threonine. To experimentally evaluate the contribution of serine to dimerization specificity, we used a well-established, heterodimerizing leucine zipper system (4,12). Briefly, this system contains one monomer (EE34) in which homodimer formation is inhibited by repulsive, acidic ge' pairs in heptads 3 and 4 and a second monomer, termed A-RR34, in which homodimer formation is blocked by the presence of repulsive, basic ge' interactions in heptads 3 and 4. Heterodimer formation between A-RR34 and EE34 is favored and is stabilized further by the presence of an N-terminal, amphipathic extension in A-RR34 which lengthens the dimerization interface by interacting with the N-terminal, basic region of EE34. The third a position in both EE34 and A-RR34 was changed to serine and the energetic contribution of serine to homotypic and heterotypic a–a' interactions was examined. Circular dichroism spectroscopy was used to monitor thermal denaturation (Supplementary Table 2) and the coupling energy of each complex (relative to alanine) was calculated (Supplementary Table 3). Serine in the a position does not favor an interaction with itself, an aliphatic isoleucine, or a polar asparagine. Serine does, however, interact with lysine and in this way can exert a positive influence on heterodimer formation.
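A coupling energy "relative to alanine" is commonly computed with a double-mutant cycle, ΔΔG_int(X:Y) = ΔG(X:Y) − ΔG(X:A) − ΔG(A:Y) + ΔG(A:A), where each ΔG is the stability of the heterodimer with the indicated residues at the a and a' positions. The sketch below implements that standard cycle; we assume, but the text does not confirm, that Supplementary Table 3 uses this form and sign convention. The numbers are arbitrary placeholders, not the measured values.

```python
def coupling_energy(stability, x, y, ref="A"):
    """Double-mutant-cycle coupling energy for an a-a' pair x:y.

    `stability[(p, q)]` is the free energy of unfolding (kcal/mol) of the
    heterodimer with residue p at the a position of one monomer and q at the
    a' position of the other. Alanine is the reference residue, so additive,
    non-interacting substitutions give 0; positive values mean the x:y pair
    is more stable than the additive expectation (assumed sign convention).
    """
    return (stability[(x, y)] - stability[(x, ref)]
            - stability[(ref, y)] + stability[(ref, ref)])


# Arbitrary illustrative values, NOT the data in Supplementary Table 2.
dg = {("S", "K"): 10.0, ("S", "A"): 9.0, ("A", "K"): 8.5, ("A", "A"): 9.0}
print(coupling_energy(dg, "S", "K"))  # 1.5
```

Referencing every pair to alanine removes the intrinsic contribution of each residue, leaving only the interaction term that bears on specificity.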
Like H.sapiens, A.thaliana B-ZIP proteins also contain charged amino acids in the a position that are destabilizing relative to alanine (12,20). The energetics of these charged amino acids interacting with other amino acids in the a position has not been examined extensively. The limited data available indicate that lysine prefers to interact with amino acids other than itself (Supplementary Table 3) (12). We suspect that these charged amino acids will drive heterodimer formation in a similar manner to lysine.
Features of the g and e positions
The observed frequency of attractive and repulsive ge' pairs in each heptad of H.sapiens and A.thaliana leucine zippers is presented in Figure 5. In H.sapiens, >50% of the first four heptads contain either attractive or repulsive ge' pairs, whereas such pairs are virtually absent in subsequent heptads. This is consistent with the termination of the H.sapiens leucine zippers (Supplementary Figure 3). The first heptad shows the highest number of repulsive ge' pairs, while attractive ge' pairs predominate in the more C-terminal heptads.
Figure 5. Histogram of the frequency of attractive or repulsive ge' pairs per heptad for both H.sapiens and A.thaliana B-ZIP proteins.
Arabidopsis thaliana leucine zippers, in contrast, contain about half as many ge' pairs in the first four heptads, with the frequency increasing to >30% in heptad 5. This pattern is consistent with the longer average length of the A.thaliana leucine zippers. In A.thaliana, there are few repulsive ge' pairs and no protein has multiple repulsive ge' pairs (as found in some FOS proteins), implying that this feature is not a major determinant of heterodimerization among plant B-ZIP proteins.
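Counting attractive and repulsive ge' pairs per heptad amounts to checking the charges of the g residue of heptad i and the e' residue of heptad i+1 on the partner helix. A minimal sketch follows, assuming D/E carry −1, K/R carry +1 and every other residue (including histidine) is neutral, and indexing each pair by the heptad that contributes the g residue. These conventions, the names and the sequences are assumptions for illustration.

```python
from collections import Counter

CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}  # simplification: His treated as neutral
E_POS, G_POS = 4, 6                          # indices of e and g in "abcdefg"


def classify(g_res, e_res):
    """Label a g-e' pair as 'attractive', 'repulsive' or None (uncharged)."""
    product = CHARGE.get(g_res, 0) * CHARGE.get(e_res, 0)
    return {-1: "attractive", 1: "repulsive"}.get(product)


def ge_pairs(z1, z2):
    """List (heptad, label) for charged g(i)-e'(i+1) pairs in a parallel dimer.

    Both g1-e2' and g2-e1' are examined, so a homodimer, ge_pairs(z, z),
    reports each symmetric pair twice. Pairs are indexed by the heptad that
    contributes the g residue (an assumed convention).
    """
    pairs = []
    for i in range(min(len(z1), len(z2)) - 1):
        for g_res, e_res in ((z1[i][G_POS], z2[i + 1][E_POS]),
                             (z2[i][G_POS], z1[i + 1][E_POS])):
            label = classify(g_res, e_res)
            if label:
                pairs.append((i, label))
    return pairs


# Hypothetical zipper with E at g and K/R at the following e positions.
zip_a = ["LAALAAE", "LAALKAE", "LAALRAA"]
print(Counter(label for _, label in ge_pairs(zip_a, zip_a)))  # Counter({'attractive': 4})
```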
Biophysical characterization of Opaque, a plant B-ZIP protein
To test the prediction that some A.thaliana B-ZIP proteins have long leucine zippers, we biophysically characterized Opaque, the maize homologue of the A.thaliana G family B-ZIP proteins. The Opaque leucine zipper is predicted to be eight heptads long, with asparagine in the a position of the second and fifth heptads. Figure 6 presents the circular dichroism spectra and thermal denaturation of Opaque and of CREB, a human B-ZIP protein whose leucine zipper is only four heptads long. The Opaque and CREB B-ZIP domains are truncated at the same position, just N-terminal of the basic region. The circular dichroism spectra of both Opaque and CREB have minima at 208 and 222 nm, indicative of a protein that is primarily α-helical. However, at the same concentration, Opaque has more ellipticity than CREB, indicating that Opaque contains more α-helical structure. The truncated Opaque protein is 74.5% α-helical, corresponding to 12 heptads of α-helix. In contrast, CREB is only 56.5% α-helical, corresponding to seven heptads of α-helix. Because the CREB leucine zipper is only four heptads long, we assume that the additional structure reflects the leucine zipper propagating into the basic region, as has been observed for the vertebrate VBP protein (30). The important result is that Opaque contains five more heptads of α-helix than CREB, indicative of a longer leucine zipper. The thermal denaturations indicate that the leucine zipper region melts cooperatively with a simple two-state denaturation profile.
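The step from percent helix to heptads is simple arithmetic: helical residues per chain divided by 7. The construct lengths are not given in the text, so the sketch below works backwards to the chain lengths the quoted figures imply, assuming the percentage refers to every residue of each chain. The implied lengths are inferences, not reported values, and the function names are hypothetical.

```python
def helical_heptads(fraction_helical, residues_per_chain):
    """Heptads of alpha-helix implied by a CD-derived helical fraction."""
    return fraction_helical * residues_per_chain / 7


def implied_chain_length(heptads, fraction_helical):
    """Residues per chain needed for `heptads` of helix at this helical fraction."""
    return heptads * 7 / fraction_helical


# Chain lengths implied by the quoted values (the text does not state them).
print(round(implied_chain_length(12, 0.745)))  # 113 (Opaque construct)
print(round(implied_chain_length(7, 0.565)))   # 87 (CREB construct)
print(round(helical_heptads(0.745, 113)) - round(helical_heptads(0.565, 87)))  # 5
```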
Figure 6. (A) Circular dichroism spectra from 200–260 nm of CREB and Opaque, 4 μM at 6°C. The asterisks indicate the minima at 222 nm at which the thermal denaturations of these two proteins were monitored. (B) Thermal denaturation curves of CREB and Opaque at 4 μM concentration monitored at 222 nm.
The dimerization network of A.thaliana B-ZIP proteins
On the basis of the structural features described above, we have organized the A.thaliana B-ZIP proteins into families designated A through T (Figure 2). The families have been further organized into three groups: (i) proteins whose leucine zippers contain attractive interhelical interactions and that we predict should form homodimers, (ii) proteins whose zippers contain both attractive and repulsive interhelical interactions and that we predict should display more complex dimerization properties and (iii) proteins whose zippers contain only repulsive interhelical interactions and that are predicted to form heterodimers. Table 1 summarizes the structural rationale used in organizing these groups. Forty-seven B-ZIP proteins make up the first group and are subdivided into 14 homodimerizing families (A–N) based on the placement of asparagine in the a positions of each heptad. Ten B-ZIP proteins make up the five families of the second group (O–S); their interhelical ge' interactions favor both homo- and heterodimer formation, and we are less confident in predicting their dimerization properties. The final group comprises a single family (T), the TGA proteins.
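The three-way grouping can be written as a small decision rule on the numbers of attractive and repulsive homotypic ge' pairs, with asparagine placement then splitting group (i) into families. This is a simplified sketch, not the procedure behind Table 1, which may also weigh zipper length and charged a-position residues. The function names, the 1-based heptad numbering and the handling of zippers with no charged pairs are assumptions.

```python
def assign_group(n_attractive, n_repulsive):
    """Three-way grouping by homotypic g-e' interactions (simplified sketch).

    1: attractive only  -> predicted homodimers
    2: both kinds       -> more complex dimerization
    3: repulsive only   -> predicted heterodimers
    None: no charged g-e' pairs; outside this simple rule.
    """
    if n_attractive and not n_repulsive:
        return 1
    if n_attractive and n_repulsive:
        return 2
    if n_repulsive:
        return 3
    return None


def family_key(zipper):
    """Within group 1, heptads (1-based, assumed) with asparagine at a."""
    return tuple(i for i, heptad in enumerate(zipper, start=1) if heptad[0] == "N")


print(assign_group(3, 0), assign_group(2, 1), assign_group(0, 2))  # 1 2 3
print(family_key(["LAALAEK", "NAALAEK", "LAALAEK", "LAALAEK", "NAALAEK"]))  # (2, 5)
```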
Table 1. Arabidopsis thaliana B-ZIP proteins were placed into families with similar predicted dimerization specificity
Experimental data documenting interactions between several A.thaliana B-ZIP proteins are available, allowing us to evaluate some of our predictions. The data typically come from gel-shift experiments indicating that two B-ZIP proteins can dimerize and bind to DNA. These results are consistent with our predictions, as noted in Table 1. Cases in which experimentally observed interactions were not predicted are discussed below. GBF4 (family B) has been shown to dimerize with proteins in family F (31). Although the short leucine zippers of family B proteins could be considered a deterrent to efficient homodimer formation, there are no obvious structural features suggesting that GBF4 would interact preferentially with GBF1 or GBF2 in family F. Taken together, these observations indicate that GBF4 may in fact dimerize with many other B-ZIP proteins. Family C proteins have been shown to form intra-family dimers (32–35), and ABI5 can dimerize with proteins from family B, albeit less efficiently than with members of its own family (33). Once again, this suggests that family B zippers may be promiscuous in their interactions.
The placement of asparagines exclusively in the second and fifth a positions is a feature of four families of A.thaliana B-ZIP proteins (F, G, H and K). Segregation of these proteins into four separate families was directed by the presence of attractive or repulsive ge' pairs. There is a report that family F proteins form dimers with family G proteins (36). This is surprising because an F/G dimer contains repulsive ge' pairs in both the first and fourth heptads. Family G orthologs also have been shown to form dimers with family H proteins (37,38). In this case, there are fewer attractive ge' pairs in the heterodimer compared to either homodimer, indicating that the heterodimer represents a less favored interaction. In this regard, it is important to note that the latter experiments were performed in the presence of DNA that may serve as a stabilizing influence on sub-optimal B-ZIP dimers.
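Whether a heterodimer gains or loses attractive ge' pairs relative to the two homodimers, as argued for the F/G and G/H cases above, can be scored mechanically. This minimal sketch assumes a score of +1 per attractive and −1 per repulsive g(i)–e'(i+1) pair and ignores a-position residues and any stabilization by DNA. The function names and sequences are hypothetical.

```python
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}  # His and uncharged residues count as 0


def pair_score(z1, z2):
    """Attractive minus repulsive g(i)-e'(i+1) pairs in a parallel z1/z2 dimer."""
    score = 0
    for i in range(min(len(z1), len(z2)) - 1):
        for g, e in ((z1[i][6], z2[i + 1][4]), (z2[i][6], z1[i + 1][4])):
            # Opposite charges (product -1) add 1; like charges (product +1) subtract 1.
            score -= CHARGE.get(g, 0) * CHARGE.get(e, 0)
    return score


def heterodimer_favored(z1, z2):
    """True if the z1/z2 heterodimer outscores both homodimers."""
    hetero = pair_score(z1, z2)
    return hetero > pair_score(z1, z1) and hetero > pair_score(z2, z2)


# Hypothetical zippers: acidic and basic g/e residues, plus a self-complementary one.
acid = ["LAALAAE", "LAALEAE", "LAALEAA"]
basic = ["LAALAAK", "LAALKAK", "LAALKAA"]
homo = ["LAALAAE", "LAALKAE", "LAALKAA"]
print(heterodimer_favored(acid, basic))  # True
print(heterodimer_favored(homo, acid))   # False: homo/homo keeps more attractive pairs
```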
DISCUSSION
We have identified 67 B-ZIP motifs in the A.thaliana genome and have predicted their dimerization partners. None of these B-ZIP motifs are homologous to any human B-ZIP motifs, but the kinds of amino acids observed in the a, d, e and g positions of the leucine zipper are similar to those observed in H.sapiens B-ZIP leucine zippers (Figure 3). We have used what is known about the contribution of these amino acids to dimerization specificity to predict A.thaliana B-ZIP dimerization partners. A.thaliana B-ZIP leucine zippers can be eight or more heptads long in contrast to the four to six heptads typically found in H.sapiens. Asparagine is observed throughout the a positions of these long leucine zippers and we used the particular placement of these asparagine residues to define 14 families of homodimerizing B-ZIP proteins (Figure 2).
In this manuscript, we have extended the biophysical examination of how different amino acids in the a position contribute to dimerization specificity. Quantitation of heterotypic a–a' interactions has previously been described only for L, I, V, A, N and K (4,12). In this study, we present data on the interaction of serine (S) in the a position with I, N, K and S at the a' position. These data indicate that serine contributes less to dimerization specificity than an aliphatic amino acid, a polar asparagine or a charged lysine. According to our data, serine prefers a heterotypic interaction with lysine. Serine is found in the a position of the A.thaliana GBF2, GBF3 and At2g21230 proteins, where it may encourage intra-group dimer formation rather than strict intra-family homodimerization. Serine also occurs in the a position of At1g59530; however, we predict that this protein will form heterodimers because of its repulsive ge' pairs.
A comparison between A.thaliana and H.sapiens B-ZIP proteins has identified clear differences. Three criteria suggest that the leucine zippers of B-ZIP domains in A.thaliana can be longer than those in H.sapiens: (i) the location of the α-helix breakers proline or a pair of glycines, (ii) the presence of leucines in the d position and (iii) the presence of charged amino acids in the g and e positions. Although many A.thaliana leucine zippers are eight or more heptads in length, they appear to have stability similar to that of H.sapiens leucine zippers, judging from the thermal denaturation of Opaque and CREB. The shorter H.sapiens zippers have both a high frequency of stabilizing leucine residues in the d positions (16) and many attractive ge' pairs, making these zippers optimally stable. Arabidopsis thaliana leucine zippers, on the other hand, have a lower frequency of these stabilizing elements and are longer, yet have similar stability.
Homo sapiens and A.thaliana regulate the dimerization specificity of their leucine zipper proteins by different mechanisms. The six homodimerizing families of H.sapiens B-ZIP factors contain asparagine in the second heptad a position. However, only two of these six families contain additional a position asparagines to dictate dimerization specificity, and the remaining four use variations of interhelical ge' interactions to drive homodimer formation (3). Eleven families of homodimerizing A.thaliana B-ZIP proteins contain asparagine in the second heptad a position but, in contrast to H.sapiens, these proteins also contain asparagines in the a positions of the fourth, fifth, sixth and eighth heptads, with asparagine in the fifth heptad being nearly as abundant as in the second heptad. In the case of heterodimer formation, H.sapiens B-ZIP factors use repulsive electrostatic interhelical ge' interactions, whereas A.thaliana B-ZIP factors contain few electrostatic repulsive ge' interactions. Instead, A.thaliana B-ZIP factors contain charged amino acids in the a positions as is observed in the MAF family of H.sapiens B-ZIP proteins. While we suggest these proteins will heterodimerize, we are unable to predict the dimerization partners.
The use of the same amino acids to regulate dimerization specificity in both H.sapiens and A.thaliana coiled coils suggests that these amino acids are uniquely suited for this task. The coincidence between longer leucine zippers and more predicted homodimerizing B-ZIP families suggests that longer leucine zippers are needed to generate more homodimerizing families.
The leucine zipper has been a favored motif for biophysical characterization because its assumed two-state denaturation allows calculation of thermodynamic parameters that describe the stability of the protein complex. All of the leucine zippers that have been biophysically characterized contain an asparagine in the second heptad a position and are approximately four to five heptads long. These proteins have been characterized by a two-state folding equilibrium. More recent work has shown that this is an oversimplification and that intermediates are observed during the folding of these proteins (40–42). Longer coiled coils such as tropomyosin display a complex denaturation profile (43). To examine whether the longer leucine zippers predicted for A.thaliana B-ZIP proteins with multiple a-position asparagines behave as longer coiled coils, we studied the maize B-ZIP protein Opaque, which is related to the A.thaliana G family B-ZIP proteins. Opaque melts cooperatively and has a simple two-state denaturation profile (Figure 6). At the same concentration (4 μM), Opaque has greater ellipticity than CREB, indicative of a longer dimeric coiled-coil structure.
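For a dimeric zipper, the two-state model is N2 ⇌ 2U, and the fraction of folded chains follows from K_d = [U]²/[N2] and the total chain concentration. The sketch below assumes a temperature-independent unfolding enthalpy (ΔCp = 0) in the van't Hoff relation. It is a generic illustration of the model, not the authors' fitting procedure, and the Tm and ΔH values are hypothetical; only the 4 μM concentration comes from Figure 6.

```python
import math

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def dissociation_constant(T, Tm, dH, Ct):
    """K_d(T) = [U]^2/[N2] for N2 <-> 2U, via van't Hoff with dCp = 0 (assumed).

    With Tm defined as the midpoint (half of the chains folded) at total chain
    concentration Ct, this model gives K_d(Tm) = Ct.
    """
    return Ct * math.exp(-dH / R * (1.0 / T - 1.0 / Tm))


def fraction_folded(T, Tm, dH, Ct):
    """Fraction of chains in the dimer, from K_d = 2*Ct*(1 - f)**2 / f.

    Uses the root f <= 1 of 2*Ct*f**2 - (4*Ct + K_d)*f + 2*Ct = 0, written
    in a form that avoids cancellation when K_d is large.
    """
    Kd = dissociation_constant(T, Tm, dH, Ct)
    return 4 * Ct / ((4 * Ct + Kd) + math.sqrt(Kd * Kd + 8 * Ct * Kd))


# Hypothetical Tm (K) and dH (kcal/mol); Ct = 4 uM as in Figure 6.
Ct, Tm, dH = 4e-6, 330.0, 60.0
for T in (300.0, 330.0, 360.0):
    print(T, round(fraction_folded(T, Tm, dH, Ct), 3))
```

Because K_d(Tm) = Ct in this model, the midpoint of a dimer shifts upward with concentration, which is one reason melting curves of different proteins are compared at the same concentration.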
Several A.thaliana B-ZIP motifs would be interesting to examine in biophysical detail, including those with short leucine zippers (families A, B and C) and those with longer bipartite zippers containing a putative homodimerizing part and a heterodimerizing part (families Q and R). A minimum of three heptads, corresponding to six helical turns, is required for a peptide to adopt a two-stranded α-helical coiled-coil conformation (39). Therefore, B-ZIP proteins in families A, B and C may need DNA binding to stabilize their structure (31). The Q and R families have several heptads of putatively homodimerizing leucine zipper followed by heptads with charged amino acids in the a position, an arrangement reminiscent of the Myc leucine zipper. These proteins may form homodimers with short coiled coils and heterodimers with longer coiled coils.
Perhaps the most intriguing A.thaliana B-ZIP proteins are the TGA proteins (family T). These proteins possess short leucine zippers with charged amino acids in the a positions of the first, second and third heptads that are expected to destabilize the leucine zipper structure. Despite these destabilizing characteristics, several of these factors have been reported to form both homodimers and heterodimers (44). Interestingly, deletion analysis of the TGA proteins indicates that their DNA-binding activity requires both the B-ZIP domain and a ‘dimerization stability region’ located more than 100 amino acids C-terminal to the zipper (45). The structural basis of this intramolecular interaction is unknown, but the observation is consistent with the suggestion that the leucine zippers of the TGA proteins are unstable. Several of the A.thaliana B-ZIP proteins predicted to form heterodimers (families O–Q) are similar in many ways to the TGA proteins. These proteins possess charged amino acids in the a positions of the fourth, fifth and/or sixth heptads and thus presumably rely on the first three heptads for stable zipper formation. Clearly, this type of heptad arrangement is non-canonical, and it would be of interest to determine whether these proteins denature in a sequential manner.
A notable difference between H.sapiens and A.thaliana leucine zippers is a histidine that is present in the d position of the fifth heptad in one-third of all human B-ZIP factors, including all AP-1 family members. The charge of this histidine is sensitive to changes in intracellular pH and can influence the stability and specificity of B-ZIP dimers. The absence of a similarly placed histidine in plant B-ZIP proteins suggests that this pH-sensing mechanism is absent in plants.
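To make the pH sensitivity concrete, the Henderson–Hasselbalch relation gives the fraction of histidine side chains that carry a positive charge. The sketch assumes a pKa of 6.0, an approximate value for a free histidine side chain; the pKa of a histidine buried at a d position may well be shifted, so the numbers are illustrative only.

```python
def fraction_protonated(pH, pKa=6.0):
    """Henderson-Hasselbalch fraction of histidine side chains that are protonated (+1).

    pKa = 6.0 is an approximate value for a free histidine side chain (assumed);
    a buried d-position histidine may have a shifted pKa.
    """
    return 1.0 / (1.0 + 10 ** (pH - pKa))


# Over this range the fraction falls from 0.5 at pH 6.0 to about 0.03 at pH 7.5.
for pH in (6.0, 6.5, 7.0, 7.5):
    print(pH, round(fraction_protonated(pH), 2))
```

Near the pKa, a shift of a fraction of a pH unit changes the charged population substantially, which is what allows such a residue to act as a pH sensor.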
In summary, this study examines two issues: (i) a genome-wide comparison of A.thaliana and H.sapiens B-ZIP factors and (ii) a prediction of A.thaliana B-ZIP protein dimerization specificity. Our analysis shows that all B-ZIP proteins use the same amino acids to regulate dimerization specificity, but that A.thaliana and H.sapiens genomes exploit the properties of these residues in different ways. A.thaliana B-ZIP proteins have long leucine zippers that are used to define multiple families of homodimerizing B-ZIP proteins.
SUPPLEMENTARY MATERIAL
ACKNOWLEDGEMENTS
A portion of this work was supported by PHS CA-78264 awarded to E.J.T. C.D.D. was a pre-doctoral trainee of PHS T32 CA-09644-09.
REFERENCES
Ellenberger,T.E., Brandl,C.J., Struhl,K. and Harrison,S.C. (1992) The GCN4 basic region leucine zipper binds DNA as a dimer of uninterrupted alpha helices: crystal structure of the protein-DNA complex. Cell, 71, 1223–1237.
Vinson,C.R., Sigler,P.B. and McKnight,S.L. (1989) Scissors-grip model for DNA recognition by a family of leucine zipper proteins. Science, 246, 911–916.
Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.
Fassler,J., Landsman,D., Acharya,A., Moll,J.R., Bonovich,M. and Vinson,C. (2002) B-ZIP proteins encoded by the Drosophila genome: evaluation of potential dimerization partners. Genome Res., 12, 1190–1200.
Newman,J.R. and Keating,A.E. (2003) Comprehensive identification of human bZIP interactions with coiled-coil arrays. Science, 300, 2097–2101.
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.
Jakoby,M., Weisshaar,B., Droge-Laser,W., Vicente-Carbajosa,J., Tiedemann,J., Kroj,T. and Parcy,F. (2002) bZIP transcription factors in Arabidopsis. Trends Plant Sci., 7, 106–111.
Landschulz,W.H., Johnson,P.F. and McKnight,S.L. (1988) The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science, 240, 1759–1764.
O'Shea,E.K., Rutkowski,R. and Kim,P.S. (1992) Mechanism of specificity in the Fos–Jun oncoprotein heterodimer. Cell, 68, 699–708.
Baxevanis,A.D. and Vinson,C.R. (1993) Interactions of coiled coils in transcription factors: where is the specificity? Curr. Opin. Genet. Dev., 3, 278–285.
Zhou,J. and Goldsbrough,P.B. (1994) Functional homologs of fungal metallothionein genes from Arabidopsis. Plant Cell, 6, 875–884.
Acharya,A., Ruvinov,S.B., Gal,J., Moll,J.R. and Vinson,C. (2002) A heterodimerizing leucine zipper coiled coil system for examining the specificity of a position interactions: amino acids I, V, L, N, A, and K. Biochemistry, 41, 14122–14131.
Krylov,D., Barchi,J. and Vinson,C. (1998) Inter-helical interactions in the leucine zipper coiled coil dimer: pH and salt dependence of coupling energy between charged amino acids. J. Mol. Biol., 279, 959–972.
Thompson,K.S., Vinson,C.R. and Freire,E. (1993) Thermodynamic characterization of the structural stability of the coiled-coil region of the bZIP transcription factor GCN4. Biochemistry, 32, 5491–5496.
Landschulz,W.H., Johnson,P.F. and McKnight,S.L. (1989) The DNA binding domain of the rat liver nuclear protein C/EBP is bipartite. Science, 243, 1681–1688.
Moitra,J., Szilak,L., Krylov,D. and Vinson,C. (1997) Leucine is the most stabilizing aliphatic amino acid in the d position of a dimeric leucine zipper coiled coil. Biochemistry, 36, 12567–12573.
Harbury,P.B., Zhang,T., Kim,P.S. and Alber,T. (1993) A switch between two-, three-, and four-stranded coiled coils in GCN4 leucine zipper mutants. Science, 262, 1401–1407.
Gonzalez,L., Jr., Woolfson,D.N. and Alber,T. (1996) Buried polar residues and structural specificity in the GCN4 leucine zipper. Nat. Struct. Biol., 3, 1011–1018.
Zeng,X., Herndon,A.M. and Hu,J.C. (1997) Buried asparagines determine the dimerization specificities of leucine zipper mutants. Proc. Natl Acad. Sci. USA, 94, 3673–3678.
Wagschal,K., Tripet,B., Lavigne,P., Mant,C. and Hodges,R.S. (1999) The role of position a in determining the stability and oligomerization state of alpha-helical coiled coils: 20 amino acid stability coefficients in the hydrophobic core of proteins. Protein Sci., 8, 2312–2329.
Alber,T. (1992) Structure of the leucine zipper. Curr. Opin. Genet. Dev., 2, 205–210.
Vinson,C.R., Hai,T. and Boyd,S.M. (1993) Dimerization specificity of the leucine zipper-containing bZIP motif on DNA binding: prediction and rational design. Genes Dev., 7, 1047–1058.
Cohen,D.R. and Curran,T. (1990) Analysis of dimerization and DNA binding functions in Fos and Jun by domain-swapping: involvement of residues outside the leucine zipper/basic region. Oncogene, 5, 929–939.
Fernandes,L., Rodrigues-Pousada,C. and Struhl,K. (1997) Yap, a novel family of eight bZIP proteins in Saccharomyces cerevisiae with distinct biological functions. Mol. Cell. Biol., 17, 6982–6993.
Higgins,D.G., Thompson,J.D. and Gibson,T.J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol., 266, 383–402.
Page,R.D. (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci., 12, 357–358.
Riechmann,J.L., Heard,J., Martin,G., Reuber,L., Jiang,C., Keddie,J., Adam,L., Pineda,O., Ratcliffe,O.J., Samaha,R.R. et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105–2110.
Ron,D. and Habener,J.F. (1992) CHOP, a novel developmentally regulated nuclear protein that dimerizes with transcription factors C/EBP and LAP and functions as a dominant-negative inhibitor of gene transcription. Genes Dev., 6, 439–453.
Tripet,B., Wagschal,K., Lavigne,P., Mant,C.T. and Hodges,R.S. (2000) Effects of side-chain characteristics on stability and oligomerization state of a de novo-designed model coiled-coil: 20 amino acid substitutions in position ‘d’. J. Mol. Biol., 300, 377–402.
Moll,J.R., Olive,M. and Vinson,C. (2000) Attractive interhelical electrostatic interactions in the proline- and acidic-rich region (PAR) leucine zipper subfamily preclude heterodimerization with other basic leucine zipper subfamilies. J. Biol. Chem., 275, 34826–34832.
Menkens,A.E. and Cashmore,A.R. (1994) Isolation and characterization of a fourth Arabidopsis thaliana G-box-binding factor, which has similarities to Fos oncoprotein. Proc. Natl Acad. Sci. USA, 91, 2522–2526.
Nakamura,S., Lynch,T.J. and Finkelstein,R.R. (2001) Physical interactions between ABA response loci of Arabidopsis. Plant J., 26, 627–635.
Kim,S.Y., Ma,J., Perret,P., Li,Z. and Thomas,T.L. (2002) Arabidopsis ABI5 subfamily members have distinct DNA-binding and transcriptional activities. Plant Physiol., 130, 688–697.
Choi,H., Hong,J., Ha,J., Kang,J. and Kim,S.Y. (2000) ABFs, a family of ABA-responsive element binding factors. J. Biol. Chem., 275, 1723–1730.
Bensmihen,S., Rippa,S., Lambert,G., Jublot,D., Pautot,V., Granier,F., Giraudat,J. and Parcy,F. (2002) The homologous ABI5 and EEL transcription factors function antagonistically to fine-tune gene expression during late embryogenesis. Plant Cell, 14, 1391–1403.
Armstrong,G.A., Weisshaar,B. and Hahlbrock,K. (1992) Homodimeric and heterodimeric leucine zipper proteins and nuclear factors from parsley recognize diverse promoter elements with ACGT cores. Plant Cell, 4, 525–537.
Rugner,A., Frohnmeyer,H., Nake,C., Wellmer,F., Kircher,S., Schafer,E. and Harter,K. (2001) Isolation and characterization of four novel parsley proteins that interact with the transcriptional regulators CPRF1 and CPRF2. Mol. Genet. Genomics, 265, 964–976.
Strathmann,A., Kuhlmann,M., Heinekamp,T. and Droge-Laser,W. (2001) BZI-1 specifically heterodimerises with the tobacco bZIP transcription factors BZI-2, BZI-3/TBZF and BZI-4, and is functionally involved in flower development. Plant J., 28, 397–408.
Su,J.Y., Hodges,R.S. and Kay,C.M. (1994) Effect of chain length on the formation and stability of synthetic alpha-helical coiled coils. Biochemistry, 33, 15501–15510.
Wendt,H., Durr,E., Thomas,R.M., Przybylski,M. and Bosshard,H.R. (1995) Characterization of leucine zipper complexes by electrospray ionization mass spectrometry. Protein Sci., 4, 1563–1570.
Zitzewitz,J.A., Ibarra-Molero,B., Fishel,D.R., Terry,K.L. and Matthews,C.R. (2000) Preformed secondary structure drives the association reaction of GCN4-p1, a model coiled-coil system. J. Mol. Biol., 296, 1105–1116.
Dragon,S., Offenhauser,N. and Baumann,R. (2002) cAMP and in vivo hypoxia induce tob, ifr1, and fos expression in erythroid cells of the chick embryo. Am. J. Physiol. Regul. Integr. Comp. Physiol., 282, R1219–1226.
Ozeki,S., Kato,T., Holtzer,M.E. and Holtzer,A. (1991) The kinetics of chain exchange in two-chain coiled coils: alpha alpha- and beta beta-tropomyosin. Biopolymers, 31, 957–966.
Niggeweg,R., Thurow,C., Weigel,R., Pfitzner,U. and Gatz,C. (2000) Tobacco TGA factors differ with respect to interaction with NPR1, activation potential and DNA-binding properties. Plant Mol. Biol., 42, 775–788.
Katagiri,F., Seipel,K. and Chua,N.H. (1992) Identification of a novel dimer stabilization region in a plant bZIP transcription activator. Mol. Cell. Biol., 12, 4809–4816.
Pysh,L.D., Aukerman,M.J. and Schmidt,R.J. (1993) OHP1: a maize basic domain/leucine zipper protein that interacts with opaque2. Plant Cell, 5, 227–236.
Schindler,U., Beckmann,H. and Cashmore,A.R. (1992) TGA1 and G-box binding factors: two distinct classes of Arabidopsis leucine zipper proteins compete for the G-box-like element TGACGTGG. Plant Cell, 4, 1309–1319.